Amazon Elasticsearch Service Monitoring Integration

Amazon Elasticsearch Service (Amazon ES) makes it easy to deploy and operate Elasticsearch for log analytics, data search, and more. By monitoring Amazon ES with Site24x7, you can oversee operational aspects such as performance optimization.

Setup and configuration

  • If you haven't done so already, enable access to your AWS resources by creating an IAM user for Site24x7 or by creating a cross-account IAM role between your account and Site24x7's AWS account. Learn more.
  • Next, on the Integrate AWS Account page, make sure the Elasticsearch checkbox is selected in the Services to be discovered field. Learn more.

Policies and permissions

Make sure the following read-level actions are present in the IAM policy assigned to the Site24x7 entity. Learn more.

  • "es:DescribeElasticsearchDomain",
  • "es:ListDomainNames",
  • "es:ListTags",
  • "logs:DescribeLogStreams",
  • "logs:GetLogEvents",
  • "es:DescribePackages"
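
As an illustration, the actions above could be assembled into an IAM policy document as sketched below. The wildcard `Resource`, the statement layout, and the helper name `build_policy` are assumptions for this example, not Site24x7's published policy; narrow the resource to specific ARNs if your security requirements call for it.

```python
import json

# Read-level actions listed above.
SITE24X7_ES_ACTIONS = [
    "es:DescribeElasticsearchDomain",
    "es:ListDomainNames",
    "es:ListTags",
    "logs:DescribeLogStreams",
    "logs:GetLogEvents",
    "es:DescribePackages",
]


def build_policy(actions):
    """Return an IAM policy document (as a dict) that allows the given actions."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": list(actions),
                # Assumption: all resources; restrict this where required.
                "Resource": "*",
            }
        ],
    }


policy_json = json.dumps(build_policy(SITE24X7_ES_ACTIONS), indent=2)
print(policy_json)
```

The printed JSON can be pasted into the policy editor for the IAM user or role used by Site24x7.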

Polling frequency

Site24x7 queries the AWS service-level APIs and CloudWatch APIs at the configured poll frequency (1 minute to 1 day) to collect performance metrics. Learn more.

Threshold configuration

Go to Admin > Configuration Profiles > Threshold and Availability (+) and choose the monitor type. You can set threshold values for all the applicable metrics. You can also choose to mute inactive alerts in the threshold form for Elasticsearch nodes.

Supported metrics

Attribute Description Unit Statistic
Cluster Status

Green - Indicates that all index shards are allocated to nodes in the cluster.

Yellow - Indicates that the primary shards for all indices are allocated to nodes in the cluster, but the replica shards for at least one index are not.

Red - Indicates that the primary and replica shards of at least one index are not allocated to nodes in the cluster.

Learn more here.

State Minimum
CPU Utilization The percentage of CPU resources used for data nodes in the cluster.

Applicable as an Elasticsearch node metric with a relevant statistic as Maximum.

Percentage Average
Storage The free space and used space in GB, for nodes in the cluster. GB Sum, Maximum
Nodes The number of nodes in the Amazon ES cluster, including dedicated master nodes. Count Minimum
Documents

Searchable documents - The total number of searchable documents across all indices in the cluster.

Deleted documents - The total number of documents marked for deletion across all indices in the cluster. These documents do not appear in search results.

Count Maximum
Cluster Index Writes Blocked

Indicates whether the cluster is blocking or accepting incoming write requests.

0 - cluster is accepting requests, 1 - cluster is blocking requests.

State Maximum
JVM Memory Pressure The percentage of the Java heap used for all data nodes in the cluster. Percentage Maximum
Automated snapshot failure The number of failed automated snapshots for the cluster. Count Maximum
CPU Credit Balance The remaining CPU credits available for data nodes in the cluster. Count Minimum
Kibana Healthy Nodes A health check for Kibana.

1 - normal behavior, 0 - Kibana is inaccessible.

State Minimum
KMS Key Error Indicates that the KMS customer master key used to encrypt data at rest has been disabled. State Maximum
KMS Key Inaccessible Indicates that the KMS customer master key used to encrypt data at rest has been deleted or has had its grants to Amazon ES revoked. State Maximum
Invalid Host Header Requests The number of HTTP requests made to the Elasticsearch cluster that included an invalid (or missing) host header. Count Sum
Elasticsearch Requests The number of requests made to the Elasticsearch cluster. Count Sum
Request Count The number of requests to a domain and the HTTP response code (2xx, 3xx, 4xx, 5xx) for each request. Count Sum
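
The three Cluster Status values in the table above can be read as a simple decision rule. The sketch below is an illustration only: the function name and the `primaries_allocated` / `replicas_allocated` field names are hypothetical, and the input is hand-written sample data rather than anything returned by Site24x7 or AWS.

```python
def cluster_status(indices):
    """Classify cluster health from per-index shard allocation.

    `indices` maps an index name to a dict with two booleans
    (hypothetical field names): 'primaries_allocated' and
    'replicas_allocated'.
    """
    status = "green"
    for alloc in indices.values():
        if not alloc["primaries_allocated"]:
            # An index without allocated primaries makes the cluster red.
            return "red"
        if not alloc["replicas_allocated"]:
            # Primaries are fine, but at least one index lacks replicas.
            status = "yellow"
    return status


sample = {
    "logs-2021": {"primaries_allocated": True, "replicas_allocated": True},
    "metrics": {"primaries_allocated": True, "replicas_allocated": False},
}
print(cluster_status(sample))
```

With the sample data, every primary shard is allocated but one index is missing replicas, so the rule yields the yellow state.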

EBS volume metrics

Attribute Description Unit Statistic
Read Latency The latency, in seconds, for read operations on EBS volumes. Seconds Average
Write Latency The latency, in seconds, for write operations on EBS volumes. Seconds Average
Read Throughput The throughput, in bytes per second, for read operations on EBS volumes. MB/sec Average
Write Throughput The throughput, in bytes per second, for write operations on EBS volumes. MB/sec Average
Disk Queue Depth The number of pending input and output (I/O) requests for an EBS volume. Count Maximum
Read IOPS The number of input and output (I/O) operations per second for read operations on EBS volumes. Count/sec Average
Write IOPS The number of input and output (I/O) operations per second for write operations on EBS volumes. Count/sec Average

Dedicated master node metrics

Attribute Description Unit Statistic
Master CPU Utilization The maximum percentage of CPU resources used by the dedicated master nodes. Percentage Average
Master Free Storage Space Free storage space for master node.

Applicable as an Elasticsearch node metric.

MB Average
Master JVM Memory Pressure The maximum percentage of the Java heap used for all dedicated master nodes in the cluster. Percentage Maximum
Master CPU Credit Balance The CPU credits available for dedicated master nodes in the cluster. Count Minimum
Master Reachable From Node A health check for MasterNotDiscovered exceptions. A value of 1 indicates normal behavior. A value of 0 indicates that cluster health is failing. Count Sum
Master Sys Memory Utilization The percentage of the master node's memory that is in use. Percentage Maximum

Instance metrics

Attribute Description Unit Statistic
Indexing Latency The average time, in milliseconds, that it takes a shard to complete an indexing operation.

Applicable as an Elasticsearch node metric.

Milliseconds Average
Indexing Rate The number of indexing operations per minute. A single call to the _bulk API that adds two documents and updates two others counts as four operations, which might be spread across one or more nodes. If that index has one or more replicas, other nodes in the cluster also record a total of four indexing operations. Document deletions do not count towards this metric.

Applicable as an Elasticsearch node metric.

Ops/min Average
Search Latency The average time, in milliseconds, that it takes a shard on a data node to complete a search operation.

Applicable as an Elasticsearch node metric.

Milliseconds Average
Search Rate The total number of search requests per minute for all shards on a data node. A single call to the _search API might return results from many different shards. If five of these shards are on one node, the node would report 5 for this metric, even though the client only made one request.

Applicable as an Elasticsearch node metric.

Ops/min Average
Sys Memory Utilization The percentage of the instance's memory that is in use.

Applicable as an Elasticsearch node metric with a relevant statistic as Maximum.

Percentage Maximum
JVMGC Young Collection Count The number of times that "young generation" garbage collection has run. A large, ever-growing number of runs is a normal part of cluster operations.

Applicable as an Elasticsearch node metric with a relevant statistic as Maximum.

Count Sum
JVMGC Young Collection Time The amount of time, in milliseconds, that the cluster has spent performing "young generation" garbage collection.

Applicable as an Elasticsearch node metric with a relevant statistic as Maximum.

Milliseconds Average
JVMGC Old Collection Count The number of times that "old generation" garbage collection has run. In a cluster with sufficient resources, this number should remain small and grow infrequently.

Applicable as an Elasticsearch node metric with a relevant statistic as Maximum.

Count Sum
JVMGC Old Collection Time The amount of time, in milliseconds, that the cluster has spent performing "old generation" garbage collection.

Applicable as an Elasticsearch node metric with a relevant statistic as Maximum.

Milliseconds Average
Threadpool Force_merge Queue The number of queued tasks in the force merge thread pool.

Applicable as an Elasticsearch node metric with a relevant statistic as Maximum.

Count Sum
Threadpool Force_merge Rejected The number of rejected tasks in the force merge thread pool. If this number continually grows, consider scaling your cluster.

Applicable as an Elasticsearch node metric with a relevant statistic as Maximum.

Count Sum
Threadpool Force_merge Threads The size of the force merge thread pool.

Applicable as an Elasticsearch node metric with a relevant statistic as Maximum.

Count Average
Threadpool Index Queue The number of queued tasks in the index thread pool.

Applicable as an Elasticsearch node metric with a relevant statistic as Maximum.

Count Sum
Threadpool Index Rejected The number of rejected tasks in the index thread pool. If this number continually grows, consider scaling your cluster.

Applicable as an Elasticsearch node metric with a relevant statistic as Maximum.

Count Sum
Threadpool Index Threads The size of the index thread pool.

Applicable as an Elasticsearch node metric with a relevant statistic as Maximum.

Count Sum
Threadpool Search Queue The number of queued tasks in the search thread pool.

Applicable as an Elasticsearch node metric with a relevant statistic as Maximum.

Count Sum
Threadpool Search Rejected The number of rejected tasks in the search thread pool.

Applicable as an Elasticsearch node metric with a relevant statistic as Maximum.

Count Sum
Threadpool Search Threads The size of the search thread pool.

Applicable as an Elasticsearch node metric with a relevant statistic as Maximum.

Count Average
Threadpool Bulk Queue The number of queued tasks in the bulk thread pool. If the queue size is consistently high, consider scaling your cluster.

Applicable as an Elasticsearch node metric with a relevant statistic as Maximum.

Count Sum
Threadpool Bulk Rejected The number of rejected tasks in the bulk thread pool.

Applicable as an Elasticsearch node metric with a relevant statistic as Maximum.

Count Sum
Threadpool Bulk Threads The size of the bulk thread pool.

Applicable as an Elasticsearch node metric with a relevant statistic as Maximum.

Count Average
Threadpool Write Threads The size of the write thread pool.

Applicable as an Elasticsearch node metric with a relevant statistic as Maximum.

Count Average
Threadpool Write Rejected The number of rejected tasks in the write thread pool.

Applicable as an Elasticsearch node metric with a relevant statistic as Maximum.

Count Sum
Threadpool Write Queue The number of queued tasks in the write thread pool.

Applicable as an Elasticsearch node metric with a relevant statistic as Maximum.

Count Sum
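
The counting rules described for Search Rate and Indexing Rate in the table above can be illustrated with a small worked example. This is a sketch under stated assumptions: the shard-to-node placement, the node names, and both helper functions are hypothetical and only mirror the arithmetic in the descriptions.

```python
from collections import Counter


def search_ops_per_node(shard_to_node, shards_queried):
    """Count the search operations each node records for one _search call."""
    return Counter(shard_to_node[shard] for shard in shards_queried)


def bulk_indexing_ops(added, updated, deleted=0):
    """Indexing operations counted for one _bulk call; deletions are excluded."""
    return added + updated


# Hypothetical layout: seven shards spread over two data nodes.
placement = {
    0: "node-a", 1: "node-a", 2: "node-a", 3: "node-a", 4: "node-a",
    5: "node-b", 6: "node-b",
}

# One client request that touches all seven shards.
ops = search_ops_per_node(placement, range(7))
print(ops["node-a"], ops["node-b"])

# Two additions and two updates count as four operations; the deletion does not count.
print(bulk_indexing_ops(added=2, updated=2, deleted=1))
```

Here node-a holds five of the queried shards, so it reports 5 for a single client request, which is why per-node search rates can exceed the number of requests the client actually sent.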

UltraWarm metrics

Attribute Description Unit Statistic
Warm CPU Utilization The percentage of CPU usage for UltraWarm nodes in the cluster. Percentage Average
Warm Free Storage Space The amount of free warm storage space in MB. MB Average
Warm JVM Memory Pressure The maximum percentage of the Java heap used for the UltraWarm nodes. Percentage Maximum
Warm Searchable Documents The total number of searchable documents across all warm indices in the cluster. Count Sum
Warm Search Latency The average time, in milliseconds, that it takes a shard on an UltraWarm node to complete a search operation. Milliseconds Average
Warm Search Rate The total number of search requests per minute for all shards on an UltraWarm node. A single call to the _search API might return results from many different shards. Ops/min Average
Warm Storage Space Utilization The total amount of warm storage space that the cluster is using. MB Maximum
Hot Storage Space Utilization The total amount of hot storage space that the cluster is using. MB Maximum
Warm Sys Memory Utilization The percentage of the warm node's memory that is in use. Percentage Maximum
Hot To Warm Migration Queue Size The number of indices currently waiting to migrate from hot to warm storage. Count Maximum
Warm To Hot Migration Queue Size The number of indices currently waiting to migrate from warm to hot storage. Count Maximum
Hot To Warm Migration Failure Count The total number of failed hot to warm migrations. Count Sum
Hot To Warm Migration Success Count The total number of successful hot to warm migrations. Count Sum

Forecast

Estimate future values of the following Elasticsearch Domain performance metrics and make informed decisions about adding capacity or scaling your AWS infrastructure.

  • Deleted Documents
  • CPU Utilization
  • Free Storage Usage
  • Cluster Used Space
  • CPU Credit Balance
  • Elasticsearch Requests
  • Disk Queue Depth
  • Read IOPS
  • JVMGC Old Collection Time
  • JVMGC Old Collection Count
  • Sys Memory Utilization

Similarly, you can also view the forecast for the following metrics of Elasticsearch Domain Node:

  • CPU Utilization
  • Free Storage Space
  • Cluster Used Space
  • Search Rate
  • Sys Memory Utilization
  • JVMGC Old Collection Time
  • JVMGC Old Collection Count
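
This page does not describe how Site24x7 computes its forecasts. To give a feel for what a forecast of one of the metrics above involves, the sketch below extrapolates a naive least-squares trend line. The function name and the sample CPU Utilization values are hypothetical, and the method is a stand-in for illustration, not Site24x7's algorithm.

```python
def linear_forecast(values, steps_ahead):
    """Extrapolate equally spaced samples with an ordinary least-squares line."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two samples")
    xs = range(n)
    mean_x = (n - 1) / 2
    mean_y = sum(values) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, values))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Evaluate the fitted line `steps_ahead` samples past the last one.
    return intercept + slope * (n - 1 + steps_ahead)


# Hypothetical daily CPU Utilization averages, in percent.
cpu = [40.0, 42.0, 44.0, 46.0, 48.0]
projected = linear_forecast(cpu, steps_ahead=3)
print(round(projected, 1))
```

The samples rise by two percentage points per day, so the trend line projects 54.0 percent three days after the last sample. Real metric series are rarely this linear, which is why the result should be treated as a rough planning aid.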

Elasticsearch monitoring interface

Summary

View the performance metrics of the Elasticsearch service displayed as time series charts.

Volume details

View detailed graphs of EBS volume metrics such as Read/Write IOPS, Read/Write latency, and Read/Write throughput.
