Amazon Elasticsearch Service Monitoring Integration
Amazon Elasticsearch Service (Amazon ES) is a managed AWS service that makes it easy to deploy and operate Elasticsearch for log analytics, data search, and more. By monitoring Amazon ES with Site24x7, you can track the operational health of your domains and find opportunities to optimize performance.
Table of contents
- Setup and configuration
- Policies and permissions
- Polling frequency
- Threshold configuration
- Supported metrics
- EBS volume metrics
- Dedicated master node metrics
- Instance metrics
- Ultra warm metrics
- Forecast
- Elasticsearch monitoring interface
Setup and configuration
- If you haven't done it already, enable access to your AWS resources by either creating Site24x7 as an IAM user or creating a cross-account IAM role between your account and Site24x7's AWS account. Learn more.
- Next, on the Integrate AWS Account page, make sure the Elasticsearch checkbox is selected in the Services to be discovered field. Learn more.
Policies and permissions
Make sure the following read-level actions are present in the IAM policy assigned to the Site24x7 entity. Learn more.
- "es:DescribeElasticsearchDomain",
- "es:ListDomainNames",
- "es:ListTags",
- "logs:DescribeLogStreams",
- "logs:GetLogEvents",
- "es:DescribePackages"
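As a minimal sketch, the actions listed above could be assembled into an IAM policy document like this. The `"Version"` value, the single `Allow` statement, and `"Resource": "*"` are assumptions made for illustration, and `build_policy_document` is a hypothetical helper name; adjust the resource scope to your own security requirements.

```python
import json

# Read-level actions listed in this section.
SITE24X7_ES_ACTIONS = [
    "es:DescribeElasticsearchDomain",
    "es:ListDomainNames",
    "es:ListTags",
    "logs:DescribeLogStreams",
    "logs:GetLogEvents",
    "es:DescribePackages",
]


def build_policy_document(actions=SITE24X7_ES_ACTIONS):
    """Return an IAM policy document (as a dict) that allows the given actions."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": list(actions),
                # Assumption: all resources. Narrow this if your policy requires it.
                "Resource": "*",
            }
        ],
    }


if __name__ == "__main__":
    # Print the JSON so it can be pasted into the IAM policy editor.
    print(json.dumps(build_policy_document(), indent=2))
```

The printed JSON can be compared against the policy already attached to the Site24x7 IAM user or role to confirm that none of the actions are missing.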
Polling frequency
Site24x7 queries the AWS service level APIs and CloudWatch APIs as per the poll frequency set (1 minute to a day), to collect performance metrics. Learn more.
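To illustrate what one poll might look like on the CloudWatch side, the sketch below builds the parameters for a single metric query whose window matches the poll frequency. This is not Site24x7's implementation; `build_metric_request` is a hypothetical helper, and the domain name and account ID in the usage note are placeholders. Amazon ES metrics are published in the `AWS/ES` namespace with the `DomainName` and `ClientId` dimensions.

```python
from datetime import datetime, timedelta, timezone


def build_metric_request(domain_name, account_id, metric_name, statistic,
                         poll_minutes, now=None):
    """Build CloudWatch query parameters covering one polling interval."""
    now = now or datetime.now(timezone.utc)
    return {
        "Namespace": "AWS/ES",
        "MetricName": metric_name,
        "Dimensions": [
            {"Name": "DomainName", "Value": domain_name},
            {"Name": "ClientId", "Value": account_id},
        ],
        # Query window: the last polling interval.
        "StartTime": now - timedelta(minutes=poll_minutes),
        "EndTime": now,
        # One data point per poll; Period is expressed in seconds.
        "Period": poll_minutes * 60,
        "Statistics": [statistic],
    }
```

With boto3 installed and credentials configured, the result could be passed to `boto3.client("cloudwatch").get_metric_statistics(**params)`, for example with `params = build_metric_request("my-domain", "123456789012", "CPUUtilization", "Average", 5)`.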
Threshold configuration
Go to Admin > Configuration Profiles > Threshold and Availability (+) and choose the monitor type. You can set threshold values for all the applicable metrics. You can also choose to mute inactive alerts for Elasticsearch nodes in the threshold form.
Supported metrics
Attribute | Description | Unit | Statistic |
Cluster Status | Green: all index shards are allocated to nodes in the cluster. Yellow: the primary shards for all indices are allocated to nodes in the cluster, but the replica shards for at least one index are not. Red: the primary and replica shards of at least one index are not allocated to nodes in the cluster. | State | Minimum |
CPU Utilization | The percentage of CPU resources used for data nodes in the cluster. Applicable as an Elasticsearch node metric, with Maximum as the relevant statistic. | Percentage | Average |
Storage | The free space and used space in GB, for nodes in the cluster. | GB | Sum, Maximum |
Nodes | The number of nodes in the Amazon ES cluster, including dedicated master nodes. | Count | Minimum |
Documents | Searchable documents: the total number of searchable documents across all indices in the cluster. Deleted documents: the total number of documents marked for deletion across all indices in the cluster. These documents do not appear in search results. | Count | Maximum |
Cluster Index Writes Blocked | Indicates whether the cluster is accepting or blocking incoming write requests. 0: the cluster is accepting requests. 1: the cluster is blocking requests. | State | Maximum |
JVM Memory Pressure | The percentage of the Java heap used for all data nodes in the cluster. | Percentage | Maximum |
Automated snapshot failure | The number of failed automated snapshots for the cluster. | Count | Maximum |
CPU Credit Balance | The remaining CPU credits available for data nodes in the cluster. | Count | Minimum |
Kibana Healthy Nodes | A health check for Kibana. 1: normal behavior. 0: Kibana is inaccessible. | State | Minimum |
KMS Key Error | Indicates whether the KMS customer master key used to encrypt data at rest has been disabled. | State | Maximum |
KMS Key Inaccessible | Indicates whether the KMS customer master key used to encrypt data at rest has been deleted or has had its grants to Amazon ES revoked. | State | Maximum |
Invalid Host Header Requests | The number of HTTP requests made to the Elasticsearch cluster that included an invalid (or missing) host header. | Count | Sum |
Elasticsearch Requests | The number of requests made to the Elasticsearch cluster. | Count | Sum |
Request Count | The number of requests to a domain and the HTTP response code (2xx, 3xx, 4xx, 5xx) for each request. | Count | Sum |
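The State attributes in the table above report numeric codes. As a small illustration of how those codes read, the sketch below maps the two attributes whose values are spelled out in the table to their meanings. `STATE_MEANINGS` and `describe_state` are hypothetical names used only for this example.

```python
# Numeric codes taken from the attribute descriptions in the table above.
STATE_MEANINGS = {
    "Cluster Index Writes Blocked": {
        0: "cluster is accepting requests",
        1: "cluster is blocking requests",
    },
    "Kibana Healthy Nodes": {
        1: "normal behavior",
        0: "Kibana is inaccessible",
    },
}


def describe_state(attribute, value):
    """Translate a numeric state value into the meaning given in the table."""
    meanings = STATE_MEANINGS.get(attribute)
    if meanings is None:
        raise KeyError(f"No state mapping defined for {attribute!r}")
    return meanings.get(int(value), "unknown value")


if __name__ == "__main__":
    print(describe_state("Cluster Index Writes Blocked", 1))
    print(describe_state("Kibana Healthy Nodes", 0))
```

Note that the alerting direction differs between the two: for Cluster Index Writes Blocked a value of 1 is the problem state, while for Kibana Healthy Nodes it is 0.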
EBS volume metrics
Attribute | Description | Unit | Statistic |
Read Latency | The latency, in seconds, for read operations on EBS volumes. | Seconds | Average |
Write Latency | The latency, in seconds, for write operations on EBS volumes. | Seconds | Average |
Read Throughput | The throughput for read operations on EBS volumes. | MB/sec | Average |
Write Throughput | The throughput for write operations on EBS volumes. | MB/sec | Average |
Disk Queue Depth | The number of pending input and output (I/O) requests for an EBS volume. | Count | Maximum |
Read IOPS | The number of input and output (I/O) operations per second for read operations on EBS volumes. | Count/sec | Average |
Write IOPS | The number of input and output (I/O) operations per second for write operations on EBS volumes. | Count/sec | Average |
Dedicated master node metrics
Attribute | Description | Unit | Statistic |
Master CPU Utilization | The maximum percentage of CPU resources used by the dedicated master nodes. | Percentage | Average |
Master Free Storage Space | The free storage space for the master node. Applicable as an Elasticsearch node metric. | MB | Average |
Master JVM Memory Pressure | The maximum percentage of the Java heap used for all dedicated master nodes in the cluster. | Percentage | Maximum |
Master CPU Credit Balance | The CPU credits available for dedicated master nodes in the cluster. | Count | Minimum |
Master Reachable From Node | A health check for MasterNotDiscovered exceptions. A value of 1 indicates normal behavior. A value of 0 indicates that cluster health is failing. | Count | Sum |
Master Sys Memory Utilization | The percentage of the master node's memory that is in use. | Percentage | Maximum |
Instance metrics
Attribute | Description | Unit | Statistic |
Indexing Latency | The average time, in milliseconds, that it takes a shard to complete an indexing operation. Applicable as an Elasticsearch node metric. | Milliseconds | Average |
Indexing Rate | The number of indexing operations per minute. A single call to the _bulk API that adds two documents and updates two counts as four operations, which might be spread across one or more nodes. If that index has one or more replicas, other nodes in the cluster also record a total of four indexing operations. Document deletions do not count towards this metric. Applicable as an Elasticsearch node metric. | Ops/min | Average |
Search Latency | The average time, in milliseconds, that it takes a shard on a data node to complete a search operation. Applicable as an Elasticsearch node metric. | Milliseconds | Average |
Search Rate | The total number of search requests per minute for all shards on a data node. A single call to the _search API might return results from many different shards. If five of these shards are on one node, the node would report 5 for this metric, even though the client only made one request. Applicable as an Elasticsearch node metric. | Ops/min | Average |
Sys Memory Utilization | The percentage of the instance's memory that is in use. Applicable as an Elasticsearch node metric, with Maximum as the relevant statistic. | Percentage | Maximum |
JVMGC Young Collection Count | The number of times that "young generation" garbage collection has run. A large, ever-growing number of runs is a normal part of cluster operations. Applicable as an Elasticsearch node metric, with Maximum as the relevant statistic. | Count | Sum |
JVMGC Young Collection Time | The amount of time, in milliseconds, that the cluster has spent performing "young generation" garbage collection. Applicable as an Elasticsearch node metric, with Maximum as the relevant statistic. | Milliseconds | Average |
JVMGC Old Collection Count | The number of times that "old generation" garbage collection has run. In a cluster with sufficient resources, this number should remain small and grow infrequently. Applicable as an Elasticsearch node metric, with Maximum as the relevant statistic. | Count | Sum |
JVMGC Old Collection Time | The amount of time, in milliseconds, that the cluster has spent performing "old generation" garbage collection. Applicable as an Elasticsearch node metric, with Maximum as the relevant statistic. | Milliseconds | Average |
Threadpool Force_merge Queue | The number of queued tasks in the force merge thread pool. Applicable as an Elasticsearch node metric, with Maximum as the relevant statistic. | Count | Sum |
Threadpool Force_merge Rejected | The number of rejected tasks in the force merge thread pool. If this number continually grows, consider scaling your cluster. Applicable as an Elasticsearch node metric, with Maximum as the relevant statistic. | Count | Sum |
Threadpool Force_merge Threads | The size of the force merge thread pool. Applicable as an Elasticsearch node metric, with Maximum as the relevant statistic. | Count | Average |
Threadpool Index Queue | The number of queued tasks in the index thread pool. Applicable as an Elasticsearch node metric, with Maximum as the relevant statistic. | Count | Sum |
Threadpool Index Rejected | The number of rejected tasks in the index thread pool. If this number continually grows, consider scaling your cluster. Applicable as an Elasticsearch node metric, with Maximum as the relevant statistic. | Count | Sum |
Threadpool Index Threads | The size of the index thread pool. Applicable as an Elasticsearch node metric, with Maximum as the relevant statistic. | Count | Sum |
Threadpool Search Queue | The number of queued tasks in the search thread pool. Applicable as an Elasticsearch node metric, with Maximum as the relevant statistic. | Count | Sum |
Threadpool Search Rejected | The number of rejected tasks in the search thread pool. Applicable as an Elasticsearch node metric, with Maximum as the relevant statistic. | Count | Sum |
Threadpool Search Threads | The size of the search thread pool. Applicable as an Elasticsearch node metric, with Maximum as the relevant statistic. | Count | Average |
Threadpool Bulk Queue | The number of queued tasks in the bulk thread pool. If the queue size is consistently high, consider scaling your cluster. Applicable as an Elasticsearch node metric, with Maximum as the relevant statistic. | Count | Sum |
Threadpool Bulk Rejected | The number of rejected tasks in the bulk thread pool. Applicable as an Elasticsearch node metric, with Maximum as the relevant statistic. | Count | Sum |
Threadpool Bulk Threads | The size of the bulk thread pool. Applicable as an Elasticsearch node metric, with Maximum as the relevant statistic. | Count | Average |
Threadpool Write Threads | The size of the write thread pool. Applicable as an Elasticsearch node metric, with Maximum as the relevant statistic. | Count | Average |
Threadpool Write Rejected | The number of rejected tasks in the write thread pool. Applicable as an Elasticsearch node metric, with Maximum as the relevant statistic. | Count | Sum |
Threadpool Write Queue | The number of queued tasks in the write thread pool. Applicable as an Elasticsearch node metric, with Maximum as the relevant statistic. | Count | Sum |
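The counting rules described for Indexing Rate and Search Rate can be made concrete with a short sketch. It models a _bulk call as a list of action names and a _search call as the list of nodes hosting the shards it touched. Both function names and the node names are hypothetical, and the code only illustrates the arithmetic; it does not query a cluster.

```python
from collections import Counter


def count_indexing_operations(bulk_actions):
    """Count the actions in one _bulk call that contribute to Indexing Rate.

    Document deletions are excluded, as stated in the metric description.
    """
    return sum(1 for action in bulk_actions if action != "delete")


def search_operations_per_node(shard_nodes):
    """Count search operations per node for a single _search call.

    shard_nodes lists, for every shard the search touched, the node that hosts it.
    """
    return Counter(shard_nodes)


if __name__ == "__main__":
    # Two added documents and two updates count as four operations.
    print(count_indexing_operations(["index", "index", "update", "update"]))

    # One client request, but five of the touched shards live on node-1.
    per_node = search_operations_per_node(["node-1"] * 5 + ["node-2"] * 2)
    print(per_node["node-1"])
```

This is why Search Rate on a node can be much higher than the number of client requests: the metric counts per-shard work, not calls to the _search API.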
Ultra warm metrics
Attribute | Description | Unit | Statistic |
Warm CPU Utilization | The percentage of CPU usage for UltraWarm nodes in the cluster. | Percentage | Average |
Warm Free Storage Space | The amount of free warm storage space in MB. | MB | Average |
Warm JVM Memory Pressure | The maximum percentage of the Java heap used for the UltraWarm nodes. | Percentage | Maximum |
Warm Searchable Documents | The total number of searchable documents across all warm indices in the cluster. | Count | Sum |
Warm Search Latency | The average time, in milliseconds, that it takes a shard on an UltraWarm node to complete a search operation. | Milliseconds | Average |
Warm Search Rate | The total number of search requests per minute for all shards on an UltraWarm node. A single call to the _search API might return results from many different shards. | Ops/min | Average |
Warm Storage Space Utilization | The total amount of warm storage space that the cluster is using. | MB | Maximum |
Hot Storage Space Utilization | The total amount of hot storage space that the cluster is using. | MB | Maximum |
Warm Sys Memory Utilization | The percentage of the warm node's memory that is in use. | Percentage | Maximum |
Hot To Warm Migration Queue Size | The number of indices currently waiting to migrate from hot to warm storage. | Count | Maximum |
Warm To Hot Migration Queue Size | The number of indices currently waiting to migrate from warm to hot storage. | Count | Maximum |
Hot To Warm Migration Failure Count | The total number of failed hot to warm migrations. | Count | Sum |
Hot To Warm Migration Success Count | The total number of successful hot to warm migrations. | Count | Sum |
Forecast
Estimate future values of the following Elasticsearch Domain performance metrics and make informed decisions about adding capacity or scaling your AWS infrastructure.
- Deleted Documents
- CPU Utilization
- Free Storage Usage
- Cluster Used Space
- CPU Credit Balance
- Elasticsearch Requests
- Disk Queue Depth
- Read IOPS
- JVMGC Old Collection Time
- JVMGC Old Collection Count
- Sys Memory Utilization
Similarly, you can also view the forecast for the following metrics of Elasticsearch Domain Node:
- CPU Utilization
- Free Storage Space
- Cluster Used Space
- Search Rate
- Sys Memory Utilization
- JVMGC Old Collection Time
- JVMGC Old Collection Count
Elasticsearch monitoring interface
Summary
View the performance metrics of the Elasticsearch service displayed as time series charts.
Volume details
View detailed graphs of EBS volume metrics such as read/write IOPS, read/write latency, and read/write throughput.