Amazon OpenSearch Service Monitoring Integration

Amazon OpenSearch Service (previously Amazon Elasticsearch Service) makes it easy to deploy and operate OpenSearch for log analytics, data search, and more. By monitoring Amazon OpenSearch Service with Site24x7, you can oversee operational aspects of your domains, such as performance optimization.

Setup and configuration

If you haven't done so already, enable access to your AWS resources by creating Site24x7 as an IAM user or by creating a cross-account IAM role between your account and Site24x7's AWS account. Learn more.
Next, on the Integrate AWS Account page, make sure the OpenSearch checkbox is selected in the Services to be discovered field. Learn more.

Policies and permissions

Make sure the following read-level actions are present in the IAM policy assigned to the Site24x7 entity. Learn more.

"es:DescribeElasticsearchDomain",
"es:ListDomainNames",
"es:ListTags",
"logs:DescribeLogStreams",
"logs:GetLogEvents",
"es:DescribePackages"
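
As an illustration, the actions above could be assembled into an IAM policy document as in the following minimal Python sketch. The helper name build_policy and the wildcard Resource are assumptions made for this example, not something prescribed by Site24x7; scope the resource down to suit your account.

```python
import json

# Read-level actions listed in this section.
SITE24X7_OPENSEARCH_ACTIONS = [
    "es:DescribeElasticsearchDomain",
    "es:ListDomainNames",
    "es:ListTags",
    "logs:DescribeLogStreams",
    "logs:GetLogEvents",
    "es:DescribePackages",
]


def build_policy(actions):
    """Return an IAM policy document (as a dict) that allows the given actions."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": list(actions),
                # Assumption: "*" keeps the sketch short; restrict as needed.
                "Resource": "*",
            }
        ],
    }


policy_json = json.dumps(build_policy(SITE24X7_OPENSEARCH_ACTIONS), indent=2)
print(policy_json)
```

The printed JSON can be pasted into the IAM console policy editor or merged into the policy already attached to the Site24x7 user or role.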

Polling frequency

Site24x7 queries the AWS service-level APIs and CloudWatch APIs at the configured polling frequency (from 1 minute to 1 day) to collect performance metrics. Learn more.
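
To give a feel for what one poll might request, the sketch below builds the parameters for a single CloudWatch GetMetricStatistics call covering one polling interval. It is an illustration only and does not describe Site24x7's internal implementation; the function name build_metric_query, the domain name, and the account ID are hypothetical. The AWS/ES namespace and the DomainName and ClientId dimensions are the ones CloudWatch uses for OpenSearch Service domains.

```python
from datetime import datetime, timedelta, timezone


def build_metric_query(domain_name, account_id, metric_name, statistic,
                       poll_minutes, now=None):
    """Build GetMetricStatistics parameters for the most recent poll window."""
    now = now or datetime.now(timezone.utc)
    period = poll_minutes * 60  # CloudWatch expects the period in seconds
    return {
        "Namespace": "AWS/ES",
        "MetricName": metric_name,
        "Dimensions": [
            {"Name": "DomainName", "Value": domain_name},
            {"Name": "ClientId", "Value": account_id},
        ],
        "StartTime": now - timedelta(seconds=period),
        "EndTime": now,
        "Period": period,
        "Statistics": [statistic],
    }


# Hypothetical domain and account ID, used for illustration only.
params = build_metric_query("my-domain", "123456789012",
                            "CPUUtilization", "Average", poll_minutes=5)
print(params["Period"])
```

With boto3 installed and credentials configured, these parameters could be passed as boto3.client("cloudwatch").get_metric_statistics(**params).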

Threshold configuration

Go to Admin > Configuration Profiles > Threshold and Availability (+) > choose the monitor type. You can set threshold values for all the applicable metrics. Further, you can choose to mute inactive alerts in the threshold form for OpenSearch nodes.

Supported metrics

Attribute	Description	Unit	Statistic
Cluster Status	Green - Indicates that all index shards are allocated to nodes in the cluster. Yellow - Indicates that the primary shards for all indices are allocated to nodes in the cluster, but the replica shards for at least one index are not. Red - Indicates that the primary and replica shards of at least one index are not allocated to nodes in the cluster. Learn more here.	State	Minimum
CPU Utilization	The percentage of CPU resources used for data nodes in the cluster. Applicable as an OpenSearch node metric with a relevant statistic as Maximum.	Percentage	Average
Storage	The free space and used space in GB, for nodes in the cluster.	GB	Sum, Maximum
Nodes	The number of nodes in the Amazon OpenSearch cluster, including dedicated master nodes.	Count	Minimum
Documents	Searchable documents - The total number of searchable documents across all indices in the cluster. Deleted documents - The total number of documents across all indices in the cluster that are marked for deletion and do not appear in search results.	Count	Maximum
Cluster Index Writes Blocked	Indicates whether the cluster is accepting or blocking incoming write requests. 0 - cluster is accepting requests, 1 - cluster is blocking requests.	State	Maximum
JVM Memory Pressure	The percentage of the Java heap used for all data nodes in the cluster.	Percentage	Maximum
Automated snapshot failure	The number of failed automated snapshots for the cluster.	Count	Maximum
CPU Credit Balance	The remaining CPU credits available for data nodes in the cluster.	Count	Minimum
OpenSearchDashboardsHealthyNodes (previously KibanaHealthyNodes)	A health check for Kibana. 1 - normal behavior, 0 - Kibana is inaccessible.	State	Minimum
KMS Key Error	KMS customer master key used to encrypt data at rest has been disabled.	State	Maximum
KMS Key Inaccessible	The KMS customer master key used to encrypt data at rest has been deleted, or its grants to Amazon ES have been revoked.	State	Maximum
Invalid Host Header Requests	The number of HTTP requests made to the OpenSearch cluster that included an invalid (or missing) host header.	Count	Sum
Elasticsearch Requests	The number of requests made to the OpenSearch cluster.	Count	Sum
Request Count	The number of requests to a domain and the HTTP response code (2xx, 3xx, 4xx, 5xx) for each request.	Count	Sum
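
The Cluster Status colors described in the table can be read as a simple decision rule. The sketch below is a simplified illustration, not the logic OpenSearch itself runs: the per-index dictionaries and their keys (primaries_allocated, replicas_allocated) are hypothetical, and red is treated here as "at least one index has an unallocated primary shard".

```python
def cluster_status(indices):
    """Derive a cluster color from per-index shard allocation flags."""
    # Red: at least one index has primary shards that are not allocated.
    if any(not ix["primaries_allocated"] for ix in indices):
        return "red"
    # Yellow: all primaries are allocated, but some replicas are not.
    if any(not ix["replicas_allocated"] for ix in indices):
        return "yellow"
    # Green: every shard of every index is allocated.
    return "green"


# Hypothetical index states, for illustration only.
indices = [
    {"name": "logs-2024", "primaries_allocated": True, "replicas_allocated": True},
    {"name": "metrics", "primaries_allocated": True, "replicas_allocated": False},
]
print(cluster_status(indices))
```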

EBS volume metrics

Attribute	Description	Unit	Statistic
Read Latency	The latency, in seconds, for read operations on EBS volumes.	Seconds	Average
Write Latency	The latency, in seconds, for write operations on EBS volumes.	Seconds	Average
Read Throughput	The throughput, in bytes per second, for read operations on EBS volumes.	MB/sec	Average
Write Throughput	The throughput, in bytes per second, for write operations on EBS volumes.	MB/sec	Average
Disk Queue Depth	The number of pending input and output (I/O) requests for an EBS volume.	Count	Maximum
Read IOPS	The number of input and output (I/O) operations per second for read operations on EBS volumes.	Count/sec	Average
Write IOPS	The number of input and output (I/O) operations per second for write operations on EBS volumes.	Count/sec	Average

Dedicated master node metrics

Attribute	Description	Unit	Statistic
Master CPU Utilization	The maximum percentage of CPU resources used by the dedicated master nodes.	Percentage	Average
Master Free Storage Space	Free storage space for master node. Applicable as an OpenSearch node metric.	MB	Average
Master JVM Memory Pressure	The maximum percentage of the Java heap used for all dedicated master nodes in the cluster.	Percentage	Maximum
Master CPU Credit Balance	The CPU credits available for dedicated master nodes in the cluster.	Count	Minimum
Master Reachable From Node	A health check for MasterNotDiscovered exceptions. A value of 1 indicates normal behavior. A value of 0 indicates that cluster health is failing.	Count	Sum
Master Sys Memory Utilization	The percentage of the master node's memory that is in use.	Percentage	Maximum

Instance metrics

Attribute	Description	Unit	Statistic
Indexing Latency	The average time, in milliseconds, that it takes a shard to complete an indexing operation. Applicable as an OpenSearch node metric.	Milliseconds	Average
Indexing Rate	The number of indexing operations per minute. A single call to the _bulk API that adds two documents and updates two counts as four operations, which might be spread across one or more nodes. If that index has one or more replicas, other nodes in the cluster also record a total of four indexing operations. Document deletions do not count towards this metric. Applicable as an OpenSearch node metric.	Ops/min	Average
Search Latency	The average time, in milliseconds, that it takes a shard on a data node to complete a search operation. Applicable as an OpenSearch node metric.	Milliseconds	Average
Search Rate	The total number of search requests per minute for all shards on a data node. A single call to the _search API might return results from many different shards. If five of these shards are on one node, the node would report 5 for this metric, even though the client only made one request. Applicable as an OpenSearch node metric.	Ops/min	Average
Sys Memory Utilization	The percentage of the instance's memory that is in use. Applicable as an OpenSearch node metric with a relevant statistic as Maximum.	Percentage	Maximum
JVMGC Young Collection Count	The number of times that "young generation" garbage collection has run. A large, ever-growing number of runs is a normal part of cluster operations. Applicable as an OpenSearch node metric with a relevant statistic as Maximum.	Count	Sum
JVMGC Young Collection Time	The amount of time, in milliseconds, that the cluster has spent performing "young generation" garbage collection. Applicable as an OpenSearch node metric with a relevant statistic as Maximum.	Milliseconds	Average
JVMGC Old Collection Count	The number of times that "old generation" garbage collection has run. In a cluster with sufficient resources, this number should remain small and grow infrequently. Applicable as an OpenSearch node metric with a relevant statistic as Maximum.	Count	Sum
JVMGC Old Collection Time	The amount of time, in milliseconds, that the cluster has spent performing "old generation" garbage collection. Applicable as an OpenSearch node metric with a relevant statistic as Maximum.	Milliseconds	Average
Threadpool Force_merge Queue	The number of queued tasks in the force merge thread pool. Applicable as an OpenSearch node metric with a relevant statistic as Maximum.	Count	Sum
Threadpool Force_merge Rejected	The number of rejected tasks in the force merge thread pool. If this number continually grows, consider scaling your cluster. Applicable as an OpenSearch node metric with a relevant statistic as Maximum.	Count	Sum
Threadpool Force_merge Threads	The size of the force merge thread pool. Applicable as an OpenSearch node metric with a relevant statistic as Maximum.	Count	Average
Threadpool Index Queue	The number of queued tasks in the index thread pool. Applicable as an OpenSearch node metric with a relevant statistic as Maximum.	Count	Sum
Threadpool Index Rejected	The number of rejected tasks in the index thread pool. If this number continually grows, consider scaling your cluster. Applicable as an OpenSearch node metric with a relevant statistic as Maximum.	Count	Sum
Threadpool Index Threads	The size of the index thread pool. Applicable as an OpenSearch node metric with a relevant statistic as Maximum.	Count	Sum
Threadpool Search Queue	The number of queued tasks in the search thread pool. Applicable as an OpenSearch node metric with a relevant statistic as Maximum.	Count	Sum
Threadpool Search Rejected	The number of rejected tasks in the search thread pool. Applicable as an OpenSearch node metric with a relevant statistic as Maximum.	Count	Sum
Threadpool Search Threads	The size of the search thread pool. Applicable as an OpenSearch node metric with a relevant statistic as Maximum.	Count	Average
Threadpool Bulk Queue	The number of queued tasks in the bulk thread pool. If the queue size is consistently high, consider scaling your cluster. Applicable as an OpenSearch node metric with a relevant statistic as Maximum.	Count	Sum
Threadpool Bulk Rejected	The number of rejected tasks in the bulk thread pool. Applicable as an OpenSearch node metric with a relevant statistic as Maximum.	Count	Sum
Threadpool Bulk Threads	The size of the bulk thread pool. Applicable as an OpenSearch node metric with a relevant statistic as Maximum.	Count	Average
Threadpool Write Threads	The size of the write thread pool. Applicable as an OpenSearch node metric with a relevant statistic as Maximum.	Count	Average
Threadpool Write Rejected	The number of rejected tasks in the write thread pool. Applicable as an OpenSearch node metric with a relevant statistic as Maximum.	Count	Sum
Threadpool Write Queue	The number of queued tasks in the write thread pool. Applicable as an OpenSearch node metric with a relevant statistic as Maximum.	Count	Sum
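
The counting rules described for Indexing Rate and Search Rate can be worked through with a small Python sketch. The function names and the shard-to-node mapping are hypothetical and exist only to reproduce the arithmetic from the table: a _bulk call with two adds and two updates counts as four indexing operations (deletions excluded), and a node holding five of the shards touched by one _search request reports 5.

```python
from collections import Counter


def indexing_ops_for_bulk(actions):
    """Count indexing operations in one _bulk call; deletions are excluded."""
    return sum(1 for action in actions if action in ("index", "create", "update"))


def search_ops_per_node(shard_to_node):
    """For one _search request, each queried shard adds one to its node's count."""
    return Counter(shard_to_node.values())


# Two documents added, two updated, one deleted in a single _bulk call.
bulk_actions = ["index", "index", "update", "update", "delete"]
print(indexing_ops_for_bulk(bulk_actions))

# Hypothetical placement: one search touches seven shards, five on node-a.
placement = {0: "node-a", 1: "node-a", 2: "node-a", 3: "node-a",
             4: "node-a", 5: "node-b", 6: "node-b"}
print(search_ops_per_node(placement)["node-a"])
```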

UltraWarm metrics

Attribute	Description	Unit	Statistic
Warm CPU Utilization	The percentage of CPU usage for UltraWarm nodes in the cluster.	Percentage	Average
Warm Free Storage Space	The amount of free warm storage space in MB.	MB	Average
Warm JVM Memory Pressure	The maximum percentage of the Java heap used for the UltraWarm nodes.	Percentage	Maximum
Warm Searchable Documents	The total number of searchable documents across all warm indices in the cluster.	Count	Sum
Warm Search Latency	The average time, in milliseconds, that it takes a shard on an UltraWarm node to complete a search operation.	Milliseconds	Average
Warm Search Rate	The total number of search requests per minute for all shards on an UltraWarm node. A single call to the _search API might return results from many different shards.	Ops/min	Average
Warm Storage Space Utilization	The total amount of warm storage space that the cluster is using.	MB	Maximum
Hot Storage Space Utilization	The total amount of hot storage space that the cluster is using.	MB	Maximum
Warm Sys Memory Utilization	The percentage of the warm node's memory that is in use.	Percentage	Maximum
Hot To Warm Migration Queue Size	The number of indices currently waiting to migrate from hot to warm storage.	Count	Maximum
Warm To Hot Migration Queue Size	The number of indices currently waiting to migrate from warm to hot storage.	Count	Maximum
Hot To Warm Migration Failure Count	The total number of failed hot to warm migrations.	Count	Sum
Hot To Warm Migration Success Count	The total number of successful hot to warm migrations.	Count	Sum

Forecast

Estimate future values of the following OpenSearch Domain performance metrics and make informed decisions about adding capacity or scaling your AWS infrastructure.