Amazon Elastic Kubernetes Service Monitoring Integration
Amazon Elastic Kubernetes Service (Amazon EKS) enables you to easily deploy, manage, and scale containerized applications using Kubernetes on AWS. With Kubernetes you can automate the deployment, scaling, and management of containerized applications at scale.
With Site24x7's integration, you can monitor Amazon EKS at the cluster, node, and namespace levels and gain full-stack visibility into your Kubernetes environment.
- Setup and configuration
- Prerequisite
- Policy and permissions
- Cluster-level metrics
- Node-level metrics
- Namespace-level metrics
- Service-level metrics
- Pod-level metrics
- Threshold configuration
- Forecast
- Site24x7's EKS monitoring interface
Setup and Configuration
1. If you haven't already, enable access to your AWS resources between your AWS account and Site24x7's AWS account by either:
- Creating Site24x7 as an IAM user.
- Creating a cross-account IAM role. Learn more
2. On the Integrate AWS Account page, check the box next to Amazon EKS. Learn more
Prerequisite
- Install Container Insights on Amazon EKS. Learn more
Policy and Permissions
Site24x7 uses various Amazon EKS APIs to collect information about your clusters. Assign the AWS Managed policy ReadOnlyAccess to the Site24x7 entity (IAM user or IAM role) to help Site24x7 collect metrics and metadata. If you want to assign a custom policy, please make sure the following read-level actions are present in the policy JSON. Learn more
- "eks:DescribeCluster",
- "eks:ListClusters",
- "cloudwatch:ListMetrics"
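As an illustration of the custom policy option, the three actions listed above could be assembled into an IAM policy document as follows. This is a minimal sketch: the function name, the `Sid` value, and the `"Resource": "*"` scope are assumptions made for the example, not values taken from Site24x7's documentation, and your setup may require further actions.

```python
import json


def build_site24x7_eks_policy():
    """Return a custom IAM policy document holding the read-level actions
    listed above. Sid and Resource are illustrative assumptions."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "Site24x7EksReadOnly",  # hypothetical statement ID
                "Effect": "Allow",
                "Action": [
                    "eks:DescribeCluster",
                    "eks:ListClusters",
                    "cloudwatch:ListMetrics",
                ],
                "Resource": "*",  # assumption: not scoped to specific clusters
            }
        ],
    }


if __name__ == "__main__":
    # Print the JSON so it can be pasted into the IAM policy editor.
    print(json.dumps(build_site24x7_eks_policy(), indent=2))
```

The printed JSON can then be attached to the IAM user or IAM role created for Site24x7.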
Polling Frequency
Site24x7 collects metric data on your clusters, namespaces, and nodes at the configured poll frequency, which can range from one minute to one day. Learn more
Cluster-level metrics
CloudWatch metric | Description | Statistic | Unit |
---|---|---|---|
cluster_failed_node_count | Number of failed nodes in a cluster | Maximum | Nodes |
cluster_node_count | Total nodes in a cluster | Maximum | Nodes |
namespace_number_of_running_pods | Number of pods running in namespaces | Maximum | Pods |
service_number_of_running_pods | Number of pods running in services | Maximum | Pods |
node_number_of_running_pods | Number of pods running in nodes | Maximum | Pods |
node_number_of_running_containers | Number of containers running in nodes | Maximum | Containers |
node_cpu_usage_total | CPU used by all nodes | Maximum | Units |
node_cpu_limit | CPU assigned to nodes | Maximum | Units |
node_cpu_reserved_capacity | CPU reserved for nodes | Average | Percentage |
node_cpu_utilization | CPU used by nodes | Average | Percentage |
node_filesystem_utilization | File system capacity in use on nodes | Average | Percentage |
node_memory_limit | Memory assigned to nodes | Maximum | MB |
node_memory_working_set | Memory used in working sets of nodes | Average | MB |
node_memory_reserved_capacity | Memory reserved for nodes | Average | Percentage |
node_memory_utilization | Memory utilized by nodes | Average | Percentage |
node_network_total_bytes | Total network traffic in nodes | Sum | MB/sec |
pod_cpu_reserved_capacity | CPU reserved for pods | Average | Percentage |
pod_cpu_utilization | CPU utilized by pods | Average | Percentage |
pod_cpu_utilization_over_pod_limit | CPU utilized over pod limit | Average | Percentage |
pod_memory_reserved_capacity | Memory reserved for pods | Average | Percentage |
pod_memory_utilization | Memory utilized by pods | Average | Percentage |
pod_memory_utilization_over_pod_limit | Memory utilized over pod limit | Average | Percentage |
pod_network_rx_bytes | Total bytes received by pods | Sum | MB/sec |
pod_network_tx_bytes | Total bytes sent by pods | Sum | MB/sec |
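To cross-check a value shown in Site24x7, the same metric can be read directly from CloudWatch. The sketch below assumes the metrics are published by Container Insights under the `ContainerInsights` namespace with a `ClusterName` dimension, and that the credentials you run it with (your own, not Site24x7's) allow `cloudwatch:GetMetricStatistics`. The function names and default values are hypothetical.

```python
from datetime import datetime, timedelta, timezone


def build_metric_query(cluster_name, metric_name="cluster_node_count",
                       statistic="Maximum", period_seconds=300,
                       lookback_minutes=60, now=None):
    """Build keyword arguments for CloudWatch GetMetricStatistics."""
    end = now or datetime.now(timezone.utc)
    return {
        "Namespace": "ContainerInsights",
        "MetricName": metric_name,
        "Dimensions": [{"Name": "ClusterName", "Value": cluster_name}],
        "StartTime": end - timedelta(minutes=lookback_minutes),
        "EndTime": end,
        "Period": period_seconds,
        "Statistics": [statistic],
    }


def fetch_datapoints(cluster_name, **query_options):
    """Call CloudWatch and return the datapoints in chronological order.

    boto3 is imported here so the query builder works without it installed.
    """
    import boto3

    cloudwatch = boto3.client("cloudwatch")
    response = cloudwatch.get_metric_statistics(
        **build_metric_query(cluster_name, **query_options)
    )
    return sorted(response["Datapoints"], key=lambda point: point["Timestamp"])
```

For example, `fetch_datapoints("my-cluster", metric_name="cluster_failed_node_count")` would return the recent maximum failed-node counts for a cluster named `my-cluster` (a placeholder name).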
Node-level metrics
CloudWatch metric | Description | Statistic | Unit |
---|---|---|---|
node_number_of_running_pods | Number of pods running in nodes | Maximum | Pods |
node_number_of_running_containers | Number of containers running in nodes | Maximum | Containers |
node_cpu_reserved_capacity | CPU reserved for nodes | Average | Percentage |
node_cpu_utilization | CPU used by nodes | Average | Percentage |
node_filesystem_utilization | File system capacity in use on nodes | Average | Percentage |
node_memory_reserved_capacity | Memory reserved for nodes | Average | Percentage |
node_memory_utilization | Memory utilized by nodes | Average | Percentage |
node_network_total_bytes | Total network traffic in nodes | Sum | MB/sec |
Namespace-level metrics
CloudWatch metric | Description | Statistic | Unit |
---|---|---|---|
namespace_number_of_running_pods | Number of pods running in namespaces | Maximum | Pods |
pod_cpu_utilization | CPU utilized by pods | Average | Percentage |
pod_cpu_utilization_over_pod_limit | CPU utilized over pod limit | Average | Percentage |
pod_memory_utilization | Memory utilized by pods | Average | Percentage |
pod_memory_utilization_over_pod_limit | Memory utilized over pod limit | Average | Percentage |
pod_network_rx_bytes | Total bytes received by pods | Sum | MB/sec |
pod_network_tx_bytes | Total bytes sent by pods | Sum | MB/sec |
Service-level metrics
CloudWatch metric | Description | Statistic | Unit |
---|---|---|---|
service_number_of_running_pods | Number of pods running in services | Maximum | Pods |
pod_cpu_utilization | CPU utilized by pods | Average | Percentage |
pod_cpu_utilization_over_pod_limit | CPU utilized over pod limit | Average | Percentage |
pod_memory_utilization | Memory utilized by pods | Average | Percentage |
pod_memory_utilization_over_pod_limit | Memory utilized over pod limit | Average | Percentage |
pod_network_rx_bytes | Total bytes received by pods | Sum | MB/sec |
pod_network_tx_bytes | Total bytes sent by pods | Sum | MB/sec |
Pod-level metrics
CloudWatch metric | Description | Statistic | Unit |
---|---|---|---|
pod_cpu_reserved_capacity | CPU reserved for pods | Average | Percentage |
pod_cpu_utilization | CPU utilized by pods | Average | Percentage |
pod_cpu_utilization_over_pod_limit | CPU utilized over pod limit | Average | Percentage |
pod_memory_reserved_capacity | Memory reserved for pods | Average | Percentage |
pod_memory_utilization | Memory utilized by pods | Average | Percentage |
pod_memory_utilization_over_pod_limit | Memory utilized over pod limit | Average | Percentage |
pod_network_rx_bytes | Total bytes received by pods | Sum | MB/sec |
pod_network_tx_bytes | Total bytes sent by pods | Sum | MB/sec |
pod_number_of_container_restarts | Number of container restarts | Maximum | Containers |
Threshold Configuration
Go to Admin > Configuration Profiles > Threshold and Availability (+) and choose EKS Cluster, EKS Node, or EKS Namespace as the monitor type. You can set threshold values for all the metrics listed above. For EKS Namespace and EKS Node monitors, the threshold form also lets you move inactive namespaces and nodes, respectively, into maintenance.
Forecast
Estimate future values of the following Amazon EKS Cluster performance metrics and make informed decisions about adding capacity or scaling your AWS infrastructure.
- Node CPU Usage
- Node CPU Utilization
- Node Memory Utilization
- Pod CPU Utilization
- Pod Memory Utilization
You can also view the forecast for the following Amazon EKS Namespace metrics:
- Pod CPU Utilization
- Pod Memory Utilization
- Service CPU Utilization
- Service Memory Utilization
- CPU Utilization
- Memory Utilization
Forecasts are also available for the following Amazon EKS Node metrics:
- CPU Utilization per Node
- Memory Utilization per Node
- Network per Node
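To give a feel for what a forecast of such a metric involves, the sketch below extrapolates a utilization series with an ordinary least-squares line. This is only an illustration of the general idea. Site24x7's forecasting model is not described here and is likely more sophisticated, and the function name and sample values are hypothetical.

```python
def linear_forecast(values, steps_ahead):
    """Fit a least-squares line to equally spaced samples and extrapolate.

    Illustrative only; this is not Site24x7's forecasting method.
    """
    n = len(values)
    if n < 2:
        raise ValueError("need at least two samples to fit a trend")
    mean_x = (n - 1) / 2
    mean_y = sum(values) / n
    # Slope = covariance(x, y) / variance(x), with x = 0, 1, ..., n - 1.
    sxx = sum((x - mean_x) ** 2 for x in range(n))
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(values))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Predict the next `steps_ahead` points after the last sample.
    return [intercept + slope * (n - 1 + k) for k in range(1, steps_ahead + 1)]


if __name__ == "__main__":
    # Hypothetical hourly node CPU utilization samples, in percent.
    cpu_utilization = [40.0, 42.0, 44.0, 46.0]
    print(linear_forecast(cpu_utilization, steps_ahead=2))
```

A rising trend like this one is the kind of signal that would suggest adding node capacity before utilization reaches a threshold.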
Site24x7's EKS monitoring interface
Summary
Gain an overview of different events occurring within each resource with time series charts. These charts provide event timelines on CPU utilization and memory utilization at a pod and node level in percentage, total bytes sent or received, the file system capacity, and the number of running containers and pods. All time series charts have the average, minimum, and maximum values listed.
Node and Namespace Details
Here you can view a list of the nodes and namespaces associated with your Amazon EKS environment. Click an individual listing to see the performance and resource usage stats for that resource. You can also set thresholds and be notified when any of these resources fail by clicking the pencil icon under Action.
Logs
Collect EKS control plane log entries for the selected log types. The logs are fetched from CloudWatch and categorized by log stream name.
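Grouping by log stream name can be pictured with the sketch below. It assumes the stream naming that EKS typically uses for control plane logs in CloudWatch, where each stream name begins with the component name followed by an identifier. The prefix table, function name, and sample stream names are assumptions for illustration and do not describe how Site24x7 groups logs internally.

```python
# Assumed stream-name prefixes, most specific first so that audit streams
# are not mistaken for plain API server streams.
PREFIX_TO_LOG_TYPE = [
    ("kube-apiserver-audit-", "audit"),
    ("kube-apiserver-", "api"),
    ("authenticator-", "authenticator"),
    ("kube-controller-manager-", "controllerManager"),
    ("kube-scheduler-", "scheduler"),
]


def categorize_log_streams(stream_names):
    """Group CloudWatch log stream names by EKS control plane log type."""
    groups = {}
    for name in stream_names:
        log_type = next(
            (kind for prefix, kind in PREFIX_TO_LOG_TYPE if name.startswith(prefix)),
            "other",  # anything that matches no known prefix
        )
        groups.setdefault(log_type, []).append(name)
    return groups


if __name__ == "__main__":
    # Hypothetical stream names for demonstration.
    sample = [
        "kube-apiserver-audit-0a1b2c",
        "kube-apiserver-0a1b2c",
        "kube-scheduler-0a1b2c",
    ]
    print(categorize_log_streams(sample))
```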
Configuration
The configuration details of an EKS cluster are provided under this tab, including the resource name, endpoint URL, region, status, security groups, subnets, VPC ID, and public/private access status, among others.