Kubernetes API server monitoring
Gain deep insights into your Kubernetes cluster's control plane with Site24x7’s Kubernetes API Server Monitoring. Track critical metrics of your Kubernetes API server, the control center of your containerized environment, and the health and performance of nodes, pods, and workloads, ensuring optimal operation of your clusters.
With detailed metrics such as request handling, response sizes, resource usage, webhook activities, and authentication attempts, you can:
- Identify bottlenecks in API server performance.
- Detect security risks like deprecated APIs and insecure TLS connections.
- Optimize resource allocation by analyzing workloads and storage.
Supported versions:
This feature is supported from Linux server monitoring agent version 19.9.0 onward.
Control plane monitoring and other recent features require you to upgrade your Kubernetes agent to the latest version.
If you haven't added a Kubernetes monitor yet, follow these steps to add one.
Adding an API server monitor
To enable the Kubernetes API server monitor:
- Log in to your Site24x7 account.
- Navigate to K8s, click the hamburger icon next to the cluster monitor name, and select Edit.
- On the Edit page, under the Filter Resources section, click the drop-down next to Resource Types and select Kubernetes API Server.
- Click Save.
After you save, the Site24x7 Kubernetes API server monitor fetches all the API server metrics. To view them, navigate to K8s, select the cluster, and open API Servers.
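If you want to cross-check the values shown in the console, the Kubernetes API server exposes its raw metrics in Prometheus text format (for example, through `kubectl get --raw /metrics`). The following is a minimal Python sketch, not Site24x7's implementation, of how a counter such as the upstream `apiserver_request_total` could be summed by one label to get "grouped by verb" or "grouped by code" totals. The sample text and the helper name `sum_by_label` are illustrative assumptions.

```python
import re
from collections import defaultdict

# Hypothetical sample in Prometheus text exposition format (made-up values).
SAMPLE = '''\
# HELP apiserver_request_total Counter of apiserver requests.
# TYPE apiserver_request_total counter
apiserver_request_total{code="200",resource="pods",verb="GET"} 120
apiserver_request_total{code="200",resource="pods",verb="LIST"} 30
apiserver_request_total{code="404",resource="pods",verb="GET"} 5
apiserver_request_total{code="201",resource="deployments",verb="POST"} 7
'''

# Simplified patterns: they assume label values contain no '}' or escaped quotes.
LINE_RE = re.compile(
    r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)\{(?P<labels>[^}]*)\}\s+(?P<value>\S+)'
)
LABEL_RE = re.compile(r'(\w+)="([^"]*)"')


def sum_by_label(text, metric, label):
    """Sum all samples of `metric`, grouped by the value of `label`."""
    totals = defaultdict(float)
    for line in text.splitlines():
        if line.startswith('#'):
            continue  # skip HELP/TYPE comment lines
        match = LINE_RE.match(line)
        if not match or match.group('name') != metric:
            continue
        labels = dict(LABEL_RE.findall(match.group('labels')))
        totals[labels.get(label, '')] += float(match.group('value'))
    return dict(totals)


by_verb = sum_by_label(SAMPLE, "apiserver_request_total", "verb")
by_code = sum_by_label(SAMPLE, "apiserver_request_total", "code")
print(by_verb)
print(by_code)
```

Note that these are cumulative counter values; the figures in the console are reported per poll interval, so they will not match raw counter readings directly.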
Supported metrics:
| Metric | Description | Unit | 
| Audit Events Generated | The number of audit events generated and sent to the audit backend during the last poll interval | Count | 
| Audit Requests Rejected | The number of API server requests rejected due to an error in the audit logging backend during the last poll interval | Count | 
| Current Inqueue Requests | The maximum number of queued requests in this API server during the last poll interval | Count | 
| Kube Aggregator X509 Insecure SHA1 | The number of requests to servers with insecure SHA1 signatures in their serving certificate or the number of connection failures due to the insecure SHA1 signatures (either/or, based on the runtime environment) during the last poll interval | Count | 
| Webhooks X509 Insecure SHA1 | The number of requests to servers with insecure SHA1 signatures in their serving certificate or the number of connection failures due to the insecure SHA1 signatures (either/or, based on the runtime environment) during the last poll interval | Count | 
| Aborted Requests | The number of requests that the API server aborted during the last poll interval, possibly due to a timeout | Count | 
| Deprecated API Requests | The number of deprecated APIs that have been requested during the last poll interval | Count | 
| TLS Handshake Errors | The number of requests dropped due to a TLS handshake error during the last poll interval | Count | 
| Average Webhook Admission Duration | The average duration of admission review requests handled by the admission webhooks during the last poll interval | Seconds | 
| Webhook Admission Requests | The total number of admission review requests handled by the admission webhooks during the last poll interval | Count | 
| Total Webhook Admission Duration | The total time taken for admission review requests handled by the admission webhooks during the last poll interval | Seconds | 
| Average Webhook Controller Duration | The average duration of admission review requests handled by the admission controller during the last poll interval | Seconds | 
| Webhook Controller Requests | The total number of admission review requests handled by the admission controller during the last poll interval | Count | 
| Total Webhook Controller Duration | The total time taken for admission review requests handled by the admission controller during the last poll interval | Seconds | 
| Average Etcd Duration | The average time per request spent on requests made to the etcd server during the last poll interval | Seconds | 
| Etcd Requests | The total number of requests made to the etcd server during the last poll interval | Count | 
| Total Etcd Duration | The total latency of requests made to the etcd server during the last poll interval | Seconds | 
| Process Resident Memory | The resident memory used by the API server process during the last poll interval | Bytes | 
| Process CPU Time | The CPU time consumed by the API server process during the last poll interval | Seconds | 
| Process Open File Descriptors | The number of file descriptors opened by the API server process during the last poll interval | Count | 
| Process Virtual Memory | The virtual memory used by the API server process during the last poll interval | Bytes | 
| Go Threads | The number of OS threads created by the Go runtime of the API server process during the last poll interval | Count | 
| Go Routines | The number of goroutines that exist for the API server process during the last poll interval | Count | 
| Request Count | The total number of requests to the Kubernetes API server during the last poll interval | Count | 
| Current Inflight Requests | The maximum number of in-flight requests counted against the API server's in-flight request limit during the last poll interval | Count | 
| Average Response Size | The average size of responses sent by the Kubernetes API server per request during the last poll interval | Bytes | 
| Response Count | The total number of requests that the API server responded to during the last poll interval | Count | 
| Total Responses Size | The total size of responses sent by the Kubernetes API server for all the requests during the last poll interval | Bytes | 
| Storage Objects | The total number of objects stored in the API server's underlying storage during the last poll interval | Count | 
| Average Request Duration | The average duration of HTTP requests handled by the API server during the last poll interval | Seconds | 
| Requests Count | The total number of HTTP requests handled by the API server during the last poll interval | Count | 
| Total Requests Duration | The total duration of HTTP requests handled by the API server during the last poll interval | Seconds | 
| Storage Database File Size | The total size of the storage database file physically allocated that is used by the API server during the last poll interval | Bytes | 
| Resource Name | The name of the resource | Text | 
| Average Request Duration | The average duration of HTTP requests handled by the API server during the last poll interval, grouped by resource | Seconds | 
| Requests Handled | The total number of HTTP requests handled by the API server during the last poll interval grouped by resource | Count | 
| Total Requests Duration | The total duration of HTTP requests handled by the API server during the last poll interval grouped by resource | Seconds | 
| Average Response Size | The average size of responses per request sent by the API server during the last poll interval, grouped by resource | Bytes | 
| Response Count | The total number of requests that the API server responded to during the last poll interval, grouped by resource | Count | 
| Total Responses Size | The total size of responses sent by the API server for all the requests during the last poll interval grouped by resource | Bytes | 
| Total Requests | The number of requests to the API server grouped by resource during the last poll interval | Count | 
| Response Code | The HTTP response code returned for the request | Number represented as text (no unit) | 
| Total Requests | The number of requests to the API server grouped by code during the last poll interval | Count | 
| Total Rest Client Requests | The total number of HTTP requests from your API server to external services or APIs grouped by code during the last poll interval | Count | 
| Verb | The verb action of the request | Text | 
| Total Requests | The number of requests to the API server grouped by verb during the last poll interval | Count | 
| Total Rest Client Requests | The total number of HTTP requests from your API server to external services or APIs grouped by verb during the last poll interval | Count | 
| Host | The hostname of the service | Text | 
| Total Rest Client Requests | The total number of HTTP requests from your API server to external services or APIs grouped by hostname during the last poll interval | Count | 
| Resource Name | The name of the resource | Text | 
| Storage Objects | The total number of objects stored in the API server's underlying storage grouped by resource during the last poll interval | Count | 
| Resource Name | The name of the resource | Text | 
| Verb | The name of the verb | Text | 
| Active Long-Running Requests | The number of all active long-running API server requests grouped by combination of resource and verb during the last poll interval | Count | 
| Service Name | The name of the gRPC service that is being called for the request | Text | 
| Methods | The name of the gRPC method that is being invoked | Text | 
| Total Requests | The total number of gRPC requests completed by the client for the service and method combination during the last poll interval | Count | 
| Code | The name of the code (final status of the gRPC request) | Text | 
| Total Requests | The total number of gRPC requests completed by the client with the given code during the last poll interval | Count | 
| Name | The name of the API feature | Text | 
| StageName | The stage name of the API feature | Text | 
| Feature Status | Whether the API feature is enabled (1) or disabled (0) | Text | 
| Resource Name | The name of the action or task workqueue | Text | 
| Total Workqueue Adds | The total number of adds handled by the workqueue during the last poll interval, grouped by action name | Count | 
| Workqueue Depth | The number of actions or tasks in the workqueue waiting to be processed during the last poll interval, grouped by action name | Count | 
| Authentication Duration (Success), Authentication Duration (Failed) | The total duration spent for authentication grouped by result during the last poll interval | Seconds | 
| Authentication Attempts (Success), Authentication Attempts (Failed) | The total number of authentication attempts made grouped by result during the last poll interval | Count | 
| Average Authentication Duration (Success), Average Authentication Duration (Failed) | The average time taken to authenticate per request made grouped by result during the last poll interval | Seconds | 
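Many of the metrics above come in Total / Count / Average triples reported per poll interval. Assuming the underlying Prometheus series are cumulative (as the Kubernetes `_sum` and `_count` series of a histogram such as `etcd_request_duration_seconds` are), one plausible way to derive such values is to subtract the previous poll's reading from the current one and then divide. The sketch below illustrates that arithmetic with made-up numbers; the names `Sample` and `interval_stats` are hypothetical, and this is not necessarily how the Site24x7 agent computes its values.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    """Cumulative readings taken at one poll (hypothetical structure)."""
    duration_sum: float   # e.g. etcd_request_duration_seconds_sum
    request_count: float  # e.g. etcd_request_duration_seconds_count


def interval_stats(prev, curr):
    """Derive per-interval total, count, and average from two cumulative polls."""
    count = curr.request_count - prev.request_count
    total = curr.duration_sum - prev.duration_sum
    if count < 0 or total < 0:
        # Assumption: a decrease means the counters were reset (for example,
        # the process restarted), so the current reading is used as the delta.
        count, total = curr.request_count, curr.duration_sum
    average = total / count if count else 0.0
    return {
        "total_duration": total,      # seconds spent in this interval
        "requests": count,            # requests completed in this interval
        "average_duration": average,  # seconds per request in this interval
    }


# Made-up readings from two consecutive polls.
prev = Sample(duration_sum=12.5, request_count=1000)
curr = Sample(duration_sum=14.0, request_count=1300)
stats = interval_stats(prev, curr)
print(stats)
```

With these sample readings, 300 requests took 1.5 seconds in total during the interval, which works out to an average of 0.005 seconds per request.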
