Monitoring linux server performance: Site24x7

System administrators often receive complaints related to poor server performance, and it can be difficult to troubleshoot these issues as they are subjective. An end user's slow server experience may be caused by a process that is running and consuming more resources than usual or there might actually be something wrong with the server.

Whatever the reason may be, it is important to regularly monitor server performance to:

Ensure smooth and error-free functioning of the server
Understand the server system’s resource usage
Identify problems before they impact end users, even a small number of users
Get an indication of the server’s responsiveness and availability
Identify processes that have suddenly started consuming more resources than usual, which may lead to an overloaded server
Identify security threats

In this article, we’ll discuss various metrics used to monitor server performance and troubleshoot performance issues.

Server performance metrics

Several metrics can help monitor the three pillars of any server—CPU, RAM, and Disk IO—to get an idea of how your Linux server is performing. Some of these metrics are:

CPU utilization: Percentage of CPU time to perform a task
Load average: Number of processes running on the CPU over a while
Uptime: Amount of time passed since the last reboot of the system
Memory utilization: Memory percentage utilized
Disk utilization: Percentage of the total disk space used
I/O wait time: Percentage of the CPU time spent waiting for I/O operations

Effects of poor Linux server performance

Various performance issues occur in different operating systems or services, and every issue requires a unique approach to troubleshooting. Most issues are caused due to CPU, disk, memory, networking, and IO.

Each area yields different symptoms and needs another diagnosis and solution to mitigate server slowness.

For instance, if the server is slow or outdated, the system will not be able to keep up with the requirements for modern applications. This can lead to end users experiencing an increase in the response time, making them frustrated and decreasing their confidence in your application.

Database queries could also take too long to retrieve data, leading to poor application performance. A sluggish server can additionally directly impact the CPU’s capability to be utilized.

Lastly, poor server performance may sometimes lead to downtime, although the probability of this occurring is low.

Causes of slow-running Linux servers

Overuse of system load is probably one of the most common causes of system slowness.

CPU-bounded system load can create issues due to processes waiting for CPU resources, whereas RAM-bounded system load can lead to high I/O wait times since the system starts using a swap in the server when it runs out of RAM. Meanwhile, IO-bounded system load can cause slowness since processes compete for disk or network I/O resources. High CPU user time can contribute to high system load as well.

Additional factors that can lead to a slower server include:

Unnecessary services started at boot time
Abnormal memory volumes due to multiple heavy-use services left open
Failure to update the operating system (OS) and software
Detected and undetected viruses
Network bottlenecks, excessive network traffic, and database traffic

Troubleshooting Linux server performance

Linux offers various commands to help you troubleshoot system performance. These commands help to monitor various components of your system, such as memory, CPU, and IO. The three most commonly used commands are top, vmstat, and iostat commands.

`top` command

The top command is the most commonly used command for displaying the overall view of a system. It summarizes the various CPU-level metrics discussed in the above sections, including uptime, load averages, and I/O wait time. In addition, it displays a running list of processes or tasks currently being managed by the kernel.

Syntax:

top [options]

Example:

Running the top command will display an output similar to that shown below:

Fig 1: Output of top command

As you can see in the above output, the first section displays the real-time and dynamic summary of the system.

The first line displays the uptime along with the system load averages over the last one, five, and 15-minute periods. If the load average is higher than the normal load value, i.e., 0.7 for one and five-minute intervals, it can be ignored. When the load average remains consistently higher than 0.7 for a longer duration, it is a clear indication that the server is overloaded.

The second line displays various task-level information, including active, running, and sleeping tasks.

The third line, labeled %CPU(s), displays the various CPU-level statistics. Some metrics to monitor here are:

us: Percentage of CPU time for user processes
sy: Percentage of CPU time for system processes
id: Percentage of CPU time spent idle
wa: Percentage of CPU time spent waiting for I/O operations to conclude

The fourth and fifth lines display data about the system memory, while the final section displays the dynamic and real-time list of processes. You can sort this list based on the different columns shown. For example, pressing P will sort the processes based on the %CPU column.

`vmstat` command

The next important command on the list is vmstat, which stands for virtual memory statistics. It is a performance monitoring tool provided by Linux that displays various statistics about the different components of the system, including memory, IO, and CPU.

By default, this command displays reports since the system was last rebooted. However, you can also view real-time statistics, which are updated after an interval specified by you.

Syntax:

vmstat [options][delay [count]]

The two parameters you can use to view real-time reports are:

delay: To continuously update the reports after the specified delay in seconds
count: To define the number of updates wanted; the default value is infinite

Example:

Running the vmstat command will give you an output similar to the following:

Fig 2: Output of vmstat command

The output is divided into five sections.

The memory section displays the following information:

swpd: Total amount of memory being used as virtual memory
free: Total free/idle memory (in kilobytes) available
buff: Memory being temporarily used as a data buffer
cache: Total cache memory

The swap section displays the following information:

si: Memory (in kilobytes) swapped in from disk per second
so: Memory (in kilobytes) swapped out to disk per second

The io section displays the following information:

bi: The number of blocks being received from a block device per second
bo: The number of blocks being sent to a block device per second

The system section displays the following information:

in: The number of interrupts per second, including the clock, since the last system boot
cs: The number of context switches per second when the kernel swaps to user mode processing from system mode processing

The cpu section displays data similar to that found in the %Cpu(s) line of the top command output.

`iostat` command

iostat is an important and widely used command to monitor your system’s input/output statistics. While commands like top or vmstat may help you discover that the server is experiencing a high I/O wait time, you should also know which devices/partitions are under load and causing the high I/O wait time.

The iostat command does just this and generates device-level reports. Besides, it also yields CPU-level statistics similar to those provided by the top and vmstat commands. Just like vmstat, the output produced is, by default, for the time since the system was last rebooted, but you may view real-time data by providing a couple of extra parameters.

Syntax:

iostat [option] [interval] [count]

The interval and count parameters here are similar to that in the vmstat command and are used to show the real-time updates infinitely or for specific counts.

Example:

Running the iostat command will give you an output similar to the one shown below:

Fig 3: Output of iostat command

As you can see in the above output, the first line displays the CPU report:

%user: The CPU utilization occurred while executing at user level
%nice: The CPU utilization occurred while executing at user level with a nice priority
%system: The CPU utilization occurred while executing at system (kernel) level
%iowait: The percentage of time the CPU spent idle while the system was waiting for an I/O request to conclude
%steal: The percentage of time the CPU spent waiting while the hypervisor was servicing another virtual processor
%idle: The percentage of time spent idle by CPU and there were no outstanding I/O requests as well

The next section shows the following statistics for all the devices/partitions in the system:

Device: Partition/device name
tps: Transfer per second—a higher tps represents a busy processor and may impact server performance
Blk_read/s (kB_read/s): The number of blocks of data in kilobytes read from the device per second
Blk_wrtn/s (kB_wrtn/s): The number of blocks of data in kilobytes written to the device per second
Blk_read (kB_read): Total number of blocks in kilobytes read from this device since the last reboot
Blk_wrtn (kB_wrtn): Total number of blocks in kilobytes written to this device since the last reboot
Blk_dscd (kB_dscd): Total number of blocks in kilobytes discarded for this device since the last reboot
Blk_dscd/s (kB_dscd): Total number of blocks in kilobytes discarded for this device per second

A very high Blk_wrtn/s or Blk_read/s for a particular device or partition should be investigated, as it may be increasing load on the CPU and hence decreasing the server performance.

Conclusion

Server performance is crucial, as it can directly impact an application’s user base. Monitoring performance metrics helps identify and fix any issues in the server before it could affect your application.

In this article we’ve mentioned major commands—top, vmstat, and iostat—that can be used to narrow down what’s causing system slowness. The methods described here can be used to troubleshoot and fix permanence issues.

Was this article helpful?

Sorry to hear that. Let us know how we can improve the article.

Previous Resolve process network usage issues

Next Understanding CPU run queue length in Linux

Monitor Linux server performance

Server performance metrics

Effects of poor Linux server performance

Causes of slow-running Linux servers

Troubleshooting Linux server performance

`top` command

`vmstat` command

`iostat` command

Conclusion

Write For Us

Monitor Linux server performance

Server performance metrics

Effects of poor Linux server performance

Causes of slow-running Linux servers

Troubleshooting Linux server performance

top command

vmstat command

iostat command

Conclusion

Write For Us

`top` command

`vmstat` command

`iostat` command