Monitor Linux server performance

System administrators often receive complaints related to poor server performance, and it can be difficult to troubleshoot these issues as they are subjective. An end user's slow server experience may be caused by a process that is running and consuming more resources than usual or there might actually be something wrong with the server.

Whatever the reason may be, it is important to regularly monitor server performance to:

  • Ensure smooth and error-free functioning of the server
  • Understand the server system’s resource usage
  • Identify problems before they impact end users, even a small number of users
  • Get an indication of the server’s responsiveness and availability
  • Identify processes that have suddenly started consuming more resources than usual, which may lead to an overloaded server
  • Identify security threats

In this article, we’ll discuss various metrics used to monitor server performance and troubleshoot performance issues.

Server performance metrics

Several metrics can help monitor the three pillars of any server—CPU, RAM, and Disk IO—to get an idea of how your Linux server is performing. Some of these metrics are:

  • CPU utilization: Percentage of CPU time to perform a task
  • Load average: Number of processes running on the CPU over a while
  • Uptime: Amount of time passed since the last reboot of the system
  • Memory utilization: Memory percentage utilized
  • Disk utilization: Percentage of the total disk space used
  • I/O wait time: Percentage of the CPU time spent waiting for I/O operations

Effects of poor Linux server performance

Various performance issues occur in different operating systems or services, and every issue requires a unique approach to troubleshooting. Most issues are caused due to CPU, disk, memory, networking, and IO.

Each area yields different symptoms and needs another diagnosis and solution to mitigate server slowness.

For instance, if the server is slow or outdated, the system will not be able to keep up with the requirements for modern applications. This can lead to end users experiencing an increase in the response time, making them frustrated and decreasing their confidence in your application.

Database queries could also take too long to retrieve data, leading to poor application performance. A sluggish server can additionally directly impact the CPU’s capability to be utilized.

Lastly, poor server performance may sometimes lead to downtime, although the probability of this occurring is low.

Causes of slow-running Linux servers

Overuse of system load is probably one of the most common causes of system slowness.

CPU-bounded system load can create issues due to processes waiting for CPU resources, whereas RAM-bounded system load can lead to high I/O wait times since the system starts using a swap in the server when it runs out of RAM. Meanwhile, IO-bounded system load can cause slowness since processes compete for disk or network I/O resources. High CPU user time can contribute to high system load as well.

Additional factors that can lead to a slower server include:

  • Unnecessary services started at boot time
  • Abnormal memory volumes due to multiple heavy-use services left open
  • Failure to update the operating system (OS) and software
  • Detected and undetected viruses
  • Network bottlenecks, excessive network traffic, and database traffic

Troubleshooting Linux server performance

Linux offers various commands to help you troubleshoot system performance. These commands help to monitor various components of your system, such as memory, CPU, and IO. The three most commonly used commands are top, vmstat, and iostat commands.

top command

The top command is the most commonly used command for displaying the overall view of a system. It summarizes the various CPU-level metrics discussed in the above sections, including uptime, load averages, and I/O wait time. In addition, it displays a running list of processes or tasks currently being managed by the kernel.

Syntax:

top [options]

Example:

Running the top command will display an output similar to that shown below:

Fig 1: Output of top command Fig 1: Output of top command

As you can see in the above output, the first section displays the real-time and dynamic summary of the system.

The first line displays the uptime along with the system load averages over the last one, five, and 15-minute periods. If the load average is higher than the normal load value, i.e., 0.7 for one and five-minute intervals, it can be ignored. When the load average remains consistently higher than 0.7 for a longer duration, it is a clear indication that the server is overloaded.

The second line displays various task-level information, including active, running, and sleeping tasks.

The third line, labeled %CPU(s), displays the various CPU-level statistics. Some metrics to monitor here are:

  • us: Percentage of CPU time for user processes
  • sy: Percentage of CPU time for system processes
  • id: Percentage of CPU time spent idle
  • wa: Percentage of CPU time spent waiting for I/O operations to conclude

The fourth and fifth lines display data about the system memory, while the final section displays the dynamic and real-time list of processes. You can sort this list based on the different columns shown. For example, pressing P will sort the processes based on the %CPU column.

vmstat command

The next important command on the list is vmstat, which stands for virtual memory statistics. It is a performance monitoring tool provided by Linux that displays various statistics about the different components of the system, including memory, IO, and CPU.

By default, this command displays reports since the system was last rebooted. However, you can also view real-time statistics, which are updated after an interval specified by you.

Syntax:

vmstat [options][delay [count]]

The two parameters you can use to view real-time reports are:

  • delay: To continuously update the reports after the specified delay in seconds
  • count: To define the number of updates wanted; the default value is infinite

Example:

Running the vmstat command will give you an output similar to the following:

Fig 2: Output of vmstat command Fig 2: Output of vmstat command

The output is divided into five sections.

  • The memory section displays the following information:
    • swpd: Total amount of memory being used as virtual memory
    • free: Total free/idle memory (in kilobytes) available
    • buff: Memory being temporarily used as a data buffer
    • cache: Total cache memory
  • The swap section displays the following information:
    • si: Memory (in kilobytes) swapped in from disk per second
    • so: Memory (in kilobytes) swapped out to disk per second
  • The io section displays the following information:
    • bi: The number of blocks being received from a block device per second
    • bo: The number of blocks being sent to a block device per second
  • The system section displays the following information:
    • in: The number of interrupts per second, including the clock, since the last system boot
    • cs: The number of context switches per second when the kernel swaps to user mode processing from system mode processing

The cpu section displays data similar to that found in the %Cpu(s) line of the top command output.

iostat command

iostat is an important and widely used command to monitor your system’s input/output statistics. While commands like top or vmstat may help you discover that the server is experiencing a high I/O wait time, you should also know which devices/partitions are under load and causing the high I/O wait time.

The iostat command does just this and generates device-level reports. Besides, it also yields CPU-level statistics similar to those provided by the top and vmstat commands. Just like vmstat, the output produced is, by default, for the time since the system was last rebooted, but you may view real-time data by providing a couple of extra parameters.

Syntax:

iostat [option] [interval] [count]

The interval and count parameters here are similar to that in the vmstat command and are used to show the real-time updates infinitely or for specific counts.

Example:

Running the iostat command will give you an output similar to the one shown below:

Fig 3: Output of iostat command Fig 3: Output of iostat command

As you can see in the above output, the first line displays the CPU report:

  • %user: The CPU utilization occurred while executing at user level
  • %nice: The CPU utilization occurred while executing at user level with a nice priority
  • %system: The CPU utilization occurred while executing at system (kernel) level
  • %iowait: The percentage of time the CPU spent idle while the system was waiting for an I/O request to conclude
  • %steal: The percentage of time the CPU spent waiting while the hypervisor was servicing another virtual processor
  • %idle: The percentage of time spent idle by CPU and there were no outstanding I/O requests as well

The next section shows the following statistics for all the devices/partitions in the system:

  • Device: Partition/device name
  • tps: Transfer per second—a higher tps represents a busy processor and may impact server performance
  • Blk_read/s (kB_read/s): The number of blocks of data in kilobytes read from the device per second
  • Blk_wrtn/s (kB_wrtn/s): The number of blocks of data in kilobytes written to the device per second
  • Blk_read (kB_read): Total number of blocks in kilobytes read from this device since the last reboot
  • Blk_wrtn (kB_wrtn): Total number of blocks in kilobytes written to this device since the last reboot
  • Blk_dscd (kB_dscd): Total number of blocks in kilobytes discarded for this device since the last reboot
  • Blk_dscd/s (kB_dscd): Total number of blocks in kilobytes discarded for this device per second

A very high Blk_wrtn/s or Blk_read/s for a particular device or partition should be investigated, as it may be increasing load on the CPU and hence decreasing the server performance.

Conclusion

Server performance is crucial, as it can directly impact an application’s user base. Monitoring performance metrics helps identify and fix any issues in the server before it could affect your application.

In this article we’ve mentioned major commands—top, vmstat, and iostat—that can be used to narrow down what’s causing system slowness. The methods described here can be used to troubleshoot and fix permanence issues.

Was this article helpful?

Write For Us

Write for Site24x7 is a special writing program that supports writers who create content for Site24x7 "Learn" portal. Get paid for your writing.

Write For Us

Write for Site24x7 is a special writing program that supports writers who create content for Site24x7 “Learn” portal. Get paid for your writing.

Apply Now
Write For Us