top

The top command is one of the most familiar performance tools. Most system administrators run top to see how their Linux and UNIX systems are performing. The top utility provides a great way to monitor the performance of individual processes and of Linux as a whole. It is more accurate to call Linux processes tasks, but in this chapter we call them processes because that is what the tools call them.[1] top can be run as a normal user as well as root. Figure 3-1 shows typical top output from an idle system.

Figure 3-1. top output

The top display has two parts. The first third or so shows information about Linux as a whole. The remaining lines are filled with individual process information. If the window is stretched, more processes are shown to fill the screen. Much of the general Linux information can be obtained from several other commands instead of top. It is nice to have it all on one screen from one command, though.

The first line shows the load average for the last one, five, and fifteen minutes. Load average indicates how many processes are running on a CPU or waiting to run; on Linux, the figure also counts processes in uninterruptible sleep, typically waiting on disk I/O. The uptime command can be used to display load averages as well. Next comes process information, followed by CPU, memory, and swap. The memory and swap information is similar to the free command output.

After we determine memory and CPU usage, the next question is, which processes are using them? Most of the process information can be obtained from the ps command too, but top provides a nicer format that is easier to read. The most useful interactive top command is h for help, which lists top's other interactive commands.

Adding and Removing Fields

Fields can be added to or removed from the display. The process output can be sorted by CPU, memory, or another metric. Sorting by memory is a great way to see which process is hogging memory. The top syntax and interactive options differ among Linux distributions. The help command quickly lists which commands are available.
Many interactive options are available. Spend some time trying them out. Figure 3-2 shows a Red Hat Enterprise Linux ES release 3 help screen.

Figure 3-2. top help screen

The f command adds or removes fields from the top output. Figure 3-3 shows the Red Hat Enterprise Linux ES release 3 screen that lists the fields that can be added.

Figure 3-3. top add/remove fields screen

Figure 3-4 shows a SUSE Linux 9.0 top help screen. You can see that the commands the two versions offer differ greatly.

Figure 3-4. SUSE top help screen

Output Explained

Let's take a look at what the information from top means. We'll use the following output from top as an example. The first line displays the load average information:

    16:30:30 up 16 days, 7:35, 2 users, load average: 0.54, 0.30, 0.11

This output is similar to the output from uptime. You can see the current time, how long Linux has been up, and the number of users. The 1-, 5-, and 15-minute load averages are displayed as well.

Next, the process summary is displayed:

    73 processes: 72 sleeping, 1 running, 0 zombie, 0 stopped

We see 73 total processes. Of those, 72 are sleeping, and one is running. There are no zombies or stopped processes. A process becomes a zombie when it exits and its parent has not yet waited for it with the wait(2) or waitpid(2) system calls. This typically happens when a parent that is still running neglects to reap its exited children; if the parent itself exits, the orphaned children are adopted by init, which waits for them. Zombies don't take up resources other than the entry in the process table. Stopped processes are processes that have been sent the STOP signal. See the signal(7) man page for more information.

Next up is the CPU information:

    CPU states: cpu user nice system irq softirq iowait idle

The CPU lines describe how the CPUs spend their time. The top command reports the percentage of CPU time spent in user mode, in kernel mode, running niced processes, and idle.
The iowait column shows the percentage of time that the processor was waiting for I/O to complete while no process was executing on the CPU. The irq and softirq columns indicate time spent servicing hardware and software interrupts. Linux kernels earlier than 2.6 don't report irq, softirq, and iowait.

The memory information is next:

    Mem: 511996k av, 498828k used, 13168k free, 0k shrd, 59712k buff

The first three metrics give a summary of memory usage. They list total usable memory, used memory, and free memory. These are all you need to determine whether Linux is low on memory.

The next five metrics identify how the used memory is allocated. The shrd field shows shared memory usage, and buff is memory used in buffers. Memory that has been allocated to the kernel or user processes can be in three different states: active, inactive dirty, and inactive clean. Active, actv in top, indicates that the memory has been used recently. Inactive dirty, in_d in top, indicates that the memory has not been used recently and may be reclaimed. Before the memory can be reclaimed, its contents must be written to disk. This process is called "laundering" and can be considered a fourth, temporary state for memory. Once laundered, the inactive dirty memory becomes inactive clean, in_c in top. Available at the time of this writing is an excellent white paper by Norm Murray and Neil Horman titled "Understanding Virtual Memory in Red Hat Enterprise Linux 3" at http://people.redhat.com/nhorman/papers/rhel3_vm.pdf.

The swap information is next:

    Swap: 105832k av, 2500k used, 103332k free 343056k cached

The av field is the total amount of swap that is available for use, followed by the amount used and the amount free. Last is the amount of memory used for cache by the kernel.

The rest of the top display is process information:

    PID USER PRI NI SIZE RSS SHARE STAT %CPU %MEM TIME CPU COMMAND

top shows as many processes as can fit on the screen.
The fields are described well in the top(1) man page. Table 3-1 provides a summary of the fields.
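Because the summary lines are plain text, they are easy to pick apart in a script. The following is a minimal sketch rather than anything top provides: the function names load_averages and mem_used_pct are hypothetical, and the parsing assumes the older line layout shown in this section. Newer versions of top format the memory line differently, so the second function would need adjusting there.

```shell
#!/bin/sh
# Hypothetical helpers; they assume the summary-line layout shown in this section.

# Print the 1-, 5-, and 15-minute load averages found in a top or uptime
# header line read from standard input.
load_averages() {
    sed -n 's/.*load average: *\([0-9.]*\), *\([0-9.]*\), *\([0-9.]*\).*/\1 \2 \3/p'
}

# Print used memory as a whole-number percentage of total memory, given a
# line such as "Mem: 511996k av, 498828k used, ..." on standard input.
mem_used_pct() {
    awk '/^Mem:/ {
        total = $2 + 0   # "511996k" -> 511996 (awk keeps the leading number)
        used  = $4 + 0   # "498828k" -> 498828
        if (total > 0) printf "%d\n", (used * 100) / total
    }'
}
```

Fed the example lines from this section, load_averages prints 0.54 0.30 0.11 and mem_used_pct prints 97. On a live system the input could come from something like top -b -n 1, provided the layout matches.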
Saving Customization

A very nice top feature is the capability to save the current configuration. Change the display as you please using the interactive commands and then press w to save the view. top writes a .toprc file in the user's home directory that saves the configuration. The next time this user starts top, the same display options are used.

top also looks for a default configuration file, /etc/toprc. This is a global configuration file that top reads when any user runs the utility. It can be used to make top run in secure mode and to set the refresh delay. Secure mode prevents non-root users from killing processes or changing their nice values. It also prevents non-root users from changing the refresh value of top. A sample /etc/toprc file for our Red Hat Enterprise Linux ES release 3 looks like the following:

    $ cat /etc/toprc

The s indicates secure mode, and the 3 specifies three-second refresh intervals. Other distributions may have different formats for /etc/toprc.

The capability to kill processes is a pretty nice feature. If some user has a runaway process, the top command makes it easy to find and kill. Run top, show all the processes for a user with the u command, and then use k to kill the offender. top is not only a good performance monitoring tool; it can also be used to improve performance by killing those offending processes.

Batch Mode

top can also be run in batch mode. Try running the following command:

    $ top -n 1 -b >/tmp/top.out

The -n 1 tells top to show only one iteration, and the -b option indicates that the output should be plain text suitable for writing to a file or piping to another program such as less. Something like the following two-line script would make a nice cron job:

    # cat /home/dave/top_metrics.sh

We could add it to crontab and collect output every 15 minutes.

    # crontab -l

The batch output makes it easy to take a thorough look at what is running while enjoying a good cup of coffee.
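A collection script along the lines described above might look like the following sketch. It is not the original top_metrics.sh, whose contents are not reproduced here; the function name collect_top and the default output directory /tmp/top_metrics are assumptions made purely for illustration.

```shell
#!/bin/sh
# Hypothetical sketch of a batch-mode collector; not the original top_metrics.sh.

# Append one timestamped batch-mode snapshot from top to a per-day log file.
# $1: output directory (assumed default: /tmp/top_metrics)
collect_top() {
    outdir=${1:-/tmp/top_metrics}
    mkdir -p "$outdir" || return 1
    outfile="$outdir/top.$(date +%Y%m%d).out"
    {
        echo "=== $(date) ==="
        top -b -n 1    # -b: batch (plain text) mode, -n 1: a single iteration
    } >>"$outfile"
}
```

A script that defines this function and then calls collect_top could be scheduled with a crontab time specification of */15 * * * * to gather a snapshot every 15 minutes. Writing one file per day keeps each file small enough to page through comfortably.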
In batch mode, all the processes are listed, and the output isn't refreshing every five seconds. If a .toprc configuration file exists in the user's home directory, it is used to format the display. The following output came from top running in batch mode on a multi-CPU Linux server. Note that we don't show all 258 processes from the top output.

    10:17:21 up 125 days, 10:10, 4 users, load average: 3.60, 3.46, 3.73

By now you can see why top is such a popular performance tool. The interactive nature of top and the ability to easily customize the output make it a great resource for identifying problems.
Wednesday, November 25, 2009
top