Programmer's Life: Collecting Disk Performance Data

I l@ve RuBoard

Collecting Disk Performance Data

This section describes some common tools for measuring and monitoring disk performance. The following are some key terms that you need to know for this section:

Block I/O:
The reads and writes that are held in the buffer cache (buffered) and then transferred in fixed-size blocks.

File I/O:
I/O access to a physical disk, excluding virtual memory I/O. It includes filesystem I/O, system I/O, raw I/O, and block I/O.

Logical I/O:
The read or write system call made by an application to the filesystem. The call results in physical I/O if the data is not in the buffer cache.

Physical I/O:
Data transferred from memory to disk, or vice versa. Physical I/O includes both file I/O and virtual memory I/O.

Raw I/O:
Unbuffered I/O between a user application and the physical disk that bypasses the filesystem's buffer cache (also known as character mode).

Virtual memory I/O:
The disk reads and writes performed for memory-mapped files and for paging pages to and from a swap area.
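The relationships among these terms can be sketched in a few lines of Python. This is only an illustration: the IoCounts fields and both function names are hypothetical, not metrics exposed by any of the tools discussed here, and the hit percentage assumes that every buffered read that misses the cache causes exactly one physical read.

```python
from dataclasses import dataclass


@dataclass
class IoCounts:
    """Hypothetical per-interval counters; field names are illustrative only."""
    logical_reads: int  # read system calls made to the filesystem
    file_reads: int     # physical reads for filesystem, system, raw, and block I/O
    vm_reads: int       # physical reads for paging and memory-mapped files


def physical_reads(counts: IoCounts) -> int:
    # Physical I/O includes both file I/O and virtual memory I/O.
    return counts.file_reads + counts.vm_reads


def read_cache_hit_pct(logical_reads: int, buffered_physical_reads: int) -> float:
    # A logical read causes physical I/O only when the data is not in the
    # buffer cache, so the reads that did not reach the disk were cache hits.
    if logical_reads == 0:
        return 0.0  # no logical reads in the interval, so nothing to report
    hits = logical_reads - buffered_physical_reads
    return 100.0 * hits / logical_reads
```

For example, 1000 logical reads that caused 200 buffered physical reads correspond to an 80 percent read cache hit rate.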

Performance tools, such as BMC PATROL and HP MeasureWare Agent, do not always provide the same set of metrics on all platforms. For simplicity, this section focuses only on the Sun Solaris and HP-UX platforms. Also, these products are continually being enhanced, so the actual metrics available for use in your environment may not precisely match the information presented in this section.

MeasureWare

HP MeasureWare Agent is a Hewlett-Packard product that collects and logs resource and performance metrics. MeasureWare agents are installed on the individual server systems to be monitored. MeasureWare agents exist for many platforms and operating systems.

MeasureWare agents collect data at the global, application, and process levels. Many of the system metrics are described in Chapter 4, "Monitoring the System." This section lists the additional global metrics that are used to monitor disk devices.

The following is a list of system-wide disk-related metrics available on HP-UX and Sun Solaris:

Number of disk drives configured on system

Average utilization of busiest disk during interval

Number and rate of physical disk reads during interval

Number and rate of physical disk writes during interval

Number and rate of physical disk transfers during interval

Number and rate of disk reads by filesystem during interval

Number and rate of disk writes for memory management during interval

Percentage of logical reads satisfied by memory cache

Number and rate of filesystem reads, per disk drive, during interval

Number and rate of filesystem writes, per disk drive, during interval

Disk utilization, per disk drive, during interval
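Several of these metrics come as a "number and rate" pair, and one is defined in terms of the busiest disk. A minimal sketch of how such values relate is shown here, assuming the rate is simply the count divided by the length of the collection interval. The function names and the device names in the example are hypothetical and are not part of MeasureWare.

```python
from typing import Dict, Tuple


def rate_per_second(count: int, interval_seconds: float) -> float:
    # A rate is the number of events divided by the interval length.
    if interval_seconds <= 0:
        raise ValueError("interval_seconds must be positive")
    return count / interval_seconds


def busiest_disk(utilization_by_disk: Dict[str, float]) -> Tuple[str, float]:
    # Pick the disk with the highest utilization percentage in the interval.
    name = max(utilization_by_disk, key=utilization_by_disk.get)
    return name, utilization_by_disk[name]
```

With 600 physical reads in a 60-second interval, rate_per_second(600, 60) gives 10.0 reads per second, and busiest_disk({"c0t0d0": 12.5, "c0t1d0": 87.0}) returns ("c0t1d0", 87.0).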

These additional system-wide disk-related metrics are available on HP-UX:

Number and rate of logical disk I/Os during interval

Number and rate of logical disk reads during interval

Number and rate of logical disk writes during interval

Number and rate of logical disk transfer reads

Number and rate of logical disk transfer writes

Number and rate of disk writes by filesystem during interval

Number and rate of disk reads for memory management during interval

Number and rate of disk reads for system during interval

Number and rate of disk writes for system during interval

Number and rate of raw reads during interval

Number and rate of raw writes during interval

Number and rate of logical disk reads, per disk drive, during interval

Number and rate of logical disk writes, per disk drive, during interval

Number and rate of raw reads, per disk drive, during interval

Number and rate of raw writes, per disk drive, during interval

Number and rate of memory manager transfers, per disk drive, during interval

Number and rate of system transfers, per disk drive, during interval

Average number of requests in queue, per disk drive

MeasureWare can also provide information on swap space utilization and the "fullest filesystem," which is the filesystem with the highest percentage of disk space in use.
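The idea behind the "fullest filesystem" can be approximated with the Python standard library. This sketch is not how MeasureWare computes the metric; it assumes the caller supplies the list of mount points, and the function names are hypothetical. The percentage may also differ slightly from what df reports, because df may account for space reserved for the superuser.

```python
import shutil
from typing import Iterable, Tuple


def percent_used(path: str) -> float:
    # shutil.disk_usage returns total, used, and free space in bytes.
    usage = shutil.disk_usage(path)
    if usage.total == 0:
        return 0.0
    return 100.0 * usage.used / usage.total


def fullest_filesystem(mount_points: Iterable[str]) -> Tuple[str, float]:
    # Return the mount point with the highest percentage of space in use.
    # Raises ValueError if mount_points is empty.
    return max(((m, percent_used(m)) for m in mount_points), key=lambda pair: pair[1])
```

A call such as fullest_filesystem(["/", "/var", "/home"]) returns the mount point and its percentage in use, provided those paths exist on the system.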

GlancePlus

GlancePlus is a real-time, graphical performance monitoring tool. It is used to monitor the performance and system resource utilization of a single system. Both Motif-based and character-based interfaces are available. The product can be used on HP-UX, Sun Solaris, and many other operating systems.

GlancePlus can be used to view and graph a system's current CPU, memory, swap, and disk activity. GlancePlus has screens dedicated to each of these main resources.

GlancePlus can display a variety of data useful for disk monitoring:

Disk utilization and queue length per disk device

Disk I/O rates by filesystem

Disk I/O rates per process

Number of configured disks

Physical reads and writes per disk and per filesystem

Reads and writes per logical volume

Number of configured LVM volume groups

Filesystem capacity and utilization

Swap space capacity and utilization

System table resources

The specific list of available metrics can be found when running GlancePlus, through its online help facility.

GlancePlus is also capable of setting and receiving performance-related alarms. Customizable rules determine when a system performance problem should be reported as an alarm. The rules are managed by the GlancePlus Adviser. An Adviser menu option allows you to Edit Adviser Syntax. When you select this option, all of the alarm conditions are shown and can be modified, as demonstrated in Figure 5-8.

Figure 5-8. Using GlancePlus to configure alarms for monitoring swap space utilization.

Notice in Figure 5-8 how the swap-related alarms are integrated into the same definition file along with network-related alarms. When alarms occur, they can be reflected directly in the GlancePlus interface.
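The general shape of a rule-based alarm can be sketched in Python. This is not the Adviser syntax, which is documented in the GlancePlus online help. It is a hypothetical rule that raises an alarm once a metric has stayed above a threshold for a given number of consecutive samples; the metric name "swap_util_pct" is made up for the example.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class AlarmRule:
    """Hypothetical alarm rule; not the GlancePlus Adviser format."""
    metric: str
    threshold: float
    min_consecutive: int  # samples above the threshold needed to raise an alarm


def evaluate(rule: AlarmRule, samples: Iterable[float]) -> List[int]:
    """Return the sample indexes at which an alarm would start."""
    alarms = []
    run = 0  # length of the current run of samples above the threshold
    for index, value in enumerate(samples):
        if value > rule.threshold:
            run += 1
            if run == rule.min_consecutive:
                alarms.append(index)  # alarm once per sustained run
        else:
            run = 0
    return alarms
```

Requiring several consecutive samples is one way to keep a single brief spike from raising an alarm; a real rule set may use time durations and severities instead.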

GlancePlus can be launched from the command line, or you can start it from the Performance Monitors functional area in SAM (on HP-UX).

PerfView

PerfView is a graphical performance monitoring tool that is used to monitor the performance and system resource utilization for multiple systems in your environment. A variety of performance graphs can be displayed. The graphs are based on data collected over a period of time, unlike the real-time graphs of GlancePlus. PerfView can show graphs from multiple systems simultaneously, so that comparisons can be made.

PerfView is integrated with other monitoring tools. For example, you can launch GlancePlus from within PerfView by accessing the Tools menu. PerfView can also be launched from the IT/O Applications Bank. When troubleshooting an event in the IT/O Message Browser window, you can launch PerfView to see a related performance graph.

PerfView relies on MeasureWare data, so it can display performance information only for systems that support the MeasureWare Agent. Refer to the previous section on MeasureWare to see a list of the disk metrics available.

PerfView has three main components:

PerfView Monitor:
Provides the ability to receive alarms. A textual description of an alarm can be displayed. Alarms can be filtered by severity, type, or source system. Also, after an alarm is received, the alarm can be selected, to display a graph of related metrics. An operator can monitor trends leading to failures, and can then take proactive actions to avoid problems. Graphs can be used for comparison between systems and to show a history of resource consumption. An internal database is maintained that keeps a history of alarm notification messages.

PerfView Analyzer:
Provides resource and performance analysis for disks and other resources. System metrics can be shown at three different levels: process, application (configured by the user as a set of processes), and global system information. It relies on data received from MeasureWare agents on managed nodes. Data can be analyzed from up to eight systems concurrently. All MeasureWare data sources are supported. PerfView Analyzer is required by both PerfView Monitor and PerfView Planner.

PerfView Planner:
Provides forecasting capability. Graphs can be extrapolated into the future. A variety of graphs (such as linear, exponential, s-curve, and smoothed) can be shown for forecasted data.
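To give a feel for what extrapolating a graph into the future means, here is a minimal sketch of a linear forecast using an ordinary least-squares fit over equally spaced samples. It is an assumption-laden illustration and not PerfView Planner's actual algorithm; the function name is hypothetical.

```python
from typing import List, Sequence


def linear_forecast(history: Sequence[float], steps_ahead: int) -> List[float]:
    """Fit a straight line to history and extend it steps_ahead samples."""
    n = len(history)
    if n < 2:
        raise ValueError("at least two samples are needed to fit a line")
    # Samples are assumed to be equally spaced at x = 0, 1, ..., n - 1.
    mean_x = (n - 1) / 2
    mean_y = sum(history) / n
    sxx = sum((x - mean_x) ** 2 for x in range(n))
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(history))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Evaluate the fitted line at the next steps_ahead positions.
    return [intercept + slope * (n + k) for k in range(steps_ahead)]
```

For a history of 10, 20, 30, 40, the next two forecast values are 50 and 60. Exponential, s-curve, or smoothed forecasts would need different models.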

PerfView's ability to show history and trend information can be helpful in diagnosing disk problems. Graphing performance information can help you to understand whether a problem is persistent or merely an anomaly (a momentary spike of activity). Figure 5-9 shows a PerfView graph illustrating an application's I/O performance over time. Additional system performance metrics are also included in the graph.

Figure 5-9. PerfView can show the history of an application's I/O access rate.

To diagnose a problem further, PerfView Monitor allows the user to change time intervals, to try to find the specific time that a problem occurred. The graph is redrawn showing the new time period.
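The distinction between a persistent problem and a momentary spike can also be expressed as a simple rule over the samples in a chosen time window. This sketch is a hypothetical heuristic, not something PerfView does: it calls the window "persistent" when at least a given fraction of samples exceeds the threshold. The default fraction of 0.5 is an arbitrary assumption.

```python
from typing import Sequence


def classify(samples: Sequence[float], threshold: float,
             persistent_fraction: float = 0.5) -> str:
    """Label a window of samples as 'normal', 'spike', or 'persistent'."""
    if not samples:
        return "normal"
    over = sum(1 for s in samples if s > threshold)
    if over == 0:
        return "normal"
    # Many samples over the threshold suggest a persistent problem;
    # only a few suggest a momentary spike.
    if over / len(samples) >= persistent_fraction:
        return "persistent"
    return "spike"
```

Changing the window, as when changing the time interval of a graph, can change the label, which is why it helps to look at more than one time period.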

BMC PATROL

BMC provides monitoring capabilities through its PATROL software suite. PATROL provides the basic framework for defining thresholds, sending and translating events, and so forth. Optional products called Knowledge Modules (KMs) provide the ability to monitor specific components. For example, BMC PATROL includes KMs for UNIX, SAP R/3, Oracle, Informix, and other applications. In fact, more than 40 KMs are available from BMC for use with PATROL.

BMC includes a tool with its UNIX KM that reports information about disks and disk usage. The following disk and filesystem metrics are available on HP-UX and Sun Solaris:

Number and rate of system data transfers, per disk drive, during interval

Average service time, per disk drive

Average disk seek time, per disk drive

Percentage of time the drive is busy fulfilling a transfer request

Average time spent waiting in queue, per disk drive

Rate of physical disk reads, per disk drive

Rate of raw reads, per disk drive

Rate of logical disk reads, per disk drive

Percentage of logical reads in the buffer cache

Rate of physical disk writes, per disk drive

Number of logical writes to system buffer

Percentage of logical blocks written to buffer cache

Average number of requests in queue, per disk drive

Percentage of time that CPU spends waiting for I/O operations

BMC PATROL can also monitor the percentage of swap space in use, and the amount of filesystem space in use, per filesystem.

The disk monitoring capabilities of BMC PATROL are similar to those of MeasureWare. Some minimal configuration information is provided, but its primary value is in tracking resource and performance information. Indirectly, BMC PATROL can provide some fault information as well. For example, if disk utilization suddenly drops to zero on a particular disk drive, it may be an indication that the disk has failed. Of course, it could also be an indication that an application has terminated and is no longer using the disk.
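The indirect fault check described above, utilization suddenly dropping to zero, can be sketched as follows. This is a hypothetical heuristic and not a PATROL feature; the function name and the default values for busy_threshold and idle_samples are assumptions. As the text notes, a positive result only means the disk deserves a closer look.

```python
from typing import Sequence


def dropped_to_zero(samples: Sequence[float], busy_threshold: float = 5.0,
                    idle_samples: int = 3) -> bool:
    """Return True if a previously busy disk now shows zero utilization."""
    if len(samples) <= idle_samples:
        return False  # not enough history to compare against
    recent = samples[-idle_samples:]
    earlier = samples[:-idle_samples]
    # The most recent samples must all be zero, and the disk must have
    # been noticeably busy at some point before that.
    return all(s == 0 for s in recent) and any(s >= busy_threshold for s in earlier)
```

A disk that has always been idle is not flagged, because there is no earlier activity to contrast with.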

Programmer's Life

Wednesday, December 16, 2009
