Data Protection (Disk Availability)
Protecting the disks with redundancy is the minimum level of protection required by SAP. Disk array controllers (DACs), as well as volume managers, have features that help protect access to the data. RAID levels, disk array controllers, and file systems are discussed in this section, as well as their potential performance impact.
Basic Principles
Protecting the data on the storage system of the SAP database server is a matter of doing a few simple things.
Choose a RAID configuration that satisfies both data protection and the performance requirements of the SAP system.
Protect access to the disks by making the I/O channels redundant.
Protect the integrity of the database by separating the logs from the data onto different physical disks, and by mirroring any write-I/O cache on the disk controller.
Protect your investment against disaster by replicating the data volumes or at least backing them up for a rapid restore.
The concept of protecting your database by segmenting the log files onto separate disks is fundamental to good management of the SAP database server. If the disk volume(s) holding the data crash, the database can always be recovered from the last full backup by applying the latest log files, if they are available. More information on tape devices and recovery solutions is provided in Chapter 5.
Disk Redundancy: RAID Levels
RAID stands for Redundant Array of Inexpensive (or Independent) Disks. Strictly speaking, it is a theoretical model of how to represent multiple disks as one, whether or not they are redundant. While RAID does not prevent individual disk failures, it can be used to manage recovery from them when larger volumes are built from multiple disks. Deploying a storage system with RAID makes basic sense: it is a relatively low investment that protects against costly downtime.
If the operating system is used to configure a set of disks together as one, either mirrored or striped, this is called software RAID. If a disk array controller is used to represent multiple disks as one, this is known as hardware RAID.
If a specific RAID configuration isn't applied to a set of disks, then it's just a bunch of disks (JBOD). While it is possible to install, configure, and operate an SAP database server with a JBOD configuration, some level of disk redundancy should be applied if it is to be used for any serious purpose. SAP only supports production systems that have an appropriate RAID level of disk protection, whether implemented in software or hardware.
RAID 0
When a bunch of disks is configured to act as one, with data evenly spread or striped across them all, the result is a stripe set. If no protection is applied, this stripe set is called RAID 0. It is one of the highest performing disk configurations possible because there is no penalty for data protection. Up to a point, the more disks you add, the faster a RAID 0 disk I/O system gets, for both reads and writes. Of course, the chance of losing any one disk in the stripe set also increases as disks are added.
Because RAID 0 has no protection against disk failure, it is not recommended in an SAP environment for any server, whether for application or database servers, without some other form of disk protection.
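To make the striping idea concrete, the short sketch below maps logical addresses onto the members of a hypothetical stripe set; the four-disk layout and 64KB stripe unit are illustrative assumptions, not values from any particular controller.

# Minimal sketch of RAID 0 block placement on a hypothetical 4-disk stripe set.
STRIPE_UNIT_KB = 64   # assumed stripe depth per disk
NUM_DISKS = 4         # assumed number of members in the stripe set

def locate(logical_kb):
    """Return (disk index, offset in KB on that disk) for a logical address."""
    unit = logical_kb // STRIPE_UNIT_KB    # which stripe unit the address falls in
    disk = unit % NUM_DISKS                # stripe units rotate round-robin across disks
    offset = (unit // NUM_DISKS) * STRIPE_UNIT_KB + (logical_kb % STRIPE_UNIT_KB)
    return disk, offset

for addr_kb in (0, 64, 128, 192, 256):
    print(addr_kb, "->", locate(addr_kb))
# 0 -> (0, 0), 64 -> (1, 0), 128 -> (2, 0), 192 -> (3, 0), 256 -> (0, 64)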
RAID 1: Mirroring
Mirroring one disk with another is called RAID 1. This method of disk protection requires a minimum of two disks. Most of the smaller disk volumes used in an SAP database server will be configured with a RAID 1, or mirroring, configuration (e.g., OS and log disks). Duplexing is also a common term used with RAID 1, but it represents a software RAID solution spanning two disk I/O channels or host bus adapters in a server.
RAID 0/1 or 10
Because the standard RAID 1 configuration uses only two disks, a higher capacity configuration is often needed. RAID 0/1 or 10 combines the distributed RAID 0 stripe configuration with the RAID 1 mirror. The benefit is a configuration of one very large mirrored and striped disk volume. It can handle as many disk I/O requests as the sum of the RAID 1 pairs configured, but it distributes the I/O evenly among all the disks (balanced I/O).
The number of disks within the RAID 0/1 set depends on the manufacturer of the RAID controller. Whereas standard two-disk RAID 1 is available in both software and hardware RAID configurations, the more sophisticated RAID 0/1 or 10 is usually hardware manufacturer specific and does require a hardware RAID controller. For some manufacturers, any even number of disks (four or more) can be used to configure RAID 0/1. For others, RAID 10 may mean exactly four disks, no more and no less.
RAID 5: Distributed Parity
Although there are other RAID levels between 1 and 5, RAID 5 is the most common for business application systems. RAID level 5 creates a form of redundancy called parity, which allows for online recovery. Recovery from a disk failure can be done by reading the remaining good data, adding the parity, and calculating what the missing good data should be. RAID 1 does not make a parity calculation because it has a complete, not computed, mirror copy of each data block written to disk.
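The parity in most implementations is a bitwise XOR of the data blocks in a stripe, so a missing block can be recomputed from the surviving blocks and the parity. The following minimal sketch illustrates that principle only; it is not a model of any vendor's controller firmware.

from functools import reduce

def xor_blocks(blocks):
    """Bytewise XOR of equal-sized blocks, as used for RAID 5 parity."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

d0, d1, d2 = b"SAP!", b"DATA", b"LOGS"    # three data blocks in one stripe
parity = xor_blocks([d0, d1, d2])         # written to the stripe's parity block

# The disk holding d1 fails: rebuild it from the surviving data plus the parity.
rebuilt = xor_blocks([d0, d2, parity])
assert rebuilt == d1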
TIP
RAID 5 Support
SAP does not make explicit support statements about which RAID levels to use but instead defers to the system and storage provider for the recommendation. RAID 5 can be effectively used and is supported.
Figure 4-6 shows RAID 5 and the distribution of parity across all the disks. RAID 5 is able to perform overlapping I/Os for both read and write requests, which offers a good balance of performance, redundancy, and cost for business applications. The minimum number of disks needed for RAID 5 is three. Recovery in a RAID 5 disk set only works when one disk has failed. If more than one disk fails in a single RAID 5 disk set (volume set), then the volume of data is lost. The more disks there are in a RAID 5 set, the greater the risk of a failure. For this reason, it makes sense to configure RAID 5 disk sets with a smaller number of disks (six or fewer is recommended).
Figure 4-6. RAID 5 Distributed Parity
The number of physical disk I/Os generated is higher with RAID 5 than with RAID 0 or 1. Because storage systems generally overlap read requests, all disks are used with no reduction in read performance compared with RAID 1 or 0. However, with write-I/O requests, all of the disks are used to write data except the one with the parity information. Often, RAID 5 write requests generate even more physical I/Os due to a read-modify-write cycle. This happens when the data block to be updated is smaller than the RAID 5 stripe across all of the disks in the volume set. However, most array controller implementations with very large caches can keep data in cache long enough to perform very efficient write-back operations, reducing the performance impact of RAID 5 writes.
In extreme situations, such as an initial data load or any other sequential I/O forcing an immediate flush of the storage system's cache, the RAID 5 write performance penalty cannot be avoided. More disks are then needed to match the performance of an equivalent RAID 1 or 0 configuration. In theory, this penalty can be 1.8 times as many I/Os as an equivalent RAID 0 disk configuration, which is the factor assumed in this chapter.
TIP
RAID Levels and I/O Performance
The number of I/Os issued by the server is always driven by the application demands and is considered the front-end I/O to the storage system. It is independent of the RAID level used. The number of physical I/Os that occur inside the storage system, behind the cache (back-end), does depend on the RAID level used, however.
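As a back-of-the-envelope illustration of this front-end/back-end distinction (not a sizing formula from SAP or any storage vendor), the sketch below estimates worst-case physical I/Os behind the cache from an assumed front-end read/write mix, using the commonly cited factors: two back-end writes per front-end write for RAID 1, and a four-I/O read-modify-write cycle per uncached small write for RAID 5.

def backend_ios(front_reads, front_writes, raid_level):
    """Rough worst-case estimate of physical I/Os behind the cache."""
    if raid_level == 0:
        return front_reads + front_writes       # striping only, no redundancy overhead
    if raid_level == 1:
        return front_reads + 2 * front_writes   # every write goes to both mirror halves
    if raid_level == 5:
        # Small uncached write: read old data + old parity, write new data + new parity.
        return front_reads + 4 * front_writes
    raise ValueError("unsupported RAID level")

# Example with an assumed front-end mix of 700 reads/s and 300 writes/s:
for level in (0, 1, 5):
    print("RAID", level, "->", backend_ios(700, 300, level), "back-end I/Os per second")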
Software RAID and File Systems
The RAID implementations just discussed are theoretical models of disk protection. The practical implementation is usually some form of software and hardware combination. The lowest cost RAID implementation available is software RAID. Most operating systems, including Microsoft Windows NT/2000 or Unix, support one or more of the RAID levels described, along with standard I/O controllers.
General Notes about Software RAID
Because software RAID is at the operating system level, it can span the disk I/O controllers or host bus adapters. Software RAID 0 (striping) can effectively be used in combination with hardware RAID 1 or 5.
When using software mirroring (RAID 1), the impact on the CPU, I/O, and memory bus utilization is negligible during normal mirroring operations. The main issues start when a failure occurs. Not all operating systems support hot swap replacement of failed disks (this is normally easier with hardware RAID solutions). Rebuilding a software-mirrored volume takes up some of the host server's CPU, I/O, and memory bus utilization. This has a direct impact on the business application software running. In addition, the server may not boot if the first SCSI disk in the software RAID 1 set is the one that failed (depends on the server and SCSI boot controller). Many IT administrators simply prefer a hardware RAID 1 implementation, even for their OS boot disks, to keep a uniform or standard way of dealing with disk administration. This is a common situation on Intel-based servers.
Running software RAID 5 volumes does have a measurable impact on the CPU during normal operations. It gets worse in a failure or rebuild situation. This isn't a problem for a file server, but a business application server needs as much CPU available as possible and shouldn't waste any for unnecessary I/O activity. SAP does not recommend using software distributed parity (RAID 5) on any server running SAP software.
Microsoft Windows NT 4.0 and 2000
SAP only supports using Windows NT 4.0 with the NTFS file system. SAP does not support the FAT or FAT32 file systems, mostly because they lack security permissions, but also for performance reasons on very large volumes. Microsoft Windows NT and 2000 server editions support creating spanned and striped volumes, mirrored volumes, and software distributed parity (RAID 5) volumes.
For Microsoft Windows NT 4.0, the information or metadata used to define the disk volumes is stored only in the NT registry. The metadata includes the disk signatures and other disk volume information used to make up the software RAID set. If the NT registry were ever lost, or if the disk signatures were ever physically overwritten, it would be very difficult to re-create the disk volume without an up-to-date backup copy of the NT registry. For some IT managers, this represents too much risk during recovery situations.
With Microsoft Windows 2000, however, improvements to the file system have been made. All disks formatted as dynamic with Windows 2000 store the metadata on a special partition at the end of each disk in the volume set. The metadata is replicated in the registry only when needed, so the registry is no longer a single point of failure. This makes some of the software RAID features previously described a viable solution for SAP database servers running Microsoft Windows 2000, for example when building large disk volumes with software RAID 0 striping on top of a hardware RAID configuration.
Unix Software RAID Support
Software RAID levels are also supported in the Unix operating systems. Volume striping, spanning, and mirroring are the typical RAID levels supported in Unix file systems. Sometimes, an optional software package must be installed to provide the software RAID level support, for example, MirrorDisk/UX on HP-UX. However, it does have the typical utilization drawbacks of software RAID solutions.
How disks are organized and managed depends on the Unix version. Some Unix flavors assign complete disks as one addressable unit or partition. With HP-UX, for example, a pool of storage (disks) is first organized in smaller blocks or extents. These can then be configured in spanned, striped, or mirrored sets of logical volumes, typically with a logical volume manager (LVM). These raw logical volumes can then be used directly by the database to store information, or they can be formatted with a file system.
Unix File Systems
There are multiple types of file systems and data access methods supported on Unix. SAP doesn't make any specific certification statements about Unix file systems supported with SAP applications, other than the hardware and database vendors must support them. There are two main categories of disk access methods in Unix: buffered I/O or unbuffered direct (raw) I/O.
Buffered I/O Disk Access
The file systems that employ a buffered I/O data access method include:
HFS (High Performance File System), a.k.a. UFS (Unix File System)
JFS (Journal File System)
Windows NT or 2000 NTFS (for comparison)
Buffered I/O on Unix makes use of the OS file system buffers. This introduces a copy of the data in memory, which impacts the overall performance of an SAP database server with lots of small, random I/O activity. Both the HFS and the JFS file systems are based on this type of data access. For a file server with larger, sequential I/O, this type of data access has certain performance advantages because it can asynchronously move data to the buffer cache while the application continues processing. The main advantage for SAP database servers of this type of data access, however, is that you have a formatted file system available on the disk, and that means easier administration of the data. SAP database data files are visible in the file system and can be treated with the same IT management processes as other non-SAP file types (with standard backup processes, etc.).
In addition to the HFS or UFS file systems, most versions of Unix also include the JFS file system. The main advantage of using JFS is faster recovery after a system crash. The file system check (fsck) takes significantly less time for JFS because it keeps track of file system changes in an intent log or journal. The newer, dynamic variation of JFS is called OnlineJFS, which allows even easier file system resizing and management.
In many cases, SAP will recommend that customers use a file system like HFS or JFS instead of raw devices mostly for operational reasons. The handling of files is easier for most IT administrators than raw disks and so provides a level of protection against user errors (deleting or unmounting disk volumes, failing to make backups, etc.). The decision depends on both the absolute I/O requirements of the database server and the skills of the IT staff available to manage complex systems.
Unbuffered Direct I/O Disk Access
Unbuffered I/O is used by the following data-access methods:
Raw I/O
JFS with Direct I/O (VxFS; Veritas File System)
Some performance degradation occurs with the previously described buffered disk I/O in environments with small, random I/O, as with SAP R/3 database servers. In addition to being slower, the extra I/O buffer takes up valuable memory space in the server. If the absolute maximum disk performance is needed on a Unix database host, the answer is to use the raw disk access. Raw disk I/O removes the extra buffer and allows applications to directly write to the disk volumes, without additional delay. Asynchronous raw I/O is nonblocking, so the application must ensure data integrity by not writing to the same disk block at once (most databases do support row-level locking to ensure data integrity). Synchronous raw I/O serializes the write-I/O activity and is therefore slower.
For very I/O intensive SAP database systems, when a strong IT administration team is available, it is viable to use raw disk I/O for the SAP database system. It's faster than the standard buffered I/O methods, and it's supported by most databases. The best Unix-based SAP benchmark results are usually achieved on database servers with raw I/O.
The main drawback of using raw I/O data access is the seemingly more difficult administration of disks without a standard file system. Not all tools for backup can be used with raw devices, and IT administrators may accidentally place a file system on an already assigned raw disk. Although there are plenty of software tools available for properly managing raw disks, it does require thinking differently.
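To make the buffered-versus-direct distinction concrete, the minimal sketch below opens a block device with the Linux O_DIRECT flag, which bypasses the OS buffer cache in much the same spirit as raw I/O. The device path and block size are placeholders, and the flag and alignment rules are Linux-specific assumptions rather than the HP-UX raw device mechanism described above.

import mmap
import os

BLOCK = 4096                  # assumed logical block size of the device
DEVICE = "/dev/sdb1"          # placeholder path for a raw block device

# O_DIRECT (Linux) bypasses the OS buffer cache; buffer address, offset, and
# length must all be aligned to the device block size for the read to succeed.
fd = os.open(DEVICE, os.O_RDONLY | os.O_DIRECT)
buf = mmap.mmap(-1, BLOCK)    # anonymous mapping, page-aligned by construction
try:
    os.preadv(fd, [buf], 0)   # read the first block straight from the device
    print(buf[:16])
finally:
    buf.close()
    os.close(fd)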
Hardware RAID
The RAID disk configuration can be created and managed either in the operating system software of the server or within the micro-code and processor of a dedicated hardware disk array controller (DAC). Because the remirroring and parity computation processes take up CPU cycles, they can take up either the server's CPU time or the CPU time of a dedicated array controller. To achieve optimal performance for the SAP and database software application, SAP highly recommends the use of hardware RAID storage systems to offload the server's CPU of unnecessary I/O tasks.
Disk Array Controllers
The hardware disk array controller (DAC) can be either internal to the server as a PCI card (iDAC), or external in a separate storage system (eDAC). A typical internal DAC contains a host interface (PCI connector); SCSI or FC connectors to the disk storage box; a RISC processor for the RAID 5 parity computation; and disk cache or memory for reads, writes, or both. In addition, internal DACs contain nonvolatile RAM (NVRAM) for storing the disk stripe and RAID configuration information. Internal DACs are most common on Intel-based servers. An example is the AMI MegaRAID controller series, although many others are available.
External DACs are more common on Unix-based systems or for any server that requires a more sophisticated disk solution. Disk solutions with external DACs are usually more expensive because they also require a standard SCSI or FC host bus adapter in the server to connect the external storage system. One big benefit of storage systems with eDACs is that the disk stripe or RAID configuration information is external to the server, which is helpful for high availability purposes (failover to another server is more easily done).
Many external disk storage systems offer two disk array controllers, which can be made to either fail over, provide some level of load balancing, or both. This removes the critical single points of failures in the disk I/O channel path. This is often referred to as an AutoPath solution and is recommended for database servers.
Impact of RAID Volume Rebuild
In addition to managing the disks during normal operations, disk array controllers are also responsible for automatically rebuilding disk array volumes after the failed disk has been replaced. This is preferably done online without impacting the server. Support for auto-rebuild and hot plug capabilities are more readily available with hardware RAID solutions, making online replacement of failed disks possible without interruption or rebooting of the server.
Rebuilding a mirrored or RAID 5 array volume does not require any CPU time of the database server because it is handled by the onboard RISC processor of the DAC. Recovery prioritization can normally be adjusted, depending on the I/O requirements of the storage system. A heavily used SAP database server may not be effective if the recovery process is consuming 100% of the DAC's processing power, however.
Virtual Arrays
Some disk storage systems include intelligent array controllers that make effective use of all the disks by mapping all logical volumes across all of the available physical disks. This virtual mapping of I/O blocks occurs in a specialized array controller and thus is limited to specific storage vendors. It can help reduce the administrative effort involved in managing the storage system to achieve a balanced or distributed layout.
When using these virtual arrays, care must be taken to ensure a database recovery is possible in case multiple disks in the virtual array fail at once (which occasionally happens). It is recommended to use virtual arrays for the data files but not for the log files needed for recovery. These should always be kept both logically and physically separate. Virtual arrays typically provide logical but not physical separation of the disk volumes.
Stripe Sizes
The physical I/O block stripe size used to store data on the disks can usually be set up using the disk array controller, although some arrays use a fixed size. Ideally, the physical I/O stripe size should match the I/O request size coming from the application or operating system.
With an SAP database server, there are several types of file I/O access: large, sequential I/O (OS paging/swap and transaction log files) and small, random I/O (database data files). For sequential I/O, physical block stripe sizes of 32KB to 256KB are appropriate, with 64KB being the most commonly used setting. Some array controllers can even create stripe sizes several megabytes in size. For smaller, random I/O, smaller physical block stripe sizes are typically used, although anything up to 64KB is appropriate (depending on the array controller).
This physical stripe size is also referred to as stripe depth. This is not the same as the stripe or allocation size used by the operating system when formatting a file system. It is important that the file system stripe or allocation size is smaller than or equal to the physical I/O stripe size on the disks, otherwise more physical disk I/Os will be generated than needed. For example, if the Windows NTFS partition is formatted with a 4096-byte allocation size, then the physical I/O block stripe size on the disks should be 4KB or greater.
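A quick way to check this alignment rule is to compute how many physical stripe units a single file-system allocation unit touches. The sketch below uses the 4KB and 64KB values from the example above, plus a deliberately oversized allocation for contrast.

def stripe_units_touched(fs_alloc_kb, stripe_depth_kb):
    """Worst-case number of physical stripe units one allocation unit spans."""
    return -(-fs_alloc_kb // stripe_depth_kb)   # ceiling division

print(stripe_units_touched(4, 64))     # NTFS 4KB cluster on a 64KB stripe: 1 physical I/O
print(stripe_units_touched(256, 64))   # 256KB allocation on a 64KB stripe: 4 physical I/Os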
TIP
Disk Array Stripe Size Dependencies
If the physical block stripe on the disks is set too high, it quickly fills the disk cache with unneeded I/O chunks. If it's too small, too many physical I/Os to or from disk are generated, also impacting the performance. The ideal setting should be based on database software I/O access size, the file system block size (or greater), and the performance of the individual DAC at a particular stripe size.
Disk Caches
All disk array storage products commonly use a cache, regardless of whether it's a low cost internal DAC or an enterprise disk storage system. This helps improve the storage system's performance by buffering read and write requests to and from the operating system or application software. Data transfer from a storage system's cache is significantly faster than transferring from a slower physical disk device.
For mission-critical SAP systems, there are a few important things to note about using storage system caches:
The storage system's cache is best used as a write cache to more quickly acknowledge database write-I/O requests, helping keep the response time low.
When using a write-cache, make sure it is mirrored in case of failure.
Ensure the cache has a battery backup in case of power loss.
Size the cache large enough for the expected average working set of data.
The ideal storage system cache size depends on the server to which it is attached. If the database server does not have enough memory to store the data buffers, then the disk system cache should be sized large enough to act as a secondary database read buffer. If possible, the expected working set, those data accessed frequently during average processing periods, should fit as much as possible into the storage system's (and server's) cache. In this case, an enterprise storage system is usually configured with more than 4GB of cache. If large caches are not available on the storage system, then additional disk mechanisms will be needed to make up the performance required.
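The following rough arithmetic sketch, with purely hypothetical numbers, shows the kind of reasoning involved: whatever part of the working set does not fit in the server's own database buffers is what the storage system's read cache should ideally absorb.

# Rough sizing sketch with purely hypothetical numbers, not vendor guidance.
working_set_gb = 12     # data touched during an average processing period (assumed)
db_buffer_gb = 6        # database buffer cache available on the server (assumed)
write_cache_gb = 2      # portion of the array cache reserved for writes (assumed)

# The read cache should cover the part of the working set the server cannot hold.
read_cache_needed_gb = max(0, working_set_gb - db_buffer_gb)
total_array_cache_gb = read_cache_needed_gb + write_cache_gb
print("Suggested array cache: about", total_array_cache_gb, "GB")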
Write Cache Modes
The write-cache for a disk array controller can be set to two different modes: write-through or write-back. The write-through version is the safest but also slowest because the write-I/O request isn't complete until the data is physically transferred to the disk device, impacting the performance of the database server. This is typically an option on internal DACs found in Intel-based systems.
Most external storage systems, however, use the write-back (asynchronous or copy-back) method. It is the fastest method because the write-I/O request is completed once the data is in the storage system's cache; it doesn't wait for the data to transfer down to the slower disk device. The write-back cache setting helps with RAID 5 by taking the slower write-I/O part of the read-modify-write cycle offline. The application continues processing, thinking that the actual write-I/O is complete. The write-I/Os, however, are flushed to the disk asynchronously, or at a later time, when disk I/O activity is lower. Due to very large caches in enterprise storage systems, there is typically very little difference between RAID 1 and RAID 5 performance from the database server's front-end point of view.
Mirrored Caches
Although using write-back caches has great performance benefits, doing so increases the risk to data integrity in case of a storage system failure, whether power related or other. Because the updated data is flushed to the physical disk at a later time, the disks may not be consistent with what the application thinks they should be at the point of failure. Not only should a battery be used to back up the storage system cache but also the cache should be mirrored. Without the mirrored cache and the battery, there is a higher risk of data integrity problems when using write-back cache settings. Only entry-level disk storage systems do not have a mirrored and battery backed-up cache.
Data Replication
Another way to protect data is to make additional copies of the disk volumes. This solution is often used for making nonimpact backups for disaster recovery; for making the production data available for testing, reporting, data extraction, versioning, or nondisruptive upgrades; or for sharing data with other systems without impacting the production system's disk performance. It is also commonly referred to as the point-in-time copy feature.
Physical Copy Method
The traditional method of implementing data replication is to make an additional physical copy of the primary disk volumes onto separate physical disks across the data bus of the enterprise storage system, as shown in Figure 4-7. Both the primary and copy volumes are protected with a RAID level. This feature is offered in several enterprise storage solutions, such as Business Copy on the HP SureStore E Disk System XP series and Business Continuance Volume (BCV) on EMC's Symmetrix series. It is commonly used in production SAP installations.
Figure 4-7. Additional Copy Volume(s)
Because a real physical copy of the primary disks is made, it takes some time to synchronize the data stored on the disks. Typically, a point in time is chosen to split the volumes from each other, which requires that storage system's cache be synchronized with the copy disk volume (a few moments of sequential I/O). Once the split occurs, activity on the database can resume and the system is immediately productive. When the additional mirror volume is rejoined later to the primary disk volume, then a physical copy or resynchronization occurs. How long this takes depends on how much data was changed since the last split, as well as on how fast the disks and internal data buses are. This could be a matter of minutes to an hour or two.
A physical copy also means that the raw physical capacity needed in the storage system is more than what would normally be needed only for the primary disk volumes. For example, if 24 18GB disks are needed just to hold the data (~432GB), 48 disks will be needed if using RAID 1 for the primary volume. Then 48 more disks would be needed for the additional copy volume, if using RAID 1. That is a total of 96 18GB disks needed (not including hot spares), or 1.7TB of raw disk capacity to hold 432GB of usable production disk space.
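The arithmetic behind that example can be written out as a short sketch; the disk size and counts are the ones used above.

disk_gb = 18
data_disks = 24                       # disks needed just to hold the data
usable_gb = data_disks * disk_gb      # ~432GB of production data

primary_disks = data_disks * 2        # RAID 1 doubles the primary volume: 48 disks
copy_disks = primary_disks            # the physical copy volume needs 48 more
total_disks = primary_disks + copy_disks
raw_tb = total_disks * disk_gb / 1024

print(usable_gb, "GB usable on", total_disks, "disks =", round(raw_tb, 1), "TB raw")
# 432 GB usable on 96 disks = 1.7 TB raw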
One of the benefits of an additional physical copy of the primary disks is that a separate set of disks helps with disaster recovery scenarios when they are located in another storage cabinet in a remote data center. The drawback is that multiple copies of the disk volumes demand more physical disks, which can be expensive. An additional benefit is that the production data disks are separate from all others, thus there is no performance impact due to sharing of disk drives.
Making multiple physical copies of the disk volumes is also possible through the OS via LVM or equivalent and, as such, is a software RAID solution. Veritas offers such a software file system solution, for example. This may have a performance impact on the SAP database server, so it should be considered carefully before implementation.
Snapshot Copy Method
An alternative implementation of additional mirrored volumes within an enterprise storage system employs the method of making a snapshot of the primary disk volume. The data on the disk from which the snapshot was taken is no longer changed but only referenced. Any changes to the primary disk volume after the snapshot are placed in an additional internal volume set. Additional copy volumes are established as new virtual disk volumes, simply referencing the original disks from which a snapshot of the data was made. No physical mirror copy of data onto separate disks is made, saving time and redundant physical disk space (referred to as data compression). Although a cache flush still needs to be made for the virtual split or copy to occur, the resynchronization phase afterwards is no longer needed. Snapshots can be taken at any time forward, because all the data is virtually managed on the available disks.
This solution does have an impact on performance, however. When using a copy volume, the desire is often to have zero impact on the primary database volumes. A virtual split still requires reading the data on the primary volumes, which may already be under heavy I/O load from the production application, leading to contention and possibly higher database response times. It is limited to specific storage vendors (e.g., Storage Tek SVA), although similar solutions from other vendors are planned.