10.4 Recovering from Disasters
Despite the prevalence of redundant or protected disk storage, media failures can and do occur. In cases in which one or more Oracle datafiles are lost due to disk failure, you must use database backups to recover the lost data.
There are times when simple human or machine error can also lead to the loss of data, just as a media failure can. For example, an administrator may accidentally delete a datafile, or an I/O subsystem may malfunction, corrupting data on the disks. The key to being prepared to handle these types of disasters is implementing a good backup and recovery strategy.
10.4.1 Developing a Backup and Recovery Strategy
Proper development, documentation, and testing of your backup and recovery strategy is one of the most important activities in implementing an Oracle database. You must test every phase of the backup and recovery process to ensure that the entire process works, because once a disaster hits the complete recovery process must work flawlessly.
Some companies test the backup procedure but fail to actually test recovery using the backups taken. Only when a failure requires the use of the backups do companies discover that the backups in place were unusable for some reason. It's critical to test the entire cycle from backup through restore and recovery.
10.4.2 Taking Oracle Backups
Two basic types of backups are available with Oracle:
- Hot backup
The datafiles for one or more tablespaces are backed up while the database is active.
- Cold backup
The database is shut down and all the datafiles, redo log files, and control files are backed up.
With a hot backup, not all of the datafiles must be backed up at once. For instance, you may want to back up a different group of datafiles each night. You must be sure to keep backups of the archived redo logs that date back to your oldest backed-up datafile, because you'll need them if you have to implement rollforward recovery from the time of that oldest datafile backup.
Some DBAs with very large databases back up the various datafiles over several runs. Some DBAs back up the datafiles that contain data subject to frequent changes more frequently (for example, daily), and back up datafiles containing more static data less often (for example, weekly). There are commands to back up the control file as well; this should be done after all the datafiles have been backed up.
If the database isn't archiving redo logs (this is known as running in NOARCHIVELOG mode and is described in Chapter 2), you can only take complete cold backups. If the database is archiving redo logs, it can be backed up while running.
Regardless of backup type, you should also back up the INIT.ORA or SPFILE file and password files�these are key files for the operation of your Oracle database. While not required, you should also back up the various scripts used to create and further develop the database. These scripts represent an important part of the documentation of the structure and evolution of the database.
For more information about the different types of backups and variations on these types, please refer to your Oracle documentation as well as the third-party books listed in Appendix B.
10.4.3 Using Backups to Recover
Two basic types of recovery are possible with Oracle, based on whether or not you are archiving the redo logs:
- Complete database recovery
If the database did not archive redo logs, only a complete cold backup is possible. Correspondingly, only a complete database recovery can be performed. You restore the database files, redo logs, and control files from the backup. The database is essentially restored as of the time of the backup. All work done since the time of the backup is lost and a complete recovery must be performed even if only one of the datafiles is damaged. The potential for lost work, coupled with the need to restore the entire database to correct partial failure, are reasons most shops avoid this situation by running their databases in ARCHIVELOG mode. Figure 10-8 illustrates backup and recovery for a database without archived redo logs.
- Partial or targeted restore and rollforward recovery
When you're running the Oracle database in ARCHIVELOG mode, you can restore only the damaged datafile(s) and can apply redo log information from the time the backup was taken to the point of failure. The archived and online redo logs reproduce all the changes to the restored datafiles to bring them up to the same point in time as the rest of the database. This procedure minimizes the time for the restore and recovery operations. Partial recovery like this can be done with the database down. Alternatively, the affected tablespace(s) can be placed offline and recovery can be performed with the rest of the database available. Oracle9i improved the granularity of the recovery process by also enabling restore and recovery of individual data blocks instead of providing restore and recovery only of entire datafiles. Figure 10-9 illustrates backup and recovery with archived redo logs.
Obviously, the redo logs are extremely important. Oracle first enabled analysis of these files through the LogMiner tool in Oracle8i. Since Oracle9i, the LogMiner is accessible through an Oracle Enterprise Manager GUI, and it provides log analysis for all datatypes. If the redo log has become corrupted, the LogMiner can now read past corrupted records as desired in order to analyze the impact on transactions after the corruption.
10.4.4 Read-Only Tablespaces
Oracle 7.3 introduced read-only tablespaces. Using the ALTER TABLESPACE statement in SQL, you can mark a tablespace as read-only. No changes are possible to the objects stored in a read-only tablespace. You can toggle a tablespace between read/write and read-only states as you wish.
Once a tablespace is in read-only mode, it can be backed up once and doesn't have to be backed up again, because its contents cannot change unless it's placed in read/write mode. Marking a tablespace as read-only allows entire sections of a database to be marked read-only, backed up once, and excluded from regular backups thereafter.
If a datafile of a read-only tablespace is damaged, you can restore it directly from the backup without any recovery. Because no changes were made to the datafiles, no redo log information needs to be applied. For databases with significant static or historical data, this option can significantly simplify and streamline backup and restore operations.
Read-only tablespaces, combined with Oracle's ability to partition a table on a range or list of column values (for example, a date) provide powerful support for the rolling windows common to data warehouses (described in Chapter 9). Once a new month's data is loaded, indexed, and so on, the relevant tablespaces can be marked read-only and backed up once, removing the tablespaces datafile(s) from the cycle of ongoing backup and significantly reducing the time required for those backup operations.
10.4.5 Point-in-Time Recovery
Oracle 7.3 introduced point-in-time recovery (PITR) for the entire database. Point-in-time recovery allows a DBA to restore the datafiles for the database and apply redo information up to a specific time or System Change Number (SCN). This limited type of recovery is useful for cases in which an error occurred�for example, if a table was dropped accidentally or a large number of rows were deleted incorrectly. The DBA can restore the database to the point in time just prior to the event to undo the results of the mistake.
A difficulty with database-level point-in-time recovery is that the entire database has to be restored. In response to this limitation, Oracle8 introduced point-in-time recovery at the tablespace level within the database. Point-in-time recovery based on a tablespace allows a DBA to restore and recover a specific tablespace or set of tablespaces to a particular point in time. Only the tablespace(s) containing the desired objects need to be recovered. This is a very useful improvement given the ever-increasing size of today's databases.
You should use this tablespace feature carefully, because objects in one tablespace may have dependencies, such as referential integrity constraints, on objects in other tablespaces. For example, suppose that Tablespace1 contains the EMP table and Tablespace2 contains the DEPT table, and a foreign key constraint links these two tables together for referential integrity. If you were to recover Tablespace2 to an earlier point than Tablespace1, you might find that you had rows in the EMP table that contained an invalid foreign key value, because the matching primary key entry in the DEPT table had not been rolled forward to the place where the primary key value to which the EMP table refers had been added.
10.4.6 Flashback
Oracle9i introduced a new recovery option called flashback, which was designed to help to recovery from user errors.
The concept behind flashback recovery is simple. You can execute a query against the database as of a particular time or system change number (SCN). Oracle delivers the result set as it would have appeared if the query were run at that time, using the undo log information segments to reconstruct the data, which can then be used to correct the results of the errant action.
Oracle Database 10g includes a much wider range of flashback options, including:
- FLASHBACK DATABASE
Returns (rolls back) the entire database to a particular point in time. Can be used instead of point-in-time recovery in some situations.
- FLASHBACK TABLE
Returns a specific table to a specific point in time.
- FLASHBACK TRANSACTION
Returns all the changes made by one specific transaction.
- FLASHBACK DROP
Rolls back a drop operation. When an object is dropped, it is placed in a Recycle Bin, so a user can simply un-drop the object to restore it.
- SELECT Flashback clauses
New clauses in the SELECT statement return all the versions of rows (i.e., show changes to the rows) in a particular query over a span of time.
10.4.7 Recovery Manager
Recovery Manager (RMAN) debuted with Oracle8 and provides server-managed backup and recovery. RMAN includes the ability to perform and track backup and recovery operations. RMAN does the following:
Backs up one or more datafiles to disk or tape Backs up archived redo logs to disk or tape Restores datafiles from disk or tape Restores and applies archived redo logs to perform recovery Automatically parallelizes both the reading and writing of the various Oracle files being backed up
RMAN performs the backup operations and updates a catalog, which is stored in an Oracle database with the details of what backups were taken and where they were stored. You can query this catalog for critical information, such as datafiles that have not been backed up or datafiles whose backups have been invalidated through NOLOGGING operations performed on objects contained in those datafiles.
RMAN also uses the catalog to perform incremental backups. RMAN will back up only database blocks that have changed since the last backup. When RMAN backs up only the individual changed blocks in the database, the overall backup and recovery time can be significantly reduced for databases in which a small percentage of the data in large tables changes. With Oracle Database 10g, RMAN can apply incremental backups to an image backup of the database; it also provides enhanced performance for incremental backups.
Another major advantage RMAN offers is to make it simpler to perform hot or online backups. RMAN also reduces the overhead required to make online backups.
Prior to RMAN (in Oracle7), you took hot backups by issuing ALTER TABLESPACE BEGIN BACKUP statements for the tablespace whose datafiles were to be backed up; you backed up the datafiles using operating system commands and issuing ALTER TABLESPACE END BACKUP statements. Oracle continued to write blocks to the datafiles while they were being backed up, and there was no restriction on user activity. This meant that the datafile backups could contain blocks from different points in time, because the backup process itself occurred over time and blocks were being written to the datafiles during this time.
Consider the following example. The operating system process reading the datafile reads the first operating system block within an Oracle block composed of two operating system blocks and writes it to tape at Time T. At T+1, Oracle updates the database block in the datafile. Both operating system blocks in the datafile on disk are now consistent as of Time T+1. The operating system process for the backup then reads and backs up the second operating system block as of T+1. The datafile backup now contains a "fuzzy" database block. The two operating system blocks that make up the Oracle block in the backup are from different points in time�the first operating system block is as of Time T, and the second is as of Time T+1.
To address this potential inconsistency, Oracle7 backups automatically included extra redo information to correct this situation for any tablespace for which an ALTER TABLESPACE BEGIN BACKUP had been issued. In Oracle Database 10g, you can issue a single ALTER DATABASE statement to put all of the tablespaces in a database into hot backup mode.
RMAN is an Oracle process in Oracle8 and more recent releases that reads and writes Oracle blocks, not operating system blocks. While RMAN is backing up a datafile Oracle blocks can be written to it, but RMAN will read and write in consistent Oracle blocks, not operating system blocks within an Oracle block. This removes the possibility of fuzzy Oracle blocks in the datafile backups and therefore removes the need for the ALTER TABLESPACE statements and the additional redo.
In Oracle Database 10g, RMAN is used to support automated disk-based backup. Disk-based strategies have an advantage over tape: they enable random access to any data such that only changes need be backed up or recovered. RMAN can be set up to run a backup job to disk at a specific time. RMAN will manage the deletion of backup files that are no longer necessary. Future releases promise to integrate these automatic backups with a maintenance window of time, which can minimize the impact of backup operations by running them when resources are readily available.
|
No comments:
Post a Comment