Monday, November 2, 2009

10.4 Recovering from Disasters



[ Team LiB ]






10.4 Recovering from Disasters



Despite
the prevalence of redundant or protected disk storage, media failures
can and do occur. In cases in which one or more Oracle datafiles are
lost due to disk failure, you must use database
backups to recover
the lost data.



There are times when simple human or machine error can also lead to
the loss of data, just as a media failure can. For example, an
administrator may accidentally delete a datafile, or an I/O subsystem
may malfunction, corrupting data on the disks. The key to being
prepared to handle these types of disasters is implementing a good
backup and recovery strategy.




10.4.1 Developing a Backup and Recovery Strategy



Proper
development,
documentation, and testing of your backup and recovery strategy is
one of the most important activities in implementing an Oracle
database. You must test every phase of the backup and recovery
process to ensure that the entire process works, because once a
disaster hits the complete recovery process must
work flawlessly.



Some companies test the backup procedure but fail to actually test
recovery using the backups taken. Only when a failure requires the
use of the backups do companies discover that the backups in place
were unusable for some reason. It's critical to test
the entire cycle from backup through restore and recovery.





10.4.2 Taking Oracle Backups



Two basic types of backups are available
with Oracle:




Hot backup



The datafiles for
one or more tablespaces are backed up while the database is active.




Cold backup



The database is
shut down and all the datafiles, redo log files, and control files
are backed up.





With a hot backup, not all of the datafiles must be backed up at
once. For instance, you may want to back up a different group of
datafiles each night. You must be sure to keep backups of the
archived redo logs that date back to your oldest backed-up datafile,
because you'll need them if you have to implement
rollforward recovery from the time of that oldest datafile backup.



Some DBAs with very large databases back up the various datafiles
over several runs. Some DBAs back up the datafiles that contain data
subject to frequent changes more frequently (for example, daily), and
back up datafiles containing more static data less often (for
example, weekly). There are commands to back up the control file as
well; this should be done after all the datafiles have been backed
up.



If the database isn't archiving redo logs (this is
known as running in NOARCHIVELOG mode and is described in
Chapter 2), you can only take complete cold backups. If the database
is archiving redo logs, it can be backed up while running.



Regardless of backup type, you should also
back up the INIT.ORA or
SPFILE
file and
password files�these are key files for
the operation of your Oracle database. While not required, you should
also back up the various scripts used to create and further develop
the database. These scripts represent an important part of the
documentation of the structure and evolution of the database.



For more information about the different types of backups and
variations on these types, please refer to your Oracle documentation
as well as the third-party books listed in Appendix B.





10.4.3 Using Backups to Recover



Two basic types of
recovery are possible with Oracle, based on
whether or not you are archiving the redo logs:




Complete database recovery



If the database did not archive
redo logs, only a complete cold backup is
possible. Correspondingly, only a complete database recovery can
be performed. You restore the database files, redo logs, and control
files from the backup. The database is essentially restored as of the
time of the backup. All work done since the time of the backup is
lost and a complete recovery must be performed even if only one of
the datafiles is damaged. The potential for lost work, coupled with
the need to restore the entire database to correct partial failure,
are reasons most shops avoid this situation by running their
databases in ARCHIVELOG
mode. Figure 10-8 illustrates backup and recovery for a database
without archived redo logs.






Figure 10-8. Database backup and recovery without archived redo logs




Partial or targeted restore and rollforward recovery



When you're
running the Oracle database in ARCHIVELOG mode, you can restore only
the damaged datafile(s) and can apply redo log information from the
time the backup was taken to the point of failure. The archived and
online redo logs reproduce all the changes to the restored datafiles
to bring them up to the same point in time as the rest of the
database. This procedure minimizes the time for the restore and
recovery operations. Partial recovery like this
can be done with the database down. Alternatively, the affected
tablespace(s) can be placed offline and recovery can be performed
with the rest of the database available.
Oracle9i
improved the granularity of the recovery process by also enabling
restore and recovery of individual data blocks instead of providing
restore and recovery only of entire datafiles. Figure 10-9 illustrates backup and recovery with archived redo logs.






Figure 10-9. Database backup and recovery with archived redo logs



Obviously, the redo logs are extremely important. Oracle first
enabled analysis of these files through the LogMiner tool in
Oracle8i. Since
Oracle9i,
the LogMiner is accessible through an Oracle Enterprise Manager GUI,
and it provides log
analysis for all datatypes. If the redo log has become corrupted, the
LogMiner can now read past corrupted records as desired in order to
analyze the impact on transactions after the corruption.





10.4.4 Read-Only Tablespaces



Oracle 7.3 introduced
read-only tablespaces. Using the
ALTER TABLESPACE statement in SQL, you can mark a tablespace as
read-only. No changes are possible to the objects stored in a
read-only tablespace. You can toggle a tablespace between read/write
and read-only states as you wish.



Once a tablespace is in read-only mode, it can be backed up once and
doesn't have to be backed up again, because its
contents cannot change unless it's placed in
read/write mode. Marking a tablespace as read-only allows entire
sections of a database to be marked read-only, backed up once, and
excluded from regular backups thereafter.



If a datafile of a read-only tablespace is damaged, you
can restore it directly from the backup without any recovery. Because
no changes were made to the datafiles, no redo log information needs
to be applied. For databases with significant static or historical
data, this option can significantly simplify and streamline backup
and restore operations.



Read-only tablespaces, combined with Oracle's
ability to partition a table on a range or list of column values (for
example, a date) provide powerful support for the rolling windows
common to data warehouses (described in Chapter 9).
Once a new month's data is loaded, indexed, and so
on, the relevant tablespaces can be marked read-only and backed up
once, removing the tablespaces datafile(s) from the cycle of ongoing
backup and significantly reducing the time required for those backup
operations.





10.4.5 Point-in-Time Recovery



Oracle
7.3 introduced point-in-time recovery (PITR) for the
entire database. Point-in-time recovery allows a DBA to restore the
datafiles for the database and apply redo information up to a
specific time or System Change Number (SCN). This limited type of
recovery is useful for cases in which an error occurred�for
example, if a table was dropped accidentally or a large number of
rows were deleted incorrectly. The DBA can restore the database to
the point in time just prior to the event to undo the results of the
mistake.



A difficulty with database-level point-in-time recovery is that the
entire database has to be restored. In response to this limitation,
Oracle8 introduced point-in-time
recovery at the tablespace level within the database. Point-in-time
recovery based on a tablespace allows a DBA to restore and recover a
specific tablespace or set of tablespaces to a particular point in
time. Only the tablespace(s) containing the desired objects need to
be recovered. This is a very useful improvement given the
ever-increasing size of today's databases.



You should use this tablespace feature carefully, because objects in
one tablespace may have dependencies, such as referential integrity
constraints, on objects in other tablespaces. For example, suppose
that Tablespace1 contains the EMP table and Tablespace2 contains the
DEPT table, and a foreign key constraint links these two tables
together for referential integrity. If you were to recover
Tablespace2 to an earlier point than Tablespace1, you might find that
you had rows in the EMP table that contained an invalid foreign key
value, because the matching primary key entry in the DEPT table had
not been rolled forward to the place where the primary key value to
which the EMP table refers had been added.





10.4.6 Flashback



Oracle9i introduced a new recovery option called
flashback, which was designed to help to recovery from user errors.



The concept behind flashback recovery is simple. You can execute a
query against the database as of a particular time or system change
number (SCN). Oracle delivers the result set as it would have
appeared if the query were run at that time, using the undo log
information segments to reconstruct the data, which can then be used
to correct the results of the errant action.



Oracle Database 10g includes a much wider range
of flashback options, including:




FLASHBACK DATABASE



Returns (rolls back) the entire database to a particular point in
time. Can be used instead of point-in-time recovery in some
situations.




FLASHBACK TABLE



Returns a specific table to a specific point in time.




FLASHBACK TRANSACTION



Returns all the changes made by one specific transaction.




FLASHBACK DROP



Rolls back a drop operation. When an object is dropped, it is placed
in a Recycle Bin, so a user can simply un-drop the object to restore
it.




SELECT Flashback clauses



New clauses in the SELECT statement return all the versions of rows
(i.e., show changes to the rows) in a particular query over a span of
time.







10.4.7 Recovery Manager



Recovery Manager (RMAN) debuted with
Oracle8 and provides server-managed backup and recovery. RMAN
includes the ability to perform and track backup and recovery
operations. RMAN does the following:



  • Backs up one or more datafiles to disk or tape

  • Backs up archived redo logs to disk or tape

  • Restores datafiles from disk or tape

  • Restores and applies archived redo logs to perform recovery

  • Automatically parallelizes both the reading and writing of the
    various Oracle files being backed up


RMAN performs the backup operations and updates a catalog, which is
stored in an Oracle database with the details of what backups were
taken and where they were stored. You can query this catalog for
critical information, such as datafiles that have not been backed up
or datafiles whose backups have been invalidated through NOLOGGING
operations performed on objects contained in those datafiles.



RMAN also uses the catalog to perform
incremental backups. RMAN will back up only database blocks that have
changed since the last backup. When RMAN backs up only the individual
changed blocks in the database, the overall backup and recovery time
can be significantly reduced for databases in which a small
percentage of the data in large tables changes. With Oracle Database
10g, RMAN can apply incremental backups to an
image backup of the database; it also provides enhanced performance
for incremental backups.



Another major advantage RMAN offers is to make it simpler to perform
hot or online backups. RMAN also reduces the overhead required to
make online backups.



Prior to RMAN (in Oracle7), you took hot backups by issuing ALTER
TABLESPACE BEGIN BACKUP statements for the tablespace whose datafiles
were to be backed up; you backed up the datafiles using operating
system commands and issuing ALTER TABLESPACE END BACKUP statements.
Oracle continued to write blocks to the datafiles while they were
being backed up, and there was no restriction on user activity. This
meant that the datafile backups could contain blocks from different
points in time, because the backup process itself occurred over time
and blocks were being written to the datafiles during this time.



Consider the following example. The operating system process reading
the datafile reads the first operating system block within an Oracle
block composed of two operating system blocks and writes it to tape
at Time T. At T+1, Oracle updates the database block in the datafile.
Both operating system blocks in the datafile on disk are now
consistent as of Time T+1. The operating system process for the
backup then reads and backs up the second operating system block as
of T+1. The datafile backup now contains a
"fuzzy" database block. The two
operating system blocks that make up the Oracle block in the backup
are from different points in time�the first operating system
block is as of Time T, and the second is as of Time T+1.



To address this potential inconsistency, Oracle7 backups
automatically included extra redo information to correct this
situation for any tablespace for which an ALTER TABLESPACE BEGIN
BACKUP had been issued. In Oracle Database 10g,
you can issue a single ALTER DATABASE statement to put all of the
tablespaces in a database into hot backup mode.



RMAN is an Oracle process in Oracle8 and more recent releases that
reads and writes Oracle blocks, not operating system blocks. While
RMAN is backing up a datafile Oracle blocks can be written to it, but
RMAN will read and write in consistent Oracle blocks, not operating
system blocks within an Oracle block. This removes the possibility of
fuzzy Oracle blocks in the datafile backups and therefore removes the
need for the ALTER TABLESPACE statements and the additional redo.



In Oracle Database 10g, RMAN is used to support
automated disk-based backup. Disk-based strategies have an advantage
over tape: they enable random access to any data such that only
changes need be backed up or recovered. RMAN can be set up to run a
backup job to disk at a specific time. RMAN will manage the deletion
of backup files that are no longer necessary. Future releases promise
to integrate these automatic backups with a maintenance window of
time, which can minimize the impact of backup operations by running
them when resources are readily available.








    [ Team LiB ]



    No comments: