Bulletproofing, Backups, and Disaster Recovery Scenarios

(Last Updated 06/14/05)

                                                                    From http://www.uaex.edu/srea/bkupreco.htm

Introduction      [Contents]      (Updated 4/13/05)

At the end of the day one Monday, I ran one of my scripts to show our web usage for that day, but, instead of the listing I expected, it gave me a batch of UGLY Oracle error messages:

ORA-01115: IO error reading block from file 13 (block # 66771)
ORA-01110: data file 13: '/ndxs/oradata/PROD/medi_PROD_01.dbf'
ORA-27091: skgfqio: unable to queue I/O
ORA-27072: skgfdisp: I/O error
IBM AIX RISC System/6000 Error: 5: I/O error
Additional information: 66770

which meant that it got a read error on one of our index datafiles (file 13, at that - a bit of triskaidekaphobia!).   I did an "ls" command on the directory containing our indexes, and got an equally frightening message:

$ ls -ltr /ndxs/oradata/PROD
ls: /ndxs/oradata/PROD: There is an input or output error.
total 0

which indicated that it couldn't find ANY of our index datafiles ("total 0").  Apparently our RAID drive with the indexes directory had gone south, so I shut down the database to await diagnostics and repair.  To top it off, our payroll group was scheduled to begin their run to get the deposits and checks out by the end of the week.  The next day, when there was no quick resolution, contingency plans were being discussed for doing the payroll checks by hand.  WHAT DO WE DO NOW?  Round up some typewriters?  I don't think so!  I'll give you our resolution at the end, but, in the meantime ...

How vulnerable is your Oracle database? What can you do to protect it? And, if those efforts fail, how do you get your Oracle database back in working order with minimal loss of data if disaster strikes?  This presentation shows you step-by-step how to bulletproof your database as much as possible against data loss, gives you the what's and how's of database backups and restores, and takes you through disaster recovery scenarios if you encounter those heart-stopping Oracle errors on database startup, such as datafile corruption or loss, disk drive loss, and other nasties. This is geared for Oracle 9i (and 8i for the most part) and UNIX installations.

Contents      [Top]

Bulletproofing
    Shutting Down and Starting Up Your Database
    Notes
    Enabling Archiving
    Moving Datafiles
    Adding Redo Log Members
    Adding Redo Log Groups
    Multiplexing Control Files
    Multiplexing Archive Log Files
Backups
    What To Back Up
    Cold Backups
    Hot Backups
    Other Nightly Processing
    Archiving To Tape
Disaster Recovery Scenarios
    Things To Check
    What To Restore
    Backup/Restore Using Unix cpio Command
    Disaster Recovery Overview
    Basic Recover Database Recovery Option
    Basic Recover Datafile Recovery Option
    Basic Recover Tablespace Recovery Option
    Archivelogs Disk Volume Filled Up
    Loss of Control Files
    Loss of TEMP Datafile
    TEMP Datafile Offline
    Loss of INACTIVE Online Redo Log Group
    Loss of CURRENT Online Redo Log Group
    Failure During Hot Backup
    Loss of Server Parameter File (spfile)
    Bonus Topics
        Who is Toon Koppelaars, and why should I care?
        The Ultimate Disaster
    If All Else Fails
The Resolution
Summary
Contributors
References and Sources
Appendix
    Sample Textual Control File
    Cloning a Database
Sample SQL and Unix Shell Scripts
    Our Backup Scripts

Bulletproofing      [Top]  [Contents]

Yes, you really can bulletproof your database so that it is always up and available and you never lose any data from almost any disaster scenario - for a price, whether in dollars, storage, servers, and/or bandwidth. You can do things such as running parallel servers in separate locations with standby databases for quick switchovers using Oracle's Data Guard package, along with using other options that Oracle and others provide. For the rest of us with or without large budgets and staff, there are things that we, too, can do to lessen the possibility of data loss, minimize downtime and make our databases more bulletproof, such as those described below:

  1. Enable archivelog mode (to reapply the changes during recovery; which is required by most of the disaster recovery scenarios detailed below).
  2. Separate the archive logs from the redo logs (allocating them to separate drives; likewise for the following items).
  3. Separate the redo logs and archive logs from the datafiles.
  4. Multiplex (mirror) the redo log groups and members.
  5. Multiplex (mirror) the control file.
  6. Multiplex (mirror) the archive log files.

Enable archivelog mode so that the old redo logs, which contain the changes to the database that can be reapplied during datafile recovery, will be saved to another disk directory after a redo log switch before Oracle overwrites them when it cycles back around.  Most of the disaster recovery scenarios require that archiving is turned on and that those archive logs are available on-line. Doing hot backups while the database is running also requires archiving.

Separate the archive logs directory from the redo logs, on different disk drives (preferably on separate drive controllers), to reduce disk contention and speed up writing when the old redo logs are being archived while the next redo logs are being written, so that cycling back around doesn't freeze the database momentarily until the archiving has completed. (Having more than two redo log groups will also reduce the possibility of freezing.)

Separate the redo logs and archive logs from the datafiles on different disk drives to protect from data loss on a single disk crash (a media failure), also on separate controllers. If the datafiles' disk is lost, today's changes are still in the archive logs. If the archive logs' disk is lost, today's changes are still in the datafiles.

Have multiple redo log group members on separate disk drives (called mirroring or multiplexing) to keep multiple copies of the changes currently being made to the database so that, if one disk copy is lost, the other disk copy(s) will be used to keep the database running while you fix that lost disk, with those drives also on separate controllers.

Have multiple control files, which contain the list of datafile pathnames and the current system change number to synchronize all of those datafiles, mirrored on separate disk drives, to allow you to restart the database if one of those copies is lost, with those drives also on separate controllers. Keeping a separate up-to-date textual control file, produced by "alter database backup controlfile to trace;", will also allow you to restart the database in case all control files are lost. Note that Oracle only reads the first control file listed in the init.ora file (or as listed in the "show parameter control_files" results, for when you are using an spfile (server parameter file) instead of an init.ora file), although it updates all of them in parallel as changes are made.

Have multiple archive log directories, mirroring them on separate disk drives, to protect the set of archive logs from media failure (multiplexing of archive logs can be done through Oracle in 8i and beyond). If you had a dedicated tape drive, you could have a cron job write those new files to tape periodically. This is not the same as archiving directly to tape, but could be used to get around the problems inherent with direct tape archiving.

Some of the above is discussed in the Oracle9i Database Administrator's Guide (chapters 6 through 8, and chapter 12).

Shutting Down and Starting Up Your Database      [Top]  [Contents]

If you've never worked as an Oracle Database Administrator before, some basics that you need to know are how to position to the correct database and how to shut down and start up your database.  Each Oracle database is identified by an "Oracle SID" (System Identifier), such as PROD or TEST.  To position to the PROD database in UNIX (shown with a dollar prompt here), you would enter:

$ . oraenv   (that's a dot and a space before oraenv)
    PROD   (when prompted for the ORACLE_SID)

Or, in NT (shown with a c:\> prompt here), you would enter:

c:\> set ORACLE_SID=PROD
c:\> set ORACLE_HOME=d:\oracle\v9204    (wherever your Oracle product directory is located)

To shut down or start up your database, you will first have to connect to your database as a system database administrator (sysdba) user (from a UNIX or NT dba group user login, such as oracle).  Prior to Oracle 9i, you could use Server Manager (using the "svrmgrl" command or version equivalent), or the line-mode SQL*DBA (using the "sqldba lmode=y" command or version equivalent), but, in Oracle 8.1.5 and later, Oracle moved the Server Manager commands to SQL*Plus (using the "sqlplus" command or version equivalent).  So, for Oracle 8.1.5 and above on UNIX, to connect to your database as sysdba, you would enter:

$ sqlplus "/ as sysdba"    (or, if already in sqlplus:  SQL> connect / as sysdba)

Or, prior to Oracle 9i, to use Server Manager, you would enter:

$ svrmgrl
SVRMGRL> connect internal

For the remaining examples, Oracle 9i on UNIX will be used.

To shut down the database (connected as sysdba), you would enter the following, which kills any active sessions and rolls back any pending database changes:

SQL> shutdown immediate

If "shutdown immediate" fails or hangs, bring up another sysdba session and enter:

SQL> shutdown abort
SQL> startup
SQL> shutdown immediate

When an Oracle database starts up, it goes through three stages.  The nomount stage reads the init.ora file (or spfile), allocates SGA memory, and starts the background processes, such as pmon (process monitor).  The mount stage opens the control files.  The open stage opens the datafiles and online redo logs, and performs crash recovery if needed.  Most times you will be just issuing a startup command, which goes through the three stages and opens the database.  However, various database recovery scenarios and database administration tasks require commands to be entered at a certain state.  To get to or step through each state (connected as sysdba), you would enter the following:

nomount state:
    SQL> startup nomount    (from shutdown state)
mount state (use just one of these):
    SQL> startup mount    (from shutdown state)
    SQL> alter database mount;    (from nomount state)
open state (use just one of these):
    SQL> startup [open]    (from shutdown state)
    SQL> alter database open;    (from nomount or mount state)

Notes:      [Top]   [Contents]

"$" assumes that you are at the unix prompt, and have exited out of svrmgr or sqlplus (by typing "exit" at their prompts) if the prior command was a svrmgr or sqlplus command.

"SVRMGR>" assumes that you are in the line-mode version of Server Manager, which is entered by typing "svrmgrl" at the unix prompt.  Similar commands can be executed from the line-mode version of SQL*DBA, which is entered by typing "sqldba lmode=y" at the unix prompt.  References to "svrmgr" mean through the "svrmgrl" command.

"SQL>" assumes that you are in SQL*PLUS, usually either connected as sysdba (connect / as sysdba) or logged in as user "system", which is entered by typing "sqlplus" at the unix prompt (but, if you don't know that, you should take a basic course in SQL before attempting the instructions given here).

"!" is the same thing as "host" at the SQL> prompt or SVRMGR> prompt, so, for example, you could use either "!mv" or "host mv" to move a file while you are in sqlplus or Server Manager on a UNIX system.  In NT, you would use "$"  at the SQL> prompt instead of "!" to execute a host command.

In all of these examples, replace "PROD" with your own Oracle SID name, and replace the data file names with your own names, and, for the most part, run these commands from the unix userid "oracle".  Also, replace "password" with the user id's password on your site.

When using the export command, be aware that changes to related tables during the export may cause inconsistencies. To prevent this, especially during full database exports, either make sure that no users are on and "alter system enable restricted session;" has been entered, or use the "consistent = Y" parameter to the exp line mode command.

Finally, ALWAYS make sure you have a good current FULL BACKUP of your database before using these scripts and commands. Some of them have the potential to delete or corrupt large amounts of your precious data if you don't do them just right. Don't say you haven't been warned!

Enabling Archiving      [Top]  [Contents]

To enable archivelog mode so that Oracle saves the old redo logs to disk after a log switch before overwriting them when it cycles back around, perform the following commands. Oracle requires at least two redo logs (log groups) to accommodate archiving, since one can be spooled to the archivelogs directory while the log writer writes the changes currently being made to the database to the other.

$ vi $ORACLE_HOME/dbs/initPROD.ora
Edit init.ora file to contain the archive parameters (directory name and file format and the start flag):
      log_archive_dest = /u01/oradata/PROD/archivelogs/
      log_archive_format = arch_PROD_%S.arc
      log_archive_start = true
("archive log start" can also be used when connected as sysdba to start the arch process, if you don't want it automatically done through init.ora; but, neither of these turn on archivelog mode - they just start the arch process itself.)
$ mkdir /u01/oradata/PROD/archivelogs
$ sqlplus "/ as sysdba"
SQL> shutdown immediate
SQL> startup mount
SQL> alter database archivelog;
Turns on archivelog mode for the database.
SQL> alter database open;
SQL> archive log list
Shows that you are now in archivelog mode, that archiving is enabled, and shows you the current log sequence numbers. You can "alter system switch logfile;" a few times to see archivelog files going into the log_archive_dest directory.
SQL> select * from v$database;

Also shows that you are in archivelog mode.

To disable archivelog mode, similar commands (when connected as sysdba) are used, except for using the "alter database noarchivelog;" command instead (these commands are also performed after the database is shut down).  The log_archive_start parameter in the init.ora file would also need to be set to false to permanently disable the arch process itself.

Moving Datafiles      [Top]  [Contents]

You can move a datafile to another disk, or change its name (such as for restoring a copy of it from your backup tape to another location in the event of a disk failure, or moving it to another disk that has more room for it), by making the move after shutting down the database and using the "alter database rename file" command to tell Oracle where the file has been moved, as in the following example, which moves one of the development datafiles from disk u03 to disk u04:

$ sqlplus "/ as sysdba"
SQL> shutdown immediate
SQL> !mv /u03/oradata/PROD/devl_PROD_01.dbf /u04/oradata/PROD
SQL> startup mount
SQL> alter database rename file '/u03/oradata/PROD/devl_PROD_01.dbf'
            to '/u04/oradata/PROD/devl_PROD_01.dbf';
SQL> alter database open;
SQL> select * from v$datafile;

Lists the current datafiles, showing your changes.

This can also be done without shutting down the database, by instead taking the associated tablespace offline first (which prevents others from accessing that tablespace's tables, indexes, and other data):

$ sqlplus "/ as sysdba"
SQL> alter tablespace development offline;
SQL> !mv /u03/oradata/PROD/devl_PROD_01.dbf /u04/oradata/PROD
SQL> alter database rename file '/u03/oradata/PROD/devl_PROD_01.dbf'
            to '/u04/oradata/PROD/devl_PROD_01.dbf';
SQL> alter tablespace development online;
SQL> select * from v$datafile;

Finally, the textual control file from the "alter database backup controlfile to trace;" command can be used to change the datafile locations, such as to move all datafiles from one disk to another (u03 to u04 in the example below). The textual control file is created in the database's udump directory, which is found by using either of the following commands:

SQL> select value from v$parameter where name = 'user_dump_dest';
SQL> show parameter user_dump

with a name like /u00/oracle/admin/PROD/udump/prod_ora_16060.trc. (A sample textual control file is shown in the Appendix.)

SQL> connect / as sysdba
SQL> alter database backup controlfile to trace;
SQL> show parameter user_dump
SQL> shutdown immediate
SQL> exit
$ mv /u03/oradata/PROD/*PROD*.dbf /u04/oradata/PROD
$ cd /u00/oracle/admin/PROD/udump
$ ls -ltr *.trc
To find the name of the textual control file just created (the last one in the ls listing, such as prod_ora_16060.trc).
$ vi prod_ora_16060.trc
Edit the textual control file and remove all lines before the STARTUP NOMOUNT line (such as deleting lines 1 through 20, for which the vi command is ":1,20d"). Then, edit all of the names in the DATAFILE section to match the new disk names.  Do NOT change the order of the file names! The following command in vi will change "u03" to "u04" in the DATAFILE section:
            :g/DATAFILE/,/;/s/u03/u04/
Finally, comment out (#) the RECOVER command near the end of the file. [Be sure to read the NOTES below.]
$ sqlplus "/ as sysdba"
SQL> @prod_ora_16060.trc
Recreates the control files with the new disk volume names.
SQL> select * from v$datafile;

NOTES:  Be aware that the 9i "alter database backup controlfile to trace;" command creates a textual control file with two "create database" commands. You will use the first set (the one with NORESETLOGS) and edit out the second set, while you are changing the directory and/or file names in the textual control file. Also, be sure to include the ADD TEMPFILE command that may be generated following the OPEN command - that is easy to overlook and get edited out accidentally.
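If you prefer a non-interactive edit, the same changes the vi commands make can be scripted with sed. This is only a sketch against a simplified stand-in trace file; the header length and marker strings in a real trace file vary by version, so check the edited result by hand before running it against your database.

```shell
# Sketch: script the trace-file edits non-interactively (verify by hand!).
# /tmp/prod_ora_16060.trc is a tiny stand-in for a real 9i trace file.
TRC=/tmp/prod_ora_16060.trc

cat > "$TRC" <<'EOF'
Dump file header line 1
Dump file header line 2
STARTUP NOMOUNT
CREATE CONTROLFILE REUSE DATABASE "PROD" NORESETLOGS ARCHIVELOG
DATAFILE
  '/u03/oradata/PROD/devl_PROD_01.dbf'
;
RECOVER DATABASE
ALTER DATABASE OPEN;
EOF

# 1) Drop everything before the STARTUP NOMOUNT line.
# 2) Change u03 to u04 only within the DATAFILE section.
# 3) Comment out the RECOVER command.
sed -n '/^STARTUP NOMOUNT/,$p' "$TRC" |
  sed '/^DATAFILE/,/^;/s/u03/u04/' |
  sed 's/^RECOVER DATABASE/# RECOVER DATABASE/' > "${TRC}.new"

cat "${TRC}.new"
```

The three sed passes mirror the ":1,20d", ":g/DATAFILE/,/;/s/u03/u04/", and comment-out steps described above.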

Adding Redo Log Members      [Top]  [Contents]

To add another member on another disk drive and separate controller to a redo log group, in order to mirror the logs in the group for media failure protection, you will need to specify the new file name and the current group number, as shown below. The new redo log file will be created the same size as all the other redo log files in the group.

SQL> alter database add logfile member
            '/u04/oradata/PROD/log_PROD_1C.rdo' to group 1;
SQL> select * from v$logfile;

Lists the current log files, showing your changes.

To add more members than maxlogmembers to a redo log group, you will need to recreate the control files with that new maximum by creating a textual control file, editing the maxlogmembers parameter in it, and running that SQL when connected as sysdba after shutting down the database:

$ sqlplus "/ as sysdba"
SQL> alter database backup controlfile to trace;
SQL> !ls -ltr /u00/oracle/admin/PROD/udump
To find the name of the textual control file just created.
SQL> !vi /u00/oracle/admin/PROD/udump/prod_ora_16060.trc
Edit the textual control file and remove all lines before the STARTUP NOMOUNT line (such as vi command ":1,20d", which deletes the first 20 lines), then, change the maxlogmembers value from the default 2 to 3 or 4 (such as vi command ":4s/2/3/" to change the value on line 4).  Finally, comment out (put # in front of) the RECOVER command near the end of the file.
SQL> shutdown immediate
SQL> @/u00/oracle/admin/PROD/udump/prod_ora_16060.trc

Recreates the control files with the new maxlogmembers value.

Adding Redo Log Groups      [Top]  [Contents]

Additional redo log groups can be added to lessen the possibility of the database freezing while waiting on archiving to free up the next log group. The new group number along with the list of new redo logs and the allocation size for those member files is specified as shown:

SQL> alter database add logfile group 4
            ('/u00/oradata/PROD/log_PROD_4A.rdo',
            '/u01/oradata/PROD/log_PROD_4B.rdo') size 500K;
SQL> select * from v$logfile;

Multiplexing Control Files      [Top]  [Contents]

Multiplexing the control files involves just shutting down the database, copying the current control file to the other disks and renaming them, editing the init.ora file to include the new control file names, and restarting the database. From that point, all datafile header updates and datafile additions and migrations will be reflected in all of the identical multiplexed control files, and the other remaining control file(s) can be used to bring the database back up if one or more of the control files are lost.

$ sqlplus "/ as sysdba"
SQL> shutdown immediate
SQL> host
$ cp -p /u03/oradata/PROD/ctrl_PROD_01.ctl /u01/oradata/PROD/ctrl_PROD_02.ctl
$ cp -p /u03/oradata/PROD/ctrl_PROD_01.ctl /u00/oradata/PROD/ctrl_PROD_03.ctl
Copies the original control file to the other disks, renaming them to match the control_files parameter.
$ vi $ORACLE_HOME/dbs/initPROD.ora
Edit init.ora to add other control files on separate disks to the control_files parameter, using their full pathnames, separated by commas, such as:
      control_files = (/u03/oradata/PROD/ctrl_PROD_01.ctl,
            /u01/oradata/PROD/ctrl_PROD_02.ctl,
            /u00/oradata/PROD/ctrl_PROD_03.ctl)
$ exit
SQL> startup
SQL> select * from v$controlfile;

Lists the current control files, showing your changes.

Multiplexing Archive Log Files      [Top]  [Contents]

I haven't done this yet, but, multiplexing archive log files is available in Oracle in 8i and beyond (see chapter 8 in Oracle9i Database Administrator's Guide).  For earlier versions, you could set up a cron job to run every half-hour or so to copy the archive log files not currently in the mirror directory into it. Make sure that the mirror directory is on a different disk than the archive logs directory; otherwise, you defeat the purpose if the archive logs disk goes out.
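Such a cron-driven mirror copy could be sketched as below. The directory names here are examples only (stand-ins under /tmp for demonstration); in practice the mirror directory must sit on a different physical disk, and you would run this from cron every half-hour or so.

```shell
# Sketch: mirror completed archive log files to a second directory.
# Directory names are stand-ins; put the real mirror on a DIFFERENT disk.
ARCDIR=/tmp/PROD/archivelogs
MIRROR=/tmp/PROD/archmirror
mkdir -p "$ARCDIR" "$MIRROR"

# Demo files standing in for completed archive logs.
touch "$ARCDIR/arch_PROD_0001.arc" "$ARCDIR/arch_PROD_0002.arc"

# Copy only the files not already present in the mirror.
for f in "$ARCDIR"/*.arc; do
    b=$(basename "$f")
    [ -f "$MIRROR/$b" ] || cp -p "$f" "$MIRROR/$b"
done
ls "$MIRROR"
```

Copying only `.arc` files that are missing from the mirror keeps each cron run cheap and avoids re-copying files already mirrored.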

Backups      [Top]  [Contents]

The DBA's primary job is to make sure that the data is available and accessible to the users during those times that they need it, which means having a complete and well-tested backup and recovery procedure in place and functioning. This section covers what files need to be included in that backup, the types of backups (cold backups and hot backups), and other processing that you could perform during the backup for proactive maintenance.

What To Back Up      [Top]  [Contents]

The following types of files making up the database and providing recovery capabilities should be backed up during one backup cycle, such as all of them during a cold backup, or all or subsets of them during a hot backup cycle, assuming the entire set is eventually backed up during that cycle (see Oracle9i User-Managed Backup and Recovery Guide, chapter 2):

The most basic way to get the names of the datafiles, control files, and redo log files through SQL is shown below, which can be used in both the startup mount state and the database open state:

SQL> select name from v$datafile;
SQL> select name from v$tempfile;
SQL> select name from v$controlfile;
SQL> select member from v$logfile;

You can also get the tablespace name with the datafile and temp file names using:

SQL> select tablespace_name,file_name from dba_data_files order by tablespace_name;
SQL> select tablespace_name,file_name from dba_temp_files;

The list of archive log files which are currently on-line cannot be found through SQL. However, you can get the directory containing those archive logs, the format of the file names, and whether or not archiving was turned on in the init.ora parameter file using:

SQL> select name,value from v$parameter where name in ('log_archive_dest',
            'log_archive_format','log_archive_start');

These archiving parameters can also be found in a sysdba session by issuing either of the following commands (but the first one might truncate the dest directory name):

SQL> show parameter archive
SQL> archive log list

To get the list of archive log files for the last 5 days (the date is when Oracle started writing to that redo log, not when it was copied to the archivelogs directory), which may or may not still be on-line, you can query the v$archived_log table in Oracle 8.x and above:

SQL> select name from v$archived_log
            where trunc(completion_time) >= trunc(sysdate)-5;

Or, you can query the v$log_history table in Oracle 7.x:

SQL> select archive_name from v$log_history
            where trunc(to_date(substr(time,1,8),'MM/DD/YY')) >=
            trunc(sysdate)-5;

The name of the parameter file is not kept in the database, but is usually $ORACLE_HOME/dbs/initPROD.ora (substitute your Oracle SID).

Since you can't find the currently existing archive log files or the init.ora file or the password file through SQL, the best way to get the list of database files to back up (assuming you are using a standard naming convention for all of your files related to a database instance, such as using the Oracle SID in the names of those files or in the name of one level of the directory hierarchies containing your datafiles, control files, redo log files, and archive log files) is by using the unix find command to get the list of file pathnames that contain that Oracle SID, as shown below (which is redirected to a file named backemup.dat):

$ find / -name '*PROD*' ! -type d 2>/dev/null >backemup.dat

The addendums to this presentation contain the scripts backup_list.shl and backup_list.sql, which are sample unix and SQL scripts used to create the list of files to be backed up, assuming a standard naming convention. More information on naming conventions can be found in papers on Oracle's OFA, which is referred to as either Oracle Flexible Architecture or Optimal Flexible Architecture.
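The find-based approach can be exercised against a sample directory tree, as sketched below. The /tmp/demo root and file names are stand-ins for your real mount points; the `! -type d` keeps directories themselves out of the list, and stderr is discarded to hide permission errors on unreadable directories.

```shell
# Sketch: build the backup file list from a naming convention.
# /tmp/demo stands in for your real filesystem roots (u00, u01, u03, ...).
ROOT=/tmp/demo
mkdir -p "$ROOT/u03/oradata/PROD" "$ROOT/u01/oradata/PROD/archivelogs"
touch "$ROOT/u03/oradata/PROD/devl_PROD_01.dbf" \
      "$ROOT/u01/oradata/PROD/archivelogs/arch_PROD_0001.arc"

# Everything whose name contains the SID, skipping directories.
find "$ROOT" -name '*PROD*' ! -type d 2>/dev/null > /tmp/backemup.dat
sort /tmp/backemup.dat
```

The PROD directories themselves match the name pattern but are excluded by `! -type d`, so only the two files land in the list.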

Cold Backups      [Top]  [Contents]

Cold backups are done while the database is shut down, usually after a "shutdown immediate" is issued, and contain the most complete backup snapshot, since all database files, including control files and redo log files, are backed up from the same point in time.

During the time that a cold backup is in progress, the database is not available for querying or updating while those files are being copied to tape or to some other disk drive or directory (possibly compressing (zipping) them at the same time). Cold backups can be done with the database either in archivelog mode or in noarchivelog mode.

To perform cold backups, you will first need to get the list of files to be backed up (putting the file names into backemup.dat in the examples below), including datafiles, control files, and redo log files, which you can get either by querying the database itself while the database is still up (see "What To Back Up") or by using the unix "find" command to locate files following a standard naming convention, preferably after you have shut down the database (see below), since the archive log files and other database files can also be picked up at that time by that "find" command. Be sure to get a fresh list of files for each backup in order to accommodate structural changes in the database (such as new datafiles) as well as additional archive log files.

Then, you should include the parameter file and password file (if used) in the list. If you are using the "find" command, those files could also be picked up by "find", since they both usually contain the SID name. It would also be a good idea to include a textual control file in the list, which you would create before shutting down the database for the backup, as shown below (which contains the SID name as part of the directory name and, so, could also be picked up by "find"):

$ sqlplus "/ as sysdba"
SQL> alter database backup controlfile to trace;
SQL> shutdown immediate
SQL> exit
$ ls -ltr /u00/oracle/admin/PROD/udump/* | tail -1 | sed 's/^.* \//\//' >>backemup.dat

Puts the name of the textual control file just created in the udump directory (the last one in the ls listing, such as prod_ora_16060.trc) into the list of files to be backed up.
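The `ls -ltr | tail -1 | sed` pipeline above depends on the exact ls long-listing format; a simpler way to capture the full pathname of the newest trace file is sketched below, against a stand-in udump directory (/tmp/udump here in place of /u00/oracle/admin/PROD/udump).

```shell
# Sketch: grab the newest .trc file by modification time.
# /tmp/udump stands in for your real udump directory.
UDUMP=/tmp/udump
mkdir -p "$UDUMP"
touch -t 202001010000 "$UDUMP/prod_ora_11111.trc"
touch -t 202001020000 "$UDUMP/prod_ora_16060.trc"

# ls -tr sorts oldest-first, so tail -1 gives the newest file,
# already as a full pathname because of the glob.
NEWEST=$(ls -tr "$UDUMP"/*.trc | tail -1)
echo "$NEWEST" > /tmp/trcname.dat
cat /tmp/trcname.dat
```

In a real backup script you would append that pathname to backemup.dat instead of writing it to its own file.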

Finally, after shutting down the database, you will need to get the list of the archive log files to back up (the shutdown may cause new archive log files to be written out). You could do this either as part of the "find" command at this point, along with the datafiles, control files, and redo log files, or by just getting the fully-qualified listing of the file pathnames in the archivelogs directory. You could also combine the two and get the list of non-archive files for the backup, followed by the list of archive files (zipped followed by not zipped here), as shown below:

$ find / -name '*PROD*' ! -type d 2>/dev/null | grep -v 'arc$' | grep -v 'gz$' >>backemup.dat
$ ls /u01/oradata/PROD/archivelogs/*.arc.gz >>backemup.dat
$ ls /u01/oradata/PROD/archivelogs/*.arc >>backemup.dat

After the list of files has been created (in backemup.dat) and the database has been shut down, you can do any of the following to back up the files or make copies for the backup. The first two options allow you to start up the database before writing the copied files to tape, which usually results in a shorter down time for the database.

These three options can be performed as shown below (do only one of them):

$ cat backemup.dat | sed 's/\(^.*\)\/\(.*\)$/cp -p \1\/\2 \/u03\/oradata\/prod1\/\2/' | sh
Copies the files to another staging (backup) directory.
$ cat backemup.dat | sed 's/\(^.*\)\/\(.*\)$/gzip -cv1 \1\/\2 >\/u03\/oradata\/prod1\/\2.gz;
      touch -r \1\/\2 \/u03\/oradata\/prod1\/\2.gz/' | sh
Compresses (zips) the files to another staging directory (the touch command must be on the same line as the sed command).
$ cat backemup.dat | cpio -ovC64 >/dev/rmt0
Copies the files directly to tape (without using a staging directory).
Then, restart the database after any of the three above commands:
$ sqlplus "/ as sysdba"
SQL> startup
SQL> exit

Be sure to copy the backup disk directory files to tape after you have done the startup for the first two options, such as shown below. (See "Backup/Restore Using Unix cpio Command" for cpio description.)

$ ls /u03/oradata/prod1/* | cpio -ovC64 >/dev/rmt0

Hot Backups      [Top]  [Contents]

Hot backups are done while the database is still up and running and being accessed and updated. This is used when continuous 24-hour-per-day / 7-day-per-week operations are required. If you have even a half-hour window when your database can be down, you can probably do a cold backup using the staging directory method above.

Doing hot backups requires that archivelog mode is turned on. It is more taxing on the redo logs and archive logs while backups are being performed, since entire database blocks are written to the redo logs for the changes that are made, instead of just the individual changes (transaction data).

You must NEVER back up the online redo log files when doing hot backups.  Restoring a backup of an online redo log file would cause corruption during database recovery. The online redo log files are the most vulnerable items in the hot backup scheme. However, forcing log switches can catch the pending datafile updates in the redo logs and give you some coverage in this area.

You cannot back up the online control files themselves, but you can back up a special copy of them, either a binary copy or a textual copy, as shown below.

To perform hot backups, you will only back up one tablespace's datafiles at a time. In order to reduce the time that the tablespace is in backup mode, it is best to copy its datafiles to a backup disk directory, instead of directly to tape. When all copies have been done, you can then copy that backup disk directory to tape while the database is running normally. You must use the "begin backup" and "end backup" commands, as shown below; otherwise, your backups will be corrupted and useless. Note that the data in the tablespace remains available to the users while in backup mode, and any transactions can still be performed on that data. (See section on Disaster Recovery Scenarios for info on database crashes during hot backups.)

For each tablespace, perform the following steps while connected as sysdba (the DEVELOPMENT tablespace is shown here - change the tablespace name and the datafile names as appropriate for each tablespace backup):

SQL> alter tablespace development begin backup;
SQL> !cp -p /u03/oradata/PROD/devl_PROD_*.dbf /u03/oradata/prod1
Copies the tablespace's datafiles to a backup disk directory (/u03/oradata/prod1 here), using standard naming conventions.
SQL> alter tablespace development end backup;

Force a log switch to archive the current log to catch all of the committed transactions in it. Then, copy the archive log files (zipped and not zipped) to your backup disk directory, as shown below:

SQL> alter system switch logfile;
$ ls /u01/oradata/PROD/archivelogs/*.arc.gz >logfile.lst
$ ls /u01/oradata/PROD/archivelogs/*.arc >>logfile.lst
Generates the list of archivelog files to be copied (don't just copy the entire directory - you might catch an archivelog file in mid-creation, but, do keep the archiving process running).
$ sleep 5
Waits for a few seconds to allow current archivelog writes to complete.
$ cat logfile.lst | sed "s/\(.*\/\)\([^\/].*\)/cp -p \1\2 \/u03\/oradata\/prod1\/\2/" >logfile.shl
$ sh logfile.shl

The above commands copy the archivelog files to your backup disk directory.
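To see what that sed substitution actually generates, here is a small self-contained sketch using made-up archivelog file names (it runs anywhere, no database needed):

```shell
# Demo of the sed transform above: each pathname in logfile.lst becomes
# a "cp -p <source> /u03/oradata/prod1/<basename>" line.
# The archivelog names here are hypothetical.
printf '%s\n' \
  /u01/oradata/PROD/archivelogs/arch_0001.arc \
  /u01/oradata/PROD/archivelogs/arch_0002.arc.gz >logfile.lst
sed "s/\(.*\/\)\([^\/].*\)/cp -p \1\2 \/u03\/oradata\/prod1\/\2/" logfile.lst
# cp -p /u01/oradata/PROD/archivelogs/arch_0001.arc /u03/oradata/prod1/arch_0001.arc
# cp -p /u01/oradata/PROD/archivelogs/arch_0002.arc.gz /u03/oradata/prod1/arch_0002.arc.gz
```

The first capture group grabs everything up to the last slash (the directory), and the second grabs the basename, which is reused as the destination file name.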

Create a control file to back up (binary and textual) AFTER all other files have been copied:

SQL> alter database backup controlfile to '/u03/oradata/prod1/controlfile.ctl';
SQL> alter database backup controlfile to trace;
$ ls -ltr /u00/oracle/admin/PROD/udump
To find the name of the textual control file just created (the last one in the ls listing, such as prod_ora_16060.trc).
$ cp -p /u00/oracle/admin/PROD/udump/prod_ora_16060.trc /u03/oradata/prod1

Copies that textual control file to your backup disk directory.

After all datafiles and log files have been copied to your backup disk directory, back up those copies in your backup disk directory to tape.  If you wanted to zip those files first with a utility such as GNU Zip to backup the compressed versions (but see below), you would do that before the cpio copy command, as shown below.

$ cd /u03/oradata/prod1
$ gzip -vN1 *
$ find /u03/oradata/prod1 | cpio -ovC64 >/dev/rmt0

Partial hot backups, in which the tablespaces are backed up at separate times, possibly across several days for the entire set of tablespaces, are possible, but not recommended.

NOTE: Only put one tablespace at a time into hot backup mode, and, then, for as short a time as possible. Hot backups are hard on your redo logs, since entire database blocks are written to them for each change, instead of just the individual changes themselves, so, your redo logs tend to get filled up fast, resulting in much more frequent log switches while your tablespaces are in hot backup mode. A checksum is written to the beginning and to the end of the database block whenever a change is made. In Oracle7, during recovery, if those two checksums are different (meaning a change occurred in that block while it was being written out during a hot backup), the changed block is retrieved from the archivelogs. In Oracle8 (from what I remember from my Oracle class), those two checksums are compared during the hot backup, and the archivelog block is written out instead of the database block if a change is detected. It is for this reason that entire changed blocks are archived during hot backups, instead of just the changed bytes.

Instead of doing compression after you have copied the files into the backup disk directory, you can use a UNIX pipe (or other operating system equivalent, if available) to compress the files on the fly as you copy or create them, such as in this export example, in which exp writes to the pipe while gzip reads from the pipe (that's a double quote/single quote after exp and a single quote/double quote after sysdba in the export (exp) command):

$ mknod /tmp/exp_pipe p
$ gzip -cNf </tmp/exp_pipe >prod.dmp.gz &
$ exp "'/ as sysdba'" file=/tmp/exp_pipe full=y compress=n log=prod.dmp.log
$ rm -f /tmp/exp_pipe
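The same pipe technique can be tried without Oracle at all. This sketch, with made-up file names under /tmp, uses mkfifo (the modern equivalent of "mknod ... p") and a plain echo standing in for the exp step:

```shell
# On-the-fly compression through a named pipe; a plain echo stands in
# for exp here, and all file names are made up for illustration.
rm -f /tmp/demo_pipe
mkfifo /tmp/demo_pipe                               # same as "mknod /tmp/demo_pipe p"
gzip -cf </tmp/demo_pipe >/tmp/demo.dmp.gz &        # reader: compresses the stream
echo "pretend this is export data" >/tmp/demo_pipe  # writer: stands in for exp
wait                                                # let gzip drain the pipe and exit
rm -f /tmp/demo_pipe
gunzip -c /tmp/demo.dmp.gz                          # prints the original data
```

The background gzip blocks on opening the pipe until the writer opens it, so no data is lost, and the compressed file never holds more than the compressed stream on disk.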

Other Nightly Processing      [Top]  [Contents]

In addition to backups, other nightly processing can also be done to guard against data loss and allow for proactive problem detection and correction, such as the following, which are described below:

  1. Create and keep current copies of your textual control files.
  2. Create and keep current copies of your textual init.ora file if you are using an spfile.
  3. Perform full database exports for quick table restores and to check datafile integrity.
  4. Generate definitions for all of your tables and indexes.
  5. Gather statistics on datafile and index space usage and table extent growth for proactive maintenance.

Keep current copies of the textual control files for each of your database instances, in case the control file is corrupted and you have to rebuild it. For each of the database instances (such as SEED, TRNG, PPRD, and PROD), switch to that instance (use ". oraenv" and enter SEED, or, use "export ORACLE_SID=SEED", to switch to SEED in unix), and enter the following in sqlplus or svrmgr to create the textual control file in the instance's user_dump_dest directory specified in the init.ora file:

SQL> alter database backup controlfile to trace;

Keep a current copy of the textual init.ora file (initialization parameter file) if you are using an spfile (server parameter file) instead of an init.ora file to hold your initialization parameters.  If the binary spfile becomes corrupt, the init.ora file can be used in its place to restart your database (after deleting the spfile, since Oracle looks for the spfile first).  To create the init.ora file (which is placed in the $ORACLE_HOME/dbs directory on UNIX systems, or the $ORACLE_HOME\database directory on NT systems):

SQL> create pfile from spfile;

Create an export file of the entire database for individual table restores. This will also detect any block corruption in the table data, since a full table scan is performed. Block corruptions are not detected in physical block copies during tape backups, so you wouldn't know that you were propagating bad blocks to all of your backups until a user tried to access data in that block. During export, if a bad data block is encountered, the export will fail.

$ exp "'/ as sysdba'" file=/u03/oradata/prod1.dmp full=y \
    log=/u03/oradata/prod1.dmp.log >/dev/null 2>/dev/null

Create an indexfile of that export for table and index definitions.   The indexfile can also be used to separate tables into product-based tablespaces, or to split the indexes off into their own tablespace.

$ imp "'/ as sysdba'" file=/u03/oradata/prod1.dmp \
   indexfile=/u03/oradata/prod1.idx full=y \
   log=/u03/oradata/prod1.idx.log >/dev/null 2>/dev/null

Gather statistics on space usage so that you will know when to reallocate tables to give them larger extents before they run out of extents to allocate (using gurrddl or gurnddl along with export/import), or to add datafiles to tablespaces that are about to fill up (alter tablespace ... add datafile ...), or to recreate indexes that currently have unused or wasted space due to deletions from their associated tables. Below are some sample commands you can use to see potential space problems for tables and tablespaces and to see if an index has a lot of wasted space (and should be recreated at a time of no activity):

SQL> select segment_name,segment_type,extents,max_extents
            from sys.dba_segments where extents + 5 > max_extents
            or extents > 50 order by segment_name,segment_type desc;
SQL> col "Free" format 9,999,999,999
SQL> col "Tot Size" format 9,999,999,999
SQL> select tablespace_name,sum(bytes) "Tot Size"
            from dba_data_files group by tablespace_name;
SQL> select tablespace_name,sum(bytes) "Free"
            from dba_free_space group by tablespace_name;
For each index, find the number of deleted rows (may rebuild it later):
SQL> validate index posnctl.nhrfinc_key_index;
SQL> select name,del_lf_rows_len from index_stats;

The above pair is used together (validate clears out the index_stats table before putting its stats into it).

Archiving To Tape      [Top]  [Contents]

The best method for periodically writing the archive log files to tape is to send them to disk first and have a cron job backup those archive logs to tape later on. Never archive directly to tape, since that would require a dedicated tape drive just for the archivelogs, and since the backup would need to complete before cycling back to that online redo log (otherwise, the database will "freeze" until that redo log is available for reuse - the same thing that happens if your archivelog directory gets filled up). If you did have a tape drive that could be dedicated to the archivelogs, they could be backed up more frequently than a normal backup procedure, such as every 10 minutes. In that case, you should only backup the new files which haven't been placed on the tape yet.
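One simple way to back up only the new archivelog files is to keep a marker file whose timestamp records the last backup, and select newer files with "find -newer". Below is a sketch using a temporary directory and made-up file names in place of the real archivelog directory:

```shell
# Sketch: pick out only archivelog files newer than a marker file.
# A temp directory and hypothetical names stand in for the real
# archivelog directory.
ARCDIR=/tmp/arc_demo
mkdir -p $ARCDIR
touch $ARCDIR/old_01.arc
touch $ARCDIR/.last_backup          # marker: "everything so far is on tape"
sleep 1                             # allow for 1-second timestamp granularity
touch $ARCDIR/new_02.arc            # arrives after the last backup
find $ARCDIR -name '*.arc' -newer $ARCDIR/.last_backup >/tmp/newlogs.lst
cat /tmp/newlogs.lst                # lists only new_02.arc
touch $ARCDIR/.last_backup          # advance the marker for the next run
# cat /tmp/newlogs.lst | cpio -ovC64 >/dev/rmt0   # then send just those to tape
```

Advancing the marker only after a successful tape write ensures nothing is skipped if a backup fails partway.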

Disaster Recovery Scenarios      [Top]  [Contents]

Backups are no good unless you know how to use them to restore your database up to and including the most recent change made to the data. Oracle gives you those up-to-the-instant recovery capabilities, as long as you know how to use them. This section describes Oracle's recovery capabilities and how to use them for general recovery of datafiles, tablespaces, or the entire database, and for specific disaster recovery scenarios, which are variations on the themes of the general recovery procedures.

Things To Check      [Top]  [Contents]

Messages displayed during database startup will usually indicate what kind of problems the database is experiencing. After you get an error message during your initial startup, you will most likely need to do a shutdown before proceeding with database recovery.

Alert Log in the database's bdump directory (you can find bdump using "select value from v$parameter where name = 'background_dump_dest';" or "show parameter background_dump"), named something like /u00/oracle/admin/PROD/bdump/alert_PROD.log.

Other recent trace (.trc) files in the database's bdump (background_dump_dest) and udump (user_dump_dest) directories.

Oracle processes, using "ps -ef | grep ora" at the unix command line.   You should see entries like ora_smon_PROD, ora_pmon_PROD, ora_dbwr_PROD, ora_lgwr_PROD, ora_arch_PROD, and other ora_xxxx_PROD processes for each database instance, and oraclePROD for jobsub and other users.

The most recent .lis files in the job submission directory (such as /home/jobsub), at the end of the "ls -ltr /home/jobsub" listing, which may indicate why a user's run crashed or why it filled up a table or archive log directory.

What To Restore      [Top]  [Contents]

Only restore the datafile that has gone bad if you are recovering up to the current time.

Restore ALL datafiles if you are recovering up to an earlier time than now but after those datafiles were backed up (for point-in-time recovery or cancel-based recovery). Since Oracle requires that all datafiles be synchronized at the same SCN (System Change Number), just recovering one datafile to an earlier time would cause that datafile to have a lower SCN than all the other datafiles, which Oracle wouldn't accept.

Restore all archivelog files that will be needed in the recovery (for the time range between the backup and the point-in-time that you are recovering to), if they are not currently available on-line or if those on-line archivelog files have been corrupted.

NEVER restore control files unless all copies are lost.  See "Loss of Control Files" for information about restoring control files.

NEVER restore online redo log files from a hot backup (you shouldn't have backed them up in the first place); it is OK to restore them when doing cold backups, but ONLY IF you are restoring up to the time of that backup or doing a partial restore where those old logs won't be used during the restore.  Restoring a backup of an active redo log file would cause corruption during database recovery.  If the redo logs are deleted or damaged, just remove the damaged redo log files and reset the logs on open, and Oracle will recreate the redo log files for you (be sure you backup immediately after any resetlogs command, since Oracle can't recover datafiles restored from a backup made prior to resetting the logs):

$ sqlplus "/ as sysdba"
SQL> startup mount
SQL> alter database open resetlogs;

Backup/Restore Using Unix cpio Command      [Top]  [Contents]

Here is a quick overview of using the unix cpio command for tape backups and restores. The tape drive for our system is /dev/rmt0. (A blocksize of 32K, indicated by C64, is used by these cpio commands. You can change the blocksize of your tape drive in AIX from user root through the smit utility, or by using the chdev command, as in "chdev -l 'rmt0' -a block_size='32768'", which will speed up the writing to that tape drive from the default 512-byte blocksize.)

Saving files to backup:
        $ cat backemup.dat | cpio -ovC64 >/dev/rmt0
Listing tape contents:
        $ cpio -itvC64 </dev/rmt0
Restoring files to same pathnames:
        $ cpio -imvC64 </dev/rmt0
Restoring files individually to new pathnames, entering the new pathname when prompted (also use "u" flag to overwrite newer files if they exist as newpathname; for fullpathname, enclose patterns in double quotes):
        $ cpio -irmC64 fullpathname </dev/rmt0
        newpathname
  (or, period (".") for same name, or, enter to skip file)

Disaster Recovery Overview      [Top]  [Contents]

The first thing you should ALWAYS do when your database crashes is to make a BACKUP of your datafiles, control files, archive log files, and initialization parameters file (init.ora) for your database that crashed, either copying those files to tape or to another directory, or zipping them with something like GNU Zip, or making a tar file of them. DO NOT TRY TO RESTART YOUR DATABASE BEFORE BACKING UP THOSE FILES. Otherwise, you might make the situation worse. If all else fails, the people at SCT or Oracle may be able to get your database going again from that initial backup, even though the crash may have corrupted some files.

After a crash occurs and you've made a backup, do a STARTUP when connected as sysdba. If there are problems encountered during startup, Oracle will display an error message about the first problem, which will usually give you an idea about what type of recovery scenario you will need to apply. In most cases, you will need to do a shutdown after this initial startup attempt before beginning the recovery.

There are three primary recovery options for media failure scenarios:

1) Recover Database       2) Recover Datafile       3) Recover Tablespace

Most of the other recovery scenarios are just variations on these three themes. Recovery steps for specific disaster recovery scenarios are given later on; however, in a generic recovery:

  1. The database is shut down or the affected datafile(s) or tablespace is taken offline.
  2. The affected datafile(s) is restored from a backup along with the archived redo logs created since that backup.
  3. A recover command is issued.
  4. The tablespace or datafile(s) is brought back online.

See the Oracle9i User-Managed Backup and Recovery Guide for more information on Oracle database disaster recovery.

Basic Recover Database Recovery Option      [Top]  [Contents]

The Recover Database option is used to recover all datafiles needing recovery, either up to the point of failure, called complete recovery, or up to a point in the past before the failure, called incomplete recovery, which is only available using this option. Recover Database is performed from the MOUNT state, so, the database will have to be SHUT DOWN before using this. All datafiles must be ONLINE to be recovered, in contrast to the other two options where the datafiles are offline in most cases. The basic steps are: restore the datafiles, mount, online the datafiles, recover database, and open, as shown below:

$ cp -p /u03/oradata/prod1/devl_PROD_01.dbf /u03/oradata/PROD
Restores the datafiles from backup, such as devl_PROD_01.dbf here.
$ sqlplus "/ as sysdba"
SQL> startup mount
SQL> select * from v$datafile;
Shows status of datafiles, indicating those that are offline.
SQL> alter database datafile '/u03/oradata/PROD/devl_PROD_01.dbf' online;
Online the offline datafiles shown above, such as devl_PROD_01.dbf here.
SQL> set autorecovery on
Tells recovery not to ask which redo logs to use (in log_archive_dest).
SQL> recover database;
Recovers all datafiles, as needed, up to the current point in time.
SQL> alter database open;

Opens the database for user access.

You may also need to drop and recreate the TEMP tablespace after a database recovery. See Loss of TEMP Datafile on how to do this.

Recover Database can also be used to perform an incomplete recovery up to some point in time before the failure occurred, or before the database was last shut down. To do this, ALL datafiles must be restored from the backup before doing the incomplete recovery (not control files or online redo log files), and ALL datafiles are recovered back to the same point in time - you can't do a partial incomplete recovery.

There are three incomplete recovery options available: time-based, cancel-based, and change-based recovery. Time-based recovery brings the datafiles up to a given date and time. Cancel-based recovery applies entire archive log files and online redo log files one at a time until the word CANCEL is typed in, so you can't use autorecovery with it. Change-based recovery applies all archive log files and online redo log files up to, but NOT including, a given system change number.  If you have added datafile(s) since the last backup, and you want to recover up to a point before the datafile(s) were added, you will need to restore the control files and use the "using backup controlfile" option of the recover command (see Loss of Control Files).  (I don't know of a way to recover up to a point between datafile additions - it's all or none.)   The basic steps are similar to the complete recover database option, except for the incomplete recovery commands, which are shown below (only use one of these):

SQL> recover automatic database until time '2005-02-14:15:45:00';
Automatic is similar to Set Autorecovery On. Use 1 second before the time of failure or of the log to exclude, which, for Oracle 8.x and above, could be found using:
select to_char(a.time,'YYYY-MM-DD:HH24:MI:SS') as time, a.name
    from (select first_time as time, substr(name,1,58) as name,
                 first_change# as change from v$archived_log
          union
          select first_time, 'Redo Log Group ' || to_char(group#),
                 first_change# from v$log) a
    order by change;
Or, for Oracle 7.x, could be found using:
select to_char(a.time,'YYYY-MM-DD:HH24:MI:SS') as time, a.name
    from (select time, substr(archive_name,1,58) as name,
                 low_change# as change from v$log_history
          union
          select first_time, 'Redo Log Group ' || to_char(group#),
                 first_change# from v$log) a
    order by change;
SQL> recover database until cancel;
Accept logs until you type in CANCEL.
SQL> recover automatic database until change 43047423;

The system change numbers (SCN) contained in each archive log file are shown in Oracle 8.x and above using: select name,first_change#,next_change# - 1 from v$archived_log;
Or, for Oracle 7.x, using: select name,low_change#,high_change# from v$log_history;
For the online redo logs, you can use "select max(first_change# - 1) from v$log;" to find the SCN to use for applying all but the current online redo log file.

After an incomplete recovery, you must open the database with the RESETLOGS option, as shown below, and then immediately shut down the database and make a BACKUP, since the old backup is unusable after RESETLOGS is used.

SQL> alter database open resetlogs;

Basic Recover Datafile Recovery Option      [Top]  [Contents]

The Recover Datafile option is used to recover the specified datafile up to the point of failure, synchronizing it with the other datafiles (complete recovery only).   Recover Datafile is performed either from the MOUNT state (after a shutdown), with the datafile ONLINE or OFFLINE, or from the OPEN state, with the datafile OFFLINE. A bad datafile must be taken offline before the database can be opened. Since the SYSTEM tablespace CANNOT be taken offline, you cannot use Recover Datafile from the OPEN state on SYSTEM.

The basic steps for the MOUNT state are: restore the datafile, mount, recover datafile (using automatic here instead of autorecovery), online the datafile if needed, and open, as shown below:

$ cp -p /u03/oradata/prod1/devl_PROD_01.dbf /u03/oradata/PROD
$ sqlplus "/ as sysdba"
SQL> startup mount
SQL> recover automatic datafile '/u03/oradata/PROD/devl_PROD_01.dbf';
SQL> select * from v$datafile;
SQL> alter database datafile '/u03/oradata/PROD/devl_PROD_01.dbf' online;
SQL> alter database open;

The basic steps for Recover Datafile from the OPEN state (except for SYSTEM) while the rest of the database is still up and running, are: offline the datafile, restore the datafile, recover datafile, and online the datafile, as shown below:

$ sqlplus "/ as sysdba"
SQL> alter database datafile '/u03/oradata/PROD/devl_PROD_01.dbf' offline;
SQL> !cp -p /u03/oradata/prod1/devl_PROD_01.dbf /u03/oradata/PROD
SQL> recover automatic datafile '/u03/oradata/PROD/devl_PROD_01.dbf';
SQL> alter database datafile '/u03/oradata/PROD/devl_PROD_01.dbf' online;

Doing a Recover Datafile from the OPEN state (except for SYSTEM) can also be used to bring up your database for use before doing a recovery.  The basic steps are: mount, offline the datafile, open, restore the datafile, recover datafile, and online the datafile, as shown below:

$ sqlplus "/ as sysdba"
SQL> startup mount
SQL> alter database datafile '/u03/oradata/PROD/devl_PROD_01.dbf' offline;
SQL> alter database open;
The database is now available to the users, except for that datafile.
SQL> !cp -p /u03/oradata/prod1/devl_PROD_01.dbf /u03/oradata/PROD
SQL> recover automatic datafile '/u03/oradata/PROD/devl_PROD_01.dbf';
SQL> alter database datafile '/u03/oradata/PROD/devl_PROD_01.dbf' online;

You can still access data in tables in a tablespace which has an offlined datafile, as long as the data is in one of the other datafiles of the tablespace and the table header is not in the offlined datafile.

Basic Recover Tablespace Recovery Option      [Top]  [Contents]

The Recover Tablespace option is used to recover all datafiles needing recovery in a tablespace up to the point of failure, synchronizing them with the other datafiles (complete recovery only). Recover Tablespace is performed from the OPEN state after taking the tablespace OFFLINE (which brings its datafiles offline). Since the SYSTEM tablespace CANNOT be taken offline, you cannot use Recover Tablespace on SYSTEM. The basic steps while the database is open are: offline the tablespace, restore the datafiles, recover tablespace, and online the tablespace, as shown below:

$ sqlplus "/ as sysdba"
SQL> alter tablespace development offline immediate;
Using Immediate rolls back currently pending transactions.
SQL> !cp -p /u03/oradata/prod1/devl_PROD* /u03/oradata/PROD
SQL> recover automatic tablespace development;
SQL> alter tablespace development online;

Doing a Recover Tablespace after opening the database can be used to bring up your database for use before doing a recovery. The basic steps are: mount, offline the bad datafiles, open, offline the tablespace, restore the datafiles, recover tablespace, and online the tablespace, as shown below:

$ sqlplus "/ as sysdba"
SQL> startup mount
SQL> alter database datafile '/u03/oradata/PROD/devl_PROD_01.dbf' offline;
SQL> alter database open;
SQL> alter tablespace development offline;
SQL> !cp -p /u03/oradata/prod1/devl_PROD* /u03/oradata/PROD
SQL> recover automatic tablespace development;
SQL> alter tablespace development online;

If a write error occurs on one of the datafiles when you offline a tablespace, use "alter tablespace tsname offline temporary;". Then, you can run a recovery on the datafile. If all datafiles in an offlined tablespace have write errors, use "alter tablespace tsname offline immediate;". Then, you can run a recovery on the tablespace.

Archivelogs Disk Volume Filled Up      [Top]  [Contents]

Symptoms: If the disk containing the directory for the archive log files fills up, Oracle will stop all transactions until the archiver process is able to continue writing to that directory. Doing a "df -k" shows that there is no room left for more archivelogs in that directory's disk. Current users' sessions will freeze, and users trying to log on will get "ERROR: ORA-00257: archiver error. Connect internal only, until freed.", since the archiver process is still waiting to archive that redo log group.  (Note: Use the oerr utility in UNIX to get the error description, such as "oerr ora 257".) To verify this conclusion:

$ sqlplus "/ as sysdba"
SQL> select value from v$parameter where name = 'background_dump_dest';

Shows the pathname of the bdump directory.
SQL> !tail -200 /u00/oracle/admin/PROD/bdump/alert_PROD.log
Shows "ORA-00272: error writing archive log", indicating that the redo log group can't be written.
SQL> !ls -ltr /u00/oracle/admin/PROD/bdump/arch*
Get the latest archiver trace file in the bdump directory, such as arch_13106.trc (the last one listed).
SQL> !cat /u00/oracle/admin/PROD/bdump/arch_13106.trc
Also shows "ORA-00272: error writing archive log".

Action: You will need to free up some space on that disk volume for Oracle to continue, either by moving files off of that volume, or, more likely, by deleting old archive log files out of that directory which you have already backed up to tape. You must not delete archive log files that have not been backed up, since you wouldn't be able to recover transactions in those files if your database crashes. The following script deletes the log files older than the given number of days (24-hour periods).  If you back up nightly, 1 day is the smallest number you should enter, or 3 on Mondays if you don't back up on weekends. You have to be logged in as userid oracle or as root to remove the archive log files.

# File: remove_old_logs.shl
echo "You must be logged in as user Oracle to run this script,"
echo "which removes all archivelog files older than X days."
echo "Enter number of days to keep: \c"
read DAYS_KP; export DAYS_KP
find /u01/oradata/PROD/archivelogs -name '*.arc' -mtime +$DAYS_KP -exec rm {} \;
find /u01/oradata/PROD/archivelogs -name '*.arc.gz' -mtime +$DAYS_KP -exec rm {} \;
echo "Results after deletions:"
du -k
df -k
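To convince yourself of the "-mtime +N" aging test used in the script, here is a small sketch (made-up names; note it uses the GNU "touch -d" syntax, which differs from AIX):

```shell
# Demo of the -mtime aging test: a file "touched" five days ago matches
# -mtime +3 (older than 3 full 24-hour periods), a fresh file does not.
# Directory and file names are made up; touch -d is GNU coreutils syntax.
mkdir -p /tmp/mtime_demo
touch -d '5 days ago' /tmp/mtime_demo/old.arc
touch /tmp/mtime_demo/new.arc
find /tmp/mtime_demo -name '*.arc' -mtime +3    # lists only old.arc
```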

If you are on a Windows NT-based system (including Windows XP and Windows 2000), I've written a similar script using DOS Batch commands, which you can download by clicking on this: remove_old_logs.bat.  This script uses extended DOS commands, which aren't available in Windows 98 and before.

If you can't free up any space on the archive log disk, you can redirect the archives to another disk until you are able to free up space on it, as shown below.  This redirection goes away if you restart your database, but you can make it permanent by updating the server parameter file (spfile), also shown below, or the init.ora file before restarting, depending on which one you are using.  I haven't tried this, but just saw it on the web in a presentation by Chris Lawson on dbspecialists.com.  (added 6/3/05)

$ sqlplus "/ as sysdba"
SQL> alter system archive log start to '<new archive log path name>';
SQL> alter system set log_archive_dest='<new archive log path name>' scope=spfile;

Loss of Control Files      [Top]  [Contents]

Symptoms: May be none until you try to shutdown and startup the database.  On shutdown, if the control files were deleted, you would get "ORA-00210: cannot open control file '/u03/oradata/PROD/ctrl_PROD_01.ctl'", or, if the control files were overwritten, you would get "ORA-00201: control file version incompatible with ORACLE version" along with their names.  On startup in both cases, you would get "ORA-00205: error in identifying control file '/u03/oradata/PROD/ctrl_PROD_01.ctl'", along with "ORA-07366: sfifi: invalid file, file does not have valid header block." if overwritten.

Action: If you have a recent up-to-date textual control file (no added datafiles) from "alter database backup controlfile to trace;" in your user_dump_dest directory, you can just edit out the control file header and run it:

$ sqlplus "/ as sysdba"
SQL> shutdown abort
SQL> !ls -ltr /u00/oracle/admin/PROD/udump/*.trc
Get the latest textual control file in user_dump_dest directory, such as prod_ora_31494.trc (the last one listed).
SQL> !vi /u00/oracle/admin/PROD/udump/prod_ora_31494.trc
Edit the textual control file and remove all lines before the STARTUP NOMOUNT line (such as vi command ":1,20d").
SQL> @/u00/oracle/admin/PROD/udump/prod_ora_31494.trc

If you don't have a textual control file, you'll need to restore all datafiles and control files, but, not online redo log files, from the last backup and do a recovery "using backup controlfile":

$ sqlplus "/ as sysdba"
SQL> shutdown abort
At this point, restore ALL datafiles AND control files from the last backup, along with any archivelogs that are needed since that time, but, NOT the online redo log files.
SQL> connect / as sysdba
SQL> startup mount
SQL> recover automatic database using backup controlfile;
Then, AUTO on "ORA-00308: cannot open archived log ...".
SQL> alter database open resetlogs;
Immediately shut down the database and make a BACKUP, since the old backup is unusable after RESETLOGS is used.

Note that you can combine the recovery options depending on what's needed.  For example, I needed to recover up to a point in time before a datafile was added, so, I did the steps above, including deleting the added datafile after shutting down the database, and substituting the following recover command (you won't enter AUTO as above, and, when it's finished the partial recovery, it will show "Media recovery cancelled."):

SQL> recover automatic database using backup controlfile until time '2001-07-16:13:15:00';

Loss of TEMP Datafile      [Top]  [Contents]

Symptoms: On large sorts (select distinct, order by, group by, union) that can't be done in memory, the sort will fail with "ORA-01157: cannot identify data file 3 - file not found" if the loss happened in the middle of the sort, or "ORA-01116: error in opening database file 3" if the loss happened before the sort started, along with the file name "ORA-01110: data file 3: '/u03/oradata/PROD/temp_PROD_01.dbf'".  Nothing is put into the alert.log file, and no trace files will be generated for this.

You may also need to drop and recreate the TEMP tablespace after a database recovery, which may be indicated by an error message like "ORA-25153: temporary tablespace is empty" when you first log into your application (such as SCT Banner), although you probably won't get that message just logging into sqlplus.

Action: All you need to do is to take the datafile offline and drop and recreate the TEMP tablespace, which can be done either while the database is still running (from sqlplus) or beginning at the mount state (when connected as sysdba).  Here, we are using a locally managed temporary tablespace.

$ sqlplus "/ as sysdba"
SQL> alter database datafile '/u03/oradata/PROD/temp_PROD_01.dbf' offline;
Then, if from mount state, do an "alter database open;" here.
SQL> select file_name,bytes/1024 kbytes from dba_temp_files;
Shows the size of the TEMP datafile(s) in Kbytes.
SQL> select initial_extent/1024 kbytes from dba_tablespaces
            where tablespace_name = 'TEMP';
Shows the uniform size of the extents in Kbytes.
SQL> drop tablespace temp;
SQL> !rm /u03/oradata/PROD/temp_PROD_01.dbf
SQL> create temporary tablespace temp
            tempfile '/u03/oradata/PROD/temp_PROD_01.dbf' size 40064K
            extent management local uniform size 640K;

TEMP Datafile Offline      [Top]  [Contents]

Symptoms: Similar to Loss of TEMP Datafile, except for the message "ORA-00376: file 3 cannot be read at this time", along with the file name message.

Action: You can offline the datafile and drop and recreate the tablespace, as above, or you can do a datafile recovery with the database open and bring the datafile online.

SQL> recover automatic datafile '/u03/oradata/PROD/temp_PROD_01.dbf';
SQL> alter database datafile '/u03/oradata/PROD/temp_PROD_01.dbf' online;

Loss of INACTIVE Online Redo Log Group (which had already been archived)      [Top]  [Contents]

Symptoms: The database crashes when an attempt is made to access that redo log group during a logfile switch. Current users will be kicked out with "ORA-01092: ORACLE instance terminated. Disconnection forced", and users trying to log on will get "ERROR: ORA-03114: not connected to ORACLE" and "ERROR: ORA-00472: PMON process terminated with error". Nothing will be placed in the alert.log file; however, there will be pmon (process monitor), dbwr (database writer), and lgwr (log writer) trace files generated in the background dump dest directory.  Check these from the unix command line, since the database is not up for you to check them from within Oracle:

$ grep background_dump_dest /u00/oracle/product/v723/dbs/initPROD.ora
Shows the pathname of the bdump directory.
$ cd /u00/oracle/admin/PROD/bdump
$ ls -ltr *.trc
Get the latest trace files in the bdump directory, such as pmon_13612.trc and lgwr_32306.trc (the last ones listed).
$ cat pmon_13612.trc
Shows "ORA-00470: LGWR process terminated with error" (also in dbwr).
$ cat lgwr_32306.trc
Shows "ORA-00313: open failed for members of log group 3 of thread 1" and messages containing the missing log file names like "ORA-00312: online log 3 thread 1: '/u03/oradata/PROD/log_PROD_3B.rdo'".

Action: You will need to start up the database in the mount state, drop the missing logfile group (and remove the group's member files, if they are not already gone), and add the logfile group back, at which time the database can be opened for use.

$ sqlplus "/ as sysdba"
SQL> startup

Shows "ORA-01081: cannot start already-running ORACLE - shut it down first".
SQL> startup force
Or, you could have done a "shutdown abort" followed by a "startup". Shows "Database mounted." and then "ORA-00313: open failed for members of log group 3 of thread 1" with additional messages containing the log file names.
SQL> select bytes/1024 from v$log where group# = 3;
Shows the size in K of the missing group's members.
SQL> select member from v$logfile where group# = 3;
Shows the member (file) names of the missing group.
SQL> alter database drop logfile group 3;
SQL> alter database add logfile group 3
            ('/u03/oradata/PROD/log_PROD_3A.rdo',
            '/u03/oradata/PROD/log_PROD_3B.rdo') size 500K;
Recreates all member files for that group of the given size.
SQL> alter database open;

If just one member of a group is lost, the other member(s) will take over and allow Oracle to keep running with no apparent problem. However, alert_PROD.log will show errors such as "ORA-00313: open failed for members of log group 3 of thread 1" and "ORA-00312: online log 3 thread 1: '/u03/oradata/PROD/log_PROD_3A.rdo'" on each log switch for that group. In this case, drop and recreate the member:

SQL> select * from v$log where group# = 3;
If the status is active or current for that group, do an "alter system switch logfile;"
SQL> alter database drop logfile member
            '/u03/oradata/PROD/log_PROD_3A.rdo';
SQL> alter database add logfile member
            '/u03/oradata/PROD/log_PROD_3A.rdo' to group 3;

Loss of CURRENT Online Redo Log Group (which needed to be archived)      [Top]  [Contents]

Symptoms: This behaves like the database freezing when the archivelogs disk volume is filled up; however, doing a "df -k" shows that there is plenty of room for more archivelogs. Current users' sessions will freeze, and users trying to log on will get "ERROR: ORA-00257: archiver error. Connect internal only, until freed.", since the archiver process is still waiting to archive that redo log group after Oracle has cycled through the other groups and needs to use that group again. To verify this conclusion:

$ sqlplus "/ as sysdba"
SQL> select value from v$parameter where name = 'background_dump_dest';

Shows the pathname of the bdump directory.
SQL> !tail -200 /u00/oracle/admin/PROD/bdump/alert_PROD.log
Shows "ORA-00286: No members available, or no member contains valid data", indicating that the redo log group is missing or corrupt, as well as messages such as "ORACLE Instance PROD - Can not allocate log, archival required", "Thread 1 cannot allocate new log, sequence 21", and "All online logs needed archiving".
SQL> !ls -ltr /u00/oracle/admin/PROD/bdump/arch*
Get the latest archiver trace file in the bdump directory, such as arch_22882.trc (the last one listed).
SQL> !cat /u00/oracle/admin/PROD/bdump/arch_22882.trc
Also shows "ORA-00286: No members available, or no member contains valid data".
SQL> shutdown abort
SQL> startup
Shows "ORA-00313: open failed for members of log group 2 of thread 1", or whatever group number has the problem.
Then, if you had tried to just drop and recreate the redo log group from this mount state (as in loss of inactive online redo log group), you would have gotten:
SQL> alter database drop logfile group 2;
Fails with "ORA-00350: log 2 of thread 1 needs to be archived".

Action: This requires an incomplete recovery to just before that group was used, since Oracle can't continue without doing a successful archive, and since you can't drop and recreate the redo log group while the archiver has it marked for archiving. Note that all changes in the lost redo log group will be lost (yuck!). An incomplete time-based recovery is shown below:

SQL> connect / as sysdba
SQL> shutdown abort
At this point, restore ALL datafiles from the last backup, any archivelogs that are needed since that time, AND the lost online redo log group's files (this is the only case in which you would restore those redo logs), but, NOT the control files.
SQL> startup mount
SQL> select group#, sequence#, bytes, first_change#, to_char(first_time,'DD-MON-YY HH24:MI:SS'), status from v$log;

Get the first time for changes in that missing group, subtract 1 second from it, and use that value in the recover command below.
SQL> recover automatic database until time '2005-02-14:12:59:59';
SQL> alter database open resetlogs;

Then, immediately shut down the database and make a BACKUP, since the old backup is unusable after RESETLOGS is used.

Failure During Hot Backup      [Top]  [Contents]

If you have a failure while you are doing a hot backup on a tablespace, then, besides doing any recovery needed for the particular failure, you will also need to bring that tablespace's datafiles back out of hot backup mode.  To do this, while the database is in the mount state, do an "end backup" on each of those datafiles before opening the database.  This is available in Oracle 7.2 and above; before that, you would have to restore all the datafiles and do an incomplete recovery to the time before the hot backup was started on that tablespace.

$ sqlplus "/ as sysdba"
SQL> startup mount
SQL> select df.name,bk.time from v$datafile df,v$backup bk
            where df.file# = bk.file# and bk.status = 'ACTIVE';
Shows the datafiles currently in a hot backup state.
SQL> alter database datafile
            '/u03/oradata/PROD/devl_PROD_01.dbf' end backup;
Do an "end backup" on those listed hot backup datafiles.
SQL> alter database open;

Loss of Server Parameter File (spfile)      [Top]  [Contents]

If your server parameter file (spfile) becomes corrupt, and you haven't been creating a textual init.ora parameter file as a backup, you can pull the parameters from it using the strings command in UNIX to create an init.ora file.  You will need to edit the resulting file to get rid of any garbage characters in it (but don't worry about the "*." characters at the beginning of the lines) and make any corrections to it before using it to start your database, but, at least you will have something to go by:

$ cd $ORACLE_HOME/dbs
$ strings spfilePROD.ora >initPROD.ora
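Since the spfile is a binary file, the strings output can still contain stray non-printable bytes. A quick way to strip those before hand-editing (a sketch only; review the result afterward, since rejoining any parameter values that were split across lines is still a manual step):

```shell
# Keep only printable ASCII plus tab/newline/carriage return in the
# generated parameter file (sketch - inspect the result by hand afterward).
tr -cd '\11\12\15\40-\176' < initPROD.ora > initPROD.ora.clean
mv initPROD.ora.clean initPROD.ora
```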

If you have been saving off a textual init.ora parameter file as a backup, you can restore that init.ora file to the $ORACLE_HOME/dbs directory in UNIX (or $ORACLE_HOME\database directory in NT).  You will need to delete the corrupt spfile before trying to restart your database, since Oracle looks for the spfile first, and the init.ora file last, to use as the parameter file when it starts up the database (or, you could leave the spfile there and use the pfile option in the startup command to point to the init.ora file).  Then, once your database is up, you can recreate the spfile using the following (as sysdba):

SQL> create spfile from pfile;

Bonus Topics      [Top]  [Contents]

The following two topics will be discussed at my SCT Summit session, if time permits.

Who is Toon Koppelaars, and why should I care?      [Top]  [Contents]

See "Solving Oracle Data-Block Corruption" by Toon Koppelaars (below in the References and Sources section).  When you're really desperate to recover any semblance of your data, he takes you into the bowels of index files and datafile blocks to extract whatever pieces you can from them.  He shows you how to recover from a corrupted block in a datafile by extracting the affected records' column values from existing indexes on that table, and/or by extracting all unaffected records from the blocks that are still valid.  This would be useful if the bad blocks had been propagated to all of your backups before you caught the problem (cpio and other O/S backup commands don't detect corrupted Oracle blocks; export would catch them, failing when it hit a bad block in a table, since it does a full table scan).

The Ultimate Disaster      [Top]  [Contents]

Finally, we come to the ultimate disaster. Your building burns down.

Or, if you're in a state like Arkansas, a tornado comes through and blows away all your equipment, or mixes it up with the stuff from the business next door, and the poultry processing plant across town. You've heard of a bad hair day? That's nothing like a bad feather day!  And you had payroll set up to run today!

Now, what do you do?  You've got no PC's, no network, no server - just lots of scrap wire. Not even your rolodex to look up the phone numbers of your suppliers to beg some equipment from, and no building to put it in even if you got it.

Something to think about.

But, you say, we've got that covered. We've got a contract with IBM, who will supply us with an equivalent server and operating system like we've got now, along with 10 PC's, a laser printer, and the wiring and other hardware and the network software to get us back up and running payrolls within 48 hours from their Dallas emergency center.

And, of course, we've got all of our data backup tapes that we've been diligently making, and the CD-ROM's and backups of our Oracle and Banner software - which we got from one of our two storage sites [yeah, right!] that wasn't destroyed by the tornado that ripped a mile-wide path through town.

Great. That tape archiving system you bought a few years ago has saved your bacon plenty of times up til now. You know, the one with the fastest backup speed and best compression ratio available at the time. Yep, the one with the proprietary format that can only be read by that archiving software. Are we starting to get the picture, now?

And, where IS that archiving software? IT'S ON THE TAPE!

Well, Steeeve, I'll just have the company send me a replacement disk. WHERE IS THAT ROLODEX?! Anyway, you finally find the number and call. Ring ... ring ... ring ... Is the company still in business? Whew! Somebody answers, and the conversation goes something like this:

"Oh, yes, that's our best-selling software, but we're at version 6 now. Did you get the upgrades? No? Well, what version did you have? Version 2? Oh, I'm sorry. Version 6 won't read that format any more. I'm afraid you're out of luck. But version 6 is a great package. And, I think the discount is still available for version 2 users. Would you like to place an order for it today?"

And you're starting to steam or faint or look for the quickest way out of town, and seeing red at the same time - I mean, this guy obviously doesn't have a clue!

Something to think about.

And even if you got over all of these hurdles and were able to restore your data and get your database up and running, what about all of the manuals and installation instructions and all your sticky notes that you had posted all over your office?

OK, I've printed out copies of all of my notes and stored them off-site, too, along with enough instructions to get us going til we can order some new manuals. And, I've even got a list of the manuals and other books that would need to be replaced.

Great. Now, has anybody seen any of our pre-printed check stock laying around that we can print those payroll checks on?! And, did anyone find our signature cartridge for the printer so we don't have to sign those 600 checks by hand?!

What do you mean, it'll take 6 weeks to get more stock ordered and printed?!!! We've got a payroll to meet!!! Uh-Oh, this cartridge doesn't seem to work with that laser printer - find us another one!

Something to think about.

You need to put together a disaster recovery plan for your site, and test it out and refine it til you get all the bugs worked out. What do you need to get your operation back up and running? Who can you call for replacement equipment and supplies? What are their phone numbers? Where do you put the equipment once you get it? Can you restore your network and Oracle and Banner and your database to that equipment and get it all running?

What about the C and COBOL compilers and compression software and FTP utilities and other software and utilities that you might need til you get back on your feet that isn't included with the replacement hardware or with Banner or Oracle? What about your manuals and user guides and other reference books? And, what about the check stock and other forms and signature cartridges and other specialized hardware that you need?

Get your group together and brainstorm and write it all down and create your disaster recovery plan, and try it all out as much as you can.

If All Else Fails      [Top]  [Contents]

If all else fails, and your backups are corrupted (or nonexistent), and you can't even start up Oracle on your database, and your company's future depends on the data in that database (you get the picture), you can contact Oracle Support Services to have their systems engineer specialists come to your location and try to recover your data using their recovery tools. But, be prepared to shell out $10,000 or more for the attempted recovery, and there are no guarantees. However, they can do some amazing things. In one case I know about, an NT Oracle database server crashed, cutting one of the datafiles in half, and Oracle was able to recover all the data in that datafile in about four hours!

The Resolution      [Top]  [Contents]      (Added 4/13/05)

So, what did we do to fix our missing index datafiles?  Well, the RAID drive with the table datafiles was still up.  And, we had our archive logs on a third drive (see Enabling Archiving), with the last one being created at 5:11:06 PM, and copies of our redo logs multiplexed on two drives (see Adding Redo Log Members).  And, we had Friday night's backups (see Backups).  So, with the archive logs and redo logs separated from the table and index datafiles (see Bulletproofing and Moving Datafiles), we could recover from a drive loss.  In this case, we had the archive logs up through Monday afternoon that we could apply to Friday night's backup to bring the datafiles up to a point in time before the failure.

I restored the index datafiles from that backup to the table datafiles directory (instead of to the still-dead index drive).  I also restored the table datafiles and the control files from that backup to their current locations (I didn't want to use the current control files with half the datafiles not being close-able during the shutdown).  Since we keep 5 days of archive logs on-line, I didn't have to restore any of those (see What To Restore).  Then, I started up the database in the mount state and did a "rename file" on all of the index datafiles so that Oracle would know they are now in the table datafiles directory (see Moving Datafiles).  I then issued the recovery command with "using backup controlfile" to bring all datafiles and the control files back up to date (see Loss of Control Files):
     recover automatic database using backup controlfile until time '2005-04-11:17:10:00';
(Actually, I tried several recoveries until I didn't get "ORA-01113: file 1 needs media recovery" when opening the database - full recovery, then 5:15, then 5:11, then 5:10 - starting from restoring the datafiles each time, since I didn't know what to expect with this scenario.)

After opening the database and trying to log into Banner, it gave an error message about the temporary tablespace, so, I recreated that (see Loss of TEMP Datafile).  Then, we went to the conference room where the manual check run planning was still underway (see The Ultimate Disaster) to let them know that Banner was back up, and they bolted out the door to run the payroll, getting it done by the time the service technician arrived to complete the diagnostics and work on the dead drive.  I quickly ran a database backup while he was running diagnostics, and was done with time to spare before he needed to take the system down.  We celebrated at lunch the next day, and got letters of commendation for our files.

Summary      [Top]  [Contents]

In this presentation, we discussed steps you can take to bulletproof your database against data loss, including enabling the archiving of redo log files, separating the different types of database files (with several ways to move those files), adding redo log members and groups, and mirroring log files and control files.   We also discussed the two methods of backup that Oracle provides - cold and hot - along with identifying the files that need to be backed up, how to perform backups with the database either down or available to users, and other tasks you might run at that same time, such as gathering database statistics. Finally, we discussed how to recover from various disaster scenarios, including how to determine the cause of failure, identifying the database files to restore for recovery, the three basic disaster recovery methods of database, datafile, and tablespace recovery, and how to do either a complete up-to-the-minute recovery or an incomplete recovery to an earlier point in time. We also went over some specific file type recovery scenarios, including the step-by-step commands needed for their recovery.

But, the main points to remember are to back up, back up, back up, making sure that you can restore from those backups to the same place or to a different disk or directory; practice, practice, practice your disaster recovery plans (create a test database and try out these various scenarios and others to see what their symptoms are and how you can recover from them); and, in the unlikely event that disaster does strike, don't panic - follow your plan.

===================================================================

Contributors:      [Top]  [Contents]

My thanks and appreciation go out to everyone who contributed solutions to various disaster scenarios and to other topics presented in this paper, including but not limited to:

  1. Antonio Martino (Oracle Corporation), who was my instructor for the Oracle7 Backup and Recovery Workshop.
  2. The boracle@sungardsct.com listserve members, from whom I gleaned nuggets of gold over the past years.
  3. Nina Boston (University of Arkansas Cooperative Extension Service), my boss who surprised me by consenting to send me to the Oracle classes.
  4. Toon Koppelaars, who said it's nice to see that a paper written several years ago still helps people.

===================================================================

References and Sources:      [Top]  [Contents]

"Solving Oracle Data-Block Corruption" by Toon Koppelaars:
      Oracle Magazine; Jan/Feb 1996 (no longer available at oramag.com)
      http://web.inter.nl.net/users/T.Koppelaars/ora01578.rtf
See also http://www.fors.com/orasupp/rdbms/dba/28814_1.HTM for another paper on data block corruption and recovery.

"Optimal Flexible Architecture: Oracle7 for Open Systems, Part 1" (OFA) by Cary V. Millsap:
      Oracle Magazine; May/June 1995
      http://www.oramag.com/archives/55DBA.html (no longer available)

"Oracle8i Backup & Recovery Handbook" by Rama Velpuri; Oracle Press; 2000.  Get this book - highly recommended!  (Unfortunately, no 9i version yet.)

Oracle Books (for a listing, see http://download-west.oracle.com/docs/cd/B10501_01/nav/docindex.htm):
     Oracle9i User-Managed Backup and Recovery Guide
     Oracle9i Database Administrator's Guide
     Oracle9i Backup and Recovery Concepts
     Oracle Data Guard Concepts and Administration
     Oracle9i Recovery Manager User's Guide
     Oracle9i Recovery Manager Quick Reference
     Oracle9i Database Utilities (includes export and import)

Oracle Backup and Recovery classes; Oracle Corporation (various sites).  Oracle keeps moving these course descriptions around, so, these links may not be current.  See Oracle's education site for latest offerings.
      Enterprise DBA Part 1B: Backup and Recovery (Oracle 8i)
      Oracle9i Database Administration Fundamentals II

Oracle Underground Frequently Asked Questions web site:
      http://www.orafaq.org/faq.htm

Oracle Support Bulletins (these are Oracle 8 and older, but still may be useful):
      http://www.uaex.edu/srea/osee/osee.html

GNU Zip (Source code. Written in C. Must compile it on your site using the included Makefile, then, copy the resulting gzip and gunzip executables to your /usr/local/bin directory, and put links into /usr/bin for gzip and gunzip that point to those /usr/local/bin entries. Also includes Makefiles for VMS and other operating systems. Other ftp sites for gzip and other GNU tools are listed in http://www.gnu.org/order/ftp.html. There may now be later versions than 1.2.4 at the GNU web site.):
      ftp://ftp.gnu.org/pub/gnu/gzip/gzip-1.2.4.tar (tar format; 780K)
      ftp://ftp.gnu.org/pub/gnu/gzip/gzip-1.2.4.tar.gz (gzip format; 215K)
      (Both of these can be opened with Nico Mak's WinZip on Windows systems)

WARNING: If you do use GNU Zip for compressed backups (or any other 3rd party tool), make sure you have an uncompressed backup of that tool (source and executables) on tape or somewhere else that you can easily get to it in case you have a disk crash or other disaster and need to recover those zipped files from tape! Zipped files without the decompression software are useless!  (Note: The Windows WinZip utility CAN read and extract GNU Zip (.gz) files; however, don't try to take database (.dbf) files created on one system, such as a UNIX system, and use them on another system, such as a Windows system.)

===================================================================

Appendix      [Top]  [Contents]

Sample Textual Control File      [Top]  [Contents]

A sample textual control file is shown below, before any editing is done on it (note: other versions of Oracle will have differing header information).

Dump file /u00/oracle/admin/PROD/udump/prod_ora_16060.trc
Oracle7 Server Release 7.2.3.0.0 - Production Release
PL/SQL Release 2.2.3.0.0 - Production
ORACLE_HOME = /u00/oracle/product/v723
ORACLE_SID = PROD
Oracle process number: 15 Unix process id: 16060
System name: AIX
Node name: Ournode
Release: 1
Version: 4
Machine: 123456789012

Thu May 21 23:21:25 1998
Thu May 21 23:21:25 1998
*** SESSION ID:(8.3158) 1998.05.21.23.21.25.000
# The following commands will create a new control file and use it
# to open the database.
# No data other than log history will be lost. Additional logs may
# be required for media recovery of offline data files. Use this
# only if the current version of all online logs are available.
STARTUP NOMOUNT
CREATE CONTROLFILE REUSE DATABASE "PROD" NORESETLOGS ARCHIVELOG
      MAXLOGFILES 32
      MAXLOGMEMBERS 2
      MAXDATAFILES 30
      MAXINSTANCES 8
      MAXLOGHISTORY 800
LOGFILE
      GROUP 1 (
            '/u00/oradata/PROD/log_PROD_1A.rdo',
            '/u02/oradata/PROD/log_PROD_1B.rdo'
      ) SIZE 500K,
      GROUP 2 (
            '/u00/oradata/PROD/log_PROD_2A.rdo',
            '/u02/oradata/PROD/log_PROD_2B.rdo'
      ) SIZE 500K,
      GROUP 3 (
            '/u00/oradata/PROD/log_PROD_3A.rdo',
            '/u02/oradata/PROD/log_PROD_3B.rdo'
      ) SIZE 500K
DATAFILE
      '/u03/oradata/PROD/syst_PROD_01.dbf',
      '/u03/oradata/PROD/rbs_PROD_01.dbf',
      '/u03/oradata/PROD/temp_PROD_01.dbf',
      '/u03/oradata/PROD/tool_PROD_01.dbf',
      '/u03/oradata/PROD/usr_PROD_01.dbf',
      '/u03/oradata/PROD/devl_PROD_01.dbf',
      '/u03/oradata/PROD/devl_PROD_04.dbf',
      '/u03/oradata/PROD/devl_PROD_02.dbf',
      '/u03/oradata/PROD/temp_PROD_02.dbf',
      '/u03/oradata/PROD/devl_PROD_03.dbf',
      '/u03/oradata/PROD/temp_PROD_03.dbf',
      '/u03/oradata/PROD/usr_PROD_02.dbf'
;
# Recovery is required if any of the datafiles are restored backups,
# or if the last shutdown was not normal or immediate.
RECOVER DATABASE
# All logs need archiving and a log switch is needed.
ALTER SYSTEM ARCHIVE LOG ALL;
# Database can now be opened normally.
ALTER DATABASE OPEN;

Cloning a Database      [Top]  [Contents]      (updated 6/9/05)

A copy of a database can be made with a different Oracle SID name in order to create a new test database to try out transactions on, or to clone the PPRD database into the PROD database, or to make a copy from a backup to restore tables from without impacting the current database, or for other purposes. As long as you get all the pieces done, cloning is fairly straightforward, and should take less than an hour to do.   The steps below would need to be modified slightly if you are restoring a copy from tape to the new database's directories instead of copying from an existing online database.  The details of the cloning procedure are in cloning.txt.

The steps in cloning from an existing database to a new database are:

  1. Copy the existing database's init.ora file to the new database's init.ora file, and edit the parameters in the copy to match.
  2. Edit the /etc/oratab file to add a line for the new SID.
  3. Get the list of pathnames for the existing database's datafiles, control files, and redo log files.
  4. Create a textual control file of the existing database.
  5. Edit the textual control file from step 4 to match the new SID, directory locations, and file names.
  6. Shut down tasks that are running on the existing database, as needed, such as jobsub, posting, and approvals.
  7. Shutdown the existing database.
  8. Create other directories needed by the new database, such as the bdump, cdump, udump, and archivelog directories.
  9. Copy the files from step 3 to the new database's directories, renaming the files as needed to match the new SID.
  10. Run the textual control file from step 5, creating the new control files and starting up the new database instance.  (Note that the existing database must be shut down, in most cases, before Oracle will create the control file of the copy.)
  11. Update the global name to the new SID (if needed for remote access with create database link).
  12. Make other updates for Banner for the new SID, such as changing the jobsub sequence number to not collide with the original, and changing the instance name for GUAINST.
  13. Start up the existing database instance.
  14. Edit the listener.ora file to add a section for the new SID.
  15. Reload the listener.
  16. Edit the tnsnames.ora files on the server and on the client network to add the new SID.
  17. Edit the start_jobsub.shl file to add a section for the new SID, if needed.
  18. Stop and restart the jobsub processes.
  19. Restart other tasks as needed.

Now, the new instance is up and running, listener and jobsub recognize it, and the original instance is up and running. (These steps should be generic to any Oracle site, except for parameter file and other changes for sites using Data Guard; steps 12 and 17 through 19 are specific to Banner products.)
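Step 5 above (and the file renaming in step 9) is mostly mechanical search-and-replace, so sed can do the bulk of it. This is only a sketch under assumed names - an old SID of PROD, a new SID of TEST, and a textual control file saved as create_ctl_PROD.sql - so adjust it for your site:

```shell
# Sketch: rewrite the SID in the database name, the datafile/redo log
# leaf names, and the directory paths of the textual control file.
# (Hypothetical file names - review the output before running it.)
sed -e 's/"PROD"/"TEST"/' \
    -e 's/_PROD_/_TEST_/g' \
    -e 's|/oradata/PROD/|/oradata/TEST/|g' \
    create_ctl_PROD.sql > create_ctl_TEST.sql
```

Note that when the database name changes, the CREATE CONTROLFILE line itself typically also needs REUSE changed to SET and NORESETLOGS changed to RESETLOGS (see the comments Oracle writes into the trace file).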

WARNING: This cloning procedure may not work with some of the Oracle tools (such as the RMAN Recovery Manager's recovery catalog, because RMAN uses an internal database ID instead of just the Oracle SID to identify the database). So, if you want to use those tools, you may need to find some other way to clone your Oracle database.  You may be able to set up separate RMAN recovery catalogs for the databases in Oracle 8+, which may allow you to use this cloning procedure (with separate catalogs for production and for the clone).  Also, in Oracle 8i+, you can copy tablespaces between databases, which would allow you to create an empty clone database, then copy your production data tablespaces to that clone database (dropping and re-copying the tablespaces as needed to refresh the clone data).

===================================================================

Sample SQL and Unix Shell Scripts:      [Top]  [Contents]

backup_list.sql:   DB-based backup file list generation (would need to be modified to work on a per-tablespace basis for hot backups). Don't run this as part of your only backup script. If the database is down because of a crash, you'll need a backup script that can run without accessing the database. The SID-based backup script would be best in that case.

backup_list.shl:  SID-based backup file list generation.
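The SID-based approach works because of the standard naming convention (every file belonging to a database carries the SID in its leaf name), so the list can be built with find alone, even when the database is down. A minimal sketch of that idea (the script name and directory arguments are hypothetical; the real backup_list.shl is more involved):

```shell
#!/bin/sh
# Sketch only: list every file whose leaf name contains the given SID,
# building a backup file list without touching the database.
# Usage (hypothetical): sid_file_list.sh PROD /u00 /u02 /u03 /u05
SID=$1; shift
find "$@" -type f -name "*${SID}*" 2>/dev/null | sort
```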

Our Backup Scripts      [Top]  [Contents]

Our database backup scripts run nightly as a cron job from userid oracle for making cold backups of our set of databases. Features of this nightly backup system include minimal database downtime, low tape storage requirements (through gzip compression), database block integrity checking (through a full database export), and two nights' worth of online compressed backups (datafiles and full exports) for quick disaster recovery, along with reports on tablespace, datafile, table, and index space usage for proactive maintenance.

It requires that your datafiles, control files, redo logs, archive logs, and other database files follow a standard naming convention (e.g., for PROD, all files have "PROD" in the leaf name of the pathname, such as devl_PROD_01.dbf for datafiles and arch_PROD_0000000977.arc for archivelogs). It uses two directories to hold two days' worth of zipped online backups, which you must set up (such as /u05/oradata/PROD.gzip1 and /u05/oradata/PROD.gzip2, both owned by oracle.dba). You also have to have GNU Zip (freeware compression software) loaded on your unix system. If you want to download and use these, put them in your /home/oracle directory and change them to match your site, including SID's and directory structures (I would hope your production database is really not named PROD!):

dbbackup.shl
dbb_gen_all.shl
dbb_gen_list.shl
dbb_gen_cmds.shl
dbb_overview.sql
dbb_ctlfile.sql
dbb_index_stats.sql
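The gzip compression step that produces the online backup copies can be sketched roughly like this (a hedged illustration only, not the actual contents of dbbackup.shl; the list file and destination directory names are made up):

```shell
# Sketch: compress each file named on a backup list into tonight's
# .gzip directory, keeping the leaf name with a .gz suffix appended.
# backup_list.txt and DEST are hypothetical names for illustration.
DEST=/u05/oradata/PROD.gzip1
while read f; do
      gzip -c "$f" > "$DEST/$(basename "$f").gz"
done < /home/oracle/backup_list.txt
```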

Also, make sure you protect dbbackup.shl so that only the oracle user ID can see it, since it contains the password for the oracle system user. And, make sure the .shl scripts have execute privileges:

chmod 700 dbbackup.shl
chmod 755 dbb_gen_all.shl
chmod 755 dbb_gen_list.shl
chmod 755 dbb_gen_cmds.shl

To set up this dbbackup job to run from cron in unix, log into unix as user "oracle", type crontab -e to bring up the vi editor for cron jobs for the current userid, and insert the following line to run dbbackup.shl at 11:00 pm ("0 23") each weeknight ("1-5"); then, save the change (:wq):

0 23 * * 1-5 /usr/bin/sh /home/oracle/dbbackup.shl

You may need to add other cron jobs running from the unix system user "root" to do other shutdown and startup tasks before and after the backup, such as shutting down sleep/wake tasks before the backup (by removing their flag files) and restarting jobsub and sleep/wake tasks after the backup. For example:

0 18 * * 1-5 rm -f /home/jobsub/PROD.fgractg
0 18 * * 1-5 rm -f /home/jobsub/PROD.forappl
0 2 * * 2-6 ./jsbringup.shl
Checks to see if the database is up and if the jobsub processes aren't running yet, then does an: su - jobsub "-c sh start_jobsub.shl".
0 8 * * 1-5 ./swbringup.shl
Checks to see if the database is up and if the sleep/wake processes aren't running yet, then does an fwakeup.shl command for each process to start.

After updating oracle's and root's cron jobs, you may have to shut down cron and let it restart itself if you find that unix is running multiple copies of your change (I just do this as a matter of habit).  To shut cron down, log into unix as user "root" and enter the following:

# ps -ef | grep cron
Shows the process ID for /usr/sbin/cron, such as 12345 (the first number after "root").
# kill -9 12345
Kills the cron task.  Unix will immediately start it back up again and read all of the crontab files for all of the users to set up those timed tasks.

For more information about cron, enter man cron at the unix prompt.