Implementing Data Guard

General Concepts
Architecture
Data Guard Protection Modes
Data Guard Physical Implementation
Data Guard Logical Implementation
Quick Steps for Creating the Physical Standby from a Hot Backup
Data Guard Monitoring
Failover Steps
Switchover Steps
Implementation Tips
More Information
Applying Patches with Standby
General Concepts
Oracle9i Data Guard is the management, monitoring, and automation software that works with a production database and one or more standby databases to protect data against failures, errors, and corruption that might otherwise destroy your database.
Data Guard Components
Oracle9i Data Guard consists of the following components:
A database can operate in one of two mutually exclusive roles: primary or standby.
Data Guard Interfaces
The log transport services and log apply services use the following processes to ship and apply redo logs to the physical standby database:
On the primary database site, the log writer process (LGWR) collects transactions from the log buffer and writes them to the online redo logs. The archiver process (ARCH) creates a copy of the online redo logs and writes it to the local archive destination. Depending on the configuration, either the archiver process or the log writer process can transmit redo logs to the standby database. When using the log writer process, you can specify synchronous or asynchronous network transmission of redo logs to remote destinations. Data Guard achieves synchronous network I/O using the LGWR process, and asynchronous network I/O using LGWR network server processes (LNS). These network server processes are configured through the LOG_ARCHIVE_DEST_n initialization parameter. Data Guard's asynchronous log transport (i.e., the Maximum Performance mode) is recommended for configurations in which the network distance is up to thousands of miles, providing continual maximum performance while minimizing the risk of transaction loss in the event of a disaster.
On the standby database site, the remote file server process (RFS) receives archived redo logs from the primary database. The primary site launches the RFS process during the first log transfer. The redo log information received by the RFS process can be stored as either standby redo logs or archived redo logs. Data Guard introduces the concept of standby redo logs (a separate pool of log file groups). Standby redo logs must be archived by the ARCH process to the standby archive destination before the managed recovery process (MRP) applies the redo log information to the standby database.
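Standby redo logs are created on the standby with ALTER DATABASE. A minimal sketch follows; the group numbers, file paths, and 100M size are illustrative (not from this document's configuration), and the size should match the primary's online redo logs:

```sql
-- On the standby: add standby redo log groups sized to match the
-- primary's online redo logs (paths, group numbers, size are examples)
ALTER DATABASE ADD STANDBY LOGFILE GROUP 4
    ('/u01/oradata/FGUARD/stby_redo04.log') SIZE 100M;
ALTER DATABASE ADD STANDBY LOGFILE GROUP 5
    ('/u01/oradata/FGUARD/stby_redo05.log') SIZE 100M;
```

The created groups can be confirmed by querying V$STANDBY_LOG.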
The fetch archive log (FAL) client is the MRP process. The fetch archive log (FAL) server is a foreground process that runs on the primary database and services fetch archive log requests coming from the FAL client. A separate FAL server is created for each incoming FAL client.
Thanks to the FAL_CLIENT and FAL_SERVER parameters, the managed recovery process on the physical standby database will automatically detect and resolve gaps at the time redo is applied, so you do not need to transfer those missing logs yourself.
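A sketch of how these two parameters might look in the physical standby's init.ora, reusing the FGUARD/PROD_FGUARD Oracle Net names that appear later in this document (the values are illustrative):

```ini
# init.ora on the physical standby (illustrative Oracle Net names)
# FAL_SERVER: the service the standby contacts to request missing logs
fal_server=PROD_FGUARD
# FAL_CLIENT: the service the primary uses to ship the requested logs back
fal_client=FGUARD
```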
When using the Data Guard broker (dg_broker_start = true), a monitor agent process named the Data Guard Broker Monitor (DMON) runs on every site (primary and standby) and maintains two-way communication.
Logical Standby Processes Architecture
The major difference between the logical and physical standby database architectures is in their log apply services. You can query a logical standby while it simultaneously applies transactions from the primary; this is ideal for a business that requires a near real-time copy of the production database for reporting.
The logical standby process (LSP) is the coordinator process for two groups of parallel execution (PX) processes that work concurrently to read, prepare, build, and apply completed SQL transactions from the archived redo logs sent from the primary database. The first group of PX processes reads log files and extracts the SQL statements using LogMiner technology; the second group of PX processes applies these extracted SQL transactions to the logical standby database. The mining and applying occur in parallel. A logical standby database does not use standby online redo logs, and it does not have FAL capabilities in Oracle9i: all gaps are resolved by the proactive gap resolution mechanism running on the primary, which polls the standby to see if it has a gap.
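One way to watch the LSP coordinator and its PX process groups at work is to query V$LOGSTDBY on the logical standby. This is a sketch; the exact columns available can vary by release:

```sql
-- Each row is one SQL Apply process (COORDINATOR, READER, BUILDER,
-- PREPARER, ANALYZER, APPLIER) with the SCN it has reached and
-- a description of its current activity
SELECT type, high_scn, status
FROM v$logstdby;
```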
Data Protection Modes

Maximum Protection: It offers the highest level of data availability for the primary database. Redo records are synchronously transmitted from the primary database to the standby database using the LGWR process. A transaction is not committed on the primary database until it has been confirmed that the transaction data is available on at least one standby database. This mode is usually configured with at least two standby databases. If all standby databases become unavailable, the primary instance may shut down; this ensures that no data is lost when the primary database loses contact with all the standby databases. Standby redo logs are required in this mode, so a logical standby database cannot participate in a maximum protection configuration. This mode is similar to 9iR1's guaranteed mode. LGWR SYNC AFFIRM option.
Maximum Availability: It offers the next highest level of data availability for the primary database. Redo records are synchronously transmitted from the primary database to the standby database using the LGWR process. A transaction is not complete on the primary database until it has been confirmed that the transaction data is available on the standby database. If the standby database becomes unavailable, the primary database does not shut down. Instead, the protection mode is temporarily switched to maximum performance mode until the fault has been corrected, and the standby database then re-synchronizes with the primary database. This protection mode supports both physical and logical standby databases, and is only available in Oracle9i Release 2. LGWR SYNC AFFIRM option.

Maximum Performance: It is the default protection mode. It offers slightly less primary database protection than maximum availability mode, but with higher performance. Redo logs are asynchronously shipped from the primary database to the standby database using either the LGWR or ARCH process. When operating in this mode, the primary database continues its transaction processing without regard to data availability on any standby database, and there is little or no effect on performance. This protection mode is similar to the combination of 9iR1's Instance, Rapid, and Delay modes. It supports both physical and logical standby databases. LGWR ASYNC AFFIRM or NOAFFIRM option.
Mode                 | Log Writing Process | Network Trans Mode | Disk Write Option | Redo Log Reception Option      | Supported on
Maximum Protection   | LGWR                | SYNC               | AFFIRM            | Standby redo logs are required | Physical standby databases
Maximum Availability | LGWR                | SYNC               | AFFIRM            | Standby redo logs              | Physical and logical standby databases
Maximum Performance  | LGWR or ARCH        | ASYNC if LGWR      | NOAFFIRM          | Standby redo logs              | Physical and logical standby databases
These can be found in the Oracle documentation in Chapter 5 of the Data
Guard Concepts and Administration Manual.
• AFFIRM assures that archive logs are written to disk, primary or standby.
• MANDATORY assures that redo logs are not overwritten until archive logs are successfully created. This should apply only to the primary database.
• REOPEN=30 means that there will be a 30-second delay before the ARCn and/or LGWR processes retry a MANDATORY destination that failed.
• DELAY is in minutes; it does not delay the copy of an archive log file to a standby server, but rather the application of the redo on the standby after the archive log has been copied there. This will not help primary database performance.
• Using ARCH instead of LGWR for the second standby database may help primary database performance, but smaller log files would probably be required. SYNC=PARALLEL applies to LGWR only. With ARCH, redo is copied and applied only at a log switch on the primary; with LGWR, entries are copied continuously to a standby archive log but are still applied only at a switch. LGWR is more efficient, since it writes redo entries to all standby databases at once, but a primary with two standby databases could possibly cause a performance issue for the primary database (possible, but unlikely). Additionally, multiple archiver processes can be created on the primary database: increase the value of the LOG_ARCHIVE_MAX_PROCESSES parameter to start additional archiver processes. The default on my machine appears to be 2 and not 1 as stated in the manuals, probably because I have two standby databases.
• The ARCHIVE_LAG_TARGET parameter can be used to increase the frequency of log switches, thus sending less data to the standby databases more often. It specifies the maximum number of seconds between log switches, forcing a log switch when that many seconds have elapsed.
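A sketch of how the attributes above might combine in a single remote destination. The service name, REOPEN/DELAY values, process count, and lag target are illustrative, not taken from this document's configuration:

```sql
-- MANDATORY + REOPEN=30: retry a failed mandatory destination after 30s
-- DELAY=10: apply redo on the standby 10 minutes after the copy arrives
ALTER SYSTEM SET log_archive_dest_2 =
    'SERVICE=standbydb LGWR SYNC AFFIRM MANDATORY REOPEN=30 DELAY=10';
-- Run up to 4 archiver processes on the primary
ALTER SYSTEM SET log_archive_max_processes = 4;
-- Force a log switch at least every 1800 seconds (30 minutes)
ALTER SYSTEM SET archive_lag_target = 1800;
```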
Now let’s check for objects and attributes which are unsupported on a
logical standby database. Application objects such as tables could be a
problem.
SELECT * FROM DBA_LOGSTDBY_UNSUPPORTED ORDER BY owner,table_name,column_name;
Now check for missing primary keys. Application tables without unique
primary keys will require them, as rows will not be identifiable in the
logical standby database for update by SQL Apply. Drop any objects
listed or create primary keys for them.
SELECT OWNER, TABLE_NAME, BAD_COLUMN
FROM DBA_LOGSTDBY_NOT_UNIQUE;
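For a table reported by these queries, one common fix is a RELY constraint on columns known to be unique, so SQL Apply can identify rows without the cost of enforcing a new index on the primary. The table and column names below are purely illustrative:

```sql
-- Hypothetical table: RELY DISABLE declares the key for SQL Apply's
-- row identification without actually validating or enforcing it
ALTER TABLE app_owner.order_history
    ADD PRIMARY KEY (order_id, line_no) RELY DISABLE;
```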
Using Data Guard Redo Apply in a LAN, the following is recommended:
• Use Maximum Protection or Maximum Availability modes for zero data loss; the performance impact was less than 3% in all synchronous tests. With a single remote archive destination, use the NOPARALLEL option ("lgwr sync=noparallel").
• For very good performance and a minimal risk of transaction loss in the event of a disaster, use Maximum Performance mode, with LGWR ASYNC and a 10 MB async buffer (ASYNC=20480). LGWR ASYNC performance degraded no more than 1% as compared to using the ARCH transport. LGWR ASYNC also bounds the risk of potential transaction loss much better than the ARCH transport. The 10 MB async buffer outperformed smaller buffer sizes and reduced the chance of network timeout errors in a high latency / low bandwidth network.
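ASYNC is specified in 512-byte blocks, so ASYNC=20480 is the 10 MB buffer referred to above (20480 × 512 bytes = 10,485,760 bytes). A destination using it might look like this (the service name is illustrative):

```sql
-- 20480 blocks x 512 bytes = 10 MB async network buffer
ALTER SYSTEM SET log_archive_dest_2 =
    'SERVICE=standbydb LGWR ASYNC=20480 NOAFFIRM';
```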
Metropolitan and Wide Area Networks
Data Guard is used across metropolitan area networks (MANs) or WANs to get complete disaster recovery protection. Typically a MAN covers a large metropolitan area and has network round-trip times (RTTs) from 2-10 ms. For the MAN/WAN tests, different network RTTs were simulated during testing to measure the impact of the RTT on primary database performance. The tests were conducted for the following RTTs: 2 ms (MAN), 10 ms, 50 ms, and 100 ms (WAN). Additionally, tests using Secure Shell (SSH) port forwarding with compression were also done for different RTTs.
Best practices recommendations are:
• Use Maximum Protection and Maximum Availability modes over a MAN for
zero data loss. For these modes, the network RTT overhead over a WAN
can impact response time and throughput of the primary database. The
performance impact was less than 6% with a 10 ms network RTT and a high
transaction rate.
• For very good performance and a minimal risk of transaction loss in the event of a disaster, use Maximum Performance mode, with LGWR ASYNC and a 10 MB async buffer (ASYNC=20480). LGWR ASYNC performance degraded no more than 2% as compared to remote archiving. The 10 MB async buffer outperformed smaller buffer sizes and reduced the chance of network timeout errors in a high latency / low bandwidth network.
• For optimal primary database performance throughput, use remote
archiving (i.e. the ARCH process as the log transport). This
configuration is best used when network bandwidth is limited and when
your applications can risk some transaction loss in the event of a
disaster.
• If you have sufficient memory, then set the TCP send and receive
buffer sizes (these affect the advertised TCP window sizes) to the
bandwidth delay product, the bandwidth times the network round trip
time. This can improve transfer time to the standby by as much as 10
times, especially with the ARCH transport.
• Set SDU=32768 (32K) for the Oracle Net connections between the
primary and standby. Setting the Oracle network services session data
unit (SDU) to its maximum setting of 32K resulted in a 5% throughput
improvement over the default setting of 2048 (2K) for LGWR ASYNC
transport services and a 10% improvement for the LGWR SYNC transport
service.
• Use SSH port forwarding with compression for WANs with a large RTT when using Maximum Performance mode. Do not use SSH with compression for Maximum Protection and Maximum Availability modes, since it adversely affected the primary throughput. Using SSH port forwarding with compression reduced the network traffic by 23-60% at a 3-6% increase in CPU usage. This also eliminated network timeout errors. With the ARCH transport, using SSH also reduced the log transfer time for RTTs of 50 ms or greater. For RTTs of 10 ms or less, the ARCH transport log transfer time increased when using SSH with compression.
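SSH port forwarding with compression can be sketched as follows; the host names, ports, and user are illustrative. The tnsnames.ora entry used by the log transport would then point at the local forwarded port instead of the remote listener:

```shell
# Forward local port 1522 to the standby listener, with compression (-C).
# -N: no remote command, tunnel only. Names and ports are illustrative.
ssh -N -C -L 1522:standby_host:1521 oracle@standby_host &
```

Redo shipped to localhost:1522 then crosses the WAN compressed inside the tunnel.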
The protection mode is set by executing the following command on the primary database.
ALTER DATABASE SET STANDBY DATABASE TO MAXIMIZE
{ PROTECTION | AVAILABILITY | PERFORMANCE };
The protection mode can be found by executing this query; PERFORMANCE is the default.

SELECT name, protection_mode, protection_level FROM v$database;

NAME      PROTECTION_MODE      PROTECTION_LEVEL
--------- -------------------- --------------------
STBY      MAXIMUM PERFORMANCE  MAXIMUM PERFORMANCE
- Minimal Data Loss. The AVAILABILITY mode prevents a transaction
committing on the primary until all redo entries are written to at
least one standby database. SYNC transport is required and this option
is available to both logical and physical standby type databases.
Unlike PROTECTION mode, which shuts down the primary database in the
event of failure to pass redo entries to the standby, this mode simply
lowers the protection mode to PERFORMANCE until the error is corrected.
- No Data Divergence. PERFORMANCE mode is the default setting and
available for both physical and logical standby type databases. A
transaction will commit to the primary before all redo entries are
written to any standby database.
To ensure that minimal data loss will be encountered execute this
command on the primary database. The database must be in mounted
exclusive mode to execute this command.
ALTER DATABASE SET STANDBY DATABASE
TO MAXIMIZE AVAILABILITY;
Physical Data Guard Implementation
Primary Site: | Standby Site: |
FGUARD:/u01/app/oracle/product/9.2.0:Y | FGUARD:/u01/app/oracle/product/9.2.0:N |
Primary Site (tnsnames.ora):

FGUARD =
  (DESCRIPTION =
    (ADDRESS_LIST =
      (ADDRESS = (PROTOCOL = TCP)(HOST = server1)(PORT = 1521)))
    (CONNECT_DATA =
      (SERVER = DEDICATED)
      (SERVICE_NAME = FGUARD)))

DR_FGUARD =
  (DESCRIPTION =
    (ADDRESS_LIST =
      (ADDRESS = (PROTOCOL = TCP)(HOST = server2)(PORT = 1521)))
    (CONNECT_DATA =
      (SERVER = DEDICATED)
      (SERVICE_NAME = FGUARD)))

Standby Site (tnsnames.ora):

FGUARD =
  (DESCRIPTION =
    (ADDRESS_LIST =
      (ADDRESS = (PROTOCOL = TCP)(HOST = server2)(PORT = 1521)))
    (CONNECT_DATA =
      (SERVER = DEDICATED)
      (SERVICE_NAME = FGUARD)))

PROD_FGUARD =
  (DESCRIPTION =
    (ADDRESS_LIST =
      (ADDRESS = (PROTOCOL = TCP)(HOST = server1)(PORT = 1521)))
    (CONNECT_DATA =
      (SERVER = DEDICATED)
      (SID = FGUARD)))
Step 1: The Preparation
Step 2: Backup the Primary Database Datafiles
Step 3: Create the Physical Standby Database Control File
Step 4: Transfer the Datafiles and Control File to the Standby Site
Step 5: Start the Listeners on both Primary and Standby Site
Step 6: Start the Standby Database (Primary Database already running)
Step 7: Place the Standby Database in Managed Recovery Mode
Step 8: Monitor the Log Transport Services and Log Apply Services
Step 1. Enable logging. On your primary database, instruct Oracle Database to force all logging of changes to the redo, even if nologging or unrecoverable data loads are performed:
SQL> alter database force logging;
Verify that forced logging has been enabled on your primary database, by issuing the following:
SQL> select force_logging from v$database;
Step 2. Create a password file for the primary database. Every database in a Data Guard environment must use a password file. Additionally, the password used by SYS must be the same for all primary and standby databases. On your primary database server, navigate to ORACLE_HOME/dbs and issue the following command:
$ orapwd file=orapw<sid_name> password=top_secret
Also, instruct Oracle Database to use the newly created password file, by setting the init.ora/spfile remote_login_ passwordfile parameter to either EXCLUSIVE or SHARED.
Step 3. Configure the primary database init.ora/spfile. If you are using an spfile, you may find it easier to switch to using an init.ora file while implementing your logical standby. After implementation, you can easily switch back to using an spfile. In this example, BRDSTN is the database name of both the primary and the standby. Primarydb is the Oracle Net service name of the primary database, and standbydb is the Oracle Net service name of the standby database.
db_name=BRDSTN
db_unique_name=primarydb
# dg_config specifies unique Oracle Net service names in Data Guard environment
log_archive_config='dg_config=(primarydb, standbydb)'
log_archive_dest_1='location=/orarchive/BRDSTN db_unique_name=primarydb'
log_archive_dest_2='service=standbydb valid_for=(online_logfiles,primary_role) db_unique_name=standbydb'
log_archive_dest_state_1=enable
log_archive_dest_state_2=enable
remote_login_passwordfile=exclusive
# Enable automatic propagation of data file operations
standby_file_management=auto
Step 4. Enable archiving. Ensure that your primary database is in archive log mode:
SQL> archive log list;
If archiving hasn't been enabled on your primary database, run the following:
SQL> shutdown immediate;
SQL> startup mount;
SQL> alter database archivelog;
SQL> alter database open;
Note that the LOG_ARCHIVE_START initialization parameter is obsolete in Oracle Database 10g. Archiving is automatically enabled when you put your database into archive log mode.
Step 5. Put a primary key on every replicated table. The SQL Apply process must be able to match rows changing in the primary database to the corresponding rows changing in the standby database. SQL Apply can't use a rowid, because it can be different between the two databases. Therefore, each table being replicated to a logical standby database must have a primary or unique key defined.
To identify tables that have rows that cannot be uniquely identified, query the DBA_LOGSTDBY_NOT_UNIQUE view.
Step 6. Enable supplemental logging. Enabling supplemental logging will direct Oracle Database to add a small amount of extra information to the redo stream. The SQL Apply process uses this additional information to maintain tables being replicated. On your primary database, enable supplemental logging as follows:
SQL> alter database add supplemental log data (primary key, unique index) columns;
SQL> alter system archive log current;
You can verify that supplemental logging has been enabled, by issuing the following on your primary database:
SQL> select supplemental_log_data_pk, supplemental_log_data_ui from v$database;
Step 7. Take a backup of your primary database, and move it to the standby machine. Take an offline or online backup of your primary database and copy it to your standby server. You need to back up and copy only data files, not online redo files or controlfiles.
Step 8. Create a logical standby controlfile. You must create a special logical standby database controlfile and then copy it to your standby machine. On your primary database, issue the following SQL:
SQL> alter database create logical standby controlfile as '/ora01/oradata/BRDSTN/sb.ctl';
Note the use of the keyword logical; it's critical to use the correct syntax.
After creating the logical standby controlfile, copy it to your standby machine. In this example, the standby controlfile must be placed in the /ora01/oradata/BRDSTN directory on the standby machine.
Step 9. Create init.ora for logical standby. Copy the primary init.ora file to the standby machine, and then make the necessary modifications for your logical standby database, as shown here:
# Change db_unique_name to standby Oracle Net name
db_unique_name=standbydb
# In the archive destination, change db_unique_name to standby Oracle Net name
log_archive_dest_1='location=/orarchive/BRDSTN db_unique_name=standbydb'
# Specify where arriving archive redo should be placed
standby_archive_dest=/orarchive/BRDSTN
# If you have a new controlfile name, change it here
control_files=/ora01/oradata/BRDSTN/sb.ctl
# FAL parameters facilitate initial LS setup, but are not required after setup
fal_server=primarydb
fal_client=standbydb
# Oracle recommends setting this minimally to 9
parallel_max_servers=9
Step 10. Create a password file for the logical standby database. As noted in Step 2, every Oracle Data Guard-enabled database needs a password file using the same password. On your standby machine, go to ORACLE_HOME/dbs and issue the following command:
$ orapwd file=orapw<sid_name> password=top_secret
Step 11. Configure Oracle Net for primary and standby databases. The primary and logical standby databases need to communicate with each other via Oracle Net. Ensure that both the primary and the logical standby databases have listeners and that the appropriate Oracle Net service information is in place. Here are examples of the entries in the tnsnames.ora file on both the primary and the standby servers:
primarydb =
(DESCRIPTION=
(ADDRESS=(PROTOCOL=tcp) (PORT=1521) (HOST=primary_host))
(CONNECT_DATA=(SERVICE_NAME=BRDSTN)))
standbydb=
(DESCRIPTION=
(ADDRESS=(PROTOCOL=tcp) (PORT=1521) (HOST=standby_host))
(CONNECT_DATA=(SERVICE_NAME=BRDSTN)))
Also, Oracle recommends enabling dead connection detection by the setting of sqlnet.expire_time to one minute in your sqlnet.ora file, as shown:
sqlnet.expire_time=1
Step 12. Start up and activate the logical standby database. On your logical standby server, start up and activate your logical standby database, as follows:
SQL> startup mount;
SQL> alter database
recover managed standby database;
You may need to give the above ALTER command a few minutes to complete. When it is finished, you can activate your standby database as follows:
SQL> alter database activate standby database;
Step 13. Rename your logical standby database. Renaming your logical standby database is not a required step. However, Oracle recommends renaming your logical standby database to ensure that the logical standby is never confused with the primary database.
Use the nid utility to reset the name of your logical standby database. Before running nid, shut down and start up your database in mount mode.
SQL> shutdown immediate;
SQL> startup mount;
$ nid target=sys/top_secret dbname=BRDLS
In this example, BRDLS is the new name of your logical standby database. You should now see the following line prompting you:
Change database ID and database name BRDSTN to BRDLS? (Y/[N]) =>
Enter Y and a return. At this point, you should see at the bottom of the message text:
DBNEWID - Completed successfully.
Step 14. Change the logical standby db_name in the init.ora file. Now you need to change the db_name initialization parameter. For example, in your logical standby database init.ora file, make this change:
db_name=BRDLS
Step 15. Re-create the password file for the logical standby database. After running the nid utility, you need to re-create your logical standby password file. To do this, navigate to ORACLE_HOME/dbs, remove the old password file, and issue the following OS command:
$ orapwd file=orapw<sid_name> password=top_secret
Step 16. Open the logical standby database with resetlogs. You can now make your logical standby database accessible. Start up your database, and open it with the RESETLOGS command, as follows:
SQL> startup mount;
SQL> alter database open resetlogs;
Step 17. Add temp files to the logical standby database. You'll need a temp tablespace in your logical database if you plan to do any reporting or if you ever transition the logical standby database to a primary database role. Add the temp file(s) to the logical standby as they existed on the primary database:
SQL> alter session disable guard;
SQL> alter tablespace temp add tempfile '/ora01/oradata/BRDSTN/temp01.dbf'
size 500M reuse;
SQL> alter session enable guard;
Step 18. Restart the logical standby database SQL Apply process. All you need to do now is restart the SQL Apply process on your logical standby database:
SQL> alter database start logical standby apply;
You now have a fully functioning logical standby database.
-- On the standby, get the sequence number of the last applied archive log.
SELECT MAX(sequence#) last_applied_arch_log
FROM v$archived_log
WHERE applied = 'YES';
-- On the standby, get the sequence number of the last archive log received.
-- This is the last log the standby can apply without receiving
-- additional archive logs from the primary.
SELECT MIN(sequence#) last_archive_log_received
FROM v$archived_log
WHERE (sequence#+1) NOT IN (SELECT sequence# FROM v$archived_log)
  AND sequence# > &last_applied_arch_log;
-- Connect to the primary database and obtain the sequence number
-- of the current online log:
SELECT sequence# FROM v$log WHERE status = 'CURRENT';
-- The difference between the 2nd query and the 1st query should be 0.
-- The difference between the 3rd query and the 1st query is the number
-- of archive logs that the standby database would not be able to recover
-- if the primary host became unavailable.
Another Method
Use the following SQL on the standby database (the database must be
mounted).
SELECT high.thread#, "LowGap#", "HighGap#"
FROM
  (SELECT thread#, MIN(sequence#)-1 "HighGap#"
   FROM (SELECT a.thread#, a.sequence#
         FROM (SELECT * FROM v$archived_log) a,
              (SELECT thread#, MAX(next_change#) gap1
               FROM v$log_history
               GROUP BY thread#) b
         WHERE a.thread# = b.thread#
           AND a.next_change# > gap1)
   GROUP BY thread#) high,
  (SELECT thread#, MIN(sequence#) "LowGap#"
   FROM (SELECT thread#, sequence#
         FROM v$log_history, v$datafile
         WHERE checkpoint_change# <= next_change#
           AND checkpoint_change# >= first_change#)
   GROUP BY thread#) low
WHERE low.thread# = high.thread#;
For the purpose of the following list, let's pretend the master server
is called PROD_01 and the standby server STDBY_PROD. In a nutshell,
these are the steps involved.
1. STDBY_PROD -- Suspend recovery on the standby site.
2. PROD_01 -- Suspend log shipping from the master to the standby site.
3. PROD_01 -- Perform one or more log switches on the master using alter system switch logfile; commands.
4. PROD_01 -- Note the log sequence numbers generated on the master.
5. PROD_01 -- Shutdown the master using the immediate option.
6. PROD_01 -- Ship the archived redo logs to the standby site.
7. STDBY_PROD -- Apply those archived redo logs to the standby instance.
8. STDBY_PROD -- Shutdown the standby using the immediate option.
9. PROD_01 -- Ship a copy of the master control files to the standby site, positioned in the proper location.
10. PROD_01 -- Ship a copy of the master's online redo logs to the standby site, positioned in the proper location.
11. STDBY_PROD -- Startup the new master database.
12. STDBY_PROD -- Create a new standby control file.
13. PROD_01 -- Get that new standby control file and position it properly.
14. PROD_01 -- Startup mount (standby) the new standby to verify it is OK.
15. PROD_01 -- Shutdown the standby using the immediate option.
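For comparison with the manual rebuild above, Oracle's built-in graceful switchover can be sketched with the standard commands. This assumes a healthy managed-recovery configuration; details such as disconnecting sessions and restarting log apply are omitted:

```sql
-- On the current primary (PROD_01): demote it to a physical standby
ALTER DATABASE COMMIT TO SWITCHOVER TO PHYSICAL STANDBY;
SHUTDOWN IMMEDIATE;
STARTUP NOMOUNT;
ALTER DATABASE MOUNT STANDBY DATABASE;

-- On the old standby (STDBY_PROD): promote it to primary
ALTER DATABASE COMMIT TO SWITCHOVER TO PRIMARY;
SHUTDOWN IMMEDIATE;
STARTUP;
```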
In this FULL example, the old standby database (prod_02) becomes the new primary, and the old primary (prod_01) becomes the new standby database.
Standby Online Redo Logs -- Advantages:
- Pre-allocated files
- Can place on raw devices
- Can be duplexed for more protection
- Improve redo data availability
- No Data Loss capable

Standby Archived Redo Logs -- Advantages:
- No extra ARCH process
- Reduce lag time