Monday, December 28, 2015

Restore Standby Database from Tape

My Data Guard database was running on 11.2.0.3 version and the DB size was around 10 TB.  Some time back, the primary database Hardware was crashed and whole DB  server went down and UNIX admin not able to bring the server up. It is  VERY critical and high SLA  database,  management wanted to bring it up as quickly as possible.

Here is what i have done to bring the application up.  I would like to share this and thought it might be helpful for others.

Environment :

Oracle version 11.2.0.3
Data Guard replication from primary to standby
Flashback feature is not enabled on the database.
Data Guard is running in Maximum performance mode

Here are the major steps :

1. Activate the standby database
2. Point the application to old standby and currently it is primary
3. Drop the old primary and restore from tape and setup the standby

Step 1  Activate the standby database.

As a first step, we need to activate the current standby data base to bring the application up.

Step 2  Point the application to current primary(activated the standby from step 1).

Application team will take care of this. DBA can provide the DB/Host info just in case application engineer needs it.

Step 3 Rebuild the standby database from tape.

Now the old Primary database is useless.  We will have to restore from the scratch. We have several ways we can restore the DB.  But I restored the DB from tape. Since i did not want to clone the DB from current primary. The reason is, i did not want to add additional load   on the current primary database. 

Step A

Disable archive log delete job in current Primary database.  The reason is, we need to apply the archive log files after building the standby database. It is safe to disable the archive log delete job.  I just want to save all the archive logs on the server.

Step B  Drop the broken database(the one it crashed)

Once UNIX SA bring the server up,  drop the database and restore the database from the scratch.
We don't need to restore the database from scratch if flashback is enabled on the database. But apparently, the flashback is not enabled on my environment and end up restoring the database from scratch.

We have several ways to drop the database. Please refer this just in case if you need more info. How to drop the database

I used RMAN to drop the database.  You can use different approach if you like.

RMAN> connect target
RMAN> startup force mount
RMAN>  sql 'alter system enable restricted session';
RMAN> drop database;

Step C  Verify the backup list from tape.

Here is the sample command to check the tape backup list..

/usr/openv/netbackup/bin/bplist -B -C hostname.bu -t 4 -l -s 02/22/2015 -e 02/22/2015 -R /

From the above command, we can find out the control file and spfile and backup info.

Step D  Restore the SPFILE 

Restore the spfile from tape by using below script.  I masked  XXXX in some places due to security reason. Please use the appropriate name according to your environment.  I restored the sfile under /work area.  

run
{
allocate channel t01 device type 'sbt_tape' parms  'ENV=(NB_ORA_CLIENT=XXXXX.bu,NB_ORA_SERV=hcmsnbu1)';
restore spfile to '/work/XXXXX_spfile.ora' from '/XXXXX_SPFILE_20140222_0400_58082_n2p1bbbi_1_1_840281458';
RELEASE CHANNEL t01;
}

Move the spfile to $ORACLE_HOME/dbs


Step E  Restore the control file

 Restore control file from tape using below script.
run
{
allocate channel t01 device type 'sbt_tape' parms 'ENV=(NB_ORA_CLIENT=XXXXX.bu,NB_ORA_SERV=hcmsnbu1)';
restore controlfile to '/work/XXXX_control.ctl' from 'XXXXX_CTRL_20140222_0400_58083_n3p1bbdp_1_1_840281529';
RELEASE CHANNEL t01;
}

Move the control file to appropriate location.

cd /work/
cp  XXXXXX_control.ctl  /data01/oradata/XXXX/control01.ctl
cp  XXXXXX_control.ctl  /data02/oradata/XXXX/control02.ctl
cp  XXXXXX_control.ctl  /data03/oradata/XXXX/control03.ctl

Step F  Start the instance.

Startup the instance as below

SQL> startup nomount
SQL> alter database mount;
















Step G  Restore and Recover the database. This step took me 10 hours.  In your environment,  restore time would be depends on your DB size, number of CPU on your host.

Go to Current primary and get the below SCN.

system@XXXXXX> SELECT TO_CHAR(STANDBY_BECAME_PRIMARY_SCN) FROM V$DATABASE;

TO_CHAR(STANDBY_BECAME_PRIMARY_SCN)
----------------------------------------
66132064577

system@XXXXXX>

Apply the above SCN +1 on the below script.

Here is the shell script i used to restore the database.  I ran this script in nohup background mode. I used 8 channel since my host has 8 CPU with hyper threading enabled.

$ORACLE_HOME/bin/rman msglog=rman_restore_db.log <
connect target
 run
        {
        allocate channel t01 device type 'sbt_tape' parms 'ENV=(NB_ORA_CLIENT=XXXXXX.bu,NB_ORA_SERV=hcmsnbu1)';
        allocate channel t02 device type 'sbt_tape' parms 'ENV=(NB_ORA_CLIENT=XXXXXX.bu,NB_ORA_SERV=hcmsnbu1)';
        allocate channel t03 device type 'sbt_tape' parms 'ENV=(NB_ORA_CLIENT=XXXXXX.bu,NB_ORA_SERV=hcmsnbu1)';
        allocate channel t04 device type 'sbt_tape' parms 'ENV=(NB_ORA_CLIENT=XXXXXX.bu,NB_ORA_SERV=hcmsnbu1)';
        allocate channel t05 device type 'sbt_tape' parms 'ENV=(NB_ORA_CLIENT=XXXXXX.bu,NB_ORA_SERV=hcmsnbu1)';
        allocate channel t06 device type 'sbt_tape' parms 'ENV=(NB_ORA_CLIENT=XXXXXX.bu,NB_ORA_SERV=hcmsnbu1)';
        allocate channel t07 device type 'sbt_tape' parms 'ENV=(NB_ORA_CLIENT=XXXXXX.bu,NB_ORA_SERV=hcmsnbu1)';
        allocate channel t08 device type 'sbt_tape' parms 'ENV=(NB_ORA_CLIENT=XXXXXX.bu,NB_ORA_SERV=hcmsnbu1)';
        SET UNTIL SCN 66132064578;
        RESTORE DATABASE;
        RECOVER DATABASE ;
        RELEASE CHANNEL t01;
        RELEASE CHANNEL t02;
        RELEASE CHANNEL t03;
        RELEASE CHANNEL t04;
        RELEASE CHANNEL t05;
        RELEASE CHANNEL t06;
        RELEASE CHANNEL t07;
        RELEASE CHANNEL t08;
        }
  exit
EOF

Here is the log file info for the above restore step.

Recovery Manager: Release 11.2.0.3.0 - Production on Mon Feb 24 14:34:49 2014

Copyright (c) 1982, 2011, Oracle and/or its affiliates.  All rights reserved.

RMAN>
connected to target database: XXXXXX (DBID=1081194209, not open)

RMAN> 2> 3> 4> 5> 6> 7> 8> 9> 10> 11> 12> 13> 14> 15> 16> 17> 18> 19> 20> 21> 22>
using target database control file instead of recovery catalog
allocated channel: t01
channel t01: SID=518 device type=SBT_TAPE
channel t01: Veritas NetBackup for Oracle - Release 7.5 (2012060523)

allocated channel: t02
channel t02: SID=604 device type=SBT_TAPE
channel t02: Veritas NetBackup for Oracle - Release 7.5 (2012060523)

allocated channel: t03
channel t03: SID=690 device type=SBT_TAPE
channel t03: Veritas NetBackup for Oracle - Release 7.5 (2012060523)

allocated channel: t04
channel t04: SID=776 device type=SBT_TAPE
channel t04: Veritas NetBackup for Oracle - Release 7.5 (2012060523)

allocated channel: t05
channel t05: SID=862 device type=SBT_TAPE
channel t05: Veritas NetBackup for Oracle - Release 7.5 (2012060523)

allocated channel: t06
channel t06: SID=948 device type=SBT_TAPE
channel t06: Veritas NetBackup for Oracle - Release 7.5 (2012060523)

allocated channel: t07
channel t07: SID=1034 device type=SBT_TAPE
channel t07: Veritas NetBackup for Oracle - Release 7.5 (2012060523)

allocated channel: t08
channel t08: SID=1120 device type=SBT_TAPE
channel t08: Veritas NetBackup for Oracle - Release 7.5 (2012060523)

executing command: SET until clause

Starting restore at 02/24/2014 14:34:53
channel t01: starting datafile backup set restore
channel t01: specifying datafile(s) to restore from backup set
channel t01: restoring datafile 00190 to /data24/oradata/XXXXXX/EDCFDT11_TRANSHISTORY_INDX_F28.dbf
channel t01: reading from backup piece

         Skipped more lines due to volume of the log

channel t04: reading from backup piece XXXXXX_AR_20140223_1045_58030_lep1b8li_1_1_840278706
channel t06: piece handle=XXXXXX_AR_20140222_1445_57789_dtp192bh_1_1_840206705 tag=XXXXXX_DG_AR_20140222_1445
channel t06: restored backup piece 1
channel t06: restore complete, elapsed time: 00:03:26
channel t05: piece handle=XXXXXX_AR_20140222_1245_57767_d7p18rah_1_1_840199505 tag=XXXXXX_DG_AR_20140222_1245
channel t05: restored backup piece 1
channel t05: restore complete, elapsed time: 00:03:26
archived log file name=/dbArch/XXXXXX/XXXXXX_1_29305_818382098.arc thread=1 sequence=29305
archived log file name=/dbArch/XXXXXX/XXXXXX_1_29306_818382098.arc thread=1 sequence=29306
archived log file name=/dbArch/XXXXXX/XXXXXX_1_29307_818382098.arc thread=1 sequence=29307
archived log file name=/dbArch/XXXXXX/XXXXXX_1_29308_818382098.arc thread=1 sequence=29308
archived log file name=/dbArch/XXXXXX/XXXXXX_1_29309_818382098.arc thread=1 sequence=29309
archived log file name=/dbArch/XXXXXX/XXXXXX_1_29310_818382098.arc thread=1 sequence=29310
channel t01: piece handle=XXXXXX_AR_20140223_0045_57885_gtp1a5gh_1_1_840242705 tag=XXXXXX_DG_AR_20140223_0045
channel t01: restored backup piece 1
channel t01: restore complete, elapsed time: 00:02:15
channel t03: piece handle=XXXXXX_AR_20140222_2045_57849_fpp19neh_1_1_840228305 tag=XXXXXX_DG_AR_20140222_2045
channel t03: restored backup piece 1
channel t03: restore complete, elapsed time: 00:02:35
archived log file name=/dbArch/XXXXXX/XXXXXX_1_29311_818382098.arc thread=1 sequence=29311
channel t07: piece handle=XXXXXX_AR_20140222_2245_57865_g9p19ufi_1_1_840235506 tag=XXXXXX_DG_AR_20140222_2245
channel t07: restored backup piece 1
channel t07: restore complete, elapsed time: 00:02:45
archived log file name=/dbArch/XXXXXX/XXXXXX_1_29312_818382098.arc thread=1 sequence=29312
archived log file name=/dbArch/XXXXXX/XXXXXX_1_29313_818382098.arc thread=1 sequence=29313
channel t08: piece handle=XXXXXX_AR_20140223_0845_57988_k4p1b1kh_1_1_840271505 tag=XXXXXX_DG_AR_20140223_0845
channel t08: restored backup piece 1
channel t08: restore complete, elapsed time: 00:02:37
channel t02: piece handle=XXXXXX_AR_20140223_0245_57905_hhp1achi_1_1_840249906 tag=XXXXXX_DG_AR_20140223_0245
channel t02: restored backup piece 1
channel t02: restore complete, elapsed time: 00:02:37
archived log file name=/dbArch/XXXXXX/XXXXXX_1_29314_818382098.arc thread=1 sequence=29314
archived log file name=/dbArch/XXXXXX/XXXXXX_1_29315_818382098.arc thread=1 sequence=29315
archived log file name=/dbArch/XXXXXX/XXXXXX_1_29316_818382098.arc thread=1 sequence=29316
channel t04: piece handle=XXXXXX_AR_20140223_1045_58030_lep1b8li_1_1_840278706 tag=XXXXXX_DG_AR_20140223_1045
channel t04: restored backup piece 1
channel t04: restore complete, elapsed time: 00:02:03
archived log file name=/dbArch/XXXXXX/XXXXXX_1_29317_818382098.arc thread=1 sequence=29317
archived log file name=/dbArch/XXXXXX/XXXXXX_1_29318_818382098.arc thread=1 sequence=29318
archived log file name=/dbArch/XXXXXX/XXXXXX_1_29319_818382098.arc thread=1 sequence=29319
archived log file name=/dbArch/XXXXXX/XXXXXX_1_29320_818382098.arc thread=1 sequence=29320
archived log file name=/dbArch/XXXXXX/XXXXXX_1_29321_818382098.arc thread=1 sequence=29321
archived log file name=/dbArch/XXXXXX/XXXXXX_1_29322_818382098.arc thread=1 sequence=29322
archived log file name=/dbArch/XXXXXX/XXXXXX_1_29323_818382098.arc thread=1 sequence=29323
archived log file name=/dbArch/XXXXXX/XXXXXX_1_29324_818382098.arc thread=1 sequence=29324
media recovery complete, elapsed time: 00:05:16
Finished recover at 02/25/2014 02:06:15
released channel: t01

released channel: t02

released channel: t03

released channel: t04

released channel: t05

released channel: t06

released channel: t07

released channel: t08

RMAN>

Recovery Manager complete.

Step H  Convert the database to Standby

ALTER DATABASE COVERT TO PHYSICAL STANDBY













Now the DB is converted to standby database.  Run the below SQL to check the status.

set linesize 200
col DB_UNIQUE_NAME form a30
col DATABASE_ROLE for a20
col OPEN_MODE for a30
col SWITCHOVER_STATUS for a15
column protection_mode form a30
select DB_UNIQUE_NAME,Database_role,open_mode,switchover_status,protection_mode  from v$database;

In my environment,  the status shows as below.

database role = Physical Standby
open mode = Mounted
switch over status = sessions active


Step I Start the Standby and Enable the MRP

Enable the log shipping in primary database. In my environment, log_archive_dest_stat_2 is pointing to standby database. Open the primary and standby database alert log and tail the log files on different putty window.

Login to current primary database and run the below command.

alter system set log_archive_dest_state_2 = 'ENABLE'









Start up the current restored database in standby mode as below. Login the current standby database
and run the below command.  Monitor the alert log while running the below commands.

shutdown immediate;
startup nomount;
alter database mount standby database;
alter database recover managed standby database disconnect from session;
-- Stop here and verify the alert and make sure it is shipping the log files to standby
-- wait for 5 minutes and double check the alert log and make sure no issues.
alter database recover managed standby database cancel;
alter database open;
alter database recover managed standby database using current logfile disconnect;

Verify the log shipping and DG replication.

Step J  Post Verification.

Verify the alert log on both database and make sure no errors. Enable the archive log delete join in primary database once standby is caught up with primary database.

Lesson learned :  Enabling flashback is highly important on data guard environment. Otherwise, we will have to rebuild the database from the scratch if the DB is crashed. I enabled flashback features on all the data guard environment after i went through the costly exercise.

Hope this post helps!  Please let me know if you have any questions or comments.


No comments: