Wednesday, August 8, 2018

Restoring from multiple nodes in a RAC cluster

My previous post involved backing up from all nodes in a RAC cluster.  You can find it here.

In that post I explained why it is important to use multiples nodes to increase your throughput.

In this post I will explain how to restore using multiple nodes in the clusters.

First let's start with my database and drop it.

I stopped my database and went to ASM and removed all files from the disk group.


Step 1 -- Find the DBID from the rman catalog



select db_id,REG_DB_UNIQUE_NAME from db;

     DB_ID REG_DB_UNIQUE_NAME
---------- ------------------------------
4099465517 db1
3776942889 BRYAN
3281763275 db2
2294802577 db3

There it is !  My database BRYAN has a DBID of 3776942889


Step 2 --  Create a small initfile to use to start the database nomount.



db_name='bryan'
memory_target=1G
processes = 150


Step 3 --  start the database nomount.


startup nomount;
ORACLE instance started.

Total System Global Area 1073741824 bytes
Fixed Size            2932632 bytes
Variable Size          616562792 bytes
Database Buffers      394264576 bytes
Redo Buffers           59981824 bytes


Step 4 -- Connect to RMAN using the target / and catalog {my_rman_catalog}



Recovery Manager: Release 12.1.0.2.0 - Production on Wed Aug 8 11:26:18 2018

Copyright (c) 1982, 2014, Oracle and/or its affiliates.  All rights reserved.

connected to target database: BRYAN (not mounted)
connected to recovery catalog database


Step 5 -- Set the DBID of the database.


RMAN set dbid=3776942889;

executing command: SET DBID
database name is "BRYAN" and DBID is 3776942889


Step 6 - Restore the SPFILE.



RMAN> run
2> {
3> allocate CHANNEL sbt1 DEVICE TYPE 'SBT_TAPE'  FORMAT   '%d_%U' PARMS  "SBT_LIBRARY=/u01/app/oracle/product/12.1.0.2/dbhome_1/lib/libra.so, ENV=(RA_WALLET='location=file:/u01/app/oracle/product/12.1.0.2/dbhome_1/dbs/zdlra credential_alias=zdl2ing-scan:1521/zdlra:dedicated')";
4> restore spfile;
5> }

allocated channel: sbt1
channel sbt1: SID=133 device type=SBT_TAPE
channel sbt1: RA Library (ZDLRA) SID=72F1DBDB0D05BC0FE05375AF880AE677

Starting restore at 08-AUG-2018 11:41:30

channel sbt1: starting datafile backup set restore
channel sbt1: restoring SPFILE
output file name=/u01/app/oracle/product/12.1.0.2/dbhome_1/dbs/spfilebryan1.ora
channel sbt1: reading from backup piece c-3776942889-20180808-0b
channel sbt1: piece handle=c-3776942889-20180808-0b tag=TAG20180808T093007
channel sbt1: restored backup piece 1
channel sbt1: restore complete, elapsed time: 00:00:25
Finished restore at 08-AUG-2018 11:43:22
released channel: sbt1




Step 7 - from asmcmd copy the spfile into ASM, and remove it from the OS.


ASMCMD> mkdir +datac1/bryan
ASMCMD> mkdir +datac1/bryan/parameterfile
ASMCMD> cp /u01/app/oracle/product/12.1.0.2/dbhome_1/dbs/spfilebryan1.ora +datac1/bryan/parameterfile/spfilebryan.ora
copying /u01/app/oracle/product/12.1.0.2/dbhome_1/dbs/spfilebryan1.ora -> +datac1/bryan/parameterfile/spfilebryan.ora
ASMCMD> 



Step 8 - point the initfile to use the spfile on both instances. and restart the instance nomount using the spfile


Step 9 - Modify the service to use the spfile location


srvctl modify database -db bryan -spfile '+DATAC1/bryan/parameterfile/spfilebryan.ora'


Step 10 - recreate the password file and change the service to point to it.


orapwd file=+DATAC1/bryan/PASSWORD/pwdbryan.ora dbuniquename=bryan

srvctl modify database -db bryan -pwfile '+DATAC1/bryan/PASSWORD/pwdbryan.ora'


Step 11 - Restore the controlfiles



RMAN> set dbid=3776942889;

executing command: SET DBID
database name is "BRYAN" and DBID is 3776942889

RMAN> run
2> {
3> allocate CHANNEL sbt1 DEVICE TYPE 'SBT_TAPE'  FORMAT   '%d_%U' PARMS  "SBT_LIBRARY=/u01/app/oracle/product/12.1.0.2/dbhome_1/lib/libra.so, ENV=(RA_WALLET='location=file:/u01/app/oracle/product/12.1.0.2/dbhome_1/dbs/zdlra credential_alias=zdl2ing-scan:1521/zdlra:dedicated')";
4> restore controlfile;
5> }

allocated channel: sbt1
channel sbt1: SID=568 instance=bryan1 device type=SBT_TAPE
channel sbt1: RA Library (ZDLRA) SID=72F21BF1040F9F09E05375AF880A4C78

Starting restore at 08-AUG-2018 11:59:25

channel sbt1: starting datafile backup set restore
channel sbt1: restoring control file
channel sbt1: reading from backup piece c-3776942889-20180808-0b
channel sbt1: piece handle=c-3776942889-20180808-0b tag=TAG20180808T093007
channel sbt1: restored backup piece 1
channel sbt1: restore complete, elapsed time: 00:00:02
output file name=+DATAC1/BRYAN/CONTROLFILE/current.708.983620771
output file name=+DATAC1/BRYAN/CONTROLFILE/current.710.983620771
Finished restore at 08-AUG-2018 11:59:31
released channel: sbt1


Step 11 - Mount the database on all the nodes.


srvctl start database -d bryan -o mount

Step 12 - Connect to rman using the service, and restore the database.


rman

Recovery Manager: Release 12.1.0.2.0 - Production on Wed Aug 8 12:51:41 2018

Copyright (c) 1982, 2014, Oracle and/or its affiliates.  All rights reserved.

RMAN> connect target sys/bryan@bryan

connected to target database: BRYAN (DBID=3776942889, not open)

RMAN> connect catalog /@zdl2ing-scan:1521/zdlra:dedicated

connected to recovery catalog database

RMAN> restore database;




Step 13. - Now let's check the log and make sure it restoring across all nodes.


Starting restore at 08-AUG-2018 12:52:07
allocated channel: ORA_SBT_TAPE_1
channel ORA_SBT_TAPE_1: SID=595 instance=bryan2 device type=SBT_TAPE
channel ORA_SBT_TAPE_1: RA Library (ZDLRA) SID=72F2D881B658956EE05376AF880A153A
allocated channel: ORA_SBT_TAPE_2
channel ORA_SBT_TAPE_2: SID=64 instance=bryan1 device type=SBT_TAPE
channel ORA_SBT_TAPE_2: RA Library (ZDLRA) SID=72F2D88B3E252398E05375AF880A7E91
allocated channel: ORA_SBT_TAPE_3
channel ORA_SBT_TAPE_3: SID=604 instance=bryan2 device type=SBT_TAPE
channel ORA_SBT_TAPE_3: RA Library (ZDLRA) SID=72F2D894B4979578E05376AF880A512C
allocated channel: ORA_SBT_TAPE_4
channel ORA_SBT_TAPE_4: SID=586 instance=bryan1 device type=SBT_TAPE
channel ORA_SBT_TAPE_4: RA Library (ZDLRA) SID=72F2D89E2F8D23B1E05375AF880AADDA
allocated channel: ORA_SBT_TAPE_5
channel ORA_SBT_TAPE_5: SID=613 instance=bryan2 device type=SBT_TAPE
channel ORA_SBT_TAPE_5: RA Library (ZDLRA) SID=72F2D8A7B4E7958BE05376AF880ADDE6
allocated channel: ORA_SBT_TAPE_6
channel ORA_SBT_TAPE_6: SID=622 instance=bryan2 device type=SBT_TAPE
channel ORA_SBT_TAPE_6: RA Library (ZDLRA) SID=72F2D8B13CF99592E05376AF880A55C4
allocated channel: ORA_SBT_TAPE_7
channel ORA_SBT_TAPE_7: SID=595 instance=bryan1 device type=SBT_TAPE
channel ORA_SBT_TAPE_7: RA Library (ZDLRA) SID=72F2D8BADCBC2404E05375AF880A42C0
allocated channel: ORA_SBT_TAPE_8
channel ORA_SBT_TAPE_8: SID=631 instance=bryan2 device type=SBT_TAPE
channel ORA_SBT_TAPE_8: RA Library (ZDLRA) SID=72F2D8C482E695ACE05376AF880A4989


OK.. I can see it is restoring on both instances.

Step 14 - shutdown down all instances


srvctl stop database -d bryan

Step 15 - Start up the first instance mount in RMAN and recover the database.


RMAN> startup mount;

Oracle instance started
database mounted

Total System Global Area   62813896704 bytes

Fixed Size                     7661424 bytes
Variable Size               9261025424 bytes
Database Buffers           53418655744 bytes
Redo Buffers                 126554112 bytes

RMAN> recover database;

Starting recover at 08-AUG-2018 13:00:15
allocated channel: ORA_SBT_TAPE_1
channel ORA_SBT_TAPE_1: SID=586 instance=bryan1 device type=SBT_TAPE
channel ORA_SBT_TAPE_1: RA Library (ZDLRA) SID=72F2F5883A878935E05375AF880A8E9B
...
starting media recovery

channel ORA_SBT_TAPE_1: starting archived log restore to default destination
channel ORA_SBT_TAPE_1: restoring archived log
archived log thread=1 sequence=91
channel ORA_SBT_TAPE_1: reading from backup piece $RSCN_1594143_RTIM_983531820_THRD_1_SEQ_91_CTKEY_5520974_BACKUP
channel ORA_SBT_TAPE_2: starting archived log restore to default destination
channel ORA_SBT_TAPE_2: restoring archived log
archived log thread=2 sequence=34
channel ORA_SBT_TAPE_2: reading from backup piece $RSCN_1594143_RTIM_983531820_THRD_2_SEQ_34_CTKEY_5520979_BACKUP
channel ORA_SBT_TAPE_1: piece handle=$RSCN_1594143_RTIM_983531820_THRD_1_SEQ_91_CTKEY_5520974_BACKUP tag=TAG20180808T105809
channel ORA_SBT_TAPE_1: restored backup piece 1
channel ORA_SBT_TAPE_1: restore complete, elapsed time: 00:00:01
channel ORA_SBT_TAPE_2: piece handle=$RSCN_1594143_RTIM_983531820_THRD_2_SEQ_34_CTKEY_5520979_BACKUP tag=TAG20180808T105736
channel ORA_SBT_TAPE_2: restored backup piece 1
channel ORA_SBT_TAPE_2: restore complete, elapsed time: 00:00:01



Step 16  - Open the instance and startup the other instances.


That's it !

Backing up a RAC database with all nodes

Backing up a RAC environment from multiple DB nodes


I've been working with many customers who have never performed a multi-node backup for a RAC database.

If you didn't already know, in a RAC environment you can backup from more than 1 DB node.

Backing up from multiple DB nodes can be very important for a large multi-terabyte database to give you faster backup times.


** Keep in mind these throughput numbers would be the same for restore capability.

PART 1 - WHY


Why would I want to go through the trouble of backing up on multiple nodes ?

When looking at backup rates, the number to keep in mind is the backup rate for each network speed.

1   Gb/s  == 350 GB/hr
10 Gb/s == 3.5 TB/hr
25 Gb/s == 8.8 TB/hr

**  Note that networks are measured in small "b" which is bits/seconds
**  Database size is measured in large "B" which is Bytes.

The chart below gives you an idea of the impact of adding active network ports, and how the speed of the ports affects the backup throughput.
















Using this chart you can calculate how long it will take to backup your large database.

Here is the example of the number of hours it would take to backup a 50 T database..





PART 2 -  How ?


When performing a backup, the backup process uses channels.  I'm guessing you probably already knew that.  
Each allocated channel connects to a database instance and sends the backup through that channel to the backup location (disk, media manager, etc).
By default the allocated channel connects to the database instance that the RMAN process is executed from.
What this means is... if you are "inst_id 1" and execute a backup, all channels will be connected to "inst_id 1".

This can be changed within the channel allocation by using a channel parameter named "CONNECT".

The connect (as you expect) uses an ID and Password (or a wallet) to connect to a specific service, or by specifying an EZ Connect string.

For my example I am going to use a configure channel setting and a common service that is shared across all nodes and let it balance. the other options are

  • Create a specific service to control which instances are available for backing up.  
  • Manually allocate channels and specify the instance name in the connection.

Step 1 - Create a user for backing up


I'm going assume you are already on 12c, which contains the sysbackup role.
I'm going to create a user ,"backup_only", specifically for backing up, and I am going to use a complicated password.
Next,  I am going to grant that user sysbackup privlege.

SQL  create user backup_only identified by kkk3223490sf9090fjkljkl3590;

User created.

SQL   grant sysbackup to backup_only;

Grant succeeded.




Step 2 - Change the channel configuration to use this user


Now I am going to configure my channel to use this user for the connect clause.

Below is the statement I'm executing.  ** Notice I added the "as sysbackup" to the connection.

CONFIGURE CHANNEL DEVICE TYPE 'SBT_TAPE' connect 'backup_only/kkk3223490sf9090fjkljkl3590@bryan as sysbackup' FORMAT   '%d_%U' PARMS  "SBT_LIBRARY=/u01/app/oracle/product/12.1.0.2/dbhome_1/lib/libra.so, ENV=(RA_WALLET='location=file:/u01/app/oracle/product/12.1.0.2/dbhome_1/dbs/zdlra credential_alias=zdl2ing-scan:1521/zdlra:dedicated')";



This is for a ZDLRA so I have the additional information it needs for the library etc.

Notice that the I am including the id/password and tnsnames entry in the connection string.

This is what happens when I execute the configure command.



RMAN  
RMAN CONFIGURE CHANNEL DEVICE TYPE 'SBT_TAPE' connect * FORMAT   '%d_%U' PARMS  "SBT_LIBRARY=/u01/app/oracle/product/12.1.0.2/dbhome_1/lib/libra.so, ENV=(RA_WALLET='location=file:/u01/app/oracle/product/12.1.0.2/dbhome_1/dbs/zdlra credential_alias=zdl2ing-scan:1521/zdlra:dedicated')";
old RMAN configuration parameters:
CONFIGURE CHANNEL DEVICE TYPE 'SBT_TAPE' FORMAT   '%d_%U' PARMS  "SBT_LIBRARY=/u01/app/oracle/product/12.1.0.2/dbhome_1/lib/libra.so, ENV=(RA_WALLET='location=file:/u01/app/oracle/product/12.1.0.2/dbhome_1/dbs/zdlra credential_alias=zdl2ing-scan:1521/zdlra:dedicated')";
new RMAN configuration parameters:
CONFIGURE CHANNEL DEVICE TYPE 'SBT_TAPE' CONNECT '*' FORMAT   '%d_%U' PARMS  "SBT_LIBRARY=/u01/app/oracle/product/12.1.0.2/dbhome_1/lib/libra.so, ENV=(RA_WALLET='location=file:/u01/app/oracle/product/12.1.0.2/dbhome_1/dbs/zdlra credential_alias=zdl2ing-scan:1521/zdlra:dedicated')";
new RMAN configuration parameters are successfully stored
starting full resync of recovery catalog
full resync complete



One thing that stands out is that the connection information doesn't appear in the channel.  it is obfuscated by the '*' even though it is stored.


Step 3 - Execute backup


I can now execute my backup from either OEM, or from the command line.

This can be monitored (to ensure multiple nodes are used) by looking at the channel allocation in the output log of the backup.



Starting backup at 2018-08-08 09:27:17
allocated channel: ORA_SBT_TAPE_1
channel ORA_SBT_TAPE_1: SID=218 instance=bryan2 device type=SBT_TAPE
channel ORA_SBT_TAPE_1: RA Library (ZDLRA) SID=72EFFBF5EAF19225E05376AF880A0BA7
allocated channel: ORA_SBT_TAPE_2
channel ORA_SBT_TAPE_2: SID=245 instance=bryan2 device type=SBT_TAPE
channel ORA_SBT_TAPE_2: RA Library (ZDLRA) SID=72EFFBFF3A0E9235E05376AF880A1C6D
allocated channel: ORA_SBT_TAPE_3
channel ORA_SBT_TAPE_3: SID=615 instance=bryan1 device type=SBT_TAPE
channel ORA_SBT_TAPE_3: RA Library (ZDLRA) SID=72EFFC0907A51E40E05375AF880AA173
allocated channel: ORA_SBT_TAPE_4
channel ORA_SBT_TAPE_4: SID=299 instance=bryan2 device type=SBT_TAPE
channel ORA_SBT_TAPE_4: RA Library (ZDLRA) SID=72EFFC128EDD9244E05376AF880A6B6B
allocated channel: ORA_SBT_TAPE_5
channel ORA_SBT_TAPE_5: SID=64 instance=bryan2 device type=SBT_TAPE
channel ORA_SBT_TAPE_5: RA Library (ZDLRA) SID=72EFFC1BD2EF924AE05376AF880A68B2
allocated channel: ORA_SBT_TAPE_6
channel ORA_SBT_TAPE_6: SID=64 instance=bryan1 device type=SBT_TAPE
channel ORA_SBT_TAPE_6: RA Library (ZDLRA) SID=72EFFC2595C71E53E05375AF880AA8D1
allocated channel: ORA_SBT_TAPE_7
channel ORA_SBT_TAPE_7: SID=74 instance=bryan2 device type=SBT_TAPE
channel ORA_SBT_TAPE_7: RA Library (ZDLRA) SID=72EFFC2FA4979254E05376AF880AF834
allocated channel: ORA_SBT_TAPE_8


Above is the example from my output.. You can see the that there were 8 channels allocated, and 5 of the channels were on the first instance and 3 were allocated on the second instance.

That's all there is to it !!

Next blog post is going to be restoring across multiple nodes in a RAC cluster.