Friday, November 6, 2020

Cloud restores to a RAC cluster with RMAN.

 This post is about an RMAN command you probably never thought much about,"Autolocate". I know I never did until I started testing restores from a cloud store.

Now let's see what it is, what it does.

First let's see what the documentation says it does.

"RMAN automatically performs autolocation of all files that it must back up or restore. If you use the noncluster file system local archiving scheme, then a node can only read the archived redo logs that were generated by an instance on that node. RMAN never attempts to back up archived redo logs on a channel it cannot read.

During a restore operation, RMAN automatically performs the autolocation of backups. A channel connected to a specific node only attempts to restore files that were backed up to the node. For example, assume that log sequence 1001 is backed up to the drive attached to node1, while log 1002 is backed up to the drive attached to node2. If you then allocate channels that connect to each node, then the channel connected to node1 can restore log 1001 (but not 1002), and the channel connected to node2 can restore log 1002 (but not 1001)."

After reading this, you are probably wondering why this matters when restoring from a cloud store.

To show you, I will walk through what happens during the "autolocate" process.

First, as you would guess, the "autolocation" occurs before any restore operations can start.

Also, this only comes into play when restoring to multiple nodes in the RAC cluster using channels allocated to the different nodes..

To show why it's important to understand it, I will walk through the test case that I had.

EXAMPLE environment:

MYDB - I have a very large Database, 100 TB composed of 8,000 individual datafiles.

DB Host - I have 8 nodes in my RAC cluster

BACKUP - I backed up to a cloud store with a filesperset 1, section size 32G. since my datafiles where all 32G, they would be individual pieces even with a different filesperset.  My backup is composed of ~8,000 individual backup pieces.

RESTORE - In order to improve my restore performance I configured my restore across all 8 nodes.

What happens:

What the "autolocate" does, by default, is it validates that each backup piece is available from each node.  This is a serial process for each backup, AND for each node.

This turned out ot be a slow process due to the # of validations that needed to be performed. For my example it validated 8,000 X 8 = 64,000 validations.  

Also, I found that this serial process took a lot of time.  Even though each validation takes a fraction of a second, the total time for the validation becomes significant.  In my test case, 8 pieces/second were being validated.

This added up because below is what was happening.


    Node 1 - validate 500 pieces : 01:02


    Node 1 - validate 5000 pieces : 10:25


    Node 1 - validate 8000 pieces : 16:40

    Node 2 - start validation : 16:40



     Node 8 - validate 8000 pieces    2 : 13:20

BEGIN Restoring files.

So how to get around this issue? If you are sure that all backupieces you are restoring are available from every node in your cluster, you can set it off at the beginning of your restore operation.

RMAN> set autolocate off;

During my testing, it bypassed the validation step, and started restoring the database within a few seconds.

This is something to keep in mind, if you see a gap in time between starting a restore on a RAC cluster, and when it starts assigning datafiles to channels.