Tuesday, May 12, 2020

ZDLRA and using stored scripts


One new item I learned with RMAN is the ability to utilize stored scripts. Stored scripts can be very useful as you standardize your environment.
This is where they fit into the ZDLRA.  The ZDLRA is an awesome product that helps you standardize your Oracle backups, so adding stored scripts to your environment makes it even better !!

First, a little background on stored scripts if you are new to them (I was).
Stored scripts are exactly what you would expect from the name: a way to store a script in your RMAN catalog that can be shared by all the databases that use that RMAN catalog.
The commands used to manage stored scripts are:
  • CREATE [OR REPLACE] [GLOBAL] SCRIPT
  • DELETE [GLOBAL] SCRIPT
  • EXECUTE [GLOBAL] SCRIPT
  • PRINT [GLOBAL] SCRIPT
The GLOBAL keyword is optional, and is only useful if you are using multiple VPD users in your RMAN catalog.  In a multi-VPD RMAN catalog, each VPD user can store scripts with the same name and they will be kept separate.
I will go through VPD users in a future post, as this is a topic on its own.
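
As a quick illustration (the script name here is just a placeholder for whatever you stored), note that EXECUTE SCRIPT must run inside a RUN block:

run { execute script my_backup_script; }
run { execute global script my_backup_script; }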

Now let's go through how to best utilize stored scripts with an example. In my example, I have 2 ZDLRAs (let's pretend).  With ZDLRA, the RMAN catalog is contained within the ZDLRA itself. This means that because I have 2 ZDLRAs, I have 2 RMAN catalogs.  In this example, I will show how I can use a stored script to automatically switch backups to a different ZDLRA when the primary ZDLRA is not available during a patching window.  Below is a picture of my fictitious environment.

In this case my backups go to ZDLRA_SF, and they are replicated to ZDLRA_NYC.  My database is registered in both RMAN catalogs, but I only connect to the catalog on ZDLRA_SF if it is available.
Now let's create a script in each catalog that is specific to its ZDLRA.

Step #1 - Add a script to both ZDLRAs that creates the channel configuration.



This is the script for the SF ZDLRA.


create or replace script ZDLRA_INCREMENTAL_BACKUP_L1
COMMENT "Normal level 1 incremental backups for San Francisco Primary ZDLRA"
  {
###
### Script to backup incremental level 1 to Primary San Francisco ZDLRA
###
###
###  First configure the channel for San Francisco ZDLRA
###
    CONFIGURE CHANNEL DEVICE TYPE 'SBT_TAPE' FORMAT '%d_%U' PARMS
      "SBT_LIBRARY=/u01/app/oracle/lib/libra.so,
      ENV=(RA_WALLET='location=file:/u01/app/oracle/wallet
       credential_alias=zdlrasf-scan:1521/zdlra:dedicated')";
###
###  Perform backup command
###
    backup device type sbt cumulative incremental level 1 filesperset 1 section size 64g database plus archivelog filesperset 32 not backed up;
   }


created script ZDLRA_INCREMENTAL_BACKUP_L1




This is the script for the NYC ZDLRA

create or replace script ZDLRA_INCREMENTAL_BACKUP_L1
COMMENT "Normal level 1 incremental backups for New York Primary ZDLRA"
  {
###
### Script to backup incremental level 1 to Primary New York ZDLRA
###
###
###  First configure the channel for New York ZDLRA
###
    CONFIGURE CHANNEL DEVICE TYPE 'SBT_TAPE' FORMAT '%d_%U' PARMS
      "SBT_LIBRARY=/u01/app/oracle/lib/libra.so,
      ENV=(RA_WALLET='location=file:/u01/app/oracle/wallet
       credential_alias=zdlranyc-scan:1521/zdlra:dedicated')";
###
###  Perform backup command
###
    backup device type sbt cumulative incremental level 1 filesperset 1 section size 64g database plus archivelog filesperset 32 not backed up;
   }

created script ZDLRA_INCREMENTAL_BACKUP_L1


Now, I have a script in each ZDLRA which allocates the channel properly for each ZDLRA.

Step #2 - Ensure the connection to the RMAN catalog fails over if the primary ZDLRA is not available.


The best way to accomplish this is with GDS (Global Data Services).  For my test, however, I am going to set up a failover in my tnsnames.ora file.  Below is the entry in my file.


ZDLRA_SF=
       (DESCRIPTION_LIST=(FAILOVER=YES)(LOAD_BALANCE=NO)
          (DESCRIPTION=(FAILOVER=NO)
             (CONNECT_DATA=(SERVICE_NAME=ZDLRA_SF))
             (ADDRESS=(PROTOCOL=TCP)(HOST=localhost)(PORT=1522))
           )
          (DESCRIPTION=(FAILOVER=NO)
             (CONNECT_DATA=(SERVICE_NAME=ZDLRA_NYC))
             (ADDRESS=(PROTOCOL=TCP)(HOST=localhost)(PORT=1522))
           )
       )
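
As a quick sanity check (assuming the Oracle client tools are in your path), you can verify the entry resolves before trying RMAN:

 tnsping zdlra_sf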



Step #3 - Let's put it all together.


Since I don't have 2 ZDLRAs to test with, I will test it by printing out the script that would be executed in RMAN.

First I am going to show what happens under normal conditions. I am going to execute the following script, which I created in /tmp/rman.sql.


print script  ZDLRA_INCREMENTAL_BACKUP_L1;
exit;


I am going to run that script over and over to ensure I keep connecting properly. Below is the command.


 rman target / catalog rman/oracle@zdlra_sf @/tmp/rman.sql


Here is the output I get every time, showing that I am connecting and will allocate channels to ZDLRA_SF.

RMAN> print script  ZDLRA_INCREMENTAL_BACKUP_L1;
2> exit;
printing stored script: ZDLRA_INCREMENTAL_BACKUP_L1
{
###
### Script to backup incremental level 1 to Primary San Francisco ZDLRA
###
###
###  First configure the channel for San Francisco ZDLRA
###
    CONFIGURE CHANNEL DEVICE TYPE 'SBT_TAPE' FORMAT '%d_%U' PARMS
      "SBT_LIBRARY=/u01/app/oracle/lib/libra.so,
      ENV=(RA_WALLET='location=file:/u01/app/oracle/wallet
       credential_alias=zdlrasf-scan:1521/zdlra:dedicated')";
###
###  Perform backup command
###
    backup device type sbt cumulative incremental level 1 filesperset 1 section size 64g database plus archivelog filesperset 32 not backed up;
   }


Everything is working fine.  Now let's simulate the primary being unavailable.


I shut down the ZDLRA_SF database.

Now I am executing the same command as before.

 rman target / catalog rman/oracle@zdlra_sf @/tmp/rman.sql


My output now shows that it is using my New York ZDLRA.

RMAN> print script  ZDLRA_INCREMENTAL_BACKUP_L1;
2> exit;
printing stored script: ZDLRA_INCREMENTAL_BACKUP_L1
{
###
### Script to backup incremental level 1 to Primary New York ZDLRA
###
###
###  First configure the channel for New York ZDLRA
###
    CONFIGURE CHANNEL DEVICE TYPE 'SBT_TAPE' FORMAT '%d_%U' PARMS
      "SBT_LIBRARY=/u01/app/oracle/lib/libra.so,
      ENV=(RA_WALLET='location=file:/u01/app/oracle/wallet
       credential_alias=zdlranyc-scan:1521/zdlra:dedicated')";
###
###  Perform backup command
###
    backup device type sbt cumulative incremental level 1 filesperset 1 section size 64g database plus archivelog filesperset 32 not backed up;
   }

Recovery Manager complete.



SUMMARY:



  • With stored scripts, you can easily maintain a central backup script shared by all databases, without having to worry about the individual channel configuration.
  • If there are any changes to the best practices for my backups (Doc ID 2176686.1), I just have to update the stored script in each RMAN catalog.

Wednesday, April 15, 2020

ZDLRA, Real-time redo compression


  This post covers the new feature described in: ZDLRA: Changes in the Protection Policy Compression Algorithms (Doc ID 2654539.1)

First I wanted to go through a little background on how compression works for archive logs.


As you probably know, database backups are automatically compressed on the ZDLRA.  That is not necessarily the case for archive logs.

Archive logs can reach the ZDLRA by 2 different methods:

  • Real-time redo
  • Log sweeps
Regardless of how the archive logs reach the ZDLRA, they are stored as backup files.

Log Sweeps


For log sweeps, the backupsets of archive logs are stored exactly as they are sent.
If you follow the best practices for backups, you send over archive logs with
   plus archivelog filesperset 32 not backed up;
As you can figure out, this creates one or more backupsets, with each backupset containing up to 32 archive logs.
Also note that by default the archive log backupsets are NOT compressed.  You can RMAN compress the archive log backups (if you have the ACO license), but be careful, because it is not recommended to compress the datafile backups.
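
If you do choose to compress the swept archive logs, a sketch of what that could look like is below (note it compresses only the archive log backupsets, never the datafile backups):

    backup as compressed backupset device type sbt archivelog all not backed up filesperset 32;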

Real-time redo

Real-time redo is where this MOS note takes effect.  Up to the current release of ZDLRA, archive logs sent via real-time redo were compressed using basic compression.  A "standby redo log" mirrors the redo log of the protected database, and a "BACKUP_ARCH" task is queued to create a backupset from the "standby redo log".  In the process of creating the RMAN backupset, basic compression was applied and a compressed RMAN backupset was created.

This does a few things that I want to point out:
  • During the creation of the archivelog backupset, CPU is used to compress the backupset, and the final result is a smaller backupset.
  • During the validation of backups for a database, CPU is used to uncompress the RMAN backupset to validate it. NOTE: this would also occur if you RMAN compress the archive logs during a log sweep.
  • When recovering a database, the compressed RMAN backupset is sent over the network (less network traffic), and the Database Host uncompresses the backupset to read the archive logs.

If you read this carefully, you will see that there is a CPU cost for compression, and there is a space savings when using it.

New Feature to control log compression!


The new feature allows you to control the log compression for archive logs sent through real-time redo.
NOTE: When using compression on the ZDLRA, you don't need the ACO license.

Below is the information from the documentation.

log_compression_algorithm

The setting for the archivelog compression feature. 
This setting is used to adjust the compression level of NZDL/polled archivelog backups.


OFF means that the archivelogs will not be compressed. 
BASIC means the BASIC compression algorithm will be used to compress the backups. 
LOW means the LOW compression algorithm will be used to compress the backups. 
MEDIUM means the MEDIUM compression algorithm will be used to compress the backups. 
HIGH means the HIGH compression algorithm will be used to compress the backups.
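
To put this into practice, here is a minimal sketch of adjusting the setting on a protection policy.  I am assuming the attribute is exposed as a parameter of DBMS_RA.UPDATE_PROTECTION_POLICY (run as RASYS on the ZDLRA), and the policy name GOLD is made up; check the MOS note for the exact syntax for your release.

BEGIN
  DBMS_RA.UPDATE_PROTECTION_POLICY(
    protection_policy_name    => 'GOLD',     -- made-up policy name
    log_compression_algorithm => 'MEDIUM');  -- OFF | BASIC | LOW | MEDIUM | HIGH
END;
/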



This is a great lever to control what happens with archive logs.

MEDIUM compression can be used if you have ample CPU and are concerned about space; OFF can be used if you have CPU concerns and ample space.

Remember, the deeper the compression, the higher the CPU cost. Like most things in IT, there are tradeoffs.


Monday, February 24, 2020

What are those processes in V$DATAGUARD_PROCESS on the ZDLRA?


With 12.2 of Oracle, there is now more detail in the V$DATAGUARD_PROCESS view on the downstream database.  Keep in mind that the "downstream database" may not only be a standby database; it can also be a Far Sync instance, or a ZDLRA receiving real-time redo.

Below is a table of the processes that I've seen in this view on a ZDLRA, and I will go through what each one of these is.

NAME ACTION        CLIENT_ROLE      COUNT(*)
---- ------------ ---------------- ----------
rfs  IDLE          async SRL single      10
rfs  IDLE          async SRL multi       20
rfs  IDLE          async ORL single      30
rfs  IDLE          async ORL multi       40
rfs  IDLE          gap manager          100
rfs  IDLE          archive gap           10
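
For reference, a summary like the one above can be produced with a query along these lines:

select name, action, client_role, count(*)
from v$dataguard_process
group by name, action, client_role;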



First off, I want to point out what happens in a RAC environment.  As you can imagine, each thread (node) in a RAC environment independently sends its redo to the downstream.  The same thing happens with a standby database sending its redo to a downstream.
For example, if my primary DB cluster is a 4 node RAC cluster and I have a standby database that is a single node, redo sent from either the primary OR the standby to a ZDLRA will still be sent as 4 independent streams (one per redo thread).

Now to walk through what each of these types of processes are, I will first define the types.
Since I am seeing multiple types, this example is from a ZDLRA (rather than a standby database) so I will define the terms from a ZDLRA standpoint.


SRL – Standby Redo Logs are sending the changes. These are standby databases.
ORL – Online Redo Logs are sending the changes. These are primary databases.

Multi  – The protected database could not keep up using the archiver's buffer, and spawned an additional buffer for sending redo to the ZDLRA.
Single – The protected database is using the same log buffer to write both archive logs and redo to the ZDLRA (and possibly another standby).

Async       – Real-time redo processes.
Gap manager – Process that keeps track of any gaps in redo, and can send archive logs to fill the gap.
Archive gap – Additional processes that send over any gaps in redo. The gap manager will send over the first archive log, but archive gap processes will be started in parallel with the gap manager if more than one archive log is needed. This is controlled by the LOG_ARCHIVE_MAX_PROCESSES parameter on the protected databases, which has a default value of 4.

So for my example, this is what we are seeing 

  • The count of 10 “async SRL single” processes is telling us that there are 10 standby threads being applied and sending real-time redo using the same buffer on the protected database that the archiver process is using. 
  • The count of 20 “async SRL multi” processes is telling us that there are 20 standby threads being applied and sending real-time redo that could not keep up with the archiver process and spawned an additional buffer. 
  • A total of 30 combined processes for "async SRL" is telling us that there are 30 threads sending real-time redo in total.
**** Notice I used the term "threads" above.  The number of threads is the number of RAC nodes on the primary database, regardless of how many instances are on the standby applying redo.


  • The count of 30 “async ORL single” processes is telling us that there are 30 primary database instances sending real-time redo using the same buffer on the protected database that the archiver process is using. 

  • The count of 40 “async ORL multi” processes is telling us that there are 40 primary database instances sending real-time redo that could not keep up with the archiver process and spawned an additional buffer. 
  • A total of 70 combined processes for "async ORL" is telling us there are 70 primary instances sending real-time redo.
**** Notice I used the term "instances" above.  The number of instances is the number of RAC nodes on the primary database, rather than the number of databases.
  • The "gap manager" count of 100 is telling us that there are 100 processes monitoring the async processes.  Notice that the total number of "gap manager" processes matches the total number of async processes (10 + 20 + 30 + 40 = 100).
  • “Archive gap” processes are temporary, and should go away once they have been idle for 2 minutes after completing their work.  We should only see these processes if the ZDLRA falls behind in collecting redo.

I hope this helps explain what the processes are that are in use to manage the real-time redo.

Monday, December 23, 2019

Oracle - Alternate Destinations, FRA, and balancing 2 destinations


My previous post was on using Groups/Priorities when specifying an alternate destination for sending Redo to a remote destination. You can find it here.

This post covers 2 topics that come up whenever I talk about alternate destinations.

  • Does it balance 2 destinations with the same group/priority?  And once it chooses a destination, does it remain sticky to that destination and ignore other destinations with the same group/priority?
  • Does my FRA know that both destinations can be considered the same for "shipped" status?


First I will show you how I configured my tests.

In my example, I actually have both a standby database and a Far Sync instance for my primary database.

bsg18   --> Primary database
bsg18d --> Dataguard database
bsg18f  --> Far Sync database

This is what my 2 remote destinations look like in my configuration.


log_archive_dest_2 ='service="bsg18d" ASYNC NOAFFIRM max_failure=0 db_unique_name="bsg18d" group=1 priority=1 valid_for=(online_logfile,all_roles)';

log_archive_dest_3 ='Service="bsg18f" SYNC AFFIRM  max_failure=1 db_unique_name="bsg18f" group=1 priority=1 valid_for=(online_logfile,all_roles)';


You can see that both Dataguard (bsg18d) and Far Sync (bsg18f) are in "group=1 priority=1"

Multiple Destinations with the same Group and Priority - Does it balance them?


First I'm going to look at where the logs are going.
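
The output below comes from a query along these lines against V$ARCHIVED_LOG (the column aliases are shortened to fit; "SHI" is my alias for the ARCHIVED column):

select dest_id, sequence#, status "S", archived "SHI",
       applied, deleted "DEL", fal
from v$archived_log
order by sequence#, dest_id;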

   DEST_ID  SEQUENCE# S SHI APPLIED   DEL FAL
---------- ---------- - --- --------- --- ---
         2       1751 A YES NO        NO  NO
         2       1752 A YES NO        NO  NO
         2       1753 A YES NO        NO  NO


I can see that they are going to DEST_2, which is my Dataguard instance.  I am going to shut it down, do a few log switches, and see what happens.

   DEST_ID  SEQUENCE# S SHI APPLIED   DEL FAL
---------- ---------- - --- --------- --- ---
         2       1751 A YES YES       NO  NO
         1       1751 A YES NO        NO  NO
         1       1752 A YES NO        NO  NO
         2       1752 A YES YES       NO  NO
         1       1753 A YES NO        NO  NO
         2       1753 A YES YES       NO  NO
         1       1754 A YES NO        NO  NO
         2       1754 A YES NO        NO  NO
         1       1755 A YES NO        NO  NO
         1       1756 A YES NO        NO  NO
         1       1757 A YES NO        NO  NO
         1       1758 A YES NO        NO  NO
         1       1759 A YES NO        NO  NO
         1       1760 A YES NO        NO  NO


I've included DEST_1 this time to show that my sequence# has moved forward, but sending redo to my standby has stopped.

Now I started it back up, and did a few log switches.

  DEST_ID  SEQUENCE# S SHI APPLIED   DEL FAL
---------- ---------- - --- --------- --- ---
         2       1751 A YES YES       NO  NO
         2       1752 A YES YES       NO  NO
         2       1753 A YES YES       NO  NO
         2       1754 A YES YES       NO  NO
         2       1761 A YES NO        NO  NO
         3       1755 A YES YES       NO  YES
         3       1758 A YES NO        NO  YES
 .....
         3       1754 A YES NO        NO  YES
         3       1752 A YES NO        NO  YES
         3       1762 A YES NO        NO  NO
         3       1763 A YES NO        NO  NO


After restarting the database, I can see that it is now using DEST_3 (my Far Sync instance) instead of DEST_2 (Dataguard).

Now for a final test on this, I am going to STOP my Far Sync instance.

   DEST_ID  SEQUENCE# S SHI APPLIED   DEL FAL
---------- ---------- - --- --------- --- ---
         2       1751 A YES YES       NO  NO
         2       1752 A YES YES       NO  NO
         2       1753 A YES YES       NO  NO
         2       1754 A YES YES       NO  NO
         2       1761 A YES YES       NO  NO
         3       1755 A YES YES       NO  YES
         3       1758 A YES NO        NO  YES
 ....
         3       1754 A YES YES       NO  YES
         3       1752 A YES YES       NO  YES
         3       1762 A YES NO        NO  NO
         3       1763 A YES NO        NO  NO
         2       1764 A YES NO        NO  YES
         2       1765 A YES NO        NO  NO
         2       1766 A YES NO        NO  NO


I stopped bsg18f (Far Sync) when it was on sequence 1764.  I can see that the database automatically switched to sending logs to bsg18d (Dataguard) when DEST_3 became unavailable. 

This is exactly what I would expect to happen when I have 2 destinations with the same Group/Priority.
  • It chooses one of the 2 destinations and becomes "sticky" once chosen.
  • If the "sticky" destination becomes unavailable, it then automatically switches to the second destination.

FRA - If you use an FRA (Fast Recovery Area), how does it understand that my redo can go to an alternate location?

Now after all this, you can see from my previous tests that some logs were sent to DEST_2 and some were sent to DEST_3.  Both of these are members of the same group.
Let's see if the FRA sees them as shipped (and therefore reclaimable under my retention policy).
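
The view that shows this is V$RECOVERY_AREA_USAGE:

select * from v$recovery_area_usage;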

FILE_TYPE               PERCENT_SPACE_USED PERCENT_SPACE_RECLAIMABLE NUMBER_OF_FILES     CON_ID
----------------------- ------------------ ------------------------- --------------- ----------
CONTROL FILE                             0                         0               0          0
REDO LOG                                 0                         0               0          0
ARCHIVED LOG                         21.78                     21.78             555          0
BACKUP PIECE                          7.54                       .45              38          0
IMAGE COPY                               0                         0               0          0
FLASHBACK LOG                            0                         0               0          0
FOREIGN ARCHIVED LOG                     0                         0               0          0
AUXILIARY DATAFILE COPY                  0                         0               0          0



Perfect !  All the archive log space is reclaimable.  When I have different remote destinations that are members of the same group, as long as the redo log was successfully shipped to one destination in the group, the log is eligible for deletion.
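
For completeness: what makes shipped logs reclaimable is the RMAN archive log deletion policy.  A configuration along these lines (an example, not necessarily my exact setting) ties FRA reclaimability to successful shipping:

CONFIGURE ARCHIVELOG DELETION POLICY TO SHIPPED TO ALL STANDBY;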

Thursday, December 19, 2019

ZDLRA alternate redo destination replaced with Groups/Priority in 12.2

The ALTERNATE destination syntax is deprecated in 12.2 for remote locations.

This diagram shows the use of an alternate destination for sending logs.
An alternate destination can be local to the database (to be used if the disk location fills up), but it can also be used to change the remote destination that the logs get sent to.

Configuring an "ALTERNATE" for a local destination is still supported

Below is what this looks like from the 18C manual.

LOG_ARCHIVE_DEST_1='LOCATION=/disk1 MANDATORY MAX_FAILURE=1 
ALTERNATE=LOG_ARCHIVE_DEST_2'
 
LOG_ARCHIVE_DEST_STATE_1=ENABLE
 
LOG_ARCHIVE_DEST_STATE_2=ALTERNATE

LOG_ARCHIVE_DEST_2='LOCATION=/disk2 MANDATORY'

In this example, if /disk1 is not available (fills up, etc.), the database will switch to sending logs to /disk2 so that the database will not get hung.

HOWEVER, for remote destinations, AKA destinations using "SERVICE", this changes with 12.2.

12.2 introduces the idea of "GROUP" and "PRIORITY" in place of a simple "ALTERNATE".
I'm sure this was added for Far Sync, but it also comes into play with the ZDLRA.

Using only "ALTERNATE" gives you the ability to switch to a single remote destination. Using GROUP and PRIORITY lets you set multiple destinations and decide the order in which they are used.

Now let's start with a simple example using Far Sync.
I have 2 locations. San Francisco (SF) is the primary with 2 local Far Sync servers, and my remote destination is New York (NYC).


I can group them together and set the priority in which they are used. I can even switch from sync to async.  Below is what the new syntax would look like for the destinations.

LOG_ARCHIVE_DEST_2='SERVICE=FARSYNC1 SYNC GROUP=1 PRIORITY=1'
LOG_ARCHIVE_DEST_3='SERVICE=FARSYNC2 SYNC GROUP=1 PRIORITY=1'
LOG_ARCHIVE_DEST_4='SERVICE=NYC ASYNC GROUP=1 PRIORITY=2'

This behaves in the following manner.


  • My database in San Francisco sends archive logs to EITHER of the two services specified as PRIORITY=1.  The log could go to FARSYNC1 or FARSYNC2 and get passed on to NYC.
  • If either of these 2 PRIORITY=1 destinations becomes unavailable, the other will continue to work.
  • If both of these 2 PRIORITY=1 destinations become unavailable, the redo will be sent to NYC as ASYNC.
  • If NYC is the destination (because of a double failure), it will switch back to either of the 2 FARSYNC services when they become available.
You can add up to 8 layers of PRIORITY and include a remote FARSYNC server if you would like.

Now, what does this have to do with ZDLRA?

You can also use an alternate destination for redo traffic on the ZDLRA.  The alternate can be the downstream ZDLRA if replication is used, or it can be the upstream ZDLRA if store-and-forward is being used.  These are outlined in the ZDLRA admin guide under "Implementing Additional High Availability Strategies".

In the case of the ZDLRA you would use 2 destinations.
The first destination is the primary ZDLRA, and the second destination is the alternate ZDLRA.
The configuration would look like this.


LOG_ARCHIVE_DEST_2='SERVICE="ZDLRA_SF_SCAN:1521/ZDLRASF:dedicated" ASYNC GROUP=1 PRIORITY=1'
LOG_ARCHIVE_DEST_3='SERVICE="ZDLRA_NYC_SCAN:1521/ZDLRANYC:dedicated" ASYNC GROUP=1 PRIORITY=2'
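
Once this is in place, you can watch which destination is currently being used with something like the following query against V$ARCHIVE_DEST_STATUS:

select dest_id, status, type, gap_status
from v$archive_dest_status
where status <> 'INACTIVE';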



This is an interesting change to the ALTERNATE setting on remote destinations.



Tuesday, November 26, 2019

Dataguard and LOG_ARCHIVE_MAX_PROCESSES


Any DBA who has set up Dataguard has seen this diagram of how the sending of logs to a standby database works.

What I wanted to go through is the use of LOG_ARCHIVE_MAX_PROCESSES and how it affects the sending of redo logs.

If you have set this parameter in your environment, it is typically because you were seeing issues with the database clearing out redo logs.
The primary reason this gets raised from the default (4) is to add more processes capable of clearing out redo logs and turning them into archive logs.  The more processes running in parallel to clear out redo logs, the faster the database can move on to start a new redo log.

But did you know that the LOG_ARCHIVE_MAX_PROCESSES parameter also affects how many processes are used on your standby database to fill any gaps?

In my example I have a primary database (BSG18) which is on 18.3.0.  I also have a standby database (BSG18d).

In my primary database I set the parameter to 10.
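
For reference, this is a dynamic parameter, so it can be set online with something like:

alter system set log_archive_max_processes=10 scope=both sid='*';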

NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
log_archive_max_processes            integer     10


I then shut down my Dataguard database and performed many log switches. I needed at least 10 log switches to demonstrate what happens.

Now once I start up my standby database, LOG_ARCHIVE_MAX_PROCESSES determines how many processes are used on my standby database to fill the gap.

Here is my query from V$DATAGUARD_PROCESS

select name,
           pid "standby PID",
           role, client_pid "Primary DB PID",
           client_role
    from V$DATAGUARD_PROCESS
    where type='NET'
order by 2;


NAME  standby PID              ROLE                     Primary DB PID CLIENT_ROLE     
----- ------------------------ ------------------------ -------------- ----------------
rfs   18116                    RFS async                          4165 async ORL multi 
rfs   26337                    RFS ping                           4153 gap manager     
rfs   26401                    RFS archive                        4159 archive gap     
rfs   26404                    RFS archive                        8752 archive gap     
rfs   26408                    RFS archive                        8738 archive gap     
rfs   26411                    RFS archive                        8750 archive gap     
rfs   26413                    RFS archive                        8740 archive gap     
rfs   26415                    RFS archive                        8748 archive gap     
rfs   26417                    RFS archive                        8742 archive gap     
rfs   26419                    RFS archive                        8744 archive gap     
rfs   26421                    RFS archive                        8746 archive gap     


If you look at the output, you can see all the pieces fit together.


Real Time redo process


The first process sends the redo information to the standby database in async mode, where it is written to the standby redo logs.


oracle    4165     1  0 Nov05 ?        00:05:57 ora_tt02_bsg18


We can see that on the primary database it is a "tt" process that starts up with the database.

oracle   18116     1  0 Nov12 ?        00:07:58 oraclebsg18d (LOCAL=NO)


and for the standby database, it is a sqlnet process.

Gap Manager process


Now let's look at the second process.

oracle    4153     1  0 Nov05 ?        00:00:26 ora_tt00_bsg18


We can see that on the primary database it is also a "tt" process that starts up with the database.

oracle   26337     1  0 Nov12 ?        00:00:18 oraclebsg18d (LOCAL=NO)

and for the standby database, it is a sqlnet process.


Gap fill processes


Now let's look at the remaining processes.


oracle    4159     1  0 Nov05 ?        00:00:14 ora_arc1_bsg18
oracle    8738     1  0 Nov05 ?        00:00:13 ora_arc2_bsg18
oracle    8740     1  0 Nov05 ?        00:00:14 ora_arc3_bsg18
oracle    8742     1  0 Nov05 ?        00:00:13 ora_arc4_bsg18
oracle    8744     1  0 Nov05 ?        00:00:13 ora_arc5_bsg18
oracle    8746     1  0 Nov05 ?        00:00:13 ora_arc6_bsg18
oracle    8748     1  0 Nov05 ?        00:00:13 ora_arc7_bsg18
oracle    8750     1  0 Nov05 ?        00:00:13 ora_arc8_bsg18
oracle    8752     1  0 Nov05 ?        00:00:13 ora_arc9_bsg18


When I look at the primary database, I can see 9 "arc" processes.
Why 9 processes? Because LOG_ARCHIVE_MAX_PROCESSES of 10 tells the primary database that it can use 10 processes to fill the gap on the standby.
The 1st process is the gap manager ("tt"), and the remaining 9 "arc" processes are used to send over the archive logs.

Increasing the LOG_ARCHIVE_MAX_PROCESSES can be used to increase the parallelization of sending archive logs to fill any gaps.



Monday, October 21, 2019

Oracle Active Dataguard - More than just a read-only copy.


NOTE: I updated this on 1/16/20 with additional information.

Anyone that knows me, knows I'm a stickler for properly explaining technical topics and ensuring I use the correct term.

Dataguard vs Active Dataguard is a topic that drives me crazy sometimes, as people tend to use the two terms interchangeably.

The differences appear to be subtle on the surface, but there are some major differences (other than the obvious) that you might not know about.

What I am hoping you get out of this blog post is... if you have the license for Active Dataguard, then turn on portions of it, even if the application isn't using it for queries. There are more benefits from Active Dataguard than just having a read-only database copy.


Dataguard  -  First let's talk about normal Dataguard.


Basically, a Dataguard database is an exact copy of the primary/protected database that is constantly in recovery.  Redo log information (used for recovery) is automatically sent from the primary/protected database by specifying the name/location of the Dataguard copy as an additional destination for the redo logs.
This is a simple explanation, but Dataguard concepts are that simple.

Dataguard - What Dataguard does for me to protect my primary database.

Dataguard is used for a number of purposes, the most common of which is to have a "Disaster Recovery" copy of the primary database available in the event of the loss of the primary.
For a "Disaster Recovery" copy, the best practice is to have this copy geographically isolated from the primary.  This ensures a disaster in the primary datacenter (flood, earthquake, etc.) doesn't affect the Dataguard copy.

When moving the application to use the Dataguard copy, there are 2 different ways to bring up the dataguard copy as the primary.

1) Switchover - In the case of a switchover, all transactions on the primary are applied to the Dataguard copy before it is opened up.  This ensures no data loss. This isn't always possible, since transactions on the primary database need to be "drained" and transferred to the Dataguard copy.

2) Failover - In the case of a failover, it is not possible to "drain" transactions from the primary database.  All outstanding transactions that have been received from the primary database are applied on the Dataguard copy, it is opened with "resetlogs", and there can be data loss.

Other uses for Dataguard

  • Dataguard can be used to create an up-to-date copy of production for testing/QA, etc.
  • A Dataguard database can be opened for write (snapshot standby) to test code releases etc., and then flashed back to being a Dataguard copy again.
  • A Dataguard database can have a delay in log apply, essentially providing a time gap, allowing data to be recovered within the time gap in case of user error.

Active Dataguard Option (licensable option) - 

This option now contains many features.

Original feature  -- the Dataguard copy is open as read only.

  That is as simple as it is to use Active Dataguard (if you have the license).  Before starting the apply of redo log information, the database is opened read only.

  The main advantage of Active Dataguard is that you can now use the DR copy of the database for queries.  This not only offloads activity from the primary to the mostly idle Dataguard copy, it also ensures that there is a readable copy of the data even while the primary is not available (patching, etc.).

What I wanted to go through in this post is all the other features that come with Active Dataguard that you might not realize are included.

Additional Active Dataguard Features.

First I am going to separate the features into 2 groups, based on the mode the standby database needs to be in.

Features that are available when the database is in mount mode (read-only not required).


  1. Far Sync -  Far Sync allows you to create a shell instance that is used to capture real-time redo from a database and send it on to a standby database.  You can have multiple Far Sync instances for redundancy, and they are typically local to the primary to provide a synchronous destination with very little network lag.
  2. BCT (Block Change Tracking) - You can create a BCT file on your standby database, and it will be used for incremental backups.
  3. Real-time redo - This allows you to cascade redo from the standby to a destination in real time, in the same manner that the primary DB does with standby redo logs.

Features that are available when the database is in read-only mode
  1. Automatic Block repair -  Corrupted blocks on the primary are repaired by automatically applying the "clean" block from the Dataguard copy.
  2. DML redirection - Occasional updates to the Dataguard copy are redirected to the primary database (see the sketch after this list).
  3. Preserve Buffer cache during Role change - When a Dataguard database becomes the primary, the mode change is done without bouncing the database, thus preserving what is in memory.
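
For example, DML redirection can be enabled per session on the standby (19c syntax; the table and column names below are made up for illustration):

ALTER SESSION ENABLE ADG_REDIRECT_DML;
UPDATE app_config SET last_refresh = SYSDATE WHERE config_id = 1;  -- made-up table; runs transparently on the primary
COMMIT;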

Additional Active Dataguard Features affecting ZDLRA.



I wanted to call out 2 features specific to Active Dataguard that have an effect on ZDLRA recoverability.

  1. Block Change Tracking file - With Active Dataguard, the BCT file gets updated with the changes and is used for incremental backups. This can be extremely important when using a ZDLRA to back up your Dataguard database.  With the ZDLRA, only incremental backups are performed.  Without an active BCT file, incremental backups will scan all database blocks with every incremental backup.  If you have the license for Active Dataguard, be sure to create a BCT file anyway, and ensure it is used (see the sketch after this list).
  2. Far Sync support -  You are probably wondering what this has to do with ZDLRA.  The Far Sync support (starting with 12c) is more than just support for Far Sync.  This feature changes when the applied changes are written to the destinations.  Prior to 12c, changes were written to the destinations AFTER the log switch.  This means that the downstream Dataguard databases and the ZDLRA only got full archive logs.  With 12c, an Active Dataguard database, just like the primary, sends changes to the destinations from memory as they are applied.  This can make a big difference in the recovery point objective (RPO) of a Dataguard database on the ZDLRA.  This feature, using real-time redo from a standby database to a ZDLRA, is allowed as an exception under licensing.
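
Here is a sketch of enabling BCT on the standby (doing this on a standby requires the Active Dataguard license; the file path is just an example):

-- With OMF (db_create_file_dest set), no path is needed:
ALTER DATABASE ENABLE BLOCK CHANGE TRACKING;
-- Or with an explicit (example) location:
ALTER DATABASE ENABLE BLOCK CHANGE TRACKING USING FILE '/u01/app/oracle/oradata/bsg18d/bct.chg';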

Key takeaways.


Active Dataguard has a couple of great features that the ZDLRA takes advantage of. If you have the license for it, you should turn on BCT, even if the application isn't using any of the other features.