Thursday, July 16, 2020

ZDLRA and TDE wallet location - Part 2

TDE and SEPS security - how do I get there?
If you read my last blog post on TDE and SEPS security, you might be asking yourself: how do I get there?

Many customers use the default location for the TDE wallet (because they are new to TDE) and find that the default location can cause conflicts with other Oracle features.

The basic question around this would be:

"all my TDE wallets are in the default location of $ORACLE_HOME/admin/DB_UNIQUE_NAME/wallet 
                  or 
$ORACLE_BASE/admin/DB_UNIQUE_NAME/wallet,
and I have multiple databases sharing the same $ORACLE_HOME location.
How do I get to a dedicated location for TDE?"

The challenge, especially if you want to use WALLET_LOCATION (which the ZDLRA requires for real-time redo) is how to get from the default to a dedicated location.
The issue is that WALLET_LOCATION overrides the default location, unless a dedicated TDE wallet location is specified.

First: the SQLNET.ORA file is ONLY read by the database at startup, so any changes made to the sqlnet.ora file take effect the next time a database instance bounces. Be careful with the coordination, however, because an instance can bounce at any time for any number of reasons, so plan accordingly.
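
A quick way to confirm which wallet a running instance actually picked up (and whether it is open) is v$encryption_wallet. A minimal check:

SQL> select wrl_type, wrl_parameter, status from v$encryption_wallet;

If WRL_PARAMETER does not show the directory you expect, the instance fell back to another location in the hierarchy.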

Now let's start with where to put the TDE wallet files.  There are many options:

1) Leave the wallet files within the $ORACLE_HOME directory using the $ORACLE_SID. 
     PROS - This is less disruptive since it uses a variable that is already set.
     CONS - Wallets have to be moved to a new location with an out-of-place upgrade.
                   You need to copy the wallet to this new location when implementing.
                   In a multi-node RAC cluster the location is different on each node.

    STEPS

  • For each database sharing the $ORACLE_HOME ensure there is a wallet subdirectory created on each node for every instance.
  • Copy the wallet files to the appropriate subdirectory for each node and for each instance
  • Update the SQLNET.ORA file to point to $ORACLE_HOME/admin/$ORACLE_SID/tde_wallet
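
A sketch of the corresponding sqlnet.ora entry, mirroring the examples in the previous post (the exact path is an assumption based on the step above):

ENCRYPTION_WALLET_LOCATION=
 (SOURCE=
  (METHOD=FILE)
   (METHOD_DATA=
    (DIRECTORY=$ORACLE_HOME/admin/$ORACLE_SID/tde_wallet)))
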
2) Leave the wallet files in the original location within $ORACLE_HOME that uses the $DB_UNIQUE_NAME.
     PROS - You don't have to move the wallet files.
     CONS - You need to set a new variable.
                   Wallets have to be moved to a new location with an out-of-place upgrade.

    STEPS
  • For ALL databases sharing the same $ORACLE_HOME ensure that the variable $DB_UNIQUE_NAME is set through srvctl (if available). This ensures all nodes in a RAC cluster have the variable set.
  • Ensure all scripts on all nodes (including the login script) set the variable $DB_UNIQUE_NAME
  • Update the SQLNET.ORA file to point to $ORACLE_HOME/admin/$DB_UNIQUE_NAME/wallet
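
Putting the pieces together for this option, borrowing the srvctl syntax from the previous post (substitute your own database name):

srvctl setenv database -db database_name -env "DB_UNIQUE_NAME=database_name"

ENCRYPTION_WALLET_LOCATION=
 (SOURCE=
  (METHOD=FILE)
   (METHOD_DATA=
    (DIRECTORY=$ORACLE_HOME/admin/$DB_UNIQUE_NAME/wallet)))
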
3) Leave (or move) the wallet files within the $ORACLE_BASE directory using the $ORACLE_SID.  

     PROS - This is less disruptive since it uses a variable that is already set.
     CONS - Wallets have to be moved to a new location with an out-of-place upgrade.
                   You need to copy the wallet to this new location when implementing (unless it is already there).
                   In a multi-node RAC cluster the location is different on each node.

    STEPS

  • For each database sharing the $ORACLE_HOME ensure there is a wallet subdirectory created on each node for every instance within the $ORACLE_BASE/admin directory (unless this was already the default)
  • If necessary, copy the wallet files to the appropriate subdirectory for each node and for each instance
  • Update the SQLNET.ORA file to point to $ORACLE_BASE/admin/$ORACLE_SID/wallet
4) Migrate to $ORACLE_BASE and use $DB_UNIQUE_NAME
     PROS - Once set, you can leave the wallets after out-of-place upgrades
     CONS - You need to copy the wallet to this new location when implementing.
                    You need to set a variable to be used.

    STEPS

  • For each database sharing the $ORACLE_HOME ensure there is a wallet subdirectory created on each node for every $DB_UNIQUE_NAME within the $ORACLE_BASE/admin directory (unless this was already the default)
  • Copy the wallet files to the appropriate subdirectory for each node and for each instance
  • For ALL databases sharing the same $ORACLE_HOME ensure that the variable $DB_UNIQUE_NAME is set through srvctl (if available). This ensures all nodes in a RAC cluster have the variable set.
  • Ensure all scripts on all nodes (including the login script) set the variable $DB_UNIQUE_NAME
  • Update the SQLNET.ORA file to point to $ORACLE_BASE/admin/$DB_UNIQUE_NAME/tde_wallet
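
A sketch of the migration itself for this option (paths are illustrative; run the copy on every node, and keep a backup of the wallet files first):

mkdir -p $ORACLE_BASE/admin/$DB_UNIQUE_NAME/tde_wallet
cp $ORACLE_HOME/admin/$DB_UNIQUE_NAME/wallet/* $ORACLE_BASE/admin/$DB_UNIQUE_NAME/tde_wallet/

ENCRYPTION_WALLET_LOCATION=
 (SOURCE=
  (METHOD=FILE)
   (METHOD_DATA=
    (DIRECTORY=$ORACLE_BASE/admin/$DB_UNIQUE_NAME/tde_wallet)))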

5) Migrate to ASM (Not available in 11.2) and use $DB_UNIQUE_NAME
     PROS - Once set, you can leave the wallets after out-of-place upgrades
                   You now have a central location for a RAC cluster
     CONS - You need to copy the wallet to this new location when implementing.
                    You need to set a variable to be used.

    STEPS

  • For each database sharing the $ORACLE_HOME ensure there is a wallet subdirectory created in ASM for every $DB_UNIQUE_NAME 
  • Copy the wallet files to the appropriate subdirectory for each database
  • For ALL databases sharing the same $ORACLE_HOME ensure that the variable $DB_UNIQUE_NAME is set through srvctl (if available). This ensures all nodes in a RAC cluster have the variable set.
  • Ensure all scripts on all nodes (including the login script) set the variable $DB_UNIQUE_NAME
  • Update the SQLNET.ORA file to point to +DISKGROUP/$DB_UNIQUE_NAME/tde_wallet
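
A sketch of staging the wallet in ASM with asmcmd, run as the grid/ASM user (the disk group name is illustrative; include cwallet.sso as well if you use auto-login, and check the wallet-in-ASM steps for your exact version):

asmcmd mkdir +DATA/$DB_UNIQUE_NAME
asmcmd mkdir +DATA/$DB_UNIQUE_NAME/tde_wallet
asmcmd cp /u01/app/oracle/admin/$DB_UNIQUE_NAME/wallet/ewallet.p12 +DATA/$DB_UNIQUE_NAME/tde_wallet/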

It's your choice which path to take.  For me, the best option (if ASM isn't available) is to put the TDE wallets within $ORACLE_BASE/admin/$DB_UNIQUE_NAME/tde_wallet.  That way, with each out-of-place upgrade I don't have to do anything with the wallet. As long as the sqlnet.ora points under $ORACLE_BASE there won't be any changes.


NOTE: for 18c and above, just migrate to WALLET_ROOT, which allows you to set the value for each database individually.

Tuesday, July 14, 2020

ZDLRA and TDE wallet location

TDE and SEPS security



I am seeing TDE used more and more at customers as security concerns increase.
This blog post will go through configuring TDE and SEPS security (which ZDLRA uses) together.
If OID is used also, this post talks about how to combine OID and SEPS.

First off, the solution depends on the version of Oracle you are using.  Depending on your configuration, SEPS security and TDE may use the same wallet location. This is NOT recommended.
Below is the hierarchy of where Oracle looks for the TDE wallet. As soon as it finds a setting, it stops:

ENCRYPTION_WALLET_LOCATION
         WALLET_LOCATION
                    $ORACLE_HOME/admin/$DB_UNIQUE_NAME/wallet
                              $ORACLE_BASE/admin/$DB_UNIQUE_NAME/wallet

**NOTE: unless the ENCRYPTION_WALLET_LOCATION is already set,
                 setting the WALLET_LOCATION will break TDE

When using SEPS security it is critical that you properly set the TDE wallet location first.

11.2

First let's talk through 11.2 and the recommendation for TDE encryption wallet. This is the most basic configuration setting.

Best practice is to set the ENCRYPTION_WALLET_LOCATION in the sqlnet.ora.
If there are multiple databases sharing the same $ORACLE_HOME (multi-homing), then the location needs to use a variable.

Single home example.



ENCRYPTION_WALLET_LOCATION=
 (SOURCE=
  (METHOD=FILE)
   (METHOD_DATA=
    (DIRECTORY=/u01/app/oracle/tde_wallet)))


Multi-Home examples



Example 1 - using the $ORACLE_SID variable for the location



ENCRYPTION_WALLET_LOCATION=
 (SOURCE=
  (METHOD=FILE)
   (METHOD_DATA=
    (DIRECTORY=/u01/app/oracle/admin/$ORACLE_SID/tde_wallet)))

Example 2 - using a new variable


First, ensure that the variable is set when srvctl is used to restart the databases.

srvctl setenv database -db database_name -env "DB_UNIQUE_NAME=database_name"

Second ensure the variable is set during any scripts and when logging into the host

export DB_UNIQUE_NAME=database_name

Then use this variable within the sqlnet.ora

ENCRYPTION_WALLET_LOCATION=
 (SOURCE=
  (METHOD=FILE)
   (METHOD_DATA=
    (DIRECTORY=/u01/app/oracle/admin/$DB_UNIQUE_NAME/tde_wallet)))

** NOTE: you need to create the directories for all databases sharing that same $ORACLE_HOME even if they don't use TDE or SEPS.


12.1/12.2

The configuration for 12.1 is similar to 11.2 with one exception: 12.1 allows you to use ASM as the location of the wallet in a RAC environment.

Here are the ASM examples, based on the 11.2 information.

Single home example.


ENCRYPTION_WALLET_LOCATION=
 (SOURCE=
  (METHOD=FILE)
   (METHOD_DATA=
    (DIRECTORY=+DATA/tde_wallet)))

Multi-Home example


ENCRYPTION_WALLET_LOCATION=
 (SOURCE=
  (METHOD=FILE)
   (METHOD_DATA=
    (DIRECTORY=+DATA/$DB_UNIQUE_NAME/tde_wallet)))


18c+

Oracle version 18c adds more functionality for the TDE wallet.

18c introduces a new init parameter for TDE called "WALLET_ROOT". In fact, ENCRYPTION_WALLET_LOCATION is deprecated in its favor (per the 18c docs).





WALLET_ROOT points to the base location of the TDE wallet; Oracle uses it as the starting point for the CDB wallet and for the subdirectories holding the PDB wallets.

WALLET_ROOT can be on a local file system (or NAS).

          Example
                           WALLET_ROOT=wallet-root-directory-path

It can also be set to an ASM location

         Example
                           WALLET_ROOT=+disk-group-name/db-unique-name
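
A minimal sketch of moving an 18c+ database to WALLET_ROOT (path illustrative; KEYSTORE_CONFIGURATION=FILE is the common file-based keystore case):

SQL> alter system set wallet_root='/u01/app/oracle/admin/db_unique_name/wallet' scope=spfile;

(bounce the instance, then)

SQL> alter system set tde_configuration='KEYSTORE_CONFIGURATION=FILE';

WALLET_ROOT is a static parameter, so it takes effect only after a restart, and the database then expects the TDE wallet under <WALLET_ROOT>/tde.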

SUMMARY : When implementing the ZDLRA (which uses SEPS security) with an existing TDE implementation, it is critical to ensure that TDE was configured using best practices.  If best practices were not followed, configuring the WALLET_LOCATION may cause wallet issues for those databases.





Wednesday, May 13, 2020

Sharing an RMAN Catalog with multiple users on ZDLRA


One of the topics that comes up when implementing ZDLRA is the RMAN feature that allows multiple users to share an RMAN catalog but still be isolated from each other.
This is NOT a ZDLRA specific feature, but I find that most customers have never used it until they move to a shared BaaS (Backup as a Service) environment like ZDLRA.

Here is the basic explanation.  I am a DBA in the ACME corporation. The ACME corporation has 2 divisions:
the white division and the black division.


I work for the black division of ACME.
Since the ACME corporation has a shared infrastructure, there is a single ZDLRA and thus a single RMAN catalog for all database backups.

As you can guess, my division (black) wants to be isolated from the white division. We don't want the white division to be able to access the backups of any database that my division supports.  The white division feels the same way about my division.

Now this is where the idea of VPC users in the RMAN catalog comes in.  First, here is some documentation on it so you can do your own reading.

For my example I have 2 databases.

  • OEMDB (White division maintained)
  • RMAN19C (Black division maintained)

The ZDLRA already has a catalog created, and in order to manage VPC users, you need to use the Command Line Interface (racli).  

A note on naming: on the ZDLRA the users are referred to as VPC (Virtual Private Catalog) users, while the documentation for the feature refers to it as a VPD (Virtual Private Database) model.
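
For reference, on an actual ZDLRA the VPC user itself is added with racli rather than SQL; a sketch (user name illustrative, racli prompts for the password; check your version's racli reference):

racli add vpc_user --user_name=black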

In order to demonstrate how this works, I am going to walk through the steps with just RMAN.

Step # 1 - Create an RMAN user in a database to own the RMAN catalog.



SQL> create user rman identified by oracle
  2         temporary tablespace temp
  3         default tablespace rman_data
  4         quota unlimited on rman_data;

User created.

SQL> grant recovery_catalog_owner to rman;

Grant succeeded.


Step #2 - create the catalog for the RMAN user.


[oracle@oracle-server admin]$ rman target / catalog rman/oracle

Recovery Manager: Release 19.0.0.0.0 - Production on Wed May 13 08:10:23 2020
Version 19.6.0.0.0

Copyright (c) 1982, 2019, Oracle and/or its affiliates.  All rights reserved.

connected to target database: OEMDB (DBID=700723428)
connected to recovery catalog database

RMAN> create catalog tablespace rman_data;

recovery catalog created

RMAN>



Step #3 - Enable the VPD model within the recovery catalog. The VPD model is not available by default.



SQL> @$ORACLE_HOME/rdbms/admin/dbmsrmanvpc.sql -vpd rman

Checking the operating user... Passed

Granting VPD privileges to the owner of the base catalog schema RMAN

========================================
VPD SETUP STATUS:
VPD privileges granted successfully!
Connect to RMAN base catalog and perform UPGRADE CATALOG.

Disconnected from Oracle Database 19c Enterprise Edition Release 19.0.0.0.0 - Production
Version 19.6.0.0.0


Step #4 - Upgrade the RMAN catalog.


I didn't notice this at first, but the last message at the end of the script indicates that you need to execute UPGRADE CATALOG.

-vpd command grants required privileges to support VPD protected catalog.
Connect to RMAN base catalog and perform UPGRADE CATALOG after the VPD
privileges are granted.


After UPGRADE CATALOG is performed for the base catalog schemas a cleanup
of VPC schemas has to take place for that the RMAN base catalog schema
names have to be supplied as command line parameters.  Up to 10 schema
names can be supplied per script execution.  When -all is specified the
script attempts to detect the RMAN base catalog schemas automatically
and perform the upgrade.

Now to upgrade the catalog.

[oracle@oracle-server admin]$ rman catalog rman/oracle

Recovery Manager: Release 19.0.0.0.0 - Production on Wed May 13 08:22:18 2020
Version 19.6.0.0.0

Copyright (c) 1982, 2019, Oracle and/or its affiliates.  All rights reserved.

connected to recovery catalog database

RMAN> upgrade catalog;

recovery catalog owner is RMAN
enter UPGRADE CATALOG command again to confirm catalog upgrade

RMAN> upgrade catalog;

recovery catalog upgraded to version 19.06.00.00.00
DBMS_RCVMAN package upgraded to version 19.06.00.00
DBMS_RCVCAT package upgraded to version 19.06.00.00.



Step #5 - Create my 2 VPC users within my RMAN catalog, which is now VPD-enabled.



SQL> create user black identified by oracle
  2         temporary tablespace temp;

User created.

SQL> grant create session to black;

Grant succeeded.

SQL> create user white identified by oracle
  2         temporary tablespace temp;

User created.

SQL> grant create session to white;

Grant succeeded.




Now I have 2 choices for registering my 2 databases.

  • I can reserve the "REGISTER DATABASE" for the catalog owner (RMAN) only
  • I can grant VPC users the authority to register their own databases

Step #6 - I have decided to grant the VPC users the ability to register databases themselves.



[oracle@oracle-server ~]$ rman catalog rman/oracle

Recovery Manager: Release 19.0.0.0.0 - Production on Wed May 13 08:25:29 2020
Version 19.6.0.0.0

Copyright (c) 1982, 2019, Oracle and/or its affiliates.  All rights reserved.

connected to recovery catalog database

RMAN> grant register database to black;

Grant succeeded.

RMAN> grant register database to white;

Grant succeeded.



NOTE : When I first ran the commands I received the error below.

RMAN-07543: recovery catalog does not have VPD support enabled

If you see this error, it is most likely because you didn't execute the "UPGRADE CATALOG" after enabling the VPD model.

Step #7 - Now let's register the databases with separate VPC users and back up a datafile.


I am going to register database OEMDB as the "white" VPC user and back up datafile 1.


[oracle@oracle-server ~]$ rman target / catalog white/oracle@zdlra_sf
Recovery Manager: Release 19.0.0.0.0 - Production on Wed May 13 08:39:57 2020
Version 19.6.0.0.0

Copyright (c) 1982, 2019, Oracle and/or its affiliates.  All rights reserved.

connected to target database: OEMDB (DBID=700723428)
connected to recovery catalog database

RMAN> register database;

database registered in recovery catalog
starting full resync of recovery catalog
full resync complete

RMAN> backup incremental level 0 datafile 1;

Starting backup at 05/13/20 08:40:29
allocated channel: ORA_DISK_1
channel ORA_DISK_1: SID=398 device type=DISK
channel ORA_DISK_1: starting incremental level 0 datafile backup set
channel ORA_DISK_1: specifying datafile(s) in backup set
input datafile file number=00001 name=/home/oracle/app/oracle/oradata/OEMDB/datafile/o1_mf_system_h4llc7kt_.dbf
channel ORA_DISK_1: starting piece 1 at 05/13/20 08:40:30
channel ORA_DISK_1: finished piece 1 at 05/13/20 08:41:15
piece handle=/home/oracle/app/oracle/fast_recovery_area/OEMDB/backupset/2020_05_13/o1_mf_nnnd0_TAG20200513T084030_hcqtoz3z_.bkp tag=TAG20200513T084030 comment=NONE
channel ORA_DISK_1: backup set complete, elapsed time: 00:00:45
Finished backup at 05/13/20 08:41:15

Starting Control File and SPFILE Autobackup at 05/13/20 08:41:15
piece handle=/home/oracle/app/oracle/fast_recovery_area/OEMDB/autobackup/2020_05_13/o1_mf_s_1040287283_hcqtqnbm_.bkp comment=NONE
Finished Control File and SPFILE Autobackup at 05/13/20 08:41:26



I am going to register database RMAN19C as the "black" VPC user and back up datafile 1.

[oracle@oracle-server ~]$ rman target / catalog black/oracle@zdlra_sf
Recovery Manager: Release 19.0.0.0.0 - Production on Wed May 13 08:42:49 2020
Version 19.6.0.0.0

Copyright (c) 1982, 2019, Oracle and/or its affiliates.  All rights reserved.

connected to target database: RMAN19C (DBID=4290218304)
connected to recovery catalog database

RMAN> register database;

database registered in recovery catalog
starting full resync of recovery catalog
full resync complete

RMAN> backup incremental level 0 datafile 1;

Starting backup at 05/13/20 08:43:12
allocated channel: ORA_DISK_1
channel ORA_DISK_1: SID=274 device type=DISK
channel ORA_DISK_1: starting incremental level 0 datafile backup set
channel ORA_DISK_1: specifying datafile(s) in backup set
input datafile file number=00001 name=/home/oracle/app/oracle/oradata/RMAN19C/datafile/system.dbf
channel ORA_DISK_1: starting piece 1 at 05/13/20 08:43:13
channel ORA_DISK_1: finished piece 1 at 05/13/20 08:44:08
piece handle=/home/oracle/app/oracle/fast_recovery_area/RMAN19C/backupset/2020_05_13/o1_mf_nnnd0_TAG20200513T084313_hcqtv1fc_.bkp tag=TAG20200513T084313 comment=NONE
channel ORA_DISK_1: backup set complete, elapsed time: 00:00:55
Finished backup at 05/13/20 08:44:08

Starting Control File and SPFILE Autobackup at 05/13/20 08:44:08
piece handle=/home/oracle/app/oracle/fast_recovery_area/RMAN19C/autobackup/2020_05_13/o1_mf_s_1040287449_hcqtwtmj_.bkp comment=NONE
Finished Control File and SPFILE Autobackup at 05/13/20 08:44:12



Step #8 - Now let's see what happens when I try to cross VPC users within the catalog.


NOTE - below are the 2 databases and their DBIDs. I will try to access the opposite database from RMAN using its DBID.


white VPC user    ---->  OEMDB (DBID=700723428)
black VPC user    ---->  RMAN19C (DBID=4290218304)

First let's connect as "white" to a new instance and try to set the DBID for each of the databases. You can see that "white" can only see the metadata for OEMDB.

[oracle@oracle-server admin]$ rman target / catalog white/oracle@zdlra_sf

Recovery Manager: Release 19.0.0.0.0 - Production on Wed May 13 08:50:39 2020
Version 19.6.0.0.0

Copyright (c) 1982, 2019, Oracle and/or its affiliates.  All rights reserved.

connected to target database (not started)
connected to recovery catalog database

RMAN> set dbid=4290218304;

executing command: SET DBID
RMAN-00571: ===========================================================
RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
RMAN-00571: ===========================================================
RMAN-03002: failure of set command at 05/13/2020 08:51:15
RMAN-06063: DBID is not found in the recovery catalog

RMAN> set dbid=700723428;

executing command: SET DBID
database name is "OEMDB" and DBID is 700723428

RMAN>


Now let's connect as "black" to a new instance and try to set the DBID for each of the databases. You can see that "black" can only see the metadata for RMAN19C.

[oracle@oracle-server admin]$ rman target / catalog black/oracle@zdlra_sf

Recovery Manager: Release 19.0.0.0.0 - Production on Wed May 13 08:54:20 2020
Version 19.6.0.0.0

Copyright (c) 1982, 2019, Oracle and/or its affiliates.  All rights reserved.

connected to target database (not started)
connected to recovery catalog database

RMAN> set dbid=4290218304;

executing command: SET DBID
database name is "RMAN19C" and DBID is 4290218304

RMAN> set dbid=700723428;

executing command: SET DBID
RMAN-00571: ===========================================================
RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
RMAN-00571: ===========================================================
RMAN-03002: failure of set command at 05/13/2020 08:55:01
RMAN-06063: DBID is not found in the recovery catalog


CONCLUSION : Implementing the VPD model in an RMAN catalog allows you to share a single RMAN catalog while still allowing backup metadata to be isolated within the same catalog.

This feature is leveraged by the ZDLRA to allow a single ZDLRA to be shared across a corporation, but still allow isolation by internal organizations.

Final Note : As multitenant becomes the standard, support for isolation will move to the PDB level also. This will allow different VPC users to manage PDBs within the same CDB.  This is dependent on the version of RMAN/DB/ZDLRA.

Tuesday, May 12, 2020

ZDLRA and using stored scripts


One new item I learned with RMAN is the ability to utilize stored scripts. Stored scripts can be very useful as you standardize your environment.
This is where they fit into the ZDLRA.  The ZDLRA is an awesome product that helps you standardize your Oracle backups, so adding stored scripts to your environment makes it even better!

First, a little background on stored scripts if you are new to them (I was).
Stored scripts are exactly what you expect from the name: a way to store a script in your RMAN catalog that can be shared by all the databases that utilize that RMAN catalog.
The commands used for stored scripts are:
  • [CREATE]/[CREATE OR REPLACE]  {GLOBAL} SCRIPT
  • DELETE {GLOBAL} SCRIPT
  • EXECUTE {GLOBAL} SCRIPT
  • PRINT {GLOBAL} SCRIPT
Using the {GLOBAL} keyword is optional, and is only useful if you are using multiple VPC users in your RMAN catalog.  In a multi-VPC RMAN catalog, each VPC user can store scripts with the same name and they will be kept separate.
I will go through VPC users in a future post, as this is a topic on its own.

Now let's go through how to best utilize stored scripts through an example. In my example, I have 2 ZDLRAs (let's pretend).  With ZDLRA, the RMAN catalog is contained within the ZDLRA itself. This means that because I have 2 ZDLRAs, I have 2 RMAN catalogs.  In this example, I will show how I can use a stored script to automatically switch backups to a different ZDLRA when the primary ZDLRA is not available during a patching window.  Here is my fictitious environment.

In this case my backups go to ZDLRA_SF, and they are replicated to ZDLRA_NYC.  My database is registered in both RMAN catalogs, but I only connect to the catalog on ZDLRA_SF if it is available.
Now let's create a script to put in each catalog that is specific to each ZDLRA.

Step #1 - add a script to both ZDLRAs that creates the channel configuration.



This is the script for the SF ZDLRA.


create or replace script ZDLRA_INCREMENTAL_BACKUP_L1
COMMENT "Normal level 1 incremental backups for San Francisco Primary ZDLRA"
  {
###
### Script to backup incremental level 1 to Primary San Francisco ZDLRA
###
###
###  First configure the channel for San Francisco ZDLRA
###
    CONFIGURE CHANNEL DEVICE TYPE 'SBT_TAPE' FORMAT '%d_%U' PARMS
      "SBT_LIBRARY=/u01/app/oracle/lib/libra.so,
      ENV=(RA_WALLET='location=file:/u01/app/oracle/wallet
       credential_alias=zdlrasf-scan:1521/zdlra:dedicated')";
###
###  Perform backup command
###
    backup device type sbt cumulative incremental level 1 filesperset 1 section size 64g database plus archivelog filesperset 32 not backed up;
   }


created script ZDLRA_INCREMENTAL_BACKUP_L1




This is the script for the NYC ZDLRA

create or replace script ZDLRA_INCREMENTAL_BACKUP_L1
COMMENT "Normal level 1 incremental backups for New York Primary ZDLRA"
  {
###
### Script to backup incremental level 1 to Primary New York ZDLRA
###
###
###  First configure the channel for New York ZDLRA
###
    CONFIGURE CHANNEL DEVICE TYPE 'SBT_TAPE' FORMAT '%d_%U' PARMS
      "SBT_LIBRARY=/u01/app/oracle/lib/libra.so,
      ENV=(RA_WALLET='location=file:/u01/app/oracle/wallet
       credential_alias=zdlranyc-scan:1521/zdlra:dedicated')";
###
###  Perform backup command
###
    backup device type sbt cumulative incremental level 1 filesperset 1 section size 64g database plus archivelog filesperset 32 not backed up;
   }

created script ZDLRA_INCREMENTAL_BACKUP_L1


Now, I have a script in each ZDLRA which allocates the channel properly for each ZDLRA.
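
With the scripts in place, a nightly backup from any protected database reduces to executing the stored script. A sketch using the catalog connection from this post (EXECUTE SCRIPT has to run inside a RUN block):

rman target / catalog rman/oracle@zdlra_sf <<EOF
run { execute script ZDLRA_INCREMENTAL_BACKUP_L1; }
exit;
EOF

Because the script name is identical in both catalogs, the same command works no matter which ZDLRA the connection lands on.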

Step #2 - Ensure the connection to the RMAN fails over if primary ZDLRA is not available.


The best way to accomplish this is GDS (Global Data Services). In order to perform my test, though, I am going to set up a failover in my tnsnames.ora file.  Below is the entry in my file.


ZDLRA_SF=
       (DESCRIPTION_LIST=(FAILOVER=YES)(LOAD_BALANCE= NO)
          (DESCRIPTION=(FAILOVER= NO)
             (CONNECT_DATA=(SERVICE_NAME=ZDLRA_SF))
             (ADDRESS=(PROTOCOL=TCP)(HOST=localhost)(PORT=1522))
           )
          (DESCRIPTION=(FAILOVER= NO)
             (CONNECT_DATA=(SERVICE_NAME=ZDLRA_NYC))
             (ADDRESS=(PROTOCOL=TCP)(HOST=localhost)(PORT=1522))
           )
       )
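
A quick sanity check that the alias resolves before relying on it (tnsping exercises only the address list, not the catalog login):

tnsping zdlra_sf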



Step #3 - Let's put it all together.


Since I don't have 2 ZDLRAs to test this, I will test it all by printing out the script that will be executed in RMAN.

First I am going to show what happens under normal conditions by executing the following script, which I created in /tmp/rman.sql.


print script  ZDLRA_INCREMENTAL_BACKUP_L1;
exit;


I am going to run that script over and over to ensure I keep connecting properly. Below is the command.


 rman target / catalog rman/oracle@zdlra_sf @/tmp/rman.sql


Here is the output I get every time, showing that I am connecting to ZDLRA_SF and will allocate channels there.

RMAN> print script  ZDLRA_INCREMENTAL_BACKUP_L1;
2> exit;
printing stored script: ZDLRA_INCREMENTAL_BACKUP_L1
{
###
### Script to backup incremental level 1 to Primary San Francisco ZDLRA
###
###
###  First configure the channel for San Francisco ZDLRA
###
    CONFIGURE CHANNEL DEVICE TYPE 'SBT_TAPE' FORMAT '%d_%U' PARMS
      "SBT_LIBRARY=/u01/app/oracle/lib/libra.so,
      ENV=(RA_WALLET='location=file:/u01/app/oracle/wallet
       credential_alias=zdlrasf-scan:1521/zdlra:dedicated')";
###
###  Perform backup command
###
    backup device type sbt cumulative incremental level 1 filesperset 1 section size 64g database plus archivelog filesperset 32 not backed up;
   }


Everything is working fine.  Now let's simulate the primary being unavailable.


I shut down the ZDLRA_SF database.

Now I am executing the same command as before.

 rman target / catalog rman/oracle@zdlra_sf @/tmp/rman.sql


My output now shows that it is using my New York ZDLRA.

RMAN> print script  ZDLRA_INCREMENTAL_BACKUP_L1;
2> exit;
printing stored script: ZDLRA_INCREMENTAL_BACKUP_L1
{
###
### Script to backup incremental level 1 to Primary New York ZDLRA
###
###
###  First configure the channel for New York ZDLRA
###
    CONFIGURE CHANNEL DEVICE TYPE 'SBT_TAPE' FORMAT '%d_%U' PARMS
      "SBT_LIBRARY=/u01/app/oracle/lib/libra.so,
      ENV=(RA_WALLET='location=file:/u01/app/oracle/wallet
       credential_alias=zdlranyc-scan:1521/zdlra:dedicated')";
###
###  Perform backup command
###
    backup device type sbt cumulative incremental level 1 filesperset 1 section size 64g database plus archivelog filesperset 32 not backed up;
   }

Recovery Manager complete.



SUMMARY :  



  • With stored scripts, you can maintain a central backup script shared by all databases, and not have to worry about individual channel configuration.
  • If there are any changes to the best practices for my backups (Doc ID 2176686.1), I just have to update the stored script in the RMAN catalog.

Wednesday, April 15, 2020

ZDLRA, Real-time redo compression


This post is based on a new MOS note: ZDLRA: Changes in the Protection Policy Compression Algorithms (Doc ID 2654539.1)

First I wanted to go through a little background on how compression works for archive logs.


As you probably know, database backups are automatically compressed.  That is not necessarily the case for archive logs.

Archive logs can reach the ZDLRA by 2 different methods.

  • Real-time redo,
  • Log sweeps.
Regardless of how the Archive Logs reach the ZDLRA, they are stored as backup files.

Log Sweeps


Now for log sweeps, the backupsets of archive logs are stored exactly as they are sent.
If you follow the best practices for backups, you send over archive logs with:
   plus archivelog not backed up filesperset 32
As you can figure out, this creates one or more backupsets, each containing up to 32 archive logs.
Also note that by default the archive log backupsets are NOT compressed.  You can RMAN-compress the archive log backups (if you have the ACO license), but be careful, because it is not recommended to compress the datafile backups.

Real-time redo

Real-time redo is where this MOS note takes effect.  Up to the current release of ZDLRA, archive logs sent via real-time redo were compressed using basic compression.  A "standby redo log" would mirror the redo log of the protected database, and a "BACKUP_ARCH" task would be queued to create a backupset from that standby redo log.  In the process of creating the RMAN backupset, basic compression was applied and a compressed RMAN backupset was created.

This does a few things that I want to point out.
  • During the creation of the archivelog backupset, CPU is used to compress the backupset, and the final result is a smaller backupset.
  • During the validation of backups for a database, CPU is used to uncompress the RMAN backupset to validate it. NOTE: this would also occur if you RMAN compress the archive logs during a log sweep.
  • When recovering a database, the compressed RMAN backupset is sent over the network (less network traffic), and the Database Host uncompresses the backupset to read the archive logs.

If you read this carefully, you will see that there is a CPU cost for compression, and a space savings when using it.

New Feature to control log compression!


The new feature allows you to control the log compression for archive logs sent through real-time redo.
NOTE: When using compression on the ZDLRA, you don't need the ACO license.

Below is the information from the documentation.

log_compression_algorithm

The setting for the archivelog compression feature. 
This setting is used to adjust the compression level of NZDL/polled archivelog backups.


OFF means that the archivelogs will not be compressed. 
BASIC means the BASIC compression algorithm will be used to compress the backups. 
LOW means the LOW compression algorithm will be used to compress the backups. 
MEDIUM means the MEDIUM compression algorithm will be used to compress the backups. 
HIGH means the HIGH compression algorithm will be used to compress the backups.



This is a great lever to control what happens with archive logs.

MEDIUM compression can be used if you have ample CPU and are concerned about space; OFF can be used if you have CPU concerns and ample space.

Remember, the deeper the compression, the higher the CPU. Like most things in IT, there are tradeoffs.


Monday, February 24, 2020

What are those processes in V$DATAGUARD_PROCESSES on the ZDLRA?


With 12.2 of Oracle there is now more detail in the V$DATAGUARD_PROCESS view on the downstream database.  Keep in mind that the "downstream database" may not only be a standby database; it can also be a Far Sync instance or a ZDLRA receiving real-time redo.

Below is a table of the processes that I've seen in this view on a ZDLRA, and I will go through what each one of these is.

NAME ACTION        CLIENT_ROLE      COUNT(*)
---- ------------ ---------------- ----------
rfs  IDLE          async SRL single      10
rfs  IDLE          async SRL multi       20
rfs  IDLE          async ORL single      30
rfs  IDLE          async ORL multi       40
rfs  IDLE          gap manager          100
rfs  IDLE          archive gap           10
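
A summary like the one above can be produced with a simple aggregate against the view; a sketch:

SQL> select name, action, client_role, count(*)
  2  from v$dataguard_process
  3  group by name, action, client_role;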



First off, I want to point out what happens in a RAC environment.  As you can imagine, each thread (node) in a RAC environment independently sends its redo to the downstream.  The same thing happens with a standby database sending its redo to a downstream.
For example, if my primary DB cluster is a 4-node RAC cluster and I have a single-node standby database, the redo sent to a ZDLRA from both the primary AND the standby will arrive as 4 independent streams (one per thread).

Now to walk through what each of these types of processes are, I will first define the types.
Since I am seeing multiple types, this example is from a ZDLRA (rather than a standby database) so I will define the terms from a ZDLRA standpoint.


SRL – Standby Redo Logs are sending the changes. These are standby databases
ORL – Online Redo Logs are sending the changes. These are primary databases

Single – The protected database is using the same log buffer to write both the archive logs and the redo sent to the ZDLRA (and possibly another standby).
Multi – The protected database is using a separate buffer for the redo sent to the ZDLRA, spawned because the destination could not keep up with the archiver.

Async       - Real-time redo processes
Gap manager – Process that keeps track of any gaps in redo, and can send archive logs to fill the gap.
Archive gap – Additional processes that send over any gaps in redo. The gap manager will send over the first archive log, but archive gap processes will be started in parallel with the gap manager if more than 1 archive log is needed. The degree of parallelism is controlled by the LOG_ARCHIVE_MAX_PROCESSES parameter on the protected database, which has a default value of 4.
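
To see how many of these gap processes a protected database can spawn, a quick check:

SQL> show parameter log_archive_max_processes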

So for my example, this is what we are seeing 

  • “Async SRL single” processes of 10 is telling us that there are 10 standby threads being applied and sending real-time redo using the same buffer on the protected database that the archive redo process is using. 
  • “Async SRL multi” processes of 20 is telling us that there are 20 standby threads being applied and sending real-time redo that could not keep up with the archiver process and spawned an additional buffer. 
  • A total of 30 combined processes for "Async SRL" is telling us that there are 30 threads sending real-time redo in total.
**** Notice I used the term "threads" above.  The number of threads is the number of RAC nodes on the primary database regardless of how many instances are on the standby applying redo.


  • “Async ORL single” processes of 30 is telling us that there are 30 primary database instances sending real-time-redo using the same buffer on the protected database that the archive redo process is using. 

  • “Async ORL Multi” processes of 40 is telling us that there are 40 primary database instances sending real-time-redo that could not keep up with the archiver process and spawned an additional buffer. 
  • A total of 70 combined processes for "Async ORL" is telling us there are 70 primary instances sending real-time redo.
**** Notice I used the term "instances" above.  The number of instances is the number of RAC nodes on the primary database rather than the number of databases.
  • "Gap Manager" process of 100 is telling us that there are 100 process that are monitoring the async process.  Notice that the total number of "Gap Manager" processes matches the total number of async processes.
  • “archive gap” processes are temporary and should be killed once they are idle for 2 minutes since they completed their work. We should only see these processes if the ZDLRA falls behind in collecting redo.

I hope this helps explain what the processes are that are in use to manage the real-time redo.

Monday, December 23, 2019

Oracle - Alternate Destinations, FRA, and balancing 2 destinations


My previous post was on using Groups/Priorities when specifying an alternate destination for sending Redo to a remote destination. You can find it here.

This post covers 2 topics that come up whenever I talk about alternate destinations.

  • Does it balance 2 destinations with the same group/priority, and once it chooses a destination, does it remain sticky to that destination and ignore other destinations with the same group/priority?
  • Does my FRA know that both destinations can be considered the same for "shipped" status?


First I will show you how I configured my tests.

In my example, I actually have both a standby database and a Far Sync instance for my primary database.

bsg18   --> Primary database
bsg18d --> Dataguard database
bsg18f  --> Far Sync database

This is what my 2 remote destinations look like in my configuration.


log_archive_dest_2 ='service="bsg18d" ASYNC NOAFFIRM max_failure=0 db_unique_name="bsg18d" group=1 priority=1 valid_for=(online_logfile,all_roles)';

log_archive_dest_3 ='Service="bsg18f" SYNC AFFIRM  max_failure=1 db_unique_name="bsg18f" group=1 priority=1 valid_for=(online_logfile,all_roles)';


You can see that both Dataguard (bsg18d) and Far Sync (bsg18f) are in "group=1 priority=1"

Multiple Destinations with the same Group and Priority - Does it balance them ?


First I'm going to look at where the logs are going.

   DEST_ID  SEQUENCE# S SHI APPLIED   DEL FAL
---------- ---------- - --- --------- --- ---
         2       1751 A YES NO        NO  NO
         2       1752 A YES NO        NO  NO
         2       1753 A YES NO        NO  NO


I can see that they are going to DEST_2, which is my dataguard instance.  I am going to shut it down, do a few log switches and see what happens.

   DEST_ID  SEQUENCE# S SHI APPLIED   DEL FAL
---------- ---------- - --- --------- --- ---
         2       1751 A YES YES       NO  NO
         1       1751 A YES NO        NO  NO
         1       1752 A YES NO        NO  NO
         2       1752 A YES YES       NO  NO
         1       1753 A YES NO        NO  NO
         2       1753 A YES YES       NO  NO
         1       1754 A YES NO        NO  NO
         2       1754 A YES NO        NO  NO
         1       1755 A YES NO        NO  NO
         1       1756 A YES NO        NO  NO
         1       1757 A YES NO        NO  NO
         1       1758 A YES NO        NO  NO
         1       1759 A YES NO        NO  NO
         1       1760 A YES NO        NO  NO


I've included DEST_1 this time to show that my sequence# has moved forward, but the sending of redo to my standby has stopped.

Now I started it back up, and did a few log switches.

  DEST_ID  SEQUENCE# S SHI APPLIED   DEL FAL
---------- ---------- - --- --------- --- ---
         2       1751 A YES YES       NO  NO
         2       1752 A YES YES       NO  NO
         2       1753 A YES YES       NO  NO
         2       1754 A YES YES       NO  NO
         2       1761 A YES NO        NO  NO
         3       1755 A YES YES       NO  YES
         3       1758 A YES NO        NO  YES
 .....
         3       1754 A YES NO        NO  YES
         3       1752 A YES NO        NO  YES
         3       1762 A YES NO        NO  NO
         3       1763 A YES NO        NO  NO


After restarting the database, I can see that it is now using DEST_3 (my Far Sync instance) instead of DEST_2 (Dataguard).

Now for a final test on this, I am going to STOP my Far Sync instance.

   DEST_ID  SEQUENCE# S SHI APPLIED   DEL FAL
---------- ---------- - --- --------- --- ---
         2       1751 A YES YES       NO  NO
         2       1752 A YES YES       NO  NO
         2       1753 A YES YES       NO  NO
         2       1754 A YES YES       NO  NO
         2       1761 A YES YES       NO  NO
         3       1755 A YES YES       NO  YES
         3       1758 A YES NO        NO  YES
 ....
         3       1754 A YES YES       NO  YES
         3       1752 A YES YES       NO  YES
         3       1762 A YES NO        NO  NO
         3       1763 A YES NO        NO  NO
         2       1764 A YES NO        NO  YES
         2       1765 A YES NO        NO  NO
         2       1766 A YES NO        NO  NO


I stopped bsg18f (Far Sync) when it was on sequence 1764.  I can see that the database automatically switched to sending logs to bsg18d (Dataguard) when DEST_3 became unavailable.

This is exactly what I would expect to happen when I have 2 destinations with the same Group/Priority.
  • It chooses one of the 2 destinations and becomes "sticky" once chosen.
  • If the "sticky" destination becomes unavailable, it then automatically switches to the second destination.

FRA - If you use an FRA (Fast Recovery Area), how does it understand that my redo can go to an alternate location?

Now, after all this, you can see from my previous tests that some logs were sent to DEST_2 and some were sent to DEST_3.  Both of these are members of the same group.
Let's see if the FRA sees them as shipped under my deletion policy.
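
The usage report below comes from v$recovery_area_usage, and what counts as reclaimable depends on the RMAN archivelog deletion policy. Mine is along these lines (a sketch, assuming the standard shipped-to-standby policy):

RMAN> configure archivelog deletion policy to shipped to all standby;

SQL> select * from v$recovery_area_usage;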

FILE_TYPE               PERCENT_SPACE_USED PERCENT_SPACE_RECLAIMABLE NUMBER_OF_FILES     CON_ID
----------------------- ------------------ ------------------------- --------------- ----------
CONTROL FILE                             0                         0               0          0
REDO LOG                                 0                         0               0          0
ARCHIVED LOG                         21.78                     21.78             555          0
BACKUP PIECE                          7.54                       .45              38          0
IMAGE COPY                               0                         0               0          0
FLASHBACK LOG                            0                         0               0          0
FOREIGN ARCHIVED LOG                     0                         0               0          0
AUXILIARY DATAFILE COPY                  0                         0               0          0



Perfect!  All the archive log space is reclaimable.  When I have different remote destinations that are members of the same group, as long as the redo was successfully shipped to one destination in the group, the log is eligible for deletion.