
Wednesday, May 10, 2023

ZDLRA real-time redo demonstrated

 One of the key features of the ZDLRA is the ability to capture changes from the database "real-time" just like a standby database does. In this blog post I am going to demonstrate what is happening during this process so that you can get a better understanding of how it works.

ZDLRA Real-time Redo


Using the GIF above as a guide, I will explain what is happening and then show a demo of the process.

The ZDLRA uses the same process as a standby database.  In fact, if you look at the flow of the real-time redo, you will notice the redo blocks are sent BOTH to the local redo log files AND to the staging area on the ZDLRA.  The staging area on the ZDLRA acts just like a standby redo log does on a standby database.

As the ZDLRA receives the redo blocks from the protected database, they are validated to ensure that they are valid Oracle redo blocks.  This ensures that a man-in-the-middle attack cannot change any of the backup information.  The validation process also ensures that if the database is attacked by ransomware (which changes blocks), the redo received is not tainted.


The next piece of the process is the logic when a LOG SWITCH occurs.  As we all know, when a log switch occurs on a database instance, the contents of the redo log are written to an archive log.  With real-time redo, this causes the contents of the redo staging area on the ZDLRA (picture a standby redo log) to become a backup set of an archive log.  The RMAN catalog on the ZDLRA is then updated with the internal location of the backup set.


Log switch operation

I am going to go through a demo of what you see happen when this process occurs.

ZDLRA is configured as a redo destination

Below you can see that my database has "Log archive destination" 3 configured.  The destination itself is the database on the ZDLRA (zdl9), and also notice that the log information will be sent for ALL_ROLES, which sends the log information regardless of whether it is a primary database or a standby database.
Archive Dest
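For reference, a destination like this is normally set with a couple of ALTER SYSTEM commands. The sketch below is a minimal example rather than my exact settings: the service string is the connect string stored in the wallet when the database was enrolled, and zdl9 is the ZDLRA database name from the screenshot.

-- Minimal sketch of a ZDLRA redo destination (angle brackets are placeholders)
alter system set log_archive_dest_3=
  'SERVICE=<zdl9 connect string from the wallet> ASYNC
   VALID_FOR=(ALL_LOGFILES,ALL_ROLES) DB_UNIQUE_NAME=zdl9'
  scope=both sid='*';

alter system set log_archive_dest_state_3=enable scope=both sid='*';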


List backup of recent archive logs from RMAN catalog


Before I demonstrate what happens with the RMAN catalog, I am going to list the current archive log backups. Below you can see that the most recent archive log backed up to the ZDLRA has "SEQUENCE #10".

archive log backups prior
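If you prefer querying the RMAN catalog directly rather than using LIST BACKUP, a query along these lines (run against the catalog on the ZDLRA) shows the archived log sequences that already have backup sets. This is only a sketch; the database name is a placeholder.

-- Sketch: archived log backups recorded in the RMAN catalog for one database
select thread#, sequence#, first_change#, next_change#, next_time
from   rc_backup_redolog
where  db_name = 'MYDB'          -- placeholder database name
order  by thread#, sequence#;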

Perform a log switch

As you see in the animation at the top of the post, when a log switch occurs, the contents of the redo log in the "redo staging area" are used to create an archive log backup that is stored and cataloged.  I am going to perform a log switch to force this process.

Log switch
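The screenshot above was produced with a simple command from SQL*Plus; either statement in this sketch will drive the log switch (ARCHIVE LOG CURRENT also waits for the archiving to complete).

-- Force a log switch so the staged redo on the ZDLRA becomes an archive log backup
alter system switch logfile;
-- or, to switch and wait for the archival to finish:
alter system archive log current;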


List backup of archive logs from RMAN catalog

Now that the log switch occurred, you can see below that there is a new backup set created from the redo staging area.
There are a couple of interesting items to note when you look at the backup set created.

archive logs after


  1. The backup of the archive log is compressed.  As part of the policy on the ZDLRA you have the option to have the backup of the archive log compressed when it is created from the "staged redo". This does NOT require the ACO (Advanced Compression) license. The compressed archive log will be sent back to the DB compressed during a restore operation, and the DB host will uncompress it.  Standard (BASIC) compression is the default, and I recommend changing it; if you decide to compress, MEDIUM or LOW is recommended. Keep in mind that this may put more workload on the client to uncompress the backup sets, which may affect recovery times.  NOTE: When using TDE, there will be little to no compression possible.
  2. The TAG is automatically generated. By looking at the timestamp in the RMAN catalog information, you can see that the TAG is automatically generated using the timestamp to make it unique.
  3. The handle begins with "$RSCN_". This is because the backup piece was generated by the ZDLRA itself; archive log backup sets created from the staged redo will begin with these characters. All three items can be seen with a quick catalog query (sketch below).
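Here is roughly what that query could look like against the RMAN catalog on the ZDLRA; the predicate on the handle simply picks out the ZDLRA-generated pieces.

-- Sketch: ZDLRA-generated archive log backup pieces, newest first
select handle, tag, compressed, bytes, completion_time
from   rc_backup_piece
where  handle like '$RSCN_%'
order  by completion_time desc;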

Restore and Recovery using partial log information


Now I am going to demonstrate what happens when the database crashes, and there is no time for the database to perform a log switch.

List the active redo log and current SCN

Below you can see that my currently active redo log is sequence # 12.  This is where I am going to begin my test.

begin test


Create a table 

To demonstrate what happens when the database crashes I am going to create a new table. In the table I am going to store the current date, and the current SCN. Using the current SCN we will be able to determine the redo log that contains the table creation.

table create
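The table in the screenshot is just a throwaway demo object; something like the sketch below (the table name rtr_test is made up for this demo) captures the date and the current SCN so the transaction can be tied back to a specific redo log.

-- Hypothetical demo table: record the time and the SCN of the insert
create table rtr_test (test_time date, test_scn number);

insert into rtr_test
  select sysdate, current_scn from v$database;

commit;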


Abort the database


As you probably know, if I shut down the database gracefully, the DB will automatically clean out the redo logs and archive their contents. Because I want to demonstrate what happens with a crash, I am going to shut the database down with an ABORT to ensure the log switch doesn't occur.  Then I will start the database in MOUNT mode so I can look at the current redo log information.

abort
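The commands behind the screenshot are roughly the following, run from SQL*Plus as SYSDBA; ABORT skips the log switch and archiving that a normal shutdown would perform.

-- Crash the instance (no log switch, no clean archiving), then mount it again
shutdown abort
startup mount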


Verify that the log switch did not occur


Next I am going to look at the REDO Log information and verify that my table creation (SCN 32908369) is still in the active redo log and did not get archived during the shutdown.

Log switch doesn't occur
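A quick way to confirm this is to look at v$log while the database is mounted; a query like the sketch below shows that sequence #12 is still CURRENT, has not been archived, and has a FIRST_CHANGE# below the SCN recorded in the table.

-- Check the online redo logs: sequence #12 should still be CURRENT (never archived)
select thread#, sequence#, status, archived, first_change#
from   v$log
order  by thread#, sequence#;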

Restore the database


Next I am going to restore the database from backup.


restore

Recover the database


This is where the magic occurs, so I am going to show what happens step by step.

Recover using archive logs on disk


The first step is for the database to use the archive logs on disk to recover.  You can see in the screenshot below that the database is recovered using archive logs on disk up to sequence #11 for thread 1.  This covers all the changes for this thread up to that point, but does not include what is in redo log sequence #12, which contains the create table we are interested in.

archives on disk

Recover using partial redo log


This step is where the magic of the ZDLRA occurs.  You can see from the screenshot below that the RMAN catalog on the ZDLRA returns the redo log information for sequence #12 even though it was never archived. The ZDLRA was able to create an archive log backup from the partial contents it had in the redo staging area.

rtr recovery

Open the database and display table contents.


This is where it all comes together.  Using the partial redo log information from redo log sequence #12, you can see that when the database is opened, the table creation transaction is indeed in the database, even though the redo never became an archive log on the protected database.
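Opening the database and querying the demo table from earlier (rtr_test is the made-up table name used above) is all it takes to confirm the transaction survived.

alter database open;

-- The row inserted just before the ABORT should be here, recovered from the partial redo
select * from rtr_test;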


Conclusion: I am hoping this post gives you a better idea of how real-time redo works on the ZDLRA, and how it handles recovering transactions after a database crash.

Friday, April 23, 2021

Enrolling my ExaCC RAC database using REST APIs

This post will continue the process of automating the enrollment of my RAC database using the OKV REST API and some automation scripts. The steps to create the scripts are in my previous post.



NOTE: These steps are specific to ExaCC.  If you want to learn about configuring OKV with Autonomous Database (ADB) when using ExaCC, the product manager, Peter Wahl, has a great blog post on this topic.  He also has videos as part of the "Ask Tom" series if you want to learn more about OKV 21c, or just OKV in general.

The first step is to download the zip file I created in the previous post. I downloaded it onto the first DB host in my RAC cluster.  I unzipped it into /home/oracle/okv.

Below is what I am starting with.

.
 |-lib
 | |-okvrestcli.jar
 |-bin
 |-conf
 | |-okvrestcli_logging.properties
 | |-okvrestcli.ini
 | |-ewallet.p12.lck
 | |-ewallet.p12
 | |-cwallet.sso.lck
 | |-cwallet.sso
 | |-okvclient.ora
 |-setenv.sh
 |-run-me.sh

STEP #1 - Set the environment

First I am going to set my environment to the database instance I want to configure (jckey1), and then I am going to source the environment for my OKV install.


[oracle@exacc1]$ cd /home/oracle/okv
[oracle@exacc1]$ . oraenv
ORACLE_SID = [jckey1] ? jckey1
The Oracle base remains unchanged with value /u02/app/oracle
[oracle@exacc1]$ . ./setenv.sh
 
 
create environment variables OKV_RESTCLI_HOME and OKC_RESTCLI_CONFIG  
 
$OKV_RESTCLI_HOME    :  /home/oracle/okv 
$OKV_RESTCLI_CONFIG  :  /home/oracle/okv/conf/okvrestcli.ini 
 
Adding $OKV_RESTCLI_BIN to the $PATH  


STEP #2 - Execute the enrollment creation script

The next step is to execute the run-me.sh that I created in the previous post. This will create the enrollment script. At the end of the output you will see the script it creates (okv-ep.sh).

NOTE: It will default to my DBNAME for the wallet name.

[oracle@exacc1]$ ./run-me.sh
executing script with $OKV_RESTCLI_HOME=/home/oracle/okv
DB Name is identified as jckey and ORACLE_SID is set to jckey1
Press enter to keep this default [jckey], or enter the DB Name
DB Name [enter for Default] :
Using DB Name : jckey

#!/bin/bash
mkdir -pv /u02/app/oracle/admin/jckey/wallet
mkdir -pv /u02/app/oracle/admin/jckey/wallet/okv
okv manage-access wallet create --wallet JCKEY --description "wallet for database JCKEY" --unique FALSE
okv admin endpoint create --endpoint JCKEY1_on_exacc1 --description "exacc11, 10.136.106.36" --type ORACLE_DB --platform LINUX64 --unique FALSE
okv manage-access wallet set-default --wallet JCKEY --endpoint JCKEY1_on_exacc1
expect << _EOF
set timeout 120
spawn okv admin endpoint provision --endpoint JCKEY1_on_exacc1 --location /u02/app/oracle/admin/jckey/wallet/okv --auto-login FALSE
expect "Enter Oracle Key Vault endpoint password: "
send "change-on-install\r"
expect eof
_EOF

STEP #3 - Execute the enrollment script

[oracle@exacc1]$ ./okv-ep.sh
{
  "result" : "Success"
}
{
  "result" : "Success"
}
{
  "result" : "Success"
}
spawn okv admin endpoint provision --endpoint JCKEY1_on_exacc1 --location /u02/app/oracle/admin/jckey/wallet/okv --auto-login FALSE
Enter Oracle Key Vault endpoint password: 
{
  "result" : "Success",
  "value" : {
    "javaHome" : "/u02/app/oracle/product/19.0.0.0/dbhome_8/jdk"
  }
}


STEP #4 - Verify what the enrollment script did

 

I am first going to look under $ORACLE_BASE/admin/$DBNAME/wallet where it placed the okv client.
[oracle@exacc1]$ pwd
/u02/app/oracle/admin/jckey/wallet
[oracle@exacc1]$ find . | sed -e "s/[^-][^\/]*\// |/g" -e "s/|\([^ ]\)/|-\1/"
.
  |-okv
 | |-bin
 | | |-okveps.x64
 | | |-okvutil
 | | |-root.sh
 | |-ssl
 | | |-ewallet.p12
 | |-csdk
 | | |-lib
 | | | |-liborasdk.so
 | |-jlib
 | | |-okvutil.jar
 | |-conf
 | | |-okvclient.ora
 | | |-logging.properties
 | | |-okvclient.lck
 | |-lib
 | | |-liborapkcs.so
 | |-log
 | | |-okvutil.deploy.log



Now I am going to verify in OKV that the wallet got created for my database.

And I am going to look at the endpoint, and verify the default wallet is set.


STEP #5 - Execute root.sh (only if this is the first install on this host).


I execute the root.sh script in the /bin directory as root.

[root@exacc1]# ./root.sh
Creating directory: /opt/oracle/extapi/64/hsm/oracle/1.0.0/
Copying PKCS library to /opt/oracle/extapi/64/hsm/oracle/1.0.0/
Setting PKCS library file permissions
Installation successful.


STEP #6 - Verify we can contact the OKV server


The next step is to execute the okvutil list command to verify we can contact the OKV host, and that the default wallet is configured.

[oracle@exacc1]$ ./okvutil list
Enter Oracle Key Vault endpoint password: 
Unique ID                               Type            Identifier
9E8BD892-D799-44B7-8289-94447E7ACC54    Template    Default template for JCKEY1_ON_ECC5C2N1

STEP #7 - Change the OKV endpoint password

[oracle@exacc1]$ /u02/app/oracle/admin/jckey/wallet/okv/bin/okvutil changepwd -t wallet -l /u02/app/oracle/admin/jckey/wallet/okv/ssl/
Enter wallet password: change-on-install
Enter new wallet password: {my new password}
Confirm new wallet password:  {my new password}
Wallet password changed successfully

STEP #8 - Install the client and change the password on all nodes.


I followed the steps above on the other 3 nodes to install the client and change the password.

STEP #9 - Upload the keys from the wallet file.

I uploaded the keys from the shared wallet files on ACFS.
[oracle@exacc1]$ /u02/app/oracle/admin/jckey/wallet/okv/bin/okvutil upload -t wallet -l /var/opt/oracle/dbaas_acfs/jckey/wallet_root/tde -v 2 -g JCKEY
okvutil version 21.1.0.0.0
Endpoint type: Oracle Database
Configuration file: /u02/app/oracle/admin/jckey/wallet/okv/conf/okvclient.ora
Server: 10.136.102.243:5696 
Standby Servers: 
Uploading from /acfs01/dbaas_acfs/jckey/wallet_root/tde
Enter source wallet password: 
Enter Oracle Key Vault endpoint password: 
ORACLE.SECURITY.DB.ENCRYPTION.Ab8Sv6Ezs08fv9Sy7/zZB8oAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
ORACLE.SECURITY.KM.ENCRYPTION.Ab8Sv6Ezs08fv9Sy7/zZB8oAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
ORACLE.SECURITY.KB.ENCRYPTION.
ORACLE.SECURITY.ID.ENCRYPTION.
ORACLE.SECURITY.KM.ENCRYPTION.ATQdCFHhVk9Yv7er6uZtDf8AAAAAAAAAAAAAAAAAAAAAAAAAAAAA
ORACLE.SECURITY.DB.ENCRYPTION.ATQdCFHhVk9Yv7er6uZtDf8AAAAAAAAAAAAAAAAAAAAAAAAAAAAA
ORACLE.SECURITY.DB.ENCRYPTION.MASTERKEY.BFF45EC14E46013BE053246A880A5564
ORACLE.SECURITY.DB.ENCRYPTION.MASTERKEY

Uploaded 2 TDE keys
Uploaded 0 SEPS entries
Uploaded 0 other secrets
Uploaded 4 opaque objects

Uploading private persona
Uploading certificate request
Uploading trust points

Uploaded 1 private keys
Uploaded 1 certificate requests
Uploaded 0 user certificates
Uploaded 0 trust points

Upload succeeded

STEP #10 - Copy the current wallet and add OKV credentials.

Now you copy the current wallet files (from the ACFS location) to the tde directory next to the new OKV client install.
In my case, since my OKV client is installed in $ORACLE_BASE/admin/jckey/wallet (which will be the WALLET_ROOT), the tde directory underneath it will be the file location for wallets.
I am also adding my password credentials to the local wallet.

NOTE: "OKV_PASSWORD" is used to open the wallet. "HSM_PASSWORD" is used to access the OKV server(s).


mkdir /u02/app/oracle/admin/jckey/wallet/tde_seps
mkdir /u02/app/oracle/admin/jckey/wallet/tde
cp /var/opt/oracle/dbaas_acfs/jckey/wallet_root/tde/* /u02/app/oracle/admin/jckey/wallet/tde/.
ADMINISTER KEY MANAGEMENT ADD SECRET 'Welcome1+' FOR CLIENT 'OKV_PASSWORD' TO LOCAL AUTO_LOGIN KEYSTORE '/u02/app/oracle/admin/jckey/wallet/tde_seps';
ADMINISTER KEY MANAGEMENT ADD SECRET 'Welcome1+' FOR CLIENT 'HSM_PASSWORD' TO AUTO_LOGIN KEYSTORE '/u02/app/oracle/admin/jckey/wallet/tde';


STEP #11 - Change the WALLET_ROOT

Since WALLET_ROOT can only be changed with a restart, I am going to shut down all instances in the cluster and perform the next few steps on the first node only.

SQL> alter system set WALLET_ROOT='/u02/app/oracle/admin/jckey/wallet' scope=spfile;

System altered.

SQL> shutdown immediate
startup mount;
ORA-01109: database not open


Database dismounted.
ORACLE instance shut down.
SQL> 
alter system set tde_configuration='KEYSTORE_CONFIGURATION=OKV|FILE' scope=both;

select b.name pdb_name,wrl_type,
wrl_parameter,
status,wallet_type,
keystore_mode,
fully_backed_up
from v$encryption_wallet a,v$containers b
where a.con_id = b.con_id(+);

PDB Name   Type       WRL_PARAMETER                              Status               WALLET_TYPE     KEYSTORE Backed Up
---------- ---------- ------------------------------------------ -------------------- --------------- -------- ----------
CDB$ROOT   FILE       /u02/app/oracle/admin/jckey/wallet/tde/    OPEN                 AUTOLOGIN       NONE     YES
CDB$ROOT   OKV                                                   OPEN_NO_MASTER_KEY   OKV             NONE     UNDEFINED
PDB$SEED   FILE                                                  OPEN                 AUTOLOGIN       UNITED   YES
PDB$SEED   OKV                                                   OPEN_NO_MASTER_KEY   OKV             UNITED   UNDEFINED
JCKPDB     FILE                                                  OPEN                 AUTOLOGIN       UNITED   YES
JCKPDB     OKV                                                   OPEN_NO_MASTER_KEY   OKV             UNITED   UNDEFINED

SQL> shutdown immediate
startup ;



STEP #12 - Combine the local wallet file and OKV

Next I need to migrate the keys from the local wallet to OKV. Note this will rekey the database.

ADMINISTER KEY MANAGEMENT SET ENCRYPTION KEY IDENTIFIED BY "-okv key-" MIGRATE USING "-local wallet key-" WITH BACKUP;

STEP #13 - Restart the instance and make sure the wallet is open.


PDB Name   Type       WRL_PARAMETER                              Status              WALLET_TYPE     KEYSTORE Backed Up
---------- ---------- -------------------------------            ------------------- --------------- --------- ----------
CDB$ROOT   FILE       /u02/app/oracle/admin/jckey/wallet/tde/    OPEN                AUTOLOGIN       NONE     YES
CDB$ROOT   OKV                                                   OPEN                OKV             NONE     UNDEFINED
PDB$SEED   FILE                                                  OPEN                AUTOLOGIN       UNITED   YES
PDB$SEED   OKV                                                   OPEN                OKV             UNITED   UNDEFINED
JCKPDB     FILE                                                  OPEN                AUTOLOGIN       UNITED   YES
JCKPDB     OKV                                                   OPEN                OKV             UNITED   UNDEFINED


STEP #14 - Rebuild the local wallet with the password

I deleted the original wallet files from the "tde" and "tde_seps" directories and recreated them using the exact same steps from step #10. The only addition is that I needed to create the wallet first.


ADMINISTER KEY MANAGEMENT ADD SECRET 'Welcome1+' FOR CLIENT 'OKV_PASSWORD' TO LOCAL AUTO_LOGIN KEYSTORE '/u02/app/oracle/admin/jckey/wallet/tde_seps';
ADMINISTER KEY MANAGEMENT ADD SECRET 'Welcome1+' FOR CLIENT 'HSM_PASSWORD' TO AUTO_LOGIN KEYSTORE '/u02/app/oracle/admin/jckey/wallet/tde';

I then executed the same commands to create the wallets in the same location on all the nodes in the cluster.

STEP #15 - Bounce the database.

I bounced the database and made sure the wallet was open on all 4 nodes. Done.
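The output below came from a query along the lines of this sketch, which is simply the earlier v$encryption_wallet query switched to the gv$ views so every instance shows up.

select a.inst_id,
       b.name pdb_name,
       wrl_type,
       wrl_parameter,
       status,
       wallet_type,
       keystore_mode,
       fully_backed_up
from   gv$encryption_wallet a, gv$containers b
where  a.con_id  = b.con_id(+)
and    a.inst_id = b.inst_id(+)
order  by wrl_type, b.name, a.inst_id;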



INST_ID    PDB Name Type  WRL_PARAMETER                           Status               WALLET_TYPE   KEYSTORE Backed Up
-------- ---------- ----- ----------------------------------------  ------------------ -------------- -------- ---------
1        CDB$ROOT   OKV                                             OPEN               OKV            NONE     UNDEFINED
2        CDB$ROOT   OKV                                             OPEN               OKV            NONE     UNDEFINED
3        CDB$ROOT   OKV                                             OPEN               OKV            NONE     UNDEFINED
4        CDB$ROOT   OKV                                             OPEN               OKV            NONE     UNDEFINED
1        PDB$SEED   OKV                                             OPEN               OKV            UNITED   UNDEFINED
2        PDB$SEED   OKV                                             OPEN               OKV            UNITED   UNDEFINED
3        PDB$SEED   OKV                                             OPEN               OKV            UNITED   UNDEFINED
4        PDB$SEED   OKV                                             OPEN               OKV            UNITED   UNDEFINED
1        JCKPDB     OKV                                             OPEN               OKV            UNITED   UNDEFINED
2        JCKPDB     OKV                                             OPEN               OKV            UNITED   UNDEFINED
3        JCKPDB     OKV                                             OPEN               OKV            UNITED   UNDEFINED
4        JCKPDB     OKV                                             OPEN               OKV            UNITED   UNDEFINED
1        PDB$SEED   FILE                                            OPEN_NO_MASTER_KEY AUTOLOGIN      UNITED   UNDEFINED
1        CDB$ROOT   FILE  /u02/app/oracle/admin/jckey/wallet/tde/   OPEN_NO_MASTER_KEY AUTOLOGIN      NONE     UNDEFINED
2        CDB$ROOT   FILE  /u02/app/oracle/admin/jckey/wallet/tde/   OPEN_NO_MASTER_KEY AUTOLOGIN      NONE     UNDEFINED
3        CDB$ROOT   FILE  /u02/app/oracle/admin/jckey/wallet/tde/   OPEN_NO_MASTER_KEY AUTOLOGIN      NONE     UNDEFINED
4        CDB$ROOT   FILE  /u02/app/oracle/admin/jckey/wallet/tde/   OPEN_NO_MASTER_KEY AUTOLOGIN      NONE     UNDEFINED
1        PDB$SEED   FILE                                            OPEN_NO_MASTER_KEY AUTOLOGIN      UNITED   UNDEFINED
2        PDB$SEED   FILE                                            OPEN_NO_MASTER_KEY AUTOLOGIN      UNITED   UNDEFINED
3        PDB$SEED   FILE                                            OPEN_NO_MASTER_KEY AUTOLOGIN      UNITED   UNDEFINED
4        PDB$SEED   FILE                                            OPEN_NO_MASTER_KEY AUTOLOGIN      UNITED   UNDEFINED
1        JCKPDB     FILE                                            OPEN_NO_MASTER_KEY AUTOLOGIN      UNITED   UNDEFINED
2        JCKPDB     FILE                                            OPEN_NO_MASTER_KEY AUTOLOGIN      UNITED   UNDEFINED
3        JCKPDB     FILE                                            OPEN_NO_MASTER_KEY AUTOLOGIN      UNITED   UNDEFINED
4        JCKPDB     FILE                                            OPEN_NO_MASTER_KEY AUTOLOGIN      UNITED   UNDEFINED


That's all there is to it. I now have my ExaCC database configured to use OKV as the key store, with autologin into the wallet on all instances!

Monday, March 29, 2021

ZDLRA adds smart incremental to be even smarter.

Recently version 19.1.1.2 of the ZDLRA software was released, and one of the features is something called "Smart Incremental".  I will walk through how this feature works, and help you understand why features like this are "ZDLRA only".




I am going to start by walking through how incremental backups become "virtual full backups", and that will give you a better picture of how "smart incremental" is possible.

The most important thing to understand about these features is that the RMAN catalog itself is within the ZDLRA, AND the ZDLRA has the ability to update the RMAN catalog.

How does a normal backup strategy work ? 

That is probably the best place to start.  What DBAs typically do is perform a WFDI (Weekly Full, Daily Incremental) backup.  To keep my example simple, I will use the following assumptions:
  • My database contains 3 datafiles: SYSTEM, SYSAUX, and USERS, but I will only use the example of backing up the USERS datafile.
  • Each of these 3 datafiles is 50 GB in size.
  • I am only performing differential backups, which create a backup containing the changes since the last backup (full OR incremental).
  • My database is in archivelog mode  *
* NOTE: With ZDLRA you can back up a nologging database, and still take advantage of virtual fulls. The database needs to be in a MOUNTED state when performing the incremental backup.

If placed in a table, the backups for datafile USERS would look like this. Checkpoint SCN is the current SCN of the database at the start of the backup.



If I were to look at what is contained in the RMAN catalog (RC_BACKUP_DATAFILE), I would see the same backup information, but with the SCN information in 2 columns:
  • Incremental Change # is the oldest SCN contained in the backup set. This is the starting SCN of the previous backup that this backup is based on.
  • Checkpoint Change # is the starting SCN of the backup.  Everything at or newer than this SCN needs to be defuzzied. You can see both columns with a query like the sketch below.
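Here is what that might look like against the recovery catalog; the file number is just a placeholder for the USERS datafile.

-- Sketch: incremental and checkpoint SCNs for each backup of one datafile
select file#, incremental_level, incremental_change#, checkpoint_change#, completion_time
from   rc_backup_datafile
where  file# = 5                 -- placeholder file# for the USERS datafile
order  by completion_time;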


Normal backup progression (differential)


When performing an incremental RMAN backup of a datafile, the first thing RMAN does is decide which blocks need to be backed up. Because you are performing an incremental backup, you may be backing up all of the blocks, only some of the blocks, or even none of the blocks if the file has not changed.
This is a decision RMAN makes by querying the RMAN catalog entries (or the controlfile entries if you are not using an RMAN catalog).

Now let's walk through this decision process.  Each RMAN incremental differential's starting SCN is based on the beginning SCN of the previous backup (except for the full).



By looking at the RMAN catalog (or controlfile), RMAN determines  which blocks need to be contained in each incremental backup.



Normal backup progression (cumulative differential)


Up to release 19.1.1.2, the recommendation was to perform a cumulative differential backup. A cumulative differential backup uses the starting SCN of the last full backup (rather than the last incremental backup) as the starting point of the incremental backup.
The advantage of cumulative over differential is that a single cumulative backup can be applied to the last full, taking the place of applying multiple differential backups.  However, cumulative backups get bigger with every day that passes between full backups, because they contain all blocks changed since the last full.

Below is what a cumulative schedule would look like and you can compare this to the differential above.
You can see that each cumulative backup starts with the Checkpoint SCN of the last full to ensure that all blocks changed since the full backup started are captured.



The RMAN catalog entries would look like this.




If you were astute, you would notice a few things happened with the cumulative differential vs the differential.
  • The backup size got bigger every day
  • The time it took to perform the incremental backup got longer
  • The range of SCNs contained in the incremental is larger for a cumulative backup.

ZDLRA backup progression (cumulative differential)

As you most likely know, one of the most important features of the ZDLRA is the ability to create a "virtual full" from an incremental backup.

If we look at what happens with a cumulative differential (from above), I will fill in the virtual full RMAN catalog entries by shading them light green.

The process of performing backups on the ZDLRA is exactly the same as it is for the above cumulative, but the RMAN catalog looks like this.


What you will notice by looking at this, compared to the normal cumulative process, is that:
  • For every cumulative incremental backup there is a matching virtual full backup. The virtual full backup appears (from the newly inserted catalog entry) to have been taken at the same time, and with the same starting SCN, as the cumulative incremental. Virtual full backups and incremental backups match on time and SCN as catalog entries.
  • The size of the virtual full is 0, since it is virtual and does not take up any space.
  • The completion time for each cumulative incremental backup is the same as for the differential backups.  Because the RMAN logic can see the virtual full entry in the catalog, it executes the cumulative incremental EXACTLY as if it were the first differential incremental following a full backup.
Smart incremental backups

Now all of this led us to smart incremental backups. Sometimes the cumulative backup process doesn't work quite right.  A few of the reasons this can happen are:

  • You perform a full backup to a backup location other than the ZDLRA. This could be because you are backing up to the ZDLRA for the first time, replacing a current backup strategy, or maybe you created a special backup to disk to seed a test environment (using a KEEP backup for this will alleviate the issue).  The cumulative incremental backup will compare against the last full regardless of where it was taken (there are exceptions if you always use tags to compare).
  • You implement TDE or rekey the database.  Implementing TDE (Transparent Data Encryption) changes the blocks, but does not change the SCN numbers of the blocks, so a new full backup is required.
Previously, you would have to perform a special full backup to correct these issues. In the example below you can see what happens (without smart incremental) to the RMAN catalog if you perform a backup to disk on Thursday at 12:00 to refresh a development environment.



Since the cumulative backups are based on the last full backup, the Thursday through Saturday backups contain all the blocks that have changed since the disk backup started on Thursday at 12:00.
And, since each is cumulative, each day's backup is larger and takes longer.

This is when you would typically have to force a new level 0 backup of the datafile.


What the smart incremental does

Since the RMAN catalog is controlled by the ZDLRA it can correct the problem for you. You no longer need to perform cumulative backups as the ZDLRA can fill in any issues that occur.

In the case of the full backup to disk, it can "hide" that entry and continue to correctly perform differential backups. It "hides" the disk backup that occurred, and informs the RMAN client that the last full backup as of Thursday night is NOT the disk backup, but the previous virtual full backup.


In the case of TDE, it can "hide" all of the level 0 virtual full backups and the L1 differential backups (which will force a new level 0).





All of this is done without updating the DB client version. All the magic is done within the RMAN catalog on the ZDLRA.

Now isn't that smart ?



Thursday, September 24, 2020

ZDLRA - How to do a storage checkup

 One of the items that comes up with the ZDLRA is a storage checkup.  The DBAs want to know more detail about the storage utilization of each database.



Once you understand the above concepts, you realize that there are 2 major pieces that affect the storage utilization for a database.

1) How much space a level 0 backup takes.  Since the ZDLRA virtualizes full backups, each database has at least 1 copy of each block on the ZDLRA.  It could be only 1 copy if the block never changes, or it could be 30 copies of the same block if it changes every day (like system tablespace data). What you are interested in is the size of 1 full backup.

2) The amount of storage 1 day of changes takes up (on average).  This is the stored size of an incremental backup (if you perform an incremental every day), plus the stored size of the archive logs for a day of workload.

By combining these 2 pieces you can then calculate how much storage is needed for X number of days of backups.

Now how do I do this? Below is the query I use, and I will explain the columns in it.

select db_unique_name,
               trunc(size_estimate,0) estimated_db_size,
               recovery_window_goal,
               trunc(space_usage,0) space_usage,
               trunc(estimate_zero_day_space - ((estimate_seven_day_space - estimate_one_day_space)/6),0) level_0_size,
               trunc((estimate_seven_day_space - estimate_one_day_space)/6,1) one_day_space,
               trunc(recovery_window_space,0) recovery_window_space,
               disk_reserved_space,
estimate_rwg_space
from
(Select db_unique_name,
       Space_usage,
       extract(day from recovery_window_goal) recovery_window_goal,
       dbms_ra.estimate_space (DB_UNIQUE_NAME,numtodsinterval(1,'day')) estimate_zero_day_space,
       dbms_ra.estimate_space (DB_UNIQUE_NAME,numtodsinterval(1,'day')) estimate_one_day_space,
       dbms_ra.estimate_space (DB_UNIQUE_NAME,numtodsinterval(7,'day')) estimate_seven_day_space,
       dbms_ra.estimate_space (DB_UNIQUE_NAME,recovery_window_goal) estimate_rwg_space,
        RECOVERY_WINDOW_SPACE,
       disk_reserved_space,
       size_estimate
               from ra_database);
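If all you want is the estimate for an arbitrary number of days, dbms_ra.estimate_space can also be called with that interval directly. This is a minimal sketch (run as RASYS on the ZDLRA, with 'MYDB' as a placeholder) rather than part of the checkup query above.

-- Sketch: rough space (GB) needed to keep 14 days of backups for one database
select db_unique_name,
       trunc(dbms_ra.estimate_space(db_unique_name, numtodsinterval(14,'day')),0) est_14_day_space_gb
from   ra_database
where  db_unique_name = 'MYDB';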


What's returned

DB_UNIQUE_NAME        - DB name
RECOVERY_WINDOW_GOAL  - How long backups are kept
SPACE_USAGE           - How much space (GB) the DB is using in total
LEVEL_0_SIZE          - Estimated size (GB) of just the full backup
ONE_DAY_SPACE         - Estimated space usage (GB) for a single day of backups
RECOVERY_WINDOW_SPACE - How much space is needed for the recovery window (14 days in my case)
DISK_RESERVED_SPACE   - How much space is set aside for backups
ESTIMATE_DB_SIZE      - How big the database (GB) is estimated to be
ESTIMATE_RWG_SPACE    - The space (GB) needed for the recovery window, taken from RA_DATABASE; it may not match a calculation using the other columns


Now let's take a look at what I can do with this..

This is an example, where I summarize the space utilization for a couple of  ZDLRAs.



And here I looked at the detail for the databases.


This report (just above this) gives me some really useful information.

  • I can see DB02 has a really big change rate. The database is only about 2.5 TB, but it is using 14 TB of storage.
  • I can see the disk_reserved_space is way too small for DB01.  DB01 needs about 15 TB to keep its recovery window goal, but the disk_reserved_space is only 500 GB.
  • DB03 looks good.  The change rate is not significant, and disk_reserved_space is set a little higher than the RECOVERY_WINDOW_SPACE.


Now finally, I was able to use the one_day_space and graph out the space utilization for each day's recovery window.
This graph shows each day of RWG and its storage needs, the current USED space, and the USABLE space.  I can see that even though my used space is close to the usable space, there is still room for growth. I can also use this to see what happens to my storage utilization if I change the RWG to 10 days.

I highly recommend periodically doing a health check on your storage utilization, and reviewing the disk_reserved_space.
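If the checkup shows a reserved-space problem like the DB01 example above, the fix is a dbms_ra call on the ZDLRA. The sketch below is from memory; treat the parameter names as an assumption and check the DBMS_RA documentation for your release before using it.

-- Sketch (assumed signature): raise the reserved space for DB01 to cover its recovery window
-- Run as RASYS on the ZDLRA; verify parameter names against the admin guide.
exec dbms_ra.update_db(db_unique_name => 'DB01', reserved_space => '15T');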

I hope this gives some information you can use to take a closer look.

**NOTE** this query is accurate as of 9/30/20.  It might need to be adjusted with future releases.

Also, it is only as accurate as the data. The longer a database has been backing up with a consistent workload, the better the estimate.

 

Wednesday, September 23, 2020

ZDLRA, Real-time redo and compression

 In this post I will go through what happens to archive logs sent to the ZDLRA through real-time redo.


The most common way to send archivelog backups to a ZDLRA is through real-time redo.

In this method the ZDLRA is treated just like a standby database destination.

The main difference with sending logs to the ZDLRA is that the logs need to be sent (REDO_TRANSPORT_USER) as the VPC (virtual private catalog) account that is registered to send backups.

This is done by using a wallet containing the VPC user ID and password, which is also included in the channel configuration parameter.

There is a great explanation of most of this from my colleague Fernando Simon and you can find it here.

ZDLRA, Real-Time Redo and Zero RPO


What I wanted to go through is the process of sending the logs (real-time), and the process of storing the logs on the ZDLRA.

The first thing to understand is the steps in the process of turning real-time redo into RMAN backupsets.


Step 1 - The redo is captured real-time by the ZDLRA through the use of "shadow logs". Think of "shadow logs" as standby redo logs that are created for each database, and for each redo log that is being captured.  Just like standby redo logs, these are full-size logs. To give you an example, let's say there are 6 databases sending real-time redo to the ZDLRA, and 3 of these are 2-node RAC clusters.  Each database has a redo log size of 20 GB.

On the ZDLRA, these shadow logs are mirrored (to disk) and use storage that is included in the USAGE number for the database.  In my example there will be 9 shadow logs.
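The arithmetic behind that number is simple enough to sanity check; the little query below just spells it out for this example.

-- 3 single-instance DBs (1 redo thread each) + 3 two-node RAC DBs (2 threads each) = 9 shadow logs
-- at 20 GB apiece:
select (3*1 + 3*2)      as shadow_logs,
       (3*1 + 3*2) * 20 as shadow_log_space_gb
from   dual;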



Step 2 - When a log switch occurs, a task called BACKUP_ARCH is created. This task is responsible for taking the "shadow log" and turning it into an RMAN backupset containing the log.

The RMAN backupset can be compressed (and it uses BASIC by default, please change it) based on the policy that the Database is a member of.

One of the advantages of the ZDLRA is that the compression license is NOT needed to use other degrees of compression.

The suggestion I would make is.

TDE databases - Put ALL TDE databases in their own policy and set compression to NONE.  TDE archive logs will not compress, and trying to compress them will just cause overhead.

NON-TDE databases - Use LOW compression. This will give you the best combination of compression ratio and elapsed time.


Now let's take a look at the tasks to see what I am talking about.


Below is a snippet from the currently running tasks (taken from a SAR report).

TASK_TYPE                 PRIORITY  STATE            CURRENT_COUNT  LAST_EXECUTE_TIME     WORK_TYPE    MIN_CREATION
----------------------  ----------  ---------------  -------------  --------------------  -----------  ------------
BACKUP_ARCH                    120  RUNNING                      7  03-OCT-2019 14:49:08  Work         03-OCT-2019

I can see that there are currently 7 redo logs that have switched and are awaiting processing to become backupsets. This number should always be very small.

Below is a snippet from the tasks executed in the last 24 hours (also from a SAR report).

TASK_TYPE               STATE                   CNT     CREATED  MIN_COMPLETION_TIME     MAX_COMPLETION_TIME     OLD_CREATION_TIME
----------------------  ---------------  ----------  ----------  ----------------------  ----------------------  ----------------------
BACKUP_ARCH             COMPLETED             9,591       9,580  02-OCT-2019 18:50:35    03-OCT-2019 14:50:28    02-OCT-2019 18:49:49


This is telling me that there were 9,591 log switches on all my protected databases in the last 24 hours.

From a compression standpoint, PLEASE at least change the current compression setting in your policies, and use these recommendations:

TDE - No compression
No TDE - LOW compression.

I pointed out in my last post why this is so important to get right.




Sunday, April 1, 2018

ZDLRA "Store and Forward " feature

Most people didn't notice, but there was a new feature added to the ZDLRA called "store and forward".

Documentation on how to implement it is in the "ZDLRA Administration Guide" under the topic of implementing high availability strategies. 

Within that you will find the section on
"Managing Temporary Outages with a Backup and Redo Failover Strategy". This section describes what I have called “store and forward”.

ZDLRA offers the customer the ability to send backups (Redo logs and Level 1 backups) to an alternate ZDLRA location.  This provides an efficient HA solution for this information if the primary ZDLRA can't be reached.


Now in order to explain how store and forward works, first let's take a look at the architecture.

  • We have a database we are backing up called "PROTDB"
  • We have 2 different ZDLRAs. Store and forward requires a minimum of 2 ZDLRA appliances in a datacenter. In this case, some of the databases have one of the ZDLRAs as their backup target and the remaining databases have the other ZDLRA as their backup target.
  • For databases backing up to ZDLRA #1 "RA01" will be the preferred ZDLRA that their Level 1 backups and the redo log stream will go to.  ZDLRA #2 "RA02" will be the alternate ZDLRA that Level 1 backups and the redo log stream will go to in the event of an outage communicating with preferred ZDLRA "RA01".
  • The reverse will be true for databases backing up to ZDLRA #2 with the alternate being ZDLRA #1

NOTE: A database has to be unique within a ZDLRA. What this means is that the alternate ZDLRA cannot already be used for replication or to back up a Data Guard copy of the same database.



Now that we have defined the architecture let's go through the pieces that make up the store-and-forward methodology.

First, however, I will define what I mean by "Upstream" and "Downstream".

UPSTREAM - This is the ZDLRA that sends replicated backup copies.  

DOWNSTREAM - This is the ZDLRA that receives the replicated backup copies.

A ZDLRA can act as both an UPSTREAM and a DOWNSTREAM. This is common when a customer has 2 active datacenters.  Each ZDLRA acts as both an Upstream (receiving backups directly) and as a Downstream (receiving replicated backups).

In the store-and-forward methodology, backups are sent to the Downstream as the primary, and to the Upstream as the alternate.  This allows backups to replicate from the Alternate (Upstream) to the Primary (Downstream).  This will be explained as you walk through the flow.


Configuring Store-and-Forward




1) Configure "RA01" to be the down stream replicated pair of RA02. 
2) Ensure that the protected database ("PROTDB") is added to policies on both RAs (this process is described in the 12.2 admin guide)
3) Ensure "PROTDB"  has a wallet entries for both RAs, and that it the database is properly registered in both RMAN catalogs (using the admin guide).
3) Configure real-time redo apply using "RA01" as the primary RA and "RA02" as the alternate.

NOTE: Real-time redo isn't mandatory to use, but it makes the switching over of redo a lot easier. I will show how the environment looks with real-time redo.  If you are manually sending archive logs and level 0 backups, the flow will be similar.


Real-time Redo flow


First let's take a look at the configuration for real-time redo.

Below is the configuration for a database with both a primary and an alternate ZDLRA. Working with an alternate destination is well described in this blog post.



Primary ZDLRA (RA01) configuration


LOG_ARCHIVE_DEST_3='SERVICE=<"RA01" string from wallet>', VALID_FOR=(ALL_LOGFILES, ALL_ROLES) ASYNC DB_UNIQUE_NAME='<"RA01" ZDLRA DB>' noreopen alternate=log_archive_dest_4;

log_archive_dest_state_3=enable;


Alternate ZDLRA (RA02) configuration


LOG_ARCHIVE_DEST_4='SERVICE=<"RA02" string from wallet>', VALID_FOR=(ALL_LOGFILES, ALL_ROLES) ASYNC DB_UNIQUE_NAME='<"RA02" ZDLRA DB>';
LOG_ARCHIVE_DEST_STATE_4=alternate;


Below is what the flow looks like.

Redo log traffic and backups are sent from "PROTDB" to "RA01".  "RA02" (since it is the upstream pair of "RA01") is aware of the backups in its RMAN catalog.





Now let's take a look at the status of the destinations


SQL> select dest_id, dest_name, status from 
v$archive_dest_status where status <> 'INACTIVE';
DEST_ID DEST_NAME STATUS
---------- --------------------- ---------
 1 LOG_ARCHIVE_DEST_3 VALID
 2 LOG_ARCHIVE_DEST_4 UNKNOWN

You can see that the redo logs are sent to DEST_3 ("RA01") and DEST_4 ("RA02") is not active.



Now lets see what happens when "RA01" can't be reached.




SQL> select dest_id, dest_name, status from 
v$archive_dest_status where status <> 'INACTIVE';
DEST_ID DEST_NAME STATUS
---------- --------------------- ---------
 1 LOG_ARCHIVE_DEST_3 DISABLED
 2 LOG_ARCHIVE_DEST_4 VALID

After the second failed attempt, the original destination is marked as disabled, and the alternate is valid.

Below you can see that the redo logs, and the backups (Level 1) are being sent to "RA02".

"PROTDB" connects to the catalog on "RA02" which is aware of the previous backups and synchronizes its backup information with the control file.

This allows the next Level 1 incremental backup to be aware of the most current virtual full backup on "RA01".

This also allows the redo log stream to continue where it left off with "RA01".  The RMAN catalog on "RA02" is aware of all redo log backups on "RA01" and is able to continue with the next log.



Now lets see what happens when "RA01" becomes available.


When "RA01" becomes available, you start the replication flow downstream. This will allow all the backups (redo and/or Level 1) to replicate to "RA01", be applied to the RA, and update the RMAN catalog.

Once this is complete, "RA01" will have virtualized any backups, along with storing and cataloging all redo logs captured.



BUT, at this point the primary log destination is still disabled, so we need to re-enable it to move the redo log flow back.



SQL> alter system set log_archive_dest_state_3=enable;
System altered.
SQL> alter system set log_archive_dest_state_4=alternate;
System altered.

Once this is complete, we are back to where we started.




That's it.

Store-and-forward is a great HA solution for capturing real-time redo log information to absorb any hiccups that may occur.