Monday, July 24, 2023

RMAN - Create weekly archival backup from weekly full backups

 This blog post demonstrates a process for dynamically creating KEEP archival backups from the backup pieces within a weekly full/daily incremental backup strategy.

Thanks to Battula Surya Shiva Prasad and Kameswara Rao Indrakanti for coming up with this process.


KEEP backups

First let's go through what a KEEP backup is and how it affects your backup strategy.

  1. A KEEP backup is a self-contained backupset.  The archive logs needed to de-fuzzy the database files are automatically included in the backupset.  
  2. The archive logs included in the backup are only the archive logs needed to de-fuzzy.
  3. The backup pieces in the KEEP backup (both datafile backups and included archive log pieces) are ignored in the normal incremental backup strategy, and in any log sweeps.
  4. When a recovery window is set in RMAN, KEEP backup pieces are ignored in any "Delete Obsolete" processing.
  5. KEEP backup pieces, once past the "until time", are removed using the "Delete expired" command.
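For reference, creating a standalone KEEP backup looks like the sketch below (the format path and tag are illustrative). RMAN automatically appends the controlfile, spfile, and the archive logs needed to de-fuzzy the datafiles:

run
{
   # KEEP backups cannot be written to the fast recovery area,
   # so an explicit format (or an SBT channel) is used here.
   backup database
     format '/u01/backups/keep_%U'
     tag 'ARCHIVAL_KEEP'
     keep until time 'sysdate+60';
}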

Normal process to create an archival KEEP backup

  • Perform a weekly full backup and a daily incremental backup that are deleted using an RMAN recovery window.
  • Perform archive log backups with the full/incremental backups along with log sweeps. These are also deleted using an RMAN recovery window.
  • One of these two processes is then used to create an archival KEEP backup.
    • A separate full KEEP backup is performed along with the normal weekly full backup
    • The weekly full backup (and archive logs based on tag) are copied to tape with "backup as backupset" and marked as "KEEP" backup pieces.

Issues with this process

  • The process of copying the full backup to tape using "backup as backupset" requires keeping 2 copies of the same backup for a period of time, since you don't want to wait until the end of retention to copy it to tape.
  • If the KEEP full backups are stored on disk along with the weekly full backups, you cannot use "backup as backupset"; you must perform a second, separate backup.

Proposal to create a weekly KEEP backup

Problems with simple solution

The basic idea is that you perform a weekly full backup, along with daily incremental backups, that are kept for 30 days. After the 30-day retention, just the full backups (along with the archive logs needed to de-fuzzy them) are kept for an additional 30 days.

The most obvious way to do this is to:

  • Set the RMAN retention to 30 days.
  • Create a weekly full backup that is a KEEP backup with an until time of 60 days in the future.
  • Create a daily incremental backup that is NOT a KEEP backup.
  • Create archive backups as normal.
  • Allow delete obsolete to remove the "non-KEEP" backups after 30 days.
Unfortunately, when you create an incremental backup and there are only KEEP backups preceding it, the incremental level 1 backup is forced into an incremental level 0 backup.  And with delete obsolete, if you look through MOS note "RMAN Archival (KEEP) backups and Retention Policy (Doc ID 986382.1)" you find that the incremental backups and archive logs are kept for 60 days, because there is no preceding non-KEEP backup.


Solution

The solution is to use tags, mark the weekly full as a KEEP backup after a week, and use the "delete backup completed before tag='xx'" command.

Weekly full backup scripts

run
{
   # Sweep any remaining archive logs into their own backupset.
   backup archivelog all filesperset=20 tag 'ARCHIVE_ONLY' delete input;

   # Turn last week's full backup (and the archive logs sharing its tag)
   # into a KEEP backup before taking this week's full.
   change backup tag='INC_LEVEL_0' keep until time 'sysdate+53';

   # This week's full backup is a normal (non-KEEP) backup, so the
   # daily level 1 backups that follow it stay level 1.
   backup incremental level 0 database tag='INC_LEVEL_0' filesperset=20 plus archivelog filesperset=20 tag='INC_LEVEL_0';

   # Age out backups by tag rather than relying on delete obsolete.
   delete backup completed before 'sysdate-61' tag='INC_LEVEL_0';
   delete backup completed before 'sysdate-31' tag='INC_LEVEL_1';
   delete backup completed before 'sysdate-31' tag='ARCHIVE_ONLY';
}

Daily Incremental backup scripts

run
{
  # Daily level 1 incremental; the current week's level 0 is still a
  # non-KEEP backup, so this remains a true level 1.
  backup incremental level 1 database tag='INC_LEVEL_1' filesperset=20 plus archivelog filesperset=20 tag='INC_LEVEL_1';
}

Archive log sweep backup scripts

run
{
  backup archivelog all tag='ARCHIVE_ONLY' delete input;
}


Example

I then took these scripts and built an example using a 7-day recovery window.  My full backup commands are below.
run
{
   backup archivelog all filesperset=20  tag ARCHIVE_ONLY delete input;
   change backup tag='INC_LEVEL_0'  keep until time 'sysdate+30';
   backup incremental level 0 database tag='INC_LEVEL_0' filesperset=20  plus archivelog filesperset=20 tag='INC_LEVEL_0';

  delete backup completed before 'sysdate-30' tag='INC_LEVEL_0';
  delete backup completed before 'sysdate-8' tag='INC_LEVEL_1';
  delete backup completed before 'sysdate-8' tag='ARCHIVE_ONLY';
}


First I am going to perform a weekly backup and incremental backups for 7 days to see how the settings affect the backup pieces in RMAN.

for Datafile #1.


 File# Checkpoint Time   Incr level Incr chg# Chkp chg# Incremental Typ Keep Keep until Keep options    Tag
------ ----------------- ---------- --------- --------- --------------- ---- ---------- --------------- ---------------
     3 06-01-23 00:00:06          0         0   3334337 FULL            NO                              INC_LEVEL_0
     3 06-02-23 00:00:03          1   3334337   3334513 INCR1           NO                              INC_LEVEL_1
     3 06-03-23 00:00:03          1   3334513   3334665 INCR1           NO                              INC_LEVEL_1
     3 06-04-23 00:00:03          1   3334665   3334805 INCR1           NO                              INC_LEVEL_1
     3 06-05-23 00:00:03          1   3334805   3334949 INCR1           NO                              INC_LEVEL_1
     3 06-06-23 00:00:03          1   3334949   3335094 INCR1           NO                              INC_LEVEL_1
     3 06-07-23 00:00:03          1   3335094   3335234 INCR1           NO                              INC_LEVEL_1
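
The listings in this post can be produced with a query along these lines against the controlfile backup views. This is a sketch: I join on set_stamp/set_count to pick up the KEEP attributes and tag, and derive the incremental type column from the incremental level.

select bd.file#, bd.checkpoint_time, bd.incremental_level,
       bd.incremental_change#, bd.checkpoint_change#,
       decode(bd.incremental_level, 0, 'FULL', 'INCR'||bd.incremental_level) incr_type,
       bs.keep, bs.keep_until, bs.keep_options, bp.tag
  from v$backup_datafile bd
  join v$backup_set   bs
    on bs.set_stamp = bd.set_stamp and bs.set_count = bd.set_count
  join v$backup_piece bp
    on bp.set_stamp = bd.set_stamp and bp.set_count = bd.set_count
 -- note: a multi-piece backupset will repeat rows; add distinct if needed
 where bd.file# = 3
 order by bd.checkpoint_time;

The archive log listings that follow come from v$backup_redolog joined to v$backup_set and v$backup_piece the same way.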

for archive logs

Sequence# First chg# Next chg# Create Time       Keep Keep until Keep options    Tag
--------- ---------- --------- ----------------- ---- ---------- --------------- ---------------
      625    3333260   3334274 15-JUN-23         NO                              ARCHIVE_ONLY
      626    3334274   3334321 01-JUN-23         NO                              INC_LEVEL_0
      627    3334321   3334375 01-JUN-23         NO                              INC_LEVEL_0
      628    3334375   3334440 01-JUN-23         NO                              ARCHIVE_ONLY
      629    3334440   3334490 01-JUN-23         NO                              INC_LEVEL_1
      630    3334490   3334545 02-JUN-23         NO                              INC_LEVEL_1
      631    3334545   3334584 02-JUN-23         NO                              ARCHIVE_ONLY
      632    3334584   3334633 02-JUN-23         NO                              INC_LEVEL_1
      633    3334633   3334695 03-JUN-23         NO                              INC_LEVEL_1
      634    3334695   3334733 03-JUN-23         NO                              ARCHIVE_ONLY
      635    3334733   3334782 03-JUN-23         NO                              INC_LEVEL_1
      636    3334782   3334839 04-JUN-23         NO                              INC_LEVEL_1
      637    3334839   3334876 04-JUN-23         NO                              ARCHIVE_ONLY
      638    3334876   3334926 04-JUN-23         NO                              INC_LEVEL_1
      639    3334926   3334984 05-JUN-23         NO                              INC_LEVEL_1
      640    3334984   3335023 05-JUN-23         NO                              ARCHIVE_ONLY
      641    3335023   3335072 05-JUN-23         NO                              INC_LEVEL_1
      642    3335072   3335124 06-JUN-23         NO                              INC_LEVEL_1
      643    3335124   3335162 06-JUN-23         NO                              ARCHIVE_ONLY
      644    3335162   3335211 06-JUN-23         NO                              INC_LEVEL_1
      645    3335211   3335273 07-JUN-23         NO                              INC_LEVEL_1
      646    3335273   3335311 07-JUN-23         NO                              ARCHIVE_ONLY


Next I'm going to execute the weekly full backup script, which changes the last backup to a KEEP backup, to see how the settings affect the backup pieces in RMAN.

for Datafile #1.
 File# Checkpoint Time   Incr level Incr chg# Chkp chg# Incremental Typ Keep Keep until Keep options    Tag
------ ----------------- ---------- --------- --------- --------------- ---- ---------- --------------- ---------------
     3 06-01-23 00:00:06          0         0   3334337 FULL            YES  08-JUL-23  BACKUP_LOGS     INC_LEVEL_0
     3 06-02-23 00:00:03          1   3334337   3334513 INCR1           NO                              INC_LEVEL_1
     3 06-03-23 00:00:03          1   3334513   3334665 INCR1           NO                              INC_LEVEL_1
     3 06-04-23 00:00:03          1   3334665   3334805 INCR1           NO                              INC_LEVEL_1
     3 06-05-23 00:00:03          1   3334805   3334949 INCR1           NO                              INC_LEVEL_1
     3 06-06-23 00:00:03          1   3334949   3335094 INCR1           NO                              INC_LEVEL_1
     3 06-07-23 00:00:03          1   3335094   3335234 INCR1           NO                              INC_LEVEL_1
     3 06-08-23 00:00:07          0         0   3335715 FULL            NO                              INC_LEVEL_0


for archive logs


Sequence# First chg# Next chg# Create Time       Keep Keep until Keep options    Tag
--------- ---------- --------- ----------------- ---- ---------- --------------- ---------------
      625    3333260   3334274 15-JUN-23         NO                              ARCHIVE_ONLY
      626    3334274   3334321 01-JUN-23         YES  08-JUL-23  BACKUP_LOGS     INC_LEVEL_0
      627    3334321   3334375 01-JUN-23         YES  08-JUL-23  BACKUP_LOGS     INC_LEVEL_0
      628    3334375   3334440 01-JUN-23         NO                              ARCHIVE_ONLY
      629    3334440   3334490 01-JUN-23         NO                              INC_LEVEL_1
      630    3334490   3334545 02-JUN-23         NO                              INC_LEVEL_1
      631    3334545   3334584 02-JUN-23         NO                              ARCHIVE_ONLY
      632    3334584   3334633 02-JUN-23         NO                              INC_LEVEL_1
      633    3334633   3334695 03-JUN-23         NO                              INC_LEVEL_1
      634    3334695   3334733 03-JUN-23         NO                              ARCHIVE_ONLY
      635    3334733   3334782 03-JUN-23         NO                              INC_LEVEL_1
      636    3334782   3334839 04-JUN-23         NO                              INC_LEVEL_1
      637    3334839   3334876 04-JUN-23         NO                              ARCHIVE_ONLY
      638    3334876   3334926 04-JUN-23         NO                              INC_LEVEL_1
      639    3334926   3334984 05-JUN-23         NO                              INC_LEVEL_1
      640    3334984   3335023 05-JUN-23         NO                              ARCHIVE_ONLY
      641    3335023   3335072 05-JUN-23         NO                              INC_LEVEL_1
      642    3335072   3335124 06-JUN-23         NO                              INC_LEVEL_1
      643    3335124   3335162 06-JUN-23         NO                              ARCHIVE_ONLY
      644    3335162   3335211 06-JUN-23         NO                              INC_LEVEL_1
      645    3335211   3335273 07-JUN-23         NO                              INC_LEVEL_1
      646    3335273   3335311 07-JUN-23         NO                              ARCHIVE_ONLY
      647    3335311   3335652 07-JUN-23         NO                              ARCHIVE_ONLY
      648    3335652   3335699 08-JUN-23         NO                              INC_LEVEL_0
      649    3335699   3335760 08-JUN-23         NO                              INC_LEVEL_0
      650    3335760   3335833 08-JUN-23         NO                              ARCHIVE_ONLY


Finally I'm going to execute the weekly full backup script again; it changes the last backup to a KEEP backup, and this time it will also delete the older backup pieces, so we can see how the settings affect the backup pieces in RMAN.

for Datafile #1.

File# Checkpoint Time   Incr level Incr chg# Chkp chg# Incremental Typ Keep Keep until Keep options    Tag
------ ----------------- ---------- --------- --------- --------------- ---- ---------- --------------- ---------------
     3 06-01-23 00:00:06          0         0   3334337 FULL            YES  15-JUL-23  BACKUP_LOGS     INC_LEVEL_0
     3 06-08-23 00:00:07          0         0   3335715 FULL            YES  15-JUL-23  BACKUP_LOGS     INC_LEVEL_0
     3 06-09-23 00:00:03          1   3335715   3336009 INCR1           NO                              INC_LEVEL_1
     3 06-10-23 00:00:03          1   3336009   3336183 INCR1           NO                              INC_LEVEL_1
     3 06-11-23 00:00:03          1   3336183   3336330 INCR1           NO                              INC_LEVEL_1
     3 06-12-23 00:00:03          1   3336330   3336470 INCR1           NO                              INC_LEVEL_1
     3 06-13-23 00:00:03          1   3336470   3336617 INCR1           NO                              INC_LEVEL_1
     3 06-14-23 00:00:04          1   3336617   3336757 INCR1           NO                              INC_LEVEL_1
     3 06-15-23 00:00:07          0         0   3336969 FULL            NO                              INC_LEVEL_0



for archive logs

Sequence# First chg# Next chg# Create Time       Keep Keep until Keep options    Tag
--------- ---------- --------- ----------------- ---- ---------- --------------- ---------------
      626    3334274   3334321 01-JUN-23         YES  15-JUL-23  BACKUP_LOGS     INC_LEVEL_0
      627    3334321   3334375 01-JUN-23         YES  15-JUL-23  BACKUP_LOGS     INC_LEVEL_0
      647    3335311   3335652 07-JUN-23         NO                              ARCHIVE_ONLY
      648    3335652   3335699 08-JUN-23         YES  15-JUL-23  BACKUP_LOGS     INC_LEVEL_0
      649    3335699   3335760 08-JUN-23         YES  15-JUL-23  BACKUP_LOGS     INC_LEVEL_0
      650    3335760   3335833 08-JUN-23         NO                              ARCHIVE_ONLY
      651    3335833   3335986 08-JUN-23         NO                              INC_LEVEL_1
      652    3335986   3336065 09-JUN-23         NO                              INC_LEVEL_1
      653    3336065   3336111 09-JUN-23         NO                              ARCHIVE_ONLY
      654    3336111   3336160 09-JUN-23         NO                              INC_LEVEL_1
      655    3336160   3336219 10-JUN-23         NO                              INC_LEVEL_1
      656    3336219   3336258 10-JUN-23         NO                              ARCHIVE_ONLY
      657    3336258   3336307 10-JUN-23         NO                              INC_LEVEL_1
      658    3336307   3336359 11-JUN-23         NO                              INC_LEVEL_1
      659    3336359   3336397 11-JUN-23         NO                              ARCHIVE_ONLY
      660    3336397   3336447 11-JUN-23         NO                              INC_LEVEL_1
      661    3336447   3336506 12-JUN-23         NO                              INC_LEVEL_1
      662    3336506   3336544 12-JUN-23         NO                              ARCHIVE_ONLY
      663    3336544   3336594 12-JUN-23         NO                              INC_LEVEL_1
      664    3336594   3336639 13-JUN-23         NO                              INC_LEVEL_1
      665    3336639   3336677 13-JUN-23         NO                              ARCHIVE_ONLY
      666    3336677   3336734 13-JUN-23         NO                              INC_LEVEL_1
      667    3336734   3336819 14-JUN-23         NO                              INC_LEVEL_1
      668    3336819   3336857 14-JUN-23         NO                              ARCHIVE_ONLY
      669    3336857   3336906 14-JUN-23         NO                              ARCHIVE_ONLY
      670    3336906   3336953 15-JUN-23         NO                              INC_LEVEL_0
      671    3336953   3337041 15-JUN-23         NO                              INC_LEVEL_0
      672    3337041   3337113 15-JUN-23         NO                              ARCHIVE_ONLY


Result

For my datafiles, I still have the weekly full backup, and it is a KEEP backup. For my archive logs, I still have the archive logs that were part of the full backup, which are needed to de-fuzzy my backup.


Restore Test


Now for the final test, using the next chg# of the June 1st archive logs: 3334375.


RMAN> restore database until scn=3334375;

Starting restore at 15-JUN-23
using target database control file instead of recovery catalog
allocated channel: ORA_DISK_1
channel ORA_DISK_1: SID=259 device type=DISK
...
channel ORA_DISK_1: piece handle=/u01/ocidb/backups/da1tiok6_1450_1_1 tag=INC_LEVEL_0
channel ORA_DISK_1: restored backup piece 1
...
channel ORA_DISK_1: reading from backup piece /u01/ocidb/backups/db1tiola_1451_1_1
channel ORA_DISK_1: piece handle=/u01/ocidb/backups/db1tiola_1451_1_1 tag=INC_LEVEL_0
channel ORA_DISK_1: restored backup piece 1

RMAN> recover database until scn=3334375;
channel ORA_DISK_1: starting archived log restore to default destination
channel ORA_DISK_1: restoring archived log
archived log thread=1 sequence=627
channel ORA_DISK_1: reading from backup piece /u01/ocidb/backups/dd1tiom8_1453_1_1
channel ORA_DISK_1: piece handle=/u01/ocidb/backups/dd1tiom8_1453_1_1 tag=INC_LEVEL_0
channel ORA_DISK_1: restored backup piece 1
channel ORA_DISK_1: restore complete, elapsed time: 00:00:01
archived log file name=/u01/app/oracle/product/19c/dbhome_1/dbs/arch1_627_1142178912.dbf thread=1 sequence=627
media recovery complete, elapsed time: 00:00:00
Finished recover at 15-JUN-23
RMAN> alter database open resetlogs;

Statement processed



Success!

Monday, June 5, 2023

Autotuned_reserved_space is a new feature on the ZDLRA that you should be using

 Autotuned_reserved_space is a new policy setting released with version 21.1, and you should be using it. When I talk to customers about how to manage databases on a ZDLRA, the biggest confusion comes in when I talk about reserved space.  Reserved space needs to be understood and properly managed. This new feature in 21.1 allows the ZDLRA to handle the reserved space for you, and I explain how to use it in this blog post.  First let's go through space usage, and reserved space in general.

space usage ZDLRA

Space usage on the ZDLRA. 


Recovery Window goal (which drives the space utilization)

The recovery window goal is set at the policy level. This value (in days) is the recovery window you want to keep for all databases that are members of this policy, and it drives the space utilization.

Total space

The ZDLRA comes with all of its space pre-allocated.  When you look at OEM, or at the SAR report, you will see the total space listed. You want to make sure that you have enough space for your existing database backups and any incoming new backups.

Used Space

When the ZDLRA purges backups beyond the Recovery Window Goal that you set, it does a bulk purge of backups.  This can be controlled by setting the maximum disk backup retention in days (which defaults to 1.5 times the recovery window goal).  Because of the bulk purge, more space is shown as used than is needed to support your recovery window goal.

Recovery Window Space

This is the amount of space that is needed to support the recovery window goal.  Because of the bulk purge, the recovery window space is less than the used space.


Reserved space

In order to control what happens with space, the concept of reserved space is used.  When a database is added to the ZDLRA, the reserved space value is set for this database.  This value should be updated regularly to ensure that there is enough space for the database backups to be stored.

The important things to know about reserved space are:
  • The sum of all the reserved space cannot be greater than the total space available on the ZDLRA.
  • When adding a new database, its reserved space must fit within the unreserved space.
  • When a new database is added, the reserved space must be set to at least the size of the database, and defaults to 2.5 times the size of the database.
  • The reserved space for a database needs to be at least the size of the largest datafile.
  • The reserved space should be larger than the amount of space needed to support the recovery window goal for the database.  For databases with fluctuating usage, you need to reserve space for the peak usage.
The reserved space serves two purposes when properly set:
  1. It can be used to determine how much space is available for new database backups.
  2. If the ZDLRA determines that it does not have enough space to support the recovery window goal of the supported databases, space is reclaimed from databases whose reserved space is too small.
It is critical to keep the reserved space updated, and many customers have used an automated process to set the reserved space to "recovery window space needed" + 10%.

Unfortunately, configuring an automated process for all databases does not take into account any fluctuations in usage.  Let's say I have a database which is much busier at month's end: I want to make sure that my reserved space is not adjusted down to the low value; I want it to stay adjusted based on the highest space usage value.
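As a sketch, that automated process might look like the PL/SQL below, run on the RA. I'm assuming the RA_DATABASE view's RECOVERY_WINDOW_SPACE column is reported in GB (verify the unit in your release), and using the reserved_space parameter of DBMS_RA.UPDATE_DB:

begin
  for db in (select db_unique_name, recovery_window_space
               from ra_database)
  loop
    dbms_ra.update_db(
      db_unique_name => db.db_unique_name,
      -- "recovery window space needed" + 10%; the value is assumed
      -- here to be a size string in GB
      reserved_space => ceil(db.recovery_window_space * 1.1) || 'G');
  end loop;
end;
/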

Autotuned_reserved_space 


This is where autotuned reserved space can help you manage the reserved space.  This setting is controlled at the policy level.

AUTOTUNED_RESERVED_SPACE

This value is set at the protection policy level, contains either "YES" or "NO", and defaults to "NO". "YES" allows the ZDLRA to automatically manage the reserved space for all databases in this policy whose disk_reserved_space is not set.

MAX_RESERVED_SPACE


This value is also set at the protection policy level.  It is optional for autotuned_reserved_space, but if set, it controls the maximum amount of reserved space that can be set for an individual database in the protection policy.

AUTOTUNE_SPACE_LIMIT


This value is set at the storage level and applies to ALL databases. It sets a reserved space usage limit beyond which autotuning slows down large reserved space increases: when the limit is reached, autotune restricts each database's reserved space growth to 10% per week.  This value is optional and defaults to the total space if not set.


SUMMARY:

  • autotuned_reserved_space - Enables autotuning of space within a protection policy
  • max_reserved_space - Controls the maximum reserved space of databases in a protection policy
  • autotune_space_limit - Slows the reserved space growth when a specified space limit is reached.
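
Setting these at the policy level would look something like the call below. Treat this as a sketch only: the parameter names are my assumption based on the setting names above, so check the DBMS_RA reference for your release:

begin
  dbms_ra.update_protection_policy(
    protection_policy_name  => 'GOLD',
    autotune_reserved_space => 'YES',   -- assumed parameter name
    max_reserved_space      => '10T');  -- assumed parameter name and format
end;
/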

What does autotune reserved space do ?

  • On a regular basis, if needed, the reserved space for each autotune-controlled database is adjusted to reserve space for the recovery window goal and incoming backups.
  • If the database has a disk_reserved_space set, autotuning will not be used for this database.  It is assumed that the disk_reserved_space will be set manually for this database.

Autotune will replace the need for the ZDLRA admin to constantly update the reserved space for each database as its space needs change over time. It will also allow them to configure a constant reserved space for databases with fluctuating storage usage.

Thursday, May 11, 2023

ZDLRA Validation is your best protection against Ransomware

 Validation of Oracle backups on ZDLRA is often one of the most overlooked features of the product. With the rise of ransomware, the question "how do I ensure that I have validated Oracle backups" is critical.



I know there are a lot of vendors out there that provide a great solution for most generic backups. But, as you probably know, Oracle Database backups are different from other system backups, and they present unique challenges, which include:

  • The backup of a large database consists of 100s, if not 1000s, of backup pieces, all of which are necessary to successfully restore the database.
  • Oracle Database backups won't contain "ransomware signatures" or any easy way of determining if the backup pieces are tainted.
  • Oracle Database backups are in a proprietary format that can only be validated by performing a "restore validate", which reads and validates the contents of Oracle database backup pieces.
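
That "restore validate" is ordinary RMAN, and you can run it yourself against any database. A minimal sketch, which reads every backup piece a restore would use and validates the blocks inside them:

run
{
  # reads the backup pieces a restore would need and validates every
  # block, including a logical corruption check
  restore database validate check logical;
}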

How ZDLRA provides superior validation

Backups land on flash during ingest


When backup pieces are sent to the ZDLRA during a backup, they land on flash storage and are quarantined within the ZDLRA waiting to be validated.

Backup pieces are validated


The ZDLRA will then examine arriving backup pieces.  The internal metadata is read and the contents of the backup pieces are validated block-by-block.  This ensures that before storing the backup pieces, they are confirmed to be Oracle Database backup pieces, containing valid Oracle Blocks.

 Backup pieces are stored and virtual full created


Once the backup piece is examined, and the metadata is read, the individual validated blocks are stored on disk compressed.  The blocks are indexed, and a virtual full backup is built.  The final step in the process is to update the RMAN catalog on the ZDLRA with an entry pointing to the virtual full.

Weekly validation for both block content, and restore continuity


On a weekly basis all backups on the ZDLRA undergo a "restore validate", which verifies that all the backup pieces are valid, usable backup pieces. This is critical with an "incremental forever" strategy to ensure that unchanged blocks are valid.  Along with checking the integrity of the backup pieces, the ZDLRA also checks for "Restore Continuity". I know this is a term I made up. The idea is that whatever time/SCN you choose within the recovery window, the ZDLRA ensures that ALL backup pieces needed to recover are available.  This is similar to performing a "restore preview" of all time periods to ensure that all backup pieces are available for recovery.
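
You can see the continuity idea on any database with a restore preview, which lists the backup pieces RMAN would need for a given point in time without restoring anything. A minimal sketch:

run
{
  # list the backup pieces needed to restore/recover to 3 days ago
  restore database until time 'sysdate-3' preview summary;
}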


Validation during replication

Replication of backup pieces from one ZDLRA to another takes this process one step further. Along with all the same validation that occurs when the ZDLRA receives backups from databases, the upstream ZDLRA also catalogs the replicated copy of the backup pieces.

ZDLRA in a Cyber vault





This is where all the pieces come together. The ZDLRA not only utilizes its validated, incremental forever strategy to keep replication traffic to a minimum, but it also ensures that backup pieces are validated PRIOR to cataloging them.

The ZDLRA has a number of advantages in a Cyber vault scenario
  • Replication traffic is much smaller than most solutions which require a Weekly Full backup. The ZDLRA uses incremental forever.
  • Backup pieces are quarantined after arrival in the vault to ensure tainted backups are not included in restore plans. This process is similar to what other vendors do to check for ransomware. The ZDLRA goes one step further by using its proprietary knowledge of Oracle blocks to ensure all backups, and the blocks within them, are valid.
  • Backups stored within the ZDLRA in the vault are validated on a weekly basis for both content, and continuity to ensure a restore will be successful.
  • The upstream sending the backup pieces catalogs what backups are in the vault, and can resend any backup pieces if necessary.

I hope this helps you understand better why the ZDLRA provides superior ransomware protection.



Wednesday, May 10, 2023

ZDLRA real-time redo demonstrated

 One of the key features of the ZDLRA is the ability to capture changes from the database "real-time" just like a standby database does. In this blog post I am going to demonstrate what is happening during this process so that you can get a better understanding of how it works.

ZDLRA Real-time Redo


Using the GIF above as a guide, I will explain what is happening and then show it with a demo of the process.

The ZDLRA uses the same process as a standby database.  In fact, if you look at the flow of the real-time redo, you will notice the redo blocks are sent to BOTH the local redo log files AND the staging area on the ZDLRA.  The staging area on the ZDLRA acts just like a standby redo log does on a standby database.

As the ZDLRA receives the redo blocks from the protected database, they are validated to ensure that they contain valid Oracle redo block information.  This ensures that a man-in-the-middle attack does not change any of the backup information.  The validation process also ensures that if the database is attacked by ransomware (changing blocks), the redo received is not tainted.


The next thing that happens during the process is the logic when a LOG SWITCH occurs.  As we all know, when a log switch occurs on a database instance, the contents of the redo log are written to an archive log.  With real-time redo, this causes the contents of the redo staging area on the ZDLRA (picture a standby redo log) to become a backup set of an archive log.  The RMAN catalog on the ZDLRA is then updated with the internal location of the backup set.


Log switch operation

I am going to go through a demo of what you see happen when this process occurs.

ZDLRA is configured as a redo destination

Below you can see that my database has "Log archive destination" 3 configured.  The destination itself is the database on the ZDLRA (zdl9); also notice that the log information will be sent for ALL_ROLES, which sends the log information regardless of whether it is a primary database or a standby database.
Archive Dest
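
For reference, the destination was configured with statements along these lines (the service string, DB_UNIQUE_NAME, and VPC user below are placeholders for my environment, so treat this as a sketch):

alter system set log_archive_dest_3=
  'SERVICE=zdl9-scan:1521/zdl9 ASYNC NOAFFIRM
   VALID_FOR=(ALL_LOGFILES,ALL_ROLES) DB_UNIQUE_NAME=zdl9'
  scope=both;
alter system set log_archive_dest_state_3=enable scope=both;
-- redo is shipped as the VPC user whose credentials are in the wallet
alter system set redo_transport_user=VPCUSER scope=both;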


List backup of recent archive logs from RMAN catalog


Before I demonstrate what happens with the RMAN catalog, I am going to list the current archive log backups. Below you see that the most recent archive log backed up to the ZDLRA has sequence #10.

archive log backups prior

Perform a log switch

As you see in the animation at the top of the post, when a log switch occurs, the contents of the redo log in the "redo staging area" are used to create an archive log backup that is stored and cataloged.  I am going to perform a log switch to force this process.

Log switch


List backup of archive logs from RMAN catalog

Now that the log switch occurred, you can see below that there is a new backup set created from the redo staging area.
There are a couple of interesting items to note when you look at the backup set created.

archive logs after


  1. The backup of the archive log is compressed.  As part of the policy on the ZDLRA, you have the option of having the backup of the archive log compressed when it is created from the "staged redo". This does NOT require the ACO (Advanced Compression) license. The compressed archive log will be sent back to the DB compressed during a restore operation, and the DB host will uncompress it.  Standard compression is the default option, and I recommend changing it; if you decide to compress, then MEDIUM or LOW is recommended. Keep in mind that this may put more workload on the client to uncompress the backup sets, which may affect recovery times.  NOTE: When using TDE, there will be little to no compression possible.
  2. The TAG is automatically generated. By looking at the timestamp in the RMAN catalog information, you can see that the TAG is automatically generated using the timestamp to make it unique.
  3. The handle begins with "$RSCN_"; this is because the backup piece was generated by the ZDLRA itself, and archive log backup sets it generates will begin with these characters.

Restore and Recovery using partial log information


Now I am going to demonstrate what happens when the database crashes, and there is no time for the database to perform a log switch.

List the active redo log and current SCN

Below you can see that my currently active redo log is sequence # 12.  This is where I am going to begin my test.

begin test


Create a table 

To demonstrate what happens when the database crashes, I am going to create a new table in which I store the current date and the current SCN. Using the current SCN, we will be able to determine the redo log that contains the table creation.

table create


Abort the database


As you probably know, if I shut down the database gracefully, the DB will automatically clean out the redo logs and archive their contents. Because I want to demonstrate what happens with a crash, I am going to shut the database down with an ABORT to ensure the log switch doesn't occur.  Then I start the database in mount mode so I can look at the current redo log information.

abort


Verify that the log switch did not occur


Next I am going to look at the REDO Log information and verify that my table creation (SCN 32908369) is still in the active redo log and did not get archived during the shutdown.

Log switch doesn't occur

Restore the database


Next I am going to restore the database from backup.


restore

Recover the database


This is where the magic occurs, so I am going to show what happens step by step.

Recover using archive logs on disk


The first step the database takes is to use the available archive logs to recover the database. You can see in the screenshot below that the database recovers using archive logs on disk up to sequence #11 for thread 1.  This contains all the changes for this thread except what is in redo log sequence #12.  Sequence #12 contains the create table we are interested in.

archives on disk

Recover using partial redo log


This step is where the magic of the ZDLRA occurs.  You can see from the screenshot below that the RMAN catalog on the ZDLRA returns the redo log information for sequence #12 even though it was never archived. The ZDLRA was able to create an archive log backup from the partial contents it had in the redo staging area.

rtr recovery

Open the database and display table contents.


This is where it all comes together.  Using the partial redo log information from Redo Log sequence #12, you can see that when the database is opened, the table creation transaction is indeed in the database even though the redo did not become an archive log.


Conclusion: I am hoping this post gives you a better idea of how real-time redo works on the ZDLRA, and how it handles recovering transactions after a database crash.

Thursday, March 23, 2023

Why DBCS (Oracle Base Database Service) in OCI can make a DBA's life much easier (even with BYOL)

DBCS (now named Oracle Base Database Service, but I will call it DBCS throughout this post) in OCI can help make a DBA's life easier.  When I was testing the new Autonomous Recovery Service for Oracle Database in OCI, I created a LOT of different DBCS systems to test backup and recovery.  Along the way I learned a lot about the workings of DBCS, and I came to appreciate how it makes sense, even if you are a BYOL (bring your own license) customer.




I'm more of an "old school" DBA, preferring the command line and scripting processes myself, and I am typically not a fan of automation.  When using DBCS I was surprised by all the things it would do for me that I would otherwise have to do manually.

Install oracle software and create a database

Having installed Oracle software hundreds of times, and having created test databases, I didn't think I would care much about automation that did this for me.

Central Software image management

What I found in OCI is that you can create your own software images that can be used to ensure each new database environment is consistent.  OCI gives you the ability to create your own set of release images (which can include patches).  This ensures that each time I create a new DBCS environment and choose my custom image, it's running the same version as all my other environments. No more installing the base release, then patches, and then any possible one-off patches.  This makes the installation of the database software much, much easier, and ensures consistency.


Easy Database creation

Recently I've gotten familiar with performing silent database creations, as dbca isn't always easy to configure.  The tooling provided by DBCS will not only create a database for you, but will also configure TDE encryption (with a local wallet, or using OCI Vault).  It can even create a RAC database across 2 nodes.  And don't forget, it can create the standby for me also.
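
For comparison, a minimal silent creation on a plain host looks something like the sketch below (names, passwords, and paths are placeholders), and it still leaves TDE, RAC, and standby configuration to you:

dbca -silent -createDatabase \
  -templateName General_Purpose.dbc \
  -gdbName mydb -sid mydb \
  -sysPassword "***" -systemPassword "***" \
  -datafileDestination /u02/oradata \
  -storageType FS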


Configure ASM storage

Now this is the most interesting piece I found when using DBCS.  Not only does the DBCS service create a disk group, but it automatically stripes multiple block volumes together, maximizing performance.  This is a HUGE help in ensuring I am getting the best performance.
When I was going through what the configuration did, I built tables showing how the different storage sizes translate to storage configurations.
There were two sets of configurations and DB data storage sizes: one for Flex shapes, and one for Standard shapes.

Flex


First I looked at Flex; regardless of the performance level, these were the sizes.


Then within Flex, I looked at the "Balanced performance" configuration.

Balanced Performance configuration





You can see that as the DB storage available goes up, the number of disks goes up also, allowing for higher possible IOPS than you would get from a single Block Storage device.

Below is the chart for "High Performance"

High Performance configuration



You can see that the IOPS is even higher, and it is using even more disks to get that performance.

Standard


Next I looked at Standard shapes, and regardless of the performance level these were the sizes. Note that with Standard shapes, there were many more options for configurations.


Balanced Performance configuration





High Performance configuration






Benefits of DBCS

I also went through what some of the other benefits of DBCS are, and below is the list I came up with.

  • When using the DBCS service, the storage cost is based on the Block Storage cost. This is the same cost as you would pay in an IaaS service.  Having the storage striped and configured for maximum IOPS makes this a huge plus.

  • DBCS allows you to purchase licenses if you don't have enough licenses to use the BYOL option.

  • The DBCS service price is based on OCPU and is the same regardless of the shape. Memory is included in the OCPU cost.

  • DBCS automatically configures RAC if you choose it.

  • DBCS provides tooling that automatically configures backups, can apply patches, and rotate encryption keys.

  • DBCS allows you to automate the cloning of your database, and automate any restores.

  • DBCS includes TDE, and relieves you of having to own the ASO license.  

Conclusion:

DBCS offers a lot more than you might realize. Take a deep dive into what it can do for you to save time as a DBA, and you might realize that sometimes tooling, along with automation, has its benefits.


Wednesday, March 1, 2023

Oracle Database recovery using Incremental merge, snapshots, OMF and "switch to copy"

I work with backup and recovery of the Oracle Database, and sometimes this means looking at the incremental merge backup strategy.  I know this isn't the best backup/recovery strategy, and below are a few posts giving you more detail on the topic.

They have some great points, and I typically don't recommend using incremental merge backups.  The incremental merge backup strategy is almost always paired with snapshots to increase the recovery window.

Below is an image of how these are typically paired with snapshots.



One of the biggest draws of using the incremental merge strategy with snapshots, is the ability to perform a "switch to copy" as a recovery strategy.

NOTE: When you perform "switch to copy" the database is now accessing datafiles using the backup copy.  This is not supported on Exadata for any storage other than Oracle ZFS.

If you review the MOS note "Using External Storage with Exadata (Doc ID 2663308.1)" you will find that "Use of non-Oracle storage for database files is not supported."

Given all of that, I got the question: "I am using the incremental merge strategy on an Oracle ZFS appliance using snapshots. If I perform a switch-to-copy recovery of one or more datafiles, how do I avoid forcing a new full backup on the next incremental merge backup?"

I thought this was a great question, and I created a test database, and started googlin'.  Below are some of the posts I looked at.

I started by using the first post and walked through a testing scenario using a DBCS database in OCI.
My database was a 19.8 database using local storage (to make it easier to see the datafiles), and it was using OMF by default.
The piece that was missing from the first post was the "alter database move datafile 'xxx' to 'xxx' KEEP;" command.

What I found is that it wasn't so easy with my database using OMF for 2 reasons.
  1. Using OMF, you don't specify the "to 'xxx'" clause, since OMF will automatically name the destination datafile.
  2. "KEEP" is ignored when the source file is OMF.  This means that the original image copy being used by the database is removed when the move process completes, so I couldn't catalog the image copy.
Since it took a bit of research to find the best strategy, I wanted to share the process I would recommend when dealing with OMF and non-OMF image copy backups with snapshots.

NON-OMF image copy backups

  1. Snap the backup storage just to preserve the starting point. --> optional but recommended
  2. Take the tablespaces offline
  3. Perform a "switch to copy" of the datafiles --> This will use the incremental merge backup.
  4. Recover the datafiles
  5. Bring the tablespaces online ---> Application is running using the external image copy
  6. Perform an "alter database move datafile 'xxx' to 'xxx' KEEP;" --> Using KEEP will preserve the original copy, but this only works if the image copy is NOT OMF. If the destination is an OMF file, you do not use the "to" clause.
  7. Catalog the image copy that was preserved with the "KEEP" ensuring you use the same tag used for the incremental merge. "catalog datafilecopy '+fra/ocm121/datafile/MONSTER.346.919594959' level 0 TAG 'incr_update';"
  8. The next incremental merge will pick up with the updated image copy.
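
Putting steps 2 through 7 together for a single datafile, the flow looks something like this sketch (the datafile number, paths, and tag are illustrative):

# RMAN: steps 2-5, switch to the image copy and recover it
run
{
  sql 'alter database datafile 4 offline';
  switch datafile 4 to copy;
  recover datafile 4;
  sql 'alter database datafile 4 online';
}

# SQL*Plus: step 6, move the file back to database storage,
# KEEPing the original image copy in place
alter database move datafile '/backup/DB/users01.dbf'
  to '/u02/oradata/DB/users01.dbf' keep;

# RMAN: step 7, re-catalog the preserved copy with the same tag
catalog datafilecopy '/backup/DB/users01.dbf' level 0 tag 'incr_update';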

OMF image copy backups

  1. Snap the backup storage to create a copy for the switch to copy.
  2. Unmount the "current" image copy 
  3. Mount the snap copy using the same mount point as the "current" image copy.
  4. Take the tablespaces offline
  5. Perform a "switch to copy" of the datafiles --> This will use the snap copy of the incremental merge backup on external storage.
  6. Recover the datafiles
  7. Bring the tablespaces online ---> Application is running using external copies.
  8. Perform an "alter database move datafile 'xxx';" --> Since the source is an OMF file, you cannot use "KEEP" to preserve the original copy. The original copy will be removed.
  9. Once all moves are complete, unmount the snapped copy.
  10. Mount the "current" copy. This is as of when you started this process.
  11. Catalog the image copy for all datafiles that performed the "switch to copy" ensuring you use the same tag used for the incremental merge "catalog datafilecopy '+fra/ocm121/datafile/MONSTER.346.919594959' level 0 TAG 'incr_update';"
  12. You can now destroy the snap that was created to perform the switch to copy.
  13. The next incremental merge will pick up with the current image datafile copies where it left off.
As you can see, using OMF greatly complicates preserving the incremental merge backup, and forces you to start at the last backup.

Wednesday, November 16, 2022

ZDLRA - Quick Start Guide

 This post is intended to be a Quick Start Guide for those who are new to the ZDLRA (RA for short).  I spend part of my time working with customers who are new to the RA, and often the same topics/questions come up.  I wanted to put together a "Quick Start" guide that they can use to learn more about these common topics.


ZDLRA Quick Start


The steps I would follow for anyone new to the RA are:


  1. Read through the section on configuring users and security settings for the RA. Decide which compliance settings make sense for the RA and come up with a plan to implement them.
  2. Identify the users, both OS users (if you are disabling direct root access) and users within the databases, that will manage and/or monitor the RA. OS users can be added with "racli add admin_user". Database users can be added with "racli add db_user".
  3. Create protection policies that contain the recovery window(s) that you want to set for the databases. You will also set compliance windows when creating policies. This can be done manually using the package DBMS_RA.CREATE_PROTECTION_POLICY.
  4. Identify the VPC user(s) needed to manage the database. Is it a single DBA team, or different teams requiring multiple VPC users? Create the VPC user using "racli add vpc_user"
  5. Add databases to be backed up to the RA, associate the database with both a protection policy and a VPC user who will be managing the database. NOTE that you should look at the Reserved Space, and adjust it as needed.  Databases can be added manually by using two PL/SQL calls. DBMS_RA.ADD_DB will add the database to the RA. DBMS_RA.GRANT_DB_ACCESS will allow the VPC user to manage the database.
  6. Configure the database to be backed up to the RA either by using OEM, or manually. The manual steps would be
    • Create a wallet on the DB client that contains the VPC credentials to connect to the RA.
    • Update the sqlnet.ora file to point to this wallet
    • Connect to the RMAN catalog on the RA from the DB client
    • Register the database to the RA
    • Configure the channel configuration to point to the RA
    • Configure Block change tracking (if it is not configured).
    • Configure the redo destination to point to the RA if you want to configure real-time redo.
    • Change the RMAN retention to be "applied to all standby" if using real-time redo, or "backed up 1 time" if not.
    • Update OEM to have the database point to the RMAN catalog on the ZDLRA.
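
As a sketch, the manual client-side steps look something like the following. The host names, service, VPC user, and paths are placeholders, and the channel syntax should be checked against the best-practice MOS note (Doc ID 2176686.1) mentioned later in this post:

# create an auto-login wallet holding the VPC user's credentials
mkstore -wrl /u01/app/oracle/wallet -createALO
mkstore -wrl /u01/app/oracle/wallet \
  -createCredential ra-scan:1521/zdlra:dedicated vpc_user "***"

# sqlnet.ora on the DB client
SQLNET.WALLET_OVERRIDE = true
WALLET_LOCATION = (SOURCE=(METHOD=FILE)
  (METHOD_DATA=(DIRECTORY=/u01/app/oracle/wallet)))

# connect to the RMAN catalog on the RA, register the database,
# and point the channel configuration at the RA
rman target / catalog /@ra-scan:1521/zdlra
RMAN> register database;
RMAN> configure channel device type sbt parms
  "SBT_LIBRARY=/u01/app/oracle/lib/libra.so,
   ENV=(RA_WALLET='location=file:/u01/app/oracle/wallet
   credential_alias=ra-scan:1521/zdlra:dedicated')";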

Documentation

The documentation can be found here. Within the documentation there are several sections that will help you manage the RA.

Get Started 

The get started section contains some subtopics to delve into

Install and configure the Recovery Appliance

The links in this section cover all the details about the installation and configuration of the RA.  I won't be talking about those sections in this post, but be aware this is where to look for general maintenance/patching/expansion information.

Learn about the Recovery Appliance.

This section covers an overview of the RA and is mostly marketing material. If you are not familiar with the RA, or want an overview, this is the place to turn.

Administer the Recovery Appliance


These sections are going to be a lot more helpful in getting you started. This section of the documentation covers:

Managing Protection Policies - Protection policies are the place to start when configuring an RA. Protection policies group databases together, and it is critical to make sure you have the correct protection policies in place before adding databases to be backed up.

Copying Backups to Tape - This section is useful if you plan on creating backups (either point in time or archival) that will be sent externally from the RA. This can be either to physical/virtual tape, or to an external media manager.

Archiving Backups to the Cloud - This section covers how to configure the RA to send backups to an OCI compatible object storage.  This can either be OCI, or it can be an on-premises ZFS that has a project configured as OCI object storage.

Accessing Recovery Appliance Reports - This section covers how to access all the reports available to you.  You will find these reports are priceless for managing the RA over time. Some examples of the areas these reports cover are:
  • Storage Capacity Planning reports with future usage projections
  • Recovery Window Summary reports to validate backups are available
  • Active incident reports to manage any alerts
  • API History Report to audit any changes to the RA
NOTE: If you are using the RA in a charge-back model for your internal business units, there is specific reporting that can be used for this. Talk to your Oracle team to find out more.

Monitoring the Recovery Appliance - This section covers how to monitor the RA and set up alerts. This will allow you to identify any issues that would affect the recovery of the backups, including space issues and missing backups.


Administer the Recovery Appliance

Configure Protected Databases - This section goes through how to configure databases to be backed up to the recovery appliance, and includes instructions both for using OEM and for adding databases using the command line.

Backup Protected Databases - This section covers how to back up a database from either OEM or the traditional RMAN command line. I would also recommend looking at the MOS note to ensure that you are using the current best practices for backups: "RMAN best practice recommendations for backing up to the Recovery Appliance (Doc ID 2176686.1)".

Recover Databases - This section covers how to recover databases from the RA. This section also covers information about cloning databases. Cloning copies of production is a common use case for the RA, and this section is very useful to help you with this process.


Books

This section contains the documentation you will look at regularly to manage the RA and answer questions that you may have on managing it.  I am only going to point out the sections that you will find most useful.


Deployment

The one important section under deployment is the Zero Data Loss Recovery Appliance Owners Guide.

Zero Data Loss Recovery Appliance Owners Guide - This guide contains information on configuring users on the RA, and the most critical sections to look at are:

  •  "Part III Security and Maintenance of Recovery Appliance".   If you are using the RA to manage immutable backups, it is important to go through this section to understand how users will be managed for maximum protection of your backups.
  • Part IV Command Reference - This section covers the CLI commands you will use to manage the RA.

Administration

This is probably the most important guide in the documentation. It covers many of the areas you will be managing as you configure databases to be backed up.  The most critical sections are:

Part I Managing Recovery Appliance - This section covers:
  • Implementing Immutable Backups
  • Securing the Recovery Appliance operations
  • Managing Protection Policies
  • Configuring replication and replication concepts
  • Additional High Availability strategies
Part III Recovery Appliance Reference - This section covers:
  • DBMS_RA packages to manage the RA through commands
  • Recovery Appliance View Reference to see what views are available

MOS Notes

There are a number of useful MOS notes that you will want to bookmark:

  • Zero Data Loss Recovery Appliance (ZDLRA) Information Center (Doc ID 2673011.2)
  • How to Backup and Recover the Zero Data Loss Recovery Appliance (Doc ID 2048074.1)
  • Zero Data Loss Recovery Appliance Supported Versions (Doc ID 1927416.1)
  • Zero Data Loss Recovery Appliance Software Updates Guide (Doc ID 2028931.1)
  • Cross Platform Database Migration using ZDLRA (Doc ID 2460552.1)
  • How to Move RMAN Catalog To A Different Database (Doc ID 351918.1)

Helpful Blogs

Fernando Simon

Fernando has a number of helpful blog entries. Be aware that he has been blogging on the RA for a long time and some of the management processes have changed; one example is that RACLI is now used to create VPC users. Some of the blogs to note are:

Bryan Grenn


I have a number of blog posts on features of the ZDLRA.