Wednesday, May 10, 2023

ZDLRA real-time redo demonstrated

 One of the key features of the ZDLRA is the ability to capture changes from the database "real-time" just like a standby database does. In this blog post I am going to demonstrate what is happening during this process so that you can get a better understanding of how it works.

ZDLRA Real-time Redo


If you look at the GIF above, I will explain what is happening, and show what happens with a demo of the process.

The ZDLRA uses the same process as a standby database.  In fact if you look at the flow of the real-time redo you will notice the redo blocks are sent to BOTH the local redo log files, AND to the staging area on the ZDLRA.  The staging area on the ZDLRA acts just like a standby redo does on a standby database.

As the ZDLRA receives the REDO blocks from the protected database they are validated to ensure that they are valid Oracle Redo block information.  This ensures that a man-in-the-middle attack does not change any of the backup information.  The validation process also assures that if the database is attacked by ransomware (changing blocks), the redo received is not tainted.


The next thing that happens during the process is the logic when a LOG SWITCH occurs.  As we all know, when a log switch occurs on a database instance, the contents of the redo log are written to an archive log.  With real-time redo, this causes the contents of the redo staging area on the ZDLRA (picture a standby redo log) to become a backup set of an archive log.  The RMAN catalog on the ZDLRA is then updated with the internal location of the backup set.


Log switch operation

I am going to go through a demo of what you see happen when this process occurs.

ZDLRA is configured as a redo destination

Below you can see that my database has a "Log archive destination" 3 configured.  The destination itself is the database on the ZDLRA (zdl9), and also notice that the log information will be sent for ALL_ROLES, which will send the log information regardless if it is a primary database or a standby database.
Archive Dest


List backup of recent archive logs from RMAN catalog


Before I demonstrate what happens with the RMAN catalog, I am going to list out the current archive log backup. Below you see that the current archive log backed up to the ZDLRA has the "SEQUENCE #10".

archive log backups prior

Perform a log switch

As you see in the animation at the top of the post, when a log switch occurs, the contents of the redo log in the "redo staging area" are used to create an archive log backup that is stored and cataloged.  I am going to perform a log switch to force this process.

Log switch


List backup of archive logs from RMAN catalog

Now that the log switch occurred, you can see below that there is a new backup set created from the redo staging area.
There are a couple of interesting items to note when you look at the backup set created.

archive logs after


  1. The backup of the archive log is compressed.  As part of the policy on the ZDLRA you have the option to have the backup of the archive log compressed when it is created from the "staged redo". This does NOT require the ACO (Advanced Compression) license. The compressed archive log will be sent back to the DB compressed during a restore operation, and the DB host will uncompress it.  This is the default option (standard compression) and I recommend changing it.  If you decide to compress, then MEDIUM or Low is recommended. Keep this in mind that he this may put more workload on the client to uncompress  the backup sets which may affect recovery times.  NOTE: When using TDE, there will be little to no compression possible.
  2. The TAG is automatically generated. By looking at the timestamp in the RMAN catalog information, you can see that the TAG is automatically generated using the timestamp to make it unique.
  3. The handle begins with "$RSCN_", this is because the backup piece was generated by the ZDLRA itself, and archivelog backup sets will begin with these characters.

Restore and Recovery using partial log information


Now I am going to demonstrate what happens when the database crashes, and there is no time for the database to perform a log switch.

List the active redo log and current SCN

Below you can see that my currently active redo log is sequence # 12.  This is where I am going to begin my test.

begin test


Create a table 

To demonstrate what happens when the database crashes I am going to create a new table. In the table I am going to store the current date, and the current SCN. Using the current SCN we will be able to determine the redo log that contains the table creation.

table create


Abort the database


As you probably know, if I shut down the database gracefully, the DB will automatically clean out the redo logs and archive it's contents. Because I want to demonstrate what happens with crash I am going to shut the database down with an ABORT to ensure the log switch doesn't occur.  Then start the database mount so I can look at the current redo log information

abort


Verify that the log switch did not occur


Next I am going to look at the REDO Log information and verify that my table creation (SCN 32908369) is still in the active redo log and did not get archived during the shutdown.

Log switch doesn't occur

Restore the database


Next I am going to restore the database from backup.


restore

Recover the database


This is where the magic occurs so I am going to show that happens step by step.

Recover using archive logs on disk


The first step the database does is to use the current archive logs to recover the database. You can see in the screenshot below that the database recovers the database using archive logs on disk up to sequence #11 for thread 1.  This contains all the changes for this thread, but does not include what is in the REDO log sequence #12.  Sequence #12 contains the create table we are interested in.

archives on disk

Recover using partial redo log


This step is where the magic of the ZDLRA occurs.  You can see from the screen shot below that the RMAN catalog on the ZDLRA returns the redo log information for Sequence #12 even though it was never archived. The ZDLRA was able to create an archive log backup from the partial contents it had in the Redo Staging area.

rtr recovery

Open the database and display table contents.


This is where it all comes together.  Using the partial redo log information from Redo Log sequence #12, you can see that when the database is opened, the table creation transaction is indeed in the database even though the redo did not become an archive log.
'


Conclusion : I am hoping this post gives you a better idea of how Real-time redo works on the ZDLRA, and how it handles recovering transactions after a database crash

Thursday, March 23, 2023

Why DBCS (Oracle Base Database Service) in OCI can make a DBA's life much easier (even with BYOL)

DBCS (now named Oracle Base Database service, but I will call it DBCS throughout this post) in OCI  can help make a DBA's life easier.  When I was testing the new Autonomous Recovery Service for Oracle Database in OCI, I created a LOT of different DBCS systems to test backup and recovery.  Along the way I learned a lot about the workings of DBCS, and I came to appreciate how it makes sense, even if you are a BYOL (bring your own license) customer.




I'm more of a an "old school" DBA, preferring command line, and scripting processes myself.  I am typically not a fan of automation.  When using DBCS I was surprised by all the things it would do for me that I would have to do manually.

Install oracle software and create a database

Having installed oracle software hundreds of times, and having created test databases, I didn't think I would care much about automation that did this for me.

Central Software image management

What I found in OCI, is that you can create your own software images that can be used to ensure each new database environment is consistent.  OCI gives you ability to create your own set of release images (which can include patches).  This ensures each time I create a new DBCS environment, and choose my custom image, it's running the same version in all environments. No more installing base release, then patches, and then then any possible one-off patches.  This makes the installation of the database software much, much easier, and ensures consistency.


Easy Database creation

Recently I've gotten familiar with performing a silent database creation, as using dbca isn't always easy to configure.  The tooling provided by DBCS will not only create a database for you, but will also configure TDE encryption (with a local wallet, or using OCI vault).  It can even create a RAC database across 2 nodes.  And don't forget, it can create the standby for me also.


Configure ASM storage

Now this is the most interesting piece I found when using DBCS.  Not only does the DBCS service create a disk group, but it automatically stripes multiple block volumes together maximizing performance.  This is a HUGE help in ensuring I am getting the best performance.
When I was going through what the configuration did, I tried to build tables showing how the different storage sizes translate to the storage configurations.
There were 2 configurations and DB data storage sizes, one for Flex, and one for Standard shapes.

Flex


First I looked at flex, and regardless of the performance level these were the sizes.


Then within Flex, I looked at the "Balanced performance" configuration.

Balanced Performance configuration





You can see that as the DB storage available goes up, the number of disks goes up also allowing for a higher  possible IOPS than you would get from a single Block Storage device.

Below is the chart for "High Performance"

High Performance configuration



You can see that the IOPS is even higher, and it is using even more disks to get that performance.

Standard


Next looked at standard shapes, and regardless of the performance level these were the sizes. Note that with Standard shapes, there were many more options for configurations.


Balanced Performance configuration





High Performance configuration






Benefits of DBCS

I also went through what some of the other benefits of DBCS are, and below is the list I came up with.

  • When using the DBCS service,  the storage cost is based on the Block Storage cost. This is the same cost as you would pay in an IaaS service.  Having the storage striped and configured for maximum IOPS makes this a huge plus.

  • DBCS allows you purchase licenses if you don't have enough licenses to use the BYOL option.

  • The DBCS service price is based on OCPU and is the same regardless of the shape. Memory is included in the OCPU cost.

  • DBCS automatically configures RAC if you choose it.

  • DBCS provides tooling that automatically configures backups, can apply patches, and rotate encryption keys.

  • DBCS allows you to automate the cloning of your database, and automate any restores.

  • DBCS includes TDE, and relieves you of having to own the ASO license.  

Conclusion:

DBCS offers a lot more than you realize. Take a deep dive into what it can do for you to save time as DBA and you also might realize that sometimes tooling along with automation has it's benefits.


Wednesday, March 1, 2023

Oracle Database recovery using Incremental merge, snapshots, OMF and "switch to copy"

I work with backup and recovery of the Oracle Database, and sometimes this means looking at the Incremental Merge backup strategy.  I know this isn't the best backup/recovery strategy, and below are few posts giving you more detail on the topic.

They have some great points, and I typically don't recommend using incremental merge backups.  The incremental merge backup strategy is almost always paired with snapshots to increase the recovery window.

Below is an image of how these are typically paired with snapshots.



One of the biggest draws of using the incremental merge strategy with snapshots, is the ability to perform a "switch to copy" as a recovery strategy.

NOTE: When you perform "switch to copy" the database is now accessing datafiles using the backup copy.  This is not supported on Exadata for any storage other than Oracle ZFS.

If you review the MOS note "Using External Storage with Exadata (Doc ID 2663308.1)" you will find that "Use of non-Oracle storage for database files is not supported."

Given all of that, I got the question "I am using the incremental merge strategy on Oracle an ZFS appliance using snapshots. If I perform a switch-to-copy recovery of one or more datafiles, how do I avoid forcing a new full backup on the next incremental merge backup?".

I thought this was a great question, and I created a test database, and started googlin'.  Below are some of the posts I looked at.

I started  by using the first post and walked through a testing scenario using a DBCS database in OCI.
My database was a 19.8 database using local storage (to make things easier to see the datafiles), and it was using OMF by default.  
The piece that was missing from the first post was the "alter database move datafile 'xxx' to 'xxx' KEEP;

What I found is that it wasn't so easy with my database using OMF for 2 reasons.
  1. Using OMF, you don't specify the "to 'xxx'" since OMF will automatically name the destination datafile.
  2. Using "KEEP" is ignored when the source file is OMF.  This meant that the original image copy being used by the database is removed when move process completes.  I couldn't catalog the image copy.
Since it took a bit of research to find the best strategy I wanted to share the process that I would recommend when dealing with OMF and non-OMF image copy backups with snapshots.

NON-OMF image copy backups

  1. Snap the backup storage just to preserve the starting point. --> optional but recommended
  2. Take the tablespaces offline
  3. Perform a "switch to copy" of the datafiles --> This will use the incremental merge backup.
  4. Recover the datafiles
  5. Bring the tablespaces online ---> Application is running using the external image copy
  6. Perform an "alter database move datafile 'xxx' to 'xxx' KEEP; --> Using keep will preserve the original copy, but will only work if the image copy is NOT OMF. If the destination is an OMF file, you will not use the "to"
  7. Catalog the image copy that was preserved with the "KEEP" ensuring you use the same tag used for the incremental merge. "catalog datafilecopy '+fra/ocm121/datafile/MONSTER.346.919594959' level 0 TAG 'incr_update';"
  8. The next incremental merge will pick up with the updated image copy.

OMF image copy backups

  1. Snap the backup storage to create a copy for the switch to copy.
  2. Unmount the "current" image copy 
  3. Mount the snap copy using the same mount point as the "current" image copy.
  4. Take the tablespaces offline
  5. Perform a "switch to copy" of the datafiles --> This will use the snap copy of the incremental merge backup on external storage.
  6. Recover the datafiles
  7. Bring the tablespaces online ---> Application is running using external copies.
  8. Perform an "alter database move datafile 'xxx' ; --> Since the source is an OMF file you cannot use "KEEP" to preserve the original copy. The original copy will be removed.
  9. Once all moves are complete, unmount the snapped copy.
  10. Mount the "current" copy. This is as of when you started this process.
  11. Catalog the image copy for all datafiles that performed the "switch to copy" ensuring you use the same tag used for the incremental merge "catalog datafilecopy '+fra/ocm121/datafile/MONSTER.346.919594959' level 0 TAG 'incr_update';"
  12. You can now destroy the snap that was created to perform the switch to copy.
  13. The next incremental merge will pick up with the current image datafile copies where it left off.
As you can see, using OMF greatly complicates preserving the incremental merge backup, and forces you to start at the last backup.

Wednesday, November 16, 2022

ZDLRA - Quick Start Guide

 This post is intended to be a Quick Start Guide for those who are new to ZDLRA (RA for short).  I spend part of time working with customers who are new the RA and often the same topics/questions come up.  I wanted to put together a "Quick Start" guide that they can use to learn more about these common topics.


ZDLRA Quick Start


The steps I would follow for anyone new to the RA are.


  1. Read through the section on configuring users and security settings for the RA. Decide which compliance settings make sense for the RA and come with a plan to implement them.
  2. Identify the users, both OS users (if you are disabling direct root access), and users within the databases that will mange and/or monitor the RA. OS users can be added with "racli add admin_user". Database users can be added with "racli add db_user"
  3. Create protection policies that contain the recovery window(s) that you want to set for the databases. You will also set compliance windows when creating policies. This can be done manually using the package DBMS_RA.CREATE_PROTECTION_POLICY.
  4. Identify the VPC user(s) needed to manage the database. Is it a single DBA team, or different teams requiring multiple VPC users? Create the VPC user using "racli add vpc_user"
  5. Add databases to be backed up to the RA, associate the database with both a protection policy and a VPC user who will be managing the database. NOTE that you should look at the Reserved Space, and adjust it as needed.  Databases can be added manually by using two PL/SQL calls. DBMS_RA.ADD_DB will add the database to the RA. DBMS_RA.GRANT_DB_ACCESS will allow the VPC user to manage the database.
  6. Configure the database to be backed up to the RA either by using OEM, or manually. The manual steps would be
    • Create a wallet on the DB client that contains the VPC credentials to connect to the RA.
    • Update the sqlnet.ora file to point to this wallet
    • Connect to the RMAN catalog on the RA from the DB client
    • Register the database to the RA
    • Configure the channel configuration to point to the RA
    • Configure Block change tracking (if it is not configured).
    • Configure the redo destination to point to the RA if you want to configure real-time redo.
    • Change the RMAN retention to be "applied to all standby" if using real-time redo, or "backed up 1 time" if not.
    • Update OEM to have the database point to the RMAN catalog on the ZDLRA.

Documentation

The documentation can be found here. Within the documentation there are several sections that will help you manage the RA.

Get Started 

The get started section contains some subtopics to delve into

Install and configure the Recovery Appliance

The links in this section cover all the details about the installation and configuration of the RA.  I won't be talking about those sections in the post, but be aware this is where to look for general maintenance/patching/expanding information.

Learn about the Recovery Appliance.

This section covers an overview of the RA, and is mostly marketing material. If you are not familiar with the RA, or want an overview this is the place to turn.

Administer the Recovery Appliance


These sections are going to be a lot more helpful to get you started. This section of the documentation covers 

Managing Protect Policies - Protection policies is the place to start when configuring an RA. Protection policies group databases together and it is critical to make sure you have the correct protection policies in place before adding databases to be backed up.

Copying Backups to Tape - This section is useful if you plan on creating backups (either point in time or archival) that will be sent externally from the RA. This can be either to physical/virtual tape, or to an external media manager.

Archiving Backups to the Cloud - This section covers how to configure the RA to send backups to an OCI compatible object storage.  This can either be OCI, or it can be an on-premises ZFS that has a project configured as OCI object storage.

Accessing Recovery Appliance Reports - This section covers how to access all the reports available to you.  You will find these reports are priceless to manage the RA over time. Some examples of the areas these reports cover are.
  • Storage Capacity Planning reports with future usage projections
  • Recovery Window Summary reports to validate backups are available
  • Active incident reports to manage any alerts
  • API History Report to audit any changes to the RA
NOTE : If you are using the RA in a charge backup model to your internal business units, there is specific reporting that can be used for this. Talk your Oracle team find out more.

Monitoring the Recovery Appliance - This section covers how to monitor the RA and set up alerts. This will allow you identify any issues that would affect the recovery of the backups including space issues, and missing backups.


Administer the Recovery Appliance

Configure Protected Databases - This section goes through how to configure databases to be backed up to the recovery appliance and includes instructions for both using OEM, and adding databases using the command line.

Backup Protected Databases - This section covers how to backup a database from either OEM, or from the traditional RMAN command line. I would also recommend looking at the MOS note to ensure that you are using the current best practices for backups. "RMAN best practice recommendations for backing up to the Recovery Appliance (Doc ID 2176686.1)".

Recover Databases - This section covers how to recover databases from the RA. This section also covers information about cloning databases. Cloning copies of production is a common use case for the RA, and this section is very useful to help you with this process.


Books

This section contains the documentation you look at regularly to manage the RA and answer questions that you may have on managing it.  I am only going to point the sections that you find most useful.


Deployment

The one important section under deployment is the Zero Data Loss Recovery Appliance Owners Guide.

Zero Data Loss Recovery Appliance Owners Guide - This guide contains information on configuring users on the RA, and the most critical sections to look at are

  •  "Part III Security and Maintenance of Recovery Appliance".   If you are using the RA to manage immutable backups, it is important to go through this section to understand how users will be managed for maximum protection of your backups.
  • Part IV Command Reference - This section covers the CLI commands you will use the manage the RA.

Administration

This is probably the most important guide in the documentation. It covers many of the areas of you will be managing as you configure databases to be backed up.  The most critical sections are

Part I Managing Recovery appliance - This section covers
  • Implementing Immutable Backups
  • Securing the Recovery Appliance operations
  • Managing Protection Policies
  • Configuring replication and replication concepts
  • Additional High Availability strategies
Part III Recovery Appliance Reference - This section covers
  • DBMS_RA packages to manage the RA through commands
  • Recovery Appliance View Reference to see what views are available

MOS Notes

There are number of useful MOS notes that you will want to bookmark

  • Zero Data Loss Recovery Appliance (ZDLRA) Information Center (Doc ID 2673011.2)
  • How to Backup and Recover the Zero Data Loss Recovery Appliance (Doc ID 2048074.1)
  • Zero Data Loss Recovery Appliance Supported Versions (Doc ID 1927416.1)
  • Zero Data Loss Recovery Appliance Software Updates Guide (Doc ID 2028931.1)
  • Cross Platform Database Migration using ZDLRA (Doc ID 2460552.1)
  • How to Move RMAN Catalog To A Different Database (Doc ID 351918.1)

Helpful Blogs

Fernando Simon

Fernando has a number of helpful blog entries. Be aware he has been blogging for a long time on the RA, and some of the management processes have changed. One example is RACLI is now used to create VPC users. Some of the Blogs to note are

Bryan Grenn


I have a number of blog posts on features of the ZDLRA.









Thursday, October 6, 2022

Estimated space for Compliance Window on RA

 In this post  I will go through how to estimate how much space you need to store backups on the Recovery Appliance to meet your Compliance Window.

This is critical to understand, since compliance protected backups cannot be removed from the RA, and if all space is utilized to meet Compliance Windows, new backups will be refused.


First a bit about Compliance window.


COMPLIANCE WINDOW

Compliance Window is set at the Policy level.  All databases within that policy will inherit the Compliance Window going forward.  Below is some more detail you need to know on Compliance Window.

  • The Compliance Window cannot be greater than the Recovery Window Goal
  • You cannot set the Policy to "Auto Tune" reserve space when setting a Compliance Window. You must manage the reserve space as you did in the past.
  • The Compliance Window can be adjusted up or down once set, but this will not affect any previous backups. Backups previously created observe the Compliance Window in effect when the backup was created.
  • The RA does not have to be in Compliance Mode (disabled direct root access) in order to set the a Compliance Window.

Space management for Compliance Window

Reserved Space

If you are familiar with reserved space, then you understand how that can help.  Reserved space is set for each database, and is the estimate of how much is needed to meet the Recovery Window Goal.  The major points to understand with reserved space are
  • The sum of all reserved space cannot be greater than the usable space on the RA.
  • Reserved space is used during space pressure to determine which databases will not be able to keep their recovery window goal. Databases with reserved space less than what is needed will have their older backups purged first.
  • Reserved space should be either
    • About 10% greater than the space needed to meet the recovery window goal
    • The high water mark of space needed during large volume updates (Black Friday sales for example).
By setting the reserved space for each database to be 10% larger than the space needed to meet the recovery window goal, you can alert when the Recovery Appliance cannot accept new databases to be backed up.  If all reserved space is allocated, then the Recovery Appliance is 90% full.

Recovery Window Goal

Within each policy you set a recovery window goal. This is a "goal" and if you run into space pressure, backups can be deleted from databases with insufficient reserved space (noted in the previous section).
The recommendation is to set the Compliance Window smaller than Recovery Window Goal if all databases are being protected.
By setting the recovery window goal smaller, you can alert when the required space to meet the recovery window goal is not available on the Recovery Appliance.  This will give you time to determine the space issue and take corrective action.


Compliance Window


Within each policy you can set a Compliance Window. This will lock any backups for the protected databases from being deleted, and will disable the database from being from the Recovery Appliance as long as it has backups that fall under compliance.  Since these backups cannot be removed, and the database cannot be removed, it is critical that you do not reach the point where all storage is utilized by compliant backups.

ESTIMATING COMPLIANCE SPACE

As you can tell by reading through how this works, it is critical to understand the space needed for compliant backups. 
The recommendation to estimate the space needed is to utilize the DBMS_RA.ESTIMATE_SPACE package.
Unfortunately with release 21.1 you cannot call this package from within a SQL statement. You will receive the following error.

Select dbms_ra.estimate_space ('TIMSP' , numtodsinterval(45,'day')) from dual
       *
ERROR at line 1:
ORA-14551: cannot perform a DML operation inside a query
ORA-06512: at "RASYS.DBMS_RA_MISC", line 5092
ORA-06512: at "RASYS.DBMS_RA", line 1204
ORA-06512: at line 1


In order to help everyone calculate the space needed, I came up with a code snippet that can give you the data you need.
Using the snippet below, and setting the variable for compliance window you can create an HTML report that will show you the estimate for space needed.




What the output looks like is below.  Note you can adjust the compliance window you want to look at.






This should allow you to look at the effect of setting a compliance window and compare it to the reserved space, and the RWG database by database, policy by policy, and as a whole.




Thursday, September 29, 2022

ZFSSA File Retention and Snapshot Retention provide protection for RMAN incremental merge backups.

File Retention Lock and Snapshot Retention Lock are great new features on ZFSSA that can help protect your backups from deletion and help you meet regulatory requirements. Whether it be an accidental deletion or a bad actor attempting to corrupt your backups they are protected.

In this post I am going to walk through how to implement File Retention and Snapshot Retention together to protect an RMAN incremental merge backup from being deleted . 

 Why do I need both? 

The first question you might have is why do I need both File Retention and Snapshot Retention to protect my backups ? RMAN incremental merge backups consists of 3 types of backup pieces.

 FILE IMAGE COPIES - Each day when the backup job is executed the same image copy of each datafile file is updated by recovering the datafile with an incremental backup. This moves the image copy of each datafile forward one day using the changed blocks from the incremental backup. The backup files containing the image copy of the datafiles needs to be updatable by RMAN.

INCREMENTAL BACKUP - Each day a new incremental backup (differential) is taken. This incremental backup contains the blocks that changed in the database files since the previous incremental backup. Once created this file does not change. 

 ARCHIVE LOG BACKUPS - Multiple times a day, archive log backups (also known as log sweeps) are taken. These backup files contain the change records for the database and do not change once written. 


 How to leverage both retention types 


 SNAPSHOT RETENTION can be used to create a periodic restorable copy of a share/project by saving the unique blocks as of the "snapshot" time a new snapshot is taken. Each of these periodic snapshots can be scheduled on a regular basis. With snapshot retention, the snapshots are locked from being deleted, and the schedule itself is locked to prevent tampering with the snapshots. This is perfect for ensuring we have a restorable copy of the datafile images each time they are updated by RMAN.

FILE RETENTION can be used to lock both the incremental backups and the archive log backups. Both types of backup files do not change once created and should be locked to prevent removal or tampering with for the retention period. 


 How do I implement this ? 

First I am going create a new project for my backups named "DBBACKUPS". Of course you could create 2 different projects. Within this project I am going to create 2 shares with different retention settings. 

 FULLBACKUP - Snapshot retention share 

 My image copy backups are going to a share that is protected with snapshot retention. The documentation on where to start with snapshot retention can be found here. In the example below I am keeping 5 days of snapshots, and I am locking the most recent 3 days of snapshots. This configuration will ensure that I have locked image copies of my database files for the last 3 days. 

 NOTE: Snapshots only contain the unique blocks since the last snapshot, but still provide a FULL copy of each datafile. The storage used to keep each snapshots is similar to the storage needed for each incremental backup. 

ZFSSA snapshot retention settings for /fullbackup




 DAILYBACKUPS - File Retention share 

My incremental backups and archivelog backups are going to a share with File Retention. The files (backup pieces) stored on this share will be locked from being modified or deleted. The documentation on where to start with File Retention can be found here

 NOTE: I chose the "Privileged override" file retention policy. I could have chosen "Mandatory" file retention policy if I wanted to lock down the backup pieces even further. 

 In the example below I am retaining all files for 6 days. 

ZFSSA file retention settings for /dailybackups



DAILY BACKUP SCRIPT 


Below is the daily backup script I am using to perform the incremental backup, and the recovery of the image copy datafiles with the changed blocks. You can see that I am allocating channels to "/fullbackup" which is the share configured with Snapshot Retention, and the image copy backups are going to this share. The incremental backups are going to "/dailybackups" which is protected with File Retention. 

run {
  ALLOCATE CHANNEL Z1 TYPE DISK  format '/fullbackup/radb/DATA_%N_%f.dbf';
  ALLOCATE CHANNEL Z2 TYPE DISK  format '/fullbackup/radb/DATA_%N_%f.dbf';
  ALLOCATE CHANNEL Z3 TYPE DISK  format '/fullbackup/radb/DATA_%N_%f.dbf';
  ALLOCATE CHANNEL Z4 TYPE DISK  format '/fullbackup/radb/DATA_%N_%f.dbf';
  ALLOCATE CHANNEL Z5 TYPE DISK  format '/fullbackup/radb/DATA_%N_%f.dbf';
  ALLOCATE CHANNEL Z6 TYPE DISK  format '/fullbackup/radb/DATA_%N_%f.dbf';
  
  backup
    section size 32G
    incremental level 1
    for recover of copy with tag 'DEMODBTEST' database FORMAT='/dailybackups/radb/FRA_%d_%T_%U.bkp';
  recover copy of database with tag 'DEMODBTEST' ;
  RELEASE CHANNEL Z1;
  RELEASE CHANNEL Z2;
  RELEASE CHANNEL Z3;
  RELEASE CHANNEL Z4;
  RELEASE CHANNEL Z5;
  RELEASE CHANNEL Z6;
}


 ARCHIVELOG BACKUP SCRIPT 

Below is the log sweep script that will perform the periodic backup of archive logs and send them to the "/dailybackups" share which has File Retention configured. 

run {
  ALLOCATE CHANNEL Z1 TYPE DISK  format '/dailybackups/radb/ARCH_%U.bkup';
  ALLOCATE CHANNEL Z2 TYPE DISK  format '/dailybackups/radb/ARCH_%U.bkup';
  ALLOCATE CHANNEL Z3 TYPE DISK  format '/dailybackups/radb/ARCH_%U.bkup';
  ALLOCATE CHANNEL Z4 TYPE DISK  format '/dailybackups/radb/ARCH_%U.bkup';
  ALLOCATE CHANNEL Z5 TYPE DISK  format '/dailybackups/radb/ARCH_%U.bkup';
  ALLOCATE CHANNEL Z6 TYPE DISK  format '/dailybackups/radb/ARCH_%U.bkup';

  
  backup
    section size 32G
    filesperset 32
    archivelog all;
  RELEASE CHANNEL Z1;
  RELEASE CHANNEL Z2;
  RELEASE CHANNEL Z3;
  RELEASE CHANNEL Z4;
  RELEASE CHANNEL Z5;
  RELEASE CHANNEL Z6;
}




 RESULT: 

This strategy will ensure that I have 5 days of untouched full backups available for recovery. It also ensures that I have 6 days of untouched archive logs, and incremental backups that can be applied if necessary. This will protect my RMAN incremental merge backups using a combination of Snapshot Retention for backup pieces that need to be updated, and File Retention for backup pieces that will not change.

Friday, July 29, 2022

OCI Database backups with retention lock

 OCI Object Storage provides both lifecycle rules and retention lock.  How to take advantage of both these features isn't always as easy as it looks.

 In this post I will go through an example customer request and how to implement a backup strategy to accomplish the requirements.

OCI Buckets

This image above gives you an idea of what they are looking to accomplish.

Requirements

  • RMAN retention is to keep a 14 day point in time recovery window
  • All long term backups beyond 14 days are cataloged as KEEP backups
  • All buckets are protected with a retention rule to prevent backups from being deleted before they become obsolete
  • Backups are moved to lower tier storage when appropriate to save costs.

Backup strategy

  • A full backup is taken every Sunday at 5:30 PM and this backup is kept for 6 weeks.
  • Incremental backups are taken Monday through Saturday at 5:30 PM and are kept for 14 days
  • Archive log sweeps are taken 4 times a day and are kept for 14 days
  • A backup is taken the 1st day of the month at 5:30 PM and this backup is kept for 13 months.
  • A full backup is taken following the Tuesday morning bi-weekly payroll run and is kept for 7 years
This sounds easy enough.  If you look at the image above you can what this strategy looks like in general. I took this strategy and mapped it to the 4 buckets, how they would be configured, and what they would contain. This is the image below.

OCI Object rules


Challenges


As I walked through this strategy I found that it involved some challenges. My goal was limit the number of full backups to take advantage of current backups.  Below are the challenges I realized exist with this schedule
  • The weekly full backup taken every Sunday is kept for longer than the incremental backups and archive logs. This caused 2 problems
    1. I wanted to make this backup a KEEP backup that is kept for 6 weeks before becoming obsolete.  Unfortunately KEEP backups are ignored as part of an incremental backup strategy. I could not create a weekly full backup that was both a KEEP backup and also be used as part of  incremental backup strategy.
    2. Since the weekly full backup is kept longer than the archive logs, I need to ensure that this backup contains the archive logs needed to defuzzy the backup without containing too many unneeded archive logs
  • The weekly full backup could fall on the 1st of the month. If this is the case it needs to be kept for 13 months otherwise it needs to be kept for 6 weeks.
  • I want the payrun backups to be immediately placed in archival storage to save costs.  When doing a restore I want to ignore these backups as they will take longer to restore.
  • When restoring and recovering the database within the 14 day window I need to include channels allocated to all the buckets that could contain those buckets. 14_DAY, 6_WEEK,  and 13_MONTH.

Solutions

I then worked through how I would solve each issue.

  1. Weekly full backup must be both a normal incremental backup and KEEP backup - After doing some digging I found the best way to handle this issue was to CHANGE the backup to be a KEEP backup with either a 6 week retention, or a 13 month retention from the normal NOKEEP type. By using tags I can identify the backup I want change after it is no longer needed as part of the 14 day strategy.
  2. Weekly full backup contains only archive logs needed to defuzzy - The best way to accomplish this task is to perform an archive log backup to the 14_DAY bucket immediately before taking the weekly full backup
  3. Weekly full backup requires a longer retention - This can be accomplished by checking if the the full backup is being executed on the 1st of the month. If it is the 1st, the full backup will be placed in the 13_MONTH bucket.  If it is not the 1st, this backup will be placed in the 6_WEEK bucket.  This backup will be created with a TAG with a format that can be used to identify it later.
  4. Ignore bi-weekly payrun backups that are in archival storage - I found that if I execute a recovery and do not have any channels allocated to the 7_YEAR bucket, it will may try to restore this backup, but it will not find it and move to the next previous backup. Using tags will help identify that a restore from the payrun backup was attempted and ultimately bypassed.
  5. Include all possible buckets during restore - By using a run block within RMAN I can allocate channels to different buckets and ultimately include channels from all 3 appropriate buckets.
Then as a check I drew out a calendar to walk through what this strategy would look like.

OCI backup schedule


Backup examples

Finally I am including examples of what this would look like.

Mon-Sat 5:30 backup job



dg=$(date +%Y%m%d)
rman <<EOD
run {
ALLOCATE CHANNEL daily1 DEVICE TYPE     'SBT_TAPE' PARMS  'SBT_LIBRARY=/home/oracle/cloudbackup/lib/libopc.so ENV=(OPC_PFILE=/home/oracle/ociconfig/config/14_DAY.ora)';
ALLOCATE CHANNEL daily2 DEVICE TYPE     'SBT_TAPE' PARMS  'SBT_LIBRARY=/home/oracle/cloudbackup/lib/libopc.so ENV=(OPC_PFILE=/home/oracle/ociconfig/config/14_DAY.ora)';
backup incremental level 1 database tag="incr_backup_${dg}" plus archivelog tag="arch_backup_${dg}";
   }
exit
EOD

Sat 5:30 backup job schedule

1) Clean up archive logs first



dg=$(date +%Y%m%d:%H)
rman <<EOD
run {
ALLOCATE CHANNEL daily1 DEVICE TYPE     'SBT_TAPE' PARMS  'SBT_LIBRARY=/home/oracle/cloudbackup/lib/libopc.so ENV=(OPC_PFILE=/home/oracle/ociconfig/config/14_DAY.ora)';
ALLOCATE CHANNEL daily2 DEVICE TYPE     'SBT_TAPE' PARMS  'SBT_LIBRARY=/home/oracle/cloudbackup/lib/libopc.so ENV=(OPC_PFILE=/home/oracle/ociconfig/config/14_DAY.ora)';
backup archivelog tag="arch_backup_${dg}";
   }
exit
EOD

2a) If this 1st of the month then execute this script to send the full backup to the 13_MONTH bucket


dg=$(date +%Y%m%d)
rman <<EOD
run {
ALLOCATE CHANNEL monthly1 DEVICE TYPE     'SBT_TAPE' PARMS  'SBT_LIBRARY=/home/oracle/cloudbackup/lib/libopc.so ENV=(OPC_PFILE=/home/oracle/ociconfig/config/13_MONTH.ora)';
ALLOCATE CHANNEL monthly2 DEVICE TYPE     'SBT_TAPE' PARMS  'SBT_LIBRARY=/home/oracle/cloudbackup/lib/libopc.so ENV=(OPC_PFILE=/home/oracle/ociconfig/config/13_MONTH.ora)';
backup incremental level 1 database tag="full_backup_${dg}" plus archivelog tag="full_backup_${dg}";
   }
exit
EOD


2b) If this is NOT the 1st of the month execute this script and send the full backup to the 6_WEEK bucket

dg=$(date +%Y%m%d)
rman <<EOD
run {
ALLOCATE CHANNEL weekly1 DEVICE TYPE     'SBT_TAPE' PARMS  'SBT_LIBRARY=/home/oracle/cloudbackup/lib/libopc.so ENV=(OPC_PFILE=/home/oracle/ociconfig/config/6_WEEK.ora)';
ALLOCATE CHANNEL weekly2 DEVICE TYPE     'SBT_TAPE' PARMS  'SBT_LIBRARY=/home/oracle/cloudbackup/lib/libopc.so ENV=(OPC_PFILE=/home/oracle/ociconfig/config/6_WEEK.ora)';
backup incremental level 1 database tag="full_backup_${dg}" plus archivelog tag="full_backup_${dg}";
   }
exit
EOD


3a) If today is the 15th then change the  full backup to a 13 month retention


dg=$(date --date "-14 days" +%Y%m%d)
rman <<EOD
CHANGE BACKUPSET TAG="full_backup_${dg}" keep until time 'sysdate + 390';
EOD

3b) If today is NOT the 14th then change the  full backup to a 6 week retention


dg=$(date --date "-14 days" +%Y%m%d)
rman <<EOD
CHANGE BACKUPSET TAG="full_backup_${dg}" keep until time 'sysdate + 28';
EOD

Tuesday after payrun backup job 

1) Clean up archive logs first


dg=$(date +%Y%m%d:%H)
rman <<EOD
run {
ALLOCATE CHANNEL daily1 DEVICE TYPE     'SBT_TAPE' PARMS  'SBT_LIBRARY=/home/oracle/cloudbackup/lib/libopc.so ENV=(OPC_PFILE=/home/oracle/ociconfig/config/14_DAY.ora)';
ALLOCATE CHANNEL daily2 DEVICE TYPE     'SBT_TAPE' PARMS  'SBT_LIBRARY=/home/oracle/cloudbackup/lib/libopc.so ENV=(OPC_PFILE=/home/oracle/ociconfig/config/14_DAY.ora)';
backup archivelog tag="arch_backup_${dg}";
   }
exit
EOD

2) Execute the keep backup


dg=$(date +%Y%m%d)
rman <<EOD
run {
ALLOCATE CHANNEL yearly1 DEVICE TYPE     'SBT_TAPE' PARMS  'SBT_LIBRARY=/home/oracle/cloudbackup/lib/libopc.so ENV=(OPC_PFILE=/home/oracle/ociconfig/config/7_YEAR.ora)';
ALLOCATE CHANNEL yearly2 DEVICE TYPE     'SBT_TAPE' PARMS  'SBT_LIBRARY=/home/oracle/cloudbackup/lib/libopc.so ENV=(OPC_PFILE=/home/oracle/ociconfig/config/7_YEAR.ora)';
backup database tag="payrun_backup_${dg}" plus archivelog tag="full_backup_${dg}" keep until time 'sysdate + 2555';
   }
exit
EOD


Restore example

Now in order to restore, I need to allocate channels to all the possible buckets. Below is the script I used  to validate this with a "restore database validate" command.


run {
ALLOCATE CHANNEL daily1 DEVICE TYPE     'SBT_TAPE' PARMS  'SBT_LIBRARY=/home/oracle/cloudbackup/lib/libopc.so ENV=(OPC_PFILE=/home/oracle/ociconfig/config/14_DAY.ora)';
ALLOCATE CHANNEL daily2 DEVICE TYPE     'SBT_TAPE' PARMS  'SBT_LIBRARY=/home/oracle/cloudbackup/lib/libopc.so ENV=(OPC_PFILE=/home/oracle/ociconfig/config/14_DAY.ora)';
ALLOCATE CHANNEL weekly1 DEVICE TYPE     'SBT_TAPE' PARMS  'SBT_LIBRARY=/home/oracle/cloudbackup/lib/libopc.so ENV=(OPC_PFILE=/home/oracle/ociconfig/config/6_WEEK.ora)';
ALLOCATE CHANNEL weekly2 DEVICE TYPE     'SBT_TAPE' PARMS  'SBT_LIBRARY=/home/oracle/cloudbackup/lib/libopc.so ENV=(OPC_PFILE=/home/oracle/ociconfig/config/6_WEEK.ora)';
ALLOCATE CHANNEL monthly1 DEVICE TYPE     'SBT_TAPE' PARMS  'SBT_LIBRARY=/home/oracle/cloudbackup/lib/libopc.so ENV=(OPC_PFILE=/home/oracle/ociconfig/config/13_MONTH.ora)';
ALLOCATE CHANNEL monthly2 DEVICE TYPE     'SBT_TAPE' PARMS  'SBT_LIBRARY=/home/oracle/cloudbackup/lib/libopc.so ENV=(OPC_PFILE=/home/oracle/ociconfig/config/13_MONTH.ora)';
restore database validate;
    }


Below is what I am seeing in the RMAN log because I picked a point in time where I want it to ignore the 7_YEAR backups.

In this case you can see that it tried to retrieve the Payrun backup but failed back to the previous backup with tag "FULL_073122". This is the backup I want.


channel daily1: starting validation of datafile backup set
channel daily1: reading from backup piece h613o4a4_550_1_1
channel daily1: ORA-19870: error while restoring backup piece h613o4a4_550_1_1
ORA-19507: failed to retrieve sequential file, handle="h613o4a4_550_1_1", parms=""
ORA-27029: skgfrtrv: sbtrestore returned error
ORA-19511: non RMAN, but media manager or vendor specific failure, error text:
   KBHS-07502: File not found
KBHS-01404: See trace file /u01/app/oracle/diag/rdbms/acmedbp/acmedbp/trace/sbtio_4819_140461854265664.log for det
failover to previous backup

channel daily1: starting validation of datafile backup set
channel daily1: reading from backup piece gq13o3rm_538_1_1
channel daily1: piece handle=gq13o3rm_538_1_1 tag=FULL_073122
channel daily1: restored backup piece 1
channel daily1: validation complete, elapsed time: 00:00:08


That's all there is to it. Tags are very help helpful to identify the correct backups.