Showing posts with label ZDLRA. Show all posts

Monday, October 23, 2023

Oracle Recovery Service now offers retention lock

 Oracle DB Recovery Service recently added a new feature to protect backups from being prematurely deleted, even by a tenancy administrator.  This new feature adds a retention lock to the Backup Retention Period at the policy level. The image below shows the new settings that you see within the protection policy.

Enabling retention lock

The recovery service comes with some default policies that appear as "oracle defined" policy types.

Name        Backup retention period
Platinum    95 days
Gold        65 days
Silver      35 days
Bronze      14 days

These policies can't be changed, and they do not enable retention lock.

To implement a retention lock, you need to create a new protection policy or update an existing "user defined" protection policy.

Step #1 Set/Adjust "Backup retention period"

If you are creating a new "user defined" protection policy, you need to set the backup retention to a number of days between 14 and 95.  You should also take this opportunity to adjust the backup retention of an existing policy, if appropriate, before it is locked.

NOTE: Once a retention lock on the protection policy is activated (discussed in step #3), the backup retention period cannot be decreased, it can only be increased.
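
To make the two rules in this step concrete, here is a minimal sketch of the checks as I understand them from the console: the retention must be between 14 and 95 days, and once the lock is active it can only go up. The function and constant names are mine and purely illustrative; this is not a Recovery Service API.

```python
MIN_RETENTION_DAYS = 14
MAX_RETENTION_DAYS = 95


def validate_retention_change(current_days, new_days, lock_active):
    """Return None if the change is allowed, otherwise a short reason.

    current_days is None when the policy is being created.
    """
    if not MIN_RETENTION_DAYS <= new_days <= MAX_RETENTION_DAYS:
        return (f"retention must be between {MIN_RETENTION_DAYS} "
                f"and {MAX_RETENTION_DAYS} days")
    # Once the lock is active the retention can only be increased.
    if lock_active and current_days is not None and new_days < current_days:
        return "retention cannot be decreased once the lock is active"
    return None
```

For example, lowering a 35 day policy to 14 days is fine during the grace period, but the same change is rejected once the lock is active.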

Step #2 Click on "enable retention lock"

This step is pretty straightforward. But the most important item to know is that the retention lock is not immediately in effect.  Much like the "retention lock" that is set on object storage, there is a minimum period of at least 14 days before the lock is "active".

 Note: Once the grace period has expired for the policy (explained later in this blog post) the  "retention lock"  is permanent and cannot be removed.


Step #3 Set "Scheduled lock time"

As I said in the previous step, the lock isn't immediately active. In this step you set the future date/time at which the lock becomes active, and this date/time must be at least 14 days in the future.  This provides a grace period that delays when the lock on the policy becomes active. You have up until the lock activation date/time to move the scheduled lock time further into the future if it becomes necessary to further delay lock activation.

Grace Period 

I wanted to make sure I explain what happens with this grace period so that you can plan accordingly.

  • If you change an existing "user defined" policy to enable the retention lock, any databases that are a member of this policy will not have locked backups until the scheduled lock date/time activates the lock.  
  • If you add databases to a protection policy that has a retention lock enabled, the backups will not be locked until whichever of these times is farther in the future:
    • Scheduled lock time for the policy, if the retention lock has not yet activated.
    • 14 days after the database is added to the protection policy.
  • Databases can be removed from a retention locked protection policy during this grace period.
  • If the policy itself is still within its grace period before activating, the backup retention period can be adjusted down for the protection policy.
NOTE: This 14 day grace period allows you to review the estimated space needed.  On the protected database summary page, for each database, you can see the "projected space for policy"  in the Space Usage section.  This value can be used to estimate the "locked backup" utilization.
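
The grace period rules above come down to a "whichever is later" calculation. Here is a small sketch of that arithmetic; the function names are hypothetical and it only models the rules as described in this post.

```python
from datetime import datetime, timedelta

GRACE_PERIOD = timedelta(days=14)


def is_valid_scheduled_lock_time(scheduled_lock_time, now):
    """The scheduled lock time must be at least 14 days in the future."""
    return scheduled_lock_time >= now + GRACE_PERIOD


def backups_locked_from(scheduled_lock_time, added_to_policy_at):
    """Backups lock at whichever is later: the policy's scheduled lock
    time, or 14 days after the database joined the policy."""
    return max(scheduled_lock_time, added_to_policy_at + GRACE_PERIOD)


# Policy scheduled to lock on Nov 20; database added on Nov 15.
locked = backups_locked_from(datetime(2023, 11, 20), datetime(2023, 11, 15))
print(locked.date())
```

A database added after the policy lock has already activated still gets its own 14 days, because the second term is the later one.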


What happens with a retention lock ?

Once the grace period expires the backups for the protected database are time locked and can't be prematurely deleted.  

The backups are protected by the following rules.

1. The database cannot be moved to another policy. No user within the tenancy, including an administrator, can remove a database from its retention locked policy.  If it becomes necessary to move a database to another policy, an SR needs to be raised, and security policies are followed to ensure that this is an approved change.


2.  There is always a 14 day grace period in which changes can be made before the backups become locked. This is your window to verify the backup storage usage required before the lock activates.

3. Even if you check the "72 hour termination option" on the database, backups are locked throughout the retention window.


Comments:

This is a great new feature that protects backups from being deleted by anyone in the tenancy, including tenancy administrators.  This provides an extra layer of security against an attack with compromised credentials.  Because the lock is permanent, always use the 14 day grace period to ensure the usage and duration are appropriate for your database.






Wednesday, October 4, 2023

Cyber Vault Characteristics

 One topic that has been coming up over and over this year is Cyber Vault. In this post I am going to go through the characteristics I commonly see when a customer builds a Cyber Vault.  The image below gives you a good idea of what is involved.

Characteristics of a Cyber Vault

Cyber Vault


  • NTP and DNS services: Because a Cyber Vault is often isolated from the rest of the datacenter, it is critical to have NTP service.  Proper time management is critical to ensuring backups are kept for the proper retention.  DNS isn't critical, but it is definitely very helpful in configuring infrastructure.  In many cases "/etc/hosts" can get around this, but it is a pain to maintain.
  • Firewalls:  Configuring firewalls and isolated networks is critical to ensure the Cyber Vault is isolated.  The vault is often physically in the same datacenter, with network isolation providing the protection.  Be sure to understand what ports, networks, and traffic directions are utilized on all infrastructure so you can properly set firewall rules.
  • Air Gap:  Creating an Air Gap has become the standard to protect backups in the Cyber Vault. The Air Gap is often open for only a few hours a day at random times to ensure that the opening isn't predictable.  To limit the exposure time, it is critical to maximize the network bandwidth into the vault, and minimize the amount of data that needs to be transferred.
 NOTE: Not all customers choose to have an Air Gap.  Having an Air Gap that is closed for long periods of time ensures there is less chance of intrusion, BUT it guarantees long periods of data loss when a restoration is performed.  This decision is most critical for databases that are always changing.
  • Break-the-glass: There needs to be control over who gets access into the vault, and an approval process to ensure that all access is planned and controlled.
  • Backup validation: There needs to be a validation process in a vault to ensure that the backups are untouched.  When the backups contain executables, this is typically scanning for ransomware signatures. When backups are Oracle backups, performing "Restore Database Validate" is the gold standard for validating backups.
  • Clean Room: A clean room is an environment where backups can be tested. This can be a small environment (a server or 2) or it can be large enough to restore and run the whole application.
  • Monitoring and reporting infrastructure: For Oracle this is OEM (Cloud Control). It is critical that any issues are alerted and reported outside the vault.
  • Audit Reports: Audit reports are critical to ensuring that the backups in the Cyber Vault are secured.  Audit reports will capture any changes to the environment, and any issues with the backups themselves.

BONUS: One thing that customers don't often think about is encryption keys.  Implementing TDE on Oracle Databases is an important part of protecting your data from exfiltration. But you should also ensure that you have a secure backup of your encryption keys in the Vault.
OKV (Oracle Key Vault) is the best way of managing the keys for Oracle databases.

Tuesday, September 5, 2023

Creating dynamic KEEP archival backups from ZDLRA

 This post covers how to utilize the new package DBMS_RA.CREATE_ARCHIVAL_BACKUP to dynamically create KEEP archival backups from a ZDLRA.

When using this package to schedule KEEP backups, I recommend creating restore points with every incremental backup.  Read this blog post to find out why.

PROCEDURE CREATE_ARCHIVAL_BACKUP(
   db_unique_name         IN VARCHAR2,
   from_tag               IN VARCHAR2 DEFAULT NULL,
   compression_algorithm  IN VARCHAR2 DEFAULT NULL,
   encryption_algorithm   IN VARCHAR2 DEFAULT NULL,
   restore_point          IN VARCHAR2 DEFAULT NULL,
   restore_until_scn      IN VARCHAR2 DEFAULT NULL,
   restore_until_time     IN TIMESTAMP WITH TIME ZONE DEFAULT NULL,
   attribute_set_name     IN VARCHAR2,
   format                 IN VARCHAR2 DEFAULT NULL,
   autobackup_prefix      IN VARCHAR2 DEFAULT NULL,
   restore_tag            IN VARCHAR2 DEFAULT NULL,
   keep_until_time        IN TIMESTAMP WITH TIME ZONE DEFAULT NULL,
   max_redo_to_apply      IN INTEGER DEFAULT 14,                   --> Added in 21.1 June PSU
   comments               IN VARCHAR2 DEFAULT NULL);

NOTE: This blog post was updated to include the MAX_REDO_TO_APPLY parameter which is not documented as of writing this post.

 The documentation can be found here.  

These archival KEEP backups can be sent to either

  • TAPE - Using the copy-to-tape process you can send archival backups to physical tape, virtual tape, or any media manager that uses a "TAPE" backup type.
  • CLOUD - Using the copy-to-cloud process you can send archival backups to an OCI object store bucket which can be either on a local ZFSSA (using the OCI API protocol), or to the Oracle Cloud directly.



NOTE: When sending backups to a cloud location, retention rules can be set on the bucket LOCKING the cloud backups to ensure that they are immutable.  This is integrated with the new compliance settings on the RA21.



How to use this package

1. Identify the Database

Because this is more of an on-demand process, you have to execute the package for each database separately (rather than by using a protection policy), and identify for each database the point-in-time you want to use for recovery.

2. Set Archival Restore Point

Because the archival backup is dynamically created using existing backups the restore point works differently than if you create the KEEP backup on demand from the protected database. 


When you create a KEEP backup from the protected database, the backup contains 

    • Full backup of all datafiles
    • Backup of spfile and controlfile
    • Backup of archive logs created during the backup starting with a log switch at the beginning of the backup.
    • Final archive logs created by performing a log switch at the end of the backup.

 When you create an Archival backup from the ZDLRA , the backup contains

    • Most current virtual full backup of each datafile prior to the point in time for recovery that you choose. 
    • Backup of spfile and controlfile 
    • Backup of the active archive logs generated when the oldest virtual full datafile backup started, up to the archive logs needed to recover until the point in time chosen for recovery.

As you can see, a normal KEEP backup generated by the protected database is a "self-contained" backup that can be recovered only to the point in time that the backup completed.  You can extend the recovery point by adding additional KEEP archive log backups after the backup.

The dynamically created KEEP backup generated by the ZDLRA is also a "self-contained" backup that can be recovered to any point in time after the last datafile backup completed, but it also includes any point in time up to the restore point identified.  

Choices for a dynamic restore point 

 There are 3 options to choose a specific restore point. If you do not set one of these options, the KEEP backup will be created using the current restore point of the database.  

  •  RESTORE_POINT - If you set a unique restore point in the database immediately following an incremental backup (or at a later point in time), you can create a KEEP backup that will recover to that point-in-time.  When using this process, after creating the restore point you should ensure that you also perform a log switch, and a log sweep to back up the archive logs.  This restore point name is used as the default RESTORE_TAG, and should be unique.  The recommended name (because it is the default KEEP restore tag) is "<KEEP_BACKUP_><yyyyMMddHH24miSS>".  BUT, in order to better identify the restore point, I would use a shorter name that just contains the date (assuming you are only performing a single daily incremental backup), for example "KEEP_BACKUP_MMDDYY".  By using a restore point, you can better control the amount of archive logs necessary to recover the database.

 

    • Incremental forever backups ensure that the duration of the backup is much shorter than a typical full KEEP backup limiting the amount of archive logs necessary to have a recovery point.
    • Setting a restore point immediately following the backup ensures that the recovery window following the last datafile backup piece is short also limiting the amount of archive logs necessary.

  • RESTORE_UNTIL_SCN or RESTORE_UNTIL_TIME - I am grouping these 2 choices together because they are so similar.  Unlike using a restore point that is preset, using either of these options will create the KEEP archival backup with a recovery point of the SCN given or the UNTIL TIME given (using the database's timezone).
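
As a rough illustration of the date-based naming suggested above, the sketch below builds a "KEEP_BACKUP_MMDDYY" name and the two SQL statements you would run right after the incremental backup. The helper names are mine; CREATE RESTORE POINT and ALTER SYSTEM ARCHIVE LOG CURRENT are standard Oracle SQL, and the log sweep to the ZDLRA would still be a separate RMAN step that is not shown here.

```python
from datetime import datetime


def restore_point_name(backup_day):
    """Date-based name, assuming a single incremental backup per day."""
    return "KEEP_BACKUP_" + backup_day.strftime("%m%d%y")


def post_backup_statements(backup_day):
    """SQL to run right after the incremental backup completes."""
    name = restore_point_name(backup_day)
    return [
        f"CREATE RESTORE POINT {name};",
        # Force a log switch so the redo up to the restore point is archived.
        "ALTER SYSTEM ARCHIVE LOG CURRENT;",
    ]


for statement in post_backup_statements(datetime(2023, 9, 5)):
    print(statement)
```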


FROM_TAG - The documentation states that only backups containing the FROM_TAG will be considered if a FROM_TAG is set. I am thinking this would make sense if you let the restore point default to the current time, and you want to choose which backup pieces to include.  I am not sure of the full use of this option however.


WARNING: This process only looks back 14 days for a full backup to start the KEEP backup set with.  If you do not have a full backup within the 14 day window, this can be overridden with the MAX_REDO_TO_APPLY parameter in the package call. This was added in the 21.1 June PSU to allow customers to set a window longer than 14 days.
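
If you want to check ahead of time whether the default window is enough, something like the sketch below would do it. It assumes MAX_REDO_TO_APPLY is expressed in days, which matches how the 14 day window is described above but is not documented as of this writing, so treat the unit and the helper name as my assumptions.

```python
import math
from datetime import datetime

DEFAULT_MAX_REDO_TO_APPLY = 14  # default from the procedure signature


def required_max_redo_to_apply(last_full_backup, recovery_point):
    """Smallest window (assumed to be in days) that still reaches the
    most recent full backup before the chosen recovery point."""
    seconds = (recovery_point - last_full_backup).total_seconds()
    age_days = math.ceil(seconds / 86400)
    # Never go below the default; only widen the window when needed.
    return max(DEFAULT_MAX_REDO_TO_APPLY, age_days)
```

A result larger than 14 means the call needs an explicit MAX_REDO_TO_APPLY value.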

 RECOMMENDATIONS 

  •  Because you can create up to 2048 RESTORE_POINTs in a database, and normal restore points are automatically dropped when necessary, I would recommend creating a restore point following each incremental backup with the format mentioned above. This will allow you to create a self-contained FULL KEEP backup from any incremental backup as needed. This can be used to easily create an end-of-month KEEP backup (for example).

 

  • I would use the RESTORE_UNTIL options when it is necessary to create a KEEP backup as of a specific point-in-time regardless of when the backup completed. This would be used if the recovery point is critical.

WARNING

Before creating the archival backup, ensure that the archive logs needed to support the recovery point are backed up, and ensure there is enough time for the incremental backups to virtualize.  You may need to perform a log switch and execute an additional log sweep prior to scheduling the archival backup.

3. Set Archival Options


COMPRESSION_ALGORITHM - The default is no compression, and if the backup piece is already compressed, it will not try to compress the backup again.  The documentation does a good job of going through the options, and why you would choose one or the other.  Keep in mind that if your database uses TDE for all the datafiles, there will be no gain with compression, and the extra resources required for compression may slow down the restore.  Also, the compression is performed by the ZDLRA (RMAN compression), but the decompression is performed by the protected database during restore.

 ENCRYPTION_ALGORITHM - The default is no encryption, but it is important to understand that any copy-to-cloud processing MUST have encryption set.  It is also important to understand that the ZDLRA must be using OKV (Oracle Key Vault) to store the encryption keys when encryption is set. The list of algorithms can be found in the documentation.

 

4. Set Archival Location and Name

ATTRIBUTE_SET_NAME - This must be specified, and this identifies the backup location to send the archival backups.

FORMAT - By default the  backup pieces are given handles automatically generated by the ZDLRA, this setting allows you to change the default backup piece format using normal RMAN formatting options.

AUTOBACKUP_PREFIX - By default the autobackup pieces will retain the original names, but you can add a prefix to the original autobackup names.

 

5. Set Restore TAG

 The RESTORE_TAG defaults to "<KEEP_BACKUP_><yyyyMMddHH24miSS>". This can be overridden to give the backup a more meaningful tag. For example, the end-of-month backup could be tagged as "MONTHLY_12_2023", making it easier to automate finding specific KEEP backups.

 RECOMMENDATIONS 

I would set the Restore Tag to a set format that makes the KEEP backups easy to find. You can see the example above. 

6. Set KEEP_UNTIL time

The default KEEP_UNTIL time is "FOREVER". In most cases you want to set an end date for the backup, allowing the ZDLRA to automatically remove the backup when it expires.  This date-time is based on the timezone of the protected database. 
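
Putting steps 1 through 6 together, here is a sketch that assembles the PL/SQL call as text so it can be reviewed or scheduled. The parameter names come from the procedure signature earlier in this post; the database, attribute set, and tag values are made-up examples, and the helper functions are mine rather than anything supplied by the ZDLRA.

```python
def plsql_literal(value):
    """Quote a Python string as a PL/SQL string literal."""
    return "'" + value.replace("'", "''") + "'"


def build_archival_backup_call(db_unique_name, attribute_set_name,
                               restore_point=None, restore_tag=None,
                               keep_until_time=None):
    """Return an anonymous block calling DBMS_RA.CREATE_ARCHIVAL_BACKUP.

    keep_until_time is text such as '2030-12-31 00:00:00 +00:00'.
    Parameters left as None are omitted so the package defaults apply.
    """
    args = [
        ("db_unique_name", plsql_literal(db_unique_name)),
        ("attribute_set_name", plsql_literal(attribute_set_name)),
    ]
    if restore_point is not None:
        args.append(("restore_point", plsql_literal(restore_point)))
    if restore_tag is not None:
        args.append(("restore_tag", plsql_literal(restore_tag)))
    if keep_until_time is not None:
        args.append((
            "keep_until_time",
            f"TO_TIMESTAMP_TZ({plsql_literal(keep_until_time)}, "
            "'YYYY-MM-DD HH24:MI:SS TZH:TZM')",
        ))
    body = ",\n".join(f"    {name} => {value}" for name, value in args)
    return f"BEGIN\n  DBMS_RA.CREATE_ARCHIVAL_BACKUP(\n{body});\nEND;\n/"


print(build_archival_backup_call(
    db_unique_name="PRODDB",                # hypothetical database
    attribute_set_name="CLOUD_ATTRIBUTES",  # hypothetical attribute set
    restore_point="KEEP_BACKUP_123123",
    restore_tag="MONTHLY_12_2023",
    keep_until_time="2030-12-31 00:00:00 +00:00",
))
```

Generating the text rather than executing it keeps the sketch free of any database driver, and lets you eyeball the call before it runs against the appliance.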



 SUMMARY 

 If using this functionality to dynamically create Archival KEEP backups...

  • I would set a Restore Point in each database immediately following every incremental backup.  
  • I would schedule this procedure to create the archival backup with a formatted restore tag to make the backup easy to find.
  • If backing up to a CLOUD location, I would use retention rules to ensure the backups are immutable until they expire.

 

 

Monday, June 5, 2023

Autotuned_reserved_space is a new feature on the ZDLRA that you should be using

 Autotuned_reserved_space is a new policy setting that got released with 21.1 and you should be using it. When I talk to customers about how to manage databases on a ZDLRA, the biggest confusion comes in when I talk about reserved space.  Reserved space needs to be understood, and properly managed. This new feature in 21.1 allows the ZDLRA to handle the reserved space for you, and I explain how to use it in this blog post.  First let's go through space usage, and reserved space in general.

space usage ZDLRA

Space usage on the ZDLRA. 


Recovery Window goal (which drives the space utilization)

The recovery window goal is set at the policy level, and this value (in days) is the number of days that you want to keep as a recovery window for all databases that are a member of this policy.  This will drive the space utilization.

Total space

The ZDLRA comes with all the space pre-allocated.  When you are looking at OEM, or in the SAR report you will see the total space listed. You want to make sure that you have enough space for your database backups and any incoming new backups.

Used Space

When the ZDLRA purges backups beyond the Recovery Window Goal that you set, it does a bulk purge of backups.  This can be controlled by setting the maximum disk backup retention in days (which defaults to 1.5 times the recovery window goal).  Because of the bulk purge, more space is shown as used than is needed to support your recovery window goal.

Recovery Window Space

This is the amount of space that is needed to support the recovery window goal.  Because of the bulk purge, the recovery window space is less than the used space.


Reserved space

In order to control what happens with space, the concept of reserved space is used.  When a database is added to the ZDLRA, the reserved space value is set for this database.  This value should be updated regularly to ensure that there is enough space for the database backups to be stored.

The important things to know about reserved space are:
  • The sum of all the reserved space cannot be greater than the total space available on the ZDLRA.
  • When adding a new database, its reserved space must fit within the unreserved space.
  • When a new database is added, the reserved space must be set to at least the size of the database, and defaults to 2.5 times the size of the database.
  • The reserved space for a database needs to be at least the size of the largest datafile.
  • The reserved space should be larger than the amount of space needed to support the recovery window goal space for the database.  For databases with fluctuation, you need to reserve space for the peak usage. 
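
The sizing rules in this list are easy to check before you add a database. Below is a minimal sketch of those checks; the function names and the use of GB as the unit are my own choices for illustration, not anything the ZDLRA exposes.

```python
def default_reserved_space(db_size_gb):
    """Default reserved space is 2.5 times the database size."""
    return 2.5 * db_size_gb


def check_new_database(db_size_gb, largest_datafile_gb, reserved_gb,
                       total_gb, already_reserved_gb):
    """Return a list of rule violations (empty when the value is acceptable)."""
    problems = []
    if reserved_gb < db_size_gb:
        problems.append("reserved space is smaller than the database")
    if reserved_gb < largest_datafile_gb:
        problems.append("reserved space is smaller than the largest datafile")
    # The sum of all reserved space cannot exceed the total space.
    if reserved_gb > total_gb - already_reserved_gb:
        problems.append("reserved space does not fit in the unreserved space")
    return problems
```

For example, a 400 GB database defaults to 1000 GB of reserved space, which will not fit on an appliance that only has 500 GB left unreserved.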
The reserved space serves two purposes when properly set
  1. It can be used to determine how much space is available for new database backups.
  2. If the ZDLRA determines that it does not have enough space to support the recovery window goal of the supported databases, space is reclaimed from databases whose reserved space is too small.
It is critical to keep the reserved space updated, and many customers have used an automated process to set the reserved space to "recovery window space needed" + 10%

Unfortunately, configuring an automated process for all databases does not take into account any fluctuations in usage.  Let's say I have a database which is much busier at month end. I want to make sure that my reserved space is not adjusted down to the low value; I want it to stay adjusted based on the highest space usage value.
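
To see why the simple "+ 10%" automation falls short, compare it with a peak-based calculation over a history of "recovery window space needed" samples. This is only a sketch of the arithmetic; the sample values are invented and the function names are mine.

```python
def reserved_from_latest(samples_gb, headroom=0.10):
    """Naive automation: latest recovery window space plus 10%."""
    return samples_gb[-1] * (1 + headroom)


def reserved_from_peak(samples_gb, headroom=0.10):
    """Fluctuation-aware: highest observed value plus 10%."""
    return max(samples_gb) * (1 + headroom)


# Invented weekly samples with a month-end spike in the third week.
history = [500, 520, 800, 510]
print(round(reserved_from_latest(history)), round(reserved_from_peak(history)))
```

With this history the naive process would shrink the reservation right after the month-end spike, while the peak-based value keeps room for the next one.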

Autotuned_reserved_space 


This is where autotuned reserved space can help you manage the reserved space.  This setting is controlled at the policy level.

AUTOTUNED_RESERVED_SPACE

This value is set at the protection policy level and contains either "YES" or "NO", and defaults to "NO". "YES" will allow the ZDLRA to manage reserved space automatically for all databases (whose disk_reserved_space is not set) and are a member of this policy.

MAX_RESERVED_SPACE


This value is also set at the protection policy level.  This value is optional for autotuned_reserved_space, but if set, it will control the maximum amount of reserved space that can be set for an individual database in the protection policy. 

AUTOTUNE_SPACE_LIMIT


This value is set at the storage level for ALL databases. This sets a reserved space usage limit, where autotuning can slow down large reserved space increases. When reached, autotune will limit databases from increasing their reserved space growth to 10% per week.  This value is optional and will default to the total space if not set.  


SUMMARY:

  • autotuned_reserved_space - Enables autotuning of space within a protection policy
  • max_reserved_space - Controls the maximum reserved space of databases in a protection policy
  • autotune_space_limit - Slows the reserved space growth when a specified space limit is reached.

What does autotune reserved space do ?

  • On a regular basis, if needed, the reserved space for each autotune controlled database is adjusted to reserve space for the recovery window goal, and incoming backups.
  • If the database has a disk_reserved_space set, autotuning will not be used for this database.  It is assumed that the disk_reserved_space will be set manually for this database
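
Here is how I picture the three settings interacting, written as a small sketch. It is my reading of the descriptions above and not the appliance's actual algorithm; in particular, comparing the total reserved space with autotune_space_limit to decide when the limit is "reached" is an assumption, and all names are illustrative.

```python
WEEKLY_GROWTH_CAP = 0.10  # growth allowed per week once the limit is reached


def is_autotuned(policy_autotuned, disk_reserved_space_gb):
    """A database is autotuned when its policy says YES and no
    disk_reserved_space has been set manually for it."""
    return policy_autotuned == "YES" and disk_reserved_space_gb is None


def next_reserved_space(current_gb, needed_gb, total_reserved_gb,
                        autotune_space_limit_gb, max_reserved_space_gb=None):
    """Reserved space autotune might set for one database this week."""
    target = needed_gb
    # Assumption: the limit counts as reached when the sum of reserved
    # space is at or above autotune_space_limit.
    limit_reached = total_reserved_gb >= autotune_space_limit_gb
    if limit_reached and target > current_gb:
        target = min(target, current_gb * (1 + WEEKLY_GROWTH_CAP))
    # max_reserved_space caps any individual database in the policy.
    if max_reserved_space_gb is not None:
        target = min(target, max_reserved_space_gb)
    return target
```

Below the limit a database that needs 150 GB simply gets 150 GB; at the limit the same database would only grow from 100 GB to about 110 GB in the first week.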

Autotune will replace the need for the ZDLRA admin to constantly update the reserved space for each database as its space needs change over time. It will also allow them to configure a constant reserved space for databases with fluctuating storage usage.

Wednesday, May 10, 2023

ZDLRA real-time redo demonstrated

 One of the key features of the ZDLRA is the ability to capture changes from the database "real-time" just like a standby database does. In this blog post I am going to demonstrate what is happening during this process so that you can get a better understanding of how it works.

ZDLRA Real-time Redo


If you look at the GIF above, I will explain what is happening, and show what happens with a demo of the process.

The ZDLRA uses the same process as a standby database.  In fact, if you look at the flow of the real-time redo you will notice the redo blocks are sent to BOTH the local redo log files AND the staging area on the ZDLRA.  The staging area on the ZDLRA acts just like a standby redo log does on a standby database.

As the ZDLRA receives the REDO blocks from the protected database they are validated to ensure that they are valid Oracle Redo block information.  This ensures that a man-in-the-middle attack does not change any of the backup information.  The validation process also assures that if the database is attacked by ransomware (changing blocks), the redo received is not tainted.


The next thing that happens during the process is the logic when a LOG SWITCH occurs.  As we all know, when a log switch occurs on a database instance, the contents of the redo log are written to an archive log.  With real-time redo, this causes the contents of the redo staging area on the ZDLRA (picture a standby redo log) to become a backup set of an archive log.  The RMAN catalog on the ZDLRA is then updated with the internal location of the backup set.
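
If it helps to think about it in code, here is a toy model of the flow just described. It is purely illustrative and has nothing to do with how the appliance is implemented: changes go to both the local online log and the staging area, a log switch turns the staged redo into a cataloged archive log backup, and a crash without a log switch still leaves the partial staged redo available as a backup.

```python
class ToyRealTimeRedo:
    """Toy model of real-time redo between a database and the ZDLRA."""

    def __init__(self, first_sequence=1):
        self.sequence = first_sequence
        self.online_log = []      # local online redo log
        self.staging = []         # redo staging area on the appliance
        self.local_archives = {}  # sequence -> archived log on the DB host
        self.catalog = {}         # sequence -> archive log backup on the appliance

    def write(self, change):
        # Redo is sent to BOTH the local redo log and the staging area.
        self.online_log.append(change)
        self.staging.append(change)

    def log_switch(self):
        # The online log is archived locally and the staged redo becomes
        # a cataloged archive log backup on the appliance.
        self.local_archives[self.sequence] = list(self.online_log)
        self.catalog[self.sequence] = list(self.staging)
        self.online_log.clear()
        self.staging.clear()
        self.sequence += 1

    def crash(self):
        # No log switch happens, so nothing is archived locally, but the
        # appliance builds a backup from the partial staged redo.
        self.catalog[self.sequence] = list(self.staging)
        self.staging.clear()


rtr = ToyRealTimeRedo(first_sequence=11)
rtr.write("insert row")
rtr.log_switch()
rtr.write("create table")
rtr.crash()
print(sorted(rtr.catalog))
```

After the crash, sequence 12 exists only in the appliance catalog, which is the situation the demo later in this post walks through.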


Log switch operation

I am going to go through a demo of what you see happen when this process occurs.

ZDLRA is configured as a redo destination

Below you can see that my database has a "Log archive destination" 3 configured.  The destination itself is the database on the ZDLRA (zdl9), and also notice that the log information will be sent for ALL_ROLES, which will send the log information regardless if it is a primary database or a standby database.
Archive Dest


List backup of recent archive logs from RMAN catalog


Before I demonstrate what happens with the RMAN catalog, I am going to list out the current archive log backup. Below you see that the current archive log backed up to the ZDLRA has the "SEQUENCE #10".

archive log backups prior

Perform a log switch

As you see in the animation at the top of the post, when a log switch occurs, the contents of the redo log in the "redo staging area" are used to create an archive log backup that is stored and cataloged.  I am going to perform a log switch to force this process.

Log switch


List backup of archive logs from RMAN catalog

Now that the log switch occurred, you can see below that there is a new backup set created from the redo staging area.
There are a couple of interesting items to note when you look at the backup set created.

archive logs after


  1. The backup of the archive log is compressed.  As part of the policy on the ZDLRA you have the option to have the backup of the archive log compressed when it is created from the "staged redo". This does NOT require the ACO (Advanced Compression) license. The compressed archive log will be sent back to the DB compressed during a restore operation, and the DB host will uncompress it.  This is the default option (standard compression) and I recommend changing it.  If you decide to compress, then MEDIUM or LOW is recommended. Keep in mind that this may put more workload on the client to uncompress the backup sets, which may affect recovery times.  NOTE: When using TDE, there will be little to no compression possible.
  2. The TAG is automatically generated. By looking at the timestamp in the RMAN catalog information, you can see that the TAG is automatically generated using the timestamp to make it unique.
  3. The handle begins with "$RSCN_", this is because the backup piece was generated by the ZDLRA itself, and archivelog backup sets will begin with these characters.

Restore and Recovery using partial log information


Now I am going to demonstrate what happens when the database crashes, and there is no time for the database to perform a log switch.

List the active redo log and current SCN

Below you can see that my currently active redo log is sequence # 12.  This is where I am going to begin my test.

begin test


Create a table 

To demonstrate what happens when the database crashes I am going to create a new table. In the table I am going to store the current date, and the current SCN. Using the current SCN we will be able to determine the redo log that contains the table creation.

table create


Abort the database


As you probably know, if I shut down the database gracefully, the DB will automatically clean out the redo logs and archive their contents. Because I want to demonstrate what happens with a crash, I am going to shut the database down with an ABORT to ensure the log switch doesn't occur.  Then I will start the database in mount mode so I can look at the current redo log information.

abort


Verify that the log switch did not occur


Next I am going to look at the REDO Log information and verify that my table creation (SCN 32908369) is still in the active redo log and did not get archived during the shutdown.

Log switch doesn't occur

Restore the database


Next I am going to restore the database from backup.


restore

Recover the database


This is where the magic occurs, so I am going to show what happens step by step.

Recover using archive logs on disk


The first step the database does is to use the current archive logs to recover the database. You can see in the screenshot below that the database recovers the database using archive logs on disk up to sequence #11 for thread 1.  This contains all the changes for this thread, but does not include what is in the REDO log sequence #12.  Sequence #12 contains the create table we are interested in.

archives on disk

Recover using partial redo log


This step is where the magic of the ZDLRA occurs.  You can see from the screen shot below that the RMAN catalog on the ZDLRA returns the redo log information for Sequence #12 even though it was never archived. The ZDLRA was able to create an archive log backup from the partial contents it had in the Redo Staging area.

rtr recovery

Open the database and display table contents.


This is where it all comes together.  Using the partial redo log information from Redo Log sequence #12, you can see that when the database is opened, the table creation transaction is indeed in the database even though the redo did not become an archive log.


Conclusion: I am hoping this post gives you a better idea of how real-time redo works on the ZDLRA, and how it handles recovering transactions after a database crash.

Wednesday, July 28, 2021

A New ZDLRA feature can help you migrate to a new ZDLRA

 A new feature was included in the 19.2.1.1.2 ZDLRA software release to help you migrate your backup strategy when moving to a new ZDLRA.


This feature allows you to continue to access your older backups during the cutover period directly from the new ZDLRA.  You point your database restore to the new ZDLRA and it will automagically access the older backups if necessary. Once the cutover period has passed, the old ZDLRA can be retired.

I am going to walk through the steps.

1. Configure new ZDLRA

  • Add the new ZDLRA to OEM - The first step is to ensure that the new ZDLRA has been registered within your OEM environment. This will allow it to be managed, and of course monitored.
  • Add a replication VPC user to the new ZDLRA. This will be used to connect from the old ZDLRA.
  • Add the VPC users on the new ZDLRA that match the old ZDLRA
  • Configure policies on new ZDLRA to match old ZDLRA.
          This can be done by dynamically executing DBMS_RA.CREATE_PROTECTION_POLICY. 
           Current protection policy information can be read from the RA_PROTECTION_POLICY view.
  • Add databases to proper protection policies on new ZDLRA.
        This can be done by dynamically executing DBMS_RA.ADD_DB. 
        Current database information can be read from the RA_DATABASE view.

  • Grant the replication VPC user access to all databases for replication.
        This can be done by dynamically executing DBMS_RA.GRANT_DB_ACCESS
        The current list of databases can be read from the RA_DATABASE view.

  • Grant the VPC users access to the database for backups/restores
        This can be by dynamically executing DBMS_RA.GRANT_DB_ACCESS
        The current list of grants can be read from the RA_DB_ACCESS view
  • Create a replication server on the old ZDLRA that points to the new ZDLRA
  • Add the protection policies on the old ZDLRA to the replication server created previously.

NOTE: When these steps are completed, the old ZDLRA will replicate the most recent L0 to the new ZDLRA, and will then replicate all new incremental backups and archive logs.
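The "dynamically executing" bullets above can be sketched as a small generator. This Python sketch only builds PL/SQL text; it does not connect to anything. The input shapes (dicts with db_unique_name, policy_name and reserved_space keys) and the helper names are hypothetical, and the named parameters passed to DBMS_RA.ADD_DB and DBMS_RA.GRANT_DB_ACCESS are assumptions that should be checked against the DBMS_RA reference for your release before anything is run.

```python
def plsql_literal(value):
    """Quote a string as a PL/SQL literal, doubling embedded single quotes."""
    return "'" + value.replace("'", "''") + "'"


def add_db_call(db_unique_name, policy_name, reserved_space):
    # Parameter names are assumed; verify them in the DBMS_RA documentation.
    return (
        "BEGIN DBMS_RA.ADD_DB("
        f"db_unique_name => {plsql_literal(db_unique_name)}, "
        f"protection_policy_name => {plsql_literal(policy_name)}, "
        f"reserved_space => {plsql_literal(reserved_space)}); END;"
    )


def grant_db_access_call(username, db_unique_name):
    # Parameter names are assumed; verify them in the DBMS_RA documentation.
    return (
        "BEGIN DBMS_RA.GRANT_DB_ACCESS("
        f"username => {plsql_literal(username)}, "
        f"db_unique_name => {plsql_literal(db_unique_name)}); END;"
    )


def migration_script(databases, vpc_users):
    """Build one ADD_DB per database, then one grant per (user, database)."""
    statements = []
    for db in databases:
        statements.append(
            add_db_call(db["db_unique_name"], db["policy_name"], db["reserved_space"])
        )
        for user in vpc_users:
            statements.append(grant_db_access_call(user, db["db_unique_name"]))
    return statements


if __name__ == "__main__":
    # Hypothetical rows, as they might be read from the old appliance's views.
    rows = [{"db_unique_name": "PRODDB", "policy_name": "GOLD", "reserved_space": "500G"}]
    for statement in migration_script(rows, ["REPL_USER"]):
        print(statement)
```

Generating the statements as text first makes it easy to review them before they are executed on the new ZDLRA.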




2. Switch to new ZDLRA for backups

  • Update the wallet on all clients to include the VPC user/Scan listener of the new ZDLRA.
  • Update the real-time redo configuration (if using real-time redo) to point to the new ZDLRA.
  • Update backup jobs to open channels to the new ZDLRA
  • Remove the VPC replication user from the new ZDLRA  
  • Drop the replication server on the old ZDLRA
NOTE: The backups will begin with an incremental backup based on the contents of the new ZDLRA and will properly create a "virtual full". Archive logs will automatically pick up with the sequence number following the last log replicated from the old ZDLRA.



3. Configure "Read-Only Mode" replication to old ZDLRA

  • Add a replication VPC user on the old ZDLRA. This will be used to connect from the new ZDLRA.
  • Create a replication server from new ZDLRA to the old ZDLRA
  • Grant the replication VPC user on the old ZDLRA access to all databases for replication.
        This can be done by dynamically executing DBMS_RA.GRANT_DB_ACCESS
        The current list of databases can be read from the RA_DATABASE view.
  • Add a replication server for each policy that includes the "Read-Only" flag set to "YES".
NOTE: this will allow the new ZDLRA to pull backups from the old ZDLRA that only exist on the old ZDLRA.
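As a rough mental model of what "Read-Only Mode" buys you during the cutover, here is a toy Python sketch of the lookup. The function name and the use of plain sets to stand in for each appliance's backup pieces are hypothetical; the real appliance resolves this internally.

```python
def locate_backup(piece, new_appliance, old_appliance, read_only_replication):
    """Return which appliance would serve a backup piece, or None if unavailable."""
    if piece in new_appliance:
        return "new"
    # Older pieces are reachable only while the read-only replication
    # server from the new ZDLRA to the old ZDLRA is still configured.
    if read_only_replication and piece in old_appliance:
        return "old"
    return None


if __name__ == "__main__":
    new_pieces = {"L0_recent", "L1_mon"}
    old_pieces = {"L0_last_month", "L0_recent"}
    print(locate_backup("L0_last_month", new_pieces, old_pieces, True))
    print(locate_backup("L0_last_month", new_pieces, old_pieces, False))
```

The second call illustrates why the replication server should stay in place until the cutover period has passed.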


4. Retire old ZDLRA after cutover period

  • Remove replication server from new ZDLRA that points to old ZDLRA
NOTE: The old ZDLRA can now be decommissioned.



That's all there is to it. This will allow you to restore from the new ZDLRA, and not have to keep track of which backups are on which appliance during the cutover window !

Monday, March 29, 2021

ZDLRA adds smart incremental to be even smarter.

 Recently version 19.1.1.2 of ZDLRA software was released, and one of the features is something called "Smart Incremental".  I will walk through how this feature works, and help you understand why features like this are "ZDLRA Only".




I am going to start by walking through how incremental backups become "virtual full backups", and that will give you a better picture of how "smart incremental" is possible.

The most important thing to understand about these features is that the RMAN catalog itself is within the ZDLRA  AND the ZDLRA has the ability to update the RMAN catalog.

How does a normal backup strategy work ? 

That is probably the best place to start.  What DBAs typically do is perform a WFDI (Weekly Full Daily Incremental) backup.  To keep my example simple, I will use the following assumptions.
  • My database contains 3 datafiles: SYSTEM, SYSAUX, and USERS. I will only use the example of backing up datafile USERS.
  • Each of these 3 datafiles is 50 GB in size.
  • I am only performing a differential backup which creates a backup containing the changes since the last backup (full OR incremental).
  • My database is in archivelog  *
* NOTE: With ZDLRA you can back up a nologging database, and still take advantage of virtual fulls. The database needs to be in a MOUNTED state when performing the incremental backup.

If placed in a table, the backups for datafile USERS would look like this. Checkpoint SCN is the current SCN number of the database at the start of the backup.



If I were to look at what is contained in the RMAN catalog (RC_BACKUP_DATAFILE), I would see the same backup information, but I would see the SCN information in 2 columns.
  • Incremental change # is the oldest SCN contained in the backupset. This is the starting SCN number of the previous backup that this backup is based on.
  • Checkpoint Change # is the starting SCN number of the backup.  Everything newer than this SCN (including this SCN) needs to be defuzzied.


Normal backup progression (differential)


When performing an incremental RMAN backup of a datafile, the first thing that RMAN does is decide which blocks need to be backed up. Because you are performing an incremental backup, you may be backing up all of the blocks, only some of the blocks, or even none of the blocks if the file has not changed.
This is a decision RMAN makes by querying the RMAN catalog entries (or the controlfile entries if you are not using an RMAN catalog).

Now let's walk through this decision process.  Each RMAN incremental differential's starting SCN is based on the beginning SCN of the previous backup (except for the full).



By looking at the RMAN catalog (or controlfile), RMAN determines  which blocks need to be contained in each incremental backup.
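To make the decision process concrete, here is a minimal Python sketch of a differential incremental. It is a toy model, not RMAN's actual implementation: the CatalogEntry class, the function names, and the idea of a dict mapping block numbers to their last-change SCN are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class CatalogEntry:
    day: str
    level: int            # 0 = full, 1 = incremental
    incremental_scn: int  # oldest SCN covered by the backup (0 for a full)
    checkpoint_scn: int   # SCN at the start of the backup


def next_differential(catalog, day, current_scn):
    """A differential starts at the checkpoint SCN of the most recent backup,
    whether that backup was a full or an incremental."""
    previous = catalog[-1]
    return CatalogEntry(day, 1, previous.checkpoint_scn, current_scn)


def blocks_to_back_up(block_scns, entry):
    """Select blocks whose last-change SCN is at or after the starting SCN."""
    return sorted(b for b, scn in block_scns.items() if scn >= entry.incremental_scn)


if __name__ == "__main__":
    catalog = [CatalogEntry("Sunday", 0, 0, 1000)]
    monday = next_differential(catalog, "Monday", 2000)
    catalog.append(monday)
    tuesday = next_differential(catalog, "Tuesday", 3000)
    # Block 2 changed after Sunday's full; block 3 changed after Monday's backup.
    block_scns = {1: 500, 2: 1500, 3: 2500}
    print(monday.incremental_scn, blocks_to_back_up(block_scns, monday))
    print(tuesday.incremental_scn, blocks_to_back_up(block_scns, tuesday))
```

Each incremental only has to look back as far as the backup before it, which is why the catalog entries chain from one checkpoint SCN to the next.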



Normal backup progression (cumulative differential)


Up to release 19.1.1.2, the recommendation was to perform a Cumulative Differential backup. The cumulative differential backup uses the starting SCN number of the last full backup as the starting point of the incremental backup (rather than that of the last incremental backup).
The advantage of the cumulative over the differential is that a single cumulative backup can be applied to the last full and take the place of applying multiple differential backups.  However, cumulative backups get bigger with every day that passes between full backups because they contain all blocks changed since the last full.

Below is what a cumulative schedule would look like and you can compare this to the differential above.
You can see that each cumulative backup starts with the Checkpoint SCN of the last full to ensure that all blocks changed since the full backup started are captured.



The RMAN catalog entries would look like this.




If you were astute, you would notice a few things happened with the cumulative differential vs the differential.
  • The backup size got bigger every day
  • The time it took to perform the incremental backup got longer
  • The range of SCNs contained in the incremental is larger for a cumulative backup.
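The growth in backup size can be illustrated with a few lines of Python. This is a simplified model under my own assumptions: each day is represented by the set of block numbers changed that day, and backup size is just a count of blocks. The function name is hypothetical.

```python
def incremental_sizes(daily_changed_blocks, cumulative):
    """Return the number of blocks backed up on each day after a full backup.

    daily_changed_blocks: list of sets, one set of changed block ids per day.
    cumulative: True for cumulative backups, False for differential backups.
    """
    sizes = []
    changed_since_full = set()
    for changed_today in daily_changed_blocks:
        changed_since_full |= changed_today
        if cumulative:
            # Everything changed since the last full is backed up again.
            sizes.append(len(changed_since_full))
        else:
            # Only what changed since the previous backup is backed up.
            sizes.append(len(changed_today))
    return sizes


if __name__ == "__main__":
    week = [{1, 2}, {2, 3}, {4}]
    print("differential:", incremental_sizes(week, cumulative=False))
    print("cumulative:  ", incremental_sizes(week, cumulative=True))
```

The cumulative sizes never shrink between fulls, which is also why the backup window gets longer as the week goes on.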

ZDLRA backup progression (cumulative differential)

As you most likely know, one of the most important features of the ZDLRA is the ability to create a "virtual full" from an incremental backup.

If we look at what happens with a cumulative differential (from above), I will fill in the virtual full RMAN catalog entries by shading them light green.

The process of performing backups on the ZDLRA is exactly the same as it is for the above cumulative, but the RMAN catalog looks like this.


What you will notice by comparing this to the normal cumulative process is that:
  • For every cumulative incremental backup there is a matching virtual full backup.  The virtual full backup appears (from the newly inserted catalog entry) to have been taken at the same time, and with the same starting SCN number, as the cumulative incremental. The catalog entries for virtual full backups and incremental backups match on both time and SCN.
  • The size of the virtual full is 0 since it is virtual and does not take up any space.
  • The completion time for the cumulative incremental backup is the same as the differential backups.  Because the RMAN logic can see the virtual full entry in the catalog, it executes the cumulative incremental EXACTLY as if it is the first differential incremental following a full backup.
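Here is a small Python sketch of why a cumulative backup behaves like a differential once virtual fulls are in the catalog. It is a toy model under my own assumptions: the Entry class and both function names are hypothetical, and the real catalog holds far more than a level and an SCN.

```python
from dataclasses import dataclass


@dataclass
class Entry:
    level: int            # 0 = full, 1 = incremental
    checkpoint_scn: int   # SCN at the start of the backup
    virtual: bool = False


def cumulative_start_scn(catalog):
    """A cumulative backup starts at the checkpoint SCN of the latest level 0."""
    fulls = [e for e in catalog if e.level == 0]
    return max(e.checkpoint_scn for e in fulls)


def record_incremental_on_zdlra(catalog, checkpoint_scn):
    """Record the incremental plus the matching virtual full the ZDLRA inserts.
    Returns the starting SCN the incremental used."""
    start_scn = cumulative_start_scn(catalog)
    catalog.append(Entry(1, checkpoint_scn))
    catalog.append(Entry(0, checkpoint_scn, virtual=True))
    return start_scn


if __name__ == "__main__":
    catalog = [Entry(0, 1000)]  # Sunday's real full backup
    print(record_incremental_on_zdlra(catalog, 2000))  # Monday
    print(record_incremental_on_zdlra(catalog, 3000))  # Tuesday
```

Because each night's virtual full becomes the "last full", the next cumulative only needs the blocks changed since the previous night.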

Smart Incremental backups

Now all of this leads us to smart incremental backups. Sometimes the cumulative backup process doesn't work quite right.  A few of the reasons this can happen are:

  • You perform a full backup to a backup location other than the ZDLRA. This could be because you are backing up to the ZDLRA for the first time to replace a current backup strategy, or maybe you created a special backup to disk to seed a test environment (using a keep backup for this will alleviate this issue).  The cumulative incremental backup will compare against the last full regardless of where it was taken (there are exceptions if you always use tags to compare).
  • You implement TDE or rekey the database.  Implementing TDE (Transparent Data Encryption) changes the blocks, but does not change the SCN numbers of the blocks. A new full backup is required.
Previously, you would have to perform a special full backup to correct these issues. In the example below you can see what happens (without smart incremental) to the RMAN catalog if you perform a backup on Thursday at 12:00 to disk to refresh a development environment.



Since the cumulative backups are based on the last full backup, the Thursday - Saturday backups contain all the blocks that have changed since the disk backup started on Thursday at 12:00.
And, since it is cumulative, each day's backup is larger and takes longer.

This is when you would typically have to force a new level 0 backup of the datafile.


What the smart incremental does

Since the RMAN catalog is controlled by the ZDLRA it can correct the problem for you. You no longer need to perform cumulative backups as the ZDLRA can fill in any issues that occur.

In the case of the full backup to disk, it can "hide" that entry, and continue to correctly perform differential backups. It would "hide" the disk backup that occurred, and inform the RMAN client that the last full backup as of Thursday night is NOT the disk backup, but the previous virtual full backup.


 In the case of the TDE, it can "hide" all of the Level 0 virtual full backups, and the L1 differential backups (which will force a new level 0).
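The two "hide" cases can be sketched in Python as follows. This is only a toy model of the idea described in this post, not how the ZDLRA implements it: the Entry class, the hidden flag, and the function names are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Entry:
    level: int            # 0 = full (real or virtual), 1 = incremental
    checkpoint_scn: int
    device: str           # "ZDLRA" or "DISK"
    hidden: bool = False


def last_visible_full(catalog):
    """The full backup the RMAN client would base its next incremental on."""
    visible = [e for e in catalog if e.level == 0 and not e.hidden]
    return max(visible, key=lambda e: e.checkpoint_scn, default=None)


def hide_foreign_fulls(catalog):
    """Case 1: hide full backups that were not taken to the ZDLRA."""
    for entry in catalog:
        if entry.level == 0 and entry.device != "ZDLRA":
            entry.hidden = True


def hide_all_for_tde(catalog):
    """Case 2: hide every existing entry so the next backup is a new level 0."""
    for entry in catalog:
        entry.hidden = True


if __name__ == "__main__":
    catalog = [
        Entry(0, 1000, "ZDLRA"),  # Sunday full
        Entry(1, 3000, "ZDLRA"),  # Wednesday incremental
        Entry(0, 3000, "ZDLRA"),  # Wednesday virtual full
        Entry(0, 3500, "DISK"),   # Thursday 12:00 backup to disk
    ]
    print(last_visible_full(catalog).checkpoint_scn)  # the disk backup wins
    hide_foreign_fulls(catalog)
    print(last_visible_full(catalog).checkpoint_scn)  # back to the virtual full
    hide_all_for_tde(catalog)
    print(last_visible_full(catalog))                 # nothing visible: new level 0
```

In both cases the client's backup command stays the same; only what the catalog reports as the last full changes.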





All of this is done without updating the DB client version. All the magic is done within the RMAN catalog on the ZDLRA.

Now isn't that smart ?



Friday, March 26, 2021

ZDLRA leverages the ZFS object store in the newest release

Yes, Using ZFSSA as an on-prem object store with ZDLRA is here, and How to configure Zero Data Loss Recovery Appliance to use ZFS OCI Object Storage as a cloud repository (Doc ID 2761114.1) shows you how.


Above is the diagram from Tim Chien's "Ask Tom" session on the new feature with ZDLRA release 19.1.1.2.

 For those who have been reading my blog posts, and wondering why the sudden interest in ZFS as an object store, here is another reason.

The idea behind this is pretty simple. Many customers are looking for an additional tier of storage behind the ZDLRA for 2 reasons:
  • They want to extend the recovery window onto a lower tier of storage. This may include going from a full "any point in time" recovery to a set of "recovery points".
  • They want an archival backup for a long period of time that is a set backup point.  Keep backups are the perfect example of this. With Keep backups you get a self-contained restore point of your choosing.
Now for the magic of how all this works.

1. The first step is to configure your ZFSSA as an OCI object store. As long as you are on the latest patched release of OS 8.8, this functionality is available to you.  If you are unfamiliar with how to do this, I have walked through the configuration steps in previous posts. Below are some places to start.


Also, here is the documentation from ZFS.

2. The second step is to configure Oracle Key Vault (OKV), which is a licensed product. Key Vault is a centralized encryption key management system that is used to store the master encryption key for the backups.  OKV is released as a virtual image that can be installed on physical hardware, or in a virtual environment. The installation is self-contained and walks through a series of questions to finish the configuration.  Easy.
  WHY do I need TDE ?  I'm sure you are asking this question.  The "Copy-to-cloud" functionality of the ZDLRA is being utilized to present ZFS as an "OCI cloud store".  It acts just like an object store in the Oracle Public Cloud.  The only difference is that there is no "ARCHIVE" tier on ZFS.  Since ZFS is considered a "Cloud destination", it follows the Larry rule that "All data in the Cloud is encrypted.". Because of that, the backups going to ZFS will be RMAN encrypted (no license needed for this part).  The ZDLRA uses OKV to store the master keys used to encrypt the RMAN backupsets.

3. The third step is to configure the ZDLRA to utilize OKV as a client, and to point the ZDLRA to your ZFS.
  One of the great things of using this solution is that the process is exactly the same as configuring the ZDLRA to send backups to the Oracle public cloud. This link points to the documentation that makes it clear how to configure this process.

That's all there is to it. The most complicated task is configuring the authentication for the OCI object store on ZFS, as it requires setting up a public and private key.

Now to walk through the workflow.

Backups -- Below is the backup workflow from the presentation.  The ZDLRA creates an RMAN backupset from the backup pieces on the ZDLRA. This backupset is an RMAN encrypted backupset.




One item that is NOT mentioned on this slide is compression.  If your database is using TDE, then the backup cannot be compressed when sent to the ZFS because the ZDLRA does not have the encryption master key for the database.  BUT, if your database is NOT TDE enabled, then you should be using compression when sending the backups to the ZFS. As I said earlier, the backupset is an RMAN encrypted backupset. Because it is already encrypted when sent to ZFS, the ZFS will be unable to compress the backups.  You can find instructions to add compression in the documentation for creating a job template.  There is a setting for the template called

Compression_algorithm=> 
By implementing compression on the ZDLRA you are:
  • Decreasing the size of the backups on the ZFS.
  • Decreasing the network traffic between the ZDLRA and ZFS, as the data is compressed before it is sent to ZFS. This can double the throughput for backups and restores.
Keep in mind, that if you restore directly to your database host from the ZFS Object store, the database host will be performing the uncompression.

Restores - Below is the restore workflow. Typically you would utilize the catalog on the ZDLRA and let the ZDLRA be the conduit for uncompressing (if it was compressed when sent to ZFS), and unencrypting it, as the ZDLRA encrypted it.  The ZDLRA already has the credentials for the object store, and it has the Encryption master key available to it from OKV.



Alternately you can restore the backups directly from the ZFS object store.
This would be a 3 step process.

1) You would download the Oracle Database Cloud Backup Module. Once downloaded, you would configure the database to utilize the OCI object store. The link above also contains links to documentation for the module, and to a MOS note containing the FAQ.  Keep in mind that in this case you are configuring the module for the on-premises ZFS (rather than the Oracle public cloud), and the instructions may have to be modified. The table below gives you an idea of the differences.


2) You would catalog the backup pieces. If the RMAN catalog is not available (for some reason) the MOS note mentioned below contains detail on how to list what is in the object store, and how to clean it out.

How to report or delete backup pieces stored in Cloud Object Storage by Database Backup Cloud Service without using RMAN (Doc ID 2360800.1)

The script contained in the MOS note (odbsrmt.py) should work with a few minor changes to the instructions (since we are talking about an on-prem ZFS).  I will continue to work through the changes and post the results in a future blog post.


3) You would register the restore location as an OKV endpoint (if it isn't already registered), OR you can alternately export the encryption key and create a wallet file.






Conclusion - This is a very exciting addition to the many features that the ZDLRA already provides.

Tuesday, February 2, 2021

ZDLRA - Using Protection Policies to manage databases that have migrated or to be retired

 One of the questions that keeps coming up with ZDLRA is how to manage the backups for a database that has either

  • Been migrated to another ZDLRA
  • Been retired, but the backup needs to be kept for a period of time












The best way to deal with this is through the use of Protection Policies.

How Protection Policies work:


If you remember right, Protection Policies are a way of grouping together databases that have the same basic characteristics.

The most important of which are :

Name/Description             - Used to identify the Protection Policy
Recovery Window Goal    - How many days of recovery do you want to store at a minimum 
Max Retention Window    - (Optional) Maximum number of days of backups you want to keep
Unprotected Window        - (Optional) Used to set alerts for databases that are no longer receiving recovery data.

One of the common questions I get is.. What happens if I change the Protection Policy associated with my database ?

Answer:  By changing the Protection Policy a database is associated with, you are only changing the metadata.  Once the change is made, the database follows the rules of the Protection Policy it is now associated with, and is no longer associated with the old Protection Policy.

How this plays out with a real example is... 
My Database (PRODDB) is a member of a Protection Policy (GOLD) which has a Recovery Window Goal of 20 days, and a Max Retention Window of 40 days (the default value being 2x the Recovery Window Goal).
My Database (PRODDB) currently has 30 days of backups, which is right in the middle. 



 What would normally happen for this database is that (given enough space) backups will continue to be kept until PRODDB has 40 days of backups.  On day 41, a maintenance job (which runs daily) will execute, and find that my database, PRODDB, has exceeded its Max Retention Window.  This job will remove all backups (in a batch process for efficiency) that are older than the 20 day Recovery Window Goal.

BUT ........................

Today, I moved my database, PRODDB, to a new protection policy (Silver) which only has a 10 day Recovery Window Goal, and a Max Retention Window of 20 days.


As I pointed out, the characteristics of the NEW Protection Policy will be used, and the next time the daily purge occurs, this database will be flagged, and all backups greater than the Recovery Window Goal will be purged.
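The effect of the policy change can be sketched with a few lines of Python. This is a simplified model of the daily purge as described in this post, under my own assumptions: backups are represented only by their age in days, space pressure is ignored, and the function name is hypothetical.

```python
def daily_purge(backup_ages, recovery_window_goal, max_retention_window):
    """Return the backup ages (in days) that survive one run of the daily job.

    Nothing is purged until the oldest backup exceeds the Max Retention
    Window; at that point everything older than the Recovery Window Goal
    is removed in one batch.
    """
    if not backup_ages or max(backup_ages) <= max_retention_window:
        return sorted(backup_ages)
    return sorted(age for age in backup_ages if age <= recovery_window_goal)


if __name__ == "__main__":
    thirty_days = list(range(1, 31))  # PRODDB currently has 30 days of backups

    # GOLD: Recovery Window Goal 20, Max Retention Window 40 -> nothing purged yet.
    print(len(daily_purge(thirty_days, 20, 40)))

    # Silver: Recovery Window Goal 10, Max Retention Window 20 -> purged to 10 days.
    print(len(daily_purge(thirty_days, 10, 20)))
```

The same 30 days of backups are untouched under GOLD but cut back to the Recovery Window Goal under Silver, because only the policy metadata changed.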





Retiring databases: - 

One very common question is how to handle the retiring of a database.  As you might know, when you remove a database from the ZDLRA, ALL backups are removed from the ZDLRA.
When a database is no longer sending backups to the ZDLRA, the backups will continue to be purged until only a single level 0 backup remains.  This is to ensure that at least one backup is kept, regardless of the Max Retention Window.
The best way to deal with retiring a database (and still keep the last level 0 backup) is through the use of Protection Policies.
In my example for my database PRODDB, I am going to retire the database instead of moving it to the Silver policy.  My company's standard is to keep the final backup for my database available for 90 days, and on day 91 all backups can be removed.

These are requirements from the above information.
  • At least 1 backup is kept for 90 days, even though my Max Retention Window was 40 days.
  • I want to know when my database has been retired for 90 days so I can remove it from the ZDLRA.
In order to accomplish both of these items, I am going to create a Protection Policy named RETIRED_DB with the following attributes
  • Recovery Window Goal of 2 days
  • Max Retention Window of 3 days
  • Unprotected Data Window of 90 days
  • New Alert in OEM to tell me when a database in this policy violates its Unprotected Data Window
If you look closely at the attributes, you will notice that I decreased the Recovery Window Goal to allow backups to be removed after 3 days.  I also set the Unprotected Data Window to be 90 days.
What this looks like over  time is 




As you can see, by moving it to the new policy, within a few days all backups except for the most recent full backup are removed.  You can also see that on day 91 (when it's time to remove this database) I will be getting an alert.
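The RETIRED_DB timeline can be approximated with a short Python simulation. It is a sketch under simplifying assumptions of my own: every backup is treated as a restorable full identified only by its age in days, the purge runs once a day, and the alert fires when the newest backup is older than the Unprotected Data Window. The function name is hypothetical.

```python
def simulate_retired_db(backup_ages, recovery_window_goal, max_retention_window,
                        unprotected_window, days):
    """Age the backups day by day and return (remaining ages, first alert day)."""
    ages = sorted(backup_ages)
    alert_day = None
    for day in range(1, days + 1):
        ages = [age + 1 for age in ages]
        if ages[-1] > max_retention_window:
            newest = ages[0]
            kept = [age for age in ages if age <= recovery_window_goal]
            # At least one backup is always kept, even if it is past the goal.
            ages = kept if kept else [newest]
        if alert_day is None and ages[0] > unprotected_window:
            alert_day = day
    return ages, alert_day


if __name__ == "__main__":
    # PRODDB has 30 days of backups (ages 0..29) on the day it is retired.
    start = list(range(30))
    print(simulate_retired_db(start, 2, 3, 90, 5))
    print(simulate_retired_db(start, 2, 3, 90, 95))
```

Under these assumptions the database is down to a single backup within a few days, and the first alert is raised on day 91.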

Migrating Databases:

Migrating databases is very similar to retiring databases, except that I don't want to remove the old backups until they naturally expire.  For my example of PRODDB with a Recovery Window Goal of 20 days, as soon as I have a new Level 0 on the new ZDLRA, I will move this database to a new policy (GOLD_MIGRATED) with the following attributes.
  • Recovery Window Goal of 20 days, since I still need to preserve old backups
  • Max Retention Window of 21 days. This will remove the old backups as they age off.
  • Unprotected Data Window of 21 days, which will alert me that it is time to remove this database.
How this would look over time is:




Conclusion:

When retiring or migrating databases, Protection Policies can be leveraged to both
  • Ensure backups are removed as they age out until only a single L0 (Full) remains
  • Alert you when it is time to remove the database from the ZDLRA.