Showing posts with label maa. Show all posts
Showing posts with label maa. Show all posts

Thursday, July 28, 2022

ZFSSA replicating locked snaphots to OCI for offsite backup

ZFSSA replication can be used to create locked offsite backups. In this post I will show you how to take advantage of the new "Locked Snapshot" feature of ZFSSA and the ZFS Image in OCI to create an offsite backup strategy to OCI.

ZFSSA Snapshot Replication
If you haven't heard of the locked snapshot feature of ZFSSA I blogged about here.  In this post I am going to take advantage of this feature and show you how you can leverage it to provide a locked backup in the Oracle Cloud using the ZFS image available in OCI.

In order to demonstrate this I will start by following the documentation to create a ZFS image in OCI as my destination.  Here is a great place to start with creating the virtual ZFS appliance in OCI.

Step 1 - Configure remote replication from source ZFSSA to ZFS appliance in OCI. 


By enabling the "Remote Replication" service with a named destination, "downstream_zfs" in my example, I can now replicate to my ZFS appliance in OCI.

zfssa remote replication


Step 2 -  Ensure the source project/share has "Enable retention policy for Scheduled Snapshots" turned on


For my example I created a new project "Blogtest".  On the "snapshots" tab I put a checkmark next to 
"Enable retention policy for Scheduled Snapshots".  By checking this, the project will adhere to preventing the deletion of any locked snapshots.  This property is replicated to the downstream and will cause the replicated project shares to also adhere to locking snapshots.  This can also be set at the individual share level if you wish to control the configuration of locked snapshots for individual shares.

Below you can see where this is enabled for snapshots created within the project.

ZFSSA Enable Snapshot Retention


Step 3 -  Create a snapshot schedule with "locked" snapshots


The next step is to create locked snapshots. This can be done at the project level (affecting all shares) or at the share level. In my example below I gave the scheduled snapshots a label "daily_snaps".  Notice for my example I am only keeping only 1 snapshot and I am locking the snapshot at the source. In order for the snapshot to be locked at the destination
  • Retention Policy MUST be enabled for the share (or inherited from the project).
  • The source snapshot MUST be locked when it is created
zfssa create snapshots

Step 4 -  Add replication to downstream ZFS in OCI

The next step is to add replication to the project  configuration to replicate the shares to my ZFS in OCI. Below you can see the target is my "downstream_zfs" that I configured in the "Remote Replication" service.
You can also see that I am telling the replication to "include snapshots", which are my locked snapshots, and also to "Retain user snapshots on target".  Under "Disaster Recovery" you can see that I am telling the downstream to keep a 30 day recovery point.  Even though I am only keeping 1 locked snapshot on the source, I want to keep 30 days of recovery on the downstream in OCI.

ZFSSA add replication

Step 5 -  Configure snapshots to replicate

In this step I am updating the replication action to replicate the locked scheduled snapshot to the downstream.  Notice that I changed the number of snapshots from 1 (on the source) to 30 on the destination, and I am keeping the snapshot retention locked. This will ensure that the daily locked snapshot taken on the source will replicate to the destination as a locked snapshot, and 30 snapshots on the destination will remain locked.  The 31st snapshot is no longer needed.

ZFSSA Autosnap replication


Step 6 -  Configure the replication schedule

The last step is to configure the replication schedule. This ensures that on a daily basis the snapshots that are configured to be replicated will be replicated regularly to the downstream. You can make this more aggressive than daily if you wish the downstream to be more in sync in the primary.  In my example below I configured the replication to occur every 10 minutes. This means that the downstream should have all updates as of 10 minutes ago or less. If I need to go back in time, I will have daily snapshots for the last 30 days that are locked and cannot be removed.

ZFSSA Replication Schedule

Step 7 -  Validate the replication


Now that I have everything configured I am going to take a look at the replicated snapshots on my destination.  I navigate to "shares" and I look under "replicat" and find my share. By clicking on the pencil and looking at the "snapshots" tab I can see my snapshot replicated over.

zfssa downstream copy

And when I click on the pencil next to the snapshot I can see that the snapshot is locked and I can't unlock it.

zfssa downstream locked



From there I can clone the snap and create a local snapshot, back it up to object storage, or reverse the replication if needed.



Wednesday, July 28, 2021

A New ZDLRA feature can help you migrate to a new ZDLRA

 A new feature was included in the 19.2.1.1.2 ZDLRA software release to help you migrate your backup strategy when moving to a new ZDLRA.


This feature allows you to continue to access your older backups during the cut-over period directly from the new ZDLRA.  You point your database restore to the the new ZDLRA  and it will automagically access the older backups if necessary. Once the cutover period has passed, the old ZDLRA can be retired.

I am going to walk through the steps.

1. Configure new ZDLRA

  • Add the new ZDLRA to OEM - The first step is to ensure that the new ZDLRA has been registered within your OEM environment. This will allow it to be managed, and of course monitored.
  • Add a replication VPC user to the new ZDLRA. This will be used to connect from the old ZDLRA.
  • Add the VPC users on the new ZDLRA that match the old ZDLRA
  • Configure policies on new ZDLRA to match old ZDLRA.
          This can done by dynamically executing DBMS_RA.CREATE_PROTECTION_POLICY. 
           Current protection policy information can be read from the RA_PROTECTION_POLICY view.
  • Add databases to proper protection policies on new ZDLRA.
        This can be done by dynamically executing DBMS_RA.ADD_DB. 
        Current database information can be read from the RA_DATABASE view.

  • Grant the replication VPC user access to all databases for replication.
        This can be done by dynamically executing DBMS_RA.GRANT_DB_ACCESS
        The current list of databases can be read from the RA_DATABASE view.

  • Grant the VPC users access to the database for backups/restores
        This can be by dynamically executing DBMS_RA.GRANT_DB_ACCESS
        The current list of grants can be read from the RA_DB_ACCESS view
  • Create a replication server on the old ZDLRA that points to the new ZDLRA
  • Add the protection policies on the old ZDLRA to the replication server created previously..

NOTE: When these steps are completed, the old ZDLRA will replicate the most recent L0 to the new ZDLRA, and will then replicate all new incremental backups and archive logs.




2. Switch to new ZDLRA for backups

  • Update the wallet on all clients to include the VPC user/Scan listener of the new ZDLRA.
  • Update the real-time redo configuration (if using real-time redo) to point to the new ZDLRA.
  • Update backup jobs to open channels to the new ZDLRA
  • Remove the VPC replication user from the new ZDLRA  
  • Drop the replication server on the old ZDLRA
NOTE: The backups will begin with an incremental backup based on the contents of the new ZDLRA and will properly create a "virtual full". Archive logs will automatically pick up with the sequence number following the last log replicated from the old ZDLRA.



3 . Configure "Read-Only Mode" replication to old ZDLRA

  • Add a replication VPC user on the old ZDLRA. This will be used to connect from the new ZDLRA.
  • Create a replication server from new ZDLRA to the old ZDLRA
  • Grant the replication VPC user on the old ZDLRA access to all databases for replication.
        This can be done by dynamically executing DBMS_RA.GRANT_DB_ACCESS
        The current list of databases can be read from the RA_DATABASE view.
  • Add a replication server for each policy that includes the "Read-Only" flag set to "YES".
NOTE: this will allow the new ZDLRA to pull backups from the old ZDLRA that only exist on the old ZDLRA.


4 . Retire old ZDLRA after cutover period

  • Remove replication server from new ZDLRA that points to old ZDLRA
NOTE: The old ZDLRA can now be decommissioned.



That's all there is to it. This will allow you to restore from the new ZDLRA, and not have to keep track of which backups are on which appliance during the cutover window !

Monday, March 29, 2021

ZDRLA adds smart incremental to be even smarter.

 Recently version 19.1.1.2 of ZDLRA software was released, and one the features is something called "Smart Incremental".  I will walk through how this feature works, and help you understand why features like this are "ZDLRA Only".




I am going to start by walking through how incremental backups become "virtual full backups", and that will give you a better picture of how "smart incremental" is possible.

The most important thing to understand about these features is that the RMAN catalog itself is within the ZDLRA  AND the ZDLRA has the ability to update the RMAN catalog.

How does a normal backup strategy work ? 

That is probably the best place to start.  What DBAs typically do is perform a WFDI (Weekly Full Daily Incremental) backup.  To keep my example simple, I will use the following assumptions.
  • My database contains 3 datafile database. SYSTEM, SYSAUX, USERS, but I will only use the example of backing up datafile users.
  • Each of these 3 datafiles are 50 GB in size
  • I am only performing a differential backup which creates a backup containing the changes since the last backup (full OR incremental).
  • My database is in archivelog  *
* NOTE: With ZDLRA you can back up a nologging database, and still take advantage of virtual fulls. The database needs to be in a MOUNTED state when performing the incremental backup.

If placed in a table the backups for datafile USERS would look this. Checkpoint SCN is the current SCN number of the database at the start of the backup.



If I were to look at what is contained in the RMAN catalog (RC_BACKUP_DATAFILE), I would see the same backup information but I would see the SCN information 2 columns.
  • Incremental change # is the oldest SCN contained in the backupset. This is the starting SCN number of the previous backup, this backup is based on.
  • Checkpoint Change # is  starting SCN number of the backup.  Everything newer than this SCN (including this SCN) needs to be defuzzied.


Normal backup progression (differential)


When performing an incremental RMAN backup of a datafile, the first thing that RMAN does is decide which blocks needs to be backed up. Because you are performing an incremental backup,  you may be backing up all of the blocks, only some of the blocks, or even none of the blocks if the file has not changed.
This is a decision RMAN makes by querying the RMAN catalog entries (or the controlfile entries if you not using an RMAN catalog).

Now let's walk through this decision process.  Each RMAN incremental differential's starting SCN is based on the beginning SCN of the previous backup (except for the full).



By looking at the RMAN catalog (or controlfile), RMAN determines  which blocks need to be contained in each incremental backup.



Normal backup progression (cumulative differential)


Up to release 19.1.1.2, the recommendation was to perform a Cumulative Differential backup. The cumulative differential backup compares the starting SCN number of the last full backup to determine the starting point of the incremental backup (rather than the last incremental backup) .
The advantage of the cumulative over differential, is that a cumulative backups can be applied to the last full and take the place of applying multiple differential backups.  However, cumulative backups are bigger  every day that passes between full backups because they contain all blocks since the last full.

Below is what a cumulative schedule would look like and you can compare this to the differential above.
You can see that each cumulative backups starts with the Checkpoint SCN of the last full to ensure that all blocks changed since the full backup started are captured.



The RMAN catalog entries would look like this.




If you were astute, you would notice a few things happened with the cumulative differential vs the differential.
  • The backup size got bigger every day
  • The time it took to perform the incremental backup got longer
  • The range of SCNs contained in the incremental is larger for a cumulative backup.

ZDLRA backup progression (cumulative differential)

As  you most likely know, one the most important features of the ZDLRA is the ability to create a "virtual full" from an incremental backup.,

If we look at what happens with a cumulative differential (from above), I will fill in the virtual full RMAN catalog entries by shading them light green.

The process of performing backups on the ZDLRA is exactly the same as it is for the above cumulative, but the RMAN catalog looks like this.


What you will noticed by looking at this compared to the normal cumulative process that
  • For every cumulative incremental backup there is a matching virtual full backup  The Virtual full backup appears (from the newly inserted catalog entry) to have beeen taken at the same time, and the same starting SCN number as the cumulative incremental. Virtual full backups, and incremental backups match time, and SCN as catalog entries.
  • The size of the virtual full is 0 since it is virtual and does not take up any space.
  • The completion time for the cumulative incremental backup is the same as the differential backups.  Because the RMAN logic can see the virtual full entry in the catalog, it executes the cumulative incremental EXACTLY as if it is the first differential incremental following a full backup.
Smart Incremental backups -

Now all of this led us to smart incremental backups. Sometimes the cumulative backup process doesn't work quite right.  A few of the reasons this can happen are.

  • You perform a full backup to a backup location other than the ZDLRA. This could be because you are backing up to the ZDLRA for the first time replacing a current backup strategy, or maybe you created a special backup to disk to seed a test environment (Using a keep backup for this will alleviate this issue).  The cumulative incremental backup will compare against the last full regardless of where it was taken (there is exceptions if you always use tags to compare).
  • You implement TDE or rekey the database.  Implementing TDE (Transparent Data Encryption) changes the blocks, but does not change the SCN numbers of the blocks. A new full backup is required.
Previously, you would have to perform a special full backup to correct these issues. In the example below you can see what happens (without smart incremental) to the RMAN catalog if you perform a backup on Thursday at 12:00 to disk to refresh a development environment.



Since the cumulative backups are based on the last full backup, the Thursday - Saturday backups contain all the blocks that have changed since the disk backup started on Thursday at 12:00.
And, since it is cumulative, each days backup is larger, and takes longer.

This is when you would typically have to force a new level 0 backup of the datafile.


What the smart incremental does

Since the RMAN catalog is controlled by the ZDLRA it can correct the problem for you. You no longer need to perform cumulative backups as the ZDLRA can fill in any issues that occur.

In the case of the Full backup to disk, it can "hide" that entry, and continue to correctly perform differential backups. It would "hide" the disk backup that occured, and inform the RMAN client that the last full backup as of Thursday night is NOT the disk backup, but it is the previous virtual full backup.
\


 In the case of the TDE, it can "hide" all of the Level 0 virtual full backups, and the L1 differential backups (which will force a new level 0).





All of this is done without updating the DB client version. All the magic is done within the RMAN catalog on the ZDLRA.

Now isn't that smart ?



Friday, March 26, 2021

ZDLRA leverages the ZFS object store in the newest release

Yes, Using ZFSSA as an on-prem object store with ZDLRA is here, and How to configure Zero Data Loss Recovery Appliance to use ZFS OCI Object Storage as a cloud repository (Doc ID 2761114.1) shows you how.


Above is the diagram from Tim Chien's "Ask Tom" session on the new feature with ZDRA release 19.1.1.2.

 For those how have been reading my blog posts, and wondering why the sudden interest in ZFS as an object store, here is another reason.

The idea behind this is pretty simple,  many customers are looking for an additional tier of storage behind the ZDLRA for 2 reasons
  • They want to extend the the recovery window onto a lower tier of storage. This may include going from a full "any point in time" recovery to a set of "recovery points"
  • They want an archival backup for a long period of time that is a set backup point.  Keep backups are the perfect example of this. With Keep backups you get a self-contained restore point of your choosing.
Now for the magic of how all this works.

1. The first step is to configure your ZFSSA as an OCI object store. As long you are on the latest patched release of OS 8.8, this functionality is available to you.  If you are unfamiliar with how to do this, in previous posts, I have walked through the steps of configuring this. Below are some places to start.


Also, here is the documentation from ZFS.

2. The second step is to configure Key Vault (OKV), which is a licensed product. Key vault is a centralized Encryption Key management system that is used to store the master encryption key for the backups.  OKV is released as a virtual image, that can be installed on physical hardware, or in a virtual environment. the installation is self-contained and walks through a series of questions to finish the configuration.  Easy.
  WHY do I need TDE ?  I'm sure you are asking this question.  The "Copy-to-cloud" functionality of the ZDLRA is being utilized to present ZFS as an "OCI cloud store".  It acts just like an object store in the Oracle Public Cloud.  The only difference is that there is no "ARCHIVE" tier on ZFS.  Since ZFS is considered a "Cloud destination", it follows the Larry rule that "All data in the Cloud is encrypted.". Because of that, the backups going to ZFS will be RMAN encrypted (no license needed for this part).  The ZDLRA uses OKV to store the master keys used to encrypt the RMAN backupsets.

3. The third step is to configure the ZDLRA to utilize OKV as a client, and to point the ZDLRA to your ZFS.
  One of the great things of using this solution is that the process is exactly the same as configuring the ZDLRA to send backups to the Oracle public cloud. This link points to the documentation that makes it clear how to configure this process.

That's all there is to it. The most complicated task is configuring the authentication for the OCI object store on ZFS, as it requires setting up a public and private key.

Now to walk through the workflow.

Backups -- Below is the backup workflow from the presentation.  The ZDLRA creates an RMAN backupset from the backup pieces on the ZDLRA. This backupset is an RMAN encrypted backupset.




One item is NOT mentioned on this slide is compression.  If your Database is using TDE, then the backup cannot be compressed when sent to the ZFS because the ZDLRA does not have the encryption master key for the database.  BUT, if your database is NOT TDE enabled, then you should be using compression when sending the backups to the ZFS. As I've said earlier, the backset is an RMAN encrypted backupset. Because it is already encrypted when sent to ZFS, the ZFS will be unable to compress the backups.  You can find instructions to add compression in the documentation for creating a job template.  There is a setting for the template called

Compression_algorithm=> 
By implementing compression on the ZDLRA you are:
  • Decreasing the size of the backups on the ZFS..
  • Decreasing the networkwork traffic between the ZDLRA and ZFS as the data is compressed before it is sent to ZFS. This can double the throughput for backups and restores.
Keep in mind, that if you restore directly to your database host from the ZFS Object store, the database host will be performing the uncompression.

Restores - Below is the restore workflow. Typically you would utilize the catalog on the ZDLRA and let the ZDLRA be the conduit for uncompressing (if it was compressed when sent to ZFS), and unencrypting it, as the ZDLRA encrypted it.  The ZDLRA already has the credentials for the object store, and it has the Encryption master key available to it from OKV.



Alternately you can restore the backups directly from the ZFS object store.
This would be a 3 step process..

1) You would download the Oracle Database Cloud Backup Module . Once downloaded you would configure the database to utilize the OCI object store. The link above also contains links to documentation for the module, and to a MOS note containing the FAQ.  Keep in mind that in this case you are configuring the Module for the on-premise ZFS (rather than the Oracle public cloud), and the instructions may have to be modified. The table below gives you an idea of the differences.


2) You would catalog the backup pieces. If the RMAN catalog is not available (for some reason) the MOS note mentioned below contains detail on how to list what is in the object store, and how to clean it out.

How to report or delete backup pieces stored in Cloud Object Storage by Database Backup Cloud Service without using RMAN (Doc ID 2360800.1)

The script contained in the MOS note ( odbsrmt.py) should work with a few minor changes to the instructions (since we are talking about an on-prem ZFS).  I will continue to work through the changes and post the results in a future blog post.


3) You would register the restore location as an OKV endpoint (if it isn't already registered), OR you can alternately export the encryption key and create a wallet file.






Conclusion - This is a very exciting addition to the many features that the ZDLRA already provides.

Tuesday, February 2, 2021

ZDLRA - Using Protection Policies to manage databases that have migrated or to be retired

 One the questions that keeps coming up with ZDLRA is how to manage the backups for a database that has either

  • Been migrated to another ZDRA
  • Been retired, but the backup needs to be kept for a period of time












The best way to deal with this by the use of Protection Policies.

How Protection Policies work:


If you remember right, Protection Policies are way of grouping databases together that have the same basic characteristics.

The most important of which are :

Name/Description             - Used to identify the Protection Policy
Recovery Window Goal    - How many days of recovery do you want to store at a minimum 
Max Retention Window    - (Optional) Maximum number of days of backups you want to keep
Unprotected Window        - (Optional) Used to set alerts for databases that are no longer receiving recovery data.

One of the common questions I get is.. What happens if I change the Protection Policy associated with my database ?

Answer :  By changing the Protection Policy a database is associated with, you are only changing the metadata.  Once the change is made, the database follows the Protection Policy rules it is now associated with, and no longer is associated with the old Protection Policy

How this plays out with a real example is... 
My Database (PRODDB) is a member of a Protection Policy (GOLD) which has a Recovery Window Goal of 20 days, and a Max Retention Window of 40 days (the default value being 2x the Recovery Window Goal).
My Database (PRODDB) currently has 30 days of backups, which is right in the middle. 



 What would normally happen for this database is (given enough space), backups will continue to be kept until PRODDB has 40 days of backups.  On day 41, a maintenance job (which runs daily) will execute, and find that my database, PRODDB, has exceeded it's Recovery Window Goal.  This job will remove all backups (in a batch process for efficiency) that are older than 20 days.

BUT ........................

Today, I moved my database, PRODDB, to a new protection policy (Silver) which only has a 10 day Recovery Window Goal, and a Max Recovery Window of 20 Days.


As I pointed out, the characteristics of the NEW Protection Policy will be used, and the next time the daily purge occurs, this database will be flagged, and all backups greater than the Recovery Window Goal will be purged.





Retiring databases: - 

One very common question how to handle the retiring of database.  As you might know, when you remove a database from the ZDLRA, ALL backups are removed from ZDLRA.
When a database is no longer sending backups to the ZDLRA,  the backups will continue to be purged until only a single level 0 backup remains.  This is to ensure that at least one backup is kept, regardless of the Max Recovery Window.
The best way to deal with Retiring database (and still keep the last Level 0 backup) through the use of Protection Policies.
In my example for my database PRODDB, I am going to retire the database instead of moving it to the Silver policy.  My companies standard is to  keep the final backup for my database available for 90 days, and on day 91 all backups can be removed.

These are requirements from the above information.
  • At least 1 backup is kept for 90 days, even though my Max Recovery Window was 40 days.
  • I want to know when my database has been retired for 90 days so I can remove it from the ZDLRA.
In order to accomplish both of these items, I am going to create a Protection Policy named RETIRED_DB with the following attributes
  • Recovery Window Goal of 2 days
  • Max Recovery Window of 3 Days
  • Unprotected Data Window of 90 days
  • New Alert in OEM to tell me when a database in this policy violates its Unprotected Data Window
If you look closely at the attributes, you will noticed that I decreased the Recovery Window Goal to allow backups to be removed after 3 days.  I also set the Unprotected Data Window to be 90 days.
What this looks like over  time is 




As you can see by moving it to the new policy, within a few days, all backups except for the most recent Full back is removed.  You can also see that on day 91 (when it's time to remove this database) I will be getting an alert.

Migrating Databases:

Migrating databases is very similar to retiring databases, except that I don't want remove the old backups until they naturally expire.  For my example of PRODB with a Recovery Window Goal of 20 days, as soon as I have a new Level 0 on the new ZDLRA, I will move this database to a new policy (GOLD_MIGRATED) with the following attributes.
  • Recovery Window Goal of 20 days, since I still need to preserve old backups
  • Max Recovery Window goal of 21 days. This will  remove the old backups as they age off.
  • Unprotected Data Window of 21 days, which will alert me that it time to remove this database.
How this would look over time time is.




Conclusion:

When retiring or migrating databases, Protection Policies can be leveraged to both
  • Ensure backups are removed as they age out until only a single L0 (Full) remains
  • Alert you when it is time to remove the database from the ZDLRA.

Thursday, September 24, 2020

ZDLRA - How to do a storage checkup

 One of the items that comes up with the ZDLRA is a storage checkup.  The DBAs want to know more detail about the storage utilization of each database.



Once you understand the above concepts you realize that are there 2 major pieces that affect the storage utilization for a database.

1) How much space a level 0 backup takes.  Since the ZDLRA virtualizes full backups, each database has at least 1 copy of each block on the ZDLRA.  It would be only 1 if it doesn't change, or it could 30 copies of the same block if it changes every day (like system tablespace data). What you are interested is the size of 1 full backup

2) The amount of storage 1 day of changes takes up (on average).  This would be the stored size of an incremental backup (if you perform an incremental every day), and it would be the stored size of the archive logs for a day of workload.

By combining these 2 pieces you can then calculate how much storage is needed for X number of days of backups.

Now how do I do this ? below is the query I use, and I will explain the columns in it.

select db_unique_name,
               trunc(size_estimate,0) estimated_db_size,
               recovery_window_goal,
               trunc(space_usage,0) space_usage,
               trunc(estimate_zero_day_space - ((estimate_seven_day_space - estimate_one_day_space)/6),0) level_0_size,
               trunc((estimate_seven_day_space - estimate_one_day_space)/6,1) one_day_space,
               trunc(recovery_window_space,0) recovery_window_space,
               disk_reserved_space,
estimate_rwg_space
from
(Select db_unique_name,
       Space_usage,
       extract(day from recovery_window_goal) recovery_window_goal,
       dbms_ra.estimate_space (DB_UNIQUE_NAME,numtodsinterval(1,'day')) estimate_zero_day_space,
       dbms_ra.estimate_space (DB_UNIQUE_NAME,numtodsinterval(1,'day')) estimate_one_day_space,
       dbms_ra.estimate_space (DB_UNIQUE_NAME,numtodsinterval(7,'day')) estimate_seven_day_space,
       dbms_ra.estimate_space (DB_UNIQUE_NAME,recovery_window_goal) estimate_rwg_space,
        RECOVERY_WINDOW_SPACE,
       disk_reserved_space,
       size_estimate
               from ra_database);


 What's returned

DB_UNIQUE_NAME

DB name

RECOVERY_WINDOW_GOAL

How long backups are kept

SPACE_USAGE

How much space (GB) is the DB using in total ?

LEVEL_0_SIZE

Estimated size (GB) of just the full backup

ONE_DAY_SPACE

Estimated space usage (GB) for a single day of backups

RECOVERY_WINDOW_SPACE

How much space is needed for a 14 day recovery window.

DISK_RESERVED_SPACE

How much space is set aside for backups ?

ESTIMATE_DB_SIZE

How big is the database (GB) estimated to be ?

ESTIMATE_RWG_SPACE

This returns the space (GB) needed for the recovery window from RA_DATABASE which may not match calculating using the columns returned.


Now let's take a look at what I can do with this..

This is an example, where I summarize the space utilization for a couple of  ZDLRAs.



And here I looked at the detail for the databases.


This report (just above this) gives me some really useful information.

  • I can see DB02 has a really big change rate. The database is only about 2.5 TB, but it is using 14 TB of storage.
  • I can see the disk_reserved_space is way too small for DB01.  DB01 needs about 15 TB to keep it's recovery window goal, but the disk_reserved_space is only 500 GB
  • DB03 looks good.  Change rate is not significant, and disk_reserved_space is set a little higher than the RECOVERY_WINDOW_SPACE.


Now finally, I was able to use the one_day_space, and graph out the space utilization for each days Recovery window.
This graph shows each day of RWG and it's storage needs,  the currently USED, and the USABLE space.  I can see that even though my used space is close to the usable space, there is still room for growth. I can also use this to see what happens to my storage utilization if I changed the RWG to 10 days.

I highly recommend periodically doing a health check on your storage utilization, and review the disk_reserved_space.

I hope this gives some information you can use to take a closer look.

**NOTE ** this query is accurate as 9/30/20.  It might need to be adjusted with future releases.

Also, it is only as accurate as the data. The longer a database has been backing up with a consistent workload, the better the estimate.

 

Wednesday, September 23, 2020

ZDLRA, Real-time redo and compression

 In this post I will go through what happens to archive logs sent to the ZDLRA through real-time redo.


The most common way to send archivelog backups to a ZDLRA is through real-time redo.

In this method the ZDLRA is treated just like a standby database destination.

The main difference with sending logs to the ZDLRA is that logs need to be sent (REDO_TRANSPORT_USER) as the VPC (virtual private catalog) account that is registered to send backups.

This is done by use of wallet containing the VPC user ID and Password and is included in the channel configuration parameter.

There is a great explanation of most of this from my colleague Fernando Simon and you can find it here.

ZDLRA, Real-Time Redo and Zero RPO


What I wanted to go through is the process of sending the logs (real-time), and the process of storing the logs on the ZDLRA.

The first thing to understand is the steps in the process of turning real-time redo into RMAN backupsets.


Step 1  The redo is captured real-time from the ZDLRA through the use of "shadow logs". Think of "shadow logs" as standby redo logs that are created for each database, and for each redo log that is being captured.  Just like standby redo logs, these are full size logs. To give you an example, lets say there are 6 databases sending real-time redo the the ZDLRA, 3 of these are 2 node RAC clusters.  Each database have a redo log size of 20 GB.

On the ZDLRA, these are mirrored (to disk) and will use storage which is included in the USAGE number for the database.  In my example there will be 9 logs



Step 2 - When a log switch occurs a task is created called BACKUP_ARCH. This task is responsible for taking the "shadow log" and turning it into an RMAN backupset containing the log.

The RMAN backupset can be compressed (and it uses BASIC by default, please change it) based on the policy that the Database is a member of.

One of the advantages of the ZDLRA is that the compression license is NOT needed to use other degrees of compression.

The suggestion I would make is.

TDE Databases - Put ALL TDE databases in their own policy and set compression to NONE.  TDE archive logs will not compress and will cause overhead.]

NON-TDE databases - Use LOW compression. this will give you best combination of compression ratio and elapsed time.


Now let's take a look at the tasks to see what I am talking about.


Below is a snippet from the currently running tasks (taken from a SAR report).

TASK_TYPE                 PRIORITY  STATE            CURRENT_COUNT  LAST_EXECUTE_TIME     WORK_TYPE    MIN_CREATION
----------------------  ----------  ---------------  -------------  --------------------  -----------  ------------
BACKUP_ARCH                    120  RUNNING                      7  03-OCT-2019 14:49:08  Work         03-OCT-2019

I can see there there are currently 7 redo logs that have switched, and are awaiting processing to become backupsets. This number should always be very small.

Below is a snippet from the tasks executed in the last 24 hours (also from a SAR report).

TASK_TYPE               STATE                   CNT     CREATED  MIN_COMPLETION_TIME     MAX_COMPLETION_TIME     OLD_CREATION_TIME
----------------------  ---------------  ----------  ----------  ----------------------  ----------------------  ----------------------
BACKUP_ARCH             COMPLETED             9,591       9,580  02-OCT-2019 18:50:35    03-OCT-2019 14:50:28    02-OCT-2019 18:49:49


This is telling me that there were 9,591 log switches on all my protected databases in the last 24 hours.

From a compression standpoint. PLEASE at least change the current setting in your policies for compression. and use the recommendations.

TDE - No compression
No TDE - LOW compression.

I point out in the my last post why this so important to get right.




Monday, September 21, 2020

ZDLRA, archivelog log backups and compression

 In this post I will go through what happens with Archive log Backupsets sent to the ZDLRA through log sweeps.


When you implement ZDLRA you have 2 choices in backing up archive logs.

1) Use real-time redo transport (RRT) which is the same mechanism that is used to send archive logs to a standby database.

2) Use traditional log sweeps (RMAN) that pick up the archive logs and send them to the ZDLRA as backupsets.

Today I am going to go through the second option, using RMAN log sweeps.

Before I go into detail please refer to this MOS note to ensure you understand best practice for backing up a database to the ZDLRA.

RMAN best practice recommendations for backing up to the Recovery Appliance (Doc ID 2176686.1)

As of writing this post, the best practice is

backup device type sbt cumulative incremental level 1 filesperset 1 section size 64g database plus archivelog filesperset 32 not backed up; 

When you execute the best practice command, there are 2 pieces to this backup script.

Database Backup - The best practice is filesperset=1 and section size 64G. This ensures that a large datafile backup (big file) is broken up into pieces, and each backup piece contains only a single datafile. This allows the virtualization process to start as soon as each backup piece is received

Archivelog Backup - Best practice is to use filesperset=32 and only backup archivelogs that have not been backed up.

Now to walk through the archive log backup process:

RMAN will create a backupset of 32 archive logs.  This backupset will be sent to the ZDLRA (through the libra.so library) and will be written to physical disk on the ZDLRA.  The RMAN catalog on the ZDLRA will be immediately updated with the location of the backupset.

Since there is no processing done on the ZDLRA once received (beyond what the RMAN client does), the file is written "as is" on the ZDLRA.

So what why do I point this out ?  As you may know the ZDLRA compresses Datafile backups received, but it does not compress archivelog backupsets through RMAN. If you want your archivelog backupset compressed (that came to the ZDLRA through an RMAN log sweep) you must perform compression through RMAN before sending the archive logs.,

There are a few items to think about before you rush into immediately compressing archive logs.

The first of which (and probably most important to your company) is that RMAN compression, other than basic (which is NOT recommended) requires the ACO (advanced Compression) option (license).  If the databases you support are NOT licensed for ACO usage, then you should stop right here, and consider using real-time redo.  Real-time redo can use all levels of compression without the ACO because the compression is done on the ZDLRA. This will be my next blog post.

#1 - ACO is required for RMAN compression. Use real-time redo to compress on the ZDLRA without the ACO license

The second thing to think about is what level of compression.  Below is some example compression ratio AND timings that have been achieved to give you an idea of the differences. Of course every one's data is different, so your mileage could vary. This does give you an idea however.


BASIC - The elapsed time is 5x longer than it is for NOCOMP. I would absolutely not recommend using BASIC compression.

LOW - The elapsed time was actually less than NOCOMP, most likely due to sending less traffic. The backup ratio was roughly 2:1 giving a great balance of similar execution time and reasonable compression

MEDIUM - The elapsed time was triple (3x) that of LOW or NOCOMP. The compression ratio was slightly better, but not significant.

HIGH - The elapsed time was 24x longer than it is for NOCOMP, and the compression ratio was only slightly better. I would absolutely not recommend using HIGH compression

#2 - LOW compression offers the best balance between elapsed time, and compression ratio.

As I point out that compression of archive logs is a good thing, there as a BIG CAVEAT to this. The ZDLRA has its own compression of datafile backups.  The ZDLRA compression is of each individual block, NOT the backupset. Because of this RMAN compression of datafiles is not recommended, and if TDE is implemented this will cause backups not to virtualize.  The 2 items are.

  • The ZDLRA will uncompress the RMAN backupset and recompress the blocks once virtualized.
  • TDE data will not be virtualized since RMAN compression re-encrypts the backupset.

#3 - DO NOT compress datafile backups.

The 4th item associated with the compression of archive log backupsets is replication. The replication of archivelogs on the ZDLRA is the "cascade" of backupsets.  The backupset containing the archive logs are sent to the downstream "as-is".  If you compress the archive logs with RMAN, then they get replicated compressed. The compressed backupsets not only use less network traffic when replicating, but they will also be stored on the downstream compressed.

#4 - Compression of archive logs means less network traffic with replication.

The 5th item associated with the compression of archive logs is validation on the ZDLRA. Compression of archive logs comes with a slight cost, and this is one of the trade-offs.  The ZDLRA (as you might know) does a "restore validate" of all backups on the ZDLRA on regular basis (typically once a week).  In order to validated archivelog backupsets, these backupsets need to be uncompressed. The uncompression of archivelog backupsets uses CPU on the ZDLRA and the higher the compression, the greater the overhead of this process. Believe it or not, weekly validation is one of the most intensive tasks performed on the ZDLRA.  Using LOW compression has minimal impact on CPU during validation and is recommended unless space is at a premium and MEDIUM compression can be tolerated.

NOTE: This can be monitored in the SAR report by looking at the VALIDATE task. You should see VALIDATE tasks completing, and when looking at executing tasks, the MIN_CREATION should with a day or 2 of executing the SAR report.  If the MIN_CREATION data is more than few days old, VALIDATION tasks are not keeping up and implementing compression will exasperate this situation.

#5 - Validation requires uncompressing archivelog backupsets, so be careful of too high a level of compression.

The final item associated with the compression of archive logs is the recovery of the database using archivelog backupsets.  During a recovery operation, any archivelogs restored through RMAN will have to be uncompressed. This uncompression may affect recovery time. LOW gives the best tradeoff since the elapsed time to uncompress is minimal.  If the network is saturated, restoring compressed archivelogs (which are typically 50% the size) may actually help with recovery time.

#6 - The DB host will have to uncompress archivelog backupsets during recovery. This may affect recovery time.

Now the question is.. How do I put this together to get LOW compression of archive logs AND not compress datafiles?

This is how it can be done.


1) Enable RMAN LOW compression option.
RMAN> CONFIGURE DEVICE TYPE 'SBT_TAPE' BACKUP TYPE TO BACKUPSET;


2) Ensure that compressed backupsets are NOT used by default
RMAN> CONFIGURE DEVICE TYPE 'SBT_TAPE' BACKUP TYPE TO BACKUPSET;

3) Daily incremental level 1 Backups.

run
{
backup as compressed backupset filesperset 8 archivelog all not backed up delete input;
backup as backupset cumulative incremental level 1 filesperset 1 section size 128G database;
backup as compressed backupset filesperset 8 archivelog all not backed up delete input;
}

4) Periodic log sweep Backups.

run
{
backup as compressed backupset filesperset 8 archivelog all not backed up delete input;
}


I am hoping this gives you everything you need to know about using RMAN log sweeps with the ZDLRA and you can decide if you want to use compression of archivelogs during those sweeps.






Thursday, July 16, 2020

ZDLRA and TDE wallet location - Part 2

TDE and SEPS security - how do I get there?
If you read my last blog post on TDE and SEPS security you might be asking yourself, how do I get there ?

Many customers use the default location for the TDE wallet (because they are new to TDE) and find that it the default location will cause conflicts with other Oracle features.

The basic question around this would be.

"all my TDE wallets are in the default location of $ORACLE_HOME/admin/DB_UNQUE_NAME/wallet 
                  or 
$ORACLE_BASE/admin/DB_UNQUE_NAME/wallet
and  I have multiple databases sharing the same $ORACLE_HOME location 
how do I get to a dedication location for TDE?

The challenge, especially if you want to use WALLET_LOCATION (which the ZDLRA requires for real-time redo) is how to get from the default to a dedicated location.
The issue is that WALLET_LOCATION overrides the default location, unless a dedicated TDE wallet location is specified.

First-- The SQLNET.ORA file is ONLY read by the database at startup. Any changes made to the sqlnet.ora file will be effective when a database instance bounces.  You do want to be careful with the coordination however, because a database instance can bounce at any time for any number of reasons so plan carefully.

Now let's start with the where to put the TDE wallet files.  There are many options

1) Leave the wallet files within the $ORACLE_HOME directory using the $ORACLE_SID. 
     PROS - This is less disruptive since it uses a variable already set
     CONS - Wallets have to be be moved to a new location with an out of place upgrade.
                   You need copy the wallet to this new location when implementing.
                    In a multi-node RAC cluster the location is different on each node

    STEPS

  • For each database sharing the $ORACLE_HOME ensure there is a wallet subdirectory created on each node for every instance.
  • Copy the wallet files to the appropriate subdirectory for each node and for each instance
  • Update the SQLNET.ORA file to point to $ORACLE_HOME/admin/$ORACLE_SID/tde_wallet
2) Leave the wallet files within the original location in $ORACLE_HOME that uses the $DB_UNIQUE_NAME.
     PROS - You don't have to move the wallet files
     CONS - You need to set a new variable
                    Wallets have to be be moved to a new location with an out of place upgrade.

    STEPS
  • For ALL databases sharing the same $ORACLE_HOME ensure that the variable $DB_UNIQUE_NAME is set through srvctl (if available). This ensures all nodes in a RAC cluster have the variable set.
  • Ensure all login scripts on all nodes (including the login script) have the variable $DB_UNIQUE_NAME set
  • Update the SQLNET.ORA file to point to the $ORACLE_HOME/admin/$DB_UNIQUE_NAME/wallet
3) Leave (or move) the wallet files within the $ORACLE_BASE directory using the $ORACLE_SID.  

     PROS - This is less disruptive since it uses a variable already set
     CONS - Wallets have to be be moved to a new location with an out of place upgrade.
                   You need copy the wallet to this new location when implementing.
                    In a multi-node RAC cluster the location is different on each node

    STEPS

  • For each database sharing the $ORACLE_HOME ensure there is a wallet subdirectory created on each node for every instance within the $ORACLE_BASE/admin directory (unless this was already the default)
  • If necessary, copy the wallet files to the appropriate subdirectory for each node and for each instance
  • Update the SQLNET.ORA file to point to $ORACLE_BASE/admin/$ORACLE_SID/wallet
4) Migrate to $ORACLE_BASE and use $DB_UNIQUE_NAME
     PROS - Once set, you can leave the wallets after out-of-place upgrades
     CONS -  You need copy the wallet to this new location when implementing.
                    You need to set a variable to be used

    STEPS

  • For each database sharing the $ORACLE_HOME ensure there is a wallet subdirectory created on each node for every $DB_UNIQUE_NAME within the $ORACLE_BASE/admin directory (unless this was already the default)
  • Copy the wallet files to the appropriate subdirectory for each node and for each instance
  • For ALL databases sharing the same $ORACLE_HOME ensure that the variable $DB_UNIQUE_NAME is set through srvctl (if available). This ensures all nodes in a RAC cluster have the variable set.
  • Ensure all login scripts on all nodes (including the login script) have the variable $DB_UNIQUE_NAME set
  • Update the SQLNET.ORA file to point to $ORACLE_BASE/admin/$DB_UNIQUE_NAME/tde_wallet

5) Migrate to ASM (Not available in 11.2) and use $DB_UNIQUE_NAME
     PROS - Once set, you can leave the wallets after out-of-place upgrades
                   You now have a central location for a RAC cluster
     CONS -  You need copy the wallet to this new location when implementing.
                    You need to set a variable to be used

    STEPS

  • For each database sharing the $ORACLE_HOME ensure there is a wallet subdirectory created in ASM for every $DB_UNIQUE_NAME 
  • Copy the wallet files to the appropriate subdirectory for each database
  • For ALL databases sharing the same $ORACLE_HOME ensure that the variable $DB_UNIQUE_NAME is set through srvctl (if available). This ensures all nodes in a RAC cluster have the variable set.
  • Ensure all login scripts on all nodes (including the login script) have the variable $DB_UNIQUE_NAME set
  • Update the SQLNET.ORA file to point to +DISKGROUP/$DB_UNIQUE_NAME/tde_wallet

It's your choice which path to take.  For me, the best (if ASM isn''t an option) is to put the TDE Wallets within $ORACLE_BASE/admin/$DB_UNIQUE_NAME/tde_wallet.  That way with each out-of-place upgrade I don't have do anything with the wallet. As long as the sqlnet.ora points to the $ORACLE_BASE there won't be any changes.


NOTE: for 18c and above just migrate to WALLET_ROOT which allows you set the value for each database individually.