Tuesday, September 29, 2020

ZDLRA - DISK_RESERVED_SPACE checkup

I wanted to go through some very basic items to think about on disk reserved space for databases backed up to a ZDLRA.

There are couple of posts that have been written on this by both myself and Sudhakar Kotagiri.

One of the key items to concentrate on is the DISK_RESERVED_SPACE for each database. A simple explanation of the DISK_RESERVED_SPACE is this setting represents the amount of space you set aside on a database-by-database basis to keep a backup window to support the recovery window goal.

A simple example of the DISK_RESERVED_SPACE is.....

My Full backup takes up 40 TB. Keeping 14 days of recovery (which is my recovery window goal) takes up an additional 1 TB/day.

In this example, I need 54 TB of storage to keep 14 days of recovery.

For this database, I would set the reserved space to be 56.5 TB to ensure I have an extra 5% of space available to handle any unexpected peaks.

Easy right ? The value for RECOVERY_WINDOW_SPACE in the RA_DATABASE view gives you the space needed to support the Recovery Window.

But.. the reason I called this a checkup is that I wanted to make sure some thought is given to the setting. If your database is changing (which it almost always is), then this needs to be reviewed and adjusted.

Below are some simple rules of thumb of what to think about when adjusting DISK_RESERVED_SPACE

Stable Mature Databases - If your database is mature, and the workload is consistent, the DISK_RESERVED_SPACE should be 5-10% larger than the RECOVERY_WINDOW_SPACE. This setting should be reviewed at least 4 times year to be sure it is accurate.
Actively Changing Databases - If the database has a changing workload. Maybe it is still growing, or maybe new features are being added to the application. The DISK_RESERVED_SPACE should be set at 5-10% larger than RECOVERY_WINDOW_SPACE + Include a percentage for growth. This should be reviewed monthly (at a minimum) OR if a big growth spurt is planned.
Databases with Peaks - For some business, there may be databases that have peaks. Maybe they support "Black Friday", or maybe they support huge sales around the superbowl. The DISK_RESERVED_SPACE should be 5-10% larger than the RECOVERY_WINDOW_SPACE needed during this peak. This will ensure that the space is available when the peak comes.
TDE databases - when a database migrates to TDE, there is a time period where storage is needed for the Pre-TDE backup, and the Post-TDE backup. You need to adjust the DISK_RESERVED_SPACE to take this into account. NOTE: Staggering the migration when you migrate to TDE can avoid running out of DISK_RESERVED_SPACE for all databases.
Databases with ILM - if you have databases performing ILM activities this affects backup space needed. A simple example would be a database whose historical data is moved to an HCC tablespace when it becomes inactive. Space needs to be reserved in DISK_RESERVED_SPACE to hold the old structure, the new structure, and the archive logs created during this change.

My suggestion to simplify this is to use PROTECTION POLICIES. Each type of database can be in it's own protection policy. Review the DISK_RESERVED_SPACE at the appropriate time for each policy.

It's that easy. :)

Thursday, September 24, 2020

ZDLRA - How to do a storage checkup

One of the items that comes up with the ZDLRA is a storage checkup. The DBAs want to know more detail about the storage utilization of each database.

Once you understand the above concepts you realize that are there 2 major pieces that affect the storage utilization for a database.

1) How much space a level 0 backup takes. Since the ZDLRA virtualizes full backups, each database has at least 1 copy of each block on the ZDLRA. It would be only 1 if it doesn't change, or it could 30 copies of the same block if it changes every day (like system tablespace data). What you are interested is the size of 1 full backup

2) The amount of storage 1 day of changes takes up (on average). This would be the stored size of an incremental backup (if you perform an incremental every day), and it would be the stored size of the archive logs for a day of workload.

By combining these 2 pieces you can then calculate how much storage is needed for X number of days of backups.

Now how do I do this ? below is the query I use, and I will explain the columns in it.

select db_unique_name,
               trunc(size_estimate,0) estimated_db_size,
               recovery_window_goal,
               trunc(space_usage,0) space_usage,
               trunc(estimate_zero_day_space - ((estimate_seven_day_space - estimate_one_day_space)/6),0) level_0_size,
               trunc((estimate_seven_day_space - estimate_one_day_space)/6,1) one_day_space,
               trunc(recovery_window_space,0) recovery_window_space,
               disk_reserved_space,
estimate_rwg_space
from
(Select db_unique_name,
       Space_usage,
       extract(day from recovery_window_goal) recovery_window_goal,
       dbms_ra.estimate_space (DB_UNIQUE_NAME,numtodsinterval(1,'day')) estimate_zero_day_space,
       dbms_ra.estimate_space (DB_UNIQUE_NAME,numtodsinterval(1,'day')) estimate_one_day_space,
       dbms_ra.estimate_space (DB_UNIQUE_NAME,numtodsinterval(7,'day')) estimate_seven_day_space,
       dbms_ra.estimate_space (DB_UNIQUE_NAME,recovery_window_goal) estimate_rwg_space,
        RECOVERY_WINDOW_SPACE,
       disk_reserved_space,
       size_estimate
               from ra_database);

What's returned

DB_UNIQUE_NAME	DB name
RECOVERY_WINDOW_GOAL	How long backups are kept
SPACE_USAGE	How much space (GB) is the DB using in total ?
LEVEL_0_SIZE	Estimated size (GB) of just the full backup
ONE_DAY_SPACE	Estimated space usage (GB) for a single day of backups
RECOVERY_WINDOW_SPACE	How much space is needed for a 14 day recovery window.
DISK_RESERVED_SPACE	How much space is set aside for backups ?
ESTIMATE_DB_SIZE	How big is the database (GB) estimated to be ?
ESTIMATE_RWG_SPACE	This returns the space (GB) needed for the recovery window from RA_DATABASE which may not match calculating using the columns returned.

Now let's take a look at what I can do with this..

This is an example, where I summarize the space utilization for a couple of ZDLRAs.

And here I looked at the detail for the databases.

This report (just above this) gives me some really useful information.

I can see DB02 has a really big change rate. The database is only about 2.5 TB, but it is using 14 TB of storage.
I can see the disk_reserved_space is way too small for DB01. DB01 needs about 15 TB to keep it's recovery window goal, but the disk_reserved_space is only 500 GB
DB03 looks good. Change rate is not significant, and disk_reserved_space is set a little higher than the RECOVERY_WINDOW_SPACE.

Now finally, I was able to use the one_day_space, and graph out the space utilization for each days Recovery window.

This graph shows each day of RWG and it's storage needs, the currently USED, and the USABLE space. I can see that even though my used space is close to the usable space, there is still room for growth. I can also use this to see what happens to my storage utilization if I changed the RWG to 10 days.

I highly recommend periodically doing a health check on your storage utilization, and review the disk_reserved_space.

I hope this gives some information you can use to take a closer look.

**NOTE ** this query is accurate as 9/30/20. It might need to be adjusted with future releases.

Also, it is only as accurate as the data. The longer a database has been backing up with a consistent workload, the better the estimate.

Wednesday, September 23, 2020

ZDLRA, Real-time redo and compression

In this post I will go through what happens to archive logs sent to the ZDLRA through real-time redo.

The most common way to send archivelog backups to a ZDLRA is through real-time redo.

In this method the ZDLRA is treated just like a standby database destination.

The main difference with sending logs to the ZDLRA is that logs need to be sent (REDO_TRANSPORT_USER) as the VPC (virtual private catalog) account that is registered to send backups.

This is done by use of wallet containing the VPC user ID and Password and is included in the channel configuration parameter.

There is a great explanation of most of this from my colleague Fernando Simon and you can find it here.

ZDLRA, Real-Time Redo and Zero RPO

What I wanted to go through is the process of sending the logs (real-time), and the process of storing the logs on the ZDLRA.

The first thing to understand is the steps in the process of turning real-time redo into RMAN backupsets.

Step 1 The redo is captured real-time from the ZDLRA through the use of "shadow logs". Think of "shadow logs" as standby redo logs that are created for each database, and for each redo log that is being captured. Just like standby redo logs, these are full size logs. To give you an example, lets say there are 6 databases sending real-time redo the the ZDLRA, 3 of these are 2 node RAC clusters. Each database have a redo log size of 20 GB.

On the ZDLRA, these are mirrored (to disk) and will use storage which is included in the USAGE number for the database. In my example there will be 9 logs

Step 2 - When a log switch occurs a task is created called BACKUP_ARCH. This task is responsible for taking the "shadow log" and turning it into an RMAN backupset containing the log.

The RMAN backupset can be compressed (and it uses BASIC by default, please change it) based on the policy that the Database is a member of.

One of the advantages of the ZDLRA is that the compression license is NOT needed to use other degrees of compression.

The suggestion I would make is.

TDE Databases - Put ALL TDE databases in their own policy and set compression to NONE. TDE archive logs will not compress and will cause overhead.]

NON-TDE databases - Use LOW compression. this will give you best combination of compression ratio and elapsed time.

Now let's take a look at the tasks to see what I am talking about.

Below is a snippet from the currently running tasks (taken from a SAR report).

TASK_TYPE                 PRIORITY  STATE            CURRENT_COUNT  LAST_EXECUTE_TIME     WORK_TYPE    MIN_CREATION
----------------------  ----------  ---------------  -------------  --------------------  -----------  ------------
BACKUP_ARCH                    120  RUNNING                      7  03-OCT-2019 14:49:08  Work         03-OCT-2019

I can see there there are currently 7 redo logs that have switched, and are awaiting processing to become backupsets. This number should always be very small.

Below is a snippet from the tasks executed in the last 24 hours (also from a SAR report).

TASK_TYPE               STATE                   CNT     CREATED  MIN_COMPLETION_TIME     MAX_COMPLETION_TIME     OLD_CREATION_TIME
----------------------  ---------------  ----------  ----------  ----------------------  ----------------------  ----------------------
BACKUP_ARCH             COMPLETED             9,591       9,580  02-OCT-2019 18:50:35    03-OCT-2019 14:50:28    02-OCT-2019 18:49:49

This is telling me that there were 9,591 log switches on all my protected databases in the last 24 hours.

From a compression standpoint. PLEASE at least change the current setting in your policies for compression. and use the recommendations.

TDE - No compression

No TDE - LOW compression.

I point out in the my last post why this so important to get right.

Monday, September 21, 2020

ZDLRA, archivelog log backups and compression

In this post I will go through what happens with Archive log Backupsets sent to the ZDLRA through log sweeps.

When you implement ZDLRA you have 2 choices in backing up archive logs.

1) Use real-time redo transport (RRT) which is the same mechanism that is used to send archive logs to a standby database.
2) Use traditional log sweeps (RMAN) that pick up the archive logs and send them to the ZDLRA as backupsets.

Today I am going to go through the second option, using RMAN log sweeps.

Before I go into detail please refer to this MOS note to ensure you understand best practice for backing up a database to the ZDLRA.

RMAN best practice recommendations for backing up to the Recovery Appliance (Doc ID 2176686.1)

As of writing this post, the best practice is

backup device type sbt cumulative incremental level 1 filesperset 1 section size 64g database plus archivelog filesperset 32 not backed up;

When you execute the best practice command, there are 2 pieces to this backup script.

Database Backup - The best practice is filesperset=1 and section size 64G. This ensures that a large datafile backup (big file) is broken up into pieces, and each backup piece contains only a single datafile. This allows the virtualization process to start as soon as each backup piece is received

Archivelog Backup - Best practice is to use filesperset=32 and only backup archivelogs that have not been backed up.

Now to walk through the archive log backup process:

RMAN will create a backupset of 32 archive logs. This backupset will be sent to the ZDLRA (through the libra.so library) and will be written to physical disk on the ZDLRA. The RMAN catalog on the ZDLRA will be immediately updated with the location of the backupset.

Since there is no processing done on the ZDLRA once received (beyond what the RMAN client does), the file is written "as is" on the ZDLRA.

So what why do I point this out ? As you may know the ZDLRA compresses Datafile backups received, but it does not compress archivelog backupsets through RMAN. If you want your archivelog backupset compressed (that came to the ZDLRA through an RMAN log sweep) you must perform compression through RMAN before sending the archive logs.,

There are a few items to think about before you rush into immediately compressing archive logs.

The first of which (and probably most important to your company) is that RMAN compression, other than basic (which is NOT recommended) requires the ACO (advanced Compression) option (license). If the databases you support are NOT licensed for ACO usage, then you should stop right here, and consider using real-time redo. Real-time redo can use all levels of compression without the ACO because the compression is done on the ZDLRA. This will be my next blog post.

#1 - ACO is required for RMAN compression. Use real-time redo to compress on the ZDLRA without the ACO license

The second thing to think about is what level of compression. Below is some example compression ratio AND timings that have been achieved to give you an idea of the differences. Of course every one's data is different, so your mileage could vary. This does give you an idea however.

BASIC - The elapsed time is 5x longer than it is for NOCOMP. I would absolutely not recommend using BASIC compression.

LOW - The elapsed time was actually less than NOCOMP, most likely due to sending less traffic. The backup ratio was roughly 2:1 giving a great balance of similar execution time and reasonable compression

MEDIUM - The elapsed time was triple (3x) that of LOW or NOCOMP. The compression ratio was slightly better, but not significant.

HIGH - The elapsed time was 24x longer than it is for NOCOMP, and the compression ratio was only slightly better. I would absolutely not recommend using HIGH compression

#2 - LOW compression offers the best balance between elapsed time, and compression ratio.

As I point out that compression of archive logs is a good thing, there as a BIG CAVEAT to this. The ZDLRA has its own compression of datafile backups. The ZDLRA compression is of each individual block, NOT the backupset. Because of this RMAN compression of datafiles is not recommended, and if TDE is implemented this will cause backups not to virtualize. The 2 items are.

The ZDLRA will uncompress the RMAN backupset and recompress the blocks once virtualized.
TDE data will not be virtualized since RMAN compression re-encrypts the backupset.

#3 - DO NOT compress datafile backups.

The 4th item associated with the compression of archive log backupsets is replication. The replication of archivelogs on the ZDLRA is the "cascade" of backupsets. The backupset containing the archive logs are sent to the downstream "as-is". If you compress the archive logs with RMAN, then they get replicated compressed. The compressed backupsets not only use less network traffic when replicating, but they will also be stored on the downstream compressed.

#4 - Compression of archive logs means less network traffic with replication.

The 5th item associated with the compression of archive logs is validation on the ZDLRA. Compression of archive logs comes with a slight cost, and this is one of the trade-offs. The ZDLRA (as you might know) does a "restore validate" of all backups on the ZDLRA on regular basis (typically once a week). In order to validated archivelog backupsets, these backupsets need to be uncompressed. The uncompression of archivelog backupsets uses CPU on the ZDLRA and the higher the compression, the greater the overhead of this process. Believe it or not, weekly validation is one of the most intensive tasks performed on the ZDLRA. Using LOW compression has minimal impact on CPU during validation and is recommended unless space is at a premium and MEDIUM compression can be tolerated.

NOTE: This can be monitored in the SAR report by looking at the VALIDATE task. You should see VALIDATE tasks completing, and when looking at executing tasks, the MIN_CREATION should with a day or 2 of executing the SAR report. If the MIN_CREATION data is more than few days old, VALIDATION tasks are not keeping up and implementing compression will exasperate this situation.

#5 - Validation requires uncompressing archivelog backupsets, so be careful of too high a level of compression.

The final item associated with the compression of archive logs is the recovery of the database using archivelog backupsets. During a recovery operation, any archivelogs restored through RMAN will have to be uncompressed. This uncompression may affect recovery time. LOW gives the best tradeoff since the elapsed time to uncompress is minimal. If the network is saturated, restoring compressed archivelogs (which are typically 50% the size) may actually help with recovery time.

#6 - The DB host will have to uncompress archivelog backupsets during recovery. This may affect recovery time.

Now the question is.. How do I put this together to get LOW compression of archive logs AND not compress datafiles?

This is how it can be done.

1) Enable RMAN LOW compression option.

RMAN> CONFIGURE DEVICE TYPE 'SBT_TAPE' BACKUP TYPE TO BACKUPSET;

2) Ensure that compressed backupsets are NOT used by default

RMAN> CONFIGURE DEVICE TYPE 'SBT_TAPE' BACKUP TYPE TO BACKUPSET;

3) Daily incremental level 1 Backups.

run
{
backup as compressed backupset filesperset 8 archivelog all not backed up delete input;
backup as backupset cumulative incremental level 1 filesperset 1 section size 128G database;
backup as compressed backupset filesperset 8 archivelog all not backed up delete input;
}

4) Periodic log sweep Backups.

run
{
backup as compressed backupset filesperset 8 archivelog all not backed up delete input;
}

I am hoping this gives you everything you need to know about using RMAN log sweeps with the ZDLRA and you can decide if you want to use compression of archivelogs during those sweeps.

Bryan's Oracle Blog