Tuesday, September 3, 2019

Oracle : Dataguard vs Backups

Anyone who has been around Oracle databases for a long time should be familiar with MAA (Maximum Availability Architecture).
Here is a link to the latest version of it.

As you probably know, MAA is described as "best practices" for implementing the Oracle Database.

What I am going to explain in this blog post is why you need the different pieces.
Questions often come up around why each piece of availability (for lack of a better word) is important.
These pieces include
  • Data Guard
  • Active Data Guard
  • Flashback database
  • Point-in-time backups
  • Archive Backups
  • DR site

Data Guard


Well, let's start with the first piece: Data Guard.  I'm not going to address Data Guard in the same datacenter, just Data Guard between datacenters.
There is a difference between "just Data Guard" and "Active Data Guard".

Data Guard  -  Mounted copy of the primary database that is constantly applying redo changes. It can be thought of as always in recovery. This copy can be gracefully "switched" to without data loss (all transactions are applied before opening). In the event of an emergency, it can be "failed" to which allows for opening it without applying all transactions.

Active Data Guard -  A Data Guard copy of the data (like above), BUT this is a read-only copy of the primary database.  Note this is an extra cost option requiring a license. Having the database open read-only has many advantages.
  • Reporting and read-only workloads can be sent to the Data Guard copy of the database, taking advantage of resources that would otherwise sit idle.
  • Automatic block repair from the primary database.
  • Block change tracking can be enabled on the standby to speed up incremental backups.

Data Guard, regardless of which of the two types you use, is often configured between datacenters with limited bandwidth.


This becomes important for a few reasons:
  • Cloning across datacenters, or even rebuilding a copy of the database in the opposite datacenter, can take a long time (even days for a very large database).
  • The application tier that accesses the database needs to move with the database. In the event of a switchover/failover of the database, the application must also switchover.

NOTE :  Making your Data Guard database your primary database involves moving the application tier to your DR data center.  The testing and changes to move the application may be risky and involve more time than simply restoring the database.

Flashback Database


Flashback database allows you to query the database as of a previous point-in-time, and even flashback the entire database to a previous point-in-time.
You can also use flashback database to "recover" a table that was accidentally dropped.
With flashback database, the changes are kept in the FRA for a period of time, typically only a few hours or less depending on workload.

Flashback can be helpful in a few situations.
  1. Logical corruption of a subset of data.  If a logical corruption occurs, like data being accidentally deleted or incorrectly updated, you can view the data as it appeared before the corruption occurred. This, of course, assumes that the corruption is found within the flashback window.  To correct the corruption, the pre-corrupt data can be selected from the database and saved to a file.  The application team familiar with the data can then create scripts to surgically correct the data.
  2. Logical corruption of much of the data.  If much of the data is corrupted, you can flash back the database to a point-in-time prior to the corruption within the flashback window.  This is a FULL flashback of ALL data.  After the flashback occurs, a resetlogs is performed creating a new incarnation of the database.  This can be useful if an application release causes havoc with the data, and it is better to suffer data loss than deal with the corruption.
  3. Iterative testing.  Flashback database can be a great tool when testing, to repeat the same test over and over.  Typically a restore point is created before beginning a test, and the database is flashed back to that state.  This allows an application team to be assured that the data is in the same state at the beginning of each iterative test.
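
For the iterative-testing case, the flow is usually just a guaranteed restore point plus a flashback. Here is a minimal sketch; the restore point name is only an example:

SQL> create restore point before_test guarantee flashback database;

-- run the application test, then flash back:

SQL> shutdown immediate
SQL> startup mount
SQL> flashback database to restore point before_test;
SQL> alter database open resetlogs;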

Backups 


Backups allow you to restore the database to a point-in-time prior to the current point-in-time. ZDLRA is the best product to use for backups.

ZDLRA offers

  • Near Zero RPO
  • Very low RTO by using the proprietary incremental forever strategy.  Incremental forever reduces recovery time by creating a virtual full restore point for each incremental backup taken.

The most recent backups are kept local to the database to ensure quick restore times. There are typically 2 types of backups. 

Archive backups - These are often "keep" backups.  They are special because they are a self-contained backupset that allows recovery only to a specific prior point-in-time. An example would be an archival backup created on the last day of the month that allows you to recover only to the time that backup completed: a single recovery point for each month rather than any point during the month.
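
As a rough sketch, an archival backup like that is typically created with RMAN's KEEP clause; the tag, restore point name, and retention below are only examples:

RMAN> backup database
        tag 'EOM_AUG2019'
        keep until time 'sysdate+365'
        restore point EOM_AUG2019;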

Point-in-time backups - These are backups that include the archive logs needed to recover to any point-in-time.  Typically these backups allow you to recover to any point in time within the recovery window, up to the current time. For example, they allow you to recover to any point in the last 30 days.

The advantages of backups are
  • You can recover to any point-in-time within the recovery window.  This window is much longer than flashback database can provide.  Any logical corruptions can be corrected by restoring portions of "good" data, or the whole database.
  • You can restore the database within the same data center returning the database to availability without affecting the application tier.
  • Backups are very important if using Active Data Guard, since the issue may be with the read-only copy of the database.  Because both the primary and the Data Guard copy are used by the application, restoring from a backup is the quickest method to rebuild either environment.

Disaster Recovery


Disaster Recovery is typically a documented, structured approach to dealing with unplanned incidents.  These include a complete site failure involving the loss of all data centers within a geographical area.

The 2 major risk areas addressed by any approach are:

RTO - Recovery Time Objective.  How long does it take to return the application to full availability?

RPO - Recovery Point Objective. How much data loss can be tolerated in the event of a disaster?


Comparison



The bottom line when comparing the different technologies:


Flashback database - Corrects logical corruption that is found immediately.

Data Guard - Handles a full disaster requiring moving the entire environment; keeps application continuity in the event of a disaster.

Backups - Restore the current environment to availability (production site or Data Guard site).


ZDLRA, by offering a very low RTO and RPO, provides a solution that can be used to return availability in all but a full datacenter disaster.




Tuesday, August 27, 2019

Oracle incremental merge and corruption



In the last couple of years I have seen a lot of backup vendors (Commvault, Netbackup, Rubrik etc.) using the incremental merge backup process that Oracle introduced in 10G.

These vendors have combined the incremental merge with snapshots of each new merged backup to provide a daily restore point.  The process is to perform an incremental backup, use the DB software to merge in the changes, and then "snap" the storage.  Rinse and repeat.
The archive logs are also backed up to provide a recovery point in time.

This process has a few flaws.

  1. You are leveraging DB resources to perform the merge.  Only the DB software knows the proprietary format of files to merge in the changes.
  2. The merge process is a sequential process that can be slow. You are essentially "recovering" the database one day forward using the merge process. We all know how slow a recovery can be.
  3. The storage used for these backups is typically not tier 1 storage, and this also affects the speed of the merge process.
  4. The RMAN catalog only knows about the last incremental merge.  In order to recover to a point older than the last merge, you need to catalog the older backups.
  5. These are FULL SIZE datafile copies.  RMAN backup sets have many features to optimize the size of backupsets (exclude unused blocks, compress, etc.) that can't be leveraged with this type of backups.
  6. Lastly, there is no inherent checking of backups for corruption.  If there is any corruption in the backup, it may not be found.

I am going to walk through an example demonstrating the last point.

But first, I want to point out how the ZDLRA addresses all of these points.


  1. The ZDLRA uses the same proprietary software that RMAN uses to merge in the changes.  Rather than using DB resources (and licensed CPUs) to merge in the changes, the ZDLRA offloads this workload and performs the merge internally.
  2. The ZDLRA simply keeps track of block changes and indexes them.  This process is much more efficient than a recovery process, and scales much better.
  3. The ZDLRA uses tier 1 storage and flash allowing the performance to match or exceed that of the databases it is protecting
  4. The RMAN catalog is within the ZDLRA allowing it to automatically catalog new full backups as they are created.
  5. These are backupsets, and don't include unused space optimizing space utilization.
  6. There are many layers of error checking.  This includes not only during the backup, but also within the ZDLRA (and when replicated).  The ZDLRA offloads the "restore validate" process to ensure that DB blocks that haven't been touched for a long time are periodically checked for corruption.  An "incremental forever" strategy is risky if you don't periodically check blocks for corruption.

Now to show why periodically checking for corruption is so important.

I am going to schedule an incremental merge of a single datafile (to keep it simple), inject corruption into the datafile copy, and then continue to merge into the backup.

First, I'm going to create a new tablespace for my testing.

SYS:DB10 > create tablespace bsg datafile '/tmp/bsg.dbf' size 10m;

Tablespace created.


Now that we have a tablespace, let's create a table in the tablespace that we can use to provide corruption.


create table corruption_test tablespace bsg as select * from dba_users;

Table created.

SQL> select  file_id, block_id from dba_extents where segment_name = 'CORRUPTION_TEST';

   FILE_ID   BLOCK_ID
---------- ----------
        16        128


Now I have a copy of dba_users in my new tablespace, and I know where the data is.

I am going to run my script (below) and perform a couple of incremental merges.


run
{
allocate channel disk1 device type disk format '/tmp/bkup%U';
recover copy of datafile 16  with tag 'incr_merge';
backup incremental level 1 for recover of copy with tag 'incr_merge' datafile 16;
}


Now here is the output, everything looks good.  I backed it up, and the changes are getting merged into the image copy of the datafile.


Starting recover at 08/27/19 17:05:00
no copy of datafile 16 found to recover
Finished recover at 08/27/19 17:05:00

Starting backup at 08/27/19 17:05:00
channel disk1: starting incremental level 1 datafile backup set
channel disk1: specifying datafile(s) in backup set
input datafile file number=00016 name=/home/oracle/app/oracle/oradata/BSG18/bsg.dbf
channel disk1: starting piece 1 at 08/27/19 17:05:01
channel disk1: finished piece 1 at 08/27/19 17:05:02
piece handle=/tmp/bkup52ua97lt_1_1 tag=INCR_MERGE comment=NONE
channel disk1: backup set complete, elapsed time: 00:00:01
Finished backup at 08/27/19 17:05:02

Starting Control File and SPFILE Autobackup at 08/27/19 17:02:49
piece handle=/home/oracle/app/oracle/fast_recovery_area/bsg18/BSG18/autobackup/2019_08_27/o1_mf_s_1017421369_gpc6mt45_.bkp comment=NONE
Finished Control File and SPFILE Autobackup at 08/27/19 17:02:52
released channel: disk1

RMAN> run
{
allocate channel disk1 device type disk format '/tmp/bkup%U';
recover copy of datafile  16 with tag 'incr_merge';
backup incremental level 1 for recover of copy with tag 'incr_merge' datafile 16;
}


Starting recover at 08/27/19 17:07:35
channel disk1: starting incremental datafile backup set restore
channel disk1: specifying datafile copies to recover
recovering datafile copy file number=00016 name=/tmp/bkupdata_D-BSG18_I-101380451_TS-BSG_FNO-16_4vua97ho
channel disk1: reading from backup piece /tmp/bkup52ua97lt_1_1
channel disk1: piece handle=/tmp/bkup52ua97lt_1_1 tag=INCR_MERGE_BSG
channel disk1: restored backup piece 1
channel disk1: restore complete, elapsed time: 00:00:01
Finished recover at 08/27/19 17:07:37

Starting backup at 08/27/19 17:07:37
channel disk1: starting incremental level 1 datafile backup set
channel disk1: specifying datafile(s) in backup set
input datafile file number=00016 name=/home/oracle/app/oracle/oradata/BSG18/bsg.dbf
channel disk1: starting piece 1 at 08/27/19 17:07:37
channel disk1: finished piece 1 at 08/27/19 17:07:38
piece handle=/tmp/bkup54ua97qp_1_1 tag=INCR_MERGE_BSG comment=NONE
channel disk1: backup set complete, elapsed time: 00:00:01
Finished backup at 08/27/19 17:07:38


Now let's corrupt the datafile.
I am going to use "sed" and change the user "SYSTEM" to "      " in the image copy of the datafile.


oracle@/tmp [18c] $ sed -i 's/SYSTEM/      /g' /tmp/bkupdata_D-BSG18_I-101380451_TS-BSG_FNO-16_4vua97ho



Now I'm going to run the same incremental merge again (and again).
Here's the output.. everything looks fine since I didn't touch the block that is corrupted.
The merge process is simply replacing blocks that have changed.

Starting recover at 08/27/19 17:13:05
channel disk1: starting incremental datafile backup set restore
channel disk1: specifying datafile copies to recover
recovering datafile copy file number=00016 name=/tmp/bkupdata_D-BSG18_I-101380451_TS-BSG_FNO-16_4vua97ho
channel disk1: reading from backup piece /tmp/bkup54ua97qp_1_1
channel disk1: piece handle=/tmp/bkup54ua97qp_1_1 tag=INCR_MERGE
channel disk1: restored backup piece 1
channel disk1: restore complete, elapsed time: 00:00:01
Finished recover at 08/27/19 17:13:07

Starting backup at 08/27/19 17:13:07
channel disk1: starting incremental level 1 datafile backup set
channel disk1: specifying datafile(s) in backup set
input datafile file number=00016 name=/home/oracle/app/oracle/oradata/BSG18/bsg.dbf
channel disk1: starting piece 1 at 08/27/19 17:13:07
channel disk1: finished piece 1 at 08/27/19 17:13:08
piece handle=/tmp/bkup56ua9853_1_1 tag=INCR_MERGE comment=NONE
channel disk1: backup set complete, elapsed time: 00:00:01
Finished backup at 08/27/19 17:13:08



Now that I have performed some incremental merges, let's see what happens when I go to restore that datafile.
I took the datafile offline, removed it, and now will try to restore it from the image copy.

RMAN> alter database datafile 16 offline;

Statement processed
RMAN> exit
[oracle@oracle-server] rm /home/oracle/app/oracle/oradata/BSG18/bsg.dbf

[oracle@oracle-server]  rman target /

RMAN> restore datafile 16;



Starting restore at 08/27/19 17:15:01
using target database control file instead of recovery catalog
allocated channel: ORA_DISK_1
channel ORA_DISK_1: SID=30 device type=DISK

channel ORA_DISK_1: restoring datafile 00016
input datafile copy RECID=53 STAMP=1017421986 file name=/tmp/bkupdata_D-BSG18_I-101380451_TS-BSG_FNO-16_4vua97ho
destination for restore of datafile 00016: /home/oracle/app/oracle/oradata/BSG18/bsg.dbf
RMAN-00571: ===========================================================
RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
RMAN-00571: ===========================================================
RMAN-03002: failure of restore command at 08/27/2019 17:15:02
ORA-19566: exceeded limit of 0 corrupt blocks for file /tmp/bkupdata_D-BSG18_I-101380451_TS-BSG_FNO-16_4vua97ho
ORA-19600: input file is datafile-copy 53 (/tmp/bkupdata_D-BSG18_I-101380451_TS-BSG_FNO-16_4vua97ho)
ORA-19601: output file is datafile 16 (/home/oracle/app/oracle/oradata/BSG18/bsg.dbf)



Now, even though the incremental merges all look good, the corruption isn't caught until I go to restore the datafile.

This is why it is so important to execute a restore validate on a regular basis, and keep a secondary backup.
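
A restore validate reads the backup pieces (or image copies) and checks them for corruption without writing anything. A quick sketch against this example's datafile:

RMAN> restore datafile 16 validate check logical;

And for the whole database:

RMAN> restore database validate;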

With the incremental merge process, there is only 1 copy of each unique block.  For historical data that doesn't change, those blocks will never get checked for corruption.





Tuesday, July 30, 2019

Block change tracking and the ZDLRA (part 2)

Oracle Block Change Tracking(BCT) and the ZDLRA



One of the most commonly asked questions I get when explaining how the ZDLRA works has to do with Block Change Tracking.

The question is usually something like...

"If the the Block Change Tracking file only keeps 7 days of history by default, how does it work with the ZDLRA where you only perform a full backup once"?

And the second question is

"If I only perform incremental cumulative backups, how does it use the BCT if it get's recreated/invalidated ?"

So how is the BCT file used ?


First, I am assuming you understand what's in it. This is explained in my previous post

For full backups -

The BCT file is not used, but it is updated prior to the backup if any changes took place.
Since the datafiles are "fuzzy" (being updated), the next BCT record starts with an SCN prior to the checkpoint SCN of the backup.


 For Incremental Cumulative/Differential backups -

The BCT file is updated prior to the incremental cumulative if any changes took place.


RMAN then determines the starting and ending SCNs for the backup, and whether the BCT file can possibly be used.





Once RMAN determines the starting/stopping SCN for the backup, and verifies that the BCT is in use, it can then determine if the BCT was capturing changes for the backup period, and if the changes fall within the BCT capture window.



Now at this point we have determined

  • BCT was in use
  • The backup period falls within the window that the BCT has complete, valid records for.

Now we need to determine if any changes took place, and if so, build the list of blocks to backup up.  Keep in mind that if there are no version records, this means that no changes were made.








NOTE : if you use multiple backup strategies, with TAGS to identify the backups, this will complicate the process.


ZDLRA PROCESS


For the ZDLRA, the process is very simple.  As you go through the steps, you see that new version records are created for each datafile that has any changes since the last backup (regardless of the backup type).

When an incremental cumulative backup is executed and sent to the ZDLRA, the RMAN catalog on the ZDLRA gets updated with the virtual full.  The RMAN client, which uses this catalog, compares against the checkpoint SCN of the last full backup in the catalog (the virtual full).

Because the RMAN client is using the RMAN catalog from the ZDLRA (containing the virtual fulls), the RMAN client always compares to the last virtual full.

Keeping 7 days of BCT history isn't an issue because the current RMAN incremental cumulative backup always compares to the previous incremental cumulative backup which is virtualized. 
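
In practice, a ZDLRA backup is simply an incremental cumulative run while connected to the ZDLRA's RMAN catalog (here via a SEPS wallet entry, shown as a placeholder) over the configured SBT channel. A hedged sketch, assuming the ZDLRA channel and library are already configured:

RMAN> connect target /
RMAN> connect catalog /@<zdlra_connect_string>
RMAN> backup incremental level 1 cumulative database;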

Monday, July 29, 2019

Block change tracking and the ZDLRA (part 1)

Oracle Block Change Tracking(BCT) and the ZDLRA



One of the most commonly asked questions I get when explaining how the ZDLRA works has to do with Block Change Tracking.

The question is usually something like...

"If the the Block Change Tracking file only keeps 7 days of history by default, how does it work with the ZDLRA where you only perform a full backup once"?

And the second question is

"If I only perform incremental cumulative backups, how does it use the BCT if it get's recreated/invalidated ?"

How does the BCT work ?

Well, let's walk through what block change tracking does at a high level.

If you want the detail of the internals, Alex Gorbachev wrote a great paper and MOS note explaining it all.  ORACLE 10G BLOCK CHANGE TRACKING INSIDE OUT (Doc ID 1528510.1)

I'm going to start with a simple example.
First I'm going to turn on block change tracking, and then check the File# for my database files.


SQL> alter database enable block change tracking using file '/home/oracle/app/oracle/oradata/BSG18/bct.dbf';

SQL> select tablespace_name, file_id from dba_data_files;
TABLESPACE_NAME                   FILE_ID
------------------------------ ----------
USERS                                   7
UNDOTBS1                                4
SYSTEM                                  1
SYSAUX                                  3




I am going to use 2 internal structures to walk through the BCT information.

X$KRCFH - This structure contains the low SCN of the BCT file.  It is used by RMAN to determine the beginning SCN of the BCT file.  If no changes have been captured for a datafile, RMAN knows the SCN from which it can safely make that assumption.

X$KRCFBH - This structure contains the versions of block changes available to use for backups.  It contains a row for each datafile and each version, and each row points to a bitmap containing the block changes associated with that version.  The key columns I am interested in to explain how it works are:

  • FNO - File number.
  • VERCNT - Version number identifying the change records.  This starts at 1 and increases over the life of the BCT file.
  • LOW - Low SCN of the block changes.  It is 0 for the first record, indicating that it is not complete.
  • HIGH - High SCN of the block changes.  The last SCN identifying the block changes.

I am going to look at the internal structure X$KRCFBH to view what's in the block change tracking file.

select (select tablespace_name from dba_data_files where file_id=fno) tablespace_name,vercnt,to_char(vertime,'mm/dd/yy hh24:mi:ss') vertime,low, high from x$krcfbh
   where fno in (select file_id from dba_data_files);
TABLESPACE     VERCNT VERTIME                  LOW       HIGH
---------- ---------- ----------------- ---------- ----------
SYSTEM              1 07/29/19 09:21:17          0          0
SYSAUX              1 07/29/19 09:21:18          0          0
UNDOTBS1            1 07/29/19 09:21:18          0          0
USERS               1 07/29/19 09:21:18          0          0



I can see that there are entries for all my tablespaces, with a "version count" of 1 and low/high SCN of 0.  This is telling me that no backups have been executed yet using the BCT.
I am also going to look at X$KRCFH to see what the starting SCN is for the BCT file.


select lowscn from X$KRCFH;

LOWSCN
----------
8361930


Now I am going to execute a full backup and see what happens.
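
The backup command itself isn't shown; based on the note further down that the first backup was an incremental level 0, it was most likely something along the lines of:

RMAN> backup incremental level 0 database;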



TABLESPACE     VERCNT VERTIME                  LOW       HIGH
---------- ---------- ----------------- ---------- ----------
SYSTEM              1 07/29/19 09:21:17          0    8362001
SYSAUX              1 07/29/19 09:21:18          0    8362001
UNDOTBS1            1 07/29/19 09:21:18          0    8362001
USERS               1 07/29/19 09:21:18          0    8362001




You can see that the high SCN is now set to 8362001.
Now I am going to look at the RMAN backup (USERS tablespace) to see what the Checkpoint SCN number was for the backup.

  File LV Type Ckp SCN    Ckp Time          Abs Fuz SCN Sparse Name
  ---- -- ---- ---------- ----------------- ----------- ------ ----
  7    0  Incr 8362002    07/29/19 09:46:59              NO    /home/oracle/app/oracle/oradata/BSG/datafile/o1_mf_users_fz01zl49_.dbf


Ahhh... Now I can see how it fits together. The high SCN in the BCT file is the SCN right before the checkpoint SCN of the backup.

Now let's execute an incremental backup and see what happens.

TABLESPACE     VERCNT VERTIME                  LOW       HIGH
---------- ---------- ----------------- ---------- ----------
SYSAUX              1 07/29/19 09:21:18          0    8362001
SYSTEM              1 07/29/19 09:21:17          0    8362001
UNDOTBS1            1 07/29/19 09:21:18          0    8362001
USERS               1 07/29/19 09:21:18          0    8362001
SYSAUX              2 07/29/19 09:46:59    8362001    8363961
SYSTEM              2 07/29/19 09:46:59    8362001    8363961
UNDOTBS1            2 07/29/19 09:46:59    8362001    8363961


1    0  Incr 8362002    07/29/19 09:46:59              NO    /home/oracle/app/oracle/oradata/BSG/datafile/o1_mf_system_fz01xnqo_.dbf
1    1  Incr 8363962    07/29/19 10:04:00              NO    /home/oracle/app/oracle/oradata/BSG/datafile/o1_mf_system_fz01xnqo_.dbf
  


By looking at the SYSTEM tablespace (file 1) I can see exactly what is happening with the BCT file.
The first version marks the starting SCN prior to the first backup after creating the file.
The second version marks the checkpoint SCN of the first backup (low), and the SCN prior to the second backup (high).
These marks, LOW/HIGH SCN, can be used to identify the blocks that changed between the backups.
Now I am going to perform a few more incremental backups with a few changes to the USERS tablespace and see what happens to the SYSTEM and USERS tablespaces versions.

First here is my query.

col tablespace_name format a10

select (select tablespace_name from dba_data_files where file_id=fno) tablespace_name,vercnt,to_char(vertime,'mm/dd/yy hh24:mi:ss') vertime,low, high from x$krcfbh
   where fno in (select file_id from dba_data_files)
    order by 1,2;

Now let's see what my backups look like for File 1 (the SYSTEM tablespace).

List of Backup Sets
===================
  File LV Type Ckp SCN    Ckp Time          Abs Fuz SCN Sparse Name
  ---- -- ---- ---------- ----------------- ----------- ------ ----
  1    0  Incr 8362002    07/29/19 09:46:59              NO    /home/oracle/app/oracle/oradata/BSG/datafile/o1_mf_system_fz01xnqo_.dbf
  1    1  Incr 8363962    07/29/19 10:04:00              NO    /home/oracle/app/oracle/oradata/BSG/datafile/o1_mf_system_fz01xnqo_.dbf
  1    1  Incr 8364371    07/29/19 10:13:21              NO    /home/oracle/app/oracle/oradata/BSG/datafile/o1_mf_system_fz01xnqo_.dbf
  1    1  Incr 8364482    07/29/19 10:15:19              NO    /home/oracle/app/oracle/oradata/BSG/datafile/o1_mf_system_fz01xnqo_.dbf
  1    1  Incr 8364637    07/29/19 10:16:38              NO    /home/oracle/app/oracle/oradata/BSG/datafile/o1_mf_system_fz01xnqo_.dbf
  1    1  Incr 8365075    07/29/19 10:26:23              NO    /home/oracle/app/oracle/oradata/BSG/datafile/o1_mf_system_fz01xnqo_.dbf
  1    1  Incr 8365373    07/29/19 10:29:13              NO    /home/oracle/app/oracle/oradata/BSG/datafile/o1_mf_system_fz01xnqo_.dbf
  1    1  Incr 8365578    07/29/19 10:30:21              NO    /home/oracle/app/oracle/oradata/BSG/datafile/o1_mf_system_fz01xnqo_.dbf
  1    1  Incr 8365763    07/29/19 10:30:58              NO    /home/oracle/app/oracle/oradata/BSG/datafile/o1_mf_system_fz01xnqo_.dbf


Let's see what my backups look like for File 7 (USERS tablespace)

List of Backup Sets
===================
  File LV Type Ckp SCN    Ckp Time          Abs Fuz SCN Sparse Name
  ---- -- ---- ---------- ----------------- ----------- ------ ----
    7    0  Incr 8362002    07/29/19 09:46:59              NO    /home/oracle/app/oracle/oradata/BSG/datafile/o1_mf_users_fz01zl49_.dbf
    7    1  Incr 8363962    07/29/19 10:04:00              NO    /home/oracle/app/oracle/oradata/BSG/datafile/o1_mf_users_fz01zl49_.dbf
    7    1  Incr 8364371    07/29/19 10:13:21              NO    /home/oracle/app/oracle/oradata/BSG/datafile/o1_mf_users_fz01zl49_.dbf
    7    1  Incr 8364482    07/29/19 10:15:19              NO    /home/oracle/app/oracle/oradata/BSG/datafile/o1_mf_users_fz01zl49_.dbf
    7    1  Incr 8364637    07/29/19 10:16:38              NO    /home/oracle/app/oracle/oradata/BSG/datafile/o1_mf_users_fz01zl49_.dbf
    7    1  Incr 8365075    07/29/19 10:26:23              NO    /home/oracle/app/oracle/oradata/BSG/datafile/o1_mf_users_fz01zl49_.dbf
    7    1  Incr 8365373    07/29/19 10:29:13              NO    /home/oracle/app/oracle/oradata/BSG/datafile/o1_mf_users_fz01zl49_.dbf
    7    1  Incr 8365578    07/29/19 10:30:21              NO    /home/oracle/app/oracle/oradata/BSG/datafile/o1_mf_users_fz01zl49_.dbf
    7    1  Incr 8365763    07/29/19 10:30:58              NO    /home/oracle/app/oracle/oradata/BSG/datafile/o1_mf_users_fz01zl49_.dbf




My backups for both of these match. They have the same Ckp SCN.
Notice that I performed an Incremental level 0 backup, and then 8 Incremental level 1 backups.
I actually performed both differential backups and cumulative backups but it didn't matter.

Now let's look at the block change tracking file for these 2 tablespaces (system and users)

TABLESPACE     VERCNT VERTIME                  LOW       HIGH
---------- ---------- ----------------- ---------- ----------
SYSTEM              3 07/29/19 10:04:00    8363961    8364370
SYSTEM              4 07/29/19 10:13:21    8364370    8364481
SYSTEM              5 07/29/19 10:15:19    8364481    8364636
SYSTEM              6 07/29/19 10:16:38    8364636    8365074
SYSTEM              7 07/29/19 10:26:23    8365074    8365372
SYSTEM              8 07/29/19 10:29:13    8365372    8365577
SYSTEM              9 07/29/19 10:30:20    8365577    8365762
USERS               1 07/29/19 09:21:18          0    8362001
USERS               2 07/29/19 09:46:59    8362001    8364481
USERS               3 07/29/19 10:15:19    8364481    8364636


Very interesting... Since I only made a few changes to the users tablespace, it has 3 versions, the oldest of which is from the full backup.
The system tablespace has gone past 7 versions, and it no longer has the original version from the level 0 backup.

Now let's see if it used the BCT files for the backups.
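
The output below appears to come from v$backup_datafile; a query sketch that should return similar columns (filtered to files 1 and 7):

select file#,
       to_char(completion_time,'mm/dd/yy hh24:mi:ss') creation_time,
       incremental_level,
       incremental_change#,
       checkpoint_change#,
       used_change_tracking
  from v$backup_datafile
 where file# in (1,7)
 order by file#, completion_time;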

     FILE# Creation Time     INCREMENTAL_LEVEL INCREMENTAL_CHANGE# CHECKPOINT_CHANGE# USED BCT
---------- ----------------- ----------------- ------------------- ------------------ ---
         1 07/29/19 09:47:24                 0                   0            8362002 YES
         1 07/29/19 10:04:02                 1             8362002            8363962 YES
         1 07/29/19 10:13:21                 1             8363962            8364371 YES
         1 07/29/19 10:15:19                 1             8364371            8364482 YES
         1 07/29/19 10:16:38                 1             8364482            8364637 YES
         1 07/29/19 10:26:24                 1             8364637            8365075 YES
         1 07/29/19 10:29:14                 1             8362002            8365373 YES
         1 07/29/19 10:30:22                 1             8362002            8365578 YES
         1 07/29/19 10:31:10                 1             8362002            8365763 NO

         7 07/29/19 09:47:00                 0                   0            8362002 YES
         7 07/29/19 10:04:02                 1             8362002            8363962 YES
         7 07/29/19 10:13:21                 1             8364300            8364371 YES
         7 07/29/19 10:15:19                 1             8364371            8364482 YES
         7 07/29/19 10:16:38                 1             8364482            8364637 YES
         7 07/29/19 10:26:24                 1             8364637            8365075 YES
         7 07/29/19 10:29:14                 1             8362002            8365373 YES
         7 07/29/19 10:30:22                 1             8362002            8365578 YES
         7 07/29/19 10:30:59                 1             8362002            8365763 YES



WOW...  Notice that the system tablespace (FILE #1) could not use the BCT file for the last backup, but the backup of the users tablespace (FILE #7) still could, because there were no changes between a few of its backups, so it still had all the versions it needed.

I also compared the creation time of the RMAN backup and the creation time of the BCT file record.  The new BCT file record (if changes occurred) is created BEFORE the backup begins.

Finally, I can also see that the high SCN for users, 8364636, is older than the high SCN for system.


I am going to change the users tablespace and perform an incremental backup to see what happens to the LOW/HIGH SCN in the next version.

TABLESPACE     VERCNT VERTIME                  LOW       HIGH
---------- ---------- ----------------- ---------- ----------
USERS               1 07/29/19 09:21:18          0    8362001
USERS               2 07/29/19 09:46:59    8362001    8364481
USERS               3 07/29/19 10:15:19    8364481    8364636
USERS               4 07/29/19 10:16:38    8364636    8366975

After the backup, there is no gap in the SCN numbers.  It created a new version that contained the changes between the previous version HIGH and the SCN of the incremental backup.


So what have we learned about the BCT?



  1. The BCT has a structure that contains the starting SCN of changes captured.
  2. The BCT file creates version records associated with block changes before the backup begins.
  3. If no BCT changes are captured for a datafile, it is assumed that no changes occurred since the starting SCN of the BCT file. 
  4. The BCT file keeps a bit map of changes between 2 SCN numbers
  5. The BCT file keeps changes on a datafile level.
  6. If a datafile didn't change between backups, it doesn't create a new record in the BCT file.  It doesn't matter if the backup is an incremental or full backup.
  7. By default 7 versions are kept for EACH DATAFILE. After 7 days of backups, some datafiles may still use the BCT if they haven't changed.

Below is a flow chart of what happens.



I explain how it is used in part 2 found here.

Tuesday, May 14, 2019

Oracle Security on your database -- Combining SEPS and EUS Security

Oracle offers many methods to authenticate users, and often it becomes difficult to determine how to combine multiple methods.

In this blog I will talk about how to combine 2 commonly used methods.


Security Methods

SEPS 

What is SEPS ?

The first method is SEPS (Secure External Password Store).
SEPS uses a wallet to store default credentials for connecting to a database.  The wallet can be used as the central location for username/password information when connecting from a system/application.  Using the wallet credentials in scripts keeps the credentials secure, and makes it easier to periodically update the password to comply with security rules.

Creating the wallet

The location of the wallet file (WALLET_LOCATION) must be added to the sqlnet.ora file that is used by the OS session.
A second parameter, SQLNET.WALLET_OVERRIDE, must also be set to TRUE. This parameter determines whether SEPS security is used for this database.

SQLNET.ORA entries

WALLET_LOCATION= < Location of wallet on OS >
SQLNET.WALLET_OVERRIDE=TRUE
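
For reference, a fuller sqlnet.ora entry usually looks something like this (the directory is just an example path):

WALLET_LOCATION =
  (SOURCE =
    (METHOD = FILE)
    (METHOD_DATA =
      (DIRECTORY = /u01/app/oracle/admin/wallet)))

SQLNET.WALLET_OVERRIDE = TRUE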

******  Note that many databases can share the same wallet if they each point to the same sqlnet.ora file. *****

Below is an example of how to add an entry into the wallet.

mkstore -wrl < wallet_location > -createCredential < db_connect_string > < username > < password >

The creation of the wallet entry takes 3 parameters (in addition to the wallet location):

1) db_connect_string - The connect string for the database that you wish to add. This can be either an entry in the tnsnames.ora file OR an EZCONNECT string
2) username - This is the user that you wish to connect to the database with
3) password - This is the current password for the user

The wallet is encrypted to obscure the password, and OS permissions should limit which users can read the wallet.

Using the wallet


To access the database using the entry in the wallet, use "/@" followed by the connect string.

Examples for sqlplus and RMAN

SQLPLUS> connect /@<db_connect_string>

RMAN> connect catalog /@<db_connect_string>

OID/EUS


OID (Oracle Internet Directory)/EUS (Enterprise User Security) uses LDAP or other methods to validate user credentials.

OID/EUS also uses a wallet that typically contains a unique certificate to authenticate a database.
Each database that utilizes OID/EUS has its own wallet file.

OID/EUS wallets

The default location for OID/EUS wallets is within the ${ORACLE_BASE} directory structure. By default the wallet for a database (like other database files) is contained within ${ORACLE_BASE}/admin/${DB_NAME}/wallet.

Also, if utilizing the multitenant option, each PDB has a wallet, and the wallet's default location is ${ORACLE_BASE}/admin/${DB_NAME}/pdbguid/wallet.


Combining SEPS and OID/EUS.


Question: How do we combine SEPS and OID/EUS security?  SEPS requires WALLET_LOCATION to be set in the sqlnet.ora, and OID/EUS needs an individual wallet for each database.  Setting WALLET_LOCATION to a central wallet location will "break" OID/EUS authentication.

Answer: The way to combine these two authentication methods is to utilize an environment variable in the sqlnet.ora file.  A variable needs to be created identifying the current database.

Step 1 - Ensure that there is a variable set that identifies the database name.
  - This can be either at the OS level (e.g. setenv DB_NAME <db_name>) 
       or
  - Through srvctl, creating a variable that is set for all nodes in a cluster.

Step 2 - Use this variable to set the wallet location in the sqlnet.ora file.
   - WALLET_LOCATION = ${ORACLE_BASE}/admin/${DB_NAME}/wallet

*** Note: If multitenancy is used, the WALLET_LOCATION will be the root directory for PDB wallets.

Step 3 - Ensure that the SEPS connection/username/password entry is in each wallet that will utilize the SEPS entry (a sketch of the combined setup follows below).
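
Putting the three steps together, a sketch of what this could look like; the paths and names are only examples, and it assumes the DB_NAME variable is set in the environment that starts the database and client sessions:

# sqlnet.ora shared by all databases on the host
WALLET_LOCATION =
  (SOURCE =
    (METHOD = FILE)
    (METHOD_DATA =
      (DIRECTORY = /u01/app/oracle/admin/${DB_NAME}/wallet)))

SQLNET.WALLET_OVERRIDE = TRUE

Then add the SEPS credential to each database's own wallet:

mkstore -wrl /u01/app/oracle/admin/<db_name>/wallet -createCredential < db_connect_string > < username > < password >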

Conclusion


That's all there is to it.  By using environmental variables within the sqlnet.ora file you can point to the individual wallet for each database (satisfying OID/EUS requirements), and still use SEPS (by adding the SEPS entry to each wallet).




Monday, April 8, 2019

RMAN restore as encrypted

There is a new RMAN feature added in 12.2 that allows you to restore tablespaces or a whole database "AS ENCRYPTED" or "AS DECRYPTED".





There is documentation of this feature here for 18c, but for some reason it isn't mentioned as a new feature for 12.2, or documented in 12.2.

Now I am going to demonstrate how to use this.

I have a database (bsg) which is a 12.2 database. I have created an encryption wallet, but none of the datafiles are encrypted.
You can see this from the dba_tablespaces view.

[oracle@oracle-server BSG]$ sqlplus

SQL*Plus: Release 12.2.0.1.0 Production on Mon Apr 8 15:37:52 2019

Copyright (c) 1982, 2016, Oracle.  All rights reserved.

Enter user-name: / as sysdba

Connected to:
Oracle Database 12c Enterprise Edition Release 12.2.0.1.0 - 64bit Production

SQL>  select tablespace_name,encrypted from dba_tablespaces;

TABLESPACE_NAME                ENC
------------------------------ ---
SYSTEM                         NO
SYSAUX                         NO
UNDOTBS1                       NO
TEMP                           NO
USERS                          NO

SQL>


Now I ran a full backup of my database.

[oracle@oracle-server BSG]$ rman

Recovery Manager: Release 12.2.0.1.0 - Production on Mon Apr 8 14:43:50 2019

Copyright (c) 1982, 2017, Oracle and/or its affiliates.  All rights reserved.

RMAN> connect target /

connected to target database: BSG (DBID=589892482)

RMAN> backup incremental level 0 database;




I have a full backup and now I am going to restore the tablespace users as encrypted.
NOTE : I am using "RESTORE TABLESPACE xxx AS ENCRYPTED";

[oracle@oracle-server BSG]$ rman

Recovery Manager: Release 12.2.0.1.0 - Production on Mon Apr 8 14:43:50 2019

Copyright (c) 1982, 2017, Oracle and/or its affiliates.  All rights reserved.

RMAN> connect target /

connected to target database: BSG (DBID=589892482)

RMAN> alter tablespace users offline;

Statement processed

RMAN> restore tablespace users as encrypted;

Starting restore at 04/08/19 14:53:22
using channel ORA_DISK_1

channel ORA_DISK_1: starting datafile backup set restore
channel ORA_DISK_1: specifying datafile(s) to restore from backup set
channel ORA_DISK_1: restoring datafile 00007 to /home/oracle/app/oracle/oradata/BSG/datafile/o1_mf_users_fz01zl49_.dbf
channel ORA_DISK_1: reading from backup piece /home/oracle/app/oracle/fast_recovery_area/bsg/BSG/backupset/2019_04_08/o1_mf_nnnd0_TAG20190408T144422_gbq5n71p_.bkp
channel ORA_DISK_1: piece handle=/home/oracle/app/oracle/fast_recovery_area/bsg/BSG/backupset/2019_04_08/o1_mf_nnnd0_TAG20190408T144422_gbq5n71p_.bkp tag=TAG20190408T144422
channel ORA_DISK_1: restored backup piece 1
channel ORA_DISK_1: restore complete, elapsed time: 00:00:01
Finished restore at 04/08/19 14:53:23
RMAN> recover tablespace users;

Starting recover at 04/08/19 14:53:44
using channel ORA_DISK_1

starting media recovery
media recovery complete, elapsed time: 00:00:00

Finished recover at 04/08/19 14:53:44

RMAN> alter tablespace users online;

Statement processed

RMAN> exit


Now that I have restored and recovered the tablespace, let's see if it shows as encrypted in the data dictionary.

[oracle@oracle-server BSG]$ sqlplus

SQL*Plus: Release 12.2.0.1.0 Production on Mon Apr 8 14:54:13 2019

Copyright (c) 1982, 2016, Oracle.  All rights reserved.

Enter user-name: / as sysdba

Connected to:
Oracle Database 12c Enterprise Edition Release 12.2.0.1.0 - 64bit Production

SQL> select tablespace_name,encrypted from dba_tablespaces;

TABLESPACE_NAME                ENC
------------------------------ ---
SYSTEM                         NO
SYSAUX                         NO
UNDOTBS1                       NO
TEMP                           NO
USERS                          YES



Now to back up the tablespace.  Since it is newly encrypted, I need to make sure I perform a new level 0 incremental backup to capture the encrypted datafiles.


[oracle@oracle-server datafile]$ rman

Recovery Manager: Release 12.2.0.1.0 - Production on Mon Apr 8 16:12:08 2019

Copyright (c) 1982, 2017, Oracle and/or its affiliates.  All rights reserved.

RMAN> connect target /

connected to target database: BSG (DBID=589892482)

RMAN> backup incremental level 0 tablespace users;

Starting backup at 04/08/19 16:12:25
using target database control file instead of recovery catalog
allocated channel: ORA_DISK_1
channel ORA_DISK_1: SID=265 device type=DISK
channel ORA_DISK_1: starting incremental level 0 datafile backup set
channel ORA_DISK_1: specifying datafile(s) in backup set
input datafile file number=00007 name=/home/oracle/app/oracle/oradata/BSG/datafile/o1_mf_users_fz01zl49_.dbf
channel ORA_DISK_1: starting piece 1 at 04/08/19 16:12:26
channel ORA_DISK_1: finished piece 1 at 04/08/19 16:12:27
piece handle=/home/oracle/app/oracle/fast_recovery_area/bsg/BSG/backupset/2019_04_08/o1_mf_nnnd0_TAG20190408T161226_gbqbsbmv_.bkp tag=TAG20190408T161226 comment=NONE
channel ORA_DISK_1: backup set complete, elapsed time: 00:00:01
Finished backup at 04/08/19 16:12:27

Starting Control File and SPFILE Autobackup at 04/08/19 16:12:27
RMAN-00571: ===========================================================
RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
RMAN-00571: ===========================================================
RMAN-03009: failure of Control File and SPFILE Autobackup command on ORA_DISK_1 channel at 04/08/2019 16:12:28
ORA-00237: snapshot operation disallowed: control file newly created



Now the backup was successful, but the autobackup of the controlfile failed.

There is a MOS note on this (Doc ID 2320446.1)

You cannot back up the controlfile until the database is bounced, because there is a data dictionary check pending.  Once I bounce the database, all is good.

Now let's try it the other way:

RESTORE TABLESPACE xxx AS DECRYPTED;

[oracle@oracle-server trace]$ rman

Recovery Manager: Release 12.2.0.1.0 - Production on Mon Apr 8 16:29:30 2019

Copyright (c) 1982, 2017, Oracle and/or its affiliates.  All rights reserved.

RMAN> connect target /

connected to target database: BSG (DBID=589892482)

RMAN> alter tablespace users offline;

using target database control file instead of recovery catalog
Statement processed

RMAN> restore tablespace users as decrypted;

Starting restore at 04/08/19 16:29:43
allocated channel: ORA_DISK_1
channel ORA_DISK_1: SID=265 device type=DISK

channel ORA_DISK_1: starting datafile backup set restore
channel ORA_DISK_1: specifying datafile(s) to restore from backup set
channel ORA_DISK_1: restoring datafile 00007 to /home/oracle/app/oracle/oradata/BSG/datafile/o1_mf_users_fz01zl49_.dbf
channel ORA_DISK_1: reading from backup piece /home/oracle/app/oracle/fast_recovery_area/bsg/BSG/backupset/2019_04_08/o1_mf_nnnd0_TAG20190408T162748_gbqcp4vg_.bkp
channel ORA_DISK_1: piece handle=/home/oracle/app/oracle/fast_recovery_area/bsg/BSG/backupset/2019_04_08/o1_mf_nnnd0_TAG20190408T162748_gbqcp4vg_.bkp tag=TAG20190408T162748
channel ORA_DISK_1: restored backup piece 1
channel ORA_DISK_1: restore complete, elapsed time: 00:00:01
Finished restore at 04/08/19 16:29:45

RMAN> recover tablespace users;

Starting recover at 04/08/19 16:29:54
using channel ORA_DISK_1

starting media recovery
media recovery complete, elapsed time: 00:00:00

Finished recover at 04/08/19 16:29:54

RMAN> alter tablespace users online;

Statement processed

RMAN>



Finally, let's double-check.

[oracle@oracle-server trace]$ sqlplus

SQL*Plus: Release 12.2.0.1.0 Production on Mon Apr 8 16:31:15 2019

Copyright (c) 1982, 2016, Oracle.  All rights reserved.

Enter user-name: / as sysdba

Connected to:
Oracle Database 12c Enterprise Edition Release 12.2.0.1.0 - 64bit Production

SQL> select tablespace_name,encrypted from dba_tablespaces;

TABLESPACE_NAME                ENC
------------------------------ ---
SYSTEM                         NO
SYSAUX                         NO
UNDOTBS1                       NO
TEMP                           NO
USERS                          NO



Tuesday, February 26, 2019

ZDLRA - Space management





When backing up to a ZDLRA, you want to optimize your storage allocation to ensure each database can meet its recovery window goal.

You also want to make sure that the ZDLRA doesn't need to aggressively reclaim space.

***  NOTE - This was updated on 9/13/19 as I've learned more.


Values you can set to control space and purging

Mandatory values

  • Recovery Window Goal (RWG) - How far back (in days) do I need to ensure a point in time recovery for my databases. This is set at the policy level.
        ** NOTE. This is a GOAL, not a guarantee.

  • Disk Reserved Space - How much storage do I need to ensure I keep my recovery window goal for this database.    This is set at the database level and should be adjusted as the database grows.  The sum of Disk_Reserved_Space for all databases cannot be greater than the amount of space available in the storage location.  (A sketch of adjusting this value appears after the optional values below.)

Optional values


  • Max_Retention_Window -  What is the maximum number of days I want to keep backups for.  The MOS note on purging does a great job of documenting what the recommended value is (if you want to set it), and what the current default value is based on RA version.
  • Guaranteed_Copy -  This is set at the policy level and has 2 possible values
    • NO (default) - Follow purge rules regardless of whether backups have been replicated or copied to tape/downstream.
    • YES - Backups that have not been successfully copied to tape/replicated are NOT eligible for deletion.  If the space usage for the database is above the Disk_Reserved_Space, incoming backups (both through RMAN and real-time redo) will be rejected.
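
If I recall correctly, the reserved space for a protected database is adjusted with DBMS_RA.UPDATE_DB on the ZDLRA.  Treat this as a hedged sketch and verify the parameter names against your RA version; the database name and size are only examples:

begin
  dbms_ra.update_db(
    db_unique_name => 'BSG18',
    reserved_space => '500G');
end;
/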

Derived values that control space and purging


Each storage location has the following values that are used to control purging.

  • Free_Space - The amount of space (GB) that is available for storing backups. This is tracked per storage location.
  • Free_Space_Goal - The estimated amount of space that is needed to support incoming backups to the storage location. This is evaluated every 3 hours. The estimate is the sum of space needed for the next 6 hours, taking into account 2 types of incoming backups:
    • Daily backups from databases that have historical data.  The amount of space required to support the normal incoming backups is calculated into the Free_Space_Goal.
    • Newly added databases. In the case of databases that do not have good historical data, the Disk_Reserved_Space is used to calculate how much space needs to be available to support incoming backups.
  • Recovery_Window_Space - The amount of space needed by the database to support the RWG. This is available for each database.
  • Space_Usage - The amount of space currently being used by this protected database.


Purge Process


How backups are prioritized for purging is described in this MOS note.
Zero Data Loss Recovery Appliance Purging Rules (Doc ID 2353188.1)


There are 3 rules of purging and this is outlined in the MOS note.




  • Rule 1 - This is the normal housekeeping purging and occurs on a regular basis for all databases.  Any backups whose recovery window is greater than the Max_Retention_Window are periodically removed from the ZDLRA to make space for incoming backups. This is a low priority background process.
The way this rule works is that the retention window for each database (maximum based on policy) grows until the number of days set for Max_Retention_Window is reached for the database.  When this occurs, the retention period is purged down to the Recovery_Window_Goal for that database.  For example, if my database has a Recovery_Window_Goal of 15 days, and a Max_Retention_Window of 20 days, when the database reaches 20 days of retention, the oldest 5 days of backups will be removed and I will have 15 days of backups after purging.  This ensures the purge is more of a bulk process, and ensures that the purge workload is spread out across all databases.

  • Rule 2 - When the Free_Space for a storage location falls below the Free_Space_Goal, regular purging begins.  The database whose current retention is most above its RWG is added to the purging queue, and purging begins as a low-priority process.

The way this rule works is similar to Rule 1.  Beginning with the database that has the largest current number of days of retention beyond its Recovery_Window_Goal, each database is purged down to its Recovery_Window_Goal until enough space is released. As I said, the processing is similar to Rule 1, except it doesn't wait until the Max_Retention_Window is reached.

     
  • Rule 3 - When the Free_Space is not enough  to allow incoming backups, a more aggressive purging begins starting with the database that is most above (by percentage) its allocated Disk_Reserved_Space.  This could impact the RWG if the Disk_Reserved_Space is not large enough to support the RWG.

** NOTE there are a few exceptions to removing backups.
  • Any backup that is currently being restored by a client is exempt from the purging rules. This ensures that any restore in progress completes successfully.
  • Any replication or copy-to-tape process that is currently executing is exempt.  This ensures that the replication/tape copy completes successfully.

How this works.

As long as there is enough space in the storage location to support the Max_Retention_Window for all databases, there is nothing to note on purging.  Rule 1 executes on a periodic basis.

Rule 2 - Free_Space_Goal.


As I tried to explain above, Free_Space_Goal is an estimate of how much space may be needed to ingest new backups.
If the RA believes that it will need more space to ingest backups than is currently available, it will start purging older backups that are not needed to support the RWG for each database.
This purging will continue until the Free_Space_Goal is more than Free_Space, or all databases only have backups that support their RWG.

This makes sense for a couple of reasons.

  • It takes into account newly added databases that don't have history.  This ensures that when you send the first few backups to the RA, there is sufficient space to ingest these backups.
  • If space has become tight on the RA, and it can no longer support the Max_Retention_Window for databases, there is enough space available to keep ingesting new backups.
Rule 2 only goes as far as removing backups that are not needed to support the RWG for all databases.

As you can imagine this might not free enough space if your Disk_Reserved_Space is not adequate for all databases.


Rule 3 - Disk_Reserved_Space

Rule 3 kicks in when you run out of Free_Space.  This is the most aggressive purging, and its priority is higher than most processes that occur on the ZDLRA.  Accepting incoming backups is one of the highest priorities, and freeing up space for backups is critical.

From this document, you can see that this is where the Disk Reserved Space allocation for each database becomes important.

The amount of Disk Reserved Space allocated is compared to the amount of space used by this database.  The percentage by which the space used exceeds the space reserved is used to decide the order in which databases' backups are purged.

Backups will continue to be purged to free up space.  

A good recommendation is to set the Disk_Reserved_Space to at least 10% greater than what is needed to support your RWG. This needs to be updated on a regular basis.

*** NOTE -- Also, keep in mind that you might have some databases with seasonal peaks, e.g. Cyber Monday.  For databases that have periodic peak periods, the Disk_Reserved_Space should reflect the amount of space needed to support the peak.  These should be exempt from any automated adjustment based on the space needed for the RWG.

Keep in mind that the combined Disk Reserved Space for all databases cannot be larger than the amount of space available in the storage pool.


Important notes :

It is critical that you accurately set the Disk_Reserved_Space for all databases.  If you consistently set the Disk_Reserved_Space to be 20% greater than what is needed for the RWG for ALL databases, this ensures that the ZDLRA will not get more than 80% full, and gives you some buffer when space becomes tight.

Incorrectly setting the Disk_Reserved_Space for an existing database may cause the RWG to be lost.  If the Disk_Reserved_Space is less than what is needed, Rule 3 may purge backups needed to support the RWG to free up space. The RWG is NOT GUARANTEED.

Correctly setting the Disk_Reserved_Space for newly added databases is important.  This value is used by the Free_Space_Goal calculation, and an incorrect value may force unnecessary purging to make room for the incoming backups.


Adjusting RWG


There is a package you can use to estimate how much storage is needed to support a given recovery window goal.  This can be used to determine if a database's reserved space is enough to meet the goal.  DBMS_RA.ESTIMATE_SPACE is the function you call.

ESTIMATE_SPACE

This function estimates the amount of storage in GB required for recovery of a given database and a desired recovery window.

Syntax

FUNCTION estimate_space (
   db_unique_name IN VARCHAR2,
   target_window IN DSINTERVAL_UNCONSTRAINED) RETURN NUMBER;

Parameters (Table 12-23 ESTIMATE_SPACE Parameters)

  • db_unique_name - The name of the database needing the storage estimate.
  • target_window - The desired recovery window for the database. Specify the goal as any valid INTERVAL DAY TO SECOND expression, such as INTERVAL '2' DAY (2 days), INTERVAL '4' HOUR (4 hours), and so on.
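
A quick sketch of calling it from SQL*Plus on the ZDLRA; the database name and window are only examples:

set serveroutput on
declare
  est_gb number;
begin
  est_gb := dbms_ra.estimate_space(
              db_unique_name => 'BSG18',
              target_window  => interval '15' day);
  dbms_output.put_line('Estimated space (GB) for a 15 day window: ' || est_gb);
end;
/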


Recommendations on Purging and Space Management.


  • Correctly setting the Disk_Reserved_Space is critical to managing space
  • Periodically adjusting the Disk_Reserved_Space as the database grows or shrinks is important.  This can be done through the PLSQL package, and can be scheduled on a regular basis.
  • Know which databases, if any, have seasonal peaks and ensure the Disk_Reserved_Space reflects any peak.
  • Properly set the Disk_Reserved_Space for new databases.  This will help prevent unnecessary purging, which could affect the ingest of backups.
  • Use the estimate space package to adjust Disk_Reserved_Space if adjusting RWG for a policy.
  • Create an alert on the Disk_Reserved_Space falling below the Recovery_Window_Space. This tells you that the RWG may be lost if the ZDLRA needs to reclaim space.
  • Keep track of the Space_Usage vs Disk_Reserved_Space vs Recovery_Window_Space to understand how each database's space will be managed (see the query sketch at the end of this post).
  • Periodically check the SAR report (MOS note 2275176.1) for issues.  Process issues, etc. that cause ordering waits affect the estimated space needed for the RWG, and this may quickly inflate the Disk_Reserved_Space when using an automated adjustment process.
  • Use the Size_Estimate from the RA_DATABASE view to ensure the space utilization makes sense.  This can be a check to verify that the storage utilization of backups on the ZDLRA seems reasonable for the size of the database.
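
A monitoring query along these lines (column names are as referenced in this post; check the RA_DATABASE view in your RA version) can be run regularly to compare these values for each database:

select db_unique_name,
       recovery_window_space,
       disk_reserved_space,
       space_usage,
       size_estimate
  from ra_database
 order by db_unique_name;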