Tuesday, February 26, 2019

ZDLRA - Space management





When backing up to a ZDLRA you want to optimize  your storage allocation to ensure each database can meet its recovery window goal.

You also want to make sure that the ZDLRA doesn't needs to aggressively reclaim space.

***  NOTE - This was updated on 9/13/19 as I've learned more.


Values you can set to control space and purging

Mandatory values

  • Recovery Window Goal (RWG) - How far back (in days) do I need to ensure a point in time recovery for my databases. This is set at the policy level.
        ** NOTE. This is a GOAL, not a guarantee.

  • Disk Reserved Space - How much storage to I need to ensure I keep my recovery window goal for this database.    This is set at the database level and should be adjusted as the database grows.  The sum of Disk_Reserved_Space for all databases can not be greater than the amount of space available on the storage location.  

Optional values


  • Max_Retention_Window -  What is the maximum number of days I want to keep backups for.  The MOS note on purging does a great job of documenting what the recommended value is (if you want to set it), and what the current default value is based on RA version.
  • Guaranteed_Copy -  This is set at the policy level and has 2 possible values
    • NO (default) - Follow purge rules regardless if backups have been replicated to tape/downstream.
    • YES - Backups that have not been successfully copied to tape/replicated are NOT eligible for deletion.  If the space usage for the database is above the Disk_Reserved_Space, incoming backups (both through RMAN and real-time redo) will be rejected.

Derived values that control space and purging


Each storage location has the following values that are used to control purging.

  • Free_Space - The amount of space (GB) that is available for storing backups . This is for the storage location
  • Free_Space_Goal - The estimated amount of space that is needed to support incoming backups to the storage location. This is evaluated every 3 hours. This estimate is the sum of space needed for the next 6 hours taking into account 2 types of incoming backups .
    • Daily backups from databases that have historical data.  The amount of space required to support normal incoming backup space is calculated in the Free_Space_Goal.
    • Newly added databases. In the case of databases that do not have good historical data, the Disk_Reserved_Space is used to calculate how much space needs to be available to support incoming backups.
  • Recovery_Window_Space - The amount of space needed by the database to support the RWG. This is available for each database.
  • Space_Usage - The amount of space currently being used by this protected database


Purge Process


How backups are prioritized for purging is described in this MOS note.
Zero Data Loss Recovery Appliance Purging Rules (Doc ID 2353188.1)


There are 3 rules of purging and this is outlined in the MOS note.




  • Rule 1 - This is the normal housekeeping purging and occurs on a regular basis for all databases.  Any backups whose recovery window is greater than the Max_Retention_Window are periodically removed from the ZDLRA to make space for incoming backups. This is a low priority background process.
The way this rule works is that the retention window for each database (maximum based on policy) grows until the number of days set for Max_Retention_Window is reached for the database.  When this occurs, the retention period is purged down to the Recovery_Window_goal for that database.  For example if my database has a Recovery_Window_Goal of  15 days, and a Max_Retention_Window of 20 days, when the database reaches 20 days of retention, the oldest 5 days of backups will be removed and I will have 15 days of backups after purging.  This ensures the purge is more of a bulk process, and ensures that the purge workload is spread out for all databases.

  • Rule 2 - When the Free_Space for a storage location falls below the Free_Space_Goal regular purging begins.  The database whose current retention is most above it's RWG is added to the purging queue and purging begins at a low priority process.

The way this rule works is similar to Rule 1.  Beginning with the database with the largest current  number of days of retention beyond the Recovery_Window_goal, databases, each database is purged down to its Recovery_Window_Goal until enough space is released. As I said, the processing is similar to rule 1, except it doesn't wait until the Max_Retention_Window is reached.

     
  • Rule 3 - When the Free_Space is not enough  to allow incoming backups, a more aggressive purging begins starting with the database that is most above (by percentage) its allocated Disk_Reserved_Space.  This could impact the RWG if the Disk_Reserved_Space is not large enough to support the RWG.

** NOTE there are a few exceptions to removing backups.
  • Any backup that is currently being restored by a client is exempt from the purging rules. THis ensures that any restore in process is successful
  • Any replication or copy-to-tape process that is currently executing is exempt.  This ensures that the replication/tape copy completes successfully.

How this works.

As long as there is enough space in the storage location to support the Max_Retention_Window for all databases, there is nothing to note on purging.  Rule 1 executes on a periodic basis.

Rule 2 - Free_Space_Goal.


As I tried to explain above, Free_Space_Goal is a estimate of how much space may be needed to ingest new backups.
If the RA believes that it will need more space available ingest backups than what is currently available, it will start purging older backups that are not needed to support the RWG for databases.
This purging will continue until the Free_Space_Goal is more than Free_Space, or all databases only have backups that support their RWG.

This makes sense for a couple of reasons.

  • It takes into account newly added databases that don't have history.  This ensures that when you send the first few backups to the RA, there is sufficient space to ingest these backups.
  • If space has become tight on the RA, and it can no longer support the Max_Retention_Window for databases, there is enough space available to keep ingesting new backups.
Rule 2 only goes as far as removing backups that are not needed to support the RWG for all databases.

As you can imagine this might not free enough space if your Disk_Reserved_Space is not adequate for all databases.


Rule 3 - Disk_Reserved_Space

Rule 3 kicks in when you run out of Free_Space.  This is the most aggressive purging, and it's priority is higher than most processes that occur on the ZDLRA.  Accepting incoming backups is one of the highest priorities and freeing up space for backups is critical.

From this document, you can see that this is where the Disk Reserved Space allocation for each database becomes important.

The amount of Disk Reserved Space allocated is compared to the amount of space used by this database.  The % of space used greater than the space reserved is used to decide the order in which databases backups are purged.

Backups will continue to be purged to free up space.  

A good recommendation is to set the Disk_Reserved_Space to at least 10% greater than what is needed to support your RWG. This needs to be updated on a regular basis.

*** NOTE   -- Also, you need to keep in mind that you might have some basis with seasonal peaks, ie cyber Monday.  For databases that have periodic peak periods, the Disk_Reserved_space should reflect the amount of space needed to support the peak.  This should be exempt from any automated adjustment using the RWG needed space.

Keep in mind that the combined Disk Reserved Space for all databases cannot be larger than the amount space available in the storage pool.


Important notes :

It is critical that you accurately set the Disk_Reserved_Space for all databases.  If you consistently set the Disk_Reserved_Space to be 20% greater than what is needed for the RWG for ALL databases, this ensures that the ZDLRA will not get more 80% full, and give you some buffer when space becomes tight.

Incorrectly setting the Disk_Reserved_Space for an existing  database may cause the RWG to be lost.  If the Disk_Reserved_Space is less than what is needed, Rule 3 may purge backups needed to support the RWG to free up space. THE RWG is NOT GUARANTEED.

Correctly setting the Disk_Reserved_Space for newly added databases is important.  This value is used by the Free_Space_Goal, and may force unnecessary purging to make room for the incoming backups.


Adjusting RWG


There is a package you can use to estimate how much storage is needed to support a given recovery window goal.  This is can be used to determine if a databases's reserved space is enough to meet the goal.  DBMS_RA.ESTIMAT_SPACE is the package you call.

ESTIMATE_SPACE

This procedure estimates the amount of storage in GB required for recovery of a given database and a desired recovery window.
Syntax
FUNCTION estimate_space (
   db_unique_name IN VARCHAR2,
   target_window IN DSINTERVAL_UNCONSTRAINED) RETURN NUMBER;
Parameters
Table 12-23 ESTIMATE_SPACE Parameters
ParameterDescription
db_unique_name
The name of the database needing the storage estimate.
target_window
The desired recovery window for the database. Specify the goal as any valid INTERVAL DAY TO SECOND expression, such as INTERVAL '2' DAY (2 days), INTERVAL '4' HOUR (4 hours), and so on.


Recommendations on Purging and Space Management.


  • Correctly setting the Disk_Reserved_Space is critical to managing space
  • Periodically adjusting the Disk_Reserved_Space as the databases grows or shrinks is important.  This can be done through the PLSQL package, and can be scheduled on a regular basis.
  • Know which databases, if any, have seasonal peaks and ensure the Disk_Reserved_Space reflects any peak.
  • Properly set the Disk_Reserved_Space for new database.  This will help prevent unnecessary purging, and possibly affecting ingesting of backups.
  • Use the estimate space package to adjust Disk_Reserved_Space if adjusting RWG for a policy.
  • Create an alert on the Disk_Reserved_Space falling below the Recovery_Window_Space. This tells you that the RWG may be lost if the ZDLRA needs to reclaim space.
  • Keep track of the Space_Usage vs Disk_Reserved_Space vs Recovery_Window_Space to understand how each databases space will be managed.
  • Periodically check SAR report (MOS note 2275176.1) for issues.  Process issues, etc. that cause ordering waits affect the estimated "needed space for RWG".  This may quickly inflate the Disk_Reserved_space when using an automated process.
  • Use the Size_Estimate from the RA_DATABASE view to ensure the space utilization makes sense.  This can be a check to verify that the storage utilization of backups on the ZDLRA seems reasonable for the size of the database.