Thursday, January 3, 2019

Verifying you have a recoverable backup

As you probably know the best way to validate that you can recover your database to a specific point-in-time is to perform a point-in-time recovery of the database and successfully open it. That isn't always possible due to the size of the database, infrastructure necessary, and the time it takes to go through the process.  To verify recoverability there are several commands that give you all the pieces necessary, but it is important to understand what each command does.

Restore Validate -  The restore validate command can be performed for any database objects including the whole database to verify the files are valid.  This command can be used to restore as of the current time (default), but you can also specify a previous point-in-time.  The validate verifies that the files are available and also checks for corruption.  By default it checks for physical corruption, but it can also check for logical corruption. This command reads through the backup pieces 
Examples of restore validate for different cases.
verify current backup -
RMAN> RESTORE DATABASE VALIDATE;
RMAN> RESTORE ARCHIVELOG ALL VALIDATE;
This will verify that the last full backup (level 0) does not contain corruption, and it verifies that archivelogs exist throughout the recovery window, and that they do not contain corruption. 
verify previous backup -
RMAN> RESTORE DATABASE VALIDATE UNTIL TIME "to_date('xx/xx/xx xx:xx:xx','mm/dd/yy hh24:mi:ss')";
This will verify that the full backup performed prior to the "until time" is available and does not contain corruption.
Restore Preview - The restore preview can be used to identify the backup pieces necessary for a restore AND recovery of the database.  The preview will show all the datafile backups (both full and incremental), along with the archive logs needed.  The restore preview does not check for corruption, it simply lists the backup pieces necessary.

Validate -  The validate command can be performed on any database objects along with backupsets. Unlike restore, you cannot give it an "until time", you must identify the object you want to validate.  By default it checks for physical corruption, but it can also check for logical corruption.

It is critical to understand exactly what all three commands do to ensure you can efficiently recover to a previous point-in-time.  It takes a combination of 2 commands (plus an extra to be sure).

So let's see how I can test if I can recover to midnight on the last day of the previous month.  My backup strategy is "Weekly full/Daily incremental" backups.  I perform my full backups on Saturday night and keep my backups for 90 days. My archive logs are kept on disk for 20 days.

It is now 12/15/18 and I want to verify that I can recover my database to 23:59:59 on 11/30.

First I want to perform a restore validate of my database using until time.
RMAN > restore validate database UNTIL TIME "to_date('11/30/18 23:59:59','mm/dd/yy hh24:mi:ss')";
By looking through the logs, I can see the this command performed a validation of the Level 0 (full) backup I performed on 11/24/18.  It did not validate any of the incremental backups or archive logs that I will need to recover the database.
Now I want to perform a restore validate of my archive logs using until time.
RMAN > restore validate archivelog UNTIL TIME "to_date('11/30/18 23:59:59','mm/dd/yy hh24:mi:ss')";
By looking through the logs, I can see that this command validated the backup of all my older archive logs (older than 20 days), and it validated the FRA copy of my archive logs that were 20 days or newer.
There are couple of issues with this that jump out at me.
1) My incremental backups are NOT validated.  Yes I know I can recover to the  point-in-time I specified, but all I verified is that it is possible by applying 6 days of archive logs.
2) I don't know if any of my newer backups of archive logs are corrupt.  The validation of archive logs only validated backups of archivelogs that are NOT on disk. I still need to validate the archive log backups for archive logs that will age off from the FRA in a few days to be sure..
3) The process doesn't check or validate a backup of the controlfile.  If I am restoring to another server (or if I lose my database completely) I want to make sure I have a backup of the controlfile to restore.
The only way to be sure that I have an efficient, successful recovery to this point in time is to combine the 2 commands and use that information to validate all the pieces necessary for recovery.
Here are the steps to ensure I have valid backups of all the pieces necessary to restore efficiently.
1) Perform a restore preview and write the output to a log file.
2) Go through the output from step 1 and identify all the "BS KEY" (backupset key) values.
3) Perform a "validate backupset xx;" using the keys from step 2 to ensure all datafile backups (both full and incremental backups) are valid.
4) Go through the output from step 1 and identify the archive log sequences needed.
5) Identify the backupsets that contain all log sequences needed (my restore preview pointed to the archive logs on disk).
6) Perform a "validate backupset xx;" using the keys from step 5.
7) Perform a restore controlfile to a dummy location using the UNTIL time.
As you can see the "restore validate" is not enough to ensure efficient recovery. It only ensures that the you have a valid RESTORE not a valid recovery.
The ZDLRA does constant recovery checks using the virtual full backups, and archive logs.
The ZDLRA ensures you can quickly AND successfully recover to any point-in-time within your retention window.


No comments:

Post a Comment