Thursday, January 31, 2019

TDE Implementation first 30 days

This blog post is on the immediate effect of implementing TDE in your database backup storage.

In the last blog post I mentioned that after implementing TDE you need to explicitly perform a full backup on all tablespaces you encrypt.
  • If you are performing an incremental strategy (level 0 and level 1) then explicitly perform a level 0 backup.
  • If you are using an incremental merge backup (BACKUP INCREMENTAL LEVEL 1 for RECOVERY of COPY), you must create a new image copy and start a new divergent backup
The reason a full explicit backup is needed is that the SCN number of the underlying data blocks do not change when you implement encryption.  Because of this, any backup strategy you are using does not recognize that the the blocks have changed.  This is true even with the ZDLRA.

Above is the reason for this post.  If you have an unencrypted database, and you implement TDE on all tablespaces, then perform a full backup of all tablespaces, you will have 2 copies of your backups for a period of time.

I am going to take a closer look at what this means to my backup storage when implementing TDE.

For my example, I'm going to use the same example I used in the last post.
Below is what it looks like.

Using my example dataset for a just a weekly full backup and a daily incremental backup (using deduplication) you can see that

The data in my database is 35 GB ( the fully allocated size is 50 GB).
The amount of change over 30 days is 70 GB.
Compressing both of these, the final size is 41.5 GB used.

Prior to implementing TDE, I am using 41.5 consistently to store 30 days of compressed backups for my 50GB database.

The next column shows what happens when after implementing TDE.  The full backup size compressed is now 43.9 GB and the amount of change over 30 days is 54 GB compressed.
After implementing TDE, I am using 97.9 GB consistently to store 30 days of compressed backups for my 50 GB database.

Now let's see what first 30 days looks like.

This shows the unencrypted backups (purple) and how they start to age off and get removed on day 31.  You can also see how the encrypted backups (gray) grow over time and the will stabilize to the size you see on day 31.

You can see that there is an immediate impact to the backup usage. That growth continues until the old backup is finally ages off.

You need to plan for TDE ahead of time.  For my example dataset, the storage needed more than doubled.

Friday, January 25, 2019

Implementing TDE

This blog post is about what happens when you implement TDE (Transparent Data Encryption) in your database.

I started with 2 other blog posts, but I soon realized that this is a HUGE topic with ramifications throughout the database system.

Well let's start at the beginning.
I wanted to know what happens when you implement TDE.
The 3 items that I keep hearing are

  1. Encrypted data will not compress
  2. Encrypted data will not dedupe
  3. Implementing Advanced compression in the database will mitigate the loss of compression to encryption by compressing the data before encrypting

Now in order to investigate this further, I had some restrictions that affected my testing.

  • I had no access to a deduplicating appliance to test point 2
  • I didn't have a real data set to use.  I ended up using the SOE schema in swingbench 
  • The only way I could test the effect of compression was to compress the backupset.  Most appliances (including the ZDLRA) break the file up into pieces and compress each piece. I couldn't emulate the affect of doing that.
Now for my testing dataset.  I created a dataset in the SOE tablespace that in uncompressed.  I then took that dataset and created a second copy of the schema with advanced compression in place in the SOE_COMPRESSED tablespace.

When I finished this I had 2 tablespaces comprised of 10 x 5 GB datafiles.
SOE                              --> uncompressed copy of data
SOE_COMPRESSED --> Advanced compression (ACO) copy of data

In the graphs above you can see  that out of the 50 GB of allocated space
uncompressed  --> 35.8 GB of used space
compressed      -->  24.9 GB of used space

These are the datasets I am going to use throughout my testing.

I started by taking these 2 datasets and seeing what happens with the 2 full backup types
L0               --> Full backups saved as backupset
image copy --> Full copy of datafile.

Below is what I found when I compressed both these backup types.

This was very helpful to see what happens.

For my uncompressed dataset (35 GB), I was able to compress the file down to 20 GB.
For my uncompressed image copy (35 GB used) I was able to compress the file down to 18 GB
For my compressed dataset (24 GB), I was able to compress the file down to 18 GB
For my compressed image copy (24 GB), I was able to compress the file down to 18 GB.

So what I learned is no matter how I compressed the data, in the database (ACO), or just compress the backup, the final size is still ~18 GB.

Now we have some benchmarks on what happens when I just compress the backups.

I started with encrypting the image copy.  This was one area that I wanted to learn more about for 2 reasons.

  1. Many of the newer backup strategies use the "incremental merge" process that started with 10G of oracle. I was curious what happens to image copies when you implement TDE.
  2. Many of the Flash arrays use compression to get more usable storage from the smaller SSD drives.  Looking at what happens with compressing the datafile copies gives a lot of insight into how those flash arrays will be affected by TDE.

You can see from the above testing there was a very significant impact on the compressed size of the image copy.  Both compressed and uncompressed data compressed nicely when it was unencrypted.  After encrypting the dataset, the compressed size is about 3x larger than the compressed size.

This gives me the answer on how it will affect these 2 areas.

  1. If using an "incremental merge" based backup that is stored on compressed storage, you will need 3x the amount of storage for your backups.
  2. If using Flash storage for your database files that utilizes compression, you will need 3x the amount of database storage.
OK.. There's the first answer I was looking for.

Now let's see what happens with my Level 0 backups.

With level 0 backups, I can see the benefit of compressing the data first.
I can also see the effect of encryption.

Compressing the data first saved me a about a 1/3 of the size of the data, and about 1/3 of the size of the backupset compressed (once encrypted).

Now let's take a look of the effect of encryption on the change rate.  I updated one of the tables in the database to create a change rate, and I looked at both the archive logs and the incremental backup.

Wow.. What stood out to me on this testing is the size of the archive logs.  Adding Advanced Compression to the database decreased the size of the incremental backups (expected), but increased the size of the archive logs.  Then when I looked at the compressed size (both with and without encryption) compressing the data in the database increased the size of the archive logs.

This showed me a lot of what happens with TDE.  What I learned through this was.

  1. Encrypted data indeed does not compress, in fact it can get much bigger.
  2. Implementing Advanced Compression does not always mitigate the effects of encryption. Depending on the dataset, it might have very limited effect.
  3. The compressed size of datafiles and image copy backups may increase by 3x or more depending on the amount of free space you typically have in your datafiles..

Thursday, January 3, 2019

Verifying you have a recoverable backup

As you probably know the best way to validate that you can recover your database to a specific point-in-time is to perform a point-in-time recovery of the database and successfully open it. That isn't always possible due to the size of the database, infrastructure necessary, and the time it takes to go through the process.  To verify recoverability there are several commands that give you all the pieces necessary, but it is important to understand what each command does.

Restore Validate -  The restore validate command can be performed for any database objects including the whole database to verify the files are valid.  This command can be used to restore as of the current time (default), but you can also specify a previous point-in-time.  The validate verifies that the files are available and also checks for corruption.  By default it checks for physical corruption, but it can also check for logical corruption. This command reads through the backup pieces 
Examples of restore validate for different cases.
verify current backup -
This will verify that the last full backup (level 0) does not contain corruption, and it verifies that archivelogs exist throughout the recovery window, and that they do not contain corruption. 
verify previous backup -
RMAN> RESTORE DATABASE VALIDATE UNTIL TIME "to_date('xx/xx/xx xx:xx:xx','mm/dd/yy hh24:mi:ss')";
This will verify that the full backup performed prior to the "until time" is available and does not contain corruption.
Restore Preview - The restore preview can be used to identify the backup pieces necessary for a restore AND recovery of the database.  The preview will show all the datafile backups (both full and incremental), along with the archive logs needed.  The restore preview does not check for corruption, it simply lists the backup pieces necessary.

Validate -  The validate command can be performed on any database objects along with backupsets. Unlike restore, you cannot give it an "until time", you must identify the object you want to validate.  By default it checks for physical corruption, but it can also check for logical corruption.

It is critical to understand exactly what all three commands do to ensure you can efficiently recover to a previous point-in-time.  It takes a combination of 2 commands (plus an extra to be sure).

So let's see how I can test if I can recover to midnight on the last day of the previous month.  My backup strategy is "Weekly full/Daily incremental" backups.  I perform my full backups on Saturday night and keep my backups for 90 days. My archive logs are kept on disk for 20 days.

It is now 12/15/18 and I want to verify that I can recover my database to 23:59:59 on 11/30.

First I want to perform a restore validate of my database using until time.
RMAN > restore validate database UNTIL TIME "to_date('11/30/18 23:59:59','mm/dd/yy hh24:mi:ss')";
By looking through the logs, I can see the this command performed a validation of the Level 0 (full) backup I performed on 11/24/18.  It did not validate any of the incremental backups or archive logs that I will need to recover the database.
Now I want to perform a restore validate of my archive logs using until time.
RMAN > restore validate archivelog UNTIL TIME "to_date('11/30/18 23:59:59','mm/dd/yy hh24:mi:ss')";
By looking through the logs, I can see that this command validated the backup of all my older archive logs (older than 20 days), and it validated the FRA copy of my archive logs that were 20 days or newer.
There are couple of issues with this that jump out at me.
1) My incremental backups are NOT validated.  Yes I know I can recover to the  point-in-time I specified, but all I verified is that it is possible by applying 6 days of archive logs.
2) I don't know if any of my newer backups of archive logs are corrupt.  The validation of archive logs only validated backups of archivelogs that are NOT on disk. I still need to validate the archive log backups for archive logs that will age off from the FRA in a few days to be sure..
3) The process doesn't check or validate a backup of the controlfile.  If I am restoring to another server (or if I lose my database completely) I want to make sure I have a backup of the controlfile to restore.
The only way to be sure that I have an efficient, successful recovery to this point in time is to combine the 2 commands and use that information to validate all the pieces necessary for recovery.
Here are the steps to ensure I have valid backups of all the pieces necessary to restore efficiently.
1) Perform a restore preview and write the output to a log file.
2) Go through the output from step 1 and identify all the "BS KEY" (backupset key) values.
3) Perform a "validate backupset xx;" using the keys from step 2 to ensure all datafile backups (both full and incremental backups) are valid.
4) Go through the output from step 1 and identify the archive log sequences needed.
5) Identify the backupsets that contain all log sequences needed (my restore preview pointed to the archive logs on disk).
6) Perform a "validate backupset xx;" using the keys from step 5.
7) Perform a restore controlfile to a dummy location using the UNTIL time.
As you can see the "restore validate" is not enough to ensure efficient recovery. It only ensures that the you have a valid RESTORE not a valid recovery.
The ZDLRA does constant recovery checks using the virtual full backups, and archive logs.
The ZDLRA ensures you can quickly AND successfully recover to any point-in-time within your retention window.