Monday, February 27, 2012

Setting aside a node for maintenance on Exadata

Actually, this isn't exadata specific, but it becomes even more important on a multi-node cluster.

First the background.

  I have a data warehouse application in which we are loading up lots of data.  At the same time, we have users reporting off the data.  I am finding that we actually have 2 needs, and they are opposed

USERS -- Their needs
  • Lots of concurrency
  • Small amounts of data
  • Small PGA
  • small temp
  • Large SGA
If the users need more than this something probably went wrong with their query..

 DBA/ODI jobs

  • Very little concurrency (except for some parallelization)
  • Large amounts of data
  • HUGE PGA
  • HUGE Temp
  • HUGE Undo segments
  • Small SGA

The Temp issue is easy enough to fix with a separate temp  for each user, and by setting up a temporary tablespace group for users.

But what about my the other things data load jobs need ??  The only answer seems to be to set aside 1 (or more) nodes out of my cluster for maintenance/loading.  This node (or nodes) will have a different configuration.  This node, let's say node 8, has the following characteristics.

  • The only service  running on this node is my ODI (data load) service, and a service the DBA's to use for index rebuilds
  • PGA Automatic memory management is not enabled
  • work_area_size_policy is manual
  • sort_area_size=60g
  • hash_area_size=60g
  • undo tablespace size is set to 1tb, much, much larger than the other nodes.  Undo_retention is set to a very large number.
The only work done on Node 8 will be loading of large tables, and  rebuild/creation of indexes.

I was even thinking about getting the memory expansion kit for just this node, to bring it up to 144g from 96g.

Anyone else do this ? set aside a node specifically for "heavy lifting" with a different configuration ?

UPDATE ---  After writing this, and looking at my load process, I noticed that most of my load time is going to temp.. Both reads, and writes since I am doing huge hashes.  I am considering Dedicating SSD luns to the temp area for my ODI jobs only.  I might even try dedicating SSD to the i$ (intermediate staging tables) that ODI uses.

Friday, February 10, 2012

OLTP compression slow for large data set

I am working on loading a large dataset (about 500 Million rows that take up about 100g of space).

I am loading them into a partitioned table and I was hoping to use HCC compression, but at least OLTP compression.

After loading for a while, the inserts seem to go slow and slower,  I was able to test my table structure with OLTP and no compression, and found that there was indeed a bottleneck with compress, but it didn't really get bad until about an hour into the procession.

My process is to do a bulk collect (in pl/sql) of 500 rows, and insert them into my partitioned table.




Below is an example.. From 7:00 am, until 9:30 (in red) I was inserting data into the table with compression off.
You can see that that the number of rows processed in each interval (15 minutes) was consistently ~30 million. for a throughput of 127 Million/hour.
Also take a look at buffer gets/exec, elapsed time/exec, and CPU time/exec.  These values all remain fairly consistent after the first 45 minutes.
At the end of 2.5 hours 351 Million rows were loaded

Now compare to the Blue  (10:15 - 10:45), I was inserting data into the same table with compression on.
You can see that the rows processed started at 22 Million (for the first 15 minutes), but it kept trending downward.  You will also notice that the reads went up,
Compare same values (buffer gets, rows processed, cpu time), and you can see the performance just continues to degrade.
Finally, look at the Violet. This is a snapshot of the currently running load after over 250 Million of data has been loaded.
Notice that we are processing at about 10 Million rows/ hour, the buffer gets are up, and the CPU time has increased.

OLTP compression seems to be significantly slowing down the loads once they get moving along.

 
Anyone see this, or have any idea why it slows down ?  The only theory I can come up with is "garbage collection" for the partitions.. I reach a point, where I am inserting into blocks, that haven't been compressed, and oracle is now going back and compressing the blocks to make room.

Here are the performance numbes.  I've also included the AWR logic read output, If you take number of executions * buffer gets, you find that the logical reads are all from the inserts.


END_TIME
ELAPSED_TIME
EXECUTIONS
TOTAL_READS_PER_EXECUTION
ROWS_PROCESSED Total
BUFFER_GETS
CPU_TIME


2/10/2012 7:00
0.004
56,711
283
28,355,500
283
4,316


2/10/2012 7:15
0.004
83,225
262
41,612,500
262
4,206


2/10/2012 7:30
0.004
81,178
293
40,589,000
293
4,332


2/10/2012 7:45
0.007
66,630
945
33,315,000
945
6,821


2/10/2012 8:00
0.007
62,374
1,190
31,187,000
1,190
7,353


2/10/2012 8:15
0.009
58,031
1,640
29,015,500
1,640
8,912


2/10/2012 8:30
0.008
59,598
1,442
29,799,000
1,442
8,292


2/10/2012 8:45
0.009
57,116
1,648
28,558,000
1,648
8,952


2/10/2012 9:00
0.008
60,477
1,410
30,238,500
1,410
8,057


2/10/2012 9:15
0.009
56,334
1,710
28,167,000
1,710
9,060


2/10/2012 9:30
0.008
61,627
1,293
30,813,500
1,293
7,681






351,650,500







Throughput
127,872,909






















2/10/2012 10:15
0.013
45,964
1,940
22,982,000
1,940
12,878


2/10/2012 10:30
0.019
33,048
3,014
16,524,000
3,014
19,466


2/10/2012 10:45
0.018
36,192
2,235
18,096,000
2,235
18,024


2/10/2012 11:00
0.017
37,362
1,737
18,681,000
1,737
17,507


2/10/2012 11:15
0.018
34,992
1,526
17,496,000
1,526
17,799


2/10/2012 11:30
0.036
20,757
6,253
10,378,500
6,253
35,703


2/10/2012 11:45
0.046
16,744
8,714
8,372,000
8,714
46,436






112,529,500







throughput
64,302,571













END_TIME
ELAPSED_TIME_DELTA
EXECUTIONS_DELTA
TOTAL_READS_PER_EXECUTION
ROWS_PROCESSED_DELTA
DISK_READS_DELTA
BUFFER_GETS
CPU_TIME

2/9/2012 22:00
0.186
4,572
33,631
2,286,000
11
33,620
171,338

2/9/2012 22:15
0.188
4,632
33,240
2,316,000
11
33,229
171,302

2/9/2012 22:30
0.19
4,545
33,574
2,272,500
10
33,564
174,641

2/9/2012 22:45
0.182
4,762
33,027
2,381,000
11
33,016
167,433














9,255,500













Segments by Logical Reads

  • Total Logical Reads: 159,380,402
  • Captured Segments account for 99.5% of Total
Owner
Tablespace Name
Object Name
Subobject Name
Obj. Type
Logical Reads
%Total
BGRENN
BGRENN_2009
FACT_BGRENN_DETL
ERD_BGRENN_2009
TABLE PARTITION
36,379,968
22.83
BGRENN
BGRENN_2010
FACT_BGRENN_DETL
ERD_BGRENN_2010
TABLE PARTITION
35,459,344
22.25
BGRENN
BGRENN_2011
FACT_BGRENN_DETL
ERD_BGRENN_2011
TABLE PARTITION
34,801,888
21.84
BGRENN
BGRENN_2008
FACT_BGRENN_DETL
ERD_BGRENN_2008
TABLE PARTITION
33,651,856
21.11
BGRENN
BGRENN_2007
FACT_BGRENN_DETL
ERD_BGRENN_2007
TABLE PARTITION
9,641,168
6.05