Thursday, April 28, 2011

Cardinality Feedback recommendations

Here are the recommendations I have for working nicely with Cardinaly feedback.

1) Make sure your statistics are accurate.

2) Try to use histograms where necessary for cadinality

3) Avoid functions on columns or anything that forces oracle to estimate cardinality.

4) Test a query more than once.  Cardinality feedback doesn't show up until the second execution

5) Don't necessarily trust the "explain plan for" for any of the tools that gives you the explain plan based on the first  execution.  The first plan may change.

6) Use function based indexes on expressions (like the decode expression in my previous post), and gather statistics on the function based index

7) Create a histogram for indexes with composite keys, especially when the cardinality isn't as expected. The perfect example of this is a composite index on the column  "Birth Month" and  "Astrological Sign".  There are only so many valid combiniations. Without a histogram, Oracle assumes a cartesion product, and estimates cardinality based on that.

I think it all comes down to Giving oracle as much information as possible, and then making sure you have tested the query a couple of times.

Often the development folks don't realize these things.  They find "cheats" that only work the first time.  They often rely on the explain plan for the first execution.

It is up to us in the DBA realm to educate developers on how things have changed with this feature.

Wednesday, April 27, 2011

Cardinality Feedback Example

I have been fighting cardinality feedback issues all week, and I wanted to document an example of cardinality feedback in action.

First take build my test case, it can be found here

drop user testcard cascade;
create user testcard identified by testcard;
grant dba to testcard;
create TABLE testcard.object_sample1 as select * from dba_objects where 1=2;
create TABLE testcard.object_sample2 as select * from dba_objects where 1=2;

Insert into testcard.object_sample1 (select * from dba_objects where owner in ('SYS','TESTCARD') );
Insert into testcard.object_sample1 (select * from testcard.object_sample1 );
Insert into testcard.object_sample1 (select * from testcard.object_sample1 );
Insert into testcard.object_sample2 (select * from dba_objects where owner in ('SYS','TESTCARD') );

create index testcard.idx1_object_sample2 on testcard.object_sample2(owner);
create index testcard.idx1_object_sample1 on testcard.object_sample1(owner);

EXECUTE DBMS_STATS.GATHER_TABLE_STATS( ownname => 'TESTCARD', tabname => 'OBJECT_SAMPLE1', method_opt => 'FOR ALL COLUMNS SIZE 1',granularity => 'ALL',cascade => TRUE,degree => DBMS_STATS.DEFAULT_DEGREE);

EXECUTE DBMS_STATS.GATHER_TABLE_STATS( ownname => 'TESTCARD', tabname => 'OBJECT_SAMPLE2', method_opt => 'FOR ALL COLUMNS SIZE 1',granularity => 'ALL',cascade => TRUE,degree => DBMS_STATS.DEFAULT_DEGREE);


Now lets take a look a what we have

select owner,count(1) from testcard.object_sample1 group by owner;

OWNER                            COUNT(1)
------------------------------ ----------
TESTCARD                                8
SYS                                126620

select owner,count(1) from testcard.object_sample2 group by owner;

OWNER                            COUNT(1)
------------------------------ ----------
TESTCARD                                2
SYS                                 31655


Now here is the query

select * from testcard.object_sample1 a,
testcard.object_sample2 b
where a.owner='SYS' and a.owner is not null and a.object_type='SSSSS'
and b.owner='SYS' and b.object_type=a.object_type and a.owner=b.owner
 
Notice the "is not null" clause, and notice the cardinality.  Finally notice I did not create a histogram for the columns.  if you run this query for owner of 'TESTCARD', it should return 2 rows, and if you run it for 'SYS' it should scan 126,00+ rows.
 
Well the developers are running it for 'TESTCARD', and it is doing a full table scan. They want to use the index for OWNER.  They find a cheat, and rewrite the query. This fools the optimize into thinking the index is a better bet since a function is used (the decode function)..
 
Below is the query with the "is not null" and the "decode" and you can see the 2 different paths.
 

"A.OWNER IS NOT NULL"
 
EXPLAIN PLAN
SET STATEMENT_ID = 'abc' FOR
select * from testcard.object_sample1 a,
testcard.object_sample2 b
where a.owner='SYS' and a.owner is not null and a.object_type='SSSSS'
and b.owner='SYS' and b.object_type=a.object_type and a.owner=b.owner;
set pagesize 25
set linesize 121
set long 1000000
col xplan format a100
SELECT dbms_xplan.display_plan(statement_id => 'abc') AS XPLAN
FROM dual;
 


 Plan Hash Value  : 3876965040

-------------------------------------------------------------------------------------
| Id  | Operation            | Name           | Rows   | Bytes    | Cost | Time     |
-------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT     |                | 294519 | 56547648 |  650 | 00:00:08 |
| * 1 |   HASH JOIN          |                | 294519 | 56547648 |  650 | 00:00:08 |
| * 2 |    TABLE ACCESS FULL | OBJECT_SAMPLE2 |    384 |    36864 |  136 | 00:00:02 |
| * 3 |    TABLE ACCESS FULL | OBJECT_SAMPLE1 |   1535 |   147360 |  513 | 00:00:07 |
-------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
------------------------------------------
* 1 - access("B"."OBJECT_TYPE"="A"."OBJECT_TYPE" AND "A"."OWNER"="B"."OWNER")
* 2 - filter("B"."OBJECT_TYPE"='SSSSS' AND "B"."OWNER"='SYS')
* 3 - filter("A"."OBJECT_TYPE"='SSSSS' AND "A"."OWNER"='SYS')



"DECODE(A.OWNER,NULL,1,2)=2"
EXPLAIN PLAN

SET STATEMENT_ID = 'cde' FOR
select * from testcard.object_sample1 a,
testcard.object_sample2 b
where a.owner='SYS' and decode(a.owner,null,1,2)=2 and a.object_type='SSSSS'
and b.owner='SYS' and b.object_type=a.object_type and a.owner=b.owner ;
set pagesize 25
set linesize 121
set long 1000000
col xplan format a100
SELECT dbms_xplan.display_plan(statement_id => 'cde') AS XPLAN
FROM dual;



----------------------------------------------------------------------------------------------------
 Plan Hash Value  : 2829654189

------------------------------------------------------------------------------------------------
| Id  | Operation                      | Name                | Rows | Bytes  | Cost | Time     |
------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT               |                     | 2945 | 565440 |  279 | 00:00:04 |
| * 1 |   HASH JOIN                    |                     | 2945 | 565440 |  279 | 00:00:04 |
| * 2 |    TABLE ACCESS BY INDEX ROWID | OBJECT_SAMPLE1      |   15 |   1440 |  142 | 00:00:02 |
| * 3 |     INDEX RANGE SCAN           | IDX1_OBJECT_SAMPLE1 |  629 |        |  133 | 00:00:02 |
| * 4 |    TABLE ACCESS FULL           | OBJECT_SAMPLE2      |  384 |  36864 |  136 | 00:00:02 |
------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
------------------------------------------
* 1 - access("B"."OBJECT_TYPE"="A"."OBJECT_TYPE" AND "A"."OWNER"="B"."OWNER")
* 2 - filter("A"."OBJECT_TYPE"='SSSSS')
* 3 - access("A"."OWNER"='SYS')
* 3 - filter(DECODE("A"."OWNER",NULL,1,2)=2)
* 4 - filter("B"."OBJECT_TYPE"='SSSSS' AND "B"."OWNER"='SYS')


So as you can see there are decode plan is a better plan for owner of  'TESTCODE'.  The deveopment folks then rewrite their query with this "cheat" thinking they beat the optimizer...


But cardinality feedback kicks in, when the query is executed with an owner of 'SYS'.

Below is the output of the display_cursor.. Notice the 2 plans, and notice that cardinality feedback was used to change the plan from an index lookup to FTS again.


select * from testcard.object_sample1 a,testcard.object_sample2 b where a.owner='SYS' and decode(a.owner,null,1,2)=2 and a.object_type='SSSSS' and b.owner='SYS' and b.object_type=a.object_type and a.owner=b.owner;


PLAN_TABLE_OUTPUT
-------------------------------------------------------------------------------------------------------------------------
SQL_ID  91ugxuy8jc3g4, child number 0
-------------------------------------
 select * from testcard.object_sample1 a,
testcard.object_sample2 b  where a.owner='SYS' and
decode(a.owner,null,1,2)=2  and a.object_type='SSSSS'       and
b.owner='SYS' and b.object_type=a.object_type and a.owner=b.owner

Plan hash value: 2829654189

----------------------------------------------------------------------------------------------------
| Id  | Operation                    | Name                | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |                     |       |       |   279 (100)|          |
|*  1 |  HASH JOIN                   |                     |  2945 |   552K|   279   (1)| 00:00:04 |
|*  2 |   TABLE ACCESS BY INDEX ROWID| OBJECT_SAMPLE1      |    15 |  1440 |   142   (1)| 00:00:02 |
|*  3 |    INDEX RANGE SCAN          | IDX1_OBJECT_SAMPLE1 |   629 |       |   133   (1)| 00:00:02 |
|*  4 |   TABLE ACCESS FULL          | OBJECT_SAMPLE2      |   384 | 36864 |   136   (0)| 00:00:02 |
----------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------


PLAN_TABLE_OUTPUT
-------------------------------------------------------------------------------------------------------------------------
   1 - access("B"."OBJECT_TYPE"="A"."OBJECT_TYPE" AND "A"."OWNER"="B"."OWNER")
   2 - filter("A"."OBJECT_TYPE"='SSSSS')
   3 - access("A"."OWNER"='SYS')
       filter(DECODE("A"."OWNER",NULL,1,2)=2)
   4 - filter(("B"."OBJECT_TYPE"='SSSSS' AND "B"."OWNER"='SYS'))






select * from testcard.object_sample1 a,testcard.object_sample2 b where a.owner='SYS' and decode(a.owner,null,1,2)=2 and a.object_type='SSSSS' and b.owner='SYS' and b.object_type=a.object_type and a.owner=b.owner;

select * from table(dbms_xplan.display_cursor('07prx7pkch6tr',null,'typical +peeked_binds'));

PLAN_TABLE_OUTPUT
-------------------------------------------------------------------------------------------------------------------------
SQL_ID  91ugxuy8jc3g4, child number 0
-------------------------------------
 select * from testcard.object_sample1 a,
testcard.object_sample2 b  where a.owner='SYS' and
decode(a.owner,null,1,2)=2  and a.object_type='SSSSS'       and
b.owner='SYS' and b.object_type=a.object_type and a.owner=b.owner

Plan hash value: 2829654189

----------------------------------------------------------------------------------------------------
| Id  | Operation                    | Name                | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |                     |       |       |   279 (100)|          |
|*  1 |  HASH JOIN                   |                     |  2945 |   552K|   279   (1)| 00:00:04 |
|*  2 |   TABLE ACCESS BY INDEX ROWID| OBJECT_SAMPLE1      |    15 |  1440 |   142   (1)| 00:00:02 |
|*  3 |    INDEX RANGE SCAN          | IDX1_OBJECT_SAMPLE1 |   629 |       |   133   (1)| 00:00:02 |
|*  4 |   TABLE ACCESS FULL          | OBJECT_SAMPLE2      |   384 | 36864 |   136   (0)| 00:00:02 |
----------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------


PLAN_TABLE_OUTPUT
-------------------------------------------------------------------------------------------------------------------------
   1 - access("B"."OBJECT_TYPE"="A"."OBJECT_TYPE" AND "A"."OWNER"="B"."OWNER")
   2 - filter("A"."OBJECT_TYPE"='SSSSS')
   3 - access("A"."OWNER"='SYS')
       filter(DECODE("A"."OWNER",NULL,1,2)=2)
   4 - filter(("B"."OBJECT_TYPE"='SSSSS' AND "B"."OWNER"='SYS'))

SQL_ID  91ugxuy8jc3g4, child number 1
-------------------------------------
 select * from testcard.object_sample1 a,
testcard.object_sample2 b  where a.owner='SYS' and
decode(a.owner,null,1,2)=2  and a.object_type='SSSSS'       and
b.owner='SYS' and b.object_type=a.object_type and a.owner=b.owner

Plan hash value: 1738482415

-------------------------------------------------------------------------------------
| Id  | Operation          | Name           | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |                |       |       |   650 (100)|          |
|*  1 |  HASH JOIN         |                |  2945 |   552K|   650   (1)| 00:00:08 |
|*  2 |   TABLE ACCESS FULL| OBJECT_SAMPLE1 |    15 |  1440 |   513   (1)| 00:00:07 |
|*  3 |   TABLE ACCESS FULL| OBJECT_SAMPLE2 |   384 | 36864 |   136   (0)| 00:00:02 |

PLAN_TABLE_OUTPUT
-------------------------------------------------------------------------------------------------------------------------
-------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - access("B"."OBJECT_TYPE"="A"."OBJECT_TYPE" AND
              "A"."OWNER"="B"."OWNER")
   2 - filter(("A"."OBJECT_TYPE"='SSSSS' AND "A"."OWNER"='SYS' AND
              DECODE("A"."OWNER",NULL,1,2)=2))
   3 - filter(("B"."OBJECT_TYPE"='SSSSS' AND "B"."OWNER"='SYS'))

Note
-----
   - cardinality feedback used for this statement


This should be enough to play with and see how cardinality feedback can kick in to dynamically change the plan on you.

NOTE : I've played further, and this is what I could find.  Anyone correct if any of these are wrong..

) The v$sqlstats only showed one plan_hash_value even though the plan changed.


2) The plan didn't change immediately if I execute one after another.. When I put in a sleep, then the plan changes (must be some sort of delay before cardinality feedback kicks in).

3) Once locked into the new plan, it stays with it.

4) Adaptive Cursor sharing only kicks in if a histogram is used, or if the bind values fall outside the high/low values.

5) Oracle avoids using plans that are suspect (i.e. in my example with decode).

6) Oracle tends to use cardinality feedback more when the plan is "suspect" because of some sort of rough estimation.

7) Using dynamic sampling causes Oracle to use cardinality feedback more.



Cardinality feedback seems to be more of a weighting thing. Oracle uses tries to look at the "situation" and the less likely the estimate is reliable, the more likely it is to use cardinality feedback to choose another plan.






Friday, April 22, 2011

cardinality feedback

My big challenge this week was an 11g database that wouldn't behave.  I'm sure you've been there.. Getting frustrated by a database that doesn't seen to perform right.

I can tell you that challenge has gotten even bigger in 11g.. A new feature was added called "cardinality feedback".  The idea is simple.  Oracle comes up with a plan, including cardinality for each step, and then executes the plan.  If, after executing the plan, the optimizer believes that the Actual cardinality is out of line with the estimated cardinality, the optimizer will pick a new plan.

Sounds like a great feature right ?  But what if you have issues with histograms, and your actual cardinality is lopsided.  Oracle may chose a plan, run against the lobsided data, and decided the plan is no good.  Oracle morphs the plan for you.. No charge, no control.  It get's frrustrating.

The only quick fix I've found is the COE_PROFILE script that is part of the SQLT tuning kit available from MOS.

There is quick way to test this behavior (if you think it's occuring), use this undocumented parameter

-- Turn on cardinality feedback

alter session set "_optimizer_use_feedback" = true;

-- Turn off (disable) cardinality feedback
alter session set "_optimizer_use_feedback" = false;

Tuesday, April 19, 2011

HCC compression

In my free time this week I have been playing on the exadata.  The first thing I wanted to determine is the compression ratios for my data, and the select statistics.

One of the first observations I made was how HCC saves you on selects. I took a 158g table from another system, and loaded it up in HCC query mode (I loaded it direct).. the table is now 6g in size.. WOW, but that didn't impress me as much as my select.

First I selected from the table with storage indexes off.  2.02 seconds.. 2.02 seconds ?  1.7 Billion rows that was 158g of data.  wow.

Next I select from the table with storage indexes on. it takes 9.02..  The extra 7 seconds was the exadata building the storage indexes.   Then I select again.  .30 seconds.  No indexes.

I am going to do some more testing.  but this certainly is impressive !!

Saturday, April 16, 2011

pin x wait on s wait event

Chances are you've stumbled onto my website because you are searching for the keywords "pin x wait on s" "wait" "event" and "parsing".

Well I have been fighting a fire for the last 2 weeks in our performance enviroment on just these "keywords".

We have been doing our testing by sending through a set workload with very similar processing needs.  This workload is date sensitive, so it has been a challenge to keep moving dates and retesting. Recently, we have begin utilzing "flashback database" for our testing.. This has great promise for getting consistent testing done!
You run through your test case scenario, and capture your performance data.  You flashback to the start point, make the change, then run through the same exact scenario. Pretty cool huh ?

Well all this was going as planned.  We ran our processing, saved AWR data, and reran.  Suddenly we started seeing these "pin x wait on S" events holding up processing for minutes at a time. Where did these come from ? was it the release ??  We had to dig in and find out.

Well here is some background.
  • Our queries are very, very complex.  They often take ~ 1000ms to parse (1 second), and 4ms to execute.
  • flashback will flush your cache, kind of obvious, but pertinent to my situation.
  • Our processing is very structured, and sends multiple processing streams through the same steps in the same order
So what was happening ?  Well it turns out that this is caused by hard parsing multiple duplicate sql (probably 50+ in our case) simultaneously.  This wait event is Oracle trying to parse the first sql statement, and having all the others wait.  This parse queing doesn't do well with sql of this complexity, and it throws off the parsing causing these wait events.

Just wanted to pass this on, that if you see this event, and you are doing similar testing, look at the complexity of the queries, and consider hard simultaneous parsing of the same sql as the cuplrit.


Friday, April 8, 2011

Exadata class

This week I attended "exadata and database machine admintration" class.  It was a good class, but they usually are from OU.

This class covers a lot of material in 3 days and it is a good intro to what it's going to be like managing an exadara.  Take heed to the prerecs.  They aren't manditory, but knowing about RAC and grid is necessary to understand how Exadata implements those pieces.

For those who might attend, my advice is do more with dbcontrol,  The class doesn't say much about dbcontrol or grid control.

Well to start at the beginning the class is in a virual environment, Each student has their own virtual 1/4 rack with 1 cpu per node,  This works out well because everyone has their own isolated environment.  This environment has a database with the sh default schema.

In order to get the most out of the class
1) I changed database password for sys,system,dbsnmp,and sysman.
2) I recreated the password file.
3) I installed dbcontrol (dbca)
4) I set up ssh keys for celladmin, and cellmonitor to go to each cell from the oracle account.  Be aware that to set up ssh for cellmonitor, copy the keys as root and change permissions.  The cellmonitor account has very little privileges,
5) I created my own account in asm and granted sysasm to it.

After doing all this I was able to start up dbcontrol and connect to my virtual environment,  This gave me a great view into how the disks were configured, and I was able to add the exadata cells to dbcontrol.

I think having dbcontrol set up made the class much more useful so I could visually see all the items newly available on the exadata.

Tuesday, April 5, 2011

Hadoop

I know after all this exadata posting, your are probably wondering why my blood doesn't run black and red ?

Well I've been hearing about Hadoop for years, and I think it's finally about time that I set up an Hadoop envioronment.

If you have never heard of hadoop here is a starting point.

There are also 2 very interesting flavors.

Cassandra - this allows for HA through clustering your hadoop database with no single point of failure
Hive          - This allows for data warehousing with the building of materialized views.

The thing that grabbed my attention is the ability to handle large amounts of log data.  Like most major companies we produce unwielding amounts of log files from our application server farms with no way to sift through it.  It seems like a lot of people are utilizing hadoop to organize and search log files.

Interesting subject that I will be pursuing further. If anyone has any experience, or words of wisdom on this pass it on.

Sunday, April 3, 2011

Configuring an Exadata (lessons learned)

Well, the time is finally here. 

Twas the night before Exadata,
and all through the data warehouse,
Not a keyboard was stirring,
Not even a mouse

ACS was all snug in their hotel beds,
with visions of "ONE" scripts danced in their heads.
And DBA's with laptops, and I with my smart phone,
had just settled in to see the hype all get blown.

When out on the server room there arose such a clatter,

I sprang from the bed to see what was the matter.
Away to the laptop I flew like a flash,
Tore open the lid and started up bash.

The nodes on the cabinet of  the new server appliance
Gave the heat of mid-day to everything near the servers in the alliance.
When, what to my wondering eyes should appear,
But a man with a sweater drinking a beer.
he was a sailing pro with the magic of a fairy,

I knew in a moment it must be Larry.

More rapid than eagles his crew they came,
And he whistled, and shouted, and called them by name!

Now Dan Norris! now, Kerry Osborne! now, .....

Well you get the picture..

Anyway here is my lessons learned.  These are mostly the things that would have helped us get from purchase, to ACS coming on site a little faster.

1) IP addresses.. Yes the exadata needs lots of IP address.  Here is what you need for an x2-2 full rack.


Ethernet Subnet 1 IP addresses 51
ILOM for Database Servers 8
ILOM for Exadata Cells 14
Eth0 for Database Servers 8
Eth0 for Exadata Cells 14
Mgmt port for IB switches 3
IP address for KVM 1
IP address for Ethernet Switch 1
IP address for PDUs 2
Ethernet Subnet 2 IP addresses 19
Eth1 for Database Servers 8
VIPs for Database Servers 8
SCAN Addresses (per Cluster) 3
Total 70

2) Naming - The naming convention for a server name is used for all the components within the Exadata.  Even the disk themselves include the name of the server so you can track down any issues.  That's not saying that your standard host name isn't usable. It just means that you give Oracle 1 name, and it is the building block for everything else.

3) Default database -  Surprisingly the default database is probably going to be useless to you (unless you are currently going to use the same configuration as oracle provides).. Oracle creates a default database with default parameters, and default Characterset.  Anything other than that you are on your own.

4) Backup.  If you've read my previous posts you have probably found that this is the most confusing area.  Each Database server comes with 4 1ge ports, and 2 10ge ports.  If you are using 10ge you are all set.. Bond and aggregate, and you are in business.. Infiniband you just use the 2 infiniband ports.
If you are still on 1ge like a lot of the datacenters, you can't easily make it redundant.  Eth0 is reserved for management.  Eth1 and eth2 are usually bonded for active-passive redundancy. This leaves one port for your tape backup.. No redundancy.  This is one of the most important things if you are planning on using 1ge.  Make sure you understand how you are going to configure the Exadata, and what that you most likely will not have redundancy.

Finally, I'll pass on two of my favorite comments to make on all this.

"Buying an exadata is like putting a window airconditioner in your 1920's house" -- This is from a time when I had a house with original wiring.. My wife would try to blow dry her air with the airconditioner on, and the fuse would blow.. If you put an exadata in your datacenter running 1ge , you will blow a fuse.

"Buying an Exadata doesn't make things easier.. It is easier to be told we can't afford it, it's more difficult if they buy it.  It's like telling your mangement that you need a ferarri to go fast, and they say yes.. now drive it 320MPH without crashing." -This should be pretty self explainitory.  There are strong expectations from managment when you buy one of these.








Friday, April 1, 2011

New Free oracle products are out

Oracle XE  11g beta  just came out today, SQL Developer 3.0 came out, and Oracle SQL developer data modeler came out a couple of months ago. All free.