Monday, August 1, 2011

Cardinality Feedback detailed example

Well, I have often written about cardinality feedback, and I have been striving to find out more detail on how it works, how to determine why it was used, and how to determine when it was used.

I have ran through a series of tests that answers a lot of these questions.  There is some information available, but  it isn't complete.  Here a couple of good starting points..


They were even able to recreate some examples. Well here is my example.

First, my dataset. I used the zipcode database you can find here.  I used a sqldr script you can find here, to load a table structure you can find here.

First I loaded the data (about 74,000 rows), created a couple of indexes and ran a query.

here are the indexes
 CREATE INDEX "SYSTEM"."INDEX1" ON "SYSTEM"."TEST_DATA"
    (  "ZIPCODE" )
    TABLESPACE "SYSTEM" ;
  
 CREATE INDEX "SYSTEM"."INDEX2" ON "SYSTEM"."TEST_DATA"
    (  "CITY" )
     TABLESPACE "SYSTEM" ;
  
 CREATE INDEX "SYSTEM"."INDEX3" ON "SYSTEM"."TEST_DATA"
    (  "COUNTY" )
     TABLESPACE "SYSTEM" ;  

select /* colorme */ * from test_data where city='ROCHESTER' and county='MONROE';



There are individual indexes on both city and county, and there is a relationship between the 2 items.  This query will return 44 rows of data.

First output is the information from v$sql_shared_cursor.  Notice the column "use_feedback_stats".  I am also supplying the information from v$sqlstats for this query.  Here is the output after the first execution.

v$SQL_SHARED_CURSOR
SQL_ID CHILD_NUMBER USE_FEEDBACK_STATS ------------- ---------------------- ------------------ 33x0cazrmgrgy 0 Y
V$SQLSTATS
SQL_ID PLAN_HASH_VALUE PARSE_CALLS BUFFER_GETS ROWS_PROCESSED ------------- ---------------------- ---------------------- ---------------------- ---------------------- 33x0cazrmgrgy 757026858 1 551 44

Notice that the column "use_feedback_stats" is set to 'Y" indicating that the query is marked for cardinality feedback to determine if the first execution is accurate enough.  I also ran a 10053 trace.  You can see from the output the query is rewritten to be executed with dynamic sampling.

SELECT /* OPT_DYN_SAMP */ /*+ ALL_ROWS IGNORE_WHERE_CLAUSE NO_PARALLEL(SAMPLESUB) opt_param('parallel_execution_enabled', 'false') NO_PARALLEL_INDEX(SAMPLESUB) NO_SQL_TUNE */ NVL(SUM(C1),0), NVL(SUM(C2),0), NVL(SUM(C3),0), NVL(SUM(C4),0) 
FROM 
(SELECT /*+ IGNORE_WHERE_CLAUSE NO_PARALLEL("TEST_DATA") FULL("TEST_DATA") NO_PARALLEL_INDEX("TEST_DATA") */ 1 AS C1, 
CASE WHEN "TEST_DATA"."CITY"='ROCHESTER' AND "TEST_DATA"."COUNTY"='MONROE' THEN 1 ELSE 0 END AS C2, 
CASE WHEN "TEST_DATA"."COUNTY"='MONROE' THEN 1 ELSE 0 END AS C3, 
CASE WHEN "TEST_DATA"."CITY"='ROCHESTER' THEN 1 ELSE 0 END AS C4 
FROM "BGRENN"."TEST_DATA" SAMPLE BLOCK (4.972376 , 1) SEED (1) "TEST_DATA") SAMPLESUB



Here is the output from explain plan

 PLAN_TABLE_OUTPUT                                                    
  
 -------------------------------------------------------------------------------------------------------------------------
   Plan hash value: 757026858                                                
   -----------------------------------------------------------------------------------------                
   | Id | Operation          | Name   | Rows | Bytes | Cost (%CPU)| Time   |                
   -----------------------------------------------------------------------------------------                
   |  0 | SELECT STATEMENT      |      |   8 | 1272 |   3  (0)| 00:00:01 |                
   |* 1 | TABLE ACCESS BY INDEX ROWID| TEST_DATA |   8 | 1272 |   3  (0)| 00:00:01 |                
   |* 2 |  INDEX RANGE SCAN     | INDEX2  |  121 |    |   1  (0)| 00:00:01 |                
   -----------------------------------------------------------------------------------------                
   Predicate Information (identified by operation id):                                   
   ---------------------------------------------------                                  
     1 - filter("COUNTY"='MONROE')                                             
     2 - access("CITY"='ROCHESTER')                                            
  
 Note                                                           
   -----                                                          
     - dynamic sampling used for this statement (level=2)                                 
  
Second output is the information from v$sql_shared_cursor (again).  Notice the column "use_feedback_stats" is set to 'N'
SQL_ID        CHILD_NUMBER           USE_FEEDBACK_STATS 
------------- ---------------------- ------------------ 
33x0cazrmgrgy 0                      Y                  
33x0cazrmgrgy 1                      N                  

SQL_ID        PLAN_HASH_VALUE        PARSE_CALLS            BUFFER_GETS            ROWS_PROCESSED         
------------- ---------------------- ---------------------- ---------------------- ---------------------- 
33x0cazrmgrgy 757026858              2                      572                    88                     



I ran a 10053 trace again (as before).  You can see in this case the results from the first execution (with 44 rows), is passed into the second execution using the hint "opt_estimate". You can also see what parts of the query are utilizing cardinality feedback to change the query.


SELECT /*+ OPT_ESTIMATE (TABLE "TEST_DATA" ROWS=44.000000 ) OPT_ESTIMATE (INDEX_SCAN "TEST_DATA" "INDEX2" MIN=69.000000 ) 
OPT_ESTIMATE (INDEX_FILTER "TEST_DATA" "INDEX2" ROWS=69.000000 ) */ 
"TEST_DATA"."ZIPCODE" "ZIPCODE",
"TEST_DATA"."LAT" "LAT",
"TEST_DATA"."CITY" "CITY",
"TEST_DATA"."STATE" "STATE",
"TEST_DATA"."COUNTY" "COUNTY",
"TEST_DATA"."TYPE" "TYPE",
"TEST_DATA"."PREFERRED" "PREFERRED",
"TEST_DATA"."WORLDREGION" "WORLDREGION",
"TEST_DATA"."COUNTRY" "COUNTRY",
"TEST_DATA"."LOCATIONTEXT" "LOCATIONTEXT",
"TEST_DATA"."LOCATION" "LOCATION",
"TEST_DATA"."POPULATION" "POPULATION",
"TEST_DATA"."HOUSINGUNITS" "HOUSINGUNITS"
,"TEST_DATA"."INCOME" "INCOME",
"TEST_DATA"."LANDAREA",
"LANDAREATEST_DATA"."WATERAREA" "WATERAREA",
"TEST_DATA"."DECOMMISIONED" "DECOMMISIONED",
"TEST_DATA"."MILITARYRESTRICTIONCODES" "MILITARYRESTRICTIONCODES" 
FROM "BGRENN"."TEST_DATA" "TEST_DATA" 
WHERE "TEST_DATA"."CITY"='ROCHESTER' AND "TEST_DATA"."COUNTY"='MONROE'


I ran this query until the v$sql_shared_cursor came back clean, and showed a "n" indicating that dynamic feedback was no longer being considered.  There was information from 2 hard parses in the 10053 file.
SQL_ID        CHILD_NUMBER           USE_FEEDBACK_STATS 
------------- ---------------------- ------------------ 
33x0cazrmgrgy 0                      Y                  
33x0cazrmgrgy 1                      N                  

SQL_ID        PLAN_HASH_VALUE        PARSE_CALLS            BUFFER_GETS            ROWS_PROCESSED         
------------- ---------------------- ---------------------- ---------------------- ---------------------- 
33x0cazrmgrgy 757026858              3                      593                    132                    


This answered my first question. How do you tell what caused the cardinality feedback ? You run a 10053, and find the OPT_ESTIMATE cardinality hints that adjust the cardinility and reparse the query.


/***************************************/

For my second test, I used the same result set, and ran statistics against the data.  After running the query from the first test, I found that cardinality was not being utilzed, so I changed my query.



select /* colorme */ * from test_data where city='ROCHESTER' and county='MONROE1' and state='NY';

This returns no rows of data, but the optimizer can't figure this out.

First execution you can see that cardinality feedback was marked for the statement, and notice the plan_hash_value of 7570268858.  In this example I've also included the output from v$SQLSTATS_PLAN_HASH. This information was added after reading the post from Lisa G.

SQL_ID        CHILD_NUMBER           USE_FEEDBACK_STATS 
------------- ---------------------- ------------------ 
88jhv4mryj58p 0                      Y                  

V$SQLSTATS

SQL_ID        PLAN_HASH_VALUE        PARSE_CALLS            BUFFER_GETS            ROWS_PROCESSED         
------------- ---------------------- ---------------------- ---------------------- ---------------------- 
88jhv4mryj58p 757026858              1                      146                    0                      

v$SQLSTATS_PLAN_HASH

SQL_ID        PLAN_HASH_VALUE        PARSE_CALLS            BUFFER_GETS            ROWS_PROCESSED         
------------- ---------------------- ---------------------- ---------------------- ---------------------- 
88jhv4mryj58p 757026858              1                      146                    0                      



Here is the output from the 10053 trace.


SELECT 
    "TEST_DATA"."ZIPCODE" "ZIPCODE",
    "TEST_DATA"."LAT" "LAT",
    "TEST_DATA"."CITY" "CITY",
    "TEST_DATA"."STATE" "STATE",
    "TEST_DATA"."COUNTY" "COUNTY",
    "TEST_DATA"."TYPE" "TYPE",
    "TEST_DATA"."PREFERRED" "PREFERRED",
    "TEST_DATA"."WORLDREGION" "WORLDREGION",
    "TEST_DATA"."COUNTRY" "COUNTRY",
    "TEST_DATA"."LOCATIONTEXT" "LOCATIONTEXT",
    "TEST_DATA"."LOCATION" "LOCATION",
    "TEST_DATA"."POPULATION" "POPULATION",
    "TEST_DATA"."HOUSINGUNITS" "HOUSINGUNITS",
    "TEST_DATA"."INCOME" "INCOME",
    "TEST_DATA"."LANDAREA",
    "LANDAREATEST_DATA"."WATERAREA" "WATERAREA",
    "TEST_DATA"."DECOMMISIONED" "DECOMMISIONED",
    "TEST_DATA"."MILITARYRESTRICTIONCODES" "MILITARYRESTRICTIONCODES" 
FROM "BGRENN"."TEST_DATA" "TEST_DATA" 
WHERE "TEST_DATA"."CITY"='ROCHESTER' AND "TEST_DATA"."COUNTY"='MONROE1'
Here is the first plan

 PLAN_TABLE_OUTPUT                                                    
  
 -------------------------------------------------------------------------------------------------------------------------
  
 Plan hash value: 757026858                                                
  
 -----------------------------------------------------------------------------------------                
  
 | Id | Operation          | Name   | Rows | Bytes | Cost (%CPU)| Time   |                
 -----------------------------------------------------------------------------------------                
 |  0 | SELECT STATEMENT      |      |   1 |  120 |   3  (0)| 00:00:01 |                
 |* 1 | TABLE ACCESS BY INDEX ROWID| TEST_DATA |   1 |  120 |   3  (0)| 00:00:01 |                
 |* 2 |  INDEX RANGE SCAN     | INDEX2  |   2 |    |   1  (0)| 00:00:01 |                
 -----------------------------------------------------------------------------------------                
 Predicate Information (identified by operation id):                                   
 ---------------------------------------------------                                   
   1 - filter("COUNTY"='MONROE1' AND "STATE"='NY')                                    
   2 - access("CITY"='ROCHESTER')                                            
  
Second execution, notice that cardinality feedback will not be used on subsequent executions.

SQL_ID        CHILD_NUMBER           USE_FEEDBACK_STATS 
------------- ---------------------- ------------------ 
88jhv4mryj58p 0                      Y                  
88jhv4mryj58p 1                      N                  

V$SQLSTATS

SQL_ID        PLAN_HASH_VALUE        PARSE_CALLS            BUFFER_GETS            ROWS_PROCESSED         
------------- ---------------------- ---------------------- ---------------------- ---------------------- 
88jhv4mryj58p 2704719303             2                      148                    0                      

v$SQLSTATS_PLAN_HASH

SQL_ID        PLAN_HASH_VALUE        PARSE_CALLS            BUFFER_GETS            ROWS_PROCESSED         
------------- ---------------------- ---------------------- ---------------------- ---------------------- 
88jhv4mryj58p 2704719303             1                      2                      0                      
88jhv4mryj58p 757026858              1                      146                    0                      


Here is the output from the 10053 trace showing the feedback from the first execution using the OPT_ESTIMATE hint again.

SELECT /*+ OPT_ESTIMATE (INDEX_SCAN "TEST_DATA" "INDEX2" MIN=69.000000 ) 
           OPT_ESTIMATE (INDEX_FILTER "TEST_DATA" "INDEX2" ROWS=69.000000 ) */ 
    "TEST_DATA"."ZIPCODE" "ZIPCODE",
    "TEST_DATA"."LAT" "LAT",
    "TEST_DATA"."CITY" "CITY",
    "TEST_DATA"."STATE" "STATE",
    "TEST_DATA"."COUNTY" "COUNTY",
    "TEST_DATA"."TYPE" "TYPE",
    "TEST_DATA"."PREFERRED" "PREFERRED",
    "TEST_DATA"."WORLDREGION" "WORLDREGION",
    "TEST_DATA"."COUNTRY" "COUNTRY",
    "TEST_DATA"."LOCATIONTEXT" "LOCATIONTEXT",
    "TEST_DATA"."LOCATION" "LOCATION",
    "TEST_DATA"."POPULATION" "POPULATION",
    "TEST_DATA"."HOUSINGUNITS" "HOUSINGUNITS",
    "TEST_DATA"."INCOME" "INCOME",
    "TEST_DATA"."LANDAREA",
    "LANDAREATEST_DATA"."WATERAREA" "WATERAREA",
    "TEST_DATA"."DECOMMISIONED" "DECOMMISIONED",
    "TEST_DATA"."MILITARYRESTRICTIONCODES" "MILITARYRESTRICTIONCODES" 
FROM "BGRENN"."TEST_DATA" "TEST_DATA" 
WHERE "TEST_DATA"."CITY"='ROCHESTER' AND "TEST_DATA"."COUNTY"='MONROE1'
But notice something else... notice the plan has value changed from 757026858 to 2704719303.  Fine right ? Cardinality feedback did it's thing, but also notice this is the output from v$sqlstats.  Huh ?? The history for executions of the plan (1 for the first plan, and 1 for the second), got summarized into the second plan.

here is the second plan
 Plan hash value: 2704719303                                                     
  
 -----------------------------------------------------------------------------------------                      
 | Id | Operation          | Name   | Rows | Bytes | Cost (%CPU)| Time   |                      
 -----------------------------------------------------------------------------------------                      
 |  0 | SELECT STATEMENT      |      |    |    |   6 (100)|     |                      
 |* 1 | TABLE ACCESS BY INDEX ROWID| TEST_DATA |   1 |  120 |   6  (0)| 00:00:01 |                      
 |* 2 |  INDEX RANGE SCAN     | INDEX3  |  38 |    |   1  (0)| 00:00:01 |                      
 -----------------------------------------------------------------------------------------                      
  
 Predicate Information (identified by operation id):                                         
 ---------------------------------------------------  
   1 - filter("CITY"='ROCHESTER')                                                  
   2 - access("COUNTY"='MONROE1')                                                  
 PLAN_TABLE_OUTPUT                                                          
 ------------------------------------------------------------------------------------------------------------------------------------
  
 Note                                                                
 -----                                                                
   - cardinality feedback used for this statement         
What does all this mean ????

1) the 10053 trace will reveal where cardinality feedback was used by finding the OPT_ESTIMATE hint.
2) the v$sql_session_cursor view will show you where cardinality feedback was used.
3) There is a new view for looking at the multiple plans created for cardinality feedback called v$SQLSTATS_PLAN_HASH.

And in looking at the DBA_HIST_SQLSTAT, I do see both plans.

SQL_ID        PLAN_HASH_VALUE        PARSE_CALLS_DELTA      BUFFER_GETS_DELTA      ROWS_PROCESSED_DELTA   
------------- ---------------------- ---------------------- ---------------------- ---------------------- 
88jhv4mryj58p 757026858              1                      146                    0                      
88jhv4mryj58p 2704719303             1                      2                      0                      


Another observation I will would make is that when you multiple columns (and you aren't using a composite index), oracle is likely to utilyze cardinality feedback to ensure the cardinality is correct.

Finally, I did try the same example with bind variables (instead of literals).  I found that Cardinality Feedback did not kick in for my example.  Bind variables must tell the optimizer to pick a "generically" better plan, and not look around.

This is all on version 11.2.0.2


I have included a query I'm using to point out where cardinality feedback has been utilized, and sorted the queries by the affect they have on performance, so I can assess the gains (or losses) associated with the use of this feature. Overall my databases have show remarkable improvement, and those fiew cases where there is a regression, the sql_id is highligted so I can create a profile.


I have to note that this query has become invaluable to identify the top queries affected by cardinality feedback.  I have used it to identify where extended stats may help, and I have been able to determine, that cardinality feedback is indeed a good thing most of the time.  A lot of this detail is lost in the AWR reports.





select  query1.sql_id,
query1.plan_hash_value phv1,
query2.plan_hash_value phv2, 
query1.executions exec1,
query2.executions exec2,
trunc((query1.avg_elapsed_time)/1000000,2) aet,
trunc(( query2.avg_elapsed_time)/1000000,2) aet_card,
trunc(( query1.avg_elapsed_time - query2.avg_elapsed_time)/1000000,2) aet_diff,
abs(trunc(( (query1.avg_elapsed_time - query2.avg_elapsed_time)* query2.executions)/1000000,2)) abs_diff,
trunc( ((query1.avg_elapsed_time - query2.avg_elapsed_time) * query2.executions)/1000000,2) tot_diff,
trunc( ((query1.avg_elapsed_time - query2.avg_elapsed_time)/ query1.avg_elapsed_time) * 100,1) card_feedback_perc_difference,query1.sql_text
  from 
(select sql_id,plan_hash_value,last_active_time,executions,(rows_processed/executions) rows_processed,
(elapsed_time/executions) avg_elapsed_time,(cpu_time/executions) avg_cpu_time,(buffer_gets/executions) avg_buffer_gets,
sql_text from V$SQLSTATS_PLAN_HASH a
where exists (select plan_hash_value  from V$SQLSTATS_PLAN_HASH b  where a.sql_id=b.sql_id and a.plan_hash_value <> b.plan_hash_value and executions > 0)
and executions > 0
order by sql_id,last_active_time) query1,
(select sql_id,plan_hash_value,last_active_time,executions,(rows_processed/executions) rows_processed,
(elapsed_time/executions) avg_elapsed_time,(cpu_time/executions) avg_cpu_time,(buffer_gets/executions) avg_buffer_gets,
sql_text from V$SQLSTATS_PLAN_HASH a
where exists (select plan_hash_value  from V$SQLSTATS_PLAN_HASH b  where a.sql_id=b.sql_id and a.plan_hash_value <> b.plan_hash_value and executions > 0)
and executions > 0
order by sql_id,last_active_time) query2
where query1.sql_id=query2.sql_id
and query1.last_active_time < query2.last_active_time 
and exists (select 1 from V$SQL_SHARED_CURSOR c where query1.sql_id=c.sql_id and c.use_feedback_stats='Y') 
--and query1.avg_elapsed_time < query2.avg_elapsed_time
order by abs_diff desc

3 comments:

  1. Oracle support was actually very helpful. Here's the info per my SR.

    I located non-published Bug 6688020 with a title of "V$SQLSTATS Should Aggregate on SQL_ID and PLAN_HASH_VALUE and Not SQL_ID Only." The way this was fixed is to introduce the view V$SQLSTATS_PLAN_HASH, which contains "one row per unique combination of SQL_ID and PLAN_HASH_VALUE," per the "Database Reference."

    ReplyDelete
  2. MOS is writing an article which should be published in a few weeks.

    1345516.1

    ReplyDelete
  3. Thanks - I'm in the process of learning about Oracle and this post is great. Wasn't sure about cardinality but found another page that complemented yours very well for a newbie like me:

    http://www.programmerinterview.com/index.php/database-sql/cardinality-in-sql/

    ReplyDelete