Well, I have often written about cardinality feedback, and I have been striving to find out more detail on how it works, how to determine why it was used, and how to determine when it was used.
I have ran through a series of tests that answers a lot of these questions. There is some information available, but it isn't complete. Here a couple of good starting points..
They were even able to recreate some examples. Well here is my example.
First I loaded the data (about 74,000 rows), created a couple of indexes and ran a query.
here are the indexes
CREATE INDEX "SYSTEM"."INDEX1" ON "SYSTEM"."TEST_DATA"
( "ZIPCODE" )
TABLESPACE "SYSTEM" ;
CREATE INDEX "SYSTEM"."INDEX2" ON "SYSTEM"."TEST_DATA"
( "CITY" )
TABLESPACE "SYSTEM" ;
CREATE INDEX "SYSTEM"."INDEX3" ON "SYSTEM"."TEST_DATA"
( "COUNTY" )
TABLESPACE "SYSTEM" ;
There are individual indexes on both city and county, and there is a relationship between the 2 items. This query will return 44 rows of data.
First output is the information from v$sql_shared_cursor. Notice the column "use_feedback_stats". I am also supplying the information from v$sqlstats for this query. Here is the output after the first execution.
v$SQL_SHARED_CURSOR
SQL_ID CHILD_NUMBER USE_FEEDBACK_STATS ------------- ---------------------- ------------------ 33x0cazrmgrgy 0 Y
V$SQLSTATS
SQL_ID PLAN_HASH_VALUE PARSE_CALLS BUFFER_GETS ROWS_PROCESSED ------------- ---------------------- ---------------------- ---------------------- ---------------------- 33x0cazrmgrgy 757026858 1 551 44
Notice that the column "use_feedback_stats" is set to 'Y" indicating that the query is marked for cardinality feedback to determine if the first execution is accurate enough. I also ran a 10053 trace. You can see from the output the query is rewritten to be executed with dynamic sampling.
SELECT /* OPT_DYN_SAMP */ /*+ ALL_ROWS IGNORE_WHERE_CLAUSE NO_PARALLEL(SAMPLESUB) opt_param('parallel_execution_enabled', 'false') NO_PARALLEL_INDEX(SAMPLESUB) NO_SQL_TUNE */ NVL(SUM(C1),0), NVL(SUM(C2),0), NVL(SUM(C3),0), NVL(SUM(C4),0)
FROM
(SELECT /*+ IGNORE_WHERE_CLAUSE NO_PARALLEL("TEST_DATA") FULL("TEST_DATA") NO_PARALLEL_INDEX("TEST_DATA") */ 1 AS C1,
CASE WHEN "TEST_DATA"."CITY"='ROCHESTER' AND "TEST_DATA"."COUNTY"='MONROE' THEN 1 ELSE 0 END AS C2,
CASE WHEN "TEST_DATA"."COUNTY"='MONROE' THEN 1 ELSE 0 END AS C3,
CASE WHEN "TEST_DATA"."CITY"='ROCHESTER' THEN 1 ELSE 0 END AS C4
FROM "BGRENN"."TEST_DATA" SAMPLE BLOCK (4.972376 , 1) SEED (1) "TEST_DATA") SAMPLESUB
Here is the output from explain plan
PLAN_TABLE_OUTPUT
-------------------------------------------------------------------------------------------------------------------------
Plan hash value: 757026858
-----------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-----------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 8 | 1272 | 3 (0)| 00:00:01 |
|* 1 | TABLE ACCESS BY INDEX ROWID| TEST_DATA | 8 | 1272 | 3 (0)| 00:00:01 |
|* 2 | INDEX RANGE SCAN | INDEX2 | 121 | | 1 (0)| 00:00:01 |
-----------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter("COUNTY"='MONROE')
2 - access("CITY"='ROCHESTER')
Note
-----
- dynamic sampling used for this statement (level=2)
Second output is the information from v$sql_shared_cursor (again). Notice the column "use_feedback_stats" is set to 'N'
SQL_ID CHILD_NUMBER USE_FEEDBACK_STATS
------------- ---------------------- ------------------
33x0cazrmgrgy 0 Y
33x0cazrmgrgy 1 N
SQL_ID PLAN_HASH_VALUE PARSE_CALLS BUFFER_GETS ROWS_PROCESSED
------------- ---------------------- ---------------------- ---------------------- ----------------------
33x0cazrmgrgy 757026858 2 572 88
I ran a 10053 trace again (as before). You can see in this case the results from the first execution (with 44 rows), is passed into the second execution using the hint "opt_estimate". You can also see what parts of the query are utilizing cardinality feedback to change the query.
SELECT /*+ OPT_ESTIMATE (TABLE "TEST_DATA" ROWS=44.000000 ) OPT_ESTIMATE (INDEX_SCAN "TEST_DATA" "INDEX2" MIN=69.000000 )
OPT_ESTIMATE (INDEX_FILTER "TEST_DATA" "INDEX2" ROWS=69.000000 ) */
"TEST_DATA"."ZIPCODE" "ZIPCODE",
"TEST_DATA"."LAT" "LAT",
"TEST_DATA"."CITY" "CITY",
"TEST_DATA"."STATE" "STATE",
"TEST_DATA"."COUNTY" "COUNTY",
"TEST_DATA"."TYPE" "TYPE",
"TEST_DATA"."PREFERRED" "PREFERRED",
"TEST_DATA"."WORLDREGION" "WORLDREGION",
"TEST_DATA"."COUNTRY" "COUNTRY",
"TEST_DATA"."LOCATIONTEXT" "LOCATIONTEXT",
"TEST_DATA"."LOCATION" "LOCATION",
"TEST_DATA"."POPULATION" "POPULATION",
"TEST_DATA"."HOUSINGUNITS" "HOUSINGUNITS"
,"TEST_DATA"."INCOME" "INCOME",
"TEST_DATA"."LANDAREA",
"LANDAREATEST_DATA"."WATERAREA" "WATERAREA",
"TEST_DATA"."DECOMMISIONED" "DECOMMISIONED",
"TEST_DATA"."MILITARYRESTRICTIONCODES" "MILITARYRESTRICTIONCODES"
FROM "BGRENN"."TEST_DATA" "TEST_DATA"
WHERE "TEST_DATA"."CITY"='ROCHESTER' AND "TEST_DATA"."COUNTY"='MONROE'
I ran this query until the v$sql_shared_cursor came back clean, and showed a "n" indicating that dynamic feedback was no longer being considered. There was information from 2 hard parses in the 10053 file.
SQL_ID CHILD_NUMBER USE_FEEDBACK_STATS
------------- ---------------------- ------------------
33x0cazrmgrgy 0 Y
33x0cazrmgrgy 1 N
SQL_ID PLAN_HASH_VALUE PARSE_CALLS BUFFER_GETS ROWS_PROCESSED
------------- ---------------------- ---------------------- ---------------------- ----------------------
33x0cazrmgrgy 757026858 3 593 132
This answered my first question. How do you tell what caused the cardinality feedback ? You run a 10053, and find the OPT_ESTIMATE cardinality hints that adjust the cardinility and reparse the query.
/***************************************/
For my second test, I used the same result set, and ran statistics against the data. After running the query from the first test, I found that cardinality was not being utilzed, so I changed my query.
select /* colorme */ * from test_data where city='ROCHESTER' and county='MONROE1' and state='NY';
This returns no rows of data, but the optimizer can't figure this out.
First execution you can see that cardinality feedback was marked for the statement, and notice the plan_hash_value of 7570268858. In this example I've also included the output from v$SQLSTATS_PLAN_HASH. This information was added after reading the post from Lisa G.
SQL_ID CHILD_NUMBER USE_FEEDBACK_STATS
------------- ---------------------- ------------------
88jhv4mryj58p 0 Y
V$SQLSTATS
SQL_ID PLAN_HASH_VALUE PARSE_CALLS BUFFER_GETS ROWS_PROCESSED
------------- ---------------------- ---------------------- ---------------------- ----------------------
88jhv4mryj58p 757026858 1 146 0
v$SQLSTATS_PLAN_HASH
SQL_ID PLAN_HASH_VALUE PARSE_CALLS BUFFER_GETS ROWS_PROCESSED
------------- ---------------------- ---------------------- ---------------------- ----------------------
88jhv4mryj58p 757026858 1 146 0
Here is the output from the 10053 trace.
SELECT
"TEST_DATA"."ZIPCODE" "ZIPCODE",
"TEST_DATA"."LAT" "LAT",
"TEST_DATA"."CITY" "CITY",
"TEST_DATA"."STATE" "STATE",
"TEST_DATA"."COUNTY" "COUNTY",
"TEST_DATA"."TYPE" "TYPE",
"TEST_DATA"."PREFERRED" "PREFERRED",
"TEST_DATA"."WORLDREGION" "WORLDREGION",
"TEST_DATA"."COUNTRY" "COUNTRY",
"TEST_DATA"."LOCATIONTEXT" "LOCATIONTEXT",
"TEST_DATA"."LOCATION" "LOCATION",
"TEST_DATA"."POPULATION" "POPULATION",
"TEST_DATA"."HOUSINGUNITS" "HOUSINGUNITS",
"TEST_DATA"."INCOME" "INCOME",
"TEST_DATA"."LANDAREA",
"LANDAREATEST_DATA"."WATERAREA" "WATERAREA",
"TEST_DATA"."DECOMMISIONED" "DECOMMISIONED",
"TEST_DATA"."MILITARYRESTRICTIONCODES" "MILITARYRESTRICTIONCODES"
FROM "BGRENN"."TEST_DATA" "TEST_DATA"
WHERE "TEST_DATA"."CITY"='ROCHESTER' AND "TEST_DATA"."COUNTY"='MONROE1'
Here is the first plan
PLAN_TABLE_OUTPUT
-------------------------------------------------------------------------------------------------------------------------
Plan hash value: 757026858
-----------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-----------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 120 | 3 (0)| 00:00:01 |
|* 1 | TABLE ACCESS BY INDEX ROWID| TEST_DATA | 1 | 120 | 3 (0)| 00:00:01 |
|* 2 | INDEX RANGE SCAN | INDEX2 | 2 | | 1 (0)| 00:00:01 |
-----------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter("COUNTY"='MONROE1' AND "STATE"='NY')
2 - access("CITY"='ROCHESTER')
Second execution, notice that cardinality feedback will not be used on subsequent executions.
SQL_ID CHILD_NUMBER USE_FEEDBACK_STATS
------------- ---------------------- ------------------
88jhv4mryj58p 0 Y
88jhv4mryj58p 1 N
V$SQLSTATS
SQL_ID PLAN_HASH_VALUE PARSE_CALLS BUFFER_GETS ROWS_PROCESSED
------------- ---------------------- ---------------------- ---------------------- ----------------------
88jhv4mryj58p 2704719303 2 148 0
v$SQLSTATS_PLAN_HASH
SQL_ID PLAN_HASH_VALUE PARSE_CALLS BUFFER_GETS ROWS_PROCESSED
------------- ---------------------- ---------------------- ---------------------- ----------------------
88jhv4mryj58p 2704719303 1 2 0
88jhv4mryj58p 757026858 1 146 0
Here is the output from the 10053 trace showing the feedback from the first execution using the OPT_ESTIMATE hint again.
SELECT /*+ OPT_ESTIMATE (INDEX_SCAN "TEST_DATA" "INDEX2" MIN=69.000000 )
OPT_ESTIMATE (INDEX_FILTER "TEST_DATA" "INDEX2" ROWS=69.000000 ) */
"TEST_DATA"."ZIPCODE" "ZIPCODE",
"TEST_DATA"."LAT" "LAT",
"TEST_DATA"."CITY" "CITY",
"TEST_DATA"."STATE" "STATE",
"TEST_DATA"."COUNTY" "COUNTY",
"TEST_DATA"."TYPE" "TYPE",
"TEST_DATA"."PREFERRED" "PREFERRED",
"TEST_DATA"."WORLDREGION" "WORLDREGION",
"TEST_DATA"."COUNTRY" "COUNTRY",
"TEST_DATA"."LOCATIONTEXT" "LOCATIONTEXT",
"TEST_DATA"."LOCATION" "LOCATION",
"TEST_DATA"."POPULATION" "POPULATION",
"TEST_DATA"."HOUSINGUNITS" "HOUSINGUNITS",
"TEST_DATA"."INCOME" "INCOME",
"TEST_DATA"."LANDAREA",
"LANDAREATEST_DATA"."WATERAREA" "WATERAREA",
"TEST_DATA"."DECOMMISIONED" "DECOMMISIONED",
"TEST_DATA"."MILITARYRESTRICTIONCODES" "MILITARYRESTRICTIONCODES"
FROM "BGRENN"."TEST_DATA" "TEST_DATA"
WHERE "TEST_DATA"."CITY"='ROCHESTER' AND "TEST_DATA"."COUNTY"='MONROE1'
But notice something else... notice the plan has value changed from 757026858 to 2704719303. Fine right ? Cardinality feedback did it's thing, but also notice this is the output from v$sqlstats. Huh ?? The history for executions of the plan (1 for the first plan, and 1 for the second), got summarized into the second plan.
here is the second plan
Plan hash value: 2704719303
-----------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-----------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | | | 6 (100)| |
|* 1 | TABLE ACCESS BY INDEX ROWID| TEST_DATA | 1 | 120 | 6 (0)| 00:00:01 |
|* 2 | INDEX RANGE SCAN | INDEX3 | 38 | | 1 (0)| 00:00:01 |
-----------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter("CITY"='ROCHESTER')
2 - access("COUNTY"='MONROE1')
PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------------------------------------------------------
Note
-----
- cardinality feedback used for this statement
What does all this mean ????
1) the 10053 trace will reveal where cardinality feedback was used by finding the OPT_ESTIMATE hint.
2) the v$sql_session_cursor view will show you where cardinality feedback was used.
3) There is a new view for looking at the multiple plans created for cardinality feedback called v$SQLSTATS_PLAN_HASH.
And in looking at the DBA_HIST_SQLSTAT, I do see both plans.
SQL_ID PLAN_HASH_VALUE PARSE_CALLS_DELTA BUFFER_GETS_DELTA ROWS_PROCESSED_DELTA
------------- ---------------------- ---------------------- ---------------------- ----------------------
88jhv4mryj58p 757026858 1 146 0
88jhv4mryj58p 2704719303 1 2 0
Another observation I will would make is that when you multiple columns (and you aren't using a composite index), oracle is likely to utilyze cardinality feedback to ensure the cardinality is correct.
Finally, I did try the same example with bind variables (instead of literals). I found that Cardinality Feedback did not kick in for my example. Bind variables must tell the optimizer to pick a "generically" better plan, and not look around.
This is all on version 11.2.0.2
I have included a query I'm using to point out where cardinality feedback has been utilized, and sorted the queries by the affect they have on performance, so I can assess the gains (or losses) associated with the use of this feature. Overall my databases have show remarkable improvement, and those fiew cases where there is a regression, the sql_id is highligted so I can create a profile.
I have to note that this query has become invaluable to identify the top queries affected by cardinality feedback. I have used it to identify where extended stats may help, and I have been able to determine, that cardinality feedback is indeed a good thing most of the time. A lot of this detail is lost in the AWR reports.
select query1.sql_id,
query1.plan_hash_value phv1,
query2.plan_hash_value phv2,
query1.executions exec1,
query2.executions exec2,
trunc((query1.avg_elapsed_time)/1000000,2) aet,
trunc(( query2.avg_elapsed_time)/1000000,2) aet_card,
trunc(( query1.avg_elapsed_time - query2.avg_elapsed_time)/1000000,2) aet_diff,
abs(trunc(( (query1.avg_elapsed_time - query2.avg_elapsed_time)* query2.executions)/1000000,2)) abs_diff,
trunc( ((query1.avg_elapsed_time - query2.avg_elapsed_time) * query2.executions)/1000000,2) tot_diff,
trunc( ((query1.avg_elapsed_time - query2.avg_elapsed_time)/ query1.avg_elapsed_time) * 100,1) card_feedback_perc_difference,query1.sql_text
from
(select sql_id,plan_hash_value,last_active_time,executions,(rows_processed/executions) rows_processed,
(elapsed_time/executions) avg_elapsed_time,(cpu_time/executions) avg_cpu_time,(buffer_gets/executions) avg_buffer_gets,
sql_text from V$SQLSTATS_PLAN_HASH a
where exists (select plan_hash_value from V$SQLSTATS_PLAN_HASH b where a.sql_id=b.sql_id and a.plan_hash_value <> b.plan_hash_value and executions > 0)
and executions > 0
order by sql_id,last_active_time) query1,
(select sql_id,plan_hash_value,last_active_time,executions,(rows_processed/executions) rows_processed,
(elapsed_time/executions) avg_elapsed_time,(cpu_time/executions) avg_cpu_time,(buffer_gets/executions) avg_buffer_gets,
sql_text from V$SQLSTATS_PLAN_HASH a
where exists (select plan_hash_value from V$SQLSTATS_PLAN_HASH b where a.sql_id=b.sql_id and a.plan_hash_value <> b.plan_hash_value and executions > 0)
and executions > 0
order by sql_id,last_active_time) query2
where query1.sql_id=query2.sql_id
and query1.last_active_time < query2.last_active_time
and exists (select 1 from V$SQL_SHARED_CURSOR c where query1.sql_id=c.sql_id and c.use_feedback_stats='Y')
--and query1.avg_elapsed_time < query2.avg_elapsed_time
order by abs_diff desc
Oracle support was actually very helpful. Here's the info per my SR.
ReplyDeleteI located non-published Bug 6688020 with a title of "V$SQLSTATS Should Aggregate on SQL_ID and PLAN_HASH_VALUE and Not SQL_ID Only." The way this was fixed is to introduce the view V$SQLSTATS_PLAN_HASH, which contains "one row per unique combination of SQL_ID and PLAN_HASH_VALUE," per the "Database Reference."
MOS is writing an article which should be published in a few weeks.
ReplyDelete1345516.1
Thanks - I'm in the process of learning about Oracle and this post is great. Wasn't sure about cardinality but found another page that complemented yours very well for a newbie like me:
ReplyDeletehttp://www.programmerinterview.com/index.php/database-sql/cardinality-in-sql/