I was running a query that had an in clause, and the cardinality turned out to be wrong.. After investigating, I found that the optimizer was double applying the selectivity of the index.
Here is my example
select * from system.tab1 a,system.tab2 b
where a.col1 IN ('WRH$_PARAMETER_PK','WRH$_SEG_STAT','WRH$_DB_CACHE_ADVICE')
and col1=col2;
Here is my example
select * from system.tab1 a,system.tab2 b
where a.col1 IN ('WRH$_PARAMETER_PK','WRH$_SEG_STAT','WRH$_DB_CACHE_ADVICE')
and col1=col2;
PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------------------------------------------------------
Plan hash value: 54186084
------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 3 | 162 | 14 (0)| 00:00:01 |
| 1 | NESTED LOOPS | | | | | |
| 2 | NESTED LOOPS | | 3 | 162 | 14 (0)| 00:00:01 |
| 3 | INLIST ITERATOR | | | | | |
| 4 | TABLE ACCESS BY INDEX ROWID| TAB1 | 3 | 75 | 7 (0)| 00:00:01 |
|* 5 | INDEX UNIQUE SCAN | TAB1_PK | 3 | | 4 (0)| 00:00:01 |
|* 6 | INDEX RANGE SCAN | TAB2_IX | 1 | | 2 (0)| 00:00:01 |
| 7 | TABLE ACCESS BY INDEX ROWID | TAB2 | 1 | 29 | 3 (0)| 00:00:01 |
------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
5 - access("A"."COL1"='WRH$_DB_CACHE_ADVICE' OR "A"."COL1"='WRH$_PARAMETER_PK'
OR "A"."COL1"='WRH$_SEG_STAT')
6 - access("COL1"="COL2")
filter("COL2"='WRH$_DB_CACHE_ADVICE' OR "COL2"='WRH$_PARAMETER_PK' OR
"COL2"='WRH$_SEG_STAT')
23 rows selected.
Notice the cardinality of 3 rows. What made me wonder, is when I used just an '=' sign
select * from system.tab1 a,system.tab2 b
where col1 ='WRH$_DB_CACHE_ADVICE'
and col1=col2;
------------------------------------------------------------------------------------------------------------------------------------
Plan hash value: 1964798218
----------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
----------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 26 | 1404 | 23 (0)| 00:00:01 |
| 1 | NESTED LOOPS | | 26 | 1404 | 23 (0)| 00:00:01 |
| 2 | TABLE ACCESS BY INDEX ROWID| TAB1 | 1 | 25 | 2 (0)| 00:00:01 |
|* 3 | INDEX UNIQUE SCAN | TAB1_PK | 1 | | 1 (0)| 00:00:01 |
| 4 | TABLE ACCESS BY INDEX ROWID| TAB2 | 26 | 754 | 21 (0)| 00:00:01 |
|* 5 | INDEX RANGE SCAN | TAB2_IX | 26 | | 2 (0)| 00:00:01 |
----------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
3 - access("COL1"='WRH$_DB_CACHE_ADVICE')
5 - access("COL2"='WRH$_DB_CACHE_ADVICE')
Notice the cardinality is 26 ? why is the optimizer changing the cardinality from 26 to 3, when I include an in clause with one other value ? I would expect the cardinality to be 78. In fact when I use a union it comes back right.
select * from system.tab1 a,system.tab2 b
where col1 ='WRH$_DB_CACHE_ADVICE'
and col1=col2
union
select * from system.tab1 a,system.tab2 b
where col1 ='WRH$_SEG_STAT'
and col1=col2
union
select * from system.tab1 a,system.tab2 b
where col1 ='WRH$_PARAMETER_PK'
and col1=col2;
------------------------------------------------------------------------------------------------------------------------------------
Plan hash value: 2805776637
---------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
---------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 77 | 7014 | 71 (68)| 00:00:01 |
| 1 | SORT UNIQUE | | 77 | 7014 | 71 (68)| 00:00:01 |
| 2 | UNION-ALL | | | | | |
| 3 | NESTED LOOPS | | 26 | 1404 | 23 (0)| 00:00:01 |
| 4 | TABLE ACCESS BY INDEX ROWID | TAB1 | 1 | 25 | 2 (0)| 00:00:01 |
|* 5 | INDEX UNIQUE SCAN | TAB1_PK | 1 | | 1 (0)| 00:00:01 |
| 6 | TABLE ACCESS BY INDEX ROWID | TAB2 | 26 | 754 | 21 (0)| 00:00:01 |
|* 7 | INDEX RANGE SCAN | TAB2_IX | 26 | | 2 (0)| 00:00:01 |
| 8 | NESTED LOOPS | | | | | |
| 9 | NESTED LOOPS | | 51 | 5610 | 46 (0)| 00:00:01 |
| 10 | VIEW | VW_JF_SET$623BBB07 | 2 | 162 | 4 (0)| 00:00:01 |
| 11 | SORT UNIQUE | | 2 | 50 | 4 (50)| 00:00:01 |
| 12 | UNION-ALL | | | | | |
| 13 | TABLE ACCESS BY INDEX ROWID| TAB1 | 1 | 25 | 2 (0)| 00:00:01 |
|* 14 | INDEX UNIQUE SCAN | TAB1_PK | 1 | | 1 (0)| 00:00:01 |
| 15 | TABLE ACCESS BY INDEX ROWID| TAB1 | 1 | 25 | 2 (0)| 00:00:01 |
|* 16 | INDEX UNIQUE SCAN | TAB1_PK | 1 | | 1 (0)| 00:00:01 |
|* 17 | INDEX RANGE SCAN | TAB2_IX | 26 | | 2 (0)| 00:00:01 |
| 18 | TABLE ACCESS BY INDEX ROWID | TAB2 | 26 | 754 | 21 (0)| 00:00:01 |
---------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
5 - access("COL1"='WRH$_PARAMETER_PK')
7 - access("COL2"='WRH$_PARAMETER_PK')
14 - access("COL1"='WRH$_DB_CACHE_ADVICE')
16 - access("COL1"='WRH$_SEG_STAT')
17 - access("ITEM_1"="COL2")
So I ran a 10053 trace and this is what I found.
1) the table (tab2) has 1280422 rows with 49,840 distinct values
Table Stats::
Table: TAB2 Alias: B
#Rows: 1280422 #Blks: 5972 AvgRowLen: 29.00 ChainCnt: 0.00
Column (#1): COL2(
AvgLen: 23 NDV: 49840 Nulls: 0 Density: 0.000020
Index Stats::
Index: TAB2_IX Col#: 1
LVLS: 2 #LB: 5787 #DK: 49840 LB/K: 1.00 DB/K: 17.00 CLUF: 884604.00
2) oracle takes this cardinality and computes it for the table access (1,280,422/49,840) * 3 = 77
SINGLE TABLE ACCESS PATH
Single Table Cardinality Estimation for TAB2[B]
Table: TAB2 Alias: B
Card: Original: 1280422.000000 Rounded: 77 Computed: 77.07 Non Adjusted: 77.07
Access Path: TableScan
3) It then takes the cardinality of the first table (tab1) the cardinality for the second table (tab2), and applies the selectivity.
Join Card: 2.964306 = = outer (3.000000) * inner (77.071950) * sel (0.012821)
Here is my example and my 10053 trace