Thursday, May 10, 2012

Analytical functions

I have been working on a query that is uses an analytical function..
I’m not familiar with them, but using my usual query tuning got me into some trouble..

This is the query.

WITH
SACOMMON71991 AS (select T17189.SUR_REF_TIME_PERD_ID as SUR_REF_TIME_PERD_ID,
    T17189.RTP_CAL_DT as c4,
    T17189.RTP_CAL_YR as c5,
    ROW_NUMBER() OVER (PARTITION BY T17189.RTP_CAL_YR ORDER BY
T17189.RTP_CAL_YR DESC) as c6,
    ROW_NUMBER() OVER (PARTITION BY T17189.RTP_CAL_YR,
T17189.SUR_REF_TIME_PERD_ID ORDER BY T17189.RTP_CAL_YR DESC,
T17189.SUR_REF_TIME_PERD_ID DESC) as c7
from
    REF_TIME_PERD T17189 /* Dim_TIME_PERD_D_Check_Date */ )
select * from SACOMMON71991 where SUR_REF_TIME_PERD_ID=20240213


The data in the table looks like this.

SUR_REF_TIME_PERD_ID    varchar(8)
RPT_CAL_DT                      date
RPT_CAL_YR                      varchar(4)

Some example values are.

SUR_REF_TIME_PERD_ID            RPT_CAL_DT              RPT_CAL_YR
19000101                                         01/01/1900                    1900

20120101                                         01/01/2012                    2012
-..
--
20120125                                          01/25/2012                    2012
..
20120430                                          04/30/2012                    2012
..
20240213                                          02/13/2024                    2024
..
47121231                                           12/31/4712                   4712


Now back to my query..

WITH
SACOMMON71991 AS (select T17189.SUR_REF_TIME_PERD_ID as SUR_REF_TIME_PERD_ID,
    T17189.RTP_CAL_DT as c4,
    T17189.RTP_CAL_YR as c5,
    ROW_NUMBER() OVER (PARTITION BY T17189.RTP_CAL_YR ORDER BY T17189.RTP_CAL_YR DESC) as c6,
    ROW_NUMBER() OVER (PARTITION BY T17189.RTP_CAL_YR,T17189.SUR_REF_TIME_PERD_ID ORDER BY T17189.RTP_CAL_YR DESC,T17189.SUR_REF_TIME_PERD_ID DESC) as c7
from
    REF_TIME_PERD T17189 /* Dim_TIME_PERD_D_Check_Date */ )
select * from SACOMMON71991 where SUR_REF_TIME_PERD_ID=20240213



I noticed that the query did a full table scan of 219,000 rows.




  I figured, well I only wanted 1 row, so why can’t I just change the
query ??

select T17189.SUR_REF_TIME_PERD_ID as SUR_REF_TIME_PERD_ID,
     T17189.RTP_CAL_DT as c4,
     T17189.RTP_CAL_YR as c5,
     ROW_NUMBER() OVER (PARTITION BY T17189.RTP_CAL_YR ORDER BY T17189.RTP_CAL_YR DESC) as c6,
     ROW_NUMBER() OVER (PARTITION BY T17189.RTP_CAL_YR, T17189.SUR_REF_TIME_PERD_ID ORDER BY T17189.RTP_CAL_YR DESC, T17189.SUR_REF_TIME_PERD_ID DESC) as c7
from 
     V_REF_TIME_PERD T17189  where SUR_REF_TIME_PERD_ID=20240213


My plan looks much better



But the answer is different ??

The first query returns.


SUR_REF_TIME_PERD_ID C4                            C5         C6         C7
-------------------- --------------------- ---------- ---------- ----------
            20240213 2024-02-13 00:00:00         2024         44          1

The second query returns



SUR_REF_TIME_PERD_ID C4                            C5         C6         C7
-------------------- --------------------- ---------- ---------- ----------
            20240213 2024-02-13 00:00:00         2024          1          1


So what exactly is the query doing ???

The query is actually analyzing more data than just my row.  After examining the query, I noticed the 4th column (C6) is actually the day of the year.  2/13 is the 44th day of the year.

So what exactly is the query doing to get this ???

It is starting with all rows in the table, then sorting them by RPT_CAL_YR, and SUR_REF_TIME_PERD_ID.  This is column c7 in table.. If we look back we see this column is

ROW_NUMBER() OVER (PARTITION BY T17189.RTP_CAL_YR, T17189.SUR_REF_TIME_PERD_ID ORDER BY T17189.RTP_CAL_YR DESC, T17189.SUR_REF_TIME_PERD_ID DESC) as c7

This column does the sorting to get the result.. Now if you look at C6, you see that it is a row_number over just the 1 column RPT_CAL_YR.  Since the rows are now sorted in data order (thanks to c7), C6 is just  the row number for the value passed to the query..

I know it gets kind of complicated.. The lesson learned from this, is that analytical functions sometimes need to do FTS to do their work.  By giving it the key value within the select, I removed the queries ability to look at the full tables data.

I also learned that by properly limiting the values in the right spot, you can eliminate rows.  Here is a better query. I am limiting the analytical function to just look at the current year.


   WITH
SACOMMON71991 AS (select T17189.SUR_REF_TIME_PERD_ID as SUR_REF_TIME_PERD_ID,
    T17189.RTP_CAL_DT as c4,
    T17189.RTP_CAL_YR as c5,
    ROW_NUMBER() OVER (PARTITION BY T17189.RTP_CAL_YR ORDER BY
T17189.RTP_CAL_YR DESC) as c6,
    ROW_NUMBER() OVER (PARTITION BY T17189.RTP_CAL_YR,
T17189.SUR_REF_TIME_PERD_ID ORDER BY T17189.RTP_CAL_YR DESC,
T17189.SUR_REF_TIME_PERD_ID DESC) as c7
from
    REF_TIME_PERD T17189 /* Dim_TIME_PERD_D_Check_Date */ 
    where RTP_CAL_YR='2024')
select * from SACOMMON71991 where SUR_REF_TIME_PERD_ID=20240213

The plan is going through less data, but the appropriate data to get the right answer.



No comments:

Post a Comment