I have been working on a query that uses an analytic function.
I’m not familiar with them, and my usual approach to query tuning got me into some trouble.
This is the query.
WITH SACOMMON71991 AS (
  select T17189.SUR_REF_TIME_PERD_ID as SUR_REF_TIME_PERD_ID,
         T17189.RTP_CAL_DT as c4,
         T17189.RTP_CAL_YR as c5,
         ROW_NUMBER() OVER (PARTITION BY T17189.RTP_CAL_YR
                            ORDER BY T17189.RTP_CAL_YR DESC) as c6,
         ROW_NUMBER() OVER (PARTITION BY T17189.RTP_CAL_YR, T17189.SUR_REF_TIME_PERD_ID
                            ORDER BY T17189.RTP_CAL_YR DESC, T17189.SUR_REF_TIME_PERD_ID DESC) as c7
  from REF_TIME_PERD T17189 /* Dim_TIME_PERD_D_Check_Date */
)
select * from SACOMMON71991 where SUR_REF_TIME_PERD_ID=20240213
The data in the table looks like this.
SUR_REF_TIME_PERD_ID varchar(8)
RPT_CAL_DT date
RPT_CAL_YR varchar(4)
Some example values are.
SUR_REF_TIME_PERD_ID RPT_CAL_DT RPT_CAL_YR
-------------------- ---------- ----------
19000101             01/01/1900 1900
...
20120101             01/01/2012 2012
...
20120125             01/25/2012 2012
...
20120430             04/30/2012 2012
...
20240213             02/13/2024 2024
...
47121231             12/31/4712 4712
Now back to my query.
WITH SACOMMON71991 AS (
  select T17189.SUR_REF_TIME_PERD_ID as SUR_REF_TIME_PERD_ID,
         T17189.RTP_CAL_DT as c4,
         T17189.RTP_CAL_YR as c5,
         ROW_NUMBER() OVER (PARTITION BY T17189.RTP_CAL_YR
                            ORDER BY T17189.RTP_CAL_YR DESC) as c6,
         ROW_NUMBER() OVER (PARTITION BY T17189.RTP_CAL_YR, T17189.SUR_REF_TIME_PERD_ID
                            ORDER BY T17189.RTP_CAL_YR DESC, T17189.SUR_REF_TIME_PERD_ID DESC) as c7
  from REF_TIME_PERD T17189 /* Dim_TIME_PERD_D_Check_Date */
)
select * from SACOMMON71991 where SUR_REF_TIME_PERD_ID=20240213
I noticed that the query did a full table scan of 219,000 rows.
I figured that since I only wanted one row, I could just change the query to this:
select T17189.SUR_REF_TIME_PERD_ID as SUR_REF_TIME_PERD_ID,
       T17189.RTP_CAL_DT as c4,
       T17189.RTP_CAL_YR as c5,
       ROW_NUMBER() OVER (PARTITION BY T17189.RTP_CAL_YR
                          ORDER BY T17189.RTP_CAL_YR DESC) as c6,
       ROW_NUMBER() OVER (PARTITION BY T17189.RTP_CAL_YR, T17189.SUR_REF_TIME_PERD_ID
                          ORDER BY T17189.RTP_CAL_YR DESC, T17189.SUR_REF_TIME_PERD_ID DESC) as c7
from V_REF_TIME_PERD T17189
where SUR_REF_TIME_PERD_ID=20240213
My plan looks much better.
But the answer is different!
The first query returns:
SUR_REF_TIME_PERD_ID C4                    C5                 C6         C7
-------------------- --------------------- ---------- ---------- ----------
20240213             2024-02-13 00:00:00   2024               44          1
The second query returns:
SUR_REF_TIME_PERD_ID C4                    C5                 C6         C7
-------------------- --------------------- ---------- ---------- ----------
20240213             2024-02-13 00:00:00   2024                1          1
So what exactly is the query doing?
The query is actually analyzing more data than just my row. After examining the results, I noticed that the 4th column (C6) is actually the day of the year: 2/13 is the 44th day of the year.
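As a quick sanity check on that arithmetic (31 days of January plus 13 days of February), here is a small Python sketch. It is only an illustration and is not part of the original query.

```python
from datetime import date

# Day of the year for 13 Feb 2024: 31 (January) + 13 (February) = 44.
target = date(2024, 2, 13)
day_of_year = target.timetuple().tm_yday

print(day_of_year)  # 44
```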
So how does the query arrive at this?
It starts with all rows in the table, then sorts them by RPT_CAL_YR and SUR_REF_TIME_PERD_ID. That sort comes from column c7. Looking back, this column is defined as:
ROW_NUMBER() OVER (PARTITION BY T17189.RTP_CAL_YR, T17189.SUR_REF_TIME_PERD_ID ORDER BY T17189.RTP_CAL_YR DESC, T17189.SUR_REF_TIME_PERD_ID DESC) as c7
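This column does the sorting that produces the result. Now if you look at C6, you see that it is a ROW_NUMBER over just the one column, RPT_CAL_YR. Since the rows are already sorted in date order (thanks to c7), C6 is just the position within its year of the row whose value was passed to the query. Note that C6 orders only by its own partition key, so nothing in the query itself guarantees the order within a year; the date order appears to come from the sort done for c7.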
I know it gets kind of complicated. The lesson learned is that analytic functions sometimes need a full table scan (FTS) to do their work. By filtering on the key value inside the select, I removed the query's ability to look at the rest of the table's data.
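The effect can be reproduced on a small scale. The sketch below is not the original Oracle setup. It assumes Python's built-in sqlite3 module (with SQLite 3.25 or later for window functions) and a hypothetical stand-in table holding one row per day of 2023 and 2024. It also assumes an explicit ORDER BY on the id for c6, so the numbering is deterministic, which the original query does not have.

```python
import sqlite3
from datetime import date, timedelta

# Hypothetical stand-in for REF_TIME_PERD: one row per day of 2023 and 2024.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE ref_time_perd ("
    " sur_ref_time_perd_id TEXT PRIMARY KEY, rtp_cal_dt TEXT, rtp_cal_yr TEXT)"
)
day = date(2023, 1, 1)
while day <= date(2024, 12, 31):
    conn.execute(
        "INSERT INTO ref_time_perd VALUES (?, ?, ?)",
        (day.strftime("%Y%m%d"), day.isoformat(), str(day.year)),
    )
    day += timedelta(days=1)

# c6 orders by the id (an assumption, not the original ORDER BY) so that the
# numbering within a year is deterministic. c7 partitions by year and id, so
# every partition holds exactly one row.
WINDOWS = """
    ROW_NUMBER() OVER (PARTITION BY rtp_cal_yr
                       ORDER BY sur_ref_time_perd_id) AS c6,
    ROW_NUMBER() OVER (PARTITION BY rtp_cal_yr, sur_ref_time_perd_id
                       ORDER BY sur_ref_time_perd_id DESC) AS c7
"""

# Filter applied AFTER the window functions have seen every row.
filter_outside = conn.execute(
    f"WITH numbered AS (SELECT sur_ref_time_perd_id, {WINDOWS} FROM ref_time_perd) "
    "SELECT c6, c7 FROM numbered WHERE sur_ref_time_perd_id = '20240213'"
).fetchone()

# Filter applied BEFORE the window functions: they only ever see one row.
filter_inside = conn.execute(
    f"SELECT {WINDOWS} FROM ref_time_perd WHERE sur_ref_time_perd_id = '20240213'"
).fetchone()

print(filter_outside)  # (44, 1)
print(filter_inside)   # (1, 1)
```

The WHERE clause is evaluated before the window functions in the same SELECT, which is why the second form numbers a single row.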
I also learned that by limiting the values in the right spot, you can still eliminate rows. Here is a better query, which limits the analytic function to look only at the current year.
WITH SACOMMON71991 AS (
  select T17189.SUR_REF_TIME_PERD_ID as SUR_REF_TIME_PERD_ID,
         T17189.RTP_CAL_DT as c4,
         T17189.RTP_CAL_YR as c5,
         ROW_NUMBER() OVER (PARTITION BY T17189.RTP_CAL_YR
                            ORDER BY T17189.RTP_CAL_YR DESC) as c6,
         ROW_NUMBER() OVER (PARTITION BY T17189.RTP_CAL_YR, T17189.SUR_REF_TIME_PERD_ID
                            ORDER BY T17189.RTP_CAL_YR DESC, T17189.SUR_REF_TIME_PERD_ID DESC) as c7
  from REF_TIME_PERD T17189 /* Dim_TIME_PERD_D_Check_Date */
  where RTP_CAL_YR='2024'
)
select * from SACOMMON71991 where SUR_REF_TIME_PERD_ID=20240213
The plan now goes through less data, but it is still the appropriate data to get the right answer.
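To illustrate why restricting the year inside the WITH block is safe, here is a minimal sketch. It assumes Python's built-in sqlite3 module (SQLite 3.25 or later), a hypothetical stand-in table with one row per day of 2023 and 2024, and an explicit ORDER BY on the id so the numbering is deterministic. The helper `c6_for` is made up for this example.

```python
import sqlite3
from datetime import date, timedelta

# Hypothetical stand-in for REF_TIME_PERD: one row per day of 2023 and 2024.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE ref_time_perd ("
    " sur_ref_time_perd_id TEXT PRIMARY KEY, rtp_cal_dt TEXT, rtp_cal_yr TEXT)"
)
day = date(2023, 1, 1)
while day <= date(2024, 12, 31):
    conn.execute(
        "INSERT INTO ref_time_perd VALUES (?, ?, ?)",
        (day.strftime("%Y%m%d"), day.isoformat(), str(day.year)),
    )
    day += timedelta(days=1)


def c6_for(target_id, inner_where=""):
    """Return c6 for one id; inner_where is applied before the window function."""
    sql = f"""
        WITH numbered AS (
            SELECT sur_ref_time_perd_id,
                   ROW_NUMBER() OVER (PARTITION BY rtp_cal_yr
                                      ORDER BY sur_ref_time_perd_id) AS c6
            FROM ref_time_perd
            {inner_where}
        )
        SELECT c6 FROM numbered WHERE sur_ref_time_perd_id = ?
    """
    return conn.execute(sql, (target_id,)).fetchone()[0]


whole_table = c6_for("20240213")
one_year = c6_for("20240213", "WHERE rtp_cal_yr = '2024'")

rows_total = conn.execute("SELECT COUNT(*) FROM ref_time_perd").fetchone()[0]
rows_2024 = conn.execute(
    "SELECT COUNT(*) FROM ref_time_perd WHERE rtp_cal_yr = '2024'"
).fetchone()[0]

print(whole_table, one_year)  # 44 44
print(rows_total, rows_2024)  # 731 366
```

Because c6 is partitioned by year, dropping the other years before the window function runs leaves the 2024 partition intact, so the answer is unchanged while fewer rows are read.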