Friday, December 3, 2010

SQL profiles

I have been spending a lot of time with SQL_PROFILES, and figuring out how my profile went stale (and subsequently the performance went from .04 seconds to 100 seconds).


First I know profiles can go bad because of cardinality. Kerry Osborne just did a great blog on this subject, and he talks about how profiles built by the SQL_ADVISOR use opt_estimate.  How to lock SQL_PROFILES created by SQL_ADVISOR. Here is a snippet of what you would see


OPT_ESTIMATE(@"SEL$5DA710D3", INDEX_FILTER, "F"@"SEL$1", IDX$$_1AA260002, SCALE_ROWS=8.883203639e-06)


The issues I've been having come from profiles created by SQLT.  SQLT, can create profiles for you from a previous SQL statement that runs properly.  The profile is named COE_{sql_id}_{plan_hash_value}. 


The first thing I noticed is that I use GTT (global temporary tables) a lot.. In fact during some of the queries that have profiles, the same GTT is called 17 times.  The GTT is primed  with only a single row and dynamic sampling is utilized (there are no stats on the table).  The second thing I noticed is that we have queries, that look simple (a join of 2 objects), but they really call views.  I see the profile is built on a query, not on the views (more later).


So where did I start ?  First I looked at the explain plan for a database that was using the plan I want.  I know that my GTT (tmp_gcc_id) has 1 row in it. 

Operation                                                                          Name                  cardinality
TABLE ACCESS FULL                                                   TMP_GCC_ID      1   

Then later in the plan

NESTED LOOPS                                                                                        179696
     MERGE JOIN CARTESIAN                                                                      8168
         TABLE ACCESS FULL                                         TMP_GCC_ID        8168
           TABLE ACCESS BY INDEX ROWID              order                           1
              INDEX RANGE SCAN                                    idx_order                  22


The  interesting thing is that the cardinality is 8168 for the GTT for the rest of Explain plan.. This dramitically inflluences the cost, and cardinality estimates throughout.  Oracle however does use the profile and keeps the plan to be what I want it to be.  Now I look at the database where the profile is causing performance issues.

Operation                                                                    Name                  cardinality
TABLE ACCESS  FULL                                       TMP_GCC_ID         1




then later in the plan



As far as ADG (active data guard), baselines can't be created. They require a sql_tuning_set which gets created in the database.


Baselines have long way to go to make them as portable as profiles, and profiles have a long way to go to make them as stable as baselines.  I hope there is migration path from profiles to baselines, but I haven't heard any rumblings yet.
NESTED LOOPS                                                                                        238,125
     MERGE JOIN CARTESIAN                                                                      8168
         TABLE ACCESS FULL                                         TMP_GCC_ID        8168
           TABLE ACCESS FULL                                     ORDER            534,285

    


I
think my problem with the profile causing a FTS on a large table is 2 fold.


GTT  -- SInce I am using a GTT, with dynamic sampling, oracle is only using dynamic sampling the first time it encounters the GTT in the query (from what I can gather).. All the subsequent subselects utilizing the GTT default to a cardinality of 8168, and follow that through..
This works fine when the profile is recognized, and working.. Once the profile starts to fail, the bad cardinality drives FTS where they shouldn't be.

PROFILES on Queries with Underlying views -  My query that I built the profile on has an underlying view that does most of the work.. The view was modified, and the order of the tables accessed was changed.  This was enough to throw off the profile.. The profile is built on the SEL$ names that oracle assigns to each section of the query.  Since the query didn't change, the profile was used.  And since the SEL$ names no longer matched the new view, the profile couldn't be properly utilized.  Since the profile wasn't properly utilzed, oracle defaulted to figuring out the best plan.. GTT's cause issues because they didn't dynamic sample every time.  The default cardinality of the GTT (8168) was enough to cause full table scans.

The moral of the story is that profiles can go bad 2 ways.

1) if they use any sort of cardinality hint (like Kerry mentions).

2) You change an underlying object (like a view), that doesn't change the query text.

Now the next topic is Baselines.  I tried to use baselines (to save myself all this trouble with profiles), and I had no luck.. We have multiple databases with the same schema, but different login ID's.  In order to utilize baselines you have to create SQL_TUNING_SETS.   Creating a SQL_TUNING_SET is much more difficult than just creating a SQL_PROFILE from an existing plan.  The other issue I have had is that we already have profiles on our queries that cause us trouble.. Creating a baselines, relies on the profile.. Create a baseline, remove the profile, and the baseline fails.,. This dependency to the profile makes it very difficult to replace our current profiles.

Lastly baselines have a column called "parsing_schema_name".. I love the idea that you can move baselines between databases (and promote them as part of your code), but we don't have the same parsing_schema_name in all our database.. I try to import the baseline, and it fails because the parsing_schema_name doesn't exist in the database.

Wednesday, November 17, 2010

Oracle on ASM

I discovered an interesting little by product of running Oracle on ASM.. Our SGA was sized a little small on a non-production box.  All of a sudden the Users started seeing some I/O errors.

Error: Selecting clients for minimum billing ORA-01115: IO error reading block
from file (block # )


ORA-01110: data file 49: '+DATA/orcl/data_128k_dt01.dbf'
ORA-15081: failed to submit an I/O operation to a disk
ORA-15081: failed to submit an I/O operation to a disk

Of course I looked at the sytem logs to find out if there was the dreaded corruption.. but nothing shows up.. I look in the alert log and I see

ORA-04031: unable to allocate 3240 bytes of shared memory ("shared pool","unknown object","sga heap(1,0)","FileOpenBlock")



ERROR: error ORA-4031 caught in ASM I/O path

 
 
Hmm.. looks like Oracle not being able to get to the ASM path causes I/O error messages.

Wednesday, November 3, 2010

SCAN (single client access name)

If you didn't know it, there is a new feature in 11.2 Grid called SCAN.  Here is a link to a little more information http://www.orafaq.com/node/2369 .

The reason I thought I should mention this, is that scan is a new concept you have to deal with when you upgrade/install 11.2 grid.. There is no getting around it in the installation.  It is possible to disable it however once you install it.

My suggestion on using it, is to move slowly.. There is still a lot of old clients out there that have hard time with scan, so you might find that your client won't connect the "new way"..  It is also very persnickity,.. I've seen some installations where the DNS resolution (files,dns), has thrown it off too.. For a new application, you can work through issues as you get ready to deploy, but for existing apps I would recommend you plan some time in your upgrade schedule to work through issues (or plan on disabling it).

Just  a word to the wise :)

Wednesday, October 27, 2010

Concurrency and parallelism

I've had a lot of discussions with some very "seasoned" professions on how to handle a high level concurrency. Most of these professions point to the new features of 11gr2 and and parallization..

True 11gr2 has added a new way of handling degree of parallization.

There are some new parameters

parallel_degree_policy
PARALLEL_MIN_TIME_THRESHOLD

These control how parallism is handled.. They can be used to actually create a funnel to ensure the system isn't flooded.

The problem is all this, is that parallelism has a price.. Take a small efficient query, and turn on these parameters.. Guess what happens when you ramp up and run 500 of the same query concurrently ? You see much lower throughput (I've seen as much as 10x lower throughput). Why ?? The overhead of parallel query can be quite high, and can consume more time than even CPU in your AWR report.



Event Waits Time(s) (ms) time Wait Class
------------------------------ ------------ ----------- ------ ------ ----------
enq: JX - SQL statement queue 71 7,815 1.E+05 99.3 Scheduler
DB CPU 96 1.2
PX Nsq: PQ load info query 46 9 201 .1 Other
enq: RD - RAC load 90 2 20 .0 Other
PX Deq: reap credit 152,105 1 0 .0 Other


The moral of the story is, parallism is good for longer queries.. For shorter queries your milege may vary

Adaptive Direct Path Reads

This is one of my favorite topics, as it keeps coming up.

Well as you read the title, you are probably going huh ?? Until you upgrade to 11g, you probably just think direct reads are for reading temp only.. Well it all changes with 11g. Full table scans of large tables turn from "db scattered read" to "direct path read"..


What does this mean ?? well the good news is it runs about 4 times faster than the old fasion reads.. How does it do this ?? It bypasses the SGA.. Is this good ? probably.. especially for full table scans.. it doesn't force anything out of your buffer cache, and you get faster reads. The only concern I would have is if you wanted to read a lot of data into memory, I don't know how to burn things into cache. I tried to disable it to do timings, but no luck. Even with the optimizer set back to 10g it still does direct path reads.

Here is some great information on it.

http://afatkulin.blogspot.com/2009/01/11g-adaptive-direct-path-reads-what-is.html

http://shallahamer-orapub.blogspot.com/2010/01/mystery-surrounding-11g-and-direct.html

I've had some bad experiences with Direct path reads, and concurrency. If you think about lots and lots concurrent sessions, doing direct path reads you start to imagine how this can reek havoc. None of these sessions share the results of the read, and they all independently read disk blocks.. Eventually you run out of runway for I/O.

Happy tuning.

previous blog posts

If anyone is interested in my previous blog posts, you can find them at

http://www.unyoug.com/forums/viewforum.php?f=1

Monday, October 18, 2010

Infiniband coming to a town near you

As the Exadata takes off (I've heard it is REALLY taking off), more and more vendors are opening their eyes to the bottleneck of I/O.

You are going to see some other solutions come to the market that are infiniband based. You are also going to see more solutions like the Storage Cell's.. Just look at the IBM XIV ! they have storage cells just like the exadata with large caches. The concept is catching on..

But what does the Exadata have besides the obvious ? It has the ability to parallelize the I/O at the storage level.. I'm sure you going yea. I knew that.. but think about it.

a) Exadata.. run a non-parallel query that does a FTS on a 5tb table. You will marshall all the resources of the I/O from a single query on a single NODE

b) XXXXXX.. Run a non-parallel query, and you only will be able to marshall all the I/O that the single CPU can handle.

Sure you can build an Exadata like solution, but in order to utilze the power of the storage/infiniband subsystem you need to parallelize across multiple CPU's.. This uses a lot of CPU's, and parallization might not be the best plan for all queries.

In my mind, this is big bonus of the exadata.. Parallize or not, you can do FTS's at 20.8g/s