Thursday, August 7, 2014

12.1.0.2 New Features PDB CONTAINERS Clause

When 12.1.0.2 came out, one of the features I wanted to play with was the cross-container functionality, and I finally had time to try it out.

First, here is the description of the feature.

The CONTAINERS clause is a new way of looking at multitenant container databases (CDBs). With this clause, data can be aggregated from a single identical table or view across many pluggable databases (PDBs) from the root container. The CONTAINERS clause accepts a table or view name as an input parameter that is expected to exist in all PDBs in that container. Data from a single PDB or a set of PDBs can be included with the use of CON_ID in the WHERE clause. 

I decided to play with this and see what it really means, and especially to look at the explain plan to see what happens under the covers.

Step 1 --   The first thing you need to do is create a "common user".  A common user is a new term that comes with pluggable databases: a user created in the CDB (the root container) that is then available as a user in all the PDBs that are part of the CDB.  There are some rules around this.

  • In Oracle Database 12c Release 1 (12.1.0.1), the name of a common user must begin with C## or c## and the name of a local user must not begin with C## or c##.
  • Starting with Oracle Database 12c Release 1 (12.1.0.2):
    • The name of a common user must begin with characters that are a case-insensitive match to the prefix specified by the COMMON_USER_PREFIX initialization parameter. By default, the prefix is C##.
    • The name of a local user must not begin with characters that are a case-insensitive match to the prefix specified by the COMMON_USER_PREFIX initialization parameter. Regardless of the value of COMMON_USER_PREFIX, the name of a local user can never begin with C## or c##.
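Before moving on, here is a quick sketch of the prefix rule in action (this assumes the 12.1.0.2 default prefix of C##; c##scratch is just a throwaway name for illustration):

-- check the configured prefix (defaults to C##)
show parameter common_user_prefix

-- in CDB$ROOT, an unprefixed name fails with ORA-65096
create user bgrenn identified by bgrenn;

-- a prefixed name is accepted as a common user
create user c##scratch identified by scratch;
drop user c##scratch;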

So here goes for step 1:
$ sqlplus "/ as sysdba"

SQL*Plus: Release 12.1.0.2.0 Production on Thu Aug 7 22:23:26 2014

Copyright (c) 1982, 2014, Oracle.  All rights reserved.


Connected to:
Oracle Database 12c Enterprise Edition Release 12.1.0.2.0 - 64bit Production
With the Partitioning, OLAP, Advanced Analytics and Real Application Testing options

SQL> create user c##bgrenn identified by bgrenn;

User created.

SQL> grant dba to c##bgrenn;

Grant succeeded.


One thing to mention at this point is that I granted the DBA role to my new common user c##bgrenn.  You still need to grant the privileges the user needs within each PDB individually.

Step 2 --   Now that I have a common user (c##bgrenn), I need to go into my PDBs and create the objects that I want to query across PDBs.

For simplicity, I chose copies of DBA_TABLES and DBA_OBJECTS.

The 3 PDBs I want to create these in are "orclpdb", "orclpdba" and "orclpdbb".

SQL> COLUMN NAME FORMAT A15
SQL> COLUMN RESTRICTED FORMAT A10
SQL> COLUMN OPEN_TIME FORMAT A30
SQL> SELECT NAME, OPEN_MODE, con_id FROM V$PDBS;

NAME            OPEN_MODE      CON_ID
--------------- ---------- ----------
PDB$SEED        READ ONLY           2
ORCLPDB         READ WRITE          3
ORCLPDBA        READ WRITE          4
ORCLPDBB        READ WRITE          5





I then went into each of the 3 PDBs and created local copies of DBA_OBJECTS and DBA_TABLES (the orclpdb session is shown below; the same was repeated for orclpdba and orclpdbb).

SQL> connect sys/oracle@localhost:1521/orclpdb as sysdba
Connected.

SQL> grant dba to c##bgrenn;

Grant succeeded.

SQL> create table c##bgrenn.local_objects as select * from dba_objects;

Table created.

SQL> create table c##bgrenn.local_tables as select * from dba_tables;

Table created.


Step 3 - Now that the same objects exist with data in all 3 PDBs, I can do a combined query from the root.  The next step is to create the same tables, empty, in the root container (CDB$ROOT).

create table c##bgrenn.local_objects as select * from dba_objects where 1=0;

Table created.

create table c##bgrenn.local_tables as select * from dba_tables where 1=0;

Table created.


Then finally query and see the rows in the LOCAL_OBJECTS table across all the PDBs:

select count(*) from containers(c##bgrenn.local_objects) where con_id in (3) ;
  COUNT(*)
----------
     90925
select count(*) from containers(c##bgrenn.local_objects) where con_id in (4) ;
  COUNT(*)
----------
     90923
select count(*) from containers(c##bgrenn.local_objects) where con_id in (5) ;
  COUNT(*)
----------
     90925
select count(*) from containers(c##bgrenn.local_objects)  ;
 
  COUNT(*)
----------
    272773
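The CON_ID by itself isn't very friendly, so it helps to join back to V$PDBS and label each container by name. A minimal sketch, using the same LOCAL_OBJECTS table created above:

select p.name, count(*)
  from containers(c##bgrenn.local_objects) o,
       v$pdbs p
 where o.con_id = p.con_id
 group by p.name
 order by p.name;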


Step 4 - Now to look at the explain plan for a query against one of the tables across containers.

SET LINESIZE 130
SET PAGESIZE 0
SELECT * FROM table(DBMS_XPLAN.DISPLAY);

Plan hash value: 1439328272

----------------------------------------------------------------------------------------------------------------------
| Id  | Operation                | Name     | Rows  | Bytes | Cost (%CPU)| Pstart| Pstop |    TQ  |IN-OUT| PQ Distrib |
----------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT         |          |     1 |   381 |     0   (0)|       |       |        |      |            |
|   1 |  PX COORDINATOR          |          |       |       |            |       |       |        |      |            |
|   2 |   PX SEND QC (RANDOM)    | :TQ10000 |     1 |   381 |            |       |       |  Q1,00 | P->S | QC (RAND)  |
|   3 |    PX PARTITION LIST ALL |          |     1 |   381 |            |     1 |   254 |  Q1,00 | PCWC |            |
|   4 |     FIXED TABLE FULL     | X$CDBVW$ |     1 |   381 |            |       |       |  Q1,00 | PCWP |            |
----------------------------------------------------------------------------------------------------------------------



Finally, this is the explain plan for a join query across both tables.


explain plan for
select *
  from containers(c##bgrenn.local_objects) a,
       containers(c##bgrenn.local_tables) b
where
  a.object_name = b.table_name and
  a.object_type = 'TABLE';

Explained.

SET LINESIZE 150
SET PAGESIZE 0
SELECT * FROM table(DBMS_XPLAN.DISPLAY);

Plan hash value: 198107036

-------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                    | Name     | Rows  | Bytes | Cost (%CPU)| Time     | Pstart| Pstop |    TQ  |IN-OUT| PQ Distrib |
-------------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |          |     1 |  1226 |     0   (0)| 00:00:01 |       |       |        |      |            |
|   1 |  PX COORDINATOR              |          |       |       |            |          |       |       |        |      |            |
|   2 |   PX SEND QC (RANDOM)        | :TQ10002 |     1 |  1226 |     0   (0)| 00:00:01 |       |       |  Q1,02 | P->S | QC (RAND)  |
|*  3 |    HASH JOIN BUFFERED        |          |     1 |  1226 |     0   (0)| 00:00:01 |       |       |  Q1,02 | PCWP |            |
|   4 |     PX RECEIVE               |          |     1 |   381 |            |          |       |       |  Q1,02 | PCWP |            |
|   5 |      PX SEND HYBRID HASH     | :TQ10000 |     1 |   381 |            |          |       |       |  Q1,00 | P->P | HYBRID HASH|
|   6 |       STATISTICS COLLECTOR   |          |       |       |            |          |       |       |  Q1,00 | PCWC |            |
|   7 |        PX PARTITION LIST ALL |          |     1 |   381 |            |          |     1 |   254 |  Q1,00 | PCWC |            |
|*  8 |         FIXED TABLE FULL     | X$CDBVW$ |     1 |   381 |            |          |       |       |  Q1,00 | PCWP |            |
|   9 |     PX RECEIVE               |          |     1 |   845 |            |          |       |       |  Q1,02 | PCWP |            |
|  10 |      PX SEND HYBRID HASH     | :TQ10001 |     1 |   845 |            |          |       |       |  Q1,01 | P->P | HYBRID HASH|
|  11 |       PX PARTITION LIST ALL  |          |     1 |   845 |            |          |     1 |   254 |  Q1,01 | PCWC |            |
|  12 |        FIXED TABLE FULL      | X$CDBVW$ |     1 |   845 |            |          |       |       |  Q1,01 | PCWP |            |
-------------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   3 - access("A"."OBJECT_NAME"="B"."TABLE_NAME")
   8 - filter("A"."OBJECT_TYPE"='TABLE')

25 rows selected.






This looks like a very useful feature for getting a high-level view of all PDBs.  The things to note are:

1) You need to use a common user, and this user needs to be the schema owner in all PDBs.
2) By looking at the plan, there is clearly a container-wide view under the covers (the X$CDBVW$ fixed table) that treats each PDB like a partition, with the work parallelized across PDBs.
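As a side note, the data dictionary ships with CDB_% views (CDB_OBJECTS, CDB_TABLES, and so on) that expose the same container-spanning behavior out of the box, so you can sanity-check my hand-built tables against them:

select con_id, count(*)
  from cdb_objects
 group by con_id
 order by con_id;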



Monday, May 5, 2014

Performance tuning using Oracle Internal Packages

I had an interesting problem last week with a customer who was performance testing on a new system compared to their current system.

The script was pretty simple. It was a PL/SQL package that inserted 10M rows into a table, committing every 1,000 rows.  To make the data more "normal" the customer used DBMS_RANDOM.

The basic insert looked like this.

INSERT INTO TEST_TABLE1
     (ID,DTVAL,
      DTSTAMP,
      COL1,
      NUM)
VALUES
    (:B1 ,
      SYSDATE,
      SYSTIMESTAMP,
      DBMS_RANDOM.STRING('A', 100),
      DBMS_RANDOM.RANDOM);

To me it seemed like a simple test.  Unfortunately, the performance results were not as expected.  To step back for a minute: the current system was running on 11.1.0.7, and the new system they were benchmarking against was 11.2.0.4.

I even had them check the output in the table to ensure nothing had changed. Everything looked the same.

You wouldn't think that would matter, but the differences in DBMS_RANDOM between versions seemed to be the issue.  You see, DBMS_RANDOM periodically has logic changes, so the performance of DBMS_RANDOM cannot be compared between versions in a performance benchmark.

I had the customer re-run the tests with constants instead of calling DBMS_RANDOM and the results were much better.

To reproduce what they saw, I finally tested against 11.2.0.2 and 12.1.0.1 (on the same machine).  I could not get a copy of 11.1.0.7 or 11.2.0.4 to test with, but these 2 versions were enough to show the difference that affected the customer's benchmark.
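If you want to isolate the DBMS_RANDOM cost on your own versions before blaming the inserts, a minimal sketch like this (timing just the random calls with DBMS_UTILITY.GET_CPU_TIME, which reports centiseconds) does the trick:

set serveroutput on

DECLARE
  v  VARCHAR2(100);
  n  NUMBER;
  t0 NUMBER;
BEGIN
  t0 := DBMS_UTILITY.get_cpu_time;
  FOR i IN 1..100000 LOOP
    v := DBMS_RANDOM.string('A', 100);   -- the same calls the benchmark makes
    n := DBMS_RANDOM.random;
  END LOOP;
  DBMS_OUTPUT.put_line('DBMS_RANDOM cpu (centiseconds): ' ||
                       (DBMS_UTILITY.get_cpu_time - t0));
END;
/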

Below I've included the TKPROF formatted output from the trace file on 11.2.0.2

SQL ID: fg7gf0m6a2ca4 Plan Hash: 0

INSERT INTO TEST_TABLE1 (ID,DTVAL,DTSTAMP,COL1,NUM)
VALUES
(:B1 , SYSDATE, SYSTIMESTAMP, DBMS_RANDOM.STRING('A', 100),
  DBMS_RANDOM.RANDOM)


call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1      0.00       0.00          0          0          0           0
Execute 100000     25.81      39.51          0       2464     119161      100000
Fetch        0      0.00       0.00          0          0          0           0
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total   100001     25.81      39.51          0       2464     119161      100000


Notice the CPU time: 25.81 seconds of CPU time on 11.2.0.2.

Below is the TKPROF formatted output from the trace file on 12.1.0.1.

INSERT INTO TEST_TABLE1 (ID,DTVAL,DTSTAMP,COL1,NUM)
VALUES
(:B1 , SYSDATE, SYSTIMESTAMP, DBMS_RANDOM.STRING('A', 100),
  DBMS_RANDOM.RANDOM)


call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1      0.00       0.00          0          0          0           0
Execute 100000     74.01      90.31          1       3722     111116      100000
Fetch        0      0.00       0.00          0          0          0           0
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total   100001     74.01      90.31          1       3722     111116      100000


This time notice that it took 74.01 seconds of CPU. Same statement, executed the same number of times.

The CPU time is almost 3x higher in 12.1.0.1.

Now I re-ran it with constants.

11.2.0.2 Test

INSERT INTO TEST_TABLE1 (ID,DTVAL,DTSTAMP,COL1,NUM)
VALUES
(:B1 , SYSDATE, SYSTIMESTAMP, 'a', 1)


call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1      0.00       0.00          0          0          0           0
Execute 100000      4.62       6.02          0        536     108087      100000
Fetch        0      0.00       0.00          0          0          0           0
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total   100001      4.62       6.02          0        536     108087      100000


12.1.0.1 test

INSERT INTO TEST_TABLE1 (ID,DTVAL,DTSTAMP,COL1,NUM)
VALUES
(:B1 , SYSDATE, SYSTIMESTAMP, 'a', 1)


call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1      0.00       0.00          0          0          0           0
Execute 100000      4.78       7.09          1        586     105731      100000
Fetch        0      0.00       0.00          0          0          0           0
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total   100001      4.78       7.09          1        586     105731      100000



Wow. Now that I use constants, the CPU time is almost identical.

There are absolutely some major performance differences with DBMS_RANDOM between versions.

Moral of the story is don't use internal packages for benchmarking (unless they are critical to your application).
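If your benchmark really does need random-looking data, one option (a sketch, reusing the TEST_TABLE1 definition from the script below) is to generate the values up front, so the timed loop measures only the inserts:

DECLARE
  TYPE t_strtab IS TABLE OF VARCHAR2(100) INDEX BY PLS_INTEGER;
  TYPE t_numtab IS TABLE OF NUMBER        INDEX BY PLS_INTEGER;
  strs t_strtab;
  nums t_numtab;
BEGIN
  -- pay the DBMS_RANDOM cost up front, outside the measured section
  FOR i IN 1..100000 LOOP
    strs(i) := DBMS_RANDOM.string('A', 100);
    nums(i) := DBMS_RANDOM.random;
  END LOOP;

  -- measured section: inserts only
  FOR i IN 1..100000 LOOP
    INSERT INTO test_table1 (id, dtval, dtstamp, col1, num)
    VALUES (i, SYSDATE, SYSTIMESTAMP, strs(i), nums(i));
    IF MOD(i, 100) = 0 THEN
      COMMIT;
    END IF;
  END LOOP;
  COMMIT;
END;
/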



Finally, this is the script I used for testing.

SET TIME ON
SET TIMING ON
SET ECHO ON
SET SERVEROUTPUT ON
SET TERMOUT ON
SET VERIFY ON
SET FEEDBACK ON

WHENEVER SQLERROR CONTINUE

select to_char(sysdate,'mm/dd/yyyy hh:mi:ss AM') from dual;


prompt create the test_table1

drop table test_table1;

create table test_table1
(
id NUMBER,
dtval DATE,
dtstamp TIMESTAMP,
col1 varchar2(100),
num NUMBER
);   

prompt Insert 100K rows with a commit every 100 records

alter session set tracefile_identifier = 'test_sess_1';
exec dbms_monitor.session_trace_enable( waits => true );


DECLARE
 x  PLS_INTEGER;
 rn NUMBER(20); 
BEGIN

  SELECT hsecs
  INTO rn
  FROM v$timer;
  
  dbms_random.initialize(rn);
  FOR i IN 1..100000
  LOOP
    x := dbms_random.random;
    rn := x; 
    
    insert into test_table1 (id,dtval,dtstamp,col1,num)
    values(x, sysdate, systimestamp, DBMS_RANDOM.string('A', 100), dbms_random.random);
    
    If ( MOD(i,100) = 0) then
        commit;
    end if;
    
  END LOOP;
  dbms_random.terminate;
END;
/

EXEC DBMS_MONITOR.session_trace_disable;

prompt Count of all records

select count(*) from test_table1;
select count(distinct col1) from test_table1 ;
select count(distinct num) from test_table1 ;





Thursday, December 12, 2013

Monitoring your Exadata health

One of the biggest topics I talk to customers about is monitoring the health of your Exadata.

The best tool for this is Exachk (see MOS Doc ID 1070954.1).

That document contains the current Exachk release, and any new beta release that is available.

The recommendations for Exachk are:

1) Run Exachk (at a minimum) quarterly, and after any changes are made to the configuration
2) ALWAYS run the current Exachk. The script is periodically updated/improved, so it is very important to be current
3) Keep track of any failures so that you can identify any new items that appear in the report
4) A score of 80 or above is a good score for production. It is very rare to have a score of 99+

There is also a great whitepaper released in Sept. 2013 (just a few months ago).

This white paper can be found here:

http://www.oracle.com/technetwork/database/availability/exadata-health-resource-usage-2021227.pdf

Wednesday, December 11, 2013

X4-2 Exadata Announcement

These are the differences with the new X4-2 just announced, along with a table comparing the models.
1) Double the size of the flash cache
2) Switch from 3 TB to 4 TB drives (HC)
3) More CPU cores
4) Increase in InfiniBand throughput by using an Active-Active configuration
5) Automatic flash compression on X3 and X4 systems (using the ACO option)


Comparison of the X2, X3, and X4 full racks:

Database Servers (per server)
  • Processors: X2 = 2 x Six-Core Intel Xeon X5675 (3.06 GHz); X3 = 2 x Eight-Core Intel Xeon E5-2690 (2.9 GHz); X4 = 2 x Twelve-Core Intel Xeon E5-2697 v2 (2.7 GHz)
  • Memory: X2 = 96 GB; X3 = 128 GB or 256 GB; X4 = 256 GB
  • Disk controller: all three = Disk Controller HBA with 512 MB Battery-Backed Write Cache
  • Internal disks: X2 = 4 x 300 GB 10,000 RPM SAS; X3 = 4 x 300 GB 10,000 RPM; X4 = 4 x 600 GB 10,000 RPM
  • InfiniBand: all three = 2 x QDR (40 Gb/s) ports
  • Ethernet: X2 = 2 x 10 Gb ports (Intel 82599 10GbE controller) plus 4 x 1 Gb ports; X3 and X4 = 4 x 1/10 Gb ports (copper) plus 2 x 10 Gb ports (optical)
  • Full rack totals: X2 = 96 CPU cores and 768 GB memory for database processing (12 cores and 96 GB per server); X3 = 128 CPU cores and 1 TB or 2 TB memory (16 cores and 256 GB per server); X4 = 192 CPU cores and 2 TB memory (24 cores and up to 512 GB per server)

Storage Cells
  • CPU: X2 = 2 x Six-Core Intel Xeon L5640 (2.26 GHz); X3 = 2 x Six-Core Intel Xeon E5-2630L (2.0 GHz); X4 = 2 x Six-Core Intel Xeon E5-2630 v2 (2.6 GHz)
  • Memory: X2 = 24 GB; X3 = 64 GB; X4 = 96 GB

HC (High Capacity) Full Rack
  • Disk bandwidth¹: X2 = up to 18 GB/s; X3 = up to 18 GB/s; X4 = up to 20 GB/s (uncompressed)
  • Flash bandwidth¹: X2 = up to 68 GB/s; X3 = up to 93 GB/s; X4 = up to 100 GB/s (uncompressed)
  • Disk IOPS²: X2 = up to 28,000; X3 = up to 28,000; X4 = up to 32,000
  • Flash read IOPS²: X2 = up to 1,500,000; X3 = up to 1,500,000; X4 = up to 2,660,000
  • Flash write IOPS³: X2 = N/A; X3 = up to 1,000,000; X4 = up to 1,680,000
  • Flash data capacity (raw): X2 = 5.3 TB; X3 = 22.4 TB; X4 = 44.8 TB
  • Disk data capacity (raw): X2 = 504 TB; X3 = 504 TB; X4 = 672 TB
  • Disk data capacity (usable): X2 = up to 224 TB; X3 = 224 TB; X4 = 300 TB

HP (High Performance) Full Rack
  • Disk bandwidth¹: X2 = up to 25 GB/s; X3 = up to 25 GB/s; X4 = up to 24 GB/s (uncompressed)
  • Flash bandwidth¹: X2 = up to 75 GB/s; X3 = up to 100 GB/s; X4 = up to 100 GB/s (uncompressed)
  • Disk IOPS²: all three = up to 50,000
  • Flash read IOPS²: X2 = up to 1,500,000; X3 = up to 1,500,000; X4 = up to 2,660,000
  • Flash write IOPS³: X2 = N/A; X3 = up to 1,000,000; X4 = up to 1,680,000
  • Flash data capacity (raw): X2 = 5.3 TB; X3 = 22.4 TB; X4 = 44.8 TB
  • Disk data capacity (raw): X2 = 100 TB; X3 = 100 TB; X4 = 200 TB
  • Disk data capacity (usable): X2 = up to 45 TB; X3 = 45 TB; X4 = 90 TB

¹Bandwidth is peak physical scan bandwidth achieved running SQL, assuming no database compression. Effective user data bandwidth is higher when database compression is used.
²Based on 8K IO requests running SQL. Note that the IO size greatly affects Flash IOPS. Others quote IOPS based on 2K or smaller IOs and are not relevant for databases.
³Based on 8K IO requests running SQL. Flash write I/Os measured at the storage servers after ASM mirroring. Database writes will usually issue multiple storage IOs to maintain redundancy.
⁴Raw capacity is measured in standard disk drive terminology with 1 GB = 1 billion bytes. Capacity is measured using normal powers of 2 space terminology with 1 TB = 1024 * 1024 * 1024 * 1024 bytes. Actual formatted capacity is less.
⁵Actual space available for a database after mirroring (ASM normal redundancy) while also providing adequate space (one disk on Quarter and Half Racks and two disks on a Full Rack) to reestablish the mirroring protection after a disk failure in the normal redundancy case.


Tuesday, November 12, 2013

Oracle 12c PL/SQL improvements.


Last week I was giving a presentation for the UNYOUG (Upstate NY Oracle Users Group) and I talked about the new features in 12c (along with the In-Memory database).

I thought I would share some thoughts after the meeting.

I went through Tom Kyte's top 12 new features, and surprisingly the top feature that excited people was the PL/SQL improvements. 

The PL/SQL improvement has to do with the ability to write a PL/SQL function as part of the query.

Let's say that currently, with 11g, you have a function that computes the days between 2 dates.

  CREATE OR REPLACE FUNCTION Days_Between
             (first_dt DATE, second_dt DATE)
                RETURN NUMBER IS
       dt_one NUMBER;
       dt_two NUMBER;
BEGIN
      -- note: 'DDD' is the day of the year, so this only works for dates in the same year
      dt_one := TO_NUMBER(TO_CHAR(first_dt, 'DDD'));
      dt_two := TO_NUMBER(TO_CHAR(second_dt, 'DDD'));
           RETURN (dt_two - dt_one);
 END Days_Between;
/

select Days_between(start_date,end_date) from mytable;



The problem is that in order to test this function you need to create the function.  There are multiple issues developers face with having to do this.
  1. Developers often don't have the authority to create/change functions, especially if they need to be owned by a different schema.
  2. Replacing the current function affects other users, and this may not be desirable while debugging changes.
  3. Testing against production data is often not possible because of authorization and collision issues.

The answer in 12c is the ability to include a function in the "WITH" clause. The above would become:


WITH  FUNCTION Days_Between
             (first_dt DATE, second_dt DATE)
                RETURN NUMBER IS
       dt_one NUMBER;
       dt_two NUMBER;
BEGIN
      dt_one := TO_NUMBER(TO_CHAR(first_dt, 'DDD'));
      dt_two := TO_NUMBER(TO_CHAR(second_dt, 'DDD'));
           RETURN (dt_two - dt_one);
 END Days_Between;  
select Days_between(start_date,end_date) from mytable
/

So, what about procedures, you ask? You can also include procedures in the mix.  The main purpose of doing this is to include any procedures that are invoked from the function.  This way you can include all the "dependencies" in the WITH clause, as sketched below.
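A minimal sketch of that (the CHECK_DATES procedure here is just a placeholder dependency, and mytable/start_date/end_date come from the example above):

WITH
  PROCEDURE check_dates (first_dt DATE, second_dt DATE) IS
  BEGIN
    -- a stand-in for a real dependency such as validation or logging
    IF first_dt IS NULL OR second_dt IS NULL THEN
      RAISE_APPLICATION_ERROR(-20001, 'dates must not be null');
    END IF;
  END;
  FUNCTION Days_Between (first_dt DATE, second_dt DATE) RETURN NUMBER IS
  BEGIN
    check_dates(first_dt, second_dt);
    RETURN TO_NUMBER(TO_CHAR(second_dt, 'DDD')) -
           TO_NUMBER(TO_CHAR(first_dt, 'DDD'));
  END;
select Days_Between(start_date, end_date) from mytable
/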

Finally, I read an article talking about how much this improves performance too.

http://www.oracle-base.com/articles/12c/with-clause-enhancements-12cr1.php#plsql-support

but to the developers I talked to, the big advantage was the ability to test.

As far as performance gains, I don't know how much I would put reusable code (like functions) directly into a SQL statement. It would be a bear to support changes to a "common function" defined in multiple places.
 

 

Wednesday, January 9, 2013

Performance and Indexes with Oracle

It's been a while since I've written a post, but twice this week the same issue came up.

The story goes like this: "I have a query that is using indexes, not FTS (full table scans), but it is much slower than expected."  It seems most folks have it drummed into their heads that indexes are fastest.

This is the statement I've made twice:

"The only thing worse than a FTS is an index lookup of the whole table."  I figured I would show you what I mean.

First I created 2 tables:

TEST_TABLE with 77,989 rows of data.  There is a primary key.
DRIVER with the same rows as TEST_TABLE.

This is the test I did:

1) I created DRIVER with 1 row and analyzed it, then I deleted the row and inserted all the rows from TEST_TABLE (a sketch of this setup follows the query below).
2) I ran the following query, which should return every row.

select  /*+ GATHER_PLAN_STATISTICS */  count(distinct a.capture_time)
from test_table a,
     driver b
where a.pval=b.pval
and b.instance_number < 20;
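For reference, step 1 might look like this (a sketch; the table and column names come from the query above):

create table driver as select * from test_table where rownum = 1;

exec dbms_stats.gather_table_stats(user, 'DRIVER');

delete from driver;
insert into driver select * from test_table;
commit;

-- note: no re-gather here, so the optimizer still believes DRIVER has 1 row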

So what happens?  Since the statistics on DRIVER say there is only one row, you can see the plan and actual vs. estimated rows below.







----------------------------------------------------------------------------------------------
| Id  | Operation                       | Name       | Starts | E-Rows | A-Rows |   A-Time   |
----------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                |            |      1 |        |      1 |00:00:00.86 |
|   1 |  SORT AGGREGATE                 |            |      1 |      1 |      1 |00:00:00.86 |
|   2 |   VIEW                          | VW_DAG_0   |      1 |      1 |  77961 |00:00:01.28 |
|   3 |    HASH GROUP BY                |            |      1 |      1 |  77961 |00:00:01.01 |
|   4 |     NESTED LOOPS                |            |      1 |        |  77989 |00:00:01.00 |
|   5 |      NESTED LOOPS               |            |      1 |      1 |  77989 |00:00:00.58 |
|*  6 |       TABLE ACCESS FULL         | DRIVER     |      1 |      1 |  77989 |00:00:00.07 |
|*  7 |       INDEX UNIQUE SCAN         | PVAL_KEY   |  77989 |      1 |  77989 |00:00:00.21 |
|   8 |      TABLE ACCESS BY INDEX ROWID| TEST_TABLE |  77989 |      1 |  77989 |00:00:00.22 |
----------------------------------------------------------------------------------------------
                                                                                                

Since the optimizer is only expecting 1 row to come back from the DRIVER table, it does an index lookup on TEST_TABLE for every row. Notice the actual number of rows is the full table.

Now let's look at the cost of this index lookup.


select  /*+ GATHER_PLAN_STATISTICS */  count(distinct a.capture_time)
from test_table a,
     driver b
where a.pval=b.pval
and b.instance_number < 20

call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1      0.00       0.00          0          0          0           0
Execute      1      0.00       0.00          0          0          0           0
Fetch        2      0.84       0.85          0      83932          0           1
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total        4      0.84       0.85          0      83932          0           1


OK, so the cost of the row-by-row index lookup is 0.85 seconds of elapsed time.

Now after analyzing the DRIVER table, you can see the plan changed to a FTS.


------------------------------------------------------------------------------------
| Id  | Operation             | Name       | Starts | E-Rows | A-Rows |   A-Time   |
------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT      |            |      1 |        |      1 |00:00:00.16 |
|   1 |  SORT AGGREGATE       |            |      1 |      1 |      1 |00:00:00.16 |
|   2 |   VIEW                | VW_DAG_0   |      1 |  76723 |  77961 |00:00:00.40 |
|   3 |    HASH GROUP BY      |            |      1 |  76723 |  77961 |00:00:00.23 |
|*  4 |     HASH JOIN         |            |      1 |  76723 |  77989 |00:00:01.23 |
|*  5 |      TABLE ACCESS FULL| DRIVER     |      1 |  76723 |  77989 |00:00:00.22 |
|   6 |      TABLE ACCESS FULL| TEST_TABLE |      1 |  79110 |  77989 |00:00:00.22 |
------------------------------------------------------------------------------------
                                                                                              


Notice the actual rows and estimates match.  You can also see it is a FTS.

Now for the run time stats with the FTS.

select  /*+ GATHER_PLAN_STATISTICS */  count(distinct a.capture_time)
from test_table a,
     driver b
where a.pval=b.pval
and b.instance_number < 20

call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1      0.00       0.00          0          0          0           0
Execute      1      0.00       0.00          0          0          0           0
Fetch        2      0.15       0.15          0       1390          0           1
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total        4      0.15       0.15          0       1390          0           1

     

Wow, look at that: the FTS took 0.15 seconds compared to 0.85 seconds.

The bottom line is this: the next time you talk to developers and they think their query should be fast because it uses an index, look deeper.  The biggest clue is to compare the estimated vs. actual rows in the plan.  The index might not be what you want; FTSs can be good.
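To do that comparison yourself, run the query with the GATHER_PLAN_STATISTICS hint (as above) and then pull the estimated vs. actual rows for the last execution straight from the cursor cache:

select * from table(dbms_xplan.display_cursor(null, null, 'ALLSTATS LAST'));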




Monday, November 12, 2012

Big Data and Privacy

I am writing an editorial (which is unusual for me). This was prompted by a conversation I had with my Dad about how the world is changing with big data, and how much retirees (like him) should know.  So here goes.

Dear Dad,

  I know you are involved in college education for older folks who have a passion to learn.  An interesting topic would be "Big Data".  For someone outside the IT field, I would say that Big Data describes the plethora of new data that is generated in today's society.
Where does it come from?


  • Logs from webservers 
  • Cell phones (including location data).
  • Search data
  • Medical data
  • Machine-generated data (like from the computer in your car)
  • Sales data
All this can be tied together in new ways that most people didn't think were possible years ago.  My favorite example: you are walking in the mall past a store, and you get a text on your phone.  The store's computer system has texted you a 30% offer for a new sweater, good for 2 hours.  The store knows ...
  • Your location, from your phone
  • That you have a ski trip planned for next week, from your search history and purchase history
  • That you are looking for a new sweater, from your Facebook and Twitter posts
Amazing, huh?

All these things can open up miraculous possibilities. The day may come when your doctor calls you to schedule a preventive test.  Medical history gathered from a large group of people has shown that others with a medical history similar to yours have had an issue that can be tested for and prevented. Wow, really amazing things.

They can predict something that may go wrong on your car given all the information the computer has gathered.

Target (the department store) has even predicted when a woman is pregnant based on purchase history.

All these things are amazing possibilities, but they are also scary.

I find this topic very exciting, but I'm sure for a lot of people these ideas are very scary. Where do you cross the line into privacy issues?

This is going to be an interesting battle. Who owns all this data? What data is public, and what data is private?

But the most interesting thing is what can be done with the data....

Me, I'm an optimist, so I can see all the good that can come out of all this new data.

At the very least, the students would have a better idea of what is going on in the world (behind the scenes), and they would understand why Target thinks "grandma" is pregnant.