Sunday, January 29, 2012

Disk space layout on your Exadata

This blog post is a product of my last post on Exadata disk usage.

I have multiple Exadatas (both full racks and 1/2 racks), and I want to know exactly how each one is configured now that ACS has left. How do I go about finding out how they are set up?

Well let's start with the basics.

Each Storage cell

  • Has 12 physical spinning disks.
  • The first 2 disks contain the OS, which uses ~29G of space.
  • The disks come in either 600G (SAS) or 2TB (SATA); the newer model now has 3TB (SATA).
  • Each cell contains 384G of flash cache, made up of 4 x 96G F20 PCI cards.

Now let's log on to a storage cell and see how it is configured.

First go into CellCLI and look at the physical disks.
CellCLI> list physicaldisk
         20:0            R0DQF8          normal
         20:1            R1N71G          normal
         20:2            R1NQVB          normal
         20:3            R1N8DD          normal
         20:4            R1NNBC          normal
         20:5            R1N8BW          normal
         20:6            R1KFW3          normal
         20:7            R1EX24          normal
         20:8            R2LWZC          normal
         20:9            R0K8MF          normal
         20:10           R0HR55          normal
         20:11           R0JQ9A          normal
         FLASH_1_0       3047M04YEC      normal
         FLASH_1_1       3047M05079      normal
         FLASH_1_2       3048M052FD      normal
         FLASH_1_3       3047M04YF7      normal
         FLASH_2_0       3047M04WXN      normal
         FLASH_2_1       3047M04YAJ      normal
         FLASH_2_2       3047M04WTR      normal
         FLASH_2_3       3047M04Y9L      normal
         FLASH_4_0       3047M0500W      normal
         FLASH_4_1       3047M0503G      normal
         FLASH_4_2       3047M0500X      normal
         FLASH_4_3       3047M0501G      normal
         FLASH_5_0       3047M050XG      normal
         FLASH_5_1       3047M050XP      normal
         FLASH_5_2       3047M05098      normal
         FLASH_5_3       3047M050UH      normal


From this you can see that there are 12 physical disks (20:0 - 20:11) and 16 flash disks.
Now let's look at the detail for these 2 types of disks. I will use the command:


list physicaldisk {diskname} detail



CellCLI> list physicaldisk 20:0 detail
         name:                   20:0
         deviceId:               19
         diskType:               HardDisk
         enclosureDeviceId:      20
         errMediaCount:          0
         errOtherCount:          0
         foreignState:           false
         luns:                   0_0
         makeModel:              "SEAGATE ST32000SSSUN2.0T"
         physicalFirmware:       0514
         physicalInsertTime:     2011-09-20T10:19:00-04:00
         physicalInterface:      sata
         physicalSerial:         R0DQF8
         physicalSize:           1862.6559999994934G
         slotNumber:             0
         status:                 normal



This is what you would see for a SAS 600G disk:
CellCLI> list physicaldisk 20:9 detail

         name:                   20:9
         deviceId:               17
         diskType:               HardDisk
         enclosureDeviceId:      20
         errMediaCount:          23
         errOtherCount:          0
         foreignState:           false
         luns:                   0_9
         makeModel:              "TEST ST360057SSUN600G"
         physicalFirmware:       0805
         physicalInsertTime:     0000-03-24T22:10:19+00:00
         physicalInterface:      sas
         physicalSerial:         E08XLW
         physicalSize:           558.9109999993816G
         slotNumber:             9
         status:                 normal


This is what the configuration of the FLASH drives looks like:

CellCLI> list physicaldisk FLASH_5_0 detail
         name:                   FLASH_5_0
         diskType:               FlashDisk
         errCmdTimeoutCount:     0
         errHardReadCount:       0
         errHardWriteCount:      0
         errMediaCount:          0
         errOtherCount:          0
         errSeekCount:           0
         luns:                   5_0
         makeModel:              "MARVELL SD88SA02"
         physicalFirmware:       D20Y
         physicalInsertTime:     2011-09-20T10:20:17-04:00
         physicalInterface:      sas
         physicalSerial:         3047M050XG
         physicalSize:           22.8880615234375G
         sectorRemapCount:       0
         slotNumber:             "PCI Slot: 5; FDOM: 0"
         status:                 normal



So this gives me a good idea of what disks the storage is made up of. In my case you can see that the 12 disks are SATA, and each contains 1,862G of usable space.
In the case of SAS, you can see each disk contains 558G of usable space.

You can also see that the flash storage comprises 16 separate disks, connected through 4 PCI cards. Each card contains 4 flash disks of about 22G each.

For now (and for the rest of this post) I will not talk about the flash. It is possible to provision these flash cell disks as usable storage, but I won't be discussing that.

Now that we have the physical disk layout, we can move to the next level. First, to review:

We have 12 physical disks. Each disk contains 1,862.65G of space (22,352G per cell).

Now the next step is to look at the LUNs that were created out of the physical disks. The LUN size is the amount of usable space left after the disks have been turned into block devices and presented to the server; you can see that the overhead is small. Below is the output (truncated after the first 2 disks, then I've included a flash disk to show that detail).


CellCLI> list lun detail
         name:                   0_0
         cellDisk:               CD_00_tpfh1
         deviceName:             /dev/sda
         diskType:               HardDisk
         id:                     0_0
         isSystemLun:            TRUE
         lunAutoCreate:          FALSE
         lunSize:                1861.712890625G
         lunUID:                 0_0
         physicalDrives:         20:0
         raidLevel:              0
         lunWriteCacheMode:      WriteBack, ReadAheadNone, Direct, No Write Cache if Bad BBU
         status:                 normal

         name:                   0_1
         cellDisk:               CD_01_tpfh1
         deviceName:             /dev/sdb
         diskType:               HardDisk
         id:                     0_1
         isSystemLun:            TRUE
         lunAutoCreate:          FALSE
         lunSize:                1861.712890625G
         lunUID:                 0_1
         physicalDrives:         20:1
         raidLevel:              0
         lunWriteCacheMode:      WriteBack, ReadAheadNone, Direct, No Write Cache if Bad BBU
         status:                 normal

         name:                   2_2
         cellDisk:               FD_06_tpfh1
         deviceName:             /dev/sdab
         diskType:               FlashDisk
         id:                     2_2
         isSystemLun:            FALSE
         lunAutoCreate:          FALSE
         lunSize:                22.8880615234375G
         overProvisioning:       100.0
         physicalDrives:         FLASH_2_2
         status:                 normal



So from this you can see that we have 1,861.7G of usable space on each drive, and you can see that the LUNs are given names that refer to the server. In this case tpfh1 is the name of the storage cell, and it is included in the cellDisk name to easily identify the disk.

The next step is to take a look at the cell disks that were created out of these luns.

The item to note in this output is that the first 2 disks contain the OS. You will see that the usable space left after the creation of the OS partitions is less than on the other disks. The overhead for the cell software on each disk is also taken out (though it is a small amount).

Here is what we have next as celldisks.



CellCLI> list celldisk detail
         name:                   CD_00_tpfh1
         comment:
         creationTime:           2011-09-23T00:19:30-04:00
         deviceName:             /dev/sda
         devicePartition:        /dev/sda3
         diskType:               HardDisk
         errorCount:             0
         freeSpace:              0
         id:                     a15671cd-2bab-4bfe
         interleaving:           none
         lun:                    0_0
         raidLevel:              0
         size:                   1832.59375G
         status:                 normal

         name:                   CD_01_tpfh1
         comment:
         creationTime:           2011-09-23T00:19:34-04:00
         deviceName:             /dev/sdb
         devicePartition:        /dev/sdb3
         diskType:               HardDisk
         errorCount:             0
         freeSpace:              0
         id:                     de0ee154-6925-4281
         interleaving:           none
         lun:                    0_1
         raidLevel:              0
         size:                   1832.59375G
         status:                 normal

         name:                   CD_02_tpfh1
         comment:
         creationTime:           2011-09-23T00:19:34-04:00
         deviceName:             /dev/sdc
         devicePartition:        /dev/sdc
         diskType:               HardDisk
         errorCount:             0
         freeSpace:              0
         id:                     711765f1-90cc-4b53
         interleaving:           none
         lun:                    0_2
         raidLevel:              0
         size:                   1861.703125G
         status:                 normal


Now you can see the first 2 disks have 1832.6g available, and the remaining 10 disks have 1861.7g available (I didn't include the last 9 disks in the output).

So, to review where we are: there are 12 physical disks, which are carved into LUNs and then become cell disks. These cell disks give us (2 x 1832.6) + (10 x 1861.7) = 22,282G of raw disk available per cell.

Now these cell disks get carved up into grid disks. The grid disks are what is presented to ASM. Let's see how my storage cell is carved up. While looking at the output, notice that the cell disks are named CD_00_{cellname} through CD_11_{cellname}. Here is a snippet.



CellCLI> list griddisk detail
         name:                   DATA_DMPF_CD_00_tpfh1
         availableTo:
         cellDisk:               CD_00_tpfh1
         comment:
         creationTime:           2011-09-23T00:21:59-04:00
         diskType:               HardDisk
         errorCount:             0
         id:                     2f72fb5a-adf5
         offset:                 32M
         size:                   733G
         status:                 active

         name:                   DATA_DMPF_CD_01_tpfh1
         availableTo:
         cellDisk:               CD_01_tpfh1
         comment:
         creationTime:           2011-09-23T00:21:59-04:00
         diskType:               HardDisk
         errorCount:             0
         id:                     0631c4a2-2b39
         offset:                 32M
         size:                   733G
         status:                 active
.......
.......
.......

        name:                   DATA_DMPF_CD_11_tpfh1
         availableTo:
         cellDisk:               CD_11_tpfh1
         comment:
         creationTime:           2011-09-23T00:22:00-04:00
         diskType:               HardDisk
         errorCount:             0
         id:                     ccd79051-0e24
         offset:                 32M
         size:                   733G
         status:                 active

         name:                   DBFS_DG_CD_02_tpfh1
         availableTo:
         cellDisk:               CD_02_tpfh1
         comment:
         creationTime:           2011-09-23T00:20:37-04:00
         diskType:               HardDisk
         errorCount:             0
         id:                     d292062b-0e26
         offset:                 1832.59375G
         size:                   29.109375G
         status:                 active

         name:                   DBFS_DG_CD_03_tpfh1
         availableTo:
         cellDisk:               CD_03_tpfh1
         comment:
         creationTime:           2011-09-23T00:20:38-04:00
         diskType:               HardDisk
         errorCount:             0
         id:                     b8c478a9-5ae1
         offset:                 1832.59375G
         size:                   29.109375G
         status:                 active

         name:                   DBFS_DG_CD_04_tpfh1
         availableTo:
         cellDisk:               CD_04_tpfh1
         comment:
         creationTime:           2011-09-23T00:20:39-04:00
         diskType:               HardDisk
         errorCount:             0
         id:                     606e3d69-c25b
         offset:                 1832.59375G
         size:                   29.109375G
         status:                 active
.....
.....
.....
         name:                   DBFS_DG_CD_11_tpfh1
         availableTo:
         cellDisk:               CD_11_tpfh1
         comment:
         creationTime:           2011-09-23T00:20:45-04:00
         diskType:               HardDisk
         errorCount:             0
         id:                     58af96a8-3fc8
         offset:                 1832.59375G
         size:                   29.109375G
         status:                 active

         name:                   RECO_DMPF_CD_00_tpfh1
         availableTo:
         cellDisk:               CD_00_tpfh1
         comment:
         creationTime:           2011-09-23T00:22:09-04:00
         diskType:               HardDisk
         errorCount:             0
         id:                     77f73bbf-09a9
         offset:                 733.046875G
         size:                   1099.546875G
         status:                 active

.....
.....
.....

         name:                   RECO_DMPF_CD_11_tpfh1
         availableTo:
         cellDisk:               CD_11_tpfh1
         comment:
         creationTime:           2011-09-23T00:22:09-04:00
         diskType:               HardDisk
         errorCount:             0
         id:                     fad57e10-414f
         offset:                 733.046875G
         size:                   1099.546875G
         status:                 active



Now by looking at this you can see that there are 3 sets of grid disks.

DATA - this is carved out of every disk and contains 733G of storage. It starts at offset 32M (the beginning of the disk).

RECO - this is also carved out of every disk and contains 1099.5G of storage. It starts at offset 733G.

So now we are getting the picture: each cell disk is carved into grid disks in order, starting with DATA, followed by RECO.

DBFS - this is carved out of the last 10 disks (starting with disk 2) at offset 1832.59G, and it contains 29.1G. I can only conclude this matches the size of the OS partition on the first 2 disks.

So here is what we have for sizing on each storage cell.

DATA - 8,796G
RECO - 13,194G
DBFS -   290G

Total  22,280G

The thing to keep in mind with this number is that the OS partitions have caused us a bit of trouble: there are only 10 of these DBFS grid disks per cell, and they are only 29G each. If we pull those out, we have ~22TB of usable disk on each storage cell.
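
Once the grid disks have been added to ASM disk groups, you can sanity-check these per-cell numbers from the database side, since on Exadata each storage cell normally maps to its own ASM failure group. A rough sketch:

-- raw grid disk space presented by each storage cell (one failgroup per cell)
select failgroup,
       count(*)                      as grid_disks,
       round(sum(total_mb)/1024, 1)  as raw_gb
from   v$asm_disk
where  group_number > 0              -- only disks already assigned to a disk group
group by failgroup
order by failgroup;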

Now to figure out how much space is in each disk group (assuming these grid disks will all go directly into 3 disk groups).

The first thing to remember is the redundancy level. Are the disk groups going to be normal redundancy (mirrored) or high redundancy (triple mirrored)? With normal redundancy, the disk groups are configured so that a disk is mirrored by a disk on another cell. With high redundancy, the disk is mirrored on 2 other disks on 2 other cells. To maintain this level of redundancy after a failure, you must set aside 1 storage cell's worth of storage for normal redundancy, and 2 storage cells' worth of storage for high redundancy, to ensure that you are completely protected.

So what does this mean for sizing? The larger your array, the more usable disk you get. With a half rack, you must set aside 1 out of 7 storage cells (or 2 out of 7 for high redundancy). For a full rack, you need to set aside 1 out of 14 storage cells (or 2 out of 14) for redundancy.

Now let's run the numbers.


HALF RACK

Data - Normal: (8,796G / 2) * 6 usable cells = 26,388G of usable space
       High:   (8,796G / 3) * 5 usable cells = 14,660G of usable space

Reco - Normal: (13,194G / 2) * 6 usable cells = 39,582G of usable space
       High:   (13,194G / 3) * 5 usable cells = 21,990G of usable space

Dbfs - Normal: (290G / 2) * 6 usable cells = 870G of usable space
       High:   (290G / 3) * 5 usable cells = 483G of usable space

TOTAL usable (minus DBFS)
    Normal redundancy - ~66.0TB
    High redundancy   - ~36.7TB


FULL RACK

Data - Normal: (8,796G / 2) * 13 usable cells = 57,174G of usable space
       High:   (8,796G / 3) * 12 usable cells = 35,184G of usable space

Reco - Normal: (13,194G / 2) * 13 usable cells = 85,761G of usable space
       High:   (13,194G / 3) * 12 usable cells = 52,776G of usable space

Dbfs - Normal: (290G / 2) * 13 usable cells = 1,885G of usable space
       High:   (290G / 3) * 12 usable cells = 1,160G of usable space

TOTAL usable (minus DBFS)
    Normal redundancy - ~142.9TB
    High redundancy   - ~87.96TB


So the takeaways I get from this are:

There is a real cost to the redundancy levels, and that cost is proportionally higher on smaller rack systems.
A certain portion of each cell is a small grid disk (DBFS) that lives on only 10 of the physical disks and is hard to utilize well.

Tuesday, January 24, 2012

Calculating disk space usage on an Exadata

I have been working on figuring out where all our space is, and how much space is actually available on our Exadata. First, to clarify what all the calculations are based on:

1/2 rack, SATA drives, normal redundancy.

This means we have
  • 7 storage cells
  • Each storage cell contains 12 disks
  • Each disk is 2TB (which is about 1.862TB usable)
  • The first 2 disks in each storage cell have ~30G already partitioned for the OS (which is mirrored).
Next I looked to see how the disks were allocated within each storage cell (they are all consistent)

list griddisk attributes name, celldisk, size
         DATA_DMPF_CD_00_srcell1        CD_00_srcell1  733G
         DATA_DMPF_CD_01_srcell1        CD_01_srcell1  733G
         DATA_DMPF_CD_02_srcell1        CD_02_srcell1  733G
         DATA_DMPF_CD_03_srcell1        CD_03_srcell1  733G
         DATA_DMPF_CD_04_srcell1        CD_04_srcell1  733G
         DATA_DMPF_CD_05_srcell1        CD_05_srcell1  733G
         DATA_DMPF_CD_06_srcell1        CD_06_srcell1  733G
         DATA_DMPF_CD_07_srcell1        CD_07_srcell1  733G
         DATA_DMPF_CD_08_srcell1        CD_08_srcell1  733G
         DATA_DMPF_CD_09_srcell1        CD_09_srcell1  733G
         DATA_DMPF_CD_10_srcell1        CD_10_srcell1  733G
         DATA_DMPF_CD_11_srcell1        CD_11_srcell1  733G
         DBFS_DG_CD_02_srcell1          CD_02_srcell1  29.109375G
         DBFS_DG_CD_03_srcell1          CD_03_srcell1  29.109375G
         DBFS_DG_CD_04_srcell1          CD_04_srcell1  29.109375G
         DBFS_DG_CD_05_srcell1          CD_05_srcell1  29.109375G
         DBFS_DG_CD_06_srcell1          CD_06_srcell1  29.109375G
         DBFS_DG_CD_07_srcell1          CD_07_srcell1  29.109375G
         DBFS_DG_CD_08_srcell1          CD_08_srcell1  29.109375G
         DBFS_DG_CD_09_srcell1          CD_09_srcell1  29.109375G
         DBFS_DG_CD_10_srcell1          CD_10_srcell1  29.109375G
         DBFS_DG_CD_11_srcell1          CD_11_srcell1  29.109375G
         RECO_DMPF_CD_00_srcell1        CD_00_srcell1  1099.546875G
         RECO_DMPF_CD_01_srcell1        CD_01_srcell1  1099.546875G
         RECO_DMPF_CD_02_srcell1        CD_02_srcell1  1099.546875G
         RECO_DMPF_CD_03_srcell1        CD_03_srcell1  1099.546875G
         RECO_DMPF_CD_04_srcell1        CD_04_srcell1  1099.546875G
         RECO_DMPF_CD_05_srcell1        CD_05_srcell1  1099.546875G
         RECO_DMPF_CD_06_srcell1        CD_06_srcell1  1099.546875G
         RECO_DMPF_CD_07_srcell1        CD_07_srcell1  1099.546875G
         RECO_DMPF_CD_08_srcell1        CD_08_srcell1  1099.546875G
         RECO_DMPF_CD_09_srcell1        CD_09_srcell1  1099.546875G
         RECO_DMPF_CD_10_srcell1        CD_10_srcell1  1099.546875G
         RECO_DMPF_CD_11_srcell1        CD_11_srcell1  1099.546875G


This gives me a lot of information about how things were configured as grid disks.

I can tell from this that there are 3 sets of griddisks (for my diskgroups).

Data - this is composed of 12 grid disks of 733G each
reco - this is composed of 12 grid disks of 1100G each
dbfs - this is composed of 10 grid disks of 29G each

Notice that, as I mentioned previously, the first 2 disks are used for the OS (mirrored); this is why there are only 10 grid disks of 29G available for the DBFS disk group.

I then ran the numbers for each one (7 cells * #disks * grid disk size):

data -  61.572 tb
reco -  92.4 tb
dbfs  -   2.03 tb

Remember this is raw disk available, and I am running normal redundancy (mirrored); if you are running triple mirrored (high redundancy), keep this in mind.

Now this gives me a starting point, so I took a look at what ASM is showing for disk usage to try to see what is going on.


There are 3 values (all in TB) that I am trying to figure out.

Disk Group      SIZE       USED      USABLE FREE
data           61.572     32.692       10.042
reco           92.4        3.003       38.082
dbfs            2.03       2.018        -.135
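
Those three values map to columns in V$ASM_DISKGROUP; a query along these lines (a sketch, just converting MB to TB) returns them:

select name                                      as disk_group,
       round(total_mb/1024/1024, 3)              as size_tb,
       round((total_mb - free_mb)/1024/1024, 3)  as used_tb,
       round(usable_file_mb/1024/1024, 3)        as usable_free_tb
from   v$asm_diskgroup
order by name;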

Now these numbers don't seem to add up. Only the SIZE seems to match what I was expecting.

These are the things I started wondering about:
  • How can I be using 33 TB out of 62 TB raw when I am mirrored (unless it is the total raw used)?
  • How can my usable free be 10 TB if I am using half of the raw disk?
  • How can my usable free be negative???
Well, in looking at the numbers further, and looking at the data, I was able to answer the first question. The 32 TB is raw, so to state it again in actual (mirrored) usage...

Disk group   mirrored_used
data               16.346
reco                 1.502
dbfs                  1.009

OK, this makes a little more sense. Looking at this, the following must be true also...

Disk group      raw left
data                 28.88
reco                 89.397
dbfs                     .019


OK, first number solved. Now let's look at the next number. The usable free must be the amount of mirrored storage available (rather than raw), so if I go back to the usable free and convert it back to raw (x2 for mirrored), I get:

Disk group       Usable free     Raw usable
data                   10.042          20.082
reco                    38.082         76.164
dbfs                      -.135             -.270

OK, I'm getting close, but why the discrepancy, and why the negative number? Let's look at the difference:

Disk group    Raw left    Raw usable    Missing raw storage
data           28.88        20.082         8.8
reco           89.397       76.164        13.233
dbfs             .019        -.270          .29

Now let's take a closer look at the numbers, and at what it means to be negative.

TIP The usable free space specifies the amount of space that can be safely used for data. A value above zero means that redundancy can be properly restored after a disk failure.

So we need to reserve some space to absorb a disk loss. Hmm, in this case it means being able to lose an entire storage cell and still be able to mirror on a different cell. So let's take that calculation (LUN size * number of disks per cell) and see what happens:

Disk group      calculation                   Storage cell usage
data               (.733 x 12)                     8.8
reco                (1.1 x 12)                    13.23
dbfs               (.029 x 10)                     .29
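
That reserved space is exactly what drives USABLE FREE. For a normal redundancy disk group, USABLE_FILE_MB is reported as (FREE_MB - REQUIRED_MIRROR_FREE_MB) / 2, and here the required mirror free space works out to one storage cell's worth of grid disks. Plugging in the data disk group: (28.88 - 8.8) / 2 = 10.04 TB, which is the usable free we saw. A quick sketch to see the pieces side by side (divide by 3 instead of 2 for high redundancy):

select name,
       round(free_mb/1024/1024, 3)                               as free_tb,
       round(required_mirror_free_mb/1024/1024, 3)               as req_mirror_free_tb,
       round((free_mb - required_mirror_free_mb)/2/1024/1024, 3) as usable_free_tb
from   v$asm_diskgroup
where  type = 'NORMAL';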

Well, there is the missing space, and now I have answers to all my questions.

So, to summarize:

1) How much space is there to use on a 1/2 rack with 2TB SATA drives mirrored (normal redundancy)?
    ((29G * 10 disks)   * 6 cells +
     (1833G * 12 disks) * 6 cells) / 2

    = 66.858 TB mirrored

2) What do the values SIZE and USED mean when I am looking at ASM?
SIZE is the raw space available across all cells, and USED is the amount of raw space allocated.

3) What does USABLE FREE show me?
This is the amount of space you can safely allocate to your data. Unlike the 2 values above, it is not measured in raw space; it is measured in usable (mirrored) space.


If anyone sees anything wrong with my calculations, let me know. They seem to add up and explain all the numbers.





Here is some good information, plus the display from the storage cell to confirm my sizes on what's available from the disks. My numbers match up.

http://blog.enkitec.com/wp-content/uploads/2011/02/Enkitec-Exadata-Storage-Layout11.pdf


CellCLI> list celldisk attributes name, devicePartition, size where diskType = 'HardDisk'
         CD_00_srcell1  /dev/sda3       1832.59375G
         CD_01_srcell1  /dev/sdb3       1832.59375G
         CD_02_srcell1  /dev/sdc        1861.703125G
         CD_03_srcell1  /dev/sdd        1861.703125G
         CD_04_srcell1  /dev/sde        1861.703125G
         CD_05_srcell1  /dev/sdf        1861.703125G
         CD_06_srcell1  /dev/sdg        1861.703125G
         CD_07_srcell1  /dev/sdh        1861.703125G
         CD_08_srcell1  /dev/sdi        1861.703125G
         CD_09_srcell1  /dev/sdj        1861.703125G
         CD_10_srcell1  /dev/sdk        1861.703125G
         CD_11_srcell1  /dev/sdl        1861.703125G

Thursday, January 12, 2012

AWR compare script

I don't know if anyone else has seen this, but with 12c you can't compare before and after AWR reports. I found this quite annoying, so I took the Oracle scripts and tweaked them. Here is the script I use:

http://dl.dropbox.com/u/23998484/awr_bsg.sql

Here are the really interesting defines that you can use for any AWR period comparison:

   define  num_days     = 0;
   define  dbid         = 1595564914;
   define  instance_numbers_or_ALL   = 'ALL';
   define  begin_snap   = 47097;
   define  end_snap     = 47111;
   define  num_days2    = 0;
   define  dbid2        = 1342862996;
   define  instance_numbers_or_ALL2  = 'ALL';
   define  begin_snap2  = 45907;
   define  end_snap2    = 45921;
   define  report_type  = 'html';
   define  report_name  = /tmp/awr_gg.html
   define  top_n_files     = 50;
   define  top_n_segments  = 50;
   define  top_n_services  = 50;
   define  top_n_sql       = 100;
   @@?/rdbms/admin/awrgdrpi


Note all the juicy goodies. I am choosing 'ALL' for the instances in my RAC cluster. Also notice that I am overriding the number of top SQL statements, and other settings, in this file. The default number of top SQL is never enough.

This can be very useful for getting a more meaningful AWR report in a RAC environment.
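
If you are not sure which DBIDs and snapshot IDs to plug into those defines, the standard AWR views will tell you; something along these lines works (the 7-day window is just an example):

select dbid,
       instance_number,
       snap_id,
       begin_interval_time,
       end_interval_time
from   dba_hist_snapshot
where  begin_interval_time > sysdate - 7
order by dbid, instance_number, snap_id;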

Friday, December 23, 2011

Monitoring Goldengate through sql

I have been working on implementing GoldenGate. GoldenGate works with Oracle, but I wanted to be able to monitor its status from within an Oracle session, and then use a database link to remotely query that status. Once I have all this in place, I want to create a web page through APEX that does a union of all my GG environments onto a single screen, in real time. Nifty, eh?

This is how I went about doing it..

1) The basis for it all is the "info all" command executed within GGSCI. The output looks like this:

GGSCI (orclbox) 10> info all

Program     Status      Group       Lag           Time Since Chkpt

MANAGER     RUNNING
EXTRACT     STOPPED     S_EXTR      00:00:00      00:10:38



In this example you can see that the manager is running, but the extract is stopped. I took the output of this command and created my own script:

ggsci_status.sh


#!/bin/bash
export ORACLE_SID=${ORA_SID}
export ORACLE_HOME=/u01/app/oracle/product/11.2.0/dbhome_1
export ADR_HOME=/u01/app/oracle/diag/rdbms/${DB_UNIQUE_NAME}/${ORACLE_SID}
export PATH=${ORACLE_HOME}/bin:${PATH_ORIG}
export LD_LIBRARY_PATH=$ORACLE_HOME/lib
export PATH=$PATH:/dbfs/gg/ggs11

cd /dbfs/gg/ggs11

# run "info all" inside GGSCI and capture the output
ggsci <<EOF >/tmp/ggsci.log 2>&1
info all
EOF

# return only the process status lines
cat /tmp/ggsci.log | grep EXTRACT
cat /tmp/ggsci.log | grep MANAGER
cat /tmp/ggsci.log | grep REPLICAT
 


The output of this script is the 2 lines from above that show the statuses of those processes.

MANAGER     RUNNING
EXTRACT     STOPPED     S_EXTR      00:00:00      00:10:38




So now for step 2: we have the statuses we are looking for as the output of a script. What to do? External tables with a preprocessor (I believe this is an 11.2 feature).

First create a directory object and put your script in that directory (this is within Oracle). I called mine SP_COPY.
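
A minimal sketch of that setup, assuming the script sits in /dbfs/gg/ggs11/scripts and the external table will be owned by bgrenn (both are placeholders; adjust for your environment):

-- directory object pointing at wherever ggsci_status.sh actually lives (hypothetical path)
create or replace directory SP_COPY as '/dbfs/gg/ggs11/scripts';

-- the table owner needs read/write for the log and bad files, and execute for the preprocessor
grant read, write, execute on directory SP_COPY to bgrenn;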

Here is how I defined my table creation to read the output of the above script.



create table bgrenn.ggsci_status
(gg_type varchar(20),
 status  varchar(20),
 lag_time       varchar(20),
  checkpoint    varchar(20)
)
 organization external
(type oracle_loader
  default directory SP_COPY
  access parameters
(  
   records delimited by newline
   preprocessor SP_COPY:'ggsci_status.sh'
   badfile SP_COPY: 'ggsci_status.bad'
   logfile SP_COPY: 'ggsci_status.log'
    fields terminated by whitespace
     missing field values are null
   (
   gg_type,
   status,
   lag_time,
   checkpoint
  )
)
   location ('ggsci_status.sh')
)
reject limit unlimited;

 



Now select against this table and you will see the output of your script appear as rows in the table, with the fields mapped to columns (2 rows for this example).


Finally, step 3: create a database link to this database and do a select * from ggsci_status@mylink.
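
A minimal sketch of that last step from the monitoring database (the link name, credentials, and TNS alias are all placeholders):

-- database link pointing at the database that owns the external table
create database link gg_mon_link
  connect to bgrenn identified by "some_password"
  using 'GGDB_TNS_ALIAS';

-- remote query of the GoldenGate status external table
select gg_type, status, lag_time, checkpoint
  from bgrenn.ggsci_status@gg_mon_link;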

There you go: how to dynamically show the status of GoldenGate processes through a database link from a remote database.


NOTE : If the database is a RAC cluster with GG running on only one node, you need to specify the SID to ensure you are looking at the correct node running GG.

Enjoy...

Sunday, December 18, 2011

Hadoop (again)

I know I've blogged in the past that I am working on implementing Hadoop. Here are 3 articles that should explain why.

 First
http://www.nytimes.com/2011/12/18/sunday-review/the-internet-gets-physical.html?_r=1&pagewanted=all

This is a great article from the NY Times, explaining that sensors are turning up everywhere. Of course more sensors mean more data! Lots more data. So how do we collect all this data and process it? http://www.calxeda.com/ Calxeda! These are 5-watt processors that can be scaled up to thousands of nodes (yes, I said THOUSANDS). And I know what you are saying: so what do we do with this data? So what?

Here is a great article on how to tie it all together.

http://blogs.oracle.com/datawarehousing/entry/understanding_a_big_data_implementation

So there you are. I think this is the vision of the future, and if you are not looking at these technologies, and how they tie together, you are missing the next big leap happening in the IT field. Our jobs as IT specialists will become even more important as we become an integral part of our companies' business processes, whatever those businesses may be.

Saturday, December 17, 2011

FTS vs Index scan on Exadata

I loaded up some data on my Exadata and created a primary key on the table. The table has about 1.8 billion rows and takes up about 186G of space. I decided to do a select count on the table to see how many rows were really there.

select count(1) from sp.wxyz_detl;

The query took about 5 minutes and 36 seconds (336 seconds).

I was astonished! 186G took almost 6 minutes? Well, on an Exadata that seemed really slow. I took a closer look at the plan.

-----------------------------------------------------------------------------------------------------
| Id  | Operation              | Name               | Rows  | Cost (%CPU)| Time     | Pstart| Pstop |
-----------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT       |                    |       |   653K(100)|          |       |       |
|   1 |  SORT AGGREGATE        |                    |     1 |            |          |       |       |
|   2 |   PARTITION RANGE ALL  |                    |  1850M|   653K  (6)| 02:10:42 |     1 |    29 |
|   3 |    INDEX FAST FULL SCAN| PIDX_WXYZ_WRKR_ID  |  1850M|   653K  (6)| 02:10:42 |     1 |    29 |
-----------------------------------------------------------------------------------------------------




Well, that looked good. An INDEX FAST FULL SCAN should offload to the cells, right? It says "full scan"; what could be simpler?

I looked deeper at the V$SQLSTATS table, but it didn't show what I expected.

DISK_READS BUFFER_GETS EXECUTIONS CPU_TIME            ELAPSED_TIME   CELL_OFFLOAD_ELIG_BYTES PHYSICAL_READ_BY  IO_CELL_UNC_bytes OFFLOAD_RETURNED
---------- ----------- ---------- ---------------- ---------------- ------------------------ ----------------   ----------------   ----------------
 3,165,962   3,172,055          1      112,050,965      366,230,300                0            25,935,560,704                0                0
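
For reference, a query along these lines pulls those columns out of V$SQLSTATS (a sketch; substitute the SQL_ID you are interested in):

select sql_id,
       disk_reads,
       buffer_gets,
       executions,
       cpu_time,
       elapsed_time,
       io_cell_offload_eligible_bytes,
       physical_read_bytes,
       io_cell_uncompressed_bytes,
       io_cell_offload_returned_bytes
from   v$sqlstats
where  sql_id = '&sql_id';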


The IO_CELL_OFFLOAD_ELIGIBLE_BYTES is 0, and the IO_CELL_UNCOMPRESSED_BYTES is 0.

Huh ?

Now I figured I would force a full table scan and see what happens


select /*+ full(t) */ count(1) from spr.wxyz_detl t;


-------------------------------------------------------------------------------------------------
| Id  | Operation                   | Name      | Rows  | Cost (%CPU)| Time     | Pstart| Pstop |
-------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |           |       |  1958K(100)|          |       |       |
|   1 |  SORT AGGREGATE             |           |     1 |            |          |       |       |
|   2 |   PARTITION RANGE ALL       |           |  1850M|  1958K  (3)| 06:31:40 |     1 |    29 |
|   3 |    TABLE ACCESS STORAGE FULL| WXYZ_DETL |  1850M|  1958K  (3)| 06:31:40 |     1 |    29 |
-------------------------------------------------------------------------------------------------



Looks like the cost went up and the expected elapsed time went up; this looks like a worse plan to the optimizer. But here are the stats from v$sqlstats:

DISK_READS BUFFER_GETS EXECUTIONS CPU_TIME         ELAPSED_TIME     CELL_OFFLOAD_ELIG_BYTES PHYSICAL_READ_BY  IO_CELL_UNC_bytes  OFFLOAD_RETURNED
---------- ----------- ---------- ---------------- ---------------- ------------------------ ---------------- ---------------- ----------------
 8,684,661   8,685,010          1       36,904,390       52,623,383           71,144,235,008   71,144,742,912   71,324,041,216   25,825,191,648




Wow, look at that. Using the FULL hint caused the cell offload eligibility to change, and the elapsed time is now 52 seconds. Big change, and exactly what I wanted.

I did some searching and came up with this blog describing a similar issue, where the key was a reverse lookup.

http://martincarstenbach.wordpress.com/2011/08/22/why-is-my-exadata-smart-scan-not-offloading/


This is a very simple schema (one table, one PK). The PK was necessary because we are using GoldenGate to insert the data, and we don't want duplicates.

Thursday, December 15, 2011

Are my stats old for my query?

If you work someplace like I do, you hear this all the time:
"My query is running long, can you gather stats?"

Of course the person saying this brings it up because somewhere, somehow, it worked when stats were regathered long ago... so it's going to work again, right? It's not like any of the users are graduate students majoring in statistical analysis at a prestigious college, like my wonderful wife (hi Jo).

Well, as we all know, that isn't always the answer, and I was looking for a faster way to tell: I have query X and I know the SQL_ID, but are any of the statistics stale?

Here is a great query I came up with..

set linesize 170
set pagesize 150
select table_name object_name,object_type,last_analyzed,
to_char(row_count,'999,999,999') row_count,
to_char(inserts,'999,999,999') inserts,to_char(updates,'999,999,999') updates,to_char(deletes,'999,999,999') deletes,
case when (nvl(row_count,0)/10 < nvl(inserts,0)+nvl(updates,0)+nvl(deletes,0))
then 'Y'
else 'N'
end "stale?",
case
   when nvl(row_count,0) = 0 then null  -- avoid divide-by-zero; note "when null" never matches in a simple case
   else to_char(((nvl(inserts,0)+nvl(updates,0)+nvl(deletes,0))/row_count *100),'99.99') || '%'
end "%change"
from
(
select  distinct object_name table_name, 'TABLE                                  ' object_type,
        (select last_analyzed from dba_tables where table_name=object_name and owner=object_owner) last_analyzed,
        (select num_rows from dba_tables where table_name=object_name and owner=object_owner) row_count,
        (select inserts from dba_tab_modifications where table_name=object_name and table_owner=object_owner) inserts,
         (select updates from dba_tab_modifications where table_name=object_name and table_owner=object_owner) updates,
        (select deletes from dba_tab_modifications where table_name=object_name and table_owner=object_owner) deletes,
        object_name sort_column
 from sys.dba_hist_sql_plan 
where sql_id='31v8553cuza05'
and object_name in (select table_name from dba_tables where table_name=object_name)
union
select  distinct object_name ,  'INDEX on ' || (select table_name from dba_indexes where index_name=object_name and owner=object_owner)  index_name,
        (select last_analyzed from dba_indexes where index_name=object_name and owner=object_owner) last_analyzed,
        (select num_rows from dba_indexes where index_name=object_name and owner=object_owner) row_count,
        null inserts,
        null updates,
        null deletes,
        (select table_name from dba_indexes where index_name=object_name and owner=object_owner)  sort_column
 from sys.dba_hist_sql_plan 
where sql_id='31v8553cuza05'
and object_name in (select index_name from dba_indexes where index_name=object_name)
)
order by sort_column,object_type desc;



And here is the output. You can see I organized it by object in alphabetical order. Indexes are sorted with the tables they are on, so they get grouped together.


Here is what the output looks like.

OBJECT_NAME                     OBJECT_TYPE                             LAST_ANALYZED       ROW_COUNT    INSERTS      UPDATES      DELETES      s %change
------------------------------- --------------------------------------- ------------------- ------------ ------------ ------------ ------------ - -------
CAR_TAB                         TABLE                                   2011-11-16 03:00:12           77                                        N    .00%
PK_CAR_TAB                      INDEX on CAR_TAB                        2011-11-16 03:00:13           77                                        N    .00%
CAR_DEFD_WORK_TAB               TABLE                                   2011-11-16 03:00:13           61                                        N    .00%
PK_CAR_DEFD_WORK_TAB            INDEX on CAR_DEFD_WORK_TAB              2011-11-16 03:00:13           61                                        N    .00%
CO_CAR_TAB                      TABLE                                   2011-12-01 11:19:11       27,998           94          739            0 N   2.98%
CO_EXEC_TAB                     TABLE                                   2011-11-16 03:00:57       32,679          187            2           21 N    .64%
D$VANR_TAB                      TABLE                                   2011-12-15 15:40:53            0                                        N
DIM_CLIENT                      TABLE                                   2011-12-13 22:11:51       27,203            3           22            0 N    .09%
ELEC_CMMN_ADDR                  TABLE                                   2011-11-16 03:01:06      375,095        1,949            0          171 N    .57%
PK_ELEC_CMMN_ADDR               INDEX on ELEC_CMMN_ADDR                 2011-11-16 03:01:06      375,095                                        N    .00%
ENTY_CMMN_ADDR                  TABLE                                   2011-11-16 03:01:11    2,234,749        9,221        7,755          722 N    .79%
ENTY_CMMN_ADDR_VT               TABLE                                   2011-11-16 03:01:14    2,249,575        9,395          648          722 N    .48%
IDX_ECAV_ENCA_ID                INDEX on ENTY_CMMN_ADDR_VT              2011-11-21 16:20:10    2,252,376                                        N    .00%
MAP_AREA                        TABLE                                   2011-11-16 03:01:24        4,835           11          342            0 N   7.30%
PK_MAP_AREA                     INDEX on MAP_AREA                       2011-11-16 03:01:24        4,835                                        N    .00%
INDEP_CNTRC                     TABLE                                   2011-11-16 03:01:31       17,879          241            0           32 N   1.53%
INSR_ROST_ELIG_CLSF             TABLE                                   2011-11-16 03:01:31            0                                        N
PK_INSR_ROST_ELIG_CLSF          INDEX on INSR_ROST_ELIG_CLSF            2011-11-16 03:01:31            0                                        N
J$VANR                          TABLE                                   2011-12-15 22:03:51          212                                        N    .00%
SLVR_LKUP_VAL                   TABLE                                   2011-11-16 03:01:41        2,536           24           19            2 N   1.77%
PK_SLVR_LKUP_VAL                INDEX on SLVR_LKUP_VAL                  2011-11-16 03:01:41        2,536                                        N    .00%
OPT_VAL                         TABLE                                   2011-11-16 03:01:45          628           43           16            0 N   9.39%
PK_OPT_VAL                      INDEX on OPT_VAL                        2011-11-16 03:01:45          628                                        N    .00%
REG_NM                          TABLE                                   2011-11-16 03:02:00      257,597        2,436        2,501           44 N   1.93%
REG_NM_VT                       TABLE                                   2011-11-16 03:02:02      260,111        2,513          630           44 N   1.23%
REG_ROLE                        TABLE                                   2011-11-16 03:02:05       87,808          526          239           18 N    .89%
PK_REG_ROLE                     INDEX on REG_ROLE                       2011-11-16 03:02:05       87,808                                        N    .00%
WOMN                            TABLE                                   2011-11-16 03:02:40      642,408        1,854           52           66 N    .31%
PK_WOMN                         INDEX on WOMN                           2011-11-16 03:02:41      642,408                                        N    .00%
WOMN_ETHN_CLSS                  TABLE                                   2011-11-16 03:02:42       90,622          900            4           32 N   1.03%
WOMN_NM                         TABLE                                   2011-11-16 03:02:43      678,775        1,901           84           66 N    .30%
PROD_CPNT                       TABLE                                   2011-12-02 22:05:00        2,104                                        N    .00%
PK_PROD_CPNT                    INDEX on PROD_CPNT                      2011-12-02 22:05:00        2,104                                        N    .00%
PSTL_CMMN_ADDR                  TABLE                                   2011-11-16 03:03:03      489,200        1,868          283           62 N    .45%
PK_PSTL_CMMN_ADDR               INDEX on PSTL_CMMN_ADDR                 2011-11-16 03:03:04      489,200                                        N    .00%
REF_CTRY                        TABLE                                   2011-10-25 22:02:19          260            0           21            0 N   8.08%
REF_CONV_FREQ_TYPE              TABLE                                   2011-10-26 22:01:53            8                                        N    .00%
REF_ST                          TABLE                                   2011-12-13 22:14:10           72                                        N    .00%
SNP_CDC_SUBS                    TABLE                                   2011-12-15 22:01:23            2                                        N    .00%
PK_SNP_CDC_SBS                  INDEX on SNP_CDC_SUBS                   2011-12-15 22:01:23            2                                        N    .00%
TCMN_ADDR                       TABLE                                   2011-11-16 03:03:30      628,266        4,826          219          284 N    .85%
PK_TCMN_ADDR                    INDEX on TCMN_ADDR                      2011-11-16 03:03:30      628,266                                        N    .00%
TRUC_IDFN                       TABLE                                   2011-11-16 03:03:38      471,413        3,392        4,050           84 N   1.60%
TRUC_IDFN_VT                    TABLE                                   2011-11-16 03:03:40      548,277        4,471        1,458           96 N   1.10%
VANR                            TABLE                                   2011-12-15 10:43:22      309,110           47          101            0 N    .05%
PK_VANR                         INDEX on VANR                           2011-12-15 10:43:23      309,110                                        N    .00%
VANR_EMPT_STUS                  TABLE                                   2011-11-16 03:04:43      689,725        3,098           23           54 N    .46%
VANR_EMPT_STUS_RESN_DT          TABLE                                   2011-11-16 03:04:44      477,062        2,414           21           40 N    .52%
VANR_PROD_CFG                   TABLE                                   2011-11-16 03:05:38      292,458        1,564          279           24 N    .64%
VANR_VT                         TABLE                                   2011-11-16 03:05:52      335,413        2,476          303           42 N    .84%
WV_VANR_ID_IDX                  INDEX on VANR_VT                        2011-12-13 13:05:54      337,452                                        N    .00%
VANR_WORK_CATG                  TABLE                                   2011-11-16 03:05:52      159,673        1,356            2           19 N    .86%




I'm sure I'm going to find this very useful the next time I get that question. It also nicely pinpoints any objects that you should immediately consider analyzing.

Of course you need to understand your application to really read this completely, especially with updates. Are they updating an indexed column? Did a massive update just change the number of distinct values, or the range of values, for an indexed column? Or were the updates just to an "update date" column that isn't used (except for audits)?

Lastly, it doesn't describe anything about how the statistics were gathered (histograms or not, which columns, etc.).
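
If you want to dig into that part, the column-level stats views will show it; a quick sketch (the owner and table name are just placeholders):

select column_name,
       num_distinct,
       histogram,
       last_analyzed
from   dba_tab_col_statistics
where  owner      = 'BGRENN'
and    table_name = 'CAR_TAB'
order by column_name;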

Do not use this as the absolute truth, but at least it will help point you in the right direction.