This blog post is a product of my last post on Exadata disk usage.
I have multiple Exadatas (both full racks and 1/2 racks), and now that ACS has left I want to know exactly how each one is configured. How do I go about finding out how they are set up?
Well let's start with the basics.
Each storage cell:
- Has 12 physical spinning disks.
- Holds the OS on the first 2 disks, which uses ~29g of space.
- Uses either 600g (SAS) or 2tb (SATA) disks. The newer model now has 3tb (SATA).
- Contains 384g of flash cache, made up of 4 96g F20 PCI cards.
Now let's log on to a storage cell and see how it is configured.
First go to cellcli, and look at the physical disks.
CellCLI> list physicaldisk
20:0 R0DQF8 normal
20:1 R1N71G normal
20:2 R1NQVB normal
20:3 R1N8DD normal
20:4 R1NNBC normal
20:5 R1N8BW normal
20:6 R1KFW3 normal
20:7 R1EX24 normal
20:8 R2LWZC normal
20:9 R0K8MF normal
20:10 R0HR55 normal
20:11 R0JQ9A normal
FLASH_1_0 3047M04YEC normal
FLASH_1_1 3047M05079 normal
FLASH_1_2 3048M052FD normal
FLASH_1_3 3047M04YF7 normal
FLASH_2_0 3047M04WXN normal
FLASH_2_1 3047M04YAJ normal
FLASH_2_2 3047M04WTR normal
FLASH_2_3 3047M04Y9L normal
FLASH_4_0 3047M0500W normal
FLASH_4_1 3047M0503G normal
FLASH_4_2 3047M0500X normal
FLASH_4_3 3047M0501G normal
FLASH_5_0 3047M050XG normal
FLASH_5_1 3047M050XP normal
FLASH_5_2 3047M05098 normal
FLASH_5_3 3047M050UH normal
From this you can see that there are 12 physical disks (20:0 - 20:11), and 16 flash disks.
Now let's look at the detail for these 2 types of disks. I will use the command
list physicaldisk {diskname} detail
CellCLI> list physicaldisk 20:0 detail
name: 20:0
deviceId: 19
diskType: HardDisk
enclosureDeviceId: 20
errMediaCount: 0
errOtherCount: 0
foreignState: false
luns: 0_0
makeModel: "SEAGATE ST32000SSSUN2.0T"
physicalFirmware: 0514
physicalInsertTime: 2011-09-20T10:19:00-04:00
physicalInterface: sata
physicalSerial: R0DQF8
physicalSize: 1862.6559999994934G
slotNumber: 0
status: normal
This is what you would see for a 600g SAS disk.
CellCLI> list physicaldisk 20:9 detail
name: 20:9
deviceId: 17
diskType: HardDisk
enclosureDeviceId: 20
errMediaCount: 23
errOtherCount: 0
foreignState: false
luns: 0_9
makeModel: "TEST ST360057SSUN600G"
physicalFirmware: 0805
physicalInsertTime: 0000-03-24T22:10:19+00:00
physicalInterface: sas
physicalSerial: E08XLW
physicalSize: 558.9109999993816G
slotNumber: 9
status: normal
This is the configuration of the flash drives.
CellCLI> list physicaldisk FLASH_5_0 detail
name: FLASH_5_0
diskType: FlashDisk
errCmdTimeoutCount: 0
errHardReadCount: 0
errHardWriteCount: 0
errMediaCount: 0
errOtherCount: 0
errSeekCount: 0
luns: 5_0
makeModel: "MARVELL SD88SA02"
physicalFirmware: D20Y
physicalInsertTime: 2011-09-20T10:20:17-04:00
physicalInterface: sas
physicalSerial: 3047M050XG
physicalSize: 22.8880615234375G
sectorRemapCount: 0
slotNumber: "PCI Slot: 5; FDOM: 0"
status: normal
So this gives me a good idea of what disks the storage is made up of. In my case you can see that the 12 disks are SATA, and each contains 1862g of usable space.
In the case of the SAS disks, you can see that each contains 558g of usable space.
You can also see that the flash is made up of 16 separate disks, connected through 4 PCI cards. Each card contains 4 22g flash disks.
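If you have several cells to compare, it can help to turn the "detail" output into something you can work with. Below is a minimal Python sketch that does this. The function names are my own, and it assumes the output always looks like the listings above: one "key: value" attribute per line, with every object starting at a "name:" line.

```python
def parse_cellcli_detail(text):
    """Split 'list ... detail' output into one dict per object.

    Assumes each object starts with a 'name:' line and each attribute
    sits on its own 'key: value' line, as in the listings in this post.
    """
    records = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("CellCLI"):
            continue  # skip blank lines and the echoed prompt line
        key, sep, value = line.partition(":")
        if not sep:
            continue  # not an attribute line
        key, value = key.strip(), value.strip().strip('"')
        if key == "name":
            records.append({})  # a new object begins here
        if records:
            records[-1][key] = value
    return records


def size_in_gb(value):
    """Convert a suffixed size such as '1862.65G' or '32M' to gigabytes."""
    units = {"M": 1 / 1024, "G": 1.0, "T": 1024.0}
    if value[-1] not in units:
        raise ValueError("size has no M/G/T suffix: " + value)
    return float(value[:-1]) * units[value[-1]]


# A few attributes copied from the listings above.
sample = """\
name: 20:0
diskType: HardDisk
makeModel: "SEAGATE ST32000SSSUN2.0T"
physicalSize: 1862.6559999994934G
name: FLASH_5_0
diskType: FlashDisk
physicalSize: 22.8880615234375G
slotNumber: "PCI Slot: 5; FDOM: 0"
"""

disks = parse_cellcli_detail(sample)
for d in disks:
    print(d["name"], d["diskType"], round(size_in_gb(d["physicalSize"]), 1))
```

Only the first colon on a line is treated as the separator, so values such as timestamps and the flash slotNumber keep their own colons.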
For now (and the rest of this post), I will not talk about the flash. It is possible to use these cell disks, and provision them as usable storage, but I won't be discussing that.
Now that we have the physical disk layout, we can move to the next level. First, to review:
We have 12 physical disks. Each disk contains 1862.65g of space (22,352g/cell).
Now the next step is to look at the luns that were created out of the physical disks. The lun is the amount of usable space left after the disks have been turned into block devices and presented to the server. You can see that the overhead is a small amount. Below is the output (truncated after the first 2 disks; I've then included a flash disk to show that detail).
CellCLI> list lun detail
name: 0_0
cellDisk: CD_00_tpfh1
deviceName: /dev/sda
diskType: HardDisk
id: 0_0
isSystemLun: TRUE
lunAutoCreate: FALSE
lunSize: 1861.712890625G
lunUID: 0_0
physicalDrives: 20:0
raidLevel: 0
lunWriteCacheMode: WriteBack, ReadAheadNone, Direct, No Write Cache if Bad BBU
status: normal
name: 0_1
cellDisk: CD_01_tpfh1
deviceName: /dev/sdb
diskType: HardDisk
id: 0_1
isSystemLun: TRUE
lunAutoCreate: FALSE
lunSize: 1861.712890625G
lunUID: 0_1
physicalDrives: 20:1
raidLevel: 0
lunWriteCacheMode: WriteBack, ReadAheadNone, Direct, No Write Cache if Bad BBU
status: normal
name: 2_2
cellDisk: FD_06_tpfh1
deviceName: /dev/sdab
diskType: FlashDisk
id: 2_2
isSystemLun: FALSE
lunAutoCreate: FALSE
lunSize: 22.8880615234375G
overProvisioning: 100.0
physicalDrives: FLASH_2_2
status: normal
So from this you can see that we have 1861.7g of usable space on each drive, and you can see that the cell disks on these luns are given names that refer to the server. In this case tpfh1 is the name of the storage cell, and it is included in the cellDisk name to easily identify the disk.
The next step is to take a look at the cell disks that were created out of these luns.
The item to note in this output is that the first 2 disks contain the OS. You will see that the usable space left after the creation of the OS partitions is less than on the other disks. The overhead for the cell software on each disk is also taken out (though it is a small amount).
Here is what we have next as celldisks.
CellCLI> list celldisk detail
name: CD_00_tpfh1
comment:
creationTime: 2011-09-23T00:19:30-04:00
deviceName: /dev/sda
devicePartition: /dev/sda3
diskType: HardDisk
errorCount: 0
freeSpace: 0
id: a15671cd-2bab-4bfe
interleaving: none
lun: 0_0
raidLevel: 0
size: 1832.59375G
status: normal
name: CD_01_tpfh1
comment:
creationTime: 2011-09-23T00:19:34-04:00
deviceName: /dev/sdb
devicePartition: /dev/sdb3
diskType: HardDisk
errorCount: 0
freeSpace: 0
id: de0ee154-6925-4281
interleaving: none
lun: 0_1
raidLevel: 0
size: 1832.59375G
status: normal
name: CD_02_tpfh1
comment:
creationTime: 2011-09-23T00:19:34-04:00
deviceName: /dev/sdc
devicePartition: /dev/sdc
diskType: HardDisk
errorCount: 0
freeSpace: 0
id: 711765f1-90cc-4b53
interleaving: none
lun: 0_2
raidLevel: 0
size: 1861.703125G
status: normal
Now you can see the first 2 disks have 1832.6g available, and the remaining 10 disks have 1861.7g available (I didn't include the last 9 disks in the output).
So to review where we are: there are 12 physical disks, which are carved into luns, which then become cell disks. Each cell has (2 x 1832.6) + (10 x 1861.7) = 22,282g of raw disk available.
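To see where the space goes at each layer, the numbers from the listings can be added up in a few lines. This is only a sketch using the 2tb SATA values from my cell; the variable names are my own.

```python
# Sizes in GB, copied from the CellCLI listings above (2tb SATA drives).
PHYSICAL_GB = 1862.6559999994934   # physicalSize of each disk
LUN_GB = 1861.712890625            # lunSize of each lun
CELLDISK_SYSTEM_GB = 1832.59375    # cell disks 0 and 1, which also hold the OS
CELLDISK_OTHER_GB = 1861.703125    # cell disks 2 through 11

physical_total = 12 * PHYSICAL_GB
lun_total = 12 * LUN_GB
celldisk_total = 2 * CELLDISK_SYSTEM_GB + 10 * CELLDISK_OTHER_GB

# Space taken on each of the first 2 disks before the cell disk is created.
os_reserved_per_disk = LUN_GB - CELLDISK_SYSTEM_GB

print(f"physical disks: {physical_total:,.1f}g")
print(f"luns:           {lun_total:,.1f}g")
print(f"cell disks:     {celldisk_total:,.1f}g")
print(f"OS space on each system disk: {os_reserved_per_disk:.2f}g")
```

The last figure works out to about 29.1g, which lines up with the ~29g the OS uses on the first 2 disks.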
Now these cell disks get carved up into grid disks. The grid disks are what is presented to ASM. Let's see how my storage cell is carved up. While looking at the output, notice that the celldisks are named CD_00_{cellname} through CD_11_{cellname}. Here is a snippet:
CellCLI> list griddisk detail
name: DATA_DMPF_CD_00_tpfh1
availableTo:
cellDisk: CD_00_tpfh1
comment:
creationTime: 2011-09-23T00:21:59-04:00
diskType: HardDisk
errorCount: 0
id: 2f72fb5a-adf5
offset: 32M
size: 733G
status: active
name: DATA_DMPF_CD_01_tpfh1
availableTo:
cellDisk: CD_01_tpfh1
comment:
creationTime: 2011-09-23T00:21:59-04:00
diskType: HardDisk
errorCount: 0
id: 0631c4a2-2b39
offset: 32M
size: 733G
status: active
.......
.......
.......
name: DATA_DMPF_CD_11_tpfh1
availableTo:
cellDisk: CD_11_tpfh1
comment:
creationTime: 2011-09-23T00:22:00-04:00
diskType: HardDisk
errorCount: 0
id: ccd79051-0e24
offset: 32M
size: 733G
status: active
name: DBFS_DG_CD_02_tpfh1
availableTo:
cellDisk: CD_02_tpfh1
comment:
creationTime: 2011-09-23T00:20:37-04:00
diskType: HardDisk
errorCount: 0
id: d292062b-0e26
offset: 1832.59375G
size: 29.109375G
status: active
name: DBFS_DG_CD_03_tpfh1
availableTo:
cellDisk: CD_03_tpfh1
comment:
creationTime: 2011-09-23T00:20:38-04:00
diskType: HardDisk
errorCount: 0
id: b8c478a9-5ae1
offset: 1832.59375G
size: 29.109375G
status: active
name: DBFS_DG_CD_04_tpfh1
availableTo:
cellDisk: CD_04_tpfh1
comment:
creationTime: 2011-09-23T00:20:39-04:00
diskType: HardDisk
errorCount: 0
id: 606e3d69-c25b
offset: 1832.59375G
size: 29.109375G
status: active
.....
.....
.....
name: DBFS_DG_CD_11_tpfh1
availableTo:
cellDisk: CD_11_tpfh1
comment:
creationTime: 2011-09-23T00:20:45-04:00
diskType: HardDisk
errorCount: 0
id: 58af96a8-3fc8
offset: 1832.59375G
size: 29.109375G
status: active
name: RECO_DMPF_CD_00_tpfh1
availableTo:
cellDisk: CD_00_tpfh1
comment:
creationTime: 2011-09-23T00:22:09-04:00
diskType: HardDisk
errorCount: 0
id: 77f73bbf-09a9
offset: 733.046875G
size: 1099.546875G
status: active
.....
.....
.....
name: RECO_DMPF_CD_11_tpfh1
availableTo:
cellDisk: CD_11_tpfh1
comment:
creationTime: 2011-09-23T00:22:09-04:00
diskType: HardDisk
errorCount: 0
id: fad57e10-414f
offset: 733.046875G
size: 1099.546875G
status: active
Now by looking at this you can see that there are 3 sets of grid disks.
DATA - This is carved out of every disk, and contains 733g of storage. This starts at offset 32m (the beginning of the disk).
RECO - This is also carved out of every disk, and contains 1099.5g of storage. This starts at offset 733g.
So now we are getting the picture. Each celldisk is carved into 2 grid disks, starting with DATA, followed by RECO.
DBFS - This is carved out of the last 10 disks (starting with disk 2) at offset 1832.59g, and it contains 29.1g. I can only conclude this matches the size of the OS partition on the first 2 disks.
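The offsets and sizes in the griddisk listing can be checked against each other to confirm that the grid disks sit end to end on the cell disk. Here is a small Python sketch of that check, using the values for cell disks 2 through 11; the helper name is my own.

```python
MB = 1 / 1024  # one megabyte expressed in GB

# (offset, size) in GB, copied from the griddisk listing above.
data = (32 * MB, 733.0)
reco = (733.046875, 1099.546875)
dbfs = (1832.59375, 29.109375)


def end(grid_disk):
    """Return the position on the cell disk where a grid disk stops."""
    offset, size = grid_disk
    return offset + size


print(end(data))                     # 733.03125
print((reco[0] - end(data)) * 1024)  # 16.0 (MB between DATA and RECO)
print(end(reco))                     # 1832.59375, where DBFS starts
print(end(dbfs))                     # 1861.703125, the cell disk size
```

RECO ends exactly where DBFS begins, and DBFS ends exactly at the 1861.703125g cell disk size. On the first 2 disks the cell disk is only 1832.59375g, so there is no room left after RECO, which fits with DBFS existing only on the other 10. The listing does not say what the 16m between DATA and RECO is for.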
So here is what we have for sizing on each storage cell.
DATA - 8,796g
RECO - 13,194g
DBFS - 290g
Total - 22,280g
The thing to keep in mind with this number is that the OS partitions have caused us a bit of trouble. There are only 10 of the DBFS grid disks per cell, and they are only 29g each. If we pull this out, we have ~22tb of usable disk on each storage cell.
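For reference, here is the same per-cell sizing computed from the unrounded grid disk sizes in the listing. It is only a sketch, and the dictionary layout is my own.

```python
# name: (grid disks per cell, size of each in GB), from the griddisk listing.
grid_disks = {
    "DATA": (12, 733.0),
    "RECO": (12, 1099.546875),
    "DBFS": (10, 29.109375),
}

per_cell = {name: count * size for name, (count, size) in grid_disks.items()}
total = sum(per_cell.values())

for name, gb in per_cell.items():
    print(f"{name} - {gb:,.1f}g")
print(f"Total - {total:,.1f}g")
```

The unrounded figures are 8,796.0g, 13,194.6g and 291.1g, for a total of 22,281.7g, so the rounded numbers I use in the rest of this post are off by less than 2g per cell. The 0.56g difference from the 22,282.2g of cell disk space is the 32m offset in front of DATA plus the 16m between DATA and RECO, on each of the 12 disks.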
Now to figure out how much space is in each disk group (assuming these grid disks will all go directly into 3 disk groups).
The first thing to remember is the redundancy level. Are they going to be normal redundancy (mirrored) or high redundancy (triple mirrored)? With normal redundancy, each disk is mirrored to a disk on another cell. With high redundancy, each disk is mirrored to 2 other disks on 2 other cells. To maintain this level of redundancy, you must set aside 1 storage cell's worth of storage for normal redundancy, and 2 storage cells' worth of storage for high redundancy, to ensure that you are completely protected.
So what does this mean for sizing? The larger your array, the larger the share of disk you get to use. With a half rack, you must set aside 1 out of 7 storage cells (normal) or 2 out of 7 storage cells (high) for redundancy. For a full rack you need to set aside 1 out of 14 storage cells, or 2 out of 14 storage cells, for redundancy.
Now let's run the numbers.
HALF RACK -
Data - Normal (8,796g / 2) * 6 usable cells = 26,388g of usable space
High (8,796g / 3) * 5 usable cells = 14,660g of usable space
Reco - Normal (13,194g / 2) * 6 usable cells = 39,582g of usable space
High (13,194g / 3) * 5 usable cells = 21,990g of usable space
Dbfs - Normal (290g / 2) * 6 usable cells = 870g of usable space
High (290g / 3) * 5 usable cells = 483g of usable space
TOTAL usable (minus DBFS)
Normal Redundancy - 65.9tb
High Redundancy - 36.6tb
FULL RACK -
Data - Normal (8,796g / 2) * 13 usable cells = 57,174g of usable space
High (8,796g / 3) * 12 usable cells = 35,184g of usable space
Reco - Normal (13,194g / 2) * 13 usable cells = 85,761g of usable space
High (13,194g / 3) * 12 usable cells = 52,776g of usable space
Dbfs - Normal (290g / 2) * 13 usable cells = 1,885g of usable space
High (290g / 3) * 12 usable cells = 1,160g of usable space
TOTAL usable (minus DBFS)
Normal Redundancy - 142.9tb
High Redundancy - 87.96tb
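The figures above all follow the same pattern, so they can be reproduced with one small function. This is a sketch of the rule of thumb used in this post (hold back 1 cell for normal and 2 cells for high redundancy, then divide by the number of copies), not an official Oracle sizing formula. The function and variable names are my own.

```python
def usable_gb(per_cell_gb, cells, copies):
    """Usable space under the rule of thumb used in this post.

    copies is 2 for normal (mirrored) or 3 for high (triple mirrored).
    One cell is held back for normal and two for high redundancy, and
    the remaining raw space is divided by the number of copies.
    """
    reserved_cells = copies - 1
    return per_cell_gb * (cells - reserved_cells) / copies


PER_CELL_GB = {"DATA": 8796, "RECO": 13194, "DBFS": 290}
RACKS = {"half": 7, "full": 14}  # storage cells per rack size

for rack, cells in RACKS.items():
    for label, copies in (("normal", 2), ("high", 3)):
        row = {name: round(usable_gb(gb, cells, copies))
               for name, gb in PER_CELL_GB.items()}
        print(rack, label, row, "DATA+RECO:", row["DATA"] + row["RECO"])
```

To size a different configuration, change the per-cell numbers or the cell counts in the two dictionaries.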
So here is what I take away from this:
Redundancy carries a high cost in usable space, and that cost is proportionally higher for smaller rack systems.
A certain portion of each cell is a small grid disk that exists on only 10 of the 12 physical disks and is hard to utilize well.