Wednesday, February 23, 2011

Performance Collection Script for Solaris 10

I needed to collect a bunch of system statistics on Solaris 10 servers during a performance test. I wanted to gather these statistics more frequently than I have sar configured for, and I also wanted to include some scripts I have found useful for collecting other performance data. So, I wrote a quick script to use during the test. One script that I plugged into mine is written by Brendan Gregg. It’s called “nicstat” - it collects performance statistics for network interfaces. It can be downloaded from http://www.brendangregg.com/Perf/network.html#nicstat
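
Like vmstat and iostat, nicstat takes an optional interval and count (exact options vary between nicstat versions). For example, run from the directory holding the script:

# report per-interface network statistics every 5 seconds, 12 times
./nicstat 5 12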

To use:
1) Download the script from http://sunblog.mbrannigan.com/collect.tgz
2) Extract the collect.tgz archive with gtar.
3) Put a copy of Brendan Gregg’s nicstat script into the collect subdirectory.
4) Run the collect.sh script.
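
On a Solaris 10 box that works out to something like this (a sketch; Solaris 10 normally ships wget under /usr/sfw/bin, and the nicstat source path here is illustrative):

# fetch and unpack the collection script
/usr/sfw/bin/wget http://sunblog.mbrannigan.com/collect.tgz
gtar -zxvf collect.tgz

# drop a copy of nicstat into the collect subdirectory (source path is illustrative)
cp /var/tmp/nicstat collect/

# start collecting
cd collect
./collect.sh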

Results:
When the script first starts, it creates a subdirectory of the output directory named after the system it is running on. It then loops, collecting various statistics and storing the results in that directory. Currently, the script collects the following statistics:
• netstat -an
• nicstat
• A list of TCP sessions in the ESTABLISHED state
• A count of TCP sessions in the ESTABLISHED state, grouped by source and destination IP
• A list of TCP sessions in the TIME_WAIT state
• A count of TCP sessions in the TIME_WAIT state, grouped by source and destination IP (see the sketch after this list)
• netstat -i
• TCP statistics from netstat -s
• I/O statistics from iostat -xnz
• Memory / CPU statistics from vmstat
• System event activity from vmstat -s
• Paging activity from vmstat -p
• Swap activity from vmstat -S
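
The per-IP-pair counts above can be produced with a one-liner along these lines (a sketch only; it assumes the Solaris netstat TCP output format, where the local and remote ip.port addresses are the first two columns):

# count ESTABLISHED sessions, grouped by source and destination IP
netstat -an | grep ESTABLISHED | nawk '{
        sub(/\.[0-9]*$/, "", $1)        # strip the port from the local address
        sub(/\.[0-9]*$/, "", $2)        # strip the port from the remote address
        print $1, $2
}' | sort | uniq -c | sort -rn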

How to stop it:
The script sleeps between collection passes and appends each new sample to the end of the files it creates. To stop collection, simply press Ctrl-C. The snooze time between collections can be changed by modifying the SNOOZE parameter; it is currently set to 300 seconds (5 minutes).
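
The heart of collect.sh is just a loop along these lines (a simplified sketch rather than the actual script; the output file names and the exact set of commands are illustrative):

#!/bin/sh
SNOOZE=300                        # seconds between collection passes
OUTDIR=output/`hostname`
mkdir -p $OUTDIR

while true
do
        STAMP=`date '+%Y-%m-%d %H:%M:%S'`
        # tag each sample with a timestamp and append it to its own file
        echo "=== $STAMP" >> $OUTDIR/netstat-an.txt
        netstat -an       >> $OUTDIR/netstat-an.txt
        echo "=== $STAMP" >> $OUTDIR/iostat-xnz.txt
        iostat -xnz       >> $OUTDIR/iostat-xnz.txt
        echo "=== $STAMP" >> $OUTDIR/vmstat.txt
        vmstat            >> $OUTDIR/vmstat.txt
        sleep $SNOOZE             # Ctrl-C ends collection
done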

Changing a disk label (EFI / SMI)

I had inserted a drive into a V440 and after running devfsadm, I ran format on the disk. I was presented with the following partition table:

partition> p
Current partition table (original):
Total disk sectors available: 143358320 + 16384 (reserved sectors)

Part      Tag    Flag     First Sector        Size        Last Sector
  0        usr    wm                34      68.36GB         143358320
  1 unassigned    wm                 0          0                   0
  2 unassigned    wm                 0          0                   0
  3 unassigned    wm                 0          0                   0
  4 unassigned    wm                 0          0                   0
  5 unassigned    wm                 0          0                   0
  6 unassigned    wm                 0          0                   0
  8   reserved    wm         143358321       8.00MB          143374704

This disk had been used in a zfs pool and, as a result, carries an EFI label. The more familiar label is the SMI label (8 slices, numbered 0-7, with slice 2 representing the whole disk). The advantage of the EFI label is that it supports LUNs over 1TB in size and prevents overlapping partitions by providing a whole-disk device named cxtydz rather than relying on cxtydzs2.
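
The naming difference shows up in everyday commands (the device names below are purely illustrative):

# ZFS takes the whole-disk cxtydz device and writes an EFI label to it
zpool create datapool c1t2d0

# UFS is built on a slice of an SMI-labelled disk
newfs /dev/rdsk/c1t2d0s6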

However, I want to use this disk for UFS partitions, which means I need to put the SMI label back on the device. Here’s how it’s done:

# format -e
...
partition> label
[0] SMI Label
[1] EFI Label
Specify Label type[1]: 0
Warning: This disk has an EFI label. Changing to SMI label will erase all
current partitions.
Continue? y
Auto configuration via format.dat[no]?
Auto configuration via generic SCSI-2[no]?
partition> q
...
format> q
#

Running format again will show that the SMI label was placed back onto the disk:

partition> p
Current partition table (original):
Total disk cylinders available: 14087 + 2 (reserved cylinders)

Part      Tag    Flag     Cylinders        Size            Blocks
  0       root    wm       0 -    25      129.19MB    (26/0/0)      264576
  1       swap    wu      26 -    51      129.19MB    (26/0/0)      264576
  2     backup    wu       0 - 14086       68.35GB    (14087/0/0) 143349312
  3 unassigned    wm       0               0          (0/0/0)            0
  4 unassigned    wm       0               0          (0/0/0)            0
  5 unassigned    wm       0               0          (0/0/0)            0
  6        usr    wm      52 - 14086       68.10GB    (14035/0/0) 142820160
  7 unassigned    wm       0               0          (0/0/0)            0

Monday, February 7, 2011

DMX Configuration Options

I've been looking at DMX configuration options this week. Essentially the question is how best to lay out a DMX-3 or DMX-4 array with a tiered configuration. For me there are two options and it's pretty clear which I prefer. First a little background. The following diagram shows the way DMX drives are deployed within a fully configured array. The array is divided into "quadrants", splitting each drive bay into two.

Back-end directors (BED) provide connectivity to the drives as represented by the colour scheme. There are up to 4 BED pairs available for a full configuration.

Option 1 - Dedicated Quadrants

One option is to dedicate quadrants to a workload or tier. For example, tier 1 storage is given quadrant 1. Theoretically this should provide that tier with uncontended back-end bandwidth, as all of the tier 1 storage will reside in the same location. What it doesn't do is let tier 1 storage utilise unused bandwidth on the other BEDs, which, as the array scales, may prove to be a problem.

Option 2 - Mixed Workload

In this option, disks are spread across the whole array, perhaps placing tier 1 disks first followed by tier 2 devices. In this way, the I/O load is spread across the whole configuration. As new disks are added, they are distributed throughout the array, keeping performance even. The risk with this configuration lies in whether tier 2 storage will affect tier 1 as the array becomes busy. This can be mitigated with cache partitioning and LUN prioritisation options.

I prefer the second option when designing arrays, unless there is a very good reason to segment workload. Distributing disks gives a better overall performance balance, reducing the risk of fragmenting (and consequently wasting) resources. I would use the same methodology for other enterprise arrays too.

Bear in mind that if you choose to use Enterprise Flash Drives (EFDs), they can only be placed in the first storage bays either side of the controller bay, with a limit of 32 per quadrant. Mind you, if you can afford more than 32 drives then you've probably paid for your onsite EMC support already!!

There's also the question of physical space. As the drives are loaded into the array, if only a small number of them are tier 1, then cabinet space is potentially wasted. Either that or the configuration has to be built in an unbalanced fashion, perhaps placing more lower-tier storage to the right of the array, using the expansion BEDs.

The second diagram shows how an unbalanced array could look - tier 2 devices on the left and right are loaded in different quantities, leading to an unbalanced layout.

How Many IOPS? Enterprise class arrays

This post looks at the question "How many IOPS can my RAID group sustain?" in relation to enterprise class arrays.

Obviously the first step is to determine what the data profile is; if that isn't known, assume the I/O will be 100% random. If all the I/O is random, then each request requires a seek (moving the head to the right cylinder on the disk) plus a rotation to the start of the area to be read (rotational latency), which for 15K drives averages 2ms. Taking the latest Seagate Cheetah 15K fibre channel drives, each drive has an identical average read seek time of 3.4ms. That gives a total service time of 5.4ms, or 185 IOPS (1000/5.4). The same calculation for a Seagate SATA drive gives a worst-case throughput of 104 IOPS, approximately half the capacity of the fibre channel drive.

For a RAID-5 3+1 group of fibre channel drives, data will be spread across all 4 drives, so the RAID group has a potential worst-case I/O throughput of 740 IOPS (4 x 185).
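
The arithmetic is simply service time converted into operations per second; the snippet below just restates the numbers above:

# worst-case random IOPS = 1000 ms / (average seek + rotational latency)
nawk 'BEGIN {
        seek = 3.4; latency = 2.0                      # 15K FC drive, milliseconds
        iops = 1000 / (seek + latency)
        printf "single drive: %.0f IOPS\n", iops       # ~185
        printf "RAID-5 3+1:   %.0f IOPS\n", 4 * iops   # ~740
}'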

Clearly this is a "rule of thumb": in practice, not every I/O will be completely random and incur the full seek/latency penalty. Enterprise arrays also have cache (as do the drives themselves) and plenty of clever algorithms to mask the limitations of the mechanical components.

There are also plenty of other points of contention within the host-to-array stack, which makes the whole subject more complicated. However, when comparing different drive speeds, calculating a worst-case scenario gives a good indication of how differing drives will perform relative to one another.

Incidentally, as I just mentioned, the latest Seagate 15K drives (146GB, 300GB and 460GB) all have the same performance characteristics, so tiering based on drive size isn't that useful. The only exception is when high I/O throughput is required: with smaller drives, the same data has to be spread across more spindles, increasing the available bandwidth. That's why I think tiering should be done on drive speed, not size.
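
To put rough numbers on that last point, here is how many spindles (and therefore how much worst-case random I/O capability) a fixed usable capacity buys at each drive size - the 10 TB figure is purely illustrative and RAID overhead is ignored:

# more, smaller drives = more spindles = more aggregate worst-case IOPS
nawk 'BEGIN {
        iops = 185                               # worst-case figure for a 15K FC drive
        n = split("146 300 460", size)           # drive sizes in GB
        for (i = 1; i <= n; i++) {
                drives = int(10000 / size[i]) + 1    # spindles needed for ~10 TB
                printf "%3dGB drives: %3d spindles, ~%d IOPS\n", size[i], drives, drives * iops
        }
}'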