[OmniOS-discuss] slow drive response times
Richard Elling
richard.elling at richardelling.com
Thu Jan 1 01:13:04 UTC 2015
> On Dec 31, 2014, at 4:30 PM, Kevin Swab <Kevin.Swab at ColoState.EDU> wrote:
>
> Hello Richard and group, thanks for your reply!
>
> I'll look into sg_logs for one of these devices once I have a chance to
> track that program down...
>
> Thanks for the tip on the 500 ms latency, I wasn't aware that could
> happen in normal cases. However, I don't believe what I'm seeing
> constitutes normal behavior.
>
> First, some anecdotal evidence: If I pull and replace the suspect
> drive, my downstream systems stop complaining, and the high service time
> numbers go away.
We call these "wounded soldiers" -- it takes more resources to manage a
wounded soldier than a dead soldier, so one strategy of war is to wound your
enemy, causing them to consume resources tending the wounded. The sg_logs
output should be enlightening.
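For example (sg_logs ships with sg3_utils; the device path here is
illustrative -- substitute your suspect drive's rdsk node):

   # sg_logs -a /dev/rdsk/c6t5000039468CB54F0d0s0

The read/write error counter pages and the background scan results page are
usually the interesting ones for a wounded soldier.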
NB, consider a 4TB disk with 5 platters, hence 10 heads and recording
surfaces: if one head or surface starts to go, the data you request has
roughly a 1/10 chance of sitting under the damaged head and needing to be
recovered by the drive. So it is not uncommon to see 90+% of the I/Os to the
drive completing quickly. It is also not unusual for only a small number of
sectors or tracks to be affected.
Detecting these becomes tricky, especially as you reduce the timeout/retry
interval, since the problem is rarely visible in the average latency -- which
is all that iostat and sar record. This is an area where we can improve, and
are improving.
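In the meantime, a per-I/O latency histogram shows the tail that the averages
hide. A rough DTrace sketch using the stock io provider (run as root; ^C to
stop; buckets are in milliseconds):

   # dtrace -n '
       io:::start { start[arg0] = timestamp; }
       io:::done /start[arg0]/ {
           @[args[1]->dev_statname] =
               quantize((timestamp - start[arg0]) / 1000000);
           start[arg0] = 0;
       }'

A wounded soldier shows counts out in the thousands-of-ms buckets while its
healthy siblings stay in the single digits.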
-- richard
>
> I threw out 500 ms as a guess at the point where I start seeing
> problems. However, I see service times far in excess of that, sometimes
> over 30,000 ms! Below is 20 minutes of sar output from a drive I pulled
> a few days ago, during a time when downstream VMware servers were
> complaining. (Since the sar output is so verbose, I grepped out the
> info just for the suspect drive):
>
> # sar -d -f /var/adm/sa/sa28 -s 14:50 -e 15:10 | egrep '(device|sd91,a)'
> 14:50:00   device   %busy   avque   r+w/s   blks/s   avwait    avserv
>            sd91,a      99     5.3       1       42      0.0    7811.7
>            sd91,a     100    11.3       1       53      0.0   11016.0
>            sd91,a     100     3.8       1       75      0.0    3615.8
>            sd91,a     100     4.9       1       25      0.0    8633.5
>            sd91,a      93     3.9       1       55      0.0    4385.3
>            sd91,a      86     3.5       2       75      0.0    2060.5
>            sd91,a      91     3.1       4       80      0.0     823.8
>            sd91,a      97     3.5       1       50      0.0    3984.5
>            sd91,a     100     4.4       1       56      0.0    6068.6
>            sd91,a     100     5.0       1       55      0.0    8836.0
>            sd91,a     100     5.7       1       51      0.0    7939.6
>            sd91,a      98     9.9       1       42      0.0   12526.8
>            sd91,a     100     7.4       0       10      0.0   36813.6
>            sd91,a      51     3.8       8       90      0.0     500.2
>            sd91,a      88     3.4       1       60      0.0    2338.8
>            sd91,a     100     4.5       1       28      0.0    6969.2
>            sd91,a      93     3.8       1       59      0.0    5138.9
>            sd91,a      79     3.1       1       59      0.0    3143.9
>            sd91,a      99     4.7       1       52      0.0    5598.4
>            sd91,a     100     4.8       1       62      0.0    6638.4
>            sd91,a      94     5.0       1       54      0.0    3752.7
>
> For comparison, here's the sar output from another drive in the same
> pool for the same period of time:
>
> # sar -d -f /var/adm/sa/sa28 -s 14:50 -e 15:10 | egrep '(device|sd82,a)'
> 14:50:00   device   %busy   avque   r+w/s   blks/s   avwait    avserv
>            sd82,a       0     0.0       2       28      0.0       5.6
>            sd82,a       1     0.0       3       51      0.0       5.4
>            sd82,a       1     0.0       4       66      0.0       6.3
>            sd82,a       1     0.0       3       48      0.0       4.3
>            sd82,a       1     0.0       3       45      0.0       6.1
>            sd82,a       1     0.0       6       82      0.0       2.7
>            sd82,a       1     0.0       8      112      0.0       2.8
>            sd82,a       0     0.0       3       27      0.0       1.8
>            sd82,a       1     0.0       5       80      0.0       3.1
>            sd82,a       0     0.0       3       35      0.0       3.1
>            sd82,a       1     0.0       3       35      0.0       3.8
>            sd82,a       1     0.0       4       49      0.0       3.2
>            sd82,a       0     0.0       0        0      0.0       4.1
>            sd82,a       3     0.0       9       84      0.0       4.1
>            sd82,a       1     0.0       6       55      0.0       3.7
>            sd82,a       0     0.0       1       23      0.0       7.0
>            sd82,a       0     0.0       6       57      0.0       1.8
>            sd82,a       1     0.0       5       70      0.0       2.3
>            sd82,a       1     0.0       4       55      0.0       3.7
>            sd82,a       1     0.0       5       72      0.0       4.1
>            sd82,a       1     0.0       4       54      0.0       3.6
>
> The other drives in this pool all show data similar to that of sd82.
>
> Your point about tuning blindly is well taken, and I'm certainly no
> expert on the IO stack. What's a humble sysadmin to do?
>
> For further reference, this system is running r151010. The drive in
> question is a Toshiba MG03SCA300 (7200rpm SAS), and the pool it was in
> uses lz4 compression and looks like this:
>
> # zpool status data1
>   pool: data1
>  state: ONLINE
>   scan: resilvered 1.67T in 70h56m with 0 errors on Wed Dec 31 14:40:20 2014
> config:
>
>         NAME                       STATE     READ WRITE CKSUM
>         data1                      ONLINE       0     0     0
>           raidz2-0                 ONLINE       0     0     0
>             c6t5000039468CB54F0d0  ONLINE       0     0     0
>             c6t5000039478CB5138d0  ONLINE       0     0     0
>             c6t5000039468D000DCd0  ONLINE       0     0     0
>             c6t5000039468D000E8d0  ONLINE       0     0     0
>             c6t5000039468D00F5Cd0  ONLINE       0     0     0
>             c6t5000039478C816CCd0  ONLINE       0     0     0
>             c6t5000039478C8546Cd0  ONLINE       0     0     0
>           raidz2-1                 ONLINE       0     0     0
>             c6t5000039478C855F0d0  ONLINE       0     0     0
>             c6t5000039478C856E8d0  ONLINE       0     0     0
>             c6t5000039478C856ECd0  ONLINE       0     0     0
>             c6t5000039478C856F4d0  ONLINE       0     0     0
>             c6t5000039478C86374d0  ONLINE       0     0     0
>             c6t5000039478C8C2A8d0  ONLINE       0     0     0
>             c6t5000039478C8C364d0  ONLINE       0     0     0
>           raidz2-2                 ONLINE       0     0     0
>             c6t5000039478C9958Cd0  ONLINE       0     0     0
>             c6t5000039478C995C4d0  ONLINE       0     0     0
>             c6t5000039478C9DACCd0  ONLINE       0     0     0
>             c6t5000039478C9DB30d0  ONLINE       0     0     0
>             c6t5000039478C9DB6Cd0  ONLINE       0     0     0
>             c6t5000039478CA73B4d0  ONLINE       0     0     0
>             c6t5000039478CB3A20d0  ONLINE       0     0     0
>           raidz2-3                 ONLINE       0     0     0
>             c6t5000039478CB3A64d0  ONLINE       0     0     0
>             c6t5000039478CB3A70d0  ONLINE       0     0     0
>             c6t5000039478CB3E7Cd0  ONLINE       0     0     0
>             c6t5000039478CB3EB0d0  ONLINE       0     0     0
>             c6t5000039478CB3FBCd0  ONLINE       0     0     0
>             c6t5000039478CB4048d0  ONLINE       0     0     0
>             c6t5000039478CB4054d0  ONLINE       0     0     0
>           raidz2-4                 ONLINE       0     0     0
>             c6t5000039478CB424Cd0  ONLINE       0     0     0
>             c6t5000039478CB4250d0  ONLINE       0     0     0
>             c6t5000039478CB470Cd0  ONLINE       0     0     0
>             c6t5000039478CB471Cd0  ONLINE       0     0     0
>             c6t5000039478CB4E50d0  ONLINE       0     0     0
>             c6t5000039478CB50A8d0  ONLINE       0     0     0
>             c6t5000039478CB50BCd0  ONLINE       0     0     0
>         spares
>           c6t50000394A8CBC93Cd0    AVAIL
>
> errors: No known data errors
>
>
> Thanks for your help,
> Kevin
>
> On 12/31/2014 3:22 PM, Richard Elling wrote:
>>
>>> On Dec 31, 2014, at 11:25 AM, Kevin Swab <Kevin.Swab at colostate.edu> wrote:
>>>
>>> Hello Everyone,
>>>
>>> We've been running OmniOS on a number of Supermicro 36-bay chassis, with
>>> Supermicro motherboards, LSI SAS controllers (9211-8i & 9207-8i) and
>>> various SAS HDDs. These systems are serving block storage via Comstar
>>> and Qlogic FC HBAs, and have been running well for several years.
>>>
>>> The problem we've got is that as the drives age, some of them start to
>>> perform slowly (intermittently) without failing - no zpool or iostat
>>> errors, and nothing logged in /var/adm/messages. The slow performance
>>> can be seen as high average service times in iostat or sar.
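>>> (E.g., watching "iostat -xnz 10", the suspect drive's asvc_t runs to
>>> hundreds or thousands of milliseconds while its neighbors stay in the
>>> single digits.)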
>>
>> Look at the drive's error logs using sg_logs (-a for all)
>>
>>>
>>> When these service times get above 500ms, they start to cause IO
>>> timeouts on the downstream storage consumers, which is bad...
>>
>> 500 milliseconds is not unusual for a busy HDD with SCSI TCQ or SATA NCQ
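>> For a sense of scale: with TCQ/NCQ a busy drive can have 32 commands
>> queued, and at roughly 10 ms per random I/O the last command in the queue
>> waits about 32 x 10 ms = 320 ms before it is serviced; bursts can push
>> that past 500 ms on a perfectly healthy drive (numbers illustrative).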
>>
>>>
>>> I'm wondering - is there a way to tune OmniOS' behavior so that it
>>> doesn't try so hard to complete IOs to these slow disks, and instead
>>> just gives up and fails them?
>>
>> Yes, the tuning in Alasdair's blog should work as he describes. More below...
>>
>>>
>>> I found an old post from 2011 which states that some tunables exist,
>>> but are ignored by the mpt_sas driver:
>>>
>>> http://everycity.co.uk/alasdair/2011/05/adjusting-drive-timeouts-with-mdb-on-solaris-or-openindiana/
>>>
>>> Does anyone know the current status of these tunables, or have any other
>>> suggestions that might help?
>>
>> These tunables are on the order of seconds. The default, 60, is obviously too big
>> unless you have old, slow, SCSI CD-ROMs. But setting it below the manufacturer's
>> internal limit (default or tuned) can lead to an unstable system. Some vendors are
>> better than others at documenting these, but in any case you'll need to see their spec.
>> Expect values on the order of 6 to 15 seconds for modern HDDs and SSDs.
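>>
>> As a sketch of the kind of change Alasdair's post makes (the value here is
>> illustrative -- check your drives' documented internal recovery limits
>> first), the global sd(7D) command timeout can be set live with mdb:
>>
>>   # echo "sd_io_time/W 0t10" | mdb -kw
>>
>> or persistently in /etc/system, followed by a reboot:
>>
>>   set sd:sd_io_time = 10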
>>
>> There are a lot of tunables in this area at all levels of the architecture. Out of
>> the box, the OmniOS settings ensure stable behaviour. Tuning any layer without
>> understanding the others can lead to unstable systems, as demonstrated by your
>> current downstream consumers.
>> -- richard
>>
>>
>>>
>>> Thanks,
>>> Kevin
>>>
>>>
>>> --
>>> -------------------------------------------------------------------
>>> Kevin Swab UNIX Systems Administrator
>>> ACNS Colorado State University
>>> Phone: (970)491-6572 Email: Kevin.Swab at ColoState.EDU
>>> GPG Fingerprint: 7026 3F66 A970 67BD 6F17 8EB8 8A7D 142F 2392 791C
>>> _______________________________________________
>>> OmniOS-discuss mailing list
>>> OmniOS-discuss at lists.omniti.com
>>> http://lists.omniti.com/mailman/listinfo/omnios-discuss
>