[OmniOS-discuss] slow drive response times
Kevin Swab
Kevin.Swab at ColoState.EDU
Thu Jan 1 00:30:15 UTC 2015
Hello Richard and group, thanks for your reply!
I'll look into sg_logs for one of these devices once I have a chance to
track that program down...
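In case it's useful to anyone else following along, here's roughly what I
expect to run once I've got sg3_utils in place (just a sketch - the device
path is a placeholder, not necessarily the suspect drive):

# sg_logs -a /dev/rdsk/c6t5000039468CB54F0d0s0

The -a flag dumps all of the drive's log pages (error counters, background
scan results, etc.), per Richard's suggestion below.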
Thanks for the tip on the 500 ms latency; I wasn't aware that could happen
under normal conditions. However, I don't believe what I'm seeing
constitutes normal behavior.
First, some anecdotal evidence: If I pull and replace the suspect
drive, my downstream systems stop complaining, and the high service time
numbers go away.
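(The swap itself is just the usual replace-and-resilver dance. As a sketch,
with placeholder device names rather than the actual drives:

# zpool replace data1 c6tOLD_DRIVE_WWNd0 c6tNEW_DRIVE_WWNd0
# zpool status -v data1

The second command is only there to watch the resilver progress.)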
I threw out 500 ms as a rough guess at the point where I start seeing
problems. However, I see service times far in excess of that, sometimes
over 30,000 ms! Below is 20 minutes of sar output from a drive I pulled
a few days ago, captured while downstream VMware servers were
complaining. (Since the full sar output is so verbose, I've grepped out
just the lines for the suspect drive.)
# sar -d -f /var/adm/sa/sa28 -s 14:50 -e 15:10 | egrep '(device|sd91,a)'
14:50:00   device   %busy   avque   r+w/s  blks/s  avwait    avserv
           sd91,a      99     5.3       1      42     0.0    7811.7
           sd91,a     100    11.3       1      53     0.0   11016.0
           sd91,a     100     3.8       1      75     0.0    3615.8
           sd91,a     100     4.9       1      25     0.0    8633.5
           sd91,a      93     3.9       1      55     0.0    4385.3
           sd91,a      86     3.5       2      75     0.0    2060.5
           sd91,a      91     3.1       4      80     0.0     823.8
           sd91,a      97     3.5       1      50     0.0    3984.5
           sd91,a     100     4.4       1      56     0.0    6068.6
           sd91,a     100     5.0       1      55     0.0    8836.0
           sd91,a     100     5.7       1      51     0.0    7939.6
           sd91,a      98     9.9       1      42     0.0   12526.8
           sd91,a     100     7.4       0      10     0.0   36813.6
           sd91,a      51     3.8       8      90     0.0     500.2
           sd91,a      88     3.4       1      60     0.0    2338.8
           sd91,a     100     4.5       1      28     0.0    6969.2
           sd91,a      93     3.8       1      59     0.0    5138.9
           sd91,a      79     3.1       1      59     0.0    3143.9
           sd91,a      99     4.7       1      52     0.0    5598.4
           sd91,a     100     4.8       1      62     0.0    6638.4
           sd91,a      94     5.0       1      54     0.0    3752.7
For comparison, here's the sar output from another drive in the same
pool for the same period of time:
# sar -d -f /var/adm/sa/sa28 -s 14:50 -e 15:10 | egrep '(device|sd82,a)'
14:50:00   device   %busy   avque   r+w/s  blks/s  avwait    avserv
           sd82,a       0     0.0       2      28     0.0       5.6
           sd82,a       1     0.0       3      51     0.0       5.4
           sd82,a       1     0.0       4      66     0.0       6.3
           sd82,a       1     0.0       3      48     0.0       4.3
           sd82,a       1     0.0       3      45     0.0       6.1
           sd82,a       1     0.0       6      82     0.0       2.7
           sd82,a       1     0.0       8     112     0.0       2.8
           sd82,a       0     0.0       3      27     0.0       1.8
           sd82,a       1     0.0       5      80     0.0       3.1
           sd82,a       0     0.0       3      35     0.0       3.1
           sd82,a       1     0.0       3      35     0.0       3.8
           sd82,a       1     0.0       4      49     0.0       3.2
           sd82,a       0     0.0       0       0     0.0       4.1
           sd82,a       3     0.0       9      84     0.0       4.1
           sd82,a       1     0.0       6      55     0.0       3.7
           sd82,a       0     0.0       1      23     0.0       7.0
           sd82,a       0     0.0       6      57     0.0       1.8
           sd82,a       1     0.0       5      70     0.0       2.3
           sd82,a       1     0.0       4      55     0.0       3.7
           sd82,a       1     0.0       5      72     0.0       4.1
           sd82,a       1     0.0       4      54     0.0       3.6
The other drives in this pool all show data similar to that of sd82.
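For anyone who'd rather watch this live than dig it out of sar after the
fact, the same per-device service times show up in the asvc_t column of
iostat:

# iostat -xnz 5

(-x for extended statistics, -n for logical device names, -z to suppress
idle devices, at 5-second intervals. A drive behaving like sd91 stands out
immediately with asvc_t in the thousands of milliseconds, while its healthy
neighbors sit in the single digits.)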
Your point about tuning blindly is well taken, and I'm certainly no
expert on the IO stack. What's a humble sysadmin to do?
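In the meantime, the safest thing I can do is look at the timeout Richard
mentions without changing it. A read-only peek, assuming the sd_io_time
variable from Alasdair's post is the relevant knob on this build:

# echo "sd_io_time/D" | mdb -k

That should print the sd driver's global command timeout in seconds (60 by
default) from the running kernel; since mdb is invoked without -w, nothing
gets modified.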
For further reference, this system is running OmniOS r151010. The drive in
question is a Toshiba MG03SCA300 (7200 RPM SAS), and the pool it was in
uses lz4 compression and looks like this:
# zpool status data1
  pool: data1
 state: ONLINE
  scan: resilvered 1.67T in 70h56m with 0 errors on Wed Dec 31 14:40:20 2014
config:

        NAME                       STATE     READ WRITE CKSUM
        data1                      ONLINE       0     0     0
          raidz2-0                 ONLINE       0     0     0
            c6t5000039468CB54F0d0  ONLINE       0     0     0
            c6t5000039478CB5138d0  ONLINE       0     0     0
            c6t5000039468D000DCd0  ONLINE       0     0     0
            c6t5000039468D000E8d0  ONLINE       0     0     0
            c6t5000039468D00F5Cd0  ONLINE       0     0     0
            c6t5000039478C816CCd0  ONLINE       0     0     0
            c6t5000039478C8546Cd0  ONLINE       0     0     0
          raidz2-1                 ONLINE       0     0     0
            c6t5000039478C855F0d0  ONLINE       0     0     0
            c6t5000039478C856E8d0  ONLINE       0     0     0
            c6t5000039478C856ECd0  ONLINE       0     0     0
            c6t5000039478C856F4d0  ONLINE       0     0     0
            c6t5000039478C86374d0  ONLINE       0     0     0
            c6t5000039478C8C2A8d0  ONLINE       0     0     0
            c6t5000039478C8C364d0  ONLINE       0     0     0
          raidz2-2                 ONLINE       0     0     0
            c6t5000039478C9958Cd0  ONLINE       0     0     0
            c6t5000039478C995C4d0  ONLINE       0     0     0
            c6t5000039478C9DACCd0  ONLINE       0     0     0
            c6t5000039478C9DB30d0  ONLINE       0     0     0
            c6t5000039478C9DB6Cd0  ONLINE       0     0     0
            c6t5000039478CA73B4d0  ONLINE       0     0     0
            c6t5000039478CB3A20d0  ONLINE       0     0     0
          raidz2-3                 ONLINE       0     0     0
            c6t5000039478CB3A64d0  ONLINE       0     0     0
            c6t5000039478CB3A70d0  ONLINE       0     0     0
            c6t5000039478CB3E7Cd0  ONLINE       0     0     0
            c6t5000039478CB3EB0d0  ONLINE       0     0     0
            c6t5000039478CB3FBCd0  ONLINE       0     0     0
            c6t5000039478CB4048d0  ONLINE       0     0     0
            c6t5000039478CB4054d0  ONLINE       0     0     0
          raidz2-4                 ONLINE       0     0     0
            c6t5000039478CB424Cd0  ONLINE       0     0     0
            c6t5000039478CB4250d0  ONLINE       0     0     0
            c6t5000039478CB470Cd0  ONLINE       0     0     0
            c6t5000039478CB471Cd0  ONLINE       0     0     0
            c6t5000039478CB4E50d0  ONLINE       0     0     0
            c6t5000039478CB50A8d0  ONLINE       0     0     0
            c6t5000039478CB50BCd0  ONLINE       0     0     0
        spares
          c6t50000394A8CBC93Cd0    AVAIL

errors: No known data errors
Thanks for your help,
Kevin
On 12/31/2014 3:22 PM, Richard Elling wrote:
>
>> On Dec 31, 2014, at 11:25 AM, Kevin Swab <Kevin.Swab at colostate.edu> wrote:
>>
>> Hello Everyone,
>>
>> We've been running OmniOS on a number of Supermicro 36-bay chassis, with
>> Supermicro motherboards, LSI SAS controllers (9211-8i & 9207-8i), and
>> various SAS HDDs. These systems are serving block storage via COMSTAR
>> and QLogic FC HBAs, and have been running well for several years.
>>
>> The problem we've got is that as the drives age, some of them start to
>> perform slowly (intermittently) without failing - no zpool or iostat
>> errors, and nothing logged in /var/adm/messages. The slow performance
>> can be seen as high average service times in iostat or sar.
>
> Look at the drive's error logs using sg_logs (-a for all)
>
>>
>> When these service times get above 500ms, they start to cause IO
>> timeouts on the downstream storage consumers, which is bad...
>
> 500 milliseconds is not unusual for a busy HDD with SCSI TCQ or SATA NCQ
>
>>
>> I'm wondering - is there a way to tune OmniOS' behavior so that it
>> doesn't try so hard to complete IOs to these slow disks, and instead
>> just gives up and fails them?
>
> Yes, the tuning in Alasdair's blog should work as he describes. More below...
>
>>
>> I found an old post from 2011 which states that some tunables exist,
>> but are ignored by the mpt_sas driver:
>>
>> http://everycity.co.uk/alasdair/2011/05/adjusting-drive-timeouts-with-mdb-on-solaris-or-openindiana/
>>
>> Does anyone know the current status of these tunables, or have any other
>> suggestions that might help?
>
> These tunables are on the order of seconds. The default, 60, is obviously too big
> unless you have old, slow, SCSI CD-ROMs. But setting it below the manufacturer's
> internal limit (default or tuned) can lead to an unstable system. Some vendors are
> better than others at documenting these, but in any case you'll need to see their spec.
> Expect values on the order of 6 to 15 seconds for modern HDDs and SSDs.
>
> There are a lot of tunables in this area at all levels of the architecture. OOB, the OmniOS
> settings ensure stable behaviour. Tuning any layer without understanding the others can
> lead to unstable systems, as demonstrated by your current downstream consumers.
> -- richard
>
>
>>
>> Thanks,
>> Kevin
>>
>>
>> --
>> -------------------------------------------------------------------
>> Kevin Swab UNIX Systems Administrator
>> ACNS Colorado State University
>> Phone: (970)491-6572 Email: Kevin.Swab at ColoState.EDU
>> GPG Fingerprint: 7026 3F66 A970 67BD 6F17 8EB8 8A7D 142F 2392 791C
>> _______________________________________________
>> OmniOS-discuss mailing list
>> OmniOS-discuss at lists.omniti.com
>> http://lists.omniti.com/mailman/listinfo/omnios-discuss