[OmniOS-discuss] slow drive response times
Kevin Swab
Kevin.Swab at ColoState.EDU
Thu Jan 1 00:30:15 UTC 2015
Hello Richard and group, thanks for your reply!
I'll look into sg_logs for one of these devices once I have a chance to
track that program down...
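In case it's useful to anyone else following along, here's roughly what I
expect to run once I've got sg3_utils in place (just a sketch - the device
path is a placeholder, not necessarily the suspect drive):

# sg_logs -a /dev/rdsk/c6t5000039468CB54F0d0s0

The -a flag dumps all of the drive's log pages (error counters, background
scan results, etc.), per Richard's suggestion below.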
Thanks for the tip on the 500 ms latency; I wasn't aware that could happen
under normal conditions. However, I don't believe what I'm seeing
constitutes normal behavior.
First, some anecdotal evidence: If I pull and replace the suspect
drive, my downstream systems stop complaining, and the high service time
numbers go away.
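(The swap itself is just the usual replace-and-resilver dance. As a sketch,
with placeholder device names rather than the actual drives:

# zpool replace data1 c6tOLD_DRIVE_WWNd0 c6tNEW_DRIVE_WWNd0
# zpool status -v data1

The second command is only there to watch the resilver progress.)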
I threw out 500 ms as a rough guess at the point where I start seeing
problems. However, I see service times far in excess of that, sometimes
over 30,000 ms! Below is 20 minutes of sar output from a drive I pulled
a few days ago, captured while downstream VMware servers were
complaining. (Since the full sar output is so verbose, I've grepped out
just the lines for the suspect drive.)
# sar -d -f /var/adm/sa/sa28 -s 14:50 -e 15:10 | egrep '(device|sd91,a)'
14:50:00   device   %busy   avque   r+w/s  blks/s  avwait    avserv
           sd91,a      99     5.3       1      42     0.0    7811.7
           sd91,a     100    11.3       1      53     0.0   11016.0
           sd91,a     100     3.8       1      75     0.0    3615.8
           sd91,a     100     4.9       1      25     0.0    8633.5
           sd91,a      93     3.9       1      55     0.0    4385.3
           sd91,a      86     3.5       2      75     0.0    2060.5
           sd91,a      91     3.1       4      80     0.0     823.8
           sd91,a      97     3.5       1      50     0.0    3984.5
           sd91,a     100     4.4       1      56     0.0    6068.6
           sd91,a     100     5.0       1      55     0.0    8836.0
           sd91,a     100     5.7       1      51     0.0    7939.6
           sd91,a      98     9.9       1      42     0.0   12526.8
           sd91,a     100     7.4       0      10     0.0   36813.6
           sd91,a      51     3.8       8      90     0.0     500.2
           sd91,a      88     3.4       1      60     0.0    2338.8
           sd91,a     100     4.5       1      28     0.0    6969.2
           sd91,a      93     3.8       1      59     0.0    5138.9
           sd91,a      79     3.1       1      59     0.0    3143.9
           sd91,a      99     4.7       1      52     0.0    5598.4
           sd91,a     100     4.8       1      62     0.0    6638.4
           sd91,a      94     5.0       1      54     0.0    3752.7
For comparison, here's the sar output from another drive in the same
pool for the same period of time:
# sar -d -f /var/adm/sa/sa28 -s 14:50 -e 15:10 | egrep '(device|sd82,a)'
14:50:00   device   %busy   avque   r+w/s  blks/s  avwait    avserv
           sd82,a       0     0.0       2      28     0.0       5.6
           sd82,a       1     0.0       3      51     0.0       5.4
           sd82,a       1     0.0       4      66     0.0       6.3
           sd82,a       1     0.0       3      48     0.0       4.3
           sd82,a       1     0.0       3      45     0.0       6.1
           sd82,a       1     0.0       6      82     0.0       2.7
           sd82,a       1     0.0       8     112     0.0       2.8
           sd82,a       0     0.0       3      27     0.0       1.8
           sd82,a       1     0.0       5      80     0.0       3.1
           sd82,a       0     0.0       3      35     0.0       3.1
           sd82,a       1     0.0       3      35     0.0       3.8
           sd82,a       1     0.0       4      49     0.0       3.2
           sd82,a       0     0.0       0       0     0.0       4.1
           sd82,a       3     0.0       9      84     0.0       4.1
           sd82,a       1     0.0       6      55     0.0       3.7
           sd82,a       0     0.0       1      23     0.0       7.0
           sd82,a       0     0.0       6      57     0.0       1.8
           sd82,a       1     0.0       5      70     0.0       2.3
           sd82,a       1     0.0       4      55     0.0       3.7
           sd82,a       1     0.0       5      72     0.0       4.1
           sd82,a       1     0.0       4      54     0.0       3.6
The other drives in this pool all show data similar to that of sd82.
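For anyone who'd rather watch this live than dig it out of sar after the
fact, the same per-device service times show up in the asvc_t column of
iostat:

# iostat -xnz 5

(-x for extended statistics, -n for logical device names, -z to suppress
idle devices, at 5-second intervals. A drive behaving like sd91 stands out
immediately with asvc_t in the thousands of milliseconds, while its healthy
neighbors sit in the single digits.)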
Your point about tuning blindly is well taken, and I'm certainly no
expert on the IO stack. What's a humble sysadmin to do?
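In the meantime, the safest thing I can do is look at the timeout Richard
mentions without changing it. A read-only peek, assuming the sd_io_time
variable from Alasdair's post is the relevant knob on this build:

# echo "sd_io_time/D" | mdb -k

That should print the sd driver's global command timeout in seconds (60 by
default) from the running kernel; since mdb is invoked without -w, nothing
gets modified.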
For further reference, this system is running OmniOS r151010. The drive in
question is a Toshiba MG03SCA300 (7200 RPM SAS), and the pool it was in
uses lz4 compression and looks like this:
# zpool status data1
  pool: data1
 state: ONLINE
  scan: resilvered 1.67T in 70h56m with 0 errors on Wed Dec 31 14:40:20 2014
config:

        NAME                       STATE     READ WRITE CKSUM
        data1                      ONLINE       0     0     0
          raidz2-0                 ONLINE       0     0     0
            c6t5000039468CB54F0d0  ONLINE       0     0     0
            c6t5000039478CB5138d0  ONLINE       0     0     0
            c6t5000039468D000DCd0  ONLINE       0     0     0
            c6t5000039468D000E8d0  ONLINE       0     0     0
            c6t5000039468D00F5Cd0  ONLINE       0     0     0
            c6t5000039478C816CCd0  ONLINE       0     0     0
            c6t5000039478C8546Cd0  ONLINE       0     0     0
          raidz2-1                 ONLINE       0     0     0
            c6t5000039478C855F0d0  ONLINE       0     0     0
            c6t5000039478C856E8d0  ONLINE       0     0     0
            c6t5000039478C856ECd0  ONLINE       0     0     0
            c6t5000039478C856F4d0  ONLINE       0     0     0
            c6t5000039478C86374d0  ONLINE       0     0     0
            c6t5000039478C8C2A8d0  ONLINE       0     0     0
            c6t5000039478C8C364d0  ONLINE       0     0     0
          raidz2-2                 ONLINE       0     0     0
            c6t5000039478C9958Cd0  ONLINE       0     0     0
            c6t5000039478C995C4d0  ONLINE       0     0     0
            c6t5000039478C9DACCd0  ONLINE       0     0     0
            c6t5000039478C9DB30d0  ONLINE       0     0     0
            c6t5000039478C9DB6Cd0  ONLINE       0     0     0
            c6t5000039478CA73B4d0  ONLINE       0     0     0
            c6t5000039478CB3A20d0  ONLINE       0     0     0
          raidz2-3                 ONLINE       0     0     0
            c6t5000039478CB3A64d0  ONLINE       0     0     0
            c6t5000039478CB3A70d0  ONLINE       0     0     0
            c6t5000039478CB3E7Cd0  ONLINE       0     0     0
            c6t5000039478CB3EB0d0  ONLINE       0     0     0
            c6t5000039478CB3FBCd0  ONLINE       0     0     0
            c6t5000039478CB4048d0  ONLINE       0     0     0
            c6t5000039478CB4054d0  ONLINE       0     0     0
          raidz2-4                 ONLINE       0     0     0
            c6t5000039478CB424Cd0  ONLINE       0     0     0
            c6t5000039478CB4250d0  ONLINE       0     0     0
            c6t5000039478CB470Cd0  ONLINE       0     0     0
            c6t5000039478CB471Cd0  ONLINE       0     0     0
            c6t5000039478CB4E50d0  ONLINE       0     0     0
            c6t5000039478CB50A8d0  ONLINE       0     0     0
            c6t5000039478CB50BCd0  ONLINE       0     0     0
        spares
          c6t50000394A8CBC93Cd0    AVAIL

errors: No known data errors
Thanks for your help,
Kevin
On 12/31/2014 3:22 PM, Richard Elling wrote:
>
>> On Dec 31, 2014, at 11:25 AM, Kevin Swab <Kevin.Swab at colostate.edu> wrote:
>>
>> Hello Everyone,
>>
>> We've been running OmniOS on a number of Supermicro 36-bay chassis, with
>> Supermicro motherboards, LSI SAS controllers (9211-8i & 9207-8i), and
>> various SAS HDDs. These systems are serving block storage via COMSTAR
>> and QLogic FC HBAs, and have been running well for several years.
>>
>> The problem we've got is that as the drives age, some of them start to
>> perform slowly (intermittently) without failing - no zpool or iostat
>> errors, and nothing logged in /var/adm/messages. The slow performance
>> can be seen as high average service times in iostat or sar.
>
> Look at the drive's error logs using sg_logs (-a for all)
>
>>
>> When these service times get above 500ms, they start to cause IO
>> timeouts on the downstream storage consumers, which is bad...
>
> 500 milliseconds is not unusual for a busy HDD with SCSI TCQ or SATA NCQ
>
>>
>> I'm wondering - is there a way to tune OmniOS' behavior so that it
>> doesn't try so hard to complete IOs to these slow disks, and instead
>> just gives up and fails them?
>
> Yes, the tuning in Alasdair's blog should work as he describes. More below...
>
>>
>> I found an old post from 2011 which states that some tunables exist,
>> but are ignored by the mpt_sas driver:
>>
>> http://everycity.co.uk/alasdair/2011/05/adjusting-drive-timeouts-with-mdb-on-solaris-or-openindiana/
>>
>> Does anyone know the current status of these tunables, or have any other
>> suggestions that might help?
>
> These tunables are on the order of seconds. The default, 60, is obviously too big
> unless you have old, slow, SCSI CD-ROMs. But setting it below the manufacturer's
> internal limit (default or tuned) can lead to an unstable system. Some vendors are
> better than others at documenting these, but in any case you'll need to see their spec.
> Expect values on the order of 6 to 15 seconds for modern HDDs and SSDs.
>
> There are a lot of tunables in this area at all levels of the architecture. OOB, the OmniOS
> settings ensure stable behaviour. Tuning any layer without understanding the others can
> lead to unstable systems, as demonstrated by your current downstream consumers.
> -- richard
>
>
>>
>> Thanks,
>> Kevin
>>
>>
>> --
>> -------------------------------------------------------------------
>> Kevin Swab UNIX Systems Administrator
>> ACNS Colorado State University
>> Phone: (970)491-6572 Email: Kevin.Swab at ColoState.EDU
>> GPG Fingerprint: 7026 3F66 A970 67BD 6F17 8EB8 8A7D 142F 2392 791C
>> _______________________________________________
>> OmniOS-discuss mailing list
>> OmniOS-discuss at lists.omniti.com
>> http://lists.omniti.com/mailman/listinfo/omnios-discuss