[OmniOS-discuss] iSCSI traffic suddenly comes to a halt and then resumes

Matej Zerovnik matej at zunaj.si
Tue May 5 16:48:28 UTC 2015


I will replace the hardware in about 4 months with all SAS drives, but I would love to have a working setup in the meantime as well ;)

I looked at the SMART stats and there don't seem to be any errors. Also, no hard/soft/transport errors are reported by any drive. I will take a look at service times tomorrow, and maybe feed the drive stats into Graphite and look at them over a longer period.

I looked at iostat -x output today, and the stats for the pool itself showed 100% busy most of the time, 98-100% wait, 500-1300 transactions in the queue, and around 500 active. The first line, which is the average since boot, says the average service time is around 1600 ms, which seems like a lot. Can it be due to the really big queue?
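As a rough sanity check on the queue question, Little's law (items in flight = throughput x time per item) ties the three iostat quantities together. The sketch below is only arithmetic on the numbers quoted above; the function names are made up for illustration, and mixing a momentary "active" count with a since-boot service time is an approximation at best.

```python
def implied_ops_per_sec(in_flight: float, service_time_ms: float) -> float:
    """Little's law rearranged: throughput = in-flight requests / time per request."""
    return in_flight / (service_time_ms / 1000.0)


def implied_service_time_ms(in_flight: float, ops_per_sec: float) -> float:
    """Little's law rearranged: time per request = in-flight requests / throughput."""
    return in_flight / ops_per_sec * 1000.0


if __name__ == "__main__":
    # Figures from the message: ~500 active transactions, ~1600 ms service time.
    # Assumption: both values describe the same steady state, which they may not.
    ops = implied_ops_per_sec(500, 1600)
    print(f"implied throughput: {ops:.1f} ops/s")
    # The same throughput with only 10 requests in flight would mean much shorter times.
    print(f"service time at depth 10: {implied_service_time_ms(10, ops):.1f} ms")
```

Read this way, a very deep queue in front of a device that completes only a few hundred operations per second will show long service times by itself, so the 1600 ms figure does not necessarily point at a single broken drive.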

Would it help to create five 10-drive raidz pools instead of one with 50 drives?

Matej


-----Original Message-----
From: "Narayan Desai" <narayan.desai at gmail.com>
Sent: 5.5.2015 16:32
To: "Michael Rasmussen" <mir at miras.org>
Cc: "Matej Zerovnik" <matej at zunaj.si>; "omnios-discuss" <omnios-discuss at lists.omniti.com>
Subject: Re: [OmniOS-discuss] iSCSI traffic suddenly comes to a halt and then resumes

And, if you don't have the luxury of discarding hardware and replacing it with a supported configuration, you might look at finding marginal drives, either via error counters displayed in iostat -En, or drives with really high service times (in iostat -xnz output). We found (on a similar setup) that being really aggressive about drive replacement helped a lot.
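One way to spot drives with unusually high service times is to compare each drive's asvc_t against the median of its peers. The following is a minimal sketch, not a tested tool: the sample text imitates the column layout of iostat -xn, but the device names and numbers are invented, and the 3x threshold is an arbitrary assumption.

```python
from statistics import median

# Invented sample in the layout of `iostat -xn`; not real measurements.
SAMPLE = """\
                    extended device statistics
    r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
   12.0   30.1  410.2  950.7  0.0  0.4    0.0    9.8   0  21 c1t0d0
   11.8   29.7  402.9  948.1  0.0  0.4    0.0   10.4   0  22 c1t1d0
   11.9   30.3  408.8  951.3  0.0  3.9    0.0   92.6   0  97 c1t2d0
   12.1   29.9  409.5  949.0  0.0  0.5    0.0   11.1   0  23 c1t3d0
"""


def parse_asvc_t(text: str) -> dict:
    """Return {device: asvc_t in ms} from iostat -xn style text."""
    header = None
    result = {}
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        # The column header line ends in "device" and names the asvc_t column.
        if fields[-1] == "device" and "asvc_t" in fields:
            header = fields
            continue
        # Skip banner lines and anything that does not match the header width.
        if header is None or len(fields) != len(header):
            continue
        try:
            result[fields[-1]] = float(fields[header.index("asvc_t")])
        except ValueError:
            continue
    return result


def slow_drives(asvc: dict, factor: float = 3.0) -> list:
    """Devices whose service time exceeds `factor` times the median (assumed threshold)."""
    if not asvc:
        return []
    typical = median(asvc.values())
    return sorted(d for d, t in asvc.items() if typical > 0 and t > factor * typical)


if __name__ == "__main__":
    print(slow_drives(parse_asvc_t(SAMPLE)))
```

In practice the input would come from several interval samples (the first report is the since-boot average), and a drive that shows up repeatedly is the replacement candidate.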


If you have desktop SATA drives, then the drive firmware is part of the problem. Desktop drives retry for quite a long time when they encounter errors, which produces really inconsistent performance profiles. When you aggregate them into a raid set (including in ZFS), tail latencies really start to matter for performance, and the pool just starts going out to lunch for a long time. If you can figure out which drive is causing the problem and replace it (even if it isn't causing any hard errors), the pool performance goes back to normal.
 -nld
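To illustrate why tail latency grows with the width of the set, here is a small back-of-the-envelope sketch. It assumes that an operation has to wait for every drive in the set and that drives stall independently with the same probability; both are simplifying assumptions, and the 1% stall probability is purely illustrative.

```python
def p_any_drive_slow(n_drives: int, p_slow: float) -> float:
    """Chance that at least one of n independent drives is stalling during an operation."""
    return 1.0 - (1.0 - p_slow) ** n_drives


if __name__ == "__main__":
    # Assumed, illustrative value: each drive is in a long retry 1% of the time.
    p = 0.01
    for width in (1, 10, 50):
        print(f"{width:2d} drives: {p_any_drive_slow(width, p):.3f}")
```

Under these assumptions an operation spanning 50 drives is delayed roughly four times as often as one spanning 10 drives, which is one reason a single marginal drive can dominate the behavior of a wide set.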


On Tue, May 5, 2015 at 4:21 AM, <mir at miras.org> wrote:

On 2015-05-05 09:46, Matej Zerovnik wrote:


We still kept our SATA hard drives in Supermicro JBOD with SAS
expander and SATA drives.


Your problem boils down to using SATA disks behind a SAS expander. Search the OmniOS user list and you will find numerous reports that using SATA disks behind a SAS expander causes weird behavior and instability.

The fact is that SATA disks are unsupported behind a SAS expander due to incompatibility between the SAS and SATA command sets. As an example, SATA NCQ is not passed through the SAS expander, which could be the cause of the strange iSCSI disconnects you experience on the client side.



_______________________________________________
OmniOS-discuss mailing list
OmniOS-discuss at lists.omniti.com
http://lists.omniti.com/mailman/listinfo/omnios-discuss