[OmniOS-discuss] Comstar Disconnects under high load.

Tue May 13 01:29:34 UTC 2014

Hm, how clean is your fabric? Any errors, deadlocks, etc?
 -nld

On Mon, May 12, 2014 at 6:41 PM, David Bomba <turbo124 at gmail.com> wrote:

> Hi Narayan,
>
> We do not use iSER.
>
> We use SRP for VMWare, and IPoIB for XenServer.
>
> In our case, our VMs operate as expected. However, when copying data
> between Storage Repos we see the disconnects, irrespective of
> SCSI transport.
>
>
> On 13 May 2014 09:32, Narayan Desai <narayan.desai at gmail.com> wrote:
>
>> Are you perchance using iscsi/iSER? We've seen similar timeouts that
>> don't seem to correspond to hardware issues. From what we can tell,
>> something causes iscsi heartbeats not to be processed, so the client
>> eventually times out the block device and tries to reinitialize it.
>>
>> In our case, we're running VMs using KVM on linux hosts. The guest
>> detects block device death, and won't recover without a reboot.
>>
>> FWIW, switching to iscsi directly over IPoIB works great for identical
>> workloads. We've seen this with 151006 and I think 151008. We've not yet
>> tried it with 151010. This smells like some problem in comstar's iscsi/iser
>> driver.
>>  -nld
>>
>>
>> On Mon, May 12, 2014 at 5:13 PM, David Bomba <turbo124 at gmail.com> wrote:
>>
>>> Hi guys,
>>>
>>> We have ~ 10 OmniOS powered ZFS storage arrays used to drive Virtual
>>> Machines under XenServer + VMWare using Infiniband interconnect.
>>>
>>> Our usual recipe is to use either LSI HBAs or Areca cards in pass-through
>>> mode with internal SAS drives.
>>>
>>> This has worked flawlessly with OmniOS 6/8.
>>>
>>> Recently we deployed a slightly different configuration:
>>>
>>> HP DL380 G6
>>> 64GB ram
>>> X5650 proc
>>> LSI 9208-e card
>>> HP MDS 600 / SSA 70 external enclosure
>>> 30 TOSHIBA-MK2001TRKB-1001-1.82TB SAS2 drives in mirrored configuration.
>>>
>>> Despite the following message in dmesg, the array appeared to be working
>>> as expected:
>>>
>>> scsi: [ID 365881 kern.info] /pci@0,0/pci8086,340f@8/pci1000,30b0@0(mpt_sas1):
>>> May 13 04:01:07 s6      Log info 0x31140000 received for target 11.
>>>
>>> Despite this message we pushed into production. Whilst the
>>> performance of the array has been good, as soon as we perform high write IO,
>>> performance drops from 22k IOPS to 100 IOPS. This causes the target to
>>> disconnect from the hypervisors, and general mayhem ensues for the VMs.
>>>
>>> During this period where performance degrades, there are no other
>>> messages coming into dmesg.
>>>
>>> Where should we begin to debug this? Could this be a symptom of not
>>> enough RAM? We have flashed the LSI cards to the latest firmware with no
>>> change in performance.
>>>
>>> Thanks in advance!
>>> _______________________________________________
>>> OmniOS-discuss mailing list
>>> OmniOS-discuss at lists.omniti.com
>>> http://lists.omniti.com/mailman/listinfo/omnios-discuss
>>>
>>
>>
>
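
One low-effort starting point for the "where should we begin to debug this?" question is to check whether the mpt_sas "Log info" message quoted in the original report is isolated or recurring, and whether it is confined to one target. The sketch below is only an illustration, not a diagnosis: it counts such messages per target and splits the 32-bit value into fields. The regular expression is based solely on the single line shown in the thread, the function names are hypothetical, and the field layout (bus type, originator, code, subcode) is an assumption borrowed from how the Linux mptsas driver splits SAS log info words.

```python
import re
from collections import Counter

# Pattern taken from the one message quoted in the thread:
# "Log info 0x31140000 received for target 11."
LOG_INFO_RE = re.compile(
    r"Log info (0x[0-9a-fA-F]{8}) received for target (\d+)"
)


def split_loginfo(value):
    """Split a 32-bit log info word into bit fields.

    ASSUMPTION: layout is bus_type[31:28], originator[27:24],
    code[23:16], subcode[15:0]. Verify against the vendor headers
    before relying on it.
    """
    return {
        "bus_type": (value >> 28) & 0xF,
        "originator": (value >> 24) & 0xF,
        "code": (value >> 16) & 0xFF,
        "subcode": value & 0xFFFF,
    }


def count_by_target(lines):
    """Count log info messages keyed by (target number, hex value)."""
    counts = Counter()
    for line in lines:
        match = LOG_INFO_RE.search(line)
        if match:
            key = (int(match.group(2)), match.group(1).lower())
            counts[key] += 1
    return counts


def report(lines):
    """Return one summary string per (target, value) pair, sorted."""
    rows = []
    for (target, code), n in sorted(count_by_target(lines).items()):
        fields = split_loginfo(int(code, 16))
        rows.append(
            "target %d: %s x%d (bus_type=%d originator=%d code=0x%02x subcode=0x%04x)"
            % (target, code, n, fields["bus_type"], fields["originator"],
               fields["code"], fields["subcode"])
        )
    return rows
```

To use it, pass the lines of the system log (on illumos, typically /var/adm/messages) to `report()` and print the result. A single target dominating the counts would point towards one drive, cable, or enclosure slot rather than the COMSTAR layer.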