[OmniOS-discuss] iSCSI target hang, no way to restart but server reboot

Matej Zerovnik matej at zunaj.si
Fri Mar 27 14:54:27 UTC 2015


It just happened about 2 hours ago... The whole system did not crash, 
but 2 clients lost the connection.

This is what I see in logs:
Mar 27 13:55:51 storage.host.org scsi: [ID 107833 kern.notice] /pci@0,0/pci8086,3c02@1/pci1000,3040@0 (mpt_sas0):
Mar 27 13:55:51 storage.host.org         Timeout of 0 seconds expired with 1 commands on target 68 lun 0.
Mar 27 13:55:51 storage.host.org scsi: [ID 107833 kern.warning] WARNING: /pci@0,0/pci8086,3c02@1/pci1000,3040@0 (mpt_sas0):
Mar 27 13:55:51 storage.host.org         Disconnected command timeout for target 68 w500304800039d83d, enclosure 3
Mar 27 13:55:52 storage.host.org scsi: [ID 365881 kern.info] /pci@0,0/pci8086,3c02@1/pci1000,3040@0 (mpt_sas0):
Mar 27 13:55:52 storage.host.org         Log info 0x31140000 received for target 68 w500304800039d83d.
Mar 27 13:55:52 storage.host.org         scsi_status=0x0, ioc_status=0x8048, scsi_state=0xc
Mar 27 15:08:31 storage.host.org iscsit: [ID 744151 kern.notice] NOTICE: login_sm_session_bind: add new conn/sess continue
Mar 27 15:10:53 storage.host.org scsi: [ID 107833 kern.notice] /pci@0,0/pci8086,3c02@1/pci1000,3040@0 (mpt_sas0):
Mar 27 15:10:53 storage.host.org         Timeout of 0 seconds expired with 1 commands on target 68 lun 0.
Mar 27 15:10:53 storage.host.org scsi: [ID 107833 kern.warning] WARNING: /pci@0,0/pci8086,3c02@1/pci1000,3040@0 (mpt_sas0):
Mar 27 15:10:53 storage.host.org         Disconnected command timeout for target 68 w500304800039d83d, enclosure 3
Mar 27 15:10:54 storage.host.org scsi: [ID 365881 kern.info] /pci@0,0/pci8086,3c02@1/pci1000,3040@0 (mpt_sas0):
Mar 27 15:10:54 storage.host.org         Log info 0x31140000 received for target 68 w500304800039d83d.
Mar 27 15:10:54 storage.host.org         scsi_status=0x0, ioc_status=0x8048, scsi_state=0xc
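
To see at a glance whether the timeouts always hit the same target, the
"Disconnected command timeout" warnings can be tallied per target. The
following is only a rough sketch: the function name count_timeouts and the
SAMPLE_LOG excerpt are made up for illustration, and the regular expression
assumes the message wording shown in the log above.

```python
import re
from collections import Counter

# Matches the mpt_sas warning text as it appears in the log above.
TIMEOUT_RE = re.compile(
    r"Disconnected command timeout for target (\d+) (w[0-9a-f]+)"
)


def count_timeouts(log_text):
    """Count 'Disconnected command timeout' warnings per (target, WWN)."""
    # Collapse all whitespace first, so that a message wrapped across
    # two lines (as mail clients tend to do) still matches.
    flat = " ".join(log_text.split())
    return Counter(
        (int(target), wwn) for target, wwn in TIMEOUT_RE.findall(flat)
    )


# Excerpt of the two warnings from the log above (illustrative input).
SAMPLE_LOG = """\
Mar 27 13:55:51 storage.host.org         Disconnected command timeout
for target 68 w500304800039d83d, enclosure 3
Mar 27 15:10:53 storage.host.org         Disconnected command timeout
for target 68 w500304800039d83d, enclosure 3
"""

if __name__ == "__main__":
    for (target, wwn), hits in count_timeouts(SAMPLE_LOG).most_common():
        print(f"target {target} ({wwn}): {hits} timeouts")
```

Run against the full /var/adm/messages content instead of the excerpt, this
would show whether target 68 is the only one affected.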

I read in the archives that this error happens when you have SATA 
drives on a SAS expander and one of the drives misbehaves:
A command did not complete and the mpt driver reset the target.
If that target is an expander, then everything behind the expander can
reset, resulting in the aborts of any in-flight commands, as follows...

iostat -Ei | grep Error reports that one device has 6 hard errors and 6 
"device not ready" errors, but that is a local drive attached to a 
different controller (LSI MegaRAID).
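
With many drives in the JBOD, picking the non-zero counters out of the
iostat output by eye gets tedious. Below is a minimal sketch that lists only
the devices with at least one soft, hard or transport error. It assumes the
usual "Soft Errors: N Hard Errors: N Transport Errors: N" summary line that
iostat -E prints per device; the function name devices_with_errors, the
device names and the numbers in SAMPLE_IOSTAT are hypothetical.

```python
import re

# One summary line per device, e.g.
# "sd0       Soft Errors: 0 Hard Errors: 6 Transport Errors: 0"
SUMMARY_RE = re.compile(
    r"^(?P<dev>\S+)\s+Soft Errors:\s*(?P<soft>\d+)\s+"
    r"Hard Errors:\s*(?P<hard>\d+)\s+"
    r"Transport Errors:\s*(?P<transport>\d+)",
    re.MULTILINE,
)


def devices_with_errors(iostat_text):
    """Return {device: counters} for devices with any non-zero counter."""
    result = {}
    for match in SUMMARY_RE.finditer(iostat_text):
        counters = {
            key: int(match.group(key))
            for key in ("soft", "hard", "transport")
        }
        if any(counters.values()):
            result[match.group("dev")] = counters
    return result


# Hypothetical output of `iostat -Ei | grep Error` for two devices.
SAMPLE_IOSTAT = """\
sd0       Soft Errors: 0 Hard Errors: 6 Transport Errors: 0
Media Error: 0 Device Not Ready: 6 No Device: 0 Recoverable: 0
sd1       Soft Errors: 0 Hard Errors: 0 Transport Errors: 0
Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0
"""

if __name__ == "__main__":
    for dev, counters in devices_with_errors(SAMPLE_IOSTAT).items():
        print(dev, counters)
```

The "Media Error ..." detail lines are skipped on purpose, since only the
per-device summary line carries the device name.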

I would rather not do a major upgrade, since this is a production 
machine. Too scary :)

Matej

On 27. 03. 2015 15:43, Dan McDonald wrote:
>> On Mar 27, 2015, at 9:07 AM, Matej Zerovnik <matej at zunaj.si> wrote:
>>
>> Hello!
>>
>> We are having issues with iSCSI at work. Every now and then, the iSCSI target just hangs. We are unable to kill it, restart it, or do anything else to restore the service. The only way to get the iSCSI target back to a working state is to reboot the whole server and lose all the sessions (around 100 clients).
>>
>> The weird thing is that only the iSCSI target hangs. I can ssh to the server and work on it without any problem; there is no load and nothing in the log files. The iSCSI target just locks up and all connections to it drop (probably a timeout).
>>
>> The server is an IBM 3550 M4 with dual Xeon E5-2640 CPUs and 160GB of memory.
>> The hard drives are mounted in a Supermicro JBOD, which is attached via an LSI Logic SAS2308 SAS HBA.
>>
>> We are using OmniOS v11 r151006.
>>
>> Has anyone encountered similar trouble?
>> Any recommendations on what to do or how to solve this problem?
> I'd move to 012 or wait the short amount of time until 014 hits the streets.  Then see if your problem persists.
>
> Dan
>


