[OmniOS-discuss] iSCSI target hang, no way to restart but server reboot
Matej Zerovnik
matej at zunaj.si
Fri Mar 27 14:54:27 UTC 2015
It just happened about 2 hours ago... The whole system did not crash,
but 2 clients lost the connection.
This is what I see in the logs:
Mar 27 13:55:51 storage.host.org scsi: [ID 107833 kern.notice]
/pci@0,0/pci8086,3c02@1/pci1000,3040@0 (mpt_sas0):
Mar 27 13:55:51 storage.host.org Timeout of 0 seconds expired
with 1 commands on target 68 lun 0.
Mar 27 13:55:51 storage.host.org scsi: [ID 107833 kern.warning] WARNING:
/pci@0,0/pci8086,3c02@1/pci1000,3040@0 (mpt_sas0):
Mar 27 13:55:51 storage.host.org Disconnected command timeout
for target 68 w500304800039d83d, enclosure 3
Mar 27 13:55:52 storage.host.org scsi: [ID 365881 kern.info]
/pci@0,0/pci8086,3c02@1/pci1000,3040@0 (mpt_sas0):
Mar 27 13:55:52 storage.host.org Log info 0x31140000 received
for target 68 w500304800039d83d.
Mar 27 13:55:52 storage.host.org scsi_status=0x0,
ioc_status=0x8048, scsi_state=0xc
Mar 27 15:08:31 storage.host.org iscsit: [ID 744151 kern.notice] NOTICE:
login_sm_session_bind: add new conn/sess continue
Mar 27 15:10:53 storage.host.org scsi: [ID 107833 kern.notice]
/pci@0,0/pci8086,3c02@1/pci1000,3040@0 (mpt_sas0):
Mar 27 15:10:53 storage.host.org Timeout of 0 seconds expired
with 1 commands on target 68 lun 0.
Mar 27 15:10:53 storage.host.org scsi: [ID 107833 kern.warning] WARNING:
/pci@0,0/pci8086,3c02@1/pci1000,3040@0 (mpt_sas0):
Mar 27 15:10:53 storage.host.org Disconnected command timeout
for target 68 w500304800039d83d, enclosure 3
Mar 27 15:10:54 storage.host.org scsi: [ID 365881 kern.info]
/pci@0,0/pci8086,3c02@1/pci1000,3040@0 (mpt_sas0):
Mar 27 15:10:54 storage.host.org Log info 0x31140000 received
for target 68 w500304800039d83d.
Mar 27 15:10:54 storage.host.org scsi_status=0x0,
ioc_status=0x8048, scsi_state=0xc
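To see at a glance which target keeps timing out, the messages above can be tallied with a short script. This is only a rough sketch: the function name count_timeouts is made up for illustration, the regular expression assumes the "Disconnected command timeout" wording shown in this excerpt, and the sample text is copied from the log above.

```python
import re
from collections import Counter

# Matches the mpt_sas "Disconnected command timeout" message as it appears
# in the excerpt above (assumed wording; other driver versions may differ).
TIMEOUT_RE = re.compile(
    r"Disconnected command timeout for target (\d+) (w[0-9a-f]+), enclosure (\d+)"
)


def count_timeouts(log_text):
    """Return a Counter keyed by (target, wwn, enclosure)."""
    # Log lines may be wrapped (as in this mail), so collapse all
    # whitespace, including newlines, before matching.
    flat = " ".join(log_text.split())
    return Counter(
        (int(target), wwn, int(enclosure))
        for target, wwn, enclosure in TIMEOUT_RE.findall(flat)
    )


sample = """\
Mar 27 13:55:51 storage.host.org Disconnected command timeout
for target 68 w500304800039d83d, enclosure 3
Mar 27 15:10:53 storage.host.org Disconnected command timeout
for target 68 w500304800039d83d, enclosure 3
"""

for (target, wwn, enclosure), n in count_timeouts(sample).most_common():
    print(f"target {target} ({wwn}), enclosure {enclosure}: {n} timeouts")
```

Run against the full messages file instead of the sample, this would show whether the timeouts always hit the same target or are spread across several.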
I read in the archives that this error happens when you have SATA
drives behind a SAS expander and one of the drives misbehaves:
A command did not complete and the mpt driver reset the target.
If that target is an expander, then everything behind the expander can
reset, resulting in the aborts of any in-flight commands, as follows...
iostat -Ei | grep Error reports that one device has 6 hard errors and 6
device not ready errors, but that is a local drive, attached to a
different controller (LSI MegaRAID).
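For checking every device rather than grepping by eye, the error summary lines can be filtered with a small script. This is a sketch under assumptions: it expects each device's summary line to look like "NAME Soft Errors: N Hard Errors: N Transport Errors: N", the function name devices_with_errors is hypothetical, and the device names in the sample are invented for illustration.

```python
import re

# One summary line per device, assumed to have the form:
#   c1t0d0  Soft Errors: 0 Hard Errors: 6 Transport Errors: 0
ERR_RE = re.compile(
    r"^(\S+)\s+Soft Errors:\s*(\d+)\s+Hard Errors:\s*(\d+)"
    r"\s+Transport Errors:\s*(\d+)",
    re.MULTILINE,
)


def devices_with_errors(iostat_text):
    """Return {device: counts} for devices with any non-zero error count."""
    result = {}
    for dev, soft, hard, transport in ERR_RE.findall(iostat_text):
        counts = {
            "soft": int(soft),
            "hard": int(hard),
            "transport": int(transport),
        }
        if any(counts.values()):
            result[dev] = counts
    return result


# Illustrative input only; the device names are made up.
sample = """\
c1t0d0 Soft Errors: 0 Hard Errors: 6 Transport Errors: 0
c2t1d0 Soft Errors: 0 Hard Errors: 0 Transport Errors: 0
"""

for dev, counts in devices_with_errors(sample).items():
    print(dev, counts)
```

Only devices with at least one non-zero counter are reported, which makes it easier to confirm that the drives behind the expander are clean.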
I would rather not do a major upgrade, since this is a production
machine. Too scary :)
Matej
On 27. 03. 2015 15:43, Dan McDonald wrote:
>> On Mar 27, 2015, at 9:07 AM, Matej Zerovnik <matej at zunaj.si> wrote:
>>
>> Hello!
>>
>> We are having issues with iSCSI at work. Every now and then, the iSCSI target just hangs. We are unable to kill it, restart it, or do anything else to restore the service. The only way to bring the iSCSI target back to a working state is to reboot the whole server and lose all the sessions (around 100 clients).
>>
>> The weird thing is that only the iSCSI target hangs. I can ssh to the server and work on it without any problem; there is no load and nothing in the log files. The iSCSI target just locks up and all connections to it drop (probably timing out).
>>
>> The server is an IBM 3550 M4 with dual Xeon E5-2640 CPUs and 160GB of memory.
>> The hard drives are in a Supermicro JBOD, which is attached via an LSI Logic SAS2308 SAS HBA.
>>
>> We are using OmniOS v11 r151006.
>>
>> Has anyone encountered similar trouble?
>> Any recommendations on what to do or how to solve the problem?
> I'd move to 012 or wait the short amount of time until 014 hits the streets. Then see if your problem persists.
>
> Dan
>