[OmniOS-discuss] I/O to pool seems to be hung

henrikj at henkis.net henrikj at henkis.net
Wed Jun 22 19:22:52 UTC 2016


Hi,

We have a few nodes, each with 36 internal disks connected through LSI SAS9207-8i controllers. Several times now, different servers have crashed with hanging I/O, and after the reboot one disk in the pool is missing. So far a power cycle has solved the problem. We have downgraded the controller firmware (to 18), since there are known problems with the latest firmware on illumos, but this issue we have not seen before.

We are running OmniOS 151018.

Here is some output from the crash dump:

/pci@0,0/pci8086,2f04@2/pci1000,3020@0 (mpt_sas0):
        Timeout of 60 seconds expired with 12 commands on target 12 lun 0.
WARNING: /pci@0,0/pci8086,2f04@2/pci1000,3020@0 (mpt_sas0):
        Disconnected command timeout for target 12 w5000c5007bc6d89d, enclosure 2
WARNING: /pci@0,0/pci8086,2f04@2/pci1000,3020@0 (mpt_sas0):
        mptsas_handle_event_sync: event 0xf, IOCStatus=0x8000, IOCLogInfo=0x31170000
/pci@0,0/pci8086,2f04@2/pci1000,3020@0 (mpt_sas0):
        Log info 0x31130000 received for target 12 w5000c5007bc6d89d.
        scsi_status=0x0, ioc_status=0x8048, scsi_state=0xc
/pci@0,0/pci8086,2f04@2/pci1000,3020@0 (mpt_sas0):
        Log info 0x31130000 received for target 12 w5000c5007bc6d89d.
        scsi_status=0x0, ioc_status=0x8048, scsi_state=0xc
/pci@0,0/pci8086,2f04@2/pci1000,3020@0 (mpt_sas0):
        Log info 0x31130000 received for target 12 w5000c5007bc6d89d.
        scsi_status=0x0, ioc_status=0x8048, scsi_state=0xc
/pci@0,0/pci8086,2f04@2/pci1000,3020@0 (mpt_sas0):
        Log info 0x31130000 received for target 12 w5000c5007bc6d89d.
        scsi_status=0x0, ioc_status=0x8048, scsi_state=0xc
/pci@0,0/pci8086,2f04@2/pci1000,3020@0 (mpt_sas0):
        Log info 0x31130000 received for target 12 w5000c5007bc6d89d.
        scsi_status=0x0, ioc_status=0x8048, scsi_state=0xc
/pci@0,0/pci8086,2f04@2/pci1000,3020@0 (mpt_sas0):
        Log info 0x31130000 received for target 12 w5000c5007bc6d89d.
        scsi_status=0x0, ioc_status=0x8048, scsi_state=0xc
/pci@0,0/pci8086,2f04@2/pci1000,3020@0 (mpt_sas0):
        Log info 0x31130000 received for target 12 w5000c5007bc6d89d.
        scsi_status=0x0, ioc_status=0x8048, scsi_state=0xc
/pci@0,0/pci8086,2f04@2/pci1000,3020@0 (mpt_sas0):
        Log info 0x31130000 received for target 12 w5000c5007bc6d89d.
        scsi_status=0x0, ioc_status=0x8048, scsi_state=0xc
/pci@0,0/pci8086,2f04@2/pci1000,3020@0 (mpt_sas0):
        Log info 0x31130000 received for target 12 w5000c5007bc6d89d.
        scsi_status=0x0, ioc_status=0x8048, scsi_state=0xc
/pci@0,0/pci8086,2f04@2/pci1000,3020@0 (mpt_sas0):
        Log info 0x31130000 received for target 12 w5000c5007bc6d89d.
        scsi_status=0x0, ioc_status=0x8048, scsi_state=0xc
/pci@0,0/pci8086,2f04@2/pci1000,3020@0 (mpt_sas0):
        Log info 0x31130000 received for target 12 w5000c5007bc6d89d.
        scsi_status=0x0, ioc_status=0x8048, scsi_state=0xc
/pci@0,0/pci8086,2f04@2/pci1000,3020@0 (mpt_sas0):
        Log info 0x31140000 received for target 12 w5000c5007bc6d89d.
        scsi_status=0x0, ioc_status=0x8048, scsi_state=0xc
WARNING: /pci@0,0/pci8086,2f04@2/pci1000,3020@0 (mpt_sas0):
        mptsas_check_task_mgt: Task 0x3 failed. IOCStatus=0x4a IOCLogInfo=0x0 target=12

WARNING: /pci@0,0/pci8086,2f04@2/pci1000,3020@0 (mpt_sas0):
        mptsas_ioc_task_management failed try to reset ioc to recovery!
/pci@0,0/pci8086,2f04@2/pci1000,3020@0 (mpt_sas0):
        MPT Firmware version v18.0.0.0 (SAS2308)
/pci@0,0/pci8086,2f04@2/pci1000,3020@0 (mpt_sas0):
        mpt_sas0 MPI Version 0x200
/pci@0,0/pci8086,2f04@2/pci1000,3020@0 (mpt_sas0):
        mpt0: IOC Operational.
WARNING: /scsi_vhci (scsi_vhci0):
        /scsi_vhci/disk@g5000c5007bc6d89d (sd5): Command Timeout on path mpt_sas1/disk@w5000c5007bc6d89d,0
WARNING: /pci@0,0/pci8086,2f04@2/pci1000,3020@0 (mpt_sas0):
        Target 12 reset for command timeout recovery failed!
NOTICE: SUNW-MSG-ID: SUNOS-8000-0G, TYPE: Error, VER: 1, SEVERITY: Major


panic[cpu0]/thread=fffff001e86cbc40:
I/O to pool 'server03zp01' appears to be hung.


fffff001e86cba20 zfs:vdev_deadman+10b ()
fffff001e86cba70 zfs:vdev_deadman+4a ()
fffff001e86cbac0 zfs:vdev_deadman+4a ()
fffff001e86cbaf0 zfs:spa_deadman+ad ()
fffff001e86cbb90 genunix:cyclic_softint+fd ()
fffff001e86cbba0 unix:cbe_low_level+14 ()
fffff001e86cbbf0 unix:av_dispatch_softvect+78 ()
fffff001e86cbc20 apix:apix_dispatch_softint+35 ()
fffff001e8605990 unix:switch_sp_and_call+13 ()
fffff001e86059e0 apix:apix_do_softint+6c ()
fffff001e8605a40 apix:apix_do_interrupt+34a ()
fffff001e8605a50 unix:_interrupt+ba ()
fffff001e8605bc0 unix:acpi_cpu_cstate+11b ()
fffff001e8605bf0 unix:cpu_acpi_idle+8d ()
fffff001e8605c00 unix:cpu_idle_adaptive+13 ()
fffff001e8605c20 unix:idle+a7 ()
fffff001e8605c30 unix:thread_start+8 ()

syncing file systems...
 done
dumping to /dev/zvol/dsk/rpool/dump, offset 65536, content: kernel
NOTICE: ahci0: ahci_tran_reset_dport port 0 reset port
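
For anyone triaging a similar dump, here is a minimal sketch that tallies the "Log info" messages per target and splits each log-info word into fields. The field layout (bus type in the top 4 bits, then a 4-bit originator, an 8-bit code and a 16-bit subcode) is an assumption about how SAS log-info words are commonly laid out, not something taken from the output above, and the function names are made up for illustration.

```python
import re
from collections import Counter

# Matches the "Log info ... received for target ..." lines shown in the dump.
LOG_INFO_RE = re.compile(r"Log info (0x[0-9a-fA-F]+) received for target (\d+)")


def split_loginfo(value):
    """Split a 32-bit log-info word into fields (layout is an assumption)."""
    return {
        "bus_type": (value >> 28) & 0xF,
        "originator": (value >> 24) & 0xF,
        "code": (value >> 16) & 0xFF,
        "subcode": value & 0xFFFF,
    }


def tally_log_info(lines):
    """Count (target, loginfo) pairs across an iterable of log lines."""
    counts = Counter()
    for line in lines:
        match = LOG_INFO_RE.search(line)
        if match:
            counts[(int(match.group(2)), int(match.group(1), 16))] += 1
    return counts


# Three lines copied from the dump; a real run would read the full messages log.
sample = [
    "Log info 0x31130000 received for target 12 w5000c5007bc6d89d.",
    "Log info 0x31130000 received for target 12 w5000c5007bc6d89d.",
    "Log info 0x31140000 received for target 12 w5000c5007bc6d89d.",
]

for (target, loginfo), count in sorted(tally_log_info(sample).items()):
    print(target, hex(loginfo), count, split_loginfo(loginfo))
```

Grouping by target makes it easy to see whether the timeouts are confined to a single disk, as they are here with target 12, or spread across the controller.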

Anyone seen this?

Regards
Henrik

