[OmniOS-discuss] Scrub leaves pool in weird state with all devices in "repairing"?
Matej Žerovnik
matej at zunaj.si
Sun Jun 19 08:43:24 UTC 2016
Hey there,
I was getting the same 0x31120302 error on a brand-new Supermicro JBOD chassis. I swapped the JBOD for a different one and I’m not seeing those problems anymore. I sent the old chassis in for service, since there is probably a problem with the backplane or something similar: all the target numbers the kernel logged were on the same backplane in the JBOD, and I was getting errors on active drives as well as on hot spares.
Just my 2 cents…
Matej
> On 17 Jun 2016, at 20:32, Eric Sproul <eric.sproul at circonus.com> wrote:
>
> On Fri, Jun 17, 2016 at 2:21 PM, <steve at linuxsuite.org> wrote:
>> I successfully exported the problematic zfs pool, installed it in
>> another JBOD chassis and imported it. Scrub and zfs send now run
>> fine. So it isn't the disks; it must be the cable, chassis, HBA or ...
>
> FWIW there is a handy tool [1] that decodes LSI log info codes.
> Looking at your logs, there are two unique IOCLogInfo codes:
>
> IOCLogInfo=0x3112010c
> IOCLogInfo=0x31120302
>
> $ ./lsi_decode_loginfo.py 0x3112010c
> Value 3112010Ch
> Type: 30000000h SAS
> Origin: 01000000h PL
> Code: 00120000h PL_LOGINFO_CODE_ABORT See Sub-Codes below (PL_LOGINFO_SUB_CODE)
> Sub Code: 00000100h PL_LOGINFO_SUB_CODE_OPEN_FAILURE
> SubSub Code: 0000000Ch PL_LOGINFO_SUB_CODE_OPEN_FAIL_OPEN_TIMEOUT_EXP
>
> $ ./lsi_decode_loginfo.py 0x31120302
> Value 31120302h
> Type: 30000000h SAS
> Origin: 01000000h PL
> Code: 00120000h PL_LOGINFO_CODE_ABORT See Sub-Codes below (PL_LOGINFO_SUB_CODE)
> Sub Code: 00000300h PL_LOGINFO_SUB_CODE_WRONG_REL_OFF_OR_FRAME_LENGTH
> Unparsed 00000002h
>
> If I had to hazard a guess I'd say there's a low-level issue in the
> SAS fabric, maybe a bad expander or cable that's disrupting
> everything. The HBA is aborting commands it's waiting on, either
> because the target never responds (the open timeout in the first
> code) or, in the case of the second code, possibly because the
> protocol traffic is corrupted. This would seem to align with your
> finding that moving the disks to a new chassis made the issue go away.
>
> Eric
>
> [1] https://github.com/baruch/lsi_decode_loginfo
> _______________________________________________
> OmniOS-discuss mailing list
> OmniOS-discuss at lists.omniti.com
> http://lists.omniti.com/mailman/listinfo/omnios-discuss
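For anyone wondering how those IOCLogInfo values break down: judging by the decoder output quoted above, the 32-bit word looks like a set of fixed bit fields (type, originator, code, sub code, low byte). Below is a minimal sketch of that masking, assuming the masks inferred from the quoted output; they are not taken from LSI documentation, and the name tables hold only the values that appear in this thread. The functions `decode` and `describe` are hypothetical helpers, not part of baruch/lsi_decode_loginfo, so use the real tool for anything else.

```python
"""Minimal sketch: split an LSI IOCLogInfo word into its bit fields.

Assumption: the masks are inferred from the lsi_decode_loginfo.py output
quoted in this thread, and the name tables contain only the values seen
there. This is not the real tool's implementation.
"""
from typing import Dict, Optional

# Hypothetical field masks, inferred from the decoded output in the thread.
TYPE_MASK = 0xF0000000
ORIGIN_MASK = 0x0F000000
CODE_MASK = 0x00FF0000
SUB_CODE_MASK = 0x0000FF00
LOW_BYTE_MASK = 0x000000FF

# Only names that appear in this thread; anything else shows as "unknown".
TYPES = {0x30000000: "SAS"}
ORIGINS = {0x01000000: "PL"}
CODES = {0x00120000: "PL_LOGINFO_CODE_ABORT"}
SUB_CODES = {
    0x00000100: "PL_LOGINFO_SUB_CODE_OPEN_FAILURE",
    0x00000300: "PL_LOGINFO_SUB_CODE_WRONG_REL_OFF_OR_FRAME_LENGTH",
}
# In this sketch the low byte is only named under OPEN_FAILURE.
OPEN_FAILURE_SUBSUB = {0x0000000C: "PL_LOGINFO_SUB_CODE_OPEN_FAIL_OPEN_TIMEOUT_EXP"}


def decode(value: int) -> Dict[str, int]:
    """Split a 32-bit log info value into its fields by masking."""
    return {
        "type": value & TYPE_MASK,
        "origin": value & ORIGIN_MASK,
        "code": value & CODE_MASK,
        "sub_code": value & SUB_CODE_MASK,
        "low": value & LOW_BYTE_MASK,
    }


def describe(value: int) -> str:
    """Render the fields using whatever names this sketch knows about."""
    f = decode(value)
    lines = [
        f"Value {value:08X}h",
        f"Type: {f['type']:08X}h {TYPES.get(f['type'], 'unknown')}",
        f"Origin: {f['origin']:08X}h {ORIGINS.get(f['origin'], 'unknown')}",
        f"Code: {f['code']:08X}h {CODES.get(f['code'], 'unknown')}",
        f"Sub Code: {f['sub_code']:08X}h {SUB_CODES.get(f['sub_code'], 'unknown')}",
    ]
    name: Optional[str] = None
    if f["sub_code"] == 0x00000100:
        name = OPEN_FAILURE_SUBSUB.get(f["low"])
    if name is not None:
        lines.append(f"SubSub Code: {f['low']:08X}h {name}")
    elif f["low"]:
        # Low byte present but not named in this sketch's tables.
        lines.append(f"Unparsed {f['low']:08X}h")
    return "\n".join(lines)


if __name__ == "__main__":
    # The two codes from the thread.
    for v in (0x3112010C, 0x31120302):
        print(describe(v))
        print()
```

Running it on the two codes from the thread reproduces the same type/origin/code/sub-code split as the quoted output, with 0x31120302 leaving 00000002h unparsed.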