[OmniOS-discuss] Scrub leaves pool in weird state with all devices in "repairing"?

Matej Žerovnik matej at zunaj.si
Sun Jun 19 08:43:24 UTC 2016


Hey there,

I was getting the same 0x31120302 error on a brand-new Supermicro JBOD chassis. After swapping the JBOD for a different one, I’m no longer seeing those problems. I sent the old chassis in for service, since there is probably a problem with the backplane or something similar: all the target numbers the kernel logged were on the same backplane in the JBOD, and I was getting errors on active drives as well as on hot-spare drives.

Just my 2 cents…

Matej

> On 17 Jun 2016, at 20:32, Eric Sproul <eric.sproul at circonus.com> wrote:
> 
> On Fri, Jun 17, 2016 at 2:21 PM,  <steve at linuxsuite.org> wrote:
>> I successfully exported the problematic zfs and installed it into
>> another JBOD chassis and imported it. Scrub and zfs send now run fine.
>> So it isn't the disks; it must be the cable, chassis, HBA, or ....
> 
> FWIW there is a handy tool [1] that decodes LSI log info codes.
> Looking at your logs, there are two unique IOCLogInfo codes:
> 
> IOCLogInfo=0x3112010c
> IOCLogInfo=0x31120302
> 
> $ ./lsi_decode_loginfo.py 0x3112010c
> Value     3112010Ch
> Type:     30000000h SAS
> Origin:   01000000h PL
> Code:     00120000h PL_LOGINFO_CODE_ABORT See Sub-Codes below (PL_LOGINFO_SUB_CODE)
> Sub Code: 00000100h PL_LOGINFO_SUB_CODE_OPEN_FAILURE
> SubSub Code: 0000000Ch PL_LOGINFO_SUB_CODE_OPEN_FAIL_OPEN_TIMEOUT_EXP
> 
> $ ./lsi_decode_loginfo.py 0x31120302
> Value     31120302h
> Type:     30000000h SAS
> Origin:   01000000h PL
> Code:     00120000h PL_LOGINFO_CODE_ABORT See Sub-Codes below (PL_LOGINFO_SUB_CODE)
> Sub Code: 00000300h PL_LOGINFO_SUB_CODE_WRONG_REL_OFF_OR_FRAME_LENGTH
> Unparsed   00000002h
> 
> If I had to hazard a guess I'd say there's a low-level issue in the
> SAS fabric, maybe a bad expander or cable that's disrupting
> everything. The HBA is aborting the commands it's waiting for answers
> to, either because the target never responds or, in the latter case,
> possibly because the protocol traffic is being corrupted. This would seem to align
> with your finding that moving the disks to a new chassis made the
> issue go away.
> 
> Eric
> 
> [1] https://github.com/baruch/lsi_decode_loginfo
> _______________________________________________
> OmniOS-discuss mailing list
> OmniOS-discuss at lists.omniti.com
> http://lists.omniti.com/mailman/listinfo/omnios-discuss

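For anyone curious what the decoder is doing, here is a minimal Python sketch of the same bit-field split. The masks are inferred only from the two outputs Eric quoted (Type in the top nibble, Origin in the next nibble, then Code, Sub Code and SubSub Code one byte each); `decode_loginfo` and the name tables are hypothetical and cover only the values seen in this thread, so use the real lsi_decode_loginfo tool for anything else.

```python
from typing import Dict

# Field masks inferred from the quoted decoder output (assumption: the real
# tool may split fields differently for other Type/Origin values).
TYPE_MASK = 0xF0000000
ORIGIN_MASK = 0x0F000000
CODE_MASK = 0x00FF0000
SUB_CODE_MASK = 0x0000FF00
SUBSUB_MASK = 0x000000FF

# Only the names that appear in the quoted output; anything else is "unknown".
NAMES = {
    "type": {0x30000000: "SAS"},
    "origin": {0x01000000: "PL"},
    "code": {0x00120000: "PL_LOGINFO_CODE_ABORT"},
    "sub_code": {
        0x00000100: "PL_LOGINFO_SUB_CODE_OPEN_FAILURE",
        0x00000300: "PL_LOGINFO_SUB_CODE_WRONG_REL_OFF_OR_FRAME_LENGTH",
    },
}
# SubSub names were only seen under OPEN_FAILURE in the quoted output.
OPEN_FAILURE_SUBSUB = {0x0000000C: "PL_LOGINFO_SUB_CODE_OPEN_FAIL_OPEN_TIMEOUT_EXP"}


def decode_loginfo(value: int) -> Dict[str, str]:
    """Split an IOCLogInfo value into labelled fields (hypothetical helper)."""
    fields = {
        "type": value & TYPE_MASK,
        "origin": value & ORIGIN_MASK,
        "code": value & CODE_MASK,
        "sub_code": value & SUB_CODE_MASK,
    }
    result = {}
    for name, field in fields.items():
        label = NAMES[name].get(field, "unknown")
        result[name] = f"{field:08X}h {label}"
    # The low byte is only named when it is a known OPEN_FAILURE sub-sub code;
    # otherwise it is reported raw, like the tool's "Unparsed" line.
    low = value & SUBSUB_MASK
    if fields["sub_code"] == 0x00000100 and low in OPEN_FAILURE_SUBSUB:
        result["subsub_code"] = f"{low:08X}h {OPEN_FAILURE_SUBSUB[low]}"
    else:
        result["unparsed"] = f"{low:08X}h"
    return result


if __name__ == "__main__":
    for code in (0x3112010C, 0x31120302):
        print(f"Value {code:08X}h")
        for key, text in decode_loginfo(code).items():
            print(f"  {key}: {text}")
```

Both codes share the same SAS/PL/ABORT prefix; only the Sub Code byte (open failure vs. wrong relative offset or frame length) differs, which is what points at the fabric rather than the disks.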


More information about the OmniOS-discuss mailing list