[OmniOS-discuss] Re: multipath problem when replacing a failed SAS drive

Kevin Swab Kevin.Swab at colostate.edu
Thu Oct 31 17:01:20 UTC 2013


I put the drive that's missing a path in its own pool and did some
reading and writing (filled the drive with zeros using 'dd', then read
them back).  Other than a handful of errors in iostat and
/var/adm/messages (like the ones I reported before), everything appeared
to work fine:


# iostat -En c1t5000039478CA7150d0
c1t5000039478CA7150d0 Soft Errors: 0 Hard Errors: 2 Transport Errors: 29
Vendor: TOSHIBA  Product: MG03SCA300       Revision: 0108 Serial No: Z2H0A008FTP3
Size: 3000.59GB <3000592982016 bytes>
Media Error: 0 Device Not Ready: 0 No Device: 2 Recoverable: 0
Illegal Request: 0 Predictive Failure Analysis: 0
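
To pick the non-zero counters out of 'iostat -En' output like the above
without reading it by eye, something along these lines might help.  It is
only a minimal sketch in Python: the function names (parse_iostat_en,
nonzero_counters) are made up for illustration, and the list of counter
names is taken from the output shown here, so other releases may print
different ones.

```python
import re

# Counter names as they appear in the 'iostat -En' output in this message.
# Assumption: other systems may print a different set.
COUNTERS = (
    "Soft Errors", "Hard Errors", "Transport Errors",
    "Media Error", "Device Not Ready", "No Device", "Recoverable",
    "Illegal Request", "Predictive Failure Analysis",
)


def parse_iostat_en(text):
    """Return {device: {counter_name: int}} parsed from 'iostat -En' text."""
    devices = {}
    current = None
    for line in text.splitlines():
        # A new device record starts with "<device> Soft Errors: ...".
        header = re.match(r"(\S+)\s+Soft Errors:", line)
        if header:
            current = devices.setdefault(header.group(1), {})
        if current is None:
            continue
        for name in COUNTERS:
            found = re.search(re.escape(name) + r":\s*(\d+)", line)
            if found:
                current[name] = int(found.group(1))
    return devices


def nonzero_counters(devices):
    """Keep only the counters whose value is not zero."""
    return {dev: {k: v for k, v in c.items() if v} for dev, c in devices.items()}


SAMPLE = """\
c1t5000039478CA7150d0 Soft Errors: 0 Hard Errors: 2 Transport Errors: 29
Vendor: TOSHIBA  Product: MG03SCA300       Revision: 0108 Serial No:
Z2H0A008FTP3
Size: 3000.59GB <3000592982016 bytes>
Media Error: 0 Device Not Ready: 0 No Device: 2 Recoverable: 0
Illegal Request: 0 Predictive Failure Analysis: 0
"""

if __name__ == "__main__":
    for dev, counters in nonzero_counters(parse_iostat_en(SAMPLE)).items():
        print(dev, counters)
```

In practice the text would come from running 'iostat -En' rather than
from the SAMPLE string.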


So the port on the backplane appears to be (at least partially) functional.
Where do you think I should go from here?

Thanks again,
Kevin


On 10/30/2013 12:13 PM, Kevin Swab wrote:
> The problem drive is currently configured as a hot-spare (it replaced
> the old hot-spare, which kicked in when the original drive failed), but
> I'll remove it from the pool and do some testing with it and report back...
> 
> Thanks!
> Kevin
> 
> On 10/30/2013 12:02 PM, Johan Kragsterman wrote:
>> Hi, Kevin!
>>
>> What if you replace the drive with one of the hot spares? I mean, leave the
>> hot spare in its bay and configure it to replace the
>> problematic drive. Then you will find out whether the backplane has a bad
>> port or not. Always start by trying to narrow it down.
>>
>> Rgrds Johan
>>
>>
>>
>> -----"OmniOS-discuss" <omnios-discuss-bounces at lists.omniti.com> wrote: -----
>> To: omnios-discuss at lists.omniti.com
>> From: Kevin Swab
>> Sent by: "OmniOS-discuss"
>> Date: 2013.10.30 18:38
>> Subject: [OmniOS-discuss] multipath problem when replacing a failed SAS drive
>>
>> Hello,
>>
>> I'm running OmniOS r151006p on the following system:
>>
>> - Supermicro X8DT6 board, Xeon E5606 CPU, 48GB RAM
>> - Supermicro SC847 chassis, 36 drive bays, SAS expanders, LSI 9211-8i
>> controller
>> - 34 x Toshiba 3TB SAS drives (MG03SCA300) in one pool w/ 16 mirrored sets
>> + 2 hot spares
>>
>> 'mpathadm list lu' showed all drives as having two paths to the controller.
>>
>> Yesterday, one of the drives failed and was replaced.  The new drive is
>> only showing one path in mpathadm, and errors have started showing up
>> periodically in /var/adm/messages:
>>
>>
>>
>> # mpathadm list lu /dev/rdsk/c1t5000039478CA7150d0
>> mpath-support:  libmpscsi_vhci.so
>>         /dev/rdsk/c1t5000039478CA7150d0s2
>>                 Total Path Count: 1
>>                 Operational Path Count: 1
>>
>> Oct 30 09:30:22 hagler scsi: [ID 243001 kern.warning] WARNING:
>> /pci at 0,0/pci8086,3410 at 9/pci1000,3020 at 0 (mpt_sas0):
>> Oct 30 09:30:22 hagler  mptsas_handle_event_sync: IOCStatus=0x8000,
>> IOCLogInfo=0x31120101
>> Oct 30 09:30:22 hagler scsi: [ID 243001 kern.warning] WARNING:
>> /pci at 0,0/pci8086,3410 at 9/pci1000,3020 at 0 (mpt_sas0):
>> Oct 30 09:30:22 hagler  mptsas_handle_event: IOCStatus=0x8000,
>> IOCLogInfo=0x31120101
>> Oct 30 09:30:22 hagler scsi: [ID 365881 kern.info]
>> /pci at 0,0/pci8086,3410 at 9/pci1000,3020 at 0 (mpt_sas0):
>> Oct 30 09:30:22 hagler  Log info 0x31120101 received for target 89.
>> Oct 30 09:30:22 hagler  scsi_status=0x0, ioc_status=0x804b, scsi_state=0xc
>> Oct 30 09:30:22 hagler scsi: [ID 243001 kern.warning] WARNING:
>> /pci at 0,0/pci8086,3410 at 9/pci1000,3020 at 0 (mpt_sas0):
>> Oct 30 09:30:22 hagler  mptsas_handle_event_sync: IOCStatus=0x8000,
>> IOCLogInfo=0x31120101
>> Oct 30 09:30:22 hagler scsi: [ID 243001 kern.warning] WARNING:
>> /pci at 0,0/pci8086,3410 at 9/pci1000,3020 at 0 (mpt_sas0):
>> Oct 30 09:30:22 hagler  mptsas_handle_event: IOCStatus=0x8000,
>> IOCLogInfo=0x31120101
>> Oct 30 09:30:22 hagler scsi: [ID 365881 kern.info]
>> /pci at 0,0/pci8086,3410 at 9/pci1000,3020 at 0 (mpt_sas0):
>> Oct 30 09:30:22 hagler  Log info 0x31120101 received for target 89.
>> Oct 30 09:30:22 hagler  scsi_status=0x0, ioc_status=0x804b, scsi_state=0xc
>> Oct 30 09:30:22 hagler scsi: [ID 243001 kern.warning] WARNING:
>> /pci at 0,0/pci8086,3410 at 9/pci1000,3020 at 0 (mpt_sas0):
>> Oct 30 09:30:22 hagler  mptsas_handle_event_sync: IOCStatus=0x8000,
>> IOCLogInfo=0x31120101
>> Oct 30 09:30:22 hagler scsi: [ID 365881 kern.info]
>> /pci at 0,0/pci8086,3410 at 9/pci1000,3020 at 0 (mpt_sas0):
>> Oct 30 09:30:22 hagler  Log info 0x31120101 received for target 89.
>> Oct 30 09:30:22 hagler  scsi_status=0x0, ioc_status=0x804b, scsi_state=0xc
>> Oct 30 09:30:22 hagler scsi: [ID 243001 kern.warning] WARNING:
>> /pci at 0,0/pci8086,3410 at 9/pci1000,3020 at 0 (mpt_sas0):
>> Oct 30 09:30:22 hagler  mptsas_handle_event: IOCStatus=0x8000,
>> IOCLogInfo=0x31120101
>> Oct 30 09:30:22 hagler scsi: [ID 243001 kern.warning] WARNING:
>> /pci at 0,0/pci8086,3410 at 9/pci1000,3020 at 0 (mpt_sas0):
>> Oct 30 09:30:22 hagler  mptsas_handle_event_sync: IOCStatus=0x8000,
>> IOCLogInfo=0x31120101
>> Oct 30 09:30:22 hagler scsi: [ID 365881 kern.info]
>> /pci at 0,0/pci8086,3410 at 9/pci1000,3020 at 0 (mpt_sas0):
>> Oct 30 09:30:22 hagler  Log info 0x31120101 received for target 89.
>> Oct 30 09:30:22 hagler  scsi_status=0x0, ioc_status=0x804b, scsi_state=0xc
>> Oct 30 09:30:22 hagler scsi: [ID 243001 kern.warning] WARNING:
>> /pci at 0,0/pci8086,3410 at 9/pci1000,3020 at 0 (mpt_sas0):
>> Oct 30 09:30:22 hagler  mptsas_handle_event: IOCStatus=0x8000,
>> IOCLogInfo=0x31120101
>>
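
The two status values in these messages (IOCStatus=0x8000 and
ioc_status=0x804b) are easier to compare once the high bit is separated
from the status code.  The sketch below assumes, based on my reading of
the MPI 2.0 headers, that 0x8000 is a "log info available" flag and the
low 15 bits are the status code.  The name table is deliberately tiny:
"IOC Terminated" is what lsiutil prints for 004b elsewhere in this
thread, and "Success" for 0x0000 is likewise an assumption from those
headers.  decode_ioc_status is a hypothetical helper name.

```python
# Assumption: bit layout as in the MPI 2.0 headers (flag in the top bit,
# status code in the remaining 15 bits).
LOG_INFO_AVAILABLE = 0x8000
STATUS_MASK = 0x7FFF

# Not a complete table; only the codes relevant to this thread.
KNOWN_STATUS = {
    0x0000: "Success",         # assumed from the MPI 2.0 headers
    0x004B: "IOC Terminated",  # as printed by lsiutil in this thread
}


def decode_ioc_status(raw):
    """Split a raw IOCStatus value into status code, flag and name."""
    code = raw & STATUS_MASK
    return {
        "code": code,
        "log_info_available": bool(raw & LOG_INFO_AVAILABLE),
        "name": KNOWN_STATUS.get(code, "unknown"),
    }


if __name__ == "__main__":
    for raw in (0x8000, 0x804B):
        print(hex(raw), decode_ioc_status(raw))
```

Under those assumptions, 0x804b is the same 004b code that lsiutil
reports for target 89, with the flag bit set.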
>>
>>
>> The error messages refer to target 89, which I can confirm corresponds
>> to the missing path for my replacement drive using "lsiutil":
>>
>>
>>
>> # lsiutil -p 1 16
>>
>> LSI Logic MPT Configuration Utility, Version 1.63, June 4, 2009
>>
>> 1 MPT Port found
>>
>>      Port Name         Chip Vendor/Type/Rev    MPT Rev  Firmware Rev  IOC
>>  1.  mpt_sas0          LSI Logic SAS2008 03      200      0d000100     0
>>
>> SAS2008's links are 6.0 G, 6.0 G, 6.0 G, 6.0 G, 6.0 G, 6.0 G, 6.0 G, 6.0 G
>>
>>  B___T     SASAddress     PhyNum  Handle  Parent  Type
>> [ ... cut ... ]
>>  0  89  5000039478ca7152    17     0059    0032   SAS Target
>>  0  90  5000039478ca7153    17     005a    000a   SAS Target
>> [ ... cut ... ]
>>
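
Matching lsiutil targets to a device name by hand gets tedious with 34
drives, so here is a rough Python sketch of the lookup.  It rests on one
assumption taken only from this output: the two port SAS addresses
(...7152 and ...7153) share the first 15 hex digits with the WWN embedded
in the device name (c1t5000039478CA7150d0).  That may not hold for other
drive models.  The names parse_targets and targets_for_device are made up
for illustration.

```python
import re

# One row of the "B___T  SASAddress  PhyNum  Handle  Parent  Type" table.
ROW = re.compile(
    r"(\d+)\s+(\d+)\s+([0-9a-fA-F]{16})\s+(\d+)\s+"
    r"([0-9a-fA-F]{4})\s+([0-9a-fA-F]{4})\s+(.+?)\s*$"
)


def parse_targets(text):
    """Parse the target table printed by 'lsiutil -p 1 16'."""
    rows = []
    for line in text.splitlines():
        # Strip leading spaces and any mail quote markers.
        match = ROW.match(line.lstrip("> "))
        if match:
            bus, target, addr, phy, handle, parent, kind = match.groups()
            rows.append({
                "bus": int(bus), "target": int(target),
                "sas_address": addr.lower(), "phy": int(phy),
                "handle": handle, "parent": parent, "type": kind,
            })
    return rows


def targets_for_device(rows, device):
    """Return rows whose SAS address shares 15 hex digits with the device WWN."""
    match = re.search(r"t([0-9A-Fa-f]{16})d\d+", device)
    if match is None:
        raise ValueError("no 16-digit WWN found in %r" % device)
    prefix = match.group(1).lower()[:15]  # assumption: last digit is the port
    return [row for row in rows if row["sas_address"].startswith(prefix)]


SAMPLE = """\
 B___T     SASAddress     PhyNum  Handle  Parent  Type
 0  89  5000039478ca7152    17     0059    0032   SAS Target
 0  90  5000039478ca7153    17     005a    000a   SAS Target
"""

if __name__ == "__main__":
    for row in targets_for_device(parse_targets(SAMPLE), "c1t5000039478CA7150d0"):
        print(row["target"], row["sas_address"], row["parent"])
```

For the rows shown here it returns targets 89 and 90, which have
different Parent handles.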
>>
>>
>> When I ask "lsiutil" to rescan the bus, I see the following error when
>> it gets to target 89:
>>
>>
>>
>> # lsiutil -p 1 8
>>
>> LSI Logic MPT Configuration Utility, Version 1.63, June 4, 2009
>>
>> 1 MPT Port found
>>
>>      Port Name         Chip Vendor/Type/Rev    MPT Rev  Firmware Rev  IOC
>>  1.  mpt_sas0          LSI Logic SAS2008 03      200      0d000100     0
>>
>> SAS2008's links are 6.0 G, 6.0 G, 6.0 G, 6.0 G, 6.0 G, 6.0 G, 6.0 G, 6.0 G
>>
>>  B___T___L  Type       Vendor   Product          Rev
>> [ ... cut ... ]
>> ScsiIo to Bus 0 Target 89 failed, IOCStatus = 004b (IOC Terminated)
>>  0  90   0  Disk       TOSHIBA  MG03SCA300       0108  5000039478ca7153    17
>> [ ... cut ... ]
>>
>>
>>
>> This problem has happened to me once before on a similar system.  At
>> that time, I tried reseating the drive and tried several different
>> replacement drives; all had the same issue.  Even rebooting the
>> system didn't help.
>>
>> Does anyone know how I can clear this issue up?  I'd be happy to provide
>> any additional information that might be helpful.
>>
>> TIA,
>> Kevin
>>
>>
>>
>> -- 
>> -------------------------------------------------------------------
>> Kevin Swab                          UNIX Systems Administrator
>> ACNS                                Colorado State University
>> Phone: (970)491-6572                Email: Kevin.Swab at ColoState.EDU
>> GPG Fingerprint: 7026 3F66 A970 67BD 6F17  8EB8 8A7D 142F 2392 791C
>>
> 

-- 
-------------------------------------------------------------------
Kevin Swab                          UNIX Systems Administrator
ACNS                                Colorado State University
Phone: (970)491-6572                Email: Kevin.Swab at ColoState.EDU
GPG Fingerprint: 7026 3F66 A970 67BD 6F17  8EB8 8A7D 142F 2392 791C

