[OmniOS-discuss] Re: multipath problem when replacing a failed SAS drive
Kevin Swab
Kevin.Swab at colostate.edu
Thu Oct 31 17:01:20 UTC 2013
I put the drive that's missing a path in its own pool and did some
reading and writing (filled the drive with zeros using 'dd', then read
them back). Other than a handful of errors in iostat and
/var/adm/messages (like the ones I reported before), everything appeared
to work fine:
# iostat -En c1t5000039478CA7150d0
c1t5000039478CA7150d0 Soft Errors: 0 Hard Errors: 2 Transport Errors: 29
Vendor: TOSHIBA Product: MG03SCA300 Revision: 0108 Serial No: Z2H0A008FTP3
Size: 3000.59GB <3000592982016 bytes>
Media Error: 0 Device Not Ready: 0 No Device: 2 Recoverable: 0
Illegal Request: 0 Predictive Failure Analysis: 0
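To see whether these counters keep growing between test runs, the summary line can be parsed and compared from one run to the next. A minimal sketch in Python, assuming the `iostat -En` layout shown above; the function name `parse_error_counters` is made up for illustration:

```python
import re

def parse_error_counters(line):
    """Return e.g. {'Soft Errors': 0, ...} from the first line of an `iostat -En` record."""
    # Each counter appears as "<Name> Errors: <n>"; the device name comes first
    # and is skipped because it is not followed by " Errors:".
    pairs = re.findall(r"(\w+ Errors): (\d+)", line)
    return {name: int(value) for name, value in pairs}

sample = "c1t5000039478CA7150d0 Soft Errors: 0 Hard Errors: 2 Transport Errors: 29"
print(parse_error_counters(sample))
```

Saving the result before and after a read/write pass gives a simple way to tell new transport errors from old ones.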
So the port on the backplane appears to be (at least partially)
functional. Where do you think I should go from here?
Thanks again,
Kevin
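
A note on reading the status codes in the quoted log below: in the LSI MPI 2.0 headers, bit 0x8000 of IOCStatus is a flag meaning "log info available", and the remaining bits carry the status code itself. Masking the flag off turns the ioc_status=0x804b from /var/adm/messages into 0x004b, the same "IOC Terminated" value that lsiutil prints for target 89. A small Python sketch of that decoding follows. The constant values are assumed from the MPI2 header definitions, and the lookup table covers only the codes seen in this thread:

```python
# Assumed from the MPI 2.0 header definitions; verify against your driver source.
IOCSTATUS_FLAG_LOG_INFO_AVAILABLE = 0x8000
IOCSTATUS_MASK = 0x7FFF

# Only the codes that appear in this thread; not a complete table.
KNOWN_STATUS = {
    0x0000: "Success",
    0x004B: "IOC Terminated",
}

def decode_ioc_status(raw):
    """Split a raw IOCStatus into (status code, log-info flag, status name)."""
    code = raw & IOCSTATUS_MASK
    has_log_info = bool(raw & IOCSTATUS_FLAG_LOG_INFO_AVAILABLE)
    return code, has_log_info, KNOWN_STATUS.get(code, "unknown")

code, has_log_info, name = decode_ioc_status(0x804B)
print(f"{code:#06x} log_info={has_log_info} {name}")
```

Read this way, the IOCStatus=0x8000 lines carry no error code of their own, only the log-info flag; the failure itself is the 0x004b reported for target 89.
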
On 10/30/2013 12:13 PM, Kevin Swab wrote:
> The problem drive is currently configured as a hot-spare (it replaced
> the old hot-spare, which kicked in when the original drive failed), but
> I'll remove it from the pool and do some testing with it and report back...
>
> Thanks!
> Kevin
>
> On 10/30/2013 12:02 PM, Johan Kragsterman wrote:
>> Hi, Kevin!
>>
>> What if you replace the drive with one of the hot spares? I mean, leave the
>> hot spare in its place and configure it to replace the
>> problematic drive. Then you will find out whether the backplane has a bad
>> port or not. Always start by trying to narrow it down.
>>
>> Rgrds Johan
>>
>>
>>
>> -----"OmniOS-discuss" <omnios-discuss-bounces at lists.omniti.com> wrote: -----
>> To: omnios-discuss at lists.omniti.com
>> From: Kevin Swab
>> Sent by: "OmniOS-discuss"
>> Date: 2013.10.30 18:38
>> Subject: [OmniOS-discuss] multipath problem when replacing a failed SAS drive
>>
>> Hello,
>>
>> I'm running OmniOS r151006p on the following system:
>>
>> - Supermicro X8DT6 board, Xeon E5606 CPU, 48GB RAM
>> - Supermicro SC847 chassis, 36 drive bays, SAS expanders, LSI 9211-8i
>> controller
>> - 34 x Toshiba 3TB SAS drives MG03SCA300 in one pool w/ 16 mirrored sets
>> + 2 hot spares
>>
>> 'mpathadm list lu' showed all drives as having two paths to the controller.
>>
>> Yesterday, one of the drives failed and was replaced. The new drive is
>> only showing one path in mpathadm, and errors have started showing up
>> periodically in /var/adm/messages:
>>
>>
>>
>> # mpathadm list lu /dev/rdsk/c1t5000039478CA7150d0
>> mpath-support: libmpscsi_vhci.so
>> /dev/rdsk/c1t5000039478CA7150d0s2
>> Total Path Count: 1
>> Operational Path Count: 1
>>
>> Oct 30 09:30:22 hagler scsi: [ID 243001 kern.warning] WARNING:
>> /pci@0,0/pci8086,3410@9/pci1000,3020@0 (mpt_sas0):
>> Oct 30 09:30:22 hagler mptsas_handle_event_sync: IOCStatus=0x8000,
>> IOCLogInfo=0x31120101
>> Oct 30 09:30:22 hagler scsi: [ID 243001 kern.warning] WARNING:
>> /pci@0,0/pci8086,3410@9/pci1000,3020@0 (mpt_sas0):
>> Oct 30 09:30:22 hagler mptsas_handle_event: IOCStatus=0x8000,
>> IOCLogInfo=0x31120101
>> Oct 30 09:30:22 hagler scsi: [ID 365881 kern.info]
>> /pci@0,0/pci8086,3410@9/pci1000,3020@0 (mpt_sas0):
>> Oct 30 09:30:22 hagler Log info 0x31120101 received for target 89.
>> Oct 30 09:30:22 hagler scsi_status=0x0, ioc_status=0x804b, scsi_state=0xc
>> Oct 30 09:30:22 hagler scsi: [ID 243001 kern.warning] WARNING:
>> /pci@0,0/pci8086,3410@9/pci1000,3020@0 (mpt_sas0):
>> Oct 30 09:30:22 hagler mptsas_handle_event_sync: IOCStatus=0x8000,
>> IOCLogInfo=0x31120101
>> Oct 30 09:30:22 hagler scsi: [ID 243001 kern.warning] WARNING:
>> /pci@0,0/pci8086,3410@9/pci1000,3020@0 (mpt_sas0):
>> Oct 30 09:30:22 hagler mptsas_handle_event: IOCStatus=0x8000,
>> IOCLogInfo=0x31120101
>> Oct 30 09:30:22 hagler scsi: [ID 365881 kern.info]
>> /pci@0,0/pci8086,3410@9/pci1000,3020@0 (mpt_sas0):
>> Oct 30 09:30:22 hagler Log info 0x31120101 received for target 89.
>> Oct 30 09:30:22 hagler scsi_status=0x0, ioc_status=0x804b, scsi_state=0xc
>> Oct 30 09:30:22 hagler scsi: [ID 243001 kern.warning] WARNING:
>> /pci@0,0/pci8086,3410@9/pci1000,3020@0 (mpt_sas0):
>> Oct 30 09:30:22 hagler mptsas_handle_event_sync: IOCStatus=0x8000,
>> IOCLogInfo=0x31120101
>> Oct 30 09:30:22 hagler scsi: [ID 365881 kern.info]
>> /pci@0,0/pci8086,3410@9/pci1000,3020@0 (mpt_sas0):
>> Oct 30 09:30:22 hagler Log info 0x31120101 received for target 89.
>> Oct 30 09:30:22 hagler scsi_status=0x0, ioc_status=0x804b, scsi_state=0xc
>> Oct 30 09:30:22 hagler scsi: [ID 243001 kern.warning] WARNING:
>> /pci@0,0/pci8086,3410@9/pci1000,3020@0 (mpt_sas0):
>> Oct 30 09:30:22 hagler mptsas_handle_event: IOCStatus=0x8000,
>> IOCLogInfo=0x31120101
>> Oct 30 09:30:22 hagler scsi: [ID 243001 kern.warning] WARNING:
>> /pci@0,0/pci8086,3410@9/pci1000,3020@0 (mpt_sas0):
>> Oct 30 09:30:22 hagler mptsas_handle_event_sync: IOCStatus=0x8000,
>> IOCLogInfo=0x31120101
>> Oct 30 09:30:22 hagler scsi: [ID 365881 kern.info]
>> /pci@0,0/pci8086,3410@9/pci1000,3020@0 (mpt_sas0):
>> Oct 30 09:30:22 hagler Log info 0x31120101 received for target 89.
>> Oct 30 09:30:22 hagler scsi_status=0x0, ioc_status=0x804b, scsi_state=0xc
>> Oct 30 09:30:22 hagler scsi: [ID 243001 kern.warning] WARNING:
>> /pci@0,0/pci8086,3410@9/pci1000,3020@0 (mpt_sas0):
>> Oct 30 09:30:22 hagler mptsas_handle_event: IOCStatus=0x8000,
>> IOCLogInfo=0x31120101
>>
>>
>>
>> The error messages refer to target 89, which I can confirm corresponds
>> to the missing path for my replacement drive using "lsiutil":
>>
>>
>>
>> # lsiutil -p 1 16
>>
>> LSI Logic MPT Configuration Utility, Version 1.63, June 4, 2009
>>
>> 1 MPT Port found
>>
>>      Port Name         Chip Vendor/Type/Rev    MPT Rev  Firmware Rev  IOC
>>  1.  mpt_sas0          LSI Logic SAS2008 03      200      0d000100     0
>>
>> SAS2008's links are 6.0 G, 6.0 G, 6.0 G, 6.0 G, 6.0 G, 6.0 G, 6.0 G, 6.0 G
>>
>>  B___T     SASAddress     PhyNum  Handle  Parent  Type
>> [ ... cut ... ]
>>  0  89  5000039478ca7152    17     0059    0032   SAS Target
>>  0  90  5000039478ca7153    17     005a    000a   SAS Target
>> [ ... cut ... ]
>>
>>
>>
>> When I ask "lsiutil" to rescan the bus, I see the following error when
>> it gets to target 89:
>>
>>
>>
>> # lsiutil -p 1 8
>>
>> LSI Logic MPT Configuration Utility, Version 1.63, June 4, 2009
>>
>> 1 MPT Port found
>>
>>      Port Name         Chip Vendor/Type/Rev    MPT Rev  Firmware Rev  IOC
>>  1.  mpt_sas0          LSI Logic SAS2008 03      200      0d000100     0
>>
>> SAS2008's links are 6.0 G, 6.0 G, 6.0 G, 6.0 G, 6.0 G, 6.0 G, 6.0 G, 6.0 G
>>
>>  B___T___L  Type       Vendor   Product          Rev
>> [ ... cut ... ]
>> ScsiIo to Bus 0 Target 89 failed, IOCStatus = 004b (IOC Terminated)
>>  0  90   0  Disk       TOSHIBA  MG03SCA300       0108  5000039478ca7153    17
>> [ ... cut ... ]
>>
>>
>>
>> This problem has happened to me once before on a similar system. At
>> that time, I tried reseating the drive and tried several different
>> replacement drives; all had the same issue. I even tried rebooting the
>> system, and that didn't help.
>>
>> Does anyone know how I can clear this issue up? I'd be happy to provide
>> any additional information that might be helpful.
>>
>> TIA,
>> Kevin
>>
>>
>>
>> --
>> -------------------------------------------------------------------
>> Kevin Swab UNIX Systems Administrator
>> ACNS Colorado State University
>> Phone: (970)491-6572 Email: Kevin.Swab at ColoState.EDU
>> GPG Fingerprint: 7026 3F66 A970 67BD 6F17 8EB8 8A7D 142F 2392 791C
>