[OmniOS-discuss] Overheating faults with ST4000NM0023
Richard Elling
richard.elling at richardelling.com
Tue Apr 22 22:42:51 UTC 2014
going out on a limb...
On Apr 22, 2014, at 2:02 PM, Saso Kiselkov <skiselkov.ml at gmail.com> wrote:
> On 4/22/14, 10:31 PM, Schweiss, Chip wrote:
>>
>> On Tue, Apr 22, 2014 at 3:17 PM, Saso Kiselkov <skiselkov.ml at gmail.com> wrote:
>>
>>
>> I know, but if I understand it correctly, I need to not only disable a
>> particular path, I need to disable mpath support entirely to get
>> sg_write_buffer to talk to mpt_sas directly, instead of going through
>> the scsi_vhci glob in the middle (which, presumably, is what's causing
>> this problem). If I'm misunderstanding this, please do set me straight.
>>
>> Cheers,
>> --
>> Saso
>>
>>
>> Actually no. Disabling a physical path works too. That is how I
>> stumbled upon the MP issue. I plugged one of my paths into a second
>> server to attempt using Linux to flash the firmware. When the flash
>> started working from the primary server, I never loaded Linux in the
>> second server.
>>
>> I think the problem is actually in the disk accepting firmware via
>> multipath not so much the OS. The OS throws the error when a message
>> down a second path gets rejected by the drive.
This is plausible. With the default multipath policy of round-robin, the
multipath layer will split such big transfers across both ports. One would think
that the drives would treat this as one server, multiple queues, but my recent
experience with drive firmware bugs reaffirms the old adage: never assume anything.
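
To picture what round-robin might do to a firmware download, here is a rough Python sketch. It is only a model, not how scsi_vhci is implemented: the function name round_robin_plan and the 256 KiB chunk size are made up for illustration, and only the image length and the two target port names come from the output quoted in this thread.

```python
from itertools import cycle


def round_robin_plan(total_bytes, chunk_bytes, paths):
    """Split a transfer into chunks and assign each chunk to the next path.

    Hypothetical model of a round-robin policy: returns a list of
    (offset, length, path) tuples covering total_bytes exactly once.
    """
    plan = []
    next_path = cycle(paths)
    offset = 0
    while offset < total_bytes:
        length = min(chunk_bytes, total_bytes - offset)
        plan.append((offset, length, next(next_path)))
        offset += length
    return plan


# Target port names as reported by "mpathadm show lu" in this thread.
PATHS = ["w5000c500578f774a", "w5000c500578f7749"]

# 1625600 is the --length passed to sg_write_buffer; 256 KiB is an assumed chunk size.
plan = round_robin_plan(1625600, 256 * 1024, PATHS)

for offset, length, path in plan:
    print(f"offset={offset:>8} len={length:>7} -> target port {path}")
```

If the drive expects every piece of the image to arrive on the same port, a plan like this would hand it alternating pieces on each port, which is the kind of thing that could make it reject the download.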
>
> Still no luck, though it's possible I'm doing it wrong:
>
> # mpathadm disable path -l /dev/rdsk/c9t5000C500578F774Bd0s2 \
> -i w5b8ca3a0e5029c00 -t w5000c500578f774a
>
> # mpathadm show lu /dev/rdsk/c9t5000C500578F774Bd0s2
> Logical Unit: /dev/rdsk/c9t5000C500578F774Bd0s2
> mpath-support: libmpscsi_vhci.so
> Vendor: SEAGATE
> Product: ST2000NM0023
> Revision: 0003
> Name Type: unknown type
> Name: 5000c500578f774b
> Asymmetric: no
> Current Load Balance: round-robin
> Logical Unit Group ID: NA
> Auto Failback: on
> Auto Probing: NA
>
> Paths:
> Initiator Port Name: w5b8ca3a0e5029c00
> Target Port Name: w5000c500578f774a
> Override Path: NA
> Path State: OK
> Disabled: yes
>
> Initiator Port Name: w5b8ca3a0e5029c00
> Target Port Name: w5000c500578f7749
> Override Path: NA
> Path State: OK
> Disabled: no
The other lesson I've learned recently is that some drive firmware is
keyed to look at one port over the other for certain operations :-(
While I have no knowledge or suspicion of it in this specific case, you might
try switching ports.
>
> Target Ports:
> Name: w5000c500578f774a
> Relative ID: 0
>
> Name: w5000c500578f7749
> Relative ID: 0
>
> # sg_write_buffer -v --in=MegalodonES3-SAS-STD-0004.LOD \
> --length=1625600 --mode=5 /dev/rdsk/c9t5000C500578F774Bd0
> Write buffer cmd: 3b 05 00 00 00 00 18 ce 00 00
> ioctl(USCSICMD) failed with os_err (errno) = 22
> write buffer: pass through os error: Invalid argument
> Write buffer failed res=-1
>
> The situation is the same regardless of which path I disable. At the
> point of the sg_write_buffer, I also get a single SCSI error logged by
> "iostat -E", so it's clear that something is going wrong on the SCSI
> bus. I suspect it might have something to do with what you mentioned,
> but I'm just not enough of a SCSI guru to figure this out.
fmdump -eV shows SCSI error reports in detail.
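
As a sanity check on the command itself, the CDB that sg_write_buffer printed in the quoted output can be decoded by hand. The sketch below assumes the standard 10-byte WRITE BUFFER layout from SPC (opcode 3Bh, mode in the low five bits of byte 1, 3-byte buffer offset, 3-byte parameter list length); the helper name decode_write_buffer is made up for illustration.

```python
def decode_write_buffer(cdb_hex):
    """Decode a 10-byte SCSI WRITE BUFFER CDB given as a hex string."""
    cdb = bytes.fromhex(cdb_hex.replace(" ", ""))
    if len(cdb) != 10 or cdb[0] != 0x3B:
        raise ValueError("not a 10-byte WRITE BUFFER CDB")
    return {
        "opcode": cdb[0],
        "mode": cdb[1] & 0x1F,  # low 5 bits of byte 1
        "buffer_id": cdb[2],
        "buffer_offset": int.from_bytes(cdb[3:6], "big"),
        "parameter_list_length": int.from_bytes(cdb[6:9], "big"),
        "control": cdb[9],
    }


# CDB bytes exactly as printed by "sg_write_buffer -v" in the quoted output.
fields = decode_write_buffer("3b 05 00 00 00 00 18 ce 00 00")

for name, value in fields.items():
    print(f"{name:>22}: {value} (0x{value:x})")
```

The decoded mode is 5 and the parameter list length is 0x18ce00 = 1625600, matching --mode=5 and --length=1625600 on the command line. So the CDB is built as requested, and the whole image is being sent as a single transfer.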
-- richard
--
Richard.Elling at RichardElling.com
+1-760-896-4422