[OmniOS-discuss] Bad ZeusRAM? How to tell if another component is causing issues?

wuffers moo at wuffers.net
Wed Dec 3 06:02:40 UTC 2014


I'm at home just looking into the health of our SAN and came across a bunch
of errors on the STEC ZeusRAM (in a mirrored log configuration):

# iostat -En
c12t5000A72B300780FFd0 Soft Errors: 0 Hard Errors: 1 Transport Errors: 5224
Vendor: STEC     Product: ZeusRAM          Revision: C018 Serial No: STM000170C98
Size: 8.00GB <8000000000 bytes>
Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0
Illegal Request: 391 Predictive Failure Analysis: 0

# fmdump -eV
Dec 03 2014 00:26:22.592888816 ereport.io.scsi.cmd.disk.recovered
nvlist version: 0
        class = ereport.io.scsi.cmd.disk.recovered
        ena = 0xd38b237e7ed02001
        detector = (embedded nvlist)
        nvlist version: 0
                version = 0x0
                scheme = dev
                device-path = /pci@0,0/pci8086,3c08@3/pci1000,3030@0/iport@f/disk@w5000a72b300780ff,0
                devid = id1,sd@n5000a720300780ff
        (end detector)

        devid = id1,sd@n5000a720300780ff
        driver-assessment = recovered
        op-code = 0x2a
        cdb = 0x2a 0x0 0x0 0x2d 0xda 0x0 0x0 0x0 0xf8 0x0
        pkt-reason = 0x0
        pkt-state = 0x1f
        pkt-stats = 0x0
        __ttl = 0x1
        __tod = 0x547e9efe 0x2356c3f0
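
The `cdb` and `__tod` fields in the ereport above can be decoded by hand. The following is a minimal sketch, assuming the standard 10-byte READ(10)/WRITE(10) CDB layout (opcode in byte 0, big-endian LBA in bytes 2-5, big-endian transfer length in bytes 7-8) and assuming `__tod` holds seconds since the epoch followed by nanoseconds. The function names are hypothetical helpers, not part of any FMA tooling.

```python
from datetime import datetime, timezone


def decode_rw10_cdb(cdb_text):
    """Split a 10-byte READ(10)/WRITE(10) CDB (hex tokens) into its fields."""
    cdb = [int(tok, 16) for tok in cdb_text.split()]
    if len(cdb) != 10:
        raise ValueError("expected a 10-byte CDB")
    names = {0x28: "READ(10)", 0x2A: "WRITE(10)"}
    return {
        "opcode": names.get(cdb[0], hex(cdb[0])),
        # Bytes 2-5: logical block address, big-endian.
        "lba": int.from_bytes(bytes(cdb[2:6]), "big"),
        # Bytes 7-8: transfer length in blocks, big-endian.
        "blocks": int.from_bytes(bytes(cdb[7:9]), "big"),
    }


def decode_tod(tod_text):
    """Assumption: __tod is '<seconds since epoch> <nanoseconds>' in hex."""
    secs, nsecs = (int(tok, 16) for tok in tod_text.split())
    return datetime.fromtimestamp(secs, tz=timezone.utc), nsecs


cdb = decode_rw10_cdb("0x2a 0x0 0x0 0x2d 0xda 0x0 0x0 0x0 0xf8 0x0")
when, nsecs = decode_tod("0x547e9efe 0x2356c3f0")
print(cdb)                       # {'opcode': 'WRITE(10)', 'lba': 3004928, 'blocks': 248}
print(when.isoformat(), nsecs)   # 2014-12-03T05:26:22+00:00 592888816
```

Under those assumptions the recovered command is a 248-block write, which is the kind of I/O one would expect on a log device, and the nanosecond value matches the fractional part of the timestamp on the ereport header line.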

# dmesg
Dec  3 00:28:24 san1 scsi: [ID 365881 kern.info] /pci@0,0/pci8086,3c08@3/pci1000,3030@0 (mpt_sas1):
Dec  3 00:28:24 san1    Log info 0x31120303 received for target 10 w5000a72b300780ff.
Dec  3 00:28:24 san1    scsi_status=0x0, ioc_status=0x804b, scsi_state=0xc
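
The three hex values in the mpt_sas message can be split into bit fields. This is a rough sketch only: the constants below are assumptions recalled from the LSI MPI 2.0 headers (`mpi2_ioc.h`, `mpi2_init.h`) and the SAS log-info layout, so they should be checked against the driver source before being relied on. The function name `decode_mpt` is hypothetical.

```python
# Assumed MPI 2.0 constants; verify against the mpt_sas headers.
IOCSTATUS_FLAG_LOG_INFO_AVAILABLE = 0x8000
IOCSTATUS_MASK = 0x7FFF
IOCSTATUS_NAMES = {0x004B: "SCSI_IOC_TERMINATED"}
SCSI_STATE_BITS = {
    0x01: "AUTOSENSE_VALID",
    0x02: "AUTOSENSE_FAILED",
    0x04: "NO_SCSI_STATUS",
    0x08: "TERMINATED",
    0x10: "RESPONSE_INFO_VALID",
}
LOGINFO_TYPES = {0x3: "SAS"}
LOGINFO_ORIGINATORS = {0x0: "IOP", 0x1: "PL", 0x2: "IR"}


def decode_mpt(log_info, ioc_status, scsi_state):
    """Break the logged values into named fields; unknown values stay as hex."""
    status = ioc_status & IOCSTATUS_MASK
    bus_type = (log_info >> 28) & 0xF
    originator = (log_info >> 24) & 0xF
    return {
        "ioc_status": IOCSTATUS_NAMES.get(status, hex(status)),
        "log_info_available": bool(ioc_status & IOCSTATUS_FLAG_LOG_INFO_AVAILABLE),
        "scsi_state": [name for bit, name in sorted(SCSI_STATE_BITS.items())
                       if scsi_state & bit],
        "log_type": LOGINFO_TYPES.get(bus_type, hex(bus_type)),
        "originator": LOGINFO_ORIGINATORS.get(originator, hex(originator)),
        "code": (log_info >> 16) & 0xFF,   # left numeric on purpose
        "subcode": log_info & 0xFFFF,
    }


decoded = decode_mpt(0x31120303, 0x804B, 0xC)
for key, value in decoded.items():
    print(f"{key}: {value}")
```

If the assumed constants are right, the message says the IOC terminated the I/O before any SCSI status came back, with the log info originating in the SAS protocol layer. That alone does not say whether the drive or the path to it is at fault.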

from format:

57. c12t5000A72B300780FFd0 <STEC-ZeusRAM-C018-7.45GB>
          /pci@0,0/pci8086,3c08@3/pci1000,3030@0/iport@f/disk@w5000a72b300780ff,0

Both fmdump and dmesg show these errors repeating over and over. Everything
seems to point to the drive. I suppose I would have to physically move the
drive to rule out cable, backplane, or controller issues. Is there another
way to tell just from these error logs, or is the physical test the way to go?

Are logs enough to justify an RMA?
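
One check that can be done from the logs alone is to tally ereports per device path and see whether a single disk stands out under its parent node. The following is a minimal sketch, assuming `fmdump -eV` text output in the layout shown above, with each `device-path` on one line and `@` in the path. `tally_ereports` and `disks_per_parent` are hypothetical helper names, and the sample text is trimmed from the ereport in this post and repeated twice purely for illustration.

```python
from collections import Counter

# Illustrative sample: the ereport from this post, trimmed and repeated.
SAMPLE = """\
Dec 03 2014 00:26:22.592888816 ereport.io.scsi.cmd.disk.recovered
nvlist version: 0
        class = ereport.io.scsi.cmd.disk.recovered
                device-path = /pci@0,0/pci8086,3c08@3/pci1000,3030@0/iport@f/disk@w5000a72b300780ff,0
Dec 03 2014 00:28:24.000000000 ereport.io.scsi.cmd.disk.recovered
nvlist version: 0
        class = ereport.io.scsi.cmd.disk.recovered
                device-path = /pci@0,0/pci8086,3c08@3/pci1000,3030@0/iport@f/disk@w5000a72b300780ff,0
"""


def tally_ereports(fmdump_text):
    """Count (device-path, class) pairs found in `fmdump -eV` text."""
    counts = Counter()
    current_class = None
    for raw in fmdump_text.splitlines():
        line = raw.strip()
        if line.startswith("class = "):
            current_class = line[len("class = "):]
        elif line.startswith("device-path = "):
            counts[(line[len("device-path = "):], current_class)] += 1
    return counts


def disks_per_parent(counts):
    """Map each parent node (path before /disk@) to its number of erroring disks."""
    parents = {}
    for path, _cls in counts:
        parent = path.split("/disk@", 1)[0]
        parents.setdefault(parent, set()).add(path)
    return {parent: len(disks) for parent, disks in parents.items()}


counts = tally_ereports(SAMPLE)
for (path, cls), n in counts.most_common():
    print(n, cls, path)
print(disks_per_parent(counts))
```

To run it on real data, the output of `fmdump -eV` could be captured to a file and passed to `tally_ereports`. If only one disk under a given parent reports errors while its siblings are clean, that leans toward the drive or its own slot rather than the controller, though it cannot separate the drive from its slot or cable lane; a physical swap is still the more conclusive test.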