[OmniOS-discuss] ZeusRAM - predictive failure

Machine Man gearboxes at outlook.com
Mon Apr 10 20:00:50 UTC 2017


Today I received one of the two ZeusRAM drives I ordered, both brand new. I was struggling to find SAS SSDs in my price range, and I desperately need to add a ZIL.

I decided to order ZeusRAM since they had one in stock, and figured I would add it while waiting for the other one, since by design they really should not be prone to failure. I have not used them before and would normally prefer regular SSDs.


I slotted the ZeusRAM in and it began to blink rapidly, the same as the disks currently in the pool on that backplane. Running the format command never returned a list of disks. I left it for about 15 min and then pulled it, since the label on the drive says the capacitors can take up to 10 min to charge. I could see an amber and a green LED on the drive itself blinking, even after it was removed.

I slotted it back in and the disk was then available. After a few min the fault light came on and the disk became unavailable due to the following:


Fault class : fault.io.disk.predictive-failure
Affects     : dev:///:devid=id1,sd@n5000a720300b3d57//pci@0,0/pci8086,340e@7/pci1000,3040@0/iport@f0/disk@w5000a72a300b3d57,0
                  faulted and taken out of service
FRU         : "Slot 09" (hc://:product-id=LSI-SAS2X36:server-id=:chassis-id=50030480178cf57f:serial=STM000****:part=STEC-ZeusRAM:revision=C025/ses-enclosure=1/bay=8/disk=0)
                  faulty

Description : SMART health-monitoring firmware reported that a disk
              failure is imminent.
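For anyone scripting around alerts like this, a minimal Python sketch of splitting a fault record such as the one above into fields. It assumes the `Key : value` layout shown, with indented lines continuing the previous field; `parse_fault_record` is a hypothetical helper, not part of fmadm or FMA.

```python
import re

# The fault record quoted above, as plain text.
SAMPLE = """\
Fault class : fault.io.disk.predictive-failure
Affects     : dev:///:devid=id1,sd@n5000a720300b3d57//pci@0,0/pci8086,340e@7/pci1000,3040@0/iport@f0/disk@w5000a72a300b3d57,0
                  faulted and taken out of service
FRU         : "Slot 09" (hc://:product-id=LSI-SAS2X36:server-id=:chassis-id=50030480178cf57f:serial=STM000****:part=STEC-ZeusRAM:revision=C025/ses-enclosure=1/bay=8/disk=0)
                  faulty

Description : SMART health-monitoring firmware reported that a disk
              failure is imminent.
"""

# Assumption: a field starts at column 0 as "Key : value", and
# indented lines continue the most recent field.
FIELD = re.compile(r"^(\S[^:]*?)\s*:\s(.*)$")


def parse_fault_record(text: str) -> dict:
    """Parse one fmadm-style fault record into {field: value} (hypothetical helper)."""
    fields = {}
    key = None
    for line in text.splitlines():
        if not line.strip():
            continue  # blank lines only separate groups of fields
        m = FIELD.match(line)
        if m:
            key = m.group(1)
            fields[key] = m.group(2).strip()
        elif key is not None:
            # Indented continuation: append to the previous field's value.
            fields[key] += " " + line.strip()
    return fields


record = parse_fault_record(SAMPLE)
print(record["Fault class"])
```

Grouping continuation lines with their field keeps the status text ("faulted and taken out of service", "faulty") attached to the device and FRU it describes.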



I cleared the fault and the drive was usable again for a few minutes, then the same thing happened. Eventually the amber light on the disk itself (not the enclosure disk light) stopped blinking, and the disk stayed online for quite some time before the alert above reappeared.



=== START OF INFORMATION SECTION ===
Vendor:               STEC
Product:              ZeusRAM
Revision:             C025
Compliance:           SPC-4
User Capacity:        8,000,000,000 bytes [8.00 GB]
Logical block size:   512 bytes
Rotation Rate:        Solid State Device
Form Factor:          3.5 inches
Logical Unit id:      0x5000a720300b3d57
Serial number:        STM000******
Device type:          disk
Transport protocol:   SAS (SPL-3)
Local Time is:        Mon Apr 10 19:17:23 2017 UTC
SMART support is:     Available - device has SMART capability.
SMART support is:     Enabled
Temperature Warning:  Enabled

=== START OF READ SMART DATA SECTION ===
SMART Health Status: OK

Current Drive Temperature:     40 C
Drive Trip Temperature:        80 C

Elements in grown defect list: 0

Vendor (Seagate) cache information
  Blocks sent to initiator = 0
  Blocks sent to initiator = 0

Error counter log:
           Errors Corrected by           Total   Correction     Gigabytes    Total
               ECC          rereads/    errors   algorithm      processed    uncorrected
           fast | delayed   rewrites  corrected  invocations   [10^9 bytes]  errors
read:          0        0         0         0          0         21.323           0
write:         0        0         0         0          0         83.809           0

Non-medium error count:        0
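To check the "no SMART errors" reading programmatically, a rough Python sketch that pulls the health-related fields out of smartctl's SCSI text output, assuming the layout shown above. `smart_summary` and `looks_clean` are illustrative names, not part of smartmontools. The summary only reflects the fields smartctl prints here; it cannot explain why FMA reached a different verdict.

```python
import re

# Excerpt of the smartctl output quoted above (only the lines this sketch reads).
SMART_OUTPUT = """\
SMART Health Status: OK
Elements in grown defect list: 0
read:          0        0         0         0          0         21.323           0
write:         0        0         0         0          0         83.809           0
Non-medium error count:        0
"""


def _first(pattern: str, text: str):
    """Return capture group 1 of the first multiline match, or None."""
    m = re.search(pattern, text, re.MULTILINE)
    return m.group(1) if m else None


def smart_summary(text: str) -> dict:
    """Collect health status, defect count and error counters (assumed SCSI layout)."""
    defects = _first(r"^Elements in grown defect list:\s*(\d+)", text)
    non_medium = _first(r"^Non-medium error count:\s*(\d+)", text)
    uncorrected = {}
    # Rows of the "Error counter log" table; the last column is
    # "Total uncorrected errors".
    for m in re.finditer(r"^(read|write|verify):\s+(.+)$", text, re.MULTILINE):
        uncorrected[m.group(1)] = int(m.group(2).split()[-1])
    return {
        "health": _first(r"^SMART Health Status:\s*(\S+)", text),
        "grown_defects": int(defects) if defects is not None else None,
        "non_medium_errors": int(non_medium) if non_medium is not None else None,
        "uncorrected": uncorrected,
    }


def looks_clean(summary: dict) -> bool:
    """True only if every field was found and none reports a problem."""
    return (
        summary["health"] == "OK"
        and summary["grown_defects"] == 0
        and summary["non_medium_errors"] == 0
        and bool(summary["uncorrected"])
        and all(v == 0 for v in summary["uncorrected"].values())
    )


print(looks_clean(smart_summary(SMART_OUTPUT)))
```

Missing fields are treated as "not clean" rather than silently passing, so a changed output format does not hide a problem.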




Is there anything special that should be done for ZeusRAM in sd.conf? It's a two-node install and both nodes can see all the drives. I don't see any SMART errors listed, but fmadm shows the disk as faulty due to predictive failure.

OmniOS r20, all patches applied.



thanks,

