[OmniOS-discuss] ZeusRAM - predictive failure
Machine Man
gearboxes at outlook.com
Mon Apr 10 23:30:27 UTC 2017
Do you select drives based on DWPD?
I am struggling to find $500 - $700 drives in stock. I am limited to a number of distributors, and pretty much unless it's HP, Cisco or Dell it's not kept in stock. On a number of disk options I got a ship date of late June, and all 3 distributors indicated SSD supplies are constrained.
I am now down to adding a single SSD during busy hours or when the alerts start rolling in, and removing the ZIL after hours or when the load drops again.
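That add/remove cycle is just attaching and detaching a dedicated log vdev. A dry-run sketch (the pool and device names are placeholders, and the zpool invocations are echoed rather than executed; drop the leading echo to run for real):

```shell
#!/bin/sh
# Dry-run sketch of the busy-hours slog add/remove cycle.
# "tank" and the device name are placeholders -- substitute your own.

POOL=tank
SLOG=c0t5000A72A300B3D57d0

add_slog() {
    # Attach the SSD as a dedicated log (ZIL) device during busy hours
    echo zpool add "$POOL" log "$SLOG"
}

remove_slog() {
    # Detach it again after hours; dedicated log vdevs can be removed online
    echo zpool remove "$POOL" "$SLOG"
}

add_slog
remove_slog
```

Removing a log vdev is one of the few vdev removals ZFS has supported for a long time, so cycling it like this should be safe, if inelegant.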
My only other options for the next 3 weeks are:
1 - add 15K drives for ZIL and see if that helps.
2 - Hope for the best on the single old OCZ Talos 2
3 - Mix SAS/SATA on the same backplane.
I was 100% banking on the ZeusRAM since that is what I could get my hands on immediately.
________________________________
From: Richard Elling <richard.elling at richardelling.com>
Sent: Monday, April 10, 2017 5:49:55 PM
To: Machine Man
Cc: omnios-discuss at lists.omniti.com
Subject: Re: [OmniOS-discuss] ZeusRAM - predictive failure
On Apr 10, 2017, at 2:39 PM, Machine Man <gearboxes at outlook.com<mailto:gearboxes at outlook.com>> wrote:
Thank you. I am sending it back to where we purchased it from. I thought these were no longer available, but the distributor still listed them and had them in stock.
I was hesitant to purchase, but I am in desperate need for a ZIL.
ZeusRAMs have been EOL for a year or more. AIUI, the parts are no longer available to build them.
We do see better performance from the modern, enterprise-class, 12G SAS parts from HGST and Toshiba.
Unfortunately, they are priced by $/GB and not $/latency, so the smaller capacity (GB) drives are also slower.
— richard
________________________________
From: Richard Elling <richard.elling at richardelling.com<mailto:richard.elling at richardelling.com>>
Sent: Monday, April 10, 2017 4:15:32 PM
To: Machine Man
Cc: omnios-discuss at lists.omniti.com<mailto:omnios-discuss at lists.omniti.com>
Subject: Re: [OmniOS-discuss] ZeusRAM - predictive failure
On Apr 10, 2017, at 1:00 PM, Machine Man <gearboxes at outlook.com<mailto:gearboxes at outlook.com>> wrote:
Today I received one of the two ZeusRAMs that I ordered, both brand new. I was struggling to find SAS SSD drives available in my price range, and I desperately need to add a ZIL.
I decided to order the ZeusRAMs since they had one in stock, and figured I'd add it while waiting for the other one, as they really should not be prone to failure based on their design. I have not used them before and would normally just prefer regular SSD drives.
I slotted the ZeusRAM in and it began to blink rapidly, the same as the disks currently in the pool on that backplane. Running the format command would never return a list of disks. I left it for about 15 minutes and then pulled it, since the label on the disk says it can take up to 10 minutes for the caps. I could see amber and green LEDs on the drive itself blinking, even after it was removed.
I slotted it back in and the disk was then available. After a few minutes the fault light came on and the disk became unavailable due to the following:
Fault class : fault.io.disk.predictive-failure
This occurs when the drive responds to an I/O and indicates a predictive failure or
the periodic query for drives sees a predicted failure. It is the drive telling the OS that
the drive thinks it will fail. There is nothing you can do on the OS to “fix” this.
It is possible that HGST (nee STEC) can help with further diagnosis using the vendor-specific
log pages. Several years ago, STEC helped us with root cause of failing ultracapacitor in a drive.
AFAIK, there is no publicly available decoder for those log pages.
— richard
Affects : dev:///:devid=id1,sd@n5000a720300b3d57//pci@0,0/pci8086,340e@7/pci1000,3040@0/iport@f0/disk@w5000a72a300b3d57,0
faulted and taken out of service
FRU : "Slot 09" (hc://:product-id=LSI-SAS2X36:server-id=:chassis-id=50030480178cf57f:serial=STM000****:part=STEC-ZeusRAM:revision=C025/ses-enclosure=1/bay=8/disk=0)
faulty
Description : SMART health-monitoring firmware reported that a disk
failure is imminent.
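For scripting around these alerts, the interesting bits of the FMA report can be pulled out with standard tools. A sketch (the sample text mirrors the report above with the long hc:// path abbreviated; on a live system you would feed it real `fmadm faulty` output, and the parsing is my own, not anything from this thread):

```shell
#!/bin/sh
# Extract the fault class and enclosure bay from an FMA disk-fault report.
# $report is illustrative sample text shaped like the output above.

report='Fault class : fault.io.disk.predictive-failure
FRU : "Slot 09" (hc://:product-id=LSI-SAS2X36:part=STEC-ZeusRAM/bay=8/disk=0)'

# Field label and value are separated by " : "
fault_class=$(printf '%s\n' "$report" | awk -F' : ' '/^Fault class/ {print $2}')

# The bay number is embedded in the hc:// FRU path as /bay=N/
bay=$(printf '%s\n' "$report" | sed -n 's/.*\/bay=\([0-9]*\)\/.*/\1/p')

echo "class=$fault_class bay=$bay"
# prints: class=fault.io.disk.predictive-failure bay=8
```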
I cleared the fault and the drive was usable again for a few minutes before the same thing happened. Eventually the amber light on the disk itself (not the enclosure's disk light) stopped blinking, and the disk stayed online for quite some time before the alert above reappeared.
=== START OF INFORMATION SECTION ===
Vendor: STEC
Product: ZeusRAM
Revision: C025
Compliance: SPC-4
User Capacity: 8,000,000,000 bytes [8.00 GB]
Logical block size: 512 bytes
Rotation Rate: Solid State Device
Form Factor: 3.5 inches
Logical Unit id: 0x5000a720300b3d57
Serial number: STM000******
Device type: disk
Transport protocol: SAS (SPL-3)
Local Time is: Mon Apr 10 19:17:23 2017 UTC
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
Temperature Warning: Enabled
=== START OF READ SMART DATA SECTION ===
SMART Health Status: OK
Current Drive Temperature: 40 C
Drive Trip Temperature: 80 C
Elements in grown defect list: 0
Vendor (Seagate) cache information
Blocks sent to initiator = 0
Error counter log:
           Errors Corrected by           Total   Correction     Gigabytes    Total
               ECC          rereads/    errors   algorithm      processed    uncorrected
           fast | delayed   rewrites  corrected  invocations   [10^9 bytes]  errors
read:          0        0         0         0          0         21.323           0
write:         0        0         0         0          0         83.809           0
Non-medium error count: 0
Is there anything special that should be done for the ZeusRAM in sd.conf? It's a two-node install and both nodes can see all the drives. I don't see any SMART errors listed, but running fmadm shows the disk as faulty due to predictive failure.
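On the sd.conf question: since the ZeusRAM's cache is nonvolatile by design, one commonly circulated illumos tuning is to tell the sd driver not to bother with cache flushes for it. A sketch of the entry (this is a community-suggested tuning, not something confirmed in this thread; the vendor ID field is padded to 8 characters, so verify the exact padding against your drive's inquiry data before using):

```
# /kernel/drv/sd.conf
# Vendor ID padded to 8 characters ("STEC" + 4 spaces)
sd-config-list = "STEC    ZeusRAM", "cache-nonvolatile:true";
```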
OmniOS r20 all patches applied.
thanks,
_______________________________________________
OmniOS-discuss mailing list
OmniOS-discuss at lists.omniti.com<mailto:OmniOS-discuss at lists.omniti.com>
http://lists.omniti.com/mailman/listinfo/omnios-discuss
--
Richard.Elling at RichardElling.com<mailto:Richard.Elling at RichardElling.com>
+1-760-896-4422