[OmniOS-discuss] zpool degraded while SMART says disks are OK

Richard Elling richard.elling at richardelling.com
Fri Mar 21 23:37:50 UTC 2014


On Mar 21, 2014, at 3:23 PM, Tobias Oetiker <tobi at oetiker.ch> wrote:

> Today Zach Malone wrote:
> 
>> On Fri, Mar 21, 2014 at 3:50 PM, Richard Elling
>> <richard.elling at richardelling.com> wrote:
>>> 
>>> On Mar 21, 2014, at 9:48 AM, Tobias Oetiker <tobi at oetiker.ch> wrote:
>>> 
>>> a zpool on one of our boxes has been degraded with several disks
>>> faulted ...
>>> 
>>> * the disks are all sas direct attached
>>> * according to smartctl the offending disks have no faults.
>>> * zfs decided to fault the disks after the events below.
>>> 
>>> I have now told the pool to clear the errors and it is resilvering the disks
>>> ... (in progress)
>>> 
>>> any idea what is happening here ?
>> 
>> ...
>> 
>> Did all the disks fault at the same time, or was it spread out over a
>> longer period?  I'd suspect your power supply or disk controller.
>> What are your zpool errors?
> 
> It happened over time, as you can see from the timestamps in the
> log. From ZFS's point of view the errors were 1 read and about 30 write.
> 
> But according to SMART the disks are without flaw.

Actually, SMART is pretty dumb. In most cases, it only looks for uncorrectable
errors that are related to media or heads. For clues about more permanent
problems, look at the read/write error counters for errors that were
corrected, possibly after a delay. You can also look at the grown defects list.
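
As a rough illustration of that check, here is a minimal Python sketch that
pulls the delayed-correction counts and the grown defect count out of
smartctl-style text. The sample text is made up and only modelled on the
layout smartctl typically prints for SCSI/SAS disks; the field names, the
helper functions, and the "anything nonzero deserves a look" rule are
assumptions for illustration, not something smartctl or ZFS defines.

```python
import re

# Illustrative sample only: the layout is modelled on what smartctl prints
# for SCSI/SAS disks, and the numbers are invented for this example.
SAMPLE = """\
Elements in grown defect list: 12

Error counter log:
           Errors Corrected by           Total   Correction     Gigabytes    Total
               ECC          rereads/    errors   algorithm      processed    uncorrected
           fast | delayed   rewrites  corrected  invocations   [10^9 bytes]  errors
read:       8123       47         0      8170       8170       5012.334           0
write:         0        3         0         3          3       2210.118           0
"""

# Hypothetical column names, in the order the columns appear above.
FIELDS = ("ecc_fast", "ecc_delayed", "rereads_rewrites", "total_corrected",
          "invocations", "gigabytes", "uncorrected")


def parse_error_counters(text):
    """Return the grown defect count and per-operation error counters."""
    report = {"grown_defects": None, "counters": {}}
    match = re.search(r"Elements in grown defect list:\s*(\d+)", text)
    if match:
        report["grown_defects"] = int(match.group(1))
    for line in text.splitlines():
        match = re.match(r"\s*(read|write|verify):\s+(.*)", line)
        if not match:
            continue
        values = match.group(2).split()
        if len(values) != len(FIELDS):
            continue  # unexpected layout, skip rather than guess
        row = {}
        for name, value in zip(FIELDS, values):
            row[name] = float(value) if name == "gigabytes" else int(value)
        report["counters"][match.group(1)] = row
    return report


def suspicious(report):
    """List reasons to distrust a disk (assumed rule: any nonzero count)."""
    reasons = []
    if report["grown_defects"]:
        reasons.append("grown defects: %d" % report["grown_defects"])
    for op, row in report["counters"].items():
        if row["ecc_delayed"] > 0:
            reasons.append("%s: %d delayed corrections" % (op, row["ecc_delayed"]))
        if row["uncorrected"] > 0:
            reasons.append("%s: %d uncorrected errors" % (op, row["uncorrected"]))
    return reasons


if __name__ == "__main__":
    for reason in suspicious(parse_error_counters(SAMPLE)):
        print(reason)
```

In practice the text would come from saved `smartctl -a` output for each
disk, so that the counters can be compared across the drives ZFS faulted.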

This behaviour is expected for drives that have errors which are not being
corrected quickly, or that have firmware bugs (horrors!), when the disk does
not do TLER (time-limited error recovery, or its vendor's equivalent).
 -- richard


