[OmniOS-discuss] Pool degraded
Kevin Swab
Kevin.Swab at ColoState.EDU
Tue Apr 8 20:22:46 UTC 2014
Hello, and sorry for accidentally failing to "reply-all" on your first
message...
The man page seems misleading or incomplete on the subject of
"autoreplace" and spares. Setting 'autoreplace=on' should cause your
hot spare to kick in during a drive failure - with over 1100 spindles
running ZFS here, we've had the "opportunity" to test it many times! ;-)
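For reference, the relevant commands are just standard zpool syntax
(the pool name 'tank' and the device name below are placeholders):

# zpool add tank spare c1t5000XXXXXXXXXXXXd0
# zpool set autoreplace=on tank
# zpool get autoreplace tank

That adds the spare to the pool, turns on autoreplace, and verifies the
property took effect.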
I couldn't find any authoritative references for this, but here are a
few unauthoritative ones:
http://my.safaribooksonline.com/book/operating-systems-and-server-administration/solaris/9780137049639/managing-storage-pools/ch02lev1sec7
http://stanley-huang.blogspot.com/2009/09/how-to-set-autoreplace-in-zfs-pool.html
http://www.datadisk.co.uk/html_docs/sun/sun_zfs_cs.htm
Hope this helps,
Kevin
On 04/08/2014 01:09 PM, Alexander Lesle wrote:
> Hello Kevin Swab and List,
>
> On April 08, 2014, 20:17 <Kevin Swab> wrote in [1]:
>
>> Instead of a 'zpool remove ...', you want to do a 'zpool detach ...' to
>> get rid of the old device.
>
> That's it.
> 'zpool detach ...' "removes" the broken device from the pool.
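> For my pool that should be (device name taken from the 'zpool status'
> output quoted below, just to illustrate):
>
> # zpool detach pool_ripley c1t5000CCA22BEEF6A3d0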
>
>> If you turn the 'autoreplace' property on
>> for the pool, the spare will automatically kick in the next time a drive
>> fails...
>
> Are you sure? Because man zpool tells me otherwise:
>
> ,-----[ man zpool ]-----
> |
> | autoreplace=on | off
> |
> | Controls automatic device replacement. If set to "off",
> | device replacement must be initiated by the administra-
> | tor by using the "zpool replace" command. If set to
> | "on", any new device, found in the same physical loca-
> | tion as a device that previously belonged to the pool,
> | is automatically formatted and replaced. The default
> | behavior is "off". This property can also be referred to
> | by its shortened column name, "replace".
> |
> `-------------------
>
> I understand it to mean that when I pull out a device and put a new
> device in the _same_ case slot, ZFS resilvers onto it and drops the old
> one automatically.
> When the property is off, I have to use the command 'zpool replace ...',
> which is what I have done.
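> As I read the man page, the general form in that case is:
>
> # zpool replace [-f] <pool> <old-device> [<new-device>]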
>
> But in my case, the spare device was already in the case and _assigned
> to_ this pool.
> So the 'Hot Spares' section says:
> ,-----[ man zpool ]-----
> |
> | ZFS allows devices to be associated with pools as "hot
> | spares". These devices are not actively used in the pool,
> | but when an active device fails, it is automatically
> | replaced by a hot spare.
> |
> `-------------------
>
> Or have I misunderstood?
>
>> On 04/08/2014 12:13 PM, Alexander Lesle wrote:
>>> Hello All,
>>>
>>> I have a pool with mirrors and one spare.
>>> Now my pool is degraded, and I thought that OmniOS/ZFS would activate
>>> the spare itself and start resilvering.
>>>
>>> # zpool status -x
>>> pool: pool_ripley
>>> state: DEGRADED
>>> status: One or more devices could not be opened. Sufficient replicas exist for
>>> the pool to continue functioning in a degraded state.
>>> action: Attach the missing device and online it using 'zpool online'.
>>> see: http://illumos.org/msg/ZFS-8000-2Q
>>> scan: resilvered 84K in 0h0m with 0 errors on Sun Mar 23 15:09:08 2014
>>> config:
>>>
>>> NAME STATE READ WRITE CKSUM
>>> pool_ripley DEGRADED 0 0 0
>>> mirror-0 DEGRADED 0 0 0
>>> c1t5000CCA22BC16BC5d0 ONLINE 0 0 0
>>> c1t5000CCA22BEEF6A3d0 UNAVAIL 0 0 0 cannot open
>>> mirror-1 ONLINE 0 0 0
>>> c1t5000CCA22BC8D31Ad0 ONLINE 0 0 0
>>> c1t5000CCA22BF612C4d0 ONLINE 0 0 0
>>> .
>>> .
>>> .
>>>
>>> spares
>>> c1t5000CCA22BF5B9DEd0 AVAIL
>>>
>>> But nothing happened.
>>> OK, so I did it myself:
>>> # zpool replace -f pool_ripley c1t5000CCA22BEEF6A3d0 c1t5000CCA22BF5B9DEd0
>>> Resilvering started immediately.
>>>
>>> # zpool status -x
>>> pool: pool_ripley
>>> state: DEGRADED
>>> status: One or more devices could not be opened. Sufficient replicas exist for
>>> the pool to continue functioning in a degraded state.
>>> action: Attach the missing device and online it using 'zpool online'.
>>> see: http://illumos.org/msg/ZFS-8000-2Q
>>> scan: resilvered 1.53T in 3h12m with 0 errors on Sun Apr 6 17:48:51 2014
>>> config:
>>>
>>> NAME STATE READ WRITE CKSUM
>>> pool_ripley DEGRADED 0 0 0
>>> mirror-0 DEGRADED 0 0 0
>>> c1t5000CCA22BC16BC5d0 ONLINE 0 0 0
>>> spare-1 DEGRADED 0 0 0
>>> c1t5000CCA22BEEF6A3d0 UNAVAIL 0 0 0 cannot open
>>> c1t5000CCA22BF5B9DEd0 ONLINE 0 0 0
>>> mirror-1 ONLINE 0 0 0
>>> c1t5000CCA22BC8D31Ad0 ONLINE 0 0 0
>>> c1t5000CCA22BF612C4d0 ONLINE 0 0 0
>>> .
>>> .
>>> .
>>> spares
>>> c1t5000CCA22BF5B9DEd0 INUSE currently in use
>>>
>>> After resilvering I powered off, unplugged the broken HDD from
>>> case slot 1 and moved the spare from slot 21 to slot 1.
>>> The pool is still degraded, and I can't remove the broken HDD:
>>>
>>> # zpool remove pool_ripley c1t5000CCA22BEEF6A3d0
>>> cannot remove c1t5000CCA22BEEF6A3d0: only inactive hot spares,
>>> cache, top-level, or log devices can be removed
>>>
>>> What can I do to throw out the broken HDD, tell ZFS that the spare is
>>> now a member of mirror-0, and remove it from the spare list?
>>> Why didn't the spare device jump in automatically and resilver the
>>> pool?
>>>
>>> Thanks.
>>>
>
>
--
-------------------------------------------------------------------
Kevin Swab UNIX Systems Administrator
ACNS Colorado State University
Phone: (970)491-6572 Email: Kevin.Swab at ColoState.EDU
GPG Fingerprint: 7026 3F66 A970 67BD 6F17 8EB8 8A7D 142F 2392 791C