[OmniOS-discuss] ZFS Questions. Is a bug ?
Narayan Desai
narayan.desai at gmail.com
Mon Sep 2 02:07:36 UTC 2013
As far as we've been able to tell, zfs replace is a one-way street; once
you start the replace, there doesn't seem to be a way to cancel it until it
has completed.
Also, resilvers appear to start from scratch any time anything about the
pool changes. Do you have a drive that is flapping offline and coming back,
or something like that? Are you getting any messages in /var/adm/messages
about disk devices?
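Something along these lines should show per-device error counters and any
recent disk-related log entries (just a sketch; the grep patterns and tail
lengths are arbitrary, adjust to taste):

  # per-device soft/hard/transport error counters
  iostat -En

  # recent disk/transport complaints in the system log
  grep -i -e scsi -e sata -e retryable -e reset /var/adm/messages | tail -50

  # any error events logged by FMA
  fmdump -e | tail -20

A drive that keeps dropping off and re-attaching will usually show up in at
least one of those.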
Considering the dire appearance of that pool, you might also try boosting
resilver priority. We found the tunables described here:
http://my2ndhead.blogspot.com/2011/03/adjusting-zfs-resilvering-speed.html
to work well for improving overall resilver performance (at the cost of
latency on pending IO requests from clients), ymmv.
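For reference, the gist of that post (going from memory, so double-check
against the article itself) is to bump a couple of kernel tunables live
with mdb; roughly:

  # record the current values first
  echo "zfs_resilver_delay/D" | mdb -k
  echo "zfs_resilver_min_time_ms/D" | mdb -k

  # drop the per-I/O resilver delay and let each txg spend more time
  # resilvering (0t = decimal); the 5000 is just an example value
  echo "zfs_resilver_delay/W0t0" | mdb -kw
  echo "zfs_resilver_min_time_ms/W0t5000" | mdb -kw

These settings don't persist across a reboot, and client I/O latency will
suffer while they're in effect, so put back the values the first two
commands printed once the resilver finishes.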
-nld
On Fri, Aug 30, 2013 at 1:26 PM, "Daniel D. Gonçalves" <
daniel at dgnetwork.com.br> wrote:
> My ZFS pool has been resilvering in a loop for over a month: one resilver
> finishes and, a few minutes later, another one starts.
> The replace command never finishes either. I replaced a device three days
> ago and it still has not completed:
>   mirror-3       DEGRADED      0     0    28
>     c17t20d1     ONLINE        0     0    28
>     replacing-1  DEGRADED     28     0     0
>       c17t22d1   UNAVAIL       0     0     0  cannot open
>       c17t13d1   ONLINE        0     0    28  (resilvering)
>
>
> In the mirror below, I would like to remove all the devices with status
> UNAVAIL and redo the replace onto a good device, but the OFFLINE, REMOVE
> and DETACH commands do not work:
>   mirror-1       DEGRADED     28     0     0
>     c17t24d1     ONLINE        0     0    28  (resilvering)
>     replacing-1  UNAVAIL       0     0     0  insufficient replicas
>       c17t22d1   UNAVAIL       0     0     0  cannot open
>       c17t12d1   UNAVAIL       0     0     0  cannot open
>       c17t21d1   UNAVAIL       0     0     0  cannot open
>
> My entire POOL:
> pool: STORAGE01
> state: DEGRADED
> status: One or more devices is currently being resilvered. The pool will
> continue to function, possibly in a degraded state.
> action: Wait for the resilver to complete.
> scan: resilver in progress since Fri Aug 30 14:42:42 2013
> 530G scanned out of 18.4T at 227M/s, 23h1m to go
> 62.1G resilvered, 2.80% done
> config:
>
> NAME             STATE     READ WRITE CKSUM
> STORAGE01        DEGRADED     14     0    16
>   mirror-0       ONLINE        0     0     0
>     c17t15d1     ONLINE        0     0     0
>     c17t19d1     ONLINE        0     0     0
>   mirror-1       DEGRADED     28     0     0
>     c17t24d1     ONLINE        0     0    28  (resilvering)
>     replacing-1  UNAVAIL       0     0     0  insufficient replicas
>       c17t22d1   UNAVAIL       0     0     0  cannot open
>       c17t12d1   UNAVAIL       0     0     0  cannot open
>       c17t21d1   UNAVAIL       0     0     0  cannot open
>   mirror-2       ONLINE        0     0     0
>     c17t18d1     ONLINE        0     0     0  (resilvering)
>     c17t17d1     ONLINE        0     0     0  (resilvering)
>   mirror-3       DEGRADED      0     0    32
>     c17t20d1     ONLINE        0     0    32
>     replacing-1  DEGRADED     32     0     0
>       c17t22d1   UNAVAIL       0     0     0  cannot open
>       c17t13d1   ONLINE        0     0    32  (resilvering)
>   mirror-5       ONLINE        0     0     0
>     c17t25d1     ONLINE        0     0     0
>     c17t27d1     ONLINE        0     0     0
>   mirror-6       ONLINE        0     0     0
>     c17t26d1     ONLINE        0     0     0
>     c17t28d1     ONLINE        0     0     0
>   mirror-7       ONLINE        0     0     0
>     c17t29d1     ONLINE        0     0     0
>     c17t31d1     ONLINE        0     0     0
>   mirror-8       ONLINE        0     0     0
>     c17t32d1     ONLINE        0     0     0
>     c17t30d1     ONLINE        0     0     0
>   mirror-9       ONLINE        0     0     0
>     c17t23d1     ONLINE        0     0     0
>     c17t14d1     ONLINE        0     0     0
> logs
>   mirror-4       ONLINE        0     0     0
>     c14t1d0      ONLINE        0     0     0
>     c14t3d0      ONLINE        0     0     0
> cache
>   c14t4d0        ONLINE        0     0     0
>
>
> I need urgent help to solve this. I believe it is a bug in ZFS.
>
> Thanks,
>
> Daniel
>
> On 22/08/2013 17:42, Saso Kiselkov wrote:
>
>> On 8/22/13 9:20 PM, "Daniel D. Gonçalves" wrote:
>>
>>> Thanks Saso,
>>>
>>> To stop the RESILVER, which device do I set to OFFLINE?
>>>
>> The one that says 'resilvering'. But beware that this means the pool
>> might not have full fault tolerance.
>>
>>> I do not know how the device "c17t33d1" ended up in
>>> MIRROR-11/REPLACING-1; how do I remove it from there?
>>>
>> If you can, let it run to completion before attempting any further
>> manipulation. The pool seems to be in quite an unhappy state anyway, so
>> better not compound the situation by doing more changes. Let the thing
>> resync back up, find the files that have the data errors in them ("zpool
>> status -v" I think), restore them or delete them and then post a new
>> "zpool status" to the list - then we'll see what can be done.
>>
>> Above all, be patient if you don't want to lose your data.
>>
>> Cheers,
>>
>