[OmniOS-discuss] Slow scrub performance

Richard Elling richard.elling at richardelling.com
Tue Jul 29 15:29:19 UTC 2014


On Jul 28, 2014, at 5:11 PM, wuffers <moo at wuffers.net> wrote:

> Does this look normal?

maybe, maybe not

> 
>   pool: rpool
>  state: ONLINE
>   scan: scrub repaired 0 in 0h3m with 0 errors on Tue Jul 15 09:36:17 2014
> config:
> 
>         NAME          STATE     READ WRITE CKSUM
>         rpool         ONLINE       0     0     0
>           mirror-0    ONLINE       0     0     0
>             c4t0d0s0  ONLINE       0     0     0
>             c4t1d0s0  ONLINE       0     0     0
> 
> errors: No known data errors
> 
>   pool: tank
>  state: ONLINE
>   scan: scrub in progress since Mon Jul 14 17:54:42 2014
>     6.59T scanned out of 24.2T at 5.71M/s, (scan is slow, no estimated time)

this is slower than most, surely slower than desired

>     0 repaired, 27.25% done
> config:
> 
>         NAME                       STATE     READ WRITE CKSUM
>         tank                       ONLINE       0     0     0
>           mirror-0                 ONLINE       0     0     0
>             c1t5000C50055F9F637d0  ONLINE       0     0     0
>             c1t5000C50055F9EF2Fd0  ONLINE       0     0     0
>           mirror-1                 ONLINE       0     0     0
>             c1t5000C50055F87D97d0  ONLINE       0     0     0
>             c1t5000C50055F9D3B3d0  ONLINE       0     0     0
>           mirror-2                 ONLINE       0     0     0
>             c1t5000C50055E6606Fd0  ONLINE       0     0     0
>             c1t5000C50055F9F92Bd0  ONLINE       0     0     0
>           mirror-3                 ONLINE       0     0     0
>             c1t5000C50055F856CFd0  ONLINE       0     0     0
>             c1t5000C50055F9FE87d0  ONLINE       0     0     0
>           mirror-4                 ONLINE       0     0     0
>             c1t5000C50055F84A97d0  ONLINE       0     0     0
>             c1t5000C50055FA0AF7d0  ONLINE       0     0     0
>           mirror-5                 ONLINE       0     0     0
>             c1t5000C50055F9D3E3d0  ONLINE       0     0     0
>             c1t5000C50055F9F0B3d0  ONLINE       0     0     0
>           mirror-6                 ONLINE       0     0     0
>             c1t5000C50055F8A46Fd0  ONLINE       0     0     0
>             c1t5000C50055F9FB8Bd0  ONLINE       0     0     0
>           mirror-7                 ONLINE       0     0     0
>             c1t5000C50055F8B21Fd0  ONLINE       0     0     0
>             c1t5000C50055F9F89Fd0  ONLINE       0     0     0
>           mirror-8                 ONLINE       0     0     0
>             c1t5000C50055F8BE3Fd0  ONLINE       0     0     0
>             c1t5000C50055F9E123d0  ONLINE       0     0     0
>           mirror-9                 ONLINE       0     0     0
>             c1t5000C50055F9379Bd0  ONLINE       0     0     0
>             c1t5000C50055F9E7D7d0  ONLINE       0     0     0
>           mirror-10                ONLINE       0     0     0
>             c1t5000C50055E65F0Fd0  ONLINE       0     0     0
>             c1t5000C50055F9F80Bd0  ONLINE       0     0     0
>           mirror-11                ONLINE       0     0     0
>             c1t5000C50055F8A22Bd0  ONLINE       0     0     0
>             c1t5000C50055F8D48Fd0  ONLINE       0     0     0
>           mirror-12                ONLINE       0     0     0
>             c1t5000C50055E65807d0  ONLINE       0     0     0
>             c1t5000C50055F8BFA3d0  ONLINE       0     0     0
>           mirror-13                ONLINE       0     0     0
>             c1t5000C50055E579F7d0  ONLINE       0     0     0
>             c1t5000C50055E65877d0  ONLINE       0     0     0
>           mirror-14                ONLINE       0     0     0
>             c1t5000C50055F9FA1Fd0  ONLINE       0     0     0
>             c1t5000C50055F8CDA7d0  ONLINE       0     0     0
>           mirror-15                ONLINE       0     0     0
>             c1t5000C50055F8BF9Bd0  ONLINE       0     0     0
>             c1t5000C50055F9A607d0  ONLINE       0     0     0
>           mirror-16                ONLINE       0     0     0
>             c1t5000C50055E66503d0  ONLINE       0     0     0
>             c1t5000C50055E4FDE7d0  ONLINE       0     0     0
>           mirror-17                ONLINE       0     0     0
>             c1t5000C50055F8E017d0  ONLINE       0     0     0
>             c1t5000C50055F9F3EBd0  ONLINE       0     0     0
>           mirror-18                ONLINE       0     0     0
>             c1t5000C50055F8B80Fd0  ONLINE       0     0     0
>             c1t5000C50055F9F63Bd0  ONLINE       0     0     0
>           mirror-19                ONLINE       0     0     0
>             c1t5000C50055F84FB7d0  ONLINE       0     0     0
>             c1t5000C50055F9FEABd0  ONLINE       0     0     0
>           mirror-20                ONLINE       0     0     0
>             c1t5000C50055F8CCAFd0  ONLINE       0     0     0
>             c1t5000C50055F9F91Bd0  ONLINE       0     0     0
>           mirror-21                ONLINE       0     0     0
>             c1t5000C50055E65ABBd0  ONLINE       0     0     0
>             c1t5000C50055F8905Fd0  ONLINE       0     0     0
>           mirror-22                ONLINE       0     0     0
>             c1t5000C50055E57A5Fd0  ONLINE       0     0     0
>             c1t5000C50055F87E73d0  ONLINE       0     0     0
>           mirror-23                ONLINE       0     0     0
>             c1t5000C50055E66053d0  ONLINE       0     0     0
>             c1t5000C50055E66B63d0  ONLINE       0     0     0
>           mirror-24                ONLINE       0     0     0
>             c1t5000C50055F8723Bd0  ONLINE       0     0     0
>             c1t5000C50055F8C3ABd0  ONLINE       0     0     0
>         logs
>           c2t5000A72A3007811Dd0    ONLINE       0     0     0
>         cache
>           c2t500117310015D579d0    ONLINE       0     0     0
>           c2t50011731001631FDd0    ONLINE       0     0     0
>           c12t500117310015D59Ed0   ONLINE       0     0     0
>           c12t500117310015D54Ed0   ONLINE       0     0     0
>         spares
>           c1t5000C50055FA2AEFd0    AVAIL
>           c1t5000C50055E595B7d0    AVAIL
> 
> errors: No known data errors
> 
> ---
> This is a ~90TB SAN on r151008, with 25 mirrored pairs of 4TB drives. The last scrub I ran was about 3 months ago, and it took (from my recollection) ~250 hours or so. I've only run about 4 scrubs so far on this installation.
> 
> The current scrub has been running for 2 weeks, with no end in sight. The last time I saw an estimate, it was around 650 hours remaining.

The estimate is often very wrong, especially for busy systems.
If this is an older ZFS implementation, this pool is likely getting pounded by the
ZFS write throttle. There are some tunings that can be applied, but the old write
throttle is not a stable control system, so it will always be a little bit unpredictable.
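For example, on builds that still use the old throttle, the scan-specific knobs in illumos'
dsl_scan.c can be inspected and relaxed at run time with mdb. This is only a sketch; verify
the tunable names and defaults against the kernel actually running on r151008 before
changing anything:

    # Read the current values (decimal); names come from illumos dsl_scan.c
    echo "zfs_scrub_delay/D" | mdb -k
    echo "zfs_scan_idle/D"   | mdb -k

    # Temporarily remove the per-I/O scrub delay. This does not persist
    # across a reboot, and it will make the scrub compete harder with the
    # VM workload while it is in effect.
    echo "zfs_scrub_delay/W0t0" | mdb -kw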

> 
> This thread, http://comments.gmane.org/gmane.os.solaris.opensolaris.zfs/46021, from over 3 years ago mentions metaslab_min_alloc_size as a way to improve this (reducing it from 10MB to 4K). Further reading on that property led me to this Illumos bug: https://www.illumos.org/issues/54, which states: "Turns out this tunable is made irrelevant as a result of a change to use the metaslab_df_ops allocator. We don't need to change it. I'm closing this bug." So that seems like a dead end to me.

dead end.

> 
> This is the current load with scrub running (~350 VMs between Hyper-V and VMware environments):
> 
> # iostat -xnze

Unfortunately, this is the average performance since boot, so it is not suitable for
performance analysis unless the system has been rebooted in the past 10 minutes or so.
You'll need to post the second batch from "iostat -zxCn 60 2".
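If it helps, one way (purely a convenience, under the assumption that the header text is
unchanged on your release) to capture only that second, interval-based report is:

    # The first report is the since-boot average; print everything from the
    # second "extended device statistics" header onward.
    iostat -zxCn 60 2 | awk '/extended device statistics/{n++} n>=2'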

>                             extended device statistics       ---- errors ---
>     r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b s/w h/w trn tot device
>     0.4   12.5   39.7   78.8  0.1  0.0    5.0    0.1   0   0   0   0   0   0 rpool
>     0.2    6.9   19.9   39.4  0.0  0.0    0.0    0.1   0   0   0   0   0   0 c4t0d0
>     0.2    6.8   19.9   39.4  0.0  0.0    0.0    0.1   0   0   0   0   0   0 c4t1d0
>     4.4   29.3  209.7  962.7  0.0  0.0    0.0    1.4   0   3   0   0   0   0 c1t5000C50055F8723Bd0
>     4.7   25.1  209.4  962.3  0.0  0.0    0.0    1.5   0   3   0   0   0   0 c1t5000C50055E66B63d0
>     4.7   27.6  208.3  952.7  0.0  0.0    0.0    1.3   0   3   0   0   0   0 c1t5000C50055F87E73d0
>     4.4   28.6  209.1  974.3  0.0  0.0    0.0    1.4   0   3   0   0   0   0 c1t5000C50055F8BFA3d0
>     4.4   28.9  208.3  964.5  0.0  0.0    0.0    1.4   0   3   0   0   0   0 c1t5000C50055F9E123d0
>     4.4   25.7  208.7  955.7  0.0  0.0    0.0    1.4   0   3   0   0   0   0 c1t5000C50055F9F0B3d0
>     4.4   26.5  209.1  960.9  0.0  0.0    0.0    1.4   0   3   0   0   0   0 c1t5000C50055F9D3B3d0
>     4.3   25.2  206.6  936.1  0.0  0.0    0.0    1.5   0   3   0   0   0   0 c1t5000C50055E4FDE7d0
>     4.4   26.9  208.1  982.6  0.0  0.0    0.0    1.4   0   3   0   0   0   0 c1t5000C50055F9A607d0
>     4.4   24.5  208.7  955.4  0.0  0.0    0.0    1.5   0   3   0   0   0   0 c1t5000C50055F8CDA7d0
>     4.3   26.5  207.8  943.8  0.0  0.0    0.0    1.4   0   3   0   0   0   0 c1t5000C50055E65877d0
>     4.4   27.7  208.0  961.1  0.0  0.0    0.0    1.4   0   3   0   0   0   0 c1t5000C50055F9E7D7d0
>     4.3   26.0  208.0  953.9  0.0  0.0    0.0    1.4   0   3   0   0   0   0 c1t5000C50055FA0AF7d0
>     4.3   26.1  208.0  966.2  0.0  0.0    0.0    1.4   0   3   0   0   0   0 c1t5000C50055F9FE87d0
>     4.4   28.5  208.6  965.3  0.0  0.0    0.0    1.4   0   3   0   0   0   0 c1t5000C50055F9F91Bd0
>     4.3   26.7  207.2  945.0  0.0  0.0    0.0    1.4   0   3   0   0   0   0 c1t5000C50055F9FEABd0
>     4.4   26.5  209.3  980.1  0.0  0.0    0.0    1.4   0   3   0   0   0   0 c1t5000C50055F9F63Bd0
>     4.3   26.1  207.6  944.3  0.0  0.0    0.0    1.5   0   3   0   0   0   0 c1t5000C50055F9F3EBd0
>     4.3   26.5  208.1  954.9  0.0  0.0    0.0    1.4   0   3   0   0   0   0 c1t5000C50055F9F80Bd0
>    32.5   14.7 1005.6  751.2  0.0  0.0    0.0    0.3   0   1   0   0   0   0 c2t500117310015D579d0
>    32.5   14.7 1004.1  751.2  0.0  0.0    0.0    0.3   0   1   0   0   0   0 c2t50011731001631FDd0
>     0.0  180.8    0.0 16434.5  0.0  0.3    0.0    1.6   0   4   0   0   0   0 c2t5000A72A3007811Dd0
>     4.4   25.3  208.7  966.7  0.0  0.0    0.0    1.4   0   3   0   0   0   0 c1t5000C50055F9FB8Bd0
>     4.4   26.3  208.5  949.1  0.0  0.0    0.0    1.4   0   3   0   0   0   0 c1t5000C50055F9F92Bd0
>     4.4   29.7  208.6  975.1  0.0  0.0    0.0    1.3   0   3   0   0   0   0 c1t5000C50055F8905Fd0
>     4.4   25.7  207.9  954.1  0.0  0.0    0.0    1.4   0   3   0   0   0   0 c1t5000C50055F8D48Fd0
>     4.4   26.8  208.4  967.4  0.0  0.0    0.0    1.4   0   3   0   0   0   0 c1t5000C50055F9F89Fd0
>     4.4   28.5  208.1  964.9  0.0  0.0    0.0    1.4   0   3   0   0   0   0 c1t5000C50055F9EF2Fd0
>     4.4   29.4  209.5  962.7  0.0  0.0    0.0    1.4   0   3   0   0   0   0 c1t5000C50055F8C3ABd0
>     4.7   25.0  208.9  962.3  0.0  0.0    0.0    1.5   0   3   0   0   0   0 c1t5000C50055E66053d0
>     4.3   25.1  207.5  936.1  0.0  0.0    0.0    1.5   0   3   0   0   0   0 c1t5000C50055E66503d0
>     4.4   25.6  209.1  955.7  0.0  0.0    0.0    1.4   0   3   0   0   0   0 c1t5000C50055F9D3E3d0
>     4.3   26.6  207.4  945.0  0.0  0.0    0.0    1.4   0   3   0   0   0   0 c1t5000C50055F84FB7d0
>     4.3   26.0  207.5  944.3  0.0  0.0    0.0    1.5   0   3   0   0   0   0 c1t5000C50055F8E017d0
>     4.3   26.4  207.1  943.8  0.0  0.0    0.0    1.4   0   3   0   0   0   0 c1t5000C50055E579F7d0
>     4.4   28.5  208.8  974.3  0.0  0.0    0.0    1.4   0   3   0   0   0   0 c1t5000C50055E65807d0
>     4.4   25.9  208.5  953.9  0.0  0.0    0.0    1.4   0   3   0   0   0   0 c1t5000C50055F84A97d0
>     4.4   26.4  209.2  960.9  0.0  0.0    0.0    1.4   0   3   0   0   0   0 c1t5000C50055F87D97d0
>     4.4   28.5  208.8  964.9  0.0  0.0    0.0    1.4   0   3   0   0   0   0 c1t5000C50055F9F637d0
>     4.4   29.6  208.9  975.1  0.0  0.0    0.0    1.3   0   3   0   0   0   0 c1t5000C50055E65ABBd0
>     4.4   26.7  208.5  982.6  0.0  0.0    0.0    1.4   0   3   0   0   0   0 c1t5000C50055F8BF9Bd0
>     4.3   25.6  207.6  954.1  0.0  0.0    0.0    1.5   0   3   0   0   0   0 c1t5000C50055F8A22Bd0
>     4.4   27.6  208.2  961.1  0.0  0.0    0.0    1.4   0   3   0   0   0   0 c1t5000C50055F9379Bd0
>     4.7   27.6  208.3  952.8  0.0  0.0    0.0    1.3   0   3   0   0   0   0 c1t5000C50055E57A5Fd0
>     4.4   28.4  208.4  965.3  0.0  0.0    0.0    1.4   0   3   0   0   0   0 c1t5000C50055F8CCAFd0
>     4.4   26.4  208.9  980.1  0.0  0.0    0.0    1.5   0   3   0   0   0   0 c1t5000C50055F8B80Fd0
>     4.4   24.4  208.9  955.4  0.0  0.0    0.0    1.5   0   3   0   0   0   0 c1t5000C50055F9FA1Fd0
>     4.3   26.4  207.6  954.9  0.0  0.0    0.0    1.4   0   3   0   0   0   0 c1t5000C50055E65F0Fd0
>     4.4   28.8  208.3  964.5  0.0  0.0    0.0    1.4   0   3   0   0   0   0 c1t5000C50055F8BE3Fd0
>     4.3   26.7  207.4  967.4  0.0  0.0    0.0    1.4   0   3   0   0   0   0 c1t5000C50055F8B21Fd0
>     4.4   25.1  208.9  966.7  0.0  0.0    0.0    1.4   0   3   0   0   0   0 c1t5000C50055F8A46Fd0
>     4.4   26.0  209.7  966.2  0.0  0.0    0.0    1.4   0   3   0   0   0   0 c1t5000C50055F856CFd0
>     4.4   26.2  209.0  949.1  0.0  0.0    0.0    1.4   0   3   0   0   0   0 c1t5000C50055E6606Fd0
>    32.5   14.7 1004.3  750.9  0.0  0.0    0.0    0.3   0   1   0   0   0   0 c12t500117310015D59Ed0
>    32.5   14.7 1004.4  751.3  0.0  0.0    0.0    0.3   0   1   0   0   0   0 c12t500117310015D54Ed0
>   349.1  646.9 14437.7 67437.3 52.7  2.6   52.9    2.6  12  37   0   0   0   0 tank
> 
> What should I be checking for? Is a scrub supposed to take this long (I thought over 10 days for the last one was long...)? There don't seem to be any hardware errors. Is the load too high (12% wait, 37% busy, with an asvc_t of 2.6ms)?

There are many variables here, the biggest of which is the current non-scrub load.
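One rough way to separate the two is to sample the scanned byte count a known interval
apart and watch per-vdev activity over the same window; that gives the real scrub rate
independent of the built-in estimate. A minimal sketch, using only standard zpool output:

    # Real scrub rate: difference in "scanned" bytes over a known interval
    zpool status tank | grep scanned
    sleep 3600
    zpool status tank | grep scanned

    # Per-vdev load (scrub plus VM traffic together) over the same window
    zpool iostat -v tank 60 2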
 -- richard
