[OmniOS-discuss] Slow scrub on SSD-only pool

Stephan Budach stephan.budach at JVM.DE
Sun Apr 17 13:07:12 UTC 2016


On 17.04.16 at 14:07, Stephan Budach wrote:
> Hi all,
>
> I am running a scrub on an SSD-only zpool on r018. This zpool consists 
> of 16 iSCSI targets, which are served from two other OmniOS boxes - 
> currently still running r016 - over 10GbE connections.
>
> This zpool serves as an NFS share for my Oracle VM cluster and it 
> delivers reasonable performance. Even while the scrub is running, I 
> can get approx. 1200MB/s throughput when dd'ing a vdisk from the ZFS 
> filesystem to /dev/null.
>
> However, the running scrub is only progressing like this:
>
> root at zfsha02gh79:/root# zpool status ssdTank
>   pool: ssdTank
>  state: ONLINE
>   scan: scrub in progress since Sat Apr 16 23:37:52 2016
>     68,5G scanned out of 1,36T at 1,36M/s, 276h17m to go
>     0 repaired, 4,92% done
> config:
>
>         NAME STATE     READ WRITE CKSUM
>         ssdTank ONLINE       0     0     0
>           mirror-0 ONLINE       0     0     0
>             c3t600144F090D09613000056B8A76C0001d0 ONLINE 0     0     0
>             c3t600144F090D09613000056B8A93C0009d0 ONLINE 0     0     0
>           mirror-1 ONLINE       0     0     0
>             c3t600144F090D09613000056B8A7BE0002d0 ONLINE 0     0     0
>             c3t600144F090D09613000056B8A948000Ad0 ONLINE 0     0     0
>           mirror-2 ONLINE       0     0     0
>             c3t600144F090D09613000056B8A7F10003d0 ONLINE 0     0     0
>             c3t600144F090D09613000056B8A958000Bd0 ONLINE 0     0     0
>           mirror-3 ONLINE       0     0     0
>             c3t600144F090D09613000056B8A7FC0004d0 ONLINE 0     0     0
>             c3t600144F090D09613000056B8A964000Cd0 ONLINE 0     0     0
>           mirror-4 ONLINE       0     0     0
>             c3t600144F090D09613000056B8A8210005d0 ONLINE 0     0     0
>             c3t600144F090D09613000056B8A96E000Dd0 ONLINE 0     0     0
>           mirror-5 ONLINE       0     0     0
>             c3t600144F090D09613000056B8A82E0006d0 ONLINE 0     0     0
>             c3t600144F090D09613000056B8A978000Ed0 ONLINE 0     0     0
>           mirror-6 ONLINE       0     0     0
>             c3t600144F090D09613000056B8A83B0007d0 ONLINE 0     0     0
>             c3t600144F090D09613000056B8A983000Fd0 ONLINE 0     0     0
>           mirror-7 ONLINE       0     0     0
>             c3t600144F090D09613000056B8A84A0008d0 ONLINE 0     0     0
>             c3t600144F090D09613000056B8A98E0010d0 ONLINE 0     0     0
>
> errors: No known data errors
>
> These are all 800GB Intel S3710s, and I can't seem to find out why 
> the scrub is moving so slowly.
> Anything I can look at specifically?
>
> Thanks,
> Stephan
>
Well… searching the net somewhat more thoroughly, I came across an 
archived discussion that deals with a similar issue. Somewhere down 
the conversation, this parameter was suggested:

echo "zfs_scrub_delay/W0" | mdb -kw

I just tried that as well, and although the calculated scrub speed climbs 
rather slowly, iostat now shows approx. 380 MB/s read from the devices, 
which works out to roughly 24 MB/s per device (8 mirrors × 2 devices).
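
For anyone wanting to reproduce the observation: I was watching the 
per-device throughput with a plain extended iostat, something along 
these lines (the interval and device pattern are of course arbitrary):

iostat -xn 5 | egrep 'device|c3t600144F0'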

Being curious, I issued an echo "zfs_scrub_delay/W1" | mdb -kw to see 
what would happen, and that command immediately brought the rate on each 
device down to 1.4 MB/s…
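
In case it is useful to someone else, this is how I understand the mdb 
usage here (a sketch; the written value is only illustrative): /D prints 
the current value in decimal, /W writes a new one, and a 0t prefix makes 
the written value explicitly decimal.

# read the current setting
echo "zfs_scrub_delay/D" | mdb -k

# write a decimal value (0t4 = decimal 4)
echo "zfs_scrub_delay/W0t4" | mdb -kw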

What is the rationale behind that? Who wants to wait weeks for a scrub 
to finish? Usually I also have znapzend running, creating snapshots on a 
regular basis. Wouldn't that hurt scrub performance even more?
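
If zfs_scrub_delay really turns out to be the right knob, I would 
probably persist it across reboots via /etc/system instead of 
re-applying it with mdb after every boot, roughly like this (untested 
on my side):

* /etc/system -- takes effect at the next reboot
set zfs:zfs_scrub_delay = 0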

Cheers,
Stephan

