[OmniOS-discuss] write amplification zvol

Richard Elling richard.elling at richardelling.com
Wed Sep 27 23:29:52 UTC 2017


Comment below...

> On Sep 27, 2017, at 12:57 AM, anthony omnios <icoomnios at gmail.com> wrote:
> 
> Hi,
> 
> I have a problem: I use many iSCSI zvols (one per VM). Network traffic between the KVM host and the filer is 2 MB/s, but much more than that is written to the disks. The pool has a separate mirrored ZIL (Intel S3710) and 8 Samsung 850 EVO 1 TB SSDs.
> 
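To quantify the gap, it may help to sample the network link and the pool
over the same interval. A rough sketch (e1000g0 is a placeholder; substitute
the iSCSI-facing link name):

   # per-link byte counters, sampled every 5 seconds
   dladm show-link -s -i 5 e1000g0
   # pool-level bandwidth over the same 5-second interval
   zpool iostat filervm2 5
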
>  zpool status                        
>   pool: filervm2
>  state: ONLINE
>   scan: resilvered 406G in 0h22m with 0 errors on Wed Sep 20 15:45:48 2017
> config:
> 
>         NAME                       STATE     READ WRITE CKSUM
>         filervm2                   ONLINE       0     0     0
>           mirror-0                 ONLINE       0     0     0
>             c7t5002538D41657AAFd0  ONLINE       0     0     0
>             c7t5002538D41F85C0Dd0  ONLINE       0     0     0
>           mirror-2                 ONLINE       0     0     0
>             c7t5002538D41CC7105d0  ONLINE       0     0     0
>             c7t5002538D41CC7127d0  ONLINE       0     0     0
>           mirror-3                 ONLINE       0     0     0
>             c7t5002538D41CD7F7Ed0  ONLINE       0     0     0
>             c7t5002538D41CD83FDd0  ONLINE       0     0     0
>           mirror-4                 ONLINE       0     0     0
>             c7t5002538D41CD7F7Ad0  ONLINE       0     0     0
>             c7t5002538D41CD7F7Dd0  ONLINE       0     0     0
>         logs
>           mirror-1                 ONLINE       0     0     0
>             c4t2d0                 ONLINE       0     0     0
>             c4t4d0                 ONLINE       0     0     0
> 
> I used the correct ashift of 13 for the Samsung 850 EVO.
> zdb | grep ashift:
> 
> ashift: 13
> ashift: 13
> ashift: 13
> ashift: 13
> ashift: 13
> 
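For reference, ashift is the log2 of the minimum device block size, so
ashift=13 means 2^13 = 8192-byte blocks, matching the 8 KiB NAND page size
commonly reported for the 850 EVO. The per-vdev values can also be read
from the pool configuration, for example:

   zdb -C filervm2 | grep ashift
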
> But a lot is written to the SSDs every 5 seconds, much more than the 2 MB/s of network traffic.
> 
> iostat -xn -d 1:
> 
>  r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
>    11.0 3067.5  288.3 153457.4  6.8  0.5    2.2    0.2   5  14 filervm2

filervm2 is seeing 3067 writes per second. This is the interface to the upper layers.
These writes are small.
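
For reference, the averages from that iostat line work out to:

   153457.4 kB/s / 3067.5 writes/s ≈ 50 kB per write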

>     0.0    0.0    0.0    0.0  0.0  0.0    0.0    0.0   0   0 rpool
>     0.0    0.0    0.0    0.0  0.0  0.0    0.0    0.0   0   0 c4t0d0
>     0.0    0.0    0.0    0.0  0.0  0.0    0.0    0.0   0   0 c4t1d0
>     0.0  552.6    0.0 17284.0  0.0  0.1    0.0    0.2   0   8 c4t2d0
>     0.0  552.6    0.0 17284.0  0.0  0.1    0.0    0.2   0   8 c4t4d0

The log devices are seeing 552 writes per second each, and since sync=standard
that means the upper layers are requesting syncs.
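
One way to confirm that, as a diagnostic only, is to temporarily disable
sync on a disposable test zvol and watch whether the log-device traffic
drops. sync=disabled can lose in-flight data on power failure, so do not
leave it set on anything you care about (hdd-test is a placeholder name):

   zfs set sync=disabled filervm2/hdd-test
   # watch 'iostat -xn 1' for a bit, then restore the previous setting
   zfs set sync=standard filervm2/hdd-test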

>     1.0  233.3   48.1 10051.6  0.0  0.0    0.0    0.1   0   3 c7t5002538D41657AAFd0
>     5.0  250.3  144.2 13207.3  0.0  0.0    0.0    0.1   0   3 c7t5002538D41CC7127d0
>     2.0  254.3   24.0 13207.3  0.0  0.0    0.0    0.1   0   4 c7t5002538D41CC7105d0
>     3.0  235.3   72.1 10051.6  0.0  0.0    0.0    0.1   0   3 c7t5002538D41F85C0Dd0
>     0.0  228.3    0.0 16178.7  0.0  0.0    0.0    0.2   0   4 c7t5002538D41CD83FDd0
>     0.0  225.3    0.0 16210.7  0.0  0.0    0.0    0.2   0   4 c7t5002538D41CD7F7Ed0
>     0.0  282.3    0.0 19991.1  0.0  0.0    0.0    0.2   0   5 c7t5002538D41CD7F7Dd0
>     0.0  280.3    0.0 19871.0  0.0  0.0    0.0    0.2   0   5 c7t5002538D41CD7F7Ad0

The pool disks see 1989 writes per second total or 994 writes per second logically.
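
Summing the per-disk write columns above:

   233.3 + 250.3 + 254.3 + 235.3 + 228.3 + 225.3 + 282.3 + 280.3 ≈ 1989 w/s
   1989 / 2 ≈ 994 w/s, since each logical write lands on both sides of a mirror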

It seems to me that reducing 3067 requested writes to 994 logical writes is the opposite
of amplification. What do you expect?
 -- richard

> 
> I used zvols with a 64k volblocksize; I tried 8k and the problem is the same.
> 
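Note that volblocksize is fixed when a zvol is created, so comparing 64k
against 8k means creating a fresh zvol and migrating the data onto it. A
minimal sketch (hdd-test is a placeholder name; -s makes it sparse):

   zfs create -s -V 25G -o volblocksize=8k filervm2/hdd-test
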
> zfs get all filervm2/hdd-110022a:
> 
> NAME                  PROPERTY              VALUE                  SOURCE
> filervm2/hdd-110022a  type                  volume                 -
> filervm2/hdd-110022a  creation              Tue May 16 10:24 2017  -
> filervm2/hdd-110022a  used                  5.26G                  -
> filervm2/hdd-110022a  available             2.90T                  -
> filervm2/hdd-110022a  referenced            5.24G                  -
> filervm2/hdd-110022a  compressratio         3.99x                  -
> filervm2/hdd-110022a  reservation           none                   default
> filervm2/hdd-110022a  volsize               25G                    local
> filervm2/hdd-110022a  volblocksize          64K                    -
> filervm2/hdd-110022a  checksum              on                     default
> filervm2/hdd-110022a  compression           lz4                    local
> filervm2/hdd-110022a  readonly              off                    default
> filervm2/hdd-110022a  copies                1                      default
> filervm2/hdd-110022a  refreservation        none                   default
> filervm2/hdd-110022a  primarycache          all                    default
> filervm2/hdd-110022a  secondarycache        all                    default
> filervm2/hdd-110022a  usedbysnapshots       15.4M                  -
> filervm2/hdd-110022a  usedbydataset         5.24G                  -
> filervm2/hdd-110022a  usedbychildren        0                      -
> filervm2/hdd-110022a  usedbyrefreservation  0                      -
> filervm2/hdd-110022a  logbias               latency                default
> filervm2/hdd-110022a  dedup                 off                    default
> filervm2/hdd-110022a  mlslabel              none                   default
> filervm2/hdd-110022a  sync                  standard               local
> filervm2/hdd-110022a  refcompressratio      3.99x                  -
> filervm2/hdd-110022a  written               216K                   -
> filervm2/hdd-110022a  logicalused           20.9G                  -
> filervm2/hdd-110022a  logicalreferenced     20.9G                  -
> filervm2/hdd-110022a  snapshot_limit        none                   default
> filervm2/hdd-110022a  snapshot_count        none                   default
> filervm2/hdd-110022a  redundant_metadata    all                    default
> 
> Sorry for my bad English.
> 
> What could be the problem? Thanks.
> 
> Best regards,
> 
> Anthony
> 
> 
> _______________________________________________
> OmniOS-discuss mailing list
> OmniOS-discuss at lists.omniti.com
> http://lists.omniti.com/mailman/listinfo/omnios-discuss


