[OmniOS-discuss] write amplification zvol

Richard Elling richard.elling at richardelling.com
Mon Oct 2 18:33:53 UTC 2017


> On Oct 2, 2017, at 12:51 AM, anthony omnios <icoomnios at gmail.com> wrote:
> 
> Hi, 
> 
> I have tried a pool with ashift=9 and there is no write amplification; the problem is solved.

ashift=13 means that the minimum size (in bytes) written will be 8k (1<<13). So when you write
a single byte, there will be at least 2 writes for the data (both sides of the mirror) and 4 writes for
metadata (both sides of the mirror * 2 copies of metadata for redundancy). Each metadata
block contains information on 128 or more data blocks, so there is not a 1:1 correlation between
data and metadata writes.

Reducing ashift doesn't change the number of blocks written for a single byte write. It can only
reduce or increase the size in bytes of the writes.
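
To put rough numbers on that, a back-of-the-envelope sketch of the per-write worst case
described above (in practice the metadata writes are amortized across many data blocks):

    # worst case for a 1-byte synchronous write on a 2-way mirror with ashift=13
    DATA_WRITES=2    # one 8 KiB data block, written to both sides of the mirror
    META_WRITES=4    # 2 metadata copies, each written to both sides of the mirror
    echo $(( (1 << 13) * (DATA_WRITES + META_WRITES) ))    # 49152 bytes = 48 KiB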

HTH
 -- richard

> 
> But I can't use ashift=9 with an SSD (850 EVO); I have read many articles indicating problems with ashift=9 on SSDs.
> 
> How can I proceed? Do I need to tweak a specific ZFS value?
> 
> Thanks,
> 
> Anthony
> 
> 
> 
> 2017-09-28 11:48 GMT+02:00 anthony omnios <icoomnios at gmail.com <mailto:icoomnios at gmail.com>>:
> Thanks for your help, Stephan.
> 
> I have tried different LUNs with the default of 512 B and with 4096 B:
> 
> LU Name: 600144F04D4F0600000059A588910001
>     Operational Status: Online
>     Provider Name     : sbd
>     Alias             : /dev/zvol/rdsk/filervm2/hdd-110002b
>     View Entry Count  : 1
>     Data File         : /dev/zvol/rdsk/filervm2/hdd-110002b
>     Meta File         : not set
>     Size              : 26843545600
>     Block Size        : 4096
>     Management URL    : not set
>     Vendor ID         : SUN     
>     Product ID        : COMSTAR         
>     Serial Num        : not set
>     Write Protect     : Disabled
>     Writeback Cache   : Disabled
>     Access State      : Active
> 
> Problem is the same.
> 
> Cheers,
> 
> Anthony
> 
> 2017-09-28 10:33 GMT+02:00 Stephan Budach <stephan.budach at jvm.de <mailto:stephan.budach at jvm.de>>:
> ----- Original Message -----
> 
> > From: "anthony omnios" <icoomnios at gmail.com <mailto:icoomnios at gmail.com>>
> > To: "Richard Elling" <richard.elling at richardelling.com <mailto:richard.elling at richardelling.com>>
> > CC: omnios-discuss at lists.omniti.com <mailto:omnios-discuss at lists.omniti.com>
> > Sent: Thursday, 28 September 2017 09:56:42
> > Subject: Re: [OmniOS-discuss] write amplification zvol
> 
> > Thanks for your help, Richard.
> 
> > My problem is that I have iSCSI network traffic of 2 MB/s, so every 5
> > seconds I need to write 10 MB of network traffic to disk, but on pool
> > filervm2 I am writing much more than that, approximately 60 MB every
> > 5 seconds. Each SSD in filervm2 is writing 15 MB every 5 seconds.
> > When I check with smartmontools, every SSD is writing approximately
> > 250 GB of data each day.
> 
> > How can I reduce the amount of data written to each SSD? I have tried
> > to reduce the block size of the zvol, but it changed nothing.
> 
> > Anthony
> 
> > 2017-09-28 1:29 GMT+02:00 Richard Elling <richard.elling at richardelling.com <mailto:richard.elling at richardelling.com>>:
> 
> > > Comment below...
> > >
> > > > On Sep 27, 2017, at 12:57 AM, anthony omnios <icoomnios at gmail.com <mailto:icoomnios at gmail.com>> wrote:
> > > >
> > > > Hi,
> > > >
> > > > I have a problem: I use many iSCSI zvols (one for each VM). Network
> > > > traffic is 2 MB/s between the KVM host and the filer, but I write to
> > > > the disks much more than that. I use a pool with a separate mirrored
> > > > ZIL (Intel S3710) and 8 Samsung 850 EVO 1 TB SSDs.
> > > >
> > > > zpool status
> > > >   pool: filervm2
> > > >  state: ONLINE
> > > >   scan: resilvered 406G in 0h22m with 0 errors on Wed Sep 20 15:45:48 2017
> > > > config:
> > > >
> > > >         NAME                       STATE     READ WRITE CKSUM
> > > >         filervm2                   ONLINE       0     0     0
> > > >           mirror-0                 ONLINE       0     0     0
> > > >             c7t5002538D41657AAFd0  ONLINE       0     0     0
> > > >             c7t5002538D41F85C0Dd0  ONLINE       0     0     0
> > > >           mirror-2                 ONLINE       0     0     0
> > > >             c7t5002538D41CC7105d0  ONLINE       0     0     0
> > > >             c7t5002538D41CC7127d0  ONLINE       0     0     0
> > > >           mirror-3                 ONLINE       0     0     0
> > > >             c7t5002538D41CD7F7Ed0  ONLINE       0     0     0
> > > >             c7t5002538D41CD83FDd0  ONLINE       0     0     0
> > > >           mirror-4                 ONLINE       0     0     0
> > > >             c7t5002538D41CD7F7Ad0  ONLINE       0     0     0
> > > >             c7t5002538D41CD7F7Dd0  ONLINE       0     0     0
> > > >         logs
> > > >           mirror-1                 ONLINE       0     0     0
> > > >             c4t2d0                 ONLINE       0     0     0
> > > >             c4t4d0                 ONLINE       0     0     0
> > > >
> > > > I used the correct ashift of 13 for the Samsung 850 EVO.
> > > >
> > > > zdb | grep ashift:
> > > >
> > > >         ashift: 13
> > > >         ashift: 13
> > > >         ashift: 13
> > > >         ashift: 13
> > > >         ashift: 13
> > > >
> > > > But I write a lot to the SSDs every 5 seconds (much more than the
> > > > network traffic of 2 MB/s).
> > > >
> > > > iostat -xn -d 1:
> > > >
> > > >     r/s    w/s   kr/s      kw/s wait actv wsvc_t asvc_t  %w  %b device
> > > >    11.0 3067.5  288.3  153457.4  6.8  0.5    2.2    0.2   5  14 filervm2
> 
> > > filervm2 is seeing 3067 writes per second. This is the interface to
> > > the upper layers. These writes are small.
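> > >
> > > (Back-of-the-envelope on the iostat sample above: kw/s divided by w/s gives
> > > the average write size at the pool interface, a bit under the 64K
> > > volblocksize shown further down.)
> > >
> > >     echo "scale=1; 153457.4 / 3067.5" | bc    # ~50.0 KB per write on average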
> 
> > > >     0.0    0.0    0.0       0.0  0.0  0.0    0.0    0.0   0   0 rpool
> > > >     0.0    0.0    0.0       0.0  0.0  0.0    0.0    0.0   0   0 c4t0d0
> > > >     0.0    0.0    0.0       0.0  0.0  0.0    0.0    0.0   0   0 c4t1d0
> > > >     0.0  552.6    0.0   17284.0  0.0  0.1    0.0    0.2   0   8 c4t2d0
> > > >     0.0  552.6    0.0   17284.0  0.0  0.1    0.0    0.2   0   8 c4t4d0
> >
> 
> > > The log devices are seeing 552 writes per second, and since sync=standard
> > > that means that the upper layers are requesting syncs.
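> > >
> > > (The sync policy itself is easy to list per zvol; figuring out which
> > > initiator actually issues the syncs would need tracing, so this only
> > > confirms the setting:)
> > >
> > >     zfs get -r sync filervm2    # shows standard/always/disabled for each zvol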
> 
> > > >     1.0  233.3   48.1   10051.6  0.0  0.0    0.0    0.1   0   3 c7t5002538D41657AAFd0
> > > >     5.0  250.3  144.2   13207.3  0.0  0.0    0.0    0.1   0   3 c7t5002538D41CC7127d0
> > > >     2.0  254.3   24.0   13207.3  0.0  0.0    0.0    0.1   0   4 c7t5002538D41CC7105d0
> > > >     3.0  235.3   72.1   10051.6  0.0  0.0    0.0    0.1   0   3 c7t5002538D41F85C0Dd0
> > > >     0.0  228.3    0.0   16178.7  0.0  0.0    0.0    0.2   0   4 c7t5002538D41CD83FDd0
> > > >     0.0  225.3    0.0   16210.7  0.0  0.0    0.0    0.2   0   4 c7t5002538D41CD7F7Ed0
> > > >     0.0  282.3    0.0   19991.1  0.0  0.0    0.0    0.2   0   5 c7t5002538D41CD7F7Dd0
> > > >     0.0  280.3    0.0   19871.0  0.0  0.0    0.0    0.2   0   5 c7t5002538D41CD7F7Ad0
> 
> > > The pool disks see 1989 writes per second total or 994 writes per
> > > second logically.
> 
> > > It seems to me that reducing 3067 requested writes to 994 logical
> > > writes is the opposite of amplification. What do you expect?
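> > >
> > > (Those figures follow directly from the iostat sample: sum the w/s column
> > > over the eight pool SSDs, then halve it because each logical write lands
> > > on both sides of a mirror.)
> > >
> > >     echo "(233.3+250.3+254.3+235.3+228.3+225.3+282.3+280.3)/2" | bc    # ~994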
> > > -- richard
> >
> 
> > > >
> > > > I used a zvol with a 64k volblocksize; I tried with 8k and the problem is the same.
> > > >
> > > > zfs get all filervm2/hdd-110022a:
> > > >
> > > > NAME                  PROPERTY              VALUE                  SOURCE
> > > > filervm2/hdd-110022a  type                  volume                 -
> > > > filervm2/hdd-110022a  creation              Tue May 16 10:24 2017  -
> > > > filervm2/hdd-110022a  used                  5.26G                  -
> > > > filervm2/hdd-110022a  available             2.90T                  -
> > > > filervm2/hdd-110022a  referenced            5.24G                  -
> > > > filervm2/hdd-110022a  compressratio         3.99x                  -
> > > > filervm2/hdd-110022a  reservation           none                   default
> > > > filervm2/hdd-110022a  volsize               25G                    local
> > > > filervm2/hdd-110022a  volblocksize          64K                    -
> > > > filervm2/hdd-110022a  checksum              on                     default
> > > > filervm2/hdd-110022a  compression           lz4                    local
> > > > filervm2/hdd-110022a  readonly              off                    default
> > > > filervm2/hdd-110022a  copies                1                      default
> > > > filervm2/hdd-110022a  refreservation        none                   default
> > > > filervm2/hdd-110022a  primarycache          all                    default
> > > > filervm2/hdd-110022a  secondarycache        all                    default
> > > > filervm2/hdd-110022a  usedbysnapshots       15.4M                  -
> > > > filervm2/hdd-110022a  usedbydataset         5.24G                  -
> > > > filervm2/hdd-110022a  usedbychildren        0                      -
> > > > filervm2/hdd-110022a  usedbyrefreservation  0                      -
> > > > filervm2/hdd-110022a  logbias               latency                default
> > > > filervm2/hdd-110022a  dedup                 off                    default
> > > > filervm2/hdd-110022a  mlslabel              none                   default
> > > > filervm2/hdd-110022a  sync                  standard               local
> > > > filervm2/hdd-110022a  refcompressratio      3.99x                  -
> > > > filervm2/hdd-110022a  written               216K                   -
> > > > filervm2/hdd-110022a  logicalused           20.9G                  -
> > > > filervm2/hdd-110022a  logicalreferenced     20.9G                  -
> > > > filervm2/hdd-110022a  snapshot_limit        none                   default
> > > > filervm2/hdd-110022a  snapshot_count        none                   default
> > > > filervm2/hdd-110022a  redundant_metadata    all                    default
> > > >
> > > > Sorry for my bad English.
> > > >
> > > > What could be the problem? Thanks.
> > > >
> > > > Best regards,
> > > >
> > > > Anthony
> 
> How did you set up your LUNs? In particular, what is the block size for those LUNs? Could it be that you went with the default of 512 B blocks, while the drives actually have 4k or even 8k blocks?
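> 
> For what it's worth, the block size is fixed when the LU is created; something
> along these lines should show the current value and create a 4k-block LU (the
> second zvol path is just an example, and the syntax is from memory):
> 
>     # show the block size of the existing LU
>     stmfadm list-lu -v 600144F04D4F0600000059A588910001
>     # create a new LU with a 4 KiB block size on an example zvol
>     stmfadm create-lu -p blk=4096 /dev/zvol/rdsk/filervm2/hdd-test4k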
> 
> Cheers,
> Stephan
> 
> 
