[OmniOS-discuss] Fragmentation

Jim Klimov jimklimov at cos.ru
Sun Jun 25 19:36:38 UTC 2017


On June 23, 2017 9:01:20 PM GMT+02:00, Richard Elling <richard.elling at richardelling.com> wrote:
>ZIL pre-allocates at the block level, so think along the lines of 12k
>or 132k.
> — richard
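
(If I read Richard right: each ZIL entry carries a physical-block-sized chain block, so with a 4k physical block an 8k sync write pre-allocates 8k + 4k = 12k, and a 128k write pre-allocates 128k + 4k = 132k.)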
>
>> On Jun 23, 2017, at 11:30 AM, Günther Alka <alka at hfg-gmuend.de> wrote:
>> 
>> hello Richard
>> 
>> I can see that the ZIL does not add more fragmentation to the free space, but is this effect relevant if a ZIL pre-allocates, say, 4G while the remaining fragmented pool space for regular writes is 12T?
>> 
>> Gea
>> 
>> On 23.06.2017 at 19:30, Richard Elling wrote:
>>> A slog helps fragmentation because the space for ZIL is pre-allocated based on a prediction of how big the write will be. The pre-allocated space includes a physical-block-sized chain block for the ZIL. An 8k write can allocate 12k for the ZIL entry that is freed when the txg commits. Thus, a slog can help decrease free space fragmentation in the pool.
>>>  — richard
>>> 
>>> 
>>>> On Jun 23, 2017, at 8:56 AM, Guenther Alka <alka at hfg-gmuend.de> wrote:
>>>> 
>>>> A ZIL, or better a dedicated slog device, will not help, as it is not a write cache but a log device. It is only there to commit every written data block and put it onto stable storage. It is read only after a crash, to redo a committed write that would otherwise be missing.
>>>> 
>>>> All writes, no matter whether sync or not, go through the RAM-based write cache (by default up to 4GB). This cache is flushed from time to time as a large sequential write. How fragmented those writes end up then depends on the fragmentation of the free space.
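
Side note: on illumos that RAM-based write cache is bounded by the zfs_dirty_data_max tunable, which IIRC defaults to 10% of RAM capped at 4GB, matching the number above. From memory (so verify before relying on it), the live value can be inspected with mdb:

  # print the current dirty-data limit in bytes (8-byte decimal)
  echo zfs_dirty_data_max/E | mdb -k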
>>>> 
>>>> Gea
>>>> 
>>>> 
>>>>> To prevent it, a ZIL caching all writes (including sync ones, e.g. NFS) can help. Perhaps a DDR drive (or a mirror of these) with battery and flash protection against power loss, so it does not wear out like flash would. In this case, however random the incoming writes are, ZFS does not have to put them on media ASAP, so it can do larger writes later. This can also protect SSD arrays from excessive small writes and wear-out, though a bad(ly sized) ZIL can become a bottleneck there.
>>>>> 
>>>>> Hope this helps,
>>>>> Jim

@Gea, IIRC one can set the sync mode on a dataset, effectively forcing all writes to go to the (dedicated) ZIL, while the data remains in memory until it is flushed to persistent bulk storage the way normal pool writes are. This way more consolidated writes can be sent to the pool disks, rather than forcing many small (sync) allocations and deallocations when the (sync) writes are small and intensive enough, e.g. when appending to log files.
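
For example, something like this (from memory, so treat it as a sketch; "tank/logs" is just a placeholder dataset):

  # force all writes, async ones included, through the ZIL/slog
  zfs set sync=always tank/logs

  # verify the setting
  zfs get sync tank/logs

With sync=always, even async writes get logged to the slog first, while the data itself still sits in RAM until the normal txg commit writes it out in larger chunks.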

For SSD pools this is also thought to ease wear, thanks to the ability to program whole pages at once; it likewise compensates for small intensive random writes, since random LBAs can end up living in the same page.
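
To see whether free-space fragmentation is actually biting on a given pool, the fragmentation property of zpool list reports it as a percentage ("tank" again being a placeholder):

  # show pool-wide free-space fragmentation
  zpool list -o name,size,allocated,free,fragmentation,capacity tank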

Jim

I hope Richard will correct me if I got something wrong ;)
--
Typos courtesy of K-9 Mail on my Redmi Android

