[OmniOS-discuss] Clues for tracking down why kernel memory isn't being released?

Thu Jul 16 18:34:37 UTC 2015

> On Jul 16, 2015, at 9:48 AM, Chris Siebenmann <cks at cs.toronto.edu> wrote:
> 
> I wrote:
>> We have one ZFS-based NFS fileserver that persistently runs at a very
>> high level of non-ARC kernel memory usage that never seems to shrink.
>> On a 128 GB machine, mdb's ::memstat reports 95% memory usage by just
>> 'Kernel' while the ZFS ARC is only at about 21 GB (as reported by
>> 'kstat -m') although c_max should allow it to grow much bigger.
>> 
>> According to ::kmastat, a *huge* amount of this memory appears to be
>> vanishing into allocated but not used kmem_alloc_131072 slab buffers:
>> 
>>> ::kmastat
>> cache                            buf       buf       buf memory      alloc alloc
>> name                            size    in use     total in use    succeed  fail
>> ------------------------------ ----- --------- --------- ------ ---------- -----
>> [...]
>> kmem_alloc_131072               128K         6    613033  74.8G  196862991     0
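The numbers in that row are worth spelling out. 613033 buffers of 128K is about 74.8 GiB held by the cache, while the 6 buffers actually in use account for only 768 KiB. Here is a small Python sketch of that arithmetic. It assumes the ::kmastat column layout shown above, and the helper names are made up for illustration:

```python
# Hypothetical helper, not part of mdb: parse one data row of ::kmastat
# output (column order assumed from the excerpt above) and compare the
# memory backing live buffers with the memory the cache is holding.
UNITS = {"K": 1024, "M": 1024**2, "G": 1024**3, "T": 1024**4}


def parse_size(text: str) -> int:
    """Convert a ::kmastat size such as '128K' or '74.8G' to bytes."""
    if text[-1] in UNITS:
        return int(float(text[:-1]) * UNITS[text[-1]])
    return int(text)


def cache_usage(row: str) -> dict:
    # Columns: cache name, buf size, bufs in use, bufs total,
    # memory in use, allocs succeeded, allocs failed.
    name, size, in_use, total, mem, _succeed, _fail = row.split()
    buf_size = parse_size(size)
    return {
        "name": name,
        "live_bytes": buf_size * int(in_use),  # backing allocated buffers
        "held_bytes": buf_size * int(total),   # approx. held (no slab overhead)
        "reported_bytes": parse_size(mem),     # ::kmastat's rounded figure
    }


u = cache_usage("kmem_alloc_131072  128K  6  613033  74.8G  196862991  0")
print(u["live_bytes"])                    # 786432 (768 KiB)
print(round(u["held_bytes"] / 2**30, 1))  # 74.8 (GiB)
```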
> 
> It turns out that the explanation for this is relatively simple, as
> is the work around. Put simply: the OmniOS kernel does not actually
> free up these deallocated cache objects until the system is put under
> relatively strong memory pressure. Crucially, *the ZFS ARC does not
> create this memory pressure*; I think that you pretty much need a user
> level program allocating enough memory in order to trigger it, and I
> think the memory growth needs to happen relatively rapidly so that
> the kernel doesn't reclaim enough memory through lesser means (such as
> shrinking the ZFS ARC).

I don't think we will get much traction for ZFS pushing applications out of RAM.
There is a nuance here that can be difficult to resolve.

> 
> (Specifically, you need to force kmem_reap() to be called. The primary
> path for this is when 'freemem' drops below 'lotsfree', which is only a few
> hundred MB on many systems. See usr/src/uts/common/os/vm_pageout.c in
> the OmniOS source repo.)
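To see how far a system is from that threshold, the unix:0:system_pages kstat reports both freemem and lotsfree as page counts. Below is a minimal Python sketch that parses 'kstat -p unix:0:system_pages' output and returns the headroom in bytes. The function names are hypothetical, and the 4096-byte default page size is an assumption (typical for x86). Check it with pagesize(1).

```python
# Sketch: how much free memory remains before freemem falls under lotsfree,
# computed from 'kstat -p unix:0:system_pages' output. Both statistics are
# page counts; page_size=4096 is an assumed default (x86), not a given.

def parse_kstat_p(text: str) -> dict:
    """Map 'module:instance:name:statistic' -> raw value string."""
    stats = {}
    for line in text.splitlines():
        parts = line.split(None, 1)
        if len(parts) == 2:
            stats[parts[0]] = parts[1].strip()
    return stats


def reap_headroom_bytes(text: str, page_size: int = 4096) -> int:
    """Bytes left before freemem drops below lotsfree.

    A negative result means freemem is already under lotsfree, the
    condition described above as the main path to kmem_reap()."""
    stats = parse_kstat_p(text)
    freemem = int(stats["unix:0:system_pages:freemem"])
    lotsfree = int(stats["unix:0:system_pages:lotsfree"])
    return (freemem - lotsfree) * page_size
```

On a live system the text can come from subprocess.run(["kstat", "-p", "unix:0:system_pages"], capture_output=True, text=True).stdout.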
> 
> Since our fileservers are purely NFS fileservers and have a basically
> static level of user memory usage, they rarely or never rapidly use up
> enough memory to trigger this 'allocated but unused' reclaim[*].
> 
> The good news is that it's easy enough these days to eat memory at the
> user level (you can do it with modern 64-bit scripting languages like
> Python, even at an interactive prompt). The bad news is that when we did
> this on the server in question we provoked a significant system stall at
> both the NFS server level and even the level of ssh logins and shells;
> this is clearly not something that we'd want to automate.
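For reference, here is a minimal sketch of the kind of user-level memory hog described here, written in Python. The function name, chunk size, and 4 KB page size are assumptions for illustration. How much to allocate depends on the machine, and as noted above, doing this on a production server can stall it.

```python
# Sketch of eating memory from user level to create memory pressure.
# WARNING: on the fileserver discussed above this caused a significant
# stall. PAGE is an assumed page size, used only to touch each page once.
PAGE = 4096


def eat_memory(total_bytes: int, chunk_bytes: int = 1 << 30) -> list:
    """Allocate total_bytes in chunks and write to every page so the
    memory is actually backed by RAM. Returns the chunks; the memory is
    released once the caller drops all references to the list."""
    chunks = []
    remaining = total_bytes
    while remaining > 0:
        size = min(chunk_bytes, remaining)
        chunk = bytearray(size)
        # Write one byte per page. Extended-slice assignment needs a
        # replacement of exactly the same length as the slice.
        n_pages = len(range(0, size, PAGE))
        chunk[::PAGE] = b"\x01" * n_pages
        chunks.append(chunk)
        remaining -= size
    return chunks
```

Interactively this would be something like hog = eat_memory(100 * 2**30), followed by del hog afterwards. The 100 GiB figure is only an example.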
> 
> It's my personal opinion that there should be something in the kernel
> that automatically reaps drastically outsized kmem caches after a
> while. It's absurd that we've run for weeks with more than 70 GB of RAM
> sitting unused and an undersized ZFS ARC because of this.

kmem reaps can be very painful.

> 
> 	- cks
> [*: interested parties can see how often cache reaping has been triggered
>    with the following 'mdb -k' command:
> 	::walk kmem_cache | ::printf "%4d %s\n" kmem_cache_t cache_reap cache_name

Ugh. How about:
kstat -p :::reap
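That output is also easy to summarize from a script. Here is a sketch in Python that assumes the usual 'kstat -p' layout: module:instance:name:statistic, then whitespace, then the value. The function names are hypothetical.

```python
import subprocess


def reap_counts(kstat_output: str) -> dict:
    """Map kstat name (the cache name, for kmem cache kstats) -> reap count."""
    counts = {}
    for line in kstat_output.splitlines():
        parts = line.split(None, 1)
        if len(parts) != 2 or not parts[1].strip().isdigit():
            continue
        key, value = parts
        try:
            # Split off the statistic first, then module and instance, so
            # any ':' inside the name itself is preserved.
            prefix, _stat = key.rsplit(":", 1)
            _module, _instance, name = prefix.split(":", 2)
        except ValueError:
            continue
        counts[name] = counts.get(name, 0) + int(value)
    return counts


def live_reap_counts() -> dict:
    """Run 'kstat -p :::reap' on the local system and summarize it."""
    out = subprocess.run(["kstat", "-p", ":::reap"],
                         capture_output=True, text=True, check=True).stdout
    return reap_counts(out)
```

Sorting the result by count, for example sorted(live_reap_counts().items(), key=lambda kv: kv[1], reverse=True), shows which caches have been reaped at all.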

 -- richard

> 
>    Even on this heavily used fileserver, up for 45 days, the reap count
>    was *8*.  Many of our other fileservers, with less usage, have reap
>    counts of 0.
> ]
> _______________________________________________
> OmniOS-discuss mailing list
> OmniOS-discuss at lists.omniti.com
> http://lists.omniti.com/mailman/listinfo/omnios-discuss
