[OmniOS-discuss] ILB memory leak?
Al Slater
al.slater at scluk.com
Thu Nov 5 11:38:30 UTC 2015
To the mailing list as well...
On 22/10/2015 09:43, Al Slater wrote:
> On 21/10/2015 17:35, Dan McDonald wrote:
>>
>>> On Oct 21, 2015, at 6:08 AM, Al Slater <al.slater at scluk.com>
>>> wrote:
>>>
>>> Hi,
>>>
>>> I am running omnios r151014 on a couple of machines with a couple
>>> of zones each. 1 zone runs apache as an SSL reverse proxy, the
>>> other runs ILB for load balancing web to app tier connections.
>>>
>>> I noticed that in the ILB zone, the ilbd process memory grows to
>>> about 2GB. Restarting ILB releases the memory, and then the
>>> memory usage gradually increases again, with each memory increase
>>> approximately 2 * the size of the previous one. I run a cronjob
>>> twice a day (8am and 8pm) which restarts the ilb service and
>>> releases the memory.
>>>
>>> A graph of memory usage is available at
>>> https://www.dropbox.com/s/zaz51apxslnivlq/ILB_Memory_2_days.png?dl=0
>>>
>>> There are currently 62 rules in the load balancer, with a total
>>> of 664 server/port pairs.
>>>
>>> Is there anything I can provide that would help track this down?
>>
>> You can use svccfg(1M) to enable user-level memory debugging on ilb.
>> It may cause the ilb daemon to dump core. (And you're just noticing
>> this in the process, not kernel memory consumption, correct?)
>
> I am seeing kernel memory consumption increasing as well, but that may
> be a different issue. The ilbd process memory is definitely growing.
>
>> As root:
>>
>> svcadm disable -t ilb
>> svccfg -s ilb setenv LD_PRELOAD libumem.so
>> svccfg -s ilb setenv UMEM_DEBUG default
>> svccfg -s ilb refresh
>> svcadm enable ilb
>>
>> That should enable user-level memory debugging. If you get a
>> coredump, save it and share it. If you don't and the ilb daemon
>> keeps running, eventually please:
>>
>> gcore `pgrep ilbd`
>>
>> and share THAT corefile. You can also do this by yourself:
>>
>> mdb <ilbd-core>
>> > ::findleaks
>>
>> and share ::findleaks.
>>
>> Once you're done generating corefiles, repeat the steps above, but
>> use "unsetenv LD_PRELOAD" and "unsetenv UMEM_DEBUG" instead of the
>> setenv lines.
>
> Thanks Dan. As we are talking about production boxes here, I will have
> to try and reproduce on another box and then I will give the process
> above a go and see what we come up with.
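The procedure quoted above can be collected into a small script. This is only a sketch under assumptions: the function names, the run wrapper and the APPLY variable are hypothetical, while the svcadm and svccfg invocations are the ones from the quoted message. By default it only prints the commands, so they can be reviewed before being run as root with APPLY=1.

```shell
#!/bin/sh
# Sketch only: wraps the svcadm/svccfg steps quoted above.
# run(), APPLY and the function names are illustrative, not part of ILB or SMF.

SVC=ilb

# Print each command; execute it only when APPLY=1 is set in the environment.
run() {
    echo "$*"
    if [ "${APPLY:-0}" = "1" ]; then
        "$@"
    fi
}

# Restart ilb with libumem preloaded and debugging turned on.
enable_umem_debug() {
    run svcadm disable -t "$SVC" &&
    run svccfg -s "$SVC" setenv LD_PRELOAD libumem.so &&
    run svccfg -s "$SVC" setenv UMEM_DEBUG default &&
    run svccfg -s "$SVC" refresh &&
    run svcadm enable "$SVC"
}

# Undo the above once the corefiles have been collected.
disable_umem_debug() {
    run svcadm disable -t "$SVC" &&
    run svccfg -s "$SVC" unsetenv LD_PRELOAD &&
    run svccfg -s "$SVC" unsetenv UMEM_DEBUG &&
    run svccfg -s "$SVC" refresh &&
    run svcadm enable "$SVC"
}
```

Calling enable_umem_debug without APPLY set lists the five commands in order; disable_umem_debug does the same for the cleanup step.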
I have reproduced the problem on a test box.
prstat shows:
3041 daemon 3946M 3946M sleep 59 0 0:48:03 0.1% ilbd/1
memstat:
root at loki:/export/home/BRIGHTON/aslate# echo ::memstat | mdb -k
Page Summary                Pages                MB  %Tot
------------     ----------------  ----------------  ----
Kernel                     238420               931   12%
ZFS File Data              630861              2464   31%
Anon                      1054835              4120   51%
Exec and libs                2204                 8    0%
Page cache                  10624                41    1%
Free (cachelist)             9236                36    0%
Free (freelist)            105626               412    5%
Total                     2051806              8014
Physical                  2051805              8014
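The MB column in the ::memstat output is the page count multiplied by the page size. A quick check of that arithmetic, assuming the 4096-byte base page size of x86 (the helper name pages_to_mb is made up for this sketch):

```shell
#!/bin/sh
# Sketch: convert ::memstat page counts to MB, assuming 4096-byte pages.
PAGESIZE=4096

pages_to_mb() {
    # Integer division truncates, which matches the values in the table.
    echo $(( $1 * PAGESIZE / 1048576 ))
}

pages_to_mb 1054835   # Anon
pages_to_mb 2051806   # Total
```

With these inputs the function prints 4120 and 8014, matching the Anon and Total rows. The Anon figure is close to the 3946M that prstat reports for ilbd, which suggests that most of the anonymous memory on the box belongs to that one process.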
mdb findleaks:
root at loki:/export/home/BRIGHTON/aslate# mdb core.3041
Loading modules: [ libumem.so.1 libc.so.1 libcmdutils.so.1 libuutil.so.1 ld.so.1 ]
> ::findleaks
findleaks: no memory leaks detected
>
Now I am seeing lots of log messages like the following in
/var/adm/messages:
Nov 5 11:17:01 l1-lb2 ilbd[3041]: [ID 410242 daemon.error]
ilbd_hc_probe_timer: cannot restart timer: rule ggp server _ggp.11,
disabling it
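To see which rules are affected and how often, the messages can be counted per rule. A minimal sketch, assuming each message sits on a single line in the log file (it is only wrapped here by the mail client); count_timer_failures is a hypothetical helper name:

```shell
#!/bin/sh
# Sketch: count "cannot restart timer" messages per rule in a syslog file.
# count_timer_failures is a made-up helper, not an ILB tool.
count_timer_failures() {
    grep 'ilbd_hc_probe_timer: cannot restart timer' "$1" |
        sed 's/.*rule \([^ ]*\) server.*/\1/' |   # keep only the rule name
        sort | uniq -c
}

# Usage: count_timer_failures /var/adm/messages
```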
So I was wrong about it growing to 2GB; the truth is nearer 4GB. I am
guessing that ilbd_hc_restart_timer is failing because no more memory
can be allocated.
I have the 4GB core file. Is there anything useful I can extract from
it to try and spot where the problem is?
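Since ::findleaks reports nothing, the memory is presumably still referenced, so per-cache and per-caller statistics may be more telling than leak detection. A hedged sketch of pulling those from the core, assuming the ::umem_status, ::umastat and ::umausers dcmds of mdb's libumem module are available and that the core was taken with UMEM_DEBUG set as described above; dump_umem_stats is a hypothetical helper name:

```shell
#!/bin/sh
# Sketch: run a few libumem dcmds against a core taken with UMEM_DEBUG set.
# dump_umem_stats is a made-up helper; the dcmds belong to mdb's libumem module.
dump_umem_stats() {
    core=$1
    for dcmd in '::umem_status' '::umastat' '::umausers'; do
        echo "== $dcmd"
        # mdb reads the dcmd from stdin and exits at end of input.
        echo "$dcmd" | mdb "$core"
    done
}

# Usage: dump_umem_stats core.3041 > umem-report.txt
```

The ::umausers output depends on the audit data that UMEM_DEBUG collects, so it is only expected to be useful on a core from a process started with that setting.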
-- Al Slater