[OmniOS-discuss] ILB memory leak?

Al Slater al.slater at scluk.com
Mon Nov 9 16:22:31 UTC 2015


On 09/11/15 15:43, Dan McDonald wrote:
> 
>> On Nov 9, 2015, at 8:39 AM, Al Slater <al.slater at scluk.com> wrote:
>> 
>> Attached is a compressed file with 5hrs or so of 10s pmaps.
>> Hopefully not too big for the list.
> 
> It compressed nicely.  I'm noticing a pattern:
> 
> Mon Nov  9 08:21:45 UTC 2015 total Kb  134008  133504  131416
> Mon Nov  9 08:50:21 UTC 2015 total Kb  265080  264576  262488
> Mon Nov  9 09:37:42 UTC 2015 total Kb  265088  264580  262492
> Mon Nov  9 09:47:40 UTC 2015 total Kb  527232  526724  524636
> Mon Nov  9 11:42:19 UTC 2015 total Kb 1051520 1050960 1048872
> Mon Nov  9 11:42:29 UTC 2015 total Kb 1051520 1051012 1048924
> 
> 
> It's mostly linear growth.  Notice the time intervals also double
> whenever the footprint essentially doubles?
> 
> So I need to back up and ask some things, especially given libumem
> doesn't appear to show leaks or even usage:
> 
> 1.) Is the eating of memory affecting your system performance?  (If
> you've only 8GB, yeah, I can see that.)

Hmmm...  I started investigating after the servers hung a couple of
times.  I have not conclusively proved that this was the cause, but the
machines have been running for months with no issue after I added a
cronjob to restart ilb twice a day.  I can see a gradual increase in
kernel memory use as well, but I have not investigated that.

> 2.) Is ilb failing after it gets sufficiently large?

Again, no link conclusively proved, but I did see log messages like the
following when the memory use had grown to 4GB...

Nov  5 11:17:01 l1-lb2 ilbd[3041]: [ID 410242 daemon.error]
ilbd_hc_probe_timer: cannot restart timer: rule ggp server _ggp.11,
disabling it

I looked at the source for ilbd and I think this could be caused by a
memory allocation failure in iu_schedule_timer.

After these messages were generated, it looks like the disabled servers
were never re-enabled, so eventually this could end up with no enabled
servers, and therefore no service, without manual intervention.
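
For anyone who wants to pull the same doubling pattern out of their own
pmap logs, here is a minimal sketch in Python. It assumes each sample is
a date line followed by the "total Kb" summary on one line, exactly as in
the excerpt quoted above, and that the first number is the total mapped
size in Kb. The names find_doublings and SAMPLE and the 1.9 threshold are
my own choices for illustration, not anything from ilbd or pmap.

```python
import re
from datetime import datetime

# Sample lines copied from the excerpt quoted in this thread.
SAMPLE = """\
Mon Nov  9 08:21:45 UTC 2015 total Kb  134008  133504  131416
Mon Nov  9 08:50:21 UTC 2015 total Kb  265080  264576  262488
Mon Nov  9 09:37:42 UTC 2015 total Kb  265088  264580  262492
Mon Nov  9 09:47:40 UTC 2015 total Kb  527232  526724  524636
Mon Nov  9 11:42:19 UTC 2015 total Kb 1051520 1050960 1048872
Mon Nov  9 11:42:29 UTC 2015 total Kb 1051520 1051012 1048924
"""

# Timestamp, then the first "total Kb" figure (assumed to be total size).
LINE_RE = re.compile(
    r"(?P<ts>\w{3} \w{3}\s+\d+ [\d:]+ UTC \d{4}) total Kb\s+(?P<kb>\d+)"
)


def parse(lines):
    """Yield (timestamp, total_kb) for every line that matches."""
    for line in lines:
        m = LINE_RE.search(line)
        if m:
            # Collapse the double space before single-digit days.
            ts_text = " ".join(m.group("ts").split())
            ts = datetime.strptime(ts_text, "%a %b %d %H:%M:%S UTC %Y")
            yield ts, int(m.group("kb"))


def find_doublings(lines, factor=1.9):
    """Return (timestamp, total_kb, seconds_since_baseline) for each jump.

    A jump is a sample whose size is at least `factor` times the size at
    the previous baseline; the baseline then moves to that sample.
    """
    jumps = []
    baseline = None
    for ts, kb in parse(lines):
        if baseline is None:
            baseline = (ts, kb)
            continue
        if kb >= factor * baseline[1]:
            elapsed = (ts - baseline[0]).total_seconds()
            jumps.append((ts, kb, elapsed))
            baseline = (ts, kb)
    return jumps


if __name__ == "__main__":
    for ts, kb, elapsed in find_doublings(SAMPLE.splitlines()):
        print(f"{ts}  {kb:>8} Kb  {elapsed / 60:.0f} min since last jump")
```

If the gap between jumps roughly doubles each time the size doubles, that
is what you would expect from steady growth into a region that is doubled
whenever it fills, which matches what Dan noticed.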

-- 
Al Slater


