[OmniOS-discuss] OmniOS backup box hanging regularly

Jim Klimov jim at cos.ru
Mon Nov 2 11:19:55 UTC 2015

 Small follow-up:

 For now I just disabled the "frequent" and "hourly" schedules, and the box has worked for about a week without freezing. A good sign ;)

 Looking at this, I suspect similar misbehavior on one of the Sol10 boxes, where the original (pre-TimeSlider) zfs-autosnap KSH scripts were ported. It also goes down once every few weeks with forking problems, although we thought that might be due to backup jobs (exporting data from the box) being too heavy at times. Perhaps too many snapshot jobs collide instead...

 Perhaps we should instead look at porting the Python time-slider to these boxes, or properly learn znapzend...

 Thanks,
 Jim

----- Original message -----
From: Lauri Tirkkonen <lotheac at iki.fi>
Date: Tuesday, October 27, 2015 13:40
Subject: Re: [OmniOS-discuss] OmniOS backup box hanging regularly
To: Jim Klimov <jim at cos.ru>
Cc: omnios-discuss at lists.omniti.com

 > On Tue, Oct 27 2015 09:49:40 +0100, Jim Klimov wrote:
 > > So far I use a mix of 'standard' time-slider and additionally
 > > my script that kills oldest snapshot groups (chosen by pattern
 > > of automatic snaps) to keep a specified watermark of free space.
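The watermark-based cleanup described here can be sketched roughly as follows. This is not Jim's actual script: it assumes automatic snapshots are recognizable by a hypothetical `zfs-auto-snap_` name prefix, treats all snapshots sharing one @-suffix as a "group" (one recursive run), and takes the free-space query and the destroy action as callbacks. A real version might wire those to `zfs get available` and `zfs destroy`.

```python
from __future__ import annotations

import re
from collections import defaultdict
from typing import Callable, Iterable

# Hypothetical pattern for automatic snapshot names; adjust to the real scheme.
AUTO_SNAP_RE = re.compile(r"^zfs-auto-snap_")


def group_auto_snapshots(snapshots: Iterable[tuple[str, int]]) -> list[tuple[str, list[str]]]:
    """Group automatic snapshots by their @-suffix, oldest group first.

    `snapshots` yields (full_name, creation_time) pairs, e.g.
    ("tank/home@zfs-auto-snap_hourly-2015-10-27-09h00", 1445936400).
    """
    members: dict[str, list[str]] = defaultdict(list)
    created: dict[str, int] = {}
    for full_name, creation in snapshots:
        _dataset, _, snap = full_name.partition("@")
        if not AUTO_SNAP_RE.match(snap):
            continue  # never touch manually created snapshots
        members[snap].append(full_name)
        # A group's age is the creation time of its oldest member.
        created[snap] = min(creation, created.get(snap, creation))
    return [(snap, members[snap]) for snap in sorted(members, key=created.__getitem__)]


def prune_to_watermark(
    snapshots: Iterable[tuple[str, int]],
    free_bytes: Callable[[], int],
    watermark: int,
    destroy: Callable[[str], None],
) -> list[str]:
    """Destroy whole groups, oldest first, until free space reaches `watermark`.

    Free space is re-read after every group instead of being predicted, because
    the space a snapshot releases depends on what other snapshots still share.
    """
    destroyed: list[str] = []
    for snap, names in group_auto_snapshots(snapshots):
        if free_bytes() >= watermark:
            break
        for name in names:
            destroy(name)
        destroyed.append(snap)
    return destroyed
```

Keeping the ZFS calls behind callbacks makes the ordering logic easy to test against a fake pool before pointing it at a real one.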
 > 
 > Yeah, we were previously using zfs-auto-snap from OpenSolaris before it
 > became time-slider (with one or two local patches).
 > 
 > > Something in this simple activity is enough to bring the box
 > > down into swapping until the deadman knocks to interrupt the
 > > infinite loop looking for a free page, and I've got a screenshot
 > > to prove this theory ;)
 > 
 > In your previous mail you have a 'top' listing with way too many 'zfs'
 > processes owned by zfssnap, and all are hundreds of megabytes in RSS.
 > That sounds like a problem. IIRC, one problematic configuration that
 > caused issues like this was a single filesystem setting a
 > zfs-auto-snapshot property locally in a large tree where it also
 > inherited it from the parent. My memory on this is a bit hazy though.
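Since Lauri's recollection is explicitly hazy, the following is only a sketch of how one might look for that shape of configuration: datasets that set the auto-snapshot property locally to the same value their parent already has. It assumes the property is `com.sun:auto-snapshot` (the user property used by zfs-auto-snapshot and time-slider) and parses the tab-separated output of `zfs get -H -o name,value,source com.sun:auto-snapshot`. A local `false` used to exclude a dataset is ordinary usage, so it is deliberately not flagged.

```python
from __future__ import annotations


def parse_zfs_get(output: str) -> dict[str, tuple[str, str]]:
    """Parse `zfs get -H -o name,value,source <prop>` output (tab-separated)."""
    props: dict[str, tuple[str, str]] = {}
    for line in output.splitlines():
        if not line.strip():
            continue
        name, value, source = line.split("\t", 2)
        if "@" in name:
            continue  # ignore snapshots; only filesystems/volumes matter here
        props[name] = (value, source)
    return props


def redundant_local_settings(props: dict[str, tuple[str, str]]) -> list[str]:
    """Datasets setting the property locally to the value their parent already has."""
    flagged: list[str] = []
    for name, (value, source) in sorted(props.items()):
        if source != "local" or "/" not in name:
            continue  # inherited/unset values, or the pool's root dataset
        parent = name.rsplit("/", 1)[0]
        # "-" is what zfs get prints for an unset user property.
        parent_value, _ = props.get(parent, ("-", "-"))
        if parent_value != "-" and parent_value == value:
            flagged.append(name)
    return flagged
```

To run it against a live system, the input could come from `subprocess.run(["zfs", "get", "-H", "-o", "name,value,source", "com.sun:auto-snapshot"], capture_output=True, text=True).stdout`.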
 > 
 > > I wonder why the offending process doesn't die on some failed
 > > malloc...
 > Good question.
 > 
 > -- 
 > Lauri Tirkkonen | lotheac @ IRCnet