[OmniOS-discuss] KVMs locking up for seconds at a time...

Tobias Oetiker tobi at oetiker.ch
Fri Mar 17 13:42:09 UTC 2017


We found the cause of the problem:

  svc:/system/rcap:default

Enable it and you get the behaviour detailed below, plus random hangs on NFS and iSCSI exports.

Disable it and things are back to normal.


cheers
tobi


----- On Mar 13, 2017, at 6:15 PM, Dan McDonald danmcd at omniti.com wrote:

> Hello!
> 
> I'm including Tobias in this, as he reported this to me in OmniOS r151020
> (released November of 2016).  He can correct any mistake I make in reporting
> this.
> 
> His KVM instances are experiencing, "Random short freezes".  Let me quote him:
> 
>> We are running kvm instances on omni r20 and are experiencing random short
>> freezes.
>> I wrote the following short test script to see how frequently the freezing
>> occurs
>> 
>> perl -e 'use Time::HiRes qw(time usleep); my $now = time; while(1){usleep
>> 200000; my $next = time; my $diff = $next - $now; $now=$next; if ($diff >
>> 0.22){ print "".localtime(time)." ".$diff,"\n"}}'
>> 
>> the output looks like this
>> 
>> Thu Mar  9 15:26:12 2017 0.224979877471924
>> Thu Mar  9 15:26:23 2017 0.273133993148804
>> Thu Mar  9 15:27:54 2017 1.17526292800903
>> Thu Mar  9 15:28:59 2017 2.04209899902344
>> Thu Mar  9 15:30:31 2017 1.0813729763031
>> Thu Mar  9 15:30:44 2017 0.600342988967896
>> Thu Mar  9 15:31:47 2017 1.43648099899292
>> Thu Mar  9 15:32:25 2017 0.897988796234131
>> Thu Mar  9 15:33:28 2017 0.998631000518799
>> Thu Mar  9 15:34:10 2017 4.89306116104126
>> Thu Mar  9 15:36:15 2017 1.22311997413635
>> Thu Mar  9 15:38:57 2017 1.64742302894592
>> Thu Mar  9 15:39:21 2017 1.36228013038635
>> Thu Mar  9 15:40:08 2017 1.62232208251953
>> Thu Mar  9 15:40:32 2017 1.37291598320007
>> Thu Mar  9 15:42:30 2017 0.211127996444702
>> 
>> as you can see there are quite frequent short freezes ...
> 
> So his KVM guest sees freezes.  And he ran that perl one-liner in the OmniOS
> global zone without noticing any slowdowns.
> 
> I asked him to pstack(1) the qemu process.  It's attached below as pstack.zip.
> 
> He further noticed a manifestation in the global zone:
> 
>> What I found while looking up process numbers on the problematic box, is that
>> 
>>   time cat /proc/*/psinfo > /dev/null
>> 
>> Takes anywhere between 0.01s and 4s when called repeatedly, whereas on
>> another host, where there are no severe KVM hangs, this command always takes
>> about 0.02 seconds.
> 
> So he can find slowness that's likely KVM-induced in the global zone.  With
> that in mind, I suggested the following:
> 
>>> lockstat(1M) may be helpful here:
>>> 
>>> 	lockstat -o <somewhere> cat /proc/*/psinfo > /dev/null
>>> 
>>> I'll want to see that output from <somewhere>.  ESPECIALLY if it's a "takes way
>>> too long" sort of result.
> 
> 
> He included three lockstat outputs, also attached below.
> 
> The longer-running ones had this lock:
> 
> 11423  73%  73% 0.00     3403 0xfffffea4320ef020
> gfn_to_memslot_unaliased+0x1f
> 
> being hammered more in the long-running ones than in the short-running ones.
> Now I don't know KVM internals all that well, but it looks like the lock in
> question protects a linear array search of the memory slots.  The lookup
> appears simply to walk the array and stop when the requested address falls
> within the range of one of the slots.
> 
> I've not asked Tobias yet to dig into kmdb to see how many memory slots are in
> this kvm, but let me do that too:
> 
> Tobias:  Try this:
> 
>	dtrace -n 'gfn_to_memslot_unaliased:entry { printf("\nKVM 0x%p, number of
>	memslots: %d\n", arg0, ((struct kvm_memslots *)((struct kvm
>	*)arg0)->memslots)->nmemslots); stack(); exit(0);}'
> 
> If your system is still running the same KVM instances, check for
> 0xfffffea4320ef000 as the KVM pointer.
> 
> Everyone else ===> Any idea if memslots would cause KVM instances to misbehave
> per above?  If not, any other clues so I don't keep chasing red herrings?  If
> so, should we perhaps spend some time making memslots more efficient?  Or are
> there operational considerations not being accounted for?
> 
> Thanks,
> Dan

-- 
Tobi Oetiker, OETIKER+PARTNER AG, Aarweg 15 CH-4600 Olten, Switzerland
www.oetiker.ch tobi at oetiker.ch +41 62 775 9902

