[OmniOS-discuss] KVMs locking up for seconds at a time...
Tobias Oetiker
tobi at oetiker.ch
Fri Mar 17 13:42:09 UTC 2017
We found the cause of the problem:
svc:/system/rcap:default
enable it and enjoy the behaviour detailed below, plus random hangs on NFS and iSCSI exports
disable it and things are as before
cheers
tobi
----- On Mar 13, 2017, at 6:15 PM, Dan McDonald danmcd at omniti.com wrote:
> Hello!
>
> I'm including Tobias in this, as he reported this to me in OmniOS r151020
> (released November of 2016). He can correct any mistake I make in reporting
> this.
>
> His KVM instances are experiencing "random short freezes". Let me quote him:
>
>> We are running kvm instances on omni r20 and are experiencing random short
>> freezes.
>> I wrote the following short test script to see how frequently the freezing
>> occurs:
>>
>> perl -e 'use Time::HiRes qw(time usleep); my $now = time; while(1){usleep
>> 200000; my $next = time; my $diff = $next - $now; $now=$next; if ($diff >
>> 0.22){ print "".localtime(time)." ".$diff,"\n"}}'
>>
>> the output looks like this
>>
>> Thu Mar 9 15:26:12 2017 0.224979877471924
>> Thu Mar 9 15:26:23 2017 0.273133993148804
>> Thu Mar 9 15:27:54 2017 1.17526292800903
>> Thu Mar 9 15:28:59 2017 2.04209899902344
>> Thu Mar 9 15:30:31 2017 1.0813729763031
>> Thu Mar 9 15:30:44 2017 0.600342988967896
>> Thu Mar 9 15:31:47 2017 1.43648099899292
>> Thu Mar 9 15:32:25 2017 0.897988796234131
>> Thu Mar 9 15:33:28 2017 0.998631000518799
>> Thu Mar 9 15:34:10 2017 4.89306116104126
>> Thu Mar 9 15:36:15 2017 1.22311997413635
>> Thu Mar 9 15:38:57 2017 1.64742302894592
>> Thu Mar 9 15:39:21 2017 1.36228013038635
>> Thu Mar 9 15:40:08 2017 1.62232208251953
>> Thu Mar 9 15:40:32 2017 1.37291598320007
>> Thu Mar 9 15:42:30 2017 0.211127996444702
>>
>> as you can see there are quite frequent short freezes ...
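
The same measurement as the Perl one-liner above can be sketched in Python. This is a minimal sketch, not Tobias's script: the function name find_stalls and its parameters are invented for illustration. It defaults to a monotonic clock instead of wall-clock time, so that clock adjustments are not mistaken for freezes.

```python
import time


def find_stalls(interval=0.2, threshold=0.22, iterations=50,
                sleep=time.sleep, clock=time.monotonic):
    """Sleep `interval` seconds per iteration and return a list of
    (iteration, elapsed_seconds) for iterations slower than `threshold`."""
    stalls = []
    last = clock()
    for index in range(iterations):
        sleep(interval)
        now = clock()
        elapsed = now - last
        last = now
        # A healthy system wakes up close to `interval`; anything well
        # beyond that means the process (or the whole guest) was held up.
        if elapsed > threshold:
            stalls.append((index, elapsed))
    return stalls
```

Calling find_stalls(iterations=300) inside the guest samples for roughly a minute and returns only the iterations that overran, which corresponds to the lines the Perl version prints.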
>
> So his KVM guest sees freezes. And he ran that perl in the OmniOS global zone
> w/o noticing any slowdowns.
>
> I asked him to pstack(1) the qemu process. It's attached below as pstack.zip.
>
> He further noticed a manifestation in the global zone:
>
>> What I found while looking up process numbers on the problematic box, is that
>>
>> time cat /proc/*/psinfo > /dev/null
>>
>> Takes anywhere between 0.01s and 4s if called repeatedly, whereas on
>> another host where there are no severe kvm hangs this command always takes about
>> 0.02 seconds.
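
To see the spread rather than a single timing, the "called repeatedly" part can be automated. The following is a rough sketch under assumptions: time_reads is a hypothetical helper name, and it reads the files from Python instead of spawning cat, so absolute numbers will differ somewhat from the time(1) output quoted above.

```python
import glob
import time


def time_reads(pattern="/proc/*/psinfo", runs=20):
    """Read every file matching `pattern`, `runs` times over.
    Returns the wall time in seconds taken by each run."""
    durations = []
    for _ in range(runs):
        start = time.monotonic()
        for path in glob.glob(pattern):
            try:
                with open(path, "rb") as handle:
                    handle.read()
            except OSError:
                # A process can exit between the glob and the open.
                continue
        durations.append(time.monotonic() - start)
    return durations
```

Comparing min(durations) with max(durations) on the affected host and on a healthy one would show whether the 0.01s to 4s spread is reproducible.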
>
> So he can find slowness that's likely KVM-induced in the global zone. With that
> in mind, I told him, and he responded:
>
>>> lockstat(1M) may be helpful here:
>>>
>>> lockstat -o <somewhere> cat /proc/*/psinfo > /dev/null
>>>
>>> I'll want to see that output from <somewhere>. ESPECIALLY if it's a "takes way
>>> too long" sort of result.
>
>
> He included three lockstat outputs, also attached below.
>
> The longer-running ones had this lock:
>
> 11423 73% 73% 0.00 3403 0xfffffea4320ef020
> gfn_to_memslot_unaliased+0x1f
>
> being hammered more in the long-running ones than in the short-running ones.
> Now I don't know KVM internals all that well, but it looks like the lock in
> question protects a linear array search of memory slots. It appears it just
> runs through the slots and stops when the requested address falls within the
> range of one of them.
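
For readers unfamiliar with the pattern being described, here is an illustration of a linear range search. It is not the KVM code: MemSlot, its fields base_gfn and npages, and find_slot are names assumed for this sketch only. It shows why the cost of each lookup grows with the number of slots.

```python
from typing import NamedTuple, Optional, Sequence


class MemSlot(NamedTuple):
    # Hypothetical layout: first guest frame number and length in pages.
    base_gfn: int
    npages: int


def find_slot(slots: Sequence[MemSlot], gfn: int) -> Optional[MemSlot]:
    """Walk the slots in order and return the first whose range
    [base_gfn, base_gfn + npages) contains `gfn`, or None."""
    for slot in slots:
        if slot.base_gfn <= gfn < slot.base_gfn + slot.npages:
            return slot
    return None
```

If every such walk happens under one lock, as the lockstat output suggests, lookups from different threads serialize behind each other, so a longer slot list would lengthen every hold of that lock.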
>
> I've not asked Tobias yet to dig into kmdb to see how many memory slots are in
> this kvm, but let me do that too:
>
> Tobias: Try this:
>
> dtrace -n 'gfn_to_memslot_unaliased:entry { printf("\nKVM 0x%p, number of memslots: %d\n",
> arg0, ((struct kvm_memslots *)((struct kvm *)arg0)->memslots)->nmemslots);
> stack(); exit(0);}'
>
> If your system is still running the same KVM instances, check for
> 0xfffffea4320ef000 as the KVM pointer.
>
> Everyone else ===> Any idea if memslots would cause KVM instances to misbehave
> per above? If not, any other clues so I don't keep chasing red herrings? If
> so, should we perhaps spend some time making memslots more efficient? Or are
> there operational considerations not being accounted for?
>
> Thanks,
> Dan
--
Tobi Oetiker, OETIKER+PARTNER AG, Aarweg 15 CH-4600 Olten, Switzerland
www.oetiker.ch tobi at oetiker.ch +41 62 775 9902