[OmniOS-discuss] Ang: Re: r151014 KVM crash

Fri Apr 10 07:06:17 UTC 2015

Hi Dan and list!

I've been doing some more research on this, see further down.

-----Dan McDonald <danmcd at omniti.com> wrote: -----
To: Johan Kragsterman <johan.kragsterman at capvert.se>
From: Dan McDonald <danmcd at omniti.com>
Date: 2015-04-09 22:52
Cc: "omnios-discuss at lists.omniti.com" <omnios-discuss at lists.omniti.com>, Dan McDonald <danmcd at omniti.com>
Subject: Re: [OmniOS-discuss] r151014 KVM crash

> On Apr 6, 2015, at 4:20 AM, Johan Kragsterman <johan.kragsterman at capvert.se> wrote:
> 
> 
> One of them is a Linux terminal server, and when I wanted to update/upgrade it, both the general OS and the chroot environments I have in it, it crashed. I tried several times, and it crashed every time. It seems to run without problems when I don't do any heavy work on it, but with this update/upgrade it runs for about 5 minutes, then crashes. It can't be started again until I reboot the server.

I've been running KVM on my 014 build box for the past day or two.

I've encountered one problem similar to yours so far.  It manifests at boot time, where I will see this message while the guest kernel (OpenIndiana in this case) loads:

microfind: cpu is too fast

and get an instant, no-core-dump reboot.  I can catch this in KMDB and have the debugger available to me.  In illumos this can occur in one of two places:

http://src.illumos.org/source/xref/illumos-gate/usr/src/uts/i86pc/io/microfind.c#117

http://src.illumos.org/source/xref/illumos-gate/usr/src/uts/i86pc/io/microfind.c#193

Both are in the microfind areas.  I don't know this area of the kernel very well, but I know people who do.  One thing *I* did was restart my qemu process.  Once I did that things *appeared* to be back to normal.  I'm running my KVM in the global zone currently, with this script (a variation of the one we provide on the OmniOS wiki):

--------------
r151014(~)[0]% cat oi-start.sh 
#!/usr/bin/bash

## NOTE:  This must be run as super user.

# configuration
VNIC=oi0
# Sample zvol path.
ROOTHDD=/dev/zvol/dsk/rpool/oi-disk
DATAHDD=/dev/zvol/dsk/data/oi-data-disk
CD=/export/home/danmcd/isos/oi-dev-151a8-text-x86.iso
VNC=1
# Memory for the KVM instance, in Megabytes (2^20 bytes).
MEM=8192

MAC=`dladm show-vnic -po macaddress $VNIC`

#NOTE: Add 
#	-boot cd \
# between -name and -enable-kvm if need be.

/usr/bin/qemu-system-x86_64 \
-name "$(basename $CD)" \
-enable-kvm \
-vnc 0.0.0.0:$VNC \
-smp 4 \
-m $MEM \
-no-hpet \
-localtime \
-drive file=$ROOTHDD,if=ide,index=0 \
-drive file=$DATAHDD,if=ide,index=1 \
-drive file=$CD,media=cdrom,if=ide,index=2  \
-net nic,vlan=0,name=net0,model=e1000,macaddr=$MAC \
-net vnic,vlan=0,name=net0,ifname=$VNIC \
-vga std \
-daemonize

if [ $? -gt 0 ]; then
    echo "Failed to start VM"
    # Stop here so we don't go on to report a started VM.
    exit 1
fi

# TCP port for VNC connections to the KVM instance.  5900 is added to the VNC display number below.
port=`expr 5900 + $VNC`
# Look up the link the configured VNIC sits on, rather than a hard-coded vnic1.
public_nic=$(dladm show-vnic -po over $VNIC)
public_ip=$(ifconfig $public_nic|grep inet|awk '{print $2}')

echo "Started VM:"
echo "Public: ${public_ip}:${port}"
-----------------

I did not directly correlate these with error messages of your sort, but the timing looks right.  I'm curious myself as to what these might be.

Dan

The pair of lines below were written to /var/adm/messages a little more than 1000 times per second while I was on r151014. I'm not sure exactly when in the process they were written, but I guess it was under the fairly heavy load of building chroots and images in the VM, while the KVM process crashed. There doesn't seem to be a problem under lighter load.

Apr  4 16:26:09 omni2 genunix: [ID 713435 kern.info] unhandled rdmsr: 0xff311c4c
Apr  4 16:26:09 omni2 genunix: [ID 391722 kern.info] unhandled wrmsr: 0x526dc5 d
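A rough way to check the logging rate is to count these messages per second of log time. The following is only a sketch: the function name count_msr_per_second is made up for illustration, and it assumes the syslog line layout shown above (month, day, and time as the first three fields).

```shell
#!/usr/bin/bash

# Hypothetical helper: count "unhandled rdmsr/wrmsr" messages per second.
# Assumes syslog lines start with "Mon DD HH:MM:SS", as in the sample above.
count_msr_per_second() {
    # $1: log file to scan; defaults to /var/adm/messages.
    local logfile="${1:-/var/adm/messages}"

    # Keep only the MSR messages, reduce each line to its timestamp,
    # then count identical timestamps and list the busiest seconds first.
    grep 'unhandled [rw][dr]msr' "$logfile" \
        | awk '{ print $1, $2, $3 }' \
        | sort \
        | uniq -c \
        | sort -rn \
        | head
}
```

Calling count_msr_per_second with no argument scans /var/adm/messages; each output line is a count followed by the second it belongs to, so running it under r151012 and r151014 gives two sets of numbers that can be compared.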

If it logs this much of the same thing, it looks serious to me, and it must affect the system, IMHO.

It doesn't log nearly this much under r151012, even under heavy load, and there are no crashes at all under r151012.

There are no core dumps.

I will investigate further by building a new chroot and following the messages, and I will also pay attention to what the console says after the VM has crashed (if it crashes again under the same conditions).

Rgrds Johan

Later....

I have now switched back to r151014, and I can't seem to reproduce this behaviour. I have been building several chroots and images, and doing updates in the way that earlier crashed the KVM process, but I see no problem now, and no excessive logging. Unfortunately, that also means I no longer have access to the console messages I saw earlier...

I don't know whether it is good or bad that it seems to work now. It would have been good to know what caused the problems, but it is nice that it seems to work without problems now...

Rgrds Johan