[OmniOS-discuss] KVM Performance Update

Dan McDonald danmcd at omniti.com
Mon May 11 15:48:24 UTC 2015


I first want to apologize for not recognizing the cause of the KVM performance problems (which were DROPPED PACKETS) much sooner.  Until recently, our in-house KVM deployments have been exclusively on r151006.  I've added an OI KVM box to our r151014 build machine, to make sure I have a platform on which to attempt to reproduce problems.

What happened was that upstream illumos KVM (from Joyent) had a platform flag day during r151012's development: the VND code. Joyent's illumos derivative has Virtual Networking Devices (VND), which allow KVM instances to receive packets without depending on an actual NIC's Promiscuous Mode.  They updated their illumos, and subsequently their KVM.  Remember that "KVM" has two parts:  the kernel KVM driver (from Joyent's illumos-kvm repo), and "KVM-cmd", which is QEMU (from Joyent's illumos-kvm-cmd repo).

Other distros do not currently have VND (the illumos community is attempting to fix this, and Joyent is leading here, modulo their own day jobs).  The latest revisions of illumos-kvm-cmd (the QEMU bits) fail to compile without VND present. We reset illumos-kvm-cmd to the pre-VND revision, but did NOT reset the illumos-kvm bits to pre-VND.  Since the world compiled and ran in this split state, I moved forward.  The PROBLEM was that the amount of internal buffering for promiscuous devices is low; VND fixes this by reducing the use of promiscuous mode, but non-VND illumos (like OmniOS) still needs to increase the buffering limits.  The up-to-date kernel side eliminated the method for increasing those limits, causing MUCH higher packet-drop rates.

Quoting Joyent's Robert Mustacchi:

> By default the stream high watermark for the promisc mode is quite low.
> And for some reason, that I don't recall, there was no great way to do
> that ourselves from user land (could be wrong entirely). As a result, if
> you don't set it, we're basically going to start dropping mblk_t's
> queued on the stream.
> 
> Basically without vnd, you need both of those. With vnd, then you can
> get rid of it in both QEMU and KVM.
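The effect Robert describes can be shown with a toy model (this is an illustration only, not the illumos STREAMS implementation): a stream-head queue with a fixed high watermark drops anything that arrives while the queue is full, so bursty traffic against a low watermark loses packets even when the consumer keeps up on average.

```python
from collections import deque

def simulate(hiwat, bursts, drain_per_tick):
    """Toy model of a stream-head queue: once its depth hits the
    high watermark (hiwat), newly arriving packets are dropped,
    analogous to dropping queued mblk_t's."""
    queue = deque()
    dropped = 0
    for burst in bursts:
        # Packets arrive in a burst...
        for _ in range(burst):
            if len(queue) >= hiwat:
                dropped += 1
            else:
                queue.append(object())
        # ...then the consumer (the guest) drains a fixed amount.
        for _ in range(min(drain_per_tick, len(queue))):
            queue.popleft()
    return dropped

# Bursty arrivals: 20 packets per tick; the consumer drains 10 per tick.
bursts = [20] * 100
low  = simulate(hiwat=8,    bursts=bursts, drain_per_tick=10)
high = simulate(hiwat=2048, bursts=bursts, drain_per_tick=10)
print(low, high)  # the low watermark drops far more packets
```

The watermark values here are arbitrary; the point is only that raising the buffering limit absorbs bursts that a low default cannot, which is exactly what the eliminated kernel-side knob used to allow.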

Tobi Oetiker (who deserves a ton of credit for calling this problem out, AND determining it was packet drops) helped me test two solutions to the problem:

1.) Revert illumos-kvm to the pre-VND level as well.

2.) Keep up to date with illumos-kvm and illumos-kvm-cmd, but explicitly revert the VND changes in BOTH.

I'm strongly leaning toward committing solution #2. Regardless of which solution I pick, I will be issuing an update for r151014 later this week that restores KVM performance to its pre-VND-bump levels.

GOING FORWARD, once VND is upstreamed into illumos-gate, I can eliminate the VND backouts (or just catch up the built repos if I use option #1 above).

Thank you all for your patience, and again, sorry for not addressing this sooner.

Dan McDonald -- OmniOS Engineering



