[OmniOS-discuss] The ixgbe driver, Lindsay Lohan, and the Greek economy

W Verb wverb73 at gmail.com
Sat Feb 21 01:48:29 UTC 2015


Hello all,

Each of the things in the subject line:
1: Is horrendously broken
2: Has an extremely poor short-term outlook
3: Will take a huge investment of time by intelligent, dedicated, and
insightful people to fix.

It's common knowledge that the ixgbe driver in omnios/illumos/opensolaris
is broken as hell. The point of this message is not to complain. The point
is to pass on a configuration that is working for me, albeit in a
screwed-up, degraded fashion.

I have four ESXi 5.5u2 host servers with one Intel PCI-e quad-gigabit NIC
installed in each. Three of the gigabit ports on each host are dedicated
to carrying iSCSI traffic between that host and a single storage server.

The storage server is based on a Supermicro X10SLM-F mainboard, which has
three PCI-e slots. Two of the slots are used for storage controllers, and a
single slot is used for an Intel X520 dual-port fiber 10G NIC.

Previously, I had a single storage controller and two quad-gig NICs
installed in the storage server, and was able to get close to line-rate on
multipath iSCSI with three host clients. But when I added the fourth, I
upgraded to 10G.

After installation and configuration, I observed all kinds of bad behavior
in the network traffic between the hosts and the server. All of this bad
behavior was traced to the ixgbe driver on the storage server. Without going
into the full troubleshooting process, here are my takeaways:

1: The only tuning factor that appears to have a significant effect on the
driver is MTU size. This applies both to the MTU of the ixgbe NIC and to
the MTU of the 1-gig NICs used in the hosts.
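
For reference, the MTU on the OmniOS side is a per-link dladm property. The
link name ixgbe0 below is just an example, and the IP interface may need to
be unplumbed before the value can be changed:

  # show the current and allowed MTU range on the 10G link
  dladm show-linkprop -p mtu ixgbe0

  # change it (may fail if the link is still plumbed/in use)
  dladm set-linkprop -p mtu=9000 ixgbe0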

2: I have seen the best performance with the MTU on the ixgbe set to 1000
bytes (yes, 1k). The MTU on the ESXi interfaces is set to 3000 bytes.
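
Concretely, that working combination looks something like the following. The
link and vmkernel names are examples, I'm going from memory on the esxcli
spelling, and the ESXi side can equally be set through the vSphere client:

  # storage server: 1000-byte MTU on the X520 port
  dladm set-linkprop -p mtu=1000 ixgbe0

  # each ESXi host: 3000-byte MTU on the iSCSI vSwitch and vmkernel ports
  esxcli network vswitch standard set --vswitch-name=vSwitch1 --mtu=3000
  esxcli network ip interface set --interface-name=vmk1 --mtu=3000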

3: Setting 9000-byte MTUs on both sides results in about 150MB/s write
speeds on a Linux VMware guest running a 10GB dd operation, but read
speeds are only 5MB/s.
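
The kind of dd test I mean is nothing fancy; roughly along these lines
inside the Linux guest, with the path and the direct-I/O flags purely
illustrative (the direct flags keep the guest page cache out of the numbers):

  # sequential 10GB write onto the iSCSI-backed disk
  dd if=/dev/zero of=/mnt/test/ddfile bs=1M count=10240 oflag=direct

  # then read the same file back
  dd if=/mnt/test/ddfile of=/dev/null bs=1M iflag=direct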

4: Testing of dd operations on the storage server itself shows that the
filesystem is capable of performing 500MB/s+ reads and writes.
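
For anyone reproducing this, the local check is just dd run directly on the
pool backing the iSCSI LUs (the path is an example; illumos dd wants the
block size spelled with a k, and a read taken straight after the write will
mostly come out of the ARC, so take that number with a grain of salt):

  # local sequential write on the storage server
  dd if=/dev/zero of=/tank/iscsi/ddfile bs=1024k count=10240

  # local sequential read
  dd if=/tank/iscsi/ddfile of=/dev/null bs=1024k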

5: After setting the MTUs listed in point 2, I am able to get 270-300MB/s
writes on the guest OS, and ~200MB/s reads. Not perfect, but I'll take it.

6: No /etc/system or other kernel tunings are in use.

7: Delayed ACK, Nagle, and L2 flow control tests had no effect.
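
For anyone who wants to repeat those tests, these are the knobs I believe
are the relevant ones on OmniOS; treat the property names as my best
understanding rather than gospel, and check them against your release:

  # L2 flow control on the ixgbe link (values: no, tx, rx, bi)
  dladm set-linkprop -p flowctrl=no ixgbe0

  # delayed ACK and Nagle are ndd tunables on the TCP driver
  ndd -get /dev/tcp tcp_deferred_ack_interval
  ndd -get /dev/tcp tcp_naglim_def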

8: pkg -u was performed before all tests, so I should be using the latest
kernel code, etc.

9: When capturing traffic on omnios, I used the CSW distribution of
tcpdump. It's worth noting that unlike EVERY ... OTHER ... IMPLEMENTATION
... of tcpdump I've ever used (BSD flavors, OS X, various Linux distros,
various embedded distros), libpcap doesn't appear to get individual frame
reports from the omnios kernel, and so aggregates multi-frame TCP segments
into a single record. This has the appearance of 20-60kB frames being
transmitted by omnios when reading a packet capture with Wireshark. I
cannot tell you how irritating this is when troubleshooting network issues.
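
For completeness, a typical capture invocation looks like this (interface
name and output paths are examples). My hedged guess is that the aggregation
happens above the capture point, i.e. it looks like segmentation-offload
behavior rather than a libpcap bug, but I haven't proven that; the native
snoop tool is listed purely as a point of comparison:

  # OpenCSW tcpdump, full frames, no truncation
  /opt/csw/bin/tcpdump -i ixgbe0 -s 0 -w /var/tmp/ixgbe0.pcap

  # native snoop equivalent, readable in Wireshark
  snoop -r -d ixgbe0 -o /var/tmp/ixgbe0.snoop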

10: At the wire level, the speed problems are clearly due to pauses in
response time by omnios. At 9000 byte frame sizes, I see a good number of
duplicate ACKs and fast retransmits during read operations (when omnios is
transmitting). But below about a 4100-byte MTU on omnios (which seems to
correlate to 4096-byte iSCSI block transfers), the transmission errors fade
away and we only see the transmission pause problem.
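
If anyone wants to line that up with the kernel's own view, the TCP MIB
counters are easy to watch while a test runs; this is a sketch, and the
exact counter names may differ slightly between releases:

  # retransmit / duplicate-ACK counters on the OmniOS box
  netstat -s -P tcp | egrep -i 'retrans|dupack'

  # or the raw kstats for the tcp module
  kstat -m tcp | egrep -i 'retrans|dupack'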

I'm in the process of aggregating the 10G ports and performing some IO
testing with the VMware IO performance tool. That should show the
performance of the 10G NIC when both physical ports are in use, and
hopefully get me some more granularity on the MTU settings.
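
For the curious, the aggregation itself is the standard dladm/ipadm dance on
the OmniOS side. The link names and the address are examples, the links have
to be unplumbed first, and LACP needs a matching configuration on the switch:

  # bundle both X520 ports into one aggregation, hashing on L4 headers
  dladm create-aggr -L active -P L4 -l ixgbe0 -l ixgbe1 aggr0

  # plumb IP on the new aggregation
  ipadm create-if aggr0
  ipadm create-addr -T static -a 10.10.10.10/24 aggr0/v4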

If anyone has a list of kernel tuning parameters to test, I'm happy to try
them out and report back. I've found a variety of suggestions online, but
between illumos, Solaris, OpenIndiana, Nexenta, OpenSolaris, etc., the
supported variables are, um, inconsistent.

-Warren V