[OmniOS-discuss] [developer] Re: The ixgbe driver, Lindsay Lohan, and the Greek economy

Joerg Goltermann jg at osn.de
Mon Mar 2 11:14:10 UTC 2015


Hi,

I would try *one* TPG which includes both interface addresses
and I would double check for packet drops on the Catalyst.

The 3560 supports only receive flow control, which means that
a sending 10 Gbit port can easily overload a 1 Gbit port.
Do you have flow control enabled?
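
A quick way to check what is actually configured on both ends (commands from
memory, untested here; adjust the link and interface names to your setup):

  # OmniOS: ixgbe flow control link property (values: no, tx, rx, bi)
  dladm show-linkprop -p flowctrl ixgbe0
  # dladm set-linkprop -p flowctrl=bi ixgbe0    to change it

  # Catalyst: show what the port negotiated (the 3560 is receive-only)
  show flowcontrol interface <your port>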

  - Joerg

On 02.03.2015 09:22, W Verb via illumos-developer wrote:
> Hello Garrett,
>
> No, no 802.3ad going on in this config.
>
> Here is a basic schematic:
>
> https://drive.google.com/file/d/0BwyUMjibonYQVkVqcE5OQUJyUUU/view?usp=sharing
>
> Here is the Nexenta MPIO iSCSI Setup Document that I used as a guide:
>
> https://drive.google.com/file/d/0BwyUMjibonYQbjEyUTBjN2tTNWM/view?usp=sharing
>
> Note that I am using an MTU of 3000 on both the 10G and 1G NICs. The
> switch is set to allow 9148-byte frames, and I'm not seeing any
> errors/buffer overruns on the switch.
>
> Here is a screenshot of a packet capture from a read operation on the
> guest OS (from its local drive, which is actually a VMDK file on the
> storage server). In this example, only a single 1G ESXi kernel interface
> (vmk1) is bound to the software iSCSI initiator.
>
> https://drive.google.com/file/d/0BwyUMjibonYQa2NYdXhpZkpkbU0/view?usp=sharing
>
> Note that there's a nice, well-behaved window sizing process taking
> place. The ESXi decreases the scaled window by 11 or 12 for each ACK,
> then bumps it back up to 512.
>
> Here is a similar screenshot of a single-interface write operation:
>
> https://drive.google.com/file/d/0BwyUMjibonYQbU1RZHRnakxDSFU/view?usp=sharing
>
> There are no pauses or gaps in the transmission rate in the
> single-interface transfers.
>
>
> In the next screenshots, I have enabled an additional 1G interface on
> the ESXi host, and bound it to the iSCSI initiator. The new interface is
> bound to a separate physical port, uses a different VLAN on the switch,
> and talks to a different 10G port on the storage server.
>
> First, let's look at a write operation on the guest OS, which happily
> pumps data at near-line-rate to the storage server.
>
> Here is a sequence number trace diagram. Note how the transfer has a
> nice, smooth increment rate over the entire transfer.
>
> https://drive.google.com/file/d/0BwyUMjibonYQWHNIa0drWnNxMmM/view?usp=sharing
>
> Here are screenshots from packet captures on both 1G interfaces:
>
> https://drive.google.com/file/d/0BwyUMjibonYQRWhyVVQ4djNaU3c/view?usp=sharing
> https://drive.google.com/file/d/0BwyUMjibonYQaTVjTEtTRloyR2c/view?usp=sharing
>
> Note how we again see nice, smooth window adjustment, and no gaps in
> transmission.
>
>
> But now, let's look at the problematic two-interface Read operation.
> First, the sequence graph:
>
> https://drive.google.com/file/d/0BwyUMjibonYQTzdFVWdQMWZ6LUU/view?usp=sharing
>
> As you can see, there are gaps and jumps in the transmission throughout
> the transfer.
> It is very illustrative to look at captures of the gaps, which are
> occurring on both interfaces:
>
> https://drive.google.com/file/d/0BwyUMjibonYQc0VISXN6eVFwQzg/view?usp=sharing
> https://drive.google.com/file/d/0BwyUMjibonYQVFREUHp3TGFiUU0/view?usp=sharing
>
> As you can see, there are ~0.4-second pauses in transmission from the
> storage server, which kill the transfer rate.
> It's clear that the ESXi box ACKs the prior iSCSI operation to
> completion, then makes a new LUN request, which the storage server
> immediately replies to. The ESXi ACKs the response packet from the
> storage server, then waits...and waits....and waits... until eventually
> the storage server starts transmitting again.
>
> Because the pause happens while the ESXi client is waiting for a packet
> from the storage server, that tells me that the gaps are not an artifact
> of traffic being switched between both active interfaces, but are
> actually indicative of short hangs occurring on the server.
>
> Having a pause or two in transmission is no big deal, but in my case it
> happens constantly, dropping my overall read transfer rate to
> 20-60MB/s, which is slower than the single-interface transfer rate
> (~90-100MB/s).
>
> Decreasing the MTU makes the pauses shorter; increasing it makes them
> longer.
>
> Another interesting thing is that if I set the multipath I/O interval to
> 3 operations instead of 1, I get better throughput. In other words, the
> less frequently the ESXi host alternates the IP addresses it sends its
> iSCSI requests to, the fewer pauses I see.
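>
> (The knob in question is the round-robin IOPS limit of the ESXi path
> selection policy; roughly the following, from memory, with the naa device
> ID being just a placeholder:
>
>     esxcli storage nmp psp roundrobin deviceconfig set \
>         --device=naa.xxxxxxxxxxxxxxxx --type=iops --iops=3
> )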
>
> Basically, COMSTAR seems to choke each time an iSCSI request from a new
> IP arrives.
>
> Because the single-interface transfer runs near line rate, that tells me
> that the storage stack (mpt_sas, ZFS, etc.) is working fine. It's only
> when multiple paths are attempted that iSCSI falls on its face during reads.
>
> All of these captures were taken without a cache device being attached
> to the storage zpool, so this isn't looking like some kind of ZFS ARC
> problem. As mentioned previously, local transfers to/from the zpool are
> showing ~300-500 MB/s rates over long transfers (10G+).
>
> -Warren V
>
> On Sun, Mar 1, 2015 at 9:11 PM, Garrett D'Amore <garrett at damore.org> wrote:
>
>     I’m not sure I’ve followed properly.  You have *two* interfaces.
>     You are not trying to provision these in an aggr, are you? As far as
>     I’m aware, VMware does not support 802.3ad link aggregation.  (It’s
>     possible that you can make it work with ESXi if you give the entire
>     NIC to the guest, but I’m skeptical.)  The problem is that if you
>     try to use link aggregation, some packets (up to half!) will be
>     lost.  TCP and other protocols fare poorly in this situation.
>
>     It’s possible I’ve totally misunderstood what you’re trying to do, in
>     which case I apologize.
>
>     The idle thing is a red herring: the CPU is waiting for work to do,
>     probably because packets haven’t arrived (or were dropped by the
>     hypervisor!)  I wouldn’t read too much into that except that your
>     network stack is in trouble.  I’d look a bit more closely at the
>     kstats for tcp; I suspect you’ll see retransmit or out-of-order
>     counters that are unusually high, and if so this may help validate my
>     theory above.
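>
>     (A quick way to look, assuming the usual MIB-2 tcp kstats; I have not
>     checked the exact counter names on your build:
>
>         kstat -p tcp:0:tcp:tcpRetransSegs
>         kstat -p tcp:0:tcp:tcpInDataUnorderSegs
>
>     Sample them before and after a slow transfer and compare the deltas.)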
>
>     - Garrett
>
>>     On Mar 1, 2015, at 9:03 PM, W Verb via illumos-developer
>>     <developer at lists.illumos.org> wrote:
>>
>>     Hello all,
>>
>>
>>     Well, I no longer blame the ixgbe driver for the problems I'm seeing.
>>
>>
>>     I tried Joerg's updated driver, which didn't improve the issue. So
>>     I went back to the drawing board and rebuilt the server from scratch.
>>
>>     What I noted is that if I have only a single 1-gig physical
>>     interface active on the ESXi host, everything works as expected.
>>     As soon as I enable two interfaces, I start seeing the performance
>>     problems I've described.
>>
>>     Response pauses from the server that I see in tcpdump captures are
>>     still leading me to believe the problem is delay on the server side,
>>     so I ran a series of kernel DTrace profiles and produced some flamegraphs.
>>
>>
>>     This was taken during a read operation with two active 10G
>>     interfaces on the server, with a single target shared by two
>>     TPGs, one TPG per 10G physical port. The ESXi host has two
>>     1G ports enabled, with VLANs separating the active ports into
>>     10G/1G pairs. ESXi is set to multipath across both VLANs with a
>>     round-robin IO interval of 1.
>>
>>     https://drive.google.com/file/d/0BwyUMjibonYQd3ZYOGh4d2pteGs/view?usp=sharing
>>
>>
>>     This was taken during a write operation:
>>
>>     https://drive.google.com/file/d/0BwyUMjibonYQMnBtU1Q2SXM2ams/view?usp=sharing
>>
>>
>>     I then rebooted the server and disabled C-State, ACPI T-State, and
>>     general EIST (Turbo boost) functionality in the CPU.
>>
>>     When I then attempted to boot my guest VM, the iSCSI transfer
>>     gradually ground to a halt during the boot loading process, and
>>     the guest OS never completed booting.
>>
>>     Here is a flamegraph taken while iSCSI is slowly dying:
>>
>>     https://drive.google.com/file/d/0BwyUMjibonYQM21JeFZPX3dZWTg/view?usp=sharing
>>
>>
>>     I edited out cpu_idle_adaptive from the dtrace output and
>>     regenerated the slowdown graph:
>>
>>     https://drive.google.com/file/d/0BwyUMjibonYQbTVwV3NvXzlPS1E/view?usp=sharing
>>
>>
>>     I then edited cpu_idle_adaptive out of the speedy write operation
>>     and regenerated that graph:
>>
>>     https://drive.google.com/file/d/0BwyUMjibonYQeWFYM0pCMDZ1X2s/view?usp=sharing
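>>
>>     (For anyone who wants to reproduce these, something along these lines
>>     should work; the profiling rate, duration, and file names are only
>>     examples, and stackcollapse.pl / flamegraph.pl are Brendan Gregg's
>>     FlameGraph tools:
>>
>>         dtrace -x stackframes=100 \
>>           -n 'profile-997 /arg0/ { @[stack()] = count(); } tick-60s { exit(0); }' \
>>           -o stacks.out
>>         ./stackcollapse.pl stacks.out | grep -v cpu_idle_adaptive \
>>           | ./flamegraph.pl > read-slow.svg
>>     )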
>>
>>
>>     I have zero experience with interpreting flamegraphs, but the most
>>     significant difference I see between the slow read example and the
>>     fast write example is in unix`thread_start --> unix`idle. There's
>>     a good chunk of "unix`i86_mwait" in the read example that is not
>>     present in the write example at all.
>>
>>     Disabling the L2ARC cache device didn't make a difference, and I
>>     had to re-enable EIST support on the CPU to get my VMs to boot.
>>
>>     I am seeing a variety of bug reports going back to 2010 regarding
>>     excessive mwait operations, with the suggested solutions usually
>>     being to set "cpupm enable poll-mode" in power.conf. That change
>>     also had no effect on speed.
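>>
>>     (For reference, that change is a line in /etc/power.conf:
>>
>>         cpupm enable poll-mode
>>
>>     applied by running pmconfig as root.)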
>>
>>     -Warren V
>>
>>
>>
>>
>>     -----Original Message-----
>>
>>     From: Chris Siebenmann [mailto:cks at cs.toronto.edu]
>>
>>     Sent: Monday, February 23, 2015 8:30 AM
>>
>>     To: W Verb
>>
>>     Cc: omnios-discuss at lists.omniti.com; cks at cs.toronto.edu
>>
>>     Subject: Re: [OmniOS-discuss] The ixgbe driver, Lindsay Lohan, and
>>     the Greek economy
>>
>>
>>     > Chris, thanks for your specific details. I'd appreciate it if you
>>
>>     > could tell me which copper NIC you tried, as well as to pass on the
>>
>>     > iSCSI tuning parameters.
>>
>>
>>     Our copper NIC experience is with onboard X540-AT2 ports on
>>     SuperMicro hardware (which have the guaranteed 10-20 msec lock
>>     hold) and dual-port 82599EB TN cards (which have some sort of
>>     driver/hardware failure under load that eventually leads to
>>     2-second lock holds). I can't recommend either with the current
>>     driver; we had to revert to 1G networking in order to get stable
>>     servers.
>>
>>
>>     The iSCSI parameter modifications we do, across both initiators
>>     and targets, are:
>>
>>
>>     initialr2t           no
>>
>>     firstburstlength     128k
>>
>>     maxrecvdataseglen    128k   [only on Linux backends]
>>
>>     maxxmitdataseglen    128k   [only on Linux backends]
>>
>>
>>     The OmniOS initiator doesn't need tuning for more than the first
>>     two parameters; on the Linux backends we tune up all four. My
>>     extended thoughts on these tuning parameters and why we touch them
>>     can be found here:
>>
>>
>>     http://utcc.utoronto.ca/~cks/space/blog/tech/UnderstandingiSCSIProtocol
>>
>>     http://utcc.utoronto.ca/~cks/space/blog/tech/LikelyISCSITuning
>>
>>
>>     The short version is that these parameters probably only make a
>>     small difference but their overall goal is to do 128KB ZFS reads
>>     and writes in single iSCSI operations (although they will be
>>     fragmented at the TCP layer) and to do iSCSI writes without a
>>     back-and-forth delay between initiator and target (that's
>>     'initialr2t no').
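>>
>>     On the illumos initiator these are per-target login parameters set
>>     with iscsiadm; roughly, from memory (check iscsiadm(1M) for the exact
>>     spelling, and the IQN below is just a placeholder):
>>
>>         iscsiadm modify target-param -p initialr2t=no iqn.example:tgt0
>>         iscsiadm modify target-param -p firstburstlength=131072 iqn.example:tgt0
>>         iscsiadm list target-param -v iqn.example:tgt0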
>>
>>
>>     I think basically everyone should use InitialR2T set to no and in
>>     fact that it should be the software default. These days only
>>     unusually limited iSCSI targets should need it to be otherwise and
>>     they can change their setting for it (initiator and target must
>>     both agree to it being 'yes', so either can veto it).
>>
>>
>>     - cks
>>
>>
>>
>>     On Mon, Feb 23, 2015 at 8:21 AM, Joerg Goltermann <jg at osn.de> wrote:
>>
>>         Hi,
>>
>>         I think your problem is caused by your link properties or your
>>         switch settings. In general the standard ixgbe seems to perform
>>         well.
>>
>>         I had trouble after changing the default flow control settings
>>         to "bi", and this was my motivation to update the ixgbe driver
>>         a long time ago. After I updated our systems to ixgbe 2.5.8 I
>>         never had any problems.
>>
>>         Make sure your switch supports jumbo frames and that you use
>>         the same MTU on all ports; otherwise the smallest will be used.
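>>
>>         Quick checks, just as examples (adjust the link name and switch
>>         syntax to your setup):
>>
>>             dladm show-linkprop -p mtu ixgbe0     (link MTU on OmniOS)
>>             show system mtu                       (jumbo MTU on a Catalyst)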
>>
>>         What switch do you use? I can tell you nice horror stories about
>>         different vendors....
>>
>>          - Joerg
>>
>>         On 23.02.2015 10:31, W Verb wrote:
>>
>>             Thank you Joerg,
>>
>>             I've downloaded the package and will try it tomorrow.
>>
>>             The only thing I can add at this point is that upon review
>>             of my
>>             testing, I may have performed my "pkg -u" between the
>>             initial quad-gig
>>             performance test and installing the 10G NIC. So this may
>>             be a new
>>             problem introduced in the latest updates.
>>
>>             Those of you who are running 10G and have not upgraded to
>>             the latest
>>             kernel, etc, might want to do some additional testing
>>             before running the
>>             update.
>>
>>             -Warren V
>>
>>             On Mon, Feb 23, 2015 at 1:15 AM, Joerg Goltermann
>>             <jg at osn.de> wrote:
>>
>>                 Hi,
>>
>>                 I remember there was a problem with the flow control
>>             settings in the
>>                 ixgbe
>>                 driver, so I updated it a long time ago for our
>>             internal servers to
>>                 2.5.8.
>>                 Last weekend I integrated the latest changes from the
>>             FreeBSD driver
>>                 to bring
>>                 the illumos ixgbe to 2.5.25 but I had no time to test
>>             it, so it's
>>                 completely
>>                 untested!
>>
>>
>>                 If you would like to give the latest driver a try you
>>             can fetch the
>>                 kernel modules from
>>             https://cloud.osn.de/index.php/s/Fb4so9RsNnXA7r9
>>
>>                 Clone your boot environment, place the modules in the
>>             new environment
>>                 and update the boot-archive of the new BE.
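>>
>>                 Roughly, and untested; adjust the BE name and the path
>>             of the downloaded module:
>>
>>                     beadm create ixgbe-2.5.25
>>                     beadm mount ixgbe-2.5.25 /mnt
>>                     cp ixgbe /mnt/kernel/drv/amd64/ixgbe
>>                     bootadm update-archive -R /mnt
>>                     beadm activate ixgbe-2.5.25 && reboot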
>>
>>                   - Joerg
>>
>>
>>
>>
>>
>>                 On 23.02.2015 02:54, W Verb wrote:
>>
>>                     By the way, to those of you who have working
>>             setups: please send me
>>                     your pool/volume settings, interface linkprops,
>>             and any kernel
>>                     tuning
>>                     parameters you may have set.
>>
>>                     Thanks,
>>                     Warren V
>>
>>                     On Sat, Feb 21, 2015 at 7:59 AM, Schweiss, Chip
>>                     <chip at innovates.com> wrote:
>>
>>                         I can't say I totally agree with your performance
>>                         assessment.   I run Intel
>>                         X520 in all my OmniOS boxes.
>>
>>                         Here is a capture of nfssvrtop I made while
>>             running many
>>                         storage vMotions
>>                         between two OmniOS boxes hosting NFS
>>             datastores.   This is a
>>                         10 host VMware
>>                         cluster.  Both OmniOS boxes are dual 10G
>>             connected with
>>                         copper twin-ax to
>>                         the in rack Nexus 5010.
>>
>>                         VMware does 100% sync writes, I use ZeusRAM
>>             SSDs for log
>>                         devices.
>>
>>                         -Chip
>>
>>    2014 Apr 24 08:05:51, load: 12.64, read: 17330243 KB, swrite: 15985 KB, awrite: 1875455 KB
>>
>>    Ver  Client          NFSOPS   Reads  SWrites  AWrites  Commits    Rd_bw  SWr_bw  AWr_bw   Rd_t  SWr_t  AWr_t  Com_t  Align%
>>    4    10.28.17.105         0       0        0        0        0        0       0       0      0      0      0      0       0
>>    4    10.28.17.215         0       0        0        0        0        0       0       0      0      0      0      0       0
>>    4    10.28.17.213         0       0        0        0        0        0       0       0      0      0      0      0       0
>>    4    10.28.16.151         0       0        0        0        0        0       0       0      0      0      0      0       0
>>    4    all                  1       0        0        0        0        0       0       0      0      0      0      0       0
>>    3    10.28.16.175         3       0        3        0        0        1      11       0   4806     48      0      0      85
>>    3    10.28.16.183         6       0        6        0        0        3     162       0    549    124      0      0      73
>>    3    10.28.16.180        11       0       10        0        0        3      27       0    776     89      0      0      67
>>    3    10.28.16.176        28       2       26        0        0       10     405       0   2572    198      0      0     100
>>    3    10.28.16.178      4606    4602        4        0        0   294534       3       0    723     49      0      0      99
>>    3    10.28.16.179      4905    4879       26        0        0   312208     311       0    735    271      0      0      99
>>    3    10.28.16.181      5515    5502       13        0        0   352107      77       0     89     87      0      0      99
>>    3    10.28.16.184     12095   12059       10        0        0   763014      39       0    249    147      0      0      99
>>    3    10.28.58.1       15401    6040      116     6354       53   191605     474  202346    192     96    144     83      99
>>    3    all              42574   33086      217     6354       53  1913488    1582  202300    348    138    153    105      99
>>
>>
>>
>>
>>
>>                         On Fri, Feb 20, 2015 at 11:46 PM, W Verb
>>                         <wverb73 at gmail.com> wrote:
>>
>>
>>                             Hello All,
>>
>>                             Thank you for your replies.
>>                             I tried a few things, and found the following:
>>
>>                             1: Disabling hyperthreading support in the
>>             BIOS drops
>>                             performance overall
>>                             by a factor of 4.
>>                             2: Disabling VT support also seems to have
>>             some effect,
>>                             although it
>>                             appears to be minor. But this has the
>>             amusing side
>>                             effect of fixing the
>>                             hangs I've been experiencing with fast
>>             reboot. Probably
>>                             by disabling kvm.
>>                             3: The performance tests are a bit tricky
>>             to quantify
>>                             because of caching
>>                             effects. In fact, I'm not entirely sure
>>             what is
>>                             happening here. It's just
>>                             best to describe what I'm seeing:
>>
>>                             The commands I'm using to test are
>>                             dd if=/dev/zero of=./test.dd bs=2M count=5000
>>                             dd of=/dev/null if=./test.dd bs=2M count=5000
>>                             The VM is running CentOS 6.6 and has the
>>                             latest VMware Tools installed.
>>                             There is a host cache on an SSD local to
>>             the host that
>>                             is also in place.
>>                             Disabling the host cache didn't
>>             immediately have an
>>                             effect as far as I could
>>                             see.
>>
>>                             The host MTU is set to 3000 on all iSCSI
>>                             interfaces for all tests.
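>>
>>                             (For reference, the underlying ixgbe link MTU
>>                             has to be at least this large; on illumos that
>>                             is normally set with something like
>>
>>                                 dladm set-linkprop -p mtu=9000 ixgbe0
>>
>>                             on an unplumbed link, while the ifconfig
>>                             command in Test 2 below only changes the IP
>>                             interface MTU.)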
>>
>>                             Test 1: Right after reboot, with an ixgbe
>>             MTU of 9000,
>>                             the write test
>>                             yields an average speed over three tests
>>             of 137MB/s. The
>>                             read test yields an
>>                             average over three tests of 5MB/s.
>>
>>                             Test 2: After setting "ifconfig ixgbe0 mtu
>>             3000", the
>>                             write tests yield
>>                             140MB/s, and the read tests yield 53MB/s.
>>             It's important
>>                             to note here that
>>                             if I cut the read test short at only
>>             2-3GB, I get
>>                             results upwards of
>>                             350MB/s, which I assume is local
>>             cache-related distortion.
>>
>>                             Test 3: MTU of 1500. Read tests are up to
>>             156 MB/s.
>>                             Write tests yield
>>                             about 142MB/s.
>>                             Test 4: MTU of 1000: Read test at 182MB/s.
>>                             Test 5: MTU of 900: Read test at 130 MB/s.
>>                             Test 6: MTU of 1000: Read test at 160MB/s.
>>             Write tests
>>                             are now
>>                             consistently at about 300MB/s.
>>                             Test 7: MTU of 1200: Read test at 124MB/s.
>>                             Test 8: MTU of 1000: Read test at 161MB/s.
>>             Write at 261MB/s.
>>
>>                             A few final notes:
>>                             L1ARC grabs about 10GB of RAM during the
>>             tests, so
>>                             there's definitely some
>>                             read caching going on.
>>                             The write operations are easier to observe
>>             with iostat,
>>                             and I'm seeing io
>>                             rates that closely correlate with the
>>             network write speeds.
>>
>>
>>                             Chris, thanks for your specific details.
>>             I'd appreciate
>>                             it if you could
>>                             tell me which copper NIC you tried, as
>>             well as to pass
>>                             on the iSCSI tuning
>>                             parameters.
>>
>>                             I've ordered an Intel EXPX9502AFXSR, which
>>             uses the
>>                             82598 chip instead of
>>                             the 82599 in the X520. If I get similar
>>             results with my
>>                             fiber transceivers,
>>                             I'll see if I can get a hold of copper ones.
>>
>>                             But I should mention that I did indeed
>>             look at PHY/MAC
>>                             error rates, and
>>                             they are nil.
>>
>>                             -Warren V
>>
>>                             On Fri, Feb 20, 2015 at 7:25 PM, Chris
>>                             Siebenmann <cks at cs.toronto.edu> wrote:
>>
>>
>>                                     After installation and
>>             configuration, I observed
>>                                     all kinds of bad
>>                                     behavior
>>                                     in the network traffic between the
>>             hosts and the
>>                                     server. All of this
>>                                     bad
>>                                     behavior is traced to the ixgbe
>>             driver on the
>>                                     storage server. Without
>>                                     going
>>                                     into the full troubleshooting
>>             process, here are
>>                                     my takeaways:
>>
>>                                 [...]
>>
>>                                    For what it's worth, we managed to
>>             achieve much
>>                                 better line rates on
>>                                 copper 10G ixgbe hardware of various
>>             descriptions
>>                                 between OmniOS
>>                                 and CentOS 7 (I don't think we ever
>>             tested OmniOS to
>>                                 OmniOS). I don't
>>                                 believe OmniOS could do TCP at full
>>             line rate but I
>>                                 think we managed 700+
>>                                 Mbytes/sec on both transmit and
>>             receive and we got
>>                                 basically disk-limited
>>                                 speeds with iSCSI (across multiple
>>             disks on
>>                                 multi-disk mirrored pools,
>>                                 OmniOS iSCSI initiator, Linux iSCSI
>>             targets).
>>
>>                                    I don't believe we did any specific
>>             kernel tuning
>>                                 (and in fact some of
>>                                 our attempts to fiddle ixgbe driver
>>             parameters blew
>>                                 up in our face).
>>                                 We did tune iSCSI connection
>>             parameters to increase
>>                                 various buffer
>>                                 sizes so that ZFS could do even large
>>             single
>>                                 operations in single iSCSI
>>                                 transactions. (More details available
>>             if people are
>>                                 interested.)
>>
>>                                     10: At the wire level, the speed
>>             problems are
>>                                     clearly due to pauses in
>>                                     response time by omnios. At 9000
>>             byte frame
>>                                     sizes, I see a good number
>>                                     of duplicate ACKs and fast
>>             retransmits during
>>                                     read operations (when
>>                                     omnios is transmitting). But below
>>             about a
>>                                     4100-byte MTU on omnios
>>                                     (which seems to correlate to
>>             4096-byte iSCSI
>>                                     block transfers), the
>>                                     transmission errors fade away and
>>             we only see
>>                                     the transmission pause
>>                                     problem.
>>
>>
>>                                    This is what really attracted my
>>             attention. In
>>                                 our OmniOS setup, our
>>                                 specific Intel hardware had ixgbe
>>             driver issues that
>>                                 could cause
>>                                 activity stalls during once-a-second
>>             link heartbeat
>>                                 checks. This
>>                                 obviously had an effect at the TCP and
>>             iSCSI layers.
>>                                 My initial message
>>                                 to illumos-developer sparked a potentially
>>                                 interesting discussion:
>>
>>
>>             http://www.listbox.com/member/archive/182179/2014/10/sort/time_rev/page/16/entry/6:405/20141003125035:6357079A-4B1D-11E4-A39C-D534381BA44D/
>>
>>                                 If you think this is a possibility in
>>             your setup,
>>                                 I've put the DTrace
>>                                 script I used to hunt for this up on
>>             the web:
>>
>>             http://www.cs.toronto.edu/~cks/src/omnios-ixgbe/ixgbe_delay.d
>>
>>                                 This isn't the only potential source
>>             of driver
>>                                 stalls by any means, it's
>>                                 just the one I found. You may also
>>             want to look at
>>                                 lockstat in general,
>>                                 as the information it reported is what led
>>             us to look
>>                                 specifically at the
>>                                 ixgbe code here.
>>
>>                                 (If you suspect kernel/driver issues,
>>             lockstat
>>                                 combined with kernel
>>                                 source is a really excellent resource.)
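>>
>>                                 Typical starting points, just as examples:
>>
>>                                     lockstat -C sleep 30          # lock contention
>>                                     lockstat -kIW -D 20 sleep 30  # kernel CPU profiling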
>>
>>                                           - cks
>>
>>
>>
>>
>>
>>             _______________________________________________
>>             OmniOS-discuss mailing list
>>             OmniOS-discuss at lists.omniti.com
>>             http://lists.omniti.com/mailman/listinfo/omnios-discuss
>>
>>
>>                     _______________________________________________
>>                     OmniOS-discuss mailing list
>>                     OmniOS-discuss at lists.omniti.com
>>                     http://lists.omniti.com/mailman/listinfo/omnios-discuss
>>
>>
>>                 --
>>                 OSN Online Service Nuernberg GmbH, Bucher Str. 78, 90408 Nuernberg
>>                 Tel: +49 911 39905-0 - Fax: +49 911 39905-55 - http://www.osn.de
>>                 HRB 15022 Nuernberg, USt-Id: DE189301263, GF: Joerg Goltermann
>>
>>
>>
>>         --
>>         OSN Online Service Nuernberg GmbH, Bucher Str. 78, 90408 Nuernberg
>>         Tel: +49 911 39905-0 - Fax: +49 911 39905-55 - http://www.osn.de
>>         HRB 15022 Nuernberg, USt-Id: DE189301263, GF: Joerg Goltermann
>>
>>
>>
>
>
>

-- 
OSN Online Service Nuernberg GmbH, Bucher Str. 78, 90408 Nuernberg
Tel: +49 911 39905-0 - Fax: +49 911 39905-55 - http://www.osn.de
HRB 15022 Nuernberg, USt-Id: DE189301263, GF: Joerg Goltermann


More information about the OmniOS-discuss mailing list