[OmniOS-discuss] [developer] Re: The ixgbe driver, Lindsay Lohan, and the Greek economy

W Verb wverb73 at gmail.com
Wed Mar 4 08:27:08 UTC 2015


Thank you for following up, Garrett,

The logs of all lockstat sessions are now in the zipfile located here:
https://drive.google.com/file/d/0BwyUMjibonYQeVlzN2VndGstRUk/view?usp=sharing

Regards,

Warren V

On Tue, Mar 3, 2015 at 11:30 PM, Garrett D'Amore <garrett at damore.org> wrote:

> I'm not surprised by this result.  Indeed with the earlier data you had
> from lockstat it looked like a comstar or zfs issue on the server.
> Unfortunately the follow up lockstat you sent was pruned to uselessness.
> If you can post the full lockstat with -s5 somewhere it might help
> understand what is actually going on under the hood.
>
> Sent from my iPhone
>
> On Mar 3, 2015, at 9:21 PM, W Verb <wverb73 at gmail.com> wrote:
>
> Hello all,
>
> This is probably the last message in this thread.
>
> I pulled the quad-gig NIC out of one of my hosts, and installed an X520. I
> then set a single 10G port on the server to be on the same VLAN as the
> host, and defined a vswitch, vmknic, etc on the host.
>
> I set the MTU to be 9000 on both sides, then ran my tests.
>
> Read:  130 MB/s.
> Write:  156 MB/s.
>
> Additionally, at higher MTUs, the NIC would periodically lock up until I
> performed an "ipadm disable-if -t ixgbe0" and re-enabled it. I tried your
> updated driver, Joerg, but unfortunately it failed quite often.
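>
> For reference, the recovery each time was just a temporary disable and
> re-enable of the IP interface (device name from my setup):
>
>   ipadm disable-if -t ixgbe0
>   ipadm enable-if -t ixgbe0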
>
> I then disabled stmf, enabled NFS (v3 only) on the server, and shared a
> dataset on the zpool with "share -F nfs /ppool/testy".
> I then mounted the server dataset on the host via NFS, and copied my test
> VM from the iSCSI zvol to the NFS dataset. I also removed the binding of
> the 10G port on the host from the sw iscsi interface.
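>
> Roughly, the COMSTAR-to-NFS switch was (a sketch; service and dataset
> names are from my setup):
>
>   svcadm disable stmf
>   svcadm enable -r svc:/network/nfs/server:default
>   share -F nfs /ppool/testy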
>
> Running the same tests on the VM over NFSv3 yielded:
>
> Read: 650MB/s
> Write: 306MB/s
>
> This is getting within 10% of the throughput I consistently get from local
> dd operations on the server, so I'm pretty happy that this is as good as
> it's going to get until I add more drives. Additionally, I haven't
> experienced any NIC hangs.
>
> I tried varying the settings in ixgbe.conf, the MTU, and disabling LSO on
> the host and server, but nothing really made that much of a difference
> (except reducing the MTU made things about 20-30% slower).
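>
> For the record, the knobs I was varying were along these lines: the MTU via
> dladm, and LSO via the driver .conf (the lso_enable property name is my
> assumption of what the shipped ixgbe.conf exposes):
>
>   dladm set-linkprop -p mtu=9000 ixgbe0      # also tried 3000 and 1500
>   # in /kernel/drv/ixgbe.conf:  lso_enable = 0;  then reboot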
>
> mpstat during both NFS and iSCSI transfers showed all processors getting
> roughly the same number of interrupts, etc., although I did see a varying
> number of spins on reader/writer locks (srw) during the iSCSI transfers.
> The NFS transfers showed no srw spins at all.
>
> Here is a pretty representative example of a 1s mpstat during an iSCSI
> transfer:
>
> CPU minf mjf xcal  intr ithr  csw icsw migr smtx  srw syscl  usr sys  wt idl set
>   0    0   0    0  3246 2690 8739    6  772 5967    2     0    0  11   0  89   0
>   1    0   0    0  2366 2249 7910    8  988 5563    2   302    0   9   0  91   0
>   2    0   0    0  2455 2344 5584    5  687 5656    3    66    0   9   0  91   0
>   3    0   0   25   248   12 6210    1  885 5679    2     0    0   9   0  91   0
>   4    0   0    0   284    7 5450    2  861 5751    1     0    0   8   0  92   0
>   5    0   0    0   232    3 4513    0  547 5733    3     0    0   7   0  93   0
>   6    0   0    0   322    8 6084    1  836 6295    2     0    0   8   0  92   0
>   7    0   0    0  3114 2848 8229    4  648 4966    2     0    0  10   0  90   0
>
>
> So, it seems that it's COMSTAR/iSCSI that's broken as hell, not ixgbe. My
> apologies to anyone I may have offended with my prejudgment.
>
> The consequences of this performance issue are significant:
> 1: Instead of being able to utilize the existing quad-port NICs I have in
> my hosts, I must use dual 10G cards for redundancy purposes.
> 2: I must build out a full 10G switching infrastructure.
> 3: The network traffic is inherently less secure, as it is essentially
> impossible to do real security with NFSv3 (the NFS version ESXi supports).
>
> In the short run, I have already ordered some relatively cheap 20G
> infiniband gear that will hopefully improve the price/performance ratio.
> However, I have received all sorts of advice about how painful it can be to
> build and maintain infiniband, and if iSCSI over 10G ethernet is this
> painful, I'm not hopeful that infiniband will "just work".
>
> The last option, of course, is to bail out of the Solaris derivatives and
> move to ZoL or ZoBSD. The drawbacks of this are:
>
> 1: ZoL doesn't easily support booting off of mirrored USB flash drives,
> let alone running the root filesystem and swap on them. FreeNAS, by way of
> comparison, puts a 2G swap partition on each pool disk, which (strangely
> enough) often causes it to crash when a disk fails under load.
>
> 2: Neither ZoL nor FreeNAS has a good, stable, kernel-based iSCSI
> implementation. FreeNAS is indeed testing istgt, but it proved unstable
> for my purposes in recent builds. Unfortunately, stmf hasn't proved itself
> any better.
>
> There are other minor differences, but these are the ones that brought me
> to OmniOS in the first place. We'll just have to wait and see how well the
> infiniband stuff works.
>
>
> Hopefully this exercise will help prevent others from going down the same
> rabbit-hole that I did.
>
> -Warren V
>
>
>
>
> On Tue, Mar 3, 2015 at 3:45 PM, W Verb <wverb73 at gmail.com> wrote:
>
>> Hello Rob et al,
>>
>> Thank you for taking the time to look at this problem with me. I
>> completely understand your inclination to look at the network as the most
>> probable source of my issue, but I believe that this is a pretty clear-cut
>> case of server-side issues.
>>
>> 1: I did run ping RTT tests during both read and write operations with
>> multiple interfaces enabled, and the RTT stayed at ~.2ms regardless of
>> whether traffic was actively being transmitted/received or not.
>>
>> 2: I am not seeing the TCP window size bouncing around, and I am
>> certainly not seeing starvation and delay in my packet captures. It is true
>> that I do see delayed ACKs and retransmissions when I bump the MTU to 9000
>> on both sides, but I stopped testing with high MTU as soon as I saw it
>> happening because I have a good understanding of incast. All of my recent
>> testing has been with MTUs between 1000 and 3000 bytes.
>>
>> 3: When testing with MTUs between 1000 and 3000 bytes, I do not see lost
>> packets and retransmission in captures on either the server or client side.
>> I only see staggered transmission delays on the part of the server.
>>
>> 4: The client is consistently advertising a large window size (20k+), so
>> the TCP throttling mechanism does not appear to play into this.
>>
>> 5: As mentioned previously, layer 2 flow control is not enabled anywhere
>> in the network, so there are no lower-level mechanisms at work.
>>
>> 6: Upon checking buffer and queue sizes (and doing the appropriate
>> research into documentation on the C3560E's buffer sizes), I do not see
>> large numbers of frames being dropped by the switch. It does happen at
>> larger MTUs, but not very often (and not consistently) during transfers at
>> 1000-3000 byte MTUs. I do not have QoS, policing, or rate-shaping enabled.
>>
>> 7: Network interface stats on both the server and the ESXi client show no
>> errors of any kind. This is via netstat on the server, and esxcli / Vsphere
>> client on the ESXi box.
>>
>> 8: When looking at captures taken simultaneously on the server and client
>> side, the server-side transmission pauses are consistently seen and
>> reproducible, even after multiple server rebuilds, ESXi rebuilds, vSphere
>> reinstallations (down to wiping the SQL db), various COMSTAR configuration
>> variations, multiple 10G NICs with different NIC chipsets, multiple
>> switches (I tried both a 48-port and 24-port C3560E), multiple IOS
>> revisions (12.2 and 15.0), OmniOS versions (r151012 and previous), multiple
>> cables, transceivers, and so on.
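>>
>> For reference, the simultaneous captures were taken with snoop on the
>> OmniOS side and the ESXi capture tools on the host, roughly along these
>> lines (interface names and the filter address are from my setup):
>>
>>   snoop -r -d ixgbe0 -o /var/tmp/server-side.cap host <esxi-vmk-ip>
>>   tcpdump-uw -i vmk1 -s 0 -w /var/tmp/vmk1-side.pcap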
>>
>> For your review, I have uploaded the actual packet captures to Google
>> Drive:
>>
>>
>> https://drive.google.com/file/d/0BwyUMjibonYQZS03dHJ2ZjJvTEE/view?usp=sharing
>> 2 int write - ESXi vmk5
>>
>> https://drive.google.com/file/d/0BwyUMjibonYQTlNTQ2M5bjlxZ00/view?usp=sharing
>> 2 int write - ESXi vmk1
>>
>> https://drive.google.com/file/d/0BwyUMjibonYQUEFsSVJCYXBVX3c/view?usp=sharing
>> 2 int read -  server ixgbe0
>>
>> https://drive.google.com/file/d/0BwyUMjibonYQT3FBbElnNFpJTzQ/view?usp=sharing
>> 2 int read - ESXi vmk5
>>
>> https://drive.google.com/file/d/0BwyUMjibonYQU1hXdFRLM2cxSTA/view?usp=sharing
>> 2 int read - ESXi vmk1
>>
>> https://drive.google.com/file/d/0BwyUMjibonYQNEFZSHVZdFNweDA/view?usp=sharing
>> 1 int write - ESXi vmk1
>>
>> https://drive.google.com/file/d/0BwyUMjibonYQM3FpTmloQm5iMGc/view?usp=sharing
>> 1 int read - ESXi vmk1
>>
>> Regards,
>>
>> Warren V
>>
>> On Mon, Mar 2, 2015 at 1:11 PM, Mallory, Rob <rmallory at qualcomm.com>
>> wrote:
>>
>>>  Just an EWAG,   and forgive me for not following closely, I just saw
>>> this in my inbox, and looked at it and the screenshots for 2 minutes.
>>>
>>>
>>>
>>>   But this looks like the typical incast problem; see
>>> http://www.pdl.cmu.edu/Incast/
>>>
>>> where your storage servers (there are effectively two with ISCSI/MPIO if
>>> round-robin is working) have networks which are 20:1 oversubscribed to your
>>> 1GbE host interfaces. (although one of the tcpdumps shows only one server
>>> so it may be choked out completely)
>>>
>>>
>>>
>>> What is your BDP?  I’m guessing .150ms * 1GbE.  For a single link that
>>> puts the BDP at roughly 18700 bytes.
>>>
>>>
>>>
>>> On your 1GbE-connected clients, leave the MTU at 9k, set the following in
>>> sysctl.conf, and reboot.
>>>
>>>
>>>
>>> net.ipv4.tcp_rmem = 4096 8938 17876
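>>>
>>> (On a Linux client that is a line in /etc/sysctl.conf, picked up with
>>> "sysctl -p" or at the next boot.)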
>>>
>>>
>>>
>>> If MPIO from the server is indeed round-robining properly, this will
>>> “make things fit” much better.
>>>
>>>
>>>
>>> Note that your tcp_wmem can and should stay high, since you are not
>>> oversubscribed going from client -> server; you only need to tweak the
>>> tcp receive window size.
>>>
>>>
>>>
>>> I’ve not done it in quite some time, but IIRC, you can also set these
>>> from the server side with:
>>>
>>> route add -sendpipe 8930   (or -ssthresh)
>>>
>>>
>>>
>>> And I think you can see the hash-table with computed BDP per client with
>>> ndd.
>>>
>>>
>>>
>>> I would try playing with those before delving deep into potential bugs
>>> in the TCP, nic driver, zfs, or vm.
>>>
>>> -Rob
>>>
>>>
>>>
>>> *From:* W Verb via illumos-developer [mailto:developer at lists.illumos.org]
>>>
>>> *Sent:* Monday, March 02, 2015 12:20 PM
>>> *To:* Garrett D'Amore
>>> *Cc:* Joerg Goltermann; illumos-dev; omnios-discuss at lists.omniti.com
>>> *Subject:* Re: [developer] Re: [OmniOS-discuss] The ixgbe driver,
>>> Lindsay Lohan, and the Greek economy
>>>
>>>
>>>
>>> Hello,
>>>
>>> vmstat seems pretty boring. Certainly nothing going to swap.
>>>
>>> root at sanbox:/root# vmstat
>>>  kthr      memory            page            disk          faults      cpu
>>>  r b w   swap  free  re  mf pi po fr de sr po ro s0 s2   in   sy   cs us sy id
>>>  0 0 0 34631632 30728068 175 215 0 0 0 0 963 275 4 6 140 3301 796 6681  0  1 99
>>>
>>>  Here is the "taskq_dispatch_ent" output from "lockstat -s 5 -kWP sleep
>>> 30" during the "fast" write operation.
>>>
>>>
>>> -------------------------------------------------------------------------------
>>> Count indv cuml rcnt     nsec Hottest Lock           Caller
>>> 50934   3%  79% 0.00     3437 0xffffff093145ba40     taskq_dispatch_ent
>>>
>>>       nsec ------ Time Distribution ------ count     Stack
>>>        128 |                               7         spa_taskq_dispatch_ent
>>>        256 |@@                             4333      zio_taskq_dispatch
>>>        512 |@@                             3863      zio_issue_async
>>>       1024 |@@@@@                          9717      zio_execute
>>>       2048 |@@@@@@@@@                      15904
>>>       4096 |@@@@                           7595
>>>       8192 |@@                             4498
>>>      16384 |@                              2662
>>>      32768 |@                              1886
>>>      65536 |                               434
>>>     131072 |                               34
>>>     262144 |                               1
>>>
>>> -------------------------------------------------------------------------------
>>>
>>>
>>>   However, the truly "broken" function is a read operation:
>>>
>>> Top lock 1st try:
>>>
>>> -------------------------------------------------------------------------------
>>> Count indv cuml rcnt     nsec Hottest Lock           Caller
>>>   474  15%  15% 0.00     7031 0xffffff093145b6f8     cv_wait
>>>
>>>       nsec ------ Time Distribution ------ count     Stack
>>>        256 |@                              29        taskq_thread_wait
>>>        512 |@@@@@@                         100       taskq_thread
>>>       1024 |@@@@                           72        thread_start
>>>       2048 |@@@@                           69
>>>       4096 |@@@                            51
>>>       8192 |@@                             47
>>>      16384 |@@                             44
>>>      32768 |@@                             32
>>>      65536 |@                              25
>>>     131072 |                               5
>>>
>>> -------------------------------------------------------------------------------
>>>
>>>   Top lock 2nd try:
>>>
>>>
>>> -------------------------------------------------------------------------------
>>> Count indv cuml rcnt     nsec Hottest Lock           Caller
>>>   174  39%  39% 0.00   103909 0xffffff0943f116a0     dmu_zfetch_find
>>>
>>>       nsec ------ Time Distribution ------ count     Stack
>>>       2048 |                               2         dmu_zfetch
>>>       4096 |                               3         dbuf_read
>>>       8192 |                               4         dmu_buf_hold_array_by_dnode
>>>      16384 |                               3         dmu_buf_hold_array
>>>      32768 |@                              7
>>>      65536 |@@                             14
>>>     131072 |@@@@@@@@@@@@@@@@@@@@           116
>>>     262144 |@@@                            19
>>>     524288 |                               4
>>>    1048576 |                               2
>>>
>>> -------------------------------------------------------------------------------
>>>
>>> Top lock 3rd try:
>>>
>>>
>>> -------------------------------------------------------------------------------
>>> Count indv cuml rcnt     nsec Hottest Lock           Caller
>>>   283  55%  55% 0.00    94602 0xffffff0943ff5a68     dmu_zfetch_find
>>>
>>>       nsec ------ Time Distribution ------ count     Stack
>>>        512 |                               1         dmu_zfetch
>>>       1024 |                               1         dbuf_read
>>>       2048 |                               0         dmu_buf_hold_array_by_dnode
>>>       4096 |                               5         dmu_buf_hold_array
>>>       8192 |                               2
>>>      16384 |                               7
>>>      32768 |                               4
>>>      65536 |@@@                            33
>>>     131072 |@@@@@@@@@@@@@@@@@@@@           198
>>>     262144 |@@                             27
>>>     524288 |                               2
>>>    1048576 |                               3
>>>
>>> -------------------------------------------------------------------------------
>>>
>>>
>>>
>>> As for the MTU question- setting the MTU to 9000 makes read operations
>>> grind almost to a halt at 5MB/s transfer rate.
>>>
>>> -Warren V
>>>
>>>
>>>
>>> On Mon, Mar 2, 2015 at 11:30 AM, Garrett D'Amore <garrett at damore.org>
>>> wrote:
>>>
>>>  Here’s a theory.  You are using small (relatively) MTUs (3000 is less
>>> than the smallest ZFS block size.)  So, when you go multipathing this way,
>>> might a single upper layer transaction (ZFS block transfer request, or for
>>> that matter COMSTAR block request) get routed over different paths.  This
>>> sounds like a potentially pathological condition to me.
>>>
>>>
>>>
>>> What happens if you increase the MTU to 9000?  Have you tried it?  I’m
>>> sort of thinking that this will permit each transaction to be issued in a
>>> single IP frame, which may alleviate certain tragic code paths.  (That
>>> said, I’m not sure how aware COMSTAR is of the IP MTU.  If it is ignorant,
>>> then it shouldn’t matter *that* much, since TCP should do the right thing
>>> here and a single TCP stream should stick to a single underlying NIC.  But
>>> if COMSTAR is aware of the MTU, it may do some really screwball things as
>>> it tries to break requests up into single frames.)
>>>
>>>
>>>
>>> Your read spin really looks like only about 22 msec of wait out of a
>>> total run of 30 sec.  (That’s not *great*, but neither does it sound
>>> tragic.)  Your write  is interesting because that looks like it is going a
>>> wildly different path.  You should be aware that the locks you see are
>>> *not* necessarily related in call order, but rather are ordered by instance
>>> count.  The write code path hitting the task_thread as hard as it does is
>>> really, really weird.  Something is pounding on a taskq lock super hard.
>>> The number of taskq_dispatch_ent calls is interesting here.  I’m starting
>>> to wonder if it’s something as stupid as a spin where if the taskq is
>>> “full” (max size reached), a caller just is spinning trying to dispatch
>>> jobs to the taskq.
>>>
>>>
>>>
>>> The taskq_dispatch_ent code is super simple, and it should be almost
>>> impossible to have contention on that lock — barring a thread spinning hard
>>> on taskq_dispatch (or taskq_dispatch_ent as I think is happening here).
>>> Looking at the various call sites, there are places in both COMSTAR
>>> (iscsit) and in ZFS where this could be coming from.  To know which, we
>>> really need to have the back trace associated.
>>>
>>>
>>>
>>> lockstat can give this — try giving “-s 5” to give a short backtrace
>>> from this, that will probably give us a little more info about the guilty
>>> caller. :-)
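>>>
>>> (i.e. something like "lockstat -kWP -s 5 sleep 30" -- the same run as
>>> before, just with the backtrace depth added)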
>>>
>>>
>>>
>>> - Garrett
>>>
>>>
>>>
>>>   On Mar 2, 2015, at 11:07 AM, W Verb via illumos-developer <
>>> developer at lists.illumos.org> wrote:
>>>
>>>
>>>
>>> Hello all,
>>>
>>> I am not using layer 2 flow control. The switch carries line-rate 10G
>>> traffic without error.
>>>
>>> I think I have found the issue via lockstat. The first lockstat is taken
>>> during a multipath read:
>>>
>>>  lockstat -kWP sleep 30
>>>
>>>
>>> Adaptive mutex spin: 21331 events in 30.020 seconds (711 events/sec)
>>>
>>> Count indv cuml rcnt     nsec Hottest Lock           Caller
>>>
>>> -------------------------------------------------------------------------------
>>>  9306  44%  44% 0.00     1557 htable_mutex+0x370     htable_release
>>>  6307  23%  68% 0.00     1207 htable_mutex+0x108     htable_lookup
>>>   596   7%  75% 0.00     4100 0xffffff0931705188     cv_wait
>>>   349   5%  80% 0.00     4437 0xffffff0931705188     taskq_thread
>>>   704   2%  82% 0.00      995 0xffffff0935de3c50     dbuf_create
>>>
>>> The hash table being read here I would guess is the tcp connection hash
>>> table.
>>>
>>>
>>>
>>> When lockstat is run during a multipath write operation, I get:
>>>
>>> Adaptive mutex spin: 1097341 events in 30.016 seconds (36558 events/sec)
>>>
>>> Count indv cuml rcnt     nsec Hottest Lock           Caller
>>>
>>> -------------------------------------------------------------------------------
>>> 210752  28%  28% 0.00     4781 0xffffff0931705188     taskq_thread
>>> 174471  22%  50% 0.00     4476 0xffffff0931705188     cv_wait
>>> 127183  10%  61% 0.00     2871 0xffffff096f29b510     zio_notify_parent
>>> 176066  10%  70% 0.00     1922 0xffffff0931705188     taskq_dispatch_ent
>>> 105134   9%  80% 0.00     3110 0xffffff096ffdbf10     zio_remove_child
>>> 67512   4%  83% 0.00     1938 0xffffff096f3db4b0     zio_add_child
>>> 45736   3%  86% 0.00     2239 0xffffff0935de3c50     dbuf_destroy
>>> 27781   3%  89% 0.00     3416 0xffffff0935de3c50     dbuf_create
>>> 38536   2%  91% 0.00     2122 0xffffff0935de3b70     dnode_rele
>>> 27841   2%  93% 0.00     2423 0xffffff0935de3b70     dnode_diduse_space
>>> 19020   2%  95% 0.00     3046 0xffffff09d9e305e0     dbuf_rele
>>> 14627   1%  96% 0.00     3632 dbuf_hash_table+0x4f8  dbuf_find
>>>
>>>
>>>   Writes are not performing htable lookups, while reads are.
>>>
>>> -Warren V
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>> On Mon, Mar 2, 2015 at 3:14 AM, Joerg Goltermann <jg at osn.de> wrote:
>>>
>>>  Hi,
>>>
>>> I would try *one* TPG which includes both interface addresses
>>> and I would double check for packet drops on the Catalyst.
>>>
>>> The 3560 supports only receive flow control which means, that
>>> a sending 10Gbit port can easily overload a 1Gbit port.
>>> Do you have flow control enabled?
>>>
>>>  - Joerg
>>>
>>>
>>>
>>> On 02.03.2015 09:22, W Verb via illumos-developer wrote:
>>>
>>>   Hello Garrett,
>>>
>>> No, no 802.3ad going on in this config.
>>>
>>> Here is a basic schematic:
>>>
>>>
>>> https://drive.google.com/file/d/0BwyUMjibonYQVkVqcE5OQUJyUUU/view?usp=sharing
>>>
>>> Here is the Nexenta MPIO iSCSI Setup Document that I used as a guide:
>>>
>>>
>>> https://drive.google.com/file/d/0BwyUMjibonYQbjEyUTBjN2tTNWM/view?usp=sharing
>>>
>>> Note that I am using an MTU of 3000 on both the 10G and 1G NICs. The
>>> switch is set to allow 9148-byte frames, and I'm not seeing any
>>> errors/buffer overruns on the switch.
>>>
>>> Here is a screenshot of a packet capture from a read operation on the
>>> guest OS (from it's local drive, which is actually a VMDK file on the
>>> storage server). In this example, only a single 1G ESXi kernel interface
>>> (vmk1) is bound to the software iSCSI initiator.
>>>
>>>
>>> https://drive.google.com/file/d/0BwyUMjibonYQa2NYdXhpZkpkbU0/view?usp=sharing
>>>
>>> Note that there's a nice, well-behaved window sizing process taking
>>> place. The ESXi decreases the scaled window by 11 or 12 for each ACK,
>>> then bumps it back up to 512.
>>>
>>> Here is a similar screenshot of a single-interface write operation:
>>>
>>>
>>> https://drive.google.com/file/d/0BwyUMjibonYQbU1RZHRnakxDSFU/view?usp=sharing
>>>
>>> There are no pauses or gaps in the transmission rate in the
>>> single-interface transfers.
>>>
>>>
>>> In the next screenshots, I have enabled an additional 1G interface on
>>> the ESXi host, and bound it to the iSCSI initiator. The new interface is
>>> bound to a separate physical port, uses a different VLAN on the switch,
>>> and talks to a different 10G port on the storage server.
>>>
>>> First, let's look at a write operation on the guest OS, which happily
>>> pumps data at near-line-rate to the storage server.
>>>
>>> Here is a sequence number trace diagram. Note how the transfer has a
>>> nice, smooth increment rate over the entire transfer.
>>>
>>>
>>> https://drive.google.com/file/d/0BwyUMjibonYQWHNIa0drWnNxMmM/view?usp=sharing
>>>
>>> Here are screenshots from packet captures on both 1G interfaces:
>>>
>>>
>>> https://drive.google.com/file/d/0BwyUMjibonYQRWhyVVQ4djNaU3c/view?usp=sharing
>>>
>>> https://drive.google.com/file/d/0BwyUMjibonYQaTVjTEtTRloyR2c/view?usp=sharing
>>>
>>> Note how we again see nice, smooth window adjustment, and no gaps in
>>> transmission.
>>>
>>>
>>> But now, let's look at the problematic two-interface Read operation.
>>> First, the sequence graph:
>>>
>>>
>>> https://drive.google.com/file/d/0BwyUMjibonYQTzdFVWdQMWZ6LUU/view?usp=sharing
>>>
>>> As you can see, there are gaps and jumps in the transmission throughout
>>> the transfer.
>>> It is very illustrative to look at captures of the gaps, which are
>>> occurring on both interfaces:
>>>
>>>
>>> https://drive.google.com/file/d/0BwyUMjibonYQc0VISXN6eVFwQzg/view?usp=sharing
>>>
>>> https://drive.google.com/file/d/0BwyUMjibonYQVFREUHp3TGFiUU0/view?usp=sharing
>>>
>>> As you can see, there are ~.4 second pauses in transmission from the
>>> storage server, which kills the transfer rate.
>>> It's clear that the ESXi box ACKs the prior iSCSI operation to
>>> completion, then makes a new LUN request, which the storage server
>>> immediately replies to. The ESXi ACKs the response packet from the
>>> storage server, then waits...and waits....and waits... until eventually
>>> the storage server starts transmitting again.
>>>
>>> Because the pause happens while the ESXi client is waiting for a packet
>>> from the storage server, that tells me that the gaps are not an artifact
>>> of traffic being switched between both active interfaces, but are
>>> actually indicative of short hangs occurring on the server.
>>>
>>> Having a pause or two in transmission is no big deal, but in my case, it
>>> is happening constantly, and dropping my overall read transfer rate down
>>> to 20-60MB/s, which is slower than the single interface transfer rate
>>> (~90-100MB/s).
>>>
>>> Decreasing the MTU makes the pauses shorter, increasing them makes the
>>> pauses longer.
>>>
>>> Another interesting thing is that if I set the multipath io interval to
>>> 3 operations instead of 1, I get better throughput. In other words, the
>>> less frequently I swap IP addresses on my iSCSI requests from the ESXi
>>> unit, the fewer pauses I see.
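>>>
>>> (The IO interval here is the ESXi round-robin IOPS limit, set per device
>>> with something along the lines of
>>>   esxcli storage nmp psp roundrobin deviceconfig set -d <naa.id> -t iops -I 3
>>> where the device id is a placeholder from my setup.)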
>>>
>>> Basically, COMSTAR seems to choke each time an iSCSI request from a new
>>> IP arrives.
>>>
>>> Because the single interface transfer is near line rate, that tells me
>>> that the storage system (mpt_sas, zfs, etc) is working fine. It's only
>>> when multiple paths are attempted that iSCSI falls on its face during
>>> reads.
>>>
>>> All of these captures were taken without a cache device being attached
>>> to the storage zpool, so this isn't looking like some kind of ZFS ARC
>>> problem. As mentioned previously, local transfers to/from the zpool are
>>> showing ~300-500 MB/s rates over long transfers (10G+).
>>>
>>> -Warren V
>>>
>>> On Sun, Mar 1, 2015 at 9:11 PM, Garrett D'Amore <garrett at damore.org
>>>
>>> <mailto:garrett at damore.org>> wrote:
>>>
>>>     I’m not sure I’ve followed properly.  You have *two* interfaces.
>>>     You are not trying to provision these in an aggr are you? As far as
>>>     I’m aware, VMware does not support 802.3ad link aggregations.  (Its
>>>     possible that you can make it work with ESXi if you give the entire
>>>     NIC to the guest — but I’m skeptical.)  The problem is that if you
>>>     try to use link aggregation, some packets (up to half!) will be
>>>     lost.  TCP and other protocols fare poorly in this situation.
>>>
>>>     Its possible I’ve totally misunderstood what you’re trying to do, in
>>>     which case I apologize.
>>>
>>>     The idle thing is a red-herring — the cpu is waiting for work to do,
>>>     probably because packets haven’t arrived (or where dropped by the
>>>     hypervisor!)  I wouldn’t read too much into that except that your
>>>     network stack is in trouble.  I’d look a bit more closely at the
>>>     kstats for tcp — I suspect you’ll see retransmits or out of order
>>>     values that are unusually high — if so this may help validate my
>>>     theory above.
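>>>
>>>     (e.g. "netstat -s -P tcp" or "kstat -m tcp", looking at counters
>>>     along the lines of tcpRetransSegs and the out-of-order segment
>>>     counts)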
>>>
>>>     - Garrett
>>>
>>>     On Mar 1, 2015, at 9:03 PM, W Verb via illumos-developer
>>>     <developer at lists.illumos.org <mailto:developer at lists.illumos.org>>
>>>
>>>
>>>     wrote:
>>>
>>>     Hello all,
>>>
>>>
>>>     Well, I no longer blame the ixgbe driver for the problems I'm seeing.
>>>
>>>
>>>     I tried Joerg's updated driver, which didn't improve the issue. So
>>>     I went back to the drawing board and rebuilt the server from scratch.
>>>
>>>     What I noted is that if I have only a single 1-gig physical
>>>     interface active on the ESXi host, everything works as expected.
>>>     As soon as I enable two interfaces, I start seeing the performance
>>>     problems I've described.
>>>
>>>     Response pauses from the server that I see in TCPdumps are still
>>>     leading me to believe the problem is delay on the server side, so
>>>     I ran a series of kernel dtraces and produced some flamegraphs.
>>>
>>>
>>>     This was taken during a read operation with two active 10G
>>>     interfaces on the server, with a single target being shared by two
>>>     tpgs- one tpg for each 10G physical port. The host device has two
>>>     1G ports enabled, with VLANs separating the active ports into
>>>     10G/1G pairs. ESXi is set to multipath using both VLANS with a
>>>     round-robin IO interval of 1.
>>>
>>>
>>> https://drive.google.com/file/d/0BwyUMjibonYQd3ZYOGh4d2pteGs/view?usp=sharing
>>>
>>>
>>>     This was taken during a write operation:
>>>
>>>
>>> https://drive.google.com/file/d/0BwyUMjibonYQMnBtU1Q2SXM2ams/view?usp=sharing
>>>
>>>
>>>     I then rebooted the server and disabled C-State, ACPI T-State, and
>>>     general EIST (Turbo boost) functionality in the CPU.
>>>
>>>     When I attempted to boot my guest VM, the iSCSI transfer
>>>     gradually ground to a halt during the boot loading process, and
>>>     the guest OS never did complete its boot process.
>>>
>>>     Here is a flamegraph taken while iSCSI is slowly dying:
>>>
>>>
>>> https://drive.google.com/file/d/0BwyUMjibonYQM21JeFZPX3dZWTg/view?usp=sharing
>>>
>>>
>>>     I edited out cpu_idle_adaptive from the dtrace output and
>>>     regenerated the slowdown graph:
>>>
>>>
>>> https://drive.google.com/file/d/0BwyUMjibonYQbTVwV3NvXzlPS1E/view?usp=sharing
>>>
>>>
>>>     I then edited cpu_idle_adaptive out of the speedy write operation
>>>     and regenerated that graph:
>>>
>>>
>>> https://drive.google.com/file/d/0BwyUMjibonYQeWFYM0pCMDZ1X2s/view?usp=sharing
>>>
>>>
>>>     I have zero experience with interpreting flamegraphs, but the most
>>>     significant difference I see between the slow read example and the
>>>     fast write example is in unix`thread_start --> unix`idle. There's
>>>     a good chunk of "unix`i86_mwait" in the read example that is not
>>>     present in the write example at all.
>>>
>>>     Disabling the l2arc cache device didn't make a difference, and I
>>>     had to reenable EIST support on the CPU to get my VMs to boot.
>>>
>>>     I am seeing a variety of bug reports going back to 2010 regarding
>>>     excessive mwait operations, with the suggested solutions usually
>>>     being to set "cpupm enable poll-mode" in power.conf. That change
>>>     also had no effect on speed.
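>>>
>>>     (That change goes in /etc/power.conf as "cpupm enable poll-mode",
>>>     re-read with pmconfig.)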
>>>
>>>     -Warren V
>>>
>>>
>>>
>>>
>>>     -----Original Message-----
>>>
>>>     From: Chris Siebenmann [mailto:cks at cs.toronto.edu]
>>>
>>>     Sent: Monday, February 23, 2015 8:30 AM
>>>
>>>     To: W Verb
>>>
>>>     Cc: omnios-discuss at lists.omniti.com
>>>
>>>     <mailto:omnios-discuss at lists.omniti.com>; cks at cs.toronto.edu
>>>     <mailto:cks at cs.toronto.edu>
>>>
>>>     Subject: Re: [OmniOS-discuss] The ixgbe driver, Lindsay Lohan, and
>>>     the Greek economy
>>>
>>>
>>>     > Chris, thanks for your specific details. I'd appreciate it if you
>>>
>>>     > could tell me which copper NIC you tried, as well as to pass on the
>>>
>>>     > iSCSI tuning parameters.
>>>
>>>
>>>     Our copper NIC experience is with onboard X540-AT2 ports on
>>>     SuperMicro hardware (which have the guaranteed 10-20 msec lock
>>>     hold) and dual-port 82599EB TN cards (which have some sort of
>>>     driver/hardware failure under load that eventually leads to
>>>     2-second lock holds). I can't recommend either with the current
>>>     driver; we had to revert to 1G networking in order to get stable
>>>     servers.
>>>
>>>
>>>     The iSCSI parameter modifications we do, across both initiators
>>>     and targets, are:
>>>
>>>
>>>     initialr2t          no
>>>
>>>     firstburstlength    128k
>>>
>>>     maxrecvdataseglen   128k   [only on Linux backends]
>>>
>>>     maxxmitdataseglen   128k   [only on Linux backends]
>>>
>>>     The OmniOS initiator doesn't need tuning for more than the first
>>>     two parameters; on the Linux backends we tune up all four. My
>>>     extended thoughts on these tuning parameters and why we touch them
>>>     can be found
>>>
>>>     here:
>>>
>>>
>>>
>>> http://utcc.utoronto.ca/~cks/space/blog/tech/UnderstandingiSCSIProtocol
>>>
>>>     http://utcc.utoronto.ca/~cks/space/blog/tech/LikelyISCSITuning
>>>
>>>
>>>     The short version is that these parameters probably only make a
>>>     small difference but their overall goal is to do 128KB ZFS reads
>>>     and writes in single iSCSI operations (although they will be
>>>     fragmented at the TCP
>>>
>>>     layer) and to do iSCSI writes without a back-and-forth delay
>>>     between initiator and target (that's 'initialr2t no').
>>>
>>>
>>>     I think basically everyone should use InitialR2T set to no and in
>>>     fact that it should be the software default. These days only
>>>     unusually limited iSCSI targets should need it to be otherwise and
>>>     they can change their setting for it (initiator and target must
>>>     both agree to it being 'yes', so either can veto it).
>>>
>>>
>>>     - cks
>>>
>>>
>>>
>>>     On Mon, Feb 23, 2015 at 8:21 AM, Joerg Goltermann <jg at osn.de
>>>
>>>     <mailto:jg at osn.de>> wrote:
>>>
>>>         Hi,
>>>
>>>         I think your problem is caused by your link properties or your
>>>         switch settings. In general the standard ixgbe seems to perform
>>>         well.
>>>
>>>         I had trouble after changing the default flow control settings
>>>         to "bi", and this was my motivation to update the ixgbe driver a
>>>         long time ago. After I updated our systems to ixgbe 2.5.8 I never
>>>         had any problems.
>>>
>>>         Make sure your switch has support for jumbo frames and you use
>>>         the same mtu on all ports, otherwise the smallest will be used.
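>>>
>>>         A quick check on the OmniOS side is "dladm show-linkprop -p mtu"
>>>         against the jumbo/MTU setting on each switch port.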
>>>
>>>         What switch do you use? I can tell you nice horror stories about
>>>         different vendors....
>>>
>>>          - Joerg
>>>
>>>         On 23.02.2015 10:31, W Verb wrote:
>>>
>>>             Thank you Joerg,
>>>
>>>             I've downloaded the package and will try it tomorrow.
>>>
>>>             The only thing I can add at this point is that upon review
>>>             of my
>>>             testing, I may have performed my "pkg -u" between the
>>>             initial quad-gig
>>>             performance test and installing the 10G NIC. So this may
>>>             be a new
>>>             problem introduced in the latest updates.
>>>
>>>             Those of you who are running 10G and have not upgraded to
>>>             the latest
>>>             kernel, etc, might want to do some additional testing
>>>             before running the
>>>             update.
>>>
>>>             -Warren V
>>>
>>>             On Mon, Feb 23, 2015 at 1:15 AM, Joerg Goltermann
>>>             <jg at osn.de <mailto:jg at osn.de>
>>>
>>>             <mailto:jg at osn.de <mailto:jg at osn.de>>> wrote:
>>>
>>>                 Hi,
>>>
>>>                 I remember there was a problem with the flow control
>>>             settings in the
>>>                 ixgbe
>>>                 driver, so I updated it a long time ago for our
>>>             internal servers to
>>>                 2.5.8.
>>>                 Last weekend I integrated the latest changes from the
>>>             FreeBSD driver
>>>                 to bring
>>>                 the illumos ixgbe to 2.5.25 but I had no time to test
>>>             it, so it's
>>>                 completely
>>>                 untested!
>>>
>>>
>>>                 If you would like to give the latest driver a try you
>>>             can fetch the
>>>                 kernel modules from
>>>             https://cloud.osn.de/index.php/s/Fb4so9RsNnXA7r9
>>>
>>>                 Clone your boot environment, place the modules in the
>>>             new environment
>>>                 and update the boot-archive of the new BE.
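>>>
>>>                 Roughly (a sketch -- the BE name is arbitrary and the
>>>                 module path may differ on your build):
>>>
>>>                   beadm create ixgbe-test
>>>                   beadm mount ixgbe-test /mnt
>>>                   cp ixgbe /mnt/kernel/drv/amd64/ixgbe
>>>                   bootadm update-archive -R /mnt
>>>                   beadm unmount ixgbe-test
>>>                   beadm activate ixgbe-test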
>>>
>>>                   - Joerg
>>>
>>>
>>>
>>>
>>>
>>>                 On 23.02.2015 02:54, W Verb wrote:
>>>
>>>                     By the way, to those of you who have working
>>>             setups: please send me
>>>                     your pool/volume settings, interface linkprops,
>>>             and any kernel
>>>                     tuning
>>>                     parameters you may have set.
>>>
>>>                     Thanks,
>>>                     Warren V
>>>
>>>                     On Sat, Feb 21, 2015 at 7:59 AM, Schweiss, Chip
>>>                     <chip at innovates.com <mailto:chip at innovates.com>
>>>             <mailto:chip at innovates.com <mailto:chip at innovates.com>>>
>>>
>>>
>>>             wrote:
>>>
>>>                         I can't say I totally agree with your performance
>>>                         assessment.   I run Intel
>>>                         X520 in all my OmniOS boxes.
>>>
>>>                         Here is a capture of nfssvrtop I made while
>>>             running many
>>>                         storage vMotions
>>>                         between two OmniOS boxes hosting NFS
>>>             datastores.   This is a
>>>                         10 host VMware
>>>                         cluster.  Both OmniOS boxes are dual 10G
>>>             connected with
>>>                         copper twin-ax to
>>>                         the in rack Nexus 5010.
>>>
>>>                         VMware does 100% sync writes, I use ZeusRAM
>>>             SSDs for log
>>>                         devices.
>>>
>>>                         -Chip
>>>
>>>   2014 Apr 24 08:05:51, load: 12.64, read: 17330243 KB, swrite: 15985 KB, awrite: 1875455 KB
>>>
>>>   Ver  Client        NFSOPS  Reads SWrites AWrites Commits   Rd_bw SWr_bw  AWr_bw  Rd_t SWr_t AWr_t Com_t Align%
>>>   4    10.28.17.105       0      0       0       0       0       0      0       0     0     0     0     0      0
>>>   4    10.28.17.215       0      0       0       0       0       0      0       0     0     0     0     0      0
>>>   4    10.28.17.213       0      0       0       0       0       0      0       0     0     0     0     0      0
>>>   4    10.28.16.151       0      0       0       0       0       0      0       0     0     0     0     0      0
>>>   4    all                1      0       0       0       0       0      0       0     0     0     0     0      0
>>>   3    10.28.16.175       3      0       3       0       0       1     11       0  4806    48     0     0     85
>>>   3    10.28.16.183       6      0       6       0       0       3    162       0   549   124     0     0     73
>>>   3    10.28.16.180      11      0      10       0       0       3     27       0   776    89     0     0     67
>>>   3    10.28.16.176      28      2      26       0       0      10    405       0  2572   198     0     0    100
>>>   3    10.28.16.178    4606   4602       4       0       0  294534      3       0   723    49     0     0     99
>>>   3    10.28.16.179    4905   4879      26       0       0  312208    311       0   735   271     0     0     99
>>>   3    10.28.16.181    5515   5502      13       0       0  352107     77       0    89    87     0     0     99
>>>   3    10.28.16.184   12095  12059      10       0       0  763014     39       0   249   147     0     0     99
>>>   3    10.28.58.1     15401   6040     116    6354      53  191605    474  202346   192    96   144    83     99
>>>   3    all            42574  33086     217    6354      53 1913488   1582  202300   348   138   153   105     99
>>>
>>>
>>>
>>>
>>>
>>>                         On Fri, Feb 20, 2015 at 11:46 PM, W Verb
>>>             <wverb73 at gmail.com <mailto:wverb73 at gmail.com>
>>>                         <mailto:wverb73 at gmail.com
>>>
>>>
>>>             <mailto:wverb73 at gmail.com>>> wrote:
>>>
>>>
>>>                             Hello All,
>>>
>>>                             Thank you for your replies.
>>>                             I tried a few things, and found the
>>> following:
>>>
>>>                             1: Disabling hyperthreading support in the
>>>             BIOS drops
>>>                             performance overall
>>>                             by a factor of 4.
>>>                             2: Disabling VT support also seems to have
>>>             some effect,
>>>                             although it
>>>                             appears to be minor. But this has the
>>>             amusing side
>>>                             effect of fixing the
>>>                             hangs I've been experiencing with fast
>>>             reboot. Probably
>>>                             by disabling kvm.
>>>                             3: The performance tests are a bit tricky
>>>             to quantify
>>>                             because of caching
>>>                             effects. In fact, I'm not entirely sure
>>>             what is
>>>                             happening here. It's just
>>>                             best to describe what I'm seeing:
>>>
>>>                             The commands I'm using to test are
>>>                             dd if=/dev/zero of=./test.dd bs=2M count=5000
>>>                             dd of=/dev/null if=./test.dd bs=2M count=5000
>>>                             The host vm is running Centos 6.6, and has
>>>             the latest
>>>                             vmtools installed.
>>>                             There is a host cache on an SSD local to
>>>             the host that
>>>                             is also in place.
>>>                             Disabling the host cache didn't
>>>             immediately have an
>>>                             effect as far as I could
>>>                             see.
>>>
>>>                             The host MTU set to 3000 on all iSCSI
>>>             interfaces for all
>>>                             tests.
>>>
>>>                             Test 1: Right after reboot, with an ixgbe
>>>             MTU of 9000,
>>>                             the write test
>>>                             yields an average speed over three tests
>>>             of 137MB/s. The
>>>                             read test yields an
>>>                             average over three tests of 5MB/s.
>>>
>>>                             Test 2: After setting "ifconfig ixgbe0 mtu
>>>             3000", the
>>>                             write tests yield
>>>                             140MB/s, and the read tests yield 53MB/s.
>>>             It's important
>>>                             to note here that
>>>                             if I cut the read test short at only
>>>             2-3GB, I get
>>>                             results upwards of
>>>                             350MB/s, which I assume is local
>>>             cache-related distortion.
>>>
>>>                             Test 3: MTU of 1500. Read tests are up to
>>>             156 MB/s.
>>>                             Write tests yield
>>>                             about 142MB/s.
>>>                             Test 4: MTU of 1000: Read test at 182MB/s.
>>>                             Test 5: MTU of 900: Read test at 130 MB/s.
>>>                             Test 6: MTU of 1000: Read test at 160MB/s.
>>>             Write tests
>>>                             are now
>>>                             consistently at about 300MB/s.
>>>                             Test 7: MTU of 1200: Read test at 124MB/s.
>>>                             Test 8: MTU of 1000: Read test at 161MB/s.
>>>             Write at 261MB/s.
>>>
>>>                             A few final notes:
>>>                             L1ARC grabs about 10GB of RAM during the
>>>             tests, so
>>>                             there's definitely some
>>>                             read caching going on.
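>>>
>>>                             (The same dd tests can be re-run with
>>>                             iflag=direct / oflag=direct to take the
>>>                             guest page cache out of the picture,
>>>                             assuming the guest's dd supports those
>>>                             flags.)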
>>>                             The write operations are easier to observe
>>>             with iostat,
>>>                             and I'm seeing io
>>>                             rates that closely correlate with the
>>>             network write speeds.
>>>
>>>
>>>                             Chris, thanks for your specific details.
>>>             I'd appreciate
>>>                             it if you could
>>>                             tell me which copper NIC you tried, as
>>>             well as to pass
>>>                             on the iSCSI tuning
>>>                             parameters.
>>>
>>>                             I've ordered an Intel EXPX9502AFXSR, which
>>>             uses the
>>>                             82598 chip instead of
>>>                             the 82599 in the X520. If I get similar
>>>             results with my
>>>                             fiber transcievers,
>>>                             I'll see if I can get a hold of copper ones.
>>>
>>>                             But I should mention that I did indeed
>>>             look at PHY/MAC
>>>                             error rates, and
>>>                             they are nil.
>>>
>>>                             -Warren V
>>>
>>>                             On Fri, Feb 20, 2015 at 7:25 PM, Chris
>>>             Siebenmann
>>>                             <cks at cs.toronto.edu
>>>
>>>             <mailto:cks at cs.toronto.edu> <mailto:cks at cs.toronto.edu
>>>
>>>
>>>             <mailto:cks at cs.toronto.edu>>>
>>>
>>>                             wrote:
>>>
>>>
>>>                                     After installation and
>>>             configuration, I observed
>>>                                     all kinds of bad
>>>                                     behavior
>>>                                     in the network traffic between the
>>>             hosts and the
>>>                                     server. All of this
>>>                                     bad
>>>                                     behavior is traced to the ixgbe
>>>             driver on the
>>>                                     storage server. Without
>>>                                     going
>>>                                     into the full troubleshooting
>>>             process, here are
>>>                                     my takeaways:
>>>
>>>                                 [...]
>>>
>>>                                    For what it's worth, we managed to
>>>             achieve much
>>>                                 better line rates on
>>>                                 copper 10G ixgbe hardware of various
>>>             descriptions
>>>                                 between OmniOS
>>>                                 and CentOS 7 (I don't think we ever
>>>             tested OmniOS to
>>>                                 OmniOS). I don't
>>>                                 believe OmniOS could do TCP at full
>>>             line rate but I
>>>                                 think we managed 700+
>>>                                 Mbytes/sec on both transmit and
>>>             receive and we got
>>>                                 basically disk-limited
>>>                                 speeds with iSCSI (across multiple
>>>             disks on
>>>                                 multi-disk mirrored pools,
>>>                                 OmniOS iSCSI initiator, Linux iSCSI
>>>             targets).
>>>
>>>                                    I don't believe we did any specific
>>>             kernel tuning
>>>                                 (and in fact some of
>>>                                 our attempts to fiddle ixgbe driver
>>>             parameters blew
>>>                                 up in our face).
>>>                                 We did tune iSCSI connection
>>>             parameters to increase
>>>                                 various buffer
>>>                                 sizes so that ZFS could do even large
>>>             single
>>>                                 operations in single iSCSI
>>>                                 transactions. (More details available
>>>             if people are
>>>                                 interested.)
>>>
>>>                                     10: At the wire level, the speed
>>>             problems are
>>>                                     clearly due to pauses in
>>>                                     response time by omnios. At 9000
>>>             byte frame
>>>                                     sizes, I see a good number
>>>                                     of duplicate ACKs and fast
>>>             retransmits during
>>>                                     read operations (when
>>>                                     omnios is transmitting). But below
>>>             about a
>>>                                     4100-byte MTU on omnios
>>>                                     (which seems to correlate to
>>>             4096-byte iSCSI
>>>                                     block transfers), the
>>>                                     transmission errors fade away and
>>>             we only see
>>>                                     the transmission pause
>>>                                     problem.
>>>
>>>
>>>                                    This is what really attracted my
>>>             attention. In
>>>                                 our OmniOS setup, our
>>>                                 specific Intel hardware had ixgbe
>>>             driver issues that
>>>                                 could cause
>>>                                 activity stalls during once-a-second
>>>             link heartbeat
>>>                                 checks. This
>>>                                 obviously had an effect at the TCP and
>>>             iSCSI layers.
>>>                                 My initial message
>>>                                 to illumos-developer sparked a
>>> potentially
>>>                                 interesting discussion:
>>>
>>>
>>> http://www.listbox.com/member/archive/182179/2014/10/sort/time_rev/page/16/entry/6:405/20141003125035:6357079A-4B1D-11E4-A39C-D534381BA44D/
>>>
>>>                                 If you think this is a possibility in
>>>             your setup,
>>>                                 I've put the DTrace
>>>                                 script I used to hunt for this up on
>>>             the web:
>>>
>>>
>>> http://www.cs.toronto.edu/~cks/src/omnios-ixgbe/ixgbe_delay.d
>>>
>>>                                 This isn't the only potential source
>>>             of driver
>>>                                 stalls by any means, it's
>>>                                 just the one I found. You may also
>>>             want to look at
>>>                                 lockstat in general,
>>>                                 as information it reported is what led
>>>             us to look
>>>                                 specifically at the
>>>                                 ixgbe code here.
>>>
>>>                                 (If you suspect kernel/driver issues,
>>>             lockstat
>>>                                 combined with kernel
>>>                                 source is a really excellent resource.)
>>>
>>>                                           - cks
>>>
>>>
>>>
>>>
>>>
>>>             _______________________________________________
>>>             OmniOS-discuss mailing list
>>>             OmniOS-discuss at lists.omniti.com
>>>             http://lists.omniti.com/mailman/listinfo/omnios-discuss
>>>
>>>
>>>                     _______________________________________________
>>>                     OmniOS-discuss mailing list
>>>                     OmniOS-discuss at lists.omniti.com
>>>                     http://lists.omniti.com/mailman/listinfo/omnios-discuss
>>>
>>>
>>>                 --
>>>                 OSN Online Service Nuernberg GmbH, Bucher Str. 78, 90408 Nuernberg
>>>                 Tel: +49 911 39905-0 - Fax: +49 911 39905-55 - http://www.osn.de
>>>                 HRB 15022 Nuernberg, USt-Id: DE189301263, GF: Joerg Goltermann
>>>
>>>
>>>
>>>         --
>>>         OSN Online Service Nuernberg GmbH, Bucher Str. 78, 90408 Nuernberg
>>>         Tel: +49 911 39905-0 - Fax: +49 911 39905-55 - http://www.osn.de
>>>         HRB 15022 Nuernberg, USt-Id: DE189301263, GF: Joerg Goltermann
>>>
>>>
>>>
>>> ...
>>>
>>> [Message clipped]
>>
>>
>>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://omniosce.org/ml-archive/attachments/20150304/575d6cb6/attachment-0001.html>

