From wverb73 at gmail.com Mon Mar 2 05:03:37 2015
From: wverb73 at gmail.com (W Verb)
Date: Sun, 1 Mar 2015 21:03:37 -0800
Subject: [OmniOS-discuss] The ixgbe driver, Lindsay Lohan, and the Greek economy
In-Reply-To: <54EB5392.6030900@osn.de>
References: <20150221032559.727D07A0792@apps0.cs.toronto.edu> <54EAEFA8.4020101@osn.de> <54EB5392.6030900@osn.de>
Message-ID:

Hello all,

Well, I no longer blame the ixgbe driver for the problems I'm seeing.

I tried Joerg's updated driver, which didn't improve the issue. So I went back to the drawing board and rebuilt the server from scratch.

What I noted is that if I have only a single 1-gig physical interface active on the ESXi host, everything works as expected. As soon as I enable two interfaces, I start seeing the performance problems I've described.

Response pauses from the server that I see in TCPdumps are still leading me to believe the problem is delay on the server side, so I ran a series of kernel dtraces and produced some flamegraphs.

This was taken during a read operation with two active 10G interfaces on the server, with a single target being shared by two tpgs - one tpg for each 10G physical port. The host device has two 1G ports enabled, with VLANs separating the active ports into 10G/1G pairs. ESXi is set to multipath using both VLANs with a round-robin IO interval of 1.

https://drive.google.com/file/d/0BwyUMjibonYQd3ZYOGh4d2pteGs/view?usp=sharing

This was taken during a write operation:

https://drive.google.com/file/d/0BwyUMjibonYQMnBtU1Q2SXM2ams/view?usp=sharing

I then rebooted the server and disabled C-State, ACPI T-State, and general EIST (Turbo Boost) functionality in the CPU.

When I attempted to boot my guest VM, the iSCSI transfer gradually ground to a halt during the boot loading process, and the guest OS never did complete its boot process.

Here is a flamegraph taken while iSCSI is slowly dying:

https://drive.google.com/file/d/0BwyUMjibonYQM21JeFZPX3dZWTg/view?usp=sharing

I edited out cpu_idle_adaptive from the dtrace output and regenerated the slowdown graph:

https://drive.google.com/file/d/0BwyUMjibonYQbTVwV3NvXzlPS1E/view?usp=sharing

I then edited cpu_idle_adaptive out of the speedy write operation and regenerated that graph:

https://drive.google.com/file/d/0BwyUMjibonYQeWFYM0pCMDZ1X2s/view?usp=sharing

I have zero experience with interpreting flamegraphs, but the most significant difference I see between the slow read example and the fast write example is in unix`thread_start --> unix`idle. There's a good chunk of "unix`i86_mwait" in the read example that is not present in the write example at all.

Disabling the l2arc cache device didn't make a difference, and I had to reenable EIST support on the CPU to get my VMs to boot.

I am seeing a variety of bug reports going back to 2010 regarding excessive mwait operations, with the suggested solutions usually being to set "cpupm enable poll-mode" in power.conf. That change also had no effect on speed.

-Warren V
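P.S. For anyone who wants to produce similar graphs: a kernel flamegraph is typically generated with the dtrace profile provider plus Brendan Gregg's FlameGraph scripts (stackcollapse.pl and flamegraph.pl), something along these lines - the sampling rate, 60-second window, and file names here are arbitrary, not necessarily what was used for the graphs above:

  # sample on-CPU kernel stacks at 997 Hz for 60 seconds
  dtrace -x stackframes=100 -n 'profile-997 /arg0/ { @[stack()] = count(); } tick-60s { exit(0); }' -o kern.stacks

  # fold and render with the FlameGraph scripts (assumed to be in the current directory)
  ./stackcollapse.pl kern.stacks > kern.folded
  ./flamegraph.pl kern.folded > kern.svg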
-----Original Message-----
From: Chris Siebenmann [mailto:cks at cs.toronto.edu]
Sent: Monday, February 23, 2015 8:30 AM
To: W Verb
Cc: omnios-discuss at lists.omniti.com; cks at cs.toronto.edu
Subject: Re: [OmniOS-discuss] The ixgbe driver, Lindsay Lohan, and the Greek economy

> Chris, thanks for your specific details. I'd appreciate it if you
> could tell me which copper NIC you tried, as well as to pass on the
> iSCSI tuning parameters.

Our copper NIC experience is with onboard X540-AT2 ports on SuperMicro hardware (which have the guaranteed 10-20 msec lock hold) and dual-port 82599EB TN cards (which have some sort of driver/hardware failure under load that eventually leads to 2-second lock holds). I can't recommend either with the current driver; we had to revert to 1G networking in order to get stable servers.

The iSCSI parameter modifications we do, across both initiators and targets, are:

  initialr2t         no
  firstburstlength   128k
  maxrecvdataseglen  128k   [only on Linux backends]
  maxxmitdataseglen  128k   [only on Linux backends]

The OmniOS initiator doesn't need tuning for more than the first two parameters; on the Linux backends we tune up all four. My extended thoughts on these tuning parameters and why we touch them can be found here:

http://utcc.utoronto.ca/~cks/space/blog/tech/UnderstandingiSCSIProtocol
http://utcc.utoronto.ca/~cks/space/blog/tech/LikelyISCSITuning

The short version is that these parameters probably only make a small difference but their overall goal is to do 128KB ZFS reads and writes in single iSCSI operations (although they will be fragmented at the TCP layer) and to do iSCSI writes without a back-and-forth delay between initiator and target (that's 'initialr2t no').

I think basically everyone should use InitialR2T set to no and in fact that it should be the software default. These days only unusually limited iSCSI targets should need it to be otherwise and they can change their setting for it (initiator and target must both agree to it being 'yes', so either can veto it).

- cks

On Mon, Feb 23, 2015 at 8:21 AM, Joerg Goltermann wrote:
> Hi,
>
> I think your problem is caused by your link properties or your
> switch settings. In general the standard ixgbe seems to perform
> well.
>
> I had trouble after changing the default flow control settings to "bi"
> and this was my motivation to update the ixgbe driver a long time ago.
> After I have updated our systems to ixgbe 2.5.8 I never had any
> problems ....
>
> Make sure your switch has support for jumbo frames and you use
> the same mtu on all ports, otherwise the smallest will be used.
>
> What switch do you use? I can tell you nice horror stories about
> different vendors....
>
> - Joerg
>
> On 23.02.2015 10:31, W Verb wrote:
>
>> Thank you Joerg,
>>
>> I've downloaded the package and will try it tomorrow.
>>
>> The only thing I can add at this point is that upon review of my
>> testing, I may have performed my "pkg -u" between the initial quad-gig
>> performance test and installing the 10G NIC. So this may be a new
>> problem introduced in the latest updates.
>>
>> Those of you who are running 10G and have not upgraded to the latest
>> kernel, etc, might want to do some additional testing before running the
>> update.
>>
>> -Warren V
>>
>> On Mon, Feb 23, 2015 at 1:15 AM, Joerg Goltermann wrote:
>>
>> Hi,
>>
>> I remember there was a problem with the flow control settings in the ixgbe
>> driver, so I updated it a long time ago for our internal servers to 2.5.8.
>> Last weekend I integrated the latest changes from the FreeBSD driver to bring
>> the illumos ixgbe to 2.5.25 but I had no time to test it, so it's completely
>> untested!
>>
>> If you would like to give the latest driver a try you can fetch the
>> kernel modules from
>> https://cloud.osn.de/index.php/s/Fb4so9RsNnXA7r9
>>
>> Clone your boot environment, place the modules in the new environment
>> and update the boot-archive of the new BE.
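(For anyone following along, Joerg's "clone / place / update" step translates to roughly the following on OmniOS. The BE name and the assumption that the download contains a 64-bit ixgbe module are mine, not Joerg's - adjust the paths to whatever the archive actually holds:)

  beadm create ixgbe-2.5.25                    # clone the current boot environment
  beadm mount ixgbe-2.5.25 /mnt                # mount the clone
  cp ixgbe /mnt/kernel/drv/amd64/ixgbe         # drop in the new module (path assumed)
  bootadm update-archive -R /mnt               # rebuild the clone's boot archive
  beadm unmount ixgbe-2.5.25
  beadm activate ixgbe-2.5.25                  # boot into it on the next reboot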
>> >> - Joerg >> >> >> >> >> >> On 23.02.2015 02:54, W Verb wrote: >> >> By the way, to those of you who have working setups: please send >> me >> your pool/volume settings, interface linkprops, and any kernel >> tuning >> parameters you may have set. >> >> Thanks, >> Warren V >> >> On Sat, Feb 21, 2015 at 7:59 AM, Schweiss, Chip >> > wrote: >> >> I can't say I totally agree with your performance >> assessment. I run Intel >> X520 in all my OmniOS boxes. >> >> Here is a capture of nfssvrtop I made while running many >> storage vMotions >> between two OmniOS boxes hosting NFS datastores. This is a >> 10 host VMware >> cluster. Both OmniOS boxes are dual 10G connected with >> copper twin-ax to >> the in rack Nexus 5010. >> >> VMware does 100% sync writes, I use ZeusRAM SSDs for log >> devices. >> >> -Chip >> >> 2014 Apr 24 08:05:51, load: 12.64, read: 17330243 KB, >> swrite: 15985 KB, >> awrite: 1875455 KB >> >> Ver Client NFSOPS Reads SWrites AWrites >> Commits Rd_bw >> SWr_bw AWr_bw Rd_t SWr_t AWr_t Com_t Align% >> >> 4 10.28.17.105 0 0 0 0 >> 0 0 >> 0 0 0 0 0 0 0 >> >> 4 10.28.17.215 0 0 0 0 >> 0 0 >> 0 0 0 0 0 0 0 >> >> 4 10.28.17.213 0 0 0 0 >> 0 0 >> 0 0 0 0 0 0 0 >> >> 4 10.28.16.151 0 0 0 0 >> 0 0 >> 0 0 0 0 0 0 0 >> >> 4 all 1 0 0 0 >> 0 0 >> 0 0 0 0 0 0 0 >> >> 3 10.28.16.175 3 0 3 0 >> 0 1 >> 11 0 4806 48 0 0 85 >> >> 3 10.28.16.183 6 0 6 0 >> 0 3 >> 162 0 549 124 0 0 73 >> >> 3 10.28.16.180 11 0 10 0 >> 0 3 >> 27 0 776 89 0 0 67 >> >> 3 10.28.16.176 28 2 26 0 >> 0 10 >> 405 0 2572 198 0 0 100 >> >> 3 10.28.16.178 4606 4602 4 0 >> 0 294534 >> 3 0 723 49 0 0 99 >> >> 3 10.28.16.179 4905 4879 26 0 >> 0 312208 >> 311 0 735 271 0 0 99 >> >> 3 10.28.16.181 5515 5502 13 0 >> 0 352107 >> 77 0 89 87 0 0 99 >> >> 3 10.28.16.184 12095 12059 10 0 >> 0 763014 >> 39 0 249 147 0 0 99 >> >> 3 10.28.58.1 15401 6040 116 6354 >> 53 191605 >> 474 202346 192 96 144 83 99 >> >> 3 all 42574 33086 217 >> 6354 53 1913488 >> 1582 202300 348 138 153 105 99 >> >> >> >> >> >> On Fri, Feb 20, 2015 at 11:46 PM, W Verb > > wrote: >> >> >> Hello All, >> >> Thank you for your replies. >> I tried a few things, and found the following: >> >> 1: Disabling hyperthreading support in the BIOS drops >> performance overall >> by a factor of 4. >> 2: Disabling VT support also seems to have some effect, >> although it >> appears to be minor. But this has the amusing side >> effect of fixing the >> hangs I've been experiencing with fast reboot. Probably >> by disabling kvm. >> 3: The performance tests are a bit tricky to quantify >> because of caching >> effects. In fact, I'm not entirely sure what is >> happening here. It's just >> best to describe what I'm seeing: >> >> The commands I'm using to test are >> dd if=/dev/zero of=./test.dd bs=2M count=5000 >> dd of=/dev/null if=./test.dd bs=2M count=5000 >> The host vm is running Centos 6.6, and has the latest >> vmtools installed. >> There is a host cache on an SSD local to the host that >> is also in place. >> Disabling the host cache didn't immediately have an >> effect as far as I could >> see. >> >> The host MTU set to 3000 on all iSCSI interfaces for all >> tests. >> >> Test 1: Right after reboot, with an ixgbe MTU of 9000, >> the write test >> yields an average speed over three tests of 137MB/s. The >> read test yields an >> average over three tests of 5MB/s. >> >> Test 2: After setting "ifconfig ixgbe0 mtu 3000", the >> write tests yield >> 140MB/s, and the read tests yield 53MB/s. 
It's important >> to note here that >> if I cut the read test short at only 2-3GB, I get >> results upwards of >> 350MB/s, which I assume is local cache-related distortion. >> >> Test 3: MTU of 1500. Read tests are up to 156 MB/s. >> Write tests yield >> about 142MB/s. >> Test 4: MTU of 1000: Read test at 182MB/s. >> Test 5: MTU of 900: Read test at 130 MB/s. >> Test 6: MTU of 1000: Read test at 160MB/s. Write tests >> are now >> consistently at about 300MB/s. >> Test 7: MTU of 1200: Read test at 124MB/s. >> Test 8: MTU of 1000: Read test at 161MB/s. Write at >> 261MB/s. >> >> A few final notes: >> L1ARC grabs about 10GB of RAM during the tests, so >> there's definitely some >> read caching going on. >> The write operations are easier to observe with iostat, >> and I'm seeing io >> rates that closely correlate with the network write >> speeds. >> >> >> Chris, thanks for your specific details. I'd appreciate >> it if you could >> tell me which copper NIC you tried, as well as to pass >> on the iSCSI tuning >> parameters. >> >> I've ordered an Intel EXPX9502AFXSR, which uses the >> 82598 chip instead of >> the 82599 in the X520. If I get similar results with my >> fiber transcievers, >> I'll see if I can get a hold of copper ones. >> >> But I should mention that I did indeed look at PHY/MAC >> error rates, and >> they are nil. >> >> -Warren V >> >> On Fri, Feb 20, 2015 at 7:25 PM, Chris Siebenmann >> > >> >> wrote: >> >> >> After installation and configuration, I observed >> all kinds of bad >> behavior >> in the network traffic between the hosts and the >> server. All of this >> bad >> behavior is traced to the ixgbe driver on the >> storage server. Without >> going >> into the full troubleshooting process, here are >> my takeaways: >> >> [...] >> >> For what it's worth, we managed to achieve much >> better line rates on >> copper 10G ixgbe hardware of various descriptions >> between OmniOS >> and CentOS 7 (I don't think we ever tested OmniOS to >> OmniOS). I don't >> believe OmniOS could do TCP at full line rate but I >> think we managed 700+ >> Mbytes/sec on both transmit and receive and we got >> basically disk-limited >> speeds with iSCSI (across multiple disks on >> multi-disk mirrored pools, >> OmniOS iSCSI initiator, Linux iSCSI targets). >> >> I don't believe we did any specific kernel tuning >> (and in fact some of >> our attempts to fiddle ixgbe driver parameters blew >> up in our face). >> We did tune iSCSI connection parameters to increase >> various buffer >> sizes so that ZFS could do even large single >> operations in single iSCSI >> transactions. (More details available if people are >> interested.) >> >> 10: At the wire level, the speed problems are >> clearly due to pauses in >> response time by omnios. At 9000 byte frame >> sizes, I see a good number >> of duplicate ACKs and fast retransmits during >> read operations (when >> omnios is transmitting). But below about a >> 4100-byte MTU on omnios >> (which seems to correlate to 4096-byte iSCSI >> block transfers), the >> transmission errors fade away and we only see >> the transmission pause >> problem. >> >> >> This is what really attracted my attention. In >> our OmniOS setup, our >> specific Intel hardware had ixgbe driver issues that >> could cause >> activity stalls during once-a-second link heartbeat >> checks. This >> obviously had an effect at the TCP and iSCSI layers. 
>> My initial message to illumos-developer sparked a potentially
>> interesting discussion:
>>
>> http://www.listbox.com/member/archive/182179/2014/10/sort/time_rev/page/16/entry/6:405/20141003125035:6357079A-4B1D-11E4-A39C-D534381BA44D/
>>
>> If you think this is a possibility in your setup, I've put the DTrace
>> script I used to hunt for this up on the web:
>>
>> http://www.cs.toronto.edu/~cks/src/omnios-ixgbe/ixgbe_delay.d
>>
>> This isn't the only potential source of driver stalls by any means, it's
>> just the one I found. You may also want to look at lockstat in general,
>> as information it reported is what led us to look specifically at the
>> ixgbe code here.
>>
>> (If you suspect kernel/driver issues, lockstat combined with kernel
>> source is a really excellent resource.)
>>
>> - cks

From garrett at damore.org Mon Mar 2 05:11:16 2015
From: garrett at damore.org (Garrett D'Amore)
Date: Sun, 1 Mar 2015 21:11:16 -0800
Subject: [OmniOS-discuss] [developer] Re: The ixgbe driver, Lindsay Lohan, and the Greek economy
In-Reply-To:
References: <20150221032559.727D07A0792@apps0.cs.toronto.edu> <54EAEFA8.4020101@osn.de> <54EB5392.6030900@osn.de>
Message-ID: <279C084B-E76B-4F38-A7CF-D6CB37D06D8F@damore.org>

I'm not sure I've followed properly. You have *two* interfaces. You are not trying to provision these in an aggr are you? As far as I'm aware, VMware does not support 802.3ad link aggregations. (It's possible that you can make it work with ESXi if you give the entire NIC to the guest - but I'm skeptical.) The problem is that if you try to use link aggregation, some packets (up to half!) will be lost. TCP and other protocols fare poorly in this situation.

It's possible I've totally misunderstood what you're trying to do, in which case I apologize.

The idle thing is a red herring - the CPU is waiting for work to do, probably because packets haven't arrived (or were dropped by the hypervisor!). I wouldn't read too much into that except that your network stack is in trouble. I'd look a bit more closely at the kstats for tcp - I suspect you'll see retransmits or out-of-order values that are unusually high - if so, this may help validate my theory above.

- Garrett
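(On OmniOS those counters live in the mib2 tcp kstats; a quick way to check them while a slow read is running is something like the following - statistic names as they appear on a stock illumos install:)

  # one-shot summary of retransmit / out-of-order counters
  netstat -s -P tcp | egrep 'RetransSegs|InDupAck|UnorderSegs'

  # or watch the raw kstats tick every 5 seconds during a transfer
  kstat -p tcp::tcp:tcpRetransSegs tcp::tcp:tcpInDataUnorderSegs 5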
> On Mar 1, 2015, at 9:03 PM, W Verb via illumos-developer wrote:
>
> [...]
From mark0x01 at gmail.com Mon Mar 2 08:12:12 2015
From: mark0x01 at gmail.com (Mark)
Date: Mon, 02 Mar 2015 21:12:12 +1300
Subject: [OmniOS-discuss] [developer] Re: The ixgbe driver, Lindsay Lohan, and the Greek economy
In-Reply-To: <279C084B-E76B-4F38-A7CF-D6CB37D06D8F@damore.org>
References: <20150221032559.727D07A0792@apps0.cs.toronto.edu> <54EAEFA8.4020101@osn.de> <54EB5392.6030900@osn.de> <279C084B-E76B-4F38-A7CF-D6CB37D06D8F@damore.org>
Message-ID: <54F41B5C.8070108@gmail.com>

LACP does work - I have used it on HP ProCurve, but the settings are fussy and usually different from what EtherChannel uses.
(http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=1004048)

Did you try changing the virtual switch settings?
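(If it helps, the ESXi-side settings can be checked from the CLI with something like the following - the vSwitch and vmhba names below are placeholders, yours will differ:)

  # current load-balancing / failover policy on the iSCSI vSwitch
  esxcli network vswitch standard policy failover get -v vSwitch1

  # confirm each iSCSI vmkernel port is bound to the software initiator
  esxcli iscsi networkportal list -A vmhba33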
On 2/03/2015 6:11 p.m., Garrett D'Amore wrote:
> [...]
From wverb73 at gmail.com Mon Mar 2 08:22:26 2015
From: wverb73 at gmail.com (W Verb)
Date: Mon, 2 Mar 2015 00:22:26 -0800
Subject: [OmniOS-discuss] [developer] Re: The ixgbe driver, Lindsay Lohan, and the Greek economy
In-Reply-To: <279C084B-E76B-4F38-A7CF-D6CB37D06D8F@damore.org>
References: <20150221032559.727D07A0792@apps0.cs.toronto.edu> <54EAEFA8.4020101@osn.de> <54EB5392.6030900@osn.de> <279C084B-E76B-4F38-A7CF-D6CB37D06D8F@damore.org>
Message-ID:

Hello Garrett,

No, no 802.3ad going on in this config. Here is a basic schematic:

https://drive.google.com/file/d/0BwyUMjibonYQVkVqcE5OQUJyUUU/view?usp=sharing

Here is the Nexenta MPIO iSCSI Setup Document that I used as a guide:

https://drive.google.com/file/d/0BwyUMjibonYQbjEyUTBjN2tTNWM/view?usp=sharing

Note that I am using an MTU of 3000 on both the 10G and 1G NICs. The switch is set to allow 9148-byte frames, and I'm not seeing any errors/buffer overruns on the switch.

Here is a screenshot of a packet capture from a read operation on the guest OS (from its local drive, which is actually a VMDK file on the storage server). In this example, only a single 1G ESXi kernel interface (vmk1) is bound to the software iSCSI initiator.
https://drive.google.com/file/d/0BwyUMjibonYQa2NYdXhpZkpkbU0/view?usp=sharing Note that there's a nice, well-behaved window sizing process taking place. The ESXi decreases the scaled window by 11 or 12 for each ACK, then bumps it back up to 512. Here is a similar screenshot of a single-interface write operation: https://drive.google.com/file/d/0BwyUMjibonYQbU1RZHRnakxDSFU/view?usp=sharing There are no pauses or gaps in the transmission rate in the single-interface transfers. In the next screenshots, I have enabled an additional 1G interface on the ESXi host, and bound it to the iSCSI initiator. The new interface is bound to a separate physical port, uses a different VLAN on the switch, and talks to a different 10G port on the storage server. First, let's look at a write operation on the guest OS, which happily pumps data at near-line-rate to the storage server. Here is a sequence number trace diagram. Note how the transfer has a nice, smooth increment rate over the entire transfer. https://drive.google.com/file/d/0BwyUMjibonYQWHNIa0drWnNxMmM/view?usp=sharing Here are screenshots from packet captures on both 1G interfaces: https://drive.google.com/file/d/0BwyUMjibonYQRWhyVVQ4djNaU3c/view?usp=sharing https://drive.google.com/file/d/0BwyUMjibonYQaTVjTEtTRloyR2c/view?usp=sharing Note how we again see nice, smooth window adjustment, and no gaps in transmission. But now, let's look at the problematic two-interface Read operation. First, the sequence graph: https://drive.google.com/file/d/0BwyUMjibonYQTzdFVWdQMWZ6LUU/view?usp=sharing As you can see, there are gaps and jumps in the transmission throughout the transfer. It is very illustrative to look at captures of the gaps, which are occurring on both interfaces: https://drive.google.com/file/d/0BwyUMjibonYQc0VISXN6eVFwQzg/view?usp=sharing https://drive.google.com/file/d/0BwyUMjibonYQVFREUHp3TGFiUU0/view?usp=sharing As you can see, there are ~.4 second pauses in transmission from the storage server, which kills the transfer rate. It's clear that the ESXi box ACKs the prior iSCSI operation to completion, then makes a new LUN request, which the storage server immediately replies to. The ESXi ACKs the response packet from the storage server, then waits...and waits....and waits... until eventually the storage server starts transmitting again. Because the pause happens while the ESXi client is waiting for a packet from the storage server, that tells me that the gaps are not an artifact of traffic being switched between both active interfaces, but are actually indicative of short hangs occurring on the server. Having a pause or two in transmission is no big deal, but in my case, it is happening constantly, and dropping my overall read transfer rate down to 20-60MB/s, which is slower than the single interface transfer rate (~90-100MB/s). Decreasing the MTU makes the pauses shorter, increasing them makes the pauses longer. Another interesting thing is that if I set the multipath io interval to 3 operations instead of 1, I get better throughput. In other words, the less frequently I swap IP addresses on my iSCSI requests from the ESXi unit, the fewer pauses I see. Basically, COMSTAR seems to choke each time an iSCSI request from a new IP arrives. Because the single interface transfer is near line rate, that tells me that the storage system (mpt_sas, zfs, etc) is working fine. It's only when multiple paths are attempted that iSCSI falls on its face during reads. 
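(For reference, the IO interval I mention is ESXi's round-robin IOPS setting on the path selection policy; it can be inspected and changed per device along these lines - the naa ID below is a placeholder for the actual LUN:)

  # list devices and their current path selection policy configuration
  esxcli storage nmp device list

  # change from 1 I/O per path to 3 I/Os per path before switching paths
  esxcli storage nmp psp roundrobin deviceconfig set -d naa.600144f0xxxxxxxx --type=iops --iops=3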
All of these captures were taken without a cache device being attached to the storage zpool, so this isn't looking like some kind of ZFS ARC problem. As mentioned previously, local transfers to/from the zpool are showing ~300-500 MB/s rates over long transfers (10G+).

-Warren V
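(For anyone who wants to reproduce the local-throughput check, the same dd commands run directly on the server while watching the pool will do it; "tank" below is a stand-in for the real pool name:)

  # large sequential write straight to the pool...
  dd if=/dev/zero of=/tank/test.dd bs=2M count=5000

  # ...while watching per-vdev throughput from another shell
  zpool iostat -v tank 5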
> > Here is a flamegraph taken while iSCSI is slowly dying: > > > https://drive.google.com/file/d/0BwyUMjibonYQM21JeFZPX3dZWTg/view?usp=sharing > > > I edited out cpu_idle_adaptive from the dtrace output and regenerated the > slowdown graph: > > > https://drive.google.com/file/d/0BwyUMjibonYQbTVwV3NvXzlPS1E/view?usp=sharing > > > I then edited cpu_idle_adaptive out of the speedy write operation and > regenerated that graph: > > > https://drive.google.com/file/d/0BwyUMjibonYQeWFYM0pCMDZ1X2s/view?usp=sharing > > > I have zero experience with interpreting flamegraphs, but the most > significant difference I see between the slow read example and the fast > write example is in unix`thread_start --> unix`idle. There's a good chunk > of "unix`i86_mwait" in the read example that is not present in the write > example at all. > > Disabling the l2arc cache device didn't make a difference, and I had to > reenable EIST support on the CPU to get my VMs to boot. > > I am seeing a variety of bug reports going back to 2010 regarding > excessive mwait operations, with the suggested solutions usually being to > set "cpupm enable poll-mode" in power.conf. That change also had no effect > on speed. > > -Warren V > > > > > -----Original Message----- > > From: Chris Siebenmann [mailto:cks at cs.toronto.edu ] > > Sent: Monday, February 23, 2015 8:30 AM > > To: W Verb > > Cc: omnios-discuss at lists.omniti.com; cks at cs.toronto.edu > > Subject: Re: [OmniOS-discuss] The ixgbe driver, Lindsay Lohan, and the > Greek economy > > > > Chris, thanks for your specific details. I'd appreciate it if you > > > could tell me which copper NIC you tried, as well as to pass on the > > > iSCSI tuning parameters. > > > Our copper NIC experience is with onboard X540-AT2 ports on SuperMicro > hardware (which have the guaranteed 10-20 msec lock hold) and dual-port > 82599EB TN cards (which have some sort of driver/hardware failure under > load that eventually leads to 2-second lock holds). I can't recommend > either with the current driver; we had to revert to 1G networking in order > to get stable servers. > > > The iSCSI parameter modifications we do, across both initiators and > targets, are: > > > initialr2t no > > firstburstlength 128k > > maxrecvdataseglen 128k [only on Linux backends] > > maxxmitdataseglen 128k [only on Linux backends] > > > The OmniOS initiator doesn't need tuning for more than the first two > parameters; on the Linux backends we tune up all four. My extended thoughts > on these tuning parameters and why we touch them can be found > > here: > > > http://utcc.utoronto.ca/~cks/space/blog/tech/UnderstandingiSCSIProtocol > > http://utcc.utoronto.ca/~cks/space/blog/tech/LikelyISCSITuning > > > The short version is that these parameters probably only make a small > difference but their overall goal is to do 128KB ZFS reads and writes in > single iSCSI operations (although they will be fragmented at the TCP > > layer) and to do iSCSI writes without a back-and-forth delay between > initiator and target (that's 'initialr2t no'). > > > I think basically everyone should use InitialR2T set to no and in fact > that it should be the software default. These days only unusually limited > iSCSI targets should need it to be otherwise and they can change their > setting for it (initiator and target must both agree to it being 'yes', so > either can veto it). 
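Returning to the flamegraphs linked earlier in this message: they were presumably produced with the usual DTrace profile-provider recipe. A minimal sketch, assuming Brendan Gregg's stackcollapse.pl and flamegraph.pl scripts are on hand:

  # sample on-CPU kernel stacks at 997 Hz for 30 seconds
  dtrace -x stackframes=100 -n 'profile-997 /arg0/ { @[stack()] = count(); } tick-30s { exit(0); }' -o kernel.stacks

  # fold the stacks and render the SVG
  stackcollapse.pl kernel.stacks > kernel.folded
  flamegraph.pl kernel.folded > kernel.svg

  # the "edited out cpu_idle_adaptive" variant is just a filter on the folded file
  grep -v cpu_idle_adaptive kernel.folded | flamegraph.pl > kernel-noidle.svg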
> > > - cks > > > On Mon, Feb 23, 2015 at 8:21 AM, Joerg Goltermann wrote: > >> Hi, >> >> I think your problem is caused by your link properties or your >> switch settings. In general the standard ixgbe seems to perform >> well. >> >> I had trouble after changing the default flow control settings to "bi" >> and this was my motivation to update the ixgbe driver a long time ago. >> After I have updated our systems to ixgbe 2.5.8 I never had any >> problems .... >> >> Make sure your switch has support for jumbo frames and you use >> the same mtu on all ports, otherwise the smallest will be used. >> >> What switch do you use? I can tell you nice horror stories about >> different vendors.... >> >> - Joerg >> >> On 23.02.2015 10:31, W Verb wrote: >> >>> Thank you Joerg, >>> >>> I've downloaded the package and will try it tomorrow. >>> >>> The only thing I can add at this point is that upon review of my >>> testing, I may have performed my "pkg -u" between the initial quad-gig >>> performance test and installing the 10G NIC. So this may be a new >>> problem introduced in the latest updates. >>> >>> Those of you who are running 10G and have not upgraded to the latest >>> kernel, etc, might want to do some additional testing before running the >>> update. >>> >>> -Warren V >>> >>> On Mon, Feb 23, 2015 at 1:15 AM, Joerg Goltermann >> > wrote: >>> >>> Hi, >>> >>> I remember there was a problem with the flow control settings in the >>> ixgbe >>> driver, so I updated it a long time ago for our internal servers to >>> 2.5.8. >>> Last weekend I integrated the latest changes from the FreeBSD driver >>> to bring >>> the illumos ixgbe to 2.5.25 but I had no time to test it, so it's >>> completely >>> untested! >>> >>> >>> If you would like to give the latest driver a try you can fetch the >>> kernel modules from >>> https://cloud.osn.de/index.__php/s/Fb4so9RsNnXA7r9 >>> >>> >>> Clone your boot environment, place the modules in the new environment >>> and update the boot-archive of the new BE. >>> >>> - Joerg >>> >>> >>> >>> >>> >>> On 23.02.2015 02:54, W Verb wrote: >>> >>> By the way, to those of you who have working setups: please send >>> me >>> your pool/volume settings, interface linkprops, and any kernel >>> tuning >>> parameters you may have set. >>> >>> Thanks, >>> Warren V >>> >>> On Sat, Feb 21, 2015 at 7:59 AM, Schweiss, Chip >>> > wrote: >>> >>> I can't say I totally agree with your performance >>> assessment. I run Intel >>> X520 in all my OmniOS boxes. >>> >>> Here is a capture of nfssvrtop I made while running many >>> storage vMotions >>> between two OmniOS boxes hosting NFS datastores. This is a >>> 10 host VMware >>> cluster. Both OmniOS boxes are dual 10G connected with >>> copper twin-ax to >>> the in rack Nexus 5010. >>> >>> VMware does 100% sync writes, I use ZeusRAM SSDs for log >>> devices. 
>>> >>> -Chip >>> >>> 2014 Apr 24 08:05:51, load: 12.64, read: 17330243 KB, >>> swrite: 15985 KB, >>> awrite: 1875455 KB >>> >>> Ver Client NFSOPS Reads SWrites AWrites >>> Commits Rd_bw >>> SWr_bw AWr_bw Rd_t SWr_t AWr_t Com_t Align% >>> >>> 4 10.28.17.105 0 0 0 0 >>> 0 0 >>> 0 0 0 0 0 0 0 >>> >>> 4 10.28.17.215 0 0 0 0 >>> 0 0 >>> 0 0 0 0 0 0 0 >>> >>> 4 10.28.17.213 0 0 0 0 >>> 0 0 >>> 0 0 0 0 0 0 0 >>> >>> 4 10.28.16.151 0 0 0 0 >>> 0 0 >>> 0 0 0 0 0 0 0 >>> >>> 4 all 1 0 0 0 >>> 0 0 >>> 0 0 0 0 0 0 0 >>> >>> 3 10.28.16.175 3 0 3 0 >>> 0 1 >>> 11 0 4806 48 0 0 85 >>> >>> 3 10.28.16.183 6 0 6 0 >>> 0 3 >>> 162 0 549 124 0 0 73 >>> >>> 3 10.28.16.180 11 0 10 0 >>> 0 3 >>> 27 0 776 89 0 0 67 >>> >>> 3 10.28.16.176 28 2 26 0 >>> 0 10 >>> 405 0 2572 198 0 0 100 >>> >>> 3 10.28.16.178 4606 4602 4 0 >>> 0 294534 >>> 3 0 723 49 0 0 99 >>> >>> 3 10.28.16.179 4905 4879 26 0 >>> 0 312208 >>> 311 0 735 271 0 0 99 >>> >>> 3 10.28.16.181 5515 5502 13 0 >>> 0 352107 >>> 77 0 89 87 0 0 99 >>> >>> 3 10.28.16.184 12095 12059 10 0 >>> 0 763014 >>> 39 0 249 147 0 0 99 >>> >>> 3 10.28.58.1 15401 6040 116 6354 >>> 53 191605 >>> 474 202346 192 96 144 83 99 >>> >>> 3 all 42574 33086 217 >>> 6354 53 1913488 >>> 1582 202300 348 138 153 105 99 >>> >>> >>> >>> >>> >>> On Fri, Feb 20, 2015 at 11:46 PM, W Verb >> > wrote: >>> >>> >>> Hello All, >>> >>> Thank you for your replies. >>> I tried a few things, and found the following: >>> >>> 1: Disabling hyperthreading support in the BIOS drops >>> performance overall >>> by a factor of 4. >>> 2: Disabling VT support also seems to have some effect, >>> although it >>> appears to be minor. But this has the amusing side >>> effect of fixing the >>> hangs I've been experiencing with fast reboot. Probably >>> by disabling kvm. >>> 3: The performance tests are a bit tricky to quantify >>> because of caching >>> effects. In fact, I'm not entirely sure what is >>> happening here. It's just >>> best to describe what I'm seeing: >>> >>> The commands I'm using to test are >>> dd if=/dev/zero of=./test.dd bs=2M count=5000 >>> dd of=/dev/null if=./test.dd bs=2M count=5000 >>> The host vm is running Centos 6.6, and has the latest >>> vmtools installed. >>> There is a host cache on an SSD local to the host that >>> is also in place. >>> Disabling the host cache didn't immediately have an >>> effect as far as I could >>> see. >>> >>> The host MTU set to 3000 on all iSCSI interfaces for all >>> tests. >>> >>> Test 1: Right after reboot, with an ixgbe MTU of 9000, >>> the write test >>> yields an average speed over three tests of 137MB/s. The >>> read test yields an >>> average over three tests of 5MB/s. >>> >>> Test 2: After setting "ifconfig ixgbe0 mtu 3000", the >>> write tests yield >>> 140MB/s, and the read tests yield 53MB/s. It's important >>> to note here that >>> if I cut the read test short at only 2-3GB, I get >>> results upwards of >>> 350MB/s, which I assume is local cache-related >>> distortion. >>> >>> Test 3: MTU of 1500. Read tests are up to 156 MB/s. >>> Write tests yield >>> about 142MB/s. >>> Test 4: MTU of 1000: Read test at 182MB/s. >>> Test 5: MTU of 900: Read test at 130 MB/s. >>> Test 6: MTU of 1000: Read test at 160MB/s. Write tests >>> are now >>> consistently at about 300MB/s. >>> Test 7: MTU of 1200: Read test at 124MB/s. >>> Test 8: MTU of 1000: Read test at 161MB/s. Write at >>> 261MB/s. >>> >>> A few final notes: >>> L1ARC grabs about 10GB of RAM during the tests, so >>> there's definitely some >>> read caching going on. 
>>> The write operations are easier to observe with iostat, >>> and I'm seeing io >>> rates that closely correlate with the network write >>> speeds. >>> >>> >>> Chris, thanks for your specific details. I'd appreciate >>> it if you could >>> tell me which copper NIC you tried, as well as to pass >>> on the iSCSI tuning >>> parameters. >>> >>> I've ordered an Intel EXPX9502AFXSR, which uses the >>> 82598 chip instead of >>> the 82599 in the X520. If I get similar results with my >>> fiber transcievers, >>> I'll see if I can get a hold of copper ones. >>> >>> But I should mention that I did indeed look at PHY/MAC >>> error rates, and >>> they are nil. >>> >>> -Warren V >>> >>> On Fri, Feb 20, 2015 at 7:25 PM, Chris Siebenmann >>> > >>> >>> wrote: >>> >>> >>> After installation and configuration, I observed >>> all kinds of bad >>> behavior >>> in the network traffic between the hosts and the >>> server. All of this >>> bad >>> behavior is traced to the ixgbe driver on the >>> storage server. Without >>> going >>> into the full troubleshooting process, here are >>> my takeaways: >>> >>> [...] >>> >>> For what it's worth, we managed to achieve much >>> better line rates on >>> copper 10G ixgbe hardware of various descriptions >>> between OmniOS >>> and CentOS 7 (I don't think we ever tested OmniOS to >>> OmniOS). I don't >>> believe OmniOS could do TCP at full line rate but I >>> think we managed 700+ >>> Mbytes/sec on both transmit and receive and we got >>> basically disk-limited >>> speeds with iSCSI (across multiple disks on >>> multi-disk mirrored pools, >>> OmniOS iSCSI initiator, Linux iSCSI targets). >>> >>> I don't believe we did any specific kernel tuning >>> (and in fact some of >>> our attempts to fiddle ixgbe driver parameters blew >>> up in our face). >>> We did tune iSCSI connection parameters to increase >>> various buffer >>> sizes so that ZFS could do even large single >>> operations in single iSCSI >>> transactions. (More details available if people are >>> interested.) >>> >>> 10: At the wire level, the speed problems are >>> clearly due to pauses in >>> response time by omnios. At 9000 byte frame >>> sizes, I see a good number >>> of duplicate ACKs and fast retransmits during >>> read operations (when >>> omnios is transmitting). But below about a >>> 4100-byte MTU on omnios >>> (which seems to correlate to 4096-byte iSCSI >>> block transfers), the >>> transmission errors fade away and we only see >>> the transmission pause >>> problem. >>> >>> >>> This is what really attracted my attention. In >>> our OmniOS setup, our >>> specific Intel hardware had ixgbe driver issues that >>> could cause >>> activity stalls during once-a-second link heartbeat >>> checks. This >>> obviously had an effect at the TCP and iSCSI layers. >>> My initial message >>> to illumos-developer sparked a potentially >>> interesting discussion: >>> >>> >>> http://www.listbox.com/member/ >>> __archive/182179/2014/10/sort/__time_rev/page/16/entry/6: >>> 405/__20141003125035:6357079A-4B1D-__11E4-A39C-D534381BA44D/ >>> >> member/archive/182179/2014/10/sort/time_rev/page/16/entry/6: >>> 405/20141003125035:6357079A-4B1D-11E4-A39C-D534381BA44D/> >>> >>> If you think this is a possibility in your setup, >>> I've put the DTrace >>> script I used to hunt for this up on the web: >>> >>> http://www.cs.toronto.edu/~__ >>> cks/src/omnios-ixgbe/ixgbe___delay.d >>> >> cks/src/omnios-ixgbe/ixgbe_delay.d> >>> >>> This isn't the only potential source of driver >>> stalls by any means, it's >>> just the one I found. 
You may also want to look at >>> lockstat in general, >>> as information it reported is what led us to look >>> specifically at the >>> ixgbe code here. >>> >>> (If you suspect kernel/driver issues, lockstat >>> combined with kernel >>> source is a really excellent resource.) >>> >>> - cks >>> >>> >>> >>> >>> _________________________________________________ >>> OmniOS-discuss mailing list >>> OmniOS-discuss at lists.omniti.__com >>> >>> http://lists.omniti.com/__mailman/listinfo/omnios-__ >>> discuss >>> >> > >>> >>> >>> _________________________________________________ >>> OmniOS-discuss mailing list >>> OmniOS-discuss at lists.omniti.__com >>> >>> http://lists.omniti.com/__mailman/listinfo/omnios-__discuss >>> >>> >>> >>> -- >>> OSN Online Service Nuernberg GmbH, Bucher Str. 78, 90408 Nuernberg >>> Tel: +49 911 39905-0 - Fax: +49 911 >>> 39905-55 - http://www.osn.de >>> HRB 15022 Nuernberg, USt-Id: DE189301263, GF: Joerg Goltermann >>> >>> >>> >> -- >> OSN Online Service Nuernberg GmbH, Bucher Str. 78, 90408 Nuernberg >> Tel: +49 911 39905-0 - Fax: +49 911 39905-55 - http://www.osn.de >> HRB 15022 Nuernberg, USt-Id: DE189301263, GF: Joerg Goltermann >> > > *illumos-developer* | Archives > > | > Modify > > Your Subscription > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From jg at osn.de Mon Mar 2 11:14:10 2015 From: jg at osn.de (Joerg Goltermann) Date: Mon, 02 Mar 2015 12:14:10 +0100 Subject: [OmniOS-discuss] [developer] Re: The ixgbe driver, Lindsay Lohan, and the Greek economy In-Reply-To: References: <20150221032559.727D07A0792@apps0.cs.toronto.edu> <54EAEFA8.4020101@osn.de> <54EB5392.6030900@osn.de> <279C084B-E76B-4F38-A7CF-D6CB37D06D8F@damore.org> Message-ID: <54F44602.5030705@osn.de> Hi, I would try *one* TPG which includes both interface addresses and I would double check for packet drops on the Catalyst. The 3560 supports only receive flow control which means, that a sending 10Gbit port can easily overload a 1Gbit port. Do you have flow control enabled? - Joerg On 02.03.2015 09:22, W Verb via illumos-developer wrote: > Hello Garrett, > > No, no 802.3ad going on in this config. > > Here is a basic schematic: > > https://drive.google.com/file/d/0BwyUMjibonYQVkVqcE5OQUJyUUU/view?usp=sharing > > Here is the Nexenta MPIO iSCSI Setup Document that I used as a guide: > > https://drive.google.com/file/d/0BwyUMjibonYQbjEyUTBjN2tTNWM/view?usp=sharing > > Note that I am using an MTU of 3000 on both the 10G and 1G NICs. The > switch is set to allow 9148-byte frames, and I'm not seeing any > errors/buffer overruns on the switch. > > Here is a screenshot of a packet capture from a read operation on the > guest OS (from it's local drive, which is actually a VMDK file on the > storage server). In this example, only a single 1G ESXi kernel interface > (vmk1) is bound to the software iSCSI initiator. > > https://drive.google.com/file/d/0BwyUMjibonYQa2NYdXhpZkpkbU0/view?usp=sharing > > Note that there's a nice, well-behaved window sizing process taking > place. The ESXi decreases the scaled window by 11 or 12 for each ACK, > then bumps it back up to 512. > > Here is a similar screenshot of a single-interface write operation: > > https://drive.google.com/file/d/0BwyUMjibonYQbU1RZHRnakxDSFU/view?usp=sharing > > There are no pauses or gaps in the transmission rate in the > single-interface transfers. > > > In the next screenshots, I have enabled an additional 1G interface on > the ESXi host, and bound it to the iSCSI initiator. 
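Joerg's two suggestions above translate into roughly the following checks on the OmniOS box (a sketch; the TPG name, addresses and IQN are placeholders, and the itadm syntax should be confirmed against itadm(1M)):

  # collapse to a single target portal group carrying both 10G addresses
  itadm create-tpg tpg-both 10.10.1.10 10.10.2.10
  itadm modify-target -t tpg-both iqn.2010-09.org.example:target0
  itadm list-target -v

  # check what flow control the ixgbe links actually negotiated
  dladm show-linkprop -p flowctrl

On the Catalyst itself, the show flowcontrol output and the output-drop counters on the two 1G ports are the numbers that would show whether the 10G-to-1G speed mismatch is being paused or silently dropped.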
The new interface is > bound to a separate physical port, uses a different VLAN on the switch, > and talks to a different 10G port on the storage server. > > First, let's look at a write operation on the guest OS, which happily > pumps data at near-line-rate to the storage server. > > Here is a sequence number trace diagram. Note how the transfer has a > nice, smooth increment rate over the entire transfer. > > https://drive.google.com/file/d/0BwyUMjibonYQWHNIa0drWnNxMmM/view?usp=sharing > > Here are screenshots from packet captures on both 1G interfaces: > > https://drive.google.com/file/d/0BwyUMjibonYQRWhyVVQ4djNaU3c/view?usp=sharing > https://drive.google.com/file/d/0BwyUMjibonYQaTVjTEtTRloyR2c/view?usp=sharing > > Note how we again see nice, smooth window adjustment, and no gaps in > transmission. > > > But now, let's look at the problematic two-interface Read operation. > First, the sequence graph: > > https://drive.google.com/file/d/0BwyUMjibonYQTzdFVWdQMWZ6LUU/view?usp=sharing > > As you can see, there are gaps and jumps in the transmission throughout > the transfer. > It is very illustrative to look at captures of the gaps, which are > occurring on both interfaces: > > https://drive.google.com/file/d/0BwyUMjibonYQc0VISXN6eVFwQzg/view?usp=sharing > https://drive.google.com/file/d/0BwyUMjibonYQVFREUHp3TGFiUU0/view?usp=sharing > > As you can see, there are ~.4 second pauses in transmission from the > storage server, which kills the transfer rate. > It's clear that the ESXi box ACKs the prior iSCSI operation to > completion, then makes a new LUN request, which the storage server > immediately replies to. The ESXi ACKs the response packet from the > storage server, then waits...and waits....and waits... until eventually > the storage server starts transmitting again. > > Because the pause happens while the ESXi client is waiting for a packet > from the storage server, that tells me that the gaps are not an artifact > of traffic being switched between both active interfaces, but are > actually indicative of short hangs occurring on the server. > > Having a pause or two in transmission is no big deal, but in my case, it > is happening constantly, and dropping my overall read transfer rate down > to 20-60MB/s, which is slower than the single interface transfer rate > (~90-100MB/s). > > Decreasing the MTU makes the pauses shorter, increasing them makes the > pauses longer. > > Another interesting thing is that if I set the multipath io interval to > 3 operations instead of 1, I get better throughput. In other words, the > less frequently I swap IP addresses on my iSCSI requests from the ESXi > unit, the fewer pauses I see. > > Basically, COMSTAR seems to choke each time an iSCSI request from a new > IP arrives. > > Because the single interface transfer is near line rate, that tells me > that the storage system (mpt_sas, zfs, etc) is working fine. It's only > when multiple paths are attempted that iSCSI falls on its face during reads. > > All of these captures were taken without a cache device being attached > to the storage zpool, so this isn't looking like some kind of ZFS ARC > problem. As mentioned previously, local transfers to/from the zpool are > showing ~300-500 MB/s rates over long transfers (10G+). > > -Warren V > > On Sun, Mar 1, 2015 at 9:11 PM, Garrett D'Amore > wrote: > > I?m not sure I?ve followed properly. You have *two* interfaces. > You are not trying to provision these in an aggr are you? As far as > I?m aware, VMware does not support 802.3ad link aggregations. 
(Its > possible that you can make it work with ESXi if you give the entire > NIC to the guest ? but I?m skeptical.) The problem is that if you > try to use link aggregation, some packets (up to half!) will be > lost. TCP and other protocols fare poorly in this situation. > > Its possible I?ve totally misunderstood what you?re trying to do, in > which case I apologize. > > The idle thing is a red-herring ? the cpu is waiting for work to do, > probably because packets haven?t arrived (or where dropped by the > hypervisor!) I wouldn?t read too much into that except that your > network stack is in trouble. I?d look a bit more closely at the > kstats for tcp ? I suspect you?ll see retransmits or out of order > values that are unusually high ? if so this may help validate my > theory above. > > - Garrett > >> On Mar 1, 2015, at 9:03 PM, W Verb via illumos-developer >> > >> wrote: >> >> Hello all, >> >> >> Well, I no longer blame the ixgbe driver for the problems I'm seeing. >> >> >> I tried Joerg's updated driver, which didn't improve the issue. So >> I went back to the drawing board and rebuilt the server from scratch. >> >> What I noted is that if I have only a single 1-gig physical >> interface active on the ESXi host, everything works as expected. >> As soon as I enable two interfaces, I start seeing the performance >> problems I've described. >> >> Response pauses from the server that I see in TCPdumps are still >> leading me to believe the problem is delay on the server side, so >> I ran a series of kernel dtraces and produced some flamegraphs. >> >> >> This was taken during a read operation with two active 10G >> interfaces on the server, with a single target being shared by two >> tpgs- one tpg for each 10G physical port. The host device has two >> 1G ports enabled, with VLANs separating the active ports into >> 10G/1G pairs. ESXi is set to multipath using both VLANS with a >> round-robin IO interval of 1. >> >> https://drive.google.com/file/d/0BwyUMjibonYQd3ZYOGh4d2pteGs/view?usp=sharing >> >> >> This was taken during a write operation: >> >> https://drive.google.com/file/d/0BwyUMjibonYQMnBtU1Q2SXM2ams/view?usp=sharing >> >> >> I then rebooted the server and disabled C-State, ACPI T-State, and >> general EIST (Turbo boost) functionality in the CPU. >> >> I when I attempted to boot my guest VM, the iSCSI transfer >> gradually ground to a halt during the boot loading process, and >> the guest OS never did complete its boot process. >> >> Here is a flamegraph taken while iSCSI is slowly dying: >> >> https://drive.google.com/file/d/0BwyUMjibonYQM21JeFZPX3dZWTg/view?usp=sharing >> >> >> I edited out cpu_idle_adaptive from the dtrace output and >> regenerated the slowdown graph: >> >> https://drive.google.com/file/d/0BwyUMjibonYQbTVwV3NvXzlPS1E/view?usp=sharing >> >> >> I then edited cpu_idle_adaptive out of the speedy write operation >> and regenerated that graph: >> >> https://drive.google.com/file/d/0BwyUMjibonYQeWFYM0pCMDZ1X2s/view?usp=sharing >> >> >> I have zero experience with interpreting flamegraphs, but the most >> significant difference I see between the slow read example and the >> fast write example is in unix`thread_start --> unix`idle. There's >> a good chunk of "unix`i86_mwait" in the read example that is not >> present in the write example at all. >> >> Disabling the l2arc cache device didn't make a difference, and I >> had to reenable EIST support on the CPU to get my VMs to boot. 
>> >> I am seeing a variety of bug reports going back to 2010 regarding >> excessive mwait operations, with the suggested solutions usually >> being to set "cpupm enable poll-mode" in power.conf. That change >> also had no effect on speed. >> >> -Warren V >> >> >> >> >> -----Original Message----- >> >> From: Chris Siebenmann [mailto:cks at cs.toronto.edu] >> >> Sent: Monday, February 23, 2015 8:30 AM >> >> To: W Verb >> >> Cc: omnios-discuss at lists.omniti.com >> ; cks at cs.toronto.edu >> >> >> Subject: Re: [OmniOS-discuss] The ixgbe driver, Lindsay Lohan, and >> the Greek economy >> >> >> > Chris, thanks for your specific details. I'd appreciate it if you >> >> > could tell me which copper NIC you tried, as well as to pass on the >> >> > iSCSI tuning parameters. >> >> >> Our copper NIC experience is with onboard X540-AT2 ports on >> SuperMicro hardware (which have the guaranteed 10-20 msec lock >> hold) and dual-port 82599EB TN cards (which have some sort of >> driver/hardware failure under load that eventually leads to >> 2-second lock holds). I can't recommend either with the current >> driver; we had to revert to 1G networking in order to get stable >> servers. >> >> >> The iSCSI parameter modifications we do, across both initiators >> and targets, are: >> >> >> initialr2tno >> >> firstburstlength128k >> >> maxrecvdataseglen128k[only on Linux backends] >> >> maxxmitdataseglen128k[only on Linux backends] >> >> >> The OmniOS initiator doesn't need tuning for more than the first >> two parameters; on the Linux backends we tune up all four. My >> extended thoughts on these tuning parameters and why we touch them >> can be found >> >> here: >> >> >> http://utcc.utoronto.ca/~cks/space/blog/tech/UnderstandingiSCSIProtocol >> >> http://utcc.utoronto.ca/~cks/space/blog/tech/LikelyISCSITuning >> >> >> The short version is that these parameters probably only make a >> small difference but their overall goal is to do 128KB ZFS reads >> and writes in single iSCSI operations (although they will be >> fragmented at the TCP >> >> layer) and to do iSCSI writes without a back-and-forth delay >> between initiator and target (that's 'initialr2t no'). >> >> >> I think basically everyone should use InitialR2T set to no and in >> fact that it should be the software default. These days only >> unusually limited iSCSI targets should need it to be otherwise and >> they can change their setting for it (initiator and target must >> both agree to it being 'yes', so either can veto it). >> >> >> - cks >> >> >> >> On Mon, Feb 23, 2015 at 8:21 AM, Joerg Goltermann > > wrote: >> >> Hi, >> >> I think your problem is caused by your link properties or your >> switch settings. In general the standard ixgbe seems to perform >> well. >> >> I had trouble after changing the default flow control settings >> to "bi" >> and this was my motivation to update the ixgbe driver a long >> time ago. >> After I have updated our systems to ixgbe 2.5.8 I never had any >> problems .... >> >> Make sure your switch has support for jumbo frames and you use >> the same mtu on all ports, otherwise the smallest will be used. >> >> What switch do you use? I can tell you nice horror stories about >> different vendors.... >> >> - Joerg >> >> On 23.02.2015 10:31, W Verb wrote: >> >> Thank you Joerg, >> >> I've downloaded the package and will try it tomorrow. 
>> >> The only thing I can add at this point is that upon review >> of my >> testing, I may have performed my "pkg -u" between the >> initial quad-gig >> performance test and installing the 10G NIC. So this may >> be a new >> problem introduced in the latest updates. >> >> Those of you who are running 10G and have not upgraded to >> the latest >> kernel, etc, might want to do some additional testing >> before running the >> update. >> >> -Warren V >> >> On Mon, Feb 23, 2015 at 1:15 AM, Joerg Goltermann >> >> >> wrote: >> >> Hi, >> >> I remember there was a problem with the flow control >> settings in the >> ixgbe >> driver, so I updated it a long time ago for our >> internal servers to >> 2.5.8. >> Last weekend I integrated the latest changes from the >> FreeBSD driver >> to bring >> the illumos ixgbe to 2.5.25 but I had no time to test >> it, so it's >> completely >> untested! >> >> >> If you would like to give the latest driver a try you >> can fetch the >> kernel modules from >> https://cloud.osn.de/index.____php/s/Fb4so9RsNnXA7r9 >> >> > > >> >> Clone your boot environment, place the modules in the >> new environment >> and update the boot-archive of the new BE. >> >> - Joerg >> >> >> >> >> >> On 23.02.2015 02:54, W Verb wrote: >> >> By the way, to those of you who have working >> setups: please send me >> your pool/volume settings, interface linkprops, >> and any kernel >> tuning >> parameters you may have set. >> >> Thanks, >> Warren V >> >> On Sat, Feb 21, 2015 at 7:59 AM, Schweiss, Chip >> >> >> >> wrote: >> >> I can't say I totally agree with your performance >> assessment. I run Intel >> X520 in all my OmniOS boxes. >> >> Here is a capture of nfssvrtop I made while >> running many >> storage vMotions >> between two OmniOS boxes hosting NFS >> datastores. This is a >> 10 host VMware >> cluster. Both OmniOS boxes are dual 10G >> connected with >> copper twin-ax to >> the in rack Nexus 5010. >> >> VMware does 100% sync writes, I use ZeusRAM >> SSDs for log >> devices. >> >> -Chip >> >> 2014 Apr 24 08:05:51, load: 12.64, read: >> 17330243 KB, >> swrite: 15985 KB, >> awrite: 1875455 KB >> >> Ver Client NFSOPS Reads >> SWrites AWrites >> Commits Rd_bw >> SWr_bw AWr_bw Rd_t SWr_t AWr_t >> Com_t Align% >> >> 4 10.28.17.105 0 0 >> 0 0 >> 0 0 >> 0 0 0 0 0 0 0 >> >> 4 10.28.17.215 0 0 >> 0 0 >> 0 0 >> 0 0 0 0 0 0 0 >> >> 4 10.28.17.213 0 0 >> 0 0 >> 0 0 >> 0 0 0 0 0 0 0 >> >> 4 10.28.16.151 0 0 >> 0 0 >> 0 0 >> 0 0 0 0 0 0 0 >> >> 4 all 1 0 >> 0 0 >> 0 0 >> 0 0 0 0 0 0 0 >> >> 3 10.28.16.175 3 0 >> 3 0 >> 0 1 >> 11 0 4806 48 0 0 85 >> >> 3 10.28.16.183 6 0 >> 6 0 >> 0 3 >> 162 0 549 124 0 0 >> 73 >> >> 3 10.28.16.180 11 0 >> 10 0 >> 0 3 >> 27 0 776 89 0 0 67 >> >> 3 10.28.16.176 28 2 >> 26 0 >> 0 10 >> 405 0 2572 198 0 0 >> 100 >> >> 3 10.28.16.178 4606 4602 >> 4 0 >> 0 294534 >> 3 0 723 49 0 0 99 >> >> 3 10.28.16.179 4905 4879 >> 26 0 >> 0 312208 >> 311 0 735 271 0 0 >> 99 >> >> 3 10.28.16.181 5515 5502 >> 13 0 >> 0 352107 >> 77 0 89 87 0 0 99 >> >> 3 10.28.16.184 12095 12059 >> 10 0 >> 0 763014 >> 39 0 249 147 0 0 99 >> >> 3 10.28.58.1 15401 6040 >> 116 6354 >> 53 191605 >> 474 202346 192 96 144 83 >> 99 >> >> 3 all 42574 33086 >> 217 >> 6354 53 1913488 >> 1582 202300 348 138 153 105 >> 99 >> >> >> >> >> >> On Fri, Feb 20, 2015 at 11:46 PM, W Verb >> >> > >> wrote: >> >> >> Hello All, >> >> Thank you for your replies. >> I tried a few things, and found the following: >> >> 1: Disabling hyperthreading support in the >> BIOS drops >> performance overall >> by a factor of 4. 
>> 2: Disabling VT support also seems to have >> some effect, >> although it >> appears to be minor. But this has the >> amusing side >> effect of fixing the >> hangs I've been experiencing with fast >> reboot. Probably >> by disabling kvm. >> 3: The performance tests are a bit tricky >> to quantify >> because of caching >> effects. In fact, I'm not entirely sure >> what is >> happening here. It's just >> best to describe what I'm seeing: >> >> The commands I'm using to test are >> dd if=/dev/zero of=./test.dd bs=2M count=5000 >> dd of=/dev/null if=./test.dd bs=2M count=5000 >> The host vm is running Centos 6.6, and has >> the latest >> vmtools installed. >> There is a host cache on an SSD local to >> the host that >> is also in place. >> Disabling the host cache didn't >> immediately have an >> effect as far as I could >> see. >> >> The host MTU set to 3000 on all iSCSI >> interfaces for all >> tests. >> >> Test 1: Right after reboot, with an ixgbe >> MTU of 9000, >> the write test >> yields an average speed over three tests >> of 137MB/s. The >> read test yields an >> average over three tests of 5MB/s. >> >> Test 2: After setting "ifconfig ixgbe0 mtu >> 3000", the >> write tests yield >> 140MB/s, and the read tests yield 53MB/s. >> It's important >> to note here that >> if I cut the read test short at only >> 2-3GB, I get >> results upwards of >> 350MB/s, which I assume is local >> cache-related distortion. >> >> Test 3: MTU of 1500. Read tests are up to >> 156 MB/s. >> Write tests yield >> about 142MB/s. >> Test 4: MTU of 1000: Read test at 182MB/s. >> Test 5: MTU of 900: Read test at 130 MB/s. >> Test 6: MTU of 1000: Read test at 160MB/s. >> Write tests >> are now >> consistently at about 300MB/s. >> Test 7: MTU of 1200: Read test at 124MB/s. >> Test 8: MTU of 1000: Read test at 161MB/s. >> Write at 261MB/s. >> >> A few final notes: >> L1ARC grabs about 10GB of RAM during the >> tests, so >> there's definitely some >> read caching going on. >> The write operations are easier to observe >> with iostat, >> and I'm seeing io >> rates that closely correlate with the >> network write speeds. >> >> >> Chris, thanks for your specific details. >> I'd appreciate >> it if you could >> tell me which copper NIC you tried, as >> well as to pass >> on the iSCSI tuning >> parameters. >> >> I've ordered an Intel EXPX9502AFXSR, which >> uses the >> 82598 chip instead of >> the 82599 in the X520. If I get similar >> results with my >> fiber transcievers, >> I'll see if I can get a hold of copper ones. >> >> But I should mention that I did indeed >> look at PHY/MAC >> error rates, and >> they are nil. >> >> -Warren V >> >> On Fri, Feb 20, 2015 at 7:25 PM, Chris >> Siebenmann >> > > >> >> >> wrote: >> >> >> After installation and >> configuration, I observed >> all kinds of bad >> behavior >> in the network traffic between the >> hosts and the >> server. All of this >> bad >> behavior is traced to the ixgbe >> driver on the >> storage server. Without >> going >> into the full troubleshooting >> process, here are >> my takeaways: >> >> [...] >> >> For what it's worth, we managed to >> achieve much >> better line rates on >> copper 10G ixgbe hardware of various >> descriptions >> between OmniOS >> and CentOS 7 (I don't think we ever >> tested OmniOS to >> OmniOS). 
I don't >> believe OmniOS could do TCP at full >> line rate but I >> think we managed 700+ >> Mbytes/sec on both transmit and >> receive and we got >> basically disk-limited >> speeds with iSCSI (across multiple >> disks on >> multi-disk mirrored pools, >> OmniOS iSCSI initiator, Linux iSCSI >> targets). >> >> I don't believe we did any specific >> kernel tuning >> (and in fact some of >> our attempts to fiddle ixgbe driver >> parameters blew >> up in our face). >> We did tune iSCSI connection >> parameters to increase >> various buffer >> sizes so that ZFS could do even large >> single >> operations in single iSCSI >> transactions. (More details available >> if people are >> interested.) >> >> 10: At the wire level, the speed >> problems are >> clearly due to pauses in >> response time by omnios. At 9000 >> byte frame >> sizes, I see a good number >> of duplicate ACKs and fast >> retransmits during >> read operations (when >> omnios is transmitting). But below >> about a >> 4100-byte MTU on omnios >> (which seems to correlate to >> 4096-byte iSCSI >> block transfers), the >> transmission errors fade away and >> we only see >> the transmission pause >> problem. >> >> >> This is what really attracted my >> attention. In >> our OmniOS setup, our >> specific Intel hardware had ixgbe >> driver issues that >> could cause >> activity stalls during once-a-second >> link heartbeat >> checks. This >> obviously had an effect at the TCP and >> iSCSI layers. >> My initial message >> to illumos-developer sparked a potentially >> interesting discussion: >> >> >> http://www.listbox.com/member/____archive/182179/2014/10/sort/____time_rev/page/16/entry/6:__405/__20141003125035:6357079A-__4B1D-__11E4-A39C-D534381BA44D/ >> >> >> > > >> >> If you think this is a possibility in >> your setup, >> I've put the DTrace >> script I used to hunt for this up on >> the web: >> >> http://www.cs.toronto.edu/~____cks/src/omnios-ixgbe/ixgbe_____delay.d >> >> >> > > >> >> This isn't the only potential source >> of driver >> stalls by any means, it's >> just the one I found. You may also >> want to look at >> lockstat in general, >> as information it reported is what led >> us to look >> specifically at the >> ixgbe code here. >> >> (If you suspect kernel/driver issues, >> lockstat >> combined with kernel >> source is a really excellent resource.) >> >> - cks >> >> >> >> >> >> ___________________________________________________ >> OmniOS-discuss mailing list >> OmniOS-discuss at lists.omniti >> .____com >> > > >> http://lists.omniti.com/____mailman/listinfo/omnios-____discuss >> >> >> > > >> >> >> ___________________________________________________ >> OmniOS-discuss mailing list >> OmniOS-discuss at lists.omniti >> .____com >> > > >> http://lists.omniti.com/____mailman/listinfo/omnios-____discuss >> >> >> > > >> >> >> -- >> OSN Online Service Nuernberg GmbH, Bucher Str. 78, >> 90408 Nuernberg >> Tel: +49 911 39905-0 >> - Fax: +49 911 >> 39905-55 - >> http://www.osn.de >> HRB 15022 Nuernberg, USt-Id: DE189301263, GF: Joerg >> Goltermann >> >> >> >> -- >> OSN Online Service Nuernberg GmbH, Bucher Str. 78, 90408 Nuernberg >> Tel: +49 911 39905-0 - Fax: +49 >> 911 39905-55 - http://www.osn.de >> >> HRB 15022 Nuernberg, USt-Id: DE189301263, GF: Joerg Goltermann >> >> >> *illumos-developer* | Archives >> >> >> | Modify Your Subscription >> [Powered by Listbox] >> > > > *illumos-developer* | Archives > > | > Modify > > Your Subscription [Powered by Listbox] > -- OSN Online Service Nuernberg GmbH, Bucher Str. 
78, 90408 Nuernberg Tel: +49 911 39905-0 - Fax: +49 911 39905-55 - http://www.osn.de HRB 15022 Nuernberg, USt-Id: DE189301263, GF: Joerg Goltermann From garrett at damore.org Mon Mar 2 15:08:06 2015 From: garrett at damore.org (Garrett D'Amore) Date: Mon, 2 Mar 2015 07:08:06 -0800 Subject: [OmniOS-discuss] [developer] Re: The ixgbe driver, Lindsay Lohan, and the Greek economy In-Reply-To: References: <20150221032559.727D07A0792@apps0.cs.toronto.edu> <54EAEFA8.4020101@osn.de> <54EB5392.6030900@osn.de> <279C084B-E76B-4F38-A7CF-D6CB37D06D8F@damore.org> Message-ID: <72CA76E9-35A7-4B00-A7BE-A54C99F1B98C@damore.org> Seems like it is indeed a comstar problem. Lockstat analysis might reveal contended locks or perhaps some kind of timeouts in the code. Sent from my iPhone > On Mar 2, 2015, at 12:22 AM, W Verb wrote: > > Hello Garrett, > > No, no 802.3ad going on in this config. > > Here is a basic schematic: > > https://drive.google.com/file/d/0BwyUMjibonYQVkVqcE5OQUJyUUU/view?usp=sharing > > Here is the Nexenta MPIO iSCSI Setup Document that I used as a guide: > > https://drive.google.com/file/d/0BwyUMjibonYQbjEyUTBjN2tTNWM/view?usp=sharing > > Note that I am using an MTU of 3000 on both the 10G and 1G NICs. The switch is set to allow 9148-byte frames, and I'm not seeing any errors/buffer overruns on the switch. > > Here is a screenshot of a packet capture from a read operation on the guest OS (from it's local drive, which is actually a VMDK file on the storage server). In this example, only a single 1G ESXi kernel interface (vmk1) is bound to the software iSCSI initiator. > > https://drive.google.com/file/d/0BwyUMjibonYQa2NYdXhpZkpkbU0/view?usp=sharing > > Note that there's a nice, well-behaved window sizing process taking place. The ESXi decreases the scaled window by 11 or 12 for each ACK, then bumps it back up to 512. > > Here is a similar screenshot of a single-interface write operation: > > https://drive.google.com/file/d/0BwyUMjibonYQbU1RZHRnakxDSFU/view?usp=sharing > > There are no pauses or gaps in the transmission rate in the single-interface transfers. > > > In the next screenshots, I have enabled an additional 1G interface on the ESXi host, and bound it to the iSCSI initiator. The new interface is bound to a separate physical port, uses a different VLAN on the switch, and talks to a different 10G port on the storage server. > > First, let's look at a write operation on the guest OS, which happily pumps data at near-line-rate to the storage server. > > Here is a sequence number trace diagram. Note how the transfer has a nice, smooth increment rate over the entire transfer. > > https://drive.google.com/file/d/0BwyUMjibonYQWHNIa0drWnNxMmM/view?usp=sharing > > Here are screenshots from packet captures on both 1G interfaces: > > https://drive.google.com/file/d/0BwyUMjibonYQRWhyVVQ4djNaU3c/view?usp=sharing > https://drive.google.com/file/d/0BwyUMjibonYQaTVjTEtTRloyR2c/view?usp=sharing > > Note how we again see nice, smooth window adjustment, and no gaps in transmission. > > > But now, let's look at the problematic two-interface Read operation. > First, the sequence graph: > > https://drive.google.com/file/d/0BwyUMjibonYQTzdFVWdQMWZ6LUU/view?usp=sharing > > As you can see, there are gaps and jumps in the transmission throughout the transfer. 
> It is very illustrative to look at captures of the gaps, which are occurring on both interfaces: > > https://drive.google.com/file/d/0BwyUMjibonYQc0VISXN6eVFwQzg/view?usp=sharing > https://drive.google.com/file/d/0BwyUMjibonYQVFREUHp3TGFiUU0/view?usp=sharing > > As you can see, there are ~.4 second pauses in transmission from the storage server, which kills the transfer rate. > It's clear that the ESXi box ACKs the prior iSCSI operation to completion, then makes a new LUN request, which the storage server immediately replies to. The ESXi ACKs the response packet from the storage server, then waits...and waits....and waits... until eventually the storage server starts transmitting again. > > Because the pause happens while the ESXi client is waiting for a packet from the storage server, that tells me that the gaps are not an artifact of traffic being switched between both active interfaces, but are actually indicative of short hangs occurring on the server. > > Having a pause or two in transmission is no big deal, but in my case, it is happening constantly, and dropping my overall read transfer rate down to 20-60MB/s, which is slower than the single interface transfer rate (~90-100MB/s). > > Decreasing the MTU makes the pauses shorter, increasing them makes the pauses longer. > > Another interesting thing is that if I set the multipath io interval to 3 operations instead of 1, I get better throughput. In other words, the less frequently I swap IP addresses on my iSCSI requests from the ESXi unit, the fewer pauses I see. > > Basically, COMSTAR seems to choke each time an iSCSI request from a new IP arrives. > > Because the single interface transfer is near line rate, that tells me that the storage system (mpt_sas, zfs, etc) is working fine. It's only when multiple paths are attempted that iSCSI falls on its face during reads. > > All of these captures were taken without a cache device being attached to the storage zpool, so this isn't looking like some kind of ZFS ARC problem. As mentioned previously, local transfers to/from the zpool are showing ~300-500 MB/s rates over long transfers (10G+). > > -Warren V > >> On Sun, Mar 1, 2015 at 9:11 PM, Garrett D'Amore wrote: >> I?m not sure I?ve followed properly. You have *two* interfaces. You are not trying to provision these in an aggr are you? As far as I?m aware, VMware does not support 802.3ad link aggregations. (Its possible that you can make it work with ESXi if you give the entire NIC to the guest ? but I?m skeptical.) The problem is that if you try to use link aggregation, some packets (up to half!) will be lost. TCP and other protocols fare poorly in this situation. >> >> Its possible I?ve totally misunderstood what you?re trying to do, in which case I apologize. >> >> The idle thing is a red-herring ? the cpu is waiting for work to do, probably because packets haven?t arrived (or where dropped by the hypervisor!) I wouldn?t read too much into that except that your network stack is in trouble. I?d look a bit more closely at the kstats for tcp ? I suspect you?ll see retransmits or out of order values that are unusually high ? if so this may help validate my theory above. >> >> - Garrett >> >>> On Mar 1, 2015, at 9:03 PM, W Verb via illumos-developer wrote: >>> >>> Hello all, >>> >>> >>> Well, I no longer blame the ixgbe driver for the problems I'm seeing. >>> >>> >>> >>> I tried Joerg's updated driver, which didn't improve the issue. So I went back to the drawing board and rebuilt the server from scratch. 
>>> >>> What I noted is that if I have only a single 1-gig physical interface active on the ESXi host, everything works as expected. As soon as I enable two interfaces, I start seeing the performance problems I've described. >>> >>> Response pauses from the server that I see in TCPdumps are still leading me to believe the problem is delay on the server side, so I ran a series of kernel dtraces and produced some flamegraphs. >>> >>> >>> >>> This was taken during a read operation with two active 10G interfaces on the server, with a single target being shared by two tpgs- one tpg for each 10G physical port. The host device has two 1G ports enabled, with VLANs separating the active ports into 10G/1G pairs. ESXi is set to multipath using both VLANS with a round-robin IO interval of 1. >>> >>> https://drive.google.com/file/d/0BwyUMjibonYQd3ZYOGh4d2pteGs/view?usp=sharing >>> >>> >>> >>> This was taken during a write operation: >>> >>> https://drive.google.com/file/d/0BwyUMjibonYQMnBtU1Q2SXM2ams/view?usp=sharing >>> >>> >>> >>> I then rebooted the server and disabled C-State, ACPI T-State, and general EIST (Turbo boost) functionality in the CPU. >>> >>> I when I attempted to boot my guest VM, the iSCSI transfer gradually ground to a halt during the boot loading process, and the guest OS never did complete its boot process. >>> >>> Here is a flamegraph taken while iSCSI is slowly dying: >>> >>> https://drive.google.com/file/d/0BwyUMjibonYQM21JeFZPX3dZWTg/view?usp=sharing >>> >>> >>> I edited out cpu_idle_adaptive from the dtrace output and regenerated the slowdown graph: >>> >>> https://drive.google.com/file/d/0BwyUMjibonYQbTVwV3NvXzlPS1E/view?usp=sharing >>> >>> >>> I then edited cpu_idle_adaptive out of the speedy write operation and regenerated that graph: >>> >>> https://drive.google.com/file/d/0BwyUMjibonYQeWFYM0pCMDZ1X2s/view?usp=sharing >>> >>> >>> I have zero experience with interpreting flamegraphs, but the most significant difference I see between the slow read example and the fast write example is in unix`thread_start --> unix`idle. There's a good chunk of "unix`i86_mwait" in the read example that is not present in the write example at all. >>> >>> Disabling the l2arc cache device didn't make a difference, and I had to reenable EIST support on the CPU to get my VMs to boot. >>> >>> I am seeing a variety of bug reports going back to 2010 regarding excessive mwait operations, with the suggested solutions usually being to set "cpupm enable poll-mode" in power.conf. That change also had no effect on speed. >>> >>> -Warren V >>> >>> >>> >>> >>> -----Original Message----- >>> >>> From: Chris Siebenmann [mailto:cks at cs.toronto.edu] >>> >>> Sent: Monday, February 23, 2015 8:30 AM >>> >>> To: W Verb >>> >>> Cc: omnios-discuss at lists.omniti.com; cks at cs.toronto.edu >>> >>> Subject: Re: [OmniOS-discuss] The ixgbe driver, Lindsay Lohan, and the Greek economy >>> >>> >>> > Chris, thanks for your specific details. I'd appreciate it if you >>> >>> > could tell me which copper NIC you tried, as well as to pass on the >>> >>> > iSCSI tuning parameters. >>> >>> >>> Our copper NIC experience is with onboard X540-AT2 ports on SuperMicro hardware (which have the guaranteed 10-20 msec lock hold) and dual-port 82599EB TN cards (which have some sort of driver/hardware failure under load that eventually leads to 2-second lock holds). I can't recommend either with the current driver; we had to revert to 1G networking in order to get stable servers. 
>>> >>> >>> The iSCSI parameter modifications we do, across both initiators and targets, are: >>> >>> >>> initialr2t no >>> >>> firstburstlength 128k >>> >>> maxrecvdataseglen 128k [only on Linux backends] >>> >>> maxxmitdataseglen 128k [only on Linux backends] >>> >>> >>> The OmniOS initiator doesn't need tuning for more than the first two parameters; on the Linux backends we tune up all four. My extended thoughts on these tuning parameters and why we touch them can be found >>> >>> here: >>> >>> >>> http://utcc.utoronto.ca/~cks/space/blog/tech/UnderstandingiSCSIProtocol >>> >>> http://utcc.utoronto.ca/~cks/space/blog/tech/LikelyISCSITuning >>> >>> >>> The short version is that these parameters probably only make a small difference but their overall goal is to do 128KB ZFS reads and writes in single iSCSI operations (although they will be fragmented at the TCP >>> >>> layer) and to do iSCSI writes without a back-and-forth delay between initiator and target (that's 'initialr2t no'). >>> >>> >>> I think basically everyone should use InitialR2T set to no and in fact that it should be the software default. These days only unusually limited iSCSI targets should need it to be otherwise and they can change their setting for it (initiator and target must both agree to it being 'yes', so either can veto it). >>> >>> >>> - cks >>> >>> >>> >>>> On Mon, Feb 23, 2015 at 8:21 AM, Joerg Goltermann wrote: >>>> Hi, >>>> >>>> I think your problem is caused by your link properties or your >>>> switch settings. In general the standard ixgbe seems to perform >>>> well. >>>> >>>> I had trouble after changing the default flow control settings to "bi" >>>> and this was my motivation to update the ixgbe driver a long time ago. >>>> After I have updated our systems to ixgbe 2.5.8 I never had any >>>> problems .... >>>> >>>> Make sure your switch has support for jumbo frames and you use >>>> the same mtu on all ports, otherwise the smallest will be used. >>>> >>>> What switch do you use? I can tell you nice horror stories about >>>> different vendors.... >>>> >>>> - Joerg >>>> >>>>> On 23.02.2015 10:31, W Verb wrote: >>>>> Thank you Joerg, >>>>> >>>>> I've downloaded the package and will try it tomorrow. >>>>> >>>>> The only thing I can add at this point is that upon review of my >>>>> testing, I may have performed my "pkg -u" between the initial quad-gig >>>>> performance test and installing the 10G NIC. So this may be a new >>>>> problem introduced in the latest updates. >>>>> >>>>> Those of you who are running 10G and have not upgraded to the latest >>>>> kernel, etc, might want to do some additional testing before running the >>>>> update. >>>>> >>>>> -Warren V >>>>> >>>>> On Mon, Feb 23, 2015 at 1:15 AM, Joerg Goltermann >>>> > wrote: >>>>> >>>>> Hi, >>>>> >>>>> I remember there was a problem with the flow control settings in the >>>>> ixgbe >>>>> driver, so I updated it a long time ago for our internal servers to >>>>> 2.5.8. >>>>> Last weekend I integrated the latest changes from the FreeBSD driver >>>>> to bring >>>>> the illumos ixgbe to 2.5.25 but I had no time to test it, so it's >>>>> completely >>>>> untested! >>>>> >>>>> >>>>> If you would like to give the latest driver a try you can fetch the >>>>> kernel modules from >>>>> https://cloud.osn.de/index.__php/s/Fb4so9RsNnXA7r9 >>>>> >>>>> >>>>> Clone your boot environment, place the modules in the new environment >>>>> and update the boot-archive of the new BE. 
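For anyone following along, Joerg's boot-environment instructions above expand to roughly this sequence (a sketch; the BE name and module path are placeholders, and the file layout should be checked against what the linked archive actually contains):

  # clone and mount the current boot environment
  beadm create ixgbe-test
  beadm mount ixgbe-test /mnt

  # copy the updated 64-bit ixgbe module into the clone (placeholder path)
  cp ixgbe /mnt/kernel/drv/amd64/ixgbe

  # rebuild the clone's boot archive, make it the default, and reboot into it
  bootadm update-archive -R /mnt
  beadm activate ixgbe-test
  init 6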
>>>>> >>>>> - Joerg >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> On 23.02.2015 02:54, W Verb wrote: >>>>> >>>>> By the way, to those of you who have working setups: please send me >>>>> your pool/volume settings, interface linkprops, and any kernel >>>>> tuning >>>>> parameters you may have set. >>>>> >>>>> Thanks, >>>>> Warren V >>>>> >>>>> On Sat, Feb 21, 2015 at 7:59 AM, Schweiss, Chip >>>>> > wrote: >>>>> >>>>> I can't say I totally agree with your performance >>>>> assessment. I run Intel >>>>> X520 in all my OmniOS boxes. >>>>> >>>>> Here is a capture of nfssvrtop I made while running many >>>>> storage vMotions >>>>> between two OmniOS boxes hosting NFS datastores. This is a >>>>> 10 host VMware >>>>> cluster. Both OmniOS boxes are dual 10G connected with >>>>> copper twin-ax to >>>>> the in rack Nexus 5010. >>>>> >>>>> VMware does 100% sync writes, I use ZeusRAM SSDs for log >>>>> devices. >>>>> >>>>> -Chip >>>>> >>>>> 2014 Apr 24 08:05:51, load: 12.64, read: 17330243 KB, >>>>> swrite: 15985 KB, >>>>> awrite: 1875455 KB >>>>> >>>>> Ver Client NFSOPS Reads SWrites AWrites >>>>> Commits Rd_bw >>>>> SWr_bw AWr_bw Rd_t SWr_t AWr_t Com_t Align% >>>>> >>>>> 4 10.28.17.105 0 0 0 0 >>>>> 0 0 >>>>> 0 0 0 0 0 0 0 >>>>> >>>>> 4 10.28.17.215 0 0 0 0 >>>>> 0 0 >>>>> 0 0 0 0 0 0 0 >>>>> >>>>> 4 10.28.17.213 0 0 0 0 >>>>> 0 0 >>>>> 0 0 0 0 0 0 0 >>>>> >>>>> 4 10.28.16.151 0 0 0 0 >>>>> 0 0 >>>>> 0 0 0 0 0 0 0 >>>>> >>>>> 4 all 1 0 0 0 >>>>> 0 0 >>>>> 0 0 0 0 0 0 0 >>>>> >>>>> 3 10.28.16.175 3 0 3 0 >>>>> 0 1 >>>>> 11 0 4806 48 0 0 85 >>>>> >>>>> 3 10.28.16.183 6 0 6 0 >>>>> 0 3 >>>>> 162 0 549 124 0 0 73 >>>>> >>>>> 3 10.28.16.180 11 0 10 0 >>>>> 0 3 >>>>> 27 0 776 89 0 0 67 >>>>> >>>>> 3 10.28.16.176 28 2 26 0 >>>>> 0 10 >>>>> 405 0 2572 198 0 0 100 >>>>> >>>>> 3 10.28.16.178 4606 4602 4 0 >>>>> 0 294534 >>>>> 3 0 723 49 0 0 99 >>>>> >>>>> 3 10.28.16.179 4905 4879 26 0 >>>>> 0 312208 >>>>> 311 0 735 271 0 0 99 >>>>> >>>>> 3 10.28.16.181 5515 5502 13 0 >>>>> 0 352107 >>>>> 77 0 89 87 0 0 99 >>>>> >>>>> 3 10.28.16.184 12095 12059 10 0 >>>>> 0 763014 >>>>> 39 0 249 147 0 0 99 >>>>> >>>>> 3 10.28.58.1 15401 6040 116 6354 >>>>> 53 191605 >>>>> 474 202346 192 96 144 83 99 >>>>> >>>>> 3 all 42574 33086 217 >>>>> 6354 53 1913488 >>>>> 1582 202300 348 138 153 105 99 >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> On Fri, Feb 20, 2015 at 11:46 PM, W Verb >>>> > wrote: >>>>> >>>>> >>>>> Hello All, >>>>> >>>>> Thank you for your replies. >>>>> I tried a few things, and found the following: >>>>> >>>>> 1: Disabling hyperthreading support in the BIOS drops >>>>> performance overall >>>>> by a factor of 4. >>>>> 2: Disabling VT support also seems to have some effect, >>>>> although it >>>>> appears to be minor. But this has the amusing side >>>>> effect of fixing the >>>>> hangs I've been experiencing with fast reboot. Probably >>>>> by disabling kvm. >>>>> 3: The performance tests are a bit tricky to quantify >>>>> because of caching >>>>> effects. In fact, I'm not entirely sure what is >>>>> happening here. It's just >>>>> best to describe what I'm seeing: >>>>> >>>>> The commands I'm using to test are >>>>> dd if=/dev/zero of=./test.dd bs=2M count=5000 >>>>> dd of=/dev/null if=./test.dd bs=2M count=5000 >>>>> The host vm is running Centos 6.6, and has the latest >>>>> vmtools installed. >>>>> There is a host cache on an SSD local to the host that >>>>> is also in place. >>>>> Disabling the host cache didn't immediately have an >>>>> effect as far as I could >>>>> see. 
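One way to take the guest page cache mostly out of the dd numbers described above (a sketch; assumes the CentOS 6 coreutils dd, which supports the direct I/O flags):

  # write test: bypass the guest page cache and flush at the end
  dd if=/dev/zero of=./test.dd bs=2M count=5000 oflag=direct conv=fsync

  # drop whatever the guest still has cached, then read back with O_DIRECT
  sync; echo 3 > /proc/sys/vm/drop_caches
  dd if=./test.dd of=/dev/null bs=2M count=5000 iflag=direct

Note also that /dev/zero data compresses away to almost nothing if the backing zvol has compression enabled, which can inflate write numbers independently of the network path.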
>>>>> >>>>> The host MTU set to 3000 on all iSCSI interfaces for all >>>>> tests. >>>>> >>>>> Test 1: Right after reboot, with an ixgbe MTU of 9000, >>>>> the write test >>>>> yields an average speed over three tests of 137MB/s. The >>>>> read test yields an >>>>> average over three tests of 5MB/s. >>>>> >>>>> Test 2: After setting "ifconfig ixgbe0 mtu 3000", the >>>>> write tests yield >>>>> 140MB/s, and the read tests yield 53MB/s. It's important >>>>> to note here that >>>>> if I cut the read test short at only 2-3GB, I get >>>>> results upwards of >>>>> 350MB/s, which I assume is local cache-related distortion. >>>>> >>>>> Test 3: MTU of 1500. Read tests are up to 156 MB/s. >>>>> Write tests yield >>>>> about 142MB/s. >>>>> Test 4: MTU of 1000: Read test at 182MB/s. >>>>> Test 5: MTU of 900: Read test at 130 MB/s. >>>>> Test 6: MTU of 1000: Read test at 160MB/s. Write tests >>>>> are now >>>>> consistently at about 300MB/s. >>>>> Test 7: MTU of 1200: Read test at 124MB/s. >>>>> Test 8: MTU of 1000: Read test at 161MB/s. Write at 261MB/s. >>>>> >>>>> A few final notes: >>>>> L1ARC grabs about 10GB of RAM during the tests, so >>>>> there's definitely some >>>>> read caching going on. >>>>> The write operations are easier to observe with iostat, >>>>> and I'm seeing io >>>>> rates that closely correlate with the network write speeds. >>>>> >>>>> >>>>> Chris, thanks for your specific details. I'd appreciate >>>>> it if you could >>>>> tell me which copper NIC you tried, as well as to pass >>>>> on the iSCSI tuning >>>>> parameters. >>>>> >>>>> I've ordered an Intel EXPX9502AFXSR, which uses the >>>>> 82598 chip instead of >>>>> the 82599 in the X520. If I get similar results with my >>>>> fiber transcievers, >>>>> I'll see if I can get a hold of copper ones. >>>>> >>>>> But I should mention that I did indeed look at PHY/MAC >>>>> error rates, and >>>>> they are nil. >>>>> >>>>> -Warren V >>>>> >>>>> On Fri, Feb 20, 2015 at 7:25 PM, Chris Siebenmann >>>>> > >>>>> >>>>> wrote: >>>>> >>>>> >>>>> After installation and configuration, I observed >>>>> all kinds of bad >>>>> behavior >>>>> in the network traffic between the hosts and the >>>>> server. All of this >>>>> bad >>>>> behavior is traced to the ixgbe driver on the >>>>> storage server. Without >>>>> going >>>>> into the full troubleshooting process, here are >>>>> my takeaways: >>>>> >>>>> [...] >>>>> >>>>> For what it's worth, we managed to achieve much >>>>> better line rates on >>>>> copper 10G ixgbe hardware of various descriptions >>>>> between OmniOS >>>>> and CentOS 7 (I don't think we ever tested OmniOS to >>>>> OmniOS). I don't >>>>> believe OmniOS could do TCP at full line rate but I >>>>> think we managed 700+ >>>>> Mbytes/sec on both transmit and receive and we got >>>>> basically disk-limited >>>>> speeds with iSCSI (across multiple disks on >>>>> multi-disk mirrored pools, >>>>> OmniOS iSCSI initiator, Linux iSCSI targets). >>>>> >>>>> I don't believe we did any specific kernel tuning >>>>> (and in fact some of >>>>> our attempts to fiddle ixgbe driver parameters blew >>>>> up in our face). >>>>> We did tune iSCSI connection parameters to increase >>>>> various buffer >>>>> sizes so that ZFS could do even large single >>>>> operations in single iSCSI >>>>> transactions. (More details available if people are >>>>> interested.) >>>>> >>>>> 10: At the wire level, the speed problems are >>>>> clearly due to pauses in >>>>> response time by omnios. 
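Whether iSCSI parameters like these actually took effect is visible in the login negotiation itself. A hedged way to check from a packet capture, relying on Wireshark's iSCSI dissector to decode the login key=value pairs (capture file name is a placeholder):

  # pull the login request/response PDUs out of a capture taken on port 3260
  # and grep the decoded text for the parameters of interest
  tshark -r iscsi-login.pcap -Y 'iscsi.opcode == 0x03 || iscsi.opcode == 0x23' -V |
    egrep -i 'InitialR2T|FirstBurstLength|MaxRecvDataSegmentLength|MaxBurstLength'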
At 9000 byte frame >>>>> sizes, I see a good number >>>>> of duplicate ACKs and fast retransmits during >>>>> read operations (when >>>>> omnios is transmitting). But below about a >>>>> 4100-byte MTU on omnios >>>>> (which seems to correlate to 4096-byte iSCSI >>>>> block transfers), the >>>>> transmission errors fade away and we only see >>>>> the transmission pause >>>>> problem. >>>>> >>>>> >>>>> This is what really attracted my attention. In >>>>> our OmniOS setup, our >>>>> specific Intel hardware had ixgbe driver issues that >>>>> could cause >>>>> activity stalls during once-a-second link heartbeat >>>>> checks. This >>>>> obviously had an effect at the TCP and iSCSI layers. >>>>> My initial message >>>>> to illumos-developer sparked a potentially >>>>> interesting discussion: >>>>> >>>>> >>>>> http://www.listbox.com/member/__archive/182179/2014/10/sort/__time_rev/page/16/entry/6:405/__20141003125035:6357079A-4B1D-__11E4-A39C-D534381BA44D/ >>>>> >>>>> >>>>> If you think this is a possibility in your setup, >>>>> I've put the DTrace >>>>> script I used to hunt for this up on the web: >>>>> >>>>> http://www.cs.toronto.edu/~__cks/src/omnios-ixgbe/ixgbe___delay.d >>>>> >>>>> >>>>> This isn't the only potential source of driver >>>>> stalls by any means, it's >>>>> just the one I found. You may also want to look at >>>>> lockstat in general, >>>>> as information it reported is what led us to look >>>>> specifically at the >>>>> ixgbe code here. >>>>> >>>>> (If you suspect kernel/driver issues, lockstat >>>>> combined with kernel >>>>> source is a really excellent resource.) >>>>> >>>>> - cks >>>>> >>>>> >>>>> >>>>> >>>>> _________________________________________________ >>>>> OmniOS-discuss mailing list >>>>> OmniOS-discuss at lists.omniti.__com >>>>> >>>>> http://lists.omniti.com/__mailman/listinfo/omnios-__discuss >>>>> >>>>> >>>>> >>>>> _________________________________________________ >>>>> OmniOS-discuss mailing list >>>>> OmniOS-discuss at lists.omniti.__com >>>>> >>>>> http://lists.omniti.com/__mailman/listinfo/omnios-__discuss >>>>> >>>>> >>>>> >>>>> -- >>>>> OSN Online Service Nuernberg GmbH, Bucher Str. 78, 90408 Nuernberg >>>>> Tel: +49 911 39905-0 - Fax: +49 911 >>>>> 39905-55 - http://www.osn.de >>>>> HRB 15022 Nuernberg, USt-Id: DE189301263, GF: Joerg Goltermann >>>>> >>>>> >>>> >>>> -- >>>> OSN Online Service Nuernberg GmbH, Bucher Str. 78, 90408 Nuernberg >>>> Tel: +49 911 39905-0 - Fax: +49 911 39905-55 - http://www.osn.de >>>> HRB 15022 Nuernberg, USt-Id: DE189301263, GF: Joerg Goltermann >>> >>> illumos-developer | Archives | Modify Your Subscription >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From wverb73 at gmail.com Mon Mar 2 19:07:45 2015 From: wverb73 at gmail.com (W Verb) Date: Mon, 2 Mar 2015 11:07:45 -0800 Subject: [OmniOS-discuss] [developer] Re: The ixgbe driver, Lindsay Lohan, and the Greek economy In-Reply-To: <54F44602.5030705@osn.de> References: <20150221032559.727D07A0792@apps0.cs.toronto.edu> <54EAEFA8.4020101@osn.de> <54EB5392.6030900@osn.de> <279C084B-E76B-4F38-A7CF-D6CB37D06D8F@damore.org> <54F44602.5030705@osn.de> Message-ID: Hello all, I am not using layer 2 flow control. The switch carries line-rate 10G traffic without error. I think I have found the issue via lockstat. 
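The claim that the wire is clean can also be cross-checked from the OmniOS side with the tcp kstats, which is where retransmissions and out-of-order segments would show up. A rough sketch (statistic names vary a little between illumos releases, so the filter is deliberately loose):

  kstat -p -m tcp | egrep -i 'retrans|unorder|dup'

Sampling these counters before and after a slow multipath read shows whether the pauses coincide with TCP-level loss or reordering.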
The first lockstat is taken during a multipath read: lockstat -kWP sleep 30 Adaptive mutex spin: 21331 events in 30.020 seconds (711 events/sec) Count indv cuml rcnt nsec Hottest Lock Caller ------------------------------------------------------------------------------- 9306 44% 44% 0.00 1557 htable_mutex+0x370 htable_release 6307 23% 68% 0.00 1207 htable_mutex+0x108 htable_lookup 596 7% 75% 0.00 4100 0xffffff0931705188 cv_wait 349 5% 80% 0.00 4437 0xffffff0931705188 taskq_thread 704 2% 82% 0.00 995 0xffffff0935de3c50 dbuf_create The hash table being read here I would guess is the tcp connection hash table. When lockstat is run during a multipath write operation, I get: Adaptive mutex spin: 1097341 events in 30.016 seconds (36558 events/sec) Count indv cuml rcnt nsec Hottest Lock Caller ------------------------------------------------------------------------------- 210752 28% 28% 0.00 4781 0xffffff0931705188 taskq_thread 174471 22% 50% 0.00 4476 0xffffff0931705188 cv_wait 127183 10% 61% 0.00 2871 0xffffff096f29b510 zio_notify_parent 176066 10% 70% 0.00 1922 0xffffff0931705188 taskq_dispatch_ent 105134 9% 80% 0.00 3110 0xffffff096ffdbf10 zio_remove_child 67512 4% 83% 0.00 1938 0xffffff096f3db4b0 zio_add_child 45736 3% 86% 0.00 2239 0xffffff0935de3c50 dbuf_destroy 27781 3% 89% 0.00 3416 0xffffff0935de3c50 dbuf_create 38536 2% 91% 0.00 2122 0xffffff0935de3b70 dnode_rele 27841 2% 93% 0.00 2423 0xffffff0935de3b70 dnode_diduse_space 19020 2% 95% 0.00 3046 0xffffff09d9e305e0 dbuf_rele 14627 1% 96% 0.00 3632 dbuf_hash_table+0x4f8 dbuf_find Writes are not performing htable lookups, while reads are. -Warren V On Mon, Mar 2, 2015 at 3:14 AM, Joerg Goltermann wrote: > Hi, > > I would try *one* TPG which includes both interface addresses > and I would double check for packet drops on the Catalyst. > > The 3560 supports only receive flow control which means, that > a sending 10Gbit port can easily overload a 1Gbit port. > Do you have flow control enabled? > > - Joerg > > > On 02.03.2015 09:22, W Verb via illumos-developer wrote: > >> Hello Garrett, >> >> No, no 802.3ad going on in this config. >> >> Here is a basic schematic: >> >> https://drive.google.com/file/d/0BwyUMjibonYQVkVqcE5OQUJyUUU/ >> view?usp=sharing >> >> Here is the Nexenta MPIO iSCSI Setup Document that I used as a guide: >> >> https://drive.google.com/file/d/0BwyUMjibonYQbjEyUTBjN2tTNWM/ >> view?usp=sharing >> >> Note that I am using an MTU of 3000 on both the 10G and 1G NICs. The >> switch is set to allow 9148-byte frames, and I'm not seeing any >> errors/buffer overruns on the switch. >> >> Here is a screenshot of a packet capture from a read operation on the >> guest OS (from it's local drive, which is actually a VMDK file on the >> storage server). In this example, only a single 1G ESXi kernel interface >> (vmk1) is bound to the software iSCSI initiator. >> >> https://drive.google.com/file/d/0BwyUMjibonYQa2NYdXhpZkpkbU0/ >> view?usp=sharing >> >> Note that there's a nice, well-behaved window sizing process taking >> place. The ESXi decreases the scaled window by 11 or 12 for each ACK, >> then bumps it back up to 512. >> >> Here is a similar screenshot of a single-interface write operation: >> >> https://drive.google.com/file/d/0BwyUMjibonYQbU1RZHRnakxDSFU/ >> view?usp=sharing >> >> There are no pauses or gaps in the transmission rate in the >> single-interface transfers. >> >> >> In the next screenshots, I have enabled an additional 1G interface on >> the ESXi host, and bound it to the iSCSI initiator. 
The new interface is >> bound to a separate physical port, uses a different VLAN on the switch, >> and talks to a different 10G port on the storage server. >> >> First, let's look at a write operation on the guest OS, which happily >> pumps data at near-line-rate to the storage server. >> >> Here is a sequence number trace diagram. Note how the transfer has a >> nice, smooth increment rate over the entire transfer. >> >> https://drive.google.com/file/d/0BwyUMjibonYQWHNIa0drWnNxMmM/ >> view?usp=sharing >> >> Here are screenshots from packet captures on both 1G interfaces: >> >> https://drive.google.com/file/d/0BwyUMjibonYQRWhyVVQ4djNaU3c/ >> view?usp=sharing >> https://drive.google.com/file/d/0BwyUMjibonYQaTVjTEtTRloyR2c/ >> view?usp=sharing >> >> Note how we again see nice, smooth window adjustment, and no gaps in >> transmission. >> >> >> But now, let's look at the problematic two-interface Read operation. >> First, the sequence graph: >> >> https://drive.google.com/file/d/0BwyUMjibonYQTzdFVWdQMWZ6LUU/ >> view?usp=sharing >> >> As you can see, there are gaps and jumps in the transmission throughout >> the transfer. >> It is very illustrative to look at captures of the gaps, which are >> occurring on both interfaces: >> >> https://drive.google.com/file/d/0BwyUMjibonYQc0VISXN6eVFwQzg/ >> view?usp=sharing >> https://drive.google.com/file/d/0BwyUMjibonYQVFREUHp3TGFiUU0/ >> view?usp=sharing >> >> As you can see, there are ~.4 second pauses in transmission from the >> storage server, which kills the transfer rate. >> It's clear that the ESXi box ACKs the prior iSCSI operation to >> completion, then makes a new LUN request, which the storage server >> immediately replies to. The ESXi ACKs the response packet from the >> storage server, then waits...and waits....and waits... until eventually >> the storage server starts transmitting again. >> >> Because the pause happens while the ESXi client is waiting for a packet >> from the storage server, that tells me that the gaps are not an artifact >> of traffic being switched between both active interfaces, but are >> actually indicative of short hangs occurring on the server. >> >> Having a pause or two in transmission is no big deal, but in my case, it >> is happening constantly, and dropping my overall read transfer rate down >> to 20-60MB/s, which is slower than the single interface transfer rate >> (~90-100MB/s). >> >> Decreasing the MTU makes the pauses shorter, increasing them makes the >> pauses longer. >> >> Another interesting thing is that if I set the multipath io interval to >> 3 operations instead of 1, I get better throughput. In other words, the >> less frequently I swap IP addresses on my iSCSI requests from the ESXi >> unit, the fewer pauses I see. >> >> Basically, COMSTAR seems to choke each time an iSCSI request from a new >> IP arrives. >> >> Because the single interface transfer is near line rate, that tells me >> that the storage system (mpt_sas, zfs, etc) is working fine. It's only >> when multiple paths are attempted that iSCSI falls on its face during >> reads. >> >> All of these captures were taken without a cache device being attached >> to the storage zpool, so this isn't looking like some kind of ZFS ARC >> problem. As mentioned previously, local transfers to/from the zpool are >> showing ~300-500 MB/s rates over long transfers (10G+). >> >> -Warren V >> >> On Sun, Mar 1, 2015 at 9:11 PM, Garrett D'Amore > > wrote: >> >> I?m not sure I?ve followed properly. You have *two* interfaces. 
>> You are not trying to provision these in an aggr are you? As far as >> I?m aware, VMware does not support 802.3ad link aggregations. (Its >> possible that you can make it work with ESXi if you give the entire >> NIC to the guest ? but I?m skeptical.) The problem is that if you >> try to use link aggregation, some packets (up to half!) will be >> lost. TCP and other protocols fare poorly in this situation. >> >> Its possible I?ve totally misunderstood what you?re trying to do, in >> which case I apologize. >> >> The idle thing is a red-herring ? the cpu is waiting for work to do, >> probably because packets haven?t arrived (or where dropped by the >> hypervisor!) I wouldn?t read too much into that except that your >> network stack is in trouble. I?d look a bit more closely at the >> kstats for tcp ? I suspect you?ll see retransmits or out of order >> values that are unusually high ? if so this may help validate my >> theory above. >> >> - Garrett >> >> On Mar 1, 2015, at 9:03 PM, W Verb via illumos-developer >>> > >>> >>> wrote: >>> >>> Hello all, >>> >>> >>> Well, I no longer blame the ixgbe driver for the problems I'm seeing. >>> >>> >>> I tried Joerg's updated driver, which didn't improve the issue. So >>> I went back to the drawing board and rebuilt the server from scratch. >>> >>> What I noted is that if I have only a single 1-gig physical >>> interface active on the ESXi host, everything works as expected. >>> As soon as I enable two interfaces, I start seeing the performance >>> problems I've described. >>> >>> Response pauses from the server that I see in TCPdumps are still >>> leading me to believe the problem is delay on the server side, so >>> I ran a series of kernel dtraces and produced some flamegraphs. >>> >>> >>> This was taken during a read operation with two active 10G >>> interfaces on the server, with a single target being shared by two >>> tpgs- one tpg for each 10G physical port. The host device has two >>> 1G ports enabled, with VLANs separating the active ports into >>> 10G/1G pairs. ESXi is set to multipath using both VLANS with a >>> round-robin IO interval of 1. >>> >>> https://drive.google.com/file/d/0BwyUMjibonYQd3ZYOGh4d2pteGs/ >>> view?usp=sharing >>> >>> >>> This was taken during a write operation: >>> >>> https://drive.google.com/file/d/0BwyUMjibonYQMnBtU1Q2SXM2ams/ >>> view?usp=sharing >>> >>> >>> I then rebooted the server and disabled C-State, ACPI T-State, and >>> general EIST (Turbo boost) functionality in the CPU. >>> >>> I when I attempted to boot my guest VM, the iSCSI transfer >>> gradually ground to a halt during the boot loading process, and >>> the guest OS never did complete its boot process. >>> >>> Here is a flamegraph taken while iSCSI is slowly dying: >>> >>> https://drive.google.com/file/d/0BwyUMjibonYQM21JeFZPX3dZWTg/ >>> view?usp=sharing >>> >>> >>> I edited out cpu_idle_adaptive from the dtrace output and >>> regenerated the slowdown graph: >>> >>> https://drive.google.com/file/d/0BwyUMjibonYQbTVwV3NvXzlPS1E/ >>> view?usp=sharing >>> >>> >>> I then edited cpu_idle_adaptive out of the speedy write operation >>> and regenerated that graph: >>> >>> https://drive.google.com/file/d/0BwyUMjibonYQeWFYM0pCMDZ1X2s/ >>> view?usp=sharing >>> >>> >>> I have zero experience with interpreting flamegraphs, but the most >>> significant difference I see between the slow read example and the >>> fast write example is in unix`thread_start --> unix`idle. 
There's >>> a good chunk of "unix`i86_mwait" in the read example that is not >>> present in the write example at all. >>> >>> Disabling the l2arc cache device didn't make a difference, and I >>> had to reenable EIST support on the CPU to get my VMs to boot. >>> >>> I am seeing a variety of bug reports going back to 2010 regarding >>> excessive mwait operations, with the suggested solutions usually >>> being to set "cpupm enable poll-mode" in power.conf. That change >>> also had no effect on speed. >>> >>> -Warren V >>> >>> >>> >>> >>> -----Original Message----- >>> >>> From: Chris Siebenmann [mailto:cks at cs.toronto.edu] >>> >>> Sent: Monday, February 23, 2015 8:30 AM >>> >>> To: W Verb >>> >>> Cc: omnios-discuss at lists.omniti.com >>> ; cks at cs.toronto.edu >>> >>> >>> Subject: Re: [OmniOS-discuss] The ixgbe driver, Lindsay Lohan, and >>> the Greek economy >>> >>> >>> > Chris, thanks for your specific details. I'd appreciate it if you >>> >>> > could tell me which copper NIC you tried, as well as to pass on the >>> >>> > iSCSI tuning parameters. >>> >>> >>> Our copper NIC experience is with onboard X540-AT2 ports on >>> SuperMicro hardware (which have the guaranteed 10-20 msec lock >>> hold) and dual-port 82599EB TN cards (which have some sort of >>> driver/hardware failure under load that eventually leads to >>> 2-second lock holds). I can't recommend either with the current >>> driver; we had to revert to 1G networking in order to get stable >>> servers. >>> >>> >>> The iSCSI parameter modifications we do, across both initiators >>> and targets, are: >>> >>> >>> initialr2tno >>> >>> firstburstlength128k >>> >>> maxrecvdataseglen128k[only on Linux backends] >>> >>> maxxmitdataseglen128k[only on Linux backends] >>> >>> >>> The OmniOS initiator doesn't need tuning for more than the first >>> two parameters; on the Linux backends we tune up all four. My >>> extended thoughts on these tuning parameters and why we touch them >>> can be found >>> >>> here: >>> >>> >>> http://utcc.utoronto.ca/~cks/space/blog/tech/ >>> UnderstandingiSCSIProtocol >>> >>> http://utcc.utoronto.ca/~cks/space/blog/tech/LikelyISCSITuning >>> >>> >>> The short version is that these parameters probably only make a >>> small difference but their overall goal is to do 128KB ZFS reads >>> and writes in single iSCSI operations (although they will be >>> fragmented at the TCP >>> >>> layer) and to do iSCSI writes without a back-and-forth delay >>> between initiator and target (that's 'initialr2t no'). >>> >>> >>> I think basically everyone should use InitialR2T set to no and in >>> fact that it should be the software default. These days only >>> unusually limited iSCSI targets should need it to be otherwise and >>> they can change their setting for it (initiator and target must >>> both agree to it being 'yes', so either can veto it). >>> >>> >>> - cks >>> >>> >>> >>> On Mon, Feb 23, 2015 at 8:21 AM, Joerg Goltermann >> > wrote: >>> >>> Hi, >>> >>> I think your problem is caused by your link properties or your >>> switch settings. In general the standard ixgbe seems to perform >>> well. >>> >>> I had trouble after changing the default flow control settings >>> to "bi" >>> and this was my motivation to update the ixgbe driver a long >>> time ago. >>> After I have updated our systems to ixgbe 2.5.8 I never had any >>> problems .... >>> >>> Make sure your switch has support for jumbo frames and you use >>> the same mtu on all ports, otherwise the smallest will be used. >>> >>> What switch do you use? 
I can tell you nice horror stories about >>> different vendors.... >>> >>> - Joerg >>> >>> On 23.02.2015 10:31, W Verb wrote: >>> >>> Thank you Joerg, >>> >>> I've downloaded the package and will try it tomorrow. >>> >>> The only thing I can add at this point is that upon review >>> of my >>> testing, I may have performed my "pkg -u" between the >>> initial quad-gig >>> performance test and installing the 10G NIC. So this may >>> be a new >>> problem introduced in the latest updates. >>> >>> Those of you who are running 10G and have not upgraded to >>> the latest >>> kernel, etc, might want to do some additional testing >>> before running the >>> update. >>> >>> -Warren V >>> >>> On Mon, Feb 23, 2015 at 1:15 AM, Joerg Goltermann >>> >>> >> wrote: >>> >>> Hi, >>> >>> I remember there was a problem with the flow control >>> settings in the >>> ixgbe >>> driver, so I updated it a long time ago for our >>> internal servers to >>> 2.5.8. >>> Last weekend I integrated the latest changes from the >>> FreeBSD driver >>> to bring >>> the illumos ixgbe to 2.5.25 but I had no time to test >>> it, so it's >>> completely >>> untested! >>> >>> >>> If you would like to give the latest driver a try you >>> can fetch the >>> kernel modules from >>> https://cloud.osn.de/index.____php/s/Fb4so9RsNnXA7r9 >>> >>> >> > >>> >>> Clone your boot environment, place the modules in the >>> new environment >>> and update the boot-archive of the new BE. >>> >>> - Joerg >>> >>> >>> >>> >>> >>> On 23.02.2015 02:54, W Verb wrote: >>> >>> By the way, to those of you who have working >>> setups: please send me >>> your pool/volume settings, interface linkprops, >>> and any kernel >>> tuning >>> parameters you may have set. >>> >>> Thanks, >>> Warren V >>> >>> On Sat, Feb 21, 2015 at 7:59 AM, Schweiss, Chip >>> >>> >> >>> >>> wrote: >>> >>> I can't say I totally agree with your performance >>> assessment. I run Intel >>> X520 in all my OmniOS boxes. >>> >>> Here is a capture of nfssvrtop I made while >>> running many >>> storage vMotions >>> between two OmniOS boxes hosting NFS >>> datastores. This is a >>> 10 host VMware >>> cluster. Both OmniOS boxes are dual 10G >>> connected with >>> copper twin-ax to >>> the in rack Nexus 5010. >>> >>> VMware does 100% sync writes, I use ZeusRAM >>> SSDs for log >>> devices. 
>>> >>> -Chip >>> >>> 2014 Apr 24 08:05:51, load: 12.64, read: >>> 17330243 KB, >>> swrite: 15985 KB, >>> awrite: 1875455 KB >>> >>> Ver Client NFSOPS Reads >>> SWrites AWrites >>> Commits Rd_bw >>> SWr_bw AWr_bw Rd_t SWr_t AWr_t >>> Com_t Align% >>> >>> 4 10.28.17.105 0 0 >>> 0 0 >>> 0 0 >>> 0 0 0 0 0 0 0 >>> >>> 4 10.28.17.215 0 0 >>> 0 0 >>> 0 0 >>> 0 0 0 0 0 0 0 >>> >>> 4 10.28.17.213 0 0 >>> 0 0 >>> 0 0 >>> 0 0 0 0 0 0 0 >>> >>> 4 10.28.16.151 0 0 >>> 0 0 >>> 0 0 >>> 0 0 0 0 0 0 0 >>> >>> 4 all 1 0 >>> 0 0 >>> 0 0 >>> 0 0 0 0 0 0 0 >>> >>> 3 10.28.16.175 3 0 >>> 3 0 >>> 0 1 >>> 11 0 4806 48 0 0 >>> 85 >>> >>> 3 10.28.16.183 6 0 >>> 6 0 >>> 0 3 >>> 162 0 549 124 0 0 >>> 73 >>> >>> 3 10.28.16.180 11 0 >>> 10 0 >>> 0 3 >>> 27 0 776 89 0 0 >>> 67 >>> >>> 3 10.28.16.176 28 2 >>> 26 0 >>> 0 10 >>> 405 0 2572 198 0 0 >>> 100 >>> >>> 3 10.28.16.178 4606 4602 >>> 4 0 >>> 0 294534 >>> 3 0 723 49 0 0 99 >>> >>> 3 10.28.16.179 4905 4879 >>> 26 0 >>> 0 312208 >>> 311 0 735 271 0 0 >>> 99 >>> >>> 3 10.28.16.181 5515 5502 >>> 13 0 >>> 0 352107 >>> 77 0 89 87 0 0 >>> 99 >>> >>> 3 10.28.16.184 12095 12059 >>> 10 0 >>> 0 763014 >>> 39 0 249 147 0 0 >>> 99 >>> >>> 3 10.28.58.1 15401 6040 >>> 116 6354 >>> 53 191605 >>> 474 202346 192 96 144 83 >>> 99 >>> >>> 3 all 42574 33086 >>> 217 >>> 6354 53 1913488 >>> 1582 202300 348 138 153 105 >>> 99 >>> >>> >>> >>> >>> >>> On Fri, Feb 20, 2015 at 11:46 PM, W Verb >>> >>> >> >>> >> wrote: >>> >>> >>> Hello All, >>> >>> Thank you for your replies. >>> I tried a few things, and found the >>> following: >>> >>> 1: Disabling hyperthreading support in the >>> BIOS drops >>> performance overall >>> by a factor of 4. >>> 2: Disabling VT support also seems to have >>> some effect, >>> although it >>> appears to be minor. But this has the >>> amusing side >>> effect of fixing the >>> hangs I've been experiencing with fast >>> reboot. Probably >>> by disabling kvm. >>> 3: The performance tests are a bit tricky >>> to quantify >>> because of caching >>> effects. In fact, I'm not entirely sure >>> what is >>> happening here. It's just >>> best to describe what I'm seeing: >>> >>> The commands I'm using to test are >>> dd if=/dev/zero of=./test.dd bs=2M count=5000 >>> dd of=/dev/null if=./test.dd bs=2M count=5000 >>> The host vm is running Centos 6.6, and has >>> the latest >>> vmtools installed. >>> There is a host cache on an SSD local to >>> the host that >>> is also in place. >>> Disabling the host cache didn't >>> immediately have an >>> effect as far as I could >>> see. >>> >>> The host MTU set to 3000 on all iSCSI >>> interfaces for all >>> tests. >>> >>> Test 1: Right after reboot, with an ixgbe >>> MTU of 9000, >>> the write test >>> yields an average speed over three tests >>> of 137MB/s. The >>> read test yields an >>> average over three tests of 5MB/s. >>> >>> Test 2: After setting "ifconfig ixgbe0 mtu >>> 3000", the >>> write tests yield >>> 140MB/s, and the read tests yield 53MB/s. >>> It's important >>> to note here that >>> if I cut the read test short at only >>> 2-3GB, I get >>> results upwards of >>> 350MB/s, which I assume is local >>> cache-related distortion. >>> >>> Test 3: MTU of 1500. Read tests are up to >>> 156 MB/s. >>> Write tests yield >>> about 142MB/s. >>> Test 4: MTU of 1000: Read test at 182MB/s. >>> Test 5: MTU of 900: Read test at 130 MB/s. >>> Test 6: MTU of 1000: Read test at 160MB/s. >>> Write tests >>> are now >>> consistently at about 300MB/s. >>> Test 7: MTU of 1200: Read test at 124MB/s. >>> Test 8: MTU of 1000: Read test at 161MB/s. 
>>> Write at 261MB/s. >>> >>> A few final notes: >>> L1ARC grabs about 10GB of RAM during the >>> tests, so >>> there's definitely some >>> read caching going on. >>> The write operations are easier to observe >>> with iostat, >>> and I'm seeing io >>> rates that closely correlate with the >>> network write speeds. >>> >>> >>> Chris, thanks for your specific details. >>> I'd appreciate >>> it if you could >>> tell me which copper NIC you tried, as >>> well as to pass >>> on the iSCSI tuning >>> parameters. >>> >>> I've ordered an Intel EXPX9502AFXSR, which >>> uses the >>> 82598 chip instead of >>> the 82599 in the X520. If I get similar >>> results with my >>> fiber transcievers, >>> I'll see if I can get a hold of copper ones. >>> >>> But I should mention that I did indeed >>> look at PHY/MAC >>> error rates, and >>> they are nil. >>> >>> -Warren V >>> >>> On Fri, Feb 20, 2015 at 7:25 PM, Chris >>> Siebenmann >>> >> >> >>> >> >>> >>> wrote: >>> >>> >>> After installation and >>> configuration, I observed >>> all kinds of bad >>> behavior >>> in the network traffic between the >>> hosts and the >>> server. All of this >>> bad >>> behavior is traced to the ixgbe >>> driver on the >>> storage server. Without >>> going >>> into the full troubleshooting >>> process, here are >>> my takeaways: >>> >>> [...] >>> >>> For what it's worth, we managed to >>> achieve much >>> better line rates on >>> copper 10G ixgbe hardware of various >>> descriptions >>> between OmniOS >>> and CentOS 7 (I don't think we ever >>> tested OmniOS to >>> OmniOS). I don't >>> believe OmniOS could do TCP at full >>> line rate but I >>> think we managed 700+ >>> Mbytes/sec on both transmit and >>> receive and we got >>> basically disk-limited >>> speeds with iSCSI (across multiple >>> disks on >>> multi-disk mirrored pools, >>> OmniOS iSCSI initiator, Linux iSCSI >>> targets). >>> >>> I don't believe we did any specific >>> kernel tuning >>> (and in fact some of >>> our attempts to fiddle ixgbe driver >>> parameters blew >>> up in our face). >>> We did tune iSCSI connection >>> parameters to increase >>> various buffer >>> sizes so that ZFS could do even large >>> single >>> operations in single iSCSI >>> transactions. (More details available >>> if people are >>> interested.) >>> >>> 10: At the wire level, the speed >>> problems are >>> clearly due to pauses in >>> response time by omnios. At 9000 >>> byte frame >>> sizes, I see a good number >>> of duplicate ACKs and fast >>> retransmits during >>> read operations (when >>> omnios is transmitting). But below >>> about a >>> 4100-byte MTU on omnios >>> (which seems to correlate to >>> 4096-byte iSCSI >>> block transfers), the >>> transmission errors fade away and >>> we only see >>> the transmission pause >>> problem. >>> >>> >>> This is what really attracted my >>> attention. In >>> our OmniOS setup, our >>> specific Intel hardware had ixgbe >>> driver issues that >>> could cause >>> activity stalls during once-a-second >>> link heartbeat >>> checks. This >>> obviously had an effect at the TCP and >>> iSCSI layers. 
>>> My initial message >>> to illumos-developer sparked a >>> potentially >>> interesting discussion: >>> >>> >>> http://www.listbox.com/member/____archive/182179/2014/10/ >>> sort/____time_rev/page/16/entry/6:__405/__20141003125035:6357079A-__ >>> 4B1D-__11E4-A39C-D534381BA44D/ >>> >> 10/sort/__time_rev/page/16/entry/6:405/__20141003125035: >>> 6357079A-4B1D-__11E4-A39C-D534381BA44D/> >>> >>> >> __sort/time_rev/page/16/entry/6:__405/20141003125035: >>> 6357079A-__4B1D-11E4-A39C-D534381BA44D/ >>> >> sort/time_rev/page/16/entry/6:405/20141003125035:6357079A- >>> 4B1D-11E4-A39C-D534381BA44D/>> >>> >>> If you think this is a possibility in >>> your setup, >>> I've put the DTrace >>> script I used to hunt for this up on >>> the web: >>> >>> http://www.cs.toronto.edu/~____cks/src/omnios-ixgbe/ixgbe___ >>> __delay.d >>> >> delay.d> >>> >>> >> delay.d >>> >> delay.d>> >>> >>> This isn't the only potential source >>> of driver >>> stalls by any means, it's >>> just the one I found. You may also >>> want to look at >>> lockstat in general, >>> as information it reported is what led >>> us to look >>> specifically at the >>> ixgbe code here. >>> >>> (If you suspect kernel/driver issues, >>> lockstat >>> combined with kernel >>> source is a really excellent resource.) >>> >>> - cks >>> >>> >>> >>> >>> >>> ___________________________________________________ >>> OmniOS-discuss mailing list >>> OmniOS-discuss at lists.omniti >>> .____com >>> >> > >>> http://lists.omniti.com/____mailman/listinfo/omnios-____ >>> discuss >>> >> > >>> >>> >> > >>> >>> >>> ___________________________________________________ >>> OmniOS-discuss mailing list >>> OmniOS-discuss at lists.omniti >>> .____com >>> >> > >>> http://lists.omniti.com/____mailman/listinfo/omnios-____ >>> discuss >>> >> > >>> >>> >> > >>> >>> >>> -- >>> OSN Online Service Nuernberg GmbH, Bucher Str. 78, >>> 90408 Nuernberg >>> Tel: +49 911 39905-0 >>> - Fax: +49 911 >>> 39905-55 - >>> http://www.osn.de >>> HRB 15022 Nuernberg, USt-Id: DE189301263, GF: Joerg >>> Goltermann >>> >>> >>> >>> -- >>> OSN Online Service Nuernberg GmbH, Bucher Str. 78, 90408 >>> Nuernberg >>> Tel: +49 911 39905-0 - Fax: +49 >>> 911 39905-55 - http://www.osn.de >>> >>> HRB 15022 Nuernberg, USt-Id: DE189301263, GF: Joerg Goltermann >>> >>> >>> *illumos-developer* | Archives >>> >>> >> > >>> | Modify Your Subscription >>> [Powered by Listbox] >>> >>> >> >> *illumos-developer* | Archives >> >> | >> Modify >> > secret=21175123-d92578cc> >> Your Subscription [Powered by Listbox] >> >> > -- > OSN Online Service Nuernberg GmbH, Bucher Str. 78, 90408 Nuernberg > Tel: +49 911 39905-0 - Fax: +49 911 39905-55 - http://www.osn.de > HRB 15022 Nuernberg, USt-Id: DE189301263, GF: Joerg Goltermann > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: 

From danmcd at omniti.com  Mon Mar 2 19:15:45 2015
From: danmcd at omniti.com (Dan McDonald)
Date: Mon, 2 Mar 2015 14:15:45 -0500
Subject: [OmniOS-discuss] [developer] Re: The ixgbe driver, Lindsay Lohan, and the Greek economy
In-Reply-To: 
References: <20150221032559.727D07A0792@apps0.cs.toronto.edu> <54EAEFA8.4020101@osn.de> <54EB5392.6030900@osn.de> <279C084B-E76B-4F38-A7CF-D6CB37D06D8F@damore.org> <54F44602.5030705@osn.de>
Message-ID: 

> On Mar 2, 2015, at 2:07 PM, W Verb via illumos-developer wrote:
>
> Count indv cuml rcnt     nsec Hottest Lock           Caller
> -------------------------------------------------------------------------------
>  9306  44%  44% 0.00     1557 htable_mutex+0x370     htable_release
>  6307  23%  68% 0.00     1207 htable_mutex+0x108     htable_lookup
>   596   7%  75% 0.00     4100 0xffffff0931705188     cv_wait
>   349   5%  80% 0.00     4437 0xffffff0931705188     taskq_thread
>   704   2%  82% 0.00      995 0xffffff0935de3c50     dbuf_create

That has NOTHING to do with TCP.  It has everything to do with the Virtual Memory subsystem.  Here, see all the callers to htable_release():

http://src.illumos.org/source/search?q=&defs=&refs=htable_release&path=&hist=&project=illumos-gate

I think "VM thrashing" when I see that.

Dan

From garrett at damore.org  Mon Mar 2 19:30:18 2015
From: garrett at damore.org (Garrett D'Amore)
Date: Mon, 2 Mar 2015 11:30:18 -0800
Subject: [OmniOS-discuss] [developer] Re: The ixgbe driver, Lindsay Lohan, and the Greek economy
In-Reply-To: 
References: <20150221032559.727D07A0792@apps0.cs.toronto.edu> <54EAEFA8.4020101@osn.de> <54EB5392.6030900@osn.de> <279C084B-E76B-4F38-A7CF-D6CB37D06D8F@damore.org> <54F44602.5030705@osn.de>
Message-ID: <064C3FC8-43E0-4757-AADA-831303782A4C@damore.org>

Here's a theory.  You are using small (relatively) MTUs (3000 is less than the smallest ZFS block size.)  So, when you go multipathing this way, might a single upper layer transaction (ZFS block transfer request, or for that matter COMSTAR block request) get routed over different paths.  This sounds like a potentially pathological condition to me.

What happens if you increase the MTU to 9000?  Have you tried it?  I'm sort of thinking that this will permit each transaction to be issued in a single IP frame, which may alleviate certain tragic code paths.  (That said, I'm not sure how aware COMSTAR is of the IP MTU.  If it is ignorant, then it shouldn't matter *that* much, since TCP should do the right thing here and a single TCP stream should stick to a single underlying NIC.  But if COMSTAR is aware of the MTU, it may do some really screwball things as it tries to break requests up into single frames.)

Your read spin really looks like only about 22 msec of wait out of a total run of 30 sec.  (That's not *great*, but neither does it sound tragic.)  Your write is interesting because that looks like it is going a wildly different path.  You should be aware that the locks you see are *not* necessarily related in call order, but rather are ordered by instance count.  The write code path hitting the task_thread as hard as it does is really, really weird.  Something is pounding on a taskq lock super hard.  The number of taskq_dispatch_ent calls is interesting here.  I'm starting to wonder if it's something as stupid as a spin where if the taskq is "full" (max size reached), a caller just is spinning trying to dispatch jobs to the taskq.

The taskq_dispatch_ent code is super simple, and it should be almost impossible to have contention on that lock -
barring a thread spinning hard on taskq_dispatch (or taskq_dispatch_ent as I think is happening here). Looking at the various call sites, there are places in both COMSTAR (iscsit) and in ZFS where this could be coming from. To know which, we really need to have the back trace associated. lockstat can give this ? try giving ?-s 5? to give a short backtrace from this, that will probably give us a little more info about the guilty caller. :-) - Garrett > On Mar 2, 2015, at 11:07 AM, W Verb via illumos-developer wrote: > > Hello all, > I am not using layer 2 flow control. The switch carries line-rate 10G traffic without error. > > I think I have found the issue via lockstat. The first lockstat is taken during a multipath read: > > > lockstat -kWP sleep 30 > > Adaptive mutex spin: 21331 events in 30.020 seconds (711 events/sec) > > Count indv cuml rcnt nsec Hottest Lock Caller > ------------------------------------------------------------------------------- > 9306 44% 44% 0.00 1557 htable_mutex+0x370 htable_release > 6307 23% 68% 0.00 1207 htable_mutex+0x108 htable_lookup > 596 7% 75% 0.00 4100 0xffffff0931705188 cv_wait > 349 5% 80% 0.00 4437 0xffffff0931705188 taskq_thread > 704 2% 82% 0.00 995 0xffffff0935de3c50 dbuf_create > > The hash table being read here I would guess is the tcp connection hash table. > > When lockstat is run during a multipath write operation, I get: > > Adaptive mutex spin: 1097341 events in 30.016 seconds (36558 events/sec) > > Count indv cuml rcnt nsec Hottest Lock Caller > ------------------------------------------------------------------------------- > 210752 28% 28% 0.00 4781 0xffffff0931705188 taskq_thread > 174471 22% 50% 0.00 4476 0xffffff0931705188 cv_wait > 127183 10% 61% 0.00 2871 0xffffff096f29b510 zio_notify_parent > 176066 10% 70% 0.00 1922 0xffffff0931705188 taskq_dispatch_ent > 105134 9% 80% 0.00 3110 0xffffff096ffdbf10 zio_remove_child > 67512 4% 83% 0.00 1938 0xffffff096f3db4b0 zio_add_child > 45736 3% 86% 0.00 2239 0xffffff0935de3c50 dbuf_destroy > 27781 3% 89% 0.00 3416 0xffffff0935de3c50 dbuf_create > 38536 2% 91% 0.00 2122 0xffffff0935de3b70 dnode_rele > 27841 2% 93% 0.00 2423 0xffffff0935de3b70 dnode_diduse_space > 19020 2% 95% 0.00 3046 0xffffff09d9e305e0 dbuf_rele > 14627 1% 96% 0.00 3632 dbuf_hash_table+0x4f8 dbuf_find > > > > Writes are not performing htable lookups, while reads are. > > -Warren V > > > > > > > On Mon, Mar 2, 2015 at 3:14 AM, Joerg Goltermann > wrote: > Hi, > > I would try *one* TPG which includes both interface addresses > and I would double check for packet drops on the Catalyst. > > The 3560 supports only receive flow control which means, that > a sending 10Gbit port can easily overload a 1Gbit port. > Do you have flow control enabled? > > - Joerg > > > On 02.03.2015 09:22, W Verb via illumos-developer wrote: > Hello Garrett, > > No, no 802.3ad going on in this config. > > Here is a basic schematic: > > https://drive.google.com/file/d/0BwyUMjibonYQVkVqcE5OQUJyUUU/view?usp=sharing > > Here is the Nexenta MPIO iSCSI Setup Document that I used as a guide: > > https://drive.google.com/file/d/0BwyUMjibonYQbjEyUTBjN2tTNWM/view?usp=sharing > > Note that I am using an MTU of 3000 on both the 10G and 1G NICs. The > switch is set to allow 9148-byte frames, and I'm not seeing any > errors/buffer overruns on the switch. > > Here is a screenshot of a packet capture from a read operation on the > guest OS (from it's local drive, which is actually a VMDK file on the > storage server). 
In this example, only a single 1G ESXi kernel interface > (vmk1) is bound to the software iSCSI initiator. > > https://drive.google.com/file/d/0BwyUMjibonYQa2NYdXhpZkpkbU0/view?usp=sharing > > Note that there's a nice, well-behaved window sizing process taking > place. The ESXi decreases the scaled window by 11 or 12 for each ACK, > then bumps it back up to 512. > > Here is a similar screenshot of a single-interface write operation: > > https://drive.google.com/file/d/0BwyUMjibonYQbU1RZHRnakxDSFU/view?usp=sharing > > There are no pauses or gaps in the transmission rate in the > single-interface transfers. > > > In the next screenshots, I have enabled an additional 1G interface on > the ESXi host, and bound it to the iSCSI initiator. The new interface is > bound to a separate physical port, uses a different VLAN on the switch, > and talks to a different 10G port on the storage server. > > First, let's look at a write operation on the guest OS, which happily > pumps data at near-line-rate to the storage server. > > Here is a sequence number trace diagram. Note how the transfer has a > nice, smooth increment rate over the entire transfer. > > https://drive.google.com/file/d/0BwyUMjibonYQWHNIa0drWnNxMmM/view?usp=sharing > > Here are screenshots from packet captures on both 1G interfaces: > > https://drive.google.com/file/d/0BwyUMjibonYQRWhyVVQ4djNaU3c/view?usp=sharing > https://drive.google.com/file/d/0BwyUMjibonYQaTVjTEtTRloyR2c/view?usp=sharing > > Note how we again see nice, smooth window adjustment, and no gaps in > transmission. > > > But now, let's look at the problematic two-interface Read operation. > First, the sequence graph: > > https://drive.google.com/file/d/0BwyUMjibonYQTzdFVWdQMWZ6LUU/view?usp=sharing > > As you can see, there are gaps and jumps in the transmission throughout > the transfer. > It is very illustrative to look at captures of the gaps, which are > occurring on both interfaces: > > https://drive.google.com/file/d/0BwyUMjibonYQc0VISXN6eVFwQzg/view?usp=sharing > https://drive.google.com/file/d/0BwyUMjibonYQVFREUHp3TGFiUU0/view?usp=sharing > > As you can see, there are ~.4 second pauses in transmission from the > storage server, which kills the transfer rate. > It's clear that the ESXi box ACKs the prior iSCSI operation to > completion, then makes a new LUN request, which the storage server > immediately replies to. The ESXi ACKs the response packet from the > storage server, then waits...and waits....and waits... until eventually > the storage server starts transmitting again. > > Because the pause happens while the ESXi client is waiting for a packet > from the storage server, that tells me that the gaps are not an artifact > of traffic being switched between both active interfaces, but are > actually indicative of short hangs occurring on the server. > > Having a pause or two in transmission is no big deal, but in my case, it > is happening constantly, and dropping my overall read transfer rate down > to 20-60MB/s, which is slower than the single interface transfer rate > (~90-100MB/s). > > Decreasing the MTU makes the pauses shorter, increasing them makes the > pauses longer. > > Another interesting thing is that if I set the multipath io interval to > 3 operations instead of 1, I get better throughput. In other words, the > less frequently I swap IP addresses on my iSCSI requests from the ESXi > unit, the fewer pauses I see. > > Basically, COMSTAR seems to choke each time an iSCSI request from a new > IP arrives. 
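Joerg's earlier single-TPG suggestion maps to a couple of itadm commands on the COMSTAR side. A hedged sketch with placeholder portal addresses and target IQN (itadm(1M) is the authoritative reference):

  itadm create-tpg tpg-multi 10.0.10.1 10.0.20.1                    # one TPG holding both portal addresses
  itadm modify-target -t tpg-multi iqn.2010-09.org.example:target0  # bind the target to that single TPG
  itadm list-target -v                                              # confirm both portals are advertised

With a single TPG the initiator still sees two paths, but every login lands in the same target portal group, which removes one variable from the multipath picture.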
> > Because the single interface transfer is near line rate, that tells me > that the storage system (mpt_sas, zfs, etc) is working fine. It's only > when multiple paths are attempted that iSCSI falls on its face during reads. > > All of these captures were taken without a cache device being attached > to the storage zpool, so this isn't looking like some kind of ZFS ARC > problem. As mentioned previously, local transfers to/from the zpool are > showing ~300-500 MB/s rates over long transfers (10G+). > > -Warren V > > On Sun, Mar 1, 2015 at 9:11 PM, Garrett D'Amore > >> wrote: > > I?m not sure I?ve followed properly. You have *two* interfaces. > You are not trying to provision these in an aggr are you? As far as > I?m aware, VMware does not support 802.3ad link aggregations. (Its > possible that you can make it work with ESXi if you give the entire > NIC to the guest ? but I?m skeptical.) The problem is that if you > try to use link aggregation, some packets (up to half!) will be > lost. TCP and other protocols fare poorly in this situation. > > Its possible I?ve totally misunderstood what you?re trying to do, in > which case I apologize. > > The idle thing is a red-herring ? the cpu is waiting for work to do, > probably because packets haven?t arrived (or where dropped by the > hypervisor!) I wouldn?t read too much into that except that your > network stack is in trouble. I?d look a bit more closely at the > kstats for tcp ? I suspect you?ll see retransmits or out of order > values that are unusually high ? if so this may help validate my > theory above. > > - Garrett > > On Mar 1, 2015, at 9:03 PM, W Verb via illumos-developer > >> > > wrote: > > Hello all, > > > Well, I no longer blame the ixgbe driver for the problems I'm seeing. > > > I tried Joerg's updated driver, which didn't improve the issue. So > I went back to the drawing board and rebuilt the server from scratch. > > What I noted is that if I have only a single 1-gig physical > interface active on the ESXi host, everything works as expected. > As soon as I enable two interfaces, I start seeing the performance > problems I've described. > > Response pauses from the server that I see in TCPdumps are still > leading me to believe the problem is delay on the server side, so > I ran a series of kernel dtraces and produced some flamegraphs. > > > This was taken during a read operation with two active 10G > interfaces on the server, with a single target being shared by two > tpgs- one tpg for each 10G physical port. The host device has two > 1G ports enabled, with VLANs separating the active ports into > 10G/1G pairs. ESXi is set to multipath using both VLANS with a > round-robin IO interval of 1. > > https://drive.google.com/file/d/0BwyUMjibonYQd3ZYOGh4d2pteGs/view?usp=sharing > > > This was taken during a write operation: > > https://drive.google.com/file/d/0BwyUMjibonYQMnBtU1Q2SXM2ams/view?usp=sharing > > > I then rebooted the server and disabled C-State, ACPI T-State, and > general EIST (Turbo boost) functionality in the CPU. > > I when I attempted to boot my guest VM, the iSCSI transfer > gradually ground to a halt during the boot loading process, and > the guest OS never did complete its boot process. 
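The flamegraphs linked below were presumably produced with the DTrace profile provider plus Brendan Gregg's FlameGraph scripts; a minimal sketch of that workflow, with arbitrary output names (stackcollapse.pl and flamegraph.pl come from the FlameGraph toolkit):

  dtrace -x stackframes=100 -n 'profile-997 /arg0/ { @[stack()] = count(); } tick-30s { exit(0); }' -o kern.stacks
  stackcollapse.pl kern.stacks > kern.folded
  flamegraph.pl kern.folded > kern.svg

The /arg0/ predicate keeps only samples taken in kernel context, which is what makes the unix`thread_start and unix`idle towers visible.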
> > Here is a flamegraph taken while iSCSI is slowly dying: > > https://drive.google.com/file/d/0BwyUMjibonYQM21JeFZPX3dZWTg/view?usp=sharing > > > I edited out cpu_idle_adaptive from the dtrace output and > regenerated the slowdown graph: > > https://drive.google.com/file/d/0BwyUMjibonYQbTVwV3NvXzlPS1E/view?usp=sharing > > > I then edited cpu_idle_adaptive out of the speedy write operation > and regenerated that graph: > > https://drive.google.com/file/d/0BwyUMjibonYQeWFYM0pCMDZ1X2s/view?usp=sharing > > > I have zero experience with interpreting flamegraphs, but the most > significant difference I see between the slow read example and the > fast write example is in unix`thread_start --> unix`idle. There's > a good chunk of "unix`i86_mwait" in the read example that is not > present in the write example at all. > > Disabling the l2arc cache device didn't make a difference, and I > had to reenable EIST support on the CPU to get my VMs to boot. > > I am seeing a variety of bug reports going back to 2010 regarding > excessive mwait operations, with the suggested solutions usually > being to set "cpupm enable poll-mode" in power.conf. That change > also had no effect on speed. > > -Warren V > > > > > -----Original Message----- > > From: Chris Siebenmann [mailto:cks at cs.toronto.edu ] > > Sent: Monday, February 23, 2015 8:30 AM > > To: W Verb > > Cc: omnios-discuss at lists.omniti.com > >; cks at cs.toronto.edu > > > > Subject: Re: [OmniOS-discuss] The ixgbe driver, Lindsay Lohan, and > the Greek economy > > > > Chris, thanks for your specific details. I'd appreciate it if you > > > could tell me which copper NIC you tried, as well as to pass on the > > > iSCSI tuning parameters. > > > Our copper NIC experience is with onboard X540-AT2 ports on > SuperMicro hardware (which have the guaranteed 10-20 msec lock > hold) and dual-port 82599EB TN cards (which have some sort of > driver/hardware failure under load that eventually leads to > 2-second lock holds). I can't recommend either with the current > driver; we had to revert to 1G networking in order to get stable > servers. > > > The iSCSI parameter modifications we do, across both initiators > and targets, are: > > > initialr2tno > > firstburstlength128k > > maxrecvdataseglen128k[only on Linux backends] > > maxxmitdataseglen128k[only on Linux backends] > > > The OmniOS initiator doesn't need tuning for more than the first > two parameters; on the Linux backends we tune up all four. My > extended thoughts on these tuning parameters and why we touch them > can be found > > here: > > > http://utcc.utoronto.ca/~cks/space/blog/tech/UnderstandingiSCSIProtocol > > http://utcc.utoronto.ca/~cks/space/blog/tech/LikelyISCSITuning > > > The short version is that these parameters probably only make a > small difference but their overall goal is to do 128KB ZFS reads > and writes in single iSCSI operations (although they will be > fragmented at the TCP > > layer) and to do iSCSI writes without a back-and-forth delay > between initiator and target (that's 'initialr2t no'). > > > I think basically everyone should use InitialR2T set to no and in > fact that it should be the software default. These days only > unusually limited iSCSI targets should need it to be otherwise and > they can change their setting for it (initiator and target must > both agree to it being 'yes', so either can veto it). 
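On an illumos initiator, those first two parameters can be set per target with iscsiadm; the sketch below is from memory, with a placeholder IQN, and iscsiadm(1M) has the exact parameter names (the values remain subject to negotiation with the target):

  iscsiadm modify target-param -p initialr2t=no iqn.1992-08.com.example:backend0
  iscsiadm modify target-param -p firstburstlength=131072 iqn.1992-08.com.example:backend0
  iscsiadm list target-param -v iqn.1992-08.com.example:backend0    # compare configured vs negotiated values

In the setup being debugged here OmniOS is the target rather than the initiator, so the equivalent knobs would live in the ESXi software-iSCSI advanced settings instead.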
> > > - cks > > > > On Mon, Feb 23, 2015 at 8:21 AM, Joerg Goltermann > >> wrote: > > Hi, > > I think your problem is caused by your link properties or your > switch settings. In general the standard ixgbe seems to perform > well. > > I had trouble after changing the default flow control settings > to "bi" > and this was my motivation to update the ixgbe driver a long > time ago. > After I have updated our systems to ixgbe 2.5.8 I never had any > problems .... > > Make sure your switch has support for jumbo frames and you use > the same mtu on all ports, otherwise the smallest will be used. > > What switch do you use? I can tell you nice horror stories about > different vendors.... > > - Joerg > > On 23.02.2015 10:31, W Verb wrote: > > Thank you Joerg, > > I've downloaded the package and will try it tomorrow. > > The only thing I can add at this point is that upon review > of my > testing, I may have performed my "pkg -u" between the > initial quad-gig > performance test and installing the 10G NIC. So this may > be a new > problem introduced in the latest updates. > > Those of you who are running 10G and have not upgraded to > the latest > kernel, etc, might want to do some additional testing > before running the > update. > > -Warren V > > On Mon, Feb 23, 2015 at 1:15 AM, Joerg Goltermann > > > >>> wrote: > > Hi, > > I remember there was a problem with the flow control > settings in the > ixgbe > driver, so I updated it a long time ago for our > internal servers to > 2.5.8. > Last weekend I integrated the latest changes from the > FreeBSD driver > to bring > the illumos ixgbe to 2.5.25 but I had no time to test > it, so it's > completely > untested! > > > If you would like to give the latest driver a try you > can fetch the > kernel modules from > https://cloud.osn.de/index.____php/s/Fb4so9RsNnXA7r9 > > > > >> > > Clone your boot environment, place the modules in the > new environment > and update the boot-archive of the new BE. > > - Joerg > > > > > > On 23.02.2015 02:54, W Verb wrote: > > By the way, to those of you who have working > setups: please send me > your pool/volume settings, interface linkprops, > and any kernel > tuning > parameters you may have set. > > Thanks, > Warren V > > On Sat, Feb 21, 2015 at 7:59 AM, Schweiss, Chip > > > >>> > > wrote: > > I can't say I totally agree with your performance > assessment. I run Intel > X520 in all my OmniOS boxes. > > Here is a capture of nfssvrtop I made while > running many > storage vMotions > between two OmniOS boxes hosting NFS > datastores. This is a > 10 host VMware > cluster. Both OmniOS boxes are dual 10G > connected with > copper twin-ax to > the in rack Nexus 5010. > > VMware does 100% sync writes, I use ZeusRAM > SSDs for log > devices. 
> > -Chip > > 2014 Apr 24 08:05:51, load: 12.64, read: > 17330243 KB, > swrite: 15985 KB, > awrite: 1875455 KB > > Ver Client NFSOPS Reads > SWrites AWrites > Commits Rd_bw > SWr_bw AWr_bw Rd_t SWr_t AWr_t > Com_t Align% > > 4 10.28.17.105 0 0 > 0 0 > 0 0 > 0 0 0 0 0 0 0 > > 4 10.28.17.215 0 0 > 0 0 > 0 0 > 0 0 0 0 0 0 0 > > 4 10.28.17.213 0 0 > 0 0 > 0 0 > 0 0 0 0 0 0 0 > > 4 10.28.16.151 0 0 > 0 0 > 0 0 > 0 0 0 0 0 0 0 > > 4 all 1 0 > 0 0 > 0 0 > 0 0 0 0 0 0 0 > > 3 10.28.16.175 3 0 > 3 0 > 0 1 > 11 0 4806 48 0 0 85 > > 3 10.28.16.183 6 0 > 6 0 > 0 3 > 162 0 549 124 0 0 > 73 > > 3 10.28.16.180 11 0 > 10 0 > 0 3 > 27 0 776 89 0 0 67 > > 3 10.28.16.176 28 2 > 26 0 > 0 10 > 405 0 2572 198 0 0 > 100 > > 3 10.28.16.178 4606 4602 > 4 0 > 0 294534 > 3 0 723 49 0 0 99 > > 3 10.28.16.179 4905 4879 > 26 0 > 0 312208 > 311 0 735 271 0 0 > 99 > > 3 10.28.16.181 5515 5502 > 13 0 > 0 352107 > 77 0 89 87 0 0 99 > > 3 10.28.16.184 12095 12059 > 10 0 > 0 763014 > 39 0 249 147 0 0 99 > > 3 10.28.58.1 15401 6040 > 116 6354 > 53 191605 > 474 202346 192 96 144 83 > 99 > > 3 all 42574 33086 > 217 > 6354 53 1913488 > 1582 202300 348 138 153 105 > 99 > > > > > > On Fri, Feb 20, 2015 at 11:46 PM, W Verb > > > > > >>> wrote: > > > Hello All, > > Thank you for your replies. > I tried a few things, and found the following: > > 1: Disabling hyperthreading support in the > BIOS drops > performance overall > by a factor of 4. > 2: Disabling VT support also seems to have > some effect, > although it > appears to be minor. But this has the > amusing side > effect of fixing the > hangs I've been experiencing with fast > reboot. Probably > by disabling kvm. > 3: The performance tests are a bit tricky > to quantify > because of caching > effects. In fact, I'm not entirely sure > what is > happening here. It's just > best to describe what I'm seeing: > > The commands I'm using to test are > dd if=/dev/zero of=./test.dd bs=2M count=5000 > dd of=/dev/null if=./test.dd bs=2M count=5000 > The host vm is running Centos 6.6, and has > the latest > vmtools installed. > There is a host cache on an SSD local to > the host that > is also in place. > Disabling the host cache didn't > immediately have an > effect as far as I could > see. > > The host MTU set to 3000 on all iSCSI > interfaces for all > tests. > > Test 1: Right after reboot, with an ixgbe > MTU of 9000, > the write test > yields an average speed over three tests > of 137MB/s. The > read test yields an > average over three tests of 5MB/s. > > Test 2: After setting "ifconfig ixgbe0 mtu > 3000", the > write tests yield > 140MB/s, and the read tests yield 53MB/s. > It's important > to note here that > if I cut the read test short at only > 2-3GB, I get > results upwards of > 350MB/s, which I assume is local > cache-related distortion. > > Test 3: MTU of 1500. Read tests are up to > 156 MB/s. > Write tests yield > about 142MB/s. > Test 4: MTU of 1000: Read test at 182MB/s. > Test 5: MTU of 900: Read test at 130 MB/s. > Test 6: MTU of 1000: Read test at 160MB/s. > Write tests > are now > consistently at about 300MB/s. > Test 7: MTU of 1200: Read test at 124MB/s. > Test 8: MTU of 1000: Read test at 161MB/s. > Write at 261MB/s. > > A few final notes: > L1ARC grabs about 10GB of RAM during the > tests, so > there's definitely some > read caching going on. > The write operations are easier to observe > with iostat, > and I'm seeing io > rates that closely correlate with the > network write speeds. > > > Chris, thanks for your specific details. 
> I'd appreciate > it if you could > tell me which copper NIC you tried, as > well as to pass > on the iSCSI tuning > parameters. > > I've ordered an Intel EXPX9502AFXSR, which > uses the > 82598 chip instead of > the 82599 in the X520. If I get similar > results with my > fiber transcievers, > I'll see if I can get a hold of copper ones. > > But I should mention that I did indeed > look at PHY/MAC > error rates, and > they are nil. > > -Warren V > > On Fri, Feb 20, 2015 at 7:25 PM, Chris > Siebenmann > > > > > >>> > > wrote: > > > After installation and > configuration, I observed > all kinds of bad > behavior > in the network traffic between the > hosts and the > server. All of this > bad > behavior is traced to the ixgbe > driver on the > storage server. Without > going > into the full troubleshooting > process, here are > my takeaways: > > [...] > > For what it's worth, we managed to > achieve much > better line rates on > copper 10G ixgbe hardware of various > descriptions > between OmniOS > and CentOS 7 (I don't think we ever > tested OmniOS to > OmniOS). I don't > believe OmniOS could do TCP at full > line rate but I > think we managed 700+ > Mbytes/sec on both transmit and > receive and we got > basically disk-limited > speeds with iSCSI (across multiple > disks on > multi-disk mirrored pools, > OmniOS iSCSI initiator, Linux iSCSI > targets). > > I don't believe we did any specific > kernel tuning > (and in fact some of > our attempts to fiddle ixgbe driver > parameters blew > up in our face). > We did tune iSCSI connection > parameters to increase > various buffer > sizes so that ZFS could do even large > single > operations in single iSCSI > transactions. (More details available > if people are > interested.) > > 10: At the wire level, the speed > problems are > clearly due to pauses in > response time by omnios. At 9000 > byte frame > sizes, I see a good number > of duplicate ACKs and fast > retransmits during > read operations (when > omnios is transmitting). But below > about a > 4100-byte MTU on omnios > (which seems to correlate to > 4096-byte iSCSI > block transfers), the > transmission errors fade away and > we only see > the transmission pause > problem. > > > This is what really attracted my > attention. In > our OmniOS setup, our > specific Intel hardware had ixgbe > driver issues that > could cause > activity stalls during once-a-second > link heartbeat > checks. This > obviously had an effect at the TCP and > iSCSI layers. > My initial message > to illumos-developer sparked a potentially > interesting discussion: > > > http://www.listbox.com/member/____archive/182179/2014/10/sort/____time_rev/page/16/entry/6:__405/__20141003125035:6357079A-__4B1D-__11E4-A39C-D534381BA44D/ > > > > > >> > > If you think this is a possibility in > your setup, > I've put the DTrace > script I used to hunt for this up on > the web: > > http://www.cs.toronto.edu/~____cks/src/omnios-ixgbe/ixgbe_____delay.d > > > > > >> > > This isn't the only potential source > of driver > stalls by any means, it's > just the one I found. You may also > want to look at > lockstat in general, > as information it reported is what led > us to look > specifically at the > ixgbe code here. > > (If you suspect kernel/driver issues, > lockstat > combined with kernel > source is a really excellent resource.) 
> - cks
>
> _______________________________________________
> OmniOS-discuss mailing list
> OmniOS-discuss at lists.omniti.com
> http://lists.omniti.com/mailman/listinfo/omnios-discuss
>
> --
> OSN Online Service Nuernberg GmbH, Bucher Str. 78, 90408 Nuernberg
> Tel: +49 911 39905-0 - Fax: +49 911 39905-55 - http://www.osn.de
> HRB 15022 Nuernberg, USt-Id: DE189301263, GF: Joerg Goltermann
>
> illumos-developer | Archives | Modify Your Subscription

-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From wverb73 at gmail.com  Mon Mar 2 20:19:44 2015
From: wverb73 at gmail.com (W Verb)
Date: Mon, 2 Mar 2015 12:19:44 -0800
Subject: [OmniOS-discuss] [developer] Re: The ixgbe driver, Lindsay Lohan, and the Greek economy
In-Reply-To: 
References: <20150221032559.727D07A0792@apps0.cs.toronto.edu> <54EAEFA8.4020101@osn.de> <54EB5392.6030900@osn.de> <279C084B-E76B-4F38-A7CF-D6CB37D06D8F@damore.org> <54F44602.5030705@osn.de>
Message-ID: 

Hello,

vmstat seems pretty boring. Certainly nothing going to swap.

root at sanbox:/root# vmstat
 kthr      memory            page            disk          faults      cpu
 r b w   swap      free     re  mf pi po fr de sr  po ro s0 s2   in   sy   cs us sy id
 0 0 0 34631632 30728068   175 215  0  0  0  0 963 275  4  6 140 3301  796 6681  0  1 99

Here is the "taskq_dispatch_ent" output from "lockstat -s 5 -kWP sleep 30" during the "fast" write operation.
------------------------------------------------------------------------------- Count indv cuml rcnt nsec Hottest Lock Caller 50934 3% 79% 0.00 3437 0xffffff093145ba40 taskq_dispatch_ent nsec ------ Time Distribution ------ count Stack 128 | 7 spa_taskq_dispatch_ent 256 |@@ 4333 zio_taskq_dispatch 512 |@@ 3863 zio_issue_async 1024 |@@@@@ 9717 zio_execute 2048 |@@@@@@@@@ 15904 4096 |@@@@ 7595 8192 |@@ 4498 16384 |@ 2662 32768 |@ 1886 65536 | 434 131072 | 34 262144 | 1 ------------------------------------------------------------------------------- However, the truly "broken" function is a read operation: Top lock 1st try: ------------------------------------------------------------------------------- Count indv cuml rcnt nsec Hottest Lock Caller 474 15% 15% 0.00 7031 0xffffff093145b6f8 cv_wait nsec ------ Time Distribution ------ count Stack 256 |@ 29 taskq_thread_wait 512 |@@@@@@ 100 taskq_thread 1024 |@@@@ 72 thread_start 2048 |@@@@ 69 4096 |@@@ 51 8192 |@@ 47 16384 |@@ 44 32768 |@@ 32 65536 |@ 25 131072 | 5 ------------------------------------------------------------------------------- Top lock 2nd try: ------------------------------------------------------------------------------- Count indv cuml rcnt nsec Hottest Lock Caller 174 39% 39% 0.00 103909 0xffffff0943f116a0 dmu_zfetch_find nsec ------ Time Distribution ------ count Stack 2048 | 2 dmu_zfetch 4096 | 3 dbuf_read 8192 | 4 dmu_buf_hold_array_by_dnode 16384 | 3 dmu_buf_hold_array 32768 |@ 7 65536 |@@ 14 131072 |@@@@@@@@@@@@@@@@@@@@ 116 262144 |@@@ 19 524288 | 4 1048576 | 2 ------------------------------------------------------------------------------- Top lock 3rd try: ------------------------------------------------------------------------------- Count indv cuml rcnt nsec Hottest Lock Caller 283 55% 55% 0.00 94602 0xffffff0943ff5a68 dmu_zfetch_find nsec ------ Time Distribution ------ count Stack 512 | 1 dmu_zfetch 1024 | 1 dbuf_read 2048 | 0 dmu_buf_hold_array_by_dnode 4096 | 5 dmu_buf_hold_array 8192 | 2 16384 | 7 32768 | 4 65536 |@@@ 33 131072 |@@@@@@@@@@@@@@@@@@@@ 198 262144 |@@ 27 524288 | 2 1048576 | 3 ------------------------------------------------------------------------------- As for the MTU question- setting the MTU to 9000 makes read operations grind almost to a halt at 5MB/s transfer rate. -Warren V On Mon, Mar 2, 2015 at 11:30 AM, Garrett D'Amore wrote: > Here?s a theory. You are using small (relatively) MTUs (3000 is less than > the smallest ZFS block size.) So, when you go multipathing this way, might > a single upper layer transaction (ZFS block transfer request, or for that > matter COMSTAR block request) get routed over different paths. This sounds > like a potentially pathological condition to me. > > What happens if you increase the MTU to 9000? Have you tried it? I?m > sort of thinking that this will permit each transaction to be issued in a > single IP frame, which may alleviate certain tragic code paths. (That > said, I?m not sure how aware COMSTAR is of the IP MTU. If it is ignorant, > then it shouldn?t matter *that* much, since TCP should do the right thing > here and a single TCP stream should stick to a single underlying NIC. But > if COMSTAR is aware of the MTU, it may do some really screwball things as > it tries to break requests up into single frames.) > > Your read spin really looks like only about 22 msec of wait out of a total > run of 30 sec. (That?s not *great*, but neither does it sound tragic.) > Your write is interesting because that looks like it is going a wildly > different path. 
You should be aware that the locks you see are *not* > necessarily related in call order, but rather are ordered by instance > count. The write code path hitting the task_thread as hard as it does is > really, really weird. Something is pounding on a taskq lock super hard. > The number of taskq_dispatch_ent calls is interesting here. I?m starting > to wonder if it?s something as stupid as a spin where if the taskq is > ?full? (max size reached), a caller just is spinning trying to dispatch > jobs to the taskq. > > The taskq_dispatch_ent code is super simple, and it should be almost > impossible to have contention on that lock ? barring a thread spinning hard > on taskq_dispatch (or taskq_dispatch_ent as I think is happening here). > Looking at the various call sites, there are places in both COMSTAR > (iscsit) and in ZFS where this could be coming from. To know which, we > really need to have the back trace associated. > > lockstat can give this ? try giving ?-s 5? to give a short backtrace from > this, that will probably give us a little more info about the guilty > caller. :-) > > - Garrett > > On Mar 2, 2015, at 11:07 AM, W Verb via illumos-developer < > developer at lists.illumos.org> wrote: > > Hello all, > I am not using layer 2 flow control. The switch carries line-rate 10G > traffic without error. > > I think I have found the issue via lockstat. The first lockstat is taken > during a multipath read: > > > lockstat -kWP sleep 30 > > Adaptive mutex spin: 21331 events in 30.020 seconds (711 events/sec) > > Count indv cuml rcnt nsec Hottest Lock Caller > > ------------------------------------------------------------------------------- > 9306 44% 44% 0.00 1557 htable_mutex+0x370 htable_release > 6307 23% 68% 0.00 1207 htable_mutex+0x108 htable_lookup > 596 7% 75% 0.00 4100 0xffffff0931705188 cv_wait > 349 5% 80% 0.00 4437 0xffffff0931705188 taskq_thread > 704 2% 82% 0.00 995 0xffffff0935de3c50 dbuf_create > > The hash table being read here I would guess is the tcp connection hash > table. > > When lockstat is run during a multipath write operation, I get: > > Adaptive mutex spin: 1097341 events in 30.016 seconds (36558 events/sec) > > Count indv cuml rcnt nsec Hottest Lock Caller > > ------------------------------------------------------------------------------- > 210752 28% 28% 0.00 4781 0xffffff0931705188 taskq_thread > 174471 22% 50% 0.00 4476 0xffffff0931705188 cv_wait > 127183 10% 61% 0.00 2871 0xffffff096f29b510 zio_notify_parent > 176066 10% 70% 0.00 1922 0xffffff0931705188 taskq_dispatch_ent > 105134 9% 80% 0.00 3110 0xffffff096ffdbf10 zio_remove_child > 67512 4% 83% 0.00 1938 0xffffff096f3db4b0 zio_add_child > 45736 3% 86% 0.00 2239 0xffffff0935de3c50 dbuf_destroy > 27781 3% 89% 0.00 3416 0xffffff0935de3c50 dbuf_create > 38536 2% 91% 0.00 2122 0xffffff0935de3b70 dnode_rele > 27841 2% 93% 0.00 2423 0xffffff0935de3b70 dnode_diduse_space > 19020 2% 95% 0.00 3046 0xffffff09d9e305e0 dbuf_rele > 14627 1% 96% 0.00 3632 dbuf_hash_table+0x4f8 dbuf_find > > > > Writes are not performing htable lookups, while reads are. > > -Warren V > > > > > > > On Mon, Mar 2, 2015 at 3:14 AM, Joerg Goltermann wrote: > >> Hi, >> >> I would try *one* TPG which includes both interface addresses >> and I would double check for packet drops on the Catalyst. >> >> The 3560 supports only receive flow control which means, that >> a sending 10Gbit port can easily overload a 1Gbit port. >> Do you have flow control enabled? 
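For the OmniOS side of that question, the per-link flow-control setting can be checked and changed with dladm. A minimal sketch, assuming the 10G ports are named ixgbe0 and ixgbe1 (the link names are an assumption, not confirmed in this thread):

    # Show the current flow-control mode for each 10G port
    # (possible values are no, tx, rx, and bi).
    dladm show-linkprop -p flowctrl ixgbe0
    dladm show-linkprop -p flowctrl ixgbe1

    # Turn flow control off on both ports.
    dladm set-linkprop -p flowctrl=no ixgbe0
    dladm set-linkprop -p flowctrl=no ixgbe1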
>> >> - Joerg >> >> >> On 02.03.2015 09:22, W Verb via illumos-developer wrote: >> >>> Hello Garrett, >>> >>> No, no 802.3ad going on in this config. >>> >>> Here is a basic schematic: >>> >>> https://drive.google.com/file/d/0BwyUMjibonYQVkVqcE5OQUJyUUU/ >>> view?usp=sharing >>> >>> Here is the Nexenta MPIO iSCSI Setup Document that I used as a guide: >>> >>> https://drive.google.com/file/d/0BwyUMjibonYQbjEyUTBjN2tTNWM/ >>> view?usp=sharing >>> >>> Note that I am using an MTU of 3000 on both the 10G and 1G NICs. The >>> switch is set to allow 9148-byte frames, and I'm not seeing any >>> errors/buffer overruns on the switch. >>> >>> Here is a screenshot of a packet capture from a read operation on the >>> guest OS (from it's local drive, which is actually a VMDK file on the >>> storage server). In this example, only a single 1G ESXi kernel interface >>> (vmk1) is bound to the software iSCSI initiator. >>> >>> https://drive.google.com/file/d/0BwyUMjibonYQa2NYdXhpZkpkbU0/ >>> view?usp=sharing >>> >>> Note that there's a nice, well-behaved window sizing process taking >>> place. The ESXi decreases the scaled window by 11 or 12 for each ACK, >>> then bumps it back up to 512. >>> >>> Here is a similar screenshot of a single-interface write operation: >>> >>> https://drive.google.com/file/d/0BwyUMjibonYQbU1RZHRnakxDSFU/ >>> view?usp=sharing >>> >>> There are no pauses or gaps in the transmission rate in the >>> single-interface transfers. >>> >>> >>> In the next screenshots, I have enabled an additional 1G interface on >>> the ESXi host, and bound it to the iSCSI initiator. The new interface is >>> bound to a separate physical port, uses a different VLAN on the switch, >>> and talks to a different 10G port on the storage server. >>> >>> First, let's look at a write operation on the guest OS, which happily >>> pumps data at near-line-rate to the storage server. >>> >>> Here is a sequence number trace diagram. Note how the transfer has a >>> nice, smooth increment rate over the entire transfer. >>> >>> https://drive.google.com/file/d/0BwyUMjibonYQWHNIa0drWnNxMmM/ >>> view?usp=sharing >>> >>> Here are screenshots from packet captures on both 1G interfaces: >>> >>> https://drive.google.com/file/d/0BwyUMjibonYQRWhyVVQ4djNaU3c/ >>> view?usp=sharing >>> https://drive.google.com/file/d/0BwyUMjibonYQaTVjTEtTRloyR2c/ >>> view?usp=sharing >>> >>> Note how we again see nice, smooth window adjustment, and no gaps in >>> transmission. >>> >>> >>> But now, let's look at the problematic two-interface Read operation. >>> First, the sequence graph: >>> >>> https://drive.google.com/file/d/0BwyUMjibonYQTzdFVWdQMWZ6LUU/ >>> view?usp=sharing >>> >>> As you can see, there are gaps and jumps in the transmission throughout >>> the transfer. >>> It is very illustrative to look at captures of the gaps, which are >>> occurring on both interfaces: >>> >>> https://drive.google.com/file/d/0BwyUMjibonYQc0VISXN6eVFwQzg/ >>> view?usp=sharing >>> https://drive.google.com/file/d/0BwyUMjibonYQVFREUHp3TGFiUU0/ >>> view?usp=sharing >>> >>> As you can see, there are ~.4 second pauses in transmission from the >>> storage server, which kills the transfer rate. >>> It's clear that the ESXi box ACKs the prior iSCSI operation to >>> completion, then makes a new LUN request, which the storage server >>> immediately replies to. The ESXi ACKs the response packet from the >>> storage server, then waits...and waits....and waits... until eventually >>> the storage server starts transmitting again. 
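One way to confirm that the pauses originate on the server rather than in the switch is to capture on the OmniOS box itself and compare timestamps with the ESXi-side capture. A sketch only; the interface name, the initiator address, and the default iSCSI port 3260 are assumptions:

    # Capture iSCSI traffic on one 10G port into a file that Wireshark
    # (or "snoop -i") can read later.
    snoop -r -d ixgbe0 -o /tmp/iscsi-read.cap host 10.10.10.21 and port 3260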
>>> >>> Because the pause happens while the ESXi client is waiting for a packet >>> from the storage server, that tells me that the gaps are not an artifact >>> of traffic being switched between both active interfaces, but are >>> actually indicative of short hangs occurring on the server. >>> >>> Having a pause or two in transmission is no big deal, but in my case, it >>> is happening constantly, and dropping my overall read transfer rate down >>> to 20-60MB/s, which is slower than the single interface transfer rate >>> (~90-100MB/s). >>> >>> Decreasing the MTU makes the pauses shorter, increasing them makes the >>> pauses longer. >>> >>> Another interesting thing is that if I set the multipath io interval to >>> 3 operations instead of 1, I get better throughput. In other words, the >>> less frequently I swap IP addresses on my iSCSI requests from the ESXi >>> unit, the fewer pauses I see. >>> >>> Basically, COMSTAR seems to choke each time an iSCSI request from a new >>> IP arrives. >>> >>> Because the single interface transfer is near line rate, that tells me >>> that the storage system (mpt_sas, zfs, etc) is working fine. It's only >>> when multiple paths are attempted that iSCSI falls on its face during >>> reads. >>> >>> All of these captures were taken without a cache device being attached >>> to the storage zpool, so this isn't looking like some kind of ZFS ARC >>> problem. As mentioned previously, local transfers to/from the zpool are >>> showing ~300-500 MB/s rates over long transfers (10G+). >>> >>> -Warren V >>> >>> On Sun, Mar 1, 2015 at 9:11 PM, Garrett D'Amore >> > wrote: >>> >>> I?m not sure I?ve followed properly. You have *two* interfaces. >>> You are not trying to provision these in an aggr are you? As far as >>> I?m aware, VMware does not support 802.3ad link aggregations. (Its >>> possible that you can make it work with ESXi if you give the entire >>> NIC to the guest ? but I?m skeptical.) The problem is that if you >>> try to use link aggregation, some packets (up to half!) will be >>> lost. TCP and other protocols fare poorly in this situation. >>> >>> Its possible I?ve totally misunderstood what you?re trying to do, in >>> which case I apologize. >>> >>> The idle thing is a red-herring ? the cpu is waiting for work to do, >>> probably because packets haven?t arrived (or where dropped by the >>> hypervisor!) I wouldn?t read too much into that except that your >>> network stack is in trouble. I?d look a bit more closely at the >>> kstats for tcp ? I suspect you?ll see retransmits or out of order >>> values that are unusually high ? if so this may help validate my >>> theory above. >>> >>> - Garrett >>> >>> On Mar 1, 2015, at 9:03 PM, W Verb via illumos-developer >>>> > >>>> >>>> wrote: >>>> >>>> Hello all, >>>> >>>> >>>> Well, I no longer blame the ixgbe driver for the problems I'm >>>> seeing. >>>> >>>> >>>> I tried Joerg's updated driver, which didn't improve the issue. So >>>> I went back to the drawing board and rebuilt the server from >>>> scratch. >>>> >>>> What I noted is that if I have only a single 1-gig physical >>>> interface active on the ESXi host, everything works as expected. >>>> As soon as I enable two interfaces, I start seeing the performance >>>> problems I've described. >>>> >>>> Response pauses from the server that I see in TCPdumps are still >>>> leading me to believe the problem is delay on the server side, so >>>> I ran a series of kernel dtraces and produced some flamegraphs. 
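For reference, a common way to produce kernel flamegraphs like the ones below is a profile-provider DTrace sample folded through Brendan Gregg's FlameGraph scripts. This is a sketch, not necessarily the exact commands used here; the sampling rate, duration, and file names are arbitrary, and stackcollapse.pl/flamegraph.pl come from the separate FlameGraph repository:

    # Sample on-CPU kernel stacks at 997 Hz for 30 seconds.
    dtrace -x stackframes=100 -n '
        profile-997 /arg0/ { @[stack()] = count(); }
        tick-30s { exit(0); }' -o /tmp/kernel.stacks

    # Fold the aggregated stacks and render an interactive SVG.
    stackcollapse.pl /tmp/kernel.stacks > /tmp/kernel.folded
    flamegraph.pl /tmp/kernel.folded > /tmp/kernel-read.svg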
>>>> >>>> >>>> This was taken during a read operation with two active 10G >>>> interfaces on the server, with a single target being shared by two >>>> tpgs- one tpg for each 10G physical port. The host device has two >>>> 1G ports enabled, with VLANs separating the active ports into >>>> 10G/1G pairs. ESXi is set to multipath using both VLANS with a >>>> round-robin IO interval of 1. >>>> >>>> https://drive.google.com/file/d/0BwyUMjibonYQd3ZYOGh4d2pteGs/ >>>> view?usp=sharing >>>> >>>> >>>> This was taken during a write operation: >>>> >>>> https://drive.google.com/file/d/0BwyUMjibonYQMnBtU1Q2SXM2ams/ >>>> view?usp=sharing >>>> >>>> >>>> I then rebooted the server and disabled C-State, ACPI T-State, and >>>> general EIST (Turbo boost) functionality in the CPU. >>>> >>>> I when I attempted to boot my guest VM, the iSCSI transfer >>>> gradually ground to a halt during the boot loading process, and >>>> the guest OS never did complete its boot process. >>>> >>>> Here is a flamegraph taken while iSCSI is slowly dying: >>>> >>>> https://drive.google.com/file/d/0BwyUMjibonYQM21JeFZPX3dZWTg/ >>>> view?usp=sharing >>>> >>>> >>>> I edited out cpu_idle_adaptive from the dtrace output and >>>> regenerated the slowdown graph: >>>> >>>> https://drive.google.com/file/d/0BwyUMjibonYQbTVwV3NvXzlPS1E/ >>>> view?usp=sharing >>>> >>>> >>>> I then edited cpu_idle_adaptive out of the speedy write operation >>>> and regenerated that graph: >>>> >>>> https://drive.google.com/file/d/0BwyUMjibonYQeWFYM0pCMDZ1X2s/ >>>> view?usp=sharing >>>> >>>> >>>> I have zero experience with interpreting flamegraphs, but the most >>>> significant difference I see between the slow read example and the >>>> fast write example is in unix`thread_start --> unix`idle. There's >>>> a good chunk of "unix`i86_mwait" in the read example that is not >>>> present in the write example at all. >>>> >>>> Disabling the l2arc cache device didn't make a difference, and I >>>> had to reenable EIST support on the CPU to get my VMs to boot. >>>> >>>> I am seeing a variety of bug reports going back to 2010 regarding >>>> excessive mwait operations, with the suggested solutions usually >>>> being to set "cpupm enable poll-mode" in power.conf. That change >>>> also had no effect on speed. >>>> >>>> -Warren V >>>> >>>> >>>> >>>> >>>> -----Original Message----- >>>> >>>> From: Chris Siebenmann [mailto:cks at cs.toronto.edu] >>>> >>>> Sent: Monday, February 23, 2015 8:30 AM >>>> >>>> To: W Verb >>>> >>>> Cc: omnios-discuss at lists.omniti.com >>>> ; cks at cs.toronto.edu >>>> >>>> >>>> Subject: Re: [OmniOS-discuss] The ixgbe driver, Lindsay Lohan, and >>>> the Greek economy >>>> >>>> >>>> > Chris, thanks for your specific details. I'd appreciate it if you >>>> >>>> > could tell me which copper NIC you tried, as well as to pass on >>>> the >>>> >>>> > iSCSI tuning parameters. >>>> >>>> >>>> Our copper NIC experience is with onboard X540-AT2 ports on >>>> SuperMicro hardware (which have the guaranteed 10-20 msec lock >>>> hold) and dual-port 82599EB TN cards (which have some sort of >>>> driver/hardware failure under load that eventually leads to >>>> 2-second lock holds). I can't recommend either with the current >>>> driver; we had to revert to 1G networking in order to get stable >>>> servers. 
>>>> >>>> >>>> The iSCSI parameter modifications we do, across both initiators >>>> and targets, are: >>>> >>>> >>>> initialr2tno >>>> >>>> firstburstlength128k >>>> >>>> maxrecvdataseglen128k[only on Linux backends] >>>> >>>> maxxmitdataseglen128k[only on Linux backends] >>>> >>>> >>>> The OmniOS initiator doesn't need tuning for more than the first >>>> two parameters; on the Linux backends we tune up all four. My >>>> extended thoughts on these tuning parameters and why we touch them >>>> can be found >>>> >>>> here: >>>> >>>> >>>> http://utcc.utoronto.ca/~cks/space/blog/tech/ >>>> UnderstandingiSCSIProtocol >>>> >>>> http://utcc.utoronto.ca/~cks/space/blog/tech/LikelyISCSITuning >>>> >>>> >>>> The short version is that these parameters probably only make a >>>> small difference but their overall goal is to do 128KB ZFS reads >>>> and writes in single iSCSI operations (although they will be >>>> fragmented at the TCP >>>> >>>> layer) and to do iSCSI writes without a back-and-forth delay >>>> between initiator and target (that's 'initialr2t no'). >>>> >>>> >>>> I think basically everyone should use InitialR2T set to no and in >>>> fact that it should be the software default. These days only >>>> unusually limited iSCSI targets should need it to be otherwise and >>>> they can change their setting for it (initiator and target must >>>> both agree to it being 'yes', so either can veto it). >>>> >>>> >>>> - cks >>>> >>>> >>>> >>>> On Mon, Feb 23, 2015 at 8:21 AM, Joerg Goltermann >>> > wrote: >>>> >>>> Hi, >>>> >>>> I think your problem is caused by your link properties or your >>>> switch settings. In general the standard ixgbe seems to perform >>>> well. >>>> >>>> I had trouble after changing the default flow control settings >>>> to "bi" >>>> and this was my motivation to update the ixgbe driver a long >>>> time ago. >>>> After I have updated our systems to ixgbe 2.5.8 I never had any >>>> problems .... >>>> >>>> Make sure your switch has support for jumbo frames and you use >>>> the same mtu on all ports, otherwise the smallest will be used. >>>> >>>> What switch do you use? I can tell you nice horror stories about >>>> different vendors.... >>>> >>>> - Joerg >>>> >>>> On 23.02.2015 10:31, W Verb wrote: >>>> >>>> Thank you Joerg, >>>> >>>> I've downloaded the package and will try it tomorrow. >>>> >>>> The only thing I can add at this point is that upon review >>>> of my >>>> testing, I may have performed my "pkg -u" between the >>>> initial quad-gig >>>> performance test and installing the 10G NIC. So this may >>>> be a new >>>> problem introduced in the latest updates. >>>> >>>> Those of you who are running 10G and have not upgraded to >>>> the latest >>>> kernel, etc, might want to do some additional testing >>>> before running the >>>> update. >>>> >>>> -Warren V >>>> >>>> On Mon, Feb 23, 2015 at 1:15 AM, Joerg Goltermann >>>> >>>> >> wrote: >>>> >>>> Hi, >>>> >>>> I remember there was a problem with the flow control >>>> settings in the >>>> ixgbe >>>> driver, so I updated it a long time ago for our >>>> internal servers to >>>> 2.5.8. >>>> Last weekend I integrated the latest changes from the >>>> FreeBSD driver >>>> to bring >>>> the illumos ixgbe to 2.5.25 but I had no time to test >>>> it, so it's >>>> completely >>>> untested! 
>>>> >>>> >>>> If you would like to give the latest driver a try you >>>> can fetch the >>>> kernel modules from >>>> https://cloud.osn.de/index.____php/s/Fb4so9RsNnXA7r9 >>>> >>>> >>> > >>>> >>>> Clone your boot environment, place the modules in the >>>> new environment >>>> and update the boot-archive of the new BE. >>>> >>>> - Joerg >>>> >>>> >>>> >>>> >>>> >>>> On 23.02.2015 02:54, W Verb wrote: >>>> >>>> By the way, to those of you who have working >>>> setups: please send me >>>> your pool/volume settings, interface linkprops, >>>> and any kernel >>>> tuning >>>> parameters you may have set. >>>> >>>> Thanks, >>>> Warren V >>>> >>>> On Sat, Feb 21, 2015 at 7:59 AM, Schweiss, Chip >>>> >>>> >> >>>> >>>> wrote: >>>> >>>> I can't say I totally agree with your >>>> performance >>>> assessment. I run Intel >>>> X520 in all my OmniOS boxes. >>>> >>>> Here is a capture of nfssvrtop I made while >>>> running many >>>> storage vMotions >>>> between two OmniOS boxes hosting NFS >>>> datastores. This is a >>>> 10 host VMware >>>> cluster. Both OmniOS boxes are dual 10G >>>> connected with >>>> copper twin-ax to >>>> the in rack Nexus 5010. >>>> >>>> VMware does 100% sync writes, I use ZeusRAM >>>> SSDs for log >>>> devices. >>>> >>>> -Chip >>>> >>>> 2014 Apr 24 08:05:51, load: 12.64, read: >>>> 17330243 KB, >>>> swrite: 15985 KB, >>>> awrite: 1875455 KB >>>> >>>> Ver Client NFSOPS Reads >>>> SWrites AWrites >>>> Commits Rd_bw >>>> SWr_bw AWr_bw Rd_t SWr_t AWr_t >>>> Com_t Align% >>>> >>>> 4 10.28.17.105 0 0 >>>> 0 0 >>>> 0 0 >>>> 0 0 0 0 0 0 >>>> 0 >>>> >>>> 4 10.28.17.215 0 0 >>>> 0 0 >>>> 0 0 >>>> 0 0 0 0 0 0 >>>> 0 >>>> >>>> 4 10.28.17.213 0 0 >>>> 0 0 >>>> 0 0 >>>> 0 0 0 0 0 0 >>>> 0 >>>> >>>> 4 10.28.16.151 0 0 >>>> 0 0 >>>> 0 0 >>>> 0 0 0 0 0 0 >>>> 0 >>>> >>>> 4 all 1 0 >>>> 0 0 >>>> 0 0 >>>> 0 0 0 0 0 0 >>>> 0 >>>> >>>> 3 10.28.16.175 3 0 >>>> 3 0 >>>> 0 1 >>>> 11 0 4806 48 0 0 >>>> 85 >>>> >>>> 3 10.28.16.183 6 0 >>>> 6 0 >>>> 0 3 >>>> 162 0 549 124 0 0 >>>> 73 >>>> >>>> 3 10.28.16.180 11 0 >>>> 10 0 >>>> 0 3 >>>> 27 0 776 89 0 0 >>>> 67 >>>> >>>> 3 10.28.16.176 28 2 >>>> 26 0 >>>> 0 10 >>>> 405 0 2572 198 0 0 >>>> 100 >>>> >>>> 3 10.28.16.178 4606 4602 >>>> 4 0 >>>> 0 294534 >>>> 3 0 723 49 0 0 >>>> 99 >>>> >>>> 3 10.28.16.179 4905 4879 >>>> 26 0 >>>> 0 312208 >>>> 311 0 735 271 0 0 >>>> 99 >>>> >>>> 3 10.28.16.181 5515 5502 >>>> 13 0 >>>> 0 352107 >>>> 77 0 89 87 0 0 >>>> 99 >>>> >>>> 3 10.28.16.184 12095 12059 >>>> 10 0 >>>> 0 763014 >>>> 39 0 249 147 0 0 >>>> 99 >>>> >>>> 3 10.28.58.1 15401 6040 >>>> 116 6354 >>>> 53 191605 >>>> 474 202346 192 96 144 83 >>>> 99 >>>> >>>> 3 all 42574 33086 >>>> 217 >>>> 6354 53 1913488 >>>> 1582 202300 348 138 153 105 >>>> 99 >>>> >>>> >>>> >>>> >>>> >>>> On Fri, Feb 20, 2015 at 11:46 PM, W Verb >>>> >>>> >>> >>>> >> wrote: >>>> >>>> >>>> Hello All, >>>> >>>> Thank you for your replies. >>>> I tried a few things, and found the >>>> following: >>>> >>>> 1: Disabling hyperthreading support in the >>>> BIOS drops >>>> performance overall >>>> by a factor of 4. >>>> 2: Disabling VT support also seems to have >>>> some effect, >>>> although it >>>> appears to be minor. But this has the >>>> amusing side >>>> effect of fixing the >>>> hangs I've been experiencing with fast >>>> reboot. Probably >>>> by disabling kvm. >>>> 3: The performance tests are a bit tricky >>>> to quantify >>>> because of caching >>>> effects. In fact, I'm not entirely sure >>>> what is >>>> happening here. 
It's just >>>> best to describe what I'm seeing: >>>> >>>> The commands I'm using to test are >>>> dd if=/dev/zero of=./test.dd bs=2M >>>> count=5000 >>>> dd of=/dev/null if=./test.dd bs=2M >>>> count=5000 >>>> The host vm is running Centos 6.6, and has >>>> the latest >>>> vmtools installed. >>>> There is a host cache on an SSD local to >>>> the host that >>>> is also in place. >>>> Disabling the host cache didn't >>>> immediately have an >>>> effect as far as I could >>>> see. >>>> >>>> The host MTU set to 3000 on all iSCSI >>>> interfaces for all >>>> tests. >>>> >>>> Test 1: Right after reboot, with an ixgbe >>>> MTU of 9000, >>>> the write test >>>> yields an average speed over three tests >>>> of 137MB/s. The >>>> read test yields an >>>> average over three tests of 5MB/s. >>>> >>>> Test 2: After setting "ifconfig ixgbe0 mtu >>>> 3000", the >>>> write tests yield >>>> 140MB/s, and the read tests yield 53MB/s. >>>> It's important >>>> to note here that >>>> if I cut the read test short at only >>>> 2-3GB, I get >>>> results upwards of >>>> 350MB/s, which I assume is local >>>> cache-related distortion. >>>> >>>> Test 3: MTU of 1500. Read tests are up to >>>> 156 MB/s. >>>> Write tests yield >>>> about 142MB/s. >>>> Test 4: MTU of 1000: Read test at 182MB/s. >>>> Test 5: MTU of 900: Read test at 130 MB/s. >>>> Test 6: MTU of 1000: Read test at 160MB/s. >>>> Write tests >>>> are now >>>> consistently at about 300MB/s. >>>> Test 7: MTU of 1200: Read test at 124MB/s. >>>> Test 8: MTU of 1000: Read test at 161MB/s. >>>> Write at 261MB/s. >>>> >>>> A few final notes: >>>> L1ARC grabs about 10GB of RAM during the >>>> tests, so >>>> there's definitely some >>>> read caching going on. >>>> The write operations are easier to observe >>>> with iostat, >>>> and I'm seeing io >>>> rates that closely correlate with the >>>> network write speeds. >>>> >>>> >>>> Chris, thanks for your specific details. >>>> I'd appreciate >>>> it if you could >>>> tell me which copper NIC you tried, as >>>> well as to pass >>>> on the iSCSI tuning >>>> parameters. >>>> >>>> I've ordered an Intel EXPX9502AFXSR, which >>>> uses the >>>> 82598 chip instead of >>>> the 82599 in the X520. If I get similar >>>> results with my >>>> fiber transcievers, >>>> I'll see if I can get a hold of copper ones. >>>> >>>> But I should mention that I did indeed >>>> look at PHY/MAC >>>> error rates, and >>>> they are nil. >>>> >>>> -Warren V >>>> >>>> On Fri, Feb 20, 2015 at 7:25 PM, Chris >>>> Siebenmann >>>> >>> >>> >>>> >> >>>> >>>> wrote: >>>> >>>> >>>> After installation and >>>> configuration, I observed >>>> all kinds of bad >>>> behavior >>>> in the network traffic between the >>>> hosts and the >>>> server. All of this >>>> bad >>>> behavior is traced to the ixgbe >>>> driver on the >>>> storage server. Without >>>> going >>>> into the full troubleshooting >>>> process, here are >>>> my takeaways: >>>> >>>> [...] >>>> >>>> For what it's worth, we managed to >>>> achieve much >>>> better line rates on >>>> copper 10G ixgbe hardware of various >>>> descriptions >>>> between OmniOS >>>> and CentOS 7 (I don't think we ever >>>> tested OmniOS to >>>> OmniOS). I don't >>>> believe OmniOS could do TCP at full >>>> line rate but I >>>> think we managed 700+ >>>> Mbytes/sec on both transmit and >>>> receive and we got >>>> basically disk-limited >>>> speeds with iSCSI (across multiple >>>> disks on >>>> multi-disk mirrored pools, >>>> OmniOS iSCSI initiator, Linux iSCSI >>>> targets). 
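As an aside, the per-link rates during tests like these can be watched directly on the OmniOS box, which helps separate what is actually crossing the wire from caching effects. A sketch:

    # Print per-second receive/transmit packet and byte counts for every data link.
    dlstat -i 1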
>>>> >>>> I don't believe we did any specific >>>> kernel tuning >>>> (and in fact some of >>>> our attempts to fiddle ixgbe driver >>>> parameters blew >>>> up in our face). >>>> We did tune iSCSI connection >>>> parameters to increase >>>> various buffer >>>> sizes so that ZFS could do even large >>>> single >>>> operations in single iSCSI >>>> transactions. (More details available >>>> if people are >>>> interested.) >>>> >>>> 10: At the wire level, the speed >>>> problems are >>>> clearly due to pauses in >>>> response time by omnios. At 9000 >>>> byte frame >>>> sizes, I see a good number >>>> of duplicate ACKs and fast >>>> retransmits during >>>> read operations (when >>>> omnios is transmitting). But below >>>> about a >>>> 4100-byte MTU on omnios >>>> (which seems to correlate to >>>> 4096-byte iSCSI >>>> block transfers), the >>>> transmission errors fade away and >>>> we only see >>>> the transmission pause >>>> problem. >>>> >>>> >>>> This is what really attracted my >>>> attention. In >>>> our OmniOS setup, our >>>> specific Intel hardware had ixgbe >>>> driver issues that >>>> could cause >>>> activity stalls during once-a-second >>>> link heartbeat >>>> checks. This >>>> obviously had an effect at the TCP and >>>> iSCSI layers. >>>> My initial message >>>> to illumos-developer sparked a >>>> potentially >>>> interesting discussion: >>>> >>>> >>>> http://www.listbox.com/member/____archive/182179/2014/10/ >>>> sort/____time_rev/page/16/entry/6:__405/__20141003125035:6357079A-__ >>>> 4B1D-__11E4-A39C-D534381BA44D/ >>>> >>> 10/sort/__time_rev/page/16/entry/6:405/__20141003125035: >>>> 6357079A-4B1D-__11E4-A39C-D534381BA44D/> >>>> >>>> >>> __sort/time_rev/page/16/entry/6:__405/20141003125035: >>>> 6357079A-__4B1D-11E4-A39C-D534381BA44D/ >>>> >>> sort/time_rev/page/16/entry/6:405/20141003125035:6357079A- >>>> 4B1D-11E4-A39C-D534381BA44D/>> >>>> >>>> If you think this is a possibility in >>>> your setup, >>>> I've put the DTrace >>>> script I used to hunt for this up on >>>> the web: >>>> >>>> http://www.cs.toronto.edu/~___ >>>> _cks/src/omnios-ixgbe/ixgbe_____delay.d >>>> >>> delay.d> >>>> >>>> >>> delay.d >>>> >>> delay.d>> >>>> >>>> This isn't the only potential source >>>> of driver >>>> stalls by any means, it's >>>> just the one I found. You may also >>>> want to look at >>>> lockstat in general, >>>> as information it reported is what led >>>> us to look >>>> specifically at the >>>> ixgbe code here. >>>> >>>> (If you suspect kernel/driver issues, >>>> lockstat >>>> combined with kernel >>>> source is a really excellent resource.) >>>> >>>> - cks >>>> >>>> >>>> >>>> >>>> >>>> ___________________________________________________ >>>> OmniOS-discuss mailing list >>>> OmniOS-discuss at lists.omniti >>>> .____com >>>> >>> > >>>> http://lists.omniti.com/____mailman/listinfo/omnios-____ >>>> discuss >>>> >>> discuss> >>>> >>>> >>> discuss >>>> > >>>> >>>> >>>> ___________________________________________________ >>>> OmniOS-discuss mailing list >>>> OmniOS-discuss at lists.omniti >>>> .____com >>>> >>> > >>>> http://lists.omniti.com/____mailman/listinfo/omnios-____ >>>> discuss >>>> >>> discuss> >>>> >>>> >>> discuss >>>> > >>>> >>>> >>>> -- >>>> OSN Online Service Nuernberg GmbH, Bucher Str. 78, >>>> 90408 Nuernberg >>>> Tel: +49 911 39905-0 >>>> - Fax: +49 911 >>>> 39905-55 - >>>> http://www.osn.de >>>> HRB 15022 Nuernberg, USt-Id: DE189301263, GF: Joerg >>>> Goltermann >>>> >>>> >>>> >>>> -- >>>> OSN Online Service Nuernberg GmbH, Bucher Str. 
78, 90408 Nuernberg
>>>> Tel: +49 911 39905-0 - Fax: +49 911 39905-55 - http://www.osn.de
>>>> HRB 15022 Nuernberg, USt-Id: DE189301263, GF: Joerg Goltermann
>>>
>>> *illumos-developer* | Archives | Modify Your Subscription [Powered by Listbox]
>
> illumos-developer | Archives | Modify Your Subscription

From garrett at damore.org  Mon Mar  2 20:29:55 2015
From: garrett at damore.org (Garrett D'Amore)
Date: Mon, 2 Mar 2015 12:29:55 -0800
Subject: [OmniOS-discuss] [developer] Re: The ixgbe driver, Lindsay Lohan, and the Greek economy
In-Reply-To: 
References: <20150221032559.727D07A0792@apps0.cs.toronto.edu>
	<54EAEFA8.4020101@osn.de> <54EB5392.6030900@osn.de>
	<279C084B-E76B-4F38-A7CF-D6CB37D06D8F@damore.org>
	<54F44602.5030705@osn.de>
	<064C3FC8-43E0-4757-AADA-831303782A4C@damore.org>
Message-ID: 

Please include the *full* lockstat output - in particular, the values given
below look relatively normal - there are however *multiple* groups emitted
from lockstat - blocking, spinning, etc. You can't just include the very
first entry because it might be for a lock condition (such as blocking)
that doesn't occur much. (The cases you list below are "clean",
representing total delays measured in milliseconds or even hundreds of
microseconds, so not very interesting.)

- Garrett

> On Mar 2, 2015, at 12:19 PM, W Verb wrote:
>
> Hello,
>
> vmstat seems pretty boring. Certainly nothing going to swap.
>
> root at sanbox:/root# vmstat
>  kthr      memory            page            disk          faults      cpu
>  r b w   swap  free  re  mf pi po fr de sr po ro s0 s2   in   sy   cs us sy id
>  0 0 0 34631632 30728068 175 215 0 0 0 0 963 275 4 6 140 3301 796 6681 0 1 99
>
> Here is the "taskq_dispatch_ent" output from "lockstat -s 5 -kWP sleep 30"
> during the "fast" write operation.
> [...]
>> My initial message >> to illumos-developer sparked a potentially >> interesting discussion: >> >> >> http://www.listbox.com/member/____archive/182179/2014/10/sort/____time_rev/page/16/entry/6:__405/__20141003125035:6357079A-__4B1D-__11E4-A39C-D534381BA44D/ >> > >> >> >> >> >> >> If you think this is a possibility in >> your setup, >> I've put the DTrace >> script I used to hunt for this up on >> the web: >> >> http://www.cs.toronto.edu/~____cks/src/omnios-ixgbe/ixgbe_____delay.d >> > >> >> >> >> >> >> This isn't the only potential source >> of driver >> stalls by any means, it's >> just the one I found. You may also >> want to look at >> lockstat in general, >> as information it reported is what led >> us to look >> specifically at the >> ixgbe code here. >> >> (If you suspect kernel/driver issues, >> lockstat >> combined with kernel >> source is a really excellent resource.) >> >> - cks >> >> >> >> >> >> ___________________________________________________ >> OmniOS-discuss mailing list >> OmniOS-discuss at lists.omniti >> >.____com >> __omniti.com >> >> >> http://lists.omniti.com/____mailman/listinfo/omnios-____discuss >> > >> >> >> >> >> >> >> ___________________________________________________ >> OmniOS-discuss mailing list >> OmniOS-discuss at lists.omniti >> >.____com >> __omniti.com >> >> >> http://lists.omniti.com/____mailman/listinfo/omnios-____discuss >> > >> >> >> >> >> >> >> -- >> OSN Online Service Nuernberg GmbH, Bucher Str. 78, >> 90408 Nuernberg >> Tel: +49 911 39905-0 >> - Fax: +49 911 >> 39905-55 - >> http://www.osn.de > >> HRB 15022 Nuernberg, USt-Id: DE189301263, GF: Joerg >> Goltermann >> >> >> >> -- >> OSN Online Service Nuernberg GmbH, Bucher Str. 78, 90408 Nuernberg >> Tel: +49 911 39905-0 - Fax: +49 >> 911 39905-55 - http://www.osn.de >> > >> HRB 15022 Nuernberg, USt-Id: DE189301263, GF: Joerg Goltermann >> >> >> *illumos-developer* | Archives >> > >> > >> | Modify > Your Subscription >> [Powered by Listbox] > >> >> >> >> *illumos-developer* | Archives >> > >> > | >> Modify >> > >> Your Subscription [Powered by Listbox] > >> >> >> -- >> OSN Online Service Nuernberg GmbH, Bucher Str. 78, 90408 Nuernberg >> Tel: +49 911 39905-0 - Fax: +49 911 39905-55 - http://www.osn.de >> HRB 15022 Nuernberg, USt-Id: DE189301263, GF: Joerg Goltermann >> >> illumos-developer | Archives | Modify Your Subscription > -------------- next part -------------- An HTML attachment was scrubbed... URL: From garrett at damore.org Mon Mar 2 20:32:26 2015 From: garrett at damore.org (Garrett D'Amore) Date: Mon, 2 Mar 2015 12:32:26 -0800 Subject: [OmniOS-discuss] [developer] The ixgbe driver, Lindsay Lohan, and the Greek economy In-Reply-To: References: <20150221032559.727D07A0792@apps0.cs.toronto.edu> <54EAEFA8.4020101@osn.de> <54EB5392.6030900@osn.de> <279C084B-E76B-4F38-A7CF-D6CB37D06D8F@damore.org> <54F44602.5030705@osn.de> Message-ID: <148BDD1F-A504-425B-8D30-78BED699E71F@damore.org> However, if you look at the total times involved ? we?re talking about 1.5 us average, and only 9.3k events. So there maybe some VM activity, but it is only responsible for 14.4 ms over the entire 30 sec run. Again, that?s not *great*, but its unlikely to be related (at least directly) for the tragic behavior we?ve seen elsewhere. (Indeed, the claim is that reads are good, so this result is from a good side, hence a red herring.) 
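A quick arithmetic check of the figure above, using the hottest lockstat line in question (9306 events at an average of 1557 ns each):

  echo '9306 * 1557 / 1000000' | bc -l
  # ~14.49 ms of total spin time in a 30 s sample - under 0.05% of the run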
- Garrett > On Mar 2, 2015, at 11:15 AM, Dan McDonald via illumos-developer wrote: > > >> On Mar 2, 2015, at 2:07 PM, W Verb via illumos-developer wrote: >> >> Count indv cuml rcnt nsec Hottest Lock Caller >> ------------------------------------------------------------------------------- >> 9306 44% 44% 0.00 1557 htable_mutex+0x370 htable_release >> 6307 23% 68% 0.00 1207 htable_mutex+0x108 htable_lookup >> 596 7% 75% 0.00 4100 0xffffff0931705188 cv_wait >> 349 5% 80% 0.00 4437 0xffffff0931705188 taskq_thread >> 704 2% 82% 0.00 995 0xffffff0935de3c50 dbuf_create > > That has NOTHING to do with TCP. > > It has everything to do with the Virtual Memory subsystem. Here, see all the callers to htable_release(): > > http://src.illumos.org/source/search?q=&defs=&refs=htable_release&path=&hist=&project=illumos-gate > > I think "VM thrashing" when I see that. > > Dan > > > > ------------------------------------------- > illumos-developer > Archives: https://www.listbox.com/member/archive/182179/=now > RSS Feed: https://www.listbox.com/member/archive/rss/182179/21239177-3604570e > Modify Your Subscription: https://www.listbox.com/member/?member_id=21239177&id_secret=21239177-2d0c9337 > Powered by Listbox: http://www.listbox.com From nrhuff at umn.edu Tue Mar 3 15:41:13 2015 From: nrhuff at umn.edu (Nathan Huff) Date: Tue, 03 Mar 2015 09:41:13 -0600 Subject: [OmniOS-discuss] Long group names in ls acl output Message-ID: <54F5D619.6000902@umn.edu> We have a couple omnios servers that are getting NSS info from an AD domain that we use for serving CIFS shares. There are groups in our domain that are longer than the 20 characters. Looking at the ls man page and source code I don't think there is an option to display either a longer string or the uid/gid instead. The main issue that I run into is not only are the group names long, some have a common prefix that is over 20 characters long. In some cases the only way to tell which groups has which permissions is to connect to the share from windows and look at the ACLs that way. Is there some way to disambiguate this case I don't know about? -- Nathan Huff System Administrator Academic Health Center Information Systems University of Minnesota 612-626-9136 From henson at acm.org Tue Mar 3 20:58:03 2015 From: henson at acm.org (Paul B. Henson) Date: Tue, 3 Mar 2015 12:58:03 -0800 Subject: [OmniOS-discuss] Long group names in ls acl output In-Reply-To: <54F5D619.6000902@umn.edu> References: <54F5D619.6000902@umn.edu> Message-ID: <2dfa01d055f4$bfa16130$3ee42390$@acm.org> > Nathan Huff > Sent: Tuesday, March 03, 2015 7:41 AM > > domain that are longer than the 20 characters. Looking at the ls man > page and source code I don't think there is an option to display either > a longer string or the uid/gid instead. $ uname -a SunOS storage 5.11 omnios-10b9c79 i86pc i386 i86pc $ man ls [...] 
-n, --numeric-uid-gid like -l, but list numeric user and group IDs $ cd / ; ls -n total 535 lrwxrwxrwx 1 0 0 9 Sep 2 2013 bin -> ./usr/bin drwxr-xr-x 6 0 3 9 Sep 2 2013 boot drwxr-xr-x 238 0 3 238 Jan 20 16:09 dev drwxr-xr-x 9 0 3 9 Jan 20 16:08 devices drwxr-xr-x 63 0 3 207 Jan 20 16:09 etc drwxr-xr-x 7 0 0 7 Dec 14 20:06 export dr-xr-xr-x 2 0 0 2 Aug 14 2013 home drwxr-xr-x 18 0 3 19 Dec 20 18:00 kernel drwxr-xr-x 10 0 2 180 Jan 16 18:42 lib drwxr-xr-x 2 0 0 3 Sep 2 2013 media drwxr-xr-x 2 0 3 2 Sep 2 2013 mnt dr-xr-xr-x 2 0 0 2 Sep 2 2013 net drwxr-xr-x 3 0 3 3 Dec 20 17:49 opt drwxr-xr-x 5 0 3 5 Aug 14 2013 platform dr-xr-xr-x 128 0 0 480032 Mar 3 12:53 proc drwx------ 3 0 0 7 Dec 20 20:45 root drwxr-xr-x 3 0 0 3 Sep 2 2013 rpool drwxr-xr-x 2 0 3 62 Nov 24 10:34 sbin drwxr-xr-x 5 0 0 5 Aug 14 2013 system drwxrwxrwt 3 0 3 242 Mar 3 12:50 tmp drwxr-xr-x 29 0 3 41 Nov 24 10:34 usr drwxr-xr-x 35 0 3 35 Nov 8 2013 var drwxr-xr-x 4 0 0 4 Dec 20 20:46 zones From nrhuff at umn.edu Tue Mar 3 21:04:16 2015 From: nrhuff at umn.edu (Nathan Huff) Date: Tue, 03 Mar 2015 15:04:16 -0600 Subject: [OmniOS-discuss] Long group names in ls acl output In-Reply-To: <2dfa01d055f4$bfa16130$3ee42390$@acm.org> References: <54F5D619.6000902@umn.edu> <2dfa01d055f4$bfa16130$3ee42390$@acm.org> Message-ID: <54F621D0.6070506@umn.edu> -n works for the regular user and group but seems to have no effect on ACL entries /tank/shares$ /usr/bin/ls -Vn total 132 drwx------+ 5 256902 0 5 Jan 30 13:21 archive group:domain users:r-x---a-R-c---:-d-----:allow group:ahc_server_ops:rwxpdDaARWcCos:fd-----:allow owner@:rwxpdDaARWc--s:fd-----:allow group@:--------------:fd-----:allow everyone@:--------------:fd-----:allow . . . On 2015-03-03 2:58 PM, Paul B. Henson wrote: >> Nathan Huff >> Sent: Tuesday, March 03, 2015 7:41 AM >> >> domain that are longer than the 20 characters. Looking at the ls man >> page and source code I don't think there is an option to display either >> a longer string or the uid/gid instead. > > $ uname -a > SunOS storage 5.11 omnios-10b9c79 i86pc i386 i86pc > > $ man ls > [...] > -n, --numeric-uid-gid > like -l, but list numeric user and group IDs > > $ cd / ; ls -n > total 535 > lrwxrwxrwx 1 0 0 9 Sep 2 2013 bin -> ./usr/bin > drwxr-xr-x 6 0 3 9 Sep 2 2013 boot > drwxr-xr-x 238 0 3 238 Jan 20 16:09 dev > drwxr-xr-x 9 0 3 9 Jan 20 16:08 devices > drwxr-xr-x 63 0 3 207 Jan 20 16:09 etc > drwxr-xr-x 7 0 0 7 Dec 14 20:06 export > dr-xr-xr-x 2 0 0 2 Aug 14 2013 home > drwxr-xr-x 18 0 3 19 Dec 20 18:00 kernel > drwxr-xr-x 10 0 2 180 Jan 16 18:42 lib > drwxr-xr-x 2 0 0 3 Sep 2 2013 media > drwxr-xr-x 2 0 3 2 Sep 2 2013 mnt > dr-xr-xr-x 2 0 0 2 Sep 2 2013 net > drwxr-xr-x 3 0 3 3 Dec 20 17:49 opt > drwxr-xr-x 5 0 3 5 Aug 14 2013 platform > dr-xr-xr-x 128 0 0 480032 Mar 3 12:53 proc > drwx------ 3 0 0 7 Dec 20 20:45 root > drwxr-xr-x 3 0 0 3 Sep 2 2013 rpool > drwxr-xr-x 2 0 3 62 Nov 24 10:34 sbin > drwxr-xr-x 5 0 0 5 Aug 14 2013 system > drwxrwxrwt 3 0 3 242 Mar 3 12:50 tmp > drwxr-xr-x 29 0 3 41 Nov 24 10:34 usr > drwxr-xr-x 35 0 3 35 Nov 8 2013 var > drwxr-xr-x 4 0 0 4 Dec 20 20:46 zones > > > -- Nathan Huff System Administrator Academic Health Center Information Systems University of Minnesota 612-626-9136 From asc1111 at gmail.com Tue Mar 3 23:44:52 2015 From: asc1111 at gmail.com (Aaron Curry) Date: Tue, 3 Mar 2015 16:44:52 -0700 Subject: [OmniOS-discuss] CIFS File Lock Problems Message-ID: Hi all, We have encountered an issue with out OmniOS CIFS file server and file locks. 
Every now and then we get a call from a user trying to access a file but they can't because it says that the file is in use by another user. Long story short, we track down the session holding the lock, kill the session, file locks are released and the user can access the file. Left at that, it sounds pretty normal for a file server. But, it's not as simple as it sounds. Problem #1: Frequency This happens much more often than I am used to with other file servers. Not only that, but the frequency of "file locked" incidences seems to slowly increase over time until we are inundated with requested to unlock files. At that point we reboot the server and everyone is happy... at least for a week or so. Problem #2: Stale sessions It seems that the problem is caused by sessions becoming disconnected from the client and the server is not cleaning up those orphaned sessions. We can tell this because once we track down the session holding the lock, shutting down the desktop/laptop/whatever that initiated the session does not clean it up. The session stays open on the server. If you bring the client device back up, instead of reconnecting to the existing session it creates a new one. Often times the session holding locks is owned by the same user who is unable to access the file. When we first encountered this problem I changed the keep_alive setting on the smb server from the default of 5400 to 300. This seemed to help. At first I told people to wait 5 minutes and then the session cleared and they were able to access their files, but it doesn't seem to be working any more. Or at least changing keep_alive only fixed one problem and either didn't resolve another or maybe caused another problem? Problem #3: Tracking down open files Most NAS devices I have worked with have you manage session and open files through the Windows Computer Management console. You use that to connect to the NAS device and can see all session and open files. Through that console you can also kill sessions or the locks on specific files. It doesn't work very well in this case. Windows 7 takes forever to try to load the session information and seems to try to refresh while its still loading. The result is that it is unusable. XP/2003 loads session information just fine but if there's more than just a few sessions the open files list is empty. I even ran a packet capture on the client to see if it just didn't understand what the server was saying. Not the case. If there's more than 5 or so sessions, the request for open files returns an empty array of items. We have been using mdb to track down the open files, which is a pain. Getting a list of sessions from mdb is easy enough with ::smblist but open files are only returned as an address which then needs to be checked against ::smbnode to get the path and file name. I wrote a script to parse all the information and return something usable, but I'm not much of a programmer. It takes about 15 minutes to run and having to run it every time someone calls about a file lock is a waste of time. Problem #4: Releasing file locks Similar to problem #3. Normally we would connect to the NAS device with Windows Computer Management console, go to Open Files, find the file, right-click and Close Open File. Since we can't get a list of open files in the Computer Management console, that obviously doesn't work. That's where we've been tracking down the session holding the lock, pulling up the sessions in Computer Manager on a Windows 2003 machine and killing the session. 
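For what it's worth, the mdb legwork described under Problem #3 boils down to a couple of one-liners. ::smblist and ::smbnode are the dcmds named above; whether ::smbnode lists every node when given no address, and the exact output columns, vary by release, so treat this as a sketch to adapt rather than a recipe:

  # dump SMB sessions, users, trees and open files known to the kernel
  echo '::smblist' | mdb -k > /tmp/smblist.out
  # dump the smb node table, then match the ofile/node addresses from the
  # first listing against the paths printed here
  echo '::smbnode' | mdb -k > /tmp/smbnode.out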
This is going to become a very big problem in the near future as we retire all our XP/2003 systems. So those are the problems we are facing with our OmniOS CIFS server. If you are still reading this, thank you for your patience. We're at a loss on where to go from here. Understandably, the end users (and management) are starting to get a little grouchy. The questions we are having trouble answering are: Why do sessions seem to get disconnect and hold locks open? When a user / client machine combo reconnects, why doesn't it reuse an existing session and assume responsibility for open files? Why does the problem seem to grow in frequency over time (sounds like a system stability issue)? What is the best way to monitor / list active sessions and open files? Is there a way to kill individual file locks / close open files? Any help would be appreciated. Thank you, Aaron -------------- next part -------------- An HTML attachment was scrubbed... URL: From wverb73 at gmail.com Tue Mar 3 23:45:29 2015 From: wverb73 at gmail.com (W Verb) Date: Tue, 3 Mar 2015 15:45:29 -0800 Subject: [OmniOS-discuss] [developer] Re: The ixgbe driver, Lindsay Lohan, and the Greek economy In-Reply-To: References: <20150221032559.727D07A0792@apps0.cs.toronto.edu> <54EAEFA8.4020101@osn.de> <54EB5392.6030900@osn.de> <279C084B-E76B-4F38-A7CF-D6CB37D06D8F@damore.org> <54F44602.5030705@osn.de> <064C3FC8-43E0-4757-AADA-831303782A4C@damore.org> Message-ID: Hello Rob et al, Thank you for taking the time to look at this problem with me. I completely understand your inclination to look at the network as the most probable source of my issue, but I believe that this is a pretty clear-cut case of server-side issues. 1: I did run ping RTT tests during both read and write operations with multiple interfaces enabled, and the RTT stayed at ~.2ms regardless of whether traffic was actively being transmitted/received or not. 2: I am not seeing the TCP window size bouncing around, and I am certainly not seeing starvation and delay in my packet captures. It is true that I do see delayed ACKs and retransmissions when I bump the MTU to 9000 on both sides, but I stopped testing with high MTU as soon as I saw it happening because I have a good understanding of incast. All of my recent testing has been with MTUs between 1000 and 3000 bytes. 3: When testing with MTUs between 1000 and 3000 bytes, I do not see lost packets and retransmission in captures on either the server or client side. I only see staggered transmission delays on the part of the server. 4: The client is consistently advertising a large window size (20k+), so the TCP throttling mechanism does not appear to play into this. 5: As mentioned previously, layer 2 flow control is not enabled anywhere in the network, so there are no lower-level mechanisms at work. 6: Upon checking buffer and queue sizes (and doing the appropriate research into documentation on the C3560E's buffer sizes), I do not see large numbers of frames being dropped by the switch. It does happen at larger MTUs, but not very often (and not consistently) during transfers at 1000-3000 byte MTUs. I do not have QoS, policing, or rate-shaping enabled. 7: Network interface stats on both the server and the ESXi client show no errors of any kind. This is via netstat on the server, and esxcli / Vsphere client on the ESXi box. 
8: When looking at captures taken simultaneously on the server and client side, the server-side transmission pauses are consistently seen and reproducible, even after multiple server rebuilds, ESXi rebuilds, vSphere reinstallations (down to wiping the SQL db), various COMSTAR configuration variations, multiple 10G NICs with different NIC chipsets, multiple switches (I tried both a 48-port and 24-port C3560E), multiple IOS revisions (12.2 and 15.0), OmniOS versions (r151012 and previous) multiple cables, transceivers, etc etc etc etc etc For your review, I have uploaded the actual packet captures to Google Drive: https://drive.google.com/file/d/0BwyUMjibonYQZS03dHJ2ZjJvTEE/view?usp=sharing 2 int write - ESXi vmk5 https://drive.google.com/file/d/0BwyUMjibonYQTlNTQ2M5bjlxZ00/view?usp=sharing 2 int write - ESXi vmk1 https://drive.google.com/file/d/0BwyUMjibonYQUEFsSVJCYXBVX3c/view?usp=sharing 2 int read - server ixgbe0 https://drive.google.com/file/d/0BwyUMjibonYQT3FBbElnNFpJTzQ/view?usp=sharing 2 int read - ESXi vmk5 https://drive.google.com/file/d/0BwyUMjibonYQU1hXdFRLM2cxSTA/view?usp=sharing 2 int read - ESXi vmk1 https://drive.google.com/file/d/0BwyUMjibonYQNEFZSHVZdFNweDA/view?usp=sharing 1 int write - ESXi vmk1 https://drive.google.com/file/d/0BwyUMjibonYQM3FpTmloQm5iMGc/view?usp=sharing 1 int read - ESXi vmk1 Regards, Warren V On Mon, Mar 2, 2015 at 1:11 PM, Mallory, Rob wrote: > Just an EWAG, and forgive me for not following closely, I just saw > this in my inbox, and looked at it and the screenshots for 2 minutes. > > > > But this looks like the typical incast problem.. see > http://www.pdl.cmu.edu/Incast/ > > where your storage servers (there are effectively two with ISCSI/MPIO if > round-robin is working) have networks which are 20:1 oversubscribed to your > 1GbE host interfaces. (although one of the tcpdumps shows only one server > so it may be choked out completely) > > > > What is your BDP? I?m guessing .150ms * 1GbE. For single-link that gets > you to a MSS of 18700 or so. > > > > On your 1GbE connected clients, leave MTU at 9k, set the following in > sysctl.conf, > > And reboot. > > > > net.ipv4.tcp_rmem = 4096 8938 17876 > > > > If MPIO from the server is indeed round-robining properly, this will ?make > things fit? much better. > > > > Note that your tcp_wmem can and should stay high, since you are not > oversubscribed going from client?server ; you only need to tweak the tcp > receive window size. > > > > I?ve not done it in quite some time, but IIRC, You can also set these from > the server side with: > > Route add -sendpipe 8930 or ?ssthresh > > > > And I think you can see the hash-table with computed BDP per client with > ndd. > > > > I would try playing with those before delving deep into potential bugs in > the TCP, nic driver, zfs, or vm. > > -Rob > > > > *From:* W Verb via illumos-developer [mailto:developer at lists.illumos.org] > *Sent:* Monday, March 02, 2015 12:20 PM > *To:* Garrett D'Amore > *Cc:* Joerg Goltermann; illumos-dev; omnios-discuss at lists.omniti.com > *Subject:* Re: [developer] Re: [OmniOS-discuss] The ixgbe driver, Lindsay > Lohan, and the Greek economy > > > > Hello, > > vmstat seems pretty boring. Certainly nothing going to swap. 
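Rob's "18700 or so" figure quoted above is just the bandwidth-delay product of the path he assumes (a 1 Gbit/s bottleneck and a 0.15 ms round trip), which is also roughly where his suggested tcp_rmem cap of 17876 lands:

  # BDP = bandwidth in bytes/s x RTT in seconds
  echo '125000000 * 0.00015' | bc
  # = 18750 bytes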
> > root at sanbox:/root# vmstat > kthr memory page disk faults cpu > r b w swap free re mf pi po fr de sr po ro s0 s2 in sy cs us > sy id > 0 0 0 34631632 30728068 175 215 0 0 0 0 963 275 4 6 140 3301 796 6681 0 > 1 99 > > Here is the "taskq_dispatch_ent" output from "lockstat -s 5 -kWP sleep > 30" during the "fast" write operation. > > > ------------------------------------------------------------------------------- > Count indv cuml rcnt nsec Hottest Lock Caller > 50934 3% 79% 0.00 3437 0xffffff093145ba40 taskq_dispatch_ent > > nsec ------ Time Distribution ------ count Stack > 128 | 7 spa_taskq_dispatch_ent > 256 |@@ 4333 zio_taskq_dispatch > 512 |@@ 3863 zio_issue_async > 1024 |@@@@@ 9717 zio_execute > 2048 |@@@@@@@@@ 15904 > 4096 |@@@@ 7595 > 8192 |@@ 4498 > 16384 |@ 2662 > 32768 |@ 1886 > 65536 | 434 > 131072 | 34 > 262144 | 1 > > ------------------------------------------------------------------------------- > > > However, the truly "broken" function is a read operation: > > Top lock 1st try: > > ------------------------------------------------------------------------------- > Count indv cuml rcnt nsec Hottest Lock Caller > 474 15% 15% 0.00 7031 0xffffff093145b6f8 cv_wait > > nsec ------ Time Distribution ------ count Stack > 256 |@ 29 taskq_thread_wait > 512 |@@@@@@ 100 taskq_thread > 1024 |@@@@ 72 thread_start > 2048 |@@@@ 69 > 4096 |@@@ 51 > 8192 |@@ 47 > 16384 |@@ 44 > 32768 |@@ 32 > 65536 |@ 25 > 131072 | 5 > > ------------------------------------------------------------------------------- > > Top lock 2nd try: > > > ------------------------------------------------------------------------------- > Count indv cuml rcnt nsec Hottest Lock Caller > 174 39% 39% 0.00 103909 0xffffff0943f116a0 dmu_zfetch_find > > nsec ------ Time Distribution ------ count Stack > 2048 | 2 dmu_zfetch > 4096 | 3 dbuf_read > 8192 | 4 > dmu_buf_hold_array_by_dnode > 16384 | 3 dmu_buf_hold_array > 32768 |@ 7 > 65536 |@@ 14 > 131072 |@@@@@@@@@@@@@@@@@@@@ 116 > 262144 |@@@ 19 > 524288 | 4 > 1048576 | 2 > > ------------------------------------------------------------------------------- > > Top lock 3rd try: > > > ------------------------------------------------------------------------------- > Count indv cuml rcnt nsec Hottest Lock Caller > 283 55% 55% 0.00 94602 0xffffff0943ff5a68 dmu_zfetch_find > > nsec ------ Time Distribution ------ count Stack > 512 | 1 dmu_zfetch > 1024 | 1 dbuf_read > 2048 | 0 > dmu_buf_hold_array_by_dnode > 4096 | 5 dmu_buf_hold_array > 8192 | 2 > 16384 | 7 > 32768 | 4 > 65536 |@@@ 33 > 131072 |@@@@@@@@@@@@@@@@@@@@ 198 > 262144 |@@ 27 > 524288 | 2 > 1048576 | 3 > > ------------------------------------------------------------------------------- > > > > As for the MTU question- setting the MTU to 9000 makes read operations > grind almost to a halt at 5MB/s transfer rate. > > -Warren V > > > > On Mon, Mar 2, 2015 at 11:30 AM, Garrett D'Amore > wrote: > > Here?s a theory. You are using small (relatively) MTUs (3000 is less > than the smallest ZFS block size.) So, when you go multipathing this way, > might a single upper layer transaction (ZFS block transfer request, or for > that matter COMSTAR block request) get routed over different paths. This > sounds like a potentially pathological condition to me. > > > > What happens if you increase the MTU to 9000? Have you tried it? I?m > sort of thinking that this will permit each transaction to be issued in a > single IP frame, which may alleviate certain tragic code paths. (That > said, I?m not sure how aware COMSTAR is of the IP MTU. 
If it is ignorant, > then it shouldn?t matter *that* much, since TCP should do the right thing > here and a single TCP stream should stick to a single underlying NIC. But > if COMSTAR is aware of the MTU, it may do some really screwball things as > it tries to break requests up into single frames.) > > > > Your read spin really looks like only about 22 msec of wait out of a total > run of 30 sec. (That?s not *great*, but neither does it sound tragic.) > Your write is interesting because that looks like it is going a wildly > different path. You should be aware that the locks you see are *not* > necessarily related in call order, but rather are ordered by instance > count. The write code path hitting the task_thread as hard as it does is > really, really weird. Something is pounding on a taskq lock super hard. > The number of taskq_dispatch_ent calls is interesting here. I?m starting > to wonder if it?s something as stupid as a spin where if the taskq is > ?full? (max size reached), a caller just is spinning trying to dispatch > jobs to the taskq. > > > > The taskq_dispatch_ent code is super simple, and it should be almost > impossible to have contention on that lock ? barring a thread spinning hard > on taskq_dispatch (or taskq_dispatch_ent as I think is happening here). > Looking at the various call sites, there are places in both COMSTAR > (iscsit) and in ZFS where this could be coming from. To know which, we > really need to have the back trace associated. > > > > lockstat can give this ? try giving ?-s 5? to give a short backtrace from > this, that will probably give us a little more info about the guilty > caller. :-) > > > > - Garrett > > > > On Mar 2, 2015, at 11:07 AM, W Verb via illumos-developer < > developer at lists.illumos.org> wrote: > > > > Hello all, > > I am not using layer 2 flow control. The switch carries line-rate 10G > traffic without error. > > I think I have found the issue via lockstat. The first lockstat is taken > during a multipath read: > > lockstat -kWP sleep 30 > > > Adaptive mutex spin: 21331 events in 30.020 seconds (711 events/sec) > > Count indv cuml rcnt nsec Hottest Lock Caller > > ------------------------------------------------------------------------------- > 9306 44% 44% 0.00 1557 htable_mutex+0x370 htable_release > 6307 23% 68% 0.00 1207 htable_mutex+0x108 htable_lookup > 596 7% 75% 0.00 4100 0xffffff0931705188 cv_wait > 349 5% 80% 0.00 4437 0xffffff0931705188 taskq_thread > 704 2% 82% 0.00 995 0xffffff0935de3c50 dbuf_create > > The hash table being read here I would guess is the tcp connection hash > table. 
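A quick way to test the "tcp connection hash table" guess above is to ask mdb which kernel threads actually have htable_release on their stacks. (As Dan McDonald points out elsewhere in the thread, htable_lookup/htable_release are the x86 HAT page-table routines, i.e. the VM subsystem, not TCP.) A sketch:

  # list kernel thread stacks that currently include htable_release
  echo '::stacks -c htable_release' | mdb -k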
> > > > When lockstat is run during a multipath write operation, I get: > > Adaptive mutex spin: 1097341 events in 30.016 seconds (36558 events/sec) > > Count indv cuml rcnt nsec Hottest Lock Caller > > ------------------------------------------------------------------------------- > 210752 28% 28% 0.00 4781 0xffffff0931705188 taskq_thread > 174471 22% 50% 0.00 4476 0xffffff0931705188 cv_wait > 127183 10% 61% 0.00 2871 0xffffff096f29b510 zio_notify_parent > 176066 10% 70% 0.00 1922 0xffffff0931705188 taskq_dispatch_ent > 105134 9% 80% 0.00 3110 0xffffff096ffdbf10 zio_remove_child > 67512 4% 83% 0.00 1938 0xffffff096f3db4b0 zio_add_child > 45736 3% 86% 0.00 2239 0xffffff0935de3c50 dbuf_destroy > 27781 3% 89% 0.00 3416 0xffffff0935de3c50 dbuf_create > 38536 2% 91% 0.00 2122 0xffffff0935de3b70 dnode_rele > 27841 2% 93% 0.00 2423 0xffffff0935de3b70 dnode_diduse_space > 19020 2% 95% 0.00 3046 0xffffff09d9e305e0 dbuf_rele > 14627 1% 96% 0.00 3632 dbuf_hash_table+0x4f8 dbuf_find > > > Writes are not performing htable lookups, while reads are. > > -Warren V > > > > > > > > On Mon, Mar 2, 2015 at 3:14 AM, Joerg Goltermann wrote: > > Hi, > > I would try *one* TPG which includes both interface addresses > and I would double check for packet drops on the Catalyst. > > The 3560 supports only receive flow control which means, that > a sending 10Gbit port can easily overload a 1Gbit port. > Do you have flow control enabled? > > - Joerg > > > > On 02.03.2015 09:22, W Verb via illumos-developer wrote: > > Hello Garrett, > > No, no 802.3ad going on in this config. > > Here is a basic schematic: > > > https://drive.google.com/file/d/0BwyUMjibonYQVkVqcE5OQUJyUUU/view?usp=sharing > > Here is the Nexenta MPIO iSCSI Setup Document that I used as a guide: > > > https://drive.google.com/file/d/0BwyUMjibonYQbjEyUTBjN2tTNWM/view?usp=sharing > > Note that I am using an MTU of 3000 on both the 10G and 1G NICs. The > switch is set to allow 9148-byte frames, and I'm not seeing any > errors/buffer overruns on the switch. > > Here is a screenshot of a packet capture from a read operation on the > guest OS (from it's local drive, which is actually a VMDK file on the > storage server). In this example, only a single 1G ESXi kernel interface > (vmk1) is bound to the software iSCSI initiator. > > > https://drive.google.com/file/d/0BwyUMjibonYQa2NYdXhpZkpkbU0/view?usp=sharing > > Note that there's a nice, well-behaved window sizing process taking > place. The ESXi decreases the scaled window by 11 or 12 for each ACK, > then bumps it back up to 512. > > Here is a similar screenshot of a single-interface write operation: > > > https://drive.google.com/file/d/0BwyUMjibonYQbU1RZHRnakxDSFU/view?usp=sharing > > There are no pauses or gaps in the transmission rate in the > single-interface transfers. > > > In the next screenshots, I have enabled an additional 1G interface on > the ESXi host, and bound it to the iSCSI initiator. The new interface is > bound to a separate physical port, uses a different VLAN on the switch, > and talks to a different 10G port on the storage server. > > First, let's look at a write operation on the guest OS, which happily > pumps data at near-line-rate to the storage server. > > Here is a sequence number trace diagram. Note how the transfer has a > nice, smooth increment rate over the entire transfer. 
> > > https://drive.google.com/file/d/0BwyUMjibonYQWHNIa0drWnNxMmM/view?usp=sharing > > Here are screenshots from packet captures on both 1G interfaces: > > > https://drive.google.com/file/d/0BwyUMjibonYQRWhyVVQ4djNaU3c/view?usp=sharing > > https://drive.google.com/file/d/0BwyUMjibonYQaTVjTEtTRloyR2c/view?usp=sharing > > Note how we again see nice, smooth window adjustment, and no gaps in > transmission. > > > But now, let's look at the problematic two-interface Read operation. > First, the sequence graph: > > > https://drive.google.com/file/d/0BwyUMjibonYQTzdFVWdQMWZ6LUU/view?usp=sharing > > As you can see, there are gaps and jumps in the transmission throughout > the transfer. > It is very illustrative to look at captures of the gaps, which are > occurring on both interfaces: > > > https://drive.google.com/file/d/0BwyUMjibonYQc0VISXN6eVFwQzg/view?usp=sharing > > https://drive.google.com/file/d/0BwyUMjibonYQVFREUHp3TGFiUU0/view?usp=sharing > > As you can see, there are ~.4 second pauses in transmission from the > storage server, which kills the transfer rate. > It's clear that the ESXi box ACKs the prior iSCSI operation to > completion, then makes a new LUN request, which the storage server > immediately replies to. The ESXi ACKs the response packet from the > storage server, then waits...and waits....and waits... until eventually > the storage server starts transmitting again. > > Because the pause happens while the ESXi client is waiting for a packet > from the storage server, that tells me that the gaps are not an artifact > of traffic being switched between both active interfaces, but are > actually indicative of short hangs occurring on the server. > > Having a pause or two in transmission is no big deal, but in my case, it > is happening constantly, and dropping my overall read transfer rate down > to 20-60MB/s, which is slower than the single interface transfer rate > (~90-100MB/s). > > Decreasing the MTU makes the pauses shorter, increasing them makes the > pauses longer. > > Another interesting thing is that if I set the multipath io interval to > 3 operations instead of 1, I get better throughput. In other words, the > less frequently I swap IP addresses on my iSCSI requests from the ESXi > unit, the fewer pauses I see. > > Basically, COMSTAR seems to choke each time an iSCSI request from a new > IP arrives. > > Because the single interface transfer is near line rate, that tells me > that the storage system (mpt_sas, zfs, etc) is working fine. It's only > when multiple paths are attempted that iSCSI falls on its face during > reads. > > All of these captures were taken without a cache device being attached > to the storage zpool, so this isn't looking like some kind of ZFS ARC > problem. As mentioned previously, local transfers to/from the zpool are > showing ~300-500 MB/s rates over long transfers (10G+). > > -Warren V > > On Sun, Mar 1, 2015 at 9:11 PM, Garrett D'Amore > > wrote: > > I?m not sure I?ve followed properly. You have *two* interfaces. > You are not trying to provision these in an aggr are you? As far as > I?m aware, VMware does not support 802.3ad link aggregations. (Its > possible that you can make it work with ESXi if you give the entire > NIC to the guest ? but I?m skeptical.) The problem is that if you > try to use link aggregation, some packets (up to half!) will be > lost. TCP and other protocols fare poorly in this situation. > > Its possible I?ve totally misunderstood what you?re trying to do, in > which case I apologize. 
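The "multipath io interval" adjusted a few paragraphs up is the round-robin IOPS limit on the ESXi side; for reference it is set per device with something like the following (ESXi 5.x esxcli syntax from memory - option spellings may differ between builds, and naa.xxxx stands in for the real device identifier):

  # show devices and their current path selection policy
  esxcli storage nmp device list
  # switch paths after every 3 I/Os instead of the configured interval of 1
  esxcli storage nmp psp roundrobin deviceconfig set --device naa.xxxx --type iops --iops 3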
> > The idle thing is a red-herring ? the cpu is waiting for work to do, > probably because packets haven?t arrived (or where dropped by the > hypervisor!) I wouldn?t read too much into that except that your > network stack is in trouble. I?d look a bit more closely at the > kstats for tcp ? I suspect you?ll see retransmits or out of order > values that are unusually high ? if so this may help validate my > theory above. > > - Garrett > > On Mar 1, 2015, at 9:03 PM, W Verb via illumos-developer > > > > > wrote: > > Hello all, > > > Well, I no longer blame the ixgbe driver for the problems I'm seeing. > > > I tried Joerg's updated driver, which didn't improve the issue. So > I went back to the drawing board and rebuilt the server from scratch. > > What I noted is that if I have only a single 1-gig physical > interface active on the ESXi host, everything works as expected. > As soon as I enable two interfaces, I start seeing the performance > problems I've described. > > Response pauses from the server that I see in TCPdumps are still > leading me to believe the problem is delay on the server side, so > I ran a series of kernel dtraces and produced some flamegraphs. > > > This was taken during a read operation with two active 10G > interfaces on the server, with a single target being shared by two > tpgs- one tpg for each 10G physical port. The host device has two > 1G ports enabled, with VLANs separating the active ports into > 10G/1G pairs. ESXi is set to multipath using both VLANS with a > round-robin IO interval of 1. > > > https://drive.google.com/file/d/0BwyUMjibonYQd3ZYOGh4d2pteGs/view?usp=sharing > > > This was taken during a write operation: > > > https://drive.google.com/file/d/0BwyUMjibonYQMnBtU1Q2SXM2ams/view?usp=sharing > > > I then rebooted the server and disabled C-State, ACPI T-State, and > general EIST (Turbo boost) functionality in the CPU. > > I when I attempted to boot my guest VM, the iSCSI transfer > gradually ground to a halt during the boot loading process, and > the guest OS never did complete its boot process. > > Here is a flamegraph taken while iSCSI is slowly dying: > > > https://drive.google.com/file/d/0BwyUMjibonYQM21JeFZPX3dZWTg/view?usp=sharing > > > I edited out cpu_idle_adaptive from the dtrace output and > regenerated the slowdown graph: > > > https://drive.google.com/file/d/0BwyUMjibonYQbTVwV3NvXzlPS1E/view?usp=sharing > > > I then edited cpu_idle_adaptive out of the speedy write operation > and regenerated that graph: > > > https://drive.google.com/file/d/0BwyUMjibonYQeWFYM0pCMDZ1X2s/view?usp=sharing > > > I have zero experience with interpreting flamegraphs, but the most > significant difference I see between the slow read example and the > fast write example is in unix`thread_start --> unix`idle. There's > a good chunk of "unix`i86_mwait" in the read example that is not > present in the write example at all. > > Disabling the l2arc cache device didn't make a difference, and I > had to reenable EIST support on the CPU to get my VMs to boot. > > I am seeing a variety of bug reports going back to 2010 regarding > excessive mwait operations, with the suggested solutions usually > being to set "cpupm enable poll-mode" in power.conf. That change > also had no effect on speed. 
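Garrett's suggestion near the top of this message - check the TCP kstats for retransmits and out-of-order segments - can be done with a one-liner; the statistic names vary slightly between releases, so grep loosely and diff a before/after snapshot around a test run:

  kstat -p -m tcp | egrep -i 'retrans|unorder|dupack'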
> > -Warren V > > > > > -----Original Message----- > > From: Chris Siebenmann [mailto:cks at cs.toronto.edu] > > Sent: Monday, February 23, 2015 8:30 AM > > To: W Verb > > Cc: omnios-discuss at lists.omniti.com > > ; cks at cs.toronto.edu > > > Subject: Re: [OmniOS-discuss] The ixgbe driver, Lindsay Lohan, and > the Greek economy > > > > Chris, thanks for your specific details. I'd appreciate it if you > > > could tell me which copper NIC you tried, as well as to pass on the > > > iSCSI tuning parameters. > > > Our copper NIC experience is with onboard X540-AT2 ports on > SuperMicro hardware (which have the guaranteed 10-20 msec lock > hold) and dual-port 82599EB TN cards (which have some sort of > driver/hardware failure under load that eventually leads to > 2-second lock holds). I can't recommend either with the current > driver; we had to revert to 1G networking in order to get stable > servers. > > > The iSCSI parameter modifications we do, across both initiators > and targets, are: > > > initialr2tno > > firstburstlength128k > > maxrecvdataseglen128k[only on Linux backends] > > maxxmitdataseglen128k[only on Linux backends] > > > The OmniOS initiator doesn't need tuning for more than the first > two parameters; on the Linux backends we tune up all four. My > extended thoughts on these tuning parameters and why we touch them > can be found > > here: > > > > http://utcc.utoronto.ca/~cks/space/blog/tech/UnderstandingiSCSIProtocol > > http://utcc.utoronto.ca/~cks/space/blog/tech/LikelyISCSITuning > > > The short version is that these parameters probably only make a > small difference but their overall goal is to do 128KB ZFS reads > and writes in single iSCSI operations (although they will be > fragmented at the TCP > > layer) and to do iSCSI writes without a back-and-forth delay > between initiator and target (that's 'initialr2t no'). > > > I think basically everyone should use InitialR2T set to no and in > fact that it should be the software default. These days only > unusually limited iSCSI targets should need it to be otherwise and > they can change their setting for it (initiator and target must > both agree to it being 'yes', so either can veto it). > > > - cks > > > > On Mon, Feb 23, 2015 at 8:21 AM, Joerg Goltermann > > wrote: > > Hi, > > I think your problem is caused by your link properties or your > switch settings. In general the standard ixgbe seems to perform > well. > > I had trouble after changing the default flow control settings > to "bi" > and this was my motivation to update the ixgbe driver a long > time ago. > After I have updated our systems to ixgbe 2.5.8 I never had any > problems .... > > Make sure your switch has support for jumbo frames and you use > the same mtu on all ports, otherwise the smallest will be used. > > What switch do you use? I can tell you nice horror stories about > different vendors.... > > - Joerg > > On 23.02.2015 10:31, W Verb wrote: > > Thank you Joerg, > > I've downloaded the package and will try it tomorrow. > > The only thing I can add at this point is that upon review > of my > testing, I may have performed my "pkg -u" between the > initial quad-gig > performance test and installing the 10G NIC. So this may > be a new > problem introduced in the latest updates. > > Those of you who are running 10G and have not upgraded to > the latest > kernel, etc, might want to do some additional testing > before running the > update. 
> > -Warren V > > On Mon, Feb 23, 2015 at 1:15 AM, Joerg Goltermann > > > >> wrote: > > Hi, > > I remember there was a problem with the flow control > settings in the > ixgbe > driver, so I updated it a long time ago for our > internal servers to > 2.5.8. > Last weekend I integrated the latest changes from the > FreeBSD driver > to bring > the illumos ixgbe to 2.5.25 but I had no time to test > it, so it's > completely > untested! > > > If you would like to give the latest driver a try you > can fetch the > kernel modules from > https://cloud.osn.de/index.____php/s/Fb4so9RsNnXA7r9 > > > > > Clone your boot environment, place the modules in the > new environment > and update the boot-archive of the new BE. > > - Joerg > > > > > > On 23.02.2015 02:54, W Verb wrote: > > By the way, to those of you who have working > setups: please send me > your pool/volume settings, interface linkprops, > and any kernel > tuning > parameters you may have set. > > Thanks, > Warren V > > On Sat, Feb 21, 2015 at 7:59 AM, Schweiss, Chip > > >> > > > wrote: > > I can't say I totally agree with your performance > assessment. I run Intel > X520 in all my OmniOS boxes. > > Here is a capture of nfssvrtop I made while > running many > storage vMotions > between two OmniOS boxes hosting NFS > datastores. This is a > 10 host VMware > cluster. Both OmniOS boxes are dual 10G > connected with > copper twin-ax to > the in rack Nexus 5010. > > VMware does 100% sync writes, I use ZeusRAM > SSDs for log > devices. > > -Chip > > 2014 Apr 24 08:05:51, load: 12.64, read: > 17330243 KB, > swrite: 15985 KB, > awrite: 1875455 KB > > Ver Client NFSOPS Reads > SWrites AWrites > Commits Rd_bw > SWr_bw AWr_bw Rd_t SWr_t AWr_t > Com_t Align% > > 4 10.28.17.105 0 0 > 0 0 > 0 0 > 0 0 0 0 0 0 0 > > 4 10.28.17.215 0 0 > 0 0 > 0 0 > 0 0 0 0 0 0 0 > > 4 10.28.17.213 0 0 > 0 0 > 0 0 > 0 0 0 0 0 0 0 > > 4 10.28.16.151 0 0 > 0 0 > 0 0 > 0 0 0 0 0 0 0 > > 4 all 1 0 > 0 0 > 0 0 > 0 0 0 0 0 0 0 > > 3 10.28.16.175 3 0 > 3 0 > 0 1 > 11 0 4806 48 0 0 85 > > 3 10.28.16.183 6 0 > 6 0 > 0 3 > 162 0 549 124 0 0 > 73 > > 3 10.28.16.180 11 0 > 10 0 > 0 3 > 27 0 776 89 0 0 67 > > 3 10.28.16.176 28 2 > 26 0 > 0 10 > 405 0 2572 198 0 0 > 100 > > 3 10.28.16.178 4606 4602 > 4 0 > 0 294534 > 3 0 723 49 0 0 99 > > 3 10.28.16.179 4905 4879 > 26 0 > 0 312208 > 311 0 735 271 0 0 > 99 > > 3 10.28.16.181 5515 5502 > 13 0 > 0 352107 > 77 0 89 87 0 0 99 > > 3 10.28.16.184 12095 12059 > 10 0 > 0 763014 > 39 0 249 147 0 0 99 > > 3 10.28.58.1 15401 6040 > 116 6354 > 53 191605 > 474 202346 192 96 144 83 > 99 > > 3 all 42574 33086 <42574%2033086>> > > 217 > 6354 53 1913488 > 1582 202300 348 138 153 105 > 99 > > > > > > On Fri, Feb 20, 2015 at 11:46 PM, W Verb > > > > >> wrote: > > > Hello All, > > Thank you for your replies. > I tried a few things, and found the following: > > 1: Disabling hyperthreading support in the > BIOS drops > performance overall > by a factor of 4. > 2: Disabling VT support also seems to have > some effect, > although it > appears to be minor. But this has the > amusing side > effect of fixing the > hangs I've been experiencing with fast > reboot. Probably > by disabling kvm. > 3: The performance tests are a bit tricky > to quantify > because of caching > effects. In fact, I'm not entirely sure > what is > happening here. 
It's just > best to describe what I'm seeing: > > The commands I'm using to test are > dd if=/dev/zero of=./test.dd bs=2M count=5000 > dd of=/dev/null if=./test.dd bs=2M count=5000 > The host vm is running Centos 6.6, and has > the latest > vmtools installed. > There is a host cache on an SSD local to > the host that > is also in place. > Disabling the host cache didn't > immediately have an > effect as far as I could > see. > > The host MTU set to 3000 on all iSCSI > interfaces for all > tests. > > Test 1: Right after reboot, with an ixgbe > MTU of 9000, > the write test > yields an average speed over three tests > of 137MB/s. The > read test yields an > average over three tests of 5MB/s. > > Test 2: After setting "ifconfig ixgbe0 mtu > 3000", the > write tests yield > 140MB/s, and the read tests yield 53MB/s. > It's important > to note here that > if I cut the read test short at only > 2-3GB, I get > results upwards of > 350MB/s, which I assume is local > cache-related distortion. > > Test 3: MTU of 1500. Read tests are up to > 156 MB/s. > Write tests yield > about 142MB/s. > Test 4: MTU of 1000: Read test at 182MB/s. > Test 5: MTU of 900: Read test at 130 MB/s. > Test 6: MTU of 1000: Read test at 160MB/s. > Write tests > are now > consistently at about 300MB/s. > Test 7: MTU of 1200: Read test at 124MB/s. > Test 8: MTU of 1000: Read test at 161MB/s. > Write at 261MB/s. > > A few final notes: > L1ARC grabs about 10GB of RAM during the > tests, so > there's definitely some > read caching going on. > The write operations are easier to observe > with iostat, > and I'm seeing io > rates that closely correlate with the > network write speeds. > > > Chris, thanks for your specific details. > I'd appreciate > it if you could > tell me which copper NIC you tried, as > well as to pass > on the iSCSI tuning > parameters. > > I've ordered an Intel EXPX9502AFXSR, which > uses the > 82598 chip instead of > the 82599 in the X520. If I get similar > results with my > fiber transcievers, > I'll see if I can get a hold of copper ones. > > But I should mention that I did indeed > look at PHY/MAC > error rates, and > they are nil. > > -Warren V > > On Fri, Feb 20, 2015 at 7:25 PM, Chris > Siebenmann > > > > >> > > wrote: > > > After installation and > configuration, I observed > all kinds of bad > behavior > in the network traffic between the > hosts and the > server. All of this > bad > behavior is traced to the ixgbe > driver on the > storage server. Without > going > into the full troubleshooting > process, here are > my takeaways: > > [...] > > For what it's worth, we managed to > achieve much > better line rates on > copper 10G ixgbe hardware of various > descriptions > between OmniOS > and CentOS 7 (I don't think we ever > tested OmniOS to > OmniOS). I don't > believe OmniOS could do TCP at full > line rate but I > think we managed 700+ > Mbytes/sec on both transmit and > receive and we got > basically disk-limited > speeds with iSCSI (across multiple > disks on > multi-disk mirrored pools, > OmniOS iSCSI initiator, Linux iSCSI > targets). > > I don't believe we did any specific > kernel tuning > (and in fact some of > our attempts to fiddle ixgbe driver > parameters blew > up in our face). > We did tune iSCSI connection > parameters to increase > various buffer > sizes so that ZFS could do even large > single > operations in single iSCSI > transactions. (More details available > if people are > interested.) 
> > 10: At the wire level, the speed > problems are > clearly due to pauses in > response time by omnios. At 9000 > byte frame > sizes, I see a good number > of duplicate ACKs and fast > retransmits during > read operations (when > omnios is transmitting). But below > about a > 4100-byte MTU on omnios > (which seems to correlate to > 4096-byte iSCSI > block transfers), the > transmission errors fade away and > we only see > the transmission pause > problem. > > > This is what really attracted my > attention. In > our OmniOS setup, our > specific Intel hardware had ixgbe > driver issues that > could cause > activity stalls during once-a-second > link heartbeat > checks. This > obviously had an effect at the TCP and > iSCSI layers. > My initial message > to illumos-developer sparked a potentially > interesting discussion: > > > http://www.listbox.com/member/____archive/182179/2014/10/sort/____time_rev/page/16/entry/6:__405/__20141003125035:6357079A-__4B1D-__11E4-A39C-D534381BA44D/ > < > http://www.listbox.com/member/__archive/182179/2014/10/sort/__time_rev/page/16/entry/6:405/__20141003125035:6357079A-4B1D-__11E4-A39C-D534381BA44D/ > > > > < > http://www.listbox.com/__member/archive/182179/2014/10/__sort/time_rev/page/16/entry/6:__405/20141003125035:6357079A-__4B1D-11E4-A39C-D534381BA44D/ > < > http://www.listbox.com/member/archive/182179/2014/10/sort/time_rev/page/16/entry/6:405/20141003125035:6357079A-4B1D-11E4-A39C-D534381BA44D/ > >> > > If you think this is a possibility in > your setup, > I've put the DTrace > script I used to hunt for this up on > the web: > > > http://www.cs.toronto.edu/~____cks/src/omnios-ixgbe/ixgbe_____delay.d > < > http://www.cs.toronto.edu/~__cks/src/omnios-ixgbe/ixgbe___delay.d> > > < > http://www.cs.toronto.edu/~__cks/src/omnios-ixgbe/ixgbe___delay.d > >> > > This isn't the only potential source > of driver > stalls by any means, it's > just the one I found. You may also > want to look at > lockstat in general, > as information it reported is what led > us to look > specifically at the > ixgbe code here. > > (If you suspect kernel/driver issues, > lockstat > combined with kernel > source is a really excellent resource.) > > - cks > > > > > > ___________________________________________________ > OmniOS-discuss mailing list > OmniOS-discuss at lists.omniti > .____com > > > > http://lists.omniti.com/____mailman/listinfo/omnios-____discuss > > > > > > > ___________________________________________________ > OmniOS-discuss mailing list > OmniOS-discuss at lists.omniti > .____com > > > > http://lists.omniti.com/____mailman/listinfo/omnios-____discuss > > > > > > > -- > OSN Online Service Nuernberg GmbH, Bucher Str. 78, > 90408 Nuernberg > Tel: +49 911 39905-0 <%2B49%20911%2039905-0>> > > - Fax: > +49 911 > 39905-55 <%2B49%20911%2039905-55>> - > http://www.osn.de > HRB 15022 Nuernberg, USt-Id: DE189301263, GF: Joerg > Goltermann > > > > -- > OSN Online Service Nuernberg GmbH, Bucher Str. 78, 90408 Nuernberg > Tel: +49 911 39905-0 <%2B49%20911%2039905-0>> - Fax: +49 > 911 39905-55 > > - http://www.osn.de > > HRB 15022 Nuernberg, USt-Id: DE189301263, GF: Joerg Goltermann > > > *illumos-developer* | Archives > > > | Modify Your Subscription > [Powered by Listbox] > > > > *illumos-developer* | Archives > > | > Modify > > > > Your Subscription [Powered by Listbox] > > ... > > [Message clipped] -------------- next part -------------- An HTML attachment was scrubbed... 
URL: 
From gate03 at landcroft.co.uk Wed Mar 4 04:26:03 2015
From: gate03 at landcroft.co.uk (Michael Mounteney)
Date: Wed, 4 Mar 2015 14:26:03 +1000
Subject: [OmniOS-discuss] speeding up file access
Message-ID: <20150304142603.152ac1da@emeritus>

Hello list; this is a very basic question about ZFS performance from someone
with limited sysadmin knowledge. I've seen various messages about ZILs and
caching, and noticed that my Supermicro 5017C-LF
(http://www.supermicro.com/products/system/1U/5017/SYS-5017C-LF.cfm) has a
single USB socket on the board, so I wondered if it would be worth putting a
USB stick / `thumbdrive' in there and using it as the ZIL / cache.

I know the real answer to my question is 'buy a proper server' but this is a
home system and cost, noise and power-consumption all mandate the current
choice of machine. (Yes; the USB socket is vertical; I'd have to buy a
right-angle converter.)

Thanks, Michael.

From wverb73 at gmail.com Wed Mar 4 05:21:56 2015
From: wverb73 at gmail.com (W Verb)
Date: Tue, 3 Mar 2015 21:21:56 -0800
Subject: Re: [OmniOS-discuss] [developer] Re: The ixgbe driver, Lindsay Lohan, and the Greek economy
In-Reply-To: 
References: <20150221032559.727D07A0792@apps0.cs.toronto.edu>
 <54EAEFA8.4020101@osn.de> <54EB5392.6030900@osn.de>
 <279C084B-E76B-4F38-A7CF-D6CB37D06D8F@damore.org> <54F44602.5030705@osn.de>
 <064C3FC8-43E0-4757-AADA-831303782A4C@damore.org>
Message-ID: 

Hello all,

This is probably the last message in this thread.

I pulled the quad-gig NIC out of one of my hosts, and installed an X520. I
then set a single 10G port on the server to be on the same VLAN as the host,
and defined a vswitch, vmknic, etc. on the host. I set the MTU to 9000 on
both sides, then ran my tests.

Read: 130 MB/s. Write: 156 MB/s.

Additionally, at higher MTUs, the NIC would periodically lock up until I
performed an "ipadm disable-if -t ixgbe0" and re-enabled it. I tried your
updated driver, Joerg, but unfortunately it failed quite often.

I then disabled stmf, enabled NFS (v3 only) on the server, and shared a
dataset on the zpool with "share -f nfs /ppool/testy". I then mounted the
server dataset on the host via NFS, and copied my test VM from the iSCSI
zvol to the NFS dataset. I also removed the binding of the 10G port on the
host from the sw iscsi interface.

Running the same tests on the VM over NFSv3 yielded:
Read: 650MB/s
Write: 306MB/s

This is getting within 10% of the throughput I consistently get on local dd
operations on the server, so I'm pretty happy that I'm getting as good as
I'm going to get until I add more drives. Additionally, I haven't
experienced any NIC hangs.

I tried varying the settings in ixgbe.conf, the MTU, and disabling LSO on
the host and server, but nothing really made that much of a difference
(except reducing the MTU made things about 20-30% slower).

mpstat during both NFS and iSCSI transfers showed all processors as getting
roughly the same number of interrupts, etc, although I did see a varying
number of spins on reader/writer locks during the iSCSI transfers. The NFS
showed no srws at all.
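For anyone repeating the iSCSI-to-NFS comparison described above, a minimal sketch of the switch-over (dataset name taken from the message; note that share(1M) actually wants an uppercase -F, and "zfs set sharenfs=on" is the persistent equivalent):

  # stop serving the COMSTAR target
  svcadm disable stmf
  # bring up the NFS server and export the test dataset
  svcadm enable -r nfs/server
  zfs set sharenfs=on ppool/testy     # or: share -F nfs /ppool/testy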
Here is a pretty representative example of a 1s mpstat during an iSCSI
transfer:

CPU minf mjf xcal  intr ithr  csw icsw migr smtx  srw syscl usr sys  wt idl set
  0    0   0    0  3246 2690 8739    6  772 5967    2     0   0  11   0  89   0
  1    0   0    0  2366 2249 7910    8  988 5563    2   302   0   9   0  91   0
  2    0   0    0  2455 2344 5584    5  687 5656    3    66   0   9   0  91   0
  3    0   0   25   248   12 6210    1  885 5679    2     0   0   9   0  91   0
  4    0   0    0   284    7 5450    2  861 5751    1     0   0   8   0  92   0
  5    0   0    0   232    3 4513    0  547 5733    3     0   0   7   0  93   0
  6    0   0    0   322    8 6084    1  836 6295    2     0   0   8   0  92   0
  7    0   0    0  3114 2848 8229    4  648 4966    2     0   0  10   0  90   0

So, it seems that it's COMSTAR/iSCSI that's broke as hell, not ixgbe. My
apologies to anyone I may have offended with my pre-judgement.

The consequences of this performance issue are significant:

1: Instead of being able to utilize the existing quad-port NICs I have in my
hosts, I must use dual 10G cards for redundancy purposes.
2: I must build out a full 10G switching infrastructure.
3: The network traffic is inherently less secure, as it is essentially
impossible to do real security with NFSv3 (that is supported by ESXi).
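For anyone repeating the interrupt-distribution check mentioned in this message, intrstat(1M) breaks interrupt activity out per device and per CPU, which mpstat's intr column does not; for example:

  # ten one-second samples of per-device, per-CPU interrupt counts
  intrstat 1 10
  # alongside the per-CPU summary shown above
  mpstat 1 10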
> > 4: The client is consistently advertising a large window size (20k+), so > the TCP throttling mechanism does not appear to play into this. > > 5: As mentioned previously, layer 2 flow control is not enabled anywhere > in the network, so there are no lower-level mechanisms at work. > > 6: Upon checking buffer and queue sizes (and doing the appropriate > research into documentation on the C3560E's buffer sizes), I do not see > large numbers of frames being dropped by the switch. It does happen at > larger MTUs, but not very often (and not consistently) during transfers at > 1000-3000 byte MTUs. I do not have QoS, policing, or rate-shaping enabled. > > 7: Network interface stats on both the server and the ESXi client show no > errors of any kind. This is via netstat on the server, and esxcli / Vsphere > client on the ESXi box. > > 8: When looking at captures taken simultaneously on the server and client > side, the server-side transmission pauses are consistently seen and > reproducible, even after multiple server rebuilds, ESXi rebuilds, vSphere > reinstallations (down to wiping the SQL db), various COMSTAR configuration > variations, multiple 10G NICs with different NIC chipsets, multiple > switches (I tried both a 48-port and 24-port C3560E), multiple IOS > revisions (12.2 and 15.0), OmniOS versions (r151012 and previous) multiple > cables, transceivers, etc etc etc etc etc > > For your review, I have uploaded the actual packet captures to Google > Drive: > > > https://drive.google.com/file/d/0BwyUMjibonYQZS03dHJ2ZjJvTEE/view?usp=sharing > 2 int write - ESXi vmk5 > > https://drive.google.com/file/d/0BwyUMjibonYQTlNTQ2M5bjlxZ00/view?usp=sharing > 2 int write - ESXi vmk1 > > https://drive.google.com/file/d/0BwyUMjibonYQUEFsSVJCYXBVX3c/view?usp=sharing > 2 int read - server ixgbe0 > > https://drive.google.com/file/d/0BwyUMjibonYQT3FBbElnNFpJTzQ/view?usp=sharing > 2 int read - ESXi vmk5 > > https://drive.google.com/file/d/0BwyUMjibonYQU1hXdFRLM2cxSTA/view?usp=sharing > 2 int read - ESXi vmk1 > > https://drive.google.com/file/d/0BwyUMjibonYQNEFZSHVZdFNweDA/view?usp=sharing > 1 int write - ESXi vmk1 > > https://drive.google.com/file/d/0BwyUMjibonYQM3FpTmloQm5iMGc/view?usp=sharing > 1 int read - ESXi vmk1 > > Regards, > > Warren V > > On Mon, Mar 2, 2015 at 1:11 PM, Mallory, Rob > wrote: > >> Just an EWAG, and forgive me for not following closely, I just saw >> this in my inbox, and looked at it and the screenshots for 2 minutes. >> >> >> >> But this looks like the typical incast problem.. see >> http://www.pdl.cmu.edu/Incast/ >> >> where your storage servers (there are effectively two with ISCSI/MPIO if >> round-robin is working) have networks which are 20:1 oversubscribed to your >> 1GbE host interfaces. (although one of the tcpdumps shows only one server >> so it may be choked out completely) >> >> >> >> What is your BDP? I?m guessing .150ms * 1GbE. For single-link that gets >> you to a MSS of 18700 or so. >> >> >> >> On your 1GbE connected clients, leave MTU at 9k, set the following in >> sysctl.conf, >> >> And reboot. >> >> >> >> net.ipv4.tcp_rmem = 4096 8938 17876 >> >> >> >> If MPIO from the server is indeed round-robining properly, this will >> ?make things fit? much better. >> >> >> >> Note that your tcp_wmem can and should stay high, since you are not >> oversubscribed going from client?server ; you only need to tweak the >> tcp receive window size. 
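(The arithmetic behind the ~18700 figure above, for anyone checking it: BDP = bandwidth x RTT = 1 Gbit/s x 0.150 ms = 150,000 bits, or about 18,750 bytes. The suggested tcp_rmem values of 8938 and 17876 bytes are then roughly half and all of that budget; the point of the tuning is to keep the client's advertised receive window no larger than what a 1GbE link can actually have in flight, so a 10G sender can't build up a burst that overruns the switch port.)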
>> >> >> >> I?ve not done it in quite some time, but IIRC, You can also set these >> from the server side with: >> >> Route add -sendpipe 8930 or ?ssthresh >> >> >> >> And I think you can see the hash-table with computed BDP per client with >> ndd. >> >> >> >> I would try playing with those before delving deep into potential bugs in >> the TCP, nic driver, zfs, or vm. >> >> -Rob >> >> >> >> *From:* W Verb via illumos-developer [mailto:developer at lists.illumos.org] >> >> *Sent:* Monday, March 02, 2015 12:20 PM >> *To:* Garrett D'Amore >> *Cc:* Joerg Goltermann; illumos-dev; omnios-discuss at lists.omniti.com >> *Subject:* Re: [developer] Re: [OmniOS-discuss] The ixgbe driver, >> Lindsay Lohan, and the Greek economy >> >> >> >> Hello, >> >> vmstat seems pretty boring. Certainly nothing going to swap. >> >> root at sanbox:/root# vmstat >> kthr memory page disk faults cpu >> r b w swap free re mf pi po fr de sr po ro s0 s2 in sy cs us >> sy id >> 0 0 0 34631632 30728068 175 215 0 0 0 0 963 275 4 6 140 3301 796 6681 0 >> 1 99 >> >> Here is the "taskq_dispatch_ent" output from "lockstat -s 5 -kWP sleep >> 30" during the "fast" write operation. >> >> >> ------------------------------------------------------------------------------- >> Count indv cuml rcnt nsec Hottest Lock Caller >> 50934 3% 79% 0.00 3437 0xffffff093145ba40 taskq_dispatch_ent >> >> nsec ------ Time Distribution ------ count Stack >> 128 | 7 >> spa_taskq_dispatch_ent >> 256 |@@ 4333 zio_taskq_dispatch >> 512 |@@ 3863 zio_issue_async >> 1024 |@@@@@ 9717 zio_execute >> 2048 |@@@@@@@@@ 15904 >> 4096 |@@@@ 7595 >> 8192 |@@ 4498 >> 16384 |@ 2662 >> 32768 |@ 1886 >> 65536 | 434 >> 131072 | 34 >> 262144 | 1 >> >> ------------------------------------------------------------------------------- >> >> >> However, the truly "broken" function is a read operation: >> >> Top lock 1st try: >> >> ------------------------------------------------------------------------------- >> Count indv cuml rcnt nsec Hottest Lock Caller >> 474 15% 15% 0.00 7031 0xffffff093145b6f8 cv_wait >> >> nsec ------ Time Distribution ------ count Stack >> 256 |@ 29 taskq_thread_wait >> 512 |@@@@@@ 100 taskq_thread >> 1024 |@@@@ 72 thread_start >> 2048 |@@@@ 69 >> 4096 |@@@ 51 >> 8192 |@@ 47 >> 16384 |@@ 44 >> 32768 |@@ 32 >> 65536 |@ 25 >> 131072 | 5 >> >> ------------------------------------------------------------------------------- >> >> Top lock 2nd try: >> >> >> ------------------------------------------------------------------------------- >> Count indv cuml rcnt nsec Hottest Lock Caller >> 174 39% 39% 0.00 103909 0xffffff0943f116a0 dmu_zfetch_find >> >> nsec ------ Time Distribution ------ count Stack >> 2048 | 2 dmu_zfetch >> 4096 | 3 dbuf_read >> 8192 | 4 >> dmu_buf_hold_array_by_dnode >> 16384 | 3 dmu_buf_hold_array >> 32768 |@ 7 >> 65536 |@@ 14 >> 131072 |@@@@@@@@@@@@@@@@@@@@ 116 >> 262144 |@@@ 19 >> 524288 | 4 >> 1048576 | 2 >> >> ------------------------------------------------------------------------------- >> >> Top lock 3rd try: >> >> >> ------------------------------------------------------------------------------- >> Count indv cuml rcnt nsec Hottest Lock Caller >> 283 55% 55% 0.00 94602 0xffffff0943ff5a68 dmu_zfetch_find >> >> nsec ------ Time Distribution ------ count Stack >> 512 | 1 dmu_zfetch >> 1024 | 1 dbuf_read >> 2048 | 0 >> dmu_buf_hold_array_by_dnode >> 4096 | 5 dmu_buf_hold_array >> 8192 | 2 >> 16384 | 7 >> 32768 | 4 >> 65536 |@@@ 33 >> 131072 |@@@@@@@@@@@@@@@@@@@@ 198 >> 262144 |@@ 27 >> 524288 | 2 >> 1048576 | 3 >> >> 
------------------------------------------------------------------------------- >> >> >> >> As for the MTU question- setting the MTU to 9000 makes read operations >> grind almost to a halt at 5MB/s transfer rate. >> >> -Warren V >> >> >> >> On Mon, Mar 2, 2015 at 11:30 AM, Garrett D'Amore >> wrote: >> >> Here?s a theory. You are using small (relatively) MTUs (3000 is less >> than the smallest ZFS block size.) So, when you go multipathing this way, >> might a single upper layer transaction (ZFS block transfer request, or for >> that matter COMSTAR block request) get routed over different paths. This >> sounds like a potentially pathological condition to me. >> >> >> >> What happens if you increase the MTU to 9000? Have you tried it? I?m >> sort of thinking that this will permit each transaction to be issued in a >> single IP frame, which may alleviate certain tragic code paths. (That >> said, I?m not sure how aware COMSTAR is of the IP MTU. If it is ignorant, >> then it shouldn?t matter *that* much, since TCP should do the right thing >> here and a single TCP stream should stick to a single underlying NIC. But >> if COMSTAR is aware of the MTU, it may do some really screwball things as >> it tries to break requests up into single frames.) >> >> >> >> Your read spin really looks like only about 22 msec of wait out of a >> total run of 30 sec. (That?s not *great*, but neither does it sound >> tragic.) Your write is interesting because that looks like it is going a >> wildly different path. You should be aware that the locks you see are >> *not* necessarily related in call order, but rather are ordered by instance >> count. The write code path hitting the task_thread as hard as it does is >> really, really weird. Something is pounding on a taskq lock super hard. >> The number of taskq_dispatch_ent calls is interesting here. I?m starting >> to wonder if it?s something as stupid as a spin where if the taskq is >> ?full? (max size reached), a caller just is spinning trying to dispatch >> jobs to the taskq. >> >> >> >> The taskq_dispatch_ent code is super simple, and it should be almost >> impossible to have contention on that lock ? barring a thread spinning hard >> on taskq_dispatch (or taskq_dispatch_ent as I think is happening here). >> Looking at the various call sites, there are places in both COMSTAR >> (iscsit) and in ZFS where this could be coming from. To know which, we >> really need to have the back trace associated. >> >> >> >> lockstat can give this ? try giving ?-s 5? to give a short backtrace from >> this, that will probably give us a little more info about the guilty >> caller. :-) >> >> >> >> - Garrett >> >> >> >> On Mar 2, 2015, at 11:07 AM, W Verb via illumos-developer < >> developer at lists.illumos.org> wrote: >> >> >> >> Hello all, >> >> I am not using layer 2 flow control. The switch carries line-rate 10G >> traffic without error. >> >> I think I have found the issue via lockstat. 
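(A quick decoder for the lockstat output that follows, based on my reading of the lockstat man page: -k coalesces program counters within functions, -W aggregates events by caller rather than by individual lock, and -P sorts by the count-times-time product. Each list below is therefore roughly "which callers accounted for the most total adaptive-mutex spin time" during the 30-second sample.)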
The first lockstat is taken >> during a multipath read: >> >> lockstat -kWP sleep 30 >> >> >> Adaptive mutex spin: 21331 events in 30.020 seconds (711 events/sec) >> >> Count indv cuml rcnt nsec Hottest Lock Caller >> >> ------------------------------------------------------------------------------- >> 9306 44% 44% 0.00 1557 htable_mutex+0x370 htable_release >> 6307 23% 68% 0.00 1207 htable_mutex+0x108 htable_lookup >> 596 7% 75% 0.00 4100 0xffffff0931705188 cv_wait >> 349 5% 80% 0.00 4437 0xffffff0931705188 taskq_thread >> 704 2% 82% 0.00 995 0xffffff0935de3c50 dbuf_create >> >> The hash table being read here I would guess is the tcp connection hash >> table. >> >> >> >> When lockstat is run during a multipath write operation, I get: >> >> Adaptive mutex spin: 1097341 events in 30.016 seconds (36558 events/sec) >> >> Count indv cuml rcnt nsec Hottest Lock Caller >> >> ------------------------------------------------------------------------------- >> 210752 28% 28% 0.00 4781 0xffffff0931705188 taskq_thread >> 174471 22% 50% 0.00 4476 0xffffff0931705188 cv_wait >> 127183 10% 61% 0.00 2871 0xffffff096f29b510 zio_notify_parent >> 176066 10% 70% 0.00 1922 0xffffff0931705188 taskq_dispatch_ent >> 105134 9% 80% 0.00 3110 0xffffff096ffdbf10 zio_remove_child >> 67512 4% 83% 0.00 1938 0xffffff096f3db4b0 zio_add_child >> 45736 3% 86% 0.00 2239 0xffffff0935de3c50 dbuf_destroy >> 27781 3% 89% 0.00 3416 0xffffff0935de3c50 dbuf_create >> 38536 2% 91% 0.00 2122 0xffffff0935de3b70 dnode_rele >> 27841 2% 93% 0.00 2423 0xffffff0935de3b70 dnode_diduse_space >> 19020 2% 95% 0.00 3046 0xffffff09d9e305e0 dbuf_rele >> 14627 1% 96% 0.00 3632 dbuf_hash_table+0x4f8 dbuf_find >> >> >> Writes are not performing htable lookups, while reads are. >> >> -Warren V >> >> >> >> >> >> >> >> On Mon, Mar 2, 2015 at 3:14 AM, Joerg Goltermann wrote: >> >> Hi, >> >> I would try *one* TPG which includes both interface addresses >> and I would double check for packet drops on the Catalyst. >> >> The 3560 supports only receive flow control which means, that >> a sending 10Gbit port can easily overload a 1Gbit port. >> Do you have flow control enabled? >> >> - Joerg >> >> >> >> On 02.03.2015 09:22, W Verb via illumos-developer wrote: >> >> Hello Garrett, >> >> No, no 802.3ad going on in this config. >> >> Here is a basic schematic: >> >> >> https://drive.google.com/file/d/0BwyUMjibonYQVkVqcE5OQUJyUUU/view?usp=sharing >> >> Here is the Nexenta MPIO iSCSI Setup Document that I used as a guide: >> >> >> https://drive.google.com/file/d/0BwyUMjibonYQbjEyUTBjN2tTNWM/view?usp=sharing >> >> Note that I am using an MTU of 3000 on both the 10G and 1G NICs. The >> switch is set to allow 9148-byte frames, and I'm not seeing any >> errors/buffer overruns on the switch. >> >> Here is a screenshot of a packet capture from a read operation on the >> guest OS (from it's local drive, which is actually a VMDK file on the >> storage server). In this example, only a single 1G ESXi kernel interface >> (vmk1) is bound to the software iSCSI initiator. >> >> >> https://drive.google.com/file/d/0BwyUMjibonYQa2NYdXhpZkpkbU0/view?usp=sharing >> >> Note that there's a nice, well-behaved window sizing process taking >> place. The ESXi decreases the scaled window by 11 or 12 for each ACK, >> then bumps it back up to 512. 
>> >> Here is a similar screenshot of a single-interface write operation: >> >> >> https://drive.google.com/file/d/0BwyUMjibonYQbU1RZHRnakxDSFU/view?usp=sharing >> >> There are no pauses or gaps in the transmission rate in the >> single-interface transfers. >> >> >> In the next screenshots, I have enabled an additional 1G interface on >> the ESXi host, and bound it to the iSCSI initiator. The new interface is >> bound to a separate physical port, uses a different VLAN on the switch, >> and talks to a different 10G port on the storage server. >> >> First, let's look at a write operation on the guest OS, which happily >> pumps data at near-line-rate to the storage server. >> >> Here is a sequence number trace diagram. Note how the transfer has a >> nice, smooth increment rate over the entire transfer. >> >> >> https://drive.google.com/file/d/0BwyUMjibonYQWHNIa0drWnNxMmM/view?usp=sharing >> >> Here are screenshots from packet captures on both 1G interfaces: >> >> >> https://drive.google.com/file/d/0BwyUMjibonYQRWhyVVQ4djNaU3c/view?usp=sharing >> >> https://drive.google.com/file/d/0BwyUMjibonYQaTVjTEtTRloyR2c/view?usp=sharing >> >> Note how we again see nice, smooth window adjustment, and no gaps in >> transmission. >> >> >> But now, let's look at the problematic two-interface Read operation. >> First, the sequence graph: >> >> >> https://drive.google.com/file/d/0BwyUMjibonYQTzdFVWdQMWZ6LUU/view?usp=sharing >> >> As you can see, there are gaps and jumps in the transmission throughout >> the transfer. >> It is very illustrative to look at captures of the gaps, which are >> occurring on both interfaces: >> >> >> https://drive.google.com/file/d/0BwyUMjibonYQc0VISXN6eVFwQzg/view?usp=sharing >> >> https://drive.google.com/file/d/0BwyUMjibonYQVFREUHp3TGFiUU0/view?usp=sharing >> >> As you can see, there are ~.4 second pauses in transmission from the >> storage server, which kills the transfer rate. >> It's clear that the ESXi box ACKs the prior iSCSI operation to >> completion, then makes a new LUN request, which the storage server >> immediately replies to. The ESXi ACKs the response packet from the >> storage server, then waits...and waits....and waits... until eventually >> the storage server starts transmitting again. >> >> Because the pause happens while the ESXi client is waiting for a packet >> from the storage server, that tells me that the gaps are not an artifact >> of traffic being switched between both active interfaces, but are >> actually indicative of short hangs occurring on the server. >> >> Having a pause or two in transmission is no big deal, but in my case, it >> is happening constantly, and dropping my overall read transfer rate down >> to 20-60MB/s, which is slower than the single interface transfer rate >> (~90-100MB/s). >> >> Decreasing the MTU makes the pauses shorter, increasing them makes the >> pauses longer. >> >> Another interesting thing is that if I set the multipath io interval to >> 3 operations instead of 1, I get better throughput. In other words, the >> less frequently I swap IP addresses on my iSCSI requests from the ESXi >> unit, the fewer pauses I see. >> >> Basically, COMSTAR seems to choke each time an iSCSI request from a new >> IP arrives. >> >> Because the single interface transfer is near line rate, that tells me >> that the storage system (mpt_sas, zfs, etc) is working fine. It's only >> when multiple paths are attempted that iSCSI falls on its face during >> reads. 
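(For reference, the "multipath io interval" being varied here is the ESXi round-robin IOPS limit on the path selection policy. Changing it per device looks roughly like the following -- the device identifier is a placeholder and the esxcli namespace may differ slightly between ESXi releases:

  esxcli storage nmp device set --device naa.XXXXXXXX --psp VMW_PSP_RR
  esxcli storage nmp psp roundrobin deviceconfig set --device naa.XXXXXXXX --type iops --iops 3

With --iops 1 the initiator switches paths on every command, which is the worst case for whatever is choking on the target side.)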
>> >> All of these captures were taken without a cache device being attached >> to the storage zpool, so this isn't looking like some kind of ZFS ARC >> problem. As mentioned previously, local transfers to/from the zpool are >> showing ~300-500 MB/s rates over long transfers (10G+). >> >> -Warren V >> >> On Sun, Mar 1, 2015 at 9:11 PM, Garrett D'Amore > >> > wrote: >> >> I?m not sure I?ve followed properly. You have *two* interfaces. >> You are not trying to provision these in an aggr are you? As far as >> I?m aware, VMware does not support 802.3ad link aggregations. (Its >> possible that you can make it work with ESXi if you give the entire >> NIC to the guest ? but I?m skeptical.) The problem is that if you >> try to use link aggregation, some packets (up to half!) will be >> lost. TCP and other protocols fare poorly in this situation. >> >> Its possible I?ve totally misunderstood what you?re trying to do, in >> which case I apologize. >> >> The idle thing is a red-herring ? the cpu is waiting for work to do, >> probably because packets haven?t arrived (or where dropped by the >> hypervisor!) I wouldn?t read too much into that except that your >> network stack is in trouble. I?d look a bit more closely at the >> kstats for tcp ? I suspect you?ll see retransmits or out of order >> values that are unusually high ? if so this may help validate my >> theory above. >> >> - Garrett >> >> On Mar 1, 2015, at 9:03 PM, W Verb via illumos-developer >> > >> >> >> wrote: >> >> Hello all, >> >> >> Well, I no longer blame the ixgbe driver for the problems I'm seeing. >> >> >> I tried Joerg's updated driver, which didn't improve the issue. So >> I went back to the drawing board and rebuilt the server from scratch. >> >> What I noted is that if I have only a single 1-gig physical >> interface active on the ESXi host, everything works as expected. >> As soon as I enable two interfaces, I start seeing the performance >> problems I've described. >> >> Response pauses from the server that I see in TCPdumps are still >> leading me to believe the problem is delay on the server side, so >> I ran a series of kernel dtraces and produced some flamegraphs. >> >> >> This was taken during a read operation with two active 10G >> interfaces on the server, with a single target being shared by two >> tpgs- one tpg for each 10G physical port. The host device has two >> 1G ports enabled, with VLANs separating the active ports into >> 10G/1G pairs. ESXi is set to multipath using both VLANS with a >> round-robin IO interval of 1. >> >> >> https://drive.google.com/file/d/0BwyUMjibonYQd3ZYOGh4d2pteGs/view?usp=sharing >> >> >> This was taken during a write operation: >> >> >> https://drive.google.com/file/d/0BwyUMjibonYQMnBtU1Q2SXM2ams/view?usp=sharing >> >> >> I then rebooted the server and disabled C-State, ACPI T-State, and >> general EIST (Turbo boost) functionality in the CPU. >> >> I when I attempted to boot my guest VM, the iSCSI transfer >> gradually ground to a halt during the boot loading process, and >> the guest OS never did complete its boot process. 
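(For anyone who wants to produce the same kind of kernel flamegraph: the usual recipe is a timed DTrace profile fed through Brendan Gregg's FlameGraph scripts. A minimal sketch -- the sample window and file names here are arbitrary:

  dtrace -x stackframes=100 -n 'profile-997 /arg0/ { @[stack()] = count(); } tick-60s { exit(0); }' -o kern.stacks
  stackcollapse.pl kern.stacks | flamegraph.pl > kern.svg

The /arg0/ predicate keeps only samples taken while in kernel context.)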
>> >> Here is a flamegraph taken while iSCSI is slowly dying: >> >> >> https://drive.google.com/file/d/0BwyUMjibonYQM21JeFZPX3dZWTg/view?usp=sharing >> >> >> I edited out cpu_idle_adaptive from the dtrace output and >> regenerated the slowdown graph: >> >> >> https://drive.google.com/file/d/0BwyUMjibonYQbTVwV3NvXzlPS1E/view?usp=sharing >> >> >> I then edited cpu_idle_adaptive out of the speedy write operation >> and regenerated that graph: >> >> >> https://drive.google.com/file/d/0BwyUMjibonYQeWFYM0pCMDZ1X2s/view?usp=sharing >> >> >> I have zero experience with interpreting flamegraphs, but the most >> significant difference I see between the slow read example and the >> fast write example is in unix`thread_start --> unix`idle. There's >> a good chunk of "unix`i86_mwait" in the read example that is not >> present in the write example at all. >> >> Disabling the l2arc cache device didn't make a difference, and I >> had to reenable EIST support on the CPU to get my VMs to boot. >> >> I am seeing a variety of bug reports going back to 2010 regarding >> excessive mwait operations, with the suggested solutions usually >> being to set "cpupm enable poll-mode" in power.conf. That change >> also had no effect on speed. >> >> -Warren V >> >> >> >> >> -----Original Message----- >> >> From: Chris Siebenmann [mailto:cks at cs.toronto.edu] >> >> Sent: Monday, February 23, 2015 8:30 AM >> >> To: W Verb >> >> Cc: omnios-discuss at lists.omniti.com >> >> ; cks at cs.toronto.edu >> >> >> Subject: Re: [OmniOS-discuss] The ixgbe driver, Lindsay Lohan, and >> the Greek economy >> >> >> > Chris, thanks for your specific details. I'd appreciate it if you >> >> > could tell me which copper NIC you tried, as well as to pass on the >> >> > iSCSI tuning parameters. >> >> >> Our copper NIC experience is with onboard X540-AT2 ports on >> SuperMicro hardware (which have the guaranteed 10-20 msec lock >> hold) and dual-port 82599EB TN cards (which have some sort of >> driver/hardware failure under load that eventually leads to >> 2-second lock holds). I can't recommend either with the current >> driver; we had to revert to 1G networking in order to get stable >> servers. >> >> >> The iSCSI parameter modifications we do, across both initiators >> and targets, are: >> >> >> initialr2tno >> >> firstburstlength128k >> >> maxrecvdataseglen128k[only on Linux backends] >> >> maxxmitdataseglen128k[only on Linux backends] >> >> >> The OmniOS initiator doesn't need tuning for more than the first >> two parameters; on the Linux backends we tune up all four. My >> extended thoughts on these tuning parameters and why we touch them >> can be found >> >> here: >> >> >> >> http://utcc.utoronto.ca/~cks/space/blog/tech/UnderstandingiSCSIProtocol >> >> http://utcc.utoronto.ca/~cks/space/blog/tech/LikelyISCSITuning >> >> >> The short version is that these parameters probably only make a >> small difference but their overall goal is to do 128KB ZFS reads >> and writes in single iSCSI operations (although they will be >> fragmented at the TCP >> >> layer) and to do iSCSI writes without a back-and-forth delay >> between initiator and target (that's 'initialr2t no'). >> >> >> I think basically everyone should use InitialR2T set to no and in >> fact that it should be the software default. These days only >> unusually limited iSCSI targets should need it to be otherwise and >> they can change their setting for it (initiator and target must >> both agree to it being 'yes', so either can veto it). 
>> >> >> - cks >> >> >> >> On Mon, Feb 23, 2015 at 8:21 AM, Joerg Goltermann > >> > wrote: >> >> Hi, >> >> I think your problem is caused by your link properties or your >> switch settings. In general the standard ixgbe seems to perform >> well. >> >> I had trouble after changing the default flow control settings >> to "bi" >> and this was my motivation to update the ixgbe driver a long >> time ago. >> After I have updated our systems to ixgbe 2.5.8 I never had any >> problems .... >> >> Make sure your switch has support for jumbo frames and you use >> the same mtu on all ports, otherwise the smallest will be used. >> >> What switch do you use? I can tell you nice horror stories about >> different vendors.... >> >> - Joerg >> >> On 23.02.2015 10:31, W Verb wrote: >> >> Thank you Joerg, >> >> I've downloaded the package and will try it tomorrow. >> >> The only thing I can add at this point is that upon review >> of my >> testing, I may have performed my "pkg -u" between the >> initial quad-gig >> performance test and installing the 10G NIC. So this may >> be a new >> problem introduced in the latest updates. >> >> Those of you who are running 10G and have not upgraded to >> the latest >> kernel, etc, might want to do some additional testing >> before running the >> update. >> >> -Warren V >> >> On Mon, Feb 23, 2015 at 1:15 AM, Joerg Goltermann >> >> >> >> wrote: >> >> Hi, >> >> I remember there was a problem with the flow control >> settings in the >> ixgbe >> driver, so I updated it a long time ago for our >> internal servers to >> 2.5.8. >> Last weekend I integrated the latest changes from the >> FreeBSD driver >> to bring >> the illumos ixgbe to 2.5.25 but I had no time to test >> it, so it's >> completely >> untested! >> >> >> If you would like to give the latest driver a try you >> can fetch the >> kernel modules from >> https://cloud.osn.de/index.____php/s/Fb4so9RsNnXA7r9 >> >> > > >> >> Clone your boot environment, place the modules in the >> new environment >> and update the boot-archive of the new BE. >> >> - Joerg >> >> >> >> >> >> On 23.02.2015 02:54, W Verb wrote: >> >> By the way, to those of you who have working >> setups: please send me >> your pool/volume settings, interface linkprops, >> and any kernel >> tuning >> parameters you may have set. >> >> Thanks, >> Warren V >> >> On Sat, Feb 21, 2015 at 7:59 AM, Schweiss, Chip >> >> >> >> >> >> wrote: >> >> I can't say I totally agree with your performance >> assessment. I run Intel >> X520 in all my OmniOS boxes. >> >> Here is a capture of nfssvrtop I made while >> running many >> storage vMotions >> between two OmniOS boxes hosting NFS >> datastores. This is a >> 10 host VMware >> cluster. Both OmniOS boxes are dual 10G >> connected with >> copper twin-ax to >> the in rack Nexus 5010. >> >> VMware does 100% sync writes, I use ZeusRAM >> SSDs for log >> devices. 
>> >> -Chip >> >> 2014 Apr 24 08:05:51, load: 12.64, read: >> 17330243 KB, >> swrite: 15985 KB, >> awrite: 1875455 KB >> >> Ver Client NFSOPS Reads >> SWrites AWrites >> Commits Rd_bw >> SWr_bw AWr_bw Rd_t SWr_t AWr_t >> Com_t Align% >> >> 4 10.28.17.105 0 0 >> 0 0 >> 0 0 >> 0 0 0 0 0 0 0 >> >> 4 10.28.17.215 0 0 >> 0 0 >> 0 0 >> 0 0 0 0 0 0 0 >> >> 4 10.28.17.213 0 0 >> 0 0 >> 0 0 >> 0 0 0 0 0 0 0 >> >> 4 10.28.16.151 0 0 >> 0 0 >> 0 0 >> 0 0 0 0 0 0 0 >> >> 4 all 1 0 >> 0 0 >> 0 0 >> 0 0 0 0 0 0 0 >> >> 3 10.28.16.175 3 0 >> 3 0 >> 0 1 >> 11 0 4806 48 0 0 85 >> >> 3 10.28.16.183 6 0 >> 6 0 >> 0 3 >> 162 0 549 124 0 0 >> 73 >> >> 3 10.28.16.180 11 0 >> 10 0 >> 0 3 >> 27 0 776 89 0 0 67 >> >> 3 10.28.16.176 28 2 >> 26 0 >> 0 10 >> 405 0 2572 198 0 0 >> 100 >> >> 3 10.28.16.178 4606 4602 >> 4 0 >> 0 294534 >> 3 0 723 49 0 0 99 >> >> 3 10.28.16.179 4905 4879 >> 26 0 >> 0 312208 >> 311 0 735 271 0 0 >> 99 >> >> 3 10.28.16.181 5515 5502 >> 13 0 >> 0 352107 >> 77 0 89 87 0 0 99 >> >> 3 10.28.16.184 12095 12059 >> 10 0 >> 0 763014 >> 39 0 249 147 0 0 99 >> >> 3 10.28.58.1 15401 6040 >> 116 6354 >> 53 191605 >> 474 202346 192 96 144 83 >> 99 >> >> 3 all 42574 33086 > <42574%2033086>> >> > 217 >> 6354 53 1913488 >> 1582 202300 348 138 153 105 >> 99 >> >> >> >> >> >> On Fri, Feb 20, 2015 at 11:46 PM, W Verb >> >> > >> >> >> wrote: >> >> >> Hello All, >> >> Thank you for your replies. >> I tried a few things, and found the following: >> >> 1: Disabling hyperthreading support in the >> BIOS drops >> performance overall >> by a factor of 4. >> 2: Disabling VT support also seems to have >> some effect, >> although it >> appears to be minor. But this has the >> amusing side >> effect of fixing the >> hangs I've been experiencing with fast >> reboot. Probably >> by disabling kvm. >> 3: The performance tests are a bit tricky >> to quantify >> because of caching >> effects. In fact, I'm not entirely sure >> what is >> happening here. It's just >> best to describe what I'm seeing: >> >> The commands I'm using to test are >> dd if=/dev/zero of=./test.dd bs=2M count=5000 >> dd of=/dev/null if=./test.dd bs=2M count=5000 >> The host vm is running Centos 6.6, and has >> the latest >> vmtools installed. >> There is a host cache on an SSD local to >> the host that >> is also in place. >> Disabling the host cache didn't >> immediately have an >> effect as far as I could >> see. >> >> The host MTU set to 3000 on all iSCSI >> interfaces for all >> tests. >> >> Test 1: Right after reboot, with an ixgbe >> MTU of 9000, >> the write test >> yields an average speed over three tests >> of 137MB/s. The >> read test yields an >> average over three tests of 5MB/s. >> >> Test 2: After setting "ifconfig ixgbe0 mtu >> 3000", the >> write tests yield >> 140MB/s, and the read tests yield 53MB/s. >> It's important >> to note here that >> if I cut the read test short at only >> 2-3GB, I get >> results upwards of >> 350MB/s, which I assume is local >> cache-related distortion. >> >> Test 3: MTU of 1500. Read tests are up to >> 156 MB/s. >> Write tests yield >> about 142MB/s. >> Test 4: MTU of 1000: Read test at 182MB/s. >> Test 5: MTU of 900: Read test at 130 MB/s. >> Test 6: MTU of 1000: Read test at 160MB/s. >> Write tests >> are now >> consistently at about 300MB/s. >> Test 7: MTU of 1200: Read test at 124MB/s. >> Test 8: MTU of 1000: Read test at 161MB/s. >> Write at 261MB/s. >> >> A few final notes: >> L1ARC grabs about 10GB of RAM during the >> tests, so >> there's definitely some >> read caching going on. 
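(Two small notes on reproducing that observation: ARC growth can be watched directly with kstat, e.g. "kstat -p zfs:0:arcstats:size" every few seconds, and dd from /dev/zero will overstate both the network and the disk numbers if compression is enabled on the dataset, since the zeros compress away almost entirely.)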
>> The write operations are easier to observe >> with iostat, >> and I'm seeing io >> rates that closely correlate with the >> network write speeds. >> >> >> Chris, thanks for your specific details. >> I'd appreciate >> it if you could >> tell me which copper NIC you tried, as >> well as to pass >> on the iSCSI tuning >> parameters. >> >> I've ordered an Intel EXPX9502AFXSR, which >> uses the >> 82598 chip instead of >> the 82599 in the X520. If I get similar >> results with my >> fiber transcievers, >> I'll see if I can get a hold of copper ones. >> >> But I should mention that I did indeed >> look at PHY/MAC >> error rates, and >> they are nil. >> >> -Warren V >> >> On Fri, Feb 20, 2015 at 7:25 PM, Chris >> Siebenmann >> > >> > >> >> >> >> >> wrote: >> >> >> After installation and >> configuration, I observed >> all kinds of bad >> behavior >> in the network traffic between the >> hosts and the >> server. All of this >> bad >> behavior is traced to the ixgbe >> driver on the >> storage server. Without >> going >> into the full troubleshooting >> process, here are >> my takeaways: >> >> [...] >> >> For what it's worth, we managed to >> achieve much >> better line rates on >> copper 10G ixgbe hardware of various >> descriptions >> between OmniOS >> and CentOS 7 (I don't think we ever >> tested OmniOS to >> OmniOS). I don't >> believe OmniOS could do TCP at full >> line rate but I >> think we managed 700+ >> Mbytes/sec on both transmit and >> receive and we got >> basically disk-limited >> speeds with iSCSI (across multiple >> disks on >> multi-disk mirrored pools, >> OmniOS iSCSI initiator, Linux iSCSI >> targets). >> >> I don't believe we did any specific >> kernel tuning >> (and in fact some of >> our attempts to fiddle ixgbe driver >> parameters blew >> up in our face). >> We did tune iSCSI connection >> parameters to increase >> various buffer >> sizes so that ZFS could do even large >> single >> operations in single iSCSI >> transactions. (More details available >> if people are >> interested.) >> >> 10: At the wire level, the speed >> problems are >> clearly due to pauses in >> response time by omnios. At 9000 >> byte frame >> sizes, I see a good number >> of duplicate ACKs and fast >> retransmits during >> read operations (when >> omnios is transmitting). But below >> about a >> 4100-byte MTU on omnios >> (which seems to correlate to >> 4096-byte iSCSI >> block transfers), the >> transmission errors fade away and >> we only see >> the transmission pause >> problem. >> >> >> This is what really attracted my >> attention. In >> our OmniOS setup, our >> specific Intel hardware had ixgbe >> driver issues that >> could cause >> activity stalls during once-a-second >> link heartbeat >> checks. This >> obviously had an effect at the TCP and >> iSCSI layers. 
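(A cheap way to check for that kind of stall from the TCP side, without any DTrace, is to snapshot the TCP MIB counters before and after a slow transfer, e.g.:

  netstat -s -P tcp | egrep -i 'retrans|unorder|drop'

A driver stall long enough to matter at the iSCSI layer usually shows up as a jump in tcpRetransSegs or the out-of-order segment counts.)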
>> My initial message >> to illumos-developer sparked a potentially >> interesting discussion: >> >> >> http://www.listbox.com/member/____archive/182179/2014/10/sort/____time_rev/page/16/entry/6:__405/__20141003125035:6357079A-__4B1D-__11E4-A39C-D534381BA44D/ >> < >> http://www.listbox.com/member/__archive/182179/2014/10/sort/__time_rev/page/16/entry/6:405/__20141003125035:6357079A-4B1D-__11E4-A39C-D534381BA44D/ >> > >> >> < >> http://www.listbox.com/__member/archive/182179/2014/10/__sort/time_rev/page/16/entry/6:__405/20141003125035:6357079A-__4B1D-11E4-A39C-D534381BA44D/ >> < >> http://www.listbox.com/member/archive/182179/2014/10/sort/time_rev/page/16/entry/6:405/20141003125035:6357079A-4B1D-11E4-A39C-D534381BA44D/ >> >> >> >> If you think this is a possibility in >> your setup, >> I've put the DTrace >> script I used to hunt for this up on >> the web: >> >> >> http://www.cs.toronto.edu/~____cks/src/omnios-ixgbe/ixgbe_____delay.d >> < >> http://www.cs.toronto.edu/~__cks/src/omnios-ixgbe/ixgbe___delay.d> >> >> < >> http://www.cs.toronto.edu/~__cks/src/omnios-ixgbe/ixgbe___delay.d >> < >> http://www.cs.toronto.edu/~cks/src/omnios-ixgbe/ixgbe_delay.d>> >> >> This isn't the only potential source >> of driver >> stalls by any means, it's >> just the one I found. You may also >> want to look at >> lockstat in general, >> as information it reported is what led >> us to look >> specifically at the >> ixgbe code here. >> >> (If you suspect kernel/driver issues, >> lockstat >> combined with kernel >> source is a really excellent resource.) >> >> - cks >> >> >> >> >> >> ___________________________________________________ >> OmniOS-discuss mailing list >> OmniOS-discuss at lists.omniti >> .____com >> > > >> >> http://lists.omniti.com/____mailman/listinfo/omnios-____discuss >> >> >> > > >> >> >> ___________________________________________________ >> OmniOS-discuss mailing list >> OmniOS-discuss at lists.omniti >> .____com >> > > >> >> http://lists.omniti.com/____mailman/listinfo/omnios-____discuss >> >> >> > > >> >> >> -- >> OSN Online Service Nuernberg GmbH, Bucher Str. 78, >> 90408 Nuernberg >> Tel: +49 911 39905-0 > <%2B49%20911%2039905-0>> >> > - Fax: >> +49 911 >> 39905-55 > <%2B49%20911%2039905-55>> - >> http://www.osn.de >> HRB 15022 Nuernberg, USt-Id: DE189301263, GF: Joerg >> Goltermann >> >> >> >> -- >> OSN Online Service Nuernberg GmbH, Bucher Str. 78, 90408 Nuernberg >> Tel: +49 911 39905-0 > <%2B49%20911%2039905-0>> - Fax: +49 >> 911 39905-55 > >> - http://www.osn.de >> >> HRB 15022 Nuernberg, USt-Id: DE189301263, GF: Joerg Goltermann >> >> >> *illumos-developer* | Archives >> >> >> | Modify Your Subscription >> [Powered by Listbox] >> >> >> >> *illumos-developer* | Archives >> >> | >> Modify >> >> > > >> Your Subscription [Powered by Listbox] > >> >> ... >> >> [Message clipped] > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From garrett at damore.org Wed Mar 4 07:30:17 2015 From: garrett at damore.org (Garrett D'Amore) Date: Tue, 3 Mar 2015 23:30:17 -0800 Subject: [OmniOS-discuss] [developer] Re: The ixgbe driver, Lindsay Lohan, and the Greek economy In-Reply-To: References: <20150221032559.727D07A0792@apps0.cs.toronto.edu> <54EAEFA8.4020101@osn.de> <54EB5392.6030900@osn.de> <279C084B-E76B-4F38-A7CF-D6CB37D06D8F@damore.org> <54F44602.5030705@osn.de> <064C3FC8-43E0-4757-AADA-831303782A4C@damore.org> Message-ID: I'm not surprised by this result. Indeed with the earlier data you had from lockstat it looked like a comstar or zfs issue on the server. 
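(The useful form of that data is the complete -s 5 report rather than a single stanza -- something along the lines of

  lockstat -s 5 -kWP sleep 30 > /tmp/lockstat-read.out

captured during a slow multipath read, with every caller block kept, since the per-caller stacks are what would tell an iscsit dispatch storm apart from one originating inside ZFS.)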
Unfortunately the follow up lockstat you sent was pruned to uselessness. If you can post the full lockstat with -s5 somewhere it might help understand what is actually going on under the hood. Sent from my iPhone > On Mar 3, 2015, at 9:21 PM, W Verb wrote: > > Hello all, > > This is probably the last message in this thread. > > I pulled the quad-gig NIC out of one of my hosts, and installed an X520. I then set a single 10G port on the server to be on the same VLAN as the host, and defined a vswitch, vmknic, etc on the host. > > I set the MTU to be 9000 on both sides, then ran my tests. > > Read: 130 MB/s. > Write: 156 MB/s. > > Additionally, at higher MTUs, the NIC would periodically lock up until I performed an "ipadm disable-if -t ixgbe0" and re-enabled it. I tried your updated driver, Jeorg, but unfortunately it failed quite often. > > I then disabled stmf, enabled NFS (v3 only) on the server, and shared a dataset on the zpool with "share -f nfs /ppool/testy". > I then mounted the server dataset on the host via NFS, and copied my test VM from the iSCSI zvol to the NFS dataset. I also removed the binding of the 10G port on the host from the sw iscsi interface. > > Running the same tests on the VM over NFSv3 yielded: > > Read: 650MB/s > Write: 306MB/s > > This is getting within 10% of the throughput I consistently get on dd operations local on the server, so I'm pretty happy that I'm getting as good as I'm going to get until I add more drives. Additionally, I haven't experienced any NIC hangs. > > I tried varying the settings in ixgbe.conf, the MTU, and disabling LSO on the host and server, but nothing really made that much of a difference (except reducing the MTU made things about 20-30% slower). > > mpstat during both NFS and iSCSI transfers showed all processors as getting roughly the same number of interrupts, etc, although I did see a varying number of spins on reader/writer locks during the iSCSI transfers. The NFS showed no srws at all. > > Here is a pretty representative example of a 1s mpstat during an iSCSI transfer: > > CPU minf mjf xcal intr ithr csw icsw migr smtx srw syscl usr sys wt idl set > 0 0 0 0 3246 2690 8739 6 772 5967 2 0 0 11 0 89 0 > 1 0 0 0 2366 2249 7910 8 988 5563 2 302 0 9 0 91 0 > 2 0 0 0 2455 2344 5584 5 687 5656 3 66 0 9 0 91 0 > 3 0 0 25 248 12 6210 1 885 5679 2 0 0 9 0 91 0 > 4 0 0 0 284 7 5450 2 861 5751 1 0 0 8 0 92 0 > 5 0 0 0 232 3 4513 0 547 5733 3 0 0 7 0 93 0 > 6 0 0 0 322 8 6084 1 836 6295 2 0 0 8 0 92 0 > 7 0 0 0 3114 2848 8229 4 648 4966 2 0 0 10 0 90 0 > > > So, it seems that it's COMSTAR/iSCSI that's broke as hell, not ixgbe. My apologies to anyone I may have offended with my pre-judgement. > > The consequences of this performance issue are significant: > 1: Instead of being able to utilize the existing quad-port NICs I have in my hosts, I must use dual 10G cards for redundancy purposes. > 2: I must build out a full 10G switching infrastructure. > 3: The network traffic is inherently less secure, as it is essentially impossible to do real security with NFSv3 (that is supported by ESXi). > > In the short run, I have already ordered some relatively cheap 20G infiniband gear that will hopefully push up the cost/performance ratio. However, I have received all sorts of advice about how painful it can be to build and maintain infiniband, and if iSCSI over 10G ethernet is this painful, I'm not hopeful that infiniband will "just work". > > The last option, of course, is to bail out of the Solaris derivatives and move to ZoL or ZoBSD. 
The drawbacks of this are: > > 1: ZoL doesn't easily support booting off of mirrored USB flash drives, let alone running the root filesystem and swap on them. FreeNAS, by way of comparison, puts a 2G swap partition on each zdev, which (strangely enough) causes it to often crash when a zdev experiences a failure under load. > > 2: Neither ZoL or FreeNAS have good, stable, kernel-based iSCSI implementations. FreeNAS is indeed testing istgt, but it proved unstable for my purposes in recent builds. Unfortunately, stmf hasn't proved itself any better. > > There are other minor differences, but these are the ones that brought me to OmniOS in the first place. We'll just have to wait and see how well the infiniband stuff works. > > > Hopefully this exercise will help prevent others from going down the same rabbit-hole that I did. > > -Warren V > > > > >> On Tue, Mar 3, 2015 at 3:45 PM, W Verb wrote: >> Hello Rob et al, >> >> Thank you for taking the time to look at this problem with me. I completely understand your inclination to look at the network as the most probable source of my issue, but I believe that this is a pretty clear-cut case of server-side issues. >> >> 1: I did run ping RTT tests during both read and write operations with multiple interfaces enabled, and the RTT stayed at ~.2ms regardless of whether traffic was actively being transmitted/received or not. >> >> 2: I am not seeing the TCP window size bouncing around, and I am certainly not seeing starvation and delay in my packet captures. It is true that I do see delayed ACKs and retransmissions when I bump the MTU to 9000 on both sides, but I stopped testing with high MTU as soon as I saw it happening because I have a good understanding of incast. All of my recent testing has been with MTUs between 1000 and 3000 bytes. >> >> 3: When testing with MTUs between 1000 and 3000 bytes, I do not see lost packets and retransmission in captures on either the server or client side. I only see staggered transmission delays on the part of the server. >> >> 4: The client is consistently advertising a large window size (20k+), so the TCP throttling mechanism does not appear to play into this. >> >> 5: As mentioned previously, layer 2 flow control is not enabled anywhere in the network, so there are no lower-level mechanisms at work. >> >> 6: Upon checking buffer and queue sizes (and doing the appropriate research into documentation on the C3560E's buffer sizes), I do not see large numbers of frames being dropped by the switch. It does happen at larger MTUs, but not very often (and not consistently) during transfers at 1000-3000 byte MTUs. I do not have QoS, policing, or rate-shaping enabled. >> >> 7: Network interface stats on both the server and the ESXi client show no errors of any kind. This is via netstat on the server, and esxcli / Vsphere client on the ESXi box. 
>> >> 8: When looking at captures taken simultaneously on the server and client side, the server-side transmission pauses are consistently seen and reproducible, even after multiple server rebuilds, ESXi rebuilds, vSphere reinstallations (down to wiping the SQL db), various COMSTAR configuration variations, multiple 10G NICs with different NIC chipsets, multiple switches (I tried both a 48-port and 24-port C3560E), multiple IOS revisions (12.2 and 15.0), OmniOS versions (r151012 and previous) multiple cables, transceivers, etc etc etc etc etc >> >> For your review, I have uploaded the actual packet captures to Google Drive: >> >> https://drive.google.com/file/d/0BwyUMjibonYQZS03dHJ2ZjJvTEE/view?usp=sharing 2 int write - ESXi vmk5 >> https://drive.google.com/file/d/0BwyUMjibonYQTlNTQ2M5bjlxZ00/view?usp=sharing 2 int write - ESXi vmk1 >> https://drive.google.com/file/d/0BwyUMjibonYQUEFsSVJCYXBVX3c/view?usp=sharing 2 int read - server ixgbe0 >> https://drive.google.com/file/d/0BwyUMjibonYQT3FBbElnNFpJTzQ/view?usp=sharing 2 int read - ESXi vmk5 >> https://drive.google.com/file/d/0BwyUMjibonYQU1hXdFRLM2cxSTA/view?usp=sharing 2 int read - ESXi vmk1 >> https://drive.google.com/file/d/0BwyUMjibonYQNEFZSHVZdFNweDA/view?usp=sharing 1 int write - ESXi vmk1 >> https://drive.google.com/file/d/0BwyUMjibonYQM3FpTmloQm5iMGc/view?usp=sharing 1 int read - ESXi vmk1 >> >> Regards, >> >> Warren V >> >>> On Mon, Mar 2, 2015 at 1:11 PM, Mallory, Rob wrote: >>> Just an EWAG, and forgive me for not following closely, I just saw this in my inbox, and looked at it and the screenshots for 2 minutes. >>> >>> >>> >>> But this looks like the typical incast problem.. see http://www.pdl.cmu.edu/Incast/ >>> >>> where your storage servers (there are effectively two with ISCSI/MPIO if round-robin is working) have networks which are 20:1 oversubscribed to your 1GbE host interfaces. (although one of the tcpdumps shows only one server so it may be choked out completely) >>> >>> >>> >>> What is your BDP? I?m guessing .150ms * 1GbE. For single-link that gets you to a MSS of 18700 or so. >>> >>> >>> >>> On your 1GbE connected clients, leave MTU at 9k, set the following in sysctl.conf, >>> >>> And reboot. >>> >>> >>> >>> net.ipv4.tcp_rmem = 4096 8938 17876 >>> >>> >>> >>> If MPIO from the server is indeed round-robining properly, this will ?make things fit? much better. >>> >>> >>> >>> Note that your tcp_wmem can and should stay high, since you are not oversubscribed going from client?server ; you only need to tweak the tcp receive window size. >>> >>> >>> >>> I?ve not done it in quite some time, but IIRC, You can also set these from the server side with: >>> >>> Route add -sendpipe 8930 or ?ssthresh >>> >>> >>> >>> And I think you can see the hash-table with computed BDP per client with ndd. >>> >>> >>> >>> I would try playing with those before delving deep into potential bugs in the TCP, nic driver, zfs, or vm. >>> >>> -Rob >>> >>> >>> >>> From: W Verb via illumos-developer [mailto:developer at lists.illumos.org] >>> Sent: Monday, March 02, 2015 12:20 PM >>> To: Garrett D'Amore >>> Cc: Joerg Goltermann; illumos-dev; omnios-discuss at lists.omniti.com >>> Subject: Re: [developer] Re: [OmniOS-discuss] The ixgbe driver, Lindsay Lohan, and the Greek economy >>> >>> >>> >>> Hello, >>> >>> vmstat seems pretty boring. Certainly nothing going to swap. 
>>> >>> root at sanbox:/root# vmstat >>> kthr memory page disk faults cpu >>> r b w swap free re mf pi po fr de sr po ro s0 s2 in sy cs us sy id >>> 0 0 0 34631632 30728068 175 215 0 0 0 0 963 275 4 6 140 3301 796 6681 0 1 99 >>> >>> >>> Here is the "taskq_dispatch_ent" output from "lockstat -s 5 -kWP sleep 30" during the "fast" write operation. >>> >>> ------------------------------------------------------------------------------- >>> Count indv cuml rcnt nsec Hottest Lock Caller >>> 50934 3% 79% 0.00 3437 0xffffff093145ba40 taskq_dispatch_ent >>> >>> nsec ------ Time Distribution ------ count Stack >>> 128 | 7 spa_taskq_dispatch_ent >>> 256 |@@ 4333 zio_taskq_dispatch >>> 512 |@@ 3863 zio_issue_async >>> 1024 |@@@@@ 9717 zio_execute >>> 2048 |@@@@@@@@@ 15904 >>> 4096 |@@@@ 7595 >>> 8192 |@@ 4498 >>> 16384 |@ 2662 >>> 32768 |@ 1886 >>> 65536 | 434 >>> 131072 | 34 >>> 262144 | 1 >>> ------------------------------------------------------------------------------- >>> >>> >>> >>> However, the truly "broken" function is a read operation: >>> >>> Top lock 1st try: >>> ------------------------------------------------------------------------------- >>> Count indv cuml rcnt nsec Hottest Lock Caller >>> 474 15% 15% 0.00 7031 0xffffff093145b6f8 cv_wait >>> >>> nsec ------ Time Distribution ------ count Stack >>> 256 |@ 29 taskq_thread_wait >>> 512 |@@@@@@ 100 taskq_thread >>> 1024 |@@@@ 72 thread_start >>> 2048 |@@@@ 69 >>> 4096 |@@@ 51 >>> 8192 |@@ 47 >>> 16384 |@@ 44 >>> 32768 |@@ 32 >>> 65536 |@ 25 >>> 131072 | 5 >>> ------------------------------------------------------------------------------- >>> >>> >>> Top lock 2nd try: >>> >>> ------------------------------------------------------------------------------- >>> Count indv cuml rcnt nsec Hottest Lock Caller >>> 174 39% 39% 0.00 103909 0xffffff0943f116a0 dmu_zfetch_find >>> >>> nsec ------ Time Distribution ------ count Stack >>> 2048 | 2 dmu_zfetch >>> 4096 | 3 dbuf_read >>> 8192 | 4 dmu_buf_hold_array_by_dnode >>> 16384 | 3 dmu_buf_hold_array >>> 32768 |@ 7 >>> 65536 |@@ 14 >>> 131072 |@@@@@@@@@@@@@@@@@@@@ 116 >>> 262144 |@@@ 19 >>> 524288 | 4 >>> 1048576 | 2 >>> ------------------------------------------------------------------------------- >>> >>> Top lock 3rd try: >>> >>> ------------------------------------------------------------------------------- >>> Count indv cuml rcnt nsec Hottest Lock Caller >>> 283 55% 55% 0.00 94602 0xffffff0943ff5a68 dmu_zfetch_find >>> >>> nsec ------ Time Distribution ------ count Stack >>> 512 | 1 dmu_zfetch >>> 1024 | 1 dbuf_read >>> 2048 | 0 dmu_buf_hold_array_by_dnode >>> 4096 | 5 dmu_buf_hold_array >>> 8192 | 2 >>> 16384 | 7 >>> 32768 | 4 >>> 65536 |@@@ 33 >>> 131072 |@@@@@@@@@@@@@@@@@@@@ 198 >>> 262144 |@@ 27 >>> 524288 | 2 >>> 1048576 | 3 >>> ------------------------------------------------------------------------------- >>> >>> >>> >>> As for the MTU question- setting the MTU to 9000 makes read operations grind almost to a halt at 5MB/s transfer rate. >>> >>> -Warren V >>> >>> >>> >>> On Mon, Mar 2, 2015 at 11:30 AM, Garrett D'Amore wrote: >>> >>> Here?s a theory. You are using small (relatively) MTUs (3000 is less than the smallest ZFS block size.) So, when you go multipathing this way, might a single upper layer transaction (ZFS block transfer request, or for that matter COMSTAR block request) get routed over different paths. This sounds like a potentially pathological condition to me. >>> >>> >>> >>> What happens if you increase the MTU to 9000? Have you tried it? 
I?m sort of thinking that this will permit each transaction to be issued in a single IP frame, which may alleviate certain tragic code paths. (That said, I?m not sure how aware COMSTAR is of the IP MTU. If it is ignorant, then it shouldn?t matter *that* much, since TCP should do the right thing here and a single TCP stream should stick to a single underlying NIC. But if COMSTAR is aware of the MTU, it may do some really screwball things as it tries to break requests up into single frames.) >>> >>> >>> >>> Your read spin really looks like only about 22 msec of wait out of a total run of 30 sec. (That?s not *great*, but neither does it sound tragic.) Your write is interesting because that looks like it is going a wildly different path. You should be aware that the locks you see are *not* necessarily related in call order, but rather are ordered by instance count. The write code path hitting the task_thread as hard as it does is really, really weird. Something is pounding on a taskq lock super hard. The number of taskq_dispatch_ent calls is interesting here. I?m starting to wonder if it?s something as stupid as a spin where if the taskq is ?full? (max size reached), a caller just is spinning trying to dispatch jobs to the taskq. >>> >>> >>> >>> The taskq_dispatch_ent code is super simple, and it should be almost impossible to have contention on that lock ? barring a thread spinning hard on taskq_dispatch (or taskq_dispatch_ent as I think is happening here). Looking at the various call sites, there are places in both COMSTAR (iscsit) and in ZFS where this could be coming from. To know which, we really need to have the back trace associated. >>> >>> >>> >>> lockstat can give this ? try giving ?-s 5? to give a short backtrace from this, that will probably give us a little more info about the guilty caller. :-) >>> >>> >>> >>> - Garrett >>> >>> >>> >>> On Mar 2, 2015, at 11:07 AM, W Verb via illumos-developer wrote: >>> >>> >>> >>> Hello all, >>> >>> I am not using layer 2 flow control. The switch carries line-rate 10G traffic without error. >>> >>> I think I have found the issue via lockstat. The first lockstat is taken during a multipath read: >>> >>> >>> lockstat -kWP sleep 30 >>> >>> >>> Adaptive mutex spin: 21331 events in 30.020 seconds (711 events/sec) >>> >>> Count indv cuml rcnt nsec Hottest Lock Caller >>> ------------------------------------------------------------------------------- >>> 9306 44% 44% 0.00 1557 htable_mutex+0x370 htable_release >>> 6307 23% 68% 0.00 1207 htable_mutex+0x108 htable_lookup >>> 596 7% 75% 0.00 4100 0xffffff0931705188 cv_wait >>> 349 5% 80% 0.00 4437 0xffffff0931705188 taskq_thread >>> 704 2% 82% 0.00 995 0xffffff0935de3c50 dbuf_create >>> >>> The hash table being read here I would guess is the tcp connection hash table. 
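(A note on that guess: htable_lookup/htable_release and the htable_mutex array are part of the x86 HAT page-table code -- uts/i86pc/vm/htable.c -- rather than the TCP stack, so contention there points at page-table/mapping churn more than at the connection hash. If anyone wants to confirm who is driving it, something like

  dtrace -n 'fbt::htable_lookup:entry { @[stack(10)] = count(); } tick-30s { exit(0); }'

aggregates the kernel call paths into htable_lookup over a 30-second window.)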
>>> >>> >>> >>> When lockstat is run during a multipath write operation, I get: >>> >>> Adaptive mutex spin: 1097341 events in 30.016 seconds (36558 events/sec) >>> >>> Count indv cuml rcnt nsec Hottest Lock Caller >>> ------------------------------------------------------------------------------- >>> 210752 28% 28% 0.00 4781 0xffffff0931705188 taskq_thread >>> 174471 22% 50% 0.00 4476 0xffffff0931705188 cv_wait >>> 127183 10% 61% 0.00 2871 0xffffff096f29b510 zio_notify_parent >>> 176066 10% 70% 0.00 1922 0xffffff0931705188 taskq_dispatch_ent >>> 105134 9% 80% 0.00 3110 0xffffff096ffdbf10 zio_remove_child >>> 67512 4% 83% 0.00 1938 0xffffff096f3db4b0 zio_add_child >>> 45736 3% 86% 0.00 2239 0xffffff0935de3c50 dbuf_destroy >>> 27781 3% 89% 0.00 3416 0xffffff0935de3c50 dbuf_create >>> 38536 2% 91% 0.00 2122 0xffffff0935de3b70 dnode_rele >>> 27841 2% 93% 0.00 2423 0xffffff0935de3b70 dnode_diduse_space >>> 19020 2% 95% 0.00 3046 0xffffff09d9e305e0 dbuf_rele >>> 14627 1% 96% 0.00 3632 dbuf_hash_table+0x4f8 dbuf_find >>> >>> >>> >>> Writes are not performing htable lookups, while reads are. >>> >>> -Warren V >>> >>> >>> >>> >>> >>> >>> >>> >>> On Mon, Mar 2, 2015 at 3:14 AM, Joerg Goltermann wrote: >>> >>> Hi, >>> >>> I would try *one* TPG which includes both interface addresses >>> and I would double check for packet drops on the Catalyst. >>> >>> The 3560 supports only receive flow control which means, that >>> a sending 10Gbit port can easily overload a 1Gbit port. >>> Do you have flow control enabled? >>> >>> - Joerg >>> >>> >>> >>> On 02.03.2015 09:22, W Verb via illumos-developer wrote: >>> >>> Hello Garrett, >>> >>> No, no 802.3ad going on in this config. >>> >>> Here is a basic schematic: >>> >>> https://drive.google.com/file/d/0BwyUMjibonYQVkVqcE5OQUJyUUU/view?usp=sharing >>> >>> Here is the Nexenta MPIO iSCSI Setup Document that I used as a guide: >>> >>> https://drive.google.com/file/d/0BwyUMjibonYQbjEyUTBjN2tTNWM/view?usp=sharing >>> >>> Note that I am using an MTU of 3000 on both the 10G and 1G NICs. The >>> switch is set to allow 9148-byte frames, and I'm not seeing any >>> errors/buffer overruns on the switch. >>> >>> Here is a screenshot of a packet capture from a read operation on the >>> guest OS (from it's local drive, which is actually a VMDK file on the >>> storage server). In this example, only a single 1G ESXi kernel interface >>> (vmk1) is bound to the software iSCSI initiator. >>> >>> https://drive.google.com/file/d/0BwyUMjibonYQa2NYdXhpZkpkbU0/view?usp=sharing >>> >>> Note that there's a nice, well-behaved window sizing process taking >>> place. The ESXi decreases the scaled window by 11 or 12 for each ACK, >>> then bumps it back up to 512. >>> >>> Here is a similar screenshot of a single-interface write operation: >>> >>> https://drive.google.com/file/d/0BwyUMjibonYQbU1RZHRnakxDSFU/view?usp=sharing >>> >>> There are no pauses or gaps in the transmission rate in the >>> single-interface transfers. >>> >>> >>> In the next screenshots, I have enabled an additional 1G interface on >>> the ESXi host, and bound it to the iSCSI initiator. The new interface is >>> bound to a separate physical port, uses a different VLAN on the switch, >>> and talks to a different 10G port on the storage server. >>> >>> First, let's look at a write operation on the guest OS, which happily >>> pumps data at near-line-rate to the storage server. >>> >>> Here is a sequence number trace diagram. Note how the transfer has a >>> nice, smooth increment rate over the entire transfer. 
>>> >>> https://drive.google.com/file/d/0BwyUMjibonYQWHNIa0drWnNxMmM/view?usp=sharing >>> >>> Here are screenshots from packet captures on both 1G interfaces: >>> >>> https://drive.google.com/file/d/0BwyUMjibonYQRWhyVVQ4djNaU3c/view?usp=sharing >>> https://drive.google.com/file/d/0BwyUMjibonYQaTVjTEtTRloyR2c/view?usp=sharing >>> >>> Note how we again see nice, smooth window adjustment, and no gaps in >>> transmission. >>> >>> >>> But now, let's look at the problematic two-interface Read operation. >>> First, the sequence graph: >>> >>> https://drive.google.com/file/d/0BwyUMjibonYQTzdFVWdQMWZ6LUU/view?usp=sharing >>> >>> As you can see, there are gaps and jumps in the transmission throughout >>> the transfer. >>> It is very illustrative to look at captures of the gaps, which are >>> occurring on both interfaces: >>> >>> https://drive.google.com/file/d/0BwyUMjibonYQc0VISXN6eVFwQzg/view?usp=sharing >>> https://drive.google.com/file/d/0BwyUMjibonYQVFREUHp3TGFiUU0/view?usp=sharing >>> >>> As you can see, there are ~.4 second pauses in transmission from the >>> storage server, which kills the transfer rate. >>> It's clear that the ESXi box ACKs the prior iSCSI operation to >>> completion, then makes a new LUN request, which the storage server >>> immediately replies to. The ESXi ACKs the response packet from the >>> storage server, then waits...and waits....and waits... until eventually >>> the storage server starts transmitting again. >>> >>> Because the pause happens while the ESXi client is waiting for a packet >>> from the storage server, that tells me that the gaps are not an artifact >>> of traffic being switched between both active interfaces, but are >>> actually indicative of short hangs occurring on the server. >>> >>> Having a pause or two in transmission is no big deal, but in my case, it >>> is happening constantly, and dropping my overall read transfer rate down >>> to 20-60MB/s, which is slower than the single interface transfer rate >>> (~90-100MB/s). >>> >>> Decreasing the MTU makes the pauses shorter, increasing them makes the >>> pauses longer. >>> >>> Another interesting thing is that if I set the multipath io interval to >>> 3 operations instead of 1, I get better throughput. In other words, the >>> less frequently I swap IP addresses on my iSCSI requests from the ESXi >>> unit, the fewer pauses I see. >>> >>> Basically, COMSTAR seems to choke each time an iSCSI request from a new >>> IP arrives. >>> >>> Because the single interface transfer is near line rate, that tells me >>> that the storage system (mpt_sas, zfs, etc) is working fine. It's only >>> when multiple paths are attempted that iSCSI falls on its face during reads. >>> >>> All of these captures were taken without a cache device being attached >>> to the storage zpool, so this isn't looking like some kind of ZFS ARC >>> problem. As mentioned previously, local transfers to/from the zpool are >>> showing ~300-500 MB/s rates over long transfers (10G+). >>> >>> -Warren V >>> >>> On Sun, Mar 1, 2015 at 9:11 PM, Garrett D'Amore >> >>> > wrote: >>> >>> I?m not sure I?ve followed properly. You have *two* interfaces. >>> You are not trying to provision these in an aggr are you? As far as >>> I?m aware, VMware does not support 802.3ad link aggregations. (Its >>> possible that you can make it work with ESXi if you give the entire >>> NIC to the guest ? but I?m skeptical.) The problem is that if you >>> try to use link aggregation, some packets (up to half!) will be >>> lost. 
TCP and other protocols fare poorly in this situation. >>> >>> Its possible I?ve totally misunderstood what you?re trying to do, in >>> which case I apologize. >>> >>> The idle thing is a red-herring ? the cpu is waiting for work to do, >>> probably because packets haven?t arrived (or where dropped by the >>> hypervisor!) I wouldn?t read too much into that except that your >>> network stack is in trouble. I?d look a bit more closely at the >>> kstats for tcp ? I suspect you?ll see retransmits or out of order >>> values that are unusually high ? if so this may help validate my >>> theory above. >>> >>> - Garrett >>> >>> On Mar 1, 2015, at 9:03 PM, W Verb via illumos-developer >>> > >>> >>> >>> wrote: >>> >>> Hello all, >>> >>> >>> Well, I no longer blame the ixgbe driver for the problems I'm seeing. >>> >>> >>> I tried Joerg's updated driver, which didn't improve the issue. So >>> I went back to the drawing board and rebuilt the server from scratch. >>> >>> What I noted is that if I have only a single 1-gig physical >>> interface active on the ESXi host, everything works as expected. >>> As soon as I enable two interfaces, I start seeing the performance >>> problems I've described. >>> >>> Response pauses from the server that I see in TCPdumps are still >>> leading me to believe the problem is delay on the server side, so >>> I ran a series of kernel dtraces and produced some flamegraphs. >>> >>> >>> This was taken during a read operation with two active 10G >>> interfaces on the server, with a single target being shared by two >>> tpgs- one tpg for each 10G physical port. The host device has two >>> 1G ports enabled, with VLANs separating the active ports into >>> 10G/1G pairs. ESXi is set to multipath using both VLANS with a >>> round-robin IO interval of 1. >>> >>> https://drive.google.com/file/d/0BwyUMjibonYQd3ZYOGh4d2pteGs/view?usp=sharing >>> >>> >>> This was taken during a write operation: >>> >>> https://drive.google.com/file/d/0BwyUMjibonYQMnBtU1Q2SXM2ams/view?usp=sharing >>> >>> >>> I then rebooted the server and disabled C-State, ACPI T-State, and >>> general EIST (Turbo boost) functionality in the CPU. >>> >>> I when I attempted to boot my guest VM, the iSCSI transfer >>> gradually ground to a halt during the boot loading process, and >>> the guest OS never did complete its boot process. >>> >>> Here is a flamegraph taken while iSCSI is slowly dying: >>> >>> https://drive.google.com/file/d/0BwyUMjibonYQM21JeFZPX3dZWTg/view?usp=sharing >>> >>> >>> I edited out cpu_idle_adaptive from the dtrace output and >>> regenerated the slowdown graph: >>> >>> https://drive.google.com/file/d/0BwyUMjibonYQbTVwV3NvXzlPS1E/view?usp=sharing >>> >>> >>> I then edited cpu_idle_adaptive out of the speedy write operation >>> and regenerated that graph: >>> >>> https://drive.google.com/file/d/0BwyUMjibonYQeWFYM0pCMDZ1X2s/view?usp=sharing >>> >>> >>> I have zero experience with interpreting flamegraphs, but the most >>> significant difference I see between the slow read example and the >>> fast write example is in unix`thread_start --> unix`idle. There's >>> a good chunk of "unix`i86_mwait" in the read example that is not >>> present in the write example at all. >>> >>> Disabling the l2arc cache device didn't make a difference, and I >>> had to reenable EIST support on the CPU to get my VMs to boot. >>> >>> I am seeing a variety of bug reports going back to 2010 regarding >>> excessive mwait operations, with the suggested solutions usually >>> being to set "cpupm enable poll-mode" in power.conf. 
That change >>> also had no effect on speed. >>> >>> -Warren V >>> >>> >>> >>> >>> -----Original Message----- >>> >>> From: Chris Siebenmann [mailto:cks at cs.toronto.edu] >>> >>> Sent: Monday, February 23, 2015 8:30 AM >>> >>> To: W Verb >>> >>> Cc: omnios-discuss at lists.omniti.com >>> >>> ; cks at cs.toronto.edu >>> >>> >>> Subject: Re: [OmniOS-discuss] The ixgbe driver, Lindsay Lohan, and >>> the Greek economy >>> >>> >>> > Chris, thanks for your specific details. I'd appreciate it if you >>> >>> > could tell me which copper NIC you tried, as well as to pass on the >>> >>> > iSCSI tuning parameters. >>> >>> >>> Our copper NIC experience is with onboard X540-AT2 ports on >>> SuperMicro hardware (which have the guaranteed 10-20 msec lock >>> hold) and dual-port 82599EB TN cards (which have some sort of >>> driver/hardware failure under load that eventually leads to >>> 2-second lock holds). I can't recommend either with the current >>> driver; we had to revert to 1G networking in order to get stable >>> servers. >>> >>> >>> The iSCSI parameter modifications we do, across both initiators >>> and targets, are: >>> >>> >>> initialr2tno >>> >>> firstburstlength128k >>> >>> maxrecvdataseglen128k[only on Linux backends] >>> >>> maxxmitdataseglen128k[only on Linux backends] >>> >>> >>> The OmniOS initiator doesn't need tuning for more than the first >>> two parameters; on the Linux backends we tune up all four. My >>> extended thoughts on these tuning parameters and why we touch them >>> can be found >>> >>> here: >>> >>> >>> http://utcc.utoronto.ca/~cks/space/blog/tech/UnderstandingiSCSIProtocol >>> >>> http://utcc.utoronto.ca/~cks/space/blog/tech/LikelyISCSITuning >>> >>> >>> The short version is that these parameters probably only make a >>> small difference but their overall goal is to do 128KB ZFS reads >>> and writes in single iSCSI operations (although they will be >>> fragmented at the TCP >>> >>> layer) and to do iSCSI writes without a back-and-forth delay >>> between initiator and target (that's 'initialr2t no'). >>> >>> >>> I think basically everyone should use InitialR2T set to no and in >>> fact that it should be the software default. These days only >>> unusually limited iSCSI targets should need it to be otherwise and >>> they can change their setting for it (initiator and target must >>> both agree to it being 'yes', so either can veto it). >>> >>> >>> - cks >>> >>> >>> >>> On Mon, Feb 23, 2015 at 8:21 AM, Joerg Goltermann >> >>> > wrote: >>> >>> Hi, >>> >>> I think your problem is caused by your link properties or your >>> switch settings. In general the standard ixgbe seems to perform >>> well. >>> >>> I had trouble after changing the default flow control settings >>> to "bi" >>> and this was my motivation to update the ixgbe driver a long >>> time ago. >>> After I have updated our systems to ixgbe 2.5.8 I never had any >>> problems .... >>> >>> Make sure your switch has support for jumbo frames and you use >>> the same mtu on all ports, otherwise the smallest will be used. >>> >>> What switch do you use? I can tell you nice horror stories about >>> different vendors.... >>> >>> - Joerg >>> >>> On 23.02.2015 10:31, W Verb wrote: >>> >>> Thank you Joerg, >>> >>> I've downloaded the package and will try it tomorrow. >>> >>> The only thing I can add at this point is that upon review >>> of my >>> testing, I may have performed my "pkg -u" between the >>> initial quad-gig >>> performance test and installing the 10G NIC. 
So this may >>> be a new >>> problem introduced in the latest updates. >>> >>> Those of you who are running 10G and have not upgraded to >>> the latest >>> kernel, etc, might want to do some additional testing >>> before running the >>> update. >>> >>> -Warren V >>> >>> On Mon, Feb 23, 2015 at 1:15 AM, Joerg Goltermann >>> >>> >>> >> wrote: >>> >>> Hi, >>> >>> I remember there was a problem with the flow control >>> settings in the >>> ixgbe >>> driver, so I updated it a long time ago for our >>> internal servers to >>> 2.5.8. >>> Last weekend I integrated the latest changes from the >>> FreeBSD driver >>> to bring >>> the illumos ixgbe to 2.5.25 but I had no time to test >>> it, so it's >>> completely >>> untested! >>> >>> >>> If you would like to give the latest driver a try you >>> can fetch the >>> kernel modules from >>> https://cloud.osn.de/index.____php/s/Fb4so9RsNnXA7r9 >>> >>> >> > >>> >>> Clone your boot environment, place the modules in the >>> new environment >>> and update the boot-archive of the new BE. >>> >>> - Joerg >>> >>> >>> >>> >>> >>> On 23.02.2015 02:54, W Verb wrote: >>> >>> By the way, to those of you who have working >>> setups: please send me >>> your pool/volume settings, interface linkprops, >>> and any kernel >>> tuning >>> parameters you may have set. >>> >>> Thanks, >>> Warren V >>> >>> On Sat, Feb 21, 2015 at 7:59 AM, Schweiss, Chip >>> >>> >> >>> >>> >>> wrote: >>> >>> I can't say I totally agree with your performance >>> assessment. I run Intel >>> X520 in all my OmniOS boxes. >>> >>> Here is a capture of nfssvrtop I made while >>> running many >>> storage vMotions >>> between two OmniOS boxes hosting NFS >>> datastores. This is a >>> 10 host VMware >>> cluster. Both OmniOS boxes are dual 10G >>> connected with >>> copper twin-ax to >>> the in rack Nexus 5010. >>> >>> VMware does 100% sync writes, I use ZeusRAM >>> SSDs for log >>> devices. >>> >>> -Chip >>> >>> 2014 Apr 24 08:05:51, load: 12.64, read: >>> 17330243 KB, >>> swrite: 15985 KB, >>> awrite: 1875455 KB >>> >>> Ver Client NFSOPS Reads >>> SWrites AWrites >>> Commits Rd_bw >>> SWr_bw AWr_bw Rd_t SWr_t AWr_t >>> Com_t Align% >>> >>> 4 10.28.17.105 0 0 >>> 0 0 >>> 0 0 >>> 0 0 0 0 0 0 0 >>> >>> 4 10.28.17.215 0 0 >>> 0 0 >>> 0 0 >>> 0 0 0 0 0 0 0 >>> >>> 4 10.28.17.213 0 0 >>> 0 0 >>> 0 0 >>> 0 0 0 0 0 0 0 >>> >>> 4 10.28.16.151 0 0 >>> 0 0 >>> 0 0 >>> 0 0 0 0 0 0 0 >>> >>> 4 all 1 0 >>> 0 0 >>> 0 0 >>> 0 0 0 0 0 0 0 >>> >>> 3 10.28.16.175 3 0 >>> 3 0 >>> 0 1 >>> 11 0 4806 48 0 0 85 >>> >>> 3 10.28.16.183 6 0 >>> 6 0 >>> 0 3 >>> 162 0 549 124 0 0 >>> 73 >>> >>> 3 10.28.16.180 11 0 >>> 10 0 >>> 0 3 >>> 27 0 776 89 0 0 67 >>> >>> 3 10.28.16.176 28 2 >>> 26 0 >>> 0 10 >>> 405 0 2572 198 0 0 >>> 100 >>> >>> 3 10.28.16.178 4606 4602 >>> 4 0 >>> 0 294534 >>> 3 0 723 49 0 0 99 >>> >>> 3 10.28.16.179 4905 4879 >>> 26 0 >>> 0 312208 >>> 311 0 735 271 0 0 >>> 99 >>> >>> 3 10.28.16.181 5515 5502 >>> 13 0 >>> 0 352107 >>> 77 0 89 87 0 0 99 >>> >>> 3 10.28.16.184 12095 12059 >>> 10 0 >>> 0 763014 >>> 39 0 249 147 0 0 99 >>> >>> 3 10.28.58.1 15401 6040 >>> 116 6354 >>> 53 191605 >>> 474 202346 192 96 144 83 >>> 99 >>> >>> 3 all 42574 33086 >>> 217 >>> 6354 53 1913488 >>> 1582 202300 348 138 153 105 >>> 99 >>> >>> >>> >>> >>> >>> On Fri, Feb 20, 2015 at 11:46 PM, W Verb >>> >>> >> >>> >>> >> wrote: >>> >>> >>> Hello All, >>> >>> Thank you for your replies. 
>>> I tried a few things, and found the following: >>> >>> 1: Disabling hyperthreading support in the >>> BIOS drops >>> performance overall >>> by a factor of 4. >>> 2: Disabling VT support also seems to have >>> some effect, >>> although it >>> appears to be minor. But this has the >>> amusing side >>> effect of fixing the >>> hangs I've been experiencing with fast >>> reboot. Probably >>> by disabling kvm. >>> 3: The performance tests are a bit tricky >>> to quantify >>> because of caching >>> effects. In fact, I'm not entirely sure >>> what is >>> happening here. It's just >>> best to describe what I'm seeing: >>> >>> The commands I'm using to test are >>> dd if=/dev/zero of=./test.dd bs=2M count=5000 >>> dd of=/dev/null if=./test.dd bs=2M count=5000 >>> The host vm is running Centos 6.6, and has >>> the latest >>> vmtools installed. >>> There is a host cache on an SSD local to >>> the host that >>> is also in place. >>> Disabling the host cache didn't >>> immediately have an >>> effect as far as I could >>> see. >>> >>> The host MTU set to 3000 on all iSCSI >>> interfaces for all >>> tests. >>> >>> Test 1: Right after reboot, with an ixgbe >>> MTU of 9000, >>> the write test >>> yields an average speed over three tests >>> of 137MB/s. The >>> read test yields an >>> average over three tests of 5MB/s. >>> >>> Test 2: After setting "ifconfig ixgbe0 mtu >>> 3000", the >>> write tests yield >>> 140MB/s, and the read tests yield 53MB/s. >>> It's important >>> to note here that >>> if I cut the read test short at only >>> 2-3GB, I get >>> results upwards of >>> 350MB/s, which I assume is local >>> cache-related distortion. >>> >>> Test 3: MTU of 1500. Read tests are up to >>> 156 MB/s. >>> Write tests yield >>> about 142MB/s. >>> Test 4: MTU of 1000: Read test at 182MB/s. >>> Test 5: MTU of 900: Read test at 130 MB/s. >>> Test 6: MTU of 1000: Read test at 160MB/s. >>> Write tests >>> are now >>> consistently at about 300MB/s. >>> Test 7: MTU of 1200: Read test at 124MB/s. >>> Test 8: MTU of 1000: Read test at 161MB/s. >>> Write at 261MB/s. >>> >>> A few final notes: >>> L1ARC grabs about 10GB of RAM during the >>> tests, so >>> there's definitely some >>> read caching going on. >>> The write operations are easier to observe >>> with iostat, >>> and I'm seeing io >>> rates that closely correlate with the >>> network write speeds. >>> >>> >>> Chris, thanks for your specific details. >>> I'd appreciate >>> it if you could >>> tell me which copper NIC you tried, as >>> well as to pass >>> on the iSCSI tuning >>> parameters. >>> >>> I've ordered an Intel EXPX9502AFXSR, which >>> uses the >>> 82598 chip instead of >>> the 82599 in the X520. If I get similar >>> results with my >>> fiber transcievers, >>> I'll see if I can get a hold of copper ones. >>> >>> But I should mention that I did indeed >>> look at PHY/MAC >>> error rates, and >>> they are nil. >>> >>> -Warren V >>> >>> On Fri, Feb 20, 2015 at 7:25 PM, Chris >>> Siebenmann >>> >> >>> >> >>> >>> >> >>> >>> wrote: >>> >>> >>> After installation and >>> configuration, I observed >>> all kinds of bad >>> behavior >>> in the network traffic between the >>> hosts and the >>> server. All of this >>> bad >>> behavior is traced to the ixgbe >>> driver on the >>> storage server. Without >>> going >>> into the full troubleshooting >>> process, here are >>> my takeaways: >>> >>> [...] 
>>> >>> For what it's worth, we managed to >>> achieve much >>> better line rates on >>> copper 10G ixgbe hardware of various >>> descriptions >>> between OmniOS >>> and CentOS 7 (I don't think we ever >>> tested OmniOS to >>> OmniOS). I don't >>> believe OmniOS could do TCP at full >>> line rate but I >>> think we managed 700+ >>> Mbytes/sec on both transmit and >>> receive and we got >>> basically disk-limited >>> speeds with iSCSI (across multiple >>> disks on >>> multi-disk mirrored pools, >>> OmniOS iSCSI initiator, Linux iSCSI >>> targets). >>> >>> I don't believe we did any specific >>> kernel tuning >>> (and in fact some of >>> our attempts to fiddle ixgbe driver >>> parameters blew >>> up in our face). >>> We did tune iSCSI connection >>> parameters to increase >>> various buffer >>> sizes so that ZFS could do even large >>> single >>> operations in single iSCSI >>> transactions. (More details available >>> if people are >>> interested.) >>> >>> 10: At the wire level, the speed >>> problems are >>> clearly due to pauses in >>> response time by omnios. At 9000 >>> byte frame >>> sizes, I see a good number >>> of duplicate ACKs and fast >>> retransmits during >>> read operations (when >>> omnios is transmitting). But below >>> about a >>> 4100-byte MTU on omnios >>> (which seems to correlate to >>> 4096-byte iSCSI >>> block transfers), the >>> transmission errors fade away and >>> we only see >>> the transmission pause >>> problem. >>> >>> >>> This is what really attracted my >>> attention. In >>> our OmniOS setup, our >>> specific Intel hardware had ixgbe >>> driver issues that >>> could cause >>> activity stalls during once-a-second >>> link heartbeat >>> checks. This >>> obviously had an effect at the TCP and >>> iSCSI layers. >>> My initial message >>> to illumos-developer sparked a potentially >>> interesting discussion: >>> >>> >>> http://www.listbox.com/member/____archive/182179/2014/10/sort/____time_rev/page/16/entry/6:__405/__20141003125035:6357079A-__4B1D-__11E4-A39C-D534381BA44D/ >>> >>> >>> >> > >>> >>> If you think this is a possibility in >>> your setup, >>> I've put the DTrace >>> script I used to hunt for this up on >>> the web: >>> >>> http://www.cs.toronto.edu/~____cks/src/omnios-ixgbe/ixgbe_____delay.d >>> >>> >>> >> > >>> >>> This isn't the only potential source >>> of driver >>> stalls by any means, it's >>> just the one I found. You may also >>> want to look at >>> lockstat in general, >>> as information it reported is what led >>> us to look >>> specifically at the >>> ixgbe code here. >>> >>> (If you suspect kernel/driver issues, >>> lockstat >>> combined with kernel >>> source is a really excellent resource.) >>> >>> - cks >>> >>> >>> >>> >>> >>> ___________________________________________________ >>> OmniOS-discuss mailing list >>> OmniOS-discuss at lists.omniti >>> .____com >>> >> > >>> http://lists.omniti.com/____mailman/listinfo/omnios-____discuss >>> >>> >>> >> > >>> >>> >>> ___________________________________________________ >>> OmniOS-discuss mailing list >>> OmniOS-discuss at lists.omniti >>> .____com >>> >> > >>> http://lists.omniti.com/____mailman/listinfo/omnios-____discuss >>> >>> >>> >> > >>> >>> >>> -- >>> OSN Online Service Nuernberg GmbH, Bucher Str. 78, >>> 90408 Nuernberg >>> Tel: +49 911 39905-0 >>> - Fax: +49 911 >>> 39905-55 - >>> http://www.osn.de >>> HRB 15022 Nuernberg, USt-Id: DE189301263, GF: Joerg >>> Goltermann >>> >>> >>> >>> -- >>> OSN Online Service Nuernberg GmbH, Bucher Str. 
78, 90408 Nuernberg >>> Tel: +49 911 39905-0 - Fax: +49 >>> 911 39905-55 - http://www.osn.de >>> >>> HRB 15022 Nuernberg, USt-Id: DE189301263, GF: Joerg Goltermann >>> >>> >>> *illumos-developer* | Archives >>> >>> >>> | Modify Your Subscription >>> [Powered by Listbox] >>> >>> >>> >>> *illumos-developer* | Archives >>> >>> | >>> Modify >>> >>> >>> Your Subscription [Powered by Listbox] >> >>> ... >>> >>> [Message clipped] > -------------- next part -------------- An HTML attachment was scrubbed... URL: From wverb73 at gmail.com Wed Mar 4 08:27:08 2015 From: wverb73 at gmail.com (W Verb) Date: Wed, 4 Mar 2015 00:27:08 -0800 Subject: [OmniOS-discuss] [developer] Re: The ixgbe driver, Lindsay Lohan, and the Greek economy In-Reply-To: References: <20150221032559.727D07A0792@apps0.cs.toronto.edu> <54EAEFA8.4020101@osn.de> <54EB5392.6030900@osn.de> <279C084B-E76B-4F38-A7CF-D6CB37D06D8F@damore.org> <54F44602.5030705@osn.de> <064C3FC8-43E0-4757-AADA-831303782A4C@damore.org> Message-ID: Thank you for following up, Garrett, The logs of all lockstat sessions are now in the zipfile located here: https://drive.google.com/file/d/0BwyUMjibonYQeVlzN2VndGstRUk/view?usp=sharing Regards, Warren V On Tue, Mar 3, 2015 at 11:30 PM, Garrett D'Amore wrote: > I'm not surprised by this result. Indeed with the earlier data you had > from lockstat it looked like a comstar or zfs issue on the server. > Unfortunately the follow up lockstat you sent was pruned to uselessness. > If you can post the full lockstat with -s5 somewhere it might help > understand what is actually going on under the hood. > > Sent from my iPhone > > On Mar 3, 2015, at 9:21 PM, W Verb wrote: > > Hello all, > > This is probably the last message in this thread. > > I pulled the quad-gig NIC out of one of my hosts, and installed an X520. I > then set a single 10G port on the server to be on the same VLAN as the > host, and defined a vswitch, vmknic, etc on the host. > > I set the MTU to be 9000 on both sides, then ran my tests. > > Read: 130 MB/s. > Write: 156 MB/s. > > Additionally, at higher MTUs, the NIC would periodically lock up until I > performed an "ipadm disable-if -t ixgbe0" and re-enabled it. I tried your > updated driver, Jeorg, but unfortunately it failed quite often. > > I then disabled stmf, enabled NFS (v3 only) on the server, and shared a > dataset on the zpool with "share -f nfs /ppool/testy". > I then mounted the server dataset on the host via NFS, and copied my test > VM from the iSCSI zvol to the NFS dataset. I also removed the binding of > the 10G port on the host from the sw iscsi interface. > > Running the same tests on the VM over NFSv3 yielded: > > Read: 650MB/s > Write: 306MB/s > > This is getting within 10% of the throughput I consistently get on dd > operations local on the server, so I'm pretty happy that I'm getting as > good as I'm going to get until I add more drives. Additionally, I haven't > experienced any NIC hangs. > > I tried varying the settings in ixgbe.conf, the MTU, and disabling LSO on > the host and server, but nothing really made that much of a difference > (except reducing the MTU made things about 20-30% slower). > > mpstat during both NFS and iSCSI transfers showed all processors as > getting roughly the same number of interrupts, etc, although I did see a > varying number of spins on reader/writer locks during the iSCSI transfers. > The NFS showed no srws at all. 
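For reference, the full-stack lockstat Garrett asks for above and the per-CPU srw counts mentioned here can be captured together along the following lines. This is only a sketch: the output file names are placeholders, and the 30-second window simply matches the other lockstat runs quoted in this thread.

  # run while the multipath read test is in progress
  mpstat 1 30 > /tmp/mpstat-iscsi-read.txt &
  lockstat -kWP -s 5 sleep 30 > /tmp/lockstat-iscsi-read-s5.txt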
> > Here is a pretty representative example of a 1s mpstat during an iSCSI > transfer: > > CPU minf mjf xcal intr ithr csw icsw migr smtx srw syscl usr sys wt > idl set > 0 0 0 0 3246 2690 8739 6 772 5967 2 0 0 11 0 > 89 0 > 1 0 0 0 2366 2249 7910 8 988 5563 2 302 0 9 0 > 91 0 > 2 0 0 0 2455 2344 5584 5 687 5656 3 66 0 9 0 > 91 0 > 3 0 0 25 248 12 6210 1 885 5679 2 0 0 9 0 > 91 0 > 4 0 0 0 284 7 5450 2 861 5751 1 0 0 8 0 > 92 0 > 5 0 0 0 232 3 4513 0 547 5733 3 0 0 7 0 > 93 0 > 6 0 0 0 322 8 6084 1 836 6295 2 0 0 8 0 > 92 0 > 7 0 0 0 3114 2848 8229 4 648 4966 2 0 0 10 0 > 90 0 > > > So, it seems that it's COMSTAR/iSCSI that's broke as hell, not ixgbe. My > apologies to anyone I may have offended with my pre-judgement. > > The consequences of this performance issue are significant: > 1: Instead of being able to utilize the existing quad-port NICs I have in > my hosts, I must use dual 10G cards for redundancy purposes. > 2: I must build out a full 10G switching infrastructure. > 3: The network traffic is inherently less secure, as it is essentially > impossible to do real security with NFSv3 (that is supported by ESXi). > > In the short run, I have already ordered some relatively cheap 20G > infiniband gear that will hopefully push up the cost/performance ratio. > However, I have received all sorts of advice about how painful it can be to > build and maintain infiniband, and if iSCSI over 10G ethernet is this > painful, I'm not hopeful that infiniband will "just work". > > The last option, of course, is to bail out of the Solaris derivatives and > move to ZoL or ZoBSD. The drawbacks of this are: > > 1: ZoL doesn't easily support booting off of mirrored USB flash drives, > let alone running the root filesystem and swap on them. FreeNAS, by way of > comparison, puts a 2G swap partition on each zdev, which (strangely enough) > causes it to often crash when a zdev experiences a failure under load. > > 2: Neither ZoL or FreeNAS have good, stable, kernel-based iSCSI > implementations. FreeNAS is indeed testing istgt, but it proved unstable > for my purposes in recent builds. Unfortunately, stmf hasn't proved itself > any better. > > There are other minor differences, but these are the ones that brought me > to OmniOS in the first place. We'll just have to wait and see how well the > infiniband stuff works. > > > Hopefully this exercise will help prevent others from going down the same > rabbit-hole that I did. > > -Warren V > > > > > On Tue, Mar 3, 2015 at 3:45 PM, W Verb wrote: > >> Hello Rob et al, >> >> Thank you for taking the time to look at this problem with me. I >> completely understand your inclination to look at the network as the most >> probable source of my issue, but I believe that this is a pretty clear-cut >> case of server-side issues. >> >> 1: I did run ping RTT tests during both read and write operations with >> multiple interfaces enabled, and the RTT stayed at ~.2ms regardless of >> whether traffic was actively being transmitted/received or not. >> >> 2: I am not seeing the TCP window size bouncing around, and I am >> certainly not seeing starvation and delay in my packet captures. It is true >> that I do see delayed ACKs and retransmissions when I bump the MTU to 9000 >> on both sides, but I stopped testing with high MTU as soon as I saw it >> happening because I have a good understanding of incast. All of my recent >> testing has been with MTUs between 1000 and 3000 bytes. 
>> >> 3: When testing with MTUs between 1000 and 3000 bytes, I do not see lost >> packets and retransmission in captures on either the server or client side. >> I only see staggered transmission delays on the part of the server. >> >> 4: The client is consistently advertising a large window size (20k+), so >> the TCP throttling mechanism does not appear to play into this. >> >> 5: As mentioned previously, layer 2 flow control is not enabled anywhere >> in the network, so there are no lower-level mechanisms at work. >> >> 6: Upon checking buffer and queue sizes (and doing the appropriate >> research into documentation on the C3560E's buffer sizes), I do not see >> large numbers of frames being dropped by the switch. It does happen at >> larger MTUs, but not very often (and not consistently) during transfers at >> 1000-3000 byte MTUs. I do not have QoS, policing, or rate-shaping enabled. >> >> 7: Network interface stats on both the server and the ESXi client show no >> errors of any kind. This is via netstat on the server, and esxcli / Vsphere >> client on the ESXi box. >> >> 8: When looking at captures taken simultaneously on the server and client >> side, the server-side transmission pauses are consistently seen and >> reproducible, even after multiple server rebuilds, ESXi rebuilds, vSphere >> reinstallations (down to wiping the SQL db), various COMSTAR configuration >> variations, multiple 10G NICs with different NIC chipsets, multiple >> switches (I tried both a 48-port and 24-port C3560E), multiple IOS >> revisions (12.2 and 15.0), OmniOS versions (r151012 and previous) multiple >> cables, transceivers, etc etc etc etc etc >> >> For your review, I have uploaded the actual packet captures to Google >> Drive: >> >> >> https://drive.google.com/file/d/0BwyUMjibonYQZS03dHJ2ZjJvTEE/view?usp=sharing >> 2 int write - ESXi vmk5 >> >> https://drive.google.com/file/d/0BwyUMjibonYQTlNTQ2M5bjlxZ00/view?usp=sharing >> 2 int write - ESXi vmk1 >> >> https://drive.google.com/file/d/0BwyUMjibonYQUEFsSVJCYXBVX3c/view?usp=sharing >> 2 int read - server ixgbe0 >> >> https://drive.google.com/file/d/0BwyUMjibonYQT3FBbElnNFpJTzQ/view?usp=sharing >> 2 int read - ESXi vmk5 >> >> https://drive.google.com/file/d/0BwyUMjibonYQU1hXdFRLM2cxSTA/view?usp=sharing >> 2 int read - ESXi vmk1 >> >> https://drive.google.com/file/d/0BwyUMjibonYQNEFZSHVZdFNweDA/view?usp=sharing >> 1 int write - ESXi vmk1 >> >> https://drive.google.com/file/d/0BwyUMjibonYQM3FpTmloQm5iMGc/view?usp=sharing >> 1 int read - ESXi vmk1 >> >> Regards, >> >> Warren V >> >> On Mon, Mar 2, 2015 at 1:11 PM, Mallory, Rob >> wrote: >> >>> Just an EWAG, and forgive me for not following closely, I just saw >>> this in my inbox, and looked at it and the screenshots for 2 minutes. >>> >>> >>> >>> But this looks like the typical incast problem.. see >>> http://www.pdl.cmu.edu/Incast/ >>> >>> where your storage servers (there are effectively two with ISCSI/MPIO if >>> round-robin is working) have networks which are 20:1 oversubscribed to your >>> 1GbE host interfaces. (although one of the tcpdumps shows only one server >>> so it may be choked out completely) >>> >>> >>> >>> What is your BDP? I?m guessing .150ms * 1GbE. For single-link that >>> gets you to a MSS of 18700 or so. >>> >>> >>> >>> On your 1GbE connected clients, leave MTU at 9k, set the following in >>> sysctl.conf, >>> >>> And reboot. >>> >>> >>> >>> net.ipv4.tcp_rmem = 4096 8938 17876 >>> >>> >>> >>> If MPIO from the server is indeed round-robining properly, this will >>> ?make things fit? 
much better. >>> >>> >>> >>> Note that your tcp_wmem can and should stay high, since you are not >>> oversubscribed going from client?server ; you only need to tweak the >>> tcp receive window size. >>> >>> >>> >>> I?ve not done it in quite some time, but IIRC, You can also set these >>> from the server side with: >>> >>> Route add -sendpipe 8930 or ?ssthresh >>> >>> >>> >>> And I think you can see the hash-table with computed BDP per client with >>> ndd. >>> >>> >>> >>> I would try playing with those before delving deep into potential bugs >>> in the TCP, nic driver, zfs, or vm. >>> >>> -Rob >>> >>> >>> >>> *From:* W Verb via illumos-developer [mailto:developer at lists.illumos.org] >>> >>> *Sent:* Monday, March 02, 2015 12:20 PM >>> *To:* Garrett D'Amore >>> *Cc:* Joerg Goltermann; illumos-dev; omnios-discuss at lists.omniti.com >>> *Subject:* Re: [developer] Re: [OmniOS-discuss] The ixgbe driver, >>> Lindsay Lohan, and the Greek economy >>> >>> >>> >>> Hello, >>> >>> vmstat seems pretty boring. Certainly nothing going to swap. >>> >>> root at sanbox:/root# vmstat >>> kthr memory page disk faults >>> cpu >>> r b w swap free re mf pi po fr de sr po ro s0 s2 in sy cs us >>> sy id >>> 0 0 0 34631632 30728068 175 215 0 0 0 0 963 275 4 6 140 3301 796 6681 >>> 0 1 99 >>> >>> Here is the "taskq_dispatch_ent" output from "lockstat -s 5 -kWP sleep >>> 30" during the "fast" write operation. >>> >>> >>> ------------------------------------------------------------------------------- >>> Count indv cuml rcnt nsec Hottest Lock Caller >>> 50934 3% 79% 0.00 3437 0xffffff093145ba40 taskq_dispatch_ent >>> >>> nsec ------ Time Distribution ------ count Stack >>> 128 | 7 >>> spa_taskq_dispatch_ent >>> 256 |@@ 4333 zio_taskq_dispatch >>> 512 |@@ 3863 zio_issue_async >>> 1024 |@@@@@ 9717 zio_execute >>> 2048 |@@@@@@@@@ 15904 >>> 4096 |@@@@ 7595 >>> 8192 |@@ 4498 >>> 16384 |@ 2662 >>> 32768 |@ 1886 >>> 65536 | 434 >>> 131072 | 34 >>> 262144 | 1 >>> >>> ------------------------------------------------------------------------------- >>> >>> >>> However, the truly "broken" function is a read operation: >>> >>> Top lock 1st try: >>> >>> ------------------------------------------------------------------------------- >>> Count indv cuml rcnt nsec Hottest Lock Caller >>> 474 15% 15% 0.00 7031 0xffffff093145b6f8 cv_wait >>> >>> nsec ------ Time Distribution ------ count Stack >>> 256 |@ 29 taskq_thread_wait >>> 512 |@@@@@@ 100 taskq_thread >>> 1024 |@@@@ 72 thread_start >>> 2048 |@@@@ 69 >>> 4096 |@@@ 51 >>> 8192 |@@ 47 >>> 16384 |@@ 44 >>> 32768 |@@ 32 >>> 65536 |@ 25 >>> 131072 | 5 >>> >>> ------------------------------------------------------------------------------- >>> >>> Top lock 2nd try: >>> >>> >>> ------------------------------------------------------------------------------- >>> Count indv cuml rcnt nsec Hottest Lock Caller >>> 174 39% 39% 0.00 103909 0xffffff0943f116a0 dmu_zfetch_find >>> >>> nsec ------ Time Distribution ------ count Stack >>> 2048 | 2 dmu_zfetch >>> 4096 | 3 dbuf_read >>> 8192 | 4 >>> dmu_buf_hold_array_by_dnode >>> 16384 | 3 dmu_buf_hold_array >>> 32768 |@ 7 >>> 65536 |@@ 14 >>> 131072 |@@@@@@@@@@@@@@@@@@@@ 116 >>> 262144 |@@@ 19 >>> 524288 | 4 >>> 1048576 | 2 >>> >>> ------------------------------------------------------------------------------- >>> >>> Top lock 3rd try: >>> >>> >>> ------------------------------------------------------------------------------- >>> Count indv cuml rcnt nsec Hottest Lock Caller >>> 283 55% 55% 0.00 94602 0xffffff0943ff5a68 dmu_zfetch_find >>> >>> nsec 
------ Time Distribution ------ count Stack >>> 512 | 1 dmu_zfetch >>> 1024 | 1 dbuf_read >>> 2048 | 0 >>> dmu_buf_hold_array_by_dnode >>> 4096 | 5 dmu_buf_hold_array >>> 8192 | 2 >>> 16384 | 7 >>> 32768 | 4 >>> 65536 |@@@ 33 >>> 131072 |@@@@@@@@@@@@@@@@@@@@ 198 >>> 262144 |@@ 27 >>> 524288 | 2 >>> 1048576 | 3 >>> >>> ------------------------------------------------------------------------------- >>> >>> >>> >>> As for the MTU question- setting the MTU to 9000 makes read operations >>> grind almost to a halt at 5MB/s transfer rate. >>> >>> -Warren V >>> >>> >>> >>> On Mon, Mar 2, 2015 at 11:30 AM, Garrett D'Amore >>> wrote: >>> >>> Here?s a theory. You are using small (relatively) MTUs (3000 is less >>> than the smallest ZFS block size.) So, when you go multipathing this way, >>> might a single upper layer transaction (ZFS block transfer request, or for >>> that matter COMSTAR block request) get routed over different paths. This >>> sounds like a potentially pathological condition to me. >>> >>> >>> >>> What happens if you increase the MTU to 9000? Have you tried it? I?m >>> sort of thinking that this will permit each transaction to be issued in a >>> single IP frame, which may alleviate certain tragic code paths. (That >>> said, I?m not sure how aware COMSTAR is of the IP MTU. If it is ignorant, >>> then it shouldn?t matter *that* much, since TCP should do the right thing >>> here and a single TCP stream should stick to a single underlying NIC. But >>> if COMSTAR is aware of the MTU, it may do some really screwball things as >>> it tries to break requests up into single frames.) >>> >>> >>> >>> Your read spin really looks like only about 22 msec of wait out of a >>> total run of 30 sec. (That?s not *great*, but neither does it sound >>> tragic.) Your write is interesting because that looks like it is going a >>> wildly different path. You should be aware that the locks you see are >>> *not* necessarily related in call order, but rather are ordered by instance >>> count. The write code path hitting the task_thread as hard as it does is >>> really, really weird. Something is pounding on a taskq lock super hard. >>> The number of taskq_dispatch_ent calls is interesting here. I?m starting >>> to wonder if it?s something as stupid as a spin where if the taskq is >>> ?full? (max size reached), a caller just is spinning trying to dispatch >>> jobs to the taskq. >>> >>> >>> >>> The taskq_dispatch_ent code is super simple, and it should be almost >>> impossible to have contention on that lock ? barring a thread spinning hard >>> on taskq_dispatch (or taskq_dispatch_ent as I think is happening here). >>> Looking at the various call sites, there are places in both COMSTAR >>> (iscsit) and in ZFS where this could be coming from. To know which, we >>> really need to have the back trace associated. >>> >>> >>> >>> lockstat can give this ? try giving ?-s 5? to give a short backtrace >>> from this, that will probably give us a little more info about the guilty >>> caller. :-) >>> >>> >>> >>> - Garrett >>> >>> >>> >>> On Mar 2, 2015, at 11:07 AM, W Verb via illumos-developer < >>> developer at lists.illumos.org> wrote: >>> >>> >>> >>> Hello all, >>> >>> I am not using layer 2 flow control. The switch carries line-rate 10G >>> traffic without error. >>> >>> I think I have found the issue via lockstat. 
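As a quick cross-check of the flow-control and error claims above, the link properties and the TCP retransmit counters Garrett mentioned can be inspected from the OmniOS side roughly as follows. This is a sketch; ixgbe0 is assumed to be the 10G link in question, and the grep pattern is only a filter, not a list of guaranteed counter names.

  # negotiated flow control and MTU on the 10G port
  dladm show-linkprop -p flowctrl,mtu ixgbe0
  # retransmit/duplicate-related TCP counters
  kstat -m tcp | egrep -i 'retrans|dup'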
>>> The first lockstat is taken during a multipath read:
>>>
>>> lockstat -kWP sleep 30
>>>
>>> Adaptive mutex spin: 21331 events in 30.020 seconds (711 events/sec)
>>>
>>> Count indv cuml rcnt nsec Hottest Lock Caller
>>> -------------------------------------------------------------------------------
>>> 9306 44% 44% 0.00 1557 htable_mutex+0x370 htable_release
>>> 6307 23% 68% 0.00 1207 htable_mutex+0x108 htable_lookup
>>> 596 7% 75% 0.00 4100 0xffffff0931705188 cv_wait
>>> 349 5% 80% 0.00 4437 0xffffff0931705188 taskq_thread
>>> 704 2% 82% 0.00 995 0xffffff0935de3c50 dbuf_create
>>>
>>> The hash table being read here I would guess is the tcp connection hash table.
>>>
>>> [...]
>>>
>>> [Message clipped]
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From chip at innovates.com Wed Mar 4 14:17:50 2015
From: chip at innovates.com (Schweiss, Chip)
Date: Wed, 4 Mar 2015 08:17:50 -0600
Subject: [OmniOS-discuss] speeding up file access
In-Reply-To: <20150304142603.152ac1da@emeritus>
References: <20150304142603.152ac1da@emeritus>
Message-ID:

No USB flash is going to bring any benefit to the game as a log device.
If it has any ram cache to increase write performance, it's useless as a log device because it will not have any power protection for the ram. Most likely it will not have any RAM and write performance will be poor. Decent log devices don't come cheap. If it's just a home server set up some frequent snapshots and turn sync off. You may have to throw away the most recent writes in the case of a power failure, but your performance will be maximized. I've been doing this for 4 years on my home ZFS server, about 1/2 dozen power failures and I've never lost anything. I still keep it backed up. I use Code42's Crashplan. -Chip On Tue, Mar 3, 2015 at 10:26 PM, Michael Mounteney wrote: > Hello list; this is a very basic question about ZFS performance from > someone with limited sysadmin knowledge. I've seen various messages > about ZILs and caching and noticed that my Supermicro 5017C-LF > (http://www.supermicro.com/products/system/1U/5017/SYS-5017C-LF.cfm). > This has a single USB socket on the board so I wondered if it would be > worth putting a USB stick / `thumbdrive' in there and using it as the > ZIL / cache. I know the real answer to my question is 'buy a proper > server' but this is a home system and cost, noise and power-consumption > all mandate the current choice of machine. > > (Yes; the USB socket is vertical; I'd have to buy a right-angle > converter) > > Thanks, Michael. > _______________________________________________ > OmniOS-discuss mailing list > OmniOS-discuss at lists.omniti.com > http://lists.omniti.com/mailman/listinfo/omnios-discuss > -------------- next part -------------- An HTML attachment was scrubbed... URL: From john.barfield at bissinc.com Wed Mar 4 16:27:29 2015 From: john.barfield at bissinc.com (John Barfield) Date: Wed, 4 Mar 2015 16:27:29 +0000 Subject: [OmniOS-discuss] OmniOS on IBM DX360 UEFI Firmware Message-ID: <25A710AE-6CAC-4935-BC44-BAE0A81762B6@bissinc.com> Greetings, I?m writing to see if anyone could point me in the direction of a document that would detail how to get OmniOS to boot on IBM?s newest UEFI firmware on system X machines. I?m using a DX360 3U chassis as a storage appliance and I?m having a hard time booting the installer iso from USB. The installer ISO simply does not work but I can boot another ?installed? OmniOS appliance image off of a different USB stick. However this image just crashes and reboots after the SunOS 5.11 screen and goes into an infinite reboot loop. If anyone has any experience with this server I would be very grateful if you shared your knowledge. I?ve tried disabling UEFI or enabling legacy mode but I just don?t think that its working?after scanning through IBM?s docs from what I can tell?it should just work automatically. Thanks in advance for any help! John Barfield -------------- next part -------------- An HTML attachment was scrubbed... URL: From lists at marzocchi.net Wed Mar 4 23:08:32 2015 From: lists at marzocchi.net (Olaf Marzocchi) Date: Thu, 5 Mar 2015 00:08:32 +0100 Subject: [OmniOS-discuss] speeding up file access In-Reply-To: References: <20150304142603.152ac1da@emeritus> Message-ID: <23C40932-04A6-40B3-B417-353340899272@marzocchi.net> I also have a USB connector and a SD connector on the mobo of my server (Proliant ML100 G7) and I never found any good use for them. The best I could think of is doing a local backup of /etc and of the other config dirs. Concerning CrashPlan: I also use it, are you aware that they cut support for Solaris in 4.x? Solaris will be supported only on the old 3.x versions. 
Since they mantain backward compatibility for two main releases, as soon as 5.x will be released, their servers will not accept data from 3.x anymore. That should be in about 1.5 years from now, rough estimate. I found NO alternatives yet. Olaf > Il giorno 04/mar/2015, alle ore 15:17, Schweiss, Chip ha scritto: > > No USB flash is going to bring any benefit to the game as a log device. If it has any ram cache to increase write performance, it's useless as a log device because it will not have any power protection for the ram. Most likely it will not have any RAM and write performance will be poor. Decent log devices don't come cheap. > > If it's just a home server set up some frequent snapshots and turn sync off. You may have to throw away the most recent writes in the case of a power failure, but your performance will be maximized. > > I've been doing this for 4 years on my home ZFS server, about 1/2 dozen power failures and I've never lost anything. I still keep it backed up. I use Code42's Crashplan. > > -Chip > > On Tue, Mar 3, 2015 at 10:26 PM, Michael Mounteney > wrote: > Hello list; this is a very basic question about ZFS performance from > someone with limited sysadmin knowledge. I've seen various messages > about ZILs and caching and noticed that my Supermicro 5017C-LF > (http://www.supermicro.com/products/system/1U/5017/SYS-5017C-LF.cfm ). > This has a single USB socket on the board so I wondered if it would be > worth putting a USB stick / `thumbdrive' in there and using it as the > ZIL / cache. I know the real answer to my question is 'buy a proper > server' but this is a home system and cost, noise and power-consumption > all mandate the current choice of machine. > > (Yes; the USB socket is vertical; I'd have to buy a right-angle > converter) > > Thanks, Michael. > _______________________________________________ > OmniOS-discuss mailing list > OmniOS-discuss at lists.omniti.com > http://lists.omniti.com/mailman/listinfo/omnios-discuss > > _______________________________________________ > OmniOS-discuss mailing list > OmniOS-discuss at lists.omniti.com > http://lists.omniti.com/mailman/listinfo/omnios-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From gate03 at landcroft.co.uk Wed Mar 4 23:52:57 2015 From: gate03 at landcroft.co.uk (Michael Mounteney) Date: Thu, 5 Mar 2015 09:52:57 +1000 Subject: [OmniOS-discuss] speeding up file access In-Reply-To: References: <20150304142603.152ac1da@emeritus> Message-ID: <20150305095257.5de6478f@emeritus> Thanks to Doug and Chip for the replies. On Wed, 4 Mar 2015 08:17:50 -0600 "Schweiss, Chip" wrote: > [...] > > If it's just a home server set up some frequent snapshots and turn > sync off. You may have to throw away the most recent writes in the > case of a power failure, but your performance will be maximized. I already turned-off sync but the machine is on a UPS, which helps. It's a low-power server and I have to remember that. The other day I found that NTP synchronisation to the clients wasn't working, as the server was responding too slowly because it had two KVM VMs running. Michael. 
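For completeness, the sync-off-plus-frequent-snapshots arrangement described in this thread amounts to very little at the command line. A minimal sketch, with tank/data, the script path and the quarter-hourly schedule all being made-up examples rather than recommendations:

    # Give up synchronous write semantics on the dataset. Only do this if
    # losing the last few seconds of writes after a power cut is acceptable;
    # revert with: zfs set sync=standard tank/data
    zfs set sync=disabled tank/data

and a tiny snapshot script driven by cron:

    #!/bin/sh
    # /root/bin/snap-frequent.sh
    # root crontab entry:  0,15,30,45 * * * * /root/bin/snap-frequent.sh
    zfs snapshot tank/data@auto-`date +%Y%m%d-%H%M`

Pruning is manual in this sketch; zfs list -t snapshot -o name,creation -s creation shows what has accumulated.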
From matthew.lagoe at subrigo.net Wed Mar 4 23:59:08 2015 From: matthew.lagoe at subrigo.net (Matthew Lagoe) Date: Wed, 4 Mar 2015 15:59:08 -0800 Subject: [OmniOS-discuss] speeding up file access In-Reply-To: <23C40932-04A6-40B3-B417-353340899272@marzocchi.net> References: <20150304142603.152ac1da@emeritus> <23C40932-04A6-40B3-B417-353340899272@marzocchi.net> Message-ID: <001e01d056d7$36d37710$a47a6530$@subrigo.net> I have only used them for like a usb dongle other then that there pretty useless J From: OmniOS-discuss [mailto:omnios-discuss-bounces at lists.omniti.com] On Behalf Of Olaf Marzocchi Sent: Wednesday, March 04, 2015 03:09 PM To: Schweiss, Chip Cc: omnios-discuss Subject: Re: [OmniOS-discuss] speeding up file access I also have a USB connector and a SD connector on the mobo of my server (Proliant ML100 G7) and I never found any good use for them. The best I could think of is doing a local backup of /etc and of the other config dirs. Concerning CrashPlan: I also use it, are you aware that they cut support for Solaris in 4.x? Solaris will be supported only on the old 3.x versions. Since they mantain backward compatibility for two main releases, as soon as 5.x will be released, their servers will not accept data from 3.x anymore. That should be in about 1.5 years from now, rough estimate. I found NO alternatives yet. Olaf Il giorno 04/mar/2015, alle ore 15:17, Schweiss, Chip ha scritto: No USB flash is going to bring any benefit to the game as a log device. If it has any ram cache to increase write performance, it's useless as a log device because it will not have any power protection for the ram. Most likely it will not have any RAM and write performance will be poor. Decent log devices don't come cheap. If it's just a home server set up some frequent snapshots and turn sync off. You may have to throw away the most recent writes in the case of a power failure, but your performance will be maximized. I've been doing this for 4 years on my home ZFS server, about 1/2 dozen power failures and I've never lost anything. I still keep it backed up. I use Code42's Crashplan. -Chip On Tue, Mar 3, 2015 at 10:26 PM, Michael Mounteney wrote: Hello list; this is a very basic question about ZFS performance from someone with limited sysadmin knowledge. I've seen various messages about ZILs and caching and noticed that my Supermicro 5017C-LF (http://www.supermicro.com/products/system/1U/5017/SYS-5017C-LF.cfm). This has a single USB socket on the board so I wondered if it would be worth putting a USB stick / `thumbdrive' in there and using it as the ZIL / cache. I know the real answer to my question is 'buy a proper server' but this is a home system and cost, noise and power-consumption all mandate the current choice of machine. (Yes; the USB socket is vertical; I'd have to buy a right-angle converter) Thanks, Michael. _______________________________________________ OmniOS-discuss mailing list OmniOS-discuss at lists.omniti.com http://lists.omniti.com/mailman/listinfo/omnios-discuss _______________________________________________ OmniOS-discuss mailing list OmniOS-discuss at lists.omniti.com http://lists.omniti.com/mailman/listinfo/omnios-discuss -------------- next part -------------- An HTML attachment was scrubbed... 
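If someone does want to put the on-board socket to the modest use Olaf describes, copying the configuration directories onto a FAT-formatted stick is only a few commands. The device path and mount point below are placeholders; rmformat reports the real names on your machine:

    # Identify the removable device (the names below are examples only).
    rmformat -l

    # Mount the stick, write a dated copy of /etc, and unmount again.
    mkdir -p /mnt/usbkey
    mount -F pcfs /dev/dsk/c2t0d0p0:c /mnt/usbkey
    tar cf - /etc | gzip > /mnt/usbkey/etc-backup-`date +%Y%m%d`.tar.gz
    umount /mnt/usbkey

Nothing about this is specific to the USB socket; the same works for the SD slot if it shows up as a removable disk.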
URL: From chip at innovates.com Thu Mar 5 00:40:23 2015 From: chip at innovates.com (Schweiss, Chip) Date: Wed, 4 Mar 2015 18:40:23 -0600 Subject: [OmniOS-discuss] OmniOS on IBM DX360 UEFI Firmware In-Reply-To: <25A710AE-6CAC-4935-BC44-BAE0A81762B6@bissinc.com> References: <25A710AE-6CAC-4935-BC44-BAE0A81762B6@bissinc.com> Message-ID: Sounds like the problem I had on a new Supermicro box. I found by trial and error turning off x2apic in the bios fixed the problem. Also disable C sleep states. -Chip On Wed, Mar 4, 2015 at 10:27 AM, John Barfield wrote: > Greetings, > > I?m writing to see if anyone could point me in the direction of a > document that would detail how to get OmniOS to boot on IBM?s newest UEFI > firmware on system X machines. > > I?m using a DX360 3U chassis as a storage appliance and I?m having a > hard time booting the installer iso from USB. > > The installer ISO simply does not work but I can boot another > ?installed? OmniOS appliance image off of a different USB stick. > > However this image just crashes and reboots after the SunOS 5.11 screen > and goes into an infinite reboot loop. > > If anyone has any experience with this server I would be very grateful > if you shared your knowledge. > > I?ve tried disabling UEFI or enabling legacy mode but I just don?t think > that its working?after scanning through IBM?s docs from what I can tell?it > should just work automatically. > > Thanks in advance for any help! > > John Barfield > > > > > > _______________________________________________ > OmniOS-discuss mailing list > OmniOS-discuss at lists.omniti.com > http://lists.omniti.com/mailman/listinfo/omnios-discuss > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From john.barfield at bissinc.com Thu Mar 5 03:58:29 2015 From: john.barfield at bissinc.com (John Barfield) Date: Thu, 5 Mar 2015 03:58:29 +0000 Subject: [OmniOS-discuss] OmniOS on IBM DX360 UEFI Firmware In-Reply-To: References: <25A710AE-6CAC-4935-BC44-BAE0A81762B6@bissinc.com>, Message-ID: Thanks! Ill try that tomorrow... Thanks and have a great day, John Barfield On Mar 4, 2015, at 6:40 PM, Schweiss, Chip > wrote: Sounds like the problem I had on a new Supermicro box. I found by trial and error turning off x2apic in the bios fixed the problem. Also disable C sleep states. -Chip On Wed, Mar 4, 2015 at 10:27 AM, John Barfield > wrote: Greetings, I'm writing to see if anyone could point me in the direction of a document that would detail how to get OmniOS to boot on IBM's newest UEFI firmware on system X machines. I'm using a DX360 3U chassis as a storage appliance and I'm having a hard time booting the installer iso from USB. The installer ISO simply does not work but I can boot another "installed" OmniOS appliance image off of a different USB stick. However this image just crashes and reboots after the SunOS 5.11 screen and goes into an infinite reboot loop. If anyone has any experience with this server I would be very grateful if you shared your knowledge. I've tried disabling UEFI or enabling legacy mode but I just don't think that its working...after scanning through IBM's docs from what I can tell...it should just work automatically. Thanks in advance for any help! John Barfield _______________________________________________ OmniOS-discuss mailing list OmniOS-discuss at lists.omniti.com http://lists.omniti.com/mailman/listinfo/omnios-discuss -------------- next part -------------- An HTML attachment was scrubbed... 
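Once a machine like this finally boots, it is worth confirming from the running system what those firmware changes actually did. A small sketch; the kstat statistic names are the ones seen on recent illumos builds and may differ on yours:

    # Which PSM/interrupt module did the kernel load (pcplusmp or apix)?
    modinfo | egrep -i 'apix|pcplusmp'

    # Anything the kernel reported about APIC mode during boot.
    grep -i apic /var/adm/messages | tail -20

    # C-state view of the CPUs (statistic names assumed, see above).
    kstat -p cpu_info:::current_cstate cpu_info:::supported_max_cstates

None of this fixes a boot loop by itself, but it tells you whether the BIOS options took effect before you chase the next variable.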
URL: From nsmith at careyweb.com Thu Mar 5 14:00:51 2015 From: nsmith at careyweb.com (Nate Smith) Date: Thu, 5 Mar 2015 09:00:51 -0500 Subject: [OmniOS-discuss] QLE2652 I/O Disconnect. Heat Sinks? Message-ID: <4593924c-380f-4705-871f-3105d937c5f7@careyweb.com> I?ve had this problem for a while, and I have no way to diagnose what is going on, but occasionally when system IO gets high (I?ve seen it happen especially on backups), I will lose connectivity with my Fibre Channel cards which serve up fibre channel LUNS to a VM cluster. All hell breaks loose, and then connectivity gets restored. I don?t get an error that it?s dropped, at least not on the Omnios system, but I get notice when it?s restored (which makes no sense). I?m wondering if the cards are just overheating, and if heat sinks with a fan would help on the io chip. Mar? 5 01:55:01 newstorm fct: [ID 132490 kern.notice] NOTICE: qlt2,0 LINK UP, portid 20000, topology Fabric Pt-to-Pt,speed 8G Mar? 5 01:56:26 newstorm fct: [ID 132490 kern.notice] NOTICE: qlt0,0 LINK UP, portid 20100, topology Fabric Pt-to-Pt,speed 8G Mar? 5 02:00:13 newstorm last message repeated 1 time Mar? 5 02:00:15 newstorm fct: [ID 132490 kern.notice] NOTICE: qlt3,0 LINK UP, portid 10000, topology Fabric Pt-to-Pt,speed 8G Mar? 5 02:00:15 newstorm fct: [ID 132490 kern.notice] NOTICE: qlt2,0 LINK UP, portid 20000, topology Fabric Pt-to-Pt,speed 8G Mar? 5 02:00:18 newstorm fct: [ID 132490 kern.notice] NOTICE: qlt1,0 LINK UP, portid 10100, topology Fabric Pt-to-Pt,speed 8G -------------- next part -------------- An HTML attachment was scrubbed... URL: From rt at steait.net Thu Mar 5 16:07:59 2015 From: rt at steait.net (Rune Tipsmark) Date: Thu, 5 Mar 2015 16:07:59 +0000 Subject: [OmniOS-discuss] QLE2652 I/O Disconnect. Heat Sinks? In-Reply-To: <4593924c-380f-4705-871f-3105d937c5f7@careyweb.com> References: <4593924c-380f-4705-871f-3105d937c5f7@careyweb.com> Message-ID: <0224e713f8ba49249c659888858f569b@EX1301.steait.net> Same problem here? have noticed I can cause this easily by using Windows as initiator? I cannot cause this using VMware as initiator? No idea how to fix, but a big problem. Br, Rune From: OmniOS-discuss [mailto:omnios-discuss-bounces at lists.omniti.com] On Behalf Of Nate Smith Sent: Thursday, March 05, 2015 6:01 AM To: omnios-discuss at lists.omniti.com Subject: [OmniOS-discuss] QLE2652 I/O Disconnect. Heat Sinks? I?ve had this problem for a while, and I have no way to diagnose what is going on, but occasionally when system IO gets high (I?ve seen it happen especially on backups), I will lose connectivity with my Fibre Channel cards which serve up fibre channel LUNS to a VM cluster. All hell breaks loose, and then connectivity gets restored. I don?t get an error that it?s dropped, at least not on the Omnios system, but I get notice when it?s restored (which makes no sense). I?m wondering if the cards are just overheating, and if heat sinks with a fan would help on the io chip. 
Mar 5 01:55:01 newstorm fct: [ID 132490 kern.notice] NOTICE: qlt2,0 LINK UP, portid 20000, topology Fabric Pt-to-Pt,speed 8G Mar 5 01:56:26 newstorm fct: [ID 132490 kern.notice] NOTICE: qlt0,0 LINK UP, portid 20100, topology Fabric Pt-to-Pt,speed 8G Mar 5 02:00:13 newstorm last message repeated 1 time Mar 5 02:00:15 newstorm fct: [ID 132490 kern.notice] NOTICE: qlt3,0 LINK UP, portid 10000, topology Fabric Pt-to-Pt,speed 8G Mar 5 02:00:15 newstorm fct: [ID 132490 kern.notice] NOTICE: qlt2,0 LINK UP, portid 20000, topology Fabric Pt-to-Pt,speed 8G Mar 5 02:00:18 newstorm fct: [ID 132490 kern.notice] NOTICE: qlt1,0 LINK UP, portid 10100, topology Fabric Pt-to-Pt,speed 8G -------------- next part -------------- An HTML attachment was scrubbed... URL: From nsmith at careyweb.com Thu Mar 5 16:10:12 2015 From: nsmith at careyweb.com (Nate Smith) Date: Thu, 5 Mar 2015 11:10:12 -0500 Subject: [OmniOS-discuss] QLE2652 I/O Disconnect. Heat Sinks? In-Reply-To: <0224e713f8ba49249c659888858f569b@EX1301.steait.net> References: <4593924c-380f-4705-871f-3105d937c5f7@careyweb.com> <0224e713f8ba49249c659888858f569b@EX1301.steait.net> Message-ID: <476bc8c0-6135-4104-b3ca-9bac6873f473@careyweb.com> Do you see the same problem with Windows and iSCSI as an initiator? I wish there was a way to turn up debugging to figure this out. From: Rune Tipsmark [mailto:rt at steait.net] Sent: Thursday, March 05, 2015 11:08 AM To: 'Nate Smith'; omnios-discuss at lists.omniti.com Subject: RE: [OmniOS-discuss] QLE2652 I/O Disconnect. Heat Sinks? Same problem here? have noticed I can cause this easily by using Windows as initiator? I cannot cause this using VMware as initiator? No idea how to fix, but a big problem. Br, Rune From: OmniOS-discuss [mailto:omnios-discuss-bounces at lists.omniti.com] On Behalf Of Nate Smith Sent: Thursday, March 05, 2015 6:01 AM To: omnios-discuss at lists.omniti.com Subject: [OmniOS-discuss] QLE2652 I/O Disconnect. Heat Sinks? I?ve had this problem for a while, and I have no way to diagnose what is going on, but occasionally when system IO gets high (I?ve seen it happen especially on backups), I will lose connectivity with my Fibre Channel cards which serve up fibre channel LUNS to a VM cluster. All hell breaks loose, and then connectivity gets restored. I don?t get an error that it?s dropped, at least not on the Omnios system, but I get notice when it?s restored (which makes no sense). I?m wondering if the cards are just overheating, and if heat sinks with a fan would help on the io chip. Mar 5 01:55:01 newstorm fct: [ID 132490 kern.notice] NOTICE: qlt2,0 LINK UP, portid 20000, topology Fabric Pt-to-Pt,speed 8G Mar 5 01:56:26 newstorm fct: [ID 132490 kern.notice] NOTICE: qlt0,0 LINK UP, portid 20100, topology Fabric Pt-to-Pt,speed 8G Mar 5 02:00:13 newstorm last message repeated 1 time Mar 5 02:00:15 newstorm fct: [ID 132490 kern.notice] NOTICE: qlt3,0 LINK UP, portid 10000, topology Fabric Pt-to-Pt,speed 8G Mar 5 02:00:15 newstorm fct: [ID 132490 kern.notice] NOTICE: qlt2,0 LINK UP, portid 20000, topology Fabric Pt-to-Pt,speed 8G Mar 5 02:00:18 newstorm fct: [ID 132490 kern.notice] NOTICE: qlt1,0 LINK UP, portid 10100, topology Fabric Pt-to-Pt,speed 8G -------------- next part -------------- An HTML attachment was scrubbed... URL: From rt at steait.net Thu Mar 5 16:14:26 2015 From: rt at steait.net (Rune Tipsmark) Date: Thu, 5 Mar 2015 16:14:26 +0000 Subject: [OmniOS-discuss] QLE2652 I/O Disconnect. Heat Sinks? 
In-Reply-To: <476bc8c0-6135-4104-b3ca-9bac6873f473@careyweb.com> References: <4593924c-380f-4705-871f-3105d937c5f7@careyweb.com> <0224e713f8ba49249c659888858f569b@EX1301.steait.net> <476bc8c0-6135-4104-b3ca-9bac6873f473@careyweb.com> Message-ID: <05240908cc744d0ea27321d0ea77b5e9@EX1301.steait.net> Haven?t tried iSCSI but had similar issues with Infiniband? more frequent due to higher io load, but no console error messages. This only happened on my SuperMicro server and never on my HP server? what brand are you running? Br, Rune From: Nate Smith [mailto:nsmith at careyweb.com] Sent: Thursday, March 05, 2015 8:10 AM To: Rune Tipsmark; omnios-discuss at lists.omniti.com Subject: RE: [OmniOS-discuss] QLE2652 I/O Disconnect. Heat Sinks? Do you see the same problem with Windows and iSCSI as an initiator? I wish there was a way to turn up debugging to figure this out. From: Rune Tipsmark [mailto:rt at steait.net] Sent: Thursday, March 05, 2015 11:08 AM To: 'Nate Smith'; omnios-discuss at lists.omniti.com Subject: RE: [OmniOS-discuss] QLE2652 I/O Disconnect. Heat Sinks? Same problem here? have noticed I can cause this easily by using Windows as initiator? I cannot cause this using VMware as initiator? No idea how to fix, but a big problem. Br, Rune From: OmniOS-discuss [mailto:omnios-discuss-bounces at lists.omniti.com] On Behalf Of Nate Smith Sent: Thursday, March 05, 2015 6:01 AM To: omnios-discuss at lists.omniti.com Subject: [OmniOS-discuss] QLE2652 I/O Disconnect. Heat Sinks? I?ve had this problem for a while, and I have no way to diagnose what is going on, but occasionally when system IO gets high (I?ve seen it happen especially on backups), I will lose connectivity with my Fibre Channel cards which serve up fibre channel LUNS to a VM cluster. All hell breaks loose, and then connectivity gets restored. I don?t get an error that it?s dropped, at least not on the Omnios system, but I get notice when it?s restored (which makes no sense). I?m wondering if the cards are just overheating, and if heat sinks with a fan would help on the io chip. Mar 5 01:55:01 newstorm fct: [ID 132490 kern.notice] NOTICE: qlt2,0 LINK UP, portid 20000, topology Fabric Pt-to-Pt,speed 8G Mar 5 01:56:26 newstorm fct: [ID 132490 kern.notice] NOTICE: qlt0,0 LINK UP, portid 20100, topology Fabric Pt-to-Pt,speed 8G Mar 5 02:00:13 newstorm last message repeated 1 time Mar 5 02:00:15 newstorm fct: [ID 132490 kern.notice] NOTICE: qlt3,0 LINK UP, portid 10000, topology Fabric Pt-to-Pt,speed 8G Mar 5 02:00:15 newstorm fct: [ID 132490 kern.notice] NOTICE: qlt2,0 LINK UP, portid 20000, topology Fabric Pt-to-Pt,speed 8G Mar 5 02:00:18 newstorm fct: [ID 132490 kern.notice] NOTICE: qlt1,0 LINK UP, portid 10100, topology Fabric Pt-to-Pt,speed 8G -------------- next part -------------- An HTML attachment was scrubbed... URL: From nsmith at careyweb.com Thu Mar 5 16:16:15 2015 From: nsmith at careyweb.com (Nate Smith) Date: Thu, 5 Mar 2015 11:16:15 -0500 Subject: [OmniOS-discuss] QLE2652 I/O Disconnect. Heat Sinks? In-Reply-To: <05240908cc744d0ea27321d0ea77b5e9@EX1301.steait.net> References: <4593924c-380f-4705-871f-3105d937c5f7@careyweb.com> <0224e713f8ba49249c659888858f569b@EX1301.steait.net> <476bc8c0-6135-4104-b3ca-9bac6873f473@careyweb.com> <05240908cc744d0ea27321d0ea77b5e9@EX1301.steait.net> Message-ID: Dell R720. Had it happen with an intel system too. 
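Short of a debug knob in the driver, the state transitions can at least be watched from the OmniOS side while the heavy I/O runs. A generic sketch, run in separate terminals, with nothing specific to any one card:

    # Console log: the qlt LINK UP / link-down notices appear here live.
    tail -f /var/adm/messages

    # COMSTAR's view of the target ports and logical units; compare the
    # Operational Status before, during and after the drop.
    stmfadm list-target -v
    stmfadm list-lu -v

    # Disk-side load and any FMA ereports (PCIe or driver) near the event.
    iostat -xn 5
    fmdump -e -t 1h

Correlating the timestamps of those views against the switch logs is usually enough to tell whether the port went away, the target stalled, or the initiator gave up first.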
From: Rune Tipsmark [mailto:rt at steait.net] Sent: Thursday, March 05, 2015 11:14 AM To: 'Nate Smith'; omnios-discuss at lists.omniti.com Subject: RE: [OmniOS-discuss] QLE2652 I/O Disconnect. Heat Sinks? Haven?t tried iSCSI but had similar issues with Infiniband? more frequent due to higher io load, but no console error messages. This only happened on my SuperMicro server and never on my HP server? what brand are you running? Br, Rune From: Nate Smith [mailto:nsmith at careyweb.com] Sent: Thursday, March 05, 2015 8:10 AM To: Rune Tipsmark; omnios-discuss at lists.omniti.com Subject: RE: [OmniOS-discuss] QLE2652 I/O Disconnect. Heat Sinks? Do you see the same problem with Windows and iSCSI as an initiator? I wish there was a way to turn up debugging to figure this out. From: Rune Tipsmark [mailto:rt at steait.net] Sent: Thursday, March 05, 2015 11:08 AM To: 'Nate Smith'; omnios-discuss at lists.omniti.com Subject: RE: [OmniOS-discuss] QLE2652 I/O Disconnect. Heat Sinks? Same problem here? have noticed I can cause this easily by using Windows as initiator? I cannot cause this using VMware as initiator? No idea how to fix, but a big problem. Br, Rune From: OmniOS-discuss [mailto:omnios-discuss-bounces at lists.omniti.com] On Behalf Of Nate Smith Sent: Thursday, March 05, 2015 6:01 AM To: omnios-discuss at lists.omniti.com Subject: [OmniOS-discuss] QLE2652 I/O Disconnect. Heat Sinks? I?ve had this problem for a while, and I have no way to diagnose what is going on, but occasionally when system IO gets high (I?ve seen it happen especially on backups), I will lose connectivity with my Fibre Channel cards which serve up fibre channel LUNS to a VM cluster. All hell breaks loose, and then connectivity gets restored. I don?t get an error that it?s dropped, at least not on the Omnios system, but I get notice when it?s restored (which makes no sense). I?m wondering if the cards are just overheating, and if heat sinks with a fan would help on the io chip. Mar 5 01:55:01 newstorm fct: [ID 132490 kern.notice] NOTICE: qlt2,0 LINK UP, portid 20000, topology Fabric Pt-to-Pt,speed 8G Mar 5 01:56:26 newstorm fct: [ID 132490 kern.notice] NOTICE: qlt0,0 LINK UP, portid 20100, topology Fabric Pt-to-Pt,speed 8G Mar 5 02:00:13 newstorm last message repeated 1 time Mar 5 02:00:15 newstorm fct: [ID 132490 kern.notice] NOTICE: qlt3,0 LINK UP, portid 10000, topology Fabric Pt-to-Pt,speed 8G Mar 5 02:00:15 newstorm fct: [ID 132490 kern.notice] NOTICE: qlt2,0 LINK UP, portid 20000, topology Fabric Pt-to-Pt,speed 8G Mar 5 02:00:18 newstorm fct: [ID 132490 kern.notice] NOTICE: qlt1,0 LINK UP, portid 10100, topology Fabric Pt-to-Pt,speed 8G -------------- next part -------------- An HTML attachment was scrubbed... URL: From nsmith at careyweb.com Thu Mar 5 16:39:25 2015 From: nsmith at careyweb.com (Nate Smith) Date: Thu, 5 Mar 2015 11:39:25 -0500 Subject: [OmniOS-discuss] QLE2652 I/O Disconnect. Heat Sinks? In-Reply-To: References: <4593924c-380f-4705-871f-3105d937c5f7@careyweb.com> <0224e713f8ba49249c659888858f569b@EX1301.steait.net> <476bc8c0-6135-4104-b3ca-9bac6873f473@careyweb.com> <05240908cc744d0ea27321d0ea77b5e9@EX1301.steait.net> Message-ID: <2b73ea18-5f73-4c6e-ab1a-18f45e6f8329@careyweb.com> I posted something about this last fall and didn?t get a response. Here was the only similar error I found. Looks like it happens on OI too. 
http://openindiana.org/pipermail/openindiana-discuss/2012-May/008211.html From: OmniOS-discuss [mailto:omnios-discuss-bounces at lists.omniti.com] On Behalf Of Nate Smith Sent: Thursday, March 05, 2015 11:16 AM To: 'Rune Tipsmark'; omnios-discuss at lists.omniti.com Subject: Re: [OmniOS-discuss] QLE2652 I/O Disconnect. Heat Sinks? Dell R720. Had it happen with an intel system too. From: Rune Tipsmark [mailto:rt at steait.net] Sent: Thursday, March 05, 2015 11:14 AM To: 'Nate Smith'; omnios-discuss at lists.omniti.com Subject: RE: [OmniOS-discuss] QLE2652 I/O Disconnect. Heat Sinks? Haven?t tried iSCSI but had similar issues with Infiniband? more frequent due to higher io load, but no console error messages. This only happened on my SuperMicro server and never on my HP server? what brand are you running? Br, Rune From: Nate Smith [mailto:nsmith at careyweb.com] Sent: Thursday, March 05, 2015 8:10 AM To: Rune Tipsmark; omnios-discuss at lists.omniti.com Subject: RE: [OmniOS-discuss] QLE2652 I/O Disconnect. Heat Sinks? Do you see the same problem with Windows and iSCSI as an initiator? I wish there was a way to turn up debugging to figure this out. From: Rune Tipsmark [mailto:rt at steait.net] Sent: Thursday, March 05, 2015 11:08 AM To: 'Nate Smith'; omnios-discuss at lists.omniti.com Subject: RE: [OmniOS-discuss] QLE2652 I/O Disconnect. Heat Sinks? Same problem here? have noticed I can cause this easily by using Windows as initiator? I cannot cause this using VMware as initiator? No idea how to fix, but a big problem. Br, Rune From: OmniOS-discuss [mailto:omnios-discuss-bounces at lists.omniti.com] On Behalf Of Nate Smith Sent: Thursday, March 05, 2015 6:01 AM To: omnios-discuss at lists.omniti.com Subject: [OmniOS-discuss] QLE2652 I/O Disconnect. Heat Sinks? I?ve had this problem for a while, and I have no way to diagnose what is going on, but occasionally when system IO gets high (I?ve seen it happen especially on backups), I will lose connectivity with my Fibre Channel cards which serve up fibre channel LUNS to a VM cluster. All hell breaks loose, and then connectivity gets restored. I don?t get an error that it?s dropped, at least not on the Omnios system, but I get notice when it?s restored (which makes no sense). I?m wondering if the cards are just overheating, and if heat sinks with a fan would help on the io chip. Mar 5 01:55:01 newstorm fct: [ID 132490 kern.notice] NOTICE: qlt2,0 LINK UP, portid 20000, topology Fabric Pt-to-Pt,speed 8G Mar 5 01:56:26 newstorm fct: [ID 132490 kern.notice] NOTICE: qlt0,0 LINK UP, portid 20100, topology Fabric Pt-to-Pt,speed 8G Mar 5 02:00:13 newstorm last message repeated 1 time Mar 5 02:00:15 newstorm fct: [ID 132490 kern.notice] NOTICE: qlt3,0 LINK UP, portid 10000, topology Fabric Pt-to-Pt,speed 8G Mar 5 02:00:15 newstorm fct: [ID 132490 kern.notice] NOTICE: qlt2,0 LINK UP, portid 20000, topology Fabric Pt-to-Pt,speed 8G Mar 5 02:00:18 newstorm fct: [ID 132490 kern.notice] NOTICE: qlt1,0 LINK UP, portid 10100, topology Fabric Pt-to-Pt,speed 8G -------------- next part -------------- An HTML attachment was scrubbed... URL: From johan.kragsterman at capvert.se Thu Mar 5 16:59:42 2015 From: johan.kragsterman at capvert.se (Johan Kragsterman) Date: Thu, 5 Mar 2015 17:59:42 +0100 Subject: [OmniOS-discuss] Ang: Re: QLE2652 I/O Disconnect. Heat Sinks? 
In-Reply-To: <05240908cc744d0ea27321d0ea77b5e9@EX1301.steait.net> References: <05240908cc744d0ea27321d0ea77b5e9@EX1301.steait.net>, <4593924c-380f-4705-871f-3105d937c5f7@careyweb.com> <0224e713f8ba49249c659888858f569b@EX1301.steait.net> <476bc8c0-6135-4104-b3ca-9bac6873f473@careyweb.com> Message-ID: Hi! -----"OmniOS-discuss" skrev: ----- Till: "'Nate Smith'" , "omnios-discuss at lists.omniti.com" Fr?n: Rune Tipsmark S?nt av: "OmniOS-discuss" Datum: 2015-03-05 17:15 ?rende: Re: [OmniOS-discuss] QLE2652 I/O Disconnect. Heat Sinks? Haven’t tried iSCSI but had similar issues with Infiniband… more frequent due to higher io load, but no console error messages. ? This only happened on my SuperMicro server and never on my HP server… what brand are you running? ? This is interesting, only on Supermicro, and never on HP? I'd like to know some more details here... First, when you say "server", do you mean the SAN head? Not the hosts? Second: Can you specify the exakt model of the Supermicro and the HP? Third: Did you pay attention to bios settings on the two different servers? Like C-states, and other settings...how about IRQ settings? And how about the physical PCIe buses the HBA's are sitting on? This is often causing problems, if you don't know the layout of the PCIe-buses. Fourth: When you say you can cause it with windows as initiator, do you mean windows on hardware, and not windows as a VM? And when you say you can NOT cause it on VmWare, you mean you can run a windows VM on VmWare with direct LUN access without problems? And is this true for both hardwares, HP and Supermicro? Since it appears on one hardware and not another, it is difficult to blame any specific sofware, but we just had a discussion here about iScsi/comstar, where Garrret suspected comstar to handle certain things bad. I don't know wether that has anything to do with this. Rgrds Johan Br, Rune ? ? From: Nate Smith [mailto:nsmith at careyweb.com] Sent: Thursday, March 05, 2015 8:10 AM To: Rune Tipsmark; omnios-discuss at lists.omniti.com Subject: RE: [OmniOS-discuss] QLE2652 I/O Disconnect. Heat Sinks? ? Do you see the same problem with Windows and iSCSI as an initiator? I wish there was a way to turn up debugging to figure this out. ? From: Rune Tipsmark [mailto:rt at steait.net] Sent: Thursday, March 05, 2015 11:08 AM To: 'Nate Smith'; omnios-discuss at lists.omniti.com Subject: RE: [OmniOS-discuss] QLE2652 I/O Disconnect. Heat Sinks? ? Same problem here… have noticed I can cause this easily by using Windows as initiator… I cannot cause this using VMware as initiator… No idea how to fix, but a big problem. Br, Rune ? ? From: OmniOS-discuss [mailto:omnios-discuss-bounces at lists.omniti.com] On Behalf Of Nate Smith Sent: Thursday, March 05, 2015 6:01 AM To: omnios-discuss at lists.omniti.com Subject: [OmniOS-discuss] QLE2652 I/O Disconnect. Heat Sinks? ? I’ve had this problem for a while, and I have no way to diagnose what is going on, but occasionally when system IO gets high (I’ve seen it happen especially on backups), I will lose connectivity with my Fibre Channel cards which serve up fibre channel LUNS to a VM cluster. All hell breaks loose, and then connectivity gets restored. I don’t get an error that it’s dropped, at least not on the Omnios system, but I get notice when it’s restored (which makes no sense). I’m wondering if the cards are just overheating, and if heat sinks with a fan would help on the io chip. ? Mar? 
5 01:55:01 newstorm fct: [ID 132490 kern.notice] NOTICE: qlt2,0 LINK UP, portid 20000, topology Fabric Pt-to-Pt,speed 8G Mar? 5 01:56:26 newstorm fct: [ID 132490 kern.notice] NOTICE: qlt0,0 LINK UP, portid 20100, topology Fabric Pt-to-Pt,speed 8G Mar? 5 02:00:13 newstorm last message repeated 1 time Mar? 5 02:00:15 newstorm fct: [ID 132490 kern.notice] NOTICE: qlt3,0 LINK UP, portid 10000, topology Fabric Pt-to-Pt,speed 8G Mar? 5 02:00:15 newstorm fct: [ID 132490 kern.notice] NOTICE: qlt2,0 LINK UP, portid 20000, topology Fabric Pt-to-Pt,speed 8G Mar? 5 02:00:18 newstorm fct: [ID 132490 kern.notice] NOTICE: qlt1,0 LINK UP, portid 10100, topology Fabric Pt-to-Pt,speed 8G _______________________________________________ OmniOS-discuss mailing list OmniOS-discuss at lists.omniti.com http://lists.omniti.com/mailman/listinfo/omnios-discuss From nsmith at careyweb.com Thu Mar 5 17:06:06 2015 From: nsmith at careyweb.com (Nate Smith) Date: Thu, 5 Mar 2015 12:06:06 -0500 Subject: [OmniOS-discuss] QLE2652 I/O Disconnect. Heat Sinks? In-Reply-To: References: <05240908cc744d0ea27321d0ea77b5e9@EX1301.steait.net>, <4593924c-380f-4705-871f-3105d937c5f7@careyweb.com> <0224e713f8ba49249c659888858f569b@EX1301.steait.net> <476bc8c0-6135-4104-b3ca-9bac6873f473@careyweb.com> Message-ID: <48424678-f235-464c-8400-a87de6b2d161@careyweb.com> The way I have it set up, is that Hyper-V hypervisor picks up the comstar targets and mounts them as ntfs storage to host the HVDs for Cluster File System. In the cluster, I can have either hypervisor drop and the cluster stays up. I'm getting this behavior on 2008 R2 and 2012 R2 (I have both hypervisors connecting to different luns at the same time, so it's hard to say which is causing it to fail). As far as which PCI device I'm on, interrupts, etc, I could never find a rhyme or reason to it, but I didn't do an exacting test. It's hard to reproduce the problem to test for it. I know my HBAs were always on separate PCI busses and running at 8x on both systems I used. 0Nate -----Original Message----- From: Johan Kragsterman [mailto:johan.kragsterman at capvert.se] Sent: Thursday, March 05, 2015 12:00 PM To: Rune Tipsmark Cc: 'Nate Smith'; omnios-discuss at lists.omniti.com Subject: Ang: Re: [OmniOS-discuss] QLE2652 I/O Disconnect. Heat Sinks? Hi! -----"OmniOS-discuss" skrev: ----- Till: "'Nate Smith'" , "omnios-discuss at lists.omniti.com" Fr?n: Rune Tipsmark S?nt av: "OmniOS-discuss" Datum: 2015-03-05 17:15 ?rende: Re: [OmniOS-discuss] QLE2652 I/O Disconnect. Heat Sinks? Haven’t tried iSCSI but had similar issues with Infiniband… more frequent due to higher io load, but no console error messages. ? This only happened on my SuperMicro server and never on my HP server… what brand are you running? ? This is interesting, only on Supermicro, and never on HP? I'd like to know some more details here... First, when you say "server", do you mean the SAN head? Not the hosts? Second: Can you specify the exakt model of the Supermicro and the HP? Third: Did you pay attention to bios settings on the two different servers? Like C-states, and other settings...how about IRQ settings? And how about the physical PCIe buses the HBA's are sitting on? This is often causing problems, if you don't know the layout of the PCIe-buses. Fourth: When you say you can cause it with windows as initiator, do you mean windows on hardware, and not windows as a VM? And when you say you can NOT cause it on VmWare, you mean you can run a windows VM on VmWare with direct LUN access without problems? 
And is this true for both hardwares, HP and Supermicro? Since it appears on one hardware and not another, it is difficult to blame any specific sofware, but we just had a discussion here about iScsi/comstar, where Garrret suspected comstar to handle certain things bad. I don't know wether that has anything to do with this. Rgrds Johan Br, Rune ? ? From: Nate Smith [mailto:nsmith at careyweb.com] Sent: Thursday, March 05, 2015 8:10 AM To: Rune Tipsmark; omnios-discuss at lists.omniti.com Subject: RE: [OmniOS-discuss] QLE2652 I/O Disconnect. Heat Sinks? ? Do you see the same problem with Windows and iSCSI as an initiator? I wish there was a way to turn up debugging to figure this out. ? From: Rune Tipsmark [mailto:rt at steait.net] Sent: Thursday, March 05, 2015 11:08 AM To: 'Nate Smith'; omnios-discuss at lists.omniti.com Subject: RE: [OmniOS-discuss] QLE2652 I/O Disconnect. Heat Sinks? ? Same problem here… have noticed I can cause this easily by using Windows as initiator… I cannot cause this using VMware as initiator… No idea how to fix, but a big problem. Br, Rune ? ? From: OmniOS-discuss [mailto:omnios-discuss-bounces at lists.omniti.com] On Behalf Of Nate Smith Sent: Thursday, March 05, 2015 6:01 AM To: omnios-discuss at lists.omniti.com Subject: [OmniOS-discuss] QLE2652 I/O Disconnect. Heat Sinks? ? I’ve had this problem for a while, and I have no way to diagnose what is going on, but occasionally when system IO gets high (I’ve seen it happen especially on backups), I will lose connectivity with my Fibre Channel cards which serve up fibre channel LUNS to a VM cluster. All hell breaks loose, and then connectivity gets restored. I don’t get an error that it’s dropped, at least not on the Omnios system, but I get notice when it’s restored (which makes no sense). I’m wondering if the cards are just overheating, and if heat sinks with a fan would help on the io chip. ? Mar? 5 01:55:01 newstorm fct: [ID 132490 kern.notice] NOTICE: qlt2,0 LINK UP, portid 20000, topology Fabric Pt-to-Pt,speed 8G Mar? 5 01:56:26 newstorm fct: [ID 132490 kern.notice] NOTICE: qlt0,0 LINK UP, portid 20100, topology Fabric Pt-to-Pt,speed 8G Mar? 5 02:00:13 newstorm last message repeated 1 time Mar? 5 02:00:15 newstorm fct: [ID 132490 kern.notice] NOTICE: qlt3,0 LINK UP, portid 10000, topology Fabric Pt-to-Pt,speed 8G Mar? 5 02:00:15 newstorm fct: [ID 132490 kern.notice] NOTICE: qlt2,0 LINK UP, portid 20000, topology Fabric Pt-to-Pt,speed 8G Mar? 5 02:00:18 newstorm fct: [ID 132490 kern.notice] NOTICE: qlt1,0 LINK UP, portid 10100, topology Fabric Pt-to-Pt,speed 8G _______________________________________________ OmniOS-discuss mailing list OmniOS-discuss at lists.omniti.com http://lists.omniti.com/mailman/listinfo/omnios-discuss From doug at will.to Thu Mar 5 18:31:32 2015 From: doug at will.to (Doug Hughes) Date: Thu, 5 Mar 2015 13:31:32 -0500 Subject: [OmniOS-discuss] problems with 10g interfaces dropping off for a time and then coming back Message-ID: I'm having an issue with r*12 with 10g Solarflare interfaces setup in an aggregate simultaneously dropping for a while for no apparent reason and then coming back. Oddly, I can see them leaving the port channel and dropping on the switch side, but there's no log messages or anything on the client side. They are 5162 cards, for what it's worth. Has anybody else seen anything like this? Any idea why the host ports don't seem to log any messages to the effect? I can see side affects of this on the host. It only happens during moderate to heavy load. 
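While one of these episodes is in progress, the aggregation state and per-port counters can also be watched from the host. A rough sketch; aggr0 stands in for the real aggregation name, and nothing here is specific to the Solarflare driver:

    # LACP and per-port state as the host sees it (look for ports flapping
    # between attached and standby while the switch reports them leaving).
    dladm show-aggr -x
    dladm show-aggr -L

    # Per-link counters once a second; a port that has silently stopped
    # passing traffic shows up here even if nothing reaches the log.
    dlstat -i 1 aggr0

Whether the counters freeze on one port or on the whole aggregation at least narrows down which side gave up first.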
Interrupt balancing looks ok (intrstat), and I watch vmstat, and then all of a sudden the cs, interrupts and other markers drop preciptously (probably as a result of a complete drop of network traffic), and it will stay that way for a couple of minutes and then recover on its own. Sometimes it is up to 30 minutes and then it just recovers, equally as mysteriously. I can sometimes fix it by toggling the interface on the switch. I have other hosts with the same hardware and driver but running Solaris 10 that don't exhibit this. -------------- next part -------------- An HTML attachment was scrubbed... URL: From rt at steait.net Thu Mar 5 18:38:24 2015 From: rt at steait.net (Rune Tipsmark) Date: Thu, 5 Mar 2015 18:38:24 +0000 Subject: [OmniOS-discuss] QLE2652 I/O Disconnect. Heat Sinks? In-Reply-To: References: <05240908cc744d0ea27321d0ea77b5e9@EX1301.steait.net>, <4593924c-380f-4705-871f-3105d937c5f7@careyweb.com> <0224e713f8ba49249c659888858f569b@EX1301.steait.net> <476bc8c0-6135-4104-b3ca-9bac6873f473@careyweb.com> Message-ID: <22d96a9197cb4d7aa9c46c00b2f96337@EX1301.steait.net> Pls see below >> -----Original Message----- From: Johan Kragsterman [mailto:johan.kragsterman at capvert.se] Sent: Thursday, March 05, 2015 9:00 AM To: Rune Tipsmark Cc: 'Nate Smith'; omnios-discuss at lists.omniti.com Subject: Ang: Re: [OmniOS-discuss] QLE2652 I/O Disconnect. Heat Sinks? Hi! -----"OmniOS-discuss" skrev: ----- Till: "'Nate Smith'" , "omnios-discuss at lists.omniti.com" Fr?n: Rune Tipsmark S?nt av: "OmniOS-discuss" Datum: 2015-03-05 17:15 ?rende: Re: [OmniOS-discuss] QLE2652 I/O Disconnect. Heat Sinks? Haven’t tried iSCSI but had similar issues with Infiniband… more frequent due to higher io load, but no console error messages. ? This only happened on my SuperMicro server and never on my HP server… what brand are you running? This is interesting, only on Supermicro, and never on HP? I'd like to know some more details here... First, when you say "server", do you mean the SAN head? Not the hosts? >> SAN Head yes Second: Can you specify the exakt model of the Supermicro and the HP? >>SuperMicro X9DRE-TF+ and the HP is actually an HP P4500 SAN I removed the shitty HP software and controller from and replaced with an LSI 9207 and installed OmniOS on. I have tested on other HP and SM servers too, all exhibit the same behavior (3 SM and 2 HP tested) Third: Did you pay attention to bios settings on the two different servers? Like C-states, and other settings...how about IRQ settings? And how about the physical PCIe buses the HBA's are sitting on? This is often causing problems, if you don't know the layout of the PCIe-buses. >> both are set to max performance in BIOS and afaik all C-states are disabled. I am 100% sure on HP and 99.99% sure on SM. I didn't touch the IRQ settings but I actually have two 8Gbit FC Cards in the SM server and both exhibit the problem. I have tried to swap things around too with no luck. I do use every available PCI-E slot though.. L2ARC, SLOG etc. Fourth: When you say you can cause it with windows as initiator, do you mean windows on hardware, and not windows as a VM? And when you say you can NOT cause it on VmWare, you mean you can run a windows VM on VmWare with direct LUN access without problems? And is this true for both hardwares, HP and Supermicro? >>Windows on hardware yes, all I have to do is zone a block device over to Windows and copy say 100 4GB ISO Files onto it or a 200GB backup file... no doubt it will cause this issue... 
when I say I cannot cause this on VMware then I mean any VM hosted on my ESXi host (5.1 or 5.5) and a guest operating system without raw device mapping - I have not tested if I can cause this using RDM. It is also true for both HP and SM - both behave just fine using VMware and FibreChannel - however VMware can cause issues with Infiniband on the SM but I think that's a different issue and has to do with Mellanox and their terrible drivers that are never ever updated and half-beta etc. Since it appears on one hardware and not another, it is difficult to blame any specific sofware, but we just had a discussion here about iScsi/comstar, where Garrret suspected comstar to handle certain things bad. I don't know wether that has anything to do with this. >> I think it could be a mx of both, would be interesting to see if something in Comstar could be fixed,... Bacically: ESX+FC+SM = problem ESX+FC+HP = no problem Win+FC+SM = problem Win+FC+HP = not tested ESX+IB+SM = problem ESX+IB+HP = no problem Win+IB+SM = not tested, SRP not supported in Win Win+IB+HP = not tested, SRP not supported in Win Anyway it all lead me to some information on 8Gbit FC - in particular portCfgFillWord Maybe this can affect some of this... google will reveal a great bit of info and also found some links.. http://virtualkenneth.com/2012/05/21/random-esxi-lun-loss-in-a-8gb-fc-brocade-infrastructure/ What about tuning the emlxs.conf? can anything be done there to get better performance? Br, Rune Rgrds Johan Br, Rune ? ? From: Nate Smith [mailto:nsmith at careyweb.com] Sent: Thursday, March 05, 2015 8:10 AM To: Rune Tipsmark; omnios-discuss at lists.omniti.com Subject: RE: [OmniOS-discuss] QLE2652 I/O Disconnect. Heat Sinks? ? Do you see the same problem with Windows and iSCSI as an initiator? I wish there was a way to turn up debugging to figure this out. ? From: Rune Tipsmark [mailto:rt at steait.net] Sent: Thursday, March 05, 2015 11:08 AM To: 'Nate Smith'; omnios-discuss at lists.omniti.com Subject: RE: [OmniOS-discuss] QLE2652 I/O Disconnect. Heat Sinks? ? Same problem here… have noticed I can cause this easily by using Windows as initiator… I cannot cause this using VMware as initiator… No idea how to fix, but a big problem. Br, Rune ? ? From: OmniOS-discuss [mailto:omnios-discuss-bounces at lists.omniti.com] On Behalf Of Nate Smith Sent: Thursday, March 05, 2015 6:01 AM To: omnios-discuss at lists.omniti.com Subject: [OmniOS-discuss] QLE2652 I/O Disconnect. Heat Sinks? ? I’ve had this problem for a while, and I have no way to diagnose what is going on, but occasionally when system IO gets high (I’ve seen it happen especially on backups), I will lose connectivity with my Fibre Channel cards which serve up fibre channel LUNS to a VM cluster. All hell breaks loose, and then connectivity gets restored. I don’t get an error that it’s dropped, at least not on the Omnios system, but I get notice when it’s restored (which makes no sense). I’m wondering if the cards are just overheating, and if heat sinks with a fan would help on the io chip. ? Mar? 5 01:55:01 newstorm fct: [ID 132490 kern.notice] NOTICE: qlt2,0 LINK UP, portid 20000, topology Fabric Pt-to-Pt,speed 8G Mar? 5 01:56:26 newstorm fct: [ID 132490 kern.notice] NOTICE: qlt0,0 LINK UP, portid 20100, topology Fabric Pt-to-Pt,speed 8G Mar? 5 02:00:13 newstorm last message repeated 1 time Mar? 5 02:00:15 newstorm fct: [ID 132490 kern.notice] NOTICE: qlt3,0 LINK UP, portid 10000, topology Fabric Pt-to-Pt,speed 8G Mar? 
5 02:00:15 newstorm fct: [ID 132490 kern.notice] NOTICE: qlt2,0 LINK UP, portid 20000, topology Fabric Pt-to-Pt,speed 8G Mar? 5 02:00:18 newstorm fct: [ID 132490 kern.notice] NOTICE: qlt1,0 LINK UP, portid 10100, topology Fabric Pt-to-Pt,speed 8G _______________________________________________ OmniOS-discuss mailing list OmniOS-discuss at lists.omniti.com http://lists.omniti.com/mailman/listinfo/omnios-discuss From johan.kragsterman at capvert.se Thu Mar 5 19:06:52 2015 From: johan.kragsterman at capvert.se (Johan Kragsterman) Date: Thu, 5 Mar 2015 20:06:52 +0100 Subject: [OmniOS-discuss] Ang: RE: Re: QLE2652 I/O Disconnect. Heat Sinks? In-Reply-To: <22d96a9197cb4d7aa9c46c00b2f96337@EX1301.steait.net> References: <22d96a9197cb4d7aa9c46c00b2f96337@EX1301.steait.net>, <05240908cc744d0ea27321d0ea77b5e9@EX1301.steait.net>, <4593924c-380f-4705-871f-3105d937c5f7@careyweb.com> <0224e713f8ba49249c659888858f569b@EX1301.steait.net> <476bc8c0-6135-4104-b3ca-9bac6873f473@careyweb.com> Message-ID: Hi! -----Rune Tipsmark skrev: ----- Till: 'Johan Kragsterman' Fr?n: Rune Tipsmark Datum: 2015-03-05 19:38 Kopia: 'Nate Smith' , "omnios-discuss at lists.omniti.com" ?rende: RE: Re: [OmniOS-discuss] QLE2652 I/O Disconnect. Heat Sinks? Pls see below >> : [OmniOS-discuss] QLE2652 I/O Disconnect. Heat Sinks? Haven’t tried iSCSI but had similar issues with Infiniband… more frequent due to higher io load, but no console error messages. ? This only happened on my SuperMicro server and never on my HP server… what brand are you running? This is interesting, only on Supermicro, and never on HP? I'd like to know some more details here... First, when you say "server", do you mean the SAN head? Not the hosts? >> SAN Head yes Second: Can you specify the exakt model of the Supermicro and the HP? >>SuperMicro X9DRE-TF+ and the HP is actually an HP P4500 SAN I removed the shitty HP software and controller from and replaced with ?an LSI 9207 and installed OmniOS on. I have tested on other HP and SM servers too, all exhibit the same behavior (3 SM and 2 HP tested) Third: Did you pay attention to bios settings on the two different servers? Like C-states, and other settings...how about IRQ settings? And how about the physical PCIe buses the HBA's are sitting on? This is often causing problems, if you don't know the layout of the PCIe-buses. >> both are set to max performance in BIOS and afaik all C-states are disabled. I am 100% sure on HP and 99.99% sure on SM. I didn't touch the IRQ settings but I actually have two 8Gbit FC Cards in the SM server and both exhibit the problem. I have tried to swap things around too with no luck. I do use every available PCI-E slot though.. L2ARC, SLOG etc. Fourth: When you say you can cause it with windows as initiator, do you mean windows on hardware, and not windows as a VM? And when you say you can NOT cause it on VmWare, you mean you can run a windows VM on VmWare with direct LUN access without problems? And is this true for both hardwares, HP and Supermicro? >>Windows on hardware yes, all I have to do is zone a block device over to Windows and copy say 100 4GB ISO Files onto it or a 200GB backup file... no doubt it will cause this issue... when I say I cannot cause this on VMware then I mean any VM hosted on my ESXi host (5.1 or 5.5) and a guest operating system without raw device mapping - I have not tested if I can cause this using RDM. 
It is also true for both HP and SM - both behave just fine using VMware and FibreChannel - however VMware can cause issues with Infiniband on the SM but I think that's a different issue and has to do with Mellanox and their terrible drivers that are never ever updated and half-beta etc. Since it appears on one hardware and not another, it is difficult to blame any specific sofware, but we just had a discussion here about iScsi/comstar, where Garrret suspected comstar to handle certain things bad. I don't know wether that has anything to do with this. >> I think it could be a mx of both, would be interesting to see if something in Comstar could be fixed,... Bacically: ESX+FC+SM = problem ESX+FC+HP = no problem Win+FC+SM = problem Win+FC+HP = not tested ESX+IB+SM = problem ESX+IB+HP = no problem Win+IB+SM = not tested, SRP not supported in Win Win+IB+HP = not tested, SRP not supported in Win Anyway it all lead me to some information on 8Gbit FC - in particular portCfgFillWord Maybe this can affect some of this... google will reveal a great bit of info and also found some links.. http://virtualkenneth.com/2012/05/21/random-esxi-lun-loss-in-a-8gb-fc-brocade-infrastructure/ What about tuning the emlxs.conf? can anything be done there to get better performance? Are you using Emulex HBA's? That would explain things....I have never used Emulex in production. Tried some times in lab env, but always turned out to behave strangly... Br, Rune Rgrds Johan Br, Rune ? ? From: Nate Smith [mailto:nsmith at careyweb.com] Sent: Thursday, March 05, 2015 8:10 AM To: Rune Tipsmark; omnios-discuss at lists.omniti.com Subject: RE: [OmniOS-discuss] QLE2652 I/O Disconnect. Heat Sinks? ? Do you see the same problem with Windows and iSCSI as an initiator? I wish there was a way to turn up debugging to figure this out. ? From: Rune Tipsmark [mailto:rt at steait.net] Sent: Thursday, March 05, 2015 11:08 AM To: 'Nate Smith'; omnios-discuss at lists.omniti.com Subject: RE: [OmniOS-discuss] QLE2652 I/O Disconnect. Heat Sinks? ? Same problem here… have noticed I can cause this easily by using Windows as initiator… I cannot cause this using VMware as initiator… No idea how to fix, but a big problem. Br, Rune ? ? From: OmniOS-discuss [mailto:omnios-discuss-bounces at lists.omniti.com] On Behalf Of Nate Smith Sent: Thursday, March 05, 2015 6:01 AM To: omnios-discuss at lists.omniti.com Subject: [OmniOS-discuss] QLE2652 I/O Disconnect. Heat Sinks? ? I’ve had this problem for a while, and I have no way to diagnose what is going on, but occasionally when system IO gets high (I’ve seen it happen especially on backups), I will lose connectivity with my Fibre Channel cards which serve up fibre channel LUNS to a VM cluster. All hell breaks loose, and then connectivity gets restored. I don’t get an error that it’s dropped, at least not on the Omnios system, but I get notice when it’s restored (which makes no sense). I’m wondering if the cards are just overheating, and if heat sinks with a fan would help on the io chip. ? Mar? 5 01:55:01 newstorm fct: [ID 132490 kern.notice] NOTICE: qlt2,0 LINK UP, portid 20000, topology Fabric Pt-to-Pt,speed 8G Mar? 5 01:56:26 newstorm fct: [ID 132490 kern.notice] NOTICE: qlt0,0 LINK UP, portid 20100, topology Fabric Pt-to-Pt,speed 8G Mar? 5 02:00:13 newstorm last message repeated 1 time Mar? 5 02:00:15 newstorm fct: [ID 132490 kern.notice] NOTICE: qlt3,0 LINK UP, portid 10000, topology Fabric Pt-to-Pt,speed 8G Mar? 
5 02:00:15 newstorm fct: [ID 132490 kern.notice] NOTICE: qlt2,0 LINK UP, portid 20000, topology Fabric Pt-to-Pt,speed 8G Mar? 5 02:00:18 newstorm fct: [ID 132490 kern.notice] NOTICE: qlt1,0 LINK UP, portid 10100, topology Fabric Pt-to-Pt,speed 8G _______________________________________________ OmniOS-discuss mailing list OmniOS-discuss at lists.omniti.com http://lists.omniti.com/mailman/listinfo/omnios-discuss From rt at steait.net Thu Mar 5 19:44:35 2015 From: rt at steait.net (Rune Tipsmark) Date: Thu, 5 Mar 2015 19:44:35 +0000 Subject: [OmniOS-discuss] QLE2652 I/O Disconnect. Heat Sinks? In-Reply-To: References: <22d96a9197cb4d7aa9c46c00b2f96337@EX1301.steait.net>, <05240908cc744d0ea27321d0ea77b5e9@EX1301.steait.net>, <4593924c-380f-4705-871f-3105d937c5f7@careyweb.com> <0224e713f8ba49249c659888858f569b@EX1301.steait.net> <476bc8c0-6135-4104-b3ca-9bac6873f473@careyweb.com> Message-ID: They are qLogic qmh2562 across the board... just figured the emlxs.conf had something to say since I had to edit it to get comstar into target mode. Br, Rune -----Original Message----- From: Johan Kragsterman [mailto:johan.kragsterman at capvert.se] Sent: Thursday, March 05, 2015 11:07 AM To: Rune Tipsmark Cc: 'Nate Smith'; omnios-discuss at lists.omniti.com Subject: Ang: RE: Re: [OmniOS-discuss] QLE2652 I/O Disconnect. Heat Sinks? Hi! -----Rune Tipsmark skrev: ----- Till: 'Johan Kragsterman' Fr?n: Rune Tipsmark Datum: 2015-03-05 19:38 Kopia: 'Nate Smith' , "omnios-discuss at lists.omniti.com" ?rende: RE: Re: [OmniOS-discuss] QLE2652 I/O Disconnect. Heat Sinks? Pls see below >> : [OmniOS-discuss] QLE2652 I/O Disconnect. Heat Sinks? Haven’t tried iSCSI but had similar issues with Infiniband… more frequent due to higher io load, but no console error messages. ? This only happened on my SuperMicro server and never on my HP server… what brand are you running? This is interesting, only on Supermicro, and never on HP? I'd like to know some more details here... First, when you say "server", do you mean the SAN head? Not the hosts? >> SAN Head yes Second: Can you specify the exakt model of the Supermicro and the HP? >>SuperMicro X9DRE-TF+ and the HP is actually an HP P4500 SAN I removed >>the shitty HP software and controller from and replaced with ?an LSI >>9207 and installed OmniOS on. I have tested on other HP and SM servers >>too, all exhibit the same behavior (3 SM and 2 HP tested) Third: Did you pay attention to bios settings on the two different servers? Like C-states, and other settings...how about IRQ settings? And how about the physical PCIe buses the HBA's are sitting on? This is often causing problems, if you don't know the layout of the PCIe-buses. >> both are set to max performance in BIOS and afaik all C-states are disabled. I am 100% sure on HP and 99.99% sure on SM. I didn't touch the IRQ settings but I actually have two 8Gbit FC Cards in the SM server and both exhibit the problem. I have tried to swap things around too with no luck. I do use every available PCI-E slot though.. L2ARC, SLOG etc. Fourth: When you say you can cause it with windows as initiator, do you mean windows on hardware, and not windows as a VM? And when you say you can NOT cause it on VmWare, you mean you can run a windows VM on VmWare with direct LUN access without problems? And is this true for both hardwares, HP and Supermicro? >>Windows on hardware yes, all I have to do is zone a block device over to Windows and copy say 100 4GB ISO Files onto it or a 200GB backup file... no doubt it will cause this issue... 
when I say I cannot cause this on VMware then I mean any VM hosted on my ESXi host (5.1 or 5.5) and a guest operating system without raw device mapping - I have not tested if I can cause this using RDM. It is also true for both HP and SM - both behave just fine using VMware and FibreChannel - however VMware can cause issues with Infiniband on the SM but I think that's a different issue and has to do with Mellanox and their terrible drivers that are never ever updated and half-beta etc. Since it appears on one hardware and not another, it is difficult to blame any specific sofware, but we just had a discussion here about iScsi/comstar, where Garrret suspected comstar to handle certain things bad. I don't know wether that has anything to do with this. >> I think it could be a mx of both, would be interesting to see if something in Comstar could be fixed,... Bacically: ESX+FC+SM = problem ESX+FC+HP = no problem Win+FC+SM = problem Win+FC+HP = not tested ESX+IB+SM = problem ESX+IB+HP = no problem Win+IB+SM = not tested, SRP not supported in Win HP = not tested, SRP Win+IB+not supported in Win Anyway it all lead me to some information on 8Gbit FC - in particular portCfgFillWord Maybe this can affect some of this... google will reveal a great bit of info and also found some links.. http://virtualkenneth.com/2012/05/21/random-esxi-lun-loss-in-a-8gb-fc-brocade-infrastructure/ What about tuning the emlxs.conf? can anything be done there to get better performance? Are you using Emulex HBA's? That would explain things....I have never used Emulex in production. Tried some times in lab env, but always turned out to behave strangly... Br, Rune Rgrds Johan Br, Rune ? ? From: Nate Smith [mailto:nsmith at careyweb.com] Sent: Thursday, March 05, 2015 8:10 AM To: Rune Tipsmark; omnios-discuss at lists.omniti.com Subject: RE: [OmniOS-discuss] QLE2652 I/O Disconnect. Heat Sinks? ? Do you see the same problem with Windows and iSCSI as an initiator? I wish there was a way to turn up debugging to figure this out. ? From: Rune Tipsmark [mailto:rt at steait.net] Sent: Thursday, March 05, 2015 11:08 AM To: 'Nate Smith'; omnios-discuss at lists.omniti.com Subject: RE: [OmniOS-discuss] QLE2652 I/O Disconnect. Heat Sinks? ? Same problem here… have noticed I can cause this easily by using Windows as initiator… I cannot cause this using VMware as initiator… No idea how to fix, but a big problem. Br, Rune ? ? From: OmniOS-discuss [mailto:omnios-discuss-bounces at lists.omniti.com] On Behalf Of Nate Smith Sent: Thursday, March 05, 2015 6:01 AM To: omnios-discuss at lists.omniti.com Subject: [OmniOS-discuss] QLE2652 I/O Disconnect. Heat Sinks? ? I’ve had this problem for a while, and I have no way to diagnose what is going on, but occasionally when system IO gets high (I’ve seen it happen especially on backups), I will lose connectivity with my Fibre Channel cards which serve up fibre channel LUNS to a VM cluster. All hell breaks loose, and then connectivity gets restored. I don’t get an error that it’s dropped, at least not on the Omnios system, but I get notice when it’s restored (which makes no sense). I’m wondering if the cards are just overheating, and if heat sinks with a fan would help on the io chip. ? Mar? 5 01:55:01 newstorm fct: [ID 132490 kern.notice] NOTICE: qlt2,0 LINK UP, portid 20000, topology Fabric Pt-to-Pt,speed 8G Mar? 5 01:56:26 newstorm fct: [ID 132490 kern.notice] NOTICE: qlt0,0 LINK UP, portid 20100, topology Fabric Pt-to-Pt,speed 8G Mar? 5 02:00:13 newstorm last message repeated 1 time Mar? 
5 02:00:15 newstorm fct: [ID 132490 kern.notice] NOTICE: qlt3,0 LINK UP, portid 10000, topology Fabric Pt-to-Pt,speed 8G Mar? 5 02:00:15 newstorm fct: [ID 132490 kern.notice] NOTICE: qlt2,0 LINK UP, portid 20000, topology Fabric Pt-to-Pt,speed 8G Mar? 5 02:00:18 newstorm fct: [ID 132490 kern.notice] NOTICE: qlt1,0 LINK UP, portid 10100, topology Fabric Pt-to-Pt,speed 8G _______________________________________________ OmniOS-discuss mailing list OmniOS-discuss at lists.omniti.com http://lists.omniti.com/mailman/listinfo/omnios-discuss From johan.kragsterman at capvert.se Thu Mar 5 20:12:06 2015 From: johan.kragsterman at capvert.se (Johan Kragsterman) Date: Thu, 5 Mar 2015 21:12:06 +0100 Subject: [OmniOS-discuss] Ang: RE: RE: Re: QLE2652 I/O Disconnect. Heat Sinks? In-Reply-To: References: , <22d96a9197cb4d7aa9c46c00b2f96337@EX1301.steait.net>, <05240908cc744d0ea27321d0ea77b5e9@EX1301.steait.net>, <4593924c-380f-4705-871f-3105d937c5f7@careyweb.com> <0224e713f8ba49249c659888858f569b@EX1301.steait.net> <476bc8c0-6135-4104-b3ca-9bac6873f473@careyweb.com> Message-ID: -----Rune Tipsmark skrev: ----- Till: 'Johan Kragsterman' Fr?n: Rune Tipsmark Datum: 2015-03-05 20:44 Kopia: 'Nate Smith' , "omnios-discuss at lists.omniti.com" ?rende: RE: RE: Re: [OmniOS-discuss] QLE2652 I/O Disconnect. Heat Sinks? They are qLogic qmh2562 across the board... just figured the emlxs.conf had something to say since I had to edit it to get comstar into target mode. Br, Rune COMSTAR is target only, so you don't get COMSTAR into target mode, you get the HBA into target mode with a target driver, to give COMSTAR an interface to work with. If you are using qmh2562, you need the qlt driver, which I suppose you already use. emlx is the driver for Emulex HBA's, and is of no use when you're using qlogic HBA's. Rgrds Johan -----Original Message----- From: Johan Kragsterman [mailto:johan.kragsterman at capvert.se] Sent: Thursday, March 05, 2015 11:07 AM To: Rune Tipsmark Cc: 'Nate Smith'; omnios-discuss at lists.omniti.com Subject: Ang: RE: Re: [OmniOS-discuss] QLE2652 I/O Disconnect. Heat Sinks? Hi! -----Rune Tipsmark skrev: ----- Till: 'Johan Kragsterman' Fr?n: Rune Tipsmark Datum: 2015-03-05 19:38 Kopia: 'Nate Smith' , "omnios-discuss at lists.omniti.com" ?rende: RE: Re: [OmniOS-discuss] QLE2652 I/O Disconnect. Heat Sinks? Pls see below >> : [OmniOS-discuss] QLE2652 I/O Disconnect. Heat Sinks? Haven’t tried iSCSI but had similar issues with Infiniband… more frequent due to higher io load, but no console error messages. ? This only happened on my SuperMicro server and never on my HP server… what brand are you running? This is interesting, only on Supermicro, and never on HP? I'd like to know some more details here... First, when you say "server", do you mean the SAN head? Not the hosts? >> SAN Head yes Second: Can you specify the exakt model of the Supermicro and the HP? >>SuperMicro X9DRE-TF+ and the HP is actually an HP P4500 SAN I removed >>the shitty HP software and controller from and replaced with ?an LSI >>9207 and installed OmniOS on. I have tested on other HP and SM servers >>too, all exhibit the same behavior (3 SM and 2 HP tested) Third: Did you pay attention to bios settings on the two different servers? Like C-states, and other settings...how about IRQ settings? And how about the physical PCIe buses the HBA's are sitting on? This is often causing problems, if you don't know the layout of the PCIe-buses. >> both are set to max performance in BIOS and afaik all C-states are disabled. 
I am 100% sure on HP and 99.99% sure on SM. I didn't touch the IRQ settings but I actually have two 8Gbit FC Cards in the SM server and both exhibit the problem. I have tried to swap things around too with no luck. I do use every available PCI-E slot though.. L2ARC, SLOG etc. Fourth: When you say you can cause it with windows as initiator, do you mean windows on hardware, and not windows as a VM? And when you say you can NOT cause it on VmWare, you mean you can run a windows VM on VmWare with direct LUN access without problems? And is this true for both hardwares, HP and Supermicro? >>Windows on hardware yes, all I have to do is zone a block device over to Windows and copy say 100 4GB ISO Files onto it or a 200GB backup file... no doubt it will cause this issue... when I say I cannot cause this on VMware then I mean any VM hosted on my ESXi host (5.1 or 5.5) and a guest operating system without raw device mapping - I have not tested if I can cause this using RDM. It is also true for both HP and SM - both behave just fine using VMware and FibreChannel - however VMware can cause issues with Infiniband on the SM but I think that's a different issue and has to do with Mellanox and their terrible drivers that are never ever updated and half-beta etc. Since it appears on one hardware and not another, it is difficult to blame any specific sofware, but we just had a discussion here about iScsi/comstar, where Garrret suspected comstar to handle certain things bad. I don't know wether that has anything to do with this. >> I think it could be a mx of both, would be interesting to see if something in Comstar could be fixed,... Bacically: ESX+FC+SM = problem ESX+FC+HP = no problem Win+FC+SM = problem Win+FC+HP = not tested ESX+IB+SM = problem ESX+IB+HP = no problem Win+IB+SM = not tested, SRP not supported in Win HP = not tested, SRP Win+IB+not supported in Win Anyway it all lead me to some information on 8Gbit FC - in particular portCfgFillWord Maybe this can affect some of this... google will reveal a great bit of info and also found some links.. http://virtualkenneth.com/2012/05/21/random-esxi-lun-loss-in-a-8gb-fc-brocade-infrastructure/ What about tuning the emlxs.conf? can anything be done there to get better performance? Are you using Emulex HBA's? That would explain things....I have never used Emulex in production. Tried some times in lab env, but always turned out to behave strangly... Br, Rune Rgrds Johan Br, Rune ? ? From: Nate Smith [mailto:nsmith at careyweb.com] Sent: Thursday, March 05, 2015 8:10 AM To: Rune Tipsmark; omnios-discuss at lists.omniti.com Subject: RE: [OmniOS-discuss] QLE2652 I/O Disconnect. Heat Sinks? ? Do you see the same problem with Windows and iSCSI as an initiator? I wish there was a way to turn up debugging to figure this out. ? From: Rune Tipsmark [mailto:rt at steait.net] Sent: Thursday, March 05, 2015 11:08 AM To: 'Nate Smith'; omnios-discuss at lists.omniti.com Subject: RE: [OmniOS-discuss] QLE2652 I/O Disconnect. Heat Sinks? ? Same problem here… have noticed I can cause this easily by using Windows as initiator… I cannot cause this using VMware as initiator… No idea how to fix, but a big problem. Br, Rune ? ? From: OmniOS-discuss [mailto:omnios-discuss-bounces at lists.omniti.com] On Behalf Of Nate Smith Sent: Thursday, March 05, 2015 6:01 AM To: omnios-discuss at lists.omniti.com Subject: [OmniOS-discuss] QLE2652 I/O Disconnect. Heat Sinks? ? 
I’ve had this problem for a while, and I have no way to diagnose what is going on, but occasionally when system IO gets high (I’ve seen it happen especially on backups), I will lose connectivity with my Fibre Channel cards which serve up fibre channel LUNS to a VM cluster. All hell breaks loose, and then connectivity gets restored. I don’t get an error that it’s dropped, at least not on the Omnios system, but I get notice when it’s restored (which makes no sense). I’m wondering if the cards are just overheating, and if heat sinks with a fan would help on the io chip. ? Mar? 5 01:55:01 newstorm fct: [ID 132490 kern.notice] NOTICE: qlt2,0 LINK UP, portid 20000, topology Fabric Pt-to-Pt,speed 8G Mar? 5 01:56:26 newstorm fct: [ID 132490 kern.notice] NOTICE: qlt0,0 LINK UP, portid 20100, topology Fabric Pt-to-Pt,speed 8G Mar? 5 02:00:13 newstorm last message repeated 1 time Mar? 5 02:00:15 newstorm fct: [ID 132490 kern.notice] NOTICE: qlt3,0 LINK UP, portid 10000, topology Fabric Pt-to-Pt,speed 8G Mar? 5 02:00:15 newstorm fct: [ID 132490 kern.notice] NOTICE: qlt2,0 LINK UP, portid 20000, topology Fabric Pt-to-Pt,speed 8G Mar? 5 02:00:18 newstorm fct: [ID 132490 kern.notice] NOTICE: qlt1,0 LINK UP, portid 10100, topology Fabric Pt-to-Pt,speed 8G _______________________________________________ OmniOS-discuss mailing list OmniOS-discuss at lists.omniti.com http://lists.omniti.com/mailman/listinfo/omnios-discuss From rt at steait.net Thu Mar 5 20:33:24 2015 From: rt at steait.net (Rune Tipsmark) Date: Thu, 5 Mar 2015 20:33:24 +0000 Subject: [OmniOS-discuss] QLE2652 I/O Disconnect. Heat Sinks? In-Reply-To: References: , <22d96a9197cb4d7aa9c46c00b2f96337@EX1301.steait.net>, <05240908cc744d0ea27321d0ea77b5e9@EX1301.steait.net>, <4593924c-380f-4705-871f-3105d937c5f7@careyweb.com> <0224e713f8ba49249c659888858f569b@EX1301.steait.net> <476bc8c0-6135-4104-b3ca-9bac6873f473@careyweb.com> Message-ID: Ah ok, so just loading the qlt drives is enough, I followed a guide from napp-it when I first learned about solaris a year or so ago and it had the emlxs.conf target=1 described so I just followed it ever since. Any other files that can be used to tweak the target driver or comstar? Br, Rune -----Original Message----- From: Johan Kragsterman [mailto:johan.kragsterman at capvert.se] Sent: Thursday, March 05, 2015 12:12 PM To: Rune Tipsmark Cc: 'Nate Smith'; omnios-discuss at lists.omniti.com Subject: Ang: RE: RE: Re: [OmniOS-discuss] QLE2652 I/O Disconnect. Heat Sinks? -----Rune Tipsmark skrev: ----- Till: 'Johan Kragsterman' Fr?n: Rune Tipsmark Datum: 2015-03-05 20:44 Kopia: 'Nate Smith' , "omnios-discuss at lists.omniti.com" ?rende: RE: RE: Re: [OmniOS-discuss] QLE2652 I/O Disconnect. Heat Sinks? They are qLogic qmh2562 across the board... just figured the emlxs.conf had something to say since I had to edit it to get comstar into target mode. Br, Rune COMSTAR is target only, so you don't get COMSTAR into target mode, you get the HBA into target mode with a target driver, to give COMSTAR an interface to work with. If you are using qmh2562, you need the qlt driver, which I suppose you already use. emlx is the driver for Emulex HBA's, and is of no use when you're using qlogic HBA's. Rgrds Johan -----Original Message----- From: Johan Kragsterman [mailto:johan.kragsterman at capvert.se] Sent: Thursday, March 05, 2015 11:07 AM To: Rune Tipsmark Cc: 'Nate Smith'; omnios-discuss at lists.omniti.com Subject: Ang: RE: Re: [OmniOS-discuss] QLE2652 I/O Disconnect. Heat Sinks? Hi! 
-----Rune Tipsmark skrev: ----- Till: 'Johan Kragsterman' Fr?n: Rune Tipsmark Datum: 2015-03-05 19:38 Kopia: 'Nate Smith' , "omnios-discuss at lists.omniti.com" ?rende: RE: Re: [OmniOS-discuss] QLE2652 I/O Disconnect. Heat Sinks? Pls see below >> : [OmniOS-discuss] QLE2652 I/O Disconnect. Heat Sinks? Haven’t tried iSCSI but had similar issues with Infiniband… more frequent due to higher io load, but no console error messages. ? This only happened on my SuperMicro server and never on my HP server… what brand are you running? This is interesting, only on Supermicro, and never on HP? I'd like to know some more details here... First, when you say "server", do you mean the SAN head? Not the hosts? >> SAN Head yes Second: Can you specify the exakt model of the Supermicro and the HP? >>SuperMicro X9DRE-TF+ and the HP is actually an HP P4500 SAN I removed >>the shitty HP software and controller from and replaced with ?an LSI >>9207 and installed OmniOS on. I have tested on other HP and SM servers >>too, all exhibit the same behavior (3 SM and 2 HP tested) Third: Did you pay attention to bios settings on the two different servers? Like C-states, and other settings...how about IRQ settings? And how about the physical PCIe buses the HBA's are sitting on? This is often causing problems, if you don't know the layout of the PCIe-buses. >> both are set to max performance in BIOS and afaik all C-states are disabled. I am 100% sure on HP and 99.99% sure on SM. I didn't touch the IRQ settings but I actually have two 8Gbit FC Cards in the SM server and both exhibit the problem. I have tried to swap things around too with no luck. I do use every available PCI-E slot though.. L2ARC, SLOG etc. Fourth: When you say you can cause it with windows as initiator, do you mean windows on hardware, and not windows as a VM? And when you say you can NOT cause it on VmWare, you mean you can run a windows VM on VmWare with direct LUN access without problems? And is this true for both hardwares, HP and Supermicro? >>Windows on hardware yes, all I have to do is zone a block device over to Windows and copy say 100 4GB ISO Files onto it or a 200GB backup file... no doubt it will cause this issue... when I say I cannot cause this on VMware then I mean any VM hosted on my ESXi host (5.1 or 5.5) and a guest operating system without raw device mapping - I have not tested if I can cause this using RDM. It is also true for both HP and SM - both behave just fine using VMware and FibreChannel - however VMware can cause issues with Infiniband on the SM but I think that's a different issue and has to do with Mellanox and their terrible drivers that are never ever updated and half-beta etc. Since it appears on one hardware and not another, it is difficult to blame any specific sofware, but we just had a discussion here about iScsi/comstar, where Garrret suspected comstar to handle certain things bad. I don't know wether that has anything to do with this. >> I think it could be a mx of both, would be interesting to see if something in Comstar could be fixed,... Bacically: ESX+FC+SM = problem ESX+FC+HP = no problem Win+FC+SM = problem Win+FC+HP = not tested ESX+IB+SM = problem ESX+IB+HP = no problem Win+IB+SM = not tested, SRP not supported in Win HP = not tested, SRP Win+IB+not supported in Win Anyway it all lead me to some information on 8Gbit FC - in particular portCfgFillWord Maybe this can affect some of this... google will reveal a great bit of info and also found some links.. 
http://virtualkenneth.com/2012/05/21/random-esxi-lun-loss-in-a-8gb-fc-brocade-infrastructure/ What about tuning the emlxs.conf? can anything be done there to get better performance? Are you using Emulex HBA's? That would explain things....I have never used Emulex in production. Tried some times in lab env, but always turned out to behave strangly... Br, Rune Rgrds Johan Br, Rune ? ? From: Nate Smith [mailto:nsmith at careyweb.com] Sent: Thursday, March 05, 2015 8:10 AM To: Rune Tipsmark; omnios-discuss at lists.omniti.com Subject: RE: [OmniOS-discuss] QLE2652 I/O Disconnect. Heat Sinks? ? Do you see the same problem with Windows and iSCSI as an initiator? I wish there was a way to turn up debugging to figure this out. ? From: Rune Tipsmark [mailto:rt at steait.net] Sent: Thursday, March 05, 2015 11:08 AM To: 'Nate Smith'; omnios-discuss at lists.omniti.com Subject: RE: [OmniOS-discuss] QLE2652 I/O Disconnect. Heat Sinks? ? Same problem here… have noticed I can cause this easily by using Windows as initiator… I cannot cause this using VMware as initiator… No idea how to fix, but a big problem. Br, Rune ? ? From: OmniOS-discuss [mailto:omnios-discuss-bounces at lists.omniti.com] On Behalf Of Nate Smith Sent: Thursday, March 05, 2015 6:01 AM To: omnios-discuss at lists.omniti.com Subject: [OmniOS-discuss] QLE2652 I/O Disconnect. Heat Sinks? ? I’ve had this problem for a while, and I have no way to diagnose what is going on, but occasionally when system IO gets high (I’ve seen it happen especially on backups), I will lose connectivity with my Fibre Channel cards which serve up fibre channel LUNS to a VM cluster. All hell breaks loose, and then connectivity gets restored. I don’t get an error that it’s dropped, at least not on the Omnios system, but I get notice when it’s restored (which makes no sense). I’m wondering if the cards are just overheating, and if heat sinks with a fan would help on the io chip. ? Mar? 5 01:55:01 newstorm fct: [ID 132490 kern.notice] NOTICE: qlt2,0 LINK UP, portid 20000, topology Fabric Pt-to-Pt,speed 8G Mar? 5 01:56:26 newstorm fct: [ID 132490 kern.notice] NOTICE: qlt0,0 LINK UP, portid 20100, topology Fabric Pt-to-Pt,speed 8G Mar? 5 02:00:13 newstorm last message repeated 1 time Mar? 5 02:00:15 newstorm fct: [ID 132490 kern.notice] NOTICE: qlt3,0 LINK UP, portid 10000, topology Fabric Pt-to-Pt,speed 8G Mar? 5 02:00:15 newstorm fct: [ID 132490 kern.notice] NOTICE: qlt2,0 LINK UP, portid 20000, topology Fabric Pt-to-Pt,speed 8G Mar? 5 02:00:18 newstorm fct: [ID 132490 kern.notice] NOTICE: qlt1,0 LINK UP, portid 10100, topology Fabric Pt-to-Pt,speed 8G _______________________________________________ OmniOS-discuss mailing list OmniOS-discuss at lists.omniti.com http://lists.omniti.com/mailman/listinfo/omnios-discuss From henson at acm.org Fri Mar 6 03:08:30 2015 From: henson at acm.org (Paul B. Henson) Date: Thu, 5 Mar 2015 19:08:30 -0800 Subject: [OmniOS-discuss] CIFS File Lock Problems In-Reply-To: References: Message-ID: <32b301d057ba$d6168cc0$8243a640$@acm.org> > From: Aaron Curry > Sent: Tuesday, March 03, 2015 3:45 PM > > We have encountered an issue with out OmniOS CIFS file server and file locks. We are currently actually using samba under omnios rather than the in-kernel CIFS server. One reason is that the in-kernel server does not support our requirement to use an MIT Kerberos realm for NFS, and an active directory domain for CIFS. Another is that samba just supports more current features of CIFS. 
I believe both of these issues are resolved in the nexenta illumos fork, which implements SMB2 and fixes a lot of other stuff. That code has been released and is available for integration into upstream illumos (where it would then come back down into omnios), but unfortunately that would be a lot of work and I don't believe anyone is currently planning on doing it :(. If that ever happens we will reevaluate the in-kernel CIFS server, as I'd rather be using that? From henson at acm.org Fri Mar 6 03:22:47 2015 From: henson at acm.org (Paul B. Henson) Date: Thu, 5 Mar 2015 19:22:47 -0800 Subject: [OmniOS-discuss] Long group names in ls acl output In-Reply-To: <54F621D0.6070506@umn.edu> References: <54F5D619.6000902@umn.edu> <2dfa01d055f4$bfa16130$3ee42390$@acm.org> <54F621D0.6070506@umn.edu> Message-ID: <32bd01d057bc$d4102380$7c306a80$@acm.org> > From: Nathan Huff > Sent: Tuesday, March 03, 2015 1:04 PM > > -n works for the regular user and group but seems to have no effect on > ACL entries Ah, sorry, I don't recall seeing ACL entries mentioned in your original post, perhaps I missed it. I took a quick look, it appears that ls does not parse/print ACL's itself, it uses the acl_printacl utility function in libsec. Unfortunately, I don't see any nontrivial way to modify it to do what you want. From nrhuff at umn.edu Fri Mar 6 16:32:19 2015 From: nrhuff at umn.edu (Nathan Huff) Date: Fri, 06 Mar 2015 10:32:19 -0600 Subject: [OmniOS-discuss] Long group names in ls acl output In-Reply-To: <32bd01d057bc$d4102380$7c306a80$@acm.org> References: <54F5D619.6000902@umn.edu> <2dfa01d055f4$bfa16130$3ee42390$@acm.org> <54F621D0.6070506@umn.edu> <32bd01d057bc$d4102380$7c306a80$@acm.org> Message-ID: <54F9D693.8080400@umn.edu> I ended up writing a shared library that overrides the acl_printacl routine with a maximum id string size of 256 instead of 20 that I can LD_PRELOAD if I need to see longer names. Hacky, but seems to work. On 2015-03-05 9:22 PM, Paul B. Henson wrote: >> From: Nathan Huff >> Sent: Tuesday, March 03, 2015 1:04 PM >> >> -n works for the regular user and group but seems to have no effect on >> ACL entries > > Ah, sorry, I don't recall seeing ACL entries mentioned in your original > post, perhaps I missed it. > > I took a quick look, it appears that ls does not parse/print ACL's itself, > it uses the acl_printacl utility function in libsec. Unfortunately, I don't > see any nontrivial way to modify it to do what you want. > > > -- Nathan Huff System Administrator Academic Health Center Information Systems University of Minnesota 612-626-9136 From richard.elling at richardelling.com Fri Mar 6 16:38:42 2015 From: richard.elling at richardelling.com (Richard Elling) Date: Fri, 6 Mar 2015 08:38:42 -0800 Subject: [OmniOS-discuss] QLE2652 I/O Disconnect. Heat Sinks? In-Reply-To: <4593924c-380f-4705-871f-3105d937c5f7@careyweb.com> References: <4593924c-380f-4705-871f-3105d937c5f7@careyweb.com> Message-ID: > On Mar 5, 2015, at 6:00 AM, Nate Smith wrote: > > I?ve had this problem for a while, and I have no way to diagnose what is going on, but occasionally when system IO gets high (I?ve seen it happen especially on backups), I will lose connectivity with my Fibre Channel cards which serve up fibre channel LUNS to a VM cluster. All hell breaks loose, and then connectivity gets restored. I don?t get an error that it?s dropped, at least not on the Omnios system, but I get notice when it?s restored (which makes no sense). 
I?m wondering if the cards are just overheating, and if heat sinks with a fan would help on the io chip. Is there a PCI bridge in the data path? These can often be found on mezzanine or riser cards. -- richard > > Mar 5 01:55:01 newstorm fct: [ID 132490 kern.notice] NOTICE: qlt2,0 LINK UP, portid 20000, topology Fabric Pt-to-Pt,speed 8G > Mar 5 01:56:26 newstorm fct: [ID 132490 kern.notice] NOTICE: qlt0,0 LINK UP, portid 20100, topology Fabric Pt-to-Pt,speed 8G > Mar 5 02:00:13 newstorm last message repeated 1 time > Mar 5 02:00:15 newstorm fct: [ID 132490 kern.notice] NOTICE: qlt3,0 LINK UP, portid 10000, topology Fabric Pt-to-Pt,speed 8G > Mar 5 02:00:15 newstorm fct: [ID 132490 kern.notice] NOTICE: qlt2,0 LINK UP, portid 20000, topology Fabric Pt-to-Pt,speed 8G > Mar 5 02:00:18 newstorm fct: [ID 132490 kern.notice] NOTICE: qlt1,0 LINK UP, portid 10100, topology Fabric Pt-to-Pt,speed 8G > _______________________________________________ > OmniOS-discuss mailing list > OmniOS-discuss at lists.omniti.com > http://lists.omniti.com/mailman/listinfo/omnios-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From nsmith at careyweb.com Fri Mar 6 16:56:45 2015 From: nsmith at careyweb.com (Nate Smith) Date: Fri, 6 Mar 2015 11:56:45 -0500 Subject: [OmniOS-discuss] QLE2652 I/O Disconnect. Heat Sinks? In-Reply-To: References: <4593924c-380f-4705-871f-3105d937c5f7@careyweb.com> Message-ID: <4d6c8571-a7b9-4c31-bd87-2fe6e611f214@careyweb.com> Yeah, there is on R720s, I think. What about on the Supermicro and HP servers? From: Richard Elling [mailto:richard.elling at richardelling.com] Sent: Friday, March 06, 2015 11:39 AM To: Nate Smith Cc: omnios-discuss at lists.omniti.com Subject: Re: [OmniOS-discuss] QLE2652 I/O Disconnect. Heat Sinks? On Mar 5, 2015, at 6:00 AM, Nate Smith wrote: I?ve had this problem for a while, and I have no way to diagnose what is going on, but occasionally when system IO gets high (I?ve seen it happen especially on backups), I will lose connectivity with my Fibre Channel cards which serve up fibre channel LUNS to a VM cluster. All hell breaks loose, and then connectivity gets restored. I don?t get an error that it?s dropped, at least not on the Omnios system, but I get notice when it?s restored (which makes no sense). I?m wondering if the cards are just overheating, and if heat sinks with a fan would help on the io chip. Is there a PCI bridge in the data path? These can often be found on mezzanine or riser cards. -- richard Mar 5 01:55:01 newstorm fct: [ID 132490 kern.notice] NOTICE: qlt2,0 LINK UP, portid 20000, topology Fabric Pt-to-Pt,speed 8G Mar 5 01:56:26 newstorm fct: [ID 132490 kern.notice] NOTICE: qlt0,0 LINK UP, portid 20100, topology Fabric Pt-to-Pt,speed 8G Mar 5 02:00:13 newstorm last message repeated 1 time Mar 5 02:00:15 newstorm fct: [ID 132490 kern.notice] NOTICE: qlt3,0 LINK UP, portid 10000, topology Fabric Pt-to-Pt,speed 8G Mar 5 02:00:15 newstorm fct: [ID 132490 kern.notice] NOTICE: qlt2,0 LINK UP, portid 20000, topology Fabric Pt-to-Pt,speed 8G Mar 5 02:00:18 newstorm fct: [ID 132490 kern.notice] NOTICE: qlt1,0 LINK UP, portid 10100, topology Fabric Pt-to-Pt,speed 8G _______________________________________________ OmniOS-discuss mailing list OmniOS-discuss at lists.omniti.com http://lists.omniti.com/mailman/listinfo/omnios-discuss -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From asc1111 at gmail.com Fri Mar 6 18:41:26 2015 From: asc1111 at gmail.com (Aaron Curry) Date: Fri, 6 Mar 2015 11:41:26 -0700 Subject: [OmniOS-discuss] CIFS File Lock Problems In-Reply-To: <32b301d057ba$d6168cc0$8243a640$@acm.org> References: <32b301d057ba$d6168cc0$8243a640$@acm.org> Message-ID: Paul, Thank you for the response. I was beginning to think that everyone thought this wasn't even worth commenting on. We have a couple OmniOS servers that have been running the in-kernel CIFS server with Active Directory integration for a while now and haven't had any problems. We've been very happy with it. Of course those don't handle as many users as this new one. I guess the problems only show up under the stress of too many connections. I have considered running Samba as an alternative since I know that's what a lot of people are doing. So I'm curious, what version are you running? 3 or 4? Is there a package I can install or do I need to build it myself? Is there any sort of documentation showing how to get Samba working with AD on OmniOS? I'm not afraid to do the work myself, it would just save a lot of time to follow someone else's work. And, with this file server on the fritz, we don't have a lot of time. Thanks again, Aaron -------------- next part -------------- An HTML attachment was scrubbed... URL: From johan.kragsterman at capvert.se Fri Mar 6 18:57:02 2015 From: johan.kragsterman at capvert.se (Johan Kragsterman) Date: Fri, 6 Mar 2015 19:57:02 +0100 Subject: [OmniOS-discuss] Ang: Re: QLE2652 I/O Disconnect. Heat Sinks? In-Reply-To: References: , <4593924c-380f-4705-871f-3105d937c5f7@careyweb.com> Message-ID: -----"OmniOS-discuss" skrev: ----- Till: Nate Smith Från: Richard Elling Sänt av: "OmniOS-discuss" Datum: 2015-03-06 17:39 Kopia: omnios-discuss at lists.omniti.com Ärende: Re: [OmniOS-discuss] QLE2652 I/O Disconnect. Heat Sinks? On Mar 5, 2015, at 6:00 AM, Nate Smith wrote: I’ve had this problem for a while, and I have no way to diagnose what is going on, but occasionally when system IO gets high (I’ve seen it happen especially on backups), I will lose connectivity with my Fibre Channel cards which serve up fibre channel LUNS to a VM cluster. All hell breaks loose, and then connectivity gets restored. I don’t get an error that it’s dropped, at least not on the Omnios system, but I get notice when it’s restored (which makes no sense). I’m wondering if the cards are just overheating, and if heat sinks with a fan would help on the io chip. Is there a PCI bridge in the data path? These can often be found on mezzanine or riser cards. -- richard I never thought of that as a possible problem before, but of course, it must be a source of possible complications. I never had these problems, though, but interesting for the future! Thanks for that, Richard! Rgrds Johan _______________________________________________ OmniOS-discuss mailing list OmniOS-discuss at lists.omniti.com http://lists.omniti.com/mailman/listinfo/omnios-discuss _______________________________________________ OmniOS-discuss mailing list OmniOS-discuss at lists.omniti.com http://lists.omniti.com/mailman/listinfo/omnios-discuss From sjorge+ml at blackdot.be Fri Mar 6 19:01:54 2015 From: sjorge+ml at blackdot.be (Jorge Schrauwen) Date: Fri, 06 Mar 2015 20:01:54 +0100 Subject: [OmniOS-discuss] CIFS File Lock Problems In-Reply-To: References: <32b301d057ba$d6168cc0$8243a640$@acm.org> Message-ID: <73bd3a54ce6530d92a4f4374f4a0d994@blackdot.be> Hi Aaron, I run the in-kernel CIFS too.
I hit the same problem at home (plex accssing a large file share). I did not comment because I did not have a fix :( Gwr is upsteaming some of the smb bits, if I am not mistaken. There are a lot of goodies in the Nexenta tree and Joyent tree that would rock if they got upstreamed. I played with Samba4 for a bit but ended up back on in-kernel CIFS for ease of use. That was also the original reason i switched to OmniOS :) Goodluck on your quest for a solution Jorge On 2015-03-06 19:41, Aaron Curry wrote: > Paul, > > Thank you for the response. I was beginning to think that everyone > thought this wasn't even worth commenting on. > > We have a couple OmniOS servers that have been running the in-kernel > CIFS server with Active Directory integration for a while now and > haven't had any problems. We've been very happy with it. Of course > those don't handle as many users as this new one. I guess the problems > only show up under the stress of too many connections. > > I have considered running Samba as an alternative since I know that's > what a lot of people are doing. So I'm curious, what version are you > running? 3 or 4? Is there a package I can install or do I need to build > it myself? Is there any sort of documentation showing how to get Samba > working with AD on OmniOS? I'm not afraid to do the work myself, it > would just save a lot of time to follow someone else's work. And, with > this file server on the fritz, we don't have a lot of time. > > Thanks again, > > Aaron > > _______________________________________________ > OmniOS-discuss mailing list > OmniOS-discuss at lists.omniti.com > http://lists.omniti.com/mailman/listinfo/omnios-discuss [1] Links: ------ [1] http://lists.omniti.com/mailman/listinfo/omnios-discuss From geoffn at gnaa.net Fri Mar 6 19:16:27 2015 From: geoffn at gnaa.net (Geoff Nordli) Date: Fri, 06 Mar 2015 11:16:27 -0800 Subject: [OmniOS-discuss] CIFS File Lock Problems In-Reply-To: <32b301d057ba$d6168cc0$8243a640$@acm.org> References: <32b301d057ba$d6168cc0$8243a640$@acm.org> Message-ID: <54F9FD0B.1040601@gnaa.net> On 15-03-05 07:08 PM, Paul B. Henson wrote: >> From: Aaron Curry >> Sent: Tuesday, March 03, 2015 3:45 PM >> >> We have encountered an issue with out OmniOS CIFS file server and file locks. > We are currently actually using samba under omnios rather than the in-kernel CIFS server. One reason is that the in-kernel server does not support our requirement to use an MIT Kerberos realm for NFS, and an active directory domain for CIFS. Another is that samba just supports more current features of CIFS. > > I believe both of these issues are resolved in the nexenta illumos fork, which implements SMB2 and fixes a lot of other stuff. That code has been released and is available for integration into upstream illumos (where it would then come back down into omnios), but unfortunately that would be a lot of work and I don't believe anyone is currently planning on doing it :(. If that ever happens we will reevaluate the in-kernel CIFS server, as I'd rather be using that? > > ___ Paul, when using Samba, can users restore files via the "previous versions" within Windows to see all of the snapshots? Having an easy way for people to restore files is a huge plus. The main problem I have is every once in a while the server will lockup and the only way around it is a hard reset. It would be great if those SMB2 and fixes get upstreamed at some point. 
thanks, Geoff From danmcd at omniti.com Fri Mar 6 19:30:35 2015 From: danmcd at omniti.com (Dan McDonald) Date: Fri, 6 Mar 2015 14:30:35 -0500 Subject: [OmniOS-discuss] CIFS File Lock Problems In-Reply-To: <54F9FD0B.1040601@gnaa.net> References: <32b301d057ba$d6168cc0$8243a640$@acm.org> <54F9FD0B.1040601@gnaa.net> Message-ID: <9897DF76-CDA3-41C6-8839-A75BF9357B52@omniti.com> > On Mar 6, 2015, at 2:16 PM, Geoff Nordli wrote: > It would be great if those SMB2 and fixes get upstreamed at some point. All of the distro makers are busy... well... working on their distros. I certainly am (r151014 with its pkg(5) improvements, including some not yet in bloody, is in its final approach), and I know Joyent & Nexenta are as well. When we find time, we upstream. Sometimes it's easy, and sometimes it's hard. Sometimes it's hard because a distro's architectural decisions aren't the same as other distros, and it takes times to convert a distro's technology into something upstreamable. Joyent coolness sometimes has this problem (e.g. their work in virtual network devices). They're not sabotaging upstreaming, they are solving their problems first. Another reason it's hard can be because a technology arrives in several pieces, and you really have to upstream them a piece at a time for the best fit. I know the SMB2 work from Nexenta is like this. Again, it's not done to screw the community, it's done because they have paying customers who want it, and they know who's writing their paychecks. If you want something upstreamed, volunteer in the community. Volunteer by offering to test, by offering to inspect a distro's source and see its commit history. I have pieces myself that I'd like to upstream --> these will allow the building of stock illumos-gate on OmniOS. I can't upstream them all just yet because they come in pieces, and because I have r151014 coming soon. Sorry if I'm pontificating here, but this all isn't easy. :) Thanks, Dan From henson at acm.org Fri Mar 6 19:41:34 2015 From: henson at acm.org (Paul B. Henson) Date: Fri, 6 Mar 2015 11:41:34 -0800 Subject: [OmniOS-discuss] Long group names in ls acl output In-Reply-To: <54F9D693.8080400@umn.edu> References: <54F5D619.6000902@umn.edu> <2dfa01d055f4$bfa16130$3ee42390$@acm.org> <54F621D0.6070506@umn.edu> <32bd01d057bc$d4102380$7c306a80$@acm.org> <54F9D693.8080400@umn.edu> Message-ID: <330f01d05845$8fc51530$af4f3f90$@acm.org> > From: Nathan Huff > Sent: Friday, March 06, 2015 8:32 AM > > I ended up writing a shared library that overrides the acl_printacl > routine with a maximum id string size of 256 instead of 20 that I can > LD_PRELOAD if I need to see longer names. Hacky, but seems to work. Been there, done that :). Glad you at least found a workaround for now. Tentatively, I would think the cleanest upstream solution would be to add a new flag to acl_printacl to print uid/gid instead of name. I'm not sure how straightforward that would be or how much work it would result in though. From henson at acm.org Fri Mar 6 19:50:01 2015 From: henson at acm.org (Paul B. Henson) Date: Fri, 6 Mar 2015 11:50:01 -0800 Subject: [OmniOS-discuss] CIFS File Lock Problems In-Reply-To: References: <32b301d057ba$d6168cc0$8243a640$@acm.org> Message-ID: <331101d05846$bdea7a80$39bf6f80$@acm.org> > From: Aaron Curry > Sent: Friday, March 06, 2015 10:41 AM > > I have considered running Samba as an alternative since I know that's what a lot > of people are doing. So I'm curious, what version are you running? 3 or 4? 
We are currently running 3.6.25, we haven't made the jump to 4 yet. > there a package I can install or do I need to build it myself? Personally we build it from source via pkgsrc. However, you might be able to use the precompiled pkgsrc binaries from Joyent if you prefer. > Is there any sort of > documentation showing how to get Samba working with AD on OmniOS? It's basically the exact same as getting samba to work on any OS, so pretty much any guide you find on the Internet should be usable. Here is our current config:
[global]
allow trusted domains = no
enable privileges = no
deadtime = 10
debug pid = yes
disable netbios = yes
enable privileges = no
idmap config * : backend = nss
idmap config * : range = 2147483648-2147483648
idmap config WIN : backend = nss
idmap config WIN : range = 1000-2147483647
lanman auth = no
load printers = no
log level = 1
map archive = no
name resolve order = host
realm = WIN.CSUPOMONA.EDU
restrict anonymous = 1
security = ads
server signing = auto
show add printer wizard = no
workgroup = WIN
writable = yes
max log size = 512000
unix extensions = no
vfs objects = shadow_copy2 zfsacl
shadow: snapdir = .zfs/snapshot
shadow: format = backup-%Y.%m.%d-%H.%M.%S
shadow: sort = desc
shadow: localtime = yes
nfs4: mode = special
multicast dns register = no
max protocol = SMB2
wide links = yes
[homes]
browseable = no
path = /export/user/%S
include = /etc/samba/smb-groups.conf
[global]
private dir = /etc/samba/private
From henson at acm.org Fri Mar 6 19:51:05 2015 From: henson at acm.org (Paul B. Henson) Date: Fri, 6 Mar 2015 11:51:05 -0800 Subject: [OmniOS-discuss] CIFS File Lock Problems In-Reply-To: <54F9FD0B.1040601@gnaa.net> References: <32b301d057ba$d6168cc0$8243a640$@acm.org> <54F9FD0B.1040601@gnaa.net> Message-ID: <331901d05846$e406cbb0$ac146310$@acm.org> > From: Geoff Nordli > Sent: Friday, March 06, 2015 11:16 AM > > Paul, when using Samba, can users restore files via the "previous > versions" within Windows to see all of the snapshots? Having an easy way > for people to restore files is a huge plus. Yes, samba works with zfs snapshots and allows them to be presented via the Windows shadow copy interface. From henson at acm.org Fri Mar 6 19:58:02 2015 From: henson at acm.org (Paul B. Henson) Date: Fri, 6 Mar 2015 11:58:02 -0800 Subject: [OmniOS-discuss] CIFS File Lock Problems In-Reply-To: <9897DF76-CDA3-41C6-8839-A75BF9357B52@omniti.com> References: <32b301d057ba$d6168cc0$8243a640$@acm.org> <54F9FD0B.1040601@gnaa.net> <9897DF76-CDA3-41C6-8839-A75BF9357B52@omniti.com> Message-ID: <331c01d05847$dc4005d0$94c01170$@acm.org> > From: Dan McDonald > Sent: Friday, March 06, 2015 11:31 AM > > for the best fit. I know the SMB2 work from Nexenta is like this. Again, it's not > done to screw the community, it's done because they have paying customers > who want it, and they know who's writing their paychecks. I don't think anybody thinks any of the distributions are screwing the community :), they've released their code changes, which is really the only obligation they have. Particularly for changes like the Nexenta SMB2 stuff, they are so complicated and divergent from upstream it's really difficult to get them in, and it's understandable particularly for a commercial company that it's not a high priority. While I certainly might whine and sigh and say stuff like "I wish those SMB2 updates would get upstreamed" I'm not blaming anybody for not doing it :).
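One practical note on the shadow_copy2 settings in the smb.conf above: Windows "Previous Versions" will only list snapshots whose names match the 'shadow: format' string, so whatever creates the snapshots has to use the same naming scheme. A minimal sketch, with a made-up pool/dataset name:

  # snapshot name must match "shadow: format = backup-%Y.%m.%d-%H.%M.%S"
  zfs snapshot tank/export/user@backup-$(date +%Y.%m.%d-%H.%M.%S)
  # samba then finds it under the share's .zfs/snapshot directory
  ls /export/user/.zfs/snapshot/

In practice this would be driven by a cron job or snapshot service rather than run by hand.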
I'm sure I speak for everybody on the list that we really appreciate all of the work you do, particularly the help and support you provide above and beyond the responsibilities of your day job, and in no way expect you to do everything for everybody ;). Thanks! From sjorge+ml at blackdot.be Fri Mar 6 20:08:09 2015 From: sjorge+ml at blackdot.be (Jorge Schrauwen) Date: Fri, 06 Mar 2015 21:08:09 +0100 Subject: [OmniOS-discuss] CIFS File Lock Problems In-Reply-To: <331c01d05847$dc4005d0$94c01170$@acm.org> References: <32b301d057ba$d6168cc0$8243a640$@acm.org> <54F9FD0B.1040601@gnaa.net> <9897DF76-CDA3-41C6-8839-A75BF9357B52@omniti.com> <331c01d05847$dc4005d0$94c01170$@acm.org> Message-ID: <91a70370340fe476e30ee8561c32ddc1@blackdot.be> +1 this post I feel frustration but only at the lack of my own ability to not being smart enough to upstream myself. Testing is all I seem to be able to contribute at the moment. ~ sjorge On 2015-03-06 20:58, Paul B. Henson wrote: >> From: Dan McDonald >> Sent: Friday, March 06, 2015 11:31 AM >> >> for the best fit. I know the SMB2 work from Nexenta is like this. >> Again, > it's not >> done to screw the community, it's done because they have paying >> customers >> who want it, and they know who's writing their paychecks. > > I don't think anybody thinks any of the distributions are screwing the > community :), they've released their code changes, which is really the > only > obligation they have. Particularly for changes like the Nexenta SMB2 > stuff, > they are so complicated and divergent from upstream it's really > difficult > to get them in, and it's understandable particularly for a commercial > company that it's not a high priority. While I certainly might whine > and > sigh and say stuff like "I wish those SMB2 updates would get > upstreamed" I'm > not blaming anybody for not doing it :). > > I'm sure I speak for everybody on the list that we really appreciate > all of > the work you do, particularly the help and support you provide above > and > beyond the responsibilities of your day job, and in no way expect you > to do > everything for everybody ;). Thanks! > > _______________________________________________ > OmniOS-discuss mailing list > OmniOS-discuss at lists.omniti.com > http://lists.omniti.com/mailman/listinfo/omnios-discuss From geoffn at gnaa.net Fri Mar 6 20:13:43 2015 From: geoffn at gnaa.net (Geoff Nordli) Date: Fri, 06 Mar 2015 12:13:43 -0800 Subject: [OmniOS-discuss] CIFS File Lock Problems In-Reply-To: <9897DF76-CDA3-41C6-8839-A75BF9357B52@omniti.com> References: <32b301d057ba$d6168cc0$8243a640$@acm.org> <54F9FD0B.1040601@gnaa.net> <9897DF76-CDA3-41C6-8839-A75BF9357B52@omniti.com> Message-ID: <54FA0A77.7050503@gnaa.net> On 15-03-06 11:30 AM, Dan McDonald wrote: >> On Mar 6, 2015, at 2:16 PM, Geoff Nordli wrote: >> It would be great if those SMB2 and fixes get upstreamed at some point. > All of the distro makers are busy... well... working on their distros. I certainly am (r151014 with its pkg(5) improvements, including some not yet in bloody, is in its final approach), and I know Joyent & Nexenta are as well. > > When we find time, we upstream. Sometimes it's easy, and sometimes it's hard. Sometimes it's hard because a distro's architectural decisions aren't the same as other distros, and it takes times to convert a distro's technology into something upstreamable. Joyent coolness sometimes has this problem (e.g. their work in virtual network devices). They're not sabotaging upstreaming, they are solving their problems first. 
Another reason it's hard can be because a technology arrives in several pieces, and you really have to upstream them a piece at a time for the best fit. I know the SMB2 work from Nexenta is like this. Again, it's not done to screw the community, it's done because they have paying customers who want it, and they know who's writing their paychecks. > > If you want something upstreamed, volunteer in the community. Volunteer by offering to test, by offering to inspect a distro's source and see its commit history. I have pieces myself that I'd like to upstream --> these will allow the building of stock illumos-gate on OmniOS. I can't upstream them all just yet because they come in pieces, and because I have r151014 coming soon. > > Sorry if I'm pontificating here, but this all isn't easy. :) > > Thanks, > Dan > Dan, it definitely isn't easy. I know the rules: If you aren't able to do it and if you aren't a paying customer then you have no right to complain/choose what people work on. People in the community work on what interests them or work on what the company which pays their bills ask them to work on. I know the work (SMB2 and other fixes) Nexenta has done has a lot of moving pieces therefore it isn't very easy to upstream. I have been following the discussion since they announced the opening of that code. I thoroughly appreciate all the work everyone does around illumos and all the distributions. I have been in the community for five years and I need to be contributing more than I currently do. Happy Friday!! Geoff From geoffn at gnaa.net Fri Mar 6 21:18:03 2015 From: geoffn at gnaa.net (Geoff Nordli) Date: Fri, 06 Mar 2015 13:18:03 -0800 Subject: [OmniOS-discuss] CIFS File Lock Problems In-Reply-To: <331901d05846$e406cbb0$ac146310$@acm.org> References: <32b301d057ba$d6168cc0$8243a640$@acm.org> <54F9FD0B.1040601@gnaa.net> <331901d05846$e406cbb0$ac146310$@acm.org> Message-ID: <54FA198B.7090300@gnaa.net> On 15-03-06 11:51 AM, Paul B. Henson wrote: >> From: Geoff Nordli >> Sent: Friday, March 06, 2015 11:16 AM >> >> Paul, when using Samba, can users restore files via the "previous >> versions" within Windows to see all of the snapshots? Having an easy way >> for people to restore files is a huge plus. > Yes, samba works with zfs snapshots and allows them to be presented via the Windows shadow copy interface. > Thanks Paul. Did you follow an install guide to get Samba running on Omnios? Where did you get the package from? I don't see anything in the core or "extras" repo. I see a 3.6.x in the pkgsrc repo. Geoff From geoffn at gnaa.net Fri Mar 6 22:03:55 2015 From: geoffn at gnaa.net (Geoff Nordli) Date: Fri, 06 Mar 2015 14:03:55 -0800 Subject: [OmniOS-discuss] CIFS File Lock Problems In-Reply-To: <54FA198B.7090300@gnaa.net> References: <32b301d057ba$d6168cc0$8243a640$@acm.org> <54F9FD0B.1040601@gnaa.net> <331901d05846$e406cbb0$ac146310$@acm.org> <54FA198B.7090300@gnaa.net> Message-ID: <54FA244B.1010600@gnaa.net> On 15-03-06 01:18 PM, Geoff Nordli wrote: > On 15-03-06 11:51 AM, Paul B. Henson wrote: >>> From: Geoff Nordli >>> Sent: Friday, March 06, 2015 11:16 AM >>> >>> Paul, when using Samba, can users restore files via the "previous >>> versions" within Windows to see all of the snapshots? Having an easy >>> way >>> for people to restore files is a huge plus. >> Yes, samba works with zfs snapshots and allows them to be presented >> via the Windows shadow copy interface. >> > > Thanks Paul. > > Did you follow an install guide to get Samba running on Omnios? 
> > Where did you get the package from? I don't see anything in the core > or "extras" repo. I see a 3.6.x in the pkgsrc repo. > > Geoff > > Forget it, I see that you already outlined it in a response to Aaron. From rt at steait.net Sat Mar 7 03:04:44 2015 From: rt at steait.net (Rune Tipsmark) Date: Sat, 7 Mar 2015 03:04:44 +0000 Subject: [OmniOS-discuss] QLE2652 I/O Disconnect. Heat Sinks? In-Reply-To: <4d6c8571-a7b9-4c31-bd87-2fe6e611f214@careyweb.com> References: <4593924c-380f-4705-871f-3105d937c5f7@careyweb.com> <4d6c8571-a7b9-4c31-bd87-2fe6e611f214@careyweb.com> Message-ID: No idea to be honest, even if there is its scary if it can cause these kinds of problems? Br, Rune From: OmniOS-discuss [mailto:omnios-discuss-bounces at lists.omniti.com] On Behalf Of Nate Smith Sent: Friday, March 06, 2015 8:57 AM To: 'Richard Elling' Cc: omnios-discuss at lists.omniti.com Subject: Re: [OmniOS-discuss] QLE2652 I/O Disconnect. Heat Sinks? Yeah, there is on R720s, I think. What about on the Supermicro and HP servers? From: Richard Elling [mailto:richard.elling at richardelling.com] Sent: Friday, March 06, 2015 11:39 AM To: Nate Smith Cc: omnios-discuss at lists.omniti.com Subject: Re: [OmniOS-discuss] QLE2652 I/O Disconnect. Heat Sinks? On Mar 5, 2015, at 6:00 AM, Nate Smith > wrote: I?ve had this problem for a while, and I have no way to diagnose what is going on, but occasionally when system IO gets high (I?ve seen it happen especially on backups), I will lose connectivity with my Fibre Channel cards which serve up fibre channel LUNS to a VM cluster. All hell breaks loose, and then connectivity gets restored. I don?t get an error that it?s dropped, at least not on the Omnios system, but I get notice when it?s restored (which makes no sense). I?m wondering if the cards are just overheating, and if heat sinks with a fan would help on the io chip. Is there a PCI bridge in the data path? These can often be found on mezzanine or riser cards. -- richard Mar 5 01:55:01 newstorm fct: [ID 132490 kern.notice] NOTICE: qlt2,0 LINK UP, portid 20000, topology Fabric Pt-to-Pt,speed 8G Mar 5 01:56:26 newstorm fct: [ID 132490 kern.notice] NOTICE: qlt0,0 LINK UP, portid 20100, topology Fabric Pt-to-Pt,speed 8G Mar 5 02:00:13 newstorm last message repeated 1 time Mar 5 02:00:15 newstorm fct: [ID 132490 kern.notice] NOTICE: qlt3,0 LINK UP, portid 10000, topology Fabric Pt-to-Pt,speed 8G Mar 5 02:00:15 newstorm fct: [ID 132490 kern.notice] NOTICE: qlt2,0 LINK UP, portid 20000, topology Fabric Pt-to-Pt,speed 8G Mar 5 02:00:18 newstorm fct: [ID 132490 kern.notice] NOTICE: qlt1,0 LINK UP, portid 10100, topology Fabric Pt-to-Pt,speed 8G _______________________________________________ OmniOS-discuss mailing list OmniOS-discuss at lists.omniti.com http://lists.omniti.com/mailman/listinfo/omnios-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From omnios at citrus-it.net Sat Mar 7 11:17:03 2015 From: omnios at citrus-it.net (Andy) Date: Sat, 7 Mar 2015 11:17:03 +0000 (GMT) Subject: [OmniOS-discuss] Bloody // mailwrapper & mta mediator Message-ID: Hi, I'm doing some testing with bloody, mainly to look at the new linked ipkg zones. Got distracted for now with the addition of mailwrapper and the new mta mediator which have come from Illumos 5166. 
Mailwrapper is completely unnecessary on an IPS distribution as it effectively duplicates the function of mediators but I can see from discussion on illumos-discuss that it was added to support non-IPS distributions and I can see the logic in that at the Illumos level. However, it's giving me a bit of an upgrade headache which I'm working through and looking for ideas/help. At Citrus, we're in the business of running mail relays and our MTA of choice is Sendmail. Being involved in the Sendmail community, we're actually running a beta of sendmail 8.15.2 as we need some of the TLS and IPv6 enhancements it provides. Whilst I embrace OmniOS' KYSTY principle (and it's one of the reasons we chose OmniOS in the first place), Sendmail is one of the packages that we deliver using standard paths and service names. That is, configuration files under /etc/mail and a service that can be managed as just 'sendmail'. To date (we're running r151012 in production), OmniOS doesn't install an MTA by default but, with the integration of 5166, sendmail becomes a dependency of mailwrapper and mailwrapper is required by SUNWcs.
Problem 1 - That immediately causes a conflict on upgrade as we already have a package which delivers /usr/lib/sendmail etc. That's easily fixed by making these mediated links and that allows us to switch our sendmail in to replace the default package.
aomni# (162) pkg mediator -a
MEDIATOR VER. SRC. VERSION IMPL. SRC. IMPLEMENTATION
mta      site               site       citrus-sendmail
mta      system             system     mailwrapper
mta      system             system     sendmail
Problem 2 - /etc/mail is populated by the OmniOS sendmail package. I could work around that by rebuilding our sendmail package to use /etc/opt/citrus-mail or something similar for its configuration. I don't really want to do that as the change will have an impact on backend systems and it will definitely confuse the people who look after the systems for a while.
Problem 3 - sendmail drops in /etc/init.d/sendmail - a script which enables the standard sendmail service. One of our support staff is bound to type '/etc/init.d/sendmail start' at some point even though on our current systems that command doesn't exist. That script is flagged as preserve=true in the manifest so I suppose I could just add an exit 0 near the top!
Problem 4 - IPS doesn't allow for mediated services so we'll always have svc:/network/smtp:sendmail. Again, we can rebuild ours to use svc:/network/smtp:citrus-sendmail or similar, but everyone is used to managing sendmail as just 'sendmail'.
This is a long way of me asking if mailwrapper could be removed from OmniOS as it isn't required for an IPS distribution. That would remove the requirement to have the standard sendmail package installed at all - just like <=r151012. It would mean that 'mailx' doesn't work but that should be expected if you haven't installed an MTA and is presumably the current behaviour.
Can anyone see any other practical and supportable options that would allow us to replace the sendmail package wholesale?
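For anyone hitting the same conflict, the mediated-link fix described under problem 1 roughly corresponds to delivering the site package's symlinks as mediated link actions. This is a sketch only; the package name and /opt paths are invented for illustration, not taken from the actual Citrus package:

  # excerpt from a hypothetical citrus-sendmail manifest (.p5m)
  link path=usr/lib/sendmail target=../../opt/citrus/sendmail/sbin/sendmail mediator=mta mediator-implementation=citrus-sendmail mediator-priority=site
  link path=usr/sbin/sendmail target=../../opt/citrus/sendmail/sbin/sendmail mediator=mta mediator-implementation=citrus-sendmail mediator-priority=site
  link path=usr/bin/mailq target=../../opt/citrus/sendmail/sbin/mailq mediator=mta mediator-implementation=citrus-sendmail mediator-priority=site

Because the links carry mediator-priority=site, they are preferred over the vendor-supplied implementations by default, which matches the 'pkg mediator -a' output above where the site implementation is the one selected.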
Thanks in advance, Andy -- Citrus IT Limited | +44 (0)870 199 8000 | enquiries at citrus-it.co.uk Rock House Farm | Green Moor | Wortley | Sheffield | S35 7DQ Registered in England and Wales | Company number 4899123 From mir at miras.org Sat Mar 7 11:54:43 2015 From: mir at miras.org (Michael Rasmussen) Date: Sat, 7 Mar 2015 12:54:43 +0100 Subject: [OmniOS-discuss] Bloody // mailwrapper & mta mediator In-Reply-To: References: Message-ID: <20150307125443.0240edbb@sleipner.datanom.net> On Sat, 7 Mar 2015 11:17:03 +0000 (GMT) Andy wrote: > > Can anyone see any other practical and supportable options that would > allow us to replace the sendmail package wholesale? > Could something be borrowed from Debian where the mta package simply is creating a link to the actual mta and the sysadms change the pointer by running update-alternatives? -- Hilsen/Regards Michael Rasmussen Get my public GnuPG keys: michael rasmussen cc http://pgp.mit.edu:11371/pks/lookup?op=get&search=0xD3C9A00E mir datanom net http://pgp.mit.edu:11371/pks/lookup?op=get&search=0xE501F51C mir miras org http://pgp.mit.edu:11371/pks/lookup?op=get&search=0xE3E80917 -------------------------------------------------------------- /usr/games/fortune -es says: So you're back... about time... -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 181 bytes Desc: OpenPGP digital signature URL: From omnios at citrus-it.net Sat Mar 7 12:42:48 2015 From: omnios at citrus-it.net (Andy) Date: Sat, 7 Mar 2015 12:42:48 +0000 (GMT) Subject: [OmniOS-discuss] Bloody // mailwrapper & mta mediator In-Reply-To: <20150307125443.0240edbb@sleipner.datanom.net> References: <20150307125443.0240edbb@sleipner.datanom.net> Message-ID: On Sat, 7 Mar 2015, Michael Rasmussen wrote: ; On Sat, 7 Mar 2015 11:17:03 +0000 (GMT) ; Andy wrote: ; ; > ; > Can anyone see any other practical and supportable options that would ; > allow us to replace the sendmail package wholesale? ; > ; Could something be borrowed from Debian where the mta package simply is ; creating a link to the actual mta and the sysadms change the pointer by ; running update-alternatives? That's precisely what IPS pkg mediators do: aomni# (168) ls -l /usr/lib/sendmail lrwxrwxrwx 1 root root 32 Mar 4 13:49 /usr/lib/sendmail -> ../../opt/sendmail/sbin/sendmail* aomni# (172) pkg set-mediator -I mailwrapper mta aomni# (173) ls -l /usr/lib/sendmail lrwxrwxrwx 1 root root 11 Mar 4 15:21 /usr/lib/sendmail -> mailwrapper* I've no problem with that as a solution for the MTA binaries aomni# (178) pkg contents -a mediator=mta -o action.raw service/network/smtp/sendmail ACTION.RAW link mediator=mta mediator-implementation=sendmail path=usr/bin/mailq target=../lib/smtp/sendmail/mailq link mediator=mta mediator-implementation=sendmail path=usr/sbin/sendmail target=../lib/smtp/sendmail/sendmail link mediator=mta mediator-implementation=sendmail path=usr/sbin/newaliases target=../lib/smtp/sendmail/newaliases link mediator=mta mediator-implementation=sendmail path=etc/aliases target=./mail/aliases link mediator=mta mediator-implementation=sendmail path=usr/lib/sendmail target=../lib/smtp/sendmail/sendmail but it doesn't help me with /etc/mail, the SMF service or the /etc/init.d script that value=pkg://omnios/service/network/smtp/sendmail at 8.14.4,5.11-0.151013 installs. A. 
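For the pieces the mediator does not cover, one hedged option is to enumerate exactly which unmediated files the stock package drops under /etc/mail, and to keep smtp:sendmail disabled declaratively with an SMF site profile so it stays off across upgrades. A sketch; the profile filename is arbitrary:

# What does the stock package deliver outside the mta mediator?
pkg contents -o action.name,path -t file,dir service/network/smtp/sendmail | grep etc/mail

# Site profile that keeps the delivered instance disabled
cat > /etc/svc/profile/site-no-sendmail.xml <<'EOF'
<?xml version="1.0"?>
<!DOCTYPE service_bundle SYSTEM "/usr/share/lib/xml/dtd/service_bundle.dtd.1">
<service_bundle type="profile" name="site-no-sendmail">
  <service name="network/smtp" version="1" type="service">
    <instance name="sendmail" enabled="false"/>
  </service>
</service_bundle>
EOF
svccfg apply /etc/svc/profile/site-no-sendmail.xml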
-- Citrus IT Limited | +44 (0)870 199 8000 | enquiries at citrus-it.co.uk Rock House Farm | Green Moor | Wortley | Sheffield | S35 7DQ Registered in England and Wales | Company number 4899123 From johan.kragsterman at capvert.se Sat Mar 7 15:24:36 2015 From: johan.kragsterman at capvert.se (Johan Kragsterman) Date: Sat, 7 Mar 2015 16:24:36 +0100 Subject: [OmniOS-discuss] Ang: Re: QLE2652 I/O Disconnect. Heat Sinks? In-Reply-To: References: , <4593924c-380f-4705-871f-3105d937c5f7@careyweb.com> <4d6c8571-a7b9-4c31-bd87-2fe6e611f214@careyweb.com> Message-ID: -----"OmniOS-discuss" skrev: ----- Till: "'Nate Smith'" , "'Richard Elling'" Fr?n: Rune Tipsmark S?nt av: "OmniOS-discuss" Datum: 2015-03-07 04:06 Kopia: "omnios-discuss at lists.omniti.com" ?rende: Re: [OmniOS-discuss] QLE2652 I/O Disconnect. Heat Sinks? No idea to be honest, even if there is its scary if it can cause these kinds of problems… Br, Rune You don't know wether these systems got risers or not? That can't be difficult to find out: Are the HBA's located directly in PCIe slots on the system board, or are they instead located in riser boards that sits in the PCIe slots? It would be very interesting to find out.... If Richards theory is correct, you got HBA's sitting in risers on the Supermicro, but on the HP you got the HBA's directly in the PCIe slots on the system board. Rgrds Johan ? From: OmniOS-discuss [mailto:omnios-discuss-bounces at lists.omniti.com] On Behalf Of Nate Smith Sent: Friday, March 06, 2015 8:57 AM To: 'Richard Elling' Cc: omnios-discuss at lists.omniti.com Subject: Re: [OmniOS-discuss] QLE2652 I/O Disconnect. Heat Sinks? ? Yeah, there is on R720s, I think. What about on the Supermicro and HP servers? ? From: Richard Elling [mailto:richard.elling at richardelling.com] Sent: Friday, March 06, 2015 11:39 AM To: Nate Smith Cc: omnios-discuss at lists.omniti.com Subject: Re: [OmniOS-discuss] QLE2652 I/O Disconnect. Heat Sinks? ? ? On Mar 5, 2015, at 6:00 AM, Nate Smith wrote: ? I’ve had this problem for a while, and I have no way to diagnose what is going on, but occasionally when system IO gets high (I’ve seen it happen especially on backups), I will lose connectivity with my Fibre Channel cards which serve up fibre channel LUNS to a VM cluster. All hell breaks loose, and then connectivity gets restored. I don’t get an error that it’s dropped, at least not on the Omnios system, but I get notice when it’s restored (which makes no sense). I’m wondering if the cards are just overheating, and if heat sinks with a fan would help on the io chip. ? Is there a PCI bridge in the data path? These can often be found on mezzanine or riser cards. ?-- richard ? ? Mar? 5 01:55:01 newstorm fct: [ID 132490 kern.notice] NOTICE: qlt2,0 LINK UP, portid 20000, topology Fabric Pt-to-Pt,speed 8G Mar? 5 01:56:26 newstorm fct: [ID 132490 kern.notice] NOTICE: qlt0,0 LINK UP, portid 20100, topology Fabric Pt-to-Pt,speed 8G Mar? 5 02:00:13 newstorm last message repeated 1 time Mar? 5 02:00:15 newstorm fct: [ID 132490 kern.notice] NOTICE: qlt3,0 LINK UP, portid 10000, topology Fabric Pt-to-Pt,speed 8G Mar? 5 02:00:15 newstorm fct: [ID 132490 kern.notice] NOTICE: qlt2,0 LINK UP, portid 20000, topology Fabric Pt-to-Pt,speed 8G Mar? 5 02:00:18 newstorm fct: [ID 132490 kern.notice] NOTICE: qlt1,0 LINK UP, portid 10100, topology Fabric Pt-to-Pt,speed 8G _______________________________________________ OmniOS-discuss mailing list OmniOS-discuss at lists.omniti.com http://lists.omniti.com/mailman/listinfo/omnios-discuss ? 
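Richard's question about a PCI bridge in the data path can be answered from the device tree: on illumos, intermediate nodes bound to the pcieb driver between the root complex and the qlt instances are PCIe bridges/switches, which is typically what a riser or mezzanine adds. A rough sketch (output layout varies by platform and SMBIOS support):

# Driver bindings for the whole device tree; look at the nesting above the qlt nodes
prtconf -D | less

# Slot/riser inventory, where the platform exposes it
prtdiag -v | grep -i slot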
_______________________________________________ OmniOS-discuss mailing list OmniOS-discuss at lists.omniti.com http://lists.omniti.com/mailman/listinfo/omnios-discuss From rt at steait.net Sat Mar 7 16:14:07 2015 From: rt at steait.net (Rune Tipsmark) Date: Sat, 7 Mar 2015 16:14:07 +0000 Subject: [OmniOS-discuss] QLE2652 I/O Disconnect. Heat Sinks? In-Reply-To: References: , <4593924c-380f-4705-871f-3105d937c5f7@careyweb.com> <4d6c8571-a7b9-4c31-bd87-2fe6e611f214@careyweb.com>, Message-ID: <1425744844314.65927@steait.net> ok, so HP has a riser and the FC cards are sitting in the riser. SM has no riser and all cards are inserted directly onto motherboard. Also I just remembered... when I had the Infiniband ConnectX2 installed in the SM it would not reboot and I always had to reset it via IPMI. The more I think about it the more I lean towards SM having an issue... and Dell uses essentially SM so same same. br, Rune ________________________________________ From: Johan Kragsterman Sent: Saturday, March 7, 2015 4:24 PM To: Rune Tipsmark Cc: 'Nate Smith'; 'Richard Elling'; omnios-discuss at lists.omniti.com Subject: Ang: Re: [OmniOS-discuss] QLE2652 I/O Disconnect. Heat Sinks? -----"OmniOS-discuss" skrev: ----- Till: "'Nate Smith'" , "'Richard Elling'" Fr?n: Rune Tipsmark S?nt av: "OmniOS-discuss" Datum: 2015-03-07 04:06 Kopia: "omnios-discuss at lists.omniti.com" ?rende: Re: [OmniOS-discuss] QLE2652 I/O Disconnect. Heat Sinks? No idea to be honest, even if there is its scary if it can cause these kinds of problems… Br, Rune You don't know wether these systems got risers or not? That can't be difficult to find out: Are the HBA's located directly in PCIe slots on the system board, or are they instead located in riser boards that sits in the PCIe slots? It would be very interesting to find out.... If Richards theory is correct, you got HBA's sitting in risers on the Supermicro, but on the HP you got the HBA's directly in the PCIe slots on the system board. Rgrds Johan From: OmniOS-discuss [mailto:omnios-discuss-bounces at lists.omniti.com] On Behalf Of Nate Smith Sent: Friday, March 06, 2015 8:57 AM To: 'Richard Elling' Cc: omnios-discuss at lists.omniti.com Subject: Re: [OmniOS-discuss] QLE2652 I/O Disconnect. Heat Sinks? Yeah, there is on R720s, I think. What about on the Supermicro and HP servers? From: Richard Elling [mailto:richard.elling at richardelling.com] Sent: Friday, March 06, 2015 11:39 AM To: Nate Smith Cc: omnios-discuss at lists.omniti.com Subject: Re: [OmniOS-discuss] QLE2652 I/O Disconnect. Heat Sinks? On Mar 5, 2015, at 6:00 AM, Nate Smith wrote: I’ve had this problem for a while, and I have no way to diagnose what is going on, but occasionally when system IO gets high (I’ve seen it happen especially on backups), I will lose connectivity with my Fibre Channel cards which serve up fibre channel LUNS to a VM cluster. All hell breaks loose, and then connectivity gets restored. I don’t get an error that it’s dropped, at least not on the Omnios system, but I get notice when it’s restored (which makes no sense). I’m wondering if the cards are just overheating, and if heat sinks with a fan would help on the io chip. Is there a PCI bridge in the data path? These can often be found on mezzanine or riser cards. 
-- richard

Mar 5 01:55:01 newstorm fct: [ID 132490 kern.notice] NOTICE: qlt2,0 LINK UP, portid 20000, topology Fabric Pt-to-Pt,speed 8G
Mar 5 01:56:26 newstorm fct: [ID 132490 kern.notice] NOTICE: qlt0,0 LINK UP, portid 20100, topology Fabric Pt-to-Pt,speed 8G
Mar 5 02:00:13 newstorm last message repeated 1 time
Mar 5 02:00:15 newstorm fct: [ID 132490 kern.notice] NOTICE: qlt3,0 LINK UP, portid 10000, topology Fabric Pt-to-Pt,speed 8G
Mar 5 02:00:15 newstorm fct: [ID 132490 kern.notice] NOTICE: qlt2,0 LINK UP, portid 20000, topology Fabric Pt-to-Pt,speed 8G
Mar 5 02:00:18 newstorm fct: [ID 132490 kern.notice] NOTICE: qlt1,0 LINK UP, portid 10100, topology Fabric Pt-to-Pt,speed 8G

_______________________________________________
OmniOS-discuss mailing list
OmniOS-discuss at lists.omniti.com
http://lists.omniti.com/mailman/listinfo/omnios-discuss

From johan.kragsterman at capvert.se Sat Mar 7 16:39:37 2015
From: johan.kragsterman at capvert.se (Johan Kragsterman)
Date: Sat, 7 Mar 2015 17:39:37 +0100
Subject: [OmniOS-discuss] Ang: RE: Re: QLE2652 I/O Disconnect. Heat Sinks?
In-Reply-To: <1425744844314.65927@steait.net>
References: <1425744844314.65927@steait.net>, , <4593924c-380f-4705-871f-3105d937c5f7@careyweb.com> <4d6c8571-a7b9-4c31-bd87-2fe6e611f214@careyweb.com>,
Message-ID: 

An HTML attachment was scrubbed...
URL: 

From brogyi at gmail.com Sat Mar 7 20:56:29 2015
From: brogyi at gmail.com (Brogyányi József)
Date: Sat, 07 Mar 2015 21:56:29 +0100
Subject: [OmniOS-discuss] lsi sas 9211-8i it efi firmware 20.00.02.00
In-Reply-To: <54FA244B.1010600@gnaa.net>
References: <32b301d057ba$d6168cc0$8243a640$@acm.org> <54F9FD0B.1040601@gnaa.net> <331901d05846$e406cbb0$ac146310$@acm.org> <54FA198B.7090300@gnaa.net> <54FA244B.1010600@gnaa.net>
Message-ID: <54FB65FD.6040600@gmail.com>

Has anyone tested this firmware? Is it free from this error message "Parity Error on path"? Thanks for any information.

BR
Brogyi

From wverb73 at gmail.com Sun Mar 8 22:58:01 2015
From: wverb73 at gmail.com (W Verb)
Date: Sun, 8 Mar 2015 15:58:01 -0700
Subject: [OmniOS-discuss] [developer] Re: The ixgbe driver, Lindsay Lohan, and the Greek economy
In-Reply-To: <7e4156d48aba46239fb4d490577382cd@NASANEXM01F.na.qualcomm.com>
References: <20150221032559.727D07A0792@apps0.cs.toronto.edu> <54EAEFA8.4020101@osn.de> <54EB5392.6030900@osn.de> <279C084B-E76B-4F38-A7CF-D6CB37D06D8F@damore.org> <54F44602.5030705@osn.de> <064C3FC8-43E0-4757-AADA-831303782A4C@damore.org> <7e4156d48aba46239fb4d490577382cd@NASANEXM01F.na.qualcomm.com>
Message-ID: 

Hello,

I was able to perform my last round of testing last night. The tests were done with a single host, while enabling one or two 1G ports.

                    1 port (Read)   2 ports (Read)
Baseline:           130MB/s         30MB/s
Disable LRO:        90MB/s          27MB/s
Disable LRO/LSO:    88MB/s          27MB/s

LRO/LSO enabled, TCP window size varied (default iscsid maximum of 256k):

64k Window          96MB/s          28MB/s
32k Window          72MB/s          22MB/s
16k Window          61MB/s          17MB/s

I then set everything back to the default and captured exactly what happens when I start a single port transfer then enable a second port in the middle. It's pretty illustrative. The server chokes a bit, then strangely sends TDS protocol packets saying "exception occurred". I didn't know TDS had anything to do with iSCSI.
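One note on the "TDS" oddity: that is almost certainly Wireshark's heuristic dissector mis-classifying the stalled iSCSI stream rather than actual TDS traffic. Forcing the decoder makes the captures linked just below easier to line up; a sketch assuming the default iSCSI port 3260 and placeholder capture filenames:

# Decode everything on 3260 as iSCSI instead of whatever the heuristics guess,
# and dump a per-PDU timeline for each interface's capture.
tshark -r port1.pcap -d tcp.port==3260,iscsi \
       -T fields -e frame.time_relative -e ip.src -e iscsi.opcode -e tcp.analysis.retransmission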
Captures from both interfaces here: https://drive.google.com/open?id=0BwyUMjibonYQMG8zZnNWbk40Ymc&authuser=0 So it seems that window size isn't the limiting factor here. I am in the middle of implementing infiniband now. I can highly recommend the Silverstorm (QLogic) 9024CU 20G 4096MTU (with latest firmware) switch for the lab. The fans run very quietly at normal temps, and they are very inexpensive ($250 on eBay). It supports ethernet out-of-band management, as well as a subnet manager web app hosted from the switch itself. Creating the serial console cable was mildly irritating. The latest firmware can be retrieved from the QLogic site via a Google search, you won't find a link on their support frontpage. I'll report back once I have iSER / SRP results. -Warren V On Wed, Mar 4, 2015 at 9:14 AM, Mallory, Rob wrote: > Hi Warren, > > [ ?no objections here if you want to take this thread off-line to a > smaller group? I wanted to post this to the larger groups for benefit of > others, > > And maybe if you find success in the end you can post back to the larger > groups with a summary ] > > > > Your recent success case going to 10GbE end to end seem to back up my > theory of overload. > > I noticed yesterday a couple things: in the packet capture of the > sender, it is apparent that large send offload LSO is being used (notice > the size up to 64k packets). Among other things, this makes it a bit > harder to tune and understand what is happening from packet captures on the > hosts. It also lets the NIC ?tightly pack? the outgoing packet stream > without much gap between the MTU sized packets. You need to get a 3rd-party > snoop on the wire to see what the wire sees. Same thing on the ESXi > server. I suspect it is using LRO or RSC, also ganging up the packets, and > making it difficult to diagnose from a tcpdump on a VM. > > > > I still stand by my original inclination. And the data you have shown > also seems to back this. (smaller MTU = less drops/pauses and then the > latest: equal size pipes on send and receive make it work fine.) I was > recommending that you either/both decrease the rwin on the client side > (limiting it to an absurdly small, but not for this case, 17KB) or on the > server. > > > > It makes more sense to control the window-size on the server, (to tune > just for these small-bandwidth clients) and use a host-route with ?sendpipe > (or is it ?ssthresh) because then the server will limit (and hopefully set > some timers on the LSO part of the ixgbe) the amount of in-flight so-as the > last-switch hop does not drop those important initial packets on > tcp-slowstart. > > > > So my original recommendation still stands: limit the rwin on the client > to 17K, if you want to continue to use a 1GbE interface on it. > > (your BDP @ 150us * 1GbE is about 24k, I?d pick a max receive window size > smaller than this to be more conservative in the case of two interfaces) > > (and yes, only 2 x 9K jumbo frames fit in that BDP, so those 3k jumbos > had less loss for a good reason, 1500MTU is probably best in your case) > > And two other things to help identify/understand the situation (in the > mode you had it before with the quad-1GbE in the client) > > You can turn off LSO (in ixbge.conf) and also on the ESX, and the client > VM side you can turn off LRO or RSC. 
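A sketch of the knobs Rob refers to here; the lso_enable property name is the one used by the illumos ixgbe driver's ixgbe.conf, and the ESXi option names are from memory, so both are worth verifying before relying on them:

# OmniOS target: turn off large send offload in the driver config
# (append to /kernel/drv/ixgbe.conf, then reboot or re-attach the driver)
echo 'lso_enable = 0;' >> /kernel/drv/ixgbe.conf

# ESXi host: software TSO/LRO toggles
# (check the exact names with 'esxcli system settings advanced list')
esxcli system settings advanced set -o /Net/UseHwTSO -i 0
esxcli system settings advanced set -o /Net/TcpipDefLROEnabled -i 0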
> > > > Note: there was (maybe still is) a long-standing bug in the S10 and S11 > TCP stack which I know of only second-hand from a reliable source, > > It has very similar conditions that you describe here, including high > bandwidth servers, low bandwidth clients, multiple hops. > > The workaround is to disable RSC on the linux clients. > > > > good to hear that you can configure the system end-to-end 10GbE. That?s > the obvious best case if you stick eith ethernet, and you don?t have to go > to extremes like above. Note that the lossless fabric of Infiniband will > completely hide these effects of TCP loss. Also, you can use much more > efficient transport such as SRP (RDMA) which I think would fit really well > if you can afford the additional complexity. (I?ve done this, and it?s > really not that hard on small scale). > > > > Cheers, Rob > > > > > > *From:* W Verb [mailto:wverb73 at gmail.com] > *Sent:* Tuesday, March 03, 2015 9:22 PM > *To:* Mallory, Rob; illumos-dev > *Cc:* Garrett D'Amore; Joerg Goltermann; omnios-discuss at lists.omniti.com > > *Subject:* Re: [developer] Re: [OmniOS-discuss] The ixgbe driver, Lindsay > Lohan, and the Greek economy > > > > Hello all, > > This is probably the last message in this thread. > > I pulled the quad-gig NIC out of one of my hosts, and installed an X520. I > then set a single 10G port on the server to be on the same VLAN as the > host, and defined a vswitch, vmknic, etc on the host. > > I set the MTU to be 9000 on both sides, then ran my tests. > > Read: 130 MB/s. > > Write: 156 MB/s. > > Additionally, at higher MTUs, the NIC would periodically lock up until I > performed an "ipadm disable-if -t ixgbe0" and re-enabled it. I tried your > updated driver, Jeorg, but unfortunately it failed quite often. > > > > I then disabled stmf, enabled NFS (v3 only) on the server, and shared a > dataset on the zpool with "share -f nfs /ppool/testy". > I then mounted the server dataset on the host via NFS, and copied my test > VM from the iSCSI zvol to the NFS dataset. I also removed the binding of > the 10G port on the host from the sw iscsi interface. > > Running the same tests on the VM over NFSv3 yielded: > > Read: 650MB/s > > Write: 306MB/s > > This is getting within 10% of the throughput I consistently get on dd > operations local on the server, so I'm pretty happy that I'm getting as > good as I'm going to get until I add more drives. Additionally, I haven't > experienced any NIC hangs. > > > > I tried varying the settings in ixgbe.conf, the MTU, and disabling LSO on > the host and server, but nothing really made that much of a difference > (except reducing the MTU made things about 20-30% slower). > > mpstat during both NFS and iSCSI transfers showed all processors as > getting roughly the same number of interrupts, etc, although I did see a > varying number of spins on reader/writer locks during the iSCSI transfers. > The NFS showed no srws at all. 
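To put numbers behind the srw/smtx difference (a sample follows just below), it may help to capture identical windows for an iSCSI run and an NFS run and diff them afterwards. A minimal sketch, with arbitrary paths and a 30-second window:

#!/bin/sh
# Usage: ./grab.sh iscsi   (repeat with: ./grab.sh nfs while the same test runs)
tag=$1
mpstat 1 30 > /var/tmp/mpstat.$tag &
lockstat -kWP -s 5 sleep 30 > /var/tmp/lockstat.$tag
wait
# Afterwards, compare the srw/smtx columns and the hottest lock callers:
#   diff /var/tmp/lockstat.iscsi /var/tmp/lockstat.nfs | less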
> > Here is a pretty representative example of a 1s mpstat during an iSCSI > transfer: > > CPU minf mjf xcal intr ithr csw icsw migr smtx srw syscl usr sys wt > idl set > 0 0 0 0 3246 2690 8739 6 772 5967 2 0 0 11 0 > 89 0 > 1 0 0 0 2366 2249 7910 8 988 5563 2 302 0 9 0 > 91 0 > 2 0 0 0 2455 2344 5584 5 687 5656 3 66 0 9 0 > 91 0 > 3 0 0 25 248 12 6210 1 885 5679 2 0 0 9 0 > 91 0 > 4 0 0 0 284 7 5450 2 861 5751 1 0 0 8 0 > 92 0 > 5 0 0 0 232 3 4513 0 547 5733 3 0 0 7 0 > 93 0 > 6 0 0 0 322 8 6084 1 836 6295 2 0 0 8 0 > 92 0 > 7 0 0 0 3114 2848 8229 4 648 4966 2 0 0 10 0 > 90 0 > > > > So, it seems that it's COMSTAR/iSCSI that's broke as hell, not ixgbe. My > apologies to anyone I may have offended with my pre-judgement. > > The consequences of this performance issue are significant: > > 1: Instead of being able to utilize the existing quad-port NICs I have in > my hosts, I must use dual 10G cards for redundancy purposes. > > 2: I must build out a full 10G switching infrastructure. > > 3: The network traffic is inherently less secure, as it is essentially > impossible to do real security with NFSv3 (that is supported by ESXi). > > In the short run, I have already ordered some relatively cheap 20G > infiniband gear that will hopefully push up the cost/performance ratio. > However, I have received all sorts of advice about how painful it can be to > build and maintain infiniband, and if iSCSI over 10G ethernet is this > painful, I'm not hopeful that infiniband will "just work". > > The last option, of course, is to bail out of the Solaris derivatives and > move to ZoL or ZoBSD. The drawbacks of this are: > > 1: ZoL doesn't easily support booting off of mirrored USB flash drives, > let alone running the root filesystem and swap on them. FreeNAS, by way of > comparison, puts a 2G swap partition on each zdev, which (strangely enough) > causes it to often crash when a zdev experiences a failure under load. > > 2: Neither ZoL or FreeNAS have good, stable, kernel-based iSCSI > implementations. FreeNAS is indeed testing istgt, but it proved unstable > for my purposes in recent builds. Unfortunately, stmf hasn't proved itself > any better. > > > > There are other minor differences, but these are the ones that brought me > to OmniOS in the first place. We'll just have to wait and see how well the > infiniband stuff works. > > Hopefully this exercise will help prevent others from going down the > same rabbit-hole that I did. > > -Warren V > > > > > > > > > > On Tue, Mar 3, 2015 at 3:45 PM, W Verb wrote: > > Hello Rob et al, > > Thank you for taking the time to look at this problem with me. I > completely understand your inclination to look at the network as the most > probable source of my issue, but I believe that this is a pretty clear-cut > case of server-side issues. > > 1: I did run ping RTT tests during both read and write operations with > multiple interfaces enabled, and the RTT stayed at ~.2ms regardless of > whether traffic was actively being transmitted/received or not. > > 2: I am not seeing the TCP window size bouncing around, and I am certainly > not seeing starvation and delay in my packet captures. It is true that I do > see delayed ACKs and retransmissions when I bump the MTU to 9000 on both > sides, but I stopped testing with high MTU as soon as I saw it happening > because I have a good understanding of incast. All of my recent testing has > been with MTUs between 1000 and 3000 bytes. 
> > 3: When testing with MTUs between 1000 and 3000 bytes, I do not see lost > packets and retransmission in captures on either the server or client side. > I only see staggered transmission delays on the part of the server. > > 4: The client is consistently advertising a large window size (20k+), so > the TCP throttling mechanism does not appear to play into this. > > 5: As mentioned previously, layer 2 flow control is not enabled anywhere > in the network, so there are no lower-level mechanisms at work. > > 6: Upon checking buffer and queue sizes (and doing the appropriate > research into documentation on the C3560E's buffer sizes), I do not see > large numbers of frames being dropped by the switch. It does happen at > larger MTUs, but not very often (and not consistently) during transfers at > 1000-3000 byte MTUs. I do not have QoS, policing, or rate-shaping enabled. > > 7: Network interface stats on both the server and the ESXi client show no > errors of any kind. This is via netstat on the server, and esxcli / Vsphere > client on the ESXi box. > > 8: When looking at captures taken simultaneously on the server and client > side, the server-side transmission pauses are consistently seen and > reproducible, even after multiple server rebuilds, ESXi rebuilds, vSphere > reinstallations (down to wiping the SQL db), various COMSTAR configuration > variations, multiple 10G NICs with different NIC chipsets, multiple > switches (I tried both a 48-port and 24-port C3560E), multiple IOS > revisions (12.2 and 15.0), OmniOS versions (r151012 and previous) multiple > cables, transceivers, etc etc etc etc etc > > > For your review, I have uploaded the actual packet captures to Google > Drive: > > > https://drive.google.com/file/d/0BwyUMjibonYQZS03dHJ2ZjJvTEE/view?usp=sharing > 2 int write - ESXi vmk5 > > https://drive.google.com/file/d/0BwyUMjibonYQTlNTQ2M5bjlxZ00/view?usp=sharing > 2 int write - ESXi vmk1 > > https://drive.google.com/file/d/0BwyUMjibonYQUEFsSVJCYXBVX3c/view?usp=sharing > 2 int read - server ixgbe0 > > https://drive.google.com/file/d/0BwyUMjibonYQT3FBbElnNFpJTzQ/view?usp=sharing > 2 int read - ESXi vmk5 > > https://drive.google.com/file/d/0BwyUMjibonYQU1hXdFRLM2cxSTA/view?usp=sharing > 2 int read - ESXi vmk1 > > https://drive.google.com/file/d/0BwyUMjibonYQNEFZSHVZdFNweDA/view?usp=sharing > 1 int write - ESXi vmk1 > > https://drive.google.com/file/d/0BwyUMjibonYQM3FpTmloQm5iMGc/view?usp=sharing > 1 int read - ESXi vmk1 > > Regards, > > Warren V > > > > On Mon, Mar 2, 2015 at 1:11 PM, Mallory, Rob > wrote: > > Just an EWAG, and forgive me for not following closely, I just saw > this in my inbox, and looked at it and the screenshots for 2 minutes. > > > > But this looks like the typical incast problem.. see > http://www.pdl.cmu.edu/Incast/ > > where your storage servers (there are effectively two with ISCSI/MPIO if > round-robin is working) have networks which are 20:1 oversubscribed to your > 1GbE host interfaces. (although one of the tcpdumps shows only one server > so it may be choked out completely) > > > > What is your BDP? I?m guessing .150ms * 1GbE. For single-link that gets > you to a MSS of 18700 or so. > > > > On your 1GbE connected clients, leave MTU at 9k, set the following in > sysctl.conf, > > And reboot. > > > > net.ipv4.tcp_rmem = 4096 8938 17876 > > > > If MPIO from the server is indeed round-robining properly, this will ?make > things fit? much better. 
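For reference, applying Rob's suggested cap on the Linux guest is just a sysctl; the values below are the ones he gives above, and they only affect connections opened after the change:

# Clamp the guest's TCP receive window to roughly the 1GbE x 150us BDP
sysctl -w net.ipv4.tcp_rmem="4096 8938 17876"

# Make it persistent across reboots
echo 'net.ipv4.tcp_rmem = 4096 8938 17876' >> /etc/sysctl.conf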
> > > > Note that your tcp_wmem can and should stay high, since you are not > oversubscribed going from client?server ; you only need to tweak the tcp > receive window size. > > > > I?ve not done it in quite some time, but IIRC, You can also set these from > the server side with: > > Route add -sendpipe 8930 or ?ssthresh > > > > And I think you can see the hash-table with computed BDP per client with > ndd. > > > > I would try playing with those before delving deep into potential bugs in > the TCP, nic driver, zfs, or vm. > > -Rob > > > > *From:* W Verb via illumos-developer [mailto:developer at lists.illumos.org] > *Sent:* Monday, March 02, 2015 12:20 PM > *To:* Garrett D'Amore > *Cc:* Joerg Goltermann; illumos-dev; omnios-discuss at lists.omniti.com > *Subject:* Re: [developer] Re: [OmniOS-discuss] The ixgbe driver, Lindsay > Lohan, and the Greek economy > > > > Hello, > > vmstat seems pretty boring. Certainly nothing going to swap. > > root at sanbox:/root# vmstat > kthr memory page disk faults cpu > r b w swap free re mf pi po fr de sr po ro s0 s2 in sy cs us > sy id > 0 0 0 34631632 30728068 175 215 0 0 0 0 963 275 4 6 140 3301 796 6681 0 > 1 99 > > Here is the "taskq_dispatch_ent" output from "lockstat -s 5 -kWP sleep 30" > during the "fast" write operation. > > > ------------------------------------------------------------------------------- > Count indv cuml rcnt nsec Hottest Lock Caller > 50934 3% 79% 0.00 3437 0xffffff093145ba40 taskq_dispatch_ent > > nsec ------ Time Distribution ------ count Stack > 128 | 7 spa_taskq_dispatch_ent > 256 |@@ 4333 zio_taskq_dispatch > 512 |@@ 3863 zio_issue_async > 1024 |@@@@@ 9717 zio_execute > 2048 |@@@@@@@@@ 15904 > 4096 |@@@@ 7595 > 8192 |@@ 4498 > 16384 |@ 2662 > 32768 |@ 1886 > 65536 | 434 > 131072 | 34 > 262144 | 1 > > ------------------------------------------------------------------------------- > > However, the truly "broken" function is a read operation: > > Top lock 1st try: > > ------------------------------------------------------------------------------- > Count indv cuml rcnt nsec Hottest Lock Caller > 474 15% 15% 0.00 7031 0xffffff093145b6f8 cv_wait > > nsec ------ Time Distribution ------ count Stack > 256 |@ 29 taskq_thread_wait > 512 |@@@@@@ 100 taskq_thread > 1024 |@@@@ 72 thread_start > 2048 |@@@@ 69 > 4096 |@@@ 51 > 8192 |@@ 47 > 16384 |@@ 44 > 32768 |@@ 32 > 65536 |@ 25 > 131072 | 5 > > ------------------------------------------------------------------------------- > > Top lock 2nd try: > > > ------------------------------------------------------------------------------- > Count indv cuml rcnt nsec Hottest Lock Caller > 174 39% 39% 0.00 103909 0xffffff0943f116a0 dmu_zfetch_find > > nsec ------ Time Distribution ------ count Stack > 2048 | 2 dmu_zfetch > 4096 | 3 dbuf_read > 8192 | 4 > dmu_buf_hold_array_by_dnode > 16384 | 3 dmu_buf_hold_array > 32768 |@ 7 > 65536 |@@ 14 > 131072 |@@@@@@@@@@@@@@@@@@@@ 116 > 262144 |@@@ 19 > 524288 | 4 > 1048576 | 2 > > ------------------------------------------------------------------------------- > > Top lock 3rd try: > > > ------------------------------------------------------------------------------- > Count indv cuml rcnt nsec Hottest Lock Caller > 283 55% 55% 0.00 94602 0xffffff0943ff5a68 dmu_zfetch_find > > nsec ------ Time Distribution ------ count Stack > 512 | 1 dmu_zfetch > 1024 | 1 dbuf_read > 2048 | 0 > dmu_buf_hold_array_by_dnode > 4096 | 5 dmu_buf_hold_array > 8192 | 2 > 16384 | 7 > 32768 | 4 > 65536 |@@@ 33 > 131072 |@@@@@@@@@@@@@@@@@@@@ 198 > 262144 |@@ 27 > 524288 | 2 > 
1048576 | 3 > > ------------------------------------------------------------------------------- > > > > As for the MTU question- setting the MTU to 9000 makes read operations > grind almost to a halt at 5MB/s transfer rate. > > -Warren V > > > > On Mon, Mar 2, 2015 at 11:30 AM, Garrett D'Amore > wrote: > > Here?s a theory. You are using small (relatively) MTUs (3000 is less > than the smallest ZFS block size.) So, when you go multipathing this way, > might a single upper layer transaction (ZFS block transfer request, or for > that matter COMSTAR block request) get routed over different paths. This > sounds like a potentially pathological condition to me. > > > > What happens if you increase the MTU to 9000? Have you tried it? I?m > sort of thinking that this will permit each transaction to be issued in a > single IP frame, which may alleviate certain tragic code paths. (That > said, I?m not sure how aware COMSTAR is of the IP MTU. If it is ignorant, > then it shouldn?t matter *that* much, since TCP should do the right thing > here and a single TCP stream should stick to a single underlying NIC. But > if COMSTAR is aware of the MTU, it may do some really screwball things as > it tries to break requests up into single frames.) > > > > Your read spin really looks like only about 22 msec of wait out of a total > run of 30 sec. (That?s not *great*, but neither does it sound tragic.) > Your write is interesting because that looks like it is going a wildly > different path. You should be aware that the locks you see are *not* > necessarily related in call order, but rather are ordered by instance > count. The write code path hitting the task_thread as hard as it does is > really, really weird. Something is pounding on a taskq lock super hard. > The number of taskq_dispatch_ent calls is interesting here. I?m starting > to wonder if it?s something as stupid as a spin where if the taskq is > ?full? (max size reached), a caller just is spinning trying to dispatch > jobs to the taskq. > > > > The taskq_dispatch_ent code is super simple, and it should be almost > impossible to have contention on that lock ? barring a thread spinning hard > on taskq_dispatch (or taskq_dispatch_ent as I think is happening here). > Looking at the various call sites, there are places in both COMSTAR > (iscsit) and in ZFS where this could be coming from. To know which, we > really need to have the back trace associated. > > > > lockstat can give this ? try giving ?-s 5? to give a short backtrace from > this, that will probably give us a little more info about the guilty > caller. :-) > > > > - Garrett > > > > On Mar 2, 2015, at 11:07 AM, W Verb via illumos-developer < > developer at lists.illumos.org> wrote: > > > > Hello all, > > I am not using layer 2 flow control. The switch carries line-rate 10G > traffic without error. > > I think I have found the issue via lockstat. The first lockstat is taken > during a multipath read: > > lockstat -kWP sleep 30 > > > Adaptive mutex spin: 21331 events in 30.020 seconds (711 events/sec) > > Count indv cuml rcnt nsec Hottest Lock Caller > > ------------------------------------------------------------------------------- > 9306 44% 44% 0.00 1557 htable_mutex+0x370 htable_release > 6307 23% 68% 0.00 1207 htable_mutex+0x108 htable_lookup > 596 7% 75% 0.00 4100 0xffffff0931705188 cv_wait > 349 5% 80% 0.00 4437 0xffffff0931705188 taskq_thread > 704 2% 82% 0.00 995 0xffffff0935de3c50 dbuf_create > > The hash table being read here I would guess is the tcp connection hash > table. 
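One hedged observation on that guess: htable_lookup and htable_release belong to the x86 HAT (page-table) code rather than to the TCP connection hash, so the hot hash table is more likely VM/page-table traffic than networking. Deeper lockstat backtraces would settle it; a sketch:

# Record more stack frames per event so the callers of the htable locks show up
lockstat -kWP -s 8 sleep 30 > /var/tmp/lockstat.read.stacks

# Then read the stacks recorded above the htable_mutex entries
less /var/tmp/lockstat.read.stacks    # search for 'htable'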
> > > > When lockstat is run during a multipath write operation, I get: > > Adaptive mutex spin: 1097341 events in 30.016 seconds (36558 events/sec) > > Count indv cuml rcnt nsec Hottest Lock Caller > > ------------------------------------------------------------------------------- > 210752 28% 28% 0.00 4781 0xffffff0931705188 taskq_thread > 174471 22% 50% 0.00 4476 0xffffff0931705188 cv_wait > 127183 10% 61% 0.00 2871 0xffffff096f29b510 zio_notify_parent > 176066 10% 70% 0.00 1922 0xffffff0931705188 taskq_dispatch_ent > 105134 9% 80% 0.00 3110 0xffffff096ffdbf10 zio_remove_child > 67512 4% 83% 0.00 1938 0xffffff096f3db4b0 zio_add_child > 45736 3% 86% 0.00 2239 0xffffff0935de3c50 dbuf_destroy > 27781 3% 89% 0.00 3416 0xffffff0935de3c50 dbuf_create > 38536 2% 91% 0.00 2122 0xffffff0935de3b70 dnode_rele > 27841 2% 93% 0.00 2423 0xffffff0935de3b70 dnode_diduse_space > 19020 2% 95% 0.00 3046 0xffffff09d9e305e0 dbuf_rele > 14627 1% 96% 0.00 3632 dbuf_hash_table+0x4f8 dbuf_find > > Writes are not performing htable lookups, while reads are. > > -Warren V > > > > > > > On Mon, Mar 2, 2015 at 3:14 AM, Joerg Goltermann wrote: > > Hi, > > I would try *one* TPG which includes both interface addresses > and I would double check for packet drops on the Catalyst. > > The 3560 supports only receive flow control which means, that > a sending 10Gbit port can easily overload a 1Gbit port. > Do you have flow control enabled? > > - Joerg > > > > On 02.03.2015 09:22, W Verb via illumos-developer wrote: > > Hello Garrett, > > No, no 802.3ad going on in this config. > > Here is a basic schematic: > > > https://drive.google.com/file/d/0BwyUMjibonYQVkVqcE5OQUJyUUU/view?usp=sharing > > Here is the Nexenta MPIO iSCSI Setup Document that I used as a guide: > > > https://drive.google.com/file/d/0BwyUMjibonYQbjEyUTBjN2tTNWM/view?usp=sharing > > Note that I am using an MTU of 3000 on both the 10G and 1G NICs. The > switch is set to allow 9148-byte frames, and I'm not seeing any > errors/buffer overruns on the switch. > > Here is a screenshot of a packet capture from a read operation on the > guest OS (from it's local drive, which is actually a VMDK file on the > storage server). In this example, only a single 1G ESXi kernel interface > (vmk1) is bound to the software iSCSI initiator. > > > https://drive.google.com/file/d/0BwyUMjibonYQa2NYdXhpZkpkbU0/view?usp=sharing > > Note that there's a nice, well-behaved window sizing process taking > place. The ESXi decreases the scaled window by 11 or 12 for each ACK, > then bumps it back up to 512. > > Here is a similar screenshot of a single-interface write operation: > > > https://drive.google.com/file/d/0BwyUMjibonYQbU1RZHRnakxDSFU/view?usp=sharing > > There are no pauses or gaps in the transmission rate in the > single-interface transfers. > > > In the next screenshots, I have enabled an additional 1G interface on > the ESXi host, and bound it to the iSCSI initiator. The new interface is > bound to a separate physical port, uses a different VLAN on the switch, > and talks to a different 10G port on the storage server. > > First, let's look at a write operation on the guest OS, which happily > pumps data at near-line-rate to the storage server. > > Here is a sequence number trace diagram. Note how the transfer has a > nice, smooth increment rate over the entire transfer. 
> > > https://drive.google.com/file/d/0BwyUMjibonYQWHNIa0drWnNxMmM/view?usp=sharing > > Here are screenshots from packet captures on both 1G interfaces: > > > https://drive.google.com/file/d/0BwyUMjibonYQRWhyVVQ4djNaU3c/view?usp=sharing > > https://drive.google.com/file/d/0BwyUMjibonYQaTVjTEtTRloyR2c/view?usp=sharing > > Note how we again see nice, smooth window adjustment, and no gaps in > transmission. > > > But now, let's look at the problematic two-interface Read operation. > First, the sequence graph: > > > https://drive.google.com/file/d/0BwyUMjibonYQTzdFVWdQMWZ6LUU/view?usp=sharing > > As you can see, there are gaps and jumps in the transmission throughout > the transfer. > It is very illustrative to look at captures of the gaps, which are > occurring on both interfaces: > > > https://drive.google.com/file/d/0BwyUMjibonYQc0VISXN6eVFwQzg/view?usp=sharing > > https://drive.google.com/file/d/0BwyUMjibonYQVFREUHp3TGFiUU0/view?usp=sharing > > As you can see, there are ~.4 second pauses in transmission from the > storage server, which kills the transfer rate. > It's clear that the ESXi box ACKs the prior iSCSI operation to > completion, then makes a new LUN request, which the storage server > immediately replies to. The ESXi ACKs the response packet from the > storage server, then waits...and waits....and waits... until eventually > the storage server starts transmitting again. > > Because the pause happens while the ESXi client is waiting for a packet > from the storage server, that tells me that the gaps are not an artifact > of traffic being switched between both active interfaces, but are > actually indicative of short hangs occurring on the server. > > Having a pause or two in transmission is no big deal, but in my case, it > is happening constantly, and dropping my overall read transfer rate down > to 20-60MB/s, which is slower than the single interface transfer rate > (~90-100MB/s). > > Decreasing the MTU makes the pauses shorter, increasing them makes the > pauses longer. > > Another interesting thing is that if I set the multipath io interval to > 3 operations instead of 1, I get better throughput. In other words, the > less frequently I swap IP addresses on my iSCSI requests from the ESXi > unit, the fewer pauses I see. > > Basically, COMSTAR seems to choke each time an iSCSI request from a new > IP arrives. > > Because the single interface transfer is near line rate, that tells me > that the storage system (mpt_sas, zfs, etc) is working fine. It's only > when multiple paths are attempted that iSCSI falls on its face during > reads. > > All of these captures were taken without a cache device being attached > to the storage zpool, so this isn't looking like some kind of ZFS ARC > problem. As mentioned previously, local transfers to/from the zpool are > showing ~300-500 MB/s rates over long transfers (10G+). > > -Warren V > > On Sun, Mar 1, 2015 at 9:11 PM, Garrett D'Amore > > wrote: > > I?m not sure I?ve followed properly. You have *two* interfaces. > You are not trying to provision these in an aggr are you? As far as > I?m aware, VMware does not support 802.3ad link aggregations. (Its > possible that you can make it work with ESXi if you give the entire > NIC to the guest ? but I?m skeptical.) The problem is that if you > try to use link aggregation, some packets (up to half!) will be > lost. TCP and other protocols fare poorly in this situation. > > Its possible I?ve totally misunderstood what you?re trying to do, in > which case I apologize. 
> > The idle thing is a red-herring ? the cpu is waiting for work to do, > probably because packets haven?t arrived (or where dropped by the > hypervisor!) I wouldn?t read too much into that except that your > network stack is in trouble. I?d look a bit more closely at the > kstats for tcp ? I suspect you?ll see retransmits or out of order > values that are unusually high ? if so this may help validate my > theory above. > > - Garrett > > On Mar 1, 2015, at 9:03 PM, W Verb via illumos-developer > > > > > wrote: > > Hello all, > > > Well, I no longer blame the ixgbe driver for the problems I'm seeing. > > > I tried Joerg's updated driver, which didn't improve the issue. So > I went back to the drawing board and rebuilt the server from scratch. > > What I noted is that if I have only a single 1-gig physical > interface active on the ESXi host, everything works as expected. > As soon as I enable two interfaces, I start seeing the performance > problems I've described. > > Response pauses from the server that I see in TCPdumps are still > leading me to believe the problem is delay on the server side, so > I ran a series of kernel dtraces and produced some flamegraphs. > > > This was taken during a read operation with two active 10G > interfaces on the server, with a single target being shared by two > tpgs- one tpg for each 10G physical port. The host device has two > 1G ports enabled, with VLANs separating the active ports into > 10G/1G pairs. ESXi is set to multipath using both VLANS with a > round-robin IO interval of 1. > > > https://drive.google.com/file/d/0BwyUMjibonYQd3ZYOGh4d2pteGs/view?usp=sharing > > > This was taken during a write operation: > > > https://drive.google.com/file/d/0BwyUMjibonYQMnBtU1Q2SXM2ams/view?usp=sharing > > > I then rebooted the server and disabled C-State, ACPI T-State, and > general EIST (Turbo boost) functionality in the CPU. > > I when I attempted to boot my guest VM, the iSCSI transfer > gradually ground to a halt during the boot loading process, and > the guest OS never did complete its boot process. > > Here is a flamegraph taken while iSCSI is slowly dying: > > > https://drive.google.com/file/d/0BwyUMjibonYQM21JeFZPX3dZWTg/view?usp=sharing > > > I edited out cpu_idle_adaptive from the dtrace output and > regenerated the slowdown graph: > > > https://drive.google.com/file/d/0BwyUMjibonYQbTVwV3NvXzlPS1E/view?usp=sharing > > > I then edited cpu_idle_adaptive out of the speedy write operation > and regenerated that graph: > > > https://drive.google.com/file/d/0BwyUMjibonYQeWFYM0pCMDZ1X2s/view?usp=sharing > > > I have zero experience with interpreting flamegraphs, but the most > significant difference I see between the slow read example and the > fast write example is in unix`thread_start --> unix`idle. There's > a good chunk of "unix`i86_mwait" in the read example that is not > present in the write example at all. > > Disabling the l2arc cache device didn't make a difference, and I > had to reenable EIST support on the CPU to get my VMs to boot. > > I am seeing a variety of bug reports going back to 2010 regarding > excessive mwait operations, with the suggested solutions usually > being to set "cpupm enable poll-mode" in power.conf. That change > also had no effect on speed. 
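Following Garrett's suggestion above about the TCP kstats, the retransmit and out-of-order counters are easy to snapshot before and after a slow read; a sketch (the statistic names are the usual tcp MIB kstat names and are worth confirming on your build):

# Per-protocol counters; look for tcpRetransSegs, tcpInDupAck, tcpInUnorderSegs
netstat -s -P tcp > /var/tmp/tcpstats.before
# ... run the slow two-path read test ...
netstat -s -P tcp > /var/tmp/tcpstats.after
diff /var/tmp/tcpstats.before /var/tmp/tcpstats.after

# Same data via kstat, if machine-readable output is easier to track
kstat -p -m tcp | egrep -i 'retrans|unorder|dupack'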
> > -Warren V > > > > > -----Original Message----- > > From: Chris Siebenmann [mailto:cks at cs.toronto.edu] > > Sent: Monday, February 23, 2015 8:30 AM > > To: W Verb > > Cc: omnios-discuss at lists.omniti.com > > ; cks at cs.toronto.edu > > > Subject: Re: [OmniOS-discuss] The ixgbe driver, Lindsay Lohan, and > the Greek economy > > > > Chris, thanks for your specific details. I'd appreciate it if you > > > could tell me which copper NIC you tried, as well as to pass on the > > > iSCSI tuning parameters. > > > Our copper NIC experience is with onboard X540-AT2 ports on > SuperMicro hardware (which have the guaranteed 10-20 msec lock > hold) and dual-port 82599EB TN cards (which have some sort of > driver/hardware failure under load that eventually leads to > 2-second lock holds). I can't recommend either with the current > driver; we had to revert to 1G networking in order to get stable > servers. > > > The iSCSI parameter modifications we do, across both initiators > and targets, are: > > > initialr2tno > > firstburstlength128k > > maxrecvdataseglen128k[only on Linux backends] > > maxxmitdataseglen128k[only on Linux backends] > > > The OmniOS initiator doesn't need tuning for more than the first > two parameters; on the Linux backends we tune up all four. My > extended thoughts on these tuning parameters and why we touch them > can be found > > here: > > > > http://utcc.utoronto.ca/~cks/space/blog/tech/UnderstandingiSCSIProtocol > > http://utcc.utoronto.ca/~cks/space/blog/tech/LikelyISCSITuning > > > The short version is that these parameters probably only make a > small difference but their overall goal is to do 128KB ZFS reads > and writes in single iSCSI operations (although they will be > fragmented at the TCP > > layer) and to do iSCSI writes without a back-and-forth delay > between initiator and target (that's 'initialr2t no'). > > > I think basically everyone should use InitialR2T set to no and in > fact that it should be the software default. These days only > unusually limited iSCSI targets should need it to be otherwise and > they can change their setting for it (initiator and target must > both agree to it being 'yes', so either can veto it). > > > - cks > > > > On Mon, Feb 23, 2015 at 8:21 AM, Joerg Goltermann > > wrote: > > Hi, > > I think your problem is caused by your link properties or your > switch settings. In general the standard ixgbe seems to perform > well. > > I had trouble after changing the default flow control settings > to "bi" > and this was my motivation to update the ixgbe driver a long > time ago. > After I have updated our systems to ixgbe 2.5.8 I never had any > problems .... > > Make sure your switch has support for jumbo frames and you use > the same mtu on all ports, otherwise the smallest will be used. > > What switch do you use? I can tell you nice horror stories about > different vendors.... > > - Joerg > > On 23.02.2015 10:31, W Verb wrote: > > Thank you Joerg, > > I've downloaded the package and will try it tomorrow. > > The only thing I can add at this point is that upon review > of my > testing, I may have performed my "pkg -u" between the > initial quad-gig > performance test and installing the 10G NIC. So this may > be a new > problem introduced in the latest updates. > > Those of you who are running 10G and have not upgraded to > the latest > kernel, etc, might want to do some additional testing > before running the > update. 
> > -Warren V > > On Mon, Feb 23, 2015 at 1:15 AM, Joerg Goltermann > > > >> wrote: > > Hi, > > I remember there was a problem with the flow control > settings in the > ixgbe > driver, so I updated it a long time ago for our > internal servers to > 2.5.8. > Last weekend I integrated the latest changes from the > FreeBSD driver > to bring > the illumos ixgbe to 2.5.25 but I had no time to test > it, so it's > completely > untested! > > > If you would like to give the latest driver a try you > can fetch the > kernel modules from > https://cloud.osn.de/index.____php/s/Fb4so9RsNnXA7r9 > > > > > Clone your boot environment, place the modules in the > new environment > and update the boot-archive of the new BE. > > - Joerg > > > > > > On 23.02.2015 02:54, W Verb wrote: > > By the way, to those of you who have working > setups: please send me > your pool/volume settings, interface linkprops, > and any kernel > tuning > parameters you may have set. > > Thanks, > Warren V > > On Sat, Feb 21, 2015 at 7:59 AM, Schweiss, Chip > > >> > > > wrote: > > I can't say I totally agree with your performance > assessment. I run Intel > X520 in all my OmniOS boxes. > > Here is a capture of nfssvrtop I made while > running many > storage vMotions > between two OmniOS boxes hosting NFS > datastores. This is a > 10 host VMware > cluster. Both OmniOS boxes are dual 10G > connected with > copper twin-ax to > the in rack Nexus 5010. > > VMware does 100% sync writes, I use ZeusRAM > SSDs for log > devices. > > -Chip > > 2014 Apr 24 08:05:51, load: 12.64, read: > 17330243 KB, > swrite: 15985 KB, > awrite: 1875455 KB > > Ver Client NFSOPS Reads > SWrites AWrites > Commits Rd_bw > SWr_bw AWr_bw Rd_t SWr_t AWr_t > Com_t Align% > > 4 10.28.17.105 0 0 > 0 0 > 0 0 > 0 0 0 0 0 0 0 > > 4 10.28.17.215 0 0 > 0 0 > 0 0 > 0 0 0 0 0 0 0 > > 4 10.28.17.213 0 0 > 0 0 > 0 0 > 0 0 0 0 0 0 0 > > 4 10.28.16.151 0 0 > 0 0 > 0 0 > 0 0 0 0 0 0 0 > > 4 all 1 0 > 0 0 > 0 0 > 0 0 0 0 0 0 0 > > 3 10.28.16.175 3 0 > 3 0 > 0 1 > 11 0 4806 48 0 0 85 > > 3 10.28.16.183 6 0 > 6 0 > 0 3 > 162 0 549 124 0 0 > 73 > > 3 10.28.16.180 11 0 > 10 0 > 0 3 > 27 0 776 89 0 0 67 > > 3 10.28.16.176 28 2 > 26 0 > 0 10 > 405 0 2572 198 0 0 > 100 > > 3 10.28.16.178 4606 4602 > 4 0 > 0 294534 > 3 0 723 49 0 0 99 > > 3 10.28.16.179 4905 4879 > 26 0 > 0 312208 > 311 0 735 271 0 0 > 99 > > 3 10.28.16.181 5515 5502 > 13 0 > 0 352107 > 77 0 89 87 0 0 99 > > 3 10.28.16.184 12095 12059 > 10 0 > 0 763014 > 39 0 249 147 0 0 99 > > 3 10.28.58.1 15401 6040 > 116 6354 > 53 191605 > 474 202346 192 96 144 83 > 99 > > 3 all 42574 33086 <42574%2033086>> > > 217 > 6354 53 1913488 > 1582 202300 348 138 153 105 > 99 > > > > > > On Fri, Feb 20, 2015 at 11:46 PM, W Verb > > > > >> wrote: > > > Hello All, > > Thank you for your replies. > I tried a few things, and found the following: > > 1: Disabling hyperthreading support in the > BIOS drops > performance overall > by a factor of 4. > 2: Disabling VT support also seems to have > some effect, > although it > appears to be minor. But this has the > amusing side > effect of fixing the > hangs I've been experiencing with fast > reboot. Probably > by disabling kvm. > 3: The performance tests are a bit tricky > to quantify > because of caching > effects. In fact, I'm not entirely sure > what is > happening here. 
It's just > best to describe what I'm seeing: > > The commands I'm using to test are > dd if=/dev/zero of=./test.dd bs=2M count=5000 > dd of=/dev/null if=./test.dd bs=2M count=5000 > The host vm is running Centos 6.6, and has > the latest > vmtools installed. > There is a host cache on an SSD local to > the host that > is also in place. > Disabling the host cache didn't > immediately have an > effect as far as I could > see. > > The host MTU set to 3000 on all iSCSI > interfaces for all > tests. > > Test 1: Right after reboot, with an ixgbe > MTU of 9000, > the write test > yields an average speed over three tests > of 137MB/s. The > read test yields an > average over three tests of 5MB/s. > > Test 2: After setting "ifconfig ixgbe0 mtu > 3000", the > write tests yield > 140MB/s, and the read tests yield 53MB/s. > It's important > to note here that > if I cut the read test short at only > 2-3GB, I get > results upwards of > 350MB/s, which I assume is local > cache-related distortion. > > Test 3: MTU of 1500. Read tests are up to > 156 MB/s. > Write tests yield > about 142MB/s. > Test 4: MTU of 1000: Read test at 182MB/s. > Test 5: MTU of 900: Read test at 130 MB/s. > Test 6: MTU of 1000: Read test at 160MB/s. > Write tests > are now > consistently at about 300MB/s. > Test 7: MTU of 1200: Read test at 124MB/s. > Test 8: MTU of 1000: Read test at 161MB/s. > Write at 261MB/s. > > A few final notes: > L1ARC grabs about 10GB of RAM during the > tests, so > there's definitely some > read cachi > > ... > > [Message clipped] -------------- next part -------------- An HTML attachment was scrubbed... URL: From garrett at damore.org Sun Mar 8 23:30:06 2015 From: garrett at damore.org (Garrett D'Amore) Date: Sun, 8 Mar 2015 16:30:06 -0700 Subject: [OmniOS-discuss] [developer] Re: The ixgbe driver, Lindsay Lohan, and the Greek economy In-Reply-To: References: <20150221032559.727D07A0792@apps0.cs.toronto.edu> <54EAEFA8.4020101@osn.de> <54EB5392.6030900@osn.de> <279C084B-E76B-4F38-A7CF-D6CB37D06D8F@damore.org> <54F44602.5030705@osn.de> <064C3FC8-43E0-4757-AADA-831303782A4C@damore.org> <7e4156d48aba46239fb4d490577382cd@NASANEXM01F.na.qualcomm.com> Message-ID: Cool. Sorry I've still not looked at your lockstat data. Been buried under other tasks. I will try to find some time in the next day or so. Sent from my iPhone > On Mar 8, 2015, at 3:58 PM, W Verb wrote: > > Hello, > > I was able to perform my last round of testing last night. The tests were done with a single host, while enabling one or two 1G ports. > > 1 port (Read) 2 ports (Read) > Baseline: 130MB/s Read 30MB/s > Disable LRO: 90MB/s 27MB/s > Disable LRO/LSO 88MB/s 27MB/s > > LRO/LSO enabled, TCP window size varies > Default iscsid (256k max) > > 64k Window 96MB/s 28MB/s > 32k Window 72MB/s 22MB/s > 16k Window 61MB/s 17MB/s > > I then set everything back to the default and captured exactly what happens when I start a single port transfer then enable a second port in the middle. It's pretty illustrative. The server chokes a bit, then strangely sends TDS protocol packets saying "exception occurred". I didn't know TDS had anything to do with iSCSI. > > Captures from both interfaces here: > https://drive.google.com/open?id=0BwyUMjibonYQMG8zZnNWbk40Ymc&authuser=0 > > So it seems that window size isn't the limiting factor here. > > I am in the middle of implementing infiniband now. I can highly recommend the Silverstorm (QLogic) 9024CU 20G 4096MTU (with latest firmware) switch for the lab. 
The fans run very quietly at normal temps, and they are very inexpensive ($250 on eBay). It supports ethernet out-of-band management, as well as a subnet manager web app hosted from the switch itself. Creating the serial console cable was mildly irritating. > > The latest firmware can be retrieved from the QLogic site via a Google search, you won't find a link on their support frontpage. > > I'll report back once I have iSER / SRP results. > > > -Warren V > >> On Wed, Mar 4, 2015 at 9:14 AM, Mallory, Rob wrote: >> Hi Warren, >> >> [ ?no objections here if you want to take this thread off-line to a smaller group? I wanted to post this to the larger groups for benefit of others, >> >> And maybe if you find success in the end you can post back to the larger groups with a summary ] >> >> >> >> Your recent success case going to 10GbE end to end seem to back up my theory of overload. >> >> I noticed yesterday a couple things: in the packet capture of the sender, it is apparent that large send offload LSO is being used (notice the size up to 64k packets). Among other things, this makes it a bit harder to tune and understand what is happening from packet captures on the hosts. It also lets the NIC ?tightly pack? the outgoing packet stream without much gap between the MTU sized packets. You need to get a 3rd-party snoop on the wire to see what the wire sees. Same thing on the ESXi server. I suspect it is using LRO or RSC, also ganging up the packets, and making it difficult to diagnose from a tcpdump on a VM. >> >> >> >> I still stand by my original inclination. And the data you have shown also seems to back this. (smaller MTU = less drops/pauses and then the latest: equal size pipes on send and receive make it work fine.) I was recommending that you either/both decrease the rwin on the client side (limiting it to an absurdly small, but not for this case, 17KB) or on the server. >> >> >> >> It makes more sense to control the window-size on the server, (to tune just for these small-bandwidth clients) and use a host-route with ?sendpipe (or is it ?ssthresh) because then the server will limit (and hopefully set some timers on the LSO part of the ixgbe) the amount of in-flight so-as the last-switch hop does not drop those important initial packets on tcp-slowstart. >> >> >> >> So my original recommendation still stands: limit the rwin on the client to 17K, if you want to continue to use a 1GbE interface on it. >> >> (your BDP @ 150us * 1GbE is about 24k, I?d pick a max receive window size smaller than this to be more conservative in the case of two interfaces) >> >> (and yes, only 2 x 9K jumbo frames fit in that BDP, so those 3k jumbos had less loss for a good reason, 1500MTU is probably best in your case) >> >> And two other things to help identify/understand the situation (in the mode you had it before with the quad-1GbE in the client) >> >> You can turn off LSO (in ixbge.conf) and also on the ESX, and the client VM side you can turn off LRO or RSC. >> >> >> >> Note: there was (maybe still is) a long-standing bug in the S10 and S11 TCP stack which I know of only second-hand from a reliable source, >> >> It has very similar conditions that you describe here, including high bandwidth servers, low bandwidth clients, multiple hops. >> >> The workaround is to disable RSC on the linux clients. >> >> >> >> good to hear that you can configure the system end-to-end 10GbE. That?s the obvious best case if you stick eith ethernet, and you don?t have to go to extremes like above. 
Note that the lossless fabric of Infiniband will completely hide these effects of TCP loss. Also, you can use much more efficient transport such as SRP (RDMA) which I think would fit really well if you can afford the additional complexity. (I?ve done this, and it?s really not that hard on small scale). >> >> >> >> Cheers, Rob >> >> >> >> >> >> From: W Verb [mailto:wverb73 at gmail.com] >> Sent: Tuesday, March 03, 2015 9:22 PM >> To: Mallory, Rob; illumos-dev >> Cc: Garrett D'Amore; Joerg Goltermann; omnios-discuss at lists.omniti.com >> >> >> Subject: Re: [developer] Re: [OmniOS-discuss] The ixgbe driver, Lindsay Lohan, and the Greek economy >> >> >> Hello all, >> >> This is probably the last message in this thread. >> >> I pulled the quad-gig NIC out of one of my hosts, and installed an X520. I then set a single 10G port on the server to be on the same VLAN as the host, and defined a vswitch, vmknic, etc on the host. >> >> I set the MTU to be 9000 on both sides, then ran my tests. >> >> Read: 130 MB/s. >> >> Write: 156 MB/s. >> >> Additionally, at higher MTUs, the NIC would periodically lock up until I performed an "ipadm disable-if -t ixgbe0" and re-enabled it. I tried your updated driver, Jeorg, but unfortunately it failed quite often. >> >> >> >> I then disabled stmf, enabled NFS (v3 only) on the server, and shared a dataset on the zpool with "share -f nfs /ppool/testy". >> I then mounted the server dataset on the host via NFS, and copied my test VM from the iSCSI zvol to the NFS dataset. I also removed the binding of the 10G port on the host from the sw iscsi interface. >> >> Running the same tests on the VM over NFSv3 yielded: >> >> Read: 650MB/s >> >> Write: 306MB/s >> >> This is getting within 10% of the throughput I consistently get on dd operations local on the server, so I'm pretty happy that I'm getting as good as I'm going to get until I add more drives. Additionally, I haven't experienced any NIC hangs. >> >> >> >> I tried varying the settings in ixgbe.conf, the MTU, and disabling LSO on the host and server, but nothing really made that much of a difference (except reducing the MTU made things about 20-30% slower). >> >> mpstat during both NFS and iSCSI transfers showed all processors as getting roughly the same number of interrupts, etc, although I did see a varying number of spins on reader/writer locks during the iSCSI transfers. The NFS showed no srws at all. >> >> Here is a pretty representative example of a 1s mpstat during an iSCSI transfer: >> >> CPU minf mjf xcal intr ithr csw icsw migr smtx srw syscl usr sys wt idl set >> 0 0 0 0 3246 2690 8739 6 772 5967 2 0 0 11 0 89 0 >> 1 0 0 0 2366 2249 7910 8 988 5563 2 302 0 9 0 91 0 >> 2 0 0 0 2455 2344 5584 5 687 5656 3 66 0 9 0 91 0 >> 3 0 0 25 248 12 6210 1 885 5679 2 0 0 9 0 91 0 >> 4 0 0 0 284 7 5450 2 861 5751 1 0 0 8 0 92 0 >> 5 0 0 0 232 3 4513 0 547 5733 3 0 0 7 0 93 0 >> 6 0 0 0 322 8 6084 1 836 6295 2 0 0 8 0 92 0 >> 7 0 0 0 3114 2848 8229 4 648 4966 2 0 0 10 0 90 0 >> >> >> >> So, it seems that it's COMSTAR/iSCSI that's broke as hell, not ixgbe. My apologies to anyone I may have offended with my pre-judgement. >> >> The consequences of this performance issue are significant: >> >> 1: Instead of being able to utilize the existing quad-port NICs I have in my hosts, I must use dual 10G cards for redundancy purposes. >> >> 2: I must build out a full 10G switching infrastructure. 
>> >> 3: The network traffic is inherently less secure, as it is essentially impossible to do real security with NFSv3 (that is supported by ESXi). >> >> In the short run, I have already ordered some relatively cheap 20G infiniband gear that will hopefully push up the cost/performance ratio. However, I have received all sorts of advice about how painful it can be to build and maintain infiniband, and if iSCSI over 10G ethernet is this painful, I'm not hopeful that infiniband will "just work". >> >> The last option, of course, is to bail out of the Solaris derivatives and move to ZoL or ZoBSD. The drawbacks of this are: >> >> 1: ZoL doesn't easily support booting off of mirrored USB flash drives, let alone running the root filesystem and swap on them. FreeNAS, by way of comparison, puts a 2G swap partition on each zdev, which (strangely enough) causes it to often crash when a zdev experiences a failure under load. >> >> 2: Neither ZoL or FreeNAS have good, stable, kernel-based iSCSI implementations. FreeNAS is indeed testing istgt, but it proved unstable for my purposes in recent builds. Unfortunately, stmf hasn't proved itself any better. >> >> >> >> There are other minor differences, but these are the ones that brought me to OmniOS in the first place. We'll just have to wait and see how well the infiniband stuff works. >> >> >> Hopefully this exercise will help prevent others from going down the same rabbit-hole that I did. >> >> -Warren V >> >> >> >> >> >> >> >> >> >> On Tue, Mar 3, 2015 at 3:45 PM, W Verb wrote: >> >> Hello Rob et al, >> >> Thank you for taking the time to look at this problem with me. I completely understand your inclination to look at the network as the most probable source of my issue, but I believe that this is a pretty clear-cut case of server-side issues. >> >> 1: I did run ping RTT tests during both read and write operations with multiple interfaces enabled, and the RTT stayed at ~.2ms regardless of whether traffic was actively being transmitted/received or not. >> >> 2: I am not seeing the TCP window size bouncing around, and I am certainly not seeing starvation and delay in my packet captures. It is true that I do see delayed ACKs and retransmissions when I bump the MTU to 9000 on both sides, but I stopped testing with high MTU as soon as I saw it happening because I have a good understanding of incast. All of my recent testing has been with MTUs between 1000 and 3000 bytes. >> >> 3: When testing with MTUs between 1000 and 3000 bytes, I do not see lost packets and retransmission in captures on either the server or client side. I only see staggered transmission delays on the part of the server. >> >> 4: The client is consistently advertising a large window size (20k+), so the TCP throttling mechanism does not appear to play into this. >> >> 5: As mentioned previously, layer 2 flow control is not enabled anywhere in the network, so there are no lower-level mechanisms at work. >> >> 6: Upon checking buffer and queue sizes (and doing the appropriate research into documentation on the C3560E's buffer sizes), I do not see large numbers of frames being dropped by the switch. It does happen at larger MTUs, but not very often (and not consistently) during transfers at 1000-3000 byte MTUs. I do not have QoS, policing, or rate-shaping enabled. >> >> 7: Network interface stats on both the server and the ESXi client show no errors of any kind. This is via netstat on the server, and esxcli / Vsphere client on the ESXi box. 
>> >> 8: When looking at captures taken simultaneously on the server and client side, the server-side transmission pauses are consistently seen and reproducible, even after multiple server rebuilds, ESXi rebuilds, vSphere reinstallations (down to wiping the SQL db), various COMSTAR configuration variations, multiple 10G NICs with different NIC chipsets, multiple switches (I tried both a 48-port and 24-port C3560E), multiple IOS revisions (12.2 and 15.0), OmniOS versions (r151012 and previous) multiple cables, transceivers, etc etc etc etc etc >> >> >> For your review, I have uploaded the actual packet captures to Google Drive: >> >> https://drive.google.com/file/d/0BwyUMjibonYQZS03dHJ2ZjJvTEE/view?usp=sharing 2 int write - ESXi vmk5 >> https://drive.google.com/file/d/0BwyUMjibonYQTlNTQ2M5bjlxZ00/view?usp=sharing 2 int write - ESXi vmk1 >> https://drive.google.com/file/d/0BwyUMjibonYQUEFsSVJCYXBVX3c/view?usp=sharing 2 int read - server ixgbe0 >> https://drive.google.com/file/d/0BwyUMjibonYQT3FBbElnNFpJTzQ/view?usp=sharing 2 int read - ESXi vmk5 >> https://drive.google.com/file/d/0BwyUMjibonYQU1hXdFRLM2cxSTA/view?usp=sharing 2 int read - ESXi vmk1 >> https://drive.google.com/file/d/0BwyUMjibonYQNEFZSHVZdFNweDA/view?usp=sharing 1 int write - ESXi vmk1 >> https://drive.google.com/file/d/0BwyUMjibonYQM3FpTmloQm5iMGc/view?usp=sharing 1 int read - ESXi vmk1 >> >> Regards, >> >> Warren V >> >> >> >> On Mon, Mar 2, 2015 at 1:11 PM, Mallory, Rob wrote: >> >> Just an EWAG, and forgive me for not following closely, I just saw this in my inbox, and looked at it and the screenshots for 2 minutes. >> >> >> >> But this looks like the typical incast problem.. see http://www.pdl.cmu.edu/Incast/ >> >> where your storage servers (there are effectively two with ISCSI/MPIO if round-robin is working) have networks which are 20:1 oversubscribed to your 1GbE host interfaces. (although one of the tcpdumps shows only one server so it may be choked out completely) >> >> >> >> What is your BDP? I?m guessing .150ms * 1GbE. For single-link that gets you to a MSS of 18700 or so. >> >> >> >> On your 1GbE connected clients, leave MTU at 9k, set the following in sysctl.conf, >> >> And reboot. >> >> >> >> net.ipv4.tcp_rmem = 4096 8938 17876 >> >> >> >> If MPIO from the server is indeed round-robining properly, this will ?make things fit? much better. >> >> >> >> Note that your tcp_wmem can and should stay high, since you are not oversubscribed going from client?server ; you only need to tweak the tcp receive window size. >> >> >> >> I?ve not done it in quite some time, but IIRC, You can also set these from the server side with: >> >> Route add -sendpipe 8930 or ?ssthresh >> >> >> >> And I think you can see the hash-table with computed BDP per client with ndd. >> >> >> >> I would try playing with those before delving deep into potential bugs in the TCP, nic driver, zfs, or vm. >> >> -Rob >> >> >> >> From: W Verb via illumos-developer [mailto:developer at lists.illumos.org] >> Sent: Monday, March 02, 2015 12:20 PM >> To: Garrett D'Amore >> Cc: Joerg Goltermann; illumos-dev; omnios-discuss at lists.omniti.com >> Subject: Re: [developer] Re: [OmniOS-discuss] The ixgbe driver, Lindsay Lohan, and the Greek economy >> >> >> >> Hello, >> >> vmstat seems pretty boring. Certainly nothing going to swap. 
>> >> root at sanbox:/root# vmstat >> kthr memory page disk faults cpu >> r b w swap free re mf pi po fr de sr po ro s0 s2 in sy cs us sy id >> 0 0 0 34631632 30728068 175 215 0 0 0 0 963 275 4 6 140 3301 796 6681 0 1 99 >> >> Here is the "taskq_dispatch_ent" output from "lockstat -s 5 -kWP sleep 30" during the "fast" write operation. >> >> ------------------------------------------------------------------------------- >> Count indv cuml rcnt nsec Hottest Lock Caller >> 50934 3% 79% 0.00 3437 0xffffff093145ba40 taskq_dispatch_ent >> >> nsec ------ Time Distribution ------ count Stack >> 128 | 7 spa_taskq_dispatch_ent >> 256 |@@ 4333 zio_taskq_dispatch >> 512 |@@ 3863 zio_issue_async >> 1024 |@@@@@ 9717 zio_execute >> 2048 |@@@@@@@@@ 15904 >> 4096 |@@@@ 7595 >> 8192 |@@ 4498 >> 16384 |@ 2662 >> 32768 |@ 1886 >> 65536 | 434 >> 131072 | 34 >> 262144 | 1 >> ------------------------------------------------------------------------------- >> >> >> However, the truly "broken" function is a read operation: >> >> Top lock 1st try: >> ------------------------------------------------------------------------------- >> Count indv cuml rcnt nsec Hottest Lock Caller >> 474 15% 15% 0.00 7031 0xffffff093145b6f8 cv_wait >> >> nsec ------ Time Distribution ------ count Stack >> 256 |@ 29 taskq_thread_wait >> 512 |@@@@@@ 100 taskq_thread >> 1024 |@@@@ 72 thread_start >> 2048 |@@@@ 69 >> 4096 |@@@ 51 >> 8192 |@@ 47 >> 16384 |@@ 44 >> 32768 |@@ 32 >> 65536 |@ 25 >> 131072 | 5 >> ------------------------------------------------------------------------------- >> >> Top lock 2nd try: >> >> ------------------------------------------------------------------------------- >> Count indv cuml rcnt nsec Hottest Lock Caller >> 174 39% 39% 0.00 103909 0xffffff0943f116a0 dmu_zfetch_find >> >> nsec ------ Time Distribution ------ count Stack >> 2048 | 2 dmu_zfetch >> 4096 | 3 dbuf_read >> 8192 | 4 dmu_buf_hold_array_by_dnode >> 16384 | 3 dmu_buf_hold_array >> 32768 |@ 7 >> 65536 |@@ 14 >> 131072 |@@@@@@@@@@@@@@@@@@@@ 116 >> 262144 |@@@ 19 >> 524288 | 4 >> 1048576 | 2 >> ------------------------------------------------------------------------------- >> >> Top lock 3rd try: >> >> ------------------------------------------------------------------------------- >> Count indv cuml rcnt nsec Hottest Lock Caller >> 283 55% 55% 0.00 94602 0xffffff0943ff5a68 dmu_zfetch_find >> >> nsec ------ Time Distribution ------ count Stack >> 512 | 1 dmu_zfetch >> 1024 | 1 dbuf_read >> 2048 | 0 dmu_buf_hold_array_by_dnode >> 4096 | 5 dmu_buf_hold_array >> 8192 | 2 >> 16384 | 7 >> 32768 | 4 >> 65536 |@@@ 33 >> 131072 |@@@@@@@@@@@@@@@@@@@@ 198 >> 262144 |@@ 27 >> 524288 | 2 >> 1048576 | 3 >> ------------------------------------------------------------------------------- >> >> >> >> As for the MTU question- setting the MTU to 9000 makes read operations grind almost to a halt at 5MB/s transfer rate. >> >> -Warren V >> >> >> >> On Mon, Mar 2, 2015 at 11:30 AM, Garrett D'Amore wrote: >> >> Here?s a theory. You are using small (relatively) MTUs (3000 is less than the smallest ZFS block size.) So, when you go multipathing this way, might a single upper layer transaction (ZFS block transfer request, or for that matter COMSTAR block request) get routed over different paths. This sounds like a potentially pathological condition to me. >> >> >> >> What happens if you increase the MTU to 9000? Have you tried it? I?m sort of thinking that this will permit each transaction to be issued in a single IP frame, which may alleviate certain tragic code paths. 
(That said, I'm not sure how aware COMSTAR is of the IP MTU. If it is ignorant, then it shouldn't matter *that* much, since TCP should do the right thing here and a single TCP stream should stick to a single underlying NIC. But if COMSTAR is aware of the MTU, it may do some really screwball things as it tries to break requests up into single frames.) >> >>
>> Your read spin really looks like only about 22 msec of wait out of a total run of 30 sec. (That's not *great*, but neither does it sound tragic.) Your write is interesting because that looks like it is going a wildly different path. You should be aware that the locks you see are *not* necessarily related in call order, but rather are ordered by instance count. The write code path hitting the task_thread as hard as it does is really, really weird. Something is pounding on a taskq lock super hard. The number of taskq_dispatch_ent calls is interesting here. I'm starting to wonder if it's something as stupid as a spin where if the taskq is "full" (max size reached), a caller just is spinning trying to dispatch jobs to the taskq. >> >>
>> The taskq_dispatch_ent code is super simple, and it should be almost impossible to have contention on that lock - barring a thread spinning hard on taskq_dispatch (or taskq_dispatch_ent as I think is happening here). Looking at the various call sites, there are places in both COMSTAR (iscsit) and in ZFS where this could be coming from. To know which, we really need to have the back trace associated. >> >>
>> lockstat can give this - try giving "-s 5" to give a short backtrace from this, that will probably give us a little more info about the guilty caller. :-) >> >>
>> - Garrett >> >>
>> On Mar 2, 2015, at 11:07 AM, W Verb via illumos-developer wrote: >> >> Hello all, >> >> I am not using layer 2 flow control. The switch carries line-rate 10G traffic without error. >> >> I think I have found the issue via lockstat. The first lockstat is taken during a multipath read: >> >> lockstat -kWP sleep 30 >> >>
>> Adaptive mutex spin: 21331 events in 30.020 seconds (711 events/sec)
>> Count indv cuml rcnt nsec Hottest Lock Caller
>> -------------------------------------------------------------------------------
>> 9306 44% 44% 0.00 1557 htable_mutex+0x370 htable_release
>> 6307 23% 68% 0.00 1207 htable_mutex+0x108 htable_lookup
>> 596 7% 75% 0.00 4100 0xffffff0931705188 cv_wait
>> 349 5% 80% 0.00 4437 0xffffff0931705188 taskq_thread
>> 704 2% 82% 0.00 995 0xffffff0935de3c50 dbuf_create
>> >> The hash table being read here I would guess is the tcp connection hash table.
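A sketch of how the guilty caller could be pinned down along the lines Garrett suggests: the lockstat invocation just adds stack depth to the command already being used in this thread, and the dtrace one-liner is a generic fbt aggregation that assumes nothing beyond taskq_dispatch_ent being a traceable kernel function.

    # Same profile as before, but keep a 5-frame stack per event so the
    # caller of taskq_dispatch_ent shows up in the report:
    lockstat -s 5 -kWP sleep 30

    # Alternatively, aggregate the kernel stacks that lead into
    # taskq_dispatch_ent for ~30 seconds, to see whether iscsit or ZFS
    # is the heavy dispatcher:
    dtrace -n 'fbt::taskq_dispatch_ent:entry { @[stack()] = count(); }
               tick-30s { exit(0); }'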
>> >> >> >> When lockstat is run during a multipath write operation, I get: >> >> Adaptive mutex spin: 1097341 events in 30.016 seconds (36558 events/sec) >> >> Count indv cuml rcnt nsec Hottest Lock Caller >> ------------------------------------------------------------------------------- >> 210752 28% 28% 0.00 4781 0xffffff0931705188 taskq_thread >> 174471 22% 50% 0.00 4476 0xffffff0931705188 cv_wait >> 127183 10% 61% 0.00 2871 0xffffff096f29b510 zio_notify_parent >> 176066 10% 70% 0.00 1922 0xffffff0931705188 taskq_dispatch_ent >> 105134 9% 80% 0.00 3110 0xffffff096ffdbf10 zio_remove_child >> 67512 4% 83% 0.00 1938 0xffffff096f3db4b0 zio_add_child >> 45736 3% 86% 0.00 2239 0xffffff0935de3c50 dbuf_destroy >> 27781 3% 89% 0.00 3416 0xffffff0935de3c50 dbuf_create >> 38536 2% 91% 0.00 2122 0xffffff0935de3b70 dnode_rele >> 27841 2% 93% 0.00 2423 0xffffff0935de3b70 dnode_diduse_space >> 19020 2% 95% 0.00 3046 0xffffff09d9e305e0 dbuf_rele >> 14627 1% 96% 0.00 3632 dbuf_hash_table+0x4f8 dbuf_find >> >> >> Writes are not performing htable lookups, while reads are. >> >> -Warren V >> >> >> >> >> >> >> >> On Mon, Mar 2, 2015 at 3:14 AM, Joerg Goltermann wrote: >> >> Hi, >> >> I would try *one* TPG which includes both interface addresses >> and I would double check for packet drops on the Catalyst. >> >> The 3560 supports only receive flow control which means, that >> a sending 10Gbit port can easily overload a 1Gbit port. >> Do you have flow control enabled? >> >> - Joerg >> >> >> >> On 02.03.2015 09:22, W Verb via illumos-developer wrote: >> >> Hello Garrett, >> >> No, no 802.3ad going on in this config. >> >> Here is a basic schematic: >> >> https://drive.google.com/file/d/0BwyUMjibonYQVkVqcE5OQUJyUUU/view?usp=sharing >> >> Here is the Nexenta MPIO iSCSI Setup Document that I used as a guide: >> >> https://drive.google.com/file/d/0BwyUMjibonYQbjEyUTBjN2tTNWM/view?usp=sharing >> >> Note that I am using an MTU of 3000 on both the 10G and 1G NICs. The >> switch is set to allow 9148-byte frames, and I'm not seeing any >> errors/buffer overruns on the switch. >> >> Here is a screenshot of a packet capture from a read operation on the >> guest OS (from it's local drive, which is actually a VMDK file on the >> storage server). In this example, only a single 1G ESXi kernel interface >> (vmk1) is bound to the software iSCSI initiator. >> >> https://drive.google.com/file/d/0BwyUMjibonYQa2NYdXhpZkpkbU0/view?usp=sharing >> >> Note that there's a nice, well-behaved window sizing process taking >> place. The ESXi decreases the scaled window by 11 or 12 for each ACK, >> then bumps it back up to 512. >> >> Here is a similar screenshot of a single-interface write operation: >> >> https://drive.google.com/file/d/0BwyUMjibonYQbU1RZHRnakxDSFU/view?usp=sharing >> >> There are no pauses or gaps in the transmission rate in the >> single-interface transfers. >> >> >> In the next screenshots, I have enabled an additional 1G interface on >> the ESXi host, and bound it to the iSCSI initiator. The new interface is >> bound to a separate physical port, uses a different VLAN on the switch, >> and talks to a different 10G port on the storage server. >> >> First, let's look at a write operation on the guest OS, which happily >> pumps data at near-line-rate to the storage server. >> >> Here is a sequence number trace diagram. Note how the transfer has a >> nice, smooth increment rate over the entire transfer. 
>> >> https://drive.google.com/file/d/0BwyUMjibonYQWHNIa0drWnNxMmM/view?usp=sharing >> >> Here are screenshots from packet captures on both 1G interfaces: >> >> https://drive.google.com/file/d/0BwyUMjibonYQRWhyVVQ4djNaU3c/view?usp=sharing >> https://drive.google.com/file/d/0BwyUMjibonYQaTVjTEtTRloyR2c/view?usp=sharing >> >> Note how we again see nice, smooth window adjustment, and no gaps in >> transmission. >> >> >> But now, let's look at the problematic two-interface Read operation. >> First, the sequence graph: >> >> https://drive.google.com/file/d/0BwyUMjibonYQTzdFVWdQMWZ6LUU/view?usp=sharing >> >> As you can see, there are gaps and jumps in the transmission throughout >> the transfer. >> It is very illustrative to look at captures of the gaps, which are >> occurring on both interfaces: >> >> https://drive.google.com/file/d/0BwyUMjibonYQc0VISXN6eVFwQzg/view?usp=sharing >> https://drive.google.com/file/d/0BwyUMjibonYQVFREUHp3TGFiUU0/view?usp=sharing >> >> As you can see, there are ~.4 second pauses in transmission from the >> storage server, which kills the transfer rate. >> It's clear that the ESXi box ACKs the prior iSCSI operation to >> completion, then makes a new LUN request, which the storage server >> immediately replies to. The ESXi ACKs the response packet from the >> storage server, then waits...and waits....and waits... until eventually >> the storage server starts transmitting again. >> >> Because the pause happens while the ESXi client is waiting for a packet >> from the storage server, that tells me that the gaps are not an artifact >> of traffic being switched between both active interfaces, but are >> actually indicative of short hangs occurring on the server. >> >> Having a pause or two in transmission is no big deal, but in my case, it >> is happening constantly, and dropping my overall read transfer rate down >> to 20-60MB/s, which is slower than the single interface transfer rate >> (~90-100MB/s). >> >> Decreasing the MTU makes the pauses shorter, increasing them makes the >> pauses longer. >> >> Another interesting thing is that if I set the multipath io interval to >> 3 operations instead of 1, I get better throughput. In other words, the >> less frequently I swap IP addresses on my iSCSI requests from the ESXi >> unit, the fewer pauses I see. >> >> Basically, COMSTAR seems to choke each time an iSCSI request from a new >> IP arrives. >> >> Because the single interface transfer is near line rate, that tells me >> that the storage system (mpt_sas, zfs, etc) is working fine. It's only >> when multiple paths are attempted that iSCSI falls on its face during reads. >> >> All of these captures were taken without a cache device being attached >> to the storage zpool, so this isn't looking like some kind of ZFS ARC >> problem. As mentioned previously, local transfers to/from the zpool are >> showing ~300-500 MB/s rates over long transfers (10G+). >> >> -Warren V >> >> On Sun, Mar 1, 2015 at 9:11 PM, Garrett D'Amore > >> > wrote: >> >> I?m not sure I?ve followed properly. You have *two* interfaces. >> You are not trying to provision these in an aggr are you? As far as >> I?m aware, VMware does not support 802.3ad link aggregations. (Its >> possible that you can make it work with ESXi if you give the entire >> NIC to the guest ? but I?m skeptical.) The problem is that if you >> try to use link aggregation, some packets (up to half!) will be >> lost. TCP and other protocols fare poorly in this situation. 
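As an aside, the round-robin "IO interval" Warren adjusts a little earlier is the IOPS limit of ESXi's round-robin path-selection policy; on ESXi 5.x it is usually inspected and changed per LUN roughly as below. The naa identifier is a placeholder and the exact esxcli namespace can differ between ESXi releases, so treat this as a sketch rather than a recipe.

    # Show the paths and current PSP settings for one LUN (device ID is hypothetical)
    esxcli storage nmp device list -d naa.600144f0xxxxxxxxxxxxxxxxxxxxxxxx

    # Rotate paths every 3 I/Os instead of the default 1000
    esxcli storage nmp psp roundrobin deviceconfig set \
        --device=naa.600144f0xxxxxxxxxxxxxxxxxxxxxxxx --type=iops --iops=3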
>> >> Its possible I?ve totally misunderstood what you?re trying to do, in >> which case I apologize. >> >> The idle thing is a red-herring ? the cpu is waiting for work to do, >> probably because packets haven?t arrived (or where dropped by the >> hypervisor!) I wouldn?t read too much into that except that your >> network stack is in trouble. I?d look a bit more closely at the >> kstats for tcp ? I suspect you?ll see retransmits or out of order >> values that are unusually high ? if so this may help validate my >> theory above. >> >> - Garrett >> >> On Mar 1, 2015, at 9:03 PM, W Verb via illumos-developer >> > >> >> >> wrote: >> >> Hello all, >> >> >> Well, I no longer blame the ixgbe driver for the problems I'm seeing. >> >> >> I tried Joerg's updated driver, which didn't improve the issue. So >> I went back to the drawing board and rebuilt the server from scratch. >> >> What I noted is that if I have only a single 1-gig physical >> interface active on the ESXi host, everything works as expected. >> As soon as I enable two interfaces, I start seeing the performance >> problems I've described. >> >> Response pauses from the server that I see in TCPdumps are still >> leading me to believe the problem is delay on the server side, so >> I ran a series of kernel dtraces and produced some flamegraphs. >> >> >> This was taken during a read operation with two active 10G >> interfaces on the server, with a single target being shared by two >> tpgs- one tpg for each 10G physical port. The host device has two >> 1G ports enabled, with VLANs separating the active ports into >> 10G/1G pairs. ESXi is set to multipath using both VLANS with a >> round-robin IO interval of 1. >> >> https://drive.google.com/file/d/0BwyUMjibonYQd3ZYOGh4d2pteGs/view?usp=sharing >> >> >> This was taken during a write operation: >> >> https://drive.google.com/file/d/0BwyUMjibonYQMnBtU1Q2SXM2ams/view?usp=sharing >> >> >> I then rebooted the server and disabled C-State, ACPI T-State, and >> general EIST (Turbo boost) functionality in the CPU. >> >> I when I attempted to boot my guest VM, the iSCSI transfer >> gradually ground to a halt during the boot loading process, and >> the guest OS never did complete its boot process. >> >> Here is a flamegraph taken while iSCSI is slowly dying: >> >> https://drive.google.com/file/d/0BwyUMjibonYQM21JeFZPX3dZWTg/view?usp=sharing >> >> >> I edited out cpu_idle_adaptive from the dtrace output and >> regenerated the slowdown graph: >> >> https://drive.google.com/file/d/0BwyUMjibonYQbTVwV3NvXzlPS1E/view?usp=sharing >> >> >> I then edited cpu_idle_adaptive out of the speedy write operation >> and regenerated that graph: >> >> https://drive.google.com/file/d/0BwyUMjibonYQeWFYM0pCMDZ1X2s/view?usp=sharing >> >> >> I have zero experience with interpreting flamegraphs, but the most >> significant difference I see between the slow read example and the >> fast write example is in unix`thread_start --> unix`idle. There's >> a good chunk of "unix`i86_mwait" in the read example that is not >> present in the write example at all. >> >> Disabling the l2arc cache device didn't make a difference, and I >> had to reenable EIST support on the CPU to get my VMs to boot. >> >> I am seeing a variety of bug reports going back to 2010 regarding >> excessive mwait operations, with the suggested solutions usually >> being to set "cpupm enable poll-mode" in power.conf. That change >> also had no effect on speed. 
>> >> -Warren V >> >> >> >> >> -----Original Message----- >> >> From: Chris Siebenmann [mailto:cks at cs.toronto.edu] >> >> Sent: Monday, February 23, 2015 8:30 AM >> >> To: W Verb >> >> Cc: omnios-discuss at lists.omniti.com >> >> ; cks at cs.toronto.edu >> >> >> Subject: Re: [OmniOS-discuss] The ixgbe driver, Lindsay Lohan, and >> the Greek economy >> >> >> > Chris, thanks for your specific details. I'd appreciate it if you >> >> > could tell me which copper NIC you tried, as well as to pass on the >> >> > iSCSI tuning parameters. >> >> >> Our copper NIC experience is with onboard X540-AT2 ports on >> SuperMicro hardware (which have the guaranteed 10-20 msec lock >> hold) and dual-port 82599EB TN cards (which have some sort of >> driver/hardware failure under load that eventually leads to >> 2-second lock holds). I can't recommend either with the current >> driver; we had to revert to 1G networking in order to get stable >> servers. >> >> >> The iSCSI parameter modifications we do, across both initiators >> and targets, are: >> >> >> initialr2tno >> >> firstburstlength128k >> >> maxrecvdataseglen128k[only on Linux backends] >> >> maxxmitdataseglen128k[only on Linux backends] >> >> >> The OmniOS initiator doesn't need tuning for more than the first >> two parameters; on the Linux backends we tune up all four. My >> extended thoughts on these tuning parameters and why we touch them >> can be found >> >> here: >> >> >> http://utcc.utoronto.ca/~cks/space/blog/tech/UnderstandingiSCSIProtocol >> >> http://utcc.utoronto.ca/~cks/space/blog/tech/LikelyISCSITuning >> >> >> The short version is that these parameters probably only make a >> small difference but their overall goal is to do 128KB ZFS reads >> and writes in single iSCSI operations (although they will be >> fragmented at the TCP >> >> layer) and to do iSCSI writes without a back-and-forth delay >> between initiator and target (that's 'initialr2t no'). >> >> >> I think basically everyone should use InitialR2T set to no and in >> fact that it should be the software default. These days only >> unusually limited iSCSI targets should need it to be otherwise and >> they can change their setting for it (initiator and target must >> both agree to it being 'yes', so either can veto it). >> >> >> - cks >> >> >> >> On Mon, Feb 23, 2015 at 8:21 AM, Joerg Goltermann > >> > wrote: >> >> Hi, >> >> I think your problem is caused by your link properties or your >> switch settings. In general the standard ixgbe seems to perform >> well. >> >> I had trouble after changing the default flow control settings >> to "bi" >> and this was my motivation to update the ixgbe driver a long >> time ago. >> After I have updated our systems to ixgbe 2.5.8 I never had any >> problems .... >> >> Make sure your switch has support for jumbo frames and you use >> the same mtu on all ports, otherwise the smallest will be used. >> >> What switch do you use? I can tell you nice horror stories about >> different vendors.... >> >> - Joerg >> >> On 23.02.2015 10:31, W Verb wrote: >> >> Thank you Joerg, >> >> I've downloaded the package and will try it tomorrow. >> >> The only thing I can add at this point is that upon review >> of my >> testing, I may have performed my "pkg -u" between the >> initial quad-gig >> performance test and installing the 10G NIC. So this may >> be a new >> problem introduced in the latest updates. 
>> >> Those of you who are running 10G and have not upgraded to >> the latest >> kernel, etc, might want to do some additional testing >> before running the >> update. >> >> -Warren V >> >> On Mon, Feb 23, 2015 at 1:15 AM, Joerg Goltermann >> >> >> >> wrote: >> >> Hi, >> >> I remember there was a problem with the flow control >> settings in the >> ixgbe >> driver, so I updated it a long time ago for our >> internal servers to >> 2.5.8. >> Last weekend I integrated the latest changes from the >> FreeBSD driver >> to bring >> the illumos ixgbe to 2.5.25 but I had no time to test >> it, so it's >> completely >> untested! >> >> >> If you would like to give the latest driver a try you >> can fetch the >> kernel modules from >> https://cloud.osn.de/index.____php/s/Fb4so9RsNnXA7r9 >> >> > > >> >> Clone your boot environment, place the modules in the >> new environment >> and update the boot-archive of the new BE. >> >> - Joerg >> >> >> >> >> >> On 23.02.2015 02:54, W Verb wrote: >> >> By the way, to those of you who have working >> setups: please send me >> your pool/volume settings, interface linkprops, >> and any kernel >> tuning >> parameters you may have set. >> >> Thanks, >> Warren V >> >> On Sat, Feb 21, 2015 at 7:59 AM, Schweiss, Chip >> >> >> >> >> >> wrote: >> >> I can't say I totally agree with your performance >> assessment. I run Intel >> X520 in all my OmniOS boxes. >> >> Here is a capture of nfssvrtop I made while >> running many >> storage vMotions >> between two OmniOS boxes hosting NFS >> datastores. This is a >> 10 host VMware >> cluster. Both OmniOS boxes are dual 10G >> connected with >> copper twin-ax to >> the in rack Nexus 5010. >> >> VMware does 100% sync writes, I use ZeusRAM >> SSDs for log >> devices. >> >> -Chip >> >> 2014 Apr 24 08:05:51, load: 12.64, read: >> 17330243 KB, >> swrite: 15985 KB, >> awrite: 1875455 KB >> >> Ver Client NFSOPS Reads >> SWrites AWrites >> Commits Rd_bw >> SWr_bw AWr_bw Rd_t SWr_t AWr_t >> Com_t Align% >> >> 4 10.28.17.105 0 0 >> 0 0 >> 0 0 >> 0 0 0 0 0 0 0 >> >> 4 10.28.17.215 0 0 >> 0 0 >> 0 0 >> 0 0 0 0 0 0 0 >> >> 4 10.28.17.213 0 0 >> 0 0 >> 0 0 >> 0 0 0 0 0 0 0 >> >> 4 10.28.16.151 0 0 >> 0 0 >> 0 0 >> 0 0 0 0 0 0 0 >> >> 4 all 1 0 >> 0 0 >> 0 0 >> 0 0 0 0 0 0 0 >> >> 3 10.28.16.175 3 0 >> 3 0 >> 0 1 >> 11 0 4806 48 0 0 85 >> >> 3 10.28.16.183 6 0 >> 6 0 >> 0 3 >> 162 0 549 124 0 0 >> 73 >> >> 3 10.28.16.180 11 0 >> 10 0 >> 0 3 >> 27 0 776 89 0 0 67 >> >> 3 10.28.16.176 28 2 >> 26 0 >> 0 10 >> 405 0 2572 198 0 0 >> 100 >> >> 3 10.28.16.178 4606 4602 >> 4 0 >> 0 294534 >> 3 0 723 49 0 0 99 >> >> 3 10.28.16.179 4905 4879 >> 26 0 >> 0 312208 >> 311 0 735 271 0 0 >> 99 >> >> 3 10.28.16.181 5515 5502 >> 13 0 >> 0 352107 >> 77 0 89 87 0 0 99 >> >> 3 10.28.16.184 12095 12059 >> 10 0 >> 0 763014 >> 39 0 249 147 0 0 99 >> >> 3 10.28.58.1 15401 6040 >> 116 6354 >> 53 191605 >> 474 202346 192 96 144 83 >> 99 >> >> 3 all 42574 33086 >> 217 >> 6354 53 1913488 >> 1582 202300 348 138 153 105 >> 99 >> >> >> >> >> >> On Fri, Feb 20, 2015 at 11:46 PM, W Verb >> >> > >> >> >> wrote: >> >> >> Hello All, >> >> Thank you for your replies. >> I tried a few things, and found the following: >> >> 1: Disabling hyperthreading support in the >> BIOS drops >> performance overall >> by a factor of 4. >> 2: Disabling VT support also seems to have >> some effect, >> although it >> appears to be minor. But this has the >> amusing side >> effect of fixing the >> hangs I've been experiencing with fast >> reboot. Probably >> by disabling kvm. 
>> 3: The performance tests are a bit tricky >> to quantify >> because of caching >> effects. In fact, I'm not entirely sure >> what is >> happening here. It's just >> best to describe what I'm seeing: >> >> The commands I'm using to test are >> dd if=/dev/zero of=./test.dd bs=2M count=5000 >> dd of=/dev/null if=./test.dd bs=2M count=5000 >> The host vm is running Centos 6.6, and has >> the latest >> vmtools installed. >> There is a host cache on an SSD local to >> the host that >> is also in place. >> Disabling the host cache didn't >> immediately have an >> effect as far as I could >> see. >> >> The host MTU set to 3000 on all iSCSI >> interfaces for all >> tests. >> >> Test 1: Right after reboot, with an ixgbe >> MTU of 9000, >> the write test >> yields an average speed over three tests >> of 137MB/s. The >> read test yields an >> average over three tests of 5MB/s. >> >> Test 2: After setting "ifconfig ixgbe0 mtu >> 3000", the >> write tests yield >> 140MB/s, and the read tests yield 53MB/s. >> It's important >> to note here that >> if I cut the read test short at only >> 2-3GB, I get >> results upwards of >> 350MB/s, which I assume is local >> cache-related distortion. >> >> Test 3: MTU of 1500. Read tests are up to >> 156 MB/s. >> Write tests yield >> about 142MB/s. >> Test 4: MTU of 1000: Read test at 182MB/s. >> Test 5: MTU of 900: Read test at 130 MB/s. >> Test 6: MTU of 1000: Read test at 160MB/s. >> Write tests >> are now >> consistently at about 300MB/s. >> Test 7: MTU of 1200: Read test at 124MB/s. >> Test 8: MTU of 1000: Read test at 161MB/s. >> Write at 261MB/s. >> >> A few final notes: >> L1ARC grabs about 10GB of RAM during the >> tests, so >> there's definitely some >> read cachi >> >> ... >> >> [Message clipped] > -------------- next part -------------- An HTML attachment was scrubbed... URL: From henson at acm.org Mon Mar 9 04:07:19 2015 From: henson at acm.org (Paul B. Henson) Date: Sun, 8 Mar 2015 21:07:19 -0700 Subject: [OmniOS-discuss] Bloody // mailwrapper & mta mediator In-Reply-To: References: Message-ID: <20150309040719.GS25463@bender.unx.csupomona.edu> On Sat, Mar 07, 2015 at 11:17:03AM +0000, Andy wrote: > To date (we're running r151012 in production), OmniOS doesn't install > an MTA by default but, with the integration of 5166, sendmail becomes > a dependency of mailwrapper and mailwrapper is required by SUNWcs. :(, we actually use postfix for MTA purposes. I'd hate to see a hard requirement for sendmail as part of the base system. > This is a long way of me asking if mailwrapper could be removed from > OmniOS as it isn't required for an IPS distribution. That would remove the > requirement to have the standard sendmail package installed at all - just > like <=r151012. It would mean that 'mailx' doesn't work but that should be > expected if you haven't installed an MTA and is presumably the current > behaviour. +1 on this suggestion, thanks. From eric.sproul at circonus.com Mon Mar 9 14:23:13 2015 From: eric.sproul at circonus.com (Eric Sproul) Date: Mon, 9 Mar 2015 10:23:13 -0400 Subject: [OmniOS-discuss] lsi sas 9211-8i it efi firmware 20.00.02.00 In-Reply-To: <54FB65FD.6040600@gmail.com> References: <32b301d057ba$d6168cc0$8243a640$@acm.org> <54F9FD0B.1040601@gnaa.net> <331901d05846$e406cbb0$ac146310$@acm.org> <54FA198B.7090300@gnaa.net> <54FA244B.1010600@gnaa.net> <54FB65FD.6040600@gmail.com> Message-ID: On Sat, Mar 7, 2015 at 3:56 PM, Brogy?nyi J?zsef wrote: > Has anyone tested this firmware? Is it free from this error message "Parity > Error on path"? 
> Thanks any information. P20 firmware is known to be toxic; just google for "lsi p20 firmware" for the carnage. P19 and below are fine, as far as I know. Eric From danmcd at omniti.com Mon Mar 9 14:37:16 2015 From: danmcd at omniti.com (Dan McDonald) Date: Mon, 9 Mar 2015 10:37:16 -0400 Subject: [OmniOS-discuss] Bloody // mailwrapper & mta mediator In-Reply-To: <20150309040719.GS25463@bender.unx.csupomona.edu> References: <20150309040719.GS25463@bender.unx.csupomona.edu> Message-ID: <24D36878-45A2-4088-86E4-AF079CF6E81B@omniti.com> > On Mar 9, 2015, at 12:07 AM, Paul B. Henson wrote: > > On Sat, Mar 07, 2015 at 11:17:03AM +0000, Andy wrote: > >> To date (we're running r151012 in production), OmniOS doesn't install >> an MTA by default but, with the integration of 5166, sendmail becomes >> a dependency of mailwrapper and mailwrapper is required by SUNWcs. > > :(, we actually use postfix for MTA purposes. I'd hate to see a hard > requirement for sendmail as part of the base system. I'll be looking into 5166 and its impact later today. I want to cut a last or next-to-last bloody today or tomorrow. This investigation will force it to be tomorrow. If you have suggested diffs, please mail them to the list or create webrevs. I'm generally okay with this so long as: - It does not break anything else. - It does not hinder the post-014 goal of building illumos-gate on OmniOS. But I need to make sure. Dan From danmcd at omniti.com Mon Mar 9 14:47:12 2015 From: danmcd at omniti.com (Dan McDonald) Date: Mon, 9 Mar 2015 10:47:12 -0400 Subject: [OmniOS-discuss] lsi sas 9211-8i it efi firmware 20.00.02.00 In-Reply-To: References: <32b301d057ba$d6168cc0$8243a640$@acm.org> <54F9FD0B.1040601@gnaa.net> <331901d05846$e406cbb0$ac146310$@acm.org> <54FA198B.7090300@gnaa.net> <54FA244B.1010600@gnaa.net> <54FB65FD.6040600@gmail.com> Message-ID: > On Mar 9, 2015, at 10:23 AM, Eric Sproul wrote: > > On Sat, Mar 7, 2015 at 3:56 PM, Brogy?nyi J?zsef wrote: >> Has anyone tested this firmware? Is it free from this error message "Parity >> Error on path"? >> Thanks any information. > > P20 firmware is known to be toxic; just google for "lsi p20 firmware" > for the carnage. > > P19 and below are fine, as far as I know. I've not heard good things about 19. I HAVE heard that 18 is the best level of FW to run for right now. Thanks! Dan From omnios at citrus-it.net Mon Mar 9 14:55:43 2015 From: omnios at citrus-it.net (Andy) Date: Mon, 9 Mar 2015 14:55:43 +0000 (GMT) Subject: [OmniOS-discuss] Bloody // mailwrapper & mta mediator In-Reply-To: <24D36878-45A2-4088-86E4-AF079CF6E81B@omniti.com> References: <20150309040719.GS25463@bender.unx.csupomona.edu> <24D36878-45A2-4088-86E4-AF079CF6E81B@omniti.com> Message-ID: On Mon, 9 Mar 2015, Dan McDonald wrote: ; ; > On Mar 9, 2015, at 12:07 AM, Paul B. Henson wrote: ; > ; > On Sat, Mar 07, 2015 at 11:17:03AM +0000, Andy wrote: ; > ; >> To date (we're running r151012 in production), OmniOS doesn't install ; >> an MTA by default but, with the integration of 5166, sendmail becomes ; >> a dependency of mailwrapper and mailwrapper is required by SUNWcs. ; > ; > :(, we actually use postfix for MTA purposes. I'd hate to see a hard ; > requirement for sendmail as part of the base system. ; ; I'll be looking into 5166 and its impact later today. I want to cut a last or next-to-last bloody today or tomorrow. This investigation will force it to be tomorrow. ; ; If you have suggested diffs, please mail them to the list or create webrevs. 
I'm generally okay with this so long as: ; ; - It does not break anything else. ; ; - It does not hinder the post-014 goal of building illumos-gate on OmniOS. ; ; But I need to make sure. This would be sufficient for me. It re-introduces the problem with 'mailx' but that was there before. --- usr/src/pkg/manifests/SUNWcs.mf~ Mon Mar 9 14:54:01 2015 +++ usr/src/pkg/manifests/SUNWcs.mf Mon Mar 9 14:54:12 2015 @@ -1871,7 +1871,3 @@ # Depend on zoneinfo data. # depend fmri=system/data/zoneinfo type=require -# -# The mailx binary calls /usr/lib/sendmail provided by mailwrapper -# -depend fmri=system/network/mailwrapper type=require Thanks, Andy -- Citrus IT Limited | +44 (0)870 199 8000 | enquiries at citrus-it.co.uk Rock House Farm | Green Moor | Wortley | Sheffield | S35 7DQ Registered in England and Wales | Company number 4899123 From danmcd at omniti.com Mon Mar 9 17:18:33 2015 From: danmcd at omniti.com (Dan McDonald) Date: Mon, 9 Mar 2015 13:18:33 -0400 Subject: [OmniOS-discuss] Bloody // mailwrapper & mta mediator In-Reply-To: References: <20150309040719.GS25463@bender.unx.csupomona.edu> <24D36878-45A2-4088-86E4-AF079CF6E81B@omniti.com> Message-ID: > On Mar 9, 2015, at 10:55 AM, Andy wrote: > > ; If you have suggested diffs, please mail them to the list or create webrevs. I'm generally okay with this so long as: > ; > ; - It does not break anything else. > ; > ; - It does not hinder the post-014 goal of building illumos-gate on OmniOS. > ; > ; But I need to make sure. > > This would be sufficient for me. It re-introduces the problem with 'mailx' > but that was there before. > > --- usr/src/pkg/manifests/SUNWcs.mf~ Mon Mar 9 14:54:01 2015 > +++ usr/src/pkg/manifests/SUNWcs.mf Mon Mar 9 14:54:12 2015 > @@ -1871,7 +1871,3 @@ > # Depend on zoneinfo data. > # > depend fmri=system/data/zoneinfo type=require > -# > -# The mailx binary calls /usr/lib/sendmail provided by mailwrapper > -# > -depend fmri=system/network/mailwrapper type=require I think this is too big of a hammer. Tell me, would weakening the requirement of sendmail by mailwrapper help? diff --git a/usr/src/pkg/manifests/system-network-mailwrapper.mf b/usr/src/pkg/manifests/system-network-mailwrapper.mf index fa855da..21cc0b7 100644 --- a/usr/src/pkg/manifests/system-network-mailwrapper.mf +++ b/usr/src/pkg/manifests/system-network-mailwrapper.mf @@ -42,4 +42,4 @@ link path=usr/sbin/newaliases mediator=mta mediator-implementation=mailwrapper \ target=../lib/mailwrapper link path=usr/sbin/sendmail mediator=mta mediator-implementation=mailwrapper \ target=../lib/mailwrapper -depend fmri=service/network/smtp/sendmail type=require +depend fmri=service/network/smtp/sendmail type=optional This keeps the spirit of the change, but doesn't trip up folks who want their own sendmail (even if they are technically violating KYSTY in their version! ;) ). Whatcha think? Dan p.s. I want to cut a bloody release today or more likely tomorrow. Let's not bikeshed this. I reserve the right to Just Say No also. From henson at acm.org Mon Mar 9 21:48:50 2015 From: henson at acm.org (Paul B. Henson) Date: Mon, 9 Mar 2015 14:48:50 -0700 Subject: [OmniOS-discuss] Bloody // mailwrapper & mta mediator In-Reply-To: References: <20150309040719.GS25463@bender.unx.csupomona.edu> <24D36878-45A2-4088-86E4-AF079CF6E81B@omniti.com> Message-ID: <355501d05ab2$d6ac29b0$84047d10$@acm.org> > From: Dan McDonald > Sent: Monday, March 09, 2015 10:19 AM > > Tell me, would weakening the requirement of sendmail by mailwrapper help? 
> > This keeps the spirit of the change, but doesn't trip up folks who want their own > sendmail (even if they are technically violating KYSTY in their version! ;) ). So obviously mailwrapper would be installed, but what would happen with sendmail? It would be installed as well, but could be removed? I'm not sure exactly what IPS does with optional requirements. Currently we drop in our own symlinks for /usr/lib/sendmail et al pointing to our installed postfix, after the update, instead we would need to integrate the paths to our postfix stuff into the mailwrapper configuration instead? From danmcd at omniti.com Mon Mar 9 21:53:51 2015 From: danmcd at omniti.com (Dan McDonald) Date: Mon, 9 Mar 2015 17:53:51 -0400 Subject: [OmniOS-discuss] Bloody // mailwrapper & mta mediator In-Reply-To: <355501d05ab2$d6ac29b0$84047d10$@acm.org> References: <20150309040719.GS25463@bender.unx.csupomona.edu> <24D36878-45A2-4088-86E4-AF079CF6E81B@omniti.com> <355501d05ab2$d6ac29b0$84047d10$@acm.org> Message-ID: > On Mar 9, 2015, at 5:48 PM, Paul B. Henson wrote: > > So obviously mailwrapper would be installed, but what would happen with > sendmail? It would be installed as well, but could be removed? I'm not sure > exactly what IPS does with optional requirements. I'm actually not sure either about the installation, but the weakened requirement should allow for removal, which was, I believe, the big problem. > Currently we drop in our > own symlinks for /usr/lib/sendmail et al pointing to our installed postfix, > after the update, instead we would need to integrate the paths to our > postfix stuff into the mailwrapper configuration instead? The mailwrapper manifest uses mediators to do the right thing. I'm building what I hope will be this week's bloody as I'm typing this. You can try it then (I'm not updating install media with this bump, however, so you'll have to "pkg update" to it). Dan From omnios at citrus-it.net Mon Mar 9 22:34:10 2015 From: omnios at citrus-it.net (Andy) Date: Mon, 9 Mar 2015 22:34:10 +0000 (GMT) Subject: [OmniOS-discuss] Bloody // mailwrapper & mta mediator In-Reply-To: <355501d05ab2$d6ac29b0$84047d10$@acm.org> References: <20150309040719.GS25463@bender.unx.csupomona.edu> <24D36878-45A2-4088-86E4-AF079CF6E81B@omniti.com> <355501d05ab2$d6ac29b0$84047d10$@acm.org> Message-ID: On Mon, 9 Mar 2015, Paul B. Henson wrote: ; > From: Dan McDonald ; > Sent: Monday, March 09, 2015 10:19 AM ; > ; > Tell me, would weakening the requirement of sendmail by mailwrapper help? ; > ; > This keeps the spirit of the change, but doesn't trip up folks who want ; their own ; > sendmail (even if they are technically violating KYSTY in their version! ; ;) ). ; ; So obviously mailwrapper would be installed, but what would happen with ; sendmail? It would be installed as well, but could be removed? I'm not sure ; exactly what IPS does with optional requirements. Currently we drop in our ; own symlinks for /usr/lib/sendmail et al pointing to our installed postfix, ; after the update, instead we would need to integrate the paths to our ; postfix stuff into the mailwrapper configuration instead? You could do that - updating /etc/mailer.conf and leaving mailwrapper as the /usr/lib/sendmail - but you're probably better off updating your postfix package to use mediated symlinks. If you set the priority to 'site' it should override mailwrapper upon installation. 
Here's what we have in our MTA manifest now: link mediator=mta mediator-implementation=citrus-sendmail mediator-priority=site path=usr/lib/sendmail target=../../opt/sendmail/sbin/sendmail /opt/sendmail being where our Sendmail is. Similar entries for mailq, newaliases and /usr/sbin/sendmail. Andy -- Citrus IT Limited | +44 (0)870 199 8000 | enquiries at citrus-it.co.uk Rock House Farm | Green Moor | Wortley | Sheffield | S35 7DQ Registered in England and Wales | Company number 4899123 From henson at acm.org Tue Mar 10 00:08:32 2015 From: henson at acm.org (Paul B. Henson) Date: Mon, 9 Mar 2015 17:08:32 -0700 Subject: [OmniOS-discuss] Bloody // mailwrapper & mta mediator In-Reply-To: References: <20150309040719.GS25463@bender.unx.csupomona.edu> <24D36878-45A2-4088-86E4-AF079CF6E81B@omniti.com> <355501d05ab2$d6ac29b0$84047d10$@acm.org> Message-ID: <20150310000832.GT25463@bender.unx.csupomona.edu> On Mon, Mar 09, 2015 at 05:53:51PM -0400, Dan McDonald wrote: > The mailwrapper manifest uses mediators to do the right thing. We don't build postfix as an IPS package, we use pkgsrc. So I don't think mediators are going to work for me... From henson at acm.org Tue Mar 10 00:10:09 2015 From: henson at acm.org (Paul B. Henson) Date: Mon, 9 Mar 2015 17:10:09 -0700 Subject: [OmniOS-discuss] Bloody // mailwrapper & mta mediator In-Reply-To: References: <20150309040719.GS25463@bender.unx.csupomona.edu> <24D36878-45A2-4088-86E4-AF079CF6E81B@omniti.com> <355501d05ab2$d6ac29b0$84047d10$@acm.org> Message-ID: <20150310001009.GU25463@bender.unx.csupomona.edu> On Mon, Mar 09, 2015 at 10:34:10PM +0000, Andy wrote: > You could do that - updating /etc/mailer.conf and leaving mailwrapper as > the /usr/lib/sendmail - but you're probably better off updating your > postfix package to use mediated symlinks. If you set the priority to > 'site' it should override mailwrapper upon installation. So what's the point of having mailwrapper at all if some other package is going to basically completely replace everything it provides? At least if you're using an IPS packaged MTA. From stephan.budach at JVM.DE Tue Mar 10 10:48:39 2015 From: stephan.budach at JVM.DE (Stephan Budach) Date: Tue, 10 Mar 2015 11:48:39 +0100 Subject: [OmniOS-discuss] lsi sas 9211-8i it efi firmware 20.00.02.00 In-Reply-To: References: <32b301d057ba$d6168cc0$8243a640$@acm.org> <54F9FD0B.1040601@gnaa.net> <331901d05846$e406cbb0$ac146310$@acm.org> <54FA198B.7090300@gnaa.net> <54FA244B.1010600@gnaa.net> <54FB65FD.6040600@gmail.com> Message-ID: <54FECC07.7060606@jvm.de> Am 09.03.15 um 15:47 schrieb Dan McDonald: >> On Mar 9, 2015, at 10:23 AM, Eric Sproul wrote: >> >> On Sat, Mar 7, 2015 at 3:56 PM, Brogy?nyi J?zsef wrote: >>> Has anyone tested this firmware? Is it free from this error message "Parity >>> Error on path"? >>> Thanks any information. >> P20 firmware is known to be toxic; just google for "lsi p20 firmware" >> for the carnage. >> >> P19 and below are fine, as far as I know. > I've not heard good things about 19. I HAVE heard that 18 is the best level of FW to run for right now. > > Thanks! > Dan Is there a known good way to flash a LSI back to P18 if it already came with P19? I happen to have two new LSIs running P19. Afaik, the readme explicitly warns about flashing back the fw? 
Cheers, budy From filip.marvan at aira.cz Tue Mar 10 12:39:31 2015 From: filip.marvan at aira.cz (Filip Marvan) Date: Tue, 10 Mar 2015 13:39:31 +0100 Subject: [OmniOS-discuss] Howto install Grub on different device Message-ID: <3BE0DEED8863E5429BAE4CAEDF62456503AE56C49C1B@AIRA-SRV.aira.local> Hi, I have HP Microserver G8 with 4 drive bays and one SATA port. I would like to use this separate SATA port for SSD disk with system rpool, but Microserver G8 is not able to boot from this SATA, if AHCI mode is enabled and in drive bays are disks. So would like to try some workround. I would like to install GRUB on SD card, and use this SD card for booting (but all system with rpool will remain on SSD on SATA, only bootloader will be on SD card). I installed GRUB to SD card without any problems with: installgrub /boot/grub/stage1 /boot/grub/stage2 /dev/rdsk/c2t0d0s0 c2t0d0 is my microSD card, without any filesystem installed. There is only one Solaris2 partition. My active bootmenu entry in /rpool/boot/grub/menu.lst looks like this: title omnios-1 bootfs rpool/ROOT/omnios-1 root (hd5,0,a) kernel$ /platform/i86pc/kernel/amd64/unix -B $ZFS-BOOTFS module$ /platform/i86pc/amd64/boot_archive But if I boot HP Microserver from my SD card, it cannot locate my menu.lst config file and fall to grub> shell. If I enter command: configfile (hd5,0,a)/boot/grub/menu.lst I can boot withou any problems in exact way as I wand, but hot to configure GRUB, to use my config file on hd5 automatically? Thank you for any help! Filip Marvan -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: smime.p7s Type: application/pkcs7-signature Size: 6220 bytes Desc: not available URL: From chip at innovates.com Tue Mar 10 13:13:33 2015 From: chip at innovates.com (Schweiss, Chip) Date: Tue, 10 Mar 2015 08:13:33 -0500 Subject: [OmniOS-discuss] lsi sas 9211-8i it efi firmware 20.00.02.00 In-Reply-To: <54FECC07.7060606@jvm.de> References: <32b301d057ba$d6168cc0$8243a640$@acm.org> <54F9FD0B.1040601@gnaa.net> <331901d05846$e406cbb0$ac146310$@acm.org> <54FA198B.7090300@gnaa.net> <54FA244B.1010600@gnaa.net> <54FB65FD.6040600@gmail.com> <54FECC07.7060606@jvm.de> Message-ID: On Tue, Mar 10, 2015 at 5:48 AM, Stephan Budach wrote: > Am 09.03.15 um 15:47 schrieb Dan McDonald: > >> On Mar 9, 2015, at 10:23 AM, Eric Sproul >>> wrote: >>> >>> On Sat, Mar 7, 2015 at 3:56 PM, Brogy?nyi J?zsef >>> wrote: >>> >>>> Has anyone tested this firmware? Is it free from this error message >>>> "Parity >>>> Error on path"? >>>> Thanks any information. >>>> >>> P20 firmware is known to be toxic; just google for "lsi p20 firmware" >>> for the carnage. >>> >>> P19 and below are fine, as far as I know. >>> >> I've not heard good things about 19. I HAVE heard that 18 is the best >> level of FW to run for right now. >> >> Thanks! >> Dan >> > Is there a known good way to flash a LSI back to P18 if it already came > with P19? I happen to have two new LSIs running P19. > Afaik, the readme explicitly warns about flashing back the fw? > > Backwards is hard. I went through that trying to get v20 reverted on some new HBAs. The only method I could find that worked was using the UEFI shell and UEFI sas2flash utility to erase the firmware and install the old version. On older motherboards, the DOS method should work. Solaris/Illumos sas2flash is incapable of erasing the firmware. 
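For anyone needing the downgrade Chip describes, the usual UEFI-shell sequence looks roughly like the following. The flags and file names are the commonly cited ones for a 9211-8i IT-mode image and are assumptions here, not taken from this thread; check them against the README in the P18 download before erasing anything, and record the controller's SAS address first.

    Shell> sas2flash.efi -listall                # note the controller and its SAS address
    Shell> sas2flash.efi -o -e 6                 # advanced mode: erase the flash region
    Shell> sas2flash.efi -o -f 2118it.bin -b mptsas2.rom    # write P18 IT firmware and BIOS
    Shell> sas2flash.efi -o -sasadd 500605bxxxxxxxxx        # re-program the SAS address if it was cleared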
-Chip > Cheers, > budy
From chip at innovates.com Tue Mar 10 14:51:35 2015 From: chip at innovates.com (Schweiss, Chip) Date: Tue, 10 Mar 2015 09:51:35 -0500 Subject: [OmniOS-discuss] smtp-notify dependency on sendmail Message-ID:
I haven't used sendmail since the 1990's and don't intend to change. I've figured out how to get smtp-notify to start with sendmail-client disabled, but it was a manual process of using 'svccfg -s smtp-notify editprop'. What I can't figure out is how to do the same on the command line. Everything I try either gives a syntax error or 'svccfg: No such property group "startup_req".' I really don't want to have to add a manual step to my system setup scripts. What's the proper syntax for this setting?: svccfg -s system/fm/smtp-notify:default setprop "startup_req/entities" = fmri: \"svc:/milestone/multi-user:default svc:/system/fmd:default\" -Chip
From danmcd at omniti.com Tue Mar 10 15:36:54 2015 From: danmcd at omniti.com (Dan McDonald) Date: Tue, 10 Mar 2015 11:36:54 -0400 Subject: [OmniOS-discuss] smtp-notify dependency on sendmail In-Reply-To: References: Message-ID: <430FB4D8-1AB1-4675-B39F-84BE3928572B@omniti.com> > On Mar 10, 2015, at 10:51 AM, Schweiss, Chip wrote: > > svccfg -s system/fm/smtp-notify:default setprop "startup_req/entities" = fmri: \"svc:/milestone/multi-user:default svc:/system/fmd:default\" First off, the :default is for an *instance*. You want to lose that, as startup_req/entities is for the whole service. Second off, I don't know how to glom two FMRIs in one command line. Here's my proposed, two-command solution: svccfg -s system/fm/smtp-notify setprop startup_req/entities = fmri: svc:/milestone/multi-user:default svccfg -s system/fm/smtp-notify addpropvalue startup_req/entities fmri: svc:/system/fmd:default I got this to work on one of my VMs I use for bloody. Please confirm/deny this works for you? Hope this helps, Dan
From omnios at citrus-it.net Tue Mar 10 15:50:28 2015 From: omnios at citrus-it.net (Andy) Date: Tue, 10 Mar 2015 15:50:28 +0000 (GMT) Subject: [OmniOS-discuss] smtp-notify dependency on sendmail In-Reply-To: <430FB4D8-1AB1-4675-B39F-84BE3928572B@omniti.com> References: <430FB4D8-1AB1-4675-B39F-84BE3928572B@omniti.com> Message-ID: On Tue, 10 Mar 2015, Dan McDonald wrote: ; ; > On Mar 10, 2015, at 10:51 AM, Schweiss, Chip wrote: ; > ; > svccfg -s system/fm/smtp-notify:default setprop "startup_req/entities" = fmri: \"svc:/milestone/multi-user:default svc:/system/fmd:default\" Also, note that this list of dependencies is the default for bloody as the sendmail-client dependency was removed; you will be able to stop using this workaround in the future.
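For completeness, svccfg can also take both FMRIs in a single setprop by using its list syntax; the one-liner below is a sketch of that form (the shell quoting around the parentheses is the fiddly part), followed by a refresh and a read-back to confirm what was stored.

    # Single-command variant using svccfg's multi-valued list syntax
    svccfg -s system/fm/smtp-notify setprop startup_req/entities = fmri: \
        '("svc:/milestone/multi-user:default" "svc:/system/fmd:default")'

    # Pick up the change and verify it
    svcadm refresh svc:/system/fm/smtp-notify:default
    svcprop -p startup_req/entities system/fm/smtp-notify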
Andy

--
Citrus IT Limited | +44 (0)870 199 8000 | enquiries at citrus-it.co.uk
Rock House Farm | Green Moor | Wortley | Sheffield | S35 7DQ
Registered in England and Wales | Company number 4899123

From chip at innovates.com Tue Mar 10 15:55:50 2015
From: chip at innovates.com (Schweiss, Chip)
Date: Tue, 10 Mar 2015 10:55:50 -0500
Subject: [OmniOS-discuss] smtp-notify dependency on sendmail
In-Reply-To: <430FB4D8-1AB1-4675-B39F-84BE3928572B@omniti.com>
References: <430FB4D8-1AB1-4675-B39F-84BE3928572B@omniti.com>
Message-ID: 

On Tue, Mar 10, 2015 at 10:36 AM, Dan McDonald wrote:
>
> svccfg -s system/fm/smtp-notify setprop startup_req/entities = fmri:
> svc:/milestone/multi-user:default
> svccfg -s system/fm/smtp-notify addpropvalue startup_req/entities fmri:
> svc:/system/fmd:default

That's the trick. Thanks!

-Chip
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From jimklimov at cos.ru Tue Mar 10 17:29:27 2015
From: jimklimov at cos.ru (Jim Klimov)
Date: Tue, 10 Mar 2015 18:29:27 +0100
Subject: [OmniOS-discuss] smtp-notify dependency on sendmail
In-Reply-To: 
References: 
Message-ID: <5F51C49D-0B8E-42BC-A91B-502383F3D424@cos.ru>

On 10 March 2015 at 15:51:35 CET, "Schweiss, Chip" wrote:
>I haven't used sendmail since the 1990's and don't intend to change.
>
>I've figured out how to get smtp-notify to start with sendmail-client
>disable, but it was a manual process of using 'svccfg -s smtp-notify
>editprop'
>
>What I can't figure out how to do the same on the command line.
>Everything
>I try either gives a syntax error or 'svccfg: No such property group
>"startup_req".' I really don't want to have to add a manual step to my
>system setup scripts.
>
>What's the proper syntax for this setting?:
>
>svccfg -s system/fm/smtp-notify:default setprop "startup_req/entities"
>=
>fmri: \"svc:/milestone/multi-user:default svc:/system/fmd:default\"
>
>-Chip
>
>
>------------------------------------------------------------------------
>
>_______________________________________________
>OmniOS-discuss mailing list
>OmniOS-discuss at lists.omniti.com
>http://lists.omniti.com/mailman/listinfo/omnios-discuss

You may have to 'addpg' the property group first. Just 'editprop' any service to see syntax examples.

Choose to not depend on success of 'addpg' (it will fail if the pg is present already) but do check success of 'setprop'.

HTH, Jim
--
Typos courtesy of K-9 Mail on my Samsung Android

From danmcd at omniti.com Tue Mar 10 17:36:23 2015
From: danmcd at omniti.com (Dan McDonald)
Date: Tue, 10 Mar 2015 13:36:23 -0400
Subject: [OmniOS-discuss] smtp-notify dependency on sendmail
In-Reply-To: <5F51C49D-0B8E-42BC-A91B-502383F3D424@cos.ru>
References: <5F51C49D-0B8E-42BC-A91B-502383F3D424@cos.ru>
Message-ID: <44AD0F5A-4A1C-4A25-B5AA-DEFE9D8D3EAF@omniti.com>

> On Mar 10, 2015, at 1:29 PM, Jim Klimov wrote:
>
> You may have to 'addpg' the property group first. Just 'editprop' any service to see syntax examples.

No he won't have to in this case. The pg was already there.

> Choose to not depend on success of 'addpg' (it will fail if the pg is present already) but do check success of 'setprop'.

I did, and it worked for me. He did, and it appears to work for him too.

GENERALLY SPEAKING checking for the pg is a good idea. In this particular case, it's not needed.
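If you do want the defensive pattern Jim describes for the general case, a sketch would look something like this (the 'dependency' pg type below is a guess on my part -- check what an existing system reports with listpg before copying it):

#!/bin/sh
SVC=system/fm/smtp-notify
# Don't abort if the property group already exists...
svcprop -p startup_req $SVC >/dev/null 2>&1 || svccfg -s $SVC addpg startup_req dependency
# ...but do insist that the setprop/addpropvalue themselves succeed.
svccfg -s $SVC setprop startup_req/entities = fmri: svc:/milestone/multi-user:default || exit 1
svccfg -s $SVC addpropvalue startup_req/entities fmri: svc:/system/fmd:default || exit 1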
Dan From danmcd at omniti.com Tue Mar 10 18:02:32 2015 From: danmcd at omniti.com (Dan McDonald) Date: Tue, 10 Mar 2015 14:02:32 -0400 Subject: [OmniOS-discuss] 2nd-to-last bloody update for r151013 Message-ID: <694C0E20-17D7-468C-9C40-2461EE6350EB@omniti.com> Hey folks! We're winding down to the release of r151014, which is not just the next Stable, but also the next LTS, replacing r151006. No release media for this release, but the repo has been updated COMPLETELY. This means you'll get prompted to "pkg update pkg" first, followed by a proper "pkg update". The last bloody didn't have a lot of changes. This one does. Let's go over them: * The distro_const(1M) command, which creates installation ISOs, now has signature policy as a configurable. This is in anticipation of r151014's change to have the "omnios" repository REQUIRE SIGNATURES. * omnios-build master branch, revision 51bf0ac * The linked-ipkg (lipkg) brand is now part of the entire consolidation. * Bash is now 4.3PL33 * Bind is now 9.10.2 * gnu-binutils is now 2.25 * bison is now 3.0.4 * NSS is at 3.17.4, NSPR is at 4.10.7, and ca-bundle has NSS 3.17.4 goodies in it. * Curl is now 7.41.0 * Amazon EC2 API is now 1.7.3.0 * gmp is now 6.0.0a (but versioned as 6.0.0 like its tarball) * gettext is 0.19.4 * gnu-grep is 2.21 * ipmitool is now 1.8.15 * iso-codes are now 3.57 * numpy is 1.9.2 * libidn is now 1.30 * libpcap is now 1.6.2 * NTP is now 6.7p1 * gnu-patch is now 2.7.4 * pv/pipe-viewer is now 1.5.7 * lxml-26 is now 3.4.2 * Mako is now 1.0.1 * pycurl is now 7.19.5.1 * simplejson is now 3.6.5 * sigcpp is now 2.4.0 * sqlite-3 is now 3.8.8.2 * git 2.3.0 is now properly versioned in omnios-userland. * illumos-omnios master branch, revision dd90365 (last illumos-gate merge e492095) * Zones now inherit the global zone's per-publisher signature policies both upon creation and upon attach. * A softening of Illumos's "mailwrapper" package dependencies in the hopes of allowing custom sendmails more room to play in /etc/. * Various small bugfixes all over the system, including ZFS. * beadm(1M) now sorts by BE creation date (and can sort other ways with new options). * While not available in the installation tools yet (and might not be until the r151016 release), you can now create a bootable root ZFS pool on EFI/GPI disks. (Illumos #5125 and #5560-1.) We have one more update for omnios-build (libffi, if needed), and we're planning to take some more from upstream illumos-gate before we close for the r151014 release (I'm hopeful several new Ethernet chipsets will be showing up). Please try out zone creation and upgrades if you haven't already! And make sure the new versions of any software mentioned above aren't surprising you. (We've had no surprises thus far.) Thanks! Dan From tim at multitalents.net Tue Mar 10 23:32:40 2015 From: tim at multitalents.net (Tim Rice) Date: Tue, 10 Mar 2015 16:32:40 -0700 (PDT) Subject: [OmniOS-discuss] lsi sas 9211-8i it efi firmware 20.00.02.00 In-Reply-To: <54FECC07.7060606@jvm.de> References: <32b301d057ba$d6168cc0$8243a640$@acm.org> <54F9FD0B.1040601@gnaa.net> <331901d05846$e406cbb0$ac146310$@acm.org> <54FA198B.7090300@gnaa.net> <54FA244B.1010600@gnaa.net> <54FB65FD.6040600@gmail.com> <54FECC07.7060606@jvm.de> Message-ID: On Tue, 10 Mar 2015, Stephan Budach wrote: > Is there a known good way to flash a LSI back to P18 if it already came with > P19? I happen to have two new LSIs running P19. > Afaik, the readme explicitly warns about flashing back the fw? 
Here are my notes from downgrading from P20 to P19 on a Supermicro box. Modify as needed to go from P19 to P18.

......
downgrade to P19. P20 has serious bugs.

boot into UEFI shell
get to usb
  fs1:
get to fw dir
  cd 9211_8i.p19
erase newer fw
  sas2flash.efi -o -e 6
load new fw and bios
  sas2flash.efi -o -l 2118.log -f 2118it.bin -b mptsas2.rom
......

--
Tim Rice		Multitalents
tim at multitalents.net

From omnios at citrus-it.net Wed Mar 11 00:35:13 2015
From: omnios at citrus-it.net (Andy)
Date: Wed, 11 Mar 2015 00:35:13 +0000 (GMT)
Subject: [OmniOS-discuss] Dell vs. Supermicro and any recommendations..
In-Reply-To: <81F2A38D-5298-4FB8-B0BB-24E4506D6040@omniti.com>
References: <81F2A38D-5298-4FB8-B0BB-24E4506D6040@omniti.com>
Message-ID: 

On Fri, 27 Feb 2015, Dan McDonald wrote:
;
; > On Feb 27, 2015, at 5:05 AM, Andy wrote:
; >
; > If we go ahead, I'll let you all know how it goes!
;
; Please do that. If you can zap a Dell Standard HBA out of HW-RAID and into a raw-disk controller, that'd be a HUGE WIN for illumos distros everywhere.

Initial results look good to me. This is a Dell R730 with a PERC H730 RAID card in it and just a pair of 300GB SAS disks for now. The card is configured in non-RAID/HBA mode through the standard BIOS menus.

# prtconf -d
...
    pci8086,2f02 (pciex8086,2f02) [Intel Corporation Haswell-E PCI Express Root Port 1], instance #0
        pci1028,1f49 (pciex1000,5d) [LSI Logic / Symbios Logic MegaRAID SAS-3 3108 [Invader]], instance #0
            sd, instance #0
            sd, instance #1

# iostat -En
c0t0d1  Soft Errors: 0 Hard Errors: 0 Transport Errors: 0
Vendor: SEAGATE  Product: ST300MP0005  Revision: VS08  Serial No: S7xxx
Size: 300.00GB <300000000000 bytes>
Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0
Illegal Request: 0 Predictive Failure Analysis: 0
c0t1d1  Soft Errors: 0 Hard Errors: 0 Transport Errors: 0
Vendor: SEAGATE  Product: ST300MP0005  Revision: VS08  Serial No: S7xxx
Size: 300.00GB <300000000000 bytes>
Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0
Illegal Request: 0 Predictive Failure Analysis: 0

I do get this during boot:

SunOS Release 5.11 Version omnios-10b9c79 64-bit
Copyright (c) 1983, 2010, Oracle and/or its affiliates. All rights reserved.
NOTICE: map sync received, switched map_id to 1
NOTICE: LDMAP sync completed.
WARNING: /pci at 0,0/pci8086,2f02 at 1/pci1028,1f49 at 0/sd at 0,1 (sd0):
	Command failed to complete...Device is gone
WARNING: /pci at 0,0/pci8086,2f02 at 1/pci1028,1f49 at 0/sd at 1,1 (sd1):
	Command failed to complete...Device is gone

but everything seems ok afterwards. Will continue with testing tomorrow.

Andy

--
Citrus IT Limited | +44 (0)870 199 8000 | enquiries at citrus-it.co.uk
Rock House Farm | Green Moor | Wortley | Sheffield | S35 7DQ
Registered in England and Wales | Company number 4899123

From tobi at oetiker.ch Wed Mar 11 08:20:04 2015
From: tobi at oetiker.ch (Tobias Oetiker)
Date: Wed, 11 Mar 2015 09:20:04 +0100 (CET)
Subject: [OmniOS-discuss] About P19
Message-ID: 

Dan,

you mentioned in an earlier post that you had not heard anything good about P19 ... this seems to prompt people to consider downgrading to P18 ...

Did you mean to say that you had heard something BAD about P19, or just nothing at all? Because I like my firmware best when it just does what it is supposed to do and no one even thinks about it.

We are running P19 currently on one of our boxes, and it works ok.
(It did not solve the problem that prompted us to upgrade, which is that we are seeing disks going offline for a few seconds every few weeks causing zfs to mark them as faulted. But it did not make it worse either, so we are looking at the disk firmware now ... ) cheers tobi -- Tobi Oetiker, OETIKER+PARTNER AG, Aarweg 15 CH-4600 Olten, Switzerland www.oetiker.ch tobi at oetiker.ch +41 62 775 9902 From johan.kragsterman at capvert.se Wed Mar 11 09:12:07 2015 From: johan.kragsterman at capvert.se (Johan Kragsterman) Date: Wed, 11 Mar 2015 10:12:07 +0100 Subject: [OmniOS-discuss] Ang: About P19 In-Reply-To: References: Message-ID: An HTML attachment was scrubbed... URL: From eric.sproul at circonus.com Wed Mar 11 14:16:38 2015 From: eric.sproul at circonus.com (Eric Sproul) Date: Wed, 11 Mar 2015 10:16:38 -0400 Subject: [OmniOS-discuss] 2nd-to-last bloody update for r151013 In-Reply-To: <694C0E20-17D7-468C-9C40-2461EE6350EB@omniti.com> References: <694C0E20-17D7-468C-9C40-2461EE6350EB@omniti.com> Message-ID: On Tue, Mar 10, 2015 at 2:02 PM, Dan McDonald wrote: > Hey folks! We're winding down to the release of r151014, which is not just the next Stable, but also the next LTS, replacing r151006. No release media for this release, but the repo has been updated COMPLETELY. This means you'll get prompted to "pkg update pkg" first, followed by a proper "pkg update". Thanks Dan, I just upgraded to the latest and noticed that arcstat throws this error before every line of stats output: Use of uninitialized value in division (/) at /usr/bin/arcstat line 329. It looks like a proposed fix is up for review on the OpenZFS dev list: https://reviews.csiden.org/r/164/ and the illumos bug report is https://www.illumos.org/issues/5564 Eric From danmcd at omniti.com Wed Mar 11 14:22:14 2015 From: danmcd at omniti.com (Dan McDonald) Date: Wed, 11 Mar 2015 10:22:14 -0400 Subject: [OmniOS-discuss] About P19 In-Reply-To: References: Message-ID: <97E9D2FC-2B37-49C9-966B-A21B65361532@omniti.com> > On Mar 11, 2015, at 4:20 AM, Tobias Oetiker wrote: > > Dan, > > you mentioned in an earlier post that you had not heard anything > good about P19 ... this seems to prompt people to consider > downgreading to P18 ... I've heard little/nothing about P19. I've only heard P18 is known to be good. Dan From danmcd at omniti.com Wed Mar 11 14:39:18 2015 From: danmcd at omniti.com (Dan McDonald) Date: Wed, 11 Mar 2015 10:39:18 -0400 Subject: [OmniOS-discuss] 2nd-to-last bloody update for r151013 In-Reply-To: References: <694C0E20-17D7-468C-9C40-2461EE6350EB@omniti.com> Message-ID: <98E3747C-7E1B-41B6-8075-30A68535AB20@omniti.com> > On Mar 11, 2015, at 10:16 AM, Eric Sproul wrote: > > On Tue, Mar 10, 2015 at 2:02 PM, Dan McDonald wrote: >> Hey folks! We're winding down to the release of r151014, which is not just the next Stable, but also the next LTS, replacing r151006. No release media for this release, but the repo has been updated COMPLETELY. This means you'll get prompted to "pkg update pkg" first, followed by a proper "pkg update". > > Thanks Dan, > I just upgraded to the latest and noticed that arcstat throws this > error before every line of stats output: > > Use of uninitialized value in division (/) at /usr/bin/arcstat line 329. 
> > It looks like a proposed fix is up for review on the OpenZFS dev list: > https://reviews.csiden.org/r/164/ and the illumos bug report is > https://www.illumos.org/issues/5564 It will show up on the next bloody update come hell (via upstream) or high water (if I have to merge it manually myself). Dan From danmcd at omniti.com Wed Mar 11 14:42:35 2015 From: danmcd at omniti.com (Dan McDonald) Date: Wed, 11 Mar 2015 10:42:35 -0400 Subject: [OmniOS-discuss] 2nd-to-last bloody update for r151013 In-Reply-To: <98E3747C-7E1B-41B6-8075-30A68535AB20@omniti.com> References: <694C0E20-17D7-468C-9C40-2461EE6350EB@omniti.com> <98E3747C-7E1B-41B6-8075-30A68535AB20@omniti.com> Message-ID: <78215C27-657C-445F-97C2-12B5DC21877F@omniti.com> > On Mar 11, 2015, at 10:39 AM, Dan McDonald wrote: > >> >> It looks like a proposed fix is up for review on the OpenZFS dev list: >> https://reviews.csiden.org/r/164/ and the illumos bug report is >> https://www.illumos.org/issues/5564 > > It will show up on the next bloody update come hell (via upstream) or high water (if I have to merge it manually myself). Actually, I approved the RTI for 5564 late yesterday. It's literally just the committer typing "git commit --amend" (adding a missing reviewer credit) and "git push" and it'll be in our next pull from upstream. :) Dan From chip at innovates.com Wed Mar 11 14:48:11 2015 From: chip at innovates.com (Schweiss, Chip) Date: Wed, 11 Mar 2015 09:48:11 -0500 Subject: [OmniOS-discuss] About P19 In-Reply-To: <97E9D2FC-2B37-49C9-966B-A21B65361532@omniti.com> References: <97E9D2FC-2B37-49C9-966B-A21B65361532@omniti.com> Message-ID: I have P19 on 3 active servers. No issues. I consider it safe. Also interesting, P20 was on them when I first purchased them. It was nearly a month of usage before I found out about P20 and then downgraded. I didn't have any problems with P20 like others were seeing. -Chip On Wed, Mar 11, 2015 at 9:22 AM, Dan McDonald wrote: > > > On Mar 11, 2015, at 4:20 AM, Tobias Oetiker wrote: > > > > Dan, > > > > you mentioned in an earlier post that you had not heard anything > > good about P19 ... this seems to prompt people to consider > > downgreading to P18 ... > > I've heard little/nothing about P19. I've only heard P18 is known to be > good. > > Dan > > _______________________________________________ > OmniOS-discuss mailing list > OmniOS-discuss at lists.omniti.com > http://lists.omniti.com/mailman/listinfo/omnios-discuss > -------------- next part -------------- An HTML attachment was scrubbed... URL: From danmcd at omniti.com Wed Mar 11 15:18:41 2015 From: danmcd at omniti.com (Dan McDonald) Date: Wed, 11 Mar 2015 11:18:41 -0400 Subject: [OmniOS-discuss] TEST REQUEST - Running kvm in a zone verification Message-ID: <5F988F8F-D3C4-4085-98C6-2D0D3D27E3AB@omniti.com> This most recent update to bloody has put the /dev/kvm device entry into non-global ipkg or lipkg zones. The idea is you can run KVM instances in zones, which is something not available in r151012 or earlier. Our standard methods for running KVM apply: http://omnios.omniti.com/wiki.php/VirtualMachinesKVM But you MUST FIRST dedicate a vnic to the zone in question from the global zone: by creating one in the global and then "add net/set physical" in zonecfg(1M). Furthemore, that vnic cannot be used for that zone's normal activity (so you'll likely need two vnics, unless you want the zone to do nothing but run KVM). You MUST also dedicate a filesystem to the zone using the "dataset" methods in zonecfg(1M). 
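To make that concrete, here is an untested sketch of the global-zone prep and zonecfg I have in mind (the igb0 link, the zone name, the zonepath and the tank/kvmzone dataset are all placeholders):

# In the global zone: one vnic for the zone itself, one reserved for the KVM guest
dladm create-vnic -l igb0 kvmzone0
dladm create-vnic -l igb0 kvmguest0

zonecfg -z kvmzone
create -b
set brand=lipkg
set zonepath=/zones/kvmzone
set ip-type=exclusive
set autoboot=false
add net
set physical=kvmzone0
end
add net
set physical=kvmguest0
end
add dataset
set name=tank/kvmzone
end
commit
exit

Install and boot the zone as usual, then run qemu-system-x86_64 from inside it against a zvol created under the delegated dataset.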
I've not been able to test this yet, so I cannot yet make the claim that "you can run KVM in a zone starting with r151014". I would appreciate some community help here. I have *some* availability for questions and help, but I really would like someone to take this and run with it. Thanks, Dan From jdg117 at elvis.arl.psu.edu Wed Mar 11 18:23:05 2015 From: jdg117 at elvis.arl.psu.edu (John D Groenveld) Date: Wed, 11 Mar 2015 14:23:05 -0400 Subject: [OmniOS-discuss] TEST REQUEST - Running kvm in a zone verification In-Reply-To: Your message of "Wed, 11 Mar 2015 11:18:41 EDT." <5F988F8F-D3C4-4085-98C6-2D0D3D27E3AB@omniti.com> References: <5F988F8F-D3C4-4085-98C6-2D0D3D27E3AB@omniti.com> Message-ID: <201503111823.t2BIN5NY007656@elvis.arl.psu.edu> In message <5F988F8F-D3C4-4085-98C6-2D0D3D27E3AB at omniti.com>, Dan McDonald writ es: >This most recent update to bloody has put the /dev/kvm device entry into non-g >lobal ipkg or lipkg zones. The idea is you can run KVM instances in zones, wh >ich is something not available in r151012 or earlier. You can run a single KVM instance in a zone with r151012 but you must add /dev/dld and /dev/kvm. >I've not been able to test this yet, so I cannot yet make the claim that "you >can run KVM in a zone starting with r151014". I would appreciate some communi >ty help here. I have *some* availability for questions and help, but I really > would like someone to take this and run with it. Awesome! Now to hunt down some raw iron for bloody. John groenveld at acm.org # cat /etc/release OmniOS v11 r151012 Copyright 2014 OmniTI Computer Consulting, Inc. All rights reserved. Use is subject to license terms # zonecfg -z doors export create -b set zonepath=/var/opt/zones/doors set brand=ipkg set autoboot=false set ip-type=exclusive add net set physical=vnic2 end add net set physical=vnic3 end add device set match=/dev/kvm end add device set match=/dev/dld end #!/usr/bin/bash # configuration NAME=doors VNIC=vnic3 HDD=/root/doors.raw CD=/root/openSUSE-13.2-DVD-x86_64.iso VNC=5 MEM=8192 mac=`dladm show-vnic -po macaddress $VNIC` /usr/bin/qemu-system-x86_64 \ -name $NAME \ -boot cd \ -enable-kvm \ -vnc 0.0.0.0:$VNC \ -cpu host \ -smp 4 \ -m $MEM \ -no-hpet \ -usbdevice tablet \ -localtime \ -drive file=$HDD,if=ide,index=0 \ -drive file=$CD,media=cdrom,if=ide,index=2 \ -net nic,vlan=0,name=net0,model=e1000,macaddr=$mac \ -net vnic,vlan=0,name=net0,ifname=$VNIC,macaddr=$mac \ -vga cirrus \ -monitor unix:/tmp/$NAME.monitor,server,nowait,nodelay \ -daemonize if [ $? -gt 0 ]; then echo "Failed to start VM" fi port=`expr 5900 + $VNC` public_nic=$(dladm show-vnic|grep vnic2|awk '{print $2}') public_ip=$(ifconfig $public_nic|grep inet|awk '{print $2}') echo "Started VM:" echo "Public: ${public_ip}:${port}" From danmcd at omniti.com Wed Mar 11 19:05:57 2015 From: danmcd at omniti.com (Dan McDonald) Date: Wed, 11 Mar 2015 15:05:57 -0400 Subject: [OmniOS-discuss] TEST REQUEST - Running kvm in a zone verification In-Reply-To: <201503111823.t2BIN5NY007656@elvis.arl.psu.edu> References: <5F988F8F-D3C4-4085-98C6-2D0D3D27E3AB@omniti.com> <201503111823.t2BIN5NY007656@elvis.arl.psu.edu> Message-ID: <8AFC27C8-2D56-4821-8525-F7B944D57F45@omniti.com> > On Mar 11, 2015, at 2:23 PM, John D Groenveld wrote: > > In message <5F988F8F-D3C4-4085-98C6-2D0D3D27E3AB at omniti.com>, Dan McDonald writ > es: >> This most recent update to bloody has put the /dev/kvm device entry into non-g >> lobal ipkg or lipkg zones. 
The idea is you can run KVM instances in zones, wh >> ich is something not available in r151012 or earlier. > > You can run a single KVM instance in a zone with r151012 > but you must add /dev/dld and /dev/kvm. Oh hell! I actually didn't add KVM into the platform.xml files. /dev/dld *is* in the zones, if you use exclusive-stack (and I don't know a good reason NOT to these days...). So your script still needs to add /dev/kvm, until I patch illumos-omnios with /dev/kvm in the appropriate platform.xml files. >> I've not been able to test this yet, so I cannot yet make the claim that "you >> can run KVM in a zone starting with r151014". I would appreciate some communi >> ty help here. I have *some* availability for questions and help, but I really >> would like someone to take this and run with it. > > Awesome! > Now to hunt down some raw iron for bloody. I'm going to have to push this back: commit af30091afd0ccd9320c3aee83ac15318e8d9e78f Author: Dan McDonald Date: Wed Mar 11 15:02:54 2015 -0400 Add kvm device accessability to ipkg/lipkg zones. diff --git a/usr/src/lib/brand/ipkg/zone/platform.xml b/usr/src/lib/brand/ipkg/zone/platform.xml index db40c9f..1e4fd5c 100644 --- a/usr/src/lib/brand/ipkg/zone/platform.xml +++ b/usr/src/lib/brand/ipkg/zone/platform.xml @@ -54,6 +54,7 @@ + diff --git a/usr/src/lib/brand/lipkg/zone/platform.xml b/usr/src/lib/brand/lipkg/zone/platform.xml index c5c6041..7433d22 100644 --- a/usr/src/lib/brand/lipkg/zone/platform.xml +++ b/usr/src/lib/brand/lipkg/zone/platform.xml @@ -54,6 +54,7 @@ + Thanks for the help and reality check! Dan From danmcd at omniti.com Wed Mar 11 19:27:57 2015 From: danmcd at omniti.com (Dan McDonald) Date: Wed, 11 Mar 2015 15:27:57 -0400 Subject: [OmniOS-discuss] rsync & MacOS Message-ID: <3FFD70E1-8FFD-4440-8AC4-2CA310D3B2E7@omniti.com> MacOS is weird because of its forked files. I use NFS and everything works out okay. I would like, however, to use rsync to mirror my home directory, as even with GigE, it takes a long time to back things up from scratch. Does the MacOS X native rsync client work with 10.6 or 10.10 (I have machines running both, but nothing in between)? Do I need special patches either on my clients or on my OmniOS server (r151012 for now, 014 shortly coming). There's no rsync changes between 012 and 014 (3.1.1), so if it works for 012, it will keep working. Thanks, Dan From lists at marzocchi.net Wed Mar 11 19:38:16 2015 From: lists at marzocchi.net (Olaf Marzocchi) Date: Wed, 11 Mar 2015 20:38:16 +0100 Subject: [OmniOS-discuss] rsync & MacOS In-Reply-To: <3FFD70E1-8FFD-4440-8AC4-2CA310D3B2E7@omniti.com> References: <3FFD70E1-8FFD-4440-8AC4-2CA310D3B2E7@omniti.com> Message-ID: <01391483-5BF3-4642-A592-4405EB2CCCA7@marzocchi.net> Hi Dan, I can give you some links, as in the past I was using rsync to backup OSX data. https://static.afp548.com/mactips/rsync.html (some old hints about how to compile rsync) http://www.n8gray.org/code/backup-bouncer/ (a script to verify correct backups of OSX data using rsync) In any case, I think that if you compile rsync by adding the obviously named patches provided with rsync itself, you should be fine. It?s also true that recent OSXs almost don?t use resource forks, if you backup xattrs it should be enough. Olaf > Il giorno 11/mar/2015, alle ore 20:27, Dan McDonald ha scritto: > > MacOS is weird because of its forked files. I use NFS and everything works out okay. 
I would like, however, to use rsync to mirror my home directory, as even with GigE, it takes a long time to back things up from scratch. > > Does the MacOS X native rsync client work with 10.6 or 10.10 (I have machines running both, but nothing in between)? Do I need special patches either on my clients or on my OmniOS server (r151012 for now, 014 shortly coming). There's no rsync changes between 012 and 014 (3.1.1), so if it works for 012, it will keep working. > > Thanks, > Dan > > _______________________________________________ > OmniOS-discuss mailing list > OmniOS-discuss at lists.omniti.com > http://lists.omniti.com/mailman/listinfo/omnios-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From danmcd at omniti.com Wed Mar 11 19:40:43 2015 From: danmcd at omniti.com (Dan McDonald) Date: Wed, 11 Mar 2015 15:40:43 -0400 Subject: [OmniOS-discuss] rsync & MacOS In-Reply-To: <01391483-5BF3-4642-A592-4405EB2CCCA7@marzocchi.net> References: <3FFD70E1-8FFD-4440-8AC4-2CA310D3B2E7@omniti.com> <01391483-5BF3-4642-A592-4405EB2CCCA7@marzocchi.net> Message-ID: <2E06E450-631E-4623-A91E-06385AC6C81B@omniti.com> > On Mar 11, 2015, at 3:38 PM, Olaf Marzocchi wrote: > > In any case, I think that if you compile rsync by adding the obviously named patches provided with rsync itself, you should be fine. It?s also true that recent OSXs almost don?t use resource forks, if you backup xattrs it should be enough. I have some OLD files in my homedir, some may even predate MacOS X, so I do worry about resource forks or Creator/Type metadata. Thanks! Dan From lists at marzocchi.net Wed Mar 11 19:45:26 2015 From: lists at marzocchi.net (Olaf Marzocchi) Date: Wed, 11 Mar 2015 20:45:26 +0100 Subject: [OmniOS-discuss] rsync & MacOS In-Reply-To: <2E06E450-631E-4623-A91E-06385AC6C81B@omniti.com> References: <3FFD70E1-8FFD-4440-8AC4-2CA310D3B2E7@omniti.com> <01391483-5BF3-4642-A592-4405EB2CCCA7@marzocchi.net> <2E06E450-631E-4623-A91E-06385AC6C81B@omniti.com> Message-ID: <257E5BA5-E0B8-42D1-962E-51D5CDFBE0CF@marzocchi.net> > Il giorno 11/mar/2015, alle ore 20:40, Dan McDonald ha scritto: > > >> On Mar 11, 2015, at 3:38 PM, Olaf Marzocchi wrote: >> >> In any case, I think that if you compile rsync by adding the obviously named patches provided with rsync itself, you should be fine. It?s also true that recent OSXs almost don?t use resource forks, if you backup xattrs it should be enough. > > I have some OLD files in my homedir, some may even predate MacOS X, so I do worry about resource forks or Creator/Type metadata. I would add the patches then, my first guess: acls fileflags hfs-compression xattrs Backup-bouncer is the key to ensure completeness of the backups :) Olaf From tobi at oetiker.ch Wed Mar 11 20:20:19 2015 From: tobi at oetiker.ch (Tobias Oetiker) Date: Wed, 11 Mar 2015 21:20:19 +0100 (CET) Subject: [OmniOS-discuss] 5296 Support for more than 16 groups with AUTH_SYS Message-ID: Is https://github.com/illumos/illumos-gate/commit/89621fe174cf95ae903df6ceab605bf24d696ac3 in 14 ? 
cheers tobi -- Tobi Oetiker, OETIKER+PARTNER AG, Aarweg 15 CH-4600 Olten, Switzerland www.oetiker.ch tobi at oetiker.ch +41 62 775 9902 From danmcd at omniti.com Wed Mar 11 20:31:08 2015 From: danmcd at omniti.com (Dan McDonald) Date: Wed, 11 Mar 2015 16:31:08 -0400 Subject: [OmniOS-discuss] 5296 Support for more than 16 groups with AUTH_SYS In-Reply-To: References: Message-ID: <6287C694-839D-43CA-8A20-E5067C0564EF@omniti.com> > On Mar 11, 2015, at 4:20 PM, Tobias Oetiker wrote: > > Is > > https://github.com/illumos/illumos-gate/commit/89621fe174cf95ae903df6ceab605bf24d696ac3 > > in 14 ? Sure is: https://github.com/omniti-labs/illumos-omnios/commit/89621fe174cf95ae903df6ceab605bf24d696ac3 Unless it's VERY new, or not in illumos-gate yet, you can assume it's going to be in 014. We close the window on illumos-gate synching sometime in the next 1-3 weeks. Dan From danmcd at omniti.com Wed Mar 11 20:51:21 2015 From: danmcd at omniti.com (Dan McDonald) Date: Wed, 11 Mar 2015 16:51:21 -0400 Subject: [OmniOS-discuss] TEST REQUEST - Running kvm in a zone verification In-Reply-To: <8AFC27C8-2D56-4821-8525-F7B944D57F45@omniti.com> References: <5F988F8F-D3C4-4085-98C6-2D0D3D27E3AB@omniti.com> <201503111823.t2BIN5NY007656@elvis.arl.psu.edu> <8AFC27C8-2D56-4821-8525-F7B944D57F45@omniti.com> Message-ID: <73FC7A22-8BFD-4806-9136-756243BB9ACF@omniti.com> > On Mar 11, 2015, at 3:05 PM, Dan McDonald wrote: > > > Oh hell! I actually didn't add KVM into the platform.xml files. /dev/dld *is* in the zones, if you use exclusive-stack (and I don't know a good reason NOT to these days...). I've just pushed out an update to system/zones, which will require a new BE and a reboot, but it has the kvm in platform.xml for both ipkg and lipkg brands. Dan From cf at ferebee.net Wed Mar 11 21:13:18 2015 From: cf at ferebee.net (Chris Ferebee) Date: Wed, 11 Mar 2015 22:13:18 +0100 Subject: [OmniOS-discuss] rsync & MacOS In-Reply-To: <257E5BA5-E0B8-42D1-962E-51D5CDFBE0CF@marzocchi.net> References: <3FFD70E1-8FFD-4440-8AC4-2CA310D3B2E7@omniti.com> <01391483-5BF3-4642-A592-4405EB2CCCA7@marzocchi.net> <2E06E450-631E-4623-A91E-06385AC6C81B@omniti.com> <257E5BA5-E0B8-42D1-962E-51D5CDFBE0CF@marzocchi.net> Message-ID: <7C8CD1D7-A318-4E15-9B96-4C531C09D16C@ferebee.net> You can get a precompiled copy of rsync 3.0.9 for OS X that includes --xattrs, --acls, --fileflags in the mlbackup package by Pepi Zawodsky (@MacLemon) from Best, Chris > Am 11.03.2015 um 20:45 schrieb Olaf Marzocchi : > > >> Il giorno 11/mar/2015, alle ore 20:40, Dan McDonald ha scritto: >> >> >>> On Mar 11, 2015, at 3:38 PM, Olaf Marzocchi wrote: >>> >>> In any case, I think that if you compile rsync by adding the obviously named patches provided with rsync itself, you should be fine. It?s also true that recent OSXs almost don?t use resource forks, if you backup xattrs it should be enough. >> >> I have some OLD files in my homedir, some may even predate MacOS X, so I do worry about resource forks or Creator/Type metadata. 
> > I would add the patches then, my first guess: > > acls > fileflags > hfs-compression > xattrs > > Backup-bouncer is the key to ensure completeness of the backups :) > > Olaf > _______________________________________________ > OmniOS-discuss mailing list > OmniOS-discuss at lists.omniti.com > http://lists.omniti.com/mailman/listinfo/omnios-discuss From omnios at citrus-it.net Wed Mar 11 23:39:05 2015 From: omnios at citrus-it.net (Andy) Date: Wed, 11 Mar 2015 23:39:05 +0000 (GMT) Subject: [OmniOS-discuss] TEST REQUEST - Running kvm in a zone verification In-Reply-To: <73FC7A22-8BFD-4806-9136-756243BB9ACF@omniti.com> References: <5F988F8F-D3C4-4085-98C6-2D0D3D27E3AB@omniti.com> <201503111823.t2BIN5NY007656@elvis.arl.psu.edu> <8AFC27C8-2D56-4821-8525-F7B944D57F45@omniti.com> <73FC7A22-8BFD-4806-9136-756243BB9ACF@omniti.com> Message-ID: On Wed, 11 Mar 2015, Dan McDonald wrote: ; ; > On Mar 11, 2015, at 3:05 PM, Dan McDonald wrote: ; > ; > ; > Oh hell! I actually didn't add KVM into the platform.xml files. /dev/dld *is* in the zones, if you use exclusive-stack (and I don't know a good reason NOT to these days...). ; ; I've just pushed out an update to system/zones, which will require a new BE and a reboot, but it has the kvm in platform.xml for both ipkg and lipkg brands. Working fine for me in an lipkg zone. root at test:/root# zoneadm list -vc ID NAME STATUS PATH BRAND IP 6 test running / native excl root at test:/root# svccfg -s kvm svc:/system/kvm> add bsd0 svc:/system/kvm> select bsd0 svc:/system/kvm:bsd0> addpg config application svc:/system/kvm:bsd0> setprop config/vnic=bsd0 svc:/system/kvm:bsd0> setprop config/vnc=5 svc:/system/kvm:bsd0> setprop config/mem=4G svc:/system/kvm:bsd0> setprop config/hdd=/dev/zvol/rdsk/test/bsd/hdd0 svc:/system/kvm:bsd0> setprop config/iso=/FreeBSD-9.2-RELEASE-amd64-disc1.iso svc:/system/kvm:bsd0> end root at test:/root# svcadm enable kvm:bsd0 root at test:/root# svcs kvm:bsd0 STATE STIME FMRI online 0:23:08 svc:/system/kvm:bsd0 oot at test:/root# netstat -an | grep 590 *.5905 *.* 0 0 128000 0 LISTEN 172.29.0.95.5905 172.29.0.10.54043 89984 0 128872 0 ESTABLISHED *.5905 *.* 0 0 128000 0 LISTEN root at test:/root# cat /var/svc/log/system-kvm:bsd0.log [ Mar 12 00:29:52 Executing start method ("/lib/svc/method/kvm start"). ] svcprop: Couldn't find property `config/extra' for instance `svc:/system/kvm:bsd0'. 
STARTING WITH: /usr/bin/qemu-system-x86_64 -name bsd0 -enable-kvm -vnc :5 -smp 10 -m 4G -no-hpet -localtime -drive file=/dev/zvol/rdsk/test/bsd/hdd0,if=ide,index=0 -net nic,vlan=0,name=net0,model=e1000,macaddr=2:8:20:25:52:33 -net vnic,vlan=0,name=net0,ifname=bsd0,macaddr=2:8:20:25:52:33 -vga std -daemonize -drive file=/FreeBSD-9.2-RELEASE-amd64-disc1.iso,media=cdrom,if=ide,index=2 -boot cd multiticks: timer_create: Not owner multiticks: could not create timer; disabling timer_create: Not owner Dynamic Ticks disabled qemu-system-x86_64: -net vnic,vlan=0,name=net0,ifname=bsd0,macaddr=2:8:20:25:52:33: vnic dhcp disabled qemu-system-x86_64: -net vnic,vlan=0,name=net0,ifname=bsd0,macaddr=2:8:20:25:52:33: can't ioctl: Invalid argument -- Citrus IT Limited | +44 (0)870 199 8000 | enquiries at citrus-it.co.uk Rock House Farm | Green Moor | Wortley | Sheffield | S35 7DQ Registered in England and Wales | Company number 4899123 From hakansom at ohsu.edu Thu Mar 12 01:16:55 2015 From: hakansom at ohsu.edu (Marion Hakanson) Date: Wed, 11 Mar 2015 18:16:55 -0700 Subject: [OmniOS-discuss] rsync & MacOS In-Reply-To: Message from Dan McDonald of "Wed, 11 Mar 2015 15:40:43 EDT." <2E06E450-631E-4623-A91E-06385AC6C81B@omniti.com> Message-ID: <201503120116.t2C1Gtsu027144@kyklops.ohsu.edu> danmcd at omniti.com said: > I have some OLD files in my homedir, some may even predate MacOS X, so I do > worry about resource forks or Creator/Type metadata. Dan, I like Carbon Copy Cloner for backing up our Macs. It has rsync behind its GUI interface, and seems to handle native HFS+ stuff just fine. I tend to set up a remote .dmg volume on our NFS (or SMB) network share for each Mac, and treat those like whole-volume backups (similar to what Time Machine would do). But CCC also works for just a subdirectory, not just for a whole Mac volume, and to a remote share, not only to a disk image. CCC is shareware these days, but you can still download the freeware version, last I checked. Regards, Marion From lists at marzocchi.net Thu Mar 12 09:37:09 2015 From: lists at marzocchi.net (Olaf Marzocchi) Date: Thu, 12 Mar 2015 10:37:09 +0100 Subject: [OmniOS-discuss] rsync & MacOS In-Reply-To: <201503120116.t2C1Gtsu027144@kyklops.ohsu.edu> References: <201503120116.t2C1Gtsu027144@kyklops.ohsu.edu> Message-ID: <77E987D1-BA80-427B-9524-41B515A90E7D@marzocchi.net> If I remember correctly, in the past the rsync provided in CCC contained patches not available in the main tree and it was the only way to get proper backups. Nowadays the official rsync already has everything you need to backup OS X metadata and CCC is only a nice GUI. However, I think not every patch from Mr. Bombich has been submitted, some minor differences may still be there. A test is recommended. Olaf Il 12 marzo 2015 02:16:55 CET, Marion Hakanson ha scritto: >danmcd at omniti.com said: >> I have some OLD files in my homedir, some may even predate MacOS X, >so I do >> worry about resource forks or Creator/Type metadata. > >Dan, > >I like Carbon Copy Cloner for backing up our Macs. It has rsync behind >its GUI interface, and seems to handle native HFS+ stuff just fine. > >I tend to set up a remote .dmg volume on our NFS (or SMB) network share >for each Mac, and treat those like whole-volume backups (similar to >what >Time Machine would do). But CCC also works for just a subdirectory, >not just for a whole Mac volume, and to a remote share, not only to >a disk image. > >CCC is shareware these days, but you can still download the freeware >version, last I checked. 
> >Regards, > >Marion -------------- next part -------------- An HTML attachment was scrubbed... URL: From omnios at citrus-it.net Thu Mar 12 09:53:58 2015 From: omnios at citrus-it.net (Andy) Date: Thu, 12 Mar 2015 09:53:58 +0000 (GMT) Subject: [OmniOS-discuss] Dell vs. Supermicro and any recommendations.. In-Reply-To: References: <81F2A38D-5298-4FB8-B0BB-24E4506D6040@omniti.com> Message-ID: On Wed, 11 Mar 2015, Andy wrote: ; ; On Fri, 27 Feb 2015, Dan McDonald wrote: ; ; ; ; ; > On Feb 27, 2015, at 5:05 AM, Andy wrote: ; ; > ; ; > If we go ahead, I'll let you all know how it goes! ; ; ; ; Please do that. If you can zap a Dell Standard HBA out of HW-RAID and into a raw-disk controller, that'd be a HUGE WIN for illumos distros everywhere. ; ; Initial results look good to me. This is a Dell R730 with a PERC H730 ; RAID card in it and just a pair of 300GB SAS disks for now. The card ; is configured in non-RAID/HBA mode through the standard BIOS menus. I spoke too soon - disk performance seems generally poor with high service times :( Everything's working apart from the disks briefly going away at boot, just slow. I have another server that I can try to elmininate the hardware but then I'll need to start trying to diagnose this. If anyone has any thoughts on what to look at first or commands to run I'd really appreciate it. It's running with the mr_sas driver and the adapter is in HBA mode. It looks like I have two options there - either HBA mode or RAID mode with the disks in non-RAID mode; not sure what the difference is but I'll try both. In addition to trying the second server, I'm also going to test with both firmware revisions that are available for the PERC, RAID0 sets (just to see) and then other OSs including Solaris 11 and whatever flavour of Linux is supported by Dell. Thanks, Andy -- Citrus IT Limited | +44 (0)870 199 8000 | enquiries at citrus-it.co.uk Rock House Farm | Green Moor | Wortley | Sheffield | S35 7DQ Registered in England and Wales | Company number 4899123 From cf at ferebee.net Thu Mar 12 13:36:56 2015 From: cf at ferebee.net (Chris Ferebee) Date: Thu, 12 Mar 2015 14:36:56 +0100 Subject: [OmniOS-discuss] rsync & MacOS In-Reply-To: <77E987D1-BA80-427B-9524-41B515A90E7D@marzocchi.net> References: <201503120116.t2C1Gtsu027144@kyklops.ohsu.edu> <77E987D1-BA80-427B-9524-41B515A90E7D@marzocchi.net> Message-ID: <9B935575-EB83-4B30-AF1A-97C121ED892C@ferebee.net> I gather it can be a bit tricky to compile rsync correctly for OS X. FWIW, mlbackup (and hence the bundled rsync 3.0.9 binary) is validated with Backup Bouncer, Mike Bombich?s test suite for HFS+ backups. Best, Chris > Am 12.03.2015 um 10:37 schrieb Olaf Marzocchi : > > If I remember correctly, in the past the rsync provided in CCC contained patches not available in the main tree and it was the only way to get proper backups. > Nowadays the official rsync already has everything you need to backup OS X metadata and CCC is only a nice GUI. > However, I think not every patch from Mr. Bombich has been submitted, some minor differences may still be there. > A test is recommended. > > Olaf > > > > Il 12 marzo 2015 02:16:55 CET, Marion Hakanson ha scritto: > danmcd at omniti.com said: > I have some OLD files in my homedir, some may even predate MacOS X, so I do > worry about resource forks or Creator/Type metadata. > > Dan, > > I like Carbon Copy Cloner for backing up our Macs. It has rsync behind > its GUI interface, and seems to handle native HFS+ stuff just fine. 
> > I tend to set up a remote .dmg volume on our NFS (or SMB) network share > for each Mac, and treat those like whole-volume backups (similar to what > Time Machine would do). But CCC also works for just a subdirectory, > not just for a whole Mac volume, and to a remote share, not only to > a disk image. > > CCC is shareware these days, but you can still download the freeware > version, last I checked. > > Regards, > > Marion -------------- next part -------------- A non-text attachment was scrubbed... Name: smime.p7s Type: application/pkcs7-signature Size: 4176 bytes Desc: not available URL: From omnios at citrus-it.net Thu Mar 12 13:49:20 2015 From: omnios at citrus-it.net (Andy) Date: Thu, 12 Mar 2015 13:49:20 +0000 (GMT) Subject: [OmniOS-discuss] Dell vs. Supermicro and any recommendations.. In-Reply-To: References: <81F2A38D-5298-4FB8-B0BB-24E4506D6040@omniti.com> Message-ID: On Thu, 12 Mar 2015, Andy wrote: ; ; On Wed, 11 Mar 2015, Andy wrote: ; ; ; ; ; On Fri, 27 Feb 2015, Dan McDonald wrote: ; ; ; ; ; ; ; ; > On Feb 27, 2015, at 5:05 AM, Andy wrote: ; ; ; > ; ; ; > If we go ahead, I'll let you all know how it goes! ; ; ; ; ; ; Please do that. If you can zap a Dell Standard HBA out of HW-RAID and into a raw-disk controller, that'd be a HUGE WIN for illumos distros everywhere. ; ; ; ; Initial results look good to me. This is a Dell R730 with a PERC H730 ; ; RAID card in it and just a pair of 300GB SAS disks for now. The card ; ; is configured in non-RAID/HBA mode through the standard BIOS menus. ; ; I spoke too soon - disk performance seems generally poor with high service ; times :( Everything's working apart from the disks briefly going away ; at boot, just slow. Same story on a different R730. Abysmal disk performance in HBA mode, apparently regardless of BIOS and RAID card firmware versions; at least based on the four combinations I tried. extended device statistics ---- errors --- r/s w/s Mr/s Mw/s wait actv wsvc_t asvc_t %w %b s/w h/w trn tot device 0.0 142.0 0.0 11.6 0.0 10.0 0.0 70.4 0 100 0 0 0 0 c0t0d1s0 0.0 142.6 0.0 11.6 0.0 10.0 0.0 70.1 0 100 0 0 0 0 c0t1d1s0 and I have seen asvc_t > 300 with this test workload. However, with a mirrored rpool on top of RAID0 devices: extended device statistics ---- errors --- r/s w/s Mr/s Mw/s wait actv wsvc_t asvc_t %w %b s/w h/w trn tot device 0.0 2829.2 0.0 232.0 0.0 9.8 0.0 3.5 2 99 0 0 0 0 c0t0d0s0 0.0 2823.6 0.0 231.5 0.0 9.9 0.0 3.5 2 100 0 0 0 0 c0t1d0s0 Off to play with dtrace and see if I can work out what's happening. Andy -- Citrus IT Limited | +44 (0)870 199 8000 | enquiries at citrus-it.co.uk Rock House Farm | Green Moor | Wortley | Sheffield | S35 7DQ Registered in England and Wales | Company number 4899123 From jdg117 at elvis.arl.psu.edu Thu Mar 12 14:15:11 2015 From: jdg117 at elvis.arl.psu.edu (John D Groenveld) Date: Thu, 12 Mar 2015 10:15:11 -0400 Subject: [OmniOS-discuss] Dell vs. Supermicro and any recommendations.. In-Reply-To: Your message of "Thu, 12 Mar 2015 09:53:58 -0000." 
References: <81F2A38D-5298-4FB8-B0BB-24E4506D6040@omniti.com> Message-ID: <201503121415.t2CEFBcC022831@elvis.arl.psu.edu> In message , Andy writes: >I spoke too soon - disk performance seems generally poor with high service >times :( Everything's working apart from the disks briefly going away Oye...maybe time to tell your Dell sales critter, you're going to send your business elsewhere unless he figures out how to BTO servers with non-RAID SAS HBAs similar to the ones that Dell US sells: Otherwise, good luck debugging MegaRAID drivers and firmware. What's the device ID for your RAID controller? Which version mr_sas(7D) are you using? Which version of the firmware? John groenveld at acm.org From omnios at citrus-it.net Thu Mar 12 14:26:10 2015 From: omnios at citrus-it.net (Andy) Date: Thu, 12 Mar 2015 14:26:10 +0000 (GMT) Subject: [OmniOS-discuss] Dell vs. Supermicro and any recommendations.. In-Reply-To: <201503121415.t2CEFBcC022831@elvis.arl.psu.edu> References: <81F2A38D-5298-4FB8-B0BB-24E4506D6040@omniti.com> <201503121415.t2CEFBcC022831@elvis.arl.psu.edu> Message-ID: On Thu, 12 Mar 2015, John D Groenveld wrote: ; In message , Andy writes: ; >I spoke too soon - disk performance seems generally poor with high service ; >times :( Everything's working apart from the disks briefly going away ; ; Oye...maybe time to tell your Dell sales critter, you're going ; to send your business elsewhere unless he figures out how to BTO ; servers with non-RAID SAS HBAs similar to the ones that Dell US ; sells: ; ; ; Otherwise, good luck debugging MegaRAID drivers and firmware. ; What's the device ID for your RAID controller? pci1028,1f49 (pciex1000,5d) [LSI Logic / Symbios Logic MegaRAID SAS-3 3108 [Invader]], instance #0 (driver name: mr_sas) 02:00.0 RAID bus controller: LSI Logic / Symbios Logic MegaRAID SAS-3 3108 [Invader] (rev 02) Subsystem: Dell PERC H730 Mini Flags: bus master, fast devsel, latency 0, IRQ 15 I/O ports at 2000 Memory at 92000000 (64-bit, non-prefetchable) Memory at 91f00000 (64-bit, non-prefetchable) Expansion ROM at fff00000 [disabled] Capabilities: [50] Power Management version 3 Capabilities: [68] Express Endpoint, MSI 00 Capabilities: [d0] Vital Product Data Capabilities: [a8] MSI: Enable- Count=1/1 Maskable+ 64bit+ Capabilities: [c0] MSI-X: Enable+ Count=97 Masked- ; Which version mr_sas(7D) are you using? 80 fffffffff7db7000 1d070 172 1 mr_sas (6.503.00.00ILLUMOS) ; Which version of the firmware? # megacli -Version -Ctrl -aALL CTRL VERSION: ================ Product Name : PERC H730 Mini Fw Package Build : 25.2.1.0037 FW Version : 4.240.00-3615 -- Citrus IT Limited | +44 (0)870 199 8000 | enquiries at citrus-it.co.uk Rock House Farm | Green Moor | Wortley | Sheffield | S35 7DQ Registered in England and Wales | Company number 4899123 From lists at marzocchi.net Thu Mar 12 16:32:25 2015 From: lists at marzocchi.net (Olaf Marzocchi) Date: Thu, 12 Mar 2015 17:32:25 +0100 Subject: [OmniOS-discuss] rsync & MacOS In-Reply-To: <9B935575-EB83-4B30-AF1A-97C121ED892C@ferebee.net> References: <201503120116.t2C1Gtsu027144@kyklops.ohsu.edu> <77E987D1-BA80-427B-9524-41B515A90E7D@marzocchi.net> <9B935575-EB83-4B30-AF1A-97C121ED892C@ferebee.net> Message-ID: <17F18021-18ED-4CCF-B910-7686DAA6CF73@marzocchi.net> It WAS tricky prior to 3.0 :) Nowadays the provided patches are enough. I'd try them first to be independent from other people's binaries and update policies. Of course, it's a matter of personal preference. 
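For anyone who wants to roll their own, the build is roughly this (a sketch only: I am assuming the rsync 3.x source plus its companion rsync-patches tarball, and the patch names can differ between releases, so check the patches/ directory first):

# unpack the source and the matching patches tarball into the same tree
tar xzf rsync-3.1.1.tar.gz
tar xzf rsync-patches-3.1.1.tar.gz
cd rsync-3.1.1
patch -p1 < patches/fileflags.diff        # BSD/OS X file flags (chflags) support
patch -p1 < patches/hfs-compression.diff  # only if present in your release
./configure && make                       # ACL and xattr support are detected by configure
sudo make install                         # /usr/local/bin/rsync by default

Then something like 'rsync -aHAX --fileflags source/ dest/' on the Mac side actually carries the extra metadata across (--fileflags only exists once the patch is applied).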
Olaf Il 12 marzo 2015 14:36:56 CET, Chris Ferebee ha scritto: >I gather it can be a bit tricky to compile rsync correctly for OS X. > >FWIW, mlbackup (and hence the bundled rsync 3.0.9 binary) is validated >with Backup Bouncer, Mike Bombich?s test suite for HFS+ backups. > >Best, >Chris > >> Am 12.03.2015 um 10:37 schrieb Olaf Marzocchi : >> >> If I remember correctly, in the past the rsync provided in CCC >contained patches not available in the main tree and it was the only >way to get proper backups. >> Nowadays the official rsync already has everything you need to backup >OS X metadata and CCC is only a nice GUI. >> However, I think not every patch from Mr. Bombich has been submitted, >some minor differences may still be there. >> A test is recommended. >> >> Olaf >> >> >> >> Il 12 marzo 2015 02:16:55 CET, Marion Hakanson ha >scritto: >> danmcd at omniti.com said: >> I have some OLD files in my homedir, some may even predate MacOS X, >so I do >> worry about resource forks or Creator/Type metadata. >> >> Dan, >> >> I like Carbon Copy Cloner for backing up our Macs. It has rsync >behind >> its GUI interface, and seems to handle native HFS+ stuff just fine. >> >> I tend to set up a remote .dmg volume on our NFS (or SMB) network >share >> for each Mac, and treat those like whole-volume backups (similar to >what >> Time Machine would do). But CCC also works for just a subdirectory, >> not just for a whole Mac volume, and to a remote share, not only to >> a disk image. >> >> CCC is shareware these days, but you can still download the freeware >> version, last I checked. >> >> Regards, >> >> Marion -------------- next part -------------- An HTML attachment was scrubbed... URL: From philip.robar at gmail.com Thu Mar 12 20:35:27 2015 From: philip.robar at gmail.com (Philip Robar) Date: Thu, 12 Mar 2015 16:35:27 -0400 Subject: [OmniOS-discuss] rsync & MacOS In-Reply-To: <3FFD70E1-8FFD-4440-8AC4-2CA310D3B2E7@omniti.com> References: <3FFD70E1-8FFD-4440-8AC4-2CA310D3B2E7@omniti.com> Message-ID: It's sad that Apple is still shipping rsync version 2.6.9, but with OS X 10.7 (as near as I can tell from what I read on the net) and newer it's patched to handle extend attributes and resource forks. Note, however, that at some point the meaning of options have changed: -E has different meanings and the older version doesn't support the -X/-xattrs option that replaces the old use of -E. Macports will install version 3.1.1. ( I'm not a fan of fink (Why install a GNU environment when you have a perfectly good UNIX(?) environment already?) and I chose Macports over Homebrew, but I don't remember why. Pro Homebew: http://deephill.com/macports-vs-homebrew/ Pro Macports: http://arstechnica.com/civis/viewtopic.php?f=19&t=1207907 ) Phil -------------- next part -------------- An HTML attachment was scrubbed... URL: From danmcd at omniti.com Thu Mar 12 20:57:52 2015 From: danmcd at omniti.com (Dan McDonald) Date: Thu, 12 Mar 2015 16:57:52 -0400 Subject: [OmniOS-discuss] NFS ._ names and rsync (was Re: rsync & MacOS) In-Reply-To: References: <3FFD70E1-8FFD-4440-8AC4-2CA310D3B2E7@omniti.com> Message-ID: > On Mar 12, 2015, at 4:35 PM, Philip Robar wrote: > > It's sad that Apple is still shipping rsync version 2.6.9, but with OS X 10.7 (as near as I can tell from what I read on the net) and newer it's patched to handle extend attributes and resource forks. 
Note, however, that at some point the meaning of options have changed: -E has different meanings and the older version doesn't support the -X/-xattrs option that replaces the old use of -E. 10.7 or better. THat's partially helpful. I still have 10.6 on a few nodes. I forgot to ask something releated. Today, when I place files using NFS (likely NFSv3) on 10.6, I see ._ which contains resource forks on the OmniOS side. Using NFS (likely NFSv4) on 10.10, I also see them. Does rsync do ._ stuff like NFS does?!? Ideally it would , that way I can rsynch, but then recover individual files using NFS. Thanks, Dan From tobi at oetiker.ch Fri Mar 13 07:25:14 2015 From: tobi at oetiker.ch (Tobias Oetiker) Date: Fri, 13 Mar 2015 08:25:14 +0100 (CET) Subject: [OmniOS-discuss] incomplete recursive snapshots Message-ID: I got a bunch of new disks on one of our systems and wanted to transfer an existing pool over to them so what I did was this: zfs snapshot -r old-pool at replicaton zfs send -R old-pool at replication | mbuffer -m 1G | zfs receive -F -d new-pool but then halfway through the operation, I got warnings from send, that old-pool/some/fileset at replication would not exist ... when I went to investigate, I found indeed that zfs snapshot -r had neglected to create a snapshot on old-pool/some/fileset. So I ran zfs list -r -o name old-pool | xargs -n1 perl -e 'system "zfs","list",$ARGV[0].q{@replication}' and found that there were about 10% of the filesets which were lacking this snapshot ... I then proceeded to create the missing snapshot individually, and it worked fine. I have since repeated the experiment and found the same problem again ... any idea how this can be ? cheers tobi -- Tobi Oetiker, OETIKER+PARTNER AG, Aarweg 15 CH-4600 Olten, Switzerland www.oetiker.ch tobi at oetiker.ch +41 62 775 9902 From danmcd at omniti.com Fri Mar 13 14:13:19 2015 From: danmcd at omniti.com (Dan McDonald) Date: Fri, 13 Mar 2015 10:13:19 -0400 Subject: [OmniOS-discuss] incomplete recursive snapshots In-Reply-To: References: Message-ID: Only recently fixed snapshot bug I could find was Illumos 5150 http://www.illumos.org/issues/5150 Also, could you share the precise warnings? It'll help finding who's doing the complaining. Dan Sent from my iPhone (typos, autocorrect, and all) > On Mar 13, 2015, at 3:25 AM, Tobias Oetiker wrote: > > I got a bunch of new disks on one of our systems and wanted to > transfer an existing pool over to them so what I did was this: > > zfs snapshot -r old-pool at replicaton > zfs send -R old-pool at replication | mbuffer -m 1G | zfs receive -F -d new-pool > > but then halfway through the operation, I got warnings from send, > that old-pool/some/fileset at replication would not exist ... > > when I went to investigate, I found indeed that zfs snapshot -r had > neglected to create a snapshot on old-pool/some/fileset. So I > ran > > zfs list -r -o name old-pool | xargs -n1 perl -e 'system "zfs","list",$ARGV[0].q{@replication}' > > and found that there were about 10% of the filesets which were > lacking this snapshot ... > > I then proceeded to create the missing snapshot individually, and > it worked fine. > > I have since repeated the experiment and found the same problem > again ... > > any idea how this can be ? 
> > cheers > tobi > > > -- > Tobi Oetiker, OETIKER+PARTNER AG, Aarweg 15 CH-4600 Olten, Switzerland > www.oetiker.ch tobi at oetiker.ch +41 62 775 9902 > > _______________________________________________ > OmniOS-discuss mailing list > OmniOS-discuss at lists.omniti.com > http://lists.omniti.com/mailman/listinfo/omnios-discuss From tobi at oetiker.ch Fri Mar 13 14:16:13 2015 From: tobi at oetiker.ch (Tobias Oetiker) Date: Fri, 13 Mar 2015 15:16:13 +0100 (CET) Subject: [OmniOS-discuss] incomplete recursive snapshots In-Reply-To: References: Message-ID: Hi Dan, Today Dan McDonald wrote: > Only recently fixed snapshot bug I could find was Illumos 5150 http://www.illumos.org/issues/5150 > > Also, could you share the precise warnings? It'll help finding who's doing the complaining. *blush* it was my bad ... see http://serverfault.com/questions/675185/incomplete-recursive-snapshots-on-zfs cheers tobi > > Dan > > Sent from my iPhone (typos, autocorrect, and all) > > > On Mar 13, 2015, at 3:25 AM, Tobias Oetiker wrote: > > > > I got a bunch of new disks on one of our systems and wanted to > > transfer an existing pool over to them so what I did was this: > > > > zfs snapshot -r old-pool at replicaton > > zfs send -R old-pool at replication | mbuffer -m 1G | zfs receive -F -d new-pool > > > > but then halfway through the operation, I got warnings from send, > > that old-pool/some/fileset at replication would not exist ... > > > > when I went to investigate, I found indeed that zfs snapshot -r had > > neglected to create a snapshot on old-pool/some/fileset. So I > > ran > > > > zfs list -r -o name old-pool | xargs -n1 perl -e 'system "zfs","list",$ARGV[0].q{@replication}' > > > > and found that there were about 10% of the filesets which were > > lacking this snapshot ... > > > > I then proceeded to create the missing snapshot individually, and > > it worked fine. > > > > I have since repeated the experiment and found the same problem > > again ... > > > > any idea how this can be ? > > > > cheers > > tobi > > > > > > -- > > Tobi Oetiker, OETIKER+PARTNER AG, Aarweg 15 CH-4600 Olten, Switzerland > > www.oetiker.ch tobi at oetiker.ch +41 62 775 9902 > > > > _______________________________________________ > > OmniOS-discuss mailing list > > OmniOS-discuss at lists.omniti.com > > http://lists.omniti.com/mailman/listinfo/omnios-discuss > > -- Tobi Oetiker, OETIKER+PARTNER AG, Aarweg 15 CH-4600 Olten, Switzerland www.oetiker.ch tobi at oetiker.ch +41 62 775 9902 From fwp at deepthought.com Sat Mar 14 19:09:43 2015 From: fwp at deepthought.com (Frank Pittel) Date: Sat, 14 Mar 2015 14:09:43 -0500 Subject: [OmniOS-discuss] Problem with a couple of drives and omnios Message-ID: <20150314190942.GA22808@warlock.deepthought.com> I have a machine here at home running Omnios and I love it. It has 6 drives installed all sata and connected to the motherboard. There are 3 zpools with two drives each. One zpool has 2 1TB drives as an rpool and 2 zpools with 2 - 2TB drives in each. I've set up all pools as mirrored. On monday while i was out the power went out and since this is a box that I play with it's not hooked up to my ups. When the power came back up I noticed the machine wouldn't boot. I got the following bizzare error: krtld: failed to open '/platform/i86pc/kernel/amd64/u' krtld bind_primary(): no relocation information found for module /platform/i86pc/kernel/amd64/u krtld: error during initial load/link phase The errors go on and on along those lines and then I get: Unable to boot Press any key to reboot. 
I thought at first that something happened to my boot drives so I unplugged my 2TB drives and tried to boot from dvd to reinstall. I didn't hit the button for the boot menu fast enough and ended up booting from disk. To my surprise the OS booted fine. I then plugged in the 2TB zones and tried booting again. To make a long story short I found that the two drives for one of the pools were causing the problem. I've tried deleting the zpools, removing partitions and even used dd to overwrite the MBR on the drives. No luck with any of the attempts. Have I damaged the drives in some wierd way or done something to keep them from working with omnios? Frank From danmcd at omniti.com Sat Mar 14 18:59:11 2015 From: danmcd at omniti.com (Dan McDonald) Date: Sat, 14 Mar 2015 14:59:11 -0400 Subject: [OmniOS-discuss] Problem with a couple of drives and omnios In-Reply-To: <20150314190942.GA22808@warlock.deepthought.com> References: <20150314190942.GA22808@warlock.deepthought.com> Message-ID: <8039FCEF-1D1A-423C-8106-3D5BF519F4F4@omniti.com> > On Mar 14, 2015, at 3:09 PM, Frank Pittel wrote: > > > I thought at first that something happened to my boot drives so I unplugged my 2TB drives and tried to boot from dvd to reinstall. I didn't hit > the button for the boot menu fast enough and ended up booting from disk. To my surprise the OS booted fine. I then plugged in the 2TB zones and > tried booting again. To make a long story short I found that the two drives for one of the pools were causing the problem. I've tried deleting the > zpools, removing partitions and even used dd to overwrite the MBR on the drives. No luck with any of the attempts. Have I damaged the drives in > some wierd way or done something to keep them from working with omnios? So wait, removing one of your DATA pools makes this machine boot okay? Did you check your zpool status after booting? You may have been able to export the (disconnected) pool, then plug the drives back in, then reboot, and reimport the pool. The corrupted "unix" at the end of platform/i86pc/... suggests possible a corrupt menu.lst. Did you check with grub what the menu entry was actually passing along? You can do that with the 'e' key over your specific boot menu choice. Dan From fwp at deepthought.com Sat Mar 14 19:33:40 2015 From: fwp at deepthought.com (Frank Pittel) Date: Sat, 14 Mar 2015 14:33:40 -0500 Subject: [OmniOS-discuss] Problem with a couple of drives and omnios In-Reply-To: <8039FCEF-1D1A-423C-8106-3D5BF519F4F4@omniti.com> References: <20150314190942.GA22808@warlock.deepthought.com> <8039FCEF-1D1A-423C-8106-3D5BF519F4F4@omniti.com> Message-ID: <20150314193339.GB22808@warlock.deepthought.com> On Sat, Mar 14, 2015 at 02:59:11PM -0400, Dan McDonald wrote: > > > On Mar 14, 2015, at 3:09 PM, Frank Pittel wrote: > > > > > > I thought at first that something happened to my boot drives so I unplugged my 2TB drives and tried to boot from dvd to reinstall. I didn't hit > > the button for the boot menu fast enough and ended up booting from disk. To my surprise the OS booted fine. I then plugged in the 2TB zones and > > tried booting again. To make a long story short I found that the two drives for one of the pools were causing the problem. I've tried deleting the > > zpools, removing partitions and even used dd to overwrite the MBR on the drives. No luck with any of the attempts. Have I damaged the drives in > > some wierd way or done something to keep them from working with omnios? > > So wait, removing one of your DATA pools makes this machine boot okay? 
Did you check your zpool status after booting? You may have been able to export the (disconnected) pool, then plug the drives back in, then reboot, and reimport the pool. > > The corrupted "unix" at the end of platform/i86pc/... suggests possible a corrupt menu.lst. Did you check with grub what the menu entry was actually passing along? You can do that with the 'e' key over your specific boot menu choice. > > Dan When I remove one of the DATA pools the machine will boot. I've tried exporting the pool and it doesn't help. While the pool drives are unplugged I can boot the machine without issue. I've then run "devfsadm -C" to remove device entries. Even then the machine won't boot with the drive connected. After the OS is booted I can plug the drives in. They aren't visible via format until I run devfsadm again. The errors got me to thinking there was something in the boot sector that was confusing my oldish MB and the MB was trying to boot off of one of those drives I used dd to overwrite the MBR with no success. I'm thinking of just going and using dd to write /dev/zero over the entire drives. There's nothing on those drives that I care about since I was just using them to play with different permutations of zfs and zones. Things like mounting filesystems in zones, etc. Frank From danmcd at omniti.com Sun Mar 15 20:44:26 2015 From: danmcd at omniti.com (Dan McDonald) Date: Sun, 15 Mar 2015 16:44:26 -0400 Subject: [OmniOS-discuss] Problem with a couple of drives and omnios In-Reply-To: <20150314193339.GB22808@warlock.deepthought.com> References: <20150314190942.GA22808@warlock.deepthought.com> <8039FCEF-1D1A-423C-8106-3D5BF519F4F4@omniti.com> <20150314193339.GB22808@warlock.deepthought.com> Message-ID: > On Mar 14, 2015, at 3:33 PM, Frank Pittel wrote: > > When I remove one of the DATA pools the machine will boot. I've tried exporting the pool and it doesn't help. While the pool drives are unplugged > I can boot the machine without issue. I've then run "devfsadm -C" to remove device entries. Even then the machine won't boot with the drive > connected. After the OS is booted I can plug the drives in. They aren't visible via format until I run devfsadm again. The errors got me to > thinking there was something in the boot sector that was confusing my oldish MB and the MB was trying to boot off of one of those drives I used dd > to overwrite the MBR with no success. I'm thinking of just going and using dd to write /dev/zero over the entire drives. There's nothing on those > drives that I care about since I was just using them to play with different permutations of zfs and zones. Things like mounting filesystems in > zones, etc. It does sound like your MB trying to boot off of other drives. If there's nothing important there, try creating new pools, preferably using the whole disk (EFI/GPT). Dan From jim at cos.ru Mon Mar 16 08:28:43 2015 From: jim at cos.ru (Jim Klimov) Date: Mon, 16 Mar 2015 09:28:43 +0100 Subject: [OmniOS-discuss] Fix to VirtualBox installer under OI/OmniOS Message-ID: An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... 
Name: vboxconfig.sh.patch Type: application/octet-stream Size: 6484 bytes Desc: not available URL: From takashiary at gmail.com Mon Mar 16 11:24:16 2015 From: takashiary at gmail.com (takashi ary) Date: Mon, 16 Mar 2015 20:24:16 +0900 Subject: [OmniOS-discuss] Kernel Panic OmniOS r151006 svc.configd Message-ID: Hello, Kernel Panic occurred omnios-b281e50 (OmniOS r151006 LTS) on VMware ESXi 5.1 This file server (CIFS) was running over 300 days until this panic. Panic occurred 2 times at Mar 14. /var/adm/messages -------------------------------------------------------------------------------- Mar 14 06:43:24 smbsv2 unix: [ID 836849 kern.notice] Mar 14 06:43:24 smbsv2 ^Mpanic[cpu1]/thread=ffffff01cf612500: Mar 14 06:43:24 smbsv2 genunix: [ID 683410 kern.notice] BAD TRAP: type=e (#pf Page fault) rp=ffffff0007e81f60 addr=ffffff01ce49eff8 Mar 14 06:43:24 smbsv2 unix: [ID 100000 kern.notice] Mar 14 06:43:24 smbsv2 unix: [ID 839527 kern.notice] svc.configd: Mar 14 06:43:24 smbsv2 unix: [ID 753105 kern.notice] #pf Page fault Mar 14 06:43:24 smbsv2 unix: [ID 532287 kern.notice] Bad kernel fault at addr=0xffffff01ce49eff8 Mar 14 06:43:24 smbsv2 unix: [ID 243837 kern.notice] pid=12, pc=0xfffffffffb8001b3, sp=0xffffff0007e82050, eflags=0x10086 Mar 14 06:43:24 smbsv2 unix: [ID 211416 kern.notice] cr0: 8005003b cr4: 6b8 Mar 14 06:43:24 smbsv2 unix: [ID 624947 kern.notice] cr2: ffffff01ce49eff8 Mar 14 06:43:24 smbsv2 unix: [ID 625075 kern.notice] cr3: 13cfef000 Mar 14 06:43:24 smbsv2 unix: [ID 625715 kern.notice] cr8: 0 Mar 14 06:43:24 smbsv2 unix: [ID 100000 kern.notice] Mar 14 06:43:24 smbsv2 unix: [ID 592667 kern.notice] rdi: 8 rsi: fffffffffbc7dd60 rdx: 1 Mar 14 06:43:24 smbsv2 unix: [ID 592667 kern.notice] rcx: 4b r8: fffffffffbc72480 r9: ffffff01d5d1b4c0 Mar 14 06:43:24 smbsv2 unix: [ID 592667 kern.notice] rax: fffffffffbc724c0 rbx: ffffff01cdfa7e00 rbp: ffffff0007e82050 Mar 14 06:43:24 smbsv2 unix: [ID 592667 kern.notice] r10: 0 r11: ffffff01ce99dcb8 r12: ffffff01cf612500 Mar 14 06:43:24 smbsv2 unix: [ID 592667 kern.notice] r13: fffffffffb86071e r14: ffffff01ce49f000 r15: fffffffffbc724c0 Mar 14 06:43:24 smbsv2 unix: [ID 592667 kern.notice] fsb: 0 gsb: ffffff01ce5ec580 ds: 4b Mar 14 06:43:24 smbsv2 unix: [ID 592667 kern.notice] es: 4b fs: 0 gs: 1c3 Mar 14 06:43:24 smbsv2 unix: [ID 592667 kern.notice] trp: e err: 9 rip: fffffffffb8001b3 Mar 14 06:43:24 smbsv2 unix: [ID 592667 kern.notice] cs: 30 rfl: 10086 rsp: ffffff0007e82050 Mar 14 06:43:24 smbsv2 unix: [ID 266532 kern.notice] ss: 38 Mar 14 06:43:24 smbsv2 unix: [ID 100000 kern.notice] Mar 14 06:43:24 smbsv2 genunix: [ID 655072 kern.notice] ffffff0007e81e40 unix:real_mode_stop_cpu_stage2_end+9d93 () Mar 14 06:43:24 smbsv2 genunix: [ID 655072 kern.notice] ffffff0007e81f50 unix:trap+db3 () Mar 14 06:43:24 smbsv2 genunix: [ID 655072 kern.notice] ffffff0007e81f60 unix:cmntrap+e6 () Mar 14 06:43:24 smbsv2 genunix: [ID 655072 kern.notice] ffffff0007e82050 unix:cmntrap+c3 () Mar 14 06:43:24 smbsv2 genunix: [ID 655072 kern.notice] ffffff0007e82140 unix:cmntrap+c3 () Mar 14 06:43:24 smbsv2 genunix: [ID 655072 kern.notice] ffffff0007e82230 unix:cmntrap+c3 () Mar 14 06:43:24 smbsv2 genunix: [ID 655072 kern.notice] ffffff0007e82320 unix:cmntrap+c3 () Mar 14 06:43:24 smbsv2 genunix: [ID 655072 kern.notice] ffffff0007e82410 unix:cmntrap+c3 () Mar 14 06:43:24 smbsv2 genunix: [ID 655072 kern.notice] ffffff0007e82500 unix:cmntrap+c3 () Mar 14 06:43:24 smbsv2 genunix: [ID 655072 kern.notice] ffffff0007e825f0 unix:cmntrap+c3 () Mar 14 06:43:24 smbsv2 genunix: [ID 
655072 kern.notice] ffffff0007e826e0 unix:cmntrap+c3 () Mar 14 06:43:24 smbsv2 genunix: [ID 655072 kern.notice] ffffff0007e827d0 unix:cmntrap+c3 () Mar 14 06:43:24 smbsv2 genunix: [ID 655072 kern.notice] ffffff0007e828c0 unix:cmntrap+c3 () Mar 14 06:43:24 smbsv2 genunix: [ID 655072 kern.notice] ffffff0007e829b0 unix:cmntrap+c3 () Mar 14 06:43:24 smbsv2 genunix: [ID 655072 kern.notice] ffffff0007e82aa0 unix:cmntrap+c3 () Mar 14 06:43:24 smbsv2 genunix: [ID 655072 kern.notice] ffffff0007e82b90 unix:cmntrap+c3 () Mar 14 06:43:24 smbsv2 genunix: [ID 655072 kern.notice] ffffff0007e82c90 unix:gdt_update_usegd+20 () Mar 14 06:43:24 smbsv2 genunix: [ID 655072 kern.notice] ffffff0007e82cb0 unix:gdt_ucode_model+37 () Mar 14 06:43:24 smbsv2 genunix: [ID 655072 kern.notice] ffffff0007e82ce0 unix:lwp_segregs_restore32+26 () Mar 14 06:43:24 smbsv2 genunix: [ID 655072 kern.notice] ffffff0007e82d10 genunix:restorectx+2f () Mar 14 06:43:24 smbsv2 unix: [ID 100000 kern.notice] Mar 14 06:43:24 smbsv2 genunix: [ID 672855 kern.notice] syncing file systems... Mar 14 06:43:24 smbsv2 genunix: [ID 904073 kern.notice] done Mar 14 06:43:25 smbsv2 genunix: [ID 111219 kern.notice] dumping to /dev/zvol/dsk/rpool/dump, offset 65536, content: kernel Mar 14 06:43:45 smbsv2 genunix: [ID 100000 kern.notice] Mar 14 06:43:45 smbsv2 genunix: [ID 665016 kern.notice] ^M100% done: 264317 pages dumped, Mar 14 06:43:45 smbsv2 genunix: [ID 851671 kern.notice] dump succeeded -------------------------------------------------------------------------------- crash.tar.gz (attached) -------------------------------------------------------------------------------- ID=0 and 1 echo '::panicinfo' | mdb ${ID} > ~/crash.${ID}_panicinfo echo '::cpuinfo -v' | mdb ${ID} > ~/crash.${ID}_cpuinfo echo '::threadlist -v 10' | mdb ${ID} > ~/crash.${ID}_threadlist echo '::msgbuf' | mdb ${ID} > ~/crash.${ID}_msgbuf echo '*panic_thread::findstack -v' | mdb ${ID} > ~/crash.${ID}_findstack echo '::stacks' | mdb ${ID} > ~/crash.${ID}_stacks echo '::ps' | mdb ${ID} > ~/crash.${ID}_ps -------------------------------------------------------------------------------- I couldn't find similar panic on www.illumos.org. Thanks -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: crash.tar.gz Type: application/x-gzip Size: 83764 bytes Desc: not available URL: From danmcd at omniti.com Mon Mar 16 15:10:34 2015 From: danmcd at omniti.com (Dan McDonald) Date: Mon, 16 Mar 2015 11:10:34 -0400 Subject: [OmniOS-discuss] [discuss] Kernel Panic OmniOS r151006 svc.configd In-Reply-To: References: Message-ID: Keeping my response on the OmniOS list only for now. Your panic info may be better shared on the illumos developer list, BTW. > On Mar 16, 2015, at 7:24 AM, takashi ary via illumos-discuss wrote: > > Normally I like to see the dumps themselves, but these are both the same panic, and both in a seemingly innocent return from a door call. Both panics are page faults, like the kernel was using a userspace pointer or something. I don't know the doorfs subsystem that well, but given I've not seen this anywhere else, I'm wondering if something odd is going on inside VMware's memory management (you did say this is running inside VMware)? No specific activity was going on prior to the panics, right? I'm a bit stumped at this point. 
Dan From takashiary at gmail.com Mon Mar 16 20:02:42 2015 From: takashiary at gmail.com (takashi ary) Date: Tue, 17 Mar 2015 05:02:42 +0900 Subject: [OmniOS-discuss] [discuss] Kernel Panic OmniOS r151006 svc.configd In-Reply-To: References: Message-ID: Hi Dan, Thanks for your analysis. > Normally I like to see the dumps themselves, but these are both the same panic, and both in a seemingly innocent return from a door call. It's possible to send vmdump. What is good way to send? > Both panics are page faults, like the kernel was using a userspace pointer or something. I don't know the doorfs subsystem that well, but given I've not seen this anywhere else, I'm wondering if something odd is going on inside VMware's memory management (you did say this is running inside VMware)? This is running inside VMware ESXi 5.1 Update 2 so search the VMware Knowledge Base... Windows 2008 R2, Red Hat Enterprise Linux and Solaris 10 64-bit virtual machines blue screen or kernel panic when running on ESXi 5.x with an Intel E5/E7/E3 v2 series processor (2073791) http://kb.vmware.com/kb/2073791 $ prtconf -v | grep Xeon value='Intel(r) Xeon(r) CPU E5-2697 v2 @ 2.70GHz' value='Intel(r) Xeon(r) CPU E5-2697 v2 @ 2.70GHz' Intel E5 v2 series processor! Bingo? > No specific activity was going on prior to the panics, right? Right, I think no one was using the server at that time. This info may be better shared on the illumos developer list? Thanks From danmcd at omniti.com Mon Mar 16 20:10:38 2015 From: danmcd at omniti.com (Dan McDonald) Date: Mon, 16 Mar 2015 16:10:38 -0400 Subject: [OmniOS-discuss] [discuss] Kernel Panic OmniOS r151006 svc.configd In-Reply-To: References: Message-ID: <4F31D9D5-FF00-41A7-953B-3EB35A2167C4@omniti.com> > On Mar 16, 2015, at 4:02 PM, takashi ary wrote: > >> Normally I like to see the dumps themselves, but these are both the same panic, and both in a seemingly innocent return from a door call. > > It's possible to send vmdump. > What is good way to send? Given what you say below, I don't think you will need to send me anything... >> Both panics are page faults, like the kernel was using a userspace pointer or something. I don't know the doorfs subsystem that well, but given I've not seen this anywhere else, I'm wondering if something odd is going on inside VMware's memory management (you did say this is running inside VMware)? > > This is running inside VMware ESXi 5.1 Update 2 > so search the VMware Knowledge Base... > > Windows 2008 R2, Red Hat Enterprise Linux and Solaris 10 64-bit > virtual machines blue screen or kernel panic when running on ESXi 5.x > with an Intel E5/E7/E3 v2 series processor (2073791) > http://kb.vmware.com/kb/2073791 > > $ prtconf -v | grep Xeon > value='Intel(r) Xeon(r) CPU E5-2697 v2 @ 2.70GHz' > value='Intel(r) Xeon(r) CPU E5-2697 v2 @ 2.70GHz' > > Intel E5 v2 series processor! > Bingo? Yep! See if the VMware note has information on the coredump for Solaris 10 --> it'll be close to any illumos distro, including OmniOS. According to the note, you need "update 3" to 5.1. >> No specific activity was going on prior to the panics, right? > > Right, I think no one was using the server at that time. > > This info may be better shared on the illumos developer list? Yes, INCLUDING the VMware technical note and its solution. Please share!!! I'm glad this isn't a problem with us, but with VMware. One major user of illumos runs on VMware, and needs to see that, but I suspect they know it already. Thanks! 
Dan From takashiary at gmail.com Mon Mar 16 21:48:01 2015 From: takashiary at gmail.com (takashi ary) Date: Tue, 17 Mar 2015 06:48:01 +0900 Subject: [OmniOS-discuss] [discuss] Kernel Panic OmniOS r151006 svc.configd In-Reply-To: <4F31D9D5-FF00-41A7-953B-3EB35A2167C4@omniti.com> References: <4F31D9D5-FF00-41A7-953B-3EB35A2167C4@omniti.com> Message-ID: Hi Dan, Thanks for your help. I will update my ESXi 5.1 to Update 3. I sent a mail to illumos developer list. When there is a mistake, correction, please. Thanks 2015-03-17 5:10 GMT+09:00 Dan McDonald : > > > > On Mar 16, 2015, at 4:02 PM, takashi ary wrote: > > > >> Normally I like to see the dumps themselves, but these are both the same panic, and both in a seemingly innocent return from a door call. > > > > It's possible to send vmdump. > > What is good way to send? > > Given what you say below, I don't think you will need to send me anything... > > >> Both panics are page faults, like the kernel was using a userspace pointer or something. I don't know the doorfs subsystem that well, but given I've not seen this anywhere else, I'm wondering if something odd is going on inside VMware's memory management (you did say this is running inside VMware)? > > > > This is running inside VMware ESXi 5.1 Update 2 > > so search the VMware Knowledge Base... > > > > Windows 2008 R2, Red Hat Enterprise Linux and Solaris 10 64-bit > > virtual machines blue screen or kernel panic when running on ESXi 5.x > > with an Intel E5/E7/E3 v2 series processor (2073791) > > http://kb.vmware.com/kb/2073791 > > > > $ prtconf -v | grep Xeon > > value='Intel(r) Xeon(r) CPU E5-2697 v2 @ 2.70GHz' > > value='Intel(r) Xeon(r) CPU E5-2697 v2 @ 2.70GHz' > > > > Intel E5 v2 series processor! > > Bingo? > > Yep! See if the VMware note has information on the coredump for Solaris 10 --> it'll be close to any illumos distro, including OmniOS. According to the note, you need "update 3" to 5.1. > > >> No specific activity was going on prior to the panics, right? > > > > Right, I think no one was using the server at that time. > > > > This info may be better shared on the illumos developer list? > > Yes, INCLUDING the VMware technical note and its solution. Please share!!! I'm glad this isn't a problem with us, but with VMware. One major user of illumos runs on VMware, and needs to see that, but I suspect they know it already. > > Thanks! > Dan > From danmcd at omniti.com Thu Mar 19 15:18:13 2015 From: danmcd at omniti.com (Dan McDonald) Date: Thu, 19 Mar 2015 11:18:13 -0400 Subject: [OmniOS-discuss] OpenSSL now updated! Message-ID: <3796FFF3-B587-4F76-8178-1B47197B94B6@omniti.com> If you're runninng 006, 010, or 012 --> OpenSSL is now 1.0.1m. If you're running bloody --> OpenSSL is now 1.0.2a. (NOTE: 1.0.2 is affected more, so upgrade this quickly!) All of the repos have been updated. Since this is openssl, you will strictly speaking not need to reboot, but if you do not reboot, you WILL need to restart services that link to openssl. Happy updating! 
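For instance, the per-box dance is roughly this (illustrative only -- the
FMRI is just an example, and which daemons actually map libssl/libcrypto
will differ from machine to machine):

  # pkg update                                          # pull the new openssl
  # pldd $(pgrep -x sshd) | egrep 'libssl|libcrypto'    # does this daemon map them?
  # svcadm restart svc:/network/ssh:default             # if so, bounce its service

...and likewise for anything else that still has the old libraries mapped.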
Dan From stephan.budach at JVM.DE Fri Mar 20 09:51:59 2015 From: stephan.budach at JVM.DE (Stephan Budach) Date: Fri, 20 Mar 2015 10:51:59 +0100 Subject: [OmniOS-discuss] OmniOS: zpool import dumps core Message-ID: <550BEDBF.2010506@jvm.de> Hi, OmniOS: SunOS nfsvmpool05 5.11 omnios-10b9c79 i86pc i386 i86pc (0.151012) when trying to run zpool import, the command yields this output: Assertion failed: rn->rn_nozpool == B_FALSE, file ../common/libzfs_import.c, line 1080, function zpool_open_func Abort (core dumped) I don't think that this is related to the actual zpool I created, since running zpool import in general makes this happen. This is new install that I created yesterday by first installing 006 and then updating via 008/010 to 012. Any ideas, what could have caused that? Thanks, stephan From jimklimov at cos.ru Fri Mar 20 10:30:18 2015 From: jimklimov at cos.ru (Jim Klimov) Date: Fri, 20 Mar 2015 11:30:18 +0100 Subject: [OmniOS-discuss] OpenSSL now updated! In-Reply-To: <3796FFF3-B587-4F76-8178-1B47197B94B6@omniti.com> References: <3796FFF3-B587-4F76-8178-1B47197B94B6@omniti.com> Message-ID: <61991B57-270F-481E-BC28-49AB7F814409@cos.ru> 19 ????? 2015??. 16:18:13 CET, Dan McDonald ?????: >If you're runninng 006, 010, or 012 --> OpenSSL is now 1.0.1m. > >If you're running bloody --> OpenSSL is now 1.0.2a. (NOTE: 1.0.2 is >affected more, so upgrade this quickly!) > >All of the repos have been updated. Since this is openssl, you will >strictly speaking not need to reboot, but if you do not reboot, you >WILL need to restart services that link to openssl. > >Happy updating! >Dan > >_______________________________________________ >OmniOS-discuss mailing list >OmniOS-discuss at lists.omniti.com >http://lists.omniti.com/mailman/listinfo/omnios-discuss Is there a way for IPS services to be restarted automatically when their dependency libraries change? I have a few ideas about how this wheel might be (re-)invented and bolted on, but perhaps there already is a generic solution in the packaging system? ;) Jim -- Typos courtesy of K-9 Mail on my Samsung Android From ben at fluffy.co.uk Fri Mar 20 11:08:49 2015 From: ben at fluffy.co.uk (Ben Summers) Date: Fri, 20 Mar 2015 11:08:49 +0000 Subject: [OmniOS-discuss] OpenSSL now updated! In-Reply-To: <61991B57-270F-481E-BC28-49AB7F814409@cos.ru> References: <3796FFF3-B587-4F76-8178-1B47197B94B6@omniti.com> <61991B57-270F-481E-BC28-49AB7F814409@cos.ru> Message-ID: > On 20 Mar 2015, at 10:30, Jim Klimov wrote: > > 19 ????? 2015 ?. 16:18:13 CET, Dan McDonald ?????: >> If you're runninng 006, 010, or 012 --> OpenSSL is now 1.0.1m. >> >> If you're running bloody --> OpenSSL is now 1.0.2a. (NOTE: 1.0.2 is >> affected more, so upgrade this quickly!) >> >> All of the repos have been updated. Since this is openssl, you will >> strictly speaking not need to reboot, but if you do not reboot, you >> WILL need to restart services that link to openssl. >> > > Is there a way for IPS services to be restarted automatically when their dependency libraries change? > > I have a few ideas about how this wheel might be (re-)invented and bolted on, but perhaps there already is a generic solution in the packaging system? ;) I suppose a hacky script could get a list of all the libraries and executables changed in the last update, use pfiles on all processes in all zones to files which ones have those libraries open, then use svcs -p to determine which services those processes are running under, and then restart them. Or you could just reboot. 
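Something along those lines, as a rough and untested sketch (current zone
only, SMF-managed processes only, and leaning on pldd rather than pfiles
since mapped libraries show up there):

  #!/bin/sh
  # Print the FMRIs of SMF instances whose processes currently map
  # libssl or libcrypto, i.e. the candidates for a restart.
  svcs -H -o fmri | while read fmri; do
      for pid in $(svcs -H -p "$fmri" | awk '$2 ~ /^[0-9]+$/ { print $2 }'); do
          if pldd "$pid" 2>/dev/null | egrep 'libssl|libcrypto' >/dev/null; then
              echo "$fmri"
              break
          fi
      done
  done | sort -u

Run it as root, and pipe the output into 'xargs -n1 svcadm restart' once
the list looks sane.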
You've probably got bigger problems if you can't reboot your server. Ben -- http://bens.me.uk From stephan.budach at JVM.DE Fri Mar 20 13:04:22 2015 From: stephan.budach at JVM.DE (Stephan Budach) Date: Fri, 20 Mar 2015 14:04:22 +0100 Subject: [OmniOS-discuss] OmniOS: zpool import dumps core In-Reply-To: <550BEDBF.2010506@jvm.de> References: <550BEDBF.2010506@jvm.de> Message-ID: <550C1AD6.1010805@jvm.de> Never mind. I re-installed OmniOS from a new r012 USB download and this issue went away. Must indeed have been something I have picked up while rushing through the updates from 006 to 012. Cheers, Stephan From eric.sproul at circonus.com Fri Mar 20 14:14:59 2015 From: eric.sproul at circonus.com (Eric Sproul) Date: Fri, 20 Mar 2015 10:14:59 -0400 Subject: [OmniOS-discuss] OpenSSL now updated! In-Reply-To: References: <3796FFF3-B587-4F76-8178-1B47197B94B6@omniti.com> <61991B57-270F-481E-BC28-49AB7F814409@cos.ru> Message-ID: On Fri, Mar 20, 2015 at 7:08 AM, Ben Summers wrote: > I suppose a hacky script could get a list of all the libraries and executables changed in the last update, use pfiles on all processes in all zones to files which ones have those libraries open, then use svcs -p to determine which services those processes are running under, and then restart them. Better yet, there already exists a hacky script: http://omnios.omniti.com/media/ssl_services_to_restart.sh This looks for running processes in the current zone that link libssl or libcrypto and gives you a list of services that you may wish to restart. It could be turned into something more generic, perhaps that took the name of a shared library as an argument. It is possible to have a package action trigger a service restart. See ACTUATORS in pkg(5). Circonus uses this a lot to deliver and update services via packages. One might make a case for ssl-dependent core system services (like ssh) to be restarted by the openssl package. It's obviously not practical for the OmniOS openssl package to actuate your arbitrary services though. :) Eric From jstockett at molalla.com Fri Mar 20 18:27:01 2015 From: jstockett at molalla.com (Jeff Stockett) Date: Fri, 20 Mar 2015 18:27:01 +0000 Subject: [OmniOS-discuss] NFS v4.1 in OmniOS stable? Message-ID: <136C13E89D22BB468B2A7025993639732F4DE9A1@EXMCCMB.molalla.com> Does OmniOS support NFS v4.1? 4.1 support is a new feature in esxi v6, and I was trying to set it up as described here: http://wahlnetwork.com/2015/02/02/nfs-v4-1/ Things of course work fine if I use NFS v3, but if I try v4.1, I get a timeout error when it tries to attach the data store. Both the omnios server and the esxi client are properly joined to Active Directory so I think the required Kerberos stuff should be working. -------------- next part -------------- An HTML attachment was scrubbed... URL: From danmcd at omniti.com Fri Mar 20 19:13:25 2015 From: danmcd at omniti.com (Dan McDonald) Date: Fri, 20 Mar 2015 15:13:25 -0400 Subject: [OmniOS-discuss] NFS v4.1 in OmniOS stable? In-Reply-To: <136C13E89D22BB468B2A7025993639732F4DE9A1@EXMCCMB.molalla.com> References: <136C13E89D22BB468B2A7025993639732F4DE9A1@EXMCCMB.molalla.com> Message-ID: <3D1E8AC6-0F72-45F3-A56B-03E052826A18@omniti.com> > On Mar 20, 2015, at 2:27 PM, Jeff Stockett wrote: > > Does OmniOS support NFS v4.1? 
4.1 support is a new feature in esxi v6, and I was trying to set it up as described here: > > http://wahlnetwork.com/2015/02/02/nfs-v4-1/ > > Things of course work fine if I use NFS v3, but if I try v4.1, I get a timeout error when it tries to attach the data store. Both the omnios server and the esxi client are properly joined to Active Directory so I think the required Kerberos stuff should be working. We have NFS4.0 and earlier. We do not have NFS4.1. It would be a very sizeable undertaking, requiring illumos community support. If anyone would lead the charge on that, it'd be a storage-oriented firm, like Nexenta, or Delphix. Dan From omnios at citrus-it.net Fri Mar 20 19:39:44 2015 From: omnios at citrus-it.net (Andy) Date: Fri, 20 Mar 2015 19:39:44 +0000 (GMT) Subject: [OmniOS-discuss] Dell vs. Supermicro and any recommendations.. In-Reply-To: References: <81F2A38D-5298-4FB8-B0BB-24E4506D6040@omniti.com> <201503121415.t2CEFBcC022831@elvis.arl.psu.edu> Message-ID: On Thu, 12 Mar 2015, Andy wrote: ; ; On Thu, 12 Mar 2015, John D Groenveld wrote: ; ; ; Otherwise, good luck debugging MegaRAID drivers and firmware. This definitely looks like a driver problem but I'm making progress. It seems that the code for handling logical versus physical disks on an LSI Invader controller is different and the PD code has some issues. Andy -- Citrus IT Limited | +44 (0)870 199 8000 | enquiries at citrus-it.co.uk Rock House Farm | Green Moor | Wortley | Sheffield | S35 7DQ Registered in England and Wales | Company number 4899123 From mir at miras.org Fri Mar 20 19:52:22 2015 From: mir at miras.org (Michael Rasmussen) Date: Fri, 20 Mar 2015 20:52:22 +0100 Subject: [OmniOS-discuss] NFS v4.1 in OmniOS stable? In-Reply-To: <3D1E8AC6-0F72-45F3-A56B-03E052826A18@omniti.com> References: <136C13E89D22BB468B2A7025993639732F4DE9A1@EXMCCMB.molalla.com> <3D1E8AC6-0F72-45F3-A56B-03E052826A18@omniti.com> Message-ID: <20150320205222.0d2d9ab0@sleipner.datanom.net> On Fri, 20 Mar 2015 15:13:25 -0400 Dan McDonald wrote: > > We have NFS4.0 and earlier. We do not have NFS4.1. It would be a very sizeable undertaking, requiring illumos community support. If anyone would lead the charge on that, it'd be a storage-oriented firm, like Nexenta, or Delphix. > But isn't there an Illumos project for pNFS? (http://www.pnfs.com/) -- Hilsen/Regards Michael Rasmussen Get my public GnuPG keys: michael rasmussen cc http://pgp.mit.edu:11371/pks/lookup?op=get&search=0xD3C9A00E mir datanom net http://pgp.mit.edu:11371/pks/lookup?op=get&search=0xE501F51C mir miras org http://pgp.mit.edu:11371/pks/lookup?op=get&search=0xE3E80917 -------------------------------------------------------------- /usr/games/fortune -es says: If you look like your driver's license photo -- see a doctor. If you look like your passport photo -- it's too late for a doctor. -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 181 bytes Desc: OpenPGP digital signature URL: From jdg117 at elvis.arl.psu.edu Fri Mar 20 19:58:45 2015 From: jdg117 at elvis.arl.psu.edu (John D Groenveld) Date: Fri, 20 Mar 2015 15:58:45 -0400 Subject: [OmniOS-discuss] Dell vs. Supermicro and any recommendations.. In-Reply-To: Your message of "Fri, 20 Mar 2015 19:39:44 -0000." 
References: <81F2A38D-5298-4FB8-B0BB-24E4506D6040@omniti.com> <201503121415.t2CEFBcC022831@elvis.arl.psu.edu> Message-ID: <201503201958.t2KJwjI4025021@elvis.arl.psu.edu> In message , Andy write s: >This definitely looks like a driver problem but I'm making progress. >It seems that the code for handling logical versus physical disks on an >LSI Invader controller is different and the PD code has some issues. How much of a performance difference between mr_sas 6.503.00.00ILLUMOS and LSI's 6.606.07.00? John groenveld at acm.org From illumos at cucumber.demon.co.uk Fri Mar 20 20:09:46 2015 From: illumos at cucumber.demon.co.uk (Andrew Gabriel) Date: Fri, 20 Mar 2015 20:09:46 +0000 Subject: [OmniOS-discuss] NFS v4.1 in OmniOS stable? In-Reply-To: <3D1E8AC6-0F72-45F3-A56B-03E052826A18@omniti.com> References: <136C13E89D22BB468B2A7025993639732F4DE9A1@EXMCCMB.molalla.com> <3D1E8AC6-0F72-45F3-A56B-03E052826A18@omniti.com> Message-ID: <550C7E8A.6020309@cucumber.demon.co.uk> Dan McDonald wrote: >> On Mar 20, 2015, at 2:27 PM, Jeff Stockett wrote: >> >> Does OmniOS support NFS v4.1? 4.1 support is a new feature in esxi v6, and I was trying to set it up as described here: >> >> http://wahlnetwork.com/2015/02/02/nfs-v4-1/ >> >> Things of course work fine if I use NFS v3, but if I try v4.1, I get a timeout error when it tries to attach the data store. Both the omnios server and the esxi client are properly joined to Active Directory so I think the required Kerberos stuff should be working. >> > > We have NFS4.0 and earlier. We do not have NFS4.1. It would be a very sizeable undertaking, requiring illumos community support. If anyone would lead the charge on that, it'd be a storage-oriented firm, like Nexenta, or Delphix. > Does anyone seriously use (or intend to use) 4.1 anymore? Lustre has the parallel file server market (with some pockets of AFS too). The large growth of parallel storage servers now tends to be object storage, and S3 (as a protocol) has become the standard for that. Kind of makes me wonder what the market for NFSv4.1 is? -- Andrew From cks at cs.toronto.edu Fri Mar 20 20:09:32 2015 From: cks at cs.toronto.edu (Chris Siebenmann) Date: Fri, 20 Mar 2015 16:09:32 -0400 Subject: [OmniOS-discuss] How to check if you have enough NFS server threads? Message-ID: <20150320200932.B63517A0690@apps0.cs.toronto.edu> We're running into a situation with one of our NFS ZFS fileservers[*] where we're wondering if we have enough NFS server threads to handle our load. Per 'sharectl get nfs', we have 'servers=512' configured, but we're not sure we know how to check how many are actually in use and active at any given time and whether or not we're running into this limit. Does anyone know how to tell either? We've looked at mdb -k's '::svc_pool nfs' but I've concluded that I don't know enough about OmniOS kernel internals to know for sure what it's telling us (partly because it seems to be giving us implausibly high numbers). Is the number we're looking for 'Non detached threads' minus 'Asleep threads'? (Or that plus detached threads?) Thanks in advance. - cks [*: our server setup and configuration is: http://utcc.utoronto.ca/~cks/space/blog/solaris/ZFSFileserverSetupII ] From ikaufman at eng.ucsd.edu Fri Mar 20 20:23:19 2015 From: ikaufman at eng.ucsd.edu (Ian Kaufman) Date: Fri, 20 Mar 2015 13:23:19 -0700 Subject: [OmniOS-discuss] NFS v4.1 in OmniOS stable? 
In-Reply-To: <550C7E8A.6020309@cucumber.demon.co.uk> References: <136C13E89D22BB468B2A7025993639732F4DE9A1@EXMCCMB.molalla.com> <3D1E8AC6-0F72-45F3-A56B-03E052826A18@omniti.com> <550C7E8A.6020309@cucumber.demon.co.uk> Message-ID: Lustre is great for HPC. It lacks the "sanctity of data" that other solutions have. It is getting there - using ZFS is a huge step forward, but Lustre is by no means a general purpose solution at this point. Ian On Fri, Mar 20, 2015 at 1:09 PM, Andrew Gabriel wrote: > Dan McDonald wrote: >>> >>> On Mar 20, 2015, at 2:27 PM, Jeff Stockett wrote: >>> >>> Does OmniOS support NFS v4.1? 4.1 support is a new feature in esxi v6, >>> and I was trying to set it up as described here: >>> http://wahlnetwork.com/2015/02/02/nfs-v4-1/ >>> Things of course work fine if I use NFS v3, but if I try v4.1, I get a >>> timeout error when it tries to attach the data store. Both the omnios >>> server and the esxi client are properly joined to Active Directory so I >>> think the required Kerberos stuff should be working. >>> >> >> >> We have NFS4.0 and earlier. We do not have NFS4.1. It would be a very >> sizeable undertaking, requiring illumos community support. If anyone would >> lead the charge on that, it'd be a storage-oriented firm, like Nexenta, or >> Delphix. >> > > > Does anyone seriously use (or intend to use) 4.1 anymore? > > Lustre has the parallel file server market (with some pockets of AFS too). > The large growth of parallel storage servers now tends to be object storage, > and S3 (as a protocol) has become the standard for that. > > Kind of makes me wonder what the market for NFSv4.1 is? > > -- > Andrew > > _______________________________________________ > OmniOS-discuss mailing list > OmniOS-discuss at lists.omniti.com > http://lists.omniti.com/mailman/listinfo/omnios-discuss -- Ian Kaufman Research Systems Administrator UC San Diego, Jacobs School of Engineering ikaufman AT ucsd DOT edu From richard.elling at richardelling.com Fri Mar 20 20:46:04 2015 From: richard.elling at richardelling.com (Richard Elling) Date: Fri, 20 Mar 2015 13:46:04 -0700 Subject: [OmniOS-discuss] How to check if you have enough NFS server threads? In-Reply-To: <20150320200932.B63517A0690@apps0.cs.toronto.edu> References: <20150320200932.B63517A0690@apps0.cs.toronto.edu> Message-ID: <5267538B-C118-4D81-B019-D4C8F0A8DAA0@richardelling.com> > On Mar 20, 2015, at 1:09 PM, Chris Siebenmann wrote: > > We're running into a situation with one of our NFS ZFS fileservers[*] > where we're wondering if we have enough NFS server threads to handle > our load. Per 'sharectl get nfs', we have 'servers=512' configured, > but we're not sure we know how to check how many are actually in use > and active at any given time and whether or not we're running into > this limit. > > Does anyone know how to tell either? Yes, these are dynamically sized and you can track via the number of current threads as shown by ps or something sneaky like "ls /proc/$(pgrep nfsd)/lwp | wc -l" Some distros, including Solaris 11.1, have kstats for this information. So when we track them over time, they can and do change dynamically and quickly. > > We've looked at mdb -k's '::svc_pool nfs' but I've concluded that I > don't know enough about OmniOS kernel internals to know for sure what > it's telling us (partly because it seems to be giving us implausibly > high numbers). Is the number we're looking for 'Non detached threads' > minus 'Asleep threads'? (Or that plus detached threads?) 
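To watch the simpler lwp count above move over time, a quick loop does it
(rough sketch; the 10-second interval is arbitrary):

  while sleep 10; do
      echo "$(date '+%H:%M:%S') $(ls /proc/$(pgrep -x nfsd)/lwp | wc -l)"
  done

If that number sits pinned at (or just above) your 'servers' setting under
load, you are probably running into the limit.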
In general, the number of threads is an indication of the load of the clients and the service ability of the server (in queuing theory terms). Too much load gives the same result as too slow of a back-end. In NFS, clients limit the number of concurrent requests, which is the best way to deal with too much load. -- richard > > Thanks in advance. > > - cks > [*: our server setup and configuration is: > http://utcc.utoronto.ca/~cks/space/blog/solaris/ZFSFileserverSetupII > ] > _______________________________________________ > OmniOS-discuss mailing list > OmniOS-discuss at lists.omniti.com > http://lists.omniti.com/mailman/listinfo/omnios-discuss From illumos at cucumber.demon.co.uk Fri Mar 20 21:59:27 2015 From: illumos at cucumber.demon.co.uk (Andrew Gabriel) Date: Fri, 20 Mar 2015 21:59:27 +0000 Subject: [OmniOS-discuss] How to check if you have enough NFS server threads? In-Reply-To: <20150320200932.B63517A0690@apps0.cs.toronto.edu> References: <20150320200932.B63517A0690@apps0.cs.toronto.edu> Message-ID: <550C983F.8070606@cucumber.demon.co.uk> Chris Siebenmann wrote: > We're running into a situation with one of our NFS ZFS fileservers[*] > where we're wondering if we have enough NFS server threads to handle > our load. Per 'sharectl get nfs', we have 'servers=512' configured, > but we're not sure we know how to check how many are actually in use > and active at any given time and whether or not we're running into > this limit. > If raising that limit then causes reports of: WARNING: svc_cots_kdup no slots free WARNING: svc_clts_kdup no slots free together with the NFS clients getting EIO errors back, you may need to increase the size of the duplicate check cache in the kernel rpcmod by raising rpcmod:cotsmaxdupreqs and rpcmod:maxdupreqs (not sure that all Illumos distros use same default values). -- Andrew From omnios at citrus-it.net Fri Mar 20 23:05:58 2015 From: omnios at citrus-it.net (Andy) Date: Fri, 20 Mar 2015 23:05:58 +0000 (GMT) Subject: [OmniOS-discuss] Dell vs. Supermicro and any recommendations.. In-Reply-To: <201503201958.t2KJwjI4025021@elvis.arl.psu.edu> References: <81F2A38D-5298-4FB8-B0BB-24E4506D6040@omniti.com> <201503121415.t2CEFBcC022831@elvis.arl.psu.edu> <201503201958.t2KJwjI4025021@elvis.arl.psu.edu> Message-ID: On Fri, 20 Mar 2015, John D Groenveld wrote: ; In message , Andy write ; s: ; >This definitely looks like a driver problem but I'm making progress. ; >It seems that the code for handling logical versus physical disks on an ; >LSI Invader controller is different and the PD code has some issues. ; ; How much of a performance difference between mr_sas 6.503.00.00ILLUMOS ; and LSI's 6.606.07.00? 6.606.. doesn't work for me, it receives the LD map then starts trying to "kill" the adapter. 6.605.01.00 is fine however and throughput is 10x better than the Illumos driver with a much more stable response time. With 6.503.00.00ILLUMOS, the RAID card keeps reporting 03/20/15 23:00:40: C0:iopiSCSIIOCompleteError: FPESTATUS_DEVHANDLE_OUT_OF_RANGE mid x02e6 PtrMsg xc00ccc00 03/20/15 23:00:40: C0:Out of range devHandle x0000 from SMID x0000022b about 300 times a second. -- Citrus IT Limited | +44 (0)870 199 8000 | enquiries at citrus-it.co.uk Rock House Farm | Green Moor | Wortley | Sheffield | S35 7DQ Registered in England and Wales | Company number 4899123 From henson at acm.org Fri Mar 20 23:07:21 2015 From: henson at acm.org (Paul B. Henson) Date: Fri, 20 Mar 2015 16:07:21 -0700 Subject: [OmniOS-discuss] NFS v4.1 in OmniOS stable? 
In-Reply-To: <550C7E8A.6020309@cucumber.demon.co.uk> References: <136C13E89D22BB468B2A7025993639732F4DE9A1@EXMCCMB.molalla.com> <3D1E8AC6-0F72-45F3-A56B-03E052826A18@omniti.com> <550C7E8A.6020309@cucumber.demon.co.uk> Message-ID: <057301d06362$a0dae7f0$e290b7d0$@acm.org> > From: Andrew Gabriel > Sent: Friday, March 20, 2015 1:10 PM > > Does anyone seriously use (or intend to use) 4.1 anymore? The one and only thing I want out of NFSv4.1 is the protocol fix to the exclusive open operation, which currently in NFSv4 results in broken inherited ACLs :(. From jboren at drakecooper.com Fri Mar 20 23:15:44 2015 From: jboren at drakecooper.com (Joseph Boren) Date: Fri, 20 Mar 2015 17:15:44 -0600 Subject: [OmniOS-discuss] list of know-compatible motherboards? In-Reply-To: References: Message-ID: Sorry to dredge this old thread up, but I wanted to add my experience with the Supermicro H8SGL-F motherboard, on the off chance someone is considering using that motherboard and reads this thread. And this is in no way a criticism of F?bio, or complaint about his recommendations. I appreciate the advice and I'm sure his use case is just different enough from mine that he didn't surface these issues. This board has some limitations that may make it not a great choice depending on your intentions for it. First of all, the BIOS can see a maximum of 12 attached HDs. And for some strange reason it sees HDs attached to HBAs before it sees HDs attached to the onboard SATA connectors, so if you have 12 drives attached to HBAs you cannot use the onboard SATA, BIOS can't see it (if you have 11 drives on HBAs you can see 1 local drive, etc). In addition, if you have more than 12 drives attached to HBAs, you can't boot from the drives higher than 12. So I have 14 drives attached to 2 8port HBAs, I can only set any of the first 12 as boot devices. 13 and 14 cannot be used. Finally (i think), if you have 12 or more drives attached to any HBA, on board SATA, whatever, you cannot boot from a flash drive. Even if you set it as the ONLY boot device it will just skip it and complain that there is no bootable device. If you only have 11 drives you can boot from USB Flash no problem. Interestingly, a USB CDROM is unaffected by this. You can select and boot from a USB CDROM regardless of how many drives are attached. Finally (actually this time I think), it appears to be impossible to set up a mirrored syspool on this motherboard, because there is only one slot in the Bios boot order menu for Hard Disk. So you can only choose one of the mirror pair as a boot device. There is no way to specify another HD as a second priority boot device. Now once you get OmniOS loaded it can see and make use of all drives attached to the system, but you are very restricted in what you can use for boot devices. After the better part of 2 weeks of back and forth with Supermicro support (who have been really nice and cooperative, but unable to do anything about it), I'm going to have to eat cost of this board/cpu/memory and get something else. If your use case is 12 total drives or less, and no mirrored boot, this board will work fine. If you need more than 12 drives, or mirrored syspool, it will not work. Thanks Joe -jb- *Joseph Boren* IT Specialist *DRAKE COOPER* + c: (208) 891-2128 + o: (208) 342-0925 + 416 S. 8th St., Boise, ID 83702 + w: drakecooper.com + f: /drakecooper + t: @drakecooper On Wed, Nov 19, 2014 at 4:54 PM, Joseph Boren wrote: > Wow, F?bio, thanks so much, that is very helpful. 
I was looking at > supermicro motherboards, so your info is perfect. > > I will have a look at those, I'm guessing I can find something that fits > my use case. Thanks again, the help is much appreciated. > > Best regards, > > -jb- > *Joseph Boren* > > IT Specialist > *DRAKE COOPER* > + c: (208) 891-2128 + o: (208) 342-0925 > + 416 S. 8th St., Boise, ID 83702 > + w: drakecooper.com + f: /drakecooper + > t: @drakecooper > > > On Wed, Nov 19, 2014 at 4:48 PM, F?bio Rabelo > wrote: > >> I can show you what motherboards I have installed and fully working in >> the customers of mine : >> >> http://www.supermicro.com/Aplus/motherboard/Opteron6000/SR56x0/H8DG6-F.cfm >> >> http://www.supermicro.com/Aplus/motherboard/Opteron6000/SR56x0/H8DG6.cfm >> >> http://www.supermicro.com/Aplus/motherboard/Opteron6000/SR56x0/H8DGi-F.cfm >> >> http://www.supermicro.com/Aplus/motherboard/Opteron6000/SR56x0/H8DGi.cfm >> >> http://www.supermicro.com/Aplus/motherboard/Opteron6000/SR56x0/H8SGL-F.cfm >> >> http://www.supermicro.com/Aplus/motherboard/Opteron6000/SR56x0/H8SGL.cfm >> >> The ones with LSI SAS controler needs to be flashed with IT firmware, you >> can find them in the official Supermicro FTP site : >> >> ftp://ftp.supermicro.com/Driver/SAS/LSI/ >> >> Opterons from 8 to 24 cores, no issue whats soever ... >> >> Some of them are up and running for over an year !!! >> >> >> F?bio Rabelo >> >> 2014-11-19 21:35 GMT-02:00 Joseph Boren : >> >>> Is anyone aware of a list, even a short list, of motherboards that are >>> known to be compatible with OmniOS? The illumos HCL doesn't list any >>> motherboards. >>> >>> Thanks, >>> >>> -jb- >>> *Joseph Boren* >>> >>> IT Specialist >>> *DRAKE COOPER* >>> + c: (208) 891-2128 + o: (208) 342-0925 >>> + 416 S. 8th St., Boise, ID 83702 >>> + w: drakecooper.com + f: /drakecooper >>> + t: @drakecooper >>> >>> >>> _______________________________________________ >>> OmniOS-discuss mailing list >>> OmniOS-discuss at lists.omniti.com >>> http://lists.omniti.com/mailman/listinfo/omnios-discuss >>> >>> >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From hakansom at ohsu.edu Sat Mar 21 01:23:04 2015 From: hakansom at ohsu.edu (Marion Hakanson) Date: Fri, 20 Mar 2015 18:23:04 -0700 Subject: [OmniOS-discuss] list of know-compatible motherboards? In-Reply-To: Message from Joseph Boren of "Fri, 20 Mar 2015 17:15:44 MDT." Message-ID: <201503210123.t2L1N4sb007295@kyklops.ohsu.edu> Joseph, You can work around the "too many drives" problems by making the assumption that you will never need to boot off of your external disks (the ones attached to the SAS HBA's). Then you enter each SAS HBA's BIOS config manager, and disable booting for that HBA. The motherboard BIOS will then no longer see the drives attached to the HBA's. I do this as a matter of course, because we have systems with as many as 120 drives attached via external SAS HBA's. No BIOS copes well with so many potential boot devices. Regards, Marion ================================================================= Subject: Re: [OmniOS-discuss] list of know-compatible motherboards? From: Joseph Boren Date: Fri, 20 Mar 2015 17:15:44 -0600 (16:15 PDT) To: F??bio Rabelo Cc: omnios-discuss Sorry to dredge this old thread up, but I wanted to add my experience with the Supermicro H8SGL-F motherboard, on the off chance someone is considering using that motherboard and reads this thread. And this is in no way a criticism of F?bio, or complaint about his recommendations. 
I appreciate the advice and I'm sure his use case is just different enough from mine that he didn't surface these issues. This board has some limitations that may make it not a great choice depending on your intentions for it. First of all, the BIOS can see a maximum of 12 attached HDs. And for some strange reason it sees HDs attached to HBAs before it sees HDs attached to the onboard SATA connectors, so if you have 12 drives attached to HBAs you cannot use the onboard SATA, BIOS can't see it (if you have 11 drives on HBAs you can see 1 local drive, etc). In addition, if you have more than 12 drives attached to HBAs, you can't boot from the drives higher than 12. So I have 14 drives attached to 2 8port HBAs, I can only set any of the first 12 as boot devices. 13 and 14 cannot be used. Finally (i think), if you have 12 or more drives attached to any HBA, on board SATA, whatever, you cannot boot from a flash drive. Even if you set it as the ONLY boot device it will just skip it and complain that there is no bootable device. If you only have 11 drives you can boot from USB Flash no problem. Interestingly, a USB CDROM is unaffected by this. You can select and boot from a USB CDROM regardless of how many drives are attached. Finally (actually this time I think), it appears to be impossible to set up a mirrored syspool on this motherboard, because there is only one slot in the Bios boot order menu for Hard Disk. So you can only choose one of the mirror pair as a boot device. There is no way to specify another HD as a second priority boot device. Now once you get OmniOS loaded it can see and make use of all drives attached to the system, but you are very restricted in what you can use for boot devices. After the better part of 2 weeks of back and forth with Supermicro support (who have been really nice and cooperative, but unable to do anything about it), I'm going to have to eat cost of this board/cpu/memory and get something else. If your use case is 12 total drives or less, and no mirrored boot, this board will work fine. If you need more than 12 drives, or mirrored syspool, it will not work. Thanks Joe -jb- *Joseph Boren* . . . From jboren at drakecooper.com Sun Mar 22 21:05:45 2015 From: jboren at drakecooper.com (Joseph Boren) Date: Sun, 22 Mar 2015 15:05:45 -0600 Subject: [OmniOS-discuss] list of know-compatible motherboards? In-Reply-To: <0B39E89B-8B5E-4D36-B9D9-18136A87BF58@icloud.com> References: <0B39E89B-8B5E-4D36-B9D9-18136A87BF58@icloud.com> Message-ID: Hi Ben, Thanks for the tip. It turns out that board had some actual physical defects that were causing some weird behaviour that was confusing the whole issue. What you suggest should work perfectly for that scenario. I'm exchanging the board and I'm sure the new one will be fine. Thanks again for the idea. Best regards, joe boren -jb- *Joseph Boren* IT Specialist *DRAKE COOPER* + c: (208) 891-2128 + o: (208) 342-0925 + 416 S. 8th St., Boise, ID 83702 + w: drakecooper.com + f: /drakecooper + t: @drakecooper On Sat, Mar 21, 2015 at 1:08 AM, Ben Kitching wrote: > Hi Joe, > > I?ve had similar problems with Supermicro boards in the past. > > Have you tried disabling the option ROMs for your HBA?s in the BIOS? > > That solved it for us. > > On 20 Mar 2015, at 23:15, Joseph Boren wrote: > > Sorry to dredge this old thread up, but I wanted to add my experience with > the Supermicro H8SGL-F motherboard, on the off chance someone is > considering using that motherboard and reads this thread. 
And this is in > no way a criticism of F?bio, or complaint about his recommendations. I > appreciate the advice and I'm sure his use case is just different enough > from mine that he didn't surface these issues. > > This board has some limitations that may make it not a great choice > depending on your intentions for it. First of all, the BIOS can see a > maximum of 12 attached HDs. And for some strange reason it sees HDs > attached to HBAs before it sees HDs attached to the onboard SATA > connectors, so if you have 12 drives attached to HBAs you cannot use the > onboard SATA, BIOS can't see it (if you have 11 drives on HBAs you can see > 1 local drive, etc). In addition, if you have more than 12 drives attached > to HBAs, you can't boot from the drives higher than 12. So I have 14 > drives attached to 2 8port HBAs, I can only set any of the first 12 as boot > devices. 13 and 14 cannot be used. Finally (i think), if you have 12 or > more drives attached to any HBA, on board SATA, whatever, you cannot boot > from a flash drive. Even if you set it as the ONLY boot device it will > just skip it and complain that there is no bootable device. If you only > have 11 drives you can boot from USB Flash no problem. Interestingly, a > USB CDROM is unaffected by this. You can select and boot from a USB CDROM > regardless of how many drives are attached. Finally (actually this time I > think), it appears to be impossible to set up a mirrored syspool on this > motherboard, because there is only one slot in the Bios boot order menu for > Hard Disk. So you can only choose one of the mirror pair as a boot > device. There is no way to specify another HD as a second priority boot > device. Now once you get OmniOS loaded it can see and make use of all > drives attached to the system, but you are very restricted in what you can > use for boot devices. > > After the better part of 2 weeks of back and forth with Supermicro support > (who have been really nice and cooperative, but unable to do anything about > it), I'm going to have to eat cost of this board/cpu/memory and get > something else. If your use case is 12 total drives or less, and no > mirrored boot, this board will work fine. If you need more than 12 drives, > or mirrored syspool, it will not work. > > Thanks > Joe > > > -jb- > *Joseph Boren* > > IT Specialist > *DRAKE COOPER* > + c: (208) 891-2128 + o: (208) 342-0925 > + 416 S. 8th St., Boise, ID 83702 > + w: drakecooper.com + f: /drakecooper + > t: @drakecooper > > > > On Wed, Nov 19, 2014 at 4:54 PM, Joseph Boren > wrote: > >> Wow, F?bio, thanks so much, that is very helpful. I was looking at >> supermicro motherboards, so your info is perfect. >> >> I will have a look at those, I'm guessing I can find something that fits >> my use case. Thanks again, the help is much appreciated. >> >> Best regards, >> >> >> -jb- >> *Joseph Boren* >> >> IT Specialist >> *DRAKE COOPER* >> + c: (208) 891-2128 + o: (208) 342-0925 >> + 416 S. 
8th St., Boise, ID 83702 >> + w: drakecooper.com + f: /drakecooper >> + t: @drakecooper >> >> >> >> On Wed, Nov 19, 2014 at 4:48 PM, F?bio Rabelo >> wrote: >> >>> I can show you what motherboards I have installed and fully working in >>> the customers of mine : >>> >>> >>> http://www.supermicro.com/Aplus/motherboard/Opteron6000/SR56x0/H8DG6-F.cfm >>> >>> http://www.supermicro.com/Aplus/motherboard/Opteron6000/SR56x0/H8DG6.cfm >>> >>> >>> http://www.supermicro.com/Aplus/motherboard/Opteron6000/SR56x0/H8DGi-F.cfm >>> >>> http://www.supermicro.com/Aplus/motherboard/Opteron6000/SR56x0/H8DGi.cfm >>> >>> >>> http://www.supermicro.com/Aplus/motherboard/Opteron6000/SR56x0/H8SGL-F.cfm >>> >>> http://www.supermicro.com/Aplus/motherboard/Opteron6000/SR56x0/H8SGL.cfm >>> >>> The ones with LSI SAS controler needs to be flashed with IT firmware, >>> you can find them in the official Supermicro FTP site : >>> >>> ftp://ftp.supermicro.com/Driver/SAS/LSI/ >>> >>> Opterons from 8 to 24 cores, no issue whats soever ... >>> >>> Some of them are up and running for over an year !!! >>> >>> >>> F?bio Rabelo >>> >>> 2014-11-19 21:35 GMT-02:00 Joseph Boren : >>> >>>> Is anyone aware of a list, even a short list, of motherboards that are >>>> known to be compatible with OmniOS? The illumos HCL doesn't list any >>>> motherboards. >>>> >>>> Thanks, >>>> >>>> >>>> -jb- >>>> *Joseph Boren* >>>> >>>> IT Specialist >>>> *DRAKE COOPER* >>>> + c: (208) 891-2128 + o: (208) 342-0925 >>>> + 416 S. 8th St., Boise, ID 83702 >>>> + w: drakecooper.com + f: /drakecooper >>>> + t: @drakecooper >>>> >>>> >>>> >>>> >>>> _______________________________________________ >>>> OmniOS-discuss mailing list >>>> OmniOS-discuss at lists.omniti.com >>>> http://lists.omniti.com/mailman/listinfo/omnios-discuss >>>> >>>> >>> >> > _______________________________________________ > OmniOS-discuss mailing list > OmniOS-discuss at lists.omniti.com > http://lists.omniti.com/mailman/listinfo/omnios-discuss > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From omnios at citrus-it.net Mon Mar 23 00:01:26 2015 From: omnios at citrus-it.net (Andy) Date: Mon, 23 Mar 2015 00:01:26 +0000 (GMT) Subject: [OmniOS-discuss] Dell vs. Supermicro and any recommendations.. In-Reply-To: References: <81F2A38D-5298-4FB8-B0BB-24E4506D6040@omniti.com> <201503121415.t2CEFBcC022831@elvis.arl.psu.edu> Message-ID: On Fri, 20 Mar 2015, Andy wrote: ; On Thu, 12 Mar 2015, Andy wrote: ; ; ; ; On Thu, 12 Mar 2015, John D Groenveld wrote: ; ; ; ; ; Otherwise, good luck debugging MegaRAID drivers and firmware. ; ; This definitely looks like a driver problem but I'm making progress. ; It seems that the code for handling logical versus physical disks on an ; LSI Invader controller is different and the PD code has some issues. I think I've cracked it! carolina# (43) zfs create -o compress=off rpool/test carolina# (48) dd if=/dev/zero of=/rpool/test/tt bs=512k count=10000 5242880000 bytes transferred in 13.199665 secs (397197954 bytes/sec) Mirrored rpool on 15K SAS disks. Previously I was hitting 20MB/s maximum. No more errors in the controller firmware log either. I'll test properly over the next few days and clean up the diffs, but it looks good and the changes should only affect the non-RAID code. 
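For the read side the same sort of quick check works, as long as the test
file is bigger than RAM so the ARC can't just hand it back from cache
(sizes below are only examples):

  carolina# dd if=/dev/zero of=/rpool/test/big bs=512k count=100000   # ~50GB, > RAM
  carolina# dd if=/rpool/test/big of=/dev/null bs=512k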
Andy -- Citrus IT Limited | +44 (0)870 199 8000 | enquiries at citrus-it.co.uk Rock House Farm | Green Moor | Wortley | Sheffield | S35 7DQ Registered in England and Wales | Company number 4899123 From danmcd at omniti.com Mon Mar 23 03:52:02 2015 From: danmcd at omniti.com (Dan McDonald) Date: Sun, 22 Mar 2015 23:52:02 -0400 Subject: [OmniOS-discuss] Dell vs. Supermicro and any recommendations.. In-Reply-To: References: <81F2A38D-5298-4FB8-B0BB-24E4506D6040@omniti.com> <201503121415.t2CEFBcC022831@elvis.arl.psu.edu> Message-ID: <22F91F32-07D1-42BF-99DE-F6C87037463A@omniti.com> > On Mar 22, 2015, at 8:01 PM, Andy wrote: > > > I think I've cracked it! > > carolina# (43) zfs create -o compress=off rpool/test > carolina# (48) dd if=/dev/zero of=/rpool/test/tt bs=512k count=10000 > 5242880000 bytes transferred in 13.199665 secs (397197954 bytes/sec) > > Mirrored rpool on 15K SAS disks. Previously I was hitting 20MB/s maximum. > No more errors in the controller firmware log either. > > I'll test properly over the next few days and clean up the diffs, but it > looks good and the changes should only affect the non-RAID code. Please make sure it gets reviewed on the illumos developer list. If you're quick, it will make r151014 before I cut it off. Thank you for cracking it! :) Dan From tobi at oetiker.ch Mon Mar 23 14:33:24 2015 From: tobi at oetiker.ch (Tobias Oetiker) Date: Mon, 23 Mar 2015 15:33:24 +0100 (CET) Subject: [OmniOS-discuss] kvm crashing while running replication send/receive Message-ID: I got these bunch of new disks when for our (r12 omnios) server and userd repication send / receive to transfer an existing pool to the new disks. While doing so, we found that the kvm instances running on that machine had a rather pronounced tendency to become unresponsive. Killing the kvm process and starting it again helped ... Neither the sending nor the receiving pool were the ones where the kvm volumes where hosted ... Any ideas how this can happen ? cheers tobi -- Tobi Oetiker, OETIKER+PARTNER AG, Aarweg 15 CH-4600 Olten, Switzerland www.oetiker.ch tobi at oetiker.ch +41 62 775 9902 From danmcd at omniti.com Mon Mar 23 20:14:57 2015 From: danmcd at omniti.com (Dan McDonald) Date: Mon, 23 Mar 2015 16:14:57 -0400 Subject: [OmniOS-discuss] A warning for upgraders with large numbers of BEs Message-ID: <8B8D8D0F-35B5-42F8-B1FE-CD8EA002EE06@omniti.com> Soon r151014 will be hitting the streets. WHEN THAT DOES, I have to warn people, especially those jumping from r151006 to r151014 about a known issue in grub. The illumos grub has serious memory management issues. It cannot cope with too many boot environment (BE) entries. The upper-limit on r151006 was ~60. The upper-limit on r151014 is ~40. If you upgrade an r151006 machine with 50 BEs to r151014, you may lose the ability to boot (but not your data or even rpool). If you have more than 40 BEs on your rpool, I'd highly recommend trimming some back prior to an upgrade. We've been (the illumos community, not just OmniOS) trying to figure out what to fix in grub, but it's opaque code at best. The r151014 installation & upgrade page will have this warning as well, but I wanted to give the community a heads-up now, so you could prepare prior to the upgrade to r151014. 
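Trimming is just a beadm exercise, e.g. (the BE name below is only an
example -- check 'beadm list' first and leave the entries flagged N/R,
the active ones, alone):

  # beadm list
  # beadm destroy omnios-r151008-backup-1

Each destroyed BE also drops its entry from the grub menu, which is what
keeps you under the limit.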
Dan
From danmcd at omniti.com Mon Mar 23 21:40:50 2015 From: danmcd at omniti.com (Dan McDonald) Date: Mon, 23 Mar 2015 17:40:50 -0400 Subject: [OmniOS-discuss] A warning for upgraders with large numbers of BEs In-Reply-To: <20150323205308.GA21991@linux.gyakg.u-szeged.hu> References: <8B8D8D0F-35B5-42F8-B1FE-CD8EA002EE06@omniti.com> <20150323205308.GA21991@linux.gyakg.u-szeged.hu> Message-ID: The LX brand hadn't been upstreamed yet. Once it has, we will include it. We will likely assist in its upstreaming, but not at the moment. Dan Sent from my iPhone (typos, autocorrect, and all) > On Mar 23, 2015, at 4:53 PM, PÁSZTOR György wrote: > > Hi, > > "Dan McDonald" wrote at 2015-03-23 16:14: >> Soon r151014 will be hitting the streets. WHEN THAT DOES, I have to warn people, especially those jumping from r151006 to r151014 about a known issue in grub. >> >> The illumos grub has serious memory management issues. It cannot cope with too many boot environment (BE) entries. > > Sorry for going semi-offtopic in the thread, but: will the lx brand be restored > in the upcoming release? > > Is there a feature map / release plan / anything available? > I tried to find information regarding this topic without success. > > I checked this url: > http://omnios.omniti.com/roadmap.php > But no relevant information was there. It seems outdated / > unmaintained. > > I've just recently found this distro. I used openindiana since Oracle... > -- Did what they did to opensolaris -- > > So, I'm new here, sorry for lame questions. > > Kind regards, > György Pásztor
From jboren at drakecooper.com Mon Mar 23 23:34:21 2015 From: jboren at drakecooper.com (Joseph Boren) Date: Mon, 23 Mar 2015 17:34:21 -0600 Subject: [OmniOS-discuss] list of know-compatible motherboards? In-Reply-To: References: <0B39E89B-8B5E-4D36-B9D9-18136A87BF58@icloud.com> Message-ID: Well, I have a dumb question, if everyone isn't fed up with me. This board appears to have a hardware fault, and I'm trying to figure out the exact details for the RMA exchange. First of all, the second Ethernet port doesn't work. If you plug it into a switchport, the speed/duplex LED comes on, seeming to indicate that it is linking up at 1000/full duplex, but the link/activity light never comes on. OmniOS only sees one Ethernet port on the motherboard. The issue I'm struggling with, however, is identifying a failed PCIEX device. When the machine boots, right before the login prompt comes up, I get an error: Warning: one or more I/O devices have been retired. When I check to see what the device is using "fmadm faulty", I get the following: Fault Class: fault.io.pciex.device-interr Affects: dev:////pci at 0,0/pci1002,5a1d at a/pci15d9,a711 at 0 faulted and taken out of service FRU: "MB" (hc://:product-id=H8SLG:server-id=omnistor1:chassis-id=1234567890/motherboard=0) Description: A problem was detected for a PCIEX device. Refer to http://illumos.org/msg/PCIEX-8000-0A for more information. I'm having trouble identifying exactly what device it's referring to. Seems like something on the motherboard, or is it referring to the motherboard itself? It would make sense that it was referring to the Ethernet port, but I'm pretty ignorant about PCIEX and haven't been able to find any info that corresponds to those numbers. If someone could point me in the right direction, I'd be grateful. Best regards, Joe Boren -jb- *Joseph Boren* IT Specialist *DRAKE COOPER* + c: (208) 891-2128 + o: (208) 342-0925 + 416 S.
8th St., Boise, ID 83702 + w: drakecooper.com + f: /drakecooper + t: @drakecooper On Sun, Mar 22, 2015 at 3:05 PM, Joseph Boren wrote: > Hi Ben, > > Thanks for the tip. It turns out that board had some actual physical > defects that were causing some weird behaviour that was confusing the whole > issue. What you suggest should work perfectly for that scenario. I'm > exchanging the board and I'm sure the new one will be fine. > > Thanks again for the idea. > > Best regards, > joe boren > > -jb- > *Joseph Boren* > > IT Specialist > *DRAKE COOPER* > + c: (208) 891-2128 + o: (208) 342-0925 > + 416 S. 8th St., Boise, ID 83702 > + w: drakecooper.com + f: /drakecooper + > t: @drakecooper > > > On Sat, Mar 21, 2015 at 1:08 AM, Ben Kitching > wrote: > >> Hi Joe, >> >> I?ve had similar problems with Supermicro boards in the past. >> >> Have you tried disabling the option ROMs for your HBA?s in the BIOS? >> >> That solved it for us. >> >> On 20 Mar 2015, at 23:15, Joseph Boren wrote: >> >> Sorry to dredge this old thread up, but I wanted to add my experience >> with the Supermicro H8SGL-F motherboard, on the off chance someone is >> considering using that motherboard and reads this thread. And this is in >> no way a criticism of F?bio, or complaint about his recommendations. I >> appreciate the advice and I'm sure his use case is just different enough >> from mine that he didn't surface these issues. >> >> This board has some limitations that may make it not a great choice >> depending on your intentions for it. First of all, the BIOS can see a >> maximum of 12 attached HDs. And for some strange reason it sees HDs >> attached to HBAs before it sees HDs attached to the onboard SATA >> connectors, so if you have 12 drives attached to HBAs you cannot use the >> onboard SATA, BIOS can't see it (if you have 11 drives on HBAs you can see >> 1 local drive, etc). In addition, if you have more than 12 drives attached >> to HBAs, you can't boot from the drives higher than 12. So I have 14 >> drives attached to 2 8port HBAs, I can only set any of the first 12 as boot >> devices. 13 and 14 cannot be used. Finally (i think), if you have 12 or >> more drives attached to any HBA, on board SATA, whatever, you cannot boot >> from a flash drive. Even if you set it as the ONLY boot device it will >> just skip it and complain that there is no bootable device. If you only >> have 11 drives you can boot from USB Flash no problem. Interestingly, a >> USB CDROM is unaffected by this. You can select and boot from a USB CDROM >> regardless of how many drives are attached. Finally (actually this time I >> think), it appears to be impossible to set up a mirrored syspool on this >> motherboard, because there is only one slot in the Bios boot order menu for >> Hard Disk. So you can only choose one of the mirror pair as a boot >> device. There is no way to specify another HD as a second priority boot >> device. Now once you get OmniOS loaded it can see and make use of all >> drives attached to the system, but you are very restricted in what you can >> use for boot devices. >> >> After the better part of 2 weeks of back and forth with Supermicro >> support (who have been really nice and cooperative, but unable to do >> anything about it), I'm going to have to eat cost of this board/cpu/memory >> and get something else. If your use case is 12 total drives or less, and >> no mirrored boot, this board will work fine. If you need more than 12 >> drives, or mirrored syspool, it will not work. 
>> >> Thanks >> Joe >> >> >> -jb- >> *Joseph Boren* >> >> IT Specialist >> *DRAKE COOPER* >> + c: (208) 891-2128 + o: (208) 342-0925 >> + 416 S. 8th St., Boise, ID 83702 >> + w: drakecooper.com + f: /drakecooper >> + t: @drakecooper >> >> >> >> On Wed, Nov 19, 2014 at 4:54 PM, Joseph Boren >> wrote: >> >>> Wow, F?bio, thanks so much, that is very helpful. I was looking at >>> supermicro motherboards, so your info is perfect. >>> >>> I will have a look at those, I'm guessing I can find something that fits >>> my use case. Thanks again, the help is much appreciated. >>> >>> Best regards, >>> >>> >>> -jb- >>> *Joseph Boren* >>> >>> IT Specialist >>> *DRAKE COOPER* >>> + c: (208) 891-2128 + o: (208) 342-0925 >>> + 416 S. 8th St., Boise, ID 83702 >>> + w: drakecooper.com + f: /drakecooper >>> + t: @drakecooper >>> >>> >>> >>> On Wed, Nov 19, 2014 at 4:48 PM, F?bio Rabelo >> > wrote: >>> >>>> I can show you what motherboards I have installed and fully working in >>>> the customers of mine : >>>> >>>> >>>> http://www.supermicro.com/Aplus/motherboard/Opteron6000/SR56x0/H8DG6-F.cfm >>>> >>>> http://www.supermicro.com/Aplus/motherboard/Opteron6000/SR56x0/H8DG6.cfm >>>> >>>> >>>> http://www.supermicro.com/Aplus/motherboard/Opteron6000/SR56x0/H8DGi-F.cfm >>>> >>>> http://www.supermicro.com/Aplus/motherboard/Opteron6000/SR56x0/H8DGi.cfm >>>> >>>> >>>> http://www.supermicro.com/Aplus/motherboard/Opteron6000/SR56x0/H8SGL-F.cfm >>>> >>>> http://www.supermicro.com/Aplus/motherboard/Opteron6000/SR56x0/H8SGL.cfm >>>> >>>> The ones with LSI SAS controler needs to be flashed with IT firmware, >>>> you can find them in the official Supermicro FTP site : >>>> >>>> ftp://ftp.supermicro.com/Driver/SAS/LSI/ >>>> >>>> Opterons from 8 to 24 cores, no issue whats soever ... >>>> >>>> Some of them are up and running for over an year !!! >>>> >>>> >>>> F?bio Rabelo >>>> >>>> 2014-11-19 21:35 GMT-02:00 Joseph Boren : >>>> >>>>> Is anyone aware of a list, even a short list, of motherboards that are >>>>> known to be compatible with OmniOS? The illumos HCL doesn't list any >>>>> motherboards. >>>>> >>>>> Thanks, >>>>> >>>>> >>>>> -jb- >>>>> *Joseph Boren* >>>>> >>>>> IT Specialist >>>>> *DRAKE COOPER* >>>>> + c: (208) 891-2128 + o: (208) 342-0925 >>>>> + 416 S. 8th St., Boise, ID 83702 >>>>> + w: drakecooper.com + f: /drakecooper >>>>> + t: @drakecooper >>>>> >>>>> >>>>> >>>>> >>>>> _______________________________________________ >>>>> OmniOS-discuss mailing list >>>>> OmniOS-discuss at lists.omniti.com >>>>> http://lists.omniti.com/mailman/listinfo/omnios-discuss >>>>> >>>>> >>>> >>> >> _______________________________________________ >> OmniOS-discuss mailing list >> OmniOS-discuss at lists.omniti.com >> http://lists.omniti.com/mailman/listinfo/omnios-discuss >> >> >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From omnios at citrus-it.net Mon Mar 23 23:58:48 2015 From: omnios at citrus-it.net (Andy) Date: Mon, 23 Mar 2015 23:58:48 +0000 (GMT) Subject: [OmniOS-discuss] list of know-compatible motherboards? In-Reply-To: References: <0B39E89B-8B5E-4D36-B9D9-18136A87BF58@icloud.com> Message-ID: On Mon, 23 Mar 2015, Joseph Boren wrote: ; Fault Class: fault.io.pciex.device-interr ; Affects: dev:////pci at 0,0/pci1002,5a1d at a/pci15d9,a711 at 0 faulted and taken ; out of service ; FRU: "MB" ; (hc://:product-id=H8SLG:server-id=omnistor1:chassis-id=1234567890/motherboard=0) 15d9 is SuperMicro and http://mirror.szepe.net/siv/pcidevs.txt says that's an embedded MegaRAID. A. 
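If you want to do the decoding yourself next time, roughly this -- the pci.ids path is just wherever you keep a copy of the public database from https://pci-ids.ucw.cz/, and the IDs come straight from the fmadm output:

# which driver instance (if any) is bound to the faulted path
grep 'pci15d9,a711' /etc/path_to_inst

# the node name is a hex vendor,device (or subsystem) ID pair;
# 15d9 is the Super Micro vendor ID in the public database
grep -i '^15d9' /tmp/pci.ids

# prtconf -D shows the same device tree with the bound driver per node
prtconf -D | grep -i 15d9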
-- Citrus IT Limited | +44 (0)870 199 8000 | enquiries at citrus-it.co.uk Rock House Farm | Green Moor | Wortley | Sheffield | S35 7DQ Registered in England and Wales | Company number 4899123
From john.barfield at bissinc.com Tue Mar 24 19:29:45 2015 From: john.barfield at bissinc.com (John Barfield) Date: Tue, 24 Mar 2015 19:29:45 +0000 Subject: [OmniOS-discuss] Potential KVM Virtio Performance Issues Message-ID: <8E74F76C-13B9-4FDD-95EE-8B07F373B412@bissinc.com> Greetings OmniOS community! This is my first time asking a question on this list, so here goes. I've deployed a zone on OmniOS and a KVM virtual machine within the zone. I've been doing some initial virtio network interface performance testing with iperf, and the following are my results (default out of the box for all moving parts). Host: OmniOS build stable: r151012 Test Interfaces: GZ Phys = igb0 KVM Vnic: kvm0 over igb0 GZ Vnic: gvm0 over igb0 Zone Vnic: zvm0 over igb0 Network addressing: All vnics are within 10.128.255.249/29 Zone Brand: Omni-ti ipkg Qemu-Kvm Guest: CentOS 6.6 x86_64 iPerf Results: KVM Guest -> Global Zone = 151 Mbytes (Expected close to 1 GByte) KVM Guest -> KVM Zone = 147 Mbytes (Expected close to 1 GByte) Zone -> Global Zone = 5.0 GBytes (These were expected since it was a host-only VNIC network) GZ -> Zone = 4.7 GBytes (These were expected since it was a host-only VNIC network) My question is: are there any tweaks that I'm missing to get the full performance potential within the guest? Why am I only seeing 147 Mbytes between KVM and the hosting zone or the global zone? I'm testing with an isolated network and vnics only, so the traffic never leaves the physical host to go over the wire. I do have CPU capped at 16 cores and memory capped at 16GB in the zone. Is there some default network capping that I'm missing? Or process throttling? MTU is 1500 across the board. I did the same test with etherstubs at first, but thought maybe I was having an MTU mismatch because I received the same 147 Mbyte result; however, a subsequent test using just the GZ -> child zone showed 5.0 GBps over the etherstub switch, just like when I only used the VNICs over igb0. Also, just for grins, I tested two bare-metal hosts on my physical network with iperf (one being CentOS 6.5 and the other OmniOS build r151012) and received 1.09 Gbytes over a physical switch. Your thoughts are appreciated! John Barfield / Sr Principal Engineer +1 (214) 425-0783/ john.barfield at bissinc.com BISS, Inc. Office: +1 (214) 506-8354 4925 Greenville Ave Suite 900 Dallas, TX 75206 support.bissinc.com
From moo at wuffers.net Tue Mar 24 21:17:33 2015 From: moo at wuffers.net (wuffers) Date: Tue, 24 Mar 2015 17:17:33 -0400 Subject: [OmniOS-discuss] kernel panic "kernel heap corruption detected" when creating zero eager disks Message-ID: I recently created a pair of 25TB LUs for use in my VMware environment to test out Veeam (and using that space for my repo - yes, yes, backups should not reside in the same storage, but they will be exported to tape). So while trying to create a 16TB drive in the vSphere fat client, I got the value out of range error ( http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=2054952). OKed the error, and the task seemed to run anyway, but at some point my whole SAN crashed during the creation of the drive.
As this was during business hours, I did not have time to wait on the dump, but I was able to reproduce it later trying to create a 10TB drive (again from the fat vSphere client, not web client) and capture the dump (which takes 40 minutes.. grr). Just an quick note on the environment: the VMware hosts are connected to the head unit via IB and SRP. The largest LUs I had previously created for VMware were 5TB in size, and largest drive created was 2TB. fmdump info: TIME UUID SUNW-MSG-ID Mar 20 2015 19:35:26.819716000 31ced65f-dca2-ee58-c882-a6daa6b94208 SUNOS-8000-KL nvlist version: 0 version = 0x0 class = list.suspect uuid = 31ced65f-dca2-ee58-c882-a6daa6b94208 code = SUNOS-8000-KL diag-time = 1426894526 787544 de = fmd:///module/software-diagnosis fault-list-sz = 0x1 fault-list = (array of embedded nvlists) (start fault-list[0]) nvlist version: 0 version = 0x0 class = defect.sunos.kernel.panic certainty = 0x64 asru = sw:///:path=/var/crash/unknown/.31ced65f-dca2-ee58-c882-a6daa6b94208 resource = sw:///:path=/var/crash/unknown/.31ced65f-dca2-ee58-c882-a6daa6b94208 savecore-succcess = 1 dump-dir = /var/crash/unknown dump-files = vmdump.0 os-instance-uuid = 31ced65f-dca2-ee58-c882-a6daa6b94208 panicstr = kernel heap corruption detected panicstack = fffffffffba49114 () | genunix:kmem_slab_free+c1 () | genunix:kmem_magazine_destroy+6e () | genunix:kmem_cache_magazine_purge+f0 () | genunix:kmem_cache_magazine_resize+40 () | genunix:taskq_thread+2d0 () | unix:thread_start+8 () | crashtime = 1426891707 panic-time = Fri Mar 20 18:48:27 2015 EDT (end fault-list[0]) fault-status = 0x1 severity = Major __ttl = 0x1 __tod = 0x550caebe 0x30dbdfa0 Crash file: https://drive.google.com/open?id=0B7mCJnZUzJPKOXl1S3IwYXh4NTg&authuser=0 I couldn't find any interesting comparative posts/reports. Would some kind soul care to look at the dump and see what is happening here? (And is this the right spot for a kernel panic report, or is it better to go to the illumos list?) -------------- next part -------------- An HTML attachment was scrubbed... URL: From john.barfield at bissinc.com Tue Mar 24 21:34:47 2015 From: john.barfield at bissinc.com (John Barfield) Date: Tue, 24 Mar 2015 21:34:47 +0000 Subject: [OmniOS-discuss] Potential KVM Virtio Performance Issues Message-ID: Okay found the problem. After further testing I achieved 952 MBytes on a VM-2-VM connection...1 linux Ubuntu 12.04 vm to another CentOS 6.6 VM running on two different SmartOS host machines (through an extreme networks switch). This was puzzling so I look at how joyent ran the VM?s command with pargs?I found that they do not use the following format: -net nic,vlan=1,name=${VNIC2},model=virtio,macaddr=${mac2} \ -net vnic,vlan=1,name=${VNIC2},ifname=${VNIC2},macaddr=${mac2} \ They use this format: -device \ virtio-net-pci,mac=02:08:20:5f:85:0d,tx=timer,x-txtimer=200000,x-txburst=12 8,vlan=0 \ \ -net \ vnic,name=${VNIC1},vlan=0,ifname=${VNIC1} \ I?m not sure if the txtimer values did anything performance gaining or not?I?m pretty sure just switching to the -device configuration instead of the legacy -net nic configuration is what did the trick. If anyone wants me to I?ll test and see if that was the only difference. Have a great day! John Barfield / Sr Principal Engineer +1 (214) 425-0783/ john.barfield at bissinc.com BISS, Inc. 
Office: +1 (214) 506-8354 4925 Greenville Ave Suite 900 Dallas, TX 75206 support.bissinc.com This e-mail message may contain confidential or legally privileged information and is intended only for the use of the intended recipient(s). Any unauthorized disclosure, dissemination, distribution, copying or the taking of any action in reliance on the information herein is prohibited. E-mails are not secure and cannot be guaranteed to be error free as they can be intercepted, amended, or contain viruses. Anyone who communicates with us by e-mail is deemed to have accepted these risks. Company Name is not responsible for errors or omissions in this message and denies any responsibility for any damage arising from the use of e-mail. Any opinion and other statement contained in this message and any attachment are solely those of the author and do not necessarily represent those of the company. On 3/24/15, 2:29 PM, "John Barfield" wrote: >Greetings OmnisOS community! This is my first time to ask a question on >this list so here goes. I?ve deployed a zone on omnios and a KVM virtual >machine within the zone. I?ve been doing some initial virtio network >interface performance testing with iperf and the following are my results >(default out of the box for all moving parts). > > >Host: >OmniOS build stable: r151012 >Test Interfaces: >GZ Phys = igb0 >KVM Vnic: kvm0 over igb0 >GZ Vnic: gvm0 over igb0 >Zone Vnic: zvm0 over igb0 > >Network addressing: All vnics are with 10.128.255.249/29 > >Zone Brand: Omni-ti ipkg >Qemu-Kvm Guest: Centos 6.6 x86_64 > > >iPerf Results: > >KVM Guest -> Global Zone = 151 Mbytes (Expected close to 1 GByte) > >KVM Guest -> KVM Zone = 147 Mbytes (Expected close to 1 GByte) > >Zone -> Global Zone = 5.0GBytes (These were expected since it was a host >only VNIC network) > >GZ -> Zone = 4.7 Gbytes (These were expected since it was a host only >VNIC >network) > > >My question is are there any tweaks that I?m missing to get the full >performance potential within the guest? Why am I only seeing 147 Mbytes >between KVM and the hosting zone or the global zone? > >I?m testing with an isolated network and vnics only, so the traffic is >never leaving the physical host to go over the wire. > >I do have cpu capped at 16 cores and memory capped at 16GB of memory in >the zone. Is there some default network capping that I?m missing? Or >process throttling? > >MTU is 1500 across the board. > >I did the same test with etherstubs at first but though maybe I was >having >an MTU mismatch because I received the same 147 Mbyte result?however a >subsequent test using just the GZ -> child zone showed 5.0GBps over the >etherstub switch just like when I only used the VNIC?s over igb0. > >Also just for grins I tested two bare metal hosts on my physical network >with iperf?one being CentOS 6.5 and the other OmniOS build r151012 and >received 1.09 Gbytes over a physical switch. > >Your thoughts are appreciated! > > > > > > > > >John Barfield / Sr Principal Engineer >+1 (214) 425-0783/ john.barfield at bissinc.com >BISS, Inc. 
Office: +1 (214) 506-8354 > >4925 Greenville Ave Suite 900 >Dallas, TX 75206 >support.bissinc.com >_______________________________________________ >OmniOS-discuss mailing list >OmniOS-discuss at lists.omniti.com >http://lists.omniti.com/mailman/listinfo/omnios-discuss From danmcd at omniti.com Tue Mar 24 22:41:15 2015 From: danmcd at omniti.com (Dan McDonald) Date: Tue, 24 Mar 2015 18:41:15 -0400 Subject: [OmniOS-discuss] kernel panic "kernel heap corruption detected" when creating zero eager disks In-Reply-To: References: Message-ID: <847C1AB3-A505-4A73-B5E7-F3EA2BD41577@omniti.com> Here's a good place to start. It may need to be kicked to the illumos developer's list, but let's see what we can figure out first. 1.) What revision of OmniOS are you running? 2.) I notice a lot of STMF threads. COMSTAR (aka. STMF) is not the most stable piece of software in illumos, especially in older revisions. There's been a lot of work done on it, but that's mostly in Nexenta's distro. It hasn't been all upstreamed yet. Dan From danmcd at omniti.com Tue Mar 24 22:47:59 2015 From: danmcd at omniti.com (Dan McDonald) Date: Tue, 24 Mar 2015 18:47:59 -0400 Subject: [OmniOS-discuss] Potential KVM Virtio Performance Issues In-Reply-To: References: Message-ID: <708F6249-5038-4DEE-A2A1-34D185B65CFC@omniti.com> > On Mar 24, 2015, at 5:34 PM, John Barfield wrote: > > > They use this format: > > -device \ > virtio-net-pci,mac=02:08:20:5f:85:0d,tx=timer,x-txtimer=200000,x-txburst=12 > 8,vlan=0 \ > \ > -net \ > vnic,name=${VNIC1},vlan=0,ifname=${VNIC1} \ > I?m not sure if the txtimer values did anything performance gaining or > not?I?m pretty sure just switching to the -device configuration instead of > the legacy -net nic configuration is what did the trick. > > If anyone wants me to I?ll test and see if that was the only difference. I would be interested, especially so if we have to update our KVM page to mention this. Thanks! Dan From hasslerd at gmx.li Tue Mar 24 23:04:26 2015 From: hasslerd at gmx.li (Dominik Hassler) Date: Wed, 25 Mar 2015 00:04:26 +0100 Subject: [OmniOS-discuss] Potential KVM Virtio Performance Issues In-Reply-To: <708F6249-5038-4DEE-A2A1-34D185B65CFC@omniti.com> References: <708F6249-5038-4DEE-A2A1-34D185B65CFC@omniti.com> Message-ID: <5511ED7A.3080006@gmx.li> Dan, >> After further testing I achieved 952 MBytes on a VM-2-VM >> connection...1 >> linux Ubuntu 12.04 vm to another CentOS 6.6 VM running on two >> different SmartOS host machines (through an extreme networks switch). if I got John correctly, he was running his second test on SmartOS hosts... We did a lot of testing on OmniOS with -net vnic and -device virtio-net-pci but sadly to no avail... I think we have to hope that SmartOS kvm improvements will get upstreamed sooner or later. On 03/24/2015 11:47 PM, Dan McDonald wrote: > >> On Mar 24, 2015, at 5:34 PM, John Barfield wrote: >> >> >> They use this format: >> >> -device \ >> virtio-net-pci,mac=02:08:20:5f:85:0d,tx=timer,x-txtimer=200000,x-txburst=12 >> 8,vlan=0 \ >> \ >> -net \ >> vnic,name=${VNIC1},vlan=0,ifname=${VNIC1} \ > >> I?m not sure if the txtimer values did anything performance gaining or >> not?I?m pretty sure just switching to the -device configuration instead of >> the legacy -net nic configuration is what did the trick. >> >> If anyone wants me to I?ll test and see if that was the only difference. > > I would be interested, especially so if we have to update our KVM page to mention this. > > Thanks! 
> Dan > > _______________________________________________ > OmniOS-discuss mailing list > OmniOS-discuss at lists.omniti.com > http://lists.omniti.com/mailman/listinfo/omnios-discuss > From danmcd at omniti.com Tue Mar 24 23:12:15 2015 From: danmcd at omniti.com (Dan McDonald) Date: Tue, 24 Mar 2015 19:12:15 -0400 Subject: [OmniOS-discuss] Potential KVM Virtio Performance Issues In-Reply-To: <5511ED7A.3080006@gmx.li> References: <708F6249-5038-4DEE-A2A1-34D185B65CFC@omniti.com> <5511ED7A.3080006@gmx.li> Message-ID: <870038BE-8F1E-4411-A0BC-099D309E447D@omniti.com> > On Mar 24, 2015, at 7:04 PM, Dominik Hassler wrote: > > Dan, > >>> After further testing I achieved 952 MBytes on a VM-2-VM >>> connection...1 >>> linux Ubuntu 12.04 vm to another CentOS 6.6 VM running on two >>> different SmartOS host machines (through an extreme networks switch). > > if I got John correctly, he was running his second test on SmartOS hosts... > > We did a lot of testing on OmniOS with -net vnic and -device > virtio-net-pci but sadly to no avail... > > I think we have to hope that SmartOS kvm improvements will get > upstreamed sooner or later. Ahh yes. I was hoping to have them ready for 014, but it's a complicated process to upstream larger projects, and Joyent was in the middle of getting their new Triton release out the door. Dan From moo at wuffers.net Tue Mar 24 23:44:18 2015 From: moo at wuffers.net (wuffers) Date: Tue, 24 Mar 2015 19:44:18 -0400 Subject: [OmniOS-discuss] kernel panic "kernel heap corruption detected" when creating zero eager disks In-Reply-To: <847C1AB3-A505-4A73-B5E7-F3EA2BD41577@omniti.com> References: <847C1AB3-A505-4A73-B5E7-F3EA2BD41577@omniti.com> Message-ID: On r151012 since Nov. And yes, the LUs are exposed via COMSTAR. If it helps to have some kmem_flags set, I can do that and try to reproduce it in the same way, and have the dump accessible. On Tue, Mar 24, 2015 at 6:41 PM, Dan McDonald wrote: > Here's a good place to start. It may need to be kicked to the illumos > developer's list, but let's see what we can figure out first. > > 1.) What revision of OmniOS are you running? > > 2.) I notice a lot of STMF threads. COMSTAR (aka. STMF) is not the most > stable piece of software in illumos, especially in older revisions. > There's been a lot of work done on it, but that's mostly in Nexenta's > distro. It hasn't been all upstreamed yet. > > Dan > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From danmcd at omniti.com Tue Mar 24 23:44:54 2015 From: danmcd at omniti.com (Dan McDonald) Date: Tue, 24 Mar 2015 19:44:54 -0400 Subject: [OmniOS-discuss] kernel panic "kernel heap corruption detected" when creating zero eager disks In-Reply-To: References: <847C1AB3-A505-4A73-B5E7-F3EA2BD41577@omniti.com> Message-ID: <82CC383C-71A1-425D-89E6-B7B725071183@omniti.com> > On Mar 24, 2015, at 7:44 PM, wuffers wrote: > > On r151012 since Nov. And yes, the LUs are exposed via COMSTAR. > > If it helps to have some kmem_flags set, I can do that and try to reproduce it in the same way, and have the dump accessible. kmem_flags=0xf + the actual coredump would be amazingly useful. 
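Roughly the sequence I have in mind -- the crash directory and dump number below are only examples, use whatever savecore reports on your box:

# echo 'set kmem_flags=0xf' >> /etc/system
# reboot
(reproduce the panic, then)
# cd /var/crash/unknown
# savecore -vf vmdump.1
# mdb unix.1 vmcore.1
> ::status
> ::panicinfo
> $C
> ::kmem_verify

::kmem_verify is the part that pays for the reboot: with the debug flags set it walks the kmem caches and flags the corrupted buffer, which can then be chased further through the audit data.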
Thanks, Dan From john.barfield at bissinc.com Tue Mar 24 23:45:56 2015 From: john.barfield at bissinc.com (John Barfield) Date: Tue, 24 Mar 2015 23:45:56 +0000 Subject: [OmniOS-discuss] Potential KVM Virtio Performance Issues In-Reply-To: <870038BE-8F1E-4411-A0BC-099D309E447D@omniti.com> References: <708F6249-5038-4DEE-A2A1-34D185B65CFC@omniti.com> <5511ED7A.3080006@gmx.li>, <870038BE-8F1E-4411-A0BC-099D309E447D@omniti.com> Message-ID: Btw I did go ahead and test both virtio methods...I gave a vm the -device argument on one interface and the -net argument for another the results where.... -device = eth0 = 952mbps -net = eth1 = 199 mbps Thanks and have a great day, John Barfield > On Mar 24, 2015, at 6:12 PM, Dan McDonald wrote: > > >> On Mar 24, 2015, at 7:04 PM, Dominik Hassler wrote: >> >> Dan, >> >>>> After further testing I achieved 952 MBytes on a VM-2-VM >>>> connection...1 >>>> linux Ubuntu 12.04 vm to another CentOS 6.6 VM running on two >>>> different SmartOS host machines (through an extreme networks switch). >> >> if I got John correctly, he was running his second test on SmartOS hosts... >> >> We did a lot of testing on OmniOS with -net vnic and -device >> virtio-net-pci but sadly to no avail... >> >> I think we have to hope that SmartOS kvm improvements will get >> upstreamed sooner or later. > > Ahh yes. > > I was hoping to have them ready for 014, but it's a complicated process to upstream larger projects, and Joyent was in the middle of getting their new Triton release out the door. > > Dan > From phil.harman at gmail.com Wed Mar 25 00:40:47 2015 From: phil.harman at gmail.com (Phil Harman) Date: Wed, 25 Mar 2015 00:40:47 +0000 Subject: [OmniOS-discuss] Potential KVM Virtio Performance Issues In-Reply-To: References: <708F6249-5038-4DEE-A2A1-34D185B65CFC@omniti.com> <5511ED7A.3080006@gmx.li> <870038BE-8F1E-4411-A0BC-099D309E447D@omniti.com> Message-ID: John, Interesting work and data. Thanks for sharing. I've also been playing with Oracle Solaris 11.2 vs Linux vs FreeBSD on SmartOS vs ESX5.5 (free edition) both VM2VM and VM to remote host over a couple of Intel 10GBASE-T cards. As far as I can tell, there remains no virtio-net driver for Solaris / Illumos guests, so I've been using e1000g, which really sucks. I found virtio-net works ok under KVM, but was blown away by vmxnet3 under ESX performance (for which a Solaris / Illumos drivers do exist), being able to get close to 8gbps from the guest over the wire! To achieve this I had to use jumbo frames (something the current Solaris 11.2 e1000g appears unable to do at all any more). So I was wondering, while you are there, whether you've got (or can get) any data for KVM virtio-net VM2VM using jumbo frames? My expectation would be at least 2x for MTU 9000 vs 1500. I also wonder whether like for like comparison with ESX might encourage further improvements? As someone used to say at Sun "If Linux is faster, it's a Solaris bug!". It would be great if the community could agree to the same for ESX vs KVM :) Cheers, Phil > On 24 Mar 2015, at 23:45, John Barfield wrote: > > Btw I did go ahead and test both virtio methods...I gave a vm the -device argument on one interface and the -net argument for another the results where.... 
> > -device = eth0 = 952mbps > -net = eth1 = 199 mbps > > Thanks and have a great day, > > John Barfield > >> On Mar 24, 2015, at 6:12 PM, Dan McDonald wrote: >> >> >>> On Mar 24, 2015, at 7:04 PM, Dominik Hassler wrote: >>> >>> Dan, >>> >>>>> After further testing I achieved 952 MBytes on a VM-2-VM >>>>> connection...1 >>>>> linux Ubuntu 12.04 vm to another CentOS 6.6 VM running on two >>>>> different SmartOS host machines (through an extreme networks switch). >>> >>> if I got John correctly, he was running his second test on SmartOS hosts... >>> >>> We did a lot of testing on OmniOS with -net vnic and -device >>> virtio-net-pci but sadly to no avail... >>> >>> I think we have to hope that SmartOS kvm improvements will get >>> upstreamed sooner or later. >> >> Ahh yes. >> >> I was hoping to have them ready for 014, but it's a complicated process to upstream larger projects, and Joyent was in the middle of getting their new Triton release out the door. >> >> Dan > _______________________________________________ > OmniOS-discuss mailing list > OmniOS-discuss at lists.omniti.com > http://lists.omniti.com/mailman/listinfo/omnios-discuss From jboren at drakecooper.com Wed Mar 25 01:21:38 2015 From: jboren at drakecooper.com (Joseph Boren) Date: Tue, 24 Mar 2015 19:21:38 -0600 Subject: [OmniOS-discuss] list of know-compatible motherboards? Message-ID: Hi Andy, Thanks very much for the info, that's very helpful. Much appreciated. Best regards, Joe Boren > > Message: 1 > Date: Mon, 23 Mar 2015 23:58:48 +0000 (GMT) > From: Andy > To: omnios-discuss > Subject: Re: [OmniOS-discuss] list of know-compatible motherboards? > Message-ID: > Content-Type: TEXT/PLAIN; charset=US-ASCII > > > On Mon, 23 Mar 2015, Joseph Boren wrote: > > ; Fault Class: fault.io.pciex.device-interr > ; Affects: dev:////pci at 0,0/pci1002,5a1d at a/pci15d9,a711 at 0 faulted and > taken > ; out of service > ; FRU: "MB" > ; > (hc://:product-id=H8SLG:server-id=omnistor1:chassis-id=1234567890/motherboard=0) > > 15d9 is SuperMicro > and http://mirror.szepe.net/siv/pcidevs.txt > says that's an embedded MegaRAID. > > A. > > -- > Citrus IT Limited | +44 (0)870 199 8000 | enquiries at citrus-it.co.uk > Rock House Farm | Green Moor | Wortley | Sheffield | S35 7DQ > Registered in England and Wales | Company number 4899123 > -------------- next part -------------- An HTML attachment was scrubbed... URL: From john.barfield at bissinc.com Wed Mar 25 01:50:25 2015 From: john.barfield at bissinc.com (John Barfield) Date: Wed, 25 Mar 2015 01:50:25 +0000 Subject: [OmniOS-discuss] Potential KVM Virtio Performance Issues In-Reply-To: References: <708F6249-5038-4DEE-A2A1-34D185B65CFC@omniti.com> <5511ED7A.3080006@gmx.li> <870038BE-8F1E-4411-A0BC-099D309E447D@omniti.com> , Message-ID: <38C9C9CA-A75F-49F6-B210-C04CA81402E4@bissinc.com> Actually the numbers I sent for the SmartOS VM to VM test were on a switch with Jumbo frames (switch = 9216 mtu...SmartOS GZ MTU = 9000) (Extreme Networks Summit X440-48t release 15.2.3 patch12) Theyre also sitting in Q-in-Q tagged VLANs. Admin tagged nic sits in Vman (provider bridge) 10 the VMs were tagged in VLAN 1674. (not bad :) really) As far as everyone who is wondering how I got 952 Mbps on OmnisOS KVM while running in a zone I plan to write up a how-to that can be posted to the core site if you'd like. There are several caveats that are not documented today for running KVM in a zone. Not that I didnt reverse engineer some of Joyents work of course. 
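Until the full how-to is written up, the skeleton looks roughly like this -- the binary name, memory, disk path, MAC and vnic names are all placeholders, and the only interesting part is the -device/-net pairing instead of the legacy -net nic,model=virtio form:

# vnic over the physical link (or an etherstub), created once in the GZ
dladm create-vnic -l igb0 kvm0

# guest launch, trimmed down to the parts that matter here
qemu-kvm \
    -enable-kvm -m 4096 -smp 4 \
    -drive file=/dev/zvol/rdsk/tank/kvm/centos0,if=virtio,index=0 \
    -device virtio-net-pci,mac=02:08:20:5f:85:0d,tx=timer,x-txtimer=200000,x-txburst=128,vlan=0 \
    -net vnic,name=kvm0,vlan=0,ifname=kvm0 \
    -vnc :1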
Thanks and have a great day, John Barfield > On Mar 24, 2015, at 7:40 PM, Phil Harman wrote: > > John, > > Interesting work and data. Thanks for sharing. > > I've also been playing with Oracle Solaris 11.2 vs Linux vs FreeBSD on SmartOS vs ESX5.5 (free edition) both VM2VM and VM to remote host over a couple of Intel 10GBASE-T cards. > > As far as I can tell, there remains no virtio-net driver for Solaris / Illumos guests, so I've been using e1000g, which really sucks. > > I found virtio-net works ok under KVM, but was blown away by vmxnet3 under ESX performance (for which a Solaris / Illumos drivers do exist), being able to get close to 8gbps from the guest over the wire! > > To achieve this I had to use jumbo frames (something the current Solaris 11.2 e1000g appears unable to do at all any more). > > So I was wondering, while you are there, whether you've got (or can get) any data for KVM virtio-net VM2VM using jumbo frames? > > My expectation would be at least 2x for MTU 9000 vs 1500. > > I also wonder whether like for like comparison with ESX might encourage further improvements? > > As someone used to say at Sun "If Linux is faster, it's a Solaris bug!". It would be great if the community could agree to the same for ESX vs KVM :) > > Cheers, > Phil > > >> On 24 Mar 2015, at 23:45, John Barfield wrote: >> >> Btw I did go ahead and test both virtio methods...I gave a vm the -device argument on one interface and the -net argument for another the results where.... >> >> -device = eth0 = 952mbps >> -net = eth1 = 199 mbps >> >> Thanks and have a great day, >> >> John Barfield >> >>> On Mar 24, 2015, at 6:12 PM, Dan McDonald wrote: >>> >>> >>>> On Mar 24, 2015, at 7:04 PM, Dominik Hassler wrote: >>>> >>>> Dan, >>>> >>>>>> After further testing I achieved 952 MBytes on a VM-2-VM >>>>>> connection...1 >>>>>> linux Ubuntu 12.04 vm to another CentOS 6.6 VM running on two >>>>>> different SmartOS host machines (through an extreme networks switch). >>>> >>>> if I got John correctly, he was running his second test on SmartOS hosts... >>>> >>>> We did a lot of testing on OmniOS with -net vnic and -device >>>> virtio-net-pci but sadly to no avail... >>>> >>>> I think we have to hope that SmartOS kvm improvements will get >>>> upstreamed sooner or later. >>> >>> Ahh yes. >>> >>> I was hoping to have them ready for 014, but it's a complicated process to upstream larger projects, and Joyent was in the middle of getting their new Triton release out the door. >>> >>> Dan >> _______________________________________________ >> OmniOS-discuss mailing list >> OmniOS-discuss at lists.omniti.com >> http://lists.omniti.com/mailman/listinfo/omnios-discuss From jesus at omniti.com Wed Mar 25 11:56:31 2015 From: jesus at omniti.com (Theo Schlossnagle) Date: Wed, 25 Mar 2015 07:56:31 -0400 Subject: [OmniOS-discuss] Potential KVM Virtio Performance Issues In-Reply-To: <38C9C9CA-A75F-49F6-B210-C04CA81402E4@bissinc.com> References: <708F6249-5038-4DEE-A2A1-34D185B65CFC@omniti.com> <5511ED7A.3080006@gmx.li> <870038BE-8F1E-4411-A0BC-099D309E447D@omniti.com> <38C9C9CA-A75F-49F6-B210-C04CA81402E4@bissinc.com> Message-ID: +1 John. That documentation would be very welcome. On Tue, Mar 24, 2015 at 9:50 PM, John Barfield wrote: > Actually the numbers I sent for the SmartOS VM to VM test were on a switch > with Jumbo frames (switch = 9216 mtu...SmartOS GZ MTU = 9000) (Extreme > Networks Summit X440-48t release 15.2.3 patch12) Theyre also sitting in > Q-in-Q tagged VLANs. 
Admin tagged nic sits in Vman (provider bridge) 10 the > VMs were tagged in VLAN 1674. (not bad :) really) > > As far as everyone who is wondering how I got 952 Mbps on OmnisOS KVM > while running in a zone I plan to write up a how-to that can be posted to > the core site if you'd like. There are several caveats that are not > documented today for running KVM in a zone. Not that I didnt reverse > engineer some of Joyents work of course. > > > > Thanks and have a great day, > > John Barfield > > > On Mar 24, 2015, at 7:40 PM, Phil Harman wrote: > > > > John, > > > > Interesting work and data. Thanks for sharing. > > > > I've also been playing with Oracle Solaris 11.2 vs Linux vs FreeBSD on > SmartOS vs ESX5.5 (free edition) both VM2VM and VM to remote host over a > couple of Intel 10GBASE-T cards. > > > > As far as I can tell, there remains no virtio-net driver for Solaris / > Illumos guests, so I've been using e1000g, which really sucks. > > > > I found virtio-net works ok under KVM, but was blown away by vmxnet3 > under ESX performance (for which a Solaris / Illumos drivers do exist), > being able to get close to 8gbps from the guest over the wire! > > > > To achieve this I had to use jumbo frames (something the current Solaris > 11.2 e1000g appears unable to do at all any more). > > > > So I was wondering, while you are there, whether you've got (or can get) > any data for KVM virtio-net VM2VM using jumbo frames? > > > > My expectation would be at least 2x for MTU 9000 vs 1500. > > > > I also wonder whether like for like comparison with ESX might encourage > further improvements? > > > > As someone used to say at Sun "If Linux is faster, it's a Solaris bug!". > It would be great if the community could agree to the same for ESX vs KVM :) > > > > Cheers, > > Phil > > > > > >> On 24 Mar 2015, at 23:45, John Barfield > wrote: > >> > >> Btw I did go ahead and test both virtio methods...I gave a vm the > -device argument on one interface and the -net argument for another the > results where.... > >> > >> -device = eth0 = 952mbps > >> -net = eth1 = 199 mbps > >> > >> Thanks and have a great day, > >> > >> John Barfield > >> > >>> On Mar 24, 2015, at 6:12 PM, Dan McDonald wrote: > >>> > >>> > >>>> On Mar 24, 2015, at 7:04 PM, Dominik Hassler wrote: > >>>> > >>>> Dan, > >>>> > >>>>>> After further testing I achieved 952 MBytes on a VM-2-VM > >>>>>> connection...1 > >>>>>> linux Ubuntu 12.04 vm to another CentOS 6.6 VM running on two > >>>>>> different SmartOS host machines (through an extreme networks > switch). > >>>> > >>>> if I got John correctly, he was running his second test on SmartOS > hosts... > >>>> > >>>> We did a lot of testing on OmniOS with -net vnic and -device > >>>> virtio-net-pci but sadly to no avail... > >>>> > >>>> I think we have to hope that SmartOS kvm improvements will get > >>>> upstreamed sooner or later. > >>> > >>> Ahh yes. > >>> > >>> I was hoping to have them ready for 014, but it's a complicated > process to upstream larger projects, and Joyent was in the middle of > getting their new Triton release out the door. 
> >>> > >>> Dan > >> _______________________________________________ > >> OmniOS-discuss mailing list > >> OmniOS-discuss at lists.omniti.com > >> http://lists.omniti.com/mailman/listinfo/omnios-discuss > _______________________________________________ > OmniOS-discuss mailing list > OmniOS-discuss at lists.omniti.com > http://lists.omniti.com/mailman/listinfo/omnios-discuss > -- Theo Schlossnagle http://omniti.com/is/theo-schlossnagle -------------- next part -------------- An HTML attachment was scrubbed... URL: From nsmith at careyweb.com Wed Mar 25 14:52:26 2015 From: nsmith at careyweb.com (Nate Smith) Date: Wed, 25 Mar 2015 10:52:26 -0400 Subject: [OmniOS-discuss] kernel panic "kernel heap corruption detected" when creating zero eager disks In-Reply-To: <82CC383C-71A1-425D-89E6-B7B725071183@omniti.com> References: <847C1AB3-A505-4A73-B5E7-F3EA2BD41577@omniti.com> <82CC383C-71A1-425D-89E6-B7B725071183@omniti.com> Message-ID: <67c125eb-75c2-48ae-89fb-6a8df288d2d4@careyweb.com> Can confirm that there are problems with Comstar, especially with Fibre/STMF. Are people seeing problems with iSCSI or does that seem more stable? -----Original Message----- From: OmniOS-discuss [mailto:omnios-discuss-bounces at lists.omniti.com] On Behalf Of Dan McDonald Sent: Tuesday, March 24, 2015 7:45 PM To: wuffers Cc: omnios-discuss Subject: Re: [OmniOS-discuss] kernel panic "kernel heap corruption detected" when creating zero eager disks > On Mar 24, 2015, at 7:44 PM, wuffers wrote: > > On r151012 since Nov. And yes, the LUs are exposed via COMSTAR. > > If it helps to have some kmem_flags set, I can do that and try to reproduce it in the same way, and have the dump accessible. kmem_flags=0xf + the actual coredump would be amazingly useful. Thanks, Dan _______________________________________________ OmniOS-discuss mailing list OmniOS-discuss at lists.omniti.com http://lists.omniti.com/mailman/listinfo/omnios-discuss From moo at wuffers.net Wed Mar 25 15:51:51 2015 From: moo at wuffers.net (wuffers) Date: Wed, 25 Mar 2015 11:51:51 -0400 Subject: [OmniOS-discuss] kernel panic "kernel heap corruption detected" when creating zero eager disks In-Reply-To: <67c125eb-75c2-48ae-89fb-6a8df288d2d4@careyweb.com> References: <847C1AB3-A505-4A73-B5E7-F3EA2BD41577@omniti.com> <82CC383C-71A1-425D-89E6-B7B725071183@omniti.com> <67c125eb-75c2-48ae-89fb-6a8df288d2d4@careyweb.com> Message-ID: On Tue, Mar 24, 2015 at 7:44 PM, Dan McDonald wrote: > > > On Mar 24, 2015, at 7:44 PM, wuffers wrote: > > > > On r151012 since Nov. And yes, the LUs are exposed via COMSTAR. > > > > If it helps to have some kmem_flags set, I can do that and try to > reproduce it in the same way, and have the dump accessible. > > kmem_flags=0xf + the actual coredump would be amazingly useful. > > Thanks, > Dan Going to do this as soon as I can. Solaris docs say to put the following line in etc/system and reboot: set kmem_flags=0xf Can't I just set this dynamically like so (so I can potentially skip 2 reboots)? echo kmem_flags/W0xf | mdb -kw I can't comment myself on Fibre/STMF, as we do IB SRP here. I would say it's been "fairly" stable (can run for months before I see an issue), but have seen some weird hangups where I had to reboot the head unit (but no kernel panics). -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From danmcd at omniti.com Wed Mar 25 16:09:04 2015 From: danmcd at omniti.com (Dan McDonald) Date: Wed, 25 Mar 2015 12:09:04 -0400 Subject: [OmniOS-discuss] kernel panic "kernel heap corruption detected" when creating zero eager disks In-Reply-To: References: <847C1AB3-A505-4A73-B5E7-F3EA2BD41577@omniti.com> <82CC383C-71A1-425D-89E6-B7B725071183@omniti.com> <67c125eb-75c2-48ae-89fb-6a8df288d2d4@careyweb.com> Message-ID: > On Mar 25, 2015, at 11:51 AM, wuffers wrote: > > > Going to do this as soon as I can. > > Solaris docs say to put the following line in etc/system and reboot: > set kmem_flags=0xf That's correct. > Can't I just set this dynamically like so (so I can potentially skip 2 reboots)? > > echo kmem_flags/W0xf | mdb -kw No, because those are read at kmem cache creation time at the system's start. > I can't comment myself on Fibre/STMF, as we do IB SRP here. I would say it's been "fairly" stable (can run for months before I see an issue), but have seen some weird hangups where I had to reboot the head unit (but no kernel panics). You reproduce this bug by configuring things a specific way, right? I ask because you seem to have been running okay until you fell down this particular panic rabbit hole with a particular set of things, correct? Thanks, Dan From moo at wuffers.net Wed Mar 25 18:17:28 2015 From: moo at wuffers.net (wuffers) Date: Wed, 25 Mar 2015 14:17:28 -0400 Subject: [OmniOS-discuss] kernel panic "kernel heap corruption detected" when creating zero eager disks In-Reply-To: References: <847C1AB3-A505-4A73-B5E7-F3EA2BD41577@omniti.com> <82CC383C-71A1-425D-89E6-B7B725071183@omniti.com> <67c125eb-75c2-48ae-89fb-6a8df288d2d4@careyweb.com> Message-ID: On Wed, Mar 25, 2015 at 12:09 PM, Dan McDonald wrote: > > > Can't I just set this dynamically like so (so I can potentially skip 2 > reboots)? > > > > echo kmem_flags/W0xf | mdb -kw > > No, because those are read at kmem cache creation time at the system's > start. > > Ahh, if I RTFM'd the whole doc, I would have caught this excerpt: " These are set in conjunction with the global kmem_flags variable at cache creation time. Setting kmem_flags while the system is running has no effect on the debugging behavior, except for subsequently created caches (which is rare after boot-up)." > I can't comment myself on Fibre/STMF, as we do IB SRP here. I would say > it's been "fairly" stable (can run for months before I see an issue), but > have seen some weird hangups where I had to reboot the head unit (but no > kernel panics). > > You reproduce this bug by configuring things a specific way, right? I ask > because you seem to have been running okay until you fell down this > particular panic rabbit hole with a particular set of things, correct? > > > The panic is happening when I tried to create a 10+TB eager zero vmdk with the vSphere fat client. I'm assuming that it will happen a third time when I use the same steps. Since I can't save myself the two reboots, I will most likely try without the usual Hyper-V and VMware host loads and just try to create the vmdk and see what happens. So I would say no, I'm not changing any settings or configuration, just trying to do "normal" things like create disks, although they are much bigger than anything I've created before. I do have a 50TB LU for the Hyper-V hosts, but never tried to create any disk that big on it. If I have time I'll try it on a Hyper-V VM as well. -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From danmcd at omniti.com Wed Mar 25 18:21:30 2015 From: danmcd at omniti.com (Dan McDonald) Date: Wed, 25 Mar 2015 14:21:30 -0400 Subject: [OmniOS-discuss] kernel panic "kernel heap corruption detected" when creating zero eager disks In-Reply-To: References: <847C1AB3-A505-4A73-B5E7-F3EA2BD41577@omniti.com> <82CC383C-71A1-425D-89E6-B7B725071183@omniti.com> <67c125eb-75c2-48ae-89fb-6a8df288d2d4@careyweb.com> Message-ID: <371F5645-A4A6-4BAC-A219-21F875F837FD@omniti.com> > On Mar 25, 2015, at 2:17 PM, wuffers wrote: > >> You reproduce this bug by configuring things a specific way, right? I ask because you seem to have been running okay until you fell down this particular panic rabbit hole with a particular set of things, correct? >> > > The panic is happening when I tried to create a 10+TB eager zero vmdk with the vSphere fat client. I'm assuming that it will happen a third time when I use the same steps. Since I can't save myself the two reboots, I will most likely try without the usual Hyper-V and VMware host loads and just try to create the vmdk and see what happens. So I would say no, I'm not changing any settings or configuration, just trying to do "normal" things like create disks, although they are much bigger than anything I've created before. I do have a 50TB LU for the Hyper-V hosts, but never tried to create any disk that big on it. If I have time I'll try it on a Hyper-V VM as well. I had to ask. A with-kmem-flags coredump will be very useful. Thanks, Dan p.s. r151014 is coming soon. I'll be curious if it manifests the same (mis-)behavior. Not a lot of comstar fixes from upstream. From mir at miras.org Wed Mar 25 18:52:12 2015 From: mir at miras.org (Michael Rasmussen) Date: Wed, 25 Mar 2015 19:52:12 +0100 Subject: [OmniOS-discuss] kernel panic "kernel heap corruption detected" when creating zero eager disks In-Reply-To: <67c125eb-75c2-48ae-89fb-6a8df288d2d4@careyweb.com> References: <847C1AB3-A505-4A73-B5E7-F3EA2BD41577@omniti.com> <82CC383C-71A1-425D-89E6-B7B725071183@omniti.com> <67c125eb-75c2-48ae-89fb-6a8df288d2d4@careyweb.com> Message-ID: <20150325195212.5a8cebe4@sleipner.datanom.net> On Wed, 25 Mar 2015 10:52:26 -0400 Nate Smith wrote: > Can confirm that there are problems with Comstar, especially with Fibre/STMF. Are people seeing problems with iSCSI or does that seem more stable? > I have used a box as shared storage for proxmox ve presenting storage for KVM over iSCSI (Comstar) since 151008 (Bloody at that time due to missing support for the Hudson chipset in 151006) and now 151012. I have not had a single problem in all this time. Omnios and Comstar have been rock solid. The first approx. 2 years the connection was through a bond of 1Gb Intel Nics but the last approx. 1 year I have been using IPoIB which until now have had the same track record as Gb nics. -- Hilsen/Regards Michael Rasmussen Get my public GnuPG keys: michael rasmussen cc http://pgp.mit.edu:11371/pks/lookup?op=get&search=0xD3C9A00E mir datanom net http://pgp.mit.edu:11371/pks/lookup?op=get&search=0xE501F51C mir miras org http://pgp.mit.edu:11371/pks/lookup?op=get&search=0xE3E80917 -------------------------------------------------------------- /usr/games/fortune -es says: Scientists are people who build the Brooklyn Bridge and then buy it. -- William Buckley -------------- next part -------------- A non-text attachment was scrubbed... 
Name: not available Type: application/pgp-signature Size: 181 bytes Desc: OpenPGP digital signature URL: From omnios at citrus-it.net Wed Mar 25 23:31:32 2015 From: omnios at citrus-it.net (Andy) Date: Wed, 25 Mar 2015 23:31:32 +0000 (GMT) Subject: [OmniOS-discuss] 2nd-to-last bloody update for r151013 In-Reply-To: <694C0E20-17D7-468C-9C40-2461EE6350EB@omniti.com> References: <694C0E20-17D7-468C-9C40-2461EE6350EB@omniti.com> Message-ID: On Tue, 10 Mar 2015, Dan McDonald wrote: ; The last bloody didn't have a lot of changes. This one does. Let's go over them: ; ; * ipmitool is now 1.8.15 The configure.in patch that enabled the open interface was also removed along with this upgrade to 1.8.15. http://omnios.omniti.com/changeset.php/core/omnios-build/b9ed06fb1c62498f8c10acb7cf21e06865a3c74c#d1 Any particular reason? It stops it being able to talk to the interface delivered by the dependant driver/ipmi package. bloody# ipmitool -h 2>&1| ggrep -A5 Interfaces Interfaces: lan IPMI v1.5 LAN Interface [default] lanplus IPMI v2.0 RMCP+ LAN Interface serial-terminal Serial Interface, Terminal Mode serial-basic Serial Interface, Basic Mode r151012# ipmitool -h 2>&1| ggrep -A5 Interfaces Interfaces: open Linux OpenIPMI Interface [default] lan IPMI v1.5 LAN Interface lanplus IPMI v2.0 RMCP+ LAN Interface serial-terminal Serial Interface, Terminal Mode serial-basic Serial Interface, Basic Mode Looking forward to r151014! Thanks, Andy -- Citrus IT Limited | +44 (0)870 199 8000 | enquiries at citrus-it.co.uk Rock House Farm | Green Moor | Wortley | Sheffield | S35 7DQ Registered in England and Wales | Company number 4899123 From danmcd at omniti.com Wed Mar 25 23:44:32 2015 From: danmcd at omniti.com (Dan McDonald) Date: Wed, 25 Mar 2015 19:44:32 -0400 Subject: [OmniOS-discuss] 2nd-to-last bloody update for r151013 In-Reply-To: References: <694C0E20-17D7-468C-9C40-2461EE6350EB@omniti.com> Message-ID: <0587009A-46A7-4432-9AAC-96DD5F9D8524@omniti.com> > On Mar 25, 2015, at 7:31 PM, Andy wrote: > > > On Tue, 10 Mar 2015, Dan McDonald wrote: > > ; The last bloody didn't have a lot of changes. This one does. Let's go over them: > ; > ; * ipmitool is now 1.8.15 > > The configure.in patch that enabled the open interface was also removed > along with this upgrade to 1.8.15. > > http://omnios.omniti.com/changeset.php/core/omnios-build/b9ed06fb1c62498f8c10acb7cf21e06865a3c74c#d1 > > Any particular reason? It stops it being able to talk to the interface > delivered by the dependant driver/ipmi package. I screwed up. The 1.8.15 source has no configure.in. I forgot to replace the configure.in patch with a similar configure patch. bloody(build/ipmitool)[2]% /tmp/build_danmcd/ipmitool-1.8.15/src/ipmitool -h | & ggrep -A5 Interfaces Interfaces: open Linux OpenIPMI Interface [default] lan IPMI v1.5 LAN Interface lanplus IPMI v2.0 RMCP+ LAN Interface serial-terminal Serial Interface, Terminal Mode serial-basic Serial Interface, Basic Mode bloody(build/ipmitool)[0]% That look better? Dan From danmcd at omniti.com Wed Mar 25 23:50:00 2015 From: danmcd at omniti.com (Dan McDonald) Date: Wed, 25 Mar 2015 19:50:00 -0400 Subject: [OmniOS-discuss] 2nd-to-last bloody update for r151013 In-Reply-To: <0587009A-46A7-4432-9AAC-96DD5F9D8524@omniti.com> References: <694C0E20-17D7-468C-9C40-2461EE6350EB@omniti.com> <0587009A-46A7-4432-9AAC-96DD5F9D8524@omniti.com> Message-ID: <9FE39C53-E15D-49AF-BAB1-A42A00451404@omniti.com> > On Mar 25, 2015, at 7:44 PM, Dan McDonald wrote: > > I screwed up. The 1.8.15 source has no configure.in. 
I forgot to replace the configure.in patch with a similar configure patch. > > bloody(build/ipmitool)[2]% /tmp/build_danmcd/ipmitool-1.8.15/src/ipmitool -h | & ggrep -A5 Interfaces > Interfaces: > open Linux OpenIPMI Interface [default] > lan IPMI v1.5 LAN Interface > lanplus IPMI v2.0 RMCP+ LAN Interface > serial-terminal Serial Interface, Terminal Mode > serial-basic Serial Interface, Basic Mode > bloody(build/ipmitool)[0]% I've pushed the fix back into the master and r151014 branches. VERY good catch, and I'm very sorry for missing this. Thank you! Dan From omnios at citrus-it.net Wed Mar 25 23:51:25 2015 From: omnios at citrus-it.net (Andy) Date: Wed, 25 Mar 2015 23:51:25 +0000 (GMT) Subject: [OmniOS-discuss] 2nd-to-last bloody update for r151013 In-Reply-To: <0587009A-46A7-4432-9AAC-96DD5F9D8524@omniti.com> References: <694C0E20-17D7-468C-9C40-2461EE6350EB@omniti.com> <0587009A-46A7-4432-9AAC-96DD5F9D8524@omniti.com> Message-ID: On Wed, 25 Mar 2015, Dan McDonald wrote: ; ; > On Mar 25, 2015, at 7:31 PM, Andy wrote: ; > ; > ; > On Tue, 10 Mar 2015, Dan McDonald wrote: ; > ; > ; The last bloody didn't have a lot of changes. This one does. Let's go over them: ; > ; ; > ; * ipmitool is now 1.8.15 ; > ; > The configure.in patch that enabled the open interface was also removed ; > along with this upgrade to 1.8.15. ; > ; > http://omnios.omniti.com/changeset.php/core/omnios-build/b9ed06fb1c62498f8c10acb7cf21e06865a3c74c#d1 ; > ; > Any particular reason? It stops it being able to talk to the interface ; > delivered by the dependant driver/ipmi package. ; ; I screwed up. The 1.8.15 source has no configure.in. I forgot to replace the configure.in patch with a similar configure patch. ; ; bloody(build/ipmitool)[2]% /tmp/build_danmcd/ipmitool-1.8.15/src/ipmitool -h | & ggrep -A5 Interfaces ; Interfaces: ; open Linux OpenIPMI Interface [default] ; lan IPMI v1.5 LAN Interface ; lanplus IPMI v2.0 RMCP+ LAN Interface ; serial-terminal Serial Interface, Terminal Mode ; serial-basic Serial Interface, Basic Mode ; bloody(build/ipmitool)[0]% ; ; That look better? Much! I build my own ipmitool package to /opt anyway but this would have caught some people. Thanks, Andy -- Citrus IT Limited | +44 (0)870 199 8000 | enquiries at citrus-it.co.uk Rock House Farm | Green Moor | Wortley | Sheffield | S35 7DQ Registered in England and Wales | Company number 4899123 From moo at wuffers.net Thu Mar 26 04:58:32 2015 From: moo at wuffers.net (wuffers) Date: Thu, 26 Mar 2015 00:58:32 -0400 Subject: [OmniOS-discuss] kernel panic "kernel heap corruption detected" when creating zero eager disks In-Reply-To: <20150325195212.5a8cebe4@sleipner.datanom.net> References: <847C1AB3-A505-4A73-B5E7-F3EA2BD41577@omniti.com> <82CC383C-71A1-425D-89E6-B7B725071183@omniti.com> <67c125eb-75c2-48ae-89fb-6a8df288d2d4@careyweb.com> <20150325195212.5a8cebe4@sleipner.datanom.net> Message-ID: On Wed, Mar 25, 2015 at 2:21 PM, Dan McDonald wrote: > > > A with-kmem-flags coredump will be very useful. > > > Here we go. I reproduced this with no load on the SAN, just a DC and vcenter server up, then created my 10TB disk in the vSphere fat client. As expected, I got the kernel panic again. 
TIME UUID SUNW-MSG-ID Mar 25 2015 21:13:40.122158000 daa21c2c-3a11-4d27-dc1b-a424cb890493 SUNOS-8000-KL TIME CLASS ENA Mar 25 21:13:40.0785 ireport.os.sunos.panic.dump_available 0x0000000000000000 Mar 25 21:12:23.5223 ireport.os.sunos.panic.dump_pending_on_device 0x0000000000000000 nvlist version: 0 version = 0x0 class = list.suspect uuid = daa21c2c-3a11-4d27-dc1b-a424cb890493 code = SUNOS-8000-KL diag-time = 1427332420 88270 de = fmd:///module/software-diagnosis fault-list-sz = 0x1 fault-list = (array of embedded nvlists) (start fault-list[0]) nvlist version: 0 version = 0x0 class = defect.sunos.kernel.panic certainty = 0x64 asru = sw:///:path=/var/crash/unknown/.daa21c2c-3a11-4d27-dc1b-a424cb890493 resource = sw:///:path=/var/crash/unknown/.daa21c2c-3a11-4d27-dc1b-a424cb890493 savecore-succcess = 1 dump-dir = /var/crash/unknown dump-files = vmdump.1 os-instance-uuid = daa21c2c-3a11-4d27-dc1b-a424cb890493 panicstr = kernel heap corruption detected panicstack = fffffffffba49114 () | genunix:kmem_free+1c8 () | stmf_sbd:sbd_handle_write_same_xfer_completion+14d () | stmf_sbd:sbd_dbuf_xfer_done+b1 () | stmf:stmf_worker_task+376 () | unix:thread_start+8 () | crashtime = 1427330450 panic-time = Wed Mar 25 20:40:50 2015 EDT (end fault-list[0]) fault-status = 0x1 severity = Major __ttl = 0x1 __tod = 0x55135d44 0x747fbb0 Crash file: https://drive.google.com/open?id=0B7mCJnZUzJPKcTNRWGIwejVrV2s&authuser=0 Dump: https://docs.google.com/uc?id=0B7mCJnZUzJPKZlVjUEQydm1vaE0&export=download md5sum: 5ecbc150ed6683b90dbf39d4bf42209e vmdump.6.gz (2433358152 bytes) aa290f48c4ae9770c47fa62583e4cb70 vmdump.6 (5295046656 bytes) -------------- next part -------------- An HTML attachment was scrubbed... URL: From danmcd at omniti.com Thu Mar 26 05:06:47 2015 From: danmcd at omniti.com (Dan McDonald) Date: Thu, 26 Mar 2015 01:06:47 -0400 Subject: [OmniOS-discuss] kernel panic "kernel heap corruption detected" when creating zero eager disks In-Reply-To: References: <847C1AB3-A505-4A73-B5E7-F3EA2BD41577@omniti.com> <82CC383C-71A1-425D-89E6-B7B725071183@omniti.com> <67c125eb-75c2-48ae-89fb-6a8df288d2d4@careyweb.com> <20150325195212.5a8cebe4@sleipner.datanom.net> Message-ID: > On Mar 26, 2015, at 12:58 AM, wuffers wrote: > > | genunix:kmem_free+1c8 () | stmf_sbd:sbd_handle_write_same_xfer_completion+14d () | stmf_sbd:sbd_dbuf_xfer_done+b1 () | stmf:stmf_worker_task+376 () | unix:thread_start+8 () | Hmmph. The WRITE_SAME code, huh? I know Nexenta's done a LOT of improvements on this in illumos-nexenta. It might be time to upstream some of what they've done. I know it's a moving target (COMSTAR is not a well-written subsystem), so it may take some unravelling. I'm downloading the dump, in case the actual panic is more straightforward than most code in there. I worked on this a long time ago back when I was at Nexenta. It was provided to me by a contractor, and I had to bang it into shape for upstreaming. Clearly I missed something. Dan From andreas at luka-online.de Thu Mar 26 05:25:06 2015 From: andreas at luka-online.de (Andreas Luka) Date: Thu, 26 Mar 2015 13:25:06 +0800 Subject: [OmniOS-discuss] A warning for upgraders with large numbers of BEs In-Reply-To: References: <8B8D8D0F-35B5-42F8-B1FE-CD8EA002EE06@omniti.com> <20150323205308.GA21991@linux.gyakg.u-szeged.hu> Message-ID: If the LX brand is in I would volunteer with testing different Linux-Disto's. Regards Andreas On Tue, 24 Mar 2015 05:40:50 +0800, Dan McDonald wrote: > The LX brand hadn't been upstreamed yet. Once it has, we will include > it. 
We will likely assist in its upstreaming, but not at the moment. > > Dan > > Sent from my iPhone (typos, autocorrect, and all) > >> On Mar 23, 2015, at 4:53 PM, P?SZTOR Gy?rgy >> wrote: >> >> Hi, >> >> "Dan McDonald" wrote at 2015-03-23 16:14: >>> Soon r151014 will be hitting the streets. WHEN THAT DOES, I have to >>> warn people, especially those jumping from r151006 to r151014 about a >>> known issue in grub. >>> >>> The illumos grub has serious memory management issues. It cannot cope >>> with too many boot environment (BE) entries. >> >> Sorry for semi-offtopicing the thread, but: Will the lx brand be >> restored >> in the upcoming release? >> >> Is there a feature map / release plan / anything available? >> I tried to find information regarding this topic without success. >> >> I checked this url: >> http://omnios.omniti.com/roadmap.php >> But nothing relevant information was there. It seems outdated / >> unmaintained. >> >> I've just recently find this distro. I used openindiana since Oracle... >> -- Did what they did to opensolaris -- >> >> So, I'm new here, sorry for lame questions. >> >> Kind regards, >> Gy?rgy P?sztor > _______________________________________________ > OmniOS-discuss mailing list > OmniOS-discuss at lists.omniti.com > http://lists.omniti.com/mailman/listinfo/omnios-discuss -- Using Opera's mail client: http://www.opera.com/mail/ From info at houseofancients.nl Thu Mar 26 07:09:43 2015 From: info at houseofancients.nl (Floris van Essen ..:: House of Ancients Amstafs ::..) Date: Thu, 26 Mar 2015 07:09:43 +0000 Subject: [OmniOS-discuss] heads up Message-ID: <356582D1FC91784992ABB4265A16ED4891027D35@vEX01.mindstorm-internet.local> Hi Dann, Running latest Bloody , and after running a weekly check of available updates : pkg update -nv Creating Plan (Running solver): | pkg update: No solution was found to satisfy constraints Plan Creation: Package solver has not found a solution to update to latest available versions. This may indicate an overly constrained set of packages are installed. latest incorporations: pkg://omnios/consolidation/osnet/osnet-incorporation at 0.5.11,5.11-0.151014:20150324T181107Z The following indicates why the system cannot update to the latest version: No suitable version of required package pkg://omnios/consolidation/osnet/osnet-incorporation at 0.5.11,5.11-0.151014:20150324T181107Z found: Reject: pkg://omnios/consolidation/osnet/osnet-incorporation at 0.5.11,5.11-0.151014:20150324T181107Z Reason: A version for 'incorporate' dependency on pkg:/SUNWcs at 0.5.11,5.11-0.151014 cannot be found Can I just install osnet-incorporation at 0.5.11,5.11-0.151014:20150324T181107Z ? Best regards, Floris ...:: House of Ancients ::... American Staffordshire Terriers +31-628-161-350 +31-614-198-389 Het Perk 48 4903 RB Oosterhout Netherlands www.houseofancients.nl From alka at hfg-gmuend.de Thu Mar 26 13:04:46 2015 From: alka at hfg-gmuend.de (Guenther Alka) Date: Thu, 26 Mar 2015 14:04:46 +0100 Subject: [OmniOS-discuss] Open-VM-Tools in 151014 In-Reply-To: References: <8B8D8D0F-35B5-42F8-B1FE-CD8EA002EE06@omniti.com> <20150323205308.GA21991@linux.gyakg.u-szeged.hu> Message-ID: <551403EE.6000803@hfg-gmuend.de> I have updated 151012 to 151014 and installed the open-vm-tools (on ESXi 6.0.) for some basic tests. Installation via pkg install open-vm-tools was ok and ESXi 6.0 shows tools running (3rd party tools) but vmxnet3 is missing. Is vmxnet3 not a part of the open-vm-tools? 
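A quick way to check whether the driver is actually delivered by that package, rather than guessing from what ESXi reports (a sketch, run inside the guest; package name as published by OmniOS):

    pkg contents -r open-vm-tools | grep -i vmxnet   # does the package ship a vmxnet3 driver at all?
    modinfo | grep -i vmxnet                         # is a vmxnet module loaded right now?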
Gea From danmcd at omniti.com Thu Mar 26 15:00:07 2015 From: danmcd at omniti.com (Dan McDonald) Date: Thu, 26 Mar 2015 11:00:07 -0400 Subject: [OmniOS-discuss] heads up In-Reply-To: <356582D1FC91784992ABB4265A16ED4891027D35@vEX01.mindstorm-internet.local> References: <356582D1FC91784992ABB4265A16ED4891027D35@vEX01.mindstorm-internet.local> Message-ID: <2996D603-4661-4B45-9517-10C08BD0E3A3@omniti.com> 1.) It's "Dan" one n. :) 2.) Are you seeing 014 packages in the "bloody" repo? You shouldn't be. But shoot, there it is. I'll clean up the bloody repo today. Sorry, Dan From danmcd at omniti.com Thu Mar 26 15:03:27 2015 From: danmcd at omniti.com (Dan McDonald) Date: Thu, 26 Mar 2015 11:03:27 -0400 Subject: [OmniOS-discuss] Did someone build omnios-build for 014 and push it to the *bloody* repo server? Message-ID: Subject says it all. It's possible *I* did, but if someone else has built omnios-build on or for r151014 without setting PKGSRVR, please let me know ASAP. Thanks, Dan From sjorge+ml at blackdot.be Thu Mar 26 15:04:50 2015 From: sjorge+ml at blackdot.be (Jorge Schrauwen) Date: Thu, 26 Mar 2015 16:04:50 +0100 Subject: [OmniOS-discuss] =?utf-8?q?Did_someone_build_omnios-build_for_014?= =?utf-8?q?_and_push_it_to_the_*bloody*_repo_server=3F?= In-Reply-To: References: Message-ID: <90d462ba2b581b9943cda1d4988cc191@blackdot.be> Do you have the right mailing list? Seems odd to send it here as only omniti people should have access. On 2015-03-26 16:03, Dan McDonald wrote: > Subject says it all. It's possible *I* did, but if someone else has > built omnios-build on or for r151014 without setting PKGSRVR, please > let me know ASAP. > > Thanks, > Dan > > _______________________________________________ > OmniOS-discuss mailing list > OmniOS-discuss at lists.omniti.com > http://lists.omniti.com/mailman/listinfo/omnios-discuss From danmcd at omniti.com Thu Mar 26 15:06:06 2015 From: danmcd at omniti.com (Dan McDonald) Date: Thu, 26 Mar 2015 11:06:06 -0400 Subject: [OmniOS-discuss] Open-VM-Tools in 151014 In-Reply-To: <551403EE.6000803@hfg-gmuend.de> References: <8B8D8D0F-35B5-42F8-B1FE-CD8EA002EE06@omniti.com> <20150323205308.GA21991@linux.gyakg.u-szeged.hu> <551403EE.6000803@hfg-gmuend.de> Message-ID: There is no 014 on the bloody repo server, and 014 is NOT OFFICIALLY OUT YET. An automatic build of some sort pushed out r151014 packages to http://pkg.omniti.com/omnios/bloody/ and it shouldn't have. I'm cleaning up the repo now. If you updated via bloody, revert to an 013 BE now. Dan From danmcd at omniti.com Thu Mar 26 15:13:11 2015 From: danmcd at omniti.com (Dan McDonald) Date: Thu, 26 Mar 2015 11:13:11 -0400 Subject: [OmniOS-discuss] Bloody repo is contaminated at the moment... In-Reply-To: <90d462ba2b581b9943cda1d4988cc191@blackdot.be> References: <90d462ba2b581b9943cda1d4988cc191@blackdot.be> Message-ID: > On Mar 26, 2015, at 11:04 AM, Jorge Schrauwen wrote: > > Do you have the right mailing list? Seems odd to send it here as only omniti people should have access. Wrong mailing list. BUT the "bloody" repo server apparently received packages for 014 when it shouldn't have. And some people on the list have updated to them or attempted to update to them. I'm cleaning out the 014 from bloody as I type this. When r151014 is ready, you'll know!!! 
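For anyone who did pull the stray packages in the meantime, rolling back is just a matter of activating the previous boot environment. A rough sketch (the BE name here is a placeholder; use whatever beadm list shows on your box):

    beadm list                      # find the BE from before the update
    beadm activate omnios-r151012   # placeholder name: activate the pre-update BE
    init 6                          # reboot into it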
Dan From danmcd at omniti.com Thu Mar 26 15:37:26 2015 From: danmcd at omniti.com (Dan McDonald) Date: Thu, 26 Mar 2015 11:37:26 -0400 Subject: [OmniOS-discuss] kernel panic "kernel heap corruption detected" when creating zero eager disks In-Reply-To: References: <847C1AB3-A505-4A73-B5E7-F3EA2BD41577@omniti.com> <82CC383C-71A1-425D-89E6-B7B725071183@omniti.com> <67c125eb-75c2-48ae-89fb-6a8df288d2d4@careyweb.com> <20150325195212.5a8cebe4@sleipner.datanom.net> Message-ID: <07B1E231-B571-46BC-913D-445DBCA54923@omniti.com> I mentioned earlier: > I know Nexenta's done a LOT of improvements on this in illumos-nexenta. It might be time to upstream some of what they've done. I know it's a moving target (COMSTAR is not a well-written subsystem), so it may take some unravelling. I was looking at Nexenta's changes. They HAVE done a lot of work in these areas, and at some point someone needs to upstream them. Nexenta isn't under an obligation to upstream, just to publish, which they have. I found one particular bug that MAY have manifested as your problem. Because 014's coming up, I can't get to it at the moment. If you've built kernel modules before, I can tell you where the fix should go and approximately what the fix is. You'd have to test it, however. Sorry I can't be of more immediate assistance, Dan From moo at wuffers.net Thu Mar 26 15:47:47 2015 From: moo at wuffers.net (wuffers) Date: Thu, 26 Mar 2015 11:47:47 -0400 Subject: [OmniOS-discuss] kernel panic "kernel heap corruption detected" when creating zero eager disks In-Reply-To: <07B1E231-B571-46BC-913D-445DBCA54923@omniti.com> References: <847C1AB3-A505-4A73-B5E7-F3EA2BD41577@omniti.com> <82CC383C-71A1-425D-89E6-B7B725071183@omniti.com> <67c125eb-75c2-48ae-89fb-6a8df288d2d4@careyweb.com> <20150325195212.5a8cebe4@sleipner.datanom.net> <07B1E231-B571-46BC-913D-445DBCA54923@omniti.com> Message-ID: On Thu, Mar 26, 2015 at 11:37 AM, Dan McDonald wrote: > I mentioned earlier: > > > I know Nexenta's done a LOT of improvements on this in illumos-nexenta. > It might be time to upstream some of what they've done. I know it's a > moving target (COMSTAR is not a well-written subsystem), so it may take > some unravelling. > > I was looking at Nexenta's changes. They HAVE done a lot of work in these > areas, and at some point someone needs to upstream them. Nexenta isn't > under an obligation to upstream, just to publish, which they have. > > I found one particular bug that MAY have manifested as your problem. > Because 014's coming up, I can't get to it at the moment. If you've built > kernel modules before, I can tell you where the fix should go and > approximately what the fix is. You'd have to test it, however. > > Sorry I can't be of more immediate assistance, > Dan > > > Hi Dan (just saw your latest reply as I was writing this), Thanks for all the time you've put into this. It certainly sounds like some of the Nexenta COMSTAR work might be useful. Is R151014 released yet? It looks like all the documentation is there but mentions Apr 3/2015. Is there any reason to believe that it might be fixed if there are no (or low amounts) of changes in COMSTAR for this release? (Sounds like it isn't, now that I've read your latest) It looks like I'll have to make do with lazy zeroed or thin provisioned disks of 10TB+ for my Veeam tests, if it doesn't cause another kernel panic. I'm hesitant to create these now during business hours (and I shouldn't be.. these are normal VM provisioning tasks on available storage!). 
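One thing that might make the next round of tests quicker is driving the provisioning from the ESXi shell with vmkfstools instead of either client, so the GUI is out of the picture. A rough sketch (datastore and folder names are made up, and the target folder must already exist; sizes given in gigabytes to stay clear of older size-suffix parsing):

    vmkfstools -c 10240g -d thin             /vmfs/volumes/datastore1/test/thin.vmdk
    vmkfstools -c 10240g -d zeroedthick      /vmfs/volumes/datastore1/test/lazy.vmdk
    vmkfstools -c 10240g -d eagerzeroedthick /vmfs/volumes/datastore1/test/eager.vmdk

The eagerzeroedthick case is the one that should issue WRITE SAME against the target when the block-zeroing primitive is enabled.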
In your estimation, would eager zero vs lazy zero vs thin provisioned vmdks make any difference with that WRITE_SAME code? The majority of my VMs use eager zeroed disks, but again, never to this size. If there is anything you need me to test (in R151014? or beyond?), it's easy enough for me to reproduce (I timed myself last night, it took me about 2 hours to gracefully shut/save all the VMs, cause the crash dump, and get the infrastructure back up). I should probably try it on Hyper-V as well when I get time, but I believe most of those are Dynamic (thin) instead of Fixed (eager zero) disks, and I don't believe Hyper-V has an equivalent to lazy zeroed. The Hyper-V environment runs our test VMs after all, and aren't as performance sensitive. If you can tell me where the fix should go, I can probably try it out, even though I haven't built any kernel modules before (though I'm sure there are enough resources for me to draw on). I'll start by making myself a build server on a VM. Is this http://wiki.illumos.org/display/illumos/How+To+Build+illumos still current? -------------- next part -------------- An HTML attachment was scrubbed... URL: From doug at will.to Thu Mar 26 16:24:49 2015 From: doug at will.to (Doug Hughes) Date: Thu, 26 Mar 2015 12:24:49 -0400 Subject: [OmniOS-discuss] best or preferred 10g card for OmniOS Message-ID: <933dadf1-3c92-4fd7-b0b2-c2d899ff42c5.maildroid@localhost> any recommendations? We're having some pretty big problems with the Solarflare card and driver dropping network under high load. We eliminated LACP as a culprit, and the switch. Intel? Chelsio? other? - Doug -------------- next part -------------- An HTML attachment was scrubbed... URL: From chip at innovates.com Thu Mar 26 16:36:12 2015 From: chip at innovates.com (Schweiss, Chip) Date: Thu, 26 Mar 2015 11:36:12 -0500 Subject: [OmniOS-discuss] best or preferred 10g card for OmniOS In-Reply-To: <933dadf1-3c92-4fd7-b0b2-c2d899ff42c5.maildroid@localhost> References: <933dadf1-3c92-4fd7-b0b2-c2d899ff42c5.maildroid@localhost> Message-ID: The Intel X520's and the Supermicro equivalents are rock solid. The X540 probably is too, I just haven't used it. I prefer the Supermicro branded Intel cards because the firmware is not as picky about the twin-ax cables used. -Chip On Thu, Mar 26, 2015 at 11:24 AM, Doug Hughes wrote: > any recommendations? We're having some pretty big problems with the > Solarflare card and driver dropping network under high load. We eliminated > LACP as a culprit, and the switch. > > Intel? Chelsio? other? > > - Doug > _______________________________________________ > OmniOS-discuss mailing list > OmniOS-discuss at lists.omniti.com > http://lists.omniti.com/mailman/listinfo/omnios-discuss > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From jstockett at molalla.com Thu Mar 26 16:45:11 2015 From: jstockett at molalla.com (Jeff Stockett) Date: Thu, 26 Mar 2015 16:45:11 +0000 Subject: [OmniOS-discuss] best or preferred 10g card for OmniOS In-Reply-To: <933dadf1-3c92-4fd7-b0b2-c2d899ff42c5.maildroid@localhost> References: <933dadf1-3c92-4fd7-b0b2-c2d899ff42c5.maildroid@localhost> Message-ID: <136C13E89D22BB468B2A7025993639732F4E3778@EXMCCMB.molalla.com> I would concur with what Chip said. We?ve had good luck with the Intel X520s setup with LACP to a Nexus 5000 ? and also have a few X540s. The X520s are a bit picky about SFPs but Appoved makes one that works and is reasonably affordable. 
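For the record, the OmniOS side of an LACP setup like that is only a couple of commands. A rough sketch with placeholder link names (check dladm show-phys for the real ones; the links must not have IP plumbed or VNICs on them when the aggr is created, and the MTU has to match the switch end to end if jumbo frames are in play):

    dladm create-aggr -L active -P L3,L4 -l ixgbe0 -l ixgbe1 aggr0
    dladm show-aggr -L aggr0               # confirm LACP has negotiated on both ports
    dladm set-linkprop -p mtu=9000 aggr0   # only if the switch ports are set to match
    ipadm create-if aggr0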
From: OmniOS-discuss [mailto:omnios-discuss-bounces at lists.omniti.com] On Behalf Of Doug Hughes Sent: Thursday, March 26, 2015 9:25 AM To: omnios-discuss Subject: [OmniOS-discuss] best or preferred 10g card for OmniOS any recommendations? We're having some pretty big problems with the Solarflare card and driver dropping network under high load. We eliminated LACP as a culprit, and the switch. Intel? Chelsio? other? - Doug -------------- next part -------------- An HTML attachment was scrubbed... URL: From doug at will.to Thu Mar 26 16:51:19 2015 From: doug at will.to (Doug Hughes) Date: Thu, 26 Mar 2015 12:51:19 -0400 Subject: [OmniOS-discuss] best or preferred 10g card for OmniOS In-Reply-To: References: <933dadf1-3c92-4fd7-b0b2-c2d899ff42c5.maildroid@localhost> Message-ID: Thanks guys! (also, if anybody has some advice or contrary experience with SolarFlare (5162), I'd love to hear it. Right now they are pretty much unusable under load, though iperf tends to work fine. On Thu, Mar 26, 2015 at 12:36 PM, Schweiss, Chip wrote: > The Intel X520's and the Supermicro equivalents are rock solid. The > X540 probably is too, I just haven't used it. I prefer the Supermicro > branded Intel cards because the firmware is not as picky about the twin-ax > cables used. > > -Chip > > On Thu, Mar 26, 2015 at 11:24 AM, Doug Hughes wrote: > >> any recommendations? We're having some pretty big problems with the >> Solarflare card and driver dropping network under high load. We eliminated >> LACP as a culprit, and the switch. >> >> Intel? Chelsio? other? >> >> - Doug >> _______________________________________________ >> OmniOS-discuss mailing list >> OmniOS-discuss at lists.omniti.com >> http://lists.omniti.com/mailman/listinfo/omnios-discuss >> >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From doug at will.to Thu Mar 26 16:55:55 2015 From: doug at will.to (Doug Hughes) Date: Thu, 26 Mar 2015 12:55:55 -0400 Subject: [OmniOS-discuss] best or preferred 10g card for OmniOS In-Reply-To: References: <933dadf1-3c92-4fd7-b0b2-c2d899ff42c5.maildroid@localhost> Message-ID: Regarding X520 and SFP+.. We tend to use Amphenol. Do those work ok? On Thu, Mar 26, 2015 at 12:51 PM, Doug Hughes wrote: > Thanks guys! (also, if anybody has some advice or contrary experience with > SolarFlare (5162), I'd love to hear it. Right now they are pretty much > unusable under load, though iperf tends to work fine. > > > > On Thu, Mar 26, 2015 at 12:36 PM, Schweiss, Chip > wrote: > >> The Intel X520's and the Supermicro equivalents are rock solid. The >> X540 probably is too, I just haven't used it. I prefer the Supermicro >> branded Intel cards because the firmware is not as picky about the twin-ax >> cables used. >> >> -Chip >> >> On Thu, Mar 26, 2015 at 11:24 AM, Doug Hughes wrote: >> >>> any recommendations? We're having some pretty big problems with the >>> Solarflare card and driver dropping network under high load. We eliminated >>> LACP as a culprit, and the switch. >>> >>> Intel? Chelsio? other? >>> >>> - Doug >>> _______________________________________________ >>> OmniOS-discuss mailing list >>> OmniOS-discuss at lists.omniti.com >>> http://lists.omniti.com/mailman/listinfo/omnios-discuss >>> >>> >> > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From danmcd at omniti.com Thu Mar 26 17:05:09 2015 From: danmcd at omniti.com (Dan McDonald) Date: Thu, 26 Mar 2015 13:05:09 -0400 Subject: [OmniOS-discuss] kernel panic "kernel heap corruption detected" when creating zero eager disks In-Reply-To: References: <847C1AB3-A505-4A73-B5E7-F3EA2BD41577@omniti.com> <82CC383C-71A1-425D-89E6-B7B725071183@omniti.com> <67c125eb-75c2-48ae-89fb-6a8df288d2d4@careyweb.com> <20150325195212.5a8cebe4@sleipner.datanom.net> <07B1E231-B571-46BC-913D-445DBCA54923@omniti.com> Message-ID: <7CBB4FDC-03CF-422F-889A-B867FFA68FB7@omniti.com> > On Mar 26, 2015, at 11:47 AM, wuffers wrote: > It looks like I'll have to make do with lazy zeroed or thin provisioned disks of 10TB+ for my Veeam tests, if it doesn't cause another kernel panic. I'm hesitant to create these now during business hours (and I shouldn't be.. these are normal VM provisioning tasks on available storage!). In your estimation, would eager zero vs lazy zero vs thin provisioned vmdks make any difference with that WRITE_SAME code? The majority of my VMs use eager zeroed disks, but again, never to this size. WRITE_SAME is one of the four VAAI primitives. Nexenta wrote this code for NS, and upstreamed two of them: WRITE_SAME is "hardware assisted erase". UNMAP is "hardware assisted freeing". Those are in upstream illumos. ATS is atomic-test-and-set or "hardware assisted fine-grained locking". XCOPY is "hardware assisted copying". These are in NexentaStor, and after being held back, were open-sourced, but not yet upstreamed. > If there is anything you need me to test (in R151014? or beyond?), it's easy enough for me to reproduce (I timed myself last night, it took me about 2 hours to gracefully shut/save all the VMs, cause the crash dump, and get the infrastructure back up). I should probably try it on Hyper-V as well when I get time, but I believe most of those are Dynamic (thin) instead of Fixed (eager zero) disks, and I don't believe Hyper-V has an equivalent to lazy zeroed. The Hyper-V environment runs our test VMs after all, and aren't as performance sensitive. I may be able to generate a fix, but I have no idea if it's sufficient or not. Like I said, COMSTAR is not well-written or maintainable code, but Nexenta has put a lot of love into it. > If you can tell me where the fix should go, I can probably try it out, even though I haven't built any kernel modules before (though I'm sure there are enough resources for me to draw on). I'll start by making myself a build server on a VM. Is this http://wiki.illumos.org/display/illumos/How+To+Build+illumos still current? The small fix I might be able to generate will involve a replacement "stmf_sbd" module. More on that after I get cycles to generate something. Dan From gmason at msu.edu Thu Mar 26 16:30:21 2015 From: gmason at msu.edu (Greg Mason) Date: Thu, 26 Mar 2015 12:30:21 -0400 Subject: [OmniOS-discuss] best or preferred 10g card for OmniOS In-Reply-To: <933dadf1-3c92-4fd7-b0b2-c2d899ff42c5.maildroid@localhost> References: <933dadf1-3c92-4fd7-b0b2-c2d899ff42c5.maildroid@localhost> Message-ID: <9C3872C8-BF0E-4A14-A1BC-FEC5241DF583@msu.edu> On Mar 26, 2015, at 12:24 PM, Doug Hughes wrote: > > any recommendations? We're having some pretty big problems with the Solarflare card and driver dropping network under high load. We eliminated LACP as a culprit, and the switch. > > Intel? Chelsio? other? I?ve had a pretty good experience with the Intel X520 cards, not really much I can complain about. 
-Greg From danmcd at omniti.com Thu Mar 26 17:25:17 2015 From: danmcd at omniti.com (Dan McDonald) Date: Thu, 26 Mar 2015 13:25:17 -0400 Subject: [OmniOS-discuss] best or preferred 10g card for OmniOS In-Reply-To: <9C3872C8-BF0E-4A14-A1BC-FEC5241DF583@msu.edu> References: <933dadf1-3c92-4fd7-b0b2-c2d899ff42c5.maildroid@localhost> <9C3872C8-BF0E-4A14-A1BC-FEC5241DF583@msu.edu> Message-ID: <4967907A-EA42-476A-93CA-B8D65D778972@omniti.com> Generally speaking the Intel ones are preferable, because Intel does the best job of keeping those drivers up to date for all open-source platforms. Just pushed into illumos, and coming soon to r151014 is OPEN-SOURCE support for Broadcom NetXtreme II (now owned by QLogic) 10GigE (the "bnxe" driver). It's not nearly as good as Intel's code, or likely Intel's HW, but now that it is open-source, it becomes a viable alternative, because people can now fix the driver if there is a problem. I'd still use Intel 10Gig where possible, but if you're stuck on HW with formerly-Broadcom 10GigE, your luck will be improving somewhat. Dan From danmcd at omniti.com Thu Mar 26 18:56:41 2015 From: danmcd at omniti.com (Dan McDonald) Date: Thu, 26 Mar 2015 14:56:41 -0400 Subject: [OmniOS-discuss] Bloody repo server Message-ID: <3A0B4680-F786-4B21-AF31-A251A11FE2FA@omniti.com> Is offline for now. I'm trying to get rid of the 014 crud I accidentally pushed into it on Tuesday. Those packages were just illumos-omnios ones, which are just still the master branch of illumos-omnios. Dan From danmcd at omniti.com Thu Mar 26 19:30:44 2015 From: danmcd at omniti.com (Dan McDonald) Date: Thu, 26 Mar 2015 15:30:44 -0400 Subject: [OmniOS-discuss] Bloody repo server In-Reply-To: <3A0B4680-F786-4B21-AF31-A251A11FE2FA@omniti.com> References: <3A0B4680-F786-4B21-AF31-A251A11FE2FA@omniti.com> Message-ID: <13204CB0-13D4-4E6C-828C-1CB0D98797DB@omniti.com> Is now back online, and cleaned of r151014 packages that shouldn't have been there. Thank you! Dan From john.barfield at bissinc.com Thu Mar 26 20:50:32 2015 From: john.barfield at bissinc.com (John Barfield) Date: Thu, 26 Mar 2015 20:50:32 +0000 Subject: [OmniOS-discuss] Potential KVM Virtio Performance Issues In-Reply-To: References: <708F6249-5038-4DEE-A2A1-34D185B65CFC@omniti.com> <5511ED7A.3080006@gmx.li> <870038BE-8F1E-4411-A0BC-099D309E447D@omniti.com> <38C9C9CA-A75F-49F6-B210-C04CA81402E4@bissinc.com> Message-ID: <4B77878C-A980-44C5-B5D8-1A34D9D3A52A@bissinc.com> So I was still having issues with virtio performance. I?ve finally determined that its the child zone that is capping the throughput at 85mbps. If I halt the zone and launch the same VM from the GZ I get 955mbps. Another thing?the virtio driver in Centos6.6 does not work well with OmniOS kvm. I can boot Centos from either the GZ or a CZ and I?m actually getting results in the Kb now instead of mbps with iperf. May have something to do with the tcp window being 19.5 kb on CentOS vs 85kb on Ubuntu. Assuming this is a driver problem. The only OS I get good speeds with are Ubuntu server 14.04 running the Global Zone. (Have only tested two though :)) So my recipe for decent virtio performance on OmniOS: Ubuntu Linux Server 14.04 running in Global Zone. Does anyone have any idea why the child zone is capping my throughput? Am I missing a zone cfg parameter to allow the child zone to have full 1GB bandwidth? 
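Nothing in a stock zone configuration should cap a VNIC on its own, as far as I know, so it may be worth checking whether a bandwidth limit snuck in at the link or flow level instead. A quick sketch from the global zone (vnic and zone names are placeholders):

    dladm show-linkprop -p maxbw vnic0    # "--" means no link-level cap is set
    flowadm show-flow                     # any flows with bandwidth properties attached?
    dladm show-link -s -i 5 vnic0         # watch the byte counters while iperf runs
    zonecfg -z kvmzone1 info net          # confirm which VNIC the zone is actually using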
From: Theo Schlossnagle Date: Wednesday, March 25, 2015 at 6:56 AM To: John Barfield Cc: Phil Harman, "omnios-discuss at lists.omniti.com" Subject: Re: [OmniOS-discuss] Potential KVM Virtio Performance Issues +1 John. That documentation would be very welcome. On Tue, Mar 24, 2015 at 9:50 PM, John Barfield > wrote: Actually the numbers I sent for the SmartOS VM to VM test were on a switch with Jumbo frames (switch = 9216 mtu...SmartOS GZ MTU = 9000) (Extreme Networks Summit X440-48t release 15.2.3 patch12) Theyre also sitting in Q-in-Q tagged VLANs. Admin tagged nic sits in Vman (provider bridge) 10 the VMs were tagged in VLAN 1674. (not bad :) really) As far as everyone who is wondering how I got 952 Mbps on OmnisOS KVM while running in a zone I plan to write up a how-to that can be posted to the core site if you'd like. There are several caveats that are not documented today for running KVM in a zone. Not that I didnt reverse engineer some of Joyents work of course. Thanks and have a great day, John Barfield > On Mar 24, 2015, at 7:40 PM, Phil Harman > wrote: > > John, > > Interesting work and data. Thanks for sharing. > > I've also been playing with Oracle Solaris 11.2 vs Linux vs FreeBSD on SmartOS vs ESX5.5 (free edition) both VM2VM and VM to remote host over a couple of Intel 10GBASE-T cards. > > As far as I can tell, there remains no virtio-net driver for Solaris / Illumos guests, so I've been using e1000g, which really sucks. > > I found virtio-net works ok under KVM, but was blown away by vmxnet3 under ESX performance (for which a Solaris / Illumos drivers do exist), being able to get close to 8gbps from the guest over the wire! > > To achieve this I had to use jumbo frames (something the current Solaris 11.2 e1000g appears unable to do at all any more). > > So I was wondering, while you are there, whether you've got (or can get) any data for KVM virtio-net VM2VM using jumbo frames? > > My expectation would be at least 2x for MTU 9000 vs 1500. > > I also wonder whether like for like comparison with ESX might encourage further improvements? > > As someone used to say at Sun "If Linux is faster, it's a Solaris bug!". It would be great if the community could agree to the same for ESX vs KVM :) > > Cheers, > Phil > > >> On 24 Mar 2015, at 23:45, John Barfield > wrote: >> >> Btw I did go ahead and test both virtio methods...I gave a vm the -device argument on one interface and the -net argument for another the results where.... >> >> -device = eth0 = 952mbps >> -net = eth1 = 199 mbps >> >> Thanks and have a great day, >> >> John Barfield >> >>> On Mar 24, 2015, at 6:12 PM, Dan McDonald > wrote: >>> >>> >>>> On Mar 24, 2015, at 7:04 PM, Dominik Hassler > wrote: >>>> >>>> Dan, >>>> >>>>>> After further testing I achieved 952 MBytes on a VM-2-VM >>>>>> connection...1 >>>>>> linux Ubuntu 12.04 vm to another CentOS 6.6 VM running on two >>>>>> different SmartOS host machines (through an extreme networks switch). >>>> >>>> if I got John correctly, he was running his second test on SmartOS hosts... >>>> >>>> We did a lot of testing on OmniOS with -net vnic and -device >>>> virtio-net-pci but sadly to no avail... >>>> >>>> I think we have to hope that SmartOS kvm improvements will get >>>> upstreamed sooner or later. >>> >>> Ahh yes. >>> >>> I was hoping to have them ready for 014, but it's a complicated process to upstream larger projects, and Joyent was in the middle of getting their new Triton release out the door. 
>>> >>> Dan >> _______________________________________________ >> OmniOS-discuss mailing list >> OmniOS-discuss at lists.omniti.com >> http://lists.omniti.com/mailman/listinfo/omnios-discuss _______________________________________________ OmniOS-discuss mailing list OmniOS-discuss at lists.omniti.com http://lists.omniti.com/mailman/listinfo/omnios-discuss -- Theo Schlossnagle http://omniti.com/is/theo-schlossnagle -------------- next part -------------- An HTML attachment was scrubbed... URL: From moo at wuffers.net Thu Mar 26 21:15:39 2015 From: moo at wuffers.net (wuffers) Date: Thu, 26 Mar 2015 17:15:39 -0400 Subject: [OmniOS-discuss] kernel panic "kernel heap corruption detected" when creating zero eager disks In-Reply-To: <7CBB4FDC-03CF-422F-889A-B867FFA68FB7@omniti.com> References: <847C1AB3-A505-4A73-B5E7-F3EA2BD41577@omniti.com> <82CC383C-71A1-425D-89E6-B7B725071183@omniti.com> <67c125eb-75c2-48ae-89fb-6a8df288d2d4@careyweb.com> <20150325195212.5a8cebe4@sleipner.datanom.net> <07B1E231-B571-46BC-913D-445DBCA54923@omniti.com> <7CBB4FDC-03CF-422F-889A-B867FFA68FB7@omniti.com> Message-ID: On Thu, Mar 26, 2015 at 1:05 PM, Dan McDonald wrote: > > WRITE_SAME is one of the four VAAI primitives. Nexenta wrote this code > for NS, and upstreamed two of them: > > WRITE_SAME is "hardware assisted erase". > > UNMAP is "hardware assisted freeing". > > Those are in upstream illumos. > > ATS is atomic-test-and-set or "hardware assisted fine-grained locking". > > XCOPY is "hardware assisted copying". > > These are in NexentaStor, and after being held back, were open-sourced, > but not yet upstreamed. > > Ahh, VAAI. I suspect this is a bigger bite to chew, looking back at some prior discussions on this list (although I'm sure many are anxiously awaiting this to be upstreamed). I'm guessing Microsoft's ODX will also be supported since I understand that is just an XCOPY. I see that FreeNAS now has support for both VAAI and ODX - are they porting stuff from the various Illumos distros (including the referenced Nexenta work on VAAI or is it their own implementation)? After some more reading to answer my own questions, I came across this VMware blog post ( http://blogs.vmware.com/vsphere/2012/06/low-level-vaai-behaviour.html): "The following provisioning tasks are accelerated by the use of the WRITE SAME command: Cloning operations for eagerzeroedthick target disks. Allocating new file blocks for thin provisioned virtual disks. Initializing previous unwritten file blocks for zerothick virtual disks." I don't seem to have issues allocating smaller amounts of space, so I suspect that using thin or lazy zero will work. Secondly, it *might* just be the vSphere fat client, as I found another VMware KB ( http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=2058287) which states I cannot make a disk larger than 4TB, which contradicts this properties dialog: http://i.imgur.com/f9liqpR.png (says maximum file size of 2TB in vSphere fat client) versus: http://i.imgur.com/6Ya3oH4.png (says maximum file size 64TB in the vSphere web client) The KB goes on to state, "Checking the size of the newly created or expanded VMDK, you find that it is 4 TB." is untrue, because it allocated and is using 10TB. Don't know how much to trust that info as it seems contradictory. Still, it shouldn't cause the kernel panic like it did. 
Thirdly, it appears I can disable any of the VAAI primitives in the host configuration, if all else fails (since we've determined that it is likely caused by WRITE_SAME). Good read on this via the VAAI FAQ here (which shows you how to check the properties via the ESX CLI): http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=1021976 So here's what I will attempt to test: - Create thin vmdk @ 10TB with vSphere fat client - Create lazy zeroed vmdk @ 10 TB with vSphere fat client - Create eager zeroed vmdk @ 10 TB with vSphere web client - Create thin vmdk @ 10TB with vSphere web client - Create lazy zeroed vmdk @ 10 TB with vSphere web client So it seems I do have alternatives (disabling DataMover.HardwareAcceleratedMove as a last resort). -------------- next part -------------- An HTML attachment was scrubbed... URL: From danmcd at omniti.com Thu Mar 26 21:19:37 2015 From: danmcd at omniti.com (Dan McDonald) Date: Thu, 26 Mar 2015 17:19:37 -0400 Subject: [OmniOS-discuss] kernel panic "kernel heap corruption detected" when creating zero eager disks In-Reply-To: References: <847C1AB3-A505-4A73-B5E7-F3EA2BD41577@omniti.com> <82CC383C-71A1-425D-89E6-B7B725071183@omniti.com> <67c125eb-75c2-48ae-89fb-6a8df288d2d4@careyweb.com> <20150325195212.5a8cebe4@sleipner.datanom.net> <07B1E231-B571-46BC-913D-445DBCA54923@omniti.com> <7CBB4FDC-03CF-422F-889A-B867FFA68FB7@omniti.com> Message-ID: <294BC7DA-0A85-4662-B4B2-71C94755A30A@omniti.com> Just remember that only WRITE_SAME and UNMAP are on stock illumos. If you want the other two, you either get NexentaStor or you start an effort to upstream them from illumos-nexenta. Dan From matthew.lagoe at subrigo.net Thu Mar 26 22:33:24 2015 From: matthew.lagoe at subrigo.net (Matthew Lagoe) Date: Thu, 26 Mar 2015 15:33:24 -0700 Subject: [OmniOS-discuss] best or preferred 10g card for OmniOS In-Reply-To: <933dadf1-3c92-4fd7-b0b2-c2d899ff42c5.maildroid@localhost> References: <933dadf1-3c92-4fd7-b0b2-c2d899ff42c5.maildroid@localhost> Message-ID: <009e01d06814$e1f81170$a5e83450$@subrigo.net> I use the myricom 10g cards (myri10g) they work just fine. From: OmniOS-discuss [mailto:omnios-discuss-bounces at lists.omniti.com] On Behalf Of Doug Hughes Sent: Thursday, March 26, 2015 09:25 AM To: omnios-discuss Subject: [OmniOS-discuss] best or preferred 10g card for OmniOS any recommendations? We're having some pretty big problems with the Solarflare card and driver dropping network under high load. We eliminated LACP as a culprit, and the switch. Intel? Chelsio? other? - Doug -------------- next part -------------- An HTML attachment was scrubbed... URL: From phil.harman at gmail.com Fri Mar 27 00:05:33 2015 From: phil.harman at gmail.com (Phil Harman) Date: Fri, 27 Mar 2015 00:05:33 +0000 Subject: [OmniOS-discuss] best or preferred 10g card for OmniOS In-Reply-To: <136C13E89D22BB468B2A7025993639732F4E3778@EXMCCMB.molalla.com> References: <933dadf1-3c92-4fd7-b0b2-c2d899ff42c5.maildroid@localhost> <136C13E89D22BB468B2A7025993639732F4E3778@EXMCCMB.molalla.com> Message-ID: <3FA8D3A2-B45F-48D4-909E-6B8F2542DAFF@gmail.com> SFPs? Are you know kidding me? For my home lab I bought a pair of X540-T2 cards, which will even run over Cat5 for about a metre. They use a lot more power than SFP though :) No issues at all with illumos or VMware. > On 26 Mar 2015, at 16:45, Jeff Stockett wrote: > > I would concur with what Chip said. We?ve had good luck with the Intel X520s setup with LACP to a Nexus 5000 ? and also have a few X540s. 
The X520s are a bit picky about SFPs but Appoved makes one that works and is reasonably affordable. > > From: OmniOS-discuss [mailto:omnios-discuss-bounces at lists.omniti.com] On Behalf Of Doug Hughes > Sent: Thursday, March 26, 2015 9:25 AM > To: omnios-discuss > Subject: [OmniOS-discuss] best or preferred 10g card for OmniOS > > any recommendations? We're having some pretty big problems with the Solarflare card and driver dropping network under high load. We eliminated LACP as a culprit, and the switch. > > Intel? Chelsio? other? > > - Doug > _______________________________________________ > OmniOS-discuss mailing list > OmniOS-discuss at lists.omniti.com > http://lists.omniti.com/mailman/listinfo/omnios-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From stephan.budach at JVM.DE Fri Mar 27 05:49:00 2015 From: stephan.budach at JVM.DE (Stephan Budach) Date: Fri, 27 Mar 2015 06:49:00 +0100 Subject: [OmniOS-discuss] best or preferred 10g card for OmniOS In-Reply-To: <3FA8D3A2-B45F-48D4-909E-6B8F2542DAFF@gmail.com> References: <933dadf1-3c92-4fd7-b0b2-c2d899ff42c5.maildroid@localhost> <136C13E89D22BB468B2A7025993639732F4E3778@EXMCCMB.molalla.com> <3FA8D3A2-B45F-48D4-909E-6B8F2542DAFF@gmail.com> Message-ID: <5514EF4C.70909@jvm.de> Am 27.03.15 um 01:05 schrieb Phil Harman: > SFPs? Are you know kidding me? For my home lab I bought a pair of > X540-T2 cards, which will even run over Cat5 for about a metre. They > use a lot more power than SFP though :) He he? 1m doesn't buy you much in a server room, so we're using almost only 540-T2 over Cat6 and we never had a problem under Solaris or OmniOS. I'd always choose the Intel 10GbEs over all other brands. I've had horrible issues with Broadcom-based NICs in my Dells and they will have to prove their reliability to a load of other people first, before I will even consider, buying those again? but, I guess so will other do as well. ;) > > No issues at all with illumos or VMware. > > On 26 Mar 2015, at 16:45, Jeff Stockett > wrote: > >> I would concur with what Chip said. We?ve had good luck with the >> Intel X520s setup with LACP to a Nexus 5000 ? and also have a few >> X540s. The X520s are a bit picky about SFPs but Appoved makes one >> that works and is reasonably affordable. >> >> *From:*OmniOS-discuss >> [mailto:omnios-discuss-bounces at lists.omniti.com] *On Behalf Of *Doug >> Hughes >> *Sent:* Thursday, March 26, 2015 9:25 AM >> *To:* omnios-discuss >> *Subject:* [OmniOS-discuss] best or preferred 10g card for OmniOS >> >> any recommendations? We're having some pretty big problems with the >> Solarflare card and driver dropping network under high load. We >> eliminated LACP as a culprit, and the switch. >> >> Intel? Chelsio? other? >> >> - Doug >> >> _______________________________________________ >> OmniOS-discuss mailing list >> OmniOS-discuss at lists.omniti.com >> http://lists.omniti.com/mailman/listinfo/omnios-discuss > > > _______________________________________________ > OmniOS-discuss mailing list > OmniOS-discuss at lists.omniti.com > http://lists.omniti.com/mailman/listinfo/omnios-discuss -- Stephan Budach Managing Director Jung von Matt/it-services GmbH Glash?ttenstra?e 79 20357 Hamburg Tel: +49 40-4321-1353 Fax: +49 40-4321-1114 E-Mail: stephan.budach at jvm.de Internet: http://www.jvm.com Gesch?ftsf?hrer: Stephan Budach AG HH HRB 98380 -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From moo at wuffers.net Fri Mar 27 06:24:07 2015 From: moo at wuffers.net (wuffers) Date: Fri, 27 Mar 2015 02:24:07 -0400 Subject: [OmniOS-discuss] kernel panic "kernel heap corruption detected" when creating zero eager disks In-Reply-To: <294BC7DA-0A85-4662-B4B2-71C94755A30A@omniti.com> References: <847C1AB3-A505-4A73-B5E7-F3EA2BD41577@omniti.com> <82CC383C-71A1-425D-89E6-B7B725071183@omniti.com> <67c125eb-75c2-48ae-89fb-6a8df288d2d4@careyweb.com> <20150325195212.5a8cebe4@sleipner.datanom.net> <07B1E231-B571-46BC-913D-445DBCA54923@omniti.com> <7CBB4FDC-03CF-422F-889A-B867FFA68FB7@omniti.com> <294BC7DA-0A85-4662-B4B2-71C94755A30A@omniti.com> Message-ID: > > > So here's what I will attempt to test: > - Create thin vmdk @ 10TB with vSphere fat client: PASS > - Create lazy zeroed vmdk @ 10 TB with vSphere fat client: PASS > - Create eager zeroed vmdk @ 10 TB with vSphere web client: PASS! (took 1 > hour) > - Create thin vmdk @ 10TB with vSphere web client: PASS > - Create lazy zeroed vmdk @ 10 TB with vSphere web client: PASS > > Additionally, I tried: - Create fixed vhdx @ 10TB with SCVMM (Hyper-V): PASS (most likely no primitives in use here - this took slightly over 3 hours) Everything passed (which I didn't expect, especially the 10TB eager zero).. then I tried again on the vSphere web client for a 20TB eager zero disk, and I got another kernel panic altogether (no kmem_flags 0xf set, unfortunately). Mar 27 2015 01:09:33.664060000 2e2305c2-54b5-c1f4-aafd-fb1eccc982dd SUNOS-8000-KL TIME CLASS ENA Mar 27 01:09:33.6307 ireport.os.sunos.panic.dump_available 0x0000000000000000 Mar 27 01:08:30.6688 ireport.os.sunos.panic.dump_pending_on_device 0x0000000000000000 nvlist version: 0 version = 0x0 class = list.suspect uuid = 2e2305c2-54b5-c1f4-aafd-fb1eccc982dd code = SUNOS-8000-KL diag-time = 1427432973 633746 de = fmd:///module/software-diagnosis fault-list-sz = 0x1 fault-list = (array of embedded nvlists) (start fault-list[0]) nvlist version: 0 version = 0x0 class = defect.sunos.kernel.panic certainty = 0x64 asru = sw:///:path=/var/crash/unknown/.2e2305c2-54b5-c1f4-aafd-fb1eccc982dd resource = sw:///:path=/var/crash/unknown/.2e2305c2-54b5-c1f4-aafd-fb1eccc982dd savecore-succcess = 1 dump-dir = /var/crash/unknown dump-files = vmdump.2 os-instance-uuid = 2e2305c2-54b5-c1f4-aafd-fb1eccc982dd panicstr = BAD TRAP: type=d (#gp General protection) rp=ffffff01eb72ea70 addr=0 panicstack = unix:real_mode_stop_cpu_stage2_end+9e23 () | unix:trap+a30 () | unix:cmntrap+e6 () | genunix:anon_decref+35 () | genunix:anon_free+74 () | genunix:segvn_free+242 () | genunix:seg_free+30 () | genunix:segvn_unmap+cde () | genunix:as_free+e7 () | genunix:relvm+220 () | genunix:proc_exit+454 () | genunix:exit+15 () | genunix:rexit+18 () | unix:brand_sys_sysenter+1c9 () | crashtime = 1427431421 panic-time = Fri Mar 27 00:43:41 2015 EDT (end fault-list[0]) fault-status = 0x1 severity = Major __ttl = 0x1 __tod = 0x5514e60d 0x2794c060 Crash file: https://drive.google.com/file/d/0B7mCJnZUzJPKT0lpTW9GZFJCLTg/view?usp=sharing It appears I can do thin and lazy zero disks of those sizes, so I will have to be satisfied to use those options as a workaround (plus disabling WRITE_SAME from the hosts if I really wanted the eager zeroed disk) until some of that Nexenta COMSTAR love is upstreamed. For comparison sake, provisioning a 10TB fixed vhdx took approximately 3 hours in Hyper-V, while the same provisioning in VMware took about 1 hour. So we can say that WRITE_SAME accelerated the same job by 3x. 
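For reference, the per-device VAAI state and the host-wide knobs can both be inspected from the ESXi shell, and it is DataMover.HardwareAcceleratedInit that governs the block-zeroing (WRITE SAME) primitive; HardwareAcceleratedMove covers XCOPY/full copy. A sketch with a placeholder device ID:

    esxcli storage core device vaai status get -d naa.600144f0xxxxxxxxxxxxxxxxxxxxxxxx
        # "Zero Status: supported" is the WRITE SAME primitive
    esxcli system settings advanced list --option /DataMover/HardwareAcceleratedInit
    esxcli system settings advanced set --int-value 0 --option /DataMover/HardwareAcceleratedInit
        # host-wide: fall back to software zeroing until the target side is fixed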
-------------- next part -------------- An HTML attachment was scrubbed... URL: From tobi at oetiker.ch Fri Mar 27 08:36:35 2015 From: tobi at oetiker.ch (Tobias Oetiker) Date: Fri, 27 Mar 2015 09:36:35 +0100 (CET) Subject: [OmniOS-discuss] Potential KVM Virtio Performance Issues In-Reply-To: <4B77878C-A980-44C5-B5D8-1A34D9D3A52A@bissinc.com> References: <708F6249-5038-4DEE-A2A1-34D185B65CFC@omniti.com> <5511ED7A.3080006@gmx.li> <870038BE-8F1E-4411-A0BC-099D309E447D@omniti.com> <38C9C9CA-A75F-49F6-B210-C04CA81402E4@bissinc.com> <4B77878C-A980-44C5-B5D8-1A34D9D3A52A@bissinc.com> Message-ID: Hi John, this sounds encurraging ... could you provied complete details of your setup ... * start up commandline for qemu-system-x86_64 * kernel version running on on your ubuntu * any special config on the ubuntu side (eg in /etc/systctl.conf) or so ... cheers tobi Yesterday John Barfield wrote: > So I was still having issues with virtio performance. I?ve finally determined that its the child zone that is capping the throughput at 85mbps. > > If I halt the zone and launch the same VM from the GZ I get 955mbps. > > Another thing?the virtio driver in Centos6.6 does not work well with OmniOS kvm. > > I can boot Centos from either the GZ or a CZ and I?m actually getting results in the Kb now instead of mbps with iperf. May have something to do with the tcp window being 19.5 kb on CentOS vs 85kb on Ubuntu. Assuming this is a driver problem. > > The only OS I get good speeds with are Ubuntu server 14.04 running the Global Zone. (Have only tested two though :)) > > So my recipe for decent virtio performance on OmniOS: > > Ubuntu Linux Server 14.04 running in Global Zone. > > Does anyone have any idea why the child zone is capping my throughput? > > Am I missing a zone cfg parameter to allow the child zone to have full 1GB bandwidth? > > > > From: Theo Schlossnagle > Date: Wednesday, March 25, 2015 at 6:56 AM > To: John Barfield > Cc: Phil Harman, "omnios-discuss at lists.omniti.com" > Subject: Re: [OmniOS-discuss] Potential KVM Virtio Performance Issues > > +1 John. That documentation would be very welcome. > > On Tue, Mar 24, 2015 at 9:50 PM, John Barfield > wrote: > Actually the numbers I sent for the SmartOS VM to VM test were on a switch with Jumbo frames (switch = 9216 mtu...SmartOS GZ MTU = 9000) (Extreme Networks Summit X440-48t release 15.2.3 patch12) Theyre also sitting in Q-in-Q tagged VLANs. Admin tagged nic sits in Vman (provider bridge) 10 the VMs were tagged in VLAN 1674. (not bad :) really) > > As far as everyone who is wondering how I got 952 Mbps on OmnisOS KVM while running in a zone I plan to write up a how-to that can be posted to the core site if you'd like. There are several caveats that are not documented today for running KVM in a zone. Not that I didnt reverse engineer some of Joyents work of course. > > > > Thanks and have a great day, > > John Barfield > > > On Mar 24, 2015, at 7:40 PM, Phil Harman > wrote: > > > > John, > > > > Interesting work and data. Thanks for sharing. > > > > I've also been playing with Oracle Solaris 11.2 vs Linux vs FreeBSD on SmartOS vs ESX5.5 (free edition) both VM2VM and VM to remote host over a couple of Intel 10GBASE-T cards. > > > > As far as I can tell, there remains no virtio-net driver for Solaris / Illumos guests, so I've been using e1000g, which really sucks. 
> > > > I found virtio-net works ok under KVM, but was blown away by vmxnet3 under ESX performance (for which a Solaris / Illumos drivers do exist), being able to get close to 8gbps from the guest over the wire! > > > > To achieve this I had to use jumbo frames (something the current Solaris 11.2 e1000g appears unable to do at all any more). > > > > So I was wondering, while you are there, whether you've got (or can get) any data for KVM virtio-net VM2VM using jumbo frames? > > > > My expectation would be at least 2x for MTU 9000 vs 1500. > > > > I also wonder whether like for like comparison with ESX might encourage further improvements? > > > > As someone used to say at Sun "If Linux is faster, it's a Solaris bug!". It would be great if the community could agree to the same for ESX vs KVM :) > > > > Cheers, > > Phil > > > > > >> On 24 Mar 2015, at 23:45, John Barfield > wrote: > >> > >> Btw I did go ahead and test both virtio methods...I gave a vm the -device argument on one interface and the -net argument for another the results where.... > >> > >> -device = eth0 = 952mbps > >> -net = eth1 = 199 mbps > >> > >> Thanks and have a great day, > >> > >> John Barfield > >> > >>> On Mar 24, 2015, at 6:12 PM, Dan McDonald > wrote: > >>> > >>> > >>>> On Mar 24, 2015, at 7:04 PM, Dominik Hassler > wrote: > >>>> > >>>> Dan, > >>>> > >>>>>> After further testing I achieved 952 MBytes on a VM-2-VM > >>>>>> connection...1 > >>>>>> linux Ubuntu 12.04 vm to another CentOS 6.6 VM running on two > >>>>>> different SmartOS host machines (through an extreme networks switch). > >>>> > >>>> if I got John correctly, he was running his second test on SmartOS hosts... > >>>> > >>>> We did a lot of testing on OmniOS with -net vnic and -device > >>>> virtio-net-pci but sadly to no avail... > >>>> > >>>> I think we have to hope that SmartOS kvm improvements will get > >>>> upstreamed sooner or later. > >>> > >>> Ahh yes. > >>> > >>> I was hoping to have them ready for 014, but it's a complicated process to upstream larger projects, and Joyent was in the middle of getting their new Triton release out the door. > >>> > >>> Dan > >> _______________________________________________ > >> OmniOS-discuss mailing list > >> OmniOS-discuss at lists.omniti.com > >> http://lists.omniti.com/mailman/listinfo/omnios-discuss > _______________________________________________ > OmniOS-discuss mailing list > OmniOS-discuss at lists.omniti.com > http://lists.omniti.com/mailman/listinfo/omnios-discuss > > > > -- > > Theo Schlossnagle > > http://omniti.com/is/theo-schlossnagle > -- Tobi Oetiker, OETIKER+PARTNER AG, Aarweg 15 CH-4600 Olten, Switzerland www.oetiker.ch tobi at oetiker.ch +41 62 775 9902 From matej at zunaj.si Fri Mar 27 13:07:55 2015 From: matej at zunaj.si (Matej Zerovnik) Date: Fri, 27 Mar 2015 14:07:55 +0100 Subject: [OmniOS-discuss] iSCSI target hang, no way to restart but server reboot Message-ID: <5515562B.9090900@zunaj.si> Hello! We are having issues with iSCSI on work. Every now and then, iSCSI target just hangs up. We are unable to kill it, restart it or do anything else to restore the service. The only option to restore iscsi target back to working state, is to reboot the whole server and loose all the sessions (around 100 clients). Weird thing is, that only iscsi target hangs. 
I can ssh to server and work on it without any problem, there is no load or anything else, nothing in log files, just iscsi target locks up and all connections to iscsi target drop (probably timeout) Server is a IBM 3550 M4 with dual Xeon E5-2640 CPUs and 160GB of memory. Hard drives are mounted in a Supermicro JBOD, which is attached via SAS HBA LSI Logic SAS2308. We are using OmniOS v11 r151006. Anyone encounter similar troubles? Any recomendations what to do or how to solve that problem? Matej From danmcd at omniti.com Fri Mar 27 14:42:01 2015 From: danmcd at omniti.com (Dan McDonald) Date: Fri, 27 Mar 2015 10:42:01 -0400 Subject: [OmniOS-discuss] Potential KVM Virtio Performance Issues In-Reply-To: <4B77878C-A980-44C5-B5D8-1A34D9D3A52A@bissinc.com> References: <708F6249-5038-4DEE-A2A1-34D185B65CFC@omniti.com> <5511ED7A.3080006@gmx.li> <870038BE-8F1E-4411-A0BC-099D309E447D@omniti.com> <38C9C9CA-A75F-49F6-B210-C04CA81402E4@bissinc.com> <4B77878C-A980-44C5-B5D8-1A34D9D3A52A@bissinc.com> Message-ID: > On Mar 26, 2015, at 4:50 PM, John Barfield wrote: > > So I was still having issues with virtio performance. I?ve finally determined that its the child zone that is capping the throughput at 85mbps. We don't document KVM in a non-global (what you call "child") zone, but it is possible. I've put in for 014 the /dev/kvm entry in newly-created zones by default. This was missing from earlier OmniOS releases. I assume when configuring your non-global zones you did this yourself via zonecfg(1M)? Just checking to make sure I'm not missing anything. Thanks, Dan From danmcd at omniti.com Fri Mar 27 14:43:39 2015 From: danmcd at omniti.com (Dan McDonald) Date: Fri, 27 Mar 2015 10:43:39 -0400 Subject: [OmniOS-discuss] iSCSI target hang, no way to restart but server reboot In-Reply-To: <5515562B.9090900@zunaj.si> References: <5515562B.9090900@zunaj.si> Message-ID: > On Mar 27, 2015, at 9:07 AM, Matej Zerovnik wrote: > > Hello! > > We are having issues with iSCSI on work. Every now and then, iSCSI target just hangs up. We are unable to kill it, restart it or do anything else to restore the service. The only option to restore iscsi target back to working state, is to reboot the whole server and loose all the sessions (around 100 clients). > > Weird thing is, that only iscsi target hangs. I can ssh to server and work on it without any problem, there is no load or anything else, nothing in log files, just iscsi target locks up and all connections to iscsi target drop (probably timeout) > > Server is a IBM 3550 M4 with dual Xeon E5-2640 CPUs and 160GB of memory. > Hard drives are mounted in a Supermicro JBOD, which is attached via SAS HBA LSI Logic SAS2308. > > We are using OmniOS v11 r151006. > > Anyone encounter similar troubles? > Any recomendations what to do or how to solve that problem? I'd move to 012 or wait the short amount of time until 014 hits the streets. Then see if your problem persists. Dan From matej at zunaj.si Fri Mar 27 14:54:27 2015 From: matej at zunaj.si (Matej Zerovnik) Date: Fri, 27 Mar 2015 15:54:27 +0100 Subject: [OmniOS-discuss] iSCSI target hang, no way to restart but server reboot In-Reply-To: References: <5515562B.9090900@zunaj.si> Message-ID: <55156F23.2010300@zunaj.si> It just happened about 2 hours ago... The whole system did not crash, but 2 clients lost the connection. 
This is what I see in logs: Mar 27 13:55:51 storage.host.org scsi: [ID 107833 kern.notice] /pci at 0,0/pci8086,3c02 at 1/pci1000,3040 at 0 (mpt_sas0): Mar 27 13:55:51 storage.host.org Timeout of 0 seconds expired with 1 commands on target 68 lun 0. Mar 27 13:55:51 storage.host.org scsi: [ID 107833 kern.warning] WARNING: /pci at 0,0/pci8086,3c02 at 1/pci1000,3040 at 0 (mpt_sas0): Mar 27 13:55:51 storage.host.org Disconnected command timeout for target 68 w500304800039d83d, enclosure 3 Mar 27 13:55:52 storage.host.org scsi: [ID 365881 kern.info] /pci at 0,0/pci8086,3c02 at 1/pci1000,3040 at 0 (mpt_sas0): Mar 27 13:55:52 storage.host.org Log info 0x31140000 received for target 68 w500304800039d83d. Mar 27 13:55:52 storage.host.org scsi_status=0x0, ioc_status=0x8048, scsi_state=0xc Mar 27 15:08:31 storage.host.org iscsit: [ID 744151 kern.notice] NOTICE: login_sm_session_bind: add new conn/sess continue Mar 27 15:10:53 storage.host.org scsi: [ID 107833 kern.notice] /pci at 0,0/pci8086,3c02 at 1/pci1000,3040 at 0 (mpt_sas0): Mar 27 15:10:53 storage.host.org Timeout of 0 seconds expired with 1 commands on target 68 lun 0. Mar 27 15:10:53 storage.host.org scsi: [ID 107833 kern.warning] WARNING: /pci at 0,0/pci8086,3c02 at 1/pci1000,3040 at 0 (mpt_sas0): Mar 27 15:10:53 storage.host.org Disconnected command timeout for target 68 w500304800039d83d, enclosure 3 Mar 27 15:10:54 storage.host.org scsi: [ID 365881 kern.info] /pci at 0,0/pci8086,3c02 at 1/pci1000,3040 at 0 (mpt_sas0): Mar 27 15:10:54 storage.host.org Log info 0x31140000 received for target 68 w500304800039d83d. Mar 27 15:10:54 storage.host.org scsi_status=0x0, ioc_status=0x8048, scsi_state=0xc I read in the archives, that this errors happens when you have SATA drives on a SAS expander and one of the drives misbehaves: A command did not complete and the mpt driver reset the target. If that target is an expander, then everything behind the expander can reset, resulting in the aborts of any in-flight commands, as follows... iostat -Ei | grep Error reports that one device has 6 hard errors and 6 device not ready errors, but that is a local drive, attached to a different controller (LSI Megaraid). I wouldn't like to do a major upgrade, since this is a production machine. Too scary:) Matej On 27. 03. 2015 15:43, Dan McDonald wrote: >> On Mar 27, 2015, at 9:07 AM, Matej Zerovnik wrote: >> >> Hello! >> >> We are having issues with iSCSI on work. Every now and then, iSCSI target just hangs up. We are unable to kill it, restart it or do anything else to restore the service. The only option to restore iscsi target back to working state, is to reboot the whole server and loose all the sessions (around 100 clients). >> >> Weird thing is, that only iscsi target hangs. I can ssh to server and work on it without any problem, there is no load or anything else, nothing in log files, just iscsi target locks up and all connections to iscsi target drop (probably timeout) >> >> Server is a IBM 3550 M4 with dual Xeon E5-2640 CPUs and 160GB of memory. >> Hard drives are mounted in a Supermicro JBOD, which is attached via SAS HBA LSI Logic SAS2308. >> >> We are using OmniOS v11 r151006. >> >> Anyone encounter similar troubles? >> Any recomendations what to do or how to solve that problem? > I'd move to 012 or wait the short amount of time until 014 hits the streets. Then see if your problem persists. 
> > Dan > From danmcd at omniti.com Fri Mar 27 14:56:53 2015 From: danmcd at omniti.com (Dan McDonald) Date: Fri, 27 Mar 2015 10:56:53 -0400 Subject: [OmniOS-discuss] iSCSI target hang, no way to restart but server reboot In-Reply-To: <55156F23.2010300@zunaj.si> References: <5515562B.9090900@zunaj.si> <55156F23.2010300@zunaj.si> Message-ID: <17BBA672-FDF1-4F36-8011-79212B0EFD97@omniti.com> > On Mar 27, 2015, at 10:54 AM, Matej Zerovnik wrote: > > I read in the archives, that this errors happens when you have SATA drives on a SAS expander and one of the drives misbehaves: > A command did not complete and the mpt driver reset the target. > If that target is an expander, then everything behind the expander can > reset, resulting in the aborts of any in-flight commands, as follows... You read correctly. You should not have SATA drives on a SAS expander. You are setting yourself up for failure. > iostat -Ei | grep Error reports that one device has 6 hard errors and 6 device not ready errors, but that is a local drive, attached to a different controller (LSI Megaraid). LSI Megaraid, ESPECIALLY with 006, is not going to be as good as either mpt_sas, or a more modern build of OmniOS (I'm hoping to get one very good change in before I close 014's illumos synching). > I wouldn't like to do a major upgrade, since this is a production machine. Too scary:) You should plan for it, however. SATA drives on SAS expanders is a recipe for disaster, as you're seeing. Dan From matej at zunaj.si Fri Mar 27 15:03:19 2015 From: matej at zunaj.si (Matej Zerovnik) Date: Fri, 27 Mar 2015 16:03:19 +0100 Subject: [OmniOS-discuss] iSCSI target hang, no way to restart but server reboot In-Reply-To: <17BBA672-FDF1-4F36-8011-79212B0EFD97@omniti.com> References: <5515562B.9090900@zunaj.si> <55156F23.2010300@zunaj.si> <17BBA672-FDF1-4F36-8011-79212B0EFD97@omniti.com> Message-ID: <55157137.2010909@zunaj.si> On 27. 03. 2015 15:56, Dan McDonald wrote: >> iostat -Ei | grep Error reports that one device has 6 hard errors and 6 device not ready errors, but that is a local drive, attached to a different controller (LSI Megaraid). > LSI Megaraid, ESPECIALLY with 006, is not going to be as good as either mpt_sas, or a more modern build of OmniOS (I'm hoping to get one very good change in before I close 014's illumos synching). Only rpool is on megaraid, the storage is on LSI Logic SAS2308 HBA, which I think is using mpt_sas driver. What change do you plan on putting it? Does it concern mpt_sas driver? >> I wouldn't like to do a major upgrade, since this is a production machine. Too scary:) > You should plan for it, however. SATA drives on SAS expanders is a recipe for disaster, as you're seeing. Is there a better support for SATA drives in newer omnios? Matej From danmcd at omniti.com Fri Mar 27 15:04:57 2015 From: danmcd at omniti.com (Dan McDonald) Date: Fri, 27 Mar 2015 11:04:57 -0400 Subject: [OmniOS-discuss] iSCSI target hang, no way to restart but server reboot In-Reply-To: <55157137.2010909@zunaj.si> References: <5515562B.9090900@zunaj.si> <55156F23.2010300@zunaj.si> <17BBA672-FDF1-4F36-8011-79212B0EFD97@omniti.com> <55157137.2010909@zunaj.si> Message-ID: <9E6B85F8-7454-4DE2-87AF-28A5BAAF6A33@omniti.com> > On Mar 27, 2015, at 11:03 AM, Matej Zerovnik wrote: > > Only rpool is on megaraid, the storage is on LSI Logic SAS2308 HBA, which I think is using mpt_sas driver. What change do you plan on putting it? Does it concern mpt_sas driver? 
There are some mpt_sas improvements, but no amount of driver improvements can fix the failure modes caused by SATA drives in SAS expanders. You Just Can't Fix That. >>> I wouldn't like to do a major upgrade, since this is a production machine. Too scary:) >> You should plan for it, however. SATA drives on SAS expanders is a recipe for disaster, as you're seeing. > Is there a better support for SATA drives in newer omnios? Not when you're using them in situations that are operationally dangerous. Were you a paying customer, we would tell you we don't support SATA drives in SAS expanders. Sorry, Dan From narayan.desai at gmail.com Fri Mar 27 15:13:42 2015 From: narayan.desai at gmail.com (Narayan Desai) Date: Fri, 27 Mar 2015 10:13:42 -0500 Subject: [OmniOS-discuss] iSCSI target hang, no way to restart but server reboot In-Reply-To: <9E6B85F8-7454-4DE2-87AF-28A5BAAF6A33@omniti.com> References: <5515562B.9090900@zunaj.si> <55156F23.2010300@zunaj.si> <17BBA672-FDF1-4F36-8011-79212B0EFD97@omniti.com> <55157137.2010909@zunaj.si> <9E6B85F8-7454-4DE2-87AF-28A5BAAF6A33@omniti.com> Message-ID: Having been on the receiving end of similar advice, it is a frustrating situation to be in, since you have (and will likely continue to have) the hardware in production, without much option for replacement. When we had systems like this, we had a lot of success being aggressive in swapping out disks that were showing signs of going bad, even before critical failures occurred. Also looking at SMART statistics, and aggressively replacing those as well. This made the situation manageable. Basically, having sata drives in sas expanders means the system is brittle, and you should treat it as such. Look for: - errors in iostat -En - high service times in iostat -xnz - smartctl (this causes harmless sense messages when devices are probed, but it is easy enough to ignore these) - any errors reported out of lsiutil, showing either problems with cabling/enclosures, or devices - decode any sense errors reported by the lsi driver Aggressively replace devices implicated by these, and hope for the best. The best may or may not be what you're hoping for, but may be livable; it was for us. good luck -nld On Fri, Mar 27, 2015 at 10:04 AM, Dan McDonald wrote: > > > On Mar 27, 2015, at 11:03 AM, Matej Zerovnik wrote: > > > > Only rpool is on megaraid, the storage is on LSI Logic SAS2308 HBA, > which I think is using mpt_sas driver. What change do you plan on putting > it? Does it concern mpt_sas driver? > > There are some mpt_sas improvements, but no amount of driver improvements > can fix the failure modes caused by SATA drives in SAS expanders. You Just > Can't Fix That. > > >>> I wouldn't like to do a major upgrade, since this is a production > machine. Too scary:) > >> You should plan for it, however. SATA drives on SAS expanders is a > recipe for disaster, as you're seeing. > > Is there a better support for SATA drives in newer omnios? > > Not when you're using them in situations that are operationally dangerous. > > Were you a paying customer, we would tell you we don't support SATA drives > in SAS expanders. > > Sorry, > Dan > > _______________________________________________ > OmniOS-discuss mailing list > OmniOS-discuss at lists.omniti.com > http://lists.omniti.com/mailman/listinfo/omnios-discuss > -------------- next part -------------- An HTML attachment was scrubbed... 
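To make that kind of sweep repeatable, the checks listed above can be wrapped in a small script and run from cron. This is only a rough sketch of the idea -- the device glob, the smartctl "-d sat,12" option and the grep patterns are guesses that often work for SATA disks behind LSI HBAs on illumos, not something taken from this thread, so adjust them for your own boxes:

    #!/bin/sh
    # Quick disk-health sweep: driver error counters, service times, SMART.

    # 1. Per-device error counters (soft/hard/transport, "device not ready")
    iostat -En | egrep -i 'errors|not ready'

    # 2. Service times; devices with persistently high asvc_t are suspects
    iostat -xnz 10 3

    # 3. SMART overall-health verdict for every disk smartctl can reach
    for d in /dev/rdsk/c*t*d*s0; do
        echo "== $d"
        smartctl -H -d sat,12 "$d" 2>/dev/null | egrep -i 'health|result'
    done

    # lsiutil and sense-key decoding are worth adding too, but the exact
    # invocation depends on the HBA and firmware, so they are left out here.

Anything that shows up in more than one of those checks goes to the top of the replacement list.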
URL: From dave-oo at pooserville.com Sat Mar 28 03:51:17 2015 From: dave-oo at pooserville.com (Dave Pooser) Date: Fri, 27 Mar 2015 22:51:17 -0500 Subject: [OmniOS-discuss] iSCSI target hang, no way to restart but server reboot In-Reply-To: References: <5515562B.9090900@zunaj.si> <55156F23.2010300@zunaj.si> <17BBA672-FDF1-4F36-8011-79212B0EFD97@omniti.com> <55157137.2010909@zunaj.si> <9E6B85F8-7454-4DE2-87AF-28A5BAAF6A33@omniti.com> Message-ID: >Having been on the receiving end of similar advice, it is a frustrating >situation to be in, since you have (and will likely continue to have) the >hardware in production, without much option for replacement. >When we had systems like this, we had a lot of success being aggressive in >swapping out disks that were showing signs of going bad, even before >critical failures occurred. Also looking at SMART statistics, and >aggressively replacing those as well. >Aggressively replace devices implicated by these, and hope for the best. >The best may or may not be what you're hoping for, but may be livable; it >was for us. Also bear in mind it's entirely possible to mix SAS and SATA drives in the same enclosure and even the same vdev-- so as you're aggressively replacing SATA drives replace them with SAS drives and your system will become less brittle. Assuming you're using enterprise SATA drives, their SAS siblings are not much more expensive (often about $20 difference) and the reliability gains will be significant. -- Dave Pooser Cat-Herder-in-Chief, Pooserville.com From richard.elling at richardelling.com Sat Mar 28 14:39:33 2015 From: richard.elling at richardelling.com (Richard Elling) Date: Sat, 28 Mar 2015 07:39:33 -0700 Subject: [OmniOS-discuss] best or preferred 10g card for OmniOS In-Reply-To: <933dadf1-3c92-4fd7-b0b2-c2d899ff42c5.maildroid@localhost> References: <933dadf1-3c92-4fd7-b0b2-c2d899ff42c5.maildroid@localhost> Message-ID: > On Mar 26, 2015, at 9:24 AM, Doug Hughes wrote: > > any recommendations? We're having some pretty big problems with the Solarflare card and driver dropping network under high load. We eliminated LACP as a culprit, and the switch. > > Intel? Chelsio? other? I've been running exclusively Intel for several years now. It gets the most attention in the illumos community. -- richard > > - Doug > _______________________________________________ > OmniOS-discuss mailing list > OmniOS-discuss at lists.omniti.com > http://lists.omniti.com/mailman/listinfo/omnios-discuss From matthew.lagoe at subrigo.net Sun Mar 29 13:51:09 2015 From: matthew.lagoe at subrigo.net (Matthew Lagoe) Date: Sun, 29 Mar 2015 06:51:09 -0700 Subject: [OmniOS-discuss] best or preferred 10g card for OmniOS In-Reply-To: References: <933dadf1-3c92-4fd7-b0b2-c2d899ff42c5.maildroid@localhost> Message-ID: <005d01d06a27$6a2b0f20$3e812d60$@subrigo.net> The intel cards are nice but they don't have any cx4 cards so we don't use them. Copper connections have less latency on short links then fiber as you don't have the electric to optical conversion (when done properly) -----Original Message----- From: OmniOS-discuss [mailto:omnios-discuss-bounces at lists.omniti.com] On Behalf Of Richard Elling Sent: Saturday, March 28, 2015 07:40 AM To: Doug Hughes Cc: omnios-discuss Subject: Re: [OmniOS-discuss] best or preferred 10g card for OmniOS > On Mar 26, 2015, at 9:24 AM, Doug Hughes wrote: > > any recommendations? We're having some pretty big problems with the Solarflare card and driver dropping network under high load. We eliminated LACP as a culprit, and the switch. 
> > Intel? Chelsio? other? I've been running exclusively Intel for several years now. It gets the most attention in the illumos community. -- richard > > - Doug > _______________________________________________ > OmniOS-discuss mailing list > OmniOS-discuss at lists.omniti.com > http://lists.omniti.com/mailman/listinfo/omnios-discuss _______________________________________________ OmniOS-discuss mailing list OmniOS-discuss at lists.omniti.com http://lists.omniti.com/mailman/listinfo/omnios-discuss From chip at innovates.com Sun Mar 29 14:06:36 2015 From: chip at innovates.com (Schweiss, Chip) Date: Sun, 29 Mar 2015 09:06:36 -0500 Subject: [OmniOS-discuss] best or preferred 10g card for OmniOS In-Reply-To: <005d01d06a27$6a2b0f20$3e812d60$@subrigo.net> References: <933dadf1-3c92-4fd7-b0b2-c2d899ff42c5.maildroid@localhost> <005d01d06a27$6a2b0f20$3e812d60$@subrigo.net> Message-ID: On Sun, Mar 29, 2015 at 8:51 AM, Matthew Lagoe wrote: > The intel cards are nice but they don't have any cx4 cards so we don't use > them. Copper connections have less latency on short links then fiber as you > don't have the electric to optical conversion (when done properly) > On short links (< 20M) twin-ax copper SFP+ are much more economical and lower latency than optics. I would only use optics and fiber if I have long runs. -Chip > > -----Original Message----- > From: OmniOS-discuss [mailto:omnios-discuss-bounces at lists.omniti.com] On > Behalf Of Richard Elling > Sent: Saturday, March 28, 2015 07:40 AM > To: Doug Hughes > Cc: omnios-discuss > Subject: Re: [OmniOS-discuss] best or preferred 10g card for OmniOS > > > > On Mar 26, 2015, at 9:24 AM, Doug Hughes wrote: > > > > any recommendations? We're having some pretty big problems with the > Solarflare card and driver dropping network under high load. We eliminated > LACP as a culprit, and the switch. > > > > Intel? Chelsio? other? > > I've been running exclusively Intel for several years now. It gets the most > attention in the illumos community. > > -- richard > > > > > > - Doug > > _______________________________________________ > > OmniOS-discuss mailing list > > OmniOS-discuss at lists.omniti.com > > http://lists.omniti.com/mailman/listinfo/omnios-discuss > _______________________________________________ > OmniOS-discuss mailing list > OmniOS-discuss at lists.omniti.com > http://lists.omniti.com/mailman/listinfo/omnios-discuss > > > _______________________________________________ > OmniOS-discuss mailing list > OmniOS-discuss at lists.omniti.com > http://lists.omniti.com/mailman/listinfo/omnios-discuss > -------------- next part -------------- An HTML attachment was scrubbed... URL: From doug at will.to Sun Mar 29 16:04:03 2015 From: doug at will.to (Doug Hughes) Date: Sun, 29 Mar 2015 12:04:03 -0400 Subject: [OmniOS-discuss] best or preferred 10g card for OmniOS In-Reply-To: <005d01d06a27$6a2b0f20$3e812d60$@subrigo.net> References: <933dadf1-3c92-4fd7-b0b2-c2d899ff42c5.maildroid@localhost> <005d01d06a27$6a2b0f20$3e812d60$@subrigo.net> Message-ID: <55182273.4070509@will.to> On 3/29/2015 9:51 AM, Matthew Lagoe wrote: > The intel cards are nice but they don't have any cx4 cards so we don't use > them. Copper connections have less latency on short links then fiber as you > don't have the electric to optical conversion (when done properly) > Do yourself a huge favor and go to the SFP+ direct attach stuff. Longer lengths, thinner cables, better cables, etc. The switches are becoming really inexpensive. 
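Whichever card ends up in the box, it is worth sanity-checking what the link actually negotiated before blaming the card or the driver. A minimal check, assuming the port shows up as ixgbe0 (substitute whatever link name your driver exposes), might look like:

    # Link state, speed and duplex as the driver sees them
    dladm show-phys ixgbe0

    # MTU and flow-control settings -- make sure they match the switch port
    dladm show-linkprop -p mtu,flowctrl ixgbe0

    # Any error or drop counters ticking up under load?
    kstat -m ixgbe | egrep -i 'error|drop'

That takes a few seconds and rules out the boring explanations (wrong speed, mismatched MTU, flow control fighting the switch) before digging deeper.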
(I ordered pair of the Intel X520 DA2 cards from all of the recommendations here. I should have them tomorrow) From danmcd at omniti.com Sun Mar 29 18:51:45 2015 From: danmcd at omniti.com (Dan McDonald) Date: Sun, 29 Mar 2015 14:51:45 -0400 Subject: [OmniOS-discuss] Final bloody release for 151013 Message-ID: <44F6A94E-5E6D-4570-97C6-F4A89C2EFA64@omniti.com> I've pushed a whole wad out except for pkg(5), so expect a large update. This is the contents of r151014, except any fixes for what I find in testing prior to its release, and one or two policy changes for signed packages in 014 itself. - omnios-build master commit 1212faf - Fix to ipmitool (thanks Andy!) - Reduction in "pkg verify" noise (thanks Ben!) - Fix to "omnios-userland" for libpcap's update - Timezone database to 2015a - OpenSSL to 1.0.2a - sudo's Timezone (TZ environment variable) checking backported. - zsh to 5.0.7 - PCI.IDs now are pulled from illumos-omnios, instead of we having two copies - libffi to 3.2.1, with better build infrastructure. - illumos-omnios master branch commit 45f3064, merged will illumos-gate 4e90188 - SMBIOS up to 2.8 - Several mr_sas fixes for modern LSI MegaRAID boards, including better raw-disk support for boards that support it. - NFS lock manager now won't fail in startup when statd leaves entries behind (illumos #4518 - read its analysis, please) - Disassembly support for Intel BMI1, BMI2, AVX2, and FMA instructions. This is the last r151013 bloody release. The next time there is a bloody release, there will be new ISOs, new Kayaks, and a new revision: r151015. Thanks! Dan From matthew.lagoe at subrigo.net Mon Mar 30 09:36:52 2015 From: matthew.lagoe at subrigo.net (Matthew Lagoe) Date: Mon, 30 Mar 2015 02:36:52 -0700 Subject: [OmniOS-discuss] best or preferred 10g card for OmniOS In-Reply-To: <55182273.4070509@will.to> References: <933dadf1-3c92-4fd7-b0b2-c2d899ff42c5.maildroid@localhost> <005d01d06a27$6a2b0f20$3e812d60$@subrigo.net> <55182273.4070509@will.to> Message-ID: <008601d06acd$0e424b50$2ac6e1f0$@subrigo.net> Sure was a few years ago when we deployed all our stuff and back then the sfp+ stuff was ridiculous -----Original Message----- From: Doug Hughes [mailto:doug at will.to] Sent: Sunday, March 29, 2015 09:04 AM To: Matthew Lagoe Cc: 'omnios-discuss' Subject: Re: [OmniOS-discuss] best or preferred 10g card for OmniOS On 3/29/2015 9:51 AM, Matthew Lagoe wrote: > The intel cards are nice but they don't have any cx4 cards so we don't > use them. Copper connections have less latency on short links then > fiber as you don't have the electric to optical conversion (when done > properly) > Do yourself a huge favor and go to the SFP+ direct attach stuff. Longer lengths, thinner cables, better cables, etc. The switches are becoming really inexpensive. (I ordered pair of the Intel X520 DA2 cards from all of the recommendations here. 
I should have them tomorrow) From richard.elling at richardelling.com Mon Mar 30 20:10:42 2015 From: richard.elling at richardelling.com (Richard Elling) Date: Mon, 30 Mar 2015 13:10:42 -0700 Subject: [OmniOS-discuss] kernel panic "kernel heap corruption detected" when creating zero eager disks In-Reply-To: References: <847C1AB3-A505-4A73-B5E7-F3EA2BD41577@omniti.com> <82CC383C-71A1-425D-89E6-B7B725071183@omniti.com> <67c125eb-75c2-48ae-89fb-6a8df288d2d4@careyweb.com> <20150325195212.5a8cebe4@sleipner.datanom.net> <07B1E231-B571-46BC-913D-445DBCA54923@omniti.com> <7CBB4FDC-03CF-422F-889A-B867FFA68FB7@omniti.com> <294BC7DA-0A85-4662-B4B2-71C94755A30A@omniti.com> Message-ID: <7F1FFBA7-3F56-4A16-8189-CBB8B6F7EE79@RichardElling.com> On Mar 26, 2015, at 11:24 PM, wuffers wrote: >> >> So here's what I will attempt to test: >> - Create thin vmdk @ 10TB with vSphere fat client: PASS >> - Create lazy zeroed vmdk @ 10 TB with vSphere fat client: PASS >> - Create eager zeroed vmdk @ 10 TB with vSphere web client: PASS! (took 1 hour) >> - Create thin vmdk @ 10TB with vSphere web client: PASS >> - Create lazy zeroed vmdk @ 10 TB with vSphere web client: PASS > > Additionally, I tried: > - Create fixed vhdx @ 10TB with SCVMM (Hyper-V): PASS (most likely no primitives in use here - this took slightly over 3 hours) is compression enabled? -- richard > > Everything passed (which I didn't expect, especially the 10TB eager zero).. then I tried again on the vSphere web client for a 20TB eager zero disk, and I got another kernel panic altogether (no kmem_flags 0xf set, unfortunately). > > Mar 27 2015 01:09:33.664060000 2e2305c2-54b5-c1f4-aafd-fb1eccc982dd SUNOS-8000-KL > > TIME CLASS ENA > Mar 27 01:09:33.6307 ireport.os.sunos.panic.dump_available 0x0000000000000000 > Mar 27 01:08:30.6688 ireport.os.sunos.panic.dump_pending_on_device 0x0000000000000000 > > nvlist version: 0 > version = 0x0 > class = list.suspect > uuid = 2e2305c2-54b5-c1f4-aafd-fb1eccc982dd > code = SUNOS-8000-KL > diag-time = 1427432973 633746 > de = fmd:///module/software-diagnosis > fault-list-sz = 0x1 > fault-list = (array of embedded nvlists) > (start fault-list[0]) > nvlist version: 0 > version = 0x0 > class = defect.sunos.kernel.panic > certainty = 0x64 > asru = sw:///:path=/var/crash/unknown/.2e2305c2-54b5-c1f4-aafd-fb1eccc982dd > resource = sw:///:path=/var/crash/unknown/.2e2305c2-54b5-c1f4-aafd-fb1eccc982dd > savecore-succcess = 1 > dump-dir = /var/crash/unknown > dump-files = vmdump.2 > os-instance-uuid = 2e2305c2-54b5-c1f4-aafd-fb1eccc982dd > panicstr = BAD TRAP: type=d (#gp General protection) rp=ffffff01eb72ea70 addr=0 > panicstack = unix:real_mode_stop_cpu_stage2_end+9e23 () | unix:trap+a30 () | unix:cmntrap+e6 () | genunix:anon_decref+35 () | genunix:anon_free+74 () | genunix:segvn_free+242 () | genunix:seg_free+30 () | genunix:segvn_unmap+cde () | genunix:as_free+e7 () | genunix:relvm+220 () | genunix:proc_exit+454 () | genunix:exit+15 () | genunix:rexit+18 () | unix:brand_sys_sysenter+1c9 () | > crashtime = 1427431421 > panic-time = Fri Mar 27 00:43:41 2015 EDT > (end fault-list[0]) > > fault-status = 0x1 > severity = Major > __ttl = 0x1 > __tod = 0x5514e60d 0x2794c060 > > Crash file: > https://drive.google.com/file/d/0B7mCJnZUzJPKT0lpTW9GZFJCLTg/view?usp=sharing > > It appears I can do thin and lazy zero disks of those sizes, so I will have to be satisfied to use those options as a workaround (plus disabling WRITE_SAME from the hosts if I really wanted the eager zeroed disk) until some of that Nexenta COMSTAR love is 
upstreamed. For comparison sake, provisioning a 10TB fixed vhdx took approximately 3 hours in Hyper-V, while the same provisioning in VMware took about 1 hour. So we can say that WRITE_SAME accelerated the same job by 3x. > > _______________________________________________ > OmniOS-discuss mailing list > OmniOS-discuss at lists.omniti.com > http://lists.omniti.com/mailman/listinfo/omnios-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From moo at wuffers.net Mon Mar 30 20:16:53 2015 From: moo at wuffers.net (wuffers) Date: Mon, 30 Mar 2015 16:16:53 -0400 Subject: [OmniOS-discuss] kernel panic "kernel heap corruption detected" when creating zero eager disks In-Reply-To: <7F1FFBA7-3F56-4A16-8189-CBB8B6F7EE79@RichardElling.com> References: <847C1AB3-A505-4A73-B5E7-F3EA2BD41577@omniti.com> <82CC383C-71A1-425D-89E6-B7B725071183@omniti.com> <67c125eb-75c2-48ae-89fb-6a8df288d2d4@careyweb.com> <20150325195212.5a8cebe4@sleipner.datanom.net> <07B1E231-B571-46BC-913D-445DBCA54923@omniti.com> <7CBB4FDC-03CF-422F-889A-B867FFA68FB7@omniti.com> <294BC7DA-0A85-4662-B4B2-71C94755A30A@omniti.com> <7F1FFBA7-3F56-4A16-8189-CBB8B6F7EE79@RichardElling.com> Message-ID: <3E786626-25FC-48C8-9F9E-750BEEA9A7FA@wuffers.net> > On Mar 30, 2015, at 4:10 PM, Richard Elling wrote: > > > is compression enabled? > > > -- richard >> Yes, LZ4. Dedupe off. From richard.elling at richardelling.com Mon Mar 30 23:56:37 2015 From: richard.elling at richardelling.com (Richard Elling) Date: Mon, 30 Mar 2015 16:56:37 -0700 Subject: [OmniOS-discuss] kernel panic "kernel heap corruption detected" when creating zero eager disks In-Reply-To: <3E786626-25FC-48C8-9F9E-750BEEA9A7FA@wuffers.net> References: <847C1AB3-A505-4A73-B5E7-F3EA2BD41577@omniti.com> <82CC383C-71A1-425D-89E6-B7B725071183@omniti.com> <67c125eb-75c2-48ae-89fb-6a8df288d2d4@careyweb.com> <20150325195212.5a8cebe4@sleipner.datanom.net> <07B1E231-B571-46BC-913D-445DBCA54923@omniti.com> <7CBB4FDC-03CF-422F-889A-B867FFA68FB7@omniti.com> <294BC7DA-0A85-4662-B4B2-71C94755A30A@omniti.com> <7F1FFBA7-3F56-4A16-8189-CBB8B6F7EE79@RichardElling.com> <3E786626-25FC-48C8-9F9E-750BEEA9A7FA@wuffers.net> Message-ID: <798CF5FF-0260-4F7A-9115-3C37D23E5230@richardelling.com> > On Mar 30, 2015, at 1:16 PM, wuffers wrote: > > >> On Mar 30, 2015, at 4:10 PM, Richard Elling wrote: >> >> >> is compression enabled? >> >> >> -- richard >>> > > Yes, LZ4. Dedupe off. Ironically, WRITE_SAME is the perfect workload for dedup :-) -- richard From danmcd at omniti.com Tue Mar 31 02:04:33 2015 From: danmcd at omniti.com (Dan McDonald) Date: Mon, 30 Mar 2015 22:04:33 -0400 Subject: [OmniOS-discuss] A reminder about r151010 Message-ID: Once the release of r151014 hits the streets, the r151010 release becomes unsupported. Please migrate your 010 box to either 012 or 014. We DO support upgrades from 010 to 014. Modulo any odd packages that place constraints (and there are some in ms.omniti.com which do), 010 to 014 is a clean upgrade if you follow the (not yet published) r151014 upgrade instructions. Thank you! Dan From matej at zunaj.si Tue Mar 31 12:08:01 2015 From: matej at zunaj.si (Matej Zerovnik) Date: Tue, 31 Mar 2015 14:08:01 +0200 Subject: [OmniOS-discuss] iSCSI target hang, no way to restart but server reboot In-Reply-To: References: <5515562B.9090900@zunaj.si> <55156F23.2010300@zunaj.si> <17BBA672-FDF1-4F36-8011-79212B0EFD97@omniti.com> <55157137.2010909@zunaj.si> <9E6B85F8-7454-4DE2-87AF-28A5BAAF6A33@omniti.com> Message-ID: <551A8E21.401@zunaj.si> On 27. 03. 
2015 16:13, Narayan Desai wrote: > Having been on the receiving end of similar advice, it is a > frustrating situation to be in, since you have (and will likely > continue to have) the hardware in production, without much option for > replacement. > > When we had systems like this, we had a lot of success being > aggressive in swapping out disks that were showing signs of going bad, > even before critical failures occurred. Also looking at SMART > statistics, and aggressively replacing those as well. This made the > situation manageable. Basically, having sata drives in sas expanders > means the system is brittle, and you should treat it as such. Look for: > - errors in iostat -En > - high service times in iostat -xnz > - smartctl (this causes harmless sense messages when devices are > probed, but it is easy enough to ignore these) > - any errors reported out of lsiutil, showing either problems with > cabling/enclosures, or devices > - decode any sense errors reported by the lsi driver > > Aggressively replace devices implicated by these, and hope for the > best. The best may or may not be what you're hoping for, but may be > livable; it was for us. > When errors happened to you, were you able to use the pool itself and only iscsi target froze or did you have troubles with the pool itself as well... Because on our end, when iscsi target freezes, zpool is perfectly ok. We can access it and use it locally, but iscsi target is frozen and can't be restarted. I will check my sistem with iostat and smartctl, but we are using seagate drives, so some of the smartctl stats are useless on 1st sight:) Matej From narayan.desai at gmail.com Tue Mar 31 12:54:37 2015 From: narayan.desai at gmail.com (Narayan Desai) Date: Tue, 31 Mar 2015 05:54:37 -0700 Subject: [OmniOS-discuss] iSCSI target hang, no way to restart but server reboot In-Reply-To: <551A8E21.401@zunaj.si> References: <5515562B.9090900@zunaj.si> <55156F23.2010300@zunaj.si> <17BBA672-FDF1-4F36-8011-79212B0EFD97@omniti.com> <55157137.2010909@zunaj.si> <9E6B85F8-7454-4DE2-87AF-28A5BAAF6A33@omniti.com> <551A8E21.401@zunaj.si> Message-ID: We were primarily using the machines for serving iscsi to VMs, and we'd see bad cascading failures (iscsi lun timeouts would cause the watchdog to kick in on the linux hosts, resetting the initiator, meanwhile the VM would decide that the virtio devices in the VM were dead, requiring a client reboot). In some cases, the problems would happen across all luns, in others it would be just particular luns. I assume this followed the severity of the situation with the failing drive (or number of failing drives before got aggressive about replacement). Similarly, we'd see a range of behaviors with local pool commands, ranging from everything looking alright to zpool commands hanging or running *extremely* slowly. I'd hacked up some quick scripts to correlate info from the different sources. They are here: https://github.com/narayandesai/diy-lsi They may or may not be portable, but demonstrate all of the info gathering methods we found useful. Another thing that was useful was maintaining a pool inventory (stored somewhere else) with device addresses, serial numbers, and jbod bay mappings. Having to map that you when things are falling apart is seriously sad times. fwiw, you might still be ok with seagate drives; we were only using the self-check predictive failure flag, as opposed to anything more complicated. good luck -nld On Tue, Mar 31, 2015 at 5:08 AM, Matej Zerovnik wrote: > > On 27. 03. 
2015 16:13, Narayan Desai wrote: > >> Having been on the receiving end of similar advice, it is a frustrating >> situation to be in, since you have (and will likely continue to have) the >> hardware in production, without much option for replacement. >> >> When we had systems like this, we had a lot of success being aggressive >> in swapping out disks that were showing signs of going bad, even before >> critical failures occurred. Also looking at SMART statistics, and >> aggressively replacing those as well. This made the situation manageable. >> Basically, having sata drives in sas expanders means the system is brittle, >> and you should treat it as such. Look for: >> - errors in iostat -En >> - high service times in iostat -xnz >> - smartctl (this causes harmless sense messages when devices are probed, >> but it is easy enough to ignore these) >> - any errors reported out of lsiutil, showing either problems with >> cabling/enclosures, or devices >> - decode any sense errors reported by the lsi driver >> >> Aggressively replace devices implicated by these, and hope for the best. >> The best may or may not be what you're hoping for, but may be livable; it >> was for us. >> >> When errors happened to you, were you able to use the pool itself and > only iscsi target froze or did you have troubles with the pool itself as > well... > > Because on our end, when iscsi target freezes, zpool is perfectly ok. We > can access it and use it locally, but iscsi target is frozen and can't be > restarted. > > I will check my sistem with iostat and smartctl, but we are using seagate > drives, so some of the smartctl stats are useless on 1st sight:) > > Matej > -------------- next part -------------- An HTML attachment was scrubbed... URL: From pasztor at sagv5.gyakg.u-szeged.hu Mon Mar 23 21:19:53 2015 From: pasztor at sagv5.gyakg.u-szeged.hu (=?iso-8859-2?Q?P=C1SZTOR_Gy=F6rgy?=) Date: Mon, 23 Mar 2015 21:19:53 -0000 Subject: [OmniOS-discuss] A warning for upgraders with large numbers of BEs In-Reply-To: <8B8D8D0F-35B5-42F8-B1FE-CD8EA002EE06@omniti.com> References: <8B8D8D0F-35B5-42F8-B1FE-CD8EA002EE06@omniti.com> Message-ID: <20150323205308.GA21991@linux.gyakg.u-szeged.hu> Hi, "Dan McDonald" wrote at 2015-03-23 16:14: > Soon r151014 will be hitting the streets. WHEN THAT DOES, I have to warn people, especially those jumping from r151006 to r151014 about a known issue in grub. > > The illumos grub has serious memory management issues. It cannot cope with too many boot environment (BE) entries. Sorry for semi-offtopicing the thread, but: Will the lx brand be restored in the upcoming release? Is there a feature map / release plan / anything available? I tried to find information regarding this topic without success. I checked this url: http://omnios.omniti.com/roadmap.php But nothing relevant information was there. It seems outdated / unmaintained. I've just recently find this distro. I used openindiana since Oracle... -- Did what they did to opensolaris -- So, I'm new here, sorry for lame questions. Kind regards, Gy?rgy P?sztor