From wverb73 at gmail.com Mon Mar 2 05:03:37 2015
From: wverb73 at gmail.com (W Verb)
Date: Sun, 1 Mar 2015 21:03:37 -0800
Subject: [OmniOS-discuss] The ixgbe driver, Lindsay Lohan, and the Greek economy
In-Reply-To: <54EB5392.6030900@osn.de>
References: <20150221032559.727D07A0792@apps0.cs.toronto.edu> <54EAEFA8.4020101@osn.de> <54EB5392.6030900@osn.de>
Message-ID:

Hello all,

Well, I no longer blame the ixgbe driver for the problems I'm seeing.

I tried Joerg's updated driver, which didn't improve the issue. So I went back to the drawing board and rebuilt the server from scratch.

What I noted is that if I have only a single 1-gig physical interface active on the ESXi host, everything works as expected. As soon as I enable two interfaces, I start seeing the performance problems I've described.

Response pauses from the server that I see in TCPdumps are still leading me to believe the problem is delay on the server side, so I ran a series of kernel dtraces and produced some flamegraphs.

This was taken during a read operation with two active 10G interfaces on the server, with a single target being shared by two tpgs - one tpg for each 10G physical port. The host device has two 1G ports enabled, with VLANs separating the active ports into 10G/1G pairs. ESXi is set to multipath using both VLANs with a round-robin IO interval of 1.

https://drive.google.com/file/d/0BwyUMjibonYQd3ZYOGh4d2pteGs/view?usp=sharing

This was taken during a write operation:

https://drive.google.com/file/d/0BwyUMjibonYQMnBtU1Q2SXM2ams/view?usp=sharing

I then rebooted the server and disabled C-State, ACPI T-State, and general EIST (Turbo Boost) functionality in the CPU.

When I attempted to boot my guest VM, the iSCSI transfer gradually ground to a halt during the boot loading process, and the guest OS never did complete its boot process.

Here is a flamegraph taken while iSCSI is slowly dying:

https://drive.google.com/file/d/0BwyUMjibonYQM21JeFZPX3dZWTg/view?usp=sharing

I edited out cpu_idle_adaptive from the dtrace output and regenerated the slowdown graph:

https://drive.google.com/file/d/0BwyUMjibonYQbTVwV3NvXzlPS1E/view?usp=sharing

I then edited cpu_idle_adaptive out of the speedy write operation and regenerated that graph:

https://drive.google.com/file/d/0BwyUMjibonYQeWFYM0pCMDZ1X2s/view?usp=sharing

I have zero experience with interpreting flamegraphs, but the most significant difference I see between the slow read example and the fast write example is in unix`thread_start --> unix`idle. There's a good chunk of "unix`i86_mwait" in the read example that is not present in the write example at all.

Disabling the l2arc cache device didn't make a difference, and I had to reenable EIST support on the CPU to get my VMs to boot.

I am seeing a variety of bug reports going back to 2010 regarding excessive mwait operations, with the suggested solutions usually being to set "cpupm enable poll-mode" in power.conf. That change also had no effect on speed.

-Warren V
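P.S. For anyone who wants to produce similar graphs: a kernel flamegraph is typically generated with the dtrace profile provider plus Brendan Gregg's FlameGraph scripts (stackcollapse.pl and flamegraph.pl), something along these lines - the sampling rate, 60-second window, and file names here are arbitrary, not necessarily what was used for the graphs above:

  # sample on-CPU kernel stacks at 997 Hz for 60 seconds
  dtrace -x stackframes=100 -n 'profile-997 /arg0/ { @[stack()] = count(); } tick-60s { exit(0); }' -o kern.stacks

  # fold and render with the FlameGraph scripts (assumed to be in the current directory)
  ./stackcollapse.pl kern.stacks > kern.folded
  ./flamegraph.pl kern.folded > kern.svg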
-----Original Message-----
From: Chris Siebenmann [mailto:cks at cs.toronto.edu]
Sent: Monday, February 23, 2015 8:30 AM
To: W Verb
Cc: omnios-discuss at lists.omniti.com; cks at cs.toronto.edu
Subject: Re: [OmniOS-discuss] The ixgbe driver, Lindsay Lohan, and the Greek economy

> Chris, thanks for your specific details. I'd appreciate it if you
> could tell me which copper NIC you tried, as well as to pass on the
> iSCSI tuning parameters.

Our copper NIC experience is with onboard X540-AT2 ports on SuperMicro hardware (which have the guaranteed 10-20 msec lock hold) and dual-port 82599EB TN cards (which have some sort of driver/hardware failure under load that eventually leads to 2-second lock holds). I can't recommend either with the current driver; we had to revert to 1G networking in order to get stable servers.

The iSCSI parameter modifications we do, across both initiators and targets, are:

  initialr2t         no
  firstburstlength   128k
  maxrecvdataseglen  128k   [only on Linux backends]
  maxxmitdataseglen  128k   [only on Linux backends]

The OmniOS initiator doesn't need tuning for more than the first two parameters; on the Linux backends we tune up all four. My extended thoughts on these tuning parameters and why we touch them can be found here:

http://utcc.utoronto.ca/~cks/space/blog/tech/UnderstandingiSCSIProtocol
http://utcc.utoronto.ca/~cks/space/blog/tech/LikelyISCSITuning

The short version is that these parameters probably only make a small difference but their overall goal is to do 128KB ZFS reads and writes in single iSCSI operations (although they will be fragmented at the TCP layer) and to do iSCSI writes without a back-and-forth delay between initiator and target (that's 'initialr2t no').

I think basically everyone should use InitialR2T set to no and in fact that it should be the software default. These days only unusually limited iSCSI targets should need it to be otherwise and they can change their setting for it (initiator and target must both agree to it being 'yes', so either can veto it).

- cks

On Mon, Feb 23, 2015 at 8:21 AM, Joerg Goltermann wrote:
> Hi,
>
> I think your problem is caused by your link properties or your
> switch settings. In general the standard ixgbe seems to perform
> well.
>
> I had trouble after changing the default flow control settings to "bi"
> and this was my motivation to update the ixgbe driver a long time ago.
> After I have updated our systems to ixgbe 2.5.8 I never had any
> problems ....
>
> Make sure your switch has support for jumbo frames and you use
> the same mtu on all ports, otherwise the smallest will be used.
>
> What switch do you use? I can tell you nice horror stories about
> different vendors....
>
> - Joerg
>
> On 23.02.2015 10:31, W Verb wrote:
>
>> Thank you Joerg,
>>
>> I've downloaded the package and will try it tomorrow.
>>
>> The only thing I can add at this point is that upon review of my
>> testing, I may have performed my "pkg -u" between the initial quad-gig
>> performance test and installing the 10G NIC. So this may be a new
>> problem introduced in the latest updates.
>>
>> Those of you who are running 10G and have not upgraded to the latest
>> kernel, etc, might want to do some additional testing before running the
>> update.
>>
>> -Warren V
>>
>> On Mon, Feb 23, 2015 at 1:15 AM, Joerg Goltermann wrote:
>>
>> Hi,
>>
>> I remember there was a problem with the flow control settings in the ixgbe
>> driver, so I updated it a long time ago for our internal servers to 2.5.8.
>> Last weekend I integrated the latest changes from the FreeBSD driver to bring
>> the illumos ixgbe to 2.5.25 but I had no time to test it, so it's completely
>> untested!
>>
>> If you would like to give the latest driver a try you can fetch the
>> kernel modules from
>> https://cloud.osn.de/index.php/s/Fb4so9RsNnXA7r9
>>
>> Clone your boot environment, place the modules in the new environment
>> and update the boot-archive of the new BE.
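(For anyone following along, Joerg's "clone / place / update" step translates to roughly the following on OmniOS. The BE name and the assumption that the download contains a 64-bit ixgbe module are mine, not Joerg's - adjust the paths to whatever the archive actually holds:)

  beadm create ixgbe-2.5.25                    # clone the current boot environment
  beadm mount ixgbe-2.5.25 /mnt                # mount the clone
  cp ixgbe /mnt/kernel/drv/amd64/ixgbe         # drop in the new module (path assumed)
  bootadm update-archive -R /mnt               # rebuild the clone's boot archive
  beadm unmount ixgbe-2.5.25
  beadm activate ixgbe-2.5.25                  # boot into it on the next reboot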
>> >> - Joerg >> >> >> >> >> >> On 23.02.2015 02:54, W Verb wrote: >> >> By the way, to those of you who have working setups: please send >> me >> your pool/volume settings, interface linkprops, and any kernel >> tuning >> parameters you may have set. >> >> Thanks, >> Warren V >> >> On Sat, Feb 21, 2015 at 7:59 AM, Schweiss, Chip >> > wrote: >> >> I can't say I totally agree with your performance >> assessment. I run Intel >> X520 in all my OmniOS boxes. >> >> Here is a capture of nfssvrtop I made while running many >> storage vMotions >> between two OmniOS boxes hosting NFS datastores. This is a >> 10 host VMware >> cluster. Both OmniOS boxes are dual 10G connected with >> copper twin-ax to >> the in rack Nexus 5010. >> >> VMware does 100% sync writes, I use ZeusRAM SSDs for log >> devices. >> >> -Chip >> >> 2014 Apr 24 08:05:51, load: 12.64, read: 17330243 KB, >> swrite: 15985 KB, >> awrite: 1875455 KB >> >> Ver Client NFSOPS Reads SWrites AWrites >> Commits Rd_bw >> SWr_bw AWr_bw Rd_t SWr_t AWr_t Com_t Align% >> >> 4 10.28.17.105 0 0 0 0 >> 0 0 >> 0 0 0 0 0 0 0 >> >> 4 10.28.17.215 0 0 0 0 >> 0 0 >> 0 0 0 0 0 0 0 >> >> 4 10.28.17.213 0 0 0 0 >> 0 0 >> 0 0 0 0 0 0 0 >> >> 4 10.28.16.151 0 0 0 0 >> 0 0 >> 0 0 0 0 0 0 0 >> >> 4 all 1 0 0 0 >> 0 0 >> 0 0 0 0 0 0 0 >> >> 3 10.28.16.175 3 0 3 0 >> 0 1 >> 11 0 4806 48 0 0 85 >> >> 3 10.28.16.183 6 0 6 0 >> 0 3 >> 162 0 549 124 0 0 73 >> >> 3 10.28.16.180 11 0 10 0 >> 0 3 >> 27 0 776 89 0 0 67 >> >> 3 10.28.16.176 28 2 26 0 >> 0 10 >> 405 0 2572 198 0 0 100 >> >> 3 10.28.16.178 4606 4602 4 0 >> 0 294534 >> 3 0 723 49 0 0 99 >> >> 3 10.28.16.179 4905 4879 26 0 >> 0 312208 >> 311 0 735 271 0 0 99 >> >> 3 10.28.16.181 5515 5502 13 0 >> 0 352107 >> 77 0 89 87 0 0 99 >> >> 3 10.28.16.184 12095 12059 10 0 >> 0 763014 >> 39 0 249 147 0 0 99 >> >> 3 10.28.58.1 15401 6040 116 6354 >> 53 191605 >> 474 202346 192 96 144 83 99 >> >> 3 all 42574 33086 217 >> 6354 53 1913488 >> 1582 202300 348 138 153 105 99 >> >> >> >> >> >> On Fri, Feb 20, 2015 at 11:46 PM, W Verb > > wrote: >> >> >> Hello All, >> >> Thank you for your replies. >> I tried a few things, and found the following: >> >> 1: Disabling hyperthreading support in the BIOS drops >> performance overall >> by a factor of 4. >> 2: Disabling VT support also seems to have some effect, >> although it >> appears to be minor. But this has the amusing side >> effect of fixing the >> hangs I've been experiencing with fast reboot. Probably >> by disabling kvm. >> 3: The performance tests are a bit tricky to quantify >> because of caching >> effects. In fact, I'm not entirely sure what is >> happening here. It's just >> best to describe what I'm seeing: >> >> The commands I'm using to test are >> dd if=/dev/zero of=./test.dd bs=2M count=5000 >> dd of=/dev/null if=./test.dd bs=2M count=5000 >> The host vm is running Centos 6.6, and has the latest >> vmtools installed. >> There is a host cache on an SSD local to the host that >> is also in place. >> Disabling the host cache didn't immediately have an >> effect as far as I could >> see. >> >> The host MTU set to 3000 on all iSCSI interfaces for all >> tests. >> >> Test 1: Right after reboot, with an ixgbe MTU of 9000, >> the write test >> yields an average speed over three tests of 137MB/s. The >> read test yields an >> average over three tests of 5MB/s. >> >> Test 2: After setting "ifconfig ixgbe0 mtu 3000", the >> write tests yield >> 140MB/s, and the read tests yield 53MB/s. 
It's important >> to note here that >> if I cut the read test short at only 2-3GB, I get >> results upwards of >> 350MB/s, which I assume is local cache-related distortion. >> >> Test 3: MTU of 1500. Read tests are up to 156 MB/s. >> Write tests yield >> about 142MB/s. >> Test 4: MTU of 1000: Read test at 182MB/s. >> Test 5: MTU of 900: Read test at 130 MB/s. >> Test 6: MTU of 1000: Read test at 160MB/s. Write tests >> are now >> consistently at about 300MB/s. >> Test 7: MTU of 1200: Read test at 124MB/s. >> Test 8: MTU of 1000: Read test at 161MB/s. Write at >> 261MB/s. >> >> A few final notes: >> L1ARC grabs about 10GB of RAM during the tests, so >> there's definitely some >> read caching going on. >> The write operations are easier to observe with iostat, >> and I'm seeing io >> rates that closely correlate with the network write >> speeds. >> >> >> Chris, thanks for your specific details. I'd appreciate >> it if you could >> tell me which copper NIC you tried, as well as to pass >> on the iSCSI tuning >> parameters. >> >> I've ordered an Intel EXPX9502AFXSR, which uses the >> 82598 chip instead of >> the 82599 in the X520. If I get similar results with my >> fiber transcievers, >> I'll see if I can get a hold of copper ones. >> >> But I should mention that I did indeed look at PHY/MAC >> error rates, and >> they are nil. >> >> -Warren V >> >> On Fri, Feb 20, 2015 at 7:25 PM, Chris Siebenmann >> > >> >> wrote: >> >> >> After installation and configuration, I observed >> all kinds of bad >> behavior >> in the network traffic between the hosts and the >> server. All of this >> bad >> behavior is traced to the ixgbe driver on the >> storage server. Without >> going >> into the full troubleshooting process, here are >> my takeaways: >> >> [...] >> >> For what it's worth, we managed to achieve much >> better line rates on >> copper 10G ixgbe hardware of various descriptions >> between OmniOS >> and CentOS 7 (I don't think we ever tested OmniOS to >> OmniOS). I don't >> believe OmniOS could do TCP at full line rate but I >> think we managed 700+ >> Mbytes/sec on both transmit and receive and we got >> basically disk-limited >> speeds with iSCSI (across multiple disks on >> multi-disk mirrored pools, >> OmniOS iSCSI initiator, Linux iSCSI targets). >> >> I don't believe we did any specific kernel tuning >> (and in fact some of >> our attempts to fiddle ixgbe driver parameters blew >> up in our face). >> We did tune iSCSI connection parameters to increase >> various buffer >> sizes so that ZFS could do even large single >> operations in single iSCSI >> transactions. (More details available if people are >> interested.) >> >> 10: At the wire level, the speed problems are >> clearly due to pauses in >> response time by omnios. At 9000 byte frame >> sizes, I see a good number >> of duplicate ACKs and fast retransmits during >> read operations (when >> omnios is transmitting). But below about a >> 4100-byte MTU on omnios >> (which seems to correlate to 4096-byte iSCSI >> block transfers), the >> transmission errors fade away and we only see >> the transmission pause >> problem. >> >> >> This is what really attracted my attention. In >> our OmniOS setup, our >> specific Intel hardware had ixgbe driver issues that >> could cause >> activity stalls during once-a-second link heartbeat >> checks. This >> obviously had an effect at the TCP and iSCSI layers. 
>> My initial message to illumos-developer sparked a potentially
>> interesting discussion:
>>
>> http://www.listbox.com/member/archive/182179/2014/10/sort/time_rev/page/16/entry/6:405/20141003125035:6357079A-4B1D-11E4-A39C-D534381BA44D/
>>
>> If you think this is a possibility in your setup, I've put the DTrace
>> script I used to hunt for this up on the web:
>>
>> http://www.cs.toronto.edu/~cks/src/omnios-ixgbe/ixgbe_delay.d
>>
>> This isn't the only potential source of driver stalls by any means, it's
>> just the one I found. You may also want to look at lockstat in general,
>> as information it reported is what led us to look specifically at the
>> ixgbe code here.
>>
>> (If you suspect kernel/driver issues, lockstat combined with kernel
>> source is a really excellent resource.)
>>
>> - cks

From garrett at damore.org Mon Mar 2 05:11:16 2015
From: garrett at damore.org (Garrett D'Amore)
Date: Sun, 1 Mar 2015 21:11:16 -0800
Subject: [OmniOS-discuss] [developer] Re: The ixgbe driver, Lindsay Lohan, and the Greek economy
In-Reply-To:
References: <20150221032559.727D07A0792@apps0.cs.toronto.edu> <54EAEFA8.4020101@osn.de> <54EB5392.6030900@osn.de>
Message-ID: <279C084B-E76B-4F38-A7CF-D6CB37D06D8F@damore.org>

I'm not sure I've followed properly. You have *two* interfaces. You are not trying to provision these in an aggr are you? As far as I'm aware, VMware does not support 802.3ad link aggregations. (It's possible that you can make it work with ESXi if you give the entire NIC to the guest - but I'm skeptical.) The problem is that if you try to use link aggregation, some packets (up to half!) will be lost. TCP and other protocols fare poorly in this situation.

It's possible I've totally misunderstood what you're trying to do, in which case I apologize.

The idle thing is a red herring - the CPU is waiting for work to do, probably because packets haven't arrived (or were dropped by the hypervisor!). I wouldn't read too much into that except that your network stack is in trouble. I'd look a bit more closely at the kstats for tcp - I suspect you'll see retransmits or out-of-order values that are unusually high - if so, this may help validate my theory above.

- Garrett
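(On OmniOS those counters live in the mib2 tcp kstats; a quick way to check them while a slow read is running is something like the following - statistic names as they appear on a stock illumos install:)

  # one-shot summary of retransmit / out-of-order counters
  netstat -s -P tcp | egrep 'RetransSegs|InDupAck|UnorderSegs'

  # or watch the raw kstats tick every 5 seconds during a transfer
  kstat -p tcp::tcp:tcpRetransSegs tcp::tcp:tcpInDataUnorderSegs 5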
> On Mar 1, 2015, at 9:03 PM, W Verb via illumos-developer wrote:
>
> [...]
From mark0x01 at gmail.com Mon Mar 2 08:12:12 2015
From: mark0x01 at gmail.com (Mark)
Date: Mon, 02 Mar 2015 21:12:12 +1300
Subject: [OmniOS-discuss] [developer] Re: The ixgbe driver, Lindsay Lohan, and the Greek economy
In-Reply-To: <279C084B-E76B-4F38-A7CF-D6CB37D06D8F@damore.org>
References: <20150221032559.727D07A0792@apps0.cs.toronto.edu> <54EAEFA8.4020101@osn.de> <54EB5392.6030900@osn.de> <279C084B-E76B-4F38-A7CF-D6CB37D06D8F@damore.org>
Message-ID: <54F41B5C.8070108@gmail.com>

LACP does work - I have used it on HP ProCurve, but the settings are fussy and usually different from what EtherChannel uses.
(http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=1004048)

Did you try changing the virtual switch settings?
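(If it helps, the ESXi-side settings can be checked from the CLI with something like the following - the vSwitch and vmhba names below are placeholders, yours will differ:)

  # current load-balancing / failover policy on the iSCSI vSwitch
  esxcli network vswitch standard policy failover get -v vSwitch1

  # confirm each iSCSI vmkernel port is bound to the software initiator
  esxcli iscsi networkportal list -A vmhba33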
On 2/03/2015 6:11 p.m., Garrett D'Amore wrote:
> [...]
From wverb73 at gmail.com Mon Mar 2 08:22:26 2015
From: wverb73 at gmail.com (W Verb)
Date: Mon, 2 Mar 2015 00:22:26 -0800
Subject: [OmniOS-discuss] [developer] Re: The ixgbe driver, Lindsay Lohan, and the Greek economy
In-Reply-To: <279C084B-E76B-4F38-A7CF-D6CB37D06D8F@damore.org>
References: <20150221032559.727D07A0792@apps0.cs.toronto.edu> <54EAEFA8.4020101@osn.de> <54EB5392.6030900@osn.de> <279C084B-E76B-4F38-A7CF-D6CB37D06D8F@damore.org>
Message-ID:

Hello Garrett,

No, no 802.3ad going on in this config. Here is a basic schematic:

https://drive.google.com/file/d/0BwyUMjibonYQVkVqcE5OQUJyUUU/view?usp=sharing

Here is the Nexenta MPIO iSCSI Setup Document that I used as a guide:

https://drive.google.com/file/d/0BwyUMjibonYQbjEyUTBjN2tTNWM/view?usp=sharing

Note that I am using an MTU of 3000 on both the 10G and 1G NICs. The switch is set to allow 9148-byte frames, and I'm not seeing any errors/buffer overruns on the switch.

Here is a screenshot of a packet capture from a read operation on the guest OS (from its local drive, which is actually a VMDK file on the storage server). In this example, only a single 1G ESXi kernel interface (vmk1) is bound to the software iSCSI initiator.
https://drive.google.com/file/d/0BwyUMjibonYQa2NYdXhpZkpkbU0/view?usp=sharing Note that there's a nice, well-behaved window sizing process taking place. The ESXi decreases the scaled window by 11 or 12 for each ACK, then bumps it back up to 512. Here is a similar screenshot of a single-interface write operation: https://drive.google.com/file/d/0BwyUMjibonYQbU1RZHRnakxDSFU/view?usp=sharing There are no pauses or gaps in the transmission rate in the single-interface transfers. In the next screenshots, I have enabled an additional 1G interface on the ESXi host, and bound it to the iSCSI initiator. The new interface is bound to a separate physical port, uses a different VLAN on the switch, and talks to a different 10G port on the storage server. First, let's look at a write operation on the guest OS, which happily pumps data at near-line-rate to the storage server. Here is a sequence number trace diagram. Note how the transfer has a nice, smooth increment rate over the entire transfer. https://drive.google.com/file/d/0BwyUMjibonYQWHNIa0drWnNxMmM/view?usp=sharing Here are screenshots from packet captures on both 1G interfaces: https://drive.google.com/file/d/0BwyUMjibonYQRWhyVVQ4djNaU3c/view?usp=sharing https://drive.google.com/file/d/0BwyUMjibonYQaTVjTEtTRloyR2c/view?usp=sharing Note how we again see nice, smooth window adjustment, and no gaps in transmission. But now, let's look at the problematic two-interface Read operation. First, the sequence graph: https://drive.google.com/file/d/0BwyUMjibonYQTzdFVWdQMWZ6LUU/view?usp=sharing As you can see, there are gaps and jumps in the transmission throughout the transfer. It is very illustrative to look at captures of the gaps, which are occurring on both interfaces: https://drive.google.com/file/d/0BwyUMjibonYQc0VISXN6eVFwQzg/view?usp=sharing https://drive.google.com/file/d/0BwyUMjibonYQVFREUHp3TGFiUU0/view?usp=sharing As you can see, there are ~.4 second pauses in transmission from the storage server, which kills the transfer rate. It's clear that the ESXi box ACKs the prior iSCSI operation to completion, then makes a new LUN request, which the storage server immediately replies to. The ESXi ACKs the response packet from the storage server, then waits...and waits....and waits... until eventually the storage server starts transmitting again. Because the pause happens while the ESXi client is waiting for a packet from the storage server, that tells me that the gaps are not an artifact of traffic being switched between both active interfaces, but are actually indicative of short hangs occurring on the server. Having a pause or two in transmission is no big deal, but in my case, it is happening constantly, and dropping my overall read transfer rate down to 20-60MB/s, which is slower than the single interface transfer rate (~90-100MB/s). Decreasing the MTU makes the pauses shorter, increasing them makes the pauses longer. Another interesting thing is that if I set the multipath io interval to 3 operations instead of 1, I get better throughput. In other words, the less frequently I swap IP addresses on my iSCSI requests from the ESXi unit, the fewer pauses I see. Basically, COMSTAR seems to choke each time an iSCSI request from a new IP arrives. Because the single interface transfer is near line rate, that tells me that the storage system (mpt_sas, zfs, etc) is working fine. It's only when multiple paths are attempted that iSCSI falls on its face during reads. 
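(For reference, the IO interval I mention is ESXi's round-robin IOPS setting on the path selection policy; it can be inspected and changed per device along these lines - the naa ID below is a placeholder for the actual LUN:)

  # list devices and their current path selection policy configuration
  esxcli storage nmp device list

  # change from 1 I/O per path to 3 I/Os per path before switching paths
  esxcli storage nmp psp roundrobin deviceconfig set -d naa.600144f0xxxxxxxx --type=iops --iops=3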
All of these captures were taken without a cache device being attached to the storage zpool, so this isn't looking like some kind of ZFS ARC problem. As mentioned previously, local transfers to/from the zpool are showing ~300-500 MB/s rates over long transfers (10G+).

-Warren V
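(For anyone who wants to reproduce the local-throughput check, the same dd commands run directly on the server while watching the pool will do it; "tank" below is a stand-in for the real pool name:)

  # large sequential write straight to the pool...
  dd if=/dev/zero of=/tank/test.dd bs=2M count=5000

  # ...while watching per-vdev throughput from another shell
  zpool iostat -v tank 5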
> > Here is a flamegraph taken while iSCSI is slowly dying: > > > https://drive.google.com/file/d/0BwyUMjibonYQM21JeFZPX3dZWTg/view?usp=sharing > > > I edited out cpu_idle_adaptive from the dtrace output and regenerated the > slowdown graph: > > > https://drive.google.com/file/d/0BwyUMjibonYQbTVwV3NvXzlPS1E/view?usp=sharing > > > I then edited cpu_idle_adaptive out of the speedy write operation and > regenerated that graph: > > > https://drive.google.com/file/d/0BwyUMjibonYQeWFYM0pCMDZ1X2s/view?usp=sharing > > > I have zero experience with interpreting flamegraphs, but the most > significant difference I see between the slow read example and the fast > write example is in unix`thread_start --> unix`idle. There's a good chunk > of "unix`i86_mwait" in the read example that is not present in the write > example at all. > > Disabling the l2arc cache device didn't make a difference, and I had to > reenable EIST support on the CPU to get my VMs to boot. > > I am seeing a variety of bug reports going back to 2010 regarding > excessive mwait operations, with the suggested solutions usually being to > set "cpupm enable poll-mode" in power.conf. That change also had no effect > on speed. > > -Warren V > > > > > -----Original Message----- > > From: Chris Siebenmann [mailto:cks at cs.toronto.edu ] > > Sent: Monday, February 23, 2015 8:30 AM > > To: W Verb > > Cc: omnios-discuss at lists.omniti.com; cks at cs.toronto.edu > > Subject: Re: [OmniOS-discuss] The ixgbe driver, Lindsay Lohan, and the > Greek economy > > > > Chris, thanks for your specific details. I'd appreciate it if you > > > could tell me which copper NIC you tried, as well as to pass on the > > > iSCSI tuning parameters. > > > Our copper NIC experience is with onboard X540-AT2 ports on SuperMicro > hardware (which have the guaranteed 10-20 msec lock hold) and dual-port > 82599EB TN cards (which have some sort of driver/hardware failure under > load that eventually leads to 2-second lock holds). I can't recommend > either with the current driver; we had to revert to 1G networking in order > to get stable servers. > > > The iSCSI parameter modifications we do, across both initiators and > targets, are: > > > initialr2t no > > firstburstlength 128k > > maxrecvdataseglen 128k [only on Linux backends] > > maxxmitdataseglen 128k [only on Linux backends] > > > The OmniOS initiator doesn't need tuning for more than the first two > parameters; on the Linux backends we tune up all four. My extended thoughts > on these tuning parameters and why we touch them can be found > > here: > > > http://utcc.utoronto.ca/~cks/space/blog/tech/UnderstandingiSCSIProtocol > > http://utcc.utoronto.ca/~cks/space/blog/tech/LikelyISCSITuning > > > The short version is that these parameters probably only make a small > difference but their overall goal is to do 128KB ZFS reads and writes in > single iSCSI operations (although they will be fragmented at the TCP > > layer) and to do iSCSI writes without a back-and-forth delay between > initiator and target (that's 'initialr2t no'). > > > I think basically everyone should use InitialR2T set to no and in fact > that it should be the software default. These days only unusually limited > iSCSI targets should need it to be otherwise and they can change their > setting for it (initiator and target must both agree to it being 'yes', so > either can veto it). 
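Returning to the flamegraphs linked earlier in this message: they were presumably produced with the usual DTrace profile-provider recipe. A minimal sketch, assuming Brendan Gregg's stackcollapse.pl and flamegraph.pl scripts are on hand:

  # sample on-CPU kernel stacks at 997 Hz for 30 seconds
  dtrace -x stackframes=100 -n 'profile-997 /arg0/ { @[stack()] = count(); } tick-30s { exit(0); }' -o kernel.stacks

  # fold the stacks and render the SVG
  stackcollapse.pl kernel.stacks > kernel.folded
  flamegraph.pl kernel.folded > kernel.svg

  # the "edited out cpu_idle_adaptive" variant is just a filter on the folded file
  grep -v cpu_idle_adaptive kernel.folded | flamegraph.pl > kernel-noidle.svg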
> > > - cks > > > On Mon, Feb 23, 2015 at 8:21 AM, Joerg Goltermann wrote: > >> Hi, >> >> I think your problem is caused by your link properties or your >> switch settings. In general the standard ixgbe seems to perform >> well. >> >> I had trouble after changing the default flow control settings to "bi" >> and this was my motivation to update the ixgbe driver a long time ago. >> After I have updated our systems to ixgbe 2.5.8 I never had any >> problems .... >> >> Make sure your switch has support for jumbo frames and you use >> the same mtu on all ports, otherwise the smallest will be used. >> >> What switch do you use? I can tell you nice horror stories about >> different vendors.... >> >> - Joerg >> >> On 23.02.2015 10:31, W Verb wrote: >> >>> Thank you Joerg, >>> >>> I've downloaded the package and will try it tomorrow. >>> >>> The only thing I can add at this point is that upon review of my >>> testing, I may have performed my "pkg -u" between the initial quad-gig >>> performance test and installing the 10G NIC. So this may be a new >>> problem introduced in the latest updates. >>> >>> Those of you who are running 10G and have not upgraded to the latest >>> kernel, etc, might want to do some additional testing before running the >>> update. >>> >>> -Warren V >>> >>> On Mon, Feb 23, 2015 at 1:15 AM, Joerg Goltermann >> > wrote: >>> >>> Hi, >>> >>> I remember there was a problem with the flow control settings in the >>> ixgbe >>> driver, so I updated it a long time ago for our internal servers to >>> 2.5.8. >>> Last weekend I integrated the latest changes from the FreeBSD driver >>> to bring >>> the illumos ixgbe to 2.5.25 but I had no time to test it, so it's >>> completely >>> untested! >>> >>> >>> If you would like to give the latest driver a try you can fetch the >>> kernel modules from >>> https://cloud.osn.de/index.__php/s/Fb4so9RsNnXA7r9 >>> >>> >>> Clone your boot environment, place the modules in the new environment >>> and update the boot-archive of the new BE. >>> >>> - Joerg >>> >>> >>> >>> >>> >>> On 23.02.2015 02:54, W Verb wrote: >>> >>> By the way, to those of you who have working setups: please send >>> me >>> your pool/volume settings, interface linkprops, and any kernel >>> tuning >>> parameters you may have set. >>> >>> Thanks, >>> Warren V >>> >>> On Sat, Feb 21, 2015 at 7:59 AM, Schweiss, Chip >>> > wrote: >>> >>> I can't say I totally agree with your performance >>> assessment. I run Intel >>> X520 in all my OmniOS boxes. >>> >>> Here is a capture of nfssvrtop I made while running many >>> storage vMotions >>> between two OmniOS boxes hosting NFS datastores. This is a >>> 10 host VMware >>> cluster. Both OmniOS boxes are dual 10G connected with >>> copper twin-ax to >>> the in rack Nexus 5010. >>> >>> VMware does 100% sync writes, I use ZeusRAM SSDs for log >>> devices. 
>>> >>> -Chip >>> >>> 2014 Apr 24 08:05:51, load: 12.64, read: 17330243 KB, >>> swrite: 15985 KB, >>> awrite: 1875455 KB >>> >>> Ver Client NFSOPS Reads SWrites AWrites >>> Commits Rd_bw >>> SWr_bw AWr_bw Rd_t SWr_t AWr_t Com_t Align% >>> >>> 4 10.28.17.105 0 0 0 0 >>> 0 0 >>> 0 0 0 0 0 0 0 >>> >>> 4 10.28.17.215 0 0 0 0 >>> 0 0 >>> 0 0 0 0 0 0 0 >>> >>> 4 10.28.17.213 0 0 0 0 >>> 0 0 >>> 0 0 0 0 0 0 0 >>> >>> 4 10.28.16.151 0 0 0 0 >>> 0 0 >>> 0 0 0 0 0 0 0 >>> >>> 4 all 1 0 0 0 >>> 0 0 >>> 0 0 0 0 0 0 0 >>> >>> 3 10.28.16.175 3 0 3 0 >>> 0 1 >>> 11 0 4806 48 0 0 85 >>> >>> 3 10.28.16.183 6 0 6 0 >>> 0 3 >>> 162 0 549 124 0 0 73 >>> >>> 3 10.28.16.180 11 0 10 0 >>> 0 3 >>> 27 0 776 89 0 0 67 >>> >>> 3 10.28.16.176 28 2 26 0 >>> 0 10 >>> 405 0 2572 198 0 0 100 >>> >>> 3 10.28.16.178 4606 4602 4 0 >>> 0 294534 >>> 3 0 723 49 0 0 99 >>> >>> 3 10.28.16.179 4905 4879 26 0 >>> 0 312208 >>> 311 0 735 271 0 0 99 >>> >>> 3 10.28.16.181 5515 5502 13 0 >>> 0 352107 >>> 77 0 89 87 0 0 99 >>> >>> 3 10.28.16.184 12095 12059 10 0 >>> 0 763014 >>> 39 0 249 147 0 0 99 >>> >>> 3 10.28.58.1 15401 6040 116 6354 >>> 53 191605 >>> 474 202346 192 96 144 83 99 >>> >>> 3 all 42574 33086 217 >>> 6354 53 1913488 >>> 1582 202300 348 138 153 105 99 >>> >>> >>> >>> >>> >>> On Fri, Feb 20, 2015 at 11:46 PM, W Verb >> > wrote: >>> >>> >>> Hello All, >>> >>> Thank you for your replies. >>> I tried a few things, and found the following: >>> >>> 1: Disabling hyperthreading support in the BIOS drops >>> performance overall >>> by a factor of 4. >>> 2: Disabling VT support also seems to have some effect, >>> although it >>> appears to be minor. But this has the amusing side >>> effect of fixing the >>> hangs I've been experiencing with fast reboot. Probably >>> by disabling kvm. >>> 3: The performance tests are a bit tricky to quantify >>> because of caching >>> effects. In fact, I'm not entirely sure what is >>> happening here. It's just >>> best to describe what I'm seeing: >>> >>> The commands I'm using to test are >>> dd if=/dev/zero of=./test.dd bs=2M count=5000 >>> dd of=/dev/null if=./test.dd bs=2M count=5000 >>> The host vm is running Centos 6.6, and has the latest >>> vmtools installed. >>> There is a host cache on an SSD local to the host that >>> is also in place. >>> Disabling the host cache didn't immediately have an >>> effect as far as I could >>> see. >>> >>> The host MTU set to 3000 on all iSCSI interfaces for all >>> tests. >>> >>> Test 1: Right after reboot, with an ixgbe MTU of 9000, >>> the write test >>> yields an average speed over three tests of 137MB/s. The >>> read test yields an >>> average over three tests of 5MB/s. >>> >>> Test 2: After setting "ifconfig ixgbe0 mtu 3000", the >>> write tests yield >>> 140MB/s, and the read tests yield 53MB/s. It's important >>> to note here that >>> if I cut the read test short at only 2-3GB, I get >>> results upwards of >>> 350MB/s, which I assume is local cache-related >>> distortion. >>> >>> Test 3: MTU of 1500. Read tests are up to 156 MB/s. >>> Write tests yield >>> about 142MB/s. >>> Test 4: MTU of 1000: Read test at 182MB/s. >>> Test 5: MTU of 900: Read test at 130 MB/s. >>> Test 6: MTU of 1000: Read test at 160MB/s. Write tests >>> are now >>> consistently at about 300MB/s. >>> Test 7: MTU of 1200: Read test at 124MB/s. >>> Test 8: MTU of 1000: Read test at 161MB/s. Write at >>> 261MB/s. >>> >>> A few final notes: >>> L1ARC grabs about 10GB of RAM during the tests, so >>> there's definitely some >>> read caching going on. 
>>> The write operations are easier to observe with iostat, >>> and I'm seeing io >>> rates that closely correlate with the network write >>> speeds. >>> >>> >>> Chris, thanks for your specific details. I'd appreciate >>> it if you could >>> tell me which copper NIC you tried, as well as to pass >>> on the iSCSI tuning >>> parameters. >>> >>> I've ordered an Intel EXPX9502AFXSR, which uses the >>> 82598 chip instead of >>> the 82599 in the X520. If I get similar results with my >>> fiber transcievers, >>> I'll see if I can get a hold of copper ones. >>> >>> But I should mention that I did indeed look at PHY/MAC >>> error rates, and >>> they are nil. >>> >>> -Warren V >>> >>> On Fri, Feb 20, 2015 at 7:25 PM, Chris Siebenmann >>> > >>> >>> wrote: >>> >>> >>> After installation and configuration, I observed >>> all kinds of bad >>> behavior >>> in the network traffic between the hosts and the >>> server. All of this >>> bad >>> behavior is traced to the ixgbe driver on the >>> storage server. Without >>> going >>> into the full troubleshooting process, here are >>> my takeaways: >>> >>> [...] >>> >>> For what it's worth, we managed to achieve much >>> better line rates on >>> copper 10G ixgbe hardware of various descriptions >>> between OmniOS >>> and CentOS 7 (I don't think we ever tested OmniOS to >>> OmniOS). I don't >>> believe OmniOS could do TCP at full line rate but I >>> think we managed 700+ >>> Mbytes/sec on both transmit and receive and we got >>> basically disk-limited >>> speeds with iSCSI (across multiple disks on >>> multi-disk mirrored pools, >>> OmniOS iSCSI initiator, Linux iSCSI targets). >>> >>> I don't believe we did any specific kernel tuning >>> (and in fact some of >>> our attempts to fiddle ixgbe driver parameters blew >>> up in our face). >>> We did tune iSCSI connection parameters to increase >>> various buffer >>> sizes so that ZFS could do even large single >>> operations in single iSCSI >>> transactions. (More details available if people are >>> interested.) >>> >>> 10: At the wire level, the speed problems are >>> clearly due to pauses in >>> response time by omnios. At 9000 byte frame >>> sizes, I see a good number >>> of duplicate ACKs and fast retransmits during >>> read operations (when >>> omnios is transmitting). But below about a >>> 4100-byte MTU on omnios >>> (which seems to correlate to 4096-byte iSCSI >>> block transfers), the >>> transmission errors fade away and we only see >>> the transmission pause >>> problem. >>> >>> >>> This is what really attracted my attention. In >>> our OmniOS setup, our >>> specific Intel hardware had ixgbe driver issues that >>> could cause >>> activity stalls during once-a-second link heartbeat >>> checks. This >>> obviously had an effect at the TCP and iSCSI layers. >>> My initial message >>> to illumos-developer sparked a potentially >>> interesting discussion: >>> >>> >>> http://www.listbox.com/member/ >>> __archive/182179/2014/10/sort/__time_rev/page/16/entry/6: >>> 405/__20141003125035:6357079A-4B1D-__11E4-A39C-D534381BA44D/ >>> >> member/archive/182179/2014/10/sort/time_rev/page/16/entry/6: >>> 405/20141003125035:6357079A-4B1D-11E4-A39C-D534381BA44D/> >>> >>> If you think this is a possibility in your setup, >>> I've put the DTrace >>> script I used to hunt for this up on the web: >>> >>> http://www.cs.toronto.edu/~__ >>> cks/src/omnios-ixgbe/ixgbe___delay.d >>> >> cks/src/omnios-ixgbe/ixgbe_delay.d> >>> >>> This isn't the only potential source of driver >>> stalls by any means, it's >>> just the one I found. 
You may also want to look at >>> lockstat in general, >>> as information it reported is what led us to look >>> specifically at the >>> ixgbe code here. >>> >>> (If you suspect kernel/driver issues, lockstat >>> combined with kernel >>> source is a really excellent resource.) >>> >>> - cks >>> >>> >>> >>> >>> _________________________________________________ >>> OmniOS-discuss mailing list >>> OmniOS-discuss at lists.omniti.__com >>> >>> http://lists.omniti.com/__mailman/listinfo/omnios-__ >>> discuss >>> >> > >>> >>> >>> _________________________________________________ >>> OmniOS-discuss mailing list >>> OmniOS-discuss at lists.omniti.__com >>> >>> http://lists.omniti.com/__mailman/listinfo/omnios-__discuss >>> >>> >>> >>> -- >>> OSN Online Service Nuernberg GmbH, Bucher Str. 78, 90408 Nuernberg >>> Tel: +49 911 39905-0 - Fax: +49 911 >>> 39905-55 - http://www.osn.de >>> HRB 15022 Nuernberg, USt-Id: DE189301263, GF: Joerg Goltermann >>> >>> >>> >> -- >> OSN Online Service Nuernberg GmbH, Bucher Str. 78, 90408 Nuernberg >> Tel: +49 911 39905-0 - Fax: +49 911 39905-55 - http://www.osn.de >> HRB 15022 Nuernberg, USt-Id: DE189301263, GF: Joerg Goltermann >> > > *illumos-developer* | Archives > > | > Modify > > Your Subscription > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From jg at osn.de Mon Mar 2 11:14:10 2015 From: jg at osn.de (Joerg Goltermann) Date: Mon, 02 Mar 2015 12:14:10 +0100 Subject: [OmniOS-discuss] [developer] Re: The ixgbe driver, Lindsay Lohan, and the Greek economy In-Reply-To: References: <20150221032559.727D07A0792@apps0.cs.toronto.edu> <54EAEFA8.4020101@osn.de> <54EB5392.6030900@osn.de> <279C084B-E76B-4F38-A7CF-D6CB37D06D8F@damore.org> Message-ID: <54F44602.5030705@osn.de> Hi, I would try *one* TPG which includes both interface addresses and I would double check for packet drops on the Catalyst. The 3560 supports only receive flow control which means, that a sending 10Gbit port can easily overload a 1Gbit port. Do you have flow control enabled? - Joerg On 02.03.2015 09:22, W Verb via illumos-developer wrote: > Hello Garrett, > > No, no 802.3ad going on in this config. > > Here is a basic schematic: > > https://drive.google.com/file/d/0BwyUMjibonYQVkVqcE5OQUJyUUU/view?usp=sharing > > Here is the Nexenta MPIO iSCSI Setup Document that I used as a guide: > > https://drive.google.com/file/d/0BwyUMjibonYQbjEyUTBjN2tTNWM/view?usp=sharing > > Note that I am using an MTU of 3000 on both the 10G and 1G NICs. The > switch is set to allow 9148-byte frames, and I'm not seeing any > errors/buffer overruns on the switch. > > Here is a screenshot of a packet capture from a read operation on the > guest OS (from it's local drive, which is actually a VMDK file on the > storage server). In this example, only a single 1G ESXi kernel interface > (vmk1) is bound to the software iSCSI initiator. > > https://drive.google.com/file/d/0BwyUMjibonYQa2NYdXhpZkpkbU0/view?usp=sharing > > Note that there's a nice, well-behaved window sizing process taking > place. The ESXi decreases the scaled window by 11 or 12 for each ACK, > then bumps it back up to 512. > > Here is a similar screenshot of a single-interface write operation: > > https://drive.google.com/file/d/0BwyUMjibonYQbU1RZHRnakxDSFU/view?usp=sharing > > There are no pauses or gaps in the transmission rate in the > single-interface transfers. > > > In the next screenshots, I have enabled an additional 1G interface on > the ESXi host, and bound it to the iSCSI initiator. 
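Joerg's two suggestions above translate into roughly the following checks on the OmniOS box (a sketch; the TPG name, addresses and IQN are placeholders, and the itadm syntax should be confirmed against itadm(1M)):

  # collapse to a single target portal group carrying both 10G addresses
  itadm create-tpg tpg-both 10.10.1.10 10.10.2.10
  itadm modify-target -t tpg-both iqn.2010-09.org.example:target0
  itadm list-target -v

  # check what flow control the ixgbe links actually negotiated
  dladm show-linkprop -p flowctrl

On the Catalyst itself, the show flowcontrol output and the output-drop counters on the two 1G ports are the numbers that would show whether the 10G-to-1G speed mismatch is being paused or silently dropped.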
The new interface is > bound to a separate physical port, uses a different VLAN on the switch, > and talks to a different 10G port on the storage server. > > First, let's look at a write operation on the guest OS, which happily > pumps data at near-line-rate to the storage server. > > Here is a sequence number trace diagram. Note how the transfer has a > nice, smooth increment rate over the entire transfer. > > https://drive.google.com/file/d/0BwyUMjibonYQWHNIa0drWnNxMmM/view?usp=sharing > > Here are screenshots from packet captures on both 1G interfaces: > > https://drive.google.com/file/d/0BwyUMjibonYQRWhyVVQ4djNaU3c/view?usp=sharing > https://drive.google.com/file/d/0BwyUMjibonYQaTVjTEtTRloyR2c/view?usp=sharing > > Note how we again see nice, smooth window adjustment, and no gaps in > transmission. > > > But now, let's look at the problematic two-interface Read operation. > First, the sequence graph: > > https://drive.google.com/file/d/0BwyUMjibonYQTzdFVWdQMWZ6LUU/view?usp=sharing > > As you can see, there are gaps and jumps in the transmission throughout > the transfer. > It is very illustrative to look at captures of the gaps, which are > occurring on both interfaces: > > https://drive.google.com/file/d/0BwyUMjibonYQc0VISXN6eVFwQzg/view?usp=sharing > https://drive.google.com/file/d/0BwyUMjibonYQVFREUHp3TGFiUU0/view?usp=sharing > > As you can see, there are ~.4 second pauses in transmission from the > storage server, which kills the transfer rate. > It's clear that the ESXi box ACKs the prior iSCSI operation to > completion, then makes a new LUN request, which the storage server > immediately replies to. The ESXi ACKs the response packet from the > storage server, then waits...and waits....and waits... until eventually > the storage server starts transmitting again. > > Because the pause happens while the ESXi client is waiting for a packet > from the storage server, that tells me that the gaps are not an artifact > of traffic being switched between both active interfaces, but are > actually indicative of short hangs occurring on the server. > > Having a pause or two in transmission is no big deal, but in my case, it > is happening constantly, and dropping my overall read transfer rate down > to 20-60MB/s, which is slower than the single interface transfer rate > (~90-100MB/s). > > Decreasing the MTU makes the pauses shorter, increasing them makes the > pauses longer. > > Another interesting thing is that if I set the multipath io interval to > 3 operations instead of 1, I get better throughput. In other words, the > less frequently I swap IP addresses on my iSCSI requests from the ESXi > unit, the fewer pauses I see. > > Basically, COMSTAR seems to choke each time an iSCSI request from a new > IP arrives. > > Because the single interface transfer is near line rate, that tells me > that the storage system (mpt_sas, zfs, etc) is working fine. It's only > when multiple paths are attempted that iSCSI falls on its face during reads. > > All of these captures were taken without a cache device being attached > to the storage zpool, so this isn't looking like some kind of ZFS ARC > problem. As mentioned previously, local transfers to/from the zpool are > showing ~300-500 MB/s rates over long transfers (10G+). > > -Warren V > > On Sun, Mar 1, 2015 at 9:11 PM, Garrett D'Amore > wrote: > > I?m not sure I?ve followed properly. You have *two* interfaces. > You are not trying to provision these in an aggr are you? As far as > I?m aware, VMware does not support 802.3ad link aggregations. 
(Its > possible that you can make it work with ESXi if you give the entire > NIC to the guest ? but I?m skeptical.) The problem is that if you > try to use link aggregation, some packets (up to half!) will be > lost. TCP and other protocols fare poorly in this situation. > > Its possible I?ve totally misunderstood what you?re trying to do, in > which case I apologize. > > The idle thing is a red-herring ? the cpu is waiting for work to do, > probably because packets haven?t arrived (or where dropped by the > hypervisor!) I wouldn?t read too much into that except that your > network stack is in trouble. I?d look a bit more closely at the > kstats for tcp ? I suspect you?ll see retransmits or out of order > values that are unusually high ? if so this may help validate my > theory above. > > - Garrett > >> On Mar 1, 2015, at 9:03 PM, W Verb via illumos-developer >> > >> wrote: >> >> Hello all, >> >> >> Well, I no longer blame the ixgbe driver for the problems I'm seeing. >> >> >> I tried Joerg's updated driver, which didn't improve the issue. So >> I went back to the drawing board and rebuilt the server from scratch. >> >> What I noted is that if I have only a single 1-gig physical >> interface active on the ESXi host, everything works as expected. >> As soon as I enable two interfaces, I start seeing the performance >> problems I've described. >> >> Response pauses from the server that I see in TCPdumps are still >> leading me to believe the problem is delay on the server side, so >> I ran a series of kernel dtraces and produced some flamegraphs. >> >> >> This was taken during a read operation with two active 10G >> interfaces on the server, with a single target being shared by two >> tpgs- one tpg for each 10G physical port. The host device has two >> 1G ports enabled, with VLANs separating the active ports into >> 10G/1G pairs. ESXi is set to multipath using both VLANS with a >> round-robin IO interval of 1. >> >> https://drive.google.com/file/d/0BwyUMjibonYQd3ZYOGh4d2pteGs/view?usp=sharing >> >> >> This was taken during a write operation: >> >> https://drive.google.com/file/d/0BwyUMjibonYQMnBtU1Q2SXM2ams/view?usp=sharing >> >> >> I then rebooted the server and disabled C-State, ACPI T-State, and >> general EIST (Turbo boost) functionality in the CPU. >> >> I when I attempted to boot my guest VM, the iSCSI transfer >> gradually ground to a halt during the boot loading process, and >> the guest OS never did complete its boot process. >> >> Here is a flamegraph taken while iSCSI is slowly dying: >> >> https://drive.google.com/file/d/0BwyUMjibonYQM21JeFZPX3dZWTg/view?usp=sharing >> >> >> I edited out cpu_idle_adaptive from the dtrace output and >> regenerated the slowdown graph: >> >> https://drive.google.com/file/d/0BwyUMjibonYQbTVwV3NvXzlPS1E/view?usp=sharing >> >> >> I then edited cpu_idle_adaptive out of the speedy write operation >> and regenerated that graph: >> >> https://drive.google.com/file/d/0BwyUMjibonYQeWFYM0pCMDZ1X2s/view?usp=sharing >> >> >> I have zero experience with interpreting flamegraphs, but the most >> significant difference I see between the slow read example and the >> fast write example is in unix`thread_start --> unix`idle. There's >> a good chunk of "unix`i86_mwait" in the read example that is not >> present in the write example at all. >> >> Disabling the l2arc cache device didn't make a difference, and I >> had to reenable EIST support on the CPU to get my VMs to boot. 
>> >> I am seeing a variety of bug reports going back to 2010 regarding >> excessive mwait operations, with the suggested solutions usually >> being to set "cpupm enable poll-mode" in power.conf. That change >> also had no effect on speed. >> >> -Warren V >> >> >> >> >> -----Original Message----- >> >> From: Chris Siebenmann [mailto:cks at cs.toronto.edu] >> >> Sent: Monday, February 23, 2015 8:30 AM >> >> To: W Verb >> >> Cc: omnios-discuss at lists.omniti.com >> ; cks at cs.toronto.edu >> >> >> Subject: Re: [OmniOS-discuss] The ixgbe driver, Lindsay Lohan, and >> the Greek economy >> >> >> > Chris, thanks for your specific details. I'd appreciate it if you >> >> > could tell me which copper NIC you tried, as well as to pass on the >> >> > iSCSI tuning parameters. >> >> >> Our copper NIC experience is with onboard X540-AT2 ports on >> SuperMicro hardware (which have the guaranteed 10-20 msec lock >> hold) and dual-port 82599EB TN cards (which have some sort of >> driver/hardware failure under load that eventually leads to >> 2-second lock holds). I can't recommend either with the current >> driver; we had to revert to 1G networking in order to get stable >> servers. >> >> >> The iSCSI parameter modifications we do, across both initiators >> and targets, are: >> >> >> initialr2tno >> >> firstburstlength128k >> >> maxrecvdataseglen128k[only on Linux backends] >> >> maxxmitdataseglen128k[only on Linux backends] >> >> >> The OmniOS initiator doesn't need tuning for more than the first >> two parameters; on the Linux backends we tune up all four. My >> extended thoughts on these tuning parameters and why we touch them >> can be found >> >> here: >> >> >> http://utcc.utoronto.ca/~cks/space/blog/tech/UnderstandingiSCSIProtocol >> >> http://utcc.utoronto.ca/~cks/space/blog/tech/LikelyISCSITuning >> >> >> The short version is that these parameters probably only make a >> small difference but their overall goal is to do 128KB ZFS reads >> and writes in single iSCSI operations (although they will be >> fragmented at the TCP >> >> layer) and to do iSCSI writes without a back-and-forth delay >> between initiator and target (that's 'initialr2t no'). >> >> >> I think basically everyone should use InitialR2T set to no and in >> fact that it should be the software default. These days only >> unusually limited iSCSI targets should need it to be otherwise and >> they can change their setting for it (initiator and target must >> both agree to it being 'yes', so either can veto it). >> >> >> - cks >> >> >> >> On Mon, Feb 23, 2015 at 8:21 AM, Joerg Goltermann > > wrote: >> >> Hi, >> >> I think your problem is caused by your link properties or your >> switch settings. In general the standard ixgbe seems to perform >> well. >> >> I had trouble after changing the default flow control settings >> to "bi" >> and this was my motivation to update the ixgbe driver a long >> time ago. >> After I have updated our systems to ixgbe 2.5.8 I never had any >> problems .... >> >> Make sure your switch has support for jumbo frames and you use >> the same mtu on all ports, otherwise the smallest will be used. >> >> What switch do you use? I can tell you nice horror stories about >> different vendors.... >> >> - Joerg >> >> On 23.02.2015 10:31, W Verb wrote: >> >> Thank you Joerg, >> >> I've downloaded the package and will try it tomorrow. 
>> >> The only thing I can add at this point is that upon review >> of my >> testing, I may have performed my "pkg -u" between the >> initial quad-gig >> performance test and installing the 10G NIC. So this may >> be a new >> problem introduced in the latest updates. >> >> Those of you who are running 10G and have not upgraded to >> the latest >> kernel, etc, might want to do some additional testing >> before running the >> update. >> >> -Warren V >> >> On Mon, Feb 23, 2015 at 1:15 AM, Joerg Goltermann >> >> >> wrote: >> >> Hi, >> >> I remember there was a problem with the flow control >> settings in the >> ixgbe >> driver, so I updated it a long time ago for our >> internal servers to >> 2.5.8. >> Last weekend I integrated the latest changes from the >> FreeBSD driver >> to bring >> the illumos ixgbe to 2.5.25 but I had no time to test >> it, so it's >> completely >> untested! >> >> >> If you would like to give the latest driver a try you >> can fetch the >> kernel modules from >> https://cloud.osn.de/index.____php/s/Fb4so9RsNnXA7r9 >> >> > > >> >> Clone your boot environment, place the modules in the >> new environment >> and update the boot-archive of the new BE. >> >> - Joerg >> >> >> >> >> >> On 23.02.2015 02:54, W Verb wrote: >> >> By the way, to those of you who have working >> setups: please send me >> your pool/volume settings, interface linkprops, >> and any kernel >> tuning >> parameters you may have set. >> >> Thanks, >> Warren V >> >> On Sat, Feb 21, 2015 at 7:59 AM, Schweiss, Chip >> >> >> >> wrote: >> >> I can't say I totally agree with your performance >> assessment. I run Intel >> X520 in all my OmniOS boxes. >> >> Here is a capture of nfssvrtop I made while >> running many >> storage vMotions >> between two OmniOS boxes hosting NFS >> datastores. This is a >> 10 host VMware >> cluster. Both OmniOS boxes are dual 10G >> connected with >> copper twin-ax to >> the in rack Nexus 5010. >> >> VMware does 100% sync writes, I use ZeusRAM >> SSDs for log >> devices. >> >> -Chip >> >> 2014 Apr 24 08:05:51, load: 12.64, read: >> 17330243 KB, >> swrite: 15985 KB, >> awrite: 1875455 KB >> >> Ver Client NFSOPS Reads >> SWrites AWrites >> Commits Rd_bw >> SWr_bw AWr_bw Rd_t SWr_t AWr_t >> Com_t Align% >> >> 4 10.28.17.105 0 0 >> 0 0 >> 0 0 >> 0 0 0 0 0 0 0 >> >> 4 10.28.17.215 0 0 >> 0 0 >> 0 0 >> 0 0 0 0 0 0 0 >> >> 4 10.28.17.213 0 0 >> 0 0 >> 0 0 >> 0 0 0 0 0 0 0 >> >> 4 10.28.16.151 0 0 >> 0 0 >> 0 0 >> 0 0 0 0 0 0 0 >> >> 4 all 1 0 >> 0 0 >> 0 0 >> 0 0 0 0 0 0 0 >> >> 3 10.28.16.175 3 0 >> 3 0 >> 0 1 >> 11 0 4806 48 0 0 85 >> >> 3 10.28.16.183 6 0 >> 6 0 >> 0 3 >> 162 0 549 124 0 0 >> 73 >> >> 3 10.28.16.180 11 0 >> 10 0 >> 0 3 >> 27 0 776 89 0 0 67 >> >> 3 10.28.16.176 28 2 >> 26 0 >> 0 10 >> 405 0 2572 198 0 0 >> 100 >> >> 3 10.28.16.178 4606 4602 >> 4 0 >> 0 294534 >> 3 0 723 49 0 0 99 >> >> 3 10.28.16.179 4905 4879 >> 26 0 >> 0 312208 >> 311 0 735 271 0 0 >> 99 >> >> 3 10.28.16.181 5515 5502 >> 13 0 >> 0 352107 >> 77 0 89 87 0 0 99 >> >> 3 10.28.16.184 12095 12059 >> 10 0 >> 0 763014 >> 39 0 249 147 0 0 99 >> >> 3 10.28.58.1 15401 6040 >> 116 6354 >> 53 191605 >> 474 202346 192 96 144 83 >> 99 >> >> 3 all 42574 33086 >> 217 >> 6354 53 1913488 >> 1582 202300 348 138 153 105 >> 99 >> >> >> >> >> >> On Fri, Feb 20, 2015 at 11:46 PM, W Verb >> >> > >> wrote: >> >> >> Hello All, >> >> Thank you for your replies. >> I tried a few things, and found the following: >> >> 1: Disabling hyperthreading support in the >> BIOS drops >> performance overall >> by a factor of 4. 
>> 2: Disabling VT support also seems to have >> some effect, >> although it >> appears to be minor. But this has the >> amusing side >> effect of fixing the >> hangs I've been experiencing with fast >> reboot. Probably >> by disabling kvm. >> 3: The performance tests are a bit tricky >> to quantify >> because of caching >> effects. In fact, I'm not entirely sure >> what is >> happening here. It's just >> best to describe what I'm seeing: >> >> The commands I'm using to test are >> dd if=/dev/zero of=./test.dd bs=2M count=5000 >> dd of=/dev/null if=./test.dd bs=2M count=5000 >> The host vm is running Centos 6.6, and has >> the latest >> vmtools installed. >> There is a host cache on an SSD local to >> the host that >> is also in place. >> Disabling the host cache didn't >> immediately have an >> effect as far as I could >> see. >> >> The host MTU set to 3000 on all iSCSI >> interfaces for all >> tests. >> >> Test 1: Right after reboot, with an ixgbe >> MTU of 9000, >> the write test >> yields an average speed over three tests >> of 137MB/s. The >> read test yields an >> average over three tests of 5MB/s. >> >> Test 2: After setting "ifconfig ixgbe0 mtu >> 3000", the >> write tests yield >> 140MB/s, and the read tests yield 53MB/s. >> It's important >> to note here that >> if I cut the read test short at only >> 2-3GB, I get >> results upwards of >> 350MB/s, which I assume is local >> cache-related distortion. >> >> Test 3: MTU of 1500. Read tests are up to >> 156 MB/s. >> Write tests yield >> about 142MB/s. >> Test 4: MTU of 1000: Read test at 182MB/s. >> Test 5: MTU of 900: Read test at 130 MB/s. >> Test 6: MTU of 1000: Read test at 160MB/s. >> Write tests >> are now >> consistently at about 300MB/s. >> Test 7: MTU of 1200: Read test at 124MB/s. >> Test 8: MTU of 1000: Read test at 161MB/s. >> Write at 261MB/s. >> >> A few final notes: >> L1ARC grabs about 10GB of RAM during the >> tests, so >> there's definitely some >> read caching going on. >> The write operations are easier to observe >> with iostat, >> and I'm seeing io >> rates that closely correlate with the >> network write speeds. >> >> >> Chris, thanks for your specific details. >> I'd appreciate >> it if you could >> tell me which copper NIC you tried, as >> well as to pass >> on the iSCSI tuning >> parameters. >> >> I've ordered an Intel EXPX9502AFXSR, which >> uses the >> 82598 chip instead of >> the 82599 in the X520. If I get similar >> results with my >> fiber transcievers, >> I'll see if I can get a hold of copper ones. >> >> But I should mention that I did indeed >> look at PHY/MAC >> error rates, and >> they are nil. >> >> -Warren V >> >> On Fri, Feb 20, 2015 at 7:25 PM, Chris >> Siebenmann >> > > >> >> >> wrote: >> >> >> After installation and >> configuration, I observed >> all kinds of bad >> behavior >> in the network traffic between the >> hosts and the >> server. All of this >> bad >> behavior is traced to the ixgbe >> driver on the >> storage server. Without >> going >> into the full troubleshooting >> process, here are >> my takeaways: >> >> [...] >> >> For what it's worth, we managed to >> achieve much >> better line rates on >> copper 10G ixgbe hardware of various >> descriptions >> between OmniOS >> and CentOS 7 (I don't think we ever >> tested OmniOS to >> OmniOS). 
I don't >> believe OmniOS could do TCP at full >> line rate but I >> think we managed 700+ >> Mbytes/sec on both transmit and >> receive and we got >> basically disk-limited >> speeds with iSCSI (across multiple >> disks on >> multi-disk mirrored pools, >> OmniOS iSCSI initiator, Linux iSCSI >> targets). >> >> I don't believe we did any specific >> kernel tuning >> (and in fact some of >> our attempts to fiddle ixgbe driver >> parameters blew >> up in our face). >> We did tune iSCSI connection >> parameters to increase >> various buffer >> sizes so that ZFS could do even large >> single >> operations in single iSCSI >> transactions. (More details available >> if people are >> interested.) >> >> 10: At the wire level, the speed >> problems are >> clearly due to pauses in >> response time by omnios. At 9000 >> byte frame >> sizes, I see a good number >> of duplicate ACKs and fast >> retransmits during >> read operations (when >> omnios is transmitting). But below >> about a >> 4100-byte MTU on omnios >> (which seems to correlate to >> 4096-byte iSCSI >> block transfers), the >> transmission errors fade away and >> we only see >> the transmission pause >> problem. >> >> >> This is what really attracted my >> attention. In >> our OmniOS setup, our >> specific Intel hardware had ixgbe >> driver issues that >> could cause >> activity stalls during once-a-second >> link heartbeat >> checks. This >> obviously had an effect at the TCP and >> iSCSI layers. >> My initial message >> to illumos-developer sparked a potentially >> interesting discussion: >> >> >> http://www.listbox.com/member/____archive/182179/2014/10/sort/____time_rev/page/16/entry/6:__405/__20141003125035:6357079A-__4B1D-__11E4-A39C-D534381BA44D/ >> >> >> > > >> >> If you think this is a possibility in >> your setup, >> I've put the DTrace >> script I used to hunt for this up on >> the web: >> >> http://www.cs.toronto.edu/~____cks/src/omnios-ixgbe/ixgbe_____delay.d >> >> >> > > >> >> This isn't the only potential source >> of driver >> stalls by any means, it's >> just the one I found. You may also >> want to look at >> lockstat in general, >> as information it reported is what led >> us to look >> specifically at the >> ixgbe code here. >> >> (If you suspect kernel/driver issues, >> lockstat >> combined with kernel >> source is a really excellent resource.) >> >> - cks >> >> >> >> >> >> ___________________________________________________ >> OmniOS-discuss mailing list >> OmniOS-discuss at lists.omniti >> .____com >> > > >> http://lists.omniti.com/____mailman/listinfo/omnios-____discuss >> >> >> > > >> >> >> ___________________________________________________ >> OmniOS-discuss mailing list >> OmniOS-discuss at lists.omniti >> .____com >> > > >> http://lists.omniti.com/____mailman/listinfo/omnios-____discuss >> >> >> > > >> >> >> -- >> OSN Online Service Nuernberg GmbH, Bucher Str. 78, >> 90408 Nuernberg >> Tel: +49 911 39905-0 >> - Fax: +49 911 >> 39905-55 - >> http://www.osn.de >> HRB 15022 Nuernberg, USt-Id: DE189301263, GF: Joerg >> Goltermann >> >> >> >> -- >> OSN Online Service Nuernberg GmbH, Bucher Str. 78, 90408 Nuernberg >> Tel: +49 911 39905-0 - Fax: +49 >> 911 39905-55 - http://www.osn.de >> >> HRB 15022 Nuernberg, USt-Id: DE189301263, GF: Joerg Goltermann >> >> >> *illumos-developer* | Archives >> >> >> | Modify Your Subscription >> [Powered by Listbox] >> > > > *illumos-developer* | Archives > > | > Modify > > Your Subscription [Powered by Listbox] > -- OSN Online Service Nuernberg GmbH, Bucher Str. 
78, 90408 Nuernberg Tel: +49 911 39905-0 - Fax: +49 911 39905-55 - http://www.osn.de HRB 15022 Nuernberg, USt-Id: DE189301263, GF: Joerg Goltermann From garrett at damore.org Mon Mar 2 15:08:06 2015 From: garrett at damore.org (Garrett D'Amore) Date: Mon, 2 Mar 2015 07:08:06 -0800 Subject: [OmniOS-discuss] [developer] Re: The ixgbe driver, Lindsay Lohan, and the Greek economy In-Reply-To: References: <20150221032559.727D07A0792@apps0.cs.toronto.edu> <54EAEFA8.4020101@osn.de> <54EB5392.6030900@osn.de> <279C084B-E76B-4F38-A7CF-D6CB37D06D8F@damore.org> Message-ID: <72CA76E9-35A7-4B00-A7BE-A54C99F1B98C@damore.org> Seems like it is indeed a comstar problem. Lockstat analysis might reveal contended locks or perhaps some kind of timeouts in the code. Sent from my iPhone > On Mar 2, 2015, at 12:22 AM, W Verb wrote: > > Hello Garrett, > > No, no 802.3ad going on in this config. > > Here is a basic schematic: > > https://drive.google.com/file/d/0BwyUMjibonYQVkVqcE5OQUJyUUU/view?usp=sharing > > Here is the Nexenta MPIO iSCSI Setup Document that I used as a guide: > > https://drive.google.com/file/d/0BwyUMjibonYQbjEyUTBjN2tTNWM/view?usp=sharing > > Note that I am using an MTU of 3000 on both the 10G and 1G NICs. The switch is set to allow 9148-byte frames, and I'm not seeing any errors/buffer overruns on the switch. > > Here is a screenshot of a packet capture from a read operation on the guest OS (from it's local drive, which is actually a VMDK file on the storage server). In this example, only a single 1G ESXi kernel interface (vmk1) is bound to the software iSCSI initiator. > > https://drive.google.com/file/d/0BwyUMjibonYQa2NYdXhpZkpkbU0/view?usp=sharing > > Note that there's a nice, well-behaved window sizing process taking place. The ESXi decreases the scaled window by 11 or 12 for each ACK, then bumps it back up to 512. > > Here is a similar screenshot of a single-interface write operation: > > https://drive.google.com/file/d/0BwyUMjibonYQbU1RZHRnakxDSFU/view?usp=sharing > > There are no pauses or gaps in the transmission rate in the single-interface transfers. > > > In the next screenshots, I have enabled an additional 1G interface on the ESXi host, and bound it to the iSCSI initiator. The new interface is bound to a separate physical port, uses a different VLAN on the switch, and talks to a different 10G port on the storage server. > > First, let's look at a write operation on the guest OS, which happily pumps data at near-line-rate to the storage server. > > Here is a sequence number trace diagram. Note how the transfer has a nice, smooth increment rate over the entire transfer. > > https://drive.google.com/file/d/0BwyUMjibonYQWHNIa0drWnNxMmM/view?usp=sharing > > Here are screenshots from packet captures on both 1G interfaces: > > https://drive.google.com/file/d/0BwyUMjibonYQRWhyVVQ4djNaU3c/view?usp=sharing > https://drive.google.com/file/d/0BwyUMjibonYQaTVjTEtTRloyR2c/view?usp=sharing > > Note how we again see nice, smooth window adjustment, and no gaps in transmission. > > > But now, let's look at the problematic two-interface Read operation. > First, the sequence graph: > > https://drive.google.com/file/d/0BwyUMjibonYQTzdFVWdQMWZ6LUU/view?usp=sharing > > As you can see, there are gaps and jumps in the transmission throughout the transfer. 
> It is very illustrative to look at captures of the gaps, which are occurring on both interfaces: > > https://drive.google.com/file/d/0BwyUMjibonYQc0VISXN6eVFwQzg/view?usp=sharing > https://drive.google.com/file/d/0BwyUMjibonYQVFREUHp3TGFiUU0/view?usp=sharing > > As you can see, there are ~.4 second pauses in transmission from the storage server, which kills the transfer rate. > It's clear that the ESXi box ACKs the prior iSCSI operation to completion, then makes a new LUN request, which the storage server immediately replies to. The ESXi ACKs the response packet from the storage server, then waits...and waits....and waits... until eventually the storage server starts transmitting again. > > Because the pause happens while the ESXi client is waiting for a packet from the storage server, that tells me that the gaps are not an artifact of traffic being switched between both active interfaces, but are actually indicative of short hangs occurring on the server. > > Having a pause or two in transmission is no big deal, but in my case, it is happening constantly, and dropping my overall read transfer rate down to 20-60MB/s, which is slower than the single interface transfer rate (~90-100MB/s). > > Decreasing the MTU makes the pauses shorter, increasing them makes the pauses longer. > > Another interesting thing is that if I set the multipath io interval to 3 operations instead of 1, I get better throughput. In other words, the less frequently I swap IP addresses on my iSCSI requests from the ESXi unit, the fewer pauses I see. > > Basically, COMSTAR seems to choke each time an iSCSI request from a new IP arrives. > > Because the single interface transfer is near line rate, that tells me that the storage system (mpt_sas, zfs, etc) is working fine. It's only when multiple paths are attempted that iSCSI falls on its face during reads. > > All of these captures were taken without a cache device being attached to the storage zpool, so this isn't looking like some kind of ZFS ARC problem. As mentioned previously, local transfers to/from the zpool are showing ~300-500 MB/s rates over long transfers (10G+). > > -Warren V > >> On Sun, Mar 1, 2015 at 9:11 PM, Garrett D'Amore wrote: >> I?m not sure I?ve followed properly. You have *two* interfaces. You are not trying to provision these in an aggr are you? As far as I?m aware, VMware does not support 802.3ad link aggregations. (Its possible that you can make it work with ESXi if you give the entire NIC to the guest ? but I?m skeptical.) The problem is that if you try to use link aggregation, some packets (up to half!) will be lost. TCP and other protocols fare poorly in this situation. >> >> Its possible I?ve totally misunderstood what you?re trying to do, in which case I apologize. >> >> The idle thing is a red-herring ? the cpu is waiting for work to do, probably because packets haven?t arrived (or where dropped by the hypervisor!) I wouldn?t read too much into that except that your network stack is in trouble. I?d look a bit more closely at the kstats for tcp ? I suspect you?ll see retransmits or out of order values that are unusually high ? if so this may help validate my theory above. >> >> - Garrett >> >>> On Mar 1, 2015, at 9:03 PM, W Verb via illumos-developer wrote: >>> >>> Hello all, >>> >>> >>> Well, I no longer blame the ixgbe driver for the problems I'm seeing. >>> >>> >>> >>> I tried Joerg's updated driver, which didn't improve the issue. So I went back to the drawing board and rebuilt the server from scratch. 
>>> >>> What I noted is that if I have only a single 1-gig physical interface active on the ESXi host, everything works as expected. As soon as I enable two interfaces, I start seeing the performance problems I've described. >>> >>> Response pauses from the server that I see in TCPdumps are still leading me to believe the problem is delay on the server side, so I ran a series of kernel dtraces and produced some flamegraphs. >>> >>> >>> >>> This was taken during a read operation with two active 10G interfaces on the server, with a single target being shared by two tpgs- one tpg for each 10G physical port. The host device has two 1G ports enabled, with VLANs separating the active ports into 10G/1G pairs. ESXi is set to multipath using both VLANS with a round-robin IO interval of 1. >>> >>> https://drive.google.com/file/d/0BwyUMjibonYQd3ZYOGh4d2pteGs/view?usp=sharing >>> >>> >>> >>> This was taken during a write operation: >>> >>> https://drive.google.com/file/d/0BwyUMjibonYQMnBtU1Q2SXM2ams/view?usp=sharing >>> >>> >>> >>> I then rebooted the server and disabled C-State, ACPI T-State, and general EIST (Turbo boost) functionality in the CPU. >>> >>> I when I attempted to boot my guest VM, the iSCSI transfer gradually ground to a halt during the boot loading process, and the guest OS never did complete its boot process. >>> >>> Here is a flamegraph taken while iSCSI is slowly dying: >>> >>> https://drive.google.com/file/d/0BwyUMjibonYQM21JeFZPX3dZWTg/view?usp=sharing >>> >>> >>> I edited out cpu_idle_adaptive from the dtrace output and regenerated the slowdown graph: >>> >>> https://drive.google.com/file/d/0BwyUMjibonYQbTVwV3NvXzlPS1E/view?usp=sharing >>> >>> >>> I then edited cpu_idle_adaptive out of the speedy write operation and regenerated that graph: >>> >>> https://drive.google.com/file/d/0BwyUMjibonYQeWFYM0pCMDZ1X2s/view?usp=sharing >>> >>> >>> I have zero experience with interpreting flamegraphs, but the most significant difference I see between the slow read example and the fast write example is in unix`thread_start --> unix`idle. There's a good chunk of "unix`i86_mwait" in the read example that is not present in the write example at all. >>> >>> Disabling the l2arc cache device didn't make a difference, and I had to reenable EIST support on the CPU to get my VMs to boot. >>> >>> I am seeing a variety of bug reports going back to 2010 regarding excessive mwait operations, with the suggested solutions usually being to set "cpupm enable poll-mode" in power.conf. That change also had no effect on speed. >>> >>> -Warren V >>> >>> >>> >>> >>> -----Original Message----- >>> >>> From: Chris Siebenmann [mailto:cks at cs.toronto.edu] >>> >>> Sent: Monday, February 23, 2015 8:30 AM >>> >>> To: W Verb >>> >>> Cc: omnios-discuss at lists.omniti.com; cks at cs.toronto.edu >>> >>> Subject: Re: [OmniOS-discuss] The ixgbe driver, Lindsay Lohan, and the Greek economy >>> >>> >>> > Chris, thanks for your specific details. I'd appreciate it if you >>> >>> > could tell me which copper NIC you tried, as well as to pass on the >>> >>> > iSCSI tuning parameters. >>> >>> >>> Our copper NIC experience is with onboard X540-AT2 ports on SuperMicro hardware (which have the guaranteed 10-20 msec lock hold) and dual-port 82599EB TN cards (which have some sort of driver/hardware failure under load that eventually leads to 2-second lock holds). I can't recommend either with the current driver; we had to revert to 1G networking in order to get stable servers. 
>>> >>> >>> The iSCSI parameter modifications we do, across both initiators and targets, are: >>> >>> >>> initialr2t no >>> >>> firstburstlength 128k >>> >>> maxrecvdataseglen 128k [only on Linux backends] >>> >>> maxxmitdataseglen 128k [only on Linux backends] >>> >>> >>> The OmniOS initiator doesn't need tuning for more than the first two parameters; on the Linux backends we tune up all four. My extended thoughts on these tuning parameters and why we touch them can be found >>> >>> here: >>> >>> >>> http://utcc.utoronto.ca/~cks/space/blog/tech/UnderstandingiSCSIProtocol >>> >>> http://utcc.utoronto.ca/~cks/space/blog/tech/LikelyISCSITuning >>> >>> >>> The short version is that these parameters probably only make a small difference but their overall goal is to do 128KB ZFS reads and writes in single iSCSI operations (although they will be fragmented at the TCP >>> >>> layer) and to do iSCSI writes without a back-and-forth delay between initiator and target (that's 'initialr2t no'). >>> >>> >>> I think basically everyone should use InitialR2T set to no and in fact that it should be the software default. These days only unusually limited iSCSI targets should need it to be otherwise and they can change their setting for it (initiator and target must both agree to it being 'yes', so either can veto it). >>> >>> >>> - cks >>> >>> >>> >>>> On Mon, Feb 23, 2015 at 8:21 AM, Joerg Goltermann wrote: >>>> Hi, >>>> >>>> I think your problem is caused by your link properties or your >>>> switch settings. In general the standard ixgbe seems to perform >>>> well. >>>> >>>> I had trouble after changing the default flow control settings to "bi" >>>> and this was my motivation to update the ixgbe driver a long time ago. >>>> After I have updated our systems to ixgbe 2.5.8 I never had any >>>> problems .... >>>> >>>> Make sure your switch has support for jumbo frames and you use >>>> the same mtu on all ports, otherwise the smallest will be used. >>>> >>>> What switch do you use? I can tell you nice horror stories about >>>> different vendors.... >>>> >>>> - Joerg >>>> >>>>> On 23.02.2015 10:31, W Verb wrote: >>>>> Thank you Joerg, >>>>> >>>>> I've downloaded the package and will try it tomorrow. >>>>> >>>>> The only thing I can add at this point is that upon review of my >>>>> testing, I may have performed my "pkg -u" between the initial quad-gig >>>>> performance test and installing the 10G NIC. So this may be a new >>>>> problem introduced in the latest updates. >>>>> >>>>> Those of you who are running 10G and have not upgraded to the latest >>>>> kernel, etc, might want to do some additional testing before running the >>>>> update. >>>>> >>>>> -Warren V >>>>> >>>>> On Mon, Feb 23, 2015 at 1:15 AM, Joerg Goltermann >>>> > wrote: >>>>> >>>>> Hi, >>>>> >>>>> I remember there was a problem with the flow control settings in the >>>>> ixgbe >>>>> driver, so I updated it a long time ago for our internal servers to >>>>> 2.5.8. >>>>> Last weekend I integrated the latest changes from the FreeBSD driver >>>>> to bring >>>>> the illumos ixgbe to 2.5.25 but I had no time to test it, so it's >>>>> completely >>>>> untested! >>>>> >>>>> >>>>> If you would like to give the latest driver a try you can fetch the >>>>> kernel modules from >>>>> https://cloud.osn.de/index.__php/s/Fb4so9RsNnXA7r9 >>>>> >>>>> >>>>> Clone your boot environment, place the modules in the new environment >>>>> and update the boot-archive of the new BE. 
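For anyone following along, Joerg's boot-environment instructions above expand to roughly this sequence (a sketch; the BE name and module path are placeholders, and the file layout should be checked against what the linked archive actually contains):

  # clone and mount the current boot environment
  beadm create ixgbe-test
  beadm mount ixgbe-test /mnt

  # copy the updated 64-bit ixgbe module into the clone (placeholder path)
  cp ixgbe /mnt/kernel/drv/amd64/ixgbe

  # rebuild the clone's boot archive, make it the default, and reboot into it
  bootadm update-archive -R /mnt
  beadm activate ixgbe-test
  init 6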
>>>>> >>>>> - Joerg >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> On 23.02.2015 02:54, W Verb wrote: >>>>> >>>>> By the way, to those of you who have working setups: please send me >>>>> your pool/volume settings, interface linkprops, and any kernel >>>>> tuning >>>>> parameters you may have set. >>>>> >>>>> Thanks, >>>>> Warren V >>>>> >>>>> On Sat, Feb 21, 2015 at 7:59 AM, Schweiss, Chip >>>>> > wrote: >>>>> >>>>> I can't say I totally agree with your performance >>>>> assessment. I run Intel >>>>> X520 in all my OmniOS boxes. >>>>> >>>>> Here is a capture of nfssvrtop I made while running many >>>>> storage vMotions >>>>> between two OmniOS boxes hosting NFS datastores. This is a >>>>> 10 host VMware >>>>> cluster. Both OmniOS boxes are dual 10G connected with >>>>> copper twin-ax to >>>>> the in rack Nexus 5010. >>>>> >>>>> VMware does 100% sync writes, I use ZeusRAM SSDs for log >>>>> devices. >>>>> >>>>> -Chip >>>>> >>>>> 2014 Apr 24 08:05:51, load: 12.64, read: 17330243 KB, >>>>> swrite: 15985 KB, >>>>> awrite: 1875455 KB >>>>> >>>>> Ver Client NFSOPS Reads SWrites AWrites >>>>> Commits Rd_bw >>>>> SWr_bw AWr_bw Rd_t SWr_t AWr_t Com_t Align% >>>>> >>>>> 4 10.28.17.105 0 0 0 0 >>>>> 0 0 >>>>> 0 0 0 0 0 0 0 >>>>> >>>>> 4 10.28.17.215 0 0 0 0 >>>>> 0 0 >>>>> 0 0 0 0 0 0 0 >>>>> >>>>> 4 10.28.17.213 0 0 0 0 >>>>> 0 0 >>>>> 0 0 0 0 0 0 0 >>>>> >>>>> 4 10.28.16.151 0 0 0 0 >>>>> 0 0 >>>>> 0 0 0 0 0 0 0 >>>>> >>>>> 4 all 1 0 0 0 >>>>> 0 0 >>>>> 0 0 0 0 0 0 0 >>>>> >>>>> 3 10.28.16.175 3 0 3 0 >>>>> 0 1 >>>>> 11 0 4806 48 0 0 85 >>>>> >>>>> 3 10.28.16.183 6 0 6 0 >>>>> 0 3 >>>>> 162 0 549 124 0 0 73 >>>>> >>>>> 3 10.28.16.180 11 0 10 0 >>>>> 0 3 >>>>> 27 0 776 89 0 0 67 >>>>> >>>>> 3 10.28.16.176 28 2 26 0 >>>>> 0 10 >>>>> 405 0 2572 198 0 0 100 >>>>> >>>>> 3 10.28.16.178 4606 4602 4 0 >>>>> 0 294534 >>>>> 3 0 723 49 0 0 99 >>>>> >>>>> 3 10.28.16.179 4905 4879 26 0 >>>>> 0 312208 >>>>> 311 0 735 271 0 0 99 >>>>> >>>>> 3 10.28.16.181 5515 5502 13 0 >>>>> 0 352107 >>>>> 77 0 89 87 0 0 99 >>>>> >>>>> 3 10.28.16.184 12095 12059 10 0 >>>>> 0 763014 >>>>> 39 0 249 147 0 0 99 >>>>> >>>>> 3 10.28.58.1 15401 6040 116 6354 >>>>> 53 191605 >>>>> 474 202346 192 96 144 83 99 >>>>> >>>>> 3 all 42574 33086 217 >>>>> 6354 53 1913488 >>>>> 1582 202300 348 138 153 105 99 >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> On Fri, Feb 20, 2015 at 11:46 PM, W Verb >>>> > wrote: >>>>> >>>>> >>>>> Hello All, >>>>> >>>>> Thank you for your replies. >>>>> I tried a few things, and found the following: >>>>> >>>>> 1: Disabling hyperthreading support in the BIOS drops >>>>> performance overall >>>>> by a factor of 4. >>>>> 2: Disabling VT support also seems to have some effect, >>>>> although it >>>>> appears to be minor. But this has the amusing side >>>>> effect of fixing the >>>>> hangs I've been experiencing with fast reboot. Probably >>>>> by disabling kvm. >>>>> 3: The performance tests are a bit tricky to quantify >>>>> because of caching >>>>> effects. In fact, I'm not entirely sure what is >>>>> happening here. It's just >>>>> best to describe what I'm seeing: >>>>> >>>>> The commands I'm using to test are >>>>> dd if=/dev/zero of=./test.dd bs=2M count=5000 >>>>> dd of=/dev/null if=./test.dd bs=2M count=5000 >>>>> The host vm is running Centos 6.6, and has the latest >>>>> vmtools installed. >>>>> There is a host cache on an SSD local to the host that >>>>> is also in place. >>>>> Disabling the host cache didn't immediately have an >>>>> effect as far as I could >>>>> see. 
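One way to take the guest page cache mostly out of the dd numbers described above (a sketch; assumes the CentOS 6 coreutils dd, which supports the direct I/O flags):

  # write test: bypass the guest page cache and flush at the end
  dd if=/dev/zero of=./test.dd bs=2M count=5000 oflag=direct conv=fsync

  # drop whatever the guest still has cached, then read back with O_DIRECT
  sync; echo 3 > /proc/sys/vm/drop_caches
  dd if=./test.dd of=/dev/null bs=2M count=5000 iflag=direct

Note also that /dev/zero data compresses away to almost nothing if the backing zvol has compression enabled, which can inflate write numbers independently of the network path.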
>>>>> >>>>> The host MTU set to 3000 on all iSCSI interfaces for all >>>>> tests. >>>>> >>>>> Test 1: Right after reboot, with an ixgbe MTU of 9000, >>>>> the write test >>>>> yields an average speed over three tests of 137MB/s. The >>>>> read test yields an >>>>> average over three tests of 5MB/s. >>>>> >>>>> Test 2: After setting "ifconfig ixgbe0 mtu 3000", the >>>>> write tests yield >>>>> 140MB/s, and the read tests yield 53MB/s. It's important >>>>> to note here that >>>>> if I cut the read test short at only 2-3GB, I get >>>>> results upwards of >>>>> 350MB/s, which I assume is local cache-related distortion. >>>>> >>>>> Test 3: MTU of 1500. Read tests are up to 156 MB/s. >>>>> Write tests yield >>>>> about 142MB/s. >>>>> Test 4: MTU of 1000: Read test at 182MB/s. >>>>> Test 5: MTU of 900: Read test at 130 MB/s. >>>>> Test 6: MTU of 1000: Read test at 160MB/s. Write tests >>>>> are now >>>>> consistently at about 300MB/s. >>>>> Test 7: MTU of 1200: Read test at 124MB/s. >>>>> Test 8: MTU of 1000: Read test at 161MB/s. Write at 261MB/s. >>>>> >>>>> A few final notes: >>>>> L1ARC grabs about 10GB of RAM during the tests, so >>>>> there's definitely some >>>>> read caching going on. >>>>> The write operations are easier to observe with iostat, >>>>> and I'm seeing io >>>>> rates that closely correlate with the network write speeds. >>>>> >>>>> >>>>> Chris, thanks for your specific details. I'd appreciate >>>>> it if you could >>>>> tell me which copper NIC you tried, as well as to pass >>>>> on the iSCSI tuning >>>>> parameters. >>>>> >>>>> I've ordered an Intel EXPX9502AFXSR, which uses the >>>>> 82598 chip instead of >>>>> the 82599 in the X520. If I get similar results with my >>>>> fiber transcievers, >>>>> I'll see if I can get a hold of copper ones. >>>>> >>>>> But I should mention that I did indeed look at PHY/MAC >>>>> error rates, and >>>>> they are nil. >>>>> >>>>> -Warren V >>>>> >>>>> On Fri, Feb 20, 2015 at 7:25 PM, Chris Siebenmann >>>>> > >>>>> >>>>> wrote: >>>>> >>>>> >>>>> After installation and configuration, I observed >>>>> all kinds of bad >>>>> behavior >>>>> in the network traffic between the hosts and the >>>>> server. All of this >>>>> bad >>>>> behavior is traced to the ixgbe driver on the >>>>> storage server. Without >>>>> going >>>>> into the full troubleshooting process, here are >>>>> my takeaways: >>>>> >>>>> [...] >>>>> >>>>> For what it's worth, we managed to achieve much >>>>> better line rates on >>>>> copper 10G ixgbe hardware of various descriptions >>>>> between OmniOS >>>>> and CentOS 7 (I don't think we ever tested OmniOS to >>>>> OmniOS). I don't >>>>> believe OmniOS could do TCP at full line rate but I >>>>> think we managed 700+ >>>>> Mbytes/sec on both transmit and receive and we got >>>>> basically disk-limited >>>>> speeds with iSCSI (across multiple disks on >>>>> multi-disk mirrored pools, >>>>> OmniOS iSCSI initiator, Linux iSCSI targets). >>>>> >>>>> I don't believe we did any specific kernel tuning >>>>> (and in fact some of >>>>> our attempts to fiddle ixgbe driver parameters blew >>>>> up in our face). >>>>> We did tune iSCSI connection parameters to increase >>>>> various buffer >>>>> sizes so that ZFS could do even large single >>>>> operations in single iSCSI >>>>> transactions. (More details available if people are >>>>> interested.) >>>>> >>>>> 10: At the wire level, the speed problems are >>>>> clearly due to pauses in >>>>> response time by omnios. 
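Whether iSCSI parameters like these actually took effect is visible in the login negotiation itself. A hedged way to check from a packet capture, relying on Wireshark's iSCSI dissector to decode the login key=value pairs (capture file name is a placeholder):

  # pull the login request/response PDUs out of a capture taken on port 3260
  # and grep the decoded text for the parameters of interest
  tshark -r iscsi-login.pcap -Y 'iscsi.opcode == 0x03 || iscsi.opcode == 0x23' -V |
    egrep -i 'InitialR2T|FirstBurstLength|MaxRecvDataSegmentLength|MaxBurstLength'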
At 9000 byte frame >>>>> sizes, I see a good number >>>>> of duplicate ACKs and fast retransmits during >>>>> read operations (when >>>>> omnios is transmitting). But below about a >>>>> 4100-byte MTU on omnios >>>>> (which seems to correlate to 4096-byte iSCSI >>>>> block transfers), the >>>>> transmission errors fade away and we only see >>>>> the transmission pause >>>>> problem. >>>>> >>>>> >>>>> This is what really attracted my attention. In >>>>> our OmniOS setup, our >>>>> specific Intel hardware had ixgbe driver issues that >>>>> could cause >>>>> activity stalls during once-a-second link heartbeat >>>>> checks. This >>>>> obviously had an effect at the TCP and iSCSI layers. >>>>> My initial message >>>>> to illumos-developer sparked a potentially >>>>> interesting discussion: >>>>> >>>>> >>>>> http://www.listbox.com/member/__archive/182179/2014/10/sort/__time_rev/page/16/entry/6:405/__20141003125035:6357079A-4B1D-__11E4-A39C-D534381BA44D/ >>>>> >>>>> >>>>> If you think this is a possibility in your setup, >>>>> I've put the DTrace >>>>> script I used to hunt for this up on the web: >>>>> >>>>> http://www.cs.toronto.edu/~__cks/src/omnios-ixgbe/ixgbe___delay.d >>>>> >>>>> >>>>> This isn't the only potential source of driver >>>>> stalls by any means, it's >>>>> just the one I found. You may also want to look at >>>>> lockstat in general, >>>>> as information it reported is what led us to look >>>>> specifically at the >>>>> ixgbe code here. >>>>> >>>>> (If you suspect kernel/driver issues, lockstat >>>>> combined with kernel >>>>> source is a really excellent resource.) >>>>> >>>>> - cks >>>>> >>>>> >>>>> >>>>> >>>>> _________________________________________________ >>>>> OmniOS-discuss mailing list >>>>> OmniOS-discuss at lists.omniti.__com >>>>> >>>>> http://lists.omniti.com/__mailman/listinfo/omnios-__discuss >>>>> >>>>> >>>>> >>>>> _________________________________________________ >>>>> OmniOS-discuss mailing list >>>>> OmniOS-discuss at lists.omniti.__com >>>>> >>>>> http://lists.omniti.com/__mailman/listinfo/omnios-__discuss >>>>> >>>>> >>>>> >>>>> -- >>>>> OSN Online Service Nuernberg GmbH, Bucher Str. 78, 90408 Nuernberg >>>>> Tel: +49 911 39905-0 - Fax: +49 911 >>>>> 39905-55 - http://www.osn.de >>>>> HRB 15022 Nuernberg, USt-Id: DE189301263, GF: Joerg Goltermann >>>>> >>>>> >>>> >>>> -- >>>> OSN Online Service Nuernberg GmbH, Bucher Str. 78, 90408 Nuernberg >>>> Tel: +49 911 39905-0 - Fax: +49 911 39905-55 - http://www.osn.de >>>> HRB 15022 Nuernberg, USt-Id: DE189301263, GF: Joerg Goltermann >>> >>> illumos-developer | Archives | Modify Your Subscription >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From wverb73 at gmail.com Mon Mar 2 19:07:45 2015 From: wverb73 at gmail.com (W Verb) Date: Mon, 2 Mar 2015 11:07:45 -0800 Subject: [OmniOS-discuss] [developer] Re: The ixgbe driver, Lindsay Lohan, and the Greek economy In-Reply-To: <54F44602.5030705@osn.de> References: <20150221032559.727D07A0792@apps0.cs.toronto.edu> <54EAEFA8.4020101@osn.de> <54EB5392.6030900@osn.de> <279C084B-E76B-4F38-A7CF-D6CB37D06D8F@damore.org> <54F44602.5030705@osn.de> Message-ID: Hello all, I am not using layer 2 flow control. The switch carries line-rate 10G traffic without error. I think I have found the issue via lockstat. 
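The claim that the wire is clean can also be cross-checked from the OmniOS side with the tcp kstats, which is where retransmissions and out-of-order segments would show up. A rough sketch (statistic names vary a little between illumos releases, so the filter is deliberately loose):

  kstat -p -m tcp | egrep -i 'retrans|unorder|dup'

Sampling these counters before and after a slow multipath read shows whether the pauses coincide with TCP-level loss or reordering.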
The first lockstat is taken during a multipath read: lockstat -kWP sleep 30 Adaptive mutex spin: 21331 events in 30.020 seconds (711 events/sec) Count indv cuml rcnt nsec Hottest Lock Caller ------------------------------------------------------------------------------- 9306 44% 44% 0.00 1557 htable_mutex+0x370 htable_release 6307 23% 68% 0.00 1207 htable_mutex+0x108 htable_lookup 596 7% 75% 0.00 4100 0xffffff0931705188 cv_wait 349 5% 80% 0.00 4437 0xffffff0931705188 taskq_thread 704 2% 82% 0.00 995 0xffffff0935de3c50 dbuf_create The hash table being read here I would guess is the tcp connection hash table. When lockstat is run during a multipath write operation, I get: Adaptive mutex spin: 1097341 events in 30.016 seconds (36558 events/sec) Count indv cuml rcnt nsec Hottest Lock Caller ------------------------------------------------------------------------------- 210752 28% 28% 0.00 4781 0xffffff0931705188 taskq_thread 174471 22% 50% 0.00 4476 0xffffff0931705188 cv_wait 127183 10% 61% 0.00 2871 0xffffff096f29b510 zio_notify_parent 176066 10% 70% 0.00 1922 0xffffff0931705188 taskq_dispatch_ent 105134 9% 80% 0.00 3110 0xffffff096ffdbf10 zio_remove_child 67512 4% 83% 0.00 1938 0xffffff096f3db4b0 zio_add_child 45736 3% 86% 0.00 2239 0xffffff0935de3c50 dbuf_destroy 27781 3% 89% 0.00 3416 0xffffff0935de3c50 dbuf_create 38536 2% 91% 0.00 2122 0xffffff0935de3b70 dnode_rele 27841 2% 93% 0.00 2423 0xffffff0935de3b70 dnode_diduse_space 19020 2% 95% 0.00 3046 0xffffff09d9e305e0 dbuf_rele 14627 1% 96% 0.00 3632 dbuf_hash_table+0x4f8 dbuf_find Writes are not performing htable lookups, while reads are. -Warren V On Mon, Mar 2, 2015 at 3:14 AM, Joerg Goltermann wrote: > Hi, > > I would try *one* TPG which includes both interface addresses > and I would double check for packet drops on the Catalyst. > > The 3560 supports only receive flow control which means, that > a sending 10Gbit port can easily overload a 1Gbit port. > Do you have flow control enabled? > > - Joerg > > > On 02.03.2015 09:22, W Verb via illumos-developer wrote: > >> Hello Garrett, >> >> No, no 802.3ad going on in this config. >> >> Here is a basic schematic: >> >> https://drive.google.com/file/d/0BwyUMjibonYQVkVqcE5OQUJyUUU/ >> view?usp=sharing >> >> Here is the Nexenta MPIO iSCSI Setup Document that I used as a guide: >> >> https://drive.google.com/file/d/0BwyUMjibonYQbjEyUTBjN2tTNWM/ >> view?usp=sharing >> >> Note that I am using an MTU of 3000 on both the 10G and 1G NICs. The >> switch is set to allow 9148-byte frames, and I'm not seeing any >> errors/buffer overruns on the switch. >> >> Here is a screenshot of a packet capture from a read operation on the >> guest OS (from it's local drive, which is actually a VMDK file on the >> storage server). In this example, only a single 1G ESXi kernel interface >> (vmk1) is bound to the software iSCSI initiator. >> >> https://drive.google.com/file/d/0BwyUMjibonYQa2NYdXhpZkpkbU0/ >> view?usp=sharing >> >> Note that there's a nice, well-behaved window sizing process taking >> place. The ESXi decreases the scaled window by 11 or 12 for each ACK, >> then bumps it back up to 512. >> >> Here is a similar screenshot of a single-interface write operation: >> >> https://drive.google.com/file/d/0BwyUMjibonYQbU1RZHRnakxDSFU/ >> view?usp=sharing >> >> There are no pauses or gaps in the transmission rate in the >> single-interface transfers. >> >> >> In the next screenshots, I have enabled an additional 1G interface on >> the ESXi host, and bound it to the iSCSI initiator. 
The new interface is >> bound to a separate physical port, uses a different VLAN on the switch, >> and talks to a different 10G port on the storage server. >> >> First, let's look at a write operation on the guest OS, which happily >> pumps data at near-line-rate to the storage server. >> >> Here is a sequence number trace diagram. Note how the transfer has a >> nice, smooth increment rate over the entire transfer. >> >> https://drive.google.com/file/d/0BwyUMjibonYQWHNIa0drWnNxMmM/ >> view?usp=sharing >> >> Here are screenshots from packet captures on both 1G interfaces: >> >> https://drive.google.com/file/d/0BwyUMjibonYQRWhyVVQ4djNaU3c/ >> view?usp=sharing >> https://drive.google.com/file/d/0BwyUMjibonYQaTVjTEtTRloyR2c/ >> view?usp=sharing >> >> Note how we again see nice, smooth window adjustment, and no gaps in >> transmission. >> >> >> But now, let's look at the problematic two-interface Read operation. >> First, the sequence graph: >> >> https://drive.google.com/file/d/0BwyUMjibonYQTzdFVWdQMWZ6LUU/ >> view?usp=sharing >> >> As you can see, there are gaps and jumps in the transmission throughout >> the transfer. >> It is very illustrative to look at captures of the gaps, which are >> occurring on both interfaces: >> >> https://drive.google.com/file/d/0BwyUMjibonYQc0VISXN6eVFwQzg/ >> view?usp=sharing >> https://drive.google.com/file/d/0BwyUMjibonYQVFREUHp3TGFiUU0/ >> view?usp=sharing >> >> As you can see, there are ~.4 second pauses in transmission from the >> storage server, which kills the transfer rate. >> It's clear that the ESXi box ACKs the prior iSCSI operation to >> completion, then makes a new LUN request, which the storage server >> immediately replies to. The ESXi ACKs the response packet from the >> storage server, then waits...and waits....and waits... until eventually >> the storage server starts transmitting again. >> >> Because the pause happens while the ESXi client is waiting for a packet >> from the storage server, that tells me that the gaps are not an artifact >> of traffic being switched between both active interfaces, but are >> actually indicative of short hangs occurring on the server. >> >> Having a pause or two in transmission is no big deal, but in my case, it >> is happening constantly, and dropping my overall read transfer rate down >> to 20-60MB/s, which is slower than the single interface transfer rate >> (~90-100MB/s). >> >> Decreasing the MTU makes the pauses shorter, increasing them makes the >> pauses longer. >> >> Another interesting thing is that if I set the multipath io interval to >> 3 operations instead of 1, I get better throughput. In other words, the >> less frequently I swap IP addresses on my iSCSI requests from the ESXi >> unit, the fewer pauses I see. >> >> Basically, COMSTAR seems to choke each time an iSCSI request from a new >> IP arrives. >> >> Because the single interface transfer is near line rate, that tells me >> that the storage system (mpt_sas, zfs, etc) is working fine. It's only >> when multiple paths are attempted that iSCSI falls on its face during >> reads. >> >> All of these captures were taken without a cache device being attached >> to the storage zpool, so this isn't looking like some kind of ZFS ARC >> problem. As mentioned previously, local transfers to/from the zpool are >> showing ~300-500 MB/s rates over long transfers (10G+). >> >> -Warren V >> >> On Sun, Mar 1, 2015 at 9:11 PM, Garrett D'Amore > > wrote: >> >> I?m not sure I?ve followed properly. You have *two* interfaces. 
>> You are not trying to provision these in an aggr are you? As far as >> I?m aware, VMware does not support 802.3ad link aggregations. (Its >> possible that you can make it work with ESXi if you give the entire >> NIC to the guest ? but I?m skeptical.) The problem is that if you >> try to use link aggregation, some packets (up to half!) will be >> lost. TCP and other protocols fare poorly in this situation. >> >> Its possible I?ve totally misunderstood what you?re trying to do, in >> which case I apologize. >> >> The idle thing is a red-herring ? the cpu is waiting for work to do, >> probably because packets haven?t arrived (or where dropped by the >> hypervisor!) I wouldn?t read too much into that except that your >> network stack is in trouble. I?d look a bit more closely at the >> kstats for tcp ? I suspect you?ll see retransmits or out of order >> values that are unusually high ? if so this may help validate my >> theory above. >> >> - Garrett >> >> On Mar 1, 2015, at 9:03 PM, W Verb via illumos-developer >>> > >>> >>> wrote: >>> >>> Hello all, >>> >>> >>> Well, I no longer blame the ixgbe driver for the problems I'm seeing. >>> >>> >>> I tried Joerg's updated driver, which didn't improve the issue. So >>> I went back to the drawing board and rebuilt the server from scratch. >>> >>> What I noted is that if I have only a single 1-gig physical >>> interface active on the ESXi host, everything works as expected. >>> As soon as I enable two interfaces, I start seeing the performance >>> problems I've described. >>> >>> Response pauses from the server that I see in TCPdumps are still >>> leading me to believe the problem is delay on the server side, so >>> I ran a series of kernel dtraces and produced some flamegraphs. >>> >>> >>> This was taken during a read operation with two active 10G >>> interfaces on the server, with a single target being shared by two >>> tpgs- one tpg for each 10G physical port. The host device has two >>> 1G ports enabled, with VLANs separating the active ports into >>> 10G/1G pairs. ESXi is set to multipath using both VLANS with a >>> round-robin IO interval of 1. >>> >>> https://drive.google.com/file/d/0BwyUMjibonYQd3ZYOGh4d2pteGs/ >>> view?usp=sharing >>> >>> >>> This was taken during a write operation: >>> >>> https://drive.google.com/file/d/0BwyUMjibonYQMnBtU1Q2SXM2ams/ >>> view?usp=sharing >>> >>> >>> I then rebooted the server and disabled C-State, ACPI T-State, and >>> general EIST (Turbo boost) functionality in the CPU. >>> >>> I when I attempted to boot my guest VM, the iSCSI transfer >>> gradually ground to a halt during the boot loading process, and >>> the guest OS never did complete its boot process. >>> >>> Here is a flamegraph taken while iSCSI is slowly dying: >>> >>> https://drive.google.com/file/d/0BwyUMjibonYQM21JeFZPX3dZWTg/ >>> view?usp=sharing >>> >>> >>> I edited out cpu_idle_adaptive from the dtrace output and >>> regenerated the slowdown graph: >>> >>> https://drive.google.com/file/d/0BwyUMjibonYQbTVwV3NvXzlPS1E/ >>> view?usp=sharing >>> >>> >>> I then edited cpu_idle_adaptive out of the speedy write operation >>> and regenerated that graph: >>> >>> https://drive.google.com/file/d/0BwyUMjibonYQeWFYM0pCMDZ1X2s/ >>> view?usp=sharing >>> >>> >>> I have zero experience with interpreting flamegraphs, but the most >>> significant difference I see between the slow read example and the >>> fast write example is in unix`thread_start --> unix`idle. 
There's >>> a good chunk of "unix`i86_mwait" in the read example that is not >>> present in the write example at all. >>> >>> Disabling the l2arc cache device didn't make a difference, and I >>> had to reenable EIST support on the CPU to get my VMs to boot. >>> >>> I am seeing a variety of bug reports going back to 2010 regarding >>> excessive mwait operations, with the suggested solutions usually >>> being to set "cpupm enable poll-mode" in power.conf. That change >>> also had no effect on speed. >>> >>> -Warren V >>> >>> >>> >>> >>> -----Original Message----- >>> >>> From: Chris Siebenmann [mailto:cks at cs.toronto.edu] >>> >>> Sent: Monday, February 23, 2015 8:30 AM >>> >>> To: W Verb >>> >>> Cc: omnios-discuss at lists.omniti.com >>> ; cks at cs.toronto.edu >>> >>> >>> Subject: Re: [OmniOS-discuss] The ixgbe driver, Lindsay Lohan, and >>> the Greek economy >>> >>> >>> > Chris, thanks for your specific details. I'd appreciate it if you >>> >>> > could tell me which copper NIC you tried, as well as to pass on the >>> >>> > iSCSI tuning parameters. >>> >>> >>> Our copper NIC experience is with onboard X540-AT2 ports on >>> SuperMicro hardware (which have the guaranteed 10-20 msec lock >>> hold) and dual-port 82599EB TN cards (which have some sort of >>> driver/hardware failure under load that eventually leads to >>> 2-second lock holds). I can't recommend either with the current >>> driver; we had to revert to 1G networking in order to get stable >>> servers. >>> >>> >>> The iSCSI parameter modifications we do, across both initiators >>> and targets, are: >>> >>> >>> initialr2tno >>> >>> firstburstlength128k >>> >>> maxrecvdataseglen128k[only on Linux backends] >>> >>> maxxmitdataseglen128k[only on Linux backends] >>> >>> >>> The OmniOS initiator doesn't need tuning for more than the first >>> two parameters; on the Linux backends we tune up all four. My >>> extended thoughts on these tuning parameters and why we touch them >>> can be found >>> >>> here: >>> >>> >>> http://utcc.utoronto.ca/~cks/space/blog/tech/ >>> UnderstandingiSCSIProtocol >>> >>> http://utcc.utoronto.ca/~cks/space/blog/tech/LikelyISCSITuning >>> >>> >>> The short version is that these parameters probably only make a >>> small difference but their overall goal is to do 128KB ZFS reads >>> and writes in single iSCSI operations (although they will be >>> fragmented at the TCP >>> >>> layer) and to do iSCSI writes without a back-and-forth delay >>> between initiator and target (that's 'initialr2t no'). >>> >>> >>> I think basically everyone should use InitialR2T set to no and in >>> fact that it should be the software default. These days only >>> unusually limited iSCSI targets should need it to be otherwise and >>> they can change their setting for it (initiator and target must >>> both agree to it being 'yes', so either can veto it). >>> >>> >>> - cks >>> >>> >>> >>> On Mon, Feb 23, 2015 at 8:21 AM, Joerg Goltermann >> > wrote: >>> >>> Hi, >>> >>> I think your problem is caused by your link properties or your >>> switch settings. In general the standard ixgbe seems to perform >>> well. >>> >>> I had trouble after changing the default flow control settings >>> to "bi" >>> and this was my motivation to update the ixgbe driver a long >>> time ago. >>> After I have updated our systems to ixgbe 2.5.8 I never had any >>> problems .... >>> >>> Make sure your switch has support for jumbo frames and you use >>> the same mtu on all ports, otherwise the smallest will be used. >>> >>> What switch do you use? 
I can tell you nice horror stories about >>> different vendors.... >>> >>> - Joerg >>> >>> On 23.02.2015 10:31, W Verb wrote: >>> >>> Thank you Joerg, >>> >>> I've downloaded the package and will try it tomorrow. >>> >>> The only thing I can add at this point is that upon review >>> of my >>> testing, I may have performed my "pkg -u" between the >>> initial quad-gig >>> performance test and installing the 10G NIC. So this may >>> be a new >>> problem introduced in the latest updates. >>> >>> Those of you who are running 10G and have not upgraded to >>> the latest >>> kernel, etc, might want to do some additional testing >>> before running the >>> update. >>> >>> -Warren V >>> >>> On Mon, Feb 23, 2015 at 1:15 AM, Joerg Goltermann >>> >>> >> wrote: >>> >>> Hi, >>> >>> I remember there was a problem with the flow control >>> settings in the >>> ixgbe >>> driver, so I updated it a long time ago for our >>> internal servers to >>> 2.5.8. >>> Last weekend I integrated the latest changes from the >>> FreeBSD driver >>> to bring >>> the illumos ixgbe to 2.5.25 but I had no time to test >>> it, so it's >>> completely >>> untested! >>> >>> >>> If you would like to give the latest driver a try you >>> can fetch the >>> kernel modules from >>> https://cloud.osn.de/index.____php/s/Fb4so9RsNnXA7r9 >>> >>> >> > >>> >>> Clone your boot environment, place the modules in the >>> new environment >>> and update the boot-archive of the new BE. >>> >>> - Joerg >>> >>> >>> >>> >>> >>> On 23.02.2015 02:54, W Verb wrote: >>> >>> By the way, to those of you who have working >>> setups: please send me >>> your pool/volume settings, interface linkprops, >>> and any kernel >>> tuning >>> parameters you may have set. >>> >>> Thanks, >>> Warren V >>> >>> On Sat, Feb 21, 2015 at 7:59 AM, Schweiss, Chip >>> >>> >> >>> >>> wrote: >>> >>> I can't say I totally agree with your performance >>> assessment. I run Intel >>> X520 in all my OmniOS boxes. >>> >>> Here is a capture of nfssvrtop I made while >>> running many >>> storage vMotions >>> between two OmniOS boxes hosting NFS >>> datastores. This is a >>> 10 host VMware >>> cluster. Both OmniOS boxes are dual 10G >>> connected with >>> copper twin-ax to >>> the in rack Nexus 5010. >>> >>> VMware does 100% sync writes, I use ZeusRAM >>> SSDs for log >>> devices. 
>>> >>> -Chip >>> >>> 2014 Apr 24 08:05:51, load: 12.64, read: >>> 17330243 KB, >>> swrite: 15985 KB, >>> awrite: 1875455 KB >>> >>> Ver Client NFSOPS Reads >>> SWrites AWrites >>> Commits Rd_bw >>> SWr_bw AWr_bw Rd_t SWr_t AWr_t >>> Com_t Align% >>> >>> 4 10.28.17.105 0 0 >>> 0 0 >>> 0 0 >>> 0 0 0 0 0 0 0 >>> >>> 4 10.28.17.215 0 0 >>> 0 0 >>> 0 0 >>> 0 0 0 0 0 0 0 >>> >>> 4 10.28.17.213 0 0 >>> 0 0 >>> 0 0 >>> 0 0 0 0 0 0 0 >>> >>> 4 10.28.16.151 0 0 >>> 0 0 >>> 0 0 >>> 0 0 0 0 0 0 0 >>> >>> 4 all 1 0 >>> 0 0 >>> 0 0 >>> 0 0 0 0 0 0 0 >>> >>> 3 10.28.16.175 3 0 >>> 3 0 >>> 0 1 >>> 11 0 4806 48 0 0 >>> 85 >>> >>> 3 10.28.16.183 6 0 >>> 6 0 >>> 0 3 >>> 162 0 549 124 0 0 >>> 73 >>> >>> 3 10.28.16.180 11 0 >>> 10 0 >>> 0 3 >>> 27 0 776 89 0 0 >>> 67 >>> >>> 3 10.28.16.176 28 2 >>> 26 0 >>> 0 10 >>> 405 0 2572 198 0 0 >>> 100 >>> >>> 3 10.28.16.178 4606 4602 >>> 4 0 >>> 0 294534 >>> 3 0 723 49 0 0 99 >>> >>> 3 10.28.16.179 4905 4879 >>> 26 0 >>> 0 312208 >>> 311 0 735 271 0 0 >>> 99 >>> >>> 3 10.28.16.181 5515 5502 >>> 13 0 >>> 0 352107 >>> 77 0 89 87 0 0 >>> 99 >>> >>> 3 10.28.16.184 12095 12059 >>> 10 0 >>> 0 763014 >>> 39 0 249 147 0 0 >>> 99 >>> >>> 3 10.28.58.1 15401 6040 >>> 116 6354 >>> 53 191605 >>> 474 202346 192 96 144 83 >>> 99 >>> >>> 3 all 42574 33086 >>> 217 >>> 6354 53 1913488 >>> 1582 202300 348 138 153 105 >>> 99 >>> >>> >>> >>> >>> >>> On Fri, Feb 20, 2015 at 11:46 PM, W Verb >>> >>> >> >>> >> wrote: >>> >>> >>> Hello All, >>> >>> Thank you for your replies. >>> I tried a few things, and found the >>> following: >>> >>> 1: Disabling hyperthreading support in the >>> BIOS drops >>> performance overall >>> by a factor of 4. >>> 2: Disabling VT support also seems to have >>> some effect, >>> although it >>> appears to be minor. But this has the >>> amusing side >>> effect of fixing the >>> hangs I've been experiencing with fast >>> reboot. Probably >>> by disabling kvm. >>> 3: The performance tests are a bit tricky >>> to quantify >>> because of caching >>> effects. In fact, I'm not entirely sure >>> what is >>> happening here. It's just >>> best to describe what I'm seeing: >>> >>> The commands I'm using to test are >>> dd if=/dev/zero of=./test.dd bs=2M count=5000 >>> dd of=/dev/null if=./test.dd bs=2M count=5000 >>> The host vm is running Centos 6.6, and has >>> the latest >>> vmtools installed. >>> There is a host cache on an SSD local to >>> the host that >>> is also in place. >>> Disabling the host cache didn't >>> immediately have an >>> effect as far as I could >>> see. >>> >>> The host MTU set to 3000 on all iSCSI >>> interfaces for all >>> tests. >>> >>> Test 1: Right after reboot, with an ixgbe >>> MTU of 9000, >>> the write test >>> yields an average speed over three tests >>> of 137MB/s. The >>> read test yields an >>> average over three tests of 5MB/s. >>> >>> Test 2: After setting "ifconfig ixgbe0 mtu >>> 3000", the >>> write tests yield >>> 140MB/s, and the read tests yield 53MB/s. >>> It's important >>> to note here that >>> if I cut the read test short at only >>> 2-3GB, I get >>> results upwards of >>> 350MB/s, which I assume is local >>> cache-related distortion. >>> >>> Test 3: MTU of 1500. Read tests are up to >>> 156 MB/s. >>> Write tests yield >>> about 142MB/s. >>> Test 4: MTU of 1000: Read test at 182MB/s. >>> Test 5: MTU of 900: Read test at 130 MB/s. >>> Test 6: MTU of 1000: Read test at 160MB/s. >>> Write tests >>> are now >>> consistently at about 300MB/s. >>> Test 7: MTU of 1200: Read test at 124MB/s. >>> Test 8: MTU of 1000: Read test at 161MB/s. 
>>> Write at 261MB/s. >>> >>> A few final notes: >>> L1ARC grabs about 10GB of RAM during the >>> tests, so >>> there's definitely some >>> read caching going on. >>> The write operations are easier to observe >>> with iostat, >>> and I'm seeing io >>> rates that closely correlate with the >>> network write speeds. >>> >>> >>> Chris, thanks for your specific details. >>> I'd appreciate >>> it if you could >>> tell me which copper NIC you tried, as >>> well as to pass >>> on the iSCSI tuning >>> parameters. >>> >>> I've ordered an Intel EXPX9502AFXSR, which >>> uses the >>> 82598 chip instead of >>> the 82599 in the X520. If I get similar >>> results with my >>> fiber transcievers, >>> I'll see if I can get a hold of copper ones. >>> >>> But I should mention that I did indeed >>> look at PHY/MAC >>> error rates, and >>> they are nil. >>> >>> -Warren V >>> >>> On Fri, Feb 20, 2015 at 7:25 PM, Chris >>> Siebenmann >>> >> >> >>> >> >>> >>> wrote: >>> >>> >>> After installation and >>> configuration, I observed >>> all kinds of bad >>> behavior >>> in the network traffic between the >>> hosts and the >>> server. All of this >>> bad >>> behavior is traced to the ixgbe >>> driver on the >>> storage server. Without >>> going >>> into the full troubleshooting >>> process, here are >>> my takeaways: >>> >>> [...] >>> >>> For what it's worth, we managed to >>> achieve much >>> better line rates on >>> copper 10G ixgbe hardware of various >>> descriptions >>> between OmniOS >>> and CentOS 7 (I don't think we ever >>> tested OmniOS to >>> OmniOS). I don't >>> believe OmniOS could do TCP at full >>> line rate but I >>> think we managed 700+ >>> Mbytes/sec on both transmit and >>> receive and we got >>> basically disk-limited >>> speeds with iSCSI (across multiple >>> disks on >>> multi-disk mirrored pools, >>> OmniOS iSCSI initiator, Linux iSCSI >>> targets). >>> >>> I don't believe we did any specific >>> kernel tuning >>> (and in fact some of >>> our attempts to fiddle ixgbe driver >>> parameters blew >>> up in our face). >>> We did tune iSCSI connection >>> parameters to increase >>> various buffer >>> sizes so that ZFS could do even large >>> single >>> operations in single iSCSI >>> transactions. (More details available >>> if people are >>> interested.) >>> >>> 10: At the wire level, the speed >>> problems are >>> clearly due to pauses in >>> response time by omnios. At 9000 >>> byte frame >>> sizes, I see a good number >>> of duplicate ACKs and fast >>> retransmits during >>> read operations (when >>> omnios is transmitting). But below >>> about a >>> 4100-byte MTU on omnios >>> (which seems to correlate to >>> 4096-byte iSCSI >>> block transfers), the >>> transmission errors fade away and >>> we only see >>> the transmission pause >>> problem. >>> >>> >>> This is what really attracted my >>> attention. In >>> our OmniOS setup, our >>> specific Intel hardware had ixgbe >>> driver issues that >>> could cause >>> activity stalls during once-a-second >>> link heartbeat >>> checks. This >>> obviously had an effect at the TCP and >>> iSCSI layers. 
>>> My initial message >>> to illumos-developer sparked a >>> potentially >>> interesting discussion: >>> >>> >>> http://www.listbox.com/member/____archive/182179/2014/10/ >>> sort/____time_rev/page/16/entry/6:__405/__20141003125035:6357079A-__ >>> 4B1D-__11E4-A39C-D534381BA44D/ >>> >> 10/sort/__time_rev/page/16/entry/6:405/__20141003125035: >>> 6357079A-4B1D-__11E4-A39C-D534381BA44D/> >>> >>> >> __sort/time_rev/page/16/entry/6:__405/20141003125035: >>> 6357079A-__4B1D-11E4-A39C-D534381BA44D/ >>> >> sort/time_rev/page/16/entry/6:405/20141003125035:6357079A- >>> 4B1D-11E4-A39C-D534381BA44D/>> >>> >>> If you think this is a possibility in >>> your setup, >>> I've put the DTrace >>> script I used to hunt for this up on >>> the web: >>> >>> http://www.cs.toronto.edu/~____cks/src/omnios-ixgbe/ixgbe___ >>> __delay.d >>> >> delay.d> >>> >>> >> delay.d >>> >> delay.d>> >>> >>> This isn't the only potential source >>> of driver >>> stalls by any means, it's >>> just the one I found. You may also >>> want to look at >>> lockstat in general, >>> as information it reported is what led >>> us to look >>> specifically at the >>> ixgbe code here. >>> >>> (If you suspect kernel/driver issues, >>> lockstat >>> combined with kernel >>> source is a really excellent resource.) >>> >>> - cks >>> >>> >>> >>> >>> >>> ___________________________________________________ >>> OmniOS-discuss mailing list >>> OmniOS-discuss at lists.omniti >>> .____com >>> >> > >>> http://lists.omniti.com/____mailman/listinfo/omnios-____ >>> discuss >>> >> > >>> >>> >> > >>> >>> >>> ___________________________________________________ >>> OmniOS-discuss mailing list >>> OmniOS-discuss at lists.omniti >>> .____com >>> >> > >>> http://lists.omniti.com/____mailman/listinfo/omnios-____ >>> discuss >>> >> > >>> >>> >> > >>> >>> >>> -- >>> OSN Online Service Nuernberg GmbH, Bucher Str. 78, >>> 90408 Nuernberg >>> Tel: +49 911 39905-0 >>> - Fax: +49 911 >>> 39905-55 - >>> http://www.osn.de >>> HRB 15022 Nuernberg, USt-Id: DE189301263, GF: Joerg >>> Goltermann >>> >>> >>> >>> -- >>> OSN Online Service Nuernberg GmbH, Bucher Str. 78, 90408 >>> Nuernberg >>> Tel: +49 911 39905-0 - Fax: +49 >>> 911 39905-55 - http://www.osn.de >>> >>> HRB 15022 Nuernberg, USt-Id: DE189301263, GF: Joerg Goltermann >>> >>> >>> *illumos-developer* | Archives >>> >>> >> > >>> | Modify Your Subscription >>> [Powered by Listbox] >>> >>> >> >> *illumos-developer* | Archives >> >> | >> Modify >> > secret=21175123-d92578cc> >> Your Subscription [Powered by Listbox] >> >> > -- > OSN Online Service Nuernberg GmbH, Bucher Str. 78, 90408 Nuernberg > Tel: +49 911 39905-0 - Fax: +49 911 39905-55 - http://www.osn.de > HRB 15022 Nuernberg, USt-Id: DE189301263, GF: Joerg Goltermann > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: 

From danmcd at omniti.com  Mon Mar 2 19:15:45 2015
From: danmcd at omniti.com (Dan McDonald)
Date: Mon, 2 Mar 2015 14:15:45 -0500
Subject: [OmniOS-discuss] [developer] Re: The ixgbe driver, Lindsay Lohan, and the Greek economy
In-Reply-To: 
References: <20150221032559.727D07A0792@apps0.cs.toronto.edu> <54EAEFA8.4020101@osn.de> <54EB5392.6030900@osn.de> <279C084B-E76B-4F38-A7CF-D6CB37D06D8F@damore.org> <54F44602.5030705@osn.de>
Message-ID: 

> On Mar 2, 2015, at 2:07 PM, W Verb via illumos-developer wrote:
>
> Count indv cuml rcnt     nsec Hottest Lock           Caller
> -------------------------------------------------------------------------------
>  9306  44%  44% 0.00     1557 htable_mutex+0x370     htable_release
>  6307  23%  68% 0.00     1207 htable_mutex+0x108     htable_lookup
>   596   7%  75% 0.00     4100 0xffffff0931705188     cv_wait
>   349   5%  80% 0.00     4437 0xffffff0931705188     taskq_thread
>   704   2%  82% 0.00      995 0xffffff0935de3c50     dbuf_create

That has NOTHING to do with TCP.  It has everything to do with the Virtual Memory subsystem.  Here, see all the callers to htable_release():

http://src.illumos.org/source/search?q=&defs=&refs=htable_release&path=&hist=&project=illumos-gate

I think "VM thrashing" when I see that.

Dan

From garrett at damore.org  Mon Mar 2 19:30:18 2015
From: garrett at damore.org (Garrett D'Amore)
Date: Mon, 2 Mar 2015 11:30:18 -0800
Subject: [OmniOS-discuss] [developer] Re: The ixgbe driver, Lindsay Lohan, and the Greek economy
In-Reply-To: 
References: <20150221032559.727D07A0792@apps0.cs.toronto.edu> <54EAEFA8.4020101@osn.de> <54EB5392.6030900@osn.de> <279C084B-E76B-4F38-A7CF-D6CB37D06D8F@damore.org> <54F44602.5030705@osn.de>
Message-ID: <064C3FC8-43E0-4757-AADA-831303782A4C@damore.org>

Here's a theory.  You are using small (relatively) MTUs (3000 is less than the smallest ZFS block size.)  So, when you go multipathing this way, might a single upper layer transaction (ZFS block transfer request, or for that matter COMSTAR block request) get routed over different paths.  This sounds like a potentially pathological condition to me.

What happens if you increase the MTU to 9000?  Have you tried it?  I'm sort of thinking that this will permit each transaction to be issued in a single IP frame, which may alleviate certain tragic code paths.  (That said, I'm not sure how aware COMSTAR is of the IP MTU.  If it is ignorant, then it shouldn't matter *that* much, since TCP should do the right thing here and a single TCP stream should stick to a single underlying NIC.  But if COMSTAR is aware of the MTU, it may do some really screwball things as it tries to break requests up into single frames.)

Your read spin really looks like only about 22 msec of wait out of a total run of 30 sec.  (That's not *great*, but neither does it sound tragic.)  Your write is interesting because that looks like it is going a wildly different path.  You should be aware that the locks you see are *not* necessarily related in call order, but rather are ordered by instance count.  The write code path hitting the task_thread as hard as it does is really, really weird.  Something is pounding on a taskq lock super hard.  The number of taskq_dispatch_ent calls is interesting here.  I'm starting to wonder if it's something as stupid as a spin where if the taskq is "full" (max size reached), a caller just is spinning trying to dispatch jobs to the taskq.

The taskq_dispatch_ent code is super simple, and it should be almost impossible to have contention on that lock -
barring a thread spinning hard on taskq_dispatch (or taskq_dispatch_ent as I think is happening here). Looking at the various call sites, there are places in both COMSTAR (iscsit) and in ZFS where this could be coming from. To know which, we really need to have the back trace associated. lockstat can give this ? try giving ?-s 5? to give a short backtrace from this, that will probably give us a little more info about the guilty caller. :-) - Garrett > On Mar 2, 2015, at 11:07 AM, W Verb via illumos-developer wrote: > > Hello all, > I am not using layer 2 flow control. The switch carries line-rate 10G traffic without error. > > I think I have found the issue via lockstat. The first lockstat is taken during a multipath read: > > > lockstat -kWP sleep 30 > > Adaptive mutex spin: 21331 events in 30.020 seconds (711 events/sec) > > Count indv cuml rcnt nsec Hottest Lock Caller > ------------------------------------------------------------------------------- > 9306 44% 44% 0.00 1557 htable_mutex+0x370 htable_release > 6307 23% 68% 0.00 1207 htable_mutex+0x108 htable_lookup > 596 7% 75% 0.00 4100 0xffffff0931705188 cv_wait > 349 5% 80% 0.00 4437 0xffffff0931705188 taskq_thread > 704 2% 82% 0.00 995 0xffffff0935de3c50 dbuf_create > > The hash table being read here I would guess is the tcp connection hash table. > > When lockstat is run during a multipath write operation, I get: > > Adaptive mutex spin: 1097341 events in 30.016 seconds (36558 events/sec) > > Count indv cuml rcnt nsec Hottest Lock Caller > ------------------------------------------------------------------------------- > 210752 28% 28% 0.00 4781 0xffffff0931705188 taskq_thread > 174471 22% 50% 0.00 4476 0xffffff0931705188 cv_wait > 127183 10% 61% 0.00 2871 0xffffff096f29b510 zio_notify_parent > 176066 10% 70% 0.00 1922 0xffffff0931705188 taskq_dispatch_ent > 105134 9% 80% 0.00 3110 0xffffff096ffdbf10 zio_remove_child > 67512 4% 83% 0.00 1938 0xffffff096f3db4b0 zio_add_child > 45736 3% 86% 0.00 2239 0xffffff0935de3c50 dbuf_destroy > 27781 3% 89% 0.00 3416 0xffffff0935de3c50 dbuf_create > 38536 2% 91% 0.00 2122 0xffffff0935de3b70 dnode_rele > 27841 2% 93% 0.00 2423 0xffffff0935de3b70 dnode_diduse_space > 19020 2% 95% 0.00 3046 0xffffff09d9e305e0 dbuf_rele > 14627 1% 96% 0.00 3632 dbuf_hash_table+0x4f8 dbuf_find > > > > Writes are not performing htable lookups, while reads are. > > -Warren V > > > > > > > On Mon, Mar 2, 2015 at 3:14 AM, Joerg Goltermann > wrote: > Hi, > > I would try *one* TPG which includes both interface addresses > and I would double check for packet drops on the Catalyst. > > The 3560 supports only receive flow control which means, that > a sending 10Gbit port can easily overload a 1Gbit port. > Do you have flow control enabled? > > - Joerg > > > On 02.03.2015 09:22, W Verb via illumos-developer wrote: > Hello Garrett, > > No, no 802.3ad going on in this config. > > Here is a basic schematic: > > https://drive.google.com/file/d/0BwyUMjibonYQVkVqcE5OQUJyUUU/view?usp=sharing > > Here is the Nexenta MPIO iSCSI Setup Document that I used as a guide: > > https://drive.google.com/file/d/0BwyUMjibonYQbjEyUTBjN2tTNWM/view?usp=sharing > > Note that I am using an MTU of 3000 on both the 10G and 1G NICs. The > switch is set to allow 9148-byte frames, and I'm not seeing any > errors/buffer overruns on the switch. > > Here is a screenshot of a packet capture from a read operation on the > guest OS (from it's local drive, which is actually a VMDK file on the > storage server). 
In this example, only a single 1G ESXi kernel interface > (vmk1) is bound to the software iSCSI initiator. > > https://drive.google.com/file/d/0BwyUMjibonYQa2NYdXhpZkpkbU0/view?usp=sharing > > Note that there's a nice, well-behaved window sizing process taking > place. The ESXi decreases the scaled window by 11 or 12 for each ACK, > then bumps it back up to 512. > > Here is a similar screenshot of a single-interface write operation: > > https://drive.google.com/file/d/0BwyUMjibonYQbU1RZHRnakxDSFU/view?usp=sharing > > There are no pauses or gaps in the transmission rate in the > single-interface transfers. > > > In the next screenshots, I have enabled an additional 1G interface on > the ESXi host, and bound it to the iSCSI initiator. The new interface is > bound to a separate physical port, uses a different VLAN on the switch, > and talks to a different 10G port on the storage server. > > First, let's look at a write operation on the guest OS, which happily > pumps data at near-line-rate to the storage server. > > Here is a sequence number trace diagram. Note how the transfer has a > nice, smooth increment rate over the entire transfer. > > https://drive.google.com/file/d/0BwyUMjibonYQWHNIa0drWnNxMmM/view?usp=sharing > > Here are screenshots from packet captures on both 1G interfaces: > > https://drive.google.com/file/d/0BwyUMjibonYQRWhyVVQ4djNaU3c/view?usp=sharing > https://drive.google.com/file/d/0BwyUMjibonYQaTVjTEtTRloyR2c/view?usp=sharing > > Note how we again see nice, smooth window adjustment, and no gaps in > transmission. > > > But now, let's look at the problematic two-interface Read operation. > First, the sequence graph: > > https://drive.google.com/file/d/0BwyUMjibonYQTzdFVWdQMWZ6LUU/view?usp=sharing > > As you can see, there are gaps and jumps in the transmission throughout > the transfer. > It is very illustrative to look at captures of the gaps, which are > occurring on both interfaces: > > https://drive.google.com/file/d/0BwyUMjibonYQc0VISXN6eVFwQzg/view?usp=sharing > https://drive.google.com/file/d/0BwyUMjibonYQVFREUHp3TGFiUU0/view?usp=sharing > > As you can see, there are ~.4 second pauses in transmission from the > storage server, which kills the transfer rate. > It's clear that the ESXi box ACKs the prior iSCSI operation to > completion, then makes a new LUN request, which the storage server > immediately replies to. The ESXi ACKs the response packet from the > storage server, then waits...and waits....and waits... until eventually > the storage server starts transmitting again. > > Because the pause happens while the ESXi client is waiting for a packet > from the storage server, that tells me that the gaps are not an artifact > of traffic being switched between both active interfaces, but are > actually indicative of short hangs occurring on the server. > > Having a pause or two in transmission is no big deal, but in my case, it > is happening constantly, and dropping my overall read transfer rate down > to 20-60MB/s, which is slower than the single interface transfer rate > (~90-100MB/s). > > Decreasing the MTU makes the pauses shorter, increasing them makes the > pauses longer. > > Another interesting thing is that if I set the multipath io interval to > 3 operations instead of 1, I get better throughput. In other words, the > less frequently I swap IP addresses on my iSCSI requests from the ESXi > unit, the fewer pauses I see. > > Basically, COMSTAR seems to choke each time an iSCSI request from a new > IP arrives. 
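Joerg's earlier single-TPG suggestion maps to a couple of itadm commands on the COMSTAR side. A hedged sketch with placeholder portal addresses and target IQN (itadm(1M) is the authoritative reference):

  itadm create-tpg tpg-multi 10.0.10.1 10.0.20.1                    # one TPG holding both portal addresses
  itadm modify-target -t tpg-multi iqn.2010-09.org.example:target0  # bind the target to that single TPG
  itadm list-target -v                                              # confirm both portals are advertised

With a single TPG the initiator still sees two paths, but every login lands in the same target portal group, which removes one variable from the multipath picture.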
> > Because the single interface transfer is near line rate, that tells me > that the storage system (mpt_sas, zfs, etc) is working fine. It's only > when multiple paths are attempted that iSCSI falls on its face during reads. > > All of these captures were taken without a cache device being attached > to the storage zpool, so this isn't looking like some kind of ZFS ARC > problem. As mentioned previously, local transfers to/from the zpool are > showing ~300-500 MB/s rates over long transfers (10G+). > > -Warren V > > On Sun, Mar 1, 2015 at 9:11 PM, Garrett D'Amore > >> wrote: > > I?m not sure I?ve followed properly. You have *two* interfaces. > You are not trying to provision these in an aggr are you? As far as > I?m aware, VMware does not support 802.3ad link aggregations. (Its > possible that you can make it work with ESXi if you give the entire > NIC to the guest ? but I?m skeptical.) The problem is that if you > try to use link aggregation, some packets (up to half!) will be > lost. TCP and other protocols fare poorly in this situation. > > Its possible I?ve totally misunderstood what you?re trying to do, in > which case I apologize. > > The idle thing is a red-herring ? the cpu is waiting for work to do, > probably because packets haven?t arrived (or where dropped by the > hypervisor!) I wouldn?t read too much into that except that your > network stack is in trouble. I?d look a bit more closely at the > kstats for tcp ? I suspect you?ll see retransmits or out of order > values that are unusually high ? if so this may help validate my > theory above. > > - Garrett > > On Mar 1, 2015, at 9:03 PM, W Verb via illumos-developer > >> > > wrote: > > Hello all, > > > Well, I no longer blame the ixgbe driver for the problems I'm seeing. > > > I tried Joerg's updated driver, which didn't improve the issue. So > I went back to the drawing board and rebuilt the server from scratch. > > What I noted is that if I have only a single 1-gig physical > interface active on the ESXi host, everything works as expected. > As soon as I enable two interfaces, I start seeing the performance > problems I've described. > > Response pauses from the server that I see in TCPdumps are still > leading me to believe the problem is delay on the server side, so > I ran a series of kernel dtraces and produced some flamegraphs. > > > This was taken during a read operation with two active 10G > interfaces on the server, with a single target being shared by two > tpgs- one tpg for each 10G physical port. The host device has two > 1G ports enabled, with VLANs separating the active ports into > 10G/1G pairs. ESXi is set to multipath using both VLANS with a > round-robin IO interval of 1. > > https://drive.google.com/file/d/0BwyUMjibonYQd3ZYOGh4d2pteGs/view?usp=sharing > > > This was taken during a write operation: > > https://drive.google.com/file/d/0BwyUMjibonYQMnBtU1Q2SXM2ams/view?usp=sharing > > > I then rebooted the server and disabled C-State, ACPI T-State, and > general EIST (Turbo boost) functionality in the CPU. > > I when I attempted to boot my guest VM, the iSCSI transfer > gradually ground to a halt during the boot loading process, and > the guest OS never did complete its boot process. 
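The flamegraphs linked below were presumably produced with the DTrace profile provider plus Brendan Gregg's FlameGraph scripts; a minimal sketch of that workflow, with arbitrary output names (stackcollapse.pl and flamegraph.pl come from the FlameGraph toolkit):

  dtrace -x stackframes=100 -n 'profile-997 /arg0/ { @[stack()] = count(); } tick-30s { exit(0); }' -o kern.stacks
  stackcollapse.pl kern.stacks > kern.folded
  flamegraph.pl kern.folded > kern.svg

The /arg0/ predicate keeps only samples taken in kernel context, which is what makes the unix`thread_start and unix`idle towers visible.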
> > Here is a flamegraph taken while iSCSI is slowly dying: > > https://drive.google.com/file/d/0BwyUMjibonYQM21JeFZPX3dZWTg/view?usp=sharing > > > I edited out cpu_idle_adaptive from the dtrace output and > regenerated the slowdown graph: > > https://drive.google.com/file/d/0BwyUMjibonYQbTVwV3NvXzlPS1E/view?usp=sharing > > > I then edited cpu_idle_adaptive out of the speedy write operation > and regenerated that graph: > > https://drive.google.com/file/d/0BwyUMjibonYQeWFYM0pCMDZ1X2s/view?usp=sharing > > > I have zero experience with interpreting flamegraphs, but the most > significant difference I see between the slow read example and the > fast write example is in unix`thread_start --> unix`idle. There's > a good chunk of "unix`i86_mwait" in the read example that is not > present in the write example at all. > > Disabling the l2arc cache device didn't make a difference, and I > had to reenable EIST support on the CPU to get my VMs to boot. > > I am seeing a variety of bug reports going back to 2010 regarding > excessive mwait operations, with the suggested solutions usually > being to set "cpupm enable poll-mode" in power.conf. That change > also had no effect on speed. > > -Warren V > > > > > -----Original Message----- > > From: Chris Siebenmann [mailto:cks at cs.toronto.edu ] > > Sent: Monday, February 23, 2015 8:30 AM > > To: W Verb > > Cc: omnios-discuss at lists.omniti.com > >; cks at cs.toronto.edu > > > > Subject: Re: [OmniOS-discuss] The ixgbe driver, Lindsay Lohan, and > the Greek economy > > > > Chris, thanks for your specific details. I'd appreciate it if you > > > could tell me which copper NIC you tried, as well as to pass on the > > > iSCSI tuning parameters. > > > Our copper NIC experience is with onboard X540-AT2 ports on > SuperMicro hardware (which have the guaranteed 10-20 msec lock > hold) and dual-port 82599EB TN cards (which have some sort of > driver/hardware failure under load that eventually leads to > 2-second lock holds). I can't recommend either with the current > driver; we had to revert to 1G networking in order to get stable > servers. > > > The iSCSI parameter modifications we do, across both initiators > and targets, are: > > > initialr2tno > > firstburstlength128k > > maxrecvdataseglen128k[only on Linux backends] > > maxxmitdataseglen128k[only on Linux backends] > > > The OmniOS initiator doesn't need tuning for more than the first > two parameters; on the Linux backends we tune up all four. My > extended thoughts on these tuning parameters and why we touch them > can be found > > here: > > > http://utcc.utoronto.ca/~cks/space/blog/tech/UnderstandingiSCSIProtocol > > http://utcc.utoronto.ca/~cks/space/blog/tech/LikelyISCSITuning > > > The short version is that these parameters probably only make a > small difference but their overall goal is to do 128KB ZFS reads > and writes in single iSCSI operations (although they will be > fragmented at the TCP > > layer) and to do iSCSI writes without a back-and-forth delay > between initiator and target (that's 'initialr2t no'). > > > I think basically everyone should use InitialR2T set to no and in > fact that it should be the software default. These days only > unusually limited iSCSI targets should need it to be otherwise and > they can change their setting for it (initiator and target must > both agree to it being 'yes', so either can veto it). 
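On an illumos initiator, those first two parameters can be set per target with iscsiadm; the sketch below is from memory, with a placeholder IQN, and iscsiadm(1M) has the exact parameter names (the values remain subject to negotiation with the target):

  iscsiadm modify target-param -p initialr2t=no iqn.1992-08.com.example:backend0
  iscsiadm modify target-param -p firstburstlength=131072 iqn.1992-08.com.example:backend0
  iscsiadm list target-param -v iqn.1992-08.com.example:backend0    # compare configured vs negotiated values

In the setup being debugged here OmniOS is the target rather than the initiator, so the equivalent knobs would live in the ESXi software-iSCSI advanced settings instead.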
> > > - cks > > > > On Mon, Feb 23, 2015 at 8:21 AM, Joerg Goltermann > >> wrote: > > Hi, > > I think your problem is caused by your link properties or your > switch settings. In general the standard ixgbe seems to perform > well. > > I had trouble after changing the default flow control settings > to "bi" > and this was my motivation to update the ixgbe driver a long > time ago. > After I have updated our systems to ixgbe 2.5.8 I never had any > problems .... > > Make sure your switch has support for jumbo frames and you use > the same mtu on all ports, otherwise the smallest will be used. > > What switch do you use? I can tell you nice horror stories about > different vendors.... > > - Joerg > > On 23.02.2015 10:31, W Verb wrote: > > Thank you Joerg, > > I've downloaded the package and will try it tomorrow. > > The only thing I can add at this point is that upon review > of my > testing, I may have performed my "pkg -u" between the > initial quad-gig > performance test and installing the 10G NIC. So this may > be a new > problem introduced in the latest updates. > > Those of you who are running 10G and have not upgraded to > the latest > kernel, etc, might want to do some additional testing > before running the > update. > > -Warren V > > On Mon, Feb 23, 2015 at 1:15 AM, Joerg Goltermann > > > >>> wrote: > > Hi, > > I remember there was a problem with the flow control > settings in the > ixgbe > driver, so I updated it a long time ago for our > internal servers to > 2.5.8. > Last weekend I integrated the latest changes from the > FreeBSD driver > to bring > the illumos ixgbe to 2.5.25 but I had no time to test > it, so it's > completely > untested! > > > If you would like to give the latest driver a try you > can fetch the > kernel modules from > https://cloud.osn.de/index.____php/s/Fb4so9RsNnXA7r9 > > > > >> > > Clone your boot environment, place the modules in the > new environment > and update the boot-archive of the new BE. > > - Joerg > > > > > > On 23.02.2015 02:54, W Verb wrote: > > By the way, to those of you who have working > setups: please send me > your pool/volume settings, interface linkprops, > and any kernel > tuning > parameters you may have set. > > Thanks, > Warren V > > On Sat, Feb 21, 2015 at 7:59 AM, Schweiss, Chip > > > >>> > > wrote: > > I can't say I totally agree with your performance > assessment. I run Intel > X520 in all my OmniOS boxes. > > Here is a capture of nfssvrtop I made while > running many > storage vMotions > between two OmniOS boxes hosting NFS > datastores. This is a > 10 host VMware > cluster. Both OmniOS boxes are dual 10G > connected with > copper twin-ax to > the in rack Nexus 5010. > > VMware does 100% sync writes, I use ZeusRAM > SSDs for log > devices. 
> > -Chip > > 2014 Apr 24 08:05:51, load: 12.64, read: > 17330243 KB, > swrite: 15985 KB, > awrite: 1875455 KB > > Ver Client NFSOPS Reads > SWrites AWrites > Commits Rd_bw > SWr_bw AWr_bw Rd_t SWr_t AWr_t > Com_t Align% > > 4 10.28.17.105 0 0 > 0 0 > 0 0 > 0 0 0 0 0 0 0 > > 4 10.28.17.215 0 0 > 0 0 > 0 0 > 0 0 0 0 0 0 0 > > 4 10.28.17.213 0 0 > 0 0 > 0 0 > 0 0 0 0 0 0 0 > > 4 10.28.16.151 0 0 > 0 0 > 0 0 > 0 0 0 0 0 0 0 > > 4 all 1 0 > 0 0 > 0 0 > 0 0 0 0 0 0 0 > > 3 10.28.16.175 3 0 > 3 0 > 0 1 > 11 0 4806 48 0 0 85 > > 3 10.28.16.183 6 0 > 6 0 > 0 3 > 162 0 549 124 0 0 > 73 > > 3 10.28.16.180 11 0 > 10 0 > 0 3 > 27 0 776 89 0 0 67 > > 3 10.28.16.176 28 2 > 26 0 > 0 10 > 405 0 2572 198 0 0 > 100 > > 3 10.28.16.178 4606 4602 > 4 0 > 0 294534 > 3 0 723 49 0 0 99 > > 3 10.28.16.179 4905 4879 > 26 0 > 0 312208 > 311 0 735 271 0 0 > 99 > > 3 10.28.16.181 5515 5502 > 13 0 > 0 352107 > 77 0 89 87 0 0 99 > > 3 10.28.16.184 12095 12059 > 10 0 > 0 763014 > 39 0 249 147 0 0 99 > > 3 10.28.58.1 15401 6040 > 116 6354 > 53 191605 > 474 202346 192 96 144 83 > 99 > > 3 all 42574 33086 > 217 > 6354 53 1913488 > 1582 202300 348 138 153 105 > 99 > > > > > > On Fri, Feb 20, 2015 at 11:46 PM, W Verb > > > > > >>> wrote: > > > Hello All, > > Thank you for your replies. > I tried a few things, and found the following: > > 1: Disabling hyperthreading support in the > BIOS drops > performance overall > by a factor of 4. > 2: Disabling VT support also seems to have > some effect, > although it > appears to be minor. But this has the > amusing side > effect of fixing the > hangs I've been experiencing with fast > reboot. Probably > by disabling kvm. > 3: The performance tests are a bit tricky > to quantify > because of caching > effects. In fact, I'm not entirely sure > what is > happening here. It's just > best to describe what I'm seeing: > > The commands I'm using to test are > dd if=/dev/zero of=./test.dd bs=2M count=5000 > dd of=/dev/null if=./test.dd bs=2M count=5000 > The host vm is running Centos 6.6, and has > the latest > vmtools installed. > There is a host cache on an SSD local to > the host that > is also in place. > Disabling the host cache didn't > immediately have an > effect as far as I could > see. > > The host MTU set to 3000 on all iSCSI > interfaces for all > tests. > > Test 1: Right after reboot, with an ixgbe > MTU of 9000, > the write test > yields an average speed over three tests > of 137MB/s. The > read test yields an > average over three tests of 5MB/s. > > Test 2: After setting "ifconfig ixgbe0 mtu > 3000", the > write tests yield > 140MB/s, and the read tests yield 53MB/s. > It's important > to note here that > if I cut the read test short at only > 2-3GB, I get > results upwards of > 350MB/s, which I assume is local > cache-related distortion. > > Test 3: MTU of 1500. Read tests are up to > 156 MB/s. > Write tests yield > about 142MB/s. > Test 4: MTU of 1000: Read test at 182MB/s. > Test 5: MTU of 900: Read test at 130 MB/s. > Test 6: MTU of 1000: Read test at 160MB/s. > Write tests > are now > consistently at about 300MB/s. > Test 7: MTU of 1200: Read test at 124MB/s. > Test 8: MTU of 1000: Read test at 161MB/s. > Write at 261MB/s. > > A few final notes: > L1ARC grabs about 10GB of RAM during the > tests, so > there's definitely some > read caching going on. > The write operations are easier to observe > with iostat, > and I'm seeing io > rates that closely correlate with the > network write speeds. > > > Chris, thanks for your specific details. 
> I'd appreciate > it if you could > tell me which copper NIC you tried, as > well as to pass > on the iSCSI tuning > parameters. > > I've ordered an Intel EXPX9502AFXSR, which > uses the > 82598 chip instead of > the 82599 in the X520. If I get similar > results with my > fiber transcievers, > I'll see if I can get a hold of copper ones. > > But I should mention that I did indeed > look at PHY/MAC > error rates, and > they are nil. > > -Warren V > > On Fri, Feb 20, 2015 at 7:25 PM, Chris > Siebenmann > > > > > >>> > > wrote: > > > After installation and > configuration, I observed > all kinds of bad > behavior > in the network traffic between the > hosts and the > server. All of this > bad > behavior is traced to the ixgbe > driver on the > storage server. Without > going > into the full troubleshooting > process, here are > my takeaways: > > [...] > > For what it's worth, we managed to > achieve much > better line rates on > copper 10G ixgbe hardware of various > descriptions > between OmniOS > and CentOS 7 (I don't think we ever > tested OmniOS to > OmniOS). I don't > believe OmniOS could do TCP at full > line rate but I > think we managed 700+ > Mbytes/sec on both transmit and > receive and we got > basically disk-limited > speeds with iSCSI (across multiple > disks on > multi-disk mirrored pools, > OmniOS iSCSI initiator, Linux iSCSI > targets). > > I don't believe we did any specific > kernel tuning > (and in fact some of > our attempts to fiddle ixgbe driver > parameters blew > up in our face). > We did tune iSCSI connection > parameters to increase > various buffer > sizes so that ZFS could do even large > single > operations in single iSCSI > transactions. (More details available > if people are > interested.) > > 10: At the wire level, the speed > problems are > clearly due to pauses in > response time by omnios. At 9000 > byte frame > sizes, I see a good number > of duplicate ACKs and fast > retransmits during > read operations (when > omnios is transmitting). But below > about a > 4100-byte MTU on omnios > (which seems to correlate to > 4096-byte iSCSI > block transfers), the > transmission errors fade away and > we only see > the transmission pause > problem. > > > This is what really attracted my > attention. In > our OmniOS setup, our > specific Intel hardware had ixgbe > driver issues that > could cause > activity stalls during once-a-second > link heartbeat > checks. This > obviously had an effect at the TCP and > iSCSI layers. > My initial message > to illumos-developer sparked a potentially > interesting discussion: > > > http://www.listbox.com/member/____archive/182179/2014/10/sort/____time_rev/page/16/entry/6:__405/__20141003125035:6357079A-__4B1D-__11E4-A39C-D534381BA44D/ > > > > > >> > > If you think this is a possibility in > your setup, > I've put the DTrace > script I used to hunt for this up on > the web: > > http://www.cs.toronto.edu/~____cks/src/omnios-ixgbe/ixgbe_____delay.d > > > > > >> > > This isn't the only potential source > of driver > stalls by any means, it's > just the one I found. You may also > want to look at > lockstat in general, > as information it reported is what led > us to look > specifically at the > ixgbe code here. > > (If you suspect kernel/driver issues, > lockstat > combined with kernel > source is a really excellent resource.) 
> - cks
>
> _______________________________________________
> OmniOS-discuss mailing list
> OmniOS-discuss at lists.omniti.com
> http://lists.omniti.com/mailman/listinfo/omnios-discuss
>
> --
> OSN Online Service Nuernberg GmbH, Bucher Str. 78, 90408 Nuernberg
> Tel: +49 911 39905-0 - Fax: +49 911 39905-55 - http://www.osn.de
> HRB 15022 Nuernberg, USt-Id: DE189301263, GF: Joerg Goltermann
>
> illumos-developer | Archives | Modify Your Subscription

-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From wverb73 at gmail.com  Mon Mar 2 20:19:44 2015
From: wverb73 at gmail.com (W Verb)
Date: Mon, 2 Mar 2015 12:19:44 -0800
Subject: [OmniOS-discuss] [developer] Re: The ixgbe driver, Lindsay Lohan, and the Greek economy
In-Reply-To: 
References: <20150221032559.727D07A0792@apps0.cs.toronto.edu> <54EAEFA8.4020101@osn.de> <54EB5392.6030900@osn.de> <279C084B-E76B-4F38-A7CF-D6CB37D06D8F@damore.org> <54F44602.5030705@osn.de>
Message-ID: 

Hello,

vmstat seems pretty boring. Certainly nothing going to swap.

root at sanbox:/root# vmstat
 kthr      memory            page            disk          faults      cpu
 r b w   swap      free     re  mf pi po fr de sr  po ro s0 s2   in   sy   cs us sy id
 0 0 0 34631632 30728068   175 215  0  0  0  0 963 275  4  6 140 3301  796 6681  0  1 99

Here is the "taskq_dispatch_ent" output from "lockstat -s 5 -kWP sleep 30" during the "fast" write operation.
------------------------------------------------------------------------------- Count indv cuml rcnt nsec Hottest Lock Caller 50934 3% 79% 0.00 3437 0xffffff093145ba40 taskq_dispatch_ent nsec ------ Time Distribution ------ count Stack 128 | 7 spa_taskq_dispatch_ent 256 |@@ 4333 zio_taskq_dispatch 512 |@@ 3863 zio_issue_async 1024 |@@@@@ 9717 zio_execute 2048 |@@@@@@@@@ 15904 4096 |@@@@ 7595 8192 |@@ 4498 16384 |@ 2662 32768 |@ 1886 65536 | 434 131072 | 34 262144 | 1 ------------------------------------------------------------------------------- However, the truly "broken" function is a read operation: Top lock 1st try: ------------------------------------------------------------------------------- Count indv cuml rcnt nsec Hottest Lock Caller 474 15% 15% 0.00 7031 0xffffff093145b6f8 cv_wait nsec ------ Time Distribution ------ count Stack 256 |@ 29 taskq_thread_wait 512 |@@@@@@ 100 taskq_thread 1024 |@@@@ 72 thread_start 2048 |@@@@ 69 4096 |@@@ 51 8192 |@@ 47 16384 |@@ 44 32768 |@@ 32 65536 |@ 25 131072 | 5 ------------------------------------------------------------------------------- Top lock 2nd try: ------------------------------------------------------------------------------- Count indv cuml rcnt nsec Hottest Lock Caller 174 39% 39% 0.00 103909 0xffffff0943f116a0 dmu_zfetch_find nsec ------ Time Distribution ------ count Stack 2048 | 2 dmu_zfetch 4096 | 3 dbuf_read 8192 | 4 dmu_buf_hold_array_by_dnode 16384 | 3 dmu_buf_hold_array 32768 |@ 7 65536 |@@ 14 131072 |@@@@@@@@@@@@@@@@@@@@ 116 262144 |@@@ 19 524288 | 4 1048576 | 2 ------------------------------------------------------------------------------- Top lock 3rd try: ------------------------------------------------------------------------------- Count indv cuml rcnt nsec Hottest Lock Caller 283 55% 55% 0.00 94602 0xffffff0943ff5a68 dmu_zfetch_find nsec ------ Time Distribution ------ count Stack 512 | 1 dmu_zfetch 1024 | 1 dbuf_read 2048 | 0 dmu_buf_hold_array_by_dnode 4096 | 5 dmu_buf_hold_array 8192 | 2 16384 | 7 32768 | 4 65536 |@@@ 33 131072 |@@@@@@@@@@@@@@@@@@@@ 198 262144 |@@ 27 524288 | 2 1048576 | 3 ------------------------------------------------------------------------------- As for the MTU question- setting the MTU to 9000 makes read operations grind almost to a halt at 5MB/s transfer rate. -Warren V On Mon, Mar 2, 2015 at 11:30 AM, Garrett D'Amore wrote: > Here?s a theory. You are using small (relatively) MTUs (3000 is less than > the smallest ZFS block size.) So, when you go multipathing this way, might > a single upper layer transaction (ZFS block transfer request, or for that > matter COMSTAR block request) get routed over different paths. This sounds > like a potentially pathological condition to me. > > What happens if you increase the MTU to 9000? Have you tried it? I?m > sort of thinking that this will permit each transaction to be issued in a > single IP frame, which may alleviate certain tragic code paths. (That > said, I?m not sure how aware COMSTAR is of the IP MTU. If it is ignorant, > then it shouldn?t matter *that* much, since TCP should do the right thing > here and a single TCP stream should stick to a single underlying NIC. But > if COMSTAR is aware of the MTU, it may do some really screwball things as > it tries to break requests up into single frames.) > > Your read spin really looks like only about 22 msec of wait out of a total > run of 30 sec. (That?s not *great*, but neither does it sound tragic.) > Your write is interesting because that looks like it is going a wildly > different path. 
You should be aware that the locks you see are *not* > necessarily related in call order, but rather are ordered by instance > count. The write code path hitting the task_thread as hard as it does is > really, really weird. Something is pounding on a taskq lock super hard. > The number of taskq_dispatch_ent calls is interesting here. I?m starting > to wonder if it?s something as stupid as a spin where if the taskq is > ?full? (max size reached), a caller just is spinning trying to dispatch > jobs to the taskq. > > The taskq_dispatch_ent code is super simple, and it should be almost > impossible to have contention on that lock ? barring a thread spinning hard > on taskq_dispatch (or taskq_dispatch_ent as I think is happening here). > Looking at the various call sites, there are places in both COMSTAR > (iscsit) and in ZFS where this could be coming from. To know which, we > really need to have the back trace associated. > > lockstat can give this ? try giving ?-s 5? to give a short backtrace from > this, that will probably give us a little more info about the guilty > caller. :-) > > - Garrett > > On Mar 2, 2015, at 11:07 AM, W Verb via illumos-developer < > developer at lists.illumos.org> wrote: > > Hello all, > I am not using layer 2 flow control. The switch carries line-rate 10G > traffic without error. > > I think I have found the issue via lockstat. The first lockstat is taken > during a multipath read: > > > lockstat -kWP sleep 30 > > Adaptive mutex spin: 21331 events in 30.020 seconds (711 events/sec) > > Count indv cuml rcnt nsec Hottest Lock Caller > > ------------------------------------------------------------------------------- > 9306 44% 44% 0.00 1557 htable_mutex+0x370 htable_release > 6307 23% 68% 0.00 1207 htable_mutex+0x108 htable_lookup > 596 7% 75% 0.00 4100 0xffffff0931705188 cv_wait > 349 5% 80% 0.00 4437 0xffffff0931705188 taskq_thread > 704 2% 82% 0.00 995 0xffffff0935de3c50 dbuf_create > > The hash table being read here I would guess is the tcp connection hash > table. > > When lockstat is run during a multipath write operation, I get: > > Adaptive mutex spin: 1097341 events in 30.016 seconds (36558 events/sec) > > Count indv cuml rcnt nsec Hottest Lock Caller > > ------------------------------------------------------------------------------- > 210752 28% 28% 0.00 4781 0xffffff0931705188 taskq_thread > 174471 22% 50% 0.00 4476 0xffffff0931705188 cv_wait > 127183 10% 61% 0.00 2871 0xffffff096f29b510 zio_notify_parent > 176066 10% 70% 0.00 1922 0xffffff0931705188 taskq_dispatch_ent > 105134 9% 80% 0.00 3110 0xffffff096ffdbf10 zio_remove_child > 67512 4% 83% 0.00 1938 0xffffff096f3db4b0 zio_add_child > 45736 3% 86% 0.00 2239 0xffffff0935de3c50 dbuf_destroy > 27781 3% 89% 0.00 3416 0xffffff0935de3c50 dbuf_create > 38536 2% 91% 0.00 2122 0xffffff0935de3b70 dnode_rele > 27841 2% 93% 0.00 2423 0xffffff0935de3b70 dnode_diduse_space > 19020 2% 95% 0.00 3046 0xffffff09d9e305e0 dbuf_rele > 14627 1% 96% 0.00 3632 dbuf_hash_table+0x4f8 dbuf_find > > > > Writes are not performing htable lookups, while reads are. > > -Warren V > > > > > > > On Mon, Mar 2, 2015 at 3:14 AM, Joerg Goltermann wrote: > >> Hi, >> >> I would try *one* TPG which includes both interface addresses >> and I would double check for packet drops on the Catalyst. >> >> The 3560 supports only receive flow control which means, that >> a sending 10Gbit port can easily overload a 1Gbit port. >> Do you have flow control enabled? 
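For the OmniOS side of that question, the per-link flow-control setting can be checked and changed with dladm. A minimal sketch, assuming the 10G ports are named ixgbe0 and ixgbe1 (the link names are an assumption, not confirmed in this thread):

    # Show the current flow-control mode for each 10G port
    # (possible values are no, tx, rx, and bi).
    dladm show-linkprop -p flowctrl ixgbe0
    dladm show-linkprop -p flowctrl ixgbe1

    # Turn flow control off on both ports.
    dladm set-linkprop -p flowctrl=no ixgbe0
    dladm set-linkprop -p flowctrl=no ixgbe1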
>> >> - Joerg >> >> >> On 02.03.2015 09:22, W Verb via illumos-developer wrote: >> >>> Hello Garrett, >>> >>> No, no 802.3ad going on in this config. >>> >>> Here is a basic schematic: >>> >>> https://drive.google.com/file/d/0BwyUMjibonYQVkVqcE5OQUJyUUU/ >>> view?usp=sharing >>> >>> Here is the Nexenta MPIO iSCSI Setup Document that I used as a guide: >>> >>> https://drive.google.com/file/d/0BwyUMjibonYQbjEyUTBjN2tTNWM/ >>> view?usp=sharing >>> >>> Note that I am using an MTU of 3000 on both the 10G and 1G NICs. The >>> switch is set to allow 9148-byte frames, and I'm not seeing any >>> errors/buffer overruns on the switch. >>> >>> Here is a screenshot of a packet capture from a read operation on the >>> guest OS (from it's local drive, which is actually a VMDK file on the >>> storage server). In this example, only a single 1G ESXi kernel interface >>> (vmk1) is bound to the software iSCSI initiator. >>> >>> https://drive.google.com/file/d/0BwyUMjibonYQa2NYdXhpZkpkbU0/ >>> view?usp=sharing >>> >>> Note that there's a nice, well-behaved window sizing process taking >>> place. The ESXi decreases the scaled window by 11 or 12 for each ACK, >>> then bumps it back up to 512. >>> >>> Here is a similar screenshot of a single-interface write operation: >>> >>> https://drive.google.com/file/d/0BwyUMjibonYQbU1RZHRnakxDSFU/ >>> view?usp=sharing >>> >>> There are no pauses or gaps in the transmission rate in the >>> single-interface transfers. >>> >>> >>> In the next screenshots, I have enabled an additional 1G interface on >>> the ESXi host, and bound it to the iSCSI initiator. The new interface is >>> bound to a separate physical port, uses a different VLAN on the switch, >>> and talks to a different 10G port on the storage server. >>> >>> First, let's look at a write operation on the guest OS, which happily >>> pumps data at near-line-rate to the storage server. >>> >>> Here is a sequence number trace diagram. Note how the transfer has a >>> nice, smooth increment rate over the entire transfer. >>> >>> https://drive.google.com/file/d/0BwyUMjibonYQWHNIa0drWnNxMmM/ >>> view?usp=sharing >>> >>> Here are screenshots from packet captures on both 1G interfaces: >>> >>> https://drive.google.com/file/d/0BwyUMjibonYQRWhyVVQ4djNaU3c/ >>> view?usp=sharing >>> https://drive.google.com/file/d/0BwyUMjibonYQaTVjTEtTRloyR2c/ >>> view?usp=sharing >>> >>> Note how we again see nice, smooth window adjustment, and no gaps in >>> transmission. >>> >>> >>> But now, let's look at the problematic two-interface Read operation. >>> First, the sequence graph: >>> >>> https://drive.google.com/file/d/0BwyUMjibonYQTzdFVWdQMWZ6LUU/ >>> view?usp=sharing >>> >>> As you can see, there are gaps and jumps in the transmission throughout >>> the transfer. >>> It is very illustrative to look at captures of the gaps, which are >>> occurring on both interfaces: >>> >>> https://drive.google.com/file/d/0BwyUMjibonYQc0VISXN6eVFwQzg/ >>> view?usp=sharing >>> https://drive.google.com/file/d/0BwyUMjibonYQVFREUHp3TGFiUU0/ >>> view?usp=sharing >>> >>> As you can see, there are ~.4 second pauses in transmission from the >>> storage server, which kills the transfer rate. >>> It's clear that the ESXi box ACKs the prior iSCSI operation to >>> completion, then makes a new LUN request, which the storage server >>> immediately replies to. The ESXi ACKs the response packet from the >>> storage server, then waits...and waits....and waits... until eventually >>> the storage server starts transmitting again. 
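One way to confirm that the pauses originate on the server rather than in the switch is to capture on the OmniOS box itself and compare timestamps with the ESXi-side capture. A sketch only; the interface name, the initiator address, and the default iSCSI port 3260 are assumptions:

    # Capture iSCSI traffic on one 10G port into a file that Wireshark
    # (or "snoop -i") can read later.
    snoop -r -d ixgbe0 -o /tmp/iscsi-read.cap host 10.10.10.21 and port 3260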
>>> >>> Because the pause happens while the ESXi client is waiting for a packet >>> from the storage server, that tells me that the gaps are not an artifact >>> of traffic being switched between both active interfaces, but are >>> actually indicative of short hangs occurring on the server. >>> >>> Having a pause or two in transmission is no big deal, but in my case, it >>> is happening constantly, and dropping my overall read transfer rate down >>> to 20-60MB/s, which is slower than the single interface transfer rate >>> (~90-100MB/s). >>> >>> Decreasing the MTU makes the pauses shorter, increasing them makes the >>> pauses longer. >>> >>> Another interesting thing is that if I set the multipath io interval to >>> 3 operations instead of 1, I get better throughput. In other words, the >>> less frequently I swap IP addresses on my iSCSI requests from the ESXi >>> unit, the fewer pauses I see. >>> >>> Basically, COMSTAR seems to choke each time an iSCSI request from a new >>> IP arrives. >>> >>> Because the single interface transfer is near line rate, that tells me >>> that the storage system (mpt_sas, zfs, etc) is working fine. It's only >>> when multiple paths are attempted that iSCSI falls on its face during >>> reads. >>> >>> All of these captures were taken without a cache device being attached >>> to the storage zpool, so this isn't looking like some kind of ZFS ARC >>> problem. As mentioned previously, local transfers to/from the zpool are >>> showing ~300-500 MB/s rates over long transfers (10G+). >>> >>> -Warren V >>> >>> On Sun, Mar 1, 2015 at 9:11 PM, Garrett D'Amore >> > wrote: >>> >>> I?m not sure I?ve followed properly. You have *two* interfaces. >>> You are not trying to provision these in an aggr are you? As far as >>> I?m aware, VMware does not support 802.3ad link aggregations. (Its >>> possible that you can make it work with ESXi if you give the entire >>> NIC to the guest ? but I?m skeptical.) The problem is that if you >>> try to use link aggregation, some packets (up to half!) will be >>> lost. TCP and other protocols fare poorly in this situation. >>> >>> Its possible I?ve totally misunderstood what you?re trying to do, in >>> which case I apologize. >>> >>> The idle thing is a red-herring ? the cpu is waiting for work to do, >>> probably because packets haven?t arrived (or where dropped by the >>> hypervisor!) I wouldn?t read too much into that except that your >>> network stack is in trouble. I?d look a bit more closely at the >>> kstats for tcp ? I suspect you?ll see retransmits or out of order >>> values that are unusually high ? if so this may help validate my >>> theory above. >>> >>> - Garrett >>> >>> On Mar 1, 2015, at 9:03 PM, W Verb via illumos-developer >>>> > >>>> >>>> wrote: >>>> >>>> Hello all, >>>> >>>> >>>> Well, I no longer blame the ixgbe driver for the problems I'm >>>> seeing. >>>> >>>> >>>> I tried Joerg's updated driver, which didn't improve the issue. So >>>> I went back to the drawing board and rebuilt the server from >>>> scratch. >>>> >>>> What I noted is that if I have only a single 1-gig physical >>>> interface active on the ESXi host, everything works as expected. >>>> As soon as I enable two interfaces, I start seeing the performance >>>> problems I've described. >>>> >>>> Response pauses from the server that I see in TCPdumps are still >>>> leading me to believe the problem is delay on the server side, so >>>> I ran a series of kernel dtraces and produced some flamegraphs. 
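For reference, a common way to produce kernel flamegraphs like the ones below is a profile-provider DTrace sample folded through Brendan Gregg's FlameGraph scripts. This is a sketch, not necessarily the exact commands used here; the sampling rate, duration, and file names are arbitrary, and stackcollapse.pl/flamegraph.pl come from the separate FlameGraph repository:

    # Sample on-CPU kernel stacks at 997 Hz for 30 seconds.
    dtrace -x stackframes=100 -n '
        profile-997 /arg0/ { @[stack()] = count(); }
        tick-30s { exit(0); }' -o /tmp/kernel.stacks

    # Fold the aggregated stacks and render an interactive SVG.
    stackcollapse.pl /tmp/kernel.stacks > /tmp/kernel.folded
    flamegraph.pl /tmp/kernel.folded > /tmp/kernel-read.svg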
>>>> >>>> >>>> This was taken during a read operation with two active 10G >>>> interfaces on the server, with a single target being shared by two >>>> tpgs- one tpg for each 10G physical port. The host device has two >>>> 1G ports enabled, with VLANs separating the active ports into >>>> 10G/1G pairs. ESXi is set to multipath using both VLANS with a >>>> round-robin IO interval of 1. >>>> >>>> https://drive.google.com/file/d/0BwyUMjibonYQd3ZYOGh4d2pteGs/ >>>> view?usp=sharing >>>> >>>> >>>> This was taken during a write operation: >>>> >>>> https://drive.google.com/file/d/0BwyUMjibonYQMnBtU1Q2SXM2ams/ >>>> view?usp=sharing >>>> >>>> >>>> I then rebooted the server and disabled C-State, ACPI T-State, and >>>> general EIST (Turbo boost) functionality in the CPU. >>>> >>>> I when I attempted to boot my guest VM, the iSCSI transfer >>>> gradually ground to a halt during the boot loading process, and >>>> the guest OS never did complete its boot process. >>>> >>>> Here is a flamegraph taken while iSCSI is slowly dying: >>>> >>>> https://drive.google.com/file/d/0BwyUMjibonYQM21JeFZPX3dZWTg/ >>>> view?usp=sharing >>>> >>>> >>>> I edited out cpu_idle_adaptive from the dtrace output and >>>> regenerated the slowdown graph: >>>> >>>> https://drive.google.com/file/d/0BwyUMjibonYQbTVwV3NvXzlPS1E/ >>>> view?usp=sharing >>>> >>>> >>>> I then edited cpu_idle_adaptive out of the speedy write operation >>>> and regenerated that graph: >>>> >>>> https://drive.google.com/file/d/0BwyUMjibonYQeWFYM0pCMDZ1X2s/ >>>> view?usp=sharing >>>> >>>> >>>> I have zero experience with interpreting flamegraphs, but the most >>>> significant difference I see between the slow read example and the >>>> fast write example is in unix`thread_start --> unix`idle. There's >>>> a good chunk of "unix`i86_mwait" in the read example that is not >>>> present in the write example at all. >>>> >>>> Disabling the l2arc cache device didn't make a difference, and I >>>> had to reenable EIST support on the CPU to get my VMs to boot. >>>> >>>> I am seeing a variety of bug reports going back to 2010 regarding >>>> excessive mwait operations, with the suggested solutions usually >>>> being to set "cpupm enable poll-mode" in power.conf. That change >>>> also had no effect on speed. >>>> >>>> -Warren V >>>> >>>> >>>> >>>> >>>> -----Original Message----- >>>> >>>> From: Chris Siebenmann [mailto:cks at cs.toronto.edu] >>>> >>>> Sent: Monday, February 23, 2015 8:30 AM >>>> >>>> To: W Verb >>>> >>>> Cc: omnios-discuss at lists.omniti.com >>>> ; cks at cs.toronto.edu >>>> >>>> >>>> Subject: Re: [OmniOS-discuss] The ixgbe driver, Lindsay Lohan, and >>>> the Greek economy >>>> >>>> >>>> > Chris, thanks for your specific details. I'd appreciate it if you >>>> >>>> > could tell me which copper NIC you tried, as well as to pass on >>>> the >>>> >>>> > iSCSI tuning parameters. >>>> >>>> >>>> Our copper NIC experience is with onboard X540-AT2 ports on >>>> SuperMicro hardware (which have the guaranteed 10-20 msec lock >>>> hold) and dual-port 82599EB TN cards (which have some sort of >>>> driver/hardware failure under load that eventually leads to >>>> 2-second lock holds). I can't recommend either with the current >>>> driver; we had to revert to 1G networking in order to get stable >>>> servers. 
>>>> >>>> >>>> The iSCSI parameter modifications we do, across both initiators >>>> and targets, are: >>>> >>>> >>>> initialr2tno >>>> >>>> firstburstlength128k >>>> >>>> maxrecvdataseglen128k[only on Linux backends] >>>> >>>> maxxmitdataseglen128k[only on Linux backends] >>>> >>>> >>>> The OmniOS initiator doesn't need tuning for more than the first >>>> two parameters; on the Linux backends we tune up all four. My >>>> extended thoughts on these tuning parameters and why we touch them >>>> can be found >>>> >>>> here: >>>> >>>> >>>> http://utcc.utoronto.ca/~cks/space/blog/tech/ >>>> UnderstandingiSCSIProtocol >>>> >>>> http://utcc.utoronto.ca/~cks/space/blog/tech/LikelyISCSITuning >>>> >>>> >>>> The short version is that these parameters probably only make a >>>> small difference but their overall goal is to do 128KB ZFS reads >>>> and writes in single iSCSI operations (although they will be >>>> fragmented at the TCP >>>> >>>> layer) and to do iSCSI writes without a back-and-forth delay >>>> between initiator and target (that's 'initialr2t no'). >>>> >>>> >>>> I think basically everyone should use InitialR2T set to no and in >>>> fact that it should be the software default. These days only >>>> unusually limited iSCSI targets should need it to be otherwise and >>>> they can change their setting for it (initiator and target must >>>> both agree to it being 'yes', so either can veto it). >>>> >>>> >>>> - cks >>>> >>>> >>>> >>>> On Mon, Feb 23, 2015 at 8:21 AM, Joerg Goltermann >>> > wrote: >>>> >>>> Hi, >>>> >>>> I think your problem is caused by your link properties or your >>>> switch settings. In general the standard ixgbe seems to perform >>>> well. >>>> >>>> I had trouble after changing the default flow control settings >>>> to "bi" >>>> and this was my motivation to update the ixgbe driver a long >>>> time ago. >>>> After I have updated our systems to ixgbe 2.5.8 I never had any >>>> problems .... >>>> >>>> Make sure your switch has support for jumbo frames and you use >>>> the same mtu on all ports, otherwise the smallest will be used. >>>> >>>> What switch do you use? I can tell you nice horror stories about >>>> different vendors.... >>>> >>>> - Joerg >>>> >>>> On 23.02.2015 10:31, W Verb wrote: >>>> >>>> Thank you Joerg, >>>> >>>> I've downloaded the package and will try it tomorrow. >>>> >>>> The only thing I can add at this point is that upon review >>>> of my >>>> testing, I may have performed my "pkg -u" between the >>>> initial quad-gig >>>> performance test and installing the 10G NIC. So this may >>>> be a new >>>> problem introduced in the latest updates. >>>> >>>> Those of you who are running 10G and have not upgraded to >>>> the latest >>>> kernel, etc, might want to do some additional testing >>>> before running the >>>> update. >>>> >>>> -Warren V >>>> >>>> On Mon, Feb 23, 2015 at 1:15 AM, Joerg Goltermann >>>> >>>> >> wrote: >>>> >>>> Hi, >>>> >>>> I remember there was a problem with the flow control >>>> settings in the >>>> ixgbe >>>> driver, so I updated it a long time ago for our >>>> internal servers to >>>> 2.5.8. >>>> Last weekend I integrated the latest changes from the >>>> FreeBSD driver >>>> to bring >>>> the illumos ixgbe to 2.5.25 but I had no time to test >>>> it, so it's >>>> completely >>>> untested! 
>>>> >>>> >>>> If you would like to give the latest driver a try you >>>> can fetch the >>>> kernel modules from >>>> https://cloud.osn.de/index.____php/s/Fb4so9RsNnXA7r9 >>>> >>>> >>> > >>>> >>>> Clone your boot environment, place the modules in the >>>> new environment >>>> and update the boot-archive of the new BE. >>>> >>>> - Joerg >>>> >>>> >>>> >>>> >>>> >>>> On 23.02.2015 02:54, W Verb wrote: >>>> >>>> By the way, to those of you who have working >>>> setups: please send me >>>> your pool/volume settings, interface linkprops, >>>> and any kernel >>>> tuning >>>> parameters you may have set. >>>> >>>> Thanks, >>>> Warren V >>>> >>>> On Sat, Feb 21, 2015 at 7:59 AM, Schweiss, Chip >>>> >>>> >> >>>> >>>> wrote: >>>> >>>> I can't say I totally agree with your >>>> performance >>>> assessment. I run Intel >>>> X520 in all my OmniOS boxes. >>>> >>>> Here is a capture of nfssvrtop I made while >>>> running many >>>> storage vMotions >>>> between two OmniOS boxes hosting NFS >>>> datastores. This is a >>>> 10 host VMware >>>> cluster. Both OmniOS boxes are dual 10G >>>> connected with >>>> copper twin-ax to >>>> the in rack Nexus 5010. >>>> >>>> VMware does 100% sync writes, I use ZeusRAM >>>> SSDs for log >>>> devices. >>>> >>>> -Chip >>>> >>>> 2014 Apr 24 08:05:51, load: 12.64, read: >>>> 17330243 KB, >>>> swrite: 15985 KB, >>>> awrite: 1875455 KB >>>> >>>> Ver Client NFSOPS Reads >>>> SWrites AWrites >>>> Commits Rd_bw >>>> SWr_bw AWr_bw Rd_t SWr_t AWr_t >>>> Com_t Align% >>>> >>>> 4 10.28.17.105 0 0 >>>> 0 0 >>>> 0 0 >>>> 0 0 0 0 0 0 >>>> 0 >>>> >>>> 4 10.28.17.215 0 0 >>>> 0 0 >>>> 0 0 >>>> 0 0 0 0 0 0 >>>> 0 >>>> >>>> 4 10.28.17.213 0 0 >>>> 0 0 >>>> 0 0 >>>> 0 0 0 0 0 0 >>>> 0 >>>> >>>> 4 10.28.16.151 0 0 >>>> 0 0 >>>> 0 0 >>>> 0 0 0 0 0 0 >>>> 0 >>>> >>>> 4 all 1 0 >>>> 0 0 >>>> 0 0 >>>> 0 0 0 0 0 0 >>>> 0 >>>> >>>> 3 10.28.16.175 3 0 >>>> 3 0 >>>> 0 1 >>>> 11 0 4806 48 0 0 >>>> 85 >>>> >>>> 3 10.28.16.183 6 0 >>>> 6 0 >>>> 0 3 >>>> 162 0 549 124 0 0 >>>> 73 >>>> >>>> 3 10.28.16.180 11 0 >>>> 10 0 >>>> 0 3 >>>> 27 0 776 89 0 0 >>>> 67 >>>> >>>> 3 10.28.16.176 28 2 >>>> 26 0 >>>> 0 10 >>>> 405 0 2572 198 0 0 >>>> 100 >>>> >>>> 3 10.28.16.178 4606 4602 >>>> 4 0 >>>> 0 294534 >>>> 3 0 723 49 0 0 >>>> 99 >>>> >>>> 3 10.28.16.179 4905 4879 >>>> 26 0 >>>> 0 312208 >>>> 311 0 735 271 0 0 >>>> 99 >>>> >>>> 3 10.28.16.181 5515 5502 >>>> 13 0 >>>> 0 352107 >>>> 77 0 89 87 0 0 >>>> 99 >>>> >>>> 3 10.28.16.184 12095 12059 >>>> 10 0 >>>> 0 763014 >>>> 39 0 249 147 0 0 >>>> 99 >>>> >>>> 3 10.28.58.1 15401 6040 >>>> 116 6354 >>>> 53 191605 >>>> 474 202346 192 96 144 83 >>>> 99 >>>> >>>> 3 all 42574 33086 >>>> 217 >>>> 6354 53 1913488 >>>> 1582 202300 348 138 153 105 >>>> 99 >>>> >>>> >>>> >>>> >>>> >>>> On Fri, Feb 20, 2015 at 11:46 PM, W Verb >>>> >>>> >>> >>>> >> wrote: >>>> >>>> >>>> Hello All, >>>> >>>> Thank you for your replies. >>>> I tried a few things, and found the >>>> following: >>>> >>>> 1: Disabling hyperthreading support in the >>>> BIOS drops >>>> performance overall >>>> by a factor of 4. >>>> 2: Disabling VT support also seems to have >>>> some effect, >>>> although it >>>> appears to be minor. But this has the >>>> amusing side >>>> effect of fixing the >>>> hangs I've been experiencing with fast >>>> reboot. Probably >>>> by disabling kvm. >>>> 3: The performance tests are a bit tricky >>>> to quantify >>>> because of caching >>>> effects. In fact, I'm not entirely sure >>>> what is >>>> happening here. 
It's just >>>> best to describe what I'm seeing: >>>> >>>> The commands I'm using to test are >>>> dd if=/dev/zero of=./test.dd bs=2M >>>> count=5000 >>>> dd of=/dev/null if=./test.dd bs=2M >>>> count=5000 >>>> The host vm is running Centos 6.6, and has >>>> the latest >>>> vmtools installed. >>>> There is a host cache on an SSD local to >>>> the host that >>>> is also in place. >>>> Disabling the host cache didn't >>>> immediately have an >>>> effect as far as I could >>>> see. >>>> >>>> The host MTU set to 3000 on all iSCSI >>>> interfaces for all >>>> tests. >>>> >>>> Test 1: Right after reboot, with an ixgbe >>>> MTU of 9000, >>>> the write test >>>> yields an average speed over three tests >>>> of 137MB/s. The >>>> read test yields an >>>> average over three tests of 5MB/s. >>>> >>>> Test 2: After setting "ifconfig ixgbe0 mtu >>>> 3000", the >>>> write tests yield >>>> 140MB/s, and the read tests yield 53MB/s. >>>> It's important >>>> to note here that >>>> if I cut the read test short at only >>>> 2-3GB, I get >>>> results upwards of >>>> 350MB/s, which I assume is local >>>> cache-related distortion. >>>> >>>> Test 3: MTU of 1500. Read tests are up to >>>> 156 MB/s. >>>> Write tests yield >>>> about 142MB/s. >>>> Test 4: MTU of 1000: Read test at 182MB/s. >>>> Test 5: MTU of 900: Read test at 130 MB/s. >>>> Test 6: MTU of 1000: Read test at 160MB/s. >>>> Write tests >>>> are now >>>> consistently at about 300MB/s. >>>> Test 7: MTU of 1200: Read test at 124MB/s. >>>> Test 8: MTU of 1000: Read test at 161MB/s. >>>> Write at 261MB/s. >>>> >>>> A few final notes: >>>> L1ARC grabs about 10GB of RAM during the >>>> tests, so >>>> there's definitely some >>>> read caching going on. >>>> The write operations are easier to observe >>>> with iostat, >>>> and I'm seeing io >>>> rates that closely correlate with the >>>> network write speeds. >>>> >>>> >>>> Chris, thanks for your specific details. >>>> I'd appreciate >>>> it if you could >>>> tell me which copper NIC you tried, as >>>> well as to pass >>>> on the iSCSI tuning >>>> parameters. >>>> >>>> I've ordered an Intel EXPX9502AFXSR, which >>>> uses the >>>> 82598 chip instead of >>>> the 82599 in the X520. If I get similar >>>> results with my >>>> fiber transcievers, >>>> I'll see if I can get a hold of copper ones. >>>> >>>> But I should mention that I did indeed >>>> look at PHY/MAC >>>> error rates, and >>>> they are nil. >>>> >>>> -Warren V >>>> >>>> On Fri, Feb 20, 2015 at 7:25 PM, Chris >>>> Siebenmann >>>> >>> >>> >>>> >> >>>> >>>> wrote: >>>> >>>> >>>> After installation and >>>> configuration, I observed >>>> all kinds of bad >>>> behavior >>>> in the network traffic between the >>>> hosts and the >>>> server. All of this >>>> bad >>>> behavior is traced to the ixgbe >>>> driver on the >>>> storage server. Without >>>> going >>>> into the full troubleshooting >>>> process, here are >>>> my takeaways: >>>> >>>> [...] >>>> >>>> For what it's worth, we managed to >>>> achieve much >>>> better line rates on >>>> copper 10G ixgbe hardware of various >>>> descriptions >>>> between OmniOS >>>> and CentOS 7 (I don't think we ever >>>> tested OmniOS to >>>> OmniOS). I don't >>>> believe OmniOS could do TCP at full >>>> line rate but I >>>> think we managed 700+ >>>> Mbytes/sec on both transmit and >>>> receive and we got >>>> basically disk-limited >>>> speeds with iSCSI (across multiple >>>> disks on >>>> multi-disk mirrored pools, >>>> OmniOS iSCSI initiator, Linux iSCSI >>>> targets). 
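As an aside, the per-link rates during tests like these can be watched directly on the OmniOS box, which helps separate what is actually crossing the wire from caching effects. A sketch:

    # Print per-second receive/transmit packet and byte counts for every data link.
    dlstat -i 1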
>>>> >>>> I don't believe we did any specific >>>> kernel tuning >>>> (and in fact some of >>>> our attempts to fiddle ixgbe driver >>>> parameters blew >>>> up in our face). >>>> We did tune iSCSI connection >>>> parameters to increase >>>> various buffer >>>> sizes so that ZFS could do even large >>>> single >>>> operations in single iSCSI >>>> transactions. (More details available >>>> if people are >>>> interested.) >>>> >>>> 10: At the wire level, the speed >>>> problems are >>>> clearly due to pauses in >>>> response time by omnios. At 9000 >>>> byte frame >>>> sizes, I see a good number >>>> of duplicate ACKs and fast >>>> retransmits during >>>> read operations (when >>>> omnios is transmitting). But below >>>> about a >>>> 4100-byte MTU on omnios >>>> (which seems to correlate to >>>> 4096-byte iSCSI >>>> block transfers), the >>>> transmission errors fade away and >>>> we only see >>>> the transmission pause >>>> problem. >>>> >>>> >>>> This is what really attracted my >>>> attention. In >>>> our OmniOS setup, our >>>> specific Intel hardware had ixgbe >>>> driver issues that >>>> could cause >>>> activity stalls during once-a-second >>>> link heartbeat >>>> checks. This >>>> obviously had an effect at the TCP and >>>> iSCSI layers. >>>> My initial message >>>> to illumos-developer sparked a >>>> potentially >>>> interesting discussion: >>>> >>>> >>>> http://www.listbox.com/member/____archive/182179/2014/10/ >>>> sort/____time_rev/page/16/entry/6:__405/__20141003125035:6357079A-__ >>>> 4B1D-__11E4-A39C-D534381BA44D/ >>>> >>> 10/sort/__time_rev/page/16/entry/6:405/__20141003125035: >>>> 6357079A-4B1D-__11E4-A39C-D534381BA44D/> >>>> >>>> >>> __sort/time_rev/page/16/entry/6:__405/20141003125035: >>>> 6357079A-__4B1D-11E4-A39C-D534381BA44D/ >>>> >>> sort/time_rev/page/16/entry/6:405/20141003125035:6357079A- >>>> 4B1D-11E4-A39C-D534381BA44D/>> >>>> >>>> If you think this is a possibility in >>>> your setup, >>>> I've put the DTrace >>>> script I used to hunt for this up on >>>> the web: >>>> >>>> http://www.cs.toronto.edu/~___ >>>> _cks/src/omnios-ixgbe/ixgbe_____delay.d >>>> >>> delay.d> >>>> >>>> >>> delay.d >>>> >>> delay.d>> >>>> >>>> This isn't the only potential source >>>> of driver >>>> stalls by any means, it's >>>> just the one I found. You may also >>>> want to look at >>>> lockstat in general, >>>> as information it reported is what led >>>> us to look >>>> specifically at the >>>> ixgbe code here. >>>> >>>> (If you suspect kernel/driver issues, >>>> lockstat >>>> combined with kernel >>>> source is a really excellent resource.) >>>> >>>> - cks >>>> >>>> >>>> >>>> >>>> >>>> ___________________________________________________ >>>> OmniOS-discuss mailing list >>>> OmniOS-discuss at lists.omniti >>>> .____com >>>> >>> > >>>> http://lists.omniti.com/____mailman/listinfo/omnios-____ >>>> discuss >>>> >>> discuss> >>>> >>>> >>> discuss >>>> > >>>> >>>> >>>> ___________________________________________________ >>>> OmniOS-discuss mailing list >>>> OmniOS-discuss at lists.omniti >>>> .____com >>>> >>> > >>>> http://lists.omniti.com/____mailman/listinfo/omnios-____ >>>> discuss >>>> >>> discuss> >>>> >>>> >>> discuss >>>> > >>>> >>>> >>>> -- >>>> OSN Online Service Nuernberg GmbH, Bucher Str. 78, >>>> 90408 Nuernberg >>>> Tel: +49 911 39905-0 >>>> - Fax: +49 911 >>>> 39905-55 - >>>> http://www.osn.de >>>> HRB 15022 Nuernberg, USt-Id: DE189301263, GF: Joerg >>>> Goltermann >>>> >>>> >>>> >>>> -- >>>> OSN Online Service Nuernberg GmbH, Bucher Str. 
78, 90408 Nuernberg
>>>> Tel: +49 911 39905-0 - Fax: +49 911 39905-55 - http://www.osn.de
>>>> HRB 15022 Nuernberg, USt-Id: DE189301263, GF: Joerg Goltermann
>>>
>>> *illumos-developer* | Archives | Modify Your Subscription [Powered by Listbox]
>
> illumos-developer | Archives | Modify Your Subscription

From garrett at damore.org  Mon Mar  2 20:29:55 2015
From: garrett at damore.org (Garrett D'Amore)
Date: Mon, 2 Mar 2015 12:29:55 -0800
Subject: [OmniOS-discuss] [developer] Re: The ixgbe driver, Lindsay Lohan, and the Greek economy
In-Reply-To: 
References: <20150221032559.727D07A0792@apps0.cs.toronto.edu>
	<54EAEFA8.4020101@osn.de> <54EB5392.6030900@osn.de>
	<279C084B-E76B-4F38-A7CF-D6CB37D06D8F@damore.org>
	<54F44602.5030705@osn.de>
	<064C3FC8-43E0-4757-AADA-831303782A4C@damore.org>
Message-ID: 

Please include the *full* lockstat output - in particular, the values given
below look relatively normal - there are however *multiple* groups emitted
from lockstat - blocking, spinning, etc. You can't just include the very
first entry because it might be for a lock condition (such as blocking)
that doesn't occur much. (The cases you list below are "clean",
representing total delays measured in milliseconds or even hundreds of
microseconds, so not very interesting.)

- Garrett

> On Mar 2, 2015, at 12:19 PM, W Verb wrote:
>
> Hello,
>
> vmstat seems pretty boring. Certainly nothing going to swap.
>
> root at sanbox:/root# vmstat
>  kthr      memory            page            disk          faults      cpu
>  r b w   swap  free  re  mf pi po fr de sr po ro s0 s2   in   sy   cs us sy id
>  0 0 0 34631632 30728068 175 215 0 0 0 0 963 275 4 6 140 3301 796 6681 0 1 99
>
> Here is the "taskq_dispatch_ent" output from "lockstat -s 5 -kWP sleep 30"
> during the "fast" write operation.
> [...]
>> My initial message >> to illumos-developer sparked a potentially >> interesting discussion: >> >> >> http://www.listbox.com/member/____archive/182179/2014/10/sort/____time_rev/page/16/entry/6:__405/__20141003125035:6357079A-__4B1D-__11E4-A39C-D534381BA44D/ >> > >> >> >> >> >> >> If you think this is a possibility in >> your setup, >> I've put the DTrace >> script I used to hunt for this up on >> the web: >> >> http://www.cs.toronto.edu/~____cks/src/omnios-ixgbe/ixgbe_____delay.d >> > >> >> >> >> >> >> This isn't the only potential source >> of driver >> stalls by any means, it's >> just the one I found. You may also >> want to look at >> lockstat in general, >> as information it reported is what led >> us to look >> specifically at the >> ixgbe code here. >> >> (If you suspect kernel/driver issues, >> lockstat >> combined with kernel >> source is a really excellent resource.) >> >> - cks >> >> >> >> >> >> ___________________________________________________ >> OmniOS-discuss mailing list >> OmniOS-discuss at lists.omniti >> >.____com >> __omniti.com >> >> >> http://lists.omniti.com/____mailman/listinfo/omnios-____discuss >> > >> >> >> >> >> >> >> ___________________________________________________ >> OmniOS-discuss mailing list >> OmniOS-discuss at lists.omniti >> >.____com >> __omniti.com >> >> >> http://lists.omniti.com/____mailman/listinfo/omnios-____discuss >> > >> >> >> >> >> >> >> -- >> OSN Online Service Nuernberg GmbH, Bucher Str. 78, >> 90408 Nuernberg >> Tel: +49 911 39905-0 >> - Fax: +49 911 >> 39905-55 - >> http://www.osn.de > >> HRB 15022 Nuernberg, USt-Id: DE189301263, GF: Joerg >> Goltermann >> >> >> >> -- >> OSN Online Service Nuernberg GmbH, Bucher Str. 78, 90408 Nuernberg >> Tel: +49 911 39905-0 - Fax: +49 >> 911 39905-55 - http://www.osn.de >> > >> HRB 15022 Nuernberg, USt-Id: DE189301263, GF: Joerg Goltermann >> >> >> *illumos-developer* | Archives >> > >> > >> | Modify > Your Subscription >> [Powered by Listbox] > >> >> >> >> *illumos-developer* | Archives >> > >> > | >> Modify >> > >> Your Subscription [Powered by Listbox] > >> >> >> -- >> OSN Online Service Nuernberg GmbH, Bucher Str. 78, 90408 Nuernberg >> Tel: +49 911 39905-0 - Fax: +49 911 39905-55 - http://www.osn.de >> HRB 15022 Nuernberg, USt-Id: DE189301263, GF: Joerg Goltermann >> >> illumos-developer | Archives | Modify Your Subscription > -------------- next part -------------- An HTML attachment was scrubbed... URL: From garrett at damore.org Mon Mar 2 20:32:26 2015 From: garrett at damore.org (Garrett D'Amore) Date: Mon, 2 Mar 2015 12:32:26 -0800 Subject: [OmniOS-discuss] [developer] The ixgbe driver, Lindsay Lohan, and the Greek economy In-Reply-To: References: <20150221032559.727D07A0792@apps0.cs.toronto.edu> <54EAEFA8.4020101@osn.de> <54EB5392.6030900@osn.de> <279C084B-E76B-4F38-A7CF-D6CB37D06D8F@damore.org> <54F44602.5030705@osn.de> Message-ID: <148BDD1F-A504-425B-8D30-78BED699E71F@damore.org> However, if you look at the total times involved ? we?re talking about 1.5 us average, and only 9.3k events. So there maybe some VM activity, but it is only responsible for 14.4 ms over the entire 30 sec run. Again, that?s not *great*, but its unlikely to be related (at least directly) for the tragic behavior we?ve seen elsewhere. (Indeed, the claim is that reads are good, so this result is from a good side, hence a red herring.) 
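A quick arithmetic check of the figure above, using the hottest lockstat line in question (9306 events at an average of 1557 ns each):

  echo '9306 * 1557 / 1000000' | bc -l
  # ~14.49 ms of total spin time in a 30 s sample - under 0.05% of the run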
- Garrett > On Mar 2, 2015, at 11:15 AM, Dan McDonald via illumos-developer wrote: > > >> On Mar 2, 2015, at 2:07 PM, W Verb via illumos-developer wrote: >> >> Count indv cuml rcnt nsec Hottest Lock Caller >> ------------------------------------------------------------------------------- >> 9306 44% 44% 0.00 1557 htable_mutex+0x370 htable_release >> 6307 23% 68% 0.00 1207 htable_mutex+0x108 htable_lookup >> 596 7% 75% 0.00 4100 0xffffff0931705188 cv_wait >> 349 5% 80% 0.00 4437 0xffffff0931705188 taskq_thread >> 704 2% 82% 0.00 995 0xffffff0935de3c50 dbuf_create > > That has NOTHING to do with TCP. > > It has everything to do with the Virtual Memory subsystem. Here, see all the callers to htable_release(): > > http://src.illumos.org/source/search?q=&defs=&refs=htable_release&path=&hist=&project=illumos-gate > > I think "VM thrashing" when I see that. > > Dan > > > > ------------------------------------------- > illumos-developer > Archives: https://www.listbox.com/member/archive/182179/=now > RSS Feed: https://www.listbox.com/member/archive/rss/182179/21239177-3604570e > Modify Your Subscription: https://www.listbox.com/member/?member_id=21239177&id_secret=21239177-2d0c9337 > Powered by Listbox: http://www.listbox.com From nrhuff at umn.edu Tue Mar 3 15:41:13 2015 From: nrhuff at umn.edu (Nathan Huff) Date: Tue, 03 Mar 2015 09:41:13 -0600 Subject: [OmniOS-discuss] Long group names in ls acl output Message-ID: <54F5D619.6000902@umn.edu> We have a couple omnios servers that are getting NSS info from an AD domain that we use for serving CIFS shares. There are groups in our domain that are longer than the 20 characters. Looking at the ls man page and source code I don't think there is an option to display either a longer string or the uid/gid instead. The main issue that I run into is not only are the group names long, some have a common prefix that is over 20 characters long. In some cases the only way to tell which groups has which permissions is to connect to the share from windows and look at the ACLs that way. Is there some way to disambiguate this case I don't know about? -- Nathan Huff System Administrator Academic Health Center Information Systems University of Minnesota 612-626-9136 From henson at acm.org Tue Mar 3 20:58:03 2015 From: henson at acm.org (Paul B. Henson) Date: Tue, 3 Mar 2015 12:58:03 -0800 Subject: [OmniOS-discuss] Long group names in ls acl output In-Reply-To: <54F5D619.6000902@umn.edu> References: <54F5D619.6000902@umn.edu> Message-ID: <2dfa01d055f4$bfa16130$3ee42390$@acm.org> > Nathan Huff > Sent: Tuesday, March 03, 2015 7:41 AM > > domain that are longer than the 20 characters. Looking at the ls man > page and source code I don't think there is an option to display either > a longer string or the uid/gid instead. $ uname -a SunOS storage 5.11 omnios-10b9c79 i86pc i386 i86pc $ man ls [...] 
-n, --numeric-uid-gid like -l, but list numeric user and group IDs $ cd / ; ls -n total 535 lrwxrwxrwx 1 0 0 9 Sep 2 2013 bin -> ./usr/bin drwxr-xr-x 6 0 3 9 Sep 2 2013 boot drwxr-xr-x 238 0 3 238 Jan 20 16:09 dev drwxr-xr-x 9 0 3 9 Jan 20 16:08 devices drwxr-xr-x 63 0 3 207 Jan 20 16:09 etc drwxr-xr-x 7 0 0 7 Dec 14 20:06 export dr-xr-xr-x 2 0 0 2 Aug 14 2013 home drwxr-xr-x 18 0 3 19 Dec 20 18:00 kernel drwxr-xr-x 10 0 2 180 Jan 16 18:42 lib drwxr-xr-x 2 0 0 3 Sep 2 2013 media drwxr-xr-x 2 0 3 2 Sep 2 2013 mnt dr-xr-xr-x 2 0 0 2 Sep 2 2013 net drwxr-xr-x 3 0 3 3 Dec 20 17:49 opt drwxr-xr-x 5 0 3 5 Aug 14 2013 platform dr-xr-xr-x 128 0 0 480032 Mar 3 12:53 proc drwx------ 3 0 0 7 Dec 20 20:45 root drwxr-xr-x 3 0 0 3 Sep 2 2013 rpool drwxr-xr-x 2 0 3 62 Nov 24 10:34 sbin drwxr-xr-x 5 0 0 5 Aug 14 2013 system drwxrwxrwt 3 0 3 242 Mar 3 12:50 tmp drwxr-xr-x 29 0 3 41 Nov 24 10:34 usr drwxr-xr-x 35 0 3 35 Nov 8 2013 var drwxr-xr-x 4 0 0 4 Dec 20 20:46 zones From nrhuff at umn.edu Tue Mar 3 21:04:16 2015 From: nrhuff at umn.edu (Nathan Huff) Date: Tue, 03 Mar 2015 15:04:16 -0600 Subject: [OmniOS-discuss] Long group names in ls acl output In-Reply-To: <2dfa01d055f4$bfa16130$3ee42390$@acm.org> References: <54F5D619.6000902@umn.edu> <2dfa01d055f4$bfa16130$3ee42390$@acm.org> Message-ID: <54F621D0.6070506@umn.edu> -n works for the regular user and group but seems to have no effect on ACL entries /tank/shares$ /usr/bin/ls -Vn total 132 drwx------+ 5 256902 0 5 Jan 30 13:21 archive group:domain users:r-x---a-R-c---:-d-----:allow group:ahc_server_ops:rwxpdDaARWcCos:fd-----:allow owner@:rwxpdDaARWc--s:fd-----:allow group@:--------------:fd-----:allow everyone@:--------------:fd-----:allow . . . On 2015-03-03 2:58 PM, Paul B. Henson wrote: >> Nathan Huff >> Sent: Tuesday, March 03, 2015 7:41 AM >> >> domain that are longer than the 20 characters. Looking at the ls man >> page and source code I don't think there is an option to display either >> a longer string or the uid/gid instead. > > $ uname -a > SunOS storage 5.11 omnios-10b9c79 i86pc i386 i86pc > > $ man ls > [...] > -n, --numeric-uid-gid > like -l, but list numeric user and group IDs > > $ cd / ; ls -n > total 535 > lrwxrwxrwx 1 0 0 9 Sep 2 2013 bin -> ./usr/bin > drwxr-xr-x 6 0 3 9 Sep 2 2013 boot > drwxr-xr-x 238 0 3 238 Jan 20 16:09 dev > drwxr-xr-x 9 0 3 9 Jan 20 16:08 devices > drwxr-xr-x 63 0 3 207 Jan 20 16:09 etc > drwxr-xr-x 7 0 0 7 Dec 14 20:06 export > dr-xr-xr-x 2 0 0 2 Aug 14 2013 home > drwxr-xr-x 18 0 3 19 Dec 20 18:00 kernel > drwxr-xr-x 10 0 2 180 Jan 16 18:42 lib > drwxr-xr-x 2 0 0 3 Sep 2 2013 media > drwxr-xr-x 2 0 3 2 Sep 2 2013 mnt > dr-xr-xr-x 2 0 0 2 Sep 2 2013 net > drwxr-xr-x 3 0 3 3 Dec 20 17:49 opt > drwxr-xr-x 5 0 3 5 Aug 14 2013 platform > dr-xr-xr-x 128 0 0 480032 Mar 3 12:53 proc > drwx------ 3 0 0 7 Dec 20 20:45 root > drwxr-xr-x 3 0 0 3 Sep 2 2013 rpool > drwxr-xr-x 2 0 3 62 Nov 24 10:34 sbin > drwxr-xr-x 5 0 0 5 Aug 14 2013 system > drwxrwxrwt 3 0 3 242 Mar 3 12:50 tmp > drwxr-xr-x 29 0 3 41 Nov 24 10:34 usr > drwxr-xr-x 35 0 3 35 Nov 8 2013 var > drwxr-xr-x 4 0 0 4 Dec 20 20:46 zones > > > -- Nathan Huff System Administrator Academic Health Center Information Systems University of Minnesota 612-626-9136 From asc1111 at gmail.com Tue Mar 3 23:44:52 2015 From: asc1111 at gmail.com (Aaron Curry) Date: Tue, 3 Mar 2015 16:44:52 -0700 Subject: [OmniOS-discuss] CIFS File Lock Problems Message-ID: Hi all, We have encountered an issue with out OmniOS CIFS file server and file locks. 
Every now and then we get a call from a user trying to access a file but they can't because it says that the file is in use by another user. Long story short, we track down the session holding the lock, kill the session, file locks are released and the user can access the file. Left at that, it sounds pretty normal for a file server. But, it's not as simple as it sounds. Problem #1: Frequency This happens much more often than I am used to with other file servers. Not only that, but the frequency of "file locked" incidences seems to slowly increase over time until we are inundated with requested to unlock files. At that point we reboot the server and everyone is happy... at least for a week or so. Problem #2: Stale sessions It seems that the problem is caused by sessions becoming disconnected from the client and the server is not cleaning up those orphaned sessions. We can tell this because once we track down the session holding the lock, shutting down the desktop/laptop/whatever that initiated the session does not clean it up. The session stays open on the server. If you bring the client device back up, instead of reconnecting to the existing session it creates a new one. Often times the session holding locks is owned by the same user who is unable to access the file. When we first encountered this problem I changed the keep_alive setting on the smb server from the default of 5400 to 300. This seemed to help. At first I told people to wait 5 minutes and then the session cleared and they were able to access their files, but it doesn't seem to be working any more. Or at least changing keep_alive only fixed one problem and either didn't resolve another or maybe caused another problem? Problem #3: Tracking down open files Most NAS devices I have worked with have you manage session and open files through the Windows Computer Management console. You use that to connect to the NAS device and can see all session and open files. Through that console you can also kill sessions or the locks on specific files. It doesn't work very well in this case. Windows 7 takes forever to try to load the session information and seems to try to refresh while its still loading. The result is that it is unusable. XP/2003 loads session information just fine but if there's more than just a few sessions the open files list is empty. I even ran a packet capture on the client to see if it just didn't understand what the server was saying. Not the case. If there's more than 5 or so sessions, the request for open files returns an empty array of items. We have been using mdb to track down the open files, which is a pain. Getting a list of sessions from mdb is easy enough with ::smblist but open files are only returned as an address which then needs to be checked against ::smbnode to get the path and file name. I wrote a script to parse all the information and return something usable, but I'm not much of a programmer. It takes about 15 minutes to run and having to run it every time someone calls about a file lock is a waste of time. Problem #4: Releasing file locks Similar to problem #3. Normally we would connect to the NAS device with Windows Computer Management console, go to Open Files, find the file, right-click and Close Open File. Since we can't get a list of open files in the Computer Management console, that obviously doesn't work. That's where we've been tracking down the session holding the lock, pulling up the sessions in Computer Manager on a Windows 2003 machine and killing the session. 
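For what it's worth, the mdb legwork described under Problem #3 boils down to a couple of one-liners. ::smblist and ::smbnode are the dcmds named above; whether ::smbnode lists every node when given no address, and the exact output columns, vary by release, so treat this as a sketch to adapt rather than a recipe:

  # dump SMB sessions, users, trees and open files known to the kernel
  echo '::smblist' | mdb -k > /tmp/smblist.out
  # dump the smb node table, then match the ofile/node addresses from the
  # first listing against the paths printed here
  echo '::smbnode' | mdb -k > /tmp/smbnode.out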
This is going to become a very big problem in the near future as we retire all our XP/2003 systems. So those are the problems we are facing with our OmniOS CIFS server. If you are still reading this, thank you for your patience. We're at a loss on where to go from here. Understandably, the end users (and management) are starting to get a little grouchy. The questions we are having trouble answering are: Why do sessions seem to get disconnect and hold locks open? When a user / client machine combo reconnects, why doesn't it reuse an existing session and assume responsibility for open files? Why does the problem seem to grow in frequency over time (sounds like a system stability issue)? What is the best way to monitor / list active sessions and open files? Is there a way to kill individual file locks / close open files? Any help would be appreciated. Thank you, Aaron -------------- next part -------------- An HTML attachment was scrubbed... URL: From wverb73 at gmail.com Tue Mar 3 23:45:29 2015 From: wverb73 at gmail.com (W Verb) Date: Tue, 3 Mar 2015 15:45:29 -0800 Subject: [OmniOS-discuss] [developer] Re: The ixgbe driver, Lindsay Lohan, and the Greek economy In-Reply-To: References: <20150221032559.727D07A0792@apps0.cs.toronto.edu> <54EAEFA8.4020101@osn.de> <54EB5392.6030900@osn.de> <279C084B-E76B-4F38-A7CF-D6CB37D06D8F@damore.org> <54F44602.5030705@osn.de> <064C3FC8-43E0-4757-AADA-831303782A4C@damore.org> Message-ID: Hello Rob et al, Thank you for taking the time to look at this problem with me. I completely understand your inclination to look at the network as the most probable source of my issue, but I believe that this is a pretty clear-cut case of server-side issues. 1: I did run ping RTT tests during both read and write operations with multiple interfaces enabled, and the RTT stayed at ~.2ms regardless of whether traffic was actively being transmitted/received or not. 2: I am not seeing the TCP window size bouncing around, and I am certainly not seeing starvation and delay in my packet captures. It is true that I do see delayed ACKs and retransmissions when I bump the MTU to 9000 on both sides, but I stopped testing with high MTU as soon as I saw it happening because I have a good understanding of incast. All of my recent testing has been with MTUs between 1000 and 3000 bytes. 3: When testing with MTUs between 1000 and 3000 bytes, I do not see lost packets and retransmission in captures on either the server or client side. I only see staggered transmission delays on the part of the server. 4: The client is consistently advertising a large window size (20k+), so the TCP throttling mechanism does not appear to play into this. 5: As mentioned previously, layer 2 flow control is not enabled anywhere in the network, so there are no lower-level mechanisms at work. 6: Upon checking buffer and queue sizes (and doing the appropriate research into documentation on the C3560E's buffer sizes), I do not see large numbers of frames being dropped by the switch. It does happen at larger MTUs, but not very often (and not consistently) during transfers at 1000-3000 byte MTUs. I do not have QoS, policing, or rate-shaping enabled. 7: Network interface stats on both the server and the ESXi client show no errors of any kind. This is via netstat on the server, and esxcli / Vsphere client on the ESXi box. 
8: When looking at captures taken simultaneously on the server and client side, the server-side transmission pauses are consistently seen and reproducible, even after multiple server rebuilds, ESXi rebuilds, vSphere reinstallations (down to wiping the SQL db), various COMSTAR configuration variations, multiple 10G NICs with different NIC chipsets, multiple switches (I tried both a 48-port and 24-port C3560E), multiple IOS revisions (12.2 and 15.0), OmniOS versions (r151012 and previous) multiple cables, transceivers, etc etc etc etc etc For your review, I have uploaded the actual packet captures to Google Drive: https://drive.google.com/file/d/0BwyUMjibonYQZS03dHJ2ZjJvTEE/view?usp=sharing 2 int write - ESXi vmk5 https://drive.google.com/file/d/0BwyUMjibonYQTlNTQ2M5bjlxZ00/view?usp=sharing 2 int write - ESXi vmk1 https://drive.google.com/file/d/0BwyUMjibonYQUEFsSVJCYXBVX3c/view?usp=sharing 2 int read - server ixgbe0 https://drive.google.com/file/d/0BwyUMjibonYQT3FBbElnNFpJTzQ/view?usp=sharing 2 int read - ESXi vmk5 https://drive.google.com/file/d/0BwyUMjibonYQU1hXdFRLM2cxSTA/view?usp=sharing 2 int read - ESXi vmk1 https://drive.google.com/file/d/0BwyUMjibonYQNEFZSHVZdFNweDA/view?usp=sharing 1 int write - ESXi vmk1 https://drive.google.com/file/d/0BwyUMjibonYQM3FpTmloQm5iMGc/view?usp=sharing 1 int read - ESXi vmk1 Regards, Warren V On Mon, Mar 2, 2015 at 1:11 PM, Mallory, Rob wrote: > Just an EWAG, and forgive me for not following closely, I just saw > this in my inbox, and looked at it and the screenshots for 2 minutes. > > > > But this looks like the typical incast problem.. see > http://www.pdl.cmu.edu/Incast/ > > where your storage servers (there are effectively two with ISCSI/MPIO if > round-robin is working) have networks which are 20:1 oversubscribed to your > 1GbE host interfaces. (although one of the tcpdumps shows only one server > so it may be choked out completely) > > > > What is your BDP? I?m guessing .150ms * 1GbE. For single-link that gets > you to a MSS of 18700 or so. > > > > On your 1GbE connected clients, leave MTU at 9k, set the following in > sysctl.conf, > > And reboot. > > > > net.ipv4.tcp_rmem = 4096 8938 17876 > > > > If MPIO from the server is indeed round-robining properly, this will ?make > things fit? much better. > > > > Note that your tcp_wmem can and should stay high, since you are not > oversubscribed going from client?server ; you only need to tweak the tcp > receive window size. > > > > I?ve not done it in quite some time, but IIRC, You can also set these from > the server side with: > > Route add -sendpipe 8930 or ?ssthresh > > > > And I think you can see the hash-table with computed BDP per client with > ndd. > > > > I would try playing with those before delving deep into potential bugs in > the TCP, nic driver, zfs, or vm. > > -Rob > > > > *From:* W Verb via illumos-developer [mailto:developer at lists.illumos.org] > *Sent:* Monday, March 02, 2015 12:20 PM > *To:* Garrett D'Amore > *Cc:* Joerg Goltermann; illumos-dev; omnios-discuss at lists.omniti.com > *Subject:* Re: [developer] Re: [OmniOS-discuss] The ixgbe driver, Lindsay > Lohan, and the Greek economy > > > > Hello, > > vmstat seems pretty boring. Certainly nothing going to swap. 
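Rob's "18700 or so" figure quoted above is just the bandwidth-delay product of the path he assumes (a 1 Gbit/s bottleneck and a 0.15 ms round trip), which is also roughly where his suggested tcp_rmem cap of 17876 lands:

  # BDP = bandwidth in bytes/s x RTT in seconds
  echo '125000000 * 0.00015' | bc
  # = 18750 bytes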
> > root at sanbox:/root# vmstat > kthr memory page disk faults cpu > r b w swap free re mf pi po fr de sr po ro s0 s2 in sy cs us > sy id > 0 0 0 34631632 30728068 175 215 0 0 0 0 963 275 4 6 140 3301 796 6681 0 > 1 99 > > Here is the "taskq_dispatch_ent" output from "lockstat -s 5 -kWP sleep > 30" during the "fast" write operation. > > > ------------------------------------------------------------------------------- > Count indv cuml rcnt nsec Hottest Lock Caller > 50934 3% 79% 0.00 3437 0xffffff093145ba40 taskq_dispatch_ent > > nsec ------ Time Distribution ------ count Stack > 128 | 7 spa_taskq_dispatch_ent > 256 |@@ 4333 zio_taskq_dispatch > 512 |@@ 3863 zio_issue_async > 1024 |@@@@@ 9717 zio_execute > 2048 |@@@@@@@@@ 15904 > 4096 |@@@@ 7595 > 8192 |@@ 4498 > 16384 |@ 2662 > 32768 |@ 1886 > 65536 | 434 > 131072 | 34 > 262144 | 1 > > ------------------------------------------------------------------------------- > > > However, the truly "broken" function is a read operation: > > Top lock 1st try: > > ------------------------------------------------------------------------------- > Count indv cuml rcnt nsec Hottest Lock Caller > 474 15% 15% 0.00 7031 0xffffff093145b6f8 cv_wait > > nsec ------ Time Distribution ------ count Stack > 256 |@ 29 taskq_thread_wait > 512 |@@@@@@ 100 taskq_thread > 1024 |@@@@ 72 thread_start > 2048 |@@@@ 69 > 4096 |@@@ 51 > 8192 |@@ 47 > 16384 |@@ 44 > 32768 |@@ 32 > 65536 |@ 25 > 131072 | 5 > > ------------------------------------------------------------------------------- > > Top lock 2nd try: > > > ------------------------------------------------------------------------------- > Count indv cuml rcnt nsec Hottest Lock Caller > 174 39% 39% 0.00 103909 0xffffff0943f116a0 dmu_zfetch_find > > nsec ------ Time Distribution ------ count Stack > 2048 | 2 dmu_zfetch > 4096 | 3 dbuf_read > 8192 | 4 > dmu_buf_hold_array_by_dnode > 16384 | 3 dmu_buf_hold_array > 32768 |@ 7 > 65536 |@@ 14 > 131072 |@@@@@@@@@@@@@@@@@@@@ 116 > 262144 |@@@ 19 > 524288 | 4 > 1048576 | 2 > > ------------------------------------------------------------------------------- > > Top lock 3rd try: > > > ------------------------------------------------------------------------------- > Count indv cuml rcnt nsec Hottest Lock Caller > 283 55% 55% 0.00 94602 0xffffff0943ff5a68 dmu_zfetch_find > > nsec ------ Time Distribution ------ count Stack > 512 | 1 dmu_zfetch > 1024 | 1 dbuf_read > 2048 | 0 > dmu_buf_hold_array_by_dnode > 4096 | 5 dmu_buf_hold_array > 8192 | 2 > 16384 | 7 > 32768 | 4 > 65536 |@@@ 33 > 131072 |@@@@@@@@@@@@@@@@@@@@ 198 > 262144 |@@ 27 > 524288 | 2 > 1048576 | 3 > > ------------------------------------------------------------------------------- > > > > As for the MTU question- setting the MTU to 9000 makes read operations > grind almost to a halt at 5MB/s transfer rate. > > -Warren V > > > > On Mon, Mar 2, 2015 at 11:30 AM, Garrett D'Amore > wrote: > > Here?s a theory. You are using small (relatively) MTUs (3000 is less > than the smallest ZFS block size.) So, when you go multipathing this way, > might a single upper layer transaction (ZFS block transfer request, or for > that matter COMSTAR block request) get routed over different paths. This > sounds like a potentially pathological condition to me. > > > > What happens if you increase the MTU to 9000? Have you tried it? I?m > sort of thinking that this will permit each transaction to be issued in a > single IP frame, which may alleviate certain tragic code paths. (That > said, I?m not sure how aware COMSTAR is of the IP MTU. 
If it is ignorant, > then it shouldn?t matter *that* much, since TCP should do the right thing > here and a single TCP stream should stick to a single underlying NIC. But > if COMSTAR is aware of the MTU, it may do some really screwball things as > it tries to break requests up into single frames.) > > > > Your read spin really looks like only about 22 msec of wait out of a total > run of 30 sec. (That?s not *great*, but neither does it sound tragic.) > Your write is interesting because that looks like it is going a wildly > different path. You should be aware that the locks you see are *not* > necessarily related in call order, but rather are ordered by instance > count. The write code path hitting the task_thread as hard as it does is > really, really weird. Something is pounding on a taskq lock super hard. > The number of taskq_dispatch_ent calls is interesting here. I?m starting > to wonder if it?s something as stupid as a spin where if the taskq is > ?full? (max size reached), a caller just is spinning trying to dispatch > jobs to the taskq. > > > > The taskq_dispatch_ent code is super simple, and it should be almost > impossible to have contention on that lock ? barring a thread spinning hard > on taskq_dispatch (or taskq_dispatch_ent as I think is happening here). > Looking at the various call sites, there are places in both COMSTAR > (iscsit) and in ZFS where this could be coming from. To know which, we > really need to have the back trace associated. > > > > lockstat can give this ? try giving ?-s 5? to give a short backtrace from > this, that will probably give us a little more info about the guilty > caller. :-) > > > > - Garrett > > > > On Mar 2, 2015, at 11:07 AM, W Verb via illumos-developer < > developer at lists.illumos.org> wrote: > > > > Hello all, > > I am not using layer 2 flow control. The switch carries line-rate 10G > traffic without error. > > I think I have found the issue via lockstat. The first lockstat is taken > during a multipath read: > > lockstat -kWP sleep 30 > > > Adaptive mutex spin: 21331 events in 30.020 seconds (711 events/sec) > > Count indv cuml rcnt nsec Hottest Lock Caller > > ------------------------------------------------------------------------------- > 9306 44% 44% 0.00 1557 htable_mutex+0x370 htable_release > 6307 23% 68% 0.00 1207 htable_mutex+0x108 htable_lookup > 596 7% 75% 0.00 4100 0xffffff0931705188 cv_wait > 349 5% 80% 0.00 4437 0xffffff0931705188 taskq_thread > 704 2% 82% 0.00 995 0xffffff0935de3c50 dbuf_create > > The hash table being read here I would guess is the tcp connection hash > table. 
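A quick way to test the "tcp connection hash table" guess above is to ask mdb which kernel threads actually have htable_release on their stacks. (As Dan McDonald points out elsewhere in the thread, htable_lookup/htable_release are the x86 HAT page-table routines, i.e. the VM subsystem, not TCP.) A sketch:

  # list kernel thread stacks that currently include htable_release
  echo '::stacks -c htable_release' | mdb -k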
> > > > When lockstat is run during a multipath write operation, I get: > > Adaptive mutex spin: 1097341 events in 30.016 seconds (36558 events/sec) > > Count indv cuml rcnt nsec Hottest Lock Caller > > ------------------------------------------------------------------------------- > 210752 28% 28% 0.00 4781 0xffffff0931705188 taskq_thread > 174471 22% 50% 0.00 4476 0xffffff0931705188 cv_wait > 127183 10% 61% 0.00 2871 0xffffff096f29b510 zio_notify_parent > 176066 10% 70% 0.00 1922 0xffffff0931705188 taskq_dispatch_ent > 105134 9% 80% 0.00 3110 0xffffff096ffdbf10 zio_remove_child > 67512 4% 83% 0.00 1938 0xffffff096f3db4b0 zio_add_child > 45736 3% 86% 0.00 2239 0xffffff0935de3c50 dbuf_destroy > 27781 3% 89% 0.00 3416 0xffffff0935de3c50 dbuf_create > 38536 2% 91% 0.00 2122 0xffffff0935de3b70 dnode_rele > 27841 2% 93% 0.00 2423 0xffffff0935de3b70 dnode_diduse_space > 19020 2% 95% 0.00 3046 0xffffff09d9e305e0 dbuf_rele > 14627 1% 96% 0.00 3632 dbuf_hash_table+0x4f8 dbuf_find > > > Writes are not performing htable lookups, while reads are. > > -Warren V > > > > > > > > On Mon, Mar 2, 2015 at 3:14 AM, Joerg Goltermann wrote: > > Hi, > > I would try *one* TPG which includes both interface addresses > and I would double check for packet drops on the Catalyst. > > The 3560 supports only receive flow control which means, that > a sending 10Gbit port can easily overload a 1Gbit port. > Do you have flow control enabled? > > - Joerg > > > > On 02.03.2015 09:22, W Verb via illumos-developer wrote: > > Hello Garrett, > > No, no 802.3ad going on in this config. > > Here is a basic schematic: > > > https://drive.google.com/file/d/0BwyUMjibonYQVkVqcE5OQUJyUUU/view?usp=sharing > > Here is the Nexenta MPIO iSCSI Setup Document that I used as a guide: > > > https://drive.google.com/file/d/0BwyUMjibonYQbjEyUTBjN2tTNWM/view?usp=sharing > > Note that I am using an MTU of 3000 on both the 10G and 1G NICs. The > switch is set to allow 9148-byte frames, and I'm not seeing any > errors/buffer overruns on the switch. > > Here is a screenshot of a packet capture from a read operation on the > guest OS (from it's local drive, which is actually a VMDK file on the > storage server). In this example, only a single 1G ESXi kernel interface > (vmk1) is bound to the software iSCSI initiator. > > > https://drive.google.com/file/d/0BwyUMjibonYQa2NYdXhpZkpkbU0/view?usp=sharing > > Note that there's a nice, well-behaved window sizing process taking > place. The ESXi decreases the scaled window by 11 or 12 for each ACK, > then bumps it back up to 512. > > Here is a similar screenshot of a single-interface write operation: > > > https://drive.google.com/file/d/0BwyUMjibonYQbU1RZHRnakxDSFU/view?usp=sharing > > There are no pauses or gaps in the transmission rate in the > single-interface transfers. > > > In the next screenshots, I have enabled an additional 1G interface on > the ESXi host, and bound it to the iSCSI initiator. The new interface is > bound to a separate physical port, uses a different VLAN on the switch, > and talks to a different 10G port on the storage server. > > First, let's look at a write operation on the guest OS, which happily > pumps data at near-line-rate to the storage server. > > Here is a sequence number trace diagram. Note how the transfer has a > nice, smooth increment rate over the entire transfer. 
> > > https://drive.google.com/file/d/0BwyUMjibonYQWHNIa0drWnNxMmM/view?usp=sharing > > Here are screenshots from packet captures on both 1G interfaces: > > > https://drive.google.com/file/d/0BwyUMjibonYQRWhyVVQ4djNaU3c/view?usp=sharing > > https://drive.google.com/file/d/0BwyUMjibonYQaTVjTEtTRloyR2c/view?usp=sharing > > Note how we again see nice, smooth window adjustment, and no gaps in > transmission. > > > But now, let's look at the problematic two-interface Read operation. > First, the sequence graph: > > > https://drive.google.com/file/d/0BwyUMjibonYQTzdFVWdQMWZ6LUU/view?usp=sharing > > As you can see, there are gaps and jumps in the transmission throughout > the transfer. > It is very illustrative to look at captures of the gaps, which are > occurring on both interfaces: > > > https://drive.google.com/file/d/0BwyUMjibonYQc0VISXN6eVFwQzg/view?usp=sharing > > https://drive.google.com/file/d/0BwyUMjibonYQVFREUHp3TGFiUU0/view?usp=sharing > > As you can see, there are ~.4 second pauses in transmission from the > storage server, which kills the transfer rate. > It's clear that the ESXi box ACKs the prior iSCSI operation to > completion, then makes a new LUN request, which the storage server > immediately replies to. The ESXi ACKs the response packet from the > storage server, then waits...and waits....and waits... until eventually > the storage server starts transmitting again. > > Because the pause happens while the ESXi client is waiting for a packet > from the storage server, that tells me that the gaps are not an artifact > of traffic being switched between both active interfaces, but are > actually indicative of short hangs occurring on the server. > > Having a pause or two in transmission is no big deal, but in my case, it > is happening constantly, and dropping my overall read transfer rate down > to 20-60MB/s, which is slower than the single interface transfer rate > (~90-100MB/s). > > Decreasing the MTU makes the pauses shorter, increasing them makes the > pauses longer. > > Another interesting thing is that if I set the multipath io interval to > 3 operations instead of 1, I get better throughput. In other words, the > less frequently I swap IP addresses on my iSCSI requests from the ESXi > unit, the fewer pauses I see. > > Basically, COMSTAR seems to choke each time an iSCSI request from a new > IP arrives. > > Because the single interface transfer is near line rate, that tells me > that the storage system (mpt_sas, zfs, etc) is working fine. It's only > when multiple paths are attempted that iSCSI falls on its face during > reads. > > All of these captures were taken without a cache device being attached > to the storage zpool, so this isn't looking like some kind of ZFS ARC > problem. As mentioned previously, local transfers to/from the zpool are > showing ~300-500 MB/s rates over long transfers (10G+). > > -Warren V > > On Sun, Mar 1, 2015 at 9:11 PM, Garrett D'Amore > > wrote: > > I?m not sure I?ve followed properly. You have *two* interfaces. > You are not trying to provision these in an aggr are you? As far as > I?m aware, VMware does not support 802.3ad link aggregations. (Its > possible that you can make it work with ESXi if you give the entire > NIC to the guest ? but I?m skeptical.) The problem is that if you > try to use link aggregation, some packets (up to half!) will be > lost. TCP and other protocols fare poorly in this situation. > > Its possible I?ve totally misunderstood what you?re trying to do, in > which case I apologize. 
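The "multipath io interval" adjusted a few paragraphs up is the round-robin IOPS limit on the ESXi side; for reference it is set per device with something like the following (ESXi 5.x esxcli syntax from memory - option spellings may differ between builds, and naa.xxxx stands in for the real device identifier):

  # show devices and their current path selection policy
  esxcli storage nmp device list
  # switch paths after every 3 I/Os instead of the configured interval of 1
  esxcli storage nmp psp roundrobin deviceconfig set --device naa.xxxx --type iops --iops 3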
> > The idle thing is a red-herring ? the cpu is waiting for work to do, > probably because packets haven?t arrived (or where dropped by the > hypervisor!) I wouldn?t read too much into that except that your > network stack is in trouble. I?d look a bit more closely at the > kstats for tcp ? I suspect you?ll see retransmits or out of order > values that are unusually high ? if so this may help validate my > theory above. > > - Garrett > > On Mar 1, 2015, at 9:03 PM, W Verb via illumos-developer > > > > > wrote: > > Hello all, > > > Well, I no longer blame the ixgbe driver for the problems I'm seeing. > > > I tried Joerg's updated driver, which didn't improve the issue. So > I went back to the drawing board and rebuilt the server from scratch. > > What I noted is that if I have only a single 1-gig physical > interface active on the ESXi host, everything works as expected. > As soon as I enable two interfaces, I start seeing the performance > problems I've described. > > Response pauses from the server that I see in TCPdumps are still > leading me to believe the problem is delay on the server side, so > I ran a series of kernel dtraces and produced some flamegraphs. > > > This was taken during a read operation with two active 10G > interfaces on the server, with a single target being shared by two > tpgs- one tpg for each 10G physical port. The host device has two > 1G ports enabled, with VLANs separating the active ports into > 10G/1G pairs. ESXi is set to multipath using both VLANS with a > round-robin IO interval of 1. > > > https://drive.google.com/file/d/0BwyUMjibonYQd3ZYOGh4d2pteGs/view?usp=sharing > > > This was taken during a write operation: > > > https://drive.google.com/file/d/0BwyUMjibonYQMnBtU1Q2SXM2ams/view?usp=sharing > > > I then rebooted the server and disabled C-State, ACPI T-State, and > general EIST (Turbo boost) functionality in the CPU. > > I when I attempted to boot my guest VM, the iSCSI transfer > gradually ground to a halt during the boot loading process, and > the guest OS never did complete its boot process. > > Here is a flamegraph taken while iSCSI is slowly dying: > > > https://drive.google.com/file/d/0BwyUMjibonYQM21JeFZPX3dZWTg/view?usp=sharing > > > I edited out cpu_idle_adaptive from the dtrace output and > regenerated the slowdown graph: > > > https://drive.google.com/file/d/0BwyUMjibonYQbTVwV3NvXzlPS1E/view?usp=sharing > > > I then edited cpu_idle_adaptive out of the speedy write operation > and regenerated that graph: > > > https://drive.google.com/file/d/0BwyUMjibonYQeWFYM0pCMDZ1X2s/view?usp=sharing > > > I have zero experience with interpreting flamegraphs, but the most > significant difference I see between the slow read example and the > fast write example is in unix`thread_start --> unix`idle. There's > a good chunk of "unix`i86_mwait" in the read example that is not > present in the write example at all. > > Disabling the l2arc cache device didn't make a difference, and I > had to reenable EIST support on the CPU to get my VMs to boot. > > I am seeing a variety of bug reports going back to 2010 regarding > excessive mwait operations, with the suggested solutions usually > being to set "cpupm enable poll-mode" in power.conf. That change > also had no effect on speed. 
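Garrett's suggestion near the top of this message - check the TCP kstats for retransmits and out-of-order segments - can be done with a one-liner; the statistic names vary slightly between releases, so grep loosely and diff a before/after snapshot around a test run:

  kstat -p -m tcp | egrep -i 'retrans|unorder|dupack'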
> > -Warren V > > > > > -----Original Message----- > > From: Chris Siebenmann [mailto:cks at cs.toronto.edu] > > Sent: Monday, February 23, 2015 8:30 AM > > To: W Verb > > Cc: omnios-discuss at lists.omniti.com > > ; cks at cs.toronto.edu > > > Subject: Re: [OmniOS-discuss] The ixgbe driver, Lindsay Lohan, and > the Greek economy > > > > Chris, thanks for your specific details. I'd appreciate it if you > > > could tell me which copper NIC you tried, as well as to pass on the > > > iSCSI tuning parameters. > > > Our copper NIC experience is with onboard X540-AT2 ports on > SuperMicro hardware (which have the guaranteed 10-20 msec lock > hold) and dual-port 82599EB TN cards (which have some sort of > driver/hardware failure under load that eventually leads to > 2-second lock holds). I can't recommend either with the current > driver; we had to revert to 1G networking in order to get stable > servers. > > > The iSCSI parameter modifications we do, across both initiators > and targets, are: > > > initialr2tno > > firstburstlength128k > > maxrecvdataseglen128k[only on Linux backends] > > maxxmitdataseglen128k[only on Linux backends] > > > The OmniOS initiator doesn't need tuning for more than the first > two parameters; on the Linux backends we tune up all four. My > extended thoughts on these tuning parameters and why we touch them > can be found > > here: > > > > http://utcc.utoronto.ca/~cks/space/blog/tech/UnderstandingiSCSIProtocol > > http://utcc.utoronto.ca/~cks/space/blog/tech/LikelyISCSITuning > > > The short version is that these parameters probably only make a > small difference but their overall goal is to do 128KB ZFS reads > and writes in single iSCSI operations (although they will be > fragmented at the TCP > > layer) and to do iSCSI writes without a back-and-forth delay > between initiator and target (that's 'initialr2t no'). > > > I think basically everyone should use InitialR2T set to no and in > fact that it should be the software default. These days only > unusually limited iSCSI targets should need it to be otherwise and > they can change their setting for it (initiator and target must > both agree to it being 'yes', so either can veto it). > > > - cks > > > > On Mon, Feb 23, 2015 at 8:21 AM, Joerg Goltermann > > wrote: > > Hi, > > I think your problem is caused by your link properties or your > switch settings. In general the standard ixgbe seems to perform > well. > > I had trouble after changing the default flow control settings > to "bi" > and this was my motivation to update the ixgbe driver a long > time ago. > After I have updated our systems to ixgbe 2.5.8 I never had any > problems .... > > Make sure your switch has support for jumbo frames and you use > the same mtu on all ports, otherwise the smallest will be used. > > What switch do you use? I can tell you nice horror stories about > different vendors.... > > - Joerg > > On 23.02.2015 10:31, W Verb wrote: > > Thank you Joerg, > > I've downloaded the package and will try it tomorrow. > > The only thing I can add at this point is that upon review > of my > testing, I may have performed my "pkg -u" between the > initial quad-gig > performance test and installing the 10G NIC. So this may > be a new > problem introduced in the latest updates. > > Those of you who are running 10G and have not upgraded to > the latest > kernel, etc, might want to do some additional testing > before running the > update. 
> > -Warren V > > On Mon, Feb 23, 2015 at 1:15 AM, Joerg Goltermann > > > >> wrote: > > Hi, > > I remember there was a problem with the flow control > settings in the > ixgbe > driver, so I updated it a long time ago for our > internal servers to > 2.5.8. > Last weekend I integrated the latest changes from the > FreeBSD driver > to bring > the illumos ixgbe to 2.5.25 but I had no time to test > it, so it's > completely > untested! > > > If you would like to give the latest driver a try you > can fetch the > kernel modules from > https://cloud.osn.de/index.____php/s/Fb4so9RsNnXA7r9 > > > > > Clone your boot environment, place the modules in the > new environment > and update the boot-archive of the new BE. > > - Joerg > > > > > > On 23.02.2015 02:54, W Verb wrote: > > By the way, to those of you who have working > setups: please send me > your pool/volume settings, interface linkprops, > and any kernel > tuning > parameters you may have set. > > Thanks, > Warren V > > On Sat, Feb 21, 2015 at 7:59 AM, Schweiss, Chip > > >> > > > wrote: > > I can't say I totally agree with your performance > assessment. I run Intel > X520 in all my OmniOS boxes. > > Here is a capture of nfssvrtop I made while > running many > storage vMotions > between two OmniOS boxes hosting NFS > datastores. This is a > 10 host VMware > cluster. Both OmniOS boxes are dual 10G > connected with > copper twin-ax to > the in rack Nexus 5010. > > VMware does 100% sync writes, I use ZeusRAM > SSDs for log > devices. > > -Chip > > 2014 Apr 24 08:05:51, load: 12.64, read: > 17330243 KB, > swrite: 15985 KB, > awrite: 1875455 KB > > Ver Client NFSOPS Reads > SWrites AWrites > Commits Rd_bw > SWr_bw AWr_bw Rd_t SWr_t AWr_t > Com_t Align% > > 4 10.28.17.105 0 0 > 0 0 > 0 0 > 0 0 0 0 0 0 0 > > 4 10.28.17.215 0 0 > 0 0 > 0 0 > 0 0 0 0 0 0 0 > > 4 10.28.17.213 0 0 > 0 0 > 0 0 > 0 0 0 0 0 0 0 > > 4 10.28.16.151 0 0 > 0 0 > 0 0 > 0 0 0 0 0 0 0 > > 4 all 1 0 > 0 0 > 0 0 > 0 0 0 0 0 0 0 > > 3 10.28.16.175 3 0 > 3 0 > 0 1 > 11 0 4806 48 0 0 85 > > 3 10.28.16.183 6 0 > 6 0 > 0 3 > 162 0 549 124 0 0 > 73 > > 3 10.28.16.180 11 0 > 10 0 > 0 3 > 27 0 776 89 0 0 67 > > 3 10.28.16.176 28 2 > 26 0 > 0 10 > 405 0 2572 198 0 0 > 100 > > 3 10.28.16.178 4606 4602 > 4 0 > 0 294534 > 3 0 723 49 0 0 99 > > 3 10.28.16.179 4905 4879 > 26 0 > 0 312208 > 311 0 735 271 0 0 > 99 > > 3 10.28.16.181 5515 5502 > 13 0 > 0 352107 > 77 0 89 87 0 0 99 > > 3 10.28.16.184 12095 12059 > 10 0 > 0 763014 > 39 0 249 147 0 0 99 > > 3 10.28.58.1 15401 6040 > 116 6354 > 53 191605 > 474 202346 192 96 144 83 > 99 > > 3 all 42574 33086 <42574%2033086>> > > 217 > 6354 53 1913488 > 1582 202300 348 138 153 105 > 99 > > > > > > On Fri, Feb 20, 2015 at 11:46 PM, W Verb > > > > >> wrote: > > > Hello All, > > Thank you for your replies. > I tried a few things, and found the following: > > 1: Disabling hyperthreading support in the > BIOS drops > performance overall > by a factor of 4. > 2: Disabling VT support also seems to have > some effect, > although it > appears to be minor. But this has the > amusing side > effect of fixing the > hangs I've been experiencing with fast > reboot. Probably > by disabling kvm. > 3: The performance tests are a bit tricky > to quantify > because of caching > effects. In fact, I'm not entirely sure > what is > happening here. 
It's just > best to describe what I'm seeing: > > The commands I'm using to test are > dd if=/dev/zero of=./test.dd bs=2M count=5000 > dd of=/dev/null if=./test.dd bs=2M count=5000 > The host vm is running Centos 6.6, and has > the latest > vmtools installed. > There is a host cache on an SSD local to > the host that > is also in place. > Disabling the host cache didn't > immediately have an > effect as far as I could > see. > > The host MTU set to 3000 on all iSCSI > interfaces for all > tests. > > Test 1: Right after reboot, with an ixgbe > MTU of 9000, > the write test > yields an average speed over three tests > of 137MB/s. The > read test yields an > average over three tests of 5MB/s. > > Test 2: After setting "ifconfig ixgbe0 mtu > 3000", the > write tests yield > 140MB/s, and the read tests yield 53MB/s. > It's important > to note here that > if I cut the read test short at only > 2-3GB, I get > results upwards of > 350MB/s, which I assume is local > cache-related distortion. > > Test 3: MTU of 1500. Read tests are up to > 156 MB/s. > Write tests yield > about 142MB/s. > Test 4: MTU of 1000: Read test at 182MB/s. > Test 5: MTU of 900: Read test at 130 MB/s. > Test 6: MTU of 1000: Read test at 160MB/s. > Write tests > are now > consistently at about 300MB/s. > Test 7: MTU of 1200: Read test at 124MB/s. > Test 8: MTU of 1000: Read test at 161MB/s. > Write at 261MB/s. > > A few final notes: > L1ARC grabs about 10GB of RAM during the > tests, so > there's definitely some > read caching going on. > The write operations are easier to observe > with iostat, > and I'm seeing io > rates that closely correlate with the > network write speeds. > > > Chris, thanks for your specific details. > I'd appreciate > it if you could > tell me which copper NIC you tried, as > well as to pass > on the iSCSI tuning > parameters. > > I've ordered an Intel EXPX9502AFXSR, which > uses the > 82598 chip instead of > the 82599 in the X520. If I get similar > results with my > fiber transcievers, > I'll see if I can get a hold of copper ones. > > But I should mention that I did indeed > look at PHY/MAC > error rates, and > they are nil. > > -Warren V > > On Fri, Feb 20, 2015 at 7:25 PM, Chris > Siebenmann > > > > >> > > wrote: > > > After installation and > configuration, I observed > all kinds of bad > behavior > in the network traffic between the > hosts and the > server. All of this > bad > behavior is traced to the ixgbe > driver on the > storage server. Without > going > into the full troubleshooting > process, here are > my takeaways: > > [...] > > For what it's worth, we managed to > achieve much > better line rates on > copper 10G ixgbe hardware of various > descriptions > between OmniOS > and CentOS 7 (I don't think we ever > tested OmniOS to > OmniOS). I don't > believe OmniOS could do TCP at full > line rate but I > think we managed 700+ > Mbytes/sec on both transmit and > receive and we got > basically disk-limited > speeds with iSCSI (across multiple > disks on > multi-disk mirrored pools, > OmniOS iSCSI initiator, Linux iSCSI > targets). > > I don't believe we did any specific > kernel tuning > (and in fact some of > our attempts to fiddle ixgbe driver > parameters blew > up in our face). > We did tune iSCSI connection > parameters to increase > various buffer > sizes so that ZFS could do even large > single > operations in single iSCSI > transactions. (More details available > if people are > interested.) 
> > 10: At the wire level, the speed > problems are > clearly due to pauses in > response time by omnios. At 9000 > byte frame > sizes, I see a good number > of duplicate ACKs and fast > retransmits during > read operations (when > omnios is transmitting). But below > about a > 4100-byte MTU on omnios > (which seems to correlate to > 4096-byte iSCSI > block transfers), the > transmission errors fade away and > we only see > the transmission pause > problem. > > > This is what really attracted my > attention. In > our OmniOS setup, our > specific Intel hardware had ixgbe > driver issues that > could cause > activity stalls during once-a-second > link heartbeat > checks. This > obviously had an effect at the TCP and > iSCSI layers. > My initial message > to illumos-developer sparked a potentially > interesting discussion: > > > http://www.listbox.com/member/____archive/182179/2014/10/sort/____time_rev/page/16/entry/6:__405/__20141003125035:6357079A-__4B1D-__11E4-A39C-D534381BA44D/ > < > http://www.listbox.com/member/__archive/182179/2014/10/sort/__time_rev/page/16/entry/6:405/__20141003125035:6357079A-4B1D-__11E4-A39C-D534381BA44D/ > > > > < > http://www.listbox.com/__member/archive/182179/2014/10/__sort/time_rev/page/16/entry/6:__405/20141003125035:6357079A-__4B1D-11E4-A39C-D534381BA44D/ > < > http://www.listbox.com/member/archive/182179/2014/10/sort/time_rev/page/16/entry/6:405/20141003125035:6357079A-4B1D-11E4-A39C-D534381BA44D/ > >> > > If you think this is a possibility in > your setup, > I've put the DTrace > script I used to hunt for this up on > the web: > > > http://www.cs.toronto.edu/~____cks/src/omnios-ixgbe/ixgbe_____delay.d > < > http://www.cs.toronto.edu/~__cks/src/omnios-ixgbe/ixgbe___delay.d> > > < > http://www.cs.toronto.edu/~__cks/src/omnios-ixgbe/ixgbe___delay.d > >> > > This isn't the only potential source > of driver > stalls by any means, it's > just the one I found. You may also > want to look at > lockstat in general, > as information it reported is what led > us to look > specifically at the > ixgbe code here. > > (If you suspect kernel/driver issues, > lockstat > combined with kernel > source is a really excellent resource.) > > - cks > > > > > > ___________________________________________________ > OmniOS-discuss mailing list > OmniOS-discuss at lists.omniti > .____com > > > > http://lists.omniti.com/____mailman/listinfo/omnios-____discuss > > > > > > > ___________________________________________________ > OmniOS-discuss mailing list > OmniOS-discuss at lists.omniti > .____com > > > > http://lists.omniti.com/____mailman/listinfo/omnios-____discuss > > > > > > > -- > OSN Online Service Nuernberg GmbH, Bucher Str. 78, > 90408 Nuernberg > Tel: +49 911 39905-0 <%2B49%20911%2039905-0>> > > - Fax: > +49 911 > 39905-55 <%2B49%20911%2039905-55>> - > http://www.osn.de > HRB 15022 Nuernberg, USt-Id: DE189301263, GF: Joerg > Goltermann > > > > -- > OSN Online Service Nuernberg GmbH, Bucher Str. 78, 90408 Nuernberg > Tel: +49 911 39905-0 <%2B49%20911%2039905-0>> - Fax: +49 > 911 39905-55 > > - http://www.osn.de > > HRB 15022 Nuernberg, USt-Id: DE189301263, GF: Joerg Goltermann > > > *illumos-developer* | Archives > > > | Modify Your Subscription > [Powered by Listbox] > > > > *illumos-developer* | Archives > > | > Modify > > > > Your Subscription [Powered by Listbox] > > ... > > [Message clipped] -------------- next part -------------- An HTML attachment was scrubbed... 
URL: 
From gate03 at landcroft.co.uk Wed Mar 4 04:26:03 2015
From: gate03 at landcroft.co.uk (Michael Mounteney)
Date: Wed, 4 Mar 2015 14:26:03 +1000
Subject: [OmniOS-discuss] speeding up file access
Message-ID: <20150304142603.152ac1da@emeritus>

Hello list; this is a very basic question about ZFS performance from someone
with limited sysadmin knowledge. I've seen various messages about ZILs and
caching, and noticed that my Supermicro 5017C-LF
(http://www.supermicro.com/products/system/1U/5017/SYS-5017C-LF.cfm) has a
single USB socket on the board, so I wondered if it would be worth putting a
USB stick / `thumbdrive' in there and using it as the ZIL / cache.

I know the real answer to my question is 'buy a proper server' but this is a
home system and cost, noise and power-consumption all mandate the current
choice of machine. (Yes; the USB socket is vertical; I'd have to buy a
right-angle converter.)

Thanks, Michael.

From wverb73 at gmail.com Wed Mar 4 05:21:56 2015
From: wverb73 at gmail.com (W Verb)
Date: Tue, 3 Mar 2015 21:21:56 -0800
Subject: Re: [OmniOS-discuss] [developer] Re: The ixgbe driver, Lindsay Lohan, and the Greek economy
In-Reply-To: 
References: <20150221032559.727D07A0792@apps0.cs.toronto.edu>
 <54EAEFA8.4020101@osn.de> <54EB5392.6030900@osn.de>
 <279C084B-E76B-4F38-A7CF-D6CB37D06D8F@damore.org> <54F44602.5030705@osn.de>
 <064C3FC8-43E0-4757-AADA-831303782A4C@damore.org>
Message-ID: 

Hello all,

This is probably the last message in this thread.

I pulled the quad-gig NIC out of one of my hosts, and installed an X520. I
then set a single 10G port on the server to be on the same VLAN as the host,
and defined a vswitch, vmknic, etc. on the host. I set the MTU to 9000 on
both sides, then ran my tests.

Read: 130 MB/s. Write: 156 MB/s.

Additionally, at higher MTUs, the NIC would periodically lock up until I
performed an "ipadm disable-if -t ixgbe0" and re-enabled it. I tried your
updated driver, Joerg, but unfortunately it failed quite often.

I then disabled stmf, enabled NFS (v3 only) on the server, and shared a
dataset on the zpool with "share -f nfs /ppool/testy". I then mounted the
server dataset on the host via NFS, and copied my test VM from the iSCSI
zvol to the NFS dataset. I also removed the binding of the 10G port on the
host from the sw iscsi interface.

Running the same tests on the VM over NFSv3 yielded:
Read: 650MB/s
Write: 306MB/s

This is getting within 10% of the throughput I consistently get on local dd
operations on the server, so I'm pretty happy that I'm getting as good as
I'm going to get until I add more drives. Additionally, I haven't
experienced any NIC hangs.

I tried varying the settings in ixgbe.conf, the MTU, and disabling LSO on
the host and server, but nothing really made that much of a difference
(except reducing the MTU made things about 20-30% slower).

mpstat during both NFS and iSCSI transfers showed all processors as getting
roughly the same number of interrupts, etc, although I did see a varying
number of spins on reader/writer locks during the iSCSI transfers. The NFS
showed no srws at all.
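For anyone repeating the iSCSI-to-NFS comparison described above, a minimal sketch of the switch-over (dataset name taken from the message; note that share(1M) actually wants an uppercase -F, and "zfs set sharenfs=on" is the persistent equivalent):

  # stop serving the COMSTAR target
  svcadm disable stmf
  # bring up the NFS server and export the test dataset
  svcadm enable -r nfs/server
  zfs set sharenfs=on ppool/testy     # or: share -F nfs /ppool/testy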
Here is a pretty representative example of a 1s mpstat during an iSCSI
transfer:

CPU minf mjf xcal  intr ithr  csw icsw migr smtx  srw syscl usr sys  wt idl set
  0    0   0    0  3246 2690 8739    6  772 5967    2     0   0  11   0  89   0
  1    0   0    0  2366 2249 7910    8  988 5563    2   302   0   9   0  91   0
  2    0   0    0  2455 2344 5584    5  687 5656    3    66   0   9   0  91   0
  3    0   0   25   248   12 6210    1  885 5679    2     0   0   9   0  91   0
  4    0   0    0   284    7 5450    2  861 5751    1     0   0   8   0  92   0
  5    0   0    0   232    3 4513    0  547 5733    3     0   0   7   0  93   0
  6    0   0    0   322    8 6084    1  836 6295    2     0   0   8   0  92   0
  7    0   0    0  3114 2848 8229    4  648 4966    2     0   0  10   0  90   0

So, it seems that it's COMSTAR/iSCSI that's broke as hell, not ixgbe. My
apologies to anyone I may have offended with my pre-judgement.

The consequences of this performance issue are significant:

1: Instead of being able to utilize the existing quad-port NICs I have in my
hosts, I must use dual 10G cards for redundancy purposes.
2: I must build out a full 10G switching infrastructure.
3: The network traffic is inherently less secure, as it is essentially
impossible to do real security with NFSv3 (that is supported by ESXi).
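For anyone repeating the interrupt-distribution check mentioned in this message, intrstat(1M) breaks interrupt activity out per device and per CPU, which mpstat's intr column does not; for example:

  # ten one-second samples of per-device, per-CPU interrupt counts
  intrstat 1 10
  # alongside the per-CPU summary shown above
  mpstat 1 10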
> > 4: The client is consistently advertising a large window size (20k+), so > the TCP throttling mechanism does not appear to play into this. > > 5: As mentioned previously, layer 2 flow control is not enabled anywhere > in the network, so there are no lower-level mechanisms at work. > > 6: Upon checking buffer and queue sizes (and doing the appropriate > research into documentation on the C3560E's buffer sizes), I do not see > large numbers of frames being dropped by the switch. It does happen at > larger MTUs, but not very often (and not consistently) during transfers at > 1000-3000 byte MTUs. I do not have QoS, policing, or rate-shaping enabled. > > 7: Network interface stats on both the server and the ESXi client show no > errors of any kind. This is via netstat on the server, and esxcli / Vsphere > client on the ESXi box. > > 8: When looking at captures taken simultaneously on the server and client > side, the server-side transmission pauses are consistently seen and > reproducible, even after multiple server rebuilds, ESXi rebuilds, vSphere > reinstallations (down to wiping the SQL db), various COMSTAR configuration > variations, multiple 10G NICs with different NIC chipsets, multiple > switches (I tried both a 48-port and 24-port C3560E), multiple IOS > revisions (12.2 and 15.0), OmniOS versions (r151012 and previous) multiple > cables, transceivers, etc etc etc etc etc > > For your review, I have uploaded the actual packet captures to Google > Drive: > > > https://drive.google.com/file/d/0BwyUMjibonYQZS03dHJ2ZjJvTEE/view?usp=sharing > 2 int write - ESXi vmk5 > > https://drive.google.com/file/d/0BwyUMjibonYQTlNTQ2M5bjlxZ00/view?usp=sharing > 2 int write - ESXi vmk1 > > https://drive.google.com/file/d/0BwyUMjibonYQUEFsSVJCYXBVX3c/view?usp=sharing > 2 int read - server ixgbe0 > > https://drive.google.com/file/d/0BwyUMjibonYQT3FBbElnNFpJTzQ/view?usp=sharing > 2 int read - ESXi vmk5 > > https://drive.google.com/file/d/0BwyUMjibonYQU1hXdFRLM2cxSTA/view?usp=sharing > 2 int read - ESXi vmk1 > > https://drive.google.com/file/d/0BwyUMjibonYQNEFZSHVZdFNweDA/view?usp=sharing > 1 int write - ESXi vmk1 > > https://drive.google.com/file/d/0BwyUMjibonYQM3FpTmloQm5iMGc/view?usp=sharing > 1 int read - ESXi vmk1 > > Regards, > > Warren V > > On Mon, Mar 2, 2015 at 1:11 PM, Mallory, Rob > wrote: > >> Just an EWAG, and forgive me for not following closely, I just saw >> this in my inbox, and looked at it and the screenshots for 2 minutes. >> >> >> >> But this looks like the typical incast problem.. see >> http://www.pdl.cmu.edu/Incast/ >> >> where your storage servers (there are effectively two with ISCSI/MPIO if >> round-robin is working) have networks which are 20:1 oversubscribed to your >> 1GbE host interfaces. (although one of the tcpdumps shows only one server >> so it may be choked out completely) >> >> >> >> What is your BDP? I?m guessing .150ms * 1GbE. For single-link that gets >> you to a MSS of 18700 or so. >> >> >> >> On your 1GbE connected clients, leave MTU at 9k, set the following in >> sysctl.conf, >> >> And reboot. >> >> >> >> net.ipv4.tcp_rmem = 4096 8938 17876 >> >> >> >> If MPIO from the server is indeed round-robining properly, this will >> ?make things fit? much better. >> >> >> >> Note that your tcp_wmem can and should stay high, since you are not >> oversubscribed going from client?server ; you only need to tweak the >> tcp receive window size. 
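(The arithmetic behind the ~18700 figure above, for anyone checking it: BDP = bandwidth x RTT = 1 Gbit/s x 0.150 ms = 150,000 bits, or about 18,750 bytes. The suggested tcp_rmem values of 8938 and 17876 bytes are then roughly half and all of that budget; the point of the tuning is to keep the client's advertised receive window no larger than what a 1GbE link can actually have in flight, so a 10G sender can't build up a burst that overruns the switch port.)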
>> >> >> >> I?ve not done it in quite some time, but IIRC, You can also set these >> from the server side with: >> >> Route add -sendpipe 8930 or ?ssthresh >> >> >> >> And I think you can see the hash-table with computed BDP per client with >> ndd. >> >> >> >> I would try playing with those before delving deep into potential bugs in >> the TCP, nic driver, zfs, or vm. >> >> -Rob >> >> >> >> *From:* W Verb via illumos-developer [mailto:developer at lists.illumos.org] >> >> *Sent:* Monday, March 02, 2015 12:20 PM >> *To:* Garrett D'Amore >> *Cc:* Joerg Goltermann; illumos-dev; omnios-discuss at lists.omniti.com >> *Subject:* Re: [developer] Re: [OmniOS-discuss] The ixgbe driver, >> Lindsay Lohan, and the Greek economy >> >> >> >> Hello, >> >> vmstat seems pretty boring. Certainly nothing going to swap. >> >> root at sanbox:/root# vmstat >> kthr memory page disk faults cpu >> r b w swap free re mf pi po fr de sr po ro s0 s2 in sy cs us >> sy id >> 0 0 0 34631632 30728068 175 215 0 0 0 0 963 275 4 6 140 3301 796 6681 0 >> 1 99 >> >> Here is the "taskq_dispatch_ent" output from "lockstat -s 5 -kWP sleep >> 30" during the "fast" write operation. >> >> >> ------------------------------------------------------------------------------- >> Count indv cuml rcnt nsec Hottest Lock Caller >> 50934 3% 79% 0.00 3437 0xffffff093145ba40 taskq_dispatch_ent >> >> nsec ------ Time Distribution ------ count Stack >> 128 | 7 >> spa_taskq_dispatch_ent >> 256 |@@ 4333 zio_taskq_dispatch >> 512 |@@ 3863 zio_issue_async >> 1024 |@@@@@ 9717 zio_execute >> 2048 |@@@@@@@@@ 15904 >> 4096 |@@@@ 7595 >> 8192 |@@ 4498 >> 16384 |@ 2662 >> 32768 |@ 1886 >> 65536 | 434 >> 131072 | 34 >> 262144 | 1 >> >> ------------------------------------------------------------------------------- >> >> >> However, the truly "broken" function is a read operation: >> >> Top lock 1st try: >> >> ------------------------------------------------------------------------------- >> Count indv cuml rcnt nsec Hottest Lock Caller >> 474 15% 15% 0.00 7031 0xffffff093145b6f8 cv_wait >> >> nsec ------ Time Distribution ------ count Stack >> 256 |@ 29 taskq_thread_wait >> 512 |@@@@@@ 100 taskq_thread >> 1024 |@@@@ 72 thread_start >> 2048 |@@@@ 69 >> 4096 |@@@ 51 >> 8192 |@@ 47 >> 16384 |@@ 44 >> 32768 |@@ 32 >> 65536 |@ 25 >> 131072 | 5 >> >> ------------------------------------------------------------------------------- >> >> Top lock 2nd try: >> >> >> ------------------------------------------------------------------------------- >> Count indv cuml rcnt nsec Hottest Lock Caller >> 174 39% 39% 0.00 103909 0xffffff0943f116a0 dmu_zfetch_find >> >> nsec ------ Time Distribution ------ count Stack >> 2048 | 2 dmu_zfetch >> 4096 | 3 dbuf_read >> 8192 | 4 >> dmu_buf_hold_array_by_dnode >> 16384 | 3 dmu_buf_hold_array >> 32768 |@ 7 >> 65536 |@@ 14 >> 131072 |@@@@@@@@@@@@@@@@@@@@ 116 >> 262144 |@@@ 19 >> 524288 | 4 >> 1048576 | 2 >> >> ------------------------------------------------------------------------------- >> >> Top lock 3rd try: >> >> >> ------------------------------------------------------------------------------- >> Count indv cuml rcnt nsec Hottest Lock Caller >> 283 55% 55% 0.00 94602 0xffffff0943ff5a68 dmu_zfetch_find >> >> nsec ------ Time Distribution ------ count Stack >> 512 | 1 dmu_zfetch >> 1024 | 1 dbuf_read >> 2048 | 0 >> dmu_buf_hold_array_by_dnode >> 4096 | 5 dmu_buf_hold_array >> 8192 | 2 >> 16384 | 7 >> 32768 | 4 >> 65536 |@@@ 33 >> 131072 |@@@@@@@@@@@@@@@@@@@@ 198 >> 262144 |@@ 27 >> 524288 | 2 >> 1048576 | 3 >> >> 
------------------------------------------------------------------------------- >> >> >> >> As for the MTU question- setting the MTU to 9000 makes read operations >> grind almost to a halt at 5MB/s transfer rate. >> >> -Warren V >> >> >> >> On Mon, Mar 2, 2015 at 11:30 AM, Garrett D'Amore >> wrote: >> >> Here?s a theory. You are using small (relatively) MTUs (3000 is less >> than the smallest ZFS block size.) So, when you go multipathing this way, >> might a single upper layer transaction (ZFS block transfer request, or for >> that matter COMSTAR block request) get routed over different paths. This >> sounds like a potentially pathological condition to me. >> >> >> >> What happens if you increase the MTU to 9000? Have you tried it? I?m >> sort of thinking that this will permit each transaction to be issued in a >> single IP frame, which may alleviate certain tragic code paths. (That >> said, I?m not sure how aware COMSTAR is of the IP MTU. If it is ignorant, >> then it shouldn?t matter *that* much, since TCP should do the right thing >> here and a single TCP stream should stick to a single underlying NIC. But >> if COMSTAR is aware of the MTU, it may do some really screwball things as >> it tries to break requests up into single frames.) >> >> >> >> Your read spin really looks like only about 22 msec of wait out of a >> total run of 30 sec. (That?s not *great*, but neither does it sound >> tragic.) Your write is interesting because that looks like it is going a >> wildly different path. You should be aware that the locks you see are >> *not* necessarily related in call order, but rather are ordered by instance >> count. The write code path hitting the task_thread as hard as it does is >> really, really weird. Something is pounding on a taskq lock super hard. >> The number of taskq_dispatch_ent calls is interesting here. I?m starting >> to wonder if it?s something as stupid as a spin where if the taskq is >> ?full? (max size reached), a caller just is spinning trying to dispatch >> jobs to the taskq. >> >> >> >> The taskq_dispatch_ent code is super simple, and it should be almost >> impossible to have contention on that lock ? barring a thread spinning hard >> on taskq_dispatch (or taskq_dispatch_ent as I think is happening here). >> Looking at the various call sites, there are places in both COMSTAR >> (iscsit) and in ZFS where this could be coming from. To know which, we >> really need to have the back trace associated. >> >> >> >> lockstat can give this ? try giving ?-s 5? to give a short backtrace from >> this, that will probably give us a little more info about the guilty >> caller. :-) >> >> >> >> - Garrett >> >> >> >> On Mar 2, 2015, at 11:07 AM, W Verb via illumos-developer < >> developer at lists.illumos.org> wrote: >> >> >> >> Hello all, >> >> I am not using layer 2 flow control. The switch carries line-rate 10G >> traffic without error. >> >> I think I have found the issue via lockstat. 
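(A quick decoder for the lockstat output that follows, based on my reading of the lockstat man page: -k coalesces program counters within functions, -W aggregates events by caller rather than by individual lock, and -P sorts by the count-times-time product. Each list below is therefore roughly "which callers accounted for the most total adaptive-mutex spin time" during the 30-second sample.)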
The first lockstat is taken >> during a multipath read: >> >> lockstat -kWP sleep 30 >> >> >> Adaptive mutex spin: 21331 events in 30.020 seconds (711 events/sec) >> >> Count indv cuml rcnt nsec Hottest Lock Caller >> >> ------------------------------------------------------------------------------- >> 9306 44% 44% 0.00 1557 htable_mutex+0x370 htable_release >> 6307 23% 68% 0.00 1207 htable_mutex+0x108 htable_lookup >> 596 7% 75% 0.00 4100 0xffffff0931705188 cv_wait >> 349 5% 80% 0.00 4437 0xffffff0931705188 taskq_thread >> 704 2% 82% 0.00 995 0xffffff0935de3c50 dbuf_create >> >> The hash table being read here I would guess is the tcp connection hash >> table. >> >> >> >> When lockstat is run during a multipath write operation, I get: >> >> Adaptive mutex spin: 1097341 events in 30.016 seconds (36558 events/sec) >> >> Count indv cuml rcnt nsec Hottest Lock Caller >> >> ------------------------------------------------------------------------------- >> 210752 28% 28% 0.00 4781 0xffffff0931705188 taskq_thread >> 174471 22% 50% 0.00 4476 0xffffff0931705188 cv_wait >> 127183 10% 61% 0.00 2871 0xffffff096f29b510 zio_notify_parent >> 176066 10% 70% 0.00 1922 0xffffff0931705188 taskq_dispatch_ent >> 105134 9% 80% 0.00 3110 0xffffff096ffdbf10 zio_remove_child >> 67512 4% 83% 0.00 1938 0xffffff096f3db4b0 zio_add_child >> 45736 3% 86% 0.00 2239 0xffffff0935de3c50 dbuf_destroy >> 27781 3% 89% 0.00 3416 0xffffff0935de3c50 dbuf_create >> 38536 2% 91% 0.00 2122 0xffffff0935de3b70 dnode_rele >> 27841 2% 93% 0.00 2423 0xffffff0935de3b70 dnode_diduse_space >> 19020 2% 95% 0.00 3046 0xffffff09d9e305e0 dbuf_rele >> 14627 1% 96% 0.00 3632 dbuf_hash_table+0x4f8 dbuf_find >> >> >> Writes are not performing htable lookups, while reads are. >> >> -Warren V >> >> >> >> >> >> >> >> On Mon, Mar 2, 2015 at 3:14 AM, Joerg Goltermann wrote: >> >> Hi, >> >> I would try *one* TPG which includes both interface addresses >> and I would double check for packet drops on the Catalyst. >> >> The 3560 supports only receive flow control which means, that >> a sending 10Gbit port can easily overload a 1Gbit port. >> Do you have flow control enabled? >> >> - Joerg >> >> >> >> On 02.03.2015 09:22, W Verb via illumos-developer wrote: >> >> Hello Garrett, >> >> No, no 802.3ad going on in this config. >> >> Here is a basic schematic: >> >> >> https://drive.google.com/file/d/0BwyUMjibonYQVkVqcE5OQUJyUUU/view?usp=sharing >> >> Here is the Nexenta MPIO iSCSI Setup Document that I used as a guide: >> >> >> https://drive.google.com/file/d/0BwyUMjibonYQbjEyUTBjN2tTNWM/view?usp=sharing >> >> Note that I am using an MTU of 3000 on both the 10G and 1G NICs. The >> switch is set to allow 9148-byte frames, and I'm not seeing any >> errors/buffer overruns on the switch. >> >> Here is a screenshot of a packet capture from a read operation on the >> guest OS (from it's local drive, which is actually a VMDK file on the >> storage server). In this example, only a single 1G ESXi kernel interface >> (vmk1) is bound to the software iSCSI initiator. >> >> >> https://drive.google.com/file/d/0BwyUMjibonYQa2NYdXhpZkpkbU0/view?usp=sharing >> >> Note that there's a nice, well-behaved window sizing process taking >> place. The ESXi decreases the scaled window by 11 or 12 for each ACK, >> then bumps it back up to 512. 
>> >> Here is a similar screenshot of a single-interface write operation: >> >> >> https://drive.google.com/file/d/0BwyUMjibonYQbU1RZHRnakxDSFU/view?usp=sharing >> >> There are no pauses or gaps in the transmission rate in the >> single-interface transfers. >> >> >> In the next screenshots, I have enabled an additional 1G interface on >> the ESXi host, and bound it to the iSCSI initiator. The new interface is >> bound to a separate physical port, uses a different VLAN on the switch, >> and talks to a different 10G port on the storage server. >> >> First, let's look at a write operation on the guest OS, which happily >> pumps data at near-line-rate to the storage server. >> >> Here is a sequence number trace diagram. Note how the transfer has a >> nice, smooth increment rate over the entire transfer. >> >> >> https://drive.google.com/file/d/0BwyUMjibonYQWHNIa0drWnNxMmM/view?usp=sharing >> >> Here are screenshots from packet captures on both 1G interfaces: >> >> >> https://drive.google.com/file/d/0BwyUMjibonYQRWhyVVQ4djNaU3c/view?usp=sharing >> >> https://drive.google.com/file/d/0BwyUMjibonYQaTVjTEtTRloyR2c/view?usp=sharing >> >> Note how we again see nice, smooth window adjustment, and no gaps in >> transmission. >> >> >> But now, let's look at the problematic two-interface Read operation. >> First, the sequence graph: >> >> >> https://drive.google.com/file/d/0BwyUMjibonYQTzdFVWdQMWZ6LUU/view?usp=sharing >> >> As you can see, there are gaps and jumps in the transmission throughout >> the transfer. >> It is very illustrative to look at captures of the gaps, which are >> occurring on both interfaces: >> >> >> https://drive.google.com/file/d/0BwyUMjibonYQc0VISXN6eVFwQzg/view?usp=sharing >> >> https://drive.google.com/file/d/0BwyUMjibonYQVFREUHp3TGFiUU0/view?usp=sharing >> >> As you can see, there are ~.4 second pauses in transmission from the >> storage server, which kills the transfer rate. >> It's clear that the ESXi box ACKs the prior iSCSI operation to >> completion, then makes a new LUN request, which the storage server >> immediately replies to. The ESXi ACKs the response packet from the >> storage server, then waits...and waits....and waits... until eventually >> the storage server starts transmitting again. >> >> Because the pause happens while the ESXi client is waiting for a packet >> from the storage server, that tells me that the gaps are not an artifact >> of traffic being switched between both active interfaces, but are >> actually indicative of short hangs occurring on the server. >> >> Having a pause or two in transmission is no big deal, but in my case, it >> is happening constantly, and dropping my overall read transfer rate down >> to 20-60MB/s, which is slower than the single interface transfer rate >> (~90-100MB/s). >> >> Decreasing the MTU makes the pauses shorter, increasing them makes the >> pauses longer. >> >> Another interesting thing is that if I set the multipath io interval to >> 3 operations instead of 1, I get better throughput. In other words, the >> less frequently I swap IP addresses on my iSCSI requests from the ESXi >> unit, the fewer pauses I see. >> >> Basically, COMSTAR seems to choke each time an iSCSI request from a new >> IP arrives. >> >> Because the single interface transfer is near line rate, that tells me >> that the storage system (mpt_sas, zfs, etc) is working fine. It's only >> when multiple paths are attempted that iSCSI falls on its face during >> reads. 
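(For reference, the "multipath io interval" being varied here is the ESXi round-robin IOPS limit on the path selection policy. Changing it per device looks roughly like the following -- the device identifier is a placeholder and the esxcli namespace may differ slightly between ESXi releases:

  esxcli storage nmp device set --device naa.XXXXXXXX --psp VMW_PSP_RR
  esxcli storage nmp psp roundrobin deviceconfig set --device naa.XXXXXXXX --type iops --iops 3

With --iops 1 the initiator switches paths on every command, which is the worst case for whatever is choking on the target side.)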
>> >> All of these captures were taken without a cache device being attached >> to the storage zpool, so this isn't looking like some kind of ZFS ARC >> problem. As mentioned previously, local transfers to/from the zpool are >> showing ~300-500 MB/s rates over long transfers (10G+). >> >> -Warren V >> >> On Sun, Mar 1, 2015 at 9:11 PM, Garrett D'Amore > >> > wrote: >> >> I?m not sure I?ve followed properly. You have *two* interfaces. >> You are not trying to provision these in an aggr are you? As far as >> I?m aware, VMware does not support 802.3ad link aggregations. (Its >> possible that you can make it work with ESXi if you give the entire >> NIC to the guest ? but I?m skeptical.) The problem is that if you >> try to use link aggregation, some packets (up to half!) will be >> lost. TCP and other protocols fare poorly in this situation. >> >> Its possible I?ve totally misunderstood what you?re trying to do, in >> which case I apologize. >> >> The idle thing is a red-herring ? the cpu is waiting for work to do, >> probably because packets haven?t arrived (or where dropped by the >> hypervisor!) I wouldn?t read too much into that except that your >> network stack is in trouble. I?d look a bit more closely at the >> kstats for tcp ? I suspect you?ll see retransmits or out of order >> values that are unusually high ? if so this may help validate my >> theory above. >> >> - Garrett >> >> On Mar 1, 2015, at 9:03 PM, W Verb via illumos-developer >> > >> >> >> wrote: >> >> Hello all, >> >> >> Well, I no longer blame the ixgbe driver for the problems I'm seeing. >> >> >> I tried Joerg's updated driver, which didn't improve the issue. So >> I went back to the drawing board and rebuilt the server from scratch. >> >> What I noted is that if I have only a single 1-gig physical >> interface active on the ESXi host, everything works as expected. >> As soon as I enable two interfaces, I start seeing the performance >> problems I've described. >> >> Response pauses from the server that I see in TCPdumps are still >> leading me to believe the problem is delay on the server side, so >> I ran a series of kernel dtraces and produced some flamegraphs. >> >> >> This was taken during a read operation with two active 10G >> interfaces on the server, with a single target being shared by two >> tpgs- one tpg for each 10G physical port. The host device has two >> 1G ports enabled, with VLANs separating the active ports into >> 10G/1G pairs. ESXi is set to multipath using both VLANS with a >> round-robin IO interval of 1. >> >> >> https://drive.google.com/file/d/0BwyUMjibonYQd3ZYOGh4d2pteGs/view?usp=sharing >> >> >> This was taken during a write operation: >> >> >> https://drive.google.com/file/d/0BwyUMjibonYQMnBtU1Q2SXM2ams/view?usp=sharing >> >> >> I then rebooted the server and disabled C-State, ACPI T-State, and >> general EIST (Turbo boost) functionality in the CPU. >> >> I when I attempted to boot my guest VM, the iSCSI transfer >> gradually ground to a halt during the boot loading process, and >> the guest OS never did complete its boot process. 
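(For anyone who wants to produce the same kind of kernel flamegraph: the usual recipe is a timed DTrace profile fed through Brendan Gregg's FlameGraph scripts. A minimal sketch -- the sample window and file names here are arbitrary:

  dtrace -x stackframes=100 -n 'profile-997 /arg0/ { @[stack()] = count(); } tick-60s { exit(0); }' -o kern.stacks
  stackcollapse.pl kern.stacks | flamegraph.pl > kern.svg

The /arg0/ predicate keeps only samples taken while in kernel context.)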
>> >> Here is a flamegraph taken while iSCSI is slowly dying: >> >> >> https://drive.google.com/file/d/0BwyUMjibonYQM21JeFZPX3dZWTg/view?usp=sharing >> >> >> I edited out cpu_idle_adaptive from the dtrace output and >> regenerated the slowdown graph: >> >> >> https://drive.google.com/file/d/0BwyUMjibonYQbTVwV3NvXzlPS1E/view?usp=sharing >> >> >> I then edited cpu_idle_adaptive out of the speedy write operation >> and regenerated that graph: >> >> >> https://drive.google.com/file/d/0BwyUMjibonYQeWFYM0pCMDZ1X2s/view?usp=sharing >> >> >> I have zero experience with interpreting flamegraphs, but the most >> significant difference I see between the slow read example and the >> fast write example is in unix`thread_start --> unix`idle. There's >> a good chunk of "unix`i86_mwait" in the read example that is not >> present in the write example at all. >> >> Disabling the l2arc cache device didn't make a difference, and I >> had to reenable EIST support on the CPU to get my VMs to boot. >> >> I am seeing a variety of bug reports going back to 2010 regarding >> excessive mwait operations, with the suggested solutions usually >> being to set "cpupm enable poll-mode" in power.conf. That change >> also had no effect on speed. >> >> -Warren V >> >> >> >> >> -----Original Message----- >> >> From: Chris Siebenmann [mailto:cks at cs.toronto.edu] >> >> Sent: Monday, February 23, 2015 8:30 AM >> >> To: W Verb >> >> Cc: omnios-discuss at lists.omniti.com >> >> ; cks at cs.toronto.edu >> >> >> Subject: Re: [OmniOS-discuss] The ixgbe driver, Lindsay Lohan, and >> the Greek economy >> >> >> > Chris, thanks for your specific details. I'd appreciate it if you >> >> > could tell me which copper NIC you tried, as well as to pass on the >> >> > iSCSI tuning parameters. >> >> >> Our copper NIC experience is with onboard X540-AT2 ports on >> SuperMicro hardware (which have the guaranteed 10-20 msec lock >> hold) and dual-port 82599EB TN cards (which have some sort of >> driver/hardware failure under load that eventually leads to >> 2-second lock holds). I can't recommend either with the current >> driver; we had to revert to 1G networking in order to get stable >> servers. >> >> >> The iSCSI parameter modifications we do, across both initiators >> and targets, are: >> >> >> initialr2tno >> >> firstburstlength128k >> >> maxrecvdataseglen128k[only on Linux backends] >> >> maxxmitdataseglen128k[only on Linux backends] >> >> >> The OmniOS initiator doesn't need tuning for more than the first >> two parameters; on the Linux backends we tune up all four. My >> extended thoughts on these tuning parameters and why we touch them >> can be found >> >> here: >> >> >> >> http://utcc.utoronto.ca/~cks/space/blog/tech/UnderstandingiSCSIProtocol >> >> http://utcc.utoronto.ca/~cks/space/blog/tech/LikelyISCSITuning >> >> >> The short version is that these parameters probably only make a >> small difference but their overall goal is to do 128KB ZFS reads >> and writes in single iSCSI operations (although they will be >> fragmented at the TCP >> >> layer) and to do iSCSI writes without a back-and-forth delay >> between initiator and target (that's 'initialr2t no'). >> >> >> I think basically everyone should use InitialR2T set to no and in >> fact that it should be the software default. These days only >> unusually limited iSCSI targets should need it to be otherwise and >> they can change their setting for it (initiator and target must >> both agree to it being 'yes', so either can veto it). 
>> >> >> - cks >> >> >> >> On Mon, Feb 23, 2015 at 8:21 AM, Joerg Goltermann > >> > wrote: >> >> Hi, >> >> I think your problem is caused by your link properties or your >> switch settings. In general the standard ixgbe seems to perform >> well. >> >> I had trouble after changing the default flow control settings >> to "bi" >> and this was my motivation to update the ixgbe driver a long >> time ago. >> After I have updated our systems to ixgbe 2.5.8 I never had any >> problems .... >> >> Make sure your switch has support for jumbo frames and you use >> the same mtu on all ports, otherwise the smallest will be used. >> >> What switch do you use? I can tell you nice horror stories about >> different vendors.... >> >> - Joerg >> >> On 23.02.2015 10:31, W Verb wrote: >> >> Thank you Joerg, >> >> I've downloaded the package and will try it tomorrow. >> >> The only thing I can add at this point is that upon review >> of my >> testing, I may have performed my "pkg -u" between the >> initial quad-gig >> performance test and installing the 10G NIC. So this may >> be a new >> problem introduced in the latest updates. >> >> Those of you who are running 10G and have not upgraded to >> the latest >> kernel, etc, might want to do some additional testing >> before running the >> update. >> >> -Warren V >> >> On Mon, Feb 23, 2015 at 1:15 AM, Joerg Goltermann >> >> >> >> wrote: >> >> Hi, >> >> I remember there was a problem with the flow control >> settings in the >> ixgbe >> driver, so I updated it a long time ago for our >> internal servers to >> 2.5.8. >> Last weekend I integrated the latest changes from the >> FreeBSD driver >> to bring >> the illumos ixgbe to 2.5.25 but I had no time to test >> it, so it's >> completely >> untested! >> >> >> If you would like to give the latest driver a try you >> can fetch the >> kernel modules from >> https://cloud.osn.de/index.____php/s/Fb4so9RsNnXA7r9 >> >> > > >> >> Clone your boot environment, place the modules in the >> new environment >> and update the boot-archive of the new BE. >> >> - Joerg >> >> >> >> >> >> On 23.02.2015 02:54, W Verb wrote: >> >> By the way, to those of you who have working >> setups: please send me >> your pool/volume settings, interface linkprops, >> and any kernel >> tuning >> parameters you may have set. >> >> Thanks, >> Warren V >> >> On Sat, Feb 21, 2015 at 7:59 AM, Schweiss, Chip >> >> >> >> >> >> wrote: >> >> I can't say I totally agree with your performance >> assessment. I run Intel >> X520 in all my OmniOS boxes. >> >> Here is a capture of nfssvrtop I made while >> running many >> storage vMotions >> between two OmniOS boxes hosting NFS >> datastores. This is a >> 10 host VMware >> cluster. Both OmniOS boxes are dual 10G >> connected with >> copper twin-ax to >> the in rack Nexus 5010. >> >> VMware does 100% sync writes, I use ZeusRAM >> SSDs for log >> devices. 
>> >> -Chip >> >> 2014 Apr 24 08:05:51, load: 12.64, read: >> 17330243 KB, >> swrite: 15985 KB, >> awrite: 1875455 KB >> >> Ver Client NFSOPS Reads >> SWrites AWrites >> Commits Rd_bw >> SWr_bw AWr_bw Rd_t SWr_t AWr_t >> Com_t Align% >> >> 4 10.28.17.105 0 0 >> 0 0 >> 0 0 >> 0 0 0 0 0 0 0 >> >> 4 10.28.17.215 0 0 >> 0 0 >> 0 0 >> 0 0 0 0 0 0 0 >> >> 4 10.28.17.213 0 0 >> 0 0 >> 0 0 >> 0 0 0 0 0 0 0 >> >> 4 10.28.16.151 0 0 >> 0 0 >> 0 0 >> 0 0 0 0 0 0 0 >> >> 4 all 1 0 >> 0 0 >> 0 0 >> 0 0 0 0 0 0 0 >> >> 3 10.28.16.175 3 0 >> 3 0 >> 0 1 >> 11 0 4806 48 0 0 85 >> >> 3 10.28.16.183 6 0 >> 6 0 >> 0 3 >> 162 0 549 124 0 0 >> 73 >> >> 3 10.28.16.180 11 0 >> 10 0 >> 0 3 >> 27 0 776 89 0 0 67 >> >> 3 10.28.16.176 28 2 >> 26 0 >> 0 10 >> 405 0 2572 198 0 0 >> 100 >> >> 3 10.28.16.178 4606 4602 >> 4 0 >> 0 294534 >> 3 0 723 49 0 0 99 >> >> 3 10.28.16.179 4905 4879 >> 26 0 >> 0 312208 >> 311 0 735 271 0 0 >> 99 >> >> 3 10.28.16.181 5515 5502 >> 13 0 >> 0 352107 >> 77 0 89 87 0 0 99 >> >> 3 10.28.16.184 12095 12059 >> 10 0 >> 0 763014 >> 39 0 249 147 0 0 99 >> >> 3 10.28.58.1 15401 6040 >> 116 6354 >> 53 191605 >> 474 202346 192 96 144 83 >> 99 >> >> 3 all 42574 33086 > <42574%2033086>> >> > 217 >> 6354 53 1913488 >> 1582 202300 348 138 153 105 >> 99 >> >> >> >> >> >> On Fri, Feb 20, 2015 at 11:46 PM, W Verb >> >> > >> >> >> wrote: >> >> >> Hello All, >> >> Thank you for your replies. >> I tried a few things, and found the following: >> >> 1: Disabling hyperthreading support in the >> BIOS drops >> performance overall >> by a factor of 4. >> 2: Disabling VT support also seems to have >> some effect, >> although it >> appears to be minor. But this has the >> amusing side >> effect of fixing the >> hangs I've been experiencing with fast >> reboot. Probably >> by disabling kvm. >> 3: The performance tests are a bit tricky >> to quantify >> because of caching >> effects. In fact, I'm not entirely sure >> what is >> happening here. It's just >> best to describe what I'm seeing: >> >> The commands I'm using to test are >> dd if=/dev/zero of=./test.dd bs=2M count=5000 >> dd of=/dev/null if=./test.dd bs=2M count=5000 >> The host vm is running Centos 6.6, and has >> the latest >> vmtools installed. >> There is a host cache on an SSD local to >> the host that >> is also in place. >> Disabling the host cache didn't >> immediately have an >> effect as far as I could >> see. >> >> The host MTU set to 3000 on all iSCSI >> interfaces for all >> tests. >> >> Test 1: Right after reboot, with an ixgbe >> MTU of 9000, >> the write test >> yields an average speed over three tests >> of 137MB/s. The >> read test yields an >> average over three tests of 5MB/s. >> >> Test 2: After setting "ifconfig ixgbe0 mtu >> 3000", the >> write tests yield >> 140MB/s, and the read tests yield 53MB/s. >> It's important >> to note here that >> if I cut the read test short at only >> 2-3GB, I get >> results upwards of >> 350MB/s, which I assume is local >> cache-related distortion. >> >> Test 3: MTU of 1500. Read tests are up to >> 156 MB/s. >> Write tests yield >> about 142MB/s. >> Test 4: MTU of 1000: Read test at 182MB/s. >> Test 5: MTU of 900: Read test at 130 MB/s. >> Test 6: MTU of 1000: Read test at 160MB/s. >> Write tests >> are now >> consistently at about 300MB/s. >> Test 7: MTU of 1200: Read test at 124MB/s. >> Test 8: MTU of 1000: Read test at 161MB/s. >> Write at 261MB/s. >> >> A few final notes: >> L1ARC grabs about 10GB of RAM during the >> tests, so >> there's definitely some >> read caching going on. 
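(Two small notes on reproducing that observation: ARC growth can be watched directly with kstat, e.g. "kstat -p zfs:0:arcstats:size" every few seconds, and dd from /dev/zero will overstate both the network and the disk numbers if compression is enabled on the dataset, since the zeros compress away almost entirely.)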
>> The write operations are easier to observe >> with iostat, >> and I'm seeing io >> rates that closely correlate with the >> network write speeds. >> >> >> Chris, thanks for your specific details. >> I'd appreciate >> it if you could >> tell me which copper NIC you tried, as >> well as to pass >> on the iSCSI tuning >> parameters. >> >> I've ordered an Intel EXPX9502AFXSR, which >> uses the >> 82598 chip instead of >> the 82599 in the X520. If I get similar >> results with my >> fiber transcievers, >> I'll see if I can get a hold of copper ones. >> >> But I should mention that I did indeed >> look at PHY/MAC >> error rates, and >> they are nil. >> >> -Warren V >> >> On Fri, Feb 20, 2015 at 7:25 PM, Chris >> Siebenmann >> > >> > >> >> >> >> >> wrote: >> >> >> After installation and >> configuration, I observed >> all kinds of bad >> behavior >> in the network traffic between the >> hosts and the >> server. All of this >> bad >> behavior is traced to the ixgbe >> driver on the >> storage server. Without >> going >> into the full troubleshooting >> process, here are >> my takeaways: >> >> [...] >> >> For what it's worth, we managed to >> achieve much >> better line rates on >> copper 10G ixgbe hardware of various >> descriptions >> between OmniOS >> and CentOS 7 (I don't think we ever >> tested OmniOS to >> OmniOS). I don't >> believe OmniOS could do TCP at full >> line rate but I >> think we managed 700+ >> Mbytes/sec on both transmit and >> receive and we got >> basically disk-limited >> speeds with iSCSI (across multiple >> disks on >> multi-disk mirrored pools, >> OmniOS iSCSI initiator, Linux iSCSI >> targets). >> >> I don't believe we did any specific >> kernel tuning >> (and in fact some of >> our attempts to fiddle ixgbe driver >> parameters blew >> up in our face). >> We did tune iSCSI connection >> parameters to increase >> various buffer >> sizes so that ZFS could do even large >> single >> operations in single iSCSI >> transactions. (More details available >> if people are >> interested.) >> >> 10: At the wire level, the speed >> problems are >> clearly due to pauses in >> response time by omnios. At 9000 >> byte frame >> sizes, I see a good number >> of duplicate ACKs and fast >> retransmits during >> read operations (when >> omnios is transmitting). But below >> about a >> 4100-byte MTU on omnios >> (which seems to correlate to >> 4096-byte iSCSI >> block transfers), the >> transmission errors fade away and >> we only see >> the transmission pause >> problem. >> >> >> This is what really attracted my >> attention. In >> our OmniOS setup, our >> specific Intel hardware had ixgbe >> driver issues that >> could cause >> activity stalls during once-a-second >> link heartbeat >> checks. This >> obviously had an effect at the TCP and >> iSCSI layers. 
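(A cheap way to check for that kind of stall from the TCP side, without any DTrace, is to snapshot the TCP MIB counters before and after a slow transfer, e.g.:

  netstat -s -P tcp | egrep -i 'retrans|unorder|drop'

A driver stall long enough to matter at the iSCSI layer usually shows up as a jump in tcpRetransSegs or the out-of-order segment counts.)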
>> My initial message >> to illumos-developer sparked a potentially >> interesting discussion: >> >> >> http://www.listbox.com/member/____archive/182179/2014/10/sort/____time_rev/page/16/entry/6:__405/__20141003125035:6357079A-__4B1D-__11E4-A39C-D534381BA44D/ >> < >> http://www.listbox.com/member/__archive/182179/2014/10/sort/__time_rev/page/16/entry/6:405/__20141003125035:6357079A-4B1D-__11E4-A39C-D534381BA44D/ >> > >> >> < >> http://www.listbox.com/__member/archive/182179/2014/10/__sort/time_rev/page/16/entry/6:__405/20141003125035:6357079A-__4B1D-11E4-A39C-D534381BA44D/ >> < >> http://www.listbox.com/member/archive/182179/2014/10/sort/time_rev/page/16/entry/6:405/20141003125035:6357079A-4B1D-11E4-A39C-D534381BA44D/ >> >> >> >> If you think this is a possibility in >> your setup, >> I've put the DTrace >> script I used to hunt for this up on >> the web: >> >> >> http://www.cs.toronto.edu/~____cks/src/omnios-ixgbe/ixgbe_____delay.d >> < >> http://www.cs.toronto.edu/~__cks/src/omnios-ixgbe/ixgbe___delay.d> >> >> < >> http://www.cs.toronto.edu/~__cks/src/omnios-ixgbe/ixgbe___delay.d >> < >> http://www.cs.toronto.edu/~cks/src/omnios-ixgbe/ixgbe_delay.d>> >> >> This isn't the only potential source >> of driver >> stalls by any means, it's >> just the one I found. You may also >> want to look at >> lockstat in general, >> as information it reported is what led >> us to look >> specifically at the >> ixgbe code here. >> >> (If you suspect kernel/driver issues, >> lockstat >> combined with kernel >> source is a really excellent resource.) >> >> - cks >> >> >> >> >> >> ___________________________________________________ >> OmniOS-discuss mailing list >> OmniOS-discuss at lists.omniti >> .____com >> > > >> >> http://lists.omniti.com/____mailman/listinfo/omnios-____discuss >> >> >> > > >> >> >> ___________________________________________________ >> OmniOS-discuss mailing list >> OmniOS-discuss at lists.omniti >> .____com >> > > >> >> http://lists.omniti.com/____mailman/listinfo/omnios-____discuss >> >> >> > > >> >> >> -- >> OSN Online Service Nuernberg GmbH, Bucher Str. 78, >> 90408 Nuernberg >> Tel: +49 911 39905-0 > <%2B49%20911%2039905-0>> >> > - Fax: >> +49 911 >> 39905-55 > <%2B49%20911%2039905-55>> - >> http://www.osn.de >> HRB 15022 Nuernberg, USt-Id: DE189301263, GF: Joerg >> Goltermann >> >> >> >> -- >> OSN Online Service Nuernberg GmbH, Bucher Str. 78, 90408 Nuernberg >> Tel: +49 911 39905-0 > <%2B49%20911%2039905-0>> - Fax: +49 >> 911 39905-55 > >> - http://www.osn.de >> >> HRB 15022 Nuernberg, USt-Id: DE189301263, GF: Joerg Goltermann >> >> >> *illumos-developer* | Archives >> >> >> | Modify Your Subscription >> [Powered by Listbox] >> >> >> >> *illumos-developer* | Archives >> >> | >> Modify >> >> > > >> Your Subscription [Powered by Listbox] > >> >> ... >> >> [Message clipped] > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From garrett at damore.org Wed Mar 4 07:30:17 2015 From: garrett at damore.org (Garrett D'Amore) Date: Tue, 3 Mar 2015 23:30:17 -0800 Subject: [OmniOS-discuss] [developer] Re: The ixgbe driver, Lindsay Lohan, and the Greek economy In-Reply-To: References: <20150221032559.727D07A0792@apps0.cs.toronto.edu> <54EAEFA8.4020101@osn.de> <54EB5392.6030900@osn.de> <279C084B-E76B-4F38-A7CF-D6CB37D06D8F@damore.org> <54F44602.5030705@osn.de> <064C3FC8-43E0-4757-AADA-831303782A4C@damore.org> Message-ID: I'm not surprised by this result. Indeed with the earlier data you had from lockstat it looked like a comstar or zfs issue on the server. 
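(The useful form of that data is the complete -s 5 report rather than a single stanza -- something along the lines of

  lockstat -s 5 -kWP sleep 30 > /tmp/lockstat-read.out

captured during a slow multipath read, with every caller block kept, since the per-caller stacks are what would tell an iscsit dispatch storm apart from one originating inside ZFS.)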
Unfortunately the follow up lockstat you sent was pruned to uselessness. If you can post the full lockstat with -s5 somewhere it might help understand what is actually going on under the hood. Sent from my iPhone > On Mar 3, 2015, at 9:21 PM, W Verb wrote: > > Hello all, > > This is probably the last message in this thread. > > I pulled the quad-gig NIC out of one of my hosts, and installed an X520. I then set a single 10G port on the server to be on the same VLAN as the host, and defined a vswitch, vmknic, etc on the host. > > I set the MTU to be 9000 on both sides, then ran my tests. > > Read: 130 MB/s. > Write: 156 MB/s. > > Additionally, at higher MTUs, the NIC would periodically lock up until I performed an "ipadm disable-if -t ixgbe0" and re-enabled it. I tried your updated driver, Jeorg, but unfortunately it failed quite often. > > I then disabled stmf, enabled NFS (v3 only) on the server, and shared a dataset on the zpool with "share -f nfs /ppool/testy". > I then mounted the server dataset on the host via NFS, and copied my test VM from the iSCSI zvol to the NFS dataset. I also removed the binding of the 10G port on the host from the sw iscsi interface. > > Running the same tests on the VM over NFSv3 yielded: > > Read: 650MB/s > Write: 306MB/s > > This is getting within 10% of the throughput I consistently get on dd operations local on the server, so I'm pretty happy that I'm getting as good as I'm going to get until I add more drives. Additionally, I haven't experienced any NIC hangs. > > I tried varying the settings in ixgbe.conf, the MTU, and disabling LSO on the host and server, but nothing really made that much of a difference (except reducing the MTU made things about 20-30% slower). > > mpstat during both NFS and iSCSI transfers showed all processors as getting roughly the same number of interrupts, etc, although I did see a varying number of spins on reader/writer locks during the iSCSI transfers. The NFS showed no srws at all. > > Here is a pretty representative example of a 1s mpstat during an iSCSI transfer: > > CPU minf mjf xcal intr ithr csw icsw migr smtx srw syscl usr sys wt idl set > 0 0 0 0 3246 2690 8739 6 772 5967 2 0 0 11 0 89 0 > 1 0 0 0 2366 2249 7910 8 988 5563 2 302 0 9 0 91 0 > 2 0 0 0 2455 2344 5584 5 687 5656 3 66 0 9 0 91 0 > 3 0 0 25 248 12 6210 1 885 5679 2 0 0 9 0 91 0 > 4 0 0 0 284 7 5450 2 861 5751 1 0 0 8 0 92 0 > 5 0 0 0 232 3 4513 0 547 5733 3 0 0 7 0 93 0 > 6 0 0 0 322 8 6084 1 836 6295 2 0 0 8 0 92 0 > 7 0 0 0 3114 2848 8229 4 648 4966 2 0 0 10 0 90 0 > > > So, it seems that it's COMSTAR/iSCSI that's broke as hell, not ixgbe. My apologies to anyone I may have offended with my pre-judgement. > > The consequences of this performance issue are significant: > 1: Instead of being able to utilize the existing quad-port NICs I have in my hosts, I must use dual 10G cards for redundancy purposes. > 2: I must build out a full 10G switching infrastructure. > 3: The network traffic is inherently less secure, as it is essentially impossible to do real security with NFSv3 (that is supported by ESXi). > > In the short run, I have already ordered some relatively cheap 20G infiniband gear that will hopefully push up the cost/performance ratio. However, I have received all sorts of advice about how painful it can be to build and maintain infiniband, and if iSCSI over 10G ethernet is this painful, I'm not hopeful that infiniband will "just work". > > The last option, of course, is to bail out of the Solaris derivatives and move to ZoL or ZoBSD. 
The drawbacks of this are: > > 1: ZoL doesn't easily support booting off of mirrored USB flash drives, let alone running the root filesystem and swap on them. FreeNAS, by way of comparison, puts a 2G swap partition on each zdev, which (strangely enough) causes it to often crash when a zdev experiences a failure under load. > > 2: Neither ZoL or FreeNAS have good, stable, kernel-based iSCSI implementations. FreeNAS is indeed testing istgt, but it proved unstable for my purposes in recent builds. Unfortunately, stmf hasn't proved itself any better. > > There are other minor differences, but these are the ones that brought me to OmniOS in the first place. We'll just have to wait and see how well the infiniband stuff works. > > > Hopefully this exercise will help prevent others from going down the same rabbit-hole that I did. > > -Warren V > > > > >> On Tue, Mar 3, 2015 at 3:45 PM, W Verb wrote: >> Hello Rob et al, >> >> Thank you for taking the time to look at this problem with me. I completely understand your inclination to look at the network as the most probable source of my issue, but I believe that this is a pretty clear-cut case of server-side issues. >> >> 1: I did run ping RTT tests during both read and write operations with multiple interfaces enabled, and the RTT stayed at ~.2ms regardless of whether traffic was actively being transmitted/received or not. >> >> 2: I am not seeing the TCP window size bouncing around, and I am certainly not seeing starvation and delay in my packet captures. It is true that I do see delayed ACKs and retransmissions when I bump the MTU to 9000 on both sides, but I stopped testing with high MTU as soon as I saw it happening because I have a good understanding of incast. All of my recent testing has been with MTUs between 1000 and 3000 bytes. >> >> 3: When testing with MTUs between 1000 and 3000 bytes, I do not see lost packets and retransmission in captures on either the server or client side. I only see staggered transmission delays on the part of the server. >> >> 4: The client is consistently advertising a large window size (20k+), so the TCP throttling mechanism does not appear to play into this. >> >> 5: As mentioned previously, layer 2 flow control is not enabled anywhere in the network, so there are no lower-level mechanisms at work. >> >> 6: Upon checking buffer and queue sizes (and doing the appropriate research into documentation on the C3560E's buffer sizes), I do not see large numbers of frames being dropped by the switch. It does happen at larger MTUs, but not very often (and not consistently) during transfers at 1000-3000 byte MTUs. I do not have QoS, policing, or rate-shaping enabled. >> >> 7: Network interface stats on both the server and the ESXi client show no errors of any kind. This is via netstat on the server, and esxcli / Vsphere client on the ESXi box. 
>> >> 8: When looking at captures taken simultaneously on the server and client side, the server-side transmission pauses are consistently seen and reproducible, even after multiple server rebuilds, ESXi rebuilds, vSphere reinstallations (down to wiping the SQL db), various COMSTAR configuration variations, multiple 10G NICs with different NIC chipsets, multiple switches (I tried both a 48-port and 24-port C3560E), multiple IOS revisions (12.2 and 15.0), OmniOS versions (r151012 and previous) multiple cables, transceivers, etc etc etc etc etc >> >> For your review, I have uploaded the actual packet captures to Google Drive: >> >> https://drive.google.com/file/d/0BwyUMjibonYQZS03dHJ2ZjJvTEE/view?usp=sharing 2 int write - ESXi vmk5 >> https://drive.google.com/file/d/0BwyUMjibonYQTlNTQ2M5bjlxZ00/view?usp=sharing 2 int write - ESXi vmk1 >> https://drive.google.com/file/d/0BwyUMjibonYQUEFsSVJCYXBVX3c/view?usp=sharing 2 int read - server ixgbe0 >> https://drive.google.com/file/d/0BwyUMjibonYQT3FBbElnNFpJTzQ/view?usp=sharing 2 int read - ESXi vmk5 >> https://drive.google.com/file/d/0BwyUMjibonYQU1hXdFRLM2cxSTA/view?usp=sharing 2 int read - ESXi vmk1 >> https://drive.google.com/file/d/0BwyUMjibonYQNEFZSHVZdFNweDA/view?usp=sharing 1 int write - ESXi vmk1 >> https://drive.google.com/file/d/0BwyUMjibonYQM3FpTmloQm5iMGc/view?usp=sharing 1 int read - ESXi vmk1 >> >> Regards, >> >> Warren V >> >>> On Mon, Mar 2, 2015 at 1:11 PM, Mallory, Rob wrote: >>> Just an EWAG, and forgive me for not following closely, I just saw this in my inbox, and looked at it and the screenshots for 2 minutes. >>> >>> >>> >>> But this looks like the typical incast problem.. see http://www.pdl.cmu.edu/Incast/ >>> >>> where your storage servers (there are effectively two with ISCSI/MPIO if round-robin is working) have networks which are 20:1 oversubscribed to your 1GbE host interfaces. (although one of the tcpdumps shows only one server so it may be choked out completely) >>> >>> >>> >>> What is your BDP? I?m guessing .150ms * 1GbE. For single-link that gets you to a MSS of 18700 or so. >>> >>> >>> >>> On your 1GbE connected clients, leave MTU at 9k, set the following in sysctl.conf, >>> >>> And reboot. >>> >>> >>> >>> net.ipv4.tcp_rmem = 4096 8938 17876 >>> >>> >>> >>> If MPIO from the server is indeed round-robining properly, this will ?make things fit? much better. >>> >>> >>> >>> Note that your tcp_wmem can and should stay high, since you are not oversubscribed going from client?server ; you only need to tweak the tcp receive window size. >>> >>> >>> >>> I?ve not done it in quite some time, but IIRC, You can also set these from the server side with: >>> >>> Route add -sendpipe 8930 or ?ssthresh >>> >>> >>> >>> And I think you can see the hash-table with computed BDP per client with ndd. >>> >>> >>> >>> I would try playing with those before delving deep into potential bugs in the TCP, nic driver, zfs, or vm. >>> >>> -Rob >>> >>> >>> >>> From: W Verb via illumos-developer [mailto:developer at lists.illumos.org] >>> Sent: Monday, March 02, 2015 12:20 PM >>> To: Garrett D'Amore >>> Cc: Joerg Goltermann; illumos-dev; omnios-discuss at lists.omniti.com >>> Subject: Re: [developer] Re: [OmniOS-discuss] The ixgbe driver, Lindsay Lohan, and the Greek economy >>> >>> >>> >>> Hello, >>> >>> vmstat seems pretty boring. Certainly nothing going to swap. 
>>> >>> root at sanbox:/root# vmstat >>> kthr memory page disk faults cpu >>> r b w swap free re mf pi po fr de sr po ro s0 s2 in sy cs us sy id >>> 0 0 0 34631632 30728068 175 215 0 0 0 0 963 275 4 6 140 3301 796 6681 0 1 99 >>> >>> >>> Here is the "taskq_dispatch_ent" output from "lockstat -s 5 -kWP sleep 30" during the "fast" write operation. >>> >>> ------------------------------------------------------------------------------- >>> Count indv cuml rcnt nsec Hottest Lock Caller >>> 50934 3% 79% 0.00 3437 0xffffff093145ba40 taskq_dispatch_ent >>> >>> nsec ------ Time Distribution ------ count Stack >>> 128 | 7 spa_taskq_dispatch_ent >>> 256 |@@ 4333 zio_taskq_dispatch >>> 512 |@@ 3863 zio_issue_async >>> 1024 |@@@@@ 9717 zio_execute >>> 2048 |@@@@@@@@@ 15904 >>> 4096 |@@@@ 7595 >>> 8192 |@@ 4498 >>> 16384 |@ 2662 >>> 32768 |@ 1886 >>> 65536 | 434 >>> 131072 | 34 >>> 262144 | 1 >>> ------------------------------------------------------------------------------- >>> >>> >>> >>> However, the truly "broken" function is a read operation: >>> >>> Top lock 1st try: >>> ------------------------------------------------------------------------------- >>> Count indv cuml rcnt nsec Hottest Lock Caller >>> 474 15% 15% 0.00 7031 0xffffff093145b6f8 cv_wait >>> >>> nsec ------ Time Distribution ------ count Stack >>> 256 |@ 29 taskq_thread_wait >>> 512 |@@@@@@ 100 taskq_thread >>> 1024 |@@@@ 72 thread_start >>> 2048 |@@@@ 69 >>> 4096 |@@@ 51 >>> 8192 |@@ 47 >>> 16384 |@@ 44 >>> 32768 |@@ 32 >>> 65536 |@ 25 >>> 131072 | 5 >>> ------------------------------------------------------------------------------- >>> >>> >>> Top lock 2nd try: >>> >>> ------------------------------------------------------------------------------- >>> Count indv cuml rcnt nsec Hottest Lock Caller >>> 174 39% 39% 0.00 103909 0xffffff0943f116a0 dmu_zfetch_find >>> >>> nsec ------ Time Distribution ------ count Stack >>> 2048 | 2 dmu_zfetch >>> 4096 | 3 dbuf_read >>> 8192 | 4 dmu_buf_hold_array_by_dnode >>> 16384 | 3 dmu_buf_hold_array >>> 32768 |@ 7 >>> 65536 |@@ 14 >>> 131072 |@@@@@@@@@@@@@@@@@@@@ 116 >>> 262144 |@@@ 19 >>> 524288 | 4 >>> 1048576 | 2 >>> ------------------------------------------------------------------------------- >>> >>> Top lock 3rd try: >>> >>> ------------------------------------------------------------------------------- >>> Count indv cuml rcnt nsec Hottest Lock Caller >>> 283 55% 55% 0.00 94602 0xffffff0943ff5a68 dmu_zfetch_find >>> >>> nsec ------ Time Distribution ------ count Stack >>> 512 | 1 dmu_zfetch >>> 1024 | 1 dbuf_read >>> 2048 | 0 dmu_buf_hold_array_by_dnode >>> 4096 | 5 dmu_buf_hold_array >>> 8192 | 2 >>> 16384 | 7 >>> 32768 | 4 >>> 65536 |@@@ 33 >>> 131072 |@@@@@@@@@@@@@@@@@@@@ 198 >>> 262144 |@@ 27 >>> 524288 | 2 >>> 1048576 | 3 >>> ------------------------------------------------------------------------------- >>> >>> >>> >>> As for the MTU question- setting the MTU to 9000 makes read operations grind almost to a halt at 5MB/s transfer rate. >>> >>> -Warren V >>> >>> >>> >>> On Mon, Mar 2, 2015 at 11:30 AM, Garrett D'Amore wrote: >>> >>> Here?s a theory. You are using small (relatively) MTUs (3000 is less than the smallest ZFS block size.) So, when you go multipathing this way, might a single upper layer transaction (ZFS block transfer request, or for that matter COMSTAR block request) get routed over different paths. This sounds like a potentially pathological condition to me. >>> >>> >>> >>> What happens if you increase the MTU to 9000? Have you tried it? 
I?m sort of thinking that this will permit each transaction to be issued in a single IP frame, which may alleviate certain tragic code paths. (That said, I?m not sure how aware COMSTAR is of the IP MTU. If it is ignorant, then it shouldn?t matter *that* much, since TCP should do the right thing here and a single TCP stream should stick to a single underlying NIC. But if COMSTAR is aware of the MTU, it may do some really screwball things as it tries to break requests up into single frames.) >>> >>> >>> >>> Your read spin really looks like only about 22 msec of wait out of a total run of 30 sec. (That?s not *great*, but neither does it sound tragic.) Your write is interesting because that looks like it is going a wildly different path. You should be aware that the locks you see are *not* necessarily related in call order, but rather are ordered by instance count. The write code path hitting the task_thread as hard as it does is really, really weird. Something is pounding on a taskq lock super hard. The number of taskq_dispatch_ent calls is interesting here. I?m starting to wonder if it?s something as stupid as a spin where if the taskq is ?full? (max size reached), a caller just is spinning trying to dispatch jobs to the taskq. >>> >>> >>> >>> The taskq_dispatch_ent code is super simple, and it should be almost impossible to have contention on that lock ? barring a thread spinning hard on taskq_dispatch (or taskq_dispatch_ent as I think is happening here). Looking at the various call sites, there are places in both COMSTAR (iscsit) and in ZFS where this could be coming from. To know which, we really need to have the back trace associated. >>> >>> >>> >>> lockstat can give this ? try giving ?-s 5? to give a short backtrace from this, that will probably give us a little more info about the guilty caller. :-) >>> >>> >>> >>> - Garrett >>> >>> >>> >>> On Mar 2, 2015, at 11:07 AM, W Verb via illumos-developer wrote: >>> >>> >>> >>> Hello all, >>> >>> I am not using layer 2 flow control. The switch carries line-rate 10G traffic without error. >>> >>> I think I have found the issue via lockstat. The first lockstat is taken during a multipath read: >>> >>> >>> lockstat -kWP sleep 30 >>> >>> >>> Adaptive mutex spin: 21331 events in 30.020 seconds (711 events/sec) >>> >>> Count indv cuml rcnt nsec Hottest Lock Caller >>> ------------------------------------------------------------------------------- >>> 9306 44% 44% 0.00 1557 htable_mutex+0x370 htable_release >>> 6307 23% 68% 0.00 1207 htable_mutex+0x108 htable_lookup >>> 596 7% 75% 0.00 4100 0xffffff0931705188 cv_wait >>> 349 5% 80% 0.00 4437 0xffffff0931705188 taskq_thread >>> 704 2% 82% 0.00 995 0xffffff0935de3c50 dbuf_create >>> >>> The hash table being read here I would guess is the tcp connection hash table. 
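(A note on that guess: htable_lookup/htable_release and the htable_mutex array are part of the x86 HAT page-table code -- uts/i86pc/vm/htable.c -- rather than the TCP stack, so contention there points at page-table/mapping churn more than at the connection hash. If anyone wants to confirm who is driving it, something like

  dtrace -n 'fbt::htable_lookup:entry { @[stack(10)] = count(); } tick-30s { exit(0); }'

aggregates the kernel call paths into htable_lookup over a 30-second window.)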
>>> >>> >>> >>> When lockstat is run during a multipath write operation, I get: >>> >>> Adaptive mutex spin: 1097341 events in 30.016 seconds (36558 events/sec) >>> >>> Count indv cuml rcnt nsec Hottest Lock Caller >>> ------------------------------------------------------------------------------- >>> 210752 28% 28% 0.00 4781 0xffffff0931705188 taskq_thread >>> 174471 22% 50% 0.00 4476 0xffffff0931705188 cv_wait >>> 127183 10% 61% 0.00 2871 0xffffff096f29b510 zio_notify_parent >>> 176066 10% 70% 0.00 1922 0xffffff0931705188 taskq_dispatch_ent >>> 105134 9% 80% 0.00 3110 0xffffff096ffdbf10 zio_remove_child >>> 67512 4% 83% 0.00 1938 0xffffff096f3db4b0 zio_add_child >>> 45736 3% 86% 0.00 2239 0xffffff0935de3c50 dbuf_destroy >>> 27781 3% 89% 0.00 3416 0xffffff0935de3c50 dbuf_create >>> 38536 2% 91% 0.00 2122 0xffffff0935de3b70 dnode_rele >>> 27841 2% 93% 0.00 2423 0xffffff0935de3b70 dnode_diduse_space >>> 19020 2% 95% 0.00 3046 0xffffff09d9e305e0 dbuf_rele >>> 14627 1% 96% 0.00 3632 dbuf_hash_table+0x4f8 dbuf_find >>> >>> >>> >>> Writes are not performing htable lookups, while reads are. >>> >>> -Warren V >>> >>> >>> >>> >>> >>> >>> >>> >>> On Mon, Mar 2, 2015 at 3:14 AM, Joerg Goltermann wrote: >>> >>> Hi, >>> >>> I would try *one* TPG which includes both interface addresses >>> and I would double check for packet drops on the Catalyst. >>> >>> The 3560 supports only receive flow control which means, that >>> a sending 10Gbit port can easily overload a 1Gbit port. >>> Do you have flow control enabled? >>> >>> - Joerg >>> >>> >>> >>> On 02.03.2015 09:22, W Verb via illumos-developer wrote: >>> >>> Hello Garrett, >>> >>> No, no 802.3ad going on in this config. >>> >>> Here is a basic schematic: >>> >>> https://drive.google.com/file/d/0BwyUMjibonYQVkVqcE5OQUJyUUU/view?usp=sharing >>> >>> Here is the Nexenta MPIO iSCSI Setup Document that I used as a guide: >>> >>> https://drive.google.com/file/d/0BwyUMjibonYQbjEyUTBjN2tTNWM/view?usp=sharing >>> >>> Note that I am using an MTU of 3000 on both the 10G and 1G NICs. The >>> switch is set to allow 9148-byte frames, and I'm not seeing any >>> errors/buffer overruns on the switch. >>> >>> Here is a screenshot of a packet capture from a read operation on the >>> guest OS (from it's local drive, which is actually a VMDK file on the >>> storage server). In this example, only a single 1G ESXi kernel interface >>> (vmk1) is bound to the software iSCSI initiator. >>> >>> https://drive.google.com/file/d/0BwyUMjibonYQa2NYdXhpZkpkbU0/view?usp=sharing >>> >>> Note that there's a nice, well-behaved window sizing process taking >>> place. The ESXi decreases the scaled window by 11 or 12 for each ACK, >>> then bumps it back up to 512. >>> >>> Here is a similar screenshot of a single-interface write operation: >>> >>> https://drive.google.com/file/d/0BwyUMjibonYQbU1RZHRnakxDSFU/view?usp=sharing >>> >>> There are no pauses or gaps in the transmission rate in the >>> single-interface transfers. >>> >>> >>> In the next screenshots, I have enabled an additional 1G interface on >>> the ESXi host, and bound it to the iSCSI initiator. The new interface is >>> bound to a separate physical port, uses a different VLAN on the switch, >>> and talks to a different 10G port on the storage server. >>> >>> First, let's look at a write operation on the guest OS, which happily >>> pumps data at near-line-rate to the storage server. >>> >>> Here is a sequence number trace diagram. Note how the transfer has a >>> nice, smooth increment rate over the entire transfer. 
>>> >>> https://drive.google.com/file/d/0BwyUMjibonYQWHNIa0drWnNxMmM/view?usp=sharing >>> >>> Here are screenshots from packet captures on both 1G interfaces: >>> >>> https://drive.google.com/file/d/0BwyUMjibonYQRWhyVVQ4djNaU3c/view?usp=sharing >>> https://drive.google.com/file/d/0BwyUMjibonYQaTVjTEtTRloyR2c/view?usp=sharing >>> >>> Note how we again see nice, smooth window adjustment, and no gaps in >>> transmission. >>> >>> >>> But now, let's look at the problematic two-interface Read operation. >>> First, the sequence graph: >>> >>> https://drive.google.com/file/d/0BwyUMjibonYQTzdFVWdQMWZ6LUU/view?usp=sharing >>> >>> As you can see, there are gaps and jumps in the transmission throughout >>> the transfer. >>> It is very illustrative to look at captures of the gaps, which are >>> occurring on both interfaces: >>> >>> https://drive.google.com/file/d/0BwyUMjibonYQc0VISXN6eVFwQzg/view?usp=sharing >>> https://drive.google.com/file/d/0BwyUMjibonYQVFREUHp3TGFiUU0/view?usp=sharing >>> >>> As you can see, there are ~.4 second pauses in transmission from the >>> storage server, which kills the transfer rate. >>> It's clear that the ESXi box ACKs the prior iSCSI operation to >>> completion, then makes a new LUN request, which the storage server >>> immediately replies to. The ESXi ACKs the response packet from the >>> storage server, then waits...and waits....and waits... until eventually >>> the storage server starts transmitting again. >>> >>> Because the pause happens while the ESXi client is waiting for a packet >>> from the storage server, that tells me that the gaps are not an artifact >>> of traffic being switched between both active interfaces, but are >>> actually indicative of short hangs occurring on the server. >>> >>> Having a pause or two in transmission is no big deal, but in my case, it >>> is happening constantly, and dropping my overall read transfer rate down >>> to 20-60MB/s, which is slower than the single interface transfer rate >>> (~90-100MB/s). >>> >>> Decreasing the MTU makes the pauses shorter, increasing them makes the >>> pauses longer. >>> >>> Another interesting thing is that if I set the multipath io interval to >>> 3 operations instead of 1, I get better throughput. In other words, the >>> less frequently I swap IP addresses on my iSCSI requests from the ESXi >>> unit, the fewer pauses I see. >>> >>> Basically, COMSTAR seems to choke each time an iSCSI request from a new >>> IP arrives. >>> >>> Because the single interface transfer is near line rate, that tells me >>> that the storage system (mpt_sas, zfs, etc) is working fine. It's only >>> when multiple paths are attempted that iSCSI falls on its face during reads. >>> >>> All of these captures were taken without a cache device being attached >>> to the storage zpool, so this isn't looking like some kind of ZFS ARC >>> problem. As mentioned previously, local transfers to/from the zpool are >>> showing ~300-500 MB/s rates over long transfers (10G+). >>> >>> -Warren V >>> >>> On Sun, Mar 1, 2015 at 9:11 PM, Garrett D'Amore >> >>> > wrote: >>> >>> I?m not sure I?ve followed properly. You have *two* interfaces. >>> You are not trying to provision these in an aggr are you? As far as >>> I?m aware, VMware does not support 802.3ad link aggregations. (Its >>> possible that you can make it work with ESXi if you give the entire >>> NIC to the guest ? but I?m skeptical.) The problem is that if you >>> try to use link aggregation, some packets (up to half!) will be >>> lost. 
TCP and other protocols fare poorly in this situation. >>> >>> Its possible I?ve totally misunderstood what you?re trying to do, in >>> which case I apologize. >>> >>> The idle thing is a red-herring ? the cpu is waiting for work to do, >>> probably because packets haven?t arrived (or where dropped by the >>> hypervisor!) I wouldn?t read too much into that except that your >>> network stack is in trouble. I?d look a bit more closely at the >>> kstats for tcp ? I suspect you?ll see retransmits or out of order >>> values that are unusually high ? if so this may help validate my >>> theory above. >>> >>> - Garrett >>> >>> On Mar 1, 2015, at 9:03 PM, W Verb via illumos-developer >>> > >>> >>> >>> wrote: >>> >>> Hello all, >>> >>> >>> Well, I no longer blame the ixgbe driver for the problems I'm seeing. >>> >>> >>> I tried Joerg's updated driver, which didn't improve the issue. So >>> I went back to the drawing board and rebuilt the server from scratch. >>> >>> What I noted is that if I have only a single 1-gig physical >>> interface active on the ESXi host, everything works as expected. >>> As soon as I enable two interfaces, I start seeing the performance >>> problems I've described. >>> >>> Response pauses from the server that I see in TCPdumps are still >>> leading me to believe the problem is delay on the server side, so >>> I ran a series of kernel dtraces and produced some flamegraphs. >>> >>> >>> This was taken during a read operation with two active 10G >>> interfaces on the server, with a single target being shared by two >>> tpgs- one tpg for each 10G physical port. The host device has two >>> 1G ports enabled, with VLANs separating the active ports into >>> 10G/1G pairs. ESXi is set to multipath using both VLANS with a >>> round-robin IO interval of 1. >>> >>> https://drive.google.com/file/d/0BwyUMjibonYQd3ZYOGh4d2pteGs/view?usp=sharing >>> >>> >>> This was taken during a write operation: >>> >>> https://drive.google.com/file/d/0BwyUMjibonYQMnBtU1Q2SXM2ams/view?usp=sharing >>> >>> >>> I then rebooted the server and disabled C-State, ACPI T-State, and >>> general EIST (Turbo boost) functionality in the CPU. >>> >>> I when I attempted to boot my guest VM, the iSCSI transfer >>> gradually ground to a halt during the boot loading process, and >>> the guest OS never did complete its boot process. >>> >>> Here is a flamegraph taken while iSCSI is slowly dying: >>> >>> https://drive.google.com/file/d/0BwyUMjibonYQM21JeFZPX3dZWTg/view?usp=sharing >>> >>> >>> I edited out cpu_idle_adaptive from the dtrace output and >>> regenerated the slowdown graph: >>> >>> https://drive.google.com/file/d/0BwyUMjibonYQbTVwV3NvXzlPS1E/view?usp=sharing >>> >>> >>> I then edited cpu_idle_adaptive out of the speedy write operation >>> and regenerated that graph: >>> >>> https://drive.google.com/file/d/0BwyUMjibonYQeWFYM0pCMDZ1X2s/view?usp=sharing >>> >>> >>> I have zero experience with interpreting flamegraphs, but the most >>> significant difference I see between the slow read example and the >>> fast write example is in unix`thread_start --> unix`idle. There's >>> a good chunk of "unix`i86_mwait" in the read example that is not >>> present in the write example at all. >>> >>> Disabling the l2arc cache device didn't make a difference, and I >>> had to reenable EIST support on the CPU to get my VMs to boot. >>> >>> I am seeing a variety of bug reports going back to 2010 regarding >>> excessive mwait operations, with the suggested solutions usually >>> being to set "cpupm enable poll-mode" in power.conf. 
That change >>> also had no effect on speed. >>> >>> -Warren V >>> >>> >>> >>> >>> -----Original Message----- >>> >>> From: Chris Siebenmann [mailto:cks at cs.toronto.edu] >>> >>> Sent: Monday, February 23, 2015 8:30 AM >>> >>> To: W Verb >>> >>> Cc: omnios-discuss at lists.omniti.com >>> >>> ; cks at cs.toronto.edu >>> >>> >>> Subject: Re: [OmniOS-discuss] The ixgbe driver, Lindsay Lohan, and >>> the Greek economy >>> >>> >>> > Chris, thanks for your specific details. I'd appreciate it if you >>> >>> > could tell me which copper NIC you tried, as well as to pass on the >>> >>> > iSCSI tuning parameters. >>> >>> >>> Our copper NIC experience is with onboard X540-AT2 ports on >>> SuperMicro hardware (which have the guaranteed 10-20 msec lock >>> hold) and dual-port 82599EB TN cards (which have some sort of >>> driver/hardware failure under load that eventually leads to >>> 2-second lock holds). I can't recommend either with the current >>> driver; we had to revert to 1G networking in order to get stable >>> servers. >>> >>> >>> The iSCSI parameter modifications we do, across both initiators >>> and targets, are: >>> >>> >>> initialr2tno >>> >>> firstburstlength128k >>> >>> maxrecvdataseglen128k[only on Linux backends] >>> >>> maxxmitdataseglen128k[only on Linux backends] >>> >>> >>> The OmniOS initiator doesn't need tuning for more than the first >>> two parameters; on the Linux backends we tune up all four. My >>> extended thoughts on these tuning parameters and why we touch them >>> can be found >>> >>> here: >>> >>> >>> http://utcc.utoronto.ca/~cks/space/blog/tech/UnderstandingiSCSIProtocol >>> >>> http://utcc.utoronto.ca/~cks/space/blog/tech/LikelyISCSITuning >>> >>> >>> The short version is that these parameters probably only make a >>> small difference but their overall goal is to do 128KB ZFS reads >>> and writes in single iSCSI operations (although they will be >>> fragmented at the TCP >>> >>> layer) and to do iSCSI writes without a back-and-forth delay >>> between initiator and target (that's 'initialr2t no'). >>> >>> >>> I think basically everyone should use InitialR2T set to no and in >>> fact that it should be the software default. These days only >>> unusually limited iSCSI targets should need it to be otherwise and >>> they can change their setting for it (initiator and target must >>> both agree to it being 'yes', so either can veto it). >>> >>> >>> - cks >>> >>> >>> >>> On Mon, Feb 23, 2015 at 8:21 AM, Joerg Goltermann >> >>> > wrote: >>> >>> Hi, >>> >>> I think your problem is caused by your link properties or your >>> switch settings. In general the standard ixgbe seems to perform >>> well. >>> >>> I had trouble after changing the default flow control settings >>> to "bi" >>> and this was my motivation to update the ixgbe driver a long >>> time ago. >>> After I have updated our systems to ixgbe 2.5.8 I never had any >>> problems .... >>> >>> Make sure your switch has support for jumbo frames and you use >>> the same mtu on all ports, otherwise the smallest will be used. >>> >>> What switch do you use? I can tell you nice horror stories about >>> different vendors.... >>> >>> - Joerg >>> >>> On 23.02.2015 10:31, W Verb wrote: >>> >>> Thank you Joerg, >>> >>> I've downloaded the package and will try it tomorrow. >>> >>> The only thing I can add at this point is that upon review >>> of my >>> testing, I may have performed my "pkg -u" between the >>> initial quad-gig >>> performance test and installing the 10G NIC. 
So this may >>> be a new >>> problem introduced in the latest updates. >>> >>> Those of you who are running 10G and have not upgraded to >>> the latest >>> kernel, etc, might want to do some additional testing >>> before running the >>> update. >>> >>> -Warren V >>> >>> On Mon, Feb 23, 2015 at 1:15 AM, Joerg Goltermann >>> >>> >>> >> wrote: >>> >>> Hi, >>> >>> I remember there was a problem with the flow control >>> settings in the >>> ixgbe >>> driver, so I updated it a long time ago for our >>> internal servers to >>> 2.5.8. >>> Last weekend I integrated the latest changes from the >>> FreeBSD driver >>> to bring >>> the illumos ixgbe to 2.5.25 but I had no time to test >>> it, so it's >>> completely >>> untested! >>> >>> >>> If you would like to give the latest driver a try you >>> can fetch the >>> kernel modules from >>> https://cloud.osn.de/index.____php/s/Fb4so9RsNnXA7r9 >>> >>> >> > >>> >>> Clone your boot environment, place the modules in the >>> new environment >>> and update the boot-archive of the new BE. >>> >>> - Joerg >>> >>> >>> >>> >>> >>> On 23.02.2015 02:54, W Verb wrote: >>> >>> By the way, to those of you who have working >>> setups: please send me >>> your pool/volume settings, interface linkprops, >>> and any kernel >>> tuning >>> parameters you may have set. >>> >>> Thanks, >>> Warren V >>> >>> On Sat, Feb 21, 2015 at 7:59 AM, Schweiss, Chip >>> >>> >> >>> >>> >>> wrote: >>> >>> I can't say I totally agree with your performance >>> assessment. I run Intel >>> X520 in all my OmniOS boxes. >>> >>> Here is a capture of nfssvrtop I made while >>> running many >>> storage vMotions >>> between two OmniOS boxes hosting NFS >>> datastores. This is a >>> 10 host VMware >>> cluster. Both OmniOS boxes are dual 10G >>> connected with >>> copper twin-ax to >>> the in rack Nexus 5010. >>> >>> VMware does 100% sync writes, I use ZeusRAM >>> SSDs for log >>> devices. >>> >>> -Chip >>> >>> 2014 Apr 24 08:05:51, load: 12.64, read: >>> 17330243 KB, >>> swrite: 15985 KB, >>> awrite: 1875455 KB >>> >>> Ver Client NFSOPS Reads >>> SWrites AWrites >>> Commits Rd_bw >>> SWr_bw AWr_bw Rd_t SWr_t AWr_t >>> Com_t Align% >>> >>> 4 10.28.17.105 0 0 >>> 0 0 >>> 0 0 >>> 0 0 0 0 0 0 0 >>> >>> 4 10.28.17.215 0 0 >>> 0 0 >>> 0 0 >>> 0 0 0 0 0 0 0 >>> >>> 4 10.28.17.213 0 0 >>> 0 0 >>> 0 0 >>> 0 0 0 0 0 0 0 >>> >>> 4 10.28.16.151 0 0 >>> 0 0 >>> 0 0 >>> 0 0 0 0 0 0 0 >>> >>> 4 all 1 0 >>> 0 0 >>> 0 0 >>> 0 0 0 0 0 0 0 >>> >>> 3 10.28.16.175 3 0 >>> 3 0 >>> 0 1 >>> 11 0 4806 48 0 0 85 >>> >>> 3 10.28.16.183 6 0 >>> 6 0 >>> 0 3 >>> 162 0 549 124 0 0 >>> 73 >>> >>> 3 10.28.16.180 11 0 >>> 10 0 >>> 0 3 >>> 27 0 776 89 0 0 67 >>> >>> 3 10.28.16.176 28 2 >>> 26 0 >>> 0 10 >>> 405 0 2572 198 0 0 >>> 100 >>> >>> 3 10.28.16.178 4606 4602 >>> 4 0 >>> 0 294534 >>> 3 0 723 49 0 0 99 >>> >>> 3 10.28.16.179 4905 4879 >>> 26 0 >>> 0 312208 >>> 311 0 735 271 0 0 >>> 99 >>> >>> 3 10.28.16.181 5515 5502 >>> 13 0 >>> 0 352107 >>> 77 0 89 87 0 0 99 >>> >>> 3 10.28.16.184 12095 12059 >>> 10 0 >>> 0 763014 >>> 39 0 249 147 0 0 99 >>> >>> 3 10.28.58.1 15401 6040 >>> 116 6354 >>> 53 191605 >>> 474 202346 192 96 144 83 >>> 99 >>> >>> 3 all 42574 33086 >>> 217 >>> 6354 53 1913488 >>> 1582 202300 348 138 153 105 >>> 99 >>> >>> >>> >>> >>> >>> On Fri, Feb 20, 2015 at 11:46 PM, W Verb >>> >>> >> >>> >>> >> wrote: >>> >>> >>> Hello All, >>> >>> Thank you for your replies. 
>>> I tried a few things, and found the following: >>> >>> 1: Disabling hyperthreading support in the >>> BIOS drops >>> performance overall >>> by a factor of 4. >>> 2: Disabling VT support also seems to have >>> some effect, >>> although it >>> appears to be minor. But this has the >>> amusing side >>> effect of fixing the >>> hangs I've been experiencing with fast >>> reboot. Probably >>> by disabling kvm. >>> 3: The performance tests are a bit tricky >>> to quantify >>> because of caching >>> effects. In fact, I'm not entirely sure >>> what is >>> happening here. It's just >>> best to describe what I'm seeing: >>> >>> The commands I'm using to test are >>> dd if=/dev/zero of=./test.dd bs=2M count=5000 >>> dd of=/dev/null if=./test.dd bs=2M count=5000 >>> The host vm is running Centos 6.6, and has >>> the latest >>> vmtools installed. >>> There is a host cache on an SSD local to >>> the host that >>> is also in place. >>> Disabling the host cache didn't >>> immediately have an >>> effect as far as I could >>> see. >>> >>> The host MTU set to 3000 on all iSCSI >>> interfaces for all >>> tests. >>> >>> Test 1: Right after reboot, with an ixgbe >>> MTU of 9000, >>> the write test >>> yields an average speed over three tests >>> of 137MB/s. The >>> read test yields an >>> average over three tests of 5MB/s. >>> >>> Test 2: After setting "ifconfig ixgbe0 mtu >>> 3000", the >>> write tests yield >>> 140MB/s, and the read tests yield 53MB/s. >>> It's important >>> to note here that >>> if I cut the read test short at only >>> 2-3GB, I get >>> results upwards of >>> 350MB/s, which I assume is local >>> cache-related distortion. >>> >>> Test 3: MTU of 1500. Read tests are up to >>> 156 MB/s. >>> Write tests yield >>> about 142MB/s. >>> Test 4: MTU of 1000: Read test at 182MB/s. >>> Test 5: MTU of 900: Read test at 130 MB/s. >>> Test 6: MTU of 1000: Read test at 160MB/s. >>> Write tests >>> are now >>> consistently at about 300MB/s. >>> Test 7: MTU of 1200: Read test at 124MB/s. >>> Test 8: MTU of 1000: Read test at 161MB/s. >>> Write at 261MB/s. >>> >>> A few final notes: >>> L1ARC grabs about 10GB of RAM during the >>> tests, so >>> there's definitely some >>> read caching going on. >>> The write operations are easier to observe >>> with iostat, >>> and I'm seeing io >>> rates that closely correlate with the >>> network write speeds. >>> >>> >>> Chris, thanks for your specific details. >>> I'd appreciate >>> it if you could >>> tell me which copper NIC you tried, as >>> well as to pass >>> on the iSCSI tuning >>> parameters. >>> >>> I've ordered an Intel EXPX9502AFXSR, which >>> uses the >>> 82598 chip instead of >>> the 82599 in the X520. If I get similar >>> results with my >>> fiber transcievers, >>> I'll see if I can get a hold of copper ones. >>> >>> But I should mention that I did indeed >>> look at PHY/MAC >>> error rates, and >>> they are nil. >>> >>> -Warren V >>> >>> On Fri, Feb 20, 2015 at 7:25 PM, Chris >>> Siebenmann >>> >> >>> >> >>> >>> >> >>> >>> wrote: >>> >>> >>> After installation and >>> configuration, I observed >>> all kinds of bad >>> behavior >>> in the network traffic between the >>> hosts and the >>> server. All of this >>> bad >>> behavior is traced to the ixgbe >>> driver on the >>> storage server. Without >>> going >>> into the full troubleshooting >>> process, here are >>> my takeaways: >>> >>> [...] 
>>> >>> For what it's worth, we managed to >>> achieve much >>> better line rates on >>> copper 10G ixgbe hardware of various >>> descriptions >>> between OmniOS >>> and CentOS 7 (I don't think we ever >>> tested OmniOS to >>> OmniOS). I don't >>> believe OmniOS could do TCP at full >>> line rate but I >>> think we managed 700+ >>> Mbytes/sec on both transmit and >>> receive and we got >>> basically disk-limited >>> speeds with iSCSI (across multiple >>> disks on >>> multi-disk mirrored pools, >>> OmniOS iSCSI initiator, Linux iSCSI >>> targets). >>> >>> I don't believe we did any specific >>> kernel tuning >>> (and in fact some of >>> our attempts to fiddle ixgbe driver >>> parameters blew >>> up in our face). >>> We did tune iSCSI connection >>> parameters to increase >>> various buffer >>> sizes so that ZFS could do even large >>> single >>> operations in single iSCSI >>> transactions. (More details available >>> if people are >>> interested.) >>> >>> 10: At the wire level, the speed >>> problems are >>> clearly due to pauses in >>> response time by omnios. At 9000 >>> byte frame >>> sizes, I see a good number >>> of duplicate ACKs and fast >>> retransmits during >>> read operations (when >>> omnios is transmitting). But below >>> about a >>> 4100-byte MTU on omnios >>> (which seems to correlate to >>> 4096-byte iSCSI >>> block transfers), the >>> transmission errors fade away and >>> we only see >>> the transmission pause >>> problem. >>> >>> >>> This is what really attracted my >>> attention. In >>> our OmniOS setup, our >>> specific Intel hardware had ixgbe >>> driver issues that >>> could cause >>> activity stalls during once-a-second >>> link heartbeat >>> checks. This >>> obviously had an effect at the TCP and >>> iSCSI layers. >>> My initial message >>> to illumos-developer sparked a potentially >>> interesting discussion: >>> >>> >>> http://www.listbox.com/member/____archive/182179/2014/10/sort/____time_rev/page/16/entry/6:__405/__20141003125035:6357079A-__4B1D-__11E4-A39C-D534381BA44D/ >>> >>> >>> >> > >>> >>> If you think this is a possibility in >>> your setup, >>> I've put the DTrace >>> script I used to hunt for this up on >>> the web: >>> >>> http://www.cs.toronto.edu/~____cks/src/omnios-ixgbe/ixgbe_____delay.d >>> >>> >>> >> > >>> >>> This isn't the only potential source >>> of driver >>> stalls by any means, it's >>> just the one I found. You may also >>> want to look at >>> lockstat in general, >>> as information it reported is what led >>> us to look >>> specifically at the >>> ixgbe code here. >>> >>> (If you suspect kernel/driver issues, >>> lockstat >>> combined with kernel >>> source is a really excellent resource.) >>> >>> - cks >>> >>> >>> >>> >>> >>> ___________________________________________________ >>> OmniOS-discuss mailing list >>> OmniOS-discuss at lists.omniti >>> .____com >>> >> > >>> http://lists.omniti.com/____mailman/listinfo/omnios-____discuss >>> >>> >>> >> > >>> >>> >>> ___________________________________________________ >>> OmniOS-discuss mailing list >>> OmniOS-discuss at lists.omniti >>> .____com >>> >> > >>> http://lists.omniti.com/____mailman/listinfo/omnios-____discuss >>> >>> >>> >> > >>> >>> >>> -- >>> OSN Online Service Nuernberg GmbH, Bucher Str. 78, >>> 90408 Nuernberg >>> Tel: +49 911 39905-0 >>> - Fax: +49 911 >>> 39905-55 - >>> http://www.osn.de >>> HRB 15022 Nuernberg, USt-Id: DE189301263, GF: Joerg >>> Goltermann >>> >>> >>> >>> -- >>> OSN Online Service Nuernberg GmbH, Bucher Str. 
78, 90408 Nuernberg >>> Tel: +49 911 39905-0 - Fax: +49 >>> 911 39905-55 - http://www.osn.de >>> >>> HRB 15022 Nuernberg, USt-Id: DE189301263, GF: Joerg Goltermann >>> >>> >>> *illumos-developer* | Archives >>> >>> >>> | Modify Your Subscription >>> [Powered by Listbox] >>> >>> >>> >>> *illumos-developer* | Archives >>> >>> | >>> Modify >>> >>> >>> Your Subscription [Powered by Listbox] >> >>> ... >>> >>> [Message clipped] > -------------- next part -------------- An HTML attachment was scrubbed... URL: From wverb73 at gmail.com Wed Mar 4 08:27:08 2015 From: wverb73 at gmail.com (W Verb) Date: Wed, 4 Mar 2015 00:27:08 -0800 Subject: [OmniOS-discuss] [developer] Re: The ixgbe driver, Lindsay Lohan, and the Greek economy In-Reply-To: References: <20150221032559.727D07A0792@apps0.cs.toronto.edu> <54EAEFA8.4020101@osn.de> <54EB5392.6030900@osn.de> <279C084B-E76B-4F38-A7CF-D6CB37D06D8F@damore.org> <54F44602.5030705@osn.de> <064C3FC8-43E0-4757-AADA-831303782A4C@damore.org> Message-ID: Thank you for following up, Garrett, The logs of all lockstat sessions are now in the zipfile located here: https://drive.google.com/file/d/0BwyUMjibonYQeVlzN2VndGstRUk/view?usp=sharing Regards, Warren V On Tue, Mar 3, 2015 at 11:30 PM, Garrett D'Amore wrote: > I'm not surprised by this result. Indeed with the earlier data you had > from lockstat it looked like a comstar or zfs issue on the server. > Unfortunately the follow up lockstat you sent was pruned to uselessness. > If you can post the full lockstat with -s5 somewhere it might help > understand what is actually going on under the hood. > > Sent from my iPhone > > On Mar 3, 2015, at 9:21 PM, W Verb wrote: > > Hello all, > > This is probably the last message in this thread. > > I pulled the quad-gig NIC out of one of my hosts, and installed an X520. I > then set a single 10G port on the server to be on the same VLAN as the > host, and defined a vswitch, vmknic, etc on the host. > > I set the MTU to be 9000 on both sides, then ran my tests. > > Read: 130 MB/s. > Write: 156 MB/s. > > Additionally, at higher MTUs, the NIC would periodically lock up until I > performed an "ipadm disable-if -t ixgbe0" and re-enabled it. I tried your > updated driver, Jeorg, but unfortunately it failed quite often. > > I then disabled stmf, enabled NFS (v3 only) on the server, and shared a > dataset on the zpool with "share -f nfs /ppool/testy". > I then mounted the server dataset on the host via NFS, and copied my test > VM from the iSCSI zvol to the NFS dataset. I also removed the binding of > the 10G port on the host from the sw iscsi interface. > > Running the same tests on the VM over NFSv3 yielded: > > Read: 650MB/s > Write: 306MB/s > > This is getting within 10% of the throughput I consistently get on dd > operations local on the server, so I'm pretty happy that I'm getting as > good as I'm going to get until I add more drives. Additionally, I haven't > experienced any NIC hangs. > > I tried varying the settings in ixgbe.conf, the MTU, and disabling LSO on > the host and server, but nothing really made that much of a difference > (except reducing the MTU made things about 20-30% slower). > > mpstat during both NFS and iSCSI transfers showed all processors as > getting roughly the same number of interrupts, etc, although I did see a > varying number of spins on reader/writer locks during the iSCSI transfers. > The NFS showed no srws at all. 
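For reference, the full-stack lockstat Garrett asks for above and the per-CPU srw counts mentioned here can be captured together along the following lines. This is only a sketch: the output file names are placeholders, and the 30-second window simply matches the other lockstat runs quoted in this thread.

  # run while the multipath read test is in progress
  mpstat 1 30 > /tmp/mpstat-iscsi-read.txt &
  lockstat -kWP -s 5 sleep 30 > /tmp/lockstat-iscsi-read-s5.txt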
> > Here is a pretty representative example of a 1s mpstat during an iSCSI > transfer: > > CPU minf mjf xcal intr ithr csw icsw migr smtx srw syscl usr sys wt > idl set > 0 0 0 0 3246 2690 8739 6 772 5967 2 0 0 11 0 > 89 0 > 1 0 0 0 2366 2249 7910 8 988 5563 2 302 0 9 0 > 91 0 > 2 0 0 0 2455 2344 5584 5 687 5656 3 66 0 9 0 > 91 0 > 3 0 0 25 248 12 6210 1 885 5679 2 0 0 9 0 > 91 0 > 4 0 0 0 284 7 5450 2 861 5751 1 0 0 8 0 > 92 0 > 5 0 0 0 232 3 4513 0 547 5733 3 0 0 7 0 > 93 0 > 6 0 0 0 322 8 6084 1 836 6295 2 0 0 8 0 > 92 0 > 7 0 0 0 3114 2848 8229 4 648 4966 2 0 0 10 0 > 90 0 > > > So, it seems that it's COMSTAR/iSCSI that's broke as hell, not ixgbe. My > apologies to anyone I may have offended with my pre-judgement. > > The consequences of this performance issue are significant: > 1: Instead of being able to utilize the existing quad-port NICs I have in > my hosts, I must use dual 10G cards for redundancy purposes. > 2: I must build out a full 10G switching infrastructure. > 3: The network traffic is inherently less secure, as it is essentially > impossible to do real security with NFSv3 (that is supported by ESXi). > > In the short run, I have already ordered some relatively cheap 20G > infiniband gear that will hopefully push up the cost/performance ratio. > However, I have received all sorts of advice about how painful it can be to > build and maintain infiniband, and if iSCSI over 10G ethernet is this > painful, I'm not hopeful that infiniband will "just work". > > The last option, of course, is to bail out of the Solaris derivatives and > move to ZoL or ZoBSD. The drawbacks of this are: > > 1: ZoL doesn't easily support booting off of mirrored USB flash drives, > let alone running the root filesystem and swap on them. FreeNAS, by way of > comparison, puts a 2G swap partition on each zdev, which (strangely enough) > causes it to often crash when a zdev experiences a failure under load. > > 2: Neither ZoL or FreeNAS have good, stable, kernel-based iSCSI > implementations. FreeNAS is indeed testing istgt, but it proved unstable > for my purposes in recent builds. Unfortunately, stmf hasn't proved itself > any better. > > There are other minor differences, but these are the ones that brought me > to OmniOS in the first place. We'll just have to wait and see how well the > infiniband stuff works. > > > Hopefully this exercise will help prevent others from going down the same > rabbit-hole that I did. > > -Warren V > > > > > On Tue, Mar 3, 2015 at 3:45 PM, W Verb wrote: > >> Hello Rob et al, >> >> Thank you for taking the time to look at this problem with me. I >> completely understand your inclination to look at the network as the most >> probable source of my issue, but I believe that this is a pretty clear-cut >> case of server-side issues. >> >> 1: I did run ping RTT tests during both read and write operations with >> multiple interfaces enabled, and the RTT stayed at ~.2ms regardless of >> whether traffic was actively being transmitted/received or not. >> >> 2: I am not seeing the TCP window size bouncing around, and I am >> certainly not seeing starvation and delay in my packet captures. It is true >> that I do see delayed ACKs and retransmissions when I bump the MTU to 9000 >> on both sides, but I stopped testing with high MTU as soon as I saw it >> happening because I have a good understanding of incast. All of my recent >> testing has been with MTUs between 1000 and 3000 bytes. 
>> >> 3: When testing with MTUs between 1000 and 3000 bytes, I do not see lost >> packets and retransmission in captures on either the server or client side. >> I only see staggered transmission delays on the part of the server. >> >> 4: The client is consistently advertising a large window size (20k+), so >> the TCP throttling mechanism does not appear to play into this. >> >> 5: As mentioned previously, layer 2 flow control is not enabled anywhere >> in the network, so there are no lower-level mechanisms at work. >> >> 6: Upon checking buffer and queue sizes (and doing the appropriate >> research into documentation on the C3560E's buffer sizes), I do not see >> large numbers of frames being dropped by the switch. It does happen at >> larger MTUs, but not very often (and not consistently) during transfers at >> 1000-3000 byte MTUs. I do not have QoS, policing, or rate-shaping enabled. >> >> 7: Network interface stats on both the server and the ESXi client show no >> errors of any kind. This is via netstat on the server, and esxcli / Vsphere >> client on the ESXi box. >> >> 8: When looking at captures taken simultaneously on the server and client >> side, the server-side transmission pauses are consistently seen and >> reproducible, even after multiple server rebuilds, ESXi rebuilds, vSphere >> reinstallations (down to wiping the SQL db), various COMSTAR configuration >> variations, multiple 10G NICs with different NIC chipsets, multiple >> switches (I tried both a 48-port and 24-port C3560E), multiple IOS >> revisions (12.2 and 15.0), OmniOS versions (r151012 and previous) multiple >> cables, transceivers, etc etc etc etc etc >> >> For your review, I have uploaded the actual packet captures to Google >> Drive: >> >> >> https://drive.google.com/file/d/0BwyUMjibonYQZS03dHJ2ZjJvTEE/view?usp=sharing >> 2 int write - ESXi vmk5 >> >> https://drive.google.com/file/d/0BwyUMjibonYQTlNTQ2M5bjlxZ00/view?usp=sharing >> 2 int write - ESXi vmk1 >> >> https://drive.google.com/file/d/0BwyUMjibonYQUEFsSVJCYXBVX3c/view?usp=sharing >> 2 int read - server ixgbe0 >> >> https://drive.google.com/file/d/0BwyUMjibonYQT3FBbElnNFpJTzQ/view?usp=sharing >> 2 int read - ESXi vmk5 >> >> https://drive.google.com/file/d/0BwyUMjibonYQU1hXdFRLM2cxSTA/view?usp=sharing >> 2 int read - ESXi vmk1 >> >> https://drive.google.com/file/d/0BwyUMjibonYQNEFZSHVZdFNweDA/view?usp=sharing >> 1 int write - ESXi vmk1 >> >> https://drive.google.com/file/d/0BwyUMjibonYQM3FpTmloQm5iMGc/view?usp=sharing >> 1 int read - ESXi vmk1 >> >> Regards, >> >> Warren V >> >> On Mon, Mar 2, 2015 at 1:11 PM, Mallory, Rob >> wrote: >> >>> Just an EWAG, and forgive me for not following closely, I just saw >>> this in my inbox, and looked at it and the screenshots for 2 minutes. >>> >>> >>> >>> But this looks like the typical incast problem.. see >>> http://www.pdl.cmu.edu/Incast/ >>> >>> where your storage servers (there are effectively two with ISCSI/MPIO if >>> round-robin is working) have networks which are 20:1 oversubscribed to your >>> 1GbE host interfaces. (although one of the tcpdumps shows only one server >>> so it may be choked out completely) >>> >>> >>> >>> What is your BDP? I?m guessing .150ms * 1GbE. For single-link that >>> gets you to a MSS of 18700 or so. >>> >>> >>> >>> On your 1GbE connected clients, leave MTU at 9k, set the following in >>> sysctl.conf, >>> >>> And reboot. >>> >>> >>> >>> net.ipv4.tcp_rmem = 4096 8938 17876 >>> >>> >>> >>> If MPIO from the server is indeed round-robining properly, this will >>> ?make things fit? 
much better. >>> >>> >>> >>> Note that your tcp_wmem can and should stay high, since you are not >>> oversubscribed going from client?server ; you only need to tweak the >>> tcp receive window size. >>> >>> >>> >>> I?ve not done it in quite some time, but IIRC, You can also set these >>> from the server side with: >>> >>> Route add -sendpipe 8930 or ?ssthresh >>> >>> >>> >>> And I think you can see the hash-table with computed BDP per client with >>> ndd. >>> >>> >>> >>> I would try playing with those before delving deep into potential bugs >>> in the TCP, nic driver, zfs, or vm. >>> >>> -Rob >>> >>> >>> >>> *From:* W Verb via illumos-developer [mailto:developer at lists.illumos.org] >>> >>> *Sent:* Monday, March 02, 2015 12:20 PM >>> *To:* Garrett D'Amore >>> *Cc:* Joerg Goltermann; illumos-dev; omnios-discuss at lists.omniti.com >>> *Subject:* Re: [developer] Re: [OmniOS-discuss] The ixgbe driver, >>> Lindsay Lohan, and the Greek economy >>> >>> >>> >>> Hello, >>> >>> vmstat seems pretty boring. Certainly nothing going to swap. >>> >>> root at sanbox:/root# vmstat >>> kthr memory page disk faults >>> cpu >>> r b w swap free re mf pi po fr de sr po ro s0 s2 in sy cs us >>> sy id >>> 0 0 0 34631632 30728068 175 215 0 0 0 0 963 275 4 6 140 3301 796 6681 >>> 0 1 99 >>> >>> Here is the "taskq_dispatch_ent" output from "lockstat -s 5 -kWP sleep >>> 30" during the "fast" write operation. >>> >>> >>> ------------------------------------------------------------------------------- >>> Count indv cuml rcnt nsec Hottest Lock Caller >>> 50934 3% 79% 0.00 3437 0xffffff093145ba40 taskq_dispatch_ent >>> >>> nsec ------ Time Distribution ------ count Stack >>> 128 | 7 >>> spa_taskq_dispatch_ent >>> 256 |@@ 4333 zio_taskq_dispatch >>> 512 |@@ 3863 zio_issue_async >>> 1024 |@@@@@ 9717 zio_execute >>> 2048 |@@@@@@@@@ 15904 >>> 4096 |@@@@ 7595 >>> 8192 |@@ 4498 >>> 16384 |@ 2662 >>> 32768 |@ 1886 >>> 65536 | 434 >>> 131072 | 34 >>> 262144 | 1 >>> >>> ------------------------------------------------------------------------------- >>> >>> >>> However, the truly "broken" function is a read operation: >>> >>> Top lock 1st try: >>> >>> ------------------------------------------------------------------------------- >>> Count indv cuml rcnt nsec Hottest Lock Caller >>> 474 15% 15% 0.00 7031 0xffffff093145b6f8 cv_wait >>> >>> nsec ------ Time Distribution ------ count Stack >>> 256 |@ 29 taskq_thread_wait >>> 512 |@@@@@@ 100 taskq_thread >>> 1024 |@@@@ 72 thread_start >>> 2048 |@@@@ 69 >>> 4096 |@@@ 51 >>> 8192 |@@ 47 >>> 16384 |@@ 44 >>> 32768 |@@ 32 >>> 65536 |@ 25 >>> 131072 | 5 >>> >>> ------------------------------------------------------------------------------- >>> >>> Top lock 2nd try: >>> >>> >>> ------------------------------------------------------------------------------- >>> Count indv cuml rcnt nsec Hottest Lock Caller >>> 174 39% 39% 0.00 103909 0xffffff0943f116a0 dmu_zfetch_find >>> >>> nsec ------ Time Distribution ------ count Stack >>> 2048 | 2 dmu_zfetch >>> 4096 | 3 dbuf_read >>> 8192 | 4 >>> dmu_buf_hold_array_by_dnode >>> 16384 | 3 dmu_buf_hold_array >>> 32768 |@ 7 >>> 65536 |@@ 14 >>> 131072 |@@@@@@@@@@@@@@@@@@@@ 116 >>> 262144 |@@@ 19 >>> 524288 | 4 >>> 1048576 | 2 >>> >>> ------------------------------------------------------------------------------- >>> >>> Top lock 3rd try: >>> >>> >>> ------------------------------------------------------------------------------- >>> Count indv cuml rcnt nsec Hottest Lock Caller >>> 283 55% 55% 0.00 94602 0xffffff0943ff5a68 dmu_zfetch_find >>> >>> nsec 
------ Time Distribution ------ count Stack >>> 512 | 1 dmu_zfetch >>> 1024 | 1 dbuf_read >>> 2048 | 0 >>> dmu_buf_hold_array_by_dnode >>> 4096 | 5 dmu_buf_hold_array >>> 8192 | 2 >>> 16384 | 7 >>> 32768 | 4 >>> 65536 |@@@ 33 >>> 131072 |@@@@@@@@@@@@@@@@@@@@ 198 >>> 262144 |@@ 27 >>> 524288 | 2 >>> 1048576 | 3 >>> >>> ------------------------------------------------------------------------------- >>> >>> >>> >>> As for the MTU question- setting the MTU to 9000 makes read operations >>> grind almost to a halt at 5MB/s transfer rate. >>> >>> -Warren V >>> >>> >>> >>> On Mon, Mar 2, 2015 at 11:30 AM, Garrett D'Amore >>> wrote: >>> >>> Here?s a theory. You are using small (relatively) MTUs (3000 is less >>> than the smallest ZFS block size.) So, when you go multipathing this way, >>> might a single upper layer transaction (ZFS block transfer request, or for >>> that matter COMSTAR block request) get routed over different paths. This >>> sounds like a potentially pathological condition to me. >>> >>> >>> >>> What happens if you increase the MTU to 9000? Have you tried it? I?m >>> sort of thinking that this will permit each transaction to be issued in a >>> single IP frame, which may alleviate certain tragic code paths. (That >>> said, I?m not sure how aware COMSTAR is of the IP MTU. If it is ignorant, >>> then it shouldn?t matter *that* much, since TCP should do the right thing >>> here and a single TCP stream should stick to a single underlying NIC. But >>> if COMSTAR is aware of the MTU, it may do some really screwball things as >>> it tries to break requests up into single frames.) >>> >>> >>> >>> Your read spin really looks like only about 22 msec of wait out of a >>> total run of 30 sec. (That?s not *great*, but neither does it sound >>> tragic.) Your write is interesting because that looks like it is going a >>> wildly different path. You should be aware that the locks you see are >>> *not* necessarily related in call order, but rather are ordered by instance >>> count. The write code path hitting the task_thread as hard as it does is >>> really, really weird. Something is pounding on a taskq lock super hard. >>> The number of taskq_dispatch_ent calls is interesting here. I?m starting >>> to wonder if it?s something as stupid as a spin where if the taskq is >>> ?full? (max size reached), a caller just is spinning trying to dispatch >>> jobs to the taskq. >>> >>> >>> >>> The taskq_dispatch_ent code is super simple, and it should be almost >>> impossible to have contention on that lock ? barring a thread spinning hard >>> on taskq_dispatch (or taskq_dispatch_ent as I think is happening here). >>> Looking at the various call sites, there are places in both COMSTAR >>> (iscsit) and in ZFS where this could be coming from. To know which, we >>> really need to have the back trace associated. >>> >>> >>> >>> lockstat can give this ? try giving ?-s 5? to give a short backtrace >>> from this, that will probably give us a little more info about the guilty >>> caller. :-) >>> >>> >>> >>> - Garrett >>> >>> >>> >>> On Mar 2, 2015, at 11:07 AM, W Verb via illumos-developer < >>> developer at lists.illumos.org> wrote: >>> >>> >>> >>> Hello all, >>> >>> I am not using layer 2 flow control. The switch carries line-rate 10G >>> traffic without error. >>> >>> I think I have found the issue via lockstat. 
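As a quick cross-check of the flow-control and error claims above, the link properties and the TCP retransmit counters Garrett mentioned can be inspected from the OmniOS side roughly as follows. This is a sketch; ixgbe0 is assumed to be the 10G link in question, and the grep pattern is only a filter, not a list of guaranteed counter names.

  # negotiated flow control and MTU on the 10G port
  dladm show-linkprop -p flowctrl,mtu ixgbe0
  # retransmit/duplicate-related TCP counters
  kstat -m tcp | egrep -i 'retrans|dup'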
>>> The first lockstat is taken during a multipath read:
>>>
>>> lockstat -kWP sleep 30
>>>
>>> Adaptive mutex spin: 21331 events in 30.020 seconds (711 events/sec)
>>>
>>> Count indv cuml rcnt nsec Hottest Lock Caller
>>> -------------------------------------------------------------------------------
>>> 9306 44% 44% 0.00 1557 htable_mutex+0x370 htable_release
>>> 6307 23% 68% 0.00 1207 htable_mutex+0x108 htable_lookup
>>> 596 7% 75% 0.00 4100 0xffffff0931705188 cv_wait
>>> 349 5% 80% 0.00 4437 0xffffff0931705188 taskq_thread
>>> 704 2% 82% 0.00 995 0xffffff0935de3c50 dbuf_create
>>>
>>> The hash table being read here I would guess is the tcp connection hash table.
>>>
>>> [...]
>>>
>>> [Message clipped]
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From chip at innovates.com Wed Mar 4 14:17:50 2015
From: chip at innovates.com (Schweiss, Chip)
Date: Wed, 4 Mar 2015 08:17:50 -0600
Subject: [OmniOS-discuss] speeding up file access
In-Reply-To: <20150304142603.152ac1da@emeritus>
References: <20150304142603.152ac1da@emeritus>
Message-ID:

No USB flash is going to bring any benefit to the game as a log device.
If it has any ram cache to increase write performance, it's useless as a log device because it will not have any power protection for the ram. Most likely it will not have any RAM and write performance will be poor. Decent log devices don't come cheap. If it's just a home server set up some frequent snapshots and turn sync off. You may have to throw away the most recent writes in the case of a power failure, but your performance will be maximized. I've been doing this for 4 years on my home ZFS server, about 1/2 dozen power failures and I've never lost anything. I still keep it backed up. I use Code42's Crashplan. -Chip On Tue, Mar 3, 2015 at 10:26 PM, Michael Mounteney wrote: > Hello list; this is a very basic question about ZFS performance from > someone with limited sysadmin knowledge. I've seen various messages > about ZILs and caching and noticed that my Supermicro 5017C-LF > (http://www.supermicro.com/products/system/1U/5017/SYS-5017C-LF.cfm). > This has a single USB socket on the board so I wondered if it would be > worth putting a USB stick / `thumbdrive' in there and using it as the > ZIL / cache. I know the real answer to my question is 'buy a proper > server' but this is a home system and cost, noise and power-consumption > all mandate the current choice of machine. > > (Yes; the USB socket is vertical; I'd have to buy a right-angle > converter) > > Thanks, Michael. > _______________________________________________ > OmniOS-discuss mailing list > OmniOS-discuss at lists.omniti.com > http://lists.omniti.com/mailman/listinfo/omnios-discuss > -------------- next part -------------- An HTML attachment was scrubbed... URL: From john.barfield at bissinc.com Wed Mar 4 16:27:29 2015 From: john.barfield at bissinc.com (John Barfield) Date: Wed, 4 Mar 2015 16:27:29 +0000 Subject: [OmniOS-discuss] OmniOS on IBM DX360 UEFI Firmware Message-ID: <25A710AE-6CAC-4935-BC44-BAE0A81762B6@bissinc.com> Greetings, I?m writing to see if anyone could point me in the direction of a document that would detail how to get OmniOS to boot on IBM?s newest UEFI firmware on system X machines. I?m using a DX360 3U chassis as a storage appliance and I?m having a hard time booting the installer iso from USB. The installer ISO simply does not work but I can boot another ?installed? OmniOS appliance image off of a different USB stick. However this image just crashes and reboots after the SunOS 5.11 screen and goes into an infinite reboot loop. If anyone has any experience with this server I would be very grateful if you shared your knowledge. I?ve tried disabling UEFI or enabling legacy mode but I just don?t think that its working?after scanning through IBM?s docs from what I can tell?it should just work automatically. Thanks in advance for any help! John Barfield -------------- next part -------------- An HTML attachment was scrubbed... URL: From lists at marzocchi.net Wed Mar 4 23:08:32 2015 From: lists at marzocchi.net (Olaf Marzocchi) Date: Thu, 5 Mar 2015 00:08:32 +0100 Subject: [OmniOS-discuss] speeding up file access In-Reply-To: References: <20150304142603.152ac1da@emeritus> Message-ID: <23C40932-04A6-40B3-B417-353340899272@marzocchi.net> I also have a USB connector and a SD connector on the mobo of my server (Proliant ML100 G7) and I never found any good use for them. The best I could think of is doing a local backup of /etc and of the other config dirs. Concerning CrashPlan: I also use it, are you aware that they cut support for Solaris in 4.x? Solaris will be supported only on the old 3.x versions. 
Since they mantain backward compatibility for two main releases, as soon as 5.x will be released, their servers will not accept data from 3.x anymore. That should be in about 1.5 years from now, rough estimate. I found NO alternatives yet. Olaf > Il giorno 04/mar/2015, alle ore 15:17, Schweiss, Chip ha scritto: > > No USB flash is going to bring any benefit to the game as a log device. If it has any ram cache to increase write performance, it's useless as a log device because it will not have any power protection for the ram. Most likely it will not have any RAM and write performance will be poor. Decent log devices don't come cheap. > > If it's just a home server set up some frequent snapshots and turn sync off. You may have to throw away the most recent writes in the case of a power failure, but your performance will be maximized. > > I've been doing this for 4 years on my home ZFS server, about 1/2 dozen power failures and I've never lost anything. I still keep it backed up. I use Code42's Crashplan. > > -Chip > > On Tue, Mar 3, 2015 at 10:26 PM, Michael Mounteney > wrote: > Hello list; this is a very basic question about ZFS performance from > someone with limited sysadmin knowledge. I've seen various messages > about ZILs and caching and noticed that my Supermicro 5017C-LF > (http://www.supermicro.com/products/system/1U/5017/SYS-5017C-LF.cfm ). > This has a single USB socket on the board so I wondered if it would be > worth putting a USB stick / `thumbdrive' in there and using it as the > ZIL / cache. I know the real answer to my question is 'buy a proper > server' but this is a home system and cost, noise and power-consumption > all mandate the current choice of machine. > > (Yes; the USB socket is vertical; I'd have to buy a right-angle > converter) > > Thanks, Michael. > _______________________________________________ > OmniOS-discuss mailing list > OmniOS-discuss at lists.omniti.com > http://lists.omniti.com/mailman/listinfo/omnios-discuss > > _______________________________________________ > OmniOS-discuss mailing list > OmniOS-discuss at lists.omniti.com > http://lists.omniti.com/mailman/listinfo/omnios-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From gate03 at landcroft.co.uk Wed Mar 4 23:52:57 2015 From: gate03 at landcroft.co.uk (Michael Mounteney) Date: Thu, 5 Mar 2015 09:52:57 +1000 Subject: [OmniOS-discuss] speeding up file access In-Reply-To: References: <20150304142603.152ac1da@emeritus> Message-ID: <20150305095257.5de6478f@emeritus> Thanks to Doug and Chip for the replies. On Wed, 4 Mar 2015 08:17:50 -0600 "Schweiss, Chip" wrote: > [...] > > If it's just a home server set up some frequent snapshots and turn > sync off. You may have to throw away the most recent writes in the > case of a power failure, but your performance will be maximized. I already turned-off sync but the machine is on a UPS, which helps. It's a low-power server and I have to remember that. The other day I found that NTP synchronisation to the clients wasn't working, as the server was responding too slowly because it had two KVM VMs running. Michael. 
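For completeness, the sync-off-plus-frequent-snapshots arrangement described in this thread amounts to very little at the command line. A minimal sketch, with tank/data, the script path and the quarter-hourly schedule all being made-up examples rather than recommendations:

    # Give up synchronous write semantics on the dataset. Only do this if
    # losing the last few seconds of writes after a power cut is acceptable;
    # revert with: zfs set sync=standard tank/data
    zfs set sync=disabled tank/data

and a tiny snapshot script driven by cron:

    #!/bin/sh
    # /root/bin/snap-frequent.sh
    # root crontab entry:  0,15,30,45 * * * * /root/bin/snap-frequent.sh
    zfs snapshot tank/data@auto-`date +%Y%m%d-%H%M`

Pruning is manual in this sketch; zfs list -t snapshot -o name,creation -s creation shows what has accumulated.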
From matthew.lagoe at subrigo.net Wed Mar 4 23:59:08 2015 From: matthew.lagoe at subrigo.net (Matthew Lagoe) Date: Wed, 4 Mar 2015 15:59:08 -0800 Subject: [OmniOS-discuss] speeding up file access In-Reply-To: <23C40932-04A6-40B3-B417-353340899272@marzocchi.net> References: <20150304142603.152ac1da@emeritus> <23C40932-04A6-40B3-B417-353340899272@marzocchi.net> Message-ID: <001e01d056d7$36d37710$a47a6530$@subrigo.net> I have only used them for like a usb dongle other then that there pretty useless J From: OmniOS-discuss [mailto:omnios-discuss-bounces at lists.omniti.com] On Behalf Of Olaf Marzocchi Sent: Wednesday, March 04, 2015 03:09 PM To: Schweiss, Chip Cc: omnios-discuss Subject: Re: [OmniOS-discuss] speeding up file access I also have a USB connector and a SD connector on the mobo of my server (Proliant ML100 G7) and I never found any good use for them. The best I could think of is doing a local backup of /etc and of the other config dirs. Concerning CrashPlan: I also use it, are you aware that they cut support for Solaris in 4.x? Solaris will be supported only on the old 3.x versions. Since they mantain backward compatibility for two main releases, as soon as 5.x will be released, their servers will not accept data from 3.x anymore. That should be in about 1.5 years from now, rough estimate. I found NO alternatives yet. Olaf Il giorno 04/mar/2015, alle ore 15:17, Schweiss, Chip ha scritto: No USB flash is going to bring any benefit to the game as a log device. If it has any ram cache to increase write performance, it's useless as a log device because it will not have any power protection for the ram. Most likely it will not have any RAM and write performance will be poor. Decent log devices don't come cheap. If it's just a home server set up some frequent snapshots and turn sync off. You may have to throw away the most recent writes in the case of a power failure, but your performance will be maximized. I've been doing this for 4 years on my home ZFS server, about 1/2 dozen power failures and I've never lost anything. I still keep it backed up. I use Code42's Crashplan. -Chip On Tue, Mar 3, 2015 at 10:26 PM, Michael Mounteney wrote: Hello list; this is a very basic question about ZFS performance from someone with limited sysadmin knowledge. I've seen various messages about ZILs and caching and noticed that my Supermicro 5017C-LF (http://www.supermicro.com/products/system/1U/5017/SYS-5017C-LF.cfm). This has a single USB socket on the board so I wondered if it would be worth putting a USB stick / `thumbdrive' in there and using it as the ZIL / cache. I know the real answer to my question is 'buy a proper server' but this is a home system and cost, noise and power-consumption all mandate the current choice of machine. (Yes; the USB socket is vertical; I'd have to buy a right-angle converter) Thanks, Michael. _______________________________________________ OmniOS-discuss mailing list OmniOS-discuss at lists.omniti.com http://lists.omniti.com/mailman/listinfo/omnios-discuss _______________________________________________ OmniOS-discuss mailing list OmniOS-discuss at lists.omniti.com http://lists.omniti.com/mailman/listinfo/omnios-discuss -------------- next part -------------- An HTML attachment was scrubbed... 
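If someone does want to put the on-board socket to the modest use Olaf describes, copying the configuration directories onto a FAT-formatted stick is only a few commands. The device path and mount point below are placeholders; rmformat reports the real names on your machine:

    # Identify the removable device (the names below are examples only).
    rmformat -l

    # Mount the stick, write a dated copy of /etc, and unmount again.
    mkdir -p /mnt/usbkey
    mount -F pcfs /dev/dsk/c2t0d0p0:c /mnt/usbkey
    tar cf - /etc | gzip > /mnt/usbkey/etc-backup-`date +%Y%m%d`.tar.gz
    umount /mnt/usbkey

Nothing about this is specific to the USB socket; the same works for the SD slot if it shows up as a removable disk.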
URL: From chip at innovates.com Thu Mar 5 00:40:23 2015 From: chip at innovates.com (Schweiss, Chip) Date: Wed, 4 Mar 2015 18:40:23 -0600 Subject: [OmniOS-discuss] OmniOS on IBM DX360 UEFI Firmware In-Reply-To: <25A710AE-6CAC-4935-BC44-BAE0A81762B6@bissinc.com> References: <25A710AE-6CAC-4935-BC44-BAE0A81762B6@bissinc.com> Message-ID: Sounds like the problem I had on a new Supermicro box. I found by trial and error turning off x2apic in the bios fixed the problem. Also disable C sleep states. -Chip On Wed, Mar 4, 2015 at 10:27 AM, John Barfield wrote: > Greetings, > > I?m writing to see if anyone could point me in the direction of a > document that would detail how to get OmniOS to boot on IBM?s newest UEFI > firmware on system X machines. > > I?m using a DX360 3U chassis as a storage appliance and I?m having a > hard time booting the installer iso from USB. > > The installer ISO simply does not work but I can boot another > ?installed? OmniOS appliance image off of a different USB stick. > > However this image just crashes and reboots after the SunOS 5.11 screen > and goes into an infinite reboot loop. > > If anyone has any experience with this server I would be very grateful > if you shared your knowledge. > > I?ve tried disabling UEFI or enabling legacy mode but I just don?t think > that its working?after scanning through IBM?s docs from what I can tell?it > should just work automatically. > > Thanks in advance for any help! > > John Barfield > > > > > > _______________________________________________ > OmniOS-discuss mailing list > OmniOS-discuss at lists.omniti.com > http://lists.omniti.com/mailman/listinfo/omnios-discuss > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From john.barfield at bissinc.com Thu Mar 5 03:58:29 2015 From: john.barfield at bissinc.com (John Barfield) Date: Thu, 5 Mar 2015 03:58:29 +0000 Subject: [OmniOS-discuss] OmniOS on IBM DX360 UEFI Firmware In-Reply-To: References: <25A710AE-6CAC-4935-BC44-BAE0A81762B6@bissinc.com>, Message-ID: Thanks! Ill try that tomorrow... Thanks and have a great day, John Barfield On Mar 4, 2015, at 6:40 PM, Schweiss, Chip > wrote: Sounds like the problem I had on a new Supermicro box. I found by trial and error turning off x2apic in the bios fixed the problem. Also disable C sleep states. -Chip On Wed, Mar 4, 2015 at 10:27 AM, John Barfield > wrote: Greetings, I'm writing to see if anyone could point me in the direction of a document that would detail how to get OmniOS to boot on IBM's newest UEFI firmware on system X machines. I'm using a DX360 3U chassis as a storage appliance and I'm having a hard time booting the installer iso from USB. The installer ISO simply does not work but I can boot another "installed" OmniOS appliance image off of a different USB stick. However this image just crashes and reboots after the SunOS 5.11 screen and goes into an infinite reboot loop. If anyone has any experience with this server I would be very grateful if you shared your knowledge. I've tried disabling UEFI or enabling legacy mode but I just don't think that its working...after scanning through IBM's docs from what I can tell...it should just work automatically. Thanks in advance for any help! John Barfield _______________________________________________ OmniOS-discuss mailing list OmniOS-discuss at lists.omniti.com http://lists.omniti.com/mailman/listinfo/omnios-discuss -------------- next part -------------- An HTML attachment was scrubbed... 
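Once a machine like this finally boots, it is worth confirming from the running system what those firmware changes actually did. A small sketch; the kstat statistic names are the ones seen on recent illumos builds and may differ on yours:

    # Which PSM/interrupt module did the kernel load (pcplusmp or apix)?
    modinfo | egrep -i 'apix|pcplusmp'

    # Anything the kernel reported about APIC mode during boot.
    grep -i apic /var/adm/messages | tail -20

    # C-state view of the CPUs (statistic names assumed, see above).
    kstat -p cpu_info:::current_cstate cpu_info:::supported_max_cstates

None of this fixes a boot loop by itself, but it tells you whether the BIOS options took effect before you chase the next variable.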
URL: From nsmith at careyweb.com Thu Mar 5 14:00:51 2015 From: nsmith at careyweb.com (Nate Smith) Date: Thu, 5 Mar 2015 09:00:51 -0500 Subject: [OmniOS-discuss] QLE2652 I/O Disconnect. Heat Sinks? Message-ID: <4593924c-380f-4705-871f-3105d937c5f7@careyweb.com> I?ve had this problem for a while, and I have no way to diagnose what is going on, but occasionally when system IO gets high (I?ve seen it happen especially on backups), I will lose connectivity with my Fibre Channel cards which serve up fibre channel LUNS to a VM cluster. All hell breaks loose, and then connectivity gets restored. I don?t get an error that it?s dropped, at least not on the Omnios system, but I get notice when it?s restored (which makes no sense). I?m wondering if the cards are just overheating, and if heat sinks with a fan would help on the io chip. Mar? 5 01:55:01 newstorm fct: [ID 132490 kern.notice] NOTICE: qlt2,0 LINK UP, portid 20000, topology Fabric Pt-to-Pt,speed 8G Mar? 5 01:56:26 newstorm fct: [ID 132490 kern.notice] NOTICE: qlt0,0 LINK UP, portid 20100, topology Fabric Pt-to-Pt,speed 8G Mar? 5 02:00:13 newstorm last message repeated 1 time Mar? 5 02:00:15 newstorm fct: [ID 132490 kern.notice] NOTICE: qlt3,0 LINK UP, portid 10000, topology Fabric Pt-to-Pt,speed 8G Mar? 5 02:00:15 newstorm fct: [ID 132490 kern.notice] NOTICE: qlt2,0 LINK UP, portid 20000, topology Fabric Pt-to-Pt,speed 8G Mar? 5 02:00:18 newstorm fct: [ID 132490 kern.notice] NOTICE: qlt1,0 LINK UP, portid 10100, topology Fabric Pt-to-Pt,speed 8G -------------- next part -------------- An HTML attachment was scrubbed... URL: From rt at steait.net Thu Mar 5 16:07:59 2015 From: rt at steait.net (Rune Tipsmark) Date: Thu, 5 Mar 2015 16:07:59 +0000 Subject: [OmniOS-discuss] QLE2652 I/O Disconnect. Heat Sinks? In-Reply-To: <4593924c-380f-4705-871f-3105d937c5f7@careyweb.com> References: <4593924c-380f-4705-871f-3105d937c5f7@careyweb.com> Message-ID: <0224e713f8ba49249c659888858f569b@EX1301.steait.net> Same problem here? have noticed I can cause this easily by using Windows as initiator? I cannot cause this using VMware as initiator? No idea how to fix, but a big problem. Br, Rune From: OmniOS-discuss [mailto:omnios-discuss-bounces at lists.omniti.com] On Behalf Of Nate Smith Sent: Thursday, March 05, 2015 6:01 AM To: omnios-discuss at lists.omniti.com Subject: [OmniOS-discuss] QLE2652 I/O Disconnect. Heat Sinks? I?ve had this problem for a while, and I have no way to diagnose what is going on, but occasionally when system IO gets high (I?ve seen it happen especially on backups), I will lose connectivity with my Fibre Channel cards which serve up fibre channel LUNS to a VM cluster. All hell breaks loose, and then connectivity gets restored. I don?t get an error that it?s dropped, at least not on the Omnios system, but I get notice when it?s restored (which makes no sense). I?m wondering if the cards are just overheating, and if heat sinks with a fan would help on the io chip. 
Mar 5 01:55:01 newstorm fct: [ID 132490 kern.notice] NOTICE: qlt2,0 LINK UP, portid 20000, topology Fabric Pt-to-Pt,speed 8G Mar 5 01:56:26 newstorm fct: [ID 132490 kern.notice] NOTICE: qlt0,0 LINK UP, portid 20100, topology Fabric Pt-to-Pt,speed 8G Mar 5 02:00:13 newstorm last message repeated 1 time Mar 5 02:00:15 newstorm fct: [ID 132490 kern.notice] NOTICE: qlt3,0 LINK UP, portid 10000, topology Fabric Pt-to-Pt,speed 8G Mar 5 02:00:15 newstorm fct: [ID 132490 kern.notice] NOTICE: qlt2,0 LINK UP, portid 20000, topology Fabric Pt-to-Pt,speed 8G Mar 5 02:00:18 newstorm fct: [ID 132490 kern.notice] NOTICE: qlt1,0 LINK UP, portid 10100, topology Fabric Pt-to-Pt,speed 8G -------------- next part -------------- An HTML attachment was scrubbed... URL: From nsmith at careyweb.com Thu Mar 5 16:10:12 2015 From: nsmith at careyweb.com (Nate Smith) Date: Thu, 5 Mar 2015 11:10:12 -0500 Subject: [OmniOS-discuss] QLE2652 I/O Disconnect. Heat Sinks? In-Reply-To: <0224e713f8ba49249c659888858f569b@EX1301.steait.net> References: <4593924c-380f-4705-871f-3105d937c5f7@careyweb.com> <0224e713f8ba49249c659888858f569b@EX1301.steait.net> Message-ID: <476bc8c0-6135-4104-b3ca-9bac6873f473@careyweb.com> Do you see the same problem with Windows and iSCSI as an initiator? I wish there was a way to turn up debugging to figure this out. From: Rune Tipsmark [mailto:rt at steait.net] Sent: Thursday, March 05, 2015 11:08 AM To: 'Nate Smith'; omnios-discuss at lists.omniti.com Subject: RE: [OmniOS-discuss] QLE2652 I/O Disconnect. Heat Sinks? Same problem here? have noticed I can cause this easily by using Windows as initiator? I cannot cause this using VMware as initiator? No idea how to fix, but a big problem. Br, Rune From: OmniOS-discuss [mailto:omnios-discuss-bounces at lists.omniti.com] On Behalf Of Nate Smith Sent: Thursday, March 05, 2015 6:01 AM To: omnios-discuss at lists.omniti.com Subject: [OmniOS-discuss] QLE2652 I/O Disconnect. Heat Sinks? I?ve had this problem for a while, and I have no way to diagnose what is going on, but occasionally when system IO gets high (I?ve seen it happen especially on backups), I will lose connectivity with my Fibre Channel cards which serve up fibre channel LUNS to a VM cluster. All hell breaks loose, and then connectivity gets restored. I don?t get an error that it?s dropped, at least not on the Omnios system, but I get notice when it?s restored (which makes no sense). I?m wondering if the cards are just overheating, and if heat sinks with a fan would help on the io chip. Mar 5 01:55:01 newstorm fct: [ID 132490 kern.notice] NOTICE: qlt2,0 LINK UP, portid 20000, topology Fabric Pt-to-Pt,speed 8G Mar 5 01:56:26 newstorm fct: [ID 132490 kern.notice] NOTICE: qlt0,0 LINK UP, portid 20100, topology Fabric Pt-to-Pt,speed 8G Mar 5 02:00:13 newstorm last message repeated 1 time Mar 5 02:00:15 newstorm fct: [ID 132490 kern.notice] NOTICE: qlt3,0 LINK UP, portid 10000, topology Fabric Pt-to-Pt,speed 8G Mar 5 02:00:15 newstorm fct: [ID 132490 kern.notice] NOTICE: qlt2,0 LINK UP, portid 20000, topology Fabric Pt-to-Pt,speed 8G Mar 5 02:00:18 newstorm fct: [ID 132490 kern.notice] NOTICE: qlt1,0 LINK UP, portid 10100, topology Fabric Pt-to-Pt,speed 8G -------------- next part -------------- An HTML attachment was scrubbed... URL: From rt at steait.net Thu Mar 5 16:14:26 2015 From: rt at steait.net (Rune Tipsmark) Date: Thu, 5 Mar 2015 16:14:26 +0000 Subject: [OmniOS-discuss] QLE2652 I/O Disconnect. Heat Sinks? 
In-Reply-To: <476bc8c0-6135-4104-b3ca-9bac6873f473@careyweb.com> References: <4593924c-380f-4705-871f-3105d937c5f7@careyweb.com> <0224e713f8ba49249c659888858f569b@EX1301.steait.net> <476bc8c0-6135-4104-b3ca-9bac6873f473@careyweb.com> Message-ID: <05240908cc744d0ea27321d0ea77b5e9@EX1301.steait.net> Haven?t tried iSCSI but had similar issues with Infiniband? more frequent due to higher io load, but no console error messages. This only happened on my SuperMicro server and never on my HP server? what brand are you running? Br, Rune From: Nate Smith [mailto:nsmith at careyweb.com] Sent: Thursday, March 05, 2015 8:10 AM To: Rune Tipsmark; omnios-discuss at lists.omniti.com Subject: RE: [OmniOS-discuss] QLE2652 I/O Disconnect. Heat Sinks? Do you see the same problem with Windows and iSCSI as an initiator? I wish there was a way to turn up debugging to figure this out. From: Rune Tipsmark [mailto:rt at steait.net] Sent: Thursday, March 05, 2015 11:08 AM To: 'Nate Smith'; omnios-discuss at lists.omniti.com Subject: RE: [OmniOS-discuss] QLE2652 I/O Disconnect. Heat Sinks? Same problem here? have noticed I can cause this easily by using Windows as initiator? I cannot cause this using VMware as initiator? No idea how to fix, but a big problem. Br, Rune From: OmniOS-discuss [mailto:omnios-discuss-bounces at lists.omniti.com] On Behalf Of Nate Smith Sent: Thursday, March 05, 2015 6:01 AM To: omnios-discuss at lists.omniti.com Subject: [OmniOS-discuss] QLE2652 I/O Disconnect. Heat Sinks? I?ve had this problem for a while, and I have no way to diagnose what is going on, but occasionally when system IO gets high (I?ve seen it happen especially on backups), I will lose connectivity with my Fibre Channel cards which serve up fibre channel LUNS to a VM cluster. All hell breaks loose, and then connectivity gets restored. I don?t get an error that it?s dropped, at least not on the Omnios system, but I get notice when it?s restored (which makes no sense). I?m wondering if the cards are just overheating, and if heat sinks with a fan would help on the io chip. Mar 5 01:55:01 newstorm fct: [ID 132490 kern.notice] NOTICE: qlt2,0 LINK UP, portid 20000, topology Fabric Pt-to-Pt,speed 8G Mar 5 01:56:26 newstorm fct: [ID 132490 kern.notice] NOTICE: qlt0,0 LINK UP, portid 20100, topology Fabric Pt-to-Pt,speed 8G Mar 5 02:00:13 newstorm last message repeated 1 time Mar 5 02:00:15 newstorm fct: [ID 132490 kern.notice] NOTICE: qlt3,0 LINK UP, portid 10000, topology Fabric Pt-to-Pt,speed 8G Mar 5 02:00:15 newstorm fct: [ID 132490 kern.notice] NOTICE: qlt2,0 LINK UP, portid 20000, topology Fabric Pt-to-Pt,speed 8G Mar 5 02:00:18 newstorm fct: [ID 132490 kern.notice] NOTICE: qlt1,0 LINK UP, portid 10100, topology Fabric Pt-to-Pt,speed 8G -------------- next part -------------- An HTML attachment was scrubbed... URL: From nsmith at careyweb.com Thu Mar 5 16:16:15 2015 From: nsmith at careyweb.com (Nate Smith) Date: Thu, 5 Mar 2015 11:16:15 -0500 Subject: [OmniOS-discuss] QLE2652 I/O Disconnect. Heat Sinks? In-Reply-To: <05240908cc744d0ea27321d0ea77b5e9@EX1301.steait.net> References: <4593924c-380f-4705-871f-3105d937c5f7@careyweb.com> <0224e713f8ba49249c659888858f569b@EX1301.steait.net> <476bc8c0-6135-4104-b3ca-9bac6873f473@careyweb.com> <05240908cc744d0ea27321d0ea77b5e9@EX1301.steait.net> Message-ID: Dell R720. Had it happen with an intel system too. 
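Short of a debug knob in the driver, the state transitions can at least be watched from the OmniOS side while the heavy I/O runs. A generic sketch, run in separate terminals, with nothing specific to any one card:

    # Console log: the qlt LINK UP / link-down notices appear here live.
    tail -f /var/adm/messages

    # COMSTAR's view of the target ports and logical units; compare the
    # Operational Status before, during and after the drop.
    stmfadm list-target -v
    stmfadm list-lu -v

    # Disk-side load and any FMA ereports (PCIe or driver) near the event.
    iostat -xn 5
    fmdump -e -t 1h

Correlating the timestamps of those views against the switch logs is usually enough to tell whether the port went away, the target stalled, or the initiator gave up first.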
From: Rune Tipsmark [mailto:rt at steait.net] Sent: Thursday, March 05, 2015 11:14 AM To: 'Nate Smith'; omnios-discuss at lists.omniti.com Subject: RE: [OmniOS-discuss] QLE2652 I/O Disconnect. Heat Sinks? Haven?t tried iSCSI but had similar issues with Infiniband? more frequent due to higher io load, but no console error messages. This only happened on my SuperMicro server and never on my HP server? what brand are you running? Br, Rune From: Nate Smith [mailto:nsmith at careyweb.com] Sent: Thursday, March 05, 2015 8:10 AM To: Rune Tipsmark; omnios-discuss at lists.omniti.com Subject: RE: [OmniOS-discuss] QLE2652 I/O Disconnect. Heat Sinks? Do you see the same problem with Windows and iSCSI as an initiator? I wish there was a way to turn up debugging to figure this out. From: Rune Tipsmark [mailto:rt at steait.net] Sent: Thursday, March 05, 2015 11:08 AM To: 'Nate Smith'; omnios-discuss at lists.omniti.com Subject: RE: [OmniOS-discuss] QLE2652 I/O Disconnect. Heat Sinks? Same problem here? have noticed I can cause this easily by using Windows as initiator? I cannot cause this using VMware as initiator? No idea how to fix, but a big problem. Br, Rune From: OmniOS-discuss [mailto:omnios-discuss-bounces at lists.omniti.com] On Behalf Of Nate Smith Sent: Thursday, March 05, 2015 6:01 AM To: omnios-discuss at lists.omniti.com Subject: [OmniOS-discuss] QLE2652 I/O Disconnect. Heat Sinks? I?ve had this problem for a while, and I have no way to diagnose what is going on, but occasionally when system IO gets high (I?ve seen it happen especially on backups), I will lose connectivity with my Fibre Channel cards which serve up fibre channel LUNS to a VM cluster. All hell breaks loose, and then connectivity gets restored. I don?t get an error that it?s dropped, at least not on the Omnios system, but I get notice when it?s restored (which makes no sense). I?m wondering if the cards are just overheating, and if heat sinks with a fan would help on the io chip. Mar 5 01:55:01 newstorm fct: [ID 132490 kern.notice] NOTICE: qlt2,0 LINK UP, portid 20000, topology Fabric Pt-to-Pt,speed 8G Mar 5 01:56:26 newstorm fct: [ID 132490 kern.notice] NOTICE: qlt0,0 LINK UP, portid 20100, topology Fabric Pt-to-Pt,speed 8G Mar 5 02:00:13 newstorm last message repeated 1 time Mar 5 02:00:15 newstorm fct: [ID 132490 kern.notice] NOTICE: qlt3,0 LINK UP, portid 10000, topology Fabric Pt-to-Pt,speed 8G Mar 5 02:00:15 newstorm fct: [ID 132490 kern.notice] NOTICE: qlt2,0 LINK UP, portid 20000, topology Fabric Pt-to-Pt,speed 8G Mar 5 02:00:18 newstorm fct: [ID 132490 kern.notice] NOTICE: qlt1,0 LINK UP, portid 10100, topology Fabric Pt-to-Pt,speed 8G -------------- next part -------------- An HTML attachment was scrubbed... URL: From nsmith at careyweb.com Thu Mar 5 16:39:25 2015 From: nsmith at careyweb.com (Nate Smith) Date: Thu, 5 Mar 2015 11:39:25 -0500 Subject: [OmniOS-discuss] QLE2652 I/O Disconnect. Heat Sinks? In-Reply-To: References: <4593924c-380f-4705-871f-3105d937c5f7@careyweb.com> <0224e713f8ba49249c659888858f569b@EX1301.steait.net> <476bc8c0-6135-4104-b3ca-9bac6873f473@careyweb.com> <05240908cc744d0ea27321d0ea77b5e9@EX1301.steait.net> Message-ID: <2b73ea18-5f73-4c6e-ab1a-18f45e6f8329@careyweb.com> I posted something about this last fall and didn?t get a response. Here was the only similar error I found. Looks like it happens on OI too. 
http://openindiana.org/pipermail/openindiana-discuss/2012-May/008211.html From: OmniOS-discuss [mailto:omnios-discuss-bounces at lists.omniti.com] On Behalf Of Nate Smith Sent: Thursday, March 05, 2015 11:16 AM To: 'Rune Tipsmark'; omnios-discuss at lists.omniti.com Subject: Re: [OmniOS-discuss] QLE2652 I/O Disconnect. Heat Sinks? Dell R720. Had it happen with an intel system too. From: Rune Tipsmark [mailto:rt at steait.net] Sent: Thursday, March 05, 2015 11:14 AM To: 'Nate Smith'; omnios-discuss at lists.omniti.com Subject: RE: [OmniOS-discuss] QLE2652 I/O Disconnect. Heat Sinks? Haven?t tried iSCSI but had similar issues with Infiniband? more frequent due to higher io load, but no console error messages. This only happened on my SuperMicro server and never on my HP server? what brand are you running? Br, Rune From: Nate Smith [mailto:nsmith at careyweb.com] Sent: Thursday, March 05, 2015 8:10 AM To: Rune Tipsmark; omnios-discuss at lists.omniti.com Subject: RE: [OmniOS-discuss] QLE2652 I/O Disconnect. Heat Sinks? Do you see the same problem with Windows and iSCSI as an initiator? I wish there was a way to turn up debugging to figure this out. From: Rune Tipsmark [mailto:rt at steait.net] Sent: Thursday, March 05, 2015 11:08 AM To: 'Nate Smith'; omnios-discuss at lists.omniti.com Subject: RE: [OmniOS-discuss] QLE2652 I/O Disconnect. Heat Sinks? Same problem here? have noticed I can cause this easily by using Windows as initiator? I cannot cause this using VMware as initiator? No idea how to fix, but a big problem. Br, Rune From: OmniOS-discuss [mailto:omnios-discuss-bounces at lists.omniti.com] On Behalf Of Nate Smith Sent: Thursday, March 05, 2015 6:01 AM To: omnios-discuss at lists.omniti.com Subject: [OmniOS-discuss] QLE2652 I/O Disconnect. Heat Sinks? I?ve had this problem for a while, and I have no way to diagnose what is going on, but occasionally when system IO gets high (I?ve seen it happen especially on backups), I will lose connectivity with my Fibre Channel cards which serve up fibre channel LUNS to a VM cluster. All hell breaks loose, and then connectivity gets restored. I don?t get an error that it?s dropped, at least not on the Omnios system, but I get notice when it?s restored (which makes no sense). I?m wondering if the cards are just overheating, and if heat sinks with a fan would help on the io chip. Mar 5 01:55:01 newstorm fct: [ID 132490 kern.notice] NOTICE: qlt2,0 LINK UP, portid 20000, topology Fabric Pt-to-Pt,speed 8G Mar 5 01:56:26 newstorm fct: [ID 132490 kern.notice] NOTICE: qlt0,0 LINK UP, portid 20100, topology Fabric Pt-to-Pt,speed 8G Mar 5 02:00:13 newstorm last message repeated 1 time Mar 5 02:00:15 newstorm fct: [ID 132490 kern.notice] NOTICE: qlt3,0 LINK UP, portid 10000, topology Fabric Pt-to-Pt,speed 8G Mar 5 02:00:15 newstorm fct: [ID 132490 kern.notice] NOTICE: qlt2,0 LINK UP, portid 20000, topology Fabric Pt-to-Pt,speed 8G Mar 5 02:00:18 newstorm fct: [ID 132490 kern.notice] NOTICE: qlt1,0 LINK UP, portid 10100, topology Fabric Pt-to-Pt,speed 8G -------------- next part -------------- An HTML attachment was scrubbed... URL: From johan.kragsterman at capvert.se Thu Mar 5 16:59:42 2015 From: johan.kragsterman at capvert.se (Johan Kragsterman) Date: Thu, 5 Mar 2015 17:59:42 +0100 Subject: [OmniOS-discuss] Ang: Re: QLE2652 I/O Disconnect. Heat Sinks? 
In-Reply-To: <05240908cc744d0ea27321d0ea77b5e9@EX1301.steait.net> References: <05240908cc744d0ea27321d0ea77b5e9@EX1301.steait.net>, <4593924c-380f-4705-871f-3105d937c5f7@careyweb.com> <0224e713f8ba49249c659888858f569b@EX1301.steait.net> <476bc8c0-6135-4104-b3ca-9bac6873f473@careyweb.com> Message-ID: Hi! -----"OmniOS-discuss" skrev: ----- Till: "'Nate Smith'" , "omnios-discuss at lists.omniti.com" Fr?n: Rune Tipsmark S?nt av: "OmniOS-discuss" Datum: 2015-03-05 17:15 ?rende: Re: [OmniOS-discuss] QLE2652 I/O Disconnect. Heat Sinks? Haven’t tried iSCSI but had similar issues with Infiniband… more frequent due to higher io load, but no console error messages. ? This only happened on my SuperMicro server and never on my HP server… what brand are you running? ? This is interesting, only on Supermicro, and never on HP? I'd like to know some more details here... First, when you say "server", do you mean the SAN head? Not the hosts? Second: Can you specify the exakt model of the Supermicro and the HP? Third: Did you pay attention to bios settings on the two different servers? Like C-states, and other settings...how about IRQ settings? And how about the physical PCIe buses the HBA's are sitting on? This is often causing problems, if you don't know the layout of the PCIe-buses. Fourth: When you say you can cause it with windows as initiator, do you mean windows on hardware, and not windows as a VM? And when you say you can NOT cause it on VmWare, you mean you can run a windows VM on VmWare with direct LUN access without problems? And is this true for both hardwares, HP and Supermicro? Since it appears on one hardware and not another, it is difficult to blame any specific sofware, but we just had a discussion here about iScsi/comstar, where Garrret suspected comstar to handle certain things bad. I don't know wether that has anything to do with this. Rgrds Johan Br, Rune ? ? From: Nate Smith [mailto:nsmith at careyweb.com] Sent: Thursday, March 05, 2015 8:10 AM To: Rune Tipsmark; omnios-discuss at lists.omniti.com Subject: RE: [OmniOS-discuss] QLE2652 I/O Disconnect. Heat Sinks? ? Do you see the same problem with Windows and iSCSI as an initiator? I wish there was a way to turn up debugging to figure this out. ? From: Rune Tipsmark [mailto:rt at steait.net] Sent: Thursday, March 05, 2015 11:08 AM To: 'Nate Smith'; omnios-discuss at lists.omniti.com Subject: RE: [OmniOS-discuss] QLE2652 I/O Disconnect. Heat Sinks? ? Same problem here… have noticed I can cause this easily by using Windows as initiator… I cannot cause this using VMware as initiator… No idea how to fix, but a big problem. Br, Rune ? ? From: OmniOS-discuss [mailto:omnios-discuss-bounces at lists.omniti.com] On Behalf Of Nate Smith Sent: Thursday, March 05, 2015 6:01 AM To: omnios-discuss at lists.omniti.com Subject: [OmniOS-discuss] QLE2652 I/O Disconnect. Heat Sinks? ? I’ve had this problem for a while, and I have no way to diagnose what is going on, but occasionally when system IO gets high (I’ve seen it happen especially on backups), I will lose connectivity with my Fibre Channel cards which serve up fibre channel LUNS to a VM cluster. All hell breaks loose, and then connectivity gets restored. I don’t get an error that it’s dropped, at least not on the Omnios system, but I get notice when it’s restored (which makes no sense). I’m wondering if the cards are just overheating, and if heat sinks with a fan would help on the io chip. ? Mar? 
5 01:55:01 newstorm fct: [ID 132490 kern.notice] NOTICE: qlt2,0 LINK UP, portid 20000, topology Fabric Pt-to-Pt,speed 8G Mar? 5 01:56:26 newstorm fct: [ID 132490 kern.notice] NOTICE: qlt0,0 LINK UP, portid 20100, topology Fabric Pt-to-Pt,speed 8G Mar? 5 02:00:13 newstorm last message repeated 1 time Mar? 5 02:00:15 newstorm fct: [ID 132490 kern.notice] NOTICE: qlt3,0 LINK UP, portid 10000, topology Fabric Pt-to-Pt,speed 8G Mar? 5 02:00:15 newstorm fct: [ID 132490 kern.notice] NOTICE: qlt2,0 LINK UP, portid 20000, topology Fabric Pt-to-Pt,speed 8G Mar? 5 02:00:18 newstorm fct: [ID 132490 kern.notice] NOTICE: qlt1,0 LINK UP, portid 10100, topology Fabric Pt-to-Pt,speed 8G _______________________________________________ OmniOS-discuss mailing list OmniOS-discuss at lists.omniti.com http://lists.omniti.com/mailman/listinfo/omnios-discuss From nsmith at careyweb.com Thu Mar 5 17:06:06 2015 From: nsmith at careyweb.com (Nate Smith) Date: Thu, 5 Mar 2015 12:06:06 -0500 Subject: [OmniOS-discuss] QLE2652 I/O Disconnect. Heat Sinks? In-Reply-To: References: <05240908cc744d0ea27321d0ea77b5e9@EX1301.steait.net>, <4593924c-380f-4705-871f-3105d937c5f7@careyweb.com> <0224e713f8ba49249c659888858f569b@EX1301.steait.net> <476bc8c0-6135-4104-b3ca-9bac6873f473@careyweb.com> Message-ID: <48424678-f235-464c-8400-a87de6b2d161@careyweb.com> The way I have it set up, is that Hyper-V hypervisor picks up the comstar targets and mounts them as ntfs storage to host the HVDs for Cluster File System. In the cluster, I can have either hypervisor drop and the cluster stays up. I'm getting this behavior on 2008 R2 and 2012 R2 (I have both hypervisors connecting to different luns at the same time, so it's hard to say which is causing it to fail). As far as which PCI device I'm on, interrupts, etc, I could never find a rhyme or reason to it, but I didn't do an exacting test. It's hard to reproduce the problem to test for it. I know my HBAs were always on separate PCI busses and running at 8x on both systems I used. 0Nate -----Original Message----- From: Johan Kragsterman [mailto:johan.kragsterman at capvert.se] Sent: Thursday, March 05, 2015 12:00 PM To: Rune Tipsmark Cc: 'Nate Smith'; omnios-discuss at lists.omniti.com Subject: Ang: Re: [OmniOS-discuss] QLE2652 I/O Disconnect. Heat Sinks? Hi! -----"OmniOS-discuss" skrev: ----- Till: "'Nate Smith'" , "omnios-discuss at lists.omniti.com" Fr?n: Rune Tipsmark S?nt av: "OmniOS-discuss" Datum: 2015-03-05 17:15 ?rende: Re: [OmniOS-discuss] QLE2652 I/O Disconnect. Heat Sinks? Haven’t tried iSCSI but had similar issues with Infiniband… more frequent due to higher io load, but no console error messages. ? This only happened on my SuperMicro server and never on my HP server… what brand are you running? ? This is interesting, only on Supermicro, and never on HP? I'd like to know some more details here... First, when you say "server", do you mean the SAN head? Not the hosts? Second: Can you specify the exakt model of the Supermicro and the HP? Third: Did you pay attention to bios settings on the two different servers? Like C-states, and other settings...how about IRQ settings? And how about the physical PCIe buses the HBA's are sitting on? This is often causing problems, if you don't know the layout of the PCIe-buses. Fourth: When you say you can cause it with windows as initiator, do you mean windows on hardware, and not windows as a VM? And when you say you can NOT cause it on VmWare, you mean you can run a windows VM on VmWare with direct LUN access without problems? 
And is this true for both hardwares, HP and Supermicro? Since it appears on one hardware and not another, it is difficult to blame any specific sofware, but we just had a discussion here about iScsi/comstar, where Garrret suspected comstar to handle certain things bad. I don't know wether that has anything to do with this. Rgrds Johan Br, Rune ? ? From: Nate Smith [mailto:nsmith at careyweb.com] Sent: Thursday, March 05, 2015 8:10 AM To: Rune Tipsmark; omnios-discuss at lists.omniti.com Subject: RE: [OmniOS-discuss] QLE2652 I/O Disconnect. Heat Sinks? ? Do you see the same problem with Windows and iSCSI as an initiator? I wish there was a way to turn up debugging to figure this out. ? From: Rune Tipsmark [mailto:rt at steait.net] Sent: Thursday, March 05, 2015 11:08 AM To: 'Nate Smith'; omnios-discuss at lists.omniti.com Subject: RE: [OmniOS-discuss] QLE2652 I/O Disconnect. Heat Sinks? ? Same problem here… have noticed I can cause this easily by using Windows as initiator… I cannot cause this using VMware as initiator… No idea how to fix, but a big problem. Br, Rune ? ? From: OmniOS-discuss [mailto:omnios-discuss-bounces at lists.omniti.com] On Behalf Of Nate Smith Sent: Thursday, March 05, 2015 6:01 AM To: omnios-discuss at lists.omniti.com Subject: [OmniOS-discuss] QLE2652 I/O Disconnect. Heat Sinks? ? I’ve had this problem for a while, and I have no way to diagnose what is going on, but occasionally when system IO gets high (I’ve seen it happen especially on backups), I will lose connectivity with my Fibre Channel cards which serve up fibre channel LUNS to a VM cluster. All hell breaks loose, and then connectivity gets restored. I don’t get an error that it’s dropped, at least not on the Omnios system, but I get notice when it’s restored (which makes no sense). I’m wondering if the cards are just overheating, and if heat sinks with a fan would help on the io chip. ? Mar? 5 01:55:01 newstorm fct: [ID 132490 kern.notice] NOTICE: qlt2,0 LINK UP, portid 20000, topology Fabric Pt-to-Pt,speed 8G Mar? 5 01:56:26 newstorm fct: [ID 132490 kern.notice] NOTICE: qlt0,0 LINK UP, portid 20100, topology Fabric Pt-to-Pt,speed 8G Mar? 5 02:00:13 newstorm last message repeated 1 time Mar? 5 02:00:15 newstorm fct: [ID 132490 kern.notice] NOTICE: qlt3,0 LINK UP, portid 10000, topology Fabric Pt-to-Pt,speed 8G Mar? 5 02:00:15 newstorm fct: [ID 132490 kern.notice] NOTICE: qlt2,0 LINK UP, portid 20000, topology Fabric Pt-to-Pt,speed 8G Mar? 5 02:00:18 newstorm fct: [ID 132490 kern.notice] NOTICE: qlt1,0 LINK UP, portid 10100, topology Fabric Pt-to-Pt,speed 8G _______________________________________________ OmniOS-discuss mailing list OmniOS-discuss at lists.omniti.com http://lists.omniti.com/mailman/listinfo/omnios-discuss From doug at will.to Thu Mar 5 18:31:32 2015 From: doug at will.to (Doug Hughes) Date: Thu, 5 Mar 2015 13:31:32 -0500 Subject: [OmniOS-discuss] problems with 10g interfaces dropping off for a time and then coming back Message-ID: I'm having an issue with r*12 with 10g Solarflare interfaces setup in an aggregate simultaneously dropping for a while for no apparent reason and then coming back. Oddly, I can see them leaving the port channel and dropping on the switch side, but there's no log messages or anything on the client side. They are 5162 cards, for what it's worth. Has anybody else seen anything like this? Any idea why the host ports don't seem to log any messages to the effect? I can see side affects of this on the host. It only happens during moderate to heavy load. 
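While one of these episodes is in progress, the aggregation state and per-port counters can also be watched from the host. A rough sketch; aggr0 stands in for the real aggregation name, and nothing here is specific to the Solarflare driver:

    # LACP and per-port state as the host sees it (look for ports flapping
    # between attached and standby while the switch reports them leaving).
    dladm show-aggr -x
    dladm show-aggr -L

    # Per-link counters once a second; a port that has silently stopped
    # passing traffic shows up here even if nothing reaches the log.
    dlstat -i 1 aggr0

Whether the counters freeze on one port or on the whole aggregation at least narrows down which side gave up first.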
Interrupt balancing looks ok (intrstat), and I watch vmstat, and then all of a sudden the cs, interrupts and other markers drop preciptously (probably as a result of a complete drop of network traffic), and it will stay that way for a couple of minutes and then recover on its own. Sometimes it is up to 30 minutes and then it just recovers, equally as mysteriously. I can sometimes fix it by toggling the interface on the switch. I have other hosts with the same hardware and driver but running Solaris 10 that don't exhibit this. -------------- next part -------------- An HTML attachment was scrubbed... URL: From rt at steait.net Thu Mar 5 18:38:24 2015 From: rt at steait.net (Rune Tipsmark) Date: Thu, 5 Mar 2015 18:38:24 +0000 Subject: [OmniOS-discuss] QLE2652 I/O Disconnect. Heat Sinks? In-Reply-To: References: <05240908cc744d0ea27321d0ea77b5e9@EX1301.steait.net>, <4593924c-380f-4705-871f-3105d937c5f7@careyweb.com> <0224e713f8ba49249c659888858f569b@EX1301.steait.net> <476bc8c0-6135-4104-b3ca-9bac6873f473@careyweb.com> Message-ID: <22d96a9197cb4d7aa9c46c00b2f96337@EX1301.steait.net> Pls see below >> -----Original Message----- From: Johan Kragsterman [mailto:johan.kragsterman at capvert.se] Sent: Thursday, March 05, 2015 9:00 AM To: Rune Tipsmark Cc: 'Nate Smith'; omnios-discuss at lists.omniti.com Subject: Ang: Re: [OmniOS-discuss] QLE2652 I/O Disconnect. Heat Sinks? Hi! -----"OmniOS-discuss" skrev: ----- Till: "'Nate Smith'" , "omnios-discuss at lists.omniti.com" Fr?n: Rune Tipsmark S?nt av: "OmniOS-discuss" Datum: 2015-03-05 17:15 ?rende: Re: [OmniOS-discuss] QLE2652 I/O Disconnect. Heat Sinks? Haven’t tried iSCSI but had similar issues with Infiniband… more frequent due to higher io load, but no console error messages. ? This only happened on my SuperMicro server and never on my HP server… what brand are you running? This is interesting, only on Supermicro, and never on HP? I'd like to know some more details here... First, when you say "server", do you mean the SAN head? Not the hosts? >> SAN Head yes Second: Can you specify the exakt model of the Supermicro and the HP? >>SuperMicro X9DRE-TF+ and the HP is actually an HP P4500 SAN I removed the shitty HP software and controller from and replaced with an LSI 9207 and installed OmniOS on. I have tested on other HP and SM servers too, all exhibit the same behavior (3 SM and 2 HP tested) Third: Did you pay attention to bios settings on the two different servers? Like C-states, and other settings...how about IRQ settings? And how about the physical PCIe buses the HBA's are sitting on? This is often causing problems, if you don't know the layout of the PCIe-buses. >> both are set to max performance in BIOS and afaik all C-states are disabled. I am 100% sure on HP and 99.99% sure on SM. I didn't touch the IRQ settings but I actually have two 8Gbit FC Cards in the SM server and both exhibit the problem. I have tried to swap things around too with no luck. I do use every available PCI-E slot though.. L2ARC, SLOG etc. Fourth: When you say you can cause it with windows as initiator, do you mean windows on hardware, and not windows as a VM? And when you say you can NOT cause it on VmWare, you mean you can run a windows VM on VmWare with direct LUN access without problems? And is this true for both hardwares, HP and Supermicro? >>Windows on hardware yes, all I have to do is zone a block device over to Windows and copy say 100 4GB ISO Files onto it or a 200GB backup file... no doubt it will cause this issue... 
when I say I cannot cause this on VMware then I mean any VM hosted on my ESXi host (5.1 or 5.5) and a guest operating system without raw device mapping - I have not tested if I can cause this using RDM. It is also true for both HP and SM - both behave just fine using VMware and FibreChannel - however VMware can cause issues with Infiniband on the SM but I think that's a different issue and has to do with Mellanox and their terrible drivers that are never ever updated and half-beta etc. Since it appears on one hardware and not another, it is difficult to blame any specific sofware, but we just had a discussion here about iScsi/comstar, where Garrret suspected comstar to handle certain things bad. I don't know wether that has anything to do with this. >> I think it could be a mx of both, would be interesting to see if something in Comstar could be fixed,... Bacically: ESX+FC+SM = problem ESX+FC+HP = no problem Win+FC+SM = problem Win+FC+HP = not tested ESX+IB+SM = problem ESX+IB+HP = no problem Win+IB+SM = not tested, SRP not supported in Win Win+IB+HP = not tested, SRP not supported in Win Anyway it all lead me to some information on 8Gbit FC - in particular portCfgFillWord Maybe this can affect some of this... google will reveal a great bit of info and also found some links.. http://virtualkenneth.com/2012/05/21/random-esxi-lun-loss-in-a-8gb-fc-brocade-infrastructure/ What about tuning the emlxs.conf? can anything be done there to get better performance? Br, Rune Rgrds Johan Br, Rune ? ? From: Nate Smith [mailto:nsmith at careyweb.com] Sent: Thursday, March 05, 2015 8:10 AM To: Rune Tipsmark; omnios-discuss at lists.omniti.com Subject: RE: [OmniOS-discuss] QLE2652 I/O Disconnect. Heat Sinks? ? Do you see the same problem with Windows and iSCSI as an initiator? I wish there was a way to turn up debugging to figure this out. ? From: Rune Tipsmark [mailto:rt at steait.net] Sent: Thursday, March 05, 2015 11:08 AM To: 'Nate Smith'; omnios-discuss at lists.omniti.com Subject: RE: [OmniOS-discuss] QLE2652 I/O Disconnect. Heat Sinks? ? Same problem here… have noticed I can cause this easily by using Windows as initiator… I cannot cause this using VMware as initiator… No idea how to fix, but a big problem. Br, Rune ? ? From: OmniOS-discuss [mailto:omnios-discuss-bounces at lists.omniti.com] On Behalf Of Nate Smith Sent: Thursday, March 05, 2015 6:01 AM To: omnios-discuss at lists.omniti.com Subject: [OmniOS-discuss] QLE2652 I/O Disconnect. Heat Sinks? ? I’ve had this problem for a while, and I have no way to diagnose what is going on, but occasionally when system IO gets high (I’ve seen it happen especially on backups), I will lose connectivity with my Fibre Channel cards which serve up fibre channel LUNS to a VM cluster. All hell breaks loose, and then connectivity gets restored. I don’t get an error that it’s dropped, at least not on the Omnios system, but I get notice when it’s restored (which makes no sense). I’m wondering if the cards are just overheating, and if heat sinks with a fan would help on the io chip. ? Mar? 5 01:55:01 newstorm fct: [ID 132490 kern.notice] NOTICE: qlt2,0 LINK UP, portid 20000, topology Fabric Pt-to-Pt,speed 8G Mar? 5 01:56:26 newstorm fct: [ID 132490 kern.notice] NOTICE: qlt0,0 LINK UP, portid 20100, topology Fabric Pt-to-Pt,speed 8G Mar? 5 02:00:13 newstorm last message repeated 1 time Mar? 5 02:00:15 newstorm fct: [ID 132490 kern.notice] NOTICE: qlt3,0 LINK UP, portid 10000, topology Fabric Pt-to-Pt,speed 8G Mar? 
5 02:00:15 newstorm fct: [ID 132490 kern.notice] NOTICE: qlt2,0 LINK UP, portid 20000, topology Fabric Pt-to-Pt,speed 8G Mar? 5 02:00:18 newstorm fct: [ID 132490 kern.notice] NOTICE: qlt1,0 LINK UP, portid 10100, topology Fabric Pt-to-Pt,speed 8G _______________________________________________ OmniOS-discuss mailing list OmniOS-discuss at lists.omniti.com http://lists.omniti.com/mailman/listinfo/omnios-discuss From johan.kragsterman at capvert.se Thu Mar 5 19:06:52 2015 From: johan.kragsterman at capvert.se (Johan Kragsterman) Date: Thu, 5 Mar 2015 20:06:52 +0100 Subject: [OmniOS-discuss] Ang: RE: Re: QLE2652 I/O Disconnect. Heat Sinks? In-Reply-To: <22d96a9197cb4d7aa9c46c00b2f96337@EX1301.steait.net> References: <22d96a9197cb4d7aa9c46c00b2f96337@EX1301.steait.net>, <05240908cc744d0ea27321d0ea77b5e9@EX1301.steait.net>, <4593924c-380f-4705-871f-3105d937c5f7@careyweb.com> <0224e713f8ba49249c659888858f569b@EX1301.steait.net> <476bc8c0-6135-4104-b3ca-9bac6873f473@careyweb.com> Message-ID: Hi! -----Rune Tipsmark skrev: ----- Till: 'Johan Kragsterman' Fr?n: Rune Tipsmark Datum: 2015-03-05 19:38 Kopia: 'Nate Smith' , "omnios-discuss at lists.omniti.com" ?rende: RE: Re: [OmniOS-discuss] QLE2652 I/O Disconnect. Heat Sinks? Pls see below >> : [OmniOS-discuss] QLE2652 I/O Disconnect. Heat Sinks? Haven’t tried iSCSI but had similar issues with Infiniband… more frequent due to higher io load, but no console error messages. ? This only happened on my SuperMicro server and never on my HP server… what brand are you running? This is interesting, only on Supermicro, and never on HP? I'd like to know some more details here... First, when you say "server", do you mean the SAN head? Not the hosts? >> SAN Head yes Second: Can you specify the exakt model of the Supermicro and the HP? >>SuperMicro X9DRE-TF+ and the HP is actually an HP P4500 SAN I removed the shitty HP software and controller from and replaced with ?an LSI 9207 and installed OmniOS on. I have tested on other HP and SM servers too, all exhibit the same behavior (3 SM and 2 HP tested) Third: Did you pay attention to bios settings on the two different servers? Like C-states, and other settings...how about IRQ settings? And how about the physical PCIe buses the HBA's are sitting on? This is often causing problems, if you don't know the layout of the PCIe-buses. >> both are set to max performance in BIOS and afaik all C-states are disabled. I am 100% sure on HP and 99.99% sure on SM. I didn't touch the IRQ settings but I actually have two 8Gbit FC Cards in the SM server and both exhibit the problem. I have tried to swap things around too with no luck. I do use every available PCI-E slot though.. L2ARC, SLOG etc. Fourth: When you say you can cause it with windows as initiator, do you mean windows on hardware, and not windows as a VM? And when you say you can NOT cause it on VmWare, you mean you can run a windows VM on VmWare with direct LUN access without problems? And is this true for both hardwares, HP and Supermicro? >>Windows on hardware yes, all I have to do is zone a block device over to Windows and copy say 100 4GB ISO Files onto it or a 200GB backup file... no doubt it will cause this issue... when I say I cannot cause this on VMware then I mean any VM hosted on my ESXi host (5.1 or 5.5) and a guest operating system without raw device mapping - I have not tested if I can cause this using RDM. 
It is also true for both HP and SM - both behave just fine using VMware and FibreChannel - however VMware can cause issues with Infiniband on the SM but I think that's a different issue and has to do with Mellanox and their terrible drivers that are never ever updated and half-beta etc. Since it appears on one hardware and not another, it is difficult to blame any specific sofware, but we just had a discussion here about iScsi/comstar, where Garrret suspected comstar to handle certain things bad. I don't know wether that has anything to do with this. >> I think it could be a mx of both, would be interesting to see if something in Comstar could be fixed,... Bacically: ESX+FC+SM = problem ESX+FC+HP = no problem Win+FC+SM = problem Win+FC+HP = not tested ESX+IB+SM = problem ESX+IB+HP = no problem Win+IB+SM = not tested, SRP not supported in Win Win+IB+HP = not tested, SRP not supported in Win Anyway it all lead me to some information on 8Gbit FC - in particular portCfgFillWord Maybe this can affect some of this... google will reveal a great bit of info and also found some links.. http://virtualkenneth.com/2012/05/21/random-esxi-lun-loss-in-a-8gb-fc-brocade-infrastructure/ What about tuning the emlxs.conf? can anything be done there to get better performance? Are you using Emulex HBA's? That would explain things....I have never used Emulex in production. Tried some times in lab env, but always turned out to behave strangly... Br, Rune Rgrds Johan Br, Rune ? ? From: Nate Smith [mailto:nsmith at careyweb.com] Sent: Thursday, March 05, 2015 8:10 AM To: Rune Tipsmark; omnios-discuss at lists.omniti.com Subject: RE: [OmniOS-discuss] QLE2652 I/O Disconnect. Heat Sinks? ? Do you see the same problem with Windows and iSCSI as an initiator? I wish there was a way to turn up debugging to figure this out. ? From: Rune Tipsmark [mailto:rt at steait.net] Sent: Thursday, March 05, 2015 11:08 AM To: 'Nate Smith'; omnios-discuss at lists.omniti.com Subject: RE: [OmniOS-discuss] QLE2652 I/O Disconnect. Heat Sinks? ? Same problem here… have noticed I can cause this easily by using Windows as initiator… I cannot cause this using VMware as initiator… No idea how to fix, but a big problem. Br, Rune ? ? From: OmniOS-discuss [mailto:omnios-discuss-bounces at lists.omniti.com] On Behalf Of Nate Smith Sent: Thursday, March 05, 2015 6:01 AM To: omnios-discuss at lists.omniti.com Subject: [OmniOS-discuss] QLE2652 I/O Disconnect. Heat Sinks? ? I’ve had this problem for a while, and I have no way to diagnose what is going on, but occasionally when system IO gets high (I’ve seen it happen especially on backups), I will lose connectivity with my Fibre Channel cards which serve up fibre channel LUNS to a VM cluster. All hell breaks loose, and then connectivity gets restored. I don’t get an error that it’s dropped, at least not on the Omnios system, but I get notice when it’s restored (which makes no sense). I’m wondering if the cards are just overheating, and if heat sinks with a fan would help on the io chip. ? Mar? 5 01:55:01 newstorm fct: [ID 132490 kern.notice] NOTICE: qlt2,0 LINK UP, portid 20000, topology Fabric Pt-to-Pt,speed 8G Mar? 5 01:56:26 newstorm fct: [ID 132490 kern.notice] NOTICE: qlt0,0 LINK UP, portid 20100, topology Fabric Pt-to-Pt,speed 8G Mar? 5 02:00:13 newstorm last message repeated 1 time Mar? 5 02:00:15 newstorm fct: [ID 132490 kern.notice] NOTICE: qlt3,0 LINK UP, portid 10000, topology Fabric Pt-to-Pt,speed 8G Mar? 
5 02:00:15 newstorm fct: [ID 132490 kern.notice] NOTICE: qlt2,0 LINK UP, portid 20000, topology Fabric Pt-to-Pt,speed 8G Mar? 5 02:00:18 newstorm fct: [ID 132490 kern.notice] NOTICE: qlt1,0 LINK UP, portid 10100, topology Fabric Pt-to-Pt,speed 8G _______________________________________________ OmniOS-discuss mailing list OmniOS-discuss at lists.omniti.com http://lists.omniti.com/mailman/listinfo/omnios-discuss From rt at steait.net Thu Mar 5 19:44:35 2015 From: rt at steait.net (Rune Tipsmark) Date: Thu, 5 Mar 2015 19:44:35 +0000 Subject: [OmniOS-discuss] QLE2652 I/O Disconnect. Heat Sinks? In-Reply-To: References: <22d96a9197cb4d7aa9c46c00b2f96337@EX1301.steait.net>, <05240908cc744d0ea27321d0ea77b5e9@EX1301.steait.net>, <4593924c-380f-4705-871f-3105d937c5f7@careyweb.com> <0224e713f8ba49249c659888858f569b@EX1301.steait.net> <476bc8c0-6135-4104-b3ca-9bac6873f473@careyweb.com> Message-ID: They are qLogic qmh2562 across the board... just figured the emlxs.conf had something to say since I had to edit it to get comstar into target mode. Br, Rune -----Original Message----- From: Johan Kragsterman [mailto:johan.kragsterman at capvert.se] Sent: Thursday, March 05, 2015 11:07 AM To: Rune Tipsmark Cc: 'Nate Smith'; omnios-discuss at lists.omniti.com Subject: Ang: RE: Re: [OmniOS-discuss] QLE2652 I/O Disconnect. Heat Sinks? Hi! -----Rune Tipsmark skrev: ----- Till: 'Johan Kragsterman' Fr?n: Rune Tipsmark Datum: 2015-03-05 19:38 Kopia: 'Nate Smith' , "omnios-discuss at lists.omniti.com" ?rende: RE: Re: [OmniOS-discuss] QLE2652 I/O Disconnect. Heat Sinks? Pls see below >> : [OmniOS-discuss] QLE2652 I/O Disconnect. Heat Sinks? Haven’t tried iSCSI but had similar issues with Infiniband… more frequent due to higher io load, but no console error messages. ? This only happened on my SuperMicro server and never on my HP server… what brand are you running? This is interesting, only on Supermicro, and never on HP? I'd like to know some more details here... First, when you say "server", do you mean the SAN head? Not the hosts? >> SAN Head yes Second: Can you specify the exakt model of the Supermicro and the HP? >>SuperMicro X9DRE-TF+ and the HP is actually an HP P4500 SAN I removed >>the shitty HP software and controller from and replaced with ?an LSI >>9207 and installed OmniOS on. I have tested on other HP and SM servers >>too, all exhibit the same behavior (3 SM and 2 HP tested) Third: Did you pay attention to bios settings on the two different servers? Like C-states, and other settings...how about IRQ settings? And how about the physical PCIe buses the HBA's are sitting on? This is often causing problems, if you don't know the layout of the PCIe-buses. >> both are set to max performance in BIOS and afaik all C-states are disabled. I am 100% sure on HP and 99.99% sure on SM. I didn't touch the IRQ settings but I actually have two 8Gbit FC Cards in the SM server and both exhibit the problem. I have tried to swap things around too with no luck. I do use every available PCI-E slot though.. L2ARC, SLOG etc. Fourth: When you say you can cause it with windows as initiator, do you mean windows on hardware, and not windows as a VM? And when you say you can NOT cause it on VmWare, you mean you can run a windows VM on VmWare with direct LUN access without problems? And is this true for both hardwares, HP and Supermicro? >>Windows on hardware yes, all I have to do is zone a block device over to Windows and copy say 100 4GB ISO Files onto it or a 200GB backup file... no doubt it will cause this issue... 
when I say I cannot cause this on VMware then I mean any VM hosted on my ESXi host (5.1 or 5.5) and a guest operating system without raw device mapping - I have not tested if I can cause this using RDM. It is also true for both HP and SM - both behave just fine using VMware and FibreChannel - however VMware can cause issues with Infiniband on the SM but I think that's a different issue and has to do with Mellanox and their terrible drivers that are never ever updated and half-beta etc. Since it appears on one hardware and not another, it is difficult to blame any specific sofware, but we just had a discussion here about iScsi/comstar, where Garrret suspected comstar to handle certain things bad. I don't know wether that has anything to do with this. >> I think it could be a mx of both, would be interesting to see if something in Comstar could be fixed,... Bacically: ESX+FC+SM = problem ESX+FC+HP = no problem Win+FC+SM = problem Win+FC+HP = not tested ESX+IB+SM = problem ESX+IB+HP = no problem Win+IB+SM = not tested, SRP not supported in Win HP = not tested, SRP Win+IB+not supported in Win Anyway it all lead me to some information on 8Gbit FC - in particular portCfgFillWord Maybe this can affect some of this... google will reveal a great bit of info and also found some links.. http://virtualkenneth.com/2012/05/21/random-esxi-lun-loss-in-a-8gb-fc-brocade-infrastructure/ What about tuning the emlxs.conf? can anything be done there to get better performance? Are you using Emulex HBA's? That would explain things....I have never used Emulex in production. Tried some times in lab env, but always turned out to behave strangly... Br, Rune Rgrds Johan Br, Rune ? ? From: Nate Smith [mailto:nsmith at careyweb.com] Sent: Thursday, March 05, 2015 8:10 AM To: Rune Tipsmark; omnios-discuss at lists.omniti.com Subject: RE: [OmniOS-discuss] QLE2652 I/O Disconnect. Heat Sinks? ? Do you see the same problem with Windows and iSCSI as an initiator? I wish there was a way to turn up debugging to figure this out. ? From: Rune Tipsmark [mailto:rt at steait.net] Sent: Thursday, March 05, 2015 11:08 AM To: 'Nate Smith'; omnios-discuss at lists.omniti.com Subject: RE: [OmniOS-discuss] QLE2652 I/O Disconnect. Heat Sinks? ? Same problem here… have noticed I can cause this easily by using Windows as initiator… I cannot cause this using VMware as initiator… No idea how to fix, but a big problem. Br, Rune ? ? From: OmniOS-discuss [mailto:omnios-discuss-bounces at lists.omniti.com] On Behalf Of Nate Smith Sent: Thursday, March 05, 2015 6:01 AM To: omnios-discuss at lists.omniti.com Subject: [OmniOS-discuss] QLE2652 I/O Disconnect. Heat Sinks? ? I’ve had this problem for a while, and I have no way to diagnose what is going on, but occasionally when system IO gets high (I’ve seen it happen especially on backups), I will lose connectivity with my Fibre Channel cards which serve up fibre channel LUNS to a VM cluster. All hell breaks loose, and then connectivity gets restored. I don’t get an error that it’s dropped, at least not on the Omnios system, but I get notice when it’s restored (which makes no sense). I’m wondering if the cards are just overheating, and if heat sinks with a fan would help on the io chip. ? Mar? 5 01:55:01 newstorm fct: [ID 132490 kern.notice] NOTICE: qlt2,0 LINK UP, portid 20000, topology Fabric Pt-to-Pt,speed 8G Mar? 5 01:56:26 newstorm fct: [ID 132490 kern.notice] NOTICE: qlt0,0 LINK UP, portid 20100, topology Fabric Pt-to-Pt,speed 8G Mar? 5 02:00:13 newstorm last message repeated 1 time Mar? 
5 02:00:15 newstorm fct: [ID 132490 kern.notice] NOTICE: qlt3,0 LINK UP, portid 10000, topology Fabric Pt-to-Pt,speed 8G Mar? 5 02:00:15 newstorm fct: [ID 132490 kern.notice] NOTICE: qlt2,0 LINK UP, portid 20000, topology Fabric Pt-to-Pt,speed 8G Mar? 5 02:00:18 newstorm fct: [ID 132490 kern.notice] NOTICE: qlt1,0 LINK UP, portid 10100, topology Fabric Pt-to-Pt,speed 8G _______________________________________________ OmniOS-discuss mailing list OmniOS-discuss at lists.omniti.com http://lists.omniti.com/mailman/listinfo/omnios-discuss From johan.kragsterman at capvert.se Thu Mar 5 20:12:06 2015 From: johan.kragsterman at capvert.se (Johan Kragsterman) Date: Thu, 5 Mar 2015 21:12:06 +0100 Subject: [OmniOS-discuss] Ang: RE: RE: Re: QLE2652 I/O Disconnect. Heat Sinks? In-Reply-To: References: , <22d96a9197cb4d7aa9c46c00b2f96337@EX1301.steait.net>, <05240908cc744d0ea27321d0ea77b5e9@EX1301.steait.net>, <4593924c-380f-4705-871f-3105d937c5f7@careyweb.com> <0224e713f8ba49249c659888858f569b@EX1301.steait.net> <476bc8c0-6135-4104-b3ca-9bac6873f473@careyweb.com> Message-ID: -----Rune Tipsmark skrev: ----- Till: 'Johan Kragsterman' Fr?n: Rune Tipsmark Datum: 2015-03-05 20:44 Kopia: 'Nate Smith' , "omnios-discuss at lists.omniti.com" ?rende: RE: RE: Re: [OmniOS-discuss] QLE2652 I/O Disconnect. Heat Sinks? They are qLogic qmh2562 across the board... just figured the emlxs.conf had something to say since I had to edit it to get comstar into target mode. Br, Rune COMSTAR is target only, so you don't get COMSTAR into target mode, you get the HBA into target mode with a target driver, to give COMSTAR an interface to work with. If you are using qmh2562, you need the qlt driver, which I suppose you already use. emlx is the driver for Emulex HBA's, and is of no use when you're using qlogic HBA's. Rgrds Johan -----Original Message----- From: Johan Kragsterman [mailto:johan.kragsterman at capvert.se] Sent: Thursday, March 05, 2015 11:07 AM To: Rune Tipsmark Cc: 'Nate Smith'; omnios-discuss at lists.omniti.com Subject: Ang: RE: Re: [OmniOS-discuss] QLE2652 I/O Disconnect. Heat Sinks? Hi! -----Rune Tipsmark skrev: ----- Till: 'Johan Kragsterman' Fr?n: Rune Tipsmark Datum: 2015-03-05 19:38 Kopia: 'Nate Smith' , "omnios-discuss at lists.omniti.com" ?rende: RE: Re: [OmniOS-discuss] QLE2652 I/O Disconnect. Heat Sinks? Pls see below >> : [OmniOS-discuss] QLE2652 I/O Disconnect. Heat Sinks? Haven’t tried iSCSI but had similar issues with Infiniband… more frequent due to higher io load, but no console error messages. ? This only happened on my SuperMicro server and never on my HP server… what brand are you running? This is interesting, only on Supermicro, and never on HP? I'd like to know some more details here... First, when you say "server", do you mean the SAN head? Not the hosts? >> SAN Head yes Second: Can you specify the exakt model of the Supermicro and the HP? >>SuperMicro X9DRE-TF+ and the HP is actually an HP P4500 SAN I removed >>the shitty HP software and controller from and replaced with ?an LSI >>9207 and installed OmniOS on. I have tested on other HP and SM servers >>too, all exhibit the same behavior (3 SM and 2 HP tested) Third: Did you pay attention to bios settings on the two different servers? Like C-states, and other settings...how about IRQ settings? And how about the physical PCIe buses the HBA's are sitting on? This is often causing problems, if you don't know the layout of the PCIe-buses. >> both are set to max performance in BIOS and afaik all C-states are disabled. 
I am 100% sure on HP and 99.99% sure on SM. I didn't touch the IRQ settings but I actually have two 8Gbit FC Cards in the SM server and both exhibit the problem. I have tried to swap things around too with no luck. I do use every available PCI-E slot though.. L2ARC, SLOG etc. Fourth: When you say you can cause it with windows as initiator, do you mean windows on hardware, and not windows as a VM? And when you say you can NOT cause it on VmWare, you mean you can run a windows VM on VmWare with direct LUN access without problems? And is this true for both hardwares, HP and Supermicro? >>Windows on hardware yes, all I have to do is zone a block device over to Windows and copy say 100 4GB ISO Files onto it or a 200GB backup file... no doubt it will cause this issue... when I say I cannot cause this on VMware then I mean any VM hosted on my ESXi host (5.1 or 5.5) and a guest operating system without raw device mapping - I have not tested if I can cause this using RDM. It is also true for both HP and SM - both behave just fine using VMware and FibreChannel - however VMware can cause issues with Infiniband on the SM but I think that's a different issue and has to do with Mellanox and their terrible drivers that are never ever updated and half-beta etc. Since it appears on one hardware and not another, it is difficult to blame any specific sofware, but we just had a discussion here about iScsi/comstar, where Garrret suspected comstar to handle certain things bad. I don't know wether that has anything to do with this. >> I think it could be a mx of both, would be interesting to see if something in Comstar could be fixed,... Bacically: ESX+FC+SM = problem ESX+FC+HP = no problem Win+FC+SM = problem Win+FC+HP = not tested ESX+IB+SM = problem ESX+IB+HP = no problem Win+IB+SM = not tested, SRP not supported in Win HP = not tested, SRP Win+IB+not supported in Win Anyway it all lead me to some information on 8Gbit FC - in particular portCfgFillWord Maybe this can affect some of this... google will reveal a great bit of info and also found some links.. http://virtualkenneth.com/2012/05/21/random-esxi-lun-loss-in-a-8gb-fc-brocade-infrastructure/ What about tuning the emlxs.conf? can anything be done there to get better performance? Are you using Emulex HBA's? That would explain things....I have never used Emulex in production. Tried some times in lab env, but always turned out to behave strangly... Br, Rune Rgrds Johan Br, Rune ? ? From: Nate Smith [mailto:nsmith at careyweb.com] Sent: Thursday, March 05, 2015 8:10 AM To: Rune Tipsmark; omnios-discuss at lists.omniti.com Subject: RE: [OmniOS-discuss] QLE2652 I/O Disconnect. Heat Sinks? ? Do you see the same problem with Windows and iSCSI as an initiator? I wish there was a way to turn up debugging to figure this out. ? From: Rune Tipsmark [mailto:rt at steait.net] Sent: Thursday, March 05, 2015 11:08 AM To: 'Nate Smith'; omnios-discuss at lists.omniti.com Subject: RE: [OmniOS-discuss] QLE2652 I/O Disconnect. Heat Sinks? ? Same problem here… have noticed I can cause this easily by using Windows as initiator… I cannot cause this using VMware as initiator… No idea how to fix, but a big problem. Br, Rune ? ? From: OmniOS-discuss [mailto:omnios-discuss-bounces at lists.omniti.com] On Behalf Of Nate Smith Sent: Thursday, March 05, 2015 6:01 AM To: omnios-discuss at lists.omniti.com Subject: [OmniOS-discuss] QLE2652 I/O Disconnect. Heat Sinks? ? 
I’ve had this problem for a while, and I have no way to diagnose what is going on, but occasionally when system IO gets high (I’ve seen it happen especially on backups), I will lose connectivity with my Fibre Channel cards which serve up fibre channel LUNS to a VM cluster. All hell breaks loose, and then connectivity gets restored. I don’t get an error that it’s dropped, at least not on the Omnios system, but I get notice when it’s restored (which makes no sense). I’m wondering if the cards are just overheating, and if heat sinks with a fan would help on the io chip. ? Mar? 5 01:55:01 newstorm fct: [ID 132490 kern.notice] NOTICE: qlt2,0 LINK UP, portid 20000, topology Fabric Pt-to-Pt,speed 8G Mar? 5 01:56:26 newstorm fct: [ID 132490 kern.notice] NOTICE: qlt0,0 LINK UP, portid 20100, topology Fabric Pt-to-Pt,speed 8G Mar? 5 02:00:13 newstorm last message repeated 1 time Mar? 5 02:00:15 newstorm fct: [ID 132490 kern.notice] NOTICE: qlt3,0 LINK UP, portid 10000, topology Fabric Pt-to-Pt,speed 8G Mar? 5 02:00:15 newstorm fct: [ID 132490 kern.notice] NOTICE: qlt2,0 LINK UP, portid 20000, topology Fabric Pt-to-Pt,speed 8G Mar? 5 02:00:18 newstorm fct: [ID 132490 kern.notice] NOTICE: qlt1,0 LINK UP, portid 10100, topology Fabric Pt-to-Pt,speed 8G _______________________________________________ OmniOS-discuss mailing list OmniOS-discuss at lists.omniti.com http://lists.omniti.com/mailman/listinfo/omnios-discuss From rt at steait.net Thu Mar 5 20:33:24 2015 From: rt at steait.net (Rune Tipsmark) Date: Thu, 5 Mar 2015 20:33:24 +0000 Subject: [OmniOS-discuss] QLE2652 I/O Disconnect. Heat Sinks? In-Reply-To: References: , <22d96a9197cb4d7aa9c46c00b2f96337@EX1301.steait.net>, <05240908cc744d0ea27321d0ea77b5e9@EX1301.steait.net>, <4593924c-380f-4705-871f-3105d937c5f7@careyweb.com> <0224e713f8ba49249c659888858f569b@EX1301.steait.net> <476bc8c0-6135-4104-b3ca-9bac6873f473@careyweb.com> Message-ID: Ah ok, so just loading the qlt drives is enough, I followed a guide from napp-it when I first learned about solaris a year or so ago and it had the emlxs.conf target=1 described so I just followed it ever since. Any other files that can be used to tweak the target driver or comstar? Br, Rune -----Original Message----- From: Johan Kragsterman [mailto:johan.kragsterman at capvert.se] Sent: Thursday, March 05, 2015 12:12 PM To: Rune Tipsmark Cc: 'Nate Smith'; omnios-discuss at lists.omniti.com Subject: Ang: RE: RE: Re: [OmniOS-discuss] QLE2652 I/O Disconnect. Heat Sinks? -----Rune Tipsmark skrev: ----- Till: 'Johan Kragsterman' Fr?n: Rune Tipsmark Datum: 2015-03-05 20:44 Kopia: 'Nate Smith' , "omnios-discuss at lists.omniti.com" ?rende: RE: RE: Re: [OmniOS-discuss] QLE2652 I/O Disconnect. Heat Sinks? They are qLogic qmh2562 across the board... just figured the emlxs.conf had something to say since I had to edit it to get comstar into target mode. Br, Rune COMSTAR is target only, so you don't get COMSTAR into target mode, you get the HBA into target mode with a target driver, to give COMSTAR an interface to work with. If you are using qmh2562, you need the qlt driver, which I suppose you already use. emlx is the driver for Emulex HBA's, and is of no use when you're using qlogic HBA's. Rgrds Johan -----Original Message----- From: Johan Kragsterman [mailto:johan.kragsterman at capvert.se] Sent: Thursday, March 05, 2015 11:07 AM To: Rune Tipsmark Cc: 'Nate Smith'; omnios-discuss at lists.omniti.com Subject: Ang: RE: Re: [OmniOS-discuss] QLE2652 I/O Disconnect. Heat Sinks? Hi! 
-----Rune Tipsmark skrev: ----- Till: 'Johan Kragsterman' Fr?n: Rune Tipsmark Datum: 2015-03-05 19:38 Kopia: 'Nate Smith' , "omnios-discuss at lists.omniti.com" ?rende: RE: Re: [OmniOS-discuss] QLE2652 I/O Disconnect. Heat Sinks? Pls see below >> : [OmniOS-discuss] QLE2652 I/O Disconnect. Heat Sinks? Haven’t tried iSCSI but had similar issues with Infiniband… more frequent due to higher io load, but no console error messages. ? This only happened on my SuperMicro server and never on my HP server… what brand are you running? This is interesting, only on Supermicro, and never on HP? I'd like to know some more details here... First, when you say "server", do you mean the SAN head? Not the hosts? >> SAN Head yes Second: Can you specify the exakt model of the Supermicro and the HP? >>SuperMicro X9DRE-TF+ and the HP is actually an HP P4500 SAN I removed >>the shitty HP software and controller from and replaced with ?an LSI >>9207 and installed OmniOS on. I have tested on other HP and SM servers >>too, all exhibit the same behavior (3 SM and 2 HP tested) Third: Did you pay attention to bios settings on the two different servers? Like C-states, and other settings...how about IRQ settings? And how about the physical PCIe buses the HBA's are sitting on? This is often causing problems, if you don't know the layout of the PCIe-buses. >> both are set to max performance in BIOS and afaik all C-states are disabled. I am 100% sure on HP and 99.99% sure on SM. I didn't touch the IRQ settings but I actually have two 8Gbit FC Cards in the SM server and both exhibit the problem. I have tried to swap things around too with no luck. I do use every available PCI-E slot though.. L2ARC, SLOG etc. Fourth: When you say you can cause it with windows as initiator, do you mean windows on hardware, and not windows as a VM? And when you say you can NOT cause it on VmWare, you mean you can run a windows VM on VmWare with direct LUN access without problems? And is this true for both hardwares, HP and Supermicro? >>Windows on hardware yes, all I have to do is zone a block device over to Windows and copy say 100 4GB ISO Files onto it or a 200GB backup file... no doubt it will cause this issue... when I say I cannot cause this on VMware then I mean any VM hosted on my ESXi host (5.1 or 5.5) and a guest operating system without raw device mapping - I have not tested if I can cause this using RDM. It is also true for both HP and SM - both behave just fine using VMware and FibreChannel - however VMware can cause issues with Infiniband on the SM but I think that's a different issue and has to do with Mellanox and their terrible drivers that are never ever updated and half-beta etc. Since it appears on one hardware and not another, it is difficult to blame any specific sofware, but we just had a discussion here about iScsi/comstar, where Garrret suspected comstar to handle certain things bad. I don't know wether that has anything to do with this. >> I think it could be a mx of both, would be interesting to see if something in Comstar could be fixed,... Bacically: ESX+FC+SM = problem ESX+FC+HP = no problem Win+FC+SM = problem Win+FC+HP = not tested ESX+IB+SM = problem ESX+IB+HP = no problem Win+IB+SM = not tested, SRP not supported in Win HP = not tested, SRP Win+IB+not supported in Win Anyway it all lead me to some information on 8Gbit FC - in particular portCfgFillWord Maybe this can affect some of this... google will reveal a great bit of info and also found some links.. 
http://virtualkenneth.com/2012/05/21/random-esxi-lun-loss-in-a-8gb-fc-brocade-infrastructure/ What about tuning the emlxs.conf? can anything be done there to get better performance? Are you using Emulex HBA's? That would explain things....I have never used Emulex in production. Tried some times in lab env, but always turned out to behave strangly... Br, Rune Rgrds Johan Br, Rune ? ? From: Nate Smith [mailto:nsmith at careyweb.com] Sent: Thursday, March 05, 2015 8:10 AM To: Rune Tipsmark; omnios-discuss at lists.omniti.com Subject: RE: [OmniOS-discuss] QLE2652 I/O Disconnect. Heat Sinks? ? Do you see the same problem with Windows and iSCSI as an initiator? I wish there was a way to turn up debugging to figure this out. ? From: Rune Tipsmark [mailto:rt at steait.net] Sent: Thursday, March 05, 2015 11:08 AM To: 'Nate Smith'; omnios-discuss at lists.omniti.com Subject: RE: [OmniOS-discuss] QLE2652 I/O Disconnect. Heat Sinks? ? Same problem here… have noticed I can cause this easily by using Windows as initiator… I cannot cause this using VMware as initiator… No idea how to fix, but a big problem. Br, Rune ? ? From: OmniOS-discuss [mailto:omnios-discuss-bounces at lists.omniti.com] On Behalf Of Nate Smith Sent: Thursday, March 05, 2015 6:01 AM To: omnios-discuss at lists.omniti.com Subject: [OmniOS-discuss] QLE2652 I/O Disconnect. Heat Sinks? ? I’ve had this problem for a while, and I have no way to diagnose what is going on, but occasionally when system IO gets high (I’ve seen it happen especially on backups), I will lose connectivity with my Fibre Channel cards which serve up fibre channel LUNS to a VM cluster. All hell breaks loose, and then connectivity gets restored. I don’t get an error that it’s dropped, at least not on the Omnios system, but I get notice when it’s restored (which makes no sense). I’m wondering if the cards are just overheating, and if heat sinks with a fan would help on the io chip. ? Mar? 5 01:55:01 newstorm fct: [ID 132490 kern.notice] NOTICE: qlt2,0 LINK UP, portid 20000, topology Fabric Pt-to-Pt,speed 8G Mar? 5 01:56:26 newstorm fct: [ID 132490 kern.notice] NOTICE: qlt0,0 LINK UP, portid 20100, topology Fabric Pt-to-Pt,speed 8G Mar? 5 02:00:13 newstorm last message repeated 1 time Mar? 5 02:00:15 newstorm fct: [ID 132490 kern.notice] NOTICE: qlt3,0 LINK UP, portid 10000, topology Fabric Pt-to-Pt,speed 8G Mar? 5 02:00:15 newstorm fct: [ID 132490 kern.notice] NOTICE: qlt2,0 LINK UP, portid 20000, topology Fabric Pt-to-Pt,speed 8G Mar? 5 02:00:18 newstorm fct: [ID 132490 kern.notice] NOTICE: qlt1,0 LINK UP, portid 10100, topology Fabric Pt-to-Pt,speed 8G _______________________________________________ OmniOS-discuss mailing list OmniOS-discuss at lists.omniti.com http://lists.omniti.com/mailman/listinfo/omnios-discuss From henson at acm.org Fri Mar 6 03:08:30 2015 From: henson at acm.org (Paul B. Henson) Date: Thu, 5 Mar 2015 19:08:30 -0800 Subject: [OmniOS-discuss] CIFS File Lock Problems In-Reply-To: References: Message-ID: <32b301d057ba$d6168cc0$8243a640$@acm.org> > From: Aaron Curry > Sent: Tuesday, March 03, 2015 3:45 PM > > We have encountered an issue with out OmniOS CIFS file server and file locks. We are currently actually using samba under omnios rather than the in-kernel CIFS server. One reason is that the in-kernel server does not support our requirement to use an MIT Kerberos realm for NFS, and an active directory domain for CIFS. Another is that samba just supports more current features of CIFS. 
I believe both of these issues are resolved in the nexenta illumos fork, which implements SMB2 and fixes a lot of other stuff. That code has been released and is available for integration into upstream illumos (where it would then come back down into omnios), but unfortunately that would be a lot of work and I don't believe anyone is currently planning on doing it :(. If that ever happens we will reevaluate the in-kernel CIFS server, as I'd rather be using that? From henson at acm.org Fri Mar 6 03:22:47 2015 From: henson at acm.org (Paul B. Henson) Date: Thu, 5 Mar 2015 19:22:47 -0800 Subject: [OmniOS-discuss] Long group names in ls acl output In-Reply-To: <54F621D0.6070506@umn.edu> References: <54F5D619.6000902@umn.edu> <2dfa01d055f4$bfa16130$3ee42390$@acm.org> <54F621D0.6070506@umn.edu> Message-ID: <32bd01d057bc$d4102380$7c306a80$@acm.org> > From: Nathan Huff > Sent: Tuesday, March 03, 2015 1:04 PM > > -n works for the regular user and group but seems to have no effect on > ACL entries Ah, sorry, I don't recall seeing ACL entries mentioned in your original post, perhaps I missed it. I took a quick look, it appears that ls does not parse/print ACL's itself, it uses the acl_printacl utility function in libsec. Unfortunately, I don't see any nontrivial way to modify it to do what you want. From nrhuff at umn.edu Fri Mar 6 16:32:19 2015 From: nrhuff at umn.edu (Nathan Huff) Date: Fri, 06 Mar 2015 10:32:19 -0600 Subject: [OmniOS-discuss] Long group names in ls acl output In-Reply-To: <32bd01d057bc$d4102380$7c306a80$@acm.org> References: <54F5D619.6000902@umn.edu> <2dfa01d055f4$bfa16130$3ee42390$@acm.org> <54F621D0.6070506@umn.edu> <32bd01d057bc$d4102380$7c306a80$@acm.org> Message-ID: <54F9D693.8080400@umn.edu> I ended up writing a shared library that overrides the acl_printacl routine with a maximum id string size of 256 instead of 20 that I can LD_PRELOAD if I need to see longer names. Hacky, but seems to work. On 2015-03-05 9:22 PM, Paul B. Henson wrote: >> From: Nathan Huff >> Sent: Tuesday, March 03, 2015 1:04 PM >> >> -n works for the regular user and group but seems to have no effect on >> ACL entries > > Ah, sorry, I don't recall seeing ACL entries mentioned in your original > post, perhaps I missed it. > > I took a quick look, it appears that ls does not parse/print ACL's itself, > it uses the acl_printacl utility function in libsec. Unfortunately, I don't > see any nontrivial way to modify it to do what you want. > > > -- Nathan Huff System Administrator Academic Health Center Information Systems University of Minnesota 612-626-9136 From richard.elling at richardelling.com Fri Mar 6 16:38:42 2015 From: richard.elling at richardelling.com (Richard Elling) Date: Fri, 6 Mar 2015 08:38:42 -0800 Subject: [OmniOS-discuss] QLE2652 I/O Disconnect. Heat Sinks? In-Reply-To: <4593924c-380f-4705-871f-3105d937c5f7@careyweb.com> References: <4593924c-380f-4705-871f-3105d937c5f7@careyweb.com> Message-ID: > On Mar 5, 2015, at 6:00 AM, Nate Smith wrote: > > I?ve had this problem for a while, and I have no way to diagnose what is going on, but occasionally when system IO gets high (I?ve seen it happen especially on backups), I will lose connectivity with my Fibre Channel cards which serve up fibre channel LUNS to a VM cluster. All hell breaks loose, and then connectivity gets restored. I don?t get an error that it?s dropped, at least not on the Omnios system, but I get notice when it?s restored (which makes no sense). 
I?m wondering if the cards are just overheating, and if heat sinks with a fan would help on the io chip. Is there a PCI bridge in the data path? These can often be found on mezzanine or riser cards. -- richard > > Mar 5 01:55:01 newstorm fct: [ID 132490 kern.notice] NOTICE: qlt2,0 LINK UP, portid 20000, topology Fabric Pt-to-Pt,speed 8G > Mar 5 01:56:26 newstorm fct: [ID 132490 kern.notice] NOTICE: qlt0,0 LINK UP, portid 20100, topology Fabric Pt-to-Pt,speed 8G > Mar 5 02:00:13 newstorm last message repeated 1 time > Mar 5 02:00:15 newstorm fct: [ID 132490 kern.notice] NOTICE: qlt3,0 LINK UP, portid 10000, topology Fabric Pt-to-Pt,speed 8G > Mar 5 02:00:15 newstorm fct: [ID 132490 kern.notice] NOTICE: qlt2,0 LINK UP, portid 20000, topology Fabric Pt-to-Pt,speed 8G > Mar 5 02:00:18 newstorm fct: [ID 132490 kern.notice] NOTICE: qlt1,0 LINK UP, portid 10100, topology Fabric Pt-to-Pt,speed 8G > _______________________________________________ > OmniOS-discuss mailing list > OmniOS-discuss at lists.omniti.com > http://lists.omniti.com/mailman/listinfo/omnios-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From nsmith at careyweb.com Fri Mar 6 16:56:45 2015 From: nsmith at careyweb.com (Nate Smith) Date: Fri, 6 Mar 2015 11:56:45 -0500 Subject: [OmniOS-discuss] QLE2652 I/O Disconnect. Heat Sinks? In-Reply-To: References: <4593924c-380f-4705-871f-3105d937c5f7@careyweb.com> Message-ID: <4d6c8571-a7b9-4c31-bd87-2fe6e611f214@careyweb.com> Yeah, there is on R720s, I think. What about on the Supermicro and HP servers? From: Richard Elling [mailto:richard.elling at richardelling.com] Sent: Friday, March 06, 2015 11:39 AM To: Nate Smith Cc: omnios-discuss at lists.omniti.com Subject: Re: [OmniOS-discuss] QLE2652 I/O Disconnect. Heat Sinks? On Mar 5, 2015, at 6:00 AM, Nate Smith wrote: I?ve had this problem for a while, and I have no way to diagnose what is going on, but occasionally when system IO gets high (I?ve seen it happen especially on backups), I will lose connectivity with my Fibre Channel cards which serve up fibre channel LUNS to a VM cluster. All hell breaks loose, and then connectivity gets restored. I don?t get an error that it?s dropped, at least not on the Omnios system, but I get notice when it?s restored (which makes no sense). I?m wondering if the cards are just overheating, and if heat sinks with a fan would help on the io chip. Is there a PCI bridge in the data path? These can often be found on mezzanine or riser cards. -- richard Mar 5 01:55:01 newstorm fct: [ID 132490 kern.notice] NOTICE: qlt2,0 LINK UP, portid 20000, topology Fabric Pt-to-Pt,speed 8G Mar 5 01:56:26 newstorm fct: [ID 132490 kern.notice] NOTICE: qlt0,0 LINK UP, portid 20100, topology Fabric Pt-to-Pt,speed 8G Mar 5 02:00:13 newstorm last message repeated 1 time Mar 5 02:00:15 newstorm fct: [ID 132490 kern.notice] NOTICE: qlt3,0 LINK UP, portid 10000, topology Fabric Pt-to-Pt,speed 8G Mar 5 02:00:15 newstorm fct: [ID 132490 kern.notice] NOTICE: qlt2,0 LINK UP, portid 20000, topology Fabric Pt-to-Pt,speed 8G Mar 5 02:00:18 newstorm fct: [ID 132490 kern.notice] NOTICE: qlt1,0 LINK UP, portid 10100, topology Fabric Pt-to-Pt,speed 8G _______________________________________________ OmniOS-discuss mailing list OmniOS-discuss at lists.omniti.com http://lists.omniti.com/mailman/listinfo/omnios-discuss -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From asc1111 at gmail.com Fri Mar 6 18:41:26 2015 From: asc1111 at gmail.com (Aaron Curry) Date: Fri, 6 Mar 2015 11:41:26 -0700 Subject: [OmniOS-discuss] CIFS File Lock Problems In-Reply-To: <32b301d057ba$d6168cc0$8243a640$@acm.org> References: <32b301d057ba$d6168cc0$8243a640$@acm.org> Message-ID: Paul, Thank you for the response. I was beginning to think that everyone thought this wasn't even worth commenting on. We have a couple OmniOS servers that have been running the in-kernel CIFS server with Active Directory integration for a while now and haven't had any problems. We've been very happy with it. Of course those don't handle as many users as this new one. I guess the problems only show up under the stress of too many connections. I have considered running Samba as an alternative since I know that's what a lot of people are doing. So I'm curious, what version are you running? 3 or 4? Is there a package I can install or do I need to build it myself? Is there any sort of documentation showing how to get Samba working with AD on OmniOS? I'm not afraid to do the work myself, it would just save a lot of time to follow someone else's work. And, with this file server on the fritz, we don't have a lot of time. Thanks again, Aaron -------------- next part -------------- An HTML attachment was scrubbed... URL: From johan.kragsterman at capvert.se Fri Mar 6 18:57:02 2015 From: johan.kragsterman at capvert.se (Johan Kragsterman) Date: Fri, 6 Mar 2015 19:57:02 +0100 Subject: [OmniOS-discuss] Ang: Re: QLE2652 I/O Disconnect. Heat Sinks? In-Reply-To: References: , <4593924c-380f-4705-871f-3105d937c5f7@careyweb.com> Message-ID: -----"OmniOS-discuss" skrev: ----- Till: Nate Smith Från: Richard Elling Sänt av: "OmniOS-discuss" Datum: 2015-03-06 17:39 Kopia: omnios-discuss at lists.omniti.com Ärende: Re: [OmniOS-discuss] QLE2652 I/O Disconnect. Heat Sinks? On Mar 5, 2015, at 6:00 AM, Nate Smith wrote: I’ve had this problem for a while, and I have no way to diagnose what is going on, but occasionally when system IO gets high (I’ve seen it happen especially on backups), I will lose connectivity with my Fibre Channel cards which serve up fibre channel LUNS to a VM cluster. All hell breaks loose, and then connectivity gets restored. I don’t get an error that it’s dropped, at least not on the Omnios system, but I get notice when it’s restored (which makes no sense). I’m wondering if the cards are just overheating, and if heat sinks with a fan would help on the io chip. Is there a PCI bridge in the data path? These can often be found on mezzanine or riser cards. -- richard I never thought of that as a possible problem before, but of course, it must be a source of possible complications. I never had these problems, though, but interesting for the future! Thanks for that, Richard! Rgrds Johan _______________________________________________ OmniOS-discuss mailing list OmniOS-discuss at lists.omniti.com http://lists.omniti.com/mailman/listinfo/omnios-discuss _______________________________________________ OmniOS-discuss mailing list OmniOS-discuss at lists.omniti.com http://lists.omniti.com/mailman/listinfo/omnios-discuss From sjorge+ml at blackdot.be Fri Mar 6 19:01:54 2015 From: sjorge+ml at blackdot.be (Jorge Schrauwen) Date: Fri, 06 Mar 2015 20:01:54 +0100 Subject: [OmniOS-discuss] CIFS File Lock Problems In-Reply-To: References: <32b301d057ba$d6168cc0$8243a640$@acm.org> Message-ID: <73bd3a54ce6530d92a4f4374f4a0d994@blackdot.be> Hi Aaron, I run the in-kernel CIFS too.
I hit the same problem at home (plex accssing a large file share). I did not comment because I did not have a fix :( Gwr is upsteaming some of the smb bits, if I am not mistaken. There are a lot of goodies in the Nexenta tree and Joyent tree that would rock if they got upstreamed. I played with Samba4 for a bit but ended up back on in-kernel CIFS for ease of use. That was also the original reason i switched to OmniOS :) Goodluck on your quest for a solution Jorge On 2015-03-06 19:41, Aaron Curry wrote: > Paul, > > Thank you for the response. I was beginning to think that everyone > thought this wasn't even worth commenting on. > > We have a couple OmniOS servers that have been running the in-kernel > CIFS server with Active Directory integration for a while now and > haven't had any problems. We've been very happy with it. Of course > those don't handle as many users as this new one. I guess the problems > only show up under the stress of too many connections. > > I have considered running Samba as an alternative since I know that's > what a lot of people are doing. So I'm curious, what version are you > running? 3 or 4? Is there a package I can install or do I need to build > it myself? Is there any sort of documentation showing how to get Samba > working with AD on OmniOS? I'm not afraid to do the work myself, it > would just save a lot of time to follow someone else's work. And, with > this file server on the fritz, we don't have a lot of time. > > Thanks again, > > Aaron > > _______________________________________________ > OmniOS-discuss mailing list > OmniOS-discuss at lists.omniti.com > http://lists.omniti.com/mailman/listinfo/omnios-discuss [1] Links: ------ [1] http://lists.omniti.com/mailman/listinfo/omnios-discuss From geoffn at gnaa.net Fri Mar 6 19:16:27 2015 From: geoffn at gnaa.net (Geoff Nordli) Date: Fri, 06 Mar 2015 11:16:27 -0800 Subject: [OmniOS-discuss] CIFS File Lock Problems In-Reply-To: <32b301d057ba$d6168cc0$8243a640$@acm.org> References: <32b301d057ba$d6168cc0$8243a640$@acm.org> Message-ID: <54F9FD0B.1040601@gnaa.net> On 15-03-05 07:08 PM, Paul B. Henson wrote: >> From: Aaron Curry >> Sent: Tuesday, March 03, 2015 3:45 PM >> >> We have encountered an issue with out OmniOS CIFS file server and file locks. > We are currently actually using samba under omnios rather than the in-kernel CIFS server. One reason is that the in-kernel server does not support our requirement to use an MIT Kerberos realm for NFS, and an active directory domain for CIFS. Another is that samba just supports more current features of CIFS. > > I believe both of these issues are resolved in the nexenta illumos fork, which implements SMB2 and fixes a lot of other stuff. That code has been released and is available for integration into upstream illumos (where it would then come back down into omnios), but unfortunately that would be a lot of work and I don't believe anyone is currently planning on doing it :(. If that ever happens we will reevaluate the in-kernel CIFS server, as I'd rather be using that? > > ___ Paul, when using Samba, can users restore files via the "previous versions" within Windows to see all of the snapshots? Having an easy way for people to restore files is a huge plus. The main problem I have is every once in a while the server will lockup and the only way around it is a hard reset. It would be great if those SMB2 and fixes get upstreamed at some point. 
thanks, Geoff From danmcd at omniti.com Fri Mar 6 19:30:35 2015 From: danmcd at omniti.com (Dan McDonald) Date: Fri, 6 Mar 2015 14:30:35 -0500 Subject: [OmniOS-discuss] CIFS File Lock Problems In-Reply-To: <54F9FD0B.1040601@gnaa.net> References: <32b301d057ba$d6168cc0$8243a640$@acm.org> <54F9FD0B.1040601@gnaa.net> Message-ID: <9897DF76-CDA3-41C6-8839-A75BF9357B52@omniti.com> > On Mar 6, 2015, at 2:16 PM, Geoff Nordli wrote: > It would be great if those SMB2 and fixes get upstreamed at some point. All of the distro makers are busy... well... working on their distros. I certainly am (r151014 with its pkg(5) improvements, including some not yet in bloody, is in its final approach), and I know Joyent & Nexenta are as well. When we find time, we upstream. Sometimes it's easy, and sometimes it's hard. Sometimes it's hard because a distro's architectural decisions aren't the same as other distros, and it takes times to convert a distro's technology into something upstreamable. Joyent coolness sometimes has this problem (e.g. their work in virtual network devices). They're not sabotaging upstreaming, they are solving their problems first. Another reason it's hard can be because a technology arrives in several pieces, and you really have to upstream them a piece at a time for the best fit. I know the SMB2 work from Nexenta is like this. Again, it's not done to screw the community, it's done because they have paying customers who want it, and they know who's writing their paychecks. If you want something upstreamed, volunteer in the community. Volunteer by offering to test, by offering to inspect a distro's source and see its commit history. I have pieces myself that I'd like to upstream --> these will allow the building of stock illumos-gate on OmniOS. I can't upstream them all just yet because they come in pieces, and because I have r151014 coming soon. Sorry if I'm pontificating here, but this all isn't easy. :) Thanks, Dan From henson at acm.org Fri Mar 6 19:41:34 2015 From: henson at acm.org (Paul B. Henson) Date: Fri, 6 Mar 2015 11:41:34 -0800 Subject: [OmniOS-discuss] Long group names in ls acl output In-Reply-To: <54F9D693.8080400@umn.edu> References: <54F5D619.6000902@umn.edu> <2dfa01d055f4$bfa16130$3ee42390$@acm.org> <54F621D0.6070506@umn.edu> <32bd01d057bc$d4102380$7c306a80$@acm.org> <54F9D693.8080400@umn.edu> Message-ID: <330f01d05845$8fc51530$af4f3f90$@acm.org> > From: Nathan Huff > Sent: Friday, March 06, 2015 8:32 AM > > I ended up writing a shared library that overrides the acl_printacl > routine with a maximum id string size of 256 instead of 20 that I can > LD_PRELOAD if I need to see longer names. Hacky, but seems to work. Been there, done that :). Glad you at least found a workaround for now. Tentatively, I would think the cleanest upstream solution would be to add a new flag to acl_printacl to print uid/gid instead of name. I'm not sure how straightforward that would be or how much work it would result in though. From henson at acm.org Fri Mar 6 19:50:01 2015 From: henson at acm.org (Paul B. Henson) Date: Fri, 6 Mar 2015 11:50:01 -0800 Subject: [OmniOS-discuss] CIFS File Lock Problems In-Reply-To: References: <32b301d057ba$d6168cc0$8243a640$@acm.org> Message-ID: <331101d05846$bdea7a80$39bf6f80$@acm.org> > From: Aaron Curry > Sent: Friday, March 06, 2015 10:41 AM > > I have considered running Samba as an alternative since I know that's what a lot > of people are doing. So I'm curious, what version are you running? 3 or 4? 
We are currently running 3.6.25, we haven't made the jump to 4 yet. > there a package I can install or do I need to build it myself? Personally we build it from source via pkgsrc. However, you might be able to use the precompiled pkgsrc binaries from Joyent if you prefer. > Is there any sort of > documentation showing how to get Samba working with AD on OmniOS? It's basically the exact same as getting samba to work on any OS, so pretty much any guide you find on the Internet should be usable. Here is our current config:
[global]
allow trusted domains = no
enable privileges = no
deadtime = 10
debug pid = yes
disable netbios = yes
enable privileges = no
idmap config * : backend = nss
idmap config * : range = 2147483648-2147483648
idmap config WIN : backend = nss
idmap config WIN : range = 1000-2147483647
lanman auth = no
load printers = no
log level = 1
map archive = no
name resolve order = host
realm = WIN.CSUPOMONA.EDU
restrict anonymous = 1
security = ads
server signing = auto
show add printer wizard = no
workgroup = WIN
writable = yes
max log size = 512000
unix extensions = no
vfs objects = shadow_copy2 zfsacl
shadow: snapdir = .zfs/snapshot
shadow: format = backup-%Y.%m.%d-%H.%M.%S
shadow: sort = desc
shadow: localtime = yes
nfs4: mode = special
multicast dns register = no
max protocol = SMB2
wide links = yes
[homes]
browseable = no
path = /export/user/%S
include = /etc/samba/smb-groups.conf
[global]
private dir = /etc/samba/private
From henson at acm.org Fri Mar 6 19:51:05 2015 From: henson at acm.org (Paul B. Henson) Date: Fri, 6 Mar 2015 11:51:05 -0800 Subject: [OmniOS-discuss] CIFS File Lock Problems In-Reply-To: <54F9FD0B.1040601@gnaa.net> References: <32b301d057ba$d6168cc0$8243a640$@acm.org> <54F9FD0B.1040601@gnaa.net> Message-ID: <331901d05846$e406cbb0$ac146310$@acm.org> > From: Geoff Nordli > Sent: Friday, March 06, 2015 11:16 AM > > Paul, when using Samba, can users restore files via the "previous > versions" within Windows to see all of the snapshots? Having an easy way > for people to restore files is a huge plus. Yes, samba works with zfs snapshots and allows them to be presented via the Windows shadow copy interface. From henson at acm.org Fri Mar 6 19:58:02 2015 From: henson at acm.org (Paul B. Henson) Date: Fri, 6 Mar 2015 11:58:02 -0800 Subject: [OmniOS-discuss] CIFS File Lock Problems In-Reply-To: <9897DF76-CDA3-41C6-8839-A75BF9357B52@omniti.com> References: <32b301d057ba$d6168cc0$8243a640$@acm.org> <54F9FD0B.1040601@gnaa.net> <9897DF76-CDA3-41C6-8839-A75BF9357B52@omniti.com> Message-ID: <331c01d05847$dc4005d0$94c01170$@acm.org> > From: Dan McDonald > Sent: Friday, March 06, 2015 11:31 AM > > for the best fit. I know the SMB2 work from Nexenta is like this. Again, it's not > done to screw the community, it's done because they have paying customers > who want it, and they know who's writing their paychecks. I don't think anybody thinks any of the distributions are screwing the community :), they've released their code changes, which is really the only obligation they have. Particularly for changes like the Nexenta SMB2 stuff, they are so complicated and divergent from upstream it's really difficult to get them in, and it's understandable particularly for a commercial company that it's not a high priority. While I certainly might whine and sigh and say stuff like "I wish those SMB2 updates would get upstreamed" I'm not blaming anybody for not doing it :).
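One practical note on the shadow_copy2 settings in the smb.conf above: Windows "Previous Versions" will only list snapshots whose names match the 'shadow: format' string, so whatever creates the snapshots has to use the same naming scheme. A minimal sketch, with a made-up pool/dataset name:

  # snapshot name must match "shadow: format = backup-%Y.%m.%d-%H.%M.%S"
  zfs snapshot tank/export/user@backup-$(date +%Y.%m.%d-%H.%M.%S)
  # samba then finds it under the share's .zfs/snapshot directory
  ls /export/user/.zfs/snapshot/

In practice this would be driven by a cron job or snapshot service rather than run by hand.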
I'm sure I speak for everybody on the list that we really appreciate all of the work you do, particularly the help and support you provide above and beyond the responsibilities of your day job, and in no way expect you to do everything for everybody ;). Thanks! From sjorge+ml at blackdot.be Fri Mar 6 20:08:09 2015 From: sjorge+ml at blackdot.be (Jorge Schrauwen) Date: Fri, 06 Mar 2015 21:08:09 +0100 Subject: [OmniOS-discuss] CIFS File Lock Problems In-Reply-To: <331c01d05847$dc4005d0$94c01170$@acm.org> References: <32b301d057ba$d6168cc0$8243a640$@acm.org> <54F9FD0B.1040601@gnaa.net> <9897DF76-CDA3-41C6-8839-A75BF9357B52@omniti.com> <331c01d05847$dc4005d0$94c01170$@acm.org> Message-ID: <91a70370340fe476e30ee8561c32ddc1@blackdot.be> +1 this post I feel frustration but only at the lack of my own ability to not being smart enough to upstream myself. Testing is all I seem to be able to contribute at the moment. ~ sjorge On 2015-03-06 20:58, Paul B. Henson wrote: >> From: Dan McDonald >> Sent: Friday, March 06, 2015 11:31 AM >> >> for the best fit. I know the SMB2 work from Nexenta is like this. >> Again, > it's not >> done to screw the community, it's done because they have paying >> customers >> who want it, and they know who's writing their paychecks. > > I don't think anybody thinks any of the distributions are screwing the > community :), they've released their code changes, which is really the > only > obligation they have. Particularly for changes like the Nexenta SMB2 > stuff, > they are so complicated and divergent from upstream it's really > difficult > to get them in, and it's understandable particularly for a commercial > company that it's not a high priority. While I certainly might whine > and > sigh and say stuff like "I wish those SMB2 updates would get > upstreamed" I'm > not blaming anybody for not doing it :). > > I'm sure I speak for everybody on the list that we really appreciate > all of > the work you do, particularly the help and support you provide above > and > beyond the responsibilities of your day job, and in no way expect you > to do > everything for everybody ;). Thanks! > > _______________________________________________ > OmniOS-discuss mailing list > OmniOS-discuss at lists.omniti.com > http://lists.omniti.com/mailman/listinfo/omnios-discuss From geoffn at gnaa.net Fri Mar 6 20:13:43 2015 From: geoffn at gnaa.net (Geoff Nordli) Date: Fri, 06 Mar 2015 12:13:43 -0800 Subject: [OmniOS-discuss] CIFS File Lock Problems In-Reply-To: <9897DF76-CDA3-41C6-8839-A75BF9357B52@omniti.com> References: <32b301d057ba$d6168cc0$8243a640$@acm.org> <54F9FD0B.1040601@gnaa.net> <9897DF76-CDA3-41C6-8839-A75BF9357B52@omniti.com> Message-ID: <54FA0A77.7050503@gnaa.net> On 15-03-06 11:30 AM, Dan McDonald wrote: >> On Mar 6, 2015, at 2:16 PM, Geoff Nordli wrote: >> It would be great if those SMB2 and fixes get upstreamed at some point. > All of the distro makers are busy... well... working on their distros. I certainly am (r151014 with its pkg(5) improvements, including some not yet in bloody, is in its final approach), and I know Joyent & Nexenta are as well. > > When we find time, we upstream. Sometimes it's easy, and sometimes it's hard. Sometimes it's hard because a distro's architectural decisions aren't the same as other distros, and it takes times to convert a distro's technology into something upstreamable. Joyent coolness sometimes has this problem (e.g. their work in virtual network devices). They're not sabotaging upstreaming, they are solving their problems first. 
Another reason it's hard can be because a technology arrives in several pieces, and you really have to upstream them a piece at a time for the best fit. I know the SMB2 work from Nexenta is like this. Again, it's not done to screw the community, it's done because they have paying customers who want it, and they know who's writing their paychecks. > > If you want something upstreamed, volunteer in the community. Volunteer by offering to test, by offering to inspect a distro's source and see its commit history. I have pieces myself that I'd like to upstream --> these will allow the building of stock illumos-gate on OmniOS. I can't upstream them all just yet because they come in pieces, and because I have r151014 coming soon. > > Sorry if I'm pontificating here, but this all isn't easy. :) > > Thanks, > Dan > Dan, it definitely isn't easy. I know the rules: If you aren't able to do it and if you aren't a paying customer then you have no right to complain/choose what people work on. People in the community work on what interests them or work on what the company which pays their bills ask them to work on. I know the work (SMB2 and other fixes) Nexenta has done has a lot of moving pieces therefore it isn't very easy to upstream. I have been following the discussion since they announced the opening of that code. I thoroughly appreciate all the work everyone does around illumos and all the distributions. I have been in the community for five years and I need to be contributing more than I currently do. Happy Friday!! Geoff From geoffn at gnaa.net Fri Mar 6 21:18:03 2015 From: geoffn at gnaa.net (Geoff Nordli) Date: Fri, 06 Mar 2015 13:18:03 -0800 Subject: [OmniOS-discuss] CIFS File Lock Problems In-Reply-To: <331901d05846$e406cbb0$ac146310$@acm.org> References: <32b301d057ba$d6168cc0$8243a640$@acm.org> <54F9FD0B.1040601@gnaa.net> <331901d05846$e406cbb0$ac146310$@acm.org> Message-ID: <54FA198B.7090300@gnaa.net> On 15-03-06 11:51 AM, Paul B. Henson wrote: >> From: Geoff Nordli >> Sent: Friday, March 06, 2015 11:16 AM >> >> Paul, when using Samba, can users restore files via the "previous >> versions" within Windows to see all of the snapshots? Having an easy way >> for people to restore files is a huge plus. > Yes, samba works with zfs snapshots and allows them to be presented via the Windows shadow copy interface. > Thanks Paul. Did you follow an install guide to get Samba running on Omnios? Where did you get the package from? I don't see anything in the core or "extras" repo. I see a 3.6.x in the pkgsrc repo. Geoff From geoffn at gnaa.net Fri Mar 6 22:03:55 2015 From: geoffn at gnaa.net (Geoff Nordli) Date: Fri, 06 Mar 2015 14:03:55 -0800 Subject: [OmniOS-discuss] CIFS File Lock Problems In-Reply-To: <54FA198B.7090300@gnaa.net> References: <32b301d057ba$d6168cc0$8243a640$@acm.org> <54F9FD0B.1040601@gnaa.net> <331901d05846$e406cbb0$ac146310$@acm.org> <54FA198B.7090300@gnaa.net> Message-ID: <54FA244B.1010600@gnaa.net> On 15-03-06 01:18 PM, Geoff Nordli wrote: > On 15-03-06 11:51 AM, Paul B. Henson wrote: >>> From: Geoff Nordli >>> Sent: Friday, March 06, 2015 11:16 AM >>> >>> Paul, when using Samba, can users restore files via the "previous >>> versions" within Windows to see all of the snapshots? Having an easy >>> way >>> for people to restore files is a huge plus. >> Yes, samba works with zfs snapshots and allows them to be presented >> via the Windows shadow copy interface. >> > > Thanks Paul. > > Did you follow an install guide to get Samba running on Omnios? 
> > Where did you get the package from? I don't see anything in the core > or "extras" repo. I see a 3.6.x in the pkgsrc repo. > > Geoff > > Forget it, I see that you already outlined it in a response to Aaron. From rt at steait.net Sat Mar 7 03:04:44 2015 From: rt at steait.net (Rune Tipsmark) Date: Sat, 7 Mar 2015 03:04:44 +0000 Subject: [OmniOS-discuss] QLE2652 I/O Disconnect. Heat Sinks? In-Reply-To: <4d6c8571-a7b9-4c31-bd87-2fe6e611f214@careyweb.com> References: <4593924c-380f-4705-871f-3105d937c5f7@careyweb.com> <4d6c8571-a7b9-4c31-bd87-2fe6e611f214@careyweb.com> Message-ID: No idea to be honest, even if there is its scary if it can cause these kinds of problems? Br, Rune From: OmniOS-discuss [mailto:omnios-discuss-bounces at lists.omniti.com] On Behalf Of Nate Smith Sent: Friday, March 06, 2015 8:57 AM To: 'Richard Elling' Cc: omnios-discuss at lists.omniti.com Subject: Re: [OmniOS-discuss] QLE2652 I/O Disconnect. Heat Sinks? Yeah, there is on R720s, I think. What about on the Supermicro and HP servers? From: Richard Elling [mailto:richard.elling at richardelling.com] Sent: Friday, March 06, 2015 11:39 AM To: Nate Smith Cc: omnios-discuss at lists.omniti.com Subject: Re: [OmniOS-discuss] QLE2652 I/O Disconnect. Heat Sinks? On Mar 5, 2015, at 6:00 AM, Nate Smith > wrote: I?ve had this problem for a while, and I have no way to diagnose what is going on, but occasionally when system IO gets high (I?ve seen it happen especially on backups), I will lose connectivity with my Fibre Channel cards which serve up fibre channel LUNS to a VM cluster. All hell breaks loose, and then connectivity gets restored. I don?t get an error that it?s dropped, at least not on the Omnios system, but I get notice when it?s restored (which makes no sense). I?m wondering if the cards are just overheating, and if heat sinks with a fan would help on the io chip. Is there a PCI bridge in the data path? These can often be found on mezzanine or riser cards. -- richard Mar 5 01:55:01 newstorm fct: [ID 132490 kern.notice] NOTICE: qlt2,0 LINK UP, portid 20000, topology Fabric Pt-to-Pt,speed 8G Mar 5 01:56:26 newstorm fct: [ID 132490 kern.notice] NOTICE: qlt0,0 LINK UP, portid 20100, topology Fabric Pt-to-Pt,speed 8G Mar 5 02:00:13 newstorm last message repeated 1 time Mar 5 02:00:15 newstorm fct: [ID 132490 kern.notice] NOTICE: qlt3,0 LINK UP, portid 10000, topology Fabric Pt-to-Pt,speed 8G Mar 5 02:00:15 newstorm fct: [ID 132490 kern.notice] NOTICE: qlt2,0 LINK UP, portid 20000, topology Fabric Pt-to-Pt,speed 8G Mar 5 02:00:18 newstorm fct: [ID 132490 kern.notice] NOTICE: qlt1,0 LINK UP, portid 10100, topology Fabric Pt-to-Pt,speed 8G _______________________________________________ OmniOS-discuss mailing list OmniOS-discuss at lists.omniti.com http://lists.omniti.com/mailman/listinfo/omnios-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From omnios at citrus-it.net Sat Mar 7 11:17:03 2015 From: omnios at citrus-it.net (Andy) Date: Sat, 7 Mar 2015 11:17:03 +0000 (GMT) Subject: [OmniOS-discuss] Bloody // mailwrapper & mta mediator Message-ID: Hi, I'm doing some testing with bloody, mainly to look at the new linked ipkg zones. Got distracted for now with the addition of mailwrapper and the new mta mediator which have come from Illumos 5166. 
Mailwrapper is completely unnecessary on an IPS distribution as it effectively duplicates the function of mediators but I can see from discussion on illumos-discuss that it was added to support non-IPS distributions and I can see the logic in that at the Illumos level. However, it's giving me a bit of an upgrade headache which I'm working through and looking for ideas/help. At Citrus, we're in the business of running mail relays and our MTA of choice is Sendmail. Being involved in the Sendmail community, we're actually running a beta of sendmail 8.15.2 as we need some of the TLS and IPv6 enhancements it provides. Whilst I embrace OmniOS' KYSTY principle (and it's one of the reasons we chose OmniOS in the first place), Sendmail is one of the packages that we deliver using standard paths and service names. That is, configuration files under /etc/mail and a service that can be managed as just 'sendmail'. To date (we're running r151012 in production), OmniOS doesn't install an MTA by default but, with the integration of 5166, sendmail becomes a dependency of mailwrapper and mailwrapper is required by SUNWcs.
Problem 1 - That immediately causes a conflict on upgrade as we already have a package which delivers /usr/lib/sendmail etc. That's easily fixed by making these mediated links and that allows us to switch our sendmail in to replace the default package.
aomni# (162) pkg mediator -a
MEDIATOR VER. SRC. VERSION IMPL. SRC. IMPLEMENTATION
mta      site               site       citrus-sendmail
mta      system             system     mailwrapper
mta      system             system     sendmail
Problem 2 - /etc/mail is populated by the OmniOS sendmail package. I could work around that by rebuilding our sendmail package to use /etc/opt/citrus-mail or something similar for its configuration. I don't really want to do that as the change will have an impact on backend systems and it will definitely confuse the people who look after the systems for a while.
Problem 3 - sendmail drops in /etc/init.d/sendmail - a script which enables the standard sendmail service. One of our support staff is bound to type '/etc/init.d/sendmail start' at some point even though on our current systems that command doesn't exist. That script is flagged as preserve=true in the manifest so I suppose I could just add an exit 0 near the top!
Problem 4 - IPS doesn't allow for mediated services so we'll always have svc:/network/smtp:sendmail. Again, we can rebuild ours to use svc:/network/smtp:citrus-sendmail or similar, but everyone is used to managing sendmail as just 'sendmail'.
This is a long way of me asking if mailwrapper could be removed from OmniOS as it isn't required for an IPS distribution. That would remove the requirement to have the standard sendmail package installed at all - just like <=r151012. It would mean that 'mailx' doesn't work but that should be expected if you haven't installed an MTA and is presumably the current behaviour.
Can anyone see any other practical and supportable options that would allow us to replace the sendmail package wholesale?
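For anyone hitting the same conflict, the mediated-link fix described under problem 1 roughly corresponds to delivering the site package's symlinks as mediated link actions. This is a sketch only; the package name and /opt paths are invented for illustration, not taken from the actual Citrus package:

  # excerpt from a hypothetical citrus-sendmail manifest (.p5m)
  link path=usr/lib/sendmail target=../../opt/citrus/sendmail/sbin/sendmail mediator=mta mediator-implementation=citrus-sendmail mediator-priority=site
  link path=usr/sbin/sendmail target=../../opt/citrus/sendmail/sbin/sendmail mediator=mta mediator-implementation=citrus-sendmail mediator-priority=site
  link path=usr/bin/mailq target=../../opt/citrus/sendmail/sbin/mailq mediator=mta mediator-implementation=citrus-sendmail mediator-priority=site

Because the links carry mediator-priority=site, they are preferred over the vendor-supplied implementations by default, which matches the 'pkg mediator -a' output above where the site implementation is the one selected.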
Thanks in advance, Andy -- Citrus IT Limited | +44 (0)870 199 8000 | enquiries at citrus-it.co.uk Rock House Farm | Green Moor | Wortley | Sheffield | S35 7DQ Registered in England and Wales | Company number 4899123 From mir at miras.org Sat Mar 7 11:54:43 2015 From: mir at miras.org (Michael Rasmussen) Date: Sat, 7 Mar 2015 12:54:43 +0100 Subject: [OmniOS-discuss] Bloody // mailwrapper & mta mediator In-Reply-To: References: Message-ID: <20150307125443.0240edbb@sleipner.datanom.net> On Sat, 7 Mar 2015 11:17:03 +0000 (GMT) Andy wrote: > > Can anyone see any other practical and supportable options that would > allow us to replace the sendmail package wholesale? > Could something be borrowed from Debian where the mta package simply is creating a link to the actual mta and the sysadms change the pointer by running update-alternatives? -- Hilsen/Regards Michael Rasmussen Get my public GnuPG keys: michael rasmussen cc http://pgp.mit.edu:11371/pks/lookup?op=get&search=0xD3C9A00E mir datanom net http://pgp.mit.edu:11371/pks/lookup?op=get&search=0xE501F51C mir miras org http://pgp.mit.edu:11371/pks/lookup?op=get&search=0xE3E80917 -------------------------------------------------------------- /usr/games/fortune -es says: So you're back... about time... -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 181 bytes Desc: OpenPGP digital signature URL: From omnios at citrus-it.net Sat Mar 7 12:42:48 2015 From: omnios at citrus-it.net (Andy) Date: Sat, 7 Mar 2015 12:42:48 +0000 (GMT) Subject: [OmniOS-discuss] Bloody // mailwrapper & mta mediator In-Reply-To: <20150307125443.0240edbb@sleipner.datanom.net> References: <20150307125443.0240edbb@sleipner.datanom.net> Message-ID: On Sat, 7 Mar 2015, Michael Rasmussen wrote: ; On Sat, 7 Mar 2015 11:17:03 +0000 (GMT) ; Andy wrote: ; ; > ; > Can anyone see any other practical and supportable options that would ; > allow us to replace the sendmail package wholesale? ; > ; Could something be borrowed from Debian where the mta package simply is ; creating a link to the actual mta and the sysadms change the pointer by ; running update-alternatives? That's precisely what IPS pkg mediators do: aomni# (168) ls -l /usr/lib/sendmail lrwxrwxrwx 1 root root 32 Mar 4 13:49 /usr/lib/sendmail -> ../../opt/sendmail/sbin/sendmail* aomni# (172) pkg set-mediator -I mailwrapper mta aomni# (173) ls -l /usr/lib/sendmail lrwxrwxrwx 1 root root 11 Mar 4 15:21 /usr/lib/sendmail -> mailwrapper* I've no problem with that as a solution for the MTA binaries aomni# (178) pkg contents -a mediator=mta -o action.raw service/network/smtp/sendmail ACTION.RAW link mediator=mta mediator-implementation=sendmail path=usr/bin/mailq target=../lib/smtp/sendmail/mailq link mediator=mta mediator-implementation=sendmail path=usr/sbin/sendmail target=../lib/smtp/sendmail/sendmail link mediator=mta mediator-implementation=sendmail path=usr/sbin/newaliases target=../lib/smtp/sendmail/newaliases link mediator=mta mediator-implementation=sendmail path=etc/aliases target=./mail/aliases link mediator=mta mediator-implementation=sendmail path=usr/lib/sendmail target=../lib/smtp/sendmail/sendmail but it doesn't help me with /etc/mail, the SMF service or the /etc/init.d script that value=pkg://omnios/service/network/smtp/sendmail at 8.14.4,5.11-0.151013 installs. A. 
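For the pieces the mediator does not cover, one hedged option is to enumerate exactly which unmediated files the stock package drops under /etc/mail, and to keep smtp:sendmail disabled declaratively with an SMF site profile so it stays off across upgrades. A sketch; the profile filename is arbitrary:

# What does the stock package deliver outside the mta mediator?
pkg contents -o action.name,path -t file,dir service/network/smtp/sendmail | grep etc/mail

# Site profile that keeps the delivered instance disabled
cat > /etc/svc/profile/site-no-sendmail.xml <<'EOF'
<?xml version="1.0"?>
<!DOCTYPE service_bundle SYSTEM "/usr/share/lib/xml/dtd/service_bundle.dtd.1">
<service_bundle type="profile" name="site-no-sendmail">
  <service name="network/smtp" version="1" type="service">
    <instance name="sendmail" enabled="false"/>
  </service>
</service_bundle>
EOF
svccfg apply /etc/svc/profile/site-no-sendmail.xml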
-- Citrus IT Limited | +44 (0)870 199 8000 | enquiries at citrus-it.co.uk Rock House Farm | Green Moor | Wortley | Sheffield | S35 7DQ Registered in England and Wales | Company number 4899123 From johan.kragsterman at capvert.se Sat Mar 7 15:24:36 2015 From: johan.kragsterman at capvert.se (Johan Kragsterman) Date: Sat, 7 Mar 2015 16:24:36 +0100 Subject: [OmniOS-discuss] Ang: Re: QLE2652 I/O Disconnect. Heat Sinks? In-Reply-To: References: , <4593924c-380f-4705-871f-3105d937c5f7@careyweb.com> <4d6c8571-a7b9-4c31-bd87-2fe6e611f214@careyweb.com> Message-ID: -----"OmniOS-discuss" skrev: ----- Till: "'Nate Smith'" , "'Richard Elling'" Fr?n: Rune Tipsmark S?nt av: "OmniOS-discuss" Datum: 2015-03-07 04:06 Kopia: "omnios-discuss at lists.omniti.com" ?rende: Re: [OmniOS-discuss] QLE2652 I/O Disconnect. Heat Sinks? No idea to be honest, even if there is its scary if it can cause these kinds of problems… Br, Rune You don't know wether these systems got risers or not? That can't be difficult to find out: Are the HBA's located directly in PCIe slots on the system board, or are they instead located in riser boards that sits in the PCIe slots? It would be very interesting to find out.... If Richards theory is correct, you got HBA's sitting in risers on the Supermicro, but on the HP you got the HBA's directly in the PCIe slots on the system board. Rgrds Johan ? From: OmniOS-discuss [mailto:omnios-discuss-bounces at lists.omniti.com] On Behalf Of Nate Smith Sent: Friday, March 06, 2015 8:57 AM To: 'Richard Elling' Cc: omnios-discuss at lists.omniti.com Subject: Re: [OmniOS-discuss] QLE2652 I/O Disconnect. Heat Sinks? ? Yeah, there is on R720s, I think. What about on the Supermicro and HP servers? ? From: Richard Elling [mailto:richard.elling at richardelling.com] Sent: Friday, March 06, 2015 11:39 AM To: Nate Smith Cc: omnios-discuss at lists.omniti.com Subject: Re: [OmniOS-discuss] QLE2652 I/O Disconnect. Heat Sinks? ? ? On Mar 5, 2015, at 6:00 AM, Nate Smith wrote: ? I’ve had this problem for a while, and I have no way to diagnose what is going on, but occasionally when system IO gets high (I’ve seen it happen especially on backups), I will lose connectivity with my Fibre Channel cards which serve up fibre channel LUNS to a VM cluster. All hell breaks loose, and then connectivity gets restored. I don’t get an error that it’s dropped, at least not on the Omnios system, but I get notice when it’s restored (which makes no sense). I’m wondering if the cards are just overheating, and if heat sinks with a fan would help on the io chip. ? Is there a PCI bridge in the data path? These can often be found on mezzanine or riser cards. ?-- richard ? ? Mar? 5 01:55:01 newstorm fct: [ID 132490 kern.notice] NOTICE: qlt2,0 LINK UP, portid 20000, topology Fabric Pt-to-Pt,speed 8G Mar? 5 01:56:26 newstorm fct: [ID 132490 kern.notice] NOTICE: qlt0,0 LINK UP, portid 20100, topology Fabric Pt-to-Pt,speed 8G Mar? 5 02:00:13 newstorm last message repeated 1 time Mar? 5 02:00:15 newstorm fct: [ID 132490 kern.notice] NOTICE: qlt3,0 LINK UP, portid 10000, topology Fabric Pt-to-Pt,speed 8G Mar? 5 02:00:15 newstorm fct: [ID 132490 kern.notice] NOTICE: qlt2,0 LINK UP, portid 20000, topology Fabric Pt-to-Pt,speed 8G Mar? 5 02:00:18 newstorm fct: [ID 132490 kern.notice] NOTICE: qlt1,0 LINK UP, portid 10100, topology Fabric Pt-to-Pt,speed 8G _______________________________________________ OmniOS-discuss mailing list OmniOS-discuss at lists.omniti.com http://lists.omniti.com/mailman/listinfo/omnios-discuss ? 
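Richard's question about a PCI bridge in the data path can be answered from the device tree: on illumos, intermediate nodes bound to the pcieb driver between the root complex and the qlt instances are PCIe bridges/switches, which is typically what a riser or mezzanine adds. A rough sketch (output layout varies by platform and SMBIOS support):

# Driver bindings for the whole device tree; look at the nesting above the qlt nodes
prtconf -D | less

# Slot/riser inventory, where the platform exposes it
prtdiag -v | grep -i slot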
_______________________________________________ OmniOS-discuss mailing list OmniOS-discuss at lists.omniti.com http://lists.omniti.com/mailman/listinfo/omnios-discuss From rt at steait.net Sat Mar 7 16:14:07 2015 From: rt at steait.net (Rune Tipsmark) Date: Sat, 7 Mar 2015 16:14:07 +0000 Subject: [OmniOS-discuss] QLE2652 I/O Disconnect. Heat Sinks? In-Reply-To: References: , <4593924c-380f-4705-871f-3105d937c5f7@careyweb.com> <4d6c8571-a7b9-4c31-bd87-2fe6e611f214@careyweb.com>, Message-ID: <1425744844314.65927@steait.net> ok, so HP has a riser and the FC cards are sitting in the riser. SM has no riser and all cards are inserted directly onto motherboard. Also I just remembered... when I had the Infiniband ConnectX2 installed in the SM it would not reboot and I always had to reset it via IPMI. The more I think about it the more I lean towards SM having an issue... and Dell uses essentially SM so same same. br, Rune ________________________________________ From: Johan Kragsterman Sent: Saturday, March 7, 2015 4:24 PM To: Rune Tipsmark Cc: 'Nate Smith'; 'Richard Elling'; omnios-discuss at lists.omniti.com Subject: Ang: Re: [OmniOS-discuss] QLE2652 I/O Disconnect. Heat Sinks? -----"OmniOS-discuss" skrev: ----- Till: "'Nate Smith'" , "'Richard Elling'" Fr?n: Rune Tipsmark S?nt av: "OmniOS-discuss" Datum: 2015-03-07 04:06 Kopia: "omnios-discuss at lists.omniti.com" ?rende: Re: [OmniOS-discuss] QLE2652 I/O Disconnect. Heat Sinks? No idea to be honest, even if there is its scary if it can cause these kinds of problems… Br, Rune You don't know wether these systems got risers or not? That can't be difficult to find out: Are the HBA's located directly in PCIe slots on the system board, or are they instead located in riser boards that sits in the PCIe slots? It would be very interesting to find out.... If Richards theory is correct, you got HBA's sitting in risers on the Supermicro, but on the HP you got the HBA's directly in the PCIe slots on the system board. Rgrds Johan From: OmniOS-discuss [mailto:omnios-discuss-bounces at lists.omniti.com] On Behalf Of Nate Smith Sent: Friday, March 06, 2015 8:57 AM To: 'Richard Elling' Cc: omnios-discuss at lists.omniti.com Subject: Re: [OmniOS-discuss] QLE2652 I/O Disconnect. Heat Sinks? Yeah, there is on R720s, I think. What about on the Supermicro and HP servers? From: Richard Elling [mailto:richard.elling at richardelling.com] Sent: Friday, March 06, 2015 11:39 AM To: Nate Smith Cc: omnios-discuss at lists.omniti.com Subject: Re: [OmniOS-discuss] QLE2652 I/O Disconnect. Heat Sinks? On Mar 5, 2015, at 6:00 AM, Nate Smith wrote: I’ve had this problem for a while, and I have no way to diagnose what is going on, but occasionally when system IO gets high (I’ve seen it happen especially on backups), I will lose connectivity with my Fibre Channel cards which serve up fibre channel LUNS to a VM cluster. All hell breaks loose, and then connectivity gets restored. I don’t get an error that it’s dropped, at least not on the Omnios system, but I get notice when it’s restored (which makes no sense). I’m wondering if the cards are just overheating, and if heat sinks with a fan would help on the io chip. Is there a PCI bridge in the data path? These can often be found on mezzanine or riser cards. 
-- richard

Mar 5 01:55:01 newstorm fct: [ID 132490 kern.notice] NOTICE: qlt2,0 LINK UP, portid 20000, topology Fabric Pt-to-Pt,speed 8G
Mar 5 01:56:26 newstorm fct: [ID 132490 kern.notice] NOTICE: qlt0,0 LINK UP, portid 20100, topology Fabric Pt-to-Pt,speed 8G
Mar 5 02:00:13 newstorm last message repeated 1 time
Mar 5 02:00:15 newstorm fct: [ID 132490 kern.notice] NOTICE: qlt3,0 LINK UP, portid 10000, topology Fabric Pt-to-Pt,speed 8G
Mar 5 02:00:15 newstorm fct: [ID 132490 kern.notice] NOTICE: qlt2,0 LINK UP, portid 20000, topology Fabric Pt-to-Pt,speed 8G
Mar 5 02:00:18 newstorm fct: [ID 132490 kern.notice] NOTICE: qlt1,0 LINK UP, portid 10100, topology Fabric Pt-to-Pt,speed 8G

_______________________________________________
OmniOS-discuss mailing list
OmniOS-discuss at lists.omniti.com
http://lists.omniti.com/mailman/listinfo/omnios-discuss

From johan.kragsterman at capvert.se Sat Mar 7 16:39:37 2015
From: johan.kragsterman at capvert.se (Johan Kragsterman)
Date: Sat, 7 Mar 2015 17:39:37 +0100
Subject: [OmniOS-discuss] Ang: RE: Re: QLE2652 I/O Disconnect. Heat Sinks?
In-Reply-To: <1425744844314.65927@steait.net>
References: <1425744844314.65927@steait.net>, , <4593924c-380f-4705-871f-3105d937c5f7@careyweb.com> <4d6c8571-a7b9-4c31-bd87-2fe6e611f214@careyweb.com>,
Message-ID: 

An HTML attachment was scrubbed...
URL: 

From brogyi at gmail.com Sat Mar 7 20:56:29 2015
From: brogyi at gmail.com (Brogyányi József)
Date: Sat, 07 Mar 2015 21:56:29 +0100
Subject: [OmniOS-discuss] lsi sas 9211-8i it efi firmware 20.00.02.00
In-Reply-To: <54FA244B.1010600@gnaa.net>
References: <32b301d057ba$d6168cc0$8243a640$@acm.org> <54F9FD0B.1040601@gnaa.net> <331901d05846$e406cbb0$ac146310$@acm.org> <54FA198B.7090300@gnaa.net> <54FA244B.1010600@gnaa.net>
Message-ID: <54FB65FD.6040600@gmail.com>

Has anyone tested this firmware? Is it free from this error message "Parity Error on path"? Thanks for any information.

BR
Brogyi

From wverb73 at gmail.com Sun Mar 8 22:58:01 2015
From: wverb73 at gmail.com (W Verb)
Date: Sun, 8 Mar 2015 15:58:01 -0700
Subject: [OmniOS-discuss] [developer] Re: The ixgbe driver, Lindsay Lohan, and the Greek economy
In-Reply-To: <7e4156d48aba46239fb4d490577382cd@NASANEXM01F.na.qualcomm.com>
References: <20150221032559.727D07A0792@apps0.cs.toronto.edu> <54EAEFA8.4020101@osn.de> <54EB5392.6030900@osn.de> <279C084B-E76B-4F38-A7CF-D6CB37D06D8F@damore.org> <54F44602.5030705@osn.de> <064C3FC8-43E0-4757-AADA-831303782A4C@damore.org> <7e4156d48aba46239fb4d490577382cd@NASANEXM01F.na.qualcomm.com>
Message-ID: 

Hello,

I was able to perform my last round of testing last night. The tests were done with a single host, while enabling one or two 1G ports.

                    1 port (Read)   2 ports (Read)
Baseline:           130MB/s         30MB/s
Disable LRO:        90MB/s          27MB/s
Disable LRO/LSO:    88MB/s          27MB/s

LRO/LSO enabled, TCP window size varied (default iscsid maximum of 256k):

64k Window          96MB/s          28MB/s
32k Window          72MB/s          22MB/s
16k Window          61MB/s          17MB/s

I then set everything back to the default and captured exactly what happens when I start a single port transfer then enable a second port in the middle. It's pretty illustrative. The server chokes a bit, then strangely sends TDS protocol packets saying "exception occurred". I didn't know TDS had anything to do with iSCSI.
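One note on the "TDS" oddity: that is almost certainly Wireshark's heuristic dissector mis-classifying the stalled iSCSI stream rather than actual TDS traffic. Forcing the decoder makes the captures linked just below easier to line up; a sketch assuming the default iSCSI port 3260 and placeholder capture filenames:

# Decode everything on 3260 as iSCSI instead of whatever the heuristics guess,
# and dump a per-PDU timeline for each interface's capture.
tshark -r port1.pcap -d tcp.port==3260,iscsi \
       -T fields -e frame.time_relative -e ip.src -e iscsi.opcode -e tcp.analysis.retransmission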
Captures from both interfaces here: https://drive.google.com/open?id=0BwyUMjibonYQMG8zZnNWbk40Ymc&authuser=0 So it seems that window size isn't the limiting factor here. I am in the middle of implementing infiniband now. I can highly recommend the Silverstorm (QLogic) 9024CU 20G 4096MTU (with latest firmware) switch for the lab. The fans run very quietly at normal temps, and they are very inexpensive ($250 on eBay). It supports ethernet out-of-band management, as well as a subnet manager web app hosted from the switch itself. Creating the serial console cable was mildly irritating. The latest firmware can be retrieved from the QLogic site via a Google search, you won't find a link on their support frontpage. I'll report back once I have iSER / SRP results. -Warren V On Wed, Mar 4, 2015 at 9:14 AM, Mallory, Rob wrote: > Hi Warren, > > [ ?no objections here if you want to take this thread off-line to a > smaller group? I wanted to post this to the larger groups for benefit of > others, > > And maybe if you find success in the end you can post back to the larger > groups with a summary ] > > > > Your recent success case going to 10GbE end to end seem to back up my > theory of overload. > > I noticed yesterday a couple things: in the packet capture of the > sender, it is apparent that large send offload LSO is being used (notice > the size up to 64k packets). Among other things, this makes it a bit > harder to tune and understand what is happening from packet captures on the > hosts. It also lets the NIC ?tightly pack? the outgoing packet stream > without much gap between the MTU sized packets. You need to get a 3rd-party > snoop on the wire to see what the wire sees. Same thing on the ESXi > server. I suspect it is using LRO or RSC, also ganging up the packets, and > making it difficult to diagnose from a tcpdump on a VM. > > > > I still stand by my original inclination. And the data you have shown > also seems to back this. (smaller MTU = less drops/pauses and then the > latest: equal size pipes on send and receive make it work fine.) I was > recommending that you either/both decrease the rwin on the client side > (limiting it to an absurdly small, but not for this case, 17KB) or on the > server. > > > > It makes more sense to control the window-size on the server, (to tune > just for these small-bandwidth clients) and use a host-route with ?sendpipe > (or is it ?ssthresh) because then the server will limit (and hopefully set > some timers on the LSO part of the ixgbe) the amount of in-flight so-as the > last-switch hop does not drop those important initial packets on > tcp-slowstart. > > > > So my original recommendation still stands: limit the rwin on the client > to 17K, if you want to continue to use a 1GbE interface on it. > > (your BDP @ 150us * 1GbE is about 24k, I?d pick a max receive window size > smaller than this to be more conservative in the case of two interfaces) > > (and yes, only 2 x 9K jumbo frames fit in that BDP, so those 3k jumbos > had less loss for a good reason, 1500MTU is probably best in your case) > > And two other things to help identify/understand the situation (in the > mode you had it before with the quad-1GbE in the client) > > You can turn off LSO (in ixbge.conf) and also on the ESX, and the client > VM side you can turn off LRO or RSC. 
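A sketch of the knobs Rob refers to here; the lso_enable property name is the one used by the illumos ixgbe driver's ixgbe.conf, and the ESXi option names are from memory, so both are worth verifying before relying on them:

# OmniOS target: turn off large send offload in the driver config
# (append to /kernel/drv/ixgbe.conf, then reboot or re-attach the driver)
echo 'lso_enable = 0;' >> /kernel/drv/ixgbe.conf

# ESXi host: software TSO/LRO toggles
# (check the exact names with 'esxcli system settings advanced list')
esxcli system settings advanced set -o /Net/UseHwTSO -i 0
esxcli system settings advanced set -o /Net/TcpipDefLROEnabled -i 0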
> > > > Note: there was (maybe still is) a long-standing bug in the S10 and S11 > TCP stack which I know of only second-hand from a reliable source, > > It has very similar conditions that you describe here, including high > bandwidth servers, low bandwidth clients, multiple hops. > > The workaround is to disable RSC on the linux clients. > > > > good to hear that you can configure the system end-to-end 10GbE. That?s > the obvious best case if you stick eith ethernet, and you don?t have to go > to extremes like above. Note that the lossless fabric of Infiniband will > completely hide these effects of TCP loss. Also, you can use much more > efficient transport such as SRP (RDMA) which I think would fit really well > if you can afford the additional complexity. (I?ve done this, and it?s > really not that hard on small scale). > > > > Cheers, Rob > > > > > > *From:* W Verb [mailto:wverb73 at gmail.com] > *Sent:* Tuesday, March 03, 2015 9:22 PM > *To:* Mallory, Rob; illumos-dev > *Cc:* Garrett D'Amore; Joerg Goltermann; omnios-discuss at lists.omniti.com > > *Subject:* Re: [developer] Re: [OmniOS-discuss] The ixgbe driver, Lindsay > Lohan, and the Greek economy > > > > Hello all, > > This is probably the last message in this thread. > > I pulled the quad-gig NIC out of one of my hosts, and installed an X520. I > then set a single 10G port on the server to be on the same VLAN as the > host, and defined a vswitch, vmknic, etc on the host. > > I set the MTU to be 9000 on both sides, then ran my tests. > > Read: 130 MB/s. > > Write: 156 MB/s. > > Additionally, at higher MTUs, the NIC would periodically lock up until I > performed an "ipadm disable-if -t ixgbe0" and re-enabled it. I tried your > updated driver, Jeorg, but unfortunately it failed quite often. > > > > I then disabled stmf, enabled NFS (v3 only) on the server, and shared a > dataset on the zpool with "share -f nfs /ppool/testy". > I then mounted the server dataset on the host via NFS, and copied my test > VM from the iSCSI zvol to the NFS dataset. I also removed the binding of > the 10G port on the host from the sw iscsi interface. > > Running the same tests on the VM over NFSv3 yielded: > > Read: 650MB/s > > Write: 306MB/s > > This is getting within 10% of the throughput I consistently get on dd > operations local on the server, so I'm pretty happy that I'm getting as > good as I'm going to get until I add more drives. Additionally, I haven't > experienced any NIC hangs. > > > > I tried varying the settings in ixgbe.conf, the MTU, and disabling LSO on > the host and server, but nothing really made that much of a difference > (except reducing the MTU made things about 20-30% slower). > > mpstat during both NFS and iSCSI transfers showed all processors as > getting roughly the same number of interrupts, etc, although I did see a > varying number of spins on reader/writer locks during the iSCSI transfers. > The NFS showed no srws at all. 
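To put numbers behind the srw/smtx difference (a sample follows just below), it may help to capture identical windows for an iSCSI run and an NFS run and diff them afterwards. A minimal sketch, with arbitrary paths and a 30-second window:

#!/bin/sh
# Usage: ./grab.sh iscsi   (repeat with: ./grab.sh nfs while the same test runs)
tag=$1
mpstat 1 30 > /var/tmp/mpstat.$tag &
lockstat -kWP -s 5 sleep 30 > /var/tmp/lockstat.$tag
wait
# Afterwards, compare the srw/smtx columns and the hottest lock callers:
#   diff /var/tmp/lockstat.iscsi /var/tmp/lockstat.nfs | less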
> > Here is a pretty representative example of a 1s mpstat during an iSCSI > transfer: > > CPU minf mjf xcal intr ithr csw icsw migr smtx srw syscl usr sys wt > idl set > 0 0 0 0 3246 2690 8739 6 772 5967 2 0 0 11 0 > 89 0 > 1 0 0 0 2366 2249 7910 8 988 5563 2 302 0 9 0 > 91 0 > 2 0 0 0 2455 2344 5584 5 687 5656 3 66 0 9 0 > 91 0 > 3 0 0 25 248 12 6210 1 885 5679 2 0 0 9 0 > 91 0 > 4 0 0 0 284 7 5450 2 861 5751 1 0 0 8 0 > 92 0 > 5 0 0 0 232 3 4513 0 547 5733 3 0 0 7 0 > 93 0 > 6 0 0 0 322 8 6084 1 836 6295 2 0 0 8 0 > 92 0 > 7 0 0 0 3114 2848 8229 4 648 4966 2 0 0 10 0 > 90 0 > > > > So, it seems that it's COMSTAR/iSCSI that's broke as hell, not ixgbe. My > apologies to anyone I may have offended with my pre-judgement. > > The consequences of this performance issue are significant: > > 1: Instead of being able to utilize the existing quad-port NICs I have in > my hosts, I must use dual 10G cards for redundancy purposes. > > 2: I must build out a full 10G switching infrastructure. > > 3: The network traffic is inherently less secure, as it is essentially > impossible to do real security with NFSv3 (that is supported by ESXi). > > In the short run, I have already ordered some relatively cheap 20G > infiniband gear that will hopefully push up the cost/performance ratio. > However, I have received all sorts of advice about how painful it can be to > build and maintain infiniband, and if iSCSI over 10G ethernet is this > painful, I'm not hopeful that infiniband will "just work". > > The last option, of course, is to bail out of the Solaris derivatives and > move to ZoL or ZoBSD. The drawbacks of this are: > > 1: ZoL doesn't easily support booting off of mirrored USB flash drives, > let alone running the root filesystem and swap on them. FreeNAS, by way of > comparison, puts a 2G swap partition on each zdev, which (strangely enough) > causes it to often crash when a zdev experiences a failure under load. > > 2: Neither ZoL or FreeNAS have good, stable, kernel-based iSCSI > implementations. FreeNAS is indeed testing istgt, but it proved unstable > for my purposes in recent builds. Unfortunately, stmf hasn't proved itself > any better. > > > > There are other minor differences, but these are the ones that brought me > to OmniOS in the first place. We'll just have to wait and see how well the > infiniband stuff works. > > Hopefully this exercise will help prevent others from going down the > same rabbit-hole that I did. > > -Warren V > > > > > > > > > > On Tue, Mar 3, 2015 at 3:45 PM, W Verb wrote: > > Hello Rob et al, > > Thank you for taking the time to look at this problem with me. I > completely understand your inclination to look at the network as the most > probable source of my issue, but I believe that this is a pretty clear-cut > case of server-side issues. > > 1: I did run ping RTT tests during both read and write operations with > multiple interfaces enabled, and the RTT stayed at ~.2ms regardless of > whether traffic was actively being transmitted/received or not. > > 2: I am not seeing the TCP window size bouncing around, and I am certainly > not seeing starvation and delay in my packet captures. It is true that I do > see delayed ACKs and retransmissions when I bump the MTU to 9000 on both > sides, but I stopped testing with high MTU as soon as I saw it happening > because I have a good understanding of incast. All of my recent testing has > been with MTUs between 1000 and 3000 bytes. 
> > 3: When testing with MTUs between 1000 and 3000 bytes, I do not see lost > packets and retransmission in captures on either the server or client side. > I only see staggered transmission delays on the part of the server. > > 4: The client is consistently advertising a large window size (20k+), so > the TCP throttling mechanism does not appear to play into this. > > 5: As mentioned previously, layer 2 flow control is not enabled anywhere > in the network, so there are no lower-level mechanisms at work. > > 6: Upon checking buffer and queue sizes (and doing the appropriate > research into documentation on the C3560E's buffer sizes), I do not see > large numbers of frames being dropped by the switch. It does happen at > larger MTUs, but not very often (and not consistently) during transfers at > 1000-3000 byte MTUs. I do not have QoS, policing, or rate-shaping enabled. > > 7: Network interface stats on both the server and the ESXi client show no > errors of any kind. This is via netstat on the server, and esxcli / Vsphere > client on the ESXi box. > > 8: When looking at captures taken simultaneously on the server and client > side, the server-side transmission pauses are consistently seen and > reproducible, even after multiple server rebuilds, ESXi rebuilds, vSphere > reinstallations (down to wiping the SQL db), various COMSTAR configuration > variations, multiple 10G NICs with different NIC chipsets, multiple > switches (I tried both a 48-port and 24-port C3560E), multiple IOS > revisions (12.2 and 15.0), OmniOS versions (r151012 and previous) multiple > cables, transceivers, etc etc etc etc etc > > > For your review, I have uploaded the actual packet captures to Google > Drive: > > > https://drive.google.com/file/d/0BwyUMjibonYQZS03dHJ2ZjJvTEE/view?usp=sharing > 2 int write - ESXi vmk5 > > https://drive.google.com/file/d/0BwyUMjibonYQTlNTQ2M5bjlxZ00/view?usp=sharing > 2 int write - ESXi vmk1 > > https://drive.google.com/file/d/0BwyUMjibonYQUEFsSVJCYXBVX3c/view?usp=sharing > 2 int read - server ixgbe0 > > https://drive.google.com/file/d/0BwyUMjibonYQT3FBbElnNFpJTzQ/view?usp=sharing > 2 int read - ESXi vmk5 > > https://drive.google.com/file/d/0BwyUMjibonYQU1hXdFRLM2cxSTA/view?usp=sharing > 2 int read - ESXi vmk1 > > https://drive.google.com/file/d/0BwyUMjibonYQNEFZSHVZdFNweDA/view?usp=sharing > 1 int write - ESXi vmk1 > > https://drive.google.com/file/d/0BwyUMjibonYQM3FpTmloQm5iMGc/view?usp=sharing > 1 int read - ESXi vmk1 > > Regards, > > Warren V > > > > On Mon, Mar 2, 2015 at 1:11 PM, Mallory, Rob > wrote: > > Just an EWAG, and forgive me for not following closely, I just saw > this in my inbox, and looked at it and the screenshots for 2 minutes. > > > > But this looks like the typical incast problem.. see > http://www.pdl.cmu.edu/Incast/ > > where your storage servers (there are effectively two with ISCSI/MPIO if > round-robin is working) have networks which are 20:1 oversubscribed to your > 1GbE host interfaces. (although one of the tcpdumps shows only one server > so it may be choked out completely) > > > > What is your BDP? I?m guessing .150ms * 1GbE. For single-link that gets > you to a MSS of 18700 or so. > > > > On your 1GbE connected clients, leave MTU at 9k, set the following in > sysctl.conf, > > And reboot. > > > > net.ipv4.tcp_rmem = 4096 8938 17876 > > > > If MPIO from the server is indeed round-robining properly, this will ?make > things fit? much better. 
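For reference, applying Rob's suggested cap on the Linux guest is just a sysctl; the values below are the ones he gives above, and they only affect connections opened after the change:

# Clamp the guest's TCP receive window to roughly the 1GbE x 150us BDP
sysctl -w net.ipv4.tcp_rmem="4096 8938 17876"

# Make it persistent across reboots
echo 'net.ipv4.tcp_rmem = 4096 8938 17876' >> /etc/sysctl.conf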
> > > > Note that your tcp_wmem can and should stay high, since you are not > oversubscribed going from client?server ; you only need to tweak the tcp > receive window size. > > > > I?ve not done it in quite some time, but IIRC, You can also set these from > the server side with: > > Route add -sendpipe 8930 or ?ssthresh > > > > And I think you can see the hash-table with computed BDP per client with > ndd. > > > > I would try playing with those before delving deep into potential bugs in > the TCP, nic driver, zfs, or vm. > > -Rob > > > > *From:* W Verb via illumos-developer [mailto:developer at lists.illumos.org] > *Sent:* Monday, March 02, 2015 12:20 PM > *To:* Garrett D'Amore > *Cc:* Joerg Goltermann; illumos-dev; omnios-discuss at lists.omniti.com > *Subject:* Re: [developer] Re: [OmniOS-discuss] The ixgbe driver, Lindsay > Lohan, and the Greek economy > > > > Hello, > > vmstat seems pretty boring. Certainly nothing going to swap. > > root at sanbox:/root# vmstat > kthr memory page disk faults cpu > r b w swap free re mf pi po fr de sr po ro s0 s2 in sy cs us > sy id > 0 0 0 34631632 30728068 175 215 0 0 0 0 963 275 4 6 140 3301 796 6681 0 > 1 99 > > Here is the "taskq_dispatch_ent" output from "lockstat -s 5 -kWP sleep 30" > during the "fast" write operation. > > > ------------------------------------------------------------------------------- > Count indv cuml rcnt nsec Hottest Lock Caller > 50934 3% 79% 0.00 3437 0xffffff093145ba40 taskq_dispatch_ent > > nsec ------ Time Distribution ------ count Stack > 128 | 7 spa_taskq_dispatch_ent > 256 |@@ 4333 zio_taskq_dispatch > 512 |@@ 3863 zio_issue_async > 1024 |@@@@@ 9717 zio_execute > 2048 |@@@@@@@@@ 15904 > 4096 |@@@@ 7595 > 8192 |@@ 4498 > 16384 |@ 2662 > 32768 |@ 1886 > 65536 | 434 > 131072 | 34 > 262144 | 1 > > ------------------------------------------------------------------------------- > > However, the truly "broken" function is a read operation: > > Top lock 1st try: > > ------------------------------------------------------------------------------- > Count indv cuml rcnt nsec Hottest Lock Caller > 474 15% 15% 0.00 7031 0xffffff093145b6f8 cv_wait > > nsec ------ Time Distribution ------ count Stack > 256 |@ 29 taskq_thread_wait > 512 |@@@@@@ 100 taskq_thread > 1024 |@@@@ 72 thread_start > 2048 |@@@@ 69 > 4096 |@@@ 51 > 8192 |@@ 47 > 16384 |@@ 44 > 32768 |@@ 32 > 65536 |@ 25 > 131072 | 5 > > ------------------------------------------------------------------------------- > > Top lock 2nd try: > > > ------------------------------------------------------------------------------- > Count indv cuml rcnt nsec Hottest Lock Caller > 174 39% 39% 0.00 103909 0xffffff0943f116a0 dmu_zfetch_find > > nsec ------ Time Distribution ------ count Stack > 2048 | 2 dmu_zfetch > 4096 | 3 dbuf_read > 8192 | 4 > dmu_buf_hold_array_by_dnode > 16384 | 3 dmu_buf_hold_array > 32768 |@ 7 > 65536 |@@ 14 > 131072 |@@@@@@@@@@@@@@@@@@@@ 116 > 262144 |@@@ 19 > 524288 | 4 > 1048576 | 2 > > ------------------------------------------------------------------------------- > > Top lock 3rd try: > > > ------------------------------------------------------------------------------- > Count indv cuml rcnt nsec Hottest Lock Caller > 283 55% 55% 0.00 94602 0xffffff0943ff5a68 dmu_zfetch_find > > nsec ------ Time Distribution ------ count Stack > 512 | 1 dmu_zfetch > 1024 | 1 dbuf_read > 2048 | 0 > dmu_buf_hold_array_by_dnode > 4096 | 5 dmu_buf_hold_array > 8192 | 2 > 16384 | 7 > 32768 | 4 > 65536 |@@@ 33 > 131072 |@@@@@@@@@@@@@@@@@@@@ 198 > 262144 |@@ 27 > 524288 | 2 > 
1048576 | 3 > > ------------------------------------------------------------------------------- > > > > As for the MTU question- setting the MTU to 9000 makes read operations > grind almost to a halt at 5MB/s transfer rate. > > -Warren V > > > > On Mon, Mar 2, 2015 at 11:30 AM, Garrett D'Amore > wrote: > > Here?s a theory. You are using small (relatively) MTUs (3000 is less > than the smallest ZFS block size.) So, when you go multipathing this way, > might a single upper layer transaction (ZFS block transfer request, or for > that matter COMSTAR block request) get routed over different paths. This > sounds like a potentially pathological condition to me. > > > > What happens if you increase the MTU to 9000? Have you tried it? I?m > sort of thinking that this will permit each transaction to be issued in a > single IP frame, which may alleviate certain tragic code paths. (That > said, I?m not sure how aware COMSTAR is of the IP MTU. If it is ignorant, > then it shouldn?t matter *that* much, since TCP should do the right thing > here and a single TCP stream should stick to a single underlying NIC. But > if COMSTAR is aware of the MTU, it may do some really screwball things as > it tries to break requests up into single frames.) > > > > Your read spin really looks like only about 22 msec of wait out of a total > run of 30 sec. (That?s not *great*, but neither does it sound tragic.) > Your write is interesting because that looks like it is going a wildly > different path. You should be aware that the locks you see are *not* > necessarily related in call order, but rather are ordered by instance > count. The write code path hitting the task_thread as hard as it does is > really, really weird. Something is pounding on a taskq lock super hard. > The number of taskq_dispatch_ent calls is interesting here. I?m starting > to wonder if it?s something as stupid as a spin where if the taskq is > ?full? (max size reached), a caller just is spinning trying to dispatch > jobs to the taskq. > > > > The taskq_dispatch_ent code is super simple, and it should be almost > impossible to have contention on that lock ? barring a thread spinning hard > on taskq_dispatch (or taskq_dispatch_ent as I think is happening here). > Looking at the various call sites, there are places in both COMSTAR > (iscsit) and in ZFS where this could be coming from. To know which, we > really need to have the back trace associated. > > > > lockstat can give this ? try giving ?-s 5? to give a short backtrace from > this, that will probably give us a little more info about the guilty > caller. :-) > > > > - Garrett > > > > On Mar 2, 2015, at 11:07 AM, W Verb via illumos-developer < > developer at lists.illumos.org> wrote: > > > > Hello all, > > I am not using layer 2 flow control. The switch carries line-rate 10G > traffic without error. > > I think I have found the issue via lockstat. The first lockstat is taken > during a multipath read: > > lockstat -kWP sleep 30 > > > Adaptive mutex spin: 21331 events in 30.020 seconds (711 events/sec) > > Count indv cuml rcnt nsec Hottest Lock Caller > > ------------------------------------------------------------------------------- > 9306 44% 44% 0.00 1557 htable_mutex+0x370 htable_release > 6307 23% 68% 0.00 1207 htable_mutex+0x108 htable_lookup > 596 7% 75% 0.00 4100 0xffffff0931705188 cv_wait > 349 5% 80% 0.00 4437 0xffffff0931705188 taskq_thread > 704 2% 82% 0.00 995 0xffffff0935de3c50 dbuf_create > > The hash table being read here I would guess is the tcp connection hash > table. 
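One hedged observation on that guess: htable_lookup and htable_release belong to the x86 HAT (page-table) code rather than to the TCP connection hash, so the hot hash table is more likely VM/page-table traffic than networking. Deeper lockstat backtraces would settle it; a sketch:

# Record more stack frames per event so the callers of the htable locks show up
lockstat -kWP -s 8 sleep 30 > /var/tmp/lockstat.read.stacks

# Then read the stacks recorded above the htable_mutex entries
less /var/tmp/lockstat.read.stacks    # search for 'htable'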
> > > > When lockstat is run during a multipath write operation, I get: > > Adaptive mutex spin: 1097341 events in 30.016 seconds (36558 events/sec) > > Count indv cuml rcnt nsec Hottest Lock Caller > > ------------------------------------------------------------------------------- > 210752 28% 28% 0.00 4781 0xffffff0931705188 taskq_thread > 174471 22% 50% 0.00 4476 0xffffff0931705188 cv_wait > 127183 10% 61% 0.00 2871 0xffffff096f29b510 zio_notify_parent > 176066 10% 70% 0.00 1922 0xffffff0931705188 taskq_dispatch_ent > 105134 9% 80% 0.00 3110 0xffffff096ffdbf10 zio_remove_child > 67512 4% 83% 0.00 1938 0xffffff096f3db4b0 zio_add_child > 45736 3% 86% 0.00 2239 0xffffff0935de3c50 dbuf_destroy > 27781 3% 89% 0.00 3416 0xffffff0935de3c50 dbuf_create > 38536 2% 91% 0.00 2122 0xffffff0935de3b70 dnode_rele > 27841 2% 93% 0.00 2423 0xffffff0935de3b70 dnode_diduse_space > 19020 2% 95% 0.00 3046 0xffffff09d9e305e0 dbuf_rele > 14627 1% 96% 0.00 3632 dbuf_hash_table+0x4f8 dbuf_find > > Writes are not performing htable lookups, while reads are. > > -Warren V > > > > > > > On Mon, Mar 2, 2015 at 3:14 AM, Joerg Goltermann wrote: > > Hi, > > I would try *one* TPG which includes both interface addresses > and I would double check for packet drops on the Catalyst. > > The 3560 supports only receive flow control which means, that > a sending 10Gbit port can easily overload a 1Gbit port. > Do you have flow control enabled? > > - Joerg > > > > On 02.03.2015 09:22, W Verb via illumos-developer wrote: > > Hello Garrett, > > No, no 802.3ad going on in this config. > > Here is a basic schematic: > > > https://drive.google.com/file/d/0BwyUMjibonYQVkVqcE5OQUJyUUU/view?usp=sharing > > Here is the Nexenta MPIO iSCSI Setup Document that I used as a guide: > > > https://drive.google.com/file/d/0BwyUMjibonYQbjEyUTBjN2tTNWM/view?usp=sharing > > Note that I am using an MTU of 3000 on both the 10G and 1G NICs. The > switch is set to allow 9148-byte frames, and I'm not seeing any > errors/buffer overruns on the switch. > > Here is a screenshot of a packet capture from a read operation on the > guest OS (from it's local drive, which is actually a VMDK file on the > storage server). In this example, only a single 1G ESXi kernel interface > (vmk1) is bound to the software iSCSI initiator. > > > https://drive.google.com/file/d/0BwyUMjibonYQa2NYdXhpZkpkbU0/view?usp=sharing > > Note that there's a nice, well-behaved window sizing process taking > place. The ESXi decreases the scaled window by 11 or 12 for each ACK, > then bumps it back up to 512. > > Here is a similar screenshot of a single-interface write operation: > > > https://drive.google.com/file/d/0BwyUMjibonYQbU1RZHRnakxDSFU/view?usp=sharing > > There are no pauses or gaps in the transmission rate in the > single-interface transfers. > > > In the next screenshots, I have enabled an additional 1G interface on > the ESXi host, and bound it to the iSCSI initiator. The new interface is > bound to a separate physical port, uses a different VLAN on the switch, > and talks to a different 10G port on the storage server. > > First, let's look at a write operation on the guest OS, which happily > pumps data at near-line-rate to the storage server. > > Here is a sequence number trace diagram. Note how the transfer has a > nice, smooth increment rate over the entire transfer. 
> > > https://drive.google.com/file/d/0BwyUMjibonYQWHNIa0drWnNxMmM/view?usp=sharing > > Here are screenshots from packet captures on both 1G interfaces: > > > https://drive.google.com/file/d/0BwyUMjibonYQRWhyVVQ4djNaU3c/view?usp=sharing > > https://drive.google.com/file/d/0BwyUMjibonYQaTVjTEtTRloyR2c/view?usp=sharing > > Note how we again see nice, smooth window adjustment, and no gaps in > transmission. > > > But now, let's look at the problematic two-interface Read operation. > First, the sequence graph: > > > https://drive.google.com/file/d/0BwyUMjibonYQTzdFVWdQMWZ6LUU/view?usp=sharing > > As you can see, there are gaps and jumps in the transmission throughout > the transfer. > It is very illustrative to look at captures of the gaps, which are > occurring on both interfaces: > > > https://drive.google.com/file/d/0BwyUMjibonYQc0VISXN6eVFwQzg/view?usp=sharing > > https://drive.google.com/file/d/0BwyUMjibonYQVFREUHp3TGFiUU0/view?usp=sharing > > As you can see, there are ~.4 second pauses in transmission from the > storage server, which kills the transfer rate. > It's clear that the ESXi box ACKs the prior iSCSI operation to > completion, then makes a new LUN request, which the storage server > immediately replies to. The ESXi ACKs the response packet from the > storage server, then waits...and waits....and waits... until eventually > the storage server starts transmitting again. > > Because the pause happens while the ESXi client is waiting for a packet > from the storage server, that tells me that the gaps are not an artifact > of traffic being switched between both active interfaces, but are > actually indicative of short hangs occurring on the server. > > Having a pause or two in transmission is no big deal, but in my case, it > is happening constantly, and dropping my overall read transfer rate down > to 20-60MB/s, which is slower than the single interface transfer rate > (~90-100MB/s). > > Decreasing the MTU makes the pauses shorter, increasing them makes the > pauses longer. > > Another interesting thing is that if I set the multipath io interval to > 3 operations instead of 1, I get better throughput. In other words, the > less frequently I swap IP addresses on my iSCSI requests from the ESXi > unit, the fewer pauses I see. > > Basically, COMSTAR seems to choke each time an iSCSI request from a new > IP arrives. > > Because the single interface transfer is near line rate, that tells me > that the storage system (mpt_sas, zfs, etc) is working fine. It's only > when multiple paths are attempted that iSCSI falls on its face during > reads. > > All of these captures were taken without a cache device being attached > to the storage zpool, so this isn't looking like some kind of ZFS ARC > problem. As mentioned previously, local transfers to/from the zpool are > showing ~300-500 MB/s rates over long transfers (10G+). > > -Warren V > > On Sun, Mar 1, 2015 at 9:11 PM, Garrett D'Amore > > wrote: > > I?m not sure I?ve followed properly. You have *two* interfaces. > You are not trying to provision these in an aggr are you? As far as > I?m aware, VMware does not support 802.3ad link aggregations. (Its > possible that you can make it work with ESXi if you give the entire > NIC to the guest ? but I?m skeptical.) The problem is that if you > try to use link aggregation, some packets (up to half!) will be > lost. TCP and other protocols fare poorly in this situation. > > Its possible I?ve totally misunderstood what you?re trying to do, in > which case I apologize. 
> > The idle thing is a red-herring ? the cpu is waiting for work to do, > probably because packets haven?t arrived (or where dropped by the > hypervisor!) I wouldn?t read too much into that except that your > network stack is in trouble. I?d look a bit more closely at the > kstats for tcp ? I suspect you?ll see retransmits or out of order > values that are unusually high ? if so this may help validate my > theory above. > > - Garrett > > On Mar 1, 2015, at 9:03 PM, W Verb via illumos-developer > > > > > wrote: > > Hello all, > > > Well, I no longer blame the ixgbe driver for the problems I'm seeing. > > > I tried Joerg's updated driver, which didn't improve the issue. So > I went back to the drawing board and rebuilt the server from scratch. > > What I noted is that if I have only a single 1-gig physical > interface active on the ESXi host, everything works as expected. > As soon as I enable two interfaces, I start seeing the performance > problems I've described. > > Response pauses from the server that I see in TCPdumps are still > leading me to believe the problem is delay on the server side, so > I ran a series of kernel dtraces and produced some flamegraphs. > > > This was taken during a read operation with two active 10G > interfaces on the server, with a single target being shared by two > tpgs- one tpg for each 10G physical port. The host device has two > 1G ports enabled, with VLANs separating the active ports into > 10G/1G pairs. ESXi is set to multipath using both VLANS with a > round-robin IO interval of 1. > > > https://drive.google.com/file/d/0BwyUMjibonYQd3ZYOGh4d2pteGs/view?usp=sharing > > > This was taken during a write operation: > > > https://drive.google.com/file/d/0BwyUMjibonYQMnBtU1Q2SXM2ams/view?usp=sharing > > > I then rebooted the server and disabled C-State, ACPI T-State, and > general EIST (Turbo boost) functionality in the CPU. > > I when I attempted to boot my guest VM, the iSCSI transfer > gradually ground to a halt during the boot loading process, and > the guest OS never did complete its boot process. > > Here is a flamegraph taken while iSCSI is slowly dying: > > > https://drive.google.com/file/d/0BwyUMjibonYQM21JeFZPX3dZWTg/view?usp=sharing > > > I edited out cpu_idle_adaptive from the dtrace output and > regenerated the slowdown graph: > > > https://drive.google.com/file/d/0BwyUMjibonYQbTVwV3NvXzlPS1E/view?usp=sharing > > > I then edited cpu_idle_adaptive out of the speedy write operation > and regenerated that graph: > > > https://drive.google.com/file/d/0BwyUMjibonYQeWFYM0pCMDZ1X2s/view?usp=sharing > > > I have zero experience with interpreting flamegraphs, but the most > significant difference I see between the slow read example and the > fast write example is in unix`thread_start --> unix`idle. There's > a good chunk of "unix`i86_mwait" in the read example that is not > present in the write example at all. > > Disabling the l2arc cache device didn't make a difference, and I > had to reenable EIST support on the CPU to get my VMs to boot. > > I am seeing a variety of bug reports going back to 2010 regarding > excessive mwait operations, with the suggested solutions usually > being to set "cpupm enable poll-mode" in power.conf. That change > also had no effect on speed. 
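Following Garrett's suggestion above about the TCP kstats, the retransmit and out-of-order counters are easy to snapshot before and after a slow read; a sketch (the statistic names are the usual tcp MIB kstat names and are worth confirming on your build):

# Per-protocol counters; look for tcpRetransSegs, tcpInDupAck, tcpInUnorderSegs
netstat -s -P tcp > /var/tmp/tcpstats.before
# ... run the slow two-path read test ...
netstat -s -P tcp > /var/tmp/tcpstats.after
diff /var/tmp/tcpstats.before /var/tmp/tcpstats.after

# Same data via kstat, if machine-readable output is easier to track
kstat -p -m tcp | egrep -i 'retrans|unorder|dupack'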
> > -Warren V > > > > > -----Original Message----- > > From: Chris Siebenmann [mailto:cks at cs.toronto.edu] > > Sent: Monday, February 23, 2015 8:30 AM > > To: W Verb > > Cc: omnios-discuss at lists.omniti.com > > ; cks at cs.toronto.edu > > > Subject: Re: [OmniOS-discuss] The ixgbe driver, Lindsay Lohan, and > the Greek economy > > > > Chris, thanks for your specific details. I'd appreciate it if you > > > could tell me which copper NIC you tried, as well as to pass on the > > > iSCSI tuning parameters. > > > Our copper NIC experience is with onboard X540-AT2 ports on > SuperMicro hardware (which have the guaranteed 10-20 msec lock > hold) and dual-port 82599EB TN cards (which have some sort of > driver/hardware failure under load that eventually leads to > 2-second lock holds). I can't recommend either with the current > driver; we had to revert to 1G networking in order to get stable > servers. > > > The iSCSI parameter modifications we do, across both initiators > and targets, are: > > > initialr2tno > > firstburstlength128k > > maxrecvdataseglen128k[only on Linux backends] > > maxxmitdataseglen128k[only on Linux backends] > > > The OmniOS initiator doesn't need tuning for more than the first > two parameters; on the Linux backends we tune up all four. My > extended thoughts on these tuning parameters and why we touch them > can be found > > here: > > > > http://utcc.utoronto.ca/~cks/space/blog/tech/UnderstandingiSCSIProtocol > > http://utcc.utoronto.ca/~cks/space/blog/tech/LikelyISCSITuning > > > The short version is that these parameters probably only make a > small difference but their overall goal is to do 128KB ZFS reads > and writes in single iSCSI operations (although they will be > fragmented at the TCP > > layer) and to do iSCSI writes without a back-and-forth delay > between initiator and target (that's 'initialr2t no'). > > > I think basically everyone should use InitialR2T set to no and in > fact that it should be the software default. These days only > unusually limited iSCSI targets should need it to be otherwise and > they can change their setting for it (initiator and target must > both agree to it being 'yes', so either can veto it). > > > - cks > > > > On Mon, Feb 23, 2015 at 8:21 AM, Joerg Goltermann > > wrote: > > Hi, > > I think your problem is caused by your link properties or your > switch settings. In general the standard ixgbe seems to perform > well. > > I had trouble after changing the default flow control settings > to "bi" > and this was my motivation to update the ixgbe driver a long > time ago. > After I have updated our systems to ixgbe 2.5.8 I never had any > problems .... > > Make sure your switch has support for jumbo frames and you use > the same mtu on all ports, otherwise the smallest will be used. > > What switch do you use? I can tell you nice horror stories about > different vendors.... > > - Joerg > > On 23.02.2015 10:31, W Verb wrote: > > Thank you Joerg, > > I've downloaded the package and will try it tomorrow. > > The only thing I can add at this point is that upon review > of my > testing, I may have performed my "pkg -u" between the > initial quad-gig > performance test and installing the 10G NIC. So this may > be a new > problem introduced in the latest updates. > > Those of you who are running 10G and have not upgraded to > the latest > kernel, etc, might want to do some additional testing > before running the > update. 
> > -Warren V > > On Mon, Feb 23, 2015 at 1:15 AM, Joerg Goltermann > > > >> wrote: > > Hi, > > I remember there was a problem with the flow control > settings in the > ixgbe > driver, so I updated it a long time ago for our > internal servers to > 2.5.8. > Last weekend I integrated the latest changes from the > FreeBSD driver > to bring > the illumos ixgbe to 2.5.25 but I had no time to test > it, so it's > completely > untested! > > > If you would like to give the latest driver a try you > can fetch the > kernel modules from > https://cloud.osn.de/index.____php/s/Fb4so9RsNnXA7r9 > > > > > Clone your boot environment, place the modules in the > new environment > and update the boot-archive of the new BE. > > - Joerg > > > > > > On 23.02.2015 02:54, W Verb wrote: > > By the way, to those of you who have working > setups: please send me > your pool/volume settings, interface linkprops, > and any kernel > tuning > parameters you may have set. > > Thanks, > Warren V > > On Sat, Feb 21, 2015 at 7:59 AM, Schweiss, Chip > > >> > > > wrote: > > I can't say I totally agree with your performance > assessment. I run Intel > X520 in all my OmniOS boxes. > > Here is a capture of nfssvrtop I made while > running many > storage vMotions > between two OmniOS boxes hosting NFS > datastores. This is a > 10 host VMware > cluster. Both OmniOS boxes are dual 10G > connected with > copper twin-ax to > the in rack Nexus 5010. > > VMware does 100% sync writes, I use ZeusRAM > SSDs for log > devices. > > -Chip > > 2014 Apr 24 08:05:51, load: 12.64, read: > 17330243 KB, > swrite: 15985 KB, > awrite: 1875455 KB > > Ver Client NFSOPS Reads > SWrites AWrites > Commits Rd_bw > SWr_bw AWr_bw Rd_t SWr_t AWr_t > Com_t Align% > > 4 10.28.17.105 0 0 > 0 0 > 0 0 > 0 0 0 0 0 0 0 > > 4 10.28.17.215 0 0 > 0 0 > 0 0 > 0 0 0 0 0 0 0 > > 4 10.28.17.213 0 0 > 0 0 > 0 0 > 0 0 0 0 0 0 0 > > 4 10.28.16.151 0 0 > 0 0 > 0 0 > 0 0 0 0 0 0 0 > > 4 all 1 0 > 0 0 > 0 0 > 0 0 0 0 0 0 0 > > 3 10.28.16.175 3 0 > 3 0 > 0 1 > 11 0 4806 48 0 0 85 > > 3 10.28.16.183 6 0 > 6 0 > 0 3 > 162 0 549 124 0 0 > 73 > > 3 10.28.16.180 11 0 > 10 0 > 0 3 > 27 0 776 89 0 0 67 > > 3 10.28.16.176 28 2 > 26 0 > 0 10 > 405 0 2572 198 0 0 > 100 > > 3 10.28.16.178 4606 4602 > 4 0 > 0 294534 > 3 0 723 49 0 0 99 > > 3 10.28.16.179 4905 4879 > 26 0 > 0 312208 > 311 0 735 271 0 0 > 99 > > 3 10.28.16.181 5515 5502 > 13 0 > 0 352107 > 77 0 89 87 0 0 99 > > 3 10.28.16.184 12095 12059 > 10 0 > 0 763014 > 39 0 249 147 0 0 99 > > 3 10.28.58.1 15401 6040 > 116 6354 > 53 191605 > 474 202346 192 96 144 83 > 99 > > 3 all 42574 33086 <42574%2033086>> > > 217 > 6354 53 1913488 > 1582 202300 348 138 153 105 > 99 > > > > > > On Fri, Feb 20, 2015 at 11:46 PM, W Verb > > > > >> wrote: > > > Hello All, > > Thank you for your replies. > I tried a few things, and found the following: > > 1: Disabling hyperthreading support in the > BIOS drops > performance overall > by a factor of 4. > 2: Disabling VT support also seems to have > some effect, > although it > appears to be minor. But this has the > amusing side > effect of fixing the > hangs I've been experiencing with fast > reboot. Probably > by disabling kvm. > 3: The performance tests are a bit tricky > to quantify > because of caching > effects. In fact, I'm not entirely sure > what is > happening here. 
It's just > best to describe what I'm seeing: > > The commands I'm using to test are > dd if=/dev/zero of=./test.dd bs=2M count=5000 > dd of=/dev/null if=./test.dd bs=2M count=5000 > The host vm is running Centos 6.6, and has > the latest > vmtools installed. > There is a host cache on an SSD local to > the host that > is also in place. > Disabling the host cache didn't > immediately have an > effect as far as I could > see. > > The host MTU set to 3000 on all iSCSI > interfaces for all > tests. > > Test 1: Right after reboot, with an ixgbe > MTU of 9000, > the write test > yields an average speed over three tests > of 137MB/s. The > read test yields an > average over three tests of 5MB/s. > > Test 2: After setting "ifconfig ixgbe0 mtu > 3000", the > write tests yield > 140MB/s, and the read tests yield 53MB/s. > It's important > to note here that > if I cut the read test short at only > 2-3GB, I get > results upwards of > 350MB/s, which I assume is local > cache-related distortion. > > Test 3: MTU of 1500. Read tests are up to > 156 MB/s. > Write tests yield > about 142MB/s. > Test 4: MTU of 1000: Read test at 182MB/s. > Test 5: MTU of 900: Read test at 130 MB/s. > Test 6: MTU of 1000: Read test at 160MB/s. > Write tests > are now > consistently at about 300MB/s. > Test 7: MTU of 1200: Read test at 124MB/s. > Test 8: MTU of 1000: Read test at 161MB/s. > Write at 261MB/s. > > A few final notes: > L1ARC grabs about 10GB of RAM during the > tests, so > there's definitely some > read cachi > > ... > > [Message clipped] -------------- next part -------------- An HTML attachment was scrubbed... URL: From garrett at damore.org Sun Mar 8 23:30:06 2015 From: garrett at damore.org (Garrett D'Amore) Date: Sun, 8 Mar 2015 16:30:06 -0700 Subject: [OmniOS-discuss] [developer] Re: The ixgbe driver, Lindsay Lohan, and the Greek economy In-Reply-To: References: <20150221032559.727D07A0792@apps0.cs.toronto.edu> <54EAEFA8.4020101@osn.de> <54EB5392.6030900@osn.de> <279C084B-E76B-4F38-A7CF-D6CB37D06D8F@damore.org> <54F44602.5030705@osn.de> <064C3FC8-43E0-4757-AADA-831303782A4C@damore.org> <7e4156d48aba46239fb4d490577382cd@NASANEXM01F.na.qualcomm.com> Message-ID: Cool. Sorry I've still not looked at your lockstat data. Been buried under other tasks. I will try to find some time in the next day or so. Sent from my iPhone > On Mar 8, 2015, at 3:58 PM, W Verb wrote: > > Hello, > > I was able to perform my last round of testing last night. The tests were done with a single host, while enabling one or two 1G ports. > > 1 port (Read) 2 ports (Read) > Baseline: 130MB/s Read 30MB/s > Disable LRO: 90MB/s 27MB/s > Disable LRO/LSO 88MB/s 27MB/s > > LRO/LSO enabled, TCP window size varies > Default iscsid (256k max) > > 64k Window 96MB/s 28MB/s > 32k Window 72MB/s 22MB/s > 16k Window 61MB/s 17MB/s > > I then set everything back to the default and captured exactly what happens when I start a single port transfer then enable a second port in the middle. It's pretty illustrative. The server chokes a bit, then strangely sends TDS protocol packets saying "exception occurred". I didn't know TDS had anything to do with iSCSI. > > Captures from both interfaces here: > https://drive.google.com/open?id=0BwyUMjibonYQMG8zZnNWbk40Ymc&authuser=0 > > So it seems that window size isn't the limiting factor here. > > I am in the middle of implementing infiniband now. I can highly recommend the Silverstorm (QLogic) 9024CU 20G 4096MTU (with latest firmware) switch for the lab. 
The fans run very quietly at normal temps, and they are very inexpensive ($250 on eBay). It supports ethernet out-of-band management, as well as a subnet manager web app hosted from the switch itself. Creating the serial console cable was mildly irritating. > > The latest firmware can be retrieved from the QLogic site via a Google search, you won't find a link on their support frontpage. > > I'll report back once I have iSER / SRP results. > > > -Warren V > >> On Wed, Mar 4, 2015 at 9:14 AM, Mallory, Rob wrote: >> Hi Warren, >> >> [ ?no objections here if you want to take this thread off-line to a smaller group? I wanted to post this to the larger groups for benefit of others, >> >> And maybe if you find success in the end you can post back to the larger groups with a summary ] >> >> >> >> Your recent success case going to 10GbE end to end seem to back up my theory of overload. >> >> I noticed yesterday a couple things: in the packet capture of the sender, it is apparent that large send offload LSO is being used (notice the size up to 64k packets). Among other things, this makes it a bit harder to tune and understand what is happening from packet captures on the hosts. It also lets the NIC ?tightly pack? the outgoing packet stream without much gap between the MTU sized packets. You need to get a 3rd-party snoop on the wire to see what the wire sees. Same thing on the ESXi server. I suspect it is using LRO or RSC, also ganging up the packets, and making it difficult to diagnose from a tcpdump on a VM. >> >> >> >> I still stand by my original inclination. And the data you have shown also seems to back this. (smaller MTU = less drops/pauses and then the latest: equal size pipes on send and receive make it work fine.) I was recommending that you either/both decrease the rwin on the client side (limiting it to an absurdly small, but not for this case, 17KB) or on the server. >> >> >> >> It makes more sense to control the window-size on the server, (to tune just for these small-bandwidth clients) and use a host-route with ?sendpipe (or is it ?ssthresh) because then the server will limit (and hopefully set some timers on the LSO part of the ixgbe) the amount of in-flight so-as the last-switch hop does not drop those important initial packets on tcp-slowstart. >> >> >> >> So my original recommendation still stands: limit the rwin on the client to 17K, if you want to continue to use a 1GbE interface on it. >> >> (your BDP @ 150us * 1GbE is about 24k, I?d pick a max receive window size smaller than this to be more conservative in the case of two interfaces) >> >> (and yes, only 2 x 9K jumbo frames fit in that BDP, so those 3k jumbos had less loss for a good reason, 1500MTU is probably best in your case) >> >> And two other things to help identify/understand the situation (in the mode you had it before with the quad-1GbE in the client) >> >> You can turn off LSO (in ixbge.conf) and also on the ESX, and the client VM side you can turn off LRO or RSC. >> >> >> >> Note: there was (maybe still is) a long-standing bug in the S10 and S11 TCP stack which I know of only second-hand from a reliable source, >> >> It has very similar conditions that you describe here, including high bandwidth servers, low bandwidth clients, multiple hops. >> >> The workaround is to disable RSC on the linux clients. >> >> >> >> good to hear that you can configure the system end-to-end 10GbE. That?s the obvious best case if you stick eith ethernet, and you don?t have to go to extremes like above. 
Note that the lossless fabric of Infiniband will completely hide these effects of TCP loss. Also, you can use much more efficient transport such as SRP (RDMA) which I think would fit really well if you can afford the additional complexity. (I?ve done this, and it?s really not that hard on small scale). >> >> >> >> Cheers, Rob >> >> >> >> >> >> From: W Verb [mailto:wverb73 at gmail.com] >> Sent: Tuesday, March 03, 2015 9:22 PM >> To: Mallory, Rob; illumos-dev >> Cc: Garrett D'Amore; Joerg Goltermann; omnios-discuss at lists.omniti.com >> >> >> Subject: Re: [developer] Re: [OmniOS-discuss] The ixgbe driver, Lindsay Lohan, and the Greek economy >> >> >> Hello all, >> >> This is probably the last message in this thread. >> >> I pulled the quad-gig NIC out of one of my hosts, and installed an X520. I then set a single 10G port on the server to be on the same VLAN as the host, and defined a vswitch, vmknic, etc on the host. >> >> I set the MTU to be 9000 on both sides, then ran my tests. >> >> Read: 130 MB/s. >> >> Write: 156 MB/s. >> >> Additionally, at higher MTUs, the NIC would periodically lock up until I performed an "ipadm disable-if -t ixgbe0" and re-enabled it. I tried your updated driver, Jeorg, but unfortunately it failed quite often. >> >> >> >> I then disabled stmf, enabled NFS (v3 only) on the server, and shared a dataset on the zpool with "share -f nfs /ppool/testy". >> I then mounted the server dataset on the host via NFS, and copied my test VM from the iSCSI zvol to the NFS dataset. I also removed the binding of the 10G port on the host from the sw iscsi interface. >> >> Running the same tests on the VM over NFSv3 yielded: >> >> Read: 650MB/s >> >> Write: 306MB/s >> >> This is getting within 10% of the throughput I consistently get on dd operations local on the server, so I'm pretty happy that I'm getting as good as I'm going to get until I add more drives. Additionally, I haven't experienced any NIC hangs. >> >> >> >> I tried varying the settings in ixgbe.conf, the MTU, and disabling LSO on the host and server, but nothing really made that much of a difference (except reducing the MTU made things about 20-30% slower). >> >> mpstat during both NFS and iSCSI transfers showed all processors as getting roughly the same number of interrupts, etc, although I did see a varying number of spins on reader/writer locks during the iSCSI transfers. The NFS showed no srws at all. >> >> Here is a pretty representative example of a 1s mpstat during an iSCSI transfer: >> >> CPU minf mjf xcal intr ithr csw icsw migr smtx srw syscl usr sys wt idl set >> 0 0 0 0 3246 2690 8739 6 772 5967 2 0 0 11 0 89 0 >> 1 0 0 0 2366 2249 7910 8 988 5563 2 302 0 9 0 91 0 >> 2 0 0 0 2455 2344 5584 5 687 5656 3 66 0 9 0 91 0 >> 3 0 0 25 248 12 6210 1 885 5679 2 0 0 9 0 91 0 >> 4 0 0 0 284 7 5450 2 861 5751 1 0 0 8 0 92 0 >> 5 0 0 0 232 3 4513 0 547 5733 3 0 0 7 0 93 0 >> 6 0 0 0 322 8 6084 1 836 6295 2 0 0 8 0 92 0 >> 7 0 0 0 3114 2848 8229 4 648 4966 2 0 0 10 0 90 0 >> >> >> >> So, it seems that it's COMSTAR/iSCSI that's broke as hell, not ixgbe. My apologies to anyone I may have offended with my pre-judgement. >> >> The consequences of this performance issue are significant: >> >> 1: Instead of being able to utilize the existing quad-port NICs I have in my hosts, I must use dual 10G cards for redundancy purposes. >> >> 2: I must build out a full 10G switching infrastructure. 
>> >> 3: The network traffic is inherently less secure, as it is essentially impossible to do real security with NFSv3 (that is supported by ESXi). >> >> In the short run, I have already ordered some relatively cheap 20G infiniband gear that will hopefully push up the cost/performance ratio. However, I have received all sorts of advice about how painful it can be to build and maintain infiniband, and if iSCSI over 10G ethernet is this painful, I'm not hopeful that infiniband will "just work". >> >> The last option, of course, is to bail out of the Solaris derivatives and move to ZoL or ZoBSD. The drawbacks of this are: >> >> 1: ZoL doesn't easily support booting off of mirrored USB flash drives, let alone running the root filesystem and swap on them. FreeNAS, by way of comparison, puts a 2G swap partition on each zdev, which (strangely enough) causes it to often crash when a zdev experiences a failure under load. >> >> 2: Neither ZoL or FreeNAS have good, stable, kernel-based iSCSI implementations. FreeNAS is indeed testing istgt, but it proved unstable for my purposes in recent builds. Unfortunately, stmf hasn't proved itself any better. >> >> >> >> There are other minor differences, but these are the ones that brought me to OmniOS in the first place. We'll just have to wait and see how well the infiniband stuff works. >> >> >> Hopefully this exercise will help prevent others from going down the same rabbit-hole that I did. >> >> -Warren V >> >> >> >> >> >> >> >> >> >> On Tue, Mar 3, 2015 at 3:45 PM, W Verb wrote: >> >> Hello Rob et al, >> >> Thank you for taking the time to look at this problem with me. I completely understand your inclination to look at the network as the most probable source of my issue, but I believe that this is a pretty clear-cut case of server-side issues. >> >> 1: I did run ping RTT tests during both read and write operations with multiple interfaces enabled, and the RTT stayed at ~.2ms regardless of whether traffic was actively being transmitted/received or not. >> >> 2: I am not seeing the TCP window size bouncing around, and I am certainly not seeing starvation and delay in my packet captures. It is true that I do see delayed ACKs and retransmissions when I bump the MTU to 9000 on both sides, but I stopped testing with high MTU as soon as I saw it happening because I have a good understanding of incast. All of my recent testing has been with MTUs between 1000 and 3000 bytes. >> >> 3: When testing with MTUs between 1000 and 3000 bytes, I do not see lost packets and retransmission in captures on either the server or client side. I only see staggered transmission delays on the part of the server. >> >> 4: The client is consistently advertising a large window size (20k+), so the TCP throttling mechanism does not appear to play into this. >> >> 5: As mentioned previously, layer 2 flow control is not enabled anywhere in the network, so there are no lower-level mechanisms at work. >> >> 6: Upon checking buffer and queue sizes (and doing the appropriate research into documentation on the C3560E's buffer sizes), I do not see large numbers of frames being dropped by the switch. It does happen at larger MTUs, but not very often (and not consistently) during transfers at 1000-3000 byte MTUs. I do not have QoS, policing, or rate-shaping enabled. >> >> 7: Network interface stats on both the server and the ESXi client show no errors of any kind. This is via netstat on the server, and esxcli / Vsphere client on the ESXi box. 
>> >> 8: When looking at captures taken simultaneously on the server and client side, the server-side transmission pauses are consistently seen and reproducible, even after multiple server rebuilds, ESXi rebuilds, vSphere reinstallations (down to wiping the SQL db), various COMSTAR configuration variations, multiple 10G NICs with different NIC chipsets, multiple switches (I tried both a 48-port and 24-port C3560E), multiple IOS revisions (12.2 and 15.0), OmniOS versions (r151012 and previous) multiple cables, transceivers, etc etc etc etc etc >> >> >> For your review, I have uploaded the actual packet captures to Google Drive: >> >> https://drive.google.com/file/d/0BwyUMjibonYQZS03dHJ2ZjJvTEE/view?usp=sharing 2 int write - ESXi vmk5 >> https://drive.google.com/file/d/0BwyUMjibonYQTlNTQ2M5bjlxZ00/view?usp=sharing 2 int write - ESXi vmk1 >> https://drive.google.com/file/d/0BwyUMjibonYQUEFsSVJCYXBVX3c/view?usp=sharing 2 int read - server ixgbe0 >> https://drive.google.com/file/d/0BwyUMjibonYQT3FBbElnNFpJTzQ/view?usp=sharing 2 int read - ESXi vmk5 >> https://drive.google.com/file/d/0BwyUMjibonYQU1hXdFRLM2cxSTA/view?usp=sharing 2 int read - ESXi vmk1 >> https://drive.google.com/file/d/0BwyUMjibonYQNEFZSHVZdFNweDA/view?usp=sharing 1 int write - ESXi vmk1 >> https://drive.google.com/file/d/0BwyUMjibonYQM3FpTmloQm5iMGc/view?usp=sharing 1 int read - ESXi vmk1 >> >> Regards, >> >> Warren V >> >> >> >> On Mon, Mar 2, 2015 at 1:11 PM, Mallory, Rob wrote: >> >> Just an EWAG, and forgive me for not following closely, I just saw this in my inbox, and looked at it and the screenshots for 2 minutes. >> >> >> >> But this looks like the typical incast problem.. see http://www.pdl.cmu.edu/Incast/ >> >> where your storage servers (there are effectively two with ISCSI/MPIO if round-robin is working) have networks which are 20:1 oversubscribed to your 1GbE host interfaces. (although one of the tcpdumps shows only one server so it may be choked out completely) >> >> >> >> What is your BDP? I?m guessing .150ms * 1GbE. For single-link that gets you to a MSS of 18700 or so. >> >> >> >> On your 1GbE connected clients, leave MTU at 9k, set the following in sysctl.conf, >> >> And reboot. >> >> >> >> net.ipv4.tcp_rmem = 4096 8938 17876 >> >> >> >> If MPIO from the server is indeed round-robining properly, this will ?make things fit? much better. >> >> >> >> Note that your tcp_wmem can and should stay high, since you are not oversubscribed going from client?server ; you only need to tweak the tcp receive window size. >> >> >> >> I?ve not done it in quite some time, but IIRC, You can also set these from the server side with: >> >> Route add -sendpipe 8930 or ?ssthresh >> >> >> >> And I think you can see the hash-table with computed BDP per client with ndd. >> >> >> >> I would try playing with those before delving deep into potential bugs in the TCP, nic driver, zfs, or vm. >> >> -Rob >> >> >> >> From: W Verb via illumos-developer [mailto:developer at lists.illumos.org] >> Sent: Monday, March 02, 2015 12:20 PM >> To: Garrett D'Amore >> Cc: Joerg Goltermann; illumos-dev; omnios-discuss at lists.omniti.com >> Subject: Re: [developer] Re: [OmniOS-discuss] The ixgbe driver, Lindsay Lohan, and the Greek economy >> >> >> >> Hello, >> >> vmstat seems pretty boring. Certainly nothing going to swap. 
>> >> root at sanbox:/root# vmstat >> kthr memory page disk faults cpu >> r b w swap free re mf pi po fr de sr po ro s0 s2 in sy cs us sy id >> 0 0 0 34631632 30728068 175 215 0 0 0 0 963 275 4 6 140 3301 796 6681 0 1 99 >> >> Here is the "taskq_dispatch_ent" output from "lockstat -s 5 -kWP sleep 30" during the "fast" write operation. >> >> ------------------------------------------------------------------------------- >> Count indv cuml rcnt nsec Hottest Lock Caller >> 50934 3% 79% 0.00 3437 0xffffff093145ba40 taskq_dispatch_ent >> >> nsec ------ Time Distribution ------ count Stack >> 128 | 7 spa_taskq_dispatch_ent >> 256 |@@ 4333 zio_taskq_dispatch >> 512 |@@ 3863 zio_issue_async >> 1024 |@@@@@ 9717 zio_execute >> 2048 |@@@@@@@@@ 15904 >> 4096 |@@@@ 7595 >> 8192 |@@ 4498 >> 16384 |@ 2662 >> 32768 |@ 1886 >> 65536 | 434 >> 131072 | 34 >> 262144 | 1 >> ------------------------------------------------------------------------------- >> >> >> However, the truly "broken" function is a read operation: >> >> Top lock 1st try: >> ------------------------------------------------------------------------------- >> Count indv cuml rcnt nsec Hottest Lock Caller >> 474 15% 15% 0.00 7031 0xffffff093145b6f8 cv_wait >> >> nsec ------ Time Distribution ------ count Stack >> 256 |@ 29 taskq_thread_wait >> 512 |@@@@@@ 100 taskq_thread >> 1024 |@@@@ 72 thread_start >> 2048 |@@@@ 69 >> 4096 |@@@ 51 >> 8192 |@@ 47 >> 16384 |@@ 44 >> 32768 |@@ 32 >> 65536 |@ 25 >> 131072 | 5 >> ------------------------------------------------------------------------------- >> >> Top lock 2nd try: >> >> ------------------------------------------------------------------------------- >> Count indv cuml rcnt nsec Hottest Lock Caller >> 174 39% 39% 0.00 103909 0xffffff0943f116a0 dmu_zfetch_find >> >> nsec ------ Time Distribution ------ count Stack >> 2048 | 2 dmu_zfetch >> 4096 | 3 dbuf_read >> 8192 | 4 dmu_buf_hold_array_by_dnode >> 16384 | 3 dmu_buf_hold_array >> 32768 |@ 7 >> 65536 |@@ 14 >> 131072 |@@@@@@@@@@@@@@@@@@@@ 116 >> 262144 |@@@ 19 >> 524288 | 4 >> 1048576 | 2 >> ------------------------------------------------------------------------------- >> >> Top lock 3rd try: >> >> ------------------------------------------------------------------------------- >> Count indv cuml rcnt nsec Hottest Lock Caller >> 283 55% 55% 0.00 94602 0xffffff0943ff5a68 dmu_zfetch_find >> >> nsec ------ Time Distribution ------ count Stack >> 512 | 1 dmu_zfetch >> 1024 | 1 dbuf_read >> 2048 | 0 dmu_buf_hold_array_by_dnode >> 4096 | 5 dmu_buf_hold_array >> 8192 | 2 >> 16384 | 7 >> 32768 | 4 >> 65536 |@@@ 33 >> 131072 |@@@@@@@@@@@@@@@@@@@@ 198 >> 262144 |@@ 27 >> 524288 | 2 >> 1048576 | 3 >> ------------------------------------------------------------------------------- >> >> >> >> As for the MTU question- setting the MTU to 9000 makes read operations grind almost to a halt at 5MB/s transfer rate. >> >> -Warren V >> >> >> >> On Mon, Mar 2, 2015 at 11:30 AM, Garrett D'Amore wrote: >> >> Here?s a theory. You are using small (relatively) MTUs (3000 is less than the smallest ZFS block size.) So, when you go multipathing this way, might a single upper layer transaction (ZFS block transfer request, or for that matter COMSTAR block request) get routed over different paths. This sounds like a potentially pathological condition to me. >> >> >> >> What happens if you increase the MTU to 9000? Have you tried it? I?m sort of thinking that this will permit each transaction to be issued in a single IP frame, which may alleviate certain tragic code paths. 
(That said, I'm not sure how aware COMSTAR is of the IP MTU. If it is ignorant, then it shouldn't matter *that* much, since TCP should do the right thing here and a single TCP stream should stick to a single underlying NIC. But if COMSTAR is aware of the MTU, it may do some really screwball things as it tries to break requests up into single frames.) >> >>
>> Your read spin really looks like only about 22 msec of wait out of a total run of 30 sec. (That's not *great*, but neither does it sound tragic.) Your write is interesting because that looks like it is going a wildly different path. You should be aware that the locks you see are *not* necessarily related in call order, but rather are ordered by instance count. The write code path hitting the task_thread as hard as it does is really, really weird. Something is pounding on a taskq lock super hard. The number of taskq_dispatch_ent calls is interesting here. I'm starting to wonder if it's something as stupid as a spin where if the taskq is "full" (max size reached), a caller just is spinning trying to dispatch jobs to the taskq. >> >>
>> The taskq_dispatch_ent code is super simple, and it should be almost impossible to have contention on that lock - barring a thread spinning hard on taskq_dispatch (or taskq_dispatch_ent as I think is happening here). Looking at the various call sites, there are places in both COMSTAR (iscsit) and in ZFS where this could be coming from. To know which, we really need to have the back trace associated. >> >>
>> lockstat can give this - try giving "-s 5" to give a short backtrace from this, that will probably give us a little more info about the guilty caller. :-) >> >>
>> - Garrett >> >>
>> On Mar 2, 2015, at 11:07 AM, W Verb via illumos-developer wrote: >> >> Hello all, >> >> I am not using layer 2 flow control. The switch carries line-rate 10G traffic without error. >> >> I think I have found the issue via lockstat. The first lockstat is taken during a multipath read: >> >> lockstat -kWP sleep 30 >> >>
>> Adaptive mutex spin: 21331 events in 30.020 seconds (711 events/sec)
>> Count indv cuml rcnt nsec Hottest Lock Caller
>> -------------------------------------------------------------------------------
>> 9306 44% 44% 0.00 1557 htable_mutex+0x370 htable_release
>> 6307 23% 68% 0.00 1207 htable_mutex+0x108 htable_lookup
>> 596 7% 75% 0.00 4100 0xffffff0931705188 cv_wait
>> 349 5% 80% 0.00 4437 0xffffff0931705188 taskq_thread
>> 704 2% 82% 0.00 995 0xffffff0935de3c50 dbuf_create
>> >> The hash table being read here I would guess is the tcp connection hash table.
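A sketch of how the guilty caller could be pinned down along the lines Garrett suggests: the lockstat invocation just adds stack depth to the command already being used in this thread, and the dtrace one-liner is a generic fbt aggregation that assumes nothing beyond taskq_dispatch_ent being a traceable kernel function.

    # Same profile as before, but keep a 5-frame stack per event so the
    # caller of taskq_dispatch_ent shows up in the report:
    lockstat -s 5 -kWP sleep 30

    # Alternatively, aggregate the kernel stacks that lead into
    # taskq_dispatch_ent for ~30 seconds, to see whether iscsit or ZFS
    # is the heavy dispatcher:
    dtrace -n 'fbt::taskq_dispatch_ent:entry { @[stack()] = count(); }
               tick-30s { exit(0); }'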
>> >> >> >> When lockstat is run during a multipath write operation, I get: >> >> Adaptive mutex spin: 1097341 events in 30.016 seconds (36558 events/sec) >> >> Count indv cuml rcnt nsec Hottest Lock Caller >> ------------------------------------------------------------------------------- >> 210752 28% 28% 0.00 4781 0xffffff0931705188 taskq_thread >> 174471 22% 50% 0.00 4476 0xffffff0931705188 cv_wait >> 127183 10% 61% 0.00 2871 0xffffff096f29b510 zio_notify_parent >> 176066 10% 70% 0.00 1922 0xffffff0931705188 taskq_dispatch_ent >> 105134 9% 80% 0.00 3110 0xffffff096ffdbf10 zio_remove_child >> 67512 4% 83% 0.00 1938 0xffffff096f3db4b0 zio_add_child >> 45736 3% 86% 0.00 2239 0xffffff0935de3c50 dbuf_destroy >> 27781 3% 89% 0.00 3416 0xffffff0935de3c50 dbuf_create >> 38536 2% 91% 0.00 2122 0xffffff0935de3b70 dnode_rele >> 27841 2% 93% 0.00 2423 0xffffff0935de3b70 dnode_diduse_space >> 19020 2% 95% 0.00 3046 0xffffff09d9e305e0 dbuf_rele >> 14627 1% 96% 0.00 3632 dbuf_hash_table+0x4f8 dbuf_find >> >> >> Writes are not performing htable lookups, while reads are. >> >> -Warren V >> >> >> >> >> >> >> >> On Mon, Mar 2, 2015 at 3:14 AM, Joerg Goltermann wrote: >> >> Hi, >> >> I would try *one* TPG which includes both interface addresses >> and I would double check for packet drops on the Catalyst. >> >> The 3560 supports only receive flow control which means, that >> a sending 10Gbit port can easily overload a 1Gbit port. >> Do you have flow control enabled? >> >> - Joerg >> >> >> >> On 02.03.2015 09:22, W Verb via illumos-developer wrote: >> >> Hello Garrett, >> >> No, no 802.3ad going on in this config. >> >> Here is a basic schematic: >> >> https://drive.google.com/file/d/0BwyUMjibonYQVkVqcE5OQUJyUUU/view?usp=sharing >> >> Here is the Nexenta MPIO iSCSI Setup Document that I used as a guide: >> >> https://drive.google.com/file/d/0BwyUMjibonYQbjEyUTBjN2tTNWM/view?usp=sharing >> >> Note that I am using an MTU of 3000 on both the 10G and 1G NICs. The >> switch is set to allow 9148-byte frames, and I'm not seeing any >> errors/buffer overruns on the switch. >> >> Here is a screenshot of a packet capture from a read operation on the >> guest OS (from it's local drive, which is actually a VMDK file on the >> storage server). In this example, only a single 1G ESXi kernel interface >> (vmk1) is bound to the software iSCSI initiator. >> >> https://drive.google.com/file/d/0BwyUMjibonYQa2NYdXhpZkpkbU0/view?usp=sharing >> >> Note that there's a nice, well-behaved window sizing process taking >> place. The ESXi decreases the scaled window by 11 or 12 for each ACK, >> then bumps it back up to 512. >> >> Here is a similar screenshot of a single-interface write operation: >> >> https://drive.google.com/file/d/0BwyUMjibonYQbU1RZHRnakxDSFU/view?usp=sharing >> >> There are no pauses or gaps in the transmission rate in the >> single-interface transfers. >> >> >> In the next screenshots, I have enabled an additional 1G interface on >> the ESXi host, and bound it to the iSCSI initiator. The new interface is >> bound to a separate physical port, uses a different VLAN on the switch, >> and talks to a different 10G port on the storage server. >> >> First, let's look at a write operation on the guest OS, which happily >> pumps data at near-line-rate to the storage server. >> >> Here is a sequence number trace diagram. Note how the transfer has a >> nice, smooth increment rate over the entire transfer. 
>> >> https://drive.google.com/file/d/0BwyUMjibonYQWHNIa0drWnNxMmM/view?usp=sharing >> >> Here are screenshots from packet captures on both 1G interfaces: >> >> https://drive.google.com/file/d/0BwyUMjibonYQRWhyVVQ4djNaU3c/view?usp=sharing >> https://drive.google.com/file/d/0BwyUMjibonYQaTVjTEtTRloyR2c/view?usp=sharing >> >> Note how we again see nice, smooth window adjustment, and no gaps in >> transmission. >> >> >> But now, let's look at the problematic two-interface Read operation. >> First, the sequence graph: >> >> https://drive.google.com/file/d/0BwyUMjibonYQTzdFVWdQMWZ6LUU/view?usp=sharing >> >> As you can see, there are gaps and jumps in the transmission throughout >> the transfer. >> It is very illustrative to look at captures of the gaps, which are >> occurring on both interfaces: >> >> https://drive.google.com/file/d/0BwyUMjibonYQc0VISXN6eVFwQzg/view?usp=sharing >> https://drive.google.com/file/d/0BwyUMjibonYQVFREUHp3TGFiUU0/view?usp=sharing >> >> As you can see, there are ~.4 second pauses in transmission from the >> storage server, which kills the transfer rate. >> It's clear that the ESXi box ACKs the prior iSCSI operation to >> completion, then makes a new LUN request, which the storage server >> immediately replies to. The ESXi ACKs the response packet from the >> storage server, then waits...and waits....and waits... until eventually >> the storage server starts transmitting again. >> >> Because the pause happens while the ESXi client is waiting for a packet >> from the storage server, that tells me that the gaps are not an artifact >> of traffic being switched between both active interfaces, but are >> actually indicative of short hangs occurring on the server. >> >> Having a pause or two in transmission is no big deal, but in my case, it >> is happening constantly, and dropping my overall read transfer rate down >> to 20-60MB/s, which is slower than the single interface transfer rate >> (~90-100MB/s). >> >> Decreasing the MTU makes the pauses shorter, increasing them makes the >> pauses longer. >> >> Another interesting thing is that if I set the multipath io interval to >> 3 operations instead of 1, I get better throughput. In other words, the >> less frequently I swap IP addresses on my iSCSI requests from the ESXi >> unit, the fewer pauses I see. >> >> Basically, COMSTAR seems to choke each time an iSCSI request from a new >> IP arrives. >> >> Because the single interface transfer is near line rate, that tells me >> that the storage system (mpt_sas, zfs, etc) is working fine. It's only >> when multiple paths are attempted that iSCSI falls on its face during reads. >> >> All of these captures were taken without a cache device being attached >> to the storage zpool, so this isn't looking like some kind of ZFS ARC >> problem. As mentioned previously, local transfers to/from the zpool are >> showing ~300-500 MB/s rates over long transfers (10G+). >> >> -Warren V >> >> On Sun, Mar 1, 2015 at 9:11 PM, Garrett D'Amore > >> > wrote: >> >> I?m not sure I?ve followed properly. You have *two* interfaces. >> You are not trying to provision these in an aggr are you? As far as >> I?m aware, VMware does not support 802.3ad link aggregations. (Its >> possible that you can make it work with ESXi if you give the entire >> NIC to the guest ? but I?m skeptical.) The problem is that if you >> try to use link aggregation, some packets (up to half!) will be >> lost. TCP and other protocols fare poorly in this situation. 
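As an aside, the round-robin "IO interval" Warren adjusts a little earlier is the IOPS limit of ESXi's round-robin path-selection policy; on ESXi 5.x it is usually inspected and changed per LUN roughly as below. The naa identifier is a placeholder and the exact esxcli namespace can differ between ESXi releases, so treat this as a sketch rather than a recipe.

    # Show the paths and current PSP settings for one LUN (device ID is hypothetical)
    esxcli storage nmp device list -d naa.600144f0xxxxxxxxxxxxxxxxxxxxxxxx

    # Rotate paths every 3 I/Os instead of the default 1000
    esxcli storage nmp psp roundrobin deviceconfig set \
        --device=naa.600144f0xxxxxxxxxxxxxxxxxxxxxxxx --type=iops --iops=3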
>> >> Its possible I?ve totally misunderstood what you?re trying to do, in >> which case I apologize. >> >> The idle thing is a red-herring ? the cpu is waiting for work to do, >> probably because packets haven?t arrived (or where dropped by the >> hypervisor!) I wouldn?t read too much into that except that your >> network stack is in trouble. I?d look a bit more closely at the >> kstats for tcp ? I suspect you?ll see retransmits or out of order >> values that are unusually high ? if so this may help validate my >> theory above. >> >> - Garrett >> >> On Mar 1, 2015, at 9:03 PM, W Verb via illumos-developer >> > >> >> >> wrote: >> >> Hello all, >> >> >> Well, I no longer blame the ixgbe driver for the problems I'm seeing. >> >> >> I tried Joerg's updated driver, which didn't improve the issue. So >> I went back to the drawing board and rebuilt the server from scratch. >> >> What I noted is that if I have only a single 1-gig physical >> interface active on the ESXi host, everything works as expected. >> As soon as I enable two interfaces, I start seeing the performance >> problems I've described. >> >> Response pauses from the server that I see in TCPdumps are still >> leading me to believe the problem is delay on the server side, so >> I ran a series of kernel dtraces and produced some flamegraphs. >> >> >> This was taken during a read operation with two active 10G >> interfaces on the server, with a single target being shared by two >> tpgs- one tpg for each 10G physical port. The host device has two >> 1G ports enabled, with VLANs separating the active ports into >> 10G/1G pairs. ESXi is set to multipath using both VLANS with a >> round-robin IO interval of 1. >> >> https://drive.google.com/file/d/0BwyUMjibonYQd3ZYOGh4d2pteGs/view?usp=sharing >> >> >> This was taken during a write operation: >> >> https://drive.google.com/file/d/0BwyUMjibonYQMnBtU1Q2SXM2ams/view?usp=sharing >> >> >> I then rebooted the server and disabled C-State, ACPI T-State, and >> general EIST (Turbo boost) functionality in the CPU. >> >> I when I attempted to boot my guest VM, the iSCSI transfer >> gradually ground to a halt during the boot loading process, and >> the guest OS never did complete its boot process. >> >> Here is a flamegraph taken while iSCSI is slowly dying: >> >> https://drive.google.com/file/d/0BwyUMjibonYQM21JeFZPX3dZWTg/view?usp=sharing >> >> >> I edited out cpu_idle_adaptive from the dtrace output and >> regenerated the slowdown graph: >> >> https://drive.google.com/file/d/0BwyUMjibonYQbTVwV3NvXzlPS1E/view?usp=sharing >> >> >> I then edited cpu_idle_adaptive out of the speedy write operation >> and regenerated that graph: >> >> https://drive.google.com/file/d/0BwyUMjibonYQeWFYM0pCMDZ1X2s/view?usp=sharing >> >> >> I have zero experience with interpreting flamegraphs, but the most >> significant difference I see between the slow read example and the >> fast write example is in unix`thread_start --> unix`idle. There's >> a good chunk of "unix`i86_mwait" in the read example that is not >> present in the write example at all. >> >> Disabling the l2arc cache device didn't make a difference, and I >> had to reenable EIST support on the CPU to get my VMs to boot. >> >> I am seeing a variety of bug reports going back to 2010 regarding >> excessive mwait operations, with the suggested solutions usually >> being to set "cpupm enable poll-mode" in power.conf. That change >> also had no effect on speed. 
>> >> -Warren V >> >> >> >> >> -----Original Message----- >> >> From: Chris Siebenmann [mailto:cks at cs.toronto.edu] >> >> Sent: Monday, February 23, 2015 8:30 AM >> >> To: W Verb >> >> Cc: omnios-discuss at lists.omniti.com >> >> ; cks at cs.toronto.edu >> >> >> Subject: Re: [OmniOS-discuss] The ixgbe driver, Lindsay Lohan, and >> the Greek economy >> >> >> > Chris, thanks for your specific details. I'd appreciate it if you >> >> > could tell me which copper NIC you tried, as well as to pass on the >> >> > iSCSI tuning parameters. >> >> >> Our copper NIC experience is with onboard X540-AT2 ports on >> SuperMicro hardware (which have the guaranteed 10-20 msec lock >> hold) and dual-port 82599EB TN cards (which have some sort of >> driver/hardware failure under load that eventually leads to >> 2-second lock holds). I can't recommend either with the current >> driver; we had to revert to 1G networking in order to get stable >> servers. >> >> >> The iSCSI parameter modifications we do, across both initiators >> and targets, are: >> >> >> initialr2tno >> >> firstburstlength128k >> >> maxrecvdataseglen128k[only on Linux backends] >> >> maxxmitdataseglen128k[only on Linux backends] >> >> >> The OmniOS initiator doesn't need tuning for more than the first >> two parameters; on the Linux backends we tune up all four. My >> extended thoughts on these tuning parameters and why we touch them >> can be found >> >> here: >> >> >> http://utcc.utoronto.ca/~cks/space/blog/tech/UnderstandingiSCSIProtocol >> >> http://utcc.utoronto.ca/~cks/space/blog/tech/LikelyISCSITuning >> >> >> The short version is that these parameters probably only make a >> small difference but their overall goal is to do 128KB ZFS reads >> and writes in single iSCSI operations (although they will be >> fragmented at the TCP >> >> layer) and to do iSCSI writes without a back-and-forth delay >> between initiator and target (that's 'initialr2t no'). >> >> >> I think basically everyone should use InitialR2T set to no and in >> fact that it should be the software default. These days only >> unusually limited iSCSI targets should need it to be otherwise and >> they can change their setting for it (initiator and target must >> both agree to it being 'yes', so either can veto it). >> >> >> - cks >> >> >> >> On Mon, Feb 23, 2015 at 8:21 AM, Joerg Goltermann > >> > wrote: >> >> Hi, >> >> I think your problem is caused by your link properties or your >> switch settings. In general the standard ixgbe seems to perform >> well. >> >> I had trouble after changing the default flow control settings >> to "bi" >> and this was my motivation to update the ixgbe driver a long >> time ago. >> After I have updated our systems to ixgbe 2.5.8 I never had any >> problems .... >> >> Make sure your switch has support for jumbo frames and you use >> the same mtu on all ports, otherwise the smallest will be used. >> >> What switch do you use? I can tell you nice horror stories about >> different vendors.... >> >> - Joerg >> >> On 23.02.2015 10:31, W Verb wrote: >> >> Thank you Joerg, >> >> I've downloaded the package and will try it tomorrow. >> >> The only thing I can add at this point is that upon review >> of my >> testing, I may have performed my "pkg -u" between the >> initial quad-gig >> performance test and installing the 10G NIC. So this may >> be a new >> problem introduced in the latest updates. 
>> >> Those of you who are running 10G and have not upgraded to >> the latest >> kernel, etc, might want to do some additional testing >> before running the >> update. >> >> -Warren V >> >> On Mon, Feb 23, 2015 at 1:15 AM, Joerg Goltermann >> >> >> >> wrote: >> >> Hi, >> >> I remember there was a problem with the flow control >> settings in the >> ixgbe >> driver, so I updated it a long time ago for our >> internal servers to >> 2.5.8. >> Last weekend I integrated the latest changes from the >> FreeBSD driver >> to bring >> the illumos ixgbe to 2.5.25 but I had no time to test >> it, so it's >> completely >> untested! >> >> >> If you would like to give the latest driver a try you >> can fetch the >> kernel modules from >> https://cloud.osn.de/index.____php/s/Fb4so9RsNnXA7r9 >> >> > > >> >> Clone your boot environment, place the modules in the >> new environment >> and update the boot-archive of the new BE. >> >> - Joerg >> >> >> >> >> >> On 23.02.2015 02:54, W Verb wrote: >> >> By the way, to those of you who have working >> setups: please send me >> your pool/volume settings, interface linkprops, >> and any kernel >> tuning >> parameters you may have set. >> >> Thanks, >> Warren V >> >> On Sat, Feb 21, 2015 at 7:59 AM, Schweiss, Chip >> >> >> >> >> >> wrote: >> >> I can't say I totally agree with your performance >> assessment. I run Intel >> X520 in all my OmniOS boxes. >> >> Here is a capture of nfssvrtop I made while >> running many >> storage vMotions >> between two OmniOS boxes hosting NFS >> datastores. This is a >> 10 host VMware >> cluster. Both OmniOS boxes are dual 10G >> connected with >> copper twin-ax to >> the in rack Nexus 5010. >> >> VMware does 100% sync writes, I use ZeusRAM >> SSDs for log >> devices. >> >> -Chip >> >> 2014 Apr 24 08:05:51, load: 12.64, read: >> 17330243 KB, >> swrite: 15985 KB, >> awrite: 1875455 KB >> >> Ver Client NFSOPS Reads >> SWrites AWrites >> Commits Rd_bw >> SWr_bw AWr_bw Rd_t SWr_t AWr_t >> Com_t Align% >> >> 4 10.28.17.105 0 0 >> 0 0 >> 0 0 >> 0 0 0 0 0 0 0 >> >> 4 10.28.17.215 0 0 >> 0 0 >> 0 0 >> 0 0 0 0 0 0 0 >> >> 4 10.28.17.213 0 0 >> 0 0 >> 0 0 >> 0 0 0 0 0 0 0 >> >> 4 10.28.16.151 0 0 >> 0 0 >> 0 0 >> 0 0 0 0 0 0 0 >> >> 4 all 1 0 >> 0 0 >> 0 0 >> 0 0 0 0 0 0 0 >> >> 3 10.28.16.175 3 0 >> 3 0 >> 0 1 >> 11 0 4806 48 0 0 85 >> >> 3 10.28.16.183 6 0 >> 6 0 >> 0 3 >> 162 0 549 124 0 0 >> 73 >> >> 3 10.28.16.180 11 0 >> 10 0 >> 0 3 >> 27 0 776 89 0 0 67 >> >> 3 10.28.16.176 28 2 >> 26 0 >> 0 10 >> 405 0 2572 198 0 0 >> 100 >> >> 3 10.28.16.178 4606 4602 >> 4 0 >> 0 294534 >> 3 0 723 49 0 0 99 >> >> 3 10.28.16.179 4905 4879 >> 26 0 >> 0 312208 >> 311 0 735 271 0 0 >> 99 >> >> 3 10.28.16.181 5515 5502 >> 13 0 >> 0 352107 >> 77 0 89 87 0 0 99 >> >> 3 10.28.16.184 12095 12059 >> 10 0 >> 0 763014 >> 39 0 249 147 0 0 99 >> >> 3 10.28.58.1 15401 6040 >> 116 6354 >> 53 191605 >> 474 202346 192 96 144 83 >> 99 >> >> 3 all 42574 33086 >> 217 >> 6354 53 1913488 >> 1582 202300 348 138 153 105 >> 99 >> >> >> >> >> >> On Fri, Feb 20, 2015 at 11:46 PM, W Verb >> >> > >> >> >> wrote: >> >> >> Hello All, >> >> Thank you for your replies. >> I tried a few things, and found the following: >> >> 1: Disabling hyperthreading support in the >> BIOS drops >> performance overall >> by a factor of 4. >> 2: Disabling VT support also seems to have >> some effect, >> although it >> appears to be minor. But this has the >> amusing side >> effect of fixing the >> hangs I've been experiencing with fast >> reboot. Probably >> by disabling kvm. 
>> 3: The performance tests are a bit tricky >> to quantify >> because of caching >> effects. In fact, I'm not entirely sure >> what is >> happening here. It's just >> best to describe what I'm seeing: >> >> The commands I'm using to test are >> dd if=/dev/zero of=./test.dd bs=2M count=5000 >> dd of=/dev/null if=./test.dd bs=2M count=5000 >> The host vm is running Centos 6.6, and has >> the latest >> vmtools installed. >> There is a host cache on an SSD local to >> the host that >> is also in place. >> Disabling the host cache didn't >> immediately have an >> effect as far as I could >> see. >> >> The host MTU set to 3000 on all iSCSI >> interfaces for all >> tests. >> >> Test 1: Right after reboot, with an ixgbe >> MTU of 9000, >> the write test >> yields an average speed over three tests >> of 137MB/s. The >> read test yields an >> average over three tests of 5MB/s. >> >> Test 2: After setting "ifconfig ixgbe0 mtu >> 3000", the >> write tests yield >> 140MB/s, and the read tests yield 53MB/s. >> It's important >> to note here that >> if I cut the read test short at only >> 2-3GB, I get >> results upwards of >> 350MB/s, which I assume is local >> cache-related distortion. >> >> Test 3: MTU of 1500. Read tests are up to >> 156 MB/s. >> Write tests yield >> about 142MB/s. >> Test 4: MTU of 1000: Read test at 182MB/s. >> Test 5: MTU of 900: Read test at 130 MB/s. >> Test 6: MTU of 1000: Read test at 160MB/s. >> Write tests >> are now >> consistently at about 300MB/s. >> Test 7: MTU of 1200: Read test at 124MB/s. >> Test 8: MTU of 1000: Read test at 161MB/s. >> Write at 261MB/s. >> >> A few final notes: >> L1ARC grabs about 10GB of RAM during the >> tests, so >> there's definitely some >> read cachi >> >> ... >> >> [Message clipped] > -------------- next part -------------- An HTML attachment was scrubbed... URL: From henson at acm.org Mon Mar 9 04:07:19 2015 From: henson at acm.org (Paul B. Henson) Date: Sun, 8 Mar 2015 21:07:19 -0700 Subject: [OmniOS-discuss] Bloody // mailwrapper & mta mediator In-Reply-To: References: Message-ID: <20150309040719.GS25463@bender.unx.csupomona.edu> On Sat, Mar 07, 2015 at 11:17:03AM +0000, Andy wrote: > To date (we're running r151012 in production), OmniOS doesn't install > an MTA by default but, with the integration of 5166, sendmail becomes > a dependency of mailwrapper and mailwrapper is required by SUNWcs. :(, we actually use postfix for MTA purposes. I'd hate to see a hard requirement for sendmail as part of the base system. > This is a long way of me asking if mailwrapper could be removed from > OmniOS as it isn't required for an IPS distribution. That would remove the > requirement to have the standard sendmail package installed at all - just > like <=r151012. It would mean that 'mailx' doesn't work but that should be > expected if you haven't installed an MTA and is presumably the current > behaviour. +1 on this suggestion, thanks. From eric.sproul at circonus.com Mon Mar 9 14:23:13 2015 From: eric.sproul at circonus.com (Eric Sproul) Date: Mon, 9 Mar 2015 10:23:13 -0400 Subject: [OmniOS-discuss] lsi sas 9211-8i it efi firmware 20.00.02.00 In-Reply-To: <54FB65FD.6040600@gmail.com> References: <32b301d057ba$d6168cc0$8243a640$@acm.org> <54F9FD0B.1040601@gnaa.net> <331901d05846$e406cbb0$ac146310$@acm.org> <54FA198B.7090300@gnaa.net> <54FA244B.1010600@gnaa.net> <54FB65FD.6040600@gmail.com> Message-ID: On Sat, Mar 7, 2015 at 3:56 PM, Brogy?nyi J?zsef wrote: > Has anyone tested this firmware? Is it free from this error message "Parity > Error on path"? 
> Thanks any information. P20 firmware is known to be toxic; just google for "lsi p20 firmware" for the carnage. P19 and below are fine, as far as I know. Eric From danmcd at omniti.com Mon Mar 9 14:37:16 2015 From: danmcd at omniti.com (Dan McDonald) Date: Mon, 9 Mar 2015 10:37:16 -0400 Subject: [OmniOS-discuss] Bloody // mailwrapper & mta mediator In-Reply-To: <20150309040719.GS25463@bender.unx.csupomona.edu> References: <20150309040719.GS25463@bender.unx.csupomona.edu> Message-ID: <24D36878-45A2-4088-86E4-AF079CF6E81B@omniti.com> > On Mar 9, 2015, at 12:07 AM, Paul B. Henson wrote: > > On Sat, Mar 07, 2015 at 11:17:03AM +0000, Andy wrote: > >> To date (we're running r151012 in production), OmniOS doesn't install >> an MTA by default but, with the integration of 5166, sendmail becomes >> a dependency of mailwrapper and mailwrapper is required by SUNWcs. > > :(, we actually use postfix for MTA purposes. I'd hate to see a hard > requirement for sendmail as part of the base system. I'll be looking into 5166 and its impact later today. I want to cut a last or next-to-last bloody today or tomorrow. This investigation will force it to be tomorrow. If you have suggested diffs, please mail them to the list or create webrevs. I'm generally okay with this so long as: - It does not break anything else. - It does not hinder the post-014 goal of building illumos-gate on OmniOS. But I need to make sure. Dan From danmcd at omniti.com Mon Mar 9 14:47:12 2015 From: danmcd at omniti.com (Dan McDonald) Date: Mon, 9 Mar 2015 10:47:12 -0400 Subject: [OmniOS-discuss] lsi sas 9211-8i it efi firmware 20.00.02.00 In-Reply-To: References: <32b301d057ba$d6168cc0$8243a640$@acm.org> <54F9FD0B.1040601@gnaa.net> <331901d05846$e406cbb0$ac146310$@acm.org> <54FA198B.7090300@gnaa.net> <54FA244B.1010600@gnaa.net> <54FB65FD.6040600@gmail.com> Message-ID: > On Mar 9, 2015, at 10:23 AM, Eric Sproul wrote: > > On Sat, Mar 7, 2015 at 3:56 PM, Brogy?nyi J?zsef wrote: >> Has anyone tested this firmware? Is it free from this error message "Parity >> Error on path"? >> Thanks any information. > > P20 firmware is known to be toxic; just google for "lsi p20 firmware" > for the carnage. > > P19 and below are fine, as far as I know. I've not heard good things about 19. I HAVE heard that 18 is the best level of FW to run for right now. Thanks! Dan From omnios at citrus-it.net Mon Mar 9 14:55:43 2015 From: omnios at citrus-it.net (Andy) Date: Mon, 9 Mar 2015 14:55:43 +0000 (GMT) Subject: [OmniOS-discuss] Bloody // mailwrapper & mta mediator In-Reply-To: <24D36878-45A2-4088-86E4-AF079CF6E81B@omniti.com> References: <20150309040719.GS25463@bender.unx.csupomona.edu> <24D36878-45A2-4088-86E4-AF079CF6E81B@omniti.com> Message-ID: On Mon, 9 Mar 2015, Dan McDonald wrote: ; ; > On Mar 9, 2015, at 12:07 AM, Paul B. Henson wrote: ; > ; > On Sat, Mar 07, 2015 at 11:17:03AM +0000, Andy wrote: ; > ; >> To date (we're running r151012 in production), OmniOS doesn't install ; >> an MTA by default but, with the integration of 5166, sendmail becomes ; >> a dependency of mailwrapper and mailwrapper is required by SUNWcs. ; > ; > :(, we actually use postfix for MTA purposes. I'd hate to see a hard ; > requirement for sendmail as part of the base system. ; ; I'll be looking into 5166 and its impact later today. I want to cut a last or next-to-last bloody today or tomorrow. This investigation will force it to be tomorrow. ; ; If you have suggested diffs, please mail them to the list or create webrevs. 
I'm generally okay with this so long as: ; ; - It does not break anything else. ; ; - It does not hinder the post-014 goal of building illumos-gate on OmniOS. ; ; But I need to make sure. This would be sufficient for me. It re-introduces the problem with 'mailx' but that was there before. --- usr/src/pkg/manifests/SUNWcs.mf~ Mon Mar 9 14:54:01 2015 +++ usr/src/pkg/manifests/SUNWcs.mf Mon Mar 9 14:54:12 2015 @@ -1871,7 +1871,3 @@ # Depend on zoneinfo data. # depend fmri=system/data/zoneinfo type=require -# -# The mailx binary calls /usr/lib/sendmail provided by mailwrapper -# -depend fmri=system/network/mailwrapper type=require Thanks, Andy -- Citrus IT Limited | +44 (0)870 199 8000 | enquiries at citrus-it.co.uk Rock House Farm | Green Moor | Wortley | Sheffield | S35 7DQ Registered in England and Wales | Company number 4899123 From danmcd at omniti.com Mon Mar 9 17:18:33 2015 From: danmcd at omniti.com (Dan McDonald) Date: Mon, 9 Mar 2015 13:18:33 -0400 Subject: [OmniOS-discuss] Bloody // mailwrapper & mta mediator In-Reply-To: References: <20150309040719.GS25463@bender.unx.csupomona.edu> <24D36878-45A2-4088-86E4-AF079CF6E81B@omniti.com> Message-ID: > On Mar 9, 2015, at 10:55 AM, Andy wrote: > > ; If you have suggested diffs, please mail them to the list or create webrevs. I'm generally okay with this so long as: > ; > ; - It does not break anything else. > ; > ; - It does not hinder the post-014 goal of building illumos-gate on OmniOS. > ; > ; But I need to make sure. > > This would be sufficient for me. It re-introduces the problem with 'mailx' > but that was there before. > > --- usr/src/pkg/manifests/SUNWcs.mf~ Mon Mar 9 14:54:01 2015 > +++ usr/src/pkg/manifests/SUNWcs.mf Mon Mar 9 14:54:12 2015 > @@ -1871,7 +1871,3 @@ > # Depend on zoneinfo data. > # > depend fmri=system/data/zoneinfo type=require > -# > -# The mailx binary calls /usr/lib/sendmail provided by mailwrapper > -# > -depend fmri=system/network/mailwrapper type=require I think this is too big of a hammer. Tell me, would weakening the requirement of sendmail by mailwrapper help? diff --git a/usr/src/pkg/manifests/system-network-mailwrapper.mf b/usr/src/pkg/manifests/system-network-mailwrapper.mf index fa855da..21cc0b7 100644 --- a/usr/src/pkg/manifests/system-network-mailwrapper.mf +++ b/usr/src/pkg/manifests/system-network-mailwrapper.mf @@ -42,4 +42,4 @@ link path=usr/sbin/newaliases mediator=mta mediator-implementation=mailwrapper \ target=../lib/mailwrapper link path=usr/sbin/sendmail mediator=mta mediator-implementation=mailwrapper \ target=../lib/mailwrapper -depend fmri=service/network/smtp/sendmail type=require +depend fmri=service/network/smtp/sendmail type=optional This keeps the spirit of the change, but doesn't trip up folks who want their own sendmail (even if they are technically violating KYSTY in their version! ;) ). Whatcha think? Dan p.s. I want to cut a bloody release today or more likely tomorrow. Let's not bikeshed this. I reserve the right to Just Say No also. From henson at acm.org Mon Mar 9 21:48:50 2015 From: henson at acm.org (Paul B. Henson) Date: Mon, 9 Mar 2015 14:48:50 -0700 Subject: [OmniOS-discuss] Bloody // mailwrapper & mta mediator In-Reply-To: References: <20150309040719.GS25463@bender.unx.csupomona.edu> <24D36878-45A2-4088-86E4-AF079CF6E81B@omniti.com> Message-ID: <355501d05ab2$d6ac29b0$84047d10$@acm.org> > From: Dan McDonald > Sent: Monday, March 09, 2015 10:19 AM > > Tell me, would weakening the requirement of sendmail by mailwrapper help? 
> > This keeps the spirit of the change, but doesn't trip up folks who want their own > sendmail (even if they are technically violating KYSTY in their version! ;) ). So obviously mailwrapper would be installed, but what would happen with sendmail? It would be installed as well, but could be removed? I'm not sure exactly what IPS does with optional requirements. Currently we drop in our own symlinks for /usr/lib/sendmail et al pointing to our installed postfix, after the update, instead we would need to integrate the paths to our postfix stuff into the mailwrapper configuration instead? From danmcd at omniti.com Mon Mar 9 21:53:51 2015 From: danmcd at omniti.com (Dan McDonald) Date: Mon, 9 Mar 2015 17:53:51 -0400 Subject: [OmniOS-discuss] Bloody // mailwrapper & mta mediator In-Reply-To: <355501d05ab2$d6ac29b0$84047d10$@acm.org> References: <20150309040719.GS25463@bender.unx.csupomona.edu> <24D36878-45A2-4088-86E4-AF079CF6E81B@omniti.com> <355501d05ab2$d6ac29b0$84047d10$@acm.org> Message-ID: > On Mar 9, 2015, at 5:48 PM, Paul B. Henson wrote: > > So obviously mailwrapper would be installed, but what would happen with > sendmail? It would be installed as well, but could be removed? I'm not sure > exactly what IPS does with optional requirements. I'm actually not sure either about the installation, but the weakened requirement should allow for removal, which was, I believe, the big problem. > Currently we drop in our > own symlinks for /usr/lib/sendmail et al pointing to our installed postfix, > after the update, instead we would need to integrate the paths to our > postfix stuff into the mailwrapper configuration instead? The mailwrapper manifest uses mediators to do the right thing. I'm building what I hope will be this week's bloody as I'm typing this. You can try it then (I'm not updating install media with this bump, however, so you'll have to "pkg update" to it). Dan From omnios at citrus-it.net Mon Mar 9 22:34:10 2015 From: omnios at citrus-it.net (Andy) Date: Mon, 9 Mar 2015 22:34:10 +0000 (GMT) Subject: [OmniOS-discuss] Bloody // mailwrapper & mta mediator In-Reply-To: <355501d05ab2$d6ac29b0$84047d10$@acm.org> References: <20150309040719.GS25463@bender.unx.csupomona.edu> <24D36878-45A2-4088-86E4-AF079CF6E81B@omniti.com> <355501d05ab2$d6ac29b0$84047d10$@acm.org> Message-ID: On Mon, 9 Mar 2015, Paul B. Henson wrote: ; > From: Dan McDonald ; > Sent: Monday, March 09, 2015 10:19 AM ; > ; > Tell me, would weakening the requirement of sendmail by mailwrapper help? ; > ; > This keeps the spirit of the change, but doesn't trip up folks who want ; their own ; > sendmail (even if they are technically violating KYSTY in their version! ; ;) ). ; ; So obviously mailwrapper would be installed, but what would happen with ; sendmail? It would be installed as well, but could be removed? I'm not sure ; exactly what IPS does with optional requirements. Currently we drop in our ; own symlinks for /usr/lib/sendmail et al pointing to our installed postfix, ; after the update, instead we would need to integrate the paths to our ; postfix stuff into the mailwrapper configuration instead? You could do that - updating /etc/mailer.conf and leaving mailwrapper as the /usr/lib/sendmail - but you're probably better off updating your postfix package to use mediated symlinks. If you set the priority to 'site' it should override mailwrapper upon installation. 
Here's what we have in our MTA manifest now: link mediator=mta mediator-implementation=citrus-sendmail mediator-priority=site path=usr/lib/sendmail target=../../opt/sendmail/sbin/sendmail /opt/sendmail being where our Sendmail is. Similar entries for mailq, newaliases and /usr/sbin/sendmail. Andy -- Citrus IT Limited | +44 (0)870 199 8000 | enquiries at citrus-it.co.uk Rock House Farm | Green Moor | Wortley | Sheffield | S35 7DQ Registered in England and Wales | Company number 4899123 From henson at acm.org Tue Mar 10 00:08:32 2015 From: henson at acm.org (Paul B. Henson) Date: Mon, 9 Mar 2015 17:08:32 -0700 Subject: [OmniOS-discuss] Bloody // mailwrapper & mta mediator In-Reply-To: References: <20150309040719.GS25463@bender.unx.csupomona.edu> <24D36878-45A2-4088-86E4-AF079CF6E81B@omniti.com> <355501d05ab2$d6ac29b0$84047d10$@acm.org> Message-ID: <20150310000832.GT25463@bender.unx.csupomona.edu> On Mon, Mar 09, 2015 at 05:53:51PM -0400, Dan McDonald wrote: > The mailwrapper manifest uses mediators to do the right thing. We don't build postfix as an IPS package, we use pkgsrc. So I don't think mediators are going to work for me... From henson at acm.org Tue Mar 10 00:10:09 2015 From: henson at acm.org (Paul B. Henson) Date: Mon, 9 Mar 2015 17:10:09 -0700 Subject: [OmniOS-discuss] Bloody // mailwrapper & mta mediator In-Reply-To: References: <20150309040719.GS25463@bender.unx.csupomona.edu> <24D36878-45A2-4088-86E4-AF079CF6E81B@omniti.com> <355501d05ab2$d6ac29b0$84047d10$@acm.org> Message-ID: <20150310001009.GU25463@bender.unx.csupomona.edu> On Mon, Mar 09, 2015 at 10:34:10PM +0000, Andy wrote: > You could do that - updating /etc/mailer.conf and leaving mailwrapper as > the /usr/lib/sendmail - but you're probably better off updating your > postfix package to use mediated symlinks. If you set the priority to > 'site' it should override mailwrapper upon installation. So what's the point of having mailwrapper at all if some other package is going to basically completely replace everything it provides? At least if you're using an IPS packaged MTA. From stephan.budach at JVM.DE Tue Mar 10 10:48:39 2015 From: stephan.budach at JVM.DE (Stephan Budach) Date: Tue, 10 Mar 2015 11:48:39 +0100 Subject: [OmniOS-discuss] lsi sas 9211-8i it efi firmware 20.00.02.00 In-Reply-To: References: <32b301d057ba$d6168cc0$8243a640$@acm.org> <54F9FD0B.1040601@gnaa.net> <331901d05846$e406cbb0$ac146310$@acm.org> <54FA198B.7090300@gnaa.net> <54FA244B.1010600@gnaa.net> <54FB65FD.6040600@gmail.com> Message-ID: <54FECC07.7060606@jvm.de> Am 09.03.15 um 15:47 schrieb Dan McDonald: >> On Mar 9, 2015, at 10:23 AM, Eric Sproul wrote: >> >> On Sat, Mar 7, 2015 at 3:56 PM, Brogy?nyi J?zsef wrote: >>> Has anyone tested this firmware? Is it free from this error message "Parity >>> Error on path"? >>> Thanks any information. >> P20 firmware is known to be toxic; just google for "lsi p20 firmware" >> for the carnage. >> >> P19 and below are fine, as far as I know. > I've not heard good things about 19. I HAVE heard that 18 is the best level of FW to run for right now. > > Thanks! > Dan Is there a known good way to flash a LSI back to P18 if it already came with P19? I happen to have two new LSIs running P19. Afaik, the readme explicitly warns about flashing back the fw? 
Cheers, budy From filip.marvan at aira.cz Tue Mar 10 12:39:31 2015 From: filip.marvan at aira.cz (Filip Marvan) Date: Tue, 10 Mar 2015 13:39:31 +0100 Subject: [OmniOS-discuss] Howto install Grub on different device Message-ID: <3BE0DEED8863E5429BAE4CAEDF62456503AE56C49C1B@AIRA-SRV.aira.local> Hi, I have HP Microserver G8 with 4 drive bays and one SATA port. I would like to use this separate SATA port for SSD disk with system rpool, but Microserver G8 is not able to boot from this SATA, if AHCI mode is enabled and in drive bays are disks. So would like to try some workround. I would like to install GRUB on SD card, and use this SD card for booting (but all system with rpool will remain on SSD on SATA, only bootloader will be on SD card). I installed GRUB to SD card without any problems with: installgrub /boot/grub/stage1 /boot/grub/stage2 /dev/rdsk/c2t0d0s0 c2t0d0 is my microSD card, without any filesystem installed. There is only one Solaris2 partition. My active bootmenu entry in /rpool/boot/grub/menu.lst looks like this: title omnios-1 bootfs rpool/ROOT/omnios-1 root (hd5,0,a) kernel$ /platform/i86pc/kernel/amd64/unix -B $ZFS-BOOTFS module$ /platform/i86pc/amd64/boot_archive But if I boot HP Microserver from my SD card, it cannot locate my menu.lst config file and fall to grub> shell. If I enter command: configfile (hd5,0,a)/boot/grub/menu.lst I can boot withou any problems in exact way as I wand, but hot to configure GRUB, to use my config file on hd5 automatically? Thank you for any help! Filip Marvan -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: smime.p7s Type: application/pkcs7-signature Size: 6220 bytes Desc: not available URL: From chip at innovates.com Tue Mar 10 13:13:33 2015 From: chip at innovates.com (Schweiss, Chip) Date: Tue, 10 Mar 2015 08:13:33 -0500 Subject: [OmniOS-discuss] lsi sas 9211-8i it efi firmware 20.00.02.00 In-Reply-To: <54FECC07.7060606@jvm.de> References: <32b301d057ba$d6168cc0$8243a640$@acm.org> <54F9FD0B.1040601@gnaa.net> <331901d05846$e406cbb0$ac146310$@acm.org> <54FA198B.7090300@gnaa.net> <54FA244B.1010600@gnaa.net> <54FB65FD.6040600@gmail.com> <54FECC07.7060606@jvm.de> Message-ID: On Tue, Mar 10, 2015 at 5:48 AM, Stephan Budach wrote: > Am 09.03.15 um 15:47 schrieb Dan McDonald: > >> On Mar 9, 2015, at 10:23 AM, Eric Sproul >>> wrote: >>> >>> On Sat, Mar 7, 2015 at 3:56 PM, Brogy?nyi J?zsef >>> wrote: >>> >>>> Has anyone tested this firmware? Is it free from this error message >>>> "Parity >>>> Error on path"? >>>> Thanks any information. >>>> >>> P20 firmware is known to be toxic; just google for "lsi p20 firmware" >>> for the carnage. >>> >>> P19 and below are fine, as far as I know. >>> >> I've not heard good things about 19. I HAVE heard that 18 is the best >> level of FW to run for right now. >> >> Thanks! >> Dan >> > Is there a known good way to flash a LSI back to P18 if it already came > with P19? I happen to have two new LSIs running P19. > Afaik, the readme explicitly warns about flashing back the fw? > > Backwards is hard. I went through that trying to get v20 reverted on some new HBAs. The only method I could find that worked was using the UEFI shell and UEFI sas2flash utility to erase the firmware and install the old version. On older motherboards, the DOS method should work. Solaris/Illumos sas2flash is incapable of erasing the firmware. 
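For anyone needing the downgrade Chip describes, the usual UEFI-shell sequence looks roughly like the following. The flags and file names are the commonly cited ones for a 9211-8i IT-mode image and are assumptions here, not taken from this thread; check them against the README in the P18 download before erasing anything, and record the controller's SAS address first.

    Shell> sas2flash.efi -listall                # note the controller and its SAS address
    Shell> sas2flash.efi -o -e 6                 # advanced mode: erase the flash region
    Shell> sas2flash.efi -o -f 2118it.bin -b mptsas2.rom    # write P18 IT firmware and BIOS
    Shell> sas2flash.efi -o -sasadd 500605bxxxxxxxxx        # re-program the SAS address if it was cleared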
-Chip > Cheers, > budy
From chip at innovates.com Tue Mar 10 14:51:35 2015 From: chip at innovates.com (Schweiss, Chip) Date: Tue, 10 Mar 2015 09:51:35 -0500 Subject: [OmniOS-discuss] smtp-notify dependency on sendmail Message-ID:
I haven't used sendmail since the 1990's and don't intend to change. I've figured out how to get smtp-notify to start with sendmail-client disabled, but it was a manual process of using 'svccfg -s smtp-notify editprop'. What I can't figure out is how to do the same on the command line. Everything I try either gives a syntax error or 'svccfg: No such property group "startup_req".' I really don't want to have to add a manual step to my system setup scripts. What's the proper syntax for this setting?: svccfg -s system/fm/smtp-notify:default setprop "startup_req/entities" = fmri: \"svc:/milestone/multi-user:default svc:/system/fmd:default\" -Chip
From danmcd at omniti.com Tue Mar 10 15:36:54 2015 From: danmcd at omniti.com (Dan McDonald) Date: Tue, 10 Mar 2015 11:36:54 -0400 Subject: [OmniOS-discuss] smtp-notify dependency on sendmail In-Reply-To: References: Message-ID: <430FB4D8-1AB1-4675-B39F-84BE3928572B@omniti.com> > On Mar 10, 2015, at 10:51 AM, Schweiss, Chip wrote: > > svccfg -s system/fm/smtp-notify:default setprop "startup_req/entities" = fmri: \"svc:/milestone/multi-user:default svc:/system/fmd:default\" First off, the :default is for an *instance*. You want to lose that, as startup_req/entities is for the whole service. Second off, I don't know how to glom two FMRIs in one command line. Here's my proposed, two-command solution: svccfg -s system/fm/smtp-notify setprop startup_req/entities = fmri: svc:/milestone/multi-user:default svccfg -s system/fm/smtp-notify addpropvalue startup_req/entities fmri: svc:/system/fmd:default I got this to work on one of my VMs I use for bloody. Please confirm/deny this works for you? Hope this helps, Dan
From omnios at citrus-it.net Tue Mar 10 15:50:28 2015 From: omnios at citrus-it.net (Andy) Date: Tue, 10 Mar 2015 15:50:28 +0000 (GMT) Subject: [OmniOS-discuss] smtp-notify dependency on sendmail In-Reply-To: <430FB4D8-1AB1-4675-B39F-84BE3928572B@omniti.com> References: <430FB4D8-1AB1-4675-B39F-84BE3928572B@omniti.com> Message-ID: On Tue, 10 Mar 2015, Dan McDonald wrote: ; ; > On Mar 10, 2015, at 10:51 AM, Schweiss, Chip wrote: ; > ; > svccfg -s system/fm/smtp-notify:default setprop "startup_req/entities" = fmri: \"svc:/milestone/multi-user:default svc:/system/fmd:default\" Also, note that this list of dependencies is the default for bloody as the sendmail-client dependency was removed; you will be able to stop using this workaround in the future.
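For completeness, svccfg can also take both FMRIs in a single setprop by using its list syntax; the one-liner below is a sketch of that form (the shell quoting around the parentheses is the fiddly part), followed by a refresh and a read-back to confirm what was stored.

    # Single-command variant using svccfg's multi-valued list syntax
    svccfg -s system/fm/smtp-notify setprop startup_req/entities = fmri: \
        '("svc:/milestone/multi-user:default" "svc:/system/fmd:default")'

    # Pick up the change and verify it
    svcadm refresh svc:/system/fm/smtp-notify:default
    svcprop -p startup_req/entities system/fm/smtp-notify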
Andy

--
Citrus IT Limited | +44 (0)870 199 8000 | enquiries at citrus-it.co.uk
Rock House Farm | Green Moor | Wortley | Sheffield | S35 7DQ
Registered in England and Wales | Company number 4899123

From chip at innovates.com Tue Mar 10 15:55:50 2015
From: chip at innovates.com (Schweiss, Chip)
Date: Tue, 10 Mar 2015 10:55:50 -0500
Subject: [OmniOS-discuss] smtp-notify dependency on sendmail
In-Reply-To: <430FB4D8-1AB1-4675-B39F-84BE3928572B@omniti.com>
References: <430FB4D8-1AB1-4675-B39F-84BE3928572B@omniti.com>
Message-ID: 

On Tue, Mar 10, 2015 at 10:36 AM, Dan McDonald wrote:
>
> svccfg -s system/fm/smtp-notify setprop startup_req/entities = fmri:
> svc:/milestone/multi-user:default
> svccfg -s system/fm/smtp-notify addpropvalue startup_req/entities fmri:
> svc:/system/fmd:default

That's the trick. Thanks!

-Chip
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From jimklimov at cos.ru Tue Mar 10 17:29:27 2015
From: jimklimov at cos.ru (Jim Klimov)
Date: Tue, 10 Mar 2015 18:29:27 +0100
Subject: [OmniOS-discuss] smtp-notify dependency on sendmail
In-Reply-To: 
References: 
Message-ID: <5F51C49D-0B8E-42BC-A91B-502383F3D424@cos.ru>

On 10 March 2015 at 15:51:35 CET, "Schweiss, Chip" wrote:
>I haven't used sendmail since the 1990's and don't intend to change.
>
>I've figured out how to get smtp-notify to start with sendmail-client
>disable, but it was a manual process of using 'svccfg -s smtp-notify
>editprop'
>
>What I can't figure out how to do the same on the command line.
>Everything
>I try either gives a syntax error or 'svccfg: No such property group
>"startup_req".' I really don't want to have to add a manual step to my
>system setup scripts.
>
>What's the proper syntax for this setting?:
>
>svccfg -s system/fm/smtp-notify:default setprop "startup_req/entities"
>=
>fmri: \"svc:/milestone/multi-user:default svc:/system/fmd:default\"
>
>-Chip
>
>
>------------------------------------------------------------------------
>
>_______________________________________________
>OmniOS-discuss mailing list
>OmniOS-discuss at lists.omniti.com
>http://lists.omniti.com/mailman/listinfo/omnios-discuss

You may have to 'addpg' the property group first. Just 'editprop' any service to see syntax examples.

Choose to not depend on success of 'addpg' (it will fail if the pg is present already) but do check success of 'setprop'.

HTH, Jim
--
Typos courtesy of K-9 Mail on my Samsung Android

From danmcd at omniti.com Tue Mar 10 17:36:23 2015
From: danmcd at omniti.com (Dan McDonald)
Date: Tue, 10 Mar 2015 13:36:23 -0400
Subject: [OmniOS-discuss] smtp-notify dependency on sendmail
In-Reply-To: <5F51C49D-0B8E-42BC-A91B-502383F3D424@cos.ru>
References: <5F51C49D-0B8E-42BC-A91B-502383F3D424@cos.ru>
Message-ID: <44AD0F5A-4A1C-4A25-B5AA-DEFE9D8D3EAF@omniti.com>

> On Mar 10, 2015, at 1:29 PM, Jim Klimov wrote:
>
> You may have to 'addpg' the property group first. Just 'editprop' any service to see syntax examples.

No he won't have to in this case. The pg was already there.

> Choose to not depend on success of 'addpg' (it will fail if the pg is present already) but do check success of 'setprop'.

I did, and it worked for me. He did, and it appears to work for him too.

GENERALLY SPEAKING checking for the pg is a good idea. In this particular case, it's not needed.
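If you do want the defensive pattern Jim describes for the general case, a sketch would look something like this (the 'dependency' pg type below is a guess on my part -- check what an existing system reports with listpg before copying it):

#!/bin/sh
SVC=system/fm/smtp-notify
# Don't abort if the property group already exists...
svcprop -p startup_req $SVC >/dev/null 2>&1 || svccfg -s $SVC addpg startup_req dependency
# ...but do insist that the setprop/addpropvalue themselves succeed.
svccfg -s $SVC setprop startup_req/entities = fmri: svc:/milestone/multi-user:default || exit 1
svccfg -s $SVC addpropvalue startup_req/entities fmri: svc:/system/fmd:default || exit 1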
Dan From danmcd at omniti.com Tue Mar 10 18:02:32 2015 From: danmcd at omniti.com (Dan McDonald) Date: Tue, 10 Mar 2015 14:02:32 -0400 Subject: [OmniOS-discuss] 2nd-to-last bloody update for r151013 Message-ID: <694C0E20-17D7-468C-9C40-2461EE6350EB@omniti.com> Hey folks! We're winding down to the release of r151014, which is not just the next Stable, but also the next LTS, replacing r151006. No release media for this release, but the repo has been updated COMPLETELY. This means you'll get prompted to "pkg update pkg" first, followed by a proper "pkg update". The last bloody didn't have a lot of changes. This one does. Let's go over them: * The distro_const(1M) command, which creates installation ISOs, now has signature policy as a configurable. This is in anticipation of r151014's change to have the "omnios" repository REQUIRE SIGNATURES. * omnios-build master branch, revision 51bf0ac * The linked-ipkg (lipkg) brand is now part of the entire consolidation. * Bash is now 4.3PL33 * Bind is now 9.10.2 * gnu-binutils is now 2.25 * bison is now 3.0.4 * NSS is at 3.17.4, NSPR is at 4.10.7, and ca-bundle has NSS 3.17.4 goodies in it. * Curl is now 7.41.0 * Amazon EC2 API is now 1.7.3.0 * gmp is now 6.0.0a (but versioned as 6.0.0 like its tarball) * gettext is 0.19.4 * gnu-grep is 2.21 * ipmitool is now 1.8.15 * iso-codes are now 3.57 * numpy is 1.9.2 * libidn is now 1.30 * libpcap is now 1.6.2 * NTP is now 6.7p1 * gnu-patch is now 2.7.4 * pv/pipe-viewer is now 1.5.7 * lxml-26 is now 3.4.2 * Mako is now 1.0.1 * pycurl is now 7.19.5.1 * simplejson is now 3.6.5 * sigcpp is now 2.4.0 * sqlite-3 is now 3.8.8.2 * git 2.3.0 is now properly versioned in omnios-userland. * illumos-omnios master branch, revision dd90365 (last illumos-gate merge e492095) * Zones now inherit the global zone's per-publisher signature policies both upon creation and upon attach. * A softening of Illumos's "mailwrapper" package dependencies in the hopes of allowing custom sendmails more room to play in /etc/. * Various small bugfixes all over the system, including ZFS. * beadm(1M) now sorts by BE creation date (and can sort other ways with new options). * While not available in the installation tools yet (and might not be until the r151016 release), you can now create a bootable root ZFS pool on EFI/GPI disks. (Illumos #5125 and #5560-1.) We have one more update for omnios-build (libffi, if needed), and we're planning to take some more from upstream illumos-gate before we close for the r151014 release (I'm hopeful several new Ethernet chipsets will be showing up). Please try out zone creation and upgrades if you haven't already! And make sure the new versions of any software mentioned above aren't surprising you. (We've had no surprises thus far.) Thanks! Dan From tim at multitalents.net Tue Mar 10 23:32:40 2015 From: tim at multitalents.net (Tim Rice) Date: Tue, 10 Mar 2015 16:32:40 -0700 (PDT) Subject: [OmniOS-discuss] lsi sas 9211-8i it efi firmware 20.00.02.00 In-Reply-To: <54FECC07.7060606@jvm.de> References: <32b301d057ba$d6168cc0$8243a640$@acm.org> <54F9FD0B.1040601@gnaa.net> <331901d05846$e406cbb0$ac146310$@acm.org> <54FA198B.7090300@gnaa.net> <54FA244B.1010600@gnaa.net> <54FB65FD.6040600@gmail.com> <54FECC07.7060606@jvm.de> Message-ID: On Tue, 10 Mar 2015, Stephan Budach wrote: > Is there a known good way to flash a LSI back to P18 if it already came with > P19? I happen to have two new LSIs running P19. > Afaik, the readme explicitly warns about flashing back the fw? 
Here are my notes from downgrading from P20 to P19 on a Supermicro box. Modify as needed to go from P19 to P18.

......
downgrade to P19. P20 has serious bugs.

boot into UEFI shell
get to usb
  fs1:
get to fw dir
  cd 9211_8i.p19
erase newer fw
  sas2flash.efi -o -e 6
load new fw and bios
  sas2flash.efi -o -l 2118.log -f 2118it.bin -b mptsas2.rom
......

--
Tim Rice		Multitalents
tim at multitalents.net

From omnios at citrus-it.net Wed Mar 11 00:35:13 2015
From: omnios at citrus-it.net (Andy)
Date: Wed, 11 Mar 2015 00:35:13 +0000 (GMT)
Subject: [OmniOS-discuss] Dell vs. Supermicro and any recommendations..
In-Reply-To: <81F2A38D-5298-4FB8-B0BB-24E4506D6040@omniti.com>
References: <81F2A38D-5298-4FB8-B0BB-24E4506D6040@omniti.com>
Message-ID: 

On Fri, 27 Feb 2015, Dan McDonald wrote:
;
; > On Feb 27, 2015, at 5:05 AM, Andy wrote:
; >
; > If we go ahead, I'll let you all know how it goes!
;
; Please do that. If you can zap a Dell Standard HBA out of HW-RAID and into a raw-disk controller, that'd be a HUGE WIN for illumos distros everywhere.

Initial results look good to me. This is a Dell R730 with a PERC H730 RAID card in it and just a pair of 300GB SAS disks for now. The card is configured in non-RAID/HBA mode through the standard BIOS menus.

# prtconf -d
...
    pci8086,2f02 (pciex8086,2f02) [Intel Corporation Haswell-E PCI Express Root Port 1], instance #0
        pci1028,1f49 (pciex1000,5d) [LSI Logic / Symbios Logic MegaRAID SAS-3 3108 [Invader]], instance #0
            sd, instance #0
            sd, instance #1

# iostat -En
c0t0d1  Soft Errors: 0 Hard Errors: 0 Transport Errors: 0
Vendor: SEAGATE  Product: ST300MP0005  Revision: VS08  Serial No: S7xxx
Size: 300.00GB <300000000000 bytes>
Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0
Illegal Request: 0 Predictive Failure Analysis: 0
c0t1d1  Soft Errors: 0 Hard Errors: 0 Transport Errors: 0
Vendor: SEAGATE  Product: ST300MP0005  Revision: VS08  Serial No: S7xxx
Size: 300.00GB <300000000000 bytes>
Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0
Illegal Request: 0 Predictive Failure Analysis: 0

I do get this during boot:

SunOS Release 5.11 Version omnios-10b9c79 64-bit
Copyright (c) 1983, 2010, Oracle and/or its affiliates. All rights reserved.
NOTICE: map sync received, switched map_id to 1
NOTICE: LDMAP sync completed.
WARNING: /pci at 0,0/pci8086,2f02 at 1/pci1028,1f49 at 0/sd at 0,1 (sd0):
	Command failed to complete...Device is gone
WARNING: /pci at 0,0/pci8086,2f02 at 1/pci1028,1f49 at 0/sd at 1,1 (sd1):
	Command failed to complete...Device is gone

but everything seems ok afterwards. Will continue with testing tomorrow.

Andy

--
Citrus IT Limited | +44 (0)870 199 8000 | enquiries at citrus-it.co.uk
Rock House Farm | Green Moor | Wortley | Sheffield | S35 7DQ
Registered in England and Wales | Company number 4899123

From tobi at oetiker.ch Wed Mar 11 08:20:04 2015
From: tobi at oetiker.ch (Tobias Oetiker)
Date: Wed, 11 Mar 2015 09:20:04 +0100 (CET)
Subject: [OmniOS-discuss] About P19
Message-ID: 

Dan,

you mentioned in an earlier post that you had not heard anything good about P19 ... this seems to prompt people to consider downgrading to P18 ...

Did you mean to say that you had heard something BAD about P19, or just nothing at all? Because I like my firmware best when it just does what it is supposed to do and no one even thinks about it.

We are running P19 currently on one of our boxes, and it works ok.
(It did not solve the problem that prompted us to upgrade, which is that we are seeing disks going offline for a few seconds every few weeks causing zfs to mark them as faulted. But it did not make it worse either, so we are looking at the disk firmware now ... ) cheers tobi -- Tobi Oetiker, OETIKER+PARTNER AG, Aarweg 15 CH-4600 Olten, Switzerland www.oetiker.ch tobi at oetiker.ch +41 62 775 9902 From johan.kragsterman at capvert.se Wed Mar 11 09:12:07 2015 From: johan.kragsterman at capvert.se (Johan Kragsterman) Date: Wed, 11 Mar 2015 10:12:07 +0100 Subject: [OmniOS-discuss] Ang: About P19 In-Reply-To: References: Message-ID: An HTML attachment was scrubbed... URL: From eric.sproul at circonus.com Wed Mar 11 14:16:38 2015 From: eric.sproul at circonus.com (Eric Sproul) Date: Wed, 11 Mar 2015 10:16:38 -0400 Subject: [OmniOS-discuss] 2nd-to-last bloody update for r151013 In-Reply-To: <694C0E20-17D7-468C-9C40-2461EE6350EB@omniti.com> References: <694C0E20-17D7-468C-9C40-2461EE6350EB@omniti.com> Message-ID: On Tue, Mar 10, 2015 at 2:02 PM, Dan McDonald wrote: > Hey folks! We're winding down to the release of r151014, which is not just the next Stable, but also the next LTS, replacing r151006. No release media for this release, but the repo has been updated COMPLETELY. This means you'll get prompted to "pkg update pkg" first, followed by a proper "pkg update". Thanks Dan, I just upgraded to the latest and noticed that arcstat throws this error before every line of stats output: Use of uninitialized value in division (/) at /usr/bin/arcstat line 329. It looks like a proposed fix is up for review on the OpenZFS dev list: https://reviews.csiden.org/r/164/ and the illumos bug report is https://www.illumos.org/issues/5564 Eric From danmcd at omniti.com Wed Mar 11 14:22:14 2015 From: danmcd at omniti.com (Dan McDonald) Date: Wed, 11 Mar 2015 10:22:14 -0400 Subject: [OmniOS-discuss] About P19 In-Reply-To: References: Message-ID: <97E9D2FC-2B37-49C9-966B-A21B65361532@omniti.com> > On Mar 11, 2015, at 4:20 AM, Tobias Oetiker wrote: > > Dan, > > you mentioned in an earlier post that you had not heard anything > good about P19 ... this seems to prompt people to consider > downgreading to P18 ... I've heard little/nothing about P19. I've only heard P18 is known to be good. Dan From danmcd at omniti.com Wed Mar 11 14:39:18 2015 From: danmcd at omniti.com (Dan McDonald) Date: Wed, 11 Mar 2015 10:39:18 -0400 Subject: [OmniOS-discuss] 2nd-to-last bloody update for r151013 In-Reply-To: References: <694C0E20-17D7-468C-9C40-2461EE6350EB@omniti.com> Message-ID: <98E3747C-7E1B-41B6-8075-30A68535AB20@omniti.com> > On Mar 11, 2015, at 10:16 AM, Eric Sproul wrote: > > On Tue, Mar 10, 2015 at 2:02 PM, Dan McDonald wrote: >> Hey folks! We're winding down to the release of r151014, which is not just the next Stable, but also the next LTS, replacing r151006. No release media for this release, but the repo has been updated COMPLETELY. This means you'll get prompted to "pkg update pkg" first, followed by a proper "pkg update". > > Thanks Dan, > I just upgraded to the latest and noticed that arcstat throws this > error before every line of stats output: > > Use of uninitialized value in division (/) at /usr/bin/arcstat line 329. 
> > It looks like a proposed fix is up for review on the OpenZFS dev list: > https://reviews.csiden.org/r/164/ and the illumos bug report is > https://www.illumos.org/issues/5564 It will show up on the next bloody update come hell (via upstream) or high water (if I have to merge it manually myself). Dan From danmcd at omniti.com Wed Mar 11 14:42:35 2015 From: danmcd at omniti.com (Dan McDonald) Date: Wed, 11 Mar 2015 10:42:35 -0400 Subject: [OmniOS-discuss] 2nd-to-last bloody update for r151013 In-Reply-To: <98E3747C-7E1B-41B6-8075-30A68535AB20@omniti.com> References: <694C0E20-17D7-468C-9C40-2461EE6350EB@omniti.com> <98E3747C-7E1B-41B6-8075-30A68535AB20@omniti.com> Message-ID: <78215C27-657C-445F-97C2-12B5DC21877F@omniti.com> > On Mar 11, 2015, at 10:39 AM, Dan McDonald wrote: > >> >> It looks like a proposed fix is up for review on the OpenZFS dev list: >> https://reviews.csiden.org/r/164/ and the illumos bug report is >> https://www.illumos.org/issues/5564 > > It will show up on the next bloody update come hell (via upstream) or high water (if I have to merge it manually myself). Actually, I approved the RTI for 5564 late yesterday. It's literally just the committer typing "git commit --amend" (adding a missing reviewer credit) and "git push" and it'll be in our next pull from upstream. :) Dan From chip at innovates.com Wed Mar 11 14:48:11 2015 From: chip at innovates.com (Schweiss, Chip) Date: Wed, 11 Mar 2015 09:48:11 -0500 Subject: [OmniOS-discuss] About P19 In-Reply-To: <97E9D2FC-2B37-49C9-966B-A21B65361532@omniti.com> References: <97E9D2FC-2B37-49C9-966B-A21B65361532@omniti.com> Message-ID: I have P19 on 3 active servers. No issues. I consider it safe. Also interesting, P20 was on them when I first purchased them. It was nearly a month of usage before I found out about P20 and then downgraded. I didn't have any problems with P20 like others were seeing. -Chip On Wed, Mar 11, 2015 at 9:22 AM, Dan McDonald wrote: > > > On Mar 11, 2015, at 4:20 AM, Tobias Oetiker wrote: > > > > Dan, > > > > you mentioned in an earlier post that you had not heard anything > > good about P19 ... this seems to prompt people to consider > > downgreading to P18 ... > > I've heard little/nothing about P19. I've only heard P18 is known to be > good. > > Dan > > _______________________________________________ > OmniOS-discuss mailing list > OmniOS-discuss at lists.omniti.com > http://lists.omniti.com/mailman/listinfo/omnios-discuss > -------------- next part -------------- An HTML attachment was scrubbed... URL: From danmcd at omniti.com Wed Mar 11 15:18:41 2015 From: danmcd at omniti.com (Dan McDonald) Date: Wed, 11 Mar 2015 11:18:41 -0400 Subject: [OmniOS-discuss] TEST REQUEST - Running kvm in a zone verification Message-ID: <5F988F8F-D3C4-4085-98C6-2D0D3D27E3AB@omniti.com> This most recent update to bloody has put the /dev/kvm device entry into non-global ipkg or lipkg zones. The idea is you can run KVM instances in zones, which is something not available in r151012 or earlier. Our standard methods for running KVM apply: http://omnios.omniti.com/wiki.php/VirtualMachinesKVM But you MUST FIRST dedicate a vnic to the zone in question from the global zone: by creating one in the global and then "add net/set physical" in zonecfg(1M). Furthemore, that vnic cannot be used for that zone's normal activity (so you'll likely need two vnics, unless you want the zone to do nothing but run KVM). You MUST also dedicate a filesystem to the zone using the "dataset" methods in zonecfg(1M). 
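To make that concrete, here is an untested sketch of the global-zone prep and zonecfg I have in mind (the igb0 link, the zone name, the zonepath and the tank/kvmzone dataset are all placeholders):

# In the global zone: one vnic for the zone itself, one reserved for the KVM guest
dladm create-vnic -l igb0 kvmzone0
dladm create-vnic -l igb0 kvmguest0

zonecfg -z kvmzone
create -b
set brand=lipkg
set zonepath=/zones/kvmzone
set ip-type=exclusive
set autoboot=false
add net
set physical=kvmzone0
end
add net
set physical=kvmguest0
end
add dataset
set name=tank/kvmzone
end
commit
exit

Install and boot the zone as usual, then run qemu-system-x86_64 from inside it against a zvol created under the delegated dataset.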
I've not been able to test this yet, so I cannot yet make the claim that "you can run KVM in a zone starting with r151014". I would appreciate some community help here. I have *some* availability for questions and help, but I really would like someone to take this and run with it. Thanks, Dan From jdg117 at elvis.arl.psu.edu Wed Mar 11 18:23:05 2015 From: jdg117 at elvis.arl.psu.edu (John D Groenveld) Date: Wed, 11 Mar 2015 14:23:05 -0400 Subject: [OmniOS-discuss] TEST REQUEST - Running kvm in a zone verification In-Reply-To: Your message of "Wed, 11 Mar 2015 11:18:41 EDT." <5F988F8F-D3C4-4085-98C6-2D0D3D27E3AB@omniti.com> References: <5F988F8F-D3C4-4085-98C6-2D0D3D27E3AB@omniti.com> Message-ID: <201503111823.t2BIN5NY007656@elvis.arl.psu.edu> In message <5F988F8F-D3C4-4085-98C6-2D0D3D27E3AB at omniti.com>, Dan McDonald writ es: >This most recent update to bloody has put the /dev/kvm device entry into non-g >lobal ipkg or lipkg zones. The idea is you can run KVM instances in zones, wh >ich is something not available in r151012 or earlier. You can run a single KVM instance in a zone with r151012 but you must add /dev/dld and /dev/kvm. >I've not been able to test this yet, so I cannot yet make the claim that "you >can run KVM in a zone starting with r151014". I would appreciate some communi >ty help here. I have *some* availability for questions and help, but I really > would like someone to take this and run with it. Awesome! Now to hunt down some raw iron for bloody. John groenveld at acm.org # cat /etc/release OmniOS v11 r151012 Copyright 2014 OmniTI Computer Consulting, Inc. All rights reserved. Use is subject to license terms # zonecfg -z doors export create -b set zonepath=/var/opt/zones/doors set brand=ipkg set autoboot=false set ip-type=exclusive add net set physical=vnic2 end add net set physical=vnic3 end add device set match=/dev/kvm end add device set match=/dev/dld end #!/usr/bin/bash # configuration NAME=doors VNIC=vnic3 HDD=/root/doors.raw CD=/root/openSUSE-13.2-DVD-x86_64.iso VNC=5 MEM=8192 mac=`dladm show-vnic -po macaddress $VNIC` /usr/bin/qemu-system-x86_64 \ -name $NAME \ -boot cd \ -enable-kvm \ -vnc 0.0.0.0:$VNC \ -cpu host \ -smp 4 \ -m $MEM \ -no-hpet \ -usbdevice tablet \ -localtime \ -drive file=$HDD,if=ide,index=0 \ -drive file=$CD,media=cdrom,if=ide,index=2 \ -net nic,vlan=0,name=net0,model=e1000,macaddr=$mac \ -net vnic,vlan=0,name=net0,ifname=$VNIC,macaddr=$mac \ -vga cirrus \ -monitor unix:/tmp/$NAME.monitor,server,nowait,nodelay \ -daemonize if [ $? -gt 0 ]; then echo "Failed to start VM" fi port=`expr 5900 + $VNC` public_nic=$(dladm show-vnic|grep vnic2|awk '{print $2}') public_ip=$(ifconfig $public_nic|grep inet|awk '{print $2}') echo "Started VM:" echo "Public: ${public_ip}:${port}" From danmcd at omniti.com Wed Mar 11 19:05:57 2015 From: danmcd at omniti.com (Dan McDonald) Date: Wed, 11 Mar 2015 15:05:57 -0400 Subject: [OmniOS-discuss] TEST REQUEST - Running kvm in a zone verification In-Reply-To: <201503111823.t2BIN5NY007656@elvis.arl.psu.edu> References: <5F988F8F-D3C4-4085-98C6-2D0D3D27E3AB@omniti.com> <201503111823.t2BIN5NY007656@elvis.arl.psu.edu> Message-ID: <8AFC27C8-2D56-4821-8525-F7B944D57F45@omniti.com> > On Mar 11, 2015, at 2:23 PM, John D Groenveld wrote: > > In message <5F988F8F-D3C4-4085-98C6-2D0D3D27E3AB at omniti.com>, Dan McDonald writ > es: >> This most recent update to bloody has put the /dev/kvm device entry into non-g >> lobal ipkg or lipkg zones. 
The idea is you can run KVM instances in zones, wh >> ich is something not available in r151012 or earlier. > > You can run a single KVM instance in a zone with r151012 > but you must add /dev/dld and /dev/kvm. Oh hell! I actually didn't add KVM into the platform.xml files. /dev/dld *is* in the zones, if you use exclusive-stack (and I don't know a good reason NOT to these days...). So your script still needs to add /dev/kvm, until I patch illumos-omnios with /dev/kvm in the appropriate platform.xml files. >> I've not been able to test this yet, so I cannot yet make the claim that "you >> can run KVM in a zone starting with r151014". I would appreciate some communi >> ty help here. I have *some* availability for questions and help, but I really >> would like someone to take this and run with it. > > Awesome! > Now to hunt down some raw iron for bloody. I'm going to have to push this back: commit af30091afd0ccd9320c3aee83ac15318e8d9e78f Author: Dan McDonald Date: Wed Mar 11 15:02:54 2015 -0400 Add kvm device accessability to ipkg/lipkg zones. diff --git a/usr/src/lib/brand/ipkg/zone/platform.xml b/usr/src/lib/brand/ipkg/zone/platform.xml index db40c9f..1e4fd5c 100644 --- a/usr/src/lib/brand/ipkg/zone/platform.xml +++ b/usr/src/lib/brand/ipkg/zone/platform.xml @@ -54,6 +54,7 @@ + diff --git a/usr/src/lib/brand/lipkg/zone/platform.xml b/usr/src/lib/brand/lipkg/zone/platform.xml index c5c6041..7433d22 100644 --- a/usr/src/lib/brand/lipkg/zone/platform.xml +++ b/usr/src/lib/brand/lipkg/zone/platform.xml @@ -54,6 +54,7 @@ + Thanks for the help and reality check! Dan From danmcd at omniti.com Wed Mar 11 19:27:57 2015 From: danmcd at omniti.com (Dan McDonald) Date: Wed, 11 Mar 2015 15:27:57 -0400 Subject: [OmniOS-discuss] rsync & MacOS Message-ID: <3FFD70E1-8FFD-4440-8AC4-2CA310D3B2E7@omniti.com> MacOS is weird because of its forked files. I use NFS and everything works out okay. I would like, however, to use rsync to mirror my home directory, as even with GigE, it takes a long time to back things up from scratch. Does the MacOS X native rsync client work with 10.6 or 10.10 (I have machines running both, but nothing in between)? Do I need special patches either on my clients or on my OmniOS server (r151012 for now, 014 shortly coming). There's no rsync changes between 012 and 014 (3.1.1), so if it works for 012, it will keep working. Thanks, Dan From lists at marzocchi.net Wed Mar 11 19:38:16 2015 From: lists at marzocchi.net (Olaf Marzocchi) Date: Wed, 11 Mar 2015 20:38:16 +0100 Subject: [OmniOS-discuss] rsync & MacOS In-Reply-To: <3FFD70E1-8FFD-4440-8AC4-2CA310D3B2E7@omniti.com> References: <3FFD70E1-8FFD-4440-8AC4-2CA310D3B2E7@omniti.com> Message-ID: <01391483-5BF3-4642-A592-4405EB2CCCA7@marzocchi.net> Hi Dan, I can give you some links, as in the past I was using rsync to backup OSX data. https://static.afp548.com/mactips/rsync.html (some old hints about how to compile rsync) http://www.n8gray.org/code/backup-bouncer/ (a script to verify correct backups of OSX data using rsync) In any case, I think that if you compile rsync by adding the obviously named patches provided with rsync itself, you should be fine. It?s also true that recent OSXs almost don?t use resource forks, if you backup xattrs it should be enough. Olaf > Il giorno 11/mar/2015, alle ore 20:27, Dan McDonald ha scritto: > > MacOS is weird because of its forked files. I use NFS and everything works out okay. 
I would like, however, to use rsync to mirror my home directory, as even with GigE, it takes a long time to back things up from scratch. > > Does the MacOS X native rsync client work with 10.6 or 10.10 (I have machines running both, but nothing in between)? Do I need special patches either on my clients or on my OmniOS server (r151012 for now, 014 shortly coming). There's no rsync changes between 012 and 014 (3.1.1), so if it works for 012, it will keep working. > > Thanks, > Dan > > _______________________________________________ > OmniOS-discuss mailing list > OmniOS-discuss at lists.omniti.com > http://lists.omniti.com/mailman/listinfo/omnios-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From danmcd at omniti.com Wed Mar 11 19:40:43 2015 From: danmcd at omniti.com (Dan McDonald) Date: Wed, 11 Mar 2015 15:40:43 -0400 Subject: [OmniOS-discuss] rsync & MacOS In-Reply-To: <01391483-5BF3-4642-A592-4405EB2CCCA7@marzocchi.net> References: <3FFD70E1-8FFD-4440-8AC4-2CA310D3B2E7@omniti.com> <01391483-5BF3-4642-A592-4405EB2CCCA7@marzocchi.net> Message-ID: <2E06E450-631E-4623-A91E-06385AC6C81B@omniti.com> > On Mar 11, 2015, at 3:38 PM, Olaf Marzocchi wrote: > > In any case, I think that if you compile rsync by adding the obviously named patches provided with rsync itself, you should be fine. It?s also true that recent OSXs almost don?t use resource forks, if you backup xattrs it should be enough. I have some OLD files in my homedir, some may even predate MacOS X, so I do worry about resource forks or Creator/Type metadata. Thanks! Dan From lists at marzocchi.net Wed Mar 11 19:45:26 2015 From: lists at marzocchi.net (Olaf Marzocchi) Date: Wed, 11 Mar 2015 20:45:26 +0100 Subject: [OmniOS-discuss] rsync & MacOS In-Reply-To: <2E06E450-631E-4623-A91E-06385AC6C81B@omniti.com> References: <3FFD70E1-8FFD-4440-8AC4-2CA310D3B2E7@omniti.com> <01391483-5BF3-4642-A592-4405EB2CCCA7@marzocchi.net> <2E06E450-631E-4623-A91E-06385AC6C81B@omniti.com> Message-ID: <257E5BA5-E0B8-42D1-962E-51D5CDFBE0CF@marzocchi.net> > Il giorno 11/mar/2015, alle ore 20:40, Dan McDonald ha scritto: > > >> On Mar 11, 2015, at 3:38 PM, Olaf Marzocchi wrote: >> >> In any case, I think that if you compile rsync by adding the obviously named patches provided with rsync itself, you should be fine. It?s also true that recent OSXs almost don?t use resource forks, if you backup xattrs it should be enough. > > I have some OLD files in my homedir, some may even predate MacOS X, so I do worry about resource forks or Creator/Type metadata. I would add the patches then, my first guess: acls fileflags hfs-compression xattrs Backup-bouncer is the key to ensure completeness of the backups :) Olaf From tobi at oetiker.ch Wed Mar 11 20:20:19 2015 From: tobi at oetiker.ch (Tobias Oetiker) Date: Wed, 11 Mar 2015 21:20:19 +0100 (CET) Subject: [OmniOS-discuss] 5296 Support for more than 16 groups with AUTH_SYS Message-ID: Is https://github.com/illumos/illumos-gate/commit/89621fe174cf95ae903df6ceab605bf24d696ac3 in 14 ? 
cheers tobi -- Tobi Oetiker, OETIKER+PARTNER AG, Aarweg 15 CH-4600 Olten, Switzerland www.oetiker.ch tobi at oetiker.ch +41 62 775 9902 From danmcd at omniti.com Wed Mar 11 20:31:08 2015 From: danmcd at omniti.com (Dan McDonald) Date: Wed, 11 Mar 2015 16:31:08 -0400 Subject: [OmniOS-discuss] 5296 Support for more than 16 groups with AUTH_SYS In-Reply-To: References: Message-ID: <6287C694-839D-43CA-8A20-E5067C0564EF@omniti.com> > On Mar 11, 2015, at 4:20 PM, Tobias Oetiker wrote: > > Is > > https://github.com/illumos/illumos-gate/commit/89621fe174cf95ae903df6ceab605bf24d696ac3 > > in 14 ? Sure is: https://github.com/omniti-labs/illumos-omnios/commit/89621fe174cf95ae903df6ceab605bf24d696ac3 Unless it's VERY new, or not in illumos-gate yet, you can assume it's going to be in 014. We close the window on illumos-gate synching sometime in the next 1-3 weeks. Dan From danmcd at omniti.com Wed Mar 11 20:51:21 2015 From: danmcd at omniti.com (Dan McDonald) Date: Wed, 11 Mar 2015 16:51:21 -0400 Subject: [OmniOS-discuss] TEST REQUEST - Running kvm in a zone verification In-Reply-To: <8AFC27C8-2D56-4821-8525-F7B944D57F45@omniti.com> References: <5F988F8F-D3C4-4085-98C6-2D0D3D27E3AB@omniti.com> <201503111823.t2BIN5NY007656@elvis.arl.psu.edu> <8AFC27C8-2D56-4821-8525-F7B944D57F45@omniti.com> Message-ID: <73FC7A22-8BFD-4806-9136-756243BB9ACF@omniti.com> > On Mar 11, 2015, at 3:05 PM, Dan McDonald wrote: > > > Oh hell! I actually didn't add KVM into the platform.xml files. /dev/dld *is* in the zones, if you use exclusive-stack (and I don't know a good reason NOT to these days...). I've just pushed out an update to system/zones, which will require a new BE and a reboot, but it has the kvm in platform.xml for both ipkg and lipkg brands. Dan From cf at ferebee.net Wed Mar 11 21:13:18 2015 From: cf at ferebee.net (Chris Ferebee) Date: Wed, 11 Mar 2015 22:13:18 +0100 Subject: [OmniOS-discuss] rsync & MacOS In-Reply-To: <257E5BA5-E0B8-42D1-962E-51D5CDFBE0CF@marzocchi.net> References: <3FFD70E1-8FFD-4440-8AC4-2CA310D3B2E7@omniti.com> <01391483-5BF3-4642-A592-4405EB2CCCA7@marzocchi.net> <2E06E450-631E-4623-A91E-06385AC6C81B@omniti.com> <257E5BA5-E0B8-42D1-962E-51D5CDFBE0CF@marzocchi.net> Message-ID: <7C8CD1D7-A318-4E15-9B96-4C531C09D16C@ferebee.net> You can get a precompiled copy of rsync 3.0.9 for OS X that includes --xattrs, --acls, --fileflags in the mlbackup package by Pepi Zawodsky (@MacLemon) from Best, Chris > Am 11.03.2015 um 20:45 schrieb Olaf Marzocchi : > > >> Il giorno 11/mar/2015, alle ore 20:40, Dan McDonald ha scritto: >> >> >>> On Mar 11, 2015, at 3:38 PM, Olaf Marzocchi wrote: >>> >>> In any case, I think that if you compile rsync by adding the obviously named patches provided with rsync itself, you should be fine. It?s also true that recent OSXs almost don?t use resource forks, if you backup xattrs it should be enough. >> >> I have some OLD files in my homedir, some may even predate MacOS X, so I do worry about resource forks or Creator/Type metadata. 
> > I would add the patches then, my first guess: > > acls > fileflags > hfs-compression > xattrs > > Backup-bouncer is the key to ensure completeness of the backups :) > > Olaf > _______________________________________________ > OmniOS-discuss mailing list > OmniOS-discuss at lists.omniti.com > http://lists.omniti.com/mailman/listinfo/omnios-discuss From omnios at citrus-it.net Wed Mar 11 23:39:05 2015 From: omnios at citrus-it.net (Andy) Date: Wed, 11 Mar 2015 23:39:05 +0000 (GMT) Subject: [OmniOS-discuss] TEST REQUEST - Running kvm in a zone verification In-Reply-To: <73FC7A22-8BFD-4806-9136-756243BB9ACF@omniti.com> References: <5F988F8F-D3C4-4085-98C6-2D0D3D27E3AB@omniti.com> <201503111823.t2BIN5NY007656@elvis.arl.psu.edu> <8AFC27C8-2D56-4821-8525-F7B944D57F45@omniti.com> <73FC7A22-8BFD-4806-9136-756243BB9ACF@omniti.com> Message-ID: On Wed, 11 Mar 2015, Dan McDonald wrote: ; ; > On Mar 11, 2015, at 3:05 PM, Dan McDonald wrote: ; > ; > ; > Oh hell! I actually didn't add KVM into the platform.xml files. /dev/dld *is* in the zones, if you use exclusive-stack (and I don't know a good reason NOT to these days...). ; ; I've just pushed out an update to system/zones, which will require a new BE and a reboot, but it has the kvm in platform.xml for both ipkg and lipkg brands. Working fine for me in an lipkg zone. root at test:/root# zoneadm list -vc ID NAME STATUS PATH BRAND IP 6 test running / native excl root at test:/root# svccfg -s kvm svc:/system/kvm> add bsd0 svc:/system/kvm> select bsd0 svc:/system/kvm:bsd0> addpg config application svc:/system/kvm:bsd0> setprop config/vnic=bsd0 svc:/system/kvm:bsd0> setprop config/vnc=5 svc:/system/kvm:bsd0> setprop config/mem=4G svc:/system/kvm:bsd0> setprop config/hdd=/dev/zvol/rdsk/test/bsd/hdd0 svc:/system/kvm:bsd0> setprop config/iso=/FreeBSD-9.2-RELEASE-amd64-disc1.iso svc:/system/kvm:bsd0> end root at test:/root# svcadm enable kvm:bsd0 root at test:/root# svcs kvm:bsd0 STATE STIME FMRI online 0:23:08 svc:/system/kvm:bsd0 oot at test:/root# netstat -an | grep 590 *.5905 *.* 0 0 128000 0 LISTEN 172.29.0.95.5905 172.29.0.10.54043 89984 0 128872 0 ESTABLISHED *.5905 *.* 0 0 128000 0 LISTEN root at test:/root# cat /var/svc/log/system-kvm:bsd0.log [ Mar 12 00:29:52 Executing start method ("/lib/svc/method/kvm start"). ] svcprop: Couldn't find property `config/extra' for instance `svc:/system/kvm:bsd0'. 
STARTING WITH: /usr/bin/qemu-system-x86_64 -name bsd0 -enable-kvm -vnc :5 -smp 10 -m 4G -no-hpet -localtime -drive file=/dev/zvol/rdsk/test/bsd/hdd0,if=ide,index=0 -net nic,vlan=0,name=net0,model=e1000,macaddr=2:8:20:25:52:33 -net vnic,vlan=0,name=net0,ifname=bsd0,macaddr=2:8:20:25:52:33 -vga std -daemonize -drive file=/FreeBSD-9.2-RELEASE-amd64-disc1.iso,media=cdrom,if=ide,index=2 -boot cd multiticks: timer_create: Not owner multiticks: could not create timer; disabling timer_create: Not owner Dynamic Ticks disabled qemu-system-x86_64: -net vnic,vlan=0,name=net0,ifname=bsd0,macaddr=2:8:20:25:52:33: vnic dhcp disabled qemu-system-x86_64: -net vnic,vlan=0,name=net0,ifname=bsd0,macaddr=2:8:20:25:52:33: can't ioctl: Invalid argument -- Citrus IT Limited | +44 (0)870 199 8000 | enquiries at citrus-it.co.uk Rock House Farm | Green Moor | Wortley | Sheffield | S35 7DQ Registered in England and Wales | Company number 4899123 From hakansom at ohsu.edu Thu Mar 12 01:16:55 2015 From: hakansom at ohsu.edu (Marion Hakanson) Date: Wed, 11 Mar 2015 18:16:55 -0700 Subject: [OmniOS-discuss] rsync & MacOS In-Reply-To: Message from Dan McDonald of "Wed, 11 Mar 2015 15:40:43 EDT." <2E06E450-631E-4623-A91E-06385AC6C81B@omniti.com> Message-ID: <201503120116.t2C1Gtsu027144@kyklops.ohsu.edu> danmcd at omniti.com said: > I have some OLD files in my homedir, some may even predate MacOS X, so I do > worry about resource forks or Creator/Type metadata. Dan, I like Carbon Copy Cloner for backing up our Macs. It has rsync behind its GUI interface, and seems to handle native HFS+ stuff just fine. I tend to set up a remote .dmg volume on our NFS (or SMB) network share for each Mac, and treat those like whole-volume backups (similar to what Time Machine would do). But CCC also works for just a subdirectory, not just for a whole Mac volume, and to a remote share, not only to a disk image. CCC is shareware these days, but you can still download the freeware version, last I checked. Regards, Marion From lists at marzocchi.net Thu Mar 12 09:37:09 2015 From: lists at marzocchi.net (Olaf Marzocchi) Date: Thu, 12 Mar 2015 10:37:09 +0100 Subject: [OmniOS-discuss] rsync & MacOS In-Reply-To: <201503120116.t2C1Gtsu027144@kyklops.ohsu.edu> References: <201503120116.t2C1Gtsu027144@kyklops.ohsu.edu> Message-ID: <77E987D1-BA80-427B-9524-41B515A90E7D@marzocchi.net> If I remember correctly, in the past the rsync provided in CCC contained patches not available in the main tree and it was the only way to get proper backups. Nowadays the official rsync already has everything you need to backup OS X metadata and CCC is only a nice GUI. However, I think not every patch from Mr. Bombich has been submitted, some minor differences may still be there. A test is recommended. Olaf Il 12 marzo 2015 02:16:55 CET, Marion Hakanson ha scritto: >danmcd at omniti.com said: >> I have some OLD files in my homedir, some may even predate MacOS X, >so I do >> worry about resource forks or Creator/Type metadata. > >Dan, > >I like Carbon Copy Cloner for backing up our Macs. It has rsync behind >its GUI interface, and seems to handle native HFS+ stuff just fine. > >I tend to set up a remote .dmg volume on our NFS (or SMB) network share >for each Mac, and treat those like whole-volume backups (similar to >what >Time Machine would do). But CCC also works for just a subdirectory, >not just for a whole Mac volume, and to a remote share, not only to >a disk image. > >CCC is shareware these days, but you can still download the freeware >version, last I checked. 
> >Regards, > >Marion -------------- next part -------------- An HTML attachment was scrubbed... URL: From omnios at citrus-it.net Thu Mar 12 09:53:58 2015 From: omnios at citrus-it.net (Andy) Date: Thu, 12 Mar 2015 09:53:58 +0000 (GMT) Subject: [OmniOS-discuss] Dell vs. Supermicro and any recommendations.. In-Reply-To: References: <81F2A38D-5298-4FB8-B0BB-24E4506D6040@omniti.com> Message-ID: On Wed, 11 Mar 2015, Andy wrote: ; ; On Fri, 27 Feb 2015, Dan McDonald wrote: ; ; ; ; ; > On Feb 27, 2015, at 5:05 AM, Andy wrote: ; ; > ; ; > If we go ahead, I'll let you all know how it goes! ; ; ; ; Please do that. If you can zap a Dell Standard HBA out of HW-RAID and into a raw-disk controller, that'd be a HUGE WIN for illumos distros everywhere. ; ; Initial results look good to me. This is a Dell R730 with a PERC H730 ; RAID card in it and just a pair of 300GB SAS disks for now. The card ; is configured in non-RAID/HBA mode through the standard BIOS menus. I spoke too soon - disk performance seems generally poor with high service times :( Everything's working apart from the disks briefly going away at boot, just slow. I have another server that I can try to elmininate the hardware but then I'll need to start trying to diagnose this. If anyone has any thoughts on what to look at first or commands to run I'd really appreciate it. It's running with the mr_sas driver and the adapter is in HBA mode. It looks like I have two options there - either HBA mode or RAID mode with the disks in non-RAID mode; not sure what the difference is but I'll try both. In addition to trying the second server, I'm also going to test with both firmware revisions that are available for the PERC, RAID0 sets (just to see) and then other OSs including Solaris 11 and whatever flavour of Linux is supported by Dell. Thanks, Andy -- Citrus IT Limited | +44 (0)870 199 8000 | enquiries at citrus-it.co.uk Rock House Farm | Green Moor | Wortley | Sheffield | S35 7DQ Registered in England and Wales | Company number 4899123 From cf at ferebee.net Thu Mar 12 13:36:56 2015 From: cf at ferebee.net (Chris Ferebee) Date: Thu, 12 Mar 2015 14:36:56 +0100 Subject: [OmniOS-discuss] rsync & MacOS In-Reply-To: <77E987D1-BA80-427B-9524-41B515A90E7D@marzocchi.net> References: <201503120116.t2C1Gtsu027144@kyklops.ohsu.edu> <77E987D1-BA80-427B-9524-41B515A90E7D@marzocchi.net> Message-ID: <9B935575-EB83-4B30-AF1A-97C121ED892C@ferebee.net> I gather it can be a bit tricky to compile rsync correctly for OS X. FWIW, mlbackup (and hence the bundled rsync 3.0.9 binary) is validated with Backup Bouncer, Mike Bombich?s test suite for HFS+ backups. Best, Chris > Am 12.03.2015 um 10:37 schrieb Olaf Marzocchi : > > If I remember correctly, in the past the rsync provided in CCC contained patches not available in the main tree and it was the only way to get proper backups. > Nowadays the official rsync already has everything you need to backup OS X metadata and CCC is only a nice GUI. > However, I think not every patch from Mr. Bombich has been submitted, some minor differences may still be there. > A test is recommended. > > Olaf > > > > Il 12 marzo 2015 02:16:55 CET, Marion Hakanson ha scritto: > danmcd at omniti.com said: > I have some OLD files in my homedir, some may even predate MacOS X, so I do > worry about resource forks or Creator/Type metadata. > > Dan, > > I like Carbon Copy Cloner for backing up our Macs. It has rsync behind > its GUI interface, and seems to handle native HFS+ stuff just fine. 
> > I tend to set up a remote .dmg volume on our NFS (or SMB) network share > for each Mac, and treat those like whole-volume backups (similar to what > Time Machine would do). But CCC also works for just a subdirectory, > not just for a whole Mac volume, and to a remote share, not only to > a disk image. > > CCC is shareware these days, but you can still download the freeware > version, last I checked. > > Regards, > > Marion -------------- next part -------------- A non-text attachment was scrubbed... Name: smime.p7s Type: application/pkcs7-signature Size: 4176 bytes Desc: not available URL: From omnios at citrus-it.net Thu Mar 12 13:49:20 2015 From: omnios at citrus-it.net (Andy) Date: Thu, 12 Mar 2015 13:49:20 +0000 (GMT) Subject: [OmniOS-discuss] Dell vs. Supermicro and any recommendations.. In-Reply-To: References: <81F2A38D-5298-4FB8-B0BB-24E4506D6040@omniti.com> Message-ID: On Thu, 12 Mar 2015, Andy wrote: ; ; On Wed, 11 Mar 2015, Andy wrote: ; ; ; ; ; On Fri, 27 Feb 2015, Dan McDonald wrote: ; ; ; ; ; ; ; ; > On Feb 27, 2015, at 5:05 AM, Andy wrote: ; ; ; > ; ; ; > If we go ahead, I'll let you all know how it goes! ; ; ; ; ; ; Please do that. If you can zap a Dell Standard HBA out of HW-RAID and into a raw-disk controller, that'd be a HUGE WIN for illumos distros everywhere. ; ; ; ; Initial results look good to me. This is a Dell R730 with a PERC H730 ; ; RAID card in it and just a pair of 300GB SAS disks for now. The card ; ; is configured in non-RAID/HBA mode through the standard BIOS menus. ; ; I spoke too soon - disk performance seems generally poor with high service ; times :( Everything's working apart from the disks briefly going away ; at boot, just slow. Same story on a different R730. Abysmal disk performance in HBA mode, apparently regardless of BIOS and RAID card firmware versions; at least based on the four combinations I tried. extended device statistics ---- errors --- r/s w/s Mr/s Mw/s wait actv wsvc_t asvc_t %w %b s/w h/w trn tot device 0.0 142.0 0.0 11.6 0.0 10.0 0.0 70.4 0 100 0 0 0 0 c0t0d1s0 0.0 142.6 0.0 11.6 0.0 10.0 0.0 70.1 0 100 0 0 0 0 c0t1d1s0 and I have seen asvc_t > 300 with this test workload. However, with a mirrored rpool on top of RAID0 devices: extended device statistics ---- errors --- r/s w/s Mr/s Mw/s wait actv wsvc_t asvc_t %w %b s/w h/w trn tot device 0.0 2829.2 0.0 232.0 0.0 9.8 0.0 3.5 2 99 0 0 0 0 c0t0d0s0 0.0 2823.6 0.0 231.5 0.0 9.9 0.0 3.5 2 100 0 0 0 0 c0t1d0s0 Off to play with dtrace and see if I can work out what's happening. Andy -- Citrus IT Limited | +44 (0)870 199 8000 | enquiries at citrus-it.co.uk Rock House Farm | Green Moor | Wortley | Sheffield | S35 7DQ Registered in England and Wales | Company number 4899123 From jdg117 at elvis.arl.psu.edu Thu Mar 12 14:15:11 2015 From: jdg117 at elvis.arl.psu.edu (John D Groenveld) Date: Thu, 12 Mar 2015 10:15:11 -0400 Subject: [OmniOS-discuss] Dell vs. Supermicro and any recommendations.. In-Reply-To: Your message of "Thu, 12 Mar 2015 09:53:58 -0000." 
References: <81F2A38D-5298-4FB8-B0BB-24E4506D6040@omniti.com> Message-ID: <201503121415.t2CEFBcC022831@elvis.arl.psu.edu> In message , Andy writes: >I spoke too soon - disk performance seems generally poor with high service >times :( Everything's working apart from the disks briefly going away Oye...maybe time to tell your Dell sales critter, you're going to send your business elsewhere unless he figures out how to BTO servers with non-RAID SAS HBAs similar to the ones that Dell US sells: Otherwise, good luck debugging MegaRAID drivers and firmware. What's the device ID for your RAID controller? Which version mr_sas(7D) are you using? Which version of the firmware? John groenveld at acm.org From omnios at citrus-it.net Thu Mar 12 14:26:10 2015 From: omnios at citrus-it.net (Andy) Date: Thu, 12 Mar 2015 14:26:10 +0000 (GMT) Subject: [OmniOS-discuss] Dell vs. Supermicro and any recommendations.. In-Reply-To: <201503121415.t2CEFBcC022831@elvis.arl.psu.edu> References: <81F2A38D-5298-4FB8-B0BB-24E4506D6040@omniti.com> <201503121415.t2CEFBcC022831@elvis.arl.psu.edu> Message-ID: On Thu, 12 Mar 2015, John D Groenveld wrote: ; In message , Andy writes: ; >I spoke too soon - disk performance seems generally poor with high service ; >times :( Everything's working apart from the disks briefly going away ; ; Oye...maybe time to tell your Dell sales critter, you're going ; to send your business elsewhere unless he figures out how to BTO ; servers with non-RAID SAS HBAs similar to the ones that Dell US ; sells: ; ; ; Otherwise, good luck debugging MegaRAID drivers and firmware. ; What's the device ID for your RAID controller? pci1028,1f49 (pciex1000,5d) [LSI Logic / Symbios Logic MegaRAID SAS-3 3108 [Invader]], instance #0 (driver name: mr_sas) 02:00.0 RAID bus controller: LSI Logic / Symbios Logic MegaRAID SAS-3 3108 [Invader] (rev 02) Subsystem: Dell PERC H730 Mini Flags: bus master, fast devsel, latency 0, IRQ 15 I/O ports at 2000 Memory at 92000000 (64-bit, non-prefetchable) Memory at 91f00000 (64-bit, non-prefetchable) Expansion ROM at fff00000 [disabled] Capabilities: [50] Power Management version 3 Capabilities: [68] Express Endpoint, MSI 00 Capabilities: [d0] Vital Product Data Capabilities: [a8] MSI: Enable- Count=1/1 Maskable+ 64bit+ Capabilities: [c0] MSI-X: Enable+ Count=97 Masked- ; Which version mr_sas(7D) are you using? 80 fffffffff7db7000 1d070 172 1 mr_sas (6.503.00.00ILLUMOS) ; Which version of the firmware? # megacli -Version -Ctrl -aALL CTRL VERSION: ================ Product Name : PERC H730 Mini Fw Package Build : 25.2.1.0037 FW Version : 4.240.00-3615 -- Citrus IT Limited | +44 (0)870 199 8000 | enquiries at citrus-it.co.uk Rock House Farm | Green Moor | Wortley | Sheffield | S35 7DQ Registered in England and Wales | Company number 4899123 From lists at marzocchi.net Thu Mar 12 16:32:25 2015 From: lists at marzocchi.net (Olaf Marzocchi) Date: Thu, 12 Mar 2015 17:32:25 +0100 Subject: [OmniOS-discuss] rsync & MacOS In-Reply-To: <9B935575-EB83-4B30-AF1A-97C121ED892C@ferebee.net> References: <201503120116.t2C1Gtsu027144@kyklops.ohsu.edu> <77E987D1-BA80-427B-9524-41B515A90E7D@marzocchi.net> <9B935575-EB83-4B30-AF1A-97C121ED892C@ferebee.net> Message-ID: <17F18021-18ED-4CCF-B910-7686DAA6CF73@marzocchi.net> It WAS tricky prior to 3.0 :) Nowadays the provided patches are enough. I'd try them first to be independent from other people's binaries and update policies. Of course, it's a matter of personal preference. 
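For anyone who wants to roll their own, the build is roughly this (a sketch only: I am assuming the rsync 3.x source plus its companion rsync-patches tarball, and the patch names can differ between releases, so check the patches/ directory first):

# unpack the source and the matching patches tarball into the same tree
tar xzf rsync-3.1.1.tar.gz
tar xzf rsync-patches-3.1.1.tar.gz
cd rsync-3.1.1
patch -p1 < patches/fileflags.diff        # BSD/OS X file flags (chflags) support
patch -p1 < patches/hfs-compression.diff  # only if present in your release
./configure && make                       # ACL and xattr support are detected by configure
sudo make install                         # /usr/local/bin/rsync by default

Then something like 'rsync -aHAX --fileflags source/ dest/' on the Mac side actually carries the extra metadata across (--fileflags only exists once the patch is applied).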
Olaf Il 12 marzo 2015 14:36:56 CET, Chris Ferebee ha scritto: >I gather it can be a bit tricky to compile rsync correctly for OS X. > >FWIW, mlbackup (and hence the bundled rsync 3.0.9 binary) is validated >with Backup Bouncer, Mike Bombich?s test suite for HFS+ backups. > >Best, >Chris > >> Am 12.03.2015 um 10:37 schrieb Olaf Marzocchi : >> >> If I remember correctly, in the past the rsync provided in CCC >contained patches not available in the main tree and it was the only >way to get proper backups. >> Nowadays the official rsync already has everything you need to backup >OS X metadata and CCC is only a nice GUI. >> However, I think not every patch from Mr. Bombich has been submitted, >some minor differences may still be there. >> A test is recommended. >> >> Olaf >> >> >> >> Il 12 marzo 2015 02:16:55 CET, Marion Hakanson ha >scritto: >> danmcd at omniti.com said: >> I have some OLD files in my homedir, some may even predate MacOS X, >so I do >> worry about resource forks or Creator/Type metadata. >> >> Dan, >> >> I like Carbon Copy Cloner for backing up our Macs. It has rsync >behind >> its GUI interface, and seems to handle native HFS+ stuff just fine. >> >> I tend to set up a remote .dmg volume on our NFS (or SMB) network >share >> for each Mac, and treat those like whole-volume backups (similar to >what >> Time Machine would do). But CCC also works for just a subdirectory, >> not just for a whole Mac volume, and to a remote share, not only to >> a disk image. >> >> CCC is shareware these days, but you can still download the freeware >> version, last I checked. >> >> Regards, >> >> Marion -------------- next part -------------- An HTML attachment was scrubbed... URL: From philip.robar at gmail.com Thu Mar 12 20:35:27 2015 From: philip.robar at gmail.com (Philip Robar) Date: Thu, 12 Mar 2015 16:35:27 -0400 Subject: [OmniOS-discuss] rsync & MacOS In-Reply-To: <3FFD70E1-8FFD-4440-8AC4-2CA310D3B2E7@omniti.com> References: <3FFD70E1-8FFD-4440-8AC4-2CA310D3B2E7@omniti.com> Message-ID: It's sad that Apple is still shipping rsync version 2.6.9, but with OS X 10.7 (as near as I can tell from what I read on the net) and newer it's patched to handle extend attributes and resource forks. Note, however, that at some point the meaning of options have changed: -E has different meanings and the older version doesn't support the -X/-xattrs option that replaces the old use of -E. Macports will install version 3.1.1. ( I'm not a fan of fink (Why install a GNU environment when you have a perfectly good UNIX(?) environment already?) and I chose Macports over Homebrew, but I don't remember why. Pro Homebew: http://deephill.com/macports-vs-homebrew/ Pro Macports: http://arstechnica.com/civis/viewtopic.php?f=19&t=1207907 ) Phil -------------- next part -------------- An HTML attachment was scrubbed... URL: From danmcd at omniti.com Thu Mar 12 20:57:52 2015 From: danmcd at omniti.com (Dan McDonald) Date: Thu, 12 Mar 2015 16:57:52 -0400 Subject: [OmniOS-discuss] NFS ._ names and rsync (was Re: rsync & MacOS) In-Reply-To: References: <3FFD70E1-8FFD-4440-8AC4-2CA310D3B2E7@omniti.com> Message-ID: > On Mar 12, 2015, at 4:35 PM, Philip Robar wrote: > > It's sad that Apple is still shipping rsync version 2.6.9, but with OS X 10.7 (as near as I can tell from what I read on the net) and newer it's patched to handle extend attributes and resource forks. 
Note, however, that at some point the meaning of options have changed: -E has different meanings and the older version doesn't support the -X/-xattrs option that replaces the old use of -E. 10.7 or better. THat's partially helpful. I still have 10.6 on a few nodes. I forgot to ask something releated. Today, when I place files using NFS (likely NFSv3) on 10.6, I see ._ which contains resource forks on the OmniOS side. Using NFS (likely NFSv4) on 10.10, I also see them. Does rsync do ._ stuff like NFS does?!? Ideally it would , that way I can rsynch, but then recover individual files using NFS. Thanks, Dan From tobi at oetiker.ch Fri Mar 13 07:25:14 2015 From: tobi at oetiker.ch (Tobias Oetiker) Date: Fri, 13 Mar 2015 08:25:14 +0100 (CET) Subject: [OmniOS-discuss] incomplete recursive snapshots Message-ID: I got a bunch of new disks on one of our systems and wanted to transfer an existing pool over to them so what I did was this: zfs snapshot -r old-pool at replicaton zfs send -R old-pool at replication | mbuffer -m 1G | zfs receive -F -d new-pool but then halfway through the operation, I got warnings from send, that old-pool/some/fileset at replication would not exist ... when I went to investigate, I found indeed that zfs snapshot -r had neglected to create a snapshot on old-pool/some/fileset. So I ran zfs list -r -o name old-pool | xargs -n1 perl -e 'system "zfs","list",$ARGV[0].q{@replication}' and found that there were about 10% of the filesets which were lacking this snapshot ... I then proceeded to create the missing snapshot individually, and it worked fine. I have since repeated the experiment and found the same problem again ... any idea how this can be ? cheers tobi -- Tobi Oetiker, OETIKER+PARTNER AG, Aarweg 15 CH-4600 Olten, Switzerland www.oetiker.ch tobi at oetiker.ch +41 62 775 9902 From danmcd at omniti.com Fri Mar 13 14:13:19 2015 From: danmcd at omniti.com (Dan McDonald) Date: Fri, 13 Mar 2015 10:13:19 -0400 Subject: [OmniOS-discuss] incomplete recursive snapshots In-Reply-To: References: Message-ID: Only recently fixed snapshot bug I could find was Illumos 5150 http://www.illumos.org/issues/5150 Also, could you share the precise warnings? It'll help finding who's doing the complaining. Dan Sent from my iPhone (typos, autocorrect, and all) > On Mar 13, 2015, at 3:25 AM, Tobias Oetiker wrote: > > I got a bunch of new disks on one of our systems and wanted to > transfer an existing pool over to them so what I did was this: > > zfs snapshot -r old-pool at replicaton > zfs send -R old-pool at replication | mbuffer -m 1G | zfs receive -F -d new-pool > > but then halfway through the operation, I got warnings from send, > that old-pool/some/fileset at replication would not exist ... > > when I went to investigate, I found indeed that zfs snapshot -r had > neglected to create a snapshot on old-pool/some/fileset. So I > ran > > zfs list -r -o name old-pool | xargs -n1 perl -e 'system "zfs","list",$ARGV[0].q{@replication}' > > and found that there were about 10% of the filesets which were > lacking this snapshot ... > > I then proceeded to create the missing snapshot individually, and > it worked fine. > > I have since repeated the experiment and found the same problem > again ... > > any idea how this can be ? 
> > cheers > tobi > > > -- > Tobi Oetiker, OETIKER+PARTNER AG, Aarweg 15 CH-4600 Olten, Switzerland > www.oetiker.ch tobi at oetiker.ch +41 62 775 9902 > > _______________________________________________ > OmniOS-discuss mailing list > OmniOS-discuss at lists.omniti.com > http://lists.omniti.com/mailman/listinfo/omnios-discuss From tobi at oetiker.ch Fri Mar 13 14:16:13 2015 From: tobi at oetiker.ch (Tobias Oetiker) Date: Fri, 13 Mar 2015 15:16:13 +0100 (CET) Subject: [OmniOS-discuss] incomplete recursive snapshots In-Reply-To: References: Message-ID: Hi Dan, Today Dan McDonald wrote: > Only recently fixed snapshot bug I could find was Illumos 5150 http://www.illumos.org/issues/5150 > > Also, could you share the precise warnings? It'll help finding who's doing the complaining. *blush* it was my bad ... see http://serverfault.com/questions/675185/incomplete-recursive-snapshots-on-zfs cheers tobi > > Dan > > Sent from my iPhone (typos, autocorrect, and all) > > > On Mar 13, 2015, at 3:25 AM, Tobias Oetiker wrote: > > > > I got a bunch of new disks on one of our systems and wanted to > > transfer an existing pool over to them so what I did was this: > > > > zfs snapshot -r old-pool at replicaton > > zfs send -R old-pool at replication | mbuffer -m 1G | zfs receive -F -d new-pool > > > > but then halfway through the operation, I got warnings from send, > > that old-pool/some/fileset at replication would not exist ... > > > > when I went to investigate, I found indeed that zfs snapshot -r had > > neglected to create a snapshot on old-pool/some/fileset. So I > > ran > > > > zfs list -r -o name old-pool | xargs -n1 perl -e 'system "zfs","list",$ARGV[0].q{@replication}' > > > > and found that there were about 10% of the filesets which were > > lacking this snapshot ... > > > > I then proceeded to create the missing snapshot individually, and > > it worked fine. > > > > I have since repeated the experiment and found the same problem > > again ... > > > > any idea how this can be ? > > > > cheers > > tobi > > > > > > -- > > Tobi Oetiker, OETIKER+PARTNER AG, Aarweg 15 CH-4600 Olten, Switzerland > > www.oetiker.ch tobi at oetiker.ch +41 62 775 9902 > > > > _______________________________________________ > > OmniOS-discuss mailing list > > OmniOS-discuss at lists.omniti.com > > http://lists.omniti.com/mailman/listinfo/omnios-discuss > > -- Tobi Oetiker, OETIKER+PARTNER AG, Aarweg 15 CH-4600 Olten, Switzerland www.oetiker.ch tobi at oetiker.ch +41 62 775 9902 From fwp at deepthought.com Sat Mar 14 19:09:43 2015 From: fwp at deepthought.com (Frank Pittel) Date: Sat, 14 Mar 2015 14:09:43 -0500 Subject: [OmniOS-discuss] Problem with a couple of drives and omnios Message-ID: <20150314190942.GA22808@warlock.deepthought.com> I have a machine here at home running Omnios and I love it. It has 6 drives installed all sata and connected to the motherboard. There are 3 zpools with two drives each. One zpool has 2 1TB drives as an rpool and 2 zpools with 2 - 2TB drives in each. I've set up all pools as mirrored. On monday while i was out the power went out and since this is a box that I play with it's not hooked up to my ups. When the power came back up I noticed the machine wouldn't boot. I got the following bizzare error: krtld: failed to open '/platform/i86pc/kernel/amd64/u' krtld bind_primary(): no relocation information found for module /platform/i86pc/kernel/amd64/u krtld: error during initial load/link phase The errors go on and on along those lines and then I get: Unable to boot Press any key to reboot. 
I thought at first that something happened to my boot drives so I unplugged my 2TB drives and tried to boot from dvd to reinstall. I didn't hit the button for the boot menu fast enough and ended up booting from disk. To my surprise the OS booted fine. I then plugged in the 2TB zones and tried booting again. To make a long story short I found that the two drives for one of the pools were causing the problem. I've tried deleting the zpools, removing partitions and even used dd to overwrite the MBR on the drives. No luck with any of the attempts. Have I damaged the drives in some wierd way or done something to keep them from working with omnios? Frank From danmcd at omniti.com Sat Mar 14 18:59:11 2015 From: danmcd at omniti.com (Dan McDonald) Date: Sat, 14 Mar 2015 14:59:11 -0400 Subject: [OmniOS-discuss] Problem with a couple of drives and omnios In-Reply-To: <20150314190942.GA22808@warlock.deepthought.com> References: <20150314190942.GA22808@warlock.deepthought.com> Message-ID: <8039FCEF-1D1A-423C-8106-3D5BF519F4F4@omniti.com> > On Mar 14, 2015, at 3:09 PM, Frank Pittel wrote: > > > I thought at first that something happened to my boot drives so I unplugged my 2TB drives and tried to boot from dvd to reinstall. I didn't hit > the button for the boot menu fast enough and ended up booting from disk. To my surprise the OS booted fine. I then plugged in the 2TB zones and > tried booting again. To make a long story short I found that the two drives for one of the pools were causing the problem. I've tried deleting the > zpools, removing partitions and even used dd to overwrite the MBR on the drives. No luck with any of the attempts. Have I damaged the drives in > some wierd way or done something to keep them from working with omnios? So wait, removing one of your DATA pools makes this machine boot okay? Did you check your zpool status after booting? You may have been able to export the (disconnected) pool, then plug the drives back in, then reboot, and reimport the pool. The corrupted "unix" at the end of platform/i86pc/... suggests possible a corrupt menu.lst. Did you check with grub what the menu entry was actually passing along? You can do that with the 'e' key over your specific boot menu choice. Dan From fwp at deepthought.com Sat Mar 14 19:33:40 2015 From: fwp at deepthought.com (Frank Pittel) Date: Sat, 14 Mar 2015 14:33:40 -0500 Subject: [OmniOS-discuss] Problem with a couple of drives and omnios In-Reply-To: <8039FCEF-1D1A-423C-8106-3D5BF519F4F4@omniti.com> References: <20150314190942.GA22808@warlock.deepthought.com> <8039FCEF-1D1A-423C-8106-3D5BF519F4F4@omniti.com> Message-ID: <20150314193339.GB22808@warlock.deepthought.com> On Sat, Mar 14, 2015 at 02:59:11PM -0400, Dan McDonald wrote: > > > On Mar 14, 2015, at 3:09 PM, Frank Pittel wrote: > > > > > > I thought at first that something happened to my boot drives so I unplugged my 2TB drives and tried to boot from dvd to reinstall. I didn't hit > > the button for the boot menu fast enough and ended up booting from disk. To my surprise the OS booted fine. I then plugged in the 2TB zones and > > tried booting again. To make a long story short I found that the two drives for one of the pools were causing the problem. I've tried deleting the > > zpools, removing partitions and even used dd to overwrite the MBR on the drives. No luck with any of the attempts. Have I damaged the drives in > > some wierd way or done something to keep them from working with omnios? > > So wait, removing one of your DATA pools makes this machine boot okay? 
Did you check your zpool status after booting? You may have been able to export the (disconnected) pool, then plug the drives back in, then reboot, and reimport the pool. > > The corrupted "unix" at the end of platform/i86pc/... suggests possible a corrupt menu.lst. Did you check with grub what the menu entry was actually passing along? You can do that with the 'e' key over your specific boot menu choice. > > Dan When I remove one of the DATA pools the machine will boot. I've tried exporting the pool and it doesn't help. While the pool drives are unplugged I can boot the machine without issue. I've then run "devfsadm -C" to remove device entries. Even then the machine won't boot with the drive connected. After the OS is booted I can plug the drives in. They aren't visible via format until I run devfsadm again. The errors got me to thinking there was something in the boot sector that was confusing my oldish MB and the MB was trying to boot off of one of those drives I used dd to overwrite the MBR with no success. I'm thinking of just going and using dd to write /dev/zero over the entire drives. There's nothing on those drives that I care about since I was just using them to play with different permutations of zfs and zones. Things like mounting filesystems in zones, etc. Frank From danmcd at omniti.com Sun Mar 15 20:44:26 2015 From: danmcd at omniti.com (Dan McDonald) Date: Sun, 15 Mar 2015 16:44:26 -0400 Subject: [OmniOS-discuss] Problem with a couple of drives and omnios In-Reply-To: <20150314193339.GB22808@warlock.deepthought.com> References: <20150314190942.GA22808@warlock.deepthought.com> <8039FCEF-1D1A-423C-8106-3D5BF519F4F4@omniti.com> <20150314193339.GB22808@warlock.deepthought.com> Message-ID: > On Mar 14, 2015, at 3:33 PM, Frank Pittel wrote: > > When I remove one of the DATA pools the machine will boot. I've tried exporting the pool and it doesn't help. While the pool drives are unplugged > I can boot the machine without issue. I've then run "devfsadm -C" to remove device entries. Even then the machine won't boot with the drive > connected. After the OS is booted I can plug the drives in. They aren't visible via format until I run devfsadm again. The errors got me to > thinking there was something in the boot sector that was confusing my oldish MB and the MB was trying to boot off of one of those drives I used dd > to overwrite the MBR with no success. I'm thinking of just going and using dd to write /dev/zero over the entire drives. There's nothing on those > drives that I care about since I was just using them to play with different permutations of zfs and zones. Things like mounting filesystems in > zones, etc. It does sound like your MB trying to boot off of other drives. If there's nothing important there, try creating new pools, preferably using the whole disk (EFI/GPT). Dan From jim at cos.ru Mon Mar 16 08:28:43 2015 From: jim at cos.ru (Jim Klimov) Date: Mon, 16 Mar 2015 09:28:43 +0100 Subject: [OmniOS-discuss] Fix to VirtualBox installer under OI/OmniOS Message-ID: An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... 
Name: vboxconfig.sh.patch Type: application/octet-stream Size: 6484 bytes Desc: not available URL: From takashiary at gmail.com Mon Mar 16 11:24:16 2015 From: takashiary at gmail.com (takashi ary) Date: Mon, 16 Mar 2015 20:24:16 +0900 Subject: [OmniOS-discuss] Kernel Panic OmniOS r151006 svc.configd Message-ID: Hello, Kernel Panic occurred omnios-b281e50 (OmniOS r151006 LTS) on VMware ESXi 5.1 This file server (CIFS) was running over 300 days until this panic. Panic occurred 2 times at Mar 14. /var/adm/messages -------------------------------------------------------------------------------- Mar 14 06:43:24 smbsv2 unix: [ID 836849 kern.notice] Mar 14 06:43:24 smbsv2 ^Mpanic[cpu1]/thread=ffffff01cf612500: Mar 14 06:43:24 smbsv2 genunix: [ID 683410 kern.notice] BAD TRAP: type=e (#pf Page fault) rp=ffffff0007e81f60 addr=ffffff01ce49eff8 Mar 14 06:43:24 smbsv2 unix: [ID 100000 kern.notice] Mar 14 06:43:24 smbsv2 unix: [ID 839527 kern.notice] svc.configd: Mar 14 06:43:24 smbsv2 unix: [ID 753105 kern.notice] #pf Page fault Mar 14 06:43:24 smbsv2 unix: [ID 532287 kern.notice] Bad kernel fault at addr=0xffffff01ce49eff8 Mar 14 06:43:24 smbsv2 unix: [ID 243837 kern.notice] pid=12, pc=0xfffffffffb8001b3, sp=0xffffff0007e82050, eflags=0x10086 Mar 14 06:43:24 smbsv2 unix: [ID 211416 kern.notice] cr0: 8005003b cr4: 6b8 Mar 14 06:43:24 smbsv2 unix: [ID 624947 kern.notice] cr2: ffffff01ce49eff8 Mar 14 06:43:24 smbsv2 unix: [ID 625075 kern.notice] cr3: 13cfef000 Mar 14 06:43:24 smbsv2 unix: [ID 625715 kern.notice] cr8: 0 Mar 14 06:43:24 smbsv2 unix: [ID 100000 kern.notice] Mar 14 06:43:24 smbsv2 unix: [ID 592667 kern.notice] rdi: 8 rsi: fffffffffbc7dd60 rdx: 1 Mar 14 06:43:24 smbsv2 unix: [ID 592667 kern.notice] rcx: 4b r8: fffffffffbc72480 r9: ffffff01d5d1b4c0 Mar 14 06:43:24 smbsv2 unix: [ID 592667 kern.notice] rax: fffffffffbc724c0 rbx: ffffff01cdfa7e00 rbp: ffffff0007e82050 Mar 14 06:43:24 smbsv2 unix: [ID 592667 kern.notice] r10: 0 r11: ffffff01ce99dcb8 r12: ffffff01cf612500 Mar 14 06:43:24 smbsv2 unix: [ID 592667 kern.notice] r13: fffffffffb86071e r14: ffffff01ce49f000 r15: fffffffffbc724c0 Mar 14 06:43:24 smbsv2 unix: [ID 592667 kern.notice] fsb: 0 gsb: ffffff01ce5ec580 ds: 4b Mar 14 06:43:24 smbsv2 unix: [ID 592667 kern.notice] es: 4b fs: 0 gs: 1c3 Mar 14 06:43:24 smbsv2 unix: [ID 592667 kern.notice] trp: e err: 9 rip: fffffffffb8001b3 Mar 14 06:43:24 smbsv2 unix: [ID 592667 kern.notice] cs: 30 rfl: 10086 rsp: ffffff0007e82050 Mar 14 06:43:24 smbsv2 unix: [ID 266532 kern.notice] ss: 38 Mar 14 06:43:24 smbsv2 unix: [ID 100000 kern.notice] Mar 14 06:43:24 smbsv2 genunix: [ID 655072 kern.notice] ffffff0007e81e40 unix:real_mode_stop_cpu_stage2_end+9d93 () Mar 14 06:43:24 smbsv2 genunix: [ID 655072 kern.notice] ffffff0007e81f50 unix:trap+db3 () Mar 14 06:43:24 smbsv2 genunix: [ID 655072 kern.notice] ffffff0007e81f60 unix:cmntrap+e6 () Mar 14 06:43:24 smbsv2 genunix: [ID 655072 kern.notice] ffffff0007e82050 unix:cmntrap+c3 () Mar 14 06:43:24 smbsv2 genunix: [ID 655072 kern.notice] ffffff0007e82140 unix:cmntrap+c3 () Mar 14 06:43:24 smbsv2 genunix: [ID 655072 kern.notice] ffffff0007e82230 unix:cmntrap+c3 () Mar 14 06:43:24 smbsv2 genunix: [ID 655072 kern.notice] ffffff0007e82320 unix:cmntrap+c3 () Mar 14 06:43:24 smbsv2 genunix: [ID 655072 kern.notice] ffffff0007e82410 unix:cmntrap+c3 () Mar 14 06:43:24 smbsv2 genunix: [ID 655072 kern.notice] ffffff0007e82500 unix:cmntrap+c3 () Mar 14 06:43:24 smbsv2 genunix: [ID 655072 kern.notice] ffffff0007e825f0 unix:cmntrap+c3 () Mar 14 06:43:24 smbsv2 genunix: [ID 
655072 kern.notice] ffffff0007e826e0 unix:cmntrap+c3 () Mar 14 06:43:24 smbsv2 genunix: [ID 655072 kern.notice] ffffff0007e827d0 unix:cmntrap+c3 () Mar 14 06:43:24 smbsv2 genunix: [ID 655072 kern.notice] ffffff0007e828c0 unix:cmntrap+c3 () Mar 14 06:43:24 smbsv2 genunix: [ID 655072 kern.notice] ffffff0007e829b0 unix:cmntrap+c3 () Mar 14 06:43:24 smbsv2 genunix: [ID 655072 kern.notice] ffffff0007e82aa0 unix:cmntrap+c3 () Mar 14 06:43:24 smbsv2 genunix: [ID 655072 kern.notice] ffffff0007e82b90 unix:cmntrap+c3 () Mar 14 06:43:24 smbsv2 genunix: [ID 655072 kern.notice] ffffff0007e82c90 unix:gdt_update_usegd+20 () Mar 14 06:43:24 smbsv2 genunix: [ID 655072 kern.notice] ffffff0007e82cb0 unix:gdt_ucode_model+37 () Mar 14 06:43:24 smbsv2 genunix: [ID 655072 kern.notice] ffffff0007e82ce0 unix:lwp_segregs_restore32+26 () Mar 14 06:43:24 smbsv2 genunix: [ID 655072 kern.notice] ffffff0007e82d10 genunix:restorectx+2f () Mar 14 06:43:24 smbsv2 unix: [ID 100000 kern.notice] Mar 14 06:43:24 smbsv2 genunix: [ID 672855 kern.notice] syncing file systems... Mar 14 06:43:24 smbsv2 genunix: [ID 904073 kern.notice] done Mar 14 06:43:25 smbsv2 genunix: [ID 111219 kern.notice] dumping to /dev/zvol/dsk/rpool/dump, offset 65536, content: kernel Mar 14 06:43:45 smbsv2 genunix: [ID 100000 kern.notice] Mar 14 06:43:45 smbsv2 genunix: [ID 665016 kern.notice] ^M100% done: 264317 pages dumped, Mar 14 06:43:45 smbsv2 genunix: [ID 851671 kern.notice] dump succeeded -------------------------------------------------------------------------------- crash.tar.gz (attached) -------------------------------------------------------------------------------- ID=0 and 1 echo '::panicinfo' | mdb ${ID} > ~/crash.${ID}_panicinfo echo '::cpuinfo -v' | mdb ${ID} > ~/crash.${ID}_cpuinfo echo '::threadlist -v 10' | mdb ${ID} > ~/crash.${ID}_threadlist echo '::msgbuf' | mdb ${ID} > ~/crash.${ID}_msgbuf echo '*panic_thread::findstack -v' | mdb ${ID} > ~/crash.${ID}_findstack echo '::stacks' | mdb ${ID} > ~/crash.${ID}_stacks echo '::ps' | mdb ${ID} > ~/crash.${ID}_ps -------------------------------------------------------------------------------- I couldn't find similar panic on www.illumos.org. Thanks -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: crash.tar.gz Type: application/x-gzip Size: 83764 bytes Desc: not available URL: From danmcd at omniti.com Mon Mar 16 15:10:34 2015 From: danmcd at omniti.com (Dan McDonald) Date: Mon, 16 Mar 2015 11:10:34 -0400 Subject: [OmniOS-discuss] [discuss] Kernel Panic OmniOS r151006 svc.configd In-Reply-To: References: Message-ID: Keeping my response on the OmniOS list only for now. Your panic info may be better shared on the illumos developer list, BTW. > On Mar 16, 2015, at 7:24 AM, takashi ary via illumos-discuss wrote: > > Normally I like to see the dumps themselves, but these are both the same panic, and both in a seemingly innocent return from a door call. Both panics are page faults, like the kernel was using a userspace pointer or something. I don't know the doorfs subsystem that well, but given I've not seen this anywhere else, I'm wondering if something odd is going on inside VMware's memory management (you did say this is running inside VMware)? No specific activity was going on prior to the panics, right? I'm a bit stumped at this point. 
Dan From takashiary at gmail.com Mon Mar 16 20:02:42 2015 From: takashiary at gmail.com (takashi ary) Date: Tue, 17 Mar 2015 05:02:42 +0900 Subject: [OmniOS-discuss] [discuss] Kernel Panic OmniOS r151006 svc.configd In-Reply-To: References: Message-ID: Hi Dan, Thanks for your analysis. > Normally I like to see the dumps themselves, but these are both the same panic, and both in a seemingly innocent return from a door call. It's possible to send vmdump. What is good way to send? > Both panics are page faults, like the kernel was using a userspace pointer or something. I don't know the doorfs subsystem that well, but given I've not seen this anywhere else, I'm wondering if something odd is going on inside VMware's memory management (you did say this is running inside VMware)? This is running inside VMware ESXi 5.1 Update 2 so search the VMware Knowledge Base... Windows 2008 R2, Red Hat Enterprise Linux and Solaris 10 64-bit virtual machines blue screen or kernel panic when running on ESXi 5.x with an Intel E5/E7/E3 v2 series processor (2073791) http://kb.vmware.com/kb/2073791 $ prtconf -v | grep Xeon value='Intel(r) Xeon(r) CPU E5-2697 v2 @ 2.70GHz' value='Intel(r) Xeon(r) CPU E5-2697 v2 @ 2.70GHz' Intel E5 v2 series processor! Bingo? > No specific activity was going on prior to the panics, right? Right, I think no one was using the server at that time. This info may be better shared on the illumos developer list? Thanks From danmcd at omniti.com Mon Mar 16 20:10:38 2015 From: danmcd at omniti.com (Dan McDonald) Date: Mon, 16 Mar 2015 16:10:38 -0400 Subject: [OmniOS-discuss] [discuss] Kernel Panic OmniOS r151006 svc.configd In-Reply-To: References: Message-ID: <4F31D9D5-FF00-41A7-953B-3EB35A2167C4@omniti.com> > On Mar 16, 2015, at 4:02 PM, takashi ary wrote: > >> Normally I like to see the dumps themselves, but these are both the same panic, and both in a seemingly innocent return from a door call. > > It's possible to send vmdump. > What is good way to send? Given what you say below, I don't think you will need to send me anything... >> Both panics are page faults, like the kernel was using a userspace pointer or something. I don't know the doorfs subsystem that well, but given I've not seen this anywhere else, I'm wondering if something odd is going on inside VMware's memory management (you did say this is running inside VMware)? > > This is running inside VMware ESXi 5.1 Update 2 > so search the VMware Knowledge Base... > > Windows 2008 R2, Red Hat Enterprise Linux and Solaris 10 64-bit > virtual machines blue screen or kernel panic when running on ESXi 5.x > with an Intel E5/E7/E3 v2 series processor (2073791) > http://kb.vmware.com/kb/2073791 > > $ prtconf -v | grep Xeon > value='Intel(r) Xeon(r) CPU E5-2697 v2 @ 2.70GHz' > value='Intel(r) Xeon(r) CPU E5-2697 v2 @ 2.70GHz' > > Intel E5 v2 series processor! > Bingo? Yep! See if the VMware note has information on the coredump for Solaris 10 --> it'll be close to any illumos distro, including OmniOS. According to the note, you need "update 3" to 5.1. >> No specific activity was going on prior to the panics, right? > > Right, I think no one was using the server at that time. > > This info may be better shared on the illumos developer list? Yes, INCLUDING the VMware technical note and its solution. Please share!!! I'm glad this isn't a problem with us, but with VMware. One major user of illumos runs on VMware, and needs to see that, but I suspect they know it already. Thanks! 
Dan From takashiary at gmail.com Mon Mar 16 21:48:01 2015 From: takashiary at gmail.com (takashi ary) Date: Tue, 17 Mar 2015 06:48:01 +0900 Subject: [OmniOS-discuss] [discuss] Kernel Panic OmniOS r151006 svc.configd In-Reply-To: <4F31D9D5-FF00-41A7-953B-3EB35A2167C4@omniti.com> References: <4F31D9D5-FF00-41A7-953B-3EB35A2167C4@omniti.com> Message-ID: Hi Dan, Thanks for your help. I will update my ESXi 5.1 to Update 3. I sent a mail to illumos developer list. When there is a mistake, correction, please. Thanks 2015-03-17 5:10 GMT+09:00 Dan McDonald : > > > > On Mar 16, 2015, at 4:02 PM, takashi ary wrote: > > > >> Normally I like to see the dumps themselves, but these are both the same panic, and both in a seemingly innocent return from a door call. > > > > It's possible to send vmdump. > > What is good way to send? > > Given what you say below, I don't think you will need to send me anything... > > >> Both panics are page faults, like the kernel was using a userspace pointer or something. I don't know the doorfs subsystem that well, but given I've not seen this anywhere else, I'm wondering if something odd is going on inside VMware's memory management (you did say this is running inside VMware)? > > > > This is running inside VMware ESXi 5.1 Update 2 > > so search the VMware Knowledge Base... > > > > Windows 2008 R2, Red Hat Enterprise Linux and Solaris 10 64-bit > > virtual machines blue screen or kernel panic when running on ESXi 5.x > > with an Intel E5/E7/E3 v2 series processor (2073791) > > http://kb.vmware.com/kb/2073791 > > > > $ prtconf -v | grep Xeon > > value='Intel(r) Xeon(r) CPU E5-2697 v2 @ 2.70GHz' > > value='Intel(r) Xeon(r) CPU E5-2697 v2 @ 2.70GHz' > > > > Intel E5 v2 series processor! > > Bingo? > > Yep! See if the VMware note has information on the coredump for Solaris 10 --> it'll be close to any illumos distro, including OmniOS. According to the note, you need "update 3" to 5.1. > > >> No specific activity was going on prior to the panics, right? > > > > Right, I think no one was using the server at that time. > > > > This info may be better shared on the illumos developer list? > > Yes, INCLUDING the VMware technical note and its solution. Please share!!! I'm glad this isn't a problem with us, but with VMware. One major user of illumos runs on VMware, and needs to see that, but I suspect they know it already. > > Thanks! > Dan > From danmcd at omniti.com Thu Mar 19 15:18:13 2015 From: danmcd at omniti.com (Dan McDonald) Date: Thu, 19 Mar 2015 11:18:13 -0400 Subject: [OmniOS-discuss] OpenSSL now updated! Message-ID: <3796FFF3-B587-4F76-8178-1B47197B94B6@omniti.com> If you're runninng 006, 010, or 012 --> OpenSSL is now 1.0.1m. If you're running bloody --> OpenSSL is now 1.0.2a. (NOTE: 1.0.2 is affected more, so upgrade this quickly!) All of the repos have been updated. Since this is openssl, you will strictly speaking not need to reboot, but if you do not reboot, you WILL need to restart services that link to openssl. Happy updating! 
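For instance, the per-box dance is roughly this (illustrative only -- the
FMRI is just an example, and which daemons actually map libssl/libcrypto
will differ from machine to machine):

  # pkg update                                          # pull the new openssl
  # pldd $(pgrep -x sshd) | egrep 'libssl|libcrypto'    # does this daemon map them?
  # svcadm restart svc:/network/ssh:default             # if so, bounce its service

...and likewise for anything else that still has the old libraries mapped.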
Dan From stephan.budach at JVM.DE Fri Mar 20 09:51:59 2015 From: stephan.budach at JVM.DE (Stephan Budach) Date: Fri, 20 Mar 2015 10:51:59 +0100 Subject: [OmniOS-discuss] OmniOS: zpool import dumps core Message-ID: <550BEDBF.2010506@jvm.de> Hi, OmniOS: SunOS nfsvmpool05 5.11 omnios-10b9c79 i86pc i386 i86pc (0.151012) when trying to run zpool import, the command yields this output: Assertion failed: rn->rn_nozpool == B_FALSE, file ../common/libzfs_import.c, line 1080, function zpool_open_func Abort (core dumped) I don't think that this is related to the actual zpool I created, since running zpool import in general makes this happen. This is new install that I created yesterday by first installing 006 and then updating via 008/010 to 012. Any ideas, what could have caused that? Thanks, stephan From jimklimov at cos.ru Fri Mar 20 10:30:18 2015 From: jimklimov at cos.ru (Jim Klimov) Date: Fri, 20 Mar 2015 11:30:18 +0100 Subject: [OmniOS-discuss] OpenSSL now updated! In-Reply-To: <3796FFF3-B587-4F76-8178-1B47197B94B6@omniti.com> References: <3796FFF3-B587-4F76-8178-1B47197B94B6@omniti.com> Message-ID: <61991B57-270F-481E-BC28-49AB7F814409@cos.ru> 19 ????? 2015??. 16:18:13 CET, Dan McDonald ?????: >If you're runninng 006, 010, or 012 --> OpenSSL is now 1.0.1m. > >If you're running bloody --> OpenSSL is now 1.0.2a. (NOTE: 1.0.2 is >affected more, so upgrade this quickly!) > >All of the repos have been updated. Since this is openssl, you will >strictly speaking not need to reboot, but if you do not reboot, you >WILL need to restart services that link to openssl. > >Happy updating! >Dan > >_______________________________________________ >OmniOS-discuss mailing list >OmniOS-discuss at lists.omniti.com >http://lists.omniti.com/mailman/listinfo/omnios-discuss Is there a way for IPS services to be restarted automatically when their dependency libraries change? I have a few ideas about how this wheel might be (re-)invented and bolted on, but perhaps there already is a generic solution in the packaging system? ;) Jim -- Typos courtesy of K-9 Mail on my Samsung Android From ben at fluffy.co.uk Fri Mar 20 11:08:49 2015 From: ben at fluffy.co.uk (Ben Summers) Date: Fri, 20 Mar 2015 11:08:49 +0000 Subject: [OmniOS-discuss] OpenSSL now updated! In-Reply-To: <61991B57-270F-481E-BC28-49AB7F814409@cos.ru> References: <3796FFF3-B587-4F76-8178-1B47197B94B6@omniti.com> <61991B57-270F-481E-BC28-49AB7F814409@cos.ru> Message-ID: > On 20 Mar 2015, at 10:30, Jim Klimov wrote: > > 19 ????? 2015 ?. 16:18:13 CET, Dan McDonald ?????: >> If you're runninng 006, 010, or 012 --> OpenSSL is now 1.0.1m. >> >> If you're running bloody --> OpenSSL is now 1.0.2a. (NOTE: 1.0.2 is >> affected more, so upgrade this quickly!) >> >> All of the repos have been updated. Since this is openssl, you will >> strictly speaking not need to reboot, but if you do not reboot, you >> WILL need to restart services that link to openssl. >> > > Is there a way for IPS services to be restarted automatically when their dependency libraries change? > > I have a few ideas about how this wheel might be (re-)invented and bolted on, but perhaps there already is a generic solution in the packaging system? ;) I suppose a hacky script could get a list of all the libraries and executables changed in the last update, use pfiles on all processes in all zones to files which ones have those libraries open, then use svcs -p to determine which services those processes are running under, and then restart them. Or you could just reboot. 
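Something along those lines, as a rough and untested sketch (current zone
only, SMF-managed processes only, and leaning on pldd rather than pfiles
since mapped libraries show up there):

  #!/bin/sh
  # Print the FMRIs of SMF instances whose processes currently map
  # libssl or libcrypto, i.e. the candidates for a restart.
  svcs -H -o fmri | while read fmri; do
      for pid in $(svcs -H -p "$fmri" | awk '$2 ~ /^[0-9]+$/ { print $2 }'); do
          if pldd "$pid" 2>/dev/null | egrep 'libssl|libcrypto' >/dev/null; then
              echo "$fmri"
              break
          fi
      done
  done | sort -u

Run it as root, and pipe the output into 'xargs -n1 svcadm restart' once
the list looks sane.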
You've probably got bigger problems if you can't reboot your server. Ben -- http://bens.me.uk From stephan.budach at JVM.DE Fri Mar 20 13:04:22 2015 From: stephan.budach at JVM.DE (Stephan Budach) Date: Fri, 20 Mar 2015 14:04:22 +0100 Subject: [OmniOS-discuss] OmniOS: zpool import dumps core In-Reply-To: <550BEDBF.2010506@jvm.de> References: <550BEDBF.2010506@jvm.de> Message-ID: <550C1AD6.1010805@jvm.de> Never mind. I re-installed OmniOS from a new r012 USB download and this issue went away. Must indeed have been something I have picked up while rushing through the updates from 006 to 012. Cheers, Stephan From eric.sproul at circonus.com Fri Mar 20 14:14:59 2015 From: eric.sproul at circonus.com (Eric Sproul) Date: Fri, 20 Mar 2015 10:14:59 -0400 Subject: [OmniOS-discuss] OpenSSL now updated! In-Reply-To: References: <3796FFF3-B587-4F76-8178-1B47197B94B6@omniti.com> <61991B57-270F-481E-BC28-49AB7F814409@cos.ru> Message-ID: On Fri, Mar 20, 2015 at 7:08 AM, Ben Summers wrote: > I suppose a hacky script could get a list of all the libraries and executables changed in the last update, use pfiles on all processes in all zones to files which ones have those libraries open, then use svcs -p to determine which services those processes are running under, and then restart them. Better yet, there already exists a hacky script: http://omnios.omniti.com/media/ssl_services_to_restart.sh This looks for running processes in the current zone that link libssl or libcrypto and gives you a list of services that you may wish to restart. It could be turned into something more generic, perhaps that took the name of a shared library as an argument. It is possible to have a package action trigger a service restart. See ACTUATORS in pkg(5). Circonus uses this a lot to deliver and update services via packages. One might make a case for ssl-dependent core system services (like ssh) to be restarted by the openssl package. It's obviously not practical for the OmniOS openssl package to actuate your arbitrary services though. :) Eric From jstockett at molalla.com Fri Mar 20 18:27:01 2015 From: jstockett at molalla.com (Jeff Stockett) Date: Fri, 20 Mar 2015 18:27:01 +0000 Subject: [OmniOS-discuss] NFS v4.1 in OmniOS stable? Message-ID: <136C13E89D22BB468B2A7025993639732F4DE9A1@EXMCCMB.molalla.com> Does OmniOS support NFS v4.1? 4.1 support is a new feature in esxi v6, and I was trying to set it up as described here: http://wahlnetwork.com/2015/02/02/nfs-v4-1/ Things of course work fine if I use NFS v3, but if I try v4.1, I get a timeout error when it tries to attach the data store. Both the omnios server and the esxi client are properly joined to Active Directory so I think the required Kerberos stuff should be working. -------------- next part -------------- An HTML attachment was scrubbed... URL: From danmcd at omniti.com Fri Mar 20 19:13:25 2015 From: danmcd at omniti.com (Dan McDonald) Date: Fri, 20 Mar 2015 15:13:25 -0400 Subject: [OmniOS-discuss] NFS v4.1 in OmniOS stable? In-Reply-To: <136C13E89D22BB468B2A7025993639732F4DE9A1@EXMCCMB.molalla.com> References: <136C13E89D22BB468B2A7025993639732F4DE9A1@EXMCCMB.molalla.com> Message-ID: <3D1E8AC6-0F72-45F3-A56B-03E052826A18@omniti.com> > On Mar 20, 2015, at 2:27 PM, Jeff Stockett wrote: > > Does OmniOS support NFS v4.1? 
4.1 support is a new feature in esxi v6, and I was trying to set it up as described here: > > http://wahlnetwork.com/2015/02/02/nfs-v4-1/ > > Things of course work fine if I use NFS v3, but if I try v4.1, I get a timeout error when it tries to attach the data store. Both the omnios server and the esxi client are properly joined to Active Directory so I think the required Kerberos stuff should be working. We have NFS4.0 and earlier. We do not have NFS4.1. It would be a very sizeable undertaking, requiring illumos community support. If anyone would lead the charge on that, it'd be a storage-oriented firm, like Nexenta, or Delphix. Dan From omnios at citrus-it.net Fri Mar 20 19:39:44 2015 From: omnios at citrus-it.net (Andy) Date: Fri, 20 Mar 2015 19:39:44 +0000 (GMT) Subject: [OmniOS-discuss] Dell vs. Supermicro and any recommendations.. In-Reply-To: References: <81F2A38D-5298-4FB8-B0BB-24E4506D6040@omniti.com> <201503121415.t2CEFBcC022831@elvis.arl.psu.edu> Message-ID: On Thu, 12 Mar 2015, Andy wrote: ; ; On Thu, 12 Mar 2015, John D Groenveld wrote: ; ; ; Otherwise, good luck debugging MegaRAID drivers and firmware. This definitely looks like a driver problem but I'm making progress. It seems that the code for handling logical versus physical disks on an LSI Invader controller is different and the PD code has some issues. Andy -- Citrus IT Limited | +44 (0)870 199 8000 | enquiries at citrus-it.co.uk Rock House Farm | Green Moor | Wortley | Sheffield | S35 7DQ Registered in England and Wales | Company number 4899123 From mir at miras.org Fri Mar 20 19:52:22 2015 From: mir at miras.org (Michael Rasmussen) Date: Fri, 20 Mar 2015 20:52:22 +0100 Subject: [OmniOS-discuss] NFS v4.1 in OmniOS stable? In-Reply-To: <3D1E8AC6-0F72-45F3-A56B-03E052826A18@omniti.com> References: <136C13E89D22BB468B2A7025993639732F4DE9A1@EXMCCMB.molalla.com> <3D1E8AC6-0F72-45F3-A56B-03E052826A18@omniti.com> Message-ID: <20150320205222.0d2d9ab0@sleipner.datanom.net> On Fri, 20 Mar 2015 15:13:25 -0400 Dan McDonald wrote: > > We have NFS4.0 and earlier. We do not have NFS4.1. It would be a very sizeable undertaking, requiring illumos community support. If anyone would lead the charge on that, it'd be a storage-oriented firm, like Nexenta, or Delphix. > But isn't there an Illumos project for pNFS? (http://www.pnfs.com/) -- Hilsen/Regards Michael Rasmussen Get my public GnuPG keys: michael rasmussen cc http://pgp.mit.edu:11371/pks/lookup?op=get&search=0xD3C9A00E mir datanom net http://pgp.mit.edu:11371/pks/lookup?op=get&search=0xE501F51C mir miras org http://pgp.mit.edu:11371/pks/lookup?op=get&search=0xE3E80917 -------------------------------------------------------------- /usr/games/fortune -es says: If you look like your driver's license photo -- see a doctor. If you look like your passport photo -- it's too late for a doctor. -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 181 bytes Desc: OpenPGP digital signature URL: From jdg117 at elvis.arl.psu.edu Fri Mar 20 19:58:45 2015 From: jdg117 at elvis.arl.psu.edu (John D Groenveld) Date: Fri, 20 Mar 2015 15:58:45 -0400 Subject: [OmniOS-discuss] Dell vs. Supermicro and any recommendations.. In-Reply-To: Your message of "Fri, 20 Mar 2015 19:39:44 -0000." 
References: <81F2A38D-5298-4FB8-B0BB-24E4506D6040@omniti.com> <201503121415.t2CEFBcC022831@elvis.arl.psu.edu> Message-ID: <201503201958.t2KJwjI4025021@elvis.arl.psu.edu> In message , Andy write s: >This definitely looks like a driver problem but I'm making progress. >It seems that the code for handling logical versus physical disks on an >LSI Invader controller is different and the PD code has some issues. How much of a performance difference between mr_sas 6.503.00.00ILLUMOS and LSI's 6.606.07.00? John groenveld at acm.org From illumos at cucumber.demon.co.uk Fri Mar 20 20:09:46 2015 From: illumos at cucumber.demon.co.uk (Andrew Gabriel) Date: Fri, 20 Mar 2015 20:09:46 +0000 Subject: [OmniOS-discuss] NFS v4.1 in OmniOS stable? In-Reply-To: <3D1E8AC6-0F72-45F3-A56B-03E052826A18@omniti.com> References: <136C13E89D22BB468B2A7025993639732F4DE9A1@EXMCCMB.molalla.com> <3D1E8AC6-0F72-45F3-A56B-03E052826A18@omniti.com> Message-ID: <550C7E8A.6020309@cucumber.demon.co.uk> Dan McDonald wrote: >> On Mar 20, 2015, at 2:27 PM, Jeff Stockett wrote: >> >> Does OmniOS support NFS v4.1? 4.1 support is a new feature in esxi v6, and I was trying to set it up as described here: >> >> http://wahlnetwork.com/2015/02/02/nfs-v4-1/ >> >> Things of course work fine if I use NFS v3, but if I try v4.1, I get a timeout error when it tries to attach the data store. Both the omnios server and the esxi client are properly joined to Active Directory so I think the required Kerberos stuff should be working. >> > > We have NFS4.0 and earlier. We do not have NFS4.1. It would be a very sizeable undertaking, requiring illumos community support. If anyone would lead the charge on that, it'd be a storage-oriented firm, like Nexenta, or Delphix. > Does anyone seriously use (or intend to use) 4.1 anymore? Lustre has the parallel file server market (with some pockets of AFS too). The large growth of parallel storage servers now tends to be object storage, and S3 (as a protocol) has become the standard for that. Kind of makes me wonder what the market for NFSv4.1 is? -- Andrew From cks at cs.toronto.edu Fri Mar 20 20:09:32 2015 From: cks at cs.toronto.edu (Chris Siebenmann) Date: Fri, 20 Mar 2015 16:09:32 -0400 Subject: [OmniOS-discuss] How to check if you have enough NFS server threads? Message-ID: <20150320200932.B63517A0690@apps0.cs.toronto.edu> We're running into a situation with one of our NFS ZFS fileservers[*] where we're wondering if we have enough NFS server threads to handle our load. Per 'sharectl get nfs', we have 'servers=512' configured, but we're not sure we know how to check how many are actually in use and active at any given time and whether or not we're running into this limit. Does anyone know how to tell either? We've looked at mdb -k's '::svc_pool nfs' but I've concluded that I don't know enough about OmniOS kernel internals to know for sure what it's telling us (partly because it seems to be giving us implausibly high numbers). Is the number we're looking for 'Non detached threads' minus 'Asleep threads'? (Or that plus detached threads?) Thanks in advance. - cks [*: our server setup and configuration is: http://utcc.utoronto.ca/~cks/space/blog/solaris/ZFSFileserverSetupII ] From ikaufman at eng.ucsd.edu Fri Mar 20 20:23:19 2015 From: ikaufman at eng.ucsd.edu (Ian Kaufman) Date: Fri, 20 Mar 2015 13:23:19 -0700 Subject: [OmniOS-discuss] NFS v4.1 in OmniOS stable? 
In-Reply-To: <550C7E8A.6020309@cucumber.demon.co.uk> References: <136C13E89D22BB468B2A7025993639732F4DE9A1@EXMCCMB.molalla.com> <3D1E8AC6-0F72-45F3-A56B-03E052826A18@omniti.com> <550C7E8A.6020309@cucumber.demon.co.uk> Message-ID: Lustre is great for HPC. It lacks the "sanctity of data" that other solutions have. It is getting there - using ZFS is a huge step forward, but Lustre is by no means a general purpose solution at this point. Ian On Fri, Mar 20, 2015 at 1:09 PM, Andrew Gabriel wrote: > Dan McDonald wrote: >>> >>> On Mar 20, 2015, at 2:27 PM, Jeff Stockett wrote: >>> >>> Does OmniOS support NFS v4.1? 4.1 support is a new feature in esxi v6, >>> and I was trying to set it up as described here: >>> http://wahlnetwork.com/2015/02/02/nfs-v4-1/ >>> Things of course work fine if I use NFS v3, but if I try v4.1, I get a >>> timeout error when it tries to attach the data store. Both the omnios >>> server and the esxi client are properly joined to Active Directory so I >>> think the required Kerberos stuff should be working. >>> >> >> >> We have NFS4.0 and earlier. We do not have NFS4.1. It would be a very >> sizeable undertaking, requiring illumos community support. If anyone would >> lead the charge on that, it'd be a storage-oriented firm, like Nexenta, or >> Delphix. >> > > > Does anyone seriously use (or intend to use) 4.1 anymore? > > Lustre has the parallel file server market (with some pockets of AFS too). > The large growth of parallel storage servers now tends to be object storage, > and S3 (as a protocol) has become the standard for that. > > Kind of makes me wonder what the market for NFSv4.1 is? > > -- > Andrew > > _______________________________________________ > OmniOS-discuss mailing list > OmniOS-discuss at lists.omniti.com > http://lists.omniti.com/mailman/listinfo/omnios-discuss -- Ian Kaufman Research Systems Administrator UC San Diego, Jacobs School of Engineering ikaufman AT ucsd DOT edu From richard.elling at richardelling.com Fri Mar 20 20:46:04 2015 From: richard.elling at richardelling.com (Richard Elling) Date: Fri, 20 Mar 2015 13:46:04 -0700 Subject: [OmniOS-discuss] How to check if you have enough NFS server threads? In-Reply-To: <20150320200932.B63517A0690@apps0.cs.toronto.edu> References: <20150320200932.B63517A0690@apps0.cs.toronto.edu> Message-ID: <5267538B-C118-4D81-B019-D4C8F0A8DAA0@richardelling.com> > On Mar 20, 2015, at 1:09 PM, Chris Siebenmann wrote: > > We're running into a situation with one of our NFS ZFS fileservers[*] > where we're wondering if we have enough NFS server threads to handle > our load. Per 'sharectl get nfs', we have 'servers=512' configured, > but we're not sure we know how to check how many are actually in use > and active at any given time and whether or not we're running into > this limit. > > Does anyone know how to tell either? Yes, these are dynamically sized and you can track via the number of current threads as shown by ps or something sneaky like "ls /proc/$(pgrep nfsd)/lwp | wc -l" Some distros, including Solaris 11.1, have kstats for this information. So when we track them over time, they can and do change dynamically and quickly. > > We've looked at mdb -k's '::svc_pool nfs' but I've concluded that I > don't know enough about OmniOS kernel internals to know for sure what > it's telling us (partly because it seems to be giving us implausibly > high numbers). Is the number we're looking for 'Non detached threads' > minus 'Asleep threads'? (Or that plus detached threads?) 
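To watch the simpler lwp count above move over time, a quick loop does it
(rough sketch; the 10-second interval is arbitrary):

  while sleep 10; do
      echo "$(date '+%H:%M:%S') $(ls /proc/$(pgrep -x nfsd)/lwp | wc -l)"
  done

If that number sits pinned at (or just above) your 'servers' setting under
load, you are probably running into the limit.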
In general, the number of threads is an indication of the load of the clients and the service ability of the server (in queuing theory terms). Too much load gives the same result as too slow of a back-end. In NFS, clients limit the number of concurrent requests, which is the best way to deal with too much load. -- richard > > Thanks in advance. > > - cks > [*: our server setup and configuration is: > http://utcc.utoronto.ca/~cks/space/blog/solaris/ZFSFileserverSetupII > ] > _______________________________________________ > OmniOS-discuss mailing list > OmniOS-discuss at lists.omniti.com > http://lists.omniti.com/mailman/listinfo/omnios-discuss From illumos at cucumber.demon.co.uk Fri Mar 20 21:59:27 2015 From: illumos at cucumber.demon.co.uk (Andrew Gabriel) Date: Fri, 20 Mar 2015 21:59:27 +0000 Subject: [OmniOS-discuss] How to check if you have enough NFS server threads? In-Reply-To: <20150320200932.B63517A0690@apps0.cs.toronto.edu> References: <20150320200932.B63517A0690@apps0.cs.toronto.edu> Message-ID: <550C983F.8070606@cucumber.demon.co.uk> Chris Siebenmann wrote: > We're running into a situation with one of our NFS ZFS fileservers[*] > where we're wondering if we have enough NFS server threads to handle > our load. Per 'sharectl get nfs', we have 'servers=512' configured, > but we're not sure we know how to check how many are actually in use > and active at any given time and whether or not we're running into > this limit. > If raising that limit then causes reports of: WARNING: svc_cots_kdup no slots free WARNING: svc_clts_kdup no slots free together with the NFS clients getting EIO errors back, you may need to increase the size of the duplicate check cache in the kernel rpcmod by raising rpcmod:cotsmaxdupreqs and rpcmod:maxdupreqs (not sure that all Illumos distros use same default values). -- Andrew From omnios at citrus-it.net Fri Mar 20 23:05:58 2015 From: omnios at citrus-it.net (Andy) Date: Fri, 20 Mar 2015 23:05:58 +0000 (GMT) Subject: [OmniOS-discuss] Dell vs. Supermicro and any recommendations.. In-Reply-To: <201503201958.t2KJwjI4025021@elvis.arl.psu.edu> References: <81F2A38D-5298-4FB8-B0BB-24E4506D6040@omniti.com> <201503121415.t2CEFBcC022831@elvis.arl.psu.edu> <201503201958.t2KJwjI4025021@elvis.arl.psu.edu> Message-ID: On Fri, 20 Mar 2015, John D Groenveld wrote: ; In message , Andy write ; s: ; >This definitely looks like a driver problem but I'm making progress. ; >It seems that the code for handling logical versus physical disks on an ; >LSI Invader controller is different and the PD code has some issues. ; ; How much of a performance difference between mr_sas 6.503.00.00ILLUMOS ; and LSI's 6.606.07.00? 6.606.. doesn't work for me, it receives the LD map then starts trying to "kill" the adapter. 6.605.01.00 is fine however and throughput is 10x better than the Illumos driver with a much more stable response time. With 6.503.00.00ILLUMOS, the RAID card keeps reporting 03/20/15 23:00:40: C0:iopiSCSIIOCompleteError: FPESTATUS_DEVHANDLE_OUT_OF_RANGE mid x02e6 PtrMsg xc00ccc00 03/20/15 23:00:40: C0:Out of range devHandle x0000 from SMID x0000022b about 300 times a second. -- Citrus IT Limited | +44 (0)870 199 8000 | enquiries at citrus-it.co.uk Rock House Farm | Green Moor | Wortley | Sheffield | S35 7DQ Registered in England and Wales | Company number 4899123 From henson at acm.org Fri Mar 20 23:07:21 2015 From: henson at acm.org (Paul B. Henson) Date: Fri, 20 Mar 2015 16:07:21 -0700 Subject: [OmniOS-discuss] NFS v4.1 in OmniOS stable? 
In-Reply-To: <550C7E8A.6020309@cucumber.demon.co.uk> References: <136C13E89D22BB468B2A7025993639732F4DE9A1@EXMCCMB.molalla.com> <3D1E8AC6-0F72-45F3-A56B-03E052826A18@omniti.com> <550C7E8A.6020309@cucumber.demon.co.uk> Message-ID: <057301d06362$a0dae7f0$e290b7d0$@acm.org> > From: Andrew Gabriel > Sent: Friday, March 20, 2015 1:10 PM > > Does anyone seriously use (or intend to use) 4.1 anymore? The one and only thing I want out of NFSv4.1 is the protocol fix to the exclusive open operation, which currently in NFSv4 results in broken inherited ACLs :(. From jboren at drakecooper.com Fri Mar 20 23:15:44 2015 From: jboren at drakecooper.com (Joseph Boren) Date: Fri, 20 Mar 2015 17:15:44 -0600 Subject: [OmniOS-discuss] list of know-compatible motherboards? In-Reply-To: References: Message-ID: Sorry to dredge this old thread up, but I wanted to add my experience with the Supermicro H8SGL-F motherboard, on the off chance someone is considering using that motherboard and reads this thread. And this is in no way a criticism of F?bio, or complaint about his recommendations. I appreciate the advice and I'm sure his use case is just different enough from mine that he didn't surface these issues. This board has some limitations that may make it not a great choice depending on your intentions for it. First of all, the BIOS can see a maximum of 12 attached HDs. And for some strange reason it sees HDs attached to HBAs before it sees HDs attached to the onboard SATA connectors, so if you have 12 drives attached to HBAs you cannot use the onboard SATA, BIOS can't see it (if you have 11 drives on HBAs you can see 1 local drive, etc). In addition, if you have more than 12 drives attached to HBAs, you can't boot from the drives higher than 12. So I have 14 drives attached to 2 8port HBAs, I can only set any of the first 12 as boot devices. 13 and 14 cannot be used. Finally (i think), if you have 12 or more drives attached to any HBA, on board SATA, whatever, you cannot boot from a flash drive. Even if you set it as the ONLY boot device it will just skip it and complain that there is no bootable device. If you only have 11 drives you can boot from USB Flash no problem. Interestingly, a USB CDROM is unaffected by this. You can select and boot from a USB CDROM regardless of how many drives are attached. Finally (actually this time I think), it appears to be impossible to set up a mirrored syspool on this motherboard, because there is only one slot in the Bios boot order menu for Hard Disk. So you can only choose one of the mirror pair as a boot device. There is no way to specify another HD as a second priority boot device. Now once you get OmniOS loaded it can see and make use of all drives attached to the system, but you are very restricted in what you can use for boot devices. After the better part of 2 weeks of back and forth with Supermicro support (who have been really nice and cooperative, but unable to do anything about it), I'm going to have to eat cost of this board/cpu/memory and get something else. If your use case is 12 total drives or less, and no mirrored boot, this board will work fine. If you need more than 12 drives, or mirrored syspool, it will not work. Thanks Joe -jb- *Joseph Boren* IT Specialist *DRAKE COOPER* + c: (208) 891-2128 + o: (208) 342-0925 + 416 S. 8th St., Boise, ID 83702 + w: drakecooper.com + f: /drakecooper + t: @drakecooper On Wed, Nov 19, 2014 at 4:54 PM, Joseph Boren wrote: > Wow, F?bio, thanks so much, that is very helpful. 
I was looking at > supermicro motherboards, so your info is perfect. > > I will have a look at those, I'm guessing I can find something that fits > my use case. Thanks again, the help is much appreciated. > > Best regards, > > -jb- > *Joseph Boren* > > IT Specialist > *DRAKE COOPER* > + c: (208) 891-2128 + o: (208) 342-0925 > + 416 S. 8th St., Boise, ID 83702 > + w: drakecooper.com + f: /drakecooper + > t: @drakecooper > > > On Wed, Nov 19, 2014 at 4:48 PM, F?bio Rabelo > wrote: > >> I can show you what motherboards I have installed and fully working in >> the customers of mine : >> >> http://www.supermicro.com/Aplus/motherboard/Opteron6000/SR56x0/H8DG6-F.cfm >> >> http://www.supermicro.com/Aplus/motherboard/Opteron6000/SR56x0/H8DG6.cfm >> >> http://www.supermicro.com/Aplus/motherboard/Opteron6000/SR56x0/H8DGi-F.cfm >> >> http://www.supermicro.com/Aplus/motherboard/Opteron6000/SR56x0/H8DGi.cfm >> >> http://www.supermicro.com/Aplus/motherboard/Opteron6000/SR56x0/H8SGL-F.cfm >> >> http://www.supermicro.com/Aplus/motherboard/Opteron6000/SR56x0/H8SGL.cfm >> >> The ones with LSI SAS controler needs to be flashed with IT firmware, you >> can find them in the official Supermicro FTP site : >> >> ftp://ftp.supermicro.com/Driver/SAS/LSI/ >> >> Opterons from 8 to 24 cores, no issue whats soever ... >> >> Some of them are up and running for over an year !!! >> >> >> F?bio Rabelo >> >> 2014-11-19 21:35 GMT-02:00 Joseph Boren : >> >>> Is anyone aware of a list, even a short list, of motherboards that are >>> known to be compatible with OmniOS? The illumos HCL doesn't list any >>> motherboards. >>> >>> Thanks, >>> >>> -jb- >>> *Joseph Boren* >>> >>> IT Specialist >>> *DRAKE COOPER* >>> + c: (208) 891-2128 + o: (208) 342-0925 >>> + 416 S. 8th St., Boise, ID 83702 >>> + w: drakecooper.com + f: /drakecooper >>> + t: @drakecooper >>> >>> >>> _______________________________________________ >>> OmniOS-discuss mailing list >>> OmniOS-discuss at lists.omniti.com >>> http://lists.omniti.com/mailman/listinfo/omnios-discuss >>> >>> >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From hakansom at ohsu.edu Sat Mar 21 01:23:04 2015 From: hakansom at ohsu.edu (Marion Hakanson) Date: Fri, 20 Mar 2015 18:23:04 -0700 Subject: [OmniOS-discuss] list of know-compatible motherboards? In-Reply-To: Message from Joseph Boren of "Fri, 20 Mar 2015 17:15:44 MDT." Message-ID: <201503210123.t2L1N4sb007295@kyklops.ohsu.edu> Joseph, You can work around the "too many drives" problems by making the assumption that you will never need to boot off of your external disks (the ones attached to the SAS HBA's). Then you enter each SAS HBA's BIOS config manager, and disable booting for that HBA. The motherboard BIOS will then no longer see the drives attached to the HBA's. I do this as a matter of course, because we have systems with as many as 120 drives attached via external SAS HBA's. No BIOS copes well with so many potential boot devices. Regards, Marion ================================================================= Subject: Re: [OmniOS-discuss] list of know-compatible motherboards? From: Joseph Boren Date: Fri, 20 Mar 2015 17:15:44 -0600 (16:15 PDT) To: F??bio Rabelo Cc: omnios-discuss Sorry to dredge this old thread up, but I wanted to add my experience with the Supermicro H8SGL-F motherboard, on the off chance someone is considering using that motherboard and reads this thread. And this is in no way a criticism of F?bio, or complaint about his recommendations. 
I appreciate the advice and I'm sure his use case is just different enough from mine that he didn't surface these issues. This board has some limitations that may make it not a great choice depending on your intentions for it. First of all, the BIOS can see a maximum of 12 attached HDs. And for some strange reason it sees HDs attached to HBAs before it sees HDs attached to the onboard SATA connectors, so if you have 12 drives attached to HBAs you cannot use the onboard SATA, BIOS can't see it (if you have 11 drives on HBAs you can see 1 local drive, etc). In addition, if you have more than 12 drives attached to HBAs, you can't boot from the drives higher than 12. So I have 14 drives attached to 2 8port HBAs, I can only set any of the first 12 as boot devices. 13 and 14 cannot be used. Finally (i think), if you have 12 or more drives attached to any HBA, on board SATA, whatever, you cannot boot from a flash drive. Even if you set it as the ONLY boot device it will just skip it and complain that there is no bootable device. If you only have 11 drives you can boot from USB Flash no problem. Interestingly, a USB CDROM is unaffected by this. You can select and boot from a USB CDROM regardless of how many drives are attached. Finally (actually this time I think), it appears to be impossible to set up a mirrored syspool on this motherboard, because there is only one slot in the Bios boot order menu for Hard Disk. So you can only choose one of the mirror pair as a boot device. There is no way to specify another HD as a second priority boot device. Now once you get OmniOS loaded it can see and make use of all drives attached to the system, but you are very restricted in what you can use for boot devices. After the better part of 2 weeks of back and forth with Supermicro support (who have been really nice and cooperative, but unable to do anything about it), I'm going to have to eat cost of this board/cpu/memory and get something else. If your use case is 12 total drives or less, and no mirrored boot, this board will work fine. If you need more than 12 drives, or mirrored syspool, it will not work. Thanks Joe -jb- *Joseph Boren* . . . From jboren at drakecooper.com Sun Mar 22 21:05:45 2015 From: jboren at drakecooper.com (Joseph Boren) Date: Sun, 22 Mar 2015 15:05:45 -0600 Subject: [OmniOS-discuss] list of know-compatible motherboards? In-Reply-To: <0B39E89B-8B5E-4D36-B9D9-18136A87BF58@icloud.com> References: <0B39E89B-8B5E-4D36-B9D9-18136A87BF58@icloud.com> Message-ID: Hi Ben, Thanks for the tip. It turns out that board had some actual physical defects that were causing some weird behaviour that was confusing the whole issue. What you suggest should work perfectly for that scenario. I'm exchanging the board and I'm sure the new one will be fine. Thanks again for the idea. Best regards, joe boren -jb- *Joseph Boren* IT Specialist *DRAKE COOPER* + c: (208) 891-2128 + o: (208) 342-0925 + 416 S. 8th St., Boise, ID 83702 + w: drakecooper.com + f: /drakecooper + t: @drakecooper On Sat, Mar 21, 2015 at 1:08 AM, Ben Kitching wrote: > Hi Joe, > > I?ve had similar problems with Supermicro boards in the past. > > Have you tried disabling the option ROMs for your HBA?s in the BIOS? > > That solved it for us. > > On 20 Mar 2015, at 23:15, Joseph Boren wrote: > > Sorry to dredge this old thread up, but I wanted to add my experience with > the Supermicro H8SGL-F motherboard, on the off chance someone is > considering using that motherboard and reads this thread. 
And this is in > no way a criticism of F?bio, or complaint about his recommendations. I > appreciate the advice and I'm sure his use case is just different enough > from mine that he didn't surface these issues. > > This board has some limitations that may make it not a great choice > depending on your intentions for it. First of all, the BIOS can see a > maximum of 12 attached HDs. And for some strange reason it sees HDs > attached to HBAs before it sees HDs attached to the onboard SATA > connectors, so if you have 12 drives attached to HBAs you cannot use the > onboard SATA, BIOS can't see it (if you have 11 drives on HBAs you can see > 1 local drive, etc). In addition, if you have more than 12 drives attached > to HBAs, you can't boot from the drives higher than 12. So I have 14 > drives attached to 2 8port HBAs, I can only set any of the first 12 as boot > devices. 13 and 14 cannot be used. Finally (i think), if you have 12 or > more drives attached to any HBA, on board SATA, whatever, you cannot boot > from a flash drive. Even if you set it as the ONLY boot device it will > just skip it and complain that there is no bootable device. If you only > have 11 drives you can boot from USB Flash no problem. Interestingly, a > USB CDROM is unaffected by this. You can select and boot from a USB CDROM > regardless of how many drives are attached. Finally (actually this time I > think), it appears to be impossible to set up a mirrored syspool on this > motherboard, because there is only one slot in the Bios boot order menu for > Hard Disk. So you can only choose one of the mirror pair as a boot > device. There is no way to specify another HD as a second priority boot > device. Now once you get OmniOS loaded it can see and make use of all > drives attached to the system, but you are very restricted in what you can > use for boot devices. > > After the better part of 2 weeks of back and forth with Supermicro support > (who have been really nice and cooperative, but unable to do anything about > it), I'm going to have to eat cost of this board/cpu/memory and get > something else. If your use case is 12 total drives or less, and no > mirrored boot, this board will work fine. If you need more than 12 drives, > or mirrored syspool, it will not work. > > Thanks > Joe > > > -jb- > *Joseph Boren* > > IT Specialist > *DRAKE COOPER* > + c: (208) 891-2128 + o: (208) 342-0925 > + 416 S. 8th St., Boise, ID 83702 > + w: drakecooper.com + f: /drakecooper + > t: @drakecooper > > > > On Wed, Nov 19, 2014 at 4:54 PM, Joseph Boren > wrote: > >> Wow, F?bio, thanks so much, that is very helpful. I was looking at >> supermicro motherboards, so your info is perfect. >> >> I will have a look at those, I'm guessing I can find something that fits >> my use case. Thanks again, the help is much appreciated. >> >> Best regards, >> >> >> -jb- >> *Joseph Boren* >> >> IT Specialist >> *DRAKE COOPER* >> + c: (208) 891-2128 + o: (208) 342-0925 >> + 416 S. 
8th St., Boise, ID 83702 >> + w: drakecooper.com + f: /drakecooper >> + t: @drakecooper >> >> >> >> On Wed, Nov 19, 2014 at 4:48 PM, F?bio Rabelo >> wrote: >> >>> I can show you what motherboards I have installed and fully working in >>> the customers of mine : >>> >>> >>> http://www.supermicro.com/Aplus/motherboard/Opteron6000/SR56x0/H8DG6-F.cfm >>> >>> http://www.supermicro.com/Aplus/motherboard/Opteron6000/SR56x0/H8DG6.cfm >>> >>> >>> http://www.supermicro.com/Aplus/motherboard/Opteron6000/SR56x0/H8DGi-F.cfm >>> >>> http://www.supermicro.com/Aplus/motherboard/Opteron6000/SR56x0/H8DGi.cfm >>> >>> >>> http://www.supermicro.com/Aplus/motherboard/Opteron6000/SR56x0/H8SGL-F.cfm >>> >>> http://www.supermicro.com/Aplus/motherboard/Opteron6000/SR56x0/H8SGL.cfm >>> >>> The ones with LSI SAS controler needs to be flashed with IT firmware, >>> you can find them in the official Supermicro FTP site : >>> >>> ftp://ftp.supermicro.com/Driver/SAS/LSI/ >>> >>> Opterons from 8 to 24 cores, no issue whats soever ... >>> >>> Some of them are up and running for over an year !!! >>> >>> >>> F?bio Rabelo >>> >>> 2014-11-19 21:35 GMT-02:00 Joseph Boren : >>> >>>> Is anyone aware of a list, even a short list, of motherboards that are >>>> known to be compatible with OmniOS? The illumos HCL doesn't list any >>>> motherboards. >>>> >>>> Thanks, >>>> >>>> >>>> -jb- >>>> *Joseph Boren* >>>> >>>> IT Specialist >>>> *DRAKE COOPER* >>>> + c: (208) 891-2128 + o: (208) 342-0925 >>>> + 416 S. 8th St., Boise, ID 83702 >>>> + w: drakecooper.com + f: /drakecooper >>>> + t: @drakecooper >>>> >>>> >>>> >>>> >>>> _______________________________________________ >>>> OmniOS-discuss mailing list >>>> OmniOS-discuss at lists.omniti.com >>>> http://lists.omniti.com/mailman/listinfo/omnios-discuss >>>> >>>> >>> >> > _______________________________________________ > OmniOS-discuss mailing list > OmniOS-discuss at lists.omniti.com > http://lists.omniti.com/mailman/listinfo/omnios-discuss > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From omnios at citrus-it.net Mon Mar 23 00:01:26 2015 From: omnios at citrus-it.net (Andy) Date: Mon, 23 Mar 2015 00:01:26 +0000 (GMT) Subject: [OmniOS-discuss] Dell vs. Supermicro and any recommendations.. In-Reply-To: References: <81F2A38D-5298-4FB8-B0BB-24E4506D6040@omniti.com> <201503121415.t2CEFBcC022831@elvis.arl.psu.edu> Message-ID: On Fri, 20 Mar 2015, Andy wrote: ; On Thu, 12 Mar 2015, Andy wrote: ; ; ; ; On Thu, 12 Mar 2015, John D Groenveld wrote: ; ; ; ; ; Otherwise, good luck debugging MegaRAID drivers and firmware. ; ; This definitely looks like a driver problem but I'm making progress. ; It seems that the code for handling logical versus physical disks on an ; LSI Invader controller is different and the PD code has some issues. I think I've cracked it! carolina# (43) zfs create -o compress=off rpool/test carolina# (48) dd if=/dev/zero of=/rpool/test/tt bs=512k count=10000 5242880000 bytes transferred in 13.199665 secs (397197954 bytes/sec) Mirrored rpool on 15K SAS disks. Previously I was hitting 20MB/s maximum. No more errors in the controller firmware log either. I'll test properly over the next few days and clean up the diffs, but it looks good and the changes should only affect the non-RAID code. 
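For the read side the same sort of quick check works, as long as the test
file is bigger than RAM so the ARC can't just hand it back from cache
(sizes below are only examples):

  carolina# dd if=/dev/zero of=/rpool/test/big bs=512k count=100000   # ~50GB, > RAM
  carolina# dd if=/rpool/test/big of=/dev/null bs=512k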
Andy -- Citrus IT Limited | +44 (0)870 199 8000 | enquiries at citrus-it.co.uk Rock House Farm | Green Moor | Wortley | Sheffield | S35 7DQ Registered in England and Wales | Company number 4899123 From danmcd at omniti.com Mon Mar 23 03:52:02 2015 From: danmcd at omniti.com (Dan McDonald) Date: Sun, 22 Mar 2015 23:52:02 -0400 Subject: [OmniOS-discuss] Dell vs. Supermicro and any recommendations.. In-Reply-To: References: <81F2A38D-5298-4FB8-B0BB-24E4506D6040@omniti.com> <201503121415.t2CEFBcC022831@elvis.arl.psu.edu> Message-ID: <22F91F32-07D1-42BF-99DE-F6C87037463A@omniti.com> > On Mar 22, 2015, at 8:01 PM, Andy wrote: > > > I think I've cracked it! > > carolina# (43) zfs create -o compress=off rpool/test > carolina# (48) dd if=/dev/zero of=/rpool/test/tt bs=512k count=10000 > 5242880000 bytes transferred in 13.199665 secs (397197954 bytes/sec) > > Mirrored rpool on 15K SAS disks. Previously I was hitting 20MB/s maximum. > No more errors in the controller firmware log either. > > I'll test properly over the next few days and clean up the diffs, but it > looks good and the changes should only affect the non-RAID code. Please make sure it gets reviewed on the illumos developer list. If you're quick, it will make r151014 before I cut it off. Thank you for cracking it! :) Dan From tobi at oetiker.ch Mon Mar 23 14:33:24 2015 From: tobi at oetiker.ch (Tobias Oetiker) Date: Mon, 23 Mar 2015 15:33:24 +0100 (CET) Subject: [OmniOS-discuss] kvm crashing while running replication send/receive Message-ID: I got these bunch of new disks when for our (r12 omnios) server and userd repication send / receive to transfer an existing pool to the new disks. While doing so, we found that the kvm instances running on that machine had a rather pronounced tendency to become unresponsive. Killing the kvm process and starting it again helped ... Neither the sending nor the receiving pool were the ones where the kvm volumes where hosted ... Any ideas how this can happen ? cheers tobi -- Tobi Oetiker, OETIKER+PARTNER AG, Aarweg 15 CH-4600 Olten, Switzerland www.oetiker.ch tobi at oetiker.ch +41 62 775 9902 From danmcd at omniti.com Mon Mar 23 20:14:57 2015 From: danmcd at omniti.com (Dan McDonald) Date: Mon, 23 Mar 2015 16:14:57 -0400 Subject: [OmniOS-discuss] A warning for upgraders with large numbers of BEs Message-ID: <8B8D8D0F-35B5-42F8-B1FE-CD8EA002EE06@omniti.com> Soon r151014 will be hitting the streets. WHEN THAT DOES, I have to warn people, especially those jumping from r151006 to r151014 about a known issue in grub. The illumos grub has serious memory management issues. It cannot cope with too many boot environment (BE) entries. The upper-limit on r151006 was ~60. The upper-limit on r151014 is ~40. If you upgrade an r151006 machine with 50 BEs to r151014, you may lose the ability to boot (but not your data or even rpool). If you have more than 40 BEs on your rpool, I'd highly recommend trimming some back prior to an upgrade. We've been (the illumos community, not just OmniOS) trying to figure out what to fix in grub, but it's opaque code at best. The r151014 installation & upgrade page will have this warning as well, but I wanted to give the community a heads-up now, so you could prepare prior to the upgrade to r151014. 
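Trimming is just a beadm exercise, e.g. (the BE name below is only an
example -- check 'beadm list' first and leave the entries flagged N/R,
the active ones, alone):

  # beadm list
  # beadm destroy omnios-r151008-backup-1

Each destroyed BE also drops its entry from the grub menu, which is what
keeps you under the limit.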
Dan
From danmcd at omniti.com Mon Mar 23 21:40:50 2015 From: danmcd at omniti.com (Dan McDonald) Date: Mon, 23 Mar 2015 17:40:50 -0400 Subject: [OmniOS-discuss] A warning for upgraders with large numbers of BEs In-Reply-To: <20150323205308.GA21991@linux.gyakg.u-szeged.hu> References: <8B8D8D0F-35B5-42F8-B1FE-CD8EA002EE06@omniti.com> <20150323205308.GA21991@linux.gyakg.u-szeged.hu> Message-ID: The LX brand hadn't been upstreamed yet. Once it has, we will include it. We will likely assist in its upstreaming, but not at the moment. Dan Sent from my iPhone (typos, autocorrect, and all) > On Mar 23, 2015, at 4:53 PM, PÁSZTOR György wrote: > > Hi, > > "Dan McDonald" wrote at 2015-03-23 16:14: >> Soon r151014 will be hitting the streets. WHEN THAT DOES, I have to warn people, especially those jumping from r151006 to r151014 about a known issue in grub. >> >> The illumos grub has serious memory management issues. It cannot cope with too many boot environment (BE) entries. > > Sorry for going semi-offtopic in the thread, but: will the lx brand be restored > in the upcoming release? > > Is there a feature map / release plan / anything available? > I tried to find information regarding this topic without success. > > I checked this url: > http://omnios.omniti.com/roadmap.php > But no relevant information was there. It seems outdated / > unmaintained. > > I've just recently found this distro. I used openindiana since Oracle... > -- Did what they did to opensolaris -- > > So, I'm new here, sorry for lame questions. > > Kind regards, > György Pásztor
From jboren at drakecooper.com Mon Mar 23 23:34:21 2015 From: jboren at drakecooper.com (Joseph Boren) Date: Mon, 23 Mar 2015 17:34:21 -0600 Subject: [OmniOS-discuss] list of know-compatible motherboards? In-Reply-To: References: <0B39E89B-8B5E-4D36-B9D9-18136A87BF58@icloud.com> Message-ID: Well, I have a dumb question, if everyone isn't fed up with me. This board appears to have a hardware fault, and I'm trying to figure out the exact details for the RMA exchange. First of all, the second Ethernet port doesn't work. If you plug it into a switchport, the speed/duplex LED comes on, seeming to indicate that it is linking up at 1000/full duplex, but the link/activity light never comes on. OmniOS only sees one Ethernet port on the motherboard. The issue I'm struggling with, however, is identifying a failed PCIEX device. When the machine boots, right before the login prompt comes up, I get an error: Warning: one or more I/O devices have been retired. When I check to see what the device is using "fmadm faulty", I get the following: Fault Class: fault.io.pciex.device-interr Affects: dev:////pci at 0,0/pci1002,5a1d at a/pci15d9,a711 at 0 faulted and taken out of service FRU: "MB" (hc://:product-id=H8SLG:server-id=omnistor1:chassis-id=1234567890/motherboard=0) Description: A problem was detected for a PCIEX device. Refer to http://illumos.org/msg/PCIEX-8000-0A for more information. I'm having trouble identifying exactly what device it's referring to. Seems like something on the motherboard, or is it referring to the motherboard itself? It would make sense that it was referring to the Ethernet port, but I'm pretty ignorant about PCIEX and haven't been able to find any info that corresponds to those numbers. If someone could point me in the right direction, I'd be grateful. Best regards, Joe Boren -jb- *Joseph Boren* IT Specialist *DRAKE COOPER* + c: (208) 891-2128 + o: (208) 342-0925 + 416 S.
8th St., Boise, ID 83702 + w: drakecooper.com + f: /drakecooper + t: @drakecooper On Sun, Mar 22, 2015 at 3:05 PM, Joseph Boren wrote: > Hi Ben, > > Thanks for the tip. It turns out that board had some actual physical > defects that were causing some weird behaviour that was confusing the whole > issue. What you suggest should work perfectly for that scenario. I'm > exchanging the board and I'm sure the new one will be fine. > > Thanks again for the idea. > > Best regards, > joe boren > > -jb- > *Joseph Boren* > > IT Specialist > *DRAKE COOPER* > + c: (208) 891-2128 + o: (208) 342-0925 > + 416 S. 8th St., Boise, ID 83702 > + w: drakecooper.com + f: /drakecooper + > t: @drakecooper > > > On Sat, Mar 21, 2015 at 1:08 AM, Ben Kitching > wrote: > >> Hi Joe, >> >> I?ve had similar problems with Supermicro boards in the past. >> >> Have you tried disabling the option ROMs for your HBA?s in the BIOS? >> >> That solved it for us. >> >> On 20 Mar 2015, at 23:15, Joseph Boren wrote: >> >> Sorry to dredge this old thread up, but I wanted to add my experience >> with the Supermicro H8SGL-F motherboard, on the off chance someone is >> considering using that motherboard and reads this thread. And this is in >> no way a criticism of F?bio, or complaint about his recommendations. I >> appreciate the advice and I'm sure his use case is just different enough >> from mine that he didn't surface these issues. >> >> This board has some limitations that may make it not a great choice >> depending on your intentions for it. First of all, the BIOS can see a >> maximum of 12 attached HDs. And for some strange reason it sees HDs >> attached to HBAs before it sees HDs attached to the onboard SATA >> connectors, so if you have 12 drives attached to HBAs you cannot use the >> onboard SATA, BIOS can't see it (if you have 11 drives on HBAs you can see >> 1 local drive, etc). In addition, if you have more than 12 drives attached >> to HBAs, you can't boot from the drives higher than 12. So I have 14 >> drives attached to 2 8port HBAs, I can only set any of the first 12 as boot >> devices. 13 and 14 cannot be used. Finally (i think), if you have 12 or >> more drives attached to any HBA, on board SATA, whatever, you cannot boot >> from a flash drive. Even if you set it as the ONLY boot device it will >> just skip it and complain that there is no bootable device. If you only >> have 11 drives you can boot from USB Flash no problem. Interestingly, a >> USB CDROM is unaffected by this. You can select and boot from a USB CDROM >> regardless of how many drives are attached. Finally (actually this time I >> think), it appears to be impossible to set up a mirrored syspool on this >> motherboard, because there is only one slot in the Bios boot order menu for >> Hard Disk. So you can only choose one of the mirror pair as a boot >> device. There is no way to specify another HD as a second priority boot >> device. Now once you get OmniOS loaded it can see and make use of all >> drives attached to the system, but you are very restricted in what you can >> use for boot devices. >> >> After the better part of 2 weeks of back and forth with Supermicro >> support (who have been really nice and cooperative, but unable to do >> anything about it), I'm going to have to eat cost of this board/cpu/memory >> and get something else. If your use case is 12 total drives or less, and >> no mirrored boot, this board will work fine. If you need more than 12 >> drives, or mirrored syspool, it will not work. 
>> >> Thanks >> Joe >> >> >> -jb- >> *Joseph Boren* >> >> IT Specialist >> *DRAKE COOPER* >> + c: (208) 891-2128 + o: (208) 342-0925 >> + 416 S. 8th St., Boise, ID 83702 >> + w: drakecooper.com + f: /drakecooper >> + t: @drakecooper >> >> >> >> On Wed, Nov 19, 2014 at 4:54 PM, Joseph Boren >> wrote: >> >>> Wow, F?bio, thanks so much, that is very helpful. I was looking at >>> supermicro motherboards, so your info is perfect. >>> >>> I will have a look at those, I'm guessing I can find something that fits >>> my use case. Thanks again, the help is much appreciated. >>> >>> Best regards, >>> >>> >>> -jb- >>> *Joseph Boren* >>> >>> IT Specialist >>> *DRAKE COOPER* >>> + c: (208) 891-2128 + o: (208) 342-0925 >>> + 416 S. 8th St., Boise, ID 83702 >>> + w: drakecooper.com + f: /drakecooper >>> + t: @drakecooper >>> >>> >>> >>> On Wed, Nov 19, 2014 at 4:48 PM, F?bio Rabelo >> > wrote: >>> >>>> I can show you what motherboards I have installed and fully working in >>>> the customers of mine : >>>> >>>> >>>> http://www.supermicro.com/Aplus/motherboard/Opteron6000/SR56x0/H8DG6-F.cfm >>>> >>>> http://www.supermicro.com/Aplus/motherboard/Opteron6000/SR56x0/H8DG6.cfm >>>> >>>> >>>> http://www.supermicro.com/Aplus/motherboard/Opteron6000/SR56x0/H8DGi-F.cfm >>>> >>>> http://www.supermicro.com/Aplus/motherboard/Opteron6000/SR56x0/H8DGi.cfm >>>> >>>> >>>> http://www.supermicro.com/Aplus/motherboard/Opteron6000/SR56x0/H8SGL-F.cfm >>>> >>>> http://www.supermicro.com/Aplus/motherboard/Opteron6000/SR56x0/H8SGL.cfm >>>> >>>> The ones with LSI SAS controler needs to be flashed with IT firmware, >>>> you can find them in the official Supermicro FTP site : >>>> >>>> ftp://ftp.supermicro.com/Driver/SAS/LSI/ >>>> >>>> Opterons from 8 to 24 cores, no issue whats soever ... >>>> >>>> Some of them are up and running for over an year !!! >>>> >>>> >>>> F?bio Rabelo >>>> >>>> 2014-11-19 21:35 GMT-02:00 Joseph Boren : >>>> >>>>> Is anyone aware of a list, even a short list, of motherboards that are >>>>> known to be compatible with OmniOS? The illumos HCL doesn't list any >>>>> motherboards. >>>>> >>>>> Thanks, >>>>> >>>>> >>>>> -jb- >>>>> *Joseph Boren* >>>>> >>>>> IT Specialist >>>>> *DRAKE COOPER* >>>>> + c: (208) 891-2128 + o: (208) 342-0925 >>>>> + 416 S. 8th St., Boise, ID 83702 >>>>> + w: drakecooper.com + f: /drakecooper >>>>> + t: @drakecooper >>>>> >>>>> >>>>> >>>>> >>>>> _______________________________________________ >>>>> OmniOS-discuss mailing list >>>>> OmniOS-discuss at lists.omniti.com >>>>> http://lists.omniti.com/mailman/listinfo/omnios-discuss >>>>> >>>>> >>>> >>> >> _______________________________________________ >> OmniOS-discuss mailing list >> OmniOS-discuss at lists.omniti.com >> http://lists.omniti.com/mailman/listinfo/omnios-discuss >> >> >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From omnios at citrus-it.net Mon Mar 23 23:58:48 2015 From: omnios at citrus-it.net (Andy) Date: Mon, 23 Mar 2015 23:58:48 +0000 (GMT) Subject: [OmniOS-discuss] list of know-compatible motherboards? In-Reply-To: References: <0B39E89B-8B5E-4D36-B9D9-18136A87BF58@icloud.com> Message-ID: On Mon, 23 Mar 2015, Joseph Boren wrote: ; Fault Class: fault.io.pciex.device-interr ; Affects: dev:////pci at 0,0/pci1002,5a1d at a/pci15d9,a711 at 0 faulted and taken ; out of service ; FRU: "MB" ; (hc://:product-id=H8SLG:server-id=omnistor1:chassis-id=1234567890/motherboard=0) 15d9 is SuperMicro and http://mirror.szepe.net/siv/pcidevs.txt says that's an embedded MegaRAID. A. 
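If you want to do the decoding yourself next time, roughly this -- the pci.ids path is just wherever you keep a copy of the public database from https://pci-ids.ucw.cz/, and the IDs come straight from the fmadm output:

# which driver instance (if any) is bound to the faulted path
grep 'pci15d9,a711' /etc/path_to_inst

# the node name is a hex vendor,device (or subsystem) ID pair;
# 15d9 is the Super Micro vendor ID in the public database
grep -i '^15d9' /tmp/pci.ids

# prtconf -D shows the same device tree with the bound driver per node
prtconf -D | grep -i 15d9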
-- Citrus IT Limited | +44 (0)870 199 8000 | enquiries at citrus-it.co.uk Rock House Farm | Green Moor | Wortley | Sheffield | S35 7DQ Registered in England and Wales | Company number 4899123
From john.barfield at bissinc.com Tue Mar 24 19:29:45 2015 From: john.barfield at bissinc.com (John Barfield) Date: Tue, 24 Mar 2015 19:29:45 +0000 Subject: [OmniOS-discuss] Potential KVM Virtio Performance Issues Message-ID: <8E74F76C-13B9-4FDD-95EE-8B07F373B412@bissinc.com> Greetings OmniOS community! This is my first time asking a question on this list, so here goes. I've deployed a zone on OmniOS and a KVM virtual machine within the zone. I've been doing some initial virtio network interface performance testing with iperf, and the following are my results (default out of the box for all moving parts). Host: OmniOS build stable: r151012 Test Interfaces: GZ Phys = igb0 KVM Vnic: kvm0 over igb0 GZ Vnic: gvm0 over igb0 Zone Vnic: zvm0 over igb0 Network addressing: All vnics are within 10.128.255.249/29 Zone Brand: Omni-ti ipkg Qemu-Kvm Guest: CentOS 6.6 x86_64 iPerf Results: KVM Guest -> Global Zone = 151 Mbytes (Expected close to 1 GByte) KVM Guest -> KVM Zone = 147 Mbytes (Expected close to 1 GByte) Zone -> Global Zone = 5.0 GBytes (These were expected since it was a host-only VNIC network) GZ -> Zone = 4.7 GBytes (These were expected since it was a host-only VNIC network) My question is: are there any tweaks that I'm missing to get the full performance potential within the guest? Why am I only seeing 147 Mbytes between KVM and the hosting zone or the global zone? I'm testing with an isolated network and vnics only, so the traffic never leaves the physical host to go over the wire. I do have CPU capped at 16 cores and memory capped at 16GB in the zone. Is there some default network capping that I'm missing? Or process throttling? MTU is 1500 across the board. I did the same test with etherstubs at first, but thought maybe I was having an MTU mismatch because I received the same 147 Mbyte result; however, a subsequent test using just the GZ -> child zone showed 5.0 GBps over the etherstub switch, just like when I only used the VNICs over igb0. Also, just for grins, I tested two bare-metal hosts on my physical network with iperf (one being CentOS 6.5 and the other OmniOS build r151012) and received 1.09 Gbytes over a physical switch. Your thoughts are appreciated! John Barfield / Sr Principal Engineer +1 (214) 425-0783/ john.barfield at bissinc.com BISS, Inc. Office: +1 (214) 506-8354 4925 Greenville Ave Suite 900 Dallas, TX 75206 support.bissinc.com
From moo at wuffers.net Tue Mar 24 21:17:33 2015 From: moo at wuffers.net (wuffers) Date: Tue, 24 Mar 2015 17:17:33 -0400 Subject: [OmniOS-discuss] kernel panic "kernel heap corruption detected" when creating zero eager disks Message-ID: I recently created a pair of 25TB LUs for use in my VMware environment to test out Veeam (and using that space for my repo - yes, yes, backups should not reside in the same storage, but they will be exported to tape). So while trying to create a 16TB drive in the vSphere fat client, I got the value out of range error ( http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=2054952). OKed the error, and the task seemed to run anyway, but at some point my whole SAN crashed during the creation of the drive.
As this was during business hours, I did not have time to wait on the dump, but I was able to reproduce it later trying to create a 10TB drive (again from the fat vSphere client, not web client) and capture the dump (which takes 40 minutes.. grr). Just an quick note on the environment: the VMware hosts are connected to the head unit via IB and SRP. The largest LUs I had previously created for VMware were 5TB in size, and largest drive created was 2TB. fmdump info: TIME UUID SUNW-MSG-ID Mar 20 2015 19:35:26.819716000 31ced65f-dca2-ee58-c882-a6daa6b94208 SUNOS-8000-KL nvlist version: 0 version = 0x0 class = list.suspect uuid = 31ced65f-dca2-ee58-c882-a6daa6b94208 code = SUNOS-8000-KL diag-time = 1426894526 787544 de = fmd:///module/software-diagnosis fault-list-sz = 0x1 fault-list = (array of embedded nvlists) (start fault-list[0]) nvlist version: 0 version = 0x0 class = defect.sunos.kernel.panic certainty = 0x64 asru = sw:///:path=/var/crash/unknown/.31ced65f-dca2-ee58-c882-a6daa6b94208 resource = sw:///:path=/var/crash/unknown/.31ced65f-dca2-ee58-c882-a6daa6b94208 savecore-succcess = 1 dump-dir = /var/crash/unknown dump-files = vmdump.0 os-instance-uuid = 31ced65f-dca2-ee58-c882-a6daa6b94208 panicstr = kernel heap corruption detected panicstack = fffffffffba49114 () | genunix:kmem_slab_free+c1 () | genunix:kmem_magazine_destroy+6e () | genunix:kmem_cache_magazine_purge+f0 () | genunix:kmem_cache_magazine_resize+40 () | genunix:taskq_thread+2d0 () | unix:thread_start+8 () | crashtime = 1426891707 panic-time = Fri Mar 20 18:48:27 2015 EDT (end fault-list[0]) fault-status = 0x1 severity = Major __ttl = 0x1 __tod = 0x550caebe 0x30dbdfa0 Crash file: https://drive.google.com/open?id=0B7mCJnZUzJPKOXl1S3IwYXh4NTg&authuser=0 I couldn't find any interesting comparative posts/reports. Would some kind soul care to look at the dump and see what is happening here? (And is this the right spot for a kernel panic report, or is it better to go to the illumos list?) -------------- next part -------------- An HTML attachment was scrubbed... URL: From john.barfield at bissinc.com Tue Mar 24 21:34:47 2015 From: john.barfield at bissinc.com (John Barfield) Date: Tue, 24 Mar 2015 21:34:47 +0000 Subject: [OmniOS-discuss] Potential KVM Virtio Performance Issues Message-ID: Okay found the problem. After further testing I achieved 952 MBytes on a VM-2-VM connection...1 linux Ubuntu 12.04 vm to another CentOS 6.6 VM running on two different SmartOS host machines (through an extreme networks switch). This was puzzling so I look at how joyent ran the VM?s command with pargs?I found that they do not use the following format: -net nic,vlan=1,name=${VNIC2},model=virtio,macaddr=${mac2} \ -net vnic,vlan=1,name=${VNIC2},ifname=${VNIC2},macaddr=${mac2} \ They use this format: -device \ virtio-net-pci,mac=02:08:20:5f:85:0d,tx=timer,x-txtimer=200000,x-txburst=12 8,vlan=0 \ \ -net \ vnic,name=${VNIC1},vlan=0,ifname=${VNIC1} \ I?m not sure if the txtimer values did anything performance gaining or not?I?m pretty sure just switching to the -device configuration instead of the legacy -net nic configuration is what did the trick. If anyone wants me to I?ll test and see if that was the only difference. Have a great day! John Barfield / Sr Principal Engineer +1 (214) 425-0783/ john.barfield at bissinc.com BISS, Inc. 
Office: +1 (214) 506-8354 4925 Greenville Ave Suite 900 Dallas, TX 75206 support.bissinc.com This e-mail message may contain confidential or legally privileged information and is intended only for the use of the intended recipient(s). Any unauthorized disclosure, dissemination, distribution, copying or the taking of any action in reliance on the information herein is prohibited. E-mails are not secure and cannot be guaranteed to be error free as they can be intercepted, amended, or contain viruses. Anyone who communicates with us by e-mail is deemed to have accepted these risks. Company Name is not responsible for errors or omissions in this message and denies any responsibility for any damage arising from the use of e-mail. Any opinion and other statement contained in this message and any attachment are solely those of the author and do not necessarily represent those of the company. On 3/24/15, 2:29 PM, "John Barfield" wrote: >Greetings OmnisOS community! This is my first time to ask a question on >this list so here goes. I?ve deployed a zone on omnios and a KVM virtual >machine within the zone. I?ve been doing some initial virtio network >interface performance testing with iperf and the following are my results >(default out of the box for all moving parts). > > >Host: >OmniOS build stable: r151012 >Test Interfaces: >GZ Phys = igb0 >KVM Vnic: kvm0 over igb0 >GZ Vnic: gvm0 over igb0 >Zone Vnic: zvm0 over igb0 > >Network addressing: All vnics are with 10.128.255.249/29 > >Zone Brand: Omni-ti ipkg >Qemu-Kvm Guest: Centos 6.6 x86_64 > > >iPerf Results: > >KVM Guest -> Global Zone = 151 Mbytes (Expected close to 1 GByte) > >KVM Guest -> KVM Zone = 147 Mbytes (Expected close to 1 GByte) > >Zone -> Global Zone = 5.0GBytes (These were expected since it was a host >only VNIC network) > >GZ -> Zone = 4.7 Gbytes (These were expected since it was a host only >VNIC >network) > > >My question is are there any tweaks that I?m missing to get the full >performance potential within the guest? Why am I only seeing 147 Mbytes >between KVM and the hosting zone or the global zone? > >I?m testing with an isolated network and vnics only, so the traffic is >never leaving the physical host to go over the wire. > >I do have cpu capped at 16 cores and memory capped at 16GB of memory in >the zone. Is there some default network capping that I?m missing? Or >process throttling? > >MTU is 1500 across the board. > >I did the same test with etherstubs at first but though maybe I was >having >an MTU mismatch because I received the same 147 Mbyte result?however a >subsequent test using just the GZ -> child zone showed 5.0GBps over the >etherstub switch just like when I only used the VNIC?s over igb0. > >Also just for grins I tested two bare metal hosts on my physical network >with iperf?one being CentOS 6.5 and the other OmniOS build r151012 and >received 1.09 Gbytes over a physical switch. > >Your thoughts are appreciated! > > > > > > > > >John Barfield / Sr Principal Engineer >+1 (214) 425-0783/ john.barfield at bissinc.com >BISS, Inc. 
Office: +1 (214) 506-8354 > >4925 Greenville Ave Suite 900 >Dallas, TX 75206 >support.bissinc.com >_______________________________________________ >OmniOS-discuss mailing list >OmniOS-discuss at lists.omniti.com >http://lists.omniti.com/mailman/listinfo/omnios-discuss From danmcd at omniti.com Tue Mar 24 22:41:15 2015 From: danmcd at omniti.com (Dan McDonald) Date: Tue, 24 Mar 2015 18:41:15 -0400 Subject: [OmniOS-discuss] kernel panic "kernel heap corruption detected" when creating zero eager disks In-Reply-To: References: Message-ID: <847C1AB3-A505-4A73-B5E7-F3EA2BD41577@omniti.com> Here's a good place to start. It may need to be kicked to the illumos developer's list, but let's see what we can figure out first. 1.) What revision of OmniOS are you running? 2.) I notice a lot of STMF threads. COMSTAR (aka. STMF) is not the most stable piece of software in illumos, especially in older revisions. There's been a lot of work done on it, but that's mostly in Nexenta's distro. It hasn't been all upstreamed yet. Dan From danmcd at omniti.com Tue Mar 24 22:47:59 2015 From: danmcd at omniti.com (Dan McDonald) Date: Tue, 24 Mar 2015 18:47:59 -0400 Subject: [OmniOS-discuss] Potential KVM Virtio Performance Issues In-Reply-To: References: Message-ID: <708F6249-5038-4DEE-A2A1-34D185B65CFC@omniti.com> > On Mar 24, 2015, at 5:34 PM, John Barfield wrote: > > > They use this format: > > -device \ > virtio-net-pci,mac=02:08:20:5f:85:0d,tx=timer,x-txtimer=200000,x-txburst=12 > 8,vlan=0 \ > \ > -net \ > vnic,name=${VNIC1},vlan=0,ifname=${VNIC1} \ > I?m not sure if the txtimer values did anything performance gaining or > not?I?m pretty sure just switching to the -device configuration instead of > the legacy -net nic configuration is what did the trick. > > If anyone wants me to I?ll test and see if that was the only difference. I would be interested, especially so if we have to update our KVM page to mention this. Thanks! Dan From hasslerd at gmx.li Tue Mar 24 23:04:26 2015 From: hasslerd at gmx.li (Dominik Hassler) Date: Wed, 25 Mar 2015 00:04:26 +0100 Subject: [OmniOS-discuss] Potential KVM Virtio Performance Issues In-Reply-To: <708F6249-5038-4DEE-A2A1-34D185B65CFC@omniti.com> References: <708F6249-5038-4DEE-A2A1-34D185B65CFC@omniti.com> Message-ID: <5511ED7A.3080006@gmx.li> Dan, >> After further testing I achieved 952 MBytes on a VM-2-VM >> connection...1 >> linux Ubuntu 12.04 vm to another CentOS 6.6 VM running on two >> different SmartOS host machines (through an extreme networks switch). if I got John correctly, he was running his second test on SmartOS hosts... We did a lot of testing on OmniOS with -net vnic and -device virtio-net-pci but sadly to no avail... I think we have to hope that SmartOS kvm improvements will get upstreamed sooner or later. On 03/24/2015 11:47 PM, Dan McDonald wrote: > >> On Mar 24, 2015, at 5:34 PM, John Barfield wrote: >> >> >> They use this format: >> >> -device \ >> virtio-net-pci,mac=02:08:20:5f:85:0d,tx=timer,x-txtimer=200000,x-txburst=12 >> 8,vlan=0 \ >> \ >> -net \ >> vnic,name=${VNIC1},vlan=0,ifname=${VNIC1} \ > >> I?m not sure if the txtimer values did anything performance gaining or >> not?I?m pretty sure just switching to the -device configuration instead of >> the legacy -net nic configuration is what did the trick. >> >> If anyone wants me to I?ll test and see if that was the only difference. > > I would be interested, especially so if we have to update our KVM page to mention this. > > Thanks! 
> Dan > > _______________________________________________ > OmniOS-discuss mailing list > OmniOS-discuss at lists.omniti.com > http://lists.omniti.com/mailman/listinfo/omnios-discuss > From danmcd at omniti.com Tue Mar 24 23:12:15 2015 From: danmcd at omniti.com (Dan McDonald) Date: Tue, 24 Mar 2015 19:12:15 -0400 Subject: [OmniOS-discuss] Potential KVM Virtio Performance Issues In-Reply-To: <5511ED7A.3080006@gmx.li> References: <708F6249-5038-4DEE-A2A1-34D185B65CFC@omniti.com> <5511ED7A.3080006@gmx.li> Message-ID: <870038BE-8F1E-4411-A0BC-099D309E447D@omniti.com> > On Mar 24, 2015, at 7:04 PM, Dominik Hassler wrote: > > Dan, > >>> After further testing I achieved 952 MBytes on a VM-2-VM >>> connection...1 >>> linux Ubuntu 12.04 vm to another CentOS 6.6 VM running on two >>> different SmartOS host machines (through an extreme networks switch). > > if I got John correctly, he was running his second test on SmartOS hosts... > > We did a lot of testing on OmniOS with -net vnic and -device > virtio-net-pci but sadly to no avail... > > I think we have to hope that SmartOS kvm improvements will get > upstreamed sooner or later. Ahh yes. I was hoping to have them ready for 014, but it's a complicated process to upstream larger projects, and Joyent was in the middle of getting their new Triton release out the door. Dan From moo at wuffers.net Tue Mar 24 23:44:18 2015 From: moo at wuffers.net (wuffers) Date: Tue, 24 Mar 2015 19:44:18 -0400 Subject: [OmniOS-discuss] kernel panic "kernel heap corruption detected" when creating zero eager disks In-Reply-To: <847C1AB3-A505-4A73-B5E7-F3EA2BD41577@omniti.com> References: <847C1AB3-A505-4A73-B5E7-F3EA2BD41577@omniti.com> Message-ID: On r151012 since Nov. And yes, the LUs are exposed via COMSTAR. If it helps to have some kmem_flags set, I can do that and try to reproduce it in the same way, and have the dump accessible. On Tue, Mar 24, 2015 at 6:41 PM, Dan McDonald wrote: > Here's a good place to start. It may need to be kicked to the illumos > developer's list, but let's see what we can figure out first. > > 1.) What revision of OmniOS are you running? > > 2.) I notice a lot of STMF threads. COMSTAR (aka. STMF) is not the most > stable piece of software in illumos, especially in older revisions. > There's been a lot of work done on it, but that's mostly in Nexenta's > distro. It hasn't been all upstreamed yet. > > Dan > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From danmcd at omniti.com Tue Mar 24 23:44:54 2015 From: danmcd at omniti.com (Dan McDonald) Date: Tue, 24 Mar 2015 19:44:54 -0400 Subject: [OmniOS-discuss] kernel panic "kernel heap corruption detected" when creating zero eager disks In-Reply-To: References: <847C1AB3-A505-4A73-B5E7-F3EA2BD41577@omniti.com> Message-ID: <82CC383C-71A1-425D-89E6-B7B725071183@omniti.com> > On Mar 24, 2015, at 7:44 PM, wuffers wrote: > > On r151012 since Nov. And yes, the LUs are exposed via COMSTAR. > > If it helps to have some kmem_flags set, I can do that and try to reproduce it in the same way, and have the dump accessible. kmem_flags=0xf + the actual coredump would be amazingly useful. 
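Roughly the sequence I have in mind -- the crash directory and dump number below are only examples, use whatever savecore reports on your box:

# echo 'set kmem_flags=0xf' >> /etc/system
# reboot
(reproduce the panic, then)
# cd /var/crash/unknown
# savecore -vf vmdump.1
# mdb unix.1 vmcore.1
> ::status
> ::panicinfo
> $C
> ::kmem_verify

::kmem_verify is the part that pays for the reboot: with the debug flags set it walks the kmem caches and flags the corrupted buffer, which can then be chased further through the audit data.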
Thanks, Dan From john.barfield at bissinc.com Tue Mar 24 23:45:56 2015 From: john.barfield at bissinc.com (John Barfield) Date: Tue, 24 Mar 2015 23:45:56 +0000 Subject: [OmniOS-discuss] Potential KVM Virtio Performance Issues In-Reply-To: <870038BE-8F1E-4411-A0BC-099D309E447D@omniti.com> References: <708F6249-5038-4DEE-A2A1-34D185B65CFC@omniti.com> <5511ED7A.3080006@gmx.li>, <870038BE-8F1E-4411-A0BC-099D309E447D@omniti.com> Message-ID: Btw I did go ahead and test both virtio methods...I gave a vm the -device argument on one interface and the -net argument for another the results where.... -device = eth0 = 952mbps -net = eth1 = 199 mbps Thanks and have a great day, John Barfield > On Mar 24, 2015, at 6:12 PM, Dan McDonald wrote: > > >> On Mar 24, 2015, at 7:04 PM, Dominik Hassler wrote: >> >> Dan, >> >>>> After further testing I achieved 952 MBytes on a VM-2-VM >>>> connection...1 >>>> linux Ubuntu 12.04 vm to another CentOS 6.6 VM running on two >>>> different SmartOS host machines (through an extreme networks switch). >> >> if I got John correctly, he was running his second test on SmartOS hosts... >> >> We did a lot of testing on OmniOS with -net vnic and -device >> virtio-net-pci but sadly to no avail... >> >> I think we have to hope that SmartOS kvm improvements will get >> upstreamed sooner or later. > > Ahh yes. > > I was hoping to have them ready for 014, but it's a complicated process to upstream larger projects, and Joyent was in the middle of getting their new Triton release out the door. > > Dan > From phil.harman at gmail.com Wed Mar 25 00:40:47 2015 From: phil.harman at gmail.com (Phil Harman) Date: Wed, 25 Mar 2015 00:40:47 +0000 Subject: [OmniOS-discuss] Potential KVM Virtio Performance Issues In-Reply-To: References: <708F6249-5038-4DEE-A2A1-34D185B65CFC@omniti.com> <5511ED7A.3080006@gmx.li> <870038BE-8F1E-4411-A0BC-099D309E447D@omniti.com> Message-ID: John, Interesting work and data. Thanks for sharing. I've also been playing with Oracle Solaris 11.2 vs Linux vs FreeBSD on SmartOS vs ESX5.5 (free edition) both VM2VM and VM to remote host over a couple of Intel 10GBASE-T cards. As far as I can tell, there remains no virtio-net driver for Solaris / Illumos guests, so I've been using e1000g, which really sucks. I found virtio-net works ok under KVM, but was blown away by vmxnet3 under ESX performance (for which a Solaris / Illumos drivers do exist), being able to get close to 8gbps from the guest over the wire! To achieve this I had to use jumbo frames (something the current Solaris 11.2 e1000g appears unable to do at all any more). So I was wondering, while you are there, whether you've got (or can get) any data for KVM virtio-net VM2VM using jumbo frames? My expectation would be at least 2x for MTU 9000 vs 1500. I also wonder whether like for like comparison with ESX might encourage further improvements? As someone used to say at Sun "If Linux is faster, it's a Solaris bug!". It would be great if the community could agree to the same for ESX vs KVM :) Cheers, Phil > On 24 Mar 2015, at 23:45, John Barfield wrote: > > Btw I did go ahead and test both virtio methods...I gave a vm the -device argument on one interface and the -net argument for another the results where.... 
> > -device = eth0 = 952mbps > -net = eth1 = 199 mbps > > Thanks and have a great day, > > John Barfield > >> On Mar 24, 2015, at 6:12 PM, Dan McDonald wrote: >> >> >>> On Mar 24, 2015, at 7:04 PM, Dominik Hassler wrote: >>> >>> Dan, >>> >>>>> After further testing I achieved 952 MBytes on a VM-2-VM >>>>> connection...1 >>>>> linux Ubuntu 12.04 vm to another CentOS 6.6 VM running on two >>>>> different SmartOS host machines (through an extreme networks switch). >>> >>> if I got John correctly, he was running his second test on SmartOS hosts... >>> >>> We did a lot of testing on OmniOS with -net vnic and -device >>> virtio-net-pci but sadly to no avail... >>> >>> I think we have to hope that SmartOS kvm improvements will get >>> upstreamed sooner or later. >> >> Ahh yes. >> >> I was hoping to have them ready for 014, but it's a complicated process to upstream larger projects, and Joyent was in the middle of getting their new Triton release out the door. >> >> Dan > _______________________________________________ > OmniOS-discuss mailing list > OmniOS-discuss at lists.omniti.com > http://lists.omniti.com/mailman/listinfo/omnios-discuss From jboren at drakecooper.com Wed Mar 25 01:21:38 2015 From: jboren at drakecooper.com (Joseph Boren) Date: Tue, 24 Mar 2015 19:21:38 -0600 Subject: [OmniOS-discuss] list of know-compatible motherboards? Message-ID: Hi Andy, Thanks very much for the info, that's very helpful. Much appreciated. Best regards, Joe Boren > > Message: 1 > Date: Mon, 23 Mar 2015 23:58:48 +0000 (GMT) > From: Andy > To: omnios-discuss > Subject: Re: [OmniOS-discuss] list of know-compatible motherboards? > Message-ID: > Content-Type: TEXT/PLAIN; charset=US-ASCII > > > On Mon, 23 Mar 2015, Joseph Boren wrote: > > ; Fault Class: fault.io.pciex.device-interr > ; Affects: dev:////pci at 0,0/pci1002,5a1d at a/pci15d9,a711 at 0 faulted and > taken > ; out of service > ; FRU: "MB" > ; > (hc://:product-id=H8SLG:server-id=omnistor1:chassis-id=1234567890/motherboard=0) > > 15d9 is SuperMicro > and http://mirror.szepe.net/siv/pcidevs.txt > says that's an embedded MegaRAID. > > A. > > -- > Citrus IT Limited | +44 (0)870 199 8000 | enquiries at citrus-it.co.uk > Rock House Farm | Green Moor | Wortley | Sheffield | S35 7DQ > Registered in England and Wales | Company number 4899123 > -------------- next part -------------- An HTML attachment was scrubbed... URL: From john.barfield at bissinc.com Wed Mar 25 01:50:25 2015 From: john.barfield at bissinc.com (John Barfield) Date: Wed, 25 Mar 2015 01:50:25 +0000 Subject: [OmniOS-discuss] Potential KVM Virtio Performance Issues In-Reply-To: References: <708F6249-5038-4DEE-A2A1-34D185B65CFC@omniti.com> <5511ED7A.3080006@gmx.li> <870038BE-8F1E-4411-A0BC-099D309E447D@omniti.com> , Message-ID: <38C9C9CA-A75F-49F6-B210-C04CA81402E4@bissinc.com> Actually the numbers I sent for the SmartOS VM to VM test were on a switch with Jumbo frames (switch = 9216 mtu...SmartOS GZ MTU = 9000) (Extreme Networks Summit X440-48t release 15.2.3 patch12) Theyre also sitting in Q-in-Q tagged VLANs. Admin tagged nic sits in Vman (provider bridge) 10 the VMs were tagged in VLAN 1674. (not bad :) really) As far as everyone who is wondering how I got 952 Mbps on OmnisOS KVM while running in a zone I plan to write up a how-to that can be posted to the core site if you'd like. There are several caveats that are not documented today for running KVM in a zone. Not that I didnt reverse engineer some of Joyents work of course. 
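Until the full how-to is written up, the skeleton looks roughly like this -- the binary name, memory, disk path, MAC and vnic names are all placeholders, and the only interesting part is the -device/-net pairing instead of the legacy -net nic,model=virtio form:

# vnic over the physical link (or an etherstub), created once in the GZ
dladm create-vnic -l igb0 kvm0

# guest launch, trimmed down to the parts that matter here
qemu-kvm \
    -enable-kvm -m 4096 -smp 4 \
    -drive file=/dev/zvol/rdsk/tank/kvm/centos0,if=virtio,index=0 \
    -device virtio-net-pci,mac=02:08:20:5f:85:0d,tx=timer,x-txtimer=200000,x-txburst=128,vlan=0 \
    -net vnic,name=kvm0,vlan=0,ifname=kvm0 \
    -vnc :1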
Thanks and have a great day, John Barfield > On Mar 24, 2015, at 7:40 PM, Phil Harman wrote: > > John, > > Interesting work and data. Thanks for sharing. > > I've also been playing with Oracle Solaris 11.2 vs Linux vs FreeBSD on SmartOS vs ESX5.5 (free edition) both VM2VM and VM to remote host over a couple of Intel 10GBASE-T cards. > > As far as I can tell, there remains no virtio-net driver for Solaris / Illumos guests, so I've been using e1000g, which really sucks. > > I found virtio-net works ok under KVM, but was blown away by vmxnet3 under ESX performance (for which a Solaris / Illumos drivers do exist), being able to get close to 8gbps from the guest over the wire! > > To achieve this I had to use jumbo frames (something the current Solaris 11.2 e1000g appears unable to do at all any more). > > So I was wondering, while you are there, whether you've got (or can get) any data for KVM virtio-net VM2VM using jumbo frames? > > My expectation would be at least 2x for MTU 9000 vs 1500. > > I also wonder whether like for like comparison with ESX might encourage further improvements? > > As someone used to say at Sun "If Linux is faster, it's a Solaris bug!". It would be great if the community could agree to the same for ESX vs KVM :) > > Cheers, > Phil > > >> On 24 Mar 2015, at 23:45, John Barfield wrote: >> >> Btw I did go ahead and test both virtio methods...I gave a vm the -device argument on one interface and the -net argument for another the results where.... >> >> -device = eth0 = 952mbps >> -net = eth1 = 199 mbps >> >> Thanks and have a great day, >> >> John Barfield >> >>> On Mar 24, 2015, at 6:12 PM, Dan McDonald wrote: >>> >>> >>>> On Mar 24, 2015, at 7:04 PM, Dominik Hassler wrote: >>>> >>>> Dan, >>>> >>>>>> After further testing I achieved 952 MBytes on a VM-2-VM >>>>>> connection...1 >>>>>> linux Ubuntu 12.04 vm to another CentOS 6.6 VM running on two >>>>>> different SmartOS host machines (through an extreme networks switch). >>>> >>>> if I got John correctly, he was running his second test on SmartOS hosts... >>>> >>>> We did a lot of testing on OmniOS with -net vnic and -device >>>> virtio-net-pci but sadly to no avail... >>>> >>>> I think we have to hope that SmartOS kvm improvements will get >>>> upstreamed sooner or later. >>> >>> Ahh yes. >>> >>> I was hoping to have them ready for 014, but it's a complicated process to upstream larger projects, and Joyent was in the middle of getting their new Triton release out the door. >>> >>> Dan >> _______________________________________________ >> OmniOS-discuss mailing list >> OmniOS-discuss at lists.omniti.com >> http://lists.omniti.com/mailman/listinfo/omnios-discuss From jesus at omniti.com Wed Mar 25 11:56:31 2015 From: jesus at omniti.com (Theo Schlossnagle) Date: Wed, 25 Mar 2015 07:56:31 -0400 Subject: [OmniOS-discuss] Potential KVM Virtio Performance Issues In-Reply-To: <38C9C9CA-A75F-49F6-B210-C04CA81402E4@bissinc.com> References: <708F6249-5038-4DEE-A2A1-34D185B65CFC@omniti.com> <5511ED7A.3080006@gmx.li> <870038BE-8F1E-4411-A0BC-099D309E447D@omniti.com> <38C9C9CA-A75F-49F6-B210-C04CA81402E4@bissinc.com> Message-ID: +1 John. That documentation would be very welcome. On Tue, Mar 24, 2015 at 9:50 PM, John Barfield wrote: > Actually the numbers I sent for the SmartOS VM to VM test were on a switch > with Jumbo frames (switch = 9216 mtu...SmartOS GZ MTU = 9000) (Extreme > Networks Summit X440-48t release 15.2.3 patch12) Theyre also sitting in > Q-in-Q tagged VLANs. 
Admin tagged nic sits in Vman (provider bridge) 10 the > VMs were tagged in VLAN 1674. (not bad :) really) > > As far as everyone who is wondering how I got 952 Mbps on OmnisOS KVM > while running in a zone I plan to write up a how-to that can be posted to > the core site if you'd like. There are several caveats that are not > documented today for running KVM in a zone. Not that I didnt reverse > engineer some of Joyents work of course. > > > > Thanks and have a great day, > > John Barfield > > > On Mar 24, 2015, at 7:40 PM, Phil Harman wrote: > > > > John, > > > > Interesting work and data. Thanks for sharing. > > > > I've also been playing with Oracle Solaris 11.2 vs Linux vs FreeBSD on > SmartOS vs ESX5.5 (free edition) both VM2VM and VM to remote host over a > couple of Intel 10GBASE-T cards. > > > > As far as I can tell, there remains no virtio-net driver for Solaris / > Illumos guests, so I've been using e1000g, which really sucks. > > > > I found virtio-net works ok under KVM, but was blown away by vmxnet3 > under ESX performance (for which a Solaris / Illumos drivers do exist), > being able to get close to 8gbps from the guest over the wire! > > > > To achieve this I had to use jumbo frames (something the current Solaris > 11.2 e1000g appears unable to do at all any more). > > > > So I was wondering, while you are there, whether you've got (or can get) > any data for KVM virtio-net VM2VM using jumbo frames? > > > > My expectation would be at least 2x for MTU 9000 vs 1500. > > > > I also wonder whether like for like comparison with ESX might encourage > further improvements? > > > > As someone used to say at Sun "If Linux is faster, it's a Solaris bug!". > It would be great if the community could agree to the same for ESX vs KVM :) > > > > Cheers, > > Phil > > > > > >> On 24 Mar 2015, at 23:45, John Barfield > wrote: > >> > >> Btw I did go ahead and test both virtio methods...I gave a vm the > -device argument on one interface and the -net argument for another the > results where.... > >> > >> -device = eth0 = 952mbps > >> -net = eth1 = 199 mbps > >> > >> Thanks and have a great day, > >> > >> John Barfield > >> > >>> On Mar 24, 2015, at 6:12 PM, Dan McDonald wrote: > >>> > >>> > >>>> On Mar 24, 2015, at 7:04 PM, Dominik Hassler wrote: > >>>> > >>>> Dan, > >>>> > >>>>>> After further testing I achieved 952 MBytes on a VM-2-VM > >>>>>> connection...1 > >>>>>> linux Ubuntu 12.04 vm to another CentOS 6.6 VM running on two > >>>>>> different SmartOS host machines (through an extreme networks > switch). > >>>> > >>>> if I got John correctly, he was running his second test on SmartOS > hosts... > >>>> > >>>> We did a lot of testing on OmniOS with -net vnic and -device > >>>> virtio-net-pci but sadly to no avail... > >>>> > >>>> I think we have to hope that SmartOS kvm improvements will get > >>>> upstreamed sooner or later. > >>> > >>> Ahh yes. > >>> > >>> I was hoping to have them ready for 014, but it's a complicated > process to upstream larger projects, and Joyent was in the middle of > getting their new Triton release out the door. 
> >>> > >>> Dan > >> _______________________________________________ > >> OmniOS-discuss mailing list > >> OmniOS-discuss at lists.omniti.com > >> http://lists.omniti.com/mailman/listinfo/omnios-discuss > _______________________________________________ > OmniOS-discuss mailing list > OmniOS-discuss at lists.omniti.com > http://lists.omniti.com/mailman/listinfo/omnios-discuss > -- Theo Schlossnagle http://omniti.com/is/theo-schlossnagle -------------- next part -------------- An HTML attachment was scrubbed... URL: From nsmith at careyweb.com Wed Mar 25 14:52:26 2015 From: nsmith at careyweb.com (Nate Smith) Date: Wed, 25 Mar 2015 10:52:26 -0400 Subject: [OmniOS-discuss] kernel panic "kernel heap corruption detected" when creating zero eager disks In-Reply-To: <82CC383C-71A1-425D-89E6-B7B725071183@omniti.com> References: <847C1AB3-A505-4A73-B5E7-F3EA2BD41577@omniti.com> <82CC383C-71A1-425D-89E6-B7B725071183@omniti.com> Message-ID: <67c125eb-75c2-48ae-89fb-6a8df288d2d4@careyweb.com> Can confirm that there are problems with Comstar, especially with Fibre/STMF. Are people seeing problems with iSCSI or does that seem more stable? -----Original Message----- From: OmniOS-discuss [mailto:omnios-discuss-bounces at lists.omniti.com] On Behalf Of Dan McDonald Sent: Tuesday, March 24, 2015 7:45 PM To: wuffers Cc: omnios-discuss Subject: Re: [OmniOS-discuss] kernel panic "kernel heap corruption detected" when creating zero eager disks > On Mar 24, 2015, at 7:44 PM, wuffers wrote: > > On r151012 since Nov. And yes, the LUs are exposed via COMSTAR. > > If it helps to have some kmem_flags set, I can do that and try to reproduce it in the same way, and have the dump accessible. kmem_flags=0xf + the actual coredump would be amazingly useful. Thanks, Dan _______________________________________________ OmniOS-discuss mailing list OmniOS-discuss at lists.omniti.com http://lists.omniti.com/mailman/listinfo/omnios-discuss From moo at wuffers.net Wed Mar 25 15:51:51 2015 From: moo at wuffers.net (wuffers) Date: Wed, 25 Mar 2015 11:51:51 -0400 Subject: [OmniOS-discuss] kernel panic "kernel heap corruption detected" when creating zero eager disks In-Reply-To: <67c125eb-75c2-48ae-89fb-6a8df288d2d4@careyweb.com> References: <847C1AB3-A505-4A73-B5E7-F3EA2BD41577@omniti.com> <82CC383C-71A1-425D-89E6-B7B725071183@omniti.com> <67c125eb-75c2-48ae-89fb-6a8df288d2d4@careyweb.com> Message-ID: On Tue, Mar 24, 2015 at 7:44 PM, Dan McDonald wrote: > > > On Mar 24, 2015, at 7:44 PM, wuffers wrote: > > > > On r151012 since Nov. And yes, the LUs are exposed via COMSTAR. > > > > If it helps to have some kmem_flags set, I can do that and try to > reproduce it in the same way, and have the dump accessible. > > kmem_flags=0xf + the actual coredump would be amazingly useful. > > Thanks, > Dan Going to do this as soon as I can. Solaris docs say to put the following line in etc/system and reboot: set kmem_flags=0xf Can't I just set this dynamically like so (so I can potentially skip 2 reboots)? echo kmem_flags/W0xf | mdb -kw I can't comment myself on Fibre/STMF, as we do IB SRP here. I would say it's been "fairly" stable (can run for months before I see an issue), but have seen some weird hangups where I had to reboot the head unit (but no kernel panics). -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From danmcd at omniti.com Wed Mar 25 16:09:04 2015 From: danmcd at omniti.com (Dan McDonald) Date: Wed, 25 Mar 2015 12:09:04 -0400 Subject: [OmniOS-discuss] kernel panic "kernel heap corruption detected" when creating zero eager disks In-Reply-To: References: <847C1AB3-A505-4A73-B5E7-F3EA2BD41577@omniti.com> <82CC383C-71A1-425D-89E6-B7B725071183@omniti.com> <67c125eb-75c2-48ae-89fb-6a8df288d2d4@careyweb.com> Message-ID: > On Mar 25, 2015, at 11:51 AM, wuffers wrote: > > > Going to do this as soon as I can. > > Solaris docs say to put the following line in etc/system and reboot: > set kmem_flags=0xf That's correct. > Can't I just set this dynamically like so (so I can potentially skip 2 reboots)? > > echo kmem_flags/W0xf | mdb -kw No, because those are read at kmem cache creation time at the system's start. > I can't comment myself on Fibre/STMF, as we do IB SRP here. I would say it's been "fairly" stable (can run for months before I see an issue), but have seen some weird hangups where I had to reboot the head unit (but no kernel panics). You reproduce this bug by configuring things a specific way, right? I ask because you seem to have been running okay until you fell down this particular panic rabbit hole with a particular set of things, correct? Thanks, Dan From moo at wuffers.net Wed Mar 25 18:17:28 2015 From: moo at wuffers.net (wuffers) Date: Wed, 25 Mar 2015 14:17:28 -0400 Subject: [OmniOS-discuss] kernel panic "kernel heap corruption detected" when creating zero eager disks In-Reply-To: References: <847C1AB3-A505-4A73-B5E7-F3EA2BD41577@omniti.com> <82CC383C-71A1-425D-89E6-B7B725071183@omniti.com> <67c125eb-75c2-48ae-89fb-6a8df288d2d4@careyweb.com> Message-ID: On Wed, Mar 25, 2015 at 12:09 PM, Dan McDonald wrote: > > > Can't I just set this dynamically like so (so I can potentially skip 2 > reboots)? > > > > echo kmem_flags/W0xf | mdb -kw > > No, because those are read at kmem cache creation time at the system's > start. > > Ahh, if I RTFM'd the whole doc, I would have caught this excerpt: " These are set in conjunction with the global kmem_flags variable at cache creation time. Setting kmem_flags while the system is running has no effect on the debugging behavior, except for subsequently created caches (which is rare after boot-up)." > I can't comment myself on Fibre/STMF, as we do IB SRP here. I would say > it's been "fairly" stable (can run for months before I see an issue), but > have seen some weird hangups where I had to reboot the head unit (but no > kernel panics). > > You reproduce this bug by configuring things a specific way, right? I ask > because you seem to have been running okay until you fell down this > particular panic rabbit hole with a particular set of things, correct? > > > The panic is happening when I tried to create a 10+TB eager zero vmdk with the vSphere fat client. I'm assuming that it will happen a third time when I use the same steps. Since I can't save myself the two reboots, I will most likely try without the usual Hyper-V and VMware host loads and just try to create the vmdk and see what happens. So I would say no, I'm not changing any settings or configuration, just trying to do "normal" things like create disks, although they are much bigger than anything I've created before. I do have a 50TB LU for the Hyper-V hosts, but never tried to create any disk that big on it. If I have time I'll try it on a Hyper-V VM as well. -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From danmcd at omniti.com Wed Mar 25 18:21:30 2015 From: danmcd at omniti.com (Dan McDonald) Date: Wed, 25 Mar 2015 14:21:30 -0400 Subject: [OmniOS-discuss] kernel panic "kernel heap corruption detected" when creating zero eager disks In-Reply-To: References: <847C1AB3-A505-4A73-B5E7-F3EA2BD41577@omniti.com> <82CC383C-71A1-425D-89E6-B7B725071183@omniti.com> <67c125eb-75c2-48ae-89fb-6a8df288d2d4@careyweb.com> Message-ID: <371F5645-A4A6-4BAC-A219-21F875F837FD@omniti.com> > On Mar 25, 2015, at 2:17 PM, wuffers wrote: > >> You reproduce this bug by configuring things a specific way, right? I ask because you seem to have been running okay until you fell down this particular panic rabbit hole with a particular set of things, correct? >> > > The panic is happening when I tried to create a 10+TB eager zero vmdk with the vSphere fat client. I'm assuming that it will happen a third time when I use the same steps. Since I can't save myself the two reboots, I will most likely try without the usual Hyper-V and VMware host loads and just try to create the vmdk and see what happens. So I would say no, I'm not changing any settings or configuration, just trying to do "normal" things like create disks, although they are much bigger than anything I've created before. I do have a 50TB LU for the Hyper-V hosts, but never tried to create any disk that big on it. If I have time I'll try it on a Hyper-V VM as well. I had to ask. A with-kmem-flags coredump will be very useful. Thanks, Dan p.s. r151014 is coming soon. I'll be curious if it manifests the same (mis-)behavior. Not a lot of comstar fixes from upstream. From mir at miras.org Wed Mar 25 18:52:12 2015 From: mir at miras.org (Michael Rasmussen) Date: Wed, 25 Mar 2015 19:52:12 +0100 Subject: [OmniOS-discuss] kernel panic "kernel heap corruption detected" when creating zero eager disks In-Reply-To: <67c125eb-75c2-48ae-89fb-6a8df288d2d4@careyweb.com> References: <847C1AB3-A505-4A73-B5E7-F3EA2BD41577@omniti.com> <82CC383C-71A1-425D-89E6-B7B725071183@omniti.com> <67c125eb-75c2-48ae-89fb-6a8df288d2d4@careyweb.com> Message-ID: <20150325195212.5a8cebe4@sleipner.datanom.net> On Wed, 25 Mar 2015 10:52:26 -0400 Nate Smith wrote: > Can confirm that there are problems with Comstar, especially with Fibre/STMF. Are people seeing problems with iSCSI or does that seem more stable? > I have used a box as shared storage for proxmox ve presenting storage for KVM over iSCSI (Comstar) since 151008 (Bloody at that time due to missing support for the Hudson chipset in 151006) and now 151012. I have not had a single problem in all this time. Omnios and Comstar have been rock solid. The first approx. 2 years the connection was through a bond of 1Gb Intel Nics but the last approx. 1 year I have been using IPoIB which until now have had the same track record as Gb nics. -- Hilsen/Regards Michael Rasmussen Get my public GnuPG keys: michael rasmussen cc http://pgp.mit.edu:11371/pks/lookup?op=get&search=0xD3C9A00E mir datanom net http://pgp.mit.edu:11371/pks/lookup?op=get&search=0xE501F51C mir miras org http://pgp.mit.edu:11371/pks/lookup?op=get&search=0xE3E80917 -------------------------------------------------------------- /usr/games/fortune -es says: Scientists are people who build the Brooklyn Bridge and then buy it. -- William Buckley -------------- next part -------------- A non-text attachment was scrubbed... 
Name: not available Type: application/pgp-signature Size: 181 bytes Desc: OpenPGP digital signature URL: From omnios at citrus-it.net Wed Mar 25 23:31:32 2015 From: omnios at citrus-it.net (Andy) Date: Wed, 25 Mar 2015 23:31:32 +0000 (GMT) Subject: [OmniOS-discuss] 2nd-to-last bloody update for r151013 In-Reply-To: <694C0E20-17D7-468C-9C40-2461EE6350EB@omniti.com> References: <694C0E20-17D7-468C-9C40-2461EE6350EB@omniti.com> Message-ID: On Tue, 10 Mar 2015, Dan McDonald wrote: ; The last bloody didn't have a lot of changes. This one does. Let's go over them: ; ; * ipmitool is now 1.8.15 The configure.in patch that enabled the open interface was also removed along with this upgrade to 1.8.15. http://omnios.omniti.com/changeset.php/core/omnios-build/b9ed06fb1c62498f8c10acb7cf21e06865a3c74c#d1 Any particular reason? It stops it being able to talk to the interface delivered by the dependant driver/ipmi package. bloody# ipmitool -h 2>&1| ggrep -A5 Interfaces Interfaces: lan IPMI v1.5 LAN Interface [default] lanplus IPMI v2.0 RMCP+ LAN Interface serial-terminal Serial Interface, Terminal Mode serial-basic Serial Interface, Basic Mode r151012# ipmitool -h 2>&1| ggrep -A5 Interfaces Interfaces: open Linux OpenIPMI Interface [default] lan IPMI v1.5 LAN Interface lanplus IPMI v2.0 RMCP+ LAN Interface serial-terminal Serial Interface, Terminal Mode serial-basic Serial Interface, Basic Mode Looking forward to r151014! Thanks, Andy -- Citrus IT Limited | +44 (0)870 199 8000 | enquiries at citrus-it.co.uk Rock House Farm | Green Moor | Wortley | Sheffield | S35 7DQ Registered in England and Wales | Company number 4899123 From danmcd at omniti.com Wed Mar 25 23:44:32 2015 From: danmcd at omniti.com (Dan McDonald) Date: Wed, 25 Mar 2015 19:44:32 -0400 Subject: [OmniOS-discuss] 2nd-to-last bloody update for r151013 In-Reply-To: References: <694C0E20-17D7-468C-9C40-2461EE6350EB@omniti.com> Message-ID: <0587009A-46A7-4432-9AAC-96DD5F9D8524@omniti.com> > On Mar 25, 2015, at 7:31 PM, Andy wrote: > > > On Tue, 10 Mar 2015, Dan McDonald wrote: > > ; The last bloody didn't have a lot of changes. This one does. Let's go over them: > ; > ; * ipmitool is now 1.8.15 > > The configure.in patch that enabled the open interface was also removed > along with this upgrade to 1.8.15. > > http://omnios.omniti.com/changeset.php/core/omnios-build/b9ed06fb1c62498f8c10acb7cf21e06865a3c74c#d1 > > Any particular reason? It stops it being able to talk to the interface > delivered by the dependant driver/ipmi package. I screwed up. The 1.8.15 source has no configure.in. I forgot to replace the configure.in patch with a similar configure patch. bloody(build/ipmitool)[2]% /tmp/build_danmcd/ipmitool-1.8.15/src/ipmitool -h | & ggrep -A5 Interfaces Interfaces: open Linux OpenIPMI Interface [default] lan IPMI v1.5 LAN Interface lanplus IPMI v2.0 RMCP+ LAN Interface serial-terminal Serial Interface, Terminal Mode serial-basic Serial Interface, Basic Mode bloody(build/ipmitool)[0]% That look better? Dan From danmcd at omniti.com Wed Mar 25 23:50:00 2015 From: danmcd at omniti.com (Dan McDonald) Date: Wed, 25 Mar 2015 19:50:00 -0400 Subject: [OmniOS-discuss] 2nd-to-last bloody update for r151013 In-Reply-To: <0587009A-46A7-4432-9AAC-96DD5F9D8524@omniti.com> References: <694C0E20-17D7-468C-9C40-2461EE6350EB@omniti.com> <0587009A-46A7-4432-9AAC-96DD5F9D8524@omniti.com> Message-ID: <9FE39C53-E15D-49AF-BAB1-A42A00451404@omniti.com> > On Mar 25, 2015, at 7:44 PM, Dan McDonald wrote: > > I screwed up. The 1.8.15 source has no configure.in. 
I forgot to replace the configure.in patch with a similar configure patch. > > bloody(build/ipmitool)[2]% /tmp/build_danmcd/ipmitool-1.8.15/src/ipmitool -h | & ggrep -A5 Interfaces > Interfaces: > open Linux OpenIPMI Interface [default] > lan IPMI v1.5 LAN Interface > lanplus IPMI v2.0 RMCP+ LAN Interface > serial-terminal Serial Interface, Terminal Mode > serial-basic Serial Interface, Basic Mode > bloody(build/ipmitool)[0]% I've pushed the fix back into the master and r151014 branches. VERY good catch, and I'm very sorry for missing this. Thank you! Dan From omnios at citrus-it.net Wed Mar 25 23:51:25 2015 From: omnios at citrus-it.net (Andy) Date: Wed, 25 Mar 2015 23:51:25 +0000 (GMT) Subject: [OmniOS-discuss] 2nd-to-last bloody update for r151013 In-Reply-To: <0587009A-46A7-4432-9AAC-96DD5F9D8524@omniti.com> References: <694C0E20-17D7-468C-9C40-2461EE6350EB@omniti.com> <0587009A-46A7-4432-9AAC-96DD5F9D8524@omniti.com> Message-ID: On Wed, 25 Mar 2015, Dan McDonald wrote: ; ; > On Mar 25, 2015, at 7:31 PM, Andy wrote: ; > ; > ; > On Tue, 10 Mar 2015, Dan McDonald wrote: ; > ; > ; The last bloody didn't have a lot of changes. This one does. Let's go over them: ; > ; ; > ; * ipmitool is now 1.8.15 ; > ; > The configure.in patch that enabled the open interface was also removed ; > along with this upgrade to 1.8.15. ; > ; > http://omnios.omniti.com/changeset.php/core/omnios-build/b9ed06fb1c62498f8c10acb7cf21e06865a3c74c#d1 ; > ; > Any particular reason? It stops it being able to talk to the interface ; > delivered by the dependant driver/ipmi package. ; ; I screwed up. The 1.8.15 source has no configure.in. I forgot to replace the configure.in patch with a similar configure patch. ; ; bloody(build/ipmitool)[2]% /tmp/build_danmcd/ipmitool-1.8.15/src/ipmitool -h | & ggrep -A5 Interfaces ; Interfaces: ; open Linux OpenIPMI Interface [default] ; lan IPMI v1.5 LAN Interface ; lanplus IPMI v2.0 RMCP+ LAN Interface ; serial-terminal Serial Interface, Terminal Mode ; serial-basic Serial Interface, Basic Mode ; bloody(build/ipmitool)[0]% ; ; That look better? Much! I build my own ipmitool package to /opt anyway but this would have caught some people. Thanks, Andy -- Citrus IT Limited | +44 (0)870 199 8000 | enquiries at citrus-it.co.uk Rock House Farm | Green Moor | Wortley | Sheffield | S35 7DQ Registered in England and Wales | Company number 4899123 From moo at wuffers.net Thu Mar 26 04:58:32 2015 From: moo at wuffers.net (wuffers) Date: Thu, 26 Mar 2015 00:58:32 -0400 Subject: [OmniOS-discuss] kernel panic "kernel heap corruption detected" when creating zero eager disks In-Reply-To: <20150325195212.5a8cebe4@sleipner.datanom.net> References: <847C1AB3-A505-4A73-B5E7-F3EA2BD41577@omniti.com> <82CC383C-71A1-425D-89E6-B7B725071183@omniti.com> <67c125eb-75c2-48ae-89fb-6a8df288d2d4@careyweb.com> <20150325195212.5a8cebe4@sleipner.datanom.net> Message-ID: On Wed, Mar 25, 2015 at 2:21 PM, Dan McDonald wrote: > > > A with-kmem-flags coredump will be very useful. > > > Here we go. I reproduced this with no load on the SAN, just a DC and vcenter server up, then created my 10TB disk in the vSphere fat client. As expected, I got the kernel panic again. 
TIME UUID SUNW-MSG-ID Mar 25 2015 21:13:40.122158000 daa21c2c-3a11-4d27-dc1b-a424cb890493 SUNOS-8000-KL TIME CLASS ENA Mar 25 21:13:40.0785 ireport.os.sunos.panic.dump_available 0x0000000000000000 Mar 25 21:12:23.5223 ireport.os.sunos.panic.dump_pending_on_device 0x0000000000000000 nvlist version: 0 version = 0x0 class = list.suspect uuid = daa21c2c-3a11-4d27-dc1b-a424cb890493 code = SUNOS-8000-KL diag-time = 1427332420 88270 de = fmd:///module/software-diagnosis fault-list-sz = 0x1 fault-list = (array of embedded nvlists) (start fault-list[0]) nvlist version: 0 version = 0x0 class = defect.sunos.kernel.panic certainty = 0x64 asru = sw:///:path=/var/crash/unknown/.daa21c2c-3a11-4d27-dc1b-a424cb890493 resource = sw:///:path=/var/crash/unknown/.daa21c2c-3a11-4d27-dc1b-a424cb890493 savecore-succcess = 1 dump-dir = /var/crash/unknown dump-files = vmdump.1 os-instance-uuid = daa21c2c-3a11-4d27-dc1b-a424cb890493 panicstr = kernel heap corruption detected panicstack = fffffffffba49114 () | genunix:kmem_free+1c8 () | stmf_sbd:sbd_handle_write_same_xfer_completion+14d () | stmf_sbd:sbd_dbuf_xfer_done+b1 () | stmf:stmf_worker_task+376 () | unix:thread_start+8 () | crashtime = 1427330450 panic-time = Wed Mar 25 20:40:50 2015 EDT (end fault-list[0]) fault-status = 0x1 severity = Major __ttl = 0x1 __tod = 0x55135d44 0x747fbb0 Crash file: https://drive.google.com/open?id=0B7mCJnZUzJPKcTNRWGIwejVrV2s&authuser=0 Dump: https://docs.google.com/uc?id=0B7mCJnZUzJPKZlVjUEQydm1vaE0&export=download md5sum: 5ecbc150ed6683b90dbf39d4bf42209e vmdump.6.gz (2433358152 bytes) aa290f48c4ae9770c47fa62583e4cb70 vmdump.6 (5295046656 bytes) -------------- next part -------------- An HTML attachment was scrubbed... URL: From danmcd at omniti.com Thu Mar 26 05:06:47 2015 From: danmcd at omniti.com (Dan McDonald) Date: Thu, 26 Mar 2015 01:06:47 -0400 Subject: [OmniOS-discuss] kernel panic "kernel heap corruption detected" when creating zero eager disks In-Reply-To: References: <847C1AB3-A505-4A73-B5E7-F3EA2BD41577@omniti.com> <82CC383C-71A1-425D-89E6-B7B725071183@omniti.com> <67c125eb-75c2-48ae-89fb-6a8df288d2d4@careyweb.com> <20150325195212.5a8cebe4@sleipner.datanom.net> Message-ID: > On Mar 26, 2015, at 12:58 AM, wuffers wrote: > > | genunix:kmem_free+1c8 () | stmf_sbd:sbd_handle_write_same_xfer_completion+14d () | stmf_sbd:sbd_dbuf_xfer_done+b1 () | stmf:stmf_worker_task+376 () | unix:thread_start+8 () | Hmmph. The WRITE_SAME code, huh? I know Nexenta's done a LOT of improvements on this in illumos-nexenta. It might be time to upstream some of what they've done. I know it's a moving target (COMSTAR is not a well-written subsystem), so it may take some unravelling. I'm downloading the dump, in case the actual panic is more straightforward than most code in there. I worked on this a long time ago back when I was at Nexenta. It was provided to me by a contractor, and I had to bang it into shape for upstreaming. Clearly I missed something. Dan From andreas at luka-online.de Thu Mar 26 05:25:06 2015 From: andreas at luka-online.de (Andreas Luka) Date: Thu, 26 Mar 2015 13:25:06 +0800 Subject: [OmniOS-discuss] A warning for upgraders with large numbers of BEs In-Reply-To: References: <8B8D8D0F-35B5-42F8-B1FE-CD8EA002EE06@omniti.com> <20150323205308.GA21991@linux.gyakg.u-szeged.hu> Message-ID: If the LX brand is in I would volunteer with testing different Linux-Disto's. Regards Andreas On Tue, 24 Mar 2015 05:40:50 +0800, Dan McDonald wrote: > The LX brand hadn't been upstreamed yet. Once it has, we will include > it. 
We will likely assist in its upstreaming, but not at the moment. > > Dan > > Sent from my iPhone (typos, autocorrect, and all) > >> On Mar 23, 2015, at 4:53 PM, P?SZTOR Gy?rgy >> wrote: >> >> Hi, >> >> "Dan McDonald" wrote at 2015-03-23 16:14: >>> Soon r151014 will be hitting the streets. WHEN THAT DOES, I have to >>> warn people, especially those jumping from r151006 to r151014 about a >>> known issue in grub. >>> >>> The illumos grub has serious memory management issues. It cannot cope >>> with too many boot environment (BE) entries. >> >> Sorry for semi-offtopicing the thread, but: Will the lx brand be >> restored >> in the upcoming release? >> >> Is there a feature map / release plan / anything available? >> I tried to find information regarding this topic without success. >> >> I checked this url: >> http://omnios.omniti.com/roadmap.php >> But nothing relevant information was there. It seems outdated / >> unmaintained. >> >> I've just recently find this distro. I used openindiana since Oracle... >> -- Did what they did to opensolaris -- >> >> So, I'm new here, sorry for lame questions. >> >> Kind regards, >> Gy?rgy P?sztor > _______________________________________________ > OmniOS-discuss mailing list > OmniOS-discuss at lists.omniti.com > http://lists.omniti.com/mailman/listinfo/omnios-discuss -- Using Opera's mail client: http://www.opera.com/mail/ From info at houseofancients.nl Thu Mar 26 07:09:43 2015 From: info at houseofancients.nl (Floris van Essen ..:: House of Ancients Amstafs ::..) Date: Thu, 26 Mar 2015 07:09:43 +0000 Subject: [OmniOS-discuss] heads up Message-ID: <356582D1FC91784992ABB4265A16ED4891027D35@vEX01.mindstorm-internet.local> Hi Dann, Running latest Bloody , and after running a weekly check of available updates : pkg update -nv Creating Plan (Running solver): | pkg update: No solution was found to satisfy constraints Plan Creation: Package solver has not found a solution to update to latest available versions. This may indicate an overly constrained set of packages are installed. latest incorporations: pkg://omnios/consolidation/osnet/osnet-incorporation at 0.5.11,5.11-0.151014:20150324T181107Z The following indicates why the system cannot update to the latest version: No suitable version of required package pkg://omnios/consolidation/osnet/osnet-incorporation at 0.5.11,5.11-0.151014:20150324T181107Z found: Reject: pkg://omnios/consolidation/osnet/osnet-incorporation at 0.5.11,5.11-0.151014:20150324T181107Z Reason: A version for 'incorporate' dependency on pkg:/SUNWcs at 0.5.11,5.11-0.151014 cannot be found Can I just install osnet-incorporation at 0.5.11,5.11-0.151014:20150324T181107Z ? Best regards, Floris ...:: House of Ancients ::... American Staffordshire Terriers +31-628-161-350 +31-614-198-389 Het Perk 48 4903 RB Oosterhout Netherlands www.houseofancients.nl From alka at hfg-gmuend.de Thu Mar 26 13:04:46 2015 From: alka at hfg-gmuend.de (Guenther Alka) Date: Thu, 26 Mar 2015 14:04:46 +0100 Subject: [OmniOS-discuss] Open-VM-Tools in 151014 In-Reply-To: References: <8B8D8D0F-35B5-42F8-B1FE-CD8EA002EE06@omniti.com> <20150323205308.GA21991@linux.gyakg.u-szeged.hu> Message-ID: <551403EE.6000803@hfg-gmuend.de> I have updated 151012 to 151014 and installed the open-vm-tools (on ESXi 6.0.) for some basic tests. Installation via pkg install open-vm-tools was ok and ESXi 6.0 shows tools running (3rd party tools) but vmxnet3 is missing. Is vmxnet3 not a part of the open-vm-tools? 
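A quick way to check whether the driver is actually delivered by that package, rather than guessing from what ESXi reports (a sketch, run inside the guest; package name as published by OmniOS):

    pkg contents -r open-vm-tools | grep -i vmxnet   # does the package ship a vmxnet3 driver at all?
    modinfo | grep -i vmxnet                         # is a vmxnet module loaded right now?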
Gea From danmcd at omniti.com Thu Mar 26 15:00:07 2015 From: danmcd at omniti.com (Dan McDonald) Date: Thu, 26 Mar 2015 11:00:07 -0400 Subject: [OmniOS-discuss] heads up In-Reply-To: <356582D1FC91784992ABB4265A16ED4891027D35@vEX01.mindstorm-internet.local> References: <356582D1FC91784992ABB4265A16ED4891027D35@vEX01.mindstorm-internet.local> Message-ID: <2996D603-4661-4B45-9517-10C08BD0E3A3@omniti.com> 1.) It's "Dan" one n. :) 2.) Are you seeing 014 packages in the "bloody" repo? You shouldn't be. But shoot, there it is. I'll clean up the bloody repo today. Sorry, Dan From danmcd at omniti.com Thu Mar 26 15:03:27 2015 From: danmcd at omniti.com (Dan McDonald) Date: Thu, 26 Mar 2015 11:03:27 -0400 Subject: [OmniOS-discuss] Did someone build omnios-build for 014 and push it to the *bloody* repo server? Message-ID: Subject says it all. It's possible *I* did, but if someone else has built omnios-build on or for r151014 without setting PKGSRVR, please let me know ASAP. Thanks, Dan From sjorge+ml at blackdot.be Thu Mar 26 15:04:50 2015 From: sjorge+ml at blackdot.be (Jorge Schrauwen) Date: Thu, 26 Mar 2015 16:04:50 +0100 Subject: [OmniOS-discuss] =?utf-8?q?Did_someone_build_omnios-build_for_014?= =?utf-8?q?_and_push_it_to_the_*bloody*_repo_server=3F?= In-Reply-To: References: Message-ID: <90d462ba2b581b9943cda1d4988cc191@blackdot.be> Do you have the right mailing list? Seems odd to send it here as only omniti people should have access. On 2015-03-26 16:03, Dan McDonald wrote: > Subject says it all. It's possible *I* did, but if someone else has > built omnios-build on or for r151014 without setting PKGSRVR, please > let me know ASAP. > > Thanks, > Dan > > _______________________________________________ > OmniOS-discuss mailing list > OmniOS-discuss at lists.omniti.com > http://lists.omniti.com/mailman/listinfo/omnios-discuss From danmcd at omniti.com Thu Mar 26 15:06:06 2015 From: danmcd at omniti.com (Dan McDonald) Date: Thu, 26 Mar 2015 11:06:06 -0400 Subject: [OmniOS-discuss] Open-VM-Tools in 151014 In-Reply-To: <551403EE.6000803@hfg-gmuend.de> References: <8B8D8D0F-35B5-42F8-B1FE-CD8EA002EE06@omniti.com> <20150323205308.GA21991@linux.gyakg.u-szeged.hu> <551403EE.6000803@hfg-gmuend.de> Message-ID: There is no 014 on the bloody repo server, and 014 is NOT OFFICIALLY OUT YET. An automatic build of some sort pushed out r151014 packages to http://pkg.omniti.com/omnios/bloody/ and it shouldn't have. I'm cleaning up the repo now. If you updated via bloody, revert to an 013 BE now. Dan From danmcd at omniti.com Thu Mar 26 15:13:11 2015 From: danmcd at omniti.com (Dan McDonald) Date: Thu, 26 Mar 2015 11:13:11 -0400 Subject: [OmniOS-discuss] Bloody repo is contaminated at the moment... In-Reply-To: <90d462ba2b581b9943cda1d4988cc191@blackdot.be> References: <90d462ba2b581b9943cda1d4988cc191@blackdot.be> Message-ID: > On Mar 26, 2015, at 11:04 AM, Jorge Schrauwen wrote: > > Do you have the right mailing list? Seems odd to send it here as only omniti people should have access. Wrong mailing list. BUT the "bloody" repo server apparently received packages for 014 when it shouldn't have. And some people on the list have updated to them or attempted to update to them. I'm cleaning out the 014 from bloody as I type this. When r151014 is ready, you'll know!!! 
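For anyone who did pull the stray packages in the meantime, rolling back is just a matter of activating the previous boot environment. A rough sketch (the BE name here is a placeholder; use whatever beadm list shows on your box):

    beadm list                      # find the BE from before the update
    beadm activate omnios-r151012   # placeholder name: activate the pre-update BE
    init 6                          # reboot into it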
Dan From danmcd at omniti.com Thu Mar 26 15:37:26 2015 From: danmcd at omniti.com (Dan McDonald) Date: Thu, 26 Mar 2015 11:37:26 -0400 Subject: [OmniOS-discuss] kernel panic "kernel heap corruption detected" when creating zero eager disks In-Reply-To: References: <847C1AB3-A505-4A73-B5E7-F3EA2BD41577@omniti.com> <82CC383C-71A1-425D-89E6-B7B725071183@omniti.com> <67c125eb-75c2-48ae-89fb-6a8df288d2d4@careyweb.com> <20150325195212.5a8cebe4@sleipner.datanom.net> Message-ID: <07B1E231-B571-46BC-913D-445DBCA54923@omniti.com> I mentioned earlier: > I know Nexenta's done a LOT of improvements on this in illumos-nexenta. It might be time to upstream some of what they've done. I know it's a moving target (COMSTAR is not a well-written subsystem), so it may take some unravelling. I was looking at Nexenta's changes. They HAVE done a lot of work in these areas, and at some point someone needs to upstream them. Nexenta isn't under an obligation to upstream, just to publish, which they have. I found one particular bug that MAY have manifested as your problem. Because 014's coming up, I can't get to it at the moment. If you've built kernel modules before, I can tell you where the fix should go and approximately what the fix is. You'd have to test it, however. Sorry I can't be of more immediate assistance, Dan From moo at wuffers.net Thu Mar 26 15:47:47 2015 From: moo at wuffers.net (wuffers) Date: Thu, 26 Mar 2015 11:47:47 -0400 Subject: [OmniOS-discuss] kernel panic "kernel heap corruption detected" when creating zero eager disks In-Reply-To: <07B1E231-B571-46BC-913D-445DBCA54923@omniti.com> References: <847C1AB3-A505-4A73-B5E7-F3EA2BD41577@omniti.com> <82CC383C-71A1-425D-89E6-B7B725071183@omniti.com> <67c125eb-75c2-48ae-89fb-6a8df288d2d4@careyweb.com> <20150325195212.5a8cebe4@sleipner.datanom.net> <07B1E231-B571-46BC-913D-445DBCA54923@omniti.com> Message-ID: On Thu, Mar 26, 2015 at 11:37 AM, Dan McDonald wrote: > I mentioned earlier: > > > I know Nexenta's done a LOT of improvements on this in illumos-nexenta. > It might be time to upstream some of what they've done. I know it's a > moving target (COMSTAR is not a well-written subsystem), so it may take > some unravelling. > > I was looking at Nexenta's changes. They HAVE done a lot of work in these > areas, and at some point someone needs to upstream them. Nexenta isn't > under an obligation to upstream, just to publish, which they have. > > I found one particular bug that MAY have manifested as your problem. > Because 014's coming up, I can't get to it at the moment. If you've built > kernel modules before, I can tell you where the fix should go and > approximately what the fix is. You'd have to test it, however. > > Sorry I can't be of more immediate assistance, > Dan > > > Hi Dan (just saw your latest reply as I was writing this), Thanks for all the time you've put into this. It certainly sounds like some of the Nexenta COMSTAR work might be useful. Is R151014 released yet? It looks like all the documentation is there but mentions Apr 3/2015. Is there any reason to believe that it might be fixed if there are no (or low amounts) of changes in COMSTAR for this release? (Sounds like it isn't, now that I've read your latest) It looks like I'll have to make do with lazy zeroed or thin provisioned disks of 10TB+ for my Veeam tests, if it doesn't cause another kernel panic. I'm hesitant to create these now during business hours (and I shouldn't be.. these are normal VM provisioning tasks on available storage!). 
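One thing that might make the next round of tests quicker is driving the provisioning from the ESXi shell with vmkfstools instead of either client, so the GUI is out of the picture. A rough sketch (datastore and folder names are made up, and the target folder must already exist; sizes given in gigabytes to stay clear of older size-suffix parsing):

    vmkfstools -c 10240g -d thin             /vmfs/volumes/datastore1/test/thin.vmdk
    vmkfstools -c 10240g -d zeroedthick      /vmfs/volumes/datastore1/test/lazy.vmdk
    vmkfstools -c 10240g -d eagerzeroedthick /vmfs/volumes/datastore1/test/eager.vmdk

The eagerzeroedthick case is the one that should issue WRITE SAME against the target when the block-zeroing primitive is enabled.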
In your estimation, would eager zero vs lazy zero vs thin provisioned vmdks make any difference with that WRITE_SAME code? The majority of my VMs use eager zeroed disks, but again, never to this size. If there is anything you need me to test (in R151014? or beyond?), it's easy enough for me to reproduce (I timed myself last night, it took me about 2 hours to gracefully shut/save all the VMs, cause the crash dump, and get the infrastructure back up). I should probably try it on Hyper-V as well when I get time, but I believe most of those are Dynamic (thin) instead of Fixed (eager zero) disks, and I don't believe Hyper-V has an equivalent to lazy zeroed. The Hyper-V environment runs our test VMs after all, and aren't as performance sensitive. If you can tell me where the fix should go, I can probably try it out, even though I haven't built any kernel modules before (though I'm sure there are enough resources for me to draw on). I'll start by making myself a build server on a VM. Is this http://wiki.illumos.org/display/illumos/How+To+Build+illumos still current? -------------- next part -------------- An HTML attachment was scrubbed... URL: From doug at will.to Thu Mar 26 16:24:49 2015 From: doug at will.to (Doug Hughes) Date: Thu, 26 Mar 2015 12:24:49 -0400 Subject: [OmniOS-discuss] best or preferred 10g card for OmniOS Message-ID: <933dadf1-3c92-4fd7-b0b2-c2d899ff42c5.maildroid@localhost> any recommendations? We're having some pretty big problems with the Solarflare card and driver dropping network under high load. We eliminated LACP as a culprit, and the switch. Intel? Chelsio? other? - Doug -------------- next part -------------- An HTML attachment was scrubbed... URL: From chip at innovates.com Thu Mar 26 16:36:12 2015 From: chip at innovates.com (Schweiss, Chip) Date: Thu, 26 Mar 2015 11:36:12 -0500 Subject: [OmniOS-discuss] best or preferred 10g card for OmniOS In-Reply-To: <933dadf1-3c92-4fd7-b0b2-c2d899ff42c5.maildroid@localhost> References: <933dadf1-3c92-4fd7-b0b2-c2d899ff42c5.maildroid@localhost> Message-ID: The Intel X520's and the Supermicro equivalents are rock solid. The X540 probably is too, I just haven't used it. I prefer the Supermicro branded Intel cards because the firmware is not as picky about the twin-ax cables used. -Chip On Thu, Mar 26, 2015 at 11:24 AM, Doug Hughes wrote: > any recommendations? We're having some pretty big problems with the > Solarflare card and driver dropping network under high load. We eliminated > LACP as a culprit, and the switch. > > Intel? Chelsio? other? > > - Doug > _______________________________________________ > OmniOS-discuss mailing list > OmniOS-discuss at lists.omniti.com > http://lists.omniti.com/mailman/listinfo/omnios-discuss > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From jstockett at molalla.com Thu Mar 26 16:45:11 2015 From: jstockett at molalla.com (Jeff Stockett) Date: Thu, 26 Mar 2015 16:45:11 +0000 Subject: [OmniOS-discuss] best or preferred 10g card for OmniOS In-Reply-To: <933dadf1-3c92-4fd7-b0b2-c2d899ff42c5.maildroid@localhost> References: <933dadf1-3c92-4fd7-b0b2-c2d899ff42c5.maildroid@localhost> Message-ID: <136C13E89D22BB468B2A7025993639732F4E3778@EXMCCMB.molalla.com> I would concur with what Chip said. We?ve had good luck with the Intel X520s setup with LACP to a Nexus 5000 ? and also have a few X540s. The X520s are a bit picky about SFPs but Appoved makes one that works and is reasonably affordable. 
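For the record, the OmniOS side of an LACP setup like that is only a couple of commands. A rough sketch with placeholder link names (check dladm show-phys for the real ones; the links must not have IP plumbed or VNICs on them when the aggr is created, and the MTU has to match the switch end to end if jumbo frames are in play):

    dladm create-aggr -L active -P L3,L4 -l ixgbe0 -l ixgbe1 aggr0
    dladm show-aggr -L aggr0               # confirm LACP has negotiated on both ports
    dladm set-linkprop -p mtu=9000 aggr0   # only if the switch ports are set to match
    ipadm create-if aggr0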
From: OmniOS-discuss [mailto:omnios-discuss-bounces at lists.omniti.com] On Behalf Of Doug Hughes Sent: Thursday, March 26, 2015 9:25 AM To: omnios-discuss Subject: [OmniOS-discuss] best or preferred 10g card for OmniOS any recommendations? We're having some pretty big problems with the Solarflare card and driver dropping network under high load. We eliminated LACP as a culprit, and the switch. Intel? Chelsio? other? - Doug -------------- next part -------------- An HTML attachment was scrubbed... URL: From doug at will.to Thu Mar 26 16:51:19 2015 From: doug at will.to (Doug Hughes) Date: Thu, 26 Mar 2015 12:51:19 -0400 Subject: [OmniOS-discuss] best or preferred 10g card for OmniOS In-Reply-To: References: <933dadf1-3c92-4fd7-b0b2-c2d899ff42c5.maildroid@localhost> Message-ID: Thanks guys! (also, if anybody has some advice or contrary experience with SolarFlare (5162), I'd love to hear it. Right now they are pretty much unusable under load, though iperf tends to work fine. On Thu, Mar 26, 2015 at 12:36 PM, Schweiss, Chip wrote: > The Intel X520's and the Supermicro equivalents are rock solid. The > X540 probably is too, I just haven't used it. I prefer the Supermicro > branded Intel cards because the firmware is not as picky about the twin-ax > cables used. > > -Chip > > On Thu, Mar 26, 2015 at 11:24 AM, Doug Hughes wrote: > >> any recommendations? We're having some pretty big problems with the >> Solarflare card and driver dropping network under high load. We eliminated >> LACP as a culprit, and the switch. >> >> Intel? Chelsio? other? >> >> - Doug >> _______________________________________________ >> OmniOS-discuss mailing list >> OmniOS-discuss at lists.omniti.com >> http://lists.omniti.com/mailman/listinfo/omnios-discuss >> >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From doug at will.to Thu Mar 26 16:55:55 2015 From: doug at will.to (Doug Hughes) Date: Thu, 26 Mar 2015 12:55:55 -0400 Subject: [OmniOS-discuss] best or preferred 10g card for OmniOS In-Reply-To: References: <933dadf1-3c92-4fd7-b0b2-c2d899ff42c5.maildroid@localhost> Message-ID: Regarding X520 and SFP+.. We tend to use Amphenol. Do those work ok? On Thu, Mar 26, 2015 at 12:51 PM, Doug Hughes wrote: > Thanks guys! (also, if anybody has some advice or contrary experience with > SolarFlare (5162), I'd love to hear it. Right now they are pretty much > unusable under load, though iperf tends to work fine. > > > > On Thu, Mar 26, 2015 at 12:36 PM, Schweiss, Chip > wrote: > >> The Intel X520's and the Supermicro equivalents are rock solid. The >> X540 probably is too, I just haven't used it. I prefer the Supermicro >> branded Intel cards because the firmware is not as picky about the twin-ax >> cables used. >> >> -Chip >> >> On Thu, Mar 26, 2015 at 11:24 AM, Doug Hughes wrote: >> >>> any recommendations? We're having some pretty big problems with the >>> Solarflare card and driver dropping network under high load. We eliminated >>> LACP as a culprit, and the switch. >>> >>> Intel? Chelsio? other? >>> >>> - Doug >>> _______________________________________________ >>> OmniOS-discuss mailing list >>> OmniOS-discuss at lists.omniti.com >>> http://lists.omniti.com/mailman/listinfo/omnios-discuss >>> >>> >> > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From danmcd at omniti.com Thu Mar 26 17:05:09 2015 From: danmcd at omniti.com (Dan McDonald) Date: Thu, 26 Mar 2015 13:05:09 -0400 Subject: [OmniOS-discuss] kernel panic "kernel heap corruption detected" when creating zero eager disks In-Reply-To: References: <847C1AB3-A505-4A73-B5E7-F3EA2BD41577@omniti.com> <82CC383C-71A1-425D-89E6-B7B725071183@omniti.com> <67c125eb-75c2-48ae-89fb-6a8df288d2d4@careyweb.com> <20150325195212.5a8cebe4@sleipner.datanom.net> <07B1E231-B571-46BC-913D-445DBCA54923@omniti.com> Message-ID: <7CBB4FDC-03CF-422F-889A-B867FFA68FB7@omniti.com> > On Mar 26, 2015, at 11:47 AM, wuffers wrote: > It looks like I'll have to make do with lazy zeroed or thin provisioned disks of 10TB+ for my Veeam tests, if it doesn't cause another kernel panic. I'm hesitant to create these now during business hours (and I shouldn't be.. these are normal VM provisioning tasks on available storage!). In your estimation, would eager zero vs lazy zero vs thin provisioned vmdks make any difference with that WRITE_SAME code? The majority of my VMs use eager zeroed disks, but again, never to this size. WRITE_SAME is one of the four VAAI primitives. Nexenta wrote this code for NS, and upstreamed two of them: WRITE_SAME is "hardware assisted erase". UNMAP is "hardware assisted freeing". Those are in upstream illumos. ATS is atomic-test-and-set or "hardware assisted fine-grained locking". XCOPY is "hardware assisted copying". These are in NexentaStor, and after being held back, were open-sourced, but not yet upstreamed. > If there is anything you need me to test (in R151014? or beyond?), it's easy enough for me to reproduce (I timed myself last night, it took me about 2 hours to gracefully shut/save all the VMs, cause the crash dump, and get the infrastructure back up). I should probably try it on Hyper-V as well when I get time, but I believe most of those are Dynamic (thin) instead of Fixed (eager zero) disks, and I don't believe Hyper-V has an equivalent to lazy zeroed. The Hyper-V environment runs our test VMs after all, and aren't as performance sensitive. I may be able to generate a fix, but I have no idea if it's sufficient or not. Like I said, COMSTAR is not well-written or maintainable code, but Nexenta has put a lot of love into it. > If you can tell me where the fix should go, I can probably try it out, even though I haven't built any kernel modules before (though I'm sure there are enough resources for me to draw on). I'll start by making myself a build server on a VM. Is this http://wiki.illumos.org/display/illumos/How+To+Build+illumos still current? The small fix I might be able to generate will involve a replacement "stmf_sbd" module. More on that after I get cycles to generate something. Dan From gmason at msu.edu Thu Mar 26 16:30:21 2015 From: gmason at msu.edu (Greg Mason) Date: Thu, 26 Mar 2015 12:30:21 -0400 Subject: [OmniOS-discuss] best or preferred 10g card for OmniOS In-Reply-To: <933dadf1-3c92-4fd7-b0b2-c2d899ff42c5.maildroid@localhost> References: <933dadf1-3c92-4fd7-b0b2-c2d899ff42c5.maildroid@localhost> Message-ID: <9C3872C8-BF0E-4A14-A1BC-FEC5241DF583@msu.edu> On Mar 26, 2015, at 12:24 PM, Doug Hughes wrote: > > any recommendations? We're having some pretty big problems with the Solarflare card and driver dropping network under high load. We eliminated LACP as a culprit, and the switch. > > Intel? Chelsio? other? I?ve had a pretty good experience with the Intel X520 cards, not really much I can complain about. 
-Greg From danmcd at omniti.com Thu Mar 26 17:25:17 2015 From: danmcd at omniti.com (Dan McDonald) Date: Thu, 26 Mar 2015 13:25:17 -0400 Subject: [OmniOS-discuss] best or preferred 10g card for OmniOS In-Reply-To: <9C3872C8-BF0E-4A14-A1BC-FEC5241DF583@msu.edu> References: <933dadf1-3c92-4fd7-b0b2-c2d899ff42c5.maildroid@localhost> <9C3872C8-BF0E-4A14-A1BC-FEC5241DF583@msu.edu> Message-ID: <4967907A-EA42-476A-93CA-B8D65D778972@omniti.com> Generally speaking the Intel ones are preferable, because Intel does the best job of keeping those drivers up to date for all open-source platforms. Just pushed into illumos, and coming soon to r151014 is OPEN-SOURCE support for Broadcom NetXtreme II (now owned by QLogic) 10GigE (the "bnxe" driver). It's not nearly as good as Intel's code, or likely Intel's HW, but now that it is open-source, it becomes a viable alternative, because people can now fix the driver if there is a problem. I'd still use Intel 10Gig where possible, but if you're stuck on HW with formerly-Broadcom 10GigE, your luck will be improving somewhat. Dan From danmcd at omniti.com Thu Mar 26 18:56:41 2015 From: danmcd at omniti.com (Dan McDonald) Date: Thu, 26 Mar 2015 14:56:41 -0400 Subject: [OmniOS-discuss] Bloody repo server Message-ID: <3A0B4680-F786-4B21-AF31-A251A11FE2FA@omniti.com> Is offline for now. I'm trying to get rid of the 014 crud I accidentally pushed into it on Tuesday. Those packages were just illumos-omnios ones, which are just still the master branch of illumos-omnios. Dan From danmcd at omniti.com Thu Mar 26 19:30:44 2015 From: danmcd at omniti.com (Dan McDonald) Date: Thu, 26 Mar 2015 15:30:44 -0400 Subject: [OmniOS-discuss] Bloody repo server In-Reply-To: <3A0B4680-F786-4B21-AF31-A251A11FE2FA@omniti.com> References: <3A0B4680-F786-4B21-AF31-A251A11FE2FA@omniti.com> Message-ID: <13204CB0-13D4-4E6C-828C-1CB0D98797DB@omniti.com> Is now back online, and cleaned of r151014 packages that shouldn't have been there. Thank you! Dan From john.barfield at bissinc.com Thu Mar 26 20:50:32 2015 From: john.barfield at bissinc.com (John Barfield) Date: Thu, 26 Mar 2015 20:50:32 +0000 Subject: [OmniOS-discuss] Potential KVM Virtio Performance Issues In-Reply-To: References: <708F6249-5038-4DEE-A2A1-34D185B65CFC@omniti.com> <5511ED7A.3080006@gmx.li> <870038BE-8F1E-4411-A0BC-099D309E447D@omniti.com> <38C9C9CA-A75F-49F6-B210-C04CA81402E4@bissinc.com> Message-ID: <4B77878C-A980-44C5-B5D8-1A34D9D3A52A@bissinc.com> So I was still having issues with virtio performance. I?ve finally determined that its the child zone that is capping the throughput at 85mbps. If I halt the zone and launch the same VM from the GZ I get 955mbps. Another thing?the virtio driver in Centos6.6 does not work well with OmniOS kvm. I can boot Centos from either the GZ or a CZ and I?m actually getting results in the Kb now instead of mbps with iperf. May have something to do with the tcp window being 19.5 kb on CentOS vs 85kb on Ubuntu. Assuming this is a driver problem. The only OS I get good speeds with are Ubuntu server 14.04 running the Global Zone. (Have only tested two though :)) So my recipe for decent virtio performance on OmniOS: Ubuntu Linux Server 14.04 running in Global Zone. Does anyone have any idea why the child zone is capping my throughput? Am I missing a zone cfg parameter to allow the child zone to have full 1GB bandwidth? 
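Nothing in a stock zone configuration should cap a VNIC on its own, as far as I know, so it may be worth checking whether a bandwidth limit snuck in at the link or flow level instead. A quick sketch from the global zone (vnic and zone names are placeholders):

    dladm show-linkprop -p maxbw vnic0    # "--" means no link-level cap is set
    flowadm show-flow                     # any flows with bandwidth properties attached?
    dladm show-link -s -i 5 vnic0         # watch the byte counters while iperf runs
    zonecfg -z kvmzone1 info net          # confirm which VNIC the zone is actually using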
From: Theo Schlossnagle Date: Wednesday, March 25, 2015 at 6:56 AM To: John Barfield Cc: Phil Harman, "omnios-discuss at lists.omniti.com" Subject: Re: [OmniOS-discuss] Potential KVM Virtio Performance Issues +1 John. That documentation would be very welcome. On Tue, Mar 24, 2015 at 9:50 PM, John Barfield > wrote: Actually the numbers I sent for the SmartOS VM to VM test were on a switch with Jumbo frames (switch = 9216 mtu...SmartOS GZ MTU = 9000) (Extreme Networks Summit X440-48t release 15.2.3 patch12) Theyre also sitting in Q-in-Q tagged VLANs. Admin tagged nic sits in Vman (provider bridge) 10 the VMs were tagged in VLAN 1674. (not bad :) really) As far as everyone who is wondering how I got 952 Mbps on OmnisOS KVM while running in a zone I plan to write up a how-to that can be posted to the core site if you'd like. There are several caveats that are not documented today for running KVM in a zone. Not that I didnt reverse engineer some of Joyents work of course. Thanks and have a great day, John Barfield > On Mar 24, 2015, at 7:40 PM, Phil Harman > wrote: > > John, > > Interesting work and data. Thanks for sharing. > > I've also been playing with Oracle Solaris 11.2 vs Linux vs FreeBSD on SmartOS vs ESX5.5 (free edition) both VM2VM and VM to remote host over a couple of Intel 10GBASE-T cards. > > As far as I can tell, there remains no virtio-net driver for Solaris / Illumos guests, so I've been using e1000g, which really sucks. > > I found virtio-net works ok under KVM, but was blown away by vmxnet3 under ESX performance (for which a Solaris / Illumos drivers do exist), being able to get close to 8gbps from the guest over the wire! > > To achieve this I had to use jumbo frames (something the current Solaris 11.2 e1000g appears unable to do at all any more). > > So I was wondering, while you are there, whether you've got (or can get) any data for KVM virtio-net VM2VM using jumbo frames? > > My expectation would be at least 2x for MTU 9000 vs 1500. > > I also wonder whether like for like comparison with ESX might encourage further improvements? > > As someone used to say at Sun "If Linux is faster, it's a Solaris bug!". It would be great if the community could agree to the same for ESX vs KVM :) > > Cheers, > Phil > > >> On 24 Mar 2015, at 23:45, John Barfield > wrote: >> >> Btw I did go ahead and test both virtio methods...I gave a vm the -device argument on one interface and the -net argument for another the results where.... >> >> -device = eth0 = 952mbps >> -net = eth1 = 199 mbps >> >> Thanks and have a great day, >> >> John Barfield >> >>> On Mar 24, 2015, at 6:12 PM, Dan McDonald > wrote: >>> >>> >>>> On Mar 24, 2015, at 7:04 PM, Dominik Hassler > wrote: >>>> >>>> Dan, >>>> >>>>>> After further testing I achieved 952 MBytes on a VM-2-VM >>>>>> connection...1 >>>>>> linux Ubuntu 12.04 vm to another CentOS 6.6 VM running on two >>>>>> different SmartOS host machines (through an extreme networks switch). >>>> >>>> if I got John correctly, he was running his second test on SmartOS hosts... >>>> >>>> We did a lot of testing on OmniOS with -net vnic and -device >>>> virtio-net-pci but sadly to no avail... >>>> >>>> I think we have to hope that SmartOS kvm improvements will get >>>> upstreamed sooner or later. >>> >>> Ahh yes. >>> >>> I was hoping to have them ready for 014, but it's a complicated process to upstream larger projects, and Joyent was in the middle of getting their new Triton release out the door. 
>>> >>> Dan >> _______________________________________________ >> OmniOS-discuss mailing list >> OmniOS-discuss at lists.omniti.com >> http://lists.omniti.com/mailman/listinfo/omnios-discuss _______________________________________________ OmniOS-discuss mailing list OmniOS-discuss at lists.omniti.com http://lists.omniti.com/mailman/listinfo/omnios-discuss -- Theo Schlossnagle http://omniti.com/is/theo-schlossnagle -------------- next part -------------- An HTML attachment was scrubbed... URL: From moo at wuffers.net Thu Mar 26 21:15:39 2015 From: moo at wuffers.net (wuffers) Date: Thu, 26 Mar 2015 17:15:39 -0400 Subject: [OmniOS-discuss] kernel panic "kernel heap corruption detected" when creating zero eager disks In-Reply-To: <7CBB4FDC-03CF-422F-889A-B867FFA68FB7@omniti.com> References: <847C1AB3-A505-4A73-B5E7-F3EA2BD41577@omniti.com> <82CC383C-71A1-425D-89E6-B7B725071183@omniti.com> <67c125eb-75c2-48ae-89fb-6a8df288d2d4@careyweb.com> <20150325195212.5a8cebe4@sleipner.datanom.net> <07B1E231-B571-46BC-913D-445DBCA54923@omniti.com> <7CBB4FDC-03CF-422F-889A-B867FFA68FB7@omniti.com> Message-ID: On Thu, Mar 26, 2015 at 1:05 PM, Dan McDonald wrote: > > WRITE_SAME is one of the four VAAI primitives. Nexenta wrote this code > for NS, and upstreamed two of them: > > WRITE_SAME is "hardware assisted erase". > > UNMAP is "hardware assisted freeing". > > Those are in upstream illumos. > > ATS is atomic-test-and-set or "hardware assisted fine-grained locking". > > XCOPY is "hardware assisted copying". > > These are in NexentaStor, and after being held back, were open-sourced, > but not yet upstreamed. > > Ahh, VAAI. I suspect this is a bigger bite to chew, looking back at some prior discussions on this list (although I'm sure many are anxiously awaiting this to be upstreamed). I'm guessing Microsoft's ODX will also be supported since I understand that is just an XCOPY. I see that FreeNAS now has support for both VAAI and ODX - are they porting stuff from the various Illumos distros (including the referenced Nexenta work on VAAI or is it their own implementation)? After some more reading to answer my own questions, I came across this VMware blog post ( http://blogs.vmware.com/vsphere/2012/06/low-level-vaai-behaviour.html): "The following provisioning tasks are accelerated by the use of the WRITE SAME command: Cloning operations for eagerzeroedthick target disks. Allocating new file blocks for thin provisioned virtual disks. Initializing previous unwritten file blocks for zerothick virtual disks." I don't seem to have issues allocating smaller amounts of space, so I suspect that using thin or lazy zero will work. Secondly, it *might* just be the vSphere fat client, as I found another VMware KB ( http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=2058287) which states I cannot make a disk larger than 4TB, which contradicts this properties dialog: http://i.imgur.com/f9liqpR.png (says maximum file size of 2TB in vSphere fat client) versus: http://i.imgur.com/6Ya3oH4.png (says maximum file size 64TB in the vSphere web client) The KB goes on to state, "Checking the size of the newly created or expanded VMDK, you find that it is 4 TB." is untrue, because it allocated and is using 10TB. Don't know how much to trust that info as it seems contradictory. Still, it shouldn't cause the kernel panic like it did. 
Thirdly, it appears I can disable any of the VAAI primitives in the host configuration, if all else fails (since we've determined that it is likely caused by WRITE_SAME). Good read on this via the VAAI FAQ here (which shows you how to check the properties via the ESX CLI): http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=1021976 So here's what I will attempt to test: - Create thin vmdk @ 10TB with vSphere fat client - Create lazy zeroed vmdk @ 10 TB with vSphere fat client - Create eager zeroed vmdk @ 10 TB with vSphere web client - Create thin vmdk @ 10TB with vSphere web client - Create lazy zeroed vmdk @ 10 TB with vSphere web client So it seems I do have alternatives (disabling DataMover.HardwareAcceleratedMove as a last resort). -------------- next part -------------- An HTML attachment was scrubbed... URL: From danmcd at omniti.com Thu Mar 26 21:19:37 2015 From: danmcd at omniti.com (Dan McDonald) Date: Thu, 26 Mar 2015 17:19:37 -0400 Subject: [OmniOS-discuss] kernel panic "kernel heap corruption detected" when creating zero eager disks In-Reply-To: References: <847C1AB3-A505-4A73-B5E7-F3EA2BD41577@omniti.com> <82CC383C-71A1-425D-89E6-B7B725071183@omniti.com> <67c125eb-75c2-48ae-89fb-6a8df288d2d4@careyweb.com> <20150325195212.5a8cebe4@sleipner.datanom.net> <07B1E231-B571-46BC-913D-445DBCA54923@omniti.com> <7CBB4FDC-03CF-422F-889A-B867FFA68FB7@omniti.com> Message-ID: <294BC7DA-0A85-4662-B4B2-71C94755A30A@omniti.com> Just remember that only WRITE_SAME and UNMAP are on stock illumos. If you want the other two, you either get NexentaStor or you start an effort to upstream them from illumos-nexenta. Dan From matthew.lagoe at subrigo.net Thu Mar 26 22:33:24 2015 From: matthew.lagoe at subrigo.net (Matthew Lagoe) Date: Thu, 26 Mar 2015 15:33:24 -0700 Subject: [OmniOS-discuss] best or preferred 10g card for OmniOS In-Reply-To: <933dadf1-3c92-4fd7-b0b2-c2d899ff42c5.maildroid@localhost> References: <933dadf1-3c92-4fd7-b0b2-c2d899ff42c5.maildroid@localhost> Message-ID: <009e01d06814$e1f81170$a5e83450$@subrigo.net> I use the myricom 10g cards (myri10g) they work just fine. From: OmniOS-discuss [mailto:omnios-discuss-bounces at lists.omniti.com] On Behalf Of Doug Hughes Sent: Thursday, March 26, 2015 09:25 AM To: omnios-discuss Subject: [OmniOS-discuss] best or preferred 10g card for OmniOS any recommendations? We're having some pretty big problems with the Solarflare card and driver dropping network under high load. We eliminated LACP as a culprit, and the switch. Intel? Chelsio? other? - Doug -------------- next part -------------- An HTML attachment was scrubbed... URL: From phil.harman at gmail.com Fri Mar 27 00:05:33 2015 From: phil.harman at gmail.com (Phil Harman) Date: Fri, 27 Mar 2015 00:05:33 +0000 Subject: [OmniOS-discuss] best or preferred 10g card for OmniOS In-Reply-To: <136C13E89D22BB468B2A7025993639732F4E3778@EXMCCMB.molalla.com> References: <933dadf1-3c92-4fd7-b0b2-c2d899ff42c5.maildroid@localhost> <136C13E89D22BB468B2A7025993639732F4E3778@EXMCCMB.molalla.com> Message-ID: <3FA8D3A2-B45F-48D4-909E-6B8F2542DAFF@gmail.com> SFPs? Are you know kidding me? For my home lab I bought a pair of X540-T2 cards, which will even run over Cat5 for about a metre. They use a lot more power than SFP though :) No issues at all with illumos or VMware. > On 26 Mar 2015, at 16:45, Jeff Stockett wrote: > > I would concur with what Chip said. We?ve had good luck with the Intel X520s setup with LACP to a Nexus 5000 ? and also have a few X540s. 
The X520s are a bit picky about SFPs but Appoved makes one that works and is reasonably affordable. > > From: OmniOS-discuss [mailto:omnios-discuss-bounces at lists.omniti.com] On Behalf Of Doug Hughes > Sent: Thursday, March 26, 2015 9:25 AM > To: omnios-discuss > Subject: [OmniOS-discuss] best or preferred 10g card for OmniOS > > any recommendations? We're having some pretty big problems with the Solarflare card and driver dropping network under high load. We eliminated LACP as a culprit, and the switch. > > Intel? Chelsio? other? > > - Doug > _______________________________________________ > OmniOS-discuss mailing list > OmniOS-discuss at lists.omniti.com > http://lists.omniti.com/mailman/listinfo/omnios-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From stephan.budach at JVM.DE Fri Mar 27 05:49:00 2015 From: stephan.budach at JVM.DE (Stephan Budach) Date: Fri, 27 Mar 2015 06:49:00 +0100 Subject: [OmniOS-discuss] best or preferred 10g card for OmniOS In-Reply-To: <3FA8D3A2-B45F-48D4-909E-6B8F2542DAFF@gmail.com> References: <933dadf1-3c92-4fd7-b0b2-c2d899ff42c5.maildroid@localhost> <136C13E89D22BB468B2A7025993639732F4E3778@EXMCCMB.molalla.com> <3FA8D3A2-B45F-48D4-909E-6B8F2542DAFF@gmail.com> Message-ID: <5514EF4C.70909@jvm.de> Am 27.03.15 um 01:05 schrieb Phil Harman: > SFPs? Are you know kidding me? For my home lab I bought a pair of > X540-T2 cards, which will even run over Cat5 for about a metre. They > use a lot more power than SFP though :) He he? 1m doesn't buy you much in a server room, so we're using almost only 540-T2 over Cat6 and we never had a problem under Solaris or OmniOS. I'd always choose the Intel 10GbEs over all other brands. I've had horrible issues with Broadcom-based NICs in my Dells and they will have to prove their reliability to a load of other people first, before I will even consider, buying those again? but, I guess so will other do as well. ;) > > No issues at all with illumos or VMware. > > On 26 Mar 2015, at 16:45, Jeff Stockett > wrote: > >> I would concur with what Chip said. We?ve had good luck with the >> Intel X520s setup with LACP to a Nexus 5000 ? and also have a few >> X540s. The X520s are a bit picky about SFPs but Appoved makes one >> that works and is reasonably affordable. >> >> *From:*OmniOS-discuss >> [mailto:omnios-discuss-bounces at lists.omniti.com] *On Behalf Of *Doug >> Hughes >> *Sent:* Thursday, March 26, 2015 9:25 AM >> *To:* omnios-discuss >> *Subject:* [OmniOS-discuss] best or preferred 10g card for OmniOS >> >> any recommendations? We're having some pretty big problems with the >> Solarflare card and driver dropping network under high load. We >> eliminated LACP as a culprit, and the switch. >> >> Intel? Chelsio? other? >> >> - Doug >> >> _______________________________________________ >> OmniOS-discuss mailing list >> OmniOS-discuss at lists.omniti.com >> http://lists.omniti.com/mailman/listinfo/omnios-discuss > > > _______________________________________________ > OmniOS-discuss mailing list > OmniOS-discuss at lists.omniti.com > http://lists.omniti.com/mailman/listinfo/omnios-discuss -- Stephan Budach Managing Director Jung von Matt/it-services GmbH Glash?ttenstra?e 79 20357 Hamburg Tel: +49 40-4321-1353 Fax: +49 40-4321-1114 E-Mail: stephan.budach at jvm.de Internet: http://www.jvm.com Gesch?ftsf?hrer: Stephan Budach AG HH HRB 98380 -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From moo at wuffers.net Fri Mar 27 06:24:07 2015 From: moo at wuffers.net (wuffers) Date: Fri, 27 Mar 2015 02:24:07 -0400 Subject: [OmniOS-discuss] kernel panic "kernel heap corruption detected" when creating zero eager disks In-Reply-To: <294BC7DA-0A85-4662-B4B2-71C94755A30A@omniti.com> References: <847C1AB3-A505-4A73-B5E7-F3EA2BD41577@omniti.com> <82CC383C-71A1-425D-89E6-B7B725071183@omniti.com> <67c125eb-75c2-48ae-89fb-6a8df288d2d4@careyweb.com> <20150325195212.5a8cebe4@sleipner.datanom.net> <07B1E231-B571-46BC-913D-445DBCA54923@omniti.com> <7CBB4FDC-03CF-422F-889A-B867FFA68FB7@omniti.com> <294BC7DA-0A85-4662-B4B2-71C94755A30A@omniti.com> Message-ID: > > > So here's what I will attempt to test: > - Create thin vmdk @ 10TB with vSphere fat client: PASS > - Create lazy zeroed vmdk @ 10 TB with vSphere fat client: PASS > - Create eager zeroed vmdk @ 10 TB with vSphere web client: PASS! (took 1 > hour) > - Create thin vmdk @ 10TB with vSphere web client: PASS > - Create lazy zeroed vmdk @ 10 TB with vSphere web client: PASS > > Additionally, I tried: - Create fixed vhdx @ 10TB with SCVMM (Hyper-V): PASS (most likely no primitives in use here - this took slightly over 3 hours) Everything passed (which I didn't expect, especially the 10TB eager zero).. then I tried again on the vSphere web client for a 20TB eager zero disk, and I got another kernel panic altogether (no kmem_flags 0xf set, unfortunately). Mar 27 2015 01:09:33.664060000 2e2305c2-54b5-c1f4-aafd-fb1eccc982dd SUNOS-8000-KL TIME CLASS ENA Mar 27 01:09:33.6307 ireport.os.sunos.panic.dump_available 0x0000000000000000 Mar 27 01:08:30.6688 ireport.os.sunos.panic.dump_pending_on_device 0x0000000000000000 nvlist version: 0 version = 0x0 class = list.suspect uuid = 2e2305c2-54b5-c1f4-aafd-fb1eccc982dd code = SUNOS-8000-KL diag-time = 1427432973 633746 de = fmd:///module/software-diagnosis fault-list-sz = 0x1 fault-list = (array of embedded nvlists) (start fault-list[0]) nvlist version: 0 version = 0x0 class = defect.sunos.kernel.panic certainty = 0x64 asru = sw:///:path=/var/crash/unknown/.2e2305c2-54b5-c1f4-aafd-fb1eccc982dd resource = sw:///:path=/var/crash/unknown/.2e2305c2-54b5-c1f4-aafd-fb1eccc982dd savecore-succcess = 1 dump-dir = /var/crash/unknown dump-files = vmdump.2 os-instance-uuid = 2e2305c2-54b5-c1f4-aafd-fb1eccc982dd panicstr = BAD TRAP: type=d (#gp General protection) rp=ffffff01eb72ea70 addr=0 panicstack = unix:real_mode_stop_cpu_stage2_end+9e23 () | unix:trap+a30 () | unix:cmntrap+e6 () | genunix:anon_decref+35 () | genunix:anon_free+74 () | genunix:segvn_free+242 () | genunix:seg_free+30 () | genunix:segvn_unmap+cde () | genunix:as_free+e7 () | genunix:relvm+220 () | genunix:proc_exit+454 () | genunix:exit+15 () | genunix:rexit+18 () | unix:brand_sys_sysenter+1c9 () | crashtime = 1427431421 panic-time = Fri Mar 27 00:43:41 2015 EDT (end fault-list[0]) fault-status = 0x1 severity = Major __ttl = 0x1 __tod = 0x5514e60d 0x2794c060 Crash file: https://drive.google.com/file/d/0B7mCJnZUzJPKT0lpTW9GZFJCLTg/view?usp=sharing It appears I can do thin and lazy zero disks of those sizes, so I will have to be satisfied to use those options as a workaround (plus disabling WRITE_SAME from the hosts if I really wanted the eager zeroed disk) until some of that Nexenta COMSTAR love is upstreamed. For comparison sake, provisioning a 10TB fixed vhdx took approximately 3 hours in Hyper-V, while the same provisioning in VMware took about 1 hour. So we can say that WRITE_SAME accelerated the same job by 3x. 
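For reference, the per-device VAAI state and the host-wide knobs can both be inspected from the ESXi shell, and it is DataMover.HardwareAcceleratedInit that governs the block-zeroing (WRITE SAME) primitive; HardwareAcceleratedMove covers XCOPY/full copy. A sketch with a placeholder device ID:

    esxcli storage core device vaai status get -d naa.600144f0xxxxxxxxxxxxxxxxxxxxxxxx
        # "Zero Status: supported" is the WRITE SAME primitive
    esxcli system settings advanced list --option /DataMover/HardwareAcceleratedInit
    esxcli system settings advanced set --int-value 0 --option /DataMover/HardwareAcceleratedInit
        # host-wide: fall back to software zeroing until the target side is fixed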
-------------- next part -------------- An HTML attachment was scrubbed... URL: From tobi at oetiker.ch Fri Mar 27 08:36:35 2015 From: tobi at oetiker.ch (Tobias Oetiker) Date: Fri, 27 Mar 2015 09:36:35 +0100 (CET) Subject: [OmniOS-discuss] Potential KVM Virtio Performance Issues In-Reply-To: <4B77878C-A980-44C5-B5D8-1A34D9D3A52A@bissinc.com> References: <708F6249-5038-4DEE-A2A1-34D185B65CFC@omniti.com> <5511ED7A.3080006@gmx.li> <870038BE-8F1E-4411-A0BC-099D309E447D@omniti.com> <38C9C9CA-A75F-49F6-B210-C04CA81402E4@bissinc.com> <4B77878C-A980-44C5-B5D8-1A34D9D3A52A@bissinc.com> Message-ID: Hi John, this sounds encurraging ... could you provied complete details of your setup ... * start up commandline for qemu-system-x86_64 * kernel version running on on your ubuntu * any special config on the ubuntu side (eg in /etc/systctl.conf) or so ... cheers tobi Yesterday John Barfield wrote: > So I was still having issues with virtio performance. I?ve finally determined that its the child zone that is capping the throughput at 85mbps. > > If I halt the zone and launch the same VM from the GZ I get 955mbps. > > Another thing?the virtio driver in Centos6.6 does not work well with OmniOS kvm. > > I can boot Centos from either the GZ or a CZ and I?m actually getting results in the Kb now instead of mbps with iperf. May have something to do with the tcp window being 19.5 kb on CentOS vs 85kb on Ubuntu. Assuming this is a driver problem. > > The only OS I get good speeds with are Ubuntu server 14.04 running the Global Zone. (Have only tested two though :)) > > So my recipe for decent virtio performance on OmniOS: > > Ubuntu Linux Server 14.04 running in Global Zone. > > Does anyone have any idea why the child zone is capping my throughput? > > Am I missing a zone cfg parameter to allow the child zone to have full 1GB bandwidth? > > > > From: Theo Schlossnagle > Date: Wednesday, March 25, 2015 at 6:56 AM > To: John Barfield > Cc: Phil Harman, "omnios-discuss at lists.omniti.com" > Subject: Re: [OmniOS-discuss] Potential KVM Virtio Performance Issues > > +1 John. That documentation would be very welcome. > > On Tue, Mar 24, 2015 at 9:50 PM, John Barfield > wrote: > Actually the numbers I sent for the SmartOS VM to VM test were on a switch with Jumbo frames (switch = 9216 mtu...SmartOS GZ MTU = 9000) (Extreme Networks Summit X440-48t release 15.2.3 patch12) Theyre also sitting in Q-in-Q tagged VLANs. Admin tagged nic sits in Vman (provider bridge) 10 the VMs were tagged in VLAN 1674. (not bad :) really) > > As far as everyone who is wondering how I got 952 Mbps on OmnisOS KVM while running in a zone I plan to write up a how-to that can be posted to the core site if you'd like. There are several caveats that are not documented today for running KVM in a zone. Not that I didnt reverse engineer some of Joyents work of course. > > > > Thanks and have a great day, > > John Barfield > > > On Mar 24, 2015, at 7:40 PM, Phil Harman > wrote: > > > > John, > > > > Interesting work and data. Thanks for sharing. > > > > I've also been playing with Oracle Solaris 11.2 vs Linux vs FreeBSD on SmartOS vs ESX5.5 (free edition) both VM2VM and VM to remote host over a couple of Intel 10GBASE-T cards. > > > > As far as I can tell, there remains no virtio-net driver for Solaris / Illumos guests, so I've been using e1000g, which really sucks. 
> > > > I found virtio-net works ok under KVM, but was blown away by vmxnet3 under ESX performance (for which a Solaris / Illumos drivers do exist), being able to get close to 8gbps from the guest over the wire! > > > > To achieve this I had to use jumbo frames (something the current Solaris 11.2 e1000g appears unable to do at all any more). > > > > So I was wondering, while you are there, whether you've got (or can get) any data for KVM virtio-net VM2VM using jumbo frames? > > > > My expectation would be at least 2x for MTU 9000 vs 1500. > > > > I also wonder whether like for like comparison with ESX might encourage further improvements? > > > > As someone used to say at Sun "If Linux is faster, it's a Solaris bug!". It would be great if the community could agree to the same for ESX vs KVM :) > > > > Cheers, > > Phil > > > > > >> On 24 Mar 2015, at 23:45, John Barfield > wrote: > >> > >> Btw I did go ahead and test both virtio methods...I gave a vm the -device argument on one interface and the -net argument for another the results where.... > >> > >> -device = eth0 = 952mbps > >> -net = eth1 = 199 mbps > >> > >> Thanks and have a great day, > >> > >> John Barfield > >> > >>> On Mar 24, 2015, at 6:12 PM, Dan McDonald > wrote: > >>> > >>> > >>>> On Mar 24, 2015, at 7:04 PM, Dominik Hassler > wrote: > >>>> > >>>> Dan, > >>>> > >>>>>> After further testing I achieved 952 MBytes on a VM-2-VM > >>>>>> connection...1 > >>>>>> linux Ubuntu 12.04 vm to another CentOS 6.6 VM running on two > >>>>>> different SmartOS host machines (through an extreme networks switch). > >>>> > >>>> if I got John correctly, he was running his second test on SmartOS hosts... > >>>> > >>>> We did a lot of testing on OmniOS with -net vnic and -device > >>>> virtio-net-pci but sadly to no avail... > >>>> > >>>> I think we have to hope that SmartOS kvm improvements will get > >>>> upstreamed sooner or later. > >>> > >>> Ahh yes. > >>> > >>> I was hoping to have them ready for 014, but it's a complicated process to upstream larger projects, and Joyent was in the middle of getting their new Triton release out the door. > >>> > >>> Dan > >> _______________________________________________ > >> OmniOS-discuss mailing list > >> OmniOS-discuss at lists.omniti.com > >> http://lists.omniti.com/mailman/listinfo/omnios-discuss > _______________________________________________ > OmniOS-discuss mailing list > OmniOS-discuss at lists.omniti.com > http://lists.omniti.com/mailman/listinfo/omnios-discuss > > > > -- > > Theo Schlossnagle > > http://omniti.com/is/theo-schlossnagle > -- Tobi Oetiker, OETIKER+PARTNER AG, Aarweg 15 CH-4600 Olten, Switzerland www.oetiker.ch tobi at oetiker.ch +41 62 775 9902 From matej at zunaj.si Fri Mar 27 13:07:55 2015 From: matej at zunaj.si (Matej Zerovnik) Date: Fri, 27 Mar 2015 14:07:55 +0100 Subject: [OmniOS-discuss] iSCSI target hang, no way to restart but server reboot Message-ID: <5515562B.9090900@zunaj.si> Hello! We are having issues with iSCSI on work. Every now and then, iSCSI target just hangs up. We are unable to kill it, restart it or do anything else to restore the service. The only option to restore iscsi target back to working state, is to reboot the whole server and loose all the sessions (around 100 clients). Weird thing is, that only iscsi target hangs. 
I can ssh to server and work on it without any problem, there is no load or anything else, nothing in log files, just iscsi target locks up and all connections to iscsi target drop (probably timeout) Server is a IBM 3550 M4 with dual Xeon E5-2640 CPUs and 160GB of memory. Hard drives are mounted in a Supermicro JBOD, which is attached via SAS HBA LSI Logic SAS2308. We are using OmniOS v11 r151006. Anyone encounter similar troubles? Any recomendations what to do or how to solve that problem? Matej From danmcd at omniti.com Fri Mar 27 14:42:01 2015 From: danmcd at omniti.com (Dan McDonald) Date: Fri, 27 Mar 2015 10:42:01 -0400 Subject: [OmniOS-discuss] Potential KVM Virtio Performance Issues In-Reply-To: <4B77878C-A980-44C5-B5D8-1A34D9D3A52A@bissinc.com> References: <708F6249-5038-4DEE-A2A1-34D185B65CFC@omniti.com> <5511ED7A.3080006@gmx.li> <870038BE-8F1E-4411-A0BC-099D309E447D@omniti.com> <38C9C9CA-A75F-49F6-B210-C04CA81402E4@bissinc.com> <4B77878C-A980-44C5-B5D8-1A34D9D3A52A@bissinc.com> Message-ID: > On Mar 26, 2015, at 4:50 PM, John Barfield wrote: > > So I was still having issues with virtio performance. I?ve finally determined that its the child zone that is capping the throughput at 85mbps. We don't document KVM in a non-global (what you call "child") zone, but it is possible. I've put in for 014 the /dev/kvm entry in newly-created zones by default. This was missing from earlier OmniOS releases. I assume when configuring your non-global zones you did this yourself via zonecfg(1M)? Just checking to make sure I'm not missing anything. Thanks, Dan From danmcd at omniti.com Fri Mar 27 14:43:39 2015 From: danmcd at omniti.com (Dan McDonald) Date: Fri, 27 Mar 2015 10:43:39 -0400 Subject: [OmniOS-discuss] iSCSI target hang, no way to restart but server reboot In-Reply-To: <5515562B.9090900@zunaj.si> References: <5515562B.9090900@zunaj.si> Message-ID: > On Mar 27, 2015, at 9:07 AM, Matej Zerovnik wrote: > > Hello! > > We are having issues with iSCSI on work. Every now and then, iSCSI target just hangs up. We are unable to kill it, restart it or do anything else to restore the service. The only option to restore iscsi target back to working state, is to reboot the whole server and loose all the sessions (around 100 clients). > > Weird thing is, that only iscsi target hangs. I can ssh to server and work on it without any problem, there is no load or anything else, nothing in log files, just iscsi target locks up and all connections to iscsi target drop (probably timeout) > > Server is a IBM 3550 M4 with dual Xeon E5-2640 CPUs and 160GB of memory. > Hard drives are mounted in a Supermicro JBOD, which is attached via SAS HBA LSI Logic SAS2308. > > We are using OmniOS v11 r151006. > > Anyone encounter similar troubles? > Any recomendations what to do or how to solve that problem? I'd move to 012 or wait the short amount of time until 014 hits the streets. Then see if your problem persists. Dan From matej at zunaj.si Fri Mar 27 14:54:27 2015 From: matej at zunaj.si (Matej Zerovnik) Date: Fri, 27 Mar 2015 15:54:27 +0100 Subject: [OmniOS-discuss] iSCSI target hang, no way to restart but server reboot In-Reply-To: References: <5515562B.9090900@zunaj.si> Message-ID: <55156F23.2010300@zunaj.si> It just happened about 2 hours ago... The whole system did not crash, but 2 clients lost the connection. 
This is what I see in logs: Mar 27 13:55:51 storage.host.org scsi: [ID 107833 kern.notice] /pci at 0,0/pci8086,3c02 at 1/pci1000,3040 at 0 (mpt_sas0): Mar 27 13:55:51 storage.host.org Timeout of 0 seconds expired with 1 commands on target 68 lun 0. Mar 27 13:55:51 storage.host.org scsi: [ID 107833 kern.warning] WARNING: /pci at 0,0/pci8086,3c02 at 1/pci1000,3040 at 0 (mpt_sas0): Mar 27 13:55:51 storage.host.org Disconnected command timeout for target 68 w500304800039d83d, enclosure 3 Mar 27 13:55:52 storage.host.org scsi: [ID 365881 kern.info] /pci at 0,0/pci8086,3c02 at 1/pci1000,3040 at 0 (mpt_sas0): Mar 27 13:55:52 storage.host.org Log info 0x31140000 received for target 68 w500304800039d83d. Mar 27 13:55:52 storage.host.org scsi_status=0x0, ioc_status=0x8048, scsi_state=0xc Mar 27 15:08:31 storage.host.org iscsit: [ID 744151 kern.notice] NOTICE: login_sm_session_bind: add new conn/sess continue Mar 27 15:10:53 storage.host.org scsi: [ID 107833 kern.notice] /pci at 0,0/pci8086,3c02 at 1/pci1000,3040 at 0 (mpt_sas0): Mar 27 15:10:53 storage.host.org Timeout of 0 seconds expired with 1 commands on target 68 lun 0. Mar 27 15:10:53 storage.host.org scsi: [ID 107833 kern.warning] WARNING: /pci at 0,0/pci8086,3c02 at 1/pci1000,3040 at 0 (mpt_sas0): Mar 27 15:10:53 storage.host.org Disconnected command timeout for target 68 w500304800039d83d, enclosure 3 Mar 27 15:10:54 storage.host.org scsi: [ID 365881 kern.info] /pci at 0,0/pci8086,3c02 at 1/pci1000,3040 at 0 (mpt_sas0): Mar 27 15:10:54 storage.host.org Log info 0x31140000 received for target 68 w500304800039d83d. Mar 27 15:10:54 storage.host.org scsi_status=0x0, ioc_status=0x8048, scsi_state=0xc I read in the archives, that this errors happens when you have SATA drives on a SAS expander and one of the drives misbehaves: A command did not complete and the mpt driver reset the target. If that target is an expander, then everything behind the expander can reset, resulting in the aborts of any in-flight commands, as follows... iostat -Ei | grep Error reports that one device has 6 hard errors and 6 device not ready errors, but that is a local drive, attached to a different controller (LSI Megaraid). I wouldn't like to do a major upgrade, since this is a production machine. Too scary:) Matej On 27. 03. 2015 15:43, Dan McDonald wrote: >> On Mar 27, 2015, at 9:07 AM, Matej Zerovnik wrote: >> >> Hello! >> >> We are having issues with iSCSI on work. Every now and then, iSCSI target just hangs up. We are unable to kill it, restart it or do anything else to restore the service. The only option to restore iscsi target back to working state, is to reboot the whole server and loose all the sessions (around 100 clients). >> >> Weird thing is, that only iscsi target hangs. I can ssh to server and work on it without any problem, there is no load or anything else, nothing in log files, just iscsi target locks up and all connections to iscsi target drop (probably timeout) >> >> Server is a IBM 3550 M4 with dual Xeon E5-2640 CPUs and 160GB of memory. >> Hard drives are mounted in a Supermicro JBOD, which is attached via SAS HBA LSI Logic SAS2308. >> >> We are using OmniOS v11 r151006. >> >> Anyone encounter similar troubles? >> Any recomendations what to do or how to solve that problem? > I'd move to 012 or wait the short amount of time until 014 hits the streets. Then see if your problem persists. 
> > Dan > From danmcd at omniti.com Fri Mar 27 14:56:53 2015 From: danmcd at omniti.com (Dan McDonald) Date: Fri, 27 Mar 2015 10:56:53 -0400 Subject: [OmniOS-discuss] iSCSI target hang, no way to restart but server reboot In-Reply-To: <55156F23.2010300@zunaj.si> References: <5515562B.9090900@zunaj.si> <55156F23.2010300@zunaj.si> Message-ID: <17BBA672-FDF1-4F36-8011-79212B0EFD97@omniti.com> > On Mar 27, 2015, at 10:54 AM, Matej Zerovnik wrote: > > I read in the archives, that this errors happens when you have SATA drives on a SAS expander and one of the drives misbehaves: > A command did not complete and the mpt driver reset the target. > If that target is an expander, then everything behind the expander can > reset, resulting in the aborts of any in-flight commands, as follows... You read correctly. You should not have SATA drives on a SAS expander. You are setting yourself up for failure. > iostat -Ei | grep Error reports that one device has 6 hard errors and 6 device not ready errors, but that is a local drive, attached to a different controller (LSI Megaraid). LSI Megaraid, ESPECIALLY with 006, is not going to be as good as either mpt_sas, or a more modern build of OmniOS (I'm hoping to get one very good change in before I close 014's illumos synching). > I wouldn't like to do a major upgrade, since this is a production machine. Too scary:) You should plan for it, however. SATA drives on SAS expanders is a recipe for disaster, as you're seeing. Dan From matej at zunaj.si Fri Mar 27 15:03:19 2015 From: matej at zunaj.si (Matej Zerovnik) Date: Fri, 27 Mar 2015 16:03:19 +0100 Subject: [OmniOS-discuss] iSCSI target hang, no way to restart but server reboot In-Reply-To: <17BBA672-FDF1-4F36-8011-79212B0EFD97@omniti.com> References: <5515562B.9090900@zunaj.si> <55156F23.2010300@zunaj.si> <17BBA672-FDF1-4F36-8011-79212B0EFD97@omniti.com> Message-ID: <55157137.2010909@zunaj.si> On 27. 03. 2015 15:56, Dan McDonald wrote: >> iostat -Ei | grep Error reports that one device has 6 hard errors and 6 device not ready errors, but that is a local drive, attached to a different controller (LSI Megaraid). > LSI Megaraid, ESPECIALLY with 006, is not going to be as good as either mpt_sas, or a more modern build of OmniOS (I'm hoping to get one very good change in before I close 014's illumos synching). Only rpool is on megaraid, the storage is on LSI Logic SAS2308 HBA, which I think is using mpt_sas driver. What change do you plan on putting it? Does it concern mpt_sas driver? >> I wouldn't like to do a major upgrade, since this is a production machine. Too scary:) > You should plan for it, however. SATA drives on SAS expanders is a recipe for disaster, as you're seeing. Is there a better support for SATA drives in newer omnios? Matej From danmcd at omniti.com Fri Mar 27 15:04:57 2015 From: danmcd at omniti.com (Dan McDonald) Date: Fri, 27 Mar 2015 11:04:57 -0400 Subject: [OmniOS-discuss] iSCSI target hang, no way to restart but server reboot In-Reply-To: <55157137.2010909@zunaj.si> References: <5515562B.9090900@zunaj.si> <55156F23.2010300@zunaj.si> <17BBA672-FDF1-4F36-8011-79212B0EFD97@omniti.com> <55157137.2010909@zunaj.si> Message-ID: <9E6B85F8-7454-4DE2-87AF-28A5BAAF6A33@omniti.com> > On Mar 27, 2015, at 11:03 AM, Matej Zerovnik wrote: > > Only rpool is on megaraid, the storage is on LSI Logic SAS2308 HBA, which I think is using mpt_sas driver. What change do you plan on putting it? Does it concern mpt_sas driver? 
There are some mpt_sas improvements, but no amount of driver improvements can fix the failure modes caused by SATA drives in SAS expanders. You Just Can't Fix That. >>> I wouldn't like to do a major upgrade, since this is a production machine. Too scary:) >> You should plan for it, however. SATA drives on SAS expanders is a recipe for disaster, as you're seeing. > Is there a better support for SATA drives in newer omnios? Not when you're using them in situations that are operationally dangerous. Were you a paying customer, we would tell you we don't support SATA drives in SAS expanders. Sorry, Dan From narayan.desai at gmail.com Fri Mar 27 15:13:42 2015 From: narayan.desai at gmail.com (Narayan Desai) Date: Fri, 27 Mar 2015 10:13:42 -0500 Subject: [OmniOS-discuss] iSCSI target hang, no way to restart but server reboot In-Reply-To: <9E6B85F8-7454-4DE2-87AF-28A5BAAF6A33@omniti.com> References: <5515562B.9090900@zunaj.si> <55156F23.2010300@zunaj.si> <17BBA672-FDF1-4F36-8011-79212B0EFD97@omniti.com> <55157137.2010909@zunaj.si> <9E6B85F8-7454-4DE2-87AF-28A5BAAF6A33@omniti.com> Message-ID: Having been on the receiving end of similar advice, it is a frustrating situation to be in, since you have (and will likely continue to have) the hardware in production, without much option for replacement. When we had systems like this, we had a lot of success being aggressive in swapping out disks that were showing signs of going bad, even before critical failures occurred. Also looking at SMART statistics, and aggressively replacing those as well. This made the situation manageable. Basically, having sata drives in sas expanders means the system is brittle, and you should treat it as such. Look for: - errors in iostat -En - high service times in iostat -xnz - smartctl (this causes harmless sense messages when devices are probed, but it is easy enough to ignore these) - any errors reported out of lsiutil, showing either problems with cabling/enclosures, or devices - decode any sense errors reported by the lsi driver Aggressively replace devices implicated by these, and hope for the best. The best may or may not be what you're hoping for, but may be livable; it was for us. good luck -nld On Fri, Mar 27, 2015 at 10:04 AM, Dan McDonald wrote: > > > On Mar 27, 2015, at 11:03 AM, Matej Zerovnik wrote: > > > > Only rpool is on megaraid, the storage is on LSI Logic SAS2308 HBA, > which I think is using mpt_sas driver. What change do you plan on putting > it? Does it concern mpt_sas driver? > > There are some mpt_sas improvements, but no amount of driver improvements > can fix the failure modes caused by SATA drives in SAS expanders. You Just > Can't Fix That. > > >>> I wouldn't like to do a major upgrade, since this is a production > machine. Too scary:) > >> You should plan for it, however. SATA drives on SAS expanders is a > recipe for disaster, as you're seeing. > > Is there a better support for SATA drives in newer omnios? > > Not when you're using them in situations that are operationally dangerous. > > Were you a paying customer, we would tell you we don't support SATA drives > in SAS expanders. > > Sorry, > Dan > > _______________________________________________ > OmniOS-discuss mailing list > OmniOS-discuss at lists.omniti.com > http://lists.omniti.com/mailman/listinfo/omnios-discuss > -------------- next part -------------- An HTML attachment was scrubbed... 
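To make that kind of sweep repeatable, the checks listed above can be wrapped in a small script and run from cron. This is only a rough sketch of the idea -- the device glob, the smartctl "-d sat,12" option and the grep patterns are guesses that often work for SATA disks behind LSI HBAs on illumos, not something taken from this thread, so adjust them for your own boxes:

    #!/bin/sh
    # Quick disk-health sweep: driver error counters, service times, SMART.

    # 1. Per-device error counters (soft/hard/transport, "device not ready")
    iostat -En | egrep -i 'errors|not ready'

    # 2. Service times; devices with persistently high asvc_t are suspects
    iostat -xnz 10 3

    # 3. SMART overall-health verdict for every disk smartctl can reach
    for d in /dev/rdsk/c*t*d*s0; do
        echo "== $d"
        smartctl -H -d sat,12 "$d" 2>/dev/null | egrep -i 'health|result'
    done

    # lsiutil and sense-key decoding are worth adding too, but the exact
    # invocation depends on the HBA and firmware, so they are left out here.

Anything that shows up in more than one of those checks goes to the top of the replacement list.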
URL: From dave-oo at pooserville.com Sat Mar 28 03:51:17 2015 From: dave-oo at pooserville.com (Dave Pooser) Date: Fri, 27 Mar 2015 22:51:17 -0500 Subject: [OmniOS-discuss] iSCSI target hang, no way to restart but server reboot In-Reply-To: References: <5515562B.9090900@zunaj.si> <55156F23.2010300@zunaj.si> <17BBA672-FDF1-4F36-8011-79212B0EFD97@omniti.com> <55157137.2010909@zunaj.si> <9E6B85F8-7454-4DE2-87AF-28A5BAAF6A33@omniti.com> Message-ID: >Having been on the receiving end of similar advice, it is a frustrating >situation to be in, since you have (and will likely continue to have) the >hardware in production, without much option for replacement. >When we had systems like this, we had a lot of success being aggressive in >swapping out disks that were showing signs of going bad, even before >critical failures occurred. Also looking at SMART statistics, and >aggressively replacing those as well. >Aggressively replace devices implicated by these, and hope for the best. >The best may or may not be what you're hoping for, but may be livable; it >was for us. Also bear in mind it's entirely possible to mix SAS and SATA drives in the same enclosure and even the same vdev-- so as you're aggressively replacing SATA drives replace them with SAS drives and your system will become less brittle. Assuming you're using enterprise SATA drives, their SAS siblings are not much more expensive (often about $20 difference) and the reliability gains will be significant. -- Dave Pooser Cat-Herder-in-Chief, Pooserville.com From richard.elling at richardelling.com Sat Mar 28 14:39:33 2015 From: richard.elling at richardelling.com (Richard Elling) Date: Sat, 28 Mar 2015 07:39:33 -0700 Subject: [OmniOS-discuss] best or preferred 10g card for OmniOS In-Reply-To: <933dadf1-3c92-4fd7-b0b2-c2d899ff42c5.maildroid@localhost> References: <933dadf1-3c92-4fd7-b0b2-c2d899ff42c5.maildroid@localhost> Message-ID: > On Mar 26, 2015, at 9:24 AM, Doug Hughes wrote: > > any recommendations? We're having some pretty big problems with the Solarflare card and driver dropping network under high load. We eliminated LACP as a culprit, and the switch. > > Intel? Chelsio? other? I've been running exclusively Intel for several years now. It gets the most attention in the illumos community. -- richard > > - Doug > _______________________________________________ > OmniOS-discuss mailing list > OmniOS-discuss at lists.omniti.com > http://lists.omniti.com/mailman/listinfo/omnios-discuss From matthew.lagoe at subrigo.net Sun Mar 29 13:51:09 2015 From: matthew.lagoe at subrigo.net (Matthew Lagoe) Date: Sun, 29 Mar 2015 06:51:09 -0700 Subject: [OmniOS-discuss] best or preferred 10g card for OmniOS In-Reply-To: References: <933dadf1-3c92-4fd7-b0b2-c2d899ff42c5.maildroid@localhost> Message-ID: <005d01d06a27$6a2b0f20$3e812d60$@subrigo.net> The intel cards are nice but they don't have any cx4 cards so we don't use them. Copper connections have less latency on short links then fiber as you don't have the electric to optical conversion (when done properly) -----Original Message----- From: OmniOS-discuss [mailto:omnios-discuss-bounces at lists.omniti.com] On Behalf Of Richard Elling Sent: Saturday, March 28, 2015 07:40 AM To: Doug Hughes Cc: omnios-discuss Subject: Re: [OmniOS-discuss] best or preferred 10g card for OmniOS > On Mar 26, 2015, at 9:24 AM, Doug Hughes wrote: > > any recommendations? We're having some pretty big problems with the Solarflare card and driver dropping network under high load. We eliminated LACP as a culprit, and the switch. 
> > Intel? Chelsio? other? I've been running exclusively Intel for several years now. It gets the most attention in the illumos community. -- richard > > - Doug > _______________________________________________ > OmniOS-discuss mailing list > OmniOS-discuss at lists.omniti.com > http://lists.omniti.com/mailman/listinfo/omnios-discuss _______________________________________________ OmniOS-discuss mailing list OmniOS-discuss at lists.omniti.com http://lists.omniti.com/mailman/listinfo/omnios-discuss From chip at innovates.com Sun Mar 29 14:06:36 2015 From: chip at innovates.com (Schweiss, Chip) Date: Sun, 29 Mar 2015 09:06:36 -0500 Subject: [OmniOS-discuss] best or preferred 10g card for OmniOS In-Reply-To: <005d01d06a27$6a2b0f20$3e812d60$@subrigo.net> References: <933dadf1-3c92-4fd7-b0b2-c2d899ff42c5.maildroid@localhost> <005d01d06a27$6a2b0f20$3e812d60$@subrigo.net> Message-ID: On Sun, Mar 29, 2015 at 8:51 AM, Matthew Lagoe wrote: > The intel cards are nice but they don't have any cx4 cards so we don't use > them. Copper connections have less latency on short links then fiber as you > don't have the electric to optical conversion (when done properly) > On short links (< 20M) twin-ax copper SFP+ are much more economical and lower latency than optics. I would only use optics and fiber if I have long runs. -Chip > > -----Original Message----- > From: OmniOS-discuss [mailto:omnios-discuss-bounces at lists.omniti.com] On > Behalf Of Richard Elling > Sent: Saturday, March 28, 2015 07:40 AM > To: Doug Hughes > Cc: omnios-discuss > Subject: Re: [OmniOS-discuss] best or preferred 10g card for OmniOS > > > > On Mar 26, 2015, at 9:24 AM, Doug Hughes wrote: > > > > any recommendations? We're having some pretty big problems with the > Solarflare card and driver dropping network under high load. We eliminated > LACP as a culprit, and the switch. > > > > Intel? Chelsio? other? > > I've been running exclusively Intel for several years now. It gets the most > attention in the illumos community. > > -- richard > > > > > > - Doug > > _______________________________________________ > > OmniOS-discuss mailing list > > OmniOS-discuss at lists.omniti.com > > http://lists.omniti.com/mailman/listinfo/omnios-discuss > _______________________________________________ > OmniOS-discuss mailing list > OmniOS-discuss at lists.omniti.com > http://lists.omniti.com/mailman/listinfo/omnios-discuss > > > _______________________________________________ > OmniOS-discuss mailing list > OmniOS-discuss at lists.omniti.com > http://lists.omniti.com/mailman/listinfo/omnios-discuss > -------------- next part -------------- An HTML attachment was scrubbed... URL: From doug at will.to Sun Mar 29 16:04:03 2015 From: doug at will.to (Doug Hughes) Date: Sun, 29 Mar 2015 12:04:03 -0400 Subject: [OmniOS-discuss] best or preferred 10g card for OmniOS In-Reply-To: <005d01d06a27$6a2b0f20$3e812d60$@subrigo.net> References: <933dadf1-3c92-4fd7-b0b2-c2d899ff42c5.maildroid@localhost> <005d01d06a27$6a2b0f20$3e812d60$@subrigo.net> Message-ID: <55182273.4070509@will.to> On 3/29/2015 9:51 AM, Matthew Lagoe wrote: > The intel cards are nice but they don't have any cx4 cards so we don't use > them. Copper connections have less latency on short links then fiber as you > don't have the electric to optical conversion (when done properly) > Do yourself a huge favor and go to the SFP+ direct attach stuff. Longer lengths, thinner cables, better cables, etc. The switches are becoming really inexpensive. 
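Whichever card ends up in the box, it is worth sanity-checking what the link actually negotiated before blaming the card or the driver. A minimal check, assuming the port shows up as ixgbe0 (substitute whatever link name your driver exposes), might look like:

    # Link state, speed and duplex as the driver sees them
    dladm show-phys ixgbe0

    # MTU and flow-control settings -- make sure they match the switch port
    dladm show-linkprop -p mtu,flowctrl ixgbe0

    # Any error or drop counters ticking up under load?
    kstat -m ixgbe | egrep -i 'error|drop'

That takes a few seconds and rules out the boring explanations (wrong speed, mismatched MTU, flow control fighting the switch) before digging deeper.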
(I ordered pair of the Intel X520 DA2 cards from all of the recommendations here. I should have them tomorrow) From danmcd at omniti.com Sun Mar 29 18:51:45 2015 From: danmcd at omniti.com (Dan McDonald) Date: Sun, 29 Mar 2015 14:51:45 -0400 Subject: [OmniOS-discuss] Final bloody release for 151013 Message-ID: <44F6A94E-5E6D-4570-97C6-F4A89C2EFA64@omniti.com> I've pushed a whole wad out except for pkg(5), so expect a large update. This is the contents of r151014, except any fixes for what I find in testing prior to its release, and one or two policy changes for signed packages in 014 itself. - omnios-build master commit 1212faf - Fix to ipmitool (thanks Andy!) - Reduction in "pkg verify" noise (thanks Ben!) - Fix to "omnios-userland" for libpcap's update - Timezone database to 2015a - OpenSSL to 1.0.2a - sudo's Timezone (TZ environment variable) checking backported. - zsh to 5.0.7 - PCI.IDs now are pulled from illumos-omnios, instead of we having two copies - libffi to 3.2.1, with better build infrastructure. - illumos-omnios master branch commit 45f3064, merged will illumos-gate 4e90188 - SMBIOS up to 2.8 - Several mr_sas fixes for modern LSI MegaRAID boards, including better raw-disk support for boards that support it. - NFS lock manager now won't fail in startup when statd leaves entries behind (illumos #4518 - read its analysis, please) - Disassembly support for Intel BMI1, BMI2, AVX2, and FMA instructions. This is the last r151013 bloody release. The next time there is a bloody release, there will be new ISOs, new Kayaks, and a new revision: r151015. Thanks! Dan From matthew.lagoe at subrigo.net Mon Mar 30 09:36:52 2015 From: matthew.lagoe at subrigo.net (Matthew Lagoe) Date: Mon, 30 Mar 2015 02:36:52 -0700 Subject: [OmniOS-discuss] best or preferred 10g card for OmniOS In-Reply-To: <55182273.4070509@will.to> References: <933dadf1-3c92-4fd7-b0b2-c2d899ff42c5.maildroid@localhost> <005d01d06a27$6a2b0f20$3e812d60$@subrigo.net> <55182273.4070509@will.to> Message-ID: <008601d06acd$0e424b50$2ac6e1f0$@subrigo.net> Sure was a few years ago when we deployed all our stuff and back then the sfp+ stuff was ridiculous -----Original Message----- From: Doug Hughes [mailto:doug at will.to] Sent: Sunday, March 29, 2015 09:04 AM To: Matthew Lagoe Cc: 'omnios-discuss' Subject: Re: [OmniOS-discuss] best or preferred 10g card for OmniOS On 3/29/2015 9:51 AM, Matthew Lagoe wrote: > The intel cards are nice but they don't have any cx4 cards so we don't > use them. Copper connections have less latency on short links then > fiber as you don't have the electric to optical conversion (when done > properly) > Do yourself a huge favor and go to the SFP+ direct attach stuff. Longer lengths, thinner cables, better cables, etc. The switches are becoming really inexpensive. (I ordered pair of the Intel X520 DA2 cards from all of the recommendations here. 
I should have them tomorrow) From richard.elling at richardelling.com Mon Mar 30 20:10:42 2015 From: richard.elling at richardelling.com (Richard Elling) Date: Mon, 30 Mar 2015 13:10:42 -0700 Subject: [OmniOS-discuss] kernel panic "kernel heap corruption detected" when creating zero eager disks In-Reply-To: References: <847C1AB3-A505-4A73-B5E7-F3EA2BD41577@omniti.com> <82CC383C-71A1-425D-89E6-B7B725071183@omniti.com> <67c125eb-75c2-48ae-89fb-6a8df288d2d4@careyweb.com> <20150325195212.5a8cebe4@sleipner.datanom.net> <07B1E231-B571-46BC-913D-445DBCA54923@omniti.com> <7CBB4FDC-03CF-422F-889A-B867FFA68FB7@omniti.com> <294BC7DA-0A85-4662-B4B2-71C94755A30A@omniti.com> Message-ID: <7F1FFBA7-3F56-4A16-8189-CBB8B6F7EE79@RichardElling.com> On Mar 26, 2015, at 11:24 PM, wuffers wrote: >> >> So here's what I will attempt to test: >> - Create thin vmdk @ 10TB with vSphere fat client: PASS >> - Create lazy zeroed vmdk @ 10 TB with vSphere fat client: PASS >> - Create eager zeroed vmdk @ 10 TB with vSphere web client: PASS! (took 1 hour) >> - Create thin vmdk @ 10TB with vSphere web client: PASS >> - Create lazy zeroed vmdk @ 10 TB with vSphere web client: PASS > > Additionally, I tried: > - Create fixed vhdx @ 10TB with SCVMM (Hyper-V): PASS (most likely no primitives in use here - this took slightly over 3 hours) is compression enabled? -- richard > > Everything passed (which I didn't expect, especially the 10TB eager zero).. then I tried again on the vSphere web client for a 20TB eager zero disk, and I got another kernel panic altogether (no kmem_flags 0xf set, unfortunately). > > Mar 27 2015 01:09:33.664060000 2e2305c2-54b5-c1f4-aafd-fb1eccc982dd SUNOS-8000-KL > > TIME CLASS ENA > Mar 27 01:09:33.6307 ireport.os.sunos.panic.dump_available 0x0000000000000000 > Mar 27 01:08:30.6688 ireport.os.sunos.panic.dump_pending_on_device 0x0000000000000000 > > nvlist version: 0 > version = 0x0 > class = list.suspect > uuid = 2e2305c2-54b5-c1f4-aafd-fb1eccc982dd > code = SUNOS-8000-KL > diag-time = 1427432973 633746 > de = fmd:///module/software-diagnosis > fault-list-sz = 0x1 > fault-list = (array of embedded nvlists) > (start fault-list[0]) > nvlist version: 0 > version = 0x0 > class = defect.sunos.kernel.panic > certainty = 0x64 > asru = sw:///:path=/var/crash/unknown/.2e2305c2-54b5-c1f4-aafd-fb1eccc982dd > resource = sw:///:path=/var/crash/unknown/.2e2305c2-54b5-c1f4-aafd-fb1eccc982dd > savecore-succcess = 1 > dump-dir = /var/crash/unknown > dump-files = vmdump.2 > os-instance-uuid = 2e2305c2-54b5-c1f4-aafd-fb1eccc982dd > panicstr = BAD TRAP: type=d (#gp General protection) rp=ffffff01eb72ea70 addr=0 > panicstack = unix:real_mode_stop_cpu_stage2_end+9e23 () | unix:trap+a30 () | unix:cmntrap+e6 () | genunix:anon_decref+35 () | genunix:anon_free+74 () | genunix:segvn_free+242 () | genunix:seg_free+30 () | genunix:segvn_unmap+cde () | genunix:as_free+e7 () | genunix:relvm+220 () | genunix:proc_exit+454 () | genunix:exit+15 () | genunix:rexit+18 () | unix:brand_sys_sysenter+1c9 () | > crashtime = 1427431421 > panic-time = Fri Mar 27 00:43:41 2015 EDT > (end fault-list[0]) > > fault-status = 0x1 > severity = Major > __ttl = 0x1 > __tod = 0x5514e60d 0x2794c060 > > Crash file: > https://drive.google.com/file/d/0B7mCJnZUzJPKT0lpTW9GZFJCLTg/view?usp=sharing > > It appears I can do thin and lazy zero disks of those sizes, so I will have to be satisfied to use those options as a workaround (plus disabling WRITE_SAME from the hosts if I really wanted the eager zeroed disk) until some of that Nexenta COMSTAR love is 
upstreamed. For comparison sake, provisioning a 10TB fixed vhdx took approximately 3 hours in Hyper-V, while the same provisioning in VMware took about 1 hour. So we can say that WRITE_SAME accelerated the same job by 3x. > > _______________________________________________ > OmniOS-discuss mailing list > OmniOS-discuss at lists.omniti.com > http://lists.omniti.com/mailman/listinfo/omnios-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From moo at wuffers.net Mon Mar 30 20:16:53 2015 From: moo at wuffers.net (wuffers) Date: Mon, 30 Mar 2015 16:16:53 -0400 Subject: [OmniOS-discuss] kernel panic "kernel heap corruption detected" when creating zero eager disks In-Reply-To: <7F1FFBA7-3F56-4A16-8189-CBB8B6F7EE79@RichardElling.com> References: <847C1AB3-A505-4A73-B5E7-F3EA2BD41577@omniti.com> <82CC383C-71A1-425D-89E6-B7B725071183@omniti.com> <67c125eb-75c2-48ae-89fb-6a8df288d2d4@careyweb.com> <20150325195212.5a8cebe4@sleipner.datanom.net> <07B1E231-B571-46BC-913D-445DBCA54923@omniti.com> <7CBB4FDC-03CF-422F-889A-B867FFA68FB7@omniti.com> <294BC7DA-0A85-4662-B4B2-71C94755A30A@omniti.com> <7F1FFBA7-3F56-4A16-8189-CBB8B6F7EE79@RichardElling.com> Message-ID: <3E786626-25FC-48C8-9F9E-750BEEA9A7FA@wuffers.net> > On Mar 30, 2015, at 4:10 PM, Richard Elling wrote: > > > is compression enabled? > > > -- richard >> Yes, LZ4. Dedupe off. From richard.elling at richardelling.com Mon Mar 30 23:56:37 2015 From: richard.elling at richardelling.com (Richard Elling) Date: Mon, 30 Mar 2015 16:56:37 -0700 Subject: [OmniOS-discuss] kernel panic "kernel heap corruption detected" when creating zero eager disks In-Reply-To: <3E786626-25FC-48C8-9F9E-750BEEA9A7FA@wuffers.net> References: <847C1AB3-A505-4A73-B5E7-F3EA2BD41577@omniti.com> <82CC383C-71A1-425D-89E6-B7B725071183@omniti.com> <67c125eb-75c2-48ae-89fb-6a8df288d2d4@careyweb.com> <20150325195212.5a8cebe4@sleipner.datanom.net> <07B1E231-B571-46BC-913D-445DBCA54923@omniti.com> <7CBB4FDC-03CF-422F-889A-B867FFA68FB7@omniti.com> <294BC7DA-0A85-4662-B4B2-71C94755A30A@omniti.com> <7F1FFBA7-3F56-4A16-8189-CBB8B6F7EE79@RichardElling.com> <3E786626-25FC-48C8-9F9E-750BEEA9A7FA@wuffers.net> Message-ID: <798CF5FF-0260-4F7A-9115-3C37D23E5230@richardelling.com> > On Mar 30, 2015, at 1:16 PM, wuffers wrote: > > >> On Mar 30, 2015, at 4:10 PM, Richard Elling wrote: >> >> >> is compression enabled? >> >> >> -- richard >>> > > Yes, LZ4. Dedupe off. Ironically, WRITE_SAME is the perfect workload for dedup :-) -- richard From danmcd at omniti.com Tue Mar 31 02:04:33 2015 From: danmcd at omniti.com (Dan McDonald) Date: Mon, 30 Mar 2015 22:04:33 -0400 Subject: [OmniOS-discuss] A reminder about r151010 Message-ID: Once the release of r151014 hits the streets, the r151010 release becomes unsupported. Please migrate your 010 box to either 012 or 014. We DO support upgrades from 010 to 014. Modulo any odd packages that place constraints (and there are some in ms.omniti.com which do), 010 to 014 is a clean upgrade if you follow the (not yet published) r151014 upgrade instructions. Thank you! Dan From matej at zunaj.si Tue Mar 31 12:08:01 2015 From: matej at zunaj.si (Matej Zerovnik) Date: Tue, 31 Mar 2015 14:08:01 +0200 Subject: [OmniOS-discuss] iSCSI target hang, no way to restart but server reboot In-Reply-To: References: <5515562B.9090900@zunaj.si> <55156F23.2010300@zunaj.si> <17BBA672-FDF1-4F36-8011-79212B0EFD97@omniti.com> <55157137.2010909@zunaj.si> <9E6B85F8-7454-4DE2-87AF-28A5BAAF6A33@omniti.com> Message-ID: <551A8E21.401@zunaj.si> On 27. 03. 
2015 16:13, Narayan Desai wrote: > Having been on the receiving end of similar advice, it is a > frustrating situation to be in, since you have (and will likely > continue to have) the hardware in production, without much option for > replacement. > > When we had systems like this, we had a lot of success being > aggressive in swapping out disks that were showing signs of going bad, > even before critical failures occurred. Also looking at SMART > statistics, and aggressively replacing those as well. This made the > situation manageable. Basically, having sata drives in sas expanders > means the system is brittle, and you should treat it as such. Look for: > - errors in iostat -En > - high service times in iostat -xnz > - smartctl (this causes harmless sense messages when devices are > probed, but it is easy enough to ignore these) > - any errors reported out of lsiutil, showing either problems with > cabling/enclosures, or devices > - decode any sense errors reported by the lsi driver > > Aggressively replace devices implicated by these, and hope for the > best. The best may or may not be what you're hoping for, but may be > livable; it was for us. > When errors happened to you, were you able to use the pool itself and only iscsi target froze or did you have troubles with the pool itself as well... Because on our end, when iscsi target freezes, zpool is perfectly ok. We can access it and use it locally, but iscsi target is frozen and can't be restarted. I will check my sistem with iostat and smartctl, but we are using seagate drives, so some of the smartctl stats are useless on 1st sight:) Matej From narayan.desai at gmail.com Tue Mar 31 12:54:37 2015 From: narayan.desai at gmail.com (Narayan Desai) Date: Tue, 31 Mar 2015 05:54:37 -0700 Subject: [OmniOS-discuss] iSCSI target hang, no way to restart but server reboot In-Reply-To: <551A8E21.401@zunaj.si> References: <5515562B.9090900@zunaj.si> <55156F23.2010300@zunaj.si> <17BBA672-FDF1-4F36-8011-79212B0EFD97@omniti.com> <55157137.2010909@zunaj.si> <9E6B85F8-7454-4DE2-87AF-28A5BAAF6A33@omniti.com> <551A8E21.401@zunaj.si> Message-ID: We were primarily using the machines for serving iscsi to VMs, and we'd see bad cascading failures (iscsi lun timeouts would cause the watchdog to kick in on the linux hosts, resetting the initiator, meanwhile the VM would decide that the virtio devices in the VM were dead, requiring a client reboot). In some cases, the problems would happen across all luns, in others it would be just particular luns. I assume this followed the severity of the situation with the failing drive (or number of failing drives before got aggressive about replacement). Similarly, we'd see a range of behaviors with local pool commands, ranging from everything looking alright to zpool commands hanging or running *extremely* slowly. I'd hacked up some quick scripts to correlate info from the different sources. They are here: https://github.com/narayandesai/diy-lsi They may or may not be portable, but demonstrate all of the info gathering methods we found useful. Another thing that was useful was maintaining a pool inventory (stored somewhere else) with device addresses, serial numbers, and jbod bay mappings. Having to map that you when things are falling apart is seriously sad times. fwiw, you might still be ok with seagate drives; we were only using the self-check predictive failure flag, as opposed to anything more complicated. good luck -nld On Tue, Mar 31, 2015 at 5:08 AM, Matej Zerovnik wrote: > > On 27. 03. 
2015 16:13, Narayan Desai wrote: > >> Having been on the receiving end of similar advice, it is a frustrating >> situation to be in, since you have (and will likely continue to have) the >> hardware in production, without much option for replacement. >> >> When we had systems like this, we had a lot of success being aggressive >> in swapping out disks that were showing signs of going bad, even before >> critical failures occurred. Also looking at SMART statistics, and >> aggressively replacing those as well. This made the situation manageable. >> Basically, having sata drives in sas expanders means the system is brittle, >> and you should treat it as such. Look for: >> - errors in iostat -En >> - high service times in iostat -xnz >> - smartctl (this causes harmless sense messages when devices are probed, >> but it is easy enough to ignore these) >> - any errors reported out of lsiutil, showing either problems with >> cabling/enclosures, or devices >> - decode any sense errors reported by the lsi driver >> >> Aggressively replace devices implicated by these, and hope for the best. >> The best may or may not be what you're hoping for, but may be livable; it >> was for us. >> >> When errors happened to you, were you able to use the pool itself and > only iscsi target froze or did you have troubles with the pool itself as > well... > > Because on our end, when iscsi target freezes, zpool is perfectly ok. We > can access it and use it locally, but iscsi target is frozen and can't be > restarted. > > I will check my sistem with iostat and smartctl, but we are using seagate > drives, so some of the smartctl stats are useless on 1st sight:) > > Matej > -------------- next part -------------- An HTML attachment was scrubbed... URL: From pasztor at sagv5.gyakg.u-szeged.hu Mon Mar 23 21:19:53 2015 From: pasztor at sagv5.gyakg.u-szeged.hu (=?iso-8859-2?Q?P=C1SZTOR_Gy=F6rgy?=) Date: Mon, 23 Mar 2015 21:19:53 -0000 Subject: [OmniOS-discuss] A warning for upgraders with large numbers of BEs In-Reply-To: <8B8D8D0F-35B5-42F8-B1FE-CD8EA002EE06@omniti.com> References: <8B8D8D0F-35B5-42F8-B1FE-CD8EA002EE06@omniti.com> Message-ID: <20150323205308.GA21991@linux.gyakg.u-szeged.hu> Hi, "Dan McDonald" wrote at 2015-03-23 16:14: > Soon r151014 will be hitting the streets. WHEN THAT DOES, I have to warn people, especially those jumping from r151006 to r151014 about a known issue in grub. > > The illumos grub has serious memory management issues. It cannot cope with too many boot environment (BE) entries. Sorry for semi-offtopicing the thread, but: Will the lx brand be restored in the upcoming release? Is there a feature map / release plan / anything available? I tried to find information regarding this topic without success. I checked this url: http://omnios.omniti.com/roadmap.php But nothing relevant information was there. It seems outdated / unmaintained. I've just recently find this distro. I used openindiana since Oracle... -- Did what they did to opensolaris -- So, I'm new here, sorry for lame questions. Kind regards, Gy?rgy P?sztor