[OmniOS-discuss] 802.3ad/LACP weirdness
George-Cristian Bîrzan
gc at birzan.org
Sat Jun 22 03:28:50 EDT 2013
We set up a machine the other day and had a very weird problem when trying
to aggregate a couple of NICs. The machine has 4*10GigE ports (2*2 cards)
but we only connected two to a Force10 switch. We enabled LACP and the port
channel was created on the switch, but we then spent the next 4 hours
trying to get ARP requests to work.
It turns out that, with our config, broadcast frames can be sent over
interfaces that are down. We have not seen any unicast traffic doing this,
and we're not doing any multicast. I'm not 100% sure whether an L4 policy
looks at the IP requested in the ARP frame, but we tested with ~30 addresses
and the requests all went out over the same interface, even with L4. It can
also happen if an interface goes down after the aggr interface was created:
we shut down the two operational interfaces one by one, and ARP requests
were still trying to go over the one that was down.
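To make the expectation concrete, here is a toy model of hash-based port selection. It is an assumption for illustration only, not the illumos aggr code: `pick_port`, the field names, and the second target address 10.0.64.106 are made up for this sketch. It assumes the sender hashes only the header fields named by the policy and chooses among attached ports.

```python
import zlib

# Port names and attach state as shown by dladm in this post.
PORTS = [("ixgbe0", False), ("ixgbe1", True),
         ("ixgbe2", False), ("ixgbe3", True)]

def pick_port(ports, frame, policy):
    """Pick an output port for `frame` among the attached ports.

    ports:  list of (name, attached) tuples
    frame:  dict of header fields; fields the frame lacks are skipped
    policy: names of the header fields that feed the hash
    """
    usable = [name for name, attached in ports if attached]
    if not usable:
        return None  # no attached port, so nothing should be sent
    key = "|".join(str(frame[f]) for f in policy if f in frame)
    return usable[zlib.crc32(key.encode()) % len(usable)]

# Two ARP requests for different targets. ARP carries no TCP/UDP ports,
# so an L4-only policy has nothing to hash in this model.
arp_a = {"src_mac": "90:e2:ba:3f:d2:38", "dst_mac": "ff:ff:ff:ff:ff:ff",
         "target_ip": "10.0.64.105"}
arp_b = dict(arp_a, target_ip="10.0.64.106")  # hypothetical second target

L4 = ("src_port", "dst_port")
print(pick_port(PORTS, arp_a, L4), pick_port(PORTS, arp_b, L4))
```

Under these assumptions every ARP request lands on the same port regardless of the requested IP, which would match what we saw with ~30 addresses, but a down port such as ixgbe0 should never be chosen.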
Our setup is a bit more than that: on top of the aggr1 interface we have
two VLANs (and the problem manifests itself on both). I cannot test right
now, as the machine is in production, but we'll be getting an identical
setup on Monday to play with for a bit. Anyway, here is some of the stuff I
looked at (I was talking to esproul in the IRC channel):
dladm show-aggr -si while trying to get the MAC of an IP with all 4 ifaces
in (I removed the irrelevant ones so the headers match :) ):
LINK    PORT    IPACKETS  RBYTES  OPACKETS  OBYTES  IPKTDIST  OPKTDIST
aggr1   --      6521      2804    2         294     --        --
--      ixgbe0  0         0       0         46      0.0       0.0
--      ixgbe1  5616      252     1         124     86.1      50.0
--      ixgbe2  0         0       0         0       0.0       0.0
--      ixgbe3  905       2552    1         124     13.9      50.0
dladm show-aggr -x:
LINK    PORT    SPEED    DUPLEX   STATE  ADDRESS            PORTSTATE
aggr1   --      10000Mb  full     up     90:e2:ba:3f:d2:38  --
        ixgbe0  0Mb      unknown  down   90:e2:ba:3f:d2:38  standby
        ixgbe1  10000Mb  full     up     90:e2:ba:3f:d2:39  attached
        ixgbe2  0Mb      unknown  down   90:e2:ba:3f:d0:50  standby
        ixgbe3  10000Mb  full     up     90:e2:ba:3f:d0:51  attached
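A quick way to cross-check the two dladm outputs is to flag any port that is down but still shows output counters. This is a minimal sketch: the per-port rows are pasted in from the outputs in this post, and `tx_on_down_ports` is a made-up helper name.

```python
# Per-port rows from `dladm show-aggr -s` (LINK PORT IPACKETS RBYTES
# OPACKETS OBYTES IPKTDIST OPKTDIST), copied from this post.
STATS = """\
-- ixgbe0 0 0 0 46 0.0 0.0
-- ixgbe1 5616 252 1 124 86.1 50.0
-- ixgbe2 0 0 0 0 0.0 0.0
-- ixgbe3 905 2552 1 124 13.9 50.0
"""

# Per-port rows from `dladm show-aggr -x` (PORT SPEED DUPLEX STATE
# ADDRESS PORTSTATE), copied from this post.
STATES = """\
ixgbe0 0Mb unknown down 90:e2:ba:3f:d2:38 standby
ixgbe1 10000Mb full up 90:e2:ba:3f:d2:39 attached
ixgbe2 0Mb unknown down 90:e2:ba:3f:d0:50 standby
ixgbe3 10000Mb full up 90:e2:ba:3f:d0:51 attached
"""

def tx_on_down_ports(stats_text, states_text):
    """Return (port, opackets, obytes) for down ports with output counters."""
    link_state = {}
    for line in states_text.splitlines():
        port, _speed, _duplex, state, _mac, _port_state = line.split()
        link_state[port] = state
    suspects = []
    for line in stats_text.splitlines():
        _link, port, _ipkts, _rbytes, opkts, obytes = line.split()[:6]
        if link_state[port] == "down" and (int(opkts) or int(obytes)):
            suspects.append((port, int(opkts), int(obytes)))
    return suspects

print(tx_on_down_ports(STATS, STATES))  # [('ixgbe0', 0, 46)]
```

With the numbers above, only ixgbe0 is flagged: it is down and in standby, yet shows 46 output bytes.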
A tcpdump running on ixgbe0 at the same time:
# /opt/omni/sbin/tcpdump -ni ixgbe0 ether host 90:e2:ba:3f:d2:38
tcpdump: WARNING: SIOCGIFADDR: ixgbe0: No such device or address
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on ixgbe0, link-type EN10MB (Ethernet), capture size 65535 bytes
14:42:30.025630 ARP, Request who-has 10.0.64.105 (ff:ff:ff:ff:ff:ff) tell 10.0.64.131, length 28
14:42:30.520063 ARP, Request who-has 10.0.64.105 (ff:ff:ff:ff:ff:ff) tell 10.0.64.131, length 28
14:42:31.020049 ARP, Request who-has 10.0.64.105 (ff:ff:ff:ff:ff:ff) tell 10.0.64.131, length 28
14:42:32.020122 ARP, Request who-has 10.0.64.105 (ff:ff:ff:ff:ff:ff) tell 10.0.64.131, length 28
14:42:32.520041 ARP, Request who-has 10.0.64.105 (ff:ff:ff:ff:ff:ff) tell 10.0.64.131, length 28
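For longer captures, the ARP requests can be tallied per target and sender. The sketch below assumes tcpdump text in the format shown above (it tolerates the line wrapping of the mail archive); `summarize_arp_requests` is a made-up helper name.

```python
import re
from collections import Counter

# The capture from this post, wrapped as it appears in the archive.
CAPTURE = """\
14:42:30.025630 ARP, Request who-has 10.0.64.105 (ff:ff:ff:ff:ff:ff) tell
10.0.64.131, length 28
14:42:30.520063 ARP, Request who-has 10.0.64.105 (ff:ff:ff:ff:ff:ff) tell
10.0.64.131, length 28
14:42:31.020049 ARP, Request who-has 10.0.64.105 (ff:ff:ff:ff:ff:ff) tell
10.0.64.131, length 28
14:42:32.020122 ARP, Request who-has 10.0.64.105 (ff:ff:ff:ff:ff:ff) tell
10.0.64.131, length 28
14:42:32.520041 ARP, Request who-has 10.0.64.105 (ff:ff:ff:ff:ff:ff) tell
10.0.64.131, length 28
"""

# \s+ around "tell" lets the pattern match across a wrapped line.
ARP_REQ = re.compile(
    r"who-has ([\d.]+) \(([0-9a-f:]+)\)\s+tell\s+([\d.]+)")

def summarize_arp_requests(text):
    """Count ARP requests per (target IP, destination MAC, sender IP)."""
    return Counter(ARP_REQ.findall(text))

for (target, dst_mac, sender), count in summarize_arp_requests(CAPTURE).items():
    print(f"{count} request(s) for {target} from {sender} to {dst_mac}")
```

Here all five frames are the same broadcast request for 10.0.64.105 from 10.0.64.131, seen on an interface that is down.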
Since the machine will mostly be accessed from other hosts, not the other
way around, this won't really be an issue: if an external machine sends an
ARP request, the reply will go over unicast and will be on the correct
interface (i.e. one that is up), and we can just add static ARP entries
(for the log host and default gateway). Going forward, though, this might be
quite a problem, and I wanted to know if:
a) I did something stupid (quite likely, I have absolutely no experience
with OmniOS or Solaris except for the two weeks spent playing around with
it)
b) it's a bug (I took a look at aggr_send.c and couldn't see anything
obviously wrong, and I cannot see why broadcast packets would be treated
differently)
c) anyone has seen this behaviour before
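The static ARP workaround mentioned above could be scripted roughly as follows. This is a sketch only: the IP and MAC values are placeholders (the MACs come from the documentation range), not the real log host or gateway, and `static_arp_commands` is a made-up helper that only builds `arp -s` command lines without running them.

```python
import re

MAC_RE = re.compile(r"(?:[0-9a-fA-F]{1,2}:){5}[0-9a-fA-F]{1,2}")

def static_arp_commands(entries):
    """Build one `arp -s <host> <mac>` command line per (host, mac) pair."""
    commands = []
    for host, mac in entries:
        if not MAC_RE.fullmatch(mac):
            raise ValueError(f"not a MAC address: {mac!r}")
        commands.append(f"arp -s {host} {mac}")
    return commands

# Placeholder entries; substitute the real addresses before use.
ENTRIES = [
    ("10.0.64.1", "00:00:5e:00:53:01"),  # hypothetical default gateway
    ("10.0.64.2", "00:00:5e:00:53:02"),  # hypothetical log host
]

for cmd in static_arp_commands(ENTRIES):
    print(cmd)
```

Entries added this way do not survive a reboot unless they are re-applied at boot.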
--
George-Cristian Bîrzan