[OmniOS-discuss] ixgbe: breaking aggr on 10GbE X540-T2

Stephan Budach stephan.budach at JVM.DE
Wed May 11 16:32:31 UTC 2016


On 11.05.16 at 16:48, Dale Ghent wrote:
>> On May 11, 2016, at 7:36 AM, Stephan Budach <stephan.budach at JVM.DE> wrote:
>>
>> On 09.05.16 at 20:43, Dale Ghent wrote:
>>>> On May 9, 2016, at 2:04 PM, Stephan Budach <stephan.budach at JVM.DE> wrote:
>>>>
>>>> On 09.05.16 at 16:33, Dale Ghent wrote:
>>>>>> On May 9, 2016, at 8:24 AM, Stephan Budach <stephan.budach at JVM.DE> wrote:
>>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> I am seeing strange behaviour where OmniOS omnios-r151018-ae3141d breaks the LACP aggr link on different boxes when Intel X540-T2s are involved. It starts with a couple of link downs/ups on one port, and finally the link on that port negotiates to 1GbE instead of 10GbE, which then breaks the LACP channel on my Cisco Nexus for this connection.
>>>>>>
>>>>>> I have tried swapping and interchanging cables, and thus switch ports, but to no avail.
>>>>>>
>>>>>> Anyone else noticed this and even better… knows a solution to this?
>>>>> Was this an issue noticed only with r151018 and not with previous versions, or have you only tried this with 018?
>>>>>
>>>>> By your description, I presume that the two ixgbe physical links will stay at 10Gb and not bounce down to 1Gb if not LACP'd together?
>>>>>
>>>>> /dale
>>>> I have noticed that on prior versions of OmniOS as well, but we only recently started deploying 10GbE LACP bonds, when we introduced our Nexus gear to our network. I will have to check whether both links stay at 10GbE when not configured as an LACP bond. Let me check that tomorrow and report back. As we're heading for a stretched DC, we are mainly configuring 2-way LACP bonds over our Nexus gear, so we don't actually have any single 10GbE connections, as they all have to be connected to both DCs. This is achieved by using VPCs on our Nexus switches.
>>> Provide as much detail as you can: whether you're using hw flow control, whether both links act this way at the same time or independently, and so on. Problems like this often boil down to a very small and seemingly insignificant detail.
>>>
>>> I currently have ixgbe on the operating table for adding X550 support, so I can take a look at this; however I don't have your type of switches available to me so LACP-specific testing is something I can't do for you.
>>>
>>> /dale
>> I checked the ixgbe.conf files on each host and they are all still at the standard setting, which includes flow_control = 3;
> Ah, so you are using Ethernet flow control. Could you try disabling that on both sides (on the ixgbe host and on the switch) and see if that corrects the link stability issues? There's an outstanding issue with hw flow control on ixgbe regarding pause frame timing that you *might* be running into, which could manifest in the way you describe.
>
> /dale
>
I will try to get one node free of all services running on it, as I will 
have to reboot the system, since I will have to change the ixgbe.conf, 
won't I?
This is an RSF-1 host, so this will likely be done over the weekend.

Thanks,
Stephan
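
For anyone checking the same setting across several hosts, here is a minimal sketch (not from the thread) that reads the flow_control value from the text of an ixgbe.conf. The value meanings in the table are an assumption taken from the comments in the stock ixgbe.conf, and the function names are hypothetical; verify both against the file on your own host.

```python
import re

# Assumed meanings of flow_control values, per the comments in the stock
# ixgbe.conf. Check the file on your own host before relying on them.
FLOW_CONTROL_MODES = {
    0: "disabled",
    1: "receive only",
    2: "transmit only",
    3: "receive and transmit",
}


def read_flow_control(conf_text):
    """Return the last uncommented flow_control value in conf_text, or None."""
    value = None
    for line in conf_text.splitlines():
        line = line.split("#", 1)[0]  # drop everything after a comment marker
        match = re.search(r"\bflow_control\s*=\s*(\d+)\s*;", line)
        if match:
            value = int(match.group(1))
    return value


def describe_flow_control(conf_text):
    """Return a human-readable description of the configured flow_control."""
    value = read_flow_control(conf_text)
    if value is None:
        return "flow_control not set (driver default applies)"
    return FLOW_CONTROL_MODES.get(value, "unknown value %d" % value)


# Hypothetical excerpt: a commented-out line followed by the active setting.
sample = """
# flow_control = 0;
flow_control = 3;
"""
print(describe_flow_control(sample))
```

To use it on a real host, pass in the contents of the driver's ixgbe.conf (its exact path depends on the installation). The sketch only reports what the file says; it does not tell you what the running driver or the switch has actually negotiated.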