[OmniOS-discuss] latent problem with networking

Doug Hughes doug at will.to
Thu Dec 18 18:30:39 UTC 2014


well, we figured it out..

it was pretty silly actually.. It looks like on this machine, at this
location, with the network/routing/route service left enabled, it was picking
up a *second* default route.. so some of the packets (seemingly ACKs and other
TCP activity -- somewhat important!) were ending up at this other router,
which belongs to a peer organization, and were not making it all the way to
the remote side under certain circumstances. Once that second default route
was removed, everything was fixed. It never affected ping, and my existing
ssh was working fine. I have no idea why this suddenly started causing a
problem!
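
For anyone who hits the same symptom, here is a rough sketch of how one might check for a duplicate default route. It assumes the routing table is printed in the usual illumos "netstat -rn" layout, where the destination column reads "default"; the function names are made up for illustration, and the gateway in the last comment is a placeholder, not our real one.

```shell
#!/bin/sh
# Hypothetical helpers: both read `netstat -rn` style output on stdin.
# Assumption: default routes show "default" in the first column.

# Print how many default routes appear in the routing table.
count_default_routes() {
    awk '$1 == "default" { n++ } END { print n + 0 }'
}

# Print the gateway of each default route, one per line.
list_default_gateways() {
    awk '$1 == "default" { print $2 }'
}

# Possible use on the affected host (not run by this sketch):
#   netstat -rn -f inet | count_default_routes
#   netstat -rn -f inet | list_default_gateways
#   route delete default <unwanted-gateway>
```

A count above 1 is only a hint that something is worth a closer look. Multiple default routes can be intentional, so check which gateway is the stray one before deleting anything.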

I'm glad it turned out to be something simple.


On Thu, Dec 18, 2014 at 1:21 PM, Dan McDonald <danmcd at omniti.com> wrote:
>
>
> > On Dec 18, 2014, at 11:26 AM, Doug Hughes <doug at will.to> wrote:
> >
> >
> > Here's the simplest test... I start up ttcp -r on the server, it binds
> > to port 5001, listening. I run snoop.. Then I try to connect to 5001 from
> > another machine. I see the packets in snoop, but the accept call on the
> > OmniOS machine never returns. Something seems wonky in network land. Has
> > anybody seen this? The machine has been up for weeks without any problems.
> >
> >   OmniOS v11 r151012
> >   Copyright 2014 OmniTI Computer Consulting, Inc. All rights reserved.
> >   Use is subject to license terms.
> >
> > Regular/plain Intel chipset:
> > e1000g0:
> > root@xyr-r:/root# dladm show-link e1000g0
> > LINK        CLASS     MTU    STATE    BRIDGE     OVER
> > e1000g0     phys      1500   up       --         --
> > root@xyr-r:/root# dladm show-phys e1000g0
> > LINK         MEDIA                STATE      SPEED  DUPLEX    DEVICE
> > e1000g0      Ethernet             up         1000   full      e1000g0
> >
> > e1000 prtdiag excerpt:
> >                     name='device-name' type=string items=1
> >                         value='82574L Gigabit Network Connection'
>
> I can't recall if this chipset has problems or not.  I want to say it
> might, BUT I'm not sure, so I won't point fingers.
>
> >                     name='subsystem-name' type=string items=1
> >                         value='unknown subsystem'
> >                 Device Minor Nodes:
> >                     dev=(112,1)
> >                         dev_path=/pci@0,0/pci8086,1d14@1c,2/pci122e,10d3@0:e1000g0
> > Ideas?
>
> If you've the disk space, please utter "savecore -L" while your machine is
> in this state.  It might be nice to have the system state while things are
> failing.
>
> Do you see any complaints from e1000g in /var/adm/messages?
>
> It's like the NIC or the driver stopped receiving packets.
>
> One thing you could do is unplumb and replumb the interface.  That may
> make the kernel reset the driver.
>
>         ifconfig e1000g0 unplumb
>         ifconfig e1000g0 plumb <addr/prefix> up
>
> If that doesn't work, you may also need to modunload the driver before
> replumbing.
>
>         ifconfig e1000g0 unplumb
>         modinfo | grep e1000g
>         modunload -i <number from modinfo line>
>         ifconfig e1000g0 plumb ....
>
> If modunload complains, you will need to unplumb the v6 interface
> ("ifconfig e1000g0 inet6 unplumb") or maybe disable some other services
> temporarily.
>
> Dan
>
>