[OmniOS-discuss] [zfs] Re: rpcbind: t_bind failed
Schweiss, Chip
chip@innovates.com
Wed Jan 3 18:55:09 UTC 2018
Hopefully the patch Marcel is talking about fixes this. I've at least
figured out enough to predict when the problem is imminent.
We have been migrating to the automounter instead of hard mounts, which
could be related to this problem growing over time.
Just an FYI: I've kept the server running in this state, but moved its
storage pool to a sister server. The port binding problem remains with NO
NFS clients connected, but neither pfiles nor lsof shows rpcbind as the
culprit:
# netstat -an|grep BOUND|wc -l
32739
# /opt/ozmt/bin/SunOS/lsof -i:41155
{nothing returned}
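As a side note, a brute-force way to find which process (if any) holds a
given port on illumos is to sweep /proc with pfiles. This is only a sketch,
using the example port above; note that pfiles briefly stops each process
it inspects, so run it with care:

    # Sweep every pid for the suspect port (41155 is just the example above)
    for pid in $(ls /proc); do
        pfiles "$pid" 2>/dev/null | grep -q 'port: 41155' &&
            echo "PID $pid holds port 41155"
    done

If that sweep comes up empty as well, the endpoints are presumably held in
the kernel rather than by any process.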
# pfiles `pgrep rpcbind`
449: /usr/sbin/rpcbind
Current rlimit: 65536 file descriptors
   0: S_IFCHR mode:0666 dev:527,0 ino:70778888 uid:0 gid:3 rdev:135,2
      O_RDWR
      /devices/pseudo/mm@0:null
      offset:0
   1: S_IFCHR mode:0666 dev:527,0 ino:70778888 uid:0 gid:3 rdev:135,2
      O_RDWR
      /devices/pseudo/mm@0:null
      offset:0
   2: S_IFCHR mode:0666 dev:527,0 ino:70778888 uid:0 gid:3 rdev:135,2
      O_RDWR
      /devices/pseudo/mm@0:null
      offset:0
   3: S_IFCHR mode:0000 dev:527,0 ino:61271 uid:0 gid:0 rdev:231,64
      O_RDWR
      sockname: AF_INET6 :: port: 111
      /devices/pseudo/udp6@0:udp6
      offset:0
   4: S_IFCHR mode:0000 dev:527,0 ino:50998 uid:0 gid:0 rdev:231,59
      O_RDWR
      sockname: AF_INET6 :: port: 0
      /devices/pseudo/udp6@0:udp6
      offset:0
   5: S_IFCHR mode:0000 dev:527,0 ino:61264 uid:0 gid:0 rdev:231,58
      O_RDWR
      sockname: AF_INET6 :: port: 60955
      /devices/pseudo/udp6@0:udp6
      offset:0
   6: S_IFCHR mode:0000 dev:527,0 ino:64334 uid:0 gid:0 rdev:224,57
      O_RDWR
      sockname: AF_INET6 :: port: 111
      /devices/pseudo/tcp6@0:tcp6
      offset:0
   7: S_IFCHR mode:0000 dev:527,0 ino:64333 uid:0 gid:0 rdev:224,56
      O_RDWR
      sockname: AF_INET6 :: port: 0
      /devices/pseudo/tcp6@0:tcp6
      offset:0
   8: S_IFCHR mode:0000 dev:527,0 ino:64332 uid:0 gid:0 rdev:230,55
      O_RDWR
      sockname: AF_INET 0.0.0.0 port: 111
      /devices/pseudo/udp@0:udp
      offset:0
   9: S_IFCHR mode:0000 dev:527,0 ino:64330 uid:0 gid:0 rdev:230,54
      O_RDWR
      sockname: AF_INET 0.0.0.0 port: 0
      /devices/pseudo/udp@0:udp
      offset:0
  10: S_IFCHR mode:0000 dev:527,0 ino:64331 uid:0 gid:0 rdev:230,53
      O_RDWR
      sockname: AF_INET 0.0.0.0 port: 60994
      /devices/pseudo/udp@0:udp
      offset:0
  11: S_IFCHR mode:0000 dev:527,0 ino:64327 uid:0 gid:0 rdev:223,52
      O_RDWR
      sockname: AF_INET 0.0.0.0 port: 111
      /devices/pseudo/tcp@0:tcp
      offset:0
  12: S_IFCHR mode:0000 dev:527,0 ino:64326 uid:0 gid:0 rdev:223,51
      O_RDWR
      sockname: AF_INET 0.0.0.0 port: 0
      /devices/pseudo/tcp@0:tcp
      offset:0
  13: S_IFCHR mode:0000 dev:527,0 ino:64324 uid:0 gid:0 rdev:226,32
      O_RDWR
      /devices/pseudo/tl@0:ticlts
      offset:0
  14: S_IFCHR mode:0000 dev:527,0 ino:64328 uid:0 gid:0 rdev:226,33
      O_RDWR
      /devices/pseudo/tl@0:ticlts
      offset:0
  15: S_IFCHR mode:0000 dev:527,0 ino:64324 uid:0 gid:0 rdev:226,35
      O_RDWR
      /devices/pseudo/tl@0:ticlts
      offset:0
  16: S_IFCHR mode:0000 dev:527,0 ino:64322 uid:0 gid:0 rdev:226,36
      O_RDWR
      /devices/pseudo/tl@0:ticotsord
      offset:0
  17: S_IFCHR mode:0000 dev:527,0 ino:64321 uid:0 gid:0 rdev:226,37
      O_RDWR
      /devices/pseudo/tl@0:ticotsord
      offset:0
  18: S_IFCHR mode:0000 dev:527,0 ino:64030 uid:0 gid:0 rdev:226,39
      O_RDWR
      /devices/pseudo/tl@0:ticots
      offset:0
  19: S_IFCHR mode:0000 dev:527,0 ino:64029 uid:0 gid:0 rdev:226,40
      O_RDWR
      /devices/pseudo/tl@0:ticots
      offset:0
  20: S_IFIFO mode:0000 dev:525,0 ino:206 uid:1 gid:12 rdev:0,0
      O_RDWR|O_NONBLOCK
  21: S_IFIFO mode:0000 dev:525,0 ino:206 uid:1 gid:12 rdev:0,0
      O_RDWR|O_NONBLOCK
  23: S_IFCHR mode:0000 dev:527,0 ino:33089 uid:0 gid:0 rdev:129,21273
      O_WRONLY FD_CLOEXEC
      /devices/pseudo/log@0:conslog
      offset:0
Restarting rpcbind doesn't affect it either:
# svcadm restart svc:/network/rpc/bind:default
# netstat -an|grep BOUND|wc -l
32739
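FWIW, since no user-level tool shows an owner for these ports, the leaked
endpoints presumably live in kernel state, so the kernel's own view may be
more telling than netstat's. Something along these lines should dump it
(just a sketch; ::netstat is mdb's kernel-level connection listing):

# echo ::netstat | mdb -k | less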
Until the patch gets integrated, I'll monitor the number of bound ports so
I know when to fail my pool over again.
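For example, a minimal check along these lines could run from cron; the
30000 threshold is just a guess based on the counts above, not a measured
limit:

#!/bin/sh
# Warn when the number of endpoints in BOUND state crosses a guessed threshold.
THRESHOLD=30000
COUNT=$(netstat -an | grep -c BOUND)
if [ "$COUNT" -gt "$THRESHOLD" ]; then
    echo "possible rpcmod port leak: $COUNT endpoints in BOUND state" |
        mailx -s "bound-port warning on $(hostname)" root
fi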
On Wed, Jan 3, 2018 at 10:32 AM, Marcel Telka <marcel@telka.sk> wrote:
> On Wed, Jan 03, 2018 at 10:02:43AM -0600, Schweiss, Chip wrote:
> > The problem occurred again starting last night. I have another clue, but I
> > still don't know how it is occurring or how to fix it.
> >
> > It looks like all the TCP ports are in "bound" state, but not being
> > released.
> >
> > How can I isolate the cause of this?
>
> This is a bug in rpcmod, very likely related to
> https://www.illumos.org/issues/1616
>
> I discussed this a few weeks back with someone who faced the same issue. It
> looks like he found the cause and has a fix for it. I thought he would post a
> review request, but for some reason that hasn't happened yet.
>
> I'll try to push this forward...
>
>
> Thanks.
>
> --
> +-------------------------------------------+
> | Marcel Telka   e-mail:   marcel@telka.sk  |
> |                homepage: http://telka.sk/ |
> |                jabber:   marcel@jabber.sk |
> +-------------------------------------------+