[OmniOS-discuss] [zfs] Re: rpcbind: t_bind failed
Schweiss, Chip
chip@innovates.com
Wed Jan 3 18:55:09 UTC 2018
Hopefully the patch Marcel is talking about fixes this. I've at least
figured out enough to predict when the problem is imminent.
We have been migrating to the automounter instead of hard mounts, which
could be related to this problem growing over time.
Just an FYI: I've kept the server running in this state, but moved its
storage pool to a sister server. The port binding problem remains with NO
NFS clients connected, but neither pfiles nor lsof shows rpcbind as the
culprit:
# netstat -an|grep BOUND|wc -l
32739
# /opt/ozmt/bin/SunOS/lsof -i:41155
{nothing returned}
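As a side note, a brute-force way to find which process (if any) holds a
given port on illumos is to sweep /proc with pfiles. This is only a sketch,
using the example port above; note that pfiles briefly stops each process
it inspects, so run it with care:

    # Sweep every pid for the suspect port (41155 is just the example above)
    for pid in $(ls /proc); do
        pfiles "$pid" 2>/dev/null | grep -q 'port: 41155' &&
            echo "PID $pid holds port 41155"
    done

If that sweep comes up empty as well, the endpoints are presumably held in
the kernel rather than by any process.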
# pfiles `pgrep rpcbind`
449: /usr/sbin/rpcbind
Current rlimit: 65536 file descriptors
   0: S_IFCHR mode:0666 dev:527,0 ino:70778888 uid:0 gid:3 rdev:135,2
      O_RDWR
      /devices/pseudo/mm@0:null
      offset:0
   1: S_IFCHR mode:0666 dev:527,0 ino:70778888 uid:0 gid:3 rdev:135,2
      O_RDWR
      /devices/pseudo/mm@0:null
      offset:0
   2: S_IFCHR mode:0666 dev:527,0 ino:70778888 uid:0 gid:3 rdev:135,2
      O_RDWR
      /devices/pseudo/mm@0:null
      offset:0
   3: S_IFCHR mode:0000 dev:527,0 ino:61271 uid:0 gid:0 rdev:231,64
      O_RDWR
      sockname: AF_INET6 :: port: 111
      /devices/pseudo/udp6@0:udp6
      offset:0
   4: S_IFCHR mode:0000 dev:527,0 ino:50998 uid:0 gid:0 rdev:231,59
      O_RDWR
      sockname: AF_INET6 :: port: 0
      /devices/pseudo/udp6@0:udp6
      offset:0
   5: S_IFCHR mode:0000 dev:527,0 ino:61264 uid:0 gid:0 rdev:231,58
      O_RDWR
      sockname: AF_INET6 :: port: 60955
      /devices/pseudo/udp6@0:udp6
      offset:0
   6: S_IFCHR mode:0000 dev:527,0 ino:64334 uid:0 gid:0 rdev:224,57
      O_RDWR
      sockname: AF_INET6 :: port: 111
      /devices/pseudo/tcp6@0:tcp6
      offset:0
   7: S_IFCHR mode:0000 dev:527,0 ino:64333 uid:0 gid:0 rdev:224,56
      O_RDWR
      sockname: AF_INET6 :: port: 0
      /devices/pseudo/tcp6@0:tcp6
      offset:0
   8: S_IFCHR mode:0000 dev:527,0 ino:64332 uid:0 gid:0 rdev:230,55
      O_RDWR
      sockname: AF_INET 0.0.0.0 port: 111
      /devices/pseudo/udp@0:udp
      offset:0
   9: S_IFCHR mode:0000 dev:527,0 ino:64330 uid:0 gid:0 rdev:230,54
      O_RDWR
      sockname: AF_INET 0.0.0.0 port: 0
      /devices/pseudo/udp@0:udp
      offset:0
  10: S_IFCHR mode:0000 dev:527,0 ino:64331 uid:0 gid:0 rdev:230,53
      O_RDWR
      sockname: AF_INET 0.0.0.0 port: 60994
      /devices/pseudo/udp@0:udp
      offset:0
  11: S_IFCHR mode:0000 dev:527,0 ino:64327 uid:0 gid:0 rdev:223,52
      O_RDWR
      sockname: AF_INET 0.0.0.0 port: 111
      /devices/pseudo/tcp@0:tcp
      offset:0
  12: S_IFCHR mode:0000 dev:527,0 ino:64326 uid:0 gid:0 rdev:223,51
      O_RDWR
      sockname: AF_INET 0.0.0.0 port: 0
      /devices/pseudo/tcp@0:tcp
      offset:0
  13: S_IFCHR mode:0000 dev:527,0 ino:64324 uid:0 gid:0 rdev:226,32
      O_RDWR
      /devices/pseudo/tl@0:ticlts
      offset:0
  14: S_IFCHR mode:0000 dev:527,0 ino:64328 uid:0 gid:0 rdev:226,33
      O_RDWR
      /devices/pseudo/tl@0:ticlts
      offset:0
  15: S_IFCHR mode:0000 dev:527,0 ino:64324 uid:0 gid:0 rdev:226,35
      O_RDWR
      /devices/pseudo/tl@0:ticlts
      offset:0
  16: S_IFCHR mode:0000 dev:527,0 ino:64322 uid:0 gid:0 rdev:226,36
      O_RDWR
      /devices/pseudo/tl@0:ticotsord
      offset:0
  17: S_IFCHR mode:0000 dev:527,0 ino:64321 uid:0 gid:0 rdev:226,37
      O_RDWR
      /devices/pseudo/tl@0:ticotsord
      offset:0
  18: S_IFCHR mode:0000 dev:527,0 ino:64030 uid:0 gid:0 rdev:226,39
      O_RDWR
      /devices/pseudo/tl@0:ticots
      offset:0
  19: S_IFCHR mode:0000 dev:527,0 ino:64029 uid:0 gid:0 rdev:226,40
      O_RDWR
      /devices/pseudo/tl@0:ticots
      offset:0
  20: S_IFIFO mode:0000 dev:525,0 ino:206 uid:1 gid:12 rdev:0,0
      O_RDWR|O_NONBLOCK
  21: S_IFIFO mode:0000 dev:525,0 ino:206 uid:1 gid:12 rdev:0,0
      O_RDWR|O_NONBLOCK
  23: S_IFCHR mode:0000 dev:527,0 ino:33089 uid:0 gid:0 rdev:129,21273
      O_WRONLY FD_CLOEXEC
      /devices/pseudo/log@0:conslog
      offset:0
Restarting rpcbind doesn't affect it either:
# svcadm restart svc:/network/rpc/bind:default
# netstat -an|grep BOUND|wc -l
32739
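FWIW, since no user-level tool shows an owner for these ports, the leaked
endpoints presumably live in kernel state, so the kernel's own view may be
more telling than netstat's. Something along these lines should dump it
(just a sketch; ::netstat is mdb's kernel-level connection listing):

# echo ::netstat | mdb -k | less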
Until the patch gets integrated, I'll monitor the number of bound ports so
I know when to fail my pool over again.
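For example, a minimal check along these lines could run from cron; the
30000 threshold is just a guess based on the counts above, not a measured
limit:

#!/bin/sh
# Warn when the number of endpoints in BOUND state crosses a guessed threshold.
THRESHOLD=30000
COUNT=$(netstat -an | grep -c BOUND)
if [ "$COUNT" -gt "$THRESHOLD" ]; then
    echo "possible rpcmod port leak: $COUNT endpoints in BOUND state" |
        mailx -s "bound-port warning on $(hostname)" root
fi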
On Wed, Jan 3, 2018 at 10:32 AM, Marcel Telka <marcel@telka.sk> wrote:
> On Wed, Jan 03, 2018 at 10:02:43AM -0600, Schweiss, Chip wrote:
> > The problem occurred again starting last night. I have another clue, but I
> > still don't know how it is occurring or how to fix it.
> >
> > It looks like all the TCP ports are in "bound" state, but not being
> > released.
> >
> > How can I isolate the cause of this?
>
> This is a bug in rpcmod, very likely related to
> https://www.illumos.org/issues/1616
>
> I discussed this a few weeks back with someone who faced the same issue. It
> looks like he found the cause and has a fix for it. I thought he would post a
> review request, but for some reason that hasn't happened yet.
>
> I'll try to push this forward...
>
>
> Thanks.
>
> --
> +-------------------------------------------+
> | Marcel Telka   e-mail:   marcel@telka.sk  |
> |                homepage: http://telka.sk/ |
> |                jabber:   marcel@jabber.sk |
> +-------------------------------------------+