[OmniOS-discuss] [zfs] Re: rpcbind: t_bind failed
Youzhong Yang
youzhong at gmail.com
Mon Jan 8 16:43:24 UTC 2018
This is our patch. It was applied 3 years ago, so the line numbers may
differ in the latest version of the file.
diff --git a/usr/src/uts/common/rpc/clnt_cots.c b/usr/src/uts/common/rpc/clnt_cots.c
index 4466e93..0a0951d 100644
--- a/usr/src/uts/common/rpc/clnt_cots.c
+++ b/usr/src/uts/common/rpc/clnt_cots.c
@@ -2285,6 +2285,7 @@ start_retry_loop:
 	if (rpcerr->re_status == RPC_SUCCESS)
 		rpcerr->re_status = RPC_XPRTFAILED;
 	cm_entry->x_connected = FALSE;
+	cm_entry->x_dead = TRUE;
 } else
 	cm_entry->x_connected = connected;
@@ -2403,6 +2404,7 @@ connmgr_wrapconnect(
 	if (rpcerr->re_status == RPC_SUCCESS)
 		rpcerr->re_status = RPC_XPRTFAILED;
 	cm_entry->x_connected = FALSE;
+	cm_entry->x_dead = TRUE;
 } else
 	cm_entry->x_connected = connected;
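
For anyone not familiar with this code, here is a minimal user-space sketch of why the extra flag matters. It assumes, as the older e-mail quoted below suggests, that an entry is only reclaimed once it is marked dead. Every demo_* name is hypothetical and the structure is a simplified stand-in, not the real struct cm_xprt or the kernel's reclaim logic.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for a connection-manager entry (not struct cm_xprt). */
struct demo_entry {
	struct demo_entry *next;
	int port;		/* reserved port the entry is bound to */
	bool connected;
	bool ordrel;		/* orderly release seen from the peer */
	bool dead;		/* entry may be reclaimed */
};

/* Prepend a new entry; returns it, or NULL if allocation fails. */
static struct demo_entry *
demo_add(struct demo_entry **head, int port)
{
	struct demo_entry *e = calloc(1, sizeof (*e));

	if (e == NULL)
		return (NULL);
	e->port = port;
	e->next = *head;
	*head = e;
	return (e);
}

/* Same shape as the patched branch: a failed connect marks the entry dead. */
static void
demo_connect_result(struct demo_entry *e, bool connected, bool apply_patch)
{
	if (!connected) {
		e->connected = false;
		if (apply_patch)
			e->dead = true;
	} else {
		e->connected = true;
	}
}

/* Free every entry flagged dead; returns how many ports were released. */
static int
demo_reap(struct demo_entry **head)
{
	int freed = 0;
	struct demo_entry **pp = head;

	assert(head != NULL);
	while (*pp != NULL) {
		struct demo_entry *e = *pp;

		if (e->dead) {
			*pp = e->next;	/* unlink, then release */
			free(e);
			freed++;
		} else {
			pp = &e->next;
		}
	}
	return (freed);
}

/* Count entries still on the list, i.e. still holding a port. */
static int
demo_count(const struct demo_entry *head)
{
	int n = 0;

	for (; head != NULL; head = head->next)
		n++;
	return (n);
}
```

In this model, an entry whose connect failed with apply_patch false survives every demo_reap() call and keeps its port. With apply_patch true, the next demo_reap() call releases it.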
On Mon, Jan 8, 2018 at 11:21 AM, Dan McDonald <danmcd at joyent.com> wrote:
>
>
> > On Jan 7, 2018, at 3:15 PM, Youzhong Yang <youzhong at gmail.com> wrote:
> >
> > Not sure if it's the same issue we reported 3 years ago. We applied our
> > patch and haven't seen this issue ever since.
> >
> > https://illumos.topicbox.com/groups/developer/Te5808458a5a5a14f-M74735db9aeccaa5d8c3a70a4
>
> To quote that e-mail:
>
> > Hi Marcel:
> > It looks like we're getting an "early disconnect". This is what is
> > leading to the accumulation of bound reserved ports. The scenario for
> > reproduction is as follows:
> >
> > 1. Linux DEBIAN7.4 client acquires and releases a lock on a file on
> >    the server (via NFS).
> > 2. Reboot the Linux client (but do so _before_ the
> >    MIR_CLNT_IDLE_TIMEOUT interval fires on the server side).
> > 3. When the Linux client comes back up, repeat step 1.
> >
> > At this point, a cm_entry with only the ORDREL flag set in
> > x_state_word will remain in the cm_entry linked list (cm_hd).
> > It appears that without at least a DEAD flag set in x_state_word,
> > this cm_entry will remain bound to the port...and will never be garbage
> > collected.
> >
> > To experiment, we added,
> > "cm_entry->x_dead = TRUE;"
> > at lines 2272 and 2390 here:
> > https://github.com/joyent/illumos-joyent/blob/master/usr/src/uts/common/rpc/clnt_cots.c
> >
> > Testing with the above reproduction scenario, we are taking the
> > path of line 2272 -- and with the DEAD flag set in x_state_word, these
> > "zombie" cm_entries are now being cleaned up, and we're no longer
> > accumulating/leaking reserved ports.
> >
> > Is there more to it? This seems too simple a fix. Are there
> > unintended consequences we should be looking/testing for? Does this seem
> > like it might be #1616 as well?
> > Thoughts?
> > Thanks!
> > -Ken & Youzhong
>
>
> And here's the patch in diff form for easier consumption:
>
> diff --git a/usr/src/uts/common/rpc/clnt_cots.c b/usr/src/uts/common/rpc/clnt_cots.c
> index 2e64ab0..f9b78ff 100644
> --- a/usr/src/uts/common/rpc/clnt_cots.c
> +++ b/usr/src/uts/common/rpc/clnt_cots.c
> @@ -2269,6 +2269,7 @@ start_retry_loop:
> cm_entry->x_ordrel = FALSE;
>
> cm_entry->x_tidu_size = tidu_size;
> + cm_entry->x_dead = TRUE;
>
> if (cm_entry->x_early_disc) {
> /*
> @@ -2387,6 +2388,7 @@ connmgr_wrapconnect(
>
> mutex_enter(&connmgr_lock);
>
> + cm_entry->x_dead = TRUE;
>
> if (cm_entry->x_early_disc) {
> /*
>
> Went back and checked my notes - I was traveling when that thread was
> going on, so I likely missed it altogether in the hustle/bustle of that.
>
> It seems at first glance you're being too aggressive in setting X_DEAD
> (note that this code gives you BOTH ways to set the flag, via C bit
> fields OR the macro form... makes for very difficult reading, IMHO), but if
> my concern was valid you'd likely see far more outright failures.
>
> Maybe that patch is all we need?
>
> Dan
>
>
> ------------------------------------------
> illumos-zfs
> Archives: https://illumos.topicbox.com/groups/zfs/discussions/T8f10bde64dc0d5c5-M9dee7d96157a6ad5c11c472a
> Powered by Topicbox: https://topicbox.com
>
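
On the aside quoted above about the flag being settable both as a C bit field and through a macro form, the following is a rough sketch of how one state word can be reached both ways. The layout, the demo_* and dx_* names, and the runtime-derived mask are all assumptions for illustration. The real definitions in clnt_cots.c may differ, and bit-field ordering is implementation-defined in C.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical miniature of a state word that can be addressed two ways. */
union demo_state {
	struct {
		unsigned int b_dead : 1;
		unsigned int b_ordrel : 1;
		unsigned int b_connected : 1;
	} bit;
	unsigned int word;
};

struct demo_xprt {
	union demo_state dx_state;
};

/* Field-style accessors, in the spirit of cm_entry->x_dead. */
#define	dx_dead		dx_state.bit.b_dead
#define	dx_ordrel	dx_state.bit.b_ordrel
/* Whole-word accessor, in the spirit of the state word named in the thread. */
#define	dx_state_word	dx_state.word

/*
 * Derive the mask for the dead bit at run time instead of hard-coding it,
 * because the position of a bit field within its word is up to the compiler.
 */
static unsigned int
demo_dead_mask(void)
{
	union demo_state s;

	s.word = 0;
	s.bit.b_dead = 1;
	assert(s.word != 0);
	return (s.word);
}

/* Set the flag through the bit field. */
static void
demo_set_dead_field(struct demo_xprt *x)
{
	x->dx_dead = 1;
}

/* Set the same flag by OR-ing a mask into the whole word. */
static void
demo_set_dead_mask(struct demo_xprt *x)
{
	x->dx_state_word |= demo_dead_mask();
}

/* True if the dead bit is set, tested through the mask form. */
static bool
demo_is_dead(const struct demo_xprt *x)
{
	return ((x->dx_state_word & demo_dead_mask()) != 0);
}
```

Both setters leave the word in the same state, so a reader has to keep the two spellings in mind when searching for every place the flag changes.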