[OmniOS-discuss] getgrnam_r hangs if buffer too small
Nathan Huff
nrhuff at umn.edu
Thu Jan 22 22:06:36 UTC 2015
I should have also mentioned that this is using the Samba winbind NSS
module. Looking at the code, I think the problem is that when the buffer
is too small, the winbind NSS module sets errno to ERANGE and then
returns NSS_TRYAGAIN.
In illumos, nss_common.c has a function, retry_test, that looks at the
return value but not at errno. This causes nss_search to retry endlessly
(sleeping between attempts, as the pstack below shows), because the
buffer never gets resized. The NSS modules in illumos appear to return
UNAVAIL instead of TRYAGAIN when the buffer isn't big enough. I will
probably try patching the Samba sources to see if that fixes the issue.
I couldn't find any documentation saying which is correct in the general
case; the only thing I could find was for glibc, which wants TRYAGAIN in
this case.
I don't know if there is any use for it, but the pstack is below.
2971: ./a.out
fef04ae5 nanosleep (8047c28, 8047c20)
feef3244 sleep (5, 2d7, fee104c8, 8047cb8, fee84523, fefa0b28) + 31
fee9b385 nss_search (fef74520, fee83eb0, 4, 8047cb8, 0, 1) + 1a5
fee845c0 getgrnam_r (8050f8b, 8047d10, 80611b0, 400, 80611b0, 80611c0) + 9d
08050e89 main (1, 8047d60, 8047d68, 8050bf2, 8050f60, 0) + 59
08050c53 _start (1, 8047e28, 0, 8047e30, 8047e44, 8047e58) + 83
I have a core file, but I think I understand what is going on well
enough that it probably isn't necessary.
On 2015-01-22 2:58 PM, Dan McDonald wrote:
>
>> On Jan 22, 2015, at 3:11 PM, Nathan Huff <nrhuff at umn.edu> wrote:
>>
>> I am running 151006 and we have some very large groups. If the buffer passed to getgrnam_r is too small to fit the group entry it seems to just hang. I think it is supposed to return NULL and set errno to ERANGE. If the buffer is big enough it returns the information fine.
>
> When your process hangs (assuming it's easily reproducible) could you utter:
>
> pstack <PID-of-hung-process>
>
> and share the stack with the list, please?
>
> And for bonus points, take a core dump of it as well:
>
> gcore <PID-of-hung-process>
>
> I *suspect* this affects all OmniOS versions. The code in question is quite old, with its last changes predating illumos itself.
>
> Thanks!
> Dan
>
--
Nathan Huff
System Administrator
Academic Health Center Information Systems
University of Minnesota
612-626-9136