[OmniOS-discuss] getgrnam_r hangs if buffer too small

Nathan Huff nrhuff at umn.edu
Thu Jan 22 22:06:36 UTC 2015


I should have also mentioned that this is using the Samba winbind NSS 
module.  Looking at the code, I think the problem is that when the 
buffer is too small the winbind NSS module sets errno to ERANGE and 
then returns NSS_TRYAGAIN.

In Illumos, nss_common.c has a function retry_test that looks at the 
return value but not at errno.  This causes the nss_search function to 
loop endlessly, since the buffer never gets resized.  It looks like the 
NSS modules in Illumos return UNAVAIL instead of TRYAGAIN for cases 
where the buffer isn't big enough.  I will probably try patching the 
Samba sources and see if that fixes the issue.  I couldn't find any 
documentation that says which is correct in the general case.  The only 
thing I could find was for glibc, which wants TRYAGAIN in this case.
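
To make the loop concrete, here is a minimal sketch. It is not the 
actual Illumos or Samba code: every name in it (status_t, 
backend_lookup, search_status_only, search_checks_errno) is made up for 
illustration, and max_tries exists only so that the sketch terminates. 
It contrasts a dispatcher that decides to retry from the status alone 
with one that also looks at errno.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical status codes, loosely modelled on the NSS ones. */
typedef enum { ST_SUCCESS, ST_NOTFOUND, ST_UNAVAIL, ST_TRYAGAIN } status_t;

/*
 * Hypothetical backend: when the caller's buffer is too small it sets
 * errno to ERANGE and reports TRYAGAIN, as described for winbind above.
 */
static status_t backend_lookup(size_t buflen, size_t needed)
{
    if (buflen < needed) {
        errno = ERANGE;
        return ST_TRYAGAIN;
    }
    return ST_SUCCESS;
}

/*
 * Retries whenever the backend says TRYAGAIN and never inspects errno.
 * The buffer size never changes, so every retry fails the same way.
 * Returns the number of backend calls made.
 */
static int search_status_only(size_t buflen, size_t needed, int max_tries)
{
    int calls = 0;

    assert(max_tries > 0);
    while (calls < max_tries) {
        calls++;
        if (backend_lookup(buflen, needed) != ST_TRYAGAIN)
            break;
        /* a real dispatcher might sleep here before retrying */
    }
    return calls;
}

/*
 * Same loop, but gives up on TRYAGAIN + ERANGE, because retrying with
 * the same buffer cannot succeed. Returns the number of backend calls.
 */
static int search_checks_errno(size_t buflen, size_t needed, int max_tries)
{
    int calls = 0;

    assert(max_tries > 0);
    while (calls < max_tries) {
        status_t st;

        calls++;
        errno = 0;
        st = backend_lookup(buflen, needed);
        if (st != ST_TRYAGAIN || errno == ERANGE)
            break;
    }
    return calls;
}
```

With a 16-byte buffer and an entry that needs 1024 bytes, the first 
variant uses up every allowed try, while the second stops after one 
call and can hand ERANGE back to the caller.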

I don't know if there is any use for it, but the pstack is below.
2971:   ./a.out
  fef04ae5 nanosleep (8047c28, 8047c20)
  feef3244 sleep    (5, 2d7, fee104c8, 8047cb8, fee84523, fefa0b28) + 31
  fee9b385 nss_search (fef74520, fee83eb0, 4, 8047cb8, 0, 1) + 1a5
  fee845c0 getgrnam_r (8050f8b, 8047d10, 80611b0, 400, 80611b0, 80611c0) + 9d
  08050e89 main     (1, 8047d60, 8047d68, 8050bf2, 8050f60, 0) + 59
  08050c53 _start   (1, 8047e28, 0, 8047e30, 8047e44, 8047e58) + 83

I have a core file, but I think I understand what is going on well 
enough that it probably isn't necessary.
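
For comparison, this is roughly the caller-side pattern that the 
original report expects to work: getgrnam_r fails with ERANGE, and the 
caller grows the buffer and tries again. It is only a sketch. The 
helper name lookup_group and the MAX_GROUP_BUF cap are my own 
assumptions, and it uses the POSIX five-argument form of getgrnam_r, 
which as far as I know is selected on Illumos by defining 
_POSIX_PTHREAD_SEMANTICS.

```c
#define _POSIX_PTHREAD_SEMANTICS /* assumed to select the POSIX form on Illumos */
#include <assert.h>
#include <errno.h>
#include <grp.h>
#include <stdlib.h>
#include <sys/types.h>

#define MAX_GROUP_BUF (1024 * 1024) /* arbitrary cap, an assumption */

/*
 * Look up a group by name, doubling the buffer each time getgrnam_r
 * reports ERANGE. Returns 0 on success or an errno value on failure.
 * *found is set to 1 if the group exists; in that case *bufp is the
 * malloc'd buffer backing *grp and the caller must free it.
 */
static int lookup_group(const char *name, struct group *grp,
                        char **bufp, size_t initial, int *found)
{
    size_t len = initial ? initial : 64;
    char *buf = NULL;
    struct group *result = NULL;
    int rc;

    assert(name != NULL && grp != NULL && bufp != NULL && found != NULL);
    *bufp = NULL;
    *found = 0;

    for (;;) {
        char *tmp = realloc(buf, len);
        if (tmp == NULL) {
            free(buf);
            return ENOMEM;
        }
        buf = tmp;

        rc = getgrnam_r(name, grp, buf, len, &result);
        if (rc != ERANGE)
            break;              /* success, not found, or another error */
        if (len > MAX_GROUP_BUF / 2) {
            free(buf);
            return ERANGE;      /* give up instead of growing forever */
        }
        len *= 2;
    }

    if (rc != 0 || result == NULL) {
        free(buf);              /* error, or no such group */
        return rc;
    }
    *found = 1;
    *bufp = buf;
    return 0;
}
```

A caller would pass a small initial size, check the return value and 
*found, use the struct group, and then free *bufp. This only works if 
ERANGE actually makes it back to the caller, which is what the hang 
prevents.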

On 2015-01-22 2:58 PM, Dan McDonald wrote:
>
>> On Jan 22, 2015, at 3:11 PM, Nathan Huff <nrhuff at umn.edu> wrote:
>>
>> I am running 151006 and we have some very large groups.  If the buffer passed to getgrnam_r is too small to fit the group entry it seems to just hang.  I think it is supposed to return NULL and set errno to ERANGE.  If the buffer is big enough it returns the information fine.
>
> When your process hangs (assuming it's easily reproducible) could you utter:
>
> 	pstack <PID-of-hung-process>
>
> and share the stack with the list, please?
>
> And for bonus points, take a core dump of it as well:
>
> 	gcore <PID-of-hung-process>
>
> I *suspect* this affects all OmniOS versions.  The code in question is quite old, with last-changes predating illumos itself.
>
> Thanks!
> Dan
>

-- 
Nathan Huff
System Administrator
Academic Health Center Information Systems
University of Minnesota
612-626-9136

