[OmniOS-discuss] getgrnam_r hangs if buffer too small
Nathan Huff
nrhuff at umn.edu
Fri Jan 23 21:23:32 UTC 2015
Patching the winbind nss module to return NSS_UNAVAIL in the
buffer-too-small case fixed the issue. It turns out there was a
year-old open bug report about this. I submitted my patch, so hopefully
this can get fixed upstream as well.
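
For anyone who runs into this from the caller side, here is a minimal sketch of the usual retry-with-a-bigger-buffer loop around getgrnam_r. It is only an illustration, not part of the Samba patch: lookup_gid and GROUP_BUF_MAX are hypothetical names, and it assumes the POSIX five-argument form of getgrnam_r (on illumos that should mean building with _POSIX_PTHREAD_SEMANTICS defined; the default four-argument form returns NULL and sets errno instead). It only works if the nss module reports the too-small buffer as ERANGE rather than looping inside nss_search.

```c
#include <errno.h>
#include <grp.h>
#include <stdlib.h>
#include <sys/types.h>

/* Hypothetical upper bound so a misbehaving backend cannot make us grow forever. */
#define GROUP_BUF_MAX (16u * 1024u * 1024u)

/*
 * Hypothetical helper: look up a group by name, doubling the scratch
 * buffer whenever getgrnam_r reports ERANGE.
 * Returns 0 and stores the gid on success, ENOENT if the group does not
 * exist, or another errno value on failure.
 */
int lookup_gid(const char *name, size_t initial_len, gid_t *gid_out)
{
    size_t len = initial_len ? initial_len : 1024;
    char *buf = NULL;

    for (;;) {
        char *bigger = realloc(buf, len);
        if (bigger == NULL) {
            free(buf);
            return ENOMEM;
        }
        buf = bigger;

        struct group grp;
        struct group *result = NULL;
        int rc = getgrnam_r(name, &grp, buf, len, &result);

        if (rc == 0 && result != NULL) {
            /* Found: copy out what we need before freeing the buffer. */
            *gid_out = grp.gr_gid;
            free(buf);
            return 0;
        }
        if (rc != ERANGE) {
            /* rc == 0 with a NULL result means "no such group". */
            free(buf);
            return rc != 0 ? rc : ENOENT;
        }
        if (len >= GROUP_BUF_MAX) {
            /* Still too small at the cap: give up instead of looping. */
            free(buf);
            return ERANGE;
        }
        len *= 2;
    }
}
```

A caller would use it as lookup_gid("somegroup", 1024, &gid) and treat any nonzero return as a failure. The cap is a design choice: with very large groups the buffer may need to grow several times, but an unbounded loop would just move the hang from libc into the application.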
On 2015-01-22 4:06 PM, Nathan Huff wrote:
> I should have also mentioned that this is using the Samba winbind nss
> module. Looking at the code, I think the problem is that when the
> buffer is too small the winbind nss module sets errno to ERANGE and
> then returns NSS_TRYAGAIN.
>
> In Illumos in nss_common.c there is a function retry_test that looks at
> the return value, but not at errno. This causes the nss_search
> function to loop endlessly since the buffer never gets resized. It looks
> like the nss modules in Illumos return UNAVAIL instead of TRYAGAIN for
> cases where the buffer isn't big enough. I will probably try patching
> the Samba sources and see if that fixes the issue. I couldn't find any
> documentation that would say which is correct in the general case. The
> only thing I could find was for glibc which wants TRYAGAIN in this case.
>
> I don't know if there is any use for it, but the pstack is below.
> 2971: ./a.out
> fef04ae5 nanosleep (8047c28, 8047c20)
> feef3244 sleep (5, 2d7, fee104c8, 8047cb8, fee84523, fefa0b28) + 31
> fee9b385 nss_search (fef74520, fee83eb0, 4, 8047cb8, 0, 1) + 1a5
> fee845c0 getgrnam_r (8050f8b, 8047d10, 80611b0, 400, 80611b0, 80611c0) + 9d
> 08050e89 main (1, 8047d60, 8047d68, 8050bf2, 8050f60, 0) + 59
> 08050c53 _start (1, 8047e28, 0, 8047e30, 8047e44, 8047e58) + 83
>
> I have a core file, but I think I understand what is going on well
> enough that it probably isn't necessary.
>
> On 2015-01-22 2:58 PM, Dan McDonald wrote:
>>
>>> On Jan 22, 2015, at 3:11 PM, Nathan Huff <nrhuff at umn.edu> wrote:
>>>
>>> I am running 151006 and we have some very large groups. If the
>>> buffer passed to getgrnam_r is too small to fit the group entry it
>>> seems to just hang. I think it is supposed to return NULL and set
>>> errno to ERANGE. If the buffer is big enough it returns the
>>> information fine.
>>
>> When your process hangs (assuming it's easily reproducible) could you
>> utter:
>>
>> pstack <PID-of-hung-process>
>>
>> and share the stack with the list, please?
>>
>> And for bonus points, take a core dump of it as well:
>>
>> gcore <PID-of-hung-process>
>>
>> I *suspect* this affects all OmniOS versions. The code in question is
>> quite old, with last-changes predating illumos itself.
>>
>> Thanks!
>> Dan
>>
>
--
Nathan Huff
System Administrator
Academic Health Center Information Systems
University of Minnesota
612-626-9136