[OmniOS-discuss] LX: real ksh93 broken

Ludovic Orban lorban at bitronix.be
Thu May 11 09:11:14 UTC 2017


I think I found the root cause. At least, I'm 90% certain what follows is
correct and is the cause of the problem.

getconf("CHILD_MAX") ends up calling sysconf(_SC_CHILD_MAX), whose value can
also be read from a shell with ulimit -u.

Here's what I get from a few different machines:

# uname -a ; ulimit -u
SunOS omnios 5.11 omnios-r151020-4151d05 i86pc i386 i86pc
29995
#

# uname -a ; ulimit -u
Linux opensuse 4.4.27-2-default #1 SMP Thu Nov 3 14:59:54 UTC 2016
(5c21e7c) x86_64 x86_64 x86_64 GNU/Linux
1200
#

# uname -a ; ulimit -u
Linux debian-8 4.4 BrandZ virtual linux x86_64 GNU/Linux
2147483647
#

Apparently, ksh isn't very happy when CHILD_MAX is equal to INT_MAX
(2147483647), but that's probably a ksh bug.

If my understanding of the LX code is correct, sysconf(_SC_CHILD_MAX) ends
up being translated to lx_getrlimit(), which would return the value of
zone.max-lwps. That looks like an odd default to me, but I can't say for sure.
Since I haven't configured any rctl on my LX zone, the default is apparently
INT_MAX. I assume SmartOS uses a different default, but I wish I could
double-check that.

Now, I'm not sure how this could or should be fixed.

--
Ludovic


On Wed, May 10, 2017 at 10:32 PM, Dan McDonald <danmcd at omniti.com> wrote:

> Again, thank you for the digging.  I'd be interested in why ksh's
> getconf() fails as well.
>
> Thanks,
> Dan
>
> Sent from my iPhone (typos, autocorrect, and all)
>
> On May 10, 2017, at 3:44 PM, Ludovic Orban <lorban at bitronix.be> wrote:
>
> Okay, I found what causes ksh to misbehave. It's in sh_init(), when
> shgd->lim.child_max is initialized with the results of
> getconf("CHILD_MAX"), see:
> https://github.com/att/ast/blob/master/src/cmd/ksh93/sh/init.c#L1289
>
> I've commented out that line, hardcoded shgd->lim.child_max to 128,
> rebuilt and voila: ksh works as it should.
>
> Now I have to dig into that getconf() function to figure out what the
> returned value is and where it's coming from. Sounds trivial, but my C is
> *very* rusty, the asm gcc generates doesn't look at all like what the JVM's
> JIT generates (which gives me the wrong reflexes, as I'm used to the latter),
> and I'm not very familiar with mdb.
>
> Oh well, that turned into a nice debugging re-training session which I
> very much needed. That reminds me of the good old days at my first job when
> I was porting Linux apps to Solaris.
>
> Thank you for maintaining such a well-designed and pleasant to use OS!
>
>
> On Wed, May 10, 2017 at 3:59 PM, Dan McDonald <danmcd at omniti.com> wrote:
>
>> Wow, thank you for the further deep-diving.
>>
>> > On May 10, 2017, at 5:21 AM, Ludovic Orban <lorban at bitronix.be> wrote:
>> >
>> > Looking at ksh's sources, my understanding is that job_post is stuck in
>> > that else clause:
>> >        else
>> >        {
>> >               /* create a new job */
>> >               while((pw->p_job = job_alloc()) < 0)
>> >                      job_wait((pid_t)1);
>> >               pw->p_nxtjob = job.pwlist;
>> >               pw->p_nxtproc = 0;
>> >        }
>> >
>> > Digging into the sources and stepping through the instructions of
>> > job_alloc and job_byjid, it looks like ksh cannot allocate a job id as it
>> > believes they're all reserved. But so far, all this code is purely working
>> > on internal structures of ksh, so an LX bug would have no impact.
>> >
>> > I'll continue looking into this as time permits and I'll post an update
>> > if I find anything worth mentioning.
>> >
>>
>> Be careful of narrowing your focus too far.  I see some things worth
>> considering:
>>
>> 1.) Is the "if" you're not showing me dependent on something in global
>> state that may have been mis-initialized by an LX emulation bug?
>>
>> 2.) Same question as #1, but applied to job_alloc() and job_wait().
>>
>> I'm guessing LX in OmniOS is failing because I mismerged or plain forgot
>> something, given that Nahum says he can run ksh93 on SmartOS just fine.
>>
>>
>> Please make sure you're looking at the bigger picture, but THANK YOU for
>> the further investigation.
>>
>> Dan
>>
>>
>