[OmniOS-discuss] LX: real ksh93 broken

Wed May 10 09:21:13 UTC 2017

In x86 asm, cmpl itself is neither signed nor unsigned: it only sets the
flags, and the conditional jump that follows decides which interpretation
applies. In this case it's jl ("jump if less"), so the comparison is signed
(jb, "jump if below", would be the unsigned one). But I digress.

I've recompiled ksh93 with debug information, unstripped symbols and no
optimizations
(the binary is here: https://www.dropbox.com/s/brys628g40akruv/ksh93.gz?dl=0)
and managed to figure out where that infinite loop is happening:

> ::stack
job_byjid+5()
job_alloc+0x62()
job_post+0x1a2()
_sh_fork+0x265()
sh_ntfork+0xa99()
sh_exec+0x2be8()
sh_subshell+0x982()
comsubst+0xbf0()
varsub+0x3f4()
copyto+0xa2a()
sh_mactrim+0x196()
nv_setlist+0x220()
sh_exec+0xdb7()
sh_eval+0x2b9()
sh_trap+0x29b()
ed_setup+0x7ac()
ed_viread+0xf6()
slowread+0x181()
sfrd+0x4da()
_sffilbuf+0x433()
sfreserve+0x566()
exfile+0x808()
sh_main+0xb38()
main+0x25()
>

Looking at the ksh sources, my understanding is that job_post is stuck in
this else clause:
       else
       {
              /* create a new job */
              while((pw->p_job = job_alloc()) < 0)
                     job_wait((pid_t)1);
              pw->p_nxtjob = job.pwlist;
              pw->p_nxtproc = 0;
       }

Digging into the sources and stepping through the instructions of job_alloc
and job_byjid, it looks like ksh cannot allocate a job id because it
believes they're all reserved. But so far, all this code works purely on
ksh's internal structures, so an LX bug would have no impact.

I'll continue looking into this as time permits and I'll post an update if
I find anything worth mentioning.

--
Ludovic

On Tue, May 9, 2017 at 5:15 PM, Dan McDonald <danmcd at omniti.com> wrote:

>
> > On May 9, 2017, at 11:05 AM, Dan McDonald <danmcd at omniti.com> wrote:
> >
> > And I've no good way to know what it's doing, as the illumos-native
> tools aren't giving me enough data.
>
> ksh93 appears to be looping in something:
>
> mdb: target stopped at:
> 0x42adf0:       movq   +0x350129(%rip),%rax     <0x77af20>
> > ::step
> mdb: target stopped at:
> 0x42adf7:       testq  %rax,%rax
> > ::step
> mdb: target stopped at:
> 0x42adfa:       jne    +0xc     <0x42ae08>
> > ::step
> mdb: target stopped at:
> 0x42adfc:       jmp    +0x28    <0x42ae26>
> > ::step
> mdb: target stopped at:
> 0x42ae26:       addl   $0x1,%r14d
> > ::step
> mdb: target stopped at:
> 0x42ae2a:       cmpl   0x10(%rsi),%r14d
> > ::step
> mdb: target stopped at:
> 0x42ae2e:       jl     -0x40    <0x42adf0>
> > ::step
> mdb: target stopped at:
> 0x42adf0:       movq   +0x350129(%rip),%rax     <0x77af20>
> > <rsi+0x10
> Usage: step [ over | out ] [SIG]
> > <rsi+0x10=P
>                 0x7fffff0470f0
> > 0x7fffff0470f0/D
> 0x7fffff0470f0: 2147483647
> > 0x7fffff0470f0/X
> 0x7fffff0470f0: 7fffffff
> > <r14d/P
> mdb: failed to read data from target: no mapping for address
> 0x761133e5:
> > <r14d=P
>                 0x761133e5
> >
>
> Something that never seems to exit.  Hmmm... I'm guessing r14d will never
> be less than 0x7fffffff with a signed compare (cmpl is signed, right?).
>
> NO idea how this happened.  :-/
>
> Dan
>