[OmniOS-discuss] device probe related command timeouts

Wed Jan 4 20:57:59 UTC 2017

Hi Joshua, 

As requested, here is the pstack and mdb output for one of the stuck processes (pid 18919):

root@PRD-GIP-cpls-san1:/export/home/jbarfield# pstack 18919
18919:  sudo diskinfo
 fee8e665 pollsys  (8047b50, 2, 0, 0)
 fee264af pselect  (6, 8089c30, 8089c50, fef06320, 0, 0) + 1bf
 fee267b8 select   (6, 8089c30, 8089c50, 0, 0, 0) + 8e
 08055f64 sudo_execute (807c960, 8047ce8, 41e, c0) + 4ba
 080606f8 run_command (807c960, 807c9c0, 8047d88, 805dff9) + 4c
 0805e03f main     (8047d7c, fef076a8, 8047db4, 8054cc3, 2, 8047dc0) + 681
 08054cc3 _start   (2, 8047e7c, 8047e81, 0, 8047e8a, 8047e95) + 83
root@PRD-GIP-cpls-san1:/export/home/jbarfield# mdb -k
Loading modules: [ unix genunix specfs dtrace mac cpu.generic uppc pcplusmp scsi_vhci zfs sata sd ip hook neti sockfs arp usba uhci stmf stmf_sbd md lofs mpt_sas random idm ipc nfs ptm crypto cpc kvm ufs logindmux nsmb smbsrv ]
> 0t18919::pid2proc | ::ps -f
S    PID   PPID   PGID    SID    UID      FLAGS             ADDR NAME
R  18919  18022  18919  18022      0 0x5a016000 ffffff24c389d000 sudo diskinfo
> 0t18919::pid2proc | ::walk thread | ::findstack -v
stack pointer for thread ffffff24c5bf4500: ffffff010a032c50
[ ffffff010a032c50 _resume_from_idle+0xf4() ]
  ffffff010a032c80 swtch+0x141()
  ffffff010a032d10 cv_wait_sig_swap_core+0x1b9(ffffff26a30f47fa, ffffff26a30f47c0, 0)
  ffffff010a032d30 cv_wait_sig_swap+0x17(ffffff26a30f47fa, ffffff26a30f47c0)
  ffffff010a032d60 cv_timedwait_sig_hrtime+0x35(ffffff26a30f47fa, ffffff26a30f47c0, ffffffffffffffff)
  ffffff010a032e20 poll_common+0x554(8047b50, 2, 0, 0)
  ffffff010a032ec0 pollsys+0xe7(8047b50, 2, 0, 0)
  ffffff010a032f10 _sys_sysenter_post_swapgs+0x149()
>
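
The stacks above are for the sudo wrapper itself, which appears to be just sitting in poll waiting on its child, so the diskinfo child is probably the more interesting process. To grab the same kernel stacks for several PIDs without typing into mdb each time, the dcmds can be piped in on stdin. A minimal sketch, assuming it is run as root on the affected host; the helper names mdb_stack_cmd and dump_kernel_stacks and the script name kstacks.sh are made up for illustration:

```shell
#!/bin/sh
# Sketch only: helper names are illustrative and not part of any tool.

# Build the dcmd pipeline used in this thread for one decimal PID.
# The 0t prefix tells mdb the number is decimal.
mdb_stack_cmd() {
    printf '0t%s::pid2proc | ::walk thread | ::findstack -v\n' "$1"
}

# Feed the pipeline to "mdb -k" on stdin for each PID given.
dump_kernel_stacks() {
    for pid in "$@"; do
        echo "=== pid $pid ==="
        mdb_stack_cmd "$pid" | mdb -k
    done
}

# Run only when PIDs are passed on the command line, so the file
# can also be sourced just to get the helper functions.
if [ "$#" -gt 0 ]; then
    dump_kernel_stacks "$@"
fi
```

For example, "sh kstacks.sh 18919 $(pgrep -P 18919)" would cover the sudo process plus any direct children it has, which should include the diskinfo process if it is still running.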

On 1/4/17, 2:52 PM, "Joshua M. Clulow" <josh at sysmgr.org> wrote:

    On 4 January 2017 at 12:29, John Barfield <john.barfield at bissinc.com> wrote:
    > I’ve got a SAN that seems to be timing out on any hardware probing commands such as “format” or “diskinfo” although prtconf seems to work.
    >
    > Does anyone happen to have a dtrace one liner or maybe kstat command I can run to see why/what they’re hanging on?

    I would start by running "pstack" with the pid of one of the stuck
    processes.  That will give you the part of the user program which is
    stuck.  Then, I would get the in-kernel state of the stuck threads;
    e.g., looking at my bash process:

        asgard # echo $$
        45435
        asgard # ps -fp 45435
             UID   PID  PPID   C    STIME TTY         TIME CMD
            root 45435 45433   0 20:47:17 pts/3       0:00 -bash

        asgard # mdb -k
        Loading modules: [ unix genunix specfs ... ]
        > 0t45435::pid2proc | ::ps -f
        S    PID   PPID   PGID    SID    UID      FLAGS             ADDR NAME
        R  45435  45433  45435  45435      0 0x4a014000 ffffff1b14d33048 -bash
        > 0t45435::pid2proc | ::walk thread | ::findstack -v
        stack pointer for thread ffffff03f776c080: ffffff0011b57c10
        [ ffffff0011b57c10 _resume_from_idle+0x112() ]
          ffffff0011b57c40 swtch+0x141()
          ffffff0011b57cd0 cv_wait_sig_swap_core+0x1b9(ffffff1b14d33108, ...)
          ffffff0011b57cf0 cv_wait_sig_swap+0x17(ffffff1b14d33108, ...)
          ffffff0011b57da0 waitid+0x315(7, 0, ffffff0011b57e30, f)
          ffffff0011b57eb0 waitsys32+0x36(7, 0, 8047750, f)
          ffffff0011b57f10 sys_syscall32+0x123()

    That might tell us where in the storage subsystem you're getting stuck.

    Cheers.

    -- 
    Joshua M. Clulow
    UNIX Admin/Developer
    http://blog.sysmgr.org