[OmniOS-discuss] Continuing hung-zone problems

Dan McDonald danmcd at omniti.com
Mon Apr 3 15:46:56 UTC 2017


> On Apr 2, 2017, at 7:07 PM, Bob Friesenhahn <bfriesen at simple.dallas.tx.us> wrote:
> 
> Previously I reported a problem (in the 040 timeframe) in that zones are hanging when being shut down.  Problems continue on that system. Today I am seeing the same issue with a different OmniOS system (version is omnios-r151020-4151d05).
> 
> In this case it was after doing 'init 6' to reboot the system and the problem added about 180 seconds to the shutdown time.  In three reboots, the problem happened twice.

If you have a shell available, you should inspect the running processes to see which ones are stuck, and where.

	pgrep -z <zonename>

is helpful, as well as:

	pstack <pid>
	pwdx <pid>

pwdx(1) shows the PWD for a process.  If a process is stuck with its PWD inside <filesystem>, that <filesystem> can't be unmounted until the process exits or changes its PWD.  If pstack(1) shows NOTHING for a process, the process is stuck doing something in the kernel.  In that case, "mdb -k" may show you the processes from the kernel's point of view:

	::ps -t

Then find the stuck process and examine its threads in "mdb -k" with:

	<thread-pointer>::findstack -v
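To apply the pwdx check to every process in a zone at once, a small filter helps.  This is a minimal sketch, assuming pwdx prints one "pid:<whitespace>path" line per process; the function name busy_under is hypothetical, and the mountpoint in the usage line is taken from the log below.

```shell
# busy_under MOUNTPOINT  (hypothetical helper)
# Reads pwdx-style lines ("pid: path", tab or space separated) on stdin and
# prints the PID of every process whose working directory is MOUNTPOINT or
# somewhere below it, i.e. processes that would keep it from unmounting.
busy_under() {
  mp=${1%/}                                  # drop a trailing slash
  while IFS= read -r line || [ -n "$line" ]; do
    pid=${line%%:*}                          # text before the first colon
    cwd=${line#*:}                           # text after the first colon
    cwd=${cwd#"${cwd%%[![:space:]]*}"}       # strip leading whitespace
    case $cwd in
      "$mp"|"$mp"/*) printf '%s\n' "$pid" ;; # exact match or a subdirectory
    esac
  done
}
```

From the global zone, something like the following lists candidates worth a closer look with pstack:

	for pid in $(pgrep -z swdev); do pwdx "$pid"; done | busy_under /zones/swdev/root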

> This is logged:
> 
> Apr  2 17:50:13 velma zoneadmd[653]: [ID 702911 daemon.error] [zone 'swdev'] failed to open console master: Device busy
> Apr  2 17:50:13 velma zoneadmd[653]: [ID 702911 daemon.error] [zone 'swdev'] WARNING: could not open master side of zone console for swdev to release slave handle: Device busy
> Apr  2 17:50:13 velma zoneadmd[653]: [ID 702911 daemon.error] [zone 'swdev'] WARNING: console /devices//pseudo/zconsnex@1/zcons@0 found, but it could not be removed.: I/O error
> Apr  2 17:51:29 velma zoneadmd[1842]: [ID 702911 daemon.error] [zone 'swdev'] unable to unmount '/zones/swdev/root/proc'
> Apr  2 17:51:29 velma zoneadmd[1842]: [ID 702911 daemon.error] [zone 'swdev'] unable to unmount file systems in zone
> Apr  2 17:51:29 velma zoneadmd[1842]: [ID 702911 daemon.error] [zone 'swdev'] unable to destroy zone
> Apr  2 17:51:53 velma svc.startd[10]: [ID 122153 daemon.warning] svc:/system/zones:default: Method or service exit timed out.  Killing contract 169.
> Apr  2 17:51:53 velma svc.startd[10]: [ID 636263 daemon.warning] svc:/system/zones:default: Method "/lib/svc/method/svc-zones stop" failed due to signal KILL.

You did say you were using that zone as an NFS client.  Perhaps one of the processes wasn't really done accessing a remote filesystem?  I notice that the filesystem which could not be unmounted is that zone's procfs.  I can't tell you off the top of my head why that matters, but it IS interesting to know.

Dan


