[OmniOS-discuss] Continuing hung-zone problems

Dan McDonald danmcd at omniti.com
Mon Apr 3 18:59:17 UTC 2017


> On Apr 3, 2017, at 2:11 PM, Bob Friesenhahn <bfriesen at simple.dallas.tx.us> wrote:
> 
> The problem is definitely with zone 'shutdown'.  I have never seen it happen with 'reboot' or 'halt'.

And 'shutdown' is the one that does go through the inittab processing.

I've found something.  I can reproduce this on bloody easily:

bloody(~)[0]% sudo zoneadm -z lipkg0 shutdown
<many-seconds-of-waiting...>
zone 'lipkg0': unable to shutdown zone
bloody(~)[1]% zoneadm list -cv
  ID NAME             STATUS     PATH                           BRAND    IP    
   0 global           running    /                              ipkg     shared
   1 lx1              running    /zones/lx1                     lx       excl  
   2 lx0              running    /zones/lx0                     lx       excl  
   3 lx2              running    /zones/lx2                     lx       excl  
   4 lipkg0           shutting_down /zones/lipkg0                  lipkg    excl  
bloody(~)[0]% pgrep -z lipkg0
764
bloody(~)[0]% 

Process 764 is a zsched process.  It's here:

> 0xffffff007d61fc40::findstack -v
stack pointer for thread ffffff007d61fc40: ffffff007d61f950
[ ffffff007d61f950 _resume_from_idle+0x112() ]
  ffffff007d61f980 swtch+0x141()
  ffffff007d61f9c0 cv_wait+0x70(ffffff1955618148, fffffffffbcf8610)
  ffffff007d61fa50 zone_status_wait_cpr+0xb5(ffffff1955618000, 8, fffffffffbb82a3d)
  ffffff007d61fb30 zsched+0x5f0(ffffff007dfbacf0)
  ffffff007d61fb40 thread_start+8()

Which is in this function:

/*
 * Private CPR-safe version of zone_status_wait().
 */
static void
zone_status_wait_cpr(zone_t *zone, zone_status_t status, char *str)
{
	callb_cpr_t cprinfo;

	ASSERT(status > ZONE_MIN_STATE && status <= ZONE_MAX_STATE);

	CALLB_CPR_INIT(&cprinfo, &zone_status_lock, callb_generic_cpr,
	    str);
	mutex_enter(&zone_status_lock);
	while (zone->zone_status < status) {
		CALLB_CPR_SAFE_BEGIN(&cprinfo);
		cv_wait(&zone->zone_cv, &zone_status_lock);
		CALLB_CPR_SAFE_END(&cprinfo, &zone_status_lock);
	}
	/*
	 * zone_status_lock is implicitly released by the following.
	 */
	CALLB_CPR_EXIT(&cprinfo);
}


Okay.  I will be diving into this now to see WTF happened.  I'm sorry for not paying closer attention to this sooner.

Dan


