[OmniOS-discuss] "zpool import" triggers deadlock in some cases? (metaslab_group_taskqs)

Alex alex.ranskis at gmail.com
Mon Apr 28 13:17:44 UTC 2014


Hello,

I'm trying to understand this behavior, which I see on servers connected to
an external disk enclosure. (I cannot reproduce it on a simple single-disk VM.)

# kstat -c taskq | grep metaslab_group_tasksq| wc -l
1112

# zpool import >/dev/null

# kstat -c taskq | grep metaslab_group_tasksq| wc -l
1160


Each run leaves more 'metaslab_group_tasksq' taskqs behind (48 more in the
run above). A typical instance looks like this:

module: unix                            instance: 513
name:   metaslab_group_tasksq           class:    taskq
        crtime                          842173.739164514
        executed                        0
        maxtasks                        0
        nactive                         0
        nalloc                          0
        pid                             0
        priority                        60
        snaptime                        842774.709253006
        tasks                           0
        threads                         3
        totaltime                       0
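
The before/after comparison above can be scripted so it is easy to repeat.
This is a minimal sketch, assuming the kstat name stays
metaslab_group_tasksq as in the output above. The helper names count_taskqs
and taskq_delta are hypothetical, and the live part only runs where kstat
and zpool are installed.

```shell
#!/bin/sh
# Count taskq kstat entries with the name seen in this report.
# Reads `kstat -c taskq` style text on stdin, so it also works on saved output.
count_taskqs() {
    grep -c 'metaslab_group_tasksq'
}

# Print the difference between a "before" count ($1) and an "after" count ($2).
taskq_delta() {
    echo $(( $2 - $1 ))
}

# Only touch the live system when the illumos tools are present.
if command -v kstat >/dev/null 2>&1 && command -v zpool >/dev/null 2>&1; then
    before=$(kstat -c taskq | count_taskqs)
    zpool import >/dev/null
    after=$(kstat -c taskq | count_taskqs)
    echo "metaslab_group_tasksq: $before -> $after (delta $(taskq_delta "$before" "$after"))"
fi
```

With the numbers from this report, `taskq_delta 1112 1160` prints 48.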


The "zpool import" command itself runs fine. I get the same behavior
whether there are pools to import or not.

But kernel threads are piling up. Each condition variable (CV) has 3 threads
waiting on it:
> ffffff05844fe080::wchaninfo -v
ADDR             TYPE NWAITERS   THREAD           PROC
ffffff05844fe080 cond        3:  ffffff0021c58c40 sched
                                 ffffff0021c5ec40 sched
                                 ffffff0021c64c40 sched

They are all blocked, with a similar stack:
> ffffff0021c58c40::findstack -v
stack pointer for thread ffffff0021c58c40: ffffff0021c58a80
[ ffffff0021c58a80 _resume_from_idle+0xf4() ]
  ffffff0021c58ab0 swtch+0x141()
  ffffff0021c58af0 cv_wait+0x70(ffffff05844fe080, ffffff05844fe070)
  ffffff0021c58b60 taskq_thread_wait+0xbe(ffffff05844fe050, ffffff05844fe070, ffffff05844fe080, ffffff0021c58bc0, ffffffffffffffff)
  ffffff0021c58c20 taskq_thread+0x37c(ffffff05844fe050)
  ffffff0021c58c30 thread_start+8()
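
Rather than walking each thread by hand, the waiters can be totalled from
`::wchaninfo -v` output. This is a rough sketch, assuming the column layout
shown above. The helper name sum_cv_waiters is hypothetical. The total
covers every condition variable with waiters, not only the taskq ones, so it
is only meaningful as a before/after difference around a "zpool import" run.
The `::stacks -c` line assumes that dcmd is available in this mdb build.

```shell
#!/bin/sh
# Sum the NWAITERS column for condition variables in `::wchaninfo -v` text
# read from stdin. Assumed layout: ADDR TYPE NWAITERS: THREAD PROC
sum_cv_waiters() {
    awk '$2 == "cond" { sub(/:$/, "", $3); total += $3 } END { print total + 0 }'
}

# Live use needs root and a kernel debugger; skipped when mdb is not installed.
if command -v mdb >/dev/null 2>&1; then
    # Total waiters across all condition variables that have waiting threads.
    echo '::wchaninfo -v' | mdb -k | sum_cv_waiters
    # Group identical kernel stacks that pass through taskq_thread_wait.
    echo '::stacks -c taskq_thread_wait' | mdb -k
fi
```

If every leftover taskq keeps its 3 threads (as "threads 3" in the kstat
output suggests), the 48 taskqs added between the two kstat counts would
correspond to 144 extra waiters.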


The taskq seems to be created by a call to metaslab_group_create(), reached
through this stack:
              zfs`vdev_alloc+0x54a
              zfs`spa_config_parse+0x48
              zfs`spa_config_parse+0xda
              zfs`spa_config_valid+0x78
              zfs`spa_load_impl+0xa81
              zfs`spa_load+0x14e
              zfs`spa_tryimport+0xaa
              zfs`zfs_ioc_pool_tryimport+0x51
              zfs`zfsdev_ioctl+0x4a7
              genunix`cdev_ioctl+0x39
              specfs`spec_ioctl+0x60
              genunix`fop_ioctl+0x55
              genunix`ioctl+0x9b
              unix`sys_syscall32+0xff
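
A stack like the one above can be collected with DTrace's fbt provider, and
the same script can count creations against teardowns. This is a sketch
under assumptions: that metaslab_group_create and metaslab_group_destroy are
not inlined and so have fbt probes on this kernel, and that any teardown
happens before "zpool import" exits. The helper name d_script is
hypothetical.

```shell
#!/bin/sh
# Emit a D script that records who calls metaslab_group_create and counts
# create/destroy calls. Probe availability is an assumption (see above).
d_script() {
    cat <<'EOF'
fbt::metaslab_group_create:entry
{
        @stacks[stack()] = count();
        @calls["create"] = count();
}

fbt::metaslab_group_destroy:entry
{
        @calls["destroy"] = count();
}
EOF
}

# Needs root on a system with both tools; otherwise nothing is run.
if command -v dtrace >/dev/null 2>&1 && command -v zpool >/dev/null 2>&1; then
    dtrace -q -n "$(d_script)" -c 'zpool import' ||
        echo "dtrace failed (not root, or probes missing?)" >&2
fi
```

If the "create" count ends up higher than the "destroy" count, that would be
consistent with the taskqs accumulating across runs.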


I'm out of my depth here, so any pointers on how to investigate further would
be much appreciated!

cheers,
alex