[OmniOS-discuss] Reproducible r151008j kernel crash with ZFS pools on iSCSI

Chris Siebenmann cks at cs.toronto.edu
Fri Mar 7 19:34:53 UTC 2014


 I have a reproducible kernel crash with OmniOS r151008j. The situation:

 The basic setup is a ZFS pool on mirrored pairs of iSCSI disks. The
iSCSI disks come from two different iSCSI targets, and all
targets are multipathed over two 10G networks. The pool is set to
'failmode=continue'.  If I start a large streaming write to the pool and
then take down both iSCSI interfaces on both targets (making all disks
in the pool completely unavailable), OmniOS panics after a couple of
minutes. Fortunately this doesn't happen if only a single target becomes
inaccessible.

 I have crash dumps and can run commands against them. Just tell me
what to look at or do. Since this is a test environment I can also
reproduce this on demand, and I'm willing to test things freely.

 One panic produced the following:

Mar  7 10:20:50 sanjuan ^Mpanic[cpu3]/thread=ffffff007c4dbc40: 
  BAD TRAP: type=8 (#df Double fault) rp=ffffff114dca2f10 addr=0
  
  zpool-fs3-test-0: 
  #df Double fault
  pid=463, pc=0xfffffffff7903bb8, sp=0xffffff007c4d7000, eflags=0x10086
  cr0: 8005003b<pg,wp,ne,et,ts,mp,pe> cr4: 426f8<osxsav,vmxe,xmme,fxsr,pge,mce,pae,pse,de>
  cr2: ffffff007c4d6ff8
  cr3: bc00000
  cr8: 0
  
  	rdi: ffffff1157501a80 rsi:                5 rdx: ffffff007c4d70c0
  	rcx:                5  r8: ffffff1275342d58  r9:                1
  	rax:                3 rbx: ffffff1157501a80 rbp: ffffff007c4d7050
  	r10:                0 r11:         ffffffff r12:                5
  	r13: ffffff1142c55b48 r14: ffffff007c4d70c0 r15:                5
  	fsb:                0 gsb: ffffff1157501a80  ds:               4b
  	 es:               4b  fs:                0  gs:              1c3
  	trp:                8 err:                0 rip: fffffffff7903bb8
  	 cs:               30 rfl:            10086 rsp: ffffff007c4d7000
  	 ss:               38
  tss.tss_rsp0:	0xffffff007c4dbc40
  tss.tss_rsp1:	0x0
  tss.tss_rsp2:	0x0
  tss.tss_ist1:	0xffffff114dca3000
  tss.tss_ist2:	0x0
  tss.tss_ist3:	0x0
  tss.tss_ist4:	0x0
  tss.tss_ist5:	0x0
  tss.tss_ist6:	0x0
  tss.tss_ist7:	0x0
  
  ffffff114dca2df0 unix:real_mode_stop_cpu_stage2_end+9de3 ()
  ffffff114dca2f00 unix:trap+ca5 ()
  ffffff007c4d7050 unix:_patch_xrstorq_rbx+196 ()
  ffffff007c4d70b0 apix:apix_do_interrupt+372 ()
  ffffff007c4d70c0 unix:cmnint+ba ()
  ffffff007c4d7200 genunix:avl_remove+197 ()
  ffffff007c4d7240 zfs:vdev_queue_io_remove+54 ()
  ffffff007c4d7600 zfs:vdev_queue_io_to_issue+133 ()
  ffffff007c4d7640 zfs:vdev_queue_io_done+88 ()
  ffffff007c4d7680 zfs:zio_vdev_io_done+80 ()
  ffffff007c4d76c0 zfs:zio_execute+88 ()
  ffffff007c4d7700 zfs:vdev_queue_io_done+78 ()
  ffffff007c4d7740 zfs:zio_vdev_io_done+80 ()
  ffffff007c4d7780 zfs:zio_execute+88 ()
  ffffff007c4d77c0 zfs:vdev_queue_io_done+78 ()
  ffffff007c4d7800 zfs:zio_vdev_io_done+80 ()
  ffffff007c4d7840 zfs:zio_execute+88 ()
  ffffff007c4d7880 zfs:vdev_queue_io_done+78 ()
  ffffff007c4d78c0 zfs:zio_vdev_io_done+80 ()
  ffffff007c4d7900 zfs:zio_execute+88 ()
  ffffff007c4d7940 zfs:vdev_queue_io_done+78 ()
  ffffff007c4d7980 zfs:zio_vdev_io_done+80 ()
  ffffff007c4d79c0 zfs:zio_execute+88 ()
  ffffff007c4d7a00 zfs:vdev_queue_io_done+78 ()
  ffffff007c4d7a40 zfs:zio_vdev_io_done+80 ()
  ffffff007c4d7a80 zfs:zio_execute+88 ()
  ffffff007c4d7ac0 zfs:vdev_queue_io_done+78 ()
  ffffff007c4d7b00 zfs:zio_vdev_io_done+80 ()
  ffffff007c4d7b40 zfs:zio_execute+88 ()
  ffffff007c4d7b80 zfs:vdev_queue_io_done+78 ()
  ffffff007c4d7bc0 zfs:zio_vdev_io_done+80 ()
  ffffff007c4d7c00 zfs:zio_execute+88 ()
  ffffff007c4d7c40 zfs:vdev_queue_io_done+78 ()
  ffffff007c4d7c80 zfs:zio_vdev_io_done+80 ()
  ffffff007c4d7cc0 zfs:zio_execute+88 ()
  ffffff007c4d7d00 zfs:vdev_queue_io_done+78 ()
  ffffff007c4d7d40 zfs:zio_vdev_io_done+80 ()
  ffffff007c4d7d80 zfs:zio_execute+88 ()
  ffffff007c4d7dc0 zfs:vdev_queue_io_done+78 ()
  ffffff007c4d7e00 zfs:zio_vdev_io_done+80 ()
  ffffff007c4d7e40 zfs:zio_execute+88 ()
  ffffff007c4d7e80 zfs:vdev_queue_io_done+78 ()
  ffffff007c4d7ec0 zfs:zio_vdev_io_done+80 ()
  ffffff007c4d7f00 zfs:zio_execute+88 ()
  ffffff007c4d7f40 zfs:vdev_queue_io_done+78 ()
  ffffff007c4d7f80 zfs:zio_vdev_io_done+80 ()
  ffffff007c4d7fc0 zfs:zio_execute+88 ()
  ffffff007c4d8000 zfs:vdev_queue_io_done+78 ()
  ffffff007c4d8040 zfs:zio_vdev_io_done+80 ()
  ffffff007c4d8080 zfs:zio_execute+88 ()
  ffffff007c4d80c0 zfs:vdev_queue_io_done+78 ()
  ffffff007c4d8100 zfs:zio_vdev_io_done+80 ()
  ffffff007c4d8140 zfs:zio_execute+88 ()
  ffffff007c4d8180 zfs:vdev_queue_io_done+78 ()
  ffffff007c4d81c0 zfs:zio_vdev_io_done+80 ()
  ffffff007c4d8200 zfs:zio_execute+88 ()
  ffffff007c4d8240 zfs:vdev_queue_io_done+78 ()
  ffffff007c4d8280 zfs:zio_vdev_io_done+80 ()
  ffffff007c4d82c0 zfs:zio_execute+88 ()
  ffffff007c4d8300 zfs:vdev_queue_io_done+78 ()
  ffffff007c4d8340 zfs:zio_vdev_io_done+80 ()
  ffffff007c4d8380 zfs:zio_execute+88 ()
  ffffff007c4d83c0 zfs:vdev_queue_io_done+78 ()
  ffffff007c4d8400 zfs:zio_vdev_io_done+80 ()
  ffffff007c4d8440 zfs:zio_execute+88 ()
  ffffff007c4d8480 zfs:vdev_queue_io_done+78 ()
  ffffff007c4d84c0 zfs:zio_vdev_io_done+80 ()
  ffffff007c4d8500 zfs:zio_execute+88 ()
  ffffff007c4d8540 zfs:vdev_queue_io_done+78 ()
  ffffff007c4d8580 zfs:zio_vdev_io_done+80 ()
  ffffff007c4d85c0 zfs:zio_execute+88 ()
  ffffff007c4d8600 zfs:vdev_queue_io_done+78 ()
  ffffff007c4d8640 zfs:zio_vdev_io_done+80 ()
  Warning: stack in the dump buffer may be incomplete
  ffffff007c4d8680 zfs:zio_execute+88 ()
  Warning: stack in the dump buffer may be incomplete
  ffffff007c4d86c0 zfs:vdev_queue_io_done+78 ()
  Warning: stack in the dump buffer may be incomplete
  ffffff007c4d8700 zfs:zio_vdev_io_done+80 ()
  Warning: stack in the dump buffer may be incomplete
[... repeats a lot ...]
  ffffff007c4db930 zfs:vdev_queue_io_done+78 ()
  Warning: stack in the dump buffer may be incomplete
  ffffff007c4db970 zfs:zio_vdev_io_done+80 ()
  Warning: stack in the dump buffer may be incomplete
  ffffff007c4db9b0 zfs:zio_execute+88 ()
  Warning: stack in the dump buffer may be incomplete
  ffffff007c4db9f0 zfs:vdev_queue_io_done+78 ()
  Warning: stack in the dump buffer may be incomplete
  ffffff007c4dba30 zfs:zio_vdev_io_done+80 ()
  Warning: stack in the dump buffer may be incomplete
  ffffff007c4dba70 zfs:zio_execute+88 ()
  Warning: stack in the dump buffer may be incomplete
  ffffff007c4dbb30 genunix:taskq_thread+2d0 ()
  Warning: stack in the dump buffer may be incomplete
  ffffff007c4dbb40 unix:thread_start+8 ()
  Warning: stack in the dump buffer may be incomplete
  
  syncing file systems...
   done
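
As a rough sanity check on the numbers in this trace, the following
Python sketch does some arithmetic on addresses copied from the panic
output above. It is only an illustration, not a diagnosis: the 4 KiB
page size is an assumption, the variable names are made up for this
example, and the cycle count is approximate because part of the trace
is elided. A double fault where cr2 sits just below a page-aligned sp
would be consistent with the thread running off the end of its kernel
stack, and the repeating three-frame pattern shows where the stack
space went.

```python
# Addresses copied from the first panic output; names are illustrative only.
sp = 0xffffff007c4d7000      # sp= value at the double fault
cr2 = 0xffffff007c4d6ff8     # faulting address reported in cr2
thread = 0xffffff007c4dbc40  # thread= value from the panic line (also tss_rsp0)

PAGE_SIZE = 4096             # assumption: 4 KiB x86 pages

# Frame addresses of two consecutive vdev_queue_io_done+78 entries,
# i.e. one full vdev_queue_io_done -> zio_execute -> zio_vdev_io_done cycle.
cycle_a = 0xffffff007c4d7700
cycle_b = 0xffffff007c4d77c0

# Innermost and outermost frames of the repeating pattern that are visible.
innermost = 0xffffff007c4d7640  # vdev_queue_io_done+88
outermost = 0xffffff007c4dba70  # last zio_execute+88 before taskq_thread

bytes_per_cycle = cycle_b - cycle_a
stack_span = thread - sp
approx_cycles = (outermost - innermost) // bytes_per_cycle

sp_page_aligned = sp % PAGE_SIZE == 0
fault_on_page_below = (cr2 // PAGE_SIZE) == (sp // PAGE_SIZE) - 1

print(f"bytes per recursion cycle: {bytes_per_cycle}")
print(f"span from sp up to thread pointer: {stack_span} bytes "
      f"({stack_span / PAGE_SIZE:.2f} pages)")
print(f"approximate recursion cycles visible: {approx_cycles}")
print(f"sp page-aligned: {sp_page_aligned}, "
      f"cr2 on the page below sp: {fault_on_page_below}")
```

With these inputs each cycle costs 192 bytes, the span from sp up to
the thread pointer is 19520 bytes (a little under five 4 KiB pages),
and about 90 cycles of the pattern fit between the innermost and
outermost visible frames.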

A second crash has a very similar backtrace, but the top of the stack is different:
  ffffff1157a61df0 unix:real_mode_stop_cpu_stage2_end+9de3 ()
  ffffff1157a61f00 unix:trap+ca5 ()
  ffffff007b7ce000 unix:_patch_xrstorq_rbx+196 ()
  ffffff007b7ce070 genunix:avl_find+72 ()
  ffffff007b7ce0b0 genunix:avl_add+27 ()
  ffffff007b7ce0f0 zfs:vdev_queue_pending_add+4b ()
  ffffff007b7ce4b0 zfs:vdev_queue_io_to_issue+153 ()
  ffffff007b7ce4f0 zfs:vdev_queue_io_done+88 ()
  ffffff007b7ce530 zfs:zio_vdev_io_done+80 ()
  ffffff007b7ce570 zfs:zio_execute+88 ()
  ffffff007b7ce5b0 zfs:vdev_queue_io_done+78 ()
  ffffff007b7ce5f0 zfs:zio_vdev_io_done+80 ()
  ffffff007b7ce630 zfs:zio_execute+88 ()
  ffffff007b7ce670 zfs:vdev_queue_io_done+78 ()
  ffffff007b7ce6b0 zfs:zio_vdev_io_done+80 ()
[... pattern repeats ...]

	- cks

