[OmniOS-discuss] omniOS r018 crashed due to scsi/iSCSI issue
Stephan Budach
stephan.budach at jvm.de
Fri Jan 13 07:43:42 UTC 2017
Hi Dan,
I just wanted to know whether you would be interested in the core dump of this
crash:
Jan 11 07:28:52 zfsha02gh79 scsi: [ID 107833 kern.warning] WARNING: /scsi_vhci/disk@g600144f090d09613000056b8a83b0007 (sd27):
Jan 11 07:28:52 zfsha02gh79 incomplete write- retrying
Jan 12 17:30:22 zfsha02gh79 scsi: [ID 243001 kern.warning] WARNING: /scsi_vhci (scsi_vhci0):
Jan 12 17:30:22 zfsha02gh79 /scsi_vhci/disk@g600144f0564d504f4f4c3035534c3133 (sd47): Command Timeout on path iscsi0/disk@0000iqn.2016-02.de.jvm:nfsvmpool05ssd030002,3
Jan 12 17:30:22 zfsha02gh79 scsi: [ID 243001 kern.warning] WARNING: /scsi_vhci (scsi_vhci0):
Jan 12 17:30:22 zfsha02gh79 /scsi_vhci/disk@g600144f0564d504f4f4c3035534c3039 (sd44): Command Timeout on path iscsi0/disk@0000iqn.2016-02.de.jvm:nfsvmpool05ssd030002,0
Jan 12 17:30:22 zfsha02gh79 scsi: [ID 243001 kern.warning] WARNING: /scsi_vhci (scsi_vhci0):
Jan 12 17:30:22 zfsha02gh79 /scsi_vhci/disk@g600144f0564d504f4f4c3035534c3131 (sd46): Command Timeout on path iscsi0/disk@0000iqn.2016-02.de.jvm:nfsvmpool05ssd030002,2
Jan 12 17:30:22 zfsha02gh79 scsi: [ID 243001 kern.warning] WARNING: /scsi_vhci (scsi_vhci0):
Jan 12 17:30:22 zfsha02gh79 /scsi_vhci/disk@g600144f0564d504f4f4c3035534c3130 (sd45): Command Timeout on path iscsi0/disk@0000iqn.2016-02.de.jvm:nfsvmpool05ssd030002,1
Jan 12 17:30:22 zfsha02gh79 iscsi: [ID 431120 kern.warning] WARNING: iscsi connection(26/3f) closing connection - target requested reason:0x7
Jan 12 17:30:22 zfsha02gh79 scsi: [ID 243001 kern.warning] WARNING: /scsi_vhci (scsi_vhci0):
Jan 12 17:30:22 zfsha02gh79 /scsi_vhci/disk@g600144f090d09613000056b8a7f10003 (sd19): Command Timeout on path iscsi0/disk@0000iqn.2015-03.de.jvm:nfsvmpool05ssd010002,2
Jan 12 17:30:22 zfsha02gh79 scsi: [ID 243001 kern.warning] WARNING: /scsi_vhci (scsi_vhci0):
Jan 12 17:30:22 zfsha02gh79 /scsi_vhci/disk@g600144f090d09613000056b8a7fc0004 (sd21): Command Timeout on path iscsi0/disk@0000iqn.2015-03.de.jvm:nfsvmpool05ssd010002,3
Jan 12 17:30:22 zfsha02gh79 scsi: [ID 243001 kern.warning] WARNING: /scsi_vhci (scsi_vhci0):
Jan 12 17:30:22 zfsha02gh79 /scsi_vhci/disk@g600144f090d09613000056b8a84a0008 (sd29): Command Timeout on path iscsi0/disk@0000iqn.2015-03.de.jvm:nfsvmpool05ssd010002,7
Jan 12 17:30:22 zfsha02gh79 scsi: [ID 243001 kern.warning] WARNING: /scsi_vhci (scsi_vhci0):
Jan 12 17:30:22 zfsha02gh79 unix: [ID 836849 kern.notice]
Jan 12 17:30:22 zfsha02gh79 panic[cpu1]/thread=ffffff00f6539c40:
Jan 12 17:30:22 zfsha02gh79 genunix: [ID 335743 kern.notice] BAD TRAP: type=e (#pf Page fault) rp=ffffff00f6539610 addr=10 occurred in module "scsi_vhci" due to a NULL pointer dereference
Jan 12 17:30:22 zfsha02gh79 unix: [ID 100000 kern.notice]
Jan 12 17:30:22 zfsha02gh79 unix: [ID 839527 kern.notice] sched:
Jan 12 17:30:22 zfsha02gh79 unix: [ID 753105 kern.notice] #pf Page fault
Jan 12 17:30:22 zfsha02gh79 unix: [ID 532287 kern.notice] Bad kernel fault at addr=0x10
Jan 12 17:30:22 zfsha02gh79 unix: [ID 243837 kern.notice] pid=0, pc=0xfffffffff7948e15, sp=0xffffff00f6539700, eflags=0x10246
Jan 12 17:30:22 zfsha02gh79 unix: [ID 211416 kern.notice] cr0: 8005003b<pg,wp,ne,et,ts,mp,pe> cr4: 1426f8<smep,osxsav,vmxe,xmme,fxsr,pge,mce,pae,pse,de>
Jan 12 17:30:22 zfsha02gh79 unix: [ID 624947 kern.notice] cr2: 10
Jan 12 17:30:22 zfsha02gh79 unix: [ID 625075 kern.notice] cr3: c000000
Jan 12 17:30:22 zfsha02gh79 unix: [ID 625715 kern.notice] cr8: 0
Jan 12 17:30:22 zfsha02gh79 unix: [ID 100000 kern.notice]
Jan 12 17:30:22 zfsha02gh79 unix: [ID 592667 kern.notice] rdi: ffffff226adb90d8 rsi: 1 rdx: ffffff227063d400
Jan 12 17:30:22 zfsha02gh79 unix: [ID 592667 kern.notice] rcx: 2 r8: 0 r9: fffffffff794bd10
Jan 12 17:30:22 zfsha02gh79 unix: [ID 592667 kern.notice] rax: 0 rbx: 1 rbp: ffffff00f6539780
Jan 12 17:30:22 zfsha02gh79 unix: [ID 592667 kern.notice] r10: 0 r11: ffffff00f65397b0 r12: 2
Jan 12 17:30:22 zfsha02gh79 unix: [ID 592667 kern.notice] r13: 1 r14: 0 r15: 4
Jan 12 17:30:22 zfsha02gh79 unix: [ID 592667 kern.notice] fsb: 0 gsb: ffffff21f0e81040 ds: 4b
Jan 12 17:30:22 zfsha02gh79 unix: [ID 592667 kern.notice] es: 4b fs: 0 gs: 1c3
Jan 12 17:30:22 zfsha02gh79 unix: [ID 592667 kern.notice] trp: e err: 0 rip: fffffffff7948e15
Jan 12 17:30:22 zfsha02gh79 unix: [ID 592667 kern.notice] cs: 30 rfl: 10246 rsp: ffffff00f6539700
Jan 12 17:30:22 zfsha02gh79 unix: [ID 266532 kern.notice] ss: 38
Jan 12 17:30:22 zfsha02gh79 unix: [ID 100000 kern.notice]
Jan 12 17:30:22 zfsha02gh79 genunix: [ID 655072 kern.notice] ffffff00f65394f0 unix:die+df ()
Jan 12 17:30:22 zfsha02gh79 genunix: [ID 655072 kern.notice] ffffff00f6539600 unix:trap+dd8 ()
Jan 12 17:30:22 zfsha02gh79 genunix: [ID 655072 kern.notice] ffffff00f6539610 unix:_cmntrap+e6 ()
Jan 12 17:30:22 zfsha02gh79 genunix: [ID 655072 kern.notice] ffffff00f6539780 scsi_vhci:vhci_scsi_reset_target+75 ()
Jan 12 17:30:22 zfsha02gh79 genunix: [ID 655072 kern.notice] ffffff00f65397d0 scsi_vhci:vhci_recovery_reset+7d ()
Jan 12 17:30:22 zfsha02gh79 genunix: [ID 655072 kern.notice] ffffff00f6539820 scsi_vhci:vhci_pathinfo_offline+e5 ()
Jan 12 17:30:22 zfsha02gh79 genunix: [ID 655072 kern.notice] ffffff00f65398c0 scsi_vhci:vhci_pathinfo_state_change+d5 ()
Jan 12 17:30:22 zfsha02gh79 genunix: [ID 655072 kern.notice] ffffff00f6539950 genunix:i_mdi_pi_state_change+16a ()
Jan 12 17:30:22 zfsha02gh79 genunix: [ID 655072 kern.notice] ffffff00f6539990 genunix:mdi_pi_offline+39 ()
Jan 12 17:30:22 zfsha02gh79 genunix: [ID 655072 kern.notice] ffffff00f6539a20 iscsi:iscsi_lun_offline+b3 ()
Jan 12 17:30:22 zfsha02gh79 genunix: [ID 655072 kern.notice] ffffff00f6539a60 iscsi:iscsi_sess_offline_luns+4d ()
Jan 12 17:30:22 zfsha02gh79 genunix: [ID 655072 kern.notice] ffffff00f6539ab0 iscsi:iscsi_sess_state_logged_in+11e ()
Jan 12 17:30:22 zfsha02gh79 genunix: [ID 655072 kern.notice] ffffff00f6539b00 iscsi:iscsi_sess_state_machine+13e ()
Jan 12 17:30:22 zfsha02gh79 genunix: [ID 655072 kern.notice] ffffff00f6539b60 iscsi:iscsi_client_notify_task+17e ()
Jan 12 17:30:22 zfsha02gh79 genunix: [ID 655072 kern.notice] ffffff00f6539c20 genunix:taskq_thread+2d0 ()
Jan 12 17:30:22 zfsha02gh79 genunix: [ID 655072 kern.notice] ffffff00f6539c30 unix:thread_start+8 ()
Jan 12 17:30:22 zfsha02gh79 unix: [ID 100000 kern.notice]
Jan 12 17:30:22 zfsha02gh79 genunix: [ID 672855 kern.notice] syncing file systems...
Jan 12 17:30:24 zfsha02gh79 genunix: [ID 904073 kern.notice] done
Jan 12 17:30:22 zfsha02gh79 genunix: [ID 111219 kern.notice] dumping to /dev/zvol/dsk/rpool/dump, offset 65536, content: kernel
Jan 12 17:30:22 zfsha02gh79 ahci: [ID 405573 kern.info] NOTICE: ahci0: ahci_tran_reset_dport port 0 reset port
Jan 12 17:48:01 zfsha02gh79 genunix: [ID 100000 kern.notice]
Jan 12 17:48:02 zfsha02gh79 genunix: [ID 665016 kern.notice] 100% done: 4721646 pages dumped,
This happened under rather high load, while I was copying a 200G file
from a snapshot back to its original place on its zvol. Luckily these
are RSF-1 nodes, and the other one took over so quickly that my VM
cluster didn't even seem to notice the issue. However, I was connected
to the crashing host via ssh at the time, and my heart skipped a beat. ;)
As I have (involuntarily) freed this node of its duties, I could jump
to r020 on it, but I wonder whether there have been any changes to the
scsi_vhci layer at all recently…
Cheers,
Stephan
--
Krebs's 3 Basic Rules for Online Safety
1st - "If you didn't go looking for it, don't install it!"
2nd - "If you installed it, update it."
3rd - "If you no longer need it, remove it."
http://krebsonsecurity.com/2011/05/krebss-3-basic-rules-for-online-safety
Stephan Budach
Head of IT
Jung von Matt AG
Glashüttenstraße 79
20357 Hamburg
Tel: +49 40-4321-1353
Fax: +49 40-4321-1114
E-Mail: stephan.budach at jvm.de
Internet: http://www.jvm.com
CiscoJabber Video: https://exp-e2.jvm.de/call/stephan.budach
Executive Board: Dr. Peter Figge, Jean-Remy von Matt, Larissa Pohl, Thomas Strerath, Götz Ulmer
Chairman of the Supervisory Board: Hans Hermann Münchmeyer
AG HH HRB 72893