[OmniOS-discuss] stmf trouble, crash and dump
Johan Kragsterman
johan.kragsterman at capvert.se
Thu Oct 6 09:00:58 UTC 2016
Hi!
Got a problem here a couple of days ago when I ran a snapshot stream over fibre channel on my home/business/devel server to the clone backup server.
Systems: OmniOS 5.11 omnios-r151018-95eaa7e on both systems, initiator on one and target on the other. Also the same hardware: Dell Precision workstations with dual Xeon 6-cores and 96 GB registered RAM, an Intel quad-port GbE NIC, and QLogic QLE2462 HBAs.
Configured with one LUN provisioned from the target/backup system to the initiator system as a backup LUN, and that backup LUN configured as a zpool on the initiator system. I should also say that I run this FC connection point-to-point, no switch involved, over a single 10 m fibre pair.
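For clarity, the COMSTAR setup on the target side is roughly the following sketch; the zvol name and size are placeholders from memory, the GUID and target WWN are the ones that show up in the logs further down, and the qlt port on omni2 is already in target mode:

On omni2 (target):
  zfs create -V 500G backtank/backuplun
  stmfadm create-lu /dev/zvol/rdsk/backtank/backuplun
  stmfadm add-view 600144f0c648ae73000057ef6d370001
  stmfadm online-target wwn.2101001b32a19a92

On omni (initiator), the LU shows up as a single scsi_vhci disk that carries the pool:
  zpool create backpool c0t600144F0C648AE73000057EF6D370001d0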
I did a zfs send/recv of a snapshot, and I thought it took a long time. It was around 67 GB. Then the initiator system crashed and dumped. It rebooted, and I got into it again without any trouble.
What I immediately saw was that the zpool "backpool" that was backed by the FC LUN was no longer present. I did a zpool import, and it was back again. I did another test and sent a much smaller snap, around 450 MB; that worked fine, although I thought it took a lot of time.
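The transfer itself is nothing special, essentially this shape (dataset and snapshot names here are placeholders):

  zfs send rpool/data@backup-20161002 | zfs recv -F backpool/data

and getting the pool back after the crash was just:

  zpool import backpool
  zpool status backpool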
I once again tried with the bigger snap, and the same thing happened: the system crashed and dumped. I got those two dump files, but I don't know whether this is a problem on the target side or the initiator side.
I can provide access to dump files.
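In case it helps, this is how I am looking at them locally, just the standard savecore/mdb steps (/var/crash/omni is the default dump directory here, 0 is the dump number):

  cd /var/crash/omni
  savecore -vf vmdump.0
  mdb -k unix.0 vmcore.0
  > ::status     (panic string)
  > ::stack      (stack of the panicking thread)
  > ::msgbuf     (console messages leading up to the panic)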
Here is some information from the two systems that I find interesting:
The initiator system, omni:
root@omni:/var/log# dmesg | grep qlc
Oct 2 18:33:08 omni qlc: [ID 439991 kern.info] NOTICE: Qlogic qlc(0,0): Loop OFFLINE
root@omni:/# dmesg | grep scsi
Oct 2 18:34:58 omni scsi: [ID 243001 kern.info] /pci@19,0/pci8086,3410@9/pci1077,138@0/fp@0,0 (fcp0):
Oct 2 18:34:58 omni genunix: [ID 408114 kern.info] /scsi_vhci/disk@g600144f0c648ae73000057ef6d370001 (sd5) offline
Oct 2 18:34:58 omni genunix: [ID 483743 kern.info] /scsi_vhci/disk@g600144f0c648ae73000057ef6d370001 (sd5) multipath status: failed: path 4 fp0/disk@w2101001b32a19a92,0 is offline
root@omni:/# dmesg | grep multipath
Oct 2 18:34:58 omni genunix: [ID 483743 kern.info] /scsi_vhci/disk@g600144f0c648ae73000057ef6d370001 (sd5) multipath status: failed: path 4 fp0/disk@w2101001b32a19a92,0 is offline
As you can see, the loop is marked offline at the time of the crash. Strangely, there is also a message about a failed multipath path... Why is that? There is no multipath here...
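My guess is that this is only because the FC LUN is enumerated under scsi_vhci (as far as I understand, mpxio is on by default for fp ports), so even a single path gets reported through the multipath framework, but correct me if I'm wrong. This is what I would check to confirm the path state; the local port WWN comes from fcinfo hba-port, and the disk path is the scsi_vhci name of the backup LU:

  fcinfo hba-port
  fcinfo remote-port -ls -p <local HBA port WWN>
  mpathadm list lu
  mpathadm show lu /dev/rdsk/c0t600144F0C648AE73000057EF6D370001d0s2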
The target system, omni2:
root@omni2:/root# grep stmf /var/adm/messages
Oct 2 09:56:37 omni2 pseudo: [ID 129642 kern.info] pseudo-device: stmf_sbd0
Oct 2 09:56:37 omni2 genunix: [ID 936769 kern.info] stmf_sbd0 is /pseudo/stmf_sbd@0
Oct 2 09:56:46 omni2 pseudo: [ID 129642 kern.info] pseudo-device: stmf0
Oct 2 09:56:46 omni2 genunix: [ID 936769 kern.info] stmf0 is /pseudo/stmf@0
Oct 2 09:57:31 omni2 pseudo: [ID 129642 kern.info] pseudo-device: stmf0
Oct 2 09:57:31 omni2 genunix: [ID 936769 kern.info] stmf0 is /pseudo/stmf@0
Oct 2 09:57:31 omni2 stmf_sbd: [ID 690249 kern.warning] WARNING: ioctl(DKIOCINFO) failed 25
There is this warning, ioctl(DKIOCINFO) failed 25, and I have tried to find out what it means, without much success.
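The closest I got is that errno 25 should be ENOTTY ("inappropriate ioctl for device"), which reads to me as if whatever backs the LU simply doesn't implement that ioctl; I don't know whether it is harmless. For completeness, this is how I list the LU configuration on the target (the GUID is the one from the messages above):

  stmfadm list-lu -v 600144f0c648ae73000057ef6d370001
  stmfadm list-view -l 600144f0c648ae73000057ef6d370001
  sbdadm list-lu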
Perhaps it is just as simple as the FC connection not being good enough. The cable shouldn't be a problem, since it is brand new, but it could of course be something with the HBAs. I could get another cable for doing multipath and see how that works, but let's start with this first.
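One thing I can check cheaply before adding hardware is the link error counters on both HBAs; climbing CRC or loss-of-sync counts would point at the cable or the SFPs:

  fcinfo hba-port -l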
Best regards from/Med vänliga hälsningar från
Johan Kragsterman
Capvert