[OmniOS-discuss] Debugging reproducible iSCSI initiator hang problem?

Mon Sep 29 19:51:04 UTC 2014

 We have found a reproducible iSCSI initiator issue in our environment.
The short version of the hang is that if there is a network
interruption between one of our OmniOS machines and an iSCSI target and
you run 'iscsiadm list target -S' in a roughly 30-second window after
the interruption starts and before the OmniOS iSCSI initiator subsystem
really notices the problem, 'iscsiadm' will hang in an uninterruptible
state. Further iscsiadm commands will also hang, and we have seen machines
escalate into full-scale system lockups.
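
 For anyone trying to reproduce this without wedging an interactive shell,
one option is to start the command from a small watchdog that only polls
the child and never waits on it. The following is a minimal sketch under
stated assumptions: the probe() helper, its parameter names and the timeout
values are hypothetical and chosen for illustration. Only the
'iscsiadm list target -S' command itself comes from the description above.

```python
import subprocess
import time


def probe(cmd, limit_s=10.0, poll_s=0.5):
    """Start cmd and report whether it exits within limit_s seconds.

    Returns (finished, returncode). If the child overruns, it is neither
    killed nor waited on: a process stuck in an uninterruptible state may
    ignore signals, and blocking on it would hang this script as well.
    """
    proc = subprocess.Popen(
        cmd, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL
    )
    deadline = time.monotonic() + limit_s
    while time.monotonic() < deadline:
        rc = proc.poll()  # non-blocking check of the child's status
        if rc is not None:
            return True, rc
        time.sleep(poll_s)
    # One last non-blocking check after the deadline has passed.
    rc = proc.poll()
    return rc is not None, rc
```

 Calling probe(["iscsiadm", "list", "target", "-S"]) shortly after the
network interruption starts would then return (False, None) if the command
is still stuck after the limit, which is a reasonable cue to take the
'savecore -L' dump while the hang is fresh.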

 Because this is completely reproducible for us, we have a 'savecore -L'
vmdump file (and can generate as many more as would be useful, and we
can fully panic a test system if desired). However, I don't have much
experience poking around OmniOS crash dumps. Where should I be looking
as far as mdb commands and so on go? Is there any particular state that
would be most useful to get a test machine into before taking/forcing
a crash dump?

 Thanks in advance.

	- cks
[For people who want to know more about our iSCSI configuration, the
 setup is described here:
   http://utcc.utoronto.ca/~cks/space/blog/solaris/ZFSFileserverSetupII
 It may be relevant that we're running multipathing over two separate
 iSCSI networks; we've not currently tried to reproduce this with only
 a single network or without multipathing enabled.
]