[OmniOS-discuss] kernel panic - anon_decref
wuffers
moo at wuffers.net
Fri Nov 15 05:39:27 UTC 2013
So I'm adding VMware hosts (ESXi 5.5) to my OmniOS ZFS SAN, which is
already hosting some volumes for our Windows 2012 Hyper-V infrastructure
over SRP and InfiniBand. On the VMware side, I uninstalled the default
Mellanox 1.9.7 drivers and installed the older 1.6.1 drivers along with
OFED 1.8.2. I had no issues adding the new initiator to the target group
and creating a new host group and view for the host, after which the
volume automagically showed up as expected.
I formatted the volume as VMFS5 and started creating a VM, attaching an ISO
and installing Windows Server 2012 R2. Somewhere during the install I hit my
first kernel panic, and because it was during business hours I had to reboot
the SAN rather than wait for the dump to finish. Later that night I reproduced
the issue (just loading up VMs and trying out a VMware Converter job) and
was able to get a proper dump, which is now sitting in /var/crash/unknown
(~7GB).
Screenshots:
http://i.imgur.com/nGakKyS.png?1
http://i.imgur.com/wIx0g6J.png?1
TIME                           UUID                                 SUNW-MSG-ID
Nov 14 2013 22:13:46.926077000 a4432472-983c-ca82-a231-d1b468a3a91a SUNOS-8000-KL

TIME                 CLASS                                         ENA
Nov 14 22:13:46.8830 ireport.os.sunos.panic.dump_available         0x0000000000000000
Nov 14 22:12:33.1029 ireport.os.sunos.panic.dump_pending_on_device 0x0000000000000000

nvlist version: 0
        version = 0x0
        class = list.suspect
        uuid = a4432472-983c-ca82-a231-d1b468a3a91a
        code = SUNOS-8000-KL
        diag-time = 1384485226 890408
        de = fmd:///module/software-diagnosis
        fault-list-sz = 0x1
        fault-list = (array of embedded nvlists)
        (start fault-list[0])
        nvlist version: 0
                version = 0x0
                class = defect.sunos.kernel.panic
                certainty = 0x64
                asru = sw:///:path=/var/crash/unknown/.a4432472-983c-ca82-a231-d1b468a3a91a
                resource = sw:///:path=/var/crash/unknown/.a4432472-983c-ca82-a231-d1b468a3a91a
                savecore-succcess = 1
                dump-dir = /var/crash/unknown
                dump-files = vmdump.0
                os-instance-uuid = a4432472-983c-ca82-a231-d1b468a3a91a
                panicstr = anon_decref: slot count 0
                panicstack = fffffffffbb2fa18 () | genunix:anon_free+74 () | genunix:segvn_free+242 () | genunix:seg_free+30 () | genunix:segvn_unmap+cde () | genunix:as_free+e7 () | genunix:relvm+220 () | genunix:proc_exit+454 () | genunix:exit+15 () | genunix:rexit+18 () | unix:brand_sys_sysenter+1c9 () |
                crashtime = 1384482703
                panic-time = Thu Nov 14 21:31:43 2013 EST
        (end fault-list[0])

        fault-status = 0x1
        severity = Major
        __ttl = 0x1
        __tod = 0x5285916a 0x3732d048
While getting the Hyper-V hosts up on IB and SRP I had issues with the
Windows hosts but never with the SAN box, and the whole setup had been
running stably for 3+ months until today's kernel panic. I found some other
anon_decref bugs, but those date from 2007-2008 and their fixes have
already been rolled into OmniOS. I'm pretty sure I was on the original
r151006; I've now updated to the latest, r151006y, in the hope that this
has already been taken care of. I'll keep trying to see if I can reproduce
the panic on the latest build.
In the meantime, does anyone want to take a look at the dump?