[OmniOS-discuss] kernel panic - anon_decref

wuffers moo at wuffers.net
Fri Nov 15 05:39:27 UTC 2013


So I'm adding VMware hosts (ESXi 5.5) to my OmniOS ZFS SAN, which is
already hosting some volumes for our Windows 2012 Hyper-V infrastructure,
running over SRP and InfiniBand. In VMware, I had uninstalled the default
Mellanox 1.9.7 drivers and installed the older 1.6.1 drivers along with
OFED 1.8.2. I had no issues adding the new initiator to the target group,
or creating a new host group and view for the host, after which the
volume automagically showed up as expected.

I formatted the volume using VMFS5 and started creating a VM, attaching an
ISO and loading up Windows Server 2012 R2. Somewhere during the install, I
had my first kernel panic, and I had to reboot the SAN as it was during
business hours (couldn't wait for the dump to finish). Later that night I
reproduced the issue (just loading up VMs, and trying out a VMware
Converter job) and was able to get a proper dump (which is now sitting in
my /var/crash/unknown, ~7GB).

Screenshots:
http://i.imgur.com/nGakKyS.png?1
http://i.imgur.com/wIx0g6J.png?1


TIME                           UUID                                 SUNW-MSG-ID
Nov 14 2013 22:13:46.926077000 a4432472-983c-ca82-a231-d1b468a3a91a SUNOS-8000-KL

  TIME                 CLASS                                 ENA
  Nov 14 22:13:46.8830 ireport.os.sunos.panic.dump_available 0x0000000000000000
  Nov 14 22:12:33.1029 ireport.os.sunos.panic.dump_pending_on_device 0x0000000000000000

nvlist version: 0
        version = 0x0
        class = list.suspect
        uuid = a4432472-983c-ca82-a231-d1b468a3a91a
        code = SUNOS-8000-KL
        diag-time = 1384485226 890408
        de = fmd:///module/software-diagnosis
        fault-list-sz = 0x1
        fault-list = (array of embedded nvlists)
        (start fault-list[0])
        nvlist version: 0
                version = 0x0
                class = defect.sunos.kernel.panic
                certainty = 0x64
                asru = sw:///:path=/var/crash/unknown/.a4432472-983c-ca82-a231-d1b468a3a91a
                resource = sw:///:path=/var/crash/unknown/.a4432472-983c-ca82-a231-d1b468a3a91a
                savecore-succcess = 1
                dump-dir = /var/crash/unknown
                dump-files = vmdump.0
                os-instance-uuid = a4432472-983c-ca82-a231-d1b468a3a91a
                panicstr = anon_decref: slot count 0
                panicstack = fffffffffbb2fa18 () | genunix:anon_free+74 () | genunix:segvn_free+242 () | genunix:seg_free+30 () | genunix:segvn_unmap+cde () | genunix:as_free+e7 () | genunix:relvm+220 () | genunix:proc_exit+454 () | genunix:exit+15 () | genunix:rexit+18 () | unix:brand_sys_sysenter+1c9 () |
                crashtime = 1384482703
                panic-time = Thu Nov 14 21:31:43 2013 EST
        (end fault-list[0])

        fault-status = 0x1
        severity = Major
        __ttl = 0x1
        __tod = 0x5285916a 0x3732d048
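
In case it helps anyone reading the panicstack field above, here is a
minimal Python sketch that splits it into one frame per line. The
parse_panicstack helper and its regular expression are my own illustration
(not part of fmdump or any OmniOS tool), and it assumes the field has been
rejoined onto a single line, with frames separated by "|" and written as
module:function+hexoffset.

```python
import re

# panicstack value copied from the fmdump output above, rejoined onto one line.
PANICSTACK = (
    "fffffffffbb2fa18 () | genunix:anon_free+74 () | "
    "genunix:segvn_free+242 () | genunix:seg_free+30 () | "
    "genunix:segvn_unmap+cde () | genunix:as_free+e7 () | "
    "genunix:relvm+220 () | genunix:proc_exit+454 () | "
    "genunix:exit+15 () | genunix:rexit+18 () | "
    "unix:brand_sys_sysenter+1c9 () |"
)

# Assumed frame shape: module:function+hexoffset (an illustration, not a spec).
FRAME_RE = re.compile(r"^(?P<module>\w+):(?P<func>\w+)\+(?P<offset>[0-9a-f]+)$")


def parse_panicstack(text):
    """Return a list of (module, function, offset) tuples, innermost first.

    Entries that do not match the assumed shape (such as the bare address
    at the top of the stack) are kept as (None, raw_text, None).
    """
    frames = []
    for entry in text.split("|"):
        entry = entry.replace("()", "").strip()
        if not entry:
            continue  # skip the empty piece after the trailing "|"
        match = FRAME_RE.match(entry)
        if match:
            frames.append(
                (match["module"], match["func"], int(match["offset"], 16))
            )
        else:
            frames.append((None, entry, None))
    return frames


if __name__ == "__main__":
    for depth, (module, func, offset) in enumerate(parse_panicstack(PANICSTACK)):
        if module is None:
            print(f"#{depth:<2} {func}")
        else:
            print(f"#{depth:<2} {module}`{func}+{offset:#x}")
```

Read top to bottom, the frames show a process exiting (rexit, exit,
proc_exit) and tearing down its address space (relvm, as_free,
segvn_unmap) before anon_free is reached and the panic string is raised.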

While getting the Hyper-V hosts up on IB and SRP I had issues with the
Windows hosts but never with the SAN box, and they had been running
stable for 3+ months until the kernel panic today. I saw some other
anon_decref bugs, but those were from 2007-2008 and the fixes have already
been rolled into OmniOS. I'm pretty sure I was on the original r151006, and
I am now on the latest r151006y in the hope that it's already taken care
of. I'll try other things to see if I can reproduce on the latest build.

In the meantime, does anyone want to take a look at the dump?

