[OmniOS-discuss] kernel panic - anon_decref

wuffers moo at wuffers.net
Mon Nov 18 22:42:31 UTC 2013


Just to add to this: I've now had a 4th kernel panic, and it is a 3rd distinct
type. I ran memtest on the unit after this last panic, and it completed
successfully (24+ hours). I'm skeptical that it's memory, or anything to
do with the IOCLogInfo=0x31120303 error (the last 2 panics didn't have that; I
may start another thread on it), as I've been running this config with
Hyper-V hosts just fine. Adding an ESXi host (just one for now) into the
mix seems to make things unstable.
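
For reference, the IOCLogInfo word can be split into bit fields. This is
only a sketch: the field layout (bus type, originator, code, sub-code) is an
assumption borrowed from how I understand the Linux mpt2sas/mpt3sas drivers
define their loginfo union, not something taken from the illumos mptsas
source, and split_loginfo is a made-up helper name. It says nothing about
what the individual codes mean.

```python
def split_loginfo(loginfo: int) -> dict:
    """Split a 32-bit IOCLogInfo word into bit fields.

    Assumed layout (not confirmed against the illumos driver):
      bits 31-28 bus type, bits 27-24 originator,
      bits 23-16 code,     bits 15-0  sub-code.
    """
    return {
        "bus_type": (loginfo >> 28) & 0xF,
        "originator": (loginfo >> 24) & 0xF,
        "code": (loginfo >> 16) & 0xFF,
        "subcode": loginfo & 0xFFFF,
    }


# Value taken from the mpt_sas1 warnings.
fields = split_loginfo(0x31120303)
print({name: hex(value) for name, value in fields.items()})
```

Having the fields separated should make it easier to look the value up in
the controller vendor's documentation or to compare it with other reports.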

Should I open an issue in the illumos issue tracker
(https://www.illumos.org/projects/illumos-gate/issues/new)? If so, should it
be just one report, or one for each panic type?

List of kernel panics so far:

Panic 1: anon_decref: slot count 0
Panic 2-3: kernel heap corruption detected
Panic 4: BAD TRAP: type=e (#pf Page fault) rp=ffffff01e97d7a70 addr=1500010
occurred in module "genunix" due to an illegal access to a user address

Latest crash file here:
https://drive.google.com/file/d/0B7mCJnZUzJPKWW83TFBhVHpVajQ

TIME                           UUID                                 SUNW-MSG-ID
Nov 17 2013 09:22:20.799446000 9d55f532-d39f-4dea-8f57-d3b24c8e9dff SUNOS-8000-KL

  TIME                 CLASS                                 ENA
  Nov 17 09:22:20.7654 ireport.os.sunos.panic.dump_available 0x0000000000000000
  Nov 17 09:21:14.0267 ireport.os.sunos.panic.dump_pending_on_device 0x0000000000000000
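
As a sanity check on the times, the epoch values in the nvlist record for
this panic (crashtime, diag-time, and the first word of __tod) can be
converted and compared with the local times in the summary. A minimal Python
sketch, assuming EST is a fixed UTC-5 offset (no DST in mid-November) and
that the first __tod word is seconds since the epoch:

```python
from datetime import datetime, timedelta, timezone

# Assumption: the host's "EST" is a fixed UTC-5 offset.
EST = timezone(timedelta(hours=-5), "EST")

crashtime = 1384592942               # crashtime field
diag_time = 1384698140               # seconds part of diag-time
tod_seconds = int("0x5288d11c", 16)  # first word of __tod

crashed = datetime.fromtimestamp(crashtime, tz=EST)
diagnosed = datetime.fromtimestamp(diag_time, tz=EST)

fmt = "%a %b %d %H:%M:%S %Y %Z"
print("crashed:  ", crashed.strftime(fmt))
print("diagnosed:", diagnosed.strftime(fmt))
print("gap:      ", diagnosed - crashed)
print("__tod matches diag-time:", tod_seconds == diag_time)
```

The conversions line up with the panic-time field and with the summary
timestamps, and the gap between the crash and the diagnosis comes out at a
bit over 29 hours.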

nvlist version: 0
        version = 0x0
        class = list.suspect
        uuid = 9d55f532-d39f-4dea-8f57-d3b24c8e9dff
        code = SUNOS-8000-KL
        diag-time = 1384698140 767808
        de = fmd:///module/software-diagnosis
        fault-list-sz = 0x1
        fault-list = (array of embedded nvlists)
        (start fault-list[0])
        nvlist version: 0
                version = 0x0
                class = defect.sunos.kernel.panic
                certainty = 0x64
                asru = sw:///:path=/var/crash/unknown/.9d55f532-d39f-4dea-8f57-d3b24c8e9dff
                resource = sw:///:path=/var/crash/unknown/.9d55f532-d39f-4dea-8f57-d3b24c8e9dff
                savecore-succcess = 1
                dump-dir = /var/crash/unknown
                dump-files = vmdump.3
                os-instance-uuid = 9d55f532-d39f-4dea-8f57-d3b24c8e9dff
                panicstr = BAD TRAP: type=e (#pf Page fault)
rp=ffffff01e97d7a70 addr=1500010 occurred in module "genunix" due to an
illegal access to a user address
                panicstack = unix:die+df () | unix:trap+db3 () |
unix:cmntrap+e6 () | genunix:anon_decref+35 () | genunix:anon_free+74 () |
genunix:segvn_free+242 () | genunix:seg_free+30 () |
genunix:segvn_unmap+cde () | genunix:as_free+e7 () | genunix:relvm+220 () |
genunix:proc_exit+454 () | genunix:exit+15 () | genunix:rexit+18 () |
unix:brand_sys_sysenter+1c9 () |
                crashtime = 1384592942
                panic-time = Sat Nov 16 04:09:02 2013 EST
        (end fault-list[0])

        fault-status = 0x1
        severity = Major
        __ttl = 0x1
        __tod = 0x5288d11c 0x2fa693f0
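
To compare this stack with the earlier panics, the panicstack string can be
split into frames. The sketch below assumes each frame looks like
"module:function+hexoffset ()" separated by "|", and treats unix:die,
unix:trap and unix:cmntrap as trap-handling frames to skip; that skip list
and the helper name parse_panicstack are my own choices for illustration.

```python
import re

# panicstack value copied from the record above (line wraps removed).
PANICSTACK = (
    "unix:die+df () | unix:trap+db3 () | unix:cmntrap+e6 () | "
    "genunix:anon_decref+35 () | genunix:anon_free+74 () | "
    "genunix:segvn_free+242 () | genunix:seg_free+30 () | "
    "genunix:segvn_unmap+cde () | genunix:as_free+e7 () | "
    "genunix:relvm+220 () | genunix:proc_exit+454 () | "
    "genunix:exit+15 () | genunix:rexit+18 () | "
    "unix:brand_sys_sysenter+1c9 () |"
)

FRAME = re.compile(r"^(?P<module>\w+):(?P<func>\w+)\+(?P<offset>[0-9a-f]+) \(\)$")

# Assumption: these frames belong to the trap path, not the faulting code.
TRAP_HANDLERS = {"die", "trap", "cmntrap"}


def parse_panicstack(text):
    """Return a list of (module, function, offset) tuples, outermost last."""
    frames = []
    for part in text.split("|"):
        part = part.strip()
        if not part:
            continue  # trailing "|" leaves an empty piece
        match = FRAME.match(part)
        if match:
            frames.append((match["module"], match["func"], int(match["offset"], 16)))
        else:
            # Frames printed as a bare address are kept unparsed.
            frames.append((None, part, None))
    return frames


frames = parse_panicstack(PANICSTACK)
faulting = next(frame for frame in frames if frame[1] not in TRAP_HANDLERS)
print(len(frames), "frames; first non-trap frame:", faulting)
```

The first non-trap frame here is genunix:anon_decref, the same function
named in the panic 1 message ("anon_decref: slot count 0").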


On Sat, Nov 16, 2013 at 2:48 AM, wuffers <moo at wuffers.net> wrote:

> When it rains, it pours. With r151006y, I had two kernel panics in quick
> succession while trying to create some eager-zeroed thick disks (4 at the
> same time) in ESXi. They are now "kernel heap corruption detected" instead
> of anon_decref.
>
> Kernel panic 2 (dump info:
> https://drive.google.com/file/d/0B7mCJnZUzJPKMHhqZHJnaDEzYkk)
> http://i.imgur.com/eIssxmc.png?1
> http://i.imgur.com/MXJy4zP.png?1
>
> TIME                           UUID                                 SUNW-MSG-ID
> Nov 16 2013 00:51:24.912170000 5998ba1e-3aa5-ccac-e885-be4897cfcfe8 SUNOS-8000-KL
>
>   TIME                 CLASS                                 ENA
>   Nov 16 00:51:24.8638 ireport.os.sunos.panic.dump_available 0x0000000000000000
>   Nov 16 00:49:58.8671 ireport.os.sunos.panic.dump_pending_on_device 0x0000000000000000
>
>
> nvlist version: 0
>         version = 0x0
>         class = list.suspect
>         uuid = 5998ba1e-3aa5-ccac-e885-be4897cfcfe8
>         code = SUNOS-8000-KL
>         diag-time = 1384581084 866703
>
>         de = fmd:///module/software-diagnosis
>         fault-list-sz = 0x1
>         fault-list = (array of embedded nvlists)
>         (start fault-list[0])
>         nvlist version: 0
>                 version = 0x0
>                 class = defect.sunos.kernel.panic
>                 certainty = 0x64
>                 asru = sw:///:path=/var/crash/unknown/.5998ba1e-3aa5-ccac-e885-be4897cfcfe8
>                 resource = sw:///:path=/var/crash/unknown/.5998ba1e-3aa5-ccac-e885-be4897cfcfe8
>
>                 savecore-succcess = 1
>                 dump-dir = /var/crash/unknown
>                 dump-files = vmdump.1
>                 os-instance-uuid = 5998ba1e-3aa5-ccac-e885-be4897cfcfe8
>                 panicstr = kernel heap corruption detected
>                 panicstack = fffffffffba49c04 () |
> genunix:kmem_slab_free+c1 () | genunix:kmem_magazine_destroy+6e () |
> genunix:kmem_depot_ws_reap+5d () | genunix:kmem_cache_magazine_purge+118 ()
> | genunix:kmem_cache_magazine_resize+40 () | genunix:taskq_thread+2d0 () |
> unix:thread_start+8 () |
>                 crashtime = 1384577735
>                 panic-time = Fri Nov 15 23:55:35 2013 EST
>
>         (end fault-list[0])
>
>         fault-status = 0x1
>         severity = Major
>         __ttl = 0x1
>         __tod = 0x528707dc 0x365e9c10
>
> Kernel panic 3 (dump info:
> https://drive.google.com/file/d/0B7mCJnZUzJPKbnZIeWZzQjhUOTQ)
> (looked the same, no screenshots)
>
> TIME                           UUID                                 SUNW-MSG-ID
> Nov 16 2013 01:44:43.327489000 a6592c60-199f-ead5-9586-ff013bf5ab2d SUNOS-8000-KL
>
>   TIME                 CLASS                                 ENA
>   Nov 16 01:44:43.2941 ireport.os.sunos.panic.dump_available 0x0000000000000000
>   Nov 16 01:44:03.5356 ireport.os.sunos.panic.dump_pending_on_device 0x0000000000000000
>
>
> nvlist version: 0
>         version = 0x0
>         class = list.suspect
>         uuid = a6592c60-199f-ead5-9586-ff013bf5ab2d
>         code = SUNOS-8000-KL
>         diag-time = 1384584283 296816
>
>         de = fmd:///module/software-diagnosis
>         fault-list-sz = 0x1
>         fault-list = (array of embedded nvlists)
>         (start fault-list[0])
>         nvlist version: 0
>                 version = 0x0
>                 class = defect.sunos.kernel.panic
>                 certainty = 0x64
>                 asru = sw:///:path=/var/crash/unknown/.a6592c60-199f-ead5-9586-ff013bf5ab2d
>                 resource = sw:///:path=/var/crash/unknown/.a6592c60-199f-ead5-9586-ff013bf5ab2d
>
>                 savecore-succcess = 1
>                 dump-dir = /var/crash/unknown
>                 dump-files = vmdump.2
>                 os-instance-uuid = a6592c60-199f-ead5-9586-ff013bf5ab2d
>                 panicstr = kernel heap corruption detected
>                 panicstack = fffffffffba49c04 () |
> genunix:kmem_slab_free+c1 () | genunix:kmem_magazine_destroy+6e () |
> genunix:kmem_cache_magazine_purge+dc () |
> genunix:kmem_cache_magazine_resize+40 () | genunix:taskq_thread+2d0 () |
> unix:thread_start+8 () |
>                 crashtime = 1384582658
>                 panic-time = Sat Nov 16 01:17:38 2013 EST
>
>         (end fault-list[0])
>
>         fault-status = 0x1
>         severity = Major
>         __ttl = 0x1
>         __tod = 0x5287145b 0x138515e8
>
>
> ---
> Now, having looked through all 3, I can see in the first two there were
> some warnings:
>
> WARNING: /pci@0,0/pci8086,3c08@3/pci1000,3030@0 (mpt_sas1):
>         mptsas_handle_event_sync: IOCStatus=0x8000, IOCLogInfo=0x31120303
>
> The /var/adm/messages file also had a sprinkling of these:
> Nov 15 23:36:43 san1 scsi: [ID 243001 kern.warning] WARNING: /pci@0,0/pci8086,3c08@3/pci1000,3030@0 (mpt_sas1):
> Nov 15 23:36:43 san1    mptsas_handle_event: IOCStatus=0x8000, IOCLogInfo=0x31120303
> Nov 15 23:36:43 san1 scsi: [ID 365881 kern.info] /pci@0,0/pci8086,3c08@3/pci1000,3030@0 (mpt_sas1):
> Nov 15 23:36:43 san1    Log info 0x31120303 received for target 10.
> Nov 15 23:36:43 san1    scsi_status=0x0, ioc_status=0x804b, scsi_state=0xc
>
> Following this
> http://lists.omniti.com/pipermail/omnios-discuss/2013-March/000544.html
> to map the target disk, it's my Stec ZeusRAM ZIL drive that's configured as
> a mirror (if I've done it right). I didn't see these errors in the 3rd
> dump, so I don't know if it's contributing. I may try to run a memtest
> tomorrow on the system just in case it's a hardware issue.
>
> My zpool status shows all my drives okay with no known data errors.
>
> Not sure how to proceed from here.. my Hyper-V hosts have been using the
> SAN with no issues for 2+ months since it's been up and configured, using
> SRP and IB. I'd expect the VM hosts to crash before my SAN does.
>
> Of course, I can make the vmdump.x files available to anyone who wants to
> look at them (7GB, 8GB, 4GB).
>
>