[OmniOS-discuss] kernel panic - anon_decref
wuffers
moo at wuffers.net
Mon Nov 18 22:42:31 UTC 2013
Just to add to this: I've had a 4th kernel panic, and this one is a 3rd
distinct type. I ran memtest on the unit after this last panic and it passed
(24+ hours). I'm skeptical that it's memory, or anything to do with the
IOCLogInfo=0x31120303 error (the last 2 panics didn't show it; I may start
another thread on that), since I've been running this config with Hyper-V
hosts just fine. Adding an ESXi host (just one for now) into the mix seems
to make things unstable.
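For anyone trying to read the IOCLogInfo value itself, here is a minimal sketch that only splits the words into bit fields. It assumes the layout from the LSI MPI 2.0 headers (log-info type in the top 4 bits, 0x3 = SAS; bit 15 of IOCStatus, 0x8000, flags that a log-info word is attached) plus the further originator/code/sub-code split used by the Linux mpt2sas driver. The helper names decode_loginfo and decode_iocstatus are hypothetical, and the sketch does not say what 0x31120303 actually means.

```python
# Minimal sketch: split LSI/Avago MPT "IOCLogInfo" and "IOCStatus" words into
# bit fields. ASSUMPTION: layout taken from the MPI 2.0 headers and the Linux
# mpt2sas driver; it is not read out of the illumos mpt_sas driver.

LOGINFO_TYPES = {0x0: "NONE", 0x1: "SCSI", 0x2: "FC", 0x3: "SAS", 0x4: "ISCSI"}
# Originator names are only meaningful when the type is SAS (simplification).
SAS_ORIGINATORS = {0x0: "IOP", 0x1: "PL", 0x2: "IR"}

IOCSTATUS_LOGINFO_AVAILABLE = 0x8000  # flag bit: a log-info word accompanies the status
IOCSTATUS_MASK = 0x7FFF               # remaining bits: the actual status code


def decode_loginfo(value: int) -> dict:
    """Split a 32-bit IOCLogInfo into type / originator / code / sub-code."""
    log_type = (value >> 28) & 0xF
    originator = (value >> 24) & 0xF
    return {
        "type": LOGINFO_TYPES.get(log_type, hex(log_type)),
        "originator": SAS_ORIGINATORS.get(originator, hex(originator)),
        "code": (value >> 16) & 0xFF,
        "subcode": value & 0xFFFF,
    }


def decode_iocstatus(value: int) -> tuple:
    """Return (status code, log-info-available flag) for an IOCStatus word."""
    return value & IOCSTATUS_MASK, bool(value & IOCSTATUS_LOGINFO_AVAILABLE)


if __name__ == "__main__":
    print(decode_loginfo(0x31120303))  # fields from the warnings in this thread
    print(decode_iocstatus(0x8000))    # IOCStatus from mptsas_handle_event
    print(decode_iocstatus(0x804B))    # ioc_status from the "Log info" line
```

Read this way, IOCStatus=0x8000 is a zero status code with log info attached, and 0x31120303 splits into type SAS, originator PL, code 0x12, sub-code 0x0303.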
Should I be opening an issue in the illumos issue tracker
(https://www.illumos.org/projects/illumos-gate/issues/new), and if so, just
one report, or one for each panic type?
List of kernel panics so far:
Panic 1: anon_decref: slot count 0
Panic 2-3: kernel heap corruption detected
Panic 4: BAD TRAP: type=e (#pf Page fault) rp=ffffff01e97d7a70 addr=1500010 occurred in module "genunix" due to an illegal access to a user address
Latest crash file here:
https://drive.google.com/file/d/0B7mCJnZUzJPKWW83TFBhVHpVajQ
TIME                           UUID                                 SUNW-MSG-ID
Nov 17 2013 09:22:20.799446000 9d55f532-d39f-4dea-8f57-d3b24c8e9dff SUNOS-8000-KL

TIME                 CLASS                                          ENA
Nov 17 09:22:20.7654 ireport.os.sunos.panic.dump_available          0x0000000000000000
Nov 17 09:21:14.0267 ireport.os.sunos.panic.dump_pending_on_device  0x0000000000000000
nvlist version: 0
        version = 0x0
        class = list.suspect
        uuid = 9d55f532-d39f-4dea-8f57-d3b24c8e9dff
        code = SUNOS-8000-KL
        diag-time = 1384698140 767808
        de = fmd:///module/software-diagnosis
        fault-list-sz = 0x1
        fault-list = (array of embedded nvlists)
        (start fault-list[0])
        nvlist version: 0
                version = 0x0
                class = defect.sunos.kernel.panic
                certainty = 0x64
                asru = sw:///:path=/var/crash/unknown/.9d55f532-d39f-4dea-8f57-d3b24c8e9dff
                resource = sw:///:path=/var/crash/unknown/.9d55f532-d39f-4dea-8f57-d3b24c8e9dff
                savecore-succcess = 1
                dump-dir = /var/crash/unknown
                dump-files = vmdump.3
                os-instance-uuid = 9d55f532-d39f-4dea-8f57-d3b24c8e9dff
                panicstr = BAD TRAP: type=e (#pf Page fault) rp=ffffff01e97d7a70 addr=1500010 occurred in module "genunix" due to an illegal access to a user address
                panicstack = unix:die+df () | unix:trap+db3 () | unix:cmntrap+e6 () |
                    genunix:anon_decref+35 () | genunix:anon_free+74 () | genunix:segvn_free+242 () |
                    genunix:seg_free+30 () | genunix:segvn_unmap+cde () | genunix:as_free+e7 () |
                    genunix:relvm+220 () | genunix:proc_exit+454 () | genunix:exit+15 () |
                    genunix:rexit+18 () | unix:brand_sys_sysenter+1c9 () |
                crashtime = 1384592942
                panic-time = Sat Nov 16 04:09:02 2013 EST
        (end fault-list[0])
        fault-status = 0x1
        severity = Major
        __ttl = 0x1
        __tod = 0x5288d11c 0x2fa693f0
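When lining up the dumps against /var/adm/messages, it can help to turn the raw numeric fields back into wall-clock times. crashtime is Unix epoch seconds, and panic-time is the same instant in the box's local zone (EST, UTC-5, per the suffix). In all three dumps quoted in this thread, the two words of __tod also appear to be (seconds, nanoseconds) of the event: 0x5288d11c is 1384698140, the diag-time seconds, and 0x2fa693f0 is 799446000, the fractional part of the TIME column. That reading is an inference from this output, not documented fmd behaviour. A minimal sketch, with hypothetical helper names:

```python
from datetime import datetime, timedelta, timezone

# ASSUMPTION: the host's local zone was US Eastern standard time (UTC-5),
# matching the "EST" suffix on panic-time. No DST handling is attempted.
EST = timezone(timedelta(hours=-5), "EST")


def crashtime_to_local(crashtime: int, tz: timezone = EST) -> str:
    """Render an fmd 'crashtime' (Unix epoch seconds) like the panic-time field."""
    return datetime.fromtimestamp(crashtime, tz=tz).strftime("%a %b %d %H:%M:%S %Y %Z")


def tod_to_local(tod_sec: int, tod_nsec: int, tz: timezone = EST) -> str:
    """Render a '__tod' pair, read here as (seconds, nanoseconds); this is an inference."""
    stamp = datetime.fromtimestamp(tod_sec, tz=tz).strftime("%b %d %Y %H:%M:%S")
    return f"{stamp}.{tod_nsec:09d}"


if __name__ == "__main__":
    print(crashtime_to_local(1384592942))          # panic 4 crashtime
    print(tod_to_local(0x5288D11C, 0x2FA693F0))   # panic 4 __tod
```

Note that panic 4's crashtime decodes to Nov 16 04:09:02 EST, while the fmd event for it is stamped Nov 17 09:22:20, so the diagnosis was logged more than a day after the panic itself.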
On Sat, Nov 16, 2013 at 2:48 AM, wuffers <moo at wuffers.net> wrote:
> When it rains, it pours. With r151006y, I had two kernel panics in quick
> succession while trying to create some eager-zeroed thick disks (4 at the
> same time) in ESXi. They are now "kernel heap corruption detected" instead
> of anon_decref.
>
> Kernel panic 2 (dump info:
> https://drive.google.com/file/d/0B7mCJnZUzJPKMHhqZHJnaDEzYkk)
> http://i.imgur.com/eIssxmc.png?1
> http://i.imgur.com/MXJy4zP.png?1
>
> TIME                           UUID                                 SUNW-MSG-ID
> Nov 16 2013 00:51:24.912170000 5998ba1e-3aa5-ccac-e885-be4897cfcfe8 SUNOS-8000-KL
>
> TIME                 CLASS                                          ENA
> Nov 16 00:51:24.8638 ireport.os.sunos.panic.dump_available          0x0000000000000000
> Nov 16 00:49:58.8671 ireport.os.sunos.panic.dump_pending_on_device  0x0000000000000000
>
> nvlist version: 0
>         version = 0x0
>         class = list.suspect
>         uuid = 5998ba1e-3aa5-ccac-e885-be4897cfcfe8
>         code = SUNOS-8000-KL
>         diag-time = 1384581084 866703
>         de = fmd:///module/software-diagnosis
>         fault-list-sz = 0x1
>         fault-list = (array of embedded nvlists)
>         (start fault-list[0])
>         nvlist version: 0
>                 version = 0x0
>                 class = defect.sunos.kernel.panic
>                 certainty = 0x64
>                 asru = sw:///:path=/var/crash/unknown/.5998ba1e-3aa5-ccac-e885-be4897cfcfe8
>                 resource = sw:///:path=/var/crash/unknown/.5998ba1e-3aa5-ccac-e885-be4897cfcfe8
>                 savecore-succcess = 1
>                 dump-dir = /var/crash/unknown
>                 dump-files = vmdump.1
>                 os-instance-uuid = 5998ba1e-3aa5-ccac-e885-be4897cfcfe8
>                 panicstr = kernel heap corruption detected
>                 panicstack = fffffffffba49c04 () | genunix:kmem_slab_free+c1 () |
>                     genunix:kmem_magazine_destroy+6e () | genunix:kmem_depot_ws_reap+5d () |
>                     genunix:kmem_cache_magazine_purge+118 () | genunix:kmem_cache_magazine_resize+40 () |
>                     genunix:taskq_thread+2d0 () | unix:thread_start+8 () |
>                 crashtime = 1384577735
>                 panic-time = Fri Nov 15 23:55:35 2013 EST
>         (end fault-list[0])
>
>         fault-status = 0x1
>         severity = Major
>         __ttl = 0x1
>         __tod = 0x528707dc 0x365e9c10
>
>
> Kernel panic 3 (dump info:
> https://drive.google.com/file/d/0B7mCJnZUzJPKbnZIeWZzQjhUOTQ):
> (looked the same; no screenshots)
>
> TIME                           UUID                                 SUNW-MSG-ID
> Nov 16 2013 01:44:43.327489000 a6592c60-199f-ead5-9586-ff013bf5ab2d SUNOS-8000-KL
>
> TIME                 CLASS                                          ENA
> Nov 16 01:44:43.2941 ireport.os.sunos.panic.dump_available          0x0000000000000000
> Nov 16 01:44:03.5356 ireport.os.sunos.panic.dump_pending_on_device  0x0000000000000000
>
> nvlist version: 0
>         version = 0x0
>         class = list.suspect
>         uuid = a6592c60-199f-ead5-9586-ff013bf5ab2d
>         code = SUNOS-8000-KL
>         diag-time = 1384584283 296816
>         de = fmd:///module/software-diagnosis
>         fault-list-sz = 0x1
>         fault-list = (array of embedded nvlists)
>         (start fault-list[0])
>         nvlist version: 0
>                 version = 0x0
>                 class = defect.sunos.kernel.panic
>                 certainty = 0x64
>                 asru = sw:///:path=/var/crash/unknown/.a6592c60-199f-ead5-9586-ff013bf5ab2d
>                 resource = sw:///:path=/var/crash/unknown/.a6592c60-199f-ead5-9586-ff013bf5ab2d
>                 savecore-succcess = 1
>                 dump-dir = /var/crash/unknown
>                 dump-files = vmdump.2
>                 os-instance-uuid = a6592c60-199f-ead5-9586-ff013bf5ab2d
>                 panicstr = kernel heap corruption detected
>                 panicstack = fffffffffba49c04 () | genunix:kmem_slab_free+c1 () |
>                     genunix:kmem_magazine_destroy+6e () | genunix:kmem_cache_magazine_purge+dc () |
>                     genunix:kmem_cache_magazine_resize+40 () | genunix:taskq_thread+2d0 () |
>                     unix:thread_start+8 () |
>                 crashtime = 1384582658
>                 panic-time = Sat Nov 16 01:17:38 2013 EST
>         (end fault-list[0])
>
>         fault-status = 0x1
>         severity = Major
>         __ttl = 0x1
>         __tod = 0x5287145b 0x138515e8
>
>
> ---
> Now, having looked through all 3, I can see that the first two had some
> warnings:
>
> WARNING: /pci@0,0/pci8086,3c08@3/pci1000,3030@0 (mpt_sas1):
> mptsas_handle_event_sync: IOCStatus=0x8000, IOCLogInfo=0x31120303
>
> /var/adm/messages also had a sprinkling of these:
> Nov 15 23:36:43 san1 scsi: [ID 243001 kern.warning] WARNING: /pci@0,0/pci8086,3c08@3/pci1000,3030@0 (mpt_sas1):
> Nov 15 23:36:43 san1 mptsas_handle_event: IOCStatus=0x8000, IOCLogInfo=0x31120303
> Nov 15 23:36:43 san1 scsi: [ID 365881 kern.info] /pci@0,0/pci8086,3c08@3/pci1000,3030@0 (mpt_sas1):
> Nov 15 23:36:43 san1 Log info 0x31120303 received for target 10.
> Nov 15 23:36:43 san1 scsi_status=0x0, ioc_status=0x804b, scsi_state=0xc
>
> Following
> http://lists.omniti.com/pipermail/omnios-discuss/2013-March/000544.html
> to map the target disk, it's my STEC ZeusRAM ZIL drive, which is configured
> as a mirror (if I've done it right). I didn't see these errors in the 3rd
> dump, so I don't know whether they're contributing. I may run memtest on
> the system tomorrow, just in case it's a hardware issue.
>
> My zpool status shows all my drives okay with no known data errors.
>
> Not sure how to proceed from here. My Hyper-V hosts have been using the
> SAN with no issues for 2+ months, ever since it was set up and configured,
> using SRP over IB. I'd expect the VM hosts to crash before my SAN does.
>
> Of course, I can make the vmdump.x files available to anyone who wants to
> look at them (7GB, 8GB, 4GB).