[OmniOS-discuss] kernel panic - anon_decref
wuffers
moo at wuffers.net
Sat Nov 16 07:48:43 UTC 2013
When it pours, it rains. With r151006y, I had two kernel panics in quick
succession while trying to create some eager-zeroed thick disks (4 at the
same time) in ESXi. The panic string is now "kernel heap corruption
detected" instead of anon_decref.
Kernel panic 2 (dump info:
https://drive.google.com/file/d/0B7mCJnZUzJPKMHhqZHJnaDEzYkk)
http://i.imgur.com/eIssxmc.png?1
http://i.imgur.com/MXJy4zP.png?1
TIME                           UUID                                 SUNW-MSG-ID
Nov 16 2013 00:51:24.912170000 5998ba1e-3aa5-ccac-e885-be4897cfcfe8 SUNOS-8000-KL
TIME                 CLASS                                          ENA
Nov 16 00:51:24.8638 ireport.os.sunos.panic.dump_available          0x0000000000000000
Nov 16 00:49:58.8671 ireport.os.sunos.panic.dump_pending_on_device  0x0000000000000000
nvlist version: 0
version = 0x0
class = list.suspect
uuid = 5998ba1e-3aa5-ccac-e885-be4897cfcfe8
code = SUNOS-8000-KL
diag-time = 1384581084 866703
de = fmd:///module/software-diagnosis
fault-list-sz = 0x1
fault-list = (array of embedded nvlists)
(start fault-list[0])
nvlist version: 0
version = 0x0
class = defect.sunos.kernel.panic
certainty = 0x64
asru = sw:///:path=/var/crash/unknown/.5998ba1e-3aa5-ccac-e885-be4897cfcfe8
resource = sw:///:path=/var/crash/unknown/.5998ba1e-3aa5-ccac-e885-be4897cfcfe8
savecore-succcess = 1
dump-dir = /var/crash/unknown
dump-files = vmdump.1
os-instance-uuid = 5998ba1e-3aa5-ccac-e885-be4897cfcfe8
panicstr = kernel heap corruption detected
panicstack = fffffffffba49c04 () |
genunix:kmem_slab_free+c1 () | genunix:kmem_magazine_destroy+6e () |
genunix:kmem_depot_ws_reap+5d () | genunix:kmem_cache_magazine_purge+118 ()
| genunix:kmem_cache_magazine_resize+40 () | genunix:taskq_thread+2d0 () |
unix:thread_start+8 () |
crashtime = 1384577735
panic-time = Fri Nov 15 23:55:35 2013 EST
(end fault-list[0])
fault-status = 0x1
severity = Major
__ttl = 0x1
__tod = 0x528707dc 0x365e9c10
Kernel panic 3 (dump info:
https://drive.google.com/file/d/0B7mCJnZUzJPKbnZIeWZzQjhUOTQ)
(The console output looked the same as for panic 2; no screenshots.)
TIME                           UUID                                 SUNW-MSG-ID
Nov 16 2013 01:44:43.327489000 a6592c60-199f-ead5-9586-ff013bf5ab2d SUNOS-8000-KL
TIME                 CLASS                                          ENA
Nov 16 01:44:43.2941 ireport.os.sunos.panic.dump_available          0x0000000000000000
Nov 16 01:44:03.5356 ireport.os.sunos.panic.dump_pending_on_device  0x0000000000000000
nvlist version: 0
version = 0x0
class = list.suspect
uuid = a6592c60-199f-ead5-9586-ff013bf5ab2d
code = SUNOS-8000-KL
diag-time = 1384584283 296816
de = fmd:///module/software-diagnosis
fault-list-sz = 0x1
fault-list = (array of embedded nvlists)
(start fault-list[0])
nvlist version: 0
version = 0x0
class = defect.sunos.kernel.panic
certainty = 0x64
asru = sw:///:path=/var/crash/unknown/.a6592c60-199f-ead5-9586-ff013bf5ab2d
resource = sw:///:path=/var/crash/unknown/.a6592c60-199f-ead5-9586-ff013bf5ab2d
savecore-succcess = 1
dump-dir = /var/crash/unknown
dump-files = vmdump.2
os-instance-uuid = a6592c60-199f-ead5-9586-ff013bf5ab2d
panicstr = kernel heap corruption detected
panicstack = fffffffffba49c04 () |
genunix:kmem_slab_free+c1 () | genunix:kmem_magazine_destroy+6e () |
genunix:kmem_cache_magazine_purge+dc () |
genunix:kmem_cache_magazine_resize+40 () | genunix:taskq_thread+2d0 () |
unix:thread_start+8 () |
crashtime = 1384582658
panic-time = Sat Nov 16 01:17:38 2013 EST
(end fault-list[0])
fault-status = 0x1
severity = Major
__ttl = 0x1
__tod = 0x5287145b 0x138515e8
---
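To make the comparison between the two reports easier, here is a minimal Python sketch that splits the two panicstack values into frames and lists the functions that appear in only one of them. It also converts the crashtime fields to UTC. The stack strings and timestamps are copied from the reports above; the helper names (frames, function_names) are made up for illustration and are not part of any illumos tool.

```python
from datetime import datetime, timezone

# panicstack values copied from the two reports above, with the mail
# line wraps removed.
PANIC_2_STACK = (
    "fffffffffba49c04 () | genunix:kmem_slab_free+c1 () | "
    "genunix:kmem_magazine_destroy+6e () | genunix:kmem_depot_ws_reap+5d () | "
    "genunix:kmem_cache_magazine_purge+118 () | "
    "genunix:kmem_cache_magazine_resize+40 () | genunix:taskq_thread+2d0 () | "
    "unix:thread_start+8 () |"
)
PANIC_3_STACK = (
    "fffffffffba49c04 () | genunix:kmem_slab_free+c1 () | "
    "genunix:kmem_magazine_destroy+6e () | "
    "genunix:kmem_cache_magazine_purge+dc () | "
    "genunix:kmem_cache_magazine_resize+40 () | genunix:taskq_thread+2d0 () | "
    "unix:thread_start+8 () |"
)


def frames(panicstack):
    """Split a panicstack string into frames such as 'genunix:kmem_slab_free+c1'."""
    pieces = (p.replace("()", "").strip() for p in panicstack.split("|"))
    return [p for p in pieces if p]


def function_names(panicstack):
    """Return the frames with the '+offset' suffix dropped."""
    return [frame.split("+")[0] for frame in frames(panicstack)]


funcs_2 = function_names(PANIC_2_STACK)
funcs_3 = function_names(PANIC_3_STACK)

# Functions present in one stack but not in the other.
only_in_2 = [f for f in funcs_2 if f not in funcs_3]
only_in_3 = [f for f in funcs_3 if f not in funcs_2]
print("only in panic 2:", only_in_2)
print("only in panic 3:", only_in_3)

# crashtime is seconds since the Unix epoch; print it in UTC.
for label, crashtime in (("panic 2", 1384577735), ("panic 3", 1384582658)):
    when = datetime.fromtimestamp(crashtime, tz=timezone.utc)
    print(label, "crashed at", when.isoformat())
```

At the function level the only difference is that panic 2 passes through kmem_depot_ws_reap. Both stacks start in the taskq thread running kmem_cache_magazine_resize and end in kmem_slab_free.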
Having looked through all three dumps, I can see that the first two
contain some warnings:
WARNING: /pci@0,0/pci8086,3c08@3/pci1000,3030@0 (mpt_sas1):
mptsas_handle_event_sync: IOCStatus=0x8000, IOCLogInfo=0x31120303
/var/adm/messages also had a sprinkling of these:
Nov 15 23:36:43 san1 scsi: [ID 243001 kern.warning] WARNING: /pci@0,0/pci8086,3c08@3/pci1000,3030@0 (mpt_sas1):
Nov 15 23:36:43 san1 mptsas_handle_event: IOCStatus=0x8000, IOCLogInfo=0x31120303
Nov 15 23:36:43 san1 scsi: [ID 365881 kern.info] /pci@0,0/pci8086,3c08@3/pci1000,3030@0 (mpt_sas1):
Nov 15 23:36:43 san1 Log info 0x31120303 received for target 10.
Nov 15 23:36:43 san1 scsi_status=0x0, ioc_status=0x804b, scsi_state=0xc
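For anyone wanting to pick those hex values apart, here is a rough Python sketch that splits IOCLogInfo and ioc_status into their bit fields. The layout is an assumption based on my recollection of the LSI MPI log-info headers: top nibble is the bus type, next nibble the originator, then an 8-bit code and a 16-bit sub-code, with 0x8000 in IOCStatus acting as a "log info available" flag. Check it against the headers before relying on it. The function names are made up for this example.

```python
def decode_loginfo(value):
    """Split a 32-bit IOCLogInfo word into fields (layout is an assumption)."""
    return {
        "bus_type": (value >> 28) & 0xF,    # assumed: 0x3 means SAS
        "originator": (value >> 24) & 0xF,  # assumed: 0 = IOP, 1 = PL, 2 = IR
        "code": (value >> 16) & 0xFF,
        "subcode": value & 0xFFFF,
    }


def split_ioc_status(value):
    """Return (log_info_available, base_status) for a 16-bit IOCStatus."""
    # Assumed: bit 15 flags that log info accompanies the status and the
    # low 15 bits carry the status code itself.
    return bool(value & 0x8000), value & 0x7FFF


info = decode_loginfo(0x31120303)
print({name: hex(field) for name, field in info.items()})

for status in (0x8000, 0x804B):
    flagged, base = split_ioc_status(status)
    print(hex(status), "log info available:", flagged, "base status:", hex(base))
```

Under those assumptions, 0x31120303 breaks down into bus type 0x3, originator 0x1, code 0x12 and sub-code 0x0303, and ioc_status=0x804b is base status 0x4b with the log-info flag set. What the code and sub-code actually mean would have to be looked up in the vendor headers.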
Following
http://lists.omniti.com/pipermail/omnios-discuss/2013-March/000544.html to
map the target to a disk, it's my Stec ZeusRAM ZIL drive, which is configured
as a mirror (if I've done the mapping right). I didn't see these errors in
the third dump, so I don't know whether they're contributing. I may run
memtest on the system tomorrow in case it's a hardware issue.
My zpool status shows all my drives okay with no known data errors.
Not sure how to proceed from here. My Hyper-V hosts have been using the
SAN with no issues for the 2+ months it's been up and configured, using
SRP and IB. I'd expect the VM hosts to crash before my SAN does.
Of course, I can make the vmdump.x files available to anyone who wants to
look at them (7GB, 8GB, 4GB).