[OmniOS-discuss] kernel panic "kernel heap corruption detected" when creating zero eager disks

wuffers moo at wuffers.net
Fri Mar 27 06:24:07 UTC 2015


>
>
> So here's what I will attempt to test:
> - Create thin vmdk @ 10 TB with vSphere fat client: PASS
> - Create lazy zeroed vmdk @ 10 TB with vSphere fat client: PASS
> - Create eager zeroed vmdk @ 10 TB with vSphere web client: PASS! (took 1 hour)
> - Create thin vmdk @ 10 TB with vSphere web client: PASS
> - Create lazy zeroed vmdk @ 10 TB with vSphere web client: PASS
>
>
Additionally, I tried:
- Create fixed vhdx @ 10 TB with SCVMM (Hyper-V): PASS (most likely no primitives in use here; this took slightly over 3 hours)

Everything passed (which I didn't expect, especially the 10 TB eager zero). Then I tried again from the vSphere web client with a 20 TB eager zeroed disk, and got a different kernel panic altogether (kmem_flags 0xf was not set, unfortunately, so this dump carries no kmem debugging data).
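
If it panics again, I'll enable kmem debugging first so the next dump actually carries audit/redzone data. On illumos that's just the standard kmem_flags tunable in /etc/system; a minimal sketch (nothing box-specific about it):

  # append to /etc/system and reboot; 0xf = KMF_AUDIT|KMF_DEADBEEF|
  # KMF_REDZONE|KMF_CONTENTS, which lets ::kmem_verify pinpoint heap corruption
  echo "set kmem_flags=0xf" >> /etc/system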

Mar 27 2015 01:09:33.664060000 2e2305c2-54b5-c1f4-aafd-fb1eccc982dd
SUNOS-8000-KL

  TIME                 CLASS                                 ENA
  Mar 27 01:09:33.6307 ireport.os.sunos.panic.dump_available 0x0000000000000000
  Mar 27 01:08:30.6688 ireport.os.sunos.panic.dump_pending_on_device 0x0000000000000000

nvlist version: 0
        version = 0x0
        class = list.suspect
        uuid = 2e2305c2-54b5-c1f4-aafd-fb1eccc982dd
        code = SUNOS-8000-KL
        diag-time = 1427432973 633746
        de = fmd:///module/software-diagnosis
        fault-list-sz = 0x1
        fault-list = (array of embedded nvlists)
        (start fault-list[0])
        nvlist version: 0
                version = 0x0
                class = defect.sunos.kernel.panic
                certainty = 0x64
                asru = sw:///:path=/var/crash/unknown/.2e2305c2-54b5-c1f4-aafd-fb1eccc982dd
                resource = sw:///:path=/var/crash/unknown/.2e2305c2-54b5-c1f4-aafd-fb1eccc982dd
                savecore-succcess = 1
                dump-dir = /var/crash/unknown
                dump-files = vmdump.2
                os-instance-uuid = 2e2305c2-54b5-c1f4-aafd-fb1eccc982dd
                panicstr = BAD TRAP: type=d (#gp General protection) rp=ffffff01eb72ea70 addr=0
                panicstack = unix:real_mode_stop_cpu_stage2_end+9e23 () |
unix:trap+a30 () | unix:cmntrap+e6 () | genunix:anon_decref+35 () |
genunix:anon_free+74 () | genunix:segvn_free+242 () | genunix:seg_free+30
() | genunix:segvn_unmap+cde () | genunix:as_free+e7 () | genunix:relvm+220
() | genunix:proc_exit+454 () | genunix:exit+15 () | genunix:rexit+18 () |
unix:brand_sys_sysenter+1c9 () |
                crashtime = 1427431421
                panic-time = Fri Mar 27 00:43:41 2015 EDT
        (end fault-list[0])

        fault-status = 0x1
        severity = Major
        __ttl = 0x1
        __tod = 0x5514e60d 0x2794c060

Crash file:
https://drive.google.com/file/d/0B7mCJnZUzJPKT0lpTW9GZFJCLTg/view?usp=sharing
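
In case anyone wants to poke at it, this is roughly how I'd inspect the dump on the box itself (assuming vmdump.2 expands cleanly):

  cd /var/crash/unknown
  savecore -vf vmdump.2   # expand the compressed dump into unix.2 + vmcore.2
  mdb unix.2 vmcore.2     # then at the mdb prompt:
                          #   ::status    - panic string and dump summary
                          #   ::stack     - stack of the panicking thread
                          #   ::panicinfo - register state at the trap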

It appears I can create thin and lazy zeroed disks at those sizes, so I will have to be satisfied with those options as a workaround (plus disabling WRITE_SAME on the hosts if I really want an eager zeroed disk; see the esxcli commands below) until some of that Nexenta COMSTAR love is upstreamed. For comparison's sake, provisioning a 10 TB fixed vhdx took approximately 3 hours in Hyper-V, while the same provisioning took about 1 hour in VMware, so WRITE_SAME accelerated the same job roughly 3x.
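
For reference, the host-side knob I mean is the VAAI Block Zeroing primitive (that's the one that issues WRITE SAME); on an ESXi host it's toggled per host with esxcli, roughly like this:

  # disable VAAI Block Zeroing (WRITE SAME); set back to 1 to re-enable
  # once the COMSTAR fix lands
  esxcli system settings advanced set -o /DataMover/HardwareAcceleratedInit -i 0
  # verify the current value
  esxcli system settings advanced list -o /DataMover/HardwareAcceleratedInit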