[OmniOS-discuss] kernel panic "kernel heap corruption detected" when creating zero eager disks

Dan McDonald danmcd at omniti.com
Thu Mar 26 17:05:09 UTC 2015


> On Mar 26, 2015, at 11:47 AM, wuffers <moo at wuffers.net> wrote:

> It looks like I'll have to make do with lazy-zeroed or thin-provisioned disks of 10TB+ for my Veeam tests, assuming those don't cause another kernel panic. I'm hesitant to create them during business hours (and I shouldn't have to be; these are normal VM provisioning tasks on available storage!). In your estimation, would eager-zeroed vs. lazy-zeroed vs. thin-provisioned VMDKs make any difference with that WRITE_SAME code? The majority of my VMs use eager-zeroed disks, but again, never at this size.

WRITE_SAME is one of the four VAAI primitives. Nexenta wrote the code for all four for NexentaStor (NS), and upstreamed two of them:

WRITE_SAME is "hardware assisted erase".

UNMAP is "hardware assisted freeing".

Those are in upstream illumos.

ATS is atomic-test-and-set or "hardware assisted fine-grained locking".

XCOPY is "hardware assisted copying".

These two are in NexentaStor; after being held back, they were open-sourced, but they have not yet been upstreamed.
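To make "hardware assisted erase" concrete: on the wire, each VAAI primitive is an ordinary SCSI command, and WRITE SAME lets the initiator send a single block of data (e.g. zeroes) plus a block count, leaving the target to replicate it. The sketch below is illustrative only. It assumes the SBC-3 field layout for WRITE SAME(16), and the dictionary and function names are hypothetical; this is not how COMSTAR's stmf_sbd is structured.

```python
import struct

# VAAI primitive -> SCSI command opcode (SBC-3 / SPC-4).
# The dict name is hypothetical, for illustration only.
VAAI_OPCODES = {
    "WRITE_SAME": 0x93,  # WRITE SAME(16): block zeroing / "hardware assisted erase"
    "UNMAP": 0x42,       # UNMAP: "hardware assisted freeing"
    "ATS": 0x89,         # COMPARE AND WRITE: atomic test-and-set
    "XCOPY": 0x83,       # EXTENDED COPY: "hardware assisted copying"
}


def write_same16_cdb(lba: int, num_blocks: int, unmap: bool = False) -> bytes:
    """Build a 16-byte WRITE SAME(16) CDB (hypothetical helper).

    The initiator transfers ONE logical block of data; the target writes
    that block to num_blocks consecutive blocks starting at lba.
    """
    if not 0 <= lba < 2**64:
        raise ValueError("LBA must fit in 64 bits")
    if not 0 <= num_blocks < 2**32:
        raise ValueError("block count must fit in 32 bits")
    flags = 0x08 if unmap else 0x00  # byte 1, bit 3 is the UNMAP bit
    # Big-endian, no padding: opcode, flags, 8-byte LBA,
    # 4-byte NUMBER OF LOGICAL BLOCKS, group number, control.
    return struct.pack(">BBQIBB", VAAI_OPCODES["WRITE_SAME"], flags,
                       lba, num_blocks, 0, 0)


if __name__ == "__main__":
    cdb = write_same16_cdb(lba=0x1000, num_blocks=2048)
    print(cdb.hex())
```

Because the count field is 32 bits, a 10 TiB disk with 512-byte blocks (21,474,836,480 blocks) cannot be covered by a single WRITE SAME(16) command, so zeroing a disk that size means the target handles many of them.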

> If there is anything you need me to test (in R151014? or beyond?), it's easy enough for me to reproduce (I timed myself last night; it took me about 2 hours to gracefully shut down/save all the VMs, cause the crash dump, and get the infrastructure back up). I should probably try it on Hyper-V as well when I get time, but I believe most of those are Dynamic (thin) rather than Fixed (eager-zeroed) disks, and I don't believe Hyper-V has an equivalent to lazy zeroing. The Hyper-V environment runs our test VMs, after all, which aren't as performance-sensitive.

I may be able to generate a fix, but I have no idea if it's sufficient or not.  Like I said, COMSTAR is not well-written or maintainable code, but Nexenta has put a lot of love into it.

> If you can tell me where the fix should go, I can probably try it out, even though I haven't built any kernel modules before (though I'm sure there are enough resources for me to draw on). I'll start by making myself a build server on a VM. Is this http://wiki.illumos.org/display/illumos/How+To+Build+illumos still current?

The small fix I might be able to generate will involve a replacement "stmf_sbd" module.  More on that after I get cycles to generate something.

Dan