[OmniOS-discuss] kernel panic "kernel heap corruption detected" when creating zero eager disks
danmcd at omniti.com
Thu Mar 26 17:05:09 UTC 2015
> On Mar 26, 2015, at 11:47 AM, wuffers <moo at wuffers.net> wrote:
> It looks like I'll have to make do with lazy zeroed or thin provisioned disks of 10TB+ for my Veeam tests, if it doesn't cause another kernel panic. I'm hesitant to create these now during business hours (and I shouldn't be.. these are normal VM provisioning tasks on available storage!). In your estimation, would eager zero vs lazy zero vs thin provisioned vmdks make any difference with that WRITE_SAME code? The majority of my VMs use eager zeroed disks, but again, never to this size.
WRITE_SAME is one of the four VAAI primitives. Nexenta wrote the code for all four for NexentaStor (NS), and upstreamed two of them:
WRITE_SAME is "hardware assisted erase".
UNMAP is "hardware assisted freeing".
Those are in upstream illumos.
ATS is atomic-test-and-set or "hardware assisted fine-grained locking".
XCOPY is "hardware assisted copying".
These are in NexentaStor. After being held back for a while they were open-sourced, but they have not yet been upstreamed.
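For anyone unfamiliar with what WRITE_SAME looks like on the wire, here is a rough sketch of the command descriptor block, assuming the 16-byte variant as laid out in the SCSI block command spec (SBC-3). The helper name write_same_16_cdb is made up for illustration; this is not COMSTAR or stmf_sbd code, and it says nothing about where the panic comes from. The point is only that the initiator sends one block of data plus a block count, and the target repeats that block across the whole range itself.

```python
import struct

WRITE_SAME_16 = 0x93  # SCSI opcode for WRITE SAME(16)


def write_same_16_cdb(lba, num_blocks, unmap=False):
    """Build a 16-byte WRITE SAME(16) CDB.

    Hypothetical helper for illustration only; field layout follows SBC-3.
    """
    if not 0 <= lba < 2**64:
        raise ValueError("LBA must fit in 64 bits")
    if not 0 <= num_blocks < 2**32:
        raise ValueError("block count must fit in 32 bits")
    # Byte 1, bit 3 is the UNMAP bit; all other flag bits are left at zero here.
    flags = 0x08 if unmap else 0x00
    # Big-endian: opcode, flags, 8-byte LBA, 4-byte block count,
    # group number, control byte.
    return struct.pack(">BBQIBB", WRITE_SAME_16, flags, lba, num_blocks, 0, 0)


# One 512-byte block of zeros is the entire data-out payload; the target
# is asked to replicate it over num_blocks logical blocks.
zero_block = bytes(512)
cdb = write_same_16_cdb(lba=0, num_blocks=2048)
```

Because the block count is a 32-bit field, zeroing a very large disk takes many such commands, each of which can cover a large range on the target side.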
> If there is anything you need me to test (in R151014? or beyond?), it's easy enough for me to reproduce (I timed myself last night, it took me about 2 hours to gracefully shut/save all the VMs, cause the crash dump, and get the infrastructure back up). I should probably try it on Hyper-V as well when I get time, but I believe most of those are Dynamic (thin) instead of Fixed (eager zero) disks, and I don't believe Hyper-V has an equivalent to lazy zeroed. The Hyper-V environment runs our test VMs after all, and aren't as performance sensitive.
I may be able to generate a fix, but I have no idea if it's sufficient or not. Like I said, COMSTAR is not well-written or maintainable code, but Nexenta has put a lot of love into it.
> If you can tell me where the fix should go, I can probably try it out, even though I haven't built any kernel modules before (though I'm sure there are enough resources for me to draw on). I'll start by making myself a build server on a VM. Is this http://wiki.illumos.org/display/illumos/How+To+Build+illumos still current?
The small fix I might be able to generate will involve a replacement "stmf_sbd" module. More on that after I get cycles to generate something.