[OmniOS-discuss] ZFS crash/reboot loop

Dan McDonald danmcd at omniti.com
Mon Jul 13 15:46:42 UTC 2015


> On Jul 13, 2015, at 11:29 AM, Dan McDonald <danmcd at omniti.com> wrote:
> 
> 
>> On Jul 13, 2015, at 11:25 AM, Derek Yarnell <derek at umiacs.umd.edu> wrote:
>> 
>> https://obj.umiacs.umd.edu/derek_support/vmdump.0
> 
> Yeah, that's what I'm seeking.  Downloading it now to an r151014 box (you are running r151014 according to the first mail).  My normal '014 box is otherwise indisposed at the moment, so this dump may take a bit longer to analyze.  I can forward it along to the ZFS folks once I've done my initial analysis.
> 
> For bugs like these, I usually have to engage the illumos ZFS list.  If anyone here wants to follow along, I'll Cc: you on anything I report to them.


Okay, it's a VERIFY() failure in zio_buf_alloc().  Its caller, arc_get_data_buf(), passes it a size of 0.  Observe this MDB interaction:

> $c
vpanic()
0xfffffffffba8b13d()
zio_buf_alloc+0x49(0)
arc_get_data_buf+0x12b(ffffff0d4071ca98)
arc_buf_alloc+0xd2(ffffff0d4dfec000, 0, 0, 1)
...<SNIP!>


0xffffff0d4071ca98 is the arc_buf_t handed to arc_get_data_buf(), read off of disk.  The code in arc_get_data_buf() starts with:

static void
arc_get_data_buf(arc_buf_t *buf)
{
        arc_state_t             *state = buf->b_hdr->b_l1hdr.b_state;
        uint64_t                size = buf->b_hdr->b_size;
        arc_buf_contents_t      type = arc_buf_type(buf->b_hdr);


So let's look at that size:

> ffffff0d4071ca98::print arc_buf_t b_hdr |::print arc_buf_hdr_t b_size
b_size = 0
> 

Ouch.  There's your zero.
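
For anyone following along, here is a rough user-space sketch of why a zero size would trip that VERIFY().  It assumes zio_buf_alloc() derives its kmem-cache index as (size - 1) >> SPA_MINBLOCKSHIFT and bounds-checks the result; the helper names are made up for illustration, and the constant values are assumptions rather than something copied out of the r151014 source.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed values, named after the illumos constants they stand in for. */
#define	SPA_MINBLOCKSHIFT	9	/* 512-byte minimum block */
#define	SPA_MAXBLOCKSHIFT	24	/* assumed 16 MiB maximum block */
#define	SPA_MAXBLOCKSIZE	((size_t)1 << SPA_MAXBLOCKSHIFT)

/*
 * Hypothetical helper: the cache index for a requested buffer size.
 * size_t is unsigned, so for size == 0 the subtraction wraps around
 * to SIZE_MAX and the index comes out enormous.
 */
static size_t
buf_cache_index(size_t size)
{
	return ((size - 1) >> SPA_MINBLOCKSHIFT);
}

/*
 * Hypothetical helper: returns 1 if the assumed bounds check would
 * pass, 0 if it would fail (which is a panic in the kernel).
 */
static int
buf_size_is_valid(size_t size)
{
	return (buf_cache_index(size) <
	    (SPA_MAXBLOCKSIZE >> SPA_MINBLOCKSHIFT));
}
```

Under those assumptions, buf_size_is_valid(512) passes with index 0, while buf_size_is_valid(0) fails because the wrapped index is far beyond the last cache.  So the check in zio_buf_alloc() is only the messenger; the real question is how b_size ended up as 0.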

I'm going to forward this very note to the illumos ZFS list.  I see ONE possible bugfix post-r151014 that might help:

commit 31c46cf23cd1cf4d66390a983dc5072d7d299ba2
Author: Alek Pinchuk <alek at nexenta.com>
Date:   Tue Jun 30 09:44:11 2015 -0700

    6033 arc_adjust() should search MFU lists for oldest buffer when adjusting MFU size
    Reviewed by: Saso Kiselkov <saso.kiselkov at nexenta.com>
    Reviewed by: Xin Li <delphij at delphij.net>
    Reviewed by: Prakash Surya <me at prakashsurya.com>
    Approved by: Matthew Ahrens <mahrens at delphix.com>

It's a small change and, I shudder to say this, even hot-patchable on a running system if you're desperate.  :)

Thanks,
Dan

