[OmniOS-discuss] disk failure causing reboot?

Jeff Stockett jstockett at molalla.com
Mon May 18 20:33:33 UTC 2015


The pool's failmode is set to wait.

Looking at the fmdump -e and fmdump -eV output, it appears the drive started having media/disk/transport errors around 3:40am, which eventually culminated in the reboot around 6:18am.  The funny thing is that driver-assessment = fatal was returned 42 times on the same device in that period, so I'm not quite sure why it didn't just drop the drive, because the documentation says:

Note: An ereport with the value driver-assessment = fatal results in the fault being propagated.

It appears it didn't drop the drive until after it rebooted.  I can upload the crash dump and/or fmdump output if anyone is interested.
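
For anyone who wants to repeat the count of driver-assessment values, here is a minimal sketch in Python. It assumes the saved fmdump -eV text contains lines of the form "driver-assessment = <value>", as quoted above. The function name count_assessments is made up for illustration, and the script only tallies values across the whole input; it does not group them by device.

```python
import re
import sys
from collections import Counter

# Assumption: each ereport in the saved text carries a line such as
# "driver-assessment = fatal". Leading whitespace is tolerated.
ASSESSMENT_RE = re.compile(r"^\s*driver-assessment\s*=\s*(\S+)", re.MULTILINE)


def count_assessments(text):
    """Return a Counter mapping each driver-assessment value to its count."""
    return Counter(ASSESSMENT_RE.findall(text))


if __name__ == "__main__":
    # Read the ereport text from standard input and print one line per value,
    # most frequent first.
    for value, n in count_assessments(sys.stdin.read()).most_common():
        print(f"{value}: {n}")
```

It could be fed with something like "fmdump -eV | python3 count_assessments.py", where the script file name is likewise hypothetical.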

Thanks,  Jeff

-----Original Message-----
From: Paul Henson [mailto:paul.b.henson at gmail.com] On Behalf Of Paul B. Henson
Sent: Monday, May 18, 2015 1:09 PM
To: Jeff Stockett
Cc: omnios-discuss at lists.omniti.com
Subject: Re: [OmniOS-discuss] disk failure causing reboot?

On Mon, May 18, 2015 at 06:25:34PM +0000, Jeff Stockett wrote:
> A drive failed in one of our supermicro 5048R-E1CR36L servers running 
> omnios r151012 last night, and somewhat unexpectedly, the whole system 
> seems to have panicked.

You don't happen to have failmode set to panic on the pool?

From the zpool manpage:

       failmode=wait | continue | panic
           Controls the system behavior in the event of catastrophic pool
           failure. This condition is typically a result of a loss of
           connectivity to the underlying storage device(s) or a failure of
           all devices within the pool. The behavior of such an event is
           determined as follows:

           wait
                       Blocks all I/O access until the device connectivity is
                       recovered and the errors are cleared. This is the
                       default behavior.

           continue
                       Returns EIO to any new write I/O requests but allows
                       reads to any of the remaining healthy devices. Any
                       write requests that have yet to be committed to disk
                       would be blocked.

           panic
                       Prints out a message to the console and generates a
                       system crash dump.


