[OmniOS-discuss] disk failure causing reboot?

Mon May 18 20:30:34 UTC 2015

I had the exact same failure mode last week.  With over 1000 spindles I see
this about once a month.

I can also publish my crash dump if anyone actually wants to try to fix this
problem, but I think several dumps of the same failure are already linked to
tickets in illumos-gate.

For the most part, pools should be set to failmode=panic or wait, but a
single failed disk should not cause a panic. On the system where this
happened to me, failmode was set to wait. It is also on r151012, waiting for
a maintenance window to upgrade to r151014. My pool is raidz3, so there is no
reason not to kick out a bad disk.
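As a rough illustration of auditing this setting across many pools, here is a
hypothetical Python sketch (not part of any illumos tooling) that parses the
default table printed by "zpool get failmode" and lists pools set to panic.
It assumes the usual header row (NAME PROPERTY VALUE SOURCE) followed by one
whitespace-separated row per pool; headerless scripted output would need the
header skip removed. The sample text is made up for illustration. Keep in
mind that failmode only governs catastrophic pool failure, so a pool on wait
should not panic from the loss of a single disk in the first place.

```python
from typing import Dict, List


def parse_zpool_get(output: str) -> Dict[str, str]:
    # Hypothetical helper: map pool name -> VALUE column from the default
    # "zpool get <property> <pools>" table (header row, then one row per pool).
    rows = [line.split() for line in output.splitlines() if line.strip()]
    values: Dict[str, str] = {}
    for fields in rows[1:]:  # rows[0] is the NAME PROPERTY VALUE SOURCE header
        if len(fields) >= 3:
            name, _prop, value = fields[:3]
            values[name] = value
    return values


def pools_that_panic(output: str) -> List[str]:
    # Pools whose failmode would crash the host on catastrophic pool failure.
    return sorted(name for name, value in parse_zpool_get(output).items()
                  if value == "panic")


# Made-up sample output, assuming the default columnar layout.
sample = """NAME   PROPERTY  VALUE  SOURCE
rpool  failmode  wait   default
tank   failmode  panic  local
"""

print(pools_that_panic(sample))  # ['tank']
```

The helper names parse_zpool_get and pools_that_panic are illustrative only;
feeding in text rather than calling zpool directly keeps the parsing easy to
test away from the storage host.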

All my disks are SAS in DataON JBODs, dual-connected across two LSI HBAs.
BTW, pull a SAS cable and you get a panic too, not degraded multipath.
illumos seems to panic on just about any SAS event these days, regardless of
redundancy.

-Chip

On Mon, May 18, 2015 at 3:08 PM, Paul B. Henson <henson at acm.org> wrote:

> On Mon, May 18, 2015 at 06:25:34PM +0000, Jeff Stockett wrote:
> > A drive failed in one of our Supermicro 5048R-E1CR36L servers running
> > OmniOS r151012 last night, and somewhat unexpectedly, the whole system
> > seems to have panicked.
>
> You don't happen to have failmode set to panic on the pool?
>
> From the zpool manpage:
>
>        failmode=wait | continue | panic
>            Controls the system behavior in the event of catastrophic pool
>            failure. This condition is typically a result of a loss of
>            connectivity to the underlying storage device(s) or a failure of
>            all devices within the pool. The behavior of such an event is
>            determined as follows:
>
>            wait
>                        Blocks all I/O access until the device connectivity is
>                        recovered and the errors are cleared. This is the
>                        default behavior.
>
>            continue
>                        Returns EIO to any new write I/O requests but allows
>                        reads to any of the remaining healthy devices. Any
>                        write requests that have yet to be committed to disk
>                        would be blocked.
>
>            panic
>                        Prints out a message to the console and generates a
>                        system crash dump.
>