[OmniOS-discuss] disk failure causing reboot?
Schweiss, Chip
chip at innovates.com
Mon May 18 20:30:34 UTC 2015
I had the exact same failure mode last week. With over 1000 spindles I see
this about once a month.
I can publish my dump too if anyone actually wants to try to fix this
problem, but I think several dumps of the same failure are already linked
to tickets in illumos-gate.
Pools for the most part should be set to failmode=panic or wait, but a
failed disk should not cause a panic. On the system where this happened to
me, failmode was set to wait. It is also on r151012, waiting on a window to
upgrade to r151014. My pool is raidz3, so there is no reason not to kick a
bad disk.
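For anyone who wants to rule out failmode=panic across many pools, here is a minimal Python sketch. It assumes your zpool build accepts `zpool get -H -o name,value failmode <pool>` and prints one tab-separated "pool value" line per pool; check your zpool manpage first, since older builds may lack those options. The function names are my own, not part of any ZFS tooling.

```python
import subprocess

# The three values documented for the failmode pool property.
VALID_FAILMODES = {"wait", "continue", "panic"}


def parse_failmode(output):
    """Turn 'pool<TAB>value' lines into a {pool: failmode} dict.

    Assumes the scripted output format of
    `zpool get -H -o name,value failmode`.
    """
    modes = {}
    for line in output.splitlines():
        fields = line.split()
        if len(fields) != 2:
            continue  # skip blank or unexpected lines
        pool, value = fields
        if value in VALID_FAILMODES:
            modes[pool] = value
    return modes


def pools_that_panic(modes):
    """Return the sorted names of pools whose failmode is 'panic'."""
    return sorted(pool for pool, mode in modes.items() if mode == "panic")


def read_failmodes(pools):
    """Query the live system for the given pool names.

    Assumption: -H and -o are supported by this zpool build.
    """
    result = subprocess.run(
        ["zpool", "get", "-H", "-o", "name,value", "failmode", *pools],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_failmode(result.stdout)
```

If a pool does turn up as panic, `zpool set failmode=wait <pool>` changes it. That would not have helped in my case, since the pool was already set to wait when the system panicked.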
All my disks are SAS in DataON JBODs, dual connected across two LSI
HBAs. BTW, pull a SAS cable and you get a panic too, not degraded
multipath. Illumos seems to panic on just about any SAS event these days
regardless of redundancy.
-Chip
On Mon, May 18, 2015 at 3:08 PM, Paul B. Henson <henson at acm.org> wrote:
> On Mon, May 18, 2015 at 06:25:34PM +0000, Jeff Stockett wrote:
> > A drive failed in one of our supermicro 5048R-E1CR36L servers running
> > omnios r151012 last night, and somewhat unexpectedly, the whole system
> > seems to have panicked.
>
> You don't happen to have failmode set to panic on the pool?
>
> From the zpool manpage:
>
> failmode=wait | continue | panic
> Controls the system behavior in the event of catastrophic pool
> failure. This condition is typically a result of a loss of
> connectivity to the underlying storage device(s) or a failure of
> all devices within the pool. The behavior of such an event is
> determined as follows:
>
> wait
> Blocks all I/O access until the device connectivity is
> recovered and the errors are cleared. This is the
> default behavior.
>
> continue
> Returns EIO to any new write I/O requests but allows
> reads to any of the remaining healthy devices. Any
> write requests that have yet to be committed to disk
> would be blocked.
>
> panic
> Prints out a message to the console and generates a
> system crash dump.