[OmniOS-discuss] big zfs storage?

Schweiss, Chip chip at innovates.com
Thu Oct 8 00:56:30 UTC 2015


I completely concur with Richard on this.  Let me give a real example
that emphasizes this point, as it's a critical design decision.

I never fully understood this until I saw in action the problems automated
hot spares can cause.  I had all 5 hot spares put into action on one raidz2
vdev of a 300TB pool.  This was triggered by an HA event: in a split-brain
situation, SCSI reservations were being taken that were supposed to trigger
a panic on one system.  The result was a highly corrupted pool.
Fortunately this was not a production pool, and I simply trashed it and
started reloading data.

Now I only run one hot spare per pool.  Most of my pools are raidz2 or
raidz3.  This way an event like this cannot take out more than one disk,
and parity protection is never lost.

There are other conditions that can trigger multiple disk replacements.  I
have not encountered them yet, but if I do, they won't hurt my data with
the limit of one hot spare.
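
For reference, that policy is nothing exotic: a single spare vdev per pool.
A minimal sketch, with a hypothetical pool name and device name:

  zpool add tank spare c0t9d0   # one hot spare shared by the whole pool
  zpool status tank             # the spare shows up in its own "spares" section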

-Chip




On Wed, Oct 7, 2015 at 5:38 PM, Richard Elling <richard.elling at richardelling.com> wrote:

>
> > On Oct 7, 2015, at 1:59 PM, Mick Burns <bmx1955 at gmail.com> wrote:
> >
> > So... how does Nexenta cope with hot spares and all kinds of disk
> > failures?  Adding hot spares is part of their administration manuals,
> > so can we assume things are almost always handled smoothly?  I'd like
> > to hear about tangible experiences in production.
>
> I do not speak for Nexenta.
>
> Hot spares are a bigger issue when you have single-parity protection.
> With double parity and large pools, warm spares are a better approach.
> The reasons are:
>
> 1. Hot spares exist solely to eliminate the time between disk failure and
>    human intervention for corrective action.  There is no other reason to
>    have hot spares.  The exposure from a single disk failure under
>    single-parity protection is too risky for most folks, but with double
>    parity (e.g. raidz2 or RAID-6) the few hours you save have little
>    impact on overall data availability vs warm spares.
>
> 2. Under some transient failure conditions (e.g. isolated power failure,
>    IOM reboot, or fabric partition), all available hot spares can be
>    kicked into action.  This can leave you with a big mess for large
>    pools with many drives and spares.  You can avoid this by making a
>    human be involved in the decision process, rather than just *locally
>    isolated*, automated decision making.
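>
> With warm spares, the replacement is a deliberate human step once someone
> has looked at what actually failed.  A minimal sketch of that step, with a
> hypothetical pool name and device names:
>
>    zpool status -x                    # see which pool/disk is faulted
>    zpool replace tank c2t5d0 c2t11d0  # attach the warm spare in place of the failed disk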
>
>  -- richard
>
> >
> >
> > thanks
> >
> > On Mon, Jul 13, 2015 at 7:58 AM, Schweiss, Chip <chip at innovates.com>
> > wrote:
> >> Liam,
> >>
> >> This report is encouraging.  Please share some details of your
> >> configuration.  What disk failure parameters have you set?  Which
> >> JBODs and disks are you running?
> >>
> >> I have mostly DataON JBODs and some Supermicro.  DataON uses PMC SAS
> >> expanders and Supermicro uses LSI; both setups show pretty much the same
> >> behavior with disk failures.  All my servers are Supermicro with LSI
> >> HBAs.
> >>
> >> If there's a magic combination of hardware and OS config out there that
> >> solves the disk failure panic problem, I will certainly change my builds
> >> going forward.
> >>
> >> -Chip
> >>
> >> On Fri, Jul 10, 2015 at 1:04 PM, Liam Slusser <lslusser at gmail.com>
> >> wrote:
> >>>
> >>> I have two 800T ZFS systems on OmniOS and a bunch of smaller <50T
> >>> systems.  Things generally work very well.  We lose a disk here and
> >>> there, but it has never resulted in downtime.  They're all on Dell
> >>> hardware with LSI or Dell PERC controllers.
> >>>
> >>> Putting in smaller disk failure parameters, so disks fail quicker, was
> >>> a big help when something does go wrong with a disk.
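> >>>
> >>> For anyone curious, these are the kind of knobs involved.  This is only
> >>> a sketch assuming the illumos sd(7D) tunables; the vendor/product
> >>> string and the values are placeholders, not what we actually run:
> >>>
> >>>   # /kernel/drv/sd.conf: fewer command retries for these disks
> >>>   sd-config-list = "SEAGATE ST8000NM0075", "retries-timeout:1,retries-busy:1";
> >>>
> >>>   # /etc/system: lower sd's per-command timeout from its default
> >>>   set sd:sd_io_time=10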
> >>>
> >>> thanks,
> >>> liam
> >>>
> >>>
> >>> On Fri, Jul 10, 2015 at 10:31 AM, Schweiss, Chip <chip at innovates.com>
> >>> wrote:
> >>>>
> >>>> Unfortunately, for the past couple of years panics on disk failure
> >>>> have been the norm.  All my production systems are HA with RSF-1, so
> >>>> at least things come back online relatively quickly.  There are quite
> >>>> a few open tickets in the Illumos bug tracker related to mpt_sas
> >>>> panics.
> >>>>
> >>>> Most of the work to fix these problems has been committed in the past
> >>>> year, though problems still exist.  For example, my systems are
> >>>> dual-path SAS; however, mpt_sas will panic if you pull a cable instead
> >>>> of simply dropping a path to the disks.  Dan McDonald is actively
> >>>> working to resolve this.  He is also pushing a bug fix in genunix from
> >>>> Nexenta that appears to fix a lot of the panic problems.  I'll know
> >>>> for sure in a few months, after I see a disk or two drop, whether it
> >>>> truly fixes things.  Hans Rosenfeld at Nexenta is responsible for most
> >>>> of the updates to mpt_sas, including support for the 3008 (12G SAS).
> >>>>
> >>>> I haven't run any 12G SAS yet, but plan to on my next build in a
> >>>> couple of months.  That will be about 300TB using an 84-disk JBOD.
> >>>> All the code from Nexenta to support the 3008 appears to be in Illumos
> >>>> now, and they fully support it, so I suspect it's pretty stable.  From
> >>>> what I understand, there may be some 12G performance fixes coming at
> >>>> some point.
> >>>>
> >>>> The fault manager is nice when the system doesn't panic.  When it
> >>>> panics, the fault manager never gets a chance to take action.  The
> >>>> consensus is still that it is better to run pools without hot spares,
> >>>> because there are situations where the fault manager will do bad
> >>>> things.  I witnessed this myself when building a system: the fault
> >>>> manager replaced 5 disks in a raidz2 vdev inside 1 minute, trashing
> >>>> the pool.  I haven't completely yielded to that "best practice",
> >>>> though.  I now run one hot spare per pool, and figure that with
> >>>> raidz2 the odds of the fault manager causing something catastrophic
> >>>> are much lower.
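> >>>>
> >>>> If you want the human fully in the loop, one blunt option (just a
> >>>> sketch, with the obvious trade-off that nothing is ever retired
> >>>> automatically) is to keep FMA's ZFS retire agent from acting at all:
> >>>>
> >>>>   fmadm config | grep zfs-retire   # check whether the agent is loaded
> >>>>   fmadm unload zfs-retire          # stops automated spare activation; loads again when fmd restarts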
> >>>>
> >>>> -Chip
> >>>>
> >>>>
> >>>>
> >>>> On Fri, Jul 10, 2015 at 11:37 AM, Linda Kateley <lkateley at kateley.com>
> >>>> wrote:
> >>>>>
> >>>>> I have to build and maintain my own system.  I usually help others
> >>>>> build (I teach zfs and freenas classes/consulting).  I really love
> >>>>> fault management in solaris and miss it.  Just thought that since
> >>>>> it's my system and I get to choose, I would use omni.  I have 20+
> >>>>> years using solaris and only 2 on freebsd.
> >>>>>
> >>>>> I like freebsd for how well it's tuned for zfs out of the box.  I
> >>>>> miss the network, v12n and resource controls in solaris.
> >>>>>
> >>>>> Concerned about panics on disk failure. Is that common?
> >>>>>
> >>>>>
> >>>>> linda
> >>>>>
> >>>>>
> >>>>> On 7/9/15 9:30 PM, Schweiss, Chip wrote:
> >>>>>
> >>>>> Linda,
> >>>>>
> >>>>> I have 3.5 PB running under OmniOS.  All my systems have LSI 2108
> >>>>> HBAs, which are considered the best choice for HBAs.
> >>>>>
> >>>>> Illumos leaves a bit to be desired in handling faults from disks or
> >>>>> SAS problems, but things under OmniOS have been improving, thanks in
> >>>>> large part to Dan McDonald and OmniTI.  We have paid support with
> >>>>> OmniTI on all of our production systems.  Their response and
> >>>>> dedication have been very good.  Other than the occasional panic and
> >>>>> restart from a disk failure, OmniOS has been solid.  ZFS, of course,
> >>>>> has never lost a single bit of information.
> >>>>>
> >>>>> I'd be curious why you're looking to move.  Have there been specific
> >>>>> problems under BSD or ZoL?  I've been slowly evaluating FreeBSD ZFS,
> >>>>> but of course the skeletons in the closet never seem to come out
> >>>>> until you do something big.
> >>>>>
> >>>>> -Chip
> >>>>>
> >>>>> On Thu, Jul 9, 2015 at 4:21 PM, Linda Kateley <lkateley at kateley.com>
> >>>>> wrote:
> >>>>>>
> >>>>>> Hey is there anyone out there running big zfs on omni?
> >>>>>>
> >>>>>> I have been doing mostly zol and freebsd for the last year but have
> >>>>>> to build a 300+TB box, and I want to come back home to my roots
> >>>>>> (solaris).  Feeling kind of hesitant :)  Also, if you had to do it
> >>>>>> over, is there anything you would do differently?
> >>>>>>
> >>>>>> Also, what is the go-to HBA these days?  Seems like I saw stable
> >>>>>> code for the lsi 3008?
> >>>>>>
> >>>>>> TIA
> >>>>>>
> >>>>>> linda
> >>>>>>
> >>>>>>
> >>>>>
> >>>>>
> >>>>>
> >>>>> --
> >>>>> Linda Kateley
> >>>>> Kateley Company
> >>>>> Skype ID-kateleyco
> >>>>> http://kateleyco.com
> >>>>
> >>>>
> >>>>
> >>>>
> >>>
> >>
> >>
> >>
> > _______________________________________________
> > OmniOS-discuss mailing list
> > OmniOS-discuss at lists.omniti.com
> > http://lists.omniti.com/mailman/listinfo/omnios-discuss
>
>

