[OmniOS-discuss] big zfs storage?

Mick Burns bmx1955 at gmail.com
Wed Oct 7 20:59:14 UTC 2015


So... how does Nexenta cope with hot spares and all kinds of disk failures?
Adding hot spares is part of their administration manuals, so can we
assume things are almost always handled smoothly?  I'd like to hear
about tangible experiences in production.
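
For reference, the mechanics are the same on any illumos distribution;
a minimal sketch (the pool and disk names below are placeholders):

    # add a single hot spare to an existing pool
    zpool add tank spare c0t5000C500AAAAAAAAd0

    # confirm the spare shows up as AVAIL
    zpool status tank

    # see what the fault manager has diagnosed so far
    fmadm faulty
    fmdump -v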


thanks

On Mon, Jul 13, 2015 at 7:58 AM, Schweiss, Chip <chip at innovates.com> wrote:
> Liam,
>
> This report is encouraging.  Please share some details of your
> configuration.   What disk failure parameters have you set?   Which
> JBODs and disks are you running?
>
> I have mostly DataON JBODs and some Supermicro.   DataON has PMC SAS
> expanders and Supermicro has LSI; both setups have pretty much the same
> behavior with disk failures.   All my servers are Supermicro with LSI HBAs.
>
> If there's a magic combination of hardware and OS config out there that
> solves the disk failure panic problem, I will certainly change my builds
> going forward.
>
> -Chip
>
> On Fri, Jul 10, 2015 at 1:04 PM, Liam Slusser <lslusser at gmail.com> wrote:
>>
>> I have two 800T ZFS systems on OmniOS and a bunch of smaller <50T systems.
>> Things generally work very well.  We lose a disk here and there, but it's
>> never resulted in downtime.  They're all on Dell hardware with LSI or Dell
>> PERC controllers.
>>
>> Putting in smaller disk failure parameters, so disks fail quicker, was a
>> big help when something does go wrong with a disk.
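>>
>> To illustrate what I mean, the knobs are roughly these on illumos (the
>> vendor/product string and values are placeholders; check the sd(7D) and
>> system(4) man pages, then reboot or run "update_drv -f sd"):
>>
>>     # /etc/system: lower the sd I/O timeout from its 60-second default
>>     set sd:sd_io_time = 10
>>
>>     # /kernel/drv/sd.conf: fewer retries for a particular disk model
>>     sd-config-list = "SEAGATE ST8000NM0075", "retries-timeout:2";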
>>
>> thanks,
>> liam
>>
>>
>> On Fri, Jul 10, 2015 at 10:31 AM, Schweiss, Chip <chip at innovates.com>
>> wrote:
>>>
>>> Unfortunately, for the past couple of years panics on disk failure have
>>> been the norm.   All my production systems are HA with RSF-1, so at least
>>> things come back online relatively quickly.  There are quite a few open
>>> tickets in the Illumos bug tracker related to mpt_sas panics.
>>>
>>> Most of the work to fix these problems has been committed in the past
>>> year, though problems still exist.  For example, my systems are dual-path
>>> SAS; however, mpt_sas will panic if you pull a cable instead of simply
>>> dropping one path to the disks.  Dan McDonald is actively working to
>>> resolve this.   He is also pushing a bug fix in genunix from Nexenta that
>>> appears to fix a lot of the panic problems.   I'll know for sure in a few
>>> months, after I see a disk or two drop, whether it truly fixes things.
>>> Hans Rosenfeld at Nexenta is responsible for most of the updates to
>>> mpt_sas, including support for the 3008 (12G SAS).
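>>>
>>> For anyone wanting to verify their own dual-path setup, the checks are
>>> roughly the following (device names and output will differ):
>>>
>>>     # enable MPxIO for mpt_sas controllers (takes effect after a reboot)
>>>     stmsboot -D mpt_sas -e
>>>
>>>     # each LUN should list two operational paths
>>>     mpathadm list lu
>>>     mpathadm show lu /dev/rdsk/c0t5000C500AAAAAAAAd0s2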
>>>
>>> I haven't run any 12G SAS yet, but plan to on my next build in a couple
>>> of months.   This will be about 300TB using an 84-disk JBOD.  All the code
>>> from Nexenta to support the 3008 appears to be in Illumos now, and they
>>> fully support it, so I suspect it's pretty stable.  From what I understand,
>>> there may be some 12G performance fixes coming at some point.
>>>
>>> The fault manager is nice when the system doesn't panic.  When it panics,
>>> the fault manager never gets a chance to take action.  The consensus is
>>> still that it is better to run pools without hot spares, because there are
>>> situations where the fault manager will do bad things.   I witnessed this
>>> myself when building a system: the fault manager replaced 5 disks in a
>>> raidz2 vdev inside 1 minute, trashing the pool.   I haven't completely
>>> yielded to that "best practice", though; I now run one hot spare per pool.
>>> I figure that with raidz2, the odds of the fault manager causing something
>>> catastrophic are much lower.
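>>>
>>> For anyone who wants to see, or temporarily rein in, what the fault
>>> manager would do with spares, the relevant pieces are roughly these (the
>>> pool name is a placeholder; an unloaded module is reloaded the next time
>>> fmd restarts):
>>>
>>>     # the zfs-retire agent is the part that activates hot spares
>>>     fmadm config | grep zfs
>>>
>>>     # temporarily stop automatic retire/spare activation
>>>     fmadm unload zfs-retire
>>>
>>>     # autoreplace is a separate, per-pool setting worth checking
>>>     zpool get autoreplace tank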
>>>
>>> -Chip
>>>
>>>
>>>
>>> On Fri, Jul 10, 2015 at 11:37 AM, Linda Kateley <lkateley at kateley.com>
>>> wrote:
>>>>
>>>> I have to build and maintain my own system.  I usually help others build
>>>> (I teach ZFS and FreeNAS classes and do consulting).  I really love fault
>>>> management in Solaris and miss it.  Just thought that since it's my system
>>>> and I get to choose, I would use OmniOS.  I have 20+ years using Solaris
>>>> and only 2 on FreeBSD.
>>>>
>>>> I like FreeBSD for how well tuned it is for ZFS out of the box.  I miss
>>>> the networking, virtualization (v12n) and resource controls in Solaris.
>>>>
>>>> I'm concerned about panics on disk failure.  Is that common?
>>>>
>>>>
>>>> linda
>>>>
>>>>
>>>> On 7/9/15 9:30 PM, Schweiss, Chip wrote:
>>>>
>>>> Linda,
>>>>
>>>> I have 3.5 PB running under OmniOS.  All my systems have LSI 2108 HBAs,
>>>> which are generally considered the best choice.
>>>>
>>>> Illumos leaves a bit to be desired in handling faults from disks or SAS
>>>> problems, but things under OmniOS have been improving, thanks in large
>>>> part to Dan McDonald and OmniTI.   We have paid support with OmniTI on all
>>>> of our production systems.  Their response and dedication have been very
>>>> good.  Other than the occasional panic and restart from a disk failure,
>>>> OmniOS has been solid.   ZFS, of course, has never lost a single bit of
>>>> information.
>>>>
>>>> I'd be curious why you're looking to move.  Have there been specific
>>>> problems under BSD or ZoL?  I've been slowly evaluating FreeBSD ZFS, but
>>>> of course the skeletons in the closet never seem to come out until you do
>>>> something big.
>>>>
>>>> -Chip
>>>>
>>>> On Thu, Jul 9, 2015 at 4:21 PM, Linda Kateley <lkateley at kateley.com>
>>>> wrote:
>>>>>
>>>>> Hey, is there anyone out there running big ZFS on OmniOS?
>>>>>
>>>>> I have been doing mostly ZoL and FreeBSD for the last year, but I have
>>>>> to build a 300+TB box and I want to come back home to my roots (Solaris).
>>>>> Feeling kind of hesitant :)  Also, if you had to do it over, is there
>>>>> anything you would do differently?
>>>>>
>>>>> Also, what is the go-to HBA these days?  Seems like I saw stable code
>>>>> for the LSI 3008?
>>>>>
>>>>> TIA
>>>>>
>>>>> linda
>>>>>
>>>>>
>>>>
>>>>
>>>>
>>>> --
>>>> Linda Kateley
>>>> Kateley Company
>>>> Skype ID-kateleyco
>>>> http://kateleyco.com
>>>
>>>
>>>
>>>
>>
>
>
>

