[OmniOS-discuss] big zfs storage?

Chris Siebenmann cks at cs.toronto.edu
Thu Oct 8 01:36:43 UTC 2015


> I completely concur with Richard on this.  Let me give a real example
> that emphasizes this point, as it's a critical design decision.
[...]
> Now I only run one hot spare per pool.  Most of my pools are raidz2 or
> raidz3.  This way any event like this cannot take out more than one
> disk, and data parity will never be lost.
>
> There are other causes that can trigger multiple disk replacements. I
> have not encountered them.  If I do, they won't hurt my data with the
> limit of one hot spare.

 My view is that spare handling needs to be a local decision based on
your storage topology, your pool and vdev structure, your durability
needs, and even how staffing is handled (e.g. whether you have a 24/7
on-call rotation). I don't think there is any single global right
answer; hot spares will be good for some people and bad for others.

 Locally we use mirrored vdevs, multiple pools, an iSCSI SAN to connect
to the actual disks, multiple backend disk controllers, and no 24/7
on-call setup. We've developed an automated spare-handling system that
knows a great deal about our local storage topology, so it knows, by
various criteria, which spares are 'good' and 'bad' for any particular
failed disk. Having it available has been very helpful when various
things have gone wrong, both individual disk failures and entire
backend disk controllers suffering power failures after the end of the
workday. Our solution is of course very local, but the important thing
is that automating this has clearly been the right tradeoff *for us*.
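 To make the idea concrete, here is a minimal sketch of what
topology-aware spare selection might look like. The data model, the
"prefer a spare on a different backend controller" criterion, and all
the names below are my illustrative assumptions, not the actual system
described above; the eventual swap is still an ordinary 'zpool replace'.

```python
# Hypothetical sketch: pick a 'good' spare for a failed disk based on
# storage topology, then build the corresponding zpool replace command.
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class Disk:
    name: str     # device or iSCSI LUN identifier (example values)
    backend: str  # which backend disk controller serves this disk

def choose_spare(failed: Disk, spares: list[Disk]) -> Optional[Disk]:
    """Prefer a spare on a different backend controller than the failed
    disk, so a controller-wide outage can't take out both the original
    and its replacement; fall back to any spare if there's no better
    choice."""
    for s in spares:
        if s.backend != failed.backend:
            return s
    return spares[0] if spares else None

def replace_command(pool: str, failed: Disk, spare: Disk) -> list[str]:
    # The actual replacement is performed with the standard CLI.
    return ["zpool", "replace", pool, failed.name, spare.name]
```

A real system would also track which spares are already in use and cap
the number of simultaneous replacements, for the reasons discussed
earlier in this thread.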

(In another environment it would probably be the wrong answer, e.g. if
we had a 24/7 NOC staffed with people to swap physical disks and
hardware at any time of day or night, including holidays, and a 24/7
on-call sysadmin to do system things like 'zpool replace'. There are
other parts of the university that do have this. I suspect that they
don't use an automated spares system of any kind, although I don't know
for sure.)

	- cks
