[OmniOS-discuss] OmniOS NFS fileserver hanging under sustained high write loads

Doug Hughes doug at will.to
Tue May 5 20:28:27 UTC 2015


While running a dd test across a bunch of client nodes (4k writes, many
nodes in parallel, all to the same file; that was a mistake, I meant to
use many files), I managed to get my system into a state where all of the
ttys except /dev/console are stuck. It showed signs of desperation
swapping a few times, but it seems to have recovered from that. I have
killed all of the write-intensive I/O and the host is mostly fine: load
has fallen and there is no residual I/O to the disks, but the ttys other
than the console are still stuck.
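The exact dd invocations aren't shown in this post, so the following is only a rough, hypothetical Python stand-in for the workload described above: several parallel writers issuing 4 KiB writes, either each to its own file (the intended test) or all to one shared file starting at offset 0 (what plain dd without a seek would do). It runs the writers as threads on a single host; to approximate the real test, `directory` would point at the NFS mount on each client. The function names and parameters are illustrative assumptions, not the author's script.

```python
from __future__ import annotations

import os
import tempfile
from concurrent.futures import ThreadPoolExecutor

BLOCK_SIZE = 4096  # 4 KiB per write, matching the dd test described above


def write_blocks(path: str, tag: int, blocks: int) -> int:
    """Write `blocks` consecutive 4 KiB blocks starting at offset 0."""
    buf = bytes([tag % 256]) * BLOCK_SIZE
    # No O_TRUNC: in shared mode several writers open the same file at once.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT, 0o644)
    try:
        for i in range(blocks):
            os.pwrite(fd, buf, i * BLOCK_SIZE)
        os.fsync(fd)  # force the data out to the server/disk before returning
    finally:
        os.close(fd)
    return blocks * BLOCK_SIZE


def run_load(directory: str, writers: int, blocks: int, shared: bool) -> list[str]:
    """Run `writers` parallel writers; return the distinct paths written."""
    if shared:
        # Every writer overwrites the same region of one file.
        paths = [os.path.join(directory, "shared.dat")] * writers
    else:
        # One file per writer, as originally intended.
        paths = [os.path.join(directory, f"writer{n}.dat") for n in range(writers)]
    with ThreadPoolExecutor(max_workers=writers) as pool:
        # list() forces completion and re-raises any writer exception.
        list(pool.map(write_blocks, paths, range(writers), [blocks] * writers))
    return sorted(set(paths))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        for p in run_load(d, writers=8, blocks=256, shared=False):
            print(p, os.path.getsize(p))
```

In shared mode the writers all land on the same blocks, so the file stays at `blocks * BLOCK_SIZE` bytes regardless of how many writers there are; in per-file mode the total data written scales with the number of writers.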

I had quite a few pauses in my vmstat output while the memory exhaustion
from the write load took place. In contrast, I just can't bring the
machine down with read load, as you might expect. The ARC does an
admirable job with the 72GB of RAM and can totally fill up the 10G pipes
outbound.

It didn't lock up completely, but it came close, and there's some residual
damage lingering with respect to the ttys.

(config = 2x quad-core Intel Sandy Bridge CPUs in a Sun X4275 with 72GB
RAM and 12x 4TB disks)



On Tue, May 5, 2015 at 12:15 PM, Chris Siebenmann <cks at cs.toronto.edu>
wrote:

> > >> The hard part will be testing this. I'm not sure I have the HW
> > >> in-house to do it.  I may need illumos community help.
> > >
> > > Since we have a test environment where we can reproduce this and a
> > > high interest in seeing it fixed, we can test new kernel packages
> > > and so on.
> > >
> > > (If given specific howto instructions we can probably build test
> > > kernels from source, but we've never tried to do any OmniOS source
> > > building before so it may take us some time to get up to speed on
> > > that. It'd be much easier to take a prebuilt test kernel, drop it
> > > in, and go.)
> >
> > I can turn around the whole world in an hour or less and provide
> > ONU images if you're on 012 or 014. What revision are you running
> > currently? I can also help you get a build-illumos-omnios up and
> > running as well. Pick your favorite.
>
>  For now, the simplest thing is installable kernel images (I assume
> that's ONU images) for r151014, which is what our test environment
> is using now and what we'd wind up on with all of our production
> fileservers[*]. I won't be able to start any testing with the images
> until this afternoon at the earliest, so I don't think it's urgent to
> build them right away.
>
>  Thanks for all of this!
>
>         - cks
> [*: our production fileservers are currently at r151010 but we're
>     already looking at an r151014 upgrade. having this fix as part
>     of r151014 would make that upgrade definite, and there's other
>     things in 14 that we want, eg >16 group support over NFS.
> ]
> _______________________________________________
> OmniOS-discuss mailing list
> OmniOS-discuss at lists.omniti.com
> http://lists.omniti.com/mailman/listinfo/omnios-discuss
>

