[OmniOS-discuss] zpool Write Bottlenecks
Dale Ghent
daleg at omniti.com
Fri Sep 30 02:46:22 UTC 2016
Awesome that you're using LX Zones this way with BeeGFS.
A note on your testing methodology, however:
http://lethargy.org/~jesus/writes/disk-benchmarking-with-dd-dont/#.V-3RUqOZPOY
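A single dd stream from /dev/zero mostly tells you about one writer thread, not the pool. If you want a quick alternative, fio with several parallel sequential writers tends to be more representative (a rough sketch; it assumes fio is installed, e.g. from pkgsrc, and that /datastore is the pool under test):

# Eight parallel 8GB sequential writers, fsync'd at the end, aggregate stats
fio --name=seqwrite --directory=/datastore --rw=write --bs=1M \
    --size=8g --numjobs=8 --ioengine=psync --end_fsync=1 --group_reporting

That reports per-job and aggregate bandwidth directly, so there's no timing arithmetic to do by hand.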
> On Sep 29, 2016, at 10:21 PM, Michael Talbott <mtalbott at lji.org> wrote:
>
> Hi, I'm trying to achieve massive write speeds with some decent hardware that will be used for parallel computing needs (bioinformatics). Eventually, if all goes well and my testing succeeds, I'll duplicate this setup and run BeeGFS in a few LX zones (THANK YOU LX ZONES!!!) for some truly massive parallel storage happiness, but I'd like to tune this box as much as possible first.
>
> For this setup I'm using an Intel S2600GZ board with 2 x E5-2640 CPUs (six cores each) @ 2.5GHz and 144GB of ECC RAM. There are 3 x SAS2008-based LSI cards in the box: one is connected to 8 internal SSDs, and the other two cards (4 ports) are connected to a 45-bay drive enclosure. There are also two dual-port Intel 10GbE cards for connectivity.
>
> I've created many different zpool configurations (straight striped with no redundancy, raidz2, raidz3; multipathed and non-multipathed with 8x phy links instead of 4x multipath links; etc.) trying to find the magic combination for maximum performance, but something somewhere is capping raw throughput and I can't seem to find it.
>
> Now the crazy part: I can, for instance, create zpoolA with ~20 drives (via cardA, attached only to backplaneA) and zpoolB with another ~20 drives (via cardB, attached only to backplaneB), and each of them gets the same performance individually (~1GB/s write and 2.5GB/s read). So my thought was: if I destroy zpoolB and attach all of its drives to zpoolA as additional vdevs, it should double the performance, or at least make a significant improvement. Nope, roughly the same speed. Then I thought maybe it's a slowest-vdev sort of thing, so I created vdevs such that each vdev used half its drives from backplaneA and the other half from backplaneB. That should force data distribution between the cards for every vdev, double the speed, and get me to 2GB/s write. But nope, same deal: 1GB/s write and 2.5GB/s read :(
>
> When I create the pool from scratch, each vdev I add gives a linear increase in performance until I hit about 4-5 vdevs. That's where performance flatlines, and no matter what I do beyond that, it just won't go any faster :(
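One thing worth ruling out here: your test below is a single dd invocation, i.e. one writer thread, so once one stream can no longer keep additional vdevs busy, adding vdevs won't show up in the numbers. A quick check is to run several writers at once and look at the aggregate (a sketch; file names and sizes are arbitrary):

# Four concurrent 4GiB sequential writers against the same pool
START=$(date +%s)
for i in 1 2 3 4; do
  /usr/gnu/bin/dd if=/dev/zero of=/datastore/testdd.$i bs=1M count=4k &
done
wait
sync
echo "16384 MB total in $(( $(date +%s) - START ))s"

If the aggregate climbs well past 1GB/s, the cap is in the test, not the pool.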
>
> Also, if I create a pure SSD pool on cardC, its sequential read/write performance hits the exact same numbers as the others :( Bottom line: no matter what pool configuration I use, and no matter what recordsize is set in ZFS, I'm always capped at roughly 1GB/s write and 2.5GB/s read.
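It also helps to watch the disks while one of these tests runs; if they sit mostly idle when you hit the cap, the limit is upstream of them (HBA, driver, or the writer itself) rather than the drives. Stock iostat is enough for that:

# Per-device throughput and utilization in 5-second samples
iostat -xn 5
# Watch kw/s, actv and %b per disk while the write test runs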
>
> I thought maybe there weren't enough PCIe lanes to run all of those cards at x8, but that's not the case; this board can run 6 x 8-lane PCIe 3.0 cards at full speed. I booted it up in Linux to check with lspci -vv (since I'm not sure how to view PCIe link speeds in OmniOS), and sure enough, everything is running at x8 width, so that's not it.
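Since you already have that Linux boot handy, it's worth confirming the negotiated link speed as well as the width; LnkSta shows what each HBA actually trained at (0x1000 is the LSI vendor ID). Also keep in mind the SAS2008 is a PCIe 2.0 part, so each HBA tops out around the Gen2 x8 ceiling regardless of what the slot can do:

lspci -d 1000: -vv | grep -E 'SAS2008|LnkCap:|LnkSta:'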
>
> Oh, and just FYI, this is my super simple throughput testing script, which I run with compression disabled on the pool being tested.
>
> # Write 10GiB of zeros in 1MiB blocks, flush to disk, then print elapsed seconds
> START=$(date +%s)
> /usr/gnu/bin/dd if=/dev/zero of=/datastore/testdd bs=1M count=10k
> sync
> echo $(($(date +%s)-START))
>
> My goal is to achieve at least 2GB/s write and 4GB/s read, which I think is theoretically possible with this hardware.
>
> Anyone have any ideas about what could be limiting this or how to remedy it? Could it be the mpt_sas driver itself somehow throttling access to all these devices? Or do I need to do some sort of IRQ-to-CPU pinning magic?
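On the interrupt question: before doing any pinning, it's worth seeing how the interrupts are actually distributed and whether a single CPU saturates during a test. A few stock illumos tools will show that if you run them alongside the write test:

intrstat 5                      # per-driver interrupt activity, broken out per CPU
echo "::interrupts" | mdb -k    # which CPU each interrupt vector is bound to
mpstat 5                        # look for one CPU pegged in sys time while others idle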
>
>
> Thanks,
>
> Michael
>