[OmniOS-discuss] Building a new storage
Matej Zerovnik
matej at zunaj.si
Fri Apr 10 10:17:27 UTC 2015
We are currently thinking of rebuilding our SAN, since we made some
mistakes on the first build. But before we begin, we would like to plan
accordingly, so I'm wondering how to measure some data (L2ARC and ZIL
usage, current IOPS, ...) the right way.
We currently have a single raidz2 pool built out of 50 SATA drives
(Seagate Constellation), with 2x Intel S3700 100GB as ZIL and 2x Intel
S3700 100GB as L2ARC.
For the new system, we plan to use an IBM 3550 M4 server with 256GB of
memory and an LSI SAS 9207-8e HBA. We will have around 70-80 SAS 4TB
drives in JBOD cases and, if needed, some SSDs for ZIL and L2ARC.
Questions:
*1.)*
How do I measure the average IOPS of the current system? 'zpool iostat
poolname 1' gives me weird numbers, saying the current drives perform
around 300 read ops and 100 write ops per second. The drives are 7200
rpm SATA drives, so I know they can't deliver that many IOPS.
Output from 'iostat -vx' (only some drives are pasted):
Code:
device r/s w/s kr/s kw/s wait actv svc_t %w %b
data 36621,9 25740,2 19288,6 66191,0 197,6 25,9 3,6 40 77
sd18 276,3 104,8 145,2 83,3 0,0 0,6 1,5 0 36
sd19 283,3 106,7 152,1 83,3 0,0 0,6 1,5 0 24
sd20 281,3 101,8 146,7 79,8 0,0 0,5 1,4 0 35
sd21 286,3 117,7 146,7 84,3 0,0 0,3 0,7 0 21
sd22 283,3 85,8 144,2 81,3 0,0 0,5 1,3 0 32
sd23 275,3 116,7 139,7 82,8 0,0 0,3 0,8 0 21
sd24 280,3 106,7 155,6 84,3 0,0 0,6 1,6 0 25
sd25 288,3 106,7 148,6 86,3 0,0 0,4 1,0 0 24
sd26 269,4 110,7 137,2 91,8 0,0 0,5 1,3 0 24
sd27 272,4 87,8 141,7 78,3 0,0 0,7 1,8 0 34
sd28 236,4 115,7 219,0 84,8 0,0 0,9 2,5 0 26
sd29 235,4 108,7 228,5 83,8 0,0 0,9 2,7 0 33
Output of 'zpool iostat -v data 1 | grep drive_id'
Code:
capacity operations bandwidth
pool alloc free read write read write
c8t5000C5004FD18DE9d0 - - 573 220 663K 607K
c8t5000C5004FD18DE9d0 - - 563 0 318K 0
c8t5000C5004FD18DE9d0 - - 586 314 361K 806K
c8t5000C5004FD18DE9d0 - - 567 445 373K 1,02M
c8t5000C5004FD18DE9d0 - - 464 25 299K 17,9K
c8t5000C5004FD18DE9d0 - - 552 2 326K 3,68K
c8t5000C5004FD18DE9d0 - - 421 41 249K 31,3K
c8t5000C5004FD18DE9d0 - - 492 400 391K 944K
c8t5000C5004FD18DE9d0 - - 313 148 242K 337K
c8t5000C5004FD18DE9d0 - - 330 163 360K 390K
c8t5000C5004FD18DE9d0 - - 655 23 577K 21,5K
Is it just me, or are those too many IOPS for these drives to handle
even in theory, let alone in practice? How do I get the right measurement?
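For reference, I was planning to double-check with plain 'iostat -xnz 1'
and with the DTrace io provider, which (if I understand it correctly)
only fires for I/Os that actually reach the devices, so ARC hits and
aggregation shouldn't inflate the per-disk numbers. A rough sketch:
Code:
# count physical I/Os per device, print and reset once per second
dtrace -qn 'io:::start { @iops[args[1]->dev_statname] = count(); }
    tick-1sec { printa("%-12s %@d\n", @iops); trunc(@iops); }'
Does that sound like a sane way to measure it, or is there a better tool?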
*2.)*
Current ARC utilization on our system:
Code:
ARC Efficency:
Cache Access Total: 2134027465
Cache Hit Ratio: 64% 1381755042 [Defined State for buffer]
Cache Miss Ratio: 35% 752272423 [Undefined State for Buffer]
REAL Hit Ratio: 56% 1199175179 [MRU/MFU Hits Only]
Code:
./arcstat.pl -f read,hits,miss,hit%,l2read,l2hits,l2miss,l2hit%,arcsz,l2size 1 2>/dev/null
read hits miss hit% l2read l2hits l2miss l2hit% arcsz l2size
1 1 0 100 0 0 0 0 213G 235G
4.8K 3.0K 1.9K 61 1.9K 40 1.8K 2 213G 235G
4.3K 2.7K 1.6K 62 1.6K 35 1.5K 2 213G 235G
2.5K 853 1.6K 34 1.6K 45 1.6K 2 213G 235G
5.1K 3.0K 2.2K 57 2.2K 49 2.1K 2 213G 235G
6.5K 4.4K 2.1K 68 2.1K 30 2.0K 1 213G 235G
5.0K 2.5K 2.5K 49 2.5K 44 2.5K 1 213G 235G
11K 8.5K 2.8K 75 2.8K 13 2.8K 0 213G 235G
6.4K 4.8K 1.6K 74 1.6K 57 1.6K 3 213G 235G
2.3K 1.1K 1.2K 46 1.2K 88 1.1K 7 213G 235G
1.9K 532 1.3K 28 1.3K 83 1.2K 6 213G 235G
As you can see, there are almost no L2ARC cache hits. What could be the
reason for that? Is our L2ARC too small, or is the data on our storage
simply too random to be cached? I don't know exactly what is on our
iSCSI shares, since they belong to outside customers, but as far as I
know it's mostly backups and some live data.
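If it helps, I can also paste the raw kstat counters; I believe the
ARC/L2ARC statistics live under zfs:0:arcstats on illumos (correct me
if the names are off). Since they are cumulative since boot, I would
sample them twice and diff:
Code:
# cumulative ARC and L2ARC counters since boot
kstat -p zfs:0:arcstats:hits zfs:0:arcstats:misses zfs:0:arcstats:size
kstat -p zfs:0:arcstats:l2_hits zfs:0:arcstats:l2_misses zfs:0:arcstats:l2_size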
*3.)*
As far as the ZIL goes, do we need a dedicated log device? I think I
read somewhere that the ZIL can only store 8k blocks and that you have
to 'format' iSCSI drives accordingly. Is that still the case? Output
from 'zilstat':
Code:
N-Bytes N-Bytes/s N-Max-Rate B-Bytes B-Bytes/s B-Max-Rate ops <=4kB 4-32kB >=32kB
0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0
178352 178352 178352 262144 262144 262144 2 0 0 2
134823992 134823992 134823992 221380608 221380608 221380608 1689 0 0 1689
102893848 102893848 102893848 168427520 168427520 168427520 1285 0 0 1285
0 0 0 0 0 0 0 0 0 0
4472 4472 4472 131072 131072 131072 1 0 0 1
0 0 0 0 0 0 0 0 0 0
41904 41904 41904 262144 262144 262144 2 0 0 2
134963824 134963824 134963824 221511680 221511680 221511680 1690 0 0 1690
0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0
32789896 32789896 32789896 53346304 53346304 53346304 407 0 0 407
25467912 25467912 25467912 41811968 41811968 41811968 319 0 0 319
Given these stats, is a dedicated ZIL device even necessary? When I'm
running zilstat, I see big bursts of ops every 5s. Why is that? I know
the system is supposed to flush data from memory to the spindles every
5s, but that shouldn't show up as ZIL traffic, should it?
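To see what is actually issuing synchronous writes (and would therefore
benefit from a separate log device), I was going to check the dataset
properties and trace zil_commit() calls, roughly like this (the fbt
probe is my assumption, so the name may need adjusting):
Code:
# which datasets override the default sync/logbias behaviour
zfs get -r sync,logbias data
# count ZIL commits per process over a 10 second window
dtrace -n 'fbt::zil_commit:entry { @[execname] = count(); } tick-10sec { exit(0); }'
Is that a reasonable check, or is zilstat alone enough to decide?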
*4.)*
How should we put the drives together to get the best IOPS/capacity
ratio out of them? We were thinking of 7 raidz2 vdevs with 10 drives
each. That way we would get a pool of around 224TB (7 vdevs x 8 data
drives x 4TB).
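Just to make sure I have the layout right, this is roughly the create
command I had in mind (pool name and device names are placeholders):
Code:
# 7 x 10-disk raidz2: 7 vdevs x 8 data disks x 4TB =~ 224TB raw
zpool create tank \
    raidz2 c9t0d0 c9t1d0 c9t2d0 c9t3d0 c9t4d0 c9t5d0 c9t6d0 c9t7d0 c9t8d0 c9t9d0 \
    raidz2 c9t10d0 c9t11d0 c9t12d0 c9t13d0 c9t14d0 c9t15d0 c9t16d0 c9t17d0 c9t18d0 c9t19d0
    # ...five more raidz2 groups of 10 for the remaining drives, plus
    # "log mirror <ssd> <ssd>" and "cache <ssd> <ssd>" if we add SSDs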
*5.)*
In case we decide to go with 4 JBOD cases, would it be better to build
2 pools, so that if one pool has a hiccup we don't lose all the data?
What else am I not considering?
Thanks, Matej