[OmniOS-discuss] Mildly confusing ZFS iostat output
W Verb
wverb73 at gmail.com
Tue Jan 27 06:27:53 UTC 2015
Thank you Richard. I also found a quite detailed writeup with kstat
examples here:
http://sunsite.uakom.sk/sunworldonline/swol-09-1997/swol-09-perf.html
It's a little old, but I think it gets to the heart of the matter.
-Warren V
On Mon, Jan 26, 2015 at 8:14 PM, Richard Elling
<richard.elling at richardelling.com> wrote:
>
> On Jan 26, 2015, at 5:16 PM, W Verb <wverb73 at gmail.com> wrote:
>
> Hello All,
>
> I am mildly confused by something iostat does when displaying statistics
> for a zpool. Before I begin rooting through the iostat source, does
> anyone have an idea of why I am seeing high "wait" and "wsvc_t" values
> for "ppool" when my devices apparently are not busy? I would have assumed
> that the stats for the pool would be the sum of the stats for the vdevs....
>
>
> welcome to queuing theory! ;-)
>
> First, iostat knows nothing about the devices being measured. It is really
> just a processor for kstats of type KSTAT_TYPE_IO (see the kstat(3KSTAT)
> man page for discussion). For that type, you get a two-queue set. For many
> cases, two queues are a fine model, but when there is only one interesting
> queue, developers sometimes choose to put the less interesting info in the
> "wait" queue.
>
> Second, it is the responsibility of the developer to define the queues. In
> the case of pools, the queues are defined as:
>     wait = vdev_queue_io_add() until vdev_queue_io_remove()
>     run  = vdev_queue_pending_add() until vdev_queue_pending_remove()
>
> The run queue is closer to the actual measured I/O to the vdev (the juicy
> performance bits). The wait queue is closer to the transaction engine and
> includes time spent waiting for aggregation. Thus we expect the wait-queue
> numbers to be higher, especially for async workloads. But since I/Os can
> and do get aggregated before being sent to the vdev, the wait queue is not
> a very useful measure of overall performance. In other words, optimizing
> it away could actually hurt performance.
>
> In general, worry about the run queues and don't worry so much about the
> wait queues.
> NB, iostat calls "run" queues "active" queues. You say Tomato, I say
> 'mater.
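>
> For the curious, the arithmetic iostat does with those counters is plain
> Little's-law bookkeeping. A rough sketch of it (my reconstruction from the
> kstat_io_t semantics, not the actual iostat source): wlentime and rlentime
> accumulate the time integral of queue length, so a delta divided by elapsed
> time gives average queue length, and divided by ops completed gives average
> time per op.
>
>     #include <kstat.h>
>     #include <stdio.h>
>
>     /*
>      * Derive iostat-style columns from two kstat_io_t snapshots taken
>      * elapsed_ns apart.  wlentime/rlentime are cumulative length*time
>      * integrals (ns); reads/writes are cumulative completion counts.
>      */
>     static void
>     derive(const kstat_io_t *prev, const kstat_io_t *cur, hrtime_t elapsed_ns)
>     {
>             double ops  = (double)(cur->reads - prev->reads) +
>                 (double)(cur->writes - prev->writes);
>             double wait = (double)(cur->wlentime - prev->wlentime) / elapsed_ns;
>             double actv = (double)(cur->rlentime - prev->rlentime) / elapsed_ns;
>             double wsvc = ops > 0 ?
>                 (double)(cur->wlentime - prev->wlentime) / ops / 1e6 : 0.0;
>             double asvc = ops > 0 ?
>                 (double)(cur->rlentime - prev->rlentime) / ops / 1e6 : 0.0;
>
>             (void) printf("wait=%.1f actv=%.1f wsvc_t=%.1f asvc_t=%.1f\n",
>                 wait, actv, wsvc, asvc);
>     }
>
> Note how this hangs together in the ppool line below: roughly 6,056 ops/s
> times a 138.3 ms wsvc_t gives an average wait-queue length of about 838,
> which is the 837.4 "wait" value shown.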
> -- richard
>
>
>
>                               extended device statistics
>     r/s    w/s   kr/s     kw/s  wait actv wsvc_t asvc_t  %w  %b device
>    10.0 9183.0   40.5 344942.0   0.0  1.8    0.0    0.2   0 178 c4
>     1.0  187.0    4.0  19684.0   0.0  0.1    0.0    0.5   0   8 c4t5000C5006A597B93d0
>     2.0  199.0   12.0  20908.0   0.0  0.1    0.0    0.6   0  12 c4t5000C500653DE049d0
>     2.0  197.0    8.0  20788.0   0.0  0.2    0.0    0.8   0  15 c4t5000C5003607D87Bd0
>     0.0  202.0    0.0  20908.0   0.0  0.1    0.0    0.6   0  11 c4t5000C5006A5903A2d0
>     0.0  189.0    0.0  19684.0   0.0  0.1    0.0    0.5   0  10 c4t5000C500653DEE58d0
>     5.0  957.0   16.5   1966.5   0.0  0.1    0.0    0.1   0   7 c4t50026B723A07AC78d0
>     0.0  201.0    0.0  20787.9   0.0  0.1    0.0    0.7   0  14 c4t5000C5003604ED37d0
>     0.0    0.0    0.0      0.0   0.0  0.0    0.0    0.0   0   0 c4t5000C500653E447Ad0
>     0.0 3525.0    0.0 110107.7   0.0  0.5    0.0    0.2   0  51 c4t500253887000690Dd0
>     0.0 3526.0    0.0 110107.7   0.0  0.5    0.0    0.1   1  50 c4t5002538870006917d0
>    10.0 6046.0   40.5 344941.5 837.4  1.9  138.3    0.3  23  67 ppool
>
>
> For those following the VAAI thread, this is the system I will be using as
> my testbed.
>
> Here is the structure of ppool (taken at a different time than above):
>
> root at sanbox:/root# zpool iostat -v ppool
>                              capacity     operations    bandwidth
> pool                      alloc   free   read  write   read  write
> ------------------------  -----  -----  -----  -----  -----  -----
> ppool                      191G  7.97T     23    637   140K  15.0M
>   mirror                  63.5G  2.66T      7    133  46.3K   840K
>     c4t5000C5006A597B93d0     -      -      1     13  24.3K   844K
>     c4t5000C500653DEE58d0     -      -      1     13  24.1K   844K
>   mirror                  63.6G  2.66T      7    133  46.5K   839K
>     c4t5000C5006A5903A2d0     -      -      1     13  24.0K   844K
>     c4t5000C500653DE049d0     -      -      1     13  24.6K   844K
>   mirror                  63.5G  2.66T      7    133  46.8K   839K
>     c4t5000C5003607D87Bd0     -      -      1     13  24.5K   843K
>     c4t5000C5003604ED37d0     -      -      1     13  24.4K   843K
> logs                          -      -      -      -      -      -
>   mirror                   301M   222G      0    236      0  12.5M
>     c4t5002538870006917d0     -      -      0    236      5  12.5M
>     c4t500253887000690Dd0     -      -      0    236      5  12.5M
> cache                         -      -      -      -      -      -
>   c4t50026B723A07AC78d0   62.3G  11.4G     19    113  83.0K  1.07M
> ------------------------  -----  -----  -----  -----  -----  -----
>
> root at sanbox:/root# zfs get all ppool
> NAME   PROPERTY              VALUE                  SOURCE
> ppool  type                  filesystem             -
> ppool  creation              Sat Jan 24 18:37 2015  -
> ppool  used                  5.16T                  -
> ppool  available             2.74T                  -
> ppool  referenced            96K                    -
> ppool  compressratio         1.51x                  -
> ppool  mounted               yes                    -
> ppool  quota                 none                   default
> ppool  reservation           none                   default
> ppool  recordsize            128K                   default
> ppool  mountpoint            /ppool                 default
> ppool  sharenfs              off                    default
> ppool  checksum              on                     default
> ppool  compression           lz4                    local
> ppool  atime                 on                     default
> ppool  devices               on                     default
> ppool  exec                  on                     default
> ppool  setuid                on                     default
> ppool  readonly              off                    default
> ppool  zoned                 off                    default
> ppool  snapdir               hidden                 default
> ppool  aclmode               discard                default
> ppool  aclinherit            restricted             default
> ppool  canmount              on                     default
> ppool  xattr                 on                     default
> ppool  copies                1                      default
> ppool  version               5                      -
> ppool  utf8only              off                    -
> ppool  normalization         none                   -
> ppool  casesensitivity       sensitive              -
> ppool  vscan                 off                    default
> ppool  nbmand                off                    default
> ppool  sharesmb              off                    default
> ppool  refquota              none                   default
> ppool  refreservation        none                   default
> ppool  primarycache          all                    default
> ppool  secondarycache        all                    default
> ppool  usedbysnapshots       0                      -
> ppool  usedbydataset         96K                    -
> ppool  usedbychildren        5.16T                  -
> ppool  usedbyrefreservation  0                      -
> ppool  logbias               latency                default
> ppool  dedup                 off                    default
> ppool  mlslabel              none                   default
> ppool  sync                  standard               local
> ppool  refcompressratio      1.00x                  -
> ppool  written               96K                    -
> ppool  logicalused           445G                   -
> ppool  logicalreferenced     9.50K                  -
> ppool  filesystem_limit      none                   default
> ppool  snapshot_limit        none                   default
> ppool  filesystem_count      none                   default
> ppool  snapshot_count        none                   default
> ppool  redundant_metadata    all                    default
>
> Currently, ppool contains a single 5TB zvol that I am hosting as an iSCSI
> block device. At the vdev level, I have ensured that the ashift is 12 for
> all devices, all physical devices are 4k-native SATA, and the cache/log
> SSDs are also set for 4k. The block sizes are manually set in sd.conf and
> confirmed with "echo ::sd_state | mdb -k | egrep '(^un|_blocksize)'". The
> zvol blocksize is 4k, and the iSCSI block transfer size is 512B (not that
> it matters).
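>
> For reference, the sd.conf entries take roughly this form (the vendor and
> product strings must be padded to match the drive's SCSI inquiry data, so
> treat this as illustrative rather than copy-paste):
>
>     sd-config-list = "ATA     ST3000DM001-1CH1", "physical-block-size:4096";
>
> followed by "update_drv -f sd" or a reboot to make it take effect.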
>
> All drives contain a single Solaris2 partition with an EFI label, and are
> properly aligned:
> format> verify
>
> Volume name = <        >
> ascii name  = <ATA-ST3000DM001-1CH1-CC27-2.73TB>
> bytes/sector       = 512
> sectors            = 5860533167
> accessible sectors = 5860533134
> Part      Tag    Flag     First Sector        Size        Last Sector
>   0        usr    wm               256       2.73TB        5860516750
>   1 unassigned    wm                 0          0                   0
>   2 unassigned    wm                 0          0                   0
>   3 unassigned    wm                 0          0                   0
>   4 unassigned    wm                 0          0                   0
>   5 unassigned    wm                 0          0                   0
>   6 unassigned    wm                 0          0                   0
>   8   reserved    wm        5860516751       8.00MB        5860533134
>
> I scrubbed the pool last night, which completed without error. From "zdb
> ppool", I have extracted (with minor formatting):
>
>                                       capacity   operations   bandwidth  ---- errors ----
> description                          used avail  read write  read write  read write cksum
> ppool                                339G 7.82T 26.6K     0  175M     0     0     0     5
>   mirror                             113G 2.61T 8.87K     0 58.5M     0     0     0     2
>     /dev/dsk/c4t5000C5006A597B93d0s0            3.15K     0 48.8M     0     0     0     2
>     /dev/dsk/c4t5000C500653DEE58d0s0            3.10K     0 49.0M     0     0     0     2
>   mirror                             113G 2.61T 8.86K     0 58.5M     0     0     0     8
>     /dev/dsk/c4t5000C5006A5903A2d0s0            3.12K     0 48.7M     0     0     0     8
>     /dev/dsk/c4t5000C500653DE049d0s0            3.08K     0 48.9M     0     0     0     8
>   mirror                             113G 2.61T 8.86K     0 58.5M     0     0     0    10
>     /dev/dsk/c4t5000C5003607D87Bd0s0            2.48K     0 48.8M     0     0     0    10
>     /dev/dsk/c4t5000C5003604ED37d0s0            2.47K     0 48.9M     0     0     0    10
>   log mirror                        44.0K  222G     0     0    37     0     0     0     0
>     /dev/dsk/c4t5002538870006917d0s0                0     0   290     0     0     0     0
>     /dev/dsk/c4t500253887000690Dd0s0                0     0   290     0     0     0     0
>   Cache
>     /dev/dsk/c4t50026B723A07AC78d0s0     0 73.8G     0     0    35     0     0     0     0
>   Spare
>     /dev/dsk/c4t5000C500653E447Ad0s0                4     0  136K     0     0     0     0
>
> This shows a few checksum errors, which is not consistent with the output
> of "zpool status -v", and "iostat -eE" shows no physical error count. I
> again see the discrepancy between the "ppool" value and what I would
> expect, which would be the sum of the cksum errors for each vdev.
>
> I also observed a ton of leaked space, which I expect from a live pool, as
> well as a single:
> db_blkptr_cb: Got error 50 reading <96, 1, 2, 3fc8>
> DVA[0]=<1:1dc4962000:1000> DVA[1]=<2:1dc4654000:1000> [L2 zvol object]
> fletcher4 lz4 LE contiguous unique double size=4000L/a00P
> birth=52386L/52386P fill=4825
> cksum=c70e8a7765:f2adce34f59c:c8a289b51fe11d:7e0af40fe154aab4 -- skipping
>
>
> By the way, I also found:
>
> Uberblock:
> magic = 000000000*0bab10c*
>
> Wow. Just wow.
>
>
> -Warren V
>