[OmniOS-discuss] Mildly confusing ZFS iostat output
W Verb
wverb73 at gmail.com
Tue Jan 27 06:27:53 UTC 2015
Thank you Richard. I also found a quite detailed writeup with kstat
examples here:
http://sunsite.uakom.sk/sunworldonline/swol-09-1997/swol-09-perf.html
It's a little old, but I think it gets to the heart of the matter.
-Warren V
On Mon, Jan 26, 2015 at 8:14 PM, Richard Elling
<richard.elling at richardelling.com> wrote:
>
> On Jan 26, 2015, at 5:16 PM, W Verb <wverb73 at gmail.com> wrote:
>
> Hello All,
>
> I am mildly confused by something iostat does when displaying statistics
> for a zpool. Before I begin rooting through the iostat source, does
> anyone have an idea of why I am seeing high "wait" and "wsvc_t" values
> for "ppool" when my devices apparently are not busy? I would have assumed
> that the stats for the pool would be the sum of the stats for the vdevs....
>
>
> welcome to queuing theory! ;-)
>
> First, iostat knows nothing about the devices being measured. It is really
> just a processor for kstats of type KSTAT_TYPE_IO (see the kstat(3KSTAT)
> man page for discussion). For that type, you get a two-queue set. For many
> cases, two queues are a fine model, but when there is only one interesting
> queue, developers sometimes choose to put the less interesting info in the
> "wait" queue.
>
> Second, it is the responsibility of the developer to define the queues. In
> the case of pools, the queues are defined as:
>     wait = vdev_queue_io_add() until vdev_queue_io_remove()
>     run  = vdev_queue_pending_add() until vdev_queue_pending_remove()
>
> The run queue is closer to the actual measured I/O to the vdev (the juicy
> performance bits). The wait queue is closer to the transaction engine and
> includes time spent waiting for aggregation. Thus we expect the wait-queue
> numbers to be higher, especially for async workloads. But since I/Os can
> and do get aggregated before being sent to the vdev, the wait queue is not
> a very useful measure of overall performance. In other words, optimizing
> it away could actually hurt performance.
>
> In general, worry about the run queues and don't worry so much about the
> wait queues.
> NB, iostat calls "run" queues "active" queues. You say Tomato, I say
> 'mater.
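>
> For the curious, the arithmetic iostat does with those counters is plain
> Little's-law bookkeeping. A rough sketch of it (my reconstruction from the
> kstat_io_t semantics, not the actual iostat source): wlentime and rlentime
> accumulate the time integral of queue length, so a delta divided by elapsed
> time gives average queue length, and divided by ops completed gives average
> time per op.
>
>     #include <kstat.h>
>     #include <stdio.h>
>
>     /*
>      * Derive iostat-style columns from two kstat_io_t snapshots taken
>      * elapsed_ns apart.  wlentime/rlentime are cumulative length*time
>      * integrals (ns); reads/writes are cumulative completion counts.
>      */
>     static void
>     derive(const kstat_io_t *prev, const kstat_io_t *cur, hrtime_t elapsed_ns)
>     {
>             double ops  = (double)(cur->reads - prev->reads) +
>                 (double)(cur->writes - prev->writes);
>             double wait = (double)(cur->wlentime - prev->wlentime) / elapsed_ns;
>             double actv = (double)(cur->rlentime - prev->rlentime) / elapsed_ns;
>             double wsvc = ops > 0 ?
>                 (double)(cur->wlentime - prev->wlentime) / ops / 1e6 : 0.0;
>             double asvc = ops > 0 ?
>                 (double)(cur->rlentime - prev->rlentime) / ops / 1e6 : 0.0;
>
>             (void) printf("wait=%.1f actv=%.1f wsvc_t=%.1f asvc_t=%.1f\n",
>                 wait, actv, wsvc, asvc);
>     }
>
> Note how this hangs together in the ppool line below: roughly 6,056 ops/s
> times a 138.3 ms wsvc_t gives an average wait-queue length of about 838,
> which is the 837.4 "wait" value shown.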
> -- richard
>
>
>
>                               extended device statistics
>     r/s    w/s   kr/s     kw/s  wait actv wsvc_t asvc_t  %w  %b device
>    10.0 9183.0   40.5 344942.0   0.0  1.8    0.0    0.2   0 178 c4
>     1.0  187.0    4.0  19684.0   0.0  0.1    0.0    0.5   0   8 c4t5000C5006A597B93d0
>     2.0  199.0   12.0  20908.0   0.0  0.1    0.0    0.6   0  12 c4t5000C500653DE049d0
>     2.0  197.0    8.0  20788.0   0.0  0.2    0.0    0.8   0  15 c4t5000C5003607D87Bd0
>     0.0  202.0    0.0  20908.0   0.0  0.1    0.0    0.6   0  11 c4t5000C5006A5903A2d0
>     0.0  189.0    0.0  19684.0   0.0  0.1    0.0    0.5   0  10 c4t5000C500653DEE58d0
>     5.0  957.0   16.5   1966.5   0.0  0.1    0.0    0.1   0   7 c4t50026B723A07AC78d0
>     0.0  201.0    0.0  20787.9   0.0  0.1    0.0    0.7   0  14 c4t5000C5003604ED37d0
>     0.0    0.0    0.0      0.0   0.0  0.0    0.0    0.0   0   0 c4t5000C500653E447Ad0
>     0.0 3525.0    0.0 110107.7   0.0  0.5    0.0    0.2   0  51 c4t500253887000690Dd0
>     0.0 3526.0    0.0 110107.7   0.0  0.5    0.0    0.1   1  50 c4t5002538870006917d0
>    10.0 6046.0   40.5 344941.5 837.4  1.9  138.3    0.3  23  67 ppool
>
>
> For those following the VAAI thread, this is the system I will be using as
> my testbed.
>
> Here is the structure of ppool (taken at a different time than above):
>
> root at sanbox:/root# zpool iostat -v ppool
>                              capacity     operations    bandwidth
> pool                      alloc   free   read  write   read  write
> ------------------------  -----  -----  -----  -----  -----  -----
> ppool                      191G  7.97T     23    637   140K  15.0M
>   mirror                  63.5G  2.66T      7    133  46.3K   840K
>     c4t5000C5006A597B93d0     -      -      1     13  24.3K   844K
>     c4t5000C500653DEE58d0     -      -      1     13  24.1K   844K
>   mirror                  63.6G  2.66T      7    133  46.5K   839K
>     c4t5000C5006A5903A2d0     -      -      1     13  24.0K   844K
>     c4t5000C500653DE049d0     -      -      1     13  24.6K   844K
>   mirror                  63.5G  2.66T      7    133  46.8K   839K
>     c4t5000C5003607D87Bd0     -      -      1     13  24.5K   843K
>     c4t5000C5003604ED37d0     -      -      1     13  24.4K   843K
> logs                          -      -      -      -      -      -
>   mirror                   301M   222G      0    236      0  12.5M
>     c4t5002538870006917d0     -      -      0    236      5  12.5M
>     c4t500253887000690Dd0     -      -      0    236      5  12.5M
> cache                         -      -      -      -      -      -
>   c4t50026B723A07AC78d0   62.3G  11.4G     19    113  83.0K  1.07M
> ------------------------  -----  -----  -----  -----  -----  -----
>
> root at sanbox:/root# zfs get all ppool
> NAME   PROPERTY              VALUE                  SOURCE
> ppool  type                  filesystem             -
> ppool  creation              Sat Jan 24 18:37 2015  -
> ppool  used                  5.16T                  -
> ppool  available             2.74T                  -
> ppool  referenced            96K                    -
> ppool  compressratio         1.51x                  -
> ppool  mounted               yes                    -
> ppool  quota                 none                   default
> ppool  reservation           none                   default
> ppool  recordsize            128K                   default
> ppool  mountpoint            /ppool                 default
> ppool  sharenfs              off                    default
> ppool  checksum              on                     default
> ppool  compression           lz4                    local
> ppool  atime                 on                     default
> ppool  devices               on                     default
> ppool  exec                  on                     default
> ppool  setuid                on                     default
> ppool  readonly              off                    default
> ppool  zoned                 off                    default
> ppool  snapdir               hidden                 default
> ppool  aclmode               discard                default
> ppool  aclinherit            restricted             default
> ppool  canmount              on                     default
> ppool  xattr                 on                     default
> ppool  copies                1                      default
> ppool  version               5                      -
> ppool  utf8only              off                    -
> ppool  normalization         none                   -
> ppool  casesensitivity       sensitive              -
> ppool  vscan                 off                    default
> ppool  nbmand                off                    default
> ppool  sharesmb              off                    default
> ppool  refquota              none                   default
> ppool  refreservation        none                   default
> ppool  primarycache          all                    default
> ppool  secondarycache        all                    default
> ppool  usedbysnapshots       0                      -
> ppool  usedbydataset         96K                    -
> ppool  usedbychildren        5.16T                  -
> ppool  usedbyrefreservation  0                      -
> ppool  logbias               latency                default
> ppool  dedup                 off                    default
> ppool  mlslabel              none                   default
> ppool  sync                  standard               local
> ppool  refcompressratio      1.00x                  -
> ppool  written               96K                    -
> ppool  logicalused           445G                   -
> ppool  logicalreferenced     9.50K                  -
> ppool  filesystem_limit      none                   default
> ppool  snapshot_limit        none                   default
> ppool  filesystem_count      none                   default
> ppool  snapshot_count        none                   default
> ppool  redundant_metadata    all                    default
>
> Currently, ppool contains a single 5TB zvol that I am hosting as an iSCSI
> block device. At the vdev level, I have ensured that the ashift is 12 for
> all devices, all physical devices are 4k-native SATA, and the cache/log
> SSDs are also set for 4k. The block sizes are manually set in sd.conf and
> confirmed with "echo ::sd_state | mdb -k | egrep '(^un|_blocksize)'". The
> zvol blocksize is 4k, and the iSCSI block transfer size is 512B (not that
> it matters).
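>
> For reference, the sd.conf entries take roughly this form (the vendor and
> product strings must be padded to match the drive's SCSI inquiry data, so
> treat this as illustrative rather than copy-paste):
>
>     sd-config-list = "ATA     ST3000DM001-1CH1", "physical-block-size:4096";
>
> followed by "update_drv -f sd" or a reboot to make it take effect.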
>
> All drives contain a single Solaris2 partition with an EFI label, and are
> properly aligned:
> format> verify
>
> Volume name = <        >
> ascii name  = <ATA-ST3000DM001-1CH1-CC27-2.73TB>
> bytes/sector       = 512
> sectors            = 5860533167
> accessible sectors = 5860533134
> Part      Tag    Flag     First Sector        Size        Last Sector
>   0        usr    wm               256       2.73TB        5860516750
>   1 unassigned    wm                 0          0                   0
>   2 unassigned    wm                 0          0                   0
>   3 unassigned    wm                 0          0                   0
>   4 unassigned    wm                 0          0                   0
>   5 unassigned    wm                 0          0                   0
>   6 unassigned    wm                 0          0                   0
>   8   reserved    wm        5860516751       8.00MB        5860533134
>
> I scrubbed the pool last night, which completed without error. From "zdb
> ppool", I have extracted (with minor formatting):
>
>                                       capacity   operations   bandwidth  ---- errors ----
> description                          used avail  read write  read write  read write cksum
> ppool                                339G 7.82T 26.6K     0  175M     0     0     0     5
>   mirror                             113G 2.61T 8.87K     0 58.5M     0     0     0     2
>     /dev/dsk/c4t5000C5006A597B93d0s0            3.15K     0 48.8M     0     0     0     2
>     /dev/dsk/c4t5000C500653DEE58d0s0            3.10K     0 49.0M     0     0     0     2
>   mirror                             113G 2.61T 8.86K     0 58.5M     0     0     0     8
>     /dev/dsk/c4t5000C5006A5903A2d0s0            3.12K     0 48.7M     0     0     0     8
>     /dev/dsk/c4t5000C500653DE049d0s0            3.08K     0 48.9M     0     0     0     8
>   mirror                             113G 2.61T 8.86K     0 58.5M     0     0     0    10
>     /dev/dsk/c4t5000C5003607D87Bd0s0            2.48K     0 48.8M     0     0     0    10
>     /dev/dsk/c4t5000C5003604ED37d0s0            2.47K     0 48.9M     0     0     0    10
>   log mirror                        44.0K  222G     0     0    37     0     0     0     0
>     /dev/dsk/c4t5002538870006917d0s0                0     0   290     0     0     0     0
>     /dev/dsk/c4t500253887000690Dd0s0                0     0   290     0     0     0     0
>   Cache
>     /dev/dsk/c4t50026B723A07AC78d0s0     0 73.8G     0     0    35     0     0     0     0
>   Spare
>     /dev/dsk/c4t5000C500653E447Ad0s0                4     0  136K     0     0     0     0
>
> This shows a few checksum errors, which is not consistent with the output
> of "zpool status -v", and "iostat -eE" shows no physical error count. I
> again see the discrepancy between the "ppool" value and what I would
> expect, which would be the sum of the cksum errors for each vdev.
>
> I also observed a ton of leaked space, which I expect from a live pool, as
> well as a single:
> db_blkptr_cb: Got error 50 reading <96, 1, 2, 3fc8>
> DVA[0]=<1:1dc4962000:1000> DVA[1]=<2:1dc4654000:1000> [L2 zvol object]
> fletcher4 lz4 LE contiguous unique double size=4000L/a00P
> birth=52386L/52386P fill=4825
> cksum=c70e8a7765:f2adce34f59c:c8a289b51fe11d:7e0af40fe154aab4 -- skipping
>
>
> By the way, I also found:
>
> Uberblock:
> magic = 000000000*0bab10c*
>
> Wow. Just wow.
>
>
> -Warren V
>