[OmniOS-discuss] zfs pool 100% busy, disks less than 10%

Richard Elling richard.elling at richardelling.com
Mon Nov 3 02:07:00 UTC 2014


On Oct 31, 2014, at 6:07 PM, Rune Tipsmark <rt at steait.net> wrote:

> So I actually started storage vMotions on 3 hosts, 6 concurrent, and am getting about 1GB/sec.
> I guess I need more hosts to really push this; the disks are no more than 20-25% busy, so in theory I could push a bit more.
> 
> I think this is resolved for now. CPU is sitting at 30-40% usage while moving 1GB/sec.

Yes, that seems about right.
 -- richard

> 
> Iostat -xn 1
> pool04       396G  39.5T      9  15.9K   325K  1.01G
> pool04       396G  39.5T      7  17.0K   270K  1.03G
> pool04       396G  39.5T     12  17.4K   558K  1.10G
> pool04       396G  39.5T     10  16.9K   442K  1.03G
> pool04       397G  39.5T      6  16.9K   332K  1021M
> pool04       397G  39.5T      1  16.3K  74.9K  1.01G
> pool04       397G  39.5T      8  17.0K   433K  1.05G
> pool04       397G  39.5T     20  17.1K   716K  1023M
> pool04       397G  39.5T     11  18.3K   425K  1.14G
> pool04       398G  39.5T      0  18.3K  65.9K  1.11G
> pool04       398G  39.5T     16  17.9K   551K  1.06G
> pool04       398G  39.5T      0  16.8K   105K  1.03G
> pool04       398G  39.5T      1  18.2K   124K  1.11G
> pool04       398G  39.5T      0  17.1K  45.9K  1.05G
> pool04       399G  39.5T      6  17.3K   454K  1.08G
> pool04       399G  39.5T      0  17.9K      0  1.06G
> pool04       399G  39.5T      2  16.9K   116K  1.04G
> pool04       399G  39.5T      2  18.8K   130K  1.09G
> pool04       399G  39.5T      0  17.6K      0  1.03G
> pool04       400G  39.5T      3  17.5K   155K  1.04G
> pool04       400G  39.5T      0  17.6K  31.5K  1.03G
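As a rough cross-check of the "about 1GB/sec" figure, the last column of the samples above can be averaged. This is a minimal sketch, not a tool from the thread. It assumes the columns follow the `zpool iostat` layout (pool, alloc, free, read ops, write ops, read bandwidth, write bandwidth) and that the K/M/G/T suffixes are binary multiples. `parse_size` and `mean_write_bw` are hypothetical helper names, and only four of the sample lines are pasted in.

```python
# Assumption: suffixes are powers of 1024.
UNITS = {"K": 2**10, "M": 2**20, "G": 2**30, "T": 2**40}

def parse_size(text):
    """Convert a value such as '1.01G', '1021M' or '0' to a float in bytes."""
    suffix = text[-1].upper()
    if suffix in UNITS:
        return float(text[:-1]) * UNITS[suffix]
    return float(text)

def mean_write_bw(lines):
    """Average the last column (assumed write bandwidth) over all sample lines."""
    values = [parse_size(line.split()[-1]) for line in lines if line.strip()]
    return sum(values) / len(values)

# Four of the samples quoted above, copied verbatim.
samples = """\
pool04       396G  39.5T      9  15.9K   325K  1.01G
pool04       396G  39.5T      7  17.0K   270K  1.03G
pool04       397G  39.5T      6  16.9K   332K  1021M
pool04       397G  39.5T     20  17.1K   716K  1023M
""".splitlines()

avg = mean_write_bw(samples)
print(f"mean write bandwidth: {avg / 2**30:.2f} GiB/s")
```

Feeding in all of the sample lines instead of four gives the average over the whole capture.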
> 
> -----Original Message-----
> From: OmniOS-discuss [mailto:omnios-discuss-bounces at lists.omniti.com] On Behalf Of Rune Tipsmark
> Sent: Friday, October 31, 2014 12:38 PM
> To: Richard Elling; Eric Sproul
> Cc: omnios-discuss at lists.omniti.com
> Subject: Re: [OmniOS-discuss] zfs pool 100% busy, disks less than 10%
> 
> Ok, makes sense.
> What other kinds of indicators can I look at?
> 
> I get decent results from dd, but it still feels a bit slow...
> 
> Compression lz4 should not slow it down, right? The CPU is not doing much when copying data over, maybe 15% busy or so...
> 
> Sync=always, block size 1M
> 204800000000 bytes (205 GB) copied, 296.379 s, 691 MB/s
> real    4m56.382s
> user    0m0.461s
> sys     3m12.662s
> 
> Sync=disabled, block size 1M
> 204800000000 bytes (205 GB) copied, 117.774 s, 1.7 GB/s
> real    1m57.777s
> user    0m0.237s
> sys     1m57.466s
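The two dd results above can be compared directly from the byte counts and elapsed times. This is a small sketch of the arithmetic only. It assumes dd reports decimal megabytes (10^6 bytes), which matches the "205 GB" it prints for 204800000000 bytes. `throughput_mb_s` is a hypothetical helper name.

```python
def throughput_mb_s(total_bytes, seconds):
    """Throughput in decimal MB/s (10**6 bytes per second)."""
    return total_bytes / seconds / 1e6

TOTAL_BYTES = 204_800_000_000  # bytes copied in both runs

sync_always = throughput_mb_s(TOTAL_BYTES, 296.379)    # sync=always run
sync_disabled = throughput_mb_s(TOTAL_BYTES, 117.774)  # sync=disabled run
ratio = sync_always / sync_disabled

print(f"sync=always:   {sync_always:.0f} MB/s")
print(f"sync=disabled: {sync_disabled:.0f} MB/s")
print(f"sync=always reaches {ratio:.0%} of the sync=disabled rate")
```

By this arithmetic the sync=always run reaches roughly 40% of the sync=disabled rate, which would fit a bottleneck on the synchronous write path.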
> 
> ... while doing this I was looking at my FIO cards. I think the reason is that the SLCs need more power to deliver higher performance; they are supposed to deliver 1.5GB/sec but only deliver around 350MB/sec each...
> 
> Now looking for aux power cables and will retest...
> 
> Br,
> Rune
> 
> -----Original Message-----
> From: Richard Elling [mailto:richard.elling at richardelling.com] 
> Sent: Friday, October 31, 2014 9:03 AM
> To: Eric Sproul
> Cc: Rune Tipsmark; omnios-discuss at lists.omniti.com
> Subject: Re: [OmniOS-discuss] zfs pool 100% busy, disks less than 10%
> 
> 
> On Oct 31, 2014, at 7:14 AM, Eric Sproul <eric.sproul at circonus.com> wrote:
> 
>> On Fri, Oct 31, 2014 at 2:33 AM, Rune Tipsmark <rt at steait.net> wrote:
>> 
>>> Why is this pool showing near 100% busy when the underlying disks are 
>>> doing nothing at all....
>> 
>> Simply put, it's just how the accounting works in iostat.  It treats 
>> the pool like any other device, so if there is even one outstanding 
>> request to the pool, it counts towards the busy%.  Keith W. from 
>> Joyent explained this recently on the illumos-zfs list:
>> http://www.listbox.com/member/archive/182191/2014/10/sort/time_rev/page/3/entry/18:93/20141017161955:F3E11AB2-563A-11E4-8EDC-D0C677981E2F/
>> 
>> The TL;DR is: if your pool has more than one disk in it, the pool-wide 
>> busy% is useless.
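The accounting described above can be illustrated with a toy model. This is a sketch under stated assumptions, not how iostat or the kernel statistics are actually implemented. It treats a device as busy whenever it has at least one outstanding request, and treats the pool as busy whenever any member disk does. The 12 disks, the 8% slice per disk, and the perfectly staggered schedule are all made-up numbers chosen to make the effect obvious.

```python
def busy_fraction(intervals, window):
    """Fraction of [0, window) covered by the union of (start, end) intervals."""
    covered = 0.0
    current_end = 0.0
    for start, end in sorted(intervals):
        start = max(start, current_end)  # skip time already counted
        if end > start:
            covered += end - start
            current_end = end
    return covered / window

WINDOW = 1.0   # one-second sampling interval (hypothetical)
N_DISKS = 12   # hypothetical pool width
SLICE = 0.08   # each disk has a request outstanding for 8% of the interval

# Staggered schedule: disk d is busy during its own slice only.
per_disk = {d: [(d * SLICE, (d + 1) * SLICE)] for d in range(N_DISKS)}

disk_busy = {d: busy_fraction(iv, WINDOW) for d, iv in per_disk.items()}
pool_busy = busy_fraction(
    [iv for ivs in per_disk.values() for iv in ivs], WINDOW
)

print(f"busiest disk: {max(disk_busy.values()):.0%}")
print(f"pool:         {pool_busy:.0%}")
```

In this model every disk stays under 10% busy while the pool-level figure approaches 100%, because the pool counts the union of all the per-disk busy periods. Real request streams overlap, so the actual pool figure depends on the workload.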
> 
> FWIW, we use %busy as an indicator that we can ignore a device/subsystem when looking for performance problems. We don't use it as an indicator of problems. In other words, if the device isn't > 10% busy, forgetabouddit. If it is more busy, look in more detail at the meaningful performance indicators.
> -- richard