From jarrad at gmail.com Tue Jul 1 07:56:47 2014
From: jarrad at gmail.com (Jarrad Piper)
Date: Tue, 1 Jul 2014 17:26:47 +0930
Subject: [OmniOS-discuss] Experiences with HP Micro Server Gen8
Message-ID: 

Hi All,

I missed out on the discussion re. the HP Gen8 Microserver last month, but thought I'd put in my two cents on where it currently stands in terms of hardware support, as I've been using one for close to a year now as a storage/KVM server.

After initially having no support at all, the Broadcom 5720 onboard NICs have had out-of-the-box support for a while now. However, the bundled driver is very old and has a serious bug: the NIC will lock up entirely if you try to create a VNIC, e.g.

"dladm create-vnic -l bge0 vnic0"

This may not concern some people, but if you want to run a zone or a KVM virtual machine you will need to create a VNIC. The problem only presents when you are connected to a 1Gbit switch. Other people have hit the same problem in other circumstances, but I have only seen it occur when creating VNICs on OmniOS.

One workaround is to boot the machine on a 100Mbit switch, create the VNIC, and then reconnect it to the Gigabit switch. A more practical solution is to turn off auto-negotiation and set the link properties to 100Mbit, e.g.

sudo dladm show-linkprop bge1
sudo dladm set-linkprop -t -p adv_autoneg_cap=0 bge1
sudo dladm set-linkprop -t -p en_1000fdx_cap=0 bge1
sudo dladm set-linkprop -t -p en_1000hdx_cap=0 bge1

The best solution, however, is to build the latest driver (16.2.2) from source, which is available from Broadcom's website. On OmniOS you will need to install onbld and the Solaris Studio 12.1 compiler before building. The zip file contains a readme.txt on how to compile and update the driver for OpenSolaris/Solaris 11. You will end up with a BRCMbge-S11-i386-16.2.2.pkg file, which you install using "pkgadd -d BRCMbge-S11-i386-16.2.2.pkg". Note: this is from memory; there may have also been a few tweaks to the Makefile to get it to compile.

From what I can gather, illumos cannot include newer versions of the bge driver because Broadcom's license is not compatible with theirs.

The other major problem with the Microserver is the unbearable fan noise when the drives are configured in AHCI mode (which is what anyone using ZFS will want). I won't go into it, but more information is available here:

http://h30499.www3.hp.com/t5/ProLiant-Servers-Netservers/MicroServer-Gen8-is-noisy/td-p/6171563
http://homeservershow.com/forums/index.php?/topic/6032-g8-microserver-be-aware-of-fan-issue-add-in-cards/

Basically, to get it to a bearable level you will need the latest BIOS and a hacked iLO firmware (see the second link) which lowers the heat tolerance.

Oh, and I can also confirm that booting from an SD card is fine; it has been happily working for almost a year now.

Hope this helps anyone having issues or thinking of purchasing one. Any other questions, let me know.

Jarrad.

From nicolas.digregorio at gmail.com Tue Jul 1 08:56:00 2014
From: nicolas.digregorio at gmail.com (Nicolas Di Gregorio)
Date: Tue, 1 Jul 2014 10:56:00 +0200
Subject: [OmniOS-discuss] best practice for permissions on a NAS
Message-ID: 

Hello,

I'm building a NAS with OmniOS that should serve CIFS and NFS clients with at least basic restrictions regarding users and groups. What would be the best practice to configure the permissions? How to implement it?
Kind Regards -------------- next part -------------- An HTML attachment was scrubbed... URL: From nicolas.digregorio at gmail.com Tue Jul 1 09:42:25 2014 From: nicolas.digregorio at gmail.com (Nicolas Di Gregorio) Date: Tue, 1 Jul 2014 11:42:25 +0200 Subject: [OmniOS-discuss] best practice for permissions on a NAS In-Reply-To: <1283584D-CE77-4BBB-9191-FBA2D1F5BAA3@marzocchi.net> References: <1283584D-CE77-4BBB-9191-FBA2D1F5BAA3@marzocchi.net> Message-ID: Thanks for this. what does mean aclinherit=passthrough? 2014-07-01 11:20 GMT+02:00 Olaf Marzocchi : > You may want to check my short articles on Marzocchi.net about my NAS > based on OmniOS. > > Regards > Olaf Marzocchi > > > Inviato da iPhone > > > Il giorno 01/lug/2014, alle ore 10:56, Nicolas Di Gregorio < > nicolas.digregorio at gmail.com> ha scritto: > > > > > > Hello, > > > > I'm building a NAS with OmniOS that should server CIFS and NFS client > with at least basic restriction regarding users and groups. What would be > the best practive to configure the permissions? How to implement it? > > > > Kind Regards > > _______________________________________________ > > OmniOS-discuss mailing list > > OmniOS-discuss at lists.omniti.com > > http://lists.omniti.com/mailman/listinfo/omnios-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From cal-s at blue-bolt.com Fri Jul 4 08:35:37 2014 From: cal-s at blue-bolt.com (Cal Sawyer) Date: Fri, 04 Jul 2014 09:35:37 +0100 Subject: [OmniOS-discuss] ncurses lib in r151010j In-Reply-To: References: Message-ID: <53B66759.1080306@blue-bolt.com> Hi I'm trying to build iftop on r151010j because the available packages are rather stale. basename file opt/omni/sbin/iftop pkg:/network/iftop at 1.0.2-0.151006 PUBLISHER TYPE STATUS URI omnios origin online http://pkg.omniti.com/omnios/r151010/ ms.omniti.com origin online http://pkg.omniti.com/omniti-ms/ Running into an issue with ncurses during the configure stage checking for a curses library containing mvchgat... none found configure: error: Curses! Foiled again! (Can't find a curses library supporting mvchgat.) Consider installing ncurses. however ... > pkg list ncurses NAME (PUBLISHER) VERSION IFO library/ncurses 5.9-0.151010 i-- > grep mvchgat /usr/include/ncurses/ncurses.h extern NCURSES_EXPORT(int) mvchgat (int, int, int, attr_t, short, const void *); /* generated */ #define mvchgat(y,x,n,a,c,o) mvwchgat(stdscr,y,x,n,a,c,o) Looks like it's supported. Any ideas? thanks - cal -------------- next part -------------- An HTML attachment was scrubbed... URL: From daleg at omniti.com Fri Jul 4 09:05:09 2014 From: daleg at omniti.com (Dale Ghent) Date: Fri, 4 Jul 2014 09:05:09 +0000 Subject: [OmniOS-discuss] ncurses lib in r151010j In-Reply-To: <53B66759.1080306@blue-bolt.com> References: <53B66759.1080306@blue-bolt.com> Message-ID: So what does your config.log say regarding the check for mvchgat() ? How is it failing the test? When running into issues such as this, config.log is the go-to place to start figuring out the why. autoconf itself isn?t infallible in how it checks for things, after all. /dale On Jul 4, 2014, at 8:35 AM, Cal Sawyer wrote: > Hi > > I'm trying to build iftop on r151010j because the available packages are rather stale. 
> > basename file opt/omni/sbin/iftop pkg:/network/iftop at 1.0.2-0.151006 > > > PUBLISHER TYPE STATUS URI > omnios origin online http://pkg.omniti.com/omnios/r151010/ > ms.omniti.com origin online http://pkg.omniti.com/omniti-ms/ > > > Running into an issue with ncurses during the configure stage > > checking for a curses library containing mvchgat... none found > configure: error: Curses! Foiled again! > (Can't find a curses library supporting mvchgat.) > Consider installing ncurses. > > however ... > > > pkg list ncurses > NAME (PUBLISHER) VERSION IFO > library/ncurses 5.9-0.151010 i-- > > grep mvchgat /usr/include/ncurses/ncurses.h > > extern NCURSES_EXPORT(int) mvchgat (int, int, int, attr_t, short, const void *); /* generated */ > #define mvchgat(y,x,n,a,c,o) mvwchgat(stdscr,y,x,n,a,c,o) > > Looks like it's supported. Any ideas? > > thanks > > - cal > _______________________________________________ > OmniOS-discuss mailing list > OmniOS-discuss at lists.omniti.com > http://lists.omniti.com/mailman/listinfo/omnios-discuss From cal-s at blue-bolt.com Fri Jul 4 09:57:33 2014 From: cal-s at blue-bolt.com (Cal Sawyer) Date: Fri, 04 Jul 2014 10:57:33 +0100 Subject: [OmniOS-discuss] ncurses lib in r151010j In-Reply-To: References: <53B66759.1080306@blue-bolt.com> Message-ID: <53B67A8D.2040803@blue-bolt.com> This, i imagine, is it. From config.log configure:6233: checking for a curses library containing mvchgat configure:6255: gcc -o conftest -g -O2 conftest.c -lpcap -lnsl -lm -lsocket -lcurses >&5 Undefined first referenced symbol in file mvchgat /var/tmp//cckgaO9B.o ld: fatal: symbol referencing errors. No output written to conftest collect2: error: ld returned 1 exit status configure:6258: $? = 1 configure: failed program was: #line 6238 "configure" #include "confdefs.h" #include int main () { mvchgat(0, 0, 1, A_REVERSE, 0, NULL) ; return 0; } configure:6255: gcc -o conftest -g -O2 conftest.c -lpcap -lnsl -lm -lsocket -lncurses >&5 ld: fatal: library -lncurses: not found ld: fatal: file processing errors. No output written to conftest collect2: error: ld returned 1 exit status configure:6258: $? = 1 configure: failed program was: #line 6238 "configure" #include "confdefs.h" #include int main () { mvchgat(0, 0, 1, A_REVERSE, 0, NULL) ; return 0; } configure:6278: result: none found configure:6280: error: Curses! Foiled again! (Can't find a curses library supporting mvchgat.) Consider installing ncurses. No idea what to do about it, though. regards, - cal On 04/07/14 10:05, Dale Ghent wrote: > So what does your config.log say regarding the check for mvchgat() ? How is it failing the test? When running into issues such as this, config.log is the go-to place to start figuring out the why. > > autoconf itself isn?t infallible in how it checks for things, after all. > > /dale > > On Jul 4, 2014, at 8:35 AM, Cal Sawyer wrote: > >> Hi >> >> I'm trying to build iftop on r151010j because the available packages are rather stale. >> >> basename file opt/omni/sbin/iftop pkg:/network/iftop at 1.0.2-0.151006 >> >> >> PUBLISHER TYPE STATUS URI >> omnios origin online http://pkg.omniti.com/omnios/r151010/ >> ms.omniti.com origin online http://pkg.omniti.com/omniti-ms/ >> >> >> Running into an issue with ncurses during the configure stage >> >> checking for a curses library containing mvchgat... none found >> configure: error: Curses! Foiled again! >> (Can't find a curses library supporting mvchgat.) >> Consider installing ncurses. >> >> however ... 
>> >>> pkg list ncurses >> NAME (PUBLISHER) VERSION IFO >> library/ncurses 5.9-0.151010 i-- >> > grep mvchgat /usr/include/ncurses/ncurses.h >> >> extern NCURSES_EXPORT(int) mvchgat (int, int, int, attr_t, short, const void *); /* generated */ >> #define mvchgat(y,x,n,a,c,o) mvwchgat(stdscr,y,x,n,a,c,o) >> >> Looks like it's supported. Any ideas? >> >> thanks >> >> - cal >> _______________________________________________ >> OmniOS-discuss mailing list >> OmniOS-discuss at lists.omniti.com >> http://lists.omniti.com/mailman/listinfo/omnios-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From daleg at omniti.com Fri Jul 4 10:39:42 2014 From: daleg at omniti.com (Dale Ghent) Date: Fri, 4 Jul 2014 10:39:42 +0000 Subject: [OmniOS-discuss] ncurses lib in r151010j In-Reply-To: <53B67A8D.2040803@blue-bolt.com> References: <53B66759.1080306@blue-bolt.com> <53B67A8D.2040803@blue-bolt.com> Message-ID: <31BB1921-0954-4525-90EF-77F829AB1732@omniti.com> On Jul 4, 2014, at 9:57 AM, Cal Sawyer wrote: > This, i imagine, is it. From config.log ... > configure:6255: gcc -o conftest -g -O2 conftest.c -lpcap -lnsl -lm -lsocket -lncurses >&5 > ld: fatal: library -lncurses: not found > ld: fatal: file processing errors. No output written to conftest Try: LDFLAGS=?-L/usr/gnu/lib -R/usr/gnu/lib? ./configure ? /dale From alex at cooperi.net Sun Jul 6 10:39:32 2014 From: alex at cooperi.net (Alex Wilson) Date: Sun, 6 Jul 2014 20:39:32 +1000 Subject: [OmniOS-discuss] AS Media 1061 In-Reply-To: References: Message-ID: On 27 Jun 2014, at 4:03 am, F?bio Rabelo wrote: > Someone has give a try recently any PCIe card based on the ASm Media > 1061 SATA 3 chipset ? It's not the ASM1061, but I've been using its close relative, the ASM1062 since the original commit by Marcel and have had no real issues. Performance is acceptable. I have a single Intel mSATA SSD attached to it. I did update the BIOS and firmware on it as soon as I got it (the images to do this have since disappeared from the ASMedia website though I think). The card I have is one of these: http://eshop.macsales.com/item/Other+World+Computing/PCIEACCELM/ -- there are a bunch of other brands of it too which seem to be the same PCB layout. From lotheac at iki.fi Mon Jul 7 15:38:26 2014 From: lotheac at iki.fi (Lauri Tirkkonen) Date: Mon, 7 Jul 2014 18:38:26 +0300 Subject: [OmniOS-discuss] omnios-build: the build system, scripts and merging between branches Message-ID: <20140707153826.GA15377@gutsman.lotheac.fi> Please read this and comment if you maintain a fork of omnios-build. Thanks. Status quo ---------- The omnios-build repository currently versions more than one thing: - the build scripts which describe how to build certain software (under build/), - the build system (lib/ (mostly), build/buildctl and template/) - and some site-specific data (configuration). The 'template' branch holds nothing site-specific and users are expected to fork that one, adding their own stuff on top and pulling changes from upstream's 'template'. This works in one direction, but in practice the workflow is not "make your changes on top template"; it needs to work in both directions. When the build scripts and configuration are in the same repository, you cannot merge back into template - you must instead be careful with cherry-picking. This isn't very maintainable: recently I made a pull request that just ported build system changes from omniti-labs:master to omniti-labs:template and that was 176 commits in size (ie. 
after pruning all commits which only touched build scripts). I maintain two forks of omnios-build (one for a publicly available repo, and one for a private one), and syncing changes between them and upstream is a royal pain. I'd wager OmniTI isn't particularly fond of porting build system changes between different release version branches and omniti-ms and friends.

I argue that making build system changes flow from one fork/branch to another needs to be made easier. The way to do that is to split the build system from the build scripts into another repository, so that merges can be made freely. This obviously requires some rather large changes, which I will propose below.

Proposal: submodules?
---------------------

At first, I was thinking "git submodules are the solution to this". If you're not familiar, submodules are a way to include a reference to another repository at a certain commit. This is the sort of tree I was thinking about:

omnios-build-my-organization (your site-specific build repo)
|-- lib (submodule: the build system, or: site-independent code)
|-- build
`-- (config.sh, site.sh and other site-specific data)

This would allow you to work on lib/ separately from your build scripts, and make it clear which version of the build system they need to build. However, git submodules are a bit unwieldy and not very intuitive (among other things you need to manually use submodule commands to update the submodule tree instead of git doing it for you; this is certain to get annoying when switching branches).

Better proposal
---------------

So, let's make this simpler by turning it around and ditching the submodule idea. Let's just have omnios-build contain site-independent data (ie. the build system), and make build/ another repository, but let's not make it a submodule. Instead, .gitignore it and make ./new.sh warn and initialize a new git repository into it if it doesn't exist. This doesn't require any submodule trickery, but lets you have your scripts in a separate repo.

Proof of concept is on GitHub, in the niksula/omnios-build split branch and the niksula/omnios-build-scripts repo:

git clone -b split https://github.com/niksula/omnios-build.git
cd omnios-build
git clone https://github.com/niksula/omnios-build-scripts.git build
cd build/ircii
./build.sh

In the PoC I moved buildctl out of build/ and {site,config}.sh inside it (to the omnios-build-scripts repo). It's not a finished product, but should demonstrate what I'm saying.

Thoughts?

-- 
Lauri Tirkkonen | +358 50 5341376 | lotheac @ IRCnet

From danmcd at omniti.com Mon Jul 7 16:02:59 2014
From: danmcd at omniti.com (Dan McDonald)
Date: Mon, 7 Jul 2014 12:02:59 -0400
Subject: [OmniOS-discuss] omnios-build: the build system, scripts and merging between branches
In-Reply-To: <20140707153826.GA15377@gutsman.lotheac.fi>
References: <20140707153826.GA15377@gutsman.lotheac.fi>
Message-ID: <7B288CC0-9AEB-45A3-8167-AF343283ACC0@omniti.com>

On Jul 7, 2014, at 11:38 AM, Lauri Tirkkonen wrote:

> Please read this and comment if you maintain a fork of omnios-build.
> Thanks.

I'll have to read this more deeply, of course, but I had only one knee-jerk reaction:

> Proof of concept is on GitHub, in the niksula/omnios-build split branch and
> the niksula/omnios-build-scripts repo:
>
> git clone -b split https://github.com/niksula/omnios-build.git
> cd omnios-build
> git clone https://github.com/niksula/omnios-build-scripts.git build
> cd build/ircii
> ./build.sh
>
> In the PoC I moved buildctl out of build/ and {site,config}.sh inside it
> (to the omnios-build-scripts repo).
> It's not a finished product, but
> should demonstrate what I'm saying.
>
> Thoughts?

This *seems* sensible, especially as you've put buildctl at the top-level in the omnios-build half of the split.

It seems right now, however, that buildctl still assumes it's in build/ instead of one directory above it. It's also not 100% clear that the functions in lib/ have been altered to assume site.sh and config.sh are in build instead of lib.

I'd like to see what all changed in any scripts that now live in the "build" half of your split vs. their original pre-split incarnations. Not sure if GitHub or a tool like webrev would be able to help here.

Also, I've been documenting buildctl and my wrapper - OmniOS-on-demand - which generates the bloody bits. I will have one push upstream to help here.

Dan

From lotheac at iki.fi Mon Jul 7 16:11:59 2014
From: lotheac at iki.fi (Lauri Tirkkonen)
Date: Mon, 7 Jul 2014 19:11:59 +0300
Subject: [OmniOS-discuss] omnios-build: the build system, scripts and merging between branches
In-Reply-To: <7B288CC0-9AEB-45A3-8167-AF343283ACC0@omniti.com>
References: <20140707153826.GA15377@gutsman.lotheac.fi> <7B288CC0-9AEB-45A3-8167-AF343283ACC0@omniti.com>
Message-ID: <20140707161159.GB15377@gutsman.lotheac.fi>

On Mon, Jul 07 2014 12:02:59 -0400, Dan McDonald wrote:
> This *seems* sensible, especially as you've put buildctl at the top-level in the omnios-build half of the split.
>
> It seems right now, however, that buildctl still assumes it's in
> build/ instead of one directory above it.

Yep, it does. As I said, it's just a PoC at this point (but I think you might be able to do 'cd build; ../buildctl'. I don't use buildctl myself, yet).

> It's also not 100% clear that the functions in lib/ have been altered
> to assume site.sh and config.sh are in build instead of lib.

https://github.com/niksula/omnios-build/compare/niksula:master...split

> I'd like to see what all changed in any scripts that now live in the
> "build" half of your split vs. their original pre-split incarnations.
> Not sure if GitHub or a tool like webrev would be able to help here.

Nothing in the scripts themselves changed. I did a git filter-branch --subdirectory-filter to split the build dir into its own repo, after which I just removed buildctl and added config.sh and site.sh. The latest three commits at https://github.com/niksula/omnios-build-scripts/commits/master, basically.

-- 
Lauri Tirkkonen | +358 50 5341376 | lotheac @ IRCnet

From fabio at fabiorabelo.wiki.br Tue Jul 8 15:56:43 2014
From: fabio at fabiorabelo.wiki.br (=?UTF-8?Q?F=C3=A1bio_Rabelo?=)
Date: Tue, 8 Jul 2014 12:56:43 -0300
Subject: [OmniOS-discuss] ASMedia 1061 WORKING !
Message-ID: 

Hi to all

I just plugged this card in an OmniOS box:

http://sgp.imgmarket.net/sgp/201406/35765_b.jpg

And the system recognises the hard disk connected to it, initialised it, and it is in use!!!

I do not yet have anything to report about performance, but it is working!!

Fábio Rabelo

From brogyi at gmail.com Tue Jul 8 19:05:29 2014
From: brogyi at gmail.com (=?UTF-8?B?QnJvZ3nDoW55aSBKw7N6c2Vm?=)
Date: Tue, 08 Jul 2014 21:05:29 +0200
Subject: [OmniOS-discuss] ASMedia 1061 WORKING !
In-Reply-To: 
References: 
Message-ID: <53BC40F9.7040105@gmail.com>

Very good, Fabio. My card is exactly the same, but the postman hasn't brought it yet. I had a feeling it must work. Thank you for your confirmation.

Brogyi

One important thing: AHCI must be on in the BIOS. Am I right? A stupid question, but it is important. :)

On 2014.07.08.
at 17:56, Fábio Rabelo wrote:
> Hi to all
>
> I just plugged this card in an OmniOS box:
>
> http://sgp.imgmarket.net/sgp/201406/35765_b.jpg
>
> And the system recognises the hard disk connected to it, initialised
> it, and it is in use!!!
>
> I do not yet have anything to report about performance, but it is working!!
>
> Fábio Rabelo
> _______________________________________________
> OmniOS-discuss mailing list
> OmniOS-discuss at lists.omniti.com
> http://lists.omniti.com/mailman/listinfo/omnios-discuss

From danmcd at omniti.com Tue Jul 8 19:18:34 2014
From: danmcd at omniti.com (Dan McDonald)
Date: Tue, 8 Jul 2014 15:18:34 -0400
Subject: [OmniOS-discuss] ASMedia 1061 WORKING !
In-Reply-To: <53BC40F9.7040105@gmail.com>
References: <53BC40F9.7040105@gmail.com>
Message-ID: <4DB1667D-AED1-43B6-9C8C-979E55280C2E@omniti.com>

On Jul 8, 2014, at 3:05 PM, Brogyányi József wrote:

> Very good, Fabio. My card is exactly the same, but the postman hasn't brought it yet.
> I had a feeling it must work. Thank you for your confirmation.
>
> Brogyi
>
> One important thing: AHCI must be on in the BIOS. Am I right? A stupid question, but it is important. :)

AHCI is generally the best choice.

Dan

From danmcd at omniti.com Fri Jul 11 18:02:58 2014
From: danmcd at omniti.com (Dan McDonald)
Date: Fri, 11 Jul 2014 14:02:58 -0400
Subject: [OmniOS-discuss] OmniOS "bloody" repo has been updated
Message-ID: <5E1D27AA-CD90-48CC-9C0A-6EBD64FC4C0B@omniti.com>

AND there's new installation media as well. Highlights include:

- Now built with pkgdepend(1) checking. To that end, I've updated the entire wad of packages. This may increase your download/upgrade time. (Thanks to community member Lauri "lotheac" Tirkkonen.)
- Several ZFS updates from upstream illumos.
  + Better behavior in the face of full or nearly-full pools.
  + Better-behaved "zfs rename" and "zfs create" when not sharing the datasets.
- rpcbind(1M) and mountd(1M) now use libumem, which will help them both when under load.
- Additional device entries for cpqary3(7D) for additional HP HBAs.

Happy updating!

Dan McD.
--
OmniOS Engineering

From fabio at fabiorabelo.wiki.br Sun Jul 13 12:07:26 2014
From: fabio at fabiorabelo.wiki.br (=?UTF-8?Q?F=C3=A1bio_Rabelo?=)
Date: Sun, 13 Jul 2014 09:07:26 -0300
Subject: [OmniOS-discuss] AHCI driver problem with ASM1062
In-Reply-To: <53C24C65.2000905@gmail.com>
References: <53C24C65.2000905@gmail.com>
Message-ID: 

I am not sure if I can help. The chips are not exactly the same (mine is the ASM1061), and I am not experiencing any errors, but the only disk connected to this card is a Samsung 256GB SSD working as a ZIL.

During my research I found warnings in several forums about the BIOS version, but all of those users were from the Windows world. Anyway, my card already came with the up-to-date BIOS.

And I did not see anything about the ASM1062 chip, only about the ASM1061, so I do not know whether the BIOS issue applies to your case.

Sorry if I cannot be more useful ...

Fábio Rabelo

2014-07-13 6:07 GMT-03:00 Brogyányi József :
> Hi
>
> I'm a little bit sad because this card is not working 100%. Some people can use
> it without any errors. My system works fine with small files.
> I think the AHCI driver is the same as OmniTI's.
> When I want to copy a big file (more than 1 GB) the system begins to slow down
> and finally stops or waits for a long time.
> Fabio, could you check this on your system? The dd copy can sometimes
> run with a few error messages and then the speed is fine for any new HDD
> (395MBps).
> Now I don't know the card is bad or the AHCI driver not fit my system. > Thanks any help. I think the message contains lot information about this > error. > Here is my dmesg output: > > Jul 13 10:43:46 hipster Transport state transition error (T) > Jul 13 10:43:46 hipster ahci: [ID 657156 kern.warning] WARNING: ahci1: error > recovery for port 1 succeed > Jul 13 10:43:46 hipster ahci: [ID 777486 kern.warning] WARNING: ahci1: ahci > port 1 has interface fatal error > Jul 13 10:43:46 hipster ahci: [ID 687168 kern.warning] WARNING: ahci1: ahci > port 1 is trying to do error recovery > Jul 13 10:43:46 hipster ahci: [ID 551337 kern.warning] WARNING: ahci1: > Handshake Error (H) > Jul 13 10:43:46 hipster Transport state transition error (T) > Jul 13 10:43:47 hipster ahci: [ID 657156 kern.warning] WARNING: ahci1: error > recovery for port 1 succeed > Jul 13 10:44:44 hipster ahci: [ID 517647 kern.warning] WARNING: ahci1: > watchdog port 1 satapkt 0xffffff02d216e8d8 timed out > Jul 13 10:44:44 hipster ahci: [ID 777486 kern.warning] WARNING: ahci1: ahci > port 1 has interface fatal error > Jul 13 10:44:44 hipster ahci: [ID 687168 kern.warning] WARNING: ahci1: ahci > port 1 is trying to do error recovery > Jul 13 10:44:44 hipster ahci: [ID 551337 kern.warning] WARNING: ahci1: > Handshake Error (H) > Jul 13 10:44:44 hipster Transport state transition error (T) > Jul 13 10:44:45 hipster ahci: [ID 657156 kern.warning] WARNING: ahci1: error > recovery for port 1 succeed > Jul 13 10:44:45 hipster sata: [ID 801845 kern.info] > /pci at 0,0/pci8086,8c14 at 1c,2/pci1b21,1060 at 0: > Jul 13 10:44:45 hipster SATA port 1 error > Jul 13 10:44:45 hipster sata: [ID 801845 kern.info] > /pci at 0,0/pci8086,8c14 at 1c,2/pci1b21,1060 at 0: > Jul 13 10:44:45 hipster SATA port 1 error > Jul 13 10:44:45 hipster sata: [ID 801845 kern.info] > /pci at 0,0/pci8086,8c14 at 1c,2/pci1b21,1060 at 0: > Jul 13 10:44:45 hipster SATA port 1 error > Jul 13 10:44:45 hipster sata: [ID 801845 kern.info] > /pci at 0,0/pci8086,8c14 at 1c,2/pci1b21,1060 at 0: > Jul 13 10:44:45 hipster SATA port 1 error > Jul 13 10:44:45 hipster fmd: [ID 377184 daemon.error] SUNW-MSG-ID: > ZFS-8000-HC, TYPE: Error, VER: 1, SEVERITY: Major > Jul 13 10:44:45 hipster EVENT-TIME: Sun Jul 13 10:44:45 CEST 2014 > Jul 13 10:44:45 hipster PLATFORM: PowerEdge-T20, CSN: 33XJJZ1, HOSTNAME: > hipster > Jul 13 10:44:45 hipster SOURCE: zfs-diagnosis, REV: 1.0 > Jul 13 10:44:45 hipster EVENT-ID: 6fb72a24-271b-4b1b-9287-d5352ace8993 > Jul 13 10:44:45 hipster DESC: The ZFS pool has experienced currently > unrecoverable I/O > Jul 13 10:44:45 hipster failures. Refer to > http://illumos.org/msg/ZFS-8000-HC for more information. > Jul 13 10:44:45 hipster AUTO-RESPONSE: No automated response will be taken. > Jul 13 10:44:45 hipster IMPACT: Read and write I/Os cannot be serviced. > Jul 13 10:44:45 hipster REC-ACTION: Make sure the affected devices are > connected, then run > Jul 13 10:44:45 hipster 'zpool clear'. > Jul 13 10:45:00 hipster fmd: [ID 377184 daemon.error] SUNW-MSG-ID: > ZFS-8000-FD, TYPE: Fault, VER: 1, SEVERITY: Major > Jul 13 10:45:00 hipster EVENT-TIME: Sun Jul 13 10:45:00 CEST 2014 > Jul 13 10:45:00 hipster PLATFORM: PowerEdge-T20, CSN: 33XJJZ1, HOSTNAME: > hipster > Jul 13 10:45:00 hipster SOURCE: zfs-diagnosis, REV: 1.0 > Jul 13 10:45:00 hipster EVENT-ID: 52d16a83-6041-e362-b66f-ff4d1fb95a61 > Jul 13 10:45:00 hipster DESC: The number of I/O errors associated with a ZFS > device exceeded > Jul 13 10:45:00 hipster acceptable levels. 
Refer to > http://illumos.org/msg/ZFS-8000-FD for more information. > Jul 13 10:45:00 hipster AUTO-RESPONSE: The device has been offlined and > marked as faulted. An attempt > Jul 13 10:45:00 hipster will be made to activate a hot spare if > available. > Jul 13 10:45:00 hipster IMPACT: Fault tolerance of the pool may be > compromised. > Jul 13 10:45:00 hipster REC-ACTION: Run 'zpool status -x' and replace the > bad device. > Jul 13 10:45:44 hipster ahci: [ID 517647 kern.warning] WARNING: ahci1: > watchdog port 1 satapkt 0xffffff02deb2f180 timed out > From youzhong at gmail.com Thu Jul 17 15:07:59 2014 From: youzhong at gmail.com (Youzhong Yang) Date: Thu, 17 Jul 2014 11:07:59 -0400 Subject: [OmniOS-discuss] Issue with LSI 3108 MegaRAID ROMB card Message-ID: Hi All, We have problem using the LSI 3108 card, just wondering if anyone here has any success story using this card in production. Here is the FM version info and error we got in /var/adm/messages: BIOS Version : 6.13.00_4.14.05.00_0x06010600 Ctrl-R Version : 5.01-0004 FW Version : 4.210.10-2910 NVDATA Version : 3.1310.00-0054 Boot Block Version : 3.00.00.00-0009 Jul 15 21:01:56 xxxx mr_sas: [ID 270009 kern.warning] WARNING: io_timeout_checker: FW Fault, calling reset adapter Jul 15 21:01:56 xxxx mr_sas: [ID 643100 kern.notice] io_timeout_checker: fw_outstanding 0x17 max_fw_cmds 0x39F Jul 15 21:01:59 xxxx mr_sas: [ID 347913 kern.warning] WARNING: mrsas_tbolt_reset_ppc: FW is in fault after OCR count 1 Retry Reset Jul 15 21:02:09 xxxx mr_sas: [ID 887724 kern.warning] WARNING: mrsas_tbolt_reset_ppc:resetadapter bit is set already check retry count 101 Thanks, -Youzhong -------------- next part -------------- An HTML attachment was scrubbed... URL: From danmcd at omniti.com Thu Jul 17 15:15:54 2014 From: danmcd at omniti.com (Dan McDonald) Date: Thu, 17 Jul 2014 11:15:54 -0400 Subject: [OmniOS-discuss] Issue with LSI 3108 MegaRAID ROMB card In-Reply-To: References: Message-ID: <59D196E8-9E21-4205-BAD5-3DFE22873AA1@omniti.com> On Jul 17, 2014, at 11:07 AM, Youzhong Yang wrote: > Hi All, > > We have problem using the LSI 3108 card, just wondering if anyone here has any success story using this card in production. When mr_sas(7d) was updated for 2208, it included untested 3108 support. 3108 was untested because people didn't have 3108 cards at the time it went back. The messages you're seeing indicate the card's timing-out an IO operation, followed by a reset-the-card failure. Beyond that, I can't help you much right now. I have no such card available to me. For the record, are you stuck using this, or did you choose a 3108? I'd recommend choosing something else if it was your choice. If it's not, please tell me what platform stuck you with a 3108, as it may be a harbinger of future complaints. Thanks, Dan From youzhong at gmail.com Thu Jul 17 15:59:16 2014 From: youzhong at gmail.com (Youzhong Yang) Date: Thu, 17 Jul 2014 11:59:16 -0400 Subject: [OmniOS-discuss] Issue with LSI 3108 MegaRAID ROMB card In-Reply-To: <59D196E8-9E21-4205-BAD5-3DFE22873AA1@omniti.com> References: <59D196E8-9E21-4205-BAD5-3DFE22873AA1@omniti.com> Message-ID: Thanks Dan. We ordered these Supermicro X9DRW-CF/CTF boxes which have ROMB LSI 3108 on the motherboard and got stuck. We will probably add 9211-8i HBA cards to the machines and get them move forward. 
Thanks, Youzhong On Thu, Jul 17, 2014 at 11:15 AM, Dan McDonald wrote: > > On Jul 17, 2014, at 11:07 AM, Youzhong Yang wrote: > > > Hi All, > > > > We have problem using the LSI 3108 card, just wondering if anyone here > has any success story using this card in production. > > > > When mr_sas(7d) was updated for 2208, it included untested 3108 support. > 3108 was untested because people didn't have 3108 cards at the time it > went back. > > The messages you're seeing indicate the card's timing-out an IO operation, > followed by a reset-the-card failure. > > Beyond that, I can't help you much right now. I have no such card > available to me. For the record, are you stuck using this, or did you > choose a 3108? I'd recommend choosing something else if it was your > choice. If it's not, please tell me what platform stuck you with a 3108, > as it may be a harbinger of future complaints. > > Thanks, > Dan > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From danmcd at omniti.com Thu Jul 17 15:59:28 2014 From: danmcd at omniti.com (Dan McDonald) Date: Thu, 17 Jul 2014 11:59:28 -0400 Subject: [OmniOS-discuss] Issue with LSI 3108 MegaRAID ROMB card In-Reply-To: References: <59D196E8-9E21-4205-BAD5-3DFE22873AA1@omniti.com> Message-ID: On Jul 17, 2014, at 11:59 AM, Youzhong Yang wrote: > Thanks Dan. > > We ordered these Supermicro X9DRW-CF/CTF boxes which have ROMB LSI 3108 on the motherboard and got stuck. We will probably add 9211-8i HBA cards to the machines and get them move forward. Before I saw this, one of my OmniTI co-workers mentioned that model of mobo as one possibility. Thanks, Dan From youzhong at gmail.com Thu Jul 17 19:30:25 2014 From: youzhong at gmail.com (Youzhong Yang) Date: Thu, 17 Jul 2014 15:30:25 -0400 Subject: [OmniOS-discuss] [smartos-discuss] Re: Issue with LSI 3108 MegaRAID ROMB card In-Reply-To: <20140717170103.GA425@joyent.com> References: <59D196E8-9E21-4205-BAD5-3DFE22873AA1@omniti.com> <20140717170103.GA425@joyent.com> Message-ID: Thanks Keith. We upgraded the firmware to its latest but still no luck, so likely we'll give up. BIOS Version : 6.17.04.0_4.16.08.00_0x06060A01 Ctrl-R Version : 5.04-0002 Preboot CLI Version: 01.07-05:#%0000 FW Version : 4.230.20-3532 NVDATA Version : 3.1403.00-0079 Boot Block Version : 3.02.00.00-0001 Jul 17 13:08:35 batfs0388 mr_sas: [ID 643100 kern.notice] io_timeout_checker: fw_outstanding 0x17 max_fw_cmds 0x39F Jul 17 13:08:38 batfs0388 mr_sas: [ID 347913 kern.warning] WARNING: mrsas_tbolt_reset_ppc: FW is in fault after OCR count 1 Retry Reset Jul 17 13:08:48 batfs0388 mr_sas: [ID 887724 kern.warning] WARNING: mrsas_tbolt_reset_ppc:resetadapter bit is set already check retry count 101 Jul 17 13:08:49 batfs0388 mr_sas: [ID 270009 kern.warning] WARNING: io_timeout_checker: FW Fault, calling reset adapter On Thu, Jul 17, 2014 at 1:01 PM, Keith Wesolowski < keith.wesolowski at joyent.com> wrote: > On Thu, Jul 17, 2014 at 11:59:16AM -0400, Youzhong Yang via > smartos-discuss wrote: > > > We ordered these Supermicro X9DRW-CF/CTF boxes which have ROMB LSI 3108 > on > > the motherboard and got stuck. We will probably add 9211-8i HBA cards to > > the machines and get them move forward. > > The 9207-8i is likely a better fit; it comes standard with IT firmware > vs IR in the 9211-8i. Both do work, however. SMCI makes a similar > board, the X9DRD-7LN4F-JBOD, which has the same 2308-IT on board as the > 9207-8i. I recommend using that instead of the X9DRW unless you're > wedded to the WIO form factor. 
> -------------- next part -------------- An HTML attachment was scrubbed... URL: From henson at acm.org Mon Jul 21 02:59:02 2014 From: henson at acm.org (Paul B. Henson) Date: Sun, 20 Jul 2014 19:59:02 -0700 Subject: [OmniOS-discuss] unexpected BE snapshots Message-ID: <20140721025902.GL31192@bender.unx.csupomona.edu> I decided to go clean up some old boot environments, and noticed some unexpected snapshots of my new 151010 BE. Before I cleaned up, it looked like this: rpool 2.56G 36.6G 37K /rpool rpool/ROOT 2.55G 36.6G 31K legacy rpool/ROOT/omnios-r151008f 590K 36.6G 1.05G / rpool/ROOT/omnios-r151008f-tty-irq 494K 36.6G 814M / rpool/ROOT/omnios-r151008f-ttyc-1 592K 36.6G 813M / rpool/ROOT/omnios-r151008j 652K 36.6G 813M / rpool/ROOT/omnios-r151008t 53.5M 36.6G 823M / rpool/ROOT/omnios-r151008t-backup-1 58K 36.6G 828M / rpool/ROOT/omnios-r151010 2.50G 36.6G 1.08G / rpool/ROOT/omnios-r151010 at install 490M - 672M - rpool/ROOT/omnios-r151010 at 2014-02-22-03:00:02 340M - 1.05G - rpool/ROOT/omnios-r151010 at 2014-03-13-22:38:49 64.4M - 813M - rpool/ROOT/omnios-r151010 at 2014-05-30-21:31:31 3.54M - 822M - rpool/ROOT/omnios-r151010 at 2014-05-30-21:36:13 6.19M - 822M - rpool/ROOT/omnios-r151010 at 2014-06-05-22:00:07 18.0M - 828M - rpool/ROOT/omnios-r151010 at 2014-06-16-18:55:29 28.8M - 1.01G - I noticed the new 151010 BE had a number of snapshots I didn't make; the dates are particularly odd given 151010 wasn't released until May something and I didn't install it until Junish. After running beadm to delete my old BE's, it then looked like this: rpool 2.02G 37.1G 36.5K /rpool rpool/ROOT 2.02G 37.1G 31K legacy rpool/ROOT/omnios-r151008t 53.5M 37.1G 823M / rpool/ROOT/omnios-r151010 1.97G 37.1G 1.08G / rpool/ROOT/omnios-r151010 at install 492M - 672M - rpool/ROOT/omnios-r151010 at 2014-06-16-18:55:29 412M - 1.01G - Five of the 151010 snapshots had disappeared, presumably cleaned up by beadm? Are these beadm managed snapshots something new? I don't recall ever seeing them before. Can I just delete them manually or would that break something? Between the two of them that are left, they seem to be sucking up about 1 gig or so. Thanks... From danmcd at omniti.com Wed Jul 23 14:18:46 2014 From: danmcd at omniti.com (Dan McDonald) Date: Wed, 23 Jul 2014 10:18:46 -0400 Subject: [OmniOS-discuss] Who's using "bloody" out there? Message-ID: I'm spinning what will be this week's update to bloody right now. There's at least one bugfix in there that I'm kinda surprised nobody noticed or complained about. Can people who are using bloody quickly send a reply to the list on this thread? Thanks, Dan From danmcd at omniti.com Wed Jul 23 14:22:11 2014 From: danmcd at omniti.com (Dan McDonald) Date: Wed, 23 Jul 2014 10:22:11 -0400 Subject: [OmniOS-discuss] OmniOS "bloody" repo has been updated Message-ID: <41D125AE-CB35-46E8-91B8-7291F0ABAE09@omniti.com> Hey everyone! Once again, I've updated the install media for this update as well as the IPS repo. It's a big one, and includes one new device I'd *REALLY* like folks to test upon. Broken down by category, what's new in this update? userspace - zsh to 5.0.5. - mandoc now in the man pages. headers - POSIX 2008 locale support devices - mpt_sas now supports 12G **** WOULD APPRECIATE TESTING **** zones - virtualized load average for zones - per-zone CPU kstats Files & sharing - Several ZFS bugfixes and enhancements. - rpcbind bugfix - NLM bugfixes TCP/IP - tcp_strong_iss defaults to 2. - ipsec_policy_log_interval to 0. Happy updating! 
Dan From danmcd at omniti.com Wed Jul 23 14:34:48 2014 From: danmcd at omniti.com (Dan McDonald) Date: Wed, 23 Jul 2014 10:34:48 -0400 Subject: [OmniOS-discuss] NOT YET (was Re: OmniOS "bloody" repo has been updated) In-Reply-To: <41D125AE-CB35-46E8-91B8-7291F0ABAE09@omniti.com> References: <41D125AE-CB35-46E8-91B8-7291F0ABAE09@omniti.com> Message-ID: <550297D7-9154-43DB-AD20-E10331532676@omniti.com> AAAAAH! Shoot, this update isn't ready yet. But everything mentioned below will be included. Sorry, Dan On Jul 23, 2014, at 10:22 AM, Dan McDonald wrote: > Hey everyone! > > Once again, I've updated the install media for this update as well as the IPS repo. It's a big one, and includes one new device I'd *REALLY* like folks to test upon. > > Broken down by category, what's new in this update? > > userspace > - zsh to 5.0.5. > - mandoc now in the man pages. > > headers > - POSIX 2008 locale support > > devices > - mpt_sas now supports 12G **** WOULD APPRECIATE TESTING **** > > zones > - virtualized load average for zones > - per-zone CPU kstats > > Files & sharing > - Several ZFS bugfixes and enhancements. > - rpcbind bugfix > - NLM bugfixes > > TCP/IP > - tcp_strong_iss defaults to 2. > - ipsec_policy_log_interval to 0. > > > Happy updating! > Dan > From fabio at fabiorabelo.wiki.br Wed Jul 23 14:41:40 2014 From: fabio at fabiorabelo.wiki.br (=?UTF-8?Q?F=C3=A1bio_Rabelo?=) Date: Wed, 23 Jul 2014 11:41:40 -0300 Subject: [OmniOS-discuss] OmniOS "bloody" repo has been updated In-Reply-To: <41D125AE-CB35-46E8-91B8-7291F0ABAE09@omniti.com> References: <41D125AE-CB35-46E8-91B8-7291F0ABAE09@omniti.com> Message-ID: Just for clarification, this mpt-sas addresses LSI 2308 ? Anyway, it is great news .... thanks ... F?bio Rabelo 2014-07-23 11:22 GMT-03:00 Dan McDonald : > Hey everyone! > > Once again, I've updated the install media for this update as well as the IPS repo. It's a big one, and includes one new device I'd *REALLY* like folks to test upon. > > Broken down by category, what's new in this update? > > userspace > - zsh to 5.0.5. > - mandoc now in the man pages. > > headers > - POSIX 2008 locale support > > devices > - mpt_sas now supports 12G **** WOULD APPRECIATE TESTING **** > > zones > - virtualized load average for zones > - per-zone CPU kstats > > Files & sharing > - Several ZFS bugfixes and enhancements. > - rpcbind bugfix > - NLM bugfixes > > TCP/IP > - tcp_strong_iss defaults to 2. > - ipsec_policy_log_interval to 0. > > > Happy updating! > Dan > > _______________________________________________ > OmniOS-discuss mailing list > OmniOS-discuss at lists.omniti.com > http://lists.omniti.com/mailman/listinfo/omnios-discuss From danmcd at omniti.com Wed Jul 23 23:14:08 2014 From: danmcd at omniti.com (Dan McDonald) Date: Wed, 23 Jul 2014 19:14:08 -0400 Subject: [OmniOS-discuss] NOW AVAILABLE - OmniOS "bloody" repo has been updated In-Reply-To: <550297D7-9154-43DB-AD20-E10331532676@omniti.com> References: <41D125AE-CB35-46E8-91B8-7291F0ABAE09@omniti.com> <550297D7-9154-43DB-AD20-E10331532676@omniti.com> Message-ID: Let's try this again! . . . Hey everyone! Once again, I've updated the install media for this update as well as the IPS repo. It's a big one, and includes one new device I'd *REALLY* like folks to test upon IF they have said device (LSI 3008 12G SAS chipset). Broken down by category, what's new in this update? userspace - zsh to 5.0.5. - mandoc now in the man pages. 
headers - POSIX 2008 locale support devices - mpt_sas now supports LSI 3008 12G SAS **** WOULD APPRECIATE TESTING **** zones - virtualized load average for zones - per-zone CPU kstats Files & sharing - Several ZFS bugfixes and enhancements. - rpcbind bugfix - NLM bugfixes TCP/IP - tcp_strong_iss defaults to 2. - ipsec_policy_log_interval to 0. Happy updating! Dan From henson at acm.org Thu Jul 24 00:25:23 2014 From: henson at acm.org (Paul B. Henson) Date: Wed, 23 Jul 2014 17:25:23 -0700 Subject: [OmniOS-discuss] Who's using "bloody" out there? In-Reply-To: References: Message-ID: <145c01cfa6d5$c409b890$4c1d29b0$@acm.org> > From: Dan McDonald > Sent: Wednesday, July 23, 2014 7:19 AM > > Can people who are using bloody quickly send a reply to the list on this > thread? I've got a bloody box that I poke at and update occasionally, but I can't really say that I'm "using" it very much. I'd notice if it completely failed to boot or died randomly, but beyond that, not much testing going on :(, sorry. From al.slater at scluk.com Thu Jul 24 10:16:23 2014 From: al.slater at scluk.com (Al Slater) Date: Thu, 24 Jul 2014 11:16:23 +0100 Subject: [OmniOS-discuss] pckrecv chash failure Message-ID: <53D0DCF7.9070307@scluk.com> Hi, I am trying to pkgrecv http://pkg.omniti.com/omnios/release to my own repository, but it is failing with root at omniostest:/export/home/aslate# pkgrecv -s http://pkg.omniti.com/omnios/release -d file:///sclomnios -c /var/tmp/pkgrecv-2GrduN 'pkg:/*' Processing packages for publisher omnios ... Retrieving and evaluating 1039 package(s)... PROCESS ITEMS GET (MB) SEND (MB) developer/sunstudio12.1 0/109 5.5/336.5 0.0/1138.5pkgrecv: Invalid contentpath opt/sunstudio12.1/prod/lib/sys/libsunir.so: chash failure: expected: b251c238070b6fdbf392194e85319e2c954a5384 computed: 289fe42c63d889a623a80b9158517139bd29ac3b. (happened 4 times) Is there a problem with the repo? -- Al Slater Technical Director SCL Phone : +44 (0)1273 666607 Fax : +44 (0)1273 666601 email : al.slater at scluk.com Stanton Consultancy Ltd Park Gate, 161 Preston Road, Brighton, East Sussex, BN1 6AU Registered in England Company number: 1957652 VAT number: GB 760 2433 55 From groups at tierarzt-mueller.de Thu Jul 24 18:25:32 2014 From: groups at tierarzt-mueller.de (Alexander Lesle) Date: Thu, 24 Jul 2014 20:25:32 +0200 Subject: [OmniOS-discuss] Oracle Java v7u65 Message-ID: <567177507.20140724202532@tierarzt-mueller.de> Hello All Is it possible to install Oracle Java v7u65 JRE to OI? Is there a package available? And how to do the installation. Thanks. -- Best Regards Alexander Juli, 24 2014 From jesus at omniti.com Thu Jul 24 18:32:27 2014 From: jesus at omniti.com (Theo Schlossnagle) Date: Thu, 24 Jul 2014 14:32:27 -0400 Subject: [OmniOS-discuss] Oracle Java v7u65 In-Reply-To: <567177507.20140724202532@tierarzt-mueller.de> References: <567177507.20140724202532@tierarzt-mueller.de> Message-ID: Do you mean to "OmniOS".... OI is a different distribution. On Thu, Jul 24, 2014 at 2:25 PM, Alexander Lesle wrote: > Hello All > > Is it possible to install Oracle Java v7u65 JRE to OI? > Is there a package available? > And how to do the installation. > > Thanks. > > -- > Best Regards > Alexander > Juli, 24 2014 > > _______________________________________________ > OmniOS-discuss mailing list > OmniOS-discuss at lists.omniti.com > http://lists.omniti.com/mailman/listinfo/omnios-discuss > -- Theo Schlossnagle http://omniti.com/is/theo-schlossnagle -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From jdg117 at elvis.arl.psu.edu Thu Jul 24 18:33:12 2014 From: jdg117 at elvis.arl.psu.edu (John D Groenveld) Date: Thu, 24 Jul 2014 14:33:12 -0400 Subject: [OmniOS-discuss] Oracle Java v7u65 In-Reply-To: Your message of "Thu, 24 Jul 2014 20:25:32 +0200." <567177507.20140724202532@tierarzt-mueller.de> References: <567177507.20140724202532@tierarzt-mueller.de> Message-ID: <201407241833.s6OIXChp026989@elvis.arl.psu.edu> In message <567177507.20140724202532 at tierarzt-mueller.de>, Alexander Lesle writ es: >Is it possible to install Oracle Java v7u65 JRE to OI? >Is there a package available? The SVR4 packages for Solaris 10 are on OTN: >And how to do the installation. pkgadd(1M) per the README. John groenveld at acm.org From groups at tierarzt-mueller.de Thu Jul 24 18:36:33 2014 From: groups at tierarzt-mueller.de (Alexander Lesle) Date: Thu, 24 Jul 2014 20:36:33 +0200 Subject: [OmniOS-discuss] Oracle Java v7u65 In-Reply-To: References: <567177507.20140724202532@tierarzt-mueller.de> Message-ID: <473254889.20140724203633@tierarzt-mueller.de> Hello Theo Schlossnagle and List, you are right. Sorry. At the moment I mean OmniOS. On Juli, 24 2014, 20:32 wrote in [1]: > Do you mean to "OmniOS".... OI is a different distribution. > On Thu, Jul 24, 2014 at 2:25 PM, Alexander Lesle > wrote: > > Hello All > > Is it possible to install Oracle Java v7u65 JRE to OI? > Is there a package available? > And how to do the installation. > > Thanks. > > -- > Best Regards > Alexander > Juli, 24 2014 > > _______________________________________________ > OmniOS-discuss mailing list > OmniOS-discuss at lists.omniti.com > http://lists.omniti.com/mailman/listinfo/omnios-discuss > -- Best Regards Alexander Juli, 24 2014 ........ [1] mid:CACLsApuqenQy4VFBTSaz_S2_e5Tf3ZWK_-5eaR--M-nCfK96WA at mail.gmail.com ........ From jesus at omniti.com Thu Jul 24 18:40:47 2014 From: jesus at omniti.com (Theo Schlossnagle) Date: Thu, 24 Jul 2014 14:40:47 -0400 Subject: [OmniOS-discuss] Oracle Java v7u65 In-Reply-To: <473254889.20140724203633@tierarzt-mueller.de> References: <567177507.20140724202532@tierarzt-mueller.de> <473254889.20140724203633@tierarzt-mueller.de> Message-ID: I usually download the gzip tarball for i586 and x86_64 for Solaris 10. And untar it in /opt/ On Thu, Jul 24, 2014 at 2:36 PM, Alexander Lesle wrote: > Hello Theo Schlossnagle and List, > > you are right. Sorry. > > At the moment I mean OmniOS. > > On Juli, 24 2014, 20:32 wrote in [1]: > > > Do you mean to "OmniOS".... OI is a different distribution. > > > > On Thu, Jul 24, 2014 at 2:25 PM, Alexander Lesle > > wrote: > > > > Hello All > > > > Is it possible to install Oracle Java v7u65 JRE to OI? > > Is there a package available? > > And how to do the installation. > > > > Thanks. > > > > -- > > Best Regards > > Alexander > > Juli, 24 2014 > > > > _______________________________________________ > > OmniOS-discuss mailing list > > OmniOS-discuss at lists.omniti.com > > http://lists.omniti.com/mailman/listinfo/omnios-discuss > > > > > > > > -- > Best Regards > Alexander > Juli, 24 2014 > ........ > [1] mid:CACLsApuqenQy4VFBTSaz_S2_e5Tf3ZWK_-5eaR--M-nCfK96WA at mail.gmail.com > ........ > > _______________________________________________ > OmniOS-discuss mailing list > OmniOS-discuss at lists.omniti.com > http://lists.omniti.com/mailman/listinfo/omnios-discuss > -- Theo Schlossnagle http://omniti.com/is/theo-schlossnagle -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From groups at tierarzt-mueller.de Thu Jul 24 18:57:17 2014 From: groups at tierarzt-mueller.de (Alexander Lesle) Date: Thu, 24 Jul 2014 20:57:17 +0200 Subject: [OmniOS-discuss] Oracle Java v7u65 In-Reply-To: References: <567177507.20140724202532@tierarzt-mueller.de> <473254889.20140724203633@tierarzt-mueller.de> Message-ID: <639180269.20140724205717@tierarzt-mueller.de> Hello Theo Schlossnagle and List, You mean this file "jre-7u65-solaris-i586.tar.gz" from java.com Only unpack in /opt/ Nothing else? No Var setting in PATH?? So easy? :)) On Juli, 24 2014, 20:40 wrote in [1]: > I usually download the gzip tarball for i586 and x86_64 for Solaris 10. ?And untar it in /opt/ > On Thu, Jul 24, 2014 at 2:36 PM, Alexander Lesle > wrote: > > Hello Theo Schlossnagle and List, > > you are right. Sorry. > > At the moment I mean OmniOS. > > On Juli, 24 2014, 20:32 wrote in [1]: > >> Do you mean to "OmniOS".... OI is a different distribution. > > >> On Thu, Jul 24, 2014 at 2:25 PM, Alexander Lesle >> wrote: >> >> Hello All >> >> ?Is it possible to install Oracle Java v7u65 JRE to OI? >> ?Is there a package available? >> ?And how to do the installation. >> >> ?Thanks. >> >> ?-- >> ?Best Regards >> ?Alexander >> ?Juli, 24 2014 >> >> ?_______________________________________________ >> ?OmniOS-discuss mailing list >> ?OmniOS-discuss at lists.omniti.com >> ?http://lists.omniti.com/mailman/listinfo/omnios-discuss >> > > > > > > -- > Best Regards > Alexander > Juli, 24 2014 > ........ > [1] > mid:CACLsApuqenQy4VFBTSaz_S2_e5Tf3ZWK_-5eaR--M-nCfK96WA at mail.gmail.com > ........ > > _______________________________________________ > OmniOS-discuss mailing list > OmniOS-discuss at lists.omniti.com > http://lists.omniti.com/mailman/listinfo/omnios-discuss > -- Best Regards Alexander Juli, 24 2014 ........ [1] mid:CACLsApvqJW1Uuw84+Js=_Z7bGY9oCQfy=K6zuuGXkVNZ641sUw at mail.gmail.com ........ From groups at tierarzt-mueller.de Thu Jul 24 19:00:44 2014 From: groups at tierarzt-mueller.de (Alexander Lesle) Date: Thu, 24 Jul 2014 21:00:44 +0200 Subject: [OmniOS-discuss] Oracle Java v7u65 In-Reply-To: <201407241833.s6OIXChp026989@elvis.arl.psu.edu> References: <567177507.20140724202532@tierarzt-mueller.de> <201407241833.s6OIXChp026989@elvis.arl.psu.edu> Message-ID: <1727715525.20140724210044@tierarzt-mueller.de> Hello John D Groenveld and List, Whats OTN? Whats the different like the solution from Theo? pkgadd or untar On Juli, 24 2014, 20:33 wrote in [1]: > In message <567177507.20140724202532 at tierarzt-mueller.de>, Alexander Lesle writ > es: >>Is it possible to install Oracle Java v7u65 JRE to OI? >>Is there a package available? > The SVR4 packages for Solaris 10 are on OTN: > >>And how to do the installation. > pkgadd(1M) per the README. > John > groenveld at acm.org > _______________________________________________ > OmniOS-discuss mailing list > OmniOS-discuss at lists.omniti.com > http://lists.omniti.com/mailman/listinfo/omnios-discuss -- Best Regards Alexander Juli, 24 2014 ........ [1] mid:201407241833.s6OIXChp026989 at elvis.arl.psu.edu ........ From jimklimov at cos.ru Fri Jul 25 08:47:11 2014 From: jimklimov at cos.ru (Jim Klimov) Date: Fri, 25 Jul 2014 10:47:11 +0200 Subject: [OmniOS-discuss] Oracle Java v7u65 In-Reply-To: <1727715525.20140724210044@tierarzt-mueller.de> References: <567177507.20140724202532@tierarzt-mueller.de> <201407241833.s6OIXChp026989@elvis.arl.psu.edu> <1727715525.20140724210044@tierarzt-mueller.de> Message-ID: <1456d13f-43ac-42f1-a21f-feec284f8b6c@email.android.com> 24 ???? 
2014, at 21:00:44 CEST, Alexander Lesle wrote:
>Hello John D Groenveld and List,
>
>What's OTN?
>
>What's the difference compared to Theo's solution?
>pkgadd or untar
>
>On Juli, 24 2014, 20:33 wrote in [1]:
>
>> In message <567177507.20140724202532 at tierarzt-mueller.de>, Alexander
>Lesle writes:
>>>Is it possible to install Oracle Java v7u65 JRE to OI?
>>>Is there a package available?
>
>> The SVR4 packages for Solaris 10 are on OTN:
>>
>
>>>And how to do the installation.
>
>> pkgadd(1M) per the README.
>
>> John
>> groenveld at acm.org
>> _______________________________________________
>> OmniOS-discuss mailing list
>> OmniOS-discuss at lists.omniti.com
>> http://lists.omniti.com/mailman/listinfo/omnios-discuss

Did you previously use and manage Java on Solaris, illumos or other OSes?

There is a JAVA_HOME environment variable (set in a shell, profile, init script, SMF attributes, etc.) that points your Java program, such as Tomcat, a CLI tool, or some GUI installer, to the installation location of the JVM you want to use in that case. Of course, the "java" program for this JVM instance should be the one from the corresponding "$JAVA_HOME/bin" path.

So you can have multiple installations and many JVMs running with different versions (there are programs where backwards compatibility does not cut it and you do need an older version of Java, for example).

It is customary to install Solaris Javas into /usr/jdk/instances/<version>/ and to symlink /usr/jdk/latest and /usr/java to the directory with the version you most likely need; a dozen programs in the standard PATH like /usr/bin/java are in fact symlinks to /usr/java/bin/java and such.

For hosts where /usr is system-managed and should not be touched by users according to some policy, /opt/jdk, /opt/java or plain /opt/ are commonly used as containers for JRE/JDK installations (typically as unzipped archives, unpackaged). It still makes sense to maintain /opt/java or even /usr/java (if changeable) to point to the installation this zone needs.

Note that for some Java versions the x86_64 package or archive only includes the 64-bit files and should overlay a 32-bit JVM of the same version installed/unpacked into the same location. For other OSes or major Java versions the releases are fully sufficient on their own. An indicator may be the file size (i.e. 80MB 32-bit + 10MB 64-bit vs. both similarly sized).

As a hint, if you use many local zones to host farms of Java appservers, etc., you'll find that updating Javas consistently (especially un-packaged ones) has a large footprint in storage and management, more so if you customize the JDK installations (local CAs, most recent timezones and so on). Instead, I install/unpack/customize once in the GZ (one best-compressed dataset per JDK, in a structure resembling the standard Solaris JDK installation), and lofs-mount the whole lot into the local zones. If certain zones need more customization, you can clone off their own copy of the JDK dataset and delegate it into the zone, but we've never needed that beyond some testing of the approach (cumbersome, but works and saves space). Also, once you've completed this update on one host, it is easy to 'zfs send' it to your other GZs hosting LZs with Java.
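To make the tarball route concrete, a minimal sketch (the archive names and the jre1.7.0_65 directory are just examples for 7u65, not something verified here; adjust to whatever the downloaded archives actually contain):

# unpack the 32-bit archive first, then overlay the x64 one on top of it
cd /opt
gzip -dc /var/tmp/jre-7u65-solaris-i586.tar.gz | tar xf -
gzip -dc /var/tmp/jre-7u65-solaris-x64.tar.gz | tar xf -
# keep a stable symlink so scripts don't hard-code the version
ln -s /opt/jre1.7.0_65 /opt/java
# point the environment at it
export JAVA_HOME=/opt/java
export PATH=$JAVA_HOME/bin:$PATH
java -version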
HTH, //Jim Ps: OTN = Oracle TechNet -- Typos courtesy of K-9 Mail on my Samsung Android From groups at tierarzt-mueller.de Fri Jul 25 14:04:59 2014 From: groups at tierarzt-mueller.de (Alexander Lesle) Date: Fri, 25 Jul 2014 16:04:59 +0200 Subject: [OmniOS-discuss] Oracle Java v7u65 In-Reply-To: <1456d13f-43ac-42f1-a21f-feec284f8b6c@email.android.com> References: <567177507.20140724202532@tierarzt-mueller.de> <201407241833.s6OIXChp026989@elvis.arl.psu.edu> <1727715525.20140724210044@tierarzt-mueller.de> <1456d13f-43ac-42f1-a21f-feec284f8b6c@email.android.com> Message-ID: <821427678.20140725160459@tierarzt-mueller.de> Hello Jim Klimov, No, I havent. Thank you very much for your informations and tips. On Juli, 25 2014, 10:47 wrote in [1]: > Did you previously use and manage Java, on solaris, ilkumos or other oses? > There is a JAVA_HOME environment variable (set in shell, profile, > initscript, smf attribs, etc.) that points your Java program such as > tomcat, or a cli tool, or some gui installer or whatever to the > installation location of the jvm you want to use in this case. Of > course, the "java" program for this jvm instance should be from the > corresponding "$JAVA_HOME/bin" path. > So you can have multiple installations and many JVMs running with > different versions (there are programs where backwards compatibility > does not cut it and you do need an older version of Java for example). > It is customary to install Solaris java's into > /usr/jdk/instances// and symlink /usr/jdk/latest and > /usr/java to the directory with the version you need most likely, > and a dozen programs in the standard PATH like /usr/bin/java are in > fact symlinks to /usr/java/bin/java and such. > For hosts where /usr is system-managed and should not be touched by > users according to some policy, /opt/jdk, /opt/java or plain > /opt/ are commonly used as containers for jre/jdk > installations (typically as unzipped archives, unpackaged). It still > makes sense to maintain /opt/java or even /usr/java (if changeable) > to point to the installation this zone needs. > Note that for some java versions the x86_64 package or archive only > includes the 64-bit files and should overlay a 32-bit jvm of the > same version installed/unpacked into the same location. For other > oses or major java versions the releases are fully sufficient. An > indicator may be the file size (i.e. 80mb 32-bit + 10mb 64-bit vs. both similarly-sized). > As a hint, if you use many local zones to host farms of java > appservers,etc., you'll find that updating java's consistently > (especially un-packaged) has a large footprint in storage and > management, more so if you customize the jdk installations (local > CA's, most recent timezones and so on). Instead, I > install/unpack/customize once in the GZ (one best-compressed dataset > per jdk in a structure resembling the standard solaris jdk > installation), and lofs-mount the whole lot into the local zones. If > certain zones need more customization, you can clone off their copy > of jdk dataset and delegate into the zone, but we've never needed > that beyond some testing of the approach (cumbersome but works and > saves space). Also, once you've completed this update on one host, > it is easy to 'zfs-send' to your other GZ's hosting LZs with javas. > HTH, > //Jim > Ps: OTN = Oracle TechNet > -- > Typos courtesy of K-9 Mail on my Samsung Android -- Best Regards Alexander Juli, 25 2014 ........ [1] mid:1456d13f-43ac-42f1-a21f-feec284f8b6c at email.android.com ........ 
From nrhuff at umn.edu Mon Jul 28 16:59:43 2014
From: nrhuff at umn.edu (Nathan Huff)
Date: Mon, 28 Jul 2014 11:59:43 -0500
Subject: [OmniOS-discuss] question on pkg info behavior
Message-ID: <53D6817F.50407@umn.edu>

I am currently setting up a server using 151006 LTS and I am seeing something unexpected from the 'pkg info' command, and I am not sure if it is a bug or just that I don't understand how it is supposed to work.

If I run 'pkg info gnu-patch' I get

Name: text/gnu-patch
Summary: The GNU Patch utility
State: Installed
Publisher: omnios
Version: 2.7
Build Release: 5.11
Branch: 0.151006
Packaging Date: Mon May 6 19:55:18 2013
Size: 239.38 kB
FMRI: pkg://omnios/text/gnu-patch at 2.7,5.11-0.151006:20130506T195518Z

which is what I expect, but if I run 'pkg info -r gnu-patch' I get

Name: text/gnu-patch
Summary: The GNU Patch utility
State: Not installed
Publisher: omnios
Version: 2.7
Build Release: 5.11
Branch: 0.151008
Packaging Date: Wed Dec 4 02:52:08 2013
Size: 240.62 kB
FMRI: pkg://omnios/text/gnu-patch at 2.7,5.11-0.151008:20131204T025208Z

And if I run 'pkg install gnu-patch' it tells me there are no updates for the image. It looks like 'pkg info -r' isn't restricting itself to the system's branch, but 'pkg install' is. It seems like the correct behavior would be to have pkg info restrict itself to the branch as well.

-- Nathan Huff System Administrator Academic Health Center Information Systems University of Minnesota 612-626-9136

From esproul at omniti.com Mon Jul 28 22:06:58 2014
From: esproul at omniti.com (Eric Sproul)
Date: Mon, 28 Jul 2014 18:06:58 -0400
Subject: [OmniOS-discuss] question on pkg info behavior
In-Reply-To: <53D6817F.50407@umn.edu>
References: <53D6817F.50407@umn.edu>
Message-ID:

On Mon, Jul 28, 2014 at 12:59 PM, Nathan Huff wrote:
> It looks like 'pkg info -r' isn't restricting itself to the system's branch,
> but 'pkg install' is. It seems like the correct behavior would be to have
> pkg info restrict itself to the branch as well.

Hi Nathan,
The behavior of 'pkg info -r' is to show you the most recent version available in the remote repository. That remote version may or may not be installable due to local restrictions (in your case, it's the omnios-userland package, which incorporates the r151006 version of gnu-patch). See pkg(5) for an explanation of incorporate dependencies.

In the case of LTS and r151008, these versions are in the same repo. Starting with r151010, there is a separate repo per release, which will cause fewer problems and less confusion for users.

Eric

From moo at wuffers.net Tue Jul 29 00:11:32 2014
From: moo at wuffers.net (wuffers)
Date: Mon, 28 Jul 2014 20:11:32 -0400
Subject: [OmniOS-discuss] Slow scrub performance
Message-ID:

Does this look normal?
pool: rpool state: ONLINE scan: scrub repaired 0 in 0h3m with 0 errors on Tue Jul 15 09:36:17 2014 config: NAME STATE READ WRITE CKSUM rpool ONLINE 0 0 0 mirror-0 ONLINE 0 0 0 c4t0d0s0 ONLINE 0 0 0 c4t1d0s0 ONLINE 0 0 0 errors: No known data errors pool: tank state: ONLINE scan: scrub in progress since Mon Jul 14 17:54:42 2014 6.59T scanned out of 24.2T at 5.71M/s, (scan is slow, no estimated time) 0 repaired, 27.25% done config: NAME STATE READ WRITE CKSUM tank ONLINE 0 0 0 mirror-0 ONLINE 0 0 0 c1t5000C50055F9F637d0 ONLINE 0 0 0 c1t5000C50055F9EF2Fd0 ONLINE 0 0 0 mirror-1 ONLINE 0 0 0 c1t5000C50055F87D97d0 ONLINE 0 0 0 c1t5000C50055F9D3B3d0 ONLINE 0 0 0 mirror-2 ONLINE 0 0 0 c1t5000C50055E6606Fd0 ONLINE 0 0 0 c1t5000C50055F9F92Bd0 ONLINE 0 0 0 mirror-3 ONLINE 0 0 0 c1t5000C50055F856CFd0 ONLINE 0 0 0 c1t5000C50055F9FE87d0 ONLINE 0 0 0 mirror-4 ONLINE 0 0 0 c1t5000C50055F84A97d0 ONLINE 0 0 0 c1t5000C50055FA0AF7d0 ONLINE 0 0 0 mirror-5 ONLINE 0 0 0 c1t5000C50055F9D3E3d0 ONLINE 0 0 0 c1t5000C50055F9F0B3d0 ONLINE 0 0 0 mirror-6 ONLINE 0 0 0 c1t5000C50055F8A46Fd0 ONLINE 0 0 0 c1t5000C50055F9FB8Bd0 ONLINE 0 0 0 mirror-7 ONLINE 0 0 0 c1t5000C50055F8B21Fd0 ONLINE 0 0 0 c1t5000C50055F9F89Fd0 ONLINE 0 0 0 mirror-8 ONLINE 0 0 0 c1t5000C50055F8BE3Fd0 ONLINE 0 0 0 c1t5000C50055F9E123d0 ONLINE 0 0 0 mirror-9 ONLINE 0 0 0 c1t5000C50055F9379Bd0 ONLINE 0 0 0 c1t5000C50055F9E7D7d0 ONLINE 0 0 0 mirror-10 ONLINE 0 0 0 c1t5000C50055E65F0Fd0 ONLINE 0 0 0 c1t5000C50055F9F80Bd0 ONLINE 0 0 0 mirror-11 ONLINE 0 0 0 c1t5000C50055F8A22Bd0 ONLINE 0 0 0 c1t5000C50055F8D48Fd0 ONLINE 0 0 0 mirror-12 ONLINE 0 0 0 c1t5000C50055E65807d0 ONLINE 0 0 0 c1t5000C50055F8BFA3d0 ONLINE 0 0 0 mirror-13 ONLINE 0 0 0 c1t5000C50055E579F7d0 ONLINE 0 0 0 c1t5000C50055E65877d0 ONLINE 0 0 0 mirror-14 ONLINE 0 0 0 c1t5000C50055F9FA1Fd0 ONLINE 0 0 0 c1t5000C50055F8CDA7d0 ONLINE 0 0 0 mirror-15 ONLINE 0 0 0 c1t5000C50055F8BF9Bd0 ONLINE 0 0 0 c1t5000C50055F9A607d0 ONLINE 0 0 0 mirror-16 ONLINE 0 0 0 c1t5000C50055E66503d0 ONLINE 0 0 0 c1t5000C50055E4FDE7d0 ONLINE 0 0 0 mirror-17 ONLINE 0 0 0 c1t5000C50055F8E017d0 ONLINE 0 0 0 c1t5000C50055F9F3EBd0 ONLINE 0 0 0 mirror-18 ONLINE 0 0 0 c1t5000C50055F8B80Fd0 ONLINE 0 0 0 c1t5000C50055F9F63Bd0 ONLINE 0 0 0 mirror-19 ONLINE 0 0 0 c1t5000C50055F84FB7d0 ONLINE 0 0 0 c1t5000C50055F9FEABd0 ONLINE 0 0 0 mirror-20 ONLINE 0 0 0 c1t5000C50055F8CCAFd0 ONLINE 0 0 0 c1t5000C50055F9F91Bd0 ONLINE 0 0 0 mirror-21 ONLINE 0 0 0 c1t5000C50055E65ABBd0 ONLINE 0 0 0 c1t5000C50055F8905Fd0 ONLINE 0 0 0 mirror-22 ONLINE 0 0 0 c1t5000C50055E57A5Fd0 ONLINE 0 0 0 c1t5000C50055F87E73d0 ONLINE 0 0 0 mirror-23 ONLINE 0 0 0 c1t5000C50055E66053d0 ONLINE 0 0 0 c1t5000C50055E66B63d0 ONLINE 0 0 0 mirror-24 ONLINE 0 0 0 c1t5000C50055F8723Bd0 ONLINE 0 0 0 c1t5000C50055F8C3ABd0 ONLINE 0 0 0 logs c2t5000A72A3007811Dd0 ONLINE 0 0 0 cache c2t500117310015D579d0 ONLINE 0 0 0 c2t50011731001631FDd0 ONLINE 0 0 0 c12t500117310015D59Ed0 ONLINE 0 0 0 c12t500117310015D54Ed0 ONLINE 0 0 0 spares c1t5000C50055FA2AEFd0 AVAIL c1t5000C50055E595B7d0 AVAIL errors: No known data errors --- This is a ~90TB SAN on r151008, with 25 pairs of 4TB mirror drives. The last scrub I ran was about 3 months ago, which took (from my recollection) ~250 hours or so. I've only run about 4 scrubs so far on this installation. The current scrub has been running for 2 weeks, with no end in sight. The last time I saw an estimate, it said around ~650 hours remaining. 
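As a rough cross-check of that estimate (back-of-the-envelope only, using the scanned/total/rate figures from the status output above):

  # (total - scanned) / scan rate, converted from TiB and MiB/s to days
  echo '(24.2 - 6.59) * 1024 * 1024 / 5.71 / 86400' | bc -l
  # roughly 37 days at a sustained 5.71M/s

So even if the rate holds, this pass would run for several more weeks, which fits the "no estimated time" message.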
This thread http://comments.gmane.org/gmane.os.solaris.opensolaris.zfs/46021 from over 3 years ago mention the metaslab_min_alloc_size as a way to improve this (reducing it to 4K from 10MB). Further reading into this property got me this Illumos bug: https://www.illumos.org/issues/54, which states "Turns out this tunable is made irrelevant as a result of a change to use the metaslab_df_ops allocator. We don't need to change it. I'm closing this bug.". So that seems like a dead end to me. This is the current load with scrub running (~350 VMs between Hyper-V and VMware environments): # iostat -xnze extended device statistics ---- errors --- r/s w/s kr/s kw/s wait actv wsvc_t asvc_t %w %b s/w h/w trn tot device 0.4 12.5 39.7 78.8 0.1 0.0 5.0 0.1 0 0 0 0 0 0 rpool 0.2 6.9 19.9 39.4 0.0 0.0 0.0 0.1 0 0 0 0 0 0 c4t0d0 0.2 6.8 19.9 39.4 0.0 0.0 0.0 0.1 0 0 0 0 0 0 c4t1d0 4.4 29.3 209.7 962.7 0.0 0.0 0.0 1.4 0 3 0 0 0 0 c1t5000C50055F8723Bd0 4.7 25.1 209.4 962.3 0.0 0.0 0.0 1.5 0 3 0 0 0 0 c1t5000C50055E66B63d0 4.7 27.6 208.3 952.7 0.0 0.0 0.0 1.3 0 3 0 0 0 0 c1t5000C50055F87E73d0 4.4 28.6 209.1 974.3 0.0 0.0 0.0 1.4 0 3 0 0 0 0 c1t5000C50055F8BFA3d0 4.4 28.9 208.3 964.5 0.0 0.0 0.0 1.4 0 3 0 0 0 0 c1t5000C50055F9E123d0 4.4 25.7 208.7 955.7 0.0 0.0 0.0 1.4 0 3 0 0 0 0 c1t5000C50055F9F0B3d0 4.4 26.5 209.1 960.9 0.0 0.0 0.0 1.4 0 3 0 0 0 0 c1t5000C50055F9D3B3d0 4.3 25.2 206.6 936.1 0.0 0.0 0.0 1.5 0 3 0 0 0 0 c1t5000C50055E4FDE7d0 4.4 26.9 208.1 982.6 0.0 0.0 0.0 1.4 0 3 0 0 0 0 c1t5000C50055F9A607d0 4.4 24.5 208.7 955.4 0.0 0.0 0.0 1.5 0 3 0 0 0 0 c1t5000C50055F8CDA7d0 4.3 26.5 207.8 943.8 0.0 0.0 0.0 1.4 0 3 0 0 0 0 c1t5000C50055E65877d0 4.4 27.7 208.0 961.1 0.0 0.0 0.0 1.4 0 3 0 0 0 0 c1t5000C50055F9E7D7d0 4.3 26.0 208.0 953.9 0.0 0.0 0.0 1.4 0 3 0 0 0 0 c1t5000C50055FA0AF7d0 4.3 26.1 208.0 966.2 0.0 0.0 0.0 1.4 0 3 0 0 0 0 c1t5000C50055F9FE87d0 4.4 28.5 208.6 965.3 0.0 0.0 0.0 1.4 0 3 0 0 0 0 c1t5000C50055F9F91Bd0 4.3 26.7 207.2 945.0 0.0 0.0 0.0 1.4 0 3 0 0 0 0 c1t5000C50055F9FEABd0 4.4 26.5 209.3 980.1 0.0 0.0 0.0 1.4 0 3 0 0 0 0 c1t5000C50055F9F63Bd0 4.3 26.1 207.6 944.3 0.0 0.0 0.0 1.5 0 3 0 0 0 0 c1t5000C50055F9F3EBd0 4.3 26.5 208.1 954.9 0.0 0.0 0.0 1.4 0 3 0 0 0 0 c1t5000C50055F9F80Bd0 32.5 14.7 1005.6 751.2 0.0 0.0 0.0 0.3 0 1 0 0 0 0 c2t500117310015D579d0 32.5 14.7 1004.1 751.2 0.0 0.0 0.0 0.3 0 1 0 0 0 0 c2t50011731001631FDd0 0.0 180.8 0.0 16434.5 0.0 0.3 0.0 1.6 0 4 0 0 0 0 c2t5000A72A3007811Dd0 4.4 25.3 208.7 966.7 0.0 0.0 0.0 1.4 0 3 0 0 0 0 c1t5000C50055F9FB8Bd0 4.4 26.3 208.5 949.1 0.0 0.0 0.0 1.4 0 3 0 0 0 0 c1t5000C50055F9F92Bd0 4.4 29.7 208.6 975.1 0.0 0.0 0.0 1.3 0 3 0 0 0 0 c1t5000C50055F8905Fd0 4.4 25.7 207.9 954.1 0.0 0.0 0.0 1.4 0 3 0 0 0 0 c1t5000C50055F8D48Fd0 4.4 26.8 208.4 967.4 0.0 0.0 0.0 1.4 0 3 0 0 0 0 c1t5000C50055F9F89Fd0 4.4 28.5 208.1 964.9 0.0 0.0 0.0 1.4 0 3 0 0 0 0 c1t5000C50055F9EF2Fd0 4.4 29.4 209.5 962.7 0.0 0.0 0.0 1.4 0 3 0 0 0 0 c1t5000C50055F8C3ABd0 4.7 25.0 208.9 962.3 0.0 0.0 0.0 1.5 0 3 0 0 0 0 c1t5000C50055E66053d0 4.3 25.1 207.5 936.1 0.0 0.0 0.0 1.5 0 3 0 0 0 0 c1t5000C50055E66503d0 4.4 25.6 209.1 955.7 0.0 0.0 0.0 1.4 0 3 0 0 0 0 c1t5000C50055F9D3E3d0 4.3 26.6 207.4 945.0 0.0 0.0 0.0 1.4 0 3 0 0 0 0 c1t5000C50055F84FB7d0 4.3 26.0 207.5 944.3 0.0 0.0 0.0 1.5 0 3 0 0 0 0 c1t5000C50055F8E017d0 4.3 26.4 207.1 943.8 0.0 0.0 0.0 1.4 0 3 0 0 0 0 c1t5000C50055E579F7d0 4.4 28.5 208.8 974.3 0.0 0.0 0.0 1.4 0 3 0 0 0 0 c1t5000C50055E65807d0 4.4 25.9 208.5 953.9 0.0 0.0 0.0 1.4 0 3 0 0 0 0 c1t5000C50055F84A97d0 4.4 26.4 209.2 960.9 0.0 0.0 0.0 1.4 0 3 0 0 
0 0 c1t5000C50055F87D97d0 4.4 28.5 208.8 964.9 0.0 0.0 0.0 1.4 0 3 0 0 0 0 c1t5000C50055F9F637d0 4.4 29.6 208.9 975.1 0.0 0.0 0.0 1.3 0 3 0 0 0 0 c1t5000C50055E65ABBd0 4.4 26.7 208.5 982.6 0.0 0.0 0.0 1.4 0 3 0 0 0 0 c1t5000C50055F8BF9Bd0 4.3 25.6 207.6 954.1 0.0 0.0 0.0 1.5 0 3 0 0 0 0 c1t5000C50055F8A22Bd0 4.4 27.6 208.2 961.1 0.0 0.0 0.0 1.4 0 3 0 0 0 0 c1t5000C50055F9379Bd0 4.7 27.6 208.3 952.8 0.0 0.0 0.0 1.3 0 3 0 0 0 0 c1t5000C50055E57A5Fd0 4.4 28.4 208.4 965.3 0.0 0.0 0.0 1.4 0 3 0 0 0 0 c1t5000C50055F8CCAFd0 4.4 26.4 208.9 980.1 0.0 0.0 0.0 1.5 0 3 0 0 0 0 c1t5000C50055F8B80Fd0 4.4 24.4 208.9 955.4 0.0 0.0 0.0 1.5 0 3 0 0 0 0 c1t5000C50055F9FA1Fd0 4.3 26.4 207.6 954.9 0.0 0.0 0.0 1.4 0 3 0 0 0 0 c1t5000C50055E65F0Fd0 4.4 28.8 208.3 964.5 0.0 0.0 0.0 1.4 0 3 0 0 0 0 c1t5000C50055F8BE3Fd0 4.3 26.7 207.4 967.4 0.0 0.0 0.0 1.4 0 3 0 0 0 0 c1t5000C50055F8B21Fd0 4.4 25.1 208.9 966.7 0.0 0.0 0.0 1.4 0 3 0 0 0 0 c1t5000C50055F8A46Fd0 4.4 26.0 209.7 966.2 0.0 0.0 0.0 1.4 0 3 0 0 0 0 c1t5000C50055F856CFd0 4.4 26.2 209.0 949.1 0.0 0.0 0.0 1.4 0 3 0 0 0 0 c1t5000C50055E6606Fd0 32.5 14.7 1004.3 750.9 0.0 0.0 0.0 0.3 0 1 0 0 0 0 c12t500117310015D59Ed0 32.5 14.7 1004.4 751.3 0.0 0.0 0.0 0.3 0 1 0 0 0 0 c12t500117310015D54Ed0 349.1 646.9 14437.7 67437.3 52.7 2.6 52.9 2.6 12 37 0 0 0 0 tank What should I be checking for? Is a scrub supposed to take that long (and I thought over 10 days for the last one was long..)? There doesn't seem to be any hardware errors. Is the load too high (12% wait, 37% busy with asvc_t of 2.6ms)? -------------- next part -------------- An HTML attachment was scrubbed... URL: From jsavikko at niksula.hut.fi Tue Jul 29 12:21:05 2014 From: jsavikko at niksula.hut.fi (Janne Savikko) Date: Tue, 29 Jul 2014 15:21:05 +0300 (EEST) Subject: [OmniOS-discuss] dd does not work as expected with count=0 option Message-ID: Hi, I've used dd to create sparse files, and I noticed that dd does not work as expected with count=0 option, but keeps writing indefinitely. This does not happen with dd of Ubuntu 12.04,14.04, Solaris 11 Express (snv_151a), OSX 10.10 beta or OpenBSD 5.5. OmniOS build that I've tested this problem: omnios-b281e50 i86pc This behavior happens also with old Solaris 10 (Generic_139555-08 sun4v). Manual pages state "count=n, Copies only n input blocks". So according to documentation I expect it to copy only 0 blocks. Is this desired behavior (and should documentation be fixed) or a bug? Cheers, Janne From danmcd at omniti.com Tue Jul 29 12:33:42 2014 From: danmcd at omniti.com (Dan McDonald) Date: Tue, 29 Jul 2014 08:33:42 -0400 Subject: [OmniOS-discuss] dd does not work as expected with count=0 option In-Reply-To: References: Message-ID: I suspect this is an Illumos bug. I'm top posting this because it's my phone, and I think the Illumos developers mailing list should confirm my suspicions. Dan Sent from my iPhone (typos, autocorrect, and all) > On Jul 29, 2014, at 8:21 AM, Janne Savikko wrote: > > Hi, > > I've used dd to create sparse files, and I noticed that dd does not work as expected with count=0 option, but keeps writing indefinitely. This does not happen with dd of Ubuntu 12.04,14.04, Solaris 11 Express (snv_151a), OSX 10.10 beta or OpenBSD 5.5. > > OmniOS build that I've tested this problem: omnios-b281e50 i86pc > This behavior happens also with old Solaris 10 (Generic_139555-08 sun4v). > > Manual pages state "count=n, Copies only n input blocks". So according to documentation I expect it to copy only 0 blocks. 
> > Is this desired behavior (and should documentation be fixed) or a bug? > > > Cheers, > Janne > _______________________________________________ > OmniOS-discuss mailing list > OmniOS-discuss at lists.omniti.com > http://lists.omniti.com/mailman/listinfo/omnios-discuss From paul at pk1048.com Tue Jul 29 15:02:18 2014 From: paul at pk1048.com (PK1048) Date: Tue, 29 Jul 2014 11:02:18 -0400 Subject: [OmniOS-discuss] Slow scrub performance In-Reply-To: References: Message-ID: <8A89E9CC-065D-432A-9F0C-3C9583284B97@pk1048.com> On Jul 28, 2014, at 20:11, wuffers wrote: > Does this look normal? Short answer, yes. ? Keep in mind that 1. a scrub runs in the background (so as not to impact production I/O, this was not always the case and caused serious issues in the past with a pool being unresponsive due to a scrub) 2. a scrub essentially walks the zpool examining every transaction in order (as does a resilver) So the time to complete a scrub depends on how many write transactions since the pool was created (which is generally related to the amount of data but not always). You are limited by the random I/O capability of the disks involved. With VMs I assume this is a file server, so the I/O size will also affect performance. > This is a ~90TB SAN on r151008, with 25 pairs of 4TB mirror drives. The last scrub I ran was about 3 months ago, which took (from my recollection) ~250 hours or so. I've only run about 4 scrubs so far on this installation. > > The current scrub has been running for 2 weeks, with no end in sight. The last time I saw an estimate, it said around ~650 hours remaining. Run the numbers? you are scanning 24.2TB at about 5.5MB/sec ? 4,613,734 seconds or 54 days. And that assumes the same rate for all of the scan. The rate will change as other I/O competes for resources. > > This thread http://comments.gmane.org/gmane.os.solaris.opensolaris.zfs/46021 from over 3 years ago mention the metaslab_min_alloc_size as a way to improve this (reducing it to 4K from 10MB). Further reading into this property got me this Illumos bug: https://www.illumos.org/issues/54, which states "Turns out this tunable is made irrelevant as a result of a change to use the metaslab_df_ops allocator. We don't need to change it. I'm closing this bug.". So that seems like a dead end to me. > > This is the current load with scrub running (~350 VMs between Hyper-V and VMware environments): > > # iostat -xnze > extended device statistics ---- errors --- > r/s w/s kr/s kw/s wait actv wsvc_t asvc_t %w %b s/w h/w trn tot device > 0.4 12.5 39.7 78.8 0.1 0.0 5.0 0.1 0 0 0 0 0 0 rpool > 0.2 6.9 19.9 39.4 0.0 0.0 0.0 0.1 0 0 0 0 0 0 c4t0d0 > 0.2 6.8 19.9 39.4 0.0 0.0 0.0 0.1 0 0 0 0 0 0 c4t1d0 > 4.4 29.3 209.7 962.7 0.0 0.0 0.0 1.4 0 3 0 0 0 0 c1t5000C50055F8723Bd0 > 4.7 25.1 209.4 962.3 0.0 0.0 0.0 1.5 0 3 0 0 0 0 c1t5000C50055E66B63d0 > 4.7 27.6 208.3 952.7 0.0 0.0 0.0 1.3 0 3 0 0 0 0 c1t5000C50055F87E73d0 Looks like you have a fair bit of activity going on (almost 1MB/sec of writes per spindle). Since this is storage for VMs, I assume this is the storage server for separate compute servers? Have you tuned the block size for the file share you are using? That can make a huge difference in performance. I also noted that you only have a single LOG device. Best Practice is to mirror log devices so you do not lose any data in flight if hit by a power outage (of course, if this server has more UPS runtime that all the clients that may not matter). You may want to ask this question over on the ZFS discuss list? 
Subscribe here: https://www.listbox.com/subscribe/?listname=zfs at lists.illumos.org From richard.elling at richardelling.com Tue Jul 29 15:29:19 2014 From: richard.elling at richardelling.com (Richard Elling) Date: Tue, 29 Jul 2014 08:29:19 -0700 Subject: [OmniOS-discuss] Slow scrub performance In-Reply-To: References: Message-ID: On Jul 28, 2014, at 5:11 PM, wuffers wrote: > Does this look normal? maybe, maybe not > > pool: rpool > state: ONLINE > scan: scrub repaired 0 in 0h3m with 0 errors on Tue Jul 15 09:36:17 2014 > config: > > NAME STATE READ WRITE CKSUM > rpool ONLINE 0 0 0 > mirror-0 ONLINE 0 0 0 > c4t0d0s0 ONLINE 0 0 0 > c4t1d0s0 ONLINE 0 0 0 > > errors: No known data errors > > pool: tank > state: ONLINE > scan: scrub in progress since Mon Jul 14 17:54:42 2014 > 6.59T scanned out of 24.2T at 5.71M/s, (scan is slow, no estimated time) this is slower than most, surely slower than desired > 0 repaired, 27.25% done > config: > > NAME STATE READ WRITE CKSUM > tank ONLINE 0 0 0 > mirror-0 ONLINE 0 0 0 > c1t5000C50055F9F637d0 ONLINE 0 0 0 > c1t5000C50055F9EF2Fd0 ONLINE 0 0 0 > mirror-1 ONLINE 0 0 0 > c1t5000C50055F87D97d0 ONLINE 0 0 0 > c1t5000C50055F9D3B3d0 ONLINE 0 0 0 > mirror-2 ONLINE 0 0 0 > c1t5000C50055E6606Fd0 ONLINE 0 0 0 > c1t5000C50055F9F92Bd0 ONLINE 0 0 0 > mirror-3 ONLINE 0 0 0 > c1t5000C50055F856CFd0 ONLINE 0 0 0 > c1t5000C50055F9FE87d0 ONLINE 0 0 0 > mirror-4 ONLINE 0 0 0 > c1t5000C50055F84A97d0 ONLINE 0 0 0 > c1t5000C50055FA0AF7d0 ONLINE 0 0 0 > mirror-5 ONLINE 0 0 0 > c1t5000C50055F9D3E3d0 ONLINE 0 0 0 > c1t5000C50055F9F0B3d0 ONLINE 0 0 0 > mirror-6 ONLINE 0 0 0 > c1t5000C50055F8A46Fd0 ONLINE 0 0 0 > c1t5000C50055F9FB8Bd0 ONLINE 0 0 0 > mirror-7 ONLINE 0 0 0 > c1t5000C50055F8B21Fd0 ONLINE 0 0 0 > c1t5000C50055F9F89Fd0 ONLINE 0 0 0 > mirror-8 ONLINE 0 0 0 > c1t5000C50055F8BE3Fd0 ONLINE 0 0 0 > c1t5000C50055F9E123d0 ONLINE 0 0 0 > mirror-9 ONLINE 0 0 0 > c1t5000C50055F9379Bd0 ONLINE 0 0 0 > c1t5000C50055F9E7D7d0 ONLINE 0 0 0 > mirror-10 ONLINE 0 0 0 > c1t5000C50055E65F0Fd0 ONLINE 0 0 0 > c1t5000C50055F9F80Bd0 ONLINE 0 0 0 > mirror-11 ONLINE 0 0 0 > c1t5000C50055F8A22Bd0 ONLINE 0 0 0 > c1t5000C50055F8D48Fd0 ONLINE 0 0 0 > mirror-12 ONLINE 0 0 0 > c1t5000C50055E65807d0 ONLINE 0 0 0 > c1t5000C50055F8BFA3d0 ONLINE 0 0 0 > mirror-13 ONLINE 0 0 0 > c1t5000C50055E579F7d0 ONLINE 0 0 0 > c1t5000C50055E65877d0 ONLINE 0 0 0 > mirror-14 ONLINE 0 0 0 > c1t5000C50055F9FA1Fd0 ONLINE 0 0 0 > c1t5000C50055F8CDA7d0 ONLINE 0 0 0 > mirror-15 ONLINE 0 0 0 > c1t5000C50055F8BF9Bd0 ONLINE 0 0 0 > c1t5000C50055F9A607d0 ONLINE 0 0 0 > mirror-16 ONLINE 0 0 0 > c1t5000C50055E66503d0 ONLINE 0 0 0 > c1t5000C50055E4FDE7d0 ONLINE 0 0 0 > mirror-17 ONLINE 0 0 0 > c1t5000C50055F8E017d0 ONLINE 0 0 0 > c1t5000C50055F9F3EBd0 ONLINE 0 0 0 > mirror-18 ONLINE 0 0 0 > c1t5000C50055F8B80Fd0 ONLINE 0 0 0 > c1t5000C50055F9F63Bd0 ONLINE 0 0 0 > mirror-19 ONLINE 0 0 0 > c1t5000C50055F84FB7d0 ONLINE 0 0 0 > c1t5000C50055F9FEABd0 ONLINE 0 0 0 > mirror-20 ONLINE 0 0 0 > c1t5000C50055F8CCAFd0 ONLINE 0 0 0 > c1t5000C50055F9F91Bd0 ONLINE 0 0 0 > mirror-21 ONLINE 0 0 0 > c1t5000C50055E65ABBd0 ONLINE 0 0 0 > c1t5000C50055F8905Fd0 ONLINE 0 0 0 > mirror-22 ONLINE 0 0 0 > c1t5000C50055E57A5Fd0 ONLINE 0 0 0 > c1t5000C50055F87E73d0 ONLINE 0 0 0 > mirror-23 ONLINE 0 0 0 > c1t5000C50055E66053d0 ONLINE 0 0 0 > c1t5000C50055E66B63d0 ONLINE 0 0 0 > mirror-24 ONLINE 0 0 0 > c1t5000C50055F8723Bd0 ONLINE 0 0 0 > c1t5000C50055F8C3ABd0 ONLINE 0 0 0 > logs > c2t5000A72A3007811Dd0 ONLINE 0 0 0 > cache > c2t500117310015D579d0 ONLINE 0 0 0 > 
c2t50011731001631FDd0 ONLINE 0 0 0 > c12t500117310015D59Ed0 ONLINE 0 0 0 > c12t500117310015D54Ed0 ONLINE 0 0 0 > spares > c1t5000C50055FA2AEFd0 AVAIL > c1t5000C50055E595B7d0 AVAIL > > errors: No known data errors > > --- > This is a ~90TB SAN on r151008, with 25 pairs of 4TB mirror drives. The last scrub I ran was about 3 months ago, which took (from my recollection) ~250 hours or so. I've only run about 4 scrubs so far on this installation. > > The current scrub has been running for 2 weeks, with no end in sight. The last time I saw an estimate, it said around ~650 hours remaining. The estimate is often very wrong, especially for busy systems. If this is an older ZFS implementation, this pool is likely getting pounded by the ZFS write throttle. There are some tunings that can be applied, but the old write throttle is not a stable control system, so it will always be a little bit unpredictable. > > This thread http://comments.gmane.org/gmane.os.solaris.opensolaris.zfs/46021 from over 3 years ago mention the metaslab_min_alloc_size as a way to improve this (reducing it to 4K from 10MB). Further reading into this property got me this Illumos bug: https://www.illumos.org/issues/54, which states "Turns out this tunable is made irrelevant as a result of a change to use the metaslab_df_ops allocator. We don't need to change it. I'm closing this bug.". So that seems like a dead end to me. dead end. > > This is the current load with scrub running (~350 VMs between Hyper-V and VMware environments): > > # iostat -xnze Unfortunately, this is the performance since boot and is not suitable for performance analysis unless the system has been rebooted in the past 10 minutes or so. You'll need to post the second batch from "iostat -zxCn 60 2" > extended device statistics ---- errors --- > r/s w/s kr/s kw/s wait actv wsvc_t asvc_t %w %b s/w h/w trn tot device > 0.4 12.5 39.7 78.8 0.1 0.0 5.0 0.1 0 0 0 0 0 0 rpool > 0.2 6.9 19.9 39.4 0.0 0.0 0.0 0.1 0 0 0 0 0 0 c4t0d0 > 0.2 6.8 19.9 39.4 0.0 0.0 0.0 0.1 0 0 0 0 0 0 c4t1d0 > 4.4 29.3 209.7 962.7 0.0 0.0 0.0 1.4 0 3 0 0 0 0 c1t5000C50055F8723Bd0 > 4.7 25.1 209.4 962.3 0.0 0.0 0.0 1.5 0 3 0 0 0 0 c1t5000C50055E66B63d0 > 4.7 27.6 208.3 952.7 0.0 0.0 0.0 1.3 0 3 0 0 0 0 c1t5000C50055F87E73d0 > 4.4 28.6 209.1 974.3 0.0 0.0 0.0 1.4 0 3 0 0 0 0 c1t5000C50055F8BFA3d0 > 4.4 28.9 208.3 964.5 0.0 0.0 0.0 1.4 0 3 0 0 0 0 c1t5000C50055F9E123d0 > 4.4 25.7 208.7 955.7 0.0 0.0 0.0 1.4 0 3 0 0 0 0 c1t5000C50055F9F0B3d0 > 4.4 26.5 209.1 960.9 0.0 0.0 0.0 1.4 0 3 0 0 0 0 c1t5000C50055F9D3B3d0 > 4.3 25.2 206.6 936.1 0.0 0.0 0.0 1.5 0 3 0 0 0 0 c1t5000C50055E4FDE7d0 > 4.4 26.9 208.1 982.6 0.0 0.0 0.0 1.4 0 3 0 0 0 0 c1t5000C50055F9A607d0 > 4.4 24.5 208.7 955.4 0.0 0.0 0.0 1.5 0 3 0 0 0 0 c1t5000C50055F8CDA7d0 > 4.3 26.5 207.8 943.8 0.0 0.0 0.0 1.4 0 3 0 0 0 0 c1t5000C50055E65877d0 > 4.4 27.7 208.0 961.1 0.0 0.0 0.0 1.4 0 3 0 0 0 0 c1t5000C50055F9E7D7d0 > 4.3 26.0 208.0 953.9 0.0 0.0 0.0 1.4 0 3 0 0 0 0 c1t5000C50055FA0AF7d0 > 4.3 26.1 208.0 966.2 0.0 0.0 0.0 1.4 0 3 0 0 0 0 c1t5000C50055F9FE87d0 > 4.4 28.5 208.6 965.3 0.0 0.0 0.0 1.4 0 3 0 0 0 0 c1t5000C50055F9F91Bd0 > 4.3 26.7 207.2 945.0 0.0 0.0 0.0 1.4 0 3 0 0 0 0 c1t5000C50055F9FEABd0 > 4.4 26.5 209.3 980.1 0.0 0.0 0.0 1.4 0 3 0 0 0 0 c1t5000C50055F9F63Bd0 > 4.3 26.1 207.6 944.3 0.0 0.0 0.0 1.5 0 3 0 0 0 0 c1t5000C50055F9F3EBd0 > 4.3 26.5 208.1 954.9 0.0 0.0 0.0 1.4 0 3 0 0 0 0 c1t5000C50055F9F80Bd0 > 32.5 14.7 1005.6 751.2 0.0 0.0 0.0 0.3 0 1 0 0 0 0 c2t500117310015D579d0 > 32.5 14.7 1004.1 751.2 0.0 0.0 0.0 0.3 0 1 0 0 0 0 
c2t50011731001631FDd0 > 0.0 180.8 0.0 16434.5 0.0 0.3 0.0 1.6 0 4 0 0 0 0 c2t5000A72A3007811Dd0 > 4.4 25.3 208.7 966.7 0.0 0.0 0.0 1.4 0 3 0 0 0 0 c1t5000C50055F9FB8Bd0 > 4.4 26.3 208.5 949.1 0.0 0.0 0.0 1.4 0 3 0 0 0 0 c1t5000C50055F9F92Bd0 > 4.4 29.7 208.6 975.1 0.0 0.0 0.0 1.3 0 3 0 0 0 0 c1t5000C50055F8905Fd0 > 4.4 25.7 207.9 954.1 0.0 0.0 0.0 1.4 0 3 0 0 0 0 c1t5000C50055F8D48Fd0 > 4.4 26.8 208.4 967.4 0.0 0.0 0.0 1.4 0 3 0 0 0 0 c1t5000C50055F9F89Fd0 > 4.4 28.5 208.1 964.9 0.0 0.0 0.0 1.4 0 3 0 0 0 0 c1t5000C50055F9EF2Fd0 > 4.4 29.4 209.5 962.7 0.0 0.0 0.0 1.4 0 3 0 0 0 0 c1t5000C50055F8C3ABd0 > 4.7 25.0 208.9 962.3 0.0 0.0 0.0 1.5 0 3 0 0 0 0 c1t5000C50055E66053d0 > 4.3 25.1 207.5 936.1 0.0 0.0 0.0 1.5 0 3 0 0 0 0 c1t5000C50055E66503d0 > 4.4 25.6 209.1 955.7 0.0 0.0 0.0 1.4 0 3 0 0 0 0 c1t5000C50055F9D3E3d0 > 4.3 26.6 207.4 945.0 0.0 0.0 0.0 1.4 0 3 0 0 0 0 c1t5000C50055F84FB7d0 > 4.3 26.0 207.5 944.3 0.0 0.0 0.0 1.5 0 3 0 0 0 0 c1t5000C50055F8E017d0 > 4.3 26.4 207.1 943.8 0.0 0.0 0.0 1.4 0 3 0 0 0 0 c1t5000C50055E579F7d0 > 4.4 28.5 208.8 974.3 0.0 0.0 0.0 1.4 0 3 0 0 0 0 c1t5000C50055E65807d0 > 4.4 25.9 208.5 953.9 0.0 0.0 0.0 1.4 0 3 0 0 0 0 c1t5000C50055F84A97d0 > 4.4 26.4 209.2 960.9 0.0 0.0 0.0 1.4 0 3 0 0 0 0 c1t5000C50055F87D97d0 > 4.4 28.5 208.8 964.9 0.0 0.0 0.0 1.4 0 3 0 0 0 0 c1t5000C50055F9F637d0 > 4.4 29.6 208.9 975.1 0.0 0.0 0.0 1.3 0 3 0 0 0 0 c1t5000C50055E65ABBd0 > 4.4 26.7 208.5 982.6 0.0 0.0 0.0 1.4 0 3 0 0 0 0 c1t5000C50055F8BF9Bd0 > 4.3 25.6 207.6 954.1 0.0 0.0 0.0 1.5 0 3 0 0 0 0 c1t5000C50055F8A22Bd0 > 4.4 27.6 208.2 961.1 0.0 0.0 0.0 1.4 0 3 0 0 0 0 c1t5000C50055F9379Bd0 > 4.7 27.6 208.3 952.8 0.0 0.0 0.0 1.3 0 3 0 0 0 0 c1t5000C50055E57A5Fd0 > 4.4 28.4 208.4 965.3 0.0 0.0 0.0 1.4 0 3 0 0 0 0 c1t5000C50055F8CCAFd0 > 4.4 26.4 208.9 980.1 0.0 0.0 0.0 1.5 0 3 0 0 0 0 c1t5000C50055F8B80Fd0 > 4.4 24.4 208.9 955.4 0.0 0.0 0.0 1.5 0 3 0 0 0 0 c1t5000C50055F9FA1Fd0 > 4.3 26.4 207.6 954.9 0.0 0.0 0.0 1.4 0 3 0 0 0 0 c1t5000C50055E65F0Fd0 > 4.4 28.8 208.3 964.5 0.0 0.0 0.0 1.4 0 3 0 0 0 0 c1t5000C50055F8BE3Fd0 > 4.3 26.7 207.4 967.4 0.0 0.0 0.0 1.4 0 3 0 0 0 0 c1t5000C50055F8B21Fd0 > 4.4 25.1 208.9 966.7 0.0 0.0 0.0 1.4 0 3 0 0 0 0 c1t5000C50055F8A46Fd0 > 4.4 26.0 209.7 966.2 0.0 0.0 0.0 1.4 0 3 0 0 0 0 c1t5000C50055F856CFd0 > 4.4 26.2 209.0 949.1 0.0 0.0 0.0 1.4 0 3 0 0 0 0 c1t5000C50055E6606Fd0 > 32.5 14.7 1004.3 750.9 0.0 0.0 0.0 0.3 0 1 0 0 0 0 c12t500117310015D59Ed0 > 32.5 14.7 1004.4 751.3 0.0 0.0 0.0 0.3 0 1 0 0 0 0 c12t500117310015D54Ed0 > 349.1 646.9 14437.7 67437.3 52.7 2.6 52.9 2.6 12 37 0 0 0 0 tank > > What should I be checking for? Is a scrub supposed to take that long (and I thought over 10 days for the last one was long..)? There doesn't seem to be any hardware errors. Is the load too high (12% wait, 37% busy with asvc_t of 2.6ms)? There are many variables here, the biggest of which is the current non-scrub load. -- richard -------------- next part -------------- An HTML attachment was scrubbed... URL: From tobi at oetiker.ch Tue Jul 29 15:50:02 2014 From: tobi at oetiker.ch (Tobias Oetiker) Date: Tue, 29 Jul 2014 17:50:02 +0200 (CEST) Subject: [OmniOS-discuss] announcement znapzend a new zfs backup tool Message-ID: Just out: ZnapZend a Multilevel Backuptool for ZFS It is on Github. 
Check out http://www.znapzend.org cheers tobi -- Tobi Oetiker, OETIKER+PARTNER AG, Aarweg 15 CH-4600 Olten, Switzerland www.oetiker.ch tobi at oetiker.ch +41 62 775 9902 From jesus at omniti.com Tue Jul 29 15:54:07 2014 From: jesus at omniti.com (Theo Schlossnagle) Date: Tue, 29 Jul 2014 11:54:07 -0400 Subject: [OmniOS-discuss] announcement znapzend a new zfs backup tool In-Reply-To: References: Message-ID: Awesome! On Tue, Jul 29, 2014 at 11:50 AM, Tobias Oetiker wrote: > Just out: > > ZnapZend a Multilevel Backuptool for ZFS > > It is on Github. Check out > > http://www.znapzend.org > > cheers > tobi > > -- > Tobi Oetiker, OETIKER+PARTNER AG, Aarweg 15 CH-4600 Olten, Switzerland > www.oetiker.ch tobi at oetiker.ch +41 62 775 9902 > > _______________________________________________ > OmniOS-discuss mailing list > OmniOS-discuss at lists.omniti.com > http://lists.omniti.com/mailman/listinfo/omnios-discuss > -- Theo Schlossnagle http://omniti.com/is/theo-schlossnagle -------------- next part -------------- An HTML attachment was scrubbed... URL: From skiselkov.ml at gmail.com Tue Jul 29 15:59:18 2014 From: skiselkov.ml at gmail.com (Saso Kiselkov) Date: Tue, 29 Jul 2014 17:59:18 +0200 Subject: [OmniOS-discuss] announcement znapzend a new zfs backup tool In-Reply-To: References: Message-ID: <53D7C4D6.5060308@gmail.com> On 7/29/14, 5:50 PM, Tobias Oetiker wrote: > Just out: > > ZnapZend a Multilevel Backuptool for ZFS > > It is on Github. Check out > > http://www.znapzend.org Neat, especially the feature that the backup config is part of a dataset's properties. Very cool. -- Saso From moo at wuffers.net Tue Jul 29 19:29:38 2014 From: moo at wuffers.net (wuffers) Date: Tue, 29 Jul 2014 15:29:38 -0400 Subject: [OmniOS-discuss] Slow scrub performance In-Reply-To: References: Message-ID: Going to try to answer both responses in one message.. Short answer, yes. ? Keep in mind that > > 1. a scrub runs in the background (so as not to impact production I/O, > this was not always the case and caused serious issues in the past with a > pool being unresponsive due to a scrub) > > 2. a scrub essentially walks the zpool examining every transaction in > order (as does a resilver) > > So the time to complete a scrub depends on how many write transactions > since the pool was created (which is generally related to the amount of > data but not always). You are limited by the random I/O capability of the > disks involved. With VMs I assume this is a file server, so the I/O size > will also affect performance. I haven't noticed any slowdowns in our virtual environments, so I guess that's a good thing it's so low priority that it doesn't impact workloads. Run the numbers? you are scanning 24.2TB at about 5.5MB/sec ? 4,613,734 > seconds or 54 days. And that assumes the same rate for all of the scan. The > rate will change as other I/O competes for resources. > The number was fluctuating when I started the scrub, and I had seen it go as high as 35MB/s at one point. I am certain that our Hyper-V workload has increased since the last scrub, so this does make sense. > Looks like you have a fair bit of activity going on (almost 1MB/sec of > writes per spindle). > As Richard correctly states below, this is the aggregate since boot (uptime ~56 days). I have another output from iostat as per his instructions below. > Since this is storage for VMs, I assume this is the storage server for > separate compute servers? Have you tuned the block size for the file share > you are using? 
That can make a huge difference in performance. > Both the Hyper-V and VMware LUNs are created with 64K block sizes. From what I've read of other performance and tuning articles, that is the optimal block size (I did some limited testing when first configuring the SAN, but results were somewhat inconclusive). Hyper-V hosts our testing environment (we integrate with TFS, a MS product, so we have no choice here) and probably make up the bulk of the workload (~300+ test VMs with various OSes). VMware hosts our production servers (Exchange, file servers, SQL, AD, etc - ~50+ VMs). I also noted that you only have a single LOG device. Best Practice is to > mirror log devices so you do not lose any data in flight if hit by a power > outage (of course, if this server has more UPS runtime that all the clients > that may not matter). > Actually, I do have a mirror ZIL device, it's just disabled at this time (my ZIL devices are ZeusRAMs). At some point, I was troubleshooting some kernel panics (turned out to be a faulty SSD on the rpool), and hadn't re-enabled it yet. Thanks for the reminder (and yes, we do have a UPS as well). And oops.. re-attaching the ZIL as a mirror triggered a resilver now, suspending or canceling the scrub? Will monitor this and restart the scrub if it doesn't by itself. pool: tank state: ONLINE status: One or more devices is currently being resilvered. The pool will continue to function, possibly in a degraded state. action: Wait for the resilver to complete. scan: resilver in progress since Tue Jul 29 14:48:48 2014 3.89T scanned out of 24.5T at 3.06G/s, 1h55m to go 0 resilvered, 15.84% done At least it's going very fast. EDIT: Now about 67% done as I finish writing this, speed dropping to ~1.3G/s. maybe, maybe not >> >> this is slower than most, surely slower than desired >> > Unfortunately reattaching the mirror to my log device triggered a resilver. Not sure if this is desired behavior, but yes, 5.5MB/s seems quite slow. Hopefully after the resilver the scrub will progress where it left off. > The estimate is often very wrong, especially for busy systems. >> If this is an older ZFS implementation, this pool is likely getting >> pounded by the >> ZFS write throttle. There are some tunings that can be applied, but the >> old write >> throttle is not a stable control system, so it will always be a little >> bit unpredictable. >> > The system is on r151008 (my BE states that I upgraded back in February, putting me in r151008j or so), with all the pools upgraded for the new enhancements as well as activating the new L2ARC compression feature. Reading the release notes, the ZFS write throttle enhancements were in since r151008e so I should be good there. > # iostat -xnze >> >> >> Unfortunately, this is the performance since boot and is not suitable for >> performance >> analysis unless the system has been rebooted in the past 10 minutes or >> so. You'll need >> to post the second batch from "iostat -zxCn 60 2" >> > Ah yes, that was my mistake. 
Output from second count (before re-attaching log mirror): # iostat -zxCn 60 2 extended device statistics r/s w/s kr/s kw/s wait actv wsvc_t asvc_t %w %b device 255.7 1077.7 6294.0 41335.1 0.0 1.9 0.0 1.4 0 153 c1 5.3 23.9 118.5 811.9 0.0 0.0 0.0 1.1 0 3 c1t5000C50055F8723Bd0 5.9 14.5 110.0 834.3 0.0 0.0 0.0 1.3 0 2 c1t5000C50055E66B63d0 5.6 16.6 123.8 822.7 0.0 0.0 0.0 1.3 0 2 c1t5000C50055F87E73d0 4.7 27.8 118.6 796.6 0.0 0.0 0.0 1.3 0 3 c1t5000C50055F8BFA3d0 5.6 14.5 139.7 833.8 0.0 0.0 0.0 1.6 0 3 c1t5000C50055F9E123d0 4.4 27.1 112.3 825.2 0.0 0.0 0.0 0.8 0 2 c1t5000C50055F9F0B3d0 5.0 20.2 121.7 803.4 0.0 0.0 0.0 1.2 0 3 c1t5000C50055F9D3B3d0 5.4 26.4 137.0 857.3 0.0 0.0 0.0 1.4 0 4 c1t5000C50055E4FDE7d0 4.7 12.3 123.7 832.7 0.0 0.0 0.0 2.0 0 3 c1t5000C50055F9A607d0 5.0 23.9 125.9 830.9 0.0 0.0 0.0 1.3 0 3 c1t5000C50055F8CDA7d0 4.5 31.4 112.2 814.6 0.0 0.0 0.0 1.1 0 3 c1t5000C50055E65877d0 5.2 24.4 130.6 872.5 0.0 0.0 0.0 1.2 0 3 c1t5000C50055F9E7D7d0 4.1 21.8 103.7 797.2 0.0 0.0 0.0 1.1 0 3 c1t5000C50055FA0AF7d0 5.5 24.8 129.8 802.8 0.0 0.0 0.0 1.5 0 4 c1t5000C50055F9FE87d0 5.7 17.7 137.2 797.6 0.0 0.0 0.0 1.4 0 3 c1t5000C50055F9F91Bd0 6.0 30.6 139.1 852.0 0.0 0.1 0.0 1.5 0 4 c1t5000C50055F9FEABd0 6.1 34.1 137.8 929.2 0.0 0.1 0.0 1.9 0 6 c1t5000C50055F9F63Bd0 4.1 15.9 101.8 791.4 0.0 0.0 0.0 1.6 0 3 c1t5000C50055F9F3EBd0 6.4 23.2 155.2 878.6 0.0 0.0 0.0 1.1 0 3 c1t5000C50055F9F80Bd0 4.5 23.5 106.2 825.4 0.0 0.0 0.0 1.1 0 3 c1t5000C50055F9FB8Bd0 4.0 23.2 101.1 788.9 0.0 0.0 0.0 1.3 0 3 c1t5000C50055F9F92Bd0 4.4 11.3 125.7 782.3 0.0 0.0 0.0 1.9 0 3 c1t5000C50055F8905Fd0 4.6 20.4 129.2 823.0 0.0 0.0 0.0 1.5 0 3 c1t5000C50055F8D48Fd0 5.1 19.7 142.9 887.2 0.0 0.0 0.0 1.7 0 3 c1t5000C50055F9F89Fd0 5.6 11.4 129.1 776.0 0.0 0.0 0.0 1.9 0 3 c1t5000C50055F9EF2Fd0 5.6 23.7 137.4 811.9 0.0 0.0 0.0 1.2 0 3 c1t5000C50055F8C3ABd0 6.8 13.9 132.4 834.3 0.0 0.0 0.0 1.8 0 3 c1t5000C50055E66053d0 5.2 26.7 126.9 857.3 0.0 0.0 0.0 1.2 0 3 c1t5000C50055E66503d0 4.2 27.1 104.6 825.2 0.0 0.0 0.0 1.0 0 3 c1t5000C50055F9D3E3d0 5.2 30.7 140.9 852.0 0.0 0.1 0.0 1.5 0 4 c1t5000C50055F84FB7d0 5.4 16.1 124.3 791.4 0.0 0.0 0.0 1.7 0 3 c1t5000C50055F8E017d0 3.8 31.4 89.7 814.6 0.0 0.0 0.0 1.1 0 4 c1t5000C50055E579F7d0 4.6 27.5 116.0 796.6 0.0 0.1 0.0 1.6 0 4 c1t5000C50055E65807d0 4.0 21.5 99.7 797.2 0.0 0.0 0.0 1.1 0 3 c1t5000C50055F84A97d0 4.7 20.2 116.3 803.4 0.0 0.0 0.0 1.4 0 3 c1t5000C50055F87D97d0 5.0 11.5 121.5 776.0 0.0 0.0 0.0 2.0 0 3 c1t5000C50055F9F637d0 4.9 11.3 112.4 782.3 0.0 0.0 0.0 2.3 0 3 c1t5000C50055E65ABBd0 5.3 11.8 142.5 832.7 0.0 0.0 0.0 2.4 0 3 c1t5000C50055F8BF9Bd0 5.0 20.3 121.4 823.0 0.0 0.0 0.0 1.7 0 3 c1t5000C50055F8A22Bd0 6.6 24.3 170.3 872.5 0.0 0.0 0.0 1.3 0 3 c1t5000C50055F9379Bd0 5.8 16.3 121.7 822.7 0.0 0.0 0.0 1.3 0 2 c1t5000C50055E57A5Fd0 5.3 17.7 146.5 797.6 0.0 0.0 0.0 1.4 0 3 c1t5000C50055F8CCAFd0 5.7 34.1 141.5 929.2 0.0 0.1 0.0 1.7 0 5 c1t5000C50055F8B80Fd0 5.5 23.8 125.7 830.9 0.0 0.0 0.0 1.2 0 3 c1t5000C50055F9FA1Fd0 5.0 23.2 127.9 878.6 0.0 0.0 0.0 1.1 0 3 c1t5000C50055E65F0Fd0 5.2 14.0 163.7 833.8 0.0 0.0 0.0 2.0 0 3 c1t5000C50055F8BE3Fd0 4.6 18.9 122.8 887.2 0.0 0.0 0.0 1.6 0 3 c1t5000C50055F8B21Fd0 5.5 23.6 137.4 825.4 0.0 0.0 0.0 1.5 0 3 c1t5000C50055F8A46Fd0 4.9 24.6 116.7 802.8 0.0 0.0 0.0 1.4 0 4 c1t5000C50055F856CFd0 4.9 23.4 120.8 788.9 0.0 0.0 0.0 1.4 0 3 c1t5000C50055E6606Fd0 234.9 170.1 4079.9 11127.8 0.0 0.2 0.0 0.5 0 9 c2 119.0 28.9 2083.8 670.8 0.0 0.0 0.0 0.3 0 3 c2t500117310015D579d0 115.9 27.4 1996.1 634.2 0.0 0.0 0.0 0.3 0 3 c2t50011731001631FDd0 0.0 113.8 
0.0 9822.8 0.0 0.1 0.0 1.0 0 2 c2t5000A72A3007811Dd0 0.1 18.5 0.0 64.8 0.0 0.0 0.0 0.0 0 0 c4 0.1 9.2 0.0 32.4 0.0 0.0 0.0 0.0 0 0 c4t0d0 0.0 9.2 0.0 32.4 0.0 0.0 0.0 0.0 0 0 c4t1d0 229.8 58.1 3987.4 1308.0 0.0 0.1 0.0 0.3 0 6 c12 114.2 27.7 1994.8 626.0 0.0 0.0 0.0 0.3 0 3 c12t500117310015D59Ed0 115.5 30.4 1992.6 682.0 0.0 0.0 0.0 0.3 0 3 c12t500117310015D54Ed0 0.1 17.1 0.0 64.8 0.0 0.0 0.6 0.1 0 0 rpool 720.3 1298.4 14361.2 53770.8 18.7 2.3 9.3 1.1 6 68 tank Is 153% busy correct on c1? Seems to me that disks are quite "busy", but are handling the workload just fine (wait at 6% and asvc_t at 1.1ms) Interestingly, this is the same output now that the resilver is running: extended device statistics r/s w/s kr/s kw/s wait actv wsvc_t asvc_t %w %b device 2876.9 1041.1 25400.7 38189.1 0.0 37.9 0.0 9.7 0 2011 c1 60.8 26.1 540.1 845.2 0.0 0.7 0.0 8.3 0 39 c1t5000C50055F8723Bd0 58.4 14.2 511.6 740.7 0.0 0.7 0.0 10.1 0 39 c1t5000C50055E66B63d0 60.2 16.3 529.3 756.1 0.0 0.8 0.0 10.1 0 41 c1t5000C50055F87E73d0 57.5 24.9 527.6 841.7 0.0 0.7 0.0 9.0 0 40 c1t5000C50055F8BFA3d0 57.9 14.5 543.5 765.1 0.0 0.7 0.0 9.8 0 38 c1t5000C50055F9E123d0 57.9 23.9 516.6 806.9 0.0 0.8 0.0 9.3 0 40 c1t5000C50055F9F0B3d0 59.8 24.6 554.1 857.5 0.0 0.8 0.0 9.6 0 42 c1t5000C50055F9D3B3d0 56.5 21.0 480.4 715.7 0.0 0.7 0.0 8.9 0 37 c1t5000C50055E4FDE7d0 54.8 9.7 473.5 737.9 0.0 0.7 0.0 11.2 0 39 c1t5000C50055F9A607d0 55.8 20.2 457.3 708.7 0.0 0.7 0.0 9.9 0 40 c1t5000C50055F8CDA7d0 57.8 28.6 487.0 796.1 0.0 0.9 0.0 9.9 0 45 c1t5000C50055E65877d0 60.8 27.1 572.6 823.7 0.0 0.8 0.0 8.8 0 41 c1t5000C50055F9E7D7d0 55.8 21.1 478.2 766.6 0.0 0.7 0.0 9.7 0 40 c1t5000C50055FA0AF7d0 57.0 22.8 528.3 724.5 0.0 0.8 0.0 9.6 0 41 c1t5000C50055F9FE87d0 56.2 10.8 465.2 715.6 0.0 0.7 0.0 10.4 0 38 c1t5000C50055F9F91Bd0 59.2 29.4 524.6 740.9 0.0 0.8 0.0 8.9 0 41 c1t5000C50055F9FEABd0 57.3 30.7 496.7 788.3 0.0 0.8 0.0 9.1 0 42 c1t5000C50055F9F63Bd0 55.5 16.3 461.9 652.9 0.0 0.7 0.0 10.1 0 39 c1t5000C50055F9F3EBd0 57.2 22.1 495.1 701.1 0.0 0.8 0.0 9.8 0 41 c1t5000C50055F9F80Bd0 59.5 30.2 543.1 741.8 0.0 0.9 0.0 9.6 0 45 c1t5000C50055F9FB8Bd0 56.5 25.1 515.4 786.9 0.0 0.7 0.0 8.6 0 38 c1t5000C50055F9F92Bd0 61.8 12.5 540.6 790.9 0.0 0.8 0.0 10.3 0 41 c1t5000C50055F8905Fd0 57.0 19.8 521.0 774.3 0.0 0.7 0.0 9.6 0 39 c1t5000C50055F8D48Fd0 56.3 16.3 517.7 724.7 0.0 0.7 0.0 9.9 0 38 c1t5000C50055F9F89Fd0 57.0 13.4 504.5 790.5 0.0 0.8 0.0 10.7 0 40 c1t5000C50055F9EF2Fd0 55.0 26.1 477.6 845.2 0.0 0.7 0.0 8.3 0 36 c1t5000C50055F8C3ABd0 57.8 14.1 518.7 740.7 0.0 0.8 0.0 10.8 0 41 c1t5000C50055E66053d0 55.9 20.8 490.2 715.7 0.0 0.7 0.0 9.0 0 37 c1t5000C50055E66503d0 57.0 24.1 509.7 806.9 0.0 0.8 0.0 10.0 0 41 c1t5000C50055F9D3E3d0 59.1 29.2 504.1 740.9 0.0 0.8 0.0 9.3 0 44 c1t5000C50055F84FB7d0 54.4 16.3 449.5 652.9 0.0 0.7 0.0 10.4 0 39 c1t5000C50055F8E017d0 57.8 28.4 503.3 796.1 0.0 0.9 0.0 10.1 0 45 c1t5000C50055E579F7d0 58.2 24.9 502.0 841.7 0.0 0.8 0.0 9.2 0 40 c1t5000C50055E65807d0 58.2 20.7 513.4 766.6 0.0 0.8 0.0 9.8 0 41 c1t5000C50055F84A97d0 56.5 24.9 508.0 857.5 0.0 0.8 0.0 9.2 0 40 c1t5000C50055F87D97d0 53.4 13.5 449.9 790.5 0.0 0.7 0.0 10.7 0 38 c1t5000C50055F9F637d0 57.0 11.8 503.0 790.9 0.0 0.7 0.0 10.6 0 39 c1t5000C50055E65ABBd0 55.4 9.6 461.1 737.9 0.0 0.8 0.0 11.6 0 40 c1t5000C50055F8BF9Bd0 55.7 19.7 484.6 774.3 0.0 0.7 0.0 9.9 0 40 c1t5000C50055F8A22Bd0 57.6 27.1 518.2 823.7 0.0 0.8 0.0 8.9 0 40 c1t5000C50055F9379Bd0 59.6 17.0 528.0 756.1 0.0 0.8 0.0 10.1 0 41 c1t5000C50055E57A5Fd0 61.2 10.8 530.0 715.6 0.0 0.8 0.0 10.7 0 40 
c1t5000C50055F8CCAFd0 58.0 30.8 493.3 788.3 0.0 0.8 0.0 9.4 0 43 c1t5000C50055F8B80Fd0 56.5 19.9 490.7 708.7 0.0 0.8 0.0 10.0 0 40 c1t5000C50055F9FA1Fd0 56.1 22.4 484.2 701.1 0.0 0.7 0.0 9.5 0 39 c1t5000C50055E65F0Fd0 59.2 14.6 560.9 765.1 0.0 0.7 0.0 9.8 0 39 c1t5000C50055F8BE3Fd0 57.9 16.2 546.0 724.7 0.0 0.7 0.0 10.1 0 40 c1t5000C50055F8B21Fd0 59.5 30.0 553.2 741.8 0.0 0.9 0.0 9.8 0 45 c1t5000C50055F8A46Fd0 57.4 22.5 504.0 724.5 0.0 0.8 0.0 9.6 0 41 c1t5000C50055F856CFd0 58.4 24.6 531.4 786.9 0.0 0.7 0.0 8.4 0 38 c1t5000C50055E6606Fd0 511.0 161.4 7572.1 11260.1 0.0 0.3 0.0 0.4 0 14 c2 252.3 20.1 3776.3 458.9 0.0 0.1 0.0 0.2 0 6 c2t500117310015D579d0 258.8 18.0 3795.7 350.0 0.0 0.1 0.0 0.2 0 6 c2t50011731001631FDd0 0.0 123.4 0.0 10451.1 0.0 0.1 0.0 1.0 0 3 c2t5000A72A3007811Dd0 0.2 16.1 1.9 56.7 0.0 0.0 0.0 0.0 0 0 c4 0.2 8.1 1.6 28.3 0.0 0.0 0.0 0.0 0 0 c4t0d0 0.0 8.1 0.3 28.3 0.0 0.0 0.0 0.0 0 0 c4t1d0 495.6 163.6 7168.9 11290.3 0.0 0.2 0.0 0.4 0 14 c12 0.0 123.4 0.0 10451.1 0.0 0.1 0.0 1.0 0 3 c12t5000A72B300780FFd0 248.2 18.1 3645.8 323.0 0.0 0.1 0.0 0.2 0 5 c12t500117310015D59Ed0 247.4 22.1 3523.1 516.2 0.0 0.1 0.0 0.2 0 6 c12t500117310015D54Ed0 0.2 14.8 1.9 56.7 0.0 0.0 0.6 0.1 0 0 rpool 3883.5 1357.7 40141.6 60739.5 22.8 38.6 4.4 7.4 54 100 tank It is very busy with alot of wait % and higher asvc_t (2011% busy on c1?!). I'm assuming resilvers are alot more aggressive than scrubs. There are many variables here, the biggest of which is the current >> non-scrub load. >> > I might have lost 2 weeks of scrub time, depending on whether the scrub will resume where it left off. I'll update when I can. -------------- next part -------------- An HTML attachment was scrubbed... URL: From henson at acm.org Tue Jul 29 20:32:18 2014 From: henson at acm.org (Paul B. Henson) Date: Tue, 29 Jul 2014 13:32:18 -0700 Subject: [OmniOS-discuss] LDAP TLS client services (on r151006) In-Reply-To: <0BB6AEA7-8454-447F-BE21-5B8B09E26188@homeshore.be> References: <0BB6AEA7-8454-447F-BE21-5B8B09E26188@homeshore.be> Message-ID: <1f7501cfab6c$336023b0$9a206b10$@acm.org> > From: Thierry Bingen > Sent: Monday, July 28, 2014 10:37 AM > > The native ldapsearch having been compiled without the DEBUG option, I > installed the OpenLDAP version of ldapsearch which lets you use the debug > options. The latter informed me that "TLS certificate verification: Error, self > signed certificate in certificate chain". I had installed the (private) CA > certificate in the NSS DB (cert8.db, key3.db, secmod.db) with certutil though. > I then replaced the TLS_CACERTDIR of the OpenLDAP ldap.conf pointing to > the NSS DB directory with a TLS_CACERT pointing directly to the CA > certificate PEM file, and, bingo, it worked! I don't believe openldap uses NSS format certificate databases, so pointing it at one is presumably doomed to failure regardless of the validity of the database. > I therefore suspect that there is something wrong with my NSS DB. I read > somewhere that it shouldn't be cert8.db but cert7.db. I also read the > opposite. Other than that, certutil seems happy with the contents of the NSS > DB. I am lost. As a point of reference, for both solaris and illumos I have successfully used cert8.db and key3.db format NSS certificate repositories. 
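For what it's worth, a minimal sketch of populating such an NSS DB for the client (the directory, nickname and PEM path below are only examples; adjust to wherever your client actually looks):

  certutil -N -d /var/ldap
  certutil -A -n "site-ca" -t "CT,," -i /path/to/ca.pem -d /var/ldap
  certutil -L -d /var/ldap

The trust flags are the usual stumbling block: the CA entry needs to be trusted for SSL (the "C" in the first field), and the DB files must be readable by whatever process is doing the lookups.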
From gearboxes at outlook.com Tue Jul 29 21:20:28 2014
From: gearboxes at outlook.com (Machine Man)
Date: Tue, 29 Jul 2014 17:20:28 -0400
Subject: [OmniOS-discuss] KVM - copy paste in VM
Message-ID:

Hello all,

First I want to thank everyone involved with creating and supporting OmniOS.

On one of the systems we have a need to run two VMs using KVM. Everything works fine except when a copy of a large file is made inside the VM. Copying over the network results in 38MB/s - 70MB/s, but inside the VM it will drop to 700KB/s and sometimes stall entirely. It takes just over 5 min to copy 2.5GB inside the VM. The system has 20 3TB NL-SAS drives and has no problem performing with VMware connected via FC. It also has a 280GB enterprise SSD allocated for cache.

For an HDD device, is IDE the only supported bus?

Thanks,
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From bfriesen at simple.dallas.tx.us Wed Jul 30 01:39:59 2014
From: bfriesen at simple.dallas.tx.us (Bob Friesenhahn)
Date: Tue, 29 Jul 2014 20:39:59 -0500 (CDT)
Subject: Re: [OmniOS-discuss] KVM - copy paste in VM
In-Reply-To:
References:
Message-ID:

On Tue, 29 Jul 2014, Machine Man wrote:
> Hello all,
> First I want to thank everyone involved with creating and supporting OmniOS.
>
> On one of the systems we have a need to run two VMs using KVM.
> Everything works fine except when a copy of a large file is made inside the VM. Copying over the network results
> in 38MB/s - 70MB/s, but inside the VM it will drop to 700KB/s and sometimes stall entirely. It takes just over 5
> min to copy 2.5GB inside the VM.
> > What direction is your network copy going (from VM to native server, > from native server to VM, from VM on one server to VM on another)? > > You have not described your situation very clearly to us at all. > > Bob > -- > Bob Friesenhahn > bfriesen at simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/ > GraphicsMagick Maintainer, http://www.GraphicsMagick.org/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From bfriesen at simple.dallas.tx.us Wed Jul 30 14:12:47 2014 From: bfriesen at simple.dallas.tx.us (Bob Friesenhahn) Date: Wed, 30 Jul 2014 09:12:47 -0500 (CDT) Subject: [OmniOS-discuss] KVM - copy paste in VM In-Reply-To: References: , Message-ID: On Wed, 30 Jul 2014, Machine Man wrote: > The problem was making a copy of a large file inside the VM (duplicating the file in the VM)It was very slow and > it looks like it is the ide device. > Changed the controller to virtio and it is much faster. > I tried using an image file for this disk, but this was slower than pointing to zvol on ide. ?I will test with > the controller set to virtio and file img for disk. > Overall,it is not bad now and the virtual machine is much more usable when making large copies, still seems slow > from what I would expect. Copy start off at 120MB/s and quickly drops down to 14MB/s and then jumps up and down > to about 30 or 40 a few times, but no longer stalls as before. What operating system do you have installed in your virtual machine and what filesystem are you using? Is the backing zfs volume blocksize properly matched with the blocksize of the virtual machine's blocksize? If the blocksize and offsets are not well matched, then performance would suffer quite a lot. The VM writes to the zfs volume would normally be synchronous writes (does not return util data is on disk), which will be slow unless you have added a zfs slog (perhaps with an SSD) to make synchronous writes faster. Bob -- Bob Friesenhahn bfriesen at simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/ GraphicsMagick Maintainer, http://www.GraphicsMagick.org/ From moo at wuffers.net Thu Jul 31 04:10:20 2014 From: moo at wuffers.net (wuffers) Date: Thu, 31 Jul 2014 00:10:20 -0400 Subject: [OmniOS-discuss] Slow scrub performance In-Reply-To: References: Message-ID: So as I suspected, I lost 2 weeks of scrub time after the resilver. 
I started a scrub again, and it's going extremely slow (~13x slower than before): pool: tank state: ONLINE scan: scrub in progress since Tue Jul 29 15:41:27 2014 45.4G scanned out of 24.5T at 413K/s, (scan is slow, no estimated time) 0 repaired, 0.18% done # iostat -zxCn 60 2 (2nd batch output) extended device statistics r/s w/s kr/s kw/s wait actv wsvc_t asvc_t %w %b device 143.7 1321.5 5149.0 46223.4 0.0 1.5 0.0 1.0 0 120 c1 2.4 33.3 72.0 897.5 0.0 0.0 0.0 0.6 0 2 c1t5000C50055F8723Bd0 2.7 22.8 82.9 1005.4 0.0 0.0 0.0 0.9 0 2 c1t5000C50055E66B63d0 2.2 24.4 73.1 917.7 0.0 0.0 0.0 0.7 0 2 c1t5000C50055F87E73d0 3.1 26.2 120.9 899.8 0.0 0.0 0.0 0.8 0 2 c1t5000C50055F8BFA3d0 2.8 16.5 105.9 941.6 0.0 0.0 0.0 1.0 0 2 c1t5000C50055F9E123d0 2.5 25.6 86.6 897.9 0.0 0.0 0.0 0.7 0 2 c1t5000C50055F9F0B3d0 2.3 19.9 85.3 967.8 0.0 0.0 0.0 1.2 0 2 c1t5000C50055F9D3B3d0 3.1 38.3 120.7 1053.1 0.0 0.0 0.0 0.8 0 3 c1t5000C50055E4FDE7d0 2.6 12.7 81.8 854.3 0.0 0.0 0.0 1.6 0 2 c1t5000C50055F9A607d0 3.2 25.0 121.7 871.7 0.0 0.0 0.0 0.8 0 2 c1t5000C50055F8CDA7d0 2.5 30.6 93.0 941.2 0.0 0.0 0.0 0.9 0 2 c1t5000C50055E65877d0 3.1 43.7 101.4 1004.2 0.0 0.0 0.0 1.0 0 4 c1t5000C50055F9E7D7d0 2.3 24.0 92.2 965.8 0.0 0.0 0.0 0.9 0 2 c1t5000C50055FA0AF7d0 2.5 25.3 99.2 872.9 0.0 0.0 0.0 0.8 0 2 c1t5000C50055F9FE87d0 2.9 19.0 116.1 894.8 0.0 0.0 0.0 1.2 0 2 c1t5000C50055F9F91Bd0 2.6 38.9 96.1 915.4 0.0 0.1 0.0 1.2 0 4 c1t5000C50055F9FEABd0 3.2 45.6 135.7 973.5 0.0 0.1 0.0 1.5 0 5 c1t5000C50055F9F63Bd0 3.1 21.2 105.9 966.6 0.0 0.0 0.0 1.0 0 2 c1t5000C50055F9F3EBd0 2.8 26.7 122.0 781.6 0.0 0.0 0.0 0.7 0 2 c1t5000C50055F9F80Bd0 3.1 31.6 119.9 932.5 0.0 0.0 0.0 1.1 0 3 c1t5000C50055F9FB8Bd0 3.1 32.5 123.3 924.1 0.0 0.0 0.0 0.9 0 3 c1t5000C50055F9F92Bd0 2.9 17.0 113.8 952.0 0.0 0.0 0.0 1.2 0 2 c1t5000C50055F8905Fd0 3.0 23.4 111.0 871.1 0.0 0.0 0.0 1.5 0 2 c1t5000C50055F8D48Fd0 2.8 21.4 105.5 858.0 0.0 0.0 0.0 1.0 0 2 c1t5000C50055F9F89Fd0 3.5 16.4 87.1 941.3 0.0 0.0 0.0 1.4 0 2 c1t5000C50055F9EF2Fd0 2.1 33.8 64.5 897.5 0.0 0.0 0.0 0.5 0 2 c1t5000C50055F8C3ABd0 3.0 21.8 72.3 1005.4 0.0 0.0 0.0 1.0 0 2 c1t5000C50055E66053d0 3.0 37.8 106.9 1053.5 0.0 0.0 0.0 0.9 0 3 c1t5000C50055E66503d0 2.7 26.0 107.7 897.9 0.0 0.0 0.0 0.7 0 2 c1t5000C50055F9D3E3d0 2.2 38.9 96.4 918.7 0.0 0.0 0.0 0.9 0 4 c1t5000C50055F84FB7d0 2.8 21.4 111.1 953.6 0.0 0.0 0.0 0.7 0 1 c1t5000C50055F8E017d0 3.0 30.6 104.3 940.9 0.0 0.1 0.0 1.5 0 3 c1t5000C50055E579F7d0 2.8 26.4 90.9 901.1 0.0 0.0 0.0 0.9 0 2 c1t5000C50055E65807d0 2.4 24.0 96.7 965.8 0.0 0.0 0.0 0.9 0 2 c1t5000C50055F84A97d0 2.9 19.8 109.4 967.8 0.0 0.0 0.0 1.1 0 2 c1t5000C50055F87D97d0 3.8 16.1 106.4 943.1 0.0 0.0 0.0 1.3 0 2 c1t5000C50055F9F637d0 2.2 17.1 72.7 966.6 0.0 0.0 0.0 1.4 0 2 c1t5000C50055E65ABBd0 2.7 12.7 86.0 863.3 0.0 0.0 0.0 1.5 0 2 c1t5000C50055F8BF9Bd0 2.7 23.2 101.8 871.1 0.0 0.0 0.0 1.0 0 2 c1t5000C50055F8A22Bd0 4.5 43.6 134.7 1004.2 0.0 0.0 0.0 1.0 0 4 c1t5000C50055F9379Bd0 2.8 24.0 87.9 917.7 0.0 0.0 0.0 0.8 0 2 c1t5000C50055E57A5Fd0 2.9 18.8 119.0 894.3 0.0 0.0 0.0 1.1 0 2 c1t5000C50055F8CCAFd0 3.4 45.7 128.1 976.8 0.0 0.1 0.0 1.2 0 5 c1t5000C50055F8B80Fd0 2.7 24.9 100.2 871.7 0.0 0.0 0.0 0.8 0 2 c1t5000C50055F9FA1Fd0 4.8 26.8 128.6 781.6 0.0 0.0 0.0 0.7 0 2 c1t5000C50055E65F0Fd0 2.7 16.3 109.5 941.6 0.0 0.0 0.0 1.1 0 2 c1t5000C50055F8BE3Fd0 3.1 21.1 119.9 858.0 0.0 0.0 0.0 1.1 0 2 c1t5000C50055F8B21Fd0 2.8 31.8 108.5 932.5 0.0 0.0 0.0 1.0 0 3 c1t5000C50055F8A46Fd0 2.4 25.3 87.4 872.9 0.0 0.0 0.0 0.8 0 2 c1t5000C50055F856CFd0 3.3 32.0 125.2 924.1 0.0 0.0 0.0 1.2 0 3 
c1t5000C50055E6606Fd0 289.9 169.0 3905.0 12754.1 0.0 0.2 0.0 0.4 0 10 c2 146.6 14.1 1987.9 305.2 0.0 0.0 0.0 0.2 0 4 c2t500117310015D579d0 143.4 10.6 1917.1 205.2 0.0 0.0 0.0 0.2 0 3 c2t50011731001631FDd0 0.0 144.3 0.0 12243.7 0.0 0.1 0.0 0.9 0 3 c2t5000A72A3007811Dd0 0.0 14.6 0.0 75.8 0.0 0.0 0.0 0.1 0 0 c4 0.0 7.3 0.0 37.9 0.0 0.0 0.0 0.1 0 0 c4t0d0 0.0 7.3 0.0 37.9 0.0 0.0 0.0 0.1 0 0 c4t1d0 284.8 171.5 3792.8 12786.2 0.0 0.2 0.0 0.4 0 10 c12 0.0 144.3 0.0 12243.7 0.0 0.1 0.0 0.9 0 3 c12t5000A72B300780FFd0 152.3 13.3 2004.6 255.9 0.0 0.0 0.0 0.2 0 4 c12t500117310015D59Ed0 132.5 13.9 1788.2 286.6 0.0 0.0 0.0 0.2 0 3 c12t500117310015D54Ed0 0.0 13.5 0.0 75.8 0.0 0.0 0.8 0.1 0 0 rpool 718.4 1653.5 12846.8 71761.5 34.0 2.0 14.3 0.8 7 51 tank This doesn't seem any busier than my earlier output (6% wait, 68% busy, asvc_t 1.1ms) and the dev team confirms that their workload hasn't changed for the past few days. If my math is right.. this will take ~719 days to complete. Anything I can tune to help speed this up? On Tue, Jul 29, 2014 at 3:29 PM, wuffers wrote: > Going to try to answer both responses in one message.. > > Short answer, yes. ? Keep in mind that >> >> 1. a scrub runs in the background (so as not to impact production I/O, >> this was not always the case and caused serious issues in the past with a >> pool being unresponsive due to a scrub) >> >> 2. a scrub essentially walks the zpool examining every transaction in >> order (as does a resilver) >> >> So the time to complete a scrub depends on how many write transactions >> since the pool was created (which is generally related to the amount of >> data but not always). You are limited by the random I/O capability of the >> disks involved. With VMs I assume this is a file server, so the I/O size >> will also affect performance. > > > I haven't noticed any slowdowns in our virtual environments, so I guess > that's a good thing it's so low priority that it doesn't impact workloads. > > Run the numbers? you are scanning 24.2TB at about 5.5MB/sec ? 4,613,734 >> seconds or 54 days. And that assumes the same rate for all of the scan. The >> rate will change as other I/O competes for resources. >> > > The number was fluctuating when I started the scrub, and I had seen it go > as high as 35MB/s at one point. I am certain that our Hyper-V workload has > increased since the last scrub, so this does make sense. > > >> Looks like you have a fair bit of activity going on (almost 1MB/sec of >> writes per spindle). >> > > As Richard correctly states below, this is the aggregate since boot > (uptime ~56 days). I have another output from iostat as per his > instructions below. > > >> Since this is storage for VMs, I assume this is the storage server for >> separate compute servers? Have you tuned the block size for the file share >> you are using? That can make a huge difference in performance. >> > > Both the Hyper-V and VMware LUNs are created with 64K block sizes. From > what I've read of other performance and tuning articles, that is the > optimal block size (I did some limited testing when first configuring the > SAN, but results were somewhat inconclusive). Hyper-V hosts our testing > environment (we integrate with TFS, a MS product, so we have no choice > here) and probably make up the bulk of the workload (~300+ test VMs with > various OSes). VMware hosts our production servers (Exchange, file servers, > SQL, AD, etc - ~50+ VMs). > > I also noted that you only have a single LOG device. 
Best Practice is to >> mirror log devices so you do not lose any data in flight if hit by a power >> outage (of course, if this server has more UPS runtime that all the clients >> that may not matter). >> > > Actually, I do have a mirror ZIL device, it's just disabled at this time > (my ZIL devices are ZeusRAMs). At some point, I was troubleshooting some > kernel panics (turned out to be a faulty SSD on the rpool), and hadn't > re-enabled it yet. Thanks for the reminder (and yes, we do have a UPS as > well). > > And oops.. re-attaching the ZIL as a mirror triggered a resilver now, > suspending or canceling the scrub? Will monitor this and restart the scrub > if it doesn't by itself. > > pool: tank > state: ONLINE > status: One or more devices is currently being resilvered. The pool will > continue to function, possibly in a degraded state. > action: Wait for the resilver to complete. > scan: resilver in progress since Tue Jul 29 14:48:48 2014 > 3.89T scanned out of 24.5T at 3.06G/s, 1h55m to go > 0 resilvered, 15.84% done > > At least it's going very fast. EDIT: Now about 67% done as I finish > writing this, speed dropping to ~1.3G/s. > > maybe, maybe not >>> >>> this is slower than most, surely slower than desired >>> >> > Unfortunately reattaching the mirror to my log device triggered a > resilver. Not sure if this is desired behavior, but yes, 5.5MB/s seems > quite slow. Hopefully after the resilver the scrub will progress where it > left off. > > >> The estimate is often very wrong, especially for busy systems. >>> If this is an older ZFS implementation, this pool is likely getting >>> pounded by the >>> ZFS write throttle. There are some tunings that can be applied, but the >>> old write >>> throttle is not a stable control system, so it will always be a little >>> bit unpredictable. >>> >> > The system is on r151008 (my BE states that I upgraded back in February, > putting me in r151008j or so), with all the pools upgraded for the new > enhancements as well as activating the new L2ARC compression feature. > Reading the release notes, the ZFS write throttle enhancements were in > since r151008e so I should be good there. > > >> # iostat -xnze >>> >>> >>> Unfortunately, this is the performance since boot and is not suitable >>> for performance >>> analysis unless the system has been rebooted in the past 10 minutes or >>> so. You'll need >>> to post the second batch from "iostat -zxCn 60 2" >>> >> > Ah yes, that was my mistake. 
Output from second count (before re-attaching > log mirror): > > # iostat -zxCn 60 2 > > extended device statistics > r/s w/s kr/s kw/s wait actv wsvc_t asvc_t %w %b device > 255.7 1077.7 6294.0 41335.1 0.0 1.9 0.0 1.4 0 153 c1 > 5.3 23.9 118.5 811.9 0.0 0.0 0.0 1.1 0 3 > c1t5000C50055F8723Bd0 > 5.9 14.5 110.0 834.3 0.0 0.0 0.0 1.3 0 2 > c1t5000C50055E66B63d0 > 5.6 16.6 123.8 822.7 0.0 0.0 0.0 1.3 0 2 > c1t5000C50055F87E73d0 > 4.7 27.8 118.6 796.6 0.0 0.0 0.0 1.3 0 3 > c1t5000C50055F8BFA3d0 > 5.6 14.5 139.7 833.8 0.0 0.0 0.0 1.6 0 3 > c1t5000C50055F9E123d0 > 4.4 27.1 112.3 825.2 0.0 0.0 0.0 0.8 0 2 > c1t5000C50055F9F0B3d0 > 5.0 20.2 121.7 803.4 0.0 0.0 0.0 1.2 0 3 > c1t5000C50055F9D3B3d0 > 5.4 26.4 137.0 857.3 0.0 0.0 0.0 1.4 0 4 > c1t5000C50055E4FDE7d0 > 4.7 12.3 123.7 832.7 0.0 0.0 0.0 2.0 0 3 > c1t5000C50055F9A607d0 > 5.0 23.9 125.9 830.9 0.0 0.0 0.0 1.3 0 3 > c1t5000C50055F8CDA7d0 > 4.5 31.4 112.2 814.6 0.0 0.0 0.0 1.1 0 3 > c1t5000C50055E65877d0 > 5.2 24.4 130.6 872.5 0.0 0.0 0.0 1.2 0 3 > c1t5000C50055F9E7D7d0 > 4.1 21.8 103.7 797.2 0.0 0.0 0.0 1.1 0 3 > c1t5000C50055FA0AF7d0 > 5.5 24.8 129.8 802.8 0.0 0.0 0.0 1.5 0 4 > c1t5000C50055F9FE87d0 > 5.7 17.7 137.2 797.6 0.0 0.0 0.0 1.4 0 3 > c1t5000C50055F9F91Bd0 > 6.0 30.6 139.1 852.0 0.0 0.1 0.0 1.5 0 4 > c1t5000C50055F9FEABd0 > 6.1 34.1 137.8 929.2 0.0 0.1 0.0 1.9 0 6 > c1t5000C50055F9F63Bd0 > 4.1 15.9 101.8 791.4 0.0 0.0 0.0 1.6 0 3 > c1t5000C50055F9F3EBd0 > 6.4 23.2 155.2 878.6 0.0 0.0 0.0 1.1 0 3 > c1t5000C50055F9F80Bd0 > 4.5 23.5 106.2 825.4 0.0 0.0 0.0 1.1 0 3 > c1t5000C50055F9FB8Bd0 > 4.0 23.2 101.1 788.9 0.0 0.0 0.0 1.3 0 3 > c1t5000C50055F9F92Bd0 > 4.4 11.3 125.7 782.3 0.0 0.0 0.0 1.9 0 3 > c1t5000C50055F8905Fd0 > 4.6 20.4 129.2 823.0 0.0 0.0 0.0 1.5 0 3 > c1t5000C50055F8D48Fd0 > 5.1 19.7 142.9 887.2 0.0 0.0 0.0 1.7 0 3 > c1t5000C50055F9F89Fd0 > 5.6 11.4 129.1 776.0 0.0 0.0 0.0 1.9 0 3 > c1t5000C50055F9EF2Fd0 > 5.6 23.7 137.4 811.9 0.0 0.0 0.0 1.2 0 3 > c1t5000C50055F8C3ABd0 > 6.8 13.9 132.4 834.3 0.0 0.0 0.0 1.8 0 3 > c1t5000C50055E66053d0 > 5.2 26.7 126.9 857.3 0.0 0.0 0.0 1.2 0 3 > c1t5000C50055E66503d0 > 4.2 27.1 104.6 825.2 0.0 0.0 0.0 1.0 0 3 > c1t5000C50055F9D3E3d0 > 5.2 30.7 140.9 852.0 0.0 0.1 0.0 1.5 0 4 > c1t5000C50055F84FB7d0 > 5.4 16.1 124.3 791.4 0.0 0.0 0.0 1.7 0 3 > c1t5000C50055F8E017d0 > 3.8 31.4 89.7 814.6 0.0 0.0 0.0 1.1 0 4 > c1t5000C50055E579F7d0 > 4.6 27.5 116.0 796.6 0.0 0.1 0.0 1.6 0 4 > c1t5000C50055E65807d0 > 4.0 21.5 99.7 797.2 0.0 0.0 0.0 1.1 0 3 > c1t5000C50055F84A97d0 > 4.7 20.2 116.3 803.4 0.0 0.0 0.0 1.4 0 3 > c1t5000C50055F87D97d0 > 5.0 11.5 121.5 776.0 0.0 0.0 0.0 2.0 0 3 > c1t5000C50055F9F637d0 > 4.9 11.3 112.4 782.3 0.0 0.0 0.0 2.3 0 3 > c1t5000C50055E65ABBd0 > 5.3 11.8 142.5 832.7 0.0 0.0 0.0 2.4 0 3 > c1t5000C50055F8BF9Bd0 > 5.0 20.3 121.4 823.0 0.0 0.0 0.0 1.7 0 3 > c1t5000C50055F8A22Bd0 > 6.6 24.3 170.3 872.5 0.0 0.0 0.0 1.3 0 3 > c1t5000C50055F9379Bd0 > 5.8 16.3 121.7 822.7 0.0 0.0 0.0 1.3 0 2 > c1t5000C50055E57A5Fd0 > 5.3 17.7 146.5 797.6 0.0 0.0 0.0 1.4 0 3 > c1t5000C50055F8CCAFd0 > 5.7 34.1 141.5 929.2 0.0 0.1 0.0 1.7 0 5 > c1t5000C50055F8B80Fd0 > 5.5 23.8 125.7 830.9 0.0 0.0 0.0 1.2 0 3 > c1t5000C50055F9FA1Fd0 > 5.0 23.2 127.9 878.6 0.0 0.0 0.0 1.1 0 3 > c1t5000C50055E65F0Fd0 > 5.2 14.0 163.7 833.8 0.0 0.0 0.0 2.0 0 3 > c1t5000C50055F8BE3Fd0 > 4.6 18.9 122.8 887.2 0.0 0.0 0.0 1.6 0 3 > c1t5000C50055F8B21Fd0 > 5.5 23.6 137.4 825.4 0.0 0.0 0.0 1.5 0 3 > c1t5000C50055F8A46Fd0 > 4.9 24.6 116.7 802.8 0.0 0.0 0.0 1.4 0 4 > c1t5000C50055F856CFd0 > 4.9 23.4 120.8 788.9 0.0 0.0 0.0 1.4 0 3 > 
c1t5000C50055E6606Fd0 > 234.9 170.1 4079.9 11127.8 0.0 0.2 0.0 0.5 0 9 c2 > 119.0 28.9 2083.8 670.8 0.0 0.0 0.0 0.3 0 3 > c2t500117310015D579d0 > 115.9 27.4 1996.1 634.2 0.0 0.0 0.0 0.3 0 3 > c2t50011731001631FDd0 > 0.0 113.8 0.0 9822.8 0.0 0.1 0.0 1.0 0 2 > c2t5000A72A3007811Dd0 > 0.1 18.5 0.0 64.8 0.0 0.0 0.0 0.0 0 0 c4 > 0.1 9.2 0.0 32.4 0.0 0.0 0.0 0.0 0 0 c4t0d0 > 0.0 9.2 0.0 32.4 0.0 0.0 0.0 0.0 0 0 c4t1d0 > 229.8 58.1 3987.4 1308.0 0.0 0.1 0.0 0.3 0 6 c12 > 114.2 27.7 1994.8 626.0 0.0 0.0 0.0 0.3 0 3 > c12t500117310015D59Ed0 > 115.5 30.4 1992.6 682.0 0.0 0.0 0.0 0.3 0 3 > c12t500117310015D54Ed0 > 0.1 17.1 0.0 64.8 0.0 0.0 0.6 0.1 0 0 rpool > 720.3 1298.4 14361.2 53770.8 18.7 2.3 9.3 1.1 6 68 tank > > Is 153% busy correct on c1? Seems to me that disks are quite "busy", but > are handling the workload just fine (wait at 6% and asvc_t at 1.1ms) > > Interestingly, this is the same output now that the resilver is running: > > extended device statistics > r/s w/s kr/s kw/s wait actv wsvc_t asvc_t %w %b device > 2876.9 1041.1 25400.7 38189.1 0.0 37.9 0.0 9.7 0 2011 c1 > 60.8 26.1 540.1 845.2 0.0 0.7 0.0 8.3 0 39 > c1t5000C50055F8723Bd0 > 58.4 14.2 511.6 740.7 0.0 0.7 0.0 10.1 0 39 > c1t5000C50055E66B63d0 > 60.2 16.3 529.3 756.1 0.0 0.8 0.0 10.1 0 41 > c1t5000C50055F87E73d0 > 57.5 24.9 527.6 841.7 0.0 0.7 0.0 9.0 0 40 > c1t5000C50055F8BFA3d0 > 57.9 14.5 543.5 765.1 0.0 0.7 0.0 9.8 0 38 > c1t5000C50055F9E123d0 > 57.9 23.9 516.6 806.9 0.0 0.8 0.0 9.3 0 40 > c1t5000C50055F9F0B3d0 > 59.8 24.6 554.1 857.5 0.0 0.8 0.0 9.6 0 42 > c1t5000C50055F9D3B3d0 > 56.5 21.0 480.4 715.7 0.0 0.7 0.0 8.9 0 37 > c1t5000C50055E4FDE7d0 > 54.8 9.7 473.5 737.9 0.0 0.7 0.0 11.2 0 39 > c1t5000C50055F9A607d0 > 55.8 20.2 457.3 708.7 0.0 0.7 0.0 9.9 0 40 > c1t5000C50055F8CDA7d0 > 57.8 28.6 487.0 796.1 0.0 0.9 0.0 9.9 0 45 > c1t5000C50055E65877d0 > 60.8 27.1 572.6 823.7 0.0 0.8 0.0 8.8 0 41 > c1t5000C50055F9E7D7d0 > 55.8 21.1 478.2 766.6 0.0 0.7 0.0 9.7 0 40 > c1t5000C50055FA0AF7d0 > 57.0 22.8 528.3 724.5 0.0 0.8 0.0 9.6 0 41 > c1t5000C50055F9FE87d0 > 56.2 10.8 465.2 715.6 0.0 0.7 0.0 10.4 0 38 > c1t5000C50055F9F91Bd0 > 59.2 29.4 524.6 740.9 0.0 0.8 0.0 8.9 0 41 > c1t5000C50055F9FEABd0 > 57.3 30.7 496.7 788.3 0.0 0.8 0.0 9.1 0 42 > c1t5000C50055F9F63Bd0 > 55.5 16.3 461.9 652.9 0.0 0.7 0.0 10.1 0 39 > c1t5000C50055F9F3EBd0 > 57.2 22.1 495.1 701.1 0.0 0.8 0.0 9.8 0 41 > c1t5000C50055F9F80Bd0 > 59.5 30.2 543.1 741.8 0.0 0.9 0.0 9.6 0 45 > c1t5000C50055F9FB8Bd0 > 56.5 25.1 515.4 786.9 0.0 0.7 0.0 8.6 0 38 > c1t5000C50055F9F92Bd0 > 61.8 12.5 540.6 790.9 0.0 0.8 0.0 10.3 0 41 > c1t5000C50055F8905Fd0 > 57.0 19.8 521.0 774.3 0.0 0.7 0.0 9.6 0 39 > c1t5000C50055F8D48Fd0 > 56.3 16.3 517.7 724.7 0.0 0.7 0.0 9.9 0 38 > c1t5000C50055F9F89Fd0 > 57.0 13.4 504.5 790.5 0.0 0.8 0.0 10.7 0 40 > c1t5000C50055F9EF2Fd0 > 55.0 26.1 477.6 845.2 0.0 0.7 0.0 8.3 0 36 > c1t5000C50055F8C3ABd0 > 57.8 14.1 518.7 740.7 0.0 0.8 0.0 10.8 0 41 > c1t5000C50055E66053d0 > 55.9 20.8 490.2 715.7 0.0 0.7 0.0 9.0 0 37 > c1t5000C50055E66503d0 > 57.0 24.1 509.7 806.9 0.0 0.8 0.0 10.0 0 41 > c1t5000C50055F9D3E3d0 > 59.1 29.2 504.1 740.9 0.0 0.8 0.0 9.3 0 44 > c1t5000C50055F84FB7d0 > 54.4 16.3 449.5 652.9 0.0 0.7 0.0 10.4 0 39 > c1t5000C50055F8E017d0 > 57.8 28.4 503.3 796.1 0.0 0.9 0.0 10.1 0 45 > c1t5000C50055E579F7d0 > 58.2 24.9 502.0 841.7 0.0 0.8 0.0 9.2 0 40 > c1t5000C50055E65807d0 > 58.2 20.7 513.4 766.6 0.0 0.8 0.0 9.8 0 41 > c1t5000C50055F84A97d0 > 56.5 24.9 508.0 857.5 0.0 0.8 0.0 9.2 0 40 > c1t5000C50055F87D97d0 > 53.4 13.5 449.9 790.5 0.0 0.7 0.0 10.7 0 
38 > c1t5000C50055F9F637d0 > 57.0 11.8 503.0 790.9 0.0 0.7 0.0 10.6 0 39 > c1t5000C50055E65ABBd0 > 55.4 9.6 461.1 737.9 0.0 0.8 0.0 11.6 0 40 > c1t5000C50055F8BF9Bd0 > 55.7 19.7 484.6 774.3 0.0 0.7 0.0 9.9 0 40 > c1t5000C50055F8A22Bd0 > 57.6 27.1 518.2 823.7 0.0 0.8 0.0 8.9 0 40 > c1t5000C50055F9379Bd0 > 59.6 17.0 528.0 756.1 0.0 0.8 0.0 10.1 0 41 > c1t5000C50055E57A5Fd0 > 61.2 10.8 530.0 715.6 0.0 0.8 0.0 10.7 0 40 > c1t5000C50055F8CCAFd0 > 58.0 30.8 493.3 788.3 0.0 0.8 0.0 9.4 0 43 > c1t5000C50055F8B80Fd0 > 56.5 19.9 490.7 708.7 0.0 0.8 0.0 10.0 0 40 > c1t5000C50055F9FA1Fd0 > 56.1 22.4 484.2 701.1 0.0 0.7 0.0 9.5 0 39 > c1t5000C50055E65F0Fd0 > 59.2 14.6 560.9 765.1 0.0 0.7 0.0 9.8 0 39 > c1t5000C50055F8BE3Fd0 > 57.9 16.2 546.0 724.7 0.0 0.7 0.0 10.1 0 40 > c1t5000C50055F8B21Fd0 > 59.5 30.0 553.2 741.8 0.0 0.9 0.0 9.8 0 45 > c1t5000C50055F8A46Fd0 > 57.4 22.5 504.0 724.5 0.0 0.8 0.0 9.6 0 41 > c1t5000C50055F856CFd0 > 58.4 24.6 531.4 786.9 0.0 0.7 0.0 8.4 0 38 > c1t5000C50055E6606Fd0 > 511.0 161.4 7572.1 11260.1 0.0 0.3 0.0 0.4 0 14 c2 > 252.3 20.1 3776.3 458.9 0.0 0.1 0.0 0.2 0 6 > c2t500117310015D579d0 > 258.8 18.0 3795.7 350.0 0.0 0.1 0.0 0.2 0 6 > c2t50011731001631FDd0 > 0.0 123.4 0.0 10451.1 0.0 0.1 0.0 1.0 0 3 > c2t5000A72A3007811Dd0 > 0.2 16.1 1.9 56.7 0.0 0.0 0.0 0.0 0 0 c4 > 0.2 8.1 1.6 28.3 0.0 0.0 0.0 0.0 0 0 c4t0d0 > 0.0 8.1 0.3 28.3 0.0 0.0 0.0 0.0 0 0 c4t1d0 > 495.6 163.6 7168.9 11290.3 0.0 0.2 0.0 0.4 0 14 c12 > 0.0 123.4 0.0 10451.1 0.0 0.1 0.0 1.0 0 3 > c12t5000A72B300780FFd0 > 248.2 18.1 3645.8 323.0 0.0 0.1 0.0 0.2 0 5 > c12t500117310015D59Ed0 > 247.4 22.1 3523.1 516.2 0.0 0.1 0.0 0.2 0 6 > c12t500117310015D54Ed0 > 0.2 14.8 1.9 56.7 0.0 0.0 0.6 0.1 0 0 rpool > 3883.5 1357.7 40141.6 60739.5 22.8 38.6 4.4 7.4 54 100 tank > > It is very busy with alot of wait % and higher asvc_t (2011% busy on > c1?!). I'm assuming resilvers are alot more aggressive than scrubs. > > There are many variables here, the biggest of which is the current >>> non-scrub load. >>> >> > I might have lost 2 weeks of scrub time, depending on whether the scrub > will resume where it left off. I'll update when I can. > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From richard.elling at richardelling.com Thu Jul 31 05:37:26 2014 From: richard.elling at richardelling.com (Richard Elling) Date: Wed, 30 Jul 2014 22:37:26 -0700 Subject: [OmniOS-discuss] Slow scrub performance In-Reply-To: References: Message-ID: apologies for the long post, data for big systems tends to do that, comments below... On Jul 30, 2014, at 9:10 PM, wuffers wrote: > So as I suspected, I lost 2 weeks of scrub time after the resilver. 
I started a scrub again, and it's going extremely slow (~13x slower than before): > > pool: tank > state: ONLINE > scan: scrub in progress since Tue Jul 29 15:41:27 2014 > 45.4G scanned out of 24.5T at 413K/s, (scan is slow, no estimated time) > 0 repaired, 0.18% done > > # iostat -zxCn 60 2 (2nd batch output) > > extended device statistics > r/s w/s kr/s kw/s wait actv wsvc_t asvc_t %w %b device > 143.7 1321.5 5149.0 46223.4 0.0 1.5 0.0 1.0 0 120 c1 > 2.4 33.3 72.0 897.5 0.0 0.0 0.0 0.6 0 2 c1t5000C50055F8723Bd0 > 2.7 22.8 82.9 1005.4 0.0 0.0 0.0 0.9 0 2 c1t5000C50055E66B63d0 > 2.2 24.4 73.1 917.7 0.0 0.0 0.0 0.7 0 2 c1t5000C50055F87E73d0 > 3.1 26.2 120.9 899.8 0.0 0.0 0.0 0.8 0 2 c1t5000C50055F8BFA3d0 > 2.8 16.5 105.9 941.6 0.0 0.0 0.0 1.0 0 2 c1t5000C50055F9E123d0 > 2.5 25.6 86.6 897.9 0.0 0.0 0.0 0.7 0 2 c1t5000C50055F9F0B3d0 > 2.3 19.9 85.3 967.8 0.0 0.0 0.0 1.2 0 2 c1t5000C50055F9D3B3d0 > 3.1 38.3 120.7 1053.1 0.0 0.0 0.0 0.8 0 3 c1t5000C50055E4FDE7d0 > 2.6 12.7 81.8 854.3 0.0 0.0 0.0 1.6 0 2 c1t5000C50055F9A607d0 > 3.2 25.0 121.7 871.7 0.0 0.0 0.0 0.8 0 2 c1t5000C50055F8CDA7d0 > 2.5 30.6 93.0 941.2 0.0 0.0 0.0 0.9 0 2 c1t5000C50055E65877d0 > 3.1 43.7 101.4 1004.2 0.0 0.0 0.0 1.0 0 4 c1t5000C50055F9E7D7d0 > 2.3 24.0 92.2 965.8 0.0 0.0 0.0 0.9 0 2 c1t5000C50055FA0AF7d0 > 2.5 25.3 99.2 872.9 0.0 0.0 0.0 0.8 0 2 c1t5000C50055F9FE87d0 > 2.9 19.0 116.1 894.8 0.0 0.0 0.0 1.2 0 2 c1t5000C50055F9F91Bd0 > 2.6 38.9 96.1 915.4 0.0 0.1 0.0 1.2 0 4 c1t5000C50055F9FEABd0 > 3.2 45.6 135.7 973.5 0.0 0.1 0.0 1.5 0 5 c1t5000C50055F9F63Bd0 > 3.1 21.2 105.9 966.6 0.0 0.0 0.0 1.0 0 2 c1t5000C50055F9F3EBd0 > 2.8 26.7 122.0 781.6 0.0 0.0 0.0 0.7 0 2 c1t5000C50055F9F80Bd0 > 3.1 31.6 119.9 932.5 0.0 0.0 0.0 1.1 0 3 c1t5000C50055F9FB8Bd0 > 3.1 32.5 123.3 924.1 0.0 0.0 0.0 0.9 0 3 c1t5000C50055F9F92Bd0 > 2.9 17.0 113.8 952.0 0.0 0.0 0.0 1.2 0 2 c1t5000C50055F8905Fd0 > 3.0 23.4 111.0 871.1 0.0 0.0 0.0 1.5 0 2 c1t5000C50055F8D48Fd0 > 2.8 21.4 105.5 858.0 0.0 0.0 0.0 1.0 0 2 c1t5000C50055F9F89Fd0 > 3.5 16.4 87.1 941.3 0.0 0.0 0.0 1.4 0 2 c1t5000C50055F9EF2Fd0 > 2.1 33.8 64.5 897.5 0.0 0.0 0.0 0.5 0 2 c1t5000C50055F8C3ABd0 > 3.0 21.8 72.3 1005.4 0.0 0.0 0.0 1.0 0 2 c1t5000C50055E66053d0 > 3.0 37.8 106.9 1053.5 0.0 0.0 0.0 0.9 0 3 c1t5000C50055E66503d0 > 2.7 26.0 107.7 897.9 0.0 0.0 0.0 0.7 0 2 c1t5000C50055F9D3E3d0 > 2.2 38.9 96.4 918.7 0.0 0.0 0.0 0.9 0 4 c1t5000C50055F84FB7d0 > 2.8 21.4 111.1 953.6 0.0 0.0 0.0 0.7 0 1 c1t5000C50055F8E017d0 > 3.0 30.6 104.3 940.9 0.0 0.1 0.0 1.5 0 3 c1t5000C50055E579F7d0 > 2.8 26.4 90.9 901.1 0.0 0.0 0.0 0.9 0 2 c1t5000C50055E65807d0 > 2.4 24.0 96.7 965.8 0.0 0.0 0.0 0.9 0 2 c1t5000C50055F84A97d0 > 2.9 19.8 109.4 967.8 0.0 0.0 0.0 1.1 0 2 c1t5000C50055F87D97d0 > 3.8 16.1 106.4 943.1 0.0 0.0 0.0 1.3 0 2 c1t5000C50055F9F637d0 > 2.2 17.1 72.7 966.6 0.0 0.0 0.0 1.4 0 2 c1t5000C50055E65ABBd0 > 2.7 12.7 86.0 863.3 0.0 0.0 0.0 1.5 0 2 c1t5000C50055F8BF9Bd0 > 2.7 23.2 101.8 871.1 0.0 0.0 0.0 1.0 0 2 c1t5000C50055F8A22Bd0 > 4.5 43.6 134.7 1004.2 0.0 0.0 0.0 1.0 0 4 c1t5000C50055F9379Bd0 > 2.8 24.0 87.9 917.7 0.0 0.0 0.0 0.8 0 2 c1t5000C50055E57A5Fd0 > 2.9 18.8 119.0 894.3 0.0 0.0 0.0 1.1 0 2 c1t5000C50055F8CCAFd0 > 3.4 45.7 128.1 976.8 0.0 0.1 0.0 1.2 0 5 c1t5000C50055F8B80Fd0 > 2.7 24.9 100.2 871.7 0.0 0.0 0.0 0.8 0 2 c1t5000C50055F9FA1Fd0 > 4.8 26.8 128.6 781.6 0.0 0.0 0.0 0.7 0 2 c1t5000C50055E65F0Fd0 > 2.7 16.3 109.5 941.6 0.0 0.0 0.0 1.1 0 2 c1t5000C50055F8BE3Fd0 > 3.1 21.1 119.9 858.0 0.0 0.0 0.0 1.1 0 2 c1t5000C50055F8B21Fd0 > 2.8 31.8 108.5 932.5 0.0 0.0 0.0 1.0 0 3 
c1t5000C50055F8A46Fd0 > 2.4 25.3 87.4 872.9 0.0 0.0 0.0 0.8 0 2 c1t5000C50055F856CFd0 > 3.3 32.0 125.2 924.1 0.0 0.0 0.0 1.2 0 3 c1t5000C50055E6606Fd0 > 289.9 169.0 3905.0 12754.1 0.0 0.2 0.0 0.4 0 10 c2 > 146.6 14.1 1987.9 305.2 0.0 0.0 0.0 0.2 0 4 c2t500117310015D579d0 > 143.4 10.6 1917.1 205.2 0.0 0.0 0.0 0.2 0 3 c2t50011731001631FDd0 > 0.0 144.3 0.0 12243.7 0.0 0.1 0.0 0.9 0 3 c2t5000A72A3007811Dd0 > 0.0 14.6 0.0 75.8 0.0 0.0 0.0 0.1 0 0 c4 > 0.0 7.3 0.0 37.9 0.0 0.0 0.0 0.1 0 0 c4t0d0 > 0.0 7.3 0.0 37.9 0.0 0.0 0.0 0.1 0 0 c4t1d0 > 284.8 171.5 3792.8 12786.2 0.0 0.2 0.0 0.4 0 10 c12 > 0.0 144.3 0.0 12243.7 0.0 0.1 0.0 0.9 0 3 c12t5000A72B300780FFd0 > 152.3 13.3 2004.6 255.9 0.0 0.0 0.0 0.2 0 4 c12t500117310015D59Ed0 > 132.5 13.9 1788.2 286.6 0.0 0.0 0.0 0.2 0 3 c12t500117310015D54Ed0 > 0.0 13.5 0.0 75.8 0.0 0.0 0.8 0.1 0 0 rpool > 718.4 1653.5 12846.8 71761.5 34.0 2.0 14.3 0.8 7 51 tank > > This doesn't seem any busier than my earlier output (6% wait, 68% busy, asvc_t 1.1ms) and the dev team confirms that their workload hasn't changed for the past few days. If my math is right.. this will take ~719 days to complete. The %busy for controllers is a sum of the %busy for all disks on the controller, so is can be large, but overall isn't interesting. With HDDs, there is no way you can saturate the controller, so we don't really care what the %busy shows. The more important item is that the number of read ops is fairly low for all but 4 disks. Since you didn't post the pool configuration, we can only guess that they might be a souce of the bottleneck. You're seeing a lot of reads from the cache devices. How much RAM does this system have? > > Anything I can tune to help speed this up? methinks the scrub I/Os are getting starved and since they are low priority, they could get very starved. In general, I wouldn't worry about it, but I understand why you might be nervous. Keep in mind that in ZFS scrubs are intended to find errors on idle data, not frequently accessed data. more far below... > > On Tue, Jul 29, 2014 at 3:29 PM, wuffers wrote: > Going to try to answer both responses in one message.. > > Short answer, yes. ? Keep in mind that > > 1. a scrub runs in the background (so as not to impact production I/O, this was not always the case and caused serious issues in the past with a pool being unresponsive due to a scrub) > > 2. a scrub essentially walks the zpool examining every transaction in order (as does a resilver) > > So the time to complete a scrub depends on how many write transactions since the pool was created (which is generally related to the amount of data but not always). You are limited by the random I/O capability of the disks involved. With VMs I assume this is a file server, so the I/O size will also affect performance. > > I haven't noticed any slowdowns in our virtual environments, so I guess that's a good thing it's so low priority that it doesn't impact workloads. > > Run the numbers? you are scanning 24.2TB at about 5.5MB/sec ? 4,613,734 seconds or 54 days. And that assumes the same rate for all of the scan. The rate will change as other I/O competes for resources. > > The number was fluctuating when I started the scrub, and I had seen it go as high as 35MB/s at one point. I am certain that our Hyper-V workload has increased since the last scrub, so this does make sense. > > Looks like you have a fair bit of activity going on (almost 1MB/sec of writes per spindle). > > As Richard correctly states below, this is the aggregate since boot (uptime ~56 days). 
I have another output from iostat as per his instructions below. > > Since this is storage for VMs, I assume this is the storage server for separate compute servers? Have you tuned the block size for the file share you are using? That can make a huge difference in performance. > > Both the Hyper-V and VMware LUNs are created with 64K block sizes. From what I've read of other performance and tuning articles, that is the optimal block size (I did some limited testing when first configuring the SAN, but results were somewhat inconclusive). Hyper-V hosts our testing environment (we integrate with TFS, a MS product, so we have no choice here) and probably make up the bulk of the workload (~300+ test VMs with various OSes). VMware hosts our production servers (Exchange, file servers, SQL, AD, etc - ~50+ VMs). > > I also noted that you only have a single LOG device. Best Practice is to mirror log devices so you do not lose any data in flight if hit by a power outage (of course, if this server has more UPS runtime that all the clients that may not matter). > > Actually, I do have a mirror ZIL device, it's just disabled at this time (my ZIL devices are ZeusRAMs). At some point, I was troubleshooting some kernel panics (turned out to be a faulty SSD on the rpool), and hadn't re-enabled it yet. Thanks for the reminder (and yes, we do have a UPS as well). > > And oops.. re-attaching the ZIL as a mirror triggered a resilver now, suspending or canceling the scrub? Will monitor this and restart the scrub if it doesn't by itself. > > pool: tank > state: ONLINE > status: One or more devices is currently being resilvered. The pool will > continue to function, possibly in a degraded state. > action: Wait for the resilver to complete. > scan: resilver in progress since Tue Jul 29 14:48:48 2014 > 3.89T scanned out of 24.5T at 3.06G/s, 1h55m to go > 0 resilvered, 15.84% done > > At least it's going very fast. EDIT: Now about 67% done as I finish writing this, speed dropping to ~1.3G/s. > > maybe, maybe not > > this is slower than most, surely slower than desired > > Unfortunately reattaching the mirror to my log device triggered a resilver. Not sure if this is desired behavior, but yes, 5.5MB/s seems quite slow. Hopefully after the resilver the scrub will progress where it left off. > > The estimate is often very wrong, especially for busy systems. > If this is an older ZFS implementation, this pool is likely getting pounded by the > ZFS write throttle. There are some tunings that can be applied, but the old write > throttle is not a stable control system, so it will always be a little bit unpredictable. > > The system is on r151008 (my BE states that I upgraded back in February, putting me in r151008j or so), with all the pools upgraded for the new enhancements as well as activating the new L2ARC compression feature. Reading the release notes, the ZFS write throttle enhancements were in since r151008e so I should be good there. > >> # iostat -xnze > > Unfortunately, this is the performance since boot and is not suitable for performance > analysis unless the system has been rebooted in the past 10 minutes or so. You'll need > to post the second batch from "iostat -zxCn 60 2" > > Ah yes, that was my mistake. 
Output from second count (before re-attaching log mirror): > > # iostat -zxCn 60 2 > > extended device statistics > r/s w/s kr/s kw/s wait actv wsvc_t asvc_t %w %b device > 255.7 1077.7 6294.0 41335.1 0.0 1.9 0.0 1.4 0 153 c1 > 5.3 23.9 118.5 811.9 0.0 0.0 0.0 1.1 0 3 c1t5000C50055F8723Bd0 > 5.9 14.5 110.0 834.3 0.0 0.0 0.0 1.3 0 2 c1t5000C50055E66B63d0 > 5.6 16.6 123.8 822.7 0.0 0.0 0.0 1.3 0 2 c1t5000C50055F87E73d0 > 4.7 27.8 118.6 796.6 0.0 0.0 0.0 1.3 0 3 c1t5000C50055F8BFA3d0 > 5.6 14.5 139.7 833.8 0.0 0.0 0.0 1.6 0 3 c1t5000C50055F9E123d0 > 4.4 27.1 112.3 825.2 0.0 0.0 0.0 0.8 0 2 c1t5000C50055F9F0B3d0 > 5.0 20.2 121.7 803.4 0.0 0.0 0.0 1.2 0 3 c1t5000C50055F9D3B3d0 > 5.4 26.4 137.0 857.3 0.0 0.0 0.0 1.4 0 4 c1t5000C50055E4FDE7d0 > 4.7 12.3 123.7 832.7 0.0 0.0 0.0 2.0 0 3 c1t5000C50055F9A607d0 > 5.0 23.9 125.9 830.9 0.0 0.0 0.0 1.3 0 3 c1t5000C50055F8CDA7d0 > 4.5 31.4 112.2 814.6 0.0 0.0 0.0 1.1 0 3 c1t5000C50055E65877d0 > 5.2 24.4 130.6 872.5 0.0 0.0 0.0 1.2 0 3 c1t5000C50055F9E7D7d0 > 4.1 21.8 103.7 797.2 0.0 0.0 0.0 1.1 0 3 c1t5000C50055FA0AF7d0 > 5.5 24.8 129.8 802.8 0.0 0.0 0.0 1.5 0 4 c1t5000C50055F9FE87d0 > 5.7 17.7 137.2 797.6 0.0 0.0 0.0 1.4 0 3 c1t5000C50055F9F91Bd0 > 6.0 30.6 139.1 852.0 0.0 0.1 0.0 1.5 0 4 c1t5000C50055F9FEABd0 > 6.1 34.1 137.8 929.2 0.0 0.1 0.0 1.9 0 6 c1t5000C50055F9F63Bd0 > 4.1 15.9 101.8 791.4 0.0 0.0 0.0 1.6 0 3 c1t5000C50055F9F3EBd0 > 6.4 23.2 155.2 878.6 0.0 0.0 0.0 1.1 0 3 c1t5000C50055F9F80Bd0 > 4.5 23.5 106.2 825.4 0.0 0.0 0.0 1.1 0 3 c1t5000C50055F9FB8Bd0 > 4.0 23.2 101.1 788.9 0.0 0.0 0.0 1.3 0 3 c1t5000C50055F9F92Bd0 > 4.4 11.3 125.7 782.3 0.0 0.0 0.0 1.9 0 3 c1t5000C50055F8905Fd0 > 4.6 20.4 129.2 823.0 0.0 0.0 0.0 1.5 0 3 c1t5000C50055F8D48Fd0 > 5.1 19.7 142.9 887.2 0.0 0.0 0.0 1.7 0 3 c1t5000C50055F9F89Fd0 > 5.6 11.4 129.1 776.0 0.0 0.0 0.0 1.9 0 3 c1t5000C50055F9EF2Fd0 > 5.6 23.7 137.4 811.9 0.0 0.0 0.0 1.2 0 3 c1t5000C50055F8C3ABd0 > 6.8 13.9 132.4 834.3 0.0 0.0 0.0 1.8 0 3 c1t5000C50055E66053d0 > 5.2 26.7 126.9 857.3 0.0 0.0 0.0 1.2 0 3 c1t5000C50055E66503d0 > 4.2 27.1 104.6 825.2 0.0 0.0 0.0 1.0 0 3 c1t5000C50055F9D3E3d0 > 5.2 30.7 140.9 852.0 0.0 0.1 0.0 1.5 0 4 c1t5000C50055F84FB7d0 > 5.4 16.1 124.3 791.4 0.0 0.0 0.0 1.7 0 3 c1t5000C50055F8E017d0 > 3.8 31.4 89.7 814.6 0.0 0.0 0.0 1.1 0 4 c1t5000C50055E579F7d0 > 4.6 27.5 116.0 796.6 0.0 0.1 0.0 1.6 0 4 c1t5000C50055E65807d0 > 4.0 21.5 99.7 797.2 0.0 0.0 0.0 1.1 0 3 c1t5000C50055F84A97d0 > 4.7 20.2 116.3 803.4 0.0 0.0 0.0 1.4 0 3 c1t5000C50055F87D97d0 > 5.0 11.5 121.5 776.0 0.0 0.0 0.0 2.0 0 3 c1t5000C50055F9F637d0 > 4.9 11.3 112.4 782.3 0.0 0.0 0.0 2.3 0 3 c1t5000C50055E65ABBd0 > 5.3 11.8 142.5 832.7 0.0 0.0 0.0 2.4 0 3 c1t5000C50055F8BF9Bd0 > 5.0 20.3 121.4 823.0 0.0 0.0 0.0 1.7 0 3 c1t5000C50055F8A22Bd0 > 6.6 24.3 170.3 872.5 0.0 0.0 0.0 1.3 0 3 c1t5000C50055F9379Bd0 > 5.8 16.3 121.7 822.7 0.0 0.0 0.0 1.3 0 2 c1t5000C50055E57A5Fd0 > 5.3 17.7 146.5 797.6 0.0 0.0 0.0 1.4 0 3 c1t5000C50055F8CCAFd0 > 5.7 34.1 141.5 929.2 0.0 0.1 0.0 1.7 0 5 c1t5000C50055F8B80Fd0 > 5.5 23.8 125.7 830.9 0.0 0.0 0.0 1.2 0 3 c1t5000C50055F9FA1Fd0 > 5.0 23.2 127.9 878.6 0.0 0.0 0.0 1.1 0 3 c1t5000C50055E65F0Fd0 > 5.2 14.0 163.7 833.8 0.0 0.0 0.0 2.0 0 3 c1t5000C50055F8BE3Fd0 > 4.6 18.9 122.8 887.2 0.0 0.0 0.0 1.6 0 3 c1t5000C50055F8B21Fd0 > 5.5 23.6 137.4 825.4 0.0 0.0 0.0 1.5 0 3 c1t5000C50055F8A46Fd0 > 4.9 24.6 116.7 802.8 0.0 0.0 0.0 1.4 0 4 c1t5000C50055F856CFd0 > 4.9 23.4 120.8 788.9 0.0 0.0 0.0 1.4 0 3 c1t5000C50055E6606Fd0 > 234.9 170.1 4079.9 11127.8 0.0 0.2 0.0 0.5 0 9 c2 > 119.0 28.9 2083.8 670.8 
0.0 0.0 0.0 0.3 0 3 c2t500117310015D579d0 > 115.9 27.4 1996.1 634.2 0.0 0.0 0.0 0.3 0 3 c2t50011731001631FDd0 > 0.0 113.8 0.0 9822.8 0.0 0.1 0.0 1.0 0 2 c2t5000A72A3007811Dd0 > 0.1 18.5 0.0 64.8 0.0 0.0 0.0 0.0 0 0 c4 > 0.1 9.2 0.0 32.4 0.0 0.0 0.0 0.0 0 0 c4t0d0 > 0.0 9.2 0.0 32.4 0.0 0.0 0.0 0.0 0 0 c4t1d0 > 229.8 58.1 3987.4 1308.0 0.0 0.1 0.0 0.3 0 6 c12 > 114.2 27.7 1994.8 626.0 0.0 0.0 0.0 0.3 0 3 c12t500117310015D59Ed0 > 115.5 30.4 1992.6 682.0 0.0 0.0 0.0 0.3 0 3 c12t500117310015D54Ed0 > 0.1 17.1 0.0 64.8 0.0 0.0 0.6 0.1 0 0 rpool > 720.3 1298.4 14361.2 53770.8 18.7 2.3 9.3 1.1 6 68 tank ok, so the pool is issuing 720 read iops, including resilver workload, vs 1298 write iops. There is plenty of I/O capacity left on the table here, as you can see by the %busy being so low. So I think the pool is not scheduling scrub I/Os very well. You can increase the number of scrub I/Os in the scheduler by adjusting the zfs_vdev_scrub_max_active tunable. The default is 2, but you'll have to consider that a share (in the stock market sense) where the active sync reads and writes are getting 10 each. You can try bumping up the value and see what happens over some time, perhaps 10 minutes or so -- too short of a time and you won't get a good feeling for the impact (try this in off-peak time). echo zfs_vdev_scrub_max_active/W0t5 | mdb -kw will change the value from 2 to 5, increasing its share of the total I/O workload. You can see the progress of scan (scrubs do scan) workload by looking at the ZFS debug messages. echo ::zfs_dbgmsg | mdb -k These will look mysterious... they are. But the interesting bits are about how many blocks are visited in some amount of time (txg sync interval). Ideally, this will change as you adjust zfs_vdev_scrub_max_active. -- richard > > Is 153% busy correct on c1? 
Seems to me that disks are quite "busy", but are handling the workload just fine (wait at 6% and asvc_t at 1.1ms) > > Interestingly, this is the same output now that the resilver is running: > > extended device statistics > r/s w/s kr/s kw/s wait actv wsvc_t asvc_t %w %b device > 2876.9 1041.1 25400.7 38189.1 0.0 37.9 0.0 9.7 0 2011 c1 > 60.8 26.1 540.1 845.2 0.0 0.7 0.0 8.3 0 39 c1t5000C50055F8723Bd0 > 58.4 14.2 511.6 740.7 0.0 0.7 0.0 10.1 0 39 c1t5000C50055E66B63d0 > 60.2 16.3 529.3 756.1 0.0 0.8 0.0 10.1 0 41 c1t5000C50055F87E73d0 > 57.5 24.9 527.6 841.7 0.0 0.7 0.0 9.0 0 40 c1t5000C50055F8BFA3d0 > 57.9 14.5 543.5 765.1 0.0 0.7 0.0 9.8 0 38 c1t5000C50055F9E123d0 > 57.9 23.9 516.6 806.9 0.0 0.8 0.0 9.3 0 40 c1t5000C50055F9F0B3d0 > 59.8 24.6 554.1 857.5 0.0 0.8 0.0 9.6 0 42 c1t5000C50055F9D3B3d0 > 56.5 21.0 480.4 715.7 0.0 0.7 0.0 8.9 0 37 c1t5000C50055E4FDE7d0 > 54.8 9.7 473.5 737.9 0.0 0.7 0.0 11.2 0 39 c1t5000C50055F9A607d0 > 55.8 20.2 457.3 708.7 0.0 0.7 0.0 9.9 0 40 c1t5000C50055F8CDA7d0 > 57.8 28.6 487.0 796.1 0.0 0.9 0.0 9.9 0 45 c1t5000C50055E65877d0 > 60.8 27.1 572.6 823.7 0.0 0.8 0.0 8.8 0 41 c1t5000C50055F9E7D7d0 > 55.8 21.1 478.2 766.6 0.0 0.7 0.0 9.7 0 40 c1t5000C50055FA0AF7d0 > 57.0 22.8 528.3 724.5 0.0 0.8 0.0 9.6 0 41 c1t5000C50055F9FE87d0 > 56.2 10.8 465.2 715.6 0.0 0.7 0.0 10.4 0 38 c1t5000C50055F9F91Bd0 > 59.2 29.4 524.6 740.9 0.0 0.8 0.0 8.9 0 41 c1t5000C50055F9FEABd0 > 57.3 30.7 496.7 788.3 0.0 0.8 0.0 9.1 0 42 c1t5000C50055F9F63Bd0 > 55.5 16.3 461.9 652.9 0.0 0.7 0.0 10.1 0 39 c1t5000C50055F9F3EBd0 > 57.2 22.1 495.1 701.1 0.0 0.8 0.0 9.8 0 41 c1t5000C50055F9F80Bd0 > 59.5 30.2 543.1 741.8 0.0 0.9 0.0 9.6 0 45 c1t5000C50055F9FB8Bd0 > 56.5 25.1 515.4 786.9 0.0 0.7 0.0 8.6 0 38 c1t5000C50055F9F92Bd0 > 61.8 12.5 540.6 790.9 0.0 0.8 0.0 10.3 0 41 c1t5000C50055F8905Fd0 > 57.0 19.8 521.0 774.3 0.0 0.7 0.0 9.6 0 39 c1t5000C50055F8D48Fd0 > 56.3 16.3 517.7 724.7 0.0 0.7 0.0 9.9 0 38 c1t5000C50055F9F89Fd0 > 57.0 13.4 504.5 790.5 0.0 0.8 0.0 10.7 0 40 c1t5000C50055F9EF2Fd0 > 55.0 26.1 477.6 845.2 0.0 0.7 0.0 8.3 0 36 c1t5000C50055F8C3ABd0 > 57.8 14.1 518.7 740.7 0.0 0.8 0.0 10.8 0 41 c1t5000C50055E66053d0 > 55.9 20.8 490.2 715.7 0.0 0.7 0.0 9.0 0 37 c1t5000C50055E66503d0 > 57.0 24.1 509.7 806.9 0.0 0.8 0.0 10.0 0 41 c1t5000C50055F9D3E3d0 > 59.1 29.2 504.1 740.9 0.0 0.8 0.0 9.3 0 44 c1t5000C50055F84FB7d0 > 54.4 16.3 449.5 652.9 0.0 0.7 0.0 10.4 0 39 c1t5000C50055F8E017d0 > 57.8 28.4 503.3 796.1 0.0 0.9 0.0 10.1 0 45 c1t5000C50055E579F7d0 > 58.2 24.9 502.0 841.7 0.0 0.8 0.0 9.2 0 40 c1t5000C50055E65807d0 > 58.2 20.7 513.4 766.6 0.0 0.8 0.0 9.8 0 41 c1t5000C50055F84A97d0 > 56.5 24.9 508.0 857.5 0.0 0.8 0.0 9.2 0 40 c1t5000C50055F87D97d0 > 53.4 13.5 449.9 790.5 0.0 0.7 0.0 10.7 0 38 c1t5000C50055F9F637d0 > 57.0 11.8 503.0 790.9 0.0 0.7 0.0 10.6 0 39 c1t5000C50055E65ABBd0 > 55.4 9.6 461.1 737.9 0.0 0.8 0.0 11.6 0 40 c1t5000C50055F8BF9Bd0 > 55.7 19.7 484.6 774.3 0.0 0.7 0.0 9.9 0 40 c1t5000C50055F8A22Bd0 > 57.6 27.1 518.2 823.7 0.0 0.8 0.0 8.9 0 40 c1t5000C50055F9379Bd0 > 59.6 17.0 528.0 756.1 0.0 0.8 0.0 10.1 0 41 c1t5000C50055E57A5Fd0 > 61.2 10.8 530.0 715.6 0.0 0.8 0.0 10.7 0 40 c1t5000C50055F8CCAFd0 > 58.0 30.8 493.3 788.3 0.0 0.8 0.0 9.4 0 43 c1t5000C50055F8B80Fd0 > 56.5 19.9 490.7 708.7 0.0 0.8 0.0 10.0 0 40 c1t5000C50055F9FA1Fd0 > 56.1 22.4 484.2 701.1 0.0 0.7 0.0 9.5 0 39 c1t5000C50055E65F0Fd0 > 59.2 14.6 560.9 765.1 0.0 0.7 0.0 9.8 0 39 c1t5000C50055F8BE3Fd0 > 57.9 16.2 546.0 724.7 0.0 0.7 0.0 10.1 0 40 c1t5000C50055F8B21Fd0 > 59.5 30.0 553.2 741.8 0.0 0.9 0.0 9.8 0 45 
c1t5000C50055F8A46Fd0 > 57.4 22.5 504.0 724.5 0.0 0.8 0.0 9.6 0 41 c1t5000C50055F856CFd0 > 58.4 24.6 531.4 786.9 0.0 0.7 0.0 8.4 0 38 c1t5000C50055E6606Fd0 > 511.0 161.4 7572.1 11260.1 0.0 0.3 0.0 0.4 0 14 c2 > 252.3 20.1 3776.3 458.9 0.0 0.1 0.0 0.2 0 6 c2t500117310015D579d0 > 258.8 18.0 3795.7 350.0 0.0 0.1 0.0 0.2 0 6 c2t50011731001631FDd0 > 0.0 123.4 0.0 10451.1 0.0 0.1 0.0 1.0 0 3 c2t5000A72A3007811Dd0 > 0.2 16.1 1.9 56.7 0.0 0.0 0.0 0.0 0 0 c4 > 0.2 8.1 1.6 28.3 0.0 0.0 0.0 0.0 0 0 c4t0d0 > 0.0 8.1 0.3 28.3 0.0 0.0 0.0 0.0 0 0 c4t1d0 > 495.6 163.6 7168.9 11290.3 0.0 0.2 0.0 0.4 0 14 c12 > 0.0 123.4 0.0 10451.1 0.0 0.1 0.0 1.0 0 3 c12t5000A72B300780FFd0 > 248.2 18.1 3645.8 323.0 0.0 0.1 0.0 0.2 0 5 c12t500117310015D59Ed0 > 247.4 22.1 3523.1 516.2 0.0 0.1 0.0 0.2 0 6 c12t500117310015D54Ed0 > 0.2 14.8 1.9 56.7 0.0 0.0 0.6 0.1 0 0 rpool > 3883.5 1357.7 40141.6 60739.5 22.8 38.6 4.4 7.4 54 100 tank > > It is very busy with alot of wait % and higher asvc_t (2011% busy on c1?!). I'm assuming resilvers are alot more aggressive than scrubs. > > There are many variables here, the biggest of which is the current non-scrub load. > > I might have lost 2 weeks of scrub time, depending on whether the scrub will resume where it left off. I'll update when I can. > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From richard.elling at richardelling.com Thu Jul 31 16:06:29 2014 From: richard.elling at richardelling.com (Richard Elling) Date: Thu, 31 Jul 2014 09:06:29 -0700 Subject: [OmniOS-discuss] Slow scrub performance In-Reply-To: References: Message-ID: <548A8D26-CE2B-4C3B-BBB8-661F6D7C8B49@richardelling.com> correction below... On Jul 30, 2014, at 10:37 PM, Richard Elling wrote: > apologies for the long post, data for big systems tends to do that, comments below... > > On Jul 30, 2014, at 9:10 PM, wuffers wrote: > >> So as I suspected, I lost 2 weeks of scrub time after the resilver. 
I started a scrub again, and it's going extremely slow (~13x slower than before): >> >> pool: tank >> state: ONLINE >> scan: scrub in progress since Tue Jul 29 15:41:27 2014 >> 45.4G scanned out of 24.5T at 413K/s, (scan is slow, no estimated time) >> 0 repaired, 0.18% done >> >> # iostat -zxCn 60 2 (2nd batch output) >> >> extended device statistics >> r/s w/s kr/s kw/s wait actv wsvc_t asvc_t %w %b device >> 143.7 1321.5 5149.0 46223.4 0.0 1.5 0.0 1.0 0 120 c1 >> 2.4 33.3 72.0 897.5 0.0 0.0 0.0 0.6 0 2 c1t5000C50055F8723Bd0 >> 2.7 22.8 82.9 1005.4 0.0 0.0 0.0 0.9 0 2 c1t5000C50055E66B63d0 >> 2.2 24.4 73.1 917.7 0.0 0.0 0.0 0.7 0 2 c1t5000C50055F87E73d0 >> 3.1 26.2 120.9 899.8 0.0 0.0 0.0 0.8 0 2 c1t5000C50055F8BFA3d0 >> 2.8 16.5 105.9 941.6 0.0 0.0 0.0 1.0 0 2 c1t5000C50055F9E123d0 >> 2.5 25.6 86.6 897.9 0.0 0.0 0.0 0.7 0 2 c1t5000C50055F9F0B3d0 >> 2.3 19.9 85.3 967.8 0.0 0.0 0.0 1.2 0 2 c1t5000C50055F9D3B3d0 >> 3.1 38.3 120.7 1053.1 0.0 0.0 0.0 0.8 0 3 c1t5000C50055E4FDE7d0 >> 2.6 12.7 81.8 854.3 0.0 0.0 0.0 1.6 0 2 c1t5000C50055F9A607d0 >> 3.2 25.0 121.7 871.7 0.0 0.0 0.0 0.8 0 2 c1t5000C50055F8CDA7d0 >> 2.5 30.6 93.0 941.2 0.0 0.0 0.0 0.9 0 2 c1t5000C50055E65877d0 >> 3.1 43.7 101.4 1004.2 0.0 0.0 0.0 1.0 0 4 c1t5000C50055F9E7D7d0 >> 2.3 24.0 92.2 965.8 0.0 0.0 0.0 0.9 0 2 c1t5000C50055FA0AF7d0 >> 2.5 25.3 99.2 872.9 0.0 0.0 0.0 0.8 0 2 c1t5000C50055F9FE87d0 >> 2.9 19.0 116.1 894.8 0.0 0.0 0.0 1.2 0 2 c1t5000C50055F9F91Bd0 >> 2.6 38.9 96.1 915.4 0.0 0.1 0.0 1.2 0 4 c1t5000C50055F9FEABd0 >> 3.2 45.6 135.7 973.5 0.0 0.1 0.0 1.5 0 5 c1t5000C50055F9F63Bd0 >> 3.1 21.2 105.9 966.6 0.0 0.0 0.0 1.0 0 2 c1t5000C50055F9F3EBd0 >> 2.8 26.7 122.0 781.6 0.0 0.0 0.0 0.7 0 2 c1t5000C50055F9F80Bd0 >> 3.1 31.6 119.9 932.5 0.0 0.0 0.0 1.1 0 3 c1t5000C50055F9FB8Bd0 >> 3.1 32.5 123.3 924.1 0.0 0.0 0.0 0.9 0 3 c1t5000C50055F9F92Bd0 >> 2.9 17.0 113.8 952.0 0.0 0.0 0.0 1.2 0 2 c1t5000C50055F8905Fd0 >> 3.0 23.4 111.0 871.1 0.0 0.0 0.0 1.5 0 2 c1t5000C50055F8D48Fd0 >> 2.8 21.4 105.5 858.0 0.0 0.0 0.0 1.0 0 2 c1t5000C50055F9F89Fd0 >> 3.5 16.4 87.1 941.3 0.0 0.0 0.0 1.4 0 2 c1t5000C50055F9EF2Fd0 >> 2.1 33.8 64.5 897.5 0.0 0.0 0.0 0.5 0 2 c1t5000C50055F8C3ABd0 >> 3.0 21.8 72.3 1005.4 0.0 0.0 0.0 1.0 0 2 c1t5000C50055E66053d0 >> 3.0 37.8 106.9 1053.5 0.0 0.0 0.0 0.9 0 3 c1t5000C50055E66503d0 >> 2.7 26.0 107.7 897.9 0.0 0.0 0.0 0.7 0 2 c1t5000C50055F9D3E3d0 >> 2.2 38.9 96.4 918.7 0.0 0.0 0.0 0.9 0 4 c1t5000C50055F84FB7d0 >> 2.8 21.4 111.1 953.6 0.0 0.0 0.0 0.7 0 1 c1t5000C50055F8E017d0 >> 3.0 30.6 104.3 940.9 0.0 0.1 0.0 1.5 0 3 c1t5000C50055E579F7d0 >> 2.8 26.4 90.9 901.1 0.0 0.0 0.0 0.9 0 2 c1t5000C50055E65807d0 >> 2.4 24.0 96.7 965.8 0.0 0.0 0.0 0.9 0 2 c1t5000C50055F84A97d0 >> 2.9 19.8 109.4 967.8 0.0 0.0 0.0 1.1 0 2 c1t5000C50055F87D97d0 >> 3.8 16.1 106.4 943.1 0.0 0.0 0.0 1.3 0 2 c1t5000C50055F9F637d0 >> 2.2 17.1 72.7 966.6 0.0 0.0 0.0 1.4 0 2 c1t5000C50055E65ABBd0 >> 2.7 12.7 86.0 863.3 0.0 0.0 0.0 1.5 0 2 c1t5000C50055F8BF9Bd0 >> 2.7 23.2 101.8 871.1 0.0 0.0 0.0 1.0 0 2 c1t5000C50055F8A22Bd0 >> 4.5 43.6 134.7 1004.2 0.0 0.0 0.0 1.0 0 4 c1t5000C50055F9379Bd0 >> 2.8 24.0 87.9 917.7 0.0 0.0 0.0 0.8 0 2 c1t5000C50055E57A5Fd0 >> 2.9 18.8 119.0 894.3 0.0 0.0 0.0 1.1 0 2 c1t5000C50055F8CCAFd0 >> 3.4 45.7 128.1 976.8 0.0 0.1 0.0 1.2 0 5 c1t5000C50055F8B80Fd0 >> 2.7 24.9 100.2 871.7 0.0 0.0 0.0 0.8 0 2 c1t5000C50055F9FA1Fd0 >> 4.8 26.8 128.6 781.6 0.0 0.0 0.0 0.7 0 2 c1t5000C50055E65F0Fd0 >> 2.7 16.3 109.5 941.6 0.0 0.0 0.0 1.1 0 2 c1t5000C50055F8BE3Fd0 >> 3.1 21.1 119.9 858.0 0.0 0.0 0.0 1.1 0 2 
c1t5000C50055F8B21Fd0 >> 2.8 31.8 108.5 932.5 0.0 0.0 0.0 1.0 0 3 c1t5000C50055F8A46Fd0 >> 2.4 25.3 87.4 872.9 0.0 0.0 0.0 0.8 0 2 c1t5000C50055F856CFd0 >> 3.3 32.0 125.2 924.1 0.0 0.0 0.0 1.2 0 3 c1t5000C50055E6606Fd0 >> 289.9 169.0 3905.0 12754.1 0.0 0.2 0.0 0.4 0 10 c2 >> 146.6 14.1 1987.9 305.2 0.0 0.0 0.0 0.2 0 4 c2t500117310015D579d0 >> 143.4 10.6 1917.1 205.2 0.0 0.0 0.0 0.2 0 3 c2t50011731001631FDd0 >> 0.0 144.3 0.0 12243.7 0.0 0.1 0.0 0.9 0 3 c2t5000A72A3007811Dd0 >> 0.0 14.6 0.0 75.8 0.0 0.0 0.0 0.1 0 0 c4 >> 0.0 7.3 0.0 37.9 0.0 0.0 0.0 0.1 0 0 c4t0d0 >> 0.0 7.3 0.0 37.9 0.0 0.0 0.0 0.1 0 0 c4t1d0 >> 284.8 171.5 3792.8 12786.2 0.0 0.2 0.0 0.4 0 10 c12 >> 0.0 144.3 0.0 12243.7 0.0 0.1 0.0 0.9 0 3 c12t5000A72B300780FFd0 >> 152.3 13.3 2004.6 255.9 0.0 0.0 0.0 0.2 0 4 c12t500117310015D59Ed0 >> 132.5 13.9 1788.2 286.6 0.0 0.0 0.0 0.2 0 3 c12t500117310015D54Ed0 >> 0.0 13.5 0.0 75.8 0.0 0.0 0.8 0.1 0 0 rpool >> 718.4 1653.5 12846.8 71761.5 34.0 2.0 14.3 0.8 7 51 tank >> >> This doesn't seem any busier than my earlier output (6% wait, 68% busy, asvc_t 1.1ms) and the dev team confirms that their workload hasn't changed for the past few days. If my math is right.. this will take ~719 days to complete. > > The %busy for controllers is a sum of the %busy for all disks on the controller, so > is can be large, but overall isn't interesting. With HDDs, there is no way you can > saturate the controller, so we don't really care what the %busy shows. > > The more important item is that the number of read ops is fairly low for all but 4 disks. > Since you didn't post the pool configuration, we can only guess that they might be a > souce of the bottleneck. the above paragraph missed the editor's cut. You did post the pool config, thanks! -- richard > > You're seeing a lot of reads from the cache devices. How much RAM does this system > have? > >> >> Anything I can tune to help speed this up? > > methinks the scrub I/Os are getting starved and since they are low priority, they > could get very starved. In general, I wouldn't worry about it, but I understand > why you might be nervous. Keep in mind that in ZFS scrubs are intended to find > errors on idle data, not frequently accessed data. > > more far below... > >> >> On Tue, Jul 29, 2014 at 3:29 PM, wuffers wrote: >> Going to try to answer both responses in one message.. >> >> Short answer, yes. ? Keep in mind that >> >> 1. a scrub runs in the background (so as not to impact production I/O, this was not always the case and caused serious issues in the past with a pool being unresponsive due to a scrub) >> >> 2. a scrub essentially walks the zpool examining every transaction in order (as does a resilver) >> >> So the time to complete a scrub depends on how many write transactions since the pool was created (which is generally related to the amount of data but not always). You are limited by the random I/O capability of the disks involved. With VMs I assume this is a file server, so the I/O size will also affect performance. >> >> I haven't noticed any slowdowns in our virtual environments, so I guess that's a good thing it's so low priority that it doesn't impact workloads. >> >> Run the numbers? you are scanning 24.2TB at about 5.5MB/sec ? 4,613,734 seconds or 54 days. And that assumes the same rate for all of the scan. The rate will change as other I/O competes for resources. >> >> The number was fluctuating when I started the scrub, and I had seen it go as high as 35MB/s at one point. 
I am certain that our Hyper-V workload has increased since the last scrub, so this does make sense. >> >> Looks like you have a fair bit of activity going on (almost 1MB/sec of writes per spindle). >> >> As Richard correctly states below, this is the aggregate since boot (uptime ~56 days). I have another output from iostat as per his instructions below. >> >> Since this is storage for VMs, I assume this is the storage server for separate compute servers? Have you tuned the block size for the file share you are using? That can make a huge difference in performance. >> >> Both the Hyper-V and VMware LUNs are created with 64K block sizes. From what I've read of other performance and tuning articles, that is the optimal block size (I did some limited testing when first configuring the SAN, but results were somewhat inconclusive). Hyper-V hosts our testing environment (we integrate with TFS, a MS product, so we have no choice here) and probably make up the bulk of the workload (~300+ test VMs with various OSes). VMware hosts our production servers (Exchange, file servers, SQL, AD, etc - ~50+ VMs). >> >> I also noted that you only have a single LOG device. Best Practice is to mirror log devices so you do not lose any data in flight if hit by a power outage (of course, if this server has more UPS runtime that all the clients that may not matter). >> >> Actually, I do have a mirror ZIL device, it's just disabled at this time (my ZIL devices are ZeusRAMs). At some point, I was troubleshooting some kernel panics (turned out to be a faulty SSD on the rpool), and hadn't re-enabled it yet. Thanks for the reminder (and yes, we do have a UPS as well). >> >> And oops.. re-attaching the ZIL as a mirror triggered a resilver now, suspending or canceling the scrub? Will monitor this and restart the scrub if it doesn't by itself. >> >> pool: tank >> state: ONLINE >> status: One or more devices is currently being resilvered. The pool will >> continue to function, possibly in a degraded state. >> action: Wait for the resilver to complete. >> scan: resilver in progress since Tue Jul 29 14:48:48 2014 >> 3.89T scanned out of 24.5T at 3.06G/s, 1h55m to go >> 0 resilvered, 15.84% done >> >> At least it's going very fast. EDIT: Now about 67% done as I finish writing this, speed dropping to ~1.3G/s. >> >> maybe, maybe not >> >> this is slower than most, surely slower than desired >> >> Unfortunately reattaching the mirror to my log device triggered a resilver. Not sure if this is desired behavior, but yes, 5.5MB/s seems quite slow. Hopefully after the resilver the scrub will progress where it left off. >> >> The estimate is often very wrong, especially for busy systems. >> If this is an older ZFS implementation, this pool is likely getting pounded by the >> ZFS write throttle. There are some tunings that can be applied, but the old write >> throttle is not a stable control system, so it will always be a little bit unpredictable. >> >> The system is on r151008 (my BE states that I upgraded back in February, putting me in r151008j or so), with all the pools upgraded for the new enhancements as well as activating the new L2ARC compression feature. Reading the release notes, the ZFS write throttle enhancements were in since r151008e so I should be good there. >> >>> # iostat -xnze >> >> Unfortunately, this is the performance since boot and is not suitable for performance >> analysis unless the system has been rebooted in the past 10 minutes or so. 
You'll need >> to post the second batch from "iostat -zxCn 60 2" >> >> Ah yes, that was my mistake. Output from second count (before re-attaching log mirror): >> >> # iostat -zxCn 60 2 >> >> extended device statistics >> r/s w/s kr/s kw/s wait actv wsvc_t asvc_t %w %b device >> 255.7 1077.7 6294.0 41335.1 0.0 1.9 0.0 1.4 0 153 c1 >> 5.3 23.9 118.5 811.9 0.0 0.0 0.0 1.1 0 3 c1t5000C50055F8723Bd0 >> 5.9 14.5 110.0 834.3 0.0 0.0 0.0 1.3 0 2 c1t5000C50055E66B63d0 >> 5.6 16.6 123.8 822.7 0.0 0.0 0.0 1.3 0 2 c1t5000C50055F87E73d0 >> 4.7 27.8 118.6 796.6 0.0 0.0 0.0 1.3 0 3 c1t5000C50055F8BFA3d0 >> 5.6 14.5 139.7 833.8 0.0 0.0 0.0 1.6 0 3 c1t5000C50055F9E123d0 >> 4.4 27.1 112.3 825.2 0.0 0.0 0.0 0.8 0 2 c1t5000C50055F9F0B3d0 >> 5.0 20.2 121.7 803.4 0.0 0.0 0.0 1.2 0 3 c1t5000C50055F9D3B3d0 >> 5.4 26.4 137.0 857.3 0.0 0.0 0.0 1.4 0 4 c1t5000C50055E4FDE7d0 >> 4.7 12.3 123.7 832.7 0.0 0.0 0.0 2.0 0 3 c1t5000C50055F9A607d0 >> 5.0 23.9 125.9 830.9 0.0 0.0 0.0 1.3 0 3 c1t5000C50055F8CDA7d0 >> 4.5 31.4 112.2 814.6 0.0 0.0 0.0 1.1 0 3 c1t5000C50055E65877d0 >> 5.2 24.4 130.6 872.5 0.0 0.0 0.0 1.2 0 3 c1t5000C50055F9E7D7d0 >> 4.1 21.8 103.7 797.2 0.0 0.0 0.0 1.1 0 3 c1t5000C50055FA0AF7d0 >> 5.5 24.8 129.8 802.8 0.0 0.0 0.0 1.5 0 4 c1t5000C50055F9FE87d0 >> 5.7 17.7 137.2 797.6 0.0 0.0 0.0 1.4 0 3 c1t5000C50055F9F91Bd0 >> 6.0 30.6 139.1 852.0 0.0 0.1 0.0 1.5 0 4 c1t5000C50055F9FEABd0 >> 6.1 34.1 137.8 929.2 0.0 0.1 0.0 1.9 0 6 c1t5000C50055F9F63Bd0 >> 4.1 15.9 101.8 791.4 0.0 0.0 0.0 1.6 0 3 c1t5000C50055F9F3EBd0 >> 6.4 23.2 155.2 878.6 0.0 0.0 0.0 1.1 0 3 c1t5000C50055F9F80Bd0 >> 4.5 23.5 106.2 825.4 0.0 0.0 0.0 1.1 0 3 c1t5000C50055F9FB8Bd0 >> 4.0 23.2 101.1 788.9 0.0 0.0 0.0 1.3 0 3 c1t5000C50055F9F92Bd0 >> 4.4 11.3 125.7 782.3 0.0 0.0 0.0 1.9 0 3 c1t5000C50055F8905Fd0 >> 4.6 20.4 129.2 823.0 0.0 0.0 0.0 1.5 0 3 c1t5000C50055F8D48Fd0 >> 5.1 19.7 142.9 887.2 0.0 0.0 0.0 1.7 0 3 c1t5000C50055F9F89Fd0 >> 5.6 11.4 129.1 776.0 0.0 0.0 0.0 1.9 0 3 c1t5000C50055F9EF2Fd0 >> 5.6 23.7 137.4 811.9 0.0 0.0 0.0 1.2 0 3 c1t5000C50055F8C3ABd0 >> 6.8 13.9 132.4 834.3 0.0 0.0 0.0 1.8 0 3 c1t5000C50055E66053d0 >> 5.2 26.7 126.9 857.3 0.0 0.0 0.0 1.2 0 3 c1t5000C50055E66503d0 >> 4.2 27.1 104.6 825.2 0.0 0.0 0.0 1.0 0 3 c1t5000C50055F9D3E3d0 >> 5.2 30.7 140.9 852.0 0.0 0.1 0.0 1.5 0 4 c1t5000C50055F84FB7d0 >> 5.4 16.1 124.3 791.4 0.0 0.0 0.0 1.7 0 3 c1t5000C50055F8E017d0 >> 3.8 31.4 89.7 814.6 0.0 0.0 0.0 1.1 0 4 c1t5000C50055E579F7d0 >> 4.6 27.5 116.0 796.6 0.0 0.1 0.0 1.6 0 4 c1t5000C50055E65807d0 >> 4.0 21.5 99.7 797.2 0.0 0.0 0.0 1.1 0 3 c1t5000C50055F84A97d0 >> 4.7 20.2 116.3 803.4 0.0 0.0 0.0 1.4 0 3 c1t5000C50055F87D97d0 >> 5.0 11.5 121.5 776.0 0.0 0.0 0.0 2.0 0 3 c1t5000C50055F9F637d0 >> 4.9 11.3 112.4 782.3 0.0 0.0 0.0 2.3 0 3 c1t5000C50055E65ABBd0 >> 5.3 11.8 142.5 832.7 0.0 0.0 0.0 2.4 0 3 c1t5000C50055F8BF9Bd0 >> 5.0 20.3 121.4 823.0 0.0 0.0 0.0 1.7 0 3 c1t5000C50055F8A22Bd0 >> 6.6 24.3 170.3 872.5 0.0 0.0 0.0 1.3 0 3 c1t5000C50055F9379Bd0 >> 5.8 16.3 121.7 822.7 0.0 0.0 0.0 1.3 0 2 c1t5000C50055E57A5Fd0 >> 5.3 17.7 146.5 797.6 0.0 0.0 0.0 1.4 0 3 c1t5000C50055F8CCAFd0 >> 5.7 34.1 141.5 929.2 0.0 0.1 0.0 1.7 0 5 c1t5000C50055F8B80Fd0 >> 5.5 23.8 125.7 830.9 0.0 0.0 0.0 1.2 0 3 c1t5000C50055F9FA1Fd0 >> 5.0 23.2 127.9 878.6 0.0 0.0 0.0 1.1 0 3 c1t5000C50055E65F0Fd0 >> 5.2 14.0 163.7 833.8 0.0 0.0 0.0 2.0 0 3 c1t5000C50055F8BE3Fd0 >> 4.6 18.9 122.8 887.2 0.0 0.0 0.0 1.6 0 3 c1t5000C50055F8B21Fd0 >> 5.5 23.6 137.4 825.4 0.0 0.0 0.0 1.5 0 3 c1t5000C50055F8A46Fd0 >> 4.9 24.6 116.7 802.8 0.0 0.0 0.0 1.4 0 4 
c1t5000C50055F856CFd0 >> 4.9 23.4 120.8 788.9 0.0 0.0 0.0 1.4 0 3 c1t5000C50055E6606Fd0 >> 234.9 170.1 4079.9 11127.8 0.0 0.2 0.0 0.5 0 9 c2 >> 119.0 28.9 2083.8 670.8 0.0 0.0 0.0 0.3 0 3 c2t500117310015D579d0 >> 115.9 27.4 1996.1 634.2 0.0 0.0 0.0 0.3 0 3 c2t50011731001631FDd0 >> 0.0 113.8 0.0 9822.8 0.0 0.1 0.0 1.0 0 2 c2t5000A72A3007811Dd0 >> 0.1 18.5 0.0 64.8 0.0 0.0 0.0 0.0 0 0 c4 >> 0.1 9.2 0.0 32.4 0.0 0.0 0.0 0.0 0 0 c4t0d0 >> 0.0 9.2 0.0 32.4 0.0 0.0 0.0 0.0 0 0 c4t1d0 >> 229.8 58.1 3987.4 1308.0 0.0 0.1 0.0 0.3 0 6 c12 >> 114.2 27.7 1994.8 626.0 0.0 0.0 0.0 0.3 0 3 c12t500117310015D59Ed0 >> 115.5 30.4 1992.6 682.0 0.0 0.0 0.0 0.3 0 3 c12t500117310015D54Ed0 >> 0.1 17.1 0.0 64.8 0.0 0.0 0.6 0.1 0 0 rpool >> 720.3 1298.4 14361.2 53770.8 18.7 2.3 9.3 1.1 6 68 tank > > ok, so the pool is issuing 720 read iops, including resilver workload, vs 1298 write iops. > There is plenty of I/O capacity left on the table here, as you can see by the %busy being > so low. > > So I think the pool is not scheduling scrub I/Os very well. You can increase the number of > scrub I/Os in the scheduler by adjusting the zfs_vdev_scrub_max_active tunable. The > default is 2, but you'll have to consider that a share (in the stock market sense) where > the active sync reads and writes are getting 10 each. You can try bumping up the value > and see what happens over some time, perhaps 10 minutes or so -- too short of a time > and you won't get a good feeling for the impact (try this in off-peak time). > echo zfs_vdev_scrub_max_active/W0t5 | mdb -kw > will change the value from 2 to 5, increasing its share of the total I/O workload. > > You can see the progress of scan (scrubs do scan) workload by looking at the ZFS > debug messages. > echo ::zfs_dbgmsg | mdb -k > These will look mysterious... they are. But the interesting bits are about how many blocks > are visited in some amount of time (txg sync interval). Ideally, this will change as you > adjust zfs_vdev_scrub_max_active. > -- richard > >> >> Is 153% busy correct on c1? 
Seems to me that disks are quite "busy", but are handling the workload just fine (wait at 6% and asvc_t at 1.1ms) >> >> Interestingly, this is the same output now that the resilver is running: >> >> extended device statistics >> r/s w/s kr/s kw/s wait actv wsvc_t asvc_t %w %b device >> 2876.9 1041.1 25400.7 38189.1 0.0 37.9 0.0 9.7 0 2011 c1 >> 60.8 26.1 540.1 845.2 0.0 0.7 0.0 8.3 0 39 c1t5000C50055F8723Bd0 >> 58.4 14.2 511.6 740.7 0.0 0.7 0.0 10.1 0 39 c1t5000C50055E66B63d0 >> 60.2 16.3 529.3 756.1 0.0 0.8 0.0 10.1 0 41 c1t5000C50055F87E73d0 >> 57.5 24.9 527.6 841.7 0.0 0.7 0.0 9.0 0 40 c1t5000C50055F8BFA3d0 >> 57.9 14.5 543.5 765.1 0.0 0.7 0.0 9.8 0 38 c1t5000C50055F9E123d0 >> 57.9 23.9 516.6 806.9 0.0 0.8 0.0 9.3 0 40 c1t5000C50055F9F0B3d0 >> 59.8 24.6 554.1 857.5 0.0 0.8 0.0 9.6 0 42 c1t5000C50055F9D3B3d0 >> 56.5 21.0 480.4 715.7 0.0 0.7 0.0 8.9 0 37 c1t5000C50055E4FDE7d0 >> 54.8 9.7 473.5 737.9 0.0 0.7 0.0 11.2 0 39 c1t5000C50055F9A607d0 >> 55.8 20.2 457.3 708.7 0.0 0.7 0.0 9.9 0 40 c1t5000C50055F8CDA7d0 >> 57.8 28.6 487.0 796.1 0.0 0.9 0.0 9.9 0 45 c1t5000C50055E65877d0 >> 60.8 27.1 572.6 823.7 0.0 0.8 0.0 8.8 0 41 c1t5000C50055F9E7D7d0 >> 55.8 21.1 478.2 766.6 0.0 0.7 0.0 9.7 0 40 c1t5000C50055FA0AF7d0 >> 57.0 22.8 528.3 724.5 0.0 0.8 0.0 9.6 0 41 c1t5000C50055F9FE87d0 >> 56.2 10.8 465.2 715.6 0.0 0.7 0.0 10.4 0 38 c1t5000C50055F9F91Bd0 >> 59.2 29.4 524.6 740.9 0.0 0.8 0.0 8.9 0 41 c1t5000C50055F9FEABd0 >> 57.3 30.7 496.7 788.3 0.0 0.8 0.0 9.1 0 42 c1t5000C50055F9F63Bd0 >> 55.5 16.3 461.9 652.9 0.0 0.7 0.0 10.1 0 39 c1t5000C50055F9F3EBd0 >> 57.2 22.1 495.1 701.1 0.0 0.8 0.0 9.8 0 41 c1t5000C50055F9F80Bd0 >> 59.5 30.2 543.1 741.8 0.0 0.9 0.0 9.6 0 45 c1t5000C50055F9FB8Bd0 >> 56.5 25.1 515.4 786.9 0.0 0.7 0.0 8.6 0 38 c1t5000C50055F9F92Bd0 >> 61.8 12.5 540.6 790.9 0.0 0.8 0.0 10.3 0 41 c1t5000C50055F8905Fd0 >> 57.0 19.8 521.0 774.3 0.0 0.7 0.0 9.6 0 39 c1t5000C50055F8D48Fd0 >> 56.3 16.3 517.7 724.7 0.0 0.7 0.0 9.9 0 38 c1t5000C50055F9F89Fd0 >> 57.0 13.4 504.5 790.5 0.0 0.8 0.0 10.7 0 40 c1t5000C50055F9EF2Fd0 >> 55.0 26.1 477.6 845.2 0.0 0.7 0.0 8.3 0 36 c1t5000C50055F8C3ABd0 >> 57.8 14.1 518.7 740.7 0.0 0.8 0.0 10.8 0 41 c1t5000C50055E66053d0 >> 55.9 20.8 490.2 715.7 0.0 0.7 0.0 9.0 0 37 c1t5000C50055E66503d0 >> 57.0 24.1 509.7 806.9 0.0 0.8 0.0 10.0 0 41 c1t5000C50055F9D3E3d0 >> 59.1 29.2 504.1 740.9 0.0 0.8 0.0 9.3 0 44 c1t5000C50055F84FB7d0 >> 54.4 16.3 449.5 652.9 0.0 0.7 0.0 10.4 0 39 c1t5000C50055F8E017d0 >> 57.8 28.4 503.3 796.1 0.0 0.9 0.0 10.1 0 45 c1t5000C50055E579F7d0 >> 58.2 24.9 502.0 841.7 0.0 0.8 0.0 9.2 0 40 c1t5000C50055E65807d0 >> 58.2 20.7 513.4 766.6 0.0 0.8 0.0 9.8 0 41 c1t5000C50055F84A97d0 >> 56.5 24.9 508.0 857.5 0.0 0.8 0.0 9.2 0 40 c1t5000C50055F87D97d0 >> 53.4 13.5 449.9 790.5 0.0 0.7 0.0 10.7 0 38 c1t5000C50055F9F637d0 >> 57.0 11.8 503.0 790.9 0.0 0.7 0.0 10.6 0 39 c1t5000C50055E65ABBd0 >> 55.4 9.6 461.1 737.9 0.0 0.8 0.0 11.6 0 40 c1t5000C50055F8BF9Bd0 >> 55.7 19.7 484.6 774.3 0.0 0.7 0.0 9.9 0 40 c1t5000C50055F8A22Bd0 >> 57.6 27.1 518.2 823.7 0.0 0.8 0.0 8.9 0 40 c1t5000C50055F9379Bd0 >> 59.6 17.0 528.0 756.1 0.0 0.8 0.0 10.1 0 41 c1t5000C50055E57A5Fd0 >> 61.2 10.8 530.0 715.6 0.0 0.8 0.0 10.7 0 40 c1t5000C50055F8CCAFd0 >> 58.0 30.8 493.3 788.3 0.0 0.8 0.0 9.4 0 43 c1t5000C50055F8B80Fd0 >> 56.5 19.9 490.7 708.7 0.0 0.8 0.0 10.0 0 40 c1t5000C50055F9FA1Fd0 >> 56.1 22.4 484.2 701.1 0.0 0.7 0.0 9.5 0 39 c1t5000C50055E65F0Fd0 >> 59.2 14.6 560.9 765.1 0.0 0.7 0.0 9.8 0 39 c1t5000C50055F8BE3Fd0 >> 57.9 16.2 546.0 724.7 0.0 0.7 0.0 10.1 0 40 
c1t5000C50055F8B21Fd0 >> 59.5 30.0 553.2 741.8 0.0 0.9 0.0 9.8 0 45 c1t5000C50055F8A46Fd0 >> 57.4 22.5 504.0 724.5 0.0 0.8 0.0 9.6 0 41 c1t5000C50055F856CFd0 >> 58.4 24.6 531.4 786.9 0.0 0.7 0.0 8.4 0 38 c1t5000C50055E6606Fd0 >> 511.0 161.4 7572.1 11260.1 0.0 0.3 0.0 0.4 0 14 c2 >> 252.3 20.1 3776.3 458.9 0.0 0.1 0.0 0.2 0 6 c2t500117310015D579d0 >> 258.8 18.0 3795.7 350.0 0.0 0.1 0.0 0.2 0 6 c2t50011731001631FDd0 >> 0.0 123.4 0.0 10451.1 0.0 0.1 0.0 1.0 0 3 c2t5000A72A3007811Dd0 >> 0.2 16.1 1.9 56.7 0.0 0.0 0.0 0.0 0 0 c4 >> 0.2 8.1 1.6 28.3 0.0 0.0 0.0 0.0 0 0 c4t0d0 >> 0.0 8.1 0.3 28.3 0.0 0.0 0.0 0.0 0 0 c4t1d0 >> 495.6 163.6 7168.9 11290.3 0.0 0.2 0.0 0.4 0 14 c12 >> 0.0 123.4 0.0 10451.1 0.0 0.1 0.0 1.0 0 3 c12t5000A72B300780FFd0 >> 248.2 18.1 3645.8 323.0 0.0 0.1 0.0 0.2 0 5 c12t500117310015D59Ed0 >> 247.4 22.1 3523.1 516.2 0.0 0.1 0.0 0.2 0 6 c12t500117310015D54Ed0 >> 0.2 14.8 1.9 56.7 0.0 0.0 0.6 0.1 0 0 rpool >> 3883.5 1357.7 40141.6 60739.5 22.8 38.6 4.4 7.4 54 100 tank >> >> It is very busy with alot of wait % and higher asvc_t (2011% busy on c1?!). I'm assuming resilvers are alot more aggressive than scrubs. >> >> There are many variables here, the biggest of which is the current non-scrub load. >> >> I might have lost 2 weeks of scrub time, depending on whether the scrub will resume where it left off. I'll update when I can. >> >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From moo at wuffers.net Thu Jul 31 19:44:45 2014 From: moo at wuffers.net (wuffers) Date: Thu, 31 Jul 2014 15:44:45 -0400 Subject: [OmniOS-discuss] Slow scrub performance In-Reply-To: References: Message-ID: This is going to be long winded as well (apologies!).. lots of pasted data. On Thu, Jul 31, 2014 at 1:37 AM, Richard Elling < richard.elling at richardelling.com> wrote: > > The %busy for controllers is a sum of the %busy for all disks on the > controller, so > is can be large, but overall isn't interesting. With HDDs, there is no way > you can > saturate the controller, so we don't really care what the %busy shows. > > The more important item is that the number of read ops is fairly low for > all but 4 disks. > Since you didn't post the pool configuration, we can only guess that they > might be a > souce of the bottleneck. > > You're seeing a lot of reads from the cache devices. How much RAM does > this system > have? > > I realized that the busy % was a sum after looking through some of that data, but good to know that it isn't very relevant. The pool configuration was in the original post, but here it is again (after re-attaching the mirror log device). Just saw your edit, but this has been updated from the original post anyways. 
  pool: tank
 state: ONLINE
  scan: scrub in progress since Tue Jul 29 15:41:27 2014
    82.5G scanned out of 24.5T at 555K/s, (scan is slow, no estimated time)
    0 repaired, 0.33% done
config:

        NAME                        STATE     READ WRITE CKSUM
        tank                        ONLINE       0     0     0
          mirror-0                  ONLINE       0     0     0
            c1t5000C50055F9F637d0   ONLINE       0     0     0
            c1t5000C50055F9EF2Fd0   ONLINE       0     0     0
          mirror-1                  ONLINE       0     0     0
            c1t5000C50055F87D97d0   ONLINE       0     0     0
            c1t5000C50055F9D3B3d0   ONLINE       0     0     0
          mirror-2                  ONLINE       0     0     0
            c1t5000C50055E6606Fd0   ONLINE       0     0     0
            c1t5000C50055F9F92Bd0   ONLINE       0     0     0
          mirror-3                  ONLINE       0     0     0
            c1t5000C50055F856CFd0   ONLINE       0     0     0
            c1t5000C50055F9FE87d0   ONLINE       0     0     0
          mirror-4                  ONLINE       0     0     0
            c1t5000C50055F84A97d0   ONLINE       0     0     0
            c1t5000C50055FA0AF7d0   ONLINE       0     0     0
          mirror-5                  ONLINE       0     0     0
            c1t5000C50055F9D3E3d0   ONLINE       0     0     0
            c1t5000C50055F9F0B3d0   ONLINE       0     0     0
          mirror-6                  ONLINE       0     0     0
            c1t5000C50055F8A46Fd0   ONLINE       0     0     0
            c1t5000C50055F9FB8Bd0   ONLINE       0     0     0
          mirror-7                  ONLINE       0     0     0
            c1t5000C50055F8B21Fd0   ONLINE       0     0     0
            c1t5000C50055F9F89Fd0   ONLINE       0     0     0
          mirror-8                  ONLINE       0     0     0
            c1t5000C50055F8BE3Fd0   ONLINE       0     0     0
            c1t5000C50055F9E123d0   ONLINE       0     0     0
          mirror-9                  ONLINE       0     0     0
            c1t5000C50055F9379Bd0   ONLINE       0     0     0
            c1t5000C50055F9E7D7d0   ONLINE       0     0     0
          mirror-10                 ONLINE       0     0     0
            c1t5000C50055E65F0Fd0   ONLINE       0     0     0
            c1t5000C50055F9F80Bd0   ONLINE       0     0     0
          mirror-11                 ONLINE       0     0     0
            c1t5000C50055F8A22Bd0   ONLINE       0     0     0
            c1t5000C50055F8D48Fd0   ONLINE       0     0     0
          mirror-12                 ONLINE       0     0     0
            c1t5000C50055E65807d0   ONLINE       0     0     0
            c1t5000C50055F8BFA3d0   ONLINE       0     0     0
          mirror-13                 ONLINE       0     0     0
            c1t5000C50055E579F7d0   ONLINE       0     0     0
            c1t5000C50055E65877d0   ONLINE       0     0     0
          mirror-14                 ONLINE       0     0     0
            c1t5000C50055F9FA1Fd0   ONLINE       0     0     0
            c1t5000C50055F8CDA7d0   ONLINE       0     0     0
          mirror-15                 ONLINE       0     0     0
            c1t5000C50055F8BF9Bd0   ONLINE       0     0     0
            c1t5000C50055F9A607d0   ONLINE       0     0     0
          mirror-16                 ONLINE       0     0     0
            c1t5000C50055E66503d0   ONLINE       0     0     0
            c1t5000C50055E4FDE7d0   ONLINE       0     0     0
          mirror-17                 ONLINE       0     0     0
            c1t5000C50055F8E017d0   ONLINE       0     0     0
            c1t5000C50055F9F3EBd0   ONLINE       0     0     0
          mirror-18                 ONLINE       0     0     0
            c1t5000C50055F8B80Fd0   ONLINE       0     0     0
            c1t5000C50055F9F63Bd0   ONLINE       0     0     0
          mirror-19                 ONLINE       0     0     0
            c1t5000C50055F84FB7d0   ONLINE       0     0     0
            c1t5000C50055F9FEABd0   ONLINE       0     0     0
          mirror-20                 ONLINE       0     0     0
            c1t5000C50055F8CCAFd0   ONLINE       0     0     0
            c1t5000C50055F9F91Bd0   ONLINE       0     0     0
          mirror-21                 ONLINE       0     0     0
            c1t5000C50055E65ABBd0   ONLINE       0     0     0
            c1t5000C50055F8905Fd0   ONLINE       0     0     0
          mirror-22                 ONLINE       0     0     0
            c1t5000C50055E57A5Fd0   ONLINE       0     0     0
            c1t5000C50055F87E73d0   ONLINE       0     0     0
          mirror-23                 ONLINE       0     0     0
            c1t5000C50055E66053d0   ONLINE       0     0     0
            c1t5000C50055E66B63d0   ONLINE       0     0     0
          mirror-24                 ONLINE       0     0     0
            c1t5000C50055F8723Bd0   ONLINE       0     0     0
            c1t5000C50055F8C3ABd0   ONLINE       0     0     0
        logs
          mirror-25                 ONLINE       0     0     0
            c2t5000A72A3007811Dd0   ONLINE       0     0     0
            c12t5000A72B300780FFd0  ONLINE       0     0     0
        cache
          c2t500117310015D579d0     ONLINE       0     0     0
          c2t50011731001631FDd0     ONLINE       0     0     0
          c12t500117310015D59Ed0    ONLINE       0     0     0
          c12t500117310015D54Ed0    ONLINE       0     0     0
        spares
          c1t5000C50055FA2AEFd0     AVAIL
          c1t5000C50055E595B7d0     AVAIL

Basically, this is 2 head nodes (Supermicro 826BE26) connected to a Supermicro 847E26 JBOD, using LSI 9207s. There are 52 Seagate ST4000NM0023s (4TB SAS drives) in 25 mirror pairs plus 2 which are spares. There are 4 Smart Optimus 400GB SSDs as cache drives, and 2 Stec ZeusRAMs for slogs. They're wired in such a way that both nodes can see all the drives (data, cache and log), and the data drives are on separate controllers than the cache/slog devices. RSF-1 was also specced in here but not in use at the moment. All the SAN traffic is through InfiniBand (SRP). Each head unit has 256GB of RAM. Dedupe is not in use and all the latest feature flags are enabled.
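As a quick cross-check on the memory numbers, the raw ARC kstats should tell the same story as arc_summary below -- something like:

# kstat -p zfs:0:arcstats:size zfs:0:arcstats:c_max zfs:0:arcstats:p

(I'm quoting those stat names from memory, so treat that as a rough pointer rather than gospel.)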
An arc_summary output:

System Memory:
        Physical RAM:   262103 MB
        Free Memory :   10273 MB
        LotsFree:       4095 MB

ZFS Tunables (/etc/system):
        set zfs:zfs_arc_shrink_shift = 10

ARC Size:
        Current Size:             225626 MB (arcsize)
        Target Size (Adaptive):   225626 MB (c)
        Min Size (Hard Limit):    8190 MB (zfs_arc_min)
        Max Size (Hard Limit):    261079 MB (zfs_arc_max)

ARC Size Breakdown:
        Most Recently Used Cache Size:    10%   23290 MB (p)
        Most Frequently Used Cache Size:  89%   202335 MB (c-p)

ARC Efficency:
        Cache Access Total:      27377320465
        Cache Hit Ratio:   93%   25532510784   [Defined State for buffer]
        Cache Miss Ratio:   6%   1844809681    [Undefined State for Buffer]
        REAL Hit Ratio:    92%   25243933796   [MRU/MFU Hits Only]

        Data Demand Efficiency:    95%
        Data Prefetch Efficiency:  40%

        CACHE HITS BY CACHE LIST:
          Anon:                        --%  Counter Rolled.
          Most Recently Used:          18%  4663226393 (mru)       [ Return Customer ]
          Most Frequently Used:        80%  20580707403 (mfu)      [ Frequent Customer ]
          Most Recently Used Ghost:     0%  176686906 (mru_ghost)  [ Return Customer Evicted, Now Back ]
          Most Frequently Used Ghost:   0%  126286869 (mfu_ghost)  [ Frequent Customer Evicted, Now Back ]

        CACHE HITS BY DATA TYPE:
          Demand Data:        95%  24413671342
          Prefetch Data:       1%  358419705
          Demand Metadata:     2%  698314899
          Prefetch Metadata:   0%  62104838

        CACHE MISSES BY DATA TYPE:
          Demand Data:        69%  1277347273
          Prefetch Data:      28%  519579788
          Demand Metadata:     2%  39512363
          Prefetch Metadata:   0%  8370257

And a sample of arcstat (deleted first line of output):

# arcstat -f read,hits,miss,hit%,l2read,l2hits,l2miss,l2hit%,arcsz,l2size,l2asize 1
 read  hits  miss  hit%  l2read  l2hits  l2miss  l2hit%  arcsz  l2size  l2asize
 5.9K  4.6K  1.3K    78    1.3K    1.2K      80      93   220G    1.6T     901G
 6.7K  5.2K  1.5K    76    1.5K    1.3K     250      83   220G    1.6T     901G
 7.0K  5.3K  1.7K    76    1.7K    1.4K     316      81   220G    1.6T     901G
 6.5K  5.3K  1.2K    80    1.2K    1.1K     111      91   220G    1.6T     901G
 6.4K  5.2K  1.2K    81    1.2K    1.1K     100      91   220G    1.6T     901G
 7.2K  5.6K  1.6K    78    1.6K    1.3K     289      81   220G    1.6T     901G
 8.5K  6.8K  1.7K    80    1.7K    1.3K     351      79   220G    1.6T     901G
 7.5K  5.9K  1.6K    78    1.6K    1.3K     282      82   220G    1.6T     901G
 6.7K  5.6K  1.1K    83    1.1K     991     123      88   220G    1.6T     901G
 6.8K  5.5K  1.3K    80    1.3K    1.1K     234      82   220G    1.6T     901G

Interesting to see only an l2asize of 901G even though I should have more..
373G x 4 is just under 1.5TB of raw storage. The compressed l2arc size is
1.6TB, while actual used space is 901G. I expect more to be used. Perhaps
Saso can comment on this portion, if he's following this thread (snipped
from "zpool iostat -v"):

cache                         -      -    -    -      -      -
  c2t500117310015D579d0     373G     8M  193   16  2.81M   394K
  c2t50011731001631FDd0     373G  5.29M  194   15  2.85M   360K
  c12t500117310015D59Ed0    373G  5.50M  191   17  2.74M   368K
  c12t500117310015D54Ed0    373G  5.57M  200   14  2.89M   300K

(from this discussion here:
http://lists.omniti.com/pipermail/omnios-discuss/2014-February/002287.html),
and the uptime on this is currently around ~58 days, so it should have had
enough time to rotate through the l2arc "rotor".

> methinks the scrub I/Os are getting starved and since they are low
> priority, they could get very starved. In general, I wouldn't worry about
> it, but I understand why you might be nervous. Keep in mind that in ZFS
> scrubs are intended to find errors on idle data, not frequently accessed
> data.
>
> more far below...
>

I'm worried because there's no way the scrub will ever complete before the
next reboot. Regular scrubs are important, right?
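
To put a number on that worry (rough math, assuming the ~600K/s scan rate
holds flat for the rest of the pool -- it may not):

  # time to scan 24.5T at ~600K/s, in days
  echo "24.5 * 1024 * 1024 * 1024 / 600 / 86400" | bc    # => roughly 500 days

So at the current pace this is a more-than-a-year scrub, not a multi-week one.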
> > So I think the pool is not scheduling scrub I/Os very well. You can > increase the number of > scrub I/Os in the scheduler by adjusting the zfs_vdev_scrub_max_active > tunable. The > default is 2, but you'll have to consider that a share (in the stock > market sense) where > the active sync reads and writes are getting 10 each. You can try bumping > up the value > and see what happens over some time, perhaps 10 minutes or so -- too short > of a time > and you won't get a good feeling for the impact (try this in off-peak > time). > echo zfs_vdev_scrub_max_active/W0t5 | mdb -kw > will change the value from 2 to 5, increasing its share of the total I/O > workload. > > You can see the progress of scan (scrubs do scan) workload by looking at > the ZFS > debug messages. > echo ::zfs_dbgmsg | mdb -k > These will look mysterious... they are. But the interesting bits are about > how many blocks > are visited in some amount of time (txg sync interval). Ideally, this will > change as you > adjust zfs_vdev_scrub_max_active. > -- richard > > Actually, you used the data from before the resilver. During resilver this was the activity on the pool: 3883.5 1357.7 40141.6 60739.5 22.8 38.6 4.4 7.4 54 100 tank Are you looking at an individual drive's busy % or the pool's busy % to determine whether it's "busy"? During the resilver this was the activity on the drives (a sample - between 38-45%, whereas during the scrub the individual drives were 2-5% busy): 59.5 30.0 553.2 741.8 0.0 0.9 0.0 9.8 0 45 c1t5000C50055F8A46Fd0 57.4 22.5 504.0 724.5 0.0 0.8 0.0 9.6 0 41 c1t5000C50055F856CFd0 58.4 24.6 531.4 786.9 0.0 0.7 0.0 8.4 0 38 c1t5000C50055E6606Fd0 But yes, without the resilver the busy % was much less (during the scrub each individual drive was 2-4% busy). I've pasted the current iostat output further below. With the zfs_vdev_scrub_max_active at the default of 2, it was doing an average of 162 blocks: doing scan sync txg 26678243; bm=897/1/0/15785978 scanned dataset 897 (tank/vmware-64k-5tb-1) with min=1 max=26652167; pausing=1 visited 162 blocks in 6090ms doing scan sync txg 26678244; bm=897/1/0/15786126 scanned dataset 897 (tank/vmware-64k-5tb-1) with min=1 max=26652167; pausing=1 visited 162 blocks in 6094ms After changing it to 5, and waiting about 20 mins, I'm not seeing anything significantly different: doing scan sync txg 26678816; bm=897/1/0/37082043 scanned dataset 897 (tank/vmware-64k-5tb-1) with min=1 max=26652167; pausing=1 visited 163 blocks in 6154ms doing scan sync txg 26678817; bm=897/1/0/37082193 scanned dataset 897 (tank/vmware-64k-5tb-1) with min=1 max=26652167; pausing=1 visited 162 blocks in 6128ms pool: tank state: ONLINE scan: scrub in progress since Tue Jul 29 15:41:27 2014 97.0G scanned out of 24.5T at 599K/s, (scan is slow, no estimated time) 0 repaired, 0.39% done I'll keep the zfs_vdev_scrub_max_active tunable to 5, as it doesn't appear to be impacting too much, and monitor for changes. What's strange to me is that it was "humming" along at 5.5MB/s at the 2 week mark but is now 10x slower (compared to before reattaching the mirror log device). It *seems* marginally faster, from 541K/s to almost 600K/s.. 
This is the current activity from "iostat -xnCz 60 2":

                     extended device statistics
    r/s    w/s    kr/s     kw/s  wait  actv wsvc_t asvc_t  %w  %b device
  158.8 1219.2  3717.8  39969.8   0.0   1.6    0.0    1.1   0 139 c1
    3.6   35.1    86.2    730.9   0.0   0.0    0.0    0.9   0   3 c1t5000C50055F8723Bd0
    3.7   19.9    83.7    789.9   0.0   0.0    0.0    1.4   0   3 c1t5000C50055E66B63d0
    2.7   22.5    60.8    870.9   0.0   0.0    0.0    1.1   0   2 c1t5000C50055F87E73d0
    2.4   27.9    66.0    765.8   0.0   0.0    0.0    0.8   0   2 c1t5000C50055F8BFA3d0
    2.8   17.9    64.9    767.0   0.0   0.0    0.0    0.8   0   1 c1t5000C50055F9E123d0
    3.1   26.1    73.8    813.3   0.0   0.0    0.0    0.9   0   2 c1t5000C50055F9F0B3d0
    3.1   15.5    79.4    783.4   0.0   0.0    0.0    1.3   0   2 c1t5000C50055F9D3B3d0
    3.8   38.6    86.2    826.8   0.0   0.1    0.0    1.2   0   4 c1t5000C50055E4FDE7d0
    3.8   15.4    93.0    822.3   0.0   0.0    0.0    1.5   0   3 c1t5000C50055F9A607d0
    3.0   25.7    79.4    719.7   0.0   0.0    0.0    0.9   0   2 c1t5000C50055F8CDA7d0
    3.2   26.5    69.0    824.3   0.0   0.0    0.0    1.1   0   3 c1t5000C50055E65877d0
    3.7   42.6    79.2    834.1   0.0   0.1    0.0    1.3   0   5 c1t5000C50055F9E7D7d0
    3.3   23.2    79.5    778.0   0.0   0.0    0.0    1.2   0   3 c1t5000C50055FA0AF7d0
    3.4   30.2    77.0    805.9   0.0   0.0    0.0    0.9   0   3 c1t5000C50055F9FE87d0
    3.0   15.4    72.6    795.0   0.0   0.0    0.0    1.6   0   3 c1t5000C50055F9F91Bd0
    2.5   38.1    61.1    859.4   0.0   0.1    0.0    1.6   0   5 c1t5000C50055F9FEABd0
    2.1   13.2    42.7    801.6   0.0   0.0    0.0    1.6   0   2 c1t5000C50055F9F63Bd0
    3.0   20.0    62.6    766.6   0.0   0.0    0.0    1.1   0   2 c1t5000C50055F9F3EBd0
    3.7   24.3    80.2    807.9   0.0   0.0    0.0    1.0   0   2 c1t5000C50055F9F80Bd0
    3.2   35.2    66.1    852.4   0.0   0.0    0.0    1.2   0   4 c1t5000C50055F9FB8Bd0
    3.9   30.6    84.7    845.7   0.0   0.0    0.0    0.8   0   3 c1t5000C50055F9F92Bd0
    2.7   18.1    68.8    831.4   0.0   0.0    0.0    1.4   0   2 c1t5000C50055F8905Fd0
    2.7   17.7    61.4    762.1   0.0   0.0    0.0    1.0   0   2 c1t5000C50055F8D48Fd0
    3.5   17.5    87.8    749.7   0.0   0.0    0.0    1.7   0   3 c1t5000C50055F9F89Fd0
    2.6   13.7    58.6    780.9   0.0   0.0    0.0    1.7   0   3 c1t5000C50055F9EF2Fd0
    3.3   34.9    74.5    730.9   0.0   0.0    0.0    0.8   0   3 c1t5000C50055F8C3ABd0
    3.1   19.3    64.7    789.9   0.0   0.0    0.0    1.0   0   2 c1t5000C50055E66053d0
    3.8   38.5    82.9    826.8   0.0   0.1    0.0    1.3   0   4 c1t5000C50055E66503d0
    3.7   25.8    91.4    813.3   0.0   0.0    0.0    0.8   0   2 c1t5000C50055F9D3E3d0
    2.2   37.9    52.5    859.4   0.0   0.0    0.0    1.1   0   4 c1t5000C50055F84FB7d0
    2.8   20.0    62.8    766.6   0.0   0.0    0.0    1.0   0   2 c1t5000C50055F8E017d0
    3.9   26.1    86.5    824.3   0.0   0.0    0.0    1.1   0   3 c1t5000C50055E579F7d0
    3.1   27.7    79.9    765.8   0.0   0.0    0.0    1.2   0   3 c1t5000C50055E65807d0
    2.9   22.8    76.3    778.0   0.0   0.0    0.0    1.1   0   3 c1t5000C50055F84A97d0
    3.6   15.3    89.0    783.4   0.0   0.0    0.0    1.7   0   3 c1t5000C50055F87D97d0
    2.8   13.8    77.9    780.9   0.0   0.0    0.0    1.5   0   2 c1t5000C50055F9F637d0
    2.1   18.3    51.4    831.4   0.0   0.0    0.0    1.1   0   2 c1t5000C50055E65ABBd0
    3.1   15.4    70.9    822.3   0.0   0.0    0.0    1.2   0   2 c1t5000C50055F8BF9Bd0
    3.2   17.9    75.5    762.1   0.0   0.0    0.0    1.2   0   2 c1t5000C50055F8A22Bd0
    3.7   42.4    83.3    834.1   0.0   0.1    0.0    1.1   0   5 c1t5000C50055F9379Bd0
    4.0   22.7    86.8    870.9   0.0   0.0    0.0    1.0   0   2 c1t5000C50055E57A5Fd0
    2.6   15.5    67.5    795.0   0.0   0.0    0.0    1.4   0   2 c1t5000C50055F8CCAFd0
    2.9   13.2    65.4    801.6   0.0   0.0    0.0    1.9   0   3 c1t5000C50055F8B80Fd0
    3.3   25.7    82.7    719.7   0.0   0.0    0.0    1.1   0   3 c1t5000C50055F9FA1Fd0
    4.0   24.0    84.9    807.9   0.0   0.0    0.0    1.1   0   3 c1t5000C50055E65F0Fd0
    2.8   18.4    69.5    767.0   0.0   0.0    0.0    1.0   0   2 c1t5000C50055F8BE3Fd0
    3.3   17.6    81.6    749.7   0.0   0.0    0.0    1.4   0   3 c1t5000C50055F8B21Fd0
    3.3   35.1    64.2    852.4   0.0   0.0    0.0    1.1   0   4 c1t5000C50055F8A46Fd0
    3.5   30.0    82.1    805.9   0.0   0.0    0.0    0.9   0   3 c1t5000C50055F856CFd0
    3.9   30.4    89.5    845.7   0.0   0.0    0.0    0.9   0   3 c1t5000C50055E6606Fd0
  429.4  133.6  5933.0   8163.0   0.0   0.2    0.0    0.3   0  12 c2
  215.8   28.4  2960.4    677.7   0.0   0.1    0.0    0.2   0   5 c2t500117310015D579d0
  213.7   27.4  2972.6    654.1   0.0   0.1    0.0    0.2   0   5 c2t50011731001631FDd0
    0.0   77.8     0.0   6831.2   0.0   0.1    0.0    0.8   0   2 c2t5000A72A3007811Dd0
    0.0   12.3     0.0     46.8   0.0   0.0    0.0    0.0   0   0 c4
    0.0    6.2     0.0     23.4   0.0   0.0    0.0    0.0   0   0 c4t0d0
    0.0    6.1     0.0     23.4   0.0   0.0    0.0    0.0   0   0 c4t1d0
  418.4  134.8  5663.1   8197.8   0.0   0.2    0.0    0.3   0  11 c12
    0.0   77.8     0.0   6831.2   0.0   0.1    0.0    0.8   0   2 c12t5000A72B300780FFd0
  203.5   29.7  2738.0    715.8   0.0   0.1    0.0    0.2   0   5 c12t500117310015D59Ed0
  214.9   27.2  2925.2    650.8   0.0   0.1    0.0    0.2   0   5 c12t500117310015D54Ed0
    0.0   11.3     0.0     46.8   0.0   0.0    0.7    0.1   0   0 rpool
 1006.7 1478.2 15313.9  56330.7  30.4   2.0   12.2    0.8   6  64 tank

Seems the pool is busy at 64% but the individual drives are not taxed at all
(this load is virtually identical to when the scrub was running before the
resilver was triggered). Still not too sure how to interpret this data. Is
the system over-stressed? Is there really a bottleneck somewhere, or do I
just need to fine-tune some settings?

Going to try some Iometer runs via the VM I/O Analyzer next.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: