From jarrad at gmail.com Tue Jul 1 07:56:47 2014
From: jarrad at gmail.com (Jarrad Piper)
Date: Tue, 1 Jul 2014 17:26:47 +0930
Subject: [OmniOS-discuss] Experiences with HP Micro Server Gen8
Message-ID: 

Hi All,

I missed out on the discussion re. the HP Gen8 Microserver last month, but thought I'd put in my two cents on where it currently stands in terms of hardware support, as I've been using one for close to a year now as a storage/KVM server.

After initially having no support at all, the Broadcom 5720 onboard NICs have had out-of-the-box support for a while now. However, the bundled driver is very old and has a serious bug: the NIC will lock up entirely if you try to create a VNIC, e.g.

"dladm create-vnic -l bge0 vnic0"

This may not concern some people, but if you want to run a zone or a KVM virtual machine you will need to create a VNIC. The problem only presents when you are connected to a 1Gbit switch. Other people have hit the same problem in other circumstances, but I have only seen it occur when creating VNICs on OmniOS.

One workaround is to boot the machine on a 100Mbit switch, create the VNIC, and then reconnect it to the Gigabit switch. A more practical solution is to turn off auto-negotiation and set the link properties to 100Mbit, e.g.

sudo dladm show-linkprop bge1
sudo dladm set-linkprop -t -p adv_autoneg_cap=0 bge1
sudo dladm set-linkprop -t -p en_1000fdx_cap=0 bge1
sudo dladm set-linkprop -t -p en_1000hdx_cap=0 bge1

The best solution, however, is to build the latest driver (16.2.2) from source, which is available from Broadcom's website. On OmniOS you will need to install onbld and the Solaris Studio 12.1 compiler before building. The zip file contains a readme.txt on how to compile and update the driver for OpenSolaris/Solaris 11. You will end up with a BRCMbge-S11-i386-16.2.2.pkg file, which you install using "pkgadd -d BRCMbge-S11-i386-16.2.2.pkg". Note: this is from memory; there may have also been a few tweaks to the Makefile to get it to compile.

From what I can gather, illumos cannot include newer versions of the bge driver because Broadcom's license is not compatible with theirs.

The other major problem with the Microserver is the unbearable fan noise when the drives are configured in AHCI mode (which is what anyone using ZFS will want). I won't go into it, but more information is available here:

http://h30499.www3.hp.com/t5/ProLiant-Servers-Netservers/MicroServer-Gen8-is-noisy/td-p/6171563
http://homeservershow.com/forums/index.php?/topic/6032-g8-microserver-be-aware-of-fan-issue-add-in-cards/

Basically, to get it to a bearable level you will need the latest BIOS and a hacked iLO firmware (see the second link) which lowers the heat tolerance.

Oh, and I can also confirm that booting from an SD card is fine; it has been happily working for almost a year now.

Hope this helps anyone having issues or thinking of purchasing one. Any other questions, let me know.

Jarrad.

From nicolas.digregorio at gmail.com Tue Jul 1 08:56:00 2014
From: nicolas.digregorio at gmail.com (Nicolas Di Gregorio)
Date: Tue, 1 Jul 2014 10:56:00 +0200
Subject: [OmniOS-discuss] best practice for permissions on a NAS
Message-ID: 

Hello,

I'm building a NAS with OmniOS that should serve CIFS and NFS clients with at least basic restrictions regarding users and groups. What would be the best practice to configure the permissions? How to implement it?
Kind Regards -------------- next part -------------- An HTML attachment was scrubbed... URL: From nicolas.digregorio at gmail.com Tue Jul 1 09:42:25 2014 From: nicolas.digregorio at gmail.com (Nicolas Di Gregorio) Date: Tue, 1 Jul 2014 11:42:25 +0200 Subject: [OmniOS-discuss] best practice for permissions on a NAS In-Reply-To: <1283584D-CE77-4BBB-9191-FBA2D1F5BAA3@marzocchi.net> References: <1283584D-CE77-4BBB-9191-FBA2D1F5BAA3@marzocchi.net> Message-ID: Thanks for this. what does mean aclinherit=passthrough? 2014-07-01 11:20 GMT+02:00 Olaf Marzocchi : > You may want to check my short articles on Marzocchi.net about my NAS > based on OmniOS. > > Regards > Olaf Marzocchi > > > Inviato da iPhone > > > Il giorno 01/lug/2014, alle ore 10:56, Nicolas Di Gregorio < > nicolas.digregorio at gmail.com> ha scritto: > > > > > > Hello, > > > > I'm building a NAS with OmniOS that should server CIFS and NFS client > with at least basic restriction regarding users and groups. What would be > the best practive to configure the permissions? How to implement it? > > > > Kind Regards > > _______________________________________________ > > OmniOS-discuss mailing list > > OmniOS-discuss at lists.omniti.com > > http://lists.omniti.com/mailman/listinfo/omnios-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From cal-s at blue-bolt.com Fri Jul 4 08:35:37 2014 From: cal-s at blue-bolt.com (Cal Sawyer) Date: Fri, 04 Jul 2014 09:35:37 +0100 Subject: [OmniOS-discuss] ncurses lib in r151010j In-Reply-To: References: Message-ID: <53B66759.1080306@blue-bolt.com> Hi I'm trying to build iftop on r151010j because the available packages are rather stale. basename file opt/omni/sbin/iftop pkg:/network/iftop at 1.0.2-0.151006 PUBLISHER TYPE STATUS URI omnios origin online http://pkg.omniti.com/omnios/r151010/ ms.omniti.com origin online http://pkg.omniti.com/omniti-ms/ Running into an issue with ncurses during the configure stage checking for a curses library containing mvchgat... none found configure: error: Curses! Foiled again! (Can't find a curses library supporting mvchgat.) Consider installing ncurses. however ... > pkg list ncurses NAME (PUBLISHER) VERSION IFO library/ncurses 5.9-0.151010 i-- > grep mvchgat /usr/include/ncurses/ncurses.h extern NCURSES_EXPORT(int) mvchgat (int, int, int, attr_t, short, const void *); /* generated */ #define mvchgat(y,x,n,a,c,o) mvwchgat(stdscr,y,x,n,a,c,o) Looks like it's supported. Any ideas? thanks - cal -------------- next part -------------- An HTML attachment was scrubbed... URL: From daleg at omniti.com Fri Jul 4 09:05:09 2014 From: daleg at omniti.com (Dale Ghent) Date: Fri, 4 Jul 2014 09:05:09 +0000 Subject: [OmniOS-discuss] ncurses lib in r151010j In-Reply-To: <53B66759.1080306@blue-bolt.com> References: <53B66759.1080306@blue-bolt.com> Message-ID: So what does your config.log say regarding the check for mvchgat() ? How is it failing the test? When running into issues such as this, config.log is the go-to place to start figuring out the why. autoconf itself isn?t infallible in how it checks for things, after all. /dale On Jul 4, 2014, at 8:35 AM, Cal Sawyer wrote: > Hi > > I'm trying to build iftop on r151010j because the available packages are rather stale. 
> > basename file opt/omni/sbin/iftop pkg:/network/iftop at 1.0.2-0.151006 > > > PUBLISHER TYPE STATUS URI > omnios origin online http://pkg.omniti.com/omnios/r151010/ > ms.omniti.com origin online http://pkg.omniti.com/omniti-ms/ > > > Running into an issue with ncurses during the configure stage > > checking for a curses library containing mvchgat... none found > configure: error: Curses! Foiled again! > (Can't find a curses library supporting mvchgat.) > Consider installing ncurses. > > however ... > > > pkg list ncurses > NAME (PUBLISHER) VERSION IFO > library/ncurses 5.9-0.151010 i-- > > grep mvchgat /usr/include/ncurses/ncurses.h > > extern NCURSES_EXPORT(int) mvchgat (int, int, int, attr_t, short, const void *); /* generated */ > #define mvchgat(y,x,n,a,c,o) mvwchgat(stdscr,y,x,n,a,c,o) > > Looks like it's supported. Any ideas? > > thanks > > - cal > _______________________________________________ > OmniOS-discuss mailing list > OmniOS-discuss at lists.omniti.com > http://lists.omniti.com/mailman/listinfo/omnios-discuss From cal-s at blue-bolt.com Fri Jul 4 09:57:33 2014 From: cal-s at blue-bolt.com (Cal Sawyer) Date: Fri, 04 Jul 2014 10:57:33 +0100 Subject: [OmniOS-discuss] ncurses lib in r151010j In-Reply-To: References: <53B66759.1080306@blue-bolt.com> Message-ID: <53B67A8D.2040803@blue-bolt.com> This, i imagine, is it. From config.log configure:6233: checking for a curses library containing mvchgat configure:6255: gcc -o conftest -g -O2 conftest.c -lpcap -lnsl -lm -lsocket -lcurses >&5 Undefined first referenced symbol in file mvchgat /var/tmp//cckgaO9B.o ld: fatal: symbol referencing errors. No output written to conftest collect2: error: ld returned 1 exit status configure:6258: $? = 1 configure: failed program was: #line 6238 "configure" #include "confdefs.h" #include int main () { mvchgat(0, 0, 1, A_REVERSE, 0, NULL) ; return 0; } configure:6255: gcc -o conftest -g -O2 conftest.c -lpcap -lnsl -lm -lsocket -lncurses >&5 ld: fatal: library -lncurses: not found ld: fatal: file processing errors. No output written to conftest collect2: error: ld returned 1 exit status configure:6258: $? = 1 configure: failed program was: #line 6238 "configure" #include "confdefs.h" #include int main () { mvchgat(0, 0, 1, A_REVERSE, 0, NULL) ; return 0; } configure:6278: result: none found configure:6280: error: Curses! Foiled again! (Can't find a curses library supporting mvchgat.) Consider installing ncurses. No idea what to do about it, though. regards, - cal On 04/07/14 10:05, Dale Ghent wrote: > So what does your config.log say regarding the check for mvchgat() ? How is it failing the test? When running into issues such as this, config.log is the go-to place to start figuring out the why. > > autoconf itself isn?t infallible in how it checks for things, after all. > > /dale > > On Jul 4, 2014, at 8:35 AM, Cal Sawyer wrote: > >> Hi >> >> I'm trying to build iftop on r151010j because the available packages are rather stale. >> >> basename file opt/omni/sbin/iftop pkg:/network/iftop at 1.0.2-0.151006 >> >> >> PUBLISHER TYPE STATUS URI >> omnios origin online http://pkg.omniti.com/omnios/r151010/ >> ms.omniti.com origin online http://pkg.omniti.com/omniti-ms/ >> >> >> Running into an issue with ncurses during the configure stage >> >> checking for a curses library containing mvchgat... none found >> configure: error: Curses! Foiled again! >> (Can't find a curses library supporting mvchgat.) >> Consider installing ncurses. >> >> however ... 
>> >>> pkg list ncurses >> NAME (PUBLISHER) VERSION IFO >> library/ncurses 5.9-0.151010 i-- >> > grep mvchgat /usr/include/ncurses/ncurses.h >> >> extern NCURSES_EXPORT(int) mvchgat (int, int, int, attr_t, short, const void *); /* generated */ >> #define mvchgat(y,x,n,a,c,o) mvwchgat(stdscr,y,x,n,a,c,o) >> >> Looks like it's supported. Any ideas? >> >> thanks >> >> - cal >> _______________________________________________ >> OmniOS-discuss mailing list >> OmniOS-discuss at lists.omniti.com >> http://lists.omniti.com/mailman/listinfo/omnios-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From daleg at omniti.com Fri Jul 4 10:39:42 2014 From: daleg at omniti.com (Dale Ghent) Date: Fri, 4 Jul 2014 10:39:42 +0000 Subject: [OmniOS-discuss] ncurses lib in r151010j In-Reply-To: <53B67A8D.2040803@blue-bolt.com> References: <53B66759.1080306@blue-bolt.com> <53B67A8D.2040803@blue-bolt.com> Message-ID: <31BB1921-0954-4525-90EF-77F829AB1732@omniti.com> On Jul 4, 2014, at 9:57 AM, Cal Sawyer wrote: > This, i imagine, is it. From config.log ... > configure:6255: gcc -o conftest -g -O2 conftest.c -lpcap -lnsl -lm -lsocket -lncurses >&5 > ld: fatal: library -lncurses: not found > ld: fatal: file processing errors. No output written to conftest Try: LDFLAGS=?-L/usr/gnu/lib -R/usr/gnu/lib? ./configure ? /dale From alex at cooperi.net Sun Jul 6 10:39:32 2014 From: alex at cooperi.net (Alex Wilson) Date: Sun, 6 Jul 2014 20:39:32 +1000 Subject: [OmniOS-discuss] AS Media 1061 In-Reply-To: References: Message-ID: On 27 Jun 2014, at 4:03 am, F?bio Rabelo wrote: > Someone has give a try recently any PCIe card based on the ASm Media > 1061 SATA 3 chipset ? It's not the ASM1061, but I've been using its close relative, the ASM1062 since the original commit by Marcel and have had no real issues. Performance is acceptable. I have a single Intel mSATA SSD attached to it. I did update the BIOS and firmware on it as soon as I got it (the images to do this have since disappeared from the ASMedia website though I think). The card I have is one of these: http://eshop.macsales.com/item/Other+World+Computing/PCIEACCELM/ -- there are a bunch of other brands of it too which seem to be the same PCB layout. From lotheac at iki.fi Mon Jul 7 15:38:26 2014 From: lotheac at iki.fi (Lauri Tirkkonen) Date: Mon, 7 Jul 2014 18:38:26 +0300 Subject: [OmniOS-discuss] omnios-build: the build system, scripts and merging between branches Message-ID: <20140707153826.GA15377@gutsman.lotheac.fi> Please read this and comment if you maintain a fork of omnios-build. Thanks. Status quo ---------- The omnios-build repository currently versions more than one thing: - the build scripts which describe how to build certain software (under build/), - the build system (lib/ (mostly), build/buildctl and template/) - and some site-specific data (configuration). The 'template' branch holds nothing site-specific and users are expected to fork that one, adding their own stuff on top and pulling changes from upstream's 'template'. This works in one direction, but in practice the workflow is not "make your changes on top template"; it needs to work in both directions. When the build scripts and configuration are in the same repository, you cannot merge back into template - you must instead be careful with cherry-picking. This isn't very maintainable: recently I made a pull request that just ported build system changes from omniti-labs:master to omniti-labs:template and that was 176 commits in size (ie. 
after pruning all commits which only touched build scripts). I maintain two forks of omnios-build (one for a publicly available repo, and one for a private one), and syncing changes between them and upstream is a royal pain. I'd wager OmniTI isn't particularly fond of porting build system changes between different release version branches and omniti-ms and friends.

I argue that making build system changes flow from one fork/branch to another needs to be made easier. The way to do that is to split the build system from the build scripts into another repository, so that merges can be made freely. This obviously requires some rather large changes, which I will propose below.

Proposal: submodules?
---------------------

At first, I was thinking "git submodules are the solution to this". If you're not familiar, submodules are a way to include a reference to another repository at a certain commit. This is the sort of tree I was thinking about:

omnios-build-my-organization (your site-specific build repo)
|-- lib (submodule: the build system, or: site-independent code)
|-- build
`-- (config.sh, site.sh and other site-specific data)

This would allow you to work on lib/ separately from your build scripts, and make it clear which version of the build system they need to build. However, git submodules are a bit unwieldy and not very intuitive (among other things you need to manually use submodule commands to update the submodule tree instead of git doing it for you; this is certain to get annoying when switching branches).

Better proposal
---------------

So, let's make this simpler by turning it around and ditching the submodule idea. Let's just have omnios-build contain site-independent data (ie. the build system), and make build/ another repository, but let's not make it a submodule. Instead, .gitignore it and make ./new.sh warn and initialize a new git repository into it if it doesn't exist. This doesn't require any submodule trickery, but lets you have your scripts in a separate repo.

Proof of concept is on GitHub, in the niksula/omnios-build split branch and the niksula/omnios-build-scripts repo:

git clone -b split https://github.com/niksula/omnios-build.git
cd omnios-build
git clone https://github.com/niksula/omnios-build-scripts.git build
cd build/ircii
./build.sh

In the PoC I moved buildctl out of build/ and {site,config}.sh inside it (to the omnios-build-scripts repo). It's not a finished product, but should demonstrate what I'm saying.

Thoughts?

-- 
Lauri Tirkkonen | +358 50 5341376 | lotheac @ IRCnet

From danmcd at omniti.com Mon Jul 7 16:02:59 2014
From: danmcd at omniti.com (Dan McDonald)
Date: Mon, 7 Jul 2014 12:02:59 -0400
Subject: [OmniOS-discuss] omnios-build: the build system, scripts and merging between branches
In-Reply-To: <20140707153826.GA15377@gutsman.lotheac.fi>
References: <20140707153826.GA15377@gutsman.lotheac.fi>
Message-ID: <7B288CC0-9AEB-45A3-8167-AF343283ACC0@omniti.com>

On Jul 7, 2014, at 11:38 AM, Lauri Tirkkonen wrote:

> Please read this and comment if you maintain a fork of omnios-build.
> Thanks.

I'll have to read this more deeply, of course, but I had only one knee-jerk reaction:

> Proof of concept is on GitHub, in the niksula/omnios-build split branch and
> the niksula/omnios-build-scripts repo:
>
> git clone -b split https://github.com/niksula/omnios-build.git
> cd omnios-build
> git clone https://github.com/niksula/omnios-build-scripts.git build
> cd build/ircii
> ./build.sh
>
> In the PoC I moved buildctl out of build/ and {site,config}.sh inside it
> (to the omnios-build-scripts repo).
> It's not a finished product, but
> should demonstrate what I'm saying.
>
> Thoughts?

This *seems* sensible, especially as you've put buildctl at the top-level in the omnios-build half of the split.

It seems right now, however, that buildctl still assumes it's in build/ instead of one directory above it. It's also not 100% clear that the functions in lib/ have been altered to assume site.sh and config.sh are in build instead of lib.

I'd like to see what all changed in any scripts that now live in the "build" half of your split vs. their original pre-split incarnations. Not sure if GitHub or a tool like webrev would be able to help here.

Also, I've been documenting buildctl and my wrapper - OmniOS-on-demand - which generates the bloody bits. I will have one push upstream to help here.

Dan

From lotheac at iki.fi Mon Jul 7 16:11:59 2014
From: lotheac at iki.fi (Lauri Tirkkonen)
Date: Mon, 7 Jul 2014 19:11:59 +0300
Subject: [OmniOS-discuss] omnios-build: the build system, scripts and merging between branches
In-Reply-To: <7B288CC0-9AEB-45A3-8167-AF343283ACC0@omniti.com>
References: <20140707153826.GA15377@gutsman.lotheac.fi> <7B288CC0-9AEB-45A3-8167-AF343283ACC0@omniti.com>
Message-ID: <20140707161159.GB15377@gutsman.lotheac.fi>

On Mon, Jul 07 2014 12:02:59 -0400, Dan McDonald wrote:
> This *seems* sensible, especially as you've put buildctl at the top-level in the omnios-build half of the split.
>
> It seems right now, however, that buildctl still assumes it's in
> build/ instead of one directory above it.

Yep, it does. As I said, it's just a PoC at this point (but I think you might be able to do 'cd build; ../buildctl'. I don't use buildctl myself, yet).

> It's also not 100% clear that the functions in lib/ have been altered
> to assume site.sh and config.sh are in build instead of lib.

https://github.com/niksula/omnios-build/compare/niksula:master...split

> I'd like to see what all changed in any scripts that now live in the
> "build" half of your split vs. their original pre-split incarnations.
> Not sure if GitHub or a tool like webrev would be able to help here.

Nothing in the scripts themselves changed. I did a git filter-branch --subdirectory-filter to split the build dir into its own repo, after which I just removed buildctl and added config.sh and site.sh. The latest three commits at https://github.com/niksula/omnios-build-scripts/commits/master, basically.

-- 
Lauri Tirkkonen | +358 50 5341376 | lotheac @ IRCnet

From fabio at fabiorabelo.wiki.br Tue Jul 8 15:56:43 2014
From: fabio at fabiorabelo.wiki.br (=?UTF-8?Q?F=C3=A1bio_Rabelo?=)
Date: Tue, 8 Jul 2014 12:56:43 -0300
Subject: [OmniOS-discuss] ASMedia 1061 WORKING !
Message-ID: 

Hi to all

I just plugged this card in an OmniOS box:

http://sgp.imgmarket.net/sgp/201406/35765_b.jpg

And the system recognises the hard disk connected to it, initialised it, and it is in use!!!

I do not yet have anything to report about performance, but it is working!!

Fábio Rabelo

From brogyi at gmail.com Tue Jul 8 19:05:29 2014
From: brogyi at gmail.com (=?UTF-8?B?QnJvZ3nDoW55aSBKw7N6c2Vm?=)
Date: Tue, 08 Jul 2014 21:05:29 +0200
Subject: [OmniOS-discuss] ASMedia 1061 WORKING !
In-Reply-To: 
References: 
Message-ID: <53BC40F9.7040105@gmail.com>

Very good, Fabio. My card is exactly the same, but the postman hasn't brought it yet. I had a feeling it must work. Thank you for your confirmation.

Brogyi

One important thing: AHCI must be on in the BIOS. Am I right? A stupid question, but it is important. :)

On 2014.07.08.
at 17:56, Fábio Rabelo wrote:
> Hi to all
>
> I just plugged this card in an OmniOS box:
>
> http://sgp.imgmarket.net/sgp/201406/35765_b.jpg
>
> And the system recognises the hard disk connected to it, initialised
> it, and it is in use!!!
>
> I do not yet have anything to report about performance, but it is working!!
>
> Fábio Rabelo
> _______________________________________________
> OmniOS-discuss mailing list
> OmniOS-discuss at lists.omniti.com
> http://lists.omniti.com/mailman/listinfo/omnios-discuss

From danmcd at omniti.com Tue Jul 8 19:18:34 2014
From: danmcd at omniti.com (Dan McDonald)
Date: Tue, 8 Jul 2014 15:18:34 -0400
Subject: [OmniOS-discuss] ASMedia 1061 WORKING !
In-Reply-To: <53BC40F9.7040105@gmail.com>
References: <53BC40F9.7040105@gmail.com>
Message-ID: <4DB1667D-AED1-43B6-9C8C-979E55280C2E@omniti.com>

On Jul 8, 2014, at 3:05 PM, Brogyányi József wrote:

> Very good, Fabio. My card is exactly the same, but the postman hasn't brought it yet.
> I had a feeling it must work. Thank you for your confirmation.
>
> Brogyi
>
> One important thing: AHCI must be on in the BIOS. Am I right? A stupid question, but it is important. :)

AHCI is generally the best choice.

Dan

From danmcd at omniti.com Fri Jul 11 18:02:58 2014
From: danmcd at omniti.com (Dan McDonald)
Date: Fri, 11 Jul 2014 14:02:58 -0400
Subject: [OmniOS-discuss] OmniOS "bloody" repo has been updated
Message-ID: <5E1D27AA-CD90-48CC-9C0A-6EBD64FC4C0B@omniti.com>

AND there's new installation media as well. Highlights include:

- Now built with pkgdepend(1) checking. To that end, I've updated the entire wad of packages. This may increase your download/upgrade time. (Thanks to community member Lauri "lotheac" Tirkkonen.)
- Several ZFS updates from upstream illumos.
  + Better behavior in the face of full or nearly-full pools.
  + Better-behaved "zfs rename" and "zfs create" when not sharing the datasets.
- rpcbind(1M) and mountd(1M) now use libumem, which will help them both when under load.
- Additional device entries for cpqary3(7D) for additional HP HBAs.

Happy updating!

Dan McD.
--
OmniOS Engineering

From fabio at fabiorabelo.wiki.br Sun Jul 13 12:07:26 2014
From: fabio at fabiorabelo.wiki.br (=?UTF-8?Q?F=C3=A1bio_Rabelo?=)
Date: Sun, 13 Jul 2014 09:07:26 -0300
Subject: [OmniOS-discuss] AHCI driver problem with ASM1062
In-Reply-To: <53C24C65.2000905@gmail.com>
References: <53C24C65.2000905@gmail.com>
Message-ID: 

I am not sure if I can help. The chips are not exactly the same (mine is the ASM1061), and I am not experiencing any errors, but the only disk connected to this card is a Samsung 256GB SSD working as a ZIL.

During my research I found warnings in several forums about the BIOS version, but all of those users were from the Windows world. Anyway, my card already came with the up-to-date BIOS.

And I did not see anything about the ASM1062 chip, only about the ASM1061, so I do not know whether the BIOS issue applies to your case.

Sorry if I cannot be more useful ...

Fábio Rabelo

2014-07-13 6:07 GMT-03:00 Brogyányi József :
> Hi
>
> I'm a little bit sad because this card is not working 100%. Some people can use
> it without any errors. My system works fine with small files.
> I think the AHCI driver is the same as OmniTI's.
> When I want to copy a big file (more than 1 GB) the system begins to slow down
> and finally stops or waits for a long time.
> Fabio, could you check this on your system? The dd copy can sometimes
> run with a few error messages and then the speed is fine for any new HDD
> (395MBps).
> Now I don't know the card is bad or the AHCI driver not fit my system. > Thanks any help. I think the message contains lot information about this > error. > Here is my dmesg output: > > Jul 13 10:43:46 hipster Transport state transition error (T) > Jul 13 10:43:46 hipster ahci: [ID 657156 kern.warning] WARNING: ahci1: error > recovery for port 1 succeed > Jul 13 10:43:46 hipster ahci: [ID 777486 kern.warning] WARNING: ahci1: ahci > port 1 has interface fatal error > Jul 13 10:43:46 hipster ahci: [ID 687168 kern.warning] WARNING: ahci1: ahci > port 1 is trying to do error recovery > Jul 13 10:43:46 hipster ahci: [ID 551337 kern.warning] WARNING: ahci1: > Handshake Error (H) > Jul 13 10:43:46 hipster Transport state transition error (T) > Jul 13 10:43:47 hipster ahci: [ID 657156 kern.warning] WARNING: ahci1: error > recovery for port 1 succeed > Jul 13 10:44:44 hipster ahci: [ID 517647 kern.warning] WARNING: ahci1: > watchdog port 1 satapkt 0xffffff02d216e8d8 timed out > Jul 13 10:44:44 hipster ahci: [ID 777486 kern.warning] WARNING: ahci1: ahci > port 1 has interface fatal error > Jul 13 10:44:44 hipster ahci: [ID 687168 kern.warning] WARNING: ahci1: ahci > port 1 is trying to do error recovery > Jul 13 10:44:44 hipster ahci: [ID 551337 kern.warning] WARNING: ahci1: > Handshake Error (H) > Jul 13 10:44:44 hipster Transport state transition error (T) > Jul 13 10:44:45 hipster ahci: [ID 657156 kern.warning] WARNING: ahci1: error > recovery for port 1 succeed > Jul 13 10:44:45 hipster sata: [ID 801845 kern.info] > /pci at 0,0/pci8086,8c14 at 1c,2/pci1b21,1060 at 0: > Jul 13 10:44:45 hipster SATA port 1 error > Jul 13 10:44:45 hipster sata: [ID 801845 kern.info] > /pci at 0,0/pci8086,8c14 at 1c,2/pci1b21,1060 at 0: > Jul 13 10:44:45 hipster SATA port 1 error > Jul 13 10:44:45 hipster sata: [ID 801845 kern.info] > /pci at 0,0/pci8086,8c14 at 1c,2/pci1b21,1060 at 0: > Jul 13 10:44:45 hipster SATA port 1 error > Jul 13 10:44:45 hipster sata: [ID 801845 kern.info] > /pci at 0,0/pci8086,8c14 at 1c,2/pci1b21,1060 at 0: > Jul 13 10:44:45 hipster SATA port 1 error > Jul 13 10:44:45 hipster fmd: [ID 377184 daemon.error] SUNW-MSG-ID: > ZFS-8000-HC, TYPE: Error, VER: 1, SEVERITY: Major > Jul 13 10:44:45 hipster EVENT-TIME: Sun Jul 13 10:44:45 CEST 2014 > Jul 13 10:44:45 hipster PLATFORM: PowerEdge-T20, CSN: 33XJJZ1, HOSTNAME: > hipster > Jul 13 10:44:45 hipster SOURCE: zfs-diagnosis, REV: 1.0 > Jul 13 10:44:45 hipster EVENT-ID: 6fb72a24-271b-4b1b-9287-d5352ace8993 > Jul 13 10:44:45 hipster DESC: The ZFS pool has experienced currently > unrecoverable I/O > Jul 13 10:44:45 hipster failures. Refer to > http://illumos.org/msg/ZFS-8000-HC for more information. > Jul 13 10:44:45 hipster AUTO-RESPONSE: No automated response will be taken. > Jul 13 10:44:45 hipster IMPACT: Read and write I/Os cannot be serviced. > Jul 13 10:44:45 hipster REC-ACTION: Make sure the affected devices are > connected, then run > Jul 13 10:44:45 hipster 'zpool clear'. > Jul 13 10:45:00 hipster fmd: [ID 377184 daemon.error] SUNW-MSG-ID: > ZFS-8000-FD, TYPE: Fault, VER: 1, SEVERITY: Major > Jul 13 10:45:00 hipster EVENT-TIME: Sun Jul 13 10:45:00 CEST 2014 > Jul 13 10:45:00 hipster PLATFORM: PowerEdge-T20, CSN: 33XJJZ1, HOSTNAME: > hipster > Jul 13 10:45:00 hipster SOURCE: zfs-diagnosis, REV: 1.0 > Jul 13 10:45:00 hipster EVENT-ID: 52d16a83-6041-e362-b66f-ff4d1fb95a61 > Jul 13 10:45:00 hipster DESC: The number of I/O errors associated with a ZFS > device exceeded > Jul 13 10:45:00 hipster acceptable levels. 
Refer to > http://illumos.org/msg/ZFS-8000-FD for more information. > Jul 13 10:45:00 hipster AUTO-RESPONSE: The device has been offlined and > marked as faulted. An attempt > Jul 13 10:45:00 hipster will be made to activate a hot spare if > available. > Jul 13 10:45:00 hipster IMPACT: Fault tolerance of the pool may be > compromised. > Jul 13 10:45:00 hipster REC-ACTION: Run 'zpool status -x' and replace the > bad device. > Jul 13 10:45:44 hipster ahci: [ID 517647 kern.warning] WARNING: ahci1: > watchdog port 1 satapkt 0xffffff02deb2f180 timed out > From youzhong at gmail.com Thu Jul 17 15:07:59 2014 From: youzhong at gmail.com (Youzhong Yang) Date: Thu, 17 Jul 2014 11:07:59 -0400 Subject: [OmniOS-discuss] Issue with LSI 3108 MegaRAID ROMB card Message-ID: Hi All, We have problem using the LSI 3108 card, just wondering if anyone here has any success story using this card in production. Here is the FM version info and error we got in /var/adm/messages: BIOS Version : 6.13.00_4.14.05.00_0x06010600 Ctrl-R Version : 5.01-0004 FW Version : 4.210.10-2910 NVDATA Version : 3.1310.00-0054 Boot Block Version : 3.00.00.00-0009 Jul 15 21:01:56 xxxx mr_sas: [ID 270009 kern.warning] WARNING: io_timeout_checker: FW Fault, calling reset adapter Jul 15 21:01:56 xxxx mr_sas: [ID 643100 kern.notice] io_timeout_checker: fw_outstanding 0x17 max_fw_cmds 0x39F Jul 15 21:01:59 xxxx mr_sas: [ID 347913 kern.warning] WARNING: mrsas_tbolt_reset_ppc: FW is in fault after OCR count 1 Retry Reset Jul 15 21:02:09 xxxx mr_sas: [ID 887724 kern.warning] WARNING: mrsas_tbolt_reset_ppc:resetadapter bit is set already check retry count 101 Thanks, -Youzhong -------------- next part -------------- An HTML attachment was scrubbed... URL: From danmcd at omniti.com Thu Jul 17 15:15:54 2014 From: danmcd at omniti.com (Dan McDonald) Date: Thu, 17 Jul 2014 11:15:54 -0400 Subject: [OmniOS-discuss] Issue with LSI 3108 MegaRAID ROMB card In-Reply-To: References: Message-ID: <59D196E8-9E21-4205-BAD5-3DFE22873AA1@omniti.com> On Jul 17, 2014, at 11:07 AM, Youzhong Yang wrote: > Hi All, > > We have problem using the LSI 3108 card, just wondering if anyone here has any success story using this card in production. When mr_sas(7d) was updated for 2208, it included untested 3108 support. 3108 was untested because people didn't have 3108 cards at the time it went back. The messages you're seeing indicate the card's timing-out an IO operation, followed by a reset-the-card failure. Beyond that, I can't help you much right now. I have no such card available to me. For the record, are you stuck using this, or did you choose a 3108? I'd recommend choosing something else if it was your choice. If it's not, please tell me what platform stuck you with a 3108, as it may be a harbinger of future complaints. Thanks, Dan From youzhong at gmail.com Thu Jul 17 15:59:16 2014 From: youzhong at gmail.com (Youzhong Yang) Date: Thu, 17 Jul 2014 11:59:16 -0400 Subject: [OmniOS-discuss] Issue with LSI 3108 MegaRAID ROMB card In-Reply-To: <59D196E8-9E21-4205-BAD5-3DFE22873AA1@omniti.com> References: <59D196E8-9E21-4205-BAD5-3DFE22873AA1@omniti.com> Message-ID: Thanks Dan. We ordered these Supermicro X9DRW-CF/CTF boxes which have ROMB LSI 3108 on the motherboard and got stuck. We will probably add 9211-8i HBA cards to the machines and get them move forward. 
Thanks, Youzhong On Thu, Jul 17, 2014 at 11:15 AM, Dan McDonald wrote: > > On Jul 17, 2014, at 11:07 AM, Youzhong Yang wrote: > > > Hi All, > > > > We have problem using the LSI 3108 card, just wondering if anyone here > has any success story using this card in production. > > > > When mr_sas(7d) was updated for 2208, it included untested 3108 support. > 3108 was untested because people didn't have 3108 cards at the time it > went back. > > The messages you're seeing indicate the card's timing-out an IO operation, > followed by a reset-the-card failure. > > Beyond that, I can't help you much right now. I have no such card > available to me. For the record, are you stuck using this, or did you > choose a 3108? I'd recommend choosing something else if it was your > choice. If it's not, please tell me what platform stuck you with a 3108, > as it may be a harbinger of future complaints. > > Thanks, > Dan > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From danmcd at omniti.com Thu Jul 17 15:59:28 2014 From: danmcd at omniti.com (Dan McDonald) Date: Thu, 17 Jul 2014 11:59:28 -0400 Subject: [OmniOS-discuss] Issue with LSI 3108 MegaRAID ROMB card In-Reply-To: References: <59D196E8-9E21-4205-BAD5-3DFE22873AA1@omniti.com> Message-ID: On Jul 17, 2014, at 11:59 AM, Youzhong Yang wrote: > Thanks Dan. > > We ordered these Supermicro X9DRW-CF/CTF boxes which have ROMB LSI 3108 on the motherboard and got stuck. We will probably add 9211-8i HBA cards to the machines and get them move forward. Before I saw this, one of my OmniTI co-workers mentioned that model of mobo as one possibility. Thanks, Dan From youzhong at gmail.com Thu Jul 17 19:30:25 2014 From: youzhong at gmail.com (Youzhong Yang) Date: Thu, 17 Jul 2014 15:30:25 -0400 Subject: [OmniOS-discuss] [smartos-discuss] Re: Issue with LSI 3108 MegaRAID ROMB card In-Reply-To: <20140717170103.GA425@joyent.com> References: <59D196E8-9E21-4205-BAD5-3DFE22873AA1@omniti.com> <20140717170103.GA425@joyent.com> Message-ID: Thanks Keith. We upgraded the firmware to its latest but still no luck, so likely we'll give up. BIOS Version : 6.17.04.0_4.16.08.00_0x06060A01 Ctrl-R Version : 5.04-0002 Preboot CLI Version: 01.07-05:#%0000 FW Version : 4.230.20-3532 NVDATA Version : 3.1403.00-0079 Boot Block Version : 3.02.00.00-0001 Jul 17 13:08:35 batfs0388 mr_sas: [ID 643100 kern.notice] io_timeout_checker: fw_outstanding 0x17 max_fw_cmds 0x39F Jul 17 13:08:38 batfs0388 mr_sas: [ID 347913 kern.warning] WARNING: mrsas_tbolt_reset_ppc: FW is in fault after OCR count 1 Retry Reset Jul 17 13:08:48 batfs0388 mr_sas: [ID 887724 kern.warning] WARNING: mrsas_tbolt_reset_ppc:resetadapter bit is set already check retry count 101 Jul 17 13:08:49 batfs0388 mr_sas: [ID 270009 kern.warning] WARNING: io_timeout_checker: FW Fault, calling reset adapter On Thu, Jul 17, 2014 at 1:01 PM, Keith Wesolowski < keith.wesolowski at joyent.com> wrote: > On Thu, Jul 17, 2014 at 11:59:16AM -0400, Youzhong Yang via > smartos-discuss wrote: > > > We ordered these Supermicro X9DRW-CF/CTF boxes which have ROMB LSI 3108 > on > > the motherboard and got stuck. We will probably add 9211-8i HBA cards to > > the machines and get them move forward. > > The 9207-8i is likely a better fit; it comes standard with IT firmware > vs IR in the 9211-8i. Both do work, however. SMCI makes a similar > board, the X9DRD-7LN4F-JBOD, which has the same 2308-IT on board as the > 9207-8i. I recommend using that instead of the X9DRW unless you're > wedded to the WIO form factor. 
> -------------- next part -------------- An HTML attachment was scrubbed... URL: From henson at acm.org Mon Jul 21 02:59:02 2014 From: henson at acm.org (Paul B. Henson) Date: Sun, 20 Jul 2014 19:59:02 -0700 Subject: [OmniOS-discuss] unexpected BE snapshots Message-ID: <20140721025902.GL31192@bender.unx.csupomona.edu> I decided to go clean up some old boot environments, and noticed some unexpected snapshots of my new 151010 BE. Before I cleaned up, it looked like this: rpool 2.56G 36.6G 37K /rpool rpool/ROOT 2.55G 36.6G 31K legacy rpool/ROOT/omnios-r151008f 590K 36.6G 1.05G / rpool/ROOT/omnios-r151008f-tty-irq 494K 36.6G 814M / rpool/ROOT/omnios-r151008f-ttyc-1 592K 36.6G 813M / rpool/ROOT/omnios-r151008j 652K 36.6G 813M / rpool/ROOT/omnios-r151008t 53.5M 36.6G 823M / rpool/ROOT/omnios-r151008t-backup-1 58K 36.6G 828M / rpool/ROOT/omnios-r151010 2.50G 36.6G 1.08G / rpool/ROOT/omnios-r151010 at install 490M - 672M - rpool/ROOT/omnios-r151010 at 2014-02-22-03:00:02 340M - 1.05G - rpool/ROOT/omnios-r151010 at 2014-03-13-22:38:49 64.4M - 813M - rpool/ROOT/omnios-r151010 at 2014-05-30-21:31:31 3.54M - 822M - rpool/ROOT/omnios-r151010 at 2014-05-30-21:36:13 6.19M - 822M - rpool/ROOT/omnios-r151010 at 2014-06-05-22:00:07 18.0M - 828M - rpool/ROOT/omnios-r151010 at 2014-06-16-18:55:29 28.8M - 1.01G - I noticed the new 151010 BE had a number of snapshots I didn't make; the dates are particularly odd given 151010 wasn't released until May something and I didn't install it until Junish. After running beadm to delete my old BE's, it then looked like this: rpool 2.02G 37.1G 36.5K /rpool rpool/ROOT 2.02G 37.1G 31K legacy rpool/ROOT/omnios-r151008t 53.5M 37.1G 823M / rpool/ROOT/omnios-r151010 1.97G 37.1G 1.08G / rpool/ROOT/omnios-r151010 at install 492M - 672M - rpool/ROOT/omnios-r151010 at 2014-06-16-18:55:29 412M - 1.01G - Five of the 151010 snapshots had disappeared, presumably cleaned up by beadm? Are these beadm managed snapshots something new? I don't recall ever seeing them before. Can I just delete them manually or would that break something? Between the two of them that are left, they seem to be sucking up about 1 gig or so. Thanks... From danmcd at omniti.com Wed Jul 23 14:18:46 2014 From: danmcd at omniti.com (Dan McDonald) Date: Wed, 23 Jul 2014 10:18:46 -0400 Subject: [OmniOS-discuss] Who's using "bloody" out there? Message-ID: I'm spinning what will be this week's update to bloody right now. There's at least one bugfix in there that I'm kinda surprised nobody noticed or complained about. Can people who are using bloody quickly send a reply to the list on this thread? Thanks, Dan From danmcd at omniti.com Wed Jul 23 14:22:11 2014 From: danmcd at omniti.com (Dan McDonald) Date: Wed, 23 Jul 2014 10:22:11 -0400 Subject: [OmniOS-discuss] OmniOS "bloody" repo has been updated Message-ID: <41D125AE-CB35-46E8-91B8-7291F0ABAE09@omniti.com> Hey everyone! Once again, I've updated the install media for this update as well as the IPS repo. It's a big one, and includes one new device I'd *REALLY* like folks to test upon. Broken down by category, what's new in this update? userspace - zsh to 5.0.5. - mandoc now in the man pages. headers - POSIX 2008 locale support devices - mpt_sas now supports 12G **** WOULD APPRECIATE TESTING **** zones - virtualized load average for zones - per-zone CPU kstats Files & sharing - Several ZFS bugfixes and enhancements. - rpcbind bugfix - NLM bugfixes TCP/IP - tcp_strong_iss defaults to 2. - ipsec_policy_log_interval to 0. Happy updating! 
Dan From danmcd at omniti.com Wed Jul 23 14:34:48 2014 From: danmcd at omniti.com (Dan McDonald) Date: Wed, 23 Jul 2014 10:34:48 -0400 Subject: [OmniOS-discuss] NOT YET (was Re: OmniOS "bloody" repo has been updated) In-Reply-To: <41D125AE-CB35-46E8-91B8-7291F0ABAE09@omniti.com> References: <41D125AE-CB35-46E8-91B8-7291F0ABAE09@omniti.com> Message-ID: <550297D7-9154-43DB-AD20-E10331532676@omniti.com> AAAAAH! Shoot, this update isn't ready yet. But everything mentioned below will be included. Sorry, Dan On Jul 23, 2014, at 10:22 AM, Dan McDonald wrote: > Hey everyone! > > Once again, I've updated the install media for this update as well as the IPS repo. It's a big one, and includes one new device I'd *REALLY* like folks to test upon. > > Broken down by category, what's new in this update? > > userspace > - zsh to 5.0.5. > - mandoc now in the man pages. > > headers > - POSIX 2008 locale support > > devices > - mpt_sas now supports 12G **** WOULD APPRECIATE TESTING **** > > zones > - virtualized load average for zones > - per-zone CPU kstats > > Files & sharing > - Several ZFS bugfixes and enhancements. > - rpcbind bugfix > - NLM bugfixes > > TCP/IP > - tcp_strong_iss defaults to 2. > - ipsec_policy_log_interval to 0. > > > Happy updating! > Dan > From fabio at fabiorabelo.wiki.br Wed Jul 23 14:41:40 2014 From: fabio at fabiorabelo.wiki.br (=?UTF-8?Q?F=C3=A1bio_Rabelo?=) Date: Wed, 23 Jul 2014 11:41:40 -0300 Subject: [OmniOS-discuss] OmniOS "bloody" repo has been updated In-Reply-To: <41D125AE-CB35-46E8-91B8-7291F0ABAE09@omniti.com> References: <41D125AE-CB35-46E8-91B8-7291F0ABAE09@omniti.com> Message-ID: Just for clarification, this mpt-sas addresses LSI 2308 ? Anyway, it is great news .... thanks ... F?bio Rabelo 2014-07-23 11:22 GMT-03:00 Dan McDonald : > Hey everyone! > > Once again, I've updated the install media for this update as well as the IPS repo. It's a big one, and includes one new device I'd *REALLY* like folks to test upon. > > Broken down by category, what's new in this update? > > userspace > - zsh to 5.0.5. > - mandoc now in the man pages. > > headers > - POSIX 2008 locale support > > devices > - mpt_sas now supports 12G **** WOULD APPRECIATE TESTING **** > > zones > - virtualized load average for zones > - per-zone CPU kstats > > Files & sharing > - Several ZFS bugfixes and enhancements. > - rpcbind bugfix > - NLM bugfixes > > TCP/IP > - tcp_strong_iss defaults to 2. > - ipsec_policy_log_interval to 0. > > > Happy updating! > Dan > > _______________________________________________ > OmniOS-discuss mailing list > OmniOS-discuss at lists.omniti.com > http://lists.omniti.com/mailman/listinfo/omnios-discuss From danmcd at omniti.com Wed Jul 23 23:14:08 2014 From: danmcd at omniti.com (Dan McDonald) Date: Wed, 23 Jul 2014 19:14:08 -0400 Subject: [OmniOS-discuss] NOW AVAILABLE - OmniOS "bloody" repo has been updated In-Reply-To: <550297D7-9154-43DB-AD20-E10331532676@omniti.com> References: <41D125AE-CB35-46E8-91B8-7291F0ABAE09@omniti.com> <550297D7-9154-43DB-AD20-E10331532676@omniti.com> Message-ID: Let's try this again! . . . Hey everyone! Once again, I've updated the install media for this update as well as the IPS repo. It's a big one, and includes one new device I'd *REALLY* like folks to test upon IF they have said device (LSI 3008 12G SAS chipset). Broken down by category, what's new in this update? userspace - zsh to 5.0.5. - mandoc now in the man pages. 
headers - POSIX 2008 locale support devices - mpt_sas now supports LSI 3008 12G SAS **** WOULD APPRECIATE TESTING **** zones - virtualized load average for zones - per-zone CPU kstats Files & sharing - Several ZFS bugfixes and enhancements. - rpcbind bugfix - NLM bugfixes TCP/IP - tcp_strong_iss defaults to 2. - ipsec_policy_log_interval to 0. Happy updating! Dan From henson at acm.org Thu Jul 24 00:25:23 2014 From: henson at acm.org (Paul B. Henson) Date: Wed, 23 Jul 2014 17:25:23 -0700 Subject: [OmniOS-discuss] Who's using "bloody" out there? In-Reply-To: References: Message-ID: <145c01cfa6d5$c409b890$4c1d29b0$@acm.org> > From: Dan McDonald > Sent: Wednesday, July 23, 2014 7:19 AM > > Can people who are using bloody quickly send a reply to the list on this > thread? I've got a bloody box that I poke at and update occasionally, but I can't really say that I'm "using" it very much. I'd notice if it completely failed to boot or died randomly, but beyond that, not much testing going on :(, sorry. From al.slater at scluk.com Thu Jul 24 10:16:23 2014 From: al.slater at scluk.com (Al Slater) Date: Thu, 24 Jul 2014 11:16:23 +0100 Subject: [OmniOS-discuss] pckrecv chash failure Message-ID: <53D0DCF7.9070307@scluk.com> Hi, I am trying to pkgrecv http://pkg.omniti.com/omnios/release to my own repository, but it is failing with root at omniostest:/export/home/aslate# pkgrecv -s http://pkg.omniti.com/omnios/release -d file:///sclomnios -c /var/tmp/pkgrecv-2GrduN 'pkg:/*' Processing packages for publisher omnios ... Retrieving and evaluating 1039 package(s)... PROCESS ITEMS GET (MB) SEND (MB) developer/sunstudio12.1 0/109 5.5/336.5 0.0/1138.5pkgrecv: Invalid contentpath opt/sunstudio12.1/prod/lib/sys/libsunir.so: chash failure: expected: b251c238070b6fdbf392194e85319e2c954a5384 computed: 289fe42c63d889a623a80b9158517139bd29ac3b. (happened 4 times) Is there a problem with the repo? -- Al Slater Technical Director SCL Phone : +44 (0)1273 666607 Fax : +44 (0)1273 666601 email : al.slater at scluk.com Stanton Consultancy Ltd Park Gate, 161 Preston Road, Brighton, East Sussex, BN1 6AU Registered in England Company number: 1957652 VAT number: GB 760 2433 55 From groups at tierarzt-mueller.de Thu Jul 24 18:25:32 2014 From: groups at tierarzt-mueller.de (Alexander Lesle) Date: Thu, 24 Jul 2014 20:25:32 +0200 Subject: [OmniOS-discuss] Oracle Java v7u65 Message-ID: <567177507.20140724202532@tierarzt-mueller.de> Hello All Is it possible to install Oracle Java v7u65 JRE to OI? Is there a package available? And how to do the installation. Thanks. -- Best Regards Alexander Juli, 24 2014 From jesus at omniti.com Thu Jul 24 18:32:27 2014 From: jesus at omniti.com (Theo Schlossnagle) Date: Thu, 24 Jul 2014 14:32:27 -0400 Subject: [OmniOS-discuss] Oracle Java v7u65 In-Reply-To: <567177507.20140724202532@tierarzt-mueller.de> References: <567177507.20140724202532@tierarzt-mueller.de> Message-ID: Do you mean to "OmniOS".... OI is a different distribution. On Thu, Jul 24, 2014 at 2:25 PM, Alexander Lesle wrote: > Hello All > > Is it possible to install Oracle Java v7u65 JRE to OI? > Is there a package available? > And how to do the installation. > > Thanks. > > -- > Best Regards > Alexander > Juli, 24 2014 > > _______________________________________________ > OmniOS-discuss mailing list > OmniOS-discuss at lists.omniti.com > http://lists.omniti.com/mailman/listinfo/omnios-discuss > -- Theo Schlossnagle http://omniti.com/is/theo-schlossnagle -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From jdg117 at elvis.arl.psu.edu Thu Jul 24 18:33:12 2014 From: jdg117 at elvis.arl.psu.edu (John D Groenveld) Date: Thu, 24 Jul 2014 14:33:12 -0400 Subject: [OmniOS-discuss] Oracle Java v7u65 In-Reply-To: Your message of "Thu, 24 Jul 2014 20:25:32 +0200." <567177507.20140724202532@tierarzt-mueller.de> References: <567177507.20140724202532@tierarzt-mueller.de> Message-ID: <201407241833.s6OIXChp026989@elvis.arl.psu.edu> In message <567177507.20140724202532 at tierarzt-mueller.de>, Alexander Lesle writ es: >Is it possible to install Oracle Java v7u65 JRE to OI? >Is there a package available? The SVR4 packages for Solaris 10 are on OTN: >And how to do the installation. pkgadd(1M) per the README. John groenveld at acm.org From groups at tierarzt-mueller.de Thu Jul 24 18:36:33 2014 From: groups at tierarzt-mueller.de (Alexander Lesle) Date: Thu, 24 Jul 2014 20:36:33 +0200 Subject: [OmniOS-discuss] Oracle Java v7u65 In-Reply-To: References: <567177507.20140724202532@tierarzt-mueller.de> Message-ID: <473254889.20140724203633@tierarzt-mueller.de> Hello Theo Schlossnagle and List, you are right. Sorry. At the moment I mean OmniOS. On Juli, 24 2014, 20:32 wrote in [1]: > Do you mean to "OmniOS".... OI is a different distribution. > On Thu, Jul 24, 2014 at 2:25 PM, Alexander Lesle > wrote: > > Hello All > > Is it possible to install Oracle Java v7u65 JRE to OI? > Is there a package available? > And how to do the installation. > > Thanks. > > -- > Best Regards > Alexander > Juli, 24 2014 > > _______________________________________________ > OmniOS-discuss mailing list > OmniOS-discuss at lists.omniti.com > http://lists.omniti.com/mailman/listinfo/omnios-discuss > -- Best Regards Alexander Juli, 24 2014 ........ [1] mid:CACLsApuqenQy4VFBTSaz_S2_e5Tf3ZWK_-5eaR--M-nCfK96WA at mail.gmail.com ........ From jesus at omniti.com Thu Jul 24 18:40:47 2014 From: jesus at omniti.com (Theo Schlossnagle) Date: Thu, 24 Jul 2014 14:40:47 -0400 Subject: [OmniOS-discuss] Oracle Java v7u65 In-Reply-To: <473254889.20140724203633@tierarzt-mueller.de> References: <567177507.20140724202532@tierarzt-mueller.de> <473254889.20140724203633@tierarzt-mueller.de> Message-ID: I usually download the gzip tarball for i586 and x86_64 for Solaris 10. And untar it in /opt/ On Thu, Jul 24, 2014 at 2:36 PM, Alexander Lesle wrote: > Hello Theo Schlossnagle and List, > > you are right. Sorry. > > At the moment I mean OmniOS. > > On Juli, 24 2014, 20:32 wrote in [1]: > > > Do you mean to "OmniOS".... OI is a different distribution. > > > > On Thu, Jul 24, 2014 at 2:25 PM, Alexander Lesle > > wrote: > > > > Hello All > > > > Is it possible to install Oracle Java v7u65 JRE to OI? > > Is there a package available? > > And how to do the installation. > > > > Thanks. > > > > -- > > Best Regards > > Alexander > > Juli, 24 2014 > > > > _______________________________________________ > > OmniOS-discuss mailing list > > OmniOS-discuss at lists.omniti.com > > http://lists.omniti.com/mailman/listinfo/omnios-discuss > > > > > > > > -- > Best Regards > Alexander > Juli, 24 2014 > ........ > [1] mid:CACLsApuqenQy4VFBTSaz_S2_e5Tf3ZWK_-5eaR--M-nCfK96WA at mail.gmail.com > ........ > > _______________________________________________ > OmniOS-discuss mailing list > OmniOS-discuss at lists.omniti.com > http://lists.omniti.com/mailman/listinfo/omnios-discuss > -- Theo Schlossnagle http://omniti.com/is/theo-schlossnagle -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From groups at tierarzt-mueller.de Thu Jul 24 18:57:17 2014 From: groups at tierarzt-mueller.de (Alexander Lesle) Date: Thu, 24 Jul 2014 20:57:17 +0200 Subject: [OmniOS-discuss] Oracle Java v7u65 In-Reply-To: References: <567177507.20140724202532@tierarzt-mueller.de> <473254889.20140724203633@tierarzt-mueller.de> Message-ID: <639180269.20140724205717@tierarzt-mueller.de> Hello Theo Schlossnagle and List, You mean this file "jre-7u65-solaris-i586.tar.gz" from java.com Only unpack in /opt/ Nothing else? No Var setting in PATH?? So easy? :)) On Juli, 24 2014, 20:40 wrote in [1]: > I usually download the gzip tarball for i586 and x86_64 for Solaris 10. ?And untar it in /opt/ > On Thu, Jul 24, 2014 at 2:36 PM, Alexander Lesle > wrote: > > Hello Theo Schlossnagle and List, > > you are right. Sorry. > > At the moment I mean OmniOS. > > On Juli, 24 2014, 20:32 wrote in [1]: > >> Do you mean to "OmniOS".... OI is a different distribution. > > >> On Thu, Jul 24, 2014 at 2:25 PM, Alexander Lesle >> wrote: >> >> Hello All >> >> ?Is it possible to install Oracle Java v7u65 JRE to OI? >> ?Is there a package available? >> ?And how to do the installation. >> >> ?Thanks. >> >> ?-- >> ?Best Regards >> ?Alexander >> ?Juli, 24 2014 >> >> ?_______________________________________________ >> ?OmniOS-discuss mailing list >> ?OmniOS-discuss at lists.omniti.com >> ?http://lists.omniti.com/mailman/listinfo/omnios-discuss >> > > > > > > -- > Best Regards > Alexander > Juli, 24 2014 > ........ > [1] > mid:CACLsApuqenQy4VFBTSaz_S2_e5Tf3ZWK_-5eaR--M-nCfK96WA at mail.gmail.com > ........ > > _______________________________________________ > OmniOS-discuss mailing list > OmniOS-discuss at lists.omniti.com > http://lists.omniti.com/mailman/listinfo/omnios-discuss > -- Best Regards Alexander Juli, 24 2014 ........ [1] mid:CACLsApvqJW1Uuw84+Js=_Z7bGY9oCQfy=K6zuuGXkVNZ641sUw at mail.gmail.com ........ From groups at tierarzt-mueller.de Thu Jul 24 19:00:44 2014 From: groups at tierarzt-mueller.de (Alexander Lesle) Date: Thu, 24 Jul 2014 21:00:44 +0200 Subject: [OmniOS-discuss] Oracle Java v7u65 In-Reply-To: <201407241833.s6OIXChp026989@elvis.arl.psu.edu> References: <567177507.20140724202532@tierarzt-mueller.de> <201407241833.s6OIXChp026989@elvis.arl.psu.edu> Message-ID: <1727715525.20140724210044@tierarzt-mueller.de> Hello John D Groenveld and List, Whats OTN? Whats the different like the solution from Theo? pkgadd or untar On Juli, 24 2014, 20:33 wrote in [1]: > In message <567177507.20140724202532 at tierarzt-mueller.de>, Alexander Lesle writ > es: >>Is it possible to install Oracle Java v7u65 JRE to OI? >>Is there a package available? > The SVR4 packages for Solaris 10 are on OTN: > >>And how to do the installation. > pkgadd(1M) per the README. > John > groenveld at acm.org > _______________________________________________ > OmniOS-discuss mailing list > OmniOS-discuss at lists.omniti.com > http://lists.omniti.com/mailman/listinfo/omnios-discuss -- Best Regards Alexander Juli, 24 2014 ........ [1] mid:201407241833.s6OIXChp026989 at elvis.arl.psu.edu ........ From jimklimov at cos.ru Fri Jul 25 08:47:11 2014 From: jimklimov at cos.ru (Jim Klimov) Date: Fri, 25 Jul 2014 10:47:11 +0200 Subject: [OmniOS-discuss] Oracle Java v7u65 In-Reply-To: <1727715525.20140724210044@tierarzt-mueller.de> References: <567177507.20140724202532@tierarzt-mueller.de> <201407241833.s6OIXChp026989@elvis.arl.psu.edu> <1727715525.20140724210044@tierarzt-mueller.de> Message-ID: <1456d13f-43ac-42f1-a21f-feec284f8b6c@email.android.com> 24 ???? 
2014, at 21:00:44 CEST, Alexander Lesle wrote:
>Hello John D Groenveld and List,
>
>What's OTN?
>
>What's the difference compared to Theo's solution?
>pkgadd or untar
>
>On Juli, 24 2014, 20:33 wrote in [1]:
>
>> In message <567177507.20140724202532 at tierarzt-mueller.de>, Alexander
>Lesle writes:
>>>Is it possible to install Oracle Java v7u65 JRE to OI?
>>>Is there a package available?
>
>> The SVR4 packages for Solaris 10 are on OTN:
>>
>
>>>And how to do the installation.
>
>> pkgadd(1M) per the README.
>
>> John
>> groenveld at acm.org
>> _______________________________________________
>> OmniOS-discuss mailing list
>> OmniOS-discuss at lists.omniti.com
>> http://lists.omniti.com/mailman/listinfo/omnios-discuss

Did you previously use and manage Java on Solaris, illumos or other OSes?

There is a JAVA_HOME environment variable (set in a shell, profile, init script, SMF attributes, etc.) that points your Java program, such as Tomcat, a CLI tool, or some GUI installer, to the installation location of the JVM you want to use in that case. Of course, the "java" program for this JVM instance should be the one from the corresponding "$JAVA_HOME/bin" path.

So you can have multiple installations and many JVMs running with different versions (there are programs where backwards compatibility does not cut it and you do need an older version of Java, for example).

It is customary to install Solaris Javas into /usr/jdk/instances/<version>/ and to symlink /usr/jdk/latest and /usr/java to the directory with the version you most likely need; a dozen programs in the standard PATH like /usr/bin/java are in fact symlinks to /usr/java/bin/java and such.

For hosts where /usr is system-managed and should not be touched by users according to some policy, /opt/jdk, /opt/java or plain /opt/ are commonly used as containers for JRE/JDK installations (typically as unzipped archives, unpackaged). It still makes sense to maintain /opt/java or even /usr/java (if changeable) to point to the installation this zone needs.

Note that for some Java versions the x86_64 package or archive only includes the 64-bit files and should overlay a 32-bit JVM of the same version installed/unpacked into the same location. For other OSes or major Java versions the releases are fully sufficient on their own. An indicator may be the file size (i.e. 80MB 32-bit + 10MB 64-bit vs. both similarly sized).

As a hint, if you use many local zones to host farms of Java appservers, etc., you'll find that updating Javas consistently (especially un-packaged ones) has a large footprint in storage and management, more so if you customize the JDK installations (local CAs, most recent timezones and so on). Instead, I install/unpack/customize once in the GZ (one best-compressed dataset per JDK, in a structure resembling the standard Solaris JDK installation), and lofs-mount the whole lot into the local zones. If certain zones need more customization, you can clone off their own copy of the JDK dataset and delegate it into the zone, but we've never needed that beyond some testing of the approach (cumbersome, but works and saves space). Also, once you've completed this update on one host, it is easy to 'zfs send' it to your other GZs hosting LZs with Java.
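To make the tarball route concrete, a minimal sketch (the archive names and the jre1.7.0_65 directory are just examples for 7u65, not something verified here; adjust to whatever the downloaded archives actually contain):

# unpack the 32-bit archive first, then overlay the x64 one on top of it
cd /opt
gzip -dc /var/tmp/jre-7u65-solaris-i586.tar.gz | tar xf -
gzip -dc /var/tmp/jre-7u65-solaris-x64.tar.gz | tar xf -
# keep a stable symlink so scripts don't hard-code the version
ln -s /opt/jre1.7.0_65 /opt/java
# point the environment at it
export JAVA_HOME=/opt/java
export PATH=$JAVA_HOME/bin:$PATH
java -version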
HTH, //Jim Ps: OTN = Oracle TechNet -- Typos courtesy of K-9 Mail on my Samsung Android From groups at tierarzt-mueller.de Fri Jul 25 14:04:59 2014 From: groups at tierarzt-mueller.de (Alexander Lesle) Date: Fri, 25 Jul 2014 16:04:59 +0200 Subject: [OmniOS-discuss] Oracle Java v7u65 In-Reply-To: <1456d13f-43ac-42f1-a21f-feec284f8b6c@email.android.com> References: <567177507.20140724202532@tierarzt-mueller.de> <201407241833.s6OIXChp026989@elvis.arl.psu.edu> <1727715525.20140724210044@tierarzt-mueller.de> <1456d13f-43ac-42f1-a21f-feec284f8b6c@email.android.com> Message-ID: <821427678.20140725160459@tierarzt-mueller.de> Hello Jim Klimov, No, I havent. Thank you very much for your informations and tips. On Juli, 25 2014, 10:47 wrote in [1]: > Did you previously use and manage Java, on solaris, ilkumos or other oses? > There is a JAVA_HOME environment variable (set in shell, profile, > initscript, smf attribs, etc.) that points your Java program such as > tomcat, or a cli tool, or some gui installer or whatever to the > installation location of the jvm you want to use in this case. Of > course, the "java" program for this jvm instance should be from the > corresponding "$JAVA_HOME/bin" path. > So you can have multiple installations and many JVMs running with > different versions (there are programs where backwards compatibility > does not cut it and you do need an older version of Java for example). > It is customary to install Solaris java's into > /usr/jdk/instances// and symlink /usr/jdk/latest and > /usr/java to the directory with the version you need most likely, > and a dozen programs in the standard PATH like /usr/bin/java are in > fact symlinks to /usr/java/bin/java and such. > For hosts where /usr is system-managed and should not be touched by > users according to some policy, /opt/jdk, /opt/java or plain > /opt/ are commonly used as containers for jre/jdk > installations (typically as unzipped archives, unpackaged). It still > makes sense to maintain /opt/java or even /usr/java (if changeable) > to point to the installation this zone needs. > Note that for some java versions the x86_64 package or archive only > includes the 64-bit files and should overlay a 32-bit jvm of the > same version installed/unpacked into the same location. For other > oses or major java versions the releases are fully sufficient. An > indicator may be the file size (i.e. 80mb 32-bit + 10mb 64-bit vs. both similarly-sized). > As a hint, if you use many local zones to host farms of java > appservers,etc., you'll find that updating java's consistently > (especially un-packaged) has a large footprint in storage and > management, more so if you customize the jdk installations (local > CA's, most recent timezones and so on). Instead, I > install/unpack/customize once in the GZ (one best-compressed dataset > per jdk in a structure resembling the standard solaris jdk > installation), and lofs-mount the whole lot into the local zones. If > certain zones need more customization, you can clone off their copy > of jdk dataset and delegate into the zone, but we've never needed > that beyond some testing of the approach (cumbersome but works and > saves space). Also, once you've completed this update on one host, > it is easy to 'zfs-send' to your other GZ's hosting LZs with javas. > HTH, > //Jim > Ps: OTN = Oracle TechNet > -- > Typos courtesy of K-9 Mail on my Samsung Android -- Best Regards Alexander Juli, 25 2014 ........ [1] mid:1456d13f-43ac-42f1-a21f-feec284f8b6c at email.android.com ........ 
From nrhuff at umn.edu Mon Jul 28 16:59:43 2014
From: nrhuff at umn.edu (Nathan Huff)
Date: Mon, 28 Jul 2014 11:59:43 -0500
Subject: [OmniOS-discuss] question on pkg info behavior
Message-ID: <53D6817F.50407@umn.edu>

I am currently setting up a server using 151006 LTS and I am seeing something unexpected from the 'pkg info' command, and I am not sure if it is a bug or just that I don't understand how it is supposed to work.

If I run 'pkg info gnu-patch' I get

Name: text/gnu-patch
Summary: The GNU Patch utility
State: Installed
Publisher: omnios
Version: 2.7
Build Release: 5.11
Branch: 0.151006
Packaging Date: Mon May 6 19:55:18 2013
Size: 239.38 kB
FMRI: pkg://omnios/text/gnu-patch at 2.7,5.11-0.151006:20130506T195518Z

which is what I expect, but if I run 'pkg info -r gnu-patch' I get

Name: text/gnu-patch
Summary: The GNU Patch utility
State: Not installed
Publisher: omnios
Version: 2.7
Build Release: 5.11
Branch: 0.151008
Packaging Date: Wed Dec 4 02:52:08 2013
Size: 240.62 kB
FMRI: pkg://omnios/text/gnu-patch at 2.7,5.11-0.151008:20131204T025208Z

And if I run 'pkg install gnu-patch' it tells me there are no updates for the image. It looks like 'pkg info -r' isn't restricting itself to the system's branch, but 'pkg install' is. It seems like the correct behavior would be to have pkg info restrict itself to the branch as well.

-- Nathan Huff System Administrator Academic Health Center Information Systems University of Minnesota 612-626-9136

From esproul at omniti.com Mon Jul 28 22:06:58 2014
From: esproul at omniti.com (Eric Sproul)
Date: Mon, 28 Jul 2014 18:06:58 -0400
Subject: [OmniOS-discuss] question on pkg info behavior
In-Reply-To: <53D6817F.50407@umn.edu>
References: <53D6817F.50407@umn.edu>
Message-ID:

On Mon, Jul 28, 2014 at 12:59 PM, Nathan Huff wrote:
> It looks like 'pkg info -r' isn't restricting itself to the system's branch,
> but 'pkg install' is. It seems like the correct behavior would be to have
> pkg info restrict itself to the branch as well.

Hi Nathan,
The behavior of 'pkg info -r' is to show you the most recent version available in the remote repository. That remote version may or may not be installable due to local restrictions (in your case, it's the omnios-userland package, which incorporates the r151006 version of gnu-patch). See pkg(5) for an explanation of incorporate dependencies.

In the case of LTS and r151008, these versions are in the same repo. Starting with r151010, there is a separate repo per release, which will cause fewer problems and less confusion for users.

Eric

From moo at wuffers.net Tue Jul 29 00:11:32 2014
From: moo at wuffers.net (wuffers)
Date: Mon, 28 Jul 2014 20:11:32 -0400
Subject: [OmniOS-discuss] Slow scrub performance
Message-ID:

Does this look normal?
pool: rpool state: ONLINE scan: scrub repaired 0 in 0h3m with 0 errors on Tue Jul 15 09:36:17 2014 config: NAME STATE READ WRITE CKSUM rpool ONLINE 0 0 0 mirror-0 ONLINE 0 0 0 c4t0d0s0 ONLINE 0 0 0 c4t1d0s0 ONLINE 0 0 0 errors: No known data errors pool: tank state: ONLINE scan: scrub in progress since Mon Jul 14 17:54:42 2014 6.59T scanned out of 24.2T at 5.71M/s, (scan is slow, no estimated time) 0 repaired, 27.25% done config: NAME STATE READ WRITE CKSUM tank ONLINE 0 0 0 mirror-0 ONLINE 0 0 0 c1t5000C50055F9F637d0 ONLINE 0 0 0 c1t5000C50055F9EF2Fd0 ONLINE 0 0 0 mirror-1 ONLINE 0 0 0 c1t5000C50055F87D97d0 ONLINE 0 0 0 c1t5000C50055F9D3B3d0 ONLINE 0 0 0 mirror-2 ONLINE 0 0 0 c1t5000C50055E6606Fd0 ONLINE 0 0 0 c1t5000C50055F9F92Bd0 ONLINE 0 0 0 mirror-3 ONLINE 0 0 0 c1t5000C50055F856CFd0 ONLINE 0 0 0 c1t5000C50055F9FE87d0 ONLINE 0 0 0 mirror-4 ONLINE 0 0 0 c1t5000C50055F84A97d0 ONLINE 0 0 0 c1t5000C50055FA0AF7d0 ONLINE 0 0 0 mirror-5 ONLINE 0 0 0 c1t5000C50055F9D3E3d0 ONLINE 0 0 0 c1t5000C50055F9F0B3d0 ONLINE 0 0 0 mirror-6 ONLINE 0 0 0 c1t5000C50055F8A46Fd0 ONLINE 0 0 0 c1t5000C50055F9FB8Bd0 ONLINE 0 0 0 mirror-7 ONLINE 0 0 0 c1t5000C50055F8B21Fd0 ONLINE 0 0 0 c1t5000C50055F9F89Fd0 ONLINE 0 0 0 mirror-8 ONLINE 0 0 0 c1t5000C50055F8BE3Fd0 ONLINE 0 0 0 c1t5000C50055F9E123d0 ONLINE 0 0 0 mirror-9 ONLINE 0 0 0 c1t5000C50055F9379Bd0 ONLINE 0 0 0 c1t5000C50055F9E7D7d0 ONLINE 0 0 0 mirror-10 ONLINE 0 0 0 c1t5000C50055E65F0Fd0 ONLINE 0 0 0 c1t5000C50055F9F80Bd0 ONLINE 0 0 0 mirror-11 ONLINE 0 0 0 c1t5000C50055F8A22Bd0 ONLINE 0 0 0 c1t5000C50055F8D48Fd0 ONLINE 0 0 0 mirror-12 ONLINE 0 0 0 c1t5000C50055E65807d0 ONLINE 0 0 0 c1t5000C50055F8BFA3d0 ONLINE 0 0 0 mirror-13 ONLINE 0 0 0 c1t5000C50055E579F7d0 ONLINE 0 0 0 c1t5000C50055E65877d0 ONLINE 0 0 0 mirror-14 ONLINE 0 0 0 c1t5000C50055F9FA1Fd0 ONLINE 0 0 0 c1t5000C50055F8CDA7d0 ONLINE 0 0 0 mirror-15 ONLINE 0 0 0 c1t5000C50055F8BF9Bd0 ONLINE 0 0 0 c1t5000C50055F9A607d0 ONLINE 0 0 0 mirror-16 ONLINE 0 0 0 c1t5000C50055E66503d0 ONLINE 0 0 0 c1t5000C50055E4FDE7d0 ONLINE 0 0 0 mirror-17 ONLINE 0 0 0 c1t5000C50055F8E017d0 ONLINE 0 0 0 c1t5000C50055F9F3EBd0 ONLINE 0 0 0 mirror-18 ONLINE 0 0 0 c1t5000C50055F8B80Fd0 ONLINE 0 0 0 c1t5000C50055F9F63Bd0 ONLINE 0 0 0 mirror-19 ONLINE 0 0 0 c1t5000C50055F84FB7d0 ONLINE 0 0 0 c1t5000C50055F9FEABd0 ONLINE 0 0 0 mirror-20 ONLINE 0 0 0 c1t5000C50055F8CCAFd0 ONLINE 0 0 0 c1t5000C50055F9F91Bd0 ONLINE 0 0 0 mirror-21 ONLINE 0 0 0 c1t5000C50055E65ABBd0 ONLINE 0 0 0 c1t5000C50055F8905Fd0 ONLINE 0 0 0 mirror-22 ONLINE 0 0 0 c1t5000C50055E57A5Fd0 ONLINE 0 0 0 c1t5000C50055F87E73d0 ONLINE 0 0 0 mirror-23 ONLINE 0 0 0 c1t5000C50055E66053d0 ONLINE 0 0 0 c1t5000C50055E66B63d0 ONLINE 0 0 0 mirror-24 ONLINE 0 0 0 c1t5000C50055F8723Bd0 ONLINE 0 0 0 c1t5000C50055F8C3ABd0 ONLINE 0 0 0 logs c2t5000A72A3007811Dd0 ONLINE 0 0 0 cache c2t500117310015D579d0 ONLINE 0 0 0 c2t50011731001631FDd0 ONLINE 0 0 0 c12t500117310015D59Ed0 ONLINE 0 0 0 c12t500117310015D54Ed0 ONLINE 0 0 0 spares c1t5000C50055FA2AEFd0 AVAIL c1t5000C50055E595B7d0 AVAIL errors: No known data errors --- This is a ~90TB SAN on r151008, with 25 pairs of 4TB mirror drives. The last scrub I ran was about 3 months ago, which took (from my recollection) ~250 hours or so. I've only run about 4 scrubs so far on this installation. The current scrub has been running for 2 weeks, with no end in sight. The last time I saw an estimate, it said around ~650 hours remaining. 
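As a rough cross-check of that estimate (back-of-the-envelope only, using the scanned/total/rate figures from the status output above):

  # (total - scanned) / scan rate, converted from TiB and MiB/s to days
  echo '(24.2 - 6.59) * 1024 * 1024 / 5.71 / 86400' | bc -l
  # roughly 37 days at a sustained 5.71M/s

So even if the rate holds, this pass would run for several more weeks, which fits the "no estimated time" message.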
This thread http://comments.gmane.org/gmane.os.solaris.opensolaris.zfs/46021 from over 3 years ago mention the metaslab_min_alloc_size as a way to improve this (reducing it to 4K from 10MB). Further reading into this property got me this Illumos bug: https://www.illumos.org/issues/54, which states "Turns out this tunable is made irrelevant as a result of a change to use the metaslab_df_ops allocator. We don't need to change it. I'm closing this bug.". So that seems like a dead end to me. This is the current load with scrub running (~350 VMs between Hyper-V and VMware environments): # iostat -xnze extended device statistics ---- errors --- r/s w/s kr/s kw/s wait actv wsvc_t asvc_t %w %b s/w h/w trn tot device 0.4 12.5 39.7 78.8 0.1 0.0 5.0 0.1 0 0 0 0 0 0 rpool 0.2 6.9 19.9 39.4 0.0 0.0 0.0 0.1 0 0 0 0 0 0 c4t0d0 0.2 6.8 19.9 39.4 0.0 0.0 0.0 0.1 0 0 0 0 0 0 c4t1d0 4.4 29.3 209.7 962.7 0.0 0.0 0.0 1.4 0 3 0 0 0 0 c1t5000C50055F8723Bd0 4.7 25.1 209.4 962.3 0.0 0.0 0.0 1.5 0 3 0 0 0 0 c1t5000C50055E66B63d0 4.7 27.6 208.3 952.7 0.0 0.0 0.0 1.3 0 3 0 0 0 0 c1t5000C50055F87E73d0 4.4 28.6 209.1 974.3 0.0 0.0 0.0 1.4 0 3 0 0 0 0 c1t5000C50055F8BFA3d0 4.4 28.9 208.3 964.5 0.0 0.0 0.0 1.4 0 3 0 0 0 0 c1t5000C50055F9E123d0 4.4 25.7 208.7 955.7 0.0 0.0 0.0 1.4 0 3 0 0 0 0 c1t5000C50055F9F0B3d0 4.4 26.5 209.1 960.9 0.0 0.0 0.0 1.4 0 3 0 0 0 0 c1t5000C50055F9D3B3d0 4.3 25.2 206.6 936.1 0.0 0.0 0.0 1.5 0 3 0 0 0 0 c1t5000C50055E4FDE7d0 4.4 26.9 208.1 982.6 0.0 0.0 0.0 1.4 0 3 0 0 0 0 c1t5000C50055F9A607d0 4.4 24.5 208.7 955.4 0.0 0.0 0.0 1.5 0 3 0 0 0 0 c1t5000C50055F8CDA7d0 4.3 26.5 207.8 943.8 0.0 0.0 0.0 1.4 0 3 0 0 0 0 c1t5000C50055E65877d0 4.4 27.7 208.0 961.1 0.0 0.0 0.0 1.4 0 3 0 0 0 0 c1t5000C50055F9E7D7d0 4.3 26.0 208.0 953.9 0.0 0.0 0.0 1.4 0 3 0 0 0 0 c1t5000C50055FA0AF7d0 4.3 26.1 208.0 966.2 0.0 0.0 0.0 1.4 0 3 0 0 0 0 c1t5000C50055F9FE87d0 4.4 28.5 208.6 965.3 0.0 0.0 0.0 1.4 0 3 0 0 0 0 c1t5000C50055F9F91Bd0 4.3 26.7 207.2 945.0 0.0 0.0 0.0 1.4 0 3 0 0 0 0 c1t5000C50055F9FEABd0 4.4 26.5 209.3 980.1 0.0 0.0 0.0 1.4 0 3 0 0 0 0 c1t5000C50055F9F63Bd0 4.3 26.1 207.6 944.3 0.0 0.0 0.0 1.5 0 3 0 0 0 0 c1t5000C50055F9F3EBd0 4.3 26.5 208.1 954.9 0.0 0.0 0.0 1.4 0 3 0 0 0 0 c1t5000C50055F9F80Bd0 32.5 14.7 1005.6 751.2 0.0 0.0 0.0 0.3 0 1 0 0 0 0 c2t500117310015D579d0 32.5 14.7 1004.1 751.2 0.0 0.0 0.0 0.3 0 1 0 0 0 0 c2t50011731001631FDd0 0.0 180.8 0.0 16434.5 0.0 0.3 0.0 1.6 0 4 0 0 0 0 c2t5000A72A3007811Dd0 4.4 25.3 208.7 966.7 0.0 0.0 0.0 1.4 0 3 0 0 0 0 c1t5000C50055F9FB8Bd0 4.4 26.3 208.5 949.1 0.0 0.0 0.0 1.4 0 3 0 0 0 0 c1t5000C50055F9F92Bd0 4.4 29.7 208.6 975.1 0.0 0.0 0.0 1.3 0 3 0 0 0 0 c1t5000C50055F8905Fd0 4.4 25.7 207.9 954.1 0.0 0.0 0.0 1.4 0 3 0 0 0 0 c1t5000C50055F8D48Fd0 4.4 26.8 208.4 967.4 0.0 0.0 0.0 1.4 0 3 0 0 0 0 c1t5000C50055F9F89Fd0 4.4 28.5 208.1 964.9 0.0 0.0 0.0 1.4 0 3 0 0 0 0 c1t5000C50055F9EF2Fd0 4.4 29.4 209.5 962.7 0.0 0.0 0.0 1.4 0 3 0 0 0 0 c1t5000C50055F8C3ABd0 4.7 25.0 208.9 962.3 0.0 0.0 0.0 1.5 0 3 0 0 0 0 c1t5000C50055E66053d0 4.3 25.1 207.5 936.1 0.0 0.0 0.0 1.5 0 3 0 0 0 0 c1t5000C50055E66503d0 4.4 25.6 209.1 955.7 0.0 0.0 0.0 1.4 0 3 0 0 0 0 c1t5000C50055F9D3E3d0 4.3 26.6 207.4 945.0 0.0 0.0 0.0 1.4 0 3 0 0 0 0 c1t5000C50055F84FB7d0 4.3 26.0 207.5 944.3 0.0 0.0 0.0 1.5 0 3 0 0 0 0 c1t5000C50055F8E017d0 4.3 26.4 207.1 943.8 0.0 0.0 0.0 1.4 0 3 0 0 0 0 c1t5000C50055E579F7d0 4.4 28.5 208.8 974.3 0.0 0.0 0.0 1.4 0 3 0 0 0 0 c1t5000C50055E65807d0 4.4 25.9 208.5 953.9 0.0 0.0 0.0 1.4 0 3 0 0 0 0 c1t5000C50055F84A97d0 4.4 26.4 209.2 960.9 0.0 0.0 0.0 1.4 0 3 0 0 
0 0 c1t5000C50055F87D97d0 4.4 28.5 208.8 964.9 0.0 0.0 0.0 1.4 0 3 0 0 0 0 c1t5000C50055F9F637d0 4.4 29.6 208.9 975.1 0.0 0.0 0.0 1.3 0 3 0 0 0 0 c1t5000C50055E65ABBd0 4.4 26.7 208.5 982.6 0.0 0.0 0.0 1.4 0 3 0 0 0 0 c1t5000C50055F8BF9Bd0 4.3 25.6 207.6 954.1 0.0 0.0 0.0 1.5 0 3 0 0 0 0 c1t5000C50055F8A22Bd0 4.4 27.6 208.2 961.1 0.0 0.0 0.0 1.4 0 3 0 0 0 0 c1t5000C50055F9379Bd0 4.7 27.6 208.3 952.8 0.0 0.0 0.0 1.3 0 3 0 0 0 0 c1t5000C50055E57A5Fd0 4.4 28.4 208.4 965.3 0.0 0.0 0.0 1.4 0 3 0 0 0 0 c1t5000C50055F8CCAFd0 4.4 26.4 208.9 980.1 0.0 0.0 0.0 1.5 0 3 0 0 0 0 c1t5000C50055F8B80Fd0 4.4 24.4 208.9 955.4 0.0 0.0 0.0 1.5 0 3 0 0 0 0 c1t5000C50055F9FA1Fd0 4.3 26.4 207.6 954.9 0.0 0.0 0.0 1.4 0 3 0 0 0 0 c1t5000C50055E65F0Fd0 4.4 28.8 208.3 964.5 0.0 0.0 0.0 1.4 0 3 0 0 0 0 c1t5000C50055F8BE3Fd0 4.3 26.7 207.4 967.4 0.0 0.0 0.0 1.4 0 3 0 0 0 0 c1t5000C50055F8B21Fd0 4.4 25.1 208.9 966.7 0.0 0.0 0.0 1.4 0 3 0 0 0 0 c1t5000C50055F8A46Fd0 4.4 26.0 209.7 966.2 0.0 0.0 0.0 1.4 0 3 0 0 0 0 c1t5000C50055F856CFd0 4.4 26.2 209.0 949.1 0.0 0.0 0.0 1.4 0 3 0 0 0 0 c1t5000C50055E6606Fd0 32.5 14.7 1004.3 750.9 0.0 0.0 0.0 0.3 0 1 0 0 0 0 c12t500117310015D59Ed0 32.5 14.7 1004.4 751.3 0.0 0.0 0.0 0.3 0 1 0 0 0 0 c12t500117310015D54Ed0 349.1 646.9 14437.7 67437.3 52.7 2.6 52.9 2.6 12 37 0 0 0 0 tank What should I be checking for? Is a scrub supposed to take that long (and I thought over 10 days for the last one was long..)? There doesn't seem to be any hardware errors. Is the load too high (12% wait, 37% busy with asvc_t of 2.6ms)? -------------- next part -------------- An HTML attachment was scrubbed... URL: From jsavikko at niksula.hut.fi Tue Jul 29 12:21:05 2014 From: jsavikko at niksula.hut.fi (Janne Savikko) Date: Tue, 29 Jul 2014 15:21:05 +0300 (EEST) Subject: [OmniOS-discuss] dd does not work as expected with count=0 option Message-ID: Hi, I've used dd to create sparse files, and I noticed that dd does not work as expected with count=0 option, but keeps writing indefinitely. This does not happen with dd of Ubuntu 12.04,14.04, Solaris 11 Express (snv_151a), OSX 10.10 beta or OpenBSD 5.5. OmniOS build that I've tested this problem: omnios-b281e50 i86pc This behavior happens also with old Solaris 10 (Generic_139555-08 sun4v). Manual pages state "count=n, Copies only n input blocks". So according to documentation I expect it to copy only 0 blocks. Is this desired behavior (and should documentation be fixed) or a bug? Cheers, Janne From danmcd at omniti.com Tue Jul 29 12:33:42 2014 From: danmcd at omniti.com (Dan McDonald) Date: Tue, 29 Jul 2014 08:33:42 -0400 Subject: [OmniOS-discuss] dd does not work as expected with count=0 option In-Reply-To: References: Message-ID: I suspect this is an Illumos bug. I'm top posting this because it's my phone, and I think the Illumos developers mailing list should confirm my suspicions. Dan Sent from my iPhone (typos, autocorrect, and all) > On Jul 29, 2014, at 8:21 AM, Janne Savikko wrote: > > Hi, > > I've used dd to create sparse files, and I noticed that dd does not work as expected with count=0 option, but keeps writing indefinitely. This does not happen with dd of Ubuntu 12.04,14.04, Solaris 11 Express (snv_151a), OSX 10.10 beta or OpenBSD 5.5. > > OmniOS build that I've tested this problem: omnios-b281e50 i86pc > This behavior happens also with old Solaris 10 (Generic_139555-08 sun4v). > > Manual pages state "count=n, Copies only n input blocks". So according to documentation I expect it to copy only 0 blocks. 
> > Is this desired behavior (and should documentation be fixed) or a bug? > > > Cheers, > Janne > _______________________________________________ > OmniOS-discuss mailing list > OmniOS-discuss at lists.omniti.com > http://lists.omniti.com/mailman/listinfo/omnios-discuss From paul at pk1048.com Tue Jul 29 15:02:18 2014 From: paul at pk1048.com (PK1048) Date: Tue, 29 Jul 2014 11:02:18 -0400 Subject: [OmniOS-discuss] Slow scrub performance In-Reply-To: References: Message-ID: <8A89E9CC-065D-432A-9F0C-3C9583284B97@pk1048.com> On Jul 28, 2014, at 20:11, wuffers wrote: > Does this look normal? Short answer, yes. ? Keep in mind that 1. a scrub runs in the background (so as not to impact production I/O, this was not always the case and caused serious issues in the past with a pool being unresponsive due to a scrub) 2. a scrub essentially walks the zpool examining every transaction in order (as does a resilver) So the time to complete a scrub depends on how many write transactions since the pool was created (which is generally related to the amount of data but not always). You are limited by the random I/O capability of the disks involved. With VMs I assume this is a file server, so the I/O size will also affect performance. > This is a ~90TB SAN on r151008, with 25 pairs of 4TB mirror drives. The last scrub I ran was about 3 months ago, which took (from my recollection) ~250 hours or so. I've only run about 4 scrubs so far on this installation. > > The current scrub has been running for 2 weeks, with no end in sight. The last time I saw an estimate, it said around ~650 hours remaining. Run the numbers? you are scanning 24.2TB at about 5.5MB/sec ? 4,613,734 seconds or 54 days. And that assumes the same rate for all of the scan. The rate will change as other I/O competes for resources. > > This thread http://comments.gmane.org/gmane.os.solaris.opensolaris.zfs/46021 from over 3 years ago mention the metaslab_min_alloc_size as a way to improve this (reducing it to 4K from 10MB). Further reading into this property got me this Illumos bug: https://www.illumos.org/issues/54, which states "Turns out this tunable is made irrelevant as a result of a change to use the metaslab_df_ops allocator. We don't need to change it. I'm closing this bug.". So that seems like a dead end to me. > > This is the current load with scrub running (~350 VMs between Hyper-V and VMware environments): > > # iostat -xnze > extended device statistics ---- errors --- > r/s w/s kr/s kw/s wait actv wsvc_t asvc_t %w %b s/w h/w trn tot device > 0.4 12.5 39.7 78.8 0.1 0.0 5.0 0.1 0 0 0 0 0 0 rpool > 0.2 6.9 19.9 39.4 0.0 0.0 0.0 0.1 0 0 0 0 0 0 c4t0d0 > 0.2 6.8 19.9 39.4 0.0 0.0 0.0 0.1 0 0 0 0 0 0 c4t1d0 > 4.4 29.3 209.7 962.7 0.0 0.0 0.0 1.4 0 3 0 0 0 0 c1t5000C50055F8723Bd0 > 4.7 25.1 209.4 962.3 0.0 0.0 0.0 1.5 0 3 0 0 0 0 c1t5000C50055E66B63d0 > 4.7 27.6 208.3 952.7 0.0 0.0 0.0 1.3 0 3 0 0 0 0 c1t5000C50055F87E73d0 Looks like you have a fair bit of activity going on (almost 1MB/sec of writes per spindle). Since this is storage for VMs, I assume this is the storage server for separate compute servers? Have you tuned the block size for the file share you are using? That can make a huge difference in performance. I also noted that you only have a single LOG device. Best Practice is to mirror log devices so you do not lose any data in flight if hit by a power outage (of course, if this server has more UPS runtime that all the clients that may not matter). You may want to ask this question over on the ZFS discuss list? 
Subscribe here: https://www.listbox.com/subscribe/?listname=zfs at lists.illumos.org From richard.elling at richardelling.com Tue Jul 29 15:29:19 2014 From: richard.elling at richardelling.com (Richard Elling) Date: Tue, 29 Jul 2014 08:29:19 -0700 Subject: [OmniOS-discuss] Slow scrub performance In-Reply-To: References: Message-ID: On Jul 28, 2014, at 5:11 PM, wuffers wrote: > Does this look normal? maybe, maybe not > > pool: rpool > state: ONLINE > scan: scrub repaired 0 in 0h3m with 0 errors on Tue Jul 15 09:36:17 2014 > config: > > NAME STATE READ WRITE CKSUM > rpool ONLINE 0 0 0 > mirror-0 ONLINE 0 0 0 > c4t0d0s0 ONLINE 0 0 0 > c4t1d0s0 ONLINE 0 0 0 > > errors: No known data errors > > pool: tank > state: ONLINE > scan: scrub in progress since Mon Jul 14 17:54:42 2014 > 6.59T scanned out of 24.2T at 5.71M/s, (scan is slow, no estimated time) this is slower than most, surely slower than desired > 0 repaired, 27.25% done > config: > > NAME STATE READ WRITE CKSUM > tank ONLINE 0 0 0 > mirror-0 ONLINE 0 0 0 > c1t5000C50055F9F637d0 ONLINE 0 0 0 > c1t5000C50055F9EF2Fd0 ONLINE 0 0 0 > mirror-1 ONLINE 0 0 0 > c1t5000C50055F87D97d0 ONLINE 0 0 0 > c1t5000C50055F9D3B3d0 ONLINE 0 0 0 > mirror-2 ONLINE 0 0 0 > c1t5000C50055E6606Fd0 ONLINE 0 0 0 > c1t5000C50055F9F92Bd0 ONLINE 0 0 0 > mirror-3 ONLINE 0 0 0 > c1t5000C50055F856CFd0 ONLINE 0 0 0 > c1t5000C50055F9FE87d0 ONLINE 0 0 0 > mirror-4 ONLINE 0 0 0 > c1t5000C50055F84A97d0 ONLINE 0 0 0 > c1t5000C50055FA0AF7d0 ONLINE 0 0 0 > mirror-5 ONLINE 0 0 0 > c1t5000C50055F9D3E3d0 ONLINE 0 0 0 > c1t5000C50055F9F0B3d0 ONLINE 0 0 0 > mirror-6 ONLINE 0 0 0 > c1t5000C50055F8A46Fd0 ONLINE 0 0 0 > c1t5000C50055F9FB8Bd0 ONLINE 0 0 0 > mirror-7 ONLINE 0 0 0 > c1t5000C50055F8B21Fd0 ONLINE 0 0 0 > c1t5000C50055F9F89Fd0 ONLINE 0 0 0 > mirror-8 ONLINE 0 0 0 > c1t5000C50055F8BE3Fd0 ONLINE 0 0 0 > c1t5000C50055F9E123d0 ONLINE 0 0 0 > mirror-9 ONLINE 0 0 0 > c1t5000C50055F9379Bd0 ONLINE 0 0 0 > c1t5000C50055F9E7D7d0 ONLINE 0 0 0 > mirror-10 ONLINE 0 0 0 > c1t5000C50055E65F0Fd0 ONLINE 0 0 0 > c1t5000C50055F9F80Bd0 ONLINE 0 0 0 > mirror-11 ONLINE 0 0 0 > c1t5000C50055F8A22Bd0 ONLINE 0 0 0 > c1t5000C50055F8D48Fd0 ONLINE 0 0 0 > mirror-12 ONLINE 0 0 0 > c1t5000C50055E65807d0 ONLINE 0 0 0 > c1t5000C50055F8BFA3d0 ONLINE 0 0 0 > mirror-13 ONLINE 0 0 0 > c1t5000C50055E579F7d0 ONLINE 0 0 0 > c1t5000C50055E65877d0 ONLINE 0 0 0 > mirror-14 ONLINE 0 0 0 > c1t5000C50055F9FA1Fd0 ONLINE 0 0 0 > c1t5000C50055F8CDA7d0 ONLINE 0 0 0 > mirror-15 ONLINE 0 0 0 > c1t5000C50055F8BF9Bd0 ONLINE 0 0 0 > c1t5000C50055F9A607d0 ONLINE 0 0 0 > mirror-16 ONLINE 0 0 0 > c1t5000C50055E66503d0 ONLINE 0 0 0 > c1t5000C50055E4FDE7d0 ONLINE 0 0 0 > mirror-17 ONLINE 0 0 0 > c1t5000C50055F8E017d0 ONLINE 0 0 0 > c1t5000C50055F9F3EBd0 ONLINE 0 0 0 > mirror-18 ONLINE 0 0 0 > c1t5000C50055F8B80Fd0 ONLINE 0 0 0 > c1t5000C50055F9F63Bd0 ONLINE 0 0 0 > mirror-19 ONLINE 0 0 0 > c1t5000C50055F84FB7d0 ONLINE 0 0 0 > c1t5000C50055F9FEABd0 ONLINE 0 0 0 > mirror-20 ONLINE 0 0 0 > c1t5000C50055F8CCAFd0 ONLINE 0 0 0 > c1t5000C50055F9F91Bd0 ONLINE 0 0 0 > mirror-21 ONLINE 0 0 0 > c1t5000C50055E65ABBd0 ONLINE 0 0 0 > c1t5000C50055F8905Fd0 ONLINE 0 0 0 > mirror-22 ONLINE 0 0 0 > c1t5000C50055E57A5Fd0 ONLINE 0 0 0 > c1t5000C50055F87E73d0 ONLINE 0 0 0 > mirror-23 ONLINE 0 0 0 > c1t5000C50055E66053d0 ONLINE 0 0 0 > c1t5000C50055E66B63d0 ONLINE 0 0 0 > mirror-24 ONLINE 0 0 0 > c1t5000C50055F8723Bd0 ONLINE 0 0 0 > c1t5000C50055F8C3ABd0 ONLINE 0 0 0 > logs > c2t5000A72A3007811Dd0 ONLINE 0 0 0 > cache > c2t500117310015D579d0 ONLINE 0 0 0 > 
c2t50011731001631FDd0 ONLINE 0 0 0 > c12t500117310015D59Ed0 ONLINE 0 0 0 > c12t500117310015D54Ed0 ONLINE 0 0 0 > spares > c1t5000C50055FA2AEFd0 AVAIL > c1t5000C50055E595B7d0 AVAIL > > errors: No known data errors > > --- > This is a ~90TB SAN on r151008, with 25 pairs of 4TB mirror drives. The last scrub I ran was about 3 months ago, which took (from my recollection) ~250 hours or so. I've only run about 4 scrubs so far on this installation. > > The current scrub has been running for 2 weeks, with no end in sight. The last time I saw an estimate, it said around ~650 hours remaining. The estimate is often very wrong, especially for busy systems. If this is an older ZFS implementation, this pool is likely getting pounded by the ZFS write throttle. There are some tunings that can be applied, but the old write throttle is not a stable control system, so it will always be a little bit unpredictable. > > This thread http://comments.gmane.org/gmane.os.solaris.opensolaris.zfs/46021 from over 3 years ago mention the metaslab_min_alloc_size as a way to improve this (reducing it to 4K from 10MB). Further reading into this property got me this Illumos bug: https://www.illumos.org/issues/54, which states "Turns out this tunable is made irrelevant as a result of a change to use the metaslab_df_ops allocator. We don't need to change it. I'm closing this bug.". So that seems like a dead end to me. dead end. > > This is the current load with scrub running (~350 VMs between Hyper-V and VMware environments): > > # iostat -xnze Unfortunately, this is the performance since boot and is not suitable for performance analysis unless the system has been rebooted in the past 10 minutes or so. You'll need to post the second batch from "iostat -zxCn 60 2" > extended device statistics ---- errors --- > r/s w/s kr/s kw/s wait actv wsvc_t asvc_t %w %b s/w h/w trn tot device > 0.4 12.5 39.7 78.8 0.1 0.0 5.0 0.1 0 0 0 0 0 0 rpool > 0.2 6.9 19.9 39.4 0.0 0.0 0.0 0.1 0 0 0 0 0 0 c4t0d0 > 0.2 6.8 19.9 39.4 0.0 0.0 0.0 0.1 0 0 0 0 0 0 c4t1d0 > 4.4 29.3 209.7 962.7 0.0 0.0 0.0 1.4 0 3 0 0 0 0 c1t5000C50055F8723Bd0 > 4.7 25.1 209.4 962.3 0.0 0.0 0.0 1.5 0 3 0 0 0 0 c1t5000C50055E66B63d0 > 4.7 27.6 208.3 952.7 0.0 0.0 0.0 1.3 0 3 0 0 0 0 c1t5000C50055F87E73d0 > 4.4 28.6 209.1 974.3 0.0 0.0 0.0 1.4 0 3 0 0 0 0 c1t5000C50055F8BFA3d0 > 4.4 28.9 208.3 964.5 0.0 0.0 0.0 1.4 0 3 0 0 0 0 c1t5000C50055F9E123d0 > 4.4 25.7 208.7 955.7 0.0 0.0 0.0 1.4 0 3 0 0 0 0 c1t5000C50055F9F0B3d0 > 4.4 26.5 209.1 960.9 0.0 0.0 0.0 1.4 0 3 0 0 0 0 c1t5000C50055F9D3B3d0 > 4.3 25.2 206.6 936.1 0.0 0.0 0.0 1.5 0 3 0 0 0 0 c1t5000C50055E4FDE7d0 > 4.4 26.9 208.1 982.6 0.0 0.0 0.0 1.4 0 3 0 0 0 0 c1t5000C50055F9A607d0 > 4.4 24.5 208.7 955.4 0.0 0.0 0.0 1.5 0 3 0 0 0 0 c1t5000C50055F8CDA7d0 > 4.3 26.5 207.8 943.8 0.0 0.0 0.0 1.4 0 3 0 0 0 0 c1t5000C50055E65877d0 > 4.4 27.7 208.0 961.1 0.0 0.0 0.0 1.4 0 3 0 0 0 0 c1t5000C50055F9E7D7d0 > 4.3 26.0 208.0 953.9 0.0 0.0 0.0 1.4 0 3 0 0 0 0 c1t5000C50055FA0AF7d0 > 4.3 26.1 208.0 966.2 0.0 0.0 0.0 1.4 0 3 0 0 0 0 c1t5000C50055F9FE87d0 > 4.4 28.5 208.6 965.3 0.0 0.0 0.0 1.4 0 3 0 0 0 0 c1t5000C50055F9F91Bd0 > 4.3 26.7 207.2 945.0 0.0 0.0 0.0 1.4 0 3 0 0 0 0 c1t5000C50055F9FEABd0 > 4.4 26.5 209.3 980.1 0.0 0.0 0.0 1.4 0 3 0 0 0 0 c1t5000C50055F9F63Bd0 > 4.3 26.1 207.6 944.3 0.0 0.0 0.0 1.5 0 3 0 0 0 0 c1t5000C50055F9F3EBd0 > 4.3 26.5 208.1 954.9 0.0 0.0 0.0 1.4 0 3 0 0 0 0 c1t5000C50055F9F80Bd0 > 32.5 14.7 1005.6 751.2 0.0 0.0 0.0 0.3 0 1 0 0 0 0 c2t500117310015D579d0 > 32.5 14.7 1004.1 751.2 0.0 0.0 0.0 0.3 0 1 0 0 0 0 
c2t50011731001631FDd0 > 0.0 180.8 0.0 16434.5 0.0 0.3 0.0 1.6 0 4 0 0 0 0 c2t5000A72A3007811Dd0 > 4.4 25.3 208.7 966.7 0.0 0.0 0.0 1.4 0 3 0 0 0 0 c1t5000C50055F9FB8Bd0 > 4.4 26.3 208.5 949.1 0.0 0.0 0.0 1.4 0 3 0 0 0 0 c1t5000C50055F9F92Bd0 > 4.4 29.7 208.6 975.1 0.0 0.0 0.0 1.3 0 3 0 0 0 0 c1t5000C50055F8905Fd0 > 4.4 25.7 207.9 954.1 0.0 0.0 0.0 1.4 0 3 0 0 0 0 c1t5000C50055F8D48Fd0 > 4.4 26.8 208.4 967.4 0.0 0.0 0.0 1.4 0 3 0 0 0 0 c1t5000C50055F9F89Fd0 > 4.4 28.5 208.1 964.9 0.0 0.0 0.0 1.4 0 3 0 0 0 0 c1t5000C50055F9EF2Fd0 > 4.4 29.4 209.5 962.7 0.0 0.0 0.0 1.4 0 3 0 0 0 0 c1t5000C50055F8C3ABd0 > 4.7 25.0 208.9 962.3 0.0 0.0 0.0 1.5 0 3 0 0 0 0 c1t5000C50055E66053d0 > 4.3 25.1 207.5 936.1 0.0 0.0 0.0 1.5 0 3 0 0 0 0 c1t5000C50055E66503d0 > 4.4 25.6 209.1 955.7 0.0 0.0 0.0 1.4 0 3 0 0 0 0 c1t5000C50055F9D3E3d0 > 4.3 26.6 207.4 945.0 0.0 0.0 0.0 1.4 0 3 0 0 0 0 c1t5000C50055F84FB7d0 > 4.3 26.0 207.5 944.3 0.0 0.0 0.0 1.5 0 3 0 0 0 0 c1t5000C50055F8E017d0 > 4.3 26.4 207.1 943.8 0.0 0.0 0.0 1.4 0 3 0 0 0 0 c1t5000C50055E579F7d0 > 4.4 28.5 208.8 974.3 0.0 0.0 0.0 1.4 0 3 0 0 0 0 c1t5000C50055E65807d0 > 4.4 25.9 208.5 953.9 0.0 0.0 0.0 1.4 0 3 0 0 0 0 c1t5000C50055F84A97d0 > 4.4 26.4 209.2 960.9 0.0 0.0 0.0 1.4 0 3 0 0 0 0 c1t5000C50055F87D97d0 > 4.4 28.5 208.8 964.9 0.0 0.0 0.0 1.4 0 3 0 0 0 0 c1t5000C50055F9F637d0 > 4.4 29.6 208.9 975.1 0.0 0.0 0.0 1.3 0 3 0 0 0 0 c1t5000C50055E65ABBd0 > 4.4 26.7 208.5 982.6 0.0 0.0 0.0 1.4 0 3 0 0 0 0 c1t5000C50055F8BF9Bd0 > 4.3 25.6 207.6 954.1 0.0 0.0 0.0 1.5 0 3 0 0 0 0 c1t5000C50055F8A22Bd0 > 4.4 27.6 208.2 961.1 0.0 0.0 0.0 1.4 0 3 0 0 0 0 c1t5000C50055F9379Bd0 > 4.7 27.6 208.3 952.8 0.0 0.0 0.0 1.3 0 3 0 0 0 0 c1t5000C50055E57A5Fd0 > 4.4 28.4 208.4 965.3 0.0 0.0 0.0 1.4 0 3 0 0 0 0 c1t5000C50055F8CCAFd0 > 4.4 26.4 208.9 980.1 0.0 0.0 0.0 1.5 0 3 0 0 0 0 c1t5000C50055F8B80Fd0 > 4.4 24.4 208.9 955.4 0.0 0.0 0.0 1.5 0 3 0 0 0 0 c1t5000C50055F9FA1Fd0 > 4.3 26.4 207.6 954.9 0.0 0.0 0.0 1.4 0 3 0 0 0 0 c1t5000C50055E65F0Fd0 > 4.4 28.8 208.3 964.5 0.0 0.0 0.0 1.4 0 3 0 0 0 0 c1t5000C50055F8BE3Fd0 > 4.3 26.7 207.4 967.4 0.0 0.0 0.0 1.4 0 3 0 0 0 0 c1t5000C50055F8B21Fd0 > 4.4 25.1 208.9 966.7 0.0 0.0 0.0 1.4 0 3 0 0 0 0 c1t5000C50055F8A46Fd0 > 4.4 26.0 209.7 966.2 0.0 0.0 0.0 1.4 0 3 0 0 0 0 c1t5000C50055F856CFd0 > 4.4 26.2 209.0 949.1 0.0 0.0 0.0 1.4 0 3 0 0 0 0 c1t5000C50055E6606Fd0 > 32.5 14.7 1004.3 750.9 0.0 0.0 0.0 0.3 0 1 0 0 0 0 c12t500117310015D59Ed0 > 32.5 14.7 1004.4 751.3 0.0 0.0 0.0 0.3 0 1 0 0 0 0 c12t500117310015D54Ed0 > 349.1 646.9 14437.7 67437.3 52.7 2.6 52.9 2.6 12 37 0 0 0 0 tank > > What should I be checking for? Is a scrub supposed to take that long (and I thought over 10 days for the last one was long..)? There doesn't seem to be any hardware errors. Is the load too high (12% wait, 37% busy with asvc_t of 2.6ms)? There are many variables here, the biggest of which is the current non-scrub load. -- richard -------------- next part -------------- An HTML attachment was scrubbed... URL: From tobi at oetiker.ch Tue Jul 29 15:50:02 2014 From: tobi at oetiker.ch (Tobias Oetiker) Date: Tue, 29 Jul 2014 17:50:02 +0200 (CEST) Subject: [OmniOS-discuss] announcement znapzend a new zfs backup tool Message-ID: Just out: ZnapZend a Multilevel Backuptool for ZFS It is on Github. 
Check out http://www.znapzend.org cheers tobi -- Tobi Oetiker, OETIKER+PARTNER AG, Aarweg 15 CH-4600 Olten, Switzerland www.oetiker.ch tobi at oetiker.ch +41 62 775 9902 From jesus at omniti.com Tue Jul 29 15:54:07 2014 From: jesus at omniti.com (Theo Schlossnagle) Date: Tue, 29 Jul 2014 11:54:07 -0400 Subject: [OmniOS-discuss] announcement znapzend a new zfs backup tool In-Reply-To: References: Message-ID: Awesome! On Tue, Jul 29, 2014 at 11:50 AM, Tobias Oetiker wrote: > Just out: > > ZnapZend a Multilevel Backuptool for ZFS > > It is on Github. Check out > > http://www.znapzend.org > > cheers > tobi > > -- > Tobi Oetiker, OETIKER+PARTNER AG, Aarweg 15 CH-4600 Olten, Switzerland > www.oetiker.ch tobi at oetiker.ch +41 62 775 9902 > > _______________________________________________ > OmniOS-discuss mailing list > OmniOS-discuss at lists.omniti.com > http://lists.omniti.com/mailman/listinfo/omnios-discuss > -- Theo Schlossnagle http://omniti.com/is/theo-schlossnagle -------------- next part -------------- An HTML attachment was scrubbed... URL: From skiselkov.ml at gmail.com Tue Jul 29 15:59:18 2014 From: skiselkov.ml at gmail.com (Saso Kiselkov) Date: Tue, 29 Jul 2014 17:59:18 +0200 Subject: [OmniOS-discuss] announcement znapzend a new zfs backup tool In-Reply-To: References: Message-ID: <53D7C4D6.5060308@gmail.com> On 7/29/14, 5:50 PM, Tobias Oetiker wrote: > Just out: > > ZnapZend a Multilevel Backuptool for ZFS > > It is on Github. Check out > > http://www.znapzend.org Neat, especially the feature that the backup config is part of a dataset's properties. Very cool. -- Saso From moo at wuffers.net Tue Jul 29 19:29:38 2014 From: moo at wuffers.net (wuffers) Date: Tue, 29 Jul 2014 15:29:38 -0400 Subject: [OmniOS-discuss] Slow scrub performance In-Reply-To: References: Message-ID: Going to try to answer both responses in one message.. Short answer, yes. ? Keep in mind that > > 1. a scrub runs in the background (so as not to impact production I/O, > this was not always the case and caused serious issues in the past with a > pool being unresponsive due to a scrub) > > 2. a scrub essentially walks the zpool examining every transaction in > order (as does a resilver) > > So the time to complete a scrub depends on how many write transactions > since the pool was created (which is generally related to the amount of > data but not always). You are limited by the random I/O capability of the > disks involved. With VMs I assume this is a file server, so the I/O size > will also affect performance. I haven't noticed any slowdowns in our virtual environments, so I guess that's a good thing it's so low priority that it doesn't impact workloads. Run the numbers? you are scanning 24.2TB at about 5.5MB/sec ? 4,613,734 > seconds or 54 days. And that assumes the same rate for all of the scan. The > rate will change as other I/O competes for resources. > The number was fluctuating when I started the scrub, and I had seen it go as high as 35MB/s at one point. I am certain that our Hyper-V workload has increased since the last scrub, so this does make sense. > Looks like you have a fair bit of activity going on (almost 1MB/sec of > writes per spindle). > As Richard correctly states below, this is the aggregate since boot (uptime ~56 days). I have another output from iostat as per his instructions below. > Since this is storage for VMs, I assume this is the storage server for > separate compute servers? Have you tuned the block size for the file share > you are using? 
That can make a huge difference in performance. > Both the Hyper-V and VMware LUNs are created with 64K block sizes. From what I've read of other performance and tuning articles, that is the optimal block size (I did some limited testing when first configuring the SAN, but results were somewhat inconclusive). Hyper-V hosts our testing environment (we integrate with TFS, a MS product, so we have no choice here) and probably make up the bulk of the workload (~300+ test VMs with various OSes). VMware hosts our production servers (Exchange, file servers, SQL, AD, etc - ~50+ VMs). I also noted that you only have a single LOG device. Best Practice is to > mirror log devices so you do not lose any data in flight if hit by a power > outage (of course, if this server has more UPS runtime that all the clients > that may not matter). > Actually, I do have a mirror ZIL device, it's just disabled at this time (my ZIL devices are ZeusRAMs). At some point, I was troubleshooting some kernel panics (turned out to be a faulty SSD on the rpool), and hadn't re-enabled it yet. Thanks for the reminder (and yes, we do have a UPS as well). And oops.. re-attaching the ZIL as a mirror triggered a resilver now, suspending or canceling the scrub? Will monitor this and restart the scrub if it doesn't by itself. pool: tank state: ONLINE status: One or more devices is currently being resilvered. The pool will continue to function, possibly in a degraded state. action: Wait for the resilver to complete. scan: resilver in progress since Tue Jul 29 14:48:48 2014 3.89T scanned out of 24.5T at 3.06G/s, 1h55m to go 0 resilvered, 15.84% done At least it's going very fast. EDIT: Now about 67% done as I finish writing this, speed dropping to ~1.3G/s. maybe, maybe not >> >> this is slower than most, surely slower than desired >> > Unfortunately reattaching the mirror to my log device triggered a resilver. Not sure if this is desired behavior, but yes, 5.5MB/s seems quite slow. Hopefully after the resilver the scrub will progress where it left off. > The estimate is often very wrong, especially for busy systems. >> If this is an older ZFS implementation, this pool is likely getting >> pounded by the >> ZFS write throttle. There are some tunings that can be applied, but the >> old write >> throttle is not a stable control system, so it will always be a little >> bit unpredictable. >> > The system is on r151008 (my BE states that I upgraded back in February, putting me in r151008j or so), with all the pools upgraded for the new enhancements as well as activating the new L2ARC compression feature. Reading the release notes, the ZFS write throttle enhancements were in since r151008e so I should be good there. > # iostat -xnze >> >> >> Unfortunately, this is the performance since boot and is not suitable for >> performance >> analysis unless the system has been rebooted in the past 10 minutes or >> so. You'll need >> to post the second batch from "iostat -zxCn 60 2" >> > Ah yes, that was my mistake. 
Output from second count (before re-attaching log mirror): # iostat -zxCn 60 2 extended device statistics r/s w/s kr/s kw/s wait actv wsvc_t asvc_t %w %b device 255.7 1077.7 6294.0 41335.1 0.0 1.9 0.0 1.4 0 153 c1 5.3 23.9 118.5 811.9 0.0 0.0 0.0 1.1 0 3 c1t5000C50055F8723Bd0 5.9 14.5 110.0 834.3 0.0 0.0 0.0 1.3 0 2 c1t5000C50055E66B63d0 5.6 16.6 123.8 822.7 0.0 0.0 0.0 1.3 0 2 c1t5000C50055F87E73d0 4.7 27.8 118.6 796.6 0.0 0.0 0.0 1.3 0 3 c1t5000C50055F8BFA3d0 5.6 14.5 139.7 833.8 0.0 0.0 0.0 1.6 0 3 c1t5000C50055F9E123d0 4.4 27.1 112.3 825.2 0.0 0.0 0.0 0.8 0 2 c1t5000C50055F9F0B3d0 5.0 20.2 121.7 803.4 0.0 0.0 0.0 1.2 0 3 c1t5000C50055F9D3B3d0 5.4 26.4 137.0 857.3 0.0 0.0 0.0 1.4 0 4 c1t5000C50055E4FDE7d0 4.7 12.3 123.7 832.7 0.0 0.0 0.0 2.0 0 3 c1t5000C50055F9A607d0 5.0 23.9 125.9 830.9 0.0 0.0 0.0 1.3 0 3 c1t5000C50055F8CDA7d0 4.5 31.4 112.2 814.6 0.0 0.0 0.0 1.1 0 3 c1t5000C50055E65877d0 5.2 24.4 130.6 872.5 0.0 0.0 0.0 1.2 0 3 c1t5000C50055F9E7D7d0 4.1 21.8 103.7 797.2 0.0 0.0 0.0 1.1 0 3 c1t5000C50055FA0AF7d0 5.5 24.8 129.8 802.8 0.0 0.0 0.0 1.5 0 4 c1t5000C50055F9FE87d0 5.7 17.7 137.2 797.6 0.0 0.0 0.0 1.4 0 3 c1t5000C50055F9F91Bd0 6.0 30.6 139.1 852.0 0.0 0.1 0.0 1.5 0 4 c1t5000C50055F9FEABd0 6.1 34.1 137.8 929.2 0.0 0.1 0.0 1.9 0 6 c1t5000C50055F9F63Bd0 4.1 15.9 101.8 791.4 0.0 0.0 0.0 1.6 0 3 c1t5000C50055F9F3EBd0 6.4 23.2 155.2 878.6 0.0 0.0 0.0 1.1 0 3 c1t5000C50055F9F80Bd0 4.5 23.5 106.2 825.4 0.0 0.0 0.0 1.1 0 3 c1t5000C50055F9FB8Bd0 4.0 23.2 101.1 788.9 0.0 0.0 0.0 1.3 0 3 c1t5000C50055F9F92Bd0 4.4 11.3 125.7 782.3 0.0 0.0 0.0 1.9 0 3 c1t5000C50055F8905Fd0 4.6 20.4 129.2 823.0 0.0 0.0 0.0 1.5 0 3 c1t5000C50055F8D48Fd0 5.1 19.7 142.9 887.2 0.0 0.0 0.0 1.7 0 3 c1t5000C50055F9F89Fd0 5.6 11.4 129.1 776.0 0.0 0.0 0.0 1.9 0 3 c1t5000C50055F9EF2Fd0 5.6 23.7 137.4 811.9 0.0 0.0 0.0 1.2 0 3 c1t5000C50055F8C3ABd0 6.8 13.9 132.4 834.3 0.0 0.0 0.0 1.8 0 3 c1t5000C50055E66053d0 5.2 26.7 126.9 857.3 0.0 0.0 0.0 1.2 0 3 c1t5000C50055E66503d0 4.2 27.1 104.6 825.2 0.0 0.0 0.0 1.0 0 3 c1t5000C50055F9D3E3d0 5.2 30.7 140.9 852.0 0.0 0.1 0.0 1.5 0 4 c1t5000C50055F84FB7d0 5.4 16.1 124.3 791.4 0.0 0.0 0.0 1.7 0 3 c1t5000C50055F8E017d0 3.8 31.4 89.7 814.6 0.0 0.0 0.0 1.1 0 4 c1t5000C50055E579F7d0 4.6 27.5 116.0 796.6 0.0 0.1 0.0 1.6 0 4 c1t5000C50055E65807d0 4.0 21.5 99.7 797.2 0.0 0.0 0.0 1.1 0 3 c1t5000C50055F84A97d0 4.7 20.2 116.3 803.4 0.0 0.0 0.0 1.4 0 3 c1t5000C50055F87D97d0 5.0 11.5 121.5 776.0 0.0 0.0 0.0 2.0 0 3 c1t5000C50055F9F637d0 4.9 11.3 112.4 782.3 0.0 0.0 0.0 2.3 0 3 c1t5000C50055E65ABBd0 5.3 11.8 142.5 832.7 0.0 0.0 0.0 2.4 0 3 c1t5000C50055F8BF9Bd0 5.0 20.3 121.4 823.0 0.0 0.0 0.0 1.7 0 3 c1t5000C50055F8A22Bd0 6.6 24.3 170.3 872.5 0.0 0.0 0.0 1.3 0 3 c1t5000C50055F9379Bd0 5.8 16.3 121.7 822.7 0.0 0.0 0.0 1.3 0 2 c1t5000C50055E57A5Fd0 5.3 17.7 146.5 797.6 0.0 0.0 0.0 1.4 0 3 c1t5000C50055F8CCAFd0 5.7 34.1 141.5 929.2 0.0 0.1 0.0 1.7 0 5 c1t5000C50055F8B80Fd0 5.5 23.8 125.7 830.9 0.0 0.0 0.0 1.2 0 3 c1t5000C50055F9FA1Fd0 5.0 23.2 127.9 878.6 0.0 0.0 0.0 1.1 0 3 c1t5000C50055E65F0Fd0 5.2 14.0 163.7 833.8 0.0 0.0 0.0 2.0 0 3 c1t5000C50055F8BE3Fd0 4.6 18.9 122.8 887.2 0.0 0.0 0.0 1.6 0 3 c1t5000C50055F8B21Fd0 5.5 23.6 137.4 825.4 0.0 0.0 0.0 1.5 0 3 c1t5000C50055F8A46Fd0 4.9 24.6 116.7 802.8 0.0 0.0 0.0 1.4 0 4 c1t5000C50055F856CFd0 4.9 23.4 120.8 788.9 0.0 0.0 0.0 1.4 0 3 c1t5000C50055E6606Fd0 234.9 170.1 4079.9 11127.8 0.0 0.2 0.0 0.5 0 9 c2 119.0 28.9 2083.8 670.8 0.0 0.0 0.0 0.3 0 3 c2t500117310015D579d0 115.9 27.4 1996.1 634.2 0.0 0.0 0.0 0.3 0 3 c2t50011731001631FDd0 0.0 113.8 
0.0 9822.8 0.0 0.1 0.0 1.0 0 2 c2t5000A72A3007811Dd0 0.1 18.5 0.0 64.8 0.0 0.0 0.0 0.0 0 0 c4 0.1 9.2 0.0 32.4 0.0 0.0 0.0 0.0 0 0 c4t0d0 0.0 9.2 0.0 32.4 0.0 0.0 0.0 0.0 0 0 c4t1d0 229.8 58.1 3987.4 1308.0 0.0 0.1 0.0 0.3 0 6 c12 114.2 27.7 1994.8 626.0 0.0 0.0 0.0 0.3 0 3 c12t500117310015D59Ed0 115.5 30.4 1992.6 682.0 0.0 0.0 0.0 0.3 0 3 c12t500117310015D54Ed0 0.1 17.1 0.0 64.8 0.0 0.0 0.6 0.1 0 0 rpool 720.3 1298.4 14361.2 53770.8 18.7 2.3 9.3 1.1 6 68 tank Is 153% busy correct on c1? Seems to me that disks are quite "busy", but are handling the workload just fine (wait at 6% and asvc_t at 1.1ms) Interestingly, this is the same output now that the resilver is running: extended device statistics r/s w/s kr/s kw/s wait actv wsvc_t asvc_t %w %b device 2876.9 1041.1 25400.7 38189.1 0.0 37.9 0.0 9.7 0 2011 c1 60.8 26.1 540.1 845.2 0.0 0.7 0.0 8.3 0 39 c1t5000C50055F8723Bd0 58.4 14.2 511.6 740.7 0.0 0.7 0.0 10.1 0 39 c1t5000C50055E66B63d0 60.2 16.3 529.3 756.1 0.0 0.8 0.0 10.1 0 41 c1t5000C50055F87E73d0 57.5 24.9 527.6 841.7 0.0 0.7 0.0 9.0 0 40 c1t5000C50055F8BFA3d0 57.9 14.5 543.5 765.1 0.0 0.7 0.0 9.8 0 38 c1t5000C50055F9E123d0 57.9 23.9 516.6 806.9 0.0 0.8 0.0 9.3 0 40 c1t5000C50055F9F0B3d0 59.8 24.6 554.1 857.5 0.0 0.8 0.0 9.6 0 42 c1t5000C50055F9D3B3d0 56.5 21.0 480.4 715.7 0.0 0.7 0.0 8.9 0 37 c1t5000C50055E4FDE7d0 54.8 9.7 473.5 737.9 0.0 0.7 0.0 11.2 0 39 c1t5000C50055F9A607d0 55.8 20.2 457.3 708.7 0.0 0.7 0.0 9.9 0 40 c1t5000C50055F8CDA7d0 57.8 28.6 487.0 796.1 0.0 0.9 0.0 9.9 0 45 c1t5000C50055E65877d0 60.8 27.1 572.6 823.7 0.0 0.8 0.0 8.8 0 41 c1t5000C50055F9E7D7d0 55.8 21.1 478.2 766.6 0.0 0.7 0.0 9.7 0 40 c1t5000C50055FA0AF7d0 57.0 22.8 528.3 724.5 0.0 0.8 0.0 9.6 0 41 c1t5000C50055F9FE87d0 56.2 10.8 465.2 715.6 0.0 0.7 0.0 10.4 0 38 c1t5000C50055F9F91Bd0 59.2 29.4 524.6 740.9 0.0 0.8 0.0 8.9 0 41 c1t5000C50055F9FEABd0 57.3 30.7 496.7 788.3 0.0 0.8 0.0 9.1 0 42 c1t5000C50055F9F63Bd0 55.5 16.3 461.9 652.9 0.0 0.7 0.0 10.1 0 39 c1t5000C50055F9F3EBd0 57.2 22.1 495.1 701.1 0.0 0.8 0.0 9.8 0 41 c1t5000C50055F9F80Bd0 59.5 30.2 543.1 741.8 0.0 0.9 0.0 9.6 0 45 c1t5000C50055F9FB8Bd0 56.5 25.1 515.4 786.9 0.0 0.7 0.0 8.6 0 38 c1t5000C50055F9F92Bd0 61.8 12.5 540.6 790.9 0.0 0.8 0.0 10.3 0 41 c1t5000C50055F8905Fd0 57.0 19.8 521.0 774.3 0.0 0.7 0.0 9.6 0 39 c1t5000C50055F8D48Fd0 56.3 16.3 517.7 724.7 0.0 0.7 0.0 9.9 0 38 c1t5000C50055F9F89Fd0 57.0 13.4 504.5 790.5 0.0 0.8 0.0 10.7 0 40 c1t5000C50055F9EF2Fd0 55.0 26.1 477.6 845.2 0.0 0.7 0.0 8.3 0 36 c1t5000C50055F8C3ABd0 57.8 14.1 518.7 740.7 0.0 0.8 0.0 10.8 0 41 c1t5000C50055E66053d0 55.9 20.8 490.2 715.7 0.0 0.7 0.0 9.0 0 37 c1t5000C50055E66503d0 57.0 24.1 509.7 806.9 0.0 0.8 0.0 10.0 0 41 c1t5000C50055F9D3E3d0 59.1 29.2 504.1 740.9 0.0 0.8 0.0 9.3 0 44 c1t5000C50055F84FB7d0 54.4 16.3 449.5 652.9 0.0 0.7 0.0 10.4 0 39 c1t5000C50055F8E017d0 57.8 28.4 503.3 796.1 0.0 0.9 0.0 10.1 0 45 c1t5000C50055E579F7d0 58.2 24.9 502.0 841.7 0.0 0.8 0.0 9.2 0 40 c1t5000C50055E65807d0 58.2 20.7 513.4 766.6 0.0 0.8 0.0 9.8 0 41 c1t5000C50055F84A97d0 56.5 24.9 508.0 857.5 0.0 0.8 0.0 9.2 0 40 c1t5000C50055F87D97d0 53.4 13.5 449.9 790.5 0.0 0.7 0.0 10.7 0 38 c1t5000C50055F9F637d0 57.0 11.8 503.0 790.9 0.0 0.7 0.0 10.6 0 39 c1t5000C50055E65ABBd0 55.4 9.6 461.1 737.9 0.0 0.8 0.0 11.6 0 40 c1t5000C50055F8BF9Bd0 55.7 19.7 484.6 774.3 0.0 0.7 0.0 9.9 0 40 c1t5000C50055F8A22Bd0 57.6 27.1 518.2 823.7 0.0 0.8 0.0 8.9 0 40 c1t5000C50055F9379Bd0 59.6 17.0 528.0 756.1 0.0 0.8 0.0 10.1 0 41 c1t5000C50055E57A5Fd0 61.2 10.8 530.0 715.6 0.0 0.8 0.0 10.7 0 40 
c1t5000C50055F8CCAFd0 58.0 30.8 493.3 788.3 0.0 0.8 0.0 9.4 0 43 c1t5000C50055F8B80Fd0 56.5 19.9 490.7 708.7 0.0 0.8 0.0 10.0 0 40 c1t5000C50055F9FA1Fd0 56.1 22.4 484.2 701.1 0.0 0.7 0.0 9.5 0 39 c1t5000C50055E65F0Fd0 59.2 14.6 560.9 765.1 0.0 0.7 0.0 9.8 0 39 c1t5000C50055F8BE3Fd0 57.9 16.2 546.0 724.7 0.0 0.7 0.0 10.1 0 40 c1t5000C50055F8B21Fd0 59.5 30.0 553.2 741.8 0.0 0.9 0.0 9.8 0 45 c1t5000C50055F8A46Fd0 57.4 22.5 504.0 724.5 0.0 0.8 0.0 9.6 0 41 c1t5000C50055F856CFd0 58.4 24.6 531.4 786.9 0.0 0.7 0.0 8.4 0 38 c1t5000C50055E6606Fd0 511.0 161.4 7572.1 11260.1 0.0 0.3 0.0 0.4 0 14 c2 252.3 20.1 3776.3 458.9 0.0 0.1 0.0 0.2 0 6 c2t500117310015D579d0 258.8 18.0 3795.7 350.0 0.0 0.1 0.0 0.2 0 6 c2t50011731001631FDd0 0.0 123.4 0.0 10451.1 0.0 0.1 0.0 1.0 0 3 c2t5000A72A3007811Dd0 0.2 16.1 1.9 56.7 0.0 0.0 0.0 0.0 0 0 c4 0.2 8.1 1.6 28.3 0.0 0.0 0.0 0.0 0 0 c4t0d0 0.0 8.1 0.3 28.3 0.0 0.0 0.0 0.0 0 0 c4t1d0 495.6 163.6 7168.9 11290.3 0.0 0.2 0.0 0.4 0 14 c12 0.0 123.4 0.0 10451.1 0.0 0.1 0.0 1.0 0 3 c12t5000A72B300780FFd0 248.2 18.1 3645.8 323.0 0.0 0.1 0.0 0.2 0 5 c12t500117310015D59Ed0 247.4 22.1 3523.1 516.2 0.0 0.1 0.0 0.2 0 6 c12t500117310015D54Ed0 0.2 14.8 1.9 56.7 0.0 0.0 0.6 0.1 0 0 rpool 3883.5 1357.7 40141.6 60739.5 22.8 38.6 4.4 7.4 54 100 tank It is very busy with alot of wait % and higher asvc_t (2011% busy on c1?!). I'm assuming resilvers are alot more aggressive than scrubs. There are many variables here, the biggest of which is the current >> non-scrub load. >> > I might have lost 2 weeks of scrub time, depending on whether the scrub will resume where it left off. I'll update when I can. -------------- next part -------------- An HTML attachment was scrubbed... URL: From henson at acm.org Tue Jul 29 20:32:18 2014 From: henson at acm.org (Paul B. Henson) Date: Tue, 29 Jul 2014 13:32:18 -0700 Subject: [OmniOS-discuss] LDAP TLS client services (on r151006) In-Reply-To: <0BB6AEA7-8454-447F-BE21-5B8B09E26188@homeshore.be> References: <0BB6AEA7-8454-447F-BE21-5B8B09E26188@homeshore.be> Message-ID: <1f7501cfab6c$336023b0$9a206b10$@acm.org> > From: Thierry Bingen > Sent: Monday, July 28, 2014 10:37 AM > > The native ldapsearch having been compiled without the DEBUG option, I > installed the OpenLDAP version of ldapsearch which lets you use the debug > options. The latter informed me that "TLS certificate verification: Error, self > signed certificate in certificate chain". I had installed the (private) CA > certificate in the NSS DB (cert8.db, key3.db, secmod.db) with certutil though. > I then replaced the TLS_CACERTDIR of the OpenLDAP ldap.conf pointing to > the NSS DB directory with a TLS_CACERT pointing directly to the CA > certificate PEM file, and, bingo, it worked! I don't believe openldap uses NSS format certificate databases, so pointing it at one is presumably doomed to failure regardless of the validity of the database. > I therefore suspect that there is something wrong with my NSS DB. I read > somewhere that it shouldn't be cert8.db but cert7.db. I also read the > opposite. Other than that, certutil seems happy with the contents of the NSS > DB. I am lost. As a point of reference, for both solaris and illumos I have successfully used cert8.db and key3.db format NSS certificate repositories. 
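For what it's worth, a minimal sketch of populating such an NSS DB for the client (the directory, nickname and PEM path below are only examples; adjust to wherever your client actually looks):

  certutil -N -d /var/ldap
  certutil -A -n "site-ca" -t "CT,," -i /path/to/ca.pem -d /var/ldap
  certutil -L -d /var/ldap

The trust flags are the usual stumbling block: the CA entry needs to be trusted for SSL (the "C" in the first field), and the DB files must be readable by whatever process is doing the lookups.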
From gearboxes at outlook.com Tue Jul 29 21:20:28 2014
From: gearboxes at outlook.com (Machine Man)
Date: Tue, 29 Jul 2014 17:20:28 -0400
Subject: [OmniOS-discuss] KVM - copy paste in VM
Message-ID:

Hello all,

First I want to thank everyone involved with creating and supporting OmniOS.

On one of the systems we have a need to run two VMs using KVM. Everything works fine except when a copy of a large file is made inside the VM. Copying over the network results in 38MB/s - 70MB/s, but inside the VM it will drop to 700KB/s and sometimes stall entirely. It takes just over 5 min to copy 2.5GB inside the VM. The system has 20 3TB NL-SAS drives and has no problem performing with VMware connected via FC. It also has a 280GB enterprise SSD allocated for cache.

For an HDD device, is IDE the only supported bus?

Thanks,
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From bfriesen at simple.dallas.tx.us Wed Jul 30 01:39:59 2014
From: bfriesen at simple.dallas.tx.us (Bob Friesenhahn)
Date: Tue, 29 Jul 2014 20:39:59 -0500 (CDT)
Subject: Re: [OmniOS-discuss] KVM - copy paste in VM
In-Reply-To:
References:
Message-ID:

On Tue, 29 Jul 2014, Machine Man wrote:
> Hello all,
> First I want to thank everyone involved with creating and supporting OmniOS.
>
> On one of the systems we have a need to run two VMs using KVM.
> Everything works fine except when a copy of a large file is made inside the VM. Copying over the network results
> in 38MB/s - 70MB/s, but inside the VM it will drop to 700KB/s and sometimes stall entirely. It takes just over 5
> min to copy 2.5GB inside the VM.
> > What direction is your network copy going (from VM to native server, > from native server to VM, from VM on one server to VM on another)? > > You have not described your situation very clearly to us at all. > > Bob > -- > Bob Friesenhahn > bfriesen at simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/ > GraphicsMagick Maintainer, http://www.GraphicsMagick.org/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From bfriesen at simple.dallas.tx.us Wed Jul 30 14:12:47 2014 From: bfriesen at simple.dallas.tx.us (Bob Friesenhahn) Date: Wed, 30 Jul 2014 09:12:47 -0500 (CDT) Subject: [OmniOS-discuss] KVM - copy paste in VM In-Reply-To: References: , Message-ID: On Wed, 30 Jul 2014, Machine Man wrote: > The problem was making a copy of a large file inside the VM (duplicating the file in the VM)It was very slow and > it looks like it is the ide device. > Changed the controller to virtio and it is much faster. > I tried using an image file for this disk, but this was slower than pointing to zvol on ide. ?I will test with > the controller set to virtio and file img for disk. > Overall,it is not bad now and the virtual machine is much more usable when making large copies, still seems slow > from what I would expect. Copy start off at 120MB/s and quickly drops down to 14MB/s and then jumps up and down > to about 30 or 40 a few times, but no longer stalls as before. What operating system do you have installed in your virtual machine and what filesystem are you using? Is the backing zfs volume blocksize properly matched with the blocksize of the virtual machine's blocksize? If the blocksize and offsets are not well matched, then performance would suffer quite a lot. The VM writes to the zfs volume would normally be synchronous writes (does not return util data is on disk), which will be slow unless you have added a zfs slog (perhaps with an SSD) to make synchronous writes faster. Bob -- Bob Friesenhahn bfriesen at simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/ GraphicsMagick Maintainer, http://www.GraphicsMagick.org/ From moo at wuffers.net Thu Jul 31 04:10:20 2014 From: moo at wuffers.net (wuffers) Date: Thu, 31 Jul 2014 00:10:20 -0400 Subject: [OmniOS-discuss] Slow scrub performance In-Reply-To: References: Message-ID: So as I suspected, I lost 2 weeks of scrub time after the resilver. 
I started a scrub again, and it's going extremely slow (~13x slower than before): pool: tank state: ONLINE scan: scrub in progress since Tue Jul 29 15:41:27 2014 45.4G scanned out of 24.5T at 413K/s, (scan is slow, no estimated time) 0 repaired, 0.18% done # iostat -zxCn 60 2 (2nd batch output) extended device statistics r/s w/s kr/s kw/s wait actv wsvc_t asvc_t %w %b device 143.7 1321.5 5149.0 46223.4 0.0 1.5 0.0 1.0 0 120 c1 2.4 33.3 72.0 897.5 0.0 0.0 0.0 0.6 0 2 c1t5000C50055F8723Bd0 2.7 22.8 82.9 1005.4 0.0 0.0 0.0 0.9 0 2 c1t5000C50055E66B63d0 2.2 24.4 73.1 917.7 0.0 0.0 0.0 0.7 0 2 c1t5000C50055F87E73d0 3.1 26.2 120.9 899.8 0.0 0.0 0.0 0.8 0 2 c1t5000C50055F8BFA3d0 2.8 16.5 105.9 941.6 0.0 0.0 0.0 1.0 0 2 c1t5000C50055F9E123d0 2.5 25.6 86.6 897.9 0.0 0.0 0.0 0.7 0 2 c1t5000C50055F9F0B3d0 2.3 19.9 85.3 967.8 0.0 0.0 0.0 1.2 0 2 c1t5000C50055F9D3B3d0 3.1 38.3 120.7 1053.1 0.0 0.0 0.0 0.8 0 3 c1t5000C50055E4FDE7d0 2.6 12.7 81.8 854.3 0.0 0.0 0.0 1.6 0 2 c1t5000C50055F9A607d0 3.2 25.0 121.7 871.7 0.0 0.0 0.0 0.8 0 2 c1t5000C50055F8CDA7d0 2.5 30.6 93.0 941.2 0.0 0.0 0.0 0.9 0 2 c1t5000C50055E65877d0 3.1 43.7 101.4 1004.2 0.0 0.0 0.0 1.0 0 4 c1t5000C50055F9E7D7d0 2.3 24.0 92.2 965.8 0.0 0.0 0.0 0.9 0 2 c1t5000C50055FA0AF7d0 2.5 25.3 99.2 872.9 0.0 0.0 0.0 0.8 0 2 c1t5000C50055F9FE87d0 2.9 19.0 116.1 894.8 0.0 0.0 0.0 1.2 0 2 c1t5000C50055F9F91Bd0 2.6 38.9 96.1 915.4 0.0 0.1 0.0 1.2 0 4 c1t5000C50055F9FEABd0 3.2 45.6 135.7 973.5 0.0 0.1 0.0 1.5 0 5 c1t5000C50055F9F63Bd0 3.1 21.2 105.9 966.6 0.0 0.0 0.0 1.0 0 2 c1t5000C50055F9F3EBd0 2.8 26.7 122.0 781.6 0.0 0.0 0.0 0.7 0 2 c1t5000C50055F9F80Bd0 3.1 31.6 119.9 932.5 0.0 0.0 0.0 1.1 0 3 c1t5000C50055F9FB8Bd0 3.1 32.5 123.3 924.1 0.0 0.0 0.0 0.9 0 3 c1t5000C50055F9F92Bd0 2.9 17.0 113.8 952.0 0.0 0.0 0.0 1.2 0 2 c1t5000C50055F8905Fd0 3.0 23.4 111.0 871.1 0.0 0.0 0.0 1.5 0 2 c1t5000C50055F8D48Fd0 2.8 21.4 105.5 858.0 0.0 0.0 0.0 1.0 0 2 c1t5000C50055F9F89Fd0 3.5 16.4 87.1 941.3 0.0 0.0 0.0 1.4 0 2 c1t5000C50055F9EF2Fd0 2.1 33.8 64.5 897.5 0.0 0.0 0.0 0.5 0 2 c1t5000C50055F8C3ABd0 3.0 21.8 72.3 1005.4 0.0 0.0 0.0 1.0 0 2 c1t5000C50055E66053d0 3.0 37.8 106.9 1053.5 0.0 0.0 0.0 0.9 0 3 c1t5000C50055E66503d0 2.7 26.0 107.7 897.9 0.0 0.0 0.0 0.7 0 2 c1t5000C50055F9D3E3d0 2.2 38.9 96.4 918.7 0.0 0.0 0.0 0.9 0 4 c1t5000C50055F84FB7d0 2.8 21.4 111.1 953.6 0.0 0.0 0.0 0.7 0 1 c1t5000C50055F8E017d0 3.0 30.6 104.3 940.9 0.0 0.1 0.0 1.5 0 3 c1t5000C50055E579F7d0 2.8 26.4 90.9 901.1 0.0 0.0 0.0 0.9 0 2 c1t5000C50055E65807d0 2.4 24.0 96.7 965.8 0.0 0.0 0.0 0.9 0 2 c1t5000C50055F84A97d0 2.9 19.8 109.4 967.8 0.0 0.0 0.0 1.1 0 2 c1t5000C50055F87D97d0 3.8 16.1 106.4 943.1 0.0 0.0 0.0 1.3 0 2 c1t5000C50055F9F637d0 2.2 17.1 72.7 966.6 0.0 0.0 0.0 1.4 0 2 c1t5000C50055E65ABBd0 2.7 12.7 86.0 863.3 0.0 0.0 0.0 1.5 0 2 c1t5000C50055F8BF9Bd0 2.7 23.2 101.8 871.1 0.0 0.0 0.0 1.0 0 2 c1t5000C50055F8A22Bd0 4.5 43.6 134.7 1004.2 0.0 0.0 0.0 1.0 0 4 c1t5000C50055F9379Bd0 2.8 24.0 87.9 917.7 0.0 0.0 0.0 0.8 0 2 c1t5000C50055E57A5Fd0 2.9 18.8 119.0 894.3 0.0 0.0 0.0 1.1 0 2 c1t5000C50055F8CCAFd0 3.4 45.7 128.1 976.8 0.0 0.1 0.0 1.2 0 5 c1t5000C50055F8B80Fd0 2.7 24.9 100.2 871.7 0.0 0.0 0.0 0.8 0 2 c1t5000C50055F9FA1Fd0 4.8 26.8 128.6 781.6 0.0 0.0 0.0 0.7 0 2 c1t5000C50055E65F0Fd0 2.7 16.3 109.5 941.6 0.0 0.0 0.0 1.1 0 2 c1t5000C50055F8BE3Fd0 3.1 21.1 119.9 858.0 0.0 0.0 0.0 1.1 0 2 c1t5000C50055F8B21Fd0 2.8 31.8 108.5 932.5 0.0 0.0 0.0 1.0 0 3 c1t5000C50055F8A46Fd0 2.4 25.3 87.4 872.9 0.0 0.0 0.0 0.8 0 2 c1t5000C50055F856CFd0 3.3 32.0 125.2 924.1 0.0 0.0 0.0 1.2 0 3 
c1t5000C50055E6606Fd0 289.9 169.0 3905.0 12754.1 0.0 0.2 0.0 0.4 0 10 c2 146.6 14.1 1987.9 305.2 0.0 0.0 0.0 0.2 0 4 c2t500117310015D579d0 143.4 10.6 1917.1 205.2 0.0 0.0 0.0 0.2 0 3 c2t50011731001631FDd0 0.0 144.3 0.0 12243.7 0.0 0.1 0.0 0.9 0 3 c2t5000A72A3007811Dd0 0.0 14.6 0.0 75.8 0.0 0.0 0.0 0.1 0 0 c4 0.0 7.3 0.0 37.9 0.0 0.0 0.0 0.1 0 0 c4t0d0 0.0 7.3 0.0 37.9 0.0 0.0 0.0 0.1 0 0 c4t1d0 284.8 171.5 3792.8 12786.2 0.0 0.2 0.0 0.4 0 10 c12 0.0 144.3 0.0 12243.7 0.0 0.1 0.0 0.9 0 3 c12t5000A72B300780FFd0 152.3 13.3 2004.6 255.9 0.0 0.0 0.0 0.2 0 4 c12t500117310015D59Ed0 132.5 13.9 1788.2 286.6 0.0 0.0 0.0 0.2 0 3 c12t500117310015D54Ed0 0.0 13.5 0.0 75.8 0.0 0.0 0.8 0.1 0 0 rpool 718.4 1653.5 12846.8 71761.5 34.0 2.0 14.3 0.8 7 51 tank This doesn't seem any busier than my earlier output (6% wait, 68% busy, asvc_t 1.1ms) and the dev team confirms that their workload hasn't changed for the past few days. If my math is right.. this will take ~719 days to complete. Anything I can tune to help speed this up? On Tue, Jul 29, 2014 at 3:29 PM, wuffers wrote: > Going to try to answer both responses in one message.. > > Short answer, yes. ? Keep in mind that >> >> 1. a scrub runs in the background (so as not to impact production I/O, >> this was not always the case and caused serious issues in the past with a >> pool being unresponsive due to a scrub) >> >> 2. a scrub essentially walks the zpool examining every transaction in >> order (as does a resilver) >> >> So the time to complete a scrub depends on how many write transactions >> since the pool was created (which is generally related to the amount of >> data but not always). You are limited by the random I/O capability of the >> disks involved. With VMs I assume this is a file server, so the I/O size >> will also affect performance. > > > I haven't noticed any slowdowns in our virtual environments, so I guess > that's a good thing it's so low priority that it doesn't impact workloads. > > Run the numbers? you are scanning 24.2TB at about 5.5MB/sec ? 4,613,734 >> seconds or 54 days. And that assumes the same rate for all of the scan. The >> rate will change as other I/O competes for resources. >> > > The number was fluctuating when I started the scrub, and I had seen it go > as high as 35MB/s at one point. I am certain that our Hyper-V workload has > increased since the last scrub, so this does make sense. > > >> Looks like you have a fair bit of activity going on (almost 1MB/sec of >> writes per spindle). >> > > As Richard correctly states below, this is the aggregate since boot > (uptime ~56 days). I have another output from iostat as per his > instructions below. > > >> Since this is storage for VMs, I assume this is the storage server for >> separate compute servers? Have you tuned the block size for the file share >> you are using? That can make a huge difference in performance. >> > > Both the Hyper-V and VMware LUNs are created with 64K block sizes. From > what I've read of other performance and tuning articles, that is the > optimal block size (I did some limited testing when first configuring the > SAN, but results were somewhat inconclusive). Hyper-V hosts our testing > environment (we integrate with TFS, a MS product, so we have no choice > here) and probably make up the bulk of the workload (~300+ test VMs with > various OSes). VMware hosts our production servers (Exchange, file servers, > SQL, AD, etc - ~50+ VMs). > > I also noted that you only have a single LOG device. 
Best Practice is to >> mirror log devices so you do not lose any data in flight if hit by a power >> outage (of course, if this server has more UPS runtime that all the clients >> that may not matter). >> > > Actually, I do have a mirror ZIL device, it's just disabled at this time > (my ZIL devices are ZeusRAMs). At some point, I was troubleshooting some > kernel panics (turned out to be a faulty SSD on the rpool), and hadn't > re-enabled it yet. Thanks for the reminder (and yes, we do have a UPS as > well). > > And oops.. re-attaching the ZIL as a mirror triggered a resilver now, > suspending or canceling the scrub? Will monitor this and restart the scrub > if it doesn't by itself. > > pool: tank > state: ONLINE > status: One or more devices is currently being resilvered. The pool will > continue to function, possibly in a degraded state. > action: Wait for the resilver to complete. > scan: resilver in progress since Tue Jul 29 14:48:48 2014 > 3.89T scanned out of 24.5T at 3.06G/s, 1h55m to go > 0 resilvered, 15.84% done > > At least it's going very fast. EDIT: Now about 67% done as I finish > writing this, speed dropping to ~1.3G/s. > > maybe, maybe not >>> >>> this is slower than most, surely slower than desired >>> >> > Unfortunately reattaching the mirror to my log device triggered a > resilver. Not sure if this is desired behavior, but yes, 5.5MB/s seems > quite slow. Hopefully after the resilver the scrub will progress where it > left off. > > >> The estimate is often very wrong, especially for busy systems. >>> If this is an older ZFS implementation, this pool is likely getting >>> pounded by the >>> ZFS write throttle. There are some tunings that can be applied, but the >>> old write >>> throttle is not a stable control system, so it will always be a little >>> bit unpredictable. >>> >> > The system is on r151008 (my BE states that I upgraded back in February, > putting me in r151008j or so), with all the pools upgraded for the new > enhancements as well as activating the new L2ARC compression feature. > Reading the release notes, the ZFS write throttle enhancements were in > since r151008e so I should be good there. > > >> # iostat -xnze >>> >>> >>> Unfortunately, this is the performance since boot and is not suitable >>> for performance >>> analysis unless the system has been rebooted in the past 10 minutes or >>> so. You'll need >>> to post the second batch from "iostat -zxCn 60 2" >>> >> > Ah yes, that was my mistake. 
Output from second count (before re-attaching > log mirror): > > # iostat -zxCn 60 2 > > extended device statistics > r/s w/s kr/s kw/s wait actv wsvc_t asvc_t %w %b device > 255.7 1077.7 6294.0 41335.1 0.0 1.9 0.0 1.4 0 153 c1 > 5.3 23.9 118.5 811.9 0.0 0.0 0.0 1.1 0 3 > c1t5000C50055F8723Bd0 > 5.9 14.5 110.0 834.3 0.0 0.0 0.0 1.3 0 2 > c1t5000C50055E66B63d0 > 5.6 16.6 123.8 822.7 0.0 0.0 0.0 1.3 0 2 > c1t5000C50055F87E73d0 > 4.7 27.8 118.6 796.6 0.0 0.0 0.0 1.3 0 3 > c1t5000C50055F8BFA3d0 > 5.6 14.5 139.7 833.8 0.0 0.0 0.0 1.6 0 3 > c1t5000C50055F9E123d0 > 4.4 27.1 112.3 825.2 0.0 0.0 0.0 0.8 0 2 > c1t5000C50055F9F0B3d0 > 5.0 20.2 121.7 803.4 0.0 0.0 0.0 1.2 0 3 > c1t5000C50055F9D3B3d0 > 5.4 26.4 137.0 857.3 0.0 0.0 0.0 1.4 0 4 > c1t5000C50055E4FDE7d0 > 4.7 12.3 123.7 832.7 0.0 0.0 0.0 2.0 0 3 > c1t5000C50055F9A607d0 > 5.0 23.9 125.9 830.9 0.0 0.0 0.0 1.3 0 3 > c1t5000C50055F8CDA7d0 > 4.5 31.4 112.2 814.6 0.0 0.0 0.0 1.1 0 3 > c1t5000C50055E65877d0 > 5.2 24.4 130.6 872.5 0.0 0.0 0.0 1.2 0 3 > c1t5000C50055F9E7D7d0 > 4.1 21.8 103.7 797.2 0.0 0.0 0.0 1.1 0 3 > c1t5000C50055FA0AF7d0 > 5.5 24.8 129.8 802.8 0.0 0.0 0.0 1.5 0 4 > c1t5000C50055F9FE87d0 > 5.7 17.7 137.2 797.6 0.0 0.0 0.0 1.4 0 3 > c1t5000C50055F9F91Bd0 > 6.0 30.6 139.1 852.0 0.0 0.1 0.0 1.5 0 4 > c1t5000C50055F9FEABd0 > 6.1 34.1 137.8 929.2 0.0 0.1 0.0 1.9 0 6 > c1t5000C50055F9F63Bd0 > 4.1 15.9 101.8 791.4 0.0 0.0 0.0 1.6 0 3 > c1t5000C50055F9F3EBd0 > 6.4 23.2 155.2 878.6 0.0 0.0 0.0 1.1 0 3 > c1t5000C50055F9F80Bd0 > 4.5 23.5 106.2 825.4 0.0 0.0 0.0 1.1 0 3 > c1t5000C50055F9FB8Bd0 > 4.0 23.2 101.1 788.9 0.0 0.0 0.0 1.3 0 3 > c1t5000C50055F9F92Bd0 > 4.4 11.3 125.7 782.3 0.0 0.0 0.0 1.9 0 3 > c1t5000C50055F8905Fd0 > 4.6 20.4 129.2 823.0 0.0 0.0 0.0 1.5 0 3 > c1t5000C50055F8D48Fd0 > 5.1 19.7 142.9 887.2 0.0 0.0 0.0 1.7 0 3 > c1t5000C50055F9F89Fd0 > 5.6 11.4 129.1 776.0 0.0 0.0 0.0 1.9 0 3 > c1t5000C50055F9EF2Fd0 > 5.6 23.7 137.4 811.9 0.0 0.0 0.0 1.2 0 3 > c1t5000C50055F8C3ABd0 > 6.8 13.9 132.4 834.3 0.0 0.0 0.0 1.8 0 3 > c1t5000C50055E66053d0 > 5.2 26.7 126.9 857.3 0.0 0.0 0.0 1.2 0 3 > c1t5000C50055E66503d0 > 4.2 27.1 104.6 825.2 0.0 0.0 0.0 1.0 0 3 > c1t5000C50055F9D3E3d0 > 5.2 30.7 140.9 852.0 0.0 0.1 0.0 1.5 0 4 > c1t5000C50055F84FB7d0 > 5.4 16.1 124.3 791.4 0.0 0.0 0.0 1.7 0 3 > c1t5000C50055F8E017d0 > 3.8 31.4 89.7 814.6 0.0 0.0 0.0 1.1 0 4 > c1t5000C50055E579F7d0 > 4.6 27.5 116.0 796.6 0.0 0.1 0.0 1.6 0 4 > c1t5000C50055E65807d0 > 4.0 21.5 99.7 797.2 0.0 0.0 0.0 1.1 0 3 > c1t5000C50055F84A97d0 > 4.7 20.2 116.3 803.4 0.0 0.0 0.0 1.4 0 3 > c1t5000C50055F87D97d0 > 5.0 11.5 121.5 776.0 0.0 0.0 0.0 2.0 0 3 > c1t5000C50055F9F637d0 > 4.9 11.3 112.4 782.3 0.0 0.0 0.0 2.3 0 3 > c1t5000C50055E65ABBd0 > 5.3 11.8 142.5 832.7 0.0 0.0 0.0 2.4 0 3 > c1t5000C50055F8BF9Bd0 > 5.0 20.3 121.4 823.0 0.0 0.0 0.0 1.7 0 3 > c1t5000C50055F8A22Bd0 > 6.6 24.3 170.3 872.5 0.0 0.0 0.0 1.3 0 3 > c1t5000C50055F9379Bd0 > 5.8 16.3 121.7 822.7 0.0 0.0 0.0 1.3 0 2 > c1t5000C50055E57A5Fd0 > 5.3 17.7 146.5 797.6 0.0 0.0 0.0 1.4 0 3 > c1t5000C50055F8CCAFd0 > 5.7 34.1 141.5 929.2 0.0 0.1 0.0 1.7 0 5 > c1t5000C50055F8B80Fd0 > 5.5 23.8 125.7 830.9 0.0 0.0 0.0 1.2 0 3 > c1t5000C50055F9FA1Fd0 > 5.0 23.2 127.9 878.6 0.0 0.0 0.0 1.1 0 3 > c1t5000C50055E65F0Fd0 > 5.2 14.0 163.7 833.8 0.0 0.0 0.0 2.0 0 3 > c1t5000C50055F8BE3Fd0 > 4.6 18.9 122.8 887.2 0.0 0.0 0.0 1.6 0 3 > c1t5000C50055F8B21Fd0 > 5.5 23.6 137.4 825.4 0.0 0.0 0.0 1.5 0 3 > c1t5000C50055F8A46Fd0 > 4.9 24.6 116.7 802.8 0.0 0.0 0.0 1.4 0 4 > c1t5000C50055F856CFd0 > 4.9 23.4 120.8 788.9 0.0 0.0 0.0 1.4 0 3 > 
c1t5000C50055E6606Fd0 > 234.9 170.1 4079.9 11127.8 0.0 0.2 0.0 0.5 0 9 c2 > 119.0 28.9 2083.8 670.8 0.0 0.0 0.0 0.3 0 3 > c2t500117310015D579d0 > 115.9 27.4 1996.1 634.2 0.0 0.0 0.0 0.3 0 3 > c2t50011731001631FDd0 > 0.0 113.8 0.0 9822.8 0.0 0.1 0.0 1.0 0 2 > c2t5000A72A3007811Dd0 > 0.1 18.5 0.0 64.8 0.0 0.0 0.0 0.0 0 0 c4 > 0.1 9.2 0.0 32.4 0.0 0.0 0.0 0.0 0 0 c4t0d0 > 0.0 9.2 0.0 32.4 0.0 0.0 0.0 0.0 0 0 c4t1d0 > 229.8 58.1 3987.4 1308.0 0.0 0.1 0.0 0.3 0 6 c12 > 114.2 27.7 1994.8 626.0 0.0 0.0 0.0 0.3 0 3 > c12t500117310015D59Ed0 > 115.5 30.4 1992.6 682.0 0.0 0.0 0.0 0.3 0 3 > c12t500117310015D54Ed0 > 0.1 17.1 0.0 64.8 0.0 0.0 0.6 0.1 0 0 rpool > 720.3 1298.4 14361.2 53770.8 18.7 2.3 9.3 1.1 6 68 tank > > Is 153% busy correct on c1? Seems to me that disks are quite "busy", but > are handling the workload just fine (wait at 6% and asvc_t at 1.1ms) > > Interestingly, this is the same output now that the resilver is running: > > extended device statistics > r/s w/s kr/s kw/s wait actv wsvc_t asvc_t %w %b device > 2876.9 1041.1 25400.7 38189.1 0.0 37.9 0.0 9.7 0 2011 c1 > 60.8 26.1 540.1 845.2 0.0 0.7 0.0 8.3 0 39 > c1t5000C50055F8723Bd0 > 58.4 14.2 511.6 740.7 0.0 0.7 0.0 10.1 0 39 > c1t5000C50055E66B63d0 > 60.2 16.3 529.3 756.1 0.0 0.8 0.0 10.1 0 41 > c1t5000C50055F87E73d0 > 57.5 24.9 527.6 841.7 0.0 0.7 0.0 9.0 0 40 > c1t5000C50055F8BFA3d0 > 57.9 14.5 543.5 765.1 0.0 0.7 0.0 9.8 0 38 > c1t5000C50055F9E123d0 > 57.9 23.9 516.6 806.9 0.0 0.8 0.0 9.3 0 40 > c1t5000C50055F9F0B3d0 > 59.8 24.6 554.1 857.5 0.0 0.8 0.0 9.6 0 42 > c1t5000C50055F9D3B3d0 > 56.5 21.0 480.4 715.7 0.0 0.7 0.0 8.9 0 37 > c1t5000C50055E4FDE7d0 > 54.8 9.7 473.5 737.9 0.0 0.7 0.0 11.2 0 39 > c1t5000C50055F9A607d0 > 55.8 20.2 457.3 708.7 0.0 0.7 0.0 9.9 0 40 > c1t5000C50055F8CDA7d0 > 57.8 28.6 487.0 796.1 0.0 0.9 0.0 9.9 0 45 > c1t5000C50055E65877d0 > 60.8 27.1 572.6 823.7 0.0 0.8 0.0 8.8 0 41 > c1t5000C50055F9E7D7d0 > 55.8 21.1 478.2 766.6 0.0 0.7 0.0 9.7 0 40 > c1t5000C50055FA0AF7d0 > 57.0 22.8 528.3 724.5 0.0 0.8 0.0 9.6 0 41 > c1t5000C50055F9FE87d0 > 56.2 10.8 465.2 715.6 0.0 0.7 0.0 10.4 0 38 > c1t5000C50055F9F91Bd0 > 59.2 29.4 524.6 740.9 0.0 0.8 0.0 8.9 0 41 > c1t5000C50055F9FEABd0 > 57.3 30.7 496.7 788.3 0.0 0.8 0.0 9.1 0 42 > c1t5000C50055F9F63Bd0 > 55.5 16.3 461.9 652.9 0.0 0.7 0.0 10.1 0 39 > c1t5000C50055F9F3EBd0 > 57.2 22.1 495.1 701.1 0.0 0.8 0.0 9.8 0 41 > c1t5000C50055F9F80Bd0 > 59.5 30.2 543.1 741.8 0.0 0.9 0.0 9.6 0 45 > c1t5000C50055F9FB8Bd0 > 56.5 25.1 515.4 786.9 0.0 0.7 0.0 8.6 0 38 > c1t5000C50055F9F92Bd0 > 61.8 12.5 540.6 790.9 0.0 0.8 0.0 10.3 0 41 > c1t5000C50055F8905Fd0 > 57.0 19.8 521.0 774.3 0.0 0.7 0.0 9.6 0 39 > c1t5000C50055F8D48Fd0 > 56.3 16.3 517.7 724.7 0.0 0.7 0.0 9.9 0 38 > c1t5000C50055F9F89Fd0 > 57.0 13.4 504.5 790.5 0.0 0.8 0.0 10.7 0 40 > c1t5000C50055F9EF2Fd0 > 55.0 26.1 477.6 845.2 0.0 0.7 0.0 8.3 0 36 > c1t5000C50055F8C3ABd0 > 57.8 14.1 518.7 740.7 0.0 0.8 0.0 10.8 0 41 > c1t5000C50055E66053d0 > 55.9 20.8 490.2 715.7 0.0 0.7 0.0 9.0 0 37 > c1t5000C50055E66503d0 > 57.0 24.1 509.7 806.9 0.0 0.8 0.0 10.0 0 41 > c1t5000C50055F9D3E3d0 > 59.1 29.2 504.1 740.9 0.0 0.8 0.0 9.3 0 44 > c1t5000C50055F84FB7d0 > 54.4 16.3 449.5 652.9 0.0 0.7 0.0 10.4 0 39 > c1t5000C50055F8E017d0 > 57.8 28.4 503.3 796.1 0.0 0.9 0.0 10.1 0 45 > c1t5000C50055E579F7d0 > 58.2 24.9 502.0 841.7 0.0 0.8 0.0 9.2 0 40 > c1t5000C50055E65807d0 > 58.2 20.7 513.4 766.6 0.0 0.8 0.0 9.8 0 41 > c1t5000C50055F84A97d0 > 56.5 24.9 508.0 857.5 0.0 0.8 0.0 9.2 0 40 > c1t5000C50055F87D97d0 > 53.4 13.5 449.9 790.5 0.0 0.7 0.0 10.7 0 
38 > c1t5000C50055F9F637d0 > 57.0 11.8 503.0 790.9 0.0 0.7 0.0 10.6 0 39 > c1t5000C50055E65ABBd0 > 55.4 9.6 461.1 737.9 0.0 0.8 0.0 11.6 0 40 > c1t5000C50055F8BF9Bd0 > 55.7 19.7 484.6 774.3 0.0 0.7 0.0 9.9 0 40 > c1t5000C50055F8A22Bd0 > 57.6 27.1 518.2 823.7 0.0 0.8 0.0 8.9 0 40 > c1t5000C50055F9379Bd0 > 59.6 17.0 528.0 756.1 0.0 0.8 0.0 10.1 0 41 > c1t5000C50055E57A5Fd0 > 61.2 10.8 530.0 715.6 0.0 0.8 0.0 10.7 0 40 > c1t5000C50055F8CCAFd0 > 58.0 30.8 493.3 788.3 0.0 0.8 0.0 9.4 0 43 > c1t5000C50055F8B80Fd0 > 56.5 19.9 490.7 708.7 0.0 0.8 0.0 10.0 0 40 > c1t5000C50055F9FA1Fd0 > 56.1 22.4 484.2 701.1 0.0 0.7 0.0 9.5 0 39 > c1t5000C50055E65F0Fd0 > 59.2 14.6 560.9 765.1 0.0 0.7 0.0 9.8 0 39 > c1t5000C50055F8BE3Fd0 > 57.9 16.2 546.0 724.7 0.0 0.7 0.0 10.1 0 40 > c1t5000C50055F8B21Fd0 > 59.5 30.0 553.2 741.8 0.0 0.9 0.0 9.8 0 45 > c1t5000C50055F8A46Fd0 > 57.4 22.5 504.0 724.5 0.0 0.8 0.0 9.6 0 41 > c1t5000C50055F856CFd0 > 58.4 24.6 531.4 786.9 0.0 0.7 0.0 8.4 0 38 > c1t5000C50055E6606Fd0 > 511.0 161.4 7572.1 11260.1 0.0 0.3 0.0 0.4 0 14 c2 > 252.3 20.1 3776.3 458.9 0.0 0.1 0.0 0.2 0 6 > c2t500117310015D579d0 > 258.8 18.0 3795.7 350.0 0.0 0.1 0.0 0.2 0 6 > c2t50011731001631FDd0 > 0.0 123.4 0.0 10451.1 0.0 0.1 0.0 1.0 0 3 > c2t5000A72A3007811Dd0 > 0.2 16.1 1.9 56.7 0.0 0.0 0.0 0.0 0 0 c4 > 0.2 8.1 1.6 28.3 0.0 0.0 0.0 0.0 0 0 c4t0d0 > 0.0 8.1 0.3 28.3 0.0 0.0 0.0 0.0 0 0 c4t1d0 > 495.6 163.6 7168.9 11290.3 0.0 0.2 0.0 0.4 0 14 c12 > 0.0 123.4 0.0 10451.1 0.0 0.1 0.0 1.0 0 3 > c12t5000A72B300780FFd0 > 248.2 18.1 3645.8 323.0 0.0 0.1 0.0 0.2 0 5 > c12t500117310015D59Ed0 > 247.4 22.1 3523.1 516.2 0.0 0.1 0.0 0.2 0 6 > c12t500117310015D54Ed0 > 0.2 14.8 1.9 56.7 0.0 0.0 0.6 0.1 0 0 rpool > 3883.5 1357.7 40141.6 60739.5 22.8 38.6 4.4 7.4 54 100 tank > > It is very busy with alot of wait % and higher asvc_t (2011% busy on > c1?!). I'm assuming resilvers are alot more aggressive than scrubs. > > There are many variables here, the biggest of which is the current >>> non-scrub load. >>> >> > I might have lost 2 weeks of scrub time, depending on whether the scrub > will resume where it left off. I'll update when I can. > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From richard.elling at richardelling.com Thu Jul 31 05:37:26 2014 From: richard.elling at richardelling.com (Richard Elling) Date: Wed, 30 Jul 2014 22:37:26 -0700 Subject: [OmniOS-discuss] Slow scrub performance In-Reply-To: References: Message-ID: apologies for the long post, data for big systems tends to do that, comments below... On Jul 30, 2014, at 9:10 PM, wuffers wrote: > So as I suspected, I lost 2 weeks of scrub time after the resilver. 
I started a scrub again, and it's going extremely slow (~13x slower than before): > > pool: tank > state: ONLINE > scan: scrub in progress since Tue Jul 29 15:41:27 2014 > 45.4G scanned out of 24.5T at 413K/s, (scan is slow, no estimated time) > 0 repaired, 0.18% done > > # iostat -zxCn 60 2 (2nd batch output) > > extended device statistics > r/s w/s kr/s kw/s wait actv wsvc_t asvc_t %w %b device > 143.7 1321.5 5149.0 46223.4 0.0 1.5 0.0 1.0 0 120 c1 > 2.4 33.3 72.0 897.5 0.0 0.0 0.0 0.6 0 2 c1t5000C50055F8723Bd0 > 2.7 22.8 82.9 1005.4 0.0 0.0 0.0 0.9 0 2 c1t5000C50055E66B63d0 > 2.2 24.4 73.1 917.7 0.0 0.0 0.0 0.7 0 2 c1t5000C50055F87E73d0 > 3.1 26.2 120.9 899.8 0.0 0.0 0.0 0.8 0 2 c1t5000C50055F8BFA3d0 > 2.8 16.5 105.9 941.6 0.0 0.0 0.0 1.0 0 2 c1t5000C50055F9E123d0 > 2.5 25.6 86.6 897.9 0.0 0.0 0.0 0.7 0 2 c1t5000C50055F9F0B3d0 > 2.3 19.9 85.3 967.8 0.0 0.0 0.0 1.2 0 2 c1t5000C50055F9D3B3d0 > 3.1 38.3 120.7 1053.1 0.0 0.0 0.0 0.8 0 3 c1t5000C50055E4FDE7d0 > 2.6 12.7 81.8 854.3 0.0 0.0 0.0 1.6 0 2 c1t5000C50055F9A607d0 > 3.2 25.0 121.7 871.7 0.0 0.0 0.0 0.8 0 2 c1t5000C50055F8CDA7d0 > 2.5 30.6 93.0 941.2 0.0 0.0 0.0 0.9 0 2 c1t5000C50055E65877d0 > 3.1 43.7 101.4 1004.2 0.0 0.0 0.0 1.0 0 4 c1t5000C50055F9E7D7d0 > 2.3 24.0 92.2 965.8 0.0 0.0 0.0 0.9 0 2 c1t5000C50055FA0AF7d0 > 2.5 25.3 99.2 872.9 0.0 0.0 0.0 0.8 0 2 c1t5000C50055F9FE87d0 > 2.9 19.0 116.1 894.8 0.0 0.0 0.0 1.2 0 2 c1t5000C50055F9F91Bd0 > 2.6 38.9 96.1 915.4 0.0 0.1 0.0 1.2 0 4 c1t5000C50055F9FEABd0 > 3.2 45.6 135.7 973.5 0.0 0.1 0.0 1.5 0 5 c1t5000C50055F9F63Bd0 > 3.1 21.2 105.9 966.6 0.0 0.0 0.0 1.0 0 2 c1t5000C50055F9F3EBd0 > 2.8 26.7 122.0 781.6 0.0 0.0 0.0 0.7 0 2 c1t5000C50055F9F80Bd0 > 3.1 31.6 119.9 932.5 0.0 0.0 0.0 1.1 0 3 c1t5000C50055F9FB8Bd0 > 3.1 32.5 123.3 924.1 0.0 0.0 0.0 0.9 0 3 c1t5000C50055F9F92Bd0 > 2.9 17.0 113.8 952.0 0.0 0.0 0.0 1.2 0 2 c1t5000C50055F8905Fd0 > 3.0 23.4 111.0 871.1 0.0 0.0 0.0 1.5 0 2 c1t5000C50055F8D48Fd0 > 2.8 21.4 105.5 858.0 0.0 0.0 0.0 1.0 0 2 c1t5000C50055F9F89Fd0 > 3.5 16.4 87.1 941.3 0.0 0.0 0.0 1.4 0 2 c1t5000C50055F9EF2Fd0 > 2.1 33.8 64.5 897.5 0.0 0.0 0.0 0.5 0 2 c1t5000C50055F8C3ABd0 > 3.0 21.8 72.3 1005.4 0.0 0.0 0.0 1.0 0 2 c1t5000C50055E66053d0 > 3.0 37.8 106.9 1053.5 0.0 0.0 0.0 0.9 0 3 c1t5000C50055E66503d0 > 2.7 26.0 107.7 897.9 0.0 0.0 0.0 0.7 0 2 c1t5000C50055F9D3E3d0 > 2.2 38.9 96.4 918.7 0.0 0.0 0.0 0.9 0 4 c1t5000C50055F84FB7d0 > 2.8 21.4 111.1 953.6 0.0 0.0 0.0 0.7 0 1 c1t5000C50055F8E017d0 > 3.0 30.6 104.3 940.9 0.0 0.1 0.0 1.5 0 3 c1t5000C50055E579F7d0 > 2.8 26.4 90.9 901.1 0.0 0.0 0.0 0.9 0 2 c1t5000C50055E65807d0 > 2.4 24.0 96.7 965.8 0.0 0.0 0.0 0.9 0 2 c1t5000C50055F84A97d0 > 2.9 19.8 109.4 967.8 0.0 0.0 0.0 1.1 0 2 c1t5000C50055F87D97d0 > 3.8 16.1 106.4 943.1 0.0 0.0 0.0 1.3 0 2 c1t5000C50055F9F637d0 > 2.2 17.1 72.7 966.6 0.0 0.0 0.0 1.4 0 2 c1t5000C50055E65ABBd0 > 2.7 12.7 86.0 863.3 0.0 0.0 0.0 1.5 0 2 c1t5000C50055F8BF9Bd0 > 2.7 23.2 101.8 871.1 0.0 0.0 0.0 1.0 0 2 c1t5000C50055F8A22Bd0 > 4.5 43.6 134.7 1004.2 0.0 0.0 0.0 1.0 0 4 c1t5000C50055F9379Bd0 > 2.8 24.0 87.9 917.7 0.0 0.0 0.0 0.8 0 2 c1t5000C50055E57A5Fd0 > 2.9 18.8 119.0 894.3 0.0 0.0 0.0 1.1 0 2 c1t5000C50055F8CCAFd0 > 3.4 45.7 128.1 976.8 0.0 0.1 0.0 1.2 0 5 c1t5000C50055F8B80Fd0 > 2.7 24.9 100.2 871.7 0.0 0.0 0.0 0.8 0 2 c1t5000C50055F9FA1Fd0 > 4.8 26.8 128.6 781.6 0.0 0.0 0.0 0.7 0 2 c1t5000C50055E65F0Fd0 > 2.7 16.3 109.5 941.6 0.0 0.0 0.0 1.1 0 2 c1t5000C50055F8BE3Fd0 > 3.1 21.1 119.9 858.0 0.0 0.0 0.0 1.1 0 2 c1t5000C50055F8B21Fd0 > 2.8 31.8 108.5 932.5 0.0 0.0 0.0 1.0 0 3 
c1t5000C50055F8A46Fd0 > 2.4 25.3 87.4 872.9 0.0 0.0 0.0 0.8 0 2 c1t5000C50055F856CFd0 > 3.3 32.0 125.2 924.1 0.0 0.0 0.0 1.2 0 3 c1t5000C50055E6606Fd0 > 289.9 169.0 3905.0 12754.1 0.0 0.2 0.0 0.4 0 10 c2 > 146.6 14.1 1987.9 305.2 0.0 0.0 0.0 0.2 0 4 c2t500117310015D579d0 > 143.4 10.6 1917.1 205.2 0.0 0.0 0.0 0.2 0 3 c2t50011731001631FDd0 > 0.0 144.3 0.0 12243.7 0.0 0.1 0.0 0.9 0 3 c2t5000A72A3007811Dd0 > 0.0 14.6 0.0 75.8 0.0 0.0 0.0 0.1 0 0 c4 > 0.0 7.3 0.0 37.9 0.0 0.0 0.0 0.1 0 0 c4t0d0 > 0.0 7.3 0.0 37.9 0.0 0.0 0.0 0.1 0 0 c4t1d0 > 284.8 171.5 3792.8 12786.2 0.0 0.2 0.0 0.4 0 10 c12 > 0.0 144.3 0.0 12243.7 0.0 0.1 0.0 0.9 0 3 c12t5000A72B300780FFd0 > 152.3 13.3 2004.6 255.9 0.0 0.0 0.0 0.2 0 4 c12t500117310015D59Ed0 > 132.5 13.9 1788.2 286.6 0.0 0.0 0.0 0.2 0 3 c12t500117310015D54Ed0 > 0.0 13.5 0.0 75.8 0.0 0.0 0.8 0.1 0 0 rpool > 718.4 1653.5 12846.8 71761.5 34.0 2.0 14.3 0.8 7 51 tank > > This doesn't seem any busier than my earlier output (6% wait, 68% busy, asvc_t 1.1ms) and the dev team confirms that their workload hasn't changed for the past few days. If my math is right.. this will take ~719 days to complete. The %busy for controllers is a sum of the %busy for all disks on the controller, so is can be large, but overall isn't interesting. With HDDs, there is no way you can saturate the controller, so we don't really care what the %busy shows. The more important item is that the number of read ops is fairly low for all but 4 disks. Since you didn't post the pool configuration, we can only guess that they might be a souce of the bottleneck. You're seeing a lot of reads from the cache devices. How much RAM does this system have? > > Anything I can tune to help speed this up? methinks the scrub I/Os are getting starved and since they are low priority, they could get very starved. In general, I wouldn't worry about it, but I understand why you might be nervous. Keep in mind that in ZFS scrubs are intended to find errors on idle data, not frequently accessed data. more far below... > > On Tue, Jul 29, 2014 at 3:29 PM, wuffers wrote: > Going to try to answer both responses in one message.. > > Short answer, yes. ? Keep in mind that > > 1. a scrub runs in the background (so as not to impact production I/O, this was not always the case and caused serious issues in the past with a pool being unresponsive due to a scrub) > > 2. a scrub essentially walks the zpool examining every transaction in order (as does a resilver) > > So the time to complete a scrub depends on how many write transactions since the pool was created (which is generally related to the amount of data but not always). You are limited by the random I/O capability of the disks involved. With VMs I assume this is a file server, so the I/O size will also affect performance. > > I haven't noticed any slowdowns in our virtual environments, so I guess that's a good thing it's so low priority that it doesn't impact workloads. > > Run the numbers? you are scanning 24.2TB at about 5.5MB/sec ? 4,613,734 seconds or 54 days. And that assumes the same rate for all of the scan. The rate will change as other I/O competes for resources. > > The number was fluctuating when I started the scrub, and I had seen it go as high as 35MB/s at one point. I am certain that our Hyper-V workload has increased since the last scrub, so this does make sense. > > Looks like you have a fair bit of activity going on (almost 1MB/sec of writes per spindle). > > As Richard correctly states below, this is the aggregate since boot (uptime ~56 days). 
I have another output from iostat as per his instructions below. > > Since this is storage for VMs, I assume this is the storage server for separate compute servers? Have you tuned the block size for the file share you are using? That can make a huge difference in performance. > > Both the Hyper-V and VMware LUNs are created with 64K block sizes. From what I've read of other performance and tuning articles, that is the optimal block size (I did some limited testing when first configuring the SAN, but results were somewhat inconclusive). Hyper-V hosts our testing environment (we integrate with TFS, a MS product, so we have no choice here) and probably make up the bulk of the workload (~300+ test VMs with various OSes). VMware hosts our production servers (Exchange, file servers, SQL, AD, etc - ~50+ VMs). > > I also noted that you only have a single LOG device. Best Practice is to mirror log devices so you do not lose any data in flight if hit by a power outage (of course, if this server has more UPS runtime that all the clients that may not matter). > > Actually, I do have a mirror ZIL device, it's just disabled at this time (my ZIL devices are ZeusRAMs). At some point, I was troubleshooting some kernel panics (turned out to be a faulty SSD on the rpool), and hadn't re-enabled it yet. Thanks for the reminder (and yes, we do have a UPS as well). > > And oops.. re-attaching the ZIL as a mirror triggered a resilver now, suspending or canceling the scrub? Will monitor this and restart the scrub if it doesn't by itself. > > pool: tank > state: ONLINE > status: One or more devices is currently being resilvered. The pool will > continue to function, possibly in a degraded state. > action: Wait for the resilver to complete. > scan: resilver in progress since Tue Jul 29 14:48:48 2014 > 3.89T scanned out of 24.5T at 3.06G/s, 1h55m to go > 0 resilvered, 15.84% done > > At least it's going very fast. EDIT: Now about 67% done as I finish writing this, speed dropping to ~1.3G/s. > > maybe, maybe not > > this is slower than most, surely slower than desired > > Unfortunately reattaching the mirror to my log device triggered a resilver. Not sure if this is desired behavior, but yes, 5.5MB/s seems quite slow. Hopefully after the resilver the scrub will progress where it left off. > > The estimate is often very wrong, especially for busy systems. > If this is an older ZFS implementation, this pool is likely getting pounded by the > ZFS write throttle. There are some tunings that can be applied, but the old write > throttle is not a stable control system, so it will always be a little bit unpredictable. > > The system is on r151008 (my BE states that I upgraded back in February, putting me in r151008j or so), with all the pools upgraded for the new enhancements as well as activating the new L2ARC compression feature. Reading the release notes, the ZFS write throttle enhancements were in since r151008e so I should be good there. > >> # iostat -xnze > > Unfortunately, this is the performance since boot and is not suitable for performance > analysis unless the system has been rebooted in the past 10 minutes or so. You'll need > to post the second batch from "iostat -zxCn 60 2" > > Ah yes, that was my mistake. 
Output from second count (before re-attaching log mirror): > > # iostat -zxCn 60 2 > > extended device statistics > r/s w/s kr/s kw/s wait actv wsvc_t asvc_t %w %b device > 255.7 1077.7 6294.0 41335.1 0.0 1.9 0.0 1.4 0 153 c1 > 5.3 23.9 118.5 811.9 0.0 0.0 0.0 1.1 0 3 c1t5000C50055F8723Bd0 > 5.9 14.5 110.0 834.3 0.0 0.0 0.0 1.3 0 2 c1t5000C50055E66B63d0 > 5.6 16.6 123.8 822.7 0.0 0.0 0.0 1.3 0 2 c1t5000C50055F87E73d0 > 4.7 27.8 118.6 796.6 0.0 0.0 0.0 1.3 0 3 c1t5000C50055F8BFA3d0 > 5.6 14.5 139.7 833.8 0.0 0.0 0.0 1.6 0 3 c1t5000C50055F9E123d0 > 4.4 27.1 112.3 825.2 0.0 0.0 0.0 0.8 0 2 c1t5000C50055F9F0B3d0 > 5.0 20.2 121.7 803.4 0.0 0.0 0.0 1.2 0 3 c1t5000C50055F9D3B3d0 > 5.4 26.4 137.0 857.3 0.0 0.0 0.0 1.4 0 4 c1t5000C50055E4FDE7d0 > 4.7 12.3 123.7 832.7 0.0 0.0 0.0 2.0 0 3 c1t5000C50055F9A607d0 > 5.0 23.9 125.9 830.9 0.0 0.0 0.0 1.3 0 3 c1t5000C50055F8CDA7d0 > 4.5 31.4 112.2 814.6 0.0 0.0 0.0 1.1 0 3 c1t5000C50055E65877d0 > 5.2 24.4 130.6 872.5 0.0 0.0 0.0 1.2 0 3 c1t5000C50055F9E7D7d0 > 4.1 21.8 103.7 797.2 0.0 0.0 0.0 1.1 0 3 c1t5000C50055FA0AF7d0 > 5.5 24.8 129.8 802.8 0.0 0.0 0.0 1.5 0 4 c1t5000C50055F9FE87d0 > 5.7 17.7 137.2 797.6 0.0 0.0 0.0 1.4 0 3 c1t5000C50055F9F91Bd0 > 6.0 30.6 139.1 852.0 0.0 0.1 0.0 1.5 0 4 c1t5000C50055F9FEABd0 > 6.1 34.1 137.8 929.2 0.0 0.1 0.0 1.9 0 6 c1t5000C50055F9F63Bd0 > 4.1 15.9 101.8 791.4 0.0 0.0 0.0 1.6 0 3 c1t5000C50055F9F3EBd0 > 6.4 23.2 155.2 878.6 0.0 0.0 0.0 1.1 0 3 c1t5000C50055F9F80Bd0 > 4.5 23.5 106.2 825.4 0.0 0.0 0.0 1.1 0 3 c1t5000C50055F9FB8Bd0 > 4.0 23.2 101.1 788.9 0.0 0.0 0.0 1.3 0 3 c1t5000C50055F9F92Bd0 > 4.4 11.3 125.7 782.3 0.0 0.0 0.0 1.9 0 3 c1t5000C50055F8905Fd0 > 4.6 20.4 129.2 823.0 0.0 0.0 0.0 1.5 0 3 c1t5000C50055F8D48Fd0 > 5.1 19.7 142.9 887.2 0.0 0.0 0.0 1.7 0 3 c1t5000C50055F9F89Fd0 > 5.6 11.4 129.1 776.0 0.0 0.0 0.0 1.9 0 3 c1t5000C50055F9EF2Fd0 > 5.6 23.7 137.4 811.9 0.0 0.0 0.0 1.2 0 3 c1t5000C50055F8C3ABd0 > 6.8 13.9 132.4 834.3 0.0 0.0 0.0 1.8 0 3 c1t5000C50055E66053d0 > 5.2 26.7 126.9 857.3 0.0 0.0 0.0 1.2 0 3 c1t5000C50055E66503d0 > 4.2 27.1 104.6 825.2 0.0 0.0 0.0 1.0 0 3 c1t5000C50055F9D3E3d0 > 5.2 30.7 140.9 852.0 0.0 0.1 0.0 1.5 0 4 c1t5000C50055F84FB7d0 > 5.4 16.1 124.3 791.4 0.0 0.0 0.0 1.7 0 3 c1t5000C50055F8E017d0 > 3.8 31.4 89.7 814.6 0.0 0.0 0.0 1.1 0 4 c1t5000C50055E579F7d0 > 4.6 27.5 116.0 796.6 0.0 0.1 0.0 1.6 0 4 c1t5000C50055E65807d0 > 4.0 21.5 99.7 797.2 0.0 0.0 0.0 1.1 0 3 c1t5000C50055F84A97d0 > 4.7 20.2 116.3 803.4 0.0 0.0 0.0 1.4 0 3 c1t5000C50055F87D97d0 > 5.0 11.5 121.5 776.0 0.0 0.0 0.0 2.0 0 3 c1t5000C50055F9F637d0 > 4.9 11.3 112.4 782.3 0.0 0.0 0.0 2.3 0 3 c1t5000C50055E65ABBd0 > 5.3 11.8 142.5 832.7 0.0 0.0 0.0 2.4 0 3 c1t5000C50055F8BF9Bd0 > 5.0 20.3 121.4 823.0 0.0 0.0 0.0 1.7 0 3 c1t5000C50055F8A22Bd0 > 6.6 24.3 170.3 872.5 0.0 0.0 0.0 1.3 0 3 c1t5000C50055F9379Bd0 > 5.8 16.3 121.7 822.7 0.0 0.0 0.0 1.3 0 2 c1t5000C50055E57A5Fd0 > 5.3 17.7 146.5 797.6 0.0 0.0 0.0 1.4 0 3 c1t5000C50055F8CCAFd0 > 5.7 34.1 141.5 929.2 0.0 0.1 0.0 1.7 0 5 c1t5000C50055F8B80Fd0 > 5.5 23.8 125.7 830.9 0.0 0.0 0.0 1.2 0 3 c1t5000C50055F9FA1Fd0 > 5.0 23.2 127.9 878.6 0.0 0.0 0.0 1.1 0 3 c1t5000C50055E65F0Fd0 > 5.2 14.0 163.7 833.8 0.0 0.0 0.0 2.0 0 3 c1t5000C50055F8BE3Fd0 > 4.6 18.9 122.8 887.2 0.0 0.0 0.0 1.6 0 3 c1t5000C50055F8B21Fd0 > 5.5 23.6 137.4 825.4 0.0 0.0 0.0 1.5 0 3 c1t5000C50055F8A46Fd0 > 4.9 24.6 116.7 802.8 0.0 0.0 0.0 1.4 0 4 c1t5000C50055F856CFd0 > 4.9 23.4 120.8 788.9 0.0 0.0 0.0 1.4 0 3 c1t5000C50055E6606Fd0 > 234.9 170.1 4079.9 11127.8 0.0 0.2 0.0 0.5 0 9 c2 > 119.0 28.9 2083.8 670.8 
0.0 0.0 0.0 0.3 0 3 c2t500117310015D579d0 > 115.9 27.4 1996.1 634.2 0.0 0.0 0.0 0.3 0 3 c2t50011731001631FDd0 > 0.0 113.8 0.0 9822.8 0.0 0.1 0.0 1.0 0 2 c2t5000A72A3007811Dd0 > 0.1 18.5 0.0 64.8 0.0 0.0 0.0 0.0 0 0 c4 > 0.1 9.2 0.0 32.4 0.0 0.0 0.0 0.0 0 0 c4t0d0 > 0.0 9.2 0.0 32.4 0.0 0.0 0.0 0.0 0 0 c4t1d0 > 229.8 58.1 3987.4 1308.0 0.0 0.1 0.0 0.3 0 6 c12 > 114.2 27.7 1994.8 626.0 0.0 0.0 0.0 0.3 0 3 c12t500117310015D59Ed0 > 115.5 30.4 1992.6 682.0 0.0 0.0 0.0 0.3 0 3 c12t500117310015D54Ed0 > 0.1 17.1 0.0 64.8 0.0 0.0 0.6 0.1 0 0 rpool > 720.3 1298.4 14361.2 53770.8 18.7 2.3 9.3 1.1 6 68 tank ok, so the pool is issuing 720 read iops, including resilver workload, vs 1298 write iops. There is plenty of I/O capacity left on the table here, as you can see by the %busy being so low. So I think the pool is not scheduling scrub I/Os very well. You can increase the number of scrub I/Os in the scheduler by adjusting the zfs_vdev_scrub_max_active tunable. The default is 2, but you'll have to consider that a share (in the stock market sense) where the active sync reads and writes are getting 10 each. You can try bumping up the value and see what happens over some time, perhaps 10 minutes or so -- too short of a time and you won't get a good feeling for the impact (try this in off-peak time). echo zfs_vdev_scrub_max_active/W0t5 | mdb -kw will change the value from 2 to 5, increasing its share of the total I/O workload. You can see the progress of scan (scrubs do scan) workload by looking at the ZFS debug messages. echo ::zfs_dbgmsg | mdb -k These will look mysterious... they are. But the interesting bits are about how many blocks are visited in some amount of time (txg sync interval). Ideally, this will change as you adjust zfs_vdev_scrub_max_active. -- richard > > Is 153% busy correct on c1? 
Seems to me that disks are quite "busy", but are handling the workload just fine (wait at 6% and asvc_t at 1.1ms) > > Interestingly, this is the same output now that the resilver is running: > > extended device statistics > r/s w/s kr/s kw/s wait actv wsvc_t asvc_t %w %b device > 2876.9 1041.1 25400.7 38189.1 0.0 37.9 0.0 9.7 0 2011 c1 > 60.8 26.1 540.1 845.2 0.0 0.7 0.0 8.3 0 39 c1t5000C50055F8723Bd0 > 58.4 14.2 511.6 740.7 0.0 0.7 0.0 10.1 0 39 c1t5000C50055E66B63d0 > 60.2 16.3 529.3 756.1 0.0 0.8 0.0 10.1 0 41 c1t5000C50055F87E73d0 > 57.5 24.9 527.6 841.7 0.0 0.7 0.0 9.0 0 40 c1t5000C50055F8BFA3d0 > 57.9 14.5 543.5 765.1 0.0 0.7 0.0 9.8 0 38 c1t5000C50055F9E123d0 > 57.9 23.9 516.6 806.9 0.0 0.8 0.0 9.3 0 40 c1t5000C50055F9F0B3d0 > 59.8 24.6 554.1 857.5 0.0 0.8 0.0 9.6 0 42 c1t5000C50055F9D3B3d0 > 56.5 21.0 480.4 715.7 0.0 0.7 0.0 8.9 0 37 c1t5000C50055E4FDE7d0 > 54.8 9.7 473.5 737.9 0.0 0.7 0.0 11.2 0 39 c1t5000C50055F9A607d0 > 55.8 20.2 457.3 708.7 0.0 0.7 0.0 9.9 0 40 c1t5000C50055F8CDA7d0 > 57.8 28.6 487.0 796.1 0.0 0.9 0.0 9.9 0 45 c1t5000C50055E65877d0 > 60.8 27.1 572.6 823.7 0.0 0.8 0.0 8.8 0 41 c1t5000C50055F9E7D7d0 > 55.8 21.1 478.2 766.6 0.0 0.7 0.0 9.7 0 40 c1t5000C50055FA0AF7d0 > 57.0 22.8 528.3 724.5 0.0 0.8 0.0 9.6 0 41 c1t5000C50055F9FE87d0 > 56.2 10.8 465.2 715.6 0.0 0.7 0.0 10.4 0 38 c1t5000C50055F9F91Bd0 > 59.2 29.4 524.6 740.9 0.0 0.8 0.0 8.9 0 41 c1t5000C50055F9FEABd0 > 57.3 30.7 496.7 788.3 0.0 0.8 0.0 9.1 0 42 c1t5000C50055F9F63Bd0 > 55.5 16.3 461.9 652.9 0.0 0.7 0.0 10.1 0 39 c1t5000C50055F9F3EBd0 > 57.2 22.1 495.1 701.1 0.0 0.8 0.0 9.8 0 41 c1t5000C50055F9F80Bd0 > 59.5 30.2 543.1 741.8 0.0 0.9 0.0 9.6 0 45 c1t5000C50055F9FB8Bd0 > 56.5 25.1 515.4 786.9 0.0 0.7 0.0 8.6 0 38 c1t5000C50055F9F92Bd0 > 61.8 12.5 540.6 790.9 0.0 0.8 0.0 10.3 0 41 c1t5000C50055F8905Fd0 > 57.0 19.8 521.0 774.3 0.0 0.7 0.0 9.6 0 39 c1t5000C50055F8D48Fd0 > 56.3 16.3 517.7 724.7 0.0 0.7 0.0 9.9 0 38 c1t5000C50055F9F89Fd0 > 57.0 13.4 504.5 790.5 0.0 0.8 0.0 10.7 0 40 c1t5000C50055F9EF2Fd0 > 55.0 26.1 477.6 845.2 0.0 0.7 0.0 8.3 0 36 c1t5000C50055F8C3ABd0 > 57.8 14.1 518.7 740.7 0.0 0.8 0.0 10.8 0 41 c1t5000C50055E66053d0 > 55.9 20.8 490.2 715.7 0.0 0.7 0.0 9.0 0 37 c1t5000C50055E66503d0 > 57.0 24.1 509.7 806.9 0.0 0.8 0.0 10.0 0 41 c1t5000C50055F9D3E3d0 > 59.1 29.2 504.1 740.9 0.0 0.8 0.0 9.3 0 44 c1t5000C50055F84FB7d0 > 54.4 16.3 449.5 652.9 0.0 0.7 0.0 10.4 0 39 c1t5000C50055F8E017d0 > 57.8 28.4 503.3 796.1 0.0 0.9 0.0 10.1 0 45 c1t5000C50055E579F7d0 > 58.2 24.9 502.0 841.7 0.0 0.8 0.0 9.2 0 40 c1t5000C50055E65807d0 > 58.2 20.7 513.4 766.6 0.0 0.8 0.0 9.8 0 41 c1t5000C50055F84A97d0 > 56.5 24.9 508.0 857.5 0.0 0.8 0.0 9.2 0 40 c1t5000C50055F87D97d0 > 53.4 13.5 449.9 790.5 0.0 0.7 0.0 10.7 0 38 c1t5000C50055F9F637d0 > 57.0 11.8 503.0 790.9 0.0 0.7 0.0 10.6 0 39 c1t5000C50055E65ABBd0 > 55.4 9.6 461.1 737.9 0.0 0.8 0.0 11.6 0 40 c1t5000C50055F8BF9Bd0 > 55.7 19.7 484.6 774.3 0.0 0.7 0.0 9.9 0 40 c1t5000C50055F8A22Bd0 > 57.6 27.1 518.2 823.7 0.0 0.8 0.0 8.9 0 40 c1t5000C50055F9379Bd0 > 59.6 17.0 528.0 756.1 0.0 0.8 0.0 10.1 0 41 c1t5000C50055E57A5Fd0 > 61.2 10.8 530.0 715.6 0.0 0.8 0.0 10.7 0 40 c1t5000C50055F8CCAFd0 > 58.0 30.8 493.3 788.3 0.0 0.8 0.0 9.4 0 43 c1t5000C50055F8B80Fd0 > 56.5 19.9 490.7 708.7 0.0 0.8 0.0 10.0 0 40 c1t5000C50055F9FA1Fd0 > 56.1 22.4 484.2 701.1 0.0 0.7 0.0 9.5 0 39 c1t5000C50055E65F0Fd0 > 59.2 14.6 560.9 765.1 0.0 0.7 0.0 9.8 0 39 c1t5000C50055F8BE3Fd0 > 57.9 16.2 546.0 724.7 0.0 0.7 0.0 10.1 0 40 c1t5000C50055F8B21Fd0 > 59.5 30.0 553.2 741.8 0.0 0.9 0.0 9.8 0 45 
c1t5000C50055F8A46Fd0 > 57.4 22.5 504.0 724.5 0.0 0.8 0.0 9.6 0 41 c1t5000C50055F856CFd0 > 58.4 24.6 531.4 786.9 0.0 0.7 0.0 8.4 0 38 c1t5000C50055E6606Fd0 > 511.0 161.4 7572.1 11260.1 0.0 0.3 0.0 0.4 0 14 c2 > 252.3 20.1 3776.3 458.9 0.0 0.1 0.0 0.2 0 6 c2t500117310015D579d0 > 258.8 18.0 3795.7 350.0 0.0 0.1 0.0 0.2 0 6 c2t50011731001631FDd0 > 0.0 123.4 0.0 10451.1 0.0 0.1 0.0 1.0 0 3 c2t5000A72A3007811Dd0 > 0.2 16.1 1.9 56.7 0.0 0.0 0.0 0.0 0 0 c4 > 0.2 8.1 1.6 28.3 0.0 0.0 0.0 0.0 0 0 c4t0d0 > 0.0 8.1 0.3 28.3 0.0 0.0 0.0 0.0 0 0 c4t1d0 > 495.6 163.6 7168.9 11290.3 0.0 0.2 0.0 0.4 0 14 c12 > 0.0 123.4 0.0 10451.1 0.0 0.1 0.0 1.0 0 3 c12t5000A72B300780FFd0 > 248.2 18.1 3645.8 323.0 0.0 0.1 0.0 0.2 0 5 c12t500117310015D59Ed0 > 247.4 22.1 3523.1 516.2 0.0 0.1 0.0 0.2 0 6 c12t500117310015D54Ed0 > 0.2 14.8 1.9 56.7 0.0 0.0 0.6 0.1 0 0 rpool > 3883.5 1357.7 40141.6 60739.5 22.8 38.6 4.4 7.4 54 100 tank > > It is very busy with alot of wait % and higher asvc_t (2011% busy on c1?!). I'm assuming resilvers are alot more aggressive than scrubs. > > There are many variables here, the biggest of which is the current non-scrub load. > > I might have lost 2 weeks of scrub time, depending on whether the scrub will resume where it left off. I'll update when I can. > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From richard.elling at richardelling.com Thu Jul 31 16:06:29 2014 From: richard.elling at richardelling.com (Richard Elling) Date: Thu, 31 Jul 2014 09:06:29 -0700 Subject: [OmniOS-discuss] Slow scrub performance In-Reply-To: References: Message-ID: <548A8D26-CE2B-4C3B-BBB8-661F6D7C8B49@richardelling.com> correction below... On Jul 30, 2014, at 10:37 PM, Richard Elling wrote: > apologies for the long post, data for big systems tends to do that, comments below... > > On Jul 30, 2014, at 9:10 PM, wuffers wrote: > >> So as I suspected, I lost 2 weeks of scrub time after the resilver. 
I started a scrub again, and it's going extremely slow (~13x slower than before): >> >> pool: tank >> state: ONLINE >> scan: scrub in progress since Tue Jul 29 15:41:27 2014 >> 45.4G scanned out of 24.5T at 413K/s, (scan is slow, no estimated time) >> 0 repaired, 0.18% done >> >> # iostat -zxCn 60 2 (2nd batch output) >> >> extended device statistics >> r/s w/s kr/s kw/s wait actv wsvc_t asvc_t %w %b device >> 143.7 1321.5 5149.0 46223.4 0.0 1.5 0.0 1.0 0 120 c1 >> 2.4 33.3 72.0 897.5 0.0 0.0 0.0 0.6 0 2 c1t5000C50055F8723Bd0 >> 2.7 22.8 82.9 1005.4 0.0 0.0 0.0 0.9 0 2 c1t5000C50055E66B63d0 >> 2.2 24.4 73.1 917.7 0.0 0.0 0.0 0.7 0 2 c1t5000C50055F87E73d0 >> 3.1 26.2 120.9 899.8 0.0 0.0 0.0 0.8 0 2 c1t5000C50055F8BFA3d0 >> 2.8 16.5 105.9 941.6 0.0 0.0 0.0 1.0 0 2 c1t5000C50055F9E123d0 >> 2.5 25.6 86.6 897.9 0.0 0.0 0.0 0.7 0 2 c1t5000C50055F9F0B3d0 >> 2.3 19.9 85.3 967.8 0.0 0.0 0.0 1.2 0 2 c1t5000C50055F9D3B3d0 >> 3.1 38.3 120.7 1053.1 0.0 0.0 0.0 0.8 0 3 c1t5000C50055E4FDE7d0 >> 2.6 12.7 81.8 854.3 0.0 0.0 0.0 1.6 0 2 c1t5000C50055F9A607d0 >> 3.2 25.0 121.7 871.7 0.0 0.0 0.0 0.8 0 2 c1t5000C50055F8CDA7d0 >> 2.5 30.6 93.0 941.2 0.0 0.0 0.0 0.9 0 2 c1t5000C50055E65877d0 >> 3.1 43.7 101.4 1004.2 0.0 0.0 0.0 1.0 0 4 c1t5000C50055F9E7D7d0 >> 2.3 24.0 92.2 965.8 0.0 0.0 0.0 0.9 0 2 c1t5000C50055FA0AF7d0 >> 2.5 25.3 99.2 872.9 0.0 0.0 0.0 0.8 0 2 c1t5000C50055F9FE87d0 >> 2.9 19.0 116.1 894.8 0.0 0.0 0.0 1.2 0 2 c1t5000C50055F9F91Bd0 >> 2.6 38.9 96.1 915.4 0.0 0.1 0.0 1.2 0 4 c1t5000C50055F9FEABd0 >> 3.2 45.6 135.7 973.5 0.0 0.1 0.0 1.5 0 5 c1t5000C50055F9F63Bd0 >> 3.1 21.2 105.9 966.6 0.0 0.0 0.0 1.0 0 2 c1t5000C50055F9F3EBd0 >> 2.8 26.7 122.0 781.6 0.0 0.0 0.0 0.7 0 2 c1t5000C50055F9F80Bd0 >> 3.1 31.6 119.9 932.5 0.0 0.0 0.0 1.1 0 3 c1t5000C50055F9FB8Bd0 >> 3.1 32.5 123.3 924.1 0.0 0.0 0.0 0.9 0 3 c1t5000C50055F9F92Bd0 >> 2.9 17.0 113.8 952.0 0.0 0.0 0.0 1.2 0 2 c1t5000C50055F8905Fd0 >> 3.0 23.4 111.0 871.1 0.0 0.0 0.0 1.5 0 2 c1t5000C50055F8D48Fd0 >> 2.8 21.4 105.5 858.0 0.0 0.0 0.0 1.0 0 2 c1t5000C50055F9F89Fd0 >> 3.5 16.4 87.1 941.3 0.0 0.0 0.0 1.4 0 2 c1t5000C50055F9EF2Fd0 >> 2.1 33.8 64.5 897.5 0.0 0.0 0.0 0.5 0 2 c1t5000C50055F8C3ABd0 >> 3.0 21.8 72.3 1005.4 0.0 0.0 0.0 1.0 0 2 c1t5000C50055E66053d0 >> 3.0 37.8 106.9 1053.5 0.0 0.0 0.0 0.9 0 3 c1t5000C50055E66503d0 >> 2.7 26.0 107.7 897.9 0.0 0.0 0.0 0.7 0 2 c1t5000C50055F9D3E3d0 >> 2.2 38.9 96.4 918.7 0.0 0.0 0.0 0.9 0 4 c1t5000C50055F84FB7d0 >> 2.8 21.4 111.1 953.6 0.0 0.0 0.0 0.7 0 1 c1t5000C50055F8E017d0 >> 3.0 30.6 104.3 940.9 0.0 0.1 0.0 1.5 0 3 c1t5000C50055E579F7d0 >> 2.8 26.4 90.9 901.1 0.0 0.0 0.0 0.9 0 2 c1t5000C50055E65807d0 >> 2.4 24.0 96.7 965.8 0.0 0.0 0.0 0.9 0 2 c1t5000C50055F84A97d0 >> 2.9 19.8 109.4 967.8 0.0 0.0 0.0 1.1 0 2 c1t5000C50055F87D97d0 >> 3.8 16.1 106.4 943.1 0.0 0.0 0.0 1.3 0 2 c1t5000C50055F9F637d0 >> 2.2 17.1 72.7 966.6 0.0 0.0 0.0 1.4 0 2 c1t5000C50055E65ABBd0 >> 2.7 12.7 86.0 863.3 0.0 0.0 0.0 1.5 0 2 c1t5000C50055F8BF9Bd0 >> 2.7 23.2 101.8 871.1 0.0 0.0 0.0 1.0 0 2 c1t5000C50055F8A22Bd0 >> 4.5 43.6 134.7 1004.2 0.0 0.0 0.0 1.0 0 4 c1t5000C50055F9379Bd0 >> 2.8 24.0 87.9 917.7 0.0 0.0 0.0 0.8 0 2 c1t5000C50055E57A5Fd0 >> 2.9 18.8 119.0 894.3 0.0 0.0 0.0 1.1 0 2 c1t5000C50055F8CCAFd0 >> 3.4 45.7 128.1 976.8 0.0 0.1 0.0 1.2 0 5 c1t5000C50055F8B80Fd0 >> 2.7 24.9 100.2 871.7 0.0 0.0 0.0 0.8 0 2 c1t5000C50055F9FA1Fd0 >> 4.8 26.8 128.6 781.6 0.0 0.0 0.0 0.7 0 2 c1t5000C50055E65F0Fd0 >> 2.7 16.3 109.5 941.6 0.0 0.0 0.0 1.1 0 2 c1t5000C50055F8BE3Fd0 >> 3.1 21.1 119.9 858.0 0.0 0.0 0.0 1.1 0 2 
c1t5000C50055F8B21Fd0 >> 2.8 31.8 108.5 932.5 0.0 0.0 0.0 1.0 0 3 c1t5000C50055F8A46Fd0 >> 2.4 25.3 87.4 872.9 0.0 0.0 0.0 0.8 0 2 c1t5000C50055F856CFd0 >> 3.3 32.0 125.2 924.1 0.0 0.0 0.0 1.2 0 3 c1t5000C50055E6606Fd0 >> 289.9 169.0 3905.0 12754.1 0.0 0.2 0.0 0.4 0 10 c2 >> 146.6 14.1 1987.9 305.2 0.0 0.0 0.0 0.2 0 4 c2t500117310015D579d0 >> 143.4 10.6 1917.1 205.2 0.0 0.0 0.0 0.2 0 3 c2t50011731001631FDd0 >> 0.0 144.3 0.0 12243.7 0.0 0.1 0.0 0.9 0 3 c2t5000A72A3007811Dd0 >> 0.0 14.6 0.0 75.8 0.0 0.0 0.0 0.1 0 0 c4 >> 0.0 7.3 0.0 37.9 0.0 0.0 0.0 0.1 0 0 c4t0d0 >> 0.0 7.3 0.0 37.9 0.0 0.0 0.0 0.1 0 0 c4t1d0 >> 284.8 171.5 3792.8 12786.2 0.0 0.2 0.0 0.4 0 10 c12 >> 0.0 144.3 0.0 12243.7 0.0 0.1 0.0 0.9 0 3 c12t5000A72B300780FFd0 >> 152.3 13.3 2004.6 255.9 0.0 0.0 0.0 0.2 0 4 c12t500117310015D59Ed0 >> 132.5 13.9 1788.2 286.6 0.0 0.0 0.0 0.2 0 3 c12t500117310015D54Ed0 >> 0.0 13.5 0.0 75.8 0.0 0.0 0.8 0.1 0 0 rpool >> 718.4 1653.5 12846.8 71761.5 34.0 2.0 14.3 0.8 7 51 tank >> >> This doesn't seem any busier than my earlier output (6% wait, 68% busy, asvc_t 1.1ms) and the dev team confirms that their workload hasn't changed for the past few days. If my math is right.. this will take ~719 days to complete. > > The %busy for controllers is a sum of the %busy for all disks on the controller, so > is can be large, but overall isn't interesting. With HDDs, there is no way you can > saturate the controller, so we don't really care what the %busy shows. > > The more important item is that the number of read ops is fairly low for all but 4 disks. > Since you didn't post the pool configuration, we can only guess that they might be a > souce of the bottleneck. the above paragraph missed the editor's cut. You did post the pool config, thanks! -- richard > > You're seeing a lot of reads from the cache devices. How much RAM does this system > have? > >> >> Anything I can tune to help speed this up? > > methinks the scrub I/Os are getting starved and since they are low priority, they > could get very starved. In general, I wouldn't worry about it, but I understand > why you might be nervous. Keep in mind that in ZFS scrubs are intended to find > errors on idle data, not frequently accessed data. > > more far below... > >> >> On Tue, Jul 29, 2014 at 3:29 PM, wuffers wrote: >> Going to try to answer both responses in one message.. >> >> Short answer, yes. ? Keep in mind that >> >> 1. a scrub runs in the background (so as not to impact production I/O, this was not always the case and caused serious issues in the past with a pool being unresponsive due to a scrub) >> >> 2. a scrub essentially walks the zpool examining every transaction in order (as does a resilver) >> >> So the time to complete a scrub depends on how many write transactions since the pool was created (which is generally related to the amount of data but not always). You are limited by the random I/O capability of the disks involved. With VMs I assume this is a file server, so the I/O size will also affect performance. >> >> I haven't noticed any slowdowns in our virtual environments, so I guess that's a good thing it's so low priority that it doesn't impact workloads. >> >> Run the numbers? you are scanning 24.2TB at about 5.5MB/sec ? 4,613,734 seconds or 54 days. And that assumes the same rate for all of the scan. The rate will change as other I/O competes for resources. >> >> The number was fluctuating when I started the scrub, and I had seen it go as high as 35MB/s at one point. 
I am certain that our Hyper-V workload has increased since the last scrub, so this does make sense. >> >> Looks like you have a fair bit of activity going on (almost 1MB/sec of writes per spindle). >> >> As Richard correctly states below, this is the aggregate since boot (uptime ~56 days). I have another output from iostat as per his instructions below. >> >> Since this is storage for VMs, I assume this is the storage server for separate compute servers? Have you tuned the block size for the file share you are using? That can make a huge difference in performance. >> >> Both the Hyper-V and VMware LUNs are created with 64K block sizes. From what I've read of other performance and tuning articles, that is the optimal block size (I did some limited testing when first configuring the SAN, but results were somewhat inconclusive). Hyper-V hosts our testing environment (we integrate with TFS, a MS product, so we have no choice here) and probably make up the bulk of the workload (~300+ test VMs with various OSes). VMware hosts our production servers (Exchange, file servers, SQL, AD, etc - ~50+ VMs). >> >> I also noted that you only have a single LOG device. Best Practice is to mirror log devices so you do not lose any data in flight if hit by a power outage (of course, if this server has more UPS runtime that all the clients that may not matter). >> >> Actually, I do have a mirror ZIL device, it's just disabled at this time (my ZIL devices are ZeusRAMs). At some point, I was troubleshooting some kernel panics (turned out to be a faulty SSD on the rpool), and hadn't re-enabled it yet. Thanks for the reminder (and yes, we do have a UPS as well). >> >> And oops.. re-attaching the ZIL as a mirror triggered a resilver now, suspending or canceling the scrub? Will monitor this and restart the scrub if it doesn't by itself. >> >> pool: tank >> state: ONLINE >> status: One or more devices is currently being resilvered. The pool will >> continue to function, possibly in a degraded state. >> action: Wait for the resilver to complete. >> scan: resilver in progress since Tue Jul 29 14:48:48 2014 >> 3.89T scanned out of 24.5T at 3.06G/s, 1h55m to go >> 0 resilvered, 15.84% done >> >> At least it's going very fast. EDIT: Now about 67% done as I finish writing this, speed dropping to ~1.3G/s. >> >> maybe, maybe not >> >> this is slower than most, surely slower than desired >> >> Unfortunately reattaching the mirror to my log device triggered a resilver. Not sure if this is desired behavior, but yes, 5.5MB/s seems quite slow. Hopefully after the resilver the scrub will progress where it left off. >> >> The estimate is often very wrong, especially for busy systems. >> If this is an older ZFS implementation, this pool is likely getting pounded by the >> ZFS write throttle. There are some tunings that can be applied, but the old write >> throttle is not a stable control system, so it will always be a little bit unpredictable. >> >> The system is on r151008 (my BE states that I upgraded back in February, putting me in r151008j or so), with all the pools upgraded for the new enhancements as well as activating the new L2ARC compression feature. Reading the release notes, the ZFS write throttle enhancements were in since r151008e so I should be good there. >> >>> # iostat -xnze >> >> Unfortunately, this is the performance since boot and is not suitable for performance >> analysis unless the system has been rebooted in the past 10 minutes or so. 
You'll need >> to post the second batch from "iostat -zxCn 60 2" >> >> Ah yes, that was my mistake. Output from second count (before re-attaching log mirror): >> >> # iostat -zxCn 60 2 >> >> extended device statistics >> r/s w/s kr/s kw/s wait actv wsvc_t asvc_t %w %b device >> 255.7 1077.7 6294.0 41335.1 0.0 1.9 0.0 1.4 0 153 c1 >> 5.3 23.9 118.5 811.9 0.0 0.0 0.0 1.1 0 3 c1t5000C50055F8723Bd0 >> 5.9 14.5 110.0 834.3 0.0 0.0 0.0 1.3 0 2 c1t5000C50055E66B63d0 >> 5.6 16.6 123.8 822.7 0.0 0.0 0.0 1.3 0 2 c1t5000C50055F87E73d0 >> 4.7 27.8 118.6 796.6 0.0 0.0 0.0 1.3 0 3 c1t5000C50055F8BFA3d0 >> 5.6 14.5 139.7 833.8 0.0 0.0 0.0 1.6 0 3 c1t5000C50055F9E123d0 >> 4.4 27.1 112.3 825.2 0.0 0.0 0.0 0.8 0 2 c1t5000C50055F9F0B3d0 >> 5.0 20.2 121.7 803.4 0.0 0.0 0.0 1.2 0 3 c1t5000C50055F9D3B3d0 >> 5.4 26.4 137.0 857.3 0.0 0.0 0.0 1.4 0 4 c1t5000C50055E4FDE7d0 >> 4.7 12.3 123.7 832.7 0.0 0.0 0.0 2.0 0 3 c1t5000C50055F9A607d0 >> 5.0 23.9 125.9 830.9 0.0 0.0 0.0 1.3 0 3 c1t5000C50055F8CDA7d0 >> 4.5 31.4 112.2 814.6 0.0 0.0 0.0 1.1 0 3 c1t5000C50055E65877d0 >> 5.2 24.4 130.6 872.5 0.0 0.0 0.0 1.2 0 3 c1t5000C50055F9E7D7d0 >> 4.1 21.8 103.7 797.2 0.0 0.0 0.0 1.1 0 3 c1t5000C50055FA0AF7d0 >> 5.5 24.8 129.8 802.8 0.0 0.0 0.0 1.5 0 4 c1t5000C50055F9FE87d0 >> 5.7 17.7 137.2 797.6 0.0 0.0 0.0 1.4 0 3 c1t5000C50055F9F91Bd0 >> 6.0 30.6 139.1 852.0 0.0 0.1 0.0 1.5 0 4 c1t5000C50055F9FEABd0 >> 6.1 34.1 137.8 929.2 0.0 0.1 0.0 1.9 0 6 c1t5000C50055F9F63Bd0 >> 4.1 15.9 101.8 791.4 0.0 0.0 0.0 1.6 0 3 c1t5000C50055F9F3EBd0 >> 6.4 23.2 155.2 878.6 0.0 0.0 0.0 1.1 0 3 c1t5000C50055F9F80Bd0 >> 4.5 23.5 106.2 825.4 0.0 0.0 0.0 1.1 0 3 c1t5000C50055F9FB8Bd0 >> 4.0 23.2 101.1 788.9 0.0 0.0 0.0 1.3 0 3 c1t5000C50055F9F92Bd0 >> 4.4 11.3 125.7 782.3 0.0 0.0 0.0 1.9 0 3 c1t5000C50055F8905Fd0 >> 4.6 20.4 129.2 823.0 0.0 0.0 0.0 1.5 0 3 c1t5000C50055F8D48Fd0 >> 5.1 19.7 142.9 887.2 0.0 0.0 0.0 1.7 0 3 c1t5000C50055F9F89Fd0 >> 5.6 11.4 129.1 776.0 0.0 0.0 0.0 1.9 0 3 c1t5000C50055F9EF2Fd0 >> 5.6 23.7 137.4 811.9 0.0 0.0 0.0 1.2 0 3 c1t5000C50055F8C3ABd0 >> 6.8 13.9 132.4 834.3 0.0 0.0 0.0 1.8 0 3 c1t5000C50055E66053d0 >> 5.2 26.7 126.9 857.3 0.0 0.0 0.0 1.2 0 3 c1t5000C50055E66503d0 >> 4.2 27.1 104.6 825.2 0.0 0.0 0.0 1.0 0 3 c1t5000C50055F9D3E3d0 >> 5.2 30.7 140.9 852.0 0.0 0.1 0.0 1.5 0 4 c1t5000C50055F84FB7d0 >> 5.4 16.1 124.3 791.4 0.0 0.0 0.0 1.7 0 3 c1t5000C50055F8E017d0 >> 3.8 31.4 89.7 814.6 0.0 0.0 0.0 1.1 0 4 c1t5000C50055E579F7d0 >> 4.6 27.5 116.0 796.6 0.0 0.1 0.0 1.6 0 4 c1t5000C50055E65807d0 >> 4.0 21.5 99.7 797.2 0.0 0.0 0.0 1.1 0 3 c1t5000C50055F84A97d0 >> 4.7 20.2 116.3 803.4 0.0 0.0 0.0 1.4 0 3 c1t5000C50055F87D97d0 >> 5.0 11.5 121.5 776.0 0.0 0.0 0.0 2.0 0 3 c1t5000C50055F9F637d0 >> 4.9 11.3 112.4 782.3 0.0 0.0 0.0 2.3 0 3 c1t5000C50055E65ABBd0 >> 5.3 11.8 142.5 832.7 0.0 0.0 0.0 2.4 0 3 c1t5000C50055F8BF9Bd0 >> 5.0 20.3 121.4 823.0 0.0 0.0 0.0 1.7 0 3 c1t5000C50055F8A22Bd0 >> 6.6 24.3 170.3 872.5 0.0 0.0 0.0 1.3 0 3 c1t5000C50055F9379Bd0 >> 5.8 16.3 121.7 822.7 0.0 0.0 0.0 1.3 0 2 c1t5000C50055E57A5Fd0 >> 5.3 17.7 146.5 797.6 0.0 0.0 0.0 1.4 0 3 c1t5000C50055F8CCAFd0 >> 5.7 34.1 141.5 929.2 0.0 0.1 0.0 1.7 0 5 c1t5000C50055F8B80Fd0 >> 5.5 23.8 125.7 830.9 0.0 0.0 0.0 1.2 0 3 c1t5000C50055F9FA1Fd0 >> 5.0 23.2 127.9 878.6 0.0 0.0 0.0 1.1 0 3 c1t5000C50055E65F0Fd0 >> 5.2 14.0 163.7 833.8 0.0 0.0 0.0 2.0 0 3 c1t5000C50055F8BE3Fd0 >> 4.6 18.9 122.8 887.2 0.0 0.0 0.0 1.6 0 3 c1t5000C50055F8B21Fd0 >> 5.5 23.6 137.4 825.4 0.0 0.0 0.0 1.5 0 3 c1t5000C50055F8A46Fd0 >> 4.9 24.6 116.7 802.8 0.0 0.0 0.0 1.4 0 4 
c1t5000C50055F856CFd0 >> 4.9 23.4 120.8 788.9 0.0 0.0 0.0 1.4 0 3 c1t5000C50055E6606Fd0 >> 234.9 170.1 4079.9 11127.8 0.0 0.2 0.0 0.5 0 9 c2 >> 119.0 28.9 2083.8 670.8 0.0 0.0 0.0 0.3 0 3 c2t500117310015D579d0 >> 115.9 27.4 1996.1 634.2 0.0 0.0 0.0 0.3 0 3 c2t50011731001631FDd0 >> 0.0 113.8 0.0 9822.8 0.0 0.1 0.0 1.0 0 2 c2t5000A72A3007811Dd0 >> 0.1 18.5 0.0 64.8 0.0 0.0 0.0 0.0 0 0 c4 >> 0.1 9.2 0.0 32.4 0.0 0.0 0.0 0.0 0 0 c4t0d0 >> 0.0 9.2 0.0 32.4 0.0 0.0 0.0 0.0 0 0 c4t1d0 >> 229.8 58.1 3987.4 1308.0 0.0 0.1 0.0 0.3 0 6 c12 >> 114.2 27.7 1994.8 626.0 0.0 0.0 0.0 0.3 0 3 c12t500117310015D59Ed0 >> 115.5 30.4 1992.6 682.0 0.0 0.0 0.0 0.3 0 3 c12t500117310015D54Ed0 >> 0.1 17.1 0.0 64.8 0.0 0.0 0.6 0.1 0 0 rpool >> 720.3 1298.4 14361.2 53770.8 18.7 2.3 9.3 1.1 6 68 tank > > ok, so the pool is issuing 720 read iops, including resilver workload, vs 1298 write iops. > There is plenty of I/O capacity left on the table here, as you can see by the %busy being > so low. > > So I think the pool is not scheduling scrub I/Os very well. You can increase the number of > scrub I/Os in the scheduler by adjusting the zfs_vdev_scrub_max_active tunable. The > default is 2, but you'll have to consider that a share (in the stock market sense) where > the active sync reads and writes are getting 10 each. You can try bumping up the value > and see what happens over some time, perhaps 10 minutes or so -- too short of a time > and you won't get a good feeling for the impact (try this in off-peak time). > echo zfs_vdev_scrub_max_active/W0t5 | mdb -kw > will change the value from 2 to 5, increasing its share of the total I/O workload. > > You can see the progress of scan (scrubs do scan) workload by looking at the ZFS > debug messages. > echo ::zfs_dbgmsg | mdb -k > These will look mysterious... they are. But the interesting bits are about how many blocks > are visited in some amount of time (txg sync interval). Ideally, this will change as you > adjust zfs_vdev_scrub_max_active. > -- richard > >> >> Is 153% busy correct on c1? 
Seems to me that disks are quite "busy", but are handling the workload just fine (wait at 6% and asvc_t at 1.1ms) >> >> Interestingly, this is the same output now that the resilver is running: >> >> extended device statistics >> r/s w/s kr/s kw/s wait actv wsvc_t asvc_t %w %b device >> 2876.9 1041.1 25400.7 38189.1 0.0 37.9 0.0 9.7 0 2011 c1 >> 60.8 26.1 540.1 845.2 0.0 0.7 0.0 8.3 0 39 c1t5000C50055F8723Bd0 >> 58.4 14.2 511.6 740.7 0.0 0.7 0.0 10.1 0 39 c1t5000C50055E66B63d0 >> 60.2 16.3 529.3 756.1 0.0 0.8 0.0 10.1 0 41 c1t5000C50055F87E73d0 >> 57.5 24.9 527.6 841.7 0.0 0.7 0.0 9.0 0 40 c1t5000C50055F8BFA3d0 >> 57.9 14.5 543.5 765.1 0.0 0.7 0.0 9.8 0 38 c1t5000C50055F9E123d0 >> 57.9 23.9 516.6 806.9 0.0 0.8 0.0 9.3 0 40 c1t5000C50055F9F0B3d0 >> 59.8 24.6 554.1 857.5 0.0 0.8 0.0 9.6 0 42 c1t5000C50055F9D3B3d0 >> 56.5 21.0 480.4 715.7 0.0 0.7 0.0 8.9 0 37 c1t5000C50055E4FDE7d0 >> 54.8 9.7 473.5 737.9 0.0 0.7 0.0 11.2 0 39 c1t5000C50055F9A607d0 >> 55.8 20.2 457.3 708.7 0.0 0.7 0.0 9.9 0 40 c1t5000C50055F8CDA7d0 >> 57.8 28.6 487.0 796.1 0.0 0.9 0.0 9.9 0 45 c1t5000C50055E65877d0 >> 60.8 27.1 572.6 823.7 0.0 0.8 0.0 8.8 0 41 c1t5000C50055F9E7D7d0 >> 55.8 21.1 478.2 766.6 0.0 0.7 0.0 9.7 0 40 c1t5000C50055FA0AF7d0 >> 57.0 22.8 528.3 724.5 0.0 0.8 0.0 9.6 0 41 c1t5000C50055F9FE87d0 >> 56.2 10.8 465.2 715.6 0.0 0.7 0.0 10.4 0 38 c1t5000C50055F9F91Bd0 >> 59.2 29.4 524.6 740.9 0.0 0.8 0.0 8.9 0 41 c1t5000C50055F9FEABd0 >> 57.3 30.7 496.7 788.3 0.0 0.8 0.0 9.1 0 42 c1t5000C50055F9F63Bd0 >> 55.5 16.3 461.9 652.9 0.0 0.7 0.0 10.1 0 39 c1t5000C50055F9F3EBd0 >> 57.2 22.1 495.1 701.1 0.0 0.8 0.0 9.8 0 41 c1t5000C50055F9F80Bd0 >> 59.5 30.2 543.1 741.8 0.0 0.9 0.0 9.6 0 45 c1t5000C50055F9FB8Bd0 >> 56.5 25.1 515.4 786.9 0.0 0.7 0.0 8.6 0 38 c1t5000C50055F9F92Bd0 >> 61.8 12.5 540.6 790.9 0.0 0.8 0.0 10.3 0 41 c1t5000C50055F8905Fd0 >> 57.0 19.8 521.0 774.3 0.0 0.7 0.0 9.6 0 39 c1t5000C50055F8D48Fd0 >> 56.3 16.3 517.7 724.7 0.0 0.7 0.0 9.9 0 38 c1t5000C50055F9F89Fd0 >> 57.0 13.4 504.5 790.5 0.0 0.8 0.0 10.7 0 40 c1t5000C50055F9EF2Fd0 >> 55.0 26.1 477.6 845.2 0.0 0.7 0.0 8.3 0 36 c1t5000C50055F8C3ABd0 >> 57.8 14.1 518.7 740.7 0.0 0.8 0.0 10.8 0 41 c1t5000C50055E66053d0 >> 55.9 20.8 490.2 715.7 0.0 0.7 0.0 9.0 0 37 c1t5000C50055E66503d0 >> 57.0 24.1 509.7 806.9 0.0 0.8 0.0 10.0 0 41 c1t5000C50055F9D3E3d0 >> 59.1 29.2 504.1 740.9 0.0 0.8 0.0 9.3 0 44 c1t5000C50055F84FB7d0 >> 54.4 16.3 449.5 652.9 0.0 0.7 0.0 10.4 0 39 c1t5000C50055F8E017d0 >> 57.8 28.4 503.3 796.1 0.0 0.9 0.0 10.1 0 45 c1t5000C50055E579F7d0 >> 58.2 24.9 502.0 841.7 0.0 0.8 0.0 9.2 0 40 c1t5000C50055E65807d0 >> 58.2 20.7 513.4 766.6 0.0 0.8 0.0 9.8 0 41 c1t5000C50055F84A97d0 >> 56.5 24.9 508.0 857.5 0.0 0.8 0.0 9.2 0 40 c1t5000C50055F87D97d0 >> 53.4 13.5 449.9 790.5 0.0 0.7 0.0 10.7 0 38 c1t5000C50055F9F637d0 >> 57.0 11.8 503.0 790.9 0.0 0.7 0.0 10.6 0 39 c1t5000C50055E65ABBd0 >> 55.4 9.6 461.1 737.9 0.0 0.8 0.0 11.6 0 40 c1t5000C50055F8BF9Bd0 >> 55.7 19.7 484.6 774.3 0.0 0.7 0.0 9.9 0 40 c1t5000C50055F8A22Bd0 >> 57.6 27.1 518.2 823.7 0.0 0.8 0.0 8.9 0 40 c1t5000C50055F9379Bd0 >> 59.6 17.0 528.0 756.1 0.0 0.8 0.0 10.1 0 41 c1t5000C50055E57A5Fd0 >> 61.2 10.8 530.0 715.6 0.0 0.8 0.0 10.7 0 40 c1t5000C50055F8CCAFd0 >> 58.0 30.8 493.3 788.3 0.0 0.8 0.0 9.4 0 43 c1t5000C50055F8B80Fd0 >> 56.5 19.9 490.7 708.7 0.0 0.8 0.0 10.0 0 40 c1t5000C50055F9FA1Fd0 >> 56.1 22.4 484.2 701.1 0.0 0.7 0.0 9.5 0 39 c1t5000C50055E65F0Fd0 >> 59.2 14.6 560.9 765.1 0.0 0.7 0.0 9.8 0 39 c1t5000C50055F8BE3Fd0 >> 57.9 16.2 546.0 724.7 0.0 0.7 0.0 10.1 0 40 
c1t5000C50055F8B21Fd0 >> 59.5 30.0 553.2 741.8 0.0 0.9 0.0 9.8 0 45 c1t5000C50055F8A46Fd0 >> 57.4 22.5 504.0 724.5 0.0 0.8 0.0 9.6 0 41 c1t5000C50055F856CFd0 >> 58.4 24.6 531.4 786.9 0.0 0.7 0.0 8.4 0 38 c1t5000C50055E6606Fd0 >> 511.0 161.4 7572.1 11260.1 0.0 0.3 0.0 0.4 0 14 c2 >> 252.3 20.1 3776.3 458.9 0.0 0.1 0.0 0.2 0 6 c2t500117310015D579d0 >> 258.8 18.0 3795.7 350.0 0.0 0.1 0.0 0.2 0 6 c2t50011731001631FDd0 >> 0.0 123.4 0.0 10451.1 0.0 0.1 0.0 1.0 0 3 c2t5000A72A3007811Dd0 >> 0.2 16.1 1.9 56.7 0.0 0.0 0.0 0.0 0 0 c4 >> 0.2 8.1 1.6 28.3 0.0 0.0 0.0 0.0 0 0 c4t0d0 >> 0.0 8.1 0.3 28.3 0.0 0.0 0.0 0.0 0 0 c4t1d0 >> 495.6 163.6 7168.9 11290.3 0.0 0.2 0.0 0.4 0 14 c12 >> 0.0 123.4 0.0 10451.1 0.0 0.1 0.0 1.0 0 3 c12t5000A72B300780FFd0 >> 248.2 18.1 3645.8 323.0 0.0 0.1 0.0 0.2 0 5 c12t500117310015D59Ed0 >> 247.4 22.1 3523.1 516.2 0.0 0.1 0.0 0.2 0 6 c12t500117310015D54Ed0 >> 0.2 14.8 1.9 56.7 0.0 0.0 0.6 0.1 0 0 rpool >> 3883.5 1357.7 40141.6 60739.5 22.8 38.6 4.4 7.4 54 100 tank >> >> It is very busy with alot of wait % and higher asvc_t (2011% busy on c1?!). I'm assuming resilvers are alot more aggressive than scrubs. >> >> There are many variables here, the biggest of which is the current non-scrub load. >> >> I might have lost 2 weeks of scrub time, depending on whether the scrub will resume where it left off. I'll update when I can. >> >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From moo at wuffers.net Thu Jul 31 19:44:45 2014 From: moo at wuffers.net (wuffers) Date: Thu, 31 Jul 2014 15:44:45 -0400 Subject: [OmniOS-discuss] Slow scrub performance In-Reply-To: References: Message-ID: This is going to be long winded as well (apologies!).. lots of pasted data. On Thu, Jul 31, 2014 at 1:37 AM, Richard Elling < richard.elling at richardelling.com> wrote: > > The %busy for controllers is a sum of the %busy for all disks on the > controller, so > is can be large, but overall isn't interesting. With HDDs, there is no way > you can > saturate the controller, so we don't really care what the %busy shows. > > The more important item is that the number of read ops is fairly low for > all but 4 disks. > Since you didn't post the pool configuration, we can only guess that they > might be a > souce of the bottleneck. > > You're seeing a lot of reads from the cache devices. How much RAM does > this system > have? > > I realized that the busy % was a sum after looking through some of that data, but good to know that it isn't very relevant. The pool configuration was in the original post, but here it is again (after re-attaching the mirror log device). Just saw your edit, but this has been updated from the original post anyways. 
  pool: tank
 state: ONLINE
  scan: scrub in progress since Tue Jul 29 15:41:27 2014
    82.5G scanned out of 24.5T at 555K/s, (scan is slow, no estimated time)
    0 repaired, 0.33% done
config:

        NAME                        STATE     READ WRITE CKSUM
        tank                        ONLINE       0     0     0
          mirror-0                  ONLINE       0     0     0
            c1t5000C50055F9F637d0   ONLINE       0     0     0
            c1t5000C50055F9EF2Fd0   ONLINE       0     0     0
          mirror-1                  ONLINE       0     0     0
            c1t5000C50055F87D97d0   ONLINE       0     0     0
            c1t5000C50055F9D3B3d0   ONLINE       0     0     0
          mirror-2                  ONLINE       0     0     0
            c1t5000C50055E6606Fd0   ONLINE       0     0     0
            c1t5000C50055F9F92Bd0   ONLINE       0     0     0
          mirror-3                  ONLINE       0     0     0
            c1t5000C50055F856CFd0   ONLINE       0     0     0
            c1t5000C50055F9FE87d0   ONLINE       0     0     0
          mirror-4                  ONLINE       0     0     0
            c1t5000C50055F84A97d0   ONLINE       0     0     0
            c1t5000C50055FA0AF7d0   ONLINE       0     0     0
          mirror-5                  ONLINE       0     0     0
            c1t5000C50055F9D3E3d0   ONLINE       0     0     0
            c1t5000C50055F9F0B3d0   ONLINE       0     0     0
          mirror-6                  ONLINE       0     0     0
            c1t5000C50055F8A46Fd0   ONLINE       0     0     0
            c1t5000C50055F9FB8Bd0   ONLINE       0     0     0
          mirror-7                  ONLINE       0     0     0
            c1t5000C50055F8B21Fd0   ONLINE       0     0     0
            c1t5000C50055F9F89Fd0   ONLINE       0     0     0
          mirror-8                  ONLINE       0     0     0
            c1t5000C50055F8BE3Fd0   ONLINE       0     0     0
            c1t5000C50055F9E123d0   ONLINE       0     0     0
          mirror-9                  ONLINE       0     0     0
            c1t5000C50055F9379Bd0   ONLINE       0     0     0
            c1t5000C50055F9E7D7d0   ONLINE       0     0     0
          mirror-10                 ONLINE       0     0     0
            c1t5000C50055E65F0Fd0   ONLINE       0     0     0
            c1t5000C50055F9F80Bd0   ONLINE       0     0     0
          mirror-11                 ONLINE       0     0     0
            c1t5000C50055F8A22Bd0   ONLINE       0     0     0
            c1t5000C50055F8D48Fd0   ONLINE       0     0     0
          mirror-12                 ONLINE       0     0     0
            c1t5000C50055E65807d0   ONLINE       0     0     0
            c1t5000C50055F8BFA3d0   ONLINE       0     0     0
          mirror-13                 ONLINE       0     0     0
            c1t5000C50055E579F7d0   ONLINE       0     0     0
            c1t5000C50055E65877d0   ONLINE       0     0     0
          mirror-14                 ONLINE       0     0     0
            c1t5000C50055F9FA1Fd0   ONLINE       0     0     0
            c1t5000C50055F8CDA7d0   ONLINE       0     0     0
          mirror-15                 ONLINE       0     0     0
            c1t5000C50055F8BF9Bd0   ONLINE       0     0     0
            c1t5000C50055F9A607d0   ONLINE       0     0     0
          mirror-16                 ONLINE       0     0     0
            c1t5000C50055E66503d0   ONLINE       0     0     0
            c1t5000C50055E4FDE7d0   ONLINE       0     0     0
          mirror-17                 ONLINE       0     0     0
            c1t5000C50055F8E017d0   ONLINE       0     0     0
            c1t5000C50055F9F3EBd0   ONLINE       0     0     0
          mirror-18                 ONLINE       0     0     0
            c1t5000C50055F8B80Fd0   ONLINE       0     0     0
            c1t5000C50055F9F63Bd0   ONLINE       0     0     0
          mirror-19                 ONLINE       0     0     0
            c1t5000C50055F84FB7d0   ONLINE       0     0     0
            c1t5000C50055F9FEABd0   ONLINE       0     0     0
          mirror-20                 ONLINE       0     0     0
            c1t5000C50055F8CCAFd0   ONLINE       0     0     0
            c1t5000C50055F9F91Bd0   ONLINE       0     0     0
          mirror-21                 ONLINE       0     0     0
            c1t5000C50055E65ABBd0   ONLINE       0     0     0
            c1t5000C50055F8905Fd0   ONLINE       0     0     0
          mirror-22                 ONLINE       0     0     0
            c1t5000C50055E57A5Fd0   ONLINE       0     0     0
            c1t5000C50055F87E73d0   ONLINE       0     0     0
          mirror-23                 ONLINE       0     0     0
            c1t5000C50055E66053d0   ONLINE       0     0     0
            c1t5000C50055E66B63d0   ONLINE       0     0     0
          mirror-24                 ONLINE       0     0     0
            c1t5000C50055F8723Bd0   ONLINE       0     0     0
            c1t5000C50055F8C3ABd0   ONLINE       0     0     0
        logs
          mirror-25                 ONLINE       0     0     0
            c2t5000A72A3007811Dd0   ONLINE       0     0     0
            c12t5000A72B300780FFd0  ONLINE       0     0     0
        cache
          c2t500117310015D579d0     ONLINE       0     0     0
          c2t50011731001631FDd0     ONLINE       0     0     0
          c12t500117310015D59Ed0    ONLINE       0     0     0
          c12t500117310015D54Ed0    ONLINE       0     0     0
        spares
          c1t5000C50055FA2AEFd0     AVAIL
          c1t5000C50055E595B7d0     AVAIL

Basically, this is 2 head nodes (Supermicro 826BE26) connected to a Supermicro 847E26 JBOD, using LSI 9207s. There are 52 Seagate ST4000NM0023s (4TB SAS drives) in 25 mirror pairs plus 2 which are spares. There are 4 Smart Optimus 400GB SSDs as cache drives, and 2 Stec ZeusRAMs for slogs. They're wired in such a way that both nodes can see all the drives (data, cache and log), and the data drives are on separate controllers than the cache/slog devices. RSF-1 was also specced in here but not in use at the moment. All the SAN traffic is through InfiniBand (SRP). Each head unit has 256GB of RAM. Dedupe is not in use and all the latest feature flags are enabled.
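As a quick cross-check on the memory numbers, the raw ARC kstats should tell the same story as arc_summary below -- something like:

# kstat -p zfs:0:arcstats:size zfs:0:arcstats:c_max zfs:0:arcstats:p

(I'm quoting those stat names from memory, so treat that as a rough pointer rather than gospel.)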
An arc_summary output:

System Memory:
        Physical RAM:   262103 MB
        Free Memory :   10273 MB
        LotsFree:       4095 MB

ZFS Tunables (/etc/system):
        set zfs:zfs_arc_shrink_shift = 10

ARC Size:
        Current Size:             225626 MB (arcsize)
        Target Size (Adaptive):   225626 MB (c)
        Min Size (Hard Limit):    8190 MB (zfs_arc_min)
        Max Size (Hard Limit):    261079 MB (zfs_arc_max)

ARC Size Breakdown:
        Most Recently Used Cache Size:    10%   23290 MB (p)
        Most Frequently Used Cache Size:  89%   202335 MB (c-p)

ARC Efficency:
        Cache Access Total:      27377320465
        Cache Hit Ratio:   93%   25532510784   [Defined State for buffer]
        Cache Miss Ratio:   6%   1844809681    [Undefined State for Buffer]
        REAL Hit Ratio:    92%   25243933796   [MRU/MFU Hits Only]

        Data Demand Efficiency:    95%
        Data Prefetch Efficiency:  40%

        CACHE HITS BY CACHE LIST:
          Anon:                        --%  Counter Rolled.
          Most Recently Used:          18%  4663226393 (mru)       [ Return Customer ]
          Most Frequently Used:        80%  20580707403 (mfu)      [ Frequent Customer ]
          Most Recently Used Ghost:     0%  176686906 (mru_ghost)  [ Return Customer Evicted, Now Back ]
          Most Frequently Used Ghost:   0%  126286869 (mfu_ghost)  [ Frequent Customer Evicted, Now Back ]

        CACHE HITS BY DATA TYPE:
          Demand Data:        95%  24413671342
          Prefetch Data:       1%  358419705
          Demand Metadata:     2%  698314899
          Prefetch Metadata:   0%  62104838

        CACHE MISSES BY DATA TYPE:
          Demand Data:        69%  1277347273
          Prefetch Data:      28%  519579788
          Demand Metadata:     2%  39512363
          Prefetch Metadata:   0%  8370257

And a sample of arcstat (deleted first line of output):

# arcstat -f read,hits,miss,hit%,l2read,l2hits,l2miss,l2hit%,arcsz,l2size,l2asize 1
 read  hits  miss  hit%  l2read  l2hits  l2miss  l2hit%  arcsz  l2size  l2asize
 5.9K  4.6K  1.3K    78    1.3K    1.2K      80      93   220G    1.6T     901G
 6.7K  5.2K  1.5K    76    1.5K    1.3K     250      83   220G    1.6T     901G
 7.0K  5.3K  1.7K    76    1.7K    1.4K     316      81   220G    1.6T     901G
 6.5K  5.3K  1.2K    80    1.2K    1.1K     111      91   220G    1.6T     901G
 6.4K  5.2K  1.2K    81    1.2K    1.1K     100      91   220G    1.6T     901G
 7.2K  5.6K  1.6K    78    1.6K    1.3K     289      81   220G    1.6T     901G
 8.5K  6.8K  1.7K    80    1.7K    1.3K     351      79   220G    1.6T     901G
 7.5K  5.9K  1.6K    78    1.6K    1.3K     282      82   220G    1.6T     901G
 6.7K  5.6K  1.1K    83    1.1K     991     123      88   220G    1.6T     901G
 6.8K  5.5K  1.3K    80    1.3K    1.1K     234      82   220G    1.6T     901G

Interesting to see only an l2asize of 901G even though I should have more..
373G x 4 is just under 1.5TB of raw storage. The compressed l2arc size is
1.6TB, while actual used space is 901G. I expect more to be used. Perhaps
Saso can comment on this portion, if he's following this thread (snipped
from "zpool iostat -v"):

cache                         -      -    -    -      -      -
  c2t500117310015D579d0     373G     8M  193   16  2.81M   394K
  c2t50011731001631FDd0     373G  5.29M  194   15  2.85M   360K
  c12t500117310015D59Ed0    373G  5.50M  191   17  2.74M   368K
  c12t500117310015D54Ed0    373G  5.57M  200   14  2.89M   300K

(from this discussion here:
http://lists.omniti.com/pipermail/omnios-discuss/2014-February/002287.html),
and the uptime on this is currently around ~58 days, so it should have had
enough time to rotate through the l2arc "rotor".

> methinks the scrub I/Os are getting starved and since they are low
> priority, they could get very starved. In general, I wouldn't worry about
> it, but I understand why you might be nervous. Keep in mind that in ZFS
> scrubs are intended to find errors on idle data, not frequently accessed
> data.
>
> more far below...
>

I'm worried because there's no way the scrub will ever complete before the
next reboot. Regular scrubs are important, right?
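
To put a number on that worry (rough math, assuming the ~600K/s scan rate
holds flat for the rest of the pool -- it may not):

  # time to scan 24.5T at ~600K/s, in days
  echo "24.5 * 1024 * 1024 * 1024 / 600 / 86400" | bc    # => roughly 500 days

So at the current pace this is a more-than-a-year scrub, not a multi-week one.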
> > So I think the pool is not scheduling scrub I/Os very well. You can > increase the number of > scrub I/Os in the scheduler by adjusting the zfs_vdev_scrub_max_active > tunable. The > default is 2, but you'll have to consider that a share (in the stock > market sense) where > the active sync reads and writes are getting 10 each. You can try bumping > up the value > and see what happens over some time, perhaps 10 minutes or so -- too short > of a time > and you won't get a good feeling for the impact (try this in off-peak > time). > echo zfs_vdev_scrub_max_active/W0t5 | mdb -kw > will change the value from 2 to 5, increasing its share of the total I/O > workload. > > You can see the progress of scan (scrubs do scan) workload by looking at > the ZFS > debug messages. > echo ::zfs_dbgmsg | mdb -k > These will look mysterious... they are. But the interesting bits are about > how many blocks > are visited in some amount of time (txg sync interval). Ideally, this will > change as you > adjust zfs_vdev_scrub_max_active. > -- richard > > Actually, you used the data from before the resilver. During resilver this was the activity on the pool: 3883.5 1357.7 40141.6 60739.5 22.8 38.6 4.4 7.4 54 100 tank Are you looking at an individual drive's busy % or the pool's busy % to determine whether it's "busy"? During the resilver this was the activity on the drives (a sample - between 38-45%, whereas during the scrub the individual drives were 2-5% busy): 59.5 30.0 553.2 741.8 0.0 0.9 0.0 9.8 0 45 c1t5000C50055F8A46Fd0 57.4 22.5 504.0 724.5 0.0 0.8 0.0 9.6 0 41 c1t5000C50055F856CFd0 58.4 24.6 531.4 786.9 0.0 0.7 0.0 8.4 0 38 c1t5000C50055E6606Fd0 But yes, without the resilver the busy % was much less (during the scrub each individual drive was 2-4% busy). I've pasted the current iostat output further below. With the zfs_vdev_scrub_max_active at the default of 2, it was doing an average of 162 blocks: doing scan sync txg 26678243; bm=897/1/0/15785978 scanned dataset 897 (tank/vmware-64k-5tb-1) with min=1 max=26652167; pausing=1 visited 162 blocks in 6090ms doing scan sync txg 26678244; bm=897/1/0/15786126 scanned dataset 897 (tank/vmware-64k-5tb-1) with min=1 max=26652167; pausing=1 visited 162 blocks in 6094ms After changing it to 5, and waiting about 20 mins, I'm not seeing anything significantly different: doing scan sync txg 26678816; bm=897/1/0/37082043 scanned dataset 897 (tank/vmware-64k-5tb-1) with min=1 max=26652167; pausing=1 visited 163 blocks in 6154ms doing scan sync txg 26678817; bm=897/1/0/37082193 scanned dataset 897 (tank/vmware-64k-5tb-1) with min=1 max=26652167; pausing=1 visited 162 blocks in 6128ms pool: tank state: ONLINE scan: scrub in progress since Tue Jul 29 15:41:27 2014 97.0G scanned out of 24.5T at 599K/s, (scan is slow, no estimated time) 0 repaired, 0.39% done I'll keep the zfs_vdev_scrub_max_active tunable to 5, as it doesn't appear to be impacting too much, and monitor for changes. What's strange to me is that it was "humming" along at 5.5MB/s at the 2 week mark but is now 10x slower (compared to before reattaching the mirror log device). It *seems* marginally faster, from 541K/s to almost 600K/s.. 
This is the current activity from "iostat -xnCz 60 2":

                     extended device statistics
    r/s    w/s    kr/s     kw/s  wait  actv wsvc_t asvc_t  %w  %b device
  158.8 1219.2  3717.8  39969.8   0.0   1.6    0.0    1.1   0 139 c1
    3.6   35.1    86.2    730.9   0.0   0.0    0.0    0.9   0   3 c1t5000C50055F8723Bd0
    3.7   19.9    83.7    789.9   0.0   0.0    0.0    1.4   0   3 c1t5000C50055E66B63d0
    2.7   22.5    60.8    870.9   0.0   0.0    0.0    1.1   0   2 c1t5000C50055F87E73d0
    2.4   27.9    66.0    765.8   0.0   0.0    0.0    0.8   0   2 c1t5000C50055F8BFA3d0
    2.8   17.9    64.9    767.0   0.0   0.0    0.0    0.8   0   1 c1t5000C50055F9E123d0
    3.1   26.1    73.8    813.3   0.0   0.0    0.0    0.9   0   2 c1t5000C50055F9F0B3d0
    3.1   15.5    79.4    783.4   0.0   0.0    0.0    1.3   0   2 c1t5000C50055F9D3B3d0
    3.8   38.6    86.2    826.8   0.0   0.1    0.0    1.2   0   4 c1t5000C50055E4FDE7d0
    3.8   15.4    93.0    822.3   0.0   0.0    0.0    1.5   0   3 c1t5000C50055F9A607d0
    3.0   25.7    79.4    719.7   0.0   0.0    0.0    0.9   0   2 c1t5000C50055F8CDA7d0
    3.2   26.5    69.0    824.3   0.0   0.0    0.0    1.1   0   3 c1t5000C50055E65877d0
    3.7   42.6    79.2    834.1   0.0   0.1    0.0    1.3   0   5 c1t5000C50055F9E7D7d0
    3.3   23.2    79.5    778.0   0.0   0.0    0.0    1.2   0   3 c1t5000C50055FA0AF7d0
    3.4   30.2    77.0    805.9   0.0   0.0    0.0    0.9   0   3 c1t5000C50055F9FE87d0
    3.0   15.4    72.6    795.0   0.0   0.0    0.0    1.6   0   3 c1t5000C50055F9F91Bd0
    2.5   38.1    61.1    859.4   0.0   0.1    0.0    1.6   0   5 c1t5000C50055F9FEABd0
    2.1   13.2    42.7    801.6   0.0   0.0    0.0    1.6   0   2 c1t5000C50055F9F63Bd0
    3.0   20.0    62.6    766.6   0.0   0.0    0.0    1.1   0   2 c1t5000C50055F9F3EBd0
    3.7   24.3    80.2    807.9   0.0   0.0    0.0    1.0   0   2 c1t5000C50055F9F80Bd0
    3.2   35.2    66.1    852.4   0.0   0.0    0.0    1.2   0   4 c1t5000C50055F9FB8Bd0
    3.9   30.6    84.7    845.7   0.0   0.0    0.0    0.8   0   3 c1t5000C50055F9F92Bd0
    2.7   18.1    68.8    831.4   0.0   0.0    0.0    1.4   0   2 c1t5000C50055F8905Fd0
    2.7   17.7    61.4    762.1   0.0   0.0    0.0    1.0   0   2 c1t5000C50055F8D48Fd0
    3.5   17.5    87.8    749.7   0.0   0.0    0.0    1.7   0   3 c1t5000C50055F9F89Fd0
    2.6   13.7    58.6    780.9   0.0   0.0    0.0    1.7   0   3 c1t5000C50055F9EF2Fd0
    3.3   34.9    74.5    730.9   0.0   0.0    0.0    0.8   0   3 c1t5000C50055F8C3ABd0
    3.1   19.3    64.7    789.9   0.0   0.0    0.0    1.0   0   2 c1t5000C50055E66053d0
    3.8   38.5    82.9    826.8   0.0   0.1    0.0    1.3   0   4 c1t5000C50055E66503d0
    3.7   25.8    91.4    813.3   0.0   0.0    0.0    0.8   0   2 c1t5000C50055F9D3E3d0
    2.2   37.9    52.5    859.4   0.0   0.0    0.0    1.1   0   4 c1t5000C50055F84FB7d0
    2.8   20.0    62.8    766.6   0.0   0.0    0.0    1.0   0   2 c1t5000C50055F8E017d0
    3.9   26.1    86.5    824.3   0.0   0.0    0.0    1.1   0   3 c1t5000C50055E579F7d0
    3.1   27.7    79.9    765.8   0.0   0.0    0.0    1.2   0   3 c1t5000C50055E65807d0
    2.9   22.8    76.3    778.0   0.0   0.0    0.0    1.1   0   3 c1t5000C50055F84A97d0
    3.6   15.3    89.0    783.4   0.0   0.0    0.0    1.7   0   3 c1t5000C50055F87D97d0
    2.8   13.8    77.9    780.9   0.0   0.0    0.0    1.5   0   2 c1t5000C50055F9F637d0
    2.1   18.3    51.4    831.4   0.0   0.0    0.0    1.1   0   2 c1t5000C50055E65ABBd0
    3.1   15.4    70.9    822.3   0.0   0.0    0.0    1.2   0   2 c1t5000C50055F8BF9Bd0
    3.2   17.9    75.5    762.1   0.0   0.0    0.0    1.2   0   2 c1t5000C50055F8A22Bd0
    3.7   42.4    83.3    834.1   0.0   0.1    0.0    1.1   0   5 c1t5000C50055F9379Bd0
    4.0   22.7    86.8    870.9   0.0   0.0    0.0    1.0   0   2 c1t5000C50055E57A5Fd0
    2.6   15.5    67.5    795.0   0.0   0.0    0.0    1.4   0   2 c1t5000C50055F8CCAFd0
    2.9   13.2    65.4    801.6   0.0   0.0    0.0    1.9   0   3 c1t5000C50055F8B80Fd0
    3.3   25.7    82.7    719.7   0.0   0.0    0.0    1.1   0   3 c1t5000C50055F9FA1Fd0
    4.0   24.0    84.9    807.9   0.0   0.0    0.0    1.1   0   3 c1t5000C50055E65F0Fd0
    2.8   18.4    69.5    767.0   0.0   0.0    0.0    1.0   0   2 c1t5000C50055F8BE3Fd0
    3.3   17.6    81.6    749.7   0.0   0.0    0.0    1.4   0   3 c1t5000C50055F8B21Fd0
    3.3   35.1    64.2    852.4   0.0   0.0    0.0    1.1   0   4 c1t5000C50055F8A46Fd0
    3.5   30.0    82.1    805.9   0.0   0.0    0.0    0.9   0   3 c1t5000C50055F856CFd0
    3.9   30.4    89.5    845.7   0.0   0.0    0.0    0.9   0   3 c1t5000C50055E6606Fd0
  429.4  133.6  5933.0   8163.0   0.0   0.2    0.0    0.3   0  12 c2
  215.8   28.4  2960.4    677.7   0.0   0.1    0.0    0.2   0   5 c2t500117310015D579d0
  213.7   27.4  2972.6    654.1   0.0   0.1    0.0    0.2   0   5 c2t50011731001631FDd0
    0.0   77.8     0.0   6831.2   0.0   0.1    0.0    0.8   0   2 c2t5000A72A3007811Dd0
    0.0   12.3     0.0     46.8   0.0   0.0    0.0    0.0   0   0 c4
    0.0    6.2     0.0     23.4   0.0   0.0    0.0    0.0   0   0 c4t0d0
    0.0    6.1     0.0     23.4   0.0   0.0    0.0    0.0   0   0 c4t1d0
  418.4  134.8  5663.1   8197.8   0.0   0.2    0.0    0.3   0  11 c12
    0.0   77.8     0.0   6831.2   0.0   0.1    0.0    0.8   0   2 c12t5000A72B300780FFd0
  203.5   29.7  2738.0    715.8   0.0   0.1    0.0    0.2   0   5 c12t500117310015D59Ed0
  214.9   27.2  2925.2    650.8   0.0   0.1    0.0    0.2   0   5 c12t500117310015D54Ed0
    0.0   11.3     0.0     46.8   0.0   0.0    0.7    0.1   0   0 rpool
 1006.7 1478.2 15313.9  56330.7  30.4   2.0   12.2    0.8   6  64 tank

Seems the pool is busy at 64% but the individual drives are not taxed at all
(this load is virtually identical to when the scrub was running before the
resilver was triggered). Still not too sure how to interpret this data. Is
the system over-stressed? Is there really a bottleneck somewhere, or do I
just need to fine-tune some settings?

Going to try some Iometer runs via the VM I/O Analyzer next.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: