From tim at multitalents.net Sat Mar 1 00:22:36 2014 From: tim at multitalents.net (Tim Rice) Date: Fri, 28 Feb 2014 16:22:36 -0800 (PST) Subject: [OmniOS-discuss] How do you configure serial ports on OmniOS? In-Reply-To: <20140228181645.624801A0BBB@apps0.cs.toronto.edu> References: <20140228181645.624801A0BBB@apps0.cs.toronto.edu> Message-ID: On Fri, 28 Feb 2014, Chris Siebenmann wrote: > This question makes me feel silly but I'm lost in a confusing maze of > documentation for sacadm, pmadm, and so on and I can't find anything > with web searches. What I would like to do is configure what I believe is > /dev/term/c ('ttyS3' in Linux) to run a getty or the OmniOS/Illumos/etc > equivalent at 115200 baud. Seems like there should be a command to change the baud rate. I've always just edited the _pmtab file. disable the port (assuming pmadm -l shows it enabled) # pmadm -d -p zsmon -s ttyc change the baud rate to what you want # vi /etc/saf/zsmon/_pmtab Enable the port # pmadm -e -p zsmon -s ttyc Sometimes you need to restart the port monitor. (will effect all ports on zsmon) # sacadm -k -p zsmon # sacadm -s -p zsmon > > Relatedly, I'd also like to change the getty-equivalent that 'pmadm -l' > at least theoretically says is talking to /dev/term/a from 9600 baud to > 115200 baud. How to do this is also, well, not obvious to me. > > (Please note that I do not want to make this the serial console, a > procedure for which there seems to be plenty of documentation. I just > want to be able to log in over that serial port, or it and /dev/term/a.) > > Thanks in advance. > > - cks -- Tim Rice Multitalents tim at multitalents.net From cks at cs.toronto.edu Sat Mar 1 01:39:02 2014 From: cks at cs.toronto.edu (Chris Siebenmann) Date: Fri, 28 Feb 2014 20:39:02 -0500 Subject: [OmniOS-discuss] How do you configure serial ports on OmniOS? In-Reply-To: tim's message of Fri, 28 Feb 2014 16:22:36 -0800. Message-ID: <20140301013902.1CCCD1A02C8@apps0.cs.toronto.edu> | On Fri, 28 Feb 2014, Chris Siebenmann wrote: | > This question makes me feel silly but I'm lost in a confusing maze of | > documentation for sacadm, pmadm, and so on and I can't find anything | > with web searches. What I would like to do is configure what I believe is | > /dev/term/c ('ttyS3' in Linux) to run a getty or the OmniOS/Illumos/etc | > equivalent at 115200 baud. | | Seems like there should be a command to change the baud rate. | I've always just edited the _pmtab file. | | disable the port (assuming pmadm -l shows it enabled) | # pmadm -d -p zsmon -s ttyc I should clarify something here: the port isn't listed in pmadm -l at all (although it exists in /dev/term). I assume that this means that I need to create it with some arguments; the exact arguments needed to set things up right (or right enough that I can edit files from there) are one of the things that I'm lost about. (The examples I've found of using pmadm to configure things seem to leave a lot out of magic, and they're often old enough that I'm not sure if things have changed since then.) - cks From tim at multitalents.net Sat Mar 1 07:21:24 2014 From: tim at multitalents.net (Tim Rice) Date: Fri, 28 Feb 2014 23:21:24 -0800 (PST) Subject: [OmniOS-discuss] How do you configure serial ports on OmniOS? 
In-Reply-To: <20140301013902.1CCCD1A02C8@apps0.cs.toronto.edu> References: <20140301013902.1CCCD1A02C8@apps0.cs.toronto.edu> Message-ID: On Fri, 28 Feb 2014, Chris Siebenmann wrote: > | On Fri, 28 Feb 2014, Chris Siebenmann wrote: > | > This question makes me feel silly but I'm lost in a confusing maze of > | > documentation for sacadm, pmadm, and so on and I can't find anything > | > with web searches. What I would like to do is configure what I believe is > | > /dev/term/c ('ttyS3' in Linux) to run a getty or the OmniOS/Illumos/etc > | > equivalent at 115200 baud. > | > | Seems like there should be a command to change the baud rate. > | I've always just edited the _pmtab file. > | > | disable the port (assuming pmadm -l shows it enabled) > | # pmadm -d -p zsmon -s ttyc > > I should clarify something here: the port isn't listed in pmadm -l at > all (although it exists in /dev/term). I assume that this means that > I need to create it with some arguments; the exact arguments needed to > set things up right (or right enough that I can edit files from there) > are one of the things that I'm lost about. Again, I'd just edit /etc/saf/zsmon/_pmtab and do a copy and paste of the ttyb line and make the necessary changes. Probably want to start with ux (disabled) in the second field. Then restart the port monitor and then enable the port. > (The examples I've found of using pmadm to configure things seem to > leave a lot out of magic, and they're often old enough that I'm not > sure if things have changed since then.) -- Tim Rice Multitalents tim at multitalents.net From mir at miras.org Sat Mar 1 13:46:22 2014 From: mir at miras.org (Michael Rasmussen) Date: Sat, 1 Mar 2014 14:46:22 +0100 Subject: [OmniOS-discuss] ZFS trim support Message-ID: <20140301144622.12a79ac6@sleipner.datanom.net> Hi all, Anybody knows the current status for trim support in Illumos? It seems two solutions (FreeBSD and tracking the metaslab allocator) are suggested but no final date: http://open-zfs.org/wiki/Features#TRIM_Support In lack of trim how do you then handle SSD's? I have just ordered some Corsair for use as log and cache so this will soon be a headache of mine as well;-) -- Hilsen/Regards Michael Rasmussen Get my public GnuPG keys: michael rasmussen cc http://pgp.mit.edu:11371/pks/lookup?op=get&search=0xD3C9A00E mir datanom net http://pgp.mit.edu:11371/pks/lookup?op=get&search=0xE501F51C mir miras org http://pgp.mit.edu:11371/pks/lookup?op=get&search=0xE3E80917 -------------------------------------------------------------- /usr/games/fortune -es says: HOORAY, Ronald!! Now YOU can marry LINDA RONSTADT too!! -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 198 bytes Desc: not available URL: From esproul at omniti.com Sat Mar 1 15:32:25 2014 From: esproul at omniti.com (Eric Sproul) Date: Sat, 1 Mar 2014 10:32:25 -0500 Subject: [OmniOS-discuss] ZFS trim support In-Reply-To: <20140301144622.12a79ac6@sleipner.datanom.net> References: <20140301144622.12a79ac6@sleipner.datanom.net> Message-ID: On Sat, Mar 1, 2014 at 8:46 AM, Michael Rasmussen wrote: > In lack of trim how do you then handle SSD's? I just treat them as regular drives. I've got some 600G Intel 320s that have been in service as primary pool devices for about 2.5 years, having over 20TB written and still the media wear indicator has barely budged. 
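(If anyone wants to read that wear indicator themselves, smartmontools can usually pull it on illumos; the device path and -d argument below are only examples and the attribute names vary by vendor, so treat it as a sketch:

# smartctl -d sat,12 -A /dev/rdsk/c1t1d0 | egrep -i 'wear|written'

On Intel drives the interesting attributes are usually Media_Wearout_Indicator and the host-writes counters.)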
Here's a graph of one of them: https://share.circonus.com/shared/graphs/ae29a8e6-167c-47e4-aea6-d96e51f8e35f/gPUfX5 Gather some data on your daily write totals and do the math, then look at the endurance spec for your drives and you'll have an idea of what kind of life to expect from them. Eric From lists at mcintyreweb.com Sat Mar 1 19:46:02 2014 From: lists at mcintyreweb.com (Hugh McIntyre) Date: Sat, 01 Mar 2014 11:46:02 -0800 Subject: [OmniOS-discuss] illumos power management...again... In-Reply-To: References: Message-ID: <531238FA.5040409@mcintyreweb.com> Others have answered the disk power saving part of the question. But regardless of the power management issues, if your backup disks are in the same server as the server disks themselves then your backups are not very disaster proof. What if the home server is stolen or goes wrong in ways that destroy the backup as well as live disks? Maybe this setup is OK for you, since a backup server in the same house also carries risks. But worth considering if this is really for valuable backups. Hugh. On 2/27/14 11:13 PM, Johan Kragsterman wrote: > Hi! > > I remember a discussion on this theme last year. I've been reading up on that, but that didn't answer my questions. > > I'm in the process of building a new main home server, and I would of coarse like it to be energy efficient. I don't use much space normally, so my daily working environment doesn't need much space, which means I can use all SSD's for that. > > Though I would like a backup/nfs environment, with more space on spinning disks, and I got two different scenarious to choose from here: > > One is building a separate machine for backup/nfs, and only start it when it needs to be started( with wake on LAN). > > The other is to have the spinning disks in my main home server, and use illumos power management to take care of powering down the disks when they are not in use. > This would of coarse be the easiest way, if the power management system is efficient enough. > > I don't know if the system can/do power down the disks if the nfs server is active and the shares are mounted? (I don't have any problems with latencies/delays here, since it isn't in regular use) > > If so, good! > If not, I could umount the shares and turn off the nfs server, and export the pool, if that would help spinning down disks... > > Someone got any insight and/or suggestions here...? > > > > Best regards from/Med v?nliga h?lsningar fr?n > > Johan Kragsterman > > Capvert > > _______________________________________________ > OmniOS-discuss mailing list > OmniOS-discuss at lists.omniti.com > http://lists.omniti.com/mailman/listinfo/omnios-discuss > From johan.kragsterman at capvert.se Sun Mar 2 10:25:10 2014 From: johan.kragsterman at capvert.se (Johan Kragsterman) Date: Sun, 2 Mar 2014 11:25:10 +0100 Subject: [OmniOS-discuss] illumos power management...again... In-Reply-To: <531238FA.5040409@mcintyreweb.com> References: <531238FA.5040409@mcintyreweb.com>, Message-ID: Hi! Thanks, Hugh, but that is not one of my concerns. I am not afraid of that the system in any way will mess up my backup/nfs pool, and I am not afraid of thieves currently. The VERY, VERY important backups I might also replicate to other places, but they are really very small, so no problems to fit them into whatever... No, I'm more interested in power savings...probably going to measure the electricity difference between when nfs server is running and not running, as well as imported or exported pool. 
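For reference, the usual knob for spinning down individual drives on illumos is /etc/power.conf, applied with pmconfig(1M). A minimal sketch (the paths are only examples; power.conf normally wants the physical device path, which "ls -l" on the /dev/dsk link will show, and whether the disk actually spins down depends on the driver and the drive honoring the threshold):

find the physical path for the disk:
# ls -l /dev/dsk/c2t1d0s0

then in /etc/power.conf, something like:

device-thresholds       /pci@0,0/.../sd@1,0     30m
autopm                  enable

and apply it:

# pmconfig

Whether the threshold is respected while the NFS server is running and the shares are mounted is exactly the open question, so measuring at the wall is probably the only real answer.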
Best regards from/Med v?nliga h?lsningar fr?n Johan Kragsterman Capvert -----"OmniOS-discuss" skrev: ----- Till: omnios-discuss at lists.omniti.com Fr?n: Hugh McIntyre S?nt av: "OmniOS-discuss" Datum: 2014-03-01 20:47 ?rende: Re: [OmniOS-discuss] illumos power management...again... Others have answered the disk power saving part of the question. But regardless of the power management issues, if your backup disks are in the same server as the server disks themselves then your backups are not very disaster proof. ?What if the home server is stolen or goes wrong in ways that destroy the backup as well as live disks? Maybe this setup is OK for you, since a backup server in the same house also carries risks. ?But worth considering if this is really for valuable backups. Hugh. On 2/27/14 11:13 PM, Johan Kragsterman wrote: > Hi! > > I remember a discussion on this theme last year. I've been reading up on that, but that didn't answer my questions. > > I'm in the process of building a new main home server, and I would of coarse like it to be energy efficient. I don't use much space normally, so my daily working environment doesn't need much space, which means I can use all SSD's for that. > > Though I would like a backup/nfs environment, with more space on spinning disks, and I got two different scenarious to choose from here: > > One is building a separate machine for backup/nfs, and only start it when it needs to be started( with wake on LAN). > > The other is to have the spinning disks in my main home server, and use illumos power management to take care of powering down the disks when they are not in use. > This would of coarse be the easiest way, if the power management system is efficient enough. > > I don't know if the system can/do power down the disks if the nfs server is active and the shares are mounted? (I don't have any problems with latencies/delays here, since it isn't in regular use) > > If so, good! > If not, I could umount the shares and turn off the nfs server, and export the pool, if that would help spinning down disks... > > Someone got any insight and/or suggestions here...? > > > > Best regards from/Med v?nliga h?lsningar fr?n > > Johan Kragsterman > > Capvert > > _______________________________________________ > OmniOS-discuss mailing list > OmniOS-discuss at lists.omniti.com > http://lists.omniti.com/mailman/listinfo/omnios-discuss > _______________________________________________ OmniOS-discuss mailing list OmniOS-discuss at lists.omniti.com http://lists.omniti.com/mailman/listinfo/omnios-discuss From jimklimov at cos.ru Sun Mar 2 11:40:10 2014 From: jimklimov at cos.ru (Jim Klimov) Date: Sun, 02 Mar 2014 12:40:10 +0100 Subject: [OmniOS-discuss] ZFS trim support In-Reply-To: <20140301144622.12a79ac6@sleipner.datanom.net> References: <20140301144622.12a79ac6@sleipner.datanom.net> Message-ID: <5313189A.5000908@cos.ru> On 2014-03-01 14:46, Michael Rasmussen wrote: > Hi all, > > Anybody knows the current status for trim support in Illumos? > > It seems two solutions (FreeBSD and tracking the metaslab allocator) are > suggested but no final date: > http://open-zfs.org/wiki/Features#TRIM_Support > > In lack of trim how do you then handle SSD's? > > I have just ordered some Corsair for use as log and cache so this will > soon be a headache of mine as well;-) I believe one common approach is under-allocating the drives. 
For example, I did this on my rig, with little if any "scientific" approach like testing the results, other that seeing how much it would take to wear down my drives... which is kinda irreversible :) As a rule of thumb I took that the same hardware device is branded and sold as two models (only one available in our shops) - higher volume (120Gb in lowest size) or higher speed/endurance (100Gb - and MUCH higher specs in endurance). So I took the 120Gb one and partitioned using 100Gb and retaining 20Gb free. According to the datasheet for this model, the difference in endurance is 10-fold (total drive rewrites) for the small model and 5-fold for larger ones, for the cost of a 20% difference in size... Peak 4K random writes are also about 3x faster on smaller brothers. I think it is worth not-using that little extra: http://www.seagate.com/files/www-content/product-content/pulsar-fam/pulsar/enterprise-sata-ssd/en-us/docs/enterprise-sata-ssd-ds1775-1-1301us.pdf The idea with under-allocation, the way I get it, is that those 20Gb are "required" to contain zeroes. So when the other 100Gb have used and freed some flash blocks, there is pressure on the device to free some of them up so as to accommodate the 20Gb of zeroes before it completely runs out of all 120Gb worth of user-addressable blocks. In effect this causes early trimming (if this works at all, which I am unsure how can be measured) at the devices discretion (it chooses when it can do the trick so as to stay within the bounds of "guaranteed" storage space). Unfortunately, I have no idea whether partitioning the 120Gb drive to use only 100Gb magically turns it into the 100Gb model for the endurance or performance (i.e. there might well be some differences in firmware as well, such as other allocation goals and optimizations, etc.) HTH, //Jim Klimov From richard.elling at richardelling.com Sun Mar 2 23:26:55 2014 From: richard.elling at richardelling.com (Richard Elling) Date: Sun, 2 Mar 2014 15:26:55 -0800 Subject: [OmniOS-discuss] Pliant/Sandisk SSD ZIL In-Reply-To: <530FB59D.3000703@cos.ru> References: <530291DA.6050009@umiacs.umd.edu> <5302BC07.9070304@umiacs.umd.edu> <348BDDFA-3B48-422E-A166-8427417EF432@RichardElling.com> <5303A925.6060209@cos.ru> <6D195400-4AA6-47B4-A7D6-1CF77D92A119@RichardElling.com> <530F4ACB.40607@cos.ru> <17E70147-79EB-471C-A9F4-496F8C99BF77@richardelling.com> <530FB59D.3000703@cos.ru> Message-ID: <6F378AFA-45E4-473D-9D7D-266AF754735F@richardelling.com> On Feb 27, 2014, at 2:01 PM, Jim Klimov wrote: > On 2014-02-27 20:39, Richard Elling wrote: >>> I hope, NFS cached-data syncs and locks, and ZFS write-syncs are >>> not very related in this case (i.e. zfs sync=disabled does not >>> influence co-ordination of NFS data between hosts), right? >> >> Right. The file system is consistent. The NFS sync is for the case when >> the server reboots. As long as your server isn't rebooting, everything >> should be consistent (assuming the clients are configured appropriately) > > And "appropriately" is just how much different from "default"? It depends on the client. Some people prefer performance over correctness and default to caching strategies on the client that can make multiclient sync more difficult. > I.e. if I set sharenfs on the server dataset, and walk in with > autofs + nfs/client from the build host, with no special configs > other than those in solaris 10 or OpenIndiana, is that appropriate? > Or should some specific stuff be configured on clients and servers? 
For Solaris derivatives, check the attribute cache settings, noac, and nocto in mount_nfs(1m) to see if the defaults suit your needs. -- richard From cks at cs.toronto.edu Tue Mar 4 23:03:13 2014 From: cks at cs.toronto.edu (Chris Siebenmann) Date: Tue, 04 Mar 2014 18:03:13 -0500 Subject: [OmniOS-discuss] How do non-rpool ZFS filesystems get mounted? Message-ID: <20140304230313.AE1AF1A02E9@apps0.cs.toronto.edu> I will ask my question to start with and then explain the background. As far as I can tell from running truss on the 'zfs mount -a' in /lib/svc/method/fs-local, this *does not* mount filesystems from pools other than rpool. However the mounts are absent immediately before it runs and present immediately afterwards. So: does anyone understand how this works? I assume 'zfs mount -a' is doing some ZFS action that activates non-rpool pools and causes them to magically mount their filesystems? Thanks in advance if anyone knows this. Background: I am having an extremely weird heisenbug problem where on boot[*] our test OmniOS machine fails out at the ZFS mount stage with errors about: Reading ZFS config: done. Mounting ZFS filesystems: cannot mount 'fs3-test-01': mountmount or data is busy cannot mount '/fs3-test-02': directory is not empty cannot mount 'fs3-test-02/h/999': mountpoint or dataset is busy (20/20) svc:/system/filesystem/local:default: WARNING: /usr/sbin/zfs mount -a foiled: exit status 1 [failures go on] The direct problem here is that as far as I can tell this is incorrect. If I log in to the console after this failure, the pools and their filesystems are present. If I hack up /lib/svc/method/fs-local to add debugging stuff, all of the directories involved are empty (and unmounted) before 'zfs mount -a' runs and magically present afterwards, even as 'zfs mount -a' complains and errors out. That was when I started truss'ing the 'zfs mount -a' itself and discovered that it normally doesn't mount non-rpool filesystems. In fact, based on a truss trace I have during an incident it appears that the problem happens exactly when 'zfs mount -a' thinks that it *does* need to mount such a filesystem but finds that the target directory already has things in it because the filesystem is actually mounted already. Running truss on the 'zfs mount -a' seems to make this happen much less frequently, especially a relatively verbose truss that is tracing calls in libzfs as well as system calls. This makes me wonder if there is some sort of a race involved. - cks [*: the other problem is that the test OmniOS machine has stopped actually rebooting when I run 'reboot'; it hangs during shutdown and must be power cycled (and I have the magic fastboot settings turned off). Neither this nor the mount problem used to happen; both appeared this morning. No packages have been updated. ] From mark at omniti.com Tue Mar 4 23:29:42 2014 From: mark at omniti.com (Mark Harrison) Date: Tue, 4 Mar 2014 18:29:42 -0500 Subject: [OmniOS-discuss] How do non-rpool ZFS filesystems get mounted? In-Reply-To: <20140304230313.AE1AF1A02E9@apps0.cs.toronto.edu> References: <20140304230313.AE1AF1A02E9@apps0.cs.toronto.edu> Message-ID: You mention 'directories' being empty. Does /fs3-test-02 contain empty directories before being mounted? If so, this will be why zfs thinks it's isn't empty and then fail to mount it. However, the child filesystems might still mount because their directories are empty, giving the appearance of everything being mounted OK. 
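A quick way to check for that situation, using the names from the error messages above (these are ordinary commands, nothing special):

is the dataset actually mounted, and what filesystem is really backing the path?
# zfs list -r -o name,mounted,mountpoint fs3-test-02
# df -k /fs3-test-02

anything showing up here before the mount, even an empty subdirectory, is enough for zfs to refuse with "directory is not empty":
# ls -A /fs3-test-02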
I'm not sure why you're not seeing truss show zfs trying to mount non-rpool filesystems, but it should be doing so. My wild guess right now is that it is due to zfs checking to see if the directory is empty first, and only showing up that it's doing something in truss if the dir isnt' empty. We've had this happen before when someone runs mv on a directory that is actually the root of a filesystem. When zfs remounts it on reboot, it gets remounted at the old location, which may or may not have other data in it at this point (this comes up a lot when doing something like mv foo foo.old; mkdir foo; do_stuff_with foo). I've not tracked down the exact pathology of this when it happens, but our solution then has basically to be to unmount all affected filesystems, then run rmdir on all the blank directories, move any non-blank directories aside (keep them in case they have data that needs to be kept), then run zfs mount -a to let it clean things up. On Tue, Mar 4, 2014 at 6:03 PM, Chris Siebenmann wrote: > I will ask my question to start with and then explain the background. > As far as I can tell from running truss on the 'zfs mount -a' in > /lib/svc/method/fs-local, this *does not* mount filesystems from pools > other than rpool. However the mounts are absent immediately before it > runs and present immediately afterwards. So: does anyone understand > how this works? I assume 'zfs mount -a' is doing some ZFS action that > activates non-rpool pools and causes them to magically mount their > filesystems? > > Thanks in advance if anyone knows this. > > Background: > I am having an extremely weird heisenbug problem where on boot[*] our > test OmniOS machine fails out at the ZFS mount stage with errors about: > > Reading ZFS config: done. > Mounting ZFS filesystems: cannot mount 'fs3-test-01': mountmount or data is busy > cannot mount '/fs3-test-02': directory is not empty > cannot mount 'fs3-test-02/h/999': mountpoint or dataset is busy > (20/20) > svc:/system/filesystem/local:default: WARNING: /usr/sbin/zfs mount -a foiled: exit status 1 > [failures go on] > > The direct problem here is that as far as I can tell this is incorrect. > If I log in to the console after this failure, the pools and their > filesystems are present. If I hack up /lib/svc/method/fs-local to add > debugging stuff, all of the directories involved are empty (and unmounted) > before 'zfs mount -a' runs and magically present afterwards, even as 'zfs > mount -a' complains and errors out. That was when I started truss'ing > the 'zfs mount -a' itself and discovered that it normally doesn't mount > non-rpool filesystems. In fact, based on a truss trace I have during an > incident it appears that the problem happens exactly when 'zfs mount -a' > thinks that it *does* need to mount such a filesystem but finds that > the target directory already has things in it because the filesystem is > actually mounted already. > > Running truss on the 'zfs mount -a' seems to make this happen much less > frequently, especially a relatively verbose truss that is tracing calls > in libzfs as well as system calls. This makes me wonder if there is some > sort of a race involved. > > - cks > [*: the other problem is that the test OmniOS machine has stopped actually > rebooting when I run 'reboot'; it hangs during shutdown and must be > power cycled (and I have the magic fastboot settings turned off). > Neither this nor the mount problem used to happen; both appeared this > morning. No packages have been updated. 
> ] > _______________________________________________ > OmniOS-discuss mailing list > OmniOS-discuss at lists.omniti.com > http://lists.omniti.com/mailman/listinfo/omnios-discuss -- Mark Harrison Lead Site Reliability Engineer OmniTI From mir at miras.org Tue Mar 4 23:35:28 2014 From: mir at miras.org (Michael Rasmussen) Date: Wed, 5 Mar 2014 00:35:28 +0100 Subject: [OmniOS-discuss] ZFS trim support In-Reply-To: <20140301144622.12a79ac6@sleipner.datanom.net> References: <20140301144622.12a79ac6@sleipner.datanom.net> Message-ID: <20140305003528.1514e9be@sleipner.datanom.net> On Sat, 1 Mar 2014 14:46:22 +0100 Michael Rasmussen wrote: Thanks for your answers. I have followed the advice of not partition more than 80%. For the part whether sequential writes have impact or not. From the disk's point of view a write is a write and therefore any write will impact the need for trim in one way or another. -- Hilsen/Regards Michael Rasmussen Get my public GnuPG keys: michael rasmussen cc http://pgp.mit.edu:11371/pks/lookup?op=get&search=0xD3C9A00E mir datanom net http://pgp.mit.edu:11371/pks/lookup?op=get&search=0xE501F51C mir miras org http://pgp.mit.edu:11371/pks/lookup?op=get&search=0xE3E80917 -------------------------------------------------------------- /usr/games/fortune -es says: Debian Hint #21: If your Debian box is behind a slow network connection, but you have access to a fast one as well, check out the apt-zip package. -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 198 bytes Desc: not available URL: From cks at cs.toronto.edu Tue Mar 4 23:42:40 2014 From: cks at cs.toronto.edu (Chris Siebenmann) Date: Tue, 04 Mar 2014 18:42:40 -0500 Subject: [OmniOS-discuss] How do non-rpool ZFS filesystems get mounted? In-Reply-To: mark's message of Tue, 04 Mar 2014 18:29:42 -0500. Message-ID: <20140304234240.766661A02E9@apps0.cs.toronto.edu> | You mention 'directories' being empty. Does /fs3-test-02 contain empty | directories before being mounted? It doesn't. All of /fs3-test-01, /fs3-test-02, /h/281, and /h/999 are empty before 'zfs mount -a' runs (I've verified this with ls's immediately before the 'zfs mount -a' in /lib/svc/method/fs-local). | I'm not sure why you're not seeing truss show zfs trying to mount | non-rpool filesystems, but it should be doing so. My truss traces on successful boot are quite definitive about this. It clearly looks to see if a lot of fs's are mounted and finds that they are. I've put one captured trace up here, if people are interested: http://www.cs.toronto.edu/~cks/t/fs-local-truss-good-boot.txt Notice that calls to libzfs:zfs_is_mounted() return either 0 or 1. Calls that return 0 are followed by a call to libzfs:zfs_mount() (and an actual mount operation); calls that return 1 aren't. Clearly 'zfs mount -a' is checking a bunch more filesystems than it actually is mounting. (I don't know if there's a way to make truss dump the first argument to libzfs:zfs_is_mounted() as a string so that one can see what mount points are being checked.) A truss from a bad boot is http://www.cs.toronto.edu/~cks/t/fs-local-truss-bad-boot.txt This doesn't have the libzfs trace information, just the syscalls, but you can see a similar sequence of syscall level operations right up to the point where it does getdents64() on /h/281 and finds it *not* empty (a 232-byte return value instead of a 48-byte one). Based on the information from the good trace, this is a safety check inside libzfs:zfs_mount(). 
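(One rough way to watch this from the outside, rather than squeezing it out of truss, is a DTrace one-liner on the mount syscall itself; just a sketch, but it prints who mounted what and where:

# dtrace -qn 'syscall::mount:entry { printf("%Y %s[%d] %s -> %s\n", walltimestamp, execname, pid, copyinstr(arg0), copyinstr(arg1)); }'

Started from the top of /lib/svc/method/fs-local, much like the truss runs, it should show whether anything other than 'zfs mount -a' is doing some of the mounting.)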
- cks From jimklimov at cos.ru Wed Mar 5 08:56:50 2014 From: jimklimov at cos.ru (Jim Klimov) Date: Wed, 05 Mar 2014 09:56:50 +0100 Subject: [OmniOS-discuss] How do non-rpool ZFS filesystems get mounted? In-Reply-To: References: <20140304230313.AE1AF1A02E9@apps0.cs.toronto.edu> Message-ID: <5316E6D2.7020404@cos.ru> On 2014-03-05 00:29, Mark Harrison wrote: > You mention 'directories' being empty. Does /fs3-test-02 contain empty > directories before being mounted? If so, this will be why zfs thinks > it's isn't empty and then fail to mount it. However, the child > filesystems might still mount because their directories are empty, > giving the appearance of everything being mounted OK. Just in case, such cases my be verified with df which returns the actual mounted filesystem which provides the tested directory or file: # df -k /lib/libzfs.so /lib/libc.so /var/log/syslog Filesystem kbytes used avail capacity Mounted on rpool/ROOT/sol10u10 30707712 1826637 7105279 21% / rpool/ROOT/sol10u10/usr 30707712 508738 7105279 7% /usr rpool/SHARED/var/log 4194304 1491 3638955 1% /var/log This way you can test for example if a directory is "standalone" or an actively used mountpoint of a ZFS POSIX dataset. I think a "zpool list" can help in your debugging to see if the pools in question are in fact imported before "zfs mount -a", or if some unexpected magic happens and the "zfs" command does indeed trigger the imports. On 2014-03-05 00:03, Chris Siebenmann wrote: > As far as I can tell from running truss on the 'zfs mount -a' in > /lib/svc/method/fs-local, this *does not* mount filesystems from pools > other than rpool. However the mounts are absent immediately before it > runs and present immediately afterwards. So: does anyone understand > how this works? I assume 'zfs mount -a' is doing some ZFS action that > activates non-rpool pools and causes them to magically mount their > filesystems? Regarding the "zfs mount -a" - I am not sure why it errors out in your case, I can only think of some extended attributes being in use, or overlay-mounts, or stuff like that - though such things are likely to come up in "strange" runtime cases to mostly block un-mounts, not in orderly startup scenarios... Namely, one thing that may be a problem is if a directory in question is a current-working-dir for some process, or if a file has been created, used, deleted (while it remains open by some process) which is quite possible for the likes of /var/tmp paths. But even so, it is likely to block unmounts but not over-mounts as long as the directory is (seems) empty. Also, as at least a workaround, you can switch the mountpoint to "legacy" and refer the dataset from /etc/vfstab including the "-O" option for overlay-mount. Unfortunately there is no equivalent dataset attribute at the moment, so it is not a very convenient solution for possible trees of datasets - but may be quite acceptable for leaf datasets where you don't need to automate any sub-mounts. Vote for https://www.illumos.org/issues/997 ;) And finally, I also don't know where the pools get imported, but "zfs mount -a" *should* only mount datasets with canmount=on and zoned=off (if in global zone) and a valid mountpoint path, picked from any pools imported at the moment. The mounts from different pools may be done in parallel, so if you need some specific order of mounts (i.e. rpool/export/home and then datapool/export/home/user... 
okay, there is in fact no problem with these - but just to give *some* viable example) you may have to specify stuff in /etc/vfstab. I can guess (but would need to grok the code) that something like "zpool import -N -a" is done in some part of the root environment preparation to prepare all pools referenced in /etc/zfs/zpool.cache, perhaps some time after the rpool is imported and the chosen root dataset is mounted explicitly to anchor the running kernel. As another workaround, you can export the pool which contains your "problematic" datasets so it is un-cached from zpool.cache and is not automatically imported nor mounted during the system bootup - so that the system becomes able to boot successfully to the point of being accessible over ssh for example. Then you import and mount that other pool as an SMF service, upon which your other services can depend to proceed, see here for ideas and code snippets: http://wiki.openindiana.org/oi/Advanced+-+ZFS+Pools+as+SMF+services+and+iSCSI+loopback+mounts HTH, //Jim Klimov From cks at cs.toronto.edu Wed Mar 5 15:50:51 2014 From: cks at cs.toronto.edu (Chris Siebenmann) Date: Wed, 05 Mar 2014 10:50:51 -0500 Subject: [OmniOS-discuss] How do non-rpool ZFS filesystems get mounted? In-Reply-To: Your message of Wed, 05 Mar 2014 09:56:50 +0100. <5316E6D2.7020404@cos.ru> Message-ID: <20140305155052.05A401A0826@apps0.cs.toronto.edu> | I think a "zpool list" can help in your debugging to see if the pools | in question are in fact imported before "zfs mount -a", or if some | unexpected magic happens and the "zfs" command does indeed trigger the | imports. Sorry for not mentioning this before: a 'zpool list' before the 'zfs mount -a' lists the pools as visible, but both df and 'mount -v' do not report any filesystems from the two additional pools (the ones that get mount failures and so on). | The mounts from different pools may be done in parallel, so if you | need some specific order of mounts (i.e. rpool/export/home and then | datapool/export/home/user... okay, there is in fact no problem with | these - but just to give *some* viable example) you may have to | specify stuff in /etc/vfstab. As far as I can tell from the Illumos code, this is not the case. The code certainly seems to be single-threaded and it sorts the mount list into order in a way that should put prerequisite mounts first (eg you mount /a and then /a/b). (This potential issue also doesn't apply to my case because all four of the mounts from these pools are in the root filesystem, not in any sub-filesystem.) | I can guess (but would need to grok the code) that something | like "zpool import -N -a" is done in some part of the root | environment preparation to prepare all pools referenced in | /etc/zfs/zpool.cache, perhaps some time after the rpool is | imported and the chosen root dataset is mounted explicitly | to anchor the running kernel. The last time I spelunked the OpenSolaris code some years ago, the kernel read zpool.cache very early on but only sort of half-activated pools then (eg it didn't check to see if all vdevs were present). Pools were brought to full activation essentially as a side effect of doing other operations with/to them. I don't know if this is still the state of affairs in Illumos/OmniOS today and how such half-activated pools show up during early boot (eg if they appear in 'zpool list', or even if simply running 'zpool list' is enough to bring them to fully active status). 
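(For whatever it's worth, the pool configurations that the kernel will see at boot can at least be inspected from the cache file without importing anything:

# zdb -C

which, as far as I know, just reads /etc/zfs/zpool.cache from userland and so shouldn't change the kernel's idea of the pools' state.)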
- cks From dswartz at druber.com Wed Mar 5 16:29:56 2014 From: dswartz at druber.com (Dan Swartzendruber) Date: Wed, 5 Mar 2014 11:29:56 -0500 Subject: [OmniOS-discuss] How do non-rpool ZFS filesystems get mounted? In-Reply-To: <20140305155052.05A401A0826@apps0.cs.toronto.edu> References: <20140305155052.05A401A0826@apps0.cs.toronto.edu> Message-ID: This is all very strange. I saw stuff like this all the time when I was using ZFS on Linux, due to timing where an HBA would not present devices quickly enough, resulting in missing pools, missing/unmounted datasets, etc, which would all get 'fixed' if you manually re-did them, but I've never seen it in omniOS. From bdha at mirrorshades.net Wed Mar 5 17:17:27 2014 From: bdha at mirrorshades.net (Bryan Horstmann-Allen) Date: Wed, 5 Mar 2014 12:17:27 -0500 Subject: [OmniOS-discuss] How do non-rpool ZFS filesystems get mounted? In-Reply-To: References: <20140305155052.05A401A0826@apps0.cs.toronto.edu> Message-ID: <0635F3D6-C531-480C-8A79-83BE84D5DD79@mirrorshades.net> I've seen that bug on SmartOS. Fixed in the last month or two. -- bdha > On Mar 5, 2014, at 11:29, "Dan Swartzendruber" wrote: > > > This is all very strange. I saw stuff like this all the time when I was > using ZFS on Linux, due to timing where an HBA would not present devices > quickly enough, resulting in missing pools, missing/unmounted datasets, > etc, which would all get 'fixed' if you manually re-did them, but I've > never seen it in omniOS. > > _______________________________________________ > OmniOS-discuss mailing list > OmniOS-discuss at lists.omniti.com > http://lists.omniti.com/mailman/listinfo/omnios-discuss From dswartz at druber.com Wed Mar 5 17:29:57 2014 From: dswartz at druber.com (Dan Swartzendruber) Date: Wed, 5 Mar 2014 12:29:57 -0500 Subject: [OmniOS-discuss] How do non-rpool ZFS filesystems get mounted? In-Reply-To: <0635F3D6-C531-480C-8A79-83BE84D5DD79@mirrorshades.net> References: <20140305155052.05A401A0826@apps0.cs.toronto.edu> <0635F3D6-C531-480C-8A79-83BE84D5DD79@mirrorshades.net> Message-ID: > I've seen that bug on SmartOS. Fixed in the last month or two. Any explanation as to what was happening? From bdha at mirrorshades.net Wed Mar 5 17:46:43 2014 From: bdha at mirrorshades.net (Bryan Horstmann-Allen) Date: Wed, 5 Mar 2014 12:46:43 -0500 Subject: [OmniOS-discuss] How do non-rpool ZFS filesystems get mounted? In-Reply-To: References: <20140305155052.05A401A0826@apps0.cs.toronto.edu> <0635F3D6-C531-480C-8A79-83BE84D5DD79@mirrorshades.net> Message-ID: <20140305174643.GA16938@lab.pobox.com> +------------------------------------------------------------------------------ | On 2014-03-05 12:29:57, Dan Swartzendruber wrote: | | Any explanation as to what was happening? This is the bug I was hitting: http://smartos.org/bugview/OS-2616 Devices wouldn't be available at boot, but would once the system was up. -- bdha cyberpunk is dead. long live cyberpunk. From dswartz at druber.com Wed Mar 5 17:53:02 2014 From: dswartz at druber.com (Dan Swartzendruber) Date: Wed, 5 Mar 2014 12:53:02 -0500 Subject: [OmniOS-discuss] How do non-rpool ZFS filesystems get mounted? 
In-Reply-To: <20140305174643.GA16938@lab.pobox.com> References: <20140305155052.05A401A0826@apps0.cs.toronto.edu> <0635F3D6-C531-480C-8A79-83BE84D5DD79@mirrorshades.net> <20140305174643.GA16938@lab.pobox.com> Message-ID: <933d63e9b39154da4445a368a76d2279.squirrel@webmail.druber.com> > +------------------------------------------------------------------------------ > | On 2014-03-05 12:29:57, Dan Swartzendruber wrote: > | > | Any explanation as to what was happening? > > This is the bug I was hitting: http://smartos.org/bugview/OS-2616 > > Devices wouldn't be available at boot, but would once the system was up. Interesting. Thanks for posting this! From danmcd at omniti.com Wed Mar 5 17:53:32 2014 From: danmcd at omniti.com (Dan McDonald) Date: Wed, 5 Mar 2014 12:53:32 -0500 Subject: [OmniOS-discuss] How do non-rpool ZFS filesystems get mounted? In-Reply-To: <20140305174643.GA16938@lab.pobox.com> References: <20140305155052.05A401A0826@apps0.cs.toronto.edu> <0635F3D6-C531-480C-8A79-83BE84D5DD79@mirrorshades.net> <20140305174643.GA16938@lab.pobox.com> Message-ID: <4D262C1E-E99B-4251-8DE0-C7FD955AC932@omniti.com> On Mar 5, 2014, at 12:46 PM, Bryan Horstmann-Allen wrote: > +------------------------------------------------------------------------------ > | On 2014-03-05 12:29:57, Dan Swartzendruber wrote: > | > | Any explanation as to what was happening? > > This is the bug I was hitting: http://smartos.org/bugview/OS-2616 > > Devices wouldn't be available at boot, but would once the system was up. I believe that bugfix is in illumos-gate now as: https://www.illumos.org/issues/4500 which was fixed by this changeset: https://github.com/illumos/illumos-gate/commit/da5ab83fc888325fc812733d8a54bc5eab65c65c and it *should* be in bloody now: https://github.com/omniti-labs/illumos-omnios/commit/da5ab83fc888325fc812733d8a54bc5eab65c65c Dan From ikaufman at eng.ucsd.edu Wed Mar 5 18:03:12 2014 From: ikaufman at eng.ucsd.edu (Ian Kaufman) Date: Wed, 5 Mar 2014 10:03:12 -0800 Subject: [OmniOS-discuss] How do non-rpool ZFS filesystems get mounted? In-Reply-To: <20140304234240.766661A02E9@apps0.cs.toronto.edu> References: <20140304234240.766661A02E9@apps0.cs.toronto.edu> Message-ID: > It doesn't. All of /fs3-test-01, /fs3-test-02, /h/281, and /h/999 > are empty before 'zfs mount -a' runs (I've verified this with ls's > immediately before the 'zfs mount -a' in /lib/svc/method/fs-local). > As a test, try renaming those "empty" directories and then reboot. We saw this issue with Solaris 10, where on reboot, the filesystems did not unmount cleanly, and failed to mount at boot. Ian -- Ian Kaufman Research Systems Administrator UC San Diego, Jacobs School of Engineering ikaufman AT ucsd DOT edu From cks at cs.toronto.edu Wed Mar 5 21:23:56 2014 From: cks at cs.toronto.edu (Chris Siebenmann) Date: Wed, 05 Mar 2014 16:23:56 -0500 Subject: [OmniOS-discuss] How do non-rpool ZFS filesystems get mounted? In-Reply-To: danmcd's message of Wed, 05 Mar 2014 12:53:32 -0500. <4D262C1E-E99B-4251-8DE0-C7FD955AC932@omniti.com> Message-ID: <20140305212356.EE1AA1A0826@apps0.cs.toronto.edu> With the aid of DTrace (and Illumos source) I have traced down what is going on and where the race is. The short version is that the 'zfs mount -a' in /lib/svc/method/fs-local is racing with syseventd's ZFS module. 
I have a dtrace capture (well, several of them) that shows this clearly: http://www.cs.toronto.edu/~cks/t/fs-local-mounttrace.txt (produced by http://www.cs.toronto.edu/~cks/t/mounttrace.d which I started at the top of /lib/svc/method/fs-local.) Looking at various things suggests that this may be happening partly because these additional pools are on iSCSI disks and the iSCSI disks seem to be taking a bit of time to show up (I've never fully understood how iSCSI disks are probed by Illumos). This may make it spiritually related to the bug that Bryan Horstmann-Allen mentioned in that both result in delayed device appearances. The following is a longer explanation of the race and assumes you have some familiarity with Illumos ZFS kernel internals. - pools present in /etc/zfs/zpool.cache are loaded into the kernel very early in boot, but they are not initialized and activated. This is done in spa_config_load(), calling spa_add(), which sets them to spa->spa_state = POOL_STATE_UNINITIALIZED. - inactive pools are activated through spa_activate(), which is called (among other times) whenever you open a pool. By a chain of calls this happens any time you make a ZFS IOCTL that involves a pool name. zfsdev_ioctl() -> pool_status_check() -> spa_open() -> etc. - 'zfs mount -a' of course does ZFS IOCTLs that involve pools because it wants to get pool configurations to find out what datasets it might have to mount. As such, it activate all additional pools present in zpool.cache when it runs (assuming that their vdev configuration is good, of course). - when a pool is activated this way in our environment, some sort of events are delivered to syseventd. I don't know enough about syseventd to say exactly what sort of event it is and it may well be iSCSI disk 'device appeared' messages. I have a very verbose syseventd debugging dump but I don't know enough to see anything useful in it. - when syseventd gets these events, its ZFS module decides that it too should mount (aka 'activate') all datasets for the newly-active pools. At this point a multithreaded syseventd and 'zfs mount -a' are racing to see who can mount all of the pool datasets, creating two failure modes for 'zfs mount -a'. The first failure mode is simply that syseventd wins the race and fully mounts a filesystem before 'zfs mount -a' looks at it, triggering a safety check of 'directory is not empty'. The second failure mode is that syseventd and 'zfs mount -a' both call mount() on the same filesystem at the same time and syseventd is the one that succeeds. In this case mount() itself will return an error and 'zfs mount -a' will report: cannot mount 'fs3-test-02': mountpoint or dataset is busy - cks From cks at cs.toronto.edu Wed Mar 5 21:38:09 2014 From: cks at cs.toronto.edu (Chris Siebenmann) Date: Wed, 05 Mar 2014 16:38:09 -0500 Subject: [OmniOS-discuss] How do non-rpool ZFS filesystems get mounted? In-Reply-To: cks's message of Wed, 05 Mar 2014 16:23:56 -0500. <20140305212356.EE1AA1A0826@apps0.cs.toronto.edu> Message-ID: <20140305213809.3DC421A0826@apps0.cs.toronto.edu> It turns out that there is an unpleasant consequence to syseventd being willing to mount ZFS filesystems for additional pools before the 'zfs mount -a' has run: you can get unresolvable mount conflicts in some situations. Suppose that you have /opt as a separate ZFS filesystem in your root pool and you also have /opt/bigthing as a ZFS filesystem in a second pool. 
You can set this up and everything looks right, but if you reboot and syseventd beats 'zfs mount -a' for whatever reasons, you get an explosion: - we start with no additional filesystems mounted, including /opt - syseventd grabs the second pool, starts mounting things, and mounts /opt/bigthing on the *bare* root filesystem, making /opt (if necessary) in the process. - 'zfs mount -a' reaches /opt and attempts to mount it. However, because syseventd has already mounted /opt/bigthing, /opt is not empty. FAILURE. As far as I can tell there is no particularly good cure for this. To me it really looks like syseventd should either not be started before fs-local (although I don't know if anything breaks if its startup is deferred) or that it should not be mounting ZFS filesystems (although I can half-see the attraction of it doing so). - cks From cks at cs.toronto.edu Fri Mar 7 19:34:53 2014 From: cks at cs.toronto.edu (Chris Siebenmann) Date: Fri, 07 Mar 2014 14:34:53 -0500 Subject: [OmniOS-discuss] Reproducible r151008j kernel crash with ZFS pools on iSCSI Message-ID: <20140307193454.0E2911A0488@apps0.cs.toronto.edu> I have a reproducible kernel crash with OmniOS r151008j. The situation: The basic setup is a ZFS pool on mirrored pairs of iSCSI disks. The iSCSI disks come from two different iSCSI targets, and all targets are multipathed over two 10G networks. The pool is set to 'failmode=continue'. If I start a large streaming write to the pool and then take down both iSCSI interfaces on both targets (making all disks in the pool completely unavailable), OmniOS panics after a couple of minutes. Fortunately this doesn't happen if only a single target becomes inaccessible. I have crash dumps and can run commands against them and so on. Just tell me what to look at/do/etc. Since this is a test environment I can also reproduce this on demand and I'm willing test things freely. 
One panic produced the following: Mar 7 10:20:50 sanjuan ^Mpanic[cpu3]/thread=ffffff007c4dbc40: BAD TRAP: type=8 (#df Double fault) rp=ffffff114dca2f10 addr=0 zpool-fs3-test-0: #df Double fault pid=463, pc=0xfffffffff7903bb8, sp=0xffffff007c4d7000, eflags=0x10086 cr0: 8005003b cr4: 426f8 cr2: ffffff007c4d6ff8 cr3: bc00000 cr8: 0 rdi: ffffff1157501a80 rsi: 5 rdx: ffffff007c4d70c0 rcx: 5 r8: ffffff1275342d58 r9: 1 rax: 3 rbx: ffffff1157501a80 rbp: ffffff007c4d7050 r10: 0 r11: ffffffff r12: 5 r13: ffffff1142c55b48 r14: ffffff007c4d70c0 r15: 5 fsb: 0 gsb: ffffff1157501a80 ds: 4b es: 4b fs: 0 gs: 1c3 trp: 8 err: 0 rip: fffffffff7903bb8 cs: 30 rfl: 10086 rsp: ffffff007c4d7000 ss: 38 tss.tss_rsp0: 0xffffff007c4dbc40 tss.tss_rsp1: 0x0 tss.tss_rsp2: 0x0 tss.tss_ist1: 0xffffff114dca3000 tss.tss_ist2: 0x0 tss.tss_ist3: 0x0 tss.tss_ist4: 0x0 tss.tss_ist5: 0x0 tss.tss_ist6: 0x0 tss.tss_ist7: 0x0 ffffff114dca2df0 unix:real_mode_stop_cpu_stage2_end+9de3 () ffffff114dca2f00 unix:trap+ca5 () ffffff007c4d7050 unix:_patch_xrstorq_rbx+196 () ffffff007c4d70b0 apix:apix_do_interrupt+372 () ffffff007c4d70c0 unix:cmnint+ba () ffffff007c4d7200 genunix:avl_remove+197 () ffffff007c4d7240 zfs:vdev_queue_io_remove+54 () ffffff007c4d7600 zfs:vdev_queue_io_to_issue+133 () ffffff007c4d7640 zfs:vdev_queue_io_done+88 () ffffff007c4d7680 zfs:zio_vdev_io_done+80 () ffffff007c4d76c0 zfs:zio_execute+88 () ffffff007c4d7700 zfs:vdev_queue_io_done+78 () ffffff007c4d7740 zfs:zio_vdev_io_done+80 () ffffff007c4d7780 zfs:zio_execute+88 () ffffff007c4d77c0 zfs:vdev_queue_io_done+78 () ffffff007c4d7800 zfs:zio_vdev_io_done+80 () ffffff007c4d7840 zfs:zio_execute+88 () ffffff007c4d7880 zfs:vdev_queue_io_done+78 () ffffff007c4d78c0 zfs:zio_vdev_io_done+80 () ffffff007c4d7900 zfs:zio_execute+88 () ffffff007c4d7940 zfs:vdev_queue_io_done+78 () ffffff007c4d7980 zfs:zio_vdev_io_done+80 () ffffff007c4d79c0 zfs:zio_execute+88 () ffffff007c4d7a00 zfs:vdev_queue_io_done+78 () ffffff007c4d7a40 zfs:zio_vdev_io_done+80 () ffffff007c4d7a80 zfs:zio_execute+88 () ffffff007c4d7ac0 zfs:vdev_queue_io_done+78 () ffffff007c4d7b00 zfs:zio_vdev_io_done+80 () ffffff007c4d7b40 zfs:zio_execute+88 () ffffff007c4d7b80 zfs:vdev_queue_io_done+78 () ffffff007c4d7bc0 zfs:zio_vdev_io_done+80 () ffffff007c4d7c00 zfs:zio_execute+88 () ffffff007c4d7c40 zfs:vdev_queue_io_done+78 () ffffff007c4d7c80 zfs:zio_vdev_io_done+80 () ffffff007c4d7cc0 zfs:zio_execute+88 () ffffff007c4d7d00 zfs:vdev_queue_io_done+78 () ffffff007c4d7d40 zfs:zio_vdev_io_done+80 () ffffff007c4d7d80 zfs:zio_execute+88 () ffffff007c4d7dc0 zfs:vdev_queue_io_done+78 () ffffff007c4d7e00 zfs:zio_vdev_io_done+80 () ffffff007c4d7e40 zfs:zio_execute+88 () ffffff007c4d7e80 zfs:vdev_queue_io_done+78 () ffffff007c4d7ec0 zfs:zio_vdev_io_done+80 () ffffff007c4d7f00 zfs:zio_execute+88 () ffffff007c4d7f40 zfs:vdev_queue_io_done+78 () ffffff007c4d7f80 zfs:zio_vdev_io_done+80 () ffffff007c4d7fc0 zfs:zio_execute+88 () ffffff007c4d8000 zfs:vdev_queue_io_done+78 () ffffff007c4d8040 zfs:zio_vdev_io_done+80 () ffffff007c4d8080 zfs:zio_execute+88 () ffffff007c4d80c0 zfs:vdev_queue_io_done+78 () ffffff007c4d8100 zfs:zio_vdev_io_done+80 () ffffff007c4d8140 zfs:zio_execute+88 () ffffff007c4d8180 zfs:vdev_queue_io_done+78 () ffffff007c4d81c0 zfs:zio_vdev_io_done+80 () ffffff007c4d8200 zfs:zio_execute+88 () ffffff007c4d8240 zfs:vdev_queue_io_done+78 () ffffff007c4d8280 zfs:zio_vdev_io_done+80 () ffffff007c4d82c0 zfs:zio_execute+88 () ffffff007c4d8300 zfs:vdev_queue_io_done+78 () ffffff007c4d8340 zfs:zio_vdev_io_done+80 () 
ffffff007c4d8380 zfs:zio_execute+88 () ffffff007c4d83c0 zfs:vdev_queue_io_done+78 () ffffff007c4d8400 zfs:zio_vdev_io_done+80 () ffffff007c4d8440 zfs:zio_execute+88 () ffffff007c4d8480 zfs:vdev_queue_io_done+78 () ffffff007c4d84c0 zfs:zio_vdev_io_done+80 () ffffff007c4d8500 zfs:zio_execute+88 () ffffff007c4d8540 zfs:vdev_queue_io_done+78 () ffffff007c4d8580 zfs:zio_vdev_io_done+80 () ffffff007c4d85c0 zfs:zio_execute+88 () ffffff007c4d8600 zfs:vdev_queue_io_done+78 () ffffff007c4d8640 zfs:zio_vdev_io_done+80 () Warning: stack in the dump buffer may be incomplete ffffff007c4d8680 zfs:zio_execute+88 () Warning: stack in the dump buffer may be incomplete ffffff007c4d86c0 zfs:vdev_queue_io_done+78 () Warning: stack in the dump buffer may be incomplete ffffff007c4d8700 zfs:zio_vdev_io_done+80 () Warning: stack in the dump buffer may be incomplete [... repeats a lot ...] ffffff007c4db930 zfs:vdev_queue_io_done+78 () Warning: stack in the dump buffer may be incomplete ffffff007c4db970 zfs:zio_vdev_io_done+80 () Warning: stack in the dump buffer may be incomplete ffffff007c4db9b0 zfs:zio_execute+88 () Warning: stack in the dump buffer may be incomplete ffffff007c4db9f0 zfs:vdev_queue_io_done+78 () Warning: stack in the dump buffer may be incomplete ffffff007c4dba30 zfs:zio_vdev_io_done+80 () Warning: stack in the dump buffer may be incomplete ffffff007c4dba70 zfs:zio_execute+88 () Warning: stack in the dump buffer may be incomplete ffffff007c4dbb30 genunix:taskq_thread+2d0 () Warning: stack in the dump buffer may be incomplete ffffff007c4dbb40 unix:thread_start+8 () Warning: stack in the dump buffer may be incomplete syncing file systems... done A second crash has a very similar backtrace but the front is different: ffffff1157a61df0 unix:real_mode_stop_cpu_stage2_end+9de3 () ffffff1157a61f00 unix:trap+ca5 () ffffff007b7ce000 unix:_patch_xrstorq_rbx+196 () ffffff007b7ce070 genunix:avl_find+72 () ffffff007b7ce0b0 genunix:avl_add+27 () ffffff007b7ce0f0 zfs:vdev_queue_pending_add+4b () ffffff007b7ce4b0 zfs:vdev_queue_io_to_issue+153 () ffffff007b7ce4f0 zfs:vdev_queue_io_done+88 () ffffff007b7ce530 zfs:zio_vdev_io_done+80 () ffffff007b7ce570 zfs:zio_execute+88 () ffffff007b7ce5b0 zfs:vdev_queue_io_done+78 () ffffff007b7ce5f0 zfs:zio_vdev_io_done+80 () ffffff007b7ce630 zfs:zio_execute+88 () ffffff007b7ce670 zfs:vdev_queue_io_done+78 () ffffff007b7ce6b0 zfs:zio_vdev_io_done+80 () [... repeating pattern repeats ...] - cks From zembower at criterion.com Fri Mar 7 20:28:11 2014 From: zembower at criterion.com (Chris Zembower) Date: Fri, 7 Mar 2014 15:28:11 -0500 Subject: [OmniOS-discuss] System hangs every few days Message-ID: I was at about 6 months of uptime, then added some new SSD's for cache to the motherboard SATA ports. They weren't hot-plug recognized, so I rebooted over the weekend. Added the caches, all seemed good. Five days later, the system was locked. No kernel panic, just a frozen console and no network access. Not ping-able. Looking through the logs, I saw mostly just the typical (and benign?) netatalk messages: ------ mDNSResponder: [ID 702911 daemon.error] ERROR: getOptRdata - unknown opt 4 mDNSResponder: [ID 702911 daemon.error] Correcting TTL from 4500 to 3600 for 312 nexus ------- Etc. 
But also, something new right before the crash: ------ Mar 7 11:18:28 colossus mac: [ID 486395 kern.info] NOTICE: igb3 link down Mar 7 11:18:28 colossus mac: [ID 486395 kern.info] NOTICE: igb4 link down Mar 7 11:18:28 colossus mac: [ID 486395 kern.info] NOTICE: igb2 link down Mar 7 11:18:28 colossus mac: [ID 486395 kern.info] NOTICE: igb5 link down Mar 7 11:18:28 colossus mac: [ID 486395 kern.info] NOTICE: aggr1000 link down Mar 7 11:18:30 colossus mac: [ID 435574 kern.info] NOTICE: igb3 link up, 1000 Mbps, full duplex Mar 7 11:18:30 colossus mac: [ID 435574 kern.info] NOTICE: aggr1000 link up, 1000 Mbps, full duplex Mar 7 11:18:30 colossus mac: [ID 435574 kern.info] NOTICE: igb2 link up, 1000 Mbps, full duplex Mar 7 11:18:30 colossus mac: [ID 435574 kern.info] NOTICE: igb4 link up, 1000 Mbps, full duplex Mar 7 11:18:30 colossus mac: [ID 435574 kern.info] NOTICE: igb5 link up, 1000 Mbps, full duplex Mar 7 11:18:35 colossus mac: [ID 486395 kern.info] NOTICE: igb3 link down Mar 7 11:18:35 colossus mac: [ID 486395 kern.info] NOTICE: igb4 link down Mar 7 11:18:35 colossus mac: [ID 486395 kern.info] NOTICE: igb2 link down Mar 7 11:18:36 colossus mac: [ID 486395 kern.info] NOTICE: igb5 link down Mar 7 11:18:36 colossus mac: [ID 486395 kern.info] NOTICE: aggr1000 link down ------ This goes on indefinitely, interfaces going down, coming up, over and over. All of the igb interfaces listed here are part of an aggregate group (although it's actually called aggr1, not aggr1000?). The other interfaces (2 additional igb's and 4 ixgbe's) did not log error messages, but by this point the server is unresponsive via ssh over the network and at the console. Interesting however, is that established file-sharing connections over the unaffected interfaces continue to function for quite a whole after the lockup, all night in one case. This includes AFP, SMB, and iSCSI (giving me enough time to shut down my virtual machines and log off some key clients). In other words, the zpools are functional, and so are enough services to keep that particular type of access alive. Establishing new connections over those protocols after the incident doesn't appear to be possible. A hard reboot is necessary to regain access to the console and permit new connections. My initial thought was that it could be an issue with the switch, but that seems unlikely because I have other LACP groups that are unaffected. I'm also thinking that it can't be a coincidence that this only started happening right after that initial reboot? Since that reboot, this crash has happened three times. The first, as I noted, was five days after the reconfiguration, but now they seem to be happening slightly more frequently, although they're always several days apart. I'm considering reverting to a base install and rebuilding the system config this weekend, as it's very basic.. but still curious if anyone has seen this type of behavior before. Regards, Chris -------------- next part -------------- An HTML attachment was scrubbed... URL: From cks at cs.toronto.edu Fri Mar 7 20:49:44 2014 From: cks at cs.toronto.edu (Chris Siebenmann) Date: Fri, 07 Mar 2014 15:49:44 -0500 Subject: [OmniOS-discuss] Bug: OmniOS r151008j terminates iSCSI initiator too early in shutdown Message-ID: <20140307204944.5BB0A1A0488@apps0.cs.toronto.edu> In at least OmniOS r151008j, the iSCSI initiator and thus any iSCSI disks it has established are shut down relatively early during a shutdown or reboot. 
In specific they are terminated before halt et al runs '/sbin/bootadm -ea update_all' (in halt.c's do_archives_update()). Under some circumstances this will cause system shutdown to hang. Suppose that you have ZFS pools that are hosted on iSCSI disks and those pools are set to the default 'failmode=wait'. When the iSCSI disks go away due to initiator shutdown, those pools enter a state where any IO to them will stall. Unfortunately bootadm does such IO (or at least does something that stalls in ZFS-land) and as such will itself stall, which stalls the shutdown process. Presumably either bootadm should be run earlier or iSCSI initiator shutdown should happen later or both. - cks From jimklimov at cos.ru Sat Mar 8 11:50:07 2014 From: jimklimov at cos.ru (Jim Klimov) Date: Sat, 08 Mar 2014 12:50:07 +0100 Subject: [OmniOS-discuss] Bug: OmniOS r151008j terminates iSCSI initiator too early in shutdown In-Reply-To: <20140307204944.5BB0A1A0488@apps0.cs.toronto.edu> References: <20140307204944.5BB0A1A0488@apps0.cs.toronto.edu> Message-ID: <531B03EF.7070501@cos.ru> On 2014-03-07 21:49, Chris Siebenmann wrote: > In at least OmniOS r151008j, the iSCSI initiator and thus any iSCSI > disks it has established are shut down relatively early during a shutdown > or reboot. In specific they are terminated before halt et al runs > '/sbin/bootadm -ea update_all' (in halt.c's do_archives_update()). > Under some circumstances this will cause system shutdown to hang. > > Suppose that you have ZFS pools that are hosted on iSCSI disks and > those pools are set to the default 'failmode=wait'. When the iSCSI disks > go away due to initiator shutdown, those pools enter a state where any > IO to them will stall. Unfortunately bootadm does such IO (or at least > does something that stalls in ZFS-land) and as such will itself stall, > which stalls the shutdown process. > > Presumably either bootadm should be run earlier or iSCSI initiator > shutdown should happen later or both. I guess you can control the order of shutdown procedures with SMF dependencies. In particular, it might be helpful to ensure that your system completely exports the remote-hosted pools before disabling iSCSI (and possibly networking, etc.). I hope that my write-up on the OI wiki would be relevant here: http://wiki.openindiana.org/oi/Advanced+-+ZFS+Pools+as+SMF+services+and+iSCSI+loopback+mounts Likewise, any of your services which need data from this pool and can be wrapped into SMF (like VM's, zones, etc.) can also be sure to stop properly before you export the pool and stop iSCSI. http://wiki.openindiana.org/display/oi/Zones+as+SMF+services I do mean to brush up those articles and code samples into a more proper form (a package or something), but in the meanwhile the articles can do with some manual work on the user's side ;) HTH, //Jim From jimklimov at cos.ru Sat Mar 8 17:11:36 2014 From: jimklimov at cos.ru (Jim Klimov) Date: Sat, 08 Mar 2014 18:11:36 +0100 Subject: [OmniOS-discuss] Reproducible r151008j kernel crash with ZFS pools on iSCSI In-Reply-To: <20140307193454.0E2911A0488@apps0.cs.toronto.edu> References: <20140307193454.0E2911A0488@apps0.cs.toronto.edu> Message-ID: <531B4F48.7060005@cos.ru> On 2014-03-07 20:34, Chris Siebenmann wrote: > I have a reproducible kernel crash with OmniOS r151008j. The situation: > > The basic setup is a ZFS pool on mirrored pairs of iSCSI disks. The > iSCSI disks come from two different iSCSI targets, and all > targets are multipathed over two 10G networks. The pool is set to > 'failmode=continue'. 
If I start a large streaming write to the pool and > then take down both iSCSI interfaces on both targets (making all disks > in the pool completely unavailable), OmniOS panics after a couple of > minutes. Fortunately this doesn't happen if only a single target becomes > inaccessible. By "pointing my finger into the sky" I might guesstimate that since you have some streaming writes and they do go on, some buffer space becomes exhausted (perhaps the hanging ZIOs waiting for the storage backends to come back). I would expect the write()'s to not return and thus throttle the clients from pushing more data, but perhaps there are enough client threads trying to write that their maximum buffer spaces combined would overwhelm the particular server. In short: when reproducing the bug, try something like "vmstat 1" in a separate SSH shell, to see if your available memory plummets when you disconnect the devices and/or the "sr" (scanrate, search for swapping) increases substantially. HTH, //Jim Klimov From cks at cs.toronto.edu Sat Mar 8 21:31:16 2014 From: cks at cs.toronto.edu (Chris Siebenmann) Date: Sat, 08 Mar 2014 16:31:16 -0500 Subject: [OmniOS-discuss] Bug: OmniOS r151008j terminates iSCSI initiator too early in shutdown In-Reply-To: Your message of Sat, 08 Mar 2014 12:50:07 +0100. <531B03EF.7070501@cos.ru> Message-ID: <20140308213116.740201A03A0@apps0.cs.toronto.edu> | On 2014-03-07 21:49, Chris Siebenmann wrote: | > In at least OmniOS r151008j, the iSCSI initiator and thus any iSCSI | > disks it has established are shut down relatively early during a shutdown | > or reboot. In specific they are terminated before halt et al runs | > '/sbin/bootadm -ea update_all' (in halt.c's do_archives_update()). | > Under some circumstances this will cause system shutdown to hang. [...] | > Presumably either bootadm should be run earlier or iSCSI initiator | > shutdown should happen later or both. | | I guess you can control the order of shutdown procedures with | SMF dependencies. In particular, it might be helpful to ensure | that your system completely exports the remote-hosted pools | before disabling iSCSI (and possibly networking, etc.). Unfortunately exporting pools on shutdown is an ugly and potentially fragile workaround with a number of side effects (and one that was not necessary on Solaris 10). As far as I can tell from simply looking at things right now, even an orderly shutdown on an OmniOS system will not avoid this. I don't see anything that inactivates pools[*] or even unmounts ZFS filesystems even in an orderly shutdown. And SMF shutdown procedures are deliberately bypassed if you just run 'reboot' (it's in the manpage if you read all the way to the end and ignore the fact that it doesn't talk about SMF). - cks [*: partly because there is no user-level way to do this as far as I know. Explicitly exporting a pool is a different thing.] From cks at cs.toronto.edu Sat Mar 8 21:35:37 2014 From: cks at cs.toronto.edu (Chris Siebenmann) Date: Sat, 08 Mar 2014 16:35:37 -0500 Subject: [OmniOS-discuss] Bug: OmniOS r151008j terminates iSCSI initiator too early in shutdown In-Reply-To: cks's message of Sat, 08 Mar 2014 16:31:16 -0500. <20140308213116.740201A03A0@apps0.cs.toronto.edu> Message-ID: <20140308213537.A2A1C1A03A0@apps0.cs.toronto.edu> I wrote: | As far as I can tell from simply looking at things right now, even | an orderly shutdown on an OmniOS system will not avoid this. I should clarify that: 'on a normal, stock setup OmniOS system'. 
You can of course add SMF jobs to import and export ZFS pools and then shim them into the dependency order, but an out of the box OmniOS system does not do this right. (I'm not convinced it's ever possible to do it right without the administrator having to explicitly configure things, but maybe there's a way to make all of the magic work if the right SMF jobs were present by default.) - cks From richard.elling at richardelling.com Sun Mar 9 02:28:21 2014 From: richard.elling at richardelling.com (Richard Elling) Date: Sat, 8 Mar 2014 18:28:21 -0800 Subject: [OmniOS-discuss] How do non-rpool ZFS filesystems get mounted? In-Reply-To: <20140305213809.3DC421A0826@apps0.cs.toronto.edu> References: <20140305213809.3DC421A0826@apps0.cs.toronto.edu> Message-ID: <15D16E14-A80D-47B9-94AB-427A2F9D7BBA@RichardElling.com> On Mar 5, 2014, at 1:38 PM, Chris Siebenmann wrote: > It turns out that there is an unpleasant consequence to syseventd being > willing to mount ZFS filesystems for additional pools before the 'zfs > mount -a' has run: you can get unresolvable mount conflicts in some > situations. The basic problem affects other file systems, too. The general best practice has always been to keep your hierarchy flat. But... > > Suppose that you have /opt as a separate ZFS filesystem in your > root pool and you also have /opt/bigthing as a ZFS filesystem in > a second pool. You can set this up and everything looks right, but > if you reboot and syseventd beats 'zfs mount -a' for whatever reasons, > you get an explosion: > > - we start with no additional filesystems mounted, including /opt > - syseventd grabs the second pool, starts mounting things, and > mounts /opt/bigthing on the *bare* root filesystem, making /opt > (if necessary) in the process. > - 'zfs mount -a' reaches /opt and attempts to mount it. However, > because syseventd has already mounted /opt/bigthing, /opt is not > empty. FAILURE. > > As far as I can tell there is no particularly good cure for this. To > me it really looks like syseventd should either not be started before > fs-local (although I don't know if anything breaks if its startup is > deferred) or that it should not be mounting ZFS filesystems (although I > can half-see the attraction of it doing so). ... a fix would necessitate building a multi-pool dependency tree. Where would this live? How about if we put it in /etc? This is effectively what vfstab does, though in a more simplistic manner: it simply sorts the list of file systems and mounts the short path first. The difference between vfstab and ZFS automatic mounts is that the former can be multi-pool aware, even if it doesn't know anything about pools at all. Hence the "solution" is ZFS mountpoint=legacy and use vfstab. -- richard -- Richard.Elling at RichardElling.com +1-760-896-4422 -------------- next part -------------- An HTML attachment was scrubbed... URL: From cks at cs.toronto.edu Sun Mar 9 02:56:34 2014 From: cks at cs.toronto.edu (Chris Siebenmann) Date: Sat, 08 Mar 2014 21:56:34 -0500 Subject: [OmniOS-discuss] How do non-rpool ZFS filesystems get mounted? In-Reply-To: richard.elling's message of Sat, 08 Mar 2014 18:28:21 -0800. 
<15D16E14-A80D-47B9-94AB-427A2F9D7BBA@RichardElling.com> Message-ID: <20140309025634.C81041A03A0@apps0.cs.toronto.edu> | On Mar 5, 2014, at 1:38 PM, Chris Siebenmann wrote: | > It turns out that there is an unpleasant consequence to syseventd | > being willing to mount ZFS filesystems for additional pools before | > the 'zfs mount -a' has run: you can get unresolvable mount conflicts | > in some situations. [...] Richard Elling: | ... a fix would necessitate building a multi-pool dependency | tree. Where would this live? The thing is that ZFS already has a multi-pool dependency that works perfectly well in this situation. 'zfs mount -a' processess all pools at once and sorts the mount list so that /opt will be mounted before /opt/bigthing. What makes this not work is that syseventd is willing to mount filesystems from non-root pools before the rpool mounts have completed (and also I believe to do pool mounts on a pool by pool basis). At a minimum I believe that syseventd should not be mounting filesystems from non-rpool pools before all rpool mounts have completed. I would prefer that syseventd not do mounts at all before /system/filesystem/local finishes. (You cannot in general defer syseventd until afterwards because there are a number of dependencies in SMF today that I assume are there for good reason. I have actually inventoried these in the process of relocating syseventd to after fs-local so I can provide a list if people want.[*]) - cks [*: This is where I wish SMF had a way to report the full dependency graph in one go in some format, so you did not have to play whack-a-mole when doing this sort of thing and also potentially blow up your system.] From jimklimov at cos.ru Mon Mar 10 15:13:06 2014 From: jimklimov at cos.ru (Jim Klimov) Date: Mon, 10 Mar 2014 16:13:06 +0100 Subject: [OmniOS-discuss] How do non-rpool ZFS filesystems get mounted? In-Reply-To: <20140309025634.C81041A03A0@apps0.cs.toronto.edu> References: <20140309025634.C81041A03A0@apps0.cs.toronto.edu> Message-ID: <531DD682.8000507@cos.ru> On 2014-03-09 03:28, Richard Elling wrote:> The basic problem affects other file systems, too. The general best practice > has always been to keep your hierarchy flat. But... That is a strange best practice, especially given that ZFS allows and markets the ability of hierarchical datasets. But at least in this case, this is irrelevant since Chris's setup used datasets living just under the pool's root. Flatter than that is a private pool per user, which is not quite the promoted ZFS way ;) On 2014-03-09 03:56, Chris Siebenmann wrote: > [*: This is where I wish SMF had a way to report the full dependency > graph in one go in some format, so you did not have to play > whack-a-mole when doing this sort of thing and also potentially > blow up your system.] This one immediately came to mind: "SMF Dependency Graph Generator" https://java.net/projects/scfdot/pages/Home https://java.net/projects/scfdot/sources/scfdot-src/show I am not sure how alive or functional this project is today, and on OmniOS (or any other non-Oracle distro) in particular. But IMHO it is the best fit to your question (says so on the label ;) ). //Jim From richard.elling at richardelling.com Mon Mar 10 15:56:45 2014 From: richard.elling at richardelling.com (Richard Elling) Date: Mon, 10 Mar 2014 08:56:45 -0700 Subject: [OmniOS-discuss] How do non-rpool ZFS filesystems get mounted? 
In-Reply-To: <531DD682.8000507@cos.ru> References: <20140309025634.C81041A03A0@apps0.cs.toronto.edu> <531DD682.8000507@cos.ru> Message-ID: <7378CB27-D4A8-415C-A70A-C5668396C88A@RichardElling.com> > On Mar 10, 2014, at 8:13 AM, Jim Klimov wrote: > > On 2014-03-09 03:28, Richard Elling wrote:> The basic problem affects other file systems, too. The general best practice > > has always been to keep your hierarchy flat. But... > > That is a strange best practice, especially given that ZFS allows > and markets the ability of hierarchical datasets. Hierarchial datasets work well. The problems occur with hierarchial pools. -- richard > But at least in > this case, this is irrelevant since Chris's setup used datasets > living just under the pool's root. Flatter than that is a private > pool per user, which is not quite the promoted ZFS way ;) > >> On 2014-03-09 03:56, Chris Siebenmann wrote: >> [*: This is where I wish SMF had a way to report the full dependency >> graph in one go in some format, so you did not have to play >> whack-a-mole when doing this sort of thing and also potentially >> blow up your system.] > > This one immediately came to mind: > "SMF Dependency Graph Generator" > https://java.net/projects/scfdot/pages/Home > https://java.net/projects/scfdot/sources/scfdot-src/show > > I am not sure how alive or functional this project is today, and on > OmniOS (or any other non-Oracle distro) in particular. But IMHO it > is the best fit to your question (says so on the label ;) ). > > //Jim > > _______________________________________________ > OmniOS-discuss mailing list > OmniOS-discuss at lists.omniti.com > http://lists.omniti.com/mailman/listinfo/omnios-discuss From cks at cs.toronto.edu Mon Mar 10 16:10:40 2014 From: cks at cs.toronto.edu (Chris Siebenmann) Date: Mon, 10 Mar 2014 12:10:40 -0400 Subject: [OmniOS-discuss] Reproducible r151008j kernel crash with ZFS pools on iSCSI In-Reply-To: Your message of Sat, 08 Mar 2014 18:11:36 +0100. <531B4F48.7060005@cos.ru> Message-ID: <20140310161040.9263E1A053B@apps0.cs.toronto.edu> | In short: when reproducing the bug, try something like "vmstat 1" in a | separate SSH shell, to see if your available memory plummets when you | disconnect the devices and/or the "sr" (scanrate, search for swapping) | increases substantially. 'vmstat 1' shows no sign of this. sr is flatlined at zero all through and free is basically frozen (with roughly 39 GB free[*]). I also have live monitoring of the user-level write rate of the IO source and it stalls relatively early on in the process. To the extent that I can see anything from the call stack in the panics, it really looks to me as if something is overrunning a kernel stack size limit for some reason. - cks [*: this is a 64 GB machine.] From jimklimov at cos.ru Mon Mar 10 16:37:35 2014 From: jimklimov at cos.ru (Jim Klimov) Date: Mon, 10 Mar 2014 17:37:35 +0100 Subject: [OmniOS-discuss] Reproducible r151008j kernel crash with ZFS pools on iSCSI In-Reply-To: <20140310161040.9263E1A053B@apps0.cs.toronto.edu> References: <20140310161040.9263E1A053B@apps0.cs.toronto.edu> Message-ID: <531DEA4F.2000709@cos.ru> On 2014-03-10 17:10, Chris Siebenmann wrote: > To the extent that I can see anything from the call stack in the > panics, it really looks to me as if something is overrunning a kernel > stack size limit for some reason. Also, just to be sure, you don't do anything non-standard, like ZFS blocks over 128KB in size? There were experiments toward this which according to the lists could lead to some overflow. 
Probably not your case, but popped out in my head by some analogy ;) //Jim From cks at cs.toronto.edu Mon Mar 10 16:41:35 2014 From: cks at cs.toronto.edu (Chris Siebenmann) Date: Mon, 10 Mar 2014 12:41:35 -0400 Subject: [OmniOS-discuss] Reproducible r151008j kernel crash with ZFS pools on iSCSI In-Reply-To: jimklimov's message of Mon, 10 Mar 2014 17:37:35 +0100. <531DEA4F.2000709@cos.ru> Message-ID: <20140310164135.2E8861A053B@apps0.cs.toronto.edu> | On 2014-03-10 17:10, Chris Siebenmann wrote: | > To the extent that I can see anything from the call stack in the | > panics, it really looks to me as if something is overrunning a | > kernel stack size limit for some reason. | | Also, just to be sure, you don't do anything non-standard, like ZFS | blocks over 128KB in size? There were experiments toward this which | according to the lists could lead to some overflow. Probably not your | case, but popped out in my head by some analogy ;) I've got nothing unusual in this way; the pool setup is plain and ordinary. The pools are on 4k sector iSCSI disks (that are being reported that way and the pool vdevs are ashift=12). - cks From cj.keist at colostate.edu Tue Mar 11 19:21:30 2014 From: cj.keist at colostate.edu (CJ Keist) Date: Tue, 11 Mar 2014 13:21:30 -0600 Subject: [OmniOS-discuss] VirtIO drivers?? Message-ID: <531F623A.9080800@colostate.edu> I saw question asked on this discussion list but no answer was given. Is there VirtIO driver for OmniOS? Wanting to run OmniOS on proxmox KVM with VirtIO for network driver. -- C. J. Keist Email: cj.keist at colostate.edu Systems Group Manager Solaris 10 OS (SAI) Engineering Network Services Phone: 970-491-0630 College of Engineering, CSU Fax: 970-491-5569 Ft. Collins, CO 80523-1301 All I want is a chance to prove 'Money can't buy happiness' From danmcd at omniti.com Tue Mar 11 19:43:22 2014 From: danmcd at omniti.com (Dan McDonald) Date: Tue, 11 Mar 2014 15:43:22 -0400 Subject: [OmniOS-discuss] VirtIO drivers?? In-Reply-To: <531F623A.9080800@colostate.edu> References: <531F623A.9080800@colostate.edu> Message-ID: <190CFDCB-B118-422E-82D9-11A3ABE860EA@omniti.com> On Mar 11, 2014, at 3:21 PM, CJ Keist wrote: > I saw question asked on this discussion list but no answer was given. Is there VirtIO driver for OmniOS? Wanting to run OmniOS on proxmox KVM with VirtIO for network driver. Check the illumos-nexenta repo: github.com/Nexenta/illumos-nexenta They're further along on this front than upstream. You'd be a hero if you tested it publicly and upstreamed it! Dan From sim.ple at live.nl Thu Mar 13 10:54:16 2014 From: sim.ple at live.nl (Randy S) Date: Thu, 13 Mar 2014 11:54:16 +0100 Subject: [OmniOS-discuss] kayak problems Message-ID: Hi, I'm new to omnios and was testing to see how the kayak system works. Just installed the latest stable omnios and followed the online standard instructions to install kayak. Website reachable, everything seems fine. Working image=r151008 and configuration file created. (Just to make my info complete). I have noticed that more people have had problems with it, and I have the idea that they somehow solved them with the help of this forum. For me, however, the suggestions didn't work ... yet.
The thing is that when I start a workstation to be installed by kayak, the process fails because (I guess) the miniroot is missing libidn.so.11.6.11 What I did: (taken from http://comments.gmane.org/gmane.os.omnios.general/1660) # gzip -d miniroot.gz # cp miniroot /tmp # mkdir /mnt/test # mount -o nologging `lofiadm -a /tmp/miniroot` /mnt/test/ I then tried to copy the lib to the /mnt/test but could not because there is no space left on the device. Can anybody please tell me how to solve this? Thanks in advance Greetings Randy -------------- next part -------------- An HTML attachment was scrubbed... URL: From doug at will.to Thu Mar 13 14:07:02 2014 From: doug at will.to (Doug Hughes) Date: Thu, 13 Mar 2014 10:07:02 -0400 Subject: [OmniOS-discuss] kayak problems In-Reply-To: References: Message-ID: <92f4622d-02b8-419d-974d-8a65ee09bfae.maildroid@localhost> Yes, it is missing a few key libraries, I reported that in a similar email back around december but it seems to have gone unfixed in the interim. If you find my posts, it indicates the things that I found and fixed. Luckily, it is easy to uncompress, mount and fix the miniroot with the missing libraries and symlinks. Sent from my android device. -----Original Message----- From: Randy S To: "omnios-discuss at lists.omniti.com" Sent: Thu, 13 Mar 2014 7:00 AM Subject: [OmniOS-discuss] kayak problems Hi, I'm new to omnios and was testing to see how the kayak system works. Just installed the latest stable omnios and followed the online standard instructions to install kayak. Website reachable, everything seems fine. Working image=r151008 en configuration file created. (Just to make my info complete). I have noticed that more people have had problems with it and have I the idea that somehow they solved them with the help of your forum. For me however, the suggestions didn't work for me ... yet. The thing is that when I start a workstation to be installed by kayak, the process fails because (I guess) the miniroot is missing libidn.so.11.6.11 What I did: (taken from http://comments.gmane.org/gmane.os.omnios.general/1660) # gzip -d miniroot.gz # cp miniroot /tmp # mkdir /mnt/test # mount -o nologging `lofiadm -a /tmp/miniroot` /mnt/test/ I then tried to copy the lib to the /mnt/test but could not because there is no space left on the device. Can anybody please tell me how to solve this? Thanks in advance Greetings Randy -------------- next part -------------- An HTML attachment was scrubbed... URL: From esproul at omniti.com Thu Mar 13 14:16:56 2014 From: esproul at omniti.com (Eric Sproul) Date: Thu, 13 Mar 2014 10:16:56 -0400 Subject: [OmniOS-discuss] kayak problems In-Reply-To: <92f4622d-02b8-419d-974d-8a65ee09bfae.maildroid@localhost> References: <92f4622d-02b8-419d-974d-8a65ee09bfae.maildroid@localhost> Message-ID: On Thu, Mar 13, 2014 at 10:07 AM, Doug Hughes wrote: > Yes, it is missing a few key libraries, I reported that in a similar email > back around december but it seems to have gone unfixed in the interim. If > you find my posts, it indicates the things that I found and fixed. Luckily, > it is easy to uncompress, mount and fix the miniroot with the missing > libraries and symlinks. It's been fixed in the code. Updated packages will likely be coming soon, but if you're impatient, you can build it. 
https://github.com/omniti-labs/kayak/commit/3eb2021a8bd5cd0e52fc34c7520ccf98a2ad6aa5 Eric From doug at will.to Thu Mar 13 14:16:48 2014 From: doug at will.to (Doug Hughes) Date: Thu, 13 Mar 2014 10:16:48 -0400 Subject: [OmniOS-discuss] kayak problems In-Reply-To: References: <92f4622d-02b8-419d-974d-8a65ee09bfae.maildroid@localhost> Message-ID: <5fcb2063-606c-4542-8daa-88170197ecf4.maildroid@localhost> Thanls, Eric! Sent from my android device. -----Original Message----- From: Eric Sproul To: Doug Hughes Cc: "omnios-discuss at lists.omniti.com" , Randy S Sent: Thu, 13 Mar 2014 10:16 AM Subject: Re: [OmniOS-discuss] kayak problems On Thu, Mar 13, 2014 at 10:07 AM, Doug Hughes wrote: > Yes, it is missing a few key libraries, I reported that in a similar email > back around december but it seems to have gone unfixed in the interim. If > you find my posts, it indicates the things that I found and fixed. Luckily, > it is easy to uncompress, mount and fix the miniroot with the missing > libraries and symlinks. It's been fixed in the code. Updated packages will likely be coming soon, but if you're impatient, you can build it. https://github.com/omniti-labs/kayak/commit/3eb2021a8bd5cd0e52fc34c7520ccf98a2ad6aa5 Eric -------------- next part -------------- An HTML attachment was scrubbed... URL: From alex.ranskis at gmail.com Mon Mar 17 22:22:19 2014 From: alex.ranskis at gmail.com (Alex) Date: Mon, 17 Mar 2014 23:22:19 +0100 Subject: [OmniOS-discuss] kayak problems In-Reply-To: References: <92f4622d-02b8-419d-974d-8a65ee09bfae.maildroid@localhost> Message-ID: On 13 March 2014 15:16, Eric Sproul wrote: > On Thu, Mar 13, 2014 at 10:07 AM, Doug Hughes wrote: > > Yes, it is missing a few key libraries, I reported that in a similar > email > > back around december but it seems to have gone unfixed in the interim. If > > you find my posts, it indicates the things that I found and fixed. > Luckily, > > it is easy to uncompress, mount and fix the miniroot with the missing > > libraries and symlinks. > > It's been fixed in the code. Updated packages will likely be coming > soon, but if you're impatient, you can build it. > > > https://github.com/omniti-labs/kayak/commit/3eb2021a8bd5cd0e52fc34c7520ccf98a2ad6aa5 I've also had issues with disk_help.sh, if using '<' or '>' to match for a specific disk size. Caused by : size=`prtvtoc $rdsk 2>/dev/null | awk '/bytes\/sector/{bps=$2} /sectors\/cylinder/{bpc=bps*$2} /accessible sectors/{print ($2*bps)/1048576;} /accessible cylinders/{print int(($2*bpc)/1048576);}'` awk will switch to scientific notation for large values and bash will fail later while comparing that value to the one provided in the configuration. switching from print to printf("%.0f", ..) fixed it. Apologies if this has already been reported Cheers, -- alex -------------- next part -------------- An HTML attachment was scrubbed... URL: From lotheac at iki.fi Tue Mar 18 15:58:45 2014 From: lotheac at iki.fi (Lauri Tirkkonen) Date: Tue, 18 Mar 2014 17:58:45 +0200 Subject: [OmniOS-discuss] pkgsend generate bug with spaces in file names Message-ID: <20140318155845.GC21841@gutsman.lotheac.fi> I ran into this while packaging Python 3.4.0 which apparently ships files with space characters in them. 
% mkdir foo && touch 'foo/bar baz' % pkgsend generate foo | pkgmogrify pkgmogrify: File line 1: Malformed action at position: 12: file bar baz group=bin mode=0644 owner=root path="bar baz" ^ http://docs.oracle.com/cd/E26502_01/html/E21383/pkgcreate.html#gludq suggests this has has possibly been fixed in Oracle's pkg. -- Lauri Tirkkonen | +358 50 5341376 | lotheac @ IRCnet From svavar at januar.is Wed Mar 19 10:37:34 2014 From: svavar at januar.is (=?ISO-8859-1?Q?Svavar_=D6rn_Eysteinsson?=) Date: Wed, 19 Mar 2014 10:37:34 +0000 Subject: [OmniOS-discuss] Trying to upgrade from r151006... Message-ID: Hello list. I have a HP microserver that I have installed OmniOS on lately last year. I havn't powered up the server for some time until yesterday. Having some trouble upgrading the OS to the newest through the pkg command. Current version is : OmniOS v11 r151006 pkg publishers configured : omnios origin online http://pkg.omniti.com/omnios/release/ Now when I issue a pkg update -nv command I will receive the following errors : root at blackbox:~# pkg update -nv Creating Plan | pkg update: No solution was found to satisfy constraints Plan Creation: Package solver has not found a solution to update to latest available versions. This may indicate an overly constrained set of packages are installed. latest incorporations: pkg://omnios/consolidation/osnet/osnet-incorporation at 0.5.11 ,5.11-0.151008:20131204T022427Z pkg://omnios/incorporation/jeos/omnios-userland at 11 ,5.11-0.151008:20131206T160517Z pkg://omnios/entire at 11,5.11-0.151008:20131205T195242Z pkg://omnios/incorporation/jeos/illumos-gate at 11 ,5.11-0.151008:20131204T024149Z The following indicates why the system cannot update to the latest version: No suitable version of required package pkg://omnios/incorporation/jeos/omnios-userland at 11,5.11-0.151006:20140109T172403Z found: Reject: pkg://omnios/incorporation/jeos/omnios-userland at 11 ,5.11-0.151006:20140109T172403Z Reason: A version for 'incorporate' dependency on pkg:/library/python-2/python-extra-26 at 0.5.11,5.11-0.151006 cannot be found No suitable version of required package pkg://omnios/incorporation/jeos/omnios-userland at 11,5.11-0.151006:20140113T224931Z found: Reject: pkg://omnios/incorporation/jeos/omnios-userland at 11 ,5.11-0.151006:20140113T224931Z Reason: A version for 'incorporate' dependency on pkg:/library/python-2/python-extra-26 at 0.5.11,5.11-0.151006 cannot be found No suitable version of required package pkg://omnios/incorporation/jeos/omnios-userland at 11,5.11-0.151006:20140203T190027Z found: Reject: pkg://omnios/incorporation/jeos/omnios-userland at 11 ,5.11-0.151006:20140203T190027Z Reason: A version for 'incorporate' dependency on pkg:/library/python-2/python-extra-26 at 0.5.11,5.11-0.151006 cannot be found Does anyone know that the heck is going on ? I have followed the procedures on : http://omnios.omniti.com/wiki.php/Upgrade_r151006_r151008 but, surely when I issue pkg update command I will get these errors/notification above. Thanks in advance. Best regards, Svavar O - Reykjavik - Iceland -------------- next part -------------- An HTML attachment was scrubbed... URL: From svavar at januar.is Wed Mar 19 16:53:11 2014 From: svavar at januar.is (=?ISO-8859-1?Q?Svavar_=D6rn_Eysteinsson?=) Date: Wed, 19 Mar 2014 16:53:11 +0000 Subject: [OmniOS-discuss] Trying to upgrade from r151006... In-Reply-To: References: Message-ID: I've managed to activate a older BE environment which is still a r151006 version. 
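(For reference, that sort of fallback to an older boot environment is a beadm operation; a minimal sketch, where the BE name is purely hypothetical and would come from the 'beadm list' output:)

list the available boot environments
# beadm list
activate the older one and reboot into it
# beadm activate omnios-r151006-backup
# init 6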
When I issue a pkg update -nv it gives me the following packages to update : root at blackbox:/tmp# pkg update -nv Packages to update: 11 Estimated space available: 218.64 GB Estimated space to be consumed: 153.88 MB Create boot environment: Yes Activate boot environment: Yes Create backup boot environment: No Rebuild boot archive: Yes Changed packages: omnios developer/debug/mdb 0.5.11,5.11-0.151006:20130731T194820Z -> 0.5.11,5.11-0.151006:20131019T183740Z driver/storage/mpt_sas 0.5.11,5.11-0.151006:20130506T161108Z -> 0.5.11,5.11-0.151006:20130906T160306Z driver/storage/mr_sas 0.5.11,5.11-0.151006:20130506T161108Z -> 0.5.11,5.11-0.151006:20130906T160306Z entire 11,5.11-0.151006:20130507T204959Z -> 11,5.11-0.151006:20131210T224515Z incorporation/jeos/omnios-userland 11,5.11-0.151006:20130716T202721Z -> 11,5.11-0.151006:20131030T205312Z library/python-2/python-extra-26 0.5.11,5.11-0.151006:20130506T184813Z -> 0.5.11,5.11-0.151008:20131204T024250Z library/security/openssl 1.0.1.5,5.11-0.151006:20130506T185419Z -> 1.0.1.6,5.11-0.151006:20140110T154549Z network/dns/bind 9.9.2.2,5.11-0.151006:20130506T185915Z -> 9.9.3.2,5.11-0.151006:20130731T155125Z system/file-system/zfs 0.5.11,5.11-0.151006:20130814T165834Z -> 0.5.11,5.11-0.151006:20131210T212000Z system/kernel 0.5.11,5.11-0.151006:20130731T194843Z -> 0.5.11,5.11-0.151006:20131019T183804Z perl.omniti.com omniti/perl/www-curl 4.15,5.11-0.151002:20120807T165910Z -> 4.15,5.11-0.151006:20140312T201517Z These are the publishers, root at blackbox:/tmp# pkg publisher PUBLISHER TYPE STATUS URI omnios origin online http://pkg.omniti.com/omnios/release/ ms.omniti.com origin online http://pkg.omniti.com/omniti-ms/ perl.omniti.com origin online http://pkg.omniti.com/omniti-perl/ Is this the only updated files from 151006 to 151008 ? Thanks in advance. *SVAVAR ?RN EYSTEINSSON*Kerfisstj?ri Gsm / mobile +354 862 1624 S?mi / tel +354 531 0101 *Jan?ar marka?sh?s*www.januar.is / Facebook On 19 March 2014 10:37, Svavar ?rn Eysteinsson wrote: > Hello list. > > I have a HP microserver that I have installed OmniOS on lately last year. > I havn't powered up the server for some time until yesterday. > > Having some trouble upgrading the OS to the newest through the pkg command. > > Current version is : OmniOS v11 r151006 > > pkg publishers configured : > > omnios origin online > http://pkg.omniti.com/omnios/release/ > > Now when I issue a pkg update -nv command I will receive the following > errors : > > > root at blackbox:~# pkg update -nv > Creating Plan | > pkg update: No solution was found to satisfy constraints > Plan Creation: Package solver has not found a solution to update to latest > available versions. > This may indicate an overly constrained set of packages are installed. 
> > latest incorporations: > > pkg://omnios/consolidation/osnet/osnet-incorporation at 0.5.11 > ,5.11-0.151008:20131204T022427Z > pkg://omnios/incorporation/jeos/omnios-userland at 11 > ,5.11-0.151008:20131206T160517Z > pkg://omnios/entire at 11,5.11-0.151008:20131205T195242Z > pkg://omnios/incorporation/jeos/illumos-gate at 11 > ,5.11-0.151008:20131204T024149Z > > The following indicates why the system cannot update to the latest version: > > No suitable version of required package > pkg://omnios/incorporation/jeos/omnios-userland at 11,5.11-0.151006:20140109T172403Z > found: > Reject: pkg://omnios/incorporation/jeos/omnios-userland at 11 > ,5.11-0.151006:20140109T172403Z > Reason: A version for 'incorporate' dependency on > pkg:/library/python-2/python-extra-26 at 0.5.11,5.11-0.151006 cannot be found > No suitable version of required package > pkg://omnios/incorporation/jeos/omnios-userland at 11,5.11-0.151006:20140113T224931Z > found: > Reject: pkg://omnios/incorporation/jeos/omnios-userland at 11 > ,5.11-0.151006:20140113T224931Z > Reason: A version for 'incorporate' dependency on > pkg:/library/python-2/python-extra-26 at 0.5.11,5.11-0.151006 cannot be found > No suitable version of required package > pkg://omnios/incorporation/jeos/omnios-userland at 11,5.11-0.151006:20140203T190027Z > found: > Reject: pkg://omnios/incorporation/jeos/omnios-userland at 11 > ,5.11-0.151006:20140203T190027Z > Reason: A version for 'incorporate' dependency on > pkg:/library/python-2/python-extra-26 at 0.5.11,5.11-0.151006 cannot be found > > > Does anyone know that the heck is going on ? > > I have followed the procedures on : > http://omnios.omniti.com/wiki.php/Upgrade_r151006_r151008 > but, surely when I issue pkg update command I will get these > errors/notification above. > > Thanks in advance. > > Best regards, > > Svavar O - Reykjavik - Iceland > -------------- next part -------------- An HTML attachment was scrubbed... URL: From mmabis at vmware.com Fri Mar 21 15:46:27 2014 From: mmabis at vmware.com (Matthew Mabis) Date: Fri, 21 Mar 2014 08:46:27 -0700 (PDT) Subject: [OmniOS-discuss] Debating Swapping 2TB with 4TB drives in RaidZ2 or Create new Vol and clone In-Reply-To: References: Message-ID: <1424115820.12054356.1395416787354.JavaMail.root@vmware.com> Hey All, I am debating the idea of just swapping all my hard drives in my current 8x2TB RaidZ2 (all be it slowly) and let the environment resilver each drive than expand versus creating a new RaidZ2 on a different box and cloning the data over. Obviously i know of the Pros/Cons/Risks associated with that method. My question about debating deals with the new drives being 4K where as the old drives were 512b aligned My Current config is using (6x Hitachi HDS5C302 and 2x SAMSUNG HD203WI) where i will be switching over to ST4000VN000 drives all the way (purchased 4 already waiting a little time to see if i can purchase via a different batch [some ppl debate on this but to me its the way i have done it for a long time]) i don't wan't to us dissimilar models anymore as sometimes the samsung drives in this config went well lets call it NUTTY.... I use my environment for multiple things (Network Data Backups, NFS Backups for ESXi, Media Storage) my current environment is running down on space and with my projections ill run out of space within the next 6 months (~26% Free that includes the 1.08TB Reservation) so i am prepping for the transition. Just curious what you would do in my situation, replace the drives or build a new vDev and why? 
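(For anyone weighing the swap-in-place option, a minimal sketch of the mechanics, assuming a pool named tank and hypothetical device names; whether the replace is even accepted depends on the new drive's reported sector size matching the vdev's ashift, which is what the replies below turn on:)

check the vdev ashift first (9 = 512-byte sectors, 12 = 4K)
# zdb | grep ashift
let the pool grow once every disk in the vdev has been upsized
# zpool set autoexpand=on tank
swap one disk at a time, waiting for each resilver before the next
# zpool replace tank c1t0d0 c1t8d0
# zpool status tank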
I have all the underlying hardware to handle it (SAS-2008 Controller, ECC, and a ZIL/SLOG. If needed i could use my infiniband backend to clone the data at 10Gb via IPoIB) Thanks Matt -------------- next part -------------- An HTML attachment was scrubbed... URL: From cks at cs.toronto.edu Fri Mar 21 16:04:37 2014 From: cks at cs.toronto.edu (Chris Siebenmann) Date: Fri, 21 Mar 2014 12:04:37 -0400 Subject: [OmniOS-discuss] Debating Swapping 2TB with 4TB drives in RaidZ2 or Create new Vol and clone In-Reply-To: mmabis's message of Fri, 21 Mar 2014 08:46:27 -0700. <1424115820.12054356.1395416787354.JavaMail.root@vmware.com> Message-ID: <20140321160437.2C8B11A04E9@apps0.cs.toronto.edu> | I am debating the idea of just swapping all my hard drives in my | current 8x2TB RaidZ2 (all be it slowly) and let the environment | resilver each drive than expand versus creating a new RaidZ2 on a | different box and cloning the data over. | | Obviously i know of the Pros/Cons/Risks associated with that | method. My question about debating deals with the new drives being 4K | where as the old drives were 512b aligned [...] As far as I know there is no question here: you simply cannot put 4K drives in a vdev originally created with 512b drives[*]. You need to make a new pool with the 4K drives. Even if you could get them into the existing pool, the performance effects would likely be relatively bad. ZFS does a lot of unaligned writes. - cks [*: If we're being technical, it's possible to force OmniOS to think that they're all 512b drives. ] From mmabis at vmware.com Fri Mar 21 16:34:18 2014 From: mmabis at vmware.com (Matthew Mabis) Date: Fri, 21 Mar 2014 09:34:18 -0700 (PDT) Subject: [OmniOS-discuss] Debating Swapping 2TB with 4TB drives in RaidZ2 or Create new Vol and clone In-Reply-To: <20140321160437.2C8B11A04E9@apps0.cs.toronto.edu> References: <20140321160437.2C8B11A04E9@apps0.cs.toronto.edu> Message-ID: <996286416.12067302.1395419658589.JavaMail.root@vmware.com> I know the drive itself does 512b emulation but i would rather run 4K if theres a performance increase! thanks Matt ----- Original Message ----- From: "Chris Siebenmann" To: "Matthew Mabis" Cc: omnios-discuss at lists.omniti.com Sent: Friday, March 21, 2014 9:04:37 AM Subject: Re: [OmniOS-discuss] Debating Swapping 2TB with 4TB drives in RaidZ2 or Create new Vol and clone | I am debating the idea of just swapping all my hard drives in my | current 8x2TB RaidZ2 (all be it slowly) and let the environment | resilver each drive than expand versus creating a new RaidZ2 on a | different box and cloning the data over. | | Obviously i know of the Pros/Cons/Risks associated with that | method. My question about debating deals with the new drives being 4K | where as the old drives were 512b aligned [...] As far as I know there is no question here: you simply cannot put 4K drives in a vdev originally created with 512b drives[*]. You need to make a new pool with the 4K drives. Even if you could get them into the existing pool, the performance effects would likely be relatively bad. ZFS does a lot of unaligned writes. - cks [*: If we're being technical, it's possible to force OmniOS to think that they're all 512b drives. ] From tobi at oetiker.ch Fri Mar 21 16:48:14 2014 From: tobi at oetiker.ch (Tobias Oetiker) Date: Fri, 21 Mar 2014 17:48:14 +0100 (CET) Subject: [OmniOS-discuss] zpool degraded while smart sais disks are OK Message-ID: a zpool on one of our boxes has been degraded with several disks faulted ... 
* the disks are all sas direct attached * according to smartctl the offending disks have no faults. * zfs decided to fault the disks after the events below. I have now told the pool to clear the errors and it is resilvering the disks ... (in progress) any idea what is happening here ? Mar 2 22:21:51 foo scsi: [ID 243001 kern.warning] WARNING: /pci at 0,0/pci8086,3c04 at 2/pci1000,3020 at 0 (mpt_sas0): Mar 2 22:21:51 foo mptsas_handle_event_sync: IOCStatus=0x8000, IOCLogInfo=0x31170000 Mar 2 22:21:51 foo scsi: [ID 243001 kern.warning] WARNING: /pci at 0,0/pci8086,3c04 at 2/pci1000,3020 at 0 (mpt_sas0): Mar 2 22:21:51 foo mptsas_handle_event: IOCStatus=0x8000, IOCLogInfo=0x31170000 Mar 2 22:21:51 foo scsi: [ID 365881 kern.info] /pci at 0,0/pci8086,3c04 at 2/pci1000,3020 at 0 (mpt_sas0): Mar 2 22:21:51 foo Log info 0x31170000 received for target 11. Mar 2 22:21:51 foo scsi_status=0x0, ioc_status=0x804b, scsi_state=0xc Mar 2 22:21:51 foo scsi: [ID 365881 kern.info] /pci at 0,0/pci8086,3c04 at 2/pci1000,3020 at 0 (mpt_sas0): Mar 2 22:21:51 foo Log info 0x31170000 received for target 11. Mar 2 22:21:51 foo scsi_status=0x0, ioc_status=0x804b, scsi_state=0xc Mar 5 02:20:53 foo scsi: [ID 243001 kern.warning] WARNING: /pci at 0,0/pci8086,3c06 at 2,2/pci1000,3020 at 0 (mpt_sas1): Mar 5 02:20:53 foo mptsas_handle_event_sync: IOCStatus=0x8000, IOCLogInfo=0x31170000 Mar 5 02:20:53 foo scsi: [ID 243001 kern.warning] WARNING: /pci at 0,0/pci8086,3c06 at 2,2/pci1000,3020 at 0 (mpt_sas1): Mar 5 02:20:53 foo mptsas_handle_event: IOCStatus=0x8000, IOCLogInfo=0x31170000 Mar 5 02:20:53 foo scsi: [ID 365881 kern.info] /pci at 0,0/pci8086,3c06 at 2,2/pci1000,3020 at 0 (mpt_sas1): Mar 5 02:20:53 foo Log info 0x31170000 received for target 10. Mar 5 02:20:53 foo scsi_status=0x0, ioc_status=0x804b, scsi_state=0xc Mar 5 02:20:53 foo scsi: [ID 365881 kern.info] /pci at 0,0/pci8086,3c06 at 2,2/pci1000,3020 at 0 (mpt_sas1): Mar 5 02:20:53 foo Log info 0x31170000 received for target 10. Mar 5 02:20:53 foo scsi_status=0x0, ioc_status=0x804b, scsi_state=0xc -- Tobi Oetiker, OETIKER+PARTNER AG, Aarweg 15 CH-4600 Olten, Switzerland www.oetiker.ch tobi at oetiker.ch +41 62 775 9902 *** We are hiring IT staff: www.oetiker.ch/jobs *** From cks at cs.toronto.edu Fri Mar 21 17:04:50 2014 From: cks at cs.toronto.edu (Chris Siebenmann) Date: Fri, 21 Mar 2014 13:04:50 -0400 Subject: [OmniOS-discuss] Debating Swapping 2TB with 4TB drives in RaidZ2 or Create new Vol and clone In-Reply-To: mmabis's message of Fri, 21 Mar 2014 09:34:18 -0700. <996286416.12067302.1395419658589.JavaMail.root@vmware.com> Message-ID: <20140321170450.89A941A04E9@apps0.cs.toronto.edu> | I know the drive itself does 512b emulation but i would rather run 4K | if theres a performance increase! What matters for OmniOS is what the drive reports as. If it reports honestly that it has a 4k physical sector size, ZFS will say 'nope!' even if the drive will accept 512b reads and writes. This is a very unfortunate limitation these days since it's increasingly hard to get drives that do not have 4k physical sector drives. But that's life. 
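(A quick way to see what a given disk is reporting, assuming smartmontools is installed; the device name is hypothetical, SATA disks behind some controllers may need '-d sat', and the 'Sector Sizes' line is just the sort of output a 512e drive gives:)

# smartctl -i /dev/rdsk/c1t0d0s0
...
Sector Sizes:     512 bytes logical, 4096 bytes physical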
- cks From richard.elling at richardelling.com Fri Mar 21 19:50:45 2014 From: richard.elling at richardelling.com (Richard Elling) Date: Fri, 21 Mar 2014 12:50:45 -0700 Subject: [OmniOS-discuss] zpool degraded while smart sais disks are OK In-Reply-To: References: Message-ID: <39B55A5A-AA04-4C56-8A74-5B9316861405@RichardElling.com> On Mar 21, 2014, at 9:48 AM, Tobias Oetiker wrote: > a zpool on one of our boxes has been degraded with several disks > faulted ... > > * the disks are all sas direct attached > * according to smartctl the offending disks have no faults. > * zfs decided to fault the disks after the events below. > > I have now told the pool to clear the errors and it is resilvering the disks ... (in progress) > > any idea what is happening here ? > > Mar 2 22:21:51 foo scsi: [ID 243001 kern.warning] WARNING: /pci at 0,0/pci8086,3c04 at 2/pci1000,3020 at 0 (mpt_sas0): > Mar 2 22:21:51 foo mptsas_handle_event_sync: IOCStatus=0x8000, IOCLogInfo=0x31170000 > Mar 2 22:21:51 foo scsi: [ID 243001 kern.warning] WARNING: /pci at 0,0/pci8086,3c04 at 2/pci1000,3020 at 0 (mpt_sas0): > Mar 2 22:21:51 foo mptsas_handle_event: IOCStatus=0x8000, IOCLogInfo=0x31170000 > Mar 2 22:21:51 foo scsi: [ID 365881 kern.info] /pci at 0,0/pci8086,3c04 at 2/pci1000,3020 at 0 (mpt_sas0): > Mar 2 22:21:51 foo Log info 0x31170000 received for target 11. > Mar 2 22:21:51 foo scsi_status=0x0, ioc_status=0x804b, scsi_state=0xc > Mar 2 22:21:51 foo scsi: [ID 365881 kern.info] /pci at 0,0/pci8086,3c04 at 2/pci1000,3020 at 0 (mpt_sas0): > Mar 2 22:21:51 foo Log info 0x31170000 received for target 11. > Mar 2 22:21:51 foo scsi_status=0x0, ioc_status=0x804b, scsi_state=0xc These are command aborted reports from the target device. You will see these every 60 seconds if the disk is not responding and the subsequent reset of the disk aborts the commands that are not responding. -- richard > > > Mar 5 02:20:53 foo scsi: [ID 243001 kern.warning] WARNING: /pci at 0,0/pci8086,3c06 at 2,2/pci1000,3020 at 0 (mpt_sas1): > Mar 5 02:20:53 foo mptsas_handle_event_sync: IOCStatus=0x8000, IOCLogInfo=0x31170000 > Mar 5 02:20:53 foo scsi: [ID 243001 kern.warning] WARNING: /pci at 0,0/pci8086,3c06 at 2,2/pci1000,3020 at 0 (mpt_sas1): > Mar 5 02:20:53 foo mptsas_handle_event: IOCStatus=0x8000, IOCLogInfo=0x31170000 > Mar 5 02:20:53 foo scsi: [ID 365881 kern.info] /pci at 0,0/pci8086,3c06 at 2,2/pci1000,3020 at 0 (mpt_sas1): > Mar 5 02:20:53 foo Log info 0x31170000 received for target 10. > Mar 5 02:20:53 foo scsi_status=0x0, ioc_status=0x804b, scsi_state=0xc > Mar 5 02:20:53 foo scsi: [ID 365881 kern.info] /pci at 0,0/pci8086,3c06 at 2,2/pci1000,3020 at 0 (mpt_sas1): > Mar 5 02:20:53 foo Log info 0x31170000 received for target 10. > Mar 5 02:20:53 foo scsi_status=0x0, ioc_status=0x804b, scsi_state=0xc > > -- > Tobi Oetiker, OETIKER+PARTNER AG, Aarweg 15 CH-4600 Olten, Switzerland > www.oetiker.ch tobi at oetiker.ch +41 62 775 9902 > *** We are hiring IT staff: www.oetiker.ch/jobs *** > _______________________________________________ > OmniOS-discuss mailing list > OmniOS-discuss at lists.omniti.com > http://lists.omniti.com/mailman/listinfo/omnios-discuss -- Richard.Elling at RichardElling.com +1-760-896-4422 -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From zmalone at omniti.com Fri Mar 21 20:23:40 2014 From: zmalone at omniti.com (Zach Malone) Date: Fri, 21 Mar 2014 16:23:40 -0400 Subject: [OmniOS-discuss] zpool degraded while smart sais disks are OK In-Reply-To: <39B55A5A-AA04-4C56-8A74-5B9316861405@RichardElling.com> References: <39B55A5A-AA04-4C56-8A74-5B9316861405@RichardElling.com> Message-ID: On Fri, Mar 21, 2014 at 3:50 PM, Richard Elling wrote: > > On Mar 21, 2014, at 9:48 AM, Tobias Oetiker wrote: > > a zpool on one of our boxes has been degraded with several disks > faulted ... > > * the disks are all sas direct attached > * according to smartctl the offending disks have no faults. > * zfs decided to fault the disks after the events below. > > I have now told the pool to clear the errors and it is resilvering the disks > ... (in progress) > > any idea what is happening here ? ... Did all the disks fault at the same time, or was it spread out over a longer period? I'd suspect your power supply or disk controller. What are your zpool errors? From tobi at oetiker.ch Fri Mar 21 22:23:28 2014 From: tobi at oetiker.ch (Tobias Oetiker) Date: Fri, 21 Mar 2014 23:23:28 +0100 (CET) Subject: [OmniOS-discuss] zpool degraded while smart sais disks are OK In-Reply-To: References: <39B55A5A-AA04-4C56-8A74-5B9316861405@RichardElling.com> Message-ID: Today Zach Malone wrote: > On Fri, Mar 21, 2014 at 3:50 PM, Richard Elling > wrote: > > > > On Mar 21, 2014, at 9:48 AM, Tobias Oetiker wrote: > > > > a zpool on one of our boxes has been degraded with several disks > > faulted ... > > > > * the disks are all sas direct attached > > * according to smartctl the offending disks have no faults. > > * zfs decided to fault the disks after the events below. > > > > I have now told the pool to clear the errors and it is resilvering the disks > > ... (in progress) > > > > any idea what is happening here ? > > ... > > Did all the disks fault at the same time, or was it spread out over a > longer period? I'd suspect your power supply or disk controller. > What are your zpool errors? it happened over time as you can see from the timestamps in the log. The errors from zfs's point of view were 1 read and about 30 write but according to smart the disks are without flaw cheers tobi > > -- Tobi Oetiker, OETIKER+PARTNER AG, Aarweg 15 CH-4600 Olten, Switzerland www.oetiker.ch tobi at oetiker.ch +41 62 775 9902 *** We are hiring IT staff: www.oetiker.ch/jobs *** From richard.elling at richardelling.com Fri Mar 21 23:37:50 2014 From: richard.elling at richardelling.com (Richard Elling) Date: Fri, 21 Mar 2014 16:37:50 -0700 Subject: [OmniOS-discuss] zpool degraded while smart sais disks are OK In-Reply-To: References: <39B55A5A-AA04-4C56-8A74-5B9316861405@RichardElling.com> Message-ID: <0D51CBC0-D049-4A12-A733-7DDB6320BD82@richardelling.com> On Mar 21, 2014, at 3:23 PM, Tobias Oetiker wrote: > Today Zach Malone wrote: > >> On Fri, Mar 21, 2014 at 3:50 PM, Richard Elling >> wrote: >>> >>> On Mar 21, 2014, at 9:48 AM, Tobias Oetiker wrote: >>> >>> a zpool on one of our boxes has been degraded with several disks >>> faulted ... >>> >>> * the disks are all sas direct attached >>> * according to smartctl the offending disks have no faults. >>> * zfs decided to fault the disks after the events below. >>> >>> I have now told the pool to clear the errors and it is resilvering the disks >>> ... (in progress) >>> >>> any idea what is happening here ? >> >> ... >> >> Did all the disks fault at the same time, or was it spread out over a >> longer period? 
I'd suspect your power supply or disk controller. >> What are your zpool errors? > > it happened over time as you can see from the timestamps in the > log. The errors from zfs's point of view were 1 read and about 30 write > > but according to smart the disks are without flaw Actually, SMART is pretty dumb. In most cases, it only looks for uncorrectable errors that are related to media or heads. For a clue to more permanent errors, you will want to look at the read/write error reports for errors that are corrected with possible delays. You can also look at the grown defects list. This behaviour is expected for drives with errors that are not being quickly corrected or have firmware bugs (horrors!) and where the disk does not do TLER (or its vendor's equivalent) -- richard From tobi at oetiker.ch Sat Mar 22 05:13:25 2014 From: tobi at oetiker.ch (Tobias Oetiker) Date: Sat, 22 Mar 2014 06:13:25 +0100 (CET) Subject: [OmniOS-discuss] zpool degraded while smart sais disks are OK In-Reply-To: <0D51CBC0-D049-4A12-A733-7DDB6320BD82@richardelling.com> References: <39B55A5A-AA04-4C56-8A74-5B9316861405@RichardElling.com> <0D51CBC0-D049-4A12-A733-7DDB6320BD82@richardelling.com> Message-ID: Yesterday Richard Elling wrote: > > On Mar 21, 2014, at 3:23 PM, Tobias Oetiker wrote: [...] > > > > it happened over time as you can see from the timestamps in the > > log. The errors from zfs's point of view were 1 read and about 30 write > > > > but according to smart the disks are without flaw > > Actually, SMART is pretty dumb. In most cases, it only looks for uncorrectable > errors that are related to media or heads. For a clue to more permanent errors, > you will want to look at the read/write error reports for errors that are > corrected with possible delays. You can also look at the grown defects list. > > This behaviour is expected for drives with errors that are not being quickly > corrected or have firmware bugs (horrors!) and where the disk does not do TLER > (or its vendor's equivalent) > -- richard the error counters look like this: Error counter log: Errors Corrected by Total Correction Gigabytes Total ECC rereads/ errors algorithm processed uncorrected fast | delayed rewrites corrected invocations [10^9 bytes] errors read: 3494 0 0 3494 44904 530.879 0 write: 0 0 0 0 39111 1793.323 0 verify: 0 0 0 0 8133 0.000 0 the disk vendor is HGST in case anyone has further ideas ... the system has 20 of these disks and the problems occured with three of them. The system has been running fine for two months previously. Vendor: HGST Product: HUS724030ALS640 Revision: A152 User Capacity: 3,000,592,982,016 bytes [3.00 TB] Logical block size: 512 bytes Serial number: P8J20SNV Device type: disk Transport protocol: SAS cheers tobi > > -- Tobi Oetiker, OETIKER+PARTNER AG, Aarweg 15 CH-4600 Olten, Switzerland www.oetiker.ch tobi at oetiker.ch +41 62 775 9902 *** We are hiring IT staff: www.oetiker.ch/jobs *** From bfriesen at simple.dallas.tx.us Sat Mar 22 15:15:05 2014 From: bfriesen at simple.dallas.tx.us (Bob Friesenhahn) Date: Sat, 22 Mar 2014 10:15:05 -0500 (CDT) Subject: [OmniOS-discuss] Debating Swapping 2TB with 4TB drives in RaidZ2 or Create new Vol and clone In-Reply-To: <996286416.12067302.1395419658589.JavaMail.root@vmware.com> References: <20140321160437.2C8B11A04E9@apps0.cs.toronto.edu> <996286416.12067302.1395419658589.JavaMail.root@vmware.com> Message-ID: On Fri, 21 Mar 2014, Matthew Mabis wrote: > I know the drive itself does 512b emulation but i would rather run 4K if theres a performance increase! 
Does Illumos really have a "4k" path? It is my impression that knowledge of "4k" influences offsets and allocated block sizes but that otherwise things are really still done in terms of 512 byte sectors. A drive which can only support I/O in 4k sectors would not be very usable on most systems. Regardless, I can not imagine why someone would want to replace 2TB drives with 4TB drives. The resilver rate is no better with the 4TB drive than with the 2TB drive so the time to resilver is doubled and there are limits to what is tolerable. I/O performance would not improve and in fact it may diminish with the larger drives. It is much better to add more spindles to the pool (i.e. another raidz2 vdev). Bob -- Bob Friesenhahn bfriesen at simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/ GraphicsMagick Maintainer, http://www.GraphicsMagick.org/ From jimklimov at cos.ru Sun Mar 23 14:49:11 2014 From: jimklimov at cos.ru (Jim Klimov) Date: Sun, 23 Mar 2014 15:49:11 +0100 Subject: [OmniOS-discuss] Debating Swapping 2TB with 4TB drives in RaidZ2 or Create new Vol and clone In-Reply-To: References: <20140321160437.2C8B11A04E9@apps0.cs.toronto.edu> <996286416.12067302.1395419658589.JavaMail.root@vmware.com> Message-ID: 22 ????? 2014??. 16:15:05 CET, Bob Friesenhahn ?????: >On Fri, 21 Mar 2014, Matthew Mabis wrote: > >> I know the drive itself does 512b emulation but i would rather run 4K >if theres a performance increase! > >Does Illumos really have a "4k" path? It is my impression that >knowledge of "4k" influences offsets and allocated block sizes but >that otherwise things are really still done in terms of 512 byte >sectors. > >A drive which can only support I/O in 4k sectors would not be very >usable on most systems. Alas (or not), that's what does happen with "honest 4k native" disks - the minimal logical io request is 4k as well as the hardware sector size, unlike the 512e drives including those which do and don't honestly report the hardware sector size which can be used i.e. to influence better alignment of system data (fs headers, etc.) In this 4k-native case, minimal zfs block size is 4k, with some consequences in slack data overheads, fragmentation, metadata-to-data ratios, etc. There may be more visible drawbacks to such allocation on raidz than on mirrors. In case of 512e drives, the 512b sized blocks may be used, but writes cause RMW cycles in hardware, which may reduce reliability (theoretically - just another failure mode and bug nest in logical paths; no statistics to prove practical weaknesses) and performance (once said a 30% hit for random io). Since many OSes and FSes use 4k clusters or blocks anyway, given proper alignment to avoid RMW, they don't care or notice - they long haven't used the smaller io sizes anyway. > >Regardless, I can not imagine why someone would want to replace 2TB >drives with 4TB drives. Limited number of disk bays? ;) > The resilver rate is no better with the 4TB >drive than with the 2TB drive so the time to resilver is doubled and >there are limits to what is tolerable. I/O performance would not >improve and in fact it may diminish with the larger drives. It is >much better to add more spindles to the pool (i.e. another raidz2 >vdev). 
> >Bob -- Typos courtesy of K-9 Mail on my Samsung Android From bfriesen at simple.dallas.tx.us Sun Mar 23 15:53:38 2014 From: bfriesen at simple.dallas.tx.us (Bob Friesenhahn) Date: Sun, 23 Mar 2014 10:53:38 -0500 (CDT) Subject: [OmniOS-discuss] Debating Swapping 2TB with 4TB drives in RaidZ2 or Create new Vol and clone In-Reply-To: References: <20140321160437.2C8B11A04E9@apps0.cs.toronto.edu> <996286416.12067302.1395419658589.JavaMail.root@vmware.com> Message-ID: On Sun, 23 Mar 2014, Jim Klimov wrote: >> >> Regardless, I can not imagine why someone would want to replace 2TB >> drives with 4TB drives. > > Limited number of disk bays? ;) That would be the only reason. The cost of replacing existing 2TB drives with 4TB drives seems pretty high. Performace would only go down, and if the physical block size increases, then storage efficiency would decrease. Disk bays are not necessarily all that expensive as long as there is a place nearby to put it. An existing disk chassis could be replaced with one which supports more slots and the existing drives re-used as long as they are physically compatible with the new chassis. Bob -- Bob Friesenhahn bfriesen at simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/ GraphicsMagick Maintainer, http://www.GraphicsMagick.org/ From doug at will.to Sun Mar 23 17:02:54 2014 From: doug at will.to (Doug Hughes) Date: Sun, 23 Mar 2014 13:02:54 -0400 Subject: [OmniOS-discuss] Debating Swapping 2TB with 4TB drives in RaidZ2 or Create new Vol and clone In-Reply-To: References: <20140321160437.2C8B11A04E9@apps0.cs.toronto.edu> <996286416.12067302.1395419658589.JavaMail.root@vmware.com> Message-ID: <532F13BE.4010204@will.to> On 3/23/2014 11:53 AM, Bob Friesenhahn wrote: > On Sun, 23 Mar 2014, Jim Klimov wrote: >>> >>> Regardless, I can not imagine why someone would want to replace 2TB >>> drives with 4TB drives. >> >> Limited number of disk bays? ;) > > That would be the only reason. The cost of replacing existing 2TB > drives with 4TB drives seems pretty high. Performace would only go > down, and if the physical block size increases, then storage efficiency > would decrease. Disk bays are not necessarily all that expensive as > long as there is a place nearby to put it. > > An existing disk chassis could be replaced with one which supports more > slots and the existing drives re-used as long as they are physically > compatible with the new chassis. > > Bob or just they need a lot of capacity for e.g. video or audio media. The modest loss of performance would go roughly unnoticed. (home storage, for instance) From jimklimov at cos.ru Sun Mar 23 17:09:03 2014 From: jimklimov at cos.ru (Jim Klimov) Date: Sun, 23 Mar 2014 18:09:03 +0100 Subject: [OmniOS-discuss] Debating Swapping 2TB with 4TB drives in RaidZ2 or Create new Vol and clone In-Reply-To: References: <20140321160437.2C8B11A04E9@apps0.cs.toronto.edu> <996286416.12067302.1395419658589.JavaMail.root@vmware.com> Message-ID: 23 ????? 2014??. 16:53:38 CET, Bob Friesenhahn ?????: >On Sun, 23 Mar 2014, Jim Klimov wrote: >>> >>> Regardless, I can not imagine why someone would want to replace 2TB >>> drives with 4TB drives. >> >> Limited number of disk bays? ;) > >That would be the only reason. The cost of replacing existing 2TB >drives with 4TB drives seems pretty high. Performace would only go >down, and if the physical block size increases, then storage >efficiency would decrease. Disk bays are not necessarily all that >expensive as long as there is a place nearby to put it. 
> >An existing disk chassis could be replaced with one which supports >more slots and the existing drives re-used as long as they are >physically compatible with the new chassis. > >Bob Engineering is a matter of compromise. Something good for one usecase is not suitable for another. Consider the users of the popular HP Microserver series limited by 4-5 data disks. Consider the low-power rigs where more spindles might soon double the power draw (think of TCO over time vs. raw price of purchase). For a home nas peak performance might matter less than available volume, and even a "less efficient" storage in terms of slack space might be more efficient for mechanical performance by enforcing smaller fragmentation. So... YMMV ;) //Jim -- Typos courtesy of K-9 Mail on my Samsung Android From richard.elling at richardelling.com Sun Mar 23 23:32:15 2014 From: richard.elling at richardelling.com (Richard Elling) Date: Sun, 23 Mar 2014 16:32:15 -0700 Subject: [OmniOS-discuss] zpool degraded while smart sais disks are OK In-Reply-To: References: <39B55A5A-AA04-4C56-8A74-5B9316861405@RichardElling.com> <0D51CBC0-D049-4A12-A733-7DDB6320BD82@richardelling.com> Message-ID: On Mar 21, 2014, at 10:13 PM, Tobias Oetiker wrote: > Yesterday Richard Elling wrote: > >> >> On Mar 21, 2014, at 3:23 PM, Tobias Oetiker wrote: > > [...] >>> >>> it happened over time as you can see from the timestamps in the >>> log. The errors from zfs's point of view were 1 read and about 30 write >>> >>> but according to smart the disks are without flaw >> >> Actually, SMART is pretty dumb. In most cases, it only looks for uncorrectable >> errors that are related to media or heads. For a clue to more permanent errors, >> you will want to look at the read/write error reports for errors that are >> corrected with possible delays. You can also look at the grown defects list. >> >> This behaviour is expected for drives with errors that are not being quickly >> corrected or have firmware bugs (horrors!) and where the disk does not do TLER >> (or its vendor's equivalent) >> -- richard > > the error counters look like this: > > > Error counter log: > Errors Corrected by Total Correction Gigabytes Total > ECC rereads/ errors algorithm processed uncorrected > fast | delayed rewrites corrected invocations [10^9 bytes] errors > read: 3494 0 0 3494 44904 530.879 0 > write: 0 0 0 0 39111 1793.323 0 > verify: 0 0 0 0 8133 0.000 0 Errors corrected without delay looks good. The problem lies elsewhere. > > the disk vendor is HGST in case anyone has further ideas ... the system has 20 of these disks and the problems occured with > three of them. The system has been running fine for two months previously. ...and yet there are aborted commands, likely due to a reset after a timeout. Resets aren't issued without cause. There are two different resets issued by the sd driver: LU and bus. If the LU reset doesn't work, the resets are escalated to bus. This is, of course, tunable, but is rarely tuned. A bus reset for SAS is a questionable practice, since SAS is a fabric, not a bus. But the effect of a device in the fabric being reset could be seen as aborted commands by more than one target. To troubleshoot these cases, you need to look at all of the devices in the data path and map the common causes: HBAs, expanders, enclosures, etc. Traverse the devices looking for errors, as you did with the disks. Useful tools: sasinfo, lsiutil/sas2ircu, smp_utils, sg3_utils, mpathadm, fmtopo. 
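A rough starting point for that traversal, using mostly what ships with the
OS (sasinfo and the LSI utilities have to be installed separately; all names
below are only examples):

per-device soft/hard/transport error totals
# iostat -En

the FMA ereports behind the resets and aborts
# fmdump -eV | more

path count per LUN, to see whether one path or one expander is the common factor
# mpathadm list lu

walk the HBA/enclosure topology
# /usr/lib/fm/fmd/fmtopo | more
# sasinfo hba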
-- richard > > Vendor: HGST > Product: HUS724030ALS640 > Revision: A152 > User Capacity: 3,000,592,982,016 bytes [3.00 TB] > Logical block size: 512 bytes > Serial number: P8J20SNV > Device type: disk > Transport protocol: SAS > > cheers > tobi >> >> > > -- > Tobi Oetiker, OETIKER+PARTNER AG, Aarweg 15 CH-4600 Olten, Switzerland > www.oetiker.ch tobi at oetiker.ch +41 62 775 9902 > *** We are hiring IT staff: www.oetiker.ch/jobs *** From geoffn at gnaa.net Mon Mar 24 23:13:13 2014 From: geoffn at gnaa.net (Geoff Nordli) Date: Mon, 24 Mar 2014 16:13:13 -0700 Subject: [OmniOS-discuss] anyone using SaltStack Message-ID: <5330BC09.1080500@gnaa.net> Is anyone is using SaltStack (http://www.saltstack.com/) on OmniOS. If so, how you are getting it installed? thanks, Geoff From bfriesen at simple.dallas.tx.us Tue Mar 25 01:45:20 2014 From: bfriesen at simple.dallas.tx.us (Bob Friesenhahn) Date: Mon, 24 Mar 2014 20:45:20 -0500 (CDT) Subject: [OmniOS-discuss] Debating Swapping 2TB with 4TB drives in RaidZ2 or Create new Vol and clone In-Reply-To: <1424115820.12054356.1395416787354.JavaMail.root@vmware.com> References: <1424115820.12054356.1395416787354.JavaMail.root@vmware.com> Message-ID: On Fri, 21 Mar 2014, Matthew Mabis wrote: > > Just curious what you would do in my situation, replace the drives or build a new vDev and why? I would add a new vdev for three reasons: 1) It is usually best to let sleeping dogs lie. 2) Takes a whole lot less time. 3) About twice the total performance. Drawbacks are: 1) More cost (but time is money). 2) More hardware. 3) More physical space. 4) More power consumption. 5) Imbalanced vdevs (in terms of space). The imbalanced vdevs might be helped by a number of sends/receives to the same pool but this depends on how the filesystems are organized. Bob -- Bob Friesenhahn bfriesen at simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/ GraphicsMagick Maintainer, http://www.GraphicsMagick.org/ From Rob at Logan.com Tue Mar 25 04:01:32 2014 From: Rob at Logan.com (Rob Logan) Date: Tue, 25 Mar 2014 00:01:32 -0400 Subject: [OmniOS-discuss] Debating Swapping 2TB with 4TB drives in RaidZ2 or Create new Vol and clone In-Reply-To: References: <1424115820.12054356.1395416787354.JavaMail.root@vmware.com> Message-ID: >> Just curious what you would do in my situation, replace the drives or build a new vDev and why? I attach both 4T drives to the 2T mirrored pair, when its done resilvering, I detach the org 2T drives and add them to the 1T mirrored pair, when that?s done, I use to add the smallest disk pair as a new vdev, but now I toss them in the trash to minimize the number of devices. This takes way too much wall clock time, but only a few mins every night for three nights. my 512e drives are living fine in the 512n ashift=9 pool. 
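(In zpool terms, with oldA/oldB standing in for the current pair and
newA/newB for the 4T disks, each round of that swap is roughly:

# zpool attach z oldA newA
# zpool attach z oldB newB
wait for the resilver to finish (zpool status z)
# zpool detach z oldA
# zpool detach z oldB

and if the vdev should grow to the new size, either set autoexpand=on on the
pool beforehand or run zpool online -e against the new disks afterwards.)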
rob at nas:~# zpool history z | head History for 'z': 2012-01-04.21:26:14 zpool create zz c4t7d0 c4t2d0 c5t5d0 c4t0d0 c7t2d0 2012-01-04.21:27:10 zfs set compression=on zz 2012-01-04.21:27:39 zfs recv -vd zz 2012-01-04.21:31:37 zfs recv -vd zz 2012-01-04.21:38:35 zfs recv -vd zz rob at nas:~# zpool iostat -v capacity operations bandwidth pool alloc free read write read write ---------- ----- ----- ----- ----- ----- ----- rpool 7.30G 132G 0 0 756 1.72K mirror 7.30G 132G 0 0 756 1.72K c7t0d0s0 - - 0 0 663 1.73K c4t6d0s0 - - 0 0 664 1.73K ---------- ----- ----- ----- ----- ----- ----- z 7.28T 2.69T 12 3 1.18M 115K mirror 1.94T 1.69T 3 0 302K 25.1K c7t1d0 - - 0 3 17.6K 338K c7t3d0 - - 0 3 17.6K 338K mirror 1.80T 10.8G 1 0 164K 16.3K c5t2d0 - - 1 0 150K 16.3K c4t2d0 - - 1 0 150K 16.3K mirror 1.06T 772G 2 0 220K 18.3K c4t7d0 - - 0 1 11.1K 189K c5t7d0 - - 0 1 11.1K 189K mirror 925G 3.14G 1 0 78.7K 6.02K c4t0d0 - - 0 0 73.6K 6.04K c5t0d0 - - 0 0 73.7K 6.04K mirror 1.58T 237G 4 1 444K 49.3K c4t5d0 - - 2 0 316K 49.0K c5t5d0 - - 0 2 24.5K 286K ---------- ----- ----- ----- ----- ----- ----- rob at nas:~# hd Device Serial Vendor Model Rev Temperature ------ ------ ------ ----- ---- ----------- c4t0d0p0 WMATV00864xx ATA WDC WD1001FALS-0 0K05 36 C (96 F) c4t2d0p0 H7JR0C7006xx ATA SAMSUNG HD204UI 0001 28 C (82 F) c4t5d0p0 WMC3005796xx ATA WDC WD20EFRX-68A 0A80 29 C (84 F) c4t6d0p0 WCAMR24049xx ATA WDC WD3200JD-00K 5J08 0 C (32 F) c4t7d0p0 H7J90B9245xx ATA SAMSUNG HD204UI 0001 27 C (80 F) c5t0d0p0 WMATV00783xx ATA WDC WD1001FALS-0 0K05 36 C (96 F) c5t2d0p0 H7JD5B1029xx ATA SAMSUNG HD204UI 0001 27 C (80 F) c5t5d0p0 WMC3005605xx ATA WDC WD20EFRX-68A 0A80 29 C (84 F) c5t7d0p0 H7JD5B1029xx ATA SAMSUNG HD204UI 0001 25 C (77 F) c7t0d0p0 WMAP419728xx ATA WDC WD1500AHFD-0 7QR5 33 C (91 F) c7t1d0p0 WCC4E03420xx ATA WDC WD40EFRX-68W 0A80 30 C (86 F) c7t3d0p0 WCC4E04257xx ATA WDC WD40EFRX-68W 0A80 30 C (86 F) c7t4d0p0 H7J9HBA008xx ATA SAMSUNG HD204UI 0001 25 C (77 F) rob at nas:~# zdb | grep ashift ashift: 9 ashift: 9 ashift: 9 ashift: 9 ashift: 9 ashift: 9 From henk at hlangeveld.nl Tue Mar 25 07:39:02 2014 From: henk at hlangeveld.nl (Henk Langeveld) Date: Tue, 25 Mar 2014 08:39:02 +0100 Subject: [OmniOS-discuss] anyone using SaltStack In-Reply-To: <5330BC09.1080500@gnaa.net> References: <5330BC09.1080500@gnaa.net> Message-ID: <53313296.2050508@hlangeveld.nl> On 03/25/2014 12:13 AM, Geoff Nordli wrote: > Is anyone is using SaltStack (http://www.saltstack.com/) on OmniOS. > > If so, how you are getting it installed? Hi Geoff, I'm not using Salt, but according to the installation guide (http://docs.saltstack.com/en/latest/topics/installation/index.html) various versions of Solaris are supported. In addition, the salt-bootstrap.sh script (https://github.com/saltstack/salt-bootstrap) appears to support SmartOS. What do you need (minion or master), and what have you tried? Cheers, Henk From geoffn at gnaa.net Tue Mar 25 15:51:30 2014 From: geoffn at gnaa.net (Geoff Nordli) Date: Tue, 25 Mar 2014 08:51:30 -0700 Subject: [OmniOS-discuss] anyone using SaltStack In-Reply-To: <53313296.2050508@hlangeveld.nl> References: <5330BC09.1080500@gnaa.net> <53313296.2050508@hlangeveld.nl> Message-ID: <5331A602.8070302@gnaa.net> On 14-03-25 12:39 AM, Henk Langeveld wrote: > On 03/25/2014 12:13 AM, Geoff Nordli wrote: >> Is anyone is using SaltStack (http://www.saltstack.com/) on OmniOS. >> >> If so, how you are getting it installed? 
> > Hi Geoff, > > I'm not using Salt, but according to the installation guide > (http://docs.saltstack.com/en/latest/topics/installation/index.html) > various versions of Solaris are supported. > > In addition, the salt-bootstrap.sh script > (https://github.com/saltstack/salt-bootstrap) appears to support SmartOS. > > What do you need (minion or master), and what have you tried? > Hi Henk. I tried running the bootstrap which failed, because 0mq failed to compiled due to a missing header file. I looked at the bootstrap source file and smartOS uses pkgsrc to install the dependencies. The saltstack docs for solaris have you use opencsw. When I look at the options most likely I am going to go down the pkgsrc path. Just wondering what others have done. thanks, Geoff From carlb at flamewarestudios.com Tue Mar 25 16:27:01 2014 From: carlb at flamewarestudios.com (Carl Brunning) Date: Tue, 25 Mar 2014 16:27:01 +0000 Subject: [OmniOS-discuss] Problem with using omnios-build Message-ID: <29D2E00A4E2C9B4893E610929CA990A966A6F515@srv01.cblinux.co.uk> HI am playing with the omnios-build but when trying to do pkg build i find on the git clone it having a problem the line is this logcmd $GIT clone -b $PKG_BRANCH src at src.omniti.com:~omnios/core/pkg the branch is r151006 this is wanting a password but I don't know what it is i did see on the latest version of the build you have changed to this logcmd $GIT clone -b omni anon at src.omniti.com:~omnios/core/illumos-omni-os but this is just not finding anything Cloning into 'illumos-omni-os'... Non-existant repo core/illumos-omni-os fatal: Could not read from remote repository. Please make sure you have the correct access rights and the repository exists. so why it asking for a password for the first one lol and what is it and i hope you fix it all thanks Carl Brunning -------------- next part -------------- An HTML attachment was scrubbed... URL: From danmcd at omniti.com Tue Mar 25 17:15:13 2014 From: danmcd at omniti.com (Dan McDonald) Date: Tue, 25 Mar 2014 13:15:13 -0400 Subject: [OmniOS-discuss] Problem with using omnios-build In-Reply-To: <29D2E00A4E2C9B4893E610929CA990A966A6F515@srv01.cblinux.co.uk> References: <29D2E00A4E2C9B4893E610929CA990A966A6F515@srv01.cblinux.co.uk> Message-ID: <77965151-05DF-41EA-992A-6C48992AFBBE@omniti.com> On Mar 25, 2014, at 12:27 PM, Carl Brunning wrote: > HI > am playing with the omnios-build > but when trying to do pkg build i find on the git clone it having a problem > the line is this > logcmd $GIT clone -b $PKG_BRANCH src at src.omniti.com:~omnios/core/pkg > > the branch is r151006 I fixed some things in "master" to this. It should've just swapped out "src@" for "anon@". > this is wanting a password but I don't know what it is > i did see on the latest version of the build you have changed to this > > logcmd $GIT clone -b omni anon at src.omniti.com:~omnios/core/illumos-omni-os Hmmm. This is a very old artifact. The "master" version has this changeset: osdev2(build/pkg)[0]% git show --stat 85f25c28 commit 85f25c28969c859fb9bc838a6779dd3a70286896 Author: Theo Schlossnagle Date: Sat Mar 24 20:06:12 2012 +0000 move pkg to git and make it work build/pkg/build.sh | 47 ++++++++++++++++++++++++++++------------------- 1 file changed, 28 insertions(+), 19 deletions(-) osdev2(build/pkg)[0]% that eliminates the need for the clone to happen. Generally speaking, though, it's illumos-omnios: anon at src.omniti.com:~omnios/core/illumos-omnios Perhaps fixing that will work? 
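Spelled out, the two clone lines from the r151006-era scripts would then
read (assuming the branch names themselves are still valid):

# git clone -b r151006 anon@src.omniti.com:~omnios/core/pkg
# git clone -b omni anon@src.omniti.com:~omnios/core/illumos-omnios

i.e. anonymous access instead of src@, and illumos-omnios rather than
illumos-omni-os.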
> so why it asking for a password for the first one lol That's a bug. I didn't fix it in anywhere other than master, though. > and what is it > and i hope you fix it all I'm curious if pkg can be re-rewhacked to use headers from a completed omnios build? I'm introducing new features into omnios-build to make it completely fire-and-forget. One new feature not yet back is the PREBUILT_OMNIOS feature, that can point packages that depend on a populated illumos-omnios proto area. Perhaps I should include PREBUILT_OMNIOS support in pkg as well?! Pardon any slowness on my part. I'm new at pkg, and at the build outside of illumos-omnios itself. Thanks, Dan From carlb at flamewarestudios.com Wed Mar 26 11:13:16 2014 From: carlb at flamewarestudios.com (Carl Brunning) Date: Wed, 26 Mar 2014 11:13:16 +0000 Subject: [OmniOS-discuss] Problem with using omnios-build In-Reply-To: <77965151-05DF-41EA-992A-6C48992AFBBE@omniti.com> References: <29D2E00A4E2C9B4893E610929CA990A966A6F515@srv01.cblinux.co.uk> <77965151-05DF-41EA-992A-6C48992AFBBE@omniti.com> Message-ID: <29D2E00A4E2C9B4893E610929CA990A966AC38EC@srv01.cblinux.co.uk> Thanks for that yes that help me a little more Now I just got to work out why it has a problem uploading to the repo This is the error I get PATH=/tmp/build_admin/pkg-1.0/pkg/src/pkg/../../proto/root_i386/usr/bin:/usr/sbin:/usr/bin PYTHONPATH=/tmp/build_admin/pkg-1.0/pkg/src/pkg/../../proto/root_i386/usr/lib/python2.6/vendor-packages pkgsend -s http://repo.flamewarestudios.com:10000/ publish -d /tmp/build_admin/pkg-1.0/pkg/src/pkg/../../proto/root_i386 \ -d license_files -T \*.py --fmri-in-manifest \ pkgtmp/SUNWipkg-brand.dep.res pkgsend: Transfer from 'http://repo.removeed failed: api_errors.InvalidP5IFile:. (happened 4 times) the repo is a openindiana system could this be my problem now I have got most of the other package compile and up load with no problems Just pkg one is the problem Thanks Carl Brunning -----Original Message----- From: Dan McDonald [mailto:danmcd at omniti.com] Sent: 25 March 2014 17:15 To: Carl Brunning Cc: omnios-discuss at lists.omniti.com Subject: Re: [OmniOS-discuss] Problem with using omnios-build On Mar 25, 2014, at 12:27 PM, Carl Brunning wrote: > HI > am playing with the omnios-build > but when trying to do pkg build i find on the git clone it having a > problem the line is this logcmd $GIT clone -b $PKG_BRANCH > src at src.omniti.com:~omnios/core/pkg > > the branch is r151006 I fixed some things in "master" to this. It should've just swapped out "src@" for "anon@". > this is wanting a password but I don't know what it is i did see on > the latest version of the build you have changed to this > > logcmd $GIT clone -b omni > anon at src.omniti.com:~omnios/core/illumos-omni-os Hmmm. This is a very old artifact. The "master" version has this changeset: osdev2(build/pkg)[0]% git show --stat 85f25c28 commit 85f25c28969c859fb9bc838a6779dd3a70286896 Author: Theo Schlossnagle Date: Sat Mar 24 20:06:12 2012 +0000 move pkg to git and make it work build/pkg/build.sh | 47 ++++++++++++++++++++++++++++------------------- 1 file changed, 28 insertions(+), 19 deletions(-) osdev2(build/pkg)[0]% that eliminates the need for the clone to happen. Generally speaking, though, it's illumos-omnios: anon at src.omniti.com:~omnios/core/illumos-omnios Perhaps fixing that will work? > so why it asking for a password for the first one lol That's a bug. I didn't fix it in anywhere other than master, though. 
> and what is it > and i hope you fix it all I'm curious if pkg can be re-rewhacked to use headers from a completed omnios build? I'm introducing new features into omnios-build to make it completely fire-and-forget. One new feature not yet back is the PREBUILT_OMNIOS feature, that can point packages that depend on a populated illumos-omnios proto area. Perhaps I should include PREBUILT_OMNIOS support in pkg as well?! Pardon any slowness on my part. I'm new at pkg, and at the build outside of illumos-omnios itself. Thanks, Dan From nitram at konsortit.se Wed Mar 26 13:05:07 2014 From: nitram at konsortit.se (Nitram Grebredna) Date: Wed, 26 Mar 2014 14:05:07 +0100 Subject: [OmniOS-discuss] Multipathing, only one path visible - there ought to be two, what am i doing wrong? Message-ID: Hi! I'm having issues with multipathing, and i cant seem to figure out what is wrong. The setup is a Supermicro 24 disk-box with 3 controllers (1 pcs internal SAS2308, two 9207i-cards, same firmware on all units), identified as follows: Num Ctlr FW Ver NVDATA x86-BIOS PCI Addr ---------------------------------------------------------------------------- 0 SAS2308_2(D1) 18.00.00.00 11.00.00.05 07.33.00.00 00:01:00:00 1 SAS2308_2(D1) 18.00.00.00 11.00.00.05 07.33.00.00 00:02:00:00 2 SAS2308_2(D1) 18.00.00.00 11.00.00.05 07.33.00.00 00:03:00:00 The machine has 12 ST4000NM0023 (seagate 4TB DP) disks in it and a couple or bootdisks. The controllers are connected via 2 cables per controller to the backplane/expander. I've Installed latest stable omnios on it. Excerpt from dmesg: [...] genunix: [ID 936769 kern.info] mpt_sas2 is /pci at 0,0/pci8086,e06 at 2 ,2/pci1000,3020 at 0 scsi: [ID 583861 kern.info] mpt_sas7 at mpt_sas2: scsi-iport 4 genunix: [ID 936769 kern.info] mpt_sas7 is /pci at 0,0/pci8086,e06 at 2 ,2/pci1000,3020 at 0/iport at 4 genunix: [ID 408114 kern.info] /pci at 0,0/pci8086,e06 at 2,2/pci1000,3020 at 0 /iport at 4 (mpt_sas7) online scsi: [ID 583861 kern.info] sd11 at scsi_vhci0: unit-address g5000c50057c1fce3: conf f_sym genunix: [ID 936769 kern.info] sd11 is /scsi_vhci/disk at g5000c50057c1fce3 genunix: [ID 408114 kern.info] /scsi_vhci/disk at g5000c50057c1fce3 (sd11) online genunix: [ID 483743 kern.info] /scsi_vhci/disk at g5000c50057c1fce3 (sd11) multipath status: degraded: path 4 mpt_sas15/disk at w5000c50057c1fce1,0 is online scsi: [ID 583861 kern.info] mpt_sas11 at mpt_sas1: scsi-iport 2 genunix: [ID 936769 kern.info] mpt_sas11 is /pci at 0,0/pci8086,e04 at 2 /pci1000,3020 at 0/iport at 2 genunix: [ID 408114 kern.info] /pci at 0,0/pci8086,e04 at 2/pci1000,3020 at 0/iport at 2(mpt_sas11) online scsi: [ID 583861 kern.info] sd2 at scsi_vhci0: unit-address g5000c50057ca74bb: conf f_sym genunix: [ID 936769 kern.info] sd2 is /scsi_vhci/disk at g5000c50057ca74bb genunix: [ID 408114 kern.info] /scsi_vhci/disk at g5000c50057ca74bb (sd2) online [...] Excerpt from mpathadm: # mpathadm list lu /dev/rdsk/c1t5000C5006C0BF63Fd0s2 Total Path Count: 1 Operational Path Count: 1 /dev/rdsk/c1t5000C5006C0B29C7d0s2 Total Path Count: 1 Operational Path Count: 1 /dev/rdsk/c1t5000C50057C1F5BFd0s2 Total Path Count: 1 [...] Excerpt from format: # format Searching for disks...done AVAILABLE DISK SELECTIONS: 0. c1t5000C5006C0B29C7d0 /scsi_vhci/disk at g5000c5006c0b29c7 1. c1t5000C5006C0BF63Fd0 /scsi_vhci/disk at g5000c5006c0bf63f 2. c1t5000C50057C1F5BFd0 /scsi_vhci/disk at g5000c50057c1f5bf 3. c1t5000C50057C1FCE3d0 /scsi_vhci/disk at g5000c50057c1fce3 [...] 
If i set mpxio-disable="yes"; in mpt_sas.conf the error above obviously dissapears and also i can see the 'real' device/controller id's when issuing the format command. If things were working correctly i assume i would see a total path count of 2 per disk, and the multipath status wouldn't be set as degraded in the log? What am i doing wrong? I've asked google and since they dont know the answer to the question i'd thought i'd try a post here ;) Thanks in advance for any help, Martin -------------- next part -------------- An HTML attachment was scrubbed... URL: From svavar at januar.is Wed Mar 26 15:47:51 2014 From: svavar at januar.is (=?ISO-8859-1?Q?Svavar_=D6rn_Eysteinsson?=) Date: Wed, 26 Mar 2014 15:47:51 +0000 Subject: [OmniOS-discuss] Networking Performance Tips on HP Microserver N40L ? Message-ID: Hello people. I recently installed my first true NAS box at home, which is a HP Microserver N40L with 16GB in RAM, 1x250GB for OS and 4x 2TB Enterprise SATA disks provided by HP in a RAIDZ. I'm using the newest/updated OmniOS v11 r151008 and also Napp-it and other services. What I would like to know is, have there been any issues/problems and do people have some performance tuning tips regarding networking issues on the BC5723 controller provided by the HP Microserver ? It's the bge module/driver ? Sometimes I find the speeds to the BOX will rock up & down. I haven't configured a gigabit network, thats on the plan this weekend. I have full-duplex and flowctrl enabled. For an example, I noticed after building my small ipf firewall rules and enabled the firewall the speed did go down, specially with CIFS and NFS(didn't test the AFP). So, any performance tips out there ? Thanks in advance. Best regards, Svavar O - Reykjavik - Iceland -------------- next part -------------- An HTML attachment was scrubbed... URL: From danmcd at omniti.com Wed Mar 26 16:01:34 2014 From: danmcd at omniti.com (Dan McDonald) Date: Wed, 26 Mar 2014 12:01:34 -0400 Subject: [OmniOS-discuss] Networking Performance Tips on HP Microserver N40L ? In-Reply-To: References: Message-ID: On Mar 26, 2014, at 11:47 AM, Svavar ?rn Eysteinsson wrote: > Hello people. > I recently installed my first true NAS box at home, which is a HP Microserver N40L > with 16GB in RAM, 1x250GB for OS and 4x 2TB Enterprise SATA disks provided by HP in a RAIDZ. > > I'm using the newest/updated OmniOS v11 r151008 and also Napp-it and other services. > What I would like to know is, have there been any issues/problems and do people > have some performance tuning tips regarding networking issues on the BC5723 controller provided > by the HP Microserver ? It's the bge module/driver ? > > Sometimes I find the speeds to the BOX will rock up & down. I haven't configured > a gigabit network, thats on the plan this weekend. I have full-duplex and flowctrl enabled. > For an example, I noticed after building my small ipf firewall rules and enabled the firewall > the speed did go down, specially with CIFS and NFS(didn't test the AFP). Was performance okay pre-ipf? If so, it's probably ipf that's tripping you up. > So, any performance tips out there ? I have to ask, are you using ipf to protect the box? Or for NAT? If just to protect the box, you may be able to use something NOT ipf to help you out, depending on the problem(s) you're trying to solve. 
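Separately from the firewall question, it may be worth ruling out the link
itself before tuning anything else. A few stock commands, assuming the
interface is bge0:

negotiated state, speed and duplex
# dladm show-phys bge0

flow control setting, where the driver exposes it
# dladm show-linkprop -p flowctrl bge0

watch ierrors/oerrors/collisions creep while copying a large file
# kstat -p link:0:bge0
# netstat -i -I bge0 5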
Dan From svavar at januar.is Wed Mar 26 17:12:45 2014 From: svavar at januar.is (=?ISO-8859-1?Q?Svavar_=D6rn_Eysteinsson?=) Date: Wed, 26 Mar 2014 17:12:45 +0000 Subject: [OmniOS-discuss] Networking Performance Tips on HP Microserver N40L ? In-Reply-To: References: Message-ID: No, the performance was a little shaky before, and after the ipf activation. So I just disabled the firewall part. The reason I activated the firewall is not for NAT, just to protect the box. As I have configured my router to portmap some ports into the HP server, and I use ipf to deny/accept by source. As my stupid router firewall configuration never works. The rules I used where : # my HP server is 192.168.1.1 # anti spoofing rule block in log quick on bge0 from 192.168.1.1 to any # # Allow everything on loopbak # Rule 1 (lo0) pass in quick on lo0 proto icmp from any to any keep state pass in quick on lo0 proto tcp from any to any keep state pass in quick on lo0 proto udp from any to any keep state pass in quick on lo0 from any to any pass out quick on lo0 proto icmp from any to any keep state pass out quick on lo0 proto tcp from any to any keep state pass out quick on lo0 proto udp from any to any keep state pass out quick on lo0 from any to any # # Rule 2 (global) # SSH Access to the host; useful ICMP # types; ping request pass in quick proto icmp from any to 192.168.1.1 icmp-type 3 keep state pass in quick proto icmp from any to 192.168.1.1 icmp-type 0 code 0 keep state pass in quick proto icmp from any to 192.168.1.1 icmp-type 8 code 0 keep state pass in quick proto icmp from any to 192.168.1.1 icmp-type 11 code 0 keep state pass in quick proto icmp from any to 192.168.1.1 icmp-type 11 code 1 keep state # # Rule 4 (global) # Allow everything from these management hosts. # blackbox:Policy:4: warning: Changing rule direction due to self reference pass in quick proto icmp from MANAGENETWORK_1 to 192.168.1.1 keep state pass in quick proto icmp from MANAGENETWORK_2 to 192.168.1.1 keep state pass in quick proto icmp from MANAGEHOST_1 to 192.168.1.1 keep state pass in quick proto tcp from MANAGENETWORK_1 to 192.168.1.1 keep state pass in quick proto tcp from MANAGENETWORK_2 to 192.168.1.1 keep state pass in quick proto tcp from MANAGEHOST_1 to 192.168.1.1 keep state pass in quick proto udp from MANAGENETWORK_1 to 192.168.1.1 keep state pass in quick proto udp from MANAGENETWORK_2 to 192.168.1.1 keep state pass in quick proto udp from MANAGEHOST_1 to 192.168.1.1 keep state pass in quick from MANAGENETWORK_1 to 192.168.1.1 pass in quick from MANAGENETWORK_2 to 192.168.1.1 pass in quick from MANAGEHOST_1 to 192.168.1.1 # # Rule 5 (global) # Allow everything from the HP Server itself # blackbox:Policy:5: warning: Changing rule direction due to self reference pass out quick proto icmp from 192.168.1.1 to any keep state pass out quick proto tcp from 192.168.1.1 to any keep state pass out quick proto udp from 192.168.1.1 to any keep state pass out quick from 192.168.1.1 to any # # Rule 6 (global) block in log quick from any to any block out log quick from any to any # # Rule fallback rule # fallback rule block in quick from any to any block out quick from any to any *SVAVAR ?RN EYSTEINSSON*Kerfisstj?ri Gsm / mobile +354 862 1624 S?mi / tel +354 531 0101 *Jan?ar marka?sh?s*www.januar.is / Facebook On 26 March 2014 16:01, Dan McDonald wrote: > > On Mar 26, 2014, at 11:47 AM, Svavar ?rn Eysteinsson > wrote: > > > Hello people. 
> > I recently installed my first true NAS box at home, which is a HP > Microserver N40L > > with 16GB in RAM, 1x250GB for OS and 4x 2TB Enterprise SATA disks > provided by HP in a RAIDZ. > > > > I'm using the newest/updated OmniOS v11 r151008 and also Napp-it and > other services. > > What I would like to know is, have there been any issues/problems and do > people > > have some performance tuning tips regarding networking issues on the > BC5723 controller provided > > by the HP Microserver ? It's the bge module/driver ? > > > > Sometimes I find the speeds to the BOX will rock up & down. I haven't > configured > > a gigabit network, thats on the plan this weekend. I have full-duplex > and flowctrl enabled. > > For an example, I noticed after building my small ipf firewall rules and > enabled the firewall > > the speed did go down, specially with CIFS and NFS(didn't test the AFP). > > Was performance okay pre-ipf? If so, it's probably ipf that's tripping > you up. > > > So, any performance tips out there ? > > I have to ask, are you using ipf to protect the box? Or for NAT? If just > to protect the box, you may be able to use something NOT ipf to help you > out, depending on the problem(s) you're trying to solve. > > Dan > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From danmcd at omniti.com Wed Mar 26 17:19:42 2014 From: danmcd at omniti.com (Dan McDonald) Date: Wed, 26 Mar 2014 13:19:42 -0400 Subject: [OmniOS-discuss] Networking Performance Tips on HP Microserver N40L ? In-Reply-To: References: Message-ID: On Mar 26, 2014, at 1:12 PM, Svavar ?rn Eysteinsson wrote: > No, the performance was a little shaky before, and after the ipf activation. > So I just disabled the firewall part. > > The reason I activated the firewall is not for NAT, just to protect the box. > As I have configured my router to portmap some ports into the HP server, and I use ipf to deny/accept by source. > As my stupid router firewall configuration never works. Okay. Just checking. I tend to use ipsecconf(1M) and drop actions for this sort of thing, but it's stateless, and it appears some of your FW rules are stateful. Yes, bge is the driver for what you have. I do know that bge needs some updating, but nobody's been contributing in the Illumos community on that front. Sorry I can't be of more immediate assistance, Dan From cf at ferebee.net Wed Mar 26 17:33:17 2014 From: cf at ferebee.net (Chris Ferebee) Date: Wed, 26 Mar 2014 18:33:17 +0100 Subject: [OmniOS-discuss] Multipathing, only one path visible - there ought to be two, what am i doing wrong? In-Reply-To: References: Message-ID: <9A1D4D35-8F4D-4175-BFF7-A887846FCEBA@ferebee.net> Martin, Are you sure you have SAS expanders in your backplane? Supermicro will sell you the same chassis with or without expanders, with almost identical model numbers. You?ve described a typical JBOD configuration (i. e., no expanders): Each LSI 2308 has 8 SAS/SATA ports, 4 ports on each of 2 Mini-SAS SFF8087 connectors. Thus, with 3 controllers you are running 3 x 8 = 24 SAS ports to the backplane. Do you have the exact model number of the chassis or backplane? Best, Chris > Am 26.03.2014 um 14:05 schrieb Nitram Grebredna : > > Hi! > > I'm having issues with multipathing, and i cant seem to figure out what is wrong. 
> > The setup is a Supermicro 24 disk-box with 3 controllers (1 pcs internal SAS2308, two 9207i-cards, same firmware on all units), identified as follows: > > Num Ctlr FW Ver NVDATA x86-BIOS PCI Addr > ---------------------------------------------------------------------------- > > 0 SAS2308_2(D1) 18.00.00.00 11.00.00.05 07.33.00.00 00:01:00:00 > 1 SAS2308_2(D1) 18.00.00.00 11.00.00.05 07.33.00.00 00:02:00:00 > 2 SAS2308_2(D1) 18.00.00.00 11.00.00.05 07.33.00.00 00:03:00:00 > > The machine has 12 ST4000NM0023 (seagate 4TB DP) disks in it and a couple or bootdisks. The controllers are connected via 2 cables per controller to the backplane/expander. I've Installed latest stable omnios on it. > > Excerpt from dmesg: > > [...] > > genunix: [ID 936769 kern.info] mpt_sas2 is /pci at 0,0/pci8086,e06 at 2,2/pci1000,3020 at 0 > scsi: [ID 583861 kern.info] mpt_sas7 at mpt_sas2: scsi-iport 4 > genunix: [ID 936769 kern.info] mpt_sas7 is /pci at 0,0/pci8086,e06 at 2,2/pci1000,3020 at 0/iport at 4 > genunix: [ID 408114 kern.info] /pci at 0,0/pci8086,e06 at 2,2/pci1000,3020 at 0/iport at 4 (mpt_sas7) online > scsi: [ID 583861 kern.info] sd11 at scsi_vhci0: unit-address g5000c50057c1fce3: conf f_sym > genunix: [ID 936769 kern.info] sd11 is /scsi_vhci/disk at g5000c50057c1fce3 > genunix: [ID 408114 kern.info] /scsi_vhci/disk at g5000c50057c1fce3 (sd11) online > genunix: [ID 483743 kern.info] /scsi_vhci/disk at g5000c50057c1fce3 (sd11) multipath status: degraded: path 4 mpt_sas15/disk at w5000c50057c1fce1,0 is online > scsi: [ID 583861 kern.info] mpt_sas11 at mpt_sas1: scsi-iport 2 > genunix: [ID 936769 kern.info] mpt_sas11 is /pci at 0,0/pci8086,e04 at 2/pci1000,3020 at 0/iport at 2 > genunix: [ID 408114 kern.info] /pci at 0,0/pci8086,e04 at 2/pci1000,3020 at 0/iport at 2 (mpt_sas11) online > scsi: [ID 583861 kern.info] sd2 at scsi_vhci0: unit-address g5000c50057ca74bb: conf f_sym > genunix: [ID 936769 kern.info] sd2 is /scsi_vhci/disk at g5000c50057ca74bb > genunix: [ID 408114 kern.info] /scsi_vhci/disk at g5000c50057ca74bb (sd2) online > > [...] > > Excerpt from mpathadm: > > # mpathadm list lu > /dev/rdsk/c1t5000C5006C0BF63Fd0s2 > Total Path Count: 1 > Operational Path Count: 1 > /dev/rdsk/c1t5000C5006C0B29C7d0s2 > Total Path Count: 1 > Operational Path Count: 1 > /dev/rdsk/c1t5000C50057C1F5BFd0s2 > Total Path Count: 1 > > [...] > > > Excerpt from format: > > # format > Searching for disks...done > > > AVAILABLE DISK SELECTIONS: > 0. c1t5000C5006C0B29C7d0 > /scsi_vhci/disk at g5000c5006c0b29c7 > 1. c1t5000C5006C0BF63Fd0 > /scsi_vhci/disk at g5000c5006c0bf63f > 2. c1t5000C50057C1F5BFd0 > /scsi_vhci/disk at g5000c50057c1f5bf > 3. c1t5000C50057C1FCE3d0 > /scsi_vhci/disk at g5000c50057c1fce3 > > [...] > > If i set mpxio-disable="yes"; in mpt_sas.conf the error above obviously dissapears and also i can see the 'real' device/controller id's when issuing the format command. > > If things were working correctly i assume i would see a total path count of 2 per disk, and the multipath status wouldn't be set as degraded in the log? What am i doing wrong? I've asked google and since they dont know the answer to the question i'd thought i'd try a post here ;) > > Thanks in advance for any help, > > Martin > _______________________________________________ > OmniOS-discuss mailing list > OmniOS-discuss at lists.omniti.com > http://lists.omniti.com/mailman/listinfo/omnios-discuss -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From russhan at new-swankton.net Wed Mar 26 19:02:23 2014 From: russhan at new-swankton.net (Russell Hansen) Date: Wed, 26 Mar 2014 19:02:23 +0000 Subject: [OmniOS-discuss] Multipathing, only one path visible - there ought to be two, what am i doing wrong? In-Reply-To: <9A1D4D35-8F4D-4175-BFF7-A887846FCEBA@ferebee.net> References: , <9A1D4D35-8F4D-4175-BFF7-A887846FCEBA@ferebee.net> Message-ID: <0AE3E26797567E4AAB5C53C304D024455DA1A23D@ns-ex2010.new-swankton.lan> Because those disks don't have Sun/Oracle firmware I believe you need to update /kernel/drv/scsi_vhci.conf scsi-vhci-failover-override = "SEAGATE ST3300657SS", "f_sym", "SEAGATE ST4000NM0023", "f_sym"; You can double-check the VID/PID string by running format -> disk# -> inquiry -Russ From: OmniOS-discuss [omnios-discuss-bounces at lists.omniti.com] on behalf of Chris Ferebee [cf at ferebee.net] Sent: Wednesday, March 26, 2014 10:33 AM To: omnios-discuss at lists.omniti.com Subject: Re: [OmniOS-discuss] Multipathing, only one path visible - there ought to be two, what am i doing wrong? Martin, Are you sure you have SAS expanders in your backplane? Supermicro will sell you the same chassis with or without expanders, with almost identical model numbers. You?ve described a typical JBOD configuration (i. e., no expanders): Each LSI 2308 has 8 SAS/SATA ports, 4 ports on each of 2 Mini-SAS SFF8087 connectors. Thus, with 3 controllers you are running 3 x 8 = 24 SAS ports to the backplane. Do you have the exact model number of the chassis or backplane? Best, Chris Am 26.03.2014 um 14:05 schrieb Nitram Grebredna : Hi! I'm having issues with multipathing, and i cant seem to figure out what is wrong. The setup is a Supermicro 24 disk-box with 3 controllers (1 pcs internal SAS2308, two 9207i-cards, same firmware on all units), identified as follows: Num Ctlr FW Ver NVDATA x86-BIOS PCI Addr ---------------------------------------------------------------------------- 0 SAS2308_2(D1) 18.00.00.00 11.00.00.05 07.33.00.00 00:01:00:00 1 SAS2308_2(D1) 18.00.00.00 11.00.00.05 07.33.00.00 00:02:00:00 2 SAS2308_2(D1) 18.00.00.00 11.00.00.05 07.33.00.00 00:03:00:00 The machine has 12 ST4000NM0023 (seagate 4TB DP) disks in it and a couple or bootdisks. The controllers are connected via 2 cables per controller to the backplane/expander. I've Installed latest stable omnios on it. Excerpt from dmesg: [...] 
genunix: [ID 936769 kern.info] mpt_sas2 is /pci at 0,0/pci8086,e06 at 2,2/pci1000,3020 at 0 scsi: [ID 583861 kern.info] mpt_sas7 at mpt_sas2: scsi-iport 4 genunix: [ID 936769 kern.info] mpt_sas7 is /pci at 0,0/pci8086,e06 at 2,2/pci1000,3020 at 0/iport at 4 genunix: [ID 408114 kern.info] /pci at 0,0/pci8086,e06 at 2,2/pci1000,3020 at 0/iport at 4 (mpt_sas7) online scsi: [ID 583861 kern.info] sd11 at scsi_vhci0: unit-address g5000c50057c1fce3: conf f_sym genunix: [ID 936769 kern.info] sd11 is /scsi_vhci/disk at g5000c50057c1fce3 genunix: [ID 408114 kern.info] /scsi_vhci/disk at g5000c50057c1fce3 (sd11) online genunix: [ID 483743 kern.info] /scsi_vhci/disk at g5000c50057c1fce3 (sd11) multipath status: degraded: path 4 mpt_sas15/disk at w5000c50057c1fce1,0 is online scsi: [ID 583861 kern.info] mpt_sas11 at mpt_sas1: scsi-iport 2 genunix: [ID 936769 kern.info] mpt_sas11 is /pci at 0,0/pci8086,e04 at 2/pci1000,3020 at 0/iport at 2 genunix: [ID 408114 kern.info] /pci at 0,0/pci8086,e04 at 2/pci1000,3020 at 0/iport at 2 (mpt_sas11) online scsi: [ID 583861 kern.info] sd2 at scsi_vhci0: unit-address g5000c50057ca74bb: conf f_sym genunix: [ID 936769 kern.info] sd2 is /scsi_vhci/disk at g5000c50057ca74bb genunix: [ID 408114 kern.info] /scsi_vhci/disk at g5000c50057ca74bb (sd2) online [...] Excerpt from mpathadm: # mpathadm list lu /dev/rdsk/c1t5000C5006C0BF63Fd0s2 Total Path Count: 1 Operational Path Count: 1 /dev/rdsk/c1t5000C5006C0B29C7d0s2 Total Path Count: 1 Operational Path Count: 1 /dev/rdsk/c1t5000C50057C1F5BFd0s2 Total Path Count: 1 [...] Excerpt from format: # format Searching for disks...done AVAILABLE DISK SELECTIONS: 0. c1t5000C5006C0B29C7d0 /scsi_vhci/disk at g5000c5006c0b29c7 1. c1t5000C5006C0BF63Fd0 /scsi_vhci/disk at g5000c5006c0bf63f 2. c1t5000C50057C1F5BFd0 /scsi_vhci/disk at g5000c50057c1f5bf 3. c1t5000C50057C1FCE3d0 /scsi_vhci/disk at g5000c50057c1fce3 [...] If i set mpxio-disable="yes"; in mpt_sas.conf the error above obviously dissapears and also i can see the 'real' device/controller id's when issuing the format command. If things were working correctly i assume i would see a total path count of 2 per disk, and the multipath status wouldn't be set as degraded in the log? What am i doing wrong? 
I've asked google and since they dont know the answer to the question i'd thought i'd try a post here ;) Thanks in advance for any help, Martin _______________________________________________ OmniOS-discuss mailing list OmniOS-discuss at lists.omniti.com http://lists.omniti.com/mailman/listinfo/omnios-discuss From carlb at flamewarestudios.com Wed Mar 26 22:45:49 2014 From: carlb at flamewarestudios.com (Carl Brunning) Date: Wed, 26 Mar 2014 22:45:49 +0000 Subject: [OmniOS-discuss] Problem with using omnios-build In-Reply-To: <29D2E00A4E2C9B4893E610929CA990A966AC38EC@srv01.cblinux.co.uk> References: <29D2E00A4E2C9B4893E610929CA990A966A6F515@srv01.cblinux.co.uk> <77965151-05DF-41EA-992A-6C48992AFBBE@omniti.com> <29D2E00A4E2C9B4893E610929CA990A966AC38EC@srv01.cblinux.co.uk> Message-ID: <29D2E00A4E2C9B4893E610929CA990A966AC817F@srv01.cblinux.co.uk> Anyone Sorry if am pain just want to learn from this And why it not work Thanks carl brunning -----Original Message----- From: OmniOS-discuss [mailto:omnios-discuss-bounces at lists.omniti.com] On Behalf Of Carl Brunning Sent: 26 March 2014 11:13 To: Dan McDonald Cc: omnios-discuss at lists.omniti.com Subject: Re: [OmniOS-discuss] Problem with using omnios-build Thanks for that yes that help me a little more Now I just got to work out why it has a problem uploading to the repo This is the error I get PATH=/tmp/build_admin/pkg-1.0/pkg/src/pkg/../../proto/root_i386/usr/bin:/usr/sbin:/usr/bin PYTHONPATH=/tmp/build_admin/pkg-1.0/pkg/src/pkg/../../proto/root_i386/usr/lib/python2.6/vendor-packages pkgsend -s http://repo.flamewarestudios.com:10000/ publish -d /tmp/build_admin/pkg-1.0/pkg/src/pkg/../../proto/root_i386 \ -d license_files -T \*.py --fmri-in-manifest \ pkgtmp/SUNWipkg-brand.dep.res pkgsend: Transfer from 'http://repo.removeed failed: api_errors.InvalidP5IFile:. (happened 4 times) the repo is a openindiana system could this be my problem now I have got most of the other package compile and up load with no problems Just pkg one is the problem Thanks Carl Brunning -----Original Message----- From: Dan McDonald [mailto:danmcd at omniti.com] Sent: 25 March 2014 17:15 To: Carl Brunning Cc: omnios-discuss at lists.omniti.com Subject: Re: [OmniOS-discuss] Problem with using omnios-build On Mar 25, 2014, at 12:27 PM, Carl Brunning wrote: > HI > am playing with the omnios-build > but when trying to do pkg build i find on the git clone it having a > problem the line is this logcmd $GIT clone -b $PKG_BRANCH > src at src.omniti.com:~omnios/core/pkg > > the branch is r151006 I fixed some things in "master" to this. It should've just swapped out "src@" for "anon@". > this is wanting a password but I don't know what it is i did see on > the latest version of the build you have changed to this > > logcmd $GIT clone -b omni > anon at src.omniti.com:~omnios/core/illumos-omni-os Hmmm. This is a very old artifact. The "master" version has this changeset: osdev2(build/pkg)[0]% git show --stat 85f25c28 commit 85f25c28969c859fb9bc838a6779dd3a70286896 Author: Theo Schlossnagle Date: Sat Mar 24 20:06:12 2012 +0000 move pkg to git and make it work build/pkg/build.sh | 47 ++++++++++++++++++++++++++++------------------- 1 file changed, 28 insertions(+), 19 deletions(-) osdev2(build/pkg)[0]% that eliminates the need for the clone to happen. Generally speaking, though, it's illumos-omnios: anon at src.omniti.com:~omnios/core/illumos-omnios Perhaps fixing that will work? > so why it asking for a password for the first one lol That's a bug. 
I didn't fix it in anywhere other than master, though. > and what is it > and i hope you fix it all I'm curious if pkg can be re-rewhacked to use headers from a completed omnios build? I'm introducing new features into omnios-build to make it completely fire-and-forget. One new feature not yet back is the PREBUILT_OMNIOS feature, that can point packages that depend on a populated illumos-omnios proto area. Perhaps I should include PREBUILT_OMNIOS support in pkg as well?! Pardon any slowness on my part. I'm new at pkg, and at the build outside of illumos-omnios itself. Thanks, Dan _______________________________________________ OmniOS-discuss mailing list OmniOS-discuss at lists.omniti.com http://lists.omniti.com/mailman/listinfo/omnios-discuss From carlb at flamewarestudios.com Thu Mar 27 16:56:35 2014 From: carlb at flamewarestudios.com (Carl Brunning) Date: Thu, 27 Mar 2014 16:56:35 +0000 Subject: [OmniOS-discuss] Problem with using omnios-build In-Reply-To: <29D2E00A4E2C9B4893E610929CA990A966AC817F@srv01.cblinux.co.uk> References: <29D2E00A4E2C9B4893E610929CA990A966A6F515@srv01.cblinux.co.uk> <77965151-05DF-41EA-992A-6C48992AFBBE@omniti.com> <29D2E00A4E2C9B4893E610929CA990A966AC38EC@srv01.cblinux.co.uk>, <29D2E00A4E2C9B4893E610929CA990A966AC817F@srv01.cblinux.co.uk> Message-ID: <29D2E00A4E2C9B4893E610929CA990A966ACA4E6@srv01.cblinux.co.uk> HI just to say I've fixed it so it all good now thanks Carl Brunning ________________________________________ From: Carl Brunning Sent: 26 March 2014 22:45 To: Carl Brunning; Dan McDonald Cc: omnios-discuss at lists.omniti.com Subject: RE: [OmniOS-discuss] Problem with using omnios-build Anyone Sorry if am pain just want to learn from this And why it not work Thanks carl brunning -----Original Message----- From: OmniOS-discuss [mailto:omnios-discuss-bounces at lists.omniti.com] On Behalf Of Carl Brunning Sent: 26 March 2014 11:13 To: Dan McDonald Cc: omnios-discuss at lists.omniti.com Subject: Re: [OmniOS-discuss] Problem with using omnios-build Thanks for that yes that help me a little more Now I just got to work out why it has a problem uploading to the repo This is the error I get PATH=/tmp/build_admin/pkg-1.0/pkg/src/pkg/../../proto/root_i386/usr/bin:/usr/sbin:/usr/bin PYTHONPATH=/tmp/build_admin/pkg-1.0/pkg/src/pkg/../../proto/root_i386/usr/lib/python2.6/vendor-packages pkgsend -s http://repo.flamewarestudios.com:10000/ publish -d /tmp/build_admin/pkg-1.0/pkg/src/pkg/../../proto/root_i386 \ -d license_files -T \*.py --fmri-in-manifest \ pkgtmp/SUNWipkg-brand.dep.res pkgsend: Transfer from 'http://repo.removeed failed: api_errors.InvalidP5IFile:. (happened 4 times) the repo is a openindiana system could this be my problem now I have got most of the other package compile and up load with no problems Just pkg one is the problem Thanks Carl Brunning -----Original Message----- From: Dan McDonald [mailto:danmcd at omniti.com] Sent: 25 March 2014 17:15 To: Carl Brunning Cc: omnios-discuss at lists.omniti.com Subject: Re: [OmniOS-discuss] Problem with using omnios-build On Mar 25, 2014, at 12:27 PM, Carl Brunning wrote: > HI > am playing with the omnios-build > but when trying to do pkg build i find on the git clone it having a > problem the line is this logcmd $GIT clone -b $PKG_BRANCH > src at src.omniti.com:~omnios/core/pkg > > the branch is r151006 I fixed some things in "master" to this. It should've just swapped out "src@" for "anon@". 
> this is wanting a password but I don't know what it is i did see on > the latest version of the build you have changed to this > > logcmd $GIT clone -b omni > anon at src.omniti.com:~omnios/core/illumos-omni-os Hmmm. This is a very old artifact. The "master" version has this changeset: osdev2(build/pkg)[0]% git show --stat 85f25c28 commit 85f25c28969c859fb9bc838a6779dd3a70286896 Author: Theo Schlossnagle Date: Sat Mar 24 20:06:12 2012 +0000 move pkg to git and make it work build/pkg/build.sh | 47 ++++++++++++++++++++++++++++------------------- 1 file changed, 28 insertions(+), 19 deletions(-) osdev2(build/pkg)[0]% that eliminates the need for the clone to happen. Generally speaking, though, it's illumos-omnios: anon at src.omniti.com:~omnios/core/illumos-omnios Perhaps fixing that will work? > so why it asking for a password for the first one lol That's a bug. I didn't fix it in anywhere other than master, though. > and what is it > and i hope you fix it all I'm curious if pkg can be re-rewhacked to use headers from a completed omnios build? I'm introducing new features into omnios-build to make it completely fire-and-forget. One new feature not yet back is the PREBUILT_OMNIOS feature, that can point packages that depend on a populated illumos-omnios proto area. Perhaps I should include PREBUILT_OMNIOS support in pkg as well?! Pardon any slowness on my part. I'm new at pkg, and at the build outside of illumos-omnios itself. Thanks, Dan _______________________________________________ OmniOS-discuss mailing list OmniOS-discuss at lists.omniti.com http://lists.omniti.com/mailman/listinfo/omnios-discuss From danmcd at omniti.com Thu Mar 27 17:10:46 2014 From: danmcd at omniti.com (Dan McDonald) Date: Thu, 27 Mar 2014 13:10:46 -0400 Subject: [OmniOS-discuss] Problem with using omnios-build In-Reply-To: <29D2E00A4E2C9B4893E610929CA990A966ACA4E6@srv01.cblinux.co.uk> References: <29D2E00A4E2C9B4893E610929CA990A966A6F515@srv01.cblinux.co.uk> <77965151-05DF-41EA-992A-6C48992AFBBE@omniti.com> <29D2E00A4E2C9B4893E610929CA990A966AC38EC@srv01.cblinux.co.uk>, <29D2E00A4E2C9B4893E610929CA990A966AC817F@srv01.cblinux.co.uk> <29D2E00A4E2C9B4893E610929CA990A966ACA4E6@srv01.cblinux.co.uk> Message-ID: <9B51789D-9DB2-4406-B57C-B694AE245C82@omniti.com> On Mar 27, 2014, at 12:56 PM, Carl Brunning wrote: > HI just to say I've fixed it > so it all good now Good! Sorry I didn't respond earlier. A recent push into omnios-build has cause one of my works-in-progress some problems, so I've been debugging that. How did you fix your problem? Is it something we need to fix properly? Do you want to contribute if it is? 
Thanks, Dan From carlb at flamewarestudios.com Thu Mar 27 17:38:48 2014 From: carlb at flamewarestudios.com (Carl Brunning) Date: Thu, 27 Mar 2014 17:38:48 +0000 Subject: [OmniOS-discuss] Problem with using omnios-build In-Reply-To: <9B51789D-9DB2-4406-B57C-B694AE245C82@omniti.com> References: <29D2E00A4E2C9B4893E610929CA990A966A6F515@srv01.cblinux.co.uk> <77965151-05DF-41EA-992A-6C48992AFBBE@omniti.com> <29D2E00A4E2C9B4893E610929CA990A966AC38EC@srv01.cblinux.co.uk>, <29D2E00A4E2C9B4893E610929CA990A966AC817F@srv01.cblinux.co.uk> <29D2E00A4E2C9B4893E610929CA990A966ACA4E6@srv01.cblinux.co.uk> <9B51789D-9DB2-4406-B57C-B694AE245C82@omniti.com> Message-ID: <29D2E00A4E2C9B4893E610929CA990A966ACB534@srv01.cblinux.co.uk> Hay not a problem What I found I had bad clock skew So reset the clock on the build machine and on the repo machine so far has fixed that problem Did not know bad clock can casue so much build problems lol Even caused my illumos build problems as well Anyway am all good now Just to to see what my next problem is and fix it Keep up the good work I like the build scripts, have to see if they can be used for other os build lol Thanks Carl Brunning -----Original Message----- From: Dan McDonald [mailto:danmcd at omniti.com] Sent: 27 March 2014 17:11 To: Carl Brunning Cc: omnios-discuss at lists.omniti.com Subject: Re: [OmniOS-discuss] Problem with using omnios-build On Mar 27, 2014, at 12:56 PM, Carl Brunning wrote: > HI just to say I've fixed it > so it all good now Good! Sorry I didn't respond earlier. A recent push into omnios-build has cause one of my works-in-progress some problems, so I've been debugging that. How did you fix your problem? Is it something we need to fix properly? Do you want to contribute if it is? Thanks, Dan From nitram at konsortit.se Fri Mar 28 15:33:17 2014 From: nitram at konsortit.se (Nitram Grebredna) Date: Fri, 28 Mar 2014 16:33:17 +0100 Subject: [OmniOS-discuss] Multipathing, only one path visible - there ought to be two, what am i doing wrong? In-Reply-To: <0AE3E26797567E4AAB5C53C304D024455DA1A23D@ns-ex2010.new-swankton.lan> References: <9A1D4D35-8F4D-4175-BFF7-A887846FCEBA@ferebee.net> <0AE3E26797567E4AAB5C53C304D024455DA1A23D@ns-ex2010.new-swankton.lan> Message-ID: Hi guys! Thanks for the input. I've had a second look at the backplane and you were right Chris, it's not dual port. Should i set mpxio-disable="yes" to disable mpxio on the sas driver or should i leave it as is? My guess would be to disable mpxio to actually see which controller holds which disk, this to be able to spread the mirror-sets across multiple controllers. Would you agree? Best regards, Martin On Wed, Mar 26, 2014 at 8:02 PM, Russell Hansen wrote: > Because those disks don't have Sun/Oracle firmware I believe you need to > update /kernel/drv/scsi_vhci.conf > > scsi-vhci-failover-override = > "SEAGATE ST3300657SS", "f_sym", > "SEAGATE ST4000NM0023", "f_sym"; > > You can double-check the VID/PID string by running format -> disk# -> > inquiry > > -Russ > > > > From: OmniOS-discuss [omnios-discuss-bounces at lists.omniti.com] on behalf > of Chris Ferebee [cf at ferebee.net] > > Sent: Wednesday, March 26, 2014 10:33 AM > > To: omnios-discuss at lists.omniti.com > > Subject: Re: [OmniOS-discuss] Multipathing, only one path visible - there > ought to be two, what am i doing wrong? > > > > > > > Martin, > > > > Are you sure you have SAS expanders in your backplane? Supermicro will > sell you the same chassis with or without expanders, with almost identical > model numbers. 
> > > > You've described a typical JBOD configuration (i. e., no expanders): Each > LSI 2308 has 8 SAS/SATA ports, 4 ports on each of 2 Mini-SAS SFF8087 > connectors. Thus, with 3 controllers you are running 3 x 8 = 24 SAS ports > to the backplane. > > > > Do you have the exact model number of the chassis or backplane? > > > > Best, > Chris > > > Am 26.03.2014 um 14:05 schrieb Nitram Grebredna : > > > > > > > > > > > > > > > > > Hi! > > > > > I'm having issues with multipathing, and i cant seem to figure out what is > wrong. > > > > > > The setup is a Supermicro 24 disk-box with 3 controllers (1 pcs internal > SAS2308, two 9207i-cards, same firmware on all units), identified as > follows: > > > > Num Ctlr FW Ver NVDATA x86-BIOS PCI Addr > > > ---------------------------------------------------------------------------- > > > > 0 SAS2308_2(D1) 18.00.00.00 11.00.00.05 07.33.00.00 > 00:01:00:00 > > 1 SAS2308_2(D1) 18.00.00.00 11.00.00.05 07.33.00.00 > 00:02:00:00 > > 2 SAS2308_2(D1) 18.00.00.00 11.00.00.05 07.33.00.00 > 00:03:00:00 > > > > > The machine has 12 ST4000NM0023 (seagate 4TB DP) disks in it and a couple > or bootdisks. The controllers are connected via 2 cables per controller to > the backplane/expander. I've Installed latest stable omnios on it. > > > > > Excerpt from dmesg: > > > > [...] > > > > genunix: [ID 936769 > kern.info] mpt_sas2 is /pci at 0,0/pci8086,e06 at 2,2/pci1000,3020 at 0 > > scsi: [ID 583861 > kern.info] mpt_sas7 at mpt_sas2: scsi-iport 4 > > genunix: [ID 936769 > kern.info] mpt_sas7 is /pci at 0,0/pci8086,e06 at 2,2/pci1000,3020 at 0/iport at 4 > > genunix: [ID 408114 > kern.info] /pci at 0,0/pci8086,e06 at 2,2/pci1000,3020 at 0/iport at 4 (mpt_sas7) > online > > scsi: [ID 583861 > kern.info] sd11 at scsi_vhci0: unit-address g5000c50057c1fce3: conf f_sym > > genunix: [ID 936769 > kern.info] sd11 is /scsi_vhci/disk at g5000c50057c1fce3 > > genunix: [ID 408114 > kern.info] /scsi_vhci/disk at g5000c50057c1fce3 (sd11) online > > genunix: [ID 483743 > kern.info] /scsi_vhci/disk at g5000c50057c1fce3 (sd11) multipath status: > degraded: path 4 mpt_sas15/disk at w5000c50057c1fce1,0 is online > > scsi: [ID 583861 > kern.info] mpt_sas11 at mpt_sas1: scsi-iport 2 > > genunix: [ID 936769 > kern.info] mpt_sas11 is /pci at 0,0/pci8086,e04 at 2/pci1000,3020 at 0/iport at 2 > > genunix: [ID 408114 > kern.info] /pci at 0,0/pci8086,e04 at 2/pci1000,3020 at 0/iport at 2 (mpt_sas11) > online > > scsi: [ID 583861 > kern.info] sd2 at scsi_vhci0: unit-address g5000c50057ca74bb: conf f_sym > > genunix: [ID 936769 > kern.info] sd2 is /scsi_vhci/disk at g5000c50057ca74bb > > genunix: [ID 408114 > kern.info] /scsi_vhci/disk at g5000c50057ca74bb (sd2) online > > > > [...] > > > > > Excerpt from mpathadm: > > > > # mpathadm list lu > > /dev/rdsk/c1t5000C5006C0BF63Fd0s2 > > Total Path Count: 1 > > Operational Path Count: 1 > > /dev/rdsk/c1t5000C5006C0B29C7d0s2 > > Total Path Count: 1 > > Operational Path Count: 1 > > /dev/rdsk/c1t5000C50057C1F5BFd0s2 > > Total Path Count: 1 > > > > [...] > > > > > > > Excerpt from format: > > > > # format > > Searching for disks...done > > > > > > AVAILABLE DISK SELECTIONS: > > 0. c1t5000C5006C0B29C7d0 hd 255 sec 63> > > /scsi_vhci/disk at g5000c5006c0b29c7 > > 1. c1t5000C5006C0BF63Fd0 hd 255 sec 63> > > /scsi_vhci/disk at g5000c5006c0bf63f > > 2. c1t5000C50057C1F5BFd0 > > /scsi_vhci/disk at g5000c50057c1f5bf > > 3. c1t5000C50057C1FCE3d0 > > /scsi_vhci/disk at g5000c50057c1fce3 > > > > [...] 
> > > > > If i set mpxio-disable="yes"; in mpt_sas.conf the error above obviously > dissapears and also i can see the 'real' device/controller id's when > issuing the format command. > > > > > > If things were working correctly i assume i would see a total path count > of 2 per disk, and the multipath status wouldn't be set as degraded in the > log? What am i doing wrong? I've asked google and since they dont know the > answer to the question i'd thought > i'd try a post here ;) > > > > > Thanks in advance for any help, > > > > > Martin > > > > > > _______________________________________________ > > OmniOS-discuss mailing list > > OmniOS-discuss at lists.omniti.com > > http://lists.omniti.com/mailman/listinfo/omnios-discuss > > > > > > > _______________________________________________ > OmniOS-discuss mailing list > OmniOS-discuss at lists.omniti.com > http://lists.omniti.com/mailman/listinfo/omnios-discuss > -------------- next part -------------- An HTML attachment was scrubbed... URL: From 4omnios at nccg.de Sat Mar 29 21:27:49 2014 From: 4omnios at nccg.de (4omnios at nccg.de) Date: Sat, 29 Mar 2014 22:27:49 +0100 Subject: [OmniOS-discuss] (r151008) iscsi target, LU etc missing after reboot Message-ID: === error symptoms system with iscsi and SRP target. works great ... until reboot then only SRP target shows up Target: eui.001A4BFFFF0C3218 but no iscsi target (iqn...), no LU. Neither do configured TG and HG exist anymore. svcs export/import stmf does not help === software versions omnios-6de5e81, OmniOS v11 r151008 * srpt pck was already installed, nothing else added * napp-it running version : 0.9e1_nightly Jan.25.2014 thx for any ideas how to solve -------------- next part -------------- An HTML attachment was scrubbed... URL: From johan.kragsterman at capvert.se Sun Mar 30 11:42:12 2014 From: johan.kragsterman at capvert.se (Johan Kragsterman) Date: Sun, 30 Mar 2014 13:42:12 +0200 Subject: [OmniOS-discuss] Again: Infiniband, OVUF, subnet manager, etc. Garret? Message-ID: Hi! Back again with some infiniband questions: I've been trying to read up around the "IB subnet manager" in omnios/illumos, and what I don't understand so far is, if it is included in OVUF or not...? There are very few information sources out there regarding this... What I also THINK I understand, is that OVUF is included in the ofk package. Or is this all libraries, no tools...? If there are tools around, pls inform me... Anyone with information on the subject? Garret? So why am I nagging about this subject in eternity...? It is because I would like to implement a dual storage head failover solution with zil on IB. So I'd prefer it to be direct attach, and would like to avoid IB switches. Best regards from/Med v?nliga h?lsningar fr?n Johan Kragsterman Capvert From tobi at oetiker.ch Mon Mar 31 14:16:08 2014 From: tobi at oetiker.ch (Tobias Oetiker) Date: Mon, 31 Mar 2014 16:16:08 +0200 (CEST) Subject: [OmniOS-discuss] zpool degraded while smart sais disks are OK In-Reply-To: References: <39B55A5A-AA04-4C56-8A74-5B9316861405@RichardElling.com> <0D51CBC0-D049-4A12-A733-7DDB6320BD82@richardelling.com> Message-ID: Hi Richard, Mar 23 Richard Elling wrote: > > On Mar 21, 2014, at 10:13 PM, Tobias Oetiker wrote: > > > Yesterday Richard Elling wrote: > > > >> > >> On Mar 21, 2014, at 3:23 PM, Tobias Oetiker wrote: > > > > [...] > >>> > >>> it happened over time as you can see from the timestamps in the > >>> log. 
> >>> The errors from zfs's point of view were 1 read and about 30 write,
> >>>
> >>> but according to smart the disks are without flaw
> >>
> >> Actually, SMART is pretty dumb. In most cases, it only looks for uncorrectable
> >> errors that are related to media or heads. For a clue to more permanent errors,
> >> you will want to look at the read/write error reports for errors that are
> >> corrected with possible delays. You can also look at the grown defects list.
> >>
> >> This behaviour is expected for drives with errors that are not being quickly
> >> corrected or have firmware bugs (horrors!) and where the disk does not do TLER
> >> (or its vendor's equivalent).
> >> -- richard
> >
> > The error counters look like this:
> >
> > Error counter log:
> >            Errors Corrected by           Total   Correction     Gigabytes    Total
> >                ECC          rereads/      errors   algorithm      processed    uncorrected
> >            fast | delayed   rewrites  corrected  invocations   [10^9 bytes]  errors
> > read:    3494        0         0      3494      44904        530.879           0
> > write:      0        0         0         0      39111       1793.323           0
> > verify:     0        0         0         0       8133          0.000           0
>
> Errors corrected without delay looks good. The problem lies elsewhere.
>
> > The disk vendor is HGST in case anyone has further ideas ... the system has
> > 20 of these disks and the problems occurred with three of them. The system
> > has been running fine for two months previously.
>
> ...and yet there are aborted commands, likely due to a reset after a timeout.
> Resets aren't issued without cause.
>
> There are two different resets issued by the sd driver: LU and bus. If the
> LU reset doesn't work, the resets are escalated to bus. This is, of course,
> tunable, but is rarely tuned. A bus reset for SAS is a questionable practice,
> since SAS is a fabric, not a bus. But the effect of a device in the fabric
> being reset could be seen as aborted commands by more than one target. To
> troubleshoot these cases, you need to look at all of the devices in the data
> path and map the common causes: HBAs, expanders, enclosures, etc. Traverse
> the devices looking for errors, as you did with the disks. Useful tools:
> sasinfo, lsiutil/sas2ircu, smp_utils, sg3_utils, mpathadm, fmtopo.

Thanks for the hints ... after detaching/attaching the 'failed' disks, they
got resilvered and a subsequent scrub did not detect any errors ... all a bit
mysterious ... will keep an eye on the box to see how it fares in the future ...

cheers
tobi

--
Tobi Oetiker, OETIKER+PARTNER AG, Aarweg 15 CH-4600 Olten, Switzerland
www.oetiker.ch tobi at oetiker.ch +41 62 775 9902
*** We are hiring IT staff: www.oetiker.ch/jobs ***

From steve at linuxsuite.org  Mon Mar 31 19:04:10 2014
From: steve at linuxsuite.org (steve at linuxsuite.org)
Date: Mon, 31 Mar 2014 15:04:10 -0400
Subject: [OmniOS-discuss] How to disable ata module / driver at boot
Message-ID: <7409d33d8efc08eccda1cecdc31bd7ea.squirrel@emailmg.netfirms.com>

Howdy!

I have OmniOS running on a Dell R710 and get these warnings for device ata0.
The device is a TEAC DVD-ROM.

kern.warning<4>: Nov 11 09:30:05 dfs2 #011timeout: reset target, target=0 lun=0
kern.warning<4>: Nov 11 09:30:05 dfs2 scsi: [ID 107833 kern.warning] WARNING: /pci@0,0/pci-ide@1f,2/ide@0 (ata0):
kern.warning<4>: Nov 11 09:30:05 dfs2 #011timeout: reset bus, target=0 lun=0
kern.info<6>: Nov 11 09:35:56 dfs2 pci_autoconfig: [ID 595143 kern.info] NOTICE: add io-range on subtractive ppb[0/1e/0]: 0x3000 ~ 0x3fff

Then the system hangs and needs to be power cycled.
kern.info<6>: Nov 11 09:35:56 dfs2 genunix: [ID 936769 kern.info] pseudo0 is /pseudo

May not be related, but I would like to reboot so that OmniOS does not see
the device, by not loading the driver / module. I do not need the device
after the system install. What is the best way to do this?

thanx

- steve

From jdg117 at elvis.arl.psu.edu  Mon Mar 31 23:31:24 2014
From: jdg117 at elvis.arl.psu.edu (John D Groenveld)
Date: Mon, 31 Mar 2014 19:31:24 -0400
Subject: [OmniOS-discuss] How to disable ata module / driver at boot
In-Reply-To: Your message of "Mon, 31 Mar 2014 15:04:10 EDT."
 <7409d33d8efc08eccda1cecdc31bd7ea.squirrel@emailmg.netfirms.com>
References: <7409d33d8efc08eccda1cecdc31bd7ea.squirrel@emailmg.netfirms.com>
Message-ID: <201403312331.s2VNVOIW011926@elvis.arl.psu.edu>

In message <7409d33d8efc08eccda1cecdc31bd7ea.squirrel at emailmg.netfirms.com>,
steve at linuxsuite.org writes:
> May not be related, but I would like to reboot so that OmniOS does not
> see the device by not loading the driver / module. I do not need the
> device after system install..

disable-ata=true

John
groenveld at acm.org
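A note on the multipathing thread earlier in this digest: the commands below
are a minimal sketch for inspecting MPxIO state on an mpt_sas system, not a
fix for the degraded paths reported there (whether a second path exists at all
depends on how the backplane is wired, which is exactly what the reply asks
about). The disk device name is only an example lifted from the thread, and
the per-driver enable step is usually unnecessary when scsi_vhci devices are
already visible, as they are in that post.

# Is MPxIO enabled for this HBA class? scsi_vhci only aggregates paths
# when mpxio-disable is set to "no" for the driver.
grep mpxio-disable /kernel/drv/mpt_sas.conf

# Enable MPxIO for mpt_sas only (rewrites device paths; reboot required).
stmsboot -D mpt_sas -e

# After boot, a dual-attached disk should report two operational paths.
mpathadm list lu
mpathadm show lu /dev/rdsk/c1t5000C50057C1FCE3d0s2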
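On the "zpool degraded while SMART says the disks are OK" thread: a short
sketch of where illumos keeps the error history the discussion refers to.
The first three are standard illumos commands; the smartctl line assumes
smartmontools has been installed from a package repository (it is not part
of the base install), and the disk path is a placeholder to substitute.

# Per-device soft/hard/transport error counters kept by the sd driver.
iostat -En

# FMA error reports; look for command timeouts, resets and aborted commands.
fmdump -eV | less

# Anything FMA has actually diagnosed as faulty.
fmadm faulty

# SCSI error counter log and grown defect list for one SAS disk.
smartctl -a -d scsi /dev/rdsk/c1t5000C50057C1F5BFd0s0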
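On the final question: the one-line answer above ("disable-ata=true") does not
say where that property is set, so check the ata(7D) documentation for your
release before relying on it. One commonly used alternative on illumos is to
exclude the module in /etc/system so the driver never loads; this is only safe
when nothing you need (in particular the boot disk) sits behind the pci-ide/ata
controller, and it is a sketch to adapt rather than a verified recipe for the
R710. Removing the binding outright with rem_drv ata is also possible, but it
is harder to undo (add_drv is needed to restore it).

# /etc/system -- prevent the ata module from loading at the next boot.
exclude: drv/ata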