From fcliang at baolict.com  Tue Dec  1 17:03:46 2015
From: fcliang at baolict.com (Fucai.Liang)
Date: Wed, 2 Dec 2015 01:03:46 +0800
Subject: [OmniOS-discuss] qemu-system-x86_64 can not locked enough memory
Message-ID:

Hello, guys!

I have a server running OmniOS v11 r151016. The server has 32G of memory.
I started two KVM virtual machines by running the following commands:

qemu-system-x86_64 -enable-kvm -vnc 0.0.0.0:12 -cpu host -smp 4 -m 8192 -no-hpe

qemu-system-x86_64 -enable-kvm -vnc 0.0.0.0:11 -cpu host -smp 2 -m 4096 -no-hpe

One uses 8G of memory and the other uses 4G.

The system's memory usage is now as follows:

root@BLCC01:/root# prtconf | grep Memory
Memory size: 32760 Megabytes

root@BLCC01:/root# echo "::memstat" | mdb -k
Page Summary                Pages                MB  %Tot
------------     ----------------  ----------------  ----
Kernel                     549618              2146    7%
ZFS File Data              668992              2613    8%
Anon                      3198732             12495   38%
Exec and libs                1411                 5    0%
Page cache                   4402                17    0%
Free (cachelist)            10578                41    0%
Free (freelist)           3950545             15431   47%

Total                     8384278             32751
Physical                  8384277             32751

root@BLCC01:/root# swap -sh
total: 12G allocated + 35M reserved = 12G used, 6.8G available

root@BLCC01:/root# swap -l
swapfile                  dev    swaplo   blocks     free
/dev/zvol/dsk/rpool/swap  263,2       8  8388600  8388600
root@BLCC01:/root#

root@BLCC01:/root# prctl $$
project.max-locked-memory
        usage           12.0GB
        system          16.0EB    max   deny    -
project.max-port-ids
        privileged      8.19K       -   deny    -
        system          65.5K     max   deny    -
project.max-shm-memory
        privileged      8.00GB      -   deny    -
        system          16.0EB    max   deny    -

# prstat -J
PROJID    NPROC  SWAP   RSS MEMORY      TIME  CPU PROJECT
     1        5   12G   12G    38%   1:07:23 5.6% user.root
     0       43   72M   76M   0.2%   0:00:59 0.0% system
     3        5 4392K   14M   0.0%   0:00:00 0.0% default

Then I started the third VM (4G memory), and it gave the following error:

qemu-system-x86_64 -enable-kvm -vnc 0.0.0.0:2 -cpu host -smp 2 -m 4096 -no-hpet

qemu_mlock: have only locked 1940582400 of 4294967296 bytes; still trying...
qemu_mlock: have only locked 1940582400 of 4294967296 bytes; still trying...
qemu_mlock: have only locked 1940582400 of 4294967296 bytes; still trying...
qemu_mlock: have only locked 1940582400 of 4294967296 bytes; still trying...
qemu_mlock: have only locked 1940582400 of 4294967296 bytes; still trying...
qemu_mlock: have only locked 1940582400 of 4294967296 bytes; still trying...

I have 15G of free memory in the system, so why can't qemu-system-x86_64
lock enough memory?

Thanks for your help! Sorry for my poor English!

-----------------------------------
fcliang

From danmcd at omniti.com  Tue Dec  1 17:11:43 2015
From: danmcd at omniti.com (Dan McDonald)
Date: Tue, 1 Dec 2015 12:11:43 -0500
Subject: [OmniOS-discuss] qemu-system-x86_64 can not locked enough memory
In-Reply-To:
References:
Message-ID:

> On Dec 1, 2015, at 12:03 PM, Fucai.Liang wrote:
>
> Then I started the third VM (4G memory), and it gave the following error:
>
> qemu-system-x86_64 -enable-kvm -vnc 0.0.0.0:2 -cpu host -smp 2 -m 4096 -no-hpet
>
> qemu_mlock: have only locked 1940582400 of 4294967296 bytes; still trying...
> qemu_mlock: have only locked 1940582400 of 4294967296 bytes; still trying...
> qemu_mlock: have only locked 1940582400 of 4294967296 bytes; still trying...
> qemu_mlock: have only locked 1940582400 of 4294967296 bytes; still trying...
> qemu_mlock: have only locked 1940582400 of 4294967296 bytes; still trying...
> qemu_mlock: have only locked 1940582400 of 4294967296 bytes; still trying...
>
> I have 15G of free memory in the system, so why can't qemu-system-x86_64
> lock enough memory?

What does "vmstat 1 5" say prior to your launch of the third VM?

Dan

From fcliang at baolict.com  Tue Dec  1 17:24:34 2015
From: fcliang at baolict.com (Fucai.Liang)
Date: Wed, 2 Dec 2015 01:24:34 +0800
Subject: [OmniOS-discuss] qemu-system-x86_64 can not locked enough memory
In-Reply-To:
References:
Message-ID: <13D19B56-7F73-40EB-88B7-0551A2D76316@baolict.com>

root@BLCC01:/root# vmstat 15
 kthr      memory              page              disk          faults       cpu
 r b w    swap     free  re  mf pi po fr de sr ro s0 s1 s2   in    sy    cs us sy id
 0 0 0 8327684 16628108  21 334  0  0  0  0 39  1  1 28 28 3971 22009  9713  1  6 93
 0 0 0 7135732 15842152   1   3  0  0  0  0  0  0  0 19 19 3895 22852  9855  1  5 93
 1 0 0 7135692 15842176   0   0  0  0  0  0  0  0  0 26 26 4053 22766  9903  1  5 94
 0 0 0 7135656 15842140   0   0  0  0  0  0  0  0  0 20 20 4001 22727  9858  1  5 94
--------------> launch third VM
 1 0 0 1966932 14275356  13  98  0  0  0  0  0  0  0 20 20 4103 22893 10480  1  8 91
 0 0 0 1037608 13954964   2  30  0  0  0  0  0  0  0 19 19 4195 23059 10683  1  6 93
 0 0 0 1037280 13954636   0   0  0  0  0  0  0  0  0 24 24 4312 22948 10636  1  5 94
 0 0 0 1037112 13954468   0   0  0  0  0  0  0  0  0 21 21 4362 22927 10678  1  5 93
 0 0 0 1037288 13954644   0   0  0  0  0  0  0  0  0 19 19 4256 22897 10551  1  5 94
 0 0 0 1037412 13954768   0   0  0  0  0  0  0  0  0 19 19 4384 23172 10638  1  6 93

Thanks, Dan.

-----------------------------------
fcliang

From josh at sysmgr.org  Tue Dec  1 17:37:12 2015
From: josh at sysmgr.org (Joshua M. Clulow)
Date: Tue, 1 Dec 2015 09:37:12 -0800
Subject: [OmniOS-discuss] qemu-system-x86_64 can not locked enough memory
In-Reply-To:
References:
Message-ID:

On 1 December 2015 at 09:11, Dan McDonald wrote:
>> On Dec 1, 2015, at 12:03 PM, Fucai.Liang wrote:
>> Then I started the third VM (4G memory), and it gave the following error:
>> qemu-system-x86_64 -enable-kvm -vnc 0.0.0.0:2 -cpu host -smp 2 -m 4096 -no-hpet
>>
>> qemu_mlock: have only locked 1940582400 of 4294967296 bytes; still trying...
>> qemu_mlock: have only locked 1940582400 of 4294967296 bytes; still trying...
>> qemu_mlock: have only locked 1940582400 of 4294967296 bytes; still trying...
>> qemu_mlock: have only locked 1940582400 of 4294967296 bytes; still trying...
>> qemu_mlock: have only locked 1940582400 of 4294967296 bytes; still trying...
>> qemu_mlock: have only locked 1940582400 of 4294967296 bytes; still trying...
>>
>> I have 15G of free memory in the system, so why can't qemu-system-x86_64
>> lock enough memory?
> What does "vmstat 1 5" say prior to your launch of the third VM?

I suspect it will show you have free memory available, but that what
is really happening is that you end up here:

https://github.com/illumos/illumos-gate/blob/master/usr/src/uts/common/vm/seg_vn.c#L7989-L8002

This is likely failing in page_pp_lock() because "availrmem" has
fallen below "pages_pp_maximum":

https://github.com/illumos/illumos-gate/blob/master/usr/src/uts/common/vm/vm_page.c#L3817-L3818

We set this value here, though it can be overridden in "/etc/system":

https://github.com/illumos/illumos-gate/blob/master/usr/src/uts/common/vm/vm_page.c#L423-L436

You can look at the current values with mdb:

mdb -ke 'availrmem/D ; pages_pp_maximum/D'

Increasing this value doesn't seem to be without risk: I believe that
it can lead to memory exhaustion deadlocks, amongst other things. I
don't know if it's expected to be tuneable without a reboot.

Cheers.

--
Joshua M. Clulow
UNIX Admin/Developer
http://blog.sysmgr.org

From fcliang at baolict.com  Wed Dec  2 03:38:35 2015
From: fcliang at baolict.com (Fucai Liang （BLCT）)
Date: Wed, 2 Dec 2015 11:38:35 +0800
Subject: [OmniOS-discuss] qemu-system-x86_64 can not locked enough memory
In-Reply-To:
References:
Message-ID: <26523C65-285A-41C3-8C5C-CD509D20F965@baolict.com>

Thanks for your help!

When the server boots up, it has 7989066 pages of availrmem. After I launch
one VM (8G memory), availrmem decreases to 4756624.

7989066 - 4756624 = 3232442 pages
3232442 / 256 = 12626.7265625 MB (4K pages, so 256 pages per MB)
12626.7265625 / 1024 = 12.3G

root@BLCC01:/root# mdb -ke 'availrmem/D ; pages_pp_maximum/D'
availrmem:
availrmem:      7989066
pages_pp_maximum:
pages_pp_maximum:               325044
root@BLCC01:/root# qemu-system-x86_64 -enable-kvm -vnc 0.0.0.0:12 -cpu host -smp 4 -m 8192 -no-hpe

root@BLCC01:/root# mdb -ke 'availrmem/D ; pages_pp_maximum/D'
availrmem:
availrmem:      4756624
pages_pp_maximum:
pages_pp_maximum:               325044
root@BLCC01:/root#

That means the 8G VM uses 12.3G of availrmem. How does that happen?

Thanks!

------------------------------
fcliang

From omnios at citrus-it.net  Wed Dec  2 15:01:21 2015
From: omnios at citrus-it.net (Andy Fiddaman)
Date: Wed, 2 Dec 2015 15:01:21 +0000 (UTC)
Subject: [OmniOS-discuss] PCRE version
Message-ID:

Whilst playing around with the latest version of ClamAV, I noticed that it
now prints this warning:

configure: WARNING: The installed pcre version may contain a security bug.
Please upgrade to 8.38 or later: http://www.pcre.org.

There is some information on the security fixes in 8.38 at
https://blog.fuzzing-project.org/29-Heap-Overflow-in-PCRE.html
but I couldn't find anything specific on the Exim mailing list.

May be worth bumping the PCRE version.

Andy

--
Citrus IT Limited | +44 (0)870 199 8000 | enquiries at citrus-it.co.uk
Rock House Farm | Green Moor | Wortley | Sheffield | S35 7DQ
Registered in England and Wales | Company number 4899123

From danmcd at omniti.com  Wed Dec  2 15:09:15 2015
From: danmcd at omniti.com (Dan McDonald)
Date: Wed, 2 Dec 2015 10:09:15 -0500
Subject: [OmniOS-discuss] PCRE version
In-Reply-To:
References:
Message-ID: <8AA43FEA-E2C9-4169-9A74-3A12A639846A@omniti.com>

> On Dec 2, 2015, at 10:01 AM, Andy Fiddaman wrote:
>
> May be worth bumping the PCRE version.

Sure is. I followed this, but wasn't sure how deeply it might affect the
user base. Someone's asking, so I'll take care of it.

I'll have to bump it on all the releases. Watch for it later today.

Dan

From danmcd at omniti.com  Wed Dec  2 16:21:10 2015
From: danmcd at omniti.com (Dan McDonald)
Date: Wed, 2 Dec 2015 11:21:10 -0500
Subject: [OmniOS-discuss] PCRE now updated for OmniOS
Message-ID:

Several CVEs have been filed against PCRE (Perl Compatible Regular
Expressions). All supported versions of OmniOS (r151006, r151014, and
r151016) have updates of PCRE to version 8.38.

This is a non-reboot-needed update, but you may need to restart certain
services, especially those not provided by the system.

Thanks,
Dan

From cks at cs.toronto.edu  Wed Dec  2 19:16:59 2015
From: cks at cs.toronto.edu (Chris Siebenmann)
Date: Wed, 02 Dec 2015 14:16:59 -0500
Subject: [OmniOS-discuss] What's the best way to detect OmniOS version,
 specifically r151014?
Message-ID: <20151202191659.3635A7A0875@apps0.cs.toronto.edu>

 We have at least one shell script that needs to know if it's running
on a host with OmniOS r151014 versus a host with an earlier OmniOS
version (due to the change in ZFS pool reservations from 1/64th of the
pool to 1/32nd of the pool that we picked up with r151014). Is there
any particularly good way for a shell script to determine this, ideally
in a lightweight way and without requiring root permissions?

 Thanks in advance.

 - cks

From ikaufman at eng.ucsd.edu  Wed Dec  2 19:19:53 2015
From: ikaufman at eng.ucsd.edu (Ian Kaufman)
Date: Wed, 2 Dec 2015 11:19:53 -0800
Subject: [OmniOS-discuss] What's the best way to detect OmniOS version,
 specifically r151014?
In-Reply-To: <20151202191659.3635A7A0875@apps0.cs.toronto.edu>
References: <20151202191659.3635A7A0875@apps0.cs.toronto.edu>
Message-ID:

Examine /etc/release?

Ian

On Wed, Dec 2, 2015 at 11:16 AM, Chris Siebenmann wrote:

> We have at least one shell script that needs to know if it's running
> on a host with OmniOS r151014 versus a host with an earlier OmniOS
> version (due to the change in ZFS pool reservations from 1/64th of the
> pool to 1/32nd of the pool that we picked up with r151014). Is there
> any particularly good way for a shell script to determine this, ideally
> in a lightweight way and without requiring root permissions?
>
> Thanks in advance.
>
> - cks
> _______________________________________________
> OmniOS-discuss mailing list
> OmniOS-discuss at lists.omniti.com
> http://lists.omniti.com/mailman/listinfo/omnios-discuss
>

--
Ian Kaufman
Research Systems Administrator
UC San Diego, Jacobs School of Engineering ikaufman AT ucsd DOT edu

From danmcd at omniti.com  Wed Dec  2 19:22:37 2015
From: danmcd at omniti.com (Dan McDonald)
Date: Wed, 2 Dec 2015 14:22:37 -0500
Subject: [OmniOS-discuss] What's the best way to detect OmniOS version,
 specifically r151014?
In-Reply-To: <20151202191659.3635A7A0875@apps0.cs.toronto.edu>
References: <20151202191659.3635A7A0875@apps0.cs.toronto.edu>
Message-ID:

> On Dec 2, 2015, at 2:16 PM, Chris Siebenmann wrote:
>
> We have at least one shell script that needs to know if it's running
> on a host with OmniOS r151014 versus a host with an earlier OmniOS
> version (due to the change in ZFS pool reservations from 1/64th of the
> pool to 1/32nd of the pool that we picked up with r151014). Is there
> any particularly good way for a shell script to determine this, ideally
> in a lightweight way and without requiring root permissions?
>
> Thanks in advance.

/etc/release is the stable interface. We use it ourselves in the omniti-ms gate:

# Determine what release we're running as that affects some versions of things
RELEASE=$(head -1 /etc/release | awk '{ print $3 }')

Hope this helps,
Dan

From cks at cs.toronto.edu  Wed Dec  2 19:23:09 2015
From: cks at cs.toronto.edu (Chris Siebenmann)
Date: Wed, 02 Dec 2015 14:23:09 -0500
Subject: [OmniOS-discuss] What's the best way to detect OmniOS version,
 specifically r151014?
In-Reply-To: ikaufman's message of Wed, 02 Dec 2015 11:19:53 -0800.
Message-ID: <20151202192309.BCCCE7A0875@apps0.cs.toronto.edu>

> On Wed, Dec 2, 2015 at 11:16 AM, Chris Siebenmann wrote:
> > We have at least one shell script that needs to know if it's
> > running on a host with OmniOS r151014 versus a host with an earlier
> > OmniOS version (due to the change in ZFS pool reservations from
> > 1/64th of the pool to 1/32nd of the pool that we picked up with
> > r151014). Is there any particularly good way for a shell script to
> > determine this, ideally in a lightweight way and without requiring
> > root permissions?
>
> Examine /etc/release?

 Somehow I missed that file. This looks like exactly what I want; I
can easily match against the first line.

 Thank you.

 - cks

From doug at will.to  Wed Dec  2 19:34:28 2015
From: doug at will.to (Doug Hughes)
Date: Wed, 2 Dec 2015 14:34:28 -0500
Subject: [OmniOS-discuss] What's the best way to detect OmniOS version,
 specifically r151014?
In-Reply-To: <20151202191659.3635A7A0875@apps0.cs.toronto.edu>
References: <20151202191659.3635A7A0875@apps0.cs.toronto.edu>
Message-ID: <6519793c-24ba-4446-b8da-576b9e3697e8.maildroid@localhost>

You can look at /etc/release. It's on the 1st line.

Sent from my android device.

From peter.tribble at gmail.com  Thu Dec  3 10:35:58 2015
From: peter.tribble at gmail.com (Peter Tribble)
Date: Thu, 3 Dec 2015 10:35:58 +0000
Subject: [OmniOS-discuss] PCRE now updated for OmniOS
In-Reply-To:
References:
Message-ID:

On Wed, Dec 2, 2015 at 4:21 PM, Dan McDonald wrote:

> Several CVEs have been filed against PCRE (Perl Compatible Regular
> Expressions). All supported versions of OmniOS (r151006, r151014, and
> r151016) have updates of PCRE to version 8.38.
>
> This is a non-reboot-needed update, but you may need to restart certain
> services, especially those not provided by the system.
>

That's not entirely true, unfortunately. The versions appear to be
overconstrained, so you need to update entire and omnios-userland in
order to see the updated pcre packages (which will require a reboot
if you aren't current). All my machines simply tell me that no updates
are available.

--
-Peter Tribble
http://www.petertribble.co.uk/ - http://ptribble.blogspot.com/

From danmcd at omniti.com  Thu Dec  3 14:05:08 2015
From: danmcd at omniti.com (Dan McDonald)
Date: Thu, 3 Dec 2015 09:05:08 -0500
Subject: [OmniOS-discuss] PCRE now updated for OmniOS
In-Reply-To:
References:
Message-ID: <912F6657-4DDD-4DE5-BD64-C45D6E467733@omniti.com>

Sent from my iPhone (typos, autocorrect, and all)

> On Dec 3, 2015, at 5:35 AM, Peter Tribble wrote:
>
> (which will require a reboot
> if you aren't current)

Many of the entire and omnios-userland constraints could be loosened, and in
at least a few cases, they have been.

As for not being current, I assume most folks stay up to date.
The most recent reboot-required update included potential security fixes, so
I (perhaps incorrectly) assume people take their machines to updates when I
release them. The one mentioned here:

http://lists.omniti.com/pipermail/omnios-discuss/2015-November/005950.html

included a KVM driver update that closed a hole, for example.

Dan

From danmcd at omniti.com  Thu Dec  3 21:18:18 2015
From: danmcd at omniti.com (Dan McDonald)
Date: Thu, 3 Dec 2015 16:18:18 -0500
Subject: [OmniOS-discuss] OpenSSL updates for OmniOS
Message-ID: <936B7914-AD43-4B35-AA23-2102C105F126@omniti.com>

OpenSSL 1.0.2e is now available for LTS (r151014) and Stable (r151016), and
will also be ready for the next large update of bloody.

OpenSSL 1.0.1q is now available for old-LTS (r151006).

Additionally, LTS receives a bump to wget, to work better with modern HTTPS
servers, and old-LTS gets a bump in "entire", due to a previous packaging
error.

These are SECURITY FIXES and you should "pkg update" as soon as possible.

Dan

From omnios at citrus-it.net  Thu Dec  3 23:03:11 2015
From: omnios at citrus-it.net (Andy Fiddaman)
Date: Thu, 3 Dec 2015 23:03:11 +0000 (UTC)
Subject: [OmniOS-discuss] PCRE now updated for OmniOS
In-Reply-To:
References:
Message-ID:

On Wed, 2 Dec 2015, Dan McDonald wrote:

; Several CVEs have been filed against PCRE (Perl Compatible Regular
; Expressions). All supported versions of OmniOS (r151006, r151014, and
; r151016) have updates of PCRE to version 8.38.
;
; This is a non-reboot-needed update, but you may need to restart certain
; services, especially those not provided by the system.

Thanks Dan. Quick as ever!

Andy

--
Citrus IT Limited | +44 (0)870 199 8000 | enquiries at citrus-it.co.uk
Rock House Farm | Green Moor | Wortley | Sheffield | S35 7DQ
Registered in England and Wales | Company number 4899123

From paladinemishakal at gmail.com  Fri Dec  4 09:38:52 2015
From: paladinemishakal at gmail.com (Lawrence Giam)
Date: Fri, 4 Dec 2015 17:38:52 +0800
Subject: [OmniOS-discuss] core dump while trying to import pool
Message-ID:

Hi All,

I have a problem: I am upgrading my server OS from OpenIndiana 151a7 to
OmniOS r151014. While working on the upgrade, I detached the SAS expander
from the main chassis, and the installation proceeded fine.

When the upgrade was done, I reconnected the SAS expander. I have 2 pools:
one is on the main chassis and the other is on the SAS expander. When I
tried to import the pool on the main chassis, the system core dumped and
rebooted.

I removed the SAS expander and attempted to boot with the main chassis only.
The system displayed this on the screen:

svc.startd[10]: svc:/system/boot-archive:default: Method or service exit timed out. Killing contract 15.
svc.startd[10]: svc:/system/boot-archive:default: Method "/lib/svc/method/boot-archive" failed due to signal KILL.

console login: Reading ZFS config: done.
Mounting ZFS filesystems: (43/1018)

After a while, the system core dumped again:

ffffff003ee4a170 unix:die+df ()
ffffff003ee4a280 unix:trap+db3 ()
ffffff003ee4a290 unix:cmntrap+e6 ()
ffffff003ee4a3d0 zfs:zap_leaf_lookup_closest+45 ()
ffffff003ee4a470 zfs:fzap_cursor_retrieve+bb ()
ffffff003ee4a510 zfs:zap_cursor_retrieve+11e ()
ffffff003ee4a700 zfs:zfs_purgedir+67 ()
ffffff003ee4a750 zfs:zfs_rmnode+202 ()
ffffff003ee4a790 zfs:zfs_zinactive+e8 ()
ffffff003ee4a7f0 zfs:zfs_inactive+75 ()
ffffff003ee4a850 genunix:fop_inactive+76 ()
ffffff003ee4a880 genunix:vn_rele+82 ()
ffffff003ee4aa70 zfs:zfs_unlinked_drain+aa ()
ffffff003ee4aab0 zfs:zfsvfs_setup+e8 ()
ffffff003ee4ab10 zfs:zfs_domount+131 ()
ffffff003ee4ac40 zfs:zfs_mount+24f ()
ffffff003ee4ac70 genunix:fsop_mount+1e ()
ffffff003ee4ad70 genunix:domount+86b ()
ffffff003ee4ae80 genunix:mount+167 ()
ffffff003ee4aec0 genunix:syscall_ap+94 ()
ffffff003ee4af10 unix:brand_sys_sysenter+1c9 ()

syncing file systems.... done
dumping to /dev/zvol/dsk/rpool/dump, offset 65536, content: kernel

I have now booted to single-user mode.

I need help urgently. What should I do next?

Thanks & Regards,
Lawrence.

From paladinemishakal at gmail.com  Fri Dec  4 10:40:18 2015
From: paladinemishakal at gmail.com (Lawrence Giam)
Date: Fri, 4 Dec 2015 18:40:18 +0800
Subject: [OmniOS-discuss] core dump while trying to import pool
In-Reply-To:
References:
Message-ID:

Not sure if this is linked to what I am facing:

lawrence@sgsan7r:/export/home/lawrence$ fmdump -Vp -u 036b26cc-a99a-c9a5-9a1e-df89eef1be5d
TIME                           UUID                                 SUNW-MSG-ID
Dec 04 2015 17:50:58.724277000 036b26cc-a99a-c9a5-9a1e-df89eef1be5d SUNOS-8000-KL

  TIME                 CLASS                                          ENA
  Dec 04 17:50:58.7180 ireport.os.sunos.panic.dump_available          0x0000000000000000
  Dec 04 17:50:53.9974 ireport.os.sunos.panic.dump_pending_on_device  0x0000000000000000

nvlist version: 0
        version = 0x0
        class = list.suspect
        uuid = 036b26cc-a99a-c9a5-9a1e-df89eef1be5d
        code = SUNOS-8000-KL
        diag-time = 1449222658 719027
        de = fmd:///module/software-diagnosis
        fault-list-sz = 0x1
        fault-list = (array of embedded nvlists)
        (start fault-list[0])
        nvlist version: 0
                version = 0x0
                class = defect.sunos.kernel.panic
                certainty = 0x64
                asru = sw:///:path=/var/crash/unknown/.036b26cc-a99a-c9a5-9a1e-df89eef1be5d
                resource = sw:///:path=/var/crash/unknown/.036b26cc-a99a-c9a5-9a1e-df89eef1be5d
                savecore-succcess = 1
                dump-dir = /var/crash/unknown
                dump-files = vmdump.0
                os-instance-uuid = 036b26cc-a99a-c9a5-9a1e-df89eef1be5d
                panicstr = BAD TRAP: type=e (#pf Page fault) rp=ffffff003ee4a290 addr=20 occurred in module "zfs" due to a NULL pointer dereference
                panicstack = unix:die+df () | unix:trap+db3 () | unix:cmntrap+e6 () | zfs:zap_leaf_lookup_closest+45 () | zfs:fzap_cursor_retrieve+bb () | zfs:zap_cursor_retrieve+11e () | zfs:zfs_purgedir+67 () | zfs:zfs_rmnode+202 () | zfs:zfs_zinactive+e8 () | zfs:zfs_inactive+75 () | genunix:fop_inactive+76 () | genunix:vn_rele+82 () | zfs:zfs_unlinked_drain+aa () | zfs:zfsvfs_setup+e8 () | zfs:zfs_domount+131 () | zfs:zfs_mount+24f () | genunix:fsop_mount+1e () | genunix:domount+86b () | genunix:mount+167 () | genunix:syscall_ap+94 () | unix:brand_sys_sysenter+1c9 () |
                crashtime = 1449220361
                panic-time = Fri Dec  4 17:12:41 2015 SGT
        (end fault-list[0])

        fault-status = 0x1
        severity = Major
        __ttl = 0x1
        __tod = 0x56616202 0x2b2b9708

From jdg117 at elvis.arl.psu.edu  Fri Dec  4 12:54:31 2015
From: jdg117 at elvis.arl.psu.edu (John D Groenveld)
Date: Fri, 04 Dec 2015 07:54:31 -0500
Subject: [OmniOS-discuss] core dump while trying to import pool
In-Reply-To: Your message of "Fri, 04 Dec 2015 17:38:52 +0800."
References:
Message-ID: <201512041254.tB4CsVBJ005325@elvis.arl.psu.edu>

In message , Lawrence Giam writes:
>151a7 to OmniOS R151014. While working on the upgrade, I have detached the
>sas expander from the main chassis and the installation was proceeding fine.

If you boot 151016 installation media to single-user, can you import
your pools?

John
groenveld at acm.org

From danmcd at omniti.com  Fri Dec  4 15:42:34 2015
From: danmcd at omniti.com (Dan McDonald)
Date: Fri, 4 Dec 2015 10:42:34 -0500
Subject: [OmniOS-discuss] core dump while trying to import pool
In-Reply-To:
References:
Message-ID: <71C5258A-C99E-44DF-BFE1-A1D5EE0CE686@omniti.com>

> On Dec 4, 2015, at 4:38 AM, Lawrence Giam wrote:
>
> Hi All,
>
> I have a problem: I am upgrading my server OS from OpenIndiana 151a7 to
> OmniOS r151014. While working on the upgrade, I detached the SAS expander
> from the main chassis, and the installation proceeded fine.
>
> When the upgrade was done, I reconnected the SAS expander. I have 2 pools:
> one is on the main chassis and the other is on the SAS expander. When I
> tried to import the pool on the main chassis, the system core dumped and
> rebooted.

I've seen this stack before:

> After a while, the system core dumped again:
> ffffff003ee4a170 unix:die+df ()
> ffffff003ee4a280 unix:trap+db3 ()
> ffffff003ee4a290 unix:cmntrap+e6 ()
> ffffff003ee4a3d0 zfs:zap_leaf_lookup_closest+45 ()
> ffffff003ee4a470 zfs:fzap_cursor_retrieve+bb ()
> ffffff003ee4a510 zfs:zap_cursor_retrieve+11e ()
> ffffff003ee4a700 zfs:zfs_purgedir+67 ()
> ffffff003ee4a750 zfs:zfs_rmnode+202 ()
> ffffff003ee4a790 zfs:zfs_zinactive+e8 ()
> ffffff003ee4a7f0 zfs:zfs_inactive+75 ()
> ffffff003ee4a850 genunix:fop_inactive+76 ()
> ffffff003ee4a880 genunix:vn_rele+82 ()
> ffffff003ee4aa70 zfs:zfs_unlinked_drain+aa ()
> ffffff003ee4aab0 zfs:zfsvfs_setup+e8 ()
> ffffff003ee4ab10 zfs:zfs_domount+131 ()
> ffffff003ee4ac40 zfs:zfs_mount+24f ()
> ffffff003ee4ac70 genunix:fsop_mount+1e ()
> ffffff003ee4ad70 genunix:domount+86b ()
> ffffff003ee4ae80 genunix:mount+167 ()
> ffffff003ee4aec0 genunix:syscall_ap+94 ()
> ffffff003ee4af10 unix:brand_sys_sysenter+1c9 ()

Tell me, do you have an L2ARC on this pool?

And John's suggestion is a very good one: Boot the 016 ISO, and see if a
vanilla "zpool import " causes problems.

Dan

From paladinemishakal at gmail.com  Fri Dec  4 15:53:06 2015
From: paladinemishakal at gmail.com (Lawrence Giam)
Date: Fri, 4 Dec 2015 23:53:06 +0800
Subject: [OmniOS-discuss] core dump while trying to import pool
In-Reply-To: <71C5258A-C99E-44DF-BFE1-A1D5EE0CE686@omniti.com>
References: <71C5258A-C99E-44DF-BFE1-A1D5EE0CE686@omniti.com>
Message-ID:

Hi Dan,

No, I do not have an L2ARC on this server. This server is used to receive
"zfs send" streams, so I have a very large ZFS filesystem.

Should I cancel the scrub and try the method that John suggested?

Regards.

From danmcd at omniti.com  Fri Dec  4 15:56:20 2015
From: danmcd at omniti.com (Dan McDonald)
Date: Fri, 4 Dec 2015 10:56:20 -0500
Subject: [OmniOS-discuss] core dump while trying to import pool
In-Reply-To:
References: <71C5258A-C99E-44DF-BFE1-A1D5EE0CE686@omniti.com>
Message-ID: <589C8043-C3E2-4249-99E8-AA5A35E17892@omniti.com>

> On Dec 4, 2015, at 10:53 AM, Lawrence Giam wrote:
>
> Should I cancel the scrub and try the method that John suggested?
>

I'd let the scrub run to be sure. If it's the class of bug I'm thinking of,
though, scrub won't catch it. :(

And if you can provide one of those r151014 core dumps, that'd be great. If
this pool has confidential data, though, I can understand why not.

Dan

From mtalbott at lji.org Fri Dec 4 19:33:09 2015
From: mtalbott at lji.org (Michael Talbott)
Date: Fri, 4 Dec 2015 11:33:09 -0800
Subject: [OmniOS-discuss] core dump while trying to import pool
In-Reply-To: <589C8043-C3E2-4249-99E8-AA5A35E17892@omniti.com>
References: <71C5258A-C99E-44DF-BFE1-A1D5EE0CE686@omniti.com> <589C8043-C3E2-4249-99E8-AA5A35E17892@omniti.com>
Message-ID:

I also came upon this same issue after rebooting one of my OmniOS machines. I did have l2arc devices on my pool until the bug was announced. At that point I immediately removed my l2arc devices and didn't reboot the machine until a convenient time when, if something bad were to happen, I could manage it. Well, it was good I planned for that reboot ;)

I was able to boot in single user mode, delete the pool cache file, reboot, import without mounting (zpool import -N ) and then scrub. The scrub fixed 16KB of data in my 254TB pool. I then exported and imported the pool as rw, only to discover that it did not fix the problem at all. Importing as read-only allows proper mounting to pull data off.

The problem for me stemmed from mounting 1 of my 52 filesystems as rw. I was able to mount the filesystems one by one after a zpool import -N to discover which filesystem was causing the issue. I'm still rsync'ing the problem filesystem out since, as luck would have it, it was the only one that I wasn't replicating out (probably a good thing, considering) since I used it for a scratch drive. But my plan is to destroy and then recreate the problem fs after the sync finishes and rsync it back. And cross my fingers that the problem doesn't come back or get worse.
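The import-then-mount-one-by-one procedure described above could be scripted roughly as follows. This is a hedged sketch, not the commands actually run on store2: the pool name "tank", the helper names, and the DRY_RUN switch are all assumptions added for illustration. With DRY_RUN left at its default of 1 the script only prints what it would do. Each dataset name is echoed before the mount attempt, so if the box panics, the last name printed points at the problem filesystem.

```shell
#!/bin/sh
# Sketch only. DRY_RUN=1 (the default here) prints commands instead of running them.
DRY_RUN=${DRY_RUN:-1}

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@" </dev/null
    fi
}

# Import the pool without mounting any of its filesystems.
import_unmounted() {
    run zpool import -N "$1"
}

# Fallback for pulling data off: import the whole pool read-only.
import_readonly() {
    run zpool import -o readonly=on "$1"
}

# Read dataset names on stdin and mount them one at a time,
# announcing each before the attempt.
mount_one_by_one() {
    while read -r fs; do
        echo "mounting $fs"
        run zfs mount "$fs"
    done
}
```

A possible invocation, once import_unmounted has been run for real: zfs list -H -o name -r tank | mount_one_by_one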
The problem I'm seeing that causes this is: BAD TRAP: type=e (#pf Page fault) rp=ffffff00f5cee290 addr=20 occurred in module "zfs" due to a NULL pointer dereference Here's the details of my crash which appears to be the same as yours: root at store2:/var/crash/unknown# mdb unix.2 vmcore.2 Loading modules: [ unix genunix specfs dtrace mac cpu.generic uppc apix scsi_vhci zfs mr_sas sd ip hook neti sockfs arp usba stmf stmf_sbd random md lofs idm sata cpc crypto kvm mpt_sas ufs logindmux nsmb ptm smbsrv nfs ipc ] > $c zap_leaf_lookup_closest+0x45(ffffff223e7bd290, 0, 0, ffffff00f5cee3f0) fzap_cursor_retrieve+0xbb(ffffff223e7bd290, ffffff00f5cee650, ffffff00f5cee530) zap_cursor_retrieve+0x11e(ffffff00f5cee650, ffffff00f5cee530) zfs_purgedir+0x67(ffffff2232f41bc0) zfs_rmnode+0x202(ffffff2232f41bc0) zfs_zinactive+0xe8(ffffff2232f41bc0) zfs_inactive+0x75(ffffff2232f44640, ffffff221918b468, 0) fop_inactive+0x76(ffffff2232f44640, ffffff221918b468, 0) vn_rele+0x82(ffffff2232f44640) zfs_unlinked_drain+0xaa(ffffff21f254d000) zfsvfs_setup+0xe8(ffffff21f254d000, 1) zfs_domount+0x131(ffffff223d709368, ffffff222916fd80) zfs_mount+0x24f(ffffff223d709368, ffffff21f2645400, ffffff00f5ceee00, ffffff221918b468) fsop_mount+0x1e(ffffff223d709368, ffffff21f2645400, ffffff00f5ceee00, ffffff221918b468) domount+0x86b(0, ffffff00f5ceee00, ffffff21f2645400, ffffff221918b468, ffffff00f5ceee40) mount+0x167(ffffff2228e61c38, ffffff00f5ceee90) syscall_ap+0x94() _sys_sysenter_post_swapgs+0x149() > ::status debugging crash dump vmcore.2 (64-bit) from store2 operating system: 5.11 omnios-8322307 (i86pc) image uuid: 69a1d6dd-f13a-627d-c2a0-b00c9e50a800 panic message: BAD TRAP: type=e (#pf Page fault) rp=ffffff00f5cee290 addr=20 occurred in module "zfs" due to a NULL pointer dereference dump content: kernel pages only > ::stack zap_leaf_lookup_closest+0x45(ffffff223e7bd290, 0, 0, ffffff00f5cee3f0) fzap_cursor_retrieve+0xbb(ffffff223e7bd290, ffffff00f5cee650, ffffff00f5cee530) 
zap_cursor_retrieve+0x11e(ffffff00f5cee650, ffffff00f5cee530) zfs_purgedir+0x67(ffffff2232f41bc0) zfs_rmnode+0x202(ffffff2232f41bc0) zfs_zinactive+0xe8(ffffff2232f41bc0) zfs_inactive+0x75(ffffff2232f44640, ffffff221918b468, 0) fop_inactive+0x76(ffffff2232f44640, ffffff221918b468, 0) vn_rele+0x82(ffffff2232f44640) zfs_unlinked_drain+0xaa(ffffff21f254d000) zfsvfs_setup+0xe8(ffffff21f254d000, 1) zfs_domount+0x131(ffffff223d709368, ffffff222916fd80) zfs_mount+0x24f(ffffff223d709368, ffffff21f2645400, ffffff00f5ceee00, ffffff221918b468) fsop_mount+0x1e(ffffff223d709368, ffffff21f2645400, ffffff00f5ceee00, ffffff221918b468) domount+0x86b(0, ffffff00f5ceee00, ffffff21f2645400, ffffff221918b468, ffffff00f5ceee40) mount+0x167(ffffff2228e61c38, ffffff00f5ceee90) syscall_ap+0x94() _sys_sysenter_post_swapgs+0x149() > ::panicinfo cpu 3 thread ffffff21f2968440 message BAD TRAP: type=e (#pf Page fault) rp=ffffff00f5cee290 addr=20 occurred in module "zfs" due to a NULL pointer dereference rdi ffffff223e7bd290 rsi 0 rdx 8 rcx 4170d6eb r8 ffffff00f5cee3f0 r9 ffffff00f5cee1c8 rax 4170d6f0 rbx ffffff00f5cee650 rbp ffffff00f5cee3d0 r10 fffffffffb854358 r11 0 r12 800 r13 0 r14 ffffff00f5cee3f0 r15 ffffff00f5cee530 fsbase 0 gsbase ffffff21f169c000 ds 4b es 4b fs 0 gs 1c3 trapno e err 0 rip fffffffff7a11e95 cs 30 rflags 10206 rsp ffffff00f5cee380 ss 38 gdt_hi 0 gdt_lo 700001ef idt_hi 0 idt_lo 40000fff ldt 0 task 70 cr0 8005003b cr2 20 cr3 206fe00000 cr4 426f8 > ________________________ Michael Talbott Systems Administrator La Jolla Institute On Dec 4, 2015, at 7:56 AM, Dan McDonald wrote: On Dec 4, 2015, at 10:53 AM, Lawrence Giam wrote: Should I cancel the scrub and try the method that John suggest? I'd let the scrub run to be sure. If it's the class of bug I'm thinking, though, scrub won't catch it. :( And if you can provide one of those r151014 core dumps, that'd be great. If this pool has confidential data, though, I can understand why not. 
Dan
_______________________________________________
OmniOS-discuss mailing list
OmniOS-discuss at lists.omniti.com
http://lists.omniti.com/mailman/listinfo/omnios-discuss

From waldenvik at gmx.com Fri Dec 4 20:43:54 2015
From: waldenvik at gmx.com (Martin Waldenvik)
Date: Fri, 4 Dec 2015 21:43:54 +0100
Subject: [OmniOS-discuss] security updates and zones
Message-ID:

Hi

Just a quick question. There was a security update the other day for omnios. What is the correct way to update the zones? Via pkg update in the zone or via zoneadm -z zone attach -u.

Best regards
--
Martin
Sent with Airmail

From danmcd at omniti.com Fri Dec 4 20:56:32 2015
From: danmcd at omniti.com (Dan McDonald)
Date: Fri, 4 Dec 2015 15:56:32 -0500
Subject: [OmniOS-discuss] security updates and zones
In-Reply-To:
References:
Message-ID: <2793292F-933F-431D-AA4C-09D619BF432F@omniti.com>

> On Dec 4, 2015, at 3:43 PM, Martin Waldenvik wrote:
>
> Hi
>
> Just a quick question. There was a security update the other day for omnios. What is the correct way to update the zones? Via pkg update in the zone or via zoneadm -z zone attach -u.

If it's just openssl, you can do pkg update in each zone. Use "pkg update -nv" to confirm things. And use "pkg update --no-backup-be" to prevent backup-BEs from being created.

You do know that the "lipkg" zones are linked to global, and update when the global does, right? :)

Dan

From henson at acm.org Sun Dec 6 02:17:39 2015
From: henson at acm.org (Paul B. Henson)
Date: Sat, 05 Dec 2015 18:17:39 -0800
Subject: [OmniOS-discuss] core dump while trying to import pool
In-Reply-To:
References: <71C5258A-C99E-44DF-BFE1-A1D5EE0CE686@omniti.com> <589C8043-C3E2-4249-99E8-AA5A35E17892@omniti.com>
Message-ID: <20151206021738.GT3405@bender.unx.cpp.edu>

On Fri, Dec 04, 2015 at 11:33:09AM -0800, Michael Talbott wrote:
> I also came upon this same issue after rebooting one of my OmniOS machines.
> I did have l2arc devices on my pool until the announcement of the bug > found. At that point I immediately removed my l2arc devices and didn't > reboot the machine until a convenient time where if something bad were to > happen I could manage it. Well, it was good I planned for that reboot ;) Hmm, out of curiosity, did you run a scrub and a zdb analysis of your pool before you rebooted? I'm in a similar boat, I have a pool which had L2ARC devices and might have been impacted by the bug. I removed the devices, ran a scrub and zdb, with no complaints from either, which left me reasonably hopeful the pool wasn't corrupted 8-/. I still haven't rebooted it though, there's really no good time for a pool to go belly up and potentially be unrecoverable :(. I was planning to do it over Christmas break, but if you scrubbed and zdb'd your pool successfully before rebooting and it still died that's gonna make me (extra) nervous . Thanks... From mtalbott at lji.org Sun Dec 6 06:49:59 2015 From: mtalbott at lji.org (Michael Talbott) Date: Sat, 5 Dec 2015 22:49:59 -0800 Subject: [OmniOS-discuss] core dump while trying to import pool In-Reply-To: <20151206021738.GT3405@bender.unx.cpp.edu> References: <71C5258A-C99E-44DF-BFE1-A1D5EE0CE686@omniti.com> <589C8043-C3E2-4249-99E8-AA5A35E17892@omniti.com> <20151206021738.GT3405@bender.unx.cpp.edu> Message-ID: <317E3C4D-1AD5-4A57-95BD-B12624049595@lji.org> I did not run a zdb check since this pool was over 200TB and figured it'd take weeks to finish. Maybe more, maybe not? I just planned for worst case scenarios before the reboot and am sure glad I did. The pool was scrubbed several times between the time the l2arc devices were removed and the reboot all reported no errors. The problem surfaces (at least in my case) when a particular volume tries to mount as rw. 
After lots of googling I found a few other reports with the same backtrace that say they were able to work around a similar issue by mounting the volumes read-only first and then, once they were mounted, updating the mount to rw. I didn't try that, but maybe that would have worked? If so, it sounded like that'd only have been a temporary fix until the next reboot...

At any rate, a clean scrub alone is not an indicator of pool health regarding this bug. No clue if a zdb analysis would be a better indicator. My personal advice is to plan for the worst and hope for the best, with backups on hand. Better to plan for it than to let a fluke bug or power incident reveal it unexpectedly.

Since I didn't zdb it first, maybe your nerves can be a bit more at ease? Good luck and let me know how things turn out.

Michael
Sent from my iPhone

> On Dec 5, 2015, at 6:17 PM, Paul B. Henson wrote:
>
>> On Fri, Dec 04, 2015 at 11:33:09AM -0800, Michael Talbott wrote:
>> I also came upon this same issue after rebooting one of my OmniOS machines.
>> I did have l2arc devices on my pool until the announcement of the bug
>> found. At that point I immediately removed my l2arc devices and didn't
>> reboot the machine until a convenient time where if something bad were to
>> happen I could manage it. Well, it was good I planned for that reboot ;)
>
> Hmm, out of curiosity, did you run a scrub and a zdb analysis of your
> pool before you rebooted? I'm in a similar boat, I have a pool which had
> L2ARC devices and might have been impacted by the bug. I removed the
> devices, ran a scrub and zdb, with no complaints from either, which left
> me reasonably hopeful the pool wasn't corrupted 8-/. I still haven't
> rebooted it though, there's really no good time for a pool to go belly
> up and potentially be unrecoverable :(. I was planning to do it over
> Christmas break, but if you scrubbed and zdb'd your pool successfully
> before rebooting and it still died that's gonna make me (extra) nervous
> .
>
> Thanks...

From jeffpc at josefsipek.net Sun Dec 6 14:45:14 2015
From: jeffpc at josefsipek.net (Josef 'Jeff' Sipek)
Date: Sun, 6 Dec 2015 09:45:14 -0500
Subject: [OmniOS-discuss] PowerDNS recursor SIGSEGV
Message-ID: <20151206144514.GA1425@meili.valhalla.31bits.net>

I compiled powerdns recursor [1] on 016, but I'm running into an occasional SIGSEGV. The SIGSEGV is because of an insufficiently aligned memory operand to an instruction. (See the powerdns bug I filed for this [2].) The SIGSEGV actually happens in the deque code which comes from boost (1.58.0 in this case).

Now, the weird thing... I compiled the same powerdns source with the same version of boost on OI Hipster and OmniOS 016. Hipster uses gcc 4.9.3, OmniOS 016 uses 5.1. The function that causes the SEGV on 016 looks totally different between the two distros, so I haven't seen it die on my laptop.

Has anyone seen any strange SIGSEGVs in boost-using software? I hope it isn't some sort of gcc bug.

Thanks,

Jeff.

P.S. PowerDNS uses {get,set,swap}context, so I haven't ruled out a stack alignment bug on their end.
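To make the alignment point concrete: the 016 listing below stores with "movaps %xmm0,-0x18(%ebp)", and movaps faults unless its memory operand is 16-byte aligned (movdqu has no such requirement). A rough sketch of the arithmetic follows. The %ebp values used in the usage note are made up, not taken from the core file, and the helper names are hypothetical.

```shell
#!/bin/sh
# Sketch only: both helpers are hypothetical, for illustrating the arithmetic.

# Address of the movaps destination, -0x18(%ebp), for a given frame pointer.
movaps_slot() {
    printf '0x%x\n' $(( $1 - 0x18 ))
}

# Succeeds when the given address is a multiple of 16.
is_aligned16() {
    [ $(( $1 % 16 )) -eq 0 ]
}
```

With a frame pointer such as 0x8047a58 (8 mod 16, which is what a 16-byte aligned stack at the call site gives after the return address and %ebp are pushed), movaps_slot yields 0x8047a40 and the store is fine. With 0x8047a5c, as might happen on a context stack that is only 4-byte aligned, the slot is 0x8047a44 and the same instruction would fault.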
[1] https://www.powerdns.com/ [2] https://github.com/PowerDNS/pdns/issues/3002 OmniOS 016: _ZNKSt15_Deque_iteratorIcRcPcEmiEi: pushl %ebp _ZNKSt15_Deque_iteratorIcRcPcEmiEi+1: movl %esp,%ebp _ZNKSt15_Deque_iteratorIcRcPcEmiEi+3: pushl %ebx _ZNKSt15_Deque_iteratorIcRcPcEmiEi+4: subl $0x1c,%esp _ZNKSt15_Deque_iteratorIcRcPcEmiEi+7: movl 0xc(%ebp),%eax _ZNKSt15_Deque_iteratorIcRcPcEmiEi+0xa: movl 0x8(%ebp),%ebx _ZNKSt15_Deque_iteratorIcRcPcEmiEi+0xd: movdqu (%eax),%xmm0 _ZNKSt15_Deque_iteratorIcRcPcEmiEi+0x11:movl 0x10(%ebp),%eax _ZNKSt15_Deque_iteratorIcRcPcEmiEi+0x14:movaps %xmm0,-0x18(%ebp) _ZNKSt15_Deque_iteratorIcRcPcEmiEi+0x18:negl %eax _ZNKSt15_Deque_iteratorIcRcPcEmiEi+0x1a:pushl %eax _ZNKSt15_Deque_iteratorIcRcPcEmiEi+0x1b:leal -0x18(%ebp),%eax _ZNKSt15_Deque_iteratorIcRcPcEmiEi+0x1e:pushl %eax _ZNKSt15_Deque_iteratorIcRcPcEmiEi+0x1f:call -0x94 <_ZNSt15_Deque_iteratorIcRcPcEpLEi> _ZNKSt15_Deque_iteratorIcRcPcEmiEi+0x24:movl (%eax),%edx _ZNKSt15_Deque_iteratorIcRcPcEmiEi+0x26:addl $0x10,%esp _ZNKSt15_Deque_iteratorIcRcPcEmiEi+0x29:movl %edx,(%ebx) _ZNKSt15_Deque_iteratorIcRcPcEmiEi+0x2b:movl 0x4(%eax),%edx _ZNKSt15_Deque_iteratorIcRcPcEmiEi+0x2e:movl %edx,0x4(%ebx) _ZNKSt15_Deque_iteratorIcRcPcEmiEi+0x31:movl 0x8(%eax),%edx _ZNKSt15_Deque_iteratorIcRcPcEmiEi+0x34:movl 0xc(%eax),%eax _ZNKSt15_Deque_iteratorIcRcPcEmiEi+0x37:movl %edx,0x8(%ebx) _ZNKSt15_Deque_iteratorIcRcPcEmiEi+0x3a:movl %eax,0xc(%ebx) _ZNKSt15_Deque_iteratorIcRcPcEmiEi+0x3d:movl %ebx,%eax _ZNKSt15_Deque_iteratorIcRcPcEmiEi+0x3f:movl -0x4(%ebp),%ebx _ZNKSt15_Deque_iteratorIcRcPcEmiEi+0x42:leave _ZNKSt15_Deque_iteratorIcRcPcEmiEi+0x43:ret $0x4 OI Hipster: _ZNKSt15_Deque_iteratorIcRcPcEmiEi: pushl %ebp _ZNKSt15_Deque_iteratorIcRcPcEmiEi+1: pushl %edi _ZNKSt15_Deque_iteratorIcRcPcEmiEi+2: pushl %esi _ZNKSt15_Deque_iteratorIcRcPcEmiEi+3: pushl %ebx _ZNKSt15_Deque_iteratorIcRcPcEmiEi+4: subl $0x14,%esp _ZNKSt15_Deque_iteratorIcRcPcEmiEi+7: movl 0x2c(%esp),%edx _ZNKSt15_Deque_iteratorIcRcPcEmiEi+0xb: 
movl 0x30(%esp),%ebx _ZNKSt15_Deque_iteratorIcRcPcEmiEi+0xf: movl 0x28(%esp),%eax _ZNKSt15_Deque_iteratorIcRcPcEmiEi+0x13:movl (%edx),%esi _ZNKSt15_Deque_iteratorIcRcPcEmiEi+0x15:movl 0x4(%edx),%ecx _ZNKSt15_Deque_iteratorIcRcPcEmiEi+0x18:movl 0x8(%edx),%edi _ZNKSt15_Deque_iteratorIcRcPcEmiEi+0x1b:movl 0xc(%edx),%ebp _ZNKSt15_Deque_iteratorIcRcPcEmiEi+0x1e:movl %esi,%edx _ZNKSt15_Deque_iteratorIcRcPcEmiEi+0x20:subl %ebx,%esi _ZNKSt15_Deque_iteratorIcRcPcEmiEi+0x22:subl %ecx,%edx _ZNKSt15_Deque_iteratorIcRcPcEmiEi+0x24:subl %ebx,%edx _ZNKSt15_Deque_iteratorIcRcPcEmiEi+0x26:cmpl $0x1ff,%edx _ZNKSt15_Deque_iteratorIcRcPcEmiEi+0x2c:movl %esi,(%esp) _ZNKSt15_Deque_iteratorIcRcPcEmiEi+0x2f:jbe +0x21 <_ZNKSt15_Deque_iteratorIcRcPcEmiEi+0x52> _ZNKSt15_Deque_iteratorIcRcPcEmiEi+0x31:movl %edx,%ebx _ZNKSt15_Deque_iteratorIcRcPcEmiEi+0x33:sarl $0x9,%ebx _ZNKSt15_Deque_iteratorIcRcPcEmiEi+0x36:testl %edx,%edx _ZNKSt15_Deque_iteratorIcRcPcEmiEi+0x38:jle +0x56 <_ZNKSt15_Deque_iteratorIcRcPcEmiEi+0x90> _ZNKSt15_Deque_iteratorIcRcPcEmiEi+0x3a:leal 0x0(%ebp,%ebx,4),%ebp _ZNKSt15_Deque_iteratorIcRcPcEmiEi+0x3e:movl 0x0(%ebp),%ecx _ZNKSt15_Deque_iteratorIcRcPcEmiEi+0x41:shll $0x9,%ebx _ZNKSt15_Deque_iteratorIcRcPcEmiEi+0x44:subl %ebx,%edx _ZNKSt15_Deque_iteratorIcRcPcEmiEi+0x46:leal (%ecx,%edx),%esi _ZNKSt15_Deque_iteratorIcRcPcEmiEi+0x49:leal 0x200(%ecx),%edi _ZNKSt15_Deque_iteratorIcRcPcEmiEi+0x4f:movl %esi,(%esp) _ZNKSt15_Deque_iteratorIcRcPcEmiEi+0x52:movl %edi,0x4(%esp) _ZNKSt15_Deque_iteratorIcRcPcEmiEi+0x56:movd (%esp),%xmm0 _ZNKSt15_Deque_iteratorIcRcPcEmiEi+0x5b:movl %ecx,(%esp) _ZNKSt15_Deque_iteratorIcRcPcEmiEi+0x5e:movd 0x4(%esp),%xmm1 _ZNKSt15_Deque_iteratorIcRcPcEmiEi+0x64:movl %ebp,0x4(%esp) _ZNKSt15_Deque_iteratorIcRcPcEmiEi+0x68:movd (%esp),%xmm3 _ZNKSt15_Deque_iteratorIcRcPcEmiEi+0x6d:punpckldq %xmm3,%xmm0 _ZNKSt15_Deque_iteratorIcRcPcEmiEi+0x71:movd 0x4(%esp),%xmm2 _ZNKSt15_Deque_iteratorIcRcPcEmiEi+0x77:punpckldq %xmm2,%xmm1 
_ZNKSt15_Deque_iteratorIcRcPcEmiEi+0x7b:punpcklqdq %xmm1,%xmm0 _ZNKSt15_Deque_iteratorIcRcPcEmiEi+0x7f:movdqu %xmm0,(%eax) _ZNKSt15_Deque_iteratorIcRcPcEmiEi+0x83:addl $0x14,%esp _ZNKSt15_Deque_iteratorIcRcPcEmiEi+0x86:popl %ebx _ZNKSt15_Deque_iteratorIcRcPcEmiEi+0x87:popl %esi _ZNKSt15_Deque_iteratorIcRcPcEmiEi+0x88:popl %edi _ZNKSt15_Deque_iteratorIcRcPcEmiEi+0x89:popl %ebp _ZNKSt15_Deque_iteratorIcRcPcEmiEi+0x8a:ret $0x4 _ZNKSt15_Deque_iteratorIcRcPcEmiEi+0x8d:leal 0x0(%esi),%esi _ZNKSt15_Deque_iteratorIcRcPcEmiEi+0x90:movl %edx,%ebx _ZNKSt15_Deque_iteratorIcRcPcEmiEi+0x92:shrl $0x9,%ebx _ZNKSt15_Deque_iteratorIcRcPcEmiEi+0x95:orl $0xff800000,%ebx _ZNKSt15_Deque_iteratorIcRcPcEmiEi+0x9b:jmp -0x63 <_ZNKSt15_Deque_iteratorIcRcPcEmiEi+0x3a> -- I'm somewhere between geek and normal. - Linus Torvalds From danmcd at omniti.com Sun Dec 6 15:26:00 2015 From: danmcd at omniti.com (Dan McDonald) Date: Sun, 6 Dec 2015 10:26:00 -0500 Subject: [OmniOS-discuss] PowerDNS recursor SIGSEGV In-Reply-To: <20151206144514.GA1425@meili.valhalla.31bits.net> References: <20151206144514.GA1425@meili.valhalla.31bits.net> Message-ID: One other weird thing to try -- build powerdns with the Illumos gcc4. If the gcc5 bug affects powerdns, that'd isolate it. If gcc5 affects some non Illumos library, gcc4 won't help and you'll still segv. If gcc4 Illumos can't build it, you could try 014 and its gcc481. Dan Sent from my iPhone (typos, autocorrect, and all) > On Dec 6, 2015, at 9:45 AM, Josef 'Jeff' Sipek wrote: > > I compiled powerdns recursor [1] on 016, but I'm running into an occasional > SIGSEGV. The SIGSEGV is because of insufficiently aligned memory operand to an > instruction. (See the powerdns bug I filed for this [2].) The SIGSEGV actually > happens in the deque code which comes from boost (1.58.0 in this case). > > Now, the weird thing... I compiled the same powerdns source with the same > version of boost on OI Hipster and OmniOS 016. Hipster uses gcc 4.9.3, > OmniOS 016 uses 5.1. 
The function that causes the SEGV on 016 looks totally
> different between the two distros so I haven't see it die on my laptop.
>
> Has anyone seen any strange SIGSEGVs in boost using software? I hope it isn't
> some sort of gcc bug.
>
> Thanks,
>
> Jeff.
>
> P.S. PowerDNS uses {get,set,swap}context, so I haven't ruled out a stack
> alignment bug on their end.
>
> [1] https://www.powerdns.com/
> [2] https://github.com/PowerDNS/pdns/issues/3002

From jeffpc at josefsipek.net Sun Dec 6 20:40:30 2015
From: jeffpc at josefsipek.net (Josef 'Jeff' Sipek)
Date: Sun, 6 Dec 2015 15:40:30 -0500
Subject: [OmniOS-discuss] PowerDNS recursor SIGSEGV
In-Reply-To:
References: <20151206144514.GA1425@meili.valhalla.31bits.net>
Message-ID: <20151206204030.GA1360@meili.valhalla.31bits.net>

On Sun, Dec 06, 2015 at 10:26:00AM -0500, Dan McDonald wrote:
> One other weird thing to try -- build powerdns with the Illumos gcc4.
If > the gcc5 bug affects powerdns, that'd isolate it. If gcc5 affects some > non Illumos library, gcc4 won't help and you'll still segv. > > If gcc4 Illumos can't build it, The powerdns devs use a lot of c++11 which makes 4.4.4 *waaay* too old. Apparently, 4.8 should be good enough. > you could try 014 and its gcc481. Yeah, I'll try that. Thanks, Jeff. > > Dan > > Sent from my iPhone (typos, autocorrect, and all) > > > On Dec 6, 2015, at 9:45 AM, Josef 'Jeff' Sipek wrote: > > > > I compiled powerdns recursor [1] on 016, but I'm running into an occasional > > SIGSEGV. The SIGSEGV is because of insufficiently aligned memory operand to an > > instruction. (See the powerdns bug I filed for this [2].) The SIGSEGV actually > > happens in the deque code which comes from boost (1.58.0 in this case). > > > > Now, the weird thing... I compiled the same powerdns source with the same > > version of boost on OI Hipster and OmniOS 016. Hipster uses gcc 4.9.3, > > OmniOS 016 uses 5.1. The function that causes the SEGV on 016 looks totally > > different between the two distros so I haven't see it die on my laptop. > > > > Has anyone seen any strange SIGSEGVs in boost using software? I hope it isn't > > some sort of gcc bug. > > > > Thanks, > > > > Jeff. > > > > P.S. PowerDNS uses {get,set,swap}context, so I haven't ruled out a stack > > alignment bug on their end. 
> >
> > [1] https://www.powerdns.com/
> > [2] https://github.com/PowerDNS/pdns/issues/3002

--
The box said "Windows XP or better required". So I installed Linux.
From jeffpc at josefsipek.net Sun Dec 6 22:54:26 2015
From: jeffpc at josefsipek.net (Josef 'Jeff' Sipek)
Date: Sun, 6 Dec 2015 17:54:26 -0500
Subject: [OmniOS-discuss] PowerDNS recursor SIGSEGV
In-Reply-To: <20151206204030.GA1360@meili.valhalla.31bits.net>
References: <20151206144514.GA1425@meili.valhalla.31bits.net>
 <20151206204030.GA1360@meili.valhalla.31bits.net>
Message-ID: <20151206225426.GB1360@meili.valhalla.31bits.net>

On Sun, Dec 06, 2015 at 03:40:30PM -0500, Josef 'Jeff' Sipek wrote:
> On Sun, Dec 06, 2015 at 10:26:00AM -0500, Dan McDonald wrote:
> > One other weird thing to try -- build powerdns with the Illumos gcc4. If
> > the gcc5 bug affects powerdns, that'd isolate it. If gcc5 affects some
> > non Illumos library, gcc4 won't help and you'll still segv.
> >
> > If gcc4 Illumos can't build it,
>
> The powerdns devs use a lot of c++11 which makes 4.4.4 *waaay* too old.
> Apparently, 4.8 should be good enough.
>
> > you could try 014 and its gcc481.
>
> Yeah, I'll try that.

Ok. 014 produces the same exact instructions as OI Hipster. I wonder if
gcc 5 changed some processor default.

Jeff.

> Thanks,
>
> Jeff.
>
> >
> > Dan
> >
> > Sent from my iPhone (typos, autocorrect, and all)
> >
> > > On Dec 6, 2015, at 9:45 AM, Josef 'Jeff' Sipek wrote:
> > >
> > > I compiled powerdns recursor [1] on 016, but I'm running into an occasional
> > > SIGSEGV. The SIGSEGV is because of insufficiently aligned memory operand to an
> > > instruction. (See the powerdns bug I filed for this [2].) The SIGSEGV actually
> > > happens in the deque code which comes from boost (1.58.0 in this case).
> > >
> > > Now, the weird thing... I compiled the same powerdns source with the same
> > > version of boost on OI Hipster and OmniOS 016. Hipster uses gcc 4.9.3,
> > > OmniOS 016 uses 5.1. The function that causes the SEGV on 016 looks totally
> > > different between the two distros so I haven't seen it die on my laptop.
> > >
> > > Has anyone seen any strange SIGSEGVs in boost using software? I hope it isn't
> > > some sort of gcc bug.
> > >
> > > Thanks,
> > >
> > > Jeff.
> > >
> > > P.S. PowerDNS uses {get,set,swap}context, so I haven't ruled out a stack
> > > alignment bug on their end.
> > >
> > > [1] https://www.powerdns.com/
> > > [2] https://github.com/PowerDNS/pdns/issues/3002
> > >
> > >
> > > OmniOS 016:
> > >
> > > _ZNKSt15_Deque_iteratorIcRcPcEmiEi: pushl %ebp
> > > _ZNKSt15_Deque_iteratorIcRcPcEmiEi+1: movl %esp,%ebp
> > > _ZNKSt15_Deque_iteratorIcRcPcEmiEi+3: pushl %ebx
> > > _ZNKSt15_Deque_iteratorIcRcPcEmiEi+4: subl $0x1c,%esp
> > > _ZNKSt15_Deque_iteratorIcRcPcEmiEi+7: movl 0xc(%ebp),%eax
> > > _ZNKSt15_Deque_iteratorIcRcPcEmiEi+0xa: movl 0x8(%ebp),%ebx
> > > _ZNKSt15_Deque_iteratorIcRcPcEmiEi+0xd: movdqu (%eax),%xmm0
> > > _ZNKSt15_Deque_iteratorIcRcPcEmiEi+0x11:movl 0x10(%ebp),%eax
> > > _ZNKSt15_Deque_iteratorIcRcPcEmiEi+0x14:movaps %xmm0,-0x18(%ebp)
> > > _ZNKSt15_Deque_iteratorIcRcPcEmiEi+0x18:negl %eax
> > > _ZNKSt15_Deque_iteratorIcRcPcEmiEi+0x1a:pushl %eax
> > > _ZNKSt15_Deque_iteratorIcRcPcEmiEi+0x1b:leal -0x18(%ebp),%eax
> > > _ZNKSt15_Deque_iteratorIcRcPcEmiEi+0x1e:pushl %eax
> > > _ZNKSt15_Deque_iteratorIcRcPcEmiEi+0x1f:call -0x94 <_ZNSt15_Deque_iteratorIcRcPcEpLEi>
> > > _ZNKSt15_Deque_iteratorIcRcPcEmiEi+0x24:movl (%eax),%edx
> > > _ZNKSt15_Deque_iteratorIcRcPcEmiEi+0x26:addl $0x10,%esp
> > > _ZNKSt15_Deque_iteratorIcRcPcEmiEi+0x29:movl %edx,(%ebx)
> > > _ZNKSt15_Deque_iteratorIcRcPcEmiEi+0x2b:movl 0x4(%eax),%edx
> > > _ZNKSt15_Deque_iteratorIcRcPcEmiEi+0x2e:movl %edx,0x4(%ebx)
> > > _ZNKSt15_Deque_iteratorIcRcPcEmiEi+0x31:movl 0x8(%eax),%edx
> > > _ZNKSt15_Deque_iteratorIcRcPcEmiEi+0x34:movl 0xc(%eax),%eax
> > > _ZNKSt15_Deque_iteratorIcRcPcEmiEi+0x37:movl %edx,0x8(%ebx)
> > > _ZNKSt15_Deque_iteratorIcRcPcEmiEi+0x3a:movl %eax,0xc(%ebx)
> > > _ZNKSt15_Deque_iteratorIcRcPcEmiEi+0x3d:movl %ebx,%eax
> > > _ZNKSt15_Deque_iteratorIcRcPcEmiEi+0x3f:movl -0x4(%ebp),%ebx
> > > _ZNKSt15_Deque_iteratorIcRcPcEmiEi+0x42:leave
> > > _ZNKSt15_Deque_iteratorIcRcPcEmiEi+0x43:ret $0x4
> > >
> > >
> > > OI Hipster:
> > >
> > > _ZNKSt15_Deque_iteratorIcRcPcEmiEi: pushl %ebp
> > > _ZNKSt15_Deque_iteratorIcRcPcEmiEi+1: pushl %edi
> > > _ZNKSt15_Deque_iteratorIcRcPcEmiEi+2: pushl %esi
> > > _ZNKSt15_Deque_iteratorIcRcPcEmiEi+3: pushl %ebx
> > > _ZNKSt15_Deque_iteratorIcRcPcEmiEi+4: subl $0x14,%esp
> > > _ZNKSt15_Deque_iteratorIcRcPcEmiEi+7: movl 0x2c(%esp),%edx
> > > _ZNKSt15_Deque_iteratorIcRcPcEmiEi+0xb: movl 0x30(%esp),%ebx
> > > _ZNKSt15_Deque_iteratorIcRcPcEmiEi+0xf: movl 0x28(%esp),%eax
> > > _ZNKSt15_Deque_iteratorIcRcPcEmiEi+0x13:movl (%edx),%esi
> > > _ZNKSt15_Deque_iteratorIcRcPcEmiEi+0x15:movl 0x4(%edx),%ecx
> > > _ZNKSt15_Deque_iteratorIcRcPcEmiEi+0x18:movl 0x8(%edx),%edi
> > > _ZNKSt15_Deque_iteratorIcRcPcEmiEi+0x1b:movl 0xc(%edx),%ebp
> > > _ZNKSt15_Deque_iteratorIcRcPcEmiEi+0x1e:movl %esi,%edx
> > > _ZNKSt15_Deque_iteratorIcRcPcEmiEi+0x20:subl %ebx,%esi
> > > _ZNKSt15_Deque_iteratorIcRcPcEmiEi+0x22:subl %ecx,%edx
> > > _ZNKSt15_Deque_iteratorIcRcPcEmiEi+0x24:subl %ebx,%edx
> > > _ZNKSt15_Deque_iteratorIcRcPcEmiEi+0x26:cmpl $0x1ff,%edx
> > > _ZNKSt15_Deque_iteratorIcRcPcEmiEi+0x2c:movl %esi,(%esp)
> > > _ZNKSt15_Deque_iteratorIcRcPcEmiEi+0x2f:jbe +0x21 <_ZNKSt15_Deque_iteratorIcRcPcEmiEi+0x52>
> > > _ZNKSt15_Deque_iteratorIcRcPcEmiEi+0x31:movl %edx,%ebx
> > > _ZNKSt15_Deque_iteratorIcRcPcEmiEi+0x33:sarl $0x9,%ebx
> > > _ZNKSt15_Deque_iteratorIcRcPcEmiEi+0x36:testl %edx,%edx
> > > _ZNKSt15_Deque_iteratorIcRcPcEmiEi+0x38:jle +0x56 <_ZNKSt15_Deque_iteratorIcRcPcEmiEi+0x90>
> > > _ZNKSt15_Deque_iteratorIcRcPcEmiEi+0x3a:leal 0x0(%ebp,%ebx,4),%ebp
> > > _ZNKSt15_Deque_iteratorIcRcPcEmiEi+0x3e:movl 0x0(%ebp),%ecx
> > > _ZNKSt15_Deque_iteratorIcRcPcEmiEi+0x41:shll $0x9,%ebx
> > > _ZNKSt15_Deque_iteratorIcRcPcEmiEi+0x44:subl %ebx,%edx
> > > _ZNKSt15_Deque_iteratorIcRcPcEmiEi+0x46:leal (%ecx,%edx),%esi
> > > _ZNKSt15_Deque_iteratorIcRcPcEmiEi+0x49:leal 0x200(%ecx),%edi
> > > _ZNKSt15_Deque_iteratorIcRcPcEmiEi+0x4f:movl %esi,(%esp)
> > > _ZNKSt15_Deque_iteratorIcRcPcEmiEi+0x52:movl %edi,0x4(%esp)
> > > _ZNKSt15_Deque_iteratorIcRcPcEmiEi+0x56:movd (%esp),%xmm0
> > > _ZNKSt15_Deque_iteratorIcRcPcEmiEi+0x5b:movl %ecx,(%esp)
> > > _ZNKSt15_Deque_iteratorIcRcPcEmiEi+0x5e:movd 0x4(%esp),%xmm1
> > > _ZNKSt15_Deque_iteratorIcRcPcEmiEi+0x64:movl %ebp,0x4(%esp)
> > > _ZNKSt15_Deque_iteratorIcRcPcEmiEi+0x68:movd (%esp),%xmm3
> > > _ZNKSt15_Deque_iteratorIcRcPcEmiEi+0x6d:punpckldq %xmm3,%xmm0
> > > _ZNKSt15_Deque_iteratorIcRcPcEmiEi+0x71:movd 0x4(%esp),%xmm2
> > > _ZNKSt15_Deque_iteratorIcRcPcEmiEi+0x77:punpckldq %xmm2,%xmm1
> > > _ZNKSt15_Deque_iteratorIcRcPcEmiEi+0x7b:punpcklqdq %xmm1,%xmm0
> > > _ZNKSt15_Deque_iteratorIcRcPcEmiEi+0x7f:movdqu %xmm0,(%eax)
> > > _ZNKSt15_Deque_iteratorIcRcPcEmiEi+0x83:addl $0x14,%esp
> > > _ZNKSt15_Deque_iteratorIcRcPcEmiEi+0x86:popl %ebx
> > > _ZNKSt15_Deque_iteratorIcRcPcEmiEi+0x87:popl %esi
> > > _ZNKSt15_Deque_iteratorIcRcPcEmiEi+0x88:popl %edi
> > > _ZNKSt15_Deque_iteratorIcRcPcEmiEi+0x89:popl %ebp
> > > _ZNKSt15_Deque_iteratorIcRcPcEmiEi+0x8a:ret $0x4
> > > _ZNKSt15_Deque_iteratorIcRcPcEmiEi+0x8d:leal 0x0(%esi),%esi
> > > _ZNKSt15_Deque_iteratorIcRcPcEmiEi+0x90:movl %edx,%ebx
> > > _ZNKSt15_Deque_iteratorIcRcPcEmiEi+0x92:shrl $0x9,%ebx
> > > _ZNKSt15_Deque_iteratorIcRcPcEmiEi+0x95:orl $0xff800000,%ebx
> > > _ZNKSt15_Deque_iteratorIcRcPcEmiEi+0x9b:jmp -0x63 <_ZNKSt15_Deque_iteratorIcRcPcEmiEi+0x3a>
> > >
> > > --
> > > I'm somewhere between geek and normal.
> > > - Linus Torvalds
>
> --
> The box said "Windows XP or better required". So I installed Linux.

--
If I have trouble installing Linux, something is wrong. Very wrong.
- Linus Torvalds

From danmcd at omniti.com Sun Dec 6 23:42:46 2015
From: danmcd at omniti.com (Dan McDonald)
Date: Sun, 6 Dec 2015 18:42:46 -0500
Subject: [OmniOS-discuss] PowerDNS recursor SIGSEGV
In-Reply-To: <20151206225426.GB1360@meili.valhalla.31bits.net>
References: <20151206144514.GA1425@meili.valhalla.31bits.net>
 <20151206204030.GA1360@meili.valhalla.31bits.net>
 <20151206225426.GB1360@meili.valhalla.31bits.net>
Message-ID: <0112A91C-03C8-4007-A85F-893E6DBE93EE@omniti.com>

I wonder how the 014-compiled binary performs on 016? More accurately, I wonder if any gcc-51 compiled libs are off?

Dan

Sent from my iPhone (typos, autocorrect, and all)

> On Dec 6, 2015, at 5:54 PM, Josef 'Jeff' Sipek wrote:
>
> Ok. 014 produces the same exact instructions as OI Hipster. I wonder if
> gcc 5 changed some processor default.
>
> Jeff.

From bfriesen at simple.dallas.tx.us Mon Dec 7 00:19:03 2015
From: bfriesen at simple.dallas.tx.us (Bob Friesenhahn)
Date: Sun, 6 Dec 2015 18:19:03 -0600 (CST)
Subject: [OmniOS-discuss] OmniOS r151016 zone has difficulties shutting down
Message-ID:

On a freshly installed zone with no additional packages installed (but
with one lofs mount to a filesystem), I am seeing a glitch with
'zoneadm -z name shutdown', 'zoneadm -z name reboot' or 'reboot'
within the zone. This message appears on the console and in the
/var/adm/messages file of the global zone:

Dec 6 17:17:22 scrappy zoneadmd[17388]: [ID 702911 daemon.error] [zone 'pkgbuild'] failed to open console master: Device busy
Dec 6 17:17:22 scrappy zoneadmd[17388]: [ID 702911 daemon.error] [zone 'pkgbuild'] WARNING: could not open master side of zone console for pkgbuild to release slave handle: Device busy
Dec 6 17:17:22 scrappy zoneadmd[17388]: [ID 702911 daemon.error] [zone 'pkgbuild'] WARNING: console /devices//pseudo/zconsnex@1/zcons@1 found, but it could not be removed.: I/O error

and the shutdown hangs. If I then do a zlogin to the console (or have
already done so) the shutdown immediately completes:

scrappy:~% pfexec zlogin -C pkgbuild
[Connected to zone 'pkgbuild' console]

[NOTICE: Zone halted]

If I attempt to zlogin into the zone while it is being shut down I get
this message:

zlogin: login allowed only to running zones (pkgbuild is 'shutting_down').

If I do 'zoneadm -z name reboot', it works fine, although this is
documented to be the same as 'shutdown' followed by 'boot'.

If I do the reboot on the zone console then the reboot works fine.

This is the second zone that I have installed and the first zone also
encountered this issue. The problem went away with the other zone but
still persists for this new zone.

Have others encountered this issue? What can be done to fix it?

Bob
--
Bob Friesenhahn
bfriesen at simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer, http://www.GraphicsMagick.org/

From hasslerd at gmx.li Mon Dec 7 11:20:34 2015
From: hasslerd at gmx.li (Dominik Hassler)
Date: Mon, 7 Dec 2015 12:20:34 +0100
Subject: [OmniOS-discuss] OmniOS r151016 zone has difficulties shutting down
In-Reply-To:
References:
Message-ID:

Bob,

I can confirm that this happens occasionally on my systems too (all r16 with the latest patches applied). Since it does not happen on every shutdown, and it hits a different zone every time, I have not found a pattern yet. Usually I just halt the zone if the clean shutdown fails.

I don't recall when this occurred for the first time, but it might have been after upgrading to r16.

From danmcd at omniti.com Mon Dec 7 12:49:47 2015
From: danmcd at omniti.com (Dan McDonald)
Date: Mon, 7 Dec 2015 07:49:47 -0500
Subject: [OmniOS-discuss] OmniOS r151016 zone has difficulties shutting down
In-Reply-To:
References:
Message-ID: <536501D2-EA96-4F6B-8CB2-39A0F9698267@omniti.com>

> On Dec 6, 2015, at 7:19 PM, Bob Friesenhahn wrote:
>
> On a freshly installed zone with no additional packages installed (but with one lofs mount to a filesystem), I am seeing a glitch with 'zoneadm -z name shutdown', 'zoneadm -z name reboot' or 'reboot' within the zone. This message appears on the console and in the /var/adm/messages file of the global zone:
>
> Dec 6 17:17:22 scrappy zoneadmd[17388]: [ID 702911 daemon.error] [zone 'pkgbuild'] failed to open console master: Device busy
> Dec 6 17:17:22 scrappy zoneadmd[17388]: [ID 702911 daemon.error] [zone 'pkgbuild'] WARNING: could not open master side of zone console for pkgbuild to release slave handle: Device busy
> Dec 6 17:17:22 scrappy zoneadmd[17388]: [ID 702911 daemon.error] [zone 'pkgbuild'] WARNING: console /devices//pseudo/zconsnex@1/zcons@1 found, but it could not be removed.: I/O error
>
> and the shutdown hangs.

I just tried a couple of shutdown/boot loops on a 016 zone of mine. I did not see the hang, but this error was in my global zone:

Dec 7 07:34:45 neuromancer zoneadmd[29496]: [ID 702911 daemon.error] [zone 'minecraft'] WARNING: console /devices//pseudo/zconsnex@1/zcons@1 found, but it could not be removed.: I/O error

So I'm guessing the failed-to-open-console-master and "could not open master side of zone console" is what was causing your failure.
The other message (found, but it could not be removed), I only see when there is no "zlogin -C" process attached to my zone's console.

> Have others encountered this issue? What can be done to fix it?

This message is printed by zoneadmd. If you or anyone else encounters this hang again, please do the following:

1.) While zoneadm is hung, check the console for the above message; you'll see a pid for zoneadmd (Bob's example was 17388).

2.) See if you can get the stack(s) of the zoneadmd that reported the console master error: pstack <pid>

3.) Grab a corefile of the zoneadmd: gcore <pid>

4.) Share the corefile somehow.

The pstack and core of the running/hung zoneadm(1M) command would also be useful, I think.

Thanks,
Dan

From davide.poletto at gmail.com Mon Dec 7 13:13:17 2015
From: davide.poletto at gmail.com (Davide Poletto)
Date: Mon, 7 Dec 2015 14:13:17 +0100
Subject: [OmniOS-discuss] illumos and contributions metrics: how to evaluate companies that commercialize illumos based products by examining them in the light of their illumos community's contributions.
Message-ID:

Hi all,

maybe I'm a bit foolish to ask this kind of general question here. Yes, I know, the illumos users' mailing list is probably the proper place for what I'm trying to explore, but I feel comfortable raising my doubts here first (see below for why).

At first sight this looks off-topic for OmniOS, and not relevant to OmniOS (or OmniTI) as such, so first of all, really *pardon me* for jumping in with such generic doubts. *But* I hope that, among others, Dan McDonald will read this and give me (and us) his opinion on what I'm about to ask, since I recently read *his* interesting "2015 illumos Day" presentation (I found it at http://kebe.com/~danmcd/illumos-day-2015.pdf), and that, for me, is what legitimizes the discussion here.

My interest started once I had read it and, in particular, when I started to think about two slides: "Non-Upstreamed Technical Changes" and "Bad Reasons for Not Upstreaming". They caught my attention just as I was evaluating a NAS/SAN appliance under intensive development by a relatively young European company.

Although the company's marketing avoids the terms one would expect, such as "illumos" or "ZFS", the appliance is clearly illumos-based and uses the illumos kernel and ZFS, alongside value-added proprietary technologies, as the foundation for all its higher-level features and services. This is quite normal, nothing new here, you would say. (I'd add that it's sad to see this kind of marketing: apparently hiding your roots not because they are already evident enough but because they aren't a topic that helps to sell. That, at least, is my perception.)

The statement "Even if for a limited time, elapsed time increases upstreaming difficulty" on the second of those slides struck me, so I started to look at some forked illumos repositories (these companies often have one on GitHub; I mean only the illumos fork itself, leaving aside other illumos-related projects they may have forked). The only fact I could note immediately was GitHub's "This branch is n commits behind illumos:master." notice, where the number "n" may (or may not) indicate how far the company's fork has diverged since it was created. That left me with the impression that any consequences of that divergence, good or bad, will have a real impact on the future of the product, especially when set against what is happening on the master branch (think about how fast ZFS and illumos kernel development are moving).

Maybe that single parameter (the n commits behind) is not enough to form a valid opinion and start speculating along the lines of: "Company X develops, produces, markets, sells and supports illumos-based products but, judging by how far its illumos fork lags behind the illumos master branch, that is not enough. What about its level of contribution to the community? How good will its product, support and development be if it tends to diverge from the community?" and other similar questions.

I've also read the interesting "Illumos Productivity and Bus Factor" entry on the illumetrics blog (available at https://illumetrics.wordpress.com/2015/01/28/illumos-productivity-and-bus-factor/), but I didn't find a way to easily understand, as a user, whether a company is doing well in terms of commits and why it is (or is not) doing so, or whether its "public market image" has a matching counterpart in its community image (through the contributions it gives back to the illumos community as a whole).

This approach could also be applied to institutions, not only to commercial companies (remembering that committers are individuals who mostly work for companies or institutions), but for now I'm focused on companies that sell illumos-based technology, because they make profits partly through the essential software components they use as the foundation of their products.

Illumetrics released a framework for calculating statistics on illumos-related repositories and data sources (see it here: https://github.com/nickziv/illumetrics) but, as they state, it is far from complete (it seems to count only contributions by known names tied to already well-known companies, without counting young or emerging ones). That's clearly not illumetrics' fault; the data set is simply still too small to draw general conclusions about all public illumos forks.

So, after this long preamble, here is my question: is there a way to easily evaluate how good (and in what way) a commercial company is at "giving back" (if it does) to the illumos community (or to related communities), when that company naturally attracts system administrators' attention with its products (especially once those administrators realize the products are illumos-based) and yet develops, produces, markets and sells appliances while hiding, or at least not promoting with the necessary transparency, the fact that its products are essentially built on an illumos fork?

What I'm asking for is not names but metrics or, possibly, the results of such metrics, to help me form a partial but reasonable opinion.

Is there a way to rank or evaluate, and so reward or honour (for example, by purchasing their products or by supporting their development as testers or free-time contributors), those {individuals, companies, institutions} that clearly demonstrate not only good numbers (commits) but also that they care about the community and are more transparent than others about the origin of their commercial offering?

Kind regards,
Davide.

From danmcd at omniti.com Mon Dec 7 13:44:38 2015
From: danmcd at omniti.com (Dan McDonald)
Date: Mon, 7 Dec 2015 08:44:38 -0500
Subject: [OmniOS-discuss] illumos and contributions metrics: how to evaluate companies that commercialize illumos based products by examining them in the light of their illumos community's contributions.
In-Reply-To:
References:
Message-ID:

> On Dec 7, 2015, at 8:13 AM, Davide Poletto wrote:
>
> Is there a way to rank or evaluate, and so reward or honour (for example, by purchasing their products or by supporting their development as testers or free-time contributors), those {individuals, companies, institutions} that clearly demonstrate not only good numbers (commits) but also that they care about the community and are more transparent than others about the origin of their commercial offering?

That's a damned good question. It's also very tricky.

Some firms keep things closed until they've released, or until some time after they've released. Some find this fair enough, others find it annoying. Because people are different, it may be hard to get a consensus on how to rank/evaluate firms the way you wish.

BTW, I lean toward "fair enough" so long as there's consistency and no going back on one's word. Keeping to one's word is important to me. I didn't leave Oracle because of the Solaris-closing: if you read the text of that leaked email, it implied a source-dump-on-release model. Only after I left Oracle did it become clear that it was all a big lie.

You're chasing a hard problem. You may not get much sympathy. Making things MORE complicated is that "illumos" as a brand is still tightly tied up by its owner. Many feel that it's tied up too tightly, and that is why you rarely see "illumos" mentioned in marketing materials, especially not the trademarked symbol.

I'm sorry I don't have better answers for you right now. It's a hard problem, and many of us who might be able to help clarify things are trying to keep all of the machinery moving as smoothly as we can.

Dan

From jeffpc at josefsipek.net Mon Dec 7 15:07:17 2015
From: jeffpc at josefsipek.net (Josef 'Jeff' Sipek)
Date: Mon, 7 Dec 2015 10:07:17 -0500
Subject: [OmniOS-discuss] PowerDNS recursor SIGSEGV
In-Reply-To: <0112A91C-03C8-4007-A85F-893E6DBE93EE@omniti.com>
References: <20151206144514.GA1425@meili.valhalla.31bits.net>
 <20151206204030.GA1360@meili.valhalla.31bits.net>
 <20151206225426.GB1360@meili.valhalla.31bits.net>
 <0112A91C-03C8-4007-A85F-893E6DBE93EE@omniti.com>
Message-ID: <20151207150717.GD1359@meili.valhalla.31bits.net>

On Sun, Dec 06, 2015 at 06:42:46PM -0500, Dan McDonald wrote:
> I wonder how the 014-compiled binary performs on 016? More accurately, I
> wonder if any gcc-51 compiled libs are off?

I'll try it out, but I expect it to work just fine - or die for a totally
different reason. This is because the SIGSEGV is caused by this
instruction:

_ZNKSt15_Deque_iteratorIcRcPcEmiEi+0x14:movaps %xmm0,-0x18(%ebp)

The _ZNKSt15_Deque_iteratorIcRcPcEmiEi function comes from a boost header
and it ends up in the pdns_recursor executable itself. The executable is
pretty boring as far as libs are concerned:

# ldd /usr/sbin/pdns_recursor
        libresolv.so.2 => /lib/libresolv.so.2
        libsocket.so.1 => /lib/libsocket.so.1
        libnsl.so.1 => /lib/libnsl.so.1
        libstdc++.so.6 => /usr/lib/libstdc++.so.6
        libm.so.2 => /lib/libm.so.2
        librt.so.1 => /lib/librt.so.1
        libgcc_s.so.1 => /usr/lib/libgcc_s.so.1
        libpthread.so.1 => /lib/libpthread.so.1
        libc.so.1 => /lib/libc.so.1
        libmd.so.1 => /lib/libmd.so.1
        libmp.so.2 => /lib/libmp.so.2

gcc 4.8/4.9 compiled powerdns doesn't use this instruction at all.

(The SEGV is because the memory operand is 8-byte aligned instead of the
required 16-byte alignment. This causes #gp which turns into a SIGSEGV via
the normal trap code in the kernel.)

Jeff.

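The alignment arithmetic behind that parenthetical can be sketched in a few lines of Python. This is only an illustration, not anything taken from PowerDNS or gcc: the helper names and the sample %esp values are made up, and it assumes the compiler expects %esp to be 16-byte aligned at each call site (which I believe is the usual gcc default on 32-bit x86, but treat that as an assumption). With the 016 prologue (pushl %ebp; movl %esp,%ebp), an aligned caller puts -0x18(%ebp) on a 16-byte boundary, while a caller whose stack is off by 8 leaves it only 8-byte aligned, which is the case described above.

```python
# Sketch only: models the stack arithmetic for the prologue in the
# OmniOS 016 listing (pushl %ebp; movl %esp,%ebp) on 32-bit x86.
# The helper names and sample %esp values are hypothetical.

MOVAPS_ALIGNMENT = 16  # movaps faults unless its memory operand is 16-byte aligned


def frame_pointer(esp_at_call):
    """%ebp after 'call' pushes a 4-byte return address and 'pushl %ebp' pushes 4 more."""
    return esp_at_call - 4 - 4


def movaps_operand(esp_at_call):
    """Address of -0x18(%ebp), the memory operand of the movaps at +0x14."""
    return frame_pointer(esp_at_call) - 0x18


def is_aligned(addr, boundary=MOVAPS_ALIGNMENT):
    """True if addr sits on the given power-of-two boundary."""
    return addr % boundary == 0


if __name__ == "__main__":
    # First value: caller kept %esp 16-byte aligned. Second: off by 8.
    for esp in (0x08047FF0, 0x08047FF8):
        operand = movaps_operand(esp)
        verdict = "ok" if is_aligned(operand) else "would fault (#GP)"
        print(f"esp at call {esp:#x}: operand {operand:#x}, "
              f"mod 16 = {operand % 16}, {verdict}")
```

If the {get,set,swap}context stacks mentioned in the P.S. of the original report were only 8-byte aligned, this is the kind of offset one would expect to see, though that remains a guess until someone checks the actual %esp at the fault.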
> Dan > > Sent from my iPhone (typos, autocorrect, and all) > > > On Dec 6, 2015, at 5:54 PM, Josef 'Jeff' Sipek wrote: > > > >> On Sun, Dec 06, 2015 at 03:40:30PM -0500, Josef 'Jeff' Sipek wrote: > >>> On Sun, Dec 06, 2015 at 10:26:00AM -0500, Dan McDonald wrote: > >>> One other weird thing to try -- build powerdns with the Illumos gcc4. If > >>> the gcc5 bug affects powerdns, that'd isolate it. If gcc5 affects some > >>> non Illumos library, gcc4 won't help and you'll still segv. > >>> > >>> If gcc4 Illumos can't build it, > >> > >> The powerdns devs use a lot of c++11 which makes 4.4.4 *waaay* too old. > >> Apparently, 4.8 should be good enough. > >> > >>> you could try 014 and its gcc481. > >> > >> Yeah, I'll try that. > > > > Ok. 014 produces the same exact instructions as OI Hipster. I wonder if > > gcc 5 changed some processor default. > > > > Jeff. > > > >> Thanks, > >> > >> Jeff. > >> > >>> > >>> Dan > >>> > >>> Sent from my iPhone (typos, autocorrect, and all) > >>> > >>>> On Dec 6, 2015, at 9:45 AM, Josef 'Jeff' Sipek wrote: > >>>> > >>>> I compiled powerdns recursor [1] on 016, but I'm running into an occasional > >>>> SIGSEGV. The SIGSEGV is because of insufficiently aligned memory operand to an > >>>> instruction. (See the powerdns bug I filed for this [2].) The SIGSEGV actually > >>>> happens in the deque code which comes from boost (1.58.0 in this case). > >>>> > >>>> Now, the weird thing... I compiled the same powerdns source with the same > >>>> version of boost on OI Hipster and OmniOS 016. Hipster uses gcc 4.9.3, > >>>> OmniOS 016 uses 5.1. The function that causes the SEGV on 016 looks totally > >>>> different between the two distros so I haven't see it die on my laptop. > >>>> > >>>> Has anyone seen any strange SIGSEGVs in boost using software? I hope it isn't > >>>> some sort of gcc bug. > >>>> > >>>> Thanks, > >>>> > >>>> Jeff. > >>>> > >>>> P.S. 
PowerDNS uses {get,set,swap}context, so I haven't ruled out a stack > >>>> alignment bug on their end. > >>>> > >>>> [1] https://www.powerdns.com/ > >>>> [2] https://github.com/PowerDNS/pdns/issues/3002 > >>>> > >>>> > >>>> OmniOS 016: > >>>> > >>>> _ZNKSt15_Deque_iteratorIcRcPcEmiEi: pushl %ebp > >>>> _ZNKSt15_Deque_iteratorIcRcPcEmiEi+1: movl %esp,%ebp > >>>> _ZNKSt15_Deque_iteratorIcRcPcEmiEi+3: pushl %ebx > >>>> _ZNKSt15_Deque_iteratorIcRcPcEmiEi+4: subl $0x1c,%esp > >>>> _ZNKSt15_Deque_iteratorIcRcPcEmiEi+7: movl 0xc(%ebp),%eax > >>>> _ZNKSt15_Deque_iteratorIcRcPcEmiEi+0xa: movl 0x8(%ebp),%ebx > >>>> _ZNKSt15_Deque_iteratorIcRcPcEmiEi+0xd: movdqu (%eax),%xmm0 > >>>> _ZNKSt15_Deque_iteratorIcRcPcEmiEi+0x11:movl 0x10(%ebp),%eax > >>>> _ZNKSt15_Deque_iteratorIcRcPcEmiEi+0x14:movaps %xmm0,-0x18(%ebp) > >>>> _ZNKSt15_Deque_iteratorIcRcPcEmiEi+0x18:negl %eax > >>>> _ZNKSt15_Deque_iteratorIcRcPcEmiEi+0x1a:pushl %eax > >>>> _ZNKSt15_Deque_iteratorIcRcPcEmiEi+0x1b:leal -0x18(%ebp),%eax > >>>> _ZNKSt15_Deque_iteratorIcRcPcEmiEi+0x1e:pushl %eax > >>>> _ZNKSt15_Deque_iteratorIcRcPcEmiEi+0x1f:call -0x94 <_ZNSt15_Deque_iteratorIcRcPcEpLEi> > >>>> _ZNKSt15_Deque_iteratorIcRcPcEmiEi+0x24:movl (%eax),%edx > >>>> _ZNKSt15_Deque_iteratorIcRcPcEmiEi+0x26:addl $0x10,%esp > >>>> _ZNKSt15_Deque_iteratorIcRcPcEmiEi+0x29:movl %edx,(%ebx) > >>>> _ZNKSt15_Deque_iteratorIcRcPcEmiEi+0x2b:movl 0x4(%eax),%edx > >>>> _ZNKSt15_Deque_iteratorIcRcPcEmiEi+0x2e:movl %edx,0x4(%ebx) > >>>> _ZNKSt15_Deque_iteratorIcRcPcEmiEi+0x31:movl 0x8(%eax),%edx > >>>> _ZNKSt15_Deque_iteratorIcRcPcEmiEi+0x34:movl 0xc(%eax),%eax > >>>> _ZNKSt15_Deque_iteratorIcRcPcEmiEi+0x37:movl %edx,0x8(%ebx) > >>>> _ZNKSt15_Deque_iteratorIcRcPcEmiEi+0x3a:movl %eax,0xc(%ebx) > >>>> _ZNKSt15_Deque_iteratorIcRcPcEmiEi+0x3d:movl %ebx,%eax > >>>> _ZNKSt15_Deque_iteratorIcRcPcEmiEi+0x3f:movl -0x4(%ebp),%ebx > >>>> _ZNKSt15_Deque_iteratorIcRcPcEmiEi+0x42:leave > >>>> _ZNKSt15_Deque_iteratorIcRcPcEmiEi+0x43:ret $0x4 > >>>> > >>>> > 
>>>> OI Hipster: > >>>> > >>>> _ZNKSt15_Deque_iteratorIcRcPcEmiEi: pushl %ebp > >>>> _ZNKSt15_Deque_iteratorIcRcPcEmiEi+1: pushl %edi > >>>> _ZNKSt15_Deque_iteratorIcRcPcEmiEi+2: pushl %esi > >>>> _ZNKSt15_Deque_iteratorIcRcPcEmiEi+3: pushl %ebx > >>>> _ZNKSt15_Deque_iteratorIcRcPcEmiEi+4: subl $0x14,%esp > >>>> _ZNKSt15_Deque_iteratorIcRcPcEmiEi+7: movl 0x2c(%esp),%edx > >>>> _ZNKSt15_Deque_iteratorIcRcPcEmiEi+0xb: movl 0x30(%esp),%ebx > >>>> _ZNKSt15_Deque_iteratorIcRcPcEmiEi+0xf: movl 0x28(%esp),%eax > >>>> _ZNKSt15_Deque_iteratorIcRcPcEmiEi+0x13:movl (%edx),%esi > >>>> _ZNKSt15_Deque_iteratorIcRcPcEmiEi+0x15:movl 0x4(%edx),%ecx > >>>> _ZNKSt15_Deque_iteratorIcRcPcEmiEi+0x18:movl 0x8(%edx),%edi > >>>> _ZNKSt15_Deque_iteratorIcRcPcEmiEi+0x1b:movl 0xc(%edx),%ebp > >>>> _ZNKSt15_Deque_iteratorIcRcPcEmiEi+0x1e:movl %esi,%edx > >>>> _ZNKSt15_Deque_iteratorIcRcPcEmiEi+0x20:subl %ebx,%esi > >>>> _ZNKSt15_Deque_iteratorIcRcPcEmiEi+0x22:subl %ecx,%edx > >>>> _ZNKSt15_Deque_iteratorIcRcPcEmiEi+0x24:subl %ebx,%edx > >>>> _ZNKSt15_Deque_iteratorIcRcPcEmiEi+0x26:cmpl $0x1ff,%edx > >>>> _ZNKSt15_Deque_iteratorIcRcPcEmiEi+0x2c:movl %esi,(%esp) > >>>> _ZNKSt15_Deque_iteratorIcRcPcEmiEi+0x2f:jbe +0x21 <_ZNKSt15_Deque_iteratorIcRcPcEmiEi+0x52> > >>>> _ZNKSt15_Deque_iteratorIcRcPcEmiEi+0x31:movl %edx,%ebx > >>>> _ZNKSt15_Deque_iteratorIcRcPcEmiEi+0x33:sarl $0x9,%ebx > >>>> _ZNKSt15_Deque_iteratorIcRcPcEmiEi+0x36:testl %edx,%edx > >>>> _ZNKSt15_Deque_iteratorIcRcPcEmiEi+0x38:jle +0x56 <_ZNKSt15_Deque_iteratorIcRcPcEmiEi+0x90> > >>>> _ZNKSt15_Deque_iteratorIcRcPcEmiEi+0x3a:leal 0x0(%ebp,%ebx,4),%ebp > >>>> _ZNKSt15_Deque_iteratorIcRcPcEmiEi+0x3e:movl 0x0(%ebp),%ecx > >>>> _ZNKSt15_Deque_iteratorIcRcPcEmiEi+0x41:shll $0x9,%ebx > >>>> _ZNKSt15_Deque_iteratorIcRcPcEmiEi+0x44:subl %ebx,%edx > >>>> _ZNKSt15_Deque_iteratorIcRcPcEmiEi+0x46:leal (%ecx,%edx),%esi > >>>> _ZNKSt15_Deque_iteratorIcRcPcEmiEi+0x49:leal 0x200(%ecx),%edi > >>>> _ZNKSt15_Deque_iteratorIcRcPcEmiEi+0x4f:movl 
%esi,(%esp) > >>>> _ZNKSt15_Deque_iteratorIcRcPcEmiEi+0x52:movl %edi,0x4(%esp) > >>>> _ZNKSt15_Deque_iteratorIcRcPcEmiEi+0x56:movd (%esp),%xmm0 > >>>> _ZNKSt15_Deque_iteratorIcRcPcEmiEi+0x5b:movl %ecx,(%esp) > >>>> _ZNKSt15_Deque_iteratorIcRcPcEmiEi+0x5e:movd 0x4(%esp),%xmm1 > >>>> _ZNKSt15_Deque_iteratorIcRcPcEmiEi+0x64:movl %ebp,0x4(%esp) > >>>> _ZNKSt15_Deque_iteratorIcRcPcEmiEi+0x68:movd (%esp),%xmm3 > >>>> _ZNKSt15_Deque_iteratorIcRcPcEmiEi+0x6d:punpckldq %xmm3,%xmm0 > >>>> _ZNKSt15_Deque_iteratorIcRcPcEmiEi+0x71:movd 0x4(%esp),%xmm2 > >>>> _ZNKSt15_Deque_iteratorIcRcPcEmiEi+0x77:punpckldq %xmm2,%xmm1 > >>>> _ZNKSt15_Deque_iteratorIcRcPcEmiEi+0x7b:punpcklqdq %xmm1,%xmm0 > >>>> _ZNKSt15_Deque_iteratorIcRcPcEmiEi+0x7f:movdqu %xmm0,(%eax) > >>>> _ZNKSt15_Deque_iteratorIcRcPcEmiEi+0x83:addl $0x14,%esp > >>>> _ZNKSt15_Deque_iteratorIcRcPcEmiEi+0x86:popl %ebx > >>>> _ZNKSt15_Deque_iteratorIcRcPcEmiEi+0x87:popl %esi > >>>> _ZNKSt15_Deque_iteratorIcRcPcEmiEi+0x88:popl %edi > >>>> _ZNKSt15_Deque_iteratorIcRcPcEmiEi+0x89:popl %ebp > >>>> _ZNKSt15_Deque_iteratorIcRcPcEmiEi+0x8a:ret $0x4 > >>>> _ZNKSt15_Deque_iteratorIcRcPcEmiEi+0x8d:leal 0x0(%esi),%esi > >>>> _ZNKSt15_Deque_iteratorIcRcPcEmiEi+0x90:movl %edx,%ebx > >>>> _ZNKSt15_Deque_iteratorIcRcPcEmiEi+0x92:shrl $0x9,%ebx > >>>> _ZNKSt15_Deque_iteratorIcRcPcEmiEi+0x95:orl $0xff800000,%ebx > >>>> _ZNKSt15_Deque_iteratorIcRcPcEmiEi+0x9b:jmp -0x63 <_ZNKSt15_Deque_iteratorIcRcPcEmiEi+0x3a> > >>>> > >>>> -- > >>>> I'm somewhere between geek and normal. > >>>> - Linus Torvalds > >>>> _______________________________________________ > >>>> OmniOS-discuss mailing list > >>>> OmniOS-discuss at lists.omniti.com > >>>> http://lists.omniti.com/mailman/listinfo/omnios-discuss > >> > >> -- > >> The box said "Windows XP or better required". So I installed Linux. > > > > -- > > If I have trouble installing Linux, something is wrong. Very wrong. > > - Linus Torvalds -- Humans were created by water to transport it upward. 
From lkateley at kateley.com Mon Dec 7 16:13:02 2015 From: lkateley at kateley.com (Linda Kateley) Date: Mon, 7 Dec 2015 10:13:02 -0600 Subject: [OmniOS-discuss] illumos and contributions metrics: how to evaluate companies that commercialize illumos based products by examining them in the light of their illumos community's contributions. In-Reply-To: References: Message-ID: <5665B00E.70307@kateley.com> Blackduck does this for you. https://www.openhub.net/p?ref=homepage&query=illumos On 12/7/15 7:44 AM, Dan McDonald wrote: >> On Dec 7, 2015, at 8:13 AM, Davide Poletto wrote: >> >> Is there a way to rank/evaluate and so reward/honour (by, as example, purchasing their products or by sustaining their development as testers/free-time contributors) those {individuals, companies, institutions} that clearly demonstrate not only to have good numbers (commits) but also that they care about the community and that are more transparent than others in advertising their commercial offer's origin? > That's a damned good question. It's also very tricky. > > Some firms keep things closed until they've released, or after some time after they've released. Some find this fair enough, others find it annoying. Because people are different, it may be hard to get a consensus on how to rank/evaluate firms the way you wish. BTW, I lean toward "fair enough" so long as there's consistency and not going back on one's word. > Keeping to one's word is important to me. I didn't leave Oracle because of the Solaris-closing: if you read the text of that leaked email, it implied a source-dump-on-release model. Only after I left Oracle did it become clear that it was all a big lie. > > You're chasing a hard problem. You may not get much sympathy. Making things MORE complicated is that "illumos" as a brand is still tightly tied up by its owner. Many feel that it's tied up too tightly, and that is why you rarely see "illumos" mentioned in marketing materials, especially not the trademarked symbol. 
> > I'm sorry I don't have better answers for you right now. It's a hard problem, and many of us who might be able to help clarify things are trying to keep all of the machinery moving as smoothly as we can. > > Dan > > > _______________________________________________ > OmniOS-discuss mailing list > OmniOS-discuss at lists.omniti.com > http://lists.omniti.com/mailman/listinfo/omnios-discuss From doug at will.to Tue Dec 8 11:26:42 2015 From: doug at will.to (Doug Hughes) Date: Tue, 8 Dec 2015 16:56:42 +0530 Subject: [OmniOS-discuss] Sol 11 zone on OmniOS system Message-ID: Has anybody done this or is anybody doing this? Any tips/tricks? I want to migrate a working Sol11 zone to an OmniOS system, if possible. I figure I'll have to use a special bit of 'branding' to make it go. -------------- next part -------------- An HTML attachment was scrubbed... URL: From danmcd at omniti.com Tue Dec 8 12:23:30 2015 From: danmcd at omniti.com (Dan McDonald) Date: Tue, 8 Dec 2015 07:23:30 -0500 Subject: [OmniOS-discuss] Sol 11 zone on OmniOS system In-Reply-To: References: Message-ID: <7EE9651A-2AF3-436A-8E1B-1D692E00BF12@omniti.com> > On Dec 8, 2015, at 6:26 AM, Doug Hughes wrote: > > Has anybody done this or is anybody doing this? Any tips/tricks? I want to migrate a working Sol11 zone to an OmniOS system, if possible. I figure I'll have to use a special bit of 'branding' to make it go. It's going to be worse. S11 and illumos diverged going back to its beginnings. You're going to want to treat this more like a platform-to-platform move. I don't think there is much in the way of tricks to help you, CERTAINLY not with zone branding. Sorry, Dan From chip at innovates.com Tue Dec 8 16:16:42 2015 From: chip at innovates.com (Schweiss, Chip) Date: Tue, 8 Dec 2015 10:16:42 -0600 Subject: [OmniOS-discuss] NFS Server restart Message-ID: I had an NFS server become unresponsive on one of my production systems. 
The NFS server service would not restart, so out of desperation I rebooted, which fixed the problem. Before rebooting I tried restarting all NFS-related services, to no avail. The reboot probably wasn't necessary, but knowing the correct list and order of services to restart is. Can someone fill me in on which services, in what order, should be stopped/started to get NFS fully reset? Thanks! -Chip -------------- next part -------------- An HTML attachment was scrubbed... URL: From danmcd at omniti.com Tue Dec 8 17:47:35 2015 From: danmcd at omniti.com (Dan McDonald) Date: Tue, 8 Dec 2015 12:47:35 -0500 Subject: [OmniOS-discuss] NFS Server restart In-Reply-To: References: Message-ID: <8408B53E-87CA-44BB-9A6A-F767FB5DCC62@omniti.com> > On Dec 8, 2015, at 11:16 AM, Schweiss, Chip wrote: > > Can someone fill me in on which services in what order should be stopped/started to get NFS fully reset? You can start here...

shell(~)[0]% svcs -d nfs/server
STATE          STIME    FMRI
disabled       Nov_16   svc:/network/rpc/keyserv:default
online         Nov_16   svc:/milestone/network:default
online         Nov_16   svc:/network/rpc/bind:default
online         Nov_16   svc:/system/filesystem/local:default
online         Nov_16   svc:/network/shares/group:default
online         Nov_16   svc:/network/shares/group:smb
online         Nov_16   svc:/system/filesystem/reparse:default
online         Nov_16   svc:/network/nfs/nlockmgr:default
online         Nov_16   svc:/network/nfs/mapid:default
online         Nov_16   svc:/network/rpc/gss:default
online         Nov_16   svc:/network/shares/group:zfs
shell(~)[0]%

and further descend the rabbit hole if you need, but most of the intermediate NFS services all depend on rpc/bind. Dan From danmcd at omniti.com Tue Dec 8 20:09:58 2015 From: danmcd at omniti.com (Dan McDonald) Date: Tue, 8 Dec 2015 15:09:58 -0500 Subject: [OmniOS-discuss] Attention OmniOS AMI users References: <8EC8ABD5-3B59-407F-8173-899493192D15@omniti.com> Message-ID: If you are using any OmniOS AMI r151012 or earlier, please read this. If you're using r151014, you may ignore this message. 
It has come to our attention that some of the older OmniOS images, including images for r151006 and r151012, may have stored SSH host keys included with them, which could be used to execute a man in the middle attack. If you are currently running one of these older versions, we suggest you verify and regenerate your keys, and/or move to a current OmniOS AMI. For r151006 users, there is a new image named "OmniOS r151006 LTS" which should be available in your region. We recommend that users of r151012 (and any other older versions which are now ESOL) move to a current r151014 AMI. Again, the OmniOS r151014 AMIs DO NOT HAVE stored SSH host keys and are *NOT* vulnerable. Thanks and sorry for any inconvenience, Dan p.s. This is also on the AWS forums: https://forums.aws.amazon.com/thread.jspa?threadID=221330 From henson at acm.org Wed Dec 9 02:31:45 2015 From: henson at acm.org (Paul B. Henson) Date: Tue, 08 Dec 2015 18:31:45 -0800 Subject: [OmniOS-discuss] core dump while trying to import pool In-Reply-To: <317E3C4D-1AD5-4A57-95BD-B12624049595@lji.org> References: <71C5258A-C99E-44DF-BFE1-A1D5EE0CE686@omniti.com> <589C8043-C3E2-4249-99E8-AA5A35E17892@omniti.com> <20151206021738.GT3405@bender.unx.cpp.edu> <317E3C4D-1AD5-4A57-95BD-B12624049595@lji.org> Message-ID: <0d7e01d13229$bfa00bc0$3ee02340$@acm.org> > From: Michael Talbott > Sent: Saturday, December 05, 2015 10:50 PM > > I did not run a zdb check since this pool was over 200TB and figured it'd take > weeks to finish. Ah, mine is only 22TB available with a bit over 10TB in use; it took about five hours as I recall. > At any rate, a clean scrub alone is not an indicator of pool health regarding > this bug. No clue if a zdb analyses would be a more determining factor. My understanding of the particular zdb invocation I used is that it scans every block of data and metadata and verifies its checksum, so I think it should have found potential corruption, even if scrub did not. > Since I didn't zdb it first.. 
Maybe your nerves can be at more ease? Good luck > and let me know how things turn out. Perhaps not more at ease, but at least not less at ease :). Thanks much for the info. From tobi at oetiker.ch Wed Dec 9 07:05:30 2015 From: tobi at oetiker.ch (Tobias Oetiker) Date: Wed, 9 Dec 2015 08:05:30 +0100 (CET) Subject: [OmniOS-discuss] considering an SSD pool ... which SSD Message-ID: We are looking into the possibility of setting up our first SSD-based pool ... any recommendations for SSDs to use? Our System Integrator recommends the use of Intel SSDs as opposed to Samsung, since Samsung would be changing their lineup every few weeks and thus make it difficult to source replacement disks in case one should fail. Any thoughts? cheers tobi -- Tobi Oetiker, OETIKER+PARTNER AG, Aarweg 15 CH-4600 Olten, Switzerland www.oetiker.ch tobi at oetiker.ch +41 62 775 9902 From alka at hfg-gmuend.de Wed Dec 9 08:10:58 2015 From: alka at hfg-gmuend.de (Guenther Alka) Date: Wed, 9 Dec 2015 09:10:58 +0100 Subject: [OmniOS-discuss] considering an SSD pool ... which SSD In-Reply-To: References: Message-ID: <5667E212.5070000@hfg-gmuend.de> I have been using several SSD pools for years. As there is no TRIM support, I check for:
- a high-quality controller, for controller-internal garbage collection
- large overprovisioning (enterprise SSDs offer 40% or more)
- quality flash with high write endurance
- powerloss protection; optionally you can use a higher-class Slog with this feature

In my newer pools I use the Intel S3610, as they are "quite" affordable: not as expensive as the 37x0 line and with better write performance than the 35x0 line. In my last cheaper pools I chose the Sandisk Pro extreme. The Samsung Pro was an option, but the Sandisks have larger overprovisioning by default, with less write performance degradation under load. 
As they do not have powerloss protection, I added a fast Slog (S3700 or ZeusRAM). Gea On 09.12.2015 at 08:05, Tobias Oetiker wrote: > We are looking into the possibility of setting up our first SSD > based pool ... any recommendations for SSDs to use ? > > Our System Integrator recommends the use of Intel SSDs as opposed > to Samsung since Samsung would be changing their lineup every few > weeks and thus make it difficult to source replacement disks, in > case one should fail. > > any thoughts ? > > cheers > tobi > > > > > From doug at will.to Wed Dec 9 08:24:07 2015 From: doug at will.to (Doug Hughes) Date: Wed, 9 Dec 2015 13:54:07 +0530 Subject: [OmniOS-discuss] considering an SSD pool ... which SSD In-Reply-To: References: Message-ID: I like the Samsung 850 line. It has been around and quite stable for some time. We haven't had any problems with device availability. (Depending on how much writing you are doing, you'd probably best avoid the EVO and stick with the straight 850 or the 850 Pro.) On Wed, Dec 9, 2015 at 12:35 PM, Tobias Oetiker wrote: > We are looking into the possibility of setting up our first SSD > based pool ... any recommendations for SSDs to use ? > > Our System Integrator recommends the use of Intel SSDs as opposed > to Samsung since Samsung would be changing their lineup every few > weeks and thus make it difficult to source replacement disks, in > case one should fail. > > any thoughts ? > > cheers > tobi > > > > > > -- > Tobi Oetiker, OETIKER+PARTNER AG, Aarweg 15 CH-4600 Olten, Switzerland > www.oetiker.ch tobi at oetiker.ch +41 62 775 9902 > > _______________________________________________ > OmniOS-discuss mailing list > OmniOS-discuss at lists.omniti.com > http://lists.omniti.com/mailman/listinfo/omnios-discuss > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From wonko at 4amlunch.net Wed Dec 9 13:14:17 2015 From: wonko at 4amlunch.net (Brian Hechinger) Date: Wed, 9 Dec 2015 08:14:17 -0500 Subject: [OmniOS-discuss] Hung ZFS Pool Message-ID: <25D313E7-C974-43C0-817E-3A96514BCC16@4amlunch.net> So I decided to do some testing on the pool I have that is made up of a pair of Samsung 851 NVMe drives. I've got it partitioned as I'm using part of it to test as SLOG against the "spinning rust pool". Yes I know these aren't ideal for this, but they will do for now. I set up the other slices as a mirror and ran iozone against it. It wrote fast. Really fast. Then it stopped. Now the pool seems to be wedged. At first I thought it might be the drives themselves, but I see them still functioning as SLOG just fine, so I don't believe it's that.

root at basket1:/root# zpool status -v zoom
  pool: zoom
 state: ONLINE
status: One or more devices are faulted in response to IO failures.
action: Make sure the affected devices are connected, then run 'zpool clear'.
   see: http://illumos.org/msg/ZFS-8000-HC
  scan: none requested
config:

        NAME          STATE     READ WRITE CKSUM
        zoom          ONLINE       0     0     1
          mirror-0    ONLINE       0     0     6
            c4t1d0s1  ONLINE       0     0     6
            c5t1d0s1  ONLINE       0     0     6

errors: List of errors unavailable (insufficient privileges)
root at basket1:/root# ls /zoom/
iozone.DUMMY.0 iozone.DUMMY.10 iozone.DUMMY.12 iozone.DUMMY.14 iozone.DUMMY.2 iozone.DUMMY.4 iozone.DUMMY.6 iozone.DUMMY.8
iozone.DUMMY.1 iozone.DUMMY.11 iozone.DUMMY.13 iozone.DUMMY.15 iozone.DUMMY.3 iozone.DUMMY.5 iozone.DUMMY.7 iozone.DUMMY.9
root at basket1:/root# touch /zoom/hi

So read access appears to be ok. Writes are totally boned, however. That touch just hangs forever. So what do I need to do to provide you all with the information you need to diagnose this? Thanks! 
-brian From davide.poletto at gmail.com Wed Dec 9 14:02:50 2015 From: davide.poletto at gmail.com (Davide Poletto) Date: Wed, 9 Dec 2015 15:02:50 +0100 Subject: [OmniOS-discuss] Hung ZFS Pool In-Reply-To: <25D313E7-C974-43C0-817E-3A96514BCC16@4amlunch.net> References: <25D313E7-C974-43C0-817E-3A96514BCC16@4amlunch.net> Message-ID: Hi Brian, a side note: are you sure that your Samsung 851 drive (I think you're referring more specifically to the Samsung PM851 SSD Drive) supports the NVMe interface standard? I think it doesn't...at least looking at its released interface's specifications: it uses SATA 3 (6.0 Gbps) interface instead of the NVMe 1.1 used by "disks" like the Samsung PM/SM951, PM1725, XS/SM1715 or the PM/SM953...just to name some. Regards, Davide. On Wed, Dec 9, 2015 at 2:14 PM, Brian Hechinger wrote: > So I decided to do some testing on the pool I have that is made up of a > pair of Samsung 851 NVMe drives. > > I've got it partitioned as I'm using part of it to test as SLOG against > the "spinning rust pool". Yes I know these aren't ideal for this, but they > will do for now. > > I setup the other slices as a mirror and ran iozone against it. > > It wrote fast. Really fast. > > Then it stopped. > > Now the pool seems to be wedged. At first I thought it might be the drives > themselves, but I see them still functioning as SLOG just fine, so it's not > that I don't believe. > > root at basket1:/root# zpool status -v zoom > pool: zoom > state: ONLINE > status: One or more devices are faulted in response to IO failures. > action: Make sure the affected devices are connected, then run 'zpool > clear'. 
> see: http://illumos.org/msg/ZFS-8000-HC > scan: none requested > config: > > NAME STATE READ WRITE CKSUM > zoom ONLINE 0 0 1 > mirror-0 ONLINE 0 0 6 > c4t1d0s1 ONLINE 0 0 6 > c5t1d0s1 ONLINE 0 0 6 > > errors: List of errors unavailable (insufficient privileges) > root at basket1:/root# ls /zoom/ > iozone.DUMMY.0 iozone.DUMMY.10 iozone.DUMMY.12 iozone.DUMMY.14 > iozone.DUMMY.2 iozone.DUMMY.4 iozone.DUMMY.6 iozone.DUMMY.8 > iozone.DUMMY.1 iozone.DUMMY.11 iozone.DUMMY.13 iozone.DUMMY.15 > iozone.DUMMY.3 iozone.DUMMY.5 iozone.DUMMY.7 iozone.DUMMY.9 > root at basket1:/root# touch /zoom/hi > > So read access appears to be ok. Writes are totally boned, however. That > touch just hangs forever. > > So what do I need to do to provide you all with the information you need > to diagnose this. > > Thanks! > > -brian > _______________________________________________ > OmniOS-discuss mailing list > OmniOS-discuss at lists.omniti.com > http://lists.omniti.com/mailman/listinfo/omnios-discuss > -------------- next part -------------- An HTML attachment was scrubbed... URL: From wonko at 4amlunch.net Wed Dec 9 14:04:36 2015 From: wonko at 4amlunch.net (Brian Hechinger) Date: Wed, 9 Dec 2015 09:04:36 -0500 Subject: [OmniOS-discuss] Hung ZFS Pool In-Reply-To: References: <25D313E7-C974-43C0-817E-3A96514BCC16@4amlunch.net> Message-ID: <36DAE0E6-6350-4370-BF80-250628287A59@4amlunch.net> Sorry, typo-ed that. These are SM951 -brian > On Dec 9, 2015, at 9:02 AM, Davide Poletto wrote: > > Hi Brian, > > a side note: are you sure that your Samsung 851 drive (I think you're referring more specifically to the Samsung PM851 SSD Drive) supports the NVMe interface standard? > > I think it doesn't...at least looking at its released interface's specifications: it uses SATA 3 (6.0 Gbps) interface instead of the NVMe 1.1 used by "disks" like the Samsung PM/SM951, PM1725, XS/SM1715 or the PM/SM953...just to name some. > > Regards, Davide. 
> > On Wed, Dec 9, 2015 at 2:14 PM, Brian Hechinger > wrote: > So I decided to do some testing on the pool I have that is made up of a pair of Samsung 851 NVMe drives. > > I've got it partitioned as I'm using part of it to test as SLOG against the "spinning rust pool". Yes I know these aren't ideal for this, but they will do for now. > > I setup the other slices as a mirror and ran iozone against it. > > It wrote fast. Really fast. > > Then it stopped. > > Now the pool seems to be wedged. At first I thought it might be the drives themselves, but I see them still functioning as SLOG just fine, so it's not that I don't believe. > > root at basket1:/root# zpool status -v zoom > pool: zoom > state: ONLINE > status: One or more devices are faulted in response to IO failures. > action: Make sure the affected devices are connected, then run 'zpool clear'. > see: http://illumos.org/msg/ZFS-8000-HC > scan: none requested > config: > > NAME STATE READ WRITE CKSUM > zoom ONLINE 0 0 1 > mirror-0 ONLINE 0 0 6 > c4t1d0s1 ONLINE 0 0 6 > c5t1d0s1 ONLINE 0 0 6 > > errors: List of errors unavailable (insufficient privileges) > root at basket1:/root# ls /zoom/ > iozone.DUMMY.0 iozone.DUMMY.10 iozone.DUMMY.12 iozone.DUMMY.14 iozone.DUMMY.2 iozone.DUMMY.4 iozone.DUMMY.6 iozone.DUMMY.8 > iozone.DUMMY.1 iozone.DUMMY.11 iozone.DUMMY.13 iozone.DUMMY.15 iozone.DUMMY.3 iozone.DUMMY.5 iozone.DUMMY.7 iozone.DUMMY.9 > root at basket1:/root# touch /zoom/hi > > So read access appears to be ok. Writes are totally boned, however. That touch just hangs forever. > > So what do I need to do to provide you all with the information you need to diagnose this. > > Thanks! > > -brian > _______________________________________________ > OmniOS-discuss mailing list > OmniOS-discuss at lists.omniti.com > http://lists.omniti.com/mailman/listinfo/omnios-discuss > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From danmcd at omniti.com Wed Dec 9 15:00:50 2015 From: danmcd at omniti.com (Dan McDonald) Date: Wed, 9 Dec 2015 10:00:50 -0500 Subject: [OmniOS-discuss] considering an SSD pool ... which SSD In-Reply-To: References: Message-ID: <907860CD-CFAC-4F54-8B9E-3A84373E7EA9@omniti.com> > On Dec 9, 2015, at 2:05 AM, Tobias Oetiker wrote: > > Our System Integrator recommends the use of Intel SSDs as opposed We use Intel ones in-house, because they have the best reputation for wear. I suspect the other folks are catching up, but Intel had a lead. My $0.02, Dan From trey at mailchimp.com Wed Dec 9 15:13:35 2015 From: trey at mailchimp.com (Trey Palmer) Date: Wed, 9 Dec 2015 10:13:35 -0500 Subject: [OmniOS-discuss] considering an SSD pool ... which SSD In-Reply-To: References: Message-ID: Tobias, We use the Intel DC S3700. I haven't tried the S3710's yet. The S3700's are very reliable. -- Trey On Wed, Dec 9, 2015 at 2:05 AM, Tobias Oetiker wrote: > We are looking into the possibility of setting up our first SSD > based pool ... any recommendations for SSDs to use ? > > Our System Integrator recommends the use of Intel SSDs as opposed > to Samsung since Samsung would be changing their lineup every few > weeks and thus make it difficult to source replacement disks, in > case one should fail. > > any thoughts ? > > cheers > tobi > > > > > > -- > Tobi Oetiker, OETIKER+PARTNER AG, Aarweg 15 CH-4600 Olten, Switzerland > www.oetiker.ch tobi at oetiker.ch +41 62 775 9902 > > _______________________________________________ > OmniOS-discuss mailing list > OmniOS-discuss at lists.omniti.com > http://lists.omniti.com/mailman/listinfo/omnios-discuss > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From danmcd at omniti.com Wed Dec 9 15:16:03 2015 From: danmcd at omniti.com (Dan McDonald) Date: Wed, 9 Dec 2015 10:16:03 -0500 Subject: [OmniOS-discuss] Hung ZFS Pool In-Reply-To: <25D313E7-C974-43C0-817E-3A96514BCC16@4amlunch.net> References: <25D313E7-C974-43C0-817E-3A96514BCC16@4amlunch.net> Message-ID: > On Dec 9, 2015, at 8:14 AM, Brian Hechinger wrote: > > So read access appears to be ok. Writes are totally boned, however. That touch just hangs forever. > > So what do I need to do to provide you all with the information you need to diagnose this. Do you literally have a touch process hanging right now? Or is it something you can ^C out of? Does anything stand out in /var/adm/messages? Maybe the kernel is complaining about something there. My final inclination is heavy-handed: - Make sure you have at least one process stuck on writing to that filesystem. - "reboot -d" and take a kernel coredump Unless you have sensitive information, a kernel coredump you can share would be the best thing to do. Dan p.s. I'm at the Dr. the rest of the day starting in 90 mins, pardon any latency. From davide.poletto at gmail.com Wed Dec 9 15:16:13 2015 From: davide.poletto at gmail.com (Davide Poletto) Date: Wed, 9 Dec 2015 16:16:13 +0100 Subject: [OmniOS-discuss] considering an SSD pool ... which SSD In-Reply-To: <907860CD-CFAC-4F54-8B9E-3A84373E7EA9@omniti.com> References: <907860CD-CFAC-4F54-8B9E-3A84373E7EA9@omniti.com> Message-ID: Yep, eventually in evaluating which could be the best Intel SSD for you, I used information on this Wiki page (it's not directly provided by Intel but it provides an overall summary and also goes specific on each Intel SSD drives; maybe it's not so updated). Davide. On Wed, Dec 9, 2015 at 4:00 PM, Dan McDonald wrote: > > > On Dec 9, 2015, at 2:05 AM, Tobias Oetiker wrote: > > > > Our System Integrator recommends the use of Intel SSDs as opposed > > We use Intel ones in-house, because they have the best reputation for > wear. 
I suspect the other folks are catching up, but Intel had a lead. > > My $0.02, > Dan > > _______________________________________________ > OmniOS-discuss mailing list > OmniOS-discuss at lists.omniti.com > http://lists.omniti.com/mailman/listinfo/omnios-discuss > -------------- next part -------------- An HTML attachment was scrubbed... URL: From wonko at 4amlunch.net Wed Dec 9 15:20:15 2015 From: wonko at 4amlunch.net (Brian Hechinger) Date: Wed, 9 Dec 2015 10:20:15 -0500 Subject: [OmniOS-discuss] Hung ZFS Pool In-Reply-To: References: <25D313E7-C974-43C0-817E-3A96514BCC16@4amlunch.net> Message-ID: <3FF750E3-A2C5-467C-A0D2-BDCC8C48C9CA@4amlunch.net> I cannot ^C out of the touch.

wonko at basket1:/export/home/wonko$ ps -ef | grep touch
    root  2459  2447   0 08:12:09 ?           0:00 touch /zoom/hi
    root  2050  2049   0   Dec 07 ?           0:00 touch hi
    root  2049     1   0   Dec 07 ?           0:00 sudo touch hi

Also, kill -9 doesn't touch them.

The only thing in messages is:

Dec 7 14:31:56 basket1 fmd: [ID 377184 daemon.error] SUNW-MSG-ID: ZFS-8000-HC, TYPE: Error, VER: 1, SEVERITY: Major
Dec 7 14:31:56 basket1 EVENT-TIME: Mon Dec 7 14:31:56 EST 2015
Dec 7 14:31:56 basket1 PLATFORM: X8DTL, CSN: 1234567890, HOSTNAME: basket1
Dec 7 14:31:56 basket1 SOURCE: zfs-diagnosis, REV: 1.0
Dec 7 14:31:56 basket1 EVENT-ID: 585f9fa2-4a84-4184-8c87-c2f9c600e1a1
Dec 7 14:31:56 basket1 DESC: The ZFS pool has experienced currently unrecoverable I/O
Dec 7 14:31:56 basket1 failures. Refer to http://illumos.org/msg/ZFS-8000-HC for more information.
Dec 7 14:31:56 basket1 AUTO-RESPONSE: No automated response will be taken.
Dec 7 14:31:56 basket1 IMPACT: Read and write I/Os cannot be serviced.
Dec 7 14:31:56 basket1 REC-ACTION: Make sure the affected devices are connected, then run
Dec 7 14:31:56 basket1 'zpool clear'.

I can definitely share a kernel coredump, that's not a problem. Just need to schedule a time to shut down all the VMs first. Maybe later tonight. 
-brian > On Dec 9, 2015, at 10:16 AM, Dan McDonald wrote: > > >> On Dec 9, 2015, at 8:14 AM, Brian Hechinger wrote: >> >> So read access appears to be ok. Writes are totally boned, however. That touch just hangs forever. >> >> So what do I need to do to provide you all with the information you need to diagnose this. > > Do you literally have a touch process hanging right now? Or is it something you can ^C out of? > > Does anything stand out in /var/adm/messages? Maybe the kernel is complaining about something there. > > My final inclination is heavy-handed: > > - Make sure you have at least one process stuck on writing to that filesystem. > > - "reboot -d" and take a kernel coredump > > Unless you have sensitive information, a kernel coredump you can share would be the best thing to do. > > > Dan > > p.s. I'm at the Dr. the rest of the day starting in 90 mins, pardon any latency. From wonko at 4amlunch.net Wed Dec 9 15:23:50 2015 From: wonko at 4amlunch.net (Brian Hechinger) Date: Wed, 9 Dec 2015 10:23:50 -0500 Subject: [OmniOS-discuss] Hung ZFS Pool In-Reply-To: References: <25D313E7-C974-43C0-817E-3A96514BCC16@4amlunch.net> Message-ID: <81B1D1AC-A063-455F-958B-6BBCF1879BB0@4amlunch.net> Just did a "zpool clear" on that pool and now I see: errors: Permanent errors have been detected in the following files: :<0x59> > On Dec 9, 2015, at 10:16 AM, Dan McDonald wrote: > > >> On Dec 9, 2015, at 8:14 AM, Brian Hechinger wrote: >> >> So read access appears to be ok. Writes are totally boned, however. That touch just hangs forever. >> >> So what do I need to do to provide you all with the information you need to diagnose this. > > Do you literally have a touch process hanging right now? Or is it something you can ^C out of? > > Does anything stand out in /var/adm/messages? Maybe the kernel is complaining about something there. > > My final inclination is heavy-handed: > > - Make sure you have at least one process stuck on writing to that filesystem.
> > - "reboot -d" and take a kernel coredump > > Unless you have sensitive information, a kernel coredump you can share would be the best thing to do. > > > Dan > > p.s. I'm at the Dr. the rest of the day starting in 90 mins, pardon any latency. From danmcd at omniti.com Wed Dec 9 15:25:07 2015 From: danmcd at omniti.com (Dan McDonald) Date: Wed, 9 Dec 2015 10:25:07 -0500 Subject: [OmniOS-discuss] Hung ZFS Pool In-Reply-To: <3FF750E3-A2C5-467C-A0D2-BDCC8C48C9CA@4amlunch.net> References: <25D313E7-C974-43C0-817E-3A96514BCC16@4amlunch.net> <3FF750E3-A2C5-467C-A0D2-BDCC8C48C9CA@4amlunch.net> Message-ID: <7F5D451E-6467-4A3D-8785-AE069524452A@omniti.com> > On Dec 9, 2015, at 10:20 AM, Brian Hechinger wrote: > > I cannot ^C out of the touch. > > wonko at basket1:/export/home/wonko$ ps -ef | grep touch You do know about pgrep(1), right? :) > Also, kill -9 doesn't touch them. Okay! This means something in-kernel is locking them up. More reason for a coredump. > the only thing in messages is: > > Dec 7 14:31:56 basket1 fmd: [ID 377184 daemon.error] SUNW-MSG-ID: ZFS-8000-HC, TYPE: Error, VER: 1, SEVERITY: Major > Dec 7 14:31:56 basket1 EVENT-TIME: Mon Dec 7 14:31:56 EST 2015 > Dec 7 14:31:56 basket1 PLATFORM: X8DTL, CSN: 1234567890, HOSTNAME: basket1 > Dec 7 14:31:56 basket1 SOURCE: zfs-diagnosis, REV: 1.0 > Dec 7 14:31:56 basket1 EVENT-ID: 585f9fa2-4a84-4184-8c87-c2f9c600e1a1 > Dec 7 14:31:56 basket1 DESC: The ZFS pool has experienced currently unrecoverable I/O > Dec 7 14:31:56 basket1 failures. Refer to http://illumos.org/msg/ZFS-8000-HC for more information. > Dec 7 14:31:56 basket1 AUTO-RESPONSE: No automated response will be taken. > Dec 7 14:31:56 basket1 IMPACT: Read and write I/Os cannot be serviced. > Dec 7 14:31:56 basket1 REC-ACTION: Make sure the affected devices are connected, then run > Dec 7 14:31:56 basket1 'zpool clear'. You sure there's nothing before the FMA complaints? It might be one line, but it may be enough to show something.
> I can definitely share a kernel coredump, that's not a problem. Just need to schedule a time to shut down all the VMs first. Take your time, do it on your schedule, that's fine. So I know where to put it: Which OmniOS release are you running? head /etc/release ; uname -a Thanks, Dan From rjahnel at ellipseinc.com Wed Dec 9 15:36:10 2015 From: rjahnel at ellipseinc.com (Richard Jahnel) Date: Wed, 9 Dec 2015 15:36:10 +0000 Subject: [OmniOS-discuss] considering an SSD pool ... which SSD In-Reply-To: References: Message-ID: <65DC5816D4BEE043885A89FD54E273FC6CF687EA@MAIL101.Ellipseinc.com> We have successfully used in order: For the raidz2/3 vdevs Patriot Torqx 2010 Crucial C300 2011 Samsung 840 Pros 2013 to current. Samsung 850 Pros are in testing now. If we could afford them we would prefer to use the Intel S3710 drives, but we can only afford enough of those for slogs. For slog various Intel drives over the years. No L2 cache is needed for SSD pools. Key for us has been using double and more recently triple parity pools. We have found in the last 5 years that as SSDs age they will take longer to ready up new pages for writes. Eventually they will start taking too long and falling out of the pool. Usually doing a zpool clear will place them back online and be your sign that's it's time to start working on either replacing the entire pool or at least the drives one by one as they drop out more than once or twice. -----Original Message----- From: OmniOS-discuss [mailto:omnios-discuss-bounces at lists.omniti.com] On Behalf Of Tobias Oetiker Sent: Wednesday, December 09, 2015 1:06 AM To: omnios-discuss at lists.omniti.com Subject: [OmniOS-discuss] considering an SSD pool ... which SSD We are looking into the possibility of setting up our first SSD based pool ... any recommendations for SSDs to use ?
Our System Integrator recommends the use of Intel SSDs as opposed to Samsung since Samsung would be changing their lineup every few weeks and thus make it difficult to source replacement disks, in case one should fail. any thoughts ? cheers tobi -- Tobi Oetiker, OETIKER+PARTNER AG, Aarweg 15 CH-4600 Olten, Switzerland www.oetiker.ch tobi at oetiker.ch +41 62 775 9902 _______________________________________________ OmniOS-discuss mailing list OmniOS-discuss at lists.omniti.com http://lists.omniti.com/mailman/listinfo/omnios-discuss ________________________________ The content of this e-mail (including any attachments) is strictly confidential and may be commercially sensitive. If you are not, or believe you may not be, the intended recipient, please advise the sender immediately by return e-mail, delete this e-mail and destroy any copies. From bfriesen at simple.dallas.tx.us Wed Dec 9 15:51:25 2015 From: bfriesen at simple.dallas.tx.us (Bob Friesenhahn) Date: Wed, 9 Dec 2015 09:51:25 -0600 (CST) Subject: [OmniOS-discuss] considering an SSD pool ... which SSD In-Reply-To: References: Message-ID: On Wed, 9 Dec 2015, Trey Palmer wrote: > Tobias, > We use the Intel DC S3700. > > I haven't tried the S3710's yet. The S3700's are very reliable. I am using 6 S3710s for the main pool store (in raidz2), with no dedicated ZIL device. No problems yet with OmniOS. It is an expensive solution.
Bob -- Bob Friesenhahn bfriesen at simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/ GraphicsMagick Maintainer, http://www.GraphicsMagick.org/ From wonko at 4amlunch.net Wed Dec 9 16:13:11 2015 From: wonko at 4amlunch.net (Brian Hechinger) Date: Wed, 9 Dec 2015 11:13:11 -0500 Subject: [OmniOS-discuss] Hung ZFS Pool In-Reply-To: <7F5D451E-6467-4A3D-8785-AE069524452A@omniti.com> References: <25D313E7-C974-43C0-817E-3A96514BCC16@4amlunch.net> <3FF750E3-A2C5-467C-A0D2-BDCC8C48C9CA@4amlunch.net> <7F5D451E-6467-4A3D-8785-AE069524452A@omniti.com> Message-ID: I didn't know about pgrep, no. :) So the "zpool clear" has fixed things a bit. The touch processes have all exited. I can now touch a file on that pool. A zpool scrub later and this is the status: pool: zoom state: ONLINE status: One or more devices has experienced an unrecoverable error. An attempt was made to correct the error. Applications are unaffected. action: Determine if the device needs to be replaced, and clear the errors using 'zpool clear' or replace the device with 'zpool replace'. see: http://illumos.org/msg/ZFS-8000-9P scan: scrub repaired 6K in 0h0m with 0 errors on Wed Dec 9 10:25:33 2015 config: NAME STATE READ WRITE CKSUM zoom ONLINE 0 0 0 mirror-0 ONLINE 0 0 0 c4t1d0s1 ONLINE 0 0 0 c5t1d0s1 ONLINE 0 0 2 errors: No known data errors I'm going to try to re-run iozone later and see if I can't get it to happen again. This is concerning. The previous entry in messages is 4 days prior talking about ntpd. -brian > On Dec 9, 2015, at 10:25 AM, Dan McDonald wrote: > > >> On Dec 9, 2015, at 10:20 AM, Brian Hechinger wrote: >> >> I cannot ^C out of the touch. >> >> wonko at basket1:/export/home/wonko$ ps -ef | grep touch > > You do know about pgrep(1), right? :) > >> Also, kill -9 doesn't touch them. > > Okay! This means something in-kernel is locking them up. More reason for a coredump.
> >> the only thing in messages is: >> >> Dec 7 14:31:56 basket1 fmd: [ID 377184 daemon.error] SUNW-MSG-ID: ZFS-8000-HC, TYPE: Error, VER: 1, SEVERITY: Major >> Dec 7 14:31:56 basket1 EVENT-TIME: Mon Dec 7 14:31:56 EST 2015 >> Dec 7 14:31:56 basket1 PLATFORM: X8DTL, CSN: 1234567890, HOSTNAME: basket1 >> Dec 7 14:31:56 basket1 SOURCE: zfs-diagnosis, REV: 1.0 >> Dec 7 14:31:56 basket1 EVENT-ID: 585f9fa2-4a84-4184-8c87-c2f9c600e1a1 >> Dec 7 14:31:56 basket1 DESC: The ZFS pool has experienced currently unrecoverable I/O >> Dec 7 14:31:56 basket1 failures. Refer to http://illumos.org/msg/ZFS-8000-HC for more information. >> Dec 7 14:31:56 basket1 AUTO-RESPONSE: No automated response will be taken. >> Dec 7 14:31:56 basket1 IMPACT: Read and write I/Os cannot be serviced. >> Dec 7 14:31:56 basket1 REC-ACTION: Make sure the affected devices are connected, then run >> Dec 7 14:31:56 basket1 'zpool clear'. > > You sure there's nothing before the FMA complaints? It might be one line, but it may be enough to show something. > >> I can definitely share a kernel coredump, that's not a problem. Just need to schedule a time to shut down all the VMs first. > > Take your time, do it on your schedule, that's fine. > > So I know where to put it: Which OmniOS release are you running? > > head /etc/release ; uname -a > > Thanks, > Dan > From danmcd at omniti.com Wed Dec 9 16:17:38 2015 From: danmcd at omniti.com (Dan McDonald) Date: Wed, 9 Dec 2015 11:17:38 -0500 Subject: [OmniOS-discuss] Hung ZFS Pool In-Reply-To: References: <25D313E7-C974-43C0-817E-3A96514BCC16@4amlunch.net> <3FF750E3-A2C5-467C-A0D2-BDCC8C48C9CA@4amlunch.net> <7F5D451E-6467-4A3D-8785-AE069524452A@omniti.com> Message-ID: <4B858828-C823-4251-84A9-417028B01B3C@omniti.com> > On Dec 9, 2015, at 11:13 AM, Brian Hechinger wrote: > > I didn't know about pgrep, no. :) The Solaris/illumos ptools are a huge win. Learn about 'em. :) Back to the main discussion... > So the "zpool clear" has fixed things a bit.
The touch processes have all exited. > > I can now touch a file on that pool. > > A zpool scrub later and this is the status: > > pool: zoom > state: ONLINE > status: One or more devices has experienced an unrecoverable error. An > attempt was made to correct the error. Applications are unaffected. > action: Determine if the device needs to be replaced, and clear the errors > using 'zpool clear' or replace the device with 'zpool replace'. > see: http://illumos.org/msg/ZFS-8000-9P > scan: scrub repaired 6K in 0h0m with 0 errors on Wed Dec 9 10:25:33 2015 > config: > > NAME STATE READ WRITE CKSUM > zoom ONLINE 0 0 0 > mirror-0 ONLINE 0 0 0 > c4t1d0s1 ONLINE 0 0 0 > c5t1d0s1 ONLINE 0 0 2 > > errors: No known data errors > > I'm going to try to re-run iozone later and see if I can't get it to happen again. > > This is concerning. I see this, and I think "c5t1d0" is broken HW and needs to be replaced. Combine that with "unrecoverable IO failures" and you really should be planning to replace that drive. Dan From wonko at 4amlunch.net Wed Dec 9 16:18:21 2015 From: wonko at 4amlunch.net (Brian Hechinger) Date: Wed, 9 Dec 2015 11:18:21 -0500 Subject: [OmniOS-discuss] Hung ZFS Pool In-Reply-To: <4B858828-C823-4251-84A9-417028B01B3C@omniti.com> References: <25D313E7-C974-43C0-817E-3A96514BCC16@4amlunch.net> <3FF750E3-A2C5-467C-A0D2-BDCC8C48C9CA@4amlunch.net> <7F5D451E-6467-4A3D-8785-AE069524452A@omniti.com> <4B858828-C823-4251-84A9-417028B01B3C@omniti.com> Message-ID: <584980F4-502A-4700-A58F-E720CB398BF0@4amlunch.net> It's brand new!! -brian > On Dec 9, 2015, at 11:17 AM, Dan McDonald wrote: > > >> On Dec 9, 2015, at 11:13 AM, Brian Hechinger wrote: >> >> I didn't know about pgrep, no. :) > > The Solaris/illumos ptools are a huge win. Learn about 'em. :) > > Back to the main discussion... > >> So the "zpool clear" has fixed things a bit. The touch processes have all exited. >> >> I can now touch a file on that pool.
>> >> A zpool scrub later and this is the status: >> >> pool: zoom >> state: ONLINE >> status: One or more devices has experienced an unrecoverable error. An >> attempt was made to correct the error. Applications are unaffected. >> action: Determine if the device needs to be replaced, and clear the errors >> using 'zpool clear' or replace the device with 'zpool replace'. >> see: http://illumos.org/msg/ZFS-8000-9P >> scan: scrub repaired 6K in 0h0m with 0 errors on Wed Dec 9 10:25:33 2015 >> config: >> >> NAME STATE READ WRITE CKSUM >> zoom ONLINE 0 0 0 >> mirror-0 ONLINE 0 0 0 >> c4t1d0s1 ONLINE 0 0 0 >> c5t1d0s1 ONLINE 0 0 2 >> >> errors: No known data errors >> >> I'm going to try to re-run iozone later and see if I can't get it to happen again. >> >> This is concerning. > > I see this, and I think "c5t1d0" is broken HW and needs to be replaced. > > Combine that with "unrecoverable IO failures" and you really should be planning to replace that drive. > > Dan > From wonko at 4amlunch.net Wed Dec 9 16:21:01 2015 From: wonko at 4amlunch.net (Brian Hechinger) Date: Wed, 9 Dec 2015 11:21:01 -0500 Subject: [OmniOS-discuss] Hung ZFS Pool In-Reply-To: <4B858828-C823-4251-84A9-417028B01B3C@omniti.com> References: <25D313E7-C974-43C0-817E-3A96514BCC16@4amlunch.net> <3FF750E3-A2C5-467C-A0D2-BDCC8C48C9CA@4amlunch.net> <7F5D451E-6467-4A3D-8785-AE069524452A@omniti.com> <4B858828-C823-4251-84A9-417028B01B3C@omniti.com> Message-ID: Also, I would expect the other slice to be affected as well? It's been humming along just fine as SLOG with no errors: logs mirror-3 ONLINE 0 0 0 c4t1d0s0 ONLINE 0 0 0 c5t1d0s0 ONLINE 0 0 0 > On Dec 9, 2015, at 11:17 AM, Dan McDonald wrote: > > >> On Dec 9, 2015, at 11:13 AM, Brian Hechinger wrote: >> >> I didn't know about pgrep, no. :) > > The Solaris/illumos ptools are a huge win. Learn about 'em. :) > > Back to the main discussion... > >> So the "zpool clear" has fixed things a bit. The touch processes have all exited.
>> >> I can now touch a file on that pool. >> >> A zpool scrub later and this is the status: >> >> pool: zoom >> state: ONLINE >> status: One or more devices has experienced an unrecoverable error. An >> attempt was made to correct the error. Applications are unaffected. >> action: Determine if the device needs to be replaced, and clear the errors >> using 'zpool clear' or replace the device with 'zpool replace'. >> see: http://illumos.org/msg/ZFS-8000-9P >> scan: scrub repaired 6K in 0h0m with 0 errors on Wed Dec 9 10:25:33 2015 >> config: >> >> NAME STATE READ WRITE CKSUM >> zoom ONLINE 0 0 0 >> mirror-0 ONLINE 0 0 0 >> c4t1d0s1 ONLINE 0 0 0 >> c5t1d0s1 ONLINE 0 0 2 >> >> errors: No known data errors >> >> I'm going to try to re-run iozone later and see if I can't get it to happen again. >> >> This is concerning. > > I see this, and I think "c5t1d0" is broken HW and needs to be replaced. > > Combine that with "unrecoverable IO failures" and you really should be planning to replace that drive. > > Dan > From danmcd at omniti.com Wed Dec 9 16:22:21 2015 From: danmcd at omniti.com (Dan McDonald) Date: Wed, 9 Dec 2015 11:22:21 -0500 Subject: [OmniOS-discuss] Hung ZFS Pool In-Reply-To: <584980F4-502A-4700-A58F-E720CB398BF0@4amlunch.net> References: <25D313E7-C974-43C0-817E-3A96514BCC16@4amlunch.net> <3FF750E3-A2C5-467C-A0D2-BDCC8C48C9CA@4amlunch.net> <7F5D451E-6467-4A3D-8785-AE069524452A@omniti.com> <4B858828-C823-4251-84A9-417028B01B3C@omniti.com> <584980F4-502A-4700-A58F-E720CB398BF0@4amlunch.net> Message-ID: <4B0CFB00-2181-4E38-B0E1-8AAAA3E6136C@omniti.com> > On Dec 9, 2015, at 11:18 AM, Brian Hechinger wrote: > > It's brand new!! Sometimes you get flaky HW that's new. I've had to return new spinning-rust disks, for example. > Also, I would expect the other slice to be affected as well?
It's been humming along just fine as SLOG with no errors: > > logs > mirror-3 ONLINE 0 0 0 > c4t1d0s0 ONLINE 0 0 0 > c5t1d0s0 ONLINE 0 0 0 Could just be bad luck your slog hasn't encountered the bad portion of this drive. Also, what OmniOS revision are you running? If you're not up to the latest November r151014 update, you may be missing some NVMe fixes. Dan From wonko at 4amlunch.net Wed Dec 9 16:27:25 2015 From: wonko at 4amlunch.net (Brian Hechinger) Date: Wed, 9 Dec 2015 11:27:25 -0500 Subject: [OmniOS-discuss] Hung ZFS Pool In-Reply-To: <4B0CFB00-2181-4E38-B0E1-8AAAA3E6136C@omniti.com> References: <25D313E7-C974-43C0-817E-3A96514BCC16@4amlunch.net> <3FF750E3-A2C5-467C-A0D2-BDCC8C48C9CA@4amlunch.net> <7F5D451E-6467-4A3D-8785-AE069524452A@omniti.com> <4B858828-C823-4251-84A9-417028B01B3C@omniti.com> <584980F4-502A-4700-A58F-E720CB398BF0@4amlunch.net> <4B0CFB00-2181-4E38-B0E1-8AAAA3E6136C@omniti.com> Message-ID: <7D06CC38-9841-4189-80CD-6341E025B10C@4amlunch.net> > On Dec 9, 2015, at 11:22 AM, Dan McDonald wrote: > > >> On Dec 9, 2015, at 11:18 AM, Brian Hechinger wrote: >> >> It's brand new!! > > Sometimes you get flaky HW that's new. I've had to return new spinning-rust disks, for example. Bah. :( > >> Also, I would expect the other slice to be affected as well? It's been humming along just fine as SLOG with no errors: >> >> logs >> mirror-3 ONLINE 0 0 0 >> c4t1d0s0 ONLINE 0 0 0 >> c5t1d0s0 ONLINE 0 0 0 > > Could just be bad luck your slog hasn't encountered the bad portion of this drive. I suppose. You think there is maybe a good way to test this device before I try to get it RMA-ed? > Also, what OmniOS revision are you running? If you're not up to the latest November r151014 update, you may be missing some NVMe fixes. Oh right, totally forgot to do that for you: wonko at basket1:/var/adm$ head /etc/release ; uname -a OmniOS v11 r151016 Copyright 2015 OmniTI Computer Consulting, Inc. All rights reserved. Use is subject to license terms.
SunOS basket1 5.11 omnios-073d8c0 i86pc i386 i86pc From nsmith at careyweb.com Wed Dec 9 17:24:39 2015 From: nsmith at careyweb.com (Nate Smith) Date: Wed, 09 Dec 2015 12:24:39 -0500 Subject: [OmniOS-discuss] considering an SSD pool ... which SSD Message-ID: <2095707765-3740@mail.careyweb.com> I love the Intel 730s nice mix of price/server level reliability. -- Nate Smith Sr. Network /Systems Analyst Carey Color Inc. 6835 Ridge Rd./PO Box 609 Sharon Center, OH 44274 330-239-1835 -----Original Message----- From: Dan McDonald [danmcd at omniti.com] Received: Wednesday, 09 Dec 2015, 10:02AM To: Tobias Oetiker [tobi at oetiker.ch] CC: omnios-discuss [omnios-discuss at lists.omniti.com] Subject: Re: [OmniOS-discuss] considering an SSD pool ... which SSD > On Dec 9, 2015, at 2:05 AM, Tobias Oetiker wrote: > > Our System Integrator recommends the use of Intel SSDs as opposed We use Intel ones in-house, because they have the best reputation for wear. I suspect the other folks are catching up, but Intel had a lead. My $0.02, Dan _______________________________________________ OmniOS-discuss mailing list OmniOS-discuss at lists.omniti.com http://lists.omniti.com/mailman/listinfo/omnios-discuss From tom.robinson at motec.com.au Wed Dec 9 22:24:23 2015 From: tom.robinson at motec.com.au (Tom Robinson) Date: Thu, 10 Dec 2015 09:24:23 +1100 Subject: [OmniOS-discuss] LUN (in)visibility Message-ID: <5668AA17.6050105@motec.com.au> OmniOS v11 r151012 Hi, We are using iSCSI over 10G ethernet and Infiniband to connect our storage (OmniOS) to our virtual infrastructure (KVM/ESXi). Our KVM host is using multipath and is configured to prefer Infiniband. We also run an ESXi host (in legacy - we are changing to KVM) with a similar path selection configuration. A few weeks ago Infiniband failed to connect (although we couldn't see any reason why) but thankfully all the initiators failed over to iSCSI on both KVM and ESXi.
The iSCSI connection seemed to be stable and we could see all existing LUNs offered by the OmniOS server. Since then I've been working on solving the Infiniband issue but couldn't find any obvious problems with the configuration. In doing so I have rebooted both KVM and ESXi multiple times. The Infiniband fabric, however, didn't come back. A couple of days ago I needed new storage for the virtual infrastructure but had an issue with COMSTAR LUN visibility on the initiator end. I created two new target LUNs for iSCSI and the COMSTAR end looked perfect but neither the KVM nor ESXi host picked them up on rescan (as they usually would). At that time I had 68 active LUNs which were seen and active by the KVM and ESXi hosts but the two new ones weren't appearing. I've done the procedure for adding new targets multiple times before and it's always worked so this stumped me. Yesterday I shut down all our infrastructure and started OmniOS first, then KVM and ESXi. It's all working again. The two new targets came up along with all of the others, Infiniband and iSCSI are working. So great, but now I'm thinking I have an unknown instability issue in the storage system. Has anyone seen this behaviour before? Where can I begin to look for the cause of the issue? Kind regards, Tom -- Tom Robinson IT Manager/System Administrator MoTeC Pty Ltd 121 Merrindale Drive Croydon South 3136 Victoria Australia T: +61 3 9761 5050 F: +61 3 9761 5051 E: tom.robinson at motec.com.au -------------- next part -------------- A non-text attachment was scrubbed...
Name: signature.asc Type: application/pgp-signature Size: 198 bytes Desc: OpenPGP digital signature URL: From danmcd at omniti.com Wed Dec 9 23:00:59 2015 From: danmcd at omniti.com (Dan McDonald) Date: Wed, 9 Dec 2015 18:00:59 -0500 Subject: [OmniOS-discuss] LUN (in)visibility In-Reply-To: <5668AA17.6050105@motec.com.au> References: <5668AA17.6050105@motec.com.au> Message-ID: <9F96534F-D20A-4AF8-BCF2-84D62402635B@omniti.com> > On Dec 9, 2015, at 5:24 PM, Tom Robinson wrote: > > OmniOS v11 r151012 First off, OmniOS r151012 has reached end of service life. You should upgrade to at least r151014 (the current LTS) or r151016 (the current Stable). > > Yesterday I shutdown all our infrastructure and started OmniOS first, then KVM and ESXi. It's all > working again. The two new targets came up along with all of the others, Infinband and iSCSI are > working. > > So great, but now I'm thinking I have an unknown instability issue in the storage system. I'd recommend first getting up to date with either r151014 or r151016. From there people can figure things out a little easier. Dan From tom.robinson at motec.com.au Wed Dec 9 23:41:29 2015 From: tom.robinson at motec.com.au (Tom Robinson) Date: Thu, 10 Dec 2015 10:41:29 +1100 Subject: [OmniOS-discuss] LUN (in)visibility In-Reply-To: <9F96534F-D20A-4AF8-BCF2-84D62402635B@omniti.com> References: <5668AA17.6050105@motec.com.au> <9F96534F-D20A-4AF8-BCF2-84D62402635B@omniti.com> Message-ID: <5668BC29.8090803@motec.com.au> On 10/12/15 10:00, Dan McDonald wrote: > >> On Dec 9, 2015, at 5:24 PM, Tom Robinson wrote: >> >> OmniOS v11 r151012 > > First off, OmniOS r151012 has reached end of service life. You should upgrade to at least r151014 (the current LTS) or r151016 (the current Stable). Yes, I was looking at either r151014 or r151016 yesterday. We will plan to do that upgrade. Is there an announce list as I was unaware that r151012 had reached end of service life. 
> > I'd recommend first getting up to date with either r151014 or r151016. From there people can figure things out a little easier. I appreciate that we should be moving onto a supported platform but it would be good to know where I would even start to look for reasons why this is happening. Kind regards, Tom -- Tom Robinson IT Manager/System Administrator MoTeC Pty Ltd 121 Merrindale Drive Croydon South 3136 Victoria Australia T: +61 3 9761 5050 F: +61 3 9761 5051 E: tom.robinson at motec.com.au -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 198 bytes Desc: OpenPGP digital signature URL: From danmcd at omniti.com Thu Dec 10 00:06:53 2015 From: danmcd at omniti.com (Dan McDonald) Date: Wed, 9 Dec 2015 19:06:53 -0500 Subject: [OmniOS-discuss] LUN (in)visibility In-Reply-To: <5668BC29.8090803@motec.com.au> References: <5668AA17.6050105@motec.com.au> <9F96534F-D20A-4AF8-BCF2-84D62402635B@omniti.com> <5668BC29.8090803@motec.com.au> Message-ID: > On Dec 9, 2015, at 6:41 PM, Tom Robinson wrote: > > Yes, I was looking at either r151014 or r151016 yesterday. We will plan to do that upgrade. Is there > an announce list as I was unaware that r151012 had reached end of service life. Our release cycle is documented: http://omnios.omniti.com/wiki.php/ReleaseCycle and on the omnios-discuss list, I announce EOSLs alongside new releases. >> >> I'd recommend first getting up to date with either r151014 or r151016. From there people can figure things out a little easier. > > I appreciate that we should be moving onto a supported platform but it would be good to know where I > would even start to look for reasons why this is happening. I'd be digging into the source code. COMSTAR is a bit brittle sometimes. 
One other thing you can do is restart COMSTAR: svcadm restart stmf Other people have recently reported that one may also need to restart the iSCSI target as well: svcadm restart iscsi/target And if you feel you need to start both, disable them, then re-enable them: svcadm disable -st stmf iscsi/target ; svcadm enable stmf iscsi/target That may kick things around enough without you rebooting everything on your storage box. Dan From jdg117 at elvis.arl.psu.edu Thu Dec 10 00:55:15 2015 From: jdg117 at elvis.arl.psu.edu (John D Groenveld) Date: Wed, 09 Dec 2015 19:55:15 -0500 Subject: [OmniOS-discuss] LUN (in)visibility In-Reply-To: Your message of "Wed, 09 Dec 2015 19:06:53 EST." References: <5668AA17.6050105@motec.com.au> <9F96534F-D20A-4AF8-BCF2-84D62402635B@omniti.com> <5668BC29.8090803@motec.com.au> Message-ID: <201512100055.tBA0tF1a001552@elvis.arl.psu.edu> In message , Dan McDonald writes: >and on the omnios-discuss list, I announce EOSLs alongside new releases. Perhaps more trouble than it's worth right now, but... as usage grows and M/L volume increases, I hope you'll consider an omnios-announce moderated list for releases and other notices. John groenveld at acm.org From tom.robinson at motec.com.au Thu Dec 10 01:18:51 2015 From: tom.robinson at motec.com.au (Tom Robinson) Date: Thu, 10 Dec 2015 12:18:51 +1100 Subject: [OmniOS-discuss] LUN (in)visibility In-Reply-To: <201512100055.tBA0tF1a001552@elvis.arl.psu.edu> References: <5668AA17.6050105@motec.com.au> <9F96534F-D20A-4AF8-BCF2-84D62402635B@omniti.com> <5668BC29.8090803@motec.com.au> <201512100055.tBA0tF1a001552@elvis.arl.psu.edu> Message-ID: <5668D2FB.8040506@motec.com.au> On 10/12/15 11:55, John D Groenveld wrote: > as usage grows and M/L volume increases, I hope you'll > consider an omnios-announce moderated list for releases > and other notices. > I'll second that.
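For reference, the disable-then-enable sequence Dan McDonald suggests earlier in this thread could be wrapped in a small shell function. This is only a sketch under stated assumptions: the function name restart_comstar and the SVCADM override variable are hypothetical and not part of OmniOS; the svcadm arguments are the ones quoted in his message.

```shell
#!/bin/sh
# Hypothetical helper (name is an assumption, not an OmniOS tool).
# SVCADM can be overridden, e.g. SVCADM=echo, to dry-run the sequence
# on a machine that has no svcadm.
SVCADM=${SVCADM:-svcadm}

restart_comstar() {
    # Temporarily (-t) and synchronously (-s) disable both services,
    # exactly as in the quoted command; stop if that step fails.
    "$SVCADM" disable -st stmf iscsi/target || return 1
    # Re-enable the COMSTAR framework and the iSCSI target.
    "$SVCADM" enable stmf iscsi/target
}
```

Running restart_comstar as root on the storage host would issue the two svcadm calls in order; setting SVCADM=echo first only prints them.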
-- Tom Robinson IT Manager/System Administrator MoTeC Pty Ltd 121 Merrindale Drive Croydon South 3136 Victoria Australia T: +61 3 9761 5050 F: +61 3 9761 5051 E: tom.robinson at motec.com.au -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 198 bytes Desc: OpenPGP digital signature URL: From johan.kragsterman at capvert.se Thu Dec 10 07:55:28 2015 From: johan.kragsterman at capvert.se (Johan Kragsterman) Date: Thu, 10 Dec 2015 08:55:28 +0100 Subject: [OmniOS-discuss] Ang: Re: LUN (in)visibility In-Reply-To: <5668BC29.8090803@motec.com.au> References: <5668BC29.8090803@motec.com.au>, <5668AA17.6050105@motec.com.au> <9F96534F-D20A-4AF8-BCF2-84D62402635B@omniti.com> Message-ID: Hi! -----"OmniOS-discuss" wrote: ----- To: Dan McDonald From: Tom Robinson Sent by: "OmniOS-discuss" Date: 2015-12-10 00:43 Cc: omnios-discuss Subject: Re: [OmniOS-discuss] LUN (in)visibility On 10/12/15 10:00, Dan McDonald wrote: > >> On Dec 9, 2015, at 5:24 PM, Tom Robinson wrote: >> >> OmniOS v11 r151012 > > First off, OmniOS r151012 has reached end of service life. You should upgrade to at least r151014 (the current LTS) or r151016 (the current Stable). Yes, I was looking at either r151014 or r151016 yesterday. We will plan to do that upgrade. Is there an announce list as I was unaware that r151012 had reached end of service life. > > I'd recommend first getting up to date with either r151014 or r151016. From there people can figure things out a little easier. I appreciate that we should be moving onto a supported platform but it would be good to know where I would even start to look for reasons why this is happening. Kind regards, Tom You say "infiniband". Do you mean SRP? Where do you have your subnet manager? In the IB switch? If so, did you check the switch SM logs? I suppose you checked the data links? dladm show-link? What exactly did you check? How about multipath?
How many paths did/do you have to each LUN? I know there was a discussion about too many paths to a LUN earlier on this list. That was fibre channel, though. I can't really comment on iSCSI since I never use it... Rgrds Johan -- Tom Robinson IT Manager/System Administrator MoTeC Pty Ltd 121 Merrindale Drive Croydon South 3136 Victoria Australia T: +61 3 9761 5050 F: +61 3 9761 5051 E: tom.robinson at motec.com.au _______________________________________________ OmniOS-discuss mailing list OmniOS-discuss at lists.omniti.com http://lists.omniti.com/mailman/listinfo/omnios-discuss [attachment "signature.asc" removed by Johan Kragsterman/Capvert] From tobi at oetiker.ch Thu Dec 10 12:58:42 2015 From: tobi at oetiker.ch (Tobias Oetiker) Date: Thu, 10 Dec 2015 13:58:42 +0100 (CET) Subject: [OmniOS-discuss] Samsung SM863 Message-ID: Just found that samsung now has an ssd with power loss protection http://www.storagereview.com/samsung_sm863_ssd_review what do you think ? cheers tobi -- Tobi Oetiker, OETIKER+PARTNER AG, Aarweg 15 CH-4600 Olten, Switzerland www.oetiker.ch tobi at oetiker.ch +41 62 775 9902 From daleg at omniti.com Thu Dec 10 15:54:12 2015 From: daleg at omniti.com (Dale Ghent) Date: Thu, 10 Dec 2015 10:54:12 -0500 Subject: [OmniOS-discuss] LUN (in)visibility In-Reply-To: References: <5668AA17.6050105@motec.com.au> <9F96534F-D20A-4AF8-BCF2-84D62402635B@omniti.com> <5668BC29.8090803@motec.com.au> Message-ID: > On Dec 9, 2015, at 7:06 PM, Dan McDonald wrote: > > >> On Dec 9, 2015, at 6:41 PM, Tom Robinson wrote: >> >> Yes, I was looking at either r151014 or r151016 yesterday. We will plan to do that upgrade. Is there >> an announce list as I was unaware that r151012 had reached end of service life. > > Our release cycle is documented: > > http://omnios.omniti.com/wiki.php/ReleaseCycle > > and on the omnios-discuss list, I announce EOSLs alongside new releases. Yeah, the general rule is that there's a 1 year shelf life for Stable releases.
We certainly don't mind people running Stables (I do so myself) as long as that is kept in mind. /dale -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 455 bytes Desc: Message signed with OpenPGP using GPGMail URL: From davide.poletto at gmail.com Thu Dec 10 16:57:46 2015 From: davide.poletto at gmail.com (Davide Poletto) Date: Thu, 10 Dec 2015 17:57:46 +0100 Subject: [OmniOS-discuss] illumos and contributions metrics: how to evaluate companies that commercialize illumos based products by examining them in the light of their illumos community's contributions. In-Reply-To: <5665B00E.70307@kateley.com> References: <5665B00E.70307@kateley.com> Message-ID: Thanks Dan and Linda for your answers (thanks Linda for the useful link!): mine was just sane curiosity and, probably, not a real/relevant problem, at least, nothing blocking my evaluating activity...it's just like a slow bee flying around my head...especially considering that, as Dan said, community member's energy is used to keep the "machinery" running and in good health! Kind regards, Davide. P.S. slightly OT: I can't understand the consequences of Dan's statement: "Making things MORE complicated is that "illumos" as a brand is still tightly tied up by its owner."...what does it mean? On Mon, Dec 7, 2015 at 5:13 PM, Linda Kateley wrote: > Blackduck does this for you. 
> > https://www.openhub.net/p?ref=homepage&query=illumos > > > On 12/7/15 7:44 AM, Dan McDonald wrote: > >> On Dec 7, 2015, at 8:13 AM, Davide Poletto >>> wrote: >>> >>> Is there a way to rank/evaluate and so reward/honour (by, as example, >>> purchasing their products or by sustaining their development as >>> testers/free-time contributors) those {individuals, companies, >>> institutions} that clearly demonstrate not only to have good numbers >>> (commits) but also that they care about the community and that are more >>> transparent than others in advertising their commercial offer's origin? >>> >> That's a damned good question. It's also very tricky. >> >> Some firms keep things closed until they've released, or after some time >> after they've released. Some find this fair enough, others find it >> annoying. Because people are different, it may be hard to get a consensus >> on how to rank/evaluate firms the way you wish. BTW, I lean toward "fair >> enough" so long as there's consistency and not going back on one's word. >> Keeping to one's word is important to me. I didn't leave Oracle because >> of the Solaris-closing: if you read the text of that leaked email, it >> implied a source-dump-on-release model. Only after I left Oracle did it >> become clear that it was all a big lie. >> >> You're chasing a hard problem. You may not get much sympathy. Making >> things MORE complicated is that "illumos" as a brand is still tightly tied >> up by its owner. Many feel that it's tied up too tightly, and that is why >> you rarely see "illumos" mentioned in marketing materials, especially not >> the trademarked symbol. >> >> I'm sorry I don't have better answers for you right now. It's a hard >> problem, and many of us who might be able to help clarify things are trying >> to keep all of the machinery moving as smoothly as we can. 
>> >> Dan >> >> >> _______________________________________________ >> OmniOS-discuss mailing list >> OmniOS-discuss at lists.omniti.com >> http://lists.omniti.com/mailman/listinfo/omnios-discuss >> > > _______________________________________________ > OmniOS-discuss mailing list > OmniOS-discuss at lists.omniti.com > http://lists.omniti.com/mailman/listinfo/omnios-discuss > -------------- next part -------------- An HTML attachment was scrubbed... URL: From richard.elling at richardelling.com Thu Dec 10 18:13:12 2015 From: richard.elling at richardelling.com (Richard Elling) Date: Thu, 10 Dec 2015 10:13:12 -0800 Subject: [OmniOS-discuss] Samsung SM863 In-Reply-To: References: Message-ID: <63AA1781-EFEC-4991-B2AD-A7F6A7D2EDFC@richardelling.com> > On Dec 10, 2015, at 4:58 AM, Tobias Oetiker wrote: > > Just found that samsung now has an ssd with power loss protection > > http://www.storagereview.com/samsung_sm863_ssd_review > > what do you think ? Power-loss protection is not required (ZFS works on HDDs :-) but it is a nice feature. Overall, this looks like a very nice SSD. I expect more enterprise-grade SSDs from Samsung in the future. -- richard From dave-oo at pooserville.com Thu Dec 10 20:02:45 2015 From: dave-oo at pooserville.com (Dave Pooser) Date: Thu, 10 Dec 2015 14:02:45 -0600 Subject: [OmniOS-discuss] Samsung SM863 In-Reply-To: <63AA1781-EFEC-4991-B2AD-A7F6A7D2EDFC@richardelling.com> References: <63AA1781-EFEC-4991-B2AD-A7F6A7D2EDFC@richardelling.com> Message-ID: On 12/10/15, 12:13 PM, "OmniOS-discuss on behalf of Richard Elling" wrote: > >> On Dec 10, 2015, at 4:58 AM, Tobias Oetiker wrote: >> >> Just found that samsung now has an ssd with power loss protection >> >> http://www.storagereview.com/samsung_sm863_ssd_review >> >> what do you think ? > >Power-loss protection is not required (ZFS works on HDDs :-) but it is a >nice feature. On a device that will likely be used for ZIL, I'd call power-loss protection required. 
HDDs don't lie to the OS about when data has been flushed from cache to disk the way SSDs do, right?

On a device that's going to be L2ARC I care a lot less, obviously. ;-)
--
Dave Pooser
Cat-Herder-in-Chief, Pooserville.com

From jimklimov at cos.ru Thu Dec 10 20:20:19 2015
From: jimklimov at cos.ru (Jim Klimov)
Date: Thu, 10 Dec 2015 21:20:19 +0100
Subject: [OmniOS-discuss] Samsung SM863
In-Reply-To: <63AA1781-EFEC-4991-B2AD-A7F6A7D2EDFC@richardelling.com>
References: <63AA1781-EFEC-4991-B2AD-A7F6A7D2EDFC@richardelling.com>
Message-ID: <45D2D4ED-E2B5-48B8-A170-5D8AB70C809B@cos.ru>

On 10 December 2015 19:13:12 CET, Richard Elling wrote:
>
>> On Dec 10, 2015, at 4:58 AM, Tobias Oetiker wrote:
>>
>> Just found that samsung now has an ssd with power loss protection
>>
>> http://www.storagereview.com/samsung_sm863_ssd_review
>>
>> what do you think ?
>
>Power-loss protection is not required (ZFS works on HDDs :-) but it is
>a nice feature.
>Overall, this looks like a very nice SSD. I expect more
>enterprise-grade SSDs from
>Samsung in the future.
> -- richard
>
>_______________________________________________
>OmniOS-discuss mailing list
>OmniOS-discuss at lists.omniti.com
>http://lists.omniti.com/mailman/listinfo/omnios-discuss

IIRC the historical issue was not per se with power-loss protection for ZFS needs, but with drives and firmware that could misbehave when power disappeared before they had flushed RAM to flash, including corruption of SSD metadata which bricked the device, and also in cases of graceful shutdown when the host cut its own power off afterwards. These effects were not seen as often (or ever) on SSDs with capacitors or equivalent protection.

I do not know how much of this is FUD or relevant with today's devices vs. vendors' first steps a few years back, but the rule of thumb was to use protected SSDs for anything other than scratch use (e.g. whole device dedicated as L2ARC or other cache area) since nobody knew what's really good and what's not.
Jim -- Typos courtesy of K-9 Mail on my Samsung Android From richard.elling at richardelling.com Thu Dec 10 20:43:25 2015 From: richard.elling at richardelling.com (Richard Elling) Date: Thu, 10 Dec 2015 12:43:25 -0800 Subject: [OmniOS-discuss] Samsung SM863 In-Reply-To: References: <63AA1781-EFEC-4991-B2AD-A7F6A7D2EDFC@richardelling.com> Message-ID: <2C49B16E-5D7E-4A7B-8903-64BE04566DA0@richardelling.com> > On Dec 10, 2015, at 12:02 PM, Dave Pooser wrote: > > On 12/10/15, 12:13 PM, "OmniOS-discuss on behalf of Richard Elling" > richard.elling at richardelling.com> wrote: > >> >>> On Dec 10, 2015, at 4:58 AM, Tobias Oetiker wrote: >>> >>> Just found that samsung now has an ssd with power loss protection >>> >>> http://www.storagereview.com/samsung_sm863_ssd_review >>> >>> what do you think ? >> >> Power-loss protection is not required (ZFS works on HDDs :-) but it is a >> nice feature. > > On a device that will likely be used for ZIL, I'd call power-loss > protection required. HDDs don't lie to the OS about when data has been > flushed from cache to disk the way SSDs do, right? You do need a device that honors cache flush commands. But that goes for any device or RAID array, not just SSDs. -- richard > > On a device that's going to be L2ARC I care a lot less, obviously. ;-) > -- > Dave Pooser > Cat-Herder-in-Chief, Pooserville.com > > > _______________________________________________ > OmniOS-discuss mailing list > OmniOS-discuss at lists.omniti.com > http://lists.omniti.com/mailman/listinfo/omnios-discuss From tom.robinson at motec.com.au Thu Dec 10 22:10:46 2015 From: tom.robinson at motec.com.au (Tom Robinson) Date: Fri, 11 Dec 2015 09:10:46 +1100 Subject: [OmniOS-discuss] Ang: Re: LUN (in)visibility In-Reply-To: References: <5668BC29.8090803@motec.com.au> <5668AA17.6050105@motec.com.au> <9F96534F-D20A-4AF8-BCF2-84D62402635B@omniti.com> Message-ID: <5669F866.2030204@motec.com.au> On 10/12/15 18:55, Johan Kragsterman wrote: > You say "infiniband". 
Do you mean SRP? Where do you have your subnet manager? In the IB switch? If so, did you check the switch SM logs?
>
> I suppose you checked the data links? dladm show-link? What exactly did you check?
>
> How about multipath? How many paths did/do you have to each LUN? I know there was a discussion about too many paths to a LUN earlier on this list. That was fibre channel, though.
>
> I can't really comment on iSCSI since I never use it...

Hi Johan,

Yes, SRP. As I said, we had everything working fine before which means we also have a subnet manager. The SM actually runs on its own little box.

----------      -----------      -----
| storage|======|IB Switch|======|KVM|
----------      -----------      -----
                  |     |
                ----   ------
                |SM|   |ESXi|
                ----   ------

Normally there are only three paths; one iSCSI and two SRP.

I spent a lot of time hunting around on the KVM system looking for clues as at that time I didn't see any issues elsewhere in the setup.

On OmniOS, in /var/adm/messages I had this:

Oct 26 07:28:43 monza.motec.com.au genunix: [ID 408789 kern.warning] WARNING: hermon0: fault detected external to device; service unavailable
Oct 26 07:28:43 monza.motec.com.au genunix: [ID 451854 kern.warning] WARNING: hermon0: port 2 down
Oct 26 07:28:47 monza.motec.com.au genunix: [ID 408822 kern.info] NOTICE: hermon0: fault detected external to device; service still unavailable
Oct 26 07:28:47 monza.motec.com.au genunix: [ID 611667 kern.info] NOTICE: hermon0: port 1 down
Oct 26 07:30:12 monza.motec.com.au genunix: [ID 408789 kern.notice] NOTICE: hermon0: fault cleared external to device; service available
Oct 26 07:30:12 monza.motec.com.au genunix: [ID 451854 kern.notice] NOTICE: hermon0: port 2 up
Oct 26 07:30:12 monza.motec.com.au genunix: [ID 408822 kern.info] NOTICE: hermon0: no fault external to device; service available
Oct 26 07:30:12 monza.motec.com.au genunix: [ID 611667 kern.info] NOTICE: hermon0: port 1 up
Oct 26 07:31:31 monza.motec.com.au genunix: [ID 408789 kern.warning] WARNING:
hermon0: fault detected external to device; service unavailable
Oct 26 07:31:31 monza.motec.com.au genunix: [ID 451854 kern.warning] WARNING: hermon0: port 2 down
Oct 26 07:31:38 monza.motec.com.au genunix: [ID 408822 kern.info] NOTICE: hermon0: fault detected external to device; service still unavailable
Oct 26 07:31:38 monza.motec.com.au genunix: [ID 611667 kern.info] NOTICE: hermon0: port 1 down
Oct 26 07:32:12 monza.motec.com.au genunix: [ID 408789 kern.notice] NOTICE: hermon0: fault cleared external to device; service available
Oct 26 07:32:12 monza.motec.com.au genunix: [ID 451854 kern.notice] NOTICE: hermon0: port 2 up
Oct 26 07:32:12 monza.motec.com.au genunix: [ID 408822 kern.info] NOTICE: hermon0: no fault external to device; service available
Oct 26 07:32:12 monza.motec.com.au genunix: [ID 611667 kern.info] NOTICE: hermon0: port 1 up

Isn't the hermon0 driver for the Mellanox cards?

Kind regards,
Tom

-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 198 bytes
Desc: OpenPGP digital signature
URL:

From danmcd at omniti.com Thu Dec 10 22:12:55 2015
From: danmcd at omniti.com (Dan McDonald)
Date: Thu, 10 Dec 2015 17:12:55 -0500
Subject: [OmniOS-discuss] Ang: Re: LUN (in)visibility
In-Reply-To: <5669F866.2030204@motec.com.au>
References: <5668BC29.8090803@motec.com.au> <5668AA17.6050105@motec.com.au> <9F96534F-D20A-4AF8-BCF2-84D62402635B@omniti.com> <5669F866.2030204@motec.com.au>
Message-ID: <3653512F-6EB5-4D25-90BD-BB47ECE5C28F@omniti.com>

> On Dec 10, 2015, at 5:10 PM, Tom Robinson wrote:
>
> Isn't the hermon0 driver for the Mellanox cards?

Yes it is!

NAME
hermon - ConnectX MT25408/MT25418/MT25428 InfiniBand (IB) Driver

DESCRIPTION
The hermon driver is an IB Architecture-compliant implementation of an HCA, which operates on the Mellanox MT25408, MT25418 and MT25428 InfiniBand ASSPs using host memory for context storage rather than locally attached memory on the card.
Cards based on these ASSP's utilize the PCI-Express I/O bus. These ASSP's support the link and physical layers of the InfiniBand specification while the ASSP and the driver support the transport layer.

Dan

From johan.kragsterman at capvert.se Fri Dec 11 07:45:37 2015
From: johan.kragsterman at capvert.se (Johan Kragsterman)
Date: Fri, 11 Dec 2015 08:45:37 +0100
Subject: [OmniOS-discuss] Ang: Re: Ang: Re: LUN (in)visibility
In-Reply-To: <5669F866.2030204@motec.com.au>
References: <5669F866.2030204@motec.com.au>, <5668BC29.8090803@motec.com.au> <5668AA17.6050105@motec.com.au> <9F96534F-D20A-4AF8-BCF2-84D62402635B@omniti.com>
Message-ID:

Hi!

-----Tom Robinson wrote: -----
To: Johan Kragsterman
From: Tom Robinson
Date: 2015-12-10 23:10
Cc: Dan McDonald , omnios-discuss
Subject: Re: Ang: Re: [OmniOS-discuss] LUN (in)visibility

On 10/12/15 18:55, Johan Kragsterman wrote:
> You say "infiniband". Do you mean SRP? Where do you have your subnet manager? In the IB switch? If so, did you check the switch SM logs?
>
> I suppose you checked the data links? dladm show-link? What exactly did you check?
>
> How about multipath? How many paths did/do you have to each LUN? I know there was a discussion about too many paths to a LUN earlier on this list. That was fibre channel, though.
>
> I can't really comment on iSCSI since I never use it...

Hi Johan,

Yes, SRP. As I said, we had everything working fine before which means we also have a subnet manager. The SM actually runs on its own little box.

----------      -----------      -----
| storage|======|IB Switch|======|KVM|
----------      -----------      -----
                  |     |
                ----   ------
                |SM|   |ESXi|
                ----   ------

Normally there are only three paths; one iSCSI and two SRP.

I spent a lot of time hunting around on the KVM system looking for clues as at that time I didn't see any issues elsewhere in the setup.
On OmniOS, in /var/adm/messages I had this:

Oct 26 07:28:43 monza.motec.com.au genunix: [ID 408789 kern.warning] WARNING: hermon0: fault detected external to device; service unavailable
Oct 26 07:28:43 monza.motec.com.au genunix: [ID 451854 kern.warning] WARNING: hermon0: port 2 down
Oct 26 07:28:47 monza.motec.com.au genunix: [ID 408822 kern.info] NOTICE: hermon0: fault detected external to device; service still unavailable
Oct 26 07:28:47 monza.motec.com.au genunix: [ID 611667 kern.info] NOTICE: hermon0: port 1 down
Oct 26 07:30:12 monza.motec.com.au genunix: [ID 408789 kern.notice] NOTICE: hermon0: fault cleared external to device; service available
Oct 26 07:30:12 monza.motec.com.au genunix: [ID 451854 kern.notice] NOTICE: hermon0: port 2 up
Oct 26 07:30:12 monza.motec.com.au genunix: [ID 408822 kern.info] NOTICE: hermon0: no fault external to device; service available
Oct 26 07:30:12 monza.motec.com.au genunix: [ID 611667 kern.info] NOTICE: hermon0: port 1 up
Oct 26 07:31:31 monza.motec.com.au genunix: [ID 408789 kern.warning] WARNING: hermon0: fault detected external to device; service unavailable
Oct 26 07:31:31 monza.motec.com.au genunix: [ID 451854 kern.warning] WARNING: hermon0: port 2 down
Oct 26 07:31:38 monza.motec.com.au genunix: [ID 408822 kern.info] NOTICE: hermon0: fault detected external to device; service still unavailable
Oct 26 07:31:38 monza.motec.com.au genunix: [ID 611667 kern.info] NOTICE: hermon0: port 1 down
Oct 26 07:32:12 monza.motec.com.au genunix: [ID 408789 kern.notice] NOTICE: hermon0: fault cleared external to device; service available
Oct 26 07:32:12 monza.motec.com.au genunix: [ID 451854 kern.notice] NOTICE: hermon0: port 2 up
Oct 26 07:32:12 monza.motec.com.au genunix: [ID 408822 kern.info] NOTICE: hermon0: no fault external to device; service available
Oct 26 07:32:12 monza.motec.com.au genunix: [ID 611667 kern.info] NOTICE: hermon0: port 1 up

Isn't the hermon0 driver for the Mellanox cards?
Kind regards,
Tom

Yeah, that's the driver, and this seems to me like a data link problem. And a data link problem could be caused by one or more of many things; as I suggested before, check the SM, if you have any logs there.

The msg "fault detected external to device" is of course the key here, but I can't decipher it, unfortunately...

Do you run the iSCSI service over the same IB infrastructure?

Rgrds Johan

[attachment "signature.asc" removed by Johan Kragsterman/Capvert]

From danmcd at omniti.com Fri Dec 11 15:48:15 2015
From: danmcd at omniti.com (Dan McDonald)
Date: Fri, 11 Dec 2015 10:48:15 -0500
Subject: [OmniOS-discuss] Bloody update for December 11th
Message-ID:

This will be the last bloody update for 2015. The new install media (ISO, USB-DD, or kayak .zfs.bz2) can be obtained via here:

http://omnios.omniti.com/wiki.php/Installation

illumos-omnios is built from revision 2e8c0ba. omnios-build is built from b7ab647, but is a faster moving target.

New in this bloody:

- PCRE to 8.38 (the one already patched in 014 & 016).

- OpenSSH now works properly with the illumos audit system. This will be backported to 014 & 016 soon.

- OpenSSL to 1.0.2e (already patched in 014 & 016).

- Package variant support for DEBUG kernels now built-in (more on this below).

- ZFS receive now works with replication streams with intermediate snapshots that exceed refquota (a requested fix)

- SMB2 support (thanks Nexenta)

The "package variant" support (thanks to Jeff Sipek for the inspiration) allows a user of OmniOS to create a distinct boot environment with same-time-compiled bits, but with DEBUG enabled. To create such a BE, you start by upgrading to these bits, and then you can use the "pkg change-variant" command. The change-variant subcommand works a lot like install.
Here's an example:

Last login: Fri Dec 11 10:30:56 2015
OmniOS 5.11 omnios-2e8c0ba December 2015
# pkg variant -a
VARIANT            VALUE
arch               i386
debug.illumos      false
opensolaris.zone   global
# pkg change-variant -n debug.illumos=true
            Packages to change: 188
     Variants/Facets to change: 1
       Create boot environment: Yes
Create backup boot environment: No

Planning linked: 0/1 done; 1 working: zone:tz2
Linked image 'zone:tz2' output:
| No updates necessary for this image. (zone:tz2)
`
Planning linked: 1/1 done
#

If I wasn't using -n: boom, instant DEBUG kernel in a new BE. This will be handy for people who encounter problems. We can request you create a DEBUG BE and reproduce your problem without having to ONU or do any other weirdness.

As I said earlier, this is the last bloody update for 2015. Have a happy holiday season (whatever you celebrate, or not) and a happy new year!

Dan

From wonko at 4amlunch.net Fri Dec 11 16:33:35 2015
From: wonko at 4amlunch.net (Brian Hechinger)
Date: Fri, 11 Dec 2015 11:33:35 -0500
Subject: [OmniOS-discuss] OpenSM for OmniOS
Message-ID: <0F467FE2-9E8C-40D8-90AC-7B62638072F9@4amlunch.net>

I've found that supposedly this works. I just need to get a copy and build it.

Does anyone know where I would get a copy? I cannot find it for the life of me!

Thanks,

-brian

From johan.kragsterman at capvert.se Fri Dec 11 16:49:47 2015
From: johan.kragsterman at capvert.se (Johan Kragsterman)
Date: Fri, 11 Dec 2015 17:49:47 +0100
Subject: [OmniOS-discuss] Ang: OpenSM for OmniOS
In-Reply-To: <0F467FE2-9E8C-40D8-90AC-7B62638072F9@4amlunch.net>
References: <0F467FE2-9E8C-40D8-90AC-7B62638072F9@4amlunch.net>
Message-ID:

Hi!

-----"OmniOS-discuss" wrote: -----
To: omnios-discuss
From: Brian Hechinger
Sent by: "OmniOS-discuss"
Date: 2015-12-11 17:35
Subject: [OmniOS-discuss] OpenSM for OmniOS

I’ve found that supposedly this works. I just need to get a copy and build it.

Does anyone know where I would get a copy? I cannot find it for the life of me!
Thanks,

-brian

That software is pretty old, and I never tested it myself, but heard about people successfully running it. Don't know about production, though...

I can give you some links:

https://syoyo.wordpress.com/category/infiniband/

https://github.com/syoyo/solaris-infiniband-tools

Rgrds Johan

_______________________________________________
OmniOS-discuss mailing list
OmniOS-discuss at lists.omniti.com
http://lists.omniti.com/mailman/listinfo/omnios-discuss

From wonko at 4amlunch.net Fri Dec 11 16:51:40 2015
From: wonko at 4amlunch.net (Brian Hechinger)
Date: Fri, 11 Dec 2015 11:51:40 -0500
Subject: [OmniOS-discuss] Ang: OpenSM for OmniOS
In-Reply-To:
References: <0F467FE2-9E8C-40D8-90AC-7B62638072F9@4amlunch.net>
Message-ID: <4B0A0C6D-D48D-4DF2-8D74-9DB3EDAA8785@4amlunch.net>

Yeah, I've found that. The problem is I can't find the source for this or the patch.

Eric pointed me at http://code.openhub.net/project?pid=&ipid=303919&fp=303919&mp&projSelected=true&filterChecked

I'm trying to get that to clone (it fails, sigh). It's at least newer and maybe (hopefully) doesn't need to be patched for Solaris/Illumos?

-brian

> On Dec 11, 2015, at 11:49 AM, Johan Kragsterman wrote:
>
> Hi!
>
>
> -----"OmniOS-discuss" wrote: -----
> To: omnios-discuss
> From: Brian Hechinger
> Sent by: "OmniOS-discuss"
> Date: 2015-12-11 17:35
> Subject: [OmniOS-discuss] OpenSM for OmniOS
>
> I’ve found that supposedly this works. I just need to get a copy and build it.
>
> Does anyone know where I would get a copy? I cannot find it for the life of me!
>
> Thanks,
>
> -brian
>
>
>
>
> That software is pretty old, and I never tested it myself, but heard about people successfully running it. Don't know about production, though...
>
> I can give you some links:
>
> https://syoyo.wordpress.com/category/infiniband/
>
> https://github.com/syoyo/solaris-infiniband-tools
>
>
> Rgrds Johan
>
> _______________________________________________
> OmniOS-discuss mailing list
> OmniOS-discuss at lists.omniti.com
> http://lists.omniti.com/mailman/listinfo/omnios-discuss
>
>

-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From johan.kragsterman at capvert.se Fri Dec 11 17:02:24 2015
From: johan.kragsterman at capvert.se (Johan Kragsterman)
Date: Fri, 11 Dec 2015 18:02:24 +0100
Subject: [OmniOS-discuss] Ang: Re: Ang: OpenSM for OmniOS
In-Reply-To: <4B0A0C6D-D48D-4DF2-8D74-9DB3EDAA8785@4amlunch.net>
References: <4B0A0C6D-D48D-4DF2-8D74-9DB3EDAA8785@4amlunch.net>, <0F467FE2-9E8C-40D8-90AC-7B62638072F9@4amlunch.net>
Message-ID:

Hi!

-----Brian Hechinger wrote: -----
To: Johan Kragsterman
From: Brian Hechinger
Date: 2015-12-11 17:51
Cc: omnios-discuss
Subject: Re: Ang: [OmniOS-discuss] OpenSM for OmniOS

Yeah, I’ve found that. The problem is I can’t find the source for this or the patch.

Eric pointed me at http://code.openhub.net/project?pid=&ipid=303919&fp=303919&mp&projSelected=true&filterChecked

I’m trying to get that to clone (it fails, sigh). It’s at least newer and maybe (hopefully) doesn’t need to be patched for Solaris/Illumos?

-brian

Did you try it from here:

http://git.openfabrics.org/~alexnetes/opensm.git/

Rgrds Johan

From wonko at 4amlunch.net Fri Dec 11 17:09:32 2015
From: wonko at 4amlunch.net (Brian Hechinger)
Date: Fri, 11 Dec 2015 12:09:32 -0500
Subject: [OmniOS-discuss] Ang: Re: Ang: OpenSM for OmniOS
In-Reply-To:
References: <4B0A0C6D-D48D-4DF2-8D74-9DB3EDAA8785@4amlunch.net> <0F467FE2-9E8C-40D8-90AC-7B62638072F9@4amlunch.net>
Message-ID: <03F2E265-758A-4512-91F5-B082C2FE67DD@4amlunch.net>

That fails to clone, but I did manage to eventually find it on that site. Got 3.3.19

It needs libibumad so I got that, but that explodes horribly when I try to build it.
:(

Stuff like this:

./include/infiniband/umad.h:84:62: error: expected expression before 'uint32_t'
 #define IB_USER_MAD_UNREGISTER_AGENT _IOW(IB_IOCTL_MAGIC, 2, uint32_t)
                                                              ^
src/umad.c:979:19: note: in expansion of macro 'IB_USER_MAD_UNREGISTER_AGENT'
  return ioctl(fd, IB_USER_MAD_UNREGISTER_AGENT, &agentid);

-brian

> On Dec 11, 2015, at 12:02 PM, Johan Kragsterman wrote:
>
> Hi!
>
>
>
> -----Brian Hechinger wrote: -----
> To: Johan Kragsterman
> From: Brian Hechinger
> Date: 2015-12-11 17:51
> Cc: omnios-discuss
> Subject: Re: Ang: [OmniOS-discuss] OpenSM for OmniOS
>
> Yeah, I’ve found that. The problem is I can’t find the source for this or the patch.
>
> Eric pointed me at http://code.openhub.net/project?pid=&ipid=303919&fp=303919&mp&projSelected=true&filterChecked
>
> I’m trying to get that to clone (it fails, sigh). It’s at least newer and maybe (hopefully) doesn’t need to be patched for Solaris/Illumos?
>
> -brian
>
>
>
>
> Did you try it from here:
>
>
> http://git.openfabrics.org/~alexnetes/opensm.git/
>
>
> Rgrds Johan
>

From alka at hfg-gmuend.de Fri Dec 11 17:30:04 2015
From: alka at hfg-gmuend.de (Günther Alka)
Date: Fri, 11 Dec 2015 18:30:04 +0100
Subject: [OmniOS-discuss] Bloody update for December 11th
In-Reply-To:
References:
Message-ID: <566B081C.8@hfg-gmuend.de>

Many Thanks to Nexenta
and to OmniTi for this december bloody with SMB 2

I have just done some tests on OSX under Solaris 11.3 to check some configuration
options for a ZFS video editing storage server for my Mac Pros.

There are two must have principles: SMB2 and Jumboframes
see http://napp-it.org/doc/downloads/performance_smb2.pdf

Gea

On 11.12.2015 16:48, Dan McDonald wrote:
> This will be the last bloody update for 2015. The new install media (ISO, USB-DD, or kayak .zfs.bz2) can be obtained via here:
>
> http://omnios.omniti.com/wiki.php/Installation
>
> illumos-omnios is built from revision 2e8c0ba. omnios-build is built from b7ab647, but is a faster moving target.
> > New in this bloody: > > - PCRE to 8.38 (the one already patched in 014 & 016). > > - OpenSSH now works properly with the illumos audit system. This will be backported to 014 & 016 soon. > > - OpenSSL to 1.0.2e (already patched in 014 & 016). > > - Package variant support for DEBUG kernels now built-in (more on this below). > > - ZFS receive now works with replication streams with intermediate snapshots that exceed refquota (a requested fix) > > - SMB2 support (thanks Nexenta) > > > The "package variant" support (thanks to Jeff Sipek for the inspiration) allows a user of OmniOS to create a distinct boot environment with same-time-compiled bits, but with DEBUG enabled. To create such a BE, you start by upgrading to these bits, and then you can use the "pkg change-variant" command. The change-variant subcommand works a lot like install. Here's an example: > > Last login: Fri Dec 11 10:30:56 2015 > OmniOS 5.11 omnios-2e8c0ba December 2015 > # pkg variant -a > VARIANT VALUE > arch i386 > debug.illumos false > opensolaris.zone global > # pkg change-variant -n debug.illumos=true > Packages to change: 188 > Variants/Facets to change: 1 > Create boot environment: Yes > Create backup boot environment: No > > Planning linked: 0/1 done; 1 working: zone:tz2 > Linked image 'zone:tz2' output: > | No updates necessary for this image. (zone:tz2) > ` > Planning linked: 1/1 done > # > > If I wasn't using -n: boom, instant DEBUG kernel in a new BE. This will be handy for people who encounter problems. We can request you create a DEBUG BE and reproduce your problem without having to ONU or do any other weirdness. > > As I said earlier, this is the last bloody update for 2015. Have a happy holiday season (whatever you celebrate, or not) and a happy new year! 
>
> Dan
>
> _______________________________________________
> OmniOS-discuss mailing list
> OmniOS-discuss at lists.omniti.com
> http://lists.omniti.com/mailman/listinfo/omnios-discuss

--
H f G
Hochschule für Gestaltung
university of design

Schwäbisch Gmünd
Rektor Klaus Str. 100
73525 Schwäbisch Gmünd

Guenther Alka, Dipl.-Ing. (FH)
Leiter des Rechenzentrums
head of computer center

Tel 07171 602 624
Fax 07171 69259
guenther.alka at hfg-gmuend.de
http://rz.hfg-gmuend.de

From lists at marzocchi.net Fri Dec 11 19:01:56 2015
From: lists at marzocchi.net (Olaf Marzocchi)
Date: Fri, 11 Dec 2015 22:31:56 +0330
Subject: [OmniOS-discuss] Bloody update for December 11th
In-Reply-To:
References:
Message-ID:

Any plan to backport it to 014? Since 014 is LTS, lack of backport means no SMB2 for a long time for all the LTS users.

Olaf

On 11 December 2015 19:18:15 GMT+03:30, Dan McDonald wrote:
>
>- SMB2 support (thanks Nexenta)
>

From danmcd at omniti.com Fri Dec 11 19:03:38 2015
From: danmcd at omniti.com (Dan McDonald)
Date: Fri, 11 Dec 2015 14:03:38 -0500
Subject: [OmniOS-discuss] Bloody update for December 11th
In-Reply-To:
References:
Message-ID: <12128792-E69E-43B8-9477-354E2CE9F6AB@omniti.com>

> On Dec 11, 2015, at 2:01 PM, Olaf Marzocchi wrote:
>
> Any plan to backport it to 014? Since 014 is LTS, lack of backport means no SMB2 for a long time for all the LTS users.

I deliberately left it out of 016 because of its size & complexity (it came in the day after 016 did its last upstream merge). NO WAY will something this big go back into LTS/014 without a LOT of convincing (and a suitable test cycle).
Sorry,
Dan

From johan.kragsterman at capvert.se Fri Dec 11 19:08:15 2015
From: johan.kragsterman at capvert.se (Johan Kragsterman)
Date: Fri, 11 Dec 2015 20:08:15 +0100
Subject: [OmniOS-discuss] Ang: Re: Ang: Re: Ang: OpenSM for OmniOS
In-Reply-To: <03F2E265-758A-4512-91F5-B082C2FE67DD@4amlunch.net>
References: <03F2E265-758A-4512-91F5-B082C2FE67DD@4amlunch.net>, <4B0A0C6D-D48D-4DF2-8D74-9DB3EDAA8785@4amlunch.net> <0F467FE2-9E8C-40D8-90AC-7B62638072F9@4amlunch.net>
Message-ID:

Hi!

Would be nice if you keep us (the list) updated...

Rgrds Johan

-----Brian Hechinger wrote: -----
To: Johan Kragsterman
From: Brian Hechinger
Date: 2015-12-11 18:09
Cc: omnios-discuss
Subject: Re: Ang: Re: Ang: [OmniOS-discuss] OpenSM for OmniOS

That fails to clone, but I did manage to eventually find it on that site. Got 3.3.19

It needs libibumad so I got that, but that explodes horribly when I try to build it. :(

Stuff like this:

./include/infiniband/umad.h:84:62: error: expected expression before 'uint32_t'
 #define IB_USER_MAD_UNREGISTER_AGENT _IOW(IB_IOCTL_MAGIC, 2, uint32_t)
                                                              ^
src/umad.c:979:19: note: in expansion of macro 'IB_USER_MAD_UNREGISTER_AGENT'
  return ioctl(fd, IB_USER_MAD_UNREGISTER_AGENT, &agentid);

-brian

> On Dec 11, 2015, at 12:02 PM, Johan Kragsterman wrote:
>
> Hi!
>
>
>
> -----Brian Hechinger wrote: -----
> To: Johan Kragsterman
> From: Brian Hechinger
> Date: 2015-12-11 17:51
> Cc: omnios-discuss
> Subject: Re: Ang: [OmniOS-discuss] OpenSM for OmniOS
>
> Yeah, I’ve found that. The problem is I can’t find the source for this or the patch.
>
> Eric pointed me at http://code.openhub.net/project?pid=&ipid=303919&fp=303919&mp&projSelected=true&filterChecked
>
> I’m trying to get that to clone (it fails, sigh). It’s at least newer and maybe (hopefully) doesn’t need to be patched for Solaris/Illumos?
> > -brian > > > > > Did you try it from here: > > > http://git.openfabrics.org/~alexnetes/opensm.git/ > > > Rgrds Johan > From philip.yuengling at circonus.com Fri Dec 11 20:25:28 2015 From: philip.yuengling at circonus.com (Philip Yuengling) Date: Fri, 11 Dec 2015 15:25:28 -0500 Subject: [OmniOS-discuss] SSH versions on global and non-global zones Message-ID: It seems that when installing LTS 151014 from the kayak image the global gets OpenSSH_7.1p1, but non-global zones get Sun_SSH_1.5. Obviously some work can be done to get them to match, but it may be good to have them match from the start? Or am I missing something. -------------- next part -------------- An HTML attachment was scrubbed... URL: From danmcd at omniti.com Fri Dec 11 20:38:30 2015 From: danmcd at omniti.com (Dan McDonald) Date: Fri, 11 Dec 2015 15:38:30 -0500 Subject: [OmniOS-discuss] SSH versions on global and non-global zones In-Reply-To: References: Message-ID: <1380CE53-3591-40B9-B0B5-8DEA92AA9713@omniti.com> > On Dec 11, 2015, at 3:25 PM, Philip Yuengling wrote: > > It seems that when installing LTS 151014 from the kayak image the global gets OpenSSH_7.1p1, but non-global zones get Sun_SSH_1.5. > > Obviously some work can be done to get them to match, but it may be good to have them match from the start? Or am I missing something. Huh... I had NO idea it would do that. I assumed (probably incorrectly) that the NGZs would get "entire" just like the global one would. Ahhh, I see the problem: https://github.com/omniti-labs/pkg5/blob/omnios/src/brand/pkgcreatezone#L545 "entire" populates the global zone. Whatever is in pkgcreatezone works for ipkg & lipkg zones. "entire" can support both, and due to IPS's rules (higher version number wins), OpenSSH7.1 beats SunSSH0.151xxx. Not sure if patching pkgcreatezone is the best option OR if we should inherit-from-global more intelligently in the pkgcreatezone script. Thanks for finding this! 
Dan

From eric.sproul at circonus.com  Fri Dec 11 20:51:01 2015
From: eric.sproul at circonus.com (Eric Sproul)
Date: Fri, 11 Dec 2015 15:51:01 -0500
Subject: [OmniOS-discuss] SSH versions on global and non-global zones
In-Reply-To: <1380CE53-3591-40B9-B0B5-8DEA92AA9713@omniti.com>
References: <1380CE53-3591-40B9-B0B5-8DEA92AA9713@omniti.com>
Message-ID:

On Fri, Dec 11, 2015 at 3:38 PM, Dan McDonald wrote:
> Huh... I had NO idea it would do that. I assumed (probably incorrectly) that the NGZs would get "entire" just like the global one would.
>
> Ahhh, I see the problem:
>
> https://github.com/omniti-labs/pkg5/blob/omnios/src/brand/pkgcreatezone#L545
>
> "entire" populates the global zone. Whatever is in pkgcreatezone works for ipkg & lipkg zones.
>
> "entire" can support both, and due to IPS's rules (higher version number wins), OpenSSH7.1 beats SunSSH0.151xxx.
>
> Not sure if patching pkgcreatezone is the best option OR if we should inherit-from-global more intelligently in the pkgcreatezone script.

This is fallout from our abuse of entire. The ipkg brand scripts assume entire is just an incorporation, so they explicitly install a bunch of basic packages (including ssh).

I'd like to see us try to undo that early mistake (for which I'm partly to blame!) and get back to having "slim_install" fill the role that we forced "entire" into back at the beginning, when we didn't fully understand what distro_const was actually doing.

We (Circonus) can work around this for now, and make it part of our '014 zone bootstrap process to switch out ssh daemons.
Eric

From bfriesen at simple.dallas.tx.us  Sat Dec 12 21:07:42 2015
From: bfriesen at simple.dallas.tx.us (Bob Friesenhahn)
Date: Sat, 12 Dec 2015 15:07:42 -0600 (CST)
Subject: [OmniOS-discuss] Bloody update for December 11th
In-Reply-To: <566B081C.8@hfg-gmuend.de>
References: <566B081C.8@hfg-gmuend.de>
Message-ID:

On Fri, 11 Dec 2015, Günther Alka wrote:

> Many thanks to Nexenta and to OmniTI for this December bloody with SMB 2
>
> I have just done some tests on OSX under Solaris 11.3 to check some configuration
> options for a ZFS video editing storage server for my Mac Pros.

Do you plan to add tests with the implementation in OmniOS bloody? The Nexenta implementation might be quite a lot different than the Oracle Solaris one. Perhaps it might even fail with your tests.

> There are two must-have principles: SMB2 and jumbo frames
> see http://napp-it.org/doc/downloads/performance_smb2.pdf

I was surprised to see the huge improvement with jumbo frames.

Bob
--
Bob Friesenhahn
bfriesen at simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer, http://www.GraphicsMagick.org/

From alka at hfg-gmuend.de  Sun Dec 13 08:38:43 2015
From: alka at hfg-gmuend.de (Guenther Alka)
Date: Sun, 13 Dec 2015 09:38:43 +0100
Subject: [OmniOS-discuss] Bloody update for December 11th
In-Reply-To:
References: <566B081C.8@hfg-gmuend.de>
Message-ID: <566D2E93.3020505@hfg-gmuend.de>

OmniOS is my preferred platform.
I will add SMB2 results with the same config to the PDF this week.

On 12.12.2015 at 22:07, Bob Friesenhahn wrote:
> On Fri, 11 Dec 2015, Günther Alka wrote:
>
>> Many thanks to Nexenta and to OmniTI for this December bloody with SMB 2
>>
>> I have just done some tests on OSX under Solaris 11.3 to check some configuration
>> options for a ZFS video editing storage server for my Mac Pros.
>
> Do you plan to add tests with the implementation in OmniOS bloody?
> The Nexenta implementation might be quite a lot different than the Oracle Solaris one.
> Perhaps it might even fail with your tests.
>
>> There are two must-have principles: SMB2 and jumbo frames
>> see http://napp-it.org/doc/downloads/performance_smb2.pdf
>
> I was surprised to see the huge improvement with jumbo frames.
>
> Bob

From vab at bb-c.de  Sun Dec 13 12:42:57 2015
From: vab at bb-c.de (Volker A. Brandt)
Date: Sun, 13 Dec 2015 13:42:57 +0100
Subject: [OmniOS-discuss] Bloody update for December 11th
In-Reply-To:
References:
Message-ID: <22125.26577.758055.199016@glaurung.bb-c.de>

Hi Dan!

Thanks for your good work on OmniOS!

> This will be the last bloody update for 2015. The new install media
> (ISO, USB-DD, or kayak .zfs.bz2) can be obtained via here:
>
> http://omnios.omniti.com/wiki.php/Installation

Note that the link for the ZFS installation root on this page still points to the Nov 09 version. I used wget to retrieve the Dec 11 version, so the file itself is there, just the link is wrong.

Regards -- Volker
--
------------------------------------------------------------------------
Volker A. Brandt               Consulting and Support for Oracle Solaris
Brandt & Brandt Computer GmbH                   WWW: http://www.bb-c.de/
Am Wiesenpfad 6, 53340 Meckenheim, GERMANY            Email: vab at bb-c.de
Commercial register: Amtsgericht Bonn, HRB 10513           Shoe size: 46
Managing directors: Rainer J.H. Brandt and Volker A. Brandt

"When logic and proportion have fallen sloppy dead"

From danmcd at omniti.com  Mon Dec 14 15:18:41 2015
From: danmcd at omniti.com (Dan McDonald)
Date: Mon, 14 Dec 2015 10:18:41 -0500
Subject: [OmniOS-discuss] Bloody update for December 11th
In-Reply-To: <22125.26577.758055.199016@glaurung.bb-c.de>
References: <22125.26577.758055.199016@glaurung.bb-c.de>
Message-ID: <0ECB9C46-3301-437E-B223-448AC083015C@omniti.com>

> On Dec 13, 2015, at 7:42 AM, Volker A. Brandt wrote:
>
> Note that the link for the ZFS installation root on this page still points to the Nov 09 version.
> I used wget to retrieve the Dec 11 version, so the file itself is there, just the link is wrong.

Fixed: http://omnios.omniti.com/changeset.php/default/wiki/810badbcdcd1a0d0430008a79be2f755b5dd99e8

Thanks!
Dan

From stephan.budach at JVM.DE  Mon Dec 14 15:25:18 2015
From: stephan.budach at JVM.DE (Stephan Budach)
Date: Mon, 14 Dec 2015 16:25:18 +0100
Subject: [OmniOS-discuss] How to configure FCoE target in OmniOS?
Message-ID: <566EDF5E.3000004@jvm.de>

Hi guys,

I am trying to configure an FCoE target in OmniOS r016, but I can't seem to get it right. I started out with the documentation for Solaris 11, which seemed appropriate, and configured an FC target and added a view, which granted the FCoE port access to that LUN, but the port doesn't seem to log in to my Nexus 5500 at all.

This is what I got so far:

root@nfsvmpool03:/root# pkg list | grep fcoe
driver/network/fcoe                  0.5.11-0.151016   i--
driver/network/fcoet                 0.5.11-0.151016   i--
system/library/libfcoe               0.5.11-0.151016   i--

svcs -a | grep fcoe
disabled       15:35:16 svc:/system/fcoe_initiator:default
online         15:35:39 svc:/system/fcoe_target:default

fcadm hba-port
HBA Port WWN: 2000a0369f590a20
        Port Mode: Target
        Port ID: 0
        OS Device Name: Not Applicable
        Manufacturer: Sun Microsystems, Inc.
        Model: FCoE Virtual FC HBA
        Firmware Version: N/A
        FCode/BIOS Version: N/A
        Serial Number: N/A
        Driver Name: COMSTAR FCoET
        Driver Version: v20091123-1.02
        Type: unknown
        State: offline
        Supported Speeds: 1Gb 10Gb
        Current Speed: not established
        Node WWN: 1000a0369f590a20

stmfadm list-target -v wwn.2000A0369F590A20
Target: wwn.2000A0369F590A20
    Operational Status: Online
    Provider Name     : fcoet
    Alias             : fcoet0
    Protocol          : Fibre Channel
    Sessions          : 0

stmfadm list-view -l 600144F07A34AC66000054D1DEB50001
View Entry: 0
    Host group   : ovmHosts
    Target group : fcoeNFSVMPOOL03
    LUN          : 0

stmfadm list-tg -v fcoeNFSVMPOOL03
Target Group: fcoeNFSVMPOOL03
        Member: wwn.2000A0369F590A20
        Member: wwn.1000A0369F590A20

stmfadm list-hg -v ovmHosts
Host Group: ovmHosts
        Member: wwn.2000A0369F1A171D

I was able to successfully connect my Linux FCoE initiator to the fabric, but not the OmniOS target. Is there anything obviously wrong with my config?

Thanks,
Stephan

From danmcd at omniti.com  Mon Dec 14 15:35:16 2015
From: danmcd at omniti.com (Dan McDonald)
Date: Mon, 14 Dec 2015 10:35:16 -0500
Subject: [OmniOS-discuss] How to configure FCoE target in OmniOS?
In-Reply-To: <566EDF5E.3000004@jvm.de>
References: <566EDF5E.3000004@jvm.de>
Message-ID:

> On Dec 14, 2015, at 10:25 AM, Stephan Budach wrote:
>
> Hi guys,
>
> I am trying to configure an FCoE target in OmniOS r016, but I can't seem to get it right. I started out with the documentation for Solaris 11, which seemed appropriate, and configured an FC target and added a view, which granted the FCoE port access to that LUN, but the port doesn't seem to log in to my Nexus 5500 at all.
>
> This is what I got so far:
>
> I was able to successfully connect my Linux FCoE initiator to the fabric, but not the OmniOS target. Is there anything obviously wrong with my config?

Do you have any LUs configured?

    stmfadm list-lu

You may want to make sure you have at least one.
Create a ZFS volume and then add it:

    zfs create -V <size> poolname/volname

    stmfadm create-lu /dev/zvol/dsk/poolname/volname

Dan

From stephan.budach at JVM.DE  Mon Dec 14 15:46:53 2015
From: stephan.budach at JVM.DE (Stephan Budach)
Date: Mon, 14 Dec 2015 16:46:53 +0100
Subject: [OmniOS-discuss] How to configure FCoE target in OmniOS?
In-Reply-To:
References: <566EDF5E.3000004@jvm.de>
Message-ID: <566EE46D.5050507@jvm.de>

Well… creating a view requires a LUN, doesn't it? But anyway, I do have some LUNs configured already, which I formerly presented using iSCSI. The one I chose for my FCoE testing is this one:

stmfadm list-lu -v
LU Name: 600144F07A34AC66000054D1DEB50001
    Operational Status: Online
    Provider Name     : sbd
    Alias             : /dev/zvol/rdsk/sasTank/nfsvmpool03sas
    View Entry Count  : 1
    Data File         : /dev/zvol/rdsk/sasTank/nfsvmpool03sas
    Meta File         : not set
    Size              : 1342177280000
    Block Size        : 512
    Management URL    : not set
    Vendor ID         : SUN
    Product ID        : COMSTAR
    Serial Num        : not set
    Write Protect     : Disabled
    Writeback Cache   : Disabled
    Access State      : Active

Thanks,
Stephan

On 14.12.15 at 16:35, Dan McDonald wrote:
>> On Dec 14, 2015, at 10:25 AM, Stephan Budach wrote:
>>
>> Hi guys,
>>
>> I am trying to configure an FCoE target in OmniOS r016, but I can't seem to get it right. I started out with the documentation for Solaris 11, which seemed appropriate, and configured an FC target and added a view, which granted the FCoE port access to that LUN, but the port doesn't seem to log in to my Nexus 5500 at all.
>>
>> This is what I got so far:
>>
>> I was able to successfully connect my Linux FCoE initiator to the fabric, but not the OmniOS target. Is there anything obviously wrong with my config?
> Do you have any LUs configured?
>
> stmfadm list-lu
>
> You may want to make sure you have at least one.
> Create a ZFS volume and then add it:
>
> zfs create -V <size> poolname/volname
>
> stmfadm create-lu /dev/zvol/dsk/poolname/volname
>
> Dan
>

From johan.kragsterman at capvert.se  Mon Dec 14 16:26:19 2015
From: johan.kragsterman at capvert.se (Johan Kragsterman)
Date: Mon, 14 Dec 2015 17:26:19 +0100
Subject: [OmniOS-discuss] Ang: Re: How to configure FCoE target in OmniOS?
In-Reply-To: <566EE46D.5050507@jvm.de>
References: <566EE46D.5050507@jvm.de>, <566EDF5E.3000004@jvm.de>
Message-ID:

Hi!

Is there a question in this mail that I'm missing somewhere...?

Anyway, check further down...

-----"OmniOS-discuss" wrote: -----
To: Dan McDonald
From: Stephan Budach
Sent by: "OmniOS-discuss"
Date: 2015-12-14 16:48
Cc: omnios-discuss
Subject: Re: [OmniOS-discuss] How to configure FCoE target in OmniOS?

Well… creating a view requires a LUN, doesn't it? But anyway, I do have some LUNs configured already, which I formerly presented using iSCSI. The one I chose for my FCoE testing is this one:

stmfadm list-lu -v
LU Name: 600144F07A34AC66000054D1DEB50001
    Operational Status: Online
    Provider Name     : sbd
    Alias             : /dev/zvol/rdsk/sasTank/nfsvmpool03sas
    View Entry Count  : 1
    Data File         : /dev/zvol/rdsk/sasTank/nfsvmpool03sas
    Meta File         : not set
    Size              : 1342177280000
    Block Size        : 512
    Management URL    : not set
    Vendor ID         : SUN
    Product ID        : COMSTAR
    Serial Num        : not set
    Write Protect     : Disabled
    Writeback Cache   : Disabled
    Access State      : Active

Thanks,
Stephan

Have you enabled the target service: svcadm enable svc:/system/fcoe_target:default ?

As far as I remember (I did this years ago...), FCoE HBAs show up when you run: stmfadm list-target (-v for verbose).
From there you can get the WWNN and WWPN, which you need to configure FCoE ports:

# fcadm create-fcoe-port -i -p Port_WWN -n Node_WWN Ethernet_Interface

Then, if you connect the FCoE initiator to the fabric and search for a LUN, you will see it listed as a logged-in initiator if you again run:

stmfadm list-target -v

and from there you will get the WWN of the initiator.

With that info you can create the view for the LUN, so that the initiator is able to access it.

Hope this is the right working order, it was a long time ago I did this...

Regards Johan

On 14.12.15 at 16:35, Dan McDonald wrote:
>> On Dec 14, 2015, at 10:25 AM, Stephan Budach wrote:
>>
>> Hi guys,
>>
>> I am trying to configure an FCoE target in OmniOS r016, but I can't seem to get it right. I started out with the documentation for Solaris 11, which seemed appropriate, and configured an FC target and added a view, which granted the FCoE port access to that LUN, but the port doesn't seem to log in to my Nexus 5500 at all.
>>
>> This is what I got so far:
>>
>> I was able to successfully connect my Linux FCoE initiator to the fabric, but not the OmniOS target. Is there anything obviously wrong with my config?
> Do you have any LUs configured?
>
> stmfadm list-lu
>
> You may want to make sure you have at least one.
> Create a ZFS volume and then add it:
>
> zfs create -V <size> poolname/volname
>
> stmfadm create-lu /dev/zvol/dsk/poolname/volname
>
> Dan
>
_______________________________________________
OmniOS-discuss mailing list
OmniOS-discuss at lists.omniti.com
http://lists.omniti.com/mailman/listinfo/omnios-discuss

From rjahnel at ellipseinc.com  Mon Dec 14 18:22:50 2015
From: rjahnel at ellipseinc.com (Richard Jahnel)
Date: Mon, 14 Dec 2015 18:22:50 +0000
Subject: [OmniOS-discuss] A useful tidbit or two for ESX admins running OmniOS Fibre Targets
Message-ID: <65DC5816D4BEE043885A89FD54E273FC6CF693F1@MAIL101.Ellipseinc.com>

Limiting the feature flags to those used in R151006 will eliminate the eager zero panic bug currently present in versions R151010 and later, including the current LTS R151014.

Example:

zpool create -d \
  -o feature@async_destroy=enabled \
  -o feature@empty_bpobj=enabled \
  -o feature@lz4_compress=enabled \
  poolname \
  raidz3 %disks% \
  raidz3 %disks% \
  <....>
  log mirror %disks% \
  cache %disks%

Also, an unrelated hint: disabling the write-back cache in the fibre target will prevent corrupted VMs that might otherwise result from a panic.

PS. Be sure to have an SSD-backed log mirror to minimize the write performance impact.

Example:

stmfadm modify-lu -p wcd=true <LU-name>

Richard Jahnel | Senior Network Engineer
Ellipse Communications - Corporate Office
14800 Quorum Dr, Suite 420
Dallas, TX 75254
TF: 888-678-3869 | F: 972-479-9115

________________________________

The content of this e-mail (including any attachments) is strictly confidential and may be commercially sensitive. If you are not, or believe you may not be, the intended recipient, please advise the sender immediately by return e-mail, delete this e-mail and destroy any copies.
From danmcd at omniti.com  Mon Dec 14 18:36:01 2015
From: danmcd at omniti.com (Dan McDonald)
Date: Mon, 14 Dec 2015 13:36:01 -0500
Subject: [OmniOS-discuss] A useful tidbit or two for ESX admins running OmniOS Fibre Targets
In-Reply-To: <65DC5816D4BEE043885A89FD54E273FC6CF693F1@MAIL101.Ellipseinc.com>
References: <65DC5816D4BEE043885A89FD54E273FC6CF693F1@MAIL101.Ellipseinc.com>
Message-ID: <0FDFF526-04C0-4705-8E81-3B29F031CCD8@omniti.com>

> On Dec 14, 2015, at 1:22 PM, Richard Jahnel wrote:
>
> Limiting the feature flags to those used in R151006 will eliminate the eager zero panic bug currently present in versions R151010 and later including the current LTS R151014.
>

Is there an illumos bug filed for this? If not, why hasn't there been? Modulo the fiber channel HW, it seems easy enough to reproduce, no?

Dan

From rjahnel at ellipseinc.com  Mon Dec 14 18:51:05 2015
From: rjahnel at ellipseinc.com (Richard Jahnel)
Date: Mon, 14 Dec 2015 18:51:05 +0000
Subject: [OmniOS-discuss] A useful tidbit or two for ESX admins running OmniOS Fibre Targets
In-Reply-To: <0FDFF526-04C0-4705-8E81-3B29F031CCD8@omniti.com>
References: <65DC5816D4BEE043885A89FD54E273FC6CF693F1@MAIL101.Ellipseinc.com> <0FDFF526-04C0-4705-8E81-3B29F031CCD8@omniti.com>
Message-ID: <65DC5816D4BEE043885A89FD54E273FC6CF69427@MAIL101.Ellipseinc.com>

We have discussed it before here on this list. I haven't filed a bug because I don't know how to do so, or where the bug resides.

For example, does it belong to OmniOS, Illumos or OpenZFS?

I don't know and I can't read source code well enough to figure it out.

-----Original Message-----
From: Dan McDonald [mailto:danmcd at omniti.com]
Sent: Monday, December 14, 2015 12:36 PM
To: Richard Jahnel
Cc: omnios-discuss at lists.omniti.com; Dan McDonald
Subject: Re: [OmniOS-discuss] A useful tidbit or two for ESX admins running OmniOS Fibre Targets

> On Dec 14, 2015, at 1:22 PM, Richard Jahnel wrote:
>
> Limiting the feature flags to those used in R151006 will eliminate the eager zero panic bug currently present in versions R151010 and later including the current LTS R151014.
>

Is there an illumos bug filed for this? If not, why hasn't there been? Modulo the fiber channel HW, it seems easy enough to reproduce, no?

Dan

From stephan.budach at JVM.DE  Mon Dec 14 18:57:38 2015
From: stephan.budach at JVM.DE (Stephan Budach)
Date: Mon, 14 Dec 2015 19:57:38 +0100
Subject: [OmniOS-discuss] Ang: Re: How to configure FCoE target in OmniOS?
In-Reply-To:
References: <566EE46D.5050507@jvm.de>, <566EDF5E.3000004@jvm.de>
Message-ID: <566F1122.5010402@jvm.de>

Hi Johan,

On 14.12.15 at 17:26, Johan Kragsterman wrote:
> Hi!
>
> Is there a question in this mail I miss somewhere...?

Well, not in that post, but in the first one, where I asked if anyone could spot some obvious errors in my attempt to configure an FCoE target in OmniOS.

>
> Anyway, check further down...
>
> -----"OmniOS-discuss" wrote: -----
> To: Dan McDonald
> From: Stephan Budach
> Sent by: "OmniOS-discuss"
> Date: 2015-12-14 16:48
> Cc: omnios-discuss
> Subject: Re: [OmniOS-discuss] How to configure FCoE target in OmniOS?
>
> Well… creating a view, requires a LUN, doesn't it?
> But anyway, I do have some LUNs configured already, which I formerly presented using iSCSI.
> The one I chose for my FCoE testing is this one:
>
> stmfadm list-lu -v
> LU Name: 600144F07A34AC66000054D1DEB50001
>     Operational Status: Online
>     Provider Name     : sbd
>     Alias             : /dev/zvol/rdsk/sasTank/nfsvmpool03sas
>     View Entry Count  : 1
>     Data File         : /dev/zvol/rdsk/sasTank/nfsvmpool03sas
>     Meta File         : not set
>     Size              : 1342177280000
>     Block Size        : 512
>     Management URL    : not set
>     Vendor ID         : SUN
>     Product ID        : COMSTAR
>     Serial Num        : not set
>     Write Protect     : Disabled
>     Writeback Cache   : Disabled
>     Access State      : Active
>
> Thanks,
> Stephan
>
> Have you enabled the: svcadm enable svc:/system/fcoe_target:default ?

Yes.

>
> What I remember(years ago I did this...), FCoE HBA's show up when you run: stmfadm list-target (-v for verbose). From there you can get the wwnn and wwpn, which you need to configure fcoe ports:
>
> # fcadm create-fcoe-port -i -p Port_WWN -n Node_WWN Ethernet_Interface

Yeah, I did that as well, but the port actually doesn't seem to log in to the fabric on the switch. Shouldn't I see some FLOGI message from the target port on the switch as well?

>
> Then, if you connect the FCoE initiator to the fabric, and search for a LUN, you will see that if you again run:
>
> stmfadm list-target -v
>
> as a logged in initiator, and from there you will get the wwn on the initiator.
>
> With that info you can create the view for the LUN, for the initiator to be able to access it.
>
> Hope this is the right working order, it was a long time ago I did this...
>
> Regards Johan
>
> On 14.12.15 at 16:35, Dan McDonald wrote:
>>> On Dec 14, 2015, at 10:25 AM, Stephan Budach wrote:
>>>
>>> Hi guys,
>>>
>>> I am trying to configure an FCoE target in OmniOS r016, but I can't seem to get it right.
>>> I started out with the documentation for Solaris 11, which seemed appropriate, and configured an FC target and added a view, which granted the FCoE port access to that LUN, but the port doesn't seem to log in to my Nexus 5500 at all.
>>>
>>> This is what I got so far:
>>>
>>> I was able to successfully connect my Linux FCoE initiator to the fabric, but not the OmniOS target. Is there anything obviously wrong with my config?
>> Do you have any LUs configured?
>>
>> stmfadm list-lu
>>
>> You may want to make sure you have at least one. Create a ZFS volume and then add it:
>>
>> zfs create -V <size> poolname/volname
>>
>> stmfadm create-lu /dev/zvol/dsk/poolname/volname
>>
>> Dan
>>
> _______________________________________________
>

From danmcd at omniti.com  Mon Dec 14 19:48:22 2015
From: danmcd at omniti.com (Dan McDonald)
Date: Mon, 14 Dec 2015 14:48:22 -0500
Subject: [OmniOS-discuss] A useful tidbit or two for ESX admins running OmniOS Fibre Targets
In-Reply-To: <65DC5816D4BEE043885A89FD54E273FC6CF69427@MAIL101.Ellipseinc.com>
References: <65DC5816D4BEE043885A89FD54E273FC6CF693F1@MAIL101.Ellipseinc.com> <0FDFF526-04C0-4705-8E81-3B29F031CCD8@omniti.com> <65DC5816D4BEE043885A89FD54E273FC6CF69427@MAIL101.Ellipseinc.com>
Message-ID:

Have you shared a coredump before? I can analyze it to see where it might fall. Typically a kernel panic is Illumos. It might also be OpenZFS, but since Illumos is still upstream it's the safer bet.

Dan

Sent from my iPhone (typos, autocorrect, and all)

> On Dec 14, 2015, at 1:51 PM, Richard Jahnel wrote:
>
> We have discussed it before here on this list. I haven't filed a bug because I don't know how to do so for or where the bug resides.
>
> For example, does it belong to OmniOS, Illumos or OpenZFS?
>
> I don't know and I can't read source code well enough to figure it out.
>
> -----Original Message-----
> From: Dan McDonald [mailto:danmcd at omniti.com]
> Sent: Monday, December 14, 2015 12:36 PM
> To: Richard Jahnel
> Cc: omnios-discuss at lists.omniti.com; Dan McDonald
> Subject: Re: [OmniOS-discuss] A useful tidbit or two for ESX admins running OmniOS Fibre Targets
>
>> On Dec 14, 2015, at 1:22 PM, Richard Jahnel wrote:
>>
>> Limiting the feature flags to those used in R151006 will eliminate the eager zero panic bug currently present in versions R151010 and later including the current LTS R151014.
>
> Is there an illumos bug filed for this? If not, why hasn't there been? Modulo the fiber channel HW, it seems easy enough to reproduce, no?
>
> Dan

From rjahnel at ellipseinc.com  Mon Dec 14 19:53:36 2015
From: rjahnel at ellipseinc.com (Richard Jahnel)
Date: Mon, 14 Dec 2015 19:53:36 +0000
Subject: [OmniOS-discuss] A useful tidbit or two for ESX admins running OmniOS Fibre Targets
In-Reply-To:
References: <65DC5816D4BEE043885A89FD54E273FC6CF693F1@MAIL101.Ellipseinc.com> <0FDFF526-04C0-4705-8E81-3B29F031CCD8@omniti.com> <65DC5816D4BEE043885A89FD54E273FC6CF69427@MAIL101.Ellipseinc.com>
Message-ID: <65DC5816D4BEE043885A89FD54E273FC6CF69477@MAIL101.Ellipseinc.com>

Yes, you looked at it around Oct 12th or 13th of this year.

-----Original Message-----
From: Dan McDonald [mailto:danmcd at omniti.com]
Sent: Monday, December 14, 2015 1:48 PM
To: Richard Jahnel ; Dan McDonald
Cc: omnios-discuss at lists.omniti.com
Subject: Re: [OmniOS-discuss] A useful tidbit or two for ESX admins running OmniOS Fibre Targets

Have you shared a coredump before? I can analyze it to see where it might fall. Typically a kernel panic is Illumos.
It might also be OpenZFS, but since Illumos is still upstream it's the safer bet.

Dan

Sent from my iPhone (typos, autocorrect, and all)

> On Dec 14, 2015, at 1:51 PM, Richard Jahnel wrote:
>
> We have discussed it before here on this list. I haven't filed a bug because I don't know how to do so for or where the bug resides.
>
> For example, does it belong to OmniOS, Illumos or OpenZFS?
>
> I don't know and I can't read source code well enough to figure it out.
>
> -----Original Message-----
> From: Dan McDonald [mailto:danmcd at omniti.com]
> Sent: Monday, December 14, 2015 12:36 PM
> To: Richard Jahnel
> Cc: omnios-discuss at lists.omniti.com; Dan McDonald
> Subject: Re: [OmniOS-discuss] A useful tidbit or two for ESX admins running OmniOS Fibre Targets
>
>> On Dec 14, 2015, at 1:22 PM, Richard Jahnel wrote:
>>
>> Limiting the feature flags to those used in R151006 will eliminate the eager zero panic bug currently present in versions R151010 and later including the current LTS R151014.
>
> Is there an illumos bug filed for this? If not, why hasn't there been? Modulo the fiber channel HW, it seems easy enough to reproduce, no?
>
> Dan
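
For anyone wanting to script the feature-flag workaround described earlier in this thread, here is a minimal sketch, not a tested procedure. It only prints a `zpool create -d` command line with the chosen feature flags enabled, so the result can be reviewed before anything is run. The function name `build_zpool_create`, the pool name `tank` and the disk names are hypothetical placeholders; only `-d` and the `-o feature@NAME=enabled` form come from the example posted above.

```shell
#!/bin/sh
# Dry-run helper: print (never execute) a zpool create command that
# disables all feature flags (-d) and then enables only the listed ones.
# Usage: build_zpool_create POOL "FEATURE1 FEATURE2 ..." VDEV-SPEC...
build_zpool_create() {
    pool=$1; shift
    features=$1; shift            # space-separated feature names
    cmd="zpool create -d"
    for f in $features; do
        cmd="$cmd -o feature@$f=enabled"
    done
    # Remaining arguments are the vdev specification, passed through as-is.
    echo "$cmd $pool $*"
}

# Hypothetical pool and disk names, for illustration only.
build_zpool_create tank "async_destroy empty_bpobj lz4_compress" \
    raidz3 c1t0d0 c1t1d0 c1t2d0 c1t3d0
```

Because the helper only echoes the command, it is safe to run on any machine; the printed line would still need to be checked against the actual disk layout before use.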

From danmcd at omniti.com  Mon Dec 14 22:13:46 2015
From: danmcd at omniti.com (Dan McDonald)
Date: Mon, 14 Dec 2015 17:13:46 -0500
Subject: [OmniOS-discuss] A useful tidbit or two for ESX admins running OmniOS Fibre Targets
In-Reply-To: <65DC5816D4BEE043885A89FD54E273FC6CF69477@MAIL101.Ellipseinc.com>
References: <65DC5816D4BEE043885A89FD54E273FC6CF693F1@MAIL101.Ellipseinc.com> <0FDFF526-04C0-4705-8E81-3B29F031CCD8@omniti.com> <65DC5816D4BEE043885A89FD54E273FC6CF69427@MAIL101.Ellipseinc.com> <65DC5816D4BEE043885A89FD54E273FC6CF69477@MAIL101.Ellipseinc.com>
Message-ID: <33ACC7C1-A2CF-4B5A-9EDF-840B41EE8C5E@omniti.com>

> On Dec 14, 2015, at 2:53 PM, Richard Jahnel wrote:
>
> Yes, you looked at it around Oct 12th or 13th of this year.

And it was... interesting:

panic[cpu11]/thread=ffffff15a1ea4780: hati_pte_map: flags & HAT_LOAD_REMAP

ffffff009a23f850 unix:hati_pte_map+3ab ()
ffffff009a23f8e0 unix:hati_load_common+139 ()
ffffff009a23f960 unix:hat_memload+75 ()
ffffff009a23fa80 genunix:segvn_faultpage+730 ()
ffffff009a23fc50 genunix:segvn_fault+8e6 ()
ffffff009a23fd60 genunix:as_fault+31a ()
ffffff009a23fdf0 unix:pagefault+96 ()
ffffff009a23ff00 unix:trap+2c7 ()
ffffff009a23ff10 unix:cmntrap+e6 ()

Nothing to indicate ZFS or FC... that's a VM subsystem fault.

I do, however, see 78 threads all doing SCSI WRITE_SAME.

Dan

From rjahnel at ellipseinc.com  Mon Dec 14 22:21:51 2015
From: rjahnel at ellipseinc.com (Richard Jahnel)
Date: Mon, 14 Dec 2015 22:21:51 +0000
Subject: [OmniOS-discuss] A useful tidbit or two for ESX admins running OmniOS Fibre Targets
In-Reply-To: <33ACC7C1-A2CF-4B5A-9EDF-840B41EE8C5E@omniti.com>
References: <65DC5816D4BEE043885A89FD54E273FC6CF693F1@MAIL101.Ellipseinc.com> <0FDFF526-04C0-4705-8E81-3B29F031CCD8@omniti.com> <65DC5816D4BEE043885A89FD54E273FC6CF69427@MAIL101.Ellipseinc.com> <65DC5816D4BEE043885A89FD54E273FC6CF69477@MAIL101.Ellipseinc.com> <33ACC7C1-A2CF-4B5A-9EDF-840B41EE8C5E@omniti.com>
Message-ID: <65DC5816D4BEE043885A89FD54E273FC6CF694EB@MAIL101.Ellipseinc.com>

All I can say for sure is that the problem is repeatable given sufficient time.

The test we use to see whether or not a storage volume is susceptible is to create and eager-zero a 4 TB VMDK on the volume, or as large a VMDK as the volume can handle. Most of the time it will panic within the first TB; otherwise it has thus far always panicked before the third.

Volumes made with only the three flags previously listed will not panic and have been tested with eager zeros as large as 8 TB.

This has been tested against R151014 and R151016.

-----Original Message-----
From: Dan McDonald [mailto:danmcd at omniti.com]
Sent: Monday, December 14, 2015 4:14 PM
To: Richard Jahnel ; Dan McDonald
Cc: omnios-discuss at lists.omniti.com
Subject: Re: [OmniOS-discuss] A useful tidbit or two for ESX admins running OmniOS Fibre Targets

> On Dec 14, 2015, at 2:53 PM, Richard Jahnel wrote:
>
> Yes, you looked at it around Oct 12th or 13th of this year.

And it was...
interesting:

panic[cpu11]/thread=ffffff15a1ea4780: hati_pte_map: flags & HAT_LOAD_REMAP

ffffff009a23f850 unix:hati_pte_map+3ab ()
ffffff009a23f8e0 unix:hati_load_common+139 ()
ffffff009a23f960 unix:hat_memload+75 ()
ffffff009a23fa80 genunix:segvn_faultpage+730 ()
ffffff009a23fc50 genunix:segvn_fault+8e6 ()
ffffff009a23fd60 genunix:as_fault+31a ()
ffffff009a23fdf0 unix:pagefault+96 ()
ffffff009a23ff00 unix:trap+2c7 ()
ffffff009a23ff10 unix:cmntrap+e6 ()

Nothing to indicate ZFS or FC... that's a VM subsystem fault.

I do, however, see 78 threads all doing SCSI WRITE_SAME.

Dan

From johan.kragsterman at capvert.se  Mon Dec 14 22:37:00 2015
From: johan.kragsterman at capvert.se (Johan Kragsterman)
Date: Mon, 14 Dec 2015 23:37:00 +0100
Subject: [OmniOS-discuss] Ang: Re: Ang: Re: How to configure FCoE target in OmniOS?
In-Reply-To: <566F1122.5010402@jvm.de>
References: <566F1122.5010402@jvm.de>, <566EE46D.5050507@jvm.de>, <566EDF5E.3000004@jvm.de>
Message-ID:

Hi!

-----Stephan Budach wrote: -----
To: Johan Kragsterman
From: Stephan Budach
Date: 2015-12-14 19:57
Cc: Dan McDonald , omnios-discuss
Subject: Re: Ang: Re: [OmniOS-discuss] How to configure FCoE target in OmniOS?

> Have you enabled the: svcadm enable svc:/system/fcoe_target:default ?

Yes.

>
> What I remember(years ago I did this...), FCoE HBA's show up when you run: stmfadm list-target (-v for verbose). From there you can get the wwnn and wwpn, which you need to configure fcoe ports:
>
> # fcadm create-fcoe-port -i -p Port_WWN -n Node_WWN Ethernet_Interface

Yeah, I did that as well, but the port actually doesn't seem to log in to the fabric on the switch.
Shouldn't I see some FLOGI message from the target port on the switch as well?
>

Yeah, it must register with the name services on the switch, if I remember correctly. It must be the same as with Fibre Channel: the name services must pick it up to be able to serve the name further on to the SAN.

The problem with FCoE is, imho, that the adaptors don't have any BIOS to check. In FC you go into the BIOS and check the presence of storage devices. I don't think you have that possibility with FCoE adaptors, do you? But can you perhaps bypass the switch, and see if you can pick up any device directly?

By the way, what kind of switch do you use? I know there are FCoE switches that have different FCoE ports and FC ports... I mean, if you confused those ports...?

Rgrds Johan

>______________________________________
>

From stephan.budach at JVM.DE  Tue Dec 15 05:55:26 2015
From: stephan.budach at JVM.DE (Stephan Budach)
Date: Tue, 15 Dec 2015 06:55:26 +0100
Subject: [OmniOS-discuss] Ang: Re: Ang: Re: How to configure FCoE target in OmniOS?
In-Reply-To:
References: <566F1122.5010402@jvm.de>, <566EE46D.5050507@jvm.de>, <566EDF5E.3000004@jvm.de>
Message-ID: <566FAB4E.8010201@jvm.de>

Hi Johan,

On 14.12.15 at 23:37, Johan Kragsterman wrote:
> Hi!
>
> -----Stephan Budach wrote: -----
> To: Johan Kragsterman
> From: Stephan Budach
> Date: 2015-12-14 19:57
> Cc: Dan McDonald , omnios-discuss
> Subject: Re: Ang: Re: [OmniOS-discuss] How to configure FCoE target in OmniOS?
>
>> Have you enabled the: svcadm enable svc:/system/fcoe_target:default ?
> Yes.
>> What I remember(years ago I did this...), FCoE HBA's show up when you run: stmfadm list-target (-v for verbose). From there you can get the wwnn and wwpn, which you need to configure fcoe ports:
>>
>> # fcadm create-fcoe-port -i -p Port_WWN -n Node_WWN Ethernet_Interface
> Yeah, I did that as well, but the port actually doesn't seem to log in to the fabric on the switch.
Shouldn't I see some flogi message from the
> target port on the switch as well?
> Yeah, it must register with the name services on the switch, if I remember correctly. Must be the same as with fibre channel, the name services must pick it up to be able to serve the name further on to the SAN.
>
> The problem with FCoE is, imho, that the adaptors don't have any bios to check. In FC you go into the bios and check the presence of storage devices, I don't think you got that possibility in FCoE adaptors, do you? But can you perhaps bypass the switch, and see if you can pick up any device directly?

Hmm, no… I want to use OmniOS as a FCoE target, actually, not an initiator. I got my FCoE initiator on RHEL already set up and logged in to the fabric. I am wondering a bit about the DCB client though - I am still not sure if the X520-T2 has one built in or not. On RHEL I configured DCB_REQUIRED to no and it obviously works, which led me to believe that these Intel CNAs actually do have a DCB client built in.

>
> By the way, what kind of switch do you use? I know there are FCoE switches that have different FCoE ports and FC ports... I mean, if you confused those ports...?

I am using a Nexus 5596 with some fabric extenders, and although I am pretty sure that I configured the ports correctly, I will check that again and see if I made some mistake.

>
> Rgrds Johan

Thanks,
Stephan

>
>
>> ______________________________________
>>
>
>

--
Krebs's 3 Basic Rules for Online Safety
1st - "If you didn't go looking for it, don't install it!"
2nd - "If you installed it, update it."
3rd - "If you no longer need it, remove it."
http://krebsonsecurity.com/2011/05/krebss-3-basic-rules-for-online-safety

Stephan Budach
Head of IT
Jung von Matt/basis GmbH
Glashüttenstraße 79
20357 Hamburg
Tel: +49 40-4321-1353
Fax: +49 40-4321-1114
E-Mail: stephan.budach at jvm.de
Internet: http://www.jvm.com
Managing directors: Dominik Fassl, Christoph K?hler, Ulrich Pallas
AG HH HRB 82024

From stephan.budach at JVM.DE Tue Dec 15 07:56:26 2015
From: stephan.budach at JVM.DE (Stephan Budach)
Date: Tue, 15 Dec 2015 08:56:26 +0100
Subject: [OmniOS-discuss] Ang: Re: Ang: Re: How to configure FCoE target in OmniOS?
In-Reply-To: <566FAB4E.8010201@jvm.de>
References: <566F1122.5010402@jvm.de>, <566EE46D.5050507@jvm.de>, <566EDF5E.3000004@jvm.de> <566FAB4E.8010201@jvm.de>
Message-ID: <566FC7AA.9050402@jvm.de>

On 15.12.15 at 06:55, Stephan Budach wrote:
> Hi Johan,
>
> On 14.12.15 at 23:37, Johan Kragsterman wrote:
>> Hi!
>>
>>
>> -----Stephan Budach wrote: -----
>> To: Johan Kragsterman
>> From: Stephan Budach
>> Date: 2015-12-14 19:57
>> Cc: Dan McDonald, omnios-discuss
>> Subject: Re: Ang: Re: [OmniOS-discuss] How to configure FCoE target in
>> OmniOS?
>>
>>
>>> Have you enabled the: svcadm enable svc:/system/fcoe_target:default ?
>> Yes.
>>> What I remember (years ago I did this...), FCoE HBA's show up when
>>> you run: stmfadm list-target (-v for verbose). From there you can
>>> get the wwnn and wwpn, which you need to configure fcoe ports:
>>>
>>> # fcadm create-fcoe-port -i -p Port_WWN -n Node_WWN Ethernet_Interface
>> Yeah, I did that as well, but the port actually doesn't seem to log in to
>> the fabric on the switch. Shouldn't I see some flogi message from the
>> target port on the switch as well?
>> Yeah, it must register with the name services on the switch, if I
>> remember correctly. Must be the same as with fibre channel, the name
>> services must pick it up to be able to serve the name further on to
>> the SAN.
>>
>> The problem with FCoE is, imho, that the adaptors don't have any
>> bios to check. In FC you go into the bios and check the presence of
>> storage devices, I don't think you got that possibility in FCoE
>> adaptors, do you? But can you perhaps bypass the switch, and see if
>> you can pick up any device directly?
> Hmm, no… I want to use OmniOS as a FCoE target, actually, not an
> initiator. I got my FCoE initiator on RHEL already set up and logged
> in to the fabric. I am wondering a bit about the DCB client though - I
> am still not sure if the X520-T2 has one built in or not. On RHEL I
> configured DCB_REQUIRED to no and it obviously works, which led me
> to believe that these Intel CNAs actually do have a DCB client built in.
>>
>> By the way, what kind of switch do you use? I know there are FCoE
>> switches that have different FCoE ports and FC ports... I mean, if you
>> confused those ports...?
> I am using a Nexus 5596 with some fabric extenders, and although I am
> pretty sure that I configured the ports correctly, I will check that
> again and see if I made some mistake.
>>
>> Rgrds Johan
> Thanks,
> Stephan

I think I need the LLDP package installed, which would give me the DCBX capabilities I seem to be missing, but I can't actually find any package providing that. Does anyone know where that sucker hides? Or what the equivalent in OmniOS is?

Thanks,
Stephan

From johan.kragsterman at capvert.se Tue Dec 15 09:08:02 2015
From: johan.kragsterman at capvert.se (Johan Kragsterman)
Date: Tue, 15 Dec 2015 10:08:02 +0100
Subject: [OmniOS-discuss] Ang: Re: Ang: Re: Ang: Re: How to configure FCoE target in OmniOS?
In-Reply-To: <566FAB4E.8010201@jvm.de>
References: <566FAB4E.8010201@jvm.de>, <566F1122.5010402@jvm.de>, <566EE46D.5050507@jvm.de>, <566EDF5E.3000004@jvm.de>
Message-ID:

Hi!
-----Stephan Budach wrote: -----
To: Johan Kragsterman
From: Stephan Budach
Date: 2015-12-15 06:55
Cc: omnios-discuss
Subject: Re: Ang: Re: Ang: Re: [OmniOS-discuss] How to configure FCoE target in OmniOS?

Hi Johan,

On 14.12.15 at 23:37, Johan Kragsterman wrote:
> Hi!
>
>
> -----Stephan Budach wrote: -----
> To: Johan Kragsterman
> From: Stephan Budach
> Date: 2015-12-14 19:57
> Cc: Dan McDonald, omnios-discuss
> Subject: Re: Ang: Re: [OmniOS-discuss] How to configure FCoE target in OmniOS?
>
>
>> Have you enabled the: svcadm enable svc:/system/fcoe_target:default ?
> Yes.
>> What I remember (years ago I did this...), FCoE HBA's show up when you run: stmfadm list-target (-v for verbose). From there you can get the wwnn and wwpn, which you need to configure fcoe ports:
>>
>> # fcadm create-fcoe-port -i -p Port_WWN -n Node_WWN Ethernet_Interface
> Yeah, I did that as well, but the port actually doesn't seem to log in to
> the fabric on the switch. Shouldn't I see some flogi message from the
> target port on the switch as well?
> Yeah, it must register with the name services on the switch, if I remember correctly. Must be the same as with fibre channel, the name services must pick it up to be able to serve the name further on to the SAN.
>
> The problem with FCoE is, imho, that the adaptors don't have any bios to check. In FC you go into the bios and check the presence of storage devices, I don't think you got that possibility in FCoE adaptors, do you? But can you perhaps bypass the switch, and see if you can pick up any device directly?

Hmm, no… I want to use OmniOS as a FCoE target, actually, not an initiator.

What I mean here is to bypass the switch so that you can rule out the switch as the problem. Disconnect the switch, connect the initiator directly to the target, and see if you can pick up a device.
Rgrds Johan

From johan.kragsterman at capvert.se Tue Dec 15 09:20:30 2015
From: johan.kragsterman at capvert.se (Johan Kragsterman)
Date: Tue, 15 Dec 2015 10:20:30 +0100
Subject: [OmniOS-discuss] Ang: Re: Ang: Re: Ang: Re: How to configure FCoE target in OmniOS?
In-Reply-To: <566FC7AA.9050402@jvm.de>
References: <566FC7AA.9050402@jvm.de>, <566F1122.5010402@jvm.de>, <566EE46D.5050507@jvm.de>, <566EDF5E.3000004@jvm.de> <566FAB4E.8010201@jvm.de>
Message-ID:

Hi!

-----"OmniOS-discuss" wrote: -----
To:
From: Stephan Budach
Sent by: "OmniOS-discuss"
Date: 2015-12-15 08:58
Subject: Re: [OmniOS-discuss] Ang: Re: Ang: Re: How to configure FCoE target in OmniOS?

On 15.12.15 at 06:55, Stephan Budach wrote:
> Hi Johan,
>
> On 14.12.15 at 23:37, Johan Kragsterman wrote:
>> Hi!
>>
>>
>> -----Stephan Budach wrote: -----
>> To: Johan Kragsterman
>> From: Stephan Budach
>> Date: 2015-12-14 19:57
>> Cc: Dan McDonald, omnios-discuss
>> Subject: Re: Ang: Re: [OmniOS-discuss] How to configure FCoE target in
>> OmniOS?
>>
>>
>>> Have you enabled the: svcadm enable svc:/system/fcoe_target:default ?
>> Yes.
>>> What I remember (years ago I did this...), FCoE HBA's show up when
>>> you run: stmfadm list-target (-v for verbose). From there you can
>>> get the wwnn and wwpn, which you need to configure fcoe ports:
>>>
>>> # fcadm create-fcoe-port -i -p Port_WWN -n Node_WWN Ethernet_Interface
>> Yeah, I did that as well, but the port actually doesn't seem to log in to
>> the fabric on the switch. Shouldn't I see some flogi message from the
>> target port on the switch as well?
>> Yeah, it must register with the name services on the switch, if I
>> remember correctly. Must be the same as with fibre channel, the name
>> services must pick it up to be able to serve the name further on to
>> the SAN.
>>
>> The problem with FCoE is, imho, that the adaptors don't have any
>> bios to check. In FC you go into the bios and check the presence of
>> storage devices, I don't think you got that possibility in FCoE
>> adaptors, do you? But can you perhaps bypass the switch, and see if
>> you can pick up any device directly?
> Hmm, no… I want to use OmniOS as a FCoE target, actually, not an
> initiator. I got my FCoE initiator on RHEL already set up and logged
> in to the fabric. I am wondering a bit about the DCB client though - I
> am still not sure if the X520-T2 has one built in or not. On RHEL I
> configured DCB_REQUIRED to no and it obviously works, which led me
> to believe that these Intel CNAs actually do have a DCB client built in.
>>
>> By the way, what kind of switch do you use? I know there are FCoE
>> switches that have different FCoE ports and FC ports... I mean, if you
>> confused those ports...?
> I am using a Nexus 5596 with some fabric extenders, and although I am
> pretty sure that I configured the ports correctly, I will check that
> again and see if I made some mistake.
>>
>> Rgrds Johan
> Thanks,
> Stephan

I think I need the LLDP package installed, which would give me the DCBX capabilities I seem to be missing, but I can't actually find any package providing that. Does anyone know where that sucker hides? Or what the equivalent in OmniOS is?

Thanks,
Stephan

Now I remember that Gea (Günther Alka) @ napp-it has FCoE working on his napp-it server. He uses OmniOS as a base. He's on this list frequently.
Otherwise:

http://napp-it.org/
gea at napp-it.org
alka at hfg-gmuend.de

Regards Johan

_______________________________________________
OmniOS-discuss mailing list
OmniOS-discuss at lists.omniti.com
http://lists.omniti.com/mailman/listinfo/omnios-discuss

From alka at hfg-gmuend.de Tue Dec 15 11:31:00 2015
From: alka at hfg-gmuend.de (Guenther Alka)
Date: Tue, 15 Dec 2015 12:31:00 +0100
Subject: [OmniOS-discuss] Bloody update for December 11th
In-Reply-To:
References: <566B081C.8@hfg-gmuend.de>
Message-ID: <566FF9F4.5010902@hfg-gmuend.de>

I have updated the pdf with results from OmniOS bloody.

Main results for 10G Ethernet on OSX 10.11 and Windows 8.1

- OS version and client network driver are very critical for 10G:
  on some configs or with some driver releases 10G is not faster than 1G (mostly on reads)
- From Windows, performance to Solaris is similar to performance to OmniOS (at a lower level than with OSX)
- From OSX, SMB2 to Solaris is faster than to OmniOS
- OSX is faster than Windows on SMB2 reads and writes out of the box

SMB2 performance on OSX goes up to > 600 MB/s on writes and > 800 MB/s on reads
SMB performance on Windows goes up to > 300 MB/s on writes and > 600 MB/s on reads

This is a quick "out of the box" check with SMB2 and jumbo frames as the only special settings on OSX. On Windows 8.1, defaults + MTU 9000 are used. Maybe we need some additional tweaks on Windows.

On 12.12.2015 at 22:07, Bob Friesenhahn wrote:
> On Fri, 11 Dec 2015, Günther Alka wrote:
>
>> Many Thanks to Nexenta
>> and to OmniTi for this december bloody with SMB 2
>>
>> I have just done some tests on OSX under Solaris 11.3 to check some
>> configuration
>> options for a ZFS video editing storage server for my Mac Pros.
>
> Do you plan to add tests with the implementation in OmniOS bloody? The
> Nexenta implementation might be quite a lot different than the Oracle
> Solaris one. Perhaps it might even fail with your tests.
>
>> There are two must have principles: SMB2 and Jumboframes
>> see http://napp-it.org/doc/downloads/performance_smb2.pdf
>
> I was surprised to see the huge improvement with jumbo frames.
>
> Bob

--
H f G
Hochschule für Gestaltung
university of design

Schwäbisch Gmünd
Rektor-Klaus Str. 100
73525 Schwäbisch Gmünd

Guenther Alka, Dipl.-Ing. (FH)
Leiter des Rechenzentrums
head of computer center

Tel 07171 602 627
Fax 07171 69259
guenther.alka at hfg-gmuend.de
http://rz.hfg-gmuend.de

From mtalbott at lji.org Wed Dec 16 05:07:45 2015
From: mtalbott at lji.org (Michael Talbott)
Date: Tue, 15 Dec 2015 21:07:45 -0800
Subject: [OmniOS-discuss] OmniOS and Veeam
Message-ID: <1B979ADA-EF76-422E-AFB0-AB31DED8A046@lji.org>

I'd like to use an OmniOS box for a Veeam backup repository. The only solution I've been able to come up with thus far is to create a Linux VM, mount an OmniOS nfs export and then point Veeam to the Linux box's mounted nfs volume. Kinda clunky and far from optimal, but seems to get the job done. I'd like to eliminate the middleman if possible.

From my understanding Veeam just uses some perl script over ssh to query for a few things like finding mounted filesystems, etc. And then it uses NFS for the actual transfers. But that perl script doesn't properly complete on OmniOS probably due to some slightly different call requirements.

Anybody on here been through this and know of any workarounds to make it work natively? Or can anyone think of a less clunky workaround?

Thanks,

Michael

From danmcd at omniti.com Wed Dec 16 14:46:38 2015
From: danmcd at omniti.com (Dan McDonald)
Date: Wed, 16 Dec 2015 09:46:38 -0500
Subject: [OmniOS-discuss] OmniOS and Veeam
In-Reply-To: <1B979ADA-EF76-422E-AFB0-AB31DED8A046@lji.org>
References: <1B979ADA-EF76-422E-AFB0-AB31DED8A046@lji.org>
Message-ID:

I know nothing about Veeam, but...

> On Dec 16, 2015, at 12:07 AM, Michael Talbott wrote:
>
> I'd like to use an OmniOS box for a Veeam backup repository.
The only solution I've been able to come up with thus far is to create a Linux VM, mount an OmniOS nfs export and then point Veeam to the Linux box's mounted nfs volume. Kinda clunky and far from optimal, but seems to get the job done. I'd like to eliminate the middleman if possible.
>
> From my understanding Veeam just uses some perl script over ssh to query for a few things like finding mounted filesystems, etc. And then it uses NFS for the actual transfers. But that perl script doesn't properly complete on OmniOS probably due to some slightly different call requirements.

If you showed people the failure output, that might help.

Dan

From danmcd at omniti.com Thu Dec 17 01:38:23 2015
From: danmcd at omniti.com (Dan McDonald)
Date: Wed, 16 Dec 2015 20:38:23 -0500
Subject: [OmniOS-discuss] Updates for LTS (r151014) and Stable (r151016)
Message-ID: <4240E754-A583-4CA4-A079-631EB3954FBF@omniti.com>

The updates for LTS and Stable are identical this time. New release media is out, and if you "pkg update" you will need to reboot, because of kernel ZFS changes.

This update includes:

* BIND security update to 9.10.3-P2

* ZFS now receives replication streams with a refquota even if older snapshots exceed it (illumos 4986). Includes new ZFS Test Suite test.

* OpenSSH now integrates with the illumos audit subsystem. Thanks to Joyent; this is part of getting OpenSSH to match SunSSH's integrated functionality.

* NVMe bugfixes (illumos 6466 and 6467).

Modulo disaster, this will be the last update for calendar year 2015. After this week ends, I will be on vacation (just relaxing at home with my family), but I will be occasionally reading mail. My latency will be VERY HIGH after COB Friday, US/Eastern.

Have an enjoyable holiday season, whatever you do or don't celebrate, and catch you in 2016!

Dan

From henson at acm.org Thu Dec 17 02:12:18 2015
From: henson at acm.org (Paul B.
Henson)
Date: Wed, 16 Dec 2015 18:12:18 -0800
Subject: [OmniOS-discuss] Updates for LTS (r151014) and Stable (r151016)
In-Reply-To: <4240E754-A583-4CA4-A079-631EB3954FBF@omniti.com>
References: <4240E754-A583-4CA4-A079-631EB3954FBF@omniti.com>
Message-ID: <20151217021217.GC3405@bender.unx.cpp.edu>

On Wed, Dec 16, 2015 at 08:38:23PM -0500, Dan McDonald wrote:

> * ZFS now receives replication streams with a refquota even if older
> snapshots exceed it (illumos 4986). Includes new ZFS Test Suite test.

Woo-hoo! We'll be testing this out straight away, thanks much for resolving it so quickly.

> After this week ends, I will be on vacation (just relaxing at home
> with my family), but I will be occasionally reading mail. My latency
> will be VERY HIGH after COB Friday, US/Eastern.

I'm off after this week too, but I fear I'm probably more of a workaholic than you are ;). One of my holiday plans is to update/reboot my home storage server and hope nothing blows chunks 8-/, then I can finally add back my L2ARC devices.

From wonko at 4amlunch.net Thu Dec 17 19:05:21 2015
From: wonko at 4amlunch.net (Brian Hechinger)
Date: Thu, 17 Dec 2015 14:05:21 -0500
Subject: [OmniOS-discuss] Hung ZFS Pool
In-Reply-To: <7D06CC38-9841-4189-80CD-6341E025B10C@4amlunch.net>
References: <25D313E7-C974-43C0-817E-3A96514BCC16@4amlunch.net> <3FF750E3-A2C5-467C-A0D2-BDCC8C48C9CA@4amlunch.net> <7F5D451E-6467-4A3D-8785-AE069524452A@omniti.com> <4B858828-C823-4251-84A9-417028B01B3C@omniti.com> <584980F4-502A-4700-A58F-E720CB398BF0@4amlunch.net> <4B0CFB00-2181-4E38-B0E1-8AAAA3E6136C@omniti.com> <7D06CC38-9841-4189-80CD-6341E025B10C@4amlunch.net>
Message-ID:

Ok, let's add to the weirdness.

I destroyed the degraded pool. I re-created it. I then re-ran iozone. It completed with zero errors on the pool.
iozone did have some issues at the end, but the FS seems ok:

  pool: zoom
 state: ONLINE
  scan: none requested
config:

        NAME          STATE     READ WRITE CKSUM
        zoom          ONLINE       0     0     0
          mirror-0    ONLINE       0     0     0
            c4t1d0s1  ONLINE       0     0     0
            c5t1d0s1  ONLINE       0     0     0

errors: No known data errors

        Iozone: Performance Test of File I/O
                Version $Revision: 3.434 $
                Compiled for 64 bit mode.
                Build: Solaris10gcc-64

        Contributors:William Norcott, Don Capps, Isom Crawford, Kirby Collins
                     Al Slater, Scott Rhine, Mike Wisner, Ken Goss
                     Steve Landherr, Brad Smith, Mark Kelly, Dr. Alain CYR,
                     Randy Dunlap, Mark Montague, Dan Million, Gavin Brebner,
                     Jean-Marc Zucconi, Jeff Blomberg, Benny Halevy, Dave Boone,
                     Erik Habbinga, Kris Strecker, Walter Wong, Joshua Root,
                     Fabrice Bacchella, Zhenghua Xue, Qin Li, Darren Sawyer,
                     Vangel Bojaxhi, Ben England, Vikentsi Lapa,
                     Alexey Skidanov.

        Run began: Thu Dec 17 13:21:59 2015

        Multi_buffer. Work area 16777216 bytes
        OPS Mode. Output is in operations per second.
        Record Size 8 kB
        SYNC Mode.
        File size set to 2097152 kB
        Command line used: /usr/local/bin/iozone -m -t 16 -T -O -r 8k -o -s 2G
        Time Resolution = 0.000001 seconds.
        Processor cache size set to 1024 kBytes.
        Processor cache line size set to 32 bytes.
        File stride size set to 17 * record size.
        Throughput test with 16 threads
        Each thread writes a 2097152 kByte file in 8 kByte records

        Children see throughput for 16 initial writers  =   29558.13 ops/sec
        Parent sees throughput for 16 initial writers   =   29467.57 ops/sec
        Min throughput per thread                       =    1845.28 ops/sec
        Max throughput per thread                       =    1853.28 ops/sec
        Avg throughput per thread                       =    1847.38 ops/sec
        Min xfer                                        =  261012.00 ops

        Children see throughput for 16 rewriters        =   26802.94 ops/sec
        Parent sees throughput for 16 rewriters         =   26801.51 ops/sec
        Min throughput per thread                       =    1671.70 ops/sec
        Max throughput per thread                       =    1679.40 ops/sec
        Avg throughput per thread                       =    1675.18 ops/sec
        Min xfer                                        =  260942.00 ops

        Children see throughput for 16 readers          =  305525.26 ops/sec
        Parent sees throughput for 16 readers           =  304910.58 ops/sec
        Min throughput per thread                       =   16371.37 ops/sec
        Max throughput per thread                       =   20084.48 ops/sec
        Avg throughput per thread                       =   19095.33 ops/sec
        Min xfer                                        =  213905.00 ops

        Children see throughput for 16 re-readers       =  301510.86 ops/sec
        Parent sees throughput for 16 re-readers        =  301021.85 ops/sec
        Min throughput per thread                       =   16066.28 ops/sec
        Max throughput per thread                       =   19850.40 ops/sec
        Avg throughput per thread                       =   18844.43 ops/sec
        Min xfer                                        =  212289.00 ops

        Children see throughput for 16 reverse readers  =  520691.82 ops/sec
        Parent sees throughput for 16 reverse readers   =  520026.68 ops/sec
        Min throughput per thread                       =   30897.40 ops/sec
        Max throughput per thread                       =   33412.20 ops/sec
        Avg throughput per thread                       =   32543.24 ops/sec
        Min xfer                                        =  242448.00 ops

        Children see throughput for 16 stride readers   =   27067.77 ops/sec
        Parent sees throughput for 16 stride readers    =   27064.74 ops/sec
        Min throughput per thread                       =    1549.09 ops/sec
        Max throughput per thread                       =    3205.10 ops/sec
        Avg throughput per thread                       =    1691.74 ops/sec
        Min xfer                                        =  126699.00 ops

        Children see throughput for 16 random readers   =  215258.98 ops/sec
        Parent sees throughput for 16 random readers    =  214461.71 ops/sec
        Min throughput per thread                       =    2759.80 ops/sec
        Max throughput per thread                       =  169551.89 ops/sec
        Avg throughput per thread                       =   13453.69 ops/sec
        Min xfer                                        =    4281.00 ops

        Children see throughput for 16 mixed workload   =    8673.89 ops/sec
        Parent sees throughput for 16 mixed workload    =    6341.03 ops/sec
        Min throughput per thread                       =     442.73 ops/sec
        Max throughput per thread                       =     641.36 ops/sec
        Avg throughput per thread                       =     542.12 ops/sec
        Min xfer                                        =  180991.00 ops

        Children see throughput for 16 random writers   =    4008.54 ops/sec
        Parent sees throughput for 16 random writers    =    3972.48 ops/sec
        Min throughput per thread                       =     248.54 ops/sec
        Max throughput per thread                       =     252.76 ops/sec
        Avg throughput per thread                       =     250.53 ops/sec
        Min xfer                                        =  257769.00 ops

        Children see throughput for 16 fwriters         =   70222.20 ops/sec
        Parent sees throughput for 16 fwriters          =   65632.32 ops/sec
        Min throughput per thread                       =    4132.12 ops/sec
        Max throughput per thread                       =    4686.85 ops/sec
        Avg throughput per thread                       =    4388.89 ops/sec
        Min xfer                                        =  262144.00 ops

Error in file: Found "0" Expecting "7979797979797979" addr 29f6770
Error in file: Found "0" Expecting "7979797979797979" addr 29f6770
Error in file: Position 0
Error in file: Position 0
Record # 0 Record size 8 kb
Record # 0 Record size 8 kb
where 29f6770x loop 0
where 29f6770x loop 0

I can delete and create files just fine.

Grrrr.

-brian

> On Dec 9, 2015, at 11:27 AM, Brian Hechinger wrote:
>
>
>> On Dec 9, 2015, at 11:22 AM, Dan McDonald wrote:
>>
>>
>>> On Dec 9, 2015, at 11:18 AM, Brian Hechinger wrote:
>>>
>>> It's brand new!!
>>
>> Sometimes you get flaky HW that's new. I've had to return new spinning-rust disks, for example.
>
> Bah. :(
>
>>
>>> Also, I would expect the other slice to be affected as well? It's been humming along just fine as SLOG with no errors:
>>>
>>>         logs
>>>           mirror-3    ONLINE       0     0     0
>>>             c4t1d0s0  ONLINE       0     0     0
>>>             c5t1d0s0  ONLINE       0     0     0
>>
>> Could just be bad luck your slog hasn't encountered the bad portion of this drive.
>
> I suppose. You think there is maybe a good way to test this device before I try to get it RMA-ed?
>
>> Also, what OmniOS revision are you running? If you're not up to the latest November r151014 update, you may be missing some NVMe fixes.
>
> Oh right, totally forgot to do that for you:
>
> wonko at basket1:/var/adm$ head /etc/release ; uname -a
>   OmniOS v11 r151016
>   Copyright 2015 OmniTI Computer Consulting, Inc. All rights reserved.
>   Use is subject to license terms.
> SunOS basket1 5.11 omnios-073d8c0 i86pc i386 i86pc
>

From danmcd at omniti.com Thu Dec 17 19:15:11 2015
From: danmcd at omniti.com (Dan McDonald)
Date: Thu, 17 Dec 2015 14:15:11 -0500
Subject: [OmniOS-discuss] Hung ZFS Pool
In-Reply-To:
References: <25D313E7-C974-43C0-817E-3A96514BCC16@4amlunch.net> <3FF750E3-A2C5-467C-A0D2-BDCC8C48C9CA@4amlunch.net> <7F5D451E-6467-4A3D-8785-AE069524452A@omniti.com> <4B858828-C823-4251-84A9-417028B01B3C@omniti.com> <584980F4-502A-4700-A58F-E720CB398BF0@4amlunch.net> <4B0CFB00-2181-4E38-B0E1-8AAAA3E6136C@omniti.com> <7D06CC38-9841-4189-80CD-6341E025B10C@4amlunch.net>
Message-ID: <1A509267-5ADB-451C-A540-5F49367B7C22@omniti.com>

> On Dec 17, 2015, at 2:05 PM, Brian Hechinger wrote:
>
> I can delete and create files just fine.
>
> Grrrr.

Scrub it now. Just in case. A scrub is always a good idea anyway just to make sure bits haven't rotted on the disk.
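
For anyone who wants to script the "scrub, then check" step: below is a minimal sketch, assuming `zpool status` prints a NAME/STATE/READ/WRITE/CKSUM header followed by one row per vdev, as in the outputs posted in this thread. The function name devices_with_errors is hypothetical, and the SAMPLE text is the scrub result from this thread with its whitespace reconstructed. It only parses text that has already been captured; it does not run zpool itself.

```python
def devices_with_errors(status_text):
    """Map vdev name -> (READ, WRITE, CKSUM) for rows with any nonzero counter.

    Counters are kept as strings because zpool may abbreviate large values.
    """
    flagged = {}
    in_table = False
    for line in status_text.splitlines():
        fields = line.split()
        # The config table starts at the column header row.
        if fields[:5] == ["NAME", "STATE", "READ", "WRITE", "CKSUM"]:
            in_table = True
            continue
        # The trailing "errors:" summary line ends the table.
        if in_table and fields and fields[0] == "errors:":
            break
        # Skip everything before the table, blank lines, and section
        # labels such as "logs" that carry no counters.
        if not in_table or len(fields) < 5:
            continue
        counters = tuple(fields[2:5])
        if any(c != "0" for c in counters):
            flagged[fields[0]] = counters
    return flagged


# Sample taken from the scrub result posted in this thread (layout assumed).
SAMPLE = """\
  pool: zoom
 state: DEGRADED
  scan: scrub repaired 226K in 0h0m with 0 errors on Thu Dec 17 14:15:12 2015
config:

        NAME          STATE     READ WRITE CKSUM
        zoom          DEGRADED     0     0     0
          mirror-0    DEGRADED     0     0     0
            c4t1d0s1  DEGRADED     0     0    38  too many errors
            c5t1d0s1  DEGRADED     0     0    42  too many errors

errors: No known data errors
"""

if __name__ == "__main__":
    for name, (rd, wr, ck) in devices_with_errors(SAMPLE).items():
        print(f"{name}: read={rd} write={wr} cksum={ck}")
```

On the sample above this flags only the two mirror halves, which matches the point made in the thread: the checksum errors sit on both NVMe slices, not on the pool or mirror rows.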
Dan

From danmcd at omniti.com Thu Dec 17 19:15:55 2015
From: danmcd at omniti.com (Dan McDonald)
Date: Thu, 17 Dec 2015 14:15:55 -0500
Subject: [OmniOS-discuss] Hung ZFS Pool
In-Reply-To: <1A509267-5ADB-451C-A540-5F49367B7C22@omniti.com>
References: <25D313E7-C974-43C0-817E-3A96514BCC16@4amlunch.net> <3FF750E3-A2C5-467C-A0D2-BDCC8C48C9CA@4amlunch.net> <7F5D451E-6467-4A3D-8785-AE069524452A@omniti.com> <4B858828-C823-4251-84A9-417028B01B3C@omniti.com> <584980F4-502A-4700-A58F-E720CB398BF0@4amlunch.net> <4B0CFB00-2181-4E38-B0E1-8AAAA3E6136C@omniti.com> <7D06CC38-9841-4189-80CD-6341E025B10C@4amlunch.net> <1A509267-5ADB-451C-A540-5F49367B7C22@omniti.com>
Message-ID: <68FC6DA2-5431-4AD5-8F4F-1023ED9CAA15@omniti.com>

> On Dec 17, 2015, at 2:15 PM, Dan McDonald wrote:
>
> Scrub it now. Just in case. A scrub is always a good idea anyway just to make sure bits haven't rotted on the disk.

Pardon me if I'm being pedantic:

        zpool scrub zoom

Then check it with zpool status.

Dan

From wonko at 4amlunch.net Thu Dec 17 19:17:29 2015
From: wonko at 4amlunch.net (Brian Hechinger)
Date: Thu, 17 Dec 2015 14:17:29 -0500
Subject: [OmniOS-discuss] Hung ZFS Pool
In-Reply-To: <1A509267-5ADB-451C-A540-5F49367B7C22@omniti.com>
References: <25D313E7-C974-43C0-817E-3A96514BCC16@4amlunch.net> <3FF750E3-A2C5-467C-A0D2-BDCC8C48C9CA@4amlunch.net> <7F5D451E-6467-4A3D-8785-AE069524452A@omniti.com> <4B858828-C823-4251-84A9-417028B01B3C@omniti.com> <584980F4-502A-4700-A58F-E720CB398BF0@4amlunch.net> <4B0CFB00-2181-4E38-B0E1-8AAAA3E6136C@omniti.com> <7D06CC38-9841-4189-80CD-6341E025B10C@4amlunch.net> <1A509267-5ADB-451C-A540-5F49367B7C22@omniti.com>
Message-ID:

Boom.

wonko at basket1:/export/home/wonko$ sudo zpool scrub zoom
Password:
wonko at basket1:/export/home/wonko$ sudo zpool status -v zoom
  pool: zoom
 state: DEGRADED
status: One or more devices has experienced an unrecoverable error. An
        attempt was made to correct the error. Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors
        using 'zpool clear' or replace the device with 'zpool replace'.
   see: http://illumos.org/msg/ZFS-8000-9P
  scan: scrub repaired 226K in 0h0m with 0 errors on Thu Dec 17 14:15:12 2015
config:

        NAME          STATE     READ WRITE CKSUM
        zoom          DEGRADED     0     0     0
          mirror-0    DEGRADED     0     0     0
            c4t1d0s1  DEGRADED     0     0    38  too many errors
            c5t1d0s1  DEGRADED     0     0    42  too many errors

errors: No known data errors

-brian

> On Dec 17, 2015, at 2:15 PM, Dan McDonald wrote:
>
>
>> On Dec 17, 2015, at 2:05 PM, Brian Hechinger wrote:
>>
>> I can delete and create files just fine.
>>
>> Grrrr.
>
> Scrub it now. Just in case. A scrub is always a good idea anyway just to make sure bits haven't rotted on the disk.
>
> Dan
>

From danmcd at omniti.com Thu Dec 17 19:18:13 2015
From: danmcd at omniti.com (Dan McDonald)
Date: Thu, 17 Dec 2015 14:18:13 -0500
Subject: [OmniOS-discuss] Hung ZFS Pool
In-Reply-To:
References: <25D313E7-C974-43C0-817E-3A96514BCC16@4amlunch.net> <3FF750E3-A2C5-467C-A0D2-BDCC8C48C9CA@4amlunch.net> <7F5D451E-6467-4A3D-8785-AE069524452A@omniti.com> <4B858828-C823-4251-84A9-417028B01B3C@omniti.com> <584980F4-502A-4700-A58F-E720CB398BF0@4amlunch.net> <4B0CFB00-2181-4E38-B0E1-8AAAA3E6136C@omniti.com> <7D06CC38-9841-4189-80CD-6341E025B10C@4amlunch.net> <1A509267-5ADB-451C-A540-5F49367B7C22@omniti.com>
Message-ID: <7FF77844-B41C-4837-A81F-42F32EF72AAC@omniti.com>

> On Dec 17, 2015, at 2:17 PM, Brian Hechinger wrote:
>
> Boom.
>
> wonko at basket1:/export/home/wonko$ sudo zpool scrub zoom
> Password:
> wonko at basket1:/export/home/wonko$ sudo zpool status -v zoom
>   pool: zoom
>  state: DEGRADED
> status: One or more devices has experienced an unrecoverable error. An
>         attempt was made to correct the error. Applications are unaffected.
> action: Determine if the device needs to be replaced, and clear the errors
>         using 'zpool clear' or replace the device with 'zpool replace'.
>    see: http://illumos.org/msg/ZFS-8000-9P
>   scan: scrub repaired 226K in 0h0m with 0 errors on Thu Dec 17 14:15:12 2015
> config:
>
>         NAME          STATE     READ WRITE CKSUM
>         zoom          DEGRADED     0     0     0
>           mirror-0    DEGRADED     0     0     0
>             c4t1d0s1  DEGRADED     0     0    38  too many errors
>             c5t1d0s1  DEGRADED     0     0    42  too many errors
>
> errors: No known data errors

Looks like you got bad drives.

Dan

From wonko at 4amlunch.net Thu Dec 17 19:20:13 2015
From: wonko at 4amlunch.net (Brian Hechinger)
Date: Thu, 17 Dec 2015 14:20:13 -0500
Subject: [OmniOS-discuss] Hung ZFS Pool
In-Reply-To: <7FF77844-B41C-4837-A81F-42F32EF72AAC@omniti.com>
References: <25D313E7-C974-43C0-817E-3A96514BCC16@4amlunch.net> <3FF750E3-A2C5-467C-A0D2-BDCC8C48C9CA@4amlunch.net> <7F5D451E-6467-4A3D-8785-AE069524452A@omniti.com> <4B858828-C823-4251-84A9-417028B01B3C@omniti.com> <584980F4-502A-4700-A58F-E720CB398BF0@4amlunch.net> <4B0CFB00-2181-4E38-B0E1-8AAAA3E6136C@omniti.com> <7D06CC38-9841-4189-80CD-6341E025B10C@4amlunch.net> <1A509267-5ADB-451C-A540-5F49367B7C22@omniti.com> <7FF77844-B41C-4837-A81F-42F32EF72AAC@omniti.com>
Message-ID:

That seems... unlikely to me.

I'll put one of them into a linux box and see what happens with it.

Is there a way to somehow see if the nvme drivers are being wonky? I get the feeling NVMe 1.1 cards aren't completely supported just yet?

-brian

> On Dec 17, 2015, at 2:18 PM, Dan McDonald wrote:
>
>>
>> On Dec 17, 2015, at 2:17 PM, Brian Hechinger wrote:
>>
>> Boom.
>>
>> wonko at basket1:/export/home/wonko$ sudo zpool scrub zoom
>> Password:
>> wonko at basket1:/export/home/wonko$ sudo zpool status -v zoom
>>   pool: zoom
>>  state: DEGRADED
>> status: One or more devices has experienced an unrecoverable error. An
>>         attempt was made to correct the error. Applications are unaffected.
>> action: Determine if the device needs to be replaced, and clear the errors
>>         using 'zpool clear' or replace the device with 'zpool replace'.
>>    see: http://illumos.org/msg/ZFS-8000-9P
>>   scan: scrub repaired 226K in 0h0m with 0 errors on Thu Dec 17 14:15:12 2015
>> config:
>>
>>         NAME          STATE     READ WRITE CKSUM
>>         zoom          DEGRADED     0     0     0
>>           mirror-0    DEGRADED     0     0     0
>>             c4t1d0s1  DEGRADED     0     0    38  too many errors
>>             c5t1d0s1  DEGRADED     0     0    42  too many errors
>>
>> errors: No known data errors
>
> Looks like you got bad drives.
>
> Dan

From danmcd at omniti.com Thu Dec 17 19:21:33 2015
From: danmcd at omniti.com (Dan McDonald)
Date: Thu, 17 Dec 2015 14:21:33 -0500
Subject: [OmniOS-discuss] Hung ZFS Pool
In-Reply-To:
References: <25D313E7-C974-43C0-817E-3A96514BCC16@4amlunch.net> <3FF750E3-A2C5-467C-A0D2-BDCC8C48C9CA@4amlunch.net> <7F5D451E-6467-4A3D-8785-AE069524452A@omniti.com> <4B858828-C823-4251-84A9-417028B01B3C@omniti.com> <584980F4-502A-4700-A58F-E720CB398BF0@4amlunch.net> <4B0CFB00-2181-4E38-B0E1-8AAAA3E6136C@omniti.com> <7D06CC38-9841-4189-80CD-6341E025B10C@4amlunch.net> <1A509267-5ADB-451C-A540-5F49367B7C22@omniti.com> <7FF77844-B41C-4837-A81F-42F32EF72AAC@omniti.com>
Message-ID: <4C5FB8BB-861E-4425-9F96-17D8E337A5AA@omniti.com>

> On Dec 17, 2015, at 2:20 PM, Brian Hechinger wrote:
>
> That seems... unlikely to me.
>
> I'll put one of them into a linux box and see what happens with it.
>
> Is there a way to somehow see if the nvme drivers are being wonky? I get the feeling NVMe 1.1 cards aren't completely supported just yet?

OH SHOOT! I forgot these are NVMe.

Did you see my mail announcing the update? Did you see it has two NVMe fixes in it?

Dan

From wonko at 4amlunch.net  Thu Dec 17 19:23:26 2015
From: wonko at 4amlunch.net (Brian Hechinger)
Date: Thu, 17 Dec 2015 14:23:26 -0500
Subject: [OmniOS-discuss] Hung ZFS Pool
In-Reply-To: <4C5FB8BB-861E-4425-9F96-17D8E337A5AA@omniti.com>
References: <25D313E7-C974-43C0-817E-3A96514BCC16@4amlunch.net> <3FF750E3-A2C5-467C-A0D2-BDCC8C48C9CA@4amlunch.net> <7F5D451E-6467-4A3D-8785-AE069524452A@omniti.com> <4B858828-C823-4251-84A9-417028B01B3C@omniti.com> <584980F4-502A-4700-A58F-E720CB398BF0@4amlunch.net> <4B0CFB00-2181-4E38-B0E1-8AAAA3E6136C@omniti.com> <7D06CC38-9841-4189-80CD-6341E025B10C@4amlunch.net> <1A509267-5ADB-451C-A540-5F49367B7C22@omniti.com> <7FF77844-B41C-4837-A81F-42F32EF72AAC@omniti.com> <4C5FB8BB-861E-4425-9F96-17D8E337A5AA@omniti.com>
Message-ID: <04185D4F-4394-4517-B395-89012DB9BE66@4amlunch.net>

Yeah, I think the one I already had (the init() related one that Hans gave me) but I wonder if the other one is somehow related?

I've installed the updates.

I'll re-create the pool and re-run iozone

-brian

> On Dec 17, 2015, at 2:21 PM, Dan McDonald wrote:
>
>
>> On Dec 17, 2015, at 2:20 PM, Brian Hechinger wrote:
>>
>> That seems... unlikely to me?
>>
>> I'll put one of them into a linux box and see what happens with it.
>>
>> Is there a way to somehow see if the nvme drivers are being wonky? I get the feeling NVMe 1.1 cards aren't completely supported just yet?
>
> OH SHOOT! I forgot these are NVMe.
>
> Did you see my mail announcing the update? Did you see it has two NVME fixes in it?
>
> Dan
>

From wonko at 4amlunch.net  Thu Dec 17 19:38:59 2015
From: wonko at 4amlunch.net (Brian Hechinger)
Date: Thu, 17 Dec 2015 14:38:59 -0500
Subject: [OmniOS-discuss] Hung ZFS Pool
In-Reply-To: <04185D4F-4394-4517-B395-89012DB9BE66@4amlunch.net>
References: <25D313E7-C974-43C0-817E-3A96514BCC16@4amlunch.net> <3FF750E3-A2C5-467C-A0D2-BDCC8C48C9CA@4amlunch.net> <7F5D451E-6467-4A3D-8785-AE069524452A@omniti.com> <4B858828-C823-4251-84A9-417028B01B3C@omniti.com> <584980F4-502A-4700-A58F-E720CB398BF0@4amlunch.net> <4B0CFB00-2181-4E38-B0E1-8AAAA3E6136C@omniti.com> <7D06CC38-9841-4189-80CD-6341E025B10C@4amlunch.net> <1A509267-5ADB-451C-A540-5F49367B7C22@omniti.com> <7FF77844-B41C-4837-A81F-42F32EF72AAC@omniti.com> <4C5FB8BB-861E-4425-9F96-17D8E337A5AA@omniti.com> <04185D4F-4394-4517-B395-89012DB9BE66@4amlunch.net>
Message-ID: <31E95CC0-CFE7-46BE-ABF2-F099ADBB7527@4amlunch.net>

And...

  pool: zoom
 state: DEGRADED
status: One or more devices has experienced an error resulting in data
        corruption. Applications may be affected.
action: Restore the file in question if possible. Otherwise restore the
        entire pool from backup.
   see: http://illumos.org/msg/ZFS-8000-8A
  scan: none requested
config:

        NAME          STATE     READ WRITE CKSUM
        zoom          DEGRADED     0     0    25
          mirror-0    DEGRADED     0     0   150
            c4t1d0s1  DEGRADED     0     0   150  too many errors
            c5t1d0s1  DEGRADED     0     0   154  too many errors

So those patches didn't help. :(

-brian

> On Dec 17, 2015, at 2:23 PM, Brian Hechinger wrote:
>
> Yeah, I think the one I already had (the init() related one that Hans gave me) but I wonder if the other one is somehow related?
>
> I've installed the updates.
>
> I'll re-create the pool and re-run iozone
>
> -brian
>
>> On Dec 17, 2015, at 2:21 PM, Dan McDonald wrote:
>>
>>
>>> On Dec 17, 2015, at 2:20 PM, Brian Hechinger wrote:
>>>
>>> That seems... unlikely to me?
>>>
>>> I'll put one of them into a linux box and see what happens with it.
>>>
>>> Is there a way to somehow see if the nvme drivers are being wonky?
>>> I get the feeling NVMe 1.1 cards aren't completely supported just yet?
>>
>> OH SHOOT! I forgot these are NVMe.
>>
>> Did you see my mail announcing the update? Did you see it has two NVME fixes in it?
>>
>> Dan
>>
>

From danmcd at omniti.com  Thu Dec 17 19:44:28 2015
From: danmcd at omniti.com (Dan McDonald)
Date: Thu, 17 Dec 2015 14:44:28 -0500
Subject: [OmniOS-discuss] Hung ZFS Pool
In-Reply-To: <31E95CC0-CFE7-46BE-ABF2-F099ADBB7527@4amlunch.net>
References: <25D313E7-C974-43C0-817E-3A96514BCC16@4amlunch.net> <3FF750E3-A2C5-467C-A0D2-BDCC8C48C9CA@4amlunch.net> <7F5D451E-6467-4A3D-8785-AE069524452A@omniti.com> <4B858828-C823-4251-84A9-417028B01B3C@omniti.com> <584980F4-502A-4700-A58F-E720CB398BF0@4amlunch.net> <4B0CFB00-2181-4E38-B0E1-8AAAA3E6136C@omniti.com> <7D06CC38-9841-4189-80CD-6341E025B10C@4amlunch.net> <1A509267-5ADB-451C-A540-5F49367B7C22@omniti.com> <7FF77844-B41C-4837-A81F-42F32EF72AAC@omniti.com> <4C5FB8BB-861E-4425-9F96-17D8E337A5AA@omniti.com> <04185D4F-4394-4517-B395-89012DB9BE66@4amlunch.net> <31E95CC0-CFE7-46BE-ABF2-F099ADBB7527@4amlunch.net>
Message-ID: <440ADF5A-0FFB-40E3-B4FB-17015DAA0C91@omniti.com>

> On Dec 17, 2015, at 2:38 PM, Brian Hechinger wrote:
>
> So those patches didn't help. :(

Hmm. Try one on your Linux box, and I do know that we really only support NVMe 1.0 currently, not any higher revisions. You may also need to kick this out to the illumos list, where the NVMe developer can see it.

Dan

From stephan.budach at JVM.DE  Sun Dec 20 21:16:22 2015
From: stephan.budach at JVM.DE (Stephan Budach)
Date: Sun, 20 Dec 2015 22:16:22 +0100
Subject: [OmniOS-discuss] OmniOS r151016 crashed and rebootet
Message-ID: <56771AA6.90501@jvm.de>

Hi all,

a couple of hours ago one of my OmniOS boxes crashed and rebooted. As I'd like to determine the reason why that happened, I could use some advice on how to do that. There is a vmdump.0 available, but I lack the knowledge of what to do with it.

Could anyone fill me in on that?

Thanks,
Stephan

From stephan.budach at JVM.DE  Sun Dec 20 21:54:09 2015
From: stephan.budach at JVM.DE (Stephan Budach)
Date: Sun, 20 Dec 2015 22:54:09 +0100
Subject: [OmniOS-discuss] OmniOS r151016 crashed and rebootet
In-Reply-To: <56771AA6.90501@jvm.de>
References: <56771AA6.90501@jvm.de>
Message-ID: <56772381.8040606@jvm.de>

A little addendum...

On 20.12.15 at 22:16, Stephan Budach wrote:
> Hi all,
>
> a couple of hours ago one of my OmniOS boxes crashed and rebooted. As
> I'd like to determine the reason why that happened, I could use
> some advice on how to do that. There is a vmdump.0 available, but I
> lack the knowledge of what to do with it.
>
> Could anyone fill me in on that?
>
> Thanks,
> Stephan

I found this while digging through fmdump's log:

root at nfsvmpool07:/root# fmdump -Vp -u 1e24474c-0077-cad1-e684-8f3b0f950af6
TIME                           UUID                                 SUNW-MSG-ID
Dez 20 2015 20:26:34.062346000 1e24474c-0077-cad1-e684-8f3b0f950af6 SUNOS-8000-KL

  TIME                 CLASS                                          ENA
  Dez 20 20:26:33.8727 ireport.os.sunos.panic.dump_available          0x0000000000000000
  Dez 20 20:23:14.5491 ireport.os.sunos.panic.dump_pending_on_device  0x0000000000000000

nvlist version: 0
        version = 0x0
        class = list.suspect
        uuid = 1e24474c-0077-cad1-e684-8f3b0f950af6
        code = SUNOS-8000-KL
        diag-time = 1450639593 895469
        de = fmd:///module/software-diagnosis
        fault-list-sz = 0x1
        fault-list = (array of embedded nvlists)
        (start fault-list[0])
        nvlist version: 0
                version = 0x0
                class = defect.sunos.kernel.panic
                certainty = 0x64
                asru = sw:///:path=/var/crash/unknown/.1e24474c-0077-cad1-e684-8f3b0f950af6
                resource = sw:///:path=/var/crash/unknown/.1e24474c-0077-cad1-e684-8f3b0f950af6
                savecore-succcess = 1
                dump-dir = /var/crash/unknown
                dump-files = vmdump.0
                os-instance-uuid = 1e24474c-0077-cad1-e684-8f3b0f950af6
                panicstr = kernel heap corruption detected
                panicstack = fffffffffba4e8d4 () | genunix:kmem_slab_free+c1 () | genunix:kmem_magazine_destroy+6e () | genunix:kmem_depot_ws_reap+5d () | genunix:kmem_cache_magazine_purge+110 () |
genunix:kmem_cache_magazine_resize+40 () | genunix:taskq_thread+2d0 () | unix:thread_start+8 () |
                crashtime = 1450638112
                panic-time = Sun Dec 20 20:01:52 2015 CET
        (end fault-list[0])

        fault-status = 0x1
        severity = Major
        __ttl = 0x1
        __tod = 0x567700ea 0x3b75310

Cheers,
Stephan

From danmcd at omniti.com  Sun Dec 20 23:36:37 2015
From: danmcd at omniti.com (Dan McDonald)
Date: Sun, 20 Dec 2015 18:36:37 -0500
Subject: [OmniOS-discuss] OmniOS r151016 crashed and rebootet
In-Reply-To: <56771AA6.90501@jvm.de>
References: <56771AA6.90501@jvm.de>
Message-ID: <213D3342-ABEA-4AA1-A8B5-773940300451@omniti.com>

Place it somewhere I can download it unless it has customer information. If it does, mail me offline for instructions.

Dan

Sent from my iPhone (typos, autocorrect, and all)

> On Dec 20, 2015, at 4:16 PM, Stephan Budach wrote:
>
> Hi all,
>
> a couple of hours ago one of my OmniOS boxes crashed and rebooted. As I'd like to determine the reason why that happened, I could use some advice on how to do that. There is a vmdump.0 available, but I lack the knowledge of what to do with it.
>
> Could anyone fill me in on that?
>
> Thanks,
> Stephan
> _______________________________________________
> OmniOS-discuss mailing list
> OmniOS-discuss at lists.omniti.com
> http://lists.omniti.com/mailman/listinfo/omnios-discuss

From bfriesen at simple.dallas.tx.us  Mon Dec 21 21:29:48 2015
From: bfriesen at simple.dallas.tx.us (Bob Friesenhahn)
Date: Mon, 21 Dec 2015 15:29:48 -0600 (CST)
Subject: [OmniOS-discuss] rsync 3.1.2 & security fix
Message-ID:

Rsync 3.1.2 is out and contains a security fix. OmniOS seems to be using 3.1.1. See "http://rsync.samba.org/ftp/rsync/src/rsync-3.1.2-NEWS".
Bob -- Bob Friesenhahn bfriesen at simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/ GraphicsMagick Maintainer, http://www.GraphicsMagick.org/ From bfriesen at simple.dallas.tx.us Mon Dec 21 23:26:02 2015 From: bfriesen at simple.dallas.tx.us (Bob Friesenhahn) Date: Mon, 21 Dec 2015 17:26:02 -0600 (CST) Subject: [OmniOS-discuss] OmniOS r151016 zone has difficulties shutting down In-Reply-To: <536501D2-EA96-4F6B-8CB2-39A0F9698267@omniti.com> References: <536501D2-EA96-4F6B-8CB2-39A0F9698267@omniti.com> Message-ID: > >> Have others encountered this issue? What can be done to fix it? > > This message is printed by zoneadmd. If you or anyone else encounters this hang again, please do the following: > > 1.) While zoneadm is hung, check the console for the above message, you'll see a pid for zoneadmd (Bob's example was 17388). > > 2.) See if you can get the stack(s) of zoneadmd that reported the console master error: pstack > > 3.) Grab a corefile of the zoneadmd: gcore > > 4.) Share the corefile somehow. > > The pstack and core of the running/hung zoneadm(1M) command would also be useful, I think. I captured some data (as described above) and have made it available for anonymous ftp at "ftp://ftp.simplesystems.org/pub/outgoing/omnios/zoneadmd/". I did this prior to updating the system due to suspecting that the problem would be cured by rebooting the system. As suspected, the problem was cured by rebooting the system. Perhaps the parent zoneadmd is confused about the state after the new zone has been added and this confusion carries over to the child zoneadmd. The creation of the zone follows the example from the OmniOS Wiki except for the addition of a lofs mount. 

Bob
--
Bob Friesenhahn
bfriesen at simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer, http://www.GraphicsMagick.org/

From danmcd at omniti.com  Tue Dec 22 01:02:59 2015
From: danmcd at omniti.com (Dan McDonald)
Date: Mon, 21 Dec 2015 20:02:59 -0500
Subject: [OmniOS-discuss] OmniOS r151016 zone has difficulties shutting down
In-Reply-To:
References: <536501D2-EA96-4F6B-8CB2-39A0F9698267@omniti.com>
Message-ID:

I forgot to mention "zoneadm list -cv". That would've shown the zone's state.

The pstack for zoneadmd showed this function:

        http://src.illumos.org/source/xref/illumos-gate/usr/src/cmd/zoneadmd/zoneadmd.c#1089

is waiting for something to only exit a loop after a long wait.

I'm curious which of: zone_get_state() failing or "zstate == ZONE_STATE_INSTALLED" fails? That's why the "list -cv" would've been nice.

If this happens ever again, some more useful captures:

        ptree `pgrep -z `

        pargs `pgrep -z `

        pstack `pgrep -z `

That'll be a LOT of output, BUT it may provide more clues.

Thanks for this, and sorry I don't have more immediately useful data.

Dan

From danmcd at omniti.com  Tue Dec 22 01:17:27 2015
From: danmcd at omniti.com (Dan McDonald)
Date: Mon, 21 Dec 2015 20:17:27 -0500
Subject: [OmniOS-discuss] rsync 3.1.2 & security fix
In-Reply-To:
References:
Message-ID: <06447436-873B-436B-9686-5B632F9D5C9A@omniti.com>

> On Dec 21, 2015, at 4:29 PM, Bob Friesenhahn wrote:
>
> Rsync 3.1.2 is out and contains a security fix. OmniOS seems to be using 3.1.1. See "http://rsync.samba.org/ftp/rsync/src/rsync-3.1.2-NEWS".

So much for a full vacation day...

Watch this space for an update later this evening.

Thank you for pointing this out, security is important!!!

Dan

From danmcd at omniti.com  Tue Dec 22 01:29:12 2015
From: danmcd at omniti.com (Dan McDonald)
Date: Mon, 21 Dec 2015 20:29:12 -0500
Subject: [OmniOS-discuss] SECURITY UPDATE rsync to 3.1.2
Message-ID:

Hello!
Thanks go out to Bob Friesenhahn for reminding me about today's rsync security update. It is now pushed out for r151014 (LTS) and r151016 (Stable). Happy updating! Dan p.s. I'm officially on vacation the rest of this year, so pardon any latency increases until 2016. From bfriesen at simple.dallas.tx.us Tue Dec 22 02:05:54 2015 From: bfriesen at simple.dallas.tx.us (Bob Friesenhahn) Date: Mon, 21 Dec 2015 20:05:54 -0600 (CST) Subject: [OmniOS-discuss] rsync 3.1.2 & security fix In-Reply-To: <06447436-873B-436B-9686-5B632F9D5C9A@omniti.com> References: <06447436-873B-436B-9686-5B632F9D5C9A@omniti.com> Message-ID: On Mon, 21 Dec 2015, Dan McDonald wrote: > >> On Dec 21, 2015, at 4:29 PM, Bob Friesenhahn wrote: >> >> Rsync 3.1.2 is out and contains a security fix. OmniOS seems to be using 3.1.1. See "http://rsync.samba.org/ftp/rsync/src/rsync-3.1.2-NEWS". > > So much for a full vacation day... > > Watch this space for an update later this evening. > > Thank you for pointing this out, security is important!!! This is yet another case where the sending side can send something bad to cause harm to the recipient. It is only a problem if you don't trust the sending side. I agree that security is important. Bob -- Bob Friesenhahn bfriesen at simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/ GraphicsMagick Maintainer, http://www.GraphicsMagick.org/ From hasslerd at gmx.li Tue Dec 22 10:50:22 2015 From: hasslerd at gmx.li (Dominik Hassler) Date: Tue, 22 Dec 2015 11:50:22 +0100 Subject: [OmniOS-discuss] OmniOS r151016 zone has difficulties shutting down In-Reply-To: References: <536501D2-EA96-4F6B-8CB2-39A0F9698267@omniti.com> Message-ID: <56792AEE.10807@gmx.li> Dan, I remember that in my cases when a zone shutdown got stuck, "zoneadm list -cv" showed the state of the hung zone as: shutting_down On 12/22/2015 02:02 AM, Dan McDonald wrote: > I forgot to mention "zoneadm list -cv". That would've shown the zone's state. 
>
> The pstack for zoneadmd showed this function:
>
>         http://src.illumos.org/source/xref/illumos-gate/usr/src/cmd/zoneadmd/zoneadmd.c#1089
>
> is waiting for something to only exit a loop after a long wait.
>
> I'm curious which of: zone_get_state() failing or "zstate == ZONE_STATE_INSTALLED" fails? That's why the "list -cv" would've been nice.
>
> If this happens ever again, some more useful captures:
>
>         ptree `pgrep -z `
>
>         pargs `pgrep -z `
>
>         pstack `pgrep -z `
>
> That'll be a LOT of output, BUT it may provide more clues.
>
> Thanks for this, and sorry I don't have more immediately useful data.
>
> Dan
>
> _______________________________________________
> OmniOS-discuss mailing list
> OmniOS-discuss at lists.omniti.com
> http://lists.omniti.com/mailman/listinfo/omnios-discuss
>

From jz+omni at neexistuje.sk  Tue Dec 22 16:15:39 2015
From: jz+omni at neexistuje.sk (Juraj Ziegler)
Date: Tue, 22 Dec 2015 17:15:39 +0100
Subject: [OmniOS-discuss] SECURITY UPDATE rsync to 3.1.2
In-Reply-To:
References:
Message-ID:

> On 22.12.2015, at 2:29, Dan McDonald wrote:
>
> Hello!
>
> Thanks go out to Bob Friesenhahn for reminding me about today's rsync security update. It is now pushed out for r151014 (LTS) and r151016 (Stable).
>
> Happy updating!
> Dan
>
> p.s. I'm officially on vacation the rest of this year, so pardon any latency increases until 2016.

Am I doing something wrong, or is something else wrong?
rsync is not updating for me.

As shown below, "pkg update -nv" says there's nothing to update.
pkg is subscribed to r151016 publisher.
rsync is 3.1.1.

(Personally, I don't mind the vacation latency, but other users might be affected by this as well).

root at box:/root# rsync --version
rsync  version 3.1.1  protocol version 31
Copyright (C) 1996-2014 by Andrew Tridgell, Wayne Davison, and others.
Web site: http://rsync.samba.org/
Capabilities:
    64-bit files, 64-bit inums, 32-bit timestamps, 64-bit long ints,
    socketpairs, hardlinks, symlinks, IPv6, batchfiles, inplace,
    append, ACLs, xattrs, iconv, symtimes, no prealloc

rsync comes with ABSOLUTELY NO WARRANTY. This is free software, and you
are welcome to redistribute it under certain conditions. See the GNU
General Public Licence for details.

root at box:/root# uname -a
SunOS box 5.11 omnios-b5093df i86pc i386 i86pc

root at box:/root# cat /etc/release
  OmniOS v11 r151016
  Copyright 2015 OmniTI Computer Consulting, Inc. All rights reserved.
  Use is subject to license terms.

root at box:/root# pkg publisher
PUBLISHER           TYPE    STATUS  P  LOCATION
omnios              origin  online  F  http://pkg.omniti.com/omnios/r151016/
ms.omniti.com       origin  online  F  http://pkg.omniti.com/omniti-ms/
niksula.hut.fi      origin  online  F  http://pkg.niksula.hut.fi/
omnios.blackdot.be  origin  online  F  http://omnios.blackdot.be/
uulm.mawi           origin  online  F  http://scott.mathematik.uni-ulm.de/release/

root at box:/root# pkg update -nv
No updates available for this image.

j.

From danmcd at omniti.com  Tue Dec 22 16:47:35 2015
From: danmcd at omniti.com (Dan McDonald)
Date: Tue, 22 Dec 2015 11:47:35 -0500
Subject: [OmniOS-discuss] SECURITY UPDATE rsync to 3.1.2
In-Reply-To:
References:
Message-ID: <206EEEE6-E838-4BB6-8008-5C99E3D38E7F@omniti.com>

> On Dec 22, 2015, at 11:15 AM, Juraj Ziegler wrote:
>
> Am I doing something wrong, or is something else wrong?
> rsync is not updating for me.
>
> As shown below, "pkg update -nv" says there's nothing to update.
> pkg is subscribed to r151016 publisher.
> rsync is 3.1.1.
>
> (Personally, I don't mind the vacation latency, but other users might be affected by this as well).

Where is your rsync coming from? Utter this:

        pkg list rsync

and see if you have output. Maybe you have your own version somewhere?
Maybe it's from one of the other publishers you mention:

> root at box:/root# pkg publisher
> PUBLISHER           TYPE    STATUS  P  LOCATION
> omnios              origin  online  F  http://pkg.omniti.com/omnios/r151016/
> ms.omniti.com       origin  online  F  http://pkg.omniti.com/omniti-ms/
> niksula.hut.fi      origin  online  F  http://pkg.niksula.hut.fi/
> omnios.blackdot.be  origin  online  F  http://omnios.blackdot.be/
> uulm.mawi           origin  online  F  http://scott.mathematik.uni-ulm.de/release/
>
> root at box:/root# pkg update -nv
> No updates available for this image.

It's most certainly available:

        pkg list -avf -g http://pkg.omniti.com/omniti-ms/r151016 rsync

You should see two versions, 3.1.1 and 3.1.2.

I think you're getting your rsync from one of the other publishers mentioned on your list.

Dan

From davide.poletto at gmail.com  Tue Dec 22 21:37:49 2015
From: davide.poletto at gmail.com (Davide Poletto)
Date: Tue, 22 Dec 2015 22:37:49 +0100
Subject: [OmniOS-discuss] SECURITY UPDATE rsync to 3.1.2
In-Reply-To: <206EEEE6-E838-4BB6-8008-5C99E3D38E7F@omniti.com>
References: <206EEEE6-E838-4BB6-8008-5C99E3D38E7F@omniti.com>
Message-ID:

Just for information: on an r151014 system (on which rsync was not installed) the command:

pkg list -avf -g http://pkg.omniti.com/omniti-ms/r151014 rsync

provides:

Errors were encountered while attempting to retrieve package or file data
for the requested operation.
Details follow:

http protocol error: code: 400 reason: Bad Request
URL: 'http://pkg.omniti.com/omniti-ms/r151014/versions/0/'

The same result happens using the r151016 string instead of r151014.

Running this instead:

pkg list -avf -g http://pkg.omniti.com/omnios/r151014 rsync

provides what is expected:

FMRI                                                          IFO
pkg://omnios/network/rsync@3.1.2-0.151014:20151222T011609Z    ---
pkg://omnios/network/rsync@3.1.1-0.151014:20150402T174523Z    ---

So rsync could be provided by the omnios publisher.

Looking at r151016:

pkg list -avf -g http://pkg.omniti.com/omnios/r151016 rsync

gives:

FMRI                                                          IFO
pkg://omnios/network/rsync@3.1.2-0.151016:20151222T011220Z    ---
pkg://omnios/network/rsync@3.1.1-0.151016:20151102T185945Z    ---

Once rsync is installed (from the omnios publisher), invoking:

pkg list -avf -g http://pkg.omniti.com/omnios/r151014 rsync

provides:

FMRI                                                          IFO
pkg://omnios/network/rsync@3.1.2-0.151014:20151222T011609Z    i--
pkg://omnios/network/rsync@3.1.1-0.151014:20150402T174523Z    ---

On Tue, Dec 22, 2015 at 5:47 PM, Dan McDonald wrote:

>
> > On Dec 22, 2015, at 11:15 AM, Juraj Ziegler wrote:
> >
> > Am I doing something wrong, or is something else wrong?
> > rsync is not updating for me.
> >
> > As shown below, "pkg update -nv" says there's nothing to update.
> > pkg is subscribed to r151016 publisher.
> > rsync is 3.1.1.
> >
> > (Personally, I don't mind the vacation latency, but other users might be affected by this as well).
>
> Where is your rsync coming from? Utter this:
>
>         pkg list rsync
>
> and see if you have output. Maybe you have your own version somewhere?
> Maybe it's from one of the other publishers you mention:
>
> > root at box:/root# pkg publisher
> > PUBLISHER           TYPE    STATUS  P  LOCATION
> > omnios              origin  online  F  http://pkg.omniti.com/omnios/r151016/
> > ms.omniti.com       origin  online  F  http://pkg.omniti.com/omniti-ms/
> > niksula.hut.fi      origin  online  F  http://pkg.niksula.hut.fi/
> > omnios.blackdot.be  origin  online  F  http://omnios.blackdot.be/
> > uulm.mawi           origin  online  F  http://scott.mathematik.uni-ulm.de/release/
> >
> > root at box:/root# pkg update -nv
> > No updates available for this image.
>
> It's most certainly available:
>
>         pkg list -avf -g http://pkg.omniti.com/omniti-ms/r151016 rsync
>
> You should see two versions, 3.1.1 and 3.1.2.
>
> I think you're getting your rsync from one of the other publishers mentioned on your list.
>
> Dan
>
> _______________________________________________
> OmniOS-discuss mailing list
> OmniOS-discuss at lists.omniti.com
> http://lists.omniti.com/mailman/listinfo/omnios-discuss
>

From danmcd at omniti.com  Tue Dec 22 22:04:28 2015
From: danmcd at omniti.com (Dan McDonald)
Date: Tue, 22 Dec 2015 17:04:28 -0500
Subject: [OmniOS-discuss] SECURITY UPDATE rsync to 3.1.2
In-Reply-To:
References: <206EEEE6-E838-4BB6-8008-5C99E3D38E7F@omniti.com>
Message-ID:

I misspelled the URL. Your URLs are correct.

Dan

From jz+omni at neexistuje.sk  Wed Dec 23 00:12:17 2015
From: jz+omni at neexistuje.sk (Juraj Ziegler)
Date: Wed, 23 Dec 2015 01:12:17 +0100
Subject: [OmniOS-discuss] SECURITY UPDATE rsync to 3.1.2
In-Reply-To: <206EEEE6-E838-4BB6-8008-5C99E3D38E7F@omniti.com>
References: <206EEEE6-E838-4BB6-8008-5C99E3D38E7F@omniti.com>
Message-ID:

> On 22.12.2015, at 17:47, Dan McDonald wrote:
>
>
>> On Dec 22, 2015, at 11:15 AM, Juraj Ziegler wrote:
>>
>> Am I doing something wrong, or is something else wrong?
>> rsync is not updating for me.
>>
>> As shown below, "pkg update -nv" says there's nothing to update.
>> pkg is subscribed to r151016 publisher.
>> rsync is 3.1.1.
>>
>> (Personally, I don't mind the vacation latency, but other users might be affected by this as well).
>
> Where is your rsync coming from? Utter this:

...

> I think you're getting your rsync from one of the other publishers mentioned on your list.

Right you are:

root at box:/root# which rsync
/opt/local/bin/rsync

root at box:/root# pkgin ls | grep rsync
rsync-3.1.1  Network file distribution/synchronisation utility

I had it from pkgsrc.

j.
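
The mix-up in this thread came from a pkgsrc copy of rsync in /opt/local/bin taking precedence over the IPS-delivered one. The following is a minimal sketch, assuming a POSIX shell, of one way to list every copy of a command found on PATH in lookup order. The helper name find_all_on_path is hypothetical and is not an OmniOS, IPS or pkgsrc tool.

```shell
# find_all_on_path is a hypothetical helper name used only for this sketch.
# It prints every executable file named "$1" found in the PATH directories,
# in the order the shell would search them.
# The body runs in a subshell so the IFS and noglob changes stay local.
find_all_on_path() (
    name=$1
    set -f          # do not glob-expand PATH entries
    IFS=:           # split PATH on colons
    for dir in $PATH; do
        [ -n "$dir" ] || dir=.   # an empty PATH entry means the current directory
        if [ -x "$dir/$name" ] && [ ! -d "$dir/$name" ]; then
            printf '%s\n' "$dir/$name"
        fi
    done
)

find_all_on_path rsync
```

If more than one path is printed, the first one is what a bare "rsync" runs; "pkg list rsync" and "pkgin ls | grep rsync", as used in the thread, then show which packaging system owns each copy.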

From gary at genashor.com  Wed Dec 23 00:20:56 2015
From: gary at genashor.com (Gary Gendel)
Date: Tue, 22 Dec 2015 19:20:56 -0500
Subject: [OmniOS-discuss] SECURITY UPDATE rsync to 3.1.2
In-Reply-To:
References: <206EEEE6-E838-4BB6-8008-5C99E3D38E7F@omniti.com>
Message-ID: <5679E8E8.3010303@genashor.com>

I had the same situation. Unfortunately, pkgsrc dirvish has pkgsrc rsync as a dependency. Does anyone know how to fool pkgsrc into understanding that rsync is already installed? I manually removed pkgsrc rsync for the time being.

Gary

On 12/22/2015 7:12 PM, Juraj Ziegler wrote:
>> On 22.12.2015, at 17:47, Dan McDonald wrote:
>>
>>
>>> On Dec 22, 2015, at 11:15 AM, Juraj Ziegler wrote:
>>>
>>> Am I doing something wrong, or is something else wrong?
>>> rsync is not updating for me.
>>>
>>> As shown below, "pkg update -nv" says there's nothing to update.
>>> pkg is subscribed to r151016 publisher.
>>> rsync is 3.1.1.
>>>
>>> (Personally, I don't mind the vacation latency, but other users might be affected by this as well).
>> Where is your rsync coming from? Utter this:
> ...
>
>> I think you're getting your rsync from one of the other publishers mentioned on your list.
> Right you are:
>
> root at box:/root# which rsync
> /opt/local/bin/rsync
>
> root at box:/root# pkgin ls | grep rsync
> rsync-3.1.1  Network file distribution/synchronisation utility
>
> I had it from pkgsrc.
>
>
> j.
>
>
> _______________________________________________
> OmniOS-discuss mailing list
> OmniOS-discuss at lists.omniti.com
> http://lists.omniti.com/mailman/listinfo/omnios-discuss

From doug at will.to  Wed Dec 23 00:39:49 2015
From: doug at will.to (Doug Hughes)
Date: Tue, 22 Dec 2015 19:39:49 -0500
Subject: [OmniOS-discuss] reverse engineering from broken ips
Message-ID: <5679ED55.4080008@will.to>

I had a local IPS server. It went SNAFU and is totally lost.

Has anybody recovered an IPS repo from the locally installed packages that came from that repo, in order to generate a new one?

/var/pkg/publisher//pkg/...

It looks like the complete catalog is in there, and it could be used to rebuild a repo. Ideas?

From jerry1209 at cht.com.tw  Wed Dec 23 00:44:03 2015
From: jerry1209 at cht.com.tw (=?big5?B?sWmubaZ0?=)
Date: Wed, 23 Dec 2015 00:44:03 +0000
Subject: [OmniOS-discuss] How to get NFS read & write latency in OmniOS r151016
Message-ID: <58A78BB477E10F419783CE1E1E5185C301227187B7@mbs5.app.corp.cht.com.tw>

Hi all,

According to the release notes of OmniOS r151016, we can get "IOPS, bandwidth, and latency kstats for NFS server".

A lot of information is shown when I run the kstat command. I want to get the NFS read and write latency for the NFS server.

Q1: Do "nfs:0:rfsprocio_v4_write:wtime" and "nfs:0:rfsprocio_v4_read:wtime" represent write and read latency?

Q2: I mounted the NFS share and wrote lots of files to it, but the values of "nfs:0:rfsprocio_v4_write:wtime" and "nfs:0:rfsprocio_v4_read:wtime" are still zero. Why?

# kstat -p -m nfs -n rfsprocio_v4_write
nfs:0:rfsprocio_v4_write:class          rfsprocio_v4
nfs:0:rfsprocio_v4_write:crtime         50.833043074
nfs:0:rfsprocio_v4_write:nread          3932160
nfs:0:rfsprocio_v4_write:nwritten       5374607360
nfs:0:rfsprocio_v4_write:rcnt           0
nfs:0:rfsprocio_v4_write:reads          163840
nfs:0:rfsprocio_v4_write:rlastupdate    12048225488385
nfs:0:rfsprocio_v4_write:rlentime       33429565743
nfs:0:rfsprocio_v4_write:rtime          23992279289
nfs:0:rfsprocio_v4_write:snaptime       269635.483575440
nfs:0:rfsprocio_v4_write:wcnt           0
nfs:0:rfsprocio_v4_write:wlastupdate    0
nfs:0:rfsprocio_v4_write:wlentime       0
nfs:0:rfsprocio_v4_write:writes         163840    / number of writes /
nfs:0:rfsprocio_v4_write:wtime          0         / wait queue - time spent waiting /

# kstat -p -m nfs -n rfsprocio_v4_read
nfs:0:rfsprocio_v4_read:class           rfsprocio_v4
nfs:0:rfsprocio_v4_read:crtime          50.833003263
nfs:0:rfsprocio_v4_read:nread           0
nfs:0:rfsprocio_v4_read:nwritten        0
nfs:0:rfsprocio_v4_read:rcnt            0
nfs:0:rfsprocio_v4_read:reads           0
nfs:0:rfsprocio_v4_read:rlastupdate     0
nfs:0:rfsprocio_v4_read:rlentime        0
nfs:0:rfsprocio_v4_read:rtime           0
nfs:0:rfsprocio_v4_read:snaptime 269635.483080962 nfs:0:rfsprocio_v4_read:wcnt 0 nfs:0:rfsprocio_v4_read:wlastupdate 0 nfs:0:rfsprocio_v4_read:wlentime 0 nfs:0:rfsprocio_v4_read:writes 0 nfs:0:rfsprocio_v4_read:wtime 0 Best regards, --------------------------------------------- ??? ?????????????? TEL: 03-4245663 Please be advised that this email message (including any attachments) contains confidential information and may be legally privileged. If you are not the intended recipient, please destroy this message and all attachments from your system and do not further collect, process, or use them. Chunghwa Telecom and all its subsidiaries and associated companies shall not be liable for the improper or incomplete transmission of the information contained in this email nor for any delay in its receipt or damage to your system. If you are the intended recipient, please protect the confidential and/or personal information contained in this email with due care. Any unauthorized use, disclosure or distribution of this message in whole or in part is strictly prohibited. Also, please self-inspect attachments and hyperlinks contained in this email to ensure the information security and to protect personal information. -------------- next part -------------- An HTML attachment was scrubbed... URL: From richard.elling at richardelling.com Wed Dec 23 03:49:05 2015 From: richard.elling at richardelling.com (Richard Elling) Date: Tue, 22 Dec 2015 19:49:05 -0800 Subject: [OmniOS-discuss] How to get NFS read & write latency in OmniOS r151016 In-Reply-To: <58A78BB477E10F419783CE1E1E5185C301227187B7@mbs5.app.corp.cht.com.tw> References: <58A78BB477E10F419783CE1E1E5185C301227187B7@mbs5.app.corp.cht.com.tw> Message-ID: <7BC64C53-06FC-465F-BB1E-B242CE7809AD@richardelling.com> > On Dec 22, 2015, at 4:44 PM, ??? wrote: > > Hi all, > According to the release note of OmniOS r151016, we could get ?IOPS, bandwidth, and latency kstats for NFS server? 
>
> there is lots of information showing when I use enter command #kstat,
> I want to get the "nfs read & write latency for NFS server"
>
> Q1 : Is the "nfs:0:rfsprocio_v4_write:wtime" & "nfs:0:rfsprocio_v4_read:wtime" meant write & read latency ?

No, wtime is the wait queue occupancy (%wait in iostat -x)

A good reference is the man page for kstat(3kstat)
        man -s 3kstat kstat

Hopefully, the information there will answer your Q2.
 -- richard

> Q2 : I mounted the nfs share directory, and write lots file to it, the number of "nfs:0:rfsprocio_v4_write:wtime" & "nfs:0:rfsprocio_v4_read:wtime" still zero. Why ?
>
> #kstat -p -m nfs -n rfsprocio_v4_write
> nfs:0:rfsprocio_v4_write:class          rfsprocio_v4
> nfs:0:rfsprocio_v4_write:crtime         50.833043074
> nfs:0:rfsprocio_v4_write:nread          3932160
> nfs:0:rfsprocio_v4_write:nwritten       5374607360
> nfs:0:rfsprocio_v4_write:rcnt           0
> nfs:0:rfsprocio_v4_write:reads          163840
> nfs:0:rfsprocio_v4_write:rlastupdate    12048225488385
> nfs:0:rfsprocio_v4_write:rlentime       33429565743
> nfs:0:rfsprocio_v4_write:rtime          23992279289
> nfs:0:rfsprocio_v4_write:snaptime       269635.483575440
> nfs:0:rfsprocio_v4_write:wcnt           0
> nfs:0:rfsprocio_v4_write:wlastupdate    0
> nfs:0:rfsprocio_v4_write:wlentime       0
> nfs:0:rfsprocio_v4_write:writes         163840    / number of writes /
> nfs:0:rfsprocio_v4_write:wtime          0         / wait queue - time spent waiting /
>
> #kstat -p -m nfs -n rfsprocio_v4_read
> nfs:0:rfsprocio_v4_read:class           rfsprocio_v4
> nfs:0:rfsprocio_v4_read:crtime          50.833003263
> nfs:0:rfsprocio_v4_read:nread           0
> nfs:0:rfsprocio_v4_read:nwritten        0
> nfs:0:rfsprocio_v4_read:rcnt            0
> nfs:0:rfsprocio_v4_read:reads           0
> nfs:0:rfsprocio_v4_read:rlastupdate     0
> nfs:0:rfsprocio_v4_read:rlentime        0
> nfs:0:rfsprocio_v4_read:rtime           0
> nfs:0:rfsprocio_v4_read:snaptime        269635.483080962
> nfs:0:rfsprocio_v4_read:wcnt            0
> nfs:0:rfsprocio_v4_read:wlastupdate     0
> nfs:0:rfsprocio_v4_read:wlentime        0
> nfs:0:rfsprocio_v4_read:writes          0
> nfs:0:rfsprocio_v4_read:wtime           0
>
>
>
> Best regards,
>
--------------------------------------------- > ??? > ?????????????? > TEL: 03-4245663 > > > > ?????????????????????,???????,???????????????,???????. ???????,?????????????????????,?????????,????????????????????,????????????????. > Please be advised that this email message (including any attachments) contains confidential information and may be legally privileged. If you are not the intended recipient, please destroy this message and all attachments from your system and do not further collect, process, or use them. Chunghwa Telecom and all its subsidiaries and associated companies shall not be liable for the improper or incomplete transmission of the information contained in this email nor for any delay in its receipt or damage to your system. If you are the intended recipient, please protect the confidential and/or personal information contained in this email with due care. Any unauthorized use, disclosure or distribution of this message in whole or in part is strictly prohibited. Also, please self-inspect attachments and hyperlinks contained in this email to ensure the information security and to protect personal information._______________________________________________ > OmniOS-discuss mailing list > OmniOS-discuss at lists.omniti.com > http://lists.omniti.com/mailman/listinfo/omnios-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From dan at syneto.eu Wed Dec 23 08:58:37 2015 From: dan at syneto.eu (Dan Vatca) Date: Wed, 23 Dec 2015 10:58:37 +0200 Subject: [OmniOS-discuss] How to get NFS read & write latency in OmniOS r151016 In-Reply-To: <58A78BB477E10F419783CE1E1E5185C301227187B7@mbs5.app.corp.cht.com.tw> References: <58A78BB477E10F419783CE1E1E5185C301227187B7@mbs5.app.corp.cht.com.tw> Message-ID: If you need latency, you will most likely need a latency distribution histogram, and not an average latency. With averages you will lose latency outliers that are very important. 
Here's a good read with lots of references on this topic: https://www.vividcortex.com/blog/why-percentiles-dont-work-the-way-you-think To currently do this on OmniOS, you need to use dtrace to aggregate (quantize) time differences between nfsv3:::op-read-start and nfsv3:::op-read-done (same for write). Dan V?tca CTO at Syneto Tel: +40723604357, Skype: dan_vatca On Wed, Dec 23, 2015 at 2:44 AM, ??? wrote: > Hi all, > > According to the release note of OmniOS r151016, we could get ?IOPS, > bandwidth, and latency kstats for NFS server? > > > > there is lots of information showing when I use enter command #kstat, > > I want to get the ?nfs read & write latency for NFS server? > > > > Q1 : Is the ?nfs:0:rfsprocio_v4_write:wtime? & > ?nfs:0:rfsprocio_v4_read:wtime? meant write & read latency ? > > Q2 : I mounted the nfs share directory, and write lots file to it, > the number of ?nfs:0:rfsprocio_v4_write:wtime? & > ?nfs:0:rfsprocio_v4_read:wtime? still zero. Why ? > > > > #kstat ?p ?m nfs ?n rfsprocio_v4_write > > nfs:0:rfsprocio_v4_write:class rfsprocio_v4 > > nfs:0:rfsprocio_v4_write:crtime 50.833043074 > > nfs:0:rfsprocio_v4_write:nread 3932160 > > nfs:0:rfsprocio_v4_write:nwritten 5374607360 > > nfs:0:rfsprocio_v4_write:rcnt 0 > > nfs:0:rfsprocio_v4_write:reads 163840 > > nfs:0:rfsprocio_v4_write:rlastupdate 12048225488385 > > nfs:0:rfsprocio_v4_write:rlentime 33429565743 > > nfs:0:rfsprocio_v4_write:rtime 23992279289 > > nfs:0:rfsprocio_v4_write:snaptime 269635.483575440 > > nfs:0:rfsprocio_v4_write:wcnt 0 > > nfs:0:rfsprocio_v4_write:wlastupdate 0 > > nfs:0:rfsprocio_v4_write:wlentime 0 > > nfs:0:rfsprocio_v4_write:writes 163840 / number of writes / > > nfs:0:rfsprocio_v4_write:wtime 0 / wait queue - > time spent waiting / > > > > #kstat ?p ?m nfs ?n rfsprocio_v4_read > > nfs:0:rfsprocio_v4_read:class rfsprocio_v4 > > nfs:0:rfsprocio_v4_read:crtime 50.833003263 > > nfs:0:rfsprocio_v4_read:nread 0 > > nfs:0:rfsprocio_v4_read:nwritten 0 > > 
nfs:0:rfsprocio_v4_read:rcnt 0 > > nfs:0:rfsprocio_v4_read:reads 0 > > nfs:0:rfsprocio_v4_read:rlastupdate 0 > > nfs:0:rfsprocio_v4_read:rlentime 0 > > nfs:0:rfsprocio_v4_read:rtime 0 > > nfs:0:rfsprocio_v4_read:snaptime 269635.483080962 > > nfs:0:rfsprocio_v4_read:wcnt 0 > > nfs:0:rfsprocio_v4_read:wlastupdate 0 > > nfs:0:rfsprocio_v4_read:wlentime 0 > > nfs:0:rfsprocio_v4_read:writes 0 > > nfs:0:rfsprocio_v4_read:wtime 0 > > > > > > > > Best regards, > > --------------------------------------------- > > ??? > > ?????????????? > > TEL: 03-4245663 > > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From richard.elling at richardelling.com Thu Dec 24 22:44:35 2015 From: richard.elling at richardelling.com (Richard Elling) Date: Thu, 24 Dec 2015 14:44:35 -0800 Subject: [OmniOS-discuss] How to get NFS read & write latency in OmniOS r151016 In-Reply-To: References: <58A78BB477E10F419783CE1E1E5185C301227187B7@mbs5.app.corp.cht.com.tw> Message-ID: <0DB958F5-09BC-48D9-924E-4F54A53F4D3A@RichardElling.com> > On Dec 23, 2015, at 12:58 AM, Dan Vatca wrote: > > If you need latency, you will most likely need a latency distribution histogram, and not an average latency. > With averages you will lose latency outliers that are very important. Here's a good read with lots of references on this topic: https://www.vividcortex.com/blog/why-percentiles-dont-work-the-way-you-think > To currently do this on OmniOS, you need to use dtrace to aggregate (quantize) time differences between nfsv3:::op-read-start and nfsv3:::op-read-done (same for write). Indeed, distributions are much more enlightening than averages. Unfortunately, the new kstats added for NFS server operations on a per-mountpoint basis are implemented using the Riemann sums (KSTAT_TYPE_IO) and it is not possible to obtain per-operation information needed for min/max or distribution. These are the same type of kstat used for the iostat command. Shameless plug, nfssvrtop has proven to be useful in watching NFS traffic and uses the op-read-start/op-read-done method. https://github.com/richardelling/tools ? richard > > > Dan V?tca > CTO at Syneto > Tel: +40723604357, Skype: dan_vatca > > On Wed, Dec 23, 2015 at 2:44 AM, ??? > wrote: > Hi all, > > According to the release note of OmniOS r151016, we could get ?IOPS, bandwidth, and latency kstats for NFS server? > > > > there is lots of information showing when I use enter command #kstat, > > I want to get the ?nfs read & write latency for NFS server? > > > > Q1 : Is the ?nfs:0:rfsprocio_v4_write:wtime? & ?nfs:0:rfsprocio_v4_read:wtime? meant write & read latency ? 
> > Q2 : I mounted the nfs share directory, and write lots file to it, the number of ?nfs:0:rfsprocio_v4_write:wtime? & ?nfs:0:rfsprocio_v4_read:wtime? still zero. Why ? > > > > #kstat ?p ?m nfs ?n rfsprocio_v4_write > > nfs:0:rfsprocio_v4_write:class rfsprocio_v4 > > nfs:0:rfsprocio_v4_write:crtime 50.833043074 > > nfs:0:rfsprocio_v4_write:nread 3932160 > > nfs:0:rfsprocio_v4_write:nwritten 5374607360 > > nfs:0:rfsprocio_v4_write:rcnt 0 > > nfs:0:rfsprocio_v4_write:reads 163840 > > nfs:0:rfsprocio_v4_write:rlastupdate 12048225488385 > > nfs:0:rfsprocio_v4_write:rlentime 33429565743 > > nfs:0:rfsprocio_v4_write:rtime 23992279289 > > nfs:0:rfsprocio_v4_write:snaptime 269635.483575440 > > nfs:0:rfsprocio_v4_write:wcnt 0 > > nfs:0:rfsprocio_v4_write:wlastupdate 0 > > nfs:0:rfsprocio_v4_write:wlentime 0 > > nfs:0:rfsprocio_v4_write:writes 163840 / number of writes / > > nfs:0:rfsprocio_v4_write:wtime 0 / wait queue - time spent waiting / > > > > #kstat ?p ?m nfs ?n rfsprocio_v4_read > > nfs:0:rfsprocio_v4_read:class rfsprocio_v4 > > nfs:0:rfsprocio_v4_read:crtime 50.833003263 > > nfs:0:rfsprocio_v4_read:nread 0 > > nfs:0:rfsprocio_v4_read:nwritten 0 > > nfs:0:rfsprocio_v4_read:rcnt 0 > > nfs:0:rfsprocio_v4_read:reads 0 > > nfs:0:rfsprocio_v4_read:rlastupdate 0 > > nfs:0:rfsprocio_v4_read:rlentime 0 > > nfs:0:rfsprocio_v4_read:rtime 0 > > nfs:0:rfsprocio_v4_read:snaptime 269635.483080962 > > nfs:0:rfsprocio_v4_read:wcnt 0 > > nfs:0:rfsprocio_v4_read:wlastupdate 0 > > nfs:0:rfsprocio_v4_read:wlentime 0 > > nfs:0:rfsprocio_v4_read:writes 0 > > nfs:0:rfsprocio_v4_read:wtime 0 > > > > > > > > Best regards, > > --------------------------------------------- > > ??? > > ?????????????? > > TEL: 03-4245663 > > > > > > ?????????????????????,???????,???????????????,???????. ???????,?????????????????????,?????????,????????????????????,????????????????. 
-- Richard.Elling at RichardElling.com +1-760-896-4422 -------------- next part -------------- An HTML attachment was scrubbed... URL: From bfriesen at simple.dallas.tx.us Thu Dec 24 23:36:46 2015 From: bfriesen at simple.dallas.tx.us (Bob Friesenhahn) Date: Thu, 24 Dec 2015 17:36:46 -0600 (CST) Subject: [OmniOS-discuss] OmniOS and OpenMP Message-ID: GCC compiled programs making use of OpenMP require libgomp in order to run. Currently this library is provided as part of the GCC packages. It is necessary to install all of GCC in order for dependent programs to be able to run, and linker run-path (e.g. -R/opt/gcc-5.1.0/lib) also needs to be specified when linking the program. 
There was a similar problem for libgcc_s.so but this was provided via a runtime package: % pkg contents system/library/gcc-5-runtime PATH usr/lib/amd64 usr/lib/amd64/libgcc_s.so usr/lib/amd64/libgcc_s.so.1 usr/lib/libgcc_s.so usr/lib/libgcc_s.so.1 Can a similar runtime package be provided for libgomp (e.g. gcc-5-gomp-runtime)? It would make sense for gomp to be included in gcc-5-runtime except that since it was not included from the start, adding it now might cause problems. Thanks, Bob -- Bob Friesenhahn bfriesen at simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/ GraphicsMagick Maintainer, http://www.GraphicsMagick.org/ From gate03 at landcroft.co.uk Sun Dec 27 03:18:40 2015 From: gate03 at landcroft.co.uk (Michael Mounteney) Date: Sun, 27 Dec 2015 13:18:40 +1000 Subject: [OmniOS-discuss] networking from a zone Message-ID: <20151227131840.4e5467ed@punda-mlia> Hello, I tried to do this a while ago and Jim Klimov (4 Jan 2015) was kind enough to reply but I was unable to solve the problem with his advice. The problem is that DNS does not work from a non-global zone (hereunder referred-to as a child zone or CZ) whereas it does from the global zone (GZ). My IPFilter rule set is at https://pastebin.com/JYeYDPAb and it is the problem: with 'svcadm disable ipfilter' I CAN do DNS from the CZ and with 'svcadm enable ipfilter' I CANNOT. Interface e1000g0 is connected to my cable modem (192.168.0.1) and the interwebs, and e1000g1 is connected to my switch and house network. 
The interfaces in the GZ and CZ: GZ# netstat -rn Routing Table: IPv4 Destination Gateway Flags Ref Use Interface -------------------- -------------------- ----- ----- ---------- --------- default 192.168.0.1 UG 3 1517370 127.0.0.1 127.0.0.1 UH 2 236 lo0 192.168.0.0 192.168.0.9 U 3 12 e1000g0 192.168.1.0 192.168.1.1 U 10 60219886 e1000g1 (IPv6 stuff omitted for brevity) CZ# netstat -rn Routing Table: IPv4 Destination Gateway Flags Ref Use Interface -------------------- -------------------- ----- ----- ---------- --------- default 192.168.0.1 UG 3 1517442 127.0.0.1 127.0.0.1 UH 2 24 lo0 192.168.0.0 192.168.0.3 U 3 3 e1000g0 192.168.1.0 192.168.1.3 U 5 0 e1000g1 so the only difference is the IP addresses. Now with ipfilter disabled: CZ# nslookup www.gentoo.org Server: 198.142.235.14 Address: 198.142.235.14#53 Non-authoritative answer: www.gentoo.org canonical name = www-bytemark-v4v6.gentoo.org. Name: www-bytemark-v4v6.gentoo.org Address: 89.16.167.134 But with it ENabled: CZ# nslookup www.gentoo.org ;; connection timed out; no servers could be reached CZ# ping 89.16.167.134 89.16.167.134 is alive So pinging works but DNS doesn't. Obviously, as nslookup in the CZ works with ipfilter disabled, DNS is configured correctly: CZ# grep '^hosts:' /etc/nsswitch.conf hosts: files dns mdns CZ# cat /etc/resolv.conf nameserver 198.142.235.14 nameserver 211.29.132.12 nameserver 198.142.0.51 Picking bits from Jim's responses (4 Jan 2015): << For debugging, you can 'snoop' in the zone owning the interface (GZ for shared, LZ for dedicated VNICs) to check what requests go out and what does or does not come back in. >> I tried this couldn't snoop in the CZ/LZ ("snoop: cannot open "e1000g0": DLPI link does not exist") and a GZ snoop didn't show any DNS. << rules for e1000g0 in/out comms. name the dynamic address for the interface as 'e1000g0/32' which may limit to the GZ address. See if replacing this by the subnet /24 fixes the issue? >> I did this but no difference. 
<< Does the external LZ have a fixed IP address >> Yes << you can then pluck in specific rules for its network access then? >> Now that e1000g0 rules in ipf.conf are all /24 this should not matter. << you start with block in quick on e1000g0 from 192.168.0.0/16 to any which may preclude access to your router >> I tried removing this but no difference. << Also [...] 'ipfstat -hion' [...] 'ipmon | grep -w b' >> Tried those but couldn't see anything relevant in the output. The nub of the matter is that something in the ipf.conf is treating the LZ e1000g0 interface differently from the GZ's e1000g0 but I cannot see what. Any assistance would be appreciated. -- ______________ Michael Mounteney From gate03 at landcroft.co.uk Sun Dec 27 04:49:49 2015 From: gate03 at landcroft.co.uk (Michael Mounteney) Date: Sun, 27 Dec 2015 14:49:49 +1000 Subject: [OmniOS-discuss] OmniOS stops acting as a DHCP client In-Reply-To: <5639EAF9.4010605@genashor.com> References: <20151104194943.3faa0777@coomera> <5639EAF9.4010605@genashor.com> Message-ID: <20151227144949.6a51cb27@punda-mlia> On Wed, 4 Nov 2015 06:24:41 -0500 Gary Gendel wrote: > Try snooping the nic to see if you get the appropriate DHCP messages > flowing in and out of the box. Make sure you don't have an ipfilter > rule blocking this traffic. You might try to shut down ipfilter just > to see if it got in the way. Gary, you were right. At first I dismissed your solution because I reasoned that I had not altered the ipfilter rule set so why would it block the DHCP request when it never did before? But I had run the initial DHCP request early during configuration **before configuring ipfilter**. The actual problem is the lack of a port=68 rule to let the lease-response through. 
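For reference, a rule of roughly this shape is what such a fix usually looks like. This is a sketch in standard IPFilter syntax, not the rule actually added here; it assumes the DHCP-facing interface is e1000g0 (the cable-modem side described in the zone-networking thread) and that the server replies from UDP port 67 to client port 68.

```
# Assumed interface name: e1000g0. Adjust to the interface that runs the DHCP client.
# Allow DHCP replies (server port 67 -> client port 68) in on that interface.
pass in quick on e1000g0 proto udp from any port = 67 to any port = 68
# Allow the client's own requests and renewals out.
pass out quick on e1000g0 proto udp from any port = 68 to any port = 67
```

Because these are 'quick' rules, they need to appear before any broader 'block ... quick' rule that would otherwise match the same packets. After editing the rule file, the service has to be restarted (for example with 'svcadm restart ipfilter') so the new rules are loaded.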
______________ Michael Mounteney From jimklimov at cos.ru Sun Dec 27 10:18:30 2015 From: jimklimov at cos.ru (Jim Klimov) Date: Sun, 27 Dec 2015 11:18:30 +0100 Subject: [OmniOS-discuss] networking from a zone In-Reply-To: <20151227131840.4e5467ed@punda-mlia> References: <20151227131840.4e5467ed@punda-mlia> Message-ID: 27 ??????? 2015??. 4:18:40 CET, Michael Mounteney ?????: >Hello, I tried to do this a while ago and Jim Klimov (4 Jan 2015) was >kind enough to reply but I was unable to solve the problem with his >advice. > >The problem is that DNS does not work from a non-global zone >(hereunder referred-to as a child zone or CZ) whereas it does >from the global zone (GZ). > >My IPFilter rule set is at https://pastebin.com/JYeYDPAb and it is >the problem: with 'svcadm disable ipfilter' I CAN do DNS from the CZ >and with 'svcadm enable ipfilter' I CANNOT. > >Interface e1000g0 is connected to my cable modem (192.168.0.1) and the >interwebs, and e1000g1 is connected to my switch and house network. > >The interfaces in the GZ and CZ: > >GZ# netstat -rn >Routing Table: IPv4 >Destination Gateway Flags Ref Use >Interface >-------------------- -------------------- ----- ----- ---------- >--------- >default 192.168.0.1 UG 3 1517370 > >127.0.0.1 127.0.0.1 UH 2 236 lo0 > >192.168.0.0 192.168.0.9 U 3 12 >e1000g0 >192.168.1.0 192.168.1.1 U 10 60219886 >e1000g1 > >(IPv6 stuff omitted for brevity) > >CZ# netstat -rn >Routing Table: IPv4 >Destination Gateway Flags Ref Use >Interface >-------------------- -------------------- ----- ----- ---------- >--------- >default 192.168.0.1 UG 3 1517442 > >127.0.0.1 127.0.0.1 UH 2 24 lo0 > >192.168.0.0 192.168.0.3 U 3 3 >e1000g0 >192.168.1.0 192.168.1.3 U 5 0 >e1000g1 > >so the only difference is the IP addresses. > >Now with ipfilter disabled: > >CZ# nslookup www.gentoo.org >Server: 198.142.235.14 >Address: 198.142.235.14#53 > >Non-authoritative answer: >www.gentoo.org canonical name = www-bytemark-v4v6.gentoo.org. 
>Name: www-bytemark-v4v6.gentoo.org >Address: 89.16.167.134 > >But with it ENabled: > >CZ# nslookup www.gentoo.org >;; connection timed out; no servers could be reached > >CZ# ping 89.16.167.134 >89.16.167.134 is alive > >So pinging works but DNS doesn't. > >Obviously, as nslookup in the CZ works with ipfilter disabled, DNS is >configured correctly: > >CZ# grep '^hosts:' /etc/nsswitch.conf >hosts: files dns mdns > >CZ# cat /etc/resolv.conf >nameserver 198.142.235.14 >nameserver 211.29.132.12 >nameserver 198.142.0.51 > >Picking bits from Jim's responses (4 Jan 2015): > ><< For debugging, you can 'snoop' in the zone owning the interface >(GZ for shared, LZ for dedicated VNICs) to check what requests go >out and what does or does not come back in. >> > >I tried this couldn't snoop in the CZ/LZ >("snoop: cannot open "e1000g0": DLPI link does not exist") and a GZ >snoop >didn't show any DNS. > ><< rules for e1000g0 in/out comms. name the dynamic address for the >interface as 'e1000g0/32' which may limit to the GZ address. See if >replacing this by the subnet /24 fixes the issue? >> >I did this but no difference. > ><< Does the external LZ have a fixed IP address >> Yes > ><< you can then pluck in specific rules for its network access then? >> >Now that e1000g0 rules in ipf.conf are all /24 this should not matter. > ><< you start with > block in quick on e1000g0 from 192.168.0.0/16 to any >which may preclude access to your router >> >I tried removing this but no difference. > ><< Also [...] 'ipfstat -hion' [...] 'ipmon | grep -w b' >> > >Tried those but couldn't see anything relevant in the output. > >The nub of the matter is that something in the ipf.conf is treating the >LZ e1000g0 >interface differently from the GZ's e1000g0 but I cannot see what. > >Any assistance would be appreciated. 
Hello again ;)

Looking at your pastebin rules, I am a bit concerned about lines 34, 42 and such with 'e1000g0/24' - this may be limiting ipfilter to only the GZ addresses, or to those that are bound to the GZ at the time of ipfilter startup, or to those wholly owned by the GZ. At least I'm wary of that bit...

And from the route listings, I infer that the local zone is currently on a shared stack, so its interfaces are aliased and set up from the GZ. If you boot up the local zone and then restart ipfilter in the GZ - does it still misbehave? See if explicitly allowing requests from the subnet by number would help?

Also, your rules could be optimized a bit by using 'head' and 'group' to separate the int/ext interfaces and in/out directions, so ipfilter does not have to process the whole ruleset when you know in advance that a rule is not applicable to each and every packet ;)

As for snoop and/or libpcap clients not finding interfaces - 'truss' the program to see what they try to access. Maybe they want e.g. /dev/e1000g0, so you'd have to go and make symlinks:

cd /dev && ln -s ./net/* .

Some (older/vanilla) sniffer versions could also look for the base device like 'e1000' - I'm not sure how to help that...

Hope this helps,
Jim
--
Typos courtesy of K-9 Mail on my Samsung Android
From gary at genashor.com Sun Dec 27 13:45:46 2015 From: gary at genashor.com (Gary Gendel) Date: Sun, 27 Dec 2015 08:45:46 -0500 Subject: [OmniOS-discuss] OmniOS stops acting as a DHCP client In-Reply-To: <20151227144949.6a51cb27@punda-mlia> References: <20151104194943.3faa0777@coomera> <5639EAF9.4010605@genashor.com> <20151227144949.6a51cb27@punda-mlia> Message-ID: <567FEB8A.8060900@genashor.com> On 12/26/2015 11:49 PM, Michael Mounteney wrote: > On Wed, 4 Nov 2015 06:24:41 -0500 Gary Gendel wrote: > >> Try snooping the nic to see if you get the appropriate DHCP messages >> flowing in and out of the box. Make sure you don't have an ipfilter >> rule blocking this traffic. 
You might try to shut down ipfilter just >> to see if it got in the way. > Gary, you were right. At first I dismissed your solution because I > reasoned that I had not altered the ipfilter rule set so why would it > block the DHCP request when it never did before? But I had run the > initial DHCP request early during configuration **before configuring > ipfilter**. > > The actual problem is the lack of a port=68 rule to let the > lease-response through. > > ______________ > Michael Mounteney Michael, It always helps to not assume anything and test everything. I have a modern smartphone get an OS update quarterly. After each update, I get the same symptom... Every ~20 hours all applications send me notifications that I have to log in again. However, when I check I am logged in. The only cure I found was to wipe the phone clean and install each application again manually. If I restore from backup the nonsense starts all over again. It goes against reason but it happens reliably after each OS update. Software is funny that way. Gary From danmcd at omniti.com Mon Dec 28 21:45:25 2015 From: danmcd at omniti.com (Dan McDonald) Date: Mon, 28 Dec 2015 16:45:25 -0500 Subject: [OmniOS-discuss] OmniOS and OpenMP In-Reply-To: References: Message-ID: > On Dec 24, 2015, at 6:36 PM, Bob Friesenhahn wrote: > > GCC compiled programs making use of OpenMP require libgomp in order to run. Currently this library is provided as part of the GCC packages. It is necessary to install all of GCC in order for dependent programs to be able to run, and linker run-path (e.g. -R/opt/gcc-5.1.0/lib) also needs to be specified when linking the program. There was a similar problem for libgcc_s.so but this was provided via a runtime package: > > % pkg contents system/library/gcc-5-runtime > PATH > usr/lib/amd64 > usr/lib/amd64/libgcc_s.so > usr/lib/amd64/libgcc_s.so.1 > usr/lib/libgcc_s.so > usr/lib/libgcc_s.so.1 > > Can a similar runtime package be provided for libgomp (e.g. gcc-5-gomp-runtime)? 
It would make sense for gomp to be included in gcc-5-runtime except that since it was not included from the start, adding it now might cause problems. It's possible, but I'd have to think about how best to package it up (including dependencies, etc. etc.). I don't have any objections to including gomp in gcc-5-runtime, and modulo the must-have-the-latest-version problem, it might not be so bad. It's something to consider for bloody & r151018. Dan From danmcd at omniti.com Mon Dec 28 21:53:25 2015 From: danmcd at omniti.com (Dan McDonald) Date: Mon, 28 Dec 2015 16:53:25 -0500 Subject: [OmniOS-discuss] OmniOS stops acting as a DHCP client In-Reply-To: <20151227144949.6a51cb27@punda-mlia> References: <20151104194943.3faa0777@coomera> <5639EAF9.4010605@genashor.com> <20151227144949.6a51cb27@punda-mlia> Message-ID: <23D7ADAF-99A8-4277-A0D0-18D1106CAEA5@omniti.com> > On Dec 26, 2015, at 11:49 PM, Michael Mounteney wrote: > > The actual problem is the lack of a port=68 rule to let the > lease-response through. Did that clear up this problem? Also, did your other networking problem clear up with some ipfilter rule fixing? Dan From bfriesen at simple.dallas.tx.us Mon Dec 28 22:50:23 2015 From: bfriesen at simple.dallas.tx.us (Bob Friesenhahn) Date: Mon, 28 Dec 2015 16:50:23 -0600 (CST) Subject: [OmniOS-discuss] OmniOS and OpenMP In-Reply-To: References: Message-ID: On Mon, 28 Dec 2015, Dan McDonald wrote: > >> On Dec 24, 2015, at 6:36 PM, Bob Friesenhahn wrote: >> >> GCC compiled programs making use of OpenMP require libgomp in order to run. Currently this library is provided as part of the GCC packages. It is necessary to install all of GCC in order for dependent programs to be able to run, and linker run-path (e.g. -R/opt/gcc-5.1.0/lib) also needs to be specified when linking the program. 
There was a similar problem for libgcc_s.so but this was provided via a runtime package: >> >> % pkg contents system/library/gcc-5-runtime >> PATH >> usr/lib/amd64 >> usr/lib/amd64/libgcc_s.so >> usr/lib/amd64/libgcc_s.so.1 >> usr/lib/libgcc_s.so >> usr/lib/libgcc_s.so.1 >> >> Can a similar runtime package be provided for libgomp (e.g. gcc-5-gomp-runtime)? It would make sense for gomp to be included in gcc-5-runtime except that since it was not included from the start, adding it now might cause problems. > > It's possible, but I'd have to think about how best to package it up (including dependencies, etc. etc.). I don't have any objections to including gomp in gcc-5-runtime, and modulo the must-have-the-latest-version problem, it might not be so bad. It's something to consider for bloody & r151018. Due to the existing problem, I don't think that there are existing dependencies to worry about. It is clear that for existing release branches, gcc-5-runtime can not add libraries without the risk of an application not running because the gcc-5-runtime vintage is too old. This is perhaps not so much of a problem since OmniOS systems should be updating regularly. The gomp library is a bit large so some users might be happier if it was optional via its own package and not part of the OmniOS baseline. 
Bob -- Bob Friesenhahn bfriesen at simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/ GraphicsMagick Maintainer, http://www.GraphicsMagick.org/ From gate03 at landcroft.co.uk Tue Dec 29 00:16:58 2015 From: gate03 at landcroft.co.uk (Michael Mounteney) Date: Tue, 29 Dec 2015 10:16:58 +1000 Subject: [OmniOS-discuss] OmniOS stops acting as a DHCP client In-Reply-To: <23D7ADAF-99A8-4277-A0D0-18D1106CAEA5@omniti.com> References: <20151104194943.3faa0777@coomera> <5639EAF9.4010605@genashor.com> <20151227144949.6a51cb27@punda-mlia> <23D7ADAF-99A8-4277-A0D0-18D1106CAEA5@omniti.com> Message-ID: <20151229101658.0bbb436e@punda-mlia> On Mon, 28 Dec 2015 16:53:25 -0500 Dan McDonald wrote: > > On Dec 26, 2015, at 11:49 PM, Michael Mounteney wrote: > > > > The actual problem is the lack of a port=68 rule to let the > > lease-response through. > > Did that clear up this problem? Also, did you other networking > problem clear up with some ipfilter rule fixing? Hello Dan; the port=68 rule did clear up the DHCP problem but the other problem, i.e., DNS from a non-global zone, is still present. I haven't had a chance yet to implement all Jim's suggestions but will ask again on this list when I've done so. Thanks for caring. ;-) ______________ Michael Mounteney From ryan at zinascii.com Wed Dec 30 01:28:50 2015 From: ryan at zinascii.com (Ryan Zezeski) Date: Tue, 29 Dec 2015 20:28:50 -0500 Subject: [OmniOS-discuss] Panic, BAD TRAP, r151014, VMWare Fusion 8.1.0 Message-ID: While running a nightly build of illumos-gate the kernel panicked with "BAD TRAP". Running OmniOS r151014 on VMWare Fusion. This is not urgent. I am posting in case I have stumbled onto a bug. VMWare Fusion Version 8.1.0 (3272237) # cat /etc/release OmniOS v11 r151014 Copyright 2015 OmniTI Computer Consulting, Inc. All rights reserved. Use is subject to license terms. 
# uname -v omnios-d08e0e5 You can find the full crash dump here: http://zinascii.com/pub/illumos/cores/bad-trap-12-29-15/ crash dump info --------------- > ::status debugging crash dump /var/crash/unknown/vmcore.0 (64-bit) from omnislash operating system: 5.11 omnios-d08e0e5 (i86pc) image uuid: a43f56bd-f5a4-6643-92e6-84262b07c26a panic message: BAD TRAP: type=e (#pf Page fault) rp=ffffff001010c010 addr=fffffffffb8484b0 dump content: kernel pages only > ::panicinfo cpu 2 thread ffffff02e40fb160 message BAD TRAP: type=e (#pf Page fault) rp=ffffff001010c010 addr=fffffffffb8484b0 rdi ffffff001010c110 rsi fffffffffb8484b0 rdx 2 rcx 0 r8 fffffffffbc723a0 r9 78 rax fffffffffbc72420 rbx fee19000 rbp ffffff001010c110 r10 fffffffffbcf8cb0 r11 ffffff02e40fb160 r12 0 r13 0 r14 ffffff02fb5e1c00 r15 fffffffffb8484b0 fsbase 0 gsbase ffffff02dbcb4080 ds 4b es 4b fs 0 gs 1c3 trapno e err 10 rip fffffffffb8484b0 cs 30 rflags 10202 rsp ffffff001010c108 ss 0 gdt_hi 0 gdt_lo b00001ef idt_hi 0 idt_lo a0000fff ldt 0 task 70 cr0 8005003b cr2 ffffff02d90f0ff8 cr3 234af6000 cr4 406b8 > ::msgbuf MESSAGE pcieb16 is /pci at 0,0/pci15ad,7a0 at 17 PCI Express-device: pci15ad,7a0 at 17,1, pcieb17 pcieb17 is /pci at 0,0/pci15ad,7a0 at 17,1 PCI Express-device: pci15ad,7a0 at 17,2, pcieb18 pcieb18 is /pci at 0,0/pci15ad,7a0 at 17,2 PCI Express-device: pci15ad,7a0 at 17,3, pcieb19 pcieb19 is /pci at 0,0/pci15ad,7a0 at 17,3 PCI Express-device: pci15ad,7a0 at 17,4, pcieb20 pcieb20 is /pci at 0,0/pci15ad,7a0 at 17,4 PCI Express-device: pci15ad,7a0 at 17,5, pcieb21 pcieb21 is /pci at 0,0/pci15ad,7a0 at 17,5 PCI Express-device: pci15ad,7a0 at 17,6, pcieb22 pcieb22 is /pci at 0,0/pci15ad,7a0 at 17,6 PCI Express-device: pci15ad,7a0 at 17,7, pcieb23 pcieb23 is /pci at 0,0/pci15ad,7a0 at 17,7 PCI Express-device: pci15ad,7a0 at 18, pcieb24 pcieb24 is /pci at 0,0/pci15ad,7a0 at 18 PCI Express-device: pci15ad,7a0 at 18,1, pcieb25 pcieb25 is /pci at 0,0/pci15ad,7a0 at 18,1 PCI Express-device: pci15ad,7a0 at 
18,2, pcieb26 pcieb26 is /pci at 0,0/pci15ad,7a0 at 18,2 PCI Express-device: pci15ad,7a0 at 18,3, pcieb27 pcieb27 is /pci at 0,0/pci15ad,7a0 at 18,3 PCI Express-device: pci15ad,7a0 at 18,4, pcieb28 pcieb28 is /pci at 0,0/pci15ad,7a0 at 18,4 PCI Express-device: pci15ad,7a0 at 18,5, pcieb29 pcieb29 is /pci at 0,0/pci15ad,7a0 at 18,5 PCI Express-device: pci15ad,7a0 at 18,6, pcieb30 pcieb30 is /pci at 0,0/pci15ad,7a0 at 18,6 PCI Express-device: pci15ad,7a0 at 18,7, pcieb31 pcieb31 is /pci at 0,0/pci15ad,7a0 at 18,7 pseudo-device: stmf_sbd0 stmf_sbd0 is /pseudo/stmf_sbd at 0 PCI Express-device: pci15ad,790 at 11, pci_pci1 pci_pci1 is /pci at 0,0/pci15ad,790 at 11 NOTICE: e1000g0 registered NOTICE: e1000g0 link up, 1000 Mbps, full duplex pseudo-device: devinfo0 devinfo0 is /pseudo/devinfo at 0 pseudo-device: zfs0 zfs0 is /pseudo/zfs at 0 WARNING: drmach_init: number of logical CPUs (3) in physical processor is not power of 2. This Solaris instance has UUID a43f56bd-f5a4-6643-92e6-84262b07c26a dump on /dev/zvol/dsk/rpool/dump size 4096 MB pseudo-device: pm0 pm0 is /pseudo/pm at 0 pseudo-device: power0 power0 is /pseudo/power at 0 pseudo-device: srn0 srn0 is /pseudo/srn at 0 iscsi0 at root iscsi0 is /iscsi ISA-device: fdc0 fd0 at fdc0 fd0 is /pci at 0,0/isa at 7/fdc at 1,3f0/fd at 0,0 audioens#0: AC'97 codec id Cirrus Logic 0x43525913 (43525913, 2 channels, caps 0) PCI-device: pci1274,1371 at 1, audioens0 audioens0 is /pci at 0,0/pci15ad,790 at 11/pci1274,1371 at 1 ATAPI device at targ 0, lun 0 lastlun 0x0 model VMware Virtual IDE CDROM Drive ATA/ATAPI-4 supported, majver 0x1e minver 0x17 PCI Express-device: ide at 1, ata1 ata1 is /pci at 0,0/pci-ide at 7,1/ide at 1 UltraDMA mode 2 selected sd1 at ata1: target 0 lun 0 sd1 is /pci at 0,0/pci-ide at 7,1/ide at 1/sd at 0,0 device pciclass,030000 at f(display#0) keeps up device sd at 0,0(sd#1), but the former is not power managed pseudo-device: pool0 pool0 is /pseudo/pool at 0 pseudo-device: dtrace0 dtrace0 is /pseudo/dtrace 
at 0 pseudo-device: devinfo0 devinfo0 is /pseudo/devinfo at 0 panic[cpu2]/thread=ffffff02e40fb160: BAD TRAP: type=e (#pf Page fault) rp=ffffff001010c010 addr=fffffffffb8484b0 make: #pf Page fault Bad kernel fault at addr=0xfffffffffb8484b0 pid=19583, pc=0xfffffffffb8484b0, sp=0xffffff001010c108, eflags=0x10202 cr0: 8005003b cr4: 406b8 cr2: ffffff02d90f0ff8 cr3: 234af6000 cr8: 0 rdi: ffffff001010c110 rsi: fffffffffb8484b0 rdx: 2 rcx: 0 r8: fffffffffbc723a0 r9: 78 rax: fffffffffbc72420 rbx: fee19000 rbp: ffffff001010c110 r10: fffffffffbcf8cb0 r11: ffffff02e40fb160 r12: 0 r13: 0 r14: ffffff02fb5e1c00 r15: fffffffffb8484b0 fsb: 0 gsb: ffffff02dbcb4080 ds: 4b es: 4b fs: 0 gs: 1c3 trp: e err: 10 rip: fffffffffb8484b0 cs: 30 rfl: 10202 rsp: ffffff001010c108 ss: 0 ffffff001010bef0 unix:real_mode_stop_cpu_stage2_end+9e43 () ffffff001010c000 unix:trap+db3 () ffffff001010c010 unix:cmntrap+e6 () ffffff001010c110 unix:trap+0 () ffffff001010c210 unix:trap+0 () ffffff001010c310 unix:trap+0 () ffffff001010c410 unix:trap+0 () ffffff001010c510 unix:trap+0 () ffffff001010c610 unix:trap+0 () ffffff001010c710 unix:trap+0 () ffffff001010c810 unix:trap+0 () ffffff001010c910 unix:trap+0 () ffffff001010ca10 unix:trap+0 () ffffff001010cb10 unix:trap+0 () ffffff001010cc10 unix:trap+0 () ffffff001010cd10 unix:trap+0 () ffffff001010ce10 unix:trap+0 () ffffff001010cf10 unix:trap+0 () ffffff001010d010 unix:trap+0 () ffffff001010d110 unix:trap+0 () ffffff001010d210 unix:trap+0 () ffffff001010d310 unix:trap+0 () ffffff001010d410 unix:trap+0 () ffffff001010d510 unix:trap+0 () ffffff001010d610 unix:trap+0 () ffffff001010d710 unix:trap+0 () ffffff001010d810 unix:trap+0 () ffffff001010d910 unix:trap+0 () ffffff001010da10 unix:trap+0 () ffffff001010db10 unix:trap+0 () ffffff001010dc10 unix:trap+0 () ffffff001010dd10 unix:trap+0 () ffffff001010de10 unix:trap+0 () ffffff001010df10 unix:trap+0 () syncing file systems... 
done dumping to /dev/zvol/dsk/rpool/dump, offset 65536, content: kernel > fffffffffbc3b540::cpuinfo -v ID ADDR FLG NRUN BSPL PRI RNRN KRNRN SWITCH THREAD PROC 2 fffffffffbc3b540 1b 1 0 19 no no t-0 ffffff02e40fb160 make | | RUNNING <--+ +--> PRI THREAD PROC READY 0 ffffff02fbb5dae0 gcc EXISTS ENABLE -Ryan From josh at sysmgr.org Wed Dec 30 01:51:13 2015 From: josh at sysmgr.org (Joshua M. Clulow) Date: Tue, 29 Dec 2015 17:51:13 -0800 Subject: [OmniOS-discuss] Panic, BAD TRAP, r151014, VMWare Fusion 8.1.0 In-Reply-To: References: Message-ID: On 29 December 2015 at 17:28, Ryan Zezeski wrote: > While running a nightly build of illumos-gate the kernel panicked with > "BAD TRAP". Running OmniOS r151014 on VMWare Fusion. > > VMWare Fusion Version 8.1.0 (3272237) What model of Mac is this, and what model of CPU is in it? We have experienced some issues with SmartOS running in VMware Fusion on some models of Intel CPU. I believe there is an erratum about spurious page faults when running a hypervisor that makes use of EPT. Cheers. -- Joshua M. Clulow UNIX Admin/Developer http://blog.sysmgr.org From ryan at zinascii.com Wed Dec 30 02:11:34 2015 From: ryan at zinascii.com (Ryan Zezeski) Date: Tue, 29 Dec 2015 21:11:34 -0500 Subject: [OmniOS-discuss] Panic, BAD TRAP, r151014, VMWare Fusion 8.1.0 In-Reply-To: References: Message-ID: Joshua M. Clulow writes: > On 29 December 2015 at 17:28, Ryan Zezeski wrote: >> While running a nightly build of illumos-gate the kernel panicked with >> "BAD TRAP". Running OmniOS r151014 on VMWare Fusion. >> >> VMWare Fusion Version 8.1.0 (3272237) > > What model of Mac is this, and what model of CPU is in it? We have > experienced some issues with SmartOS running in VMware Fusion on some > models of Intel CPU. I believe there is an erratum about spurious > page faults when running a hypervisor that makes use of EPT. 
MacBook Pro Retina 15" Late 2013 Model Identifier: MacBookPro11,3 System Version: OS X 10.11.2 (15C50) $ sysctl -a machdep.cpu machdep.cpu.max_basic: 13 machdep.cpu.max_ext: 2147483656 machdep.cpu.vendor: GenuineIntel machdep.cpu.brand_string: Intel(R) Core(TM) i7-4850HQ CPU @ 2.30GHz machdep.cpu.family: 6 machdep.cpu.model: 70 machdep.cpu.extmodel: 4 machdep.cpu.extfamily: 0 machdep.cpu.stepping: 1 machdep.cpu.feature_bits: 9221960262849657855 machdep.cpu.leaf7_feature_bits: 12219 machdep.cpu.extfeature_bits: 142473169152 machdep.cpu.signature: 263777 machdep.cpu.brand: 0 machdep.cpu.features: FPU VME DE PSE TSC MSR PAE MCE CX8 APIC SEP MTRR PGE MCA CMOV PAT PSE36 CLFSH DS ACPI MMX FXSR SSE SSE2 SS HTT TM PBE SSE3 PCLMULQDQ DTES64 MON DSCPL VMX SMX EST TM2 SSSE3 FMA CX16 TPR PDCM SSE4.1 SSE4.2 x2APIC MOVBE POPCNT AES PCID XSAVE OSXSAVE SEGLIM64 TSCTMR AVX1.0 RDRAND F16C machdep.cpu.leaf7_features: SMEP ERMS RDWRFSGS TSC_THREAD_OFFSET BMI1 HLE AVX2 BMI2 INVPCID RTM FPU_CSDS machdep.cpu.extfeatures: SYSCALL XD 1GBPAGE EM64T LAHF LZCNT RDTSCP TSCI machdep.cpu.logical_per_package: 16 machdep.cpu.cores_per_package: 8 machdep.cpu.microcode_version: 15 machdep.cpu.processor_flag: 5 machdep.cpu.mwait.linesize_min: 64 machdep.cpu.mwait.linesize_max: 64 machdep.cpu.mwait.extensions: 3 machdep.cpu.mwait.sub_Cstates: 270624 machdep.cpu.thermal.sensor: 1 machdep.cpu.thermal.dynamic_acceleration: 1 machdep.cpu.thermal.invariant_APIC_timer: 1 machdep.cpu.thermal.thresholds: 2 machdep.cpu.thermal.ACNT_MCNT: 1 machdep.cpu.thermal.core_power_limits: 1 machdep.cpu.thermal.fine_grain_clock_mod: 1 machdep.cpu.thermal.package_thermal_intr: 1 machdep.cpu.thermal.hardware_feedback: 0 machdep.cpu.thermal.energy_policy: 1 machdep.cpu.xsave.extended_state: 7 832 832 0 machdep.cpu.xsave.extended_state1: 1 0 0 0 machdep.cpu.arch_perf.version: 3 machdep.cpu.arch_perf.number: 4 machdep.cpu.arch_perf.width: 48 machdep.cpu.arch_perf.events_number: 7 machdep.cpu.arch_perf.events: 0 
machdep.cpu.arch_perf.fixed_number: 3 machdep.cpu.arch_perf.fixed_width: 48 machdep.cpu.cache.linesize: 64 machdep.cpu.cache.L2_associativity: 8 machdep.cpu.cache.size: 256 machdep.cpu.tlb.inst.large: 8 machdep.cpu.tlb.data.small: 64 machdep.cpu.tlb.data.small_level1: 64 machdep.cpu.tlb.shared: 1024 machdep.cpu.address_bits.physical: 39 machdep.cpu.address_bits.virtual: 48 machdep.cpu.core_count: 4 machdep.cpu.thread_count: 8 machdep.cpu.tsc_ccc.numerator: 0 machdep.cpu.tsc_ccc.denominator: 0 -Ryan From danmcd at omniti.com Wed Dec 30 04:21:57 2015 From: danmcd at omniti.com (Dan McDonald) Date: Tue, 29 Dec 2015 23:21:57 -0500 Subject: [OmniOS-discuss] Panic, BAD TRAP, r151014, VMWare Fusion 8.1.0 In-Reply-To: References: Message-ID: <9E905831-B7CE-40A9-BD29-196079023D64@omniti.com> So is r151014 panicking? Or is your nightly build panicking? If the latter, it's not strictly an OmniOS problem. Also, if your uname -a is correct, you need to update your r151014. Dan Sent from my iPhone (typos, autocorrect, and all) > On Dec 29, 2015, at 9:11 PM, Ryan Zezeski wrote: > > > Joshua M. Clulow writes: > >>> On 29 December 2015 at 17:28, Ryan Zezeski wrote: >>> While running a nightly build of illumos-gate the kernel panicked with >>> "BAD TRAP". Running OmniOS r151014 on VMWare Fusion. >>> >>> VMWare Fusion Version 8.1.0 (3272237) >> >> What model of Mac is this, and what model of CPU is in it? We have >> experienced some issues with SmartOS running in VMware Fusion on some >> models of Intel CPU. I believe there is an erratum about spurious >> page faults when running a hypervisor that makes use of EPT. 
> > MacBook Pro Retina 15" Late 2013 > Model Identifier: MacBookPro11,3 > System Version: OS X 10.11.2 (15C50) > > -Ryan From ryan at zinascii.com Wed Dec 30 04:36:52 2015 From: ryan at zinascii.com (Ryan Zezeski) Date: Tue, 29 Dec 2015 23:36:52 -0500 Subject: [OmniOS-discuss] Panic, BAD TRAP, r151014, VMWare Fusion 8.1.0 In-Reply-To: <9E905831-B7CE-40A9-BD29-196079023D64@omniti.com> References: <9E905831-B7CE-40A9-BD29-196079023D64@omniti.com> Message-ID: Dan McDonald writes: > So is r151014 panicking? Or is your nightly build panicking? If the > latter, it's not strictly an OmniOS problem. r151014 panicked, not my nightly build. I.e., this was not an ONU'd system that panicked. > Also, if your uname -a is correct, you need to update your r151014. Yes, I need to update. I haven't touched this system in a while. I needed to run a nightly build and didn't want to wait to upgrade first. 
-Ryan From bfriesen at simple.dallas.tx.us Thu Dec 31 15:45:48 2015 From: bfriesen at simple.dallas.tx.us (Bob Friesenhahn) Date: Thu, 31 Dec 2015 09:45:48 -0600 (CST) Subject: [OmniOS-discuss] OmniOS r151016 zone has difficulties shutting down In-Reply-To: References: <536501D2-EA96-4F6B-8CB2-39A0F9698267@omniti.com> Message-ID: I would like to share an idea regarding how the newly created zone may have difficulties shutting down. I am accessing the system via ssh which uses '~.' to terminate the ssh session. This is also the default shutdown sequence for 'zlogin -C'. The idea is that after doing some initial administration using 'zlogin -C', the "~." sequence was used to quit it. This terminates the ssh session rather than the zone console login. While subsequent 'zlogin -C' works, it may be that the unclean/violent termination of the zone console login has left behind residue which prevents the zone from shutting down. Yesterday I created a new zone, which had no problems shutting down. I did not use 'zlogin -C' (only ordinary 'zlogin') with this new zone. It may be that 'zlogin -e ! -C' and then using '!.' to terminate the zlogin would avoid the problem. Regardless, this would still be a bug. Bob -- Bob Friesenhahn bfriesen at simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/ GraphicsMagick Maintainer, http://www.GraphicsMagick.org/ From Kevin.Swab at ColoState.EDU Thu Dec 31 16:19:20 2015 From: Kevin.Swab at ColoState.EDU (Kevin Swab) Date: Thu, 31 Dec 2015 09:19:20 -0700 Subject: [OmniOS-discuss] OmniOS r151016 zone has difficulties shutting down In-Reply-To: References: <536501D2-EA96-4F6B-8CB2-39A0F9698267@omniti.com> Message-ID: <56855588.7040308@ColoState.EDU> You can work round this by escaping the '~' character. Try typing '~~.' to exit from 'zlogin -C'. Here's the relevant section from the ssh man page: > A single tilde character can be sent as ~~, or by following the tilde > with a character other than those described above. 
The escape character > must always follow a newline to be interpreted as special. The escape > character can be changed in configuration files or on the command line. HTH, Kevin On 12/31/2015 08:45 AM, Bob Friesenhahn wrote: > I would like to share an idea regarding how the newly created zone may > have difficulties shutting down. > > I am accessing the system via ssh which uses '~.' to terminate the ssh > session. This is also the default shutdown sequence for 'zlogin -C'. > The idea is that after doing some initial administration using 'zlogin > -C', the "~." sequence was used to quit it. This terminates the ssh > session rather than the zone console login. While subsequent 'zlogin > -C' works, it may be that the unclean/violent termination of the zone > console login has left behind residue which prevents the zone from > shutting down. > > Yesterday I created a new zone, which had no problems shutting down. I > did not use 'zlogin -C' (only ordinary 'zlogin') with this new zone. > > It may be that 'zlogin -e ! -C' and then using '!.' to terminate the > zlogin would avoid the problem. > > Regardless, this would still be a bug. > > Bob -- ------------------------------------------------------------------- Kevin Swab UNIX Systems Administrator ACNS Colorado State University Phone: (970)491-6572 Email: Kevin.Swab at ColoState.EDU GPG Fingerprint: 7026 3F66 A970 67BD 6F17 8EB8 8A7D 142F 2392 791C
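
The escape-character clash described in this thread can also be avoided on the ssh side, since the quoted man page text notes that the escape character can be changed in configuration files. A minimal sketch of a ~/.ssh/config entry, assuming a hypothetical host alias "omnios-gz" and hostname "gz.example.com"; EscapeChar is the ssh_config option for this. Whether it also avoids the zone shutdown problem has not been tested here.

```
# ~/.ssh/config (host alias and hostname are hypothetical)
Host omnios-gz
    HostName gz.example.com
    # Move ssh's escape off '~' so that '~.' reaches 'zlogin -C'
    # instead of ending the ssh session. "none" disables ssh escapes.
    EscapeChar %
```

With an entry like this, "ssh omnios-gz" followed by "zlogin -C" should let '~.' detach from the zone console, while '%.' at the start of a line still ends the ssh session. The same change can be made for a single invocation with "ssh -e".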