From richard at netbsd.org Thu Oct 1 09:50:03 2015
From: richard at netbsd.org (Richard PALO)
Date: Thu, 01 Oct 2015 11:50:03 +0200
Subject: [OmniOS-discuss] strangeness ssh into omnios from oi_151a9
In-Reply-To: <20150930080248.GA4668@gutsman.lotheac.fi>
References: <5606BAD8.8090101@netbsd.org> <33923013-0E59-4223-8EF4-A77A168E1C70@omniti.com> <8963D7A6-2339-4E6F-9559-9DBAAAAD23BF@omniti.com> <20150928134639.GC17072@gutsman.lotheac.fi> <20150928154027.GD5062@gutsman.lotheac.fi> <20150929103507.GE17072@gutsman.lotheac.fi> <560B95BF.4080404@netbsd.org> <20150930080248.GA4668@gutsman.lotheac.fi>
Message-ID: <560D01CB.1060303@netbsd.org>

On 30/09/15 10:02, Lauri Tirkkonen wrote:
> On Wed, Sep 30 2015 09:56:47 +0200, Richard PALO wrote:
>>> To be clear, it's not implementing RFC 1323 (and not even *not*
>>> implementing 7323) that causes the issue. 1323 actually didn't specify
>>> what to do with non-timestamped segments on a timestamp-negotiated
>>> connection, and illumos pre-5850 did something very surprising which I
>>> doubt anybody else did (stop generating timestamps on all future
>>> segments), so I don't think you will be able to reproduce the hang with
>>> other operating systems, but you'll likely be able to see the unexpected
>>> non-timestamped segments in connections between other OSes as well (but
>>> I still can't be sure because I don't know what middlebox is injecting
>>> them or why :)
>>
>> In that case, wouldn't setting tcp_tstamp_always on OI to '1' be better
>> in this case (or would OI not honour that setting correctly)?
>
> It wouldn't work. From what I can tell, those ndd settings only affect
> the SYN segments (i.e. timestamp negotiation); pre-5850 illumos will
> always stop timestamping mid-connection if it receives a non-timestamped
> segment.

Okay, I set tcp_tstamp_if_wscale to 0 and it does seem to work fine.
(Hoping there isn't any fallout from doing this now...)

kiitoksia (thanks)

From lotheac at iki.fi Thu Oct 1 09:58:00 2015
From: lotheac at iki.fi (Lauri Tirkkonen)
Date: Thu, 1 Oct 2015 12:58:00 +0300
Subject: [OmniOS-discuss] strangeness ssh into omnios from oi_151a9
In-Reply-To: <560D01CB.1060303@netbsd.org>
References: <33923013-0E59-4223-8EF4-A77A168E1C70@omniti.com> <8963D7A6-2339-4E6F-9559-9DBAAAAD23BF@omniti.com> <20150928134639.GC17072@gutsman.lotheac.fi> <20150928154027.GD5062@gutsman.lotheac.fi> <20150929103507.GE17072@gutsman.lotheac.fi> <560B95BF.4080404@netbsd.org> <20150930080248.GA4668@gutsman.lotheac.fi> <560D01CB.1060303@netbsd.org>
Message-ID: <20151001095800.GB4668@gutsman.lotheac.fi>

On Thu, Oct 01 2015 11:50:03 +0200, Richard PALO wrote:
>>> In that case, wouldn't setting tcp_tstamp_always on OI to '1' be better
>>> in this case (or would OI not honour that setting correctly)?
>>
>> It wouldn't work. From what I can tell, those ndd settings only affect
>> the SYN segments (i.e. timestamp negotiation); pre-5850 illumos will
>> always stop timestamping mid-connection if it receives a non-timestamped
>> segment.
>
> Okay, I set tcp_tstamp_if_wscale to 0 and it does seem to work fine.

Thanks, that pretty much confirms the issue is what I suspected it is.

> (Hoping there isn't any fallout from doing this now...)

As long as that middlebox has been mucking with your traffic in the way
it is, timestamps have been getting turned off mid-connection for your
pre-5850 box. I recommend you upgrade to post-5850 if you can, or
scream loudly at whoever is modifying your traffic :)

-- 
Lauri Tirkkonen | lotheac @ IRCnet
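[The two tunables discussed above are global TCP knobs on illumos. A
minimal sketch of inspecting and changing them with ndd(1M), mirroring
what Richard did; note that ndd reads a value when given no flag, and
that changes apply system-wide, to newly negotiated connections only:]

    # read the current values
    ndd /dev/tcp tcp_tstamp_always
    ndd /dev/tcp tcp_tstamp_if_wscale

    # what Richard set: stop negotiating timestamps when window
    # scaling is in use
    ndd -set /dev/tcp tcp_tstamp_if_wscale 0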
From richard at netbsd.org Thu Oct 1 11:49:03 2015
From: richard at netbsd.org (Richard PALO)
Date: Thu, 01 Oct 2015 13:49:03 +0200
Subject: [OmniOS-discuss] strangeness ssh into omnios from oi_151a9
In-Reply-To: <20151001095800.GB4668@gutsman.lotheac.fi>
References: <33923013-0E59-4223-8EF4-A77A168E1C70@omniti.com> <8963D7A6-2339-4E6F-9559-9DBAAAAD23BF@omniti.com> <20150928134639.GC17072@gutsman.lotheac.fi> <20150928154027.GD5062@gutsman.lotheac.fi> <20150929103507.GE17072@gutsman.lotheac.fi> <560B95BF.4080404@netbsd.org> <20150930080248.GA4668@gutsman.lotheac.fi> <560D01CB.1060303@netbsd.org> <20151001095800.GB4668@gutsman.lotheac.fi>
Message-ID:

On 01/10/15 11:58, Lauri Tirkkonen wrote:
> On Thu, Oct 01 2015 11:50:03 +0200, Richard PALO wrote:
>>>> In that case, wouldn't setting tcp_tstamp_always on OI to '1' be better
>>>> in this case (or would OI not honour that setting correctly)?
>>>
>>> It wouldn't work. From what I can tell, those ndd settings only affect
>>> the SYN segments (i.e. timestamp negotiation); pre-5850 illumos will
>>> always stop timestamping mid-connection if it receives a non-timestamped
>>> segment.
>>
>> Okay, I set tcp_tstamp_if_wscale to 0 and it does seem to work fine.
>
> Thanks, that pretty much confirms the issue is what I suspected it is.
>
>> (Hoping there isn't any fallout from doing this now...)
>
> As long as that middlebox has been mucking with your traffic in the way
> it is, timestamps have been getting turned off mid-connection for your
> pre-5850 box. I recommend you upgrade to post-5850 if you can, or
> scream loudly at whoever is modifying your traffic :)

Actually I still notice some problems... This morning, in the direction
OI => omnios, things seemed okay. Now, omnios => OI, I just experienced
the hang again, and it is repeatable.

Could it be that your workaround is only useful for outbound connections
(relative to OI)?

-- 
Richard PALO
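[The unexpected non-timestamped segments can also be observed directly
on the wire; a rough sketch with snoop(1M), where the interface name is
an assumption. The verbose decode prints each segment's TCP options, so
a timestamp option that disappears mid-connection becomes visible:]

    # full per-segment decode of the ssh connection, including options
    snoop -d e1000g0 -v port 22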
From lotheac at iki.fi Thu Oct 1 12:24:11 2015
From: lotheac at iki.fi (Lauri Tirkkonen)
Date: Thu, 1 Oct 2015 15:24:11 +0300
Subject: [OmniOS-discuss] strangeness ssh into omnios from oi_151a9
In-Reply-To:
References: <20150928134639.GC17072@gutsman.lotheac.fi> <20150928154027.GD5062@gutsman.lotheac.fi> <20150929103507.GE17072@gutsman.lotheac.fi> <560B95BF.4080404@netbsd.org> <20150930080248.GA4668@gutsman.lotheac.fi> <560D01CB.1060303@netbsd.org> <20151001095800.GB4668@gutsman.lotheac.fi>
Message-ID: <20151001122411.GD4668@gutsman.lotheac.fi>

On Thu, Oct 01 2015 13:49:03 +0200, Richard PALO wrote:
> Actually I still notice some problems... This morning, in the direction
> OI => omnios, things seemed okay. Now, omnios => OI, I just experienced
> the hang again, and it is repeatable.
>
> Could it be that your workaround is only useful for outbound connections
> (relative to OI)?

Yeah, it's possible. Whoever sends the SYN expresses their capability to
timestamp by including the tsopt, and you can disable that with the ndd
options. I assumed that the ndd options would affect the SYNACK as well,
but I didn't actually read the code; I guess that's not the case after
all, so inbound connections still get timestamping negotiated. I don't
have a workaround for this, sorry.

-- 
Lauri Tirkkonen | lotheac @ IRCnet

From richard at netbsd.org Thu Oct 1 12:29:32 2015
From: richard at netbsd.org (Richard PALO)
Date: Thu, 01 Oct 2015 14:29:32 +0200
Subject: [OmniOS-discuss] strangeness ssh into omnios from oi_151a9
In-Reply-To: <20151001122342.GC4668@gutsman.lotheac.fi>
References: <20150928134639.GC17072@gutsman.lotheac.fi> <20150928154027.GD5062@gutsman.lotheac.fi> <20150929103507.GE17072@gutsman.lotheac.fi> <560B95BF.4080404@netbsd.org> <20150930080248.GA4668@gutsman.lotheac.fi> <560D01CB.1060303@netbsd.org> <20151001095800.GB4668@gutsman.lotheac.fi> <20151001122342.GC4668@gutsman.lotheac.fi>
Message-ID: <560D272C.9040506@netbsd.org>

On 01/10/15 14:23, Lauri Tirkkonen wrote:
> Yeah, it's possible. Whoever sends the SYN expresses their capability to
> timestamp by including the tsopt, and you can disable that with the ndd
> options. I assumed that the ndd options would affect the SYNACK as well,
> but I didn't actually read the code; I guess that's not the case after
> all, so inbound connections still get timestamping negotiated. I don't
> have a workaround for this, sorry.

Too bad.
Naturally it isn't feasible to turn things off via ndd on omnios for just
one target. Is there any way to do that differently? That is, for only
one target (and primarily ssh)?

Unfortunately, it also seems my inquiry to the OI list went unheard, even
after subscribing (again). Must not have any moderators any longer... oh
bother. The easiest would be to have 5850 integrated into OI.

-- 
Richard PALO

From lotheac at iki.fi Thu Oct 1 12:34:11 2015
From: lotheac at iki.fi (Lauri Tirkkonen)
Date: Thu, 1 Oct 2015 15:34:11 +0300
Subject: [OmniOS-discuss] strangeness ssh into omnios from oi_151a9
In-Reply-To: <560D272C.9040506@netbsd.org>
References: <20150928154027.GD5062@gutsman.lotheac.fi> <20150929103507.GE17072@gutsman.lotheac.fi> <560B95BF.4080404@netbsd.org> <20150930080248.GA4668@gutsman.lotheac.fi> <560D01CB.1060303@netbsd.org> <20151001095800.GB4668@gutsman.lotheac.fi> <20151001122342.GC4668@gutsman.lotheac.fi> <560D272C.9040506@netbsd.org>
Message-ID: <20151001123411.GE4668@gutsman.lotheac.fi>

On Thu, Oct 01 2015 14:29:32 +0200, Richard PALO wrote:
> Too bad. Naturally it isn't feasible to turn things off via ndd on
> omnios for just one target. Is there any way to do that differently?
> That is, for only one target (and primarily ssh)?

Not that I know of.

-- 
Lauri Tirkkonen | lotheac @ IRCnet

From pekka.niiranen at pp5.inet.fi Thu Oct 1 19:00:18 2015
From: pekka.niiranen at pp5.inet.fi (Pekka Niiranen)
Date: Thu, 1 Oct 2015 22:00:18 +0300
Subject: [OmniOS-discuss] Intel HD 3000 graphics
Message-ID:

Hello,

has anybody managed to get Intel HD 3000 in Sandy Bridge to work using X
from SmartOS packages? Dmesg shows the chip and "X -configure" selects
the "intel" driver, but X only starts after replacing that with "vesa";
otherwise I get "No screens found".

I found https://www.illumos.org/issues/4044 but what does "Work
completed" in there mean for the user? Do I need to build the driver
myself?

-pekka-

From daleg at omniti.com Thu Oct 1 19:31:42 2015
From: daleg at omniti.com (Dale Ghent)
Date: Thu, 1 Oct 2015 15:31:42 -0400
Subject: [OmniOS-discuss] Intel HD 3000 graphics
In-Reply-To:
References:
Message-ID: <28E20456-7B62-445E-B60D-5A4F93CA8A2B@omniti.com>

> On Oct 1, 2015, at 3:00 PM, Pekka Niiranen wrote:
>
> has anybody managed to get Intel HD 3000 in Sandy Bridge to work using
> X from SmartOS packages?

Xwindows/Xorg is not a target audience for OmniOS, so little, if any,
attention is given to issues surrounding support for a windowing
environment.
/dale

From richard at netbsd.org Thu Oct 1 19:47:41 2015
From: richard at netbsd.org (Richard PALO)
Date: Thu, 01 Oct 2015 21:47:41 +0200
Subject: [OmniOS-discuss] Intel HD 3000 graphics
In-Reply-To:
References:
Message-ID:

On 01/10/15 21:00, Pekka Niiranen wrote:
> has anybody managed to get Intel HD 3000 in Sandy Bridge to work using
> X from SmartOS packages? Dmesg shows the chip and "X -configure"
> selects the "intel" driver, but X only starts after replacing that
> with "vesa"; otherwise I get "No screens found".
>
> I found https://www.illumos.org/issues/4044 but what does "Work
> completed" in there mean for the user? Do I need to build the driver
> myself?

according to http://www.x.org/wiki/IntelGraphicsDriver/ I believe you
will need a working Kernel Mode Setting (KMS) implementation. It seems
all versions of the Intel driver since 2.15 only support KMS...

-- 
Richard PALO

From Robert.Brock at 2hoffshore.com Fri Oct 2 09:26:13 2015
From: Robert.Brock at 2hoffshore.com (Robert A. Brock)
Date: Fri, 2 Oct 2015 09:26:13 +0000
Subject: [OmniOS-discuss] Cannot access CIFS with \\servername.fqdn
Message-ID: <2859482C466CCA42AD9B84B9F56212301506AF27@2H199.2hukwok2.local>

List,

I've got a box that's doing this: https://www.illumos.org/issues/1087

Trying to hit the CIFS share by FQDN gets me this:

Error: \\server.domain.local is not accessible. You might not have
permission to use this network resource. Contact the administrator of
this server to find out if you have access permissions. The account is
not authorized to log in from this station.

But I can connect fine with \\hostname. For added weirdness, I have
another OmniOS box that isn't exhibiting this problem. Anybody seen this
before and managed to figure out the cause?

Regards,

Rob

From gate03 at landcroft.co.uk Fri Oct 2 22:56:07 2015
From: gate03 at landcroft.co.uk (Michael Mounteney)
Date: Sat, 3 Oct 2015 08:56:07 +1000
Subject: [OmniOS-discuss] Unable to install r151014
Message-ID: <20151003085607.19a2bf44@pimple.landy.net>

Hello, I tried to install LTS yesterday, but at the menu where one
selects [1] to install, it just paused for a moment before flashing up
what looked like a Python stacktrace, then returned to the five-option
menu.

The machine (Supermicro SYS 5107C-LF) is running r151013, so I believe it
is compatible hardware.

The first step is to capture that stacktrace. How to do that?

______________
Michael Mounteney

From danmcd at omniti.com Sat Oct 3 01:34:51 2015
From: danmcd at omniti.com (Dan McDonald)
Date: Fri, 2 Oct 2015 21:34:51 -0400
Subject: [OmniOS-discuss] Unable to install r151014
In-Reply-To: <20151003085607.19a2bf44@pimple.landy.net>
References: <20151003085607.19a2bf44@pimple.landy.net>
Message-ID: <6EF1C261-400F-4020-A2D8-C58119CF2561@omniti.com>

If you're running 013, you can just use IPS to upgrade. But yes, I'd like
to see what happened. I'll also reverify locally. You using a USB or an
ISO?

Dan

Sent from my iPhone (typos, autocorrect, and all)

> On Oct 2, 2015, at 6:56 PM, Michael Mounteney wrote:
>
> The first step is to capture that stacktrace. How to do that?
>
> ______________
> Michael Mounteney
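[One way to capture that stacktrace, sketched under assumptions: the
installer's GRUB menu already offers ttya/ttyb console entries, so boot
the "ttya" entry and log the serial console from a second machine over a
null-modem cable. The device name below is an assumption:]

    # on the second machine: record the session, then attach to the line
    script install-stacktrace.log
    tip -9600 /dev/term/a
    # boot the target with the serial-console menu entry; the Python
    # traceback then lands in install-stacktrace.log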
From marcus at plumdev.com Mon Oct 5 00:35:06 2015
From: marcus at plumdev.com (Marcus Marinelli)
Date: Sun, 4 Oct 2015 17:35:06 -0700
Subject: [OmniOS-discuss] Unable to boot r151014 USB installation media; difference between ISO and USB files?
Message-ID:

Hi All,

I'm hoping to be able to use OmniOS in a hosted/cloud environment, but
I'm having some problems getting a working image set up.

I began by trying to install the latest stable (r151014) by writing the
USB disk installation file (OmniOS_Text_r151014.usb-dd) to a virtual
device on the VM provider, adding a blank disk to install to, and booting
the virtualized KVM node against the USB image. The VM comes up and shows
the OmniOS GRUB splash screen, and lets me choose the regular or
ttya/ttyb boot menu selections; however, after the initial "SunOS Release
5.11 Version omnios-f090f73 64-bit" messages show up, things start going
poorly. After "Preparing image for use" is shown, next I see:

Requesting System Maintenance Mode
(See /lib/svc/share/README for more information.)
Console login service(s) cannot run

Enter user name for system maintenance (control-d to bypass):

At this point, once I drop into the shell, svcs -vx shows
svc:/system/filesystem/root-assembly:media (Installation file system
assembly) is in state maintenance due to reason "Start method exited with
$SMF_EXIT_ERR_FATAL" and consequently a whole bunch (55) of dependent
services are not running.

If I look at the svc log for system-filesystem-root-assembly:media, I
see:

Executing start method ("/lib/svc/method/media-assembly")
Unable to mount media

Looking in /lib/svc/method/media-assembly, I believe it is failing quite
early:

. /lib/svc/share/media_include.sh
. /lib/svc/share/smf_include.sh
. /lib/svc/share/fs_include.sh

volsetid=$( < "/.volsetid" )

echo "\rPreparing image for use" >/dev/msglog
/usr/sbin/mount_media $volsetid
if [ $? -ne 0 ]; then
        echo "Unable to mount media"
        exit $SMF_EXIT_ERR_FATAL
fi

For reference, the contents of /.volsetid ($volsetid) is
"r151014-2015-09-30T13:22:46.571554"

If I run the /lib/svc/method/media-assembly script at this point (from
the shell) I can see that /usr/sbin/mount_media is returning exit code 1,
which explains why the script is failing and consequently SMF has given
up.
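[A sketch of re-running that failing step by hand from the maintenance
shell, matching the method script quoted above:]

    # reproduce the start method's check manually
    volsetid=$(cat /.volsetid)
    /usr/sbin/mount_media "$volsetid"
    echo $?   # prints 1 in this case; any non-zero exit is what sends
              # root-assembly:media into maintenance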
In order to diagnose this further, I then took the same USB image
(OmniOS_Text_r151014.usb-dd), wrote it to a real physical USB disk, and
booted a real physical machine from it. The machine is 2 or 3 years old
at this point - not brand new, but not ancient either. The surprising
thing is that I experienced the same behavior - the "real" machine failed
to boot in the same way, with seemingly the same issue.

At this point, I thought maybe something was wrong with the r151014 USB
installation media, so I downloaded r151012, r151010, r151006 and
r151004's USB installation files, flashed them all (in sequence, working
backwards) to a real USB stick, and experienced the same problem on all
of them.

I then took the r151014 ISO image file, burned that to a disc, put it in
the same physical computer, and it booted right up and loaded the OmniOS
installer fine.

It looks like this may be related to another user's recent experience:
http://lists.omniti.com/pipermail/omnios-discuss/2015-September/005651.html
(at least it sure seems like the same failure mode, although it's not
clear from that email chain whether the user installed r151012
successfully from the .usb-dd image or whether they, perhaps, also used
an ISO image for the successful '012 installation they mentioned)

Can anyone (Dan? :)) help shed some light on what I might be doing wrong
here with the USB installation media?

Thanks,
Marcus

From danmcd at omniti.com Mon Oct 5 14:04:32 2015
From: danmcd at omniti.com (Dan McDonald)
Date: Mon, 5 Oct 2015 10:04:32 -0400
Subject: [OmniOS-discuss] Unable to boot r151014 USB installation media; difference between ISO and USB files?
In-Reply-To:
References:
Message-ID: <18281AE8-D7CF-48E4-8B85-7B132118C4D2@omniti.com>

> On Oct 4, 2015, at 8:35 PM, Marcus Marinelli wrote:
>
> Can anyone (Dan? :)) help shed some light on what I might be doing
> wrong here with the USB installation media?

It is possible there's a problem in distro_const(1M) when it comes to the
USB media. What's concerning, however, is this:

> At this point, I thought maybe something was wrong with the r151014 USB
> installation media, so I downloaded r151012, r151010, r151006 and
> r151004's USB installation files, flashed them all (in sequence,
> working backwards) to a real USB stick, and experienced the same
> problem on all of them.

Another user recently had a problem with USB, but it turned out it was a
bad USB drive on his end. Please make sure. Also, and this may be a
documentation error, are you writing to the right disk device?

Dan
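[On Dan's last question, a sketch of writing the usb-dd image to the
whole-disk device. The device name here is an assumption; check it with
format(1M) first, since writing to a slice or partition device instead
of the whole disk is an easy mistake to make:]

    # p0 addresses the whole disk on illumos
    dd if=OmniOS_Text_r151014.usb-dd of=/dev/rdsk/c4t0d0p0 bs=1024k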
From asc1111 at gmail.com Mon Oct 5 16:45:37 2015
From: asc1111 at gmail.com (Aaron Curry)
Date: Mon, 5 Oct 2015 10:45:37 -0600
Subject: [OmniOS-discuss] zfs send/receive corruption?
Message-ID:

We have a file server implementing CIFS to serve files to our users.
Periodic snapshots are replicated to a secondary system via zfs
send/receive. I recently moved services (shares, IP addresses, etc.) to
the secondary system while we performed some maintenance on the primary
server. Shortly after everything was up and running on the secondary
system, that server panic'ed. Here's the stack trace:

panicstr = assertion failed: 0 == zfs_acl_node_read(dzp, B_TRUE, &paclp,
B_FALSE), file: ../../common/fs/zfs/zfs_acl.c, line: 1717
panicstack = fffffffffba8b1a8 () | zfs:zfs_acl_ids_create+4d2 () |
zfs:zfs_make_xattrdir+96 () | zfs:zfs_get_xattrdir+103 () |
zfs:zfs_lookup+1b6 () | genunix:fop_lookup+a2 () |
genunix:xattr_dir_realdir+b3 () | genunix:xattr_lookup_cb+65 () |
genunix:gfs_dir_lookup_dynamic+7c () | genunix:gfs_dir_lookup+18c () |
genunix:gfs_vop_lookup+35 () | genunix:fop_lookup+a2 () |
smbsrv:smb_vop_lookup+ea () | smbsrv:smb_vop_stream_lookup+e5 () |
smbsrv:smb_fsop_lookup_name+158 () | smbsrv:smb_open_subr+1b8 () |
smbsrv:smb_common_open+54 () | smbsrv:smb_com_nt_create_andx+ac () |
smbsrv:smb_dispatch_request+687 () | smbsrv:smb_session_worker+a0 () |
genunix:taskq_d_thread+b7 () | unix:thread_start+8 () |

Luckily, it wasn't hard to identify the steps to reproduce the problem.
Accessing a particular directory from a Mac OS X system causes this panic
every time, but only on the secondary (zfs send/receive target) system.
Accessing the same directory on the primary system does not cause a
panic.

I have tested this on other systems and have been able to reproduce the
panic on the zfs send/receive target every time. Also, I have been able
to reproduce it with OmniOS versions 151010, 151012 and the latest
151014. Replicating between two separate systems and replicating to the
local system both exhibit the same behavior.

While I have been able to reliably pin down a particular file or
file/directory combination that is causing the problem and can easily
reproduce the panic, I am at a loss as to where to go from here. Are
there any known issues with zfs send/receive? Any help would be
appreciated.

Aaron
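[A sketch of the replication path Aaron describes, for anyone trying to
reproduce locally; pool and dataset names are placeholders:]

    # replicate the affected filesystem to a second dataset on one host
    zfs snapshot tank/files@repro
    zfs send tank/files@repro | zfs receive -u tank/files-copy
    # then share the received copy over CIFS and browse the problem
    # directory from OS X, which is what triggers the panic here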
From mir at miras.org Mon Oct 5 17:09:46 2015
From: mir at miras.org (Michael Rasmussen)
Date: Mon, 5 Oct 2015 19:09:46 +0200
Subject: [OmniOS-discuss] zfs send/receive corruption?
In-Reply-To:
References:
Message-ID: <20151005190946.503fa851@sleipner.datanom.net>

On Mon, 5 Oct 2015 10:45:37 -0600, Aaron Curry wrote:
>
> While I have been able to reliably pin down a particular file or
> file/directory combination that is causing the problem and can easily
> reproduce the panic, I am at a loss as to where to go from here. Are
> there any known issues with zfs send/receive? Any help would be
> appreciated.
>

What is the sync setting on the receiving pool?

-- 
Hilsen/Regards
Michael Rasmussen

Get my public GnuPG keys:
michael rasmussen cc
http://pgp.mit.edu:11371/pks/lookup?op=get&search=0xD3C9A00E
mir datanom net
http://pgp.mit.edu:11371/pks/lookup?op=get&search=0xE501F51C
mir miras org
http://pgp.mit.edu:11371/pks/lookup?op=get&search=0xE3E80917
--------------------------------------------------------------
/usr/games/fortune -es says:
There are no emotional victims, only volunteers.

From mir at miras.org Mon Oct 5 17:45:41 2015
From: mir at miras.org (Michael Rasmussen)
Date: Mon, 5 Oct 2015 19:45:41 +0200
Subject: [OmniOS-discuss] zfs send/receive corruption?
In-Reply-To:
References: <20151005190946.503fa851@sleipner.datanom.net>
Message-ID: <20151005194541.6e7d7b70@sleipner.datanom.net>

On Mon, 5 Oct 2015 11:30:04 -0600, Aaron Curry wrote:
> # zfs get sync pool/fs
> NAME     PROPERTY  VALUE     SOURCE
> pool/fs  sync      standard  default
>
> Is that what you mean?
>
Yes. Default means honor sync requests.

-- 
Hilsen/Regards
Michael Rasmussen

Get my public GnuPG keys:
michael rasmussen cc
http://pgp.mit.edu:11371/pks/lookup?op=get&search=0xD3C9A00E
mir datanom net
http://pgp.mit.edu:11371/pks/lookup?op=get&search=0xE501F51C
mir miras org
http://pgp.mit.edu:11371/pks/lookup?op=get&search=0xE3E80917
--------------------------------------------------------------
/usr/games/fortune -es says:
Love isn't only blind, it's also deaf, dumb, and stupid.

From chip at innovates.com Mon Oct 5 18:28:48 2015
From: chip at innovates.com (Schweiss, Chip)
Date: Mon, 5 Oct 2015 13:28:48 -0500
Subject: [OmniOS-discuss] zfs send/receive corruption?
In-Reply-To: <20151005194541.6e7d7b70@sleipner.datanom.net>
References: <20151005190946.503fa851@sleipner.datanom.net> <20151005194541.6e7d7b70@sleipner.datanom.net>
Message-ID:

This smells of a problem reported fixed on FreeBSD and ZoL:

http://permalink.gmane.org/gmane.comp.file-systems.openzfs.devel/1545

On the illumos ZFS list the question was posed whether those fixes have
been incorporated, but it went unanswered:

http://www.listbox.com/member/archive/182191/2015/09/sort/time_rev/page/1/entry/23:71/20150916025648:1487D326-5C40-11E5-A45A-20B0EF10038B/

I'd be curious to confirm whether this has been fixed in illumos or not,
as I now have systems with lots of CIFS and ACLs that are potentially
vulnerable to the same sort of problem. Thus far I cannot find reference
to it, but I could be looking in the wrong place, or for the wrong
keywords.

-Chip

On Mon, Oct 5, 2015 at 12:45 PM, Michael Rasmussen wrote:
> Yes. Default means honor sync requests.

From henson at acm.org Tue Oct 6 20:54:26 2015
From: henson at acm.org (Paul B. Henson)
Date: Tue, 06 Oct 2015 13:54:26 -0700
Subject: [OmniOS-discuss] zdb -h bug?
In-Reply-To: <034601d0f278$08ce27b0$1a6a7710$@acm.org>
References: <005801d0ef58$da1898f0$8e49cad0$@acm.org> <55F8A73E.6050003@genashor.com> <034601d0f278$08ce27b0$1a6a7710$@acm.org>
Message-ID: <02f401d10079$32e1b2b0$98a51810$@acm.org>

> From: Paul B. Henson
> Sent: Friday, September 18, 2015 6:11 PM
>
> Thanks for the verification. I gotta tell you, when zdb core dumped
> while I was trying to determine if my pool had been corrupted by the
> L2ARC bug, it was not a good feeling 8-/. But I'm pretty sure at this
> point it is an unrelated bug with zdb and not a pool corruption issue.
> I still haven't had time to set up a test environment to reproduce it,
> maybe next week.

I noticed a review request for "6290 zdb -h overflows stack" fly by on
the ZFS development list. I haven't confirmed it, but I think this is the
bug we were running into that was causing zdb -h to dump core. It's a
pretty trivial change; Dan, any chance you might be able to backport it
to LTS at some point?

Thanks.

From danmcd at omniti.com Tue Oct 6 23:10:05 2015
From: danmcd at omniti.com (Dan McDonald)
Date: Tue, 6 Oct 2015 19:10:05 -0400
Subject: [OmniOS-discuss] zdb -h bug?
In-Reply-To: <02f401d10079$32e1b2b0$98a51810$@acm.org>
References: <005801d0ef58$da1898f0$8e49cad0$@acm.org> <55F8A73E.6050003@genashor.com> <034601d0f278$08ce27b0$1a6a7710$@acm.org> <02f401d10079$32e1b2b0$98a51810$@acm.org>
Message-ID:

> On Oct 6, 2015, at 4:54 PM, Paul B. Henson wrote:
>
> Dan, any chance you might be able to backport it to LTS at some point?

I'm hoping the next batch of changes stays entirely in userland and
doesn't require a reboot. The problem with patching zdb is that it
requires an upgrade of the whole ZFS package, which forces a reboot.
r151014(~)[0]% pkg search `which zdb`
INDEX      ACTION    VALUE         PACKAGE
path       hardlink  usr/sbin/zdb  pkg:/system/file-system/zfs@0.5.11-0.151014
r151014(~)[0]%

So if it happens, it mightn't happen as quickly as some other things I
have in the backport pipeline.

Dan

From henson at acm.org Wed Oct 7 19:43:20 2015
From: henson at acm.org (Paul B. Henson)
Date: Wed, 07 Oct 2015 12:43:20 -0700
Subject: [OmniOS-discuss] zdb -h bug?
In-Reply-To:
References: <005801d0ef58$da1898f0$8e49cad0$@acm.org> <55F8A73E.6050003@genashor.com> <034601d0f278$08ce27b0$1a6a7710$@acm.org> <02f401d10079$32e1b2b0$98a51810$@acm.org>
Message-ID: <03fb01d10138$6bec7530$43c55f90$@acm.org>

> From: Dan McDonald
> Sent: Tuesday, October 06, 2015 4:10 PM
>
> I'm hoping the next batch of changes stays entirely in userland and
> doesn't require a reboot. The problem with patching zdb is that it
> requires an upgrade of the whole ZFS package, which forces a reboot.

That's a bummer. Maybe you can pull it in the next time you backport a
zfs kernel change. It's not particularly important; I'm mostly curious to
see whether it fixes the core dump I get from zdb, which is the last
niggly annoyance I have left from the L2ARC corruption scare :).

Thanks.

From bmx1955 at gmail.com Wed Oct 7 20:59:14 2015
From: bmx1955 at gmail.com (Mick Burns)
Date: Wed, 7 Oct 2015 16:59:14 -0400
Subject: [OmniOS-discuss] big zfs storage?
In-Reply-To:
References: <559EE5BF.4040900@kateley.com> <559FF4DE.4040202@kateley.com>
Message-ID:

So... how does Nexenta cope with hot spares and all kinds of disk
failures? Adding hot spares is part of their administration manuals, so
can we assume things are almost always handled smoothly? I'd like to hear
about tangible experiences in production.

thanks

On Mon, Jul 13, 2015 at 7:58 AM, Schweiss, Chip wrote:
> Liam,
>
> This report is encouraging. Please share some details of your
> configuration. What disk failure parameters have you set? Which JBODs
> and disks are you running?
>
> I have mostly DataON JBODs and some Supermicro. DataON has PMC SAS
> expanders and Supermicro has LSI; both setups have pretty much the same
> behavior with disk failures. All my servers are Supermicro with LSI
> HBAs.
>
> If there's a magic combination of hardware and OS config out there that
> solves the disk-failure panic problem, I will certainly change my
> builds going forward.
>
> -Chip
>
> On Fri, Jul 10, 2015 at 1:04 PM, Liam Slusser wrote:
>>
>> I have two 800T ZFS systems on OmniOS and a bunch of smaller <50T
>> systems. Things generally work very well. We lose a disk here and
>> there but it's never resulted in downtime. They're all on Dell
>> hardware with LSI or Dell PERC controllers.
>>
>> Putting in smaller disk failure parameters, so disks fail quicker, was
>> a big help when something does go wrong with a disk.
>>
>> thanks,
>> liam
>>
>> On Fri, Jul 10, 2015 at 10:31 AM, Schweiss, Chip wrote:
>>>
>>> Unfortunately, for the past couple of years panics on disk failure
>>> have been the norm. All my production systems are HA with RSF-1, so
>>> at least things come back online relatively quickly. There are quite
>>> a few open tickets in the illumos bug tracker related to mpt_sas
>>> panics.
>>>
>>> Most of the work to fix these problems has been committed in the past
>>> year, though problems still exist. For example, my systems are
>>> dual-path SAS; however, mpt_sas will panic if you pull a cable
>>> instead of dropping a path to the disks. Dan McDonald is actively
>>> working to resolve this.
>>> He is also pushing a bug fix in genunix from Nexenta that appears to
>>> fix a lot of the panic problems. I'll know for sure in a few months,
>>> after I see a disk or two drop, whether it truly fixes things. Hans
>>> Rosenfeld at Nexenta is responsible for most of the updates to
>>> mpt_sas, including support for the 3008 (12G SAS).
>>>
>>> I haven't run any 12G SAS yet, but plan to on my next build in a
>>> couple of months. This will be about 300TB using an 84-disk JBOD.
>>> All the code from Nexenta to support the 3008 appears to be in
>>> illumos now, and they fully support it, so I suspect it's pretty
>>> stable now. From what I understand there may be some 12G performance
>>> fixes coming sometime.
>>>
>>> The fault manager is nice when the system doesn't panic. When it
>>> panics, the fault manager never gets a chance to take action. It is
>>> still the consensus that it is better to run pools without hot
>>> spares, because there are situations where the fault manager will do
>>> bad things. I witnessed this myself when building a system: the
>>> fault manager replaced 5 disks in a raidz2 vdev inside 1 minute,
>>> trashing the pool. I haven't completely yielded to that "best
>>> practice"; I now run one hot spare per pool. I figure with raidz2,
>>> the odds of the fault manager causing something catastrophic are much
>>> lower.
>>>
>>> -Chip
>>>
>>> On Fri, Jul 10, 2015 at 11:37 AM, Linda Kateley wrote:
>>>>
>>>> I have to build and maintain my own system. I usually help others
>>>> build (I teach zfs and freenas classes/consulting). I really love
>>>> fault management in solaris and miss it. Just thought since it's my
>>>> system and I get to choose, I would use omni. I have 20+ years
>>>> using solaris and only 2 on freebsd.
>>>>
>>>> I like freebsd for how well tuned for zfs it is oob. I miss the
>>>> network, v12n and resource controls in solaris.
>>>>
>>>> Concerned about panics on disk failure. Is that common?
>>>>
>>>> linda
>>>>
>>>> On 7/9/15 9:30 PM, Schweiss, Chip wrote:
>>>>
>>>> Linda,
>>>>
>>>> I have 3.5 PB running under OmniOS. All my systems have LSI 2108
>>>> HBAs, which is considered the best choice for HBAs.
>>>>
>>>> Illumos leaves a bit to be desired in handling faults from disks or
>>>> SAS problems, but things under OmniOS have been improving, much
>>>> thanks to Dan McDonald and OmniTI. We have paid support on all of
>>>> our production systems with OmniTI. Their response and dedication
>>>> has been very good. Other than the occasional panic and restart
>>>> from a disk failure, OmniOS has been solid. ZFS, of course, has
>>>> never lost a single bit of information.
>>>>
>>>> I'd be curious why you're looking to move. Have there been specific
>>>> problems under BSD or ZoL? I've been slowly evaluating FreeBSD ZFS,
>>>> but of course the skeletons in the closet never seem to come out
>>>> until you do something big.
>>>>
>>>> -Chip
>>>>
>>>> On Thu, Jul 9, 2015 at 4:21 PM, Linda Kateley wrote:
>>>>>
>>>>> Hey, is there anyone out there running big zfs on omni?
>>>>>
>>>>> I have been doing mostly zol and freebsd for the last year but have
>>>>> to build a 300+TB box, and I want to come back home to roots
>>>>> (solaris). Feeling kind of hesitant :) Also, if you had to do it
>>>>> over, is there anything you would do differently?
>>>>>
>>>>> Also, what is the go-to HBA these days? Seems like I saw stable
>>>>> code for the lsi 3008?
>>>>>
>>>>> TIA
>>>>>
>>>>> linda
>>>>
>>>> --
>>>> Linda Kateley
>>>> Kateley Company
>>>> Skype ID-kateleyco
>>>> http://kateleyco.com

From richard.elling at richardelling.com Wed Oct 7 22:38:11 2015
From: richard.elling at richardelling.com (Richard Elling)
Date: Wed, 7 Oct 2015 15:38:11 -0700
Subject: [OmniOS-discuss] big zfs storage?
In-Reply-To:
References: <559EE5BF.4040900@kateley.com> <559FF4DE.4040202@kateley.com>
Message-ID: <6A4C3B06-D2C5-4F07-B9A3-D0F477AE89AA@richardelling.com>

> On Oct 7, 2015, at 1:59 PM, Mick Burns wrote:
>
> So... how does Nexenta cope with hot spares and all kinds of disk
> failures? Adding hot spares is part of their administration manuals,
> so can we assume things are almost always handled smoothly? I'd like
> to hear about tangible experiences in production.

I do not speak for Nexenta.

Hot spares are a bigger issue when you have single-parity protection.
With double parity and large pools, warm spares are a better approach.
The reasons are:

1. Hot spares exist solely to eliminate the time between disk failure
   and human intervention for corrective action. There is no other
   reason to have hot spares. The exposure from a single disk failure
   under single-parity protection is too risky for most folks, but with
   double parity (eg raidz2 or RAID-6) the few hours you save have
   little impact on overall data availability vs warm spares.

2. Under some transient failure conditions (eg isolated power failure,
   IOM reboot, or fabric partition), all available hot spares can be
   kicked into action. This can leave you with a big mess for large
   pools with many drives and spares. You can avoid this by making a
   human be involved in the decision process, rather than relying on
   *locally isolated,* automated decision making.

-- richard
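[A sketch of the contrast Richard draws; pool and device names are
placeholders. A hot spare is attached to the pool so the fault manager
can activate it automatically; a warm spare is cabled and visible but
deliberately left out of the pool, so a human issues the replacement:]

    # hot spare: automatic activation on fault
    zpool add tank spare c9t0d0

    # warm spare: nothing configured up front; after a confirmed disk
    # failure an operator runs the replacement by hand
    zpool replace tank c3t2d0 c9t0d0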
From chip at innovates.com Thu Oct 8 00:56:30 2015
From: chip at innovates.com (Schweiss, Chip)
Date: Wed, 7 Oct 2015 19:56:30 -0500
Subject: [OmniOS-discuss] big zfs storage?
In-Reply-To: <6A4C3B06-D2C5-4F07-B9A3-D0F477AE89AA@richardelling.com>
References: <559EE5BF.4040900@kateley.com> <559FF4DE.4040202@kateley.com> <6A4C3B06-D2C5-4F07-B9A3-D0F477AE89AA@richardelling.com>
Message-ID:

I completely concur with Richard on this. Let me give a real example that
emphasizes this point, as it's a critical design decision. I never fully
understood it until I saw in action the problems automated hot spares can
cause.

I had all 5 hot spares get put into action on one raidz2 vdev of a 300TB
pool. This was triggered by an HA event that was taking SCSI reservations
in a split-brain situation that was supposed to trigger a panic on one
system. This caused a highly corrupted pool. Fortunately this was not a
production pool, and I simply trashed it and started reloading data.

Now I only run one hot spare per pool. Most of my pools are raidz2 or
raidz3. This way an event like this cannot take out more than one disk,
and data parity will never be lost.

There are other causes that can trigger multiple disk replacements. I
have not encountered them. If I do, they won't hurt my data with the
limit of one hot spare.

-Chip

On Wed, Oct 7, 2015 at 5:38 PM, Richard Elling
<richard.elling at richardelling.com> wrote:
>
> I do not speak for Nexenta.
>
> Hot spares are a bigger issue when you have single-parity protection.
> With double parity and large pools, warm spares are a better approach.
> The reasons are:
>
> 1. Hot spares exist solely to eliminate the time between disk failure
>    and human intervention for corrective action. There is no other
>    reason to have hot spares. The exposure from a single disk failure
>    under single-parity protection is too risky for most folks, but
>    with double parity (eg raidz2 or RAID-6) the few hours you save
>    have little impact on overall data availability vs warm spares.
>
> 2. Under some transient failure conditions (eg isolated power failure,
>    IOM reboot, or fabric partition), all available hot spares can be
>    kicked into action. This can leave you with a big mess for large
>    pools with many drives and spares. You can avoid this by making a
>    human be involved in the decision process, rather than relying on
>    *locally isolated,* automated decision making.
>
> -- richard
From cks at cs.toronto.edu Thu Oct 8 01:36:43 2015
From: cks at cs.toronto.edu (Chris Siebenmann)
Date: Wed, 07 Oct 2015 21:36:43 -0400
Subject: [OmniOS-discuss] big zfs storage?
In-Reply-To: chip's message of Wed, 07 Oct 2015 19:56:30 -0500.
Message-ID: <20151008013643.2D31F7A06B2@apps0.cs.toronto.edu>

> I completely concur with Richard on this. Let me give a real example
> that emphasizes this point [...]
> Now I only run one hot spare per pool. Most of my pools are raidz2 or
> raidz3. This way an event like this cannot take out more than one
> disk, and data parity will never be lost.
>
> There are other causes that can trigger multiple disk replacements. I
> have not encountered them. If I do, they won't hurt my data with the
> limit of one hot spare.

My view is that spare handling needs to be a local decision based on your
storage topology and pool and vdev structure (and on your durability
needs, and even on how staffing is handled, eg if you have a 24/7 on-call
rotation). I don't think there is any single global right answer; hot
spares will be good for some people and bad for others.

Locally we use mirrored vdevs, multiple pools, an iSCSI SAN to connect to
actual disks, multiple backend disk controllers, and no 24/7 on-call
setup. We've developed an automated spares-handling system that knows a
great deal about our local storage topology (so it knows what are 'good'
and 'bad' spares for any particular bad disk, using various criteria),
and having it available has been very helpful in the face of various
things going wrong, both individual disk failures and entire backend
disk controllers suffering power failures after the end of the workday.
Our solution is of course very local, but the important thing is that
it's clear that automating this has been the right tradeoff *for us*.

(In another environment it would probably be the wrong answer, eg if we
had a 24/7 NOC staffed with people to swap physical disks and hardware
at any time of the day, night, or holidays, and a 24/7 on-call sysadmin
to do system things like 'zpool replace'. There are other parts of the
university which do have this. I suspect that they don't use an
automated spares system of any kind, although I don't know for sure.)
- cks From lotheac at iki.fi Thu Oct 8 07:59:20 2015 From: lotheac at iki.fi (Lauri Tirkkonen) Date: Thu, 8 Oct 2015 10:59:20 +0300 Subject: [OmniOS-discuss] zfs recv assertion failed when scrubbing source pool Message-ID: <20151008075920.GA10733@gutsman.lotheac.fi> We're sending nightly incremental replication snapshots of a large filesystem tree (about 3900 filesystems) to a backup host. It's been working mostly okay - we scrub the source pool every month and that hasn't had any effect on the sends/receives. However, on Sep 21, I upgraded the backup host from entire at 11-0.151014:20150402T192159Z to entire at 11-0.151014:20150914T123242Z, and during the zpool scrub on the source host at the start of October we got this: Assertion failed: ilen <= SPA_MAXBLOCKSIZE, file ../common/libzfs_sendrecv.c, line 1706, function recv_read It seemed to be a transient failure as I was at first unable to reproduce it, but firing off another scrub on the source pool did cause it to happen again the following night, when the scrub was still running. I further upgraded the backup host to the Sep29 151014 update (which apparently didn't bump the 'entire' version), and it's still happening. The source host is currently in production and still running omnios-170cea2 (or entire at 11-0.151014:20150402T192159Z); we're scheduled to upgrade it next Monday. It had a cache device up until Dan's recent advice to remove it; I suspected maybe we'd been hit by corruption, but that doesn't explain why the assertion happens only when the source pool is scrubbing. We use this kind of command to send snapshots to the backup host: zfs send -R -i $yesterday ${filesystem}@today | ssh backuphost zfs recv -ud $targetfs We're not running either send or recv as root, opting to use delegations instead. Don't know if that's relevant or not. Any clues? -- Lauri Tirkkonen | lotheac @ IRCnet From peter.tribble at gmail.com Thu Oct 8 14:35:50 2015 From: peter.tribble at gmail.com (Peter Tribble) Date: Thu, 8 Oct 2015 15:35:50 +0100 Subject: [OmniOS-discuss] pkg verify failing on pyc files Message-ID: I'm using pkg verify to ensure that nothing has been tampered with.
Unfortunately, I keep getting verify errors like the following: pkg://omnios/library/python-2/ply ERROR file: usr/lib/python2.6/vendor-packages/ply/__init__.pyc Group: 'root (0)' should be 'bin (2)' Size: 178 bytes should be 176 Hash: d769283e99c45552467e95e55e5f5a3df00875b4 should be d2f6ea4ff88fd7a35b5e56b7d34dd71a72479fd6 file: usr/lib/python2.6/vendor-packages/ply/lex.pyc Group: 'root (0)' should be 'bin (2)' Size: 26838 bytes should be 26728 Hash: 00fb25fb4ab79ec5cb85fb745ff0996ab34230a8 should be 1c5b1e78d4531bcc2c0c202c153531377b0cc17f file: usr/lib/python2.6/vendor-packages/ply/yacc.pyc Group: 'root (0)' should be 'bin (2)' Size: 63183 bytes should be 62924 Hash: 4064718466fb95fc61eb5c585e8463b1f68ce884 should be 67d904213ad4ab85d34e6564920acda6582deb9a pkg://omnios/library/python-2/pybonjour ERROR file: usr/lib/python2.6/vendor-packages/pybonjour.pyc Group: 'root (0)' should be 'bin (2)' Size: 54053 bytes should be 53919 Hash: 361960dab53ecc51d163b6bfd840309115db41c9 should be 68717718c8a8ac2bfb21a1442e6f143218e7cb60 pkg://omnios/library/python-2/pyopenssl-26 ERROR file: usr/lib/python2.6/vendor-packages/OpenSSL/__init__.pyc Group: 'root (0)' should be 'bin (2)' Size: 959 bytes should be 957 Hash: 751a19a02c3fd4bbd6f33aec2c4cb3de0cd79af4 should be b6d57b6ec31252af5fff3098bc80e9ece560f91d file: usr/lib/python2.6/vendor-packages/OpenSSL/version.pyc Group: 'root (0)' should be 'bin (2)' Size: 253 bytes should be 251 Hash: 4062c7a082e198a3e3a8208e19776df8903ec255 should be 3b4fc0e99a45ffdb472d17dde5d9bfc0e74f4fa2 It looks like something is deciding to recompile the pyc files, which ends up changing them. Is there any way to stop this, to keep pkg verify clean? Thanks, -- -Peter Tribble http://www.petertribble.co.uk/ - http://ptribble.blogspot.com/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From danmcd at omniti.com Thu Oct 8 15:12:41 2015 From: danmcd at omniti.com (Dan McDonald) Date: Thu, 8 Oct 2015 11:12:41 -0400 Subject: [OmniOS-discuss] pkg verify failing on pyc files In-Reply-To: References: Message-ID: <37E9FE40-E0B7-4E9C-A201-7D18749E5306@omniti.com> > On Oct 8, 2015, at 10:35 AM, Peter Tribble wrote: > > It looks like something is deciding to recompile the pyc files, which ends up > changing them. > > Is there any way to stop this, to keep pkg verify clean? You'll notice several workarounds in omnios-build. For example from python26-coverage: # Prevents pkgdepend from freaking out. set pkg.depend.bypass-generate .* > set pkg.depend.bypass-generate .* > set pkg.depend.bypass-generate .* > Here's Tim Foster's blog entry about it: https://timsfoster.wordpress.com/2011/02/24/pkgdepend-improvements/ You may need to bypass-generate a few things. Dan From bmx1955 at gmail.com Thu Oct 8 15:14:57 2015 From: bmx1955 at gmail.com (Mick Burns) Date: Thu, 8 Oct 2015 11:14:57 -0400 Subject: [OmniOS-discuss] big zfs storage? In-Reply-To: <20151008013643.2D31F7A06B2@apps0.cs.toronto.edu> References: <20151008013643.2D31F7A06B2@apps0.cs.toronto.edu> Message-ID: Thanks everyone who answered, very insightful. What scares me the most is hearing about the panics and FMA not having time to react at all, and also stories of sub-optimal multi hot-spare kicking into action like described by Chip. Recipe for disaster. I guess this is an area where Nexenta has worked hard in implementing their own graceful handling of various tested failure scenarios. However you're covered if and only if you have a system conforming to their HCL. 
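Before trusting any automation with spares, the stock illumos tooling makes it straightforward to audit what the fault manager has actually been deciding; these are standard commands, listed here only as a starting point:

    fmadm faulty     # current faults and the resources they affect
    fmdump -v        # history of diagnosed faults
    fmdump -eV       # raw error telemetry behind those diagnoses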
This goes in-line with what Chris has implemented where he works; very customized to their environment and policies. On Wed, Oct 7, 2015 at 9:36 PM, Chris Siebenmann wrote: >> I completely concur with Richard on this. Let me give a real example >> that emphasizes this point as it's a critical design decision. > [...] >> Now I only run one hot spare per pool. Most of my pools are raidz2 or >> raidz3. This way any event like this cannot take out more than one >> disk and data parity will never be lost. >> >> There are other causes that can trigger multiple disk replacements. I >> have not encountered them. If I do, they won't hurt my data with the >> limit of one hot spare. > > My view is that spare handling needs to be a local decision based on > your storage topology and pool and vdev structure (and on your durability > needs, and even on how staffing is handled, eg if you have a 24/7 on > call rotation). I don't think there is any single global right answer; > hot spares will be good for some people and bad for others. > > Locally we use mirrored vdevs, multiple pools, an iSCSI SAN to connect > to actual disks, multiple backend disk controllers, and no 24/7 on call > setup. We've developed an automated spares handling system that knows a > great deal about our local storage topology (so it knows what are 'good' > and 'bad' spares for any particular bad disk, using various criteria) > and having it available has been very helpful in the face of various > things going wrong, both individual disk failures and entire backend > disk controllers suffering power failures after the end of the workday. > Our solution is of course very local, but the important thing is that > it's clear that automating this has been the right tradeoff *for us*. > > (In another environment it would probably be the wrong answer, eg if we > had a 24/7 NOC staffed with people to swap physical disks and hardware > at any time of the day, night, or holidays, and a 24/7 on call sysadmin > to do system things like 'zpool replace'. There are other parts of > the university which do have this. I suspect that they don't use an > automated spares system of any kind, although I don't know for sure.) > > - cks > _______________________________________________ > OmniOS-discuss mailing list > OmniOS-discuss at lists.omniti.com > http://lists.omniti.com/mailman/listinfo/omnios-discuss From peter.tribble at gmail.com Thu Oct 8 17:56:51 2015 From: peter.tribble at gmail.com (Peter Tribble) Date: Thu, 8 Oct 2015 18:56:51 +0100 Subject: [OmniOS-discuss] pkg verify failing on pyc files In-Reply-To: <37E9FE40-E0B7-4E9C-A201-7D18749E5306@omniti.com> References: <37E9FE40-E0B7-4E9C-A201-7D18749E5306@omniti.com> Message-ID: On Thu, Oct 8, 2015 at 4:12 PM, Dan McDonald wrote: > > > On Oct 8, 2015, at 10:35 AM, Peter Tribble > wrote: > > > > It looks like something is deciding to recompile the pyc files, which > ends up > > changing them. > > > > Is there any way to stop this, to keep pkg verify clean? > > You'll notice several workarounds in omnios-build. For example from > python26-coverage: > > # Prevents pkgdepend from freaking out. > set > pkg.depend.bypass-generate .* > > set > pkg.depend.bypass-generate .* > > set > pkg.depend.bypass-generate .* > > > Here's Tim Foster's blog entry about it: > > > https://timsfoster.wordpress.com/2011/02/24/pkgdepend-improvements/ > > You may need to bypass-generate a few things. This is all coming from omnios so it wouldn't be me who would be making changes...
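For anyone following along, one way to compare what pkg(5) expects against what is on disk, using one of the failing packages from the report above (pkg file hashes are SHA-1, so digest(1) can reproduce them):

    pkg verify library/python-2/ply
    pkg contents -m library/python-2/ply | grep '\.pyc'             # hashes pkg expects
    digest -a sha1 /usr/lib/python2.6/vendor-packages/ply/lex.pyc   # hash on disk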
I'm a little more confused than I was, though. The .pyc files encode the python version and the metadata of the source file (in particular, its timestamp) in the .pyc file. That's all correct. The packages appear to have been published using pkgsend -T so that the timestamps on the .py files are preserved, and they match the timestamps encoded into the packaged .pyc files.. Which makes me wonder even more why it's found it necessary to recompile the files, it's not the normal python version or source timestamp mismatch. The only thing I notice is that the original (packaged) .pyc file has site-packages encoded in it, whereas the recompiled version has the vendor-packages path (which is where the files are installed to). As far as I can tell that's the only difference in the contents of the .pyc files. Curious... -- -Peter Tribble http://www.petertribble.co.uk/ - http://ptribble.blogspot.com/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From lotheac at iki.fi Thu Oct 8 19:37:00 2015 From: lotheac at iki.fi (Lauri Tirkkonen) Date: Thu, 8 Oct 2015 22:37:00 +0300 Subject: [OmniOS-discuss] pkg verify failing on pyc files In-Reply-To: References: <37E9FE40-E0B7-4E9C-A201-7D18749E5306@omniti.com> Message-ID: <20151008193700.GC26155@gutsman.lotheac.fi> On Thu, Oct 08 2015 18:56:51 +0100, Peter Tribble wrote: > That's all correct. The packages appear to have been published using > pkgsend -T so > that the timestamps on the .py files are preserved, and they match the > timestamps > encoded into the packaged .pyc files.. > > Which makes me wonder even more why it's found it necessary to recompile > the files, it's not the normal python version or source timestamp mismatch. I think I found a clue for this. I checked an old box I've upgraded through multiple releases and sure enough, .pyc files had been regenerated after install for, among other things, the simplejson-26 package. I checked the timestamps for one such .py/.pyc pair: -rw-r--r-- 1 root bin 1036 Jul 22 2014 /usr/lib/python2.6/vendor-packages/simplejson/compat.py -rw-r--r-- 1 root root 2040 Apr 3 2015 /usr/lib/python2.6/vendor-packages/simplejson/compat.pyc But this is strange - surely the package is newer than July 2014 on this 151014 box? % pkg list -Hv simplejson-26 pkg://omnios/library/python-2/simplejson-26 at 3.6.5-0.151014:20150402T184431Z i-- Yes, yes it is, and it *was* built using pkgsend -T '*.py'. However, the commit in omnios-build introducing that was authored on Mar 25 2015 (8cc8f3ef45d9c7d8ccdfda608d00599cd3890597). My theory is that if the file content does not change, even if pkgsend -T is used to preserve the timestamps, the file is not touched on update; would that help explain what you're seeing? -- Lauri Tirkkonen | lotheac @ IRCnet From eric.sproul at circonus.com Thu Oct 8 20:23:35 2015 From: eric.sproul at circonus.com (Eric Sproul) Date: Thu, 8 Oct 2015 16:23:35 -0400 Subject: [OmniOS-discuss] pkg verify failing on pyc files In-Reply-To: <20151008193700.GC26155@gutsman.lotheac.fi> References: <37E9FE40-E0B7-4E9C-A201-7D18749E5306@omniti.com> <20151008193700.GC26155@gutsman.lotheac.fi> Message-ID: On Thu, Oct 8, 2015 at 3:37 PM, Lauri Tirkkonen wrote: > % pkg list -Hv simplejson-26 > pkg://omnios/library/python-2/simplejson-26 at 3.6.5-0.151014:20150402T184431Z i-- > > Yes, yes it is, and it *was* built using pkgsend -T '*.py'. However, > the commit in omnios-build introducing that was authored on Mar 25 2015 > (8cc8f3ef45d9c7d8ccdfda608d00599cd3890597). 
My theory is that if the > file content does not change, even if pkgsend -T is used to preserve the > timestamps, the file is not touched on update; would that help explain > what you're seeing? Since .pyc files are evidently locally modified outside of pkg(5), would it make sense to mark them as such in their manifests, i.e. setting the "preserve" attribute? This makes pkg verify not report differences from the installed manifest. Perhaps not a total win, though, as it would mean potentially losing upgrade content, unless preserve was set to "renameold" or some such. Eric From peter.tribble at gmail.com Thu Oct 8 20:31:07 2015 From: peter.tribble at gmail.com (Peter Tribble) Date: Thu, 8 Oct 2015 21:31:07 +0100 Subject: [OmniOS-discuss] pkg verify failing on pyc files In-Reply-To: <20151008193700.GC26155@gutsman.lotheac.fi> References: <37E9FE40-E0B7-4E9C-A201-7D18749E5306@omniti.com> <20151008193700.GC26155@gutsman.lotheac.fi> Message-ID: On Thu, Oct 8, 2015 at 8:37 PM, Lauri Tirkkonen wrote: > On Thu, Oct 08 2015 18:56:51 +0100, Peter Tribble wrote: > > That's all correct. The packages appear to have been published using > > pkgsend -T so > > that the timestamps on the .py files are preserved, and they match the > > timestamps > > encoded into the packaged .pyc files.. > > > > Which makes me wonder even more why it's found it necessary to recompile > > the files, it's not the normal python version or source timestamp > mismatch. > > I think I found a clue for this. I checked an old box I've upgraded > through multiple releases That seems to be the key point. I'm seeing it on upgraded boxes. Just checked a fresh install, that's clean. Going back, in earlier omnios releases the .py files didn't have fixed timestamps. Some of them (pybonjour.py for example) should be dated 2008. As a result, the older releases pretty much always rebuilt the pyc files. The upgrade process correctly sets the timestamp on the .py files. That bit appears to work. Presumably, it also should put the correct .pyc file from the repo, but that doesn't seem to work correctly. I suspect some sort of race between python explicitly writing the .pyc file from the repo and python recompiling the .pyc file because the one it had becomes invalid as soon as the timestamp on the .py file gets updated. In any event, I suspect that if you run pkg fix after the upgrade, then because everything now matches up, it'll be good in the future. At least, I've done that on one system and it hasn't deviated since. > and sure enough, .pyc files had been > regenerated after install for, among other things, the simplejson-26 > package. I checked the timestamps for one such .py/.pyc pair: > > -rw-r--r-- 1 root bin 1036 Jul 22 2014 > /usr/lib/python2.6/vendor-packages/simplejson/compat.py > -rw-r--r-- 1 root root 2040 Apr 3 2015 > /usr/lib/python2.6/vendor-packages/simplejson/compat.pyc > > But this is strange - surely the package is newer than July 2014 on this > 151014 box? > > % pkg list -Hv simplejson-26 > pkg://omnios/library/python-2/simplejson-26 at 3.6.5-0.151014:20150402T184431Z > i-- > > Yes, yes it is, and it *was* built using pkgsend -T '*.py'. However, > the commit in omnios-build introducing that was authored on Mar 25 2015 > (8cc8f3ef45d9c7d8ccdfda608d00599cd3890597). My theory is that if the > file content does not change, even if pkgsend -T is used to preserve the > timestamps, the file is not touched on update; would that help explain > what you're seeing? 
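A quick way to test that theory on a suspect pair: CPython 2 records the source mtime it compiled against in bytes 4-7 of the .pyc header, so on a little-endian (x86) box the two timestamps can be compared directly. A sketch using the pair from Lauri's example:

    PY=/usr/lib/python2.6/vendor-packages/simplejson/compat.py
    od -A n -t u4 -j 4 -N 4 ${PY}c    # mtime stored inside the .pyc, in epoch seconds
    ls -E ${PY}                       # full-resolution timestamp of the source

If the two disagree, python recompiles the .pyc on the next import, which would match the regeneration being reported.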
> > -- > Lauri Tirkkonen | lotheac @ IRCnet > _______________________________________________ > OmniOS-discuss mailing list > OmniOS-discuss at lists.omniti.com > http://lists.omniti.com/mailman/listinfo/omnios-discuss > -- -Peter Tribble http://www.petertribble.co.uk/ - http://ptribble.blogspot.com/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From lotheac at iki.fi Thu Oct 8 20:32:50 2015 From: lotheac at iki.fi (Lauri Tirkkonen) Date: Thu, 8 Oct 2015 23:32:50 +0300 Subject: [OmniOS-discuss] pkg verify failing on pyc files In-Reply-To: References: <37E9FE40-E0B7-4E9C-A201-7D18749E5306@omniti.com> <20151008193700.GC26155@gutsman.lotheac.fi> Message-ID: <20151008203250.GA26977@gutsman.lotheac.fi> On Thu, Oct 08 2015 16:23:35 -0400, Eric Sproul wrote: > On Thu, Oct 8, 2015 at 3:37 PM, Lauri Tirkkonen wrote: > > % pkg list -Hv simplejson-26 > > pkg://omnios/library/python-2/simplejson-26 at 3.6.5-0.151014:20150402T184431Z i-- > > > > Yes, yes it is, and it *was* built using pkgsend -T '*.py'. However, > > the commit in omnios-build introducing that was authored on Mar 25 2015 > > (8cc8f3ef45d9c7d8ccdfda608d00599cd3890597). My theory is that if the > > file content does not change, even if pkgsend -T is used to preserve the > > timestamps, the file is not touched on update; would that help explain > > what you're seeing? > > Since .pyc files are evidently locally modified outside of pkg(5), > would it make sense to mark them as such in their manifests, i.e. > setting the "preserve" attribute? This makes pkg verify not report > differences from the installed manifest. > > Perhaps not a total win, though, as it would mean potentially losing > upgrade content, unless preserve was set to "renameold" or some such. One could just not ship them at all; I was just trying to find out why they're being regenerated after the package install. -- Lauri Tirkkonen | lotheac @ IRCnet From rjahnel at ellipseinc.com Thu Oct 8 22:28:55 2015 From: rjahnel at ellipseinc.com (Richard Jahnel) Date: Thu, 8 Oct 2015 22:28:55 +0000 Subject: [OmniOS-discuss] Two panics now while writing eager zeros to zvols Message-ID: <65DC5816D4BEE043885A89FD54E273FC6CF5FEBB@MAIL101.Ellipseinc.com> VMware Esx 5.5 QLogic Fibre Omnios r14 I have 3 zvols on 3 zpools (1ea) I am attempting to get rid of the garbage in the empty space by creating eager zeroed vmdks in the free space on vmfs5 volumes backed by zfs zvols. The zvols are hosted on zpools with 2 cache ssds, 2 log ssds in mirror, and lz4 compression turned on. Twice in the past 24 hours the Omnios host has panicked after about 8 hours of writing eager zeros to one or more vmdks. Any ideas? Dump available upon request. [Ellipse Communications] Richard Jahnel | Senior Network Engineer Ellipse Communications - Corporate Office 14800 Quorum Dr, Suite 420 Dallas, TX 75254 TF: 888-678-3869 | F: 972-479-9115 Email * Website * Facebook * Twitter ________________________________ The content of this e-mail (including any attachments) is strictly confidential and may be commercially sensitive. If you are not, or believe you may not be, the intended recipient, please advise the sender immediately by return e-mail, delete this e-mail and destroy any copies.
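For reference, a minimal mdb session for pulling the basics out of a saved crash dump, of the sort requested in the next reply (the dump path and number here are the ones that appear later in this thread):

    # mdb /var/crash/unknown/unix.4 /var/crash/unknown/vmcore.4
    > ::status    # panic string and dump metadata
    > $c          # kernel stack at the time of panic
    > ::msgbuf    # console messages leading up to the panic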
From daleg at omniti.com Thu Oct 8 23:09:06 2015 From: daleg at omniti.com (Dale Ghent) Date: Thu, 8 Oct 2015 19:09:06 -0400 Subject: [OmniOS-discuss] Two panics now while writing eager zeros to zvols In-Reply-To: <65DC5816D4BEE043885A89FD54E273FC6CF5FEBB@MAIL101.Ellipseinc.com> References: <65DC5816D4BEE043885A89FD54E273FC6CF5FEBB@MAIL101.Ellipseinc.com> Message-ID: <0B6890AE-EC67-43C9-815D-1BAB2924EB53@omniti.com> A stack trace from the panic would be good to know in general, to see if it matches up with an already-known issue. /dale > On Oct 8, 2015, at 6:28 PM, Richard Jahnel wrote: > > VMware Esx 5.5 > > QLogic Fibre > > Omnios r14 > > I have 3 zvols on 3 zpools (1ea) > > I am attempting to get rid of the garbage in the empty space by creating eager zeroed vmdks in the free space on vmfs5 volumes backed by zfs zvols. > > The zvols are hosted on zpools with 2 cache ssds, 2 log ssds in mirror, and lz4 compression turned on. > > Twice in the past 24 hours the Omnios host has panicked after about 8 hours of writing eager zeros to one or more vmdks. > > Any ideas? Dump available upon request. > > > > Richard Jahnel | Senior Network Engineer > Ellipse Communications - Corporate Office > 14800 Quorum Dr, Suite 420 Dallas, TX 75254 > TF: 888-678-3869 | F: 972-479-9115 > Email * Website * Facebook * Twitter > > > The content of this e-mail (including any attachments) is strictly confidential and may be commercially sensitive. If you are not, or believe you may not be, the intended recipient, please advise the sender immediately by return e-mail, delete this e-mail and destroy any copies. > > _______________________________________________ > OmniOS-discuss mailing list > OmniOS-discuss at lists.omniti.com > http://lists.omniti.com/mailman/listinfo/omnios-discuss From martin.truhlar at archcon.cz Fri Oct 9 10:53:38 2015 From: martin.truhlar at archcon.cz (Martin Truhlář) Date: Fri, 9 Oct 2015 12:53:38 +0200 Subject: [OmniOS-discuss] iSCSI poor write performance In-Reply-To: References: <15C9B79E-7BC4-4C01-9660-FFD64353304D@omniti.com> <8D1002D9-69E2-4857-945A-746B821B27A1@omniti.com> Message-ID: So I've moved a bit. I've disabled write synchronisation and voila! From writing hell straight to heaven: 7MB/s -> 245MB/s. I'm aware this is not a very secure solution, but it works. There is some problem with the ZIL SSDs, right? I've used new mirrored Intel SSD 530s, connected directly to the HBA. Any advice before I buy a new pair of SSDs? I'm still a little disappointed with 4K queued writing, which I would expect to be higher. Now it is 20k IOPS for reading and 15k IOPS for writing, but the Intel SSD 530 is capable of 24k IOPS for reading and 80k IOPS for writing. Actually, I don't know what performance to expect... Martin -----Original Message----- From: Martin Truhlář [mailto:martin.truhlar at archcon.cz] Sent: Wednesday, September 23, 2015 10:51 AM To: Dan McDonald Cc: omnios-discuss at lists.omniti.com Subject: Re: [OmniOS-discuss] iSCSI poor write performance Tests revealed that the problem is somewhere in the disk array itself. Write performance of a disk connected directly (via iSCSI) to KVM is poor as well; even write performance measured on Omnios is very poor. So the loop is tightening, but there still remain a lot of possible hacks. I strove to use professional hw (disks included), so I would try to seek the error in the software setup first. Do you have any ideas where to search first (and second, third...)? FYI mirror 5 was added lately to the running pool.
pool: dpool state: ONLINE scan: scrub repaired 0 in 5h33m with 0 errors on Sun Sep 20 00:33:15 2015 config: NAME STATE READ WRITE CKSUM CAP Product /napp-it IOstat mess dpool ONLINE 0 0 0 mirror-0 ONLINE 0 0 0 c1t50014EE00400FA16d0 ONLINE 0 0 0 1 TB WDC WD1002F9YZ-0 S:0 H:0 T:0 c1t50014EE2B40F14DBd0 ONLINE 0 0 0 1 TB WDC WD1003FBYX-0 S:0 H:0 T:0 mirror-1 ONLINE 0 0 0 c1t50014EE05950B131d0 ONLINE 0 0 0 1 TB WDC WD1002F9YZ-0 S:0 H:0 T:0 c1t50014EE2B5E5A6B8d0 ONLINE 0 0 0 1 TB WDC WD1003FBYZ-0 S:0 H:0 T:0 mirror-2 ONLINE 0 0 0 c1t50014EE05958C51Bd0 ONLINE 0 0 0 1 TB WDC WD1002F9YZ-0 S:0 H:0 T:0 c1t50014EE0595617ACd0 ONLINE 0 0 0 1 TB WDC WD1002F9YZ-0 S:0 H:0 T:0 mirror-3 ONLINE 0 0 0 c1t50014EE0AEAE7540d0 ONLINE 0 0 0 1 TB WDC WD1002F9YZ-0 S:0 H:0 T:0 c1t50014EE0AEAE9B65d0 ONLINE 0 0 0 1 TB WDC WD1002F9YZ-0 S:0 H:0 T:0 mirror-5 ONLINE 0 0 0 c1t50014EE0AEABB8E7d0 ONLINE 0 0 0 1 TB WDC WD1002F9YZ-0 S:0 H:0 T:0 c1t50014EE0AEB44327d0 ONLINE 0 0 0 1 TB WDC WD1002F9YZ-0 S:0 H:0 T:0 logs mirror-4 ONLINE 0 0 0 c1t55CD2E404B88ABE1d0 ONLINE 0 0 0 120 GB INTEL SSDSC2BW12 S:0 H:0 T:0 c1t55CD2E404B88E4CFd0 ONLINE 0 0 0 120 GB INTEL SSDSC2BW12 S:0 H:0 T:0 cache c1t55CD2E4000339A59d0 ONLINE 0 0 0 180 GB INTEL SSDSC2BW18 S:0 H:0 T:0 spares c2t2d0 AVAIL 1 TB WDC WD10EFRX-68F S:0 H:0 T:0 errors: No known data errors Martin -----Original Message----- From: Dan McDonald [mailto:danmcd at omniti.com] Sent: Wednesday, September 16, 2015 1:51 PM To: Martin Truhl?? Cc: omnios-discuss at lists.omniti.com; Dan McDonald Subject: Re: [OmniOS-discuss] iSCSI poor write performance > On Sep 16, 2015, at 4:04 AM, Martin Truhl?? wrote: > > Yes, I'm aware, that problem can be hidden in many places. > MTU is 1500. All nics and their setup are included at this email. Start by making your 10GigE network use 9000 MTU. You'll need to configure this on both ends (is this directly-attached 10GigE? Or over a switch?). Dan _______________________________________________ OmniOS-discuss mailing list OmniOS-discuss at lists.omniti.com http://lists.omniti.com/mailman/listinfo/omnios-discuss -------------- next part -------------- A non-text attachment was scrubbed... Name: before.PNG Type: image/png Size: 41935 bytes Desc: before.PNG URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: after.PNG Type: image/png Size: 49541 bytes Desc: after.PNG URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: pools.PNG Type: image/png Size: 24757 bytes Desc: pools.PNG URL: From danmcd at omniti.com Fri Oct 9 17:46:43 2015 From: danmcd at omniti.com (Dan McDonald) Date: Fri, 9 Oct 2015 13:46:43 -0400 Subject: [OmniOS-discuss] Two panics now while writing eager zeros to zvols In-Reply-To: <65DC5816D4BEE043885A89FD54E273FC6CF5FEBB@MAIL101.Ellipseinc.com> References: <65DC5816D4BEE043885A89FD54E273FC6CF5FEBB@MAIL101.Ellipseinc.com> Message-ID: <2D9BCBB7-C736-400F-A149-9BD4CB44E5FD@omniti.com> > On Oct 8, 2015, at 6:28 PM, Richard Jahnel wrote: > > Any ideas? Dump available upon request. 
> I have to go, but I took a quick look at your one dump: > $c vpanic() hati_pte_map+0x3ab(ffffff16fb3dde70, 6f, ffffff007aef5358, 800000101cecd007, 0, 0) hati_load_common+0x139(ffffff15ae13ca88, 806f000, ffffff007aef5358, 40b, 0, 0) hat_memload+0x75(ffffff15ae13ca88, 806f000, ffffff007aef5358, b, 0) segvn_faultpage+0x730(ffffff15ae13ca88, ffffff169c042ee8, 806f000, d000, 0, ffffff009a23fb50) segvn_fault+0x8e6(ffffff15ae13ca88, ffffff169c042ee8, 806f000, 1000, 1, 2) as_fault+0x31a(ffffff15ae13ca88, ffffff168f631de0, 806ff20, 1, 1, 2) pagefault+0x96(806ff20, 1, 2, 0) trap+0x2c7(ffffff009a23ff10, 806ff20, b) 0xfffffffffb8001d6() > ::status debugging crash dump vmcore.4 (64-bit) from vstore1 operating system: 5.11 omnios-f090f73 (i86pc) image uuid: 37ff548e-1a7e-48b7-cbc1-9577366cda82 panic message: hati_pte_map: flags & HAT_LOAD_REMAP dump content: kernel pages only > This is a panic while servicing a page fault. Usually when I see something like this, I have to ask if your HW is okay. ("fmadm faulty" show anything?) Beyond that, I'll need to dig deeper, but I can't now. Just wanted to let you know the stuff on the surface, at least. Thanks for sharing the dump! Dan From danmcd at omniti.com Fri Oct 9 17:48:36 2015 From: danmcd at omniti.com (Dan McDonald) Date: Fri, 9 Oct 2015 13:48:36 -0400 Subject: [OmniOS-discuss] Two panics now while writing eager zeros to zvols In-Reply-To: <2D9BCBB7-C736-400F-A149-9BD4CB44E5FD@omniti.com> References: <65DC5816D4BEE043885A89FD54E273FC6CF5FEBB@MAIL101.Ellipseinc.com> <2D9BCBB7-C736-400F-A149-9BD4CB44E5FD@omniti.com> Message-ID: Process for this thread was: R 1178 520 1178 1178 0 0x42014000 ffffff15b6cfc080 VS20Vol20_snapc y T 0xffffff15a1ea4780 Dan From gate03 at landcroft.co.uk Fri Oct 9 23:40:39 2015 From: gate03 at landcroft.co.uk (Michael Mounteney) Date: Sat, 10 Oct 2015 09:40:39 +1000 Subject: [OmniOS-discuss] ISC-DHCPD in a zone Message-ID: <20151010094039.4fde856f@pimple.landy.net> I'm sure this has been done before but I can't find anything in my archive of this list. In a zone, isc-dhcpd is failing on start because it can't obtain the details of an interface. Zone# ifconfig -a lo0:2: flags=2001000849 mtu 8232 index 1 inet 127.0.0.1 netmask ff000000 e1000g1:2: flags=1100843 mtu 1500 index 2 inet 192.168.1.2 netmask ffffff00 broadcast 192.168.1.255 lo0:2: flags=2002000849 mtu 8252 index 1 inet6 ::1/128 According to various sources, one can specify interfaces on the command line to restrict dhcpd: /usr/sbin/dhcpd -cf /etc/dhcpd.conf -lf /var/db/dhcpd.leases -p 67 -s 192.168.1.2 e1000g1 binding to user-specified port 67 Internet Systems Consortium DHCP Server 4.3.1 Copyright 2004-2014 Internet Systems Consortium. All rights reserved. For info, please visit https://www.isc.org/software/dhcp/ Config file: /etc/dhcpd.conf Database file: /var/db/dhcpd.leases PID file: /var/run/dhcpd.pid irs_resconf_load failed: 59. Unable to set resolver from resolv.conf; startup continuing but DDNS support may be affected Wrote 0 deleted host decls to leases file. Wrote 0 new dynamic host decls to leases file. Wrote 0 leases to leases file. Error getting interface flags for 'lo0:2'; No such device or address Error getting interface information. If you think you have received this message due to a bug rather than a configuration issue please read the section on submitting bugs on either our web page at www.isc.org or in the README file before submitting a bug. These pages explain the proper process and the information we find helpful for debugging.. exiting. 
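For reference, the exclusive-IP approach suggested in the replies below would look roughly like this (link and zone names are hypothetical); with its own stack, the zone gets plain interfaces instead of the lo0:2/e1000g1:2 aliases dhcpd is tripping over:

    # dladm create-vnic -l e1000g1 dhcp0
    # zonecfg -z services
    zonecfg:services> set ip-type=exclusive
    zonecfg:services> add net
    zonecfg:services:net> set physical=dhcp0
    zonecfg:services:net> end
    zonecfg:services> commit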
Specifying e1000g1:2 makes no difference. Obviously normally isc-dhcpd is run via a service, but the above command is what is eventually execed and it fails with the above message in the log. Really I don't care about that lo0:2 interface. Is it the unconfigured ipv6 ? If I could get rid of that, it would solve my problem. Any help ? Either restrict isc-dhcpd or eliminate the interface. ______________ Michael Mounteney From rjahnel at ellipseinc.com Sat Oct 10 03:32:08 2015 From: rjahnel at ellipseinc.com (Richard Jahnel) Date: Sat, 10 Oct 2015 03:32:08 +0000 Subject: [OmniOS-discuss] Two panics now while writing eager zeros to zvols In-Reply-To: <2D9BCBB7-C736-400F-A149-9BD4CB44E5FD@omniti.com> References: <65DC5816D4BEE043885A89FD54E273FC6CF5FEBB@MAIL101.Ellipseinc.com> <2D9BCBB7-C736-400F-A149-9BD4CB44E5FD@omniti.com> Message-ID: <65DC5816D4BEE043885A89FD54E273FC6CF608D8@MAIL101.Ellipseinc.com> Faulty output for yesterday: --------------- ------------------------------------ -------------- --------- TIME EVENT-ID MSG-ID SEVERITY --------------- ------------------------------------ -------------- --------- Oct 08 16:18:59 37ff548e-1a7e-48b7-cbc1-9577366cda82 SUNOS-8000-KL Major Host : vstore1 Platform : PowerEdge-R510Chassis_id : 8K307S1 Product_sn : Fault class : defect.sunos.kernel.panic Affects : sw:///:path=/var/crash/unknown/.37ff548e-1a7e-48b7-cbc1-9577366cda82 faulted but still in service Problem in : sw:///:path=/var/crash/unknown/.37ff548e-1a7e-48b7-cbc1-9577366cda82 faulted but still in service Description : The system has rebooted after a kernel panic. Refer to http://illumos.org/msg/SUNOS-8000-KL for more information. Response : The failed system image was dumped to the dump device. If savecore is enabled (see dumpadm(1M)) a copy of the dump will be written to the savecore directory /var/crash/unknown. Impact : There may be some performance impact while the panic is copied to the savecore directory. Disk space usage by panics can be substantial. Action : If savecore is not enabled then please take steps to preserve the crash image. Use 'fmdump -Vp -u 37ff548e-1a7e-48b7-cbc1-9577366cda82' to view more panic detail. Please refer to the knowledge article for additional information. -----Original Message----- From: Dan McDonald [mailto:danmcd at omniti.com] Sent: Friday, October 09, 2015 12:47 PM To: Richard Jahnel Cc: omnios-discuss at lists.omniti.com; imemo; Dan McDonald Subject: Re: [OmniOS-discuss] Two panics now while writing eager zeros to zvols > On Oct 8, 2015, at 6:28 PM, Richard Jahnel wrote: > > Any ideas? Dump available upon request. > I have to go, but I took a quick look at your one dump: > $c vpanic() hati_pte_map+0x3ab(ffffff16fb3dde70, 6f, ffffff007aef5358, 800000101cecd007, 0, 0) hati_load_common+0x139(ffffff15ae13ca88, 806f000, ffffff007aef5358, 40b, 0, 0) hat_memload+0x75(ffffff15ae13ca88, 806f000, ffffff007aef5358, b, 0) segvn_faultpage+0x730(ffffff15ae13ca88, ffffff169c042ee8, 806f000, d000, 0, ffffff009a23fb50) segvn_fault+0x8e6(ffffff15ae13ca88, ffffff169c042ee8, 806f000, 1000, 1, 2) as_fault+0x31a(ffffff15ae13ca88, ffffff168f631de0, 806ff20, 1, 1, 2) pagefault+0x96(806ff20, 1, 2, 0) trap+0x2c7(ffffff009a23ff10, 806ff20, b) 0xfffffffffb8001d6() > ::status debugging crash dump vmcore.4 (64-bit) from vstore1 operating system: 5.11 omnios-f090f73 (i86pc) image uuid: 37ff548e-1a7e-48b7-cbc1-9577366cda82 panic message: hati_pte_map: flags & HAT_LOAD_REMAP dump content: kernel pages only > This is a panic while servicing a page fault. 
Usually when I see something like this, I have to ask if your HW is okay. ("fmadm faulty" show anything?) Beyond that, I'll need to dig deeper, but I can't now. Just wanted to let you know the stuff on the surface, at least. Thanks for sharing the dump! Dan ________________________________ The content of this e-mail (including any attachments) is strictly confidential and may be commercially sensitive. If you are not, or believe you may not be, the intended recipient, please advise the sender immediately by return e-mail, delete this e-mail and destroy any copies. From danmcd at omniti.com Sat Oct 10 03:33:39 2015 From: danmcd at omniti.com (Dan McDonald) Date: Fri, 9 Oct 2015 23:33:39 -0400 Subject: [OmniOS-discuss] Two panics now while writing eager zeros to zvols In-Reply-To: <65DC5816D4BEE043885A89FD54E273FC6CF608D8@MAIL101.Ellipseinc.com> References: <65DC5816D4BEE043885A89FD54E273FC6CF5FEBB@MAIL101.Ellipseinc.com> <2D9BCBB7-C736-400F-A149-9BD4CB44E5FD@omniti.com> <65DC5816D4BEE043885A89FD54E273FC6CF608D8@MAIL101.Ellipseinc.com> Message-ID: That's just the "you had a kernel panic" message. Shoot. I was hoping for hardware problems. What is that process I mentioned -- VS20Vol20_snapcy ? What is it doing? It's driven from cron, but I can't tell much beyond that. (Most kernel dumps don't take in userspace text.) Dan From jimklimov at cos.ru Sat Oct 10 05:33:29 2015 From: jimklimov at cos.ru (Jim Klimov) Date: Sat, 10 Oct 2015 07:33:29 +0200 Subject: [OmniOS-discuss] ISC-DHCPD in a zone In-Reply-To: <20151010094039.4fde856f@pimple.landy.net> References: <20151010094039.4fde856f@pimple.landy.net> Message-ID: 10 ??????? 2015??. 1:40:39 CEST, Michael Mounteney ?????: >I'm sure this has been done before but I can't find anything in my >archive of this list. In a zone, isc-dhcpd is failing on start because >it can't obtain the details of an interface. > >Zone# ifconfig -a >lo0:2: flags=2001000849 mtu >8232 index 1 inet 127.0.0.1 netmask ff000000 >e1000g1:2: flags=1100843 >mtu 1500 index 2 inet 192.168.1.2 netmask ffffff00 broadcast >192.168.1.255 lo0:2: >flags=2002000849 mtu 8252 >index 1 inet6 ::1/128 > >According to various sources, one can specify interfaces on the command >line to restrict dhcpd: > >/usr/sbin/dhcpd -cf /etc/dhcpd.conf -lf /var/db/dhcpd.leases -p 67 -s >192.168.1.2 e1000g1 >binding to user-specified port 67 >Internet Systems Consortium DHCP Server 4.3.1 >Copyright 2004-2014 Internet Systems Consortium. >All rights reserved. >For info, please visit https://www.isc.org/software/dhcp/ >Config file: /etc/dhcpd.conf >Database file: /var/db/dhcpd.leases >PID file: /var/run/dhcpd.pid >irs_resconf_load failed: 59. >Unable to set resolver from resolv.conf; startup continuing but DDNS >support may be affected >Wrote 0 deleted host decls to leases file. >Wrote 0 new dynamic host decls to leases file. >Wrote 0 leases to leases file. >Error getting interface flags for 'lo0:2'; No such device or address >Error getting interface information. > >If you think you have received this message due to a bug rather >than a configuration issue please read the section on submitting >bugs on either our web page at www.isc.org or in the README file >before submitting a bug. These pages explain the proper >process and the information we find helpful for debugging.. > >exiting. > >Specifying e1000g1:2 makes no difference. > >Obviously normally isc-dhcpd is run via a service, but the above >command is what is eventually execed and it fails with the above >message in the log. 
> >Really I don't care about that lo0:2 interface. Is it the unconfigured >ipv6 ? If I could get rid of that, it would solve my problem. > >Any help ? Either restrict isc-dhcpd or eliminate the interface. > >______________ >Michael Mounteney >_______________________________________________ >OmniOS-discuss mailing list >OmniOS-discuss at lists.omniti.com >http://lists.omniti.com/mailman/listinfo/omnios-discuss With the alias interfaces in play - do you use a shared-ip zone? That may be the limit; try switching to exclusive-ip with dedicated vnic(s). Also see if any zone or process rbac privileges seem suitable additions to the service (especially if it works from shell and fails from SMF even as root): things like promiscuity or not-owned file access are dropped by default. Jim -- Typos courtesy of K-9 Mail on my Samsung Android From moo at wuffers.net Sat Oct 10 07:18:49 2015 From: moo at wuffers.net (wuffers) Date: Sat, 10 Oct 2015 03:18:49 -0400 Subject: [OmniOS-discuss] Two panics now while writing eager zeros to zvols In-Reply-To: References: <65DC5816D4BEE043885A89FD54E273FC6CF5FEBB@MAIL101.Ellipseinc.com> <2D9BCBB7-C736-400F-A149-9BD4CB44E5FD@omniti.com> <65DC5816D4BEE043885A89FD54E273FC6CF608D8@MAIL101.Ellipseinc.com> Message-ID: Is this the same bug I ran into in March? http://lists.omniti.com/pipermail/omnios-discuss/2015-March/004540.html I'm running a newer stmf_sbd that Dan made which solved my issue. It had something to do with the WRITE_SAME VAAI primitive, but I'm also running with COMSTAR. Dan was pretty busy preparing for R151014 at the time, so he hasn't had a chance to upstream it back. On Fri, Oct 9, 2015 at 11:33 PM, Dan McDonald wrote: > That's just the "you had a kernel panic" message. Shoot. I was hoping > for hardware problems. > > What is that process I mentioned -- VS20Vol20_snapcy ? What is it doing? > It's driven from cron, but I can't tell much beyond that. (Most kernel > dumps don't take in userspace text.) > > Dan > > _______________________________________________ > OmniOS-discuss mailing list > OmniOS-discuss at lists.omniti.com > http://lists.omniti.com/mailman/listinfo/omnios-discuss > -------------- next part -------------- An HTML attachment was scrubbed... URL: From johan.kragsterman at capvert.se Sat Oct 10 08:49:35 2015 From: johan.kragsterman at capvert.se (Johan Kragsterman) Date: Sat, 10 Oct 2015 10:49:35 +0200 Subject: [OmniOS-discuss] Ang: Re: Two panics now while writing eager zeros to zvols In-Reply-To: References: , <65DC5816D4BEE043885A89FD54E273FC6CF5FEBB@MAIL101.Ellipseinc.com> <2D9BCBB7-C736-400F-A149-9BD4CB44E5FD@omniti.com> <65DC5816D4BEE043885A89FD54E273FC6CF608D8@MAIL101.Ellipseinc.com> Message-ID: Hi! -----"OmniOS-discuss" skrev: ----- Till: Dan McDonald Fr?n: wuffers S?nt av: "OmniOS-discuss" Datum: 2015-10-10 09:20 Kopia: Richard Jahnel , imemo , "omnios-discuss at lists.omniti.com" ?rende: Re: [OmniOS-discuss] Two panics now while writing eager zeros to zvols Is this the same bug I ran into in March? http://lists.omniti.com/pipermail/omnios-discuss/2015-March/004540.html I'm running a newer?stmf_sbd that Dan made which solved my issue. It had something to do with the WRITE_SAME VAAI primitive, but I'm also running with COMSTAR. Dan was pretty busy preparing for R151014 at the time, so he hasn't had a chance to upstream it back. Dan, is this upstreamed at all...? 
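For anyone checking the ESXi side of this, the VAAI state of each device is visible from the host, and hardware-accelerated zeroing (the WRITE_SAME path in question) can be disabled as a blunt workaround while a fixed stmf_sbd is sorted out:

    esxcli storage core device vaai status get    # per-device VAAI primitive support
    esxcli system settings advanced set -o /DataMover/HardwareAcceleratedInit -i 0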
Rgrds Johan _______________________________________________ OmniOS-discuss mailing list OmniOS-discuss at lists.omniti.com http://lists.omniti.com/mailman/listinfo/omnios-discuss _______________________________________________ OmniOS-discuss mailing list OmniOS-discuss at lists.omniti.com http://lists.omniti.com/mailman/listinfo/omnios-discuss From gate03 at landcroft.co.uk Sat Oct 10 08:50:36 2015 From: gate03 at landcroft.co.uk (Michael Mounteney) Date: Sat, 10 Oct 2015 18:50:36 +1000 Subject: [OmniOS-discuss] ISC-DHCPD in a zone In-Reply-To: References: <20151010094039.4fde856f@pimple.landy.net> Message-ID: <20151010185036.3d904430@coomera> On Sat, 10 Oct 2015 07:33:29 +0200 Jim Klimov wrote: > With the alias interfaces in play - do you use a shared-ip zone? That > may be the limit; try switching to exclusive-ip with dedicated > vnic(s). That would explain why my setup notes (this is a fresh installation) have DHCP in its own zone and all other services (IMAP, version control repositories, TFTP, rsync server etc.) in another. It's not the answer for which I was hoping. It would be neater to have all services together in one zone and not have to run a second zone, just for one service. Is there another way? Anything else I can try? > Also see if any zone or process rbac privileges seem suitable > additions to the service (especially if it works from shell and fails > from SMF even as root): things like promiscuity or not-owned file > access are dropped by default. It's the same both from the command line and via a service. Thanks for your reply. ______________ Michael Mounteney From lists at marzocchi.net Sat Oct 10 16:09:05 2015 From: lists at marzocchi.net (Olaf Marzocchi) Date: Sat, 10 Oct 2015 18:09:05 +0200 Subject: [OmniOS-discuss] Maildir: ACLs/Unix perms: unlink(...) failed: Permission denied In-Reply-To: <56086833.7090507@marzocchi.net> References: <55FD6E91.7020505@marzocchi.net> <839515024ef34c25a9bbe682a454855c@valo.at> <56086833.7090507@marzocchi.net> Message-ID: <56193821.7010503@marzocchi.net> I solved the issue I mentioned some days ago. I checked in the logs the date the issue appeared, and I noticed it did not correspond to a dovecot update, so dovecot was not the culprit. The date also did not correspond to an update of OmniOS, and in any case the previous OmniOS update contained only userland updates. Since the issue appeared when I assigned ACLs to my home folder on the fileserver for the first time, to make it better compatible with SMB sharing, I decided the easiest way was to start a new ZFS dataset only for mail, splitting home folder and mail. $ zfs create -o compression=on tank/mail $ chgrp mail /tank/mail $ mkdir /tank/mail/olaf $ mv /tank/home/olaf/Maildir /tank/mail/olaf/ $ chown -R olaf:olaf /tank/mail/olaf $ find Maildir -type d -exec chmod 700 {} \; $ find Maildir -type f -exec chmod 600 {} \; $ svcadm enable dovecot This time in the dataset I did not set the options: -o aclinherit=passthrough-x -o aclmode=passthrough because dovecot does not need ACLs anyway. I'm not even sure those two options are what I actually need, but the server is running so I won't change them. Anyway, the server is running fine now. I'm not sure why I cannot see any "Trash" folder in Thunderbird, but if I try to create one it fails with "Folder already existing". I will find out.
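One dovecot-side way to make the special folders appear without any client-side tricks is the auto setting on the mailbox blocks (available in the 2.2.x used in this thread); a sketch against the namespace config quoted further down:

    namespace inbox {
      mailbox Trash {
        auto = create        # create the folder at login if it is missing
        special_use = \Trash
      }
    }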
I also wrote a summary of the issue and of the solution here, because other people had the same problem in the past (http://www.dovecot.org/list/dovecot/2013-November/093778.html) and there was no solution posted. http://www.marzocchi.net/Olafsen/Software/InstallationOfOmniOSAndBasicSetup Cheers, Olaf On 28/09/2015 00:05, Olaf Marzocchi wrote: > Hi, > I tried again with some other options. > > After finding > http://www.dovecot.org/list/dovecot/2013-November/093793.html > I deleted every ACL from the directory Maildir and I also assigned the > group "mail" to it, recursively: > > OmniOS-Xeon:/tank/home/olaf/Maildir/.Generiche $ ls -lV > total 903 > drwxrwxrwx 2 olaf mail 2 Sep 27 23:47 cur > owner@:rwxp--aARWcCos:-------:allow > group@:rwxp--a-R-c--s:-------:allow > everyone@:rwxp--a-R-c--s:-------:allow > (and so on) > > I tried also > mail_full_filesystem_access = yes > hoping that it would solve the issue, but nothing. Even with > mail_debug = yes > the log does not give any info besides > dovecot: [ID 583609 mail.error] imap(olaf): Error: > unlink(/tank/home/olaf/Maildir/.Generiche/dovecot-uidlist.tmp) failed: > Permission denied > > (it shows also "rename" instead of "unlink") > > With these additional info, has anyone any idea about the cause of the > problem? > > My doveconf -n: > > # 2.2.18: /etc/dovecot/dovecot.conf > # OS: SunOS 5.11 i86pc zfs > mail_debug = yes > mail_full_filesystem_access = yes > mail_location = maildir:/tank/home/%u/Maildir > mail_privileged_group = mail > namespace inbox { > inbox = yes > location = > mailbox Sent { > special_use = \Sent > } > mailbox "Sent Messages" { > special_use = \Sent > } > mailbox Trash { > special_use = \Trash > } > prefix = > } > passdb { > driver = pam > } > protocols = imap > ssl = required > ssl_cert = ssl_key = userdb { > driver = passwd > } > > > Any help will be appreciated. > > Regards, > Olaf Marzocchi > > > > > On 19/09/2015 19:22, Christian Kivalo wrote: >> Hi, >> >> On 2015-09-19 16:17, Olaf Marzocchi wrote: >>> Dear Dovecot users, hello. >>> I will merge two issues I have into a single email because they may be >>> related. >>> >>> I used dovecot on a OmniOS server since 2014 (currently OmniOS >>> r151014) with the following configuration (it shows 2.2.18 because I >>> recently updated dovecot, skipping only the PostgreSQL plugin): >>> >>> # 2.2.18: /etc/dovecot/dovecot.conf >>> # OS: SunOS 5.11 i86pc zfs >>> mail_location = maildir:/tank/home/%u/Maildir >>> mail_privileged_group = mail >>> namespace inbox { >>> inbox = yes >>> location = >>> mailbox Drafts { >>> special_use = \Drafts >>> } >>> mailbox Junk { >>> special_use = \Junk >>> } >>> mailbox Sent { >>> special_use = \Sent >>> } >>> mailbox "Sent Messages" { >>> special_use = \Sent >>> } >>> mailbox Trash { >>> special_use = \Trash >>> } >>> prefix = >>> } >>> passdb { >>> driver = pam >>> } >>> protocols = imap >>> ssl = required >>> ssl_cert = >> ssl_key = >> userdb { >>> driver = passwd >>> } >>> >>> You can see that I set the Maildir folder inside the shared home >>> folders of my server (it is only one user, anyway). >>> It always worked perfectly, but one-two months ago I changed the >>> permissions of my whole home folder, recursively, to add proper ACLs. >>> I needed them because the clients started using illumos kernel SMB >>> (relying on ACLs) instead of Netatalk/AFP (relying on Unix perms >>> only). >>> I didn't realise I applied the ACLs also to the Maildir folder. 
>>> >>> Dovecot worked for several weeks fine, I noticed the issue only >>> yesterday when a mailbox (see below) appeared in Thunderbird >>> completely empty even if the "cur" subfolder on the server still >>> contains all the mails. >>> >>> Dovecot was throwing some errors like: >>> >>> dovecot: [ID 583609 mail.error] imap(olaf): Error: >>> rename(/tank/home/olaf/Maildir/.&A6k- Mailing >>> Lists.Log/dovecot.index.cache) failed: Permission denied >>> (euid=501(olaf) egid=501(olaf) UNIX perms appear ok (ACL/MAC wrong?)) >>> dovecot: [ID 583609 mail.error] imap(olaf): Error: >>> rename(/tank/home/olaf/Maildir/.&A6k- Mailing >>> Lists.Log/dovecot.index.tmp, /tank/home/olaf/Maildir/.&A6k- Mailing >>> Lists.Log/dovecot.index) failed: Permission denied >>> dovecot: [ID 583609 mail.error] imap(olaf): Error: >>> unlink(/tank/home/olaf/Maildir/subscriptions.lock) failed: Permission >>> denied >>> dovecot: [ID 583609 mail.error] imap(olaf): Error: >>> rename(/tank/home/olaf/Maildir/subscriptions.lock, >>> /tank/home/olaf/Maildir/subscriptions) failed: Permission denied >>> >>> I will post here the current permissions of the folder containing >>> Maildir, of the Maildir itself, of its contents, and of the folder >>> that appears empty when browsed with a client (Thunderbird). >>> >>> /tank/home/olaf $ ls -lV .. >>> drwx------+ 16 olaf olaf 17 Sep 19 01:52 olaf >>> user:olaf:rwxpdDaARWcCos:fd-----:allow >>> group:2147483648:rwxpdDaARWcCos:fd-----:allow >>> everyone@:rwxpdDaARWcCos:fd-----:deny >>> >>> /tank/home/olaf $ ls -lV >>> drwxrwx--- 348 olaf olaf 359 Sep 19 01:51 Maildir >>> owner@:rwxp--aARWcCos:-------:allow >>> group@:rwxp--a-R-c--s:-------:allow >>> everyone@:------a-R-c--s:-------:allow >>> >>> /tank/home/olaf $ ls -lV Maildir/ >>> drwxrwx--- 2 olaf olaf 2 Jan 30 2014 cur >>> owner@:rwxp--aARWcCos:-------:allow >>> group@:rwxp--a-R-c--s:-------:allow >>> everyone@:------a-R-c--s:-------:allow >>> -rwxrwx--- 1 olaf olaf 21 Jan 30 2014 dovecot-keywords >>> owner@:rwxp--aARWcCos:-------:allow >>> group@:rwxp--a-R-c--s:-------:allow >>> everyone@:------a-R-c--s:-------:allow >>> (ALL THE SAME PERMISSIONS FOR THE OTHER FILES EXCEPT...) >>> -rwxrwx--- 1 olaf olaf 13735 Jan 24 2015 subscriptions >>> owner@:rwxp--aARWcCos:-------:allow >>> group@:rwxp--a-R-c--s:-------:allow >>> everyone@:------a-R-c--s:-------:allow >>> -rw-rw---- 1 olaf olaf 13709 Sep 19 01:51 subscriptions.lock >>> owner@:rw-p--aARWcCos:-------:allow >>> group@:rw-p--a-R-c--s:-------:allow >>> everyone@:------a-R-c--s:-------:allow >>> >>> The folder that appears empty: >>> >>> /tank/home/olaf $ ls -lV Maildir/.Generiche/ >>> total 513 >>> drwxrwx--- 2 olaf olaf 949 Sep 18 01:42 cur >>> owner@:rwxp--aARWcCos:-------:allow >>> group@:rwxp--a-R-c--s:-------:allow >>> everyone@:------a-R-c--s:-------:allow >>> -rwxrwx--- 1 olaf olaf 46 May 18 2014 dovecot-keywords >>> owner@:rwxp--aARWcCos:-------:allow >>> group@:rwxp--a-R-c--s:-------:allow >>> everyone@:------a-R-c--s:-------:allow >>> (ALL THE SAME PERMISSIONS FOR THE OTHER FILES) >>> >>> >>> I really hope you will have the time to help me because I already >>> applied the permissions recursively and I removed the ACLs, almost as >>> it was before my mistake. 
>>> I specified "almost" because originally (I checked the backups) the >>> Maildir folder had an ACL that gave access permissions also to the >>> group "mail": >>> >>> drwxrwx---+349 olaf olaf 359 Feb 16 2014 Maildir >>> group:mail:rwxpdDaARWcCos:fd-----:allow >>> owner@:rwxpdDaARWcCos:fd----I:allow >>> group@:rwxpdDaARWcCos:fd----I:allow >>> everyone@:rwxpdDaARWcCos:fd----I:deny >>> >>> Yesterday I haven't replicated it because from the documentation I >>> understood it was not necessary. >> >> From my view the permissions seem to be set correctly, i have to admin, >> its been a while since i moved to virtual users so i may be wrong here... >> >> The log output also seems to support that permissions are correct. >> >> Have you tried adding the group:mail:.... ACLs back? >> >> Have you set mail_debug=yes or other more verbose logging settings? >> http://wiki2.dovecot.org/Logging From hannohirschberger at googlemail.com Sat Oct 10 17:23:15 2015 From: hannohirschberger at googlemail.com (Hanno Hirschberger) Date: Sat, 10 Oct 2015 19:23:15 +0200 Subject: [OmniOS-discuss] Maildir: ACLs/Unix perms: unlink(...) failed: Permission denied In-Reply-To: <56193821.7010503@marzocchi.net> References: <55FD6E91.7020505@marzocchi.net> <839515024ef34c25a9bbe682a454855c@valo.at> <56086833.7090507@marzocchi.net> <56193821.7010503@marzocchi.net> Message-ID: <56194983.20401@googlemail.com> On 10.10.2015 18:09, Olaf Marzocchi wrote: > I'm not sure why I cannot see in Thunderbird any folder "Trash" but if I > try to create one it fails with "Folder already existing", but I will > find out Had the same problem before so let's give it a try! Are all the folders subscribed in Thunderbird? Right click on the mail account name in the mailbox list and go to "Subscribe...". See if the checkbox on the "Trash" entry is activated. Regards, Hanno -------------- next part -------------- A non-text attachment was scrubbed... Name: subscribe.jpg Type: image/jpeg Size: 21212 bytes Desc: not available URL: From lists at marzocchi.net Sat Oct 10 17:45:52 2015 From: lists at marzocchi.net (Olaf Marzocchi) Date: Sat, 10 Oct 2015 19:45:52 +0200 Subject: [OmniOS-discuss] Maildir: ACLs/Unix perms: unlink(...) failed: Permission denied In-Reply-To: <56194983.20401@googlemail.com> References: <55FD6E91.7020505@marzocchi.net> <839515024ef34c25a9bbe682a454855c@valo.at> <56086833.7090507@marzocchi.net> <56193821.7010503@marzocchi.net> <56194983.20401@googlemail.com> Message-ID: <56194ED0.8000208@marzocchi.net> Since I was not able to see the "Trash" even in that dialog, I created one called "__Cestino" (that's trash in Italian, plus the underscores to be sure it appeared on top) and now, after some hours I left it alone, it looks like it is recognised as "official" trash, with the special icon and the... renaming disabled. Anyway, I don't know what actually solved the problem, but thanks. Olaf On 10/10/2015 19:23, Hanno Hirschberger wrote: > On 10.10.2015 18:09, Olaf Marzocchi wrote: >> I'm not sure why I cannot see in Thunderbird any folder "Trash" but if I >> try to create one it fails with "Folder already existing", but I will >> find out > > Had the same problem before so let's give it a try! Are all the folders > subscribed in Thunderbird? Right click on the mail account name in the > mailbox list and go to "Subscribe...". See if the checkbox on the > "Trash" entry is activated. 
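Since this is Maildir, the subscription list Hanno refers to is also just a flat file on the server, so it can be inspected or fixed there too (path follows Olaf's new layout from earlier in the thread):

    grep -i trash /tank/mail/olaf/Maildir/subscriptions    # is Trash subscribed?
    echo Trash >> /tank/mail/olaf/Maildir/subscriptions    # subscribe it by hand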
> > Regards, > > Hanno > > > _______________________________________________ > OmniOS-discuss mailing list > OmniOS-discuss at lists.omniti.com > http://lists.omniti.com/mailman/listinfo/omnios-discuss > From jimklimov at cos.ru Sat Oct 10 16:55:43 2015 From: jimklimov at cos.ru (Jim Klimov) Date: Sat, 10 Oct 2015 18:55:43 +0200 Subject: [OmniOS-discuss] ISC-DHCPD in a zone In-Reply-To: <20151010185036.3d904430@coomera> References: <20151010094039.4fde856f@pimple.landy.net> <20151010185036.3d904430@coomera> Message-ID: <10558480-332F-49A3-993D-DF040DCD49C2@cos.ru> On 10 October 2015 at 10:50:36 CEST, Michael Mounteney wrote: >On Sat, 10 Oct 2015 07:33:29 +0200 >Jim Klimov wrote: > >> With the alias interfaces in play - do you use a shared-ip zone? That >> may be the limit; try switching to exclusive-ip with dedicated >> vnic(s). > >That would explain why my setup notes (this is a fresh installation) >have DHCP in its own zone and all other services (IMAP, version control >repositories, TFTP, rsync server etc.) in another. > >It's not the answer for which I was hoping. It would be neater to have >all services together in one zone and not have to run a second zone, >just for one service. Is there another way? Anything else I can try? > >> Also see if any zone or process rbac privileges seem suitable >> additions to the service (especially if it works from shell and fails >> from SMF even as root): things like promiscuity or not-owned file >> access are dropped by default. > >It's the same both from the command line and via a service. > >Thanks for your reply. > >______________ >Michael Mounteney You can try creating a vnic and delegating it to a zone (via device match rules). Hopefully then you'd get an owned device in the zone, but still not an owned stack where you can go promiscuous, change routes, etc. It may still be the limit... Maybe you can't even set an ip address on the delegated vnic from inside the zone. Hopefully someone better experienced with isc dhcpd can offer better ideas. Jim -- Typos courtesy of K-9 Mail on my Samsung Android From heinz at licenser.net Mon Oct 12 13:38:32 2015 From: heinz at licenser.net (Heinz Nikolaus Gies) Date: Mon, 12 Oct 2015 15:38:32 +0200 Subject: [OmniOS-discuss] Project-FiFo 0.7.0 release Message-ID: <8A3C13DD-6CC6-4090-926C-190554FFFB1B@licenser.net> Good news everyone! FiFo 0.7.0 is released today! There is a blog post [1] explaining the details, so I just want to go over the biggest news and keep this short. * A shiny new UI. * A complete overhaul of our documentation. * Historic metrics for the whole cloud with Tachyon and DalmatinerDB. * Accounting/usage information for VMs. * Full support for OAuth2. * Experimental support for OmniOS. If you want to update, we've a detailed update section in the docs [2]. [1] https://blog.project-fifo.net/the-biggest-news-yet-0-7-0-and-support [2] http://docs-new.project-fifo.net/docs/upgrading-fifo -------------- next part -------------- An HTML attachment was scrubbed...
URL: From rjahnel at ellipseinc.com Mon Oct 12 14:35:36 2015 From: rjahnel at ellipseinc.com (Richard Jahnel) Date: Mon, 12 Oct 2015 14:35:36 +0000 Subject: [OmniOS-discuss] Two panics now while writing eager zeros to zvols In-Reply-To: References: <65DC5816D4BEE043885A89FD54E273FC6CF5FEBB@MAIL101.Ellipseinc.com> <2D9BCBB7-C736-400F-A149-9BD4CB44E5FD@omniti.com> <65DC5816D4BEE043885A89FD54E273FC6CF608D8@MAIL101.Ellipseinc.com> Message-ID: <65DC5816D4BEE043885A89FD54E273FC6CF60B69@MAIL101.Ellipseinc.com> That would be the hourly snapshot destroy and create for VS20. -----Original Message----- From: Dan McDonald [mailto:danmcd at omniti.com] Sent: Friday, October 09, 2015 10:34 PM To: Richard Jahnel Cc: omnios-discuss at lists.omniti.com; imemo Subject: Re: [OmniOS-discuss] Two panics now while writing eager zeros to zvols That's just the "you had a kernel panic" message. Shoot. I was hoping for hardware problems. What is that process I mentioned -- VS20Vol20_snapcy ? What is it doing? It's driven from cron, but I can't tell much beyond that. (Most kernel dumps don't take in userspace text.) Dan ________________________________ The content of this e-mail (including any attachments) is strictly confidential and may be commercially sensitive. If you are not, or believe you may not be, the intended recipient, please advise the sender immediately by return e-mail, delete this e-mail and destroy any copies. From rjahnel at ellipseinc.com Mon Oct 12 14:46:03 2015 From: rjahnel at ellipseinc.com (Richard Jahnel) Date: Mon, 12 Oct 2015 14:46:03 +0000 Subject: [OmniOS-discuss] Two panics now while writing eager zeros to zvols In-Reply-To: References: <65DC5816D4BEE043885A89FD54E273FC6CF5FEBB@MAIL101.Ellipseinc.com> <2D9BCBB7-C736-400F-A149-9BD4CB44E5FD@omniti.com> <65DC5816D4BEE043885A89FD54E273FC6CF608D8@MAIL101.Ellipseinc.com> Message-ID: <65DC5816D4BEE043885A89FD54E273FC6CF60C61@MAIL101.Ellipseinc.com> Hmmm seems possible. Both panics included attempts to make eager zeroed volumes larger than 2 TB. From: wuffers [mailto:moo at wuffers.net] Sent: Saturday, October 10, 2015 2:19 AM To: Dan McDonald Cc: Richard Jahnel; imemo; omnios-discuss at lists.omniti.com Subject: Re: [OmniOS-discuss] Two panics now while writing eager zeros to zvols Is this the same bug I ran into in March? http://lists.omniti.com/pipermail/omnios-discuss/2015-March/004540.html I'm running a newer stmf_sbd that Dan made which solved my issue. It had something to do with the WRITE_SAME VAAI primitive, but I'm also running with COMSTAR. Dan was pretty busy preparing for R151014 at the time, so he hasn't had a chance to upstream it back. On Fri, Oct 9, 2015 at 11:33 PM, Dan McDonald > wrote: That's just the "you had a kernel panic" message. Shoot. I was hoping for hardware problems. What is that process I mentioned -- VS20Vol20_snapcy ? What is it doing? It's driven from cron, but I can't tell much beyond that. (Most kernel dumps don't take in userspace text.) Dan _______________________________________________ OmniOS-discuss mailing list OmniOS-discuss at lists.omniti.com http://lists.omniti.com/mailman/listinfo/omnios-discuss ________________________________ The content of this e-mail (including any attachments) is strictly confidential and may be commercially sensitive. If you are not, or believe you may not be, the intended recipient, please advise the sender immediately by return e-mail, delete this e-mail and destroy any copies. -------------- next part -------------- An HTML attachment was scrubbed... 
URL:

From rjahnel at ellipseinc.com Mon Oct 12 15:29:17 2015 From: rjahnel at ellipseinc.com (Richard Jahnel) Date: Mon, 12 Oct 2015 15:29:17 +0000 Subject: [OmniOS-discuss] Two panics now while writing eager zeros to zvols In-Reply-To: <65DC5816D4BEE043885A89FD54E273FC6CF60C61@MAIL101.Ellipseinc.com> References: <65DC5816D4BEE043885A89FD54E273FC6CF5FEBB@MAIL101.Ellipseinc.com> <2D9BCBB7-C736-400F-A149-9BD4CB44E5FD@omniti.com> <65DC5816D4BEE043885A89FD54E273FC6CF608D8@MAIL101.Ellipseinc.com> <65DC5816D4BEE043885A89FD54E273FC6CF60C61@MAIL101.Ellipseinc.com> Message-ID: <65DC5816D4BEE043885A89FD54E273FC6CF60CA4@MAIL101.Ellipseinc.com>

Scratch that. Just panicked again on 250 GB disks.

From: Richard Jahnel Sent: Monday, October 12, 2015 9:46 AM To: wuffers; Dan McDonald Cc: imemo; omnios-discuss at lists.omniti.com Subject: RE: [OmniOS-discuss] Two panics now while writing eager zeros to zvols

Hmmm, seems possible. Both panics included attempts to make eager-zeroed volumes larger than 2 TB.

From: wuffers [mailto:moo at wuffers.net] Sent: Saturday, October 10, 2015 2:19 AM To: Dan McDonald Cc: Richard Jahnel; imemo; omnios-discuss at lists.omniti.com Subject: Re: [OmniOS-discuss] Two panics now while writing eager zeros to zvols

Is this the same bug I ran into in March?

http://lists.omniti.com/pipermail/omnios-discuss/2015-March/004540.html

[...]

-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From richard at netbsd.org Tue Oct 13 03:24:15 2015 From: richard at netbsd.org (Richard PALO) Date: Tue, 13 Oct 2015 05:24:15 +0200 Subject: [OmniOS-discuss] usb printer debugging Message-ID:

Hi, me again. Trying to see why my multifunction printer doesn't show up correctly on OmniOS; a similar box works okay on OI.

I have a single-purpose DYMO label printer that is configured okay, but not the Epson.
> richard at omnis:/home/richard/src$ cfgadm -lv usb5/1 > Ap_Id Receptacle Occupant Condition Information > When Type Busy Phys_Id > usb5/1 connected unconfigured ok Mfg: DYMO Product: DYMO LabelWriter 450 NConfigs: 1 Config: 0 > unavailable usb-printer n /devices/pci at 0,0/pci15d9,a711 at 13:1 > richard at omnis:/home/richard/src$ cfgadm -lv usb4/1 > Ap_Id Receptacle Occupant Condition Information > When Type Busy Phys_Id > usb4/1 connected configured ok Mfg: EPSON Product: EPSON WP-4595 Series NConfigs: 1 Config: 0 : USB2.0 MFP(Hi-Speed) > unavailable usb-device n /devices/pci at 0,0/pci15d9,a711 at 12,2:1 > richard at omnis:/home/richard# echo ::prtusb |mdb -k > INDEX DRIVER INST NODE VID.PID PRODUCT > 1 ehci 0 pci15d9,a711 0000.0000 No Product String > 2 ehci 1 pci1002,4396 0000.0000 No Product String > 3 ohci 0 pci15d9,a711 0000.0000 No Product String > 4 ohci 1 pci15d9,a711 0000.0000 No Product String > 5 ohci 2 pci15d9,a711 0000.0000 No Product String > 6 ohci 3 pci15d9,a711 0000.0000 No Product String > 7 ohci 4 pci1002,4396 0000.0000 No Product String > 8 usb_mid 1 device 0557.2221 Hermon USB hidmouse Device > 9 usb_mid 4 device 046d.c52b USB Receiver > a usbprn 0 printer 0922.0020 DYMO LabelWriter 450 > b usb_mid 6 device 04b8.087e EPSON WP-4595 Series > richard at omnis:/home/richard# echo ::prtusb -v -ia |mdb -k > INDEX DRIVER INST NODE VID.PID PRODUCT > a usbprn 0 printer 0922.0020 DYMO LabelWriter 450 > > Device Descriptor > { > bLength = 0x12 > bDescriptorType = 0x1 > bcdUSB = 0x200 > bDeviceClass = 0 > bDeviceSubClass = 0 > bDeviceProtocol = 0 > bMaxPacketSize0 = 0x40 > idVendor = 0x922 > idProduct = 0x20 > bcdDevice = 0x112 > iManufacturer = 0x1 > iProduct = 0x2 > iSerialNumber = 0x3 > bNumConfigurations = 0x1 > } > -- Active Config Index 0 > Configuration Descriptor > { > bLength = 0x9 > bDescriptorType = 0x2 > wTotalLength = 0x20 > bNumInterfaces = 0x1 > bConfigurationValue = 0x1 > iConfiguration = 0x0 > bmAttributes = 0xc0 > bMaxPower = 0x1 > } > Interface Descriptor > { > bLength = 0x9 > bDescriptorType = 0x4 > bInterfaceNumber = 0x0 > bAlternateSetting = 0x0 > bNumEndpoints = 0x2 > bInterfaceClass = 0x7 > bInterfaceSubClass = 0x1 > bInterfaceProtocol = 0x2 > iInterface = 0x0 > } > Endpoint Descriptor > { > bLength = 0x7 > bDescriptorType = 0x5 > bEndpointAddress = 0x82 > bmAttributes = 0x2 > wMaxPacketSize = 0x40 > bInterval = 0x0 > } > Endpoint Descriptor > { > bLength = 0x7 > bDescriptorType = 0x5 > bEndpointAddress = 0x2 > bmAttributes = 0x2 > wMaxPacketSize = 0x40 > bInterval = 0x0 > } > > richard at omnis:/home/richard# echo ::prtusb -v -ib |mdb -k > INDEX DRIVER INST NODE VID.PID PRODUCT > b usb_mid 6 device 04b8.087e EPSON WP-4595 Series > > Device Descriptor > { > bLength = 0x12 > bDescriptorType = 0x1 > bcdUSB = 0x200 > bDeviceClass = 0 > bDeviceSubClass = 0 > bDeviceProtocol = 0 > bMaxPacketSize0 = 0x40 > idVendor = 0x4b8 > idProduct = 0x87e > bcdDevice = 0x100 > iManufacturer = 0x1 > iProduct = 0x2 > iSerialNumber = 0x3 > bNumConfigurations = 0x1 > } > -- Active Config Index 0 > Configuration Descriptor > { > bLength = 0x9 > bDescriptorType = 0x2 > wTotalLength = 0x4e > bNumInterfaces = 0x3 > bConfigurationValue = 0x1 > iConfiguration = 0x4 > bmAttributes = 0xc0 > bMaxPower = 0x1 > } > Interface Descriptor > { > bLength = 0x9 > bDescriptorType = 0x4 > bInterfaceNumber = 0x0 > bAlternateSetting = 0x0 > bNumEndpoints = 0x2 > bInterfaceClass = 0xff > bInterfaceSubClass = 0xff > bInterfaceProtocol = 0xff > iInterface = 0x5 > } > Endpoint Descriptor > { > 
bLength = 0x7 > bDescriptorType = 0x5 > bEndpointAddress = 0x1 > bmAttributes = 0x2 > wMaxPacketSize = 0x200 > bInterval = 0x0 > } > Endpoint Descriptor > { > bLength = 0x7 > bDescriptorType = 0x5 > bEndpointAddress = 0x82 > bmAttributes = 0x2 > wMaxPacketSize = 0x200 > bInterval = 0x0 > } > Interface Descriptor > { > bLength = 0x9 > bDescriptorType = 0x4 > bInterfaceNumber = 0x1 > bAlternateSetting = 0x0 > bNumEndpoints = 0x2 > bInterfaceClass = 0x7 > bInterfaceSubClass = 0x1 > bInterfaceProtocol = 0x2 > iInterface = 0x6 > } > Endpoint Descriptor > { > bLength = 0x7 > bDescriptorType = 0x5 > bEndpointAddress = 0x4 > bmAttributes = 0x2 > wMaxPacketSize = 0x200 > bInterval = 0x0 > } > Endpoint Descriptor > { > bLength = 0x7 > bDescriptorType = 0x5 > bEndpointAddress = 0x85 > bmAttributes = 0x2 > wMaxPacketSize = 0x200 > bInterval = 0x0 > } > Interface Descriptor > { > bLength = 0x9 > bDescriptorType = 0x4 > bInterfaceNumber = 0x2 > bAlternateSetting = 0x0 > bNumEndpoints = 0x2 > bInterfaceClass = 0x8 > bInterfaceSubClass = 0x6 > bInterfaceProtocol = 0x50 > iInterface = 0x7 > } > Endpoint Descriptor > { > bLength = 0x7 > bDescriptorType = 0x5 > bEndpointAddress = 0x7 > bmAttributes = 0x2 > wMaxPacketSize = 0x200 > bInterval = 0x0 > } > Endpoint Descriptor > { > bLength = 0x7 > bDescriptorType = 0x5 > bEndpointAddress = 0x88 > bmAttributes = 0x2 > wMaxPacketSize = 0x200 > bInterval = 0x0 > } >

The multifunction device only gets one configuration made (config 0), although the other two interfaces are certainly listed. On OI, I automagically get a printer and a fax device.

Any hints?

-- Richard PALO

From danmcd at omniti.com Tue Oct 13 11:39:42 2015 From: danmcd at omniti.com (Dan McDonald) Date: Tue, 13 Oct 2015 07:39:42 -0400 Subject: [OmniOS-discuss] Ang: Re: Two panics now while writing eager zeros to zvols In-Reply-To: References: <65DC5816D4BEE043885A89FD54E273FC6CF5FEBB@MAIL101.Ellipseinc.com> <2D9BCBB7-C736-400F-A149-9BD4CB44E5FD@omniti.com> <65DC5816D4BEE043885A89FD54E273FC6CF608D8@MAIL101.Ellipseinc.com> Message-ID: <1677B2AD-3051-4359-8381-C624157913DB@omniti.com>

> On Oct 10, 2015, at 4:49 AM, Johan Kragsterman wrote:
>
> Dan, is this upstreamed at all...?

No it's not. The following not-yet-upstreamed diff is a subset of a larger fix from illumos-nexenta:

diff --git a/usr/src/uts/common/io/comstar/lu/stmf_sbd/sbd_scsi.c b/usr/src/uts/common/io/comstar/lu/stmf_sbd/sbd_scsi.c
index cb6e115..7242d15 100644
--- a/usr/src/uts/common/io/comstar/lu/stmf_sbd/sbd_scsi.c
+++ b/usr/src/uts/common/io/comstar/lu/stmf_sbd/sbd_scsi.c
@@ -2347,6 +2347,7 @@ write_same_xfer_done:
 	if (scmd->flags & SBD_SCSI_CMD_XFER_FAIL) {
 		stmf_scsilib_send_status(task, STATUS_CHECK,
 		    STMF_SAA_WRITE_ERROR);
+		ret = (int)SBD_FAILURE;
 	} else {
 		ret = sbd_write_same_data(task, scmd);
 		if (ret != SBD_SUCCESS) {
@@ -2355,15 +2356,24 @@ write_same_xfer_done:
 		} else {
 			stmf_scsilib_send_status(task, STATUS_GOOD, 0);
 		}
+		if ((scmd->flags & SBD_SCSI_CMD_TRANS_DATA) &&
+		    scmd->trans_data != NULL) {
+			kmem_free(scmd->trans_data, scmd->trans_data_len);
+			scmd->trans_data = NULL;
+			scmd->trans_data_len = 0;
+			scmd->flags &= ~SBD_SCSI_CMD_TRANS_DATA;
+		}
 	}
 	/*
-	 * Only way we should get here is via handle_write_same(),
-	 * and that should make the following assertion always pass.
+	 * Do the send_status afterwards, because of a potential
+	 * double-free problem.
 	 */
-	ASSERT((scmd->flags & SBD_SCSI_CMD_TRANS_DATA) &&
-	    scmd->trans_data != NULL);
-	kmem_free(scmd->trans_data, scmd->trans_data_len);
-	scmd->flags &= ~SBD_SCSI_CMD_TRANS_DATA;
+	if (ret != SBD_SUCCESS) {
+		stmf_scsilib_send_status(task, STATUS_CHECK,
+		    STMF_SAA_WRITE_ERROR);
+	} else {
+		stmf_scsilib_send_status(task, STATUS_GOOD, 0);
+	}
 	return;
 }
 sbd_do_write_same_xfer(task, scmd, dbuf, dbuf_reusable);

It fixes a double-free in this path, which the eager-zeroes seem to tickle.

I'm attaching an stmf_sbd binary as well. People affected by this can try this binary by:

1.) beadm create test-be
2.) beadm mount test-be /mnt
3.) cp stmf_sbd /mnt/kernel/drv/amd64/stmf_sbd
4.) bootadm update-archive -R /mnt
5.) Reboot, watch for grub
6.) Select "test-be" from the grub menu. This will be a boot-once-into-the-new-BE test.
7.) Try your eager-zero test.

I look forward to seeing any experimental results.

Thanks,
Dan

-------------- next part --------------
A non-text attachment was scrubbed...
Name: stmf_sbd Type: application/octet-stream Size: 164248 bytes Desc: not available URL:

From danmcd at omniti.com Tue Oct 13 11:43:17 2015 From: danmcd at omniti.com (Dan McDonald) Date: Tue, 13 Oct 2015 07:43:17 -0400 Subject: [OmniOS-discuss] ISC-DHCPD in a zone In-Reply-To: <10558480-332F-49A3-993D-DF040DCD49C2@cos.ru> References: <20151010094039.4fde856f@pimple.landy.net> <20151010185036.3d904430@coomera> <10558480-332F-49A3-993D-DF040DCD49C2@cos.ru> Message-ID: <9965A834-711E-4CD4-B8D3-11C57B239CC7@omniti.com>

> On Oct 10, 2015, at 12:55 PM, Jim Klimov wrote:
>
> You can try creating a vnic and delegating it to a zone (via device match rules). [...]
>
> Hopefully someone better experienced with isc dhcpd can offer better ideas.

Oh my...

> Zone# ifconfig -a
> lo0:2: flags=2001000849 mtu 8232 index 1
>         inet 127.0.0.1 netmask ff000000
> e1000g1:2: flags=1100843 mtu 1500 index 2
>         inet 192.168.1.2 netmask ffffff00 broadcast 192.168.1.255
> lo0:2: flags=2002000849 mtu 8252 index 1
>         inet6 ::1/128

You're using a shared-stack zone. I didn't know people still did that...

ISC DHCP needs full DLPI-ish access to the NIC in question. I run ISC DHCP in a zone, but it's an exclusive-stack zone.

I'd take Jim's advice first if you're really intent on using a shared-stack zone. I can't guarantee it'll work, but you certainly cannot run ISC DHCP without having a full NIC available.

Sorry,
Dan

From danmcd at omniti.com Tue Oct 13 12:16:19 2015 From: danmcd at omniti.com (Dan McDonald) Date: Tue, 13 Oct 2015 08:16:19 -0400 Subject: Re: [OmniOS-discuss] Two panics now while writing eager zeros to zvols In-Reply-To: <65DC5816D4BEE043885A89FD54E273FC6CF60CA4@MAIL101.Ellipseinc.com> References: <65DC5816D4BEE043885A89FD54E273FC6CF5FEBB@MAIL101.Ellipseinc.com> <2D9BCBB7-C736-400F-A149-9BD4CB44E5FD@omniti.com> <65DC5816D4BEE043885A89FD54E273FC6CF608D8@MAIL101.Ellipseinc.com> <65DC5816D4BEE043885A89FD54E273FC6CF60C61@MAIL101.Ellipseinc.com> <65DC5816D4BEE043885A89FD54E273FC6CF60CA4@MAIL101.Ellipseinc.com> Message-ID:

See my other note on this subject. This may be a bug which is fixed in illumos-nexenta, but not upstreamed.
Dan

From danmcd at omniti.com Tue Oct 13 12:21:52 2015 From: danmcd at omniti.com (Dan McDonald) Date: Tue, 13 Oct 2015 08:21:52 -0400 Subject: [OmniOS-discuss] usb printer debugging In-Reply-To: References: Message-ID: <7477FABC-5ACE-47F6-A80B-102ECB716CF7@omniti.com>

> On Oct 12, 2015, at 11:24 PM, Richard PALO wrote:
>
> On OI, I get automagically a printer and a fax device.
>
> Any hints?

Sure - we don't support CUPS in OmniOS. That requires apache stuff, which was expunged to enforce the keep-your-stuff-to-yourself policies of OmniOS. (Apache is only available on the "omniti-ms/ms.omniti.com" publisher.)

Sorry,
Dan

From rjahnel at ellipseinc.com Tue Oct 13 14:30:07 2015 From: rjahnel at ellipseinc.com (Richard Jahnel) Date: Tue, 13 Oct 2015 14:30:07 +0000 Subject: [OmniOS-discuss] Ang: Re: Two panics now while writing eager zeros to zvols In-Reply-To: <1677B2AD-3051-4359-8381-C624157913DB@omniti.com> References: <65DC5816D4BEE043885A89FD54E273FC6CF5FEBB@MAIL101.Ellipseinc.com> <2D9BCBB7-C736-400F-A149-9BD4CB44E5FD@omniti.com> <65DC5816D4BEE043885A89FD54E273FC6CF608D8@MAIL101.Ellipseinc.com> <1677B2AD-3051-4359-8381-C624157913DB@omniti.com> Message-ID: <65DC5816D4BEE043885A89FD54E273FC6CF60E45@MAIL101.Ellipseinc.com>

Will experiment with it this afternoon. If it hasn't panicked by tomorrow evening, odds are this will have identified and fixed the issue.

-----Original Message-----
From: Dan McDonald [mailto:danmcd at omniti.com]
Sent: Tuesday, October 13, 2015 6:40 AM
To: Johan Kragsterman; Dan McDonald
Cc: wuffers; Richard Jahnel; imemo; omnios-discuss at lists.omniti.com
Subject: Re: Ang: Re: [OmniOS-discuss] Two panics now while writing eager zeros to zvols

[...]

From rjahnel at ellipseinc.com Tue Oct 13 17:34:19 2015 From: rjahnel at ellipseinc.com (Richard Jahnel) Date: Tue, 13 Oct 2015 17:34:19 +0000 Subject: [OmniOS-discuss] Ang: Re: Two panics now while writing eager zeros to zvols In-Reply-To: <1677B2AD-3051-4359-8381-C624157913DB@omniti.com> References: <65DC5816D4BEE043885A89FD54E273FC6CF5FEBB@MAIL101.Ellipseinc.com> <2D9BCBB7-C736-400F-A149-9BD4CB44E5FD@omniti.com> <65DC5816D4BEE043885A89FD54E273FC6CF608D8@MAIL101.Ellipseinc.com> <1677B2AD-3051-4359-8381-C624157913DB@omniti.com> Message-ID: <65DC5816D4BEE043885A89FD54E273FC6CF60EC9@MAIL101.Ellipseinc.com>

I'm probably doing it wrong, but I have failed to get this to work.

When attempting to boot into the test environment I got something along the lines of

stmf_sbd: undefined symbol '__stack_chk_fail'
stmf_sbd: undefined symbol '__stack_chk_guard'

unable to load module stmf_sbd

My version information below.

OmniOS 5.11 omnios-f090f73 September 2015

# cat /etc/release
  OmniOS v11 r151014
  Copyright 2015 OmniTI Computer Consulting, Inc. All rights reserved.
  Use is subject to license terms.

-----Original Message-----
From: Dan McDonald [mailto:danmcd at omniti.com]
Sent: Tuesday, October 13, 2015 6:40 AM
To: Johan Kragsterman; Dan McDonald
Cc: wuffers; Richard Jahnel; imemo; omnios-discuss at lists.omniti.com
Subject: Re: Ang: Re: [OmniOS-discuss] Two panics now while writing eager zeros to zvols

[...]

From danmcd at omniti.com Tue Oct 13 17:52:37 2015 From: danmcd at omniti.com (Dan McDonald) Date: Tue, 13 Oct 2015 13:52:37 -0400 Subject: [OmniOS-discuss] Ang: Re: Two panics now while writing eager zeros to zvols In-Reply-To: <65DC5816D4BEE043885A89FD54E273FC6CF60EC9@MAIL101.Ellipseinc.com> References: <65DC5816D4BEE043885A89FD54E273FC6CF5FEBB@MAIL101.Ellipseinc.com> <2D9BCBB7-C736-400F-A149-9BD4CB44E5FD@omniti.com> <65DC5816D4BEE043885A89FD54E273FC6CF608D8@MAIL101.Ellipseinc.com> <1677B2AD-3051-4359-8381-C624157913DB@omniti.com> <65DC5816D4BEE043885A89FD54E273FC6CF60EC9@MAIL101.Ellipseinc.com> Message-ID: <807F8D8C-BF81-4368-BCFA-2CCA0C2DE338@omniti.com>

My bad. I built this with bloody and forgot the flag day for modules. I'll need to build it for 014 specifically.

Dan

Sent from my iPhone (typos, autocorrect, and all)

> On Oct 13, 2015, at 1:34 PM, Richard Jahnel wrote:
>
> I'm probably doing it wrong, but I have failed to get this to work.
>
> When attempting to boot into the test environment I got something along the lines of
>
> stmf_sbd: undefined symbol '__stack_chk_fail'
> stmf_sbd: undefined symbol '__stack_chk_guard'
>
> unable to load module stmf_sbd
>
> My version information below.
>
> OmniOS 5.11 omnios-f090f73 September 2015
>
> [...]

From danmcd at omniti.com Tue Oct 13 18:13:44 2015 From: danmcd at omniti.com (Dan McDonald) Date: Tue, 13 Oct 2015 14:13:44 -0400 Subject: Re: [OmniOS-discuss] Ang: Re: Two panics now while writing eager zeros to zvols In-Reply-To: <807F8D8C-BF81-4368-BCFA-2CCA0C2DE338@omniti.com> References: <65DC5816D4BEE043885A89FD54E273FC6CF5FEBB@MAIL101.Ellipseinc.com> <2D9BCBB7-C736-400F-A149-9BD4CB44E5FD@omniti.com> <65DC5816D4BEE043885A89FD54E273FC6CF608D8@MAIL101.Ellipseinc.com> <1677B2AD-3051-4359-8381-C624157913DB@omniti.com> <65DC5816D4BEE043885A89FD54E273FC6CF60EC9@MAIL101.Ellipseinc.com> <807F8D8C-BF81-4368-BCFA-2CCA0C2DE338@omniti.com> Message-ID: <4AC0EC81-FEC5-4A5C-BA79-D93C0162597D@omniti.com>

> On Oct 13, 2015, at 1:52 PM, Dan McDonald wrote:
>
> My bad. I built this with bloody and forgot the flag day for modules. I'll need to build it for 014 specifically.

Try this one.

Dan

-------------- next part --------------
A non-text attachment was scrubbed...
Name: stmf_sbd Type: application/octet-stream Size: 164072 bytes Desc: not available URL: From danmcd at omniti.com Tue Oct 13 18:15:32 2015 From: danmcd at omniti.com (Dan McDonald) Date: Tue, 13 Oct 2015 14:15:32 -0400 Subject: [OmniOS-discuss] Ang: Re: Two panics now while writing eager zeros to zvols In-Reply-To: <4AC0EC81-FEC5-4A5C-BA79-D93C0162597D@omniti.com> References: <65DC5816D4BEE043885A89FD54E273FC6CF5FEBB@MAIL101.Ellipseinc.com> <2D9BCBB7-C736-400F-A149-9BD4CB44E5FD@omniti.com> <65DC5816D4BEE043885A89FD54E273FC6CF608D8@MAIL101.Ellipseinc.com> <1677B2AD-3051-4359-8381-C624157913DB@omniti.com> <65DC5816D4BEE043885A89FD54E273FC6CF60EC9@MAIL101.Ellipseinc.com> <807F8D8C-BF81-4368-BCFA-2CCA0C2DE338@omniti.com> <4AC0EC81-FEC5-4A5C-BA79-D93C0162597D@omniti.com> Message-ID: <9A17713D-913B-4A87-B8CE-14315CFA82CA@omniti.com> > On Oct 13, 2015, at 2:13 PM, Dan McDonald wrote: > > Try this one. NO DON'T! Sorry. This one. Dan -------------- next part -------------- A non-text attachment was scrubbed... Name: stmf_sbd Type: application/octet-stream Size: 163240 bytes Desc: not available URL: From cks at cs.toronto.edu Tue Oct 13 18:35:08 2015 From: cks at cs.toronto.edu (Chris Siebenmann) Date: Tue, 13 Oct 2015 14:35:08 -0400 Subject: [OmniOS-discuss] Installing non-current kernels on OmniOS r151014? Message-ID: <20151013183508.6AEF07A0408@apps0.cs.toronto.edu> We have a situation where we would like to be able to install new r151014 machines with something other than the current r151014 kernel. (In the extreme case we'd like to be able to specify the exact package version for all packages, but kernels are the most important for us.) I *think* that the required older versions (both of the kernel and of drivers) are still available in the OmniOS repository. However, I can't seem to coax 'pkg' to show them to me (perhaps because they differ only in the timestamp, not in the version number that pkg stuff normally shows) and so I'm not sure I can get pkg to install them. In related news, is there an easy way to fish the full specific versions of installed packages out of a non-current boot environment? (Or for that matter from the current boot environment.) Is the OmniOS repo for r151014 going to keep copies of all old packages for the lifetime of r151014, or should we also be looking into creating our own copy of the r151014 repo so we can be sure the copies we need are preserved? Thanks in advance. (For the curious: we've been doing various testing of r151014 before upgrading production machines to it. The August 18th and September 14th updates were stable, but after the September 29th one our test machine has started experiencing kernel problems. It's possible that we're putting somewhat different test load on it, but we don't think we've particularly changed anything. While we're going to try to get crash dumps and so on, our first priority is stabilizing some version of r151014 for a production upgrade, which requires being able to specifically install *it* (at least as far as the kernel/NFS/etc goes), not 'the current r151014'.) 
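 (And if the answer to the repo question turns out to be 'mirror it yourself', I assume that would be the usual pkgrepo/pkgrecv dance, something along the lines of:

	pkgrepo create /export/r151014-mirror
	pkgrecv -s https://pkg.omniti.com/omnios/r151014/ -d /export/r151014-mirror -m all-timestamps '*'

 with '-m all-timestamps' so we keep every timestamped version rather than just the newest -- but corrections welcome; I haven't actually tried it yet.)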
- cks

From richard at netbsd.org Tue Oct 13 18:57:02 2015 From: richard at netbsd.org (Richard PALO) Date: Tue, 13 Oct 2015 20:57:02 +0200 Subject: [OmniOS-discuss] usb printer debugging In-Reply-To: <7477FABC-5ACE-47F6-A80B-102ECB716CF7@omniti.com> References: <7477FABC-5ACE-47F6-A80B-102ECB716CF7@omniti.com> Message-ID: <561D53FE.20105@netbsd.org>

On 13/10/15 14:21, Dan McDonald wrote:
>
>> On Oct 12, 2015, at 11:24 PM, Richard PALO wrote:
>>
>> On OI, I get automagically a printer and a fax device.
>>
>> Any hints?
>
> Sure - we don't support CUPS in OmniOS. That requires apache stuff, which was expunged to enforce the keep-your-stuff-to-yourself policies of OmniOS. (Apache is only available on the "omniti-ms/ms.omniti.com" publisher.)
>
> Sorry,
> Dan
>

Well, I'm building a gate with CUPS and APACHE just to see. But I'm still just a bit dubious; otherwise, why would usbprn still be provided? An omission? It still smells like an anomaly somewhere...

-- Richard PALO

From lotheac at iki.fi Tue Oct 13 19:10:20 2015 From: lotheac at iki.fi (Lauri Tirkkonen) Date: Tue, 13 Oct 2015 22:10:20 +0300 Subject: Re: [OmniOS-discuss] Installing non-current kernels on OmniOS r151014? In-Reply-To: <20151013183508.6AEF07A0408@apps0.cs.toronto.edu> References: <20151013183508.6AEF07A0408@apps0.cs.toronto.edu> Message-ID: <20151013191020.GD26977@gutsman.lotheac.fi>

On Tue, Oct 13 2015 14:35:08 -0400, Chris Siebenmann wrote:
> I *think* that the required older versions (both of the kernel and
> of drivers) are still available in the OmniOS repository. However,
> I can't seem to coax 'pkg' to show them to me (perhaps because they
> differ only in the timestamp, not in the version number that pkg stuff
> normally shows) and so I'm not sure I can get pkg to install them.

They are:

% pkg list -vfa kernel
FMRI                                                          IFO
pkg://omnios/system/kernel at 0.5.11-0.151014:20150929T225337Z  i--
pkg://omnios/system/kernel at 0.5.11-0.151014:20150914T195008Z  ---
pkg://omnios/system/kernel at 0.5.11-0.151014:20150913T201559Z  ---
pkg://omnios/system/kernel at 0.5.11-0.151014:20150818T161044Z  ---
pkg://omnios/system/kernel at 0.5.11-0.151014:20150727T054700Z  ---
pkg://omnios/system/kernel at 0.5.11-0.151014:20150417T182434Z  ---
pkg://omnios/system/kernel at 0.5.11-0.151014:20150402T175237Z  ---

Downgrading a kernel package might not be so trivial, though. pkg will generally refuse to downgrade packages unless you give version numbers in 'update', and the dependencies generally involve lots more packages than just system/kernel.

> In related news, is there an easy way to fish the full specific versions
> of installed packages out of a non-current boot environment? (Or for
> that matter from the current boot environment.)

Mount the BE (beadm mount) and run 'pkg -R <mountpoint> list -v'. For the current BE just 'pkg list -v'.
-- Lauri Tirkkonen | lotheac @ IRCnet

From rjahnel at ellipseinc.com Tue Oct 13 19:42:52 2015 From: rjahnel at ellipseinc.com (Richard Jahnel) Date: Tue, 13 Oct 2015 19:42:52 +0000 Subject: [OmniOS-discuss] Ang: Re: Two panics now while writing eager zeros to zvols In-Reply-To: <9A17713D-913B-4A87-B8CE-14315CFA82CA@omniti.com> References: <65DC5816D4BEE043885A89FD54E273FC6CF5FEBB@MAIL101.Ellipseinc.com> <2D9BCBB7-C736-400F-A149-9BD4CB44E5FD@omniti.com> <65DC5816D4BEE043885A89FD54E273FC6CF608D8@MAIL101.Ellipseinc.com> <1677B2AD-3051-4359-8381-C624157913DB@omniti.com> <65DC5816D4BEE043885A89FD54E273FC6CF60EC9@MAIL101.Ellipseinc.com> <807F8D8C-BF81-4368-BCFA-2CCA0C2DE338@omniti.com> <4AC0EC81-FEC5-4A5C-BA79-D93C0162597D@omniti.com> <9A17713D-913B-4A87-B8CE-14315CFA82CA@omniti.com> Message-ID: <65DC5816D4BEE043885A89FD54E273FC6CF60FA7@MAIL101.Ellipseinc.com>

While the system did not panic, VMware lost all communication with all zvols shortly after I attempted to add a new vmdk to one of them.

-----Original Message-----
From: Dan McDonald [mailto:danmcd at omniti.com]
Sent: Tuesday, October 13, 2015 1:16 PM
To: Richard Jahnel
Cc: Johan Kragsterman; wuffers; imemo; omnios-discuss at lists.omniti.com
Subject: Re: Ang: Re: [OmniOS-discuss] Two panics now while writing eager zeros to zvols

> On Oct 13, 2015, at 2:13 PM, Dan McDonald wrote:
>
> Try this one.

NO DON'T! Sorry. This one.

Dan

From danmcd at omniti.com Tue Oct 13 19:47:16 2015 From: danmcd at omniti.com (Dan McDonald) Date: Tue, 13 Oct 2015 15:47:16 -0400 Subject: Re: [OmniOS-discuss] Ang: Re: Two panics now while writing eager zeros to zvols In-Reply-To: <65DC5816D4BEE043885A89FD54E273FC6CF60FA7@MAIL101.Ellipseinc.com> References: <65DC5816D4BEE043885A89FD54E273FC6CF5FEBB@MAIL101.Ellipseinc.com> <2D9BCBB7-C736-400F-A149-9BD4CB44E5FD@omniti.com> <65DC5816D4BEE043885A89FD54E273FC6CF608D8@MAIL101.Ellipseinc.com> <1677B2AD-3051-4359-8381-C624157913DB@omniti.com> <65DC5816D4BEE043885A89FD54E273FC6CF60EC9@MAIL101.Ellipseinc.com> <807F8D8C-BF81-4368-BCFA-2CCA0C2DE338@omniti.com> <4AC0EC81-FEC5-4A5C-BA79-D93C0162597D@omniti.com> <9A17713D-913B-4A87-B8CE-14315CFA82CA@omniti.com> <65DC5816D4BEE043885A89FD54E273FC6CF60FA7@MAIL101.Ellipseinc.com> Message-ID: <4ECAE70E-6C02-499E-9568-08D4C83B2C3F@omniti.com>

> On Oct 13, 2015, at 3:42 PM, Richard Jahnel wrote:
>
> While the system did not panic, VMware lost all communication with all zvols shortly after I attempted to add a new vmdk to one of them.

Shoot. There's a LOT of COMSTAR goodies in NexentaStor they haven't yet upstreamed. VAAI-related ones are a big part of it.

Dan

From keith at paskett.org Wed Oct 14 04:07:04 2015 From: keith at paskett.org (Keith Paskett) Date: Tue, 13 Oct 2015 22:07:04 -0600 Subject: [OmniOS-discuss] subversion intentionally compiled without http(s) support? Message-ID:

After installing the following subversion package, I get an error accessing any subversion repository via http(s) protocols:

PACKAGE                                                      PUBLISHER
pkg:/omniti/developer/versioning/subversion at 1.9.2-0.151014  ms.omniti.com

The error I get is svn: E170000: Unrecognized URL scheme for 'https://...'

Online search suggests that it was compiled without the --with-ssl and/or --with-neon switches.
Subversion worked fine on a r151014 system I set up a couple of months ago.

Keith Paskett
KLP-Systems

From keith at paskett.org Wed Oct 14 04:32:14 2015 From: keith at paskett.org (Keith Paskett) Date: Tue, 13 Oct 2015 22:32:14 -0600 Subject: Re: [OmniOS-discuss] subversion intentionally compiled without http(s) support? In-Reply-To: References: Message-ID: <3B1BD7DE-7D7B-471F-A5DE-B402E145B65D@paskett.org>

subversion at 1.8.10-0.151014 is fine.

> On Oct 13, 2015, at 10:07 PM, Keith Paskett wrote:
>
> After installing the following subversion package, I get an error accessing any subversion repository via http(s) protocols:
>
> [...]

_______________________________________________
OmniOS-discuss mailing list
OmniOS-discuss at lists.omniti.com
http://lists.omniti.com/mailman/listinfo/omnios-discuss

From rt at steait.net Wed Oct 14 05:45:37 2015 From: rt at steait.net (Rune Tipsmark) Date: Wed, 14 Oct 2015 05:45:37 +0000 Subject: [OmniOS-discuss] ZIL TXG commits happen very frequently - why? Message-ID:

Hi all.

Wondering if anyone could shed some light on why my ZFS pool would perform TXG commits up to 5 times per second. It's set to the default 5-second interval, and occasionally it does wait 5 seconds between commits, but only when nearly idle.

I'm not sure if this impacts my performance, but I would suspect it doesn't improve it. I force sync on all data.

I got 11 mirrors (7200 rpm SAS disks), two SLOG devices, two L2ARC devices and a pair of spare disks.

Each log device can hold 150GB of data, so plenty for 2 TXG commits. The system has 384GB memory.

Below is a bit of output from zilstat during a near-idle time this morning, so you won't see 4-5 commits per second, but during load later today it will happen.

root at zfs10:/tmp# ./zilstat.ksh -M -t -p pool01 txg
waiting for txg commit...
TIME                      txg   N-MB  N-MB/s  N-Max-Rate  B-MB  B-MB/s  B-Max-Rate   ops  <=4kB  4-32kB  >=32kB
2015 Oct 14 06:21:19  10872771     3       3           0    21      21           2   234     14      19     201
2015 Oct 14 06:21:22  10872772    10       3           3    70      23          24   806      0      84     725
2015 Oct 14 06:21:24  10872773    12       6           5    56      28          26   682     17     107     558
2015 Oct 14 06:21:25  10872774    13      13           2    75      75          14   651      0      10     641
2015 Oct 14 06:21:25  10872775     0       0           0     0       0           0     1      0       0       1
2015 Oct 14 06:21:26  10872776    11      11           6    53      53          29   645      2     136     507
2015 Oct 14 06:21:30  10872777    11       2           4    81      20          32   873     11      60     804
2015 Oct 14 06:21:30  10872778     0       0           0     0       0           0     1      0       1       0
2015 Oct 14 06:21:31  10872779    12      12          11    56      56          52   631      0       8     623
2015 Oct 14 06:21:33  10872780    11       5           4    74      37          27   858      0      44     814
2015 Oct 14 06:21:36  10872781    14       4           6    79      26          30   977     12      82     883
2015 Oct 14 06:21:39  10872782    11       3           4    78      26          25   957     18      55     884
2015 Oct 14 06:21:43  10872783    13       3           4    80      20          24   930      0     135     795
2015 Oct 14 06:21:46  10872784    13       4           4    81      27          29   965     13      95     857
2015 Oct 14 06:21:49  10872785    11       3           6    80      26          41  1077     12     215     850
2015 Oct 14 06:21:53  10872786     9       3           2    67      22          18   870      1      74     796
2015 Oct 14 06:21:56  10872787    12       3           5    72      18          26   909     17     163     729
2015 Oct 14 06:21:58  10872788    12       6           3    53      26          21   530      0      33     497
2015 Oct 14 06:21:59  10872789    26      26          24    72      72          62   882     12      60     810
2015 Oct 14 06:22:02  10872790     9       3           5    57      19          28   777      0      70     708
2015 Oct 14 06:22:07  10872791    11       2           3    96      24          22  1044     12      46     986
2015 Oct 14 06:22:10  10872792    13       3           4    78      19          22   911     12      38     862
2015 Oct 14 06:22:14  10872793    11       2           4    79      19          26   930     10      94     826
2015 Oct 14 06:22:17  10872794    11       3           5    73      24          26  1054     17     151     886
2015 Oct 14 06:22:17  10872795     0       0           0     0       0           0     2      0       0       2
2015 Oct 14 06:22:18  10872796    40      40          38    78      78          60   707      0      28     680
2015 Oct 14 06:22:22  10872797    10       3           3    66      22          21   937     14     164     759
2015 Oct 14 06:22:25  10872798     9       2           2    66      16          21   821     11      92     718
2015 Oct 14 06:22:28  10872799    24      12          14    80      40          43   750      0      23     727
2015 Oct 14 06:22:28  10872800     0       0           0     0       0           0     2      0       0       2
2015 Oct 14 06:22:29  10872801    15       7           9    49      24          24   526     11      25     490
2015 Oct 14 06:22:33  10872802    10       2           3    79      19          24   939      0      63     876
2015 Oct 14 06:22:36  10872803    10       5           3    59      29          18   756     11      65     682
2015 Oct 14 06:22:36  10872804     0       0           0     0       0           0     0      0       0       0
2015 Oct 14 06:22:36  10872805    13      13           2    58      58           9   500      0      29     471

--
-------------- An HTML attachment was scrubbed... URL: From chip at innovates.com Wed Oct 14 12:44:50 2015 From: chip at innovates.com (Schweiss, Chip) Date: Wed, 14 Oct 2015 07:44:50 -0500 Subject: [OmniOS-discuss] ZIL TXG commits happen very frequently - why? In-Reply-To: References: Message-ID: It all has to do with the write throttle and buffers filling. Here's a great blog post on how it works and how it's tuned: http://dtrace.org/blogs/ahl/2014/02/10/the-openzfs-write-throttle/ http://dtrace.org/blogs/ahl/2014/08/31/openzfs-tuning/ -Chip On Wed, Oct 14, 2015 at 12:45 AM, Rune Tipsmark wrote: > Hi all. > > > > Wondering if anyone could shed some light on why my ZFS pool would perform > TXG commits up to 5 times per second. It?s set to the default 5 second > interval and occasionally it does wait 5 seconds between commits, but only > when nearly idle. > > > > I?m not sure if this impacts my performance but I would suspect it doesn?t > improve it. I force sync on all data. > > > > I got 11 mirrors (7200rpm sas disks) two SLOG devices and two L2 ARC > devices and a pair of spare disks. > > > > Each log device can hold 150GB of data so plenty for 2 TXG commits. The > system has 384GB memory. > > > > > Below is a bit of output from zilstat during a near idle time this morning > so you wont see 4-5 commits per second, but during load later today it will > happen.. > > > > root at zfs10:/tmp# ./zilstat.ksh -M -t -p pool01 txg > > waiting for txg commit... > > TIME txg N-MB N-MB/s N-Max-Rate > B-MB B-MB/s B-Max-Rate ops <=4kB 4-32kB >=32kB > > 2015 Oct 14 06:21:19 10872771 3 3 0 > 21 21 2 234 14 19 201 > > 2015 Oct 14 06:21:22 10872772 10 3 3 > 70 23 24 806 0 84 725 > > 2015 Oct 14 06:21:24 10872773 12 6 5 > 56 28 26 682 17 107 558 > > 2015 Oct 14 06:21:25 10872774 13 13 2 > 75 75 14 651 0 10 641 > > 2015 Oct 14 06:21:25 10872775 0 0 0 > 0 0 0 1 0 0 1 > > 2015 Oct 14 06:21:26 10872776 11 11 6 > 53 53 29 645 2 136 507 > > 2015 Oct 14 06:21:30 10872777 11 2 4 > 81 20 32 873 11 60 804 > > 2015 Oct 14 06:21:30 10872778 0 0 0 > 0 0 0 1 0 1 0 > > 2015 Oct 14 06:21:31 10872779 12 12 11 > 56 56 52 631 0 8 623 > > 2015 Oct 14 06:21:33 10872780 11 5 4 > 74 37 27 858 0 44 814 > > 2015 Oct 14 06:21:36 10872781 14 4 6 > 79 26 30 977 12 82 883 > > 2015 Oct 14 06:21:39 10872782 11 3 4 > 78 26 25 957 18 55 884 > > 2015 Oct 14 06:21:43 10872783 13 3 4 > 80 20 24 930 0 135 795 > > 2015 Oct 14 06:21:46 10872784 13 4 4 > 81 27 29 965 13 95 857 > > 2015 Oct 14 06:21:49 10872785 11 3 6 > 80 26 41 1077 12 215 850 > > 2015 Oct 14 06:21:53 10872786 9 3 2 > 67 22 18 870 1 74 796 > > 2015 Oct 14 06:21:56 10872787 12 3 5 > 72 18 26 909 17 163 729 > > 2015 Oct 14 06:21:58 10872788 12 6 3 > 53 26 21 530 0 33 497 > > 2015 Oct 14 06:21:59 10872789 26 26 24 > 72 72 62 882 12 60 810 > > 2015 Oct 14 06:22:02 10872790 9 3 5 > 57 19 28 777 0 70 708 > > 2015 Oct 14 06:22:07 10872791 11 2 3 > 96 24 22 1044 12 46 986 > > 2015 Oct 14 06:22:10 10872792 13 3 4 > 78 19 22 911 12 38 862 > > 2015 Oct 14 06:22:14 10872793 11 2 4 > 79 19 26 930 10 94 826 > > 2015 Oct 14 06:22:17 10872794 11 3 5 > 73 24 26 1054 17 151 886 > > 2015 Oct 14 06:22:17 10872795 0 0 0 > 0 0 0 2 0 0 2 > > 2015 Oct 14 06:22:18 10872796 40 40 38 > 78 78 60 707 0 28 680 > > 2015 Oct 14 06:22:22 10872797 10 3 3 > 66 22 21 937 14 164 759 > > 2015 Oct 14 06:22:25 10872798 9 2 2 > 66 16 21 821 11 92 718 > > 2015 Oct 14 06:22:28 10872799 24 12 14 > 80 40 43 750 0 23 727 > > 2015 Oct 14 06:22:28 10872800 0 0 0 > 0 0 0 2 0 0 2 > > 2015 Oct 14 06:22:29 10872801 15 7 9 > 49 24 
24 526 11 25 490 > > 2015 Oct 14 06:22:33 10872802 10 2 3 > 79 19 24 939 0 63 876 > > 2015 Oct 14 06:22:36 10872803 10 5 3 > 59 29 18 756 11 65 682 > > 2015 Oct 14 06:22:36 10872804 0 0 0 > 0 0 0 0 0 0 0 > > 2015 Oct 14 06:22:36 10872805 13 13 2 > 58 58 9 500 0 29 471 > > > > -- > > > > root at zfs10:/tmp# zpool status pool01 > > pool: pool01 > > state: ONLINE > > scan: scrub repaired 0 in 7h53m with 0 errors on Sat Oct 3 06:53:43 2015 > > config: > > > > NAME STATE READ WRITE CKSUM > > pool01 ONLINE 0 0 0 > > mirror-0 ONLINE 0 0 0 > > c4t5000C50055FC9533d0 ONLINE 0 0 0 > > c4t5000C50055FE6A63d0 ONLINE 0 0 0 > > mirror-1 ONLINE 0 0 0 > > c4t5000C5005708296Fd0 ONLINE 0 0 0 > > c4t5000C5005708351Bd0 ONLINE 0 0 0 > > mirror-2 ONLINE 0 0 0 > > c4t5000C500570858EFd0 ONLINE 0 0 0 > > c4t5000C50057085A6Bd0 ONLINE 0 0 0 > > mirror-3 ONLINE 0 0 0 > > c4t5000C50057086307d0 ONLINE 0 0 0 > > c4t5000C50057086B67d0 ONLINE 0 0 0 > > mirror-4 ONLINE 0 0 0 > > c4t5000C500570870D3d0 ONLINE 0 0 0 > > c4t5000C50057089753d0 ONLINE 0 0 0 > > mirror-5 ONLINE 0 0 0 > > c4t5000C500625B7EA7d0 ONLINE 0 0 0 > > c4t5000C500625B8137d0 ONLINE 0 0 0 > > mirror-6 ONLINE 0 0 0 > > c4t5000C500625B8427d0 ONLINE 0 0 0 > > c4t5000C500625B86E3d0 ONLINE 0 0 0 > > mirror-7 ONLINE 0 0 0 > > c4t5000C500625B886Fd0 ONLINE 0 0 0 > > c4t5000C500625BB773d0 ONLINE 0 0 0 > > mirror-8 ONLINE 0 0 0 > > c4t5000C500625BC2C3d0 ONLINE 0 0 0 > > c4t5000C500625BD3EBd0 ONLINE 0 0 0 > > mirror-9 ONLINE 0 0 0 > > c4t5000C50062878C0Bd0 ONLINE 0 0 0 > > c4t5000C50062878C43d0 ONLINE 0 0 0 > > mirror-10 ONLINE 0 0 0 > > c4t5000C50062879687d0 ONLINE 0 0 0 > > c4t5000C50062879707d0 ONLINE 0 0 0 > > logs > > c11d0 ONLINE 0 0 0 > > c10d0 ONLINE 0 0 0 > > cache > > c14d0 ONLINE 0 0 0 > > c15d0 ONLINE 0 0 0 > > spares > > c4t5000C50062879723d0 AVAIL > > c4t5000C50062879787d0 AVAIL > > > > errors: No known data errors > > > > > > Br, > > Rune > > _______________________________________________ > OmniOS-discuss mailing list > OmniOS-discuss at lists.omniti.com > http://lists.omniti.com/mailman/listinfo/omnios-discuss > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From danmcd at omniti.com Wed Oct 14 14:50:10 2015 From: danmcd at omniti.com (Dan McDonald) Date: Wed, 14 Oct 2015 10:50:10 -0400 Subject: [OmniOS-discuss] NEW BLOODY - last or second-to-last before r151016 Message-ID: BIG update this time for OmniOS bloody as we head to r151016. If there is another bloody before r151016, it'll be bugfixes or upstream-illumos that I think require some bloody soak time. 
New with this update out of omnios-build (now at master revision 76d2785 in the install media, and one additional bugfix advancing to e75489a in the repo server):

- gcc51 now built with parallel make (shrinking build times noticeably)
- So many updates I had to automate the extraction of them:

  Update gnu-make to 4.1
  Update xz to 5.2.2
  Update wget to 1.16.3
  Update unixodbc to 2.3.2
  Update tmux to 2.0 (NOTE --> additional bugfix available only from "pkg update", thanks to Lauri "lotheac" Tirkkonen for the fix)
  Update tcsh to 6.19.0
  Update sqlite-3 to 3.8.11.1
  Update sigcpp to 2.6.1
  Update screen to 4.3.1
  Update simplejson-26 to 3.8.0
  Update pylint to 1.4.4
  Update ply to 3.8
  Update numpy-26 to 1.10.0
  Update lxml-26 to 3.4.4
  Update coverage-26 to 4.0
  Update pv (pipe-viewer) to 1.6.0
  Update pcre to 8.37
  Update pciutils to 3.4.0
  Update gnu-patch to 2.7.5
  Update netperf to 2.7.0
  Update Mercurial to 3.5.2
  Update libxml2 to 2.9.2
  Update libtool & libltdl to 2.4.6
  Update libpcap to 1.7.4
  Update libidn to 1.32
  Update iso-codes to 3.57
  Update ISC DHCP to 4.3.3
  Update intltool to 0.51.0
  Update groff to 1.22.3
  Update git to 2.6.1
  Update gawk to 4.1.3
  Update Amazon EC2 API to 1.7.5.1
  Update curl to 7.44.0
  Update bind to 9.10.3
  Update bash to 4.3p42
  Update automake to 1.15
  Update XML::Parser to 2.44
  Update coreutils to 8.24
  Update Mercurial to 3.5.1

And highlights of illumos-omnios progress (now at master revision 85fef88, meaning uname -v == omnios-85fef88) include:

- Resumable ZFS send/recv. Flag day here: http://www.listbox.com/member/archive/182191/2015/10/sort/time_rev/page/1/entry/2:18/20151012235207:C12B2C18-715D-11E5-A848-EAF6A2A023E1/
- New ZFS hash algorithms
- Other ZFS bugfixes
- Fix in link aggregations (aggrs) to be more reliable in the face of downed links (illumos 6274)
- strerror_l() for localized strerror(). (Translations welcome.)
- Updated hardware data (prelude to r151016).
- Assorted SMB/CIFS fixes from Nexenta.
- Slight increase in the number of concurrent BEs GRUB can cope with (up from 40 to ~55, but officially we still suggest you keep it at 40 or less).
- useradd/del/mod is now ZFS aware, fix from OpenIndiana.
- EOL of cachefs (I hope nobody is still using it...).
- New uuidgen(1) command.
- prtdiag(1M) improvements dealing with hardware in slots.
- SMBIOS 3.0 support, from Joyent.
- NVMe support (pardon the delay here, wanted it in '014 first).

Please give this one a spin folks -- it's essentially an r151016 preview.

Dan

From danmcd at omniti.com Wed Oct 14 14:51:50 2015 From: danmcd at omniti.com (Dan McDonald) Date: Wed, 14 Oct 2015 10:51:50 -0400 Subject: [OmniOS-discuss] EOSL for r151012 COMING VERY SOON Message-ID: <36615D8E-D774-4A0C-A1DB-7B0590BB5616@omniti.com>
I should have an updated one early tomorrow with libserf enabled to fix the http/https version. -Thanks, Colin On Wed, Oct 14, 2015 at 12:32 AM, Keith Paskett wrote: > subversion at 1.8.10-0.151014 is fine. > > > On Oct 13, 2015, at 10:07 PM, Keith Paskett wrote: > > > > After installing the following subversion package, I get an error > accessing any subversion repository via http(s) protocols: > > > > PACKAGE PUBLISHER > > pkg:/omniti/developer/versioning/subversion at 1.9.2-0.151014 ms.omniti.com > > > > The error I get is svn: E170000: Unrecognized URL scheme for ?https:// > ?' > > > > Online search suggests that it was compiled without the ?with-ssl and/or > ?with-neon switches. > > > > Subversion worked fine on a r151014 system I set up a couple of months > ago. > > > > Keith Paskett > > KLP-Systems > > > > > > > > > > > > _______________________________________________ > > OmniOS-discuss mailing list > > OmniOS-discuss at lists.omniti.com > > http://lists.omniti.com/mailman/listinfo/omnios-discuss > > _______________________________________________ > OmniOS-discuss mailing list > OmniOS-discuss at lists.omniti.com > http://lists.omniti.com/mailman/listinfo/omnios-discuss > -------------- next part -------------- An HTML attachment was scrubbed... URL: From rt at steait.net Wed Oct 14 23:41:13 2015 From: rt at steait.net (Rune Tipsmark) Date: Wed, 14 Oct 2015 23:41:13 +0000 Subject: [OmniOS-discuss] ZIL TXG commits happen very frequently - why? In-Reply-To: References: , Message-ID: <1444866067065.866@steait.net> Thanks, that was helpful reading although I'm not 100% sure where to start. I did a bit of testing with the scripts and running IOmeter on a VM residing on a vSphere host connected to my ZFS box with 8Gbit Fibre channel. I noticed that txg commits rarely took over 1 second. root at zfs10:/tmp# dtrace -s duration.d pool01 dtrace: script 'duration.d' matched 2 probes CPU ID FUNCTION:NAME 7 17407 txg_sync_thread:txg-synced sync took 0.68 seconds 8 17407 txg_sync_thread:txg-synced sync took 0.71 seconds 12 17407 txg_sync_thread:txg-synced sync took 0.52 seconds 7 17407 txg_sync_thread:txg-synced sync took 0.29 seconds 22 17407 txg_sync_thread:txg-synced sync took 0.64 seconds 1 17407 txg_sync_thread:txg-synced sync took 0.34 seconds 5 17407 txg_sync_thread:txg-synced sync took 0.93 seconds 0 17407 txg_sync_thread:txg-synced sync took 0.46 seconds 9 17407 txg_sync_thread:txg-synced sync took 2.59 seconds 0 17407 txg_sync_thread:txg-synced sync took 0.29 seconds 7 17407 txg_sync_thread:txg-synced sync took 1.31 seconds 8 17407 txg_sync_thread:txg-synced sync took 0.71 seconds 10 17407 txg_sync_thread:txg-synced sync took 0.67 seconds 8 17407 txg_sync_thread:txg-synced sync took 0.29 seconds 12 17407 txg_sync_thread:txg-synced sync took 0.58 seconds 1 17407 txg_sync_thread:txg-synced sync took 0.46 seconds also I noticed that the default allocation of 4GB on the slog device was never used, the peak I saw was just over 1GB, most of the time half that. 
0 17408 txg_sync_thread:txg-syncing 1179MB of 4096MB used 9 17408 txg_sync_thread:txg-syncing 482MB of 4096MB used 20 17408 txg_sync_thread:txg-syncing 686MB of 4096MB used 0 17408 txg_sync_thread:txg-syncing 429MB of 4096MB used 14 17408 txg_sync_thread:txg-syncing 328MB of 4096MB used 10 17408 txg_sync_thread:txg-syncing 374MB of 4096MB used 8 17408 txg_sync_thread:txg-syncing 510MB of 4096MB used 12 17408 txg_sync_thread:txg-syncing 210MB of 4096MB used 1 17408 txg_sync_thread:txg-syncing 268MB of 4096MB used 0 17408 txg_sync_thread:txg-syncing 432MB of 4096MB used 16 17408 txg_sync_thread:txg-syncing 236MB of 4096MB used 18 17408 txg_sync_thread:txg-syncing 341MB of 4096MB used 9 17408 txg_sync_thread:txg-syncing 361MB of 4096MB used 14 17408 txg_sync_thread:txg-syncing 597MB of 4096MB used 10 17408 txg_sync_thread:txg-syncing 357MB of 4096MB used 21 17408 txg_sync_thread:txg-syncing 437MB of 4096MB used 18 17408 txg_sync_thread:txg-syncing 637MB of 4096MB used I did not see any significant write latency, but I did see a high read latency which is odd since it doesn't reflect what I experience on the VM or the vSphere host. root at zfs10:/tmp# dtrace -s rw.d -c 'sleep 60' write value ------------- Distribution ------------- count 8 | 0 16 | 27 32 |@ 11222 64 |@@@@@@@ 106215 128 |@@@@@@@@@@@@@@@@@@@@@@ 327807 256 |@@@@@@ 94605 512 |@ 20467 1024 | 6067 2048 |@ 7968 4096 |@ 10076 8192 | 3380 16384 | 249 32768 | 214 65536 | 219 131072 | 77 262144 | 1 524288 | 0 read value ------------- Distribution ------------- count 4 | 0 8 | 18 16 | 58 32 | 174 64 |@@@@@@ 4322 128 |@@@@@@@@@ 6278 256 |@@@ 2545 512 |@ 892 1024 |@ 1074 2048 |@@ 1171 4096 |@@@ 2222 8192 |@@@@@@ 4103 16384 |@@@ 2400 32768 |@@ 1401 65536 |@@ 1504 131072 |@ 897 262144 |@ 427 524288 | 39 1048576 | 1 2097152 | 0 avg latency stddev iops throughput write 496us 3136us 9809/s 450773k/s read 22633us 59917us 492/s 17405k/s I also happen to monitor how busy each disk is and I don't see any significant load there either... here is an example [cid:391d20a6-a7e7-4ec1-850c-2153d4eb4f64] so I'm a bit lost as what to do next, I don't see any stress on the system in terms of writes but I still cannot max out the 8gbit FC... reads however are doing fairly good, getting just over 700MB/sec which is acceptable over 8Gbit FC. Writes tend to be between 350 and 450 MB/sec... they should get up to 700MB/sec as well. Any ideas where to start? br, Rune ________________________________ From: Schweiss, Chip Sent: Wednesday, October 14, 2015 2:44 PM To: Rune Tipsmark Cc: omnios-discuss at lists.omniti.com Subject: Re: [OmniOS-discuss] ZIL TXG commits happen very frequently - why? It all has to do with the write throttle and buffers filling. Here's a great blog post on how it works and how it's tuned: http://dtrace.org/blogs/ahl/2014/02/10/the-openzfs-write-throttle/ http://dtrace.org/blogs/ahl/2014/08/31/openzfs-tuning/ -Chip On Wed, Oct 14, 2015 at 12:45 AM, Rune Tipsmark > wrote: Hi all. Wondering if anyone could shed some light on why my ZFS pool would perform TXG commits up to 5 times per second. It?s set to the default 5 second interval and occasionally it does wait 5 seconds between commits, but only when nearly idle. I?m not sure if this impacts my performance but I would suspect it doesn?t improve it. I force sync on all data. I got 11 mirrors (7200rpm sas disks) two SLOG devices and two L2 ARC devices and a pair of spare disks. Each log device can hold 150GB of data so plenty for 2 TXG commits. The system has 384GB memory. 
Below is a bit of output from zilstat during a near idle time this morning so you wont see 4-5 commits per second, but during load later today it will happen.. root at zfs10:/tmp# ./zilstat.ksh -M -t -p pool01 txg waiting for txg commit... TIME txg N-MB N-MB/s N-Max-Rate B-MB B-MB/s B-Max-Rate ops <=4kB 4-32kB >=32kB 2015 Oct 14 06:21:19 10872771 3 3 0 21 21 2 234 14 19 201 2015 Oct 14 06:21:22 10872772 10 3 3 70 23 24 806 0 84 725 2015 Oct 14 06:21:24 10872773 12 6 5 56 28 26 682 17 107 558 2015 Oct 14 06:21:25 10872774 13 13 2 75 75 14 651 0 10 641 2015 Oct 14 06:21:25 10872775 0 0 0 0 0 0 1 0 0 1 2015 Oct 14 06:21:26 10872776 11 11 6 53 53 29 645 2 136 507 2015 Oct 14 06:21:30 10872777 11 2 4 81 20 32 873 11 60 804 2015 Oct 14 06:21:30 10872778 0 0 0 0 0 0 1 0 1 0 2015 Oct 14 06:21:31 10872779 12 12 11 56 56 52 631 0 8 623 2015 Oct 14 06:21:33 10872780 11 5 4 74 37 27 858 0 44 814 2015 Oct 14 06:21:36 10872781 14 4 6 79 26 30 977 12 82 883 2015 Oct 14 06:21:39 10872782 11 3 4 78 26 25 957 18 55 884 2015 Oct 14 06:21:43 10872783 13 3 4 80 20 24 930 0 135 795 2015 Oct 14 06:21:46 10872784 13 4 4 81 27 29 965 13 95 857 2015 Oct 14 06:21:49 10872785 11 3 6 80 26 41 1077 12 215 850 2015 Oct 14 06:21:53 10872786 9 3 2 67 22 18 870 1 74 796 2015 Oct 14 06:21:56 10872787 12 3 5 72 18 26 909 17 163 729 2015 Oct 14 06:21:58 10872788 12 6 3 53 26 21 530 0 33 497 2015 Oct 14 06:21:59 10872789 26 26 24 72 72 62 882 12 60 810 2015 Oct 14 06:22:02 10872790 9 3 5 57 19 28 777 0 70 708 2015 Oct 14 06:22:07 10872791 11 2 3 96 24 22 1044 12 46 986 2015 Oct 14 06:22:10 10872792 13 3 4 78 19 22 911 12 38 862 2015 Oct 14 06:22:14 10872793 11 2 4 79 19 26 930 10 94 826 2015 Oct 14 06:22:17 10872794 11 3 5 73 24 26 1054 17 151 886 2015 Oct 14 06:22:17 10872795 0 0 0 0 0 0 2 0 0 2 2015 Oct 14 06:22:18 10872796 40 40 38 78 78 60 707 0 28 680 2015 Oct 14 06:22:22 10872797 10 3 3 66 22 21 937 14 164 759 2015 Oct 14 06:22:25 10872798 9 2 2 66 16 21 821 11 92 718 2015 Oct 14 06:22:28 10872799 24 12 14 80 40 43 750 0 23 727 2015 Oct 14 06:22:28 10872800 0 0 0 0 0 0 2 0 0 2 2015 Oct 14 06:22:29 10872801 15 7 9 49 24 24 526 11 25 490 2015 Oct 14 06:22:33 10872802 10 2 3 79 19 24 939 0 63 876 2015 Oct 14 06:22:36 10872803 10 5 3 59 29 18 756 11 65 682 2015 Oct 14 06:22:36 10872804 0 0 0 0 0 0 0 0 0 0 2015 Oct 14 06:22:36 10872805 13 13 2 58 58 9 500 0 29 471 -- root at zfs10:/tmp# zpool status pool01 pool: pool01 state: ONLINE scan: scrub repaired 0 in 7h53m with 0 errors on Sat Oct 3 06:53:43 2015 config: NAME STATE READ WRITE CKSUM pool01 ONLINE 0 0 0 mirror-0 ONLINE 0 0 0 c4t5000C50055FC9533d0 ONLINE 0 0 0 c4t5000C50055FE6A63d0 ONLINE 0 0 0 mirror-1 ONLINE 0 0 0 c4t5000C5005708296Fd0 ONLINE 0 0 0 c4t5000C5005708351Bd0 ONLINE 0 0 0 mirror-2 ONLINE 0 0 0 c4t5000C500570858EFd0 ONLINE 0 0 0 c4t5000C50057085A6Bd0 ONLINE 0 0 0 mirror-3 ONLINE 0 0 0 c4t5000C50057086307d0 ONLINE 0 0 0 c4t5000C50057086B67d0 ONLINE 0 0 0 mirror-4 ONLINE 0 0 0 c4t5000C500570870D3d0 ONLINE 0 0 0 c4t5000C50057089753d0 ONLINE 0 0 0 mirror-5 ONLINE 0 0 0 c4t5000C500625B7EA7d0 ONLINE 0 0 0 c4t5000C500625B8137d0 ONLINE 0 0 0 mirror-6 ONLINE 0 0 0 c4t5000C500625B8427d0 ONLINE 0 0 0 c4t5000C500625B86E3d0 ONLINE 0 0 0 mirror-7 ONLINE 0 0 0 c4t5000C500625B886Fd0 ONLINE 0 0 0 c4t5000C500625BB773d0 ONLINE 0 0 0 mirror-8 ONLINE 0 0 0 c4t5000C500625BC2C3d0 ONLINE 0 0 0 c4t5000C500625BD3EBd0 ONLINE 0 0 0 mirror-9 ONLINE 0 0 0 c4t5000C50062878C0Bd0 ONLINE 0 0 0 c4t5000C50062878C43d0 ONLINE 0 0 0 mirror-10 ONLINE 0 0 0 c4t5000C50062879687d0 ONLINE 0 0 0 
            c4t5000C50062879707d0  ONLINE       0     0     0
        logs
          c11d0                    ONLINE       0     0     0
          c10d0                    ONLINE       0     0     0
        cache
          c14d0                    ONLINE       0     0     0
          c15d0                    ONLINE       0     0     0
        spares
          c4t5000C50062879723d0    AVAIL
          c4t5000C50062879787d0    AVAIL

errors: No known data errors

Br,
Rune

[Attachment scrubbed: pastedImage.png (image/png, 51990 bytes) -- the per-disk
busy graph referenced above as cid:391d20a6-a7e7-4ec1-850c-2153d4eb4f64]

From danmcd at omniti.com  Wed Oct 14 23:45:19 2015
From: danmcd at omniti.com (Dan McDonald)
Date: Wed, 14 Oct 2015 19:45:19 -0400
Subject: [OmniOS-discuss] New metapackage --> illumos-tools
Message-ID: <7D40F53B-0F1C-4147-9E64-0C29E649EDCF@omniti.com>

IF you want to build illumos-omnios or illumos-gate on an OmniOS r151014 or
later installation, you can do so now simply by uttering:

    pkg install developer/illumos-tools

illumos-tools is a metapackage that brings in all of the required packages one
needs to build illumos-gate or illumos-omnios. You can then just download the
closed binaries, git clone your favorite illumos-gate child, construct a .env
file and get going.

I wanted to have this be a feature of r151016, but it was easy enough that I
backported it.

Sorry I didn't have something like this sooner. I've also updated the How to
Build illumos page on the illumos wiki.

Happy installing!
Dan

From ryan at zinascii.com  Thu Oct 15 00:59:58 2015
From: ryan at zinascii.com (Ryan Zezeski)
Date: Wed, 14 Oct 2015 20:59:58 -0400
Subject: [OmniOS-discuss] New metapackage --> illumos-tools
In-Reply-To: <7D40F53B-0F1C-4147-9E64-0C29E649EDCF@omniti.com>
References: <7D40F53B-0F1C-4147-9E64-0C29E649EDCF@omniti.com>
Message-ID:

Dan McDonald writes:

> IF you want to build illumos-omnios or illumos-gate on an OmniOS r151014 or
> later installation, you can do so now simply by uttering:
>
>     pkg install developer/illumos-tools
>
> illumos-tools is a metapackage that brings in all of the required packages
> one needs to build illumos-gate or illumos-omnios. You can then just
> download the closed binaries, git clone your favorite illumos-gate child,
> construct a .env file and get going.
>
> I wanted to have this be a feature of r151016, but it was easy enough that
> I backported it.
>
> Sorry I didn't have something like this sooner. I've also updated the How
> to Build illumos page on the illumos wiki.
>
> Happy installing!
> Dan

Thank you Dan! Changes like this may seem small but they make all the
difference to a beginner.

-Z

From danmcd at omniti.com  Thu Oct 15 01:23:03 2015
From: danmcd at omniti.com (Dan McDonald)
Date: Wed, 14 Oct 2015 21:23:03 -0400
Subject: [OmniOS-discuss] New metapackage --> illumos-tools
In-Reply-To:
References: <7D40F53B-0F1C-4147-9E64-0C29E649EDCF@omniti.com>
Message-ID:

> On Oct 14, 2015, at 8:59 PM, Ryan Zezeski wrote:
>
> Thank you Dan! Changes like this may seem small but they make all the
> difference to a beginner.

I wanted to have this done before FOSDEM, on the off chance I get to go.
There's one other thing in illumos-gate itself I think I can do to help the
newbie (merge bldenv and ws, probably implemented by adding goodies to bldenv,
and making ws a wrapper to said goodies), but any little bit will help.

Dan
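For a newcomer, the flow Dan describes condenses to something like the sketch
below. The repo URL and env-file steps come from the illumos build
documentation of this era rather than from Dan's mail, and the closed-binaries
download is left as a comment since its location varies; treat this as an
outline, not a recipe:

    pkg install developer/illumos-tools
    git clone https://github.com/illumos/illumos-gate.git  # or your favorite child
    cd illumos-gate
    # ...download and unpack the closed binaries per the wiki page...
    cp usr/src/tools/env/illumos.sh .   # the sample .env file; edit GATE,
    vi illumos.sh                       # CODEMGR_WS, NIGHTLY_OPTIONS, etc.
    ./usr/src/tools/scripts/nightly illumos.sh   # 'nightly.sh' on older gates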
From paladinemishakal at gmail.com  Thu Oct 15 07:49:48 2015
From: paladinemishakal at gmail.com (Lawrence Giam)
Date: Thu, 15 Oct 2015 15:49:48 +0800
Subject: [OmniOS-discuss] HP Proliant Gen9 server
Message-ID:

Hi All,

I am looking at getting an HP Proliant Gen9 server and running OmniOS on it.
Does anyone have any experience with this generation of server? Is the RAID
controller (either the B140i or the H240) supported by illumos? I did a search
and cannot find any result in the hardware compatibility list.

Thanks & Regards.

From danmcd at omniti.com  Thu Oct 15 11:25:18 2015
From: danmcd at omniti.com (Dan McDonald)
Date: Thu, 15 Oct 2015 07:25:18 -0400
Subject: [OmniOS-discuss] HP Proliant Gen9 server
In-Reply-To:
References:
Message-ID:

> On Oct 15, 2015, at 3:49 AM, Lawrence Giam wrote:
>
> Hi All,
>
> I am looking at getting an HP Proliant Gen9 server and running OmniOS on
> it. Does anyone have any experience with this generation of server? Is the
> RAID controller (either the B140i or the H240) supported by illumos? I did
> a search and cannot find any result in the hardware compatibility list.

There have been updates to cpqary3 for more modern HP Proliant HW. If you know
the PCI IDs, that'd be MOST helpful. I suspect the answer is "yes", but I
don't have the requisite experience.

Dan

From keith at paskett.org  Thu Oct 15 19:29:23 2015
From: keith at paskett.org (Keith Paskett)
Date: Thu, 15 Oct 2015 13:29:23 -0600
Subject: [OmniOS-discuss] HP Proliant Gen9 server
In-Reply-To:
References:
Message-ID: <647A48B5-2A1E-40AD-9B22-0BDD179EC098@paskett.org>

Proceed with caution.

A couple of months ago we tried with r151014 and never could get OmniOS to
recognize the drives. We tried a couple of different HBAs/array controllers.
The Gen8 systems have been great, but they are getting harder to find new.

Keith

> On Oct 15, 2015, at 5:25 AM, Dan McDonald wrote:
>
>> On Oct 15, 2015, at 3:49 AM, Lawrence Giam wrote:
>>
>> Hi All,
>>
>> I am looking at getting an HP Proliant Gen9 server and running OmniOS on
>> it. Does anyone have any experience with this generation of server? Is
>> the RAID controller (either the B140i or the H240) supported by illumos?
>> I did a search and cannot find any result in the hardware compatibility
>> list.
>
> There have been updates to cpqary3 for more modern HP Proliant HW. If you
> know the PCI IDs, that'd be MOST helpful. I suspect the answer is "yes",
> but I don't have the requisite experience.
>
> Dan

From danmcd at omniti.com  Thu Oct 15 19:37:05 2015
From: danmcd at omniti.com (Dan McDonald)
Date: Thu, 15 Oct 2015 15:37:05 -0400
Subject: [OmniOS-discuss] HP Proliant Gen9 server
In-Reply-To: <647A48B5-2A1E-40AD-9B22-0BDD179EC098@paskett.org>
References: <647A48B5-2A1E-40AD-9B22-0BDD179EC098@paskett.org>
Message-ID:

> On Oct 15, 2015, at 3:29 PM, Keith Paskett wrote:
>
> Proceed with caution.
> A couple of months ago we tried with r151014 and never could get OmniOS to
> recognize the drives. We tried a couple of different HBAs/array
> controllers.
> The Gen8 systems have been great, but they are getting harder to find new.

Did you try after this commit got backported into r151014?
commit d08e0e5199f47566c90482b1ef4f31ec3798228b
Author:     Robert Mustacchi
AuthorDate: Tue Aug 11 14:53:49 2015 -0700
Commit:     Dan McDonald
CommitDate: Tue Aug 18 11:39:33 2015 -0400

    6113 cpqary3: add support for hp gen9 smart array controllers
    Reviewed by: Garrett D'Amore
    Reviewed by: Igor Kozhukhov
    Reviewed by: Richard Lowe
    Approved by: Dan McDonald

Note the commit date of 18 August vs. when you tried. I can't recall if you
tried before or after. The most recent r151014 ISO should have that commit on
there.

Dan

From keith at paskett.org  Thu Oct 15 20:18:08 2015
From: keith at paskett.org (Keith Paskett)
Date: Thu, 15 Oct 2015 14:18:08 -0600
Subject: [OmniOS-discuss] HP Proliant Gen9 server
In-Reply-To:
References: <647A48B5-2A1E-40AD-9B22-0BDD179EC098@paskett.org>
Message-ID: <2AB16D7D-E4EC-4E4C-A4DD-8B23320ECEBA@paskett.org>

> On Oct 15, 2015, at 1:37 PM, Dan McDonald wrote:
>
>> On Oct 15, 2015, at 3:29 PM, Keith Paskett wrote:
>>
>> Proceed with caution.
>> A couple of months ago we tried with r151014 and never could get OmniOS
>> to recognize the drives. We tried a couple of different HBAs/array
>> controllers.
>> The Gen8 systems have been great, but they are getting harder to find
>> new.
>
> Did you try after this commit got backported into r151014?
>
> commit d08e0e5199f47566c90482b1ef4f31ec3798228b
> Author:     Robert Mustacchi
> AuthorDate: Tue Aug 11 14:53:49 2015 -0700
> Commit:     Dan McDonald
> CommitDate: Tue Aug 18 11:39:33 2015 -0400
>
>     6113 cpqary3: add support for hp gen9 smart array controllers
>     Reviewed by: Garrett D'Amore
>     Reviewed by: Igor Kozhukhov
>     Reviewed by: Richard Lowe
>     Approved by: Dan McDonald
>
> Note the commit date of 18 August vs. when you tried. I can't recall if
> you tried before or after. The most recent r151014 ISO should have that
> commit on there.
>
> Dan

We had already returned our gen9 servers by 18 August, so we never tried with
that patch. How quickly knowledge becomes obsolete. At least in this case,
it's bad news that is no longer true.

Keith

From henson at acm.org  Fri Oct 16 01:59:00 2015
From: henson at acm.org (Paul B. Henson)
Date: Thu, 15 Oct 2015 18:59:00 -0700
Subject: [OmniOS-discuss] HP Proliant Gen9 server
In-Reply-To:
References:
Message-ID: <20151016015900.GQ3405@bender.unx.cpp.edu>

On Thu, Oct 15, 2015 at 07:25:18AM -0400, Dan McDonald wrote:
> There have been updates to cpqary3 for more modern HP Proliant HW. If
> you know the PCI IDs, that'd be MOST helpful. I suspect the answer is
> "yes", but I don't have the requisite experience.

We've got some gen9 gear running linux; the H240 card shows up as:

0a:00.0 RAID bus controller [0104]: Hewlett-Packard Company Smart Array Gen9 Controllers [103c:3239] (rev 01)

I believe 103c:3239 is the PCI ID for the card. I can provide any other
hardware details on request (we've got a handful of DL160 units and a DL360
unit), but unfortunately at this time don't have a box I could actually boot
omnios on.

We're actually planning on buying some HP gear for basic zfs cifs/nfs storage,
but ideally will get it with just a SAS HBA rather than the HP raid card. I
know HP partners with Nexenta, so if you can find the right sales rep they
should be able to spec out illumos-friendly gear.
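Given that ID, a quick check on an existing OmniOS box will tell you whether
the shipped cpqary3 already claims it; a sketch using the standard illumos
pciVVVV,DDDD alias convention (the ID below is the one Paul reported, and
forcing an attach is strictly an at-your-own-risk experiment):

    # is the ID already among cpqary3's declared aliases?
    grep cpqary3 /etc/driver_aliases | grep -i '103c,3239'

    # if not, the classic experiment is to add the binding by hand:
    update_drv -a -i '"pci103c,3239"' cpqary3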
From trey at mailchimp.com  Fri Oct 16 04:38:49 2015
From: trey at mailchimp.com (Trey Palmer)
Date: Fri, 16 Oct 2015 00:38:49 -0400
Subject: [OmniOS-discuss] HP Proliant Gen9 server
In-Reply-To:
References:
Message-ID:

B140i is standard AHCI with a fakeraid mode per HP's docs, so it probably
works okay. B120i works fine, and it looks like the B140i is the same thing
but supports more disks.

We also run OmniOS on DL380 Gen8's, but with LSI 2x08 cards and Intel X520's
(ixgbe) instead of the onboard controllers. That setup works well. I have a
Gen9 with an LSI card racked in to test but haven't gotten to it yet.

One really nice thing about the DL380 Gen9 is that you can get a 24xSFF
version with no SAS expanders.

Somewhat relevant: I tested a 60-disk 4.3U SL4540 Gen8 earlier this year,
using the mezzanine H220 (which is an LSI mpt_sas chipset; the PCIe version
looks like a bog-standard 920x-8i). The box takes two mezzanine cards, which
reminded me a little of Sparc 20 SBus cards. There's a PCIe riser available
for the upper socket, but you can't use it for an HBA because the only
connection to the disks is through the mezzanine sockets.

The mezzanine H220 was the most pathological mpt_sas card I've ever
encountered. The disks would just disconnect completely. It could be the weird
connection or the SAS expanders or anything else (disks were He8 SAS). It just
felt risky, and we punted on the non-standard hardware after trying several
firmware versions even though we loved the form factor.

The "onboard 10GbE" turns out to be Mellanox ConnectX-3, with one QSFP and one
SFP.
It has two SFF AHCI SATA system drives (B120i) per server module. You can get
it with one or several nodes in the chassis: 1x60, 2x25 or 3x15. It seems
purpose-built for the HDFS/Ceph/GlusterFS communities.

-- Trey

On Thu, Oct 15, 2015 at 3:49 AM, Lawrence Giam wrote:
> Hi All,
>
> I am looking at getting an HP Proliant Gen9 server and running OmniOS on
> it. Does anyone have any experience with this generation of server? Is the
> RAID controller (either the B140i or the H240) supported by illumos? I did
> a search and cannot find any result in the hardware compatibility list.
>
> Thanks & Regards.

From johan.kragsterman at capvert.se  Fri Oct 16 12:48:06 2015
From: johan.kragsterman at capvert.se (Johan Kragsterman)
Date: Fri, 16 Oct 2015 14:48:06 +0200
Subject: [OmniOS-discuss] was: HP Proliant Gen9 server; now: Mezzanine H220 problems SL4540
In-Reply-To:
References:
Message-ID:

Hi!

-----"OmniOS-discuss" wrote: -----
To: Lawrence Giam
From: Trey Palmer
Sent by: "OmniOS-discuss"
Date: 2015-10-16 06:47
Cc: omnios-discuss
Subject: Re: [OmniOS-discuss] HP Proliant Gen9 server

Somewhat relevant: I tested a 60-disk 4.3U SL4540 Gen8 earlier this year,
using the mezzanine H220 (which is an LSI mpt_sas chipset; the PCIe version
looks like a bog-standard 920x-8i). The box takes two mezzanine cards, which
reminded me a little of Sparc 20 SBus cards. There's a PCIe riser available
for the upper socket, but you can't use it for an HBA because the only
connection to the disks is through the mezzanine sockets.

The mezzanine H220 was the most pathological mpt_sas card I've ever
encountered. The disks would just disconnect completely. It could be the weird
connection or the SAS expanders or anything else (disks were He8 SAS). It just
felt risky, and we punted on the non-standard hardware after trying several
firmware versions even though we loved the form factor.

The "onboard 10GbE" turns out to be Mellanox ConnectX-3, with one QSFP and one
SFP. It has two SFF AHCI SATA system drives (B120i) per server module. You can
get it with one or several nodes in the chassis: 1x60, 2x25 or 3x15. It seems
purpose-built for the HDFS/Ceph/GlusterFS communities.

-- Trey

Just want to comment on the Mezzanine H220 problems on the SL4540: could be a
PCI bridge problem. Perhaps there is a bridge between the PCIe bus and the
mezzanine card slot, and that one could definitely mess it up...

Rgrds Johan

From colin at omniti.com  Fri Oct 16 15:06:42 2015
From: colin at omniti.com (Colin Roche-Dutch)
Date: Fri, 16 Oct 2015 11:06:42 -0400
Subject: [OmniOS-discuss] subversion intentionally compiled without http(s) support?
In-Reply-To:
References: <3B1BD7DE-7D7B-471F-A5DE-B402E145B65D@paskett.org>
Message-ID:

Keith,

The updated pkg for subversion 1.9.2 is out with libserf for http/https
support.

-Thanks,
Colin

On Wed, Oct 14, 2015 at 7:24 PM, Colin Roche-Dutch wrote:
> Hi Keith,
>
> A bad package slipped into the ms.omniti.com repo. I should have an
> updated one early tomorrow with libserf enabled to fix the http/https
> version.
>
> -Thanks,
> Colin
>
> On Wed, Oct 14, 2015 at 12:32 AM, Keith Paskett wrote:
>
>> subversion at 1.8.10-0.151014 is fine.
>>
>> > On Oct 13, 2015, at 10:07 PM, Keith Paskett wrote:
>> >
>> > After installing the following subversion package, I get an error
>> accessing any subversion repository via http(s) protocols:
>> >
>> > PACKAGE PUBLISHER
>> > pkg:/omniti/developer/versioning/subversion at 1.9.2-0.151014
>> ms.omniti.com
>> >
>> > The error I get is svn: E170000: Unrecognized URL scheme for 'https://...'
>> >
>> > Online search suggests that it was compiled without the --with-ssl
>> and/or --with-neon switches.
>> >
>> > Subversion worked fine on an r151014 system I set up a couple of months
>> ago.
>> >
>> > Keith Paskett
>> > KLP-Systems

From kai at meder.info  Sat Oct 17 14:51:02 2015
From: kai at meder.info (Kai)
Date: Sat, 17 Oct 2015 14:51:02 +0000 (UTC)
Subject: [OmniOS-discuss] Upgrade from r151006 to r151014
Message-ID:

Hello,

I'm currently on OmniOS v11 r151006 and want to upgrade to the latest LTS and
stay there for a while.

I already did

    $ pkg refresh --full
    $ pkg unfreeze \
        entire \
        consolidation/osnet/osnet-incorporation \
        incorporation/jeos/illumos-gate \
        incorporation/jeos/omnios-userland

(although I can't remember having frozen anything in the past)

    $ pkg update -nv
    : No updates available for this image.

Doing an explicit

    $ pkg update -nv --be-name=omnios-r151014 entire at 11,5.11-0.151014
    : pkg update: 'entire at 11,5.11-0.151008' matches no installed packages

What can I do now?
Thank you very much,
Kai

From jdg117 at elvis.arl.psu.edu  Sat Oct 17 15:21:20 2015
From: jdg117 at elvis.arl.psu.edu (John D Groenveld)
Date: Sat, 17 Oct 2015 11:21:20 -0400
Subject: [OmniOS-discuss] Upgrade from r151006 to r151014
In-Reply-To: Your message of "Sat, 17 Oct 2015 14:51:02 -0000."
References:
Message-ID: <201510171521.t9HFLKm2006012@elvis.arl.psu.edu>

In message , Kai writes:
>I'm currently on OmniOS v11 r151006 and want to upgrade to the latest LTS
>and stay there for a while.

Did you reset your publisher?

John
groenveld at acm.org

From danmcd at omniti.com  Sat Oct 17 16:11:34 2015
From: danmcd at omniti.com (Dan McDonald)
Date: Sat, 17 Oct 2015 12:11:34 -0400
Subject: [OmniOS-discuss] Upgrade from r151006 to r151014
In-Reply-To:
References:
Message-ID: <15DBC5B2-D7E2-461A-800F-EE097646A03B@omniti.com>

We documented this pretty well, I think.

http://omnios.omniti.com/wiki.php/Upgrade_to_r151014

Dan

Sent from my iPhone (typos, autocorrect, and all)

> On Oct 17, 2015, at 10:51 AM, Kai wrote:
>
> Hello,
>
> I'm currently on OmniOS v11 r151006 and want to upgrade to the latest LTS
> and stay there for a while.
>
> I already did
> $ pkg refresh --full
> $ pkg unfreeze \
>     entire \
>     consolidation/osnet/osnet-incorporation \
>     incorporation/jeos/illumos-gate \
>     incorporation/jeos/omnios-userland
> (although I can't remember having frozen anything in the past)
>
> $ pkg update -nv
> : No updates available for this image.
>
> Doing an explicit
> $ pkg update -nv --be-name=omnios-r151014 entire at 11,5.11-0.151014
> : pkg update: 'entire at 11,5.11-0.151008' matches no installed packages
>
> What can I do now?
>
> Thank you very much,
> Kai

From richard at netbsd.org  Tue Oct 20 10:42:50 2015
From: richard at netbsd.org (Richard PALO)
Date: Tue, 20 Oct 2015 12:42:50 +0200
Subject: [OmniOS-discuss] omnios-build perl
Message-ID:

thought I'd try to rebuild perl (for grins) on bloody

> richard at omnis:/home/richard/src/omnios-build/build/perl$ tail -30 build.log
> Build Perl for SOCKS? [n]
> Try to use long doubles if available? [n]
> Checking for optional libraries...
> What libraries to use? [-lsocket -lnsl -lgdbm -ldl -lm -lpthread -lc]
> What optimizer/debugger flag should be used? [-O3]
> Any additional cc flags?
> [-D_REENTRANT -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64 -D_TS_ERRNO -DPTR_IS_LONG -fno-strict-aliasing -pipe -fstack-protector -I/opt/local/include]
> Let me guess what the preprocessor flags are...
> Any additional ld flags (NOT including libraries)?
> [ -fstack-protector -L/opt/local/lib -L/usr/gnu/lib]
> Checking your choice of C compiler and flags for coherency...
> Configure: line 5358: 415512 Killed $sh -c "$run ./try " >> try.msg 2>&1
> I've tried to compile and run the following simple program:
>
> #include <stdio.h>
> int main() { printf("Ok\n"); return(0); }
>
> I used the command:
>
> gcc -o try -O3 -D_REENTRANT -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64 -D_TS_ERRNO -DPTR_IS_LONG -fno-strict-aliasing -pipe -fstack-protector -I/opt/local/include -fstack-protector -L/opt/local/lib -L/usr/gnu/lib try.c -lsocket -lnsl -lgdbm -ldl -lm -lpthread -lc
> ./try
>
> and I got the following output:
>
> ld.so.1: try: fatal: libgdbm.so.4: open failed: No such file or directory
> The program compiled OK, but exited with status 137.
> You have a problem. Shall I abort Configure [y]
> Ok. Stopping Configure.
> --- Configure failed
> ===== Build aborted =====

quésaco? I don't believe the gate nor omnios provides gdbm...
-- Richard PALO

From sequoiamobil at gmx.net  Tue Oct 20 11:47:47 2015
From: sequoiamobil at gmx.net (Sebastian Gabler)
Date: Tue, 20 Oct 2015 13:47:47 +0200
Subject: [OmniOS-discuss] HP Proliant Gen9 server
In-Reply-To:
References:
Message-ID: <562629E3.9060004@gmx.net>

On 20.10.2015 12:43, omnios-discuss-request at lists.omniti.com wrote:
> Message: 1
> Date: Fri, 16 Oct 2015 00:38:49 -0400
> From: Trey Palmer
> To: Lawrence Giam
> Cc: omnios-discuss
> Subject: Re: [OmniOS-discuss] HP Proliant Gen9 server
> Message-ID:
> Content-Type: text/plain; charset="utf-8"

> One really nice thing about the DL380 Gen9 is that you can get a 24xSFF
> version with no SAS expanders.

I would be interested in how that would work using a single HP-branded HBA or
RAID controller. It may work using an H240ar and an H240 (PCIe) together
(two-port controllers each), but I am not sure that is a desirable
configuration. I'd rather go for the expander card option, or for external
JBODs entirely.

My expectation would be that the expander card would not have problems with
HP-branded drives, at least. Aftermarket drives are problematic anyhow in the
context of the HP boxes.

The B120i is no longer available with the G9 servers, BTW.

sebastian

From jdg117 at elvis.arl.psu.edu  Tue Oct 20 12:32:23 2015
From: jdg117 at elvis.arl.psu.edu (John D Groenveld)
Date: Tue, 20 Oct 2015 08:32:23 -0400
Subject: [OmniOS-discuss] omnios-build perl
In-Reply-To: Your message of "Tue, 20 Oct 2015 12:42:50 +0200."
References:
Message-ID: <201510201232.t9KCWNcY007546@elvis.arl.psu.edu>

In message , Richard PALO writes:
>thought I'd try to rebuild perl (for grins) on bloody
>
>> richard at omnis:/home/richard/src/omnios-build/build/perl$ tail -30 build.log
>> Build Perl for SOCKS? [n]
>> Try to use long doubles if available? [n]
>> Checking for optional libraries...
>> What libraries to use? [-lsocket -lnsl -lgdbm -ldl -lm -lpthread -lc]

Where's that -lgdbm come from?
Perl's not dependent on libgdbm.
And which gcc are you using?
AFAICT bloody allows you to choose between developer/gcc48 and
developer/gcc51.

John
groenveld at acm.org

From lotheac at iki.fi  Tue Oct 20 12:52:34 2015
From: lotheac at iki.fi (Lauri Tirkkonen)
Date: Tue, 20 Oct 2015 15:52:34 +0300
Subject: [OmniOS-discuss] omnios-build perl
In-Reply-To:
References:
Message-ID: <20151020125234.GA29305@gutsman.lotheac.fi>

On Tue, Oct 20 2015 12:42:50 +0200, Richard PALO wrote:
> thought I'd try to rebuild perl (for grins) on bloody
>
> > richard at omnis:/home/richard/src/omnios-build/build/perl$ tail -30 build.log
> > Build Perl for SOCKS? [n]
> > Try to use long doubles if available? [n]
> > Checking for optional libraries...
> > What libraries to use? [-lsocket -lnsl -lgdbm -ldl -lm -lpthread -lc]

Doesn't happen on my bloody box. Perhaps you have gdbm.h available somewhere
and perl picks it up? My perl build.log says:

~/omnios-build/build/perl % egrep '(gdbm\.h|What libraries)' build.log
What libraries to use? [-lsocket -lnsl -ldl -lm -lpthread -lc]
 NOT found.
What libraries to use? [-lsocket -lnsl -ldl -lm -lpthread -lc]
 NOT found.
-- Lauri Tirkkonen | lotheac @ IRCnet

From richard at netbsd.org  Tue Oct 20 13:01:21 2015
From: richard at netbsd.org (Richard PALO)
Date: Tue, 20 Oct 2015 15:01:21 +0200
Subject: [OmniOS-discuss] omnios-build perl
In-Reply-To: <201510201232.t9KCWNcY007546@elvis.arl.psu.edu>
References: <201510201232.t9KCWNcY007546@elvis.arl.psu.edu>
Message-ID: <56263B21.6030707@netbsd.org>

On 20/10/15 14:32, John D Groenveld wrote:
> In message , Richard PALO writes:
>> thought I'd try to rebuild perl (for grins) on bloody
>>
>>> richard at omnis:/home/richard/src/omnios-build/build/perl$ tail -30 build.log
>>> Build Perl for SOCKS? [n]
>>> Try to use long doubles if available? [n]
>>> Checking for optional libraries...
>>> What libraries to use? [-lsocket -lnsl -lgdbm -ldl -lm -lpthread -lc]
>
> Where's that -lgdbm come from?
> Perl's not dependent on libgdbm.

From INSTALL, perl seems to try autodetecting, which I presume is buggy here.
Perhaps something needs to be done in Configure or Makefile.SH?

> richard at omnis:/home/richard$ gcc -v
> Using built-in specs.
> COLLECT_GCC=gcc
> COLLECT_LTO_WRAPPER=/opt/gcc-5.1.0/libexec/gcc/i386-pc-solaris2.11/5.1.0/lto-wrapper
> Target: i386-pc-solaris2.11
> Configured with: ./configure --prefix=/opt/gcc-5.1.0 --host i386-pc-solaris2.11 --build i386-pc-solaris2.11 --target i386-pc-solaris2.11 --with-boot-ldflags=-R/opt/gcc-5.1.0/lib --with-gmp=/opt/gcc-5.1.0 --with-mpfr=/opt/gcc-5.1.0 --with-mpc=/opt/gcc-5.1.0 --enable-languages=c,c++,fortran,lto --without-gnu-ld --with-ld=/bin/ld --with-as=/usr/bin/gas --with-gnu-as --with-build-time-tools=/usr/gnu/i386-pc-solaris2.11/bin
> Thread model: posix
> gcc version 5.1.0 (GCC)

-- Richard PALO

From richard at netbsd.org  Tue Oct 20 13:19:10 2015
From: richard at netbsd.org (Richard PALO)
Date: Tue, 20 Oct 2015 15:19:10 +0200
Subject: [OmniOS-discuss] omnios-build perl
In-Reply-To: <20151020125234.GA29305@gutsman.lotheac.fi>
References: <20151020125234.GA29305@gutsman.lotheac.fi>
Message-ID: <56263F4E.9010709@netbsd.org>

On 20/10/15 14:52, Lauri Tirkkonen wrote:
> egrep '(gdbm\.h|What libraries)' build.log

bloody hell, perl is picking up my pkgsrc installation in /opt/local

I already checked my $PATH prior to launching build.sh, but I notice from
INSTALL:

> 1022 Again, this should all happen automatically. This should also work if
> 1023 you have gdbm installed in any of (/usr/local, /opt/local, /usr/gnu,
> 1024 /opt/gnu, /usr/GNU, or /opt/GNU).

I'll patch these out and see.
-- Richard PALO

From trey at mailchimp.com  Tue Oct 20 13:22:24 2015
From: trey at mailchimp.com (Trey Palmer)
Date: Tue, 20 Oct 2015 09:22:24 -0400
Subject: [OmniOS-discuss] HP Proliant Gen9 server
In-Reply-To: <562629E3.9060004@gmx.net>
References: <562629E3.9060004@gmx.net>
Message-ID:

The expectation by HP is to use a SAS expander card if you're using the HP
RAID hardware. I was thinking of the specific case of running ZFS on SATA
SSD's hooked up to mpt_sas HBA's.

For SATA drives on Illumos, SAS expanders should be avoided. Not a bad idea on
any platform, but imperative on Illumos.
-- Trey

On Tuesday, October 20, 2015, Sebastian Gabler wrote:
> On 20.10.2015 12:43, omnios-discuss-request at lists.omniti.com wrote:
>> Message: 1
>> Date: Fri, 16 Oct 2015 00:38:49 -0400
>> From: Trey Palmer
>> To: Lawrence Giam
>> Cc: omnios-discuss
>> Subject: Re: [OmniOS-discuss] HP Proliant Gen9 server
>> Message-ID:
>> <CADRROpUpP+E1XCAcayDWmOT4n9RJqiTabnnwz6G0rOEBNSDtCQ at mail.gmail.com>
>> Content-Type: text/plain; charset="utf-8"
>
>> One really nice thing about the DL380 Gen9 is that you can get a 24xSFF
>> version with no SAS expanders.
>
> I would be interested in how that would work using a single HP-branded HBA
> or RAID controller. It may work using an H240ar and an H240 (PCIe)
> together (two-port controllers each), but I am not sure that is a
> desirable configuration.
> I'd rather go for the expander card option, or for external JBODs entirely.
> My expectation would be that the expander card would not have problems
> with HP-branded drives, at least. Aftermarket drives are problematic
> anyhow in the context of the HP boxes.
> The B120i is no longer available with the G9 servers, BTW.
>
> sebastian

From richard at netbsd.org  Tue Oct 20 14:13:58 2015
From: richard at netbsd.org (Richard PALO)
Date: Tue, 20 Oct 2015 16:13:58 +0200
Subject: [OmniOS-discuss] omnios-build perl
In-Reply-To: <20151020125234.GA29305@gutsman.lotheac.fi>
References: <20151020125234.GA29305@gutsman.lotheac.fi>
Message-ID: <56264C26.3090403@netbsd.org>

On 20/10/15 14:52, Lauri Tirkkonen wrote:
> On Tue, Oct 20 2015 12:42:50 +0200, Richard PALO wrote:
>> thought I'd try to rebuild perl (for grins) on bloody
>>
>>> richard at omnis:/home/richard/src/omnios-build/build/perl$ tail -30 build.log
>>> Build Perl for SOCKS? [n]
>>> Try to use long doubles if available? [n]
>>> Checking for optional libraries...
>>> What libraries to use? [-lsocket -lnsl -lgdbm -ldl -lm -lpthread -lc]
>
> Doesn't happen on my bloody box. Perhaps you have gdbm.h available
> somewhere and perl picks it up? My perl build.log says:
>
> ~/omnios-build/build/perl % egrep '(gdbm\.h|What libraries)' build.log
> What libraries to use? [-lsocket -lnsl -ldl -lm -lpthread -lc]
>  NOT found.
> What libraries to use? [-lsocket -lnsl -ldl -lm -lpthread -lc]
>  NOT found.

I was able to get it building okay by updating gcc-sunld.patch with the
following (note the added -e 's@ gdbm @ @' at the end of the + line):

> @@ -60,7 +60,7 @@ esac
>  # libmalloc.a may allocate memory that is only 4 byte aligned, but
>  # GNU CC on the Sparc assumes that doubles are 8 byte aligned.
>  # Thanks to Hallvard B. Furuseth
> -set `echo " $libswanted " | sed -e 's@ ld @ @' -e 's@ malloc @ @' -e 's@ ucb @ @' -e 's@ sec @ @' -e 's@ crypt @ @'`
> +set `echo " $libswanted " | sed -e 's@ ld @ @' -e 's@ malloc @ @' -e 's@ ucb @ @' -e 's@ sec @ @' -e 's@ crypt @ @' -e 's@ gdbm @ @'`
>  libswanted="$*"
>
>  # Look for architecture name. We want to suggest a useful default.

BTW, I needed to add it to patches/series with the argument -bz.orig to be
useful; this is probably not necessary if always building in a virgin dev
zone.

-- Richard PALO

From al.slater at scluk.com  Wed Oct 21 10:08:55 2015
From: al.slater at scluk.com (Al Slater)
Date: Wed, 21 Oct 2015 11:08:55 +0100
Subject: [OmniOS-discuss] ILB memory leak?
Message-ID: <56276437.2020109@scluk.com>

Hi,

I am running omnios r151014 on a couple of machines with a couple of zones
each. One zone runs apache as an SSL reverse proxy; the other runs ILB for
load balancing web-to-app-tier connections.

I noticed that in the ILB zone, the ilbd process memory grows to about 2GB.
Restarting ILB releases the memory, and then the memory usage gradually
increases again, with each memory increase approximately twice the size of the
previous one. I run a cronjob twice a day (8am and 8pm) which restarts the ilb
service and releases the memory.
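For reference, that workaround amounts to a root crontab entry along these
lines (the FMRI is the stock illumos ILB service; a sketch, not Al's actual
entry):

    # restart ilbd at 08:00 and 20:00 to release the leaked memory
    0 8,20 * * * /usr/sbin/svcadm restart svc:/network/loadbalancer/ilb:default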
-- Richard PALO From al.slater at scluk.com Wed Oct 21 10:08:55 2015 From: al.slater at scluk.com (Al Slater) Date: Wed, 21 Oct 2015 11:08:55 +0100 Subject: [OmniOS-discuss] ILB memory leak? Message-ID: <56276437.2020109@scluk.com> Hi, I am running omnios r151014 on a couple of machines with a couple of zones each. 1 zone runs apache as an SSL reverse proxy, the other runs ILB for load balancing web to app tier connections. I noticed that in the ILB zone, the ilbd process memory grows to about 2Gb. Restarting ILB releases the memory, and then the memory usage gradually increases again, with each memory increase approximately 2 * the size of the previous one. I run a cronjob twice a day ( 8am and 8pm) which restarts the ilb service and releases the memory. A graph of memory usage is available at https://www.dropbox.com/s/zaz51apxslnivlq/ILB_Memory_2_days.png?dl=0 There are currently 62 rules in the load balancer, with a total of 664 server/port pairs. Is there anything I can provide that would help track this down? -- Al Slater From danmcd at omniti.com Wed Oct 21 16:35:33 2015 From: danmcd at omniti.com (Dan McDonald) Date: Wed, 21 Oct 2015 12:35:33 -0400 Subject: [OmniOS-discuss] ILB memory leak? In-Reply-To: <56276437.2020109@scluk.com> References: <56276437.2020109@scluk.com> Message-ID: <00AE5FA3-E699-4C4F-8A94-AEEDAAED0856@omniti.com> > On Oct 21, 2015, at 6:08 AM, Al Slater wrote: > > Hi, > > I am running omnios r151014 on a couple of machines with a couple of zones each. 1 zone runs apache as an SSL reverse proxy, the other runs ILB for load balancing web to app tier connections. > > I noticed that in the ILB zone, the ilbd process memory grows to about 2Gb. Restarting ILB releases the memory, and then the memory usage gradually increases again, with each memory increase approximately 2 * the size of the previous one. I run a cronjob twice a day ( 8am and 8pm) which restarts the ilb service and releases the memory. > > A graph of memory usage is available at https://www.dropbox.com/s/zaz51apxslnivlq/ILB_Memory_2_days.png?dl=0 > > There are currently 62 rules in the load balancer, with a total of 664 server/port pairs. > > Is there anything I can provide that would help track this down? You can use svccfg(1M) to enable user-level memory debugging on ilb. It may cause the ilb daemon to dump core. (And you're just noticing this in the process, not kernel memory consumption, correct?) As root: svcadm disable -t ilb svccfg -s ilb setenv LD_PRELOAD libumem.so svccfg -s ilb setenv UMEM_DEBUG default svccfg -s ilb refresh svcadm enable ilb That should enable user-level memory debugging. If you get a coredump, save it and share it. If you don't and the ilb daemon keeps running, eventually please: gcore `pgrep ilbd` and share THAT corefile. You can also do this by youself: mdb > ::findleaks and share ::findleaks. Once you're done generating corefiles, repeat the steps above, but use "unsetenv LD_PRELOAD" and "unsetenv UMEM_DEBUG" instead of the setenv lines. Hope this helps, Dan From bfriesen at simple.dallas.tx.us Wed Oct 21 19:07:08 2015 From: bfriesen at simple.dallas.tx.us (Bob Friesenhahn) Date: Wed, 21 Oct 2015 14:07:08 -0500 (CDT) Subject: [OmniOS-discuss] ILB memory leak? In-Reply-To: <00AE5FA3-E699-4C4F-8A94-AEEDAAED0856@omniti.com> References: <56276437.2020109@scluk.com> <00AE5FA3-E699-4C4F-8A94-AEEDAAED0856@omniti.com> Message-ID: On Wed, 21 Oct 2015, Dan McDonald wrote: > > You can use svccfg(1M) to enable user-level memory debugging on ilb. 
> It may cause the ilb daemon to dump core. (And you're just noticing this
> in the process, not kernel memory consumption, correct?)
>
> As root:
>
> svcadm disable -t ilb
> svccfg -s ilb setenv LD_PRELOAD libumem.so
> svccfg -s ilb setenv UMEM_DEBUG default
> svccfg -s ilb refresh
> svcadm enable ilb

Is there a way to use ulimit to limit the data segment size (ulimit -d)? If
this is possible, then a dumped core (due to hitting the limit) may point
directly to the guilty party.

Bob
--
Bob Friesenhahn
bfriesen at simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,    http://www.GraphicsMagick.org/

From al.slater at scluk.com  Thu Oct 22 08:43:04 2015
From: al.slater at scluk.com (Al Slater)
Date: Thu, 22 Oct 2015 09:43:04 +0100
Subject: [OmniOS-discuss] ILB memory leak?
In-Reply-To: <00AE5FA3-E699-4C4F-8A94-AEEDAAED0856@omniti.com>
References: <56276437.2020109@scluk.com> <00AE5FA3-E699-4C4F-8A94-AEEDAAED0856@omniti.com>
Message-ID: <5628A198.5040808@scluk.com>

On 21/10/2015 17:35, Dan McDonald wrote:
>
>> On Oct 21, 2015, at 6:08 AM, Al Slater wrote:
>>
>> Hi,
>>
>> I am running omnios r151014 on a couple of machines with a couple
>> of zones each. One zone runs apache as an SSL reverse proxy; the
>> other runs ILB for load balancing web-to-app-tier connections.
>>
>> I noticed that in the ILB zone, the ilbd process memory grows to
>> about 2GB. Restarting ILB releases the memory, and then the
>> memory usage gradually increases again, with each memory increase
>> approximately twice the size of the previous one. I run a cronjob
>> twice a day (8am and 8pm) which restarts the ilb service and
>> releases the memory.
>>
>> A graph of memory usage is available at
>> https://www.dropbox.com/s/zaz51apxslnivlq/ILB_Memory_2_days.png?dl=0
>>
>> There are currently 62 rules in the load balancer, with a total
>> of 664 server/port pairs.
>>
>> Is there anything I can provide that would help track this down?
>
> You can use svccfg(1M) to enable user-level memory debugging on ilb.
> It may cause the ilb daemon to dump core. (And you're just noticing
> this in the process, not kernel memory consumption, correct?)

I am seeing kernel memory consumption increasing as well, but that may be a
different issue. The ilbd process memory is definitely growing.

> As root:
>
> svcadm disable -t ilb
> svccfg -s ilb setenv LD_PRELOAD libumem.so
> svccfg -s ilb setenv UMEM_DEBUG default
> svccfg -s ilb refresh
> svcadm enable ilb
>
> That should enable user-level memory debugging. If you get a
> coredump, save it and share it. If you don't and the ilb daemon
> keeps running, eventually please:
>
> gcore `pgrep ilbd`
>
> and share THAT corefile. You can also do this by yourself:
>
> mdb <corefile>
> > ::findleaks
>
> and share ::findleaks.
>
> Once you're done generating corefiles, repeat the steps above, but
> use "unsetenv LD_PRELOAD" and "unsetenv UMEM_DEBUG" instead of the
> setenv lines.

Thanks Dan. As we are talking about production boxes here, I will have to try
and reproduce on another box, and then I will give the process above a go and
see what we come up with.
--
Al Slater
Technical Director
SCL

Phone : +44 (0)1273 666607
Fax   : +44 (0)1273 666601
email : al.slater at scluk.com

Stanton Consultancy Ltd
Park Gate, 161 Preston Road, Brighton, East Sussex, BN1 6AU
Registered in England  Company number: 1957652  VAT number: GB 760 2433 55

From ryan at zinascii.com  Thu Oct 22 15:13:05 2015
From: ryan at zinascii.com (Ryan Zezeski)
Date: Thu, 22 Oct 2015 11:13:05 -0400
Subject: [OmniOS-discuss] ILB memory leak?
In-Reply-To: <5628A198.5040808@scluk.com>
References: <56276437.2020109@scluk.com> <00AE5FA3-E699-4C4F-8A94-AEEDAAED0856@omniti.com> <5628A198.5040808@scluk.com>
Message-ID:

Al Slater writes:
> On 21/10/2015 17:35, Dan McDonald wrote:
>>
>> That should enable user-level memory debugging. If you get a
>> coredump, save it and share it. If you don't and the ilb daemon
>> keeps running, eventually please:
>>
>> gcore `pgrep ilbd`
>>
>> and share THAT corefile. You can also do this by yourself:
>>
>> mdb <corefile>
>> > ::findleaks
>>
>> and share ::findleaks.
>>
>> Once you're done generating corefiles, repeat the steps above, but
>> use "unsetenv LD_PRELOAD" and "unsetenv UMEM_DEBUG" instead of the
>> setenv lines.
>
> Thanks Dan. As we are talking about production boxes here, I will have
> to try and reproduce on another box, and then I will give the process
> above a go and see what we come up with.

You can also use the DTrace pid provider to grab the user stack on every
malloc(3C) call, and the syscall provider to track mmap(2) calls. That poses
no harm to production and might make the cause of memory usage obvious.
Something like:

    dtrace -qn 'pid$target::malloc:entry { @[ustack()] = count(); }
    syscall::mmap*:entry /pid == $target/ { @[ustack()] = count(); }' -p <pid>

Let that run for a while as the memory grows, then Ctrl-C.

-Z

From jim at cos.ru  Thu Oct 22 16:59:15 2015
From: jim at cos.ru (Jim Klimov)
Date: Thu, 22 Oct 2015 19:59:15 +0300
Subject: [OmniOS-discuss] OmniOS backup box hanging regularly
Message-ID:

Hello all,

I have this HP-Z400 workstation with 16GB ECC (should be) RAM running OmniOS
bloody, which acts as a backup server for our production systems (regularly
rsync'ing large files off Linux boxes, and rotating ZFS auto-snapshots to keep
its space free). Sometimes it also runs replicas of infrastructure (DHCP, DNS)
and was set up as a VirtualBox + phpVirtualBox host to test that out, but no
VMs are running.

So the essential loads are ZFS snapshots and ZFS scrubs :)

And it freezes roughly every week. It stops responding to ping and to attempts
to log in via SSH or the physical console - it processes keypresses on the
latter, but does not present a login prompt. It used to be stable; such
regular hangs began around summertime.

My primary guess would be flaky disks, maybe timing out under load or going to
sleep or whatever... But I have yet to prove it, or any other theory. Maybe
the CPU is just overheating due to regular near-100% load with disk I/O... At
least I want to rule out OS errors and rule out (or point out) operator/box
errors as much as possible - which is something I can change to try and fix ;)

Before I proceed to TL;DR screenshots, I'd overview what I see:

* In the "top" output, processes owned by zfssnap lead most of the time...
But even the SSH shell is noticeably slow to respond (1 sec per line when just
pressing enter to clear the screen to prepare nice screenshots).
* SMART was not enabled on 3TB mirrored "pool" SATA disks (is now, long tests initiated), but was in place on the "rpool" SAS disk where it logged some corrected ECC errors - but none uncorrected. Maybe the cabling should be reseated. * iostat shows disks are generally not busy (they don't audibly rattle nor visibly blink all the time, either) * zpool scrubs return clean * there are partitions of the system rpool disk (10K RPM SAS) used as log and cache devices for the main data pool on 3TB SATA disks. The system disk is fast and underutilized, so what the heck ;) And it was not a problem for the first year of this system's honest and stable workouts. These devices are pretty empty at the moment. I have enabled deadman panics according to Wiki, but none have happened so far: # cat /etc/system | egrep -v '(^\*|^$)' set snooping=1 set pcplusmp:apic_panic_on_nmi=1 set apix:apic_panic_on_nmi = 1 In the "top" output, processes owned by zfssnap lead most of the time: last pid: 22599; load avg: 12.9, 12.2, 11.2; up 0+09:52:11 18:34:41 140 processes: 125 sleeping, 13 running, 2 on cpu CPU states: 0.0% idle, 22.9% user, 77.1% kernel, 0.0% iowait, 0.0% swap Memory: 16G phys mem, 1765M free mem, 2048M total swap, 2048M free swap Seconds to delay: PID USERNAME LWP PRI NICE SIZE RES STATE TIME CPU COMMAND 21389 zfssnap 1 43 2 863M 860M run 5:04 35.61% zfs 22360 zfssnap 1 52 2 118M 115M run 0:37 16.50% zfs 21778 zfssnap 1 52 2 563M 560M run 3:15 13.17% zfs 21278 zfssnap 1 52 2 947M 944M run 5:32 6.91% zfs 21881 zfssnap 1 43 2 433M 431M run 2:31 5.41% zfs 21852 zfssnap 1 52 2 459M 456M run 2:39 5.16% zfs 21266 zfssnap 1 43 2 906M 903M run 5:18 3.95% zfs 21757 zfssnap 1 43 2 597M 594M run 3:26 2.91% zfs 21274 zfssnap 1 52 2 930M 927M cpu/0 5:27 2.78% zfs 22588 zfssnap 1 43 2 30M 27M run 0:08 2.48% zfs 22580 zfssnap 1 52 2 49M 46M run 0:14 0.71% zfs 22038 root 1 59 0 5312K 3816K cpu/1 0:01 0.10% top 22014 root 1 59 0 8020K 4988K sleep 0:00 0.02% sshd Average "iostats" are not that busy: # zpool iostat -Td 5 Thu Oct 22 18:24:59 CEST 2015 capacity operations bandwidth pool alloc free read write read write ---------- ----- ----- ----- ----- ----- ----- pool 2.52T 207G 802 116 28.3M 840K rpool 33.0G 118G 0 4 4.52K 58.7K ---------- ----- ----- ----- ----- ----- ----- Thu Oct 22 18:25:04 CEST 2015 pool 2.52T 207G 0 0 0 0 rpool 33.0G 118G 0 10 0 97.9K ---------- ----- ----- ----- ----- ----- ----- Thu Oct 22 18:25:09 CEST 2015 pool 2.52T 207G 0 0 0 0 rpool 33.0G 118G 0 0 0 0 ---------- ----- ----- ----- ----- ----- ----- Thu Oct 22 18:25:14 CEST 2015 pool 2.52T 207G 0 0 0 0 rpool 33.0G 118G 0 9 0 93.5K ---------- ----- ----- ----- ----- ----- ----- Thu Oct 22 18:25:19 CEST 2015 pool 2.52T 207G 0 0 0 0 rpool 33.0G 118G 0 0 0 0 ---------- ----- ----- ----- ----- ----- ----- Thu Oct 22 18:25:24 CEST 2015 pool 2.52T 207G 0 0 0 0 rpool 33.0G 118G 0 0 0 0 ---------- ----- ----- ----- ----- ----- ----- Thu Oct 22 18:25:29 CEST 2015 pool 2.52T 207G 0 0 0 0 rpool 33.0G 118G 0 0 0 0 ---------- ----- ----- ----- ----- ----- ----- Thu Oct 22 18:25:34 CEST 2015 pool 2.52T 207G 0 0 0 0 rpool 33.0G 118G 0 0 0 0 ---------- ----- ----- ----- ----- ----- ----- Thu Oct 22 18:25:39 CEST 2015 pool 2.52T 207G 0 0 0 0 rpool 33.0G 118G 0 16 0 374K ---------- ----- ----- ----- ----- ----- ----- ... 
Thu Oct 22 18:33:49 CEST 2015 pool 2.52T 207G 0 0 0 0 rpool 33.0G 118G 0 11 0 94.5K ---------- ----- ----- ----- ----- ----- ----- Thu Oct 22 18:33:54 CEST 2015 pool 2.52T 207G 0 13 819 80.0K rpool 33.0G 118G 0 0 0 0 ---------- ----- ----- ----- ----- ----- ----- Thu Oct 22 18:33:59 CEST 2015 pool 2.52T 207G 0 129 0 1.06M rpool 33.0G 118G 0 0 0 0 ---------- ----- ----- ----- ----- ----- ----- Thu Oct 22 18:34:04 CEST 2015 pool 2.52T 207G 0 55 0 503K rpool 33.0G 118G 0 11 0 97.9K ---------- ----- ----- ----- ----- ----- ----- ... just occasional bursts of work. I've now enabled SMART on the disks (2*3Tb mirror "pool" and 1*300Gb "rpool") and ran some short tests and triggered long tests (hopefully they'd succeed by tomorrow); current results are: # for D in /dev/rdsk/c0*s0; do echo "===== $D :"; smartctl -d sat,12 -a $D ; done ; for D in /dev/rdsk/c4*s0 ; do echo "===== $D :"; smartctl -d scsi -a $D ; done ===== /dev/rdsk/c0t3d0s0 : smartctl 6.0 2012-10-10 r3643 [i386-pc-solaris2.11] (local build) Copyright (C) 2002-12, Bruce Allen, Christian Franke, www.smartmontools.org === START OF INFORMATION SECTION === Device Model: WDC WD3003FZEX-00Z4SA0 Serial Number: WD-WCC5D1KKU0PA LU WWN Device Id: 5 0014ee 2610716b7 Firmware Version: 01.01A01 User Capacity: 3,000,592,982,016 bytes [3.00 TB] Sector Sizes: 512 bytes logical, 4096 bytes physical Rotation Rate: 7200 rpm Device is: Not in smartctl database [for details use: -P showall] ATA Version is: ACS-2 (minor revision not indicated) SATA Version is: SATA 3.0, 6.0 Gb/s (current: 3.0 Gb/s) Local Time is: Thu Oct 22 18:45:28 2015 CEST SMART support is: Available - device has SMART capability. SMART support is: Enabled === START OF READ SMART DATA SECTION === SMART overall-health self-assessment test result: PASSED General SMART Values: Offline data collection status: (0x82) Offline data collection activity was completed without error. Auto Offline Data Collection: Enabled. Self-test execution status: ( 249) Self-test routine in progress... 90% of test remaining. Total time to complete Offline data collection: (32880) seconds. Offline data collection capabilities: (0x7b) SMART execute Offline immediate. Auto Offline data collection on/off support. Suspend Offline collection upon new command. Offline surface scan supported. Self-test supported. Conveyance Self-test supported. Selective Self-test supported. SMART capabilities: (0x0003) Saves SMART data before entering power-saving mode. Supports SMART auto save timer. Error logging capability: (0x01) Error logging supported. General Purpose Logging supported. Short self-test routine recommended polling time: ( 2) minutes. Extended self-test routine recommended polling time: ( 357) minutes. Conveyance self-test routine recommended polling time: ( 5) minutes. SCT capabilities: (0x7035) SCT Status supported. SCT Feature Control supported. SCT Data Table supported. 
SMART Attributes Data Structure revision number: 16 Vendor Specific SMART Attributes with Thresholds: ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE 1 Raw_Read_Error_Rate 0x002f 200 200 051 Pre-fail Always - 0 3 Spin_Up_Time 0x0027 246 154 021 Pre-fail Always - 6691 4 Start_Stop_Count 0x0032 100 100 000 Old_age Always - 14 5 Reallocated_Sector_Ct 0x0033 200 200 140 Pre-fail Always - 0 7 Seek_Error_Rate 0x002e 200 200 000 Old_age Always - 0 9 Power_On_Hours 0x0032 094 094 000 Old_age Always - 4869 10 Spin_Retry_Count 0x0032 100 253 000 Old_age Always - 0 11 Calibration_Retry_Count 0x0032 100 253 000 Old_age Always - 0 12 Power_Cycle_Count 0x0032 100 100 000 Old_age Always - 14 16 Unknown_Attribute 0x0022 130 070 000 Old_age Always - 2289651870502 192 Power-Off_Retract_Count 0x0032 200 200 000 Old_age Always - 12 193 Load_Cycle_Count 0x0032 200 200 000 Old_age Always - 2 194 Temperature_Celsius 0x0022 117 111 000 Old_age Always - 35 196 Reallocated_Event_Count 0x0032 200 200 000 Old_age Always - 0 197 Current_Pending_Sector 0x0032 200 200 000 Old_age Always - 0 198 Offline_Uncorrectable 0x0030 200 200 000 Old_age Offline - 0 199 UDMA_CRC_Error_Count 0x0032 200 200 000 Old_age Always - 0 200 Multi_Zone_Error_Rate 0x0008 200 200 000 Old_age Offline - 0 SMART Error Log Version: 1 No Errors Logged SMART Self-test log structure revision number 1 Num Test_Description Status Remaining LifeTime(hours) LBA_of_first_error # 1 Short offline Completed without error 00% 4869 - SMART Selective self-test log data structure revision number 1 SPAN MIN_LBA MAX_LBA CURRENT_TEST_STATUS 1 0 0 Not_testing 2 0 0 Not_testing 3 0 0 Not_testing 4 0 0 Not_testing 5 0 0 Not_testing Selective self-test flags (0x0): After scanning selected spans, do NOT read-scan remainder of disk. If Selective self-test is pending on power-up, resume after 0 minute delay. ===== /dev/rdsk/c0t5d0s0 : smartctl 6.0 2012-10-10 r3643 [i386-pc-solaris2.11] (local build) Copyright (C) 2002-12, Bruce Allen, Christian Franke, www.smartmontools.org === START OF INFORMATION SECTION === Model Family: Seagate SV35 Device Model: ST3000VX000-1ES166 Serial Number: Z500S3L8 LU WWN Device Id: 5 000c50 079e3757b Firmware Version: CV26 User Capacity: 3,000,592,982,016 bytes [3.00 TB] Sector Sizes: 512 bytes logical, 4096 bytes physical Rotation Rate: 7200 rpm Device is: In smartctl database [for details use: -P show] ATA Version is: ACS-2, ACS-3 T13/2161-D revision 3b SATA Version is: SATA 3.1, 6.0 Gb/s (current: 3.0 Gb/s) Local Time is: Thu Oct 22 18:45:28 2015 CEST SMART support is: Available - device has SMART capability. SMART support is: Enabled === START OF READ SMART DATA SECTION === SMART overall-health self-assessment test result: PASSED General SMART Values: Offline data collection status: (0x00) Offline data collection activity was never started. Auto Offline Data Collection: Disabled. Self-test execution status: ( 249) Self-test routine in progress... 90% of test remaining. Total time to complete Offline data collection: ( 80) seconds. Offline data collection capabilities: (0x73) SMART execute Offline immediate. Auto Offline data collection on/off support. Suspend Offline collection upon new command. No Offline surface scan supported. Self-test supported. Conveyance Self-test supported. Selective Self-test supported. SMART capabilities: (0x0003) Saves SMART data before entering power-saving mode. Supports SMART auto save timer. Error logging capability: (0x01) Error logging supported. 
General Purpose Logging supported. Short self-test routine recommended polling time: ( 1) minutes. Extended self-test routine recommended polling time: ( 325) minutes. Conveyance self-test routine recommended polling time: ( 2) minutes. SCT capabilities: (0x10b9) SCT Status supported. SCT Error Recovery Control supported. SCT Feature Control supported. SCT Data Table supported. SMART Attributes Data Structure revision number: 10 Vendor Specific SMART Attributes with Thresholds: ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE 1 Raw_Read_Error_Rate 0x000f 105 099 006 Pre-fail Always - 8600880 3 Spin_Up_Time 0x0003 096 094 000 Pre-fail Always - 0 4 Start_Stop_Count 0x0032 100 100 020 Old_age Always - 19 5 Reallocated_Sector_Ct 0x0033 100 100 010 Pre-fail Always - 0 7 Seek_Error_Rate 0x000f 085 060 030 Pre-fail Always - 342685681 9 Power_On_Hours 0x0032 096 096 000 Old_age Always - 4214 10 Spin_Retry_Count 0x0013 100 100 097 Pre-fail Always - 0 12 Power_Cycle_Count 0x0032 100 100 020 Old_age Always - 19 184 End-to-End_Error 0x0032 100 100 099 Old_age Always - 0 187 Reported_Uncorrect 0x0032 100 100 000 Old_age Always - 0 188 Command_Timeout 0x0032 100 100 000 Old_age Always - 0 189 High_Fly_Writes 0x003a 028 028 000 Old_age Always - 72 190 Airflow_Temperature_Cel 0x0022 069 065 045 Old_age Always - 31 (Min/Max 29/32) 191 G-Sense_Error_Rate 0x0032 100 100 000 Old_age Always - 0 192 Power-Off_Retract_Count 0x0032 100 100 000 Old_age Always - 19 193 Load_Cycle_Count 0x0032 100 100 000 Old_age Always - 28 194 Temperature_Celsius 0x0022 031 040 000 Old_age Always - 31 (0 20 0 0 0) 197 Current_Pending_Sector 0x0012 100 100 000 Old_age Always - 0 198 Offline_Uncorrectable 0x0010 100 100 000 Old_age Offline - 0 199 UDMA_CRC_Error_Count 0x003e 200 200 000 Old_age Always - 0 SMART Error Log Version: 1 No Errors Logged SMART Self-test log structure revision number 1 Num Test_Description Status Remaining LifeTime(hours) LBA_of_first_error # 1 Extended offline Self-test routine in progress 90% 4214 - # 2 Short offline Completed without error 00% 4214 - SMART Selective self-test log data structure revision number 1 SPAN MIN_LBA MAX_LBA CURRENT_TEST_STATUS 1 0 0 Not_testing 2 0 0 Not_testing 3 0 0 Not_testing 4 0 0 Not_testing 5 0 0 Not_testing Selective self-test flags (0x0): After scanning selected spans, do NOT read-scan remainder of disk. If Selective self-test is pending on power-up, resume after 0 minute delay. 
===== /dev/rdsk/c4t5000CCA02A1292DDd0s0 : smartctl 6.0 2012-10-10 r3643 [i386-pc-solaris2.11] (local build) Copyright (C) 2002-12, Bruce Allen, Christian Franke, www.smartmontools.org Vendor: HITACHI Product: HUS156030VLS600 Revision: HPH1 User Capacity: 300,000,000,000 bytes [300 GB] Logical block size: 512 bytes Logical Unit id: 0x5000cca02a1292dc Serial number: LVVA6NHS Device type: disk Transport protocol: SAS Local Time is: Thu Oct 22 18:45:29 2015 CEST Device supports SMART and is Enabled Temperature Warning Enabled SMART Health Status: OK Current Drive Temperature: 45 C Drive Trip Temperature: 70 C Manufactured in week 14 of year 2012 Specified cycle count over device lifetime: 50000 Accumulated start-stop cycles: 80 Elements in grown defect list: 0 Vendor (Seagate) cache information Blocks sent to initiator = 2340336504406016 Error counter log: Errors Corrected by Total Correction Gigabytes Total ECC rereads/ errors algorithm processed uncorrected fast | delayed rewrites corrected invocations [10^9 bytes] errors read: 0 888890 0 888890 0 29326.957 0 write: 0 961315 0 961315 0 6277.560 0 Non-medium error count: 283 SMART Self-test log Num Test Status segment LifeTime LBA_first_err [SK ASC ASQ] Description number (hours) # 1 Background long Self test in progress ... - NOW - [- - -] # 2 Background long Aborted (device reset ?) - 14354 - [- - -] # 3 Background short Completed - 14354 - [- - -] # 4 Background long Aborted (device reset ?) - 14354 - [- - -] # 5 Background long Aborted (device reset ?) - 14354 - [- - -] Long (extended) Self Test duration: 2506 seconds [41.8 minutes] The zpool scrub results and general layout: # zpool status -v pool: pool state: ONLINE scan: scrub repaired 0 in 164h13m with 0 errors on Thu Oct 22 18:13:33 2015 config: NAME STATE READ WRITE CKSUM pool ONLINE 0 0 0 mirror-0 ONLINE 0 0 0 c0t3d0 ONLINE 0 0 0 c0t5d0 ONLINE 0 0 0 logs c4t5000CCA02A1292DDd0p2 ONLINE 0 0 0 cache c4t5000CCA02A1292DDd0p3 ONLINE 0 0 0 errors: No known data errors pool: rpool state: ONLINE status: Some supported features are not enabled on the pool. The pool can still be used, but some features are unavailable. action: Enable all features using 'zpool upgrade'. Once this is done, the pool may no longer be accessible by software that does not support the features. See zpool-features(5) for details. scan: scrub repaired 0 in 3h3m with 0 errors on Thu Oct 8 04:12:35 2015 config: NAME STATE READ WRITE CKSUM rpool ONLINE 0 0 0 c4t5000CCA02A1292DDd0s0 ONLINE 0 0 0 errors: No known data errors # zpool list -v NAME SIZE ALLOC FREE EXPANDSZ FRAG CAP DEDUP HEALTH ALTROOT pool 2.72T 2.52T 207G - 68% 92% 1.36x ONLINE / mirror 2.72T 2.52T 207G - 68% 92% c0t3d0 - - - - - - c0t5d0 - - - - - - log - - - - - - c4t5000CCA02A1292DDd0p2 8G 148K 8.00G - 0% 0% cache - - - - - - c4t5000CCA02A1292DDd0p3 120G 1.80G 118G - 0% 1% rpool 151G 33.0G 118G - 76% 21% 1.00x ONLINE - c4t5000CCA02A1292DDd0s0 151G 33.0G 118G - 76% 21% Note the long scrub time may have included the downtime while the system was frozen until it was rebooted. Thanks in advance for the fresh pairs of eyeballs, Jim Klimov -------------- next part -------------- An HTML attachment was scrubbed... URL: From danmcd at omniti.com Thu Oct 22 17:11:30 2015 From: danmcd at omniti.com (Dan McDonald) Date: Thu, 22 Oct 2015 13:11:30 -0400 Subject: [OmniOS-discuss] UPDATE NOW --> ntp to 4.2.8p4 Message-ID: The NTP software was updated to 4.2.8p4 yesterday. I've pushed out updates for r151006, r151014, and bloody. 
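To verify the fix landed, something like the following before and after
updating works (the "ntp" name suffix-matches the IPS package as shipped on
these releases; ntpq's readvar syntax is standard):

    # what IPS has installed:
    pkg info ntp | grep -i version
    # what the running daemon says about itself:
    ntpq -c "rv 0 version"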
As I mentioned earlier, r151012 users should update to r151014.

"pkg update" followed by "svcadm restart ntp" as a safety measure should
be sufficient.  No rebooting is needed.

NTP's advisory for this patch is here:

http://support.ntp.org/bin/view/Main/SecurityNotice#Recent_Vulnerabilities

Thanks,
Dan

From danmcd at omniti.com  Thu Oct 22 17:13:12 2015
From: danmcd at omniti.com (Dan McDonald)
Date: Thu, 22 Oct 2015 13:13:12 -0400
Subject: [OmniOS-discuss] UPDATE NOW --> ntp to 4.2.8p4
In-Reply-To:
References:
Message-ID: <36D4EF6A-C596-441B-828B-862D5EB9423E@omniti.com>

> On Oct 22, 2015, at 1:11 PM, Dan McDonald wrote:
>
> "pkg update" followed by "svcadm restart ntp" as a safety measure
> should be sufficient.  No rebooting is needed.

I made a small mistake here.  The "svcadm restart..." is not necessary;
the IPS package does the right thing here.

Sorry for the confusion,
Dan

From yavoritomov at gmail.com  Thu Oct 22 17:36:56 2015
From: yavoritomov at gmail.com (Yavor Tomov)
Date: Thu, 22 Oct 2015 12:36:56 -0500
Subject: [OmniOS-discuss] OmniOS backup box hanging regularly
In-Reply-To:
References:
Message-ID:

Hi Tovarishch Jim,

I had a similar issue with my box, and it was related to NFS locks. I
assume you are using NFS for the Linux backups. The solution was posted
by Chip on the mailing list; a copy of his solution is below:

"I've seen issues like this when you run out of NFS locks. NFSv3 in
Illumos is really slow at releasing locks.

On all my NFS servers I do:

sharectl set -p lockd_listen_backlog=256 nfs
sharectl set -p lockd_servers=2048 nfs

Everywhere I can, I use NFSv4 instead of v3. It handles locks much
better."
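To see what you are currently running with before changing anything,
sharectl should be able to read the properties back -- same property
names as above:

sharectl get -p lockd_listen_backlog nfs
sharectl get -p lockd_servers nfs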
All the Best
Yavor

On Thu, Oct 22, 2015 at 11:59 AM, Jim Klimov wrote:

> Hello all,
>
> [...]
From jimklimov at cos.ru  Thu Oct 22 18:51:32 2015
From: jimklimov at cos.ru (Jim Klimov)
Date: Thu, 22 Oct 2015 20:51:32 +0200
Subject: [OmniOS-discuss] OmniOS backup box hanging regularly
In-Reply-To:
References:
Message-ID:

22 October 2015 19:36:56 CEST, Yavor Tomov wrote:
>Hi Tovarishch Jim,
>
>I had a similar issue with my box, and it was related to NFS locks. I
>assume you are using NFS for the Linux backups. The solution was posted
>by Chip on the mailing list; a copy of his solution is below:
>
>"I've seen issues like this when you run out of NFS locks. NFSv3 in
>Illumos is really slow at releasing locks.
>
>On all my NFS servers I do:
>
>sharectl set -p lockd_listen_backlog=256 nfs
>sharectl set -p lockd_servers=2048 nfs
>
>Everywhere I can, I use NFSv4 instead of v3. It handles locks much
>better."
>
>All the Best
>Yavor
>
>On Thu, Oct 22, 2015 at 11:59 AM, Jim Klimov wrote:
>
>> Hello all,
>>
>> [...]
Thanks for the heads-up. I think all copies are rsync's, but will make
sure, just in case this bump helps.

Did anyone run into issues with many zfs-auto-snapshots (e.g. thousands -
many datasets and many snaps until they are killed to keep some 200gb
free) on a small number of spindles?

Jim
--
Typos courtesy of K-9 Mail on my Samsung Android

From vab at bb-c.de  Thu Oct 22 18:57:45 2015
From: vab at bb-c.de (Volker A. Brandt)
Date: Thu, 22 Oct 2015 20:57:45 +0200
Subject: [OmniOS-discuss] UPDATE NOW --> ntp to 4.2.8p4
In-Reply-To: <36D4EF6A-C596-441B-828B-862D5EB9423E@omniti.com>
References: <36D4EF6A-C596-441B-828B-862D5EB9423E@omniti.com>
Message-ID: <22057.12713.886330.955767@glaurung.bb-c.de>

Hi Dan!


Thanks for all the work you're doing on OmniOS!

> > On Oct 22, 2015, at 1:11 PM, Dan McDonald wrote:
> >
> > "pkg update" followed by "svcadm restart ntp" as a safety measure
> > should be sufficient.  No rebooting is needed.
>
> I made a small mistake here.  The "svcadm restart..." is not
> necessary; the IPS package does the right thing here.

Well, no, it doesn't. :-)  That's due to a design flaw in the interaction
between IPS and SMF (IMHO).  Even though the manifest object in the
package is properly tagged with restart_fmri, the service is never
restarted, because the manifest is not touched during the "pkg update",
as it has not changed since the last package version.

So if you change an SMF method in a package and want an "automatic"
restart, you need to also physically modify the SMF manifest.  I do that
by just incrementing a version counter or a timestamp, and noting the
fact in an XML comment in the manifest.  Otherwise, you need to manually
restart the service.

Unrelated: when I updated my local copy of the r151014 repo in
preparation of the pkg update for ntp, I got this:

Processing packages for publisher omnios ...
 Retrieving and evaluating 6161 package(s)...
Download Manifests ( 907/6161) -pkgrecv: http protocol error: code: 404 reason: Not Found
URL: 'http://pkg.omniti.com/omnios/r151014/omnios/manifest/0/developer%2Fillumos-tools@11%2C5.11-0.151014%3A20151016T122410Z'
(happened 4 times)

Processing packages for publisher omnios ...
 Retrieving and evaluating 1030 package(s)...
PROCESS                                         ITEMS     GET (MB)    SEND (MB)
Completed                                         1/1      2.4/2.4      4.9/4.9

So the recent addition of the illumos-tools pkg broke something in your
repo.  I worked around that by specifying the service/network/ntp pkg in
the pkgrecv invocation.
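i.e. something along these lines -- the destination directory here is
just an example, substitute your own local repo path:

pkgrecv -s http://pkg.omniti.com/omnios/r151014/ -d /path/to/local/repo service/network/ntp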
Regards -- Volker
--
------------------------------------------------------------------------
Volker A. Brandt               Consulting and Support for Oracle Solaris
Brandt & Brandt Computer GmbH                   WWW: http://www.bb-c.de/
Am Wiesenpfad 6, 53340 Meckenheim, GERMANY           Email: vab at bb-c.de
Handelsregister: Amtsgericht Bonn, HRB 10513             Schuhgröße: 46
Geschäftsführer: Rainer J.H. Brandt und Volker A. Brandt

"When logic and proportion have fallen sloppy dead"

From matej at zunaj.si  Thu Oct 22 19:02:49 2015
From: matej at zunaj.si (Matej Zerovnik)
Date: Thu, 22 Oct 2015 21:02:49 +0200
Subject: [OmniOS-discuss] Slow performance with ZeusRAM?
Message-ID: <10FE1CC1-F9F5-433A-9A2D-6570C4EE6CCF@zunaj.si>

Hello,

I'm building a new system and I'm having a bit of a performance problem.
Well, it's either that or I'm not getting the whole ZIL idea :)

My system is the following:
- IBM xServer 3550 M4 server (dual CPU with 160GB memory)
- LSI 9207 HBA (P19 firmware)
- Supermicro JBOD with SAS expander
- 4TB SAS3 drives
- ZeusRAM for ZIL
- LTS OmniOS (all patches applied)

If I benchmark the ZeusRAM on its own with random 4k sync writes, I can
get 48k IOPS out of it, no problem there.

If I create a new raidz2 pool with 10 hard drives, mirrored ZeusRAMs for
ZIL and set sync=always, I can only squeeze 14k IOPS out of the system.
Is that normal, or should I be getting 48k IOPS on the 2nd pool as well,
since this is the performance the ZeusRAM can deliver?

I'm testing with fio:
fio --filename=/pool0/test01 --size=5g --rw=randwrite --refill_buffers --norandommap --randrepeat=0 --ioengine=solarisaio --bs=4k --iodepth=16 --numjobs=16 --runtime=60 --group_reporting --name=4ktest

thanks, Matej

From minkim1 at gmail.com  Thu Oct 22 19:15:37 2015
From: minkim1 at gmail.com (Min Kim)
Date: Thu, 22 Oct 2015 12:15:37 -0700
Subject: [OmniOS-discuss] Slow performance with ZeusRAM?
In-Reply-To: <10FE1CC1-F9F5-433A-9A2D-6570C4EE6CCF@zunaj.si>
References: <10FE1CC1-F9F5-433A-9A2D-6570C4EE6CCF@zunaj.si>
Message-ID: <68BCD34E-B3AE-4A23-A0A9-DD6A450DB892@gmail.com>

I believe this is a known issue with SAS expanders.  Please see here:

http://serverfault.com/questions/242336/sas-expanders-vs-direct-attached-sas

When you are stress-testing the ZeusRAM by itself, all the IOPS and
bandwidth of the expander are allocated to that device alone.  Once you
add all the other drives, you lose some of that as you have to share it
with the other disks.

Min Kim

> On Oct 22, 2015, at 12:02 PM, Matej Zerovnik wrote:
>
> Hello,
>
> [...]
From eric.sproul at circonus.com  Thu Oct 22 19:18:47 2015
From: eric.sproul at circonus.com (Eric Sproul)
Date: Thu, 22 Oct 2015 15:18:47 -0400
Subject: [OmniOS-discuss] UPDATE NOW --> ntp to 4.2.8p4
In-Reply-To: <22057.12713.886330.955767@glaurung.bb-c.de>
References: <36D4EF6A-C596-441B-828B-862D5EB9423E@omniti.com>
 <22057.12713.886330.955767@glaurung.bb-c.de>
Message-ID:

On Thu, Oct 22, 2015 at 2:57 PM, Volker A. Brandt wrote:
>> I made a small mistake here.  The "svcadm restart..." is not
>> necessary; the IPS package does the right thing here.
>
> Well, no, it doesn't. :-)  That's due to a design flaw in the interaction
> between IPS and SMF (IMHO).  Even though the manifest object in the
> package is properly tagged with restart_fmri, the service is never
> restarted, because the manifest is not touched during the "pkg update",
> as it has not changed since the last package version.

The service has restarted correctly for me on both 006 and 014 with this
update.  I'm not sure why that is though, because you're correct that the
ntp.xml file has not changed in all of the '014 versions published.  I
was under the impression that the restart_fmri actuator would only fire
when the associated action changed.

However, if we really *do* want to restart ntp when the *daemon* updates,
then we could add a restart_fmri actuator on the usr/lib/inet/ntpd file.
Thus, whenever that file is updated, svc:/network/ntp:default could be
restarted.
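The delivered action would look roughly like this -- owner/group/mode are
illustrative here, the path and the actuator are the point:

file usr/lib/inet/ntpd path=usr/lib/inet/ntpd owner=root group=bin mode=0555 restart_fmri=svc:/network/ntp:default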
Eric

From matej at zunaj.si  Thu Oct 22 19:26:12 2015
From: matej at zunaj.si (Matej Zerovnik)
Date: Thu, 22 Oct 2015 21:26:12 +0200
Subject: [OmniOS-discuss] Slow performance with ZeusRAM?
In-Reply-To: <1D3B7684-CBA0-408D-99E6-9D84639CB217@gmail.com>
References: <10FE1CC1-F9F5-433A-9A2D-6570C4EE6CCF@zunaj.si>
 <1D3B7684-CBA0-408D-99E6-9D84639CB217@gmail.com>
Message-ID: <6B6E0336-CF33-4B2E-BB7A-1B1D6937E4FC@zunaj.si>

Interesting...

Although, I'm not sure if this is really the problem.

For a test, I booted up Linux, put both ZeusRAMs into a raid1 software
raid, and repeated the test.  I got the full 48k IOPS in the test,
meaning 96k IOPS were sent to the JBOD (48k IOPS for each drive).

On the OmniOS test bed, there are 28k IOPS sent to the ZIL and some
amount to the spindles when flushing the write cache, but no more than
1000 IOPS (100 iops/drive * 10).  Comparing that to the case above, IOPS
shouldn't be the limit.

Maybe I could try building my pools with hard drives that aren't near the
ZIL drive, which is in bay 0.  I could take hard drives from bays 4-15,
which probably use different SAS lanes.

lp, Matej

> On 22 Oct 2015, at 21:10, Min Kim wrote:
>
> I believe this is a known issue with SAS expanders.
>
> [...]

From lotheac at iki.fi  Thu Oct 22 19:28:33 2015
From: lotheac at iki.fi (Lauri Tirkkonen)
Date: Thu, 22 Oct 2015 22:28:33 +0300
Subject: [OmniOS-discuss] OmniOS backup box hanging regularly
In-Reply-To:
References:
Message-ID: <20151022192833.GB77@gutsman.lotheac.fi>

On Thu, Oct 22 2015 20:51:32 +0200, Jim Klimov wrote:
> Did anyone run into issues with many zfs-auto-snapshots (e.g.
> thousands - many datasets and many snaps until they are killed to keep
> some 200gb free) on a small number of spindles?

Not with that number of snapshots, but we had several thousand
filesystems with dozens (I think about 70) of snapshots per fs, and it
turned out to be a really bad idea due to the memory requirements:
things slowed down *a lot*.  We didn't see any hangs though.

I have a pool with around four thousand snapshots total at home and that
box is performing just fine, and it's just two spinning disks + two SSDs
for cache/slog.

-- 
Lauri Tirkkonen | lotheac @ IRCnet

From minkim1 at gmail.com  Thu Oct 22 19:36:50 2015
From: minkim1 at gmail.com (Min Kim)
Date: Thu, 22 Oct 2015 12:36:50 -0700
Subject: [OmniOS-discuss] Slow performance with ZeusRAM?
In-Reply-To: <6B6E0336-CF33-4B2E-BB7A-1B1D6937E4FC@zunaj.si>
References: <10FE1CC1-F9F5-433A-9A2D-6570C4EE6CCF@zunaj.si>
 <1D3B7684-CBA0-408D-99E6-9D84639CB217@gmail.com>
 <6B6E0336-CF33-4B2E-BB7A-1B1D6937E4FC@zunaj.si>
Message-ID: <9D6C17D8-26E4-4F6B-837F-2A3FC0C6E882@gmail.com>

Are you using the same record size of 4K on your zfs pool as you used
with your linux test system?

If the record size for the zpool and slog is set at the default value of
128K, it will greatly reduce the measured IOPS relative to that measured
with a recordsize of 4K.
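If not, it may be worth a try.  Note that recordsize is a per-dataset
property and only applies to newly written files, so you would set it and
then rewrite the test file -- something like this, with the dataset name
taken from your fio command, so adjust:

zfs set recordsize=4k pool0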
Min Kim

> On Oct 22, 2015, at 12:26 PM, Matej Zerovnik wrote:
>
> Interesting...
>
> [...]

From lotheac at iki.fi  Thu Oct 22 19:41:34 2015
From: lotheac at iki.fi (Lauri Tirkkonen)
Date: Thu, 22 Oct 2015 22:41:34 +0300
Subject: [OmniOS-discuss] UPDATE NOW --> ntp to 4.2.8p4
In-Reply-To: <22057.12713.886330.955767@glaurung.bb-c.de>
References: <36D4EF6A-C596-441B-828B-862D5EB9423E@omniti.com>
 <22057.12713.886330.955767@glaurung.bb-c.de>
Message-ID: <20151022194134.GC77@gutsman.lotheac.fi>

On Thu, Oct 22 2015 20:57:45 +0200, Volker A. Brandt wrote:
> Well, no, it doesn't. :-)  That's due to a design flaw in the interaction
> between IPS and SMF (IMHO).  Even though the manifest object in the
> package is properly tagged with restart_fmri, the service is never
> restarted, because the manifest is not touched during the "pkg update",
> as it has not changed since the last package version.
>
> [...]

Well, that's not a design flaw.  Actuators are executed only when the
action (eg. file) specifying them changes -- in other words, the packager
should include restart_fmri actuators in file actions that are relevant
for the service in question (eg. the ntpd binary at minimum).  This ntp
package does not contain *any* restart_fmri actuators for the ntp
service:

% pkg contents -mr pkg://omnios/service/network/ntp@4.2.8.4-0.151014:20151022T170026Z | grep restart_fmri
file cb84fc718d7aa637c12641aed4405107b5659ab8 chash=8758e80d1b9738c35f2d29cdebceee2930cdfa3b group=bin mode=0444 owner=root path=lib/svc/manifest/network/ntp.xml pkg.csize=1649 pkg.size=4681 restart_fmri=svc:/system/manifest-import:default

(sidebar: the manifest-import service is what imports service manifests
into the SMF repository, which you need to do when you install a new
service)

If you wanted to restart ntp when any files in the ntp package change on
update, you would need an actuator like
'restart_fmri=svc:/network/ntp:default' on *all* file actions delivered
by the package.
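If you are building the package yourself, a pkgmogrify(1) transform is
the usual way to stamp that onto every file action -- roughly (untested):

<transform file -> default restart_fmri svc:/network/ntp:default>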
-- 
Lauri Tirkkonen | lotheac @ IRCnet

From chip at innovates.com  Thu Oct 22 19:47:53 2015
From: chip at innovates.com (Schweiss, Chip)
Date: Thu, 22 Oct 2015 14:47:53 -0500
Subject: [OmniOS-discuss] Slow performance with ZeusRAM?
In-Reply-To: <10FE1CC1-F9F5-433A-9A2D-6570C4EE6CCF@zunaj.si>
References: <10FE1CC1-F9F5-433A-9A2D-6570C4EE6CCF@zunaj.si>
Message-ID:

The ZIL on log devices suffers a bit from not filling queues well.

In order to get the queues to fill more, try running your test against
several zfs folders on the pool simultaneously and measure your total
I/O.

As I understand it, if you're writing to only one zfs folder, your queue
depth will stay at 1 on the log device and you become latency bound.
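Something like this sketch -- one job per zfs folder, flags borrowed from
the single-folder test above, paths made up:

fio --name=4ktest-fs1 --filename=/pool0/fs1/test01 --size=5g --rw=randwrite --ioengine=solarisaio --bs=4k --iodepth=16 --runtime=60 &
fio --name=4ktest-fs2 --filename=/pool0/fs2/test02 --size=5g --rw=randwrite --ioengine=solarisaio --bs=4k --iodepth=16 --runtime=60 &
wait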
-- 
Lauri Tirkkonen | lotheac @ IRCnet

From chip at innovates.com  Thu Oct 22 19:47:53 2015
From: chip at innovates.com (Schweiss, Chip)
Date: Thu, 22 Oct 2015 14:47:53 -0500
Subject: [OmniOS-discuss] Slow performance with ZeusRAM?
In-Reply-To: <10FE1CC1-F9F5-433A-9A2D-6570C4EE6CCF@zunaj.si>
References: <10FE1CC1-F9F5-433A-9A2D-6570C4EE6CCF@zunaj.si>
Message-ID:

The ZIL on log devices suffers a bit from not filling queues well. In order to get the queues to fill more, try running your test to several zfs folders on the pool simultaneously and measure your total I/O.

As I understand it, if you're writing to only one zfs folder, your queue depth will stay at 1 on the log device and you become latency-bound.

-Chip

On Thu, Oct 22, 2015 at 2:02 PM, Matej Zerovnik wrote:
> Hello,
>
> I'm building a new system and I'm having a bit of a performance problem. Well, it's either that or I'm not getting the whole ZIL idea :)
>
> My system is as follows:
> - IBM xServer 3550 M4 server (dual CPU with 160GB memory)
> - LSI 9207 HBA (P19 firmware)
> - Supermicro JBOD with SAS expander
> - 4TB SAS3 drives
> - ZeusRAM for ZIL
> - LTS OmniOS (all patches applied)
>
> If I benchmark the ZeusRAM on its own with random 4k sync writes, I can get 48k IOPS out of it, no problem there.
>
> If I create a new raidz2 pool with 10 hard drives, mirrored ZeusRAMs for ZIL and set sync=always, I can only squeeze 14k IOPS out of the system.
> Is that normal or should I be getting 48k IOPS on the 2nd pool as well, since this is the performance the ZeusRAM can deliver?
>
> I'm testing with fio:
> fio --filename=/pool0/test01 --size=5g --rw=randwrite --refill_buffers --norandommap --randrepeat=0 --ioengine=solarisaio --bs=4k --iodepth=16 --numjobs=16 --runtime=60 --group_reporting --name=4ktest
>
> thanks, Matej
> _______________________________________________
> OmniOS-discuss mailing list
> OmniOS-discuss at lists.omniti.com
> http://lists.omniti.com/mailman/listinfo/omnios-discuss

From bfriesen at simple.dallas.tx.us  Thu Oct 22 19:58:45 2015
From: bfriesen at simple.dallas.tx.us (Bob Friesenhahn)
Date: Thu, 22 Oct 2015 14:58:45 -0500 (CDT)
Subject: [OmniOS-discuss] Slow performance with ZeusRAM?
In-Reply-To: <10FE1CC1-F9F5-433A-9A2D-6570C4EE6CCF@zunaj.si>
References: <10FE1CC1-F9F5-433A-9A2D-6570C4EE6CCF@zunaj.si>
Message-ID:

On Thu, 22 Oct 2015, Matej Zerovnik wrote:
>
> If I create a new raidz2 pool with 10 hard drives, mirrored ZeusRAMs for ZIL and set sync=always, I can only squeeze 14k IOPS out of the system.
> Is that normal or should I be getting 48k IOPS on the 2nd pool as well, since this is the performance the ZeusRAM can deliver?

Is your zfs filesystem using 4k blocks? Random writes may also require random reads due to COW. If the data is not perfectly aligned and does not perfectly fill the underlying zfs block, and the existing data is not already cached in the ARC, then it needs to be read from the underlying store so the existing data can be modified during the write.

I do see that you are using asynchronous I/O, which may add more factors.
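As an aside, if the intent is for fio itself to issue synchronous writes rather than forcing them with sync=always on the dataset, something along these lines should be close (a sketch, untested):

fio --filename=/pool0/test01 --size=5g --rw=randwrite --refill_buffers \
    --norandommap --randrepeat=0 --ioengine=psync --sync=1 --bs=4k \
    --iodepth=1 --numjobs=16 --runtime=60 --group_reporting --name=4ksync

With a synchronous engine the effective queue depth per job is 1, so parallelism comes from numjobs.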
Bob
-- 
Bob Friesenhahn
bfriesen at simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,    http://www.GraphicsMagick.org/

From matej at zunaj.si  Thu Oct 22 20:47:51 2015
From: matej at zunaj.si (Matej Zerovnik)
Date: Thu, 22 Oct 2015 22:47:51 +0200
Subject: [OmniOS-discuss] Slow performance with ZeusRAM?
In-Reply-To: <9D6C17D8-26E4-4F6B-837F-2A3FC0C6E882@gmail.com>
References: <10FE1CC1-F9F5-433A-9A2D-6570C4EE6CCF@zunaj.si>
	<1D3B7684-CBA0-408D-99E6-9D84639CB217@gmail.com>
	<6B6E0336-CF33-4B2E-BB7A-1B1D6937E4FC@zunaj.si>
	<9D6C17D8-26E4-4F6B-837F-2A3FC0C6E882@gmail.com>
Message-ID:

I'm using the default value of 128K on linux and OmniOS. I tried with recordsize=4k, but there is no difference in IOPS...

Matej

> On 22 Oct 2015, at 21:36, Min Kim wrote:
>
> Are you using the same record size of 4K on your zfs pool as you used with your linux test system?
>
> If the record size for the zpool and slog is set at the default value of 128K, it will greatly reduce the measured IOPS relative to that measured with a recordsize of 4K.
>
> Min Kim
>
>> On Oct 22, 2015, at 12:26 PM, Matej Zerovnik wrote:
>>
>> Interesting...
>>
>> Although, I'm not sure if this is really the problem.
>>
>> For a test, I booted up linux, put both ZeusRAMs into a software RAID1 and repeated the test. I got the full 48k IOPS in the test, meaning there were 96k IOPS sent to the JBOD (48k IOPS for each drive).
>>
>> On the OmniOS test bed, there are 28k IOPS sent to ZIL and X amount to spindles when flushing the write cache, but no more than 1000 IOPS (100 IOPS/drive * 10). Comparing that to the case above, IOPS shouldn't be a limit.
>>
>> Maybe I could try building my pools with hard drives that aren't near the ZIL drive, which is in bay 0. I could take hard drives from bays 4-15, which probably use different SAS lanes.
>>
>> lp, Matej
>>
>>> On 22 Oct 2015, at 21:10, Min Kim wrote:
>>>
>>> I believe this is a known issue with SAS expanders.
>>>
>>> Please see here:
>>>
>>> http://serverfault.com/questions/242336/sas-expanders-vs-direct-attached-sas
>>>
>>> When you are stress-testing the ZeusRAM by itself, all the IOPS and bandwidth of the expander are allocated to that device alone. Once you add all the other drives, you lose some of that as you have to share it with the other disks.
>>>
>>> Min Kim
>>>
>>>> On Oct 22, 2015, at 12:02 PM, Matej Zerovnik wrote:
>>>>
>>>> Hello,
>>>>
>>>> I'm building a new system and I'm having a bit of a performance problem. Well, it's either that or I'm not getting the whole ZIL idea :)
>>>>
>>>> My system is as follows:
>>>> - IBM xServer 3550 M4 server (dual CPU with 160GB memory)
>>>> - LSI 9207 HBA (P19 firmware)
>>>> - Supermicro JBOD with SAS expander
>>>> - 4TB SAS3 drives
>>>> - ZeusRAM for ZIL
>>>> - LTS OmniOS (all patches applied)
>>>>
>>>> If I benchmark the ZeusRAM on its own with random 4k sync writes, I can get 48k IOPS out of it, no problem there.
>>>>
>>>> If I create a new raidz2 pool with 10 hard drives, mirrored ZeusRAMs for ZIL and set sync=always, I can only squeeze 14k IOPS out of the system.
>>>> Is that normal or should I be getting 48k IOPS on the 2nd pool as well, since this is the performance the ZeusRAM can deliver?
>>>>
>>>> I'm testing with fio:
>>>> fio --filename=/pool0/test01 --size=5g --rw=randwrite --refill_buffers --norandommap --randrepeat=0 --ioengine=solarisaio --bs=4k --iodepth=16 --numjobs=16 --runtime=60 --group_reporting --name=4ktest
>>>>
>>>> thanks, Matej
>>>> _______________________________________________
>>>> OmniOS-discuss mailing list
>>>> OmniOS-discuss at lists.omniti.com
>>>> http://lists.omniti.com/mailman/listinfo/omnios-discuss
>>>
>>

From bfriesen at simple.dallas.tx.us  Thu Oct 22 21:11:48 2015
From: bfriesen at simple.dallas.tx.us (Bob Friesenhahn)
Date: Thu, 22 Oct 2015 16:11:48 -0500 (CDT)
Subject: [OmniOS-discuss] Slow performance with ZeusRAM?
In-Reply-To:
References: <10FE1CC1-F9F5-433A-9A2D-6570C4EE6CCF@zunaj.si>
	<1D3B7684-CBA0-408D-99E6-9D84639CB217@gmail.com>
	<6B6E0336-CF33-4B2E-BB7A-1B1D6937E4FC@zunaj.si>
	<9D6C17D8-26E4-4F6B-837F-2A3FC0C6E882@gmail.com>
Message-ID:

On Thu, 22 Oct 2015, Matej Zerovnik wrote:
> I'm using the default value of 128K on linux and OmniOS. I tried with recordsize=4k, but there is no difference in IOPS...

There should be a large difference unless the file data is already cached in the ARC. Even with caching, a block size of 128k means that 128k is written to the underlying store, although a useful purpose of your ZIL device is that an aggregation of multiple writes during the TXG interval to the same 128k block may be written as one write at the next TXG sync interval (rather than immediately).

Try umounting and re-mounting your zfs filesystem (or 'zfs destroy' followed by 'zfs create') to see how performance differs on a freshly mounted filesystem. The zfs ARC caching will be purged when the filesystem is unmounted.

Do you have compression enabled for this filesystem?

Bob
-- 
Bob Friesenhahn
bfriesen at simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,    http://www.GraphicsMagick.org/

From matej at zunaj.si  Thu Oct 22 21:28:49 2015
From: matej at zunaj.si (Matej Zerovnik)
Date: Thu, 22 Oct 2015 23:28:49 +0200
Subject: [OmniOS-discuss] Slow performance with ZeusRAM?
In-Reply-To:
References: <10FE1CC1-F9F5-433A-9A2D-6570C4EE6CCF@zunaj.si>
Message-ID: <95ADB53F-BBDD-4CCD-959F-0A174E7DA8F2@zunaj.si>

Chip: I tried running fio on multiple folders and it's a little better.

When I run 7x fio (iodepth=4, threads=4), I get 28k IOPS on average.
When I run 7x fio (iodepth=4, threads=16), I get 35k IOPS on average. iostat shows a transfer rate of 140-220MB/s with an average request size of 35kB.
When I run 7x fio (iodepth=1, threads=1), I get 24k IOPS on average.

There are still at least 10k IOPS left to use, I guess :)
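For reference, the 7-way run was along these lines; this is reconstructed from the single-folder command earlier in the thread, not the exact invocation, and it assumes datasets /pool0/fs1 .. /pool0/fs7 exist:

for i in 1 2 3 4 5 6 7; do
  fio --filename=/pool0/fs$i/test01 --size=2g --rw=randwrite \
      --refill_buffers --norandommap --randrepeat=0 --ioengine=solarisaio \
      --bs=4k --iodepth=4 --numjobs=4 --runtime=60 --group_reporting \
      --name=4ktest$i &
done; wait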
Bob: Yes, my ZFS is ashift=12, since all my drives report 4k blocks (is that what you meant?). The pool is completely empty, so there is enough place for writes, so write speed should not be limited because of COW. Looking at iostat, there are no reads on the drives at all. I'm not sure where fio gets its data, probably from /dev/zero or somewhere? I will try the sync engine instead of solarisaio to see if there is any difference.

I don't have compression enabled, since I want to test raw performance. I also disabled the ARC (primarycache=metadata), just so my read tests are also as real as possible (so I don't need to run tests with a 1TB test file).

> Try umounting and re-mounting your zfs filesystem (or 'zfs destroy' followed by 'zfs create') to see how performance differs on a freshly mounted filesystem. The zfs ARC caching will be purged when the filesystem is unmounted.

If I understand you correctly, you are saying I should destroy my folders, set recordsize=4k on my pool and then create zfs folders?

thanks, Matej

> On 22 Oct 2015, at 21:47, Schweiss, Chip wrote:
>
> The ZIL on log devices suffers a bit from not filling queues well. In order to get the queues to fill more, try running your test to several zfs folders on the pool simultaneously and measure your total I/O.
>
> As I understand it, if you're writing to only one zfs folder, your queue depth will stay at 1 on the log device and you become latency-bound.
>
> -Chip
>
> On Thu, Oct 22, 2015 at 2:02 PM, Matej Zerovnik wrote:
> Hello,
>
> I'm building a new system and I'm having a bit of a performance problem. Well, it's either that or I'm not getting the whole ZIL idea :)
>
> My system is as follows:
> - IBM xServer 3550 M4 server (dual CPU with 160GB memory)
> - LSI 9207 HBA (P19 firmware)
> - Supermicro JBOD with SAS expander
> - 4TB SAS3 drives
> - ZeusRAM for ZIL
> - LTS OmniOS (all patches applied)
>
> If I benchmark the ZeusRAM on its own with random 4k sync writes, I can get 48k IOPS out of it, no problem there.
>
> If I create a new raidz2 pool with 10 hard drives, mirrored ZeusRAMs for ZIL and set sync=always, I can only squeeze 14k IOPS out of the system.
> Is that normal or should I be getting 48k IOPS on the 2nd pool as well, since this is the performance the ZeusRAM can deliver?
>
> I'm testing with fio:
> fio --filename=/pool0/test01 --size=5g --rw=randwrite --refill_buffers --norandommap --randrepeat=0 --ioengine=solarisaio --bs=4k --iodepth=16 --numjobs=16 --runtime=60 --group_reporting --name=4ktest
>
> thanks, Matej
> _______________________________________________
> OmniOS-discuss mailing list
> OmniOS-discuss at lists.omniti.com
> http://lists.omniti.com/mailman/listinfo/omnios-discuss

From bfriesen at simple.dallas.tx.us  Thu Oct 22 21:51:04 2015
From: bfriesen at simple.dallas.tx.us (Bob Friesenhahn)
Date: Thu, 22 Oct 2015 16:51:04 -0500 (CDT)
Subject: [OmniOS-discuss] Slow performance with ZeusRAM?
In-Reply-To: <95ADB53F-BBDD-4CCD-959F-0A174E7DA8F2@zunaj.si>
References: <10FE1CC1-F9F5-433A-9A2D-6570C4EE6CCF@zunaj.si>
	<95ADB53F-BBDD-4CCD-959F-0A174E7DA8F2@zunaj.si>
Message-ID:

On Thu, 22 Oct 2015, Matej Zerovnik wrote:
> Bob:
> Yes, my ZFS is ashift=12, since all my drives report 4k blocks (is that what you meant?). The pool is completely empty, so there is enough place for
> writes, so write speed should not be limited because of COW. Looking at iostat, there are no reads on the drives at all.
> I'm not sure where fio gets its data, probably from /dev/zero or somewhere?

To be clear, zfs does not overwrite blocks. Instead zfs modifies (in memory) any prior data from a block, and then it writes the block data to a new location. This is called "copy on write". If the prior data would not be entirely overwritten and is not already cached in memory, then it needs to be read from the underlying disk.

It is interesting that you say there are no reads on the drives at all.

> Try umounting and re-mounting your zfs filesystem (or 'zfs destroy' followed by 'zfs create') to see how performance differs on a freshly
> mounted filesystem. The zfs ARC caching will be purged when the filesystem is unmounted.
>
> If I understand you correctly, you are saying I should destroy my folders, set recordsize=4k on my pool and then create zfs folders?

It should suffice to delete the file and set recordsize=4k on the filesystem, and then use fio to create a new file. The file retains its original recordsize after it has been created so you would need to create a new file.
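Concretely, a sketch of that sequence, using the pool and file names from earlier in the thread:

zfs set recordsize=4k pool0
rm /pool0/test01    # the existing file keeps its original 128k blocks
fio --filename=/pool0/test01 --size=5g --rw=randwrite --refill_buffers \
    --norandommap --randrepeat=0 --ioengine=solarisaio --bs=4k \
    --iodepth=16 --numjobs=16 --runtime=60 --group_reporting --name=4ktest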
There is maximum performance for random write if the write blocksize matches the filesystem blocksize. There is still a catch though: if the random write is truly random, the writes may still not match up perfectly with the underlying blocks. For example (top is logical write data and bottom is file block data):

aligned "random write" data:

XXXXXXXXXXXXXX XXXXXXXXXXXXXX
XXXXXXXXXXXXXX XXXXXXXXXXXXXX

vs unaligned "random write" data:

       XXXXXXXXXXXXXX XXXXXXXXXXXXXX
XXXXXXXXXXXXXX XXXXXXXXXXXXXX XXXXXXXXXXXXXX

The writes aligned to the start of the underlying block will be much faster.

A major benefit of your ZIL device is to help turn random writes into fewer random writes or even sequential writes when the TXG is written to your data drives.

It is very difficult to test raw hardware performance through zfs filesystem access.

Bob
-- 
Bob Friesenhahn
bfriesen at simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,    http://www.GraphicsMagick.org/

From vab at bb-c.de  Thu Oct 22 23:13:59 2015
From: vab at bb-c.de (Volker A. Brandt)
Date: Fri, 23 Oct 2015 01:13:59 +0200
Subject: [OmniOS-discuss] UPDATE NOW --> ntp to 4.2.8p4
In-Reply-To: <20151022194134.GC77@gutsman.lotheac.fi>
References: <36D4EF6A-C596-441B-828B-862D5EB9423E@omniti.com>
	<22057.12713.886330.955767@glaurung.bb-c.de>
	<20151022194134.GC77@gutsman.lotheac.fi>
Message-ID: <22057.28087.51441.685509@glaurung.bb-c.de>

Lauri Tirkkonen writes:
> > Well, no, it doesn't. :-) That's due to a design flaw in the
> > interaction between IPS and SMF (IMHO). [...]
> Well, that's not a design flaw. Actuators are executed only when the
> action (eg. file) specifying them changes

Yes, this is how IPS does it. IPS does not really know that the manifest-import service is special. There should have been an explicit "re-import this manifest now" actuator, much like users or groups are created.

[...]
> If you wanted to restart ntp when any files in the ntp package
> change on update, you would need an actuator like
> 'restart_fmri=svc:/network/ntp:default' on *all* file actions
> delivered by the package.

I know what you mean. That might work, but that is normally not what you do when you deliver an SMF manifest in your package. You just drop it and restart manifest-import, and hope that manifest-import will see your new manifest. This is quite a different thing.

Also, what you wrote is not quite true. What you wanted to write was "you would need an actuator on *at least one* file action *that has a different file hash*". If nothing is different, the action is not executed, and the attached actuator does not fire.

And it gets worse when you remove a package that contains a manifest for a running SMF service, because it is impossible to call the stop method of the service before removing the package. Lots of fun. :-)

Viele Grüße -- Volker A. Brandt

-- 
------------------------------------------------------------------------
Volker A. Brandt               Consulting and Support for Oracle Solaris
Brandt & Brandt Computer GmbH                   WWW: http://www.bb-c.de/
Am Wiesenpfad 6, 53340 Meckenheim, GERMANY            Email: vab at bb-c.de
Handelsregister: Amtsgericht Bonn, HRB 10513              Schuhgröße: 46
Geschäftsführer: Rainer J.H. Brandt und Volker A. Brandt
"When logic and proportion have fallen sloppy dead"

From vab at bb-c.de  Thu Oct 22 23:20:16 2015
From: vab at bb-c.de (Volker A. Brandt)
Date: Fri, 23 Oct 2015 01:20:16 +0200
Subject: [OmniOS-discuss] UPDATE NOW --> ntp to 4.2.8p4
In-Reply-To:
References: <36D4EF6A-C596-441B-828B-862D5EB9423E@omniti.com>
	<22057.12713.886330.955767@glaurung.bb-c.de>
Message-ID: <22057.28464.529688.787911@glaurung.bb-c.de>

Eric Sproul writes:
> The service has restarted correctly for me on both 006 and 014 with
> this update. I'm not sure why that is though, because you're
> correct that the ntp.xml file has not changed in all of the '014
> versions published. I was under the impression that the
> restart_fmri actuator would only fire when the associated action was
> triggered.

Yes, that's why it did not restart on my 014 box, hence I noticed. Maybe you were on an earlier rev where the manifest really did change? I guess it also restarted when Dan tested, or else he would not have mentioned that pkg DTRT.

> However, if we really *do* want to restart ntp when the *daemon*
> updates, then we could add a restart_fmri actuator on the
> usr/lib/inet/ntpd file. Thus, whenever that file is updated,
> svc:/network/ntp:default could be restarted.

It's been a while since I last tried, but I think this will not work, at least not in some corner cases, e.g. when the pkg is not installed at all, and the svc:/network/ntp:default service does not exist when the pkg is installed. ISTR that pkg install will error out. Hmmm, need to test that again sometime. :-)

Regards -- Volker
-- 
------------------------------------------------------------------------
Volker A. Brandt               Consulting and Support for Oracle Solaris
Brandt & Brandt Computer GmbH                   WWW: http://www.bb-c.de/
Am Wiesenpfad 6, 53340 Meckenheim, GERMANY            Email: vab at bb-c.de
Handelsregister: Amtsgericht Bonn, HRB 10513              Schuhgröße: 46
Geschäftsführer: Rainer J.H. Brandt und Volker A. Brandt
Brandt "When logic and proportion have fallen sloppy dead" From bhildebrandt at exegy.com Fri Oct 23 04:42:02 2015 From: bhildebrandt at exegy.com (Hildebrandt, Bill) Date: Fri, 23 Oct 2015 04:42:02 +0000 Subject: [OmniOS-discuss] OmniOS backup box hanging regularly In-Reply-To: References: Message-ID: Chip was responding to an issue that we have been having (since 9/18) with our online systems becoming unresponsive to NFS and CIFS. During these hangs we also see that doing an ?ls? works, but an ?ls ?l? hangs. NFS cannot be restarted, and we are forced to reboot. I was hopeful that the NFS settings would work, but on Tuesday of this week, our offline replication target experienced the same issue. We were about to perform a scrub, when we noticed in our Napp-it interface that ?ZFS Filesystems? was not displaying. After SSHing to the system, we saw that ?ls ?l? did not respond. This was a pure replication target with no NFS access, so I don?t see how the NFS lock tuning could be the solution. A ?reboot ?d? was performed, and we have a 3.9GB dump. If you have a preferred location to receive such dumps, I would be more than happy to share. I should note that we just started using OmniOS this summer, so we have always been at the r151014 version. These were newly created units that have performed perfectly for 2.5 months, and now we are having hangs every 1-2 weeks. Here is a timeline that I shared with Guenther Alka: (I know it?s not best practice, but we had to use ?export? as the name of our data pool) 9/14 ? notified of the L2Arc issue ? removed the L2Arc device from the export pool and stopped nightly replication jobs. Ran a scrub on the replicated NAS appliance, and found no issues 9/15 ? ran ?zdb ?bbccsv export? on the replicated unit and it came up clean 9/16 ? updated OmniOS and Napp-it on the replicated unit; re-added the L2Arc device; re-enabled the replication jobs 9/18 ? in the morning, we were notified that the main NAS unit had stopped responding to CIFS and ?ls ?l? was not working ? NFS continued to function. So this was prior to any rebuild of the system disk, upgrade of OmniOS, or re-enabling the L2Arc device. That night the system disk was rebuilt from scratch from the original CD, with a pkg update performed prior to importing the data pool. 9/19-9/21 ? everything appeared to be working well. The replication jobs that run at 1, 2, and 3am this morning completed just fine. Around 5:40am this morning is when we were notified that NFS had stopped serving up files. After logging in, we found that the ?ls ?l? issue had returned, and CIFS was non-functional as well. Also, the ?Pools? tab in Napp-it worked, but the ?ZFS Filesystems? tab did not. I find it interesting that an ?ls ?l? failed while I was in /root ? the rpool is brand new, with updated OS, and no L2Arc device. I have performed scrubs and ?zdb ?bbccvs? on both units shortly after this, and no errors were found. For the system that hung on Tuesday, a new scrub was clean, but the zdb showed leaks, and over 33k checksum errors (full disclosure ? several replication jobs were ran during the zdb). A subsequent scrub and ?zpool status? showed no problem. I performed another round of replications and all completed without errors. I have now exported that pool, and am performing another zdb, which is still running but showing clean so far. Just in case I have done something insanely stupid with regard to the system configuration, here is the config (both units are identical except for the OS ? 
OS: omnios-f090f73 (the online unit is at omnios-cffff65)
Mobo: Supermicro H8DG6-H8DGi
RAM: 128GB ECC (Hynix, which is really Samsung I think)
Controllers: 2x LSI 9201-16i with SAS-SATA break-out cables going to a SAS846TQ backplane
System disk: mirrored Innodisk 128GB SATADOMs (SATADOM-MV 3ME) off mobo
ZIL: mirrored Samsung SSD 845D (400GB) off mobo
L2ARC: SAMSUNG MZHPU512 (512GB PCIe)
NICs: Intel 10G (X520)
Data disks: 24x WD 1TB Enterprise SATA (WDC WD1002FBYS)

Any help/insight is greatly appreciated.

Thanks,
Bill

From: OmniOS-discuss [mailto:omnios-discuss-bounces at lists.omniti.com] On Behalf Of Yavor Tomov
Sent: Thursday, October 22, 2015 12:37 PM
To: jimklimov at cos.ru
Cc: OmniOS-discuss
Subject: Re: [OmniOS-discuss] OmniOS backup box hanging regularly

Hi Tovarishch Jim,

I had a similar issue with my box and it was related to the NFS locks. I assume you are using it due to the Linux backups. The solution was posted by Chip on the mailing list. Copy of his solution below:

"I've seen issues like this when you run out of NFS locks. NFSv3 in Illumos is really slow at releasing locks. On all my NFS servers I do:

sharectl set -p lockd_listen_backlog=256 nfs
sharectl set -p lockd_servers=2048 nfs

Everywhere I can, I use NFSv4 instead of v3. It handles locks much better."
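To check that the settings took effect, something like this should print the current values (a sketch):

sharectl get -p lockd_listen_backlog -p lockd_servers nfs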
All the Best
Yavor

On Thu, Oct 22, 2015 at 11:59 AM, Jim Klimov wrote:

Hello all,

I have this HP-Z400 workstation with 16Gb ECC (should be) RAM running OmniOS bloody, which acts as a backup server for our production systems (regularly rsync'ing large files off Linux boxes, and rotating ZFS auto-snapshots to keep its space free). Sometimes it also runs replicas of infrastructure (DHCP, DNS) and was set up as a VirtualBox + phpVirtualBox host to test that out, but no VMs running.

So the essential loads are ZFS snapshots and ZFS scrubs :)

And it freezes roughly every week. Stops responding to ping, attempts to log in via SSH or physical console - it processes keypresses on the latter, but does not present a login prompt. It used to be stable, and such regular hangs began around summertime.

My primary guess would be for flaky disks, maybe timing out under load or going to sleep or whatever... But I have yet to prove it, or any other theory. Maybe just CPU is overheating due to regular near-100% load with disk I/O... At least I want to rule out OS errors and rule out (or point out) operator/box errors as much as possible - which is something I can change to try and fix ;)

Before I proceed to TL;DR screenshots, I'd overview what I see:
* In the "top" output, processes owned by zfssnap lead most of the time... But even the SSH shell is noticeably slow to respond (1 sec per line when just pressing enter to clear the screen to prepare nice screenshots).
* SMART was not enabled on the 3TB mirrored "pool" SATA disks (is now, long tests initiated), but was in place on the "rpool" SAS disk where it logged some corrected ECC errors - but none uncorrected. Maybe the cabling should be reseated.
* iostat shows disks are generally not busy (they don't audibly rattle nor visibly blink all the time, either)
* zpool scrubs return clean
* there are partitions of the system rpool disk (10K RPM SAS) used as log and cache devices for the main data pool on 3TB SATA disks. The system disk is fast and underutilized, so what the heck ;) And it was not a problem for the first year of this system's honest and stable workouts. These devices are pretty empty at the moment.

I have enabled deadman panics according to the Wiki, but none have happened so far:

# cat /etc/system | egrep -v '(^\*|^$)'
set snooping=1
set pcplusmp:apic_panic_on_nmi=1
set apix:apic_panic_on_nmi = 1

In the "top" output, processes owned by zfssnap lead most of the time:

last pid: 22599;  load avg: 12.9, 12.2, 11.2;  up 0+09:52:11    18:34:41
140 processes: 125 sleeping, 13 running, 2 on cpu
CPU states:  0.0% idle, 22.9% user, 77.1% kernel, 0.0% iowait, 0.0% swap
Memory: 16G phys mem, 1765M free mem, 2048M total swap, 2048M free swap

Seconds to delay:
   PID USERNAME LWP PRI NICE  SIZE   RES STATE    TIME    CPU COMMAND
 21389 zfssnap    1  43    2  863M  860M run      5:04 35.61% zfs
 22360 zfssnap    1  52    2  118M  115M run      0:37 16.50% zfs
 21778 zfssnap    1  52    2  563M  560M run      3:15 13.17% zfs
 21278 zfssnap    1  52    2  947M  944M run      5:32  6.91% zfs
 21881 zfssnap    1  43    2  433M  431M run      2:31  5.41% zfs
 21852 zfssnap    1  52    2  459M  456M run      2:39  5.16% zfs
 21266 zfssnap    1  43    2  906M  903M run      5:18  3.95% zfs
 21757 zfssnap    1  43    2  597M  594M run      3:26  2.91% zfs
 21274 zfssnap    1  52    2  930M  927M cpu/0    5:27  2.78% zfs
 22588 zfssnap    1  43    2   30M   27M run      0:08  2.48% zfs
 22580 zfssnap    1  52    2   49M   46M run      0:14  0.71% zfs
 22038 root       1  59    0 5312K 3816K cpu/1    0:01  0.10% top
 22014 root       1  59    0 8020K 4988K sleep    0:00  0.02% sshd

Average "iostats" are not that busy:

# zpool iostat -Td 5
Thu Oct 22 18:24:59 CEST 2015
              capacity     operations    bandwidth
pool       alloc   free   read  write   read  write
---------- -----  -----  -----  -----  -----  -----
pool       2.52T   207G    802    116  28.3M   840K
rpool      33.0G   118G      0      4  4.52K  58.7K
---------- -----  -----  -----  -----  -----  -----
Thu Oct 22 18:25:04 CEST 2015
pool       2.52T   207G      0      0      0      0
rpool      33.0G   118G      0     10      0  97.9K
---------- -----  -----  -----  -----  -----  -----
Thu Oct 22 18:25:09 CEST 2015
pool       2.52T   207G      0      0      0      0
rpool      33.0G   118G      0      0      0      0
---------- -----  -----  -----  -----  -----  -----
Thu Oct 22 18:25:14 CEST 2015
pool       2.52T   207G      0      0      0      0
rpool      33.0G   118G      0      9      0  93.5K
---------- -----  -----  -----  -----  -----  -----
Thu Oct 22 18:25:19 CEST 2015
pool       2.52T   207G      0      0      0      0
rpool      33.0G   118G      0      0      0      0
---------- -----  -----  -----  -----  -----  -----
Thu Oct 22 18:25:24 CEST 2015
pool       2.52T   207G      0      0      0      0
rpool      33.0G   118G      0      0      0      0
---------- -----  -----  -----  -----  -----  -----
Thu Oct 22 18:25:29 CEST 2015
pool       2.52T   207G      0      0      0      0
rpool      33.0G   118G      0      0      0      0
---------- -----  -----  -----  -----  -----  -----
Thu Oct 22 18:25:34 CEST 2015
pool       2.52T   207G      0      0      0      0
rpool      33.0G   118G      0      0      0      0
---------- -----  -----  -----  -----  -----  -----
Thu Oct 22 18:25:39 CEST 2015
pool       2.52T   207G      0      0      0      0
rpool      33.0G   118G      0     16      0   374K
---------- -----  -----  -----  -----  -----  -----
...
Thu Oct 22 18:33:49 CEST 2015
pool       2.52T   207G      0      0      0      0
rpool      33.0G   118G      0     11      0  94.5K
---------- -----  -----  -----  -----  -----  -----
Thu Oct 22 18:33:54 CEST 2015
pool       2.52T   207G      0     13    819  80.0K
rpool      33.0G   118G      0      0      0      0
---------- -----  -----  -----  -----  -----  -----
Thu Oct 22 18:33:59 CEST 2015
pool       2.52T   207G      0    129      0  1.06M
rpool      33.0G   118G      0      0      0      0
---------- -----  -----  -----  -----  -----  -----
Thu Oct 22 18:34:04 CEST 2015
pool       2.52T   207G      0     55      0   503K
rpool      33.0G   118G      0     11      0  97.9K
---------- -----  -----  -----  -----  -----  -----
...

just occasional bursts of work.
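One cheap way to record where the memory and pages go until the next freeze is a periodic log of the stock mdb ::memstat dcmd, left running in a detached shell (a rough sketch):

# while sleep 600; do date; echo ::memstat | mdb -k; done >> /var/tmp/memstat.log &

The tail of that log after a hang then shows how the page counts were trending.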
I've now enabled SMART on the disks (2*3Tb mirror "pool" and 1*300Gb "rpool") and ran some short tests and triggered long tests (hopefully they'd succeed by tomorrow); current results are:

# for D in /dev/rdsk/c0*s0; do echo "===== $D :"; smartctl -d sat,12 -a $D ; done ; for D in /dev/rdsk/c4*s0 ; do echo "===== $D :"; smartctl -d scsi -a $D ; done
===== /dev/rdsk/c0t3d0s0 :
smartctl 6.0 2012-10-10 r3643 [i386-pc-solaris2.11] (local build)
Copyright (C) 2002-12, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Device Model:     WDC WD3003FZEX-00Z4SA0
Serial Number:    WD-WCC5D1KKU0PA
LU WWN Device Id: 5 0014ee 2610716b7
Firmware Version: 01.01A01
User Capacity:    3,000,592,982,016 bytes [3.00 TB]
Sector Sizes:     512 bytes logical, 4096 bytes physical
Rotation Rate:    7200 rpm
Device is:        Not in smartctl database [for details use: -P showall]
ATA Version is:   ACS-2 (minor revision not indicated)
SATA Version is:  SATA 3.0, 6.0 Gb/s (current: 3.0 Gb/s)
Local Time is:    Thu Oct 22 18:45:28 2015 CEST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x82) Offline data collection activity
                                        was completed without error.
                                        Auto Offline Data Collection: Enabled.
Self-test execution status:      ( 249) Self-test routine in progress...
                                        90% of test remaining.
Total time to complete Offline
data collection:                (32880) seconds.
Offline data collection
capabilities:                    (0x7b) SMART execute Offline immediate.
                                        Auto Offline data collection on/off support.
                                        Suspend Offline collection upon new
                                        command.
                                        Offline surface scan supported.
                                        Self-test supported.
                                        Conveyance Self-test supported.
                                        Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        General Purpose Logging supported.
Short self-test routine
recommended polling time:        (   2) minutes.
Extended self-test routine
recommended polling time:        ( 357) minutes.
Conveyance self-test routine
recommended polling time:        (   5) minutes.
SCT capabilities:              (0x7035) SCT Status supported.
                                        SCT Feature Control supported.
                                        SCT Data Table supported.
SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME           FLAG    VALUE WORST THRESH TYPE     UPDATED WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate      0x002f  200   200   051    Pre-fail Always      -       0
  3 Spin_Up_Time             0x0027  246   154   021    Pre-fail Always      -       6691
  4 Start_Stop_Count         0x0032  100   100   000    Old_age  Always      -       14
  5 Reallocated_Sector_Ct    0x0033  200   200   140    Pre-fail Always      -       0
  7 Seek_Error_Rate          0x002e  200   200   000    Old_age  Always      -       0
  9 Power_On_Hours           0x0032  094   094   000    Old_age  Always      -       4869
 10 Spin_Retry_Count         0x0032  100   253   000    Old_age  Always      -       0
 11 Calibration_Retry_Count  0x0032  100   253   000    Old_age  Always      -       0
 12 Power_Cycle_Count        0x0032  100   100   000    Old_age  Always      -       14
 16 Unknown_Attribute        0x0022  130   070   000    Old_age  Always      -       2289651870502
192 Power-Off_Retract_Count  0x0032  200   200   000    Old_age  Always      -       12
193 Load_Cycle_Count         0x0032  200   200   000    Old_age  Always      -       2
194 Temperature_Celsius      0x0022  117   111   000    Old_age  Always      -       35
196 Reallocated_Event_Count  0x0032  200   200   000    Old_age  Always      -       0
197 Current_Pending_Sector   0x0032  200   200   000    Old_age  Always      -       0
198 Offline_Uncorrectable    0x0030  200   200   000    Old_age  Offline     -       0
199 UDMA_CRC_Error_Count     0x0032  200   200   000    Old_age  Always      -       0
200 Multi_Zone_Error_Rate    0x0008  200   200   000    Old_age  Offline     -       0

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline       Completed without error       00%             4869  -

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.

===== /dev/rdsk/c0t5d0s0 :
smartctl 6.0 2012-10-10 r3643 [i386-pc-solaris2.11] (local build)
Copyright (C) 2002-12, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family:     Seagate SV35
Device Model:     ST3000VX000-1ES166
Serial Number:    Z500S3L8
LU WWN Device Id: 5 000c50 079e3757b
Firmware Version: CV26
User Capacity:    3,000,592,982,016 bytes [3.00 TB]
Sector Sizes:     512 bytes logical, 4096 bytes physical
Rotation Rate:    7200 rpm
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   ACS-2, ACS-3 T13/2161-D revision 3b
SATA Version is:  SATA 3.1, 6.0 Gb/s (current: 3.0 Gb/s)
Local Time is:    Thu Oct 22 18:45:28 2015 CEST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x00) Offline data collection activity
                                        was never started.
                                        Auto Offline Data Collection: Disabled.
Self-test execution status:      ( 249) Self-test routine in progress...
                                        90% of test remaining.
Total time to complete Offline
data collection:                (   80) seconds.
Offline data collection
capabilities:                    (0x73) SMART execute Offline immediate.
                                        Auto Offline data collection on/off support.
                                        Suspend Offline collection upon new
                                        command.
                                        No Offline surface scan supported.
                                        Self-test supported.
                                        Conveyance Self-test supported.
                                        Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        General Purpose Logging supported.
Short self-test routine
recommended polling time:        (   1) minutes.
Extended self-test routine
recommended polling time:        ( 325) minutes.
Conveyance self-test routine
recommended polling time:        (   2) minutes.
SCT capabilities:              (0x10b9) SCT Status supported.
                                        SCT Error Recovery Control supported.
                                        SCT Feature Control supported.
                                        SCT Data Table supported.

SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME           FLAG    VALUE WORST THRESH TYPE     UPDATED WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate      0x000f  105   099   006    Pre-fail Always      -       8600880
  3 Spin_Up_Time             0x0003  096   094   000    Pre-fail Always      -       0
  4 Start_Stop_Count         0x0032  100   100   020    Old_age  Always      -       19
  5 Reallocated_Sector_Ct    0x0033  100   100   010    Pre-fail Always      -       0
  7 Seek_Error_Rate          0x000f  085   060   030    Pre-fail Always      -       342685681
  9 Power_On_Hours           0x0032  096   096   000    Old_age  Always      -       4214
 10 Spin_Retry_Count         0x0013  100   100   097    Pre-fail Always      -       0
 12 Power_Cycle_Count        0x0032  100   100   020    Old_age  Always      -       19
184 End-to-End_Error         0x0032  100   100   099    Old_age  Always      -       0
187 Reported_Uncorrect       0x0032  100   100   000    Old_age  Always      -       0
188 Command_Timeout          0x0032  100   100   000    Old_age  Always      -       0
189 High_Fly_Writes          0x003a  028   028   000    Old_age  Always      -       72
190 Airflow_Temperature_Cel  0x0022  069   065   045    Old_age  Always      -       31 (Min/Max 29/32)
191 G-Sense_Error_Rate       0x0032  100   100   000    Old_age  Always      -       0
192 Power-Off_Retract_Count  0x0032  100   100   000    Old_age  Always      -       19
193 Load_Cycle_Count         0x0032  100   100   000    Old_age  Always      -       28
194 Temperature_Celsius      0x0022  031   040   000    Old_age  Always      -       31 (0 20 0 0 0)
197 Current_Pending_Sector   0x0012  100   100   000    Old_age  Always      -       0
198 Offline_Uncorrectable    0x0010  100   100   000    Old_age  Offline     -       0
199 UDMA_CRC_Error_Count     0x003e  200   200   000    Old_age  Always      -       0

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
Num  Test_Description    Status                         Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Self-test routine in progress        90%             4214  -
# 2  Short offline       Completed without error              00%             4214  -

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.
===== /dev/rdsk/c4t5000CCA02A1292DDd0s0 :
smartctl 6.0 2012-10-10 r3643 [i386-pc-solaris2.11] (local build)
Copyright (C) 2002-12, Bruce Allen, Christian Franke, www.smartmontools.org

Vendor:               HITACHI
Product:              HUS156030VLS600
Revision:             HPH1
User Capacity:        300,000,000,000 bytes [300 GB]
Logical block size:   512 bytes
Logical Unit id:      0x5000cca02a1292dc
Serial number:        LVVA6NHS
Device type:          disk
Transport protocol:   SAS
Local Time is:        Thu Oct 22 18:45:29 2015 CEST
Device supports SMART and is Enabled
Temperature Warning Enabled
SMART Health Status: OK

Current Drive Temperature:     45 C
Drive Trip Temperature:        70 C
Manufactured in week 14 of year 2012
Specified cycle count over device lifetime:  50000
Accumulated start-stop cycles:  80
Elements in grown defect list: 0
Vendor (Seagate) cache information
  Blocks sent to initiator = 2340336504406016

Error counter log:
           Errors Corrected by           Total   Correction     Gigabytes    Total
               ECC          rereads/    errors   algorithm      processed    uncorrected
           fast | delayed   rewrites  corrected  invocations   [10^9 bytes]  errors
read:          0   888890         0     888890            0      29326.957         0
write:         0   961315         0     961315            0       6277.560         0

Non-medium error count:      283

SMART Self-test log
Num  Test              Status                 segment  LifeTime  LBA_first_err [SK ASC ASQ]
     Description                              number   (hours)
# 1  Background long   Self test in progress ...    -      NOW   - [-   -    -]
# 2  Background long   Aborted (device reset ?)     -    14354   - [-   -    -]
# 3  Background short  Completed                    -    14354   - [-   -    -]
# 4  Background long   Aborted (device reset ?)     -    14354   - [-   -    -]
# 5  Background long   Aborted (device reset ?)     -    14354   - [-   -    -]

Long (extended) Self Test duration: 2506 seconds [41.8 minutes]

The zpool scrub results and general layout:

# zpool status -v
  pool: pool
 state: ONLINE
  scan: scrub repaired 0 in 164h13m with 0 errors on Thu Oct 22 18:13:33 2015
config:

        NAME                       STATE     READ WRITE CKSUM
        pool                       ONLINE       0     0     0
          mirror-0                 ONLINE       0     0     0
            c0t3d0                 ONLINE       0     0     0
            c0t5d0                 ONLINE       0     0     0
        logs
          c4t5000CCA02A1292DDd0p2  ONLINE       0     0     0
        cache
          c4t5000CCA02A1292DDd0p3  ONLINE       0     0     0

errors: No known data errors

  pool: rpool
 state: ONLINE
status: Some supported features are not enabled on the pool. The pool can
        still be used, but some features are unavailable.
action: Enable all features using 'zpool upgrade'. Once this is done,
        the pool may no longer be accessible by software that does not
        support the features. See zpool-features(5) for details.
  scan: scrub repaired 0 in 3h3m with 0 errors on Thu Oct  8 04:12:35 2015
config:

        NAME                       STATE     READ WRITE CKSUM
        rpool                      ONLINE       0     0     0
          c4t5000CCA02A1292DDd0s0  ONLINE       0     0     0

errors: No known data errors

# zpool list -v
NAME                        SIZE  ALLOC   FREE  EXPANDSZ   FRAG    CAP  DEDUP  HEALTH  ALTROOT
pool                       2.72T  2.52T   207G         -    68%    92%  1.36x  ONLINE  /
  mirror                   2.72T  2.52T   207G         -    68%    92%
    c0t3d0                     -      -      -         -      -      -
    c0t5d0                     -      -      -         -      -      -
log                            -      -      -         -      -      -
  c4t5000CCA02A1292DDd0p2      8G   148K  8.00G        -     0%     0%
cache                          -      -      -         -      -      -
  c4t5000CCA02A1292DDd0p3    120G  1.80G   118G        -     0%     1%
rpool                       151G  33.0G   118G         -    76%    21%  1.00x  ONLINE  -
  c4t5000CCA02A1292DDd0s0    151G  33.0G   118G        -    76%    21%

Note the long scrub time may have included the downtime while the system was frozen until it was rebooted.

Thanks in advance for the fresh pairs of eyeballs,
Jim Klimov
_______________________________________________
OmniOS-discuss mailing list
OmniOS-discuss at lists.omniti.com
http://lists.omniti.com/mailman/listinfo/omnios-discuss

________________________________

This e-mail and any documents accompanying it may contain legally privileged and/or confidential information belonging to Exegy, Inc. Such information may be protected from disclosure by law.
The information is intended for use by only the addressee. If you are not the intended recipient, you are hereby notified that any disclosure or use of the information is strictly prohibited. If you have received this e-mail in error, please immediately contact the sender by e-mail or phone regarding instructions for return or destruction and do not use or disclose the content to others.

From lotheac at iki.fi  Fri Oct 23 06:32:03 2015
From: lotheac at iki.fi (Lauri Tirkkonen)
Date: Fri, 23 Oct 2015 09:32:03 +0300
Subject: [OmniOS-discuss] UPDATE NOW --> ntp to 4.2.8p4
In-Reply-To: <22057.28087.51441.685509@glaurung.bb-c.de>
References: <36D4EF6A-C596-441B-828B-862D5EB9423E@omniti.com>
	<22057.12713.886330.955767@glaurung.bb-c.de>
	<20151022194134.GC77@gutsman.lotheac.fi>
	<22057.28087.51441.685509@glaurung.bb-c.de>
Message-ID: <20151023063203.GD77@gutsman.lotheac.fi>

We may be getting a bit off topic here :)

On Fri, Oct 23 2015 01:13:59 +0200, Volker A. Brandt wrote:
> Lauri Tirkkonen writes:
> > Well, that's not a design flaw. Actuators are executed only when the
> > action (eg. file) specifying them changes
>
> Yes, this is how IPS does it. IPS does not really know that the manifest-import service is special. There should have been an explicit "re-import this manifest now" actuator, much like users or groups are created.

But it isn't really that special. If you ship a different service manifest, it gets reimported with 'svccfg import' -- but this does *not* affect whether the service is running or restart it (IME). Reimporting a manifest does, however, cause a service refresh, but the refresh action doesn't necessarily restart any processes:

mail ~ # svcs spamd
STATE          STIME    FMRI
online         Sep_30   svc:/mail/spamassassin/spamd:default
mail ~ # svccfg import /lib/svc/manifest/mail/spamassassin-spamd.xml
mail ~ # svcs -vp spamd
STATE          NSTATE        STIME    CTID   FMRI
online         -             6:15:41    341 svc:/mail/spamassassin/spamd:default
               Sep_30         3619 spamd
               Oct_20        19590 spamd
               Oct_16        29484 spamd
mail ~ # tail -2 $(svcs -L spamd)
[ Oct 23 06:15:41 Rereading configuration. ]
[ Oct 23 06:15:41 No 'refresh' method defined. Treating as :true. ]

I don't know what you mean with the bit about users and groups; AFAICT the logic is similar to other actions.

> [...]
> > If you wanted to restart ntp when any files in the ntp package
> > change on update, you would need an actuator like
> > 'restart_fmri=svc:/network/ntp:default' on *all* file actions
> > delivered by the package.
>
> I know what you mean. That might work, but that is normally not what you do when you deliver an SMF manifest in your package. You just drop it and restart manifest-import, and hope that manifest-import will see your new manifest. This is quite a different thing.

It is actually what I normally do, because no other way works :) See for example the following mog file for ISC bind:
https://github.com/niksula/omnios-build-scripts/blob/master/bind/local.mog#L6
Even the manifest-import restart is an actuator added by omnios-build (from global.mog), nothing happens automatically.

> Also, what you wrote is not quite true. What you wanted to write was "you would need an actuator on *at least one* file action *that has a different file hash*". If nothing is different, the action is not executed, and the attached actuator does not fire.

We're both correct. Note that I said "when _any_ files in the ntp package change on update" (emphasis added).

> And it gets worse when you remove a package that contains a manifest for a running SMF service, because it is impossible to call the stop method of the service before removing the package. Lots of fun. :-)

I don't know why it would be impossible before or even after removing the package. After the manifest has been imported into the SMF repository, the service will not go away even if you remove the xml manifest it came from. If you meant that pkg can't automatically stop your service when removing a package, maybe disable_fmri is something you want? From pkg(5):

    disable_fmri    causes the given FMRI to be disabled prior to action
                    removal, per the disable subcommand to svcadm(1M).
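As a pkgmogrify transform, that could look roughly like this (a sketch, untested; the ntp FMRI is reused from the earlier discussion, not something any package here is claimed to ship):

# disable the service before its files are removed on 'pkg uninstall'
<transform file -> default disable_fmri svc:/network/ntp:default>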
-- 
Lauri Tirkkonen | lotheac @ IRCnet

From lotheac at iki.fi  Fri Oct 23 06:44:58 2015
From: lotheac at iki.fi (Lauri Tirkkonen)
Date: Fri, 23 Oct 2015 09:44:58 +0300
Subject: [OmniOS-discuss] UPDATE NOW --> ntp to 4.2.8p4
In-Reply-To: <22057.28464.529688.787911@glaurung.bb-c.de>
References: <36D4EF6A-C596-441B-828B-862D5EB9423E@omniti.com>
	<22057.12713.886330.955767@glaurung.bb-c.de>
	<22057.28464.529688.787911@glaurung.bb-c.de>
Message-ID: <20151023064458.GE77@gutsman.lotheac.fi>

On Fri, Oct 23 2015 01:20:16 +0200, Volker A. Brandt wrote:
> Eric Sproul writes:
> > The service has restarted correctly for me on both 006 and 014 with
> > this update. I'm not sure why that is though, because you're
> > correct that the ntp.xml file has not changed in all of the '014
> > versions published. I was under the impression that the
> > restart_fmri actuator would only fire when the associated action was
> > triggered.
>
> Yes, that's why it did not restart on my 014 box, hence I noticed.
> Maybe you were on an earlier rev where the manifest really did change?
> I guess it also restarted when Dan tested, or else he would not have
> mentioned that pkg DTRT.

I don't know why the service would restart even if the manifest did change, because reimporting the manifest only triggers a refresh (see my other mail). I find it strange that Eric mentions it did.

> > However, if we really *do* want to restart ntp when the *daemon*
> > updates, then we could add a restart_fmri actuator on the
> > usr/lib/inet/ntpd file. Thus, whenever that file is updated,
> > svc:/network/ntp:default could be restarted.
>
> It's been a while since I last tried, but I think this will not work,
> at least not in some corner cases, e.g. when the pkg is not installed
> at all, and the svc:/network/ntp:default service does not exist when
> the pkg is installed. ISTR that pkg install will error out. Hmmm,
> need to test that again sometime. :-)

I don't know about when pkg is not installed (how would you even update or install the package then?), but on a package containing both the manifest-import restart actuator on the manifest file as well as a restart_fmri for the new service on the other files, 'pkg install -v' does mention that both services will be restarted, even though one of them doesn't exist at install time. It has never caused any errors for me though, and we've been shipping several different packages containing services for quite a while now.
-- 
Lauri Tirkkonen | lotheac @ IRCnet

From lotheac at iki.fi  Fri Oct 23 07:00:38 2015
From: lotheac at iki.fi (Lauri Tirkkonen)
Date: Fri, 23 Oct 2015 10:00:38 +0300
Subject: [OmniOS-discuss] UPDATE NOW --> ntp to 4.2.8p4
In-Reply-To: <20151023064458.GE77@gutsman.lotheac.fi>
References: <36D4EF6A-C596-441B-828B-862D5EB9423E@omniti.com>
	<22057.12713.886330.955767@glaurung.bb-c.de>
	<22057.28464.529688.787911@glaurung.bb-c.de>
	<20151023064458.GE77@gutsman.lotheac.fi>
Message-ID: <20151023070038.GF77@gutsman.lotheac.fi>

On Fri, Oct 23 2015 09:44:58 +0300, Lauri Tirkkonen wrote:
> On Fri, Oct 23 2015 01:20:16 +0200, Volker A. Brandt wrote:
> > It's been a while since I last tried, but I think this will not work,
> > at least not in some corner cases, e.g. when the pkg is not installed
> > at all
>
> I don't know about when pkg is not installed (how would you even update
> or install the package then?)

Sorry, looks like I failed at reading comprehension (I missed the "the" in "the pkg").

-- 
Lauri Tirkkonen | lotheac @ IRCnet

From richard at netbsd.org  Fri Oct 23 07:25:14 2015
From: richard at netbsd.org (Richard PALO)
Date: Fri, 23 Oct 2015 09:25:14 +0200
Subject: [OmniOS-discuss] ILB memory leak?
In-Reply-To: <5628A198.5040808@scluk.com>
References: <56276437.2020109@scluk.com>
	<00AE5FA3-E699-4C4F-8A94-AEEDAAED0856@omniti.com>
	<5628A198.5040808@scluk.com>
Message-ID: <5629E0DA.5070402@netbsd.org>

On 22/10/15 10:43, Al Slater wrote:
> I am seeing kernel memory consumption increasing as well, but that may
> be a different issue. The ilbd process memory is definitely growing.

this is indeed probably a different issue, but it would be useful to create a thread on illumos discuss as I'm seeing it as well (not using ILB).. for example, running a number of rather intensive builds I see kernel steadily going up to ~40%!!:

> richard at omnis:/home/richard$ swap -hs ; echo ::memstat |pfexec mdb -k
> total: 1,8G allocated + 311M reserved = 2,1G used, 40G available
> Page Summary                Pages                MB  %Tot
> ------------     ----------------  ----------------  ----
> Kernel                    3231113             12621   39%
> ZFS File Data             2944763             11502   35%
> Anon                       452803              1768    5%
> Exec and libs                5088                19    0%
> Page cache                  65892               257    1%
> Free (cachelist)            70820               276    1%
> Free (freelist)           1614595              6307   19%
>
> Total                     8385074             32754
> Physical                  8385072             32754

-- 
Richard PALO

From matej at zunaj.si  Fri Oct 23 11:55:03 2015
From: matej at zunaj.si (Matej Zerovnik)
Date: Fri, 23 Oct 2015 13:55:03 +0200
Subject: [OmniOS-discuss] Slow performance with ZeusRAM?
In-Reply-To:
References: <10FE1CC1-F9F5-433A-9A2D-6570C4EE6CCF@zunaj.si>
	<95ADB53F-BBDD-4CCD-959F-0A174E7DA8F2@zunaj.si>
Message-ID: <792FFED6-445A-41E6-9FB3-ADD6E8F7310E@zunaj.si>

> On 22 Oct 2015, at 23:51, Bob Friesenhahn wrote:
>
> On Thu, 22 Oct 2015, Matej Zerovnik wrote:
>> Bob:
>> Yes, my ZFS is ashift=12, since all my drives report 4k blocks (is that what you meant?). The pool is completely empty, so there is enough place for
>> writes, so write speed should not be limited because of COW. Looking at iostat, there are no reads on the drives at all.
>> I'm not sure where fio gets its data, probably from /dev/zero or somewhere?
>
> To be clear, zfs does not overwrite blocks. Instead zfs modifies (in memory) any prior data from a block, and then it writes the block data to a new location. This is called "copy on write". If the prior data would not be entirely overwritten and is not already cached in memory, then it needs to be read from the underlying disk.
>
> It is interesting that you say there are no reads on the drives at all.

I think I got it. I think in my case, there are no reads because the whole pool is empty and fresh data is written to the pool. So there is no rewrite, just pure writes...

I did some more testing with recordsize=4k (need to repeat it with 128k as well) and it looks like I can get up to 48k IOPS when doing sequential 4k writes running 6x dd (dd if=/dev/zero of=/pool/folder/file1 bs=4k count=100000).

When I switch to random writes (this time I tried iozone instead of fio), I can only get up to 58 MB/s, which translates to around 14,500 IOPS (although iostat is showing higher values).

iostat during random write:
   r/s      w/s    kr/s      kw/s  wait  actv  wsvc_t  asvc_t  %w  %b  device
   0.0  29805.4     0.0  119209.6   0.0   3.0     0.0     0.1   3  83  c9t5000A72A300B3D9Dd0
   0.0  29807.4     0.0  119217.6   0.0   4.7     0.0     0.2   4  83  c10t5000A72A300B3D7Ed0
   0.0    589.7     0.0   62794.9   0.0   0.7     0.0     1.2   1  62  c10t5000C500837549F9d0
   0.0    622.6     0.0   66173.8   0.0   0.6     0.0     1.0   1  54  c10t5000C50083759089d0
   0.0    609.6     0.0   66173.9   0.0   0.6     0.0     1.0   1  54  c10t5000C500837557EDd0

How come I can see 30k IOPS flowing to the ZeusRAMs, but I only see 58 MB/s being written to the hard drives?
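For what it's worth, the average write size in that sample is kw/s divided by w/s:

  ZeusRAMs:  119217.6 / 29807.4 ~= 4.0 kB per write
  spindles:   62794.9 /   589.7 ~= 106 kB per write

so the slog devices are taking the 4k writes roughly one-for-one, while the TXG sync aggregates them into much larger writes to the data disks.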
I tried running the scripts from http://dtrace.org/blogs/ahl/2014/08/31/openzfs-tuning: a txg flush takes around 0.3s, and average txg usage is:

  3  12645  txg_sync_thread:txg-syncing  742MB of 4096MB used
 15  12645  txg_sync_thread:txg-syncing 1848MB of 4096MB used
 21  12645  txg_sync_thread:txg-syncing 1467MB of 4096MB used
 20  12645  txg_sync_thread:txg-syncing 2231MB of 4096MB used
 16  12645  txg_sync_thread:txg-syncing 1237MB of 4096MB used
 14  12645  txg_sync_thread:txg-syncing 1624MB of 4096MB used
  1  12645  txg_sync_thread:txg-syncing 1130MB of 4096MB used
  9  12645  txg_sync_thread:txg-syncing 1750MB of 4096MB used
  9  12645  txg_sync_thread:txg-syncing 1300MB of 4096MB used
 18  12645  txg_sync_thread:txg-syncing 2396MB of 4096MB used

If I understand that correctly, the spindles have no problem writing data from the write cache to disk.

I have a feeling I have a real problem with understanding how things work in ZFS :) I always used a simple explanation: as fast as the system can write to the ZIL, that is the speed at which a program can write to the filesystem. I guess not?

Matej

From danmcd at omniti.com  Fri Oct 23 15:24:23 2015
From: danmcd at omniti.com (Dan McDonald)
Date: Fri, 23 Oct 2015 11:24:23 -0400
Subject: [OmniOS-discuss] OmniOS backup box hanging regularly
In-Reply-To:
References:
Message-ID: <74C4EFF5-B1D7-49DE-81D5-54CE8787C258@omniti.com>

> On Oct 23, 2015, at 12:42 AM, Hildebrandt, Bill wrote:
>
> Controllers: 2x LSI 9201-16i with SAS-SATA break-out cables going to a SAS846TQ backplane
> System disk: mirrored Innodisk 128GB SATADOMs (SATADOM-MV 3ME) off mobo
> ZIL: mirrored Samsung SSD 845D (400GB) off mobo
> L2ARC: SAMSUNG MZHPU512 (512GB PCIe)
> NICs: Intel 10G (X520)
> Data disks: 24x WD 1TB Enterprise SATA (WDC WD1002FBYS)

SATA disks attached to a SAS expander (this is the SAS846TQ, an expander, right?) are known to be a dangerous deployment. SATA doesn't have the reporting capabilities SAS does, and many SAS expanders don't relay SATA well enough.

We do not support paying customers who deploy like this.

FYI,
Dan

From richard.elling at richardelling.com  Fri Oct 23 16:42:10 2015
From: richard.elling at richardelling.com (Richard Elling)
Date: Fri, 23 Oct 2015 09:42:10 -0700
Subject: [OmniOS-discuss] Slow performance with ZeusRAM?
In-Reply-To: <10FE1CC1-F9F5-433A-9A2D-6570C4EE6CCF@zunaj.si>
References: <10FE1CC1-F9F5-433A-9A2D-6570C4EE6CCF@zunaj.si>
Message-ID:

additional insight below...

> On Oct 22, 2015, at 12:02 PM, Matej Zerovnik wrote:
>
> Hello,
>
> I'm building a new system and I'm having a bit of a performance problem. Well, it's either that or I'm not getting the whole ZIL idea :)
>
> My system is as follows:
> - IBM xServer 3550 M4 server (dual CPU with 160GB memory)
> - LSI 9207 HBA (P19 firmware)
> - Supermicro JBOD with SAS expander
> - 4TB SAS3 drives
> - ZeusRAM for ZIL
> - LTS OmniOS (all patches applied)
>
> If I benchmark the ZeusRAM on its own with random 4k sync writes, I can get 48k IOPS out of it, no problem there.

Do not assume writes to the slog for a 4k random write workload are only 4k in size. You'll want to measure to be sure, but the worst case here is 8k written to the slog:

  4k data + 4k chain pointer = 8k physical write

There are cases where multiple 4k data blocks get coalesced, so the above is the worst case. Measure to be sure. A quick back-of-the-napkin measurement can be done from iostat -x output. More detailed measurements can be done with zilstat or other specific dtracing.
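For example, assuming the zilstat script is present on the box (a sketch, untested):

# ./zilstat.ksh 1 10    # one-second samples of ZIL bytes and ops

The per-write size buckets in its output make the 4k-vs-8k question above directly visible.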
-- richard

> If I create a new raidz2 pool with 10 hard drives, mirrored ZeusRAMs for
> ZIL and set sync=always, I can only squeeze 14k IOPS out of the system.
> Is that normal or should I be getting 48k IOPS on the 2nd pool as well, since this is the performance the ZeusRAM can deliver?
>
> I'm testing with fio:
> fio --filename=/pool0/test01 --size=5g --rw=randwrite --refill_buffers --norandommap --randrepeat=0 --ioengine=solarisaio --bs=4k --iodepth=16 --numjobs=16 --runtime=60 --group_reporting --name=4ktest
>
> thanks, Matej
> _______________________________________________
> OmniOS-discuss mailing list
> OmniOS-discuss at lists.omniti.com
> http://lists.omniti.com/mailman/listinfo/omnios-discuss

From jimklimov at cos.ru  Fri Oct 23 16:54:27 2015
From: jimklimov at cos.ru (Jim Klimov)
Date: Fri, 23 Oct 2015 18:54:27 +0200
Subject: [OmniOS-discuss] OmniOS backup box hanging regularly
In-Reply-To:
References:
Message-ID: <02DF3A33-F955-4F86-A478-0D639CB500F1@cos.ru>

On 23 October 2015 11:23:28 CEST, Jim Klimov wrote:
> A new bit of info came in. I left the box running along with an SSH session running various tracers overnight, and it seems that the system dumbly ran out of memory.
>
> However it never logged any forking errors, etc. which were typical for similar cases before, and did not recover by time or "magic" (e.g. processes dying on ENOMEM and so freeing it up). There is some swap free, too (which wouldn't help if it ran out of kernel memory). The 64Mb free RAM (or 32Mb in some of my older experiences) is the empiric minimum under which illumos is good as dead ;)
>
> The box is pingable after all, but no SSH nor local usability. The "top" listings froze at 4:25am, that's some 7 hours ago.
>
> last pid: 26331;  load avg: 6.65, 5.61, 6.88;  up 0+19:42:47    04:25:17
> 208 processes: 178 sleeping, 28 running, 1 zombie, 1 on cpu
> CPU states: 21.5% idle, 3.5% user, 75.0% kernel, 0.0% iowait, 0.0% swap
> Memory: 16G phys mem, 64M free mem, 2048M total swap, 1420M free swap
>   PID USERNAME LWP PRI NICE  SIZE   RES STATE    TIME    CPU COMMAND
> 24910 zfssnap    1  60    2  243M  238M run      1:24  1.61% zfs
> 25620 zfssnap    1  53    2  220M  215M run      1:18  1.17% zfs
> 25619 zfssnap    1  53    2  220M  215M run      1:18  1.15% zfs
> 24753 zfssnap    1  53    2  243M  238M run      1:26  1.12% zfs
> 25864 zfssnap    1  53    2  220M  215M run      1:19  0.93% zfs
> 25861 zfssnap    1  53    2   13M   10M sleep    0:03  0.87% zfs
> 22380 zfssnap    1  60    2  764M  721M run      4:34  0.83% zfs
> 22546 zfssnap    1  53    2  698M  672M sleep    4:08  0.79% zfs
> 25857 zfssnap    1  53    2  220M  215M run      1:19  0.78% zfs
> 24224 zfssnap    1  60    2  548M  536M run      3:13  0.76% zfs
> 22901 zfssnap    1  60    2  698M  672M run      4:09  0.75% zfs
> 22551 zfssnap    1  60    2  698M  672M sleep    4:08  0.73% zfs
> 22373 zfssnap    1  60    2  767M  729M run      4:33  0.73% zfs
> 22212 zfssnap    1  60    2  767M  730M sleep    4:36  0.71% zfs
> 24215 zfssnap    1  60    2  549M  537M sleep    3:13  0.69% zfs
>
> Heh, by sheer coincidence, it froze 10 hours 1 second after I logged in (which in my profile also prints a few lines from top):
>
> last pid: 22036;  load avg: 10.7, 10.3, 10.4;  up 0+09:42:46    18:25:16
> 126 processes: 115 sleeping, 9 running, 2 on cpu
> CPU states: 0.0% idle, 21.4% user, 78.6% kernel, 0.0% iowait, 0.0% swap
> Memory: 16G phys mem, 1713M free mem, 2048M total swap, 2048M free swap
>   PID USERNAME LWP PRI NICE  SIZE   RES STATE    TIME    CPU COMMAND
> 21266 zfssnap    1  46    2  660M  657M run      3:50 28.20% zfs
> 21274 zfssnap    1  45    2  672M  669M run      3:55 16.70% zfs
> 21757 zfssnap    1  50    2  314M  311M run      1:48  8.95% zfs
> 21389 zfssnap    1  52    2  586M  584M run      3:24  8.82% zfs
> 21173 zfssnap    1  53    2  762M  759M cpu/1    4:28  8.76% zfs
>
> So at the moment it seems there is some issue with zfs-auto-snapshots on OmniOS that I haven't seen in SXCE, OI nor Hipster. Possibly I had different implementations of the service in these different OSes (shell vs python, and all that at different versions).
>
> I'll see if toning down the frequency of autosnaps (e.g. disable "frequent" or "hourly" schedules) would help improve stability. If it does - I'd still call it a bug. System should not die like that. And the actual load (as in I/O ops) is seemingly not that gigantic.
>
> Jim
>
> ----- Original Message -----
> From: Jim Klimov
> Date: Thursday, October 22, 2015 20:02
> Subject: [OmniOS-discuss] OmniOS backup box hanging regularly
> To: OmniOS-discuss
>
>> Hello all,
>> I have this HP-Z400 workstation with 16Gb ECC (should be) RAM running OmniOS bloody, which acts as a backup server for our production systems (regularly rsync'ing large files off Linux boxes, and rotating ZFS auto-snapshots to keep its space free). Sometimes it also runs replicas of infrastructure (DHCP, DNS) and was set up as a VirtualBox + phpVirtualBox host to test that out, but no VMs running.
>> So the essential loads are ZFS snapshots and ZFS scrubs :)
>> And it freezes roughly every week. Stops responding to ping, attempts to log in via SSH or physical console - it processes keypresses on the latter, but does not present a login prompt. It used to be stable, and such regular hangs began around summertime.
>>
>> My primary guess would be for flaky disks, maybe timing out under load or going to sleep or whatever... But I have yet to prove it, or any other theory. Maybe just CPU is overheating due to regular near-100% load with disk I/O... At least I want to rule out OS errors and rule out (or point out) operator/box errors as much as possible - which is something I can change to try and fix ;)
Maybe just CPU is overheating due to regular >near-100% load with disk I/O... At least I want to rule out OS errors >and rule out (or point out) operator/box errors as much as possible - >which is something I can change to try and fix ;) >> Before I proceed to TL;DR screenshots, I'd overview what I see: >> * In the "top" output, processes owned by zfssnap lead most of the >time... But even the SSH shell is noticeably slow to respond (1 sec per >line when just pressing enter to clear the screen to prepare nice >screenshots). >> * SMART was not enabled on 3TB mirrored "pool" SATA disks (is now, >long tests initiated), but was in place on the "rpool" SAS disk where >it logged some corrected ECC errors - but none uncorrected. >> Maybe the cabling should be reseated. >> * iostat shows disks are generally not busy (they don't audibly >rattle nor visibly blink all the time, either) >> * zpool scrubs return clean >> * there are partitions of the system rpool disk (10K RPM SAS) used as >log and cache devices for the main data pool on 3TB SATA disks. The >system disk is fast and underutilized, so what the heck ;) And it was >not a problem for the first year of this system's honest and stable >workouts. These devices are pretty empty at the moment. >> >> I have enabled deadman panics according to Wiki, but none have >happened so far: >> # cat /etc/system | egrep -v '(^\*|^$)' >> set snooping=1 >> set pcplusmp:apic_panic_on_nmi=1 >> set apix:apic_panic_on_nmi = 1 > >> >> >> In the "top" output, processes owned by zfssnap lead most of the >time: >> >> last pid: 22599; load avg: 12.9, 12.2, 11.2; up 0+09:52:11 > 18:34:41 >> 140 processes: 125 sleeping, 13 running, 2 on cpu >> CPU states: 0.0% idle, 22.9% user, 77.1% kernel, 0.0% iowait, 0.0% >swap >> Memory: 16G phys mem, 1765M free mem, 2048M total swap, 2048M free >swap >> Seconds to delay: >> PID USERNAME LWP PRI NICE SIZE RES STATE TIME CPU COMMAND >> 21389 zfssnap 1 43 2 863M 860M run 5:04 35.61% zfs >> 22360 zfssnap 1 52 2 118M 115M run 0:37 16.50% zfs >> 21778 zfssnap 1 52 2 563M 560M run 3:15 13.17% zfs >> 21278 zfssnap 1 52 2 947M 944M run 5:32 6.91% zfs >> 21881 zfssnap 1 43 2 433M 431M run 2:31 5.41% zfs >> 21852 zfssnap 1 52 2 459M 456M run 2:39 5.16% zfs >> 21266 zfssnap 1 43 2 906M 903M run 5:18 3.95% zfs >> 21757 zfssnap 1 43 2 597M 594M run 3:26 2.91% zfs >> 21274 zfssnap 1 52 2 930M 927M cpu/0 5:27 2.78% zfs >> 22588 zfssnap 1 43 2 30M 27M run 0:08 2.48% zfs >> 22580 zfssnap 1 52 2 49M 46M run 0:14 0.71% zfs >> 22038 root 1 59 0 5312K 3816K cpu/1 0:01 0.10% top >> 22014 root 1 59 0 8020K 4988K sleep 0:00 0.02% sshd > >> >> Average "iostats" are not that busy: >> >> # zpool iostat -Td 5 >> Thu Oct 22 18:24:59 CEST 2015 >> capacity operations bandwidth >> pool alloc free read write read write >> ---------- ----- ----- ----- ----- ----- ----- >> pool 2.52T 207G 802 116 28.3M 840K >> rpool 33.0G 118G 0 4 4.52K 58.7K >> ---------- ----- ----- ----- ----- ----- ----- >> Thu Oct 22 18:25:04 CEST 2015 >> pool 2.52T 207G 0 0 0 0 >> rpool 33.0G 118G 0 10 0 97.9K >> ---------- ----- ----- ----- ----- ----- ----- >> Thu Oct 22 18:25:09 CEST 2015 >> pool 2.52T 207G 0 0 0 0 >> rpool 33.0G 118G 0 0 0 0 >> ---------- ----- ----- ----- ----- ----- ----- >> Thu Oct 22 18:25:14 CEST 2015 >> pool 2.52T 207G 0 0 0 0 >> rpool 33.0G 118G 0 9 0 93.5K >> ---------- ----- ----- ----- ----- ----- ----- >> Thu Oct 22 18:25:19 CEST 2015 >> pool 2.52T 207G 0 0 0 0 >> rpool 33.0G 118G 0 0 0 0 >> ---------- ----- ----- ----- ----- ----- ----- >> Thu Oct 22 18:25:24 
CEST 2015 >> pool 2.52T 207G 0 0 0 0 >> rpool 33.0G 118G 0 0 0 0 >> ---------- ----- ----- ----- ----- ----- ----- >> Thu Oct 22 18:25:29 CEST 2015 >> pool 2.52T 207G 0 0 0 0 >> rpool 33.0G 118G 0 0 0 0 >> ---------- ----- ----- ----- ----- ----- ----- >> Thu Oct 22 18:25:34 CEST 2015 >> pool 2.52T 207G 0 0 0 0 >> rpool 33.0G 118G 0 0 0 0 >> ---------- ----- ----- ----- ----- ----- ----- >> Thu Oct 22 18:25:39 CEST 2015 >> pool 2.52T 207G 0 0 0 0 >> rpool 33.0G 118G 0 16 0 374K >> ---------- ----- ----- ----- ----- ----- ----- >> ... >> Thu Oct 22 18:33:49 CEST 2015 >> pool 2.52T 207G 0 0 0 0 >> rpool 33.0G 118G 0 11 0 94.5K >> ---------- ----- ----- ----- ----- ----- ----- >> Thu Oct 22 18:33:54 CEST 2015 >> pool 2.52T 207G 0 13 819 80.0K >> rpool 33.0G 118G 0 0 0 0 >> ---------- ----- ----- ----- ----- ----- ----- >> Thu Oct 22 18:33:59 CEST 2015 >> pool 2.52T 207G 0 129 0 1.06M >> rpool 33.0G 118G 0 0 0 0 >> ---------- ----- ----- ----- ----- ----- ----- >> Thu Oct 22 18:34:04 CEST 2015 >> pool 2.52T 207G 0 55 0 503K >> rpool 33.0G 118G 0 11 0 97.9K >> ---------- ----- ----- ----- ----- ----- ----- >> ... >> just occasional bursts of work. >> I've now enabled SMART on the disks (2*3Tb mirror "pool" and 1*300Gb >"rpool") and ran some short tests and triggered long tests (hopefully >they'd succeed by tomorrow); current results are: >> >> >> # for D in /dev/rdsk/c0*s0; do echo "===== $D :"; smartctl -d sat,12 >-a $D ; done ; for D in /dev/rdsk/c4*s0 ; do echo "===== $D :"; >smartctl -d scsi -a $D ; done >> ===== /dev/rdsk/c0t3d0s0 : >> smartctl 6.0 2012-10-10 r3643 [i386-pc-solaris2.11] (local build) >> Copyright (C) 2002-12, Bruce Allen, Christian Franke, >www.smartmontools.org >> === START OF INFORMATION SECTION === >> Device Model: WDC WD3003FZEX-00Z4SA0 >> Serial Number: WD-WCC5D1KKU0PA >> LU WWN Device Id: 5 0014ee 2610716b7 >> Firmware Version: 01.01A01 >> User Capacity: 3,000,592,982,016 bytes [3.00 TB] >> Sector Sizes: 512 bytes logical, 4096 bytes physical >> Rotation Rate: 7200 rpm >> Device is: Not in smartctl database [for details use: -P >showall] >> ATA Version is: ACS-2 (minor revision not indicated) >> SATA Version is: SATA 3.0, 6.0 Gb/s (current: 3.0 Gb/s) >> Local Time is: Thu Oct 22 18:45:28 2015 CEST >> SMART support is: Available - device has SMART capability. >> SMART support is: Enabled >> === START OF READ SMART DATA SECTION === >> SMART overall-health self-assessment test result: PASSED >> General SMART Values: >> Offline data collection status: (0x82) Offline data collection >activity >> was completed without error. >> Auto Offline Data Collection: >Enabled. >> Self-test execution status: ( 249) Self-test routine in >progress... >> 90% of test remaining. >> Total time to complete Offline >> data collection: (32880) seconds. >> Offline data collection >> capabilities: (0x7b) SMART execute Offline >immediate. >> Auto Offline data collection >on/off support. >> Suspend Offline collection >upon new >> command. >> Offline surface scan >supported. >> Self-test supported. >> Conveyance Self-test >supported. >> Selective Self-test >supported. >> SMART capabilities: (0x0003) Saves SMART data before >entering >> power-saving mode. >> Supports SMART auto save >timer. >> Error logging capability: (0x01) Error logging supported. >> General Purpose Logging >supported. >> Short self-test routine >> recommended polling time: ( 2) minutes. >> Extended self-test routine >> recommended polling time: ( 357) minutes. 
>> Conveyance self-test routine >> recommended polling time: ( 5) minutes. >> SCT capabilities: (0x7035) SCT Status supported. >> SCT Feature Control >supported. >> SCT Data Table supported. >> SMART Attributes Data Structure revision number: 16 >> Vendor Specific SMART Attributes with Thresholds: >> ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE >UPDATED WHEN_FAILED RAW_VALUE >> 1 Raw_Read_Error_Rate 0x002f 200 200 051 Pre-fail >Always - 0 >> 3 Spin_Up_Time 0x0027 246 154 021 Pre-fail >Always - 6691 >> 4 Start_Stop_Count 0x0032 100 100 000 Old_age >Always - 14 >> 5 Reallocated_Sector_Ct 0x0033 200 200 140 Pre-fail >Always - 0 >> 7 Seek_Error_Rate 0x002e 200 200 000 Old_age >Always - 0 >> 9 Power_On_Hours 0x0032 094 094 000 Old_age >Always - 4869 >> 10 Spin_Retry_Count 0x0032 100 253 000 Old_age >Always - 0 >> 11 Calibration_Retry_Count 0x0032 100 253 000 Old_age >Always - 0 >> 12 Power_Cycle_Count 0x0032 100 100 000 Old_age >Always - 14 >> 16 Unknown_Attribute 0x0022 130 070 000 Old_age >Always - 2289651870502 >> 192 Power-Off_Retract_Count 0x0032 200 200 000 Old_age >Always - 12 >> 193 Load_Cycle_Count 0x0032 200 200 000 Old_age >Always - 2 >> 194 Temperature_Celsius 0x0022 117 111 000 Old_age >Always - 35 >> 196 Reallocated_Event_Count 0x0032 200 200 000 Old_age >Always - 0 >> 197 Current_Pending_Sector 0x0032 200 200 000 Old_age >Always - 0 >> 198 Offline_Uncorrectable 0x0030 200 200 000 Old_age >Offline - 0 >> 199 UDMA_CRC_Error_Count 0x0032 200 200 000 Old_age >Always - 0 >> 200 Multi_Zone_Error_Rate 0x0008 200 200 000 Old_age >Offline - 0 >> SMART Error Log Version: 1 >> No Errors Logged >> SMART Self-test log structure revision number 1 >> Num Test_Description Status Remaining >LifeTime(hours) LBA_of_first_error >> # 1 Short offline Completed without error 00% 4869 > - >> SMART Selective self-test log data structure revision number 1 >> SPAN MIN_LBA MAX_LBA CURRENT_TEST_STATUS >> 1 0 0 Not_testing >> 2 0 0 Not_testing >> 3 0 0 Not_testing >> 4 0 0 Not_testing >> 5 0 0 Not_testing >> Selective self-test flags (0x0): >> After scanning selected spans, do NOT read-scan remainder of disk. >> If Selective self-test is pending on power-up, resume after 0 minute >delay. >> ===== /dev/rdsk/c0t5d0s0 : >> smartctl 6.0 2012-10-10 r3643 [i386-pc-solaris2.11] (local build) >> Copyright (C) 2002-12, Bruce Allen, Christian Franke, >www.smartmontools.org >> === START OF INFORMATION SECTION === >> Model Family: Seagate SV35 >> Device Model: ST3000VX000-1ES166 >> Serial Number: Z500S3L8 >> LU WWN Device Id: 5 000c50 079e3757b >> Firmware Version: CV26 >> User Capacity: 3,000,592,982,016 bytes [3.00 TB] >> Sector Sizes: 512 bytes logical, 4096 bytes physical >> Rotation Rate: 7200 rpm >> Device is: In smartctl database [for details use: -P show] >> ATA Version is: ACS-2, ACS-3 T13/2161-D revision 3b >> SATA Version is: SATA 3.1, 6.0 Gb/s (current: 3.0 Gb/s) >> Local Time is: Thu Oct 22 18:45:28 2015 CEST >> SMART support is: Available - device has SMART capability. >> SMART support is: Enabled >> === START OF READ SMART DATA SECTION === >> SMART overall-health self-assessment test result: PASSED >> General SMART Values: >> Offline data collection status: (0x00) Offline data collection >activity >> was never started. >> Auto Offline Data Collection: >Disabled. >> Self-test execution status: ( 249) Self-test routine in >progress... >> 90% of test remaining. >> Total time to complete Offline >> data collection: ( 80) seconds. 
>> Offline data collection >> capabilities: (0x73) SMART execute Offline >immediate. >> Auto Offline data collection >on/off support. >> Suspend Offline collection >upon new >> command. >> No Offline surface scan >supported. >> Self-test supported. >> Conveyance Self-test >supported. >> Selective Self-test >supported. >> SMART capabilities: (0x0003) Saves SMART data before >entering >> power-saving mode. >> Supports SMART auto save >timer. >> Error logging capability: (0x01) Error logging supported. >> General Purpose Logging >supported. >> Short self-test routine >> recommended polling time: ( 1) minutes. >> Extended self-test routine >> recommended polling time: ( 325) minutes. >> Conveyance self-test routine >> recommended polling time: ( 2) minutes. >> SCT capabilities: (0x10b9) SCT Status supported. >> SCT Error Recovery Control >supported. >> SCT Feature Control >supported. >> SCT Data Table supported. >> SMART Attributes Data Structure revision number: 10 >> Vendor Specific SMART Attributes with Thresholds: >> ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE >UPDATED WHEN_FAILED RAW_VALUE >> 1 Raw_Read_Error_Rate 0x000f 105 099 006 Pre-fail >Always - 8600880 >> 3 Spin_Up_Time 0x0003 096 094 000 Pre-fail >Always - 0 >> 4 Start_Stop_Count 0x0032 100 100 020 Old_age >Always - 19 >> 5 Reallocated_Sector_Ct 0x0033 100 100 010 Pre-fail >Always - 0 >> 7 Seek_Error_Rate 0x000f 085 060 030 Pre-fail >Always - 342685681 >> 9 Power_On_Hours 0x0032 096 096 000 Old_age >Always - 4214 >> 10 Spin_Retry_Count 0x0013 100 100 097 Pre-fail >Always - 0 >> 12 Power_Cycle_Count 0x0032 100 100 020 Old_age >Always - 19 >> 184 End-to-End_Error 0x0032 100 100 099 Old_age >Always - 0 >> 187 Reported_Uncorrect 0x0032 100 100 000 Old_age >Always - 0 >> 188 Command_Timeout 0x0032 100 100 000 Old_age >Always - 0 >> 189 High_Fly_Writes 0x003a 028 028 000 Old_age >Always - 72 >> 190 Airflow_Temperature_Cel 0x0022 069 065 045 Old_age >Always - 31 (Min/Max 29/32) >> 191 G-Sense_Error_Rate 0x0032 100 100 000 Old_age >Always - 0 >> 192 Power-Off_Retract_Count 0x0032 100 100 000 Old_age >Always - 19 >> 193 Load_Cycle_Count 0x0032 100 100 000 Old_age >Always - 28 >> 194 Temperature_Celsius 0x0022 031 040 000 Old_age >Always - 31 (0 20 0 0 0) >> 197 Current_Pending_Sector 0x0012 100 100 000 Old_age >Always - 0 >> 198 Offline_Uncorrectable 0x0010 100 100 000 Old_age >Offline - 0 >> 199 UDMA_CRC_Error_Count 0x003e 200 200 000 Old_age >Always - 0 >> SMART Error Log Version: 1 >> No Errors Logged >> SMART Self-test log structure revision number 1 >> Num Test_Description Status Remaining >LifeTime(hours) LBA_of_first_error >> # 1 Extended offline Self-test routine in progress 90% 4214 > - >> # 2 Short offline Completed without error 00% 4214 > - >> SMART Selective self-test log data structure revision number 1 >> SPAN MIN_LBA MAX_LBA CURRENT_TEST_STATUS >> 1 0 0 Not_testing >> 2 0 0 Not_testing >> 3 0 0 Not_testing >> 4 0 0 Not_testing >> 5 0 0 Not_testing >> Selective self-test flags (0x0): >> After scanning selected spans, do NOT read-scan remainder of disk. >> If Selective self-test is pending on power-up, resume after 0 minute >delay. 
>> ===== /dev/rdsk/c4t5000CCA02A1292DDd0s0 : >> smartctl 6.0 2012-10-10 r3643 [i386-pc-solaris2.11] (local build) >> Copyright (C) 2002-12, Bruce Allen, Christian Franke, >www.smartmontools.org >> Vendor: HITACHI >> Product: HUS156030VLS600 >> Revision: HPH1 >> User Capacity: 300,000,000,000 bytes [300 GB] >> Logical block size: 512 bytes >> Logical Unit id: 0x5000cca02a1292dc >> Serial number: LVVA6NHS >> Device type: disk >> Transport protocol: SAS >> Local Time is: Thu Oct 22 18:45:29 2015 CEST >> Device supports SMART and is Enabled >> Temperature Warning Enabled >> SMART Health Status: OK >> Current Drive Temperature: 45 C >> Drive Trip Temperature: 70 C >> Manufactured in week 14 of year 2012 >> Specified cycle count over device lifetime: 50000 >> Accumulated start-stop cycles: 80 >> Elements in grown defect list: 0 >> Vendor (Seagate) cache information >> Blocks sent to initiator = 2340336504406016 >> Error counter log: >> Errors Corrected by Total Correction >Gigabytes Total >> ECC rereads/ errors algorithm >processed uncorrected >> fast | delayed rewrites corrected invocations [10^9 >bytes] errors >> read: 0 888890 0 888890 0 >29326.957 0 >> write: 0 961315 0 961315 0 >6277.560 0 >> Non-medium error count: 283 >> SMART Self-test log >> Num Test Status segment LifeTime >LBA_first_err [SK ASC ASQ] >> Description number (hours) >> # 1 Background long Self test in progress ... - NOW > - [- - -] >> # 2 Background long Aborted (device reset ?) - 14354 > - [- - -] >> # 3 Background short Completed - 14354 > - [- - -] >> # 4 Background long Aborted (device reset ?) - 14354 > - [- - -] >> # 5 Background long Aborted (device reset ?) - 14354 > - [- - -] >> Long (extended) Self Test duration: 2506 seconds [41.8 minutes] > >> >> The zpool scrub results and general layout: >> >> # zpool status -v >> pool: pool >> state: ONLINE >> scan: scrub repaired 0 in 164h13m with 0 errors on Thu Oct 22 >18:13:33 2015 >> config: >> NAME STATE READ WRITE CKSUM >> pool ONLINE 0 0 0 >> mirror-0 ONLINE 0 0 0 >> c0t3d0 ONLINE 0 0 0 >> c0t5d0 ONLINE 0 0 0 >> logs >> c4t5000CCA02A1292DDd0p2 ONLINE 0 0 0 >> cache >> c4t5000CCA02A1292DDd0p3 ONLINE 0 0 0 >> errors: No known data errors >> pool: rpool >> state: ONLINE >> status: Some supported features are not enabled on the pool. The pool >can >> still be used, but some features are unavailable. >> action: Enable all features using 'zpool upgrade'. Once this is done, >> the pool may no longer be accessible by software that does >not support >> the features. See zpool-features(5) for details. >> scan: scrub repaired 0 in 3h3m with 0 errors on Thu Oct 8 04:12:35 >2015 >> config: >> NAME STATE READ WRITE CKSUM >> rpool ONLINE 0 0 0 >> c4t5000CCA02A1292DDd0s0 ONLINE 0 0 0 >> errors: No known data errors > >> # zpool list -v >> NAME SIZE ALLOC FREE EXPANDSZ FRAG >CAP DEDUP HEALTH ALTROOT >> pool 2.72T 2.52T 207G - 68% >92% 1.36x ONLINE / >> mirror 2.72T 2.52T 207G - 68% >92% >> c0t3d0 - - - - - >- >> c0t5d0 - - - - - >- >> log - - - - - >- >> c4t5000CCA02A1292DDd0p2 8G 148K 8.00G - 0% >0% >> cache - - - - - >- >> c4t5000CCA02A1292DDd0p3 120G 1.80G 118G - 0% >1% >> rpool 151G 33.0G 118G - 76% >21% 1.00x ONLINE - >> c4t5000CCA02A1292DDd0s0 151G 33.0G 118G - 76% >21% > >> Note the long scrub time may have included the downtime while the >system was frozen until it was rebooted. 
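>>
>> Next time it wedges I'd like to know whether it is kernel or userland memory
>> that vanishes, so I'll try to keep a log running beforehand. A rough sketch
>> (::memstat is an mdb -k dcmd; this assumes the box stays responsive enough
>> to keep appending, and the log path is just an example):
>>
>> # while sleep 60; do date; echo ::memstat | mdb -k; done > /var/tmp/memstat.log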
>> >> Thanks in advance for the fresh pairs of eyeballs, >> Jim Klimov >> _______________________________________________ >> OmniOS-discuss mailing list >> OmniOS-discuss at lists.omniti.com >> http://lists.omniti.com/mailman/listinfo/omnios-discuss Mail apparently got bounced, reposting... -- Typos courtesy of K-9 Mail on my Samsung Android From bhildebrandt at exegy.com Fri Oct 23 16:57:55 2015 From: bhildebrandt at exegy.com (Hildebrandt, Bill) Date: Fri, 23 Oct 2015 16:57:55 +0000 Subject: [OmniOS-discuss] OmniOS backup box hanging regularly In-Reply-To: <74C4EFF5-B1D7-49DE-81D5-54CE8787C258@omniti.com> References: <74C4EFF5-B1D7-49DE-81D5-54CE8787C258@omniti.com> Message-ID: It's a 4U direct-attached backplane . . . not an expander. Each drive requires its own connection. The zdb -bbccsv of my exported pool just finished, without error this time. The dump is still available if you would like to review it. -Bill -----Original Message----- From: Dan McDonald [mailto:danmcd at omniti.com] Sent: Friday, October 23, 2015 10:24 AM To: Hildebrandt, Bill Cc: Yavor Tomov; jimklimov at cos.ru; OmniOS-discuss; Dan McDonald Subject: Re: [OmniOS-discuss] OmniOS backup box hanging regularly > On Oct 23, 2015, at 12:42 AM, Hildebrandt, Bill wrote: > > Controllers: 2x LSI 9201-16i with SAS-SATA break-out cables going to a SAS846TQ backplane > System disk: mirrored Innodisk 128GB SATADOMs (SATADOM-MV 3ME) off mobo > ZIL: mirrored Samsung SSD 845D (400GB) off mobo > L2ARC: SAMSUNG MZHPU512 (512GB PCIe) > NICs: Intel 10G (X520) > Data disks: 24x WD 1TB Enterprise SATA (WDC WD1002FBYS) SATA disks attached to a SAS expander (this is the SAS846TQ, an expander, right?) are known to be a dangerous deployment. SATA doesn't have the reporting capabilities SAS does, and many SAS expanders don't relay SATA well enough. We do not support paying customers who deploy like this. FYI, Dan From lotheac at iki.fi Fri Oct 23 17:10:47 2015 From: lotheac at iki.fi (Lauri Tirkkonen) Date: Fri, 23 Oct 2015 20:10:47 +0300 Subject: [OmniOS-discuss] OmniOS backup box hanging regularly In-Reply-To: <02DF3A33-F955-4F86-A478-0D639CB500F1@cos.ru> References: <02DF3A33-F955-4F86-A478-0D639CB500F1@cos.ru> Message-ID: <20151023171047.GA25370@gutsman.lotheac.fi> On Fri, Oct 23 2015 18:54:27 +0200, Jim Klimov wrote: > On 23 October 2015 11:23:28 CEST, Jim Klimov wrote: > >So at the moment it seems there is some issue with zfs-auto-snapshots > >on OmniOS that I haven't seen in SXCE, OI nor Hipster. Possibly I had > >different implementations of the service in these different OSes (shell > >vs python, and all that at different versions). I highly recommend znapzend for automatic snapshotting. In the past we used an implementation called zfs-auto-snapshot (I wonder how many there are :), but it was doing dumb things like listing all existing snapshots quite frequently, and that ate up memory.
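For what it's worth, znapzend keeps its whole plan in ZFS properties on the source dataset, so there is no cron entry or state file to maintain. A setup along these lines (syntax quoted from memory, so double-check it against the docs at the site below; 'pool' here just stands for your data pool) would keep hourly snapshots for a week, 4-hourly ones for a month and dailies for 90 days, pruning old ones automatically:

# znapzendzetup create --recursive SRC '7d=>1h,30d=>4h,90d=>1d' pool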
http://www.znapzend.org/ -- Lauri Tirkkonen | lotheac @ IRCnet From carlb at flamewarestudios.com Fri Oct 23 18:01:04 2015 From: carlb at flamewarestudios.com (Carl Brunning) Date: Fri, 23 Oct 2015 18:01:04 +0000 Subject: [OmniOS-discuss] OmniOS backup box hanging regularly In-Reply-To: <74C4EFF5-B1D7-49DE-81D5-54CE8787C258@omniti.com> References: <74C4EFF5-B1D7-49DE-81D5-54CE8787C258@omniti.com> Message-ID: Hey, just so you know, this backplane is not an expander but a direct connection, so SATA should be fine. -----Original Message----- From: OmniOS-discuss [mailto:omnios-discuss-bounces at lists.omniti.com] On Behalf Of Dan McDonald Sent: 23 October 2015 16:24 To: Hildebrandt, Bill Cc: OmniOS-discuss Subject: Re: [OmniOS-discuss] OmniOS backup box hanging regularly > On Oct 23, 2015, at 12:42 AM, Hildebrandt, Bill wrote: > > Controllers: 2x LSI 9201-16i with SAS-SATA break-out cables going to a SAS846TQ backplane > System disk: mirrored Innodisk 128GB SATADOMs (SATADOM-MV 3ME) off mobo > ZIL: mirrored Samsung SSD 845D (400GB) off mobo > L2ARC: SAMSUNG MZHPU512 (512GB PCIe) > NICs: Intel 10G (X520) > Data disks: 24x WD 1TB Enterprise SATA (WDC WD1002FBYS) SATA disks attached to a SAS expander (this is the SAS846TQ, an expander, right?) are known to be a dangerous deployment. SATA doesn't have the reporting capabilities SAS does, and many SAS expanders don't relay SATA well enough. We do not support paying customers who deploy like this. FYI, Dan _______________________________________________ OmniOS-discuss mailing list OmniOS-discuss at lists.omniti.com http://lists.omniti.com/mailman/listinfo/omnios-discuss From nagele at wildbit.com Fri Oct 23 20:03:59 2015 From: nagele at wildbit.com (Chris Nagele) Date: Fri, 23 Oct 2015 16:03:59 -0400 Subject: [OmniOS-discuss] Pool degraded after inserting disk Message-ID: Hi all. I had an issue come up today when we added some new disks to a server. After physically inserting the disk, the pool immediately degraded and any zfs commands would hang. We rebooted and the pool was completely fine. Is this a common issue? I've never run into it before. To clarify, these servers are running on the X9DRD-7LN4F-JBOD board with SATA SSDs attached directly to the onboard controller. Chris From danmcd at omniti.com Mon Oct 26 15:17:46 2015 From: danmcd at omniti.com (Dan McDonald) Date: Mon, 26 Oct 2015 11:17:46 -0400 Subject: [OmniOS-discuss] HEADS UP: r151006 EOSL on March 31st, 2016 Message-ID: <0D2ECE15-BC7F-4246-B41E-C595712058F5@omniti.com> With r151014, the new LTS, in the field for over six months now, and r151016, the next stable, arriving within the next two weeks, it is time to start the countdown timer on support for the old LTS, r151006. March 31st, 2016 is the earliest date the following stable, r151018, will be released. Coincident with that is the end of service life (EOSL) for r151006, the old LTS. If you are still on r151006, PLEASE start your migration to r151014 NOW. Until then, r151006 will continue to receive security updates as before; after March 31st, it will not. The release cadence listed here: http://omnios.omniti.com/wiki.php/ReleaseCycle documents this, and has for some time.
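For most boxes the mechanics of the move are just repointing the omnios publisher at the new release repo and updating into a fresh boot environment. Roughly (this is a sketch from memory - follow the upgrade notes on the wiki if they differ, and substitute your own mirror if you use one):

# pkg set-publisher -G http://pkg.omniti.com/omnios/r151006/ \
    -g http://pkg.omniti.com/omnios/r151014/ omnios
# pkg update

Then boot into the new BE and verify things work before destroying the old one.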
Thank you, Dan McDonald -- OmniOS Engineering From skeltonr at btconnect.com Mon Oct 26 20:46:34 2015 From: skeltonr at btconnect.com (Richard Skelton) Date: Mon, 26 Oct 2015 20:46:34 +0000 Subject: [OmniOS-discuss] pkgrecv -s http://pkg.omniti.com/omnios/r151014/ -d /tank/repo '*' seems to be broken Message-ID: <562E912A.80808@btconnect.com> Hi, I am trying to make a local copy of the stable repo :- pkgrecv -s http://pkg.omniti.com/omnios/r151014/ -d /tank/repo '*' but it fails :-( root at hp:/root/fio-2.1.10# pkgrecv -s http://pkg.omniti.com/omnios/r151014/ -d /tank/repo '*' Processing packages for publisher omnios ... Retrieving and evaluating 6161 package(s)... Download Manifests ( 907/6161) -pkgrecv: http protocol error: code: 404 reason: Not Found URL: 'http://pkg.omniti.com/omnios/r151014/omnios/manifest/0/developer%2Fillumos-tools at 11%2C5.11-0.151014%3A20151016T122410Z' (happened 4 times) root at hp:/root/fio-2.1.10# Cheers Richard From danmcd at omniti.com Mon Oct 26 21:14:23 2015 From: danmcd at omniti.com (Dan McDonald) Date: Mon, 26 Oct 2015 17:14:23 -0400 Subject: [OmniOS-discuss] pkgrecv -s http://pkg.omniti.com/omnios/r151014/ -d /tank/repo '*' seems to be broken In-Reply-To: <562E912A.80808@btconnect.com> References: <562E912A.80808@btconnect.com> Message-ID: <43AF2ECC-D806-47BD-B313-23DAFE7F4563@omniti.com> > On Oct 26, 2015, at 4:46 PM, Richard Skelton wrote: > > Hi, > I am trying to make a local copy of the stable repo :- > pkgrecv -s http://pkg.omniti.com/omnios/r151014/ -d /tank/repo '*' > > but it fails :-( Two things. 1.) Assuming you're ON r151014 already, use "-m latest" for less transfers, unless you REALLY WANT all of the historical r151014 packages. pkgrecv -s http://pkg.omniti.com/omnios/r151014/ -d /tank/repo -m latest '*' 2.) I just rebuilt r151014's repo index. Please try again (with the -m latest to prevent extra transfers if needed). Dan From lotheac at iki.fi Tue Oct 27 09:56:42 2015 From: lotheac at iki.fi (Lauri Tirkkonen) Date: Tue, 27 Oct 2015 11:56:42 +0200 Subject: [OmniOS-discuss] OmniOS backup box hanging regularly In-Reply-To: <54A6F787-8791-4C94-85AB-BB3615077387@cos.ru> References: <02DF3A33-F955-4F86-A478-0D639CB500F1@cos.ru> <20151023171047.GA25370@gutsman.lotheac.fi> <54A6F787-8791-4C94-85AB-BB3615077387@cos.ru> Message-ID: <20151027095642.GA13407@gutsman.lotheac.fi> On Tue, Oct 27 2015 09:49:40 +0100, Jim Klimov wrote: > So far I use a mix of 'standard' time-slider and additionally my script that kills oldest snapshot groups (chosen by pattern of automatic snaps) to keep a specified watermark of free space. Yeah, we were previously using zfs-auto-snap from OpenSolaris before it became time-slider (with one or two local patches). > Something in this simple activity is enough to bring the box down into swapping until the deadman knocks to interrupt the infinite loop looking for a free page, and I've got a screenshot to prove this theory ;) In your previous mail you have a 'top' listing with way too many 'zfs' processes owned by zfssnap, and all are hundreds of megabytes in RSS. That sounds like a problem. IIRC, one problematic configuration that caused issues like this was a single filesystem setting a zfs-auto-snapshot property locally in a large tree where it also inherited it from the parent. My memory on this is a bit hazy though. > I wonder why the offending process doesn't die on some failed malloc... Good question.
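If you want to rule that configuration in or out, listing datasets that set the property locally rather than inheriting it should be enough. Assuming your scripts use the same com.sun:auto-snapshot property (and its per-schedule variants) that the OpenSolaris service did, something like:

# zfs get -r -s local -o name,property,value com.sun:auto-snapshot pool

Any dataset listed sets the property locally; if its parent also sets it, that's the doubled-up pattern I remember causing trouble.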
-- Lauri Tirkkonen | lotheac @ IRCnet From jimklimov at cos.ru Tue Oct 27 11:05:31 2015 From: jimklimov at cos.ru (Jim Klimov) Date: Tue, 27 Oct 2015 12:05:31 +0100 Subject: [OmniOS-discuss] OmniOS backup box hanging regularly In-Reply-To: <54A6F787-8791-4C94-85AB-BB3615077387@cos.ru> References: <02DF3A33-F955-4F86-A478-0D639CB500F1@cos.ru> <20151023171047.GA25370@gutsman.lotheac.fi> <54A6F787-8791-4C94-85AB-BB3615077387@cos.ru> Message-ID: <31B5A10C-1A68-4FC4-82EF-A887518349B8@cos.ru> On 27 October 2015 9:49:40 CET, Jim Klimov wrote: >On 23 October 2015 19:10:47 CEST, Lauri Tirkkonen wrote: >>On Fri, Oct 23 2015 18:54:27 +0200, Jim Klimov wrote: >>> On 23 October 2015 11:23:28 CEST, Jim Klimov wrote: >>> >So at the moment it seems there is some issue with >>zfs-auto-snapshots >>> >on OmniOS that I haven't seen in SXCE, OI nor Hipster. Possibly I >>had >>> >different implementations of the service in these different OSes >>(shell >>> >vs python, and all that at different versions). >> >>I highly recommend znapzend for automatic snapshotting. In the past we used >>an implementation called zfs-auto-snapshot (I wonder how many there are :), >>but it was doing dumb things like listing all existing >>snapshots quite frequently, and that ate up memory. >>http://www.znapzend.org/ >> >>-- >>Lauri Tirkkonen | lotheac @ IRCnet >>_______________________________________________ >>OmniOS-discuss mailing list >>OmniOS-discuss at lists.omniti.com >>http://lists.omniti.com/mailman/listinfo/omnios-discuss > >Thanks, I'll try to take a look whenever I have time ;) > >So far I use a mix of 'standard' time-slider and additionally my script >that kills oldest snapshot groups (chosen by pattern of automatic >snaps) to keep a specified watermark of free space. > >Something in this simple activity is enough to bring the box down into >swapping until the deadman knocks to interrupt the infinite loop >looking for a free page, and I've got a screenshot to prove this theory >;) > >I wonder why the offending process doesn't die on some failed malloc... > >Jim >-- >Typos courtesy of K-9 Mail on my Samsung Android Heh, in fact this OmniOS installation does not offer a time-slider, but rather the ksh93-based scripts for 'zfs/autosnapshot'. Now gotta verify what I run elsewhere ;) -- Typos courtesy of K-9 Mail on my Samsung Android From lotheac at iki.fi Tue Oct 27 11:07:56 2015 From: lotheac at iki.fi (Lauri Tirkkonen) Date: Tue, 27 Oct 2015 13:07:56 +0200 Subject: [OmniOS-discuss] OmniOS backup box hanging regularly In-Reply-To: <31B5A10C-1A68-4FC4-82EF-A887518349B8@cos.ru> References: <02DF3A33-F955-4F86-A478-0D639CB500F1@cos.ru> <20151023171047.GA25370@gutsman.lotheac.fi> <54A6F787-8791-4C94-85AB-BB3615077387@cos.ru> <31B5A10C-1A68-4FC4-82EF-A887518349B8@cos.ru> Message-ID: <20151027110756.GB13407@gutsman.lotheac.fi> On Tue, Oct 27 2015 12:05:31 +0100, Jim Klimov wrote: > Heh, in fact this OmniOS installation does not offer a time-slider, but rather the ksh93-based scripts for 'zfs/autosnapshot'. Now gotta verify what I run elsewhere ;) I think OmniOS ships neither time-slider nor zfs-auto-snapshot.
When we used it I had to dig it out from somewhere in the interwebs and package it :) -- Lauri Tirkkonen | lotheac @ IRCnet From gmason at msu.edu Tue Oct 27 12:37:25 2015 From: gmason at msu.edu (Greg Mason) Date: Tue, 27 Oct 2015 08:37:25 -0400 Subject: [OmniOS-discuss] OmniOS backup box hanging regularly In-Reply-To: <20151027110756.GB13407@gutsman.lotheac.fi> References: <02DF3A33-F955-4F86-A478-0D639CB500F1@cos.ru> <20151023171047.GA25370@gutsman.lotheac.fi> <54A6F787-8791-4C94-85AB-BB3615077387@cos.ru> <31B5A10C-1A68-4FC4-82EF-A887518349B8@cos.ru> <20151027110756.GB13407@gutsman.lotheac.fi> Message-ID: <778F99AB-C38F-489A-ACFA-2A031EEC80D2@msu.edu> We've been using this, fired off by cron: https://github.com/MSU-iCER/puppet-zfs-auto-snapshot/blob/master/files/zfs-auto-snapshot.pl We manage it via puppet. It's a bit of an older puppet module, but should still work. -Greg > On Oct 27, 2015, at 7:07 AM, Lauri Tirkkonen wrote: > > On Tue, Oct 27 2015 12:05:31 +0100, Jim Klimov wrote: >> Heh, in fact this OmniOS installation does not offer a time-slider, but rather the ksh93-based scripts for 'zfs/autosnapshot'. Now gotta verify what I run elsewhere ;) > > I think OmniOS ships neither time-slider nor zfs-auto-snapshot. When we > used it I had to dig it out from somewhere in the interwebs and package > it :) > > -- > Lauri Tirkkonen | lotheac @ IRCnet > _______________________________________________ > OmniOS-discuss mailing list > OmniOS-discuss at lists.omniti.com > http://lists.omniti.com/mailman/listinfo/omnios-discuss From skeltonr at btconnect.com Tue Oct 27 17:03:34 2015 From: skeltonr at btconnect.com (Richard Skelton) Date: Tue, 27 Oct 2015 17:03:34 +0000 Subject: [OmniOS-discuss] pkgrecv -s http://pkg.omniti.com/omnios/r151014/ -d /tank/repo '*' seems to be broken In-Reply-To: <43AF2ECC-D806-47BD-B313-23DAFE7F4563@omniti.com> References: <562E912A.80808@btconnect.com> <43AF2ECC-D806-47BD-B313-23DAFE7F4563@omniti.com> Message-ID: <562FAE66.8080704@btconnect.com> Hi Dan, Now I get :- root at hp:/root/fio-2.1.10# pkgrecv -s http://pkg.omniti.com/omnios/r151014/ -d /tank/repo '*' Processing packages for publisher omnios ... Retrieving and evaluating 6160 package(s)...
PROCESS ITEMS GET (MB) SEND (MB) developer/gcc48 0/3464 43/1853 0/5108pkgrecv: 1: Framework error: code: 18 reason: transfer closed with 26579845 bytes remaining to read URL: 'http://pkg.omniti.com/omnios/r151014/omnios/file/1/81dded17f29bf94296d580fe3d197f1c650a7f98' 2: Framework error: code: 18 reason: transfer closed with 24763267 bytes remaining to read URL: 'http://pkg.omniti.com/omnios/r151014/omnios/file/1/399f01f216d9bf29a3549821df79572a9e120401' 3: Framework error: code: 18 reason: transfer closed with 26681841 bytes remaining to read URL: 'http://pkg.omniti.com/omnios/r151014/omnios/file/1/4c40e09a64075f6b145cc747fdd5cbc621984d16' 4: Framework error: code: 18 reason: transfer closed with 22826291 bytes remaining to read URL: 'http://pkg.omniti.com/omnios/r151014/omnios/file/1/d371f83d2055ab737eaacb0c70f3b8e958f896ac' 5: Framework error: code: 18 reason: transfer closed with 20496851 bytes remaining to read URL: 'http://pkg.omniti.com/omnios/r151014/omnios/file/1/399f01f216d9bf29a3549821df79572a9e120401' 6: Framework error: code: 18 reason: transfer closed with 20129541 bytes remaining to read URL: 'http://pkg.omniti.com/omnios/r151014/omnios/file/1/81dded17f29bf94296d580fe3d197f1c650a7f98' 7: Framework error: code: 18 reason: transfer closed with 22444385 bytes remaining to read URL: 'http://pkg.omniti.com/omnios/r151014/omnios/file/1/4c40e09a64075f6b145cc747fdd5cbc621984d16' 8: Framework error: code: 18 reason: transfer closed with 21467763 bytes remaining to read URL: 'http://pkg.omniti.com/omnios/r151014/omnios/file/1/d371f83d2055ab737eaacb0c70f3b8e958f896ac' 9: Framework error: code: 18 reason: transfer closed with 21192525 bytes remaining to read URL: 'http://pkg.omniti.com/omnios/r151014/omnios/file/1/81dded17f29bf94296d580fe3d197f1c650a7f98' 10: Framework error: code: 18 reason: transfer closed with 15680195 bytes remaining to read URL: 'http://pkg.omniti.com/omnios/r151014/omnios/file/1/399f01f216d9bf29a3549821df79572a9e120401' 11: Framework error: code: 18 reason: transfer closed with 21010865 bytes remaining to read URL: 'http://pkg.omniti.com/omnios/r151014/omnios/file/1/4c40e09a64075f6b145cc747fdd5cbc621984d16' 12: Framework error: code: 18 reason: transfer closed with 19564821 bytes remaining to read URL: 'http://pkg.omniti.com/omnios/r151014/omnios/file/1/81dded17f29bf94296d580fe3d197f1c650a7f98' 13: Framework error: code: 18 reason: transfer closed with 21120761 bytes remaining to read URL: 'http://pkg.omniti.com/omnios/r151014/omnios/file/1/4c40e09a64075f6b145cc747fdd5cbc621984d16' 14: Framework error: code: 18 reason: transfer closed with 19370755 bytes remaining to read URL: 'http://pkg.omniti.com/omnios/r151014/omnios/file/1/d371f83d2055ab737eaacb0c70f3b8e958f896ac' 15: Framework error: code: 18 reason: transfer closed with 15824995 bytes remaining to read URL: 'http://pkg.omniti.com/omnios/r151014/omnios/file/1/399f01f216d9bf29a3549821df79572a9e120401' pkgrecv: Cached files were preserved in the following directory: /var/tmp/pkgrecv-BpcFZb Use pkgrecv -c to resume the interrupted download. root at hp:/root/fio-2.1.10# Dan McDonald wrote: >> On Oct 26, 2015, at 4:46 PM, Richard Skelton wrote: >> >> Hi, >> I am trying to make a local copy of the stable repo :- >> pkgrecv -s http://pkg.omniti.com/omnios/r151014/ -d /tank/repo '*' seems >> to be broken >> >> but it fails :-( >> > > Two things. > > 1.) Assuming you're ON r151014 already, use "-m latest" for less transfers, unless you REALLY WANT all of the historical r151014 packages. 
> > pkgrecv -s http://pkg.omniti.com/omnios/r151014/ -d /tank/repo -m latest '*' > 2.) I just rebuilt r151014's repo index. Please try again (with the -m latest to prevent extra transfers if needed). > > > Dan > > From danmcd at omniti.com Tue Oct 27 17:47:26 2015 From: danmcd at omniti.com (Dan McDonald) Date: Tue, 27 Oct 2015 13:47:26 -0400 Subject: [OmniOS-discuss] pkgrecv -s http://pkg.omniti.com/omnios/r151014/ -d /tank/repo '*' seems to be broken In-Reply-To: <562FAE66.8080704@btconnect.com> References: <562E912A.80808@btconnect.com> <43AF2ECC-D806-47BD-B313-23DAFE7F4563@omniti.com> <562FAE66.8080704@btconnect.com> Message-ID: Try -m latest... it could just be the sheer number of packages you're transferring. We still use the tiny CherryPy webserver at the repo-box end. Dan From danmcd at omniti.com Thu Oct 29 18:03:07 2015 From: danmcd at omniti.com (Dan McDonald) Date: Thu, 29 Oct 2015 14:03:07 -0400 Subject: [OmniOS-discuss] Small update --> UNZIP Message-ID: <3F49B4C5-3871-4496-B9F9-A803C60BA6A7@omniti.com> Testers found UNZIP 6.0 didn't handle certain fuzz situations as well as it should've. To that end, r151006 and r151014 have been updated with new compression/unzip packages. Please "pkg update". Thanks, Dan From danmcd at omniti.com Fri Oct 30 19:26:33 2015 From: danmcd at omniti.com (Dan McDonald) Date: Fri, 30 Oct 2015 15:26:33 -0400 Subject: [OmniOS-discuss] Fwd: [discuss] HEADS UP: Java kerberos GUI (gkadmin) gone References: Message-ID: <5F829969-C9C6-475F-A256-25E213A4B349@omniti.com> THIS DOES NOT AFFECT THE UPCOMING r151016, but it will affect 018 and beyond. Does anyone in the audience use the Kerberos Java GUI to administer Kerberos? If so, consider this an EOL warning for spring of 2016 and r151018. Thanks, Dan > Begin forwarded message: > > From: "Garrett D'Amore" > Date: October 30, 2015 at 3:24:32 PM EDT > To: "discuss at lists.illumos.org" > Subject: [discuss] HEADS UP: Java kerberos GUI (gkadmin) gone > > FYI, with the push I just did on behalf of Dan McDonald and Josef Sipek (who actually took most of this from work I did ages ago in illumos-core), the Java-based GUI for administering Kerberos is gone from illumos-gate. > > It's unclear if anyone was able to use gkadmin successfully recently, or has been using it. > > As distributions pick this up, you may notice its absence. You can still administer Kerberos using the command line interface, kadmin. > > The push in question is this: > > commit dd3293375033eaa6f009722670ffa191b992ffd9 > Author: Garrett D'Amore > Date: Thu Oct 29 12:33:18 2015 -0400 > > 6407 kerberos Java GUI should go away > Portions contributed by: Josef 'Jeff' Sipek > Reviewed by: Josef 'Jeff' Sipek > Reviewed by: Toomas Soome > Reviewed by: Andy Stormont > Reviewed by: Albert Lee > Reviewed by: Peter Tribble > Reviewed by: Richard PALO > Approved by: Dan McDonald > > - Garrett
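For anyone who has only ever used the GUI: kadmin covers the same operations interactively. A rough example session (the admin principal name here is made up, and the command spellings are the MIT-derived ones - check kadmin(1M) for the authoritative list):

$ kadmin -p admin/admin
kadmin: list_principals
kadmin: addprinc someuser
kadmin: quit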