[OmniOS-discuss] ZPOOL bug after upgrade to r151020

John Barfield john.barfield at bissinc.com
Tue Apr 18 14:51:13 UTC 2017


I did compile it. It seems to have fixed the deadlocks, but we're still having performance issues that we were not having before. IMHO this is some type of regression.

Here is the binary location:

http://download.txtelecom.net/zfs

Install procedure (As root):
(Props to Dan McDonald for helping me with this process)


1.  beadm list (note your current BE)
2.  zfs snapshot rpool@WHATEVER
3.  beadm create OmniOS-r151020-cstm
4.  beadm mount OmniOS-r151020-cstm /mnt
5.  cd /mnt/kernel/fs/amd64/
6.  mv zfs zfs.old
7.  wget http://download.txtelecom.net/zfs
8.  cd ~
9.  beadm unmount OmniOS-r151020-cstm
10. beadm activate OmniOS-r151020-cstm
11. shutdown -i6 -g0

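After the reboot, a quick sanity check can confirm the replacement module is really the one in place (a sketch; it assumes you kept a copy of the downloaded binary, e.g. in /var/tmp/zfs):

```shell
# Compare the installed module's checksum against the downloaded copy.
digest -a sha256 /kernel/fs/amd64/zfs /var/tmp/zfs

# Confirm the zfs kernel module is loaded.
modinfo | grep -w zfs
```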

The system will reboot twice. The first SAN I did this on rebooted both times flawlessly; the second system froze while shutting down and we had to power-cycle it, but it still came back online with the new build. Use this at your own risk.

If it does fail completely, simply select your old BE from the GRUB menu on the next reboot (you noted it in step 1).
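If the system boots but misbehaves, you can also roll back from a shell instead of GRUB (the BE name below is an example; substitute the one you noted in step 1):

```shell
# Re-activate the previous boot environment and reboot into it.
beadm list
beadm activate OmniOS-r151020    # your original BE name here
shutdown -i6 -g0 -y
```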

However, please see the following comments from my customer which are still occurring:

John,
I’ve copied 6GB, 37GB, and 61GB files from thor’s local /mnt disk to a location on the SAN.
Monitoring only shell activity on a few machines, nothing locked up (except thor), although an ssh to thor took a minute to connect.
While the ‘cp’ command was running, I monitored thor using the ‘top’ command.
* As expected, the load factor on thor started to rise from 1 to over 5.
* As the file copied, I saw the destination file size increase.
* However, when the destination file size reached 6, 37, or 61GB, thor’s load factor was still very high and remained high for a while.
* If logged into thor, no work could be done while the ‘cp’ was ongoing.
* Only after the ‘cp’ command completed did thor’s load factor start to drop and work become possible again on thor.
* Other machines, e.g. flash, linux8, had no issues, as far as I could tell.
* I did not invoke any of the EDA tools.

Additionally, the VMware-datastore-on-NFS (ZFS) bug that I was hoping to fix was only half resolved as well.

We can delete large files now, but doing so brings the SAN to a crawl. That is better than before, when it would come to a complete halt.
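To put numbers on the "crawl", it may help to sample pool and device I/O while reproducing the slow delete (a sketch; "tank" is a placeholder pool name):

```shell
# Sample pool-wide and per-vdev I/O once per second during the slow delete,
# to see whether other reads/writes are being starved by the free activity.
zpool iostat -v tank 1

# Per-device service times; high asvc_t on the pool's disks points at saturation.
iostat -xn 1
```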

I’d love to see if I could get a kernel hacker to look at both of these issues with us.

John Barfield
Engineering and Stuff

M: +1 (214) 425-0783  O: +1 (214) 506-8354
john.barfield at bissinc.com

4925 Greenville Ave, Ste 900
Dallas, TX 75206

For Support Requests:
http://support.bissinc.com or support at bissinc.com

Follow us on Twitter for Network Status & Updates!

https://twitter.com/johnbarfield

From: wuffers <moo at wuffers.net>
Date: Tuesday, April 18, 2017 at 9:38 AM
To: Dan McDonald <danmcd at omniti.com>
Cc: John Barfield <john.barfield at bissinc.com>, "omnios-discuss at lists.omniti.com" <omnios-discuss at lists.omniti.com>
Subject: Re: [OmniOS-discuss] ZPOOL bug after upgrade to r151020

I upgraded to r151020 in late January and saw some strangeness with arcstat (l2size and l2asize were huge) before rebooting due to some instability a few weeks ago. I thought it was just a case of not using the latest arcstat, and since things ran fine after the reboot, I didn't pursue it.

I saw this post last week and confirmed the problem was present in my environment, so I did the remove/re-add of the cache devices, then a complete reboot as well. The cache devices then reported their proper size (400GB) via "zpool iostat -v". Today I checked again and this is what I see:
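The remove/re-add itself is just the following, repeated per cache device (device name taken from my iostat output below; "tank" is a stand-in for the real pool name):

```shell
# Remove and re-add an L2ARC (cache) device to reset its accounting.
zpool remove tank c2t500117310015D579d0
zpool add tank cache c2t500117310015D579d0
```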

# arcstat
read  hits  miss  hit%  l2read  l2hits  l2miss  l2hit%  arcsz  l2size  l2asize
 465   412    53    88      53      50       3      94   230G    4.4T     3.2T

# zpool iostat -v

(other info snipped for brevity)

cache                          -      -      -      -      -      -
  c2t500117310015D579d0     816G  16.0E     54     23  2.32M  1.46M
  c2t50011731001631FDd0     816G  16.0E     54     23  2.32M  1.46M
  c12t500117310015D59Ed0    815G  16.0E     55     23  2.35M  1.46M
  c12t500117310015D54Ed0    816G  16.0E     55     23  2.36M  1.46M


I'm just waiting for the next lockup/crash...

John, were you able to compile the fix, and if so, could you send me a copy?

Thanks.


On Mon, Apr 10, 2017 at 10:00 AM, Dan McDonald <danmcd at omniti.com> wrote:

> On Apr 9, 2017, at 10:27 PM, John Barfield <john.barfield at bissinc.com> wrote:
>
> Thank you Dan.
>
> Do you happen to have the process, or know the location of a process document, for building only ZFS?
>
> I've rebuilt only NFS from illumos-gate in the past to resolve a bug, but I'm wondering how I would build and install only ZFS (if it's even possible).
>
> There are two bugs that we're suffering with at two different customer sites that didn't make it into r151020, and I'm not sure that we can make it until r151022 is released.
>
> Thanks for any advice

You can build zfs the way you likely built NFS. Build it, replace it on an alternate BE (in zfs's case: /kernel/fs/amd64/zfs), and reboot.
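For the archives, an incremental module build in an illumos-gate workspace looks roughly like this (a sketch; it assumes an already-configured illumos.sh env file and a workspace that has completed at least one full build):

```shell
cd /path/to/illumos-gate

# Run the module build inside the configured build environment.
./usr/src/tools/scripts/bldenv illumos.sh \
    'cd usr/src/uts/intel/zfs && dmake all'

# The freshly built module lands under obj64/ (debug builds use debug64/);
# copy it into the mounted alternate BE rather than the live /kernel.
cp usr/src/uts/intel/zfs/obj64/zfs /mnt/kernel/fs/amd64/zfs
```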

The only gotcha might be if a bugfix covers more than just ZFS itself... but for 7504, that's NOT the case.  :)

Dan

_______________________________________________
OmniOS-discuss mailing list
OmniOS-discuss at lists.omniti.com
http://lists.omniti.com/mailman/listinfo/omnios-discuss


