[OmniOS-discuss] zpool upgrade

Tue Nov 19 15:40:53 UTC 2013

Hello,

Are there any side effects of upgrading a ZFS pool to feature flags?

I tested it on my test server, but is it OK for production ZFS pools?

# zpool upgrade ydkpool
This system supports ZFS pool feature flags.

Successfully upgraded 'ydkpool' from version 28 to feature flags.
Enabled the following features on 'ydkpool':
  async_destroy
  empty_bpobj
  lz4_compress

----- Original Message -----
From: omnios-discuss-request at lists.omniti.com
To: omnios-discuss at lists.omniti.com
Sent: Tuesday, 19 November 2013 0:42:32
Subject: OmniOS-discuss Digest, Vol 20, Issue 19

Today's Topics:

   1. Re: kernel panic - anon_decref (wuffers)
   2. Re: kernel panic - anon_decref (wuffers)

----------------------------------------------------------------------

Message: 1
Date: Sat, 16 Nov 2013 02:48:43 -0500
From: wuffers <moo at wuffers.net>
To: Saso Kiselkov <skiselkov.ml at gmail.com>
Cc: omnios-discuss <omnios-discuss at lists.omniti.com>
Subject: Re: [OmniOS-discuss] kernel panic - anon_decref
Message-ID:
	<CA+tR_KyemANkTZyKs-1ejp+vzSGkZsDoBc=hhMqh-Qcu8AsvDA at mail.gmail.com>
Content-Type: text/plain; charset="iso-8859-1"

When it rains, it pours. With r151006y, I had two kernel panics in quick
succession while creating four eager-zeroed thick disks at the same time
in ESXi. They are now "kernel heap corruption detected" instead of
anon_decref.

Kernel panic 2 (dump info:
https://drive.google.com/file/d/0B7mCJnZUzJPKMHhqZHJnaDEzYkk)
http://i.imgur.com/eIssxmc.png?1
http://i.imgur.com/MXJy4zP.png?1

TIME                           UUID                                 SUNW-MSG-ID
Nov 16 2013 00:51:24.912170000 5998ba1e-3aa5-ccac-e885-be4897cfcfe8 SUNOS-8000-KL

  TIME                 CLASS                                 ENA
  Nov 16 00:51:24.8638 ireport.os.sunos.panic.dump_available 0x0000000000000000
  Nov 16 00:49:58.8671 ireport.os.sunos.panic.dump_pending_on_device 0x0000000000000000

nvlist version: 0
        version = 0x0
        class = list.suspect
        uuid = 5998ba1e-3aa5-ccac-e885-be4897cfcfe8
        code = SUNOS-8000-KL
        diag-time = 1384581084 866703
        de = fmd:///module/software-diagnosis
        fault-list-sz = 0x1
        fault-list = (array of embedded nvlists)
        (start fault-list[0])
        nvlist version: 0
                version = 0x0
                class = defect.sunos.kernel.panic
                certainty = 0x64
                asru = sw:///:path=/var/crash/unknown/.5998ba1e-3aa5-ccac-e885-be4897cfcfe8
                resource = sw:///:path=/var/crash/unknown/.5998ba1e-3aa5-ccac-e885-be4897cfcfe8
                savecore-succcess = 1
                dump-dir = /var/crash/unknown
                dump-files = vmdump.1
                os-instance-uuid = 5998ba1e-3aa5-ccac-e885-be4897cfcfe8
                panicstr = kernel heap corruption detected
                panicstack = fffffffffba49c04 () | genunix:kmem_slab_free+c1 () | genunix:kmem_magazine_destroy+6e () | genunix:kmem_depot_ws_reap+5d () | genunix:kmem_cache_magazine_purge+118 () | genunix:kmem_cache_magazine_resize+40 () | genunix:taskq_thread+2d0 () | unix:thread_start+8 () |
                crashtime = 1384577735
                panic-time = Fri Nov 15 23:55:35 2013 EST
        (end fault-list[0])

        fault-status = 0x1
        severity = Major
        __ttl = 0x1
        __tod = 0x528707dc 0x365e9c10

Kernel panic 3 (dump info:
https://drive.google.com/file/d/0B7mCJnZUzJPKbnZIeWZzQjhUOTQ)
(looked the same, no screenshots)

TIME                           UUID                                 SUNW-MSG-ID
Nov 16 2013 01:44:43.327489000 a6592c60-199f-ead5-9586-ff013bf5ab2d SUNOS-8000-KL

  TIME                 CLASS                                 ENA
  Nov 16 01:44:43.2941 ireport.os.sunos.panic.dump_available 0x0000000000000000
  Nov 16 01:44:03.5356 ireport.os.sunos.panic.dump_pending_on_device 0x0000000000000000

nvlist version: 0
        version = 0x0
        class = list.suspect
        uuid = a6592c60-199f-ead5-9586-ff013bf5ab2d
        code = SUNOS-8000-KL
        diag-time = 1384584283 296816
        de = fmd:///module/software-diagnosis
        fault-list-sz = 0x1
        fault-list = (array of embedded nvlists)
        (start fault-list[0])
        nvlist version: 0
                version = 0x0
                class = defect.sunos.kernel.panic
                certainty = 0x64
                asru = sw:///:path=/var/crash/unknown/.a6592c60-199f-ead5-9586-ff013bf5ab2d
                resource = sw:///:path=/var/crash/unknown/.a6592c60-199f-ead5-9586-ff013bf5ab2d
                savecore-succcess = 1
                dump-dir = /var/crash/unknown
                dump-files = vmdump.2
                os-instance-uuid = a6592c60-199f-ead5-9586-ff013bf5ab2d
                panicstr = kernel heap corruption detected
                panicstack = fffffffffba49c04 () | genunix:kmem_slab_free+c1 () | genunix:kmem_magazine_destroy+6e () | genunix:kmem_cache_magazine_purge+dc () | genunix:kmem_cache_magazine_resize+40 () | genunix:taskq_thread+2d0 () | unix:thread_start+8 () |
                crashtime = 1384582658
                panic-time = Sat Nov 16 01:17:38 2013 EST
        (end fault-list[0])

        fault-status = 0x1
        severity = Major
        __ttl = 0x1
        __tod = 0x5287145b 0x138515e8
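
The two heap-corruption stacks above are easier to compare with one frame per entry. What follows is only a minimal Python sketch: the stack strings are pasted from the two reports with the mail line wraps removed, and the helper name `frames` is hypothetical, not part of any illumos tool.

```python
# Panic stacks copied from the panic 2 and panic 3 reports (line wraps removed).
PANIC2 = (
    "fffffffffba49c04 () | genunix:kmem_slab_free+c1 () | "
    "genunix:kmem_magazine_destroy+6e () | genunix:kmem_depot_ws_reap+5d () | "
    "genunix:kmem_cache_magazine_purge+118 () | "
    "genunix:kmem_cache_magazine_resize+40 () | genunix:taskq_thread+2d0 () | "
    "unix:thread_start+8 () |"
)
PANIC3 = (
    "fffffffffba49c04 () | genunix:kmem_slab_free+c1 () | "
    "genunix:kmem_magazine_destroy+6e () | "
    "genunix:kmem_cache_magazine_purge+dc () | "
    "genunix:kmem_cache_magazine_resize+40 () | genunix:taskq_thread+2d0 () | "
    "unix:thread_start+8 () |"
)


def frames(panicstack):
    """Split a panicstack string into frame names, dropping '()' and offsets."""
    names = []
    for part in panicstack.split("|"):
        part = part.replace("()", "").strip()
        if part:
            # Keep only 'module:function', discarding the '+offset' suffix.
            names.append(part.split("+")[0])
    return names


# Frames that appear in one stack but not the other.
only_in_2 = [f for f in frames(PANIC2) if f not in frames(PANIC3)]
only_in_3 = [f for f in frames(PANIC3) if f not in frames(PANIC2)]
print("only in panic 2:", only_in_2)
print("only in panic 3:", only_in_3)
```

Comparing by function name only, the stacks differ in a single frame; the offsets into kmem_cache_magazine_purge (+118 versus +dc) also differ, which this sketch deliberately ignores.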

---
Now, having looked through all 3, I can see in the first two there were
some warnings:

WARNING: /pci@0,0/pci8086,3c08@3/pci1000,3030@0 (mpt_sas1):
        mptsas_handle_event_sync: IOCStatus=0x8000, IOCLogInfo=0x31120303

/var/adm/messages also had a sprinkling of these:
Nov 15 23:36:43 san1 scsi: [ID 243001 kern.warning] WARNING: /pci@0,0/pci8086,3c08@3/pci1000,3030@0 (mpt_sas1):
Nov 15 23:36:43 san1    mptsas_handle_event: IOCStatus=0x8000, IOCLogInfo=0x31120303
Nov 15 23:36:43 san1 scsi: [ID 365881 kern.info] /pci@0,0/pci8086,3c08@3/pci1000,3030@0 (mpt_sas1):
Nov 15 23:36:43 san1    Log info 0x31120303 received for target 10.
Nov 15 23:36:43 san1    scsi_status=0x0, ioc_status=0x804b, scsi_state=0xc
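
The IOCLogInfo value can be split into fields. This is only a sketch, and it rests on an assumption: that the value follows the layout used by the Linux mpt2sas/mpt3sas drivers' loginfo union (bus type in the top 4 bits, then a 4-bit originator, an 8-bit code, and a 16-bit sub-code). The originator names are taken from that driver and should be checked against LSI's documentation before relying on them. The function name `decode_loginfo` is hypothetical.

```python
# Assumed field layout (from the Linux mpt2sas/mpt3sas loginfo union):
#   bits 31-28 bus type, 27-24 originator, 23-16 code, 15-0 sub-code.
# Assumed originator names from the same driver; verify against LSI docs.
ORIGINATORS = {0: "IOP", 1: "PL", 2: "IR"}


def decode_loginfo(value):
    """Split a 32-bit IOCLogInfo value into its four assumed fields."""
    return {
        "bus_type": (value >> 28) & 0xF,
        "originator": (value >> 24) & 0xF,
        "code": (value >> 16) & 0xFF,
        "subcode": value & 0xFFFF,
    }


info = decode_loginfo(0x31120303)  # value from the mpt_sas1 warnings
print("bus_type   = 0x%x" % info["bus_type"])
print("originator = 0x%x (%s)" % (
    info["originator"], ORIGINATORS.get(info["originator"], "unknown")))
print("code       = 0x%02x" % info["code"])
print("subcode    = 0x%04x" % info["subcode"])
```

Under that assumed layout, 0x31120303 breaks down into bus type 0x3, originator 0x1, code 0x12, and sub-code 0x0303. What the code and sub-code mean is not something this sketch can tell you.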

Following
http://lists.omniti.com/pipermail/omnios-discuss/2013-March/000544.html to
map the target to a disk, it turns out to be my Stec ZeusRAM ZIL drive, which
is configured as a mirror (if I've done it right). I didn't see these errors
in the 3rd dump, so I don't know whether they are contributing. I may run a
memtest on the system tomorrow in case it's a hardware issue.

My zpool status shows all my drives okay with no known data errors.

Not sure how to proceed from here. My Hyper-V hosts have been using the
SAN over SRP and IB with no issues for the 2+ months since it was set up.
I'd expect the VM hosts to crash before my SAN does.

Of course, I can make the vmdump.x files available to anyone who wants to
look at them (7GB, 8GB, 4GB).

------------------------------

Message: 2
Date: Mon, 18 Nov 2013 17:42:31 -0500
From: wuffers <moo at wuffers.net>
To: Saso Kiselkov <skiselkov.ml at gmail.com>
Cc: omnios-discuss <omnios-discuss at lists.omniti.com>
Subject: Re: [OmniOS-discuss] kernel panic - anon_decref
Message-ID:
	<CA+tR_Kx_hShMmt9mxEuje=Wy+CxnPoqBBt=SWNdTH=ttfnK1EA at mail.gmail.com>
Content-Type: text/plain; charset="iso-8859-1"

Just to add to this, I had a 4th kernel panic, and it was a third distinct
type. I ran a memtest on the unit after this last panic, and it completed
successfully (24+ hours). I'm skeptical that it's memory, or anything to
do with the IOCLogInfo=0x31120303 error (the last 2 panics didn't have that;
I may start another thread on it), as I've been running this config with
Hyper-V hosts just fine. Adding an ESXi host (just one for now) into the
mix seems to make things unstable.

Should I open an issue in the illumos issue tracker
(https://www.illumos.org/projects/illumos-gate/issues/new), and if so, just
one report or one for each panic type?

List of kernel panics so far:

Panic 1: anon_decref: slot count 0
Panic 2-3: kernel heap corruption detected
Panic 4: BAD TRAP: type=e (#pf Page fault) rp=ffffff01e97d7a70 addr=1500010
occurred in module "genunix" due to an illegal access to a user address

Latest crash file here:
https://drive.google.com/file/d/0B7mCJnZUzJPKWW83TFBhVHpVajQ

TIME                           UUID                                 SUNW-MSG-ID
Nov 17 2013 09:22:20.799446000 9d55f532-d39f-4dea-8f57-d3b24c8e9dff SUNOS-8000-KL

  TIME                 CLASS                                 ENA
  Nov 17 09:22:20.7654 ireport.os.sunos.panic.dump_available 0x0000000000000000
  Nov 17 09:21:14.0267 ireport.os.sunos.panic.dump_pending_on_device 0x0000000000000000

nvlist version: 0
        version = 0x0
        class = list.suspect
        uuid = 9d55f532-d39f-4dea-8f57-d3b24c8e9dff
        code = SUNOS-8000-KL
        diag-time = 1384698140 767808
        de = fmd:///module/software-diagnosis
        fault-list-sz = 0x1
        fault-list = (array of embedded nvlists)
        (start fault-list[0])
        nvlist version: 0
                version = 0x0
                class = defect.sunos.kernel.panic
                certainty = 0x64
                asru = sw:///:path=/var/crash/unknown/.9d55f532-d39f-4dea-8f57-d3b24c8e9dff
                resource = sw:///:path=/var/crash/unknown/.9d55f532-d39f-4dea-8f57-d3b24c8e9dff
                savecore-succcess = 1
                dump-dir = /var/crash/unknown
                dump-files = vmdump.3
                os-instance-uuid = 9d55f532-d39f-4dea-8f57-d3b24c8e9dff
                panicstr = BAD TRAP: type=e (#pf Page fault) rp=ffffff01e97d7a70 addr=1500010 occurred in module "genunix" due to an illegal access to a user address
                panicstack = unix:die+df () | unix:trap+db3 () | unix:cmntrap+e6 () | genunix:anon_decref+35 () | genunix:anon_free+74 () | genunix:segvn_free+242 () | genunix:seg_free+30 () | genunix:segvn_unmap+cde () | genunix:as_free+e7 () | genunix:relvm+220 () | genunix:proc_exit+454 () | genunix:exit+15 () | genunix:rexit+18 () | unix:brand_sys_sysenter+1c9 () |
                crashtime = 1384592942
                panic-time = Sat Nov 16 04:09:02 2013 EST
        (end fault-list[0])

        fault-status = 0x1
        severity = Major
        __ttl = 0x1
        __tod = 0x5288d11c 0x2fa693f0
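
The crashtime and diag-time fields in the report above are Unix epoch seconds, so the gap between the panic and its diagnosis can be worked out directly. A minimal Python sketch, assuming a fixed UTC-5 offset for EST (the zone the report prints) and using only the seconds part of diag-time; the helper name `est` is hypothetical.

```python
from datetime import datetime, timedelta, timezone

# Fixed UTC-5 offset, assumed to match the "EST" printed in panic-time.
EST = timezone(timedelta(hours=-5), "EST")


def est(epoch_seconds):
    """Convert Unix epoch seconds to a timezone-aware datetime in EST."""
    return datetime.fromtimestamp(epoch_seconds, tz=EST)


crashtime = 1384592942  # 'crashtime' from the panic 4 report
diag_time = 1384698140  # seconds part of 'diag-time' from the same report

print("panic:    ", est(crashtime).isoformat())
print("diagnosed:", est(diag_time).isoformat())
print("gap:      ", timedelta(seconds=diag_time - crashtime))
```

With these values the panic maps to 04:09:02 on Nov 16 and the diagnosis to 09:22:20 on Nov 17, about 29 hours apart. That would be consistent with the 24+ hour memtest having run before the dump was saved, though the report itself does not say so.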
