[OmniOS-discuss] core dump while trying to import pool
Michael Talbott
mtalbott at lji.org
Fri Dec 4 19:33:09 UTC 2015
I also ran into this same issue after rebooting one of my OmniOS machines.
I had l2arc devices on my pool until the bug was announced. At that point I
immediately removed my l2arc devices and held off rebooting the machine
until a convenient time, when I could deal with it if something bad were to
happen. Well, it was good I planned for that reboot ;)
I was able to boot in single user mode, delete the pool cache file, reboot,
import without mounting (zpool import -N <pool>) and then scrub. The scrub
fixed 16 KB of data in my 254 TB pool. I then exported the pool and imported
it read-write, only to discover that this did not fix the problem at all.
Importing read-only allows proper mounting to pull data off.
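
For reference, the sequence above might look roughly like the sketch below. The pool name "tank" is a placeholder, and the cache path is the usual illumos location, which is an assumption since the message does not name it. It is meant to be read as a checklist, not a one-shot script: a reboot belongs between the first and second steps, and the scrub has to finish (check with zpool status) before the export. By default nothing is executed; the commands are only printed.

```shell
#!/bin/sh
# Sketch of the recovery sequence described above (checklist, not a one-shot).
POOL="${POOL:-tank}"                      # hypothetical pool name
CACHE="${CACHE:-/etc/zfs/zpool.cache}"    # usual illumos path (assumption)
DRY_RUN="${DRY_RUN:-1}"                   # 1 = only print the commands

run() {
    # Print the command; execute it only when DRY_RUN=0.
    echo "+ $*"
    if [ "$DRY_RUN" = "0" ]; then
        "$@"
    fi
}

recover_steps() {
    run rm -f "$CACHE"                        # in single-user mode, then reboot
    run zpool import -N "$POOL"               # import without mounting anything
    run zpool scrub "$POOL"                   # wait until the scrub completes
    run zpool export "$POOL"
    run zpool import -o readonly=on "$POOL"   # read-only import to copy data off
}
```

Calling recover_steps with DRY_RUN left at 1 prints the five commands in order, which is a cheap way to review them before typing anything on a machine that panics on import.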
The problem for me came down to mounting 1 of my 52 filesystems as rw. After
a zpool import -N, I was able to mount the filesystems one by one to
discover which filesystem was causing the issue.
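
A minimal sketch of that one-by-one approach follows. Since mounting the bad filesystem panics the box, the idea here is to record each name before attempting the mount, so the last entry in the log after the reboot points at the culprit. The log path and the function name probe_mounts are hypothetical, and the log should live somewhere off the affected pool.

```shell
#!/bin/sh
# Sketch: mount filesystems individually after `zpool import -N`, logging
# each name first so the culprit can be identified after a panic.
LOG="${LOG:-/var/tmp/mount-probe.log}"   # hypothetical log path, off the pool
DRY_RUN="${DRY_RUN:-1}"                  # 1 = only print what would be tried

probe_mounts() {
    # Reads dataset names on stdin, one per line; blank lines are skipped.
    while IFS= read -r fs; do
        [ -n "$fs" ] || continue
        echo "trying $fs"
        if [ "$DRY_RUN" = "0" ]; then
            echo "trying $fs" >> "$LOG"
            sync                          # try to flush the log before mounting
            zfs mount "$fs" || echo "mount failed: $fs"
        fi
    done
}
```

A real run could be fed from the dataset list, for example: zfs list -H -o name -t filesystem -r <pool> | DRY_RUN=0 probe_mounts. Datasets that cannot be mounted this way (legacy or unset mountpoints) will simply report a failed mount.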
I'm still rsyncing the problem filesystem out since, as luck would have it,
it was the only one that I wasn't replicating out (probably a good thing,
considering) since I used it as a scratch drive. My plan is to destroy and
then recreate the problem fs after the sync finishes, rsync the data back,
and cross my fingers that the problem doesn't come back or get worse.
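
The plan above could be written down as the following dry-run sketch. Every name in it (pool, dataset, source and destination paths) is a placeholder, and whether destroying and recreating the filesystem actually clears the panic is not confirmed anywhere in this thread. It only prints the commands unless DRY_RUN=0 is set.

```shell
#!/bin/sh
# Sketch of the copy-out / destroy / recreate / copy-back plan.
POOL="${POOL:-tank}"             # hypothetical pool name
FS="${FS:-tank/scratch}"         # hypothetical problem dataset
SRC="${SRC:-/tank/scratch}"      # hypothetical mountpoint of that dataset
DST="${DST:-/backup/scratch}"    # hypothetical destination off the pool
DRY_RUN="${DRY_RUN:-1}"          # 1 = only print the commands

run() {
    # Print the command; execute it only when DRY_RUN=0.
    echo "+ $*"
    if [ "$DRY_RUN" = "0" ]; then
        "$@"
    fi
}

rebuild_steps() {
    run rsync -a "$SRC/" "$DST/"    # copy out while imported read-only
    run zpool export "$POOL"
    run zpool import -N "$POOL"     # read-write, nothing mounted
    run zfs destroy "$FS"           # add -r if the dataset has snapshots
    run zfs create "$FS"
    run rsync -a "$DST/" "$SRC/"    # copy the data back
}
```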
The panic I'm seeing when that filesystem is mounted is:
BAD TRAP: type=e (#pf Page fault) rp=ffffff00f5cee290 addr=20 occurred in
module "zfs" due to a NULL pointer dereference
Here are the details of my crash, which appears to be the same as yours:
root@store2:/var/crash/unknown# mdb unix.2 vmcore.2
Loading modules: [ unix genunix specfs dtrace mac cpu.generic uppc apix
scsi_vhci zfs mr_sas sd ip hook neti sockfs arp usba stmf stmf_sbd random
md lofs idm sata cpc crypto kvm mpt_sas ufs logindmux nsmb ptm smbsrv nfs
ipc ]
> $c
zap_leaf_lookup_closest+0x45(ffffff223e7bd290, 0, 0, ffffff00f5cee3f0)
fzap_cursor_retrieve+0xbb(ffffff223e7bd290, ffffff00f5cee650, ffffff00f5cee530)
zap_cursor_retrieve+0x11e(ffffff00f5cee650, ffffff00f5cee530)
zfs_purgedir+0x67(ffffff2232f41bc0)
zfs_rmnode+0x202(ffffff2232f41bc0)
zfs_zinactive+0xe8(ffffff2232f41bc0)
zfs_inactive+0x75(ffffff2232f44640, ffffff221918b468, 0)
fop_inactive+0x76(ffffff2232f44640, ffffff221918b468, 0)
vn_rele+0x82(ffffff2232f44640)
zfs_unlinked_drain+0xaa(ffffff21f254d000)
zfsvfs_setup+0xe8(ffffff21f254d000, 1)
zfs_domount+0x131(ffffff223d709368, ffffff222916fd80)
zfs_mount+0x24f(ffffff223d709368, ffffff21f2645400, ffffff00f5ceee00, ffffff221918b468)
fsop_mount+0x1e(ffffff223d709368, ffffff21f2645400, ffffff00f5ceee00, ffffff221918b468)
domount+0x86b(0, ffffff00f5ceee00, ffffff21f2645400, ffffff221918b468, ffffff00f5ceee40)
mount+0x167(ffffff2228e61c38, ffffff00f5ceee90)
syscall_ap+0x94()
_sys_sysenter_post_swapgs+0x149()
> ::status
debugging crash dump vmcore.2 (64-bit) from store2
operating system: 5.11 omnios-8322307 (i86pc)
image uuid: 69a1d6dd-f13a-627d-c2a0-b00c9e50a800
panic message:
BAD TRAP: type=e (#pf Page fault) rp=ffffff00f5cee290 addr=20 occurred in
module "zfs" due to a NULL pointer dereference
dump content: kernel pages only
> ::stack
zap_leaf_lookup_closest+0x45(ffffff223e7bd290, 0, 0, ffffff00f5cee3f0)
fzap_cursor_retrieve+0xbb(ffffff223e7bd290, ffffff00f5cee650, ffffff00f5cee530)
zap_cursor_retrieve+0x11e(ffffff00f5cee650, ffffff00f5cee530)
zfs_purgedir+0x67(ffffff2232f41bc0)
zfs_rmnode+0x202(ffffff2232f41bc0)
zfs_zinactive+0xe8(ffffff2232f41bc0)
zfs_inactive+0x75(ffffff2232f44640, ffffff221918b468, 0)
fop_inactive+0x76(ffffff2232f44640, ffffff221918b468, 0)
vn_rele+0x82(ffffff2232f44640)
zfs_unlinked_drain+0xaa(ffffff21f254d000)
zfsvfs_setup+0xe8(ffffff21f254d000, 1)
zfs_domount+0x131(ffffff223d709368, ffffff222916fd80)
zfs_mount+0x24f(ffffff223d709368, ffffff21f2645400, ffffff00f5ceee00, ffffff221918b468)
fsop_mount+0x1e(ffffff223d709368, ffffff21f2645400, ffffff00f5ceee00, ffffff221918b468)
domount+0x86b(0, ffffff00f5ceee00, ffffff21f2645400, ffffff221918b468, ffffff00f5ceee40)
mount+0x167(ffffff2228e61c38, ffffff00f5ceee90)
syscall_ap+0x94()
_sys_sysenter_post_swapgs+0x149()
> ::panicinfo
cpu 3
thread ffffff21f2968440
message
BAD TRAP: type=e (#pf Page fault) rp=ffffff00f5cee290 addr=20 occurred in
module "zfs" due to a NULL pointer dereference
rdi ffffff223e7bd290
rsi 0
rdx 8
rcx 4170d6eb
r8 ffffff00f5cee3f0
r9 ffffff00f5cee1c8
rax 4170d6f0
rbx ffffff00f5cee650
rbp ffffff00f5cee3d0
r10 fffffffffb854358
r11 0
r12 800
r13 0
r14 ffffff00f5cee3f0
r15 ffffff00f5cee530
fsbase 0
gsbase ffffff21f169c000
ds 4b
es 4b
fs 0
gs 1c3
trapno e
err 0
rip fffffffff7a11e95
cs 30
rflags 10206
rsp ffffff00f5cee380
ss 38
gdt_hi 0
gdt_lo 700001ef
idt_hi 0
idt_lo 40000fff
ldt 0
task 70
cr0 8005003b
cr2 20
cr3 206fe00000
cr4 426f8
>
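
For anyone wanting to capture the same output for a bug report, the session above can be run non-interactively by feeding the dcmds to mdb on stdin. This is a small sketch: the function names and the DUMP_N variable are made up for illustration, and it assumes it is run from the crash directory that holds the unix.N and vmcore.N files.

```shell
#!/bin/sh
# Sketch: collect the same mdb output without an interactive session.
DUMP_N="${DUMP_N:-2}"    # suffix of unix.N / vmcore.N, as in the session above

crash_dcmds() {
    # The dcmds used in the session above, one per line.
    printf '%s\n' '$c' '::status' '::stack' '::panicinfo'
}

crash_report() {
    # Pipe the dcmds into mdb; run from the directory holding the dump files.
    crash_dcmds | mdb "unix.$DUMP_N" "vmcore.$DUMP_N"
}
```

Something like cd /var/crash/unknown && crash_report > panic-report.txt would then leave a text file that can be attached to a list post.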
________________________
Michael Talbott
Systems Administrator
La Jolla Institute
On Dec 4, 2015, at 7:56 AM, Dan McDonald <danmcd at omniti.com> wrote:
> On Dec 4, 2015, at 10:53 AM, Lawrence Giam <paladinemishakal at gmail.com>
> wrote:
>> Should I cancel the scrub and try the method that John suggested?
> I'd let the scrub run to be sure. If it's the class of bug I'm thinking,
> though, scrub won't catch it. :(
> And if you can provide one of those r151014 core dumps, that'd be great.
> If this pool has confidential data, though, I can understand why not.
> Dan