[OmniOS-discuss] crash dump analysis help

Sat Apr 18 09:23:29 UTC 2015

Hi guys,

My OmniOS ZFS server crashed today and I got a complete crash dump. I was wondering if I am on the right track...

Here is what I have done so far:

root@zfs10:/root# fmdump -Vp -u 775e0fc1-dcd2-4cb2-b800-88a1b9910f94
TIME                           UUID                                 SUNW-MSG-ID
Apr 17 2015 22:48:13.667749000 775e0fc1-dcd2-4cb2-b800-88a1b9910f94 SUNOS-8000-KL
  TIME                 CLASS                                 ENA
  Apr 17 22:48:13.6544 ireport.os.sunos.panic.dump_available 0x0000000000000000
  Apr 17 22:45:46.3335 ireport.os.sunos.panic.dump_pending_on_device 0x0000000000000000
nvlist version: 0
        version = 0x0
        class = list.suspect
        uuid = 775e0fc1-dcd2-4cb2-b800-88a1b9910f94
        code = SUNOS-8000-KL
        diag-time = 1429303693 655062
        de = fmd:///module/software-diagnosis
        fault-list-sz = 0x1
        fault-list = (array of embedded nvlists)
        (start fault-list[0])
        nvlist version: 0
                version = 0x0
                class = defect.sunos.kernel.panic
                certainty = 0x64
                asru = sw:///:path=/var/crash/unknown/.775e0fc1-dcd2-4cb2-b800-88a1b9910f94
                resource = sw:///:path=/var/crash/unknown/.775e0fc1-dcd2-4cb2-b800-88a1b9910f94
                savecore-succcess = 1
                dump-dir = /var/crash/unknown
                dump-files = vmdump.1
                os-instance-uuid = 775e0fc1-dcd2-4cb2-b800-88a1b9910f94
                panicstr = BAD TRAP: type=e (#pf Page fault) rp=ffffff01701bb960 addr=ec6093a0 occurred in module "unix" due to an illegal access to a user address
                panicstack = unix:die+df () | unix:trap+db3 () | unix:cmntrap+e6 () | unix:bzero+184 () | zfs:l2arc_write_buffers+1f8 () | zfs:l2arc_feed_thread+240 () | unix:thread_start+8 () |
                crashtime = 1429299093
                panic-time = Fri Apr 17 21:31:33 2015 CEST
        (end fault-list[0])
        fault-status = 0x1
        severity = Major
        __ttl = 0x1
        __tod = 0x5531718d 0x27cd0a88
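The raw numeric fields in this output can be checked against the human-readable ones. The following is a small sketch for a POSIX shell. The values are copied by hand from the fmdump output, and the variable names are made up for illustration. It shows that __tod is the diagnosis time as (seconds, nanoseconds) in hex. It also shows that crashtime is 19:31:33 UTC, which is the 21:31:33 CEST reported in panic-time. So the 22:48:13 in the TIME column is when the dump was diagnosed after the reboot, not when the panic happened.

```shell
#!/bin/sh
# Sketch: cross-check raw timestamp fields from the fmdump -V output.
# All values are copied by hand from that output.
crashtime=1429299093      # crashtime: seconds since the Unix epoch (UTC)
tod_sec_hex=0x5531718d    # first word of __tod (hex seconds)
tod_nsec_hex=0x27cd0a88   # second word of __tod (hex nanoseconds)

# POSIX printf treats numeric arguments as C constants, so 0x... is accepted.
tod_dec=$(printf '%d' "$tod_sec_hex")
tod_nsec=$(printf '%d' "$tod_nsec_hex")
echo "__tod: $tod_dec s, $tod_nsec ns"   # seconds equal diag-time

# Time of day in UTC using plain shell arithmetic (86400 seconds per day).
s=$((crashtime % 86400))
printf 'crash at %02d:%02d:%02d UTC\n' $((s / 3600)) $((s % 3600 / 60)) $((s % 60))
```

The nanosecond word (667749000) matches the fractional part of the TIME column, 22:48:13.667749000.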

// then extract the dump file (there was not enough space in /var/crash/unknown, so I saved it to /pool01/ISO instead):

savecore: not enough space in /var/crash/unknown (14937 MB avail, 27154 MB needed)
root@zfs10:/var/crash/unknown# savecore -f /pool01/ISO/vmdump.1 /pool01/ISO/
savecore: System dump time: Fri Apr 17 21:31:33 2015
savecore: saving system crash dump in /pool01/ISO//{unix,vmcore}.1
Constructing namelist /pool01/ISO//unix.1
Constructing corefile /pool01/ISO//vmcore.1
 3:33 100% done: 6897249 of 6897249 pages saved
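As a rough cross-check of the savecore numbers, the page count can be converted to a size. This sketch assumes 4 KiB (4096-byte) base pages, which is the x86 base page size. It also treats savecore's "MB" as MiB, which is an assumption. The variable names are hypothetical. The result is about 26942 MiB. That is in the same range as the 27154 MB savecore said it needed, and well above the 14937 MB available, which is why the first attempt into /var/crash/unknown failed.

```shell
#!/bin/sh
# Sketch: approximate uncompressed vmcore size from the savecore page count.
# ASSUMPTION: 4096-byte pages (x86 base page size); savecore "MB" taken as MiB.
pages=6897249        # "pages saved" from the savecore output
pagesize=4096        # assumed page size in bytes
avail_mb=14937       # free space savecore reported in /var/crash/unknown

bytes=$((pages * pagesize))
mib=$((bytes / 1048576))
echo "approx. vmcore size: $mib MiB"

if [ "$mib" -gt "$avail_mb" ]; then
    echo "does not fit in the $avail_mb MB available"
fi
```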

// then open the dump in mdb and run $c to get the stack backtrace of the thread that panicked:

root@zfs10:/pool01/ISO# mdb unix.1 vmcore.1
Loading modules: [ unix genunix specfs dtrace mac cpu.generic uppc apix scsi_vhci zfs sata sd ip hook neti sockfs arp usba stmf stmf_sbd fctl md lofs mpt_sas random ufs idm smbsrv nfs crypto ptm cpc kvm fcp fcip logindmux nsmb nsctl sdbc ii sv rdc ]
> $c
bzero+0x184()
l2arc_write_buffers+0x1f8(ffffff328f860000, ffffff331782d8d8, 800000, ffffff01701bbbec)
l2arc_feed_thread+0x240()
thread_start+8()
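The panicstack line in the fmdump output already contains the same backtrace that $c prints. It lists the innermost frame first, preceded by the trap-handling frames die, trap and cmntrap. The following is a minimal sketch in POSIX sh with tr and sed that splits that string into one frame per line and picks the first frame after the trap handler. The string is copied by hand, and the variable names are hypothetical.

```shell
#!/bin/sh
# Sketch: split the one-line panicstack from fmdump into one frame per line.
# The string is copied by hand from the fmdump -V output.
panicstack='unix:die+df () | unix:trap+db3 () | unix:cmntrap+e6 () | unix:bzero+184 () | zfs:l2arc_write_buffers+1f8 () | zfs:l2arc_feed_thread+240 () | unix:thread_start+8 () |'

# Split on "|", trim leading/trailing blanks, drop the empty trailing piece.
frames=$(printf '%s\n' "$panicstack" | tr '|' '\n' | sed 's/^ *//; s/ *$//' | grep -v '^$')
printf '%s\n' "$frames"

# Frames 1-3 (die, trap, cmntrap) belong to the trap handler;
# frame 4 is where the page fault occurred.
faulting=$(printf '%s\n' "$frames" | sed -n '4p')
echo "faulting frame: $faulting"
```

The fourth frame, unix:bzero+184, is the same top frame that $c shows as bzero+0x184. The offsets are the same hex value, written without the 0x prefix.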

// Based on this, I believe the M.2 SATA Samsung SSDs used as L2ARC cache devices in the zpool are ready to be thrown in the bin...

Is there some way I can gather more info and confirm I am on the right track?

br,
Rune