[OmniOS-discuss] Hung ZFS Pool

Brian Hechinger wonko at 4amlunch.net
Wed Dec 9 15:20:15 UTC 2015


I cannot ^C out of the touch.

wonko@basket1:/export/home/wonko$ ps -ef | grep touch
    root  2459  2447   0 08:12:09 ?           0:00 touch /zoom/hi
    root  2050  2049   0   Dec 07 ?           0:00 touch hi
    root  2049     1   0   Dec 07 ?           0:00 sudo touch hi

Also, kill -9 doesn’t touch them.
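
A process that ignores kill -9 is usually blocked inside the kernel, and its kernel stack shows what it is waiting on. A minimal sketch for capturing those stacks on the live system, assuming root access and that mdb -k is usable on this box; the function names are hypothetical, and the PIDs are the ones from the ps output above:

```shell
# Build the mdb pipeline that prints the kernel stack of every thread
# belonging to one PID. The "0t" prefix tells mdb the PID is decimal.
findstack_cmd() {
    printf '0t%s::pid2proc | ::walk thread | ::findstack -v\n' "$1"
}

# Dump kernel stacks for each PID passed as an argument.
# Needs root, since mdb -k reads live kernel memory.
dump_kernel_stacks() {
    for pid in "$@"; do
        echo "== pid $pid =="
        findstack_cmd "$pid" | mdb -k
    done
}

# Example, using the hung touch processes listed above:
#   dump_kernel_stacks 2459 2050
```

If the stacks end in ZFS I/O wait routines, that would be consistent with the pool having suspended I/O, but that is a guess until the output is in hand.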

The only thing in /var/adm/messages is:

Dec  7 14:31:56 basket1 fmd: [ID 377184 daemon.error] SUNW-MSG-ID: ZFS-8000-HC, TYPE: Error, VER: 1, SEVERITY: Major
Dec  7 14:31:56 basket1 EVENT-TIME: Mon Dec  7 14:31:56 EST 2015
Dec  7 14:31:56 basket1 PLATFORM: X8DTL, CSN: 1234567890, HOSTNAME: basket1
Dec  7 14:31:56 basket1 SOURCE: zfs-diagnosis, REV: 1.0
Dec  7 14:31:56 basket1 EVENT-ID: 585f9fa2-4a84-4184-8c87-c2f9c600e1a1
Dec  7 14:31:56 basket1 DESC: The ZFS pool has experienced currently unrecoverable I/O
Dec  7 14:31:56 basket1             failures.  Refer to http://illumos.org/msg/ZFS-8000-HC for more information.
Dec  7 14:31:56 basket1 AUTO-RESPONSE: No automated response will be taken.
Dec  7 14:31:56 basket1 IMPACT: Read and write I/Os cannot be serviced.
Dec  7 14:31:56 basket1 REC-ACTION: Make sure the affected devices are connected, then run
Dec  7 14:31:56 basket1             'zpool clear'.
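
The REC-ACTION above boils down to a short command sequence. A sketch, assuming the pool is named zoom (inferred from the /zoom/hi path, not confirmed) and that the underlying devices are reachable again; the function names are hypothetical:

```shell
# Show pool state and any faults FMA has diagnosed. Read-only.
pool_health() {
    zpool status -x "$1"        # reports only if the pool has problems
    zpool get failmode "$1"     # wait/continue/panic: behaviour on I/O failure
    fmadm faulty                # lists diagnosed faults such as ZFS-8000-HC
}

# Clear the pool's error state so that suspended I/O can resume.
pool_resume() {
    zpool clear "$1"
}

# Example (the pool name is an assumption):
#   pool_health zoom
#   pool_resume zoom
```

If the devices are still unreachable, zpool clear may block or fail, so checking health first is the safer order.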

I can definitely share a kernel coredump; that’s not a problem. I just need to schedule a time to shut down all the VMs first.

Maybe later tonight.

-brian

> On Dec 9, 2015, at 10:16 AM, Dan McDonald <danmcd at omniti.com> wrote:
> 
> 
>> On Dec 9, 2015, at 8:14 AM, Brian Hechinger <wonko at 4amlunch.net> wrote:
>> 
>> So read access appears to be ok. Writes are totally boned, however.  That touch just hangs forever.
>> 
>> So what do I need to do to provide you all with the information you need to diagnose this?
> 
> Do you literally have a touch process hanging right now?  Or is it something you can ^C out of?
> 
> Does anything stand out in /var/adm/messages?  Maybe the kernel is complaining about something there.
> 
> My final inclination is heavy-handed:
> 
> 	- Make sure you have at least one process stuck on writing to that filesystem.
> 
> 	- "reboot -d" and take a kernel coredump
> 
> Unless you have sensitive information, a kernel coredump you can share would be the best thing to do.
> 
> 
> Dan
> 
> p.s. I'm at the Dr. the rest of the day starting in 90 mins, pardon any latency.


