[OmniOS-discuss] Hung ZFS Pool
Brian Hechinger
wonko at 4amlunch.net
Wed Dec 9 16:13:11 UTC 2015
I didn’t know about pgrep, no. :)
So the ‘zpool clear’ has partly fixed things. The touch processes have all exited,
and I can now touch a file on that pool.
After a zpool scrub, this is the status:
  pool: zoom
 state: ONLINE
status: One or more devices has experienced an unrecoverable error.  An
        attempt was made to correct the error.  Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors
        using 'zpool clear' or replace the device with 'zpool replace'.
   see: http://illumos.org/msg/ZFS-8000-9P
  scan: scrub repaired 6K in 0h0m with 0 errors on Wed Dec  9 10:25:33 2015
config:

        NAME          STATE     READ WRITE CKSUM
        zoom          ONLINE       0     0     0
          mirror-0    ONLINE       0     0     0
            c4t1d0s1  ONLINE       0     0     0
            c5t1d0s1  ONLINE       0     0     2

errors: No known data errors
I’m going to try to re-run iozone later and see if I can’t get it to happen again.
This is concerning.
The only earlier entry in messages is from 4 days prior, and it's about ntpd.
-brian
> On Dec 9, 2015, at 10:25 AM, Dan McDonald <danmcd at omniti.com> wrote:
>
>
>> On Dec 9, 2015, at 10:20 AM, Brian Hechinger <wonko at 4amlunch.net> wrote:
>>
>> I cannot ^C out of the touch.
>>
>> wonko at basket1:/export/home/wonko$ ps -ef | grep touch
>
> You do know about pgrep(1), right? :)
>
>> Also, kill -9 doesn’t touch them.
>
> Okay! This means something in-kernel is locking them up. More reason for a coredump.
>
>> the only thing in messages is:
>>
>> Dec 7 14:31:56 basket1 fmd: [ID 377184 daemon.error] SUNW-MSG-ID: ZFS-8000-HC, TYPE: Error, VER: 1, SEVERITY: Major
>> Dec 7 14:31:56 basket1 EVENT-TIME: Mon Dec 7 14:31:56 EST 2015
>> Dec 7 14:31:56 basket1 PLATFORM: X8DTL, CSN: 1234567890, HOSTNAME: basket1
>> Dec 7 14:31:56 basket1 SOURCE: zfs-diagnosis, REV: 1.0
>> Dec 7 14:31:56 basket1 EVENT-ID: 585f9fa2-4a84-4184-8c87-c2f9c600e1a1
>> Dec 7 14:31:56 basket1 DESC: The ZFS pool has experienced currently unrecoverable I/O
>> Dec 7 14:31:56 basket1 failures. Refer to http://illumos.org/msg/ZFS-8000-HC for more information.
>> Dec 7 14:31:56 basket1 AUTO-RESPONSE: No automated response will be taken.
>> Dec 7 14:31:56 basket1 IMPACT: Read and write I/Os cannot be serviced.
>> Dec 7 14:31:56 basket1 REC-ACTION: Make sure the affected devices are connected, then run
>> Dec 7 14:31:56 basket1 'zpool clear'.
>
> You sure there's nothing before the FMA complaints? It might be one line, but it may be enough to show something.
>
>> I can definitely share a kernel coredump, that’s not a problem. Just need to schedule a time to shut down all the VMs first.
>
> Take your time, do it on your schedule, that's fine.
>
> So I know where to put it: Which OmniOS release are you running?
>
> head /etc/release ; uname -a
>
> Thanks,
> Dan
>