[OmniOS-discuss] Hung ZFS Pool

Brian Hechinger wonko at 4amlunch.net
Wed Dec 9 16:13:11 UTC 2015


I didn’t know about pgrep, no. :)

So the ‘zpool clear’ has fixed things a bit. The touch processes have all exited.

I can now touch a file on that pool.

A zpool scrub later and this is the status:

  pool: zoom
 state: ONLINE
status: One or more devices has experienced an unrecoverable error.  An
        attempt was made to correct the error.  Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors
        using 'zpool clear' or replace the device with 'zpool replace'.
   see: http://illumos.org/msg/ZFS-8000-9P
  scan: scrub repaired 6K in 0h0m with 0 errors on Wed Dec  9 10:25:33 2015
config:

        NAME          STATE     READ WRITE CKSUM
        zoom          ONLINE       0     0     0
          mirror-0    ONLINE       0     0     0
            c4t1d0s1  ONLINE       0     0     0
            c5t1d0s1  ONLINE       0     0     2

errors: No known data errors
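
For anyone who wants to keep an eye on those counters without reading the table by hand, here is a minimal sketch in Python 3. It assumes the column layout shown above (NAME STATE READ WRITE CKSUM) and plain integer counters. The function name and the SAMPLE text are made up for illustration and are not part of any ZFS tooling.

```python
# Hypothetical helper: flag rows of `zpool status` output whose
# READ/WRITE/CKSUM counters are not all zero.
# Assumption: counters are plain integers; rows with abbreviated
# counts (for example "1.2K") are skipped by this sketch.

SAMPLE = """\
config:

        NAME          STATE     READ WRITE CKSUM
        zoom          ONLINE       0     0     0
          mirror-0    ONLINE       0     0     0
            c4t1d0s1  ONLINE       0     0     0
            c5t1d0s1  ONLINE       0     0     2

errors: No known data errors
"""


def devices_with_errors(status_text):
    """Return {name: (read, write, cksum)} for rows with a non-zero counter."""
    flagged = {}
    in_config = False
    for line in status_text.splitlines():
        fields = line.split()
        # The header row marks the start of the device table.
        if fields[:5] == ["NAME", "STATE", "READ", "WRITE", "CKSUM"]:
            in_config = True
            continue
        if not in_config:
            continue
        # The "errors:" summary line ends the table.
        if fields and fields[0] == "errors:":
            break
        if len(fields) < 5:
            continue
        counters = fields[2:5]
        if not all(c.isdigit() for c in counters):
            continue
        read, write, cksum = (int(c) for c in counters)
        if read or write or cksum:
            flagged[fields[0]] = (read, write, cksum)
    return flagged


print(devices_with_errors(SAMPLE))
```

On a live system the text would come from capturing the output of 'zpool status zoom' rather than from a pasted sample.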

I’m going to try to re-run iozone later and see if I can’t get it to happen again.

This is concerning.

The previous entry in messages is from 4 days earlier, and it's about ntpd.

-brian

> On Dec 9, 2015, at 10:25 AM, Dan McDonald <danmcd at omniti.com> wrote:
> 
> 
>> On Dec 9, 2015, at 10:20 AM, Brian Hechinger <wonko at 4amlunch.net> wrote:
>> 
>> I cannot ^C out of the touch.
>> 
>> wonko@basket1:/export/home/wonko$ ps -ef | grep touch
> 
> You do know about pgrep(1), right?  :)
> 
>> Also, kill -9 doesn’t touch them.
> 
> Okay!  This means something in-kernel is locking them up.  More reason for a coredump.
> 
>> the only thing in messages is:
>> 
>> Dec  7 14:31:56 basket1 fmd: [ID 377184 daemon.error] SUNW-MSG-ID: ZFS-8000-HC, TYPE: Error, VER: 1, SEVERITY: Major
>> Dec  7 14:31:56 basket1 EVENT-TIME: Mon Dec  7 14:31:56 EST 2015
>> Dec  7 14:31:56 basket1 PLATFORM: X8DTL, CSN: 1234567890, HOSTNAME: basket1
>> Dec  7 14:31:56 basket1 SOURCE: zfs-diagnosis, REV: 1.0
>> Dec  7 14:31:56 basket1 EVENT-ID: 585f9fa2-4a84-4184-8c87-c2f9c600e1a1
>> Dec  7 14:31:56 basket1 DESC: The ZFS pool has experienced currently unrecoverable I/O
>> Dec  7 14:31:56 basket1             failures.  Refer to http://illumos.org/msg/ZFS-8000-HC for more information.
>> Dec  7 14:31:56 basket1 AUTO-RESPONSE: No automated response will be taken.
>> Dec  7 14:31:56 basket1 IMPACT: Read and write I/Os cannot be serviced.
>> Dec  7 14:31:56 basket1 REC-ACTION: Make sure the affected devices are connected, then run
>> Dec  7 14:31:56 basket1             'zpool clear'.
> 
> You sure there's nothing before the FMA complaints?  It might be one line, but it may be enough to show something.
> 
>> I can definitely share a kernel coredump, that’s not a problem. Just need to schedule a time to shut down all the VMs first.
> 
> Take your time, do it on your schedule, that's fine.
> 
> So I know where to put it:  Which OmniOS release are you running?
> 
> 	head /etc/release ; uname -a
> 
> Thanks,
> Dan
> 


