[OmniOS-discuss] disk failure causing reboot?

Mon May 18 18:59:16 UTC 2015

Dan McDonald wrote:
>> On May 18, 2015, at 2:25 PM, Jeff Stockett <jstockett at molalla.com> wrote:
>>
>> A drive failed in one of our supermicro 5048R-E1CR36L servers running omnios r151012 last night, and somewhat unexpectedly, the whole system seems to have panicked.
>>
>
> The panic was done for protection of your pool:
>
>
>> May 18 04:44:36 zfs01 genunix: [ID 918906 kern.notice] I/O to pool 'dpool' appears to be hung.
>>
>
> <SNIP!>
>
>>
>> The disks are all 4TB WD40001FYYG enterprise SAS drives.  Googling seems to indicate it is a known problem with the way the various subsystems sometimes interact. Is there any way to fix/workaround this issue?
>>
>
> Pull the drive.  I'm assuming you have a raidz or mirrored setup where you can do that, right?  Or is it a question of finding *which* drive failed?
>

I must admit I haven't played with this since the protection against TX
commits not completing for a while went in, but I would have expected
FMA to fault out the disk before the pool hung, unless there was no
redundancy in the top-level vdev it's in?

It would be interesting to know what the pool layout and state were.
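
For anyone wanting to check their own logs for the same event, here is a
minimal sketch that picks the pool name out of the "appears to be hung"
notice quoted above. The function name hung_pools and the regular
expression are my own assumptions, based only on the single log line in
this thread; other releases may word the message differently.

```python
import re

# Assumed pattern, derived from the one kernel notice quoted in this thread.
HUNG_POOL_RE = re.compile(r"I/O to pool '([^']+)' appears to be hung")


def hung_pools(lines):
    """Return pool names reported as hung, in order of first appearance."""
    seen = []
    for line in lines:
        match = HUNG_POOL_RE.search(line)
        if match and match.group(1) not in seen:
            seen.append(match.group(1))
    return seen


# The log line from the original report, used here as sample input.
sample = [
    "May 18 04:44:36 zfs01 genunix: [ID 918906 kern.notice] "
    "I/O to pool 'dpool' appears to be hung.",
]
print(hung_pools(sample))
```

In practice you would feed it the lines of the system log rather than a
hard-coded sample, then look at the layout and state of each pool it names.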

-- 
Andrew