[OmniOS-discuss] ZFS Checksum problem

Richard Elling richard.elling at richardelling.com
Mon Jul 1 21:27:48 UTC 2013


troubleshooting tips below...

On Jul 1, 2013, at 1:15 PM, Saso Kiselkov <skiselkov.ml at gmail.com> wrote:

> On 01/07/2013 21:00, "Daniel D. Gonçalves" wrote:
>> I'm having checksum trouble in my ZFS pool. I tried changing the data
>> cables and the HDD power connections, but the problems remain.
>> All 8 HDDs that are exhibiting errors are identical, and all are on the
>> same controller.
>> 
>>        NAME          STATE     READ WRITE CKSUM
>>        STORAGE01     DEGRADED     0     0   347
>>          mirror-0    DEGRADED     0     0   188
>>            c15t35d1  DEGRADED     0     0   188  too many errors
>>            c15t18d1  DEGRADED     0     0   188  too many errors
>>          mirror-1    DEGRADED     0     0   170
>>            c15t21d1  DEGRADED     0     0   170  too many errors
>>            c15t22d1  DEGRADED     0     0   170  too many errors
>>          mirror-2    DEGRADED     0     0   164
>>            c15t17d1  DEGRADED     0     0   164  too many errors
>>            c15t19d1  DEGRADED     0     0   164  too many errors
>>          mirror-3    DEGRADED     0     0   172
>>            c15t24d1  DEGRADED     0     0   172  too many errors
>>            c15t23d1  DEGRADED     0     0   172  too many errors

smells like a bad cable, power supply, expander, or controller...
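A rough way to test that hunch from saved `zpool status` output: in each degraded mirror both halves report the same CKSUM count, which fits a shared component better than a single bad disk. The Python below is a minimal sketch. It assumes the usual NAME/STATE/READ/WRITE/CKSUM layout, plain integer counts (abbreviated counts such as 1.2K are skipped), mirror-only data vdevs, and logs/cache/spares listed after the data vdevs. `checksum_by_mirror` and `shared_path_suspects` are hypothetical helper names, not ZFS tools.

```python
from collections import defaultdict

# Excerpt of the zpool status output quoted above. The logs/cache sections
# are kept to show that they are excluded from the grouping.
SAMPLE = """\
        NAME          STATE     READ WRITE CKSUM
        STORAGE01     DEGRADED     0     0   347
          mirror-0    DEGRADED     0     0   188
            c15t35d1  DEGRADED     0     0   188  too many errors
            c15t18d1  DEGRADED     0     0   188  too many errors
          mirror-1    DEGRADED     0     0   170
            c15t21d1  DEGRADED     0     0   170  too many errors
            c15t22d1  DEGRADED     0     0   170  too many errors
          mirror-5    ONLINE       0     0     0
            c15t25d1  ONLINE       0     0     0
            c15t27d1  ONLINE       0     0     0
        logs
          mirror-4    ONLINE       0     0     0
            c14t1d0   ONLINE       0     0     0
            c14t3d0   ONLINE       0     0     0
        cache
          c14t4d0     ONLINE       0     0     0
"""


def checksum_by_mirror(text):
    """Map each data mirror to its [(disk, cksum), ...] leaf entries."""
    groups = defaultdict(list)
    current = None
    for line in text.splitlines():
        fields = line.split()
        # Assumption: auxiliary sections follow the data vdevs, so stop here.
        if fields and fields[0] in ("logs", "cache", "spares"):
            break
        # Keep only rows shaped like NAME STATE READ WRITE CKSUM [note].
        if len(fields) < 5 or not fields[4].isdigit():
            continue
        name, cksum = fields[0], int(fields[4])
        if name.startswith("mirror-"):
            current = name
        elif current is not None:  # the pool row itself has no parent mirror
            groups[current].append((name, cksum))
    return dict(groups)


def shared_path_suspects(groups):
    """Mirrors whose sides all report the same nonzero CKSUM count."""
    suspects = []
    for vdev, disks in groups.items():
        counts = {cksum for _, cksum in disks}
        # Heuristic only: identical nonzero counts on every side of the mirror.
        if len(disks) > 1 and len(counts) == 1 and next(iter(counts)) > 0:
            suspects.append(vdev)
    return suspects


if __name__ == "__main__":
    print(shared_path_suspects(checksum_by_mirror(SAMPLE)))
```

Fed the full status output, the same check would flag mirror-0 through mirror-3 and pass mirror-5 through mirror-8. Matching counts are only a hint. The shared cable, power supply, expander, or controller still has to be isolated by swapping parts or from the FMA error reports.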

>>          mirror-5    ONLINE       0     0     0
>>            c15t25d1  ONLINE       0     0     0
>>            c15t27d1  ONLINE       0     0     0
>>          mirror-6    ONLINE       0     0     0
>>            c15t26d1  ONLINE       0     0     0
>>            c15t28d1  ONLINE       0     0     0
>>          mirror-7    ONLINE       0     0     0
>>            c15t29d1  ONLINE       0     0     0
>>            c15t31d1  ONLINE       0     0     0
>>          mirror-8    ONLINE       0     0     0
>>            c15t32d1  ONLINE       0     0     0
>>            c15t30d1  ONLINE       0     0     0
>>        logs
>>          mirror-4    ONLINE       0     0     0
>>            c14t1d0   ONLINE       0     0     0
>>            c14t3d0   ONLINE       0     0     0
>>        cache
>>          c14t4d0     ONLINE       0     0     0
>> 
>> Smartinfo:
>> 
>> c15t17d1  3001 GB  STORAGE01  mirror  DEGRADED  S:4 H:0 T:0  ST3000DM001-1CH166  sat,12  PASSED  40 °C  Z1F21TA7  without error  short long abort log
>> c15t18d1  3001 GB  STORAGE01  mirror  DEGRADED  S:4 H:0 T:0  ST3000DM001-1CH166  sat,12  PASSED  41 °C  Z1F27DCD  without error  short long abort log
>> c15t19d1  3001 GB  STORAGE01  mirror  DEGRADED  S:4 H:0 T:0  ST3000DM001-1CH166  sat,12  PASSED  39 °C  Z1F28PSJ  without error  short long abort log
>> c15t21d1  3001 GB  STORAGE01  mirror  DEGRADED  S:4 H:0 T:0  ST3000DM001-1CH166  sat,12  PASSED  35 °C  Z1F21NPM  without error  short long abort log
>> c15t22d1  3001 GB  STORAGE01  mirror  DEGRADED  S:4 H:0 T:0  ST3000DM001-1CH166  sat,12  PASSED  38 °C  Z1F27CFV  without error  short long abort log
>> c15t23d1  3001 GB  STORAGE01  mirror  DEGRADED  S:4 H:0 T:0  ST3000DM001-1CH166  sat,12  PASSED  40 °C  Z1F27EHT  without error  short long abort log
>> c15t24d1  3001 GB  STORAGE01  mirror  DEGRADED  S:4 H:0 T:0  ST3000DM001-1CH166  sat,12  PASSED  41 °C  Z1F27796  without error  short long abort log
>> c15t35d1  3001 GB  STORAGE01  mirror  DEGRADED  S:4 H:0 T:0  ST3000DM001-1CH166  sat,12  PASSED  25 °C  Z1F27E4X  without error  short long abort log

disks look clean
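The "clean" reading above can be written down as a check. This is a hypothetical sketch. The poster's SMART summary script is not named, so the field order is taken from the rows quoted above, and reading S/H/T as iostat-style soft/hard/transport error counts is an assumption. Here a disk counts as clean when SMART health is PASSED, the self-test log says "without error", and the hard and transport counts are zero.

```python
import re

# Two records from the SMART summary quoted above, with the mail client's
# line wrapping preserved (the parser flattens whitespace first).
SAMPLE = """\
c15t17d1       3001 GB       STORAGE01       mirror       DEGRADED
 S:4 H:0 T:0       ST3000DM001-1CH166       sat,12  PASSED       40
°C       Z1F21TA7       without error       short long abort log
c15t35d1       3001 GB       STORAGE01       mirror       DEGRADED
 S:4 H:0 T:0       ST3000DM001-1CH166       sat,12  PASSED       25
°C       Z1F27E4X       without error       short long abort log
"""

DEVICE = r"c\d+t\d+d\d+"
# Each record runs from one cXtYdZ device name up to the next one (or the end).
RECORD = re.compile(rf"({DEVICE}) (.*?)(?= {DEVICE} |$)")
COUNTS = re.compile(r"S:(\d+) H:(\d+) T:(\d+)")


def parse_smart_summary(text):
    """Return one dict per disk with the S/H/T counts, health, and self-test status."""
    flat = " ".join(text.split())
    disks = []
    for dev, body in RECORD.findall(flat):
        m = COUNTS.search(body)
        if m is None:
            continue
        soft, hard, tran = (int(x) for x in m.groups())
        disks.append({
            "dev": dev,
            "soft": soft,
            "hard": hard,
            "transport": tran,
            "passed": "PASSED" in body.split(),
            "selftest_clean": "without error" in body,
        })
    return disks


def looks_clean(disk):
    """Hypothetical criterion: PASSED, clean self-test log, no hard/transport errors."""
    return (disk["passed"] and disk["selftest_clean"]
            and disk["hard"] == 0 and disk["transport"] == 0)


if __name__ == "__main__":
    print([d["dev"] for d in parse_smart_summary(SAMPLE) if looks_clean(d)])
```

Note that the soft count (S:4) is ignored by this criterion, matching the judgment above that the disks themselves look healthy.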

>> 
>> HD Info:
>> 
>> smartctl 6.0 2012-10-10 r3643 [i386-pc-solaris2.11] (local build)
>> Copyright (C) 2002-12, Bruce Allen, Christian Franke, www.smartmontools.org
>> 
>> === START OF INFORMATION SECTION ===
>> Model Family:     Seagate Barracuda 7200.14 (AF)
>> Device Model:     ST3000DM001-1CH166
>> Serial Number:    Z1F21TA7
>> LU WWN Device Id: 5 000c50 04f6f0c73
>> Firmware Version: CC24
>> User Capacity:    3,000,592,982,016 bytes [3.00 TB]
>> Sector Sizes:     512 bytes logical, 4096 bytes physical
>> Rotation Rate:    7200 rpm
>> Device is:        In smartctl database [for details use: -P show]
>> ATA Version is:   ATA8-ACS T13/1699-D revision 4
>> SATA Version is:  SATA 3.0, 6.0 Gb/s (current: 6.0 Gb/s)
>> Local Time is:    Mon Jul  1 16:56:42 2013 BRT
>> 
>> Can anyone help me?
> 
> Try iostat -Exn and have a look at "fmadm faulty" to see if you can
> pinpoint the fault source.
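A minimal sketch for pulling the per-device error counters out of saved `iostat -Exn` output. It assumes the illumos layout in which each device's error section starts with a line of the form `cXtYdZ  Soft Errors: N Hard Errors: N Transport Errors: N`; lines that do not match, such as any extended-statistics table, are ignored. The sample text and counts are illustrative, not from the poster's system, and the helper names are hypothetical.

```python
import re

# Illustrative iostat -E style lines; NOT output from the poster's system.
SAMPLE = """\
c15t17d1         Soft Errors: 4 Hard Errors: 0 Transport Errors: 0
c15t25d1         Soft Errors: 0 Hard Errors: 0 Transport Errors: 3
"""

ERRORS = re.compile(
    r"^(\S+)\s+Soft Errors:\s*(\d+)\s+Hard Errors:\s*(\d+)"
    r"\s+Transport Errors:\s*(\d+)",
    re.MULTILINE,
)


def error_counters(text):
    """Map device -> (soft, hard, transport) error counts."""
    return {
        dev: (int(soft), int(hard), int(tran))
        for dev, soft, hard, tran in ERRORS.findall(text)
    }


def transport_or_hard(counters):
    """Sorted devices reporting any hard or transport errors."""
    return sorted(dev for dev, (_, hard, tran) in counters.items() if hard or tran)


if __name__ == "__main__":
    print(transport_or_hard(error_counters(SAMPLE)))
```

Nonzero transport errors on the disks behind the suspect controller would point at the path rather than the media.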

fmdump -e shows the error reports. In this case they are more interesting than the
diagnosis from fmadm faulty. If the only error reports you see are ZFS checksums, then it
is more difficult to get to the root cause, because that means something in the datapath
is corrupting data. If you see other errors, like transport or data errors, those will be
better clues.
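A minimal sketch of that triage over saved `fmdump -e` output, assuming each line ends with the ereport class (the TIME and CLASS columns). The sample lines are illustrative, not from this system. ereport.fs.zfs.checksum and ereport.io.scsi.cmd.disk.tran are used as examples of a ZFS checksum report and a SCSI transport report, and the helper names are hypothetical.

```python
from collections import Counter

CHECKSUM = "ereport.fs.zfs.checksum"

# Illustrative fmdump -e style lines; NOT the poster's data.
SAMPLE = """\
TIME                 CLASS
Jul 01 16:50:12.1234 ereport.fs.zfs.checksum
Jul 01 16:50:12.1301 ereport.fs.zfs.checksum
Jul 01 16:50:13.0042 ereport.io.scsi.cmd.disk.tran
"""


def ereport_counts(text):
    """Count ereport classes, taking the last field of each line as the class."""
    counts = Counter()
    for line in text.splitlines():
        fields = line.split()
        if fields and fields[-1].startswith("ereport."):
            counts[fields[-1]] += 1
    return counts


def only_zfs_checksums(counts):
    """True when every report is a ZFS checksum ereport (the harder case)."""
    return bool(counts) and set(counts) == {CHECKSUM}


def other_clues(counts):
    """Reports outside the ZFS checksum class, e.g. transport or device errors."""
    return {cls: n for cls, n in counts.items() if cls != CHECKSUM}


if __name__ == "__main__":
    counts = ereport_counts(SAMPLE)
    print(only_zfs_checksums(counts), other_clues(counts))
```

If `only_zfs_checksums` returns True, you are in the harder case of corruption somewhere in the datapath; anything returned by `other_clues` is where to start looking.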
 -- richard


> 
> Cheers,
> -- 
> Saso

--

Richard.Elling at RichardElling.com
+1-760-896-4422


