[OmniOS-discuss] ZFS Checksum problem

"Daniel D. Gonçalves" daniel at dgnetwork.com.br
Mon Jul 1 20:28:00 UTC 2013


iostat -Exn result:

c15t35d1         Soft Errors: 4 Hard Errors: 0 Transport Errors: 0
Vendor: ATA      Product: ST3000DM001-1CH1 Revision: CC24 Serial No: Z1F27E4X
Size: 3000.59GB <3000592982016 bytes>
Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 4
Illegal Request: 0 Predictive Failure Analysis: 0
c15t18d1         Soft Errors: 4 Hard Errors: 0 Transport Errors: 0
Vendor: ATA      Product: ST3000DM001-1CH1 Revision: CC24 Serial No: Z1F27DCD
Size: 3000.59GB <3000592982016 bytes>
Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 4
Illegal Request: 0 Predictive Failure Analysis: 0

c15t21d1         Soft Errors: 4 Hard Errors: 0 Transport Errors: 0
Vendor: ATA      Product: ST3000DM001-1CH1 Revision: CC24 Serial No: Z1F21NPM
Size: 3000.59GB <3000592982016 bytes>
Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 4
Illegal Request: 0 Predictive Failure Analysis: 0
c15t22d1         Soft Errors: 4 Hard Errors: 0 Transport Errors: 0
Vendor: ATA      Product: ST3000DM001-1CH1 Revision: CC24 Serial No: Z1F27CFV
Size: 3000.59GB <3000592982016 bytes>
Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 4
Illegal Request: 0 Predictive Failure Analysis: 0

c15t17d1         Soft Errors: 5 Hard Errors: 0 Transport Errors: 0
Vendor: ATA      Product: ST3000DM001-1CH1 Revision: CC24 Serial No: Z1F21TA7
Size: 3000.59GB <3000592982016 bytes>
Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 5
Illegal Request: 0 Predictive Failure Analysis: 0
c15t19d1         Soft Errors: 4 Hard Errors: 0 Transport Errors: 0
Vendor: ATA      Product: ST3000DM001-1CH1 Revision: CC24 Serial No: Z1F28PSJ
Size: 3000.59GB <3000592982016 bytes>
Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 4
Illegal Request: 0 Predictive Failure Analysis: 0

c15t23d1         Soft Errors: 4 Hard Errors: 0 Transport Errors: 0
Vendor: ATA      Product: ST3000DM001-1CH1 Revision: CC24 Serial No: Z1F27EHT
Size: 3000.59GB <3000592982016 bytes>
Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 4
Illegal Request: 0 Predictive Failure Analysis: 0
c15t24d1         Soft Errors: 4 Hard Errors: 0 Transport Errors: 0
Vendor: ATA      Product: ST3000DM001-1CH1 Revision: CC24 Serial No: Z1F27796
Size: 3000.59GB <3000592982016 bytes>
Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 4
Illegal Request: 0 Predictive Failure Analysis: 0
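
To compare the counters across many disks, the per-device header lines of this output can be reduced to a small table. The following is only a minimal sketch, assuming the `iostat -Exn` layout shown above; the function names `parse_iostat_errors` and `devices_with_errors` are hypothetical, and the sample text is copied from two of the devices listed.

```python
import re

# Two device blocks copied from the iostat -Exn output above.
SAMPLE = """\
c15t35d1         Soft Errors: 4 Hard Errors: 0 Transport Errors: 0
Vendor: ATA      Product: ST3000DM001-1CH1 Revision: CC24 Serial No: Z1F27E4X
Size: 3000.59GB <3000592982016 bytes>
Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 4
Illegal Request: 0 Predictive Failure Analysis: 0
c15t17d1         Soft Errors: 5 Hard Errors: 0 Transport Errors: 0
Vendor: ATA      Product: ST3000DM001-1CH1 Revision: CC24 Serial No: Z1F21TA7
Size: 3000.59GB <3000592982016 bytes>
Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 5
Illegal Request: 0 Predictive Failure Analysis: 0
"""

# Matches only the first line of each device block (assumed layout).
HEADER = re.compile(
    r"^(?P<dev>\S+)\s+Soft Errors:\s*(?P<soft>\d+)\s+"
    r"Hard Errors:\s*(?P<hard>\d+)\s+"
    r"Transport Errors:\s*(?P<transport>\d+)"
)


def parse_iostat_errors(text):
    """Return {device: {"soft": n, "hard": n, "transport": n}}."""
    result = {}
    for line in text.splitlines():
        match = HEADER.match(line.strip())
        if match:
            result[match.group("dev")] = {
                key: int(match.group(key))
                for key in ("soft", "hard", "transport")
            }
    return result


def devices_with_errors(counters):
    """Return the sorted device names with at least one nonzero counter."""
    return sorted(dev for dev, c in counters.items() if any(c.values()))


if __name__ == "__main__":
    for dev, counts in parse_iostat_errors(SAMPLE).items():
        print(dev, counts)
```

Only the header line of each block is parsed, so a serial number wrapped onto its own line by a mail client does not affect the result.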


fmadm faulty result:

--------------- ------------------------------------ -------------- ---------
TIME            EVENT-ID                             MSG-ID         SEVERITY
--------------- ------------------------------------ -------------- ---------
Jun 29 14:48:38 09217f5b-2ded-e74c-bef3-fbaec52391ed ZFS-8000-GH    Major

Host        : storage01
Platform    : SandyBridge-Platform      Chassis_id  : To-be-filled-by-O.E.M.
Product_sn  :

Fault class : fault.fs.zfs.vdev.checksum
Affects     : zfs://pool=STORAGE01/vdev=40200696be7be968
                   faulted but still in service
Problem in  : zfs://pool=STORAGE01/vdev=40200696be7be968
                   faulted but still in service

Description : The number of checksum errors associated with a ZFS device
               exceeded acceptable levels.  Refer to
               http://illumos.org/msg/ZFS-8000-GH for more information.

Response    : The device has been marked as degraded.  An attempt
               will be made to activate a hot spare if available.

Impact      : Fault tolerance of the pool may be compromised.

Action      : Run 'zpool status -x' and replace the bad device.


I have already run a scrub several times, but the checksum errors come
back, and only on these 8 HDDs.
As a reminder, the SATA and power cables have already been swapped.
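
To track which disks report checksum errors after each scrub, the device rows of the `zpool status` output can be filtered on the CKSUM column. This is a minimal sketch, assuming the row layout `NAME STATE READ WRITE CKSUM` and device names of the form `cXtYdZ` as in the status output quoted in this thread; `leaf_cksum_errors` is a hypothetical helper name. Counts that zpool abbreviates (for example with a `K` suffix) are not handled here.

```python
import re

# A few rows copied from the zpool status output quoted in this thread.
SAMPLE = """\
>>          NAME          STATE     READ WRITE CKSUM
>>          STORAGE01     DEGRADED     0     0   347
>>            mirror-0    DEGRADED     0     0   188
>>              c15t35d1  DEGRADED     0     0   188  too many errors
>>              c15t18d1  DEGRADED     0     0   188  too many errors
>>            mirror-5    ONLINE       0     0     0
>>              c15t25d1  ONLINE       0     0     0
"""

# Assumed naming scheme for leaf disks (controller, target, disk).
LEAF = re.compile(r"^c\d+t\d+d\d+$")


def leaf_cksum_errors(status_text):
    """Return {disk: cksum_count} for leaf disks with a nonzero CKSUM."""
    bad = {}
    for line in status_text.splitlines():
        # Drop mail quote markers and indentation before splitting.
        fields = line.lstrip("> ").split()
        if len(fields) < 5 or not LEAF.match(fields[0]):
            continue
        # Skip rows whose counters are not plain integers.
        if not all(f.isdigit() for f in fields[2:5]):
            continue
        cksum = int(fields[4])
        if cksum > 0:
            bad[fields[0]] = cksum
    return bad


if __name__ == "__main__":
    for disk, count in sorted(leaf_cksum_errors(SAMPLE).items()):
        print(disk, count)
```

Saving this summary after each scrub makes it easier to see whether the same disks are affected every time.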

Daniel

On 01/07/2013 17:15, Saso Kiselkov wrote:
> On 01/07/2013 21:00, "Daniel D. Gonçalves" wrote:
>> I'm having checksum trouble in my ZFS pool. I tried changing the data
>> and power cables of the HDDs, but the problems remain.
>> All 8 HDDs that are exhibiting errors are identical and all are on the
>> same controller.
>>
>>          NAME          STATE     READ WRITE CKSUM
>>          STORAGE01     DEGRADED     0     0   347
>>            mirror-0    DEGRADED     0     0   188
>>              c15t35d1  DEGRADED     0     0   188  too many errors
>>              c15t18d1  DEGRADED     0     0   188  too many errors
>>            mirror-1    DEGRADED     0     0   170
>>              c15t21d1  DEGRADED     0     0   170  too many errors
>>              c15t22d1  DEGRADED     0     0   170  too many errors
>>            mirror-2    DEGRADED     0     0   164
>>              c15t17d1  DEGRADED     0     0   164  too many errors
>>              c15t19d1  DEGRADED     0     0   164  too many errors
>>            mirror-3    DEGRADED     0     0   172
>>              c15t24d1  DEGRADED     0     0   172  too many errors
>>              c15t23d1  DEGRADED     0     0   172  too many errors
>>            mirror-5    ONLINE       0     0     0
>>              c15t25d1  ONLINE       0     0     0
>>              c15t27d1  ONLINE       0     0     0
>>            mirror-6    ONLINE       0     0     0
>>              c15t26d1  ONLINE       0     0     0
>>              c15t28d1  ONLINE       0     0     0
>>            mirror-7    ONLINE       0     0     0
>>              c15t29d1  ONLINE       0     0     0
>>              c15t31d1  ONLINE       0     0     0
>>            mirror-8    ONLINE       0     0     0
>>              c15t32d1  ONLINE       0     0     0
>>              c15t30d1  ONLINE       0     0     0
>>          logs
>>            mirror-4    ONLINE       0     0     0
>>              c14t1d0   ONLINE       0     0     0
>>              c14t3d0   ONLINE       0     0     0
>>          cache
>>            c14t4d0     ONLINE       0     0     0
>>
>> Smartinfo:
>>
>>   c15t17d1  3001 GB  STORAGE01  mirror  DEGRADED  S:4 H:0 T:0  ST3000DM001-1CH166  sat,12  PASSED  40 °C  Z1F21TA7  without error  short long abort log
>>   c15t18d1  3001 GB  STORAGE01  mirror  DEGRADED  S:4 H:0 T:0  ST3000DM001-1CH166  sat,12  PASSED  41 °C  Z1F27DCD  without error  short long abort log
>>   c15t19d1  3001 GB  STORAGE01  mirror  DEGRADED  S:4 H:0 T:0  ST3000DM001-1CH166  sat,12  PASSED  39 °C  Z1F28PSJ  without error  short long abort log
>>   c15t21d1  3001 GB  STORAGE01  mirror  DEGRADED  S:4 H:0 T:0  ST3000DM001-1CH166  sat,12  PASSED  35 °C  Z1F21NPM  without error  short long abort log
>>   c15t22d1  3001 GB  STORAGE01  mirror  DEGRADED  S:4 H:0 T:0  ST3000DM001-1CH166  sat,12  PASSED  38 °C  Z1F27CFV  without error  short long abort log
>>   c15t23d1  3001 GB  STORAGE01  mirror  DEGRADED  S:4 H:0 T:0  ST3000DM001-1CH166  sat,12  PASSED  40 °C  Z1F27EHT  without error  short long abort log
>>   c15t24d1  3001 GB  STORAGE01  mirror  DEGRADED  S:4 H:0 T:0  ST3000DM001-1CH166  sat,12  PASSED  41 °C  Z1F27796  without error  short long abort log
>>   c15t35d1  3001 GB  STORAGE01  mirror  DEGRADED  S:4 H:0 T:0  ST3000DM001-1CH166  sat,12  PASSED  25 °C  Z1F27E4X  without error  short long abort log
>>
>> HD Info:
>>
>> smartctl 6.0 2012-10-10 r3643 [i386-pc-solaris2.11] (local build)
>> Copyright (C) 2002-12, Bruce Allen, Christian Franke, www.smartmontools.org
>>
>> === START OF INFORMATION SECTION ===
>> Model Family:     Seagate Barracuda 7200.14 (AF)
>> Device Model:     ST3000DM001-1CH166
>> Serial Number:    Z1F21TA7
>> LU WWN Device Id: 5 000c50 04f6f0c73
>> Firmware Version: CC24
>> User Capacity:    3,000,592,982,016 bytes [3.00 TB]
>> Sector Sizes:     512 bytes logical, 4096 bytes physical
>> Rotation Rate:    7200 rpm
>> Device is:        In smartctl database [for details use: -P show]
>> ATA Version is:   ATA8-ACS T13/1699-D revision 4
>> SATA Version is:  SATA 3.0, 6.0 Gb/s (current: 6.0 Gb/s)
>> Local Time is:    Mon Jul  1 16:56:42 2013 BRT
>>
>> Can anyone help me?
> Try iostat -Exn and have a look at "fmadm faulty" to see if you can
> pinpoint the fault source.
>
> Cheers,


