[OmniOS-discuss] ZFS Checksum problem

Saso Kiselkov skiselkov.ml at gmail.com
Mon Jul 1 20:35:19 UTC 2013


It looks like your controller is faulty and is corrupting data. The
drives look fine: iostat shows no media or hard errors on any of them,
so the errors are being introduced in transit, not on the platters.
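
If you want to check that reading of the counters yourself, here is a
rough sketch in Python that pulls the per-device numbers out of pasted
`iostat -Exn` text. The function names are my own, and the device-name
pattern is an assumption based on the names in this thread:

```python
import re

# Counter labels as they appear in the iostat -Exn output quoted in this thread.
COUNTERS = ("Soft Errors", "Hard Errors", "Transport Errors", "Media Error")


def parse_iostat_errors(text):
    """Return {device: {counter: int}} from pasted `iostat -Exn` text.

    Leading "> " mail-quote markers are stripped from each line first.
    """
    devices = {}
    current = None
    for raw in text.splitlines():
        line = raw.lstrip("> ").rstrip()
        # A device block starts with a name such as c15t35d1 (assumed pattern).
        start = re.match(r"(c\d+t[0-9A-Fa-f]+d\d+)\s+Soft Errors:", line)
        if start:
            current = devices.setdefault(start.group(1), {})
        if current is None:
            continue
        for name in COUNTERS:
            found = re.search(re.escape(name) + r":\s*(\d+)", line)
            if found:
                current[name] = int(found.group(1))
    return devices


def media_looks_clean(counters):
    """True when a disk reports zero media errors and zero hard errors."""
    return counters.get("Media Error", 0) == 0 and counters.get("Hard Errors", 0) == 0
```

A disk for which `media_looks_clean` is true but which still shows up
with checksum errors in the pool is more likely a victim of the data
path than a bad disk, though that is a heuristic and not a proof.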

Try swapping out the controller or moving the drives to another controller.
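
After moving the drives, one way to see whether it helped is to clear
the error counters with `zpool clear`, scrub again, and compare which
disks still report CKSUM errors. A small sketch for that comparison,
working on pasted `zpool status` text (the function name and the
device-name pattern are assumptions of mine):

```python
import re


def devices_with_checksum_errors(status_text):
    """Return {device: cksum} for leaf disks whose CKSUM column is not "0".

    The count is kept as a string, since it is taken verbatim from the
    status output and may not always be a plain integer.
    """
    bad = {}
    for raw in status_text.splitlines():
        fields = raw.lstrip("> ").split()
        # Config rows look like: NAME STATE READ WRITE CKSUM [note ...]
        if len(fields) < 5:
            continue
        # Only leaf disks such as c15t35d1, not mirror-N or the pool row.
        if not re.fullmatch(r"c\d+t[0-9A-Fa-f]+d\d+", fields[0]):
            continue
        if fields[4] != "0":
            bad[fields[0]] = fields[4]
    return bad
```

Run it on the status output from before and after the move: if the
same disks come back clean on the other controller, that points at the
old controller and not at the disks.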

Cheers,
-- 
Saso

On 01/07/2013 21:28, "Daniel D. Gonçalves" wrote:
> iostat -Exn result:
> 
> c15t35d1         Soft Errors: 4 Hard Errors: 0 Transport Errors: 0
> Vendor: ATA      Product: ST3000DM001-1CH1 Revision: CC24 Serial No: Z1F27E4X
> Size: 3000.59GB <3000592982016 bytes>
> Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 4
> Illegal Request: 0 Predictive Failure Analysis: 0
> c15t18d1         Soft Errors: 4 Hard Errors: 0 Transport Errors: 0
> Vendor: ATA      Product: ST3000DM001-1CH1 Revision: CC24 Serial No: Z1F27DCD
> Size: 3000.59GB <3000592982016 bytes>
> Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 4
> Illegal Request: 0 Predictive Failure Analysis: 0
> 
> c15t21d1         Soft Errors: 4 Hard Errors: 0 Transport Errors: 0
> Vendor: ATA      Product: ST3000DM001-1CH1 Revision: CC24 Serial No: Z1F21NPM
> Size: 3000.59GB <3000592982016 bytes>
> Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 4
> Illegal Request: 0 Predictive Failure Analysis: 0
> c15t22d1         Soft Errors: 4 Hard Errors: 0 Transport Errors: 0
> Vendor: ATA      Product: ST3000DM001-1CH1 Revision: CC24 Serial No: Z1F27CFV
> Size: 3000.59GB <3000592982016 bytes>
> Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 4
> Illegal Request: 0 Predictive Failure Analysis: 0
> 
> c15t17d1         Soft Errors: 5 Hard Errors: 0 Transport Errors: 0
> Vendor: ATA      Product: ST3000DM001-1CH1 Revision: CC24 Serial No: Z1F21TA7
> Size: 3000.59GB <3000592982016 bytes>
> Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 5
> Illegal Request: 0 Predictive Failure Analysis: 0
> c15t19d1         Soft Errors: 4 Hard Errors: 0 Transport Errors: 0
> Vendor: ATA      Product: ST3000DM001-1CH1 Revision: CC24 Serial No: Z1F28PSJ
> Size: 3000.59GB <3000592982016 bytes>
> Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 4
> Illegal Request: 0 Predictive Failure Analysis: 0
> 
> c15t23d1         Soft Errors: 4 Hard Errors: 0 Transport Errors: 0
> Vendor: ATA      Product: ST3000DM001-1CH1 Revision: CC24 Serial No: Z1F27EHT
> Size: 3000.59GB <3000592982016 bytes>
> Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 4
> Illegal Request: 0 Predictive Failure Analysis: 0
> c15t24d1         Soft Errors: 4 Hard Errors: 0 Transport Errors: 0
> Vendor: ATA      Product: ST3000DM001-1CH1 Revision: CC24 Serial No: Z1F27796
> Size: 3000.59GB <3000592982016 bytes>
> Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 4
> Illegal Request: 0 Predictive Failure Analysis: 0
> 
> 
> fmadm faulty result:
> 
> --------------- ------------------------------------ -------------- ---------
> TIME            EVENT-ID                             MSG-ID         SEVERITY
> --------------- ------------------------------------ -------------- ---------
> Jun 29 14:48:38 09217f5b-2ded-e74c-bef3-fbaec52391ed ZFS-8000-GH    Major
> 
> Host        : storage01
> Platform    : SandyBridge-Platform      Chassis_id  : To-be-filled-by-O.E.M.
> Product_sn  :
> 
> Fault class : fault.fs.zfs.vdev.checksum
> Affects     : zfs://pool=STORAGE01/vdev=40200696be7be968
>                   faulted but still in service
> Problem in  : zfs://pool=STORAGE01/vdev=40200696be7be968
>                   faulted but still in service
> 
> Description : The number of checksum errors associated with a ZFS device
>               exceeded acceptable levels.  Refer to
>               http://illumos.org/msg/ZFS-8000-GH for more information.
> 
> Response    : The device has been marked as degraded.  An attempt
>               will be made to activate a hot spare if available.
> 
> Impact      : Fault tolerance of the pool may be compromised.
> 
> Action      : Run 'zpool status -x' and replace the bad device.
> 
> 
> I have already run a scrub several times, but the checksum errors keep
> coming back, and only on these 8 HDDs.
> As a reminder, the SATA and power cables have already been swapped.
> 
> Daniel
> 
> On 01/07/2013 17:15, Saso Kiselkov wrote:
>> On 01/07/2013 21:00, "Daniel D. Gonçalves" wrote:
>>> I'm getting checksum errors in my ZFS pool. I tried swapping the data
>>> and power cables of the HDDs, but the problem remains.
>>> All 8 HDDs that are showing errors are identical, and all are on the
>>> same controller.
>>>
>>>          NAME          STATE     READ WRITE CKSUM
>>>          STORAGE01     DEGRADED     0     0   347
>>>            mirror-0    DEGRADED     0     0   188
>>>              c15t35d1  DEGRADED     0     0   188  too many errors
>>>              c15t18d1  DEGRADED     0     0   188  too many errors
>>>            mirror-1    DEGRADED     0     0   170
>>>              c15t21d1  DEGRADED     0     0   170  too many errors
>>>              c15t22d1  DEGRADED     0     0   170  too many errors
>>>            mirror-2    DEGRADED     0     0   164
>>>              c15t17d1  DEGRADED     0     0   164  too many errors
>>>              c15t19d1  DEGRADED     0     0   164  too many errors
>>>            mirror-3    DEGRADED     0     0   172
>>>              c15t24d1  DEGRADED     0     0   172  too many errors
>>>              c15t23d1  DEGRADED     0     0   172  too many errors
>>>            mirror-5    ONLINE       0     0     0
>>>              c15t25d1  ONLINE       0     0     0
>>>              c15t27d1  ONLINE       0     0     0
>>>            mirror-6    ONLINE       0     0     0
>>>              c15t26d1  ONLINE       0     0     0
>>>              c15t28d1  ONLINE       0     0     0
>>>            mirror-7    ONLINE       0     0     0
>>>              c15t29d1  ONLINE       0     0     0
>>>              c15t31d1  ONLINE       0     0     0
>>>            mirror-8    ONLINE       0     0     0
>>>              c15t32d1  ONLINE       0     0     0
>>>              c15t30d1  ONLINE       0     0     0
>>>          logs
>>>            mirror-4    ONLINE       0     0     0
>>>              c14t1d0   ONLINE       0     0     0
>>>              c14t3d0   ONLINE       0     0     0
>>>          cache
>>>            c14t4d0     ONLINE       0     0     0
>>>
>>> Smartinfo:
>>>
>>>   c15t17d1  3001 GB  STORAGE01  mirror  DEGRADED  S:4 H:0 T:0  ST3000DM001-1CH166  sat,12  PASSED  40 °C  Z1F21TA7  without error  short long abort log
>>>   c15t18d1  3001 GB  STORAGE01  mirror  DEGRADED  S:4 H:0 T:0  ST3000DM001-1CH166  sat,12  PASSED  41 °C  Z1F27DCD  without error  short long abort log
>>>   c15t19d1  3001 GB  STORAGE01  mirror  DEGRADED  S:4 H:0 T:0  ST3000DM001-1CH166  sat,12  PASSED  39 °C  Z1F28PSJ  without error  short long abort log
>>>   c15t21d1  3001 GB  STORAGE01  mirror  DEGRADED  S:4 H:0 T:0  ST3000DM001-1CH166  sat,12  PASSED  35 °C  Z1F21NPM  without error  short long abort log
>>>   c15t22d1  3001 GB  STORAGE01  mirror  DEGRADED  S:4 H:0 T:0  ST3000DM001-1CH166  sat,12  PASSED  38 °C  Z1F27CFV  without error  short long abort log
>>>   c15t23d1  3001 GB  STORAGE01  mirror  DEGRADED  S:4 H:0 T:0  ST3000DM001-1CH166  sat,12  PASSED  40 °C  Z1F27EHT  without error  short long abort log
>>>   c15t24d1  3001 GB  STORAGE01  mirror  DEGRADED  S:4 H:0 T:0  ST3000DM001-1CH166  sat,12  PASSED  41 °C  Z1F27796  without error  short long abort log
>>>   c15t35d1  3001 GB  STORAGE01  mirror  DEGRADED  S:4 H:0 T:0  ST3000DM001-1CH166  sat,12  PASSED  25 °C  Z1F27E4X  without error  short long abort log
>>>
>>> HD Info:
>>>
>>> smartctl 6.0 2012-10-10 r3643 [i386-pc-solaris2.11] (local build)
>>> Copyright (C) 2002-12, Bruce Allen, Christian Franke, www.smartmontools.org
>>>
>>> === START OF INFORMATION SECTION ===
>>> Model Family:     Seagate Barracuda 7200.14 (AF)
>>> Device Model:     ST3000DM001-1CH166
>>> Serial Number:    Z1F21TA7
>>> LU WWN Device Id: 5 000c50 04f6f0c73
>>> Firmware Version: CC24
>>> User Capacity:    3,000,592,982,016 bytes [3.00 TB]
>>> Sector Sizes:     512 bytes logical, 4096 bytes physical
>>> Rotation Rate:    7200 rpm
>>> Device is:        In smartctl database [for details use: -P show]
>>> ATA Version is:   ATA8-ACS T13/1699-D revision 4
>>> SATA Version is:  SATA 3.0, 6.0 Gb/s (current: 6.0 Gb/s)
>>> Local Time is:    Mon Jul  1 16:56:42 2013 BRT
>>>
>>> Can anyone help me?
>> Try iostat -Exn and have a look at "fmadm faulty" to see if you can
>> pinpoint the fault source.
>>
>> Cheers,
> 


