[OmniOS-discuss] ZFS Checksum problem
Saso Kiselkov
skiselkov.ml at gmail.com
Mon Jul 1 20:35:19 UTC 2013
It looks like your controller is faulty and is corrupting data. The
drives themselves look fine: they report no media or hard errors, so
the corruption is happening in the data path to the drives, not on the
platters.
Try swapping out the controller or moving the drives to another
controller.
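If it helps to check this across many drives at once, here is a rough Python sketch of the reasoning. It is only an illustration: the helper names (parse_iostat, platter_suspects) are made up for this example, the sample text is two of the device blocks from the iostat output quoted below with the wrapped serial-number lines rejoined, and treating nonzero "Media Error" or "Hard Errors" counters as a sign of a disk-side fault is a rule of thumb, not a definitive test.

```python
import re

# Counter names as they appear in the quoted `iostat -Exn` output.
COUNTERS = (
    "Soft Errors", "Hard Errors", "Transport Errors", "Media Error",
    "Device Not Ready", "No Device", "Recoverable", "Illegal Request",
    "Predictive Failure Analysis",
)
COUNTER_RE = re.compile(r"(" + "|".join(COUNTERS) + r"):\s*(\d+)")
DEVICE_RE = re.compile(r"^(\S+)\s+Soft Errors:")


def parse_iostat(text):
    """Return {device: {counter: value}} parsed from iostat error output."""
    devices = {}
    current = None
    for raw in text.splitlines():
        line = raw.lstrip("> ").strip()  # tolerate mail quoting
        match = DEVICE_RE.match(line)
        if match:
            current = devices.setdefault(match.group(1), {})
        if current is not None:
            for name, value in COUNTER_RE.findall(line):
                current[name] = int(value)
    return devices


def platter_suspects(devices):
    """Devices with media or hard errors (assumed hint of a disk-side fault)."""
    return sorted(
        dev for dev, counters in devices.items()
        if counters.get("Media Error", 0) or counters.get("Hard Errors", 0)
    )


SAMPLE = """\
c15t35d1 Soft Errors: 4 Hard Errors: 0 Transport Errors: 0
Vendor: ATA Product: ST3000DM001-1CH1 Revision: CC24 Serial No: Z1F27E4X
Size: 3000.59GB <3000592982016 bytes>
Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 4
Illegal Request: 0 Predictive Failure Analysis: 0
c15t17d1 Soft Errors: 5 Hard Errors: 0 Transport Errors: 0
Vendor: ATA Product: ST3000DM001-1CH1 Revision: CC24 Serial No: Z1F21TA7
Size: 3000.59GB <3000592982016 bytes>
Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 5
Illegal Request: 0 Predictive Failure Analysis: 0
"""

if __name__ == "__main__":
    parsed = parse_iostat(SAMPLE)
    for dev, counters in parsed.items():
        print(dev, "media:", counters["Media Error"],
              "hard:", counters["Hard Errors"],
              "soft:", counters["Soft Errors"])
    print("platter suspects:", platter_suspects(parsed) or "none")
```

On a live system you could feed it the real command output instead of SAMPLE. An empty suspect list alongside ZFS checksum errors on every drive behind one controller is what points away from the disks.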
Cheers,
--
Saso
On 01/07/2013 21:28, "Daniel D. Gonçalves" wrote:
> iostat -Exn result:
>
> c15t35d1 Soft Errors: 4 Hard Errors: 0 Transport Errors: 0
> Vendor: ATA Product: ST3000DM001-1CH1 Revision: CC24 Serial No:
> Z1F27E4X
> Size: 3000.59GB <3000592982016 bytes>
> Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 4
> Illegal Request: 0 Predictive Failure Analysis: 0
> c15t18d1 Soft Errors: 4 Hard Errors: 0 Transport Errors: 0
> Vendor: ATA Product: ST3000DM001-1CH1 Revision: CC24 Serial No:
> Z1F27DCD
> Size: 3000.59GB <3000592982016 bytes>
> Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 4
> Illegal Request: 0 Predictive Failure Analysis: 0
>
> c15t21d1 Soft Errors: 4 Hard Errors: 0 Transport Errors: 0
> Vendor: ATA Product: ST3000DM001-1CH1 Revision: CC24 Serial No:
> Z1F21NPM
> Size: 3000.59GB <3000592982016 bytes>
> Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 4
> Illegal Request: 0 Predictive Failure Analysis: 0
> c15t22d1 Soft Errors: 4 Hard Errors: 0 Transport Errors: 0
> Vendor: ATA Product: ST3000DM001-1CH1 Revision: CC24 Serial No:
> Z1F27CFV
> Size: 3000.59GB <3000592982016 bytes>
> Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 4
> Illegal Request: 0 Predictive Failure Analysis: 0
>
> c15t17d1 Soft Errors: 5 Hard Errors: 0 Transport Errors: 0
> Vendor: ATA Product: ST3000DM001-1CH1 Revision: CC24 Serial No:
> Z1F21TA7
> Size: 3000.59GB <3000592982016 bytes>
> Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 5
> Illegal Request: 0 Predictive Failure Analysis: 0
> c15t19d1 Soft Errors: 4 Hard Errors: 0 Transport Errors: 0
> Vendor: ATA Product: ST3000DM001-1CH1 Revision: CC24 Serial No:
> Z1F28PSJ
> Size: 3000.59GB <3000592982016 bytes>
> Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 4
> Illegal Request: 0 Predictive Failure Analysis: 0
>
> c15t23d1 Soft Errors: 4 Hard Errors: 0 Transport Errors: 0
> Vendor: ATA Product: ST3000DM001-1CH1 Revision: CC24 Serial No:
> Z1F27EHT
> Size: 3000.59GB <3000592982016 bytes>
> Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 4
> Illegal Request: 0 Predictive Failure Analysis: 0
> c15t24d1 Soft Errors: 4 Hard Errors: 0 Transport Errors: 0
> Vendor: ATA Product: ST3000DM001-1CH1 Revision: CC24 Serial No:
> Z1F27796
> Size: 3000.59GB <3000592982016 bytes>
> Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 4
> Illegal Request: 0 Predictive Failure Analysis: 0
>
>
> fmadm faulty result:
>
> --------------- ------------------------------------ -------------- ---------
> TIME            EVENT-ID                             MSG-ID         SEVERITY
> --------------- ------------------------------------ -------------- ---------
> Jun 29 14:48:38 09217f5b-2ded-e74c-bef3-fbaec52391ed ZFS-8000-GH    Major
>
> Host : storage01
> Platform : SandyBridge-Platform Chassis_id : To-be-filled-by-O.E.M.
> Product_sn :
>
> Fault class : fault.fs.zfs.vdev.checksum
> Affects : zfs://pool=STORAGE01/vdev=40200696be7be968
> faulted but still in service
> Problem in : zfs://pool=STORAGE01/vdev=40200696be7be968
> faulted but still in service
>
> Description : The number of checksum errors associated with a ZFS device
> exceeded acceptable levels. Refer to
> http://illumos.org/msg/ZFS-8000-GH for more information.
>
> Response : The device has been marked as degraded. An attempt
> will be made to activate a hot spare if available.
>
> Impact : Fault tolerance of the pool may be compromised.
>
> Action : Run 'zpool status -x' and replace the bad device.
>
>
> I have already run a scrub several times, but the checksum errors come
> back, and only on these 8 HDDs.
> As a reminder, the SATA and power cables have already been swapped.
>
> Daniel
>
> On 01/07/2013 17:15, Saso Kiselkov wrote:
>> On 01/07/2013 21:00, "Daniel D. Gonçalves" wrote:
>>> I'm having trouble with checksum errors in my ZFS pool. I tried changing
>>> the data and power cables of the HDDs, but the problem remains.
>>> All 8 HDDs that are showing errors are identical and all are on the
>>> same controller.
>>>
>>> NAME STATE READ WRITE CKSUM
>>> STORAGE01 DEGRADED 0 0 347
>>> mirror-0 DEGRADED 0 0 188
>>> c15t35d1 DEGRADED 0 0 188 too many errors
>>> c15t18d1 DEGRADED 0 0 188 too many errors
>>> mirror-1 DEGRADED 0 0 170
>>> c15t21d1 DEGRADED 0 0 170 too many errors
>>> c15t22d1 DEGRADED 0 0 170 too many errors
>>> mirror-2 DEGRADED 0 0 164
>>> c15t17d1 DEGRADED 0 0 164 too many errors
>>> c15t19d1 DEGRADED 0 0 164 too many errors
>>> mirror-3 DEGRADED 0 0 172
>>> c15t24d1 DEGRADED 0 0 172 too many errors
>>> c15t23d1 DEGRADED 0 0 172 too many errors
>>> mirror-5 ONLINE 0 0 0
>>> c15t25d1 ONLINE 0 0 0
>>> c15t27d1 ONLINE 0 0 0
>>> mirror-6 ONLINE 0 0 0
>>> c15t26d1 ONLINE 0 0 0
>>> c15t28d1 ONLINE 0 0 0
>>> mirror-7 ONLINE 0 0 0
>>> c15t29d1 ONLINE 0 0 0
>>> c15t31d1 ONLINE 0 0 0
>>> mirror-8 ONLINE 0 0 0
>>> c15t32d1 ONLINE 0 0 0
>>> c15t30d1 ONLINE 0 0 0
>>> logs
>>> mirror-4 ONLINE 0 0 0
>>> c14t1d0 ONLINE 0 0 0
>>> c14t3d0 ONLINE 0 0 0
>>> cache
>>> c14t4d0 ONLINE 0 0 0
>>>
>>> Smartinfo:
>>>
>>> c15t17d1 3001 GB STORAGE01 mirror DEGRADED
>>> S:4 H:0 T:0 ST3000DM001-1CH166 sat,12 PASSED 40
>>> °C Z1F21TA7 without error short long abort log
>>> c15t18d1 3001 GB STORAGE01 mirror DEGRADED
>>> S:4 H:0 T:0 ST3000DM001-1CH166 sat,12 PASSED 41
>>> °C Z1F27DCD without error short long abort log
>>> c15t19d1 3001 GB STORAGE01 mirror DEGRADED
>>> S:4 H:0 T:0 ST3000DM001-1CH166 sat,12 PASSED 39
>>> °C Z1F28PSJ without error short long abort log
>>> c15t21d1 3001 GB STORAGE01 mirror DEGRADED
>>> S:4 H:0 T:0 ST3000DM001-1CH166 sat,12 PASSED 35
>>> °C Z1F21NPM without error short long abort log
>>> c15t22d1 3001 GB STORAGE01 mirror DEGRADED
>>> S:4 H:0 T:0 ST3000DM001-1CH166 sat,12 PASSED 38
>>> °C Z1F27CFV without error short long abort log
>>> c15t23d1 3001 GB STORAGE01 mirror DEGRADED
>>> S:4 H:0 T:0 ST3000DM001-1CH166 sat,12 PASSED 40
>>> °C Z1F27EHT without error short long abort log
>>> c15t24d1 3001 GB STORAGE01 mirror DEGRADED
>>> S:4 H:0 T:0 ST3000DM001-1CH166 sat,12 PASSED 41
>>> °C Z1F27796 without error short long abort log
>>> c15t35d1 3001 GB STORAGE01 mirror DEGRADED
>>> S:4 H:0 T:0 ST3000DM001-1CH166 sat,12 PASSED 25
>>> °C Z1F27E4X without error short long abort log
>>>
>>> HD Info:
>>>
>>> smartctl 6.0 2012-10-10 r3643 [i386-pc-solaris2.11] (local build)
>>> Copyright (C) 2002-12, Bruce Allen, Christian Franke,
>>> www.smartmontools.org
>>>
>>> === START OF INFORMATION SECTION ===
>>> Model Family: Seagate Barracuda 7200.14 (AF)
>>> Device Model: ST3000DM001-1CH166
>>> Serial Number: Z1F21TA7
>>> LU WWN Device Id: 5 000c50 04f6f0c73
>>> Firmware Version: CC24
>>> User Capacity: 3,000,592,982,016 bytes [3.00 TB]
>>> Sector Sizes: 512 bytes logical, 4096 bytes physical
>>> Rotation Rate: 7200 rpm
>>> Device is: In smartctl database [for details use: -P show]
>>> ATA Version is: ATA8-ACS T13/1699-D revision 4
>>> SATA Version is: SATA 3.0, 6.0 Gb/s (current: 6.0 Gb/s)
>>> Local Time is: Mon Jul 1 16:56:42 2013 BRT
>>>
>>> Can anyone help me?
>> Try iostat -Exn and have a look at "fmadm faulty" to see if you can
>> pinpoint the fault source.
>>
>> Cheers,
>