[OmniOS-discuss] ZFS Checksum problem
Richard Elling
richard.elling at richardelling.com
Mon Jul 1 21:27:48 UTC 2013
troubleshooting tips below...
On Jul 1, 2013, at 1:15 PM, Saso Kiselkov <skiselkov.ml at gmail.com> wrote:
> On 01/07/2013 21:00, "Daniel D. Gonçalves" wrote:
>> I'm seeing checksum errors in my ZFS pool. I tried changing the data
>> cables and the power for the HDDs, but the problems remain.
>> All 8 HDDs that are exhibiting errors are identical, and all are on the
>> same controller.
>>
>>     NAME          STATE     READ WRITE CKSUM
>>     STORAGE01     DEGRADED     0     0   347
>>       mirror-0    DEGRADED     0     0   188
>>         c15t35d1  DEGRADED     0     0   188  too many errors
>>         c15t18d1  DEGRADED     0     0   188  too many errors
>>       mirror-1    DEGRADED     0     0   170
>>         c15t21d1  DEGRADED     0     0   170  too many errors
>>         c15t22d1  DEGRADED     0     0   170  too many errors
>>       mirror-2    DEGRADED     0     0   164
>>         c15t17d1  DEGRADED     0     0   164  too many errors
>>         c15t19d1  DEGRADED     0     0   164  too many errors
>>       mirror-3    DEGRADED     0     0   172
>>         c15t24d1  DEGRADED     0     0   172  too many errors
>>         c15t23d1  DEGRADED     0     0   172  too many errors
smells like a bad cable, power supply, expander, or controller...
>>       mirror-5    ONLINE       0     0     0
>>         c15t25d1  ONLINE       0     0     0
>>         c15t27d1  ONLINE       0     0     0
>>       mirror-6    ONLINE       0     0     0
>>         c15t26d1  ONLINE       0     0     0
>>         c15t28d1  ONLINE       0     0     0
>>       mirror-7    ONLINE       0     0     0
>>         c15t29d1  ONLINE       0     0     0
>>         c15t31d1  ONLINE       0     0     0
>>       mirror-8    ONLINE       0     0     0
>>         c15t32d1  ONLINE       0     0     0
>>         c15t30d1  ONLINE       0     0     0
>>     logs
>>       mirror-4    ONLINE       0     0     0
>>         c14t1d0   ONLINE       0     0     0
>>         c14t3d0   ONLINE       0     0     0
>>     cache
>>       c14t4d0     ONLINE       0     0     0
>>
>> Smartinfo:
>>
>> c15t17d1 3001 GB STORAGE01 mirror DEGRADED
>> S:4 H:0 T:0 ST3000DM001-1CH166 sat,12 PASSED 40 °C Z1F21TA7 without error short long abort log
>> c15t18d1 3001 GB STORAGE01 mirror DEGRADED
>> S:4 H:0 T:0 ST3000DM001-1CH166 sat,12 PASSED 41 °C Z1F27DCD without error short long abort log
>> c15t19d1 3001 GB STORAGE01 mirror DEGRADED
>> S:4 H:0 T:0 ST3000DM001-1CH166 sat,12 PASSED 39 °C Z1F28PSJ without error short long abort log
>> c15t21d1 3001 GB STORAGE01 mirror DEGRADED
>> S:4 H:0 T:0 ST3000DM001-1CH166 sat,12 PASSED 35 °C Z1F21NPM without error short long abort log
>> c15t22d1 3001 GB STORAGE01 mirror DEGRADED
>> S:4 H:0 T:0 ST3000DM001-1CH166 sat,12 PASSED 38 °C Z1F27CFV without error short long abort log
>> c15t23d1 3001 GB STORAGE01 mirror DEGRADED
>> S:4 H:0 T:0 ST3000DM001-1CH166 sat,12 PASSED 40 °C Z1F27EHT without error short long abort log
>> c15t24d1 3001 GB STORAGE01 mirror DEGRADED
>> S:4 H:0 T:0 ST3000DM001-1CH166 sat,12 PASSED 41 °C Z1F27796 without error short long abort log
>> c15t35d1 3001 GB STORAGE01 mirror DEGRADED
>> S:4 H:0 T:0 ST3000DM001-1CH166 sat,12 PASSED 25 °C Z1F27E4X without error short long abort log
disks look clean
>>
>> HD Info:
>>
>> smartctl 6.0 2012-10-10 r3643 [i386-pc-solaris2.11] (local build)
>> Copyright (C) 2002-12, Bruce Allen, Christian Franke, www.smartmontools.org
>>
>> === START OF INFORMATION SECTION ===
>> Model Family: Seagate Barracuda 7200.14 (AF)
>> Device Model: ST3000DM001-1CH166
>> Serial Number: Z1F21TA7
>> LU WWN Device Id: 5 000c50 04f6f0c73
>> Firmware Version: CC24
>> User Capacity: 3,000,592,982,016 bytes [3.00 TB]
>> Sector Sizes: 512 bytes logical, 4096 bytes physical
>> Rotation Rate: 7200 rpm
>> Device is: In smartctl database [for details use: -P show]
>> ATA Version is: ATA8-ACS T13/1699-D revision 4
>> SATA Version is: SATA 3.0, 6.0 Gb/s (current: 6.0 Gb/s)
>> Local Time is: Mon Jul 1 16:56:42 2013 BRT
>>
>> Can anyone help me?
>
> Try iostat -Exn and have a look at "fmadm faulty" to see if you can
> pinpoint the fault source.
fmdump -e
shows the error reports. These are more interesting than the faulty diagnosis in this
case. If the only error reports you see are ZFS checksums, then it is more difficult to
get to the root cause, because that means something in the datapath is corrupting data.
If you see other errors, like transport or data errors, then they will be a better clue.
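
To make that triage concrete, here is a minimal sketch in Python that tallies
saved "fmdump -e" output by ereport class and reports whether anything other
than ZFS checksum reports is present. It assumes the usual layout of a header
line followed by one line per report, with the class as the last field. The
sample lines are illustrative only and are not from the poster's system, and
the helper names count_classes and summarize are hypothetical.

```python
from collections import Counter

# Illustrative input only: these lines are NOT from the poster's system.
# Assumed layout of `fmdump -e` output: a header line, then one line per
# error report whose last whitespace-separated field is the ereport class.
SAMPLE = """\
TIME                 CLASS
Jul 01 16:40:12.1234 ereport.fs.zfs.checksum
Jul 01 16:40:12.5678 ereport.fs.zfs.checksum
Jul 01 16:41:03.0001 ereport.io.scsi.cmd.disk.tran
"""

ZFS_CHECKSUM = "ereport.fs.zfs.checksum"


def count_classes(text):
    """Count error reports per ereport class, skipping the header and blank lines."""
    counts = Counter()
    for line in text.splitlines():
        fields = line.split()
        if not fields or fields[0] == "TIME":
            continue
        counts[fields[-1]] += 1
    return counts


def summarize(counts):
    """Say whether the reports are ZFS checksums only or include other classes."""
    if not counts:
        return "no error reports"
    others = sorted(c for c in counts if c != ZFS_CHECKSUM)
    if not others:
        return "only ZFS checksum reports: something in the datapath is corrupting data"
    return "other reports present: " + ", ".join(others)


if __name__ == "__main__":
    counts = count_classes(SAMPLE)
    for cls, n in sorted(counts.items()):
        print(n, cls)
    print(summarize(counts))
```

To use it on a real system, save the output with "fmdump -e > ereports.txt"
and pass the file contents to count_classes in place of SAMPLE.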
-- richard
>
> Cheers,
> --
> Saso
--
Richard.Elling at RichardElling.com
+1-760-896-4422