[OmniOS-discuss] OmniOS backup box hanging regularly

Jim Klimov jimklimov at cos.ru
Fri Oct 23 16:54:27 UTC 2015


On 23 October 2015 at 11:23:28 CEST, Jim Klimov <jim at cos.ru> wrote:
>A new bit of info came in. I left the box running along with an SSH
>session running various tracers overnight, and it seems that the system
>dumbly ran out of memory. 
> 
>However, it never logged any forking errors etc. that were typical of
>similar cases before, and it did not recover with time or by "magic"
>(e.g. processes dying on ENOMEM and so freeing memory up). There is
>some swap free, too (which wouldn't help if it ran out of kernel
>memory). The 64 MB of free RAM (or 32 MB in some of my older
>experiences) is the empirical minimum below which illumos is as good
>as dead ;)
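That free-RAM floor is easy to keep an eye on: illumos exposes the free page count as the `unix:0:system_pages:freemem` kstat, so a tiny logger can record it and leave a trail before the next hang. A minimal sketch (the kstat name is standard illumos; the log path and interval are just examples):

```shell
#!/bin/sh
# Convert a page count to megabytes, given the page size in bytes.
pages_to_mb() {
    echo $(( $1 * $2 / 1048576 ))
}

# Example: 16384 pages of 4096 bytes is the ~64 MB danger line above.
pages_to_mb 16384 4096    # prints 64

# On a live illumos box, a loop like this (commented out here, since
# it never exits) logs free RAM once a minute:
#   PGSZ=$(pagesize)
#   while :; do
#       PGFREE=$(kstat -p unix:0:system_pages:freemem | awk '{print $2}')
#       echo "$(date) free: $(pages_to_mb "$PGFREE" "$PGSZ") MB"
#       sleep 60
#   done >> /var/tmp/freemem.log
```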
> 
>The box is pingable after all, but no SSH nor local usability. The
>"top" listings froze at 4:25am, that's some 7 hours ago.
> 
>last pid: 26331;  load avg:  6.65,  5.61,  6.88;  up 0+19:42:47  04:25:17
>208 processes: 178 sleeping, 28 running, 1 zombie, 1 on cpu
>CPU states: 21.5% idle,  3.5% user, 75.0% kernel,  0.0% iowait,  0.0% swap
>Memory: 16G phys mem, 64M free mem, 2048M total swap, 1420M free swap
>   PID USERNAME LWP PRI NICE  SIZE   RES STATE    TIME    CPU COMMAND
> 24910 zfssnap    1  60    2  243M  238M run      1:24  1.61% zfs
> 25620 zfssnap    1  53    2  220M  215M run      1:18  1.17% zfs
> 25619 zfssnap    1  53    2  220M  215M run      1:18  1.15% zfs
> 24753 zfssnap    1  53    2  243M  238M run      1:26  1.12% zfs
> 25864 zfssnap    1  53    2  220M  215M run      1:19  0.93% zfs
> 25861 zfssnap    1  53    2   13M   10M sleep    0:03  0.87% zfs
> 22380 zfssnap    1  60    2  764M  721M run      4:34  0.83% zfs
> 22546 zfssnap    1  53    2  698M  672M sleep    4:08  0.79% zfs
> 25857 zfssnap    1  53    2  220M  215M run      1:19  0.78% zfs
> 24224 zfssnap    1  60    2  548M  536M run      3:13  0.76% zfs
> 22901 zfssnap    1  60    2  698M  672M run      4:09  0.75% zfs
> 22551 zfssnap    1  60    2  698M  672M sleep    4:08  0.73% zfs
> 22373 zfssnap    1  60    2  767M  729M run      4:33  0.73% zfs
> 22212 zfssnap    1  60    2  767M  730M sleep    4:36  0.71% zfs
> 24215 zfssnap    1  60    2  549M  537M sleep    3:13  0.69% zfs
> 
> 
>Heh, by sheer coincidence, it froze 10 hours and 1 second after I
>logged in (which in my profile also prints a few lines from top):
> 
>last pid: 22036;  load avg:  10.7,  10.3,  10.4;  up 0+09:42:46  18:25:16
>126 processes: 115 sleeping, 9 running, 2 on cpu
>CPU states:  0.0% idle, 21.4% user, 78.6% kernel,  0.0% iowait,  0.0% swap
>Memory: 16G phys mem, 1713M free mem, 2048M total swap, 2048M free swap
>   PID USERNAME LWP PRI NICE  SIZE   RES STATE    TIME    CPU COMMAND
> 21266 zfssnap    1  46    2  660M  657M run      3:50 28.20% zfs
> 21274 zfssnap    1  45    2  672M  669M run      3:55 16.70% zfs
> 21757 zfssnap    1  50    2  314M  311M run      1:48  8.95% zfs
> 21389 zfssnap    1  52    2  586M  584M run      3:24  8.82% zfs
> 21173 zfssnap    1  53    2  762M  759M cpu/1    4:28  8.76% zfs
>
>
>So at the moment it seems there is some issue with zfs-auto-snapshots
>on OmniOS that I haven't seen in SXCE, OI, or Hipster. Possibly I had
>different implementations of the service in those different OSes
>(shell vs. python, and at different versions).
> 
>I'll see if toning down the frequency of autosnaps (e.g. disabling
>the "frequent" or "hourly" schedules) helps improve stability. Even
>if it does, I'd still call it a bug - a system should not die like
>that. And the actual load (as in I/O ops) is seemingly not that
>gigantic.
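For reference, the OpenSolaris-era zfs-auto-snapshot service models each schedule as an SMF instance, so toning the frequency down should just mean disabling instances. A dry-run sketch that only prints the commands (the FMRI below is assumed from the OpenSolaris/OI service and may differ on OmniOS; confirm with `svcs -a | grep auto-snapshot` before piping this to sh):

```shell
#!/bin/sh
# Print (not run) the svcadm commands that would disable the two most
# aggressive snapshot schedules. The FMRI base is an assumption.
FMRI_BASE="svc:/system/filesystem/zfs/auto-snapshot"
for sched in frequent hourly; do
    echo "svcadm disable $FMRI_BASE:$sched"
done
```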
> 
> 
>Jim
>
>----- Original Message -----
>From: Jim Klimov <jim at cos.ru>
>Date: Thursday, October 22, 2015 20:02
>Subject: [OmniOS-discuss] OmniOS backup box hanging regularly
>To: OmniOS-discuss <omnios-discuss at lists.omniti.com>
>
>
>> Hello all,
>> I have this HP Z400 workstation with 16 GB of (supposedly ECC) RAM
>running OmniOS bloody, which acts as a backup server for our
>production systems (regularly rsync'ing large files off Linux boxes,
>and rotating ZFS auto-snapshots to keep its space free). Sometimes it
>also runs replicas of infrastructure services (DHCP, DNS) and was set
>up as a VirtualBox + phpVirtualBox host to test that out, but no VMs
>are running.
>> So the essential loads are ZFS snapshots and ZFS scrubs :)
>> And it freezes roughly every week: it stops responding to ping and
>to login attempts via SSH or the physical console - the latter
>processes keypresses, but does not present a login prompt. It used to
>be stable; these regular hangs began around summertime.
>>  
>> My primary guess would be flaky disks, maybe timing out under load
>or going to sleep or whatever... But I have yet to prove that, or any
>other theory. Maybe the CPU is overheating due to the regular
>near-100% load with disk I/O... At least I want to rule out OS errors
>and rule out (or point out) operator/box errors as much as possible -
>that is something I can change to try and fix ;)
>> Before I proceed to the TL;DR screenshots, here is an overview of
>what I see:
>> * In the "top" output, processes owned by zfssnap lead most of the
>time... But even the SSH shell is noticeably slow to respond (1 sec per
>line when just pressing enter to clear the screen to prepare nice
>screenshots).
>> * SMART was not enabled on the 3TB mirrored "pool" SATA disks (it
>is now, and long tests have been initiated), but it was in place on
>the "rpool" SAS disk, where it logged some corrected ECC errors - but
>none uncorrected. Maybe the cabling should be reseated.
>> * iostat shows disks are generally not busy (they don't audibly
>rattle nor visibly blink all the time, either)
>> * zpool scrubs return clean
>> * there are partitions of the system rpool disk (10K RPM SAS) used as
>log and cache devices for the main data pool on 3TB SATA disks. The
>system disk is fast and underutilized, so what the heck ;) And it was
>not a problem for the first year of this system's honest and stable
>workouts. These devices are pretty empty at the moment.
>>  
>> I have enabled deadman panics according to the Wiki, but none have
>happened so far:
>> # cat /etc/system  | egrep -v '(^\*|^$)'
>> set snooping=1
>> set pcplusmp:apic_panic_on_nmi=1
>> set apix:apic_panic_on_nmi = 1
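If the deadman still never fires, its check interval may be worth a look as well; a sketch of the relevant /etc/system lines (the `snoop_interval` tunable name and its 50-second-in-microseconds default are recalled from illumos sources, so treat them as assumptions to verify for your release):

```
* deadman sketch: panic when the clock cannot advance
set snooping=1
* check interval, believed to be in microseconds (50 s default)
set snoop_interval=50000000
```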
>
>>  
>>  
>> In the "top" output, processes owned by zfssnap lead most of the
>time:
>>  
>> last pid: 22599;  load avg:  12.9,  12.2,  11.2;  up 0+09:52:11  18:34:41
>> 140 processes: 125 sleeping, 13 running, 2 on cpu
>> CPU states:  0.0% idle, 22.9% user, 77.1% kernel,  0.0% iowait,  0.0% swap
>> Memory: 16G phys mem, 1765M free mem, 2048M total swap, 2048M free swap
>> Seconds to delay:
>>    PID USERNAME LWP PRI NICE  SIZE   RES STATE    TIME    CPU COMMAND
>>  21389 zfssnap    1  43    2  863M  860M run      5:04 35.61% zfs
>>  22360 zfssnap    1  52    2  118M  115M run      0:37 16.50% zfs
>>  21778 zfssnap    1  52    2  563M  560M run      3:15 13.17% zfs
>>  21278 zfssnap    1  52    2  947M  944M run      5:32  6.91% zfs
>>  21881 zfssnap    1  43    2  433M  431M run      2:31  5.41% zfs
>>  21852 zfssnap    1  52    2  459M  456M run      2:39  5.16% zfs
>>  21266 zfssnap    1  43    2  906M  903M run      5:18  3.95% zfs
>>  21757 zfssnap    1  43    2  597M  594M run      3:26  2.91% zfs
>>  21274 zfssnap    1  52    2  930M  927M cpu/0    5:27  2.78% zfs
>>  22588 zfssnap    1  43    2   30M   27M run      0:08  2.48% zfs
>>  22580 zfssnap    1  52    2   49M   46M run      0:14  0.71% zfs
>>  22038 root       1  59    0 5312K 3816K cpu/1    0:01  0.10% top
>>  22014 root       1  59    0 8020K 4988K sleep    0:00  0.02% sshd
>
>>  
>> Average "iostats" are not that busy:
>>  
>> # zpool iostat -Td 5
>> Thu Oct 22 18:24:59 CEST 2015
>>                capacity     operations    bandwidth
>> pool        alloc   free   read  write   read  write
>> ----------  -----  -----  -----  -----  -----  -----
>> pool        2.52T   207G    802    116  28.3M   840K
>> rpool       33.0G   118G      0      4  4.52K  58.7K
>> ----------  -----  -----  -----  -----  -----  -----
>> Thu Oct 22 18:25:04 CEST 2015
>> pool        2.52T   207G      0      0      0      0
>> rpool       33.0G   118G      0     10      0  97.9K
>> ----------  -----  -----  -----  -----  -----  -----
>> Thu Oct 22 18:25:09 CEST 2015
>> pool        2.52T   207G      0      0      0      0
>> rpool       33.0G   118G      0      0      0      0
>> ----------  -----  -----  -----  -----  -----  -----
>> Thu Oct 22 18:25:14 CEST 2015
>> pool        2.52T   207G      0      0      0      0
>> rpool       33.0G   118G      0      9      0  93.5K
>> ----------  -----  -----  -----  -----  -----  -----
>> Thu Oct 22 18:25:19 CEST 2015
>> pool        2.52T   207G      0      0      0      0
>> rpool       33.0G   118G      0      0      0      0
>> ----------  -----  -----  -----  -----  -----  -----
>> Thu Oct 22 18:25:24 CEST 2015
>> pool        2.52T   207G      0      0      0      0
>> rpool       33.0G   118G      0      0      0      0
>> ----------  -----  -----  -----  -----  -----  -----
>> Thu Oct 22 18:25:29 CEST 2015
>> pool        2.52T   207G      0      0      0      0
>> rpool       33.0G   118G      0      0      0      0
>> ----------  -----  -----  -----  -----  -----  -----
>> Thu Oct 22 18:25:34 CEST 2015
>> pool        2.52T   207G      0      0      0      0
>> rpool       33.0G   118G      0      0      0      0
>> ----------  -----  -----  -----  -----  -----  -----
>> Thu Oct 22 18:25:39 CEST 2015
>> pool        2.52T   207G      0      0      0      0
>> rpool       33.0G   118G      0     16      0   374K
>> ----------  -----  -----  -----  -----  -----  -----
>> ...
>> Thu Oct 22 18:33:49 CEST 2015
>> pool        2.52T   207G      0      0      0      0
>> rpool       33.0G   118G      0     11      0  94.5K
>> ----------  -----  -----  -----  -----  -----  -----
>> Thu Oct 22 18:33:54 CEST 2015
>> pool        2.52T   207G      0     13    819  80.0K
>> rpool       33.0G   118G      0      0      0      0
>> ----------  -----  -----  -----  -----  -----  -----
>> Thu Oct 22 18:33:59 CEST 2015
>> pool        2.52T   207G      0    129      0  1.06M
>> rpool       33.0G   118G      0      0      0      0
>> ----------  -----  -----  -----  -----  -----  -----
>> Thu Oct 22 18:34:04 CEST 2015
>> pool        2.52T   207G      0     55      0   503K
>> rpool       33.0G   118G      0     11      0  97.9K
>> ----------  -----  -----  -----  -----  -----  -----
>> ...
>> just occasional bursts of work. 
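To put a number on those bursts, the captured samples can be averaged per pool with awk (the helper name is made up for illustration; field 5 is the write-ops column in the `zpool iostat` layout, and suffix-free counts are assumed -- K/M-suffixed values would need extra parsing):

```shell
#!/bin/sh
# Average the write-ops column (field 5) for one pool across samples.
zpool_avg_wops() {   # $1 = pool name, stdin = captured zpool iostat output
    awk -v p="$1" '$1 == p { n++; sum += $5 } END { if (n) print sum / n }'
}

# Fed the four non-cumulative "pool" samples from the capture above:
cat <<'EOF' | zpool_avg_wops pool
pool        2.52T   207G      0      0      0      0
pool        2.52T   207G      0     13    819  80.0K
pool        2.52T   207G      0    129      0  1.06M
pool        2.52T   207G      0     55      0   503K
EOF
# prints 49.25
```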
>> I've now enabled SMART on the disks (2*3Tb mirror "pool" and
>1*300Gb "rpool"), run some short tests, and triggered long tests
>(hopefully they'll have succeeded by tomorrow); current results are:
>> 
>> 
>> # for D in /dev/rdsk/c0*s0; do echo "===== $D :"; smartctl -d sat,12 -a $D ; done ; \
>>   for D in /dev/rdsk/c4*s0 ; do echo "===== $D :"; smartctl -d scsi -a $D ; done
>> ===== /dev/rdsk/c0t3d0s0 :
>> smartctl 6.0 2012-10-10 r3643 [i386-pc-solaris2.11] (local build)
>> Copyright (C) 2002-12, Bruce Allen, Christian Franke, www.smartmontools.org
>> === START OF INFORMATION SECTION ===
>> Device Model:     WDC WD3003FZEX-00Z4SA0
>> Serial Number:    WD-WCC5D1KKU0PA
>> LU WWN Device Id: 5 0014ee 2610716b7
>> Firmware Version: 01.01A01
>> User Capacity:    3,000,592,982,016 bytes [3.00 TB]
>> Sector Sizes:     512 bytes logical, 4096 bytes physical
>> Rotation Rate:    7200 rpm
>> Device is:        Not in smartctl database [for details use: -P showall]
>> ATA Version is:   ACS-2 (minor revision not indicated)
>> SATA Version is:  SATA 3.0, 6.0 Gb/s (current: 3.0 Gb/s)
>> Local Time is:    Thu Oct 22 18:45:28 2015 CEST
>> SMART support is: Available - device has SMART capability.
>> SMART support is: Enabled
>> === START OF READ SMART DATA SECTION ===
>> SMART overall-health self-assessment test result: PASSED
>> General SMART Values:
>> Offline data collection status:  (0x82) Offline data collection activity
>>                                         was completed without error.
>>                                         Auto Offline Data Collection: Enabled.
>> Self-test execution status:      ( 249) Self-test routine in progress...
>>                                         90% of test remaining.
>> Total time to complete Offline
>> data collection:                (32880) seconds.
>> Offline data collection
>> capabilities:                    (0x7b) SMART execute Offline immediate.
>>                                         Auto Offline data collection on/off support.
>>                                         Suspend Offline collection upon new
>>                                         command.
>>                                         Offline surface scan supported.
>>                                         Self-test supported.
>>                                         Conveyance Self-test supported.
>>                                         Selective Self-test supported.
>> SMART capabilities:            (0x0003) Saves SMART data before entering
>>                                         power-saving mode.
>>                                         Supports SMART auto save timer.
>> Error logging capability:        (0x01) Error logging supported.
>>                                         General Purpose Logging supported.
>> Short self-test routine
>> recommended polling time:        (   2) minutes.
>> Extended self-test routine
>> recommended polling time:        ( 357) minutes.
>> Conveyance self-test routine
>> recommended polling time:        (   5) minutes.
>> SCT capabilities:              (0x7035) SCT Status supported.
>>                                         SCT Feature Control supported.
>>                                         SCT Data Table supported.
>> SMART Attributes Data Structure revision number: 16
>> Vendor Specific SMART Attributes with Thresholds:
>> ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
>>   1 Raw_Read_Error_Rate     0x002f   200   200   051    Pre-fail  Always       -       0
>>   3 Spin_Up_Time            0x0027   246   154   021    Pre-fail  Always       -       6691
>>   4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always       -       14
>>   5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
>>   7 Seek_Error_Rate         0x002e   200   200   000    Old_age   Always       -       0
>>   9 Power_On_Hours          0x0032   094   094   000    Old_age   Always       -       4869
>>  10 Spin_Retry_Count        0x0032   100   253   000    Old_age   Always       -       0
>>  11 Calibration_Retry_Count 0x0032   100   253   000    Old_age   Always       -       0
>>  12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       14
>>  16 Unknown_Attribute       0x0022   130   070   000    Old_age   Always       -       2289651870502
>> 192 Power-Off_Retract_Count 0x0032   200   200   000    Old_age   Always       -       12
>> 193 Load_Cycle_Count        0x0032   200   200   000    Old_age   Always       -       2
>> 194 Temperature_Celsius     0x0022   117   111   000    Old_age   Always       -       35
>> 196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always       -       0
>> 197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       0
>> 198 Offline_Uncorrectable   0x0030   200   200   000    Old_age   Offline      -       0
>> 199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       0
>> 200 Multi_Zone_Error_Rate   0x0008   200   200   000    Old_age   Offline      -       0
>> SMART Error Log Version: 1
>> No Errors Logged
>> SMART Self-test log structure revision number 1
>> Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
>> # 1  Short offline       Completed without error       00%      4869         -
>> SMART Selective self-test log data structure revision number 1
>>  SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
>>     1        0        0  Not_testing
>>     2        0        0  Not_testing
>>     3        0        0  Not_testing
>>     4        0        0  Not_testing
>>     5        0        0  Not_testing
>> Selective self-test flags (0x0):
>>   After scanning selected spans, do NOT read-scan remainder of disk.
>> If Selective self-test is pending on power-up, resume after 0 minute delay.
>> ===== /dev/rdsk/c0t5d0s0 :
>> smartctl 6.0 2012-10-10 r3643 [i386-pc-solaris2.11] (local build)
>> Copyright (C) 2002-12, Bruce Allen, Christian Franke, www.smartmontools.org
>> === START OF INFORMATION SECTION ===
>> Model Family:     Seagate SV35
>> Device Model:     ST3000VX000-1ES166
>> Serial Number:    Z500S3L8
>> LU WWN Device Id: 5 000c50 079e3757b
>> Firmware Version: CV26
>> User Capacity:    3,000,592,982,016 bytes [3.00 TB]
>> Sector Sizes:     512 bytes logical, 4096 bytes physical
>> Rotation Rate:    7200 rpm
>> Device is:        In smartctl database [for details use: -P show]
>> ATA Version is:   ACS-2, ACS-3 T13/2161-D revision 3b
>> SATA Version is:  SATA 3.1, 6.0 Gb/s (current: 3.0 Gb/s)
>> Local Time is:    Thu Oct 22 18:45:28 2015 CEST
>> SMART support is: Available - device has SMART capability.
>> SMART support is: Enabled
>> === START OF READ SMART DATA SECTION ===
>> SMART overall-health self-assessment test result: PASSED
>> General SMART Values:
>> Offline data collection status:  (0x00) Offline data collection activity
>>                                         was never started.
>>                                         Auto Offline Data Collection: Disabled.
>> Self-test execution status:      ( 249) Self-test routine in progress...
>>                                         90% of test remaining.
>> Total time to complete Offline
>> data collection:                (   80) seconds.
>> Offline data collection
>> capabilities:                    (0x73) SMART execute Offline immediate.
>>                                         Auto Offline data collection on/off support.
>>                                         Suspend Offline collection upon new
>>                                         command.
>>                                         No Offline surface scan supported.
>>                                         Self-test supported.
>>                                         Conveyance Self-test supported.
>>                                         Selective Self-test supported.
>> SMART capabilities:            (0x0003) Saves SMART data before entering
>>                                         power-saving mode.
>>                                         Supports SMART auto save timer.
>> Error logging capability:        (0x01) Error logging supported.
>>                                         General Purpose Logging supported.
>> Short self-test routine
>> recommended polling time:        (   1) minutes.
>> Extended self-test routine
>> recommended polling time:        ( 325) minutes.
>> Conveyance self-test routine
>> recommended polling time:        (   2) minutes.
>> SCT capabilities:              (0x10b9) SCT Status supported.
>>                                         SCT Error Recovery Control supported.
>>                                         SCT Feature Control supported.
>>                                         SCT Data Table supported.
>> SMART Attributes Data Structure revision number: 10
>> Vendor Specific SMART Attributes with Thresholds:
>> ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
>>   1 Raw_Read_Error_Rate     0x000f   105   099   006    Pre-fail  Always       -       8600880
>>   3 Spin_Up_Time            0x0003   096   094   000    Pre-fail  Always       -       0
>>   4 Start_Stop_Count        0x0032   100   100   020    Old_age   Always       -       19
>>   5 Reallocated_Sector_Ct   0x0033   100   100   010    Pre-fail  Always       -       0
>>   7 Seek_Error_Rate         0x000f   085   060   030    Pre-fail  Always       -       342685681
>>   9 Power_On_Hours          0x0032   096   096   000    Old_age   Always       -       4214
>>  10 Spin_Retry_Count        0x0013   100   100   097    Pre-fail  Always       -       0
>>  12 Power_Cycle_Count       0x0032   100   100   020    Old_age   Always       -       19
>> 184 End-to-End_Error        0x0032   100   100   099    Old_age   Always       -       0
>> 187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0
>> 188 Command_Timeout         0x0032   100   100   000    Old_age   Always       -       0
>> 189 High_Fly_Writes         0x003a   028   028   000    Old_age   Always       -       72
>> 190 Airflow_Temperature_Cel 0x0022   069   065   045    Old_age   Always       -       31 (Min/Max 29/32)
>> 191 G-Sense_Error_Rate      0x0032   100   100   000    Old_age   Always       -       0
>> 192 Power-Off_Retract_Count 0x0032   100   100   000    Old_age   Always       -       19
>> 193 Load_Cycle_Count        0x0032   100   100   000    Old_age   Always       -       28
>> 194 Temperature_Celsius     0x0022   031   040   000    Old_age   Always       -       31 (0 20 0 0 0)
>> 197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
>> 198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       0
>> 199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       0
>> SMART Error Log Version: 1
>> No Errors Logged
>> SMART Self-test log structure revision number 1
>> Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
>> # 1  Extended offline    Self-test routine in progress 90%      4214         -
>> # 2  Short offline       Completed without error       00%      4214         -
>> SMART Selective self-test log data structure revision number 1
>>  SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
>>     1        0        0  Not_testing
>>     2        0        0  Not_testing
>>     3        0        0  Not_testing
>>     4        0        0  Not_testing
>>     5        0        0  Not_testing
>> Selective self-test flags (0x0):
>>   After scanning selected spans, do NOT read-scan remainder of disk.
>> If Selective self-test is pending on power-up, resume after 0 minute delay.
>> ===== /dev/rdsk/c4t5000CCA02A1292DDd0s0 :
>> smartctl 6.0 2012-10-10 r3643 [i386-pc-solaris2.11] (local build)
>> Copyright (C) 2002-12, Bruce Allen, Christian Franke, www.smartmontools.org
>> Vendor:               HITACHI
>> Product:              HUS156030VLS600
>> Revision:             HPH1
>> User Capacity:        300,000,000,000 bytes [300 GB]
>> Logical block size:   512 bytes
>> Logical Unit id:      0x5000cca02a1292dc
>> Serial number:                LVVA6NHS
>> Device type:          disk
>> Transport protocol:   SAS
>> Local Time is:        Thu Oct 22 18:45:29 2015 CEST
>> Device supports SMART and is Enabled
>> Temperature Warning Enabled
>> SMART Health Status: OK
>> Current Drive Temperature:     45 C
>> Drive Trip Temperature:        70 C
>> Manufactured in week 14 of year 2012
>> Specified cycle count over device lifetime:  50000
>> Accumulated start-stop cycles:  80
>> Elements in grown defect list: 0
>> Vendor (Seagate) cache information
>>   Blocks sent to initiator = 2340336504406016
>> Error counter log:
>>            Errors Corrected by           Total   Correction     Gigabytes    Total
>>                ECC          rereads/    errors   algorithm      processed    uncorrected
>>            fast | delayed   rewrites  corrected  invocations   [10^9 bytes]  errors
>> read:          0   888890         0    888890          0      29326.957           0
>> write:         0   961315         0    961315          0       6277.560           0
>> Non-medium error count:      283
>> SMART Self-test log
>> Num  Test              Status                 segment  LifeTime  LBA_first_err [SK ASC ASQ]
>>      Description                              number   (hours)
>> # 1  Background long   Self test in progress ...   -     NOW        - [-   -    -]
>> # 2  Background long   Aborted (device reset ?)    -   14354        - [-   -    -]
>> # 3  Background short  Completed                   -   14354        - [-   -    -]
>> # 4  Background long   Aborted (device reset ?)    -   14354        - [-   -    -]
>> # 5  Background long   Aborted (device reset ?)    -   14354        - [-   -    -]
>> Long (extended) Self Test duration: 2506 seconds [41.8 minutes]
>
>>  
>> The zpool scrub results and general layout:
>>  
>> # zpool status -v
>>   pool: pool
>>  state: ONLINE
>>   scan: scrub repaired 0 in 164h13m with 0 errors on Thu Oct 22 18:13:33 2015
>> config:
>>         NAME                       STATE     READ WRITE CKSUM
>>         pool                       ONLINE       0     0     0
>>           mirror-0                 ONLINE       0     0     0
>>             c0t3d0                 ONLINE       0     0     0
>>             c0t5d0                 ONLINE       0     0     0
>>         logs
>>           c4t5000CCA02A1292DDd0p2  ONLINE       0     0     0
>>         cache
>>           c4t5000CCA02A1292DDd0p3  ONLINE       0     0     0
>> errors: No known data errors
>>   pool: rpool
>>  state: ONLINE
>> status: Some supported features are not enabled on the pool. The pool can
>>         still be used, but some features are unavailable.
>> action: Enable all features using 'zpool upgrade'. Once this is done,
>>         the pool may no longer be accessible by software that does not
>>         support the features. See zpool-features(5) for details.
>>   scan: scrub repaired 0 in 3h3m with 0 errors on Thu Oct  8 04:12:35 2015
>> config:
>>         NAME                       STATE     READ WRITE CKSUM
>>         rpool                      ONLINE       0     0     0
>>           c4t5000CCA02A1292DDd0s0  ONLINE       0     0     0
>> errors: No known data errors
>
>> # zpool list -v
>> NAME                        SIZE  ALLOC   FREE  EXPANDSZ   FRAG    CAP  DEDUP  HEALTH  ALTROOT
>> pool                       2.72T  2.52T   207G         -    68%    92%  1.36x  ONLINE  /
>>   mirror                   2.72T  2.52T   207G         -    68%    92%
>>     c0t3d0                     -      -      -         -      -      -
>>     c0t5d0                     -      -      -         -      -      -
>> log                            -      -      -         -      -      -
>>   c4t5000CCA02A1292DDd0p2     8G   148K  8.00G         -     0%     0%
>> cache                          -      -      -         -      -      -
>>   c4t5000CCA02A1292DDd0p3   120G  1.80G   118G         -     0%     1%
>> rpool                       151G  33.0G   118G         -    76%    21%  1.00x  ONLINE  -
>>   c4t5000CCA02A1292DDd0s0   151G  33.0G   118G         -    76%    21%
>
>> Note that the long scrub time may include the downtime while the
>system was frozen, before it was rebooted.
>>  
>> Thanks in advance for the fresh pairs of eyeballs,
>> Jim Klimov
>> _______________________________________________
>> OmniOS-discuss mailing list
>> OmniOS-discuss at lists.omniti.com
>> http://lists.omniti.com/mailman/listinfo/omnios-discuss

Mail apparently got bounced, reposting...
--
Typos courtesy of K-9 Mail on my Samsung Android

