[OmniOS-discuss] SSD rpool degraded

Yavor Tomov yavoritomov at gmail.com
Thu Jul 2 14:41:04 UTC 2015


First, make sure you have a backup of your data. SSDs are famous for failing
at the same time if they were installed at the same time, and if the pool is
used by a write-intensive application they can die quickly. Depending on the
drive, you should be able to look at the Erase Count attribute in the SMART
information. The value should be above 100; if it is under that, some data
loss is possible, and if it is "0" the SSD will stop working, but all of this
depends on the manufacturer. The easiest thing will be to replace the drive
and see if the errors come back.
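
As a rough sketch (assuming smartmontools is installed and the faulted disk
is c2t1d0, as in the zpool status below; the exact device path, and possibly
a -d option, depends on your controller):

  # Read the SMART attributes of the suspect SSD and look for the
  # vendor's wear / erase-count attribute
  smartctl -a /dev/rdsk/c2t1d0

  # If the drive looks bad, put a new SSD in the same slot and let ZFS
  # resilver onto it (single-argument replace reuses the same slot)
  zpool replace rpool c2t1d0s0

  # Watch the resilver complete before trusting the pool again
  zpool status rpool

Since this is the root pool, remember to put boot blocks on the replacement
disk afterwards (with installgrub on a GRUB-based OmniOS release).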

Good Luck
Yavor Tomov

On Thu, Jul 2, 2015 at 8:55 AM, Johan Kragsterman <
johan.kragsterman at capvert.se> wrote:

>
> Hi!
>
> I've got a degraded rpool, consisting of a mirror of two SSDs. I'm unsure
> whether it really is the SSD that has failed, since it is enterprise grade
> and hasn't been running that long.
>
> I would like to know if there is a way to figure out whether it is the SATA
> port or the SSD that has failed.
>
> The zpool status looks like this:
>
>         NAME          STATE     READ WRITE CKSUM
>         rpool         DEGRADED     0     0     0
>           mirror-0    DEGRADED     0     0     0
>             c2t0d0s0  ONLINE       0     0     0
>             c2t1d0s0  FAULTED      1   191     0  too many errors
>
>
>
> dmesg contains this:
>
> Jun 29 00:39:47 omni2 genunix: [ID 517647 kern.warning] WARNING: ahci0: watchdog port 1 satapkt 0xffffff065eb76860 timed out
> Jun 29 00:39:58 omni2 genunix: [ID 860969 kern.warning] WARNING: ahci0: ahci_port_reset port 1 the device hardware has been initialized and the power-up diagnostics failed
> Jun 29 00:39:59 omni2 genunix: [ID 801845 kern.info] /pci@0,0/pci1028,26e@1f,2:
> Jun 29 00:39:59 omni2  SATA port 1 error
> Jun 29 00:39:59 omni2 genunix: [ID 801845 kern.info] /pci@0,0/pci1028,26e@1f,2:
> Jun 29 00:39:59 omni2  SATA port 1 error
> Jun 29 00:39:59 omni2 genunix: [ID 801845 kern.info] /pci@0,0/pci1028,26e@1f,2:
> Jun 29 00:39:59 omni2  SATA port 1 error
> Jun 29 00:39:59 omni2 genunix: [ID 801845 kern.info] /pci@0,0/pci1028,26e@1f,2:
> Jun 29 00:39:59 omni2  SATA port 1 error
> Jun 29 00:39:59 omni2 genunix: [ID 801845 kern.info] /pci@0,0/pci1028,26e@1f,2:
> Jun 29 00:39:59 omni2  SATA port 1 error
> Jun 29 00:39:59 omni2 genunix: [ID 801845 kern.info] /pci@0,0/pci1028,26e@1f,2:
> Jun 29 00:39:59 omni2  SATA port 1 error
> Jun 29 00:39:59 omni2 genunix: [ID 801845 kern.info] /pci@0,0/pci1028,26e@1f,2:
> Jun 29 00:39:59 omni2  SATA port 1 error
> Jun 29 00:39:59 omni2 genunix: [ID 801845 kern.info] /pci@0,0/pci1028,26e@1f,2:
> Jun 29 00:39:59 omni2  SATA port 1 error
> Jun 29 00:39:59 omni2 genunix: [ID 801845 kern.info] /pci@0,0/pci1028,26e@1f,2:
> Jun 29 00:39:59 omni2  SATA port 1 error
> Jun 29 00:39:59 omni2 genunix: [ID 801845 kern.info] /pci@0,0/pci1028,26e@1f,2:
> Jun 29 00:39:59 omni2  SATA port 1 error
> Jun 29 00:39:59 omni2 genunix: [ID 801845 kern.info] /pci@0,0/pci1028,26e@1f,2:
> Jun 29 00:39:59 omni2  SATA port 1 error
> Jun 29 00:40:14 omni2 fmd: [ID 377184 daemon.error] SUNW-MSG-ID: ZFS-8000-FD, TYPE: Fault, VER: 1, SEVERITY: Major
> Jun 29 00:40:14 omni2 EVENT-TIME: Mon Jun 29 00:40:14 CEST 2015
> Jun 29 00:40:14 omni2 PLATFORM: Precision-WorkStation-T5500, CSN: BCLJ55J, HOSTNAME: omni2
> Jun 29 00:40:14 omni2 SOURCE: zfs-diagnosis, REV: 1.0
> Jun 29 00:40:14 omni2 EVENT-ID: e44ba921-004f-61f8-cbdf-8f1ebf0d57c0
> Jun 29 00:40:14 omni2 DESC: The number of I/O errors associated with a ZFS device exceeded
> Jun 29 00:40:14 omni2        acceptable levels.  Refer to http://illumos.org/msg/ZFS-8000-FD for more information.
> Jun 29 00:40:14 omni2 AUTO-RESPONSE: The device has been offlined and marked as faulted.  An attempt
> Jun 29 00:40:14 omni2        will be made to activate a hot spare if available.
> Jun 29 00:40:14 omni2 IMPACT: Fault tolerance of the pool may be compromised.
> Jun 29 00:40:14 omni2 REC-ACTION: Run 'zpool status -x' and replace the bad device.
>
>
>
> From that, it looks like ZFS is hinting that it is the device, not the port...
>
>
>
>
> cfgadm -al:
>
> root@omni2:/root# cfgadm -al
> Ap_Id                          Type         Receptacle   Occupant     Condition
> c3                             scsi-sas     connected    unconfigured unknown
> c5                             scsi-sas     connected    configured   unknown
> c5::w5000c50078e5135e,0        disk-path    connected    configured   unknown
> c8                             scsi-sas     connected    configured   unknown
> c8::w5000c5007ffee30b,0        disk-path    connected    configured   unknown
> c9                             scsi-sas     connected    configured   unknown
> c9::w500a0751034af6dc,0        disk-path    connected    configured   unknown
> sata1/0::dsk/c2t0d0            disk         connected    configured   ok
> sata1/1                        sata-port    disconnected unconfigured failed
> sata1/2                        sata-port    empty        unconfigured ok
> sata1/3                        sata-port    empty        unconfigured ok
> sata1/4                        sata-port    empty        unconfigured ok
> sata1/5                        sata-port    empty        unconfigured ok
>
> But the cfgadm output makes me unsure again...
>
> Does anyone know...?
>
>
> Best regards from/Med vänliga hälsningar från
>
> Johan Kragsterman
>
> Capvert
>
> _______________________________________________
> OmniOS-discuss mailing list
> OmniOS-discuss at lists.omniti.com
> http://lists.omniti.com/mailman/listinfo/omnios-discuss
>