[OmniOS-discuss] Ang: Re: SSD rpool degraded

Johan Kragsterman johan.kragsterman at capvert.se
Thu Jul 2 15:03:38 UTC 2015


Hi!


It is only the rpool, with no data other than, of course, configuration data. So there are no write-intensive applications writing to the drive, only logs. Not even swap or dump; they are on different devices.

It would be nice to know whether there is a way to tell if it is the drive or the port, though...


Best regards from/Med vänliga hälsningar från

Johan Kragsterman

Capvert


-----Yavor Tomov <yavoritomov at gmail.com> wrote: -----
To: Johan Kragsterman <johan.kragsterman at capvert.se>
From: Yavor Tomov <yavoritomov at gmail.com>
Date: 2015-07-02 16:41
Cc: omnios-discuss at lists.omniti.com
Subject: Re: [OmniOS-discuss] SSD rpool degraded

First, make sure you have a backup of your data. SSDs are notorious for failing at the same time if they were installed at the same time. Also, if the pool is used by a write-intensive application, SSDs can die quickly. Depending on the drive, you should be able to look at the Erase Count attribute in the SMART information. The value should be above 100; if it is below that, some data loss is possible, and if it is 0, the SSD will stop working. All of this depends on the manufacturer. The easiest thing would be to replace the drive and see if the errors come back.
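
As a rough illustration of the SMART check described above, the sketch below picks wear-related rows out of text in the column layout that `smartctl -A` prints. The sample table is made up for illustration and is not output from the drives in this thread; attribute names differ between vendors, and the device path and `-d` type option needed on a given system are not shown. The VALUE versus THRESH comparison follows the usual SMART convention and is not specific to any manufacturer's Erase Count semantics.

```python
# Sketch: find wear-related rows in `smartctl -A` style attribute output.
# SAMPLE is hypothetical illustration data, NOT taken from the drives in this thread.
SAMPLE = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  9 Power_On_Hours          0x0032   099   099   000    Old_age   Always       -       1500
177 Wear_Leveling_Count     0x0013   098   098   010    Pre-fail  Always       -       25
"""

# Name fragments that often mark wear attributes; an assumption, not a complete list.
WEAR_HINTS = ("wear", "erase")


def wear_attributes(text):
    """Return (name, value, thresh, raw) for rows whose name hints at wear."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        # Attribute rows start with a numeric ID and have at least 10 columns.
        if len(fields) < 10 or not fields[0].isdigit():
            continue
        name = fields[1]
        if any(hint in name.lower() for hint in WEAR_HINTS):
            rows.append((name, int(fields[3]), int(fields[5]), fields[9]))
    return rows


def is_failing(value, thresh):
    """SMART convention: an attribute has failed once VALUE is at or below THRESH."""
    return value <= thresh


for name, value, thresh, raw in wear_attributes(SAMPLE):
    status = "FAILING" if is_failing(value, thresh) else "ok"
    print(name, value, thresh, raw, status)
```

Comparing the same attribute on both mirror halves may be more telling than either number alone, since both SSDs were presumably installed together.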

Good Luck
Yavor Tomov

On Thu, Jul 2, 2015 at 8:55 AM, Johan Kragsterman <johan.kragsterman at capvert.se> wrote:

Hi!

I have a degraded rpool consisting of a mirror of two SSDs. I am unsure whether it really is the SSD that has failed, since it is enterprise grade and hasn't been running that long.

I would like to know if there is a way to figure out whether it is the SATA port or the SSD that has failed.

The zpool status looks like this:

        NAME          STATE     READ WRITE CKSUM
        rpool         DEGRADED     0     0     0
          mirror-0    DEGRADED     0     0     0
            c2t0d0s0  ONLINE       0     0     0
            c2t1d0s0  FAULTED      1   191     0  too many errors



dmesg contains this:

Jun 29 00:39:47 omni2 genunix: [ID 517647 kern.warning] WARNING: ahci0: watchdog port 1 satapkt 0xffffff065eb76860 timed out
Jun 29 00:39:58 omni2 genunix: [ID 860969 kern.warning] WARNING: ahci0: ahci_port_reset port 1 the device hardware has been initialized and the power-up diagnostics failed
Jun 29 00:39:59 omni2 genunix: [ID 801845 kern.info] /pci@0,0/pci1028,26e@1f,2:
Jun 29 00:39:59 omni2  SATA port 1 error
Jun 29 00:39:59 omni2 genunix: [ID 801845 kern.info] /pci@0,0/pci1028,26e@1f,2:
Jun 29 00:39:59 omni2  SATA port 1 error
Jun 29 00:39:59 omni2 genunix: [ID 801845 kern.info] /pci@0,0/pci1028,26e@1f,2:
Jun 29 00:39:59 omni2  SATA port 1 error
Jun 29 00:39:59 omni2 genunix: [ID 801845 kern.info] /pci@0,0/pci1028,26e@1f,2:
Jun 29 00:39:59 omni2  SATA port 1 error
Jun 29 00:39:59 omni2 genunix: [ID 801845 kern.info] /pci@0,0/pci1028,26e@1f,2:
Jun 29 00:39:59 omni2  SATA port 1 error
Jun 29 00:39:59 omni2 genunix: [ID 801845 kern.info] /pci@0,0/pci1028,26e@1f,2:
Jun 29 00:39:59 omni2  SATA port 1 error
Jun 29 00:39:59 omni2 genunix: [ID 801845 kern.info] /pci@0,0/pci1028,26e@1f,2:
Jun 29 00:39:59 omni2  SATA port 1 error
Jun 29 00:39:59 omni2 genunix: [ID 801845 kern.info] /pci@0,0/pci1028,26e@1f,2:
Jun 29 00:39:59 omni2  SATA port 1 error
Jun 29 00:39:59 omni2 genunix: [ID 801845 kern.info] /pci@0,0/pci1028,26e@1f,2:
Jun 29 00:39:59 omni2  SATA port 1 error
Jun 29 00:39:59 omni2 genunix: [ID 801845 kern.info] /pci@0,0/pci1028,26e@1f,2:
Jun 29 00:39:59 omni2  SATA port 1 error
Jun 29 00:39:59 omni2 genunix: [ID 801845 kern.info] /pci@0,0/pci1028,26e@1f,2:
Jun 29 00:39:59 omni2  SATA port 1 error
Jun 29 00:40:14 omni2 fmd: [ID 377184 daemon.error] SUNW-MSG-ID: ZFS-8000-FD, TYPE: Fault, VER: 1, SEVERITY: Major
Jun 29 00:40:14 omni2 EVENT-TIME: Mon Jun 29 00:40:14 CEST 2015
Jun 29 00:40:14 omni2 PLATFORM: Precision-WorkStation-T5500, CSN: BCLJ55J, HOSTNAME: omni2
Jun 29 00:40:14 omni2 SOURCE: zfs-diagnosis, REV: 1.0
Jun 29 00:40:14 omni2 EVENT-ID: e44ba921-004f-61f8-cbdf-8f1ebf0d57c0
Jun 29 00:40:14 omni2 DESC: The number of I/O errors associated with a ZFS device exceeded
Jun 29 00:40:14 omni2        acceptable levels.  Refer to http://illumos.org/msg/ZFS-8000-FD for more information.
Jun 29 00:40:14 omni2 AUTO-RESPONSE: The device has been offlined and marked as faulted.  An attempt
Jun 29 00:40:14 omni2        will be made to activate a hot spare if available.
Jun 29 00:40:14 omni2 IMPACT: Fault tolerance of the pool may be compromised.
Jun 29 00:40:14 omni2 REC-ACTION: Run 'zpool status -x' and replace the bad device.



From that, it looks like ZFS is hinting that it is the device, not the port...
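
To see at a glance which AHCI port the kernel messages point at, a small tally like the sketch below may help. It uses a few lines copied from the dmesg output above (with the repeated "SATA port 1 error" lines trimmed to two); the function name and the list of problem keywords are illustrative choices. Note that such a count only shows where the errors concentrate; it cannot by itself separate a bad drive from a bad port or cable.

```python
import re
from collections import Counter

# Lines copied from the dmesg excerpt in this thread; the repeated
# "SATA port 1 error" entries are trimmed to two for brevity.
LOG = """\
Jun 29 00:39:47 omni2 genunix: [ID 517647 kern.warning] WARNING: ahci0: watchdog port 1 satapkt 0xffffff065eb76860 timed out
Jun 29 00:39:58 omni2 genunix: [ID 860969 kern.warning] WARNING: ahci0: ahci_port_reset port 1 the device hardware has been initialized and the power-up diagnostics failed
Jun 29 00:39:59 omni2  SATA port 1 error
Jun 29 00:39:59 omni2  SATA port 1 error
"""

# Matches "port <n>" as a standalone word, so "ahci_port_reset" is skipped.
PORT_RE = re.compile(r"\bport (\d+)\b")
# Keywords treated as signs of trouble; an illustrative choice.
PROBLEM_WORDS = ("timed out", "failed", "error")


def count_port_problems(text):
    """Count problem lines per port number."""
    counts = Counter()
    for line in text.splitlines():
        if not any(word in line for word in PROBLEM_WORDS):
            continue
        match = PORT_RE.search(line)
        if match:
            counts[int(match.group(1))] += 1
    return counts


for port, hits in sorted(count_port_problems(LOG).items()):
    print("port", port, "problem lines:", hits)
```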




cfgadm -al:

root@omni2:/root# cfgadm -al
Ap_Id                          Type         Receptacle   Occupant     Condition
c3                             scsi-sas     connected    unconfigured unknown
c5                             scsi-sas     connected    configured   unknown
c5::w5000c50078e5135e,0        disk-path    connected    configured   unknown
c8                             scsi-sas     connected    configured   unknown
c8::w5000c5007ffee30b,0        disk-path    connected    configured   unknown
c9                             scsi-sas     connected    configured   unknown
c9::w500a0751034af6dc,0        disk-path    connected    configured   unknown
sata1/0::dsk/c2t0d0            disk         connected    configured   ok
sata1/1                        sata-port    disconnected unconfigured failed
sata1/2                        sata-port    empty        unconfigured ok
sata1/3                        sata-port    empty        unconfigured ok
sata1/4                        sata-port    empty        unconfigured ok
sata1/5                        sata-port    empty        unconfigured ok


But the cfgadm output makes me unsure again...

Does anyone know...?
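
For reference, the attachment points that cfgadm reports as failed can be pulled out of the listing with a short script. The sketch below works on a few rows copied from the `cfgadm -al` output above; the function names are illustrative, and it assumes every row has exactly the five whitespace-separated columns shown in the header.

```python
# Rows copied from the `cfgadm -al` listing in this thread (subset).
CFGADM = """\
Ap_Id                          Type         Receptacle   Occupant     Condition
c3                             scsi-sas     connected    unconfigured unknown
sata1/0::dsk/c2t0d0            disk         connected    configured   ok
sata1/1                        sata-port    disconnected unconfigured failed
sata1/2                        sata-port    empty        unconfigured ok
"""


def parse_cfgadm(text):
    """Turn the listing into a list of dicts keyed by lower-cased column name."""
    lines = [line for line in text.splitlines() if line.strip()]
    keys = [name.lower() for name in lines[0].split()]
    rows = []
    for line in lines[1:]:
        fields = line.split()
        # Skip anything that does not have one field per header column.
        if len(fields) == len(keys):
            rows.append(dict(zip(keys, fields)))
    return rows


def failed_points(rows):
    """Return the Ap_Id of every row whose Condition column is 'failed'."""
    return [row["ap_id"] for row in rows if row["condition"] == "failed"]


print(failed_points(parse_cfgadm(CFGADM)))
```

In this listing the failed entry is the attachment point sata1/1, shown as a bare sata-port with the receptacle disconnected, which is why the output alone does not settle whether the drive or the port is at fault.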


Best regards from/Med vänliga hälsningar från

Johan Kragsterman

Capvert

_______________________________________________
OmniOS-discuss mailing list
OmniOS-discuss at lists.omniti.com
http://lists.omniti.com/mailman/listinfo/omnios-discuss




