[OmniOS-discuss] Ang: Re: SSD rpool degraded
Johan Kragsterman
johan.kragsterman at capvert.se
Thu Jul 2 15:03:38 UTC 2015
Hi!
It is only the rpool, with no data apart from, of course, configuration data. So there are no write-intensive applications writing to the drive, only logs. Not even swap or dump; they are on different devices.
It would be nice to know if there is a way to tell whether it is the drive or the port, though...
Best regards from
Johan Kragsterman
Capvert
-----Yavor Tomov <yavoritomov at gmail.com> wrote: -----
To: Johan Kragsterman <johan.kragsterman at capvert.se>
From: Yavor Tomov <yavoritomov at gmail.com>
Date: 2015-07-02 16:41
Cc: omnios-discuss at lists.omniti.com
Subject: Re: [OmniOS-discuss] SSD rpool degraded
First, make sure you have a backup of your data. SSDs are notorious for failing at around the same time if they were installed at the same time. Also, if a write-intensive application uses the pool, SSDs can wear out quickly. Depending on the drive, you should be able to look at the Erase Count attribute in the SMART information. The value should be above 100; below that, some data loss is possible, and at "0" the SSD will stop working. All of this depends on the manufacturer. The easiest thing is to replace the drive and see whether the errors come back.
Good Luck
Yavor Tomov
On Thu, Jul 2, 2015 at 8:55 AM, Johan Kragsterman <johan.kragsterman at capvert.se> wrote:
Hi!
I have a degraded rpool consisting of a mirror of two SSDs. I am unsure whether it really is the SSD that has failed, since it is enterprise grade and hasn't been running that long.
I would like to know if there is a way to figure out whether it is the SATA port or the SSD that has failed.
The zpool status looks like this:
        NAME          STATE     READ WRITE CKSUM
        rpool         DEGRADED     0     0     0
          mirror-0    DEGRADED     0     0     0
            c2t0d0s0  ONLINE       0     0     0
            c2t1d0s0  FAULTED      1   191     0  too many errors
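As a rough aid for reading output like the above, here is a minimal Python sketch that picks out the vdev rows with non-zero error counters. The function name `devices_with_errors` and the `SAMPLE` string are made up for illustration, and the parser assumes plain integer counters (real `zpool status` output can abbreviate large counts, which this sketch simply skips).

```python
def devices_with_errors(status_text):
    """Return (name, state, read, write, cksum) for rows with any non-zero counter."""
    rows = []
    for line in status_text.splitlines():
        parts = line.split()
        # A config row needs at least: NAME STATE READ WRITE CKSUM
        if len(parts) < 5 or parts[0] == "NAME":
            continue
        name, state, read, write, cksum = parts[:5]
        # Skip rows whose counters are not plain integers (assumption of this sketch)
        if not (read.isdigit() and write.isdigit() and cksum.isdigit()):
            continue
        counts = (int(read), int(write), int(cksum))
        if any(counts):
            rows.append((name, state) + counts)
    return rows


# Sample text copied from the status output in this message
SAMPLE = """\
NAME STATE READ WRITE CKSUM
rpool DEGRADED 0 0 0
mirror-0 DEGRADED 0 0 0
c2t0d0s0 ONLINE 0 0 0
c2t1d0s0 FAULTED 1 191 0 too many errors
"""

for row in devices_with_errors(SAMPLE):
    print(row)
```

Here only c2t1d0s0 is reported, with 1 read error and 191 write errors and no checksum errors.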
dmesg contains this:
Jun 29 00:39:47 omni2 genunix: [ID 517647 kern.warning] WARNING: ahci0: watchdog port 1 satapkt 0xffffff065eb76860 timed out
Jun 29 00:39:58 omni2 genunix: [ID 860969 kern.warning] WARNING: ahci0: ahci_port_reset port 1 the device hardware has been initialized and the power-up diagnostics failed
Jun 29 00:39:59 omni2 genunix: [ID 801845 kern.info] /pci@0,0/pci1028,26e@1f,2:
Jun 29 00:39:59 omni2 SATA port 1 error
Jun 29 00:39:59 omni2 genunix: [ID 801845 kern.info] /pci@0,0/pci1028,26e@1f,2:
Jun 29 00:39:59 omni2 SATA port 1 error
Jun 29 00:39:59 omni2 genunix: [ID 801845 kern.info] /pci@0,0/pci1028,26e@1f,2:
Jun 29 00:39:59 omni2 SATA port 1 error
Jun 29 00:39:59 omni2 genunix: [ID 801845 kern.info] /pci@0,0/pci1028,26e@1f,2:
Jun 29 00:39:59 omni2 SATA port 1 error
Jun 29 00:39:59 omni2 genunix: [ID 801845 kern.info] /pci@0,0/pci1028,26e@1f,2:
Jun 29 00:39:59 omni2 SATA port 1 error
Jun 29 00:39:59 omni2 genunix: [ID 801845 kern.info] /pci@0,0/pci1028,26e@1f,2:
Jun 29 00:39:59 omni2 SATA port 1 error
Jun 29 00:39:59 omni2 genunix: [ID 801845 kern.info] /pci@0,0/pci1028,26e@1f,2:
Jun 29 00:39:59 omni2 SATA port 1 error
Jun 29 00:39:59 omni2 genunix: [ID 801845 kern.info] /pci@0,0/pci1028,26e@1f,2:
Jun 29 00:39:59 omni2 SATA port 1 error
Jun 29 00:39:59 omni2 genunix: [ID 801845 kern.info] /pci@0,0/pci1028,26e@1f,2:
Jun 29 00:39:59 omni2 SATA port 1 error
Jun 29 00:39:59 omni2 genunix: [ID 801845 kern.info] /pci@0,0/pci1028,26e@1f,2:
Jun 29 00:39:59 omni2 SATA port 1 error
Jun 29 00:39:59 omni2 genunix: [ID 801845 kern.info] /pci@0,0/pci1028,26e@1f,2:
Jun 29 00:39:59 omni2 SATA port 1 error
Jun 29 00:40:14 omni2 fmd: [ID 377184 daemon.error] SUNW-MSG-ID: ZFS-8000-FD, TYPE: Fault, VER: 1, SEVERITY: Major
Jun 29 00:40:14 omni2 EVENT-TIME: Mon Jun 29 00:40:14 CEST 2015
Jun 29 00:40:14 omni2 PLATFORM: Precision-WorkStation-T5500, CSN: BCLJ55J, HOSTNAME: omni2
Jun 29 00:40:14 omni2 SOURCE: zfs-diagnosis, REV: 1.0
Jun 29 00:40:14 omni2 EVENT-ID: e44ba921-004f-61f8-cbdf-8f1ebf0d57c0
Jun 29 00:40:14 omni2 DESC: The number of I/O errors associated with a ZFS device exceeded
Jun 29 00:40:14 omni2 acceptable levels. Refer to http://illumos.org/msg/ZFS-8000-FD for more information.
Jun 29 00:40:14 omni2 AUTO-RESPONSE: The device has been offlined and marked as faulted. An attempt
Jun 29 00:40:14 omni2 will be made to activate a hot spare if available.
Jun 29 00:40:14 omni2 IMPACT: Fault tolerance of the pool may be compromised.
Jun 29 00:40:14 omni2 REC-ACTION: Run 'zpool status -x' and replace the bad device.
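To see at a glance which SATA ports the kernel complained about, a small Python sketch can tally the "SATA port N error" lines from a log excerpt. The helper name `sata_errors_by_port` and the shortened `SAMPLE_LOG` are illustrative only, and a tally like this shows where the errors were reported, not whether the port or the drive behind it is at fault.

```python
import re
from collections import Counter

# Matches the "SATA port N error" message format seen in the log above
PORT_ERROR_RE = re.compile(r"SATA port (\d+) error")


def sata_errors_by_port(log_text):
    """Count 'SATA port N error' messages per port number."""
    return Counter(int(m.group(1)) for m in PORT_ERROR_RE.finditer(log_text))


# Shortened excerpt of the log in this message
SAMPLE_LOG = """\
Jun 29 00:39:47 omni2 genunix: [ID 517647 kern.warning] WARNING: ahci0: watchdog port 1 satapkt 0xffffff065eb76860 timed out
Jun 29 00:39:59 omni2 SATA port 1 error
Jun 29 00:39:59 omni2 SATA port 1 error
Jun 29 00:39:59 omni2 SATA port 1 error
"""

for port, count in sorted(sata_errors_by_port(SAMPLE_LOG).items()):
    print(f"port {port}: {count} error message(s)")
```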
From that, it looks like ZFS is hinting that it is the device, not the port...
cfgadm -al:
root@omni2:/root# cfgadm -al
Ap_Id                          Type         Receptacle   Occupant     Condition
c3                             scsi-sas     connected    unconfigured unknown
c5                             scsi-sas     connected    configured   unknown
c5::w5000c50078e5135e,0        disk-path    connected    configured   unknown
c8                             scsi-sas     connected    configured   unknown
c8::w5000c5007ffee30b,0        disk-path    connected    configured   unknown
c9                             scsi-sas     connected    configured   unknown
c9::w500a0751034af6dc,0        disk-path    connected    configured   unknown
sata1/0::dsk/c2t0d0            disk         connected    configured   ok
sata1/1                        sata-port    disconnected unconfigured failed
sata1/2                        sata-port    empty        unconfigured ok
sata1/3                        sata-port    empty        unconfigured ok
sata1/4                        sata-port    empty        unconfigured ok
sata1/5                        sata-port    empty        unconfigured ok
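For longer listings, the attachment points in a bad state can be filtered out with a few lines of Python. This is only a sketch: the function name `failed_attachment_points` and the shortened `SAMPLE` are invented for illustration, and it assumes each row has exactly the five whitespace-separated columns shown above.

```python
# Condition values that indicate a problem (cfgadm also reports "ok" and "unknown")
BAD_CONDITIONS = {"failing", "failed", "unusable"}


def failed_attachment_points(cfgadm_text):
    """Return (ap_id, receptacle, occupant, condition) for rows in a bad condition."""
    bad = []
    for line in cfgadm_text.splitlines():
        parts = line.split()
        # Expect exactly: Ap_Id Type Receptacle Occupant Condition
        if len(parts) != 5 or parts[0] == "Ap_Id":
            continue
        ap_id, _ap_type, receptacle, occupant, condition = parts
        if condition in BAD_CONDITIONS:
            bad.append((ap_id, receptacle, occupant, condition))
    return bad


# Shortened excerpt of the listing in this message
SAMPLE = """\
Ap_Id Type Receptacle Occupant Condition
c5 scsi-sas connected configured unknown
sata1/0::dsk/c2t0d0 disk connected configured ok
sata1/1 sata-port disconnected unconfigured failed
sata1/2 sata-port empty unconfigured ok
"""

for row in failed_attachment_points(SAMPLE):
    print(row)
```

For the listing above this reports only sata1/1, which is shown as disconnected, unconfigured and failed.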
But the cfgadm output makes me unsure again...
Does anyone know...?
Best regards from
Johan Kragsterman
Capvert