[OmniOS-discuss] issue importing zpool on S11.1 from omniOS LUNs
Stephan Budach
stephan.budach at jvm.de
Wed Jan 25 23:01:14 UTC 2017
Oops… I should have waited to send that message until after I had rebooted
the S11.1 host…
Am 25.01.17 um 23:41 schrieb Stephan Budach:
> Hi Richard,
>
> Am 25.01.17 um 20:27 schrieb Richard Elling:
>> Hi Stephan,
>>
>>> On Jan 25, 2017, at 5:54 AM, Stephan Budach <stephan.budach at JVM.DE
>>> <mailto:stephan.budach at jvm.de>> wrote:
>>>
>>> Hi guys,
>>>
>>> I have been trying to import a zpool built on a 3-way mirror of LUNs
>>> provided by three omniOS boxes via iSCSI. This zpool had been working
>>> flawlessly until some random reboot of the S11.1 host; since then,
>>> S11.1 has been unable to import it.
>>>
>>> The zpool consists of three 108TB LUNs, each backed by a zvol on a
>>> raidz2 pool… yeah, I know, we shouldn't have done that in the first
>>> place, but performance was not the primary goal here, as this one is a
>>> backup/archive pool.
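>>>
>>> For context, each LUN is simply a zvol exported via COMSTAR from its
>>> omniOS box, created roughly along these lines (the pool/zvol names and
>>> the LU GUID are placeholders, only the size matches the real setup):
>>>
>>>   zfs create -s -V 108T tank/vsmPool10-lun     # sparse zvol on the raidz2 pool
>>>   stmfadm create-lu /dev/zvol/rdsk/tank/vsmPool10-lun
>>>   stmfadm add-view <LU-GUID-from-create-lu>    # export the LU to all hosts
>>>   itadm create-target                          # iSCSI target the S11.1 host logs in to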
>>>
>>> When issuing a zpool import, it reports this:
>>>
>>> root at solaris11atest2:~# zpool import
>>>    pool: vsmPool10
>>>      id: 12653649504720395171
>>>   state: DEGRADED
>>>  status: The pool was last accessed by another system.
>>>  action: The pool can be imported despite missing or damaged devices. The
>>>          fault tolerance of the pool may be compromised if imported.
>>>     see: http://support.oracle.com/msg/ZFS-8000-EY
>>>  config:
>>>
>>>         vsmPool10                                  DEGRADED
>>>           mirror-0                                 DEGRADED
>>>             c0t600144F07A3506580000569398F60001d0  DEGRADED  corrupted data
>>>             c0t600144F07A35066C00005693A0D90001d0  DEGRADED  corrupted data
>>>             c0t600144F07A35001A00005693A2810001d0  DEGRADED  corrupted data
>>>
>>> device details:
>>>
>>>         c0t600144F07A3506580000569398F60001d0  DEGRADED  scrub/resilver needed
>>>         status: ZFS detected errors on this device.
>>>                 The device is missing some data that is recoverable.
>>>
>>>         c0t600144F07A35066C00005693A0D90001d0  DEGRADED  scrub/resilver needed
>>>         status: ZFS detected errors on this device.
>>>                 The device is missing some data that is recoverable.
>>>
>>>         c0t600144F07A35001A00005693A2810001d0  DEGRADED  scrub/resilver needed
>>>         status: ZFS detected errors on this device.
>>>                 The device is missing some data that is recoverable.
>>>
>>> However, when actually running zpool import -f vsmPool10, the system
>>> starts to perform a lot of writes on the LUNs, and iostat reports an
>>> alarming increase in h/w errors:
>>>
>>> root at solaris11atest2:~# iostat -xeM 5
>>>                   extended device statistics        ---- errors ----
>>> device    r/s    w/s  Mr/s  Mw/s wait actv  svc_t  %w  %b  s/w  h/w  trn  tot
>>> sd0       0.0    0.0   0.0   0.0  0.0  0.0    0.0   0   0    0    0    0    0
>>> sd1       0.0    0.0   0.0   0.0  0.0  0.0    0.0   0   0    0    0    0    0
>>> sd2       0.0    0.0   0.0   0.0  0.0  0.0    0.0   0   0    0   71    0   71
>>> sd3       0.0    0.0   0.0   0.0  0.0  0.0    0.0   0   0    0    0    0    0
>>> sd4       0.0    0.0   0.0   0.0  0.0  0.0    0.0   0   0    0    0    0    0
>>> sd5       0.0    0.0   0.0   0.0  0.0  0.0    0.0   0   0    0    0    0    0
>>>                   extended device statistics        ---- errors ----
>>> device    r/s    w/s  Mr/s  Mw/s wait actv  svc_t  %w  %b  s/w  h/w  trn  tot
>>> sd0      14.2  147.3   0.7   0.4  0.2  0.1    2.0   6   9    0    0    0    0
>>> sd1      14.2    8.4   0.4   0.0  0.0  0.0    0.3   0   0    0    0    0    0
>>> sd2       0.0    4.2   0.0   0.0  0.0  0.0    0.0   0   0    0   92    0   92
>>> sd3     157.3   46.2   2.1   0.2  0.0  0.7    3.7   0  14    0   30    0   30
>>> sd4     123.9   29.4   1.6   0.1  0.0  1.7   10.9   0  36    0   40    0   40
>>> sd5     142.5   43.0   2.0   0.1  0.0  1.9   10.2   0  45    0   88    0   88
>>>                   extended device statistics        ---- errors ----
>>> device    r/s    w/s  Mr/s  Mw/s wait actv  svc_t  %w  %b  s/w  h/w  trn  tot
>>> sd0       0.0  234.5   0.0   0.6  0.2  0.1    1.4   6  10    0    0    0    0
>>> sd1       0.0    0.0   0.0   0.0  0.0  0.0    0.0   0   0    0    0    0    0
>>> sd2       0.0    0.0   0.0   0.0  0.0  0.0    0.0   0   0    0   92    0   92
>>> sd3       3.6   64.0   0.0   0.5  0.0  4.3   63.2   0  63    0  235    0  235
>>> sd4       3.0   67.0   0.0   0.6  0.0  4.2   60.5   0  68    0  298    0  298
>>> sd5       4.2   59.6   0.0   0.4  0.0  5.2   81.0   0  72    0  406    0  406
>>>                   extended device statistics        ---- errors ----
>>> device    r/s    w/s  Mr/s  Mw/s wait actv  svc_t  %w  %b  s/w  h/w  trn  tot
>>> sd0       0.0  234.8   0.0   0.7  0.4  0.1    2.2  11  10    0    0    0    0
>>> sd1       0.0    0.0   0.0   0.0  0.0  0.0    0.0   0   0    0    0    0    0
>>> sd2       0.0    0.0   0.0   0.0  0.0  0.0    0.0   0   0    0   92    0   92
>>> sd3       5.4   54.4   0.0   0.3  0.0  2.9   48.5   0  67    0  384    0  384
>>> sd4       6.0   53.4   0.0   0.3  0.0  4.6   77.7   0  87    0  519    0  519
>>> sd5       6.0   60.8   0.0   0.3  0.0  4.8   72.5   0  87    0  727    0  727
>>
>> h/w errors are a summary classification of other, more specific errors.
>> The full error list is available from "iostat -E" and will be important
>> for tracking this down.
>>
>> A better, more detailed analysis can be gleaned from the "fmdump -e"
>> ereports that should be associated with each h/w error. However, there
>> are dozens of possible causes, so we don't have enough info here yet to
>> fully understand what is going on.
>> — richard
>>
> Well… I can't provide the output of fmdump -e right now, since I am
> currently unable to get the '-' typed into the console due to some fancy
> keyboard-layout issue, and I'm not able to log in via ssh either (I can
> authenticate, but I never get a shell, which may be due to the running
> zpool import). I can confirm, though, that plain fmdump shows nothing at
> all. I could just reset the S11.1 host after removing the zpool.cache
> file, so that the system will not try to import the zpool right away
> upon restart…
>
> …plus I might get the chance to set the keyboard layout right after the
> reboot, but that's another issue…
>
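Regarding the zpool.cache removal mentioned above: without the cache file
the host will not try to re-import the pool on its own at boot. Roughly,
assuming the standard Solaris location, and moving the file aside rather
than deleting it:

  mv /etc/zfs/zpool.cache /etc/zfs/zpool.cache.bak   # no cached pools, so no auto-import at boot
  reboot
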
After resetting the S11.1 host and getting the keyboard layout right, I
issued a fmdump -e and there they are… lots of:
Jan 25 23:25:13.5643 ereport.io.scsi.cmd.disk.dev.rqs.merr.write
Jan 25 23:25:13.8944 ereport.io.scsi.cmd.disk.dev.rqs.merr.write
Jan 25 23:25:13.8945 ereport.io.scsi.cmd.disk.dev.rqs.merr.write
Jan 25 23:25:13.8946 ereport.io.scsi.cmd.disk.dev.rqs.merr.write
Jan 25 23:25:13.9274 ereport.io.scsi.cmd.disk.dev.rqs.merr.write
Jan 25 23:25:13.9275 ereport.io.scsi.cmd.disk.dev.rqs.merr.write
Jan 25 23:25:13.9276 ereport.io.scsi.cmd.disk.dev.rqs.merr.write
Jan 25 23:25:13.9277 ereport.io.scsi.cmd.disk.dev.rqs.merr.write
Jan 25 23:25:13.9282 ereport.io.scsi.cmd.disk.dev.rqs.merr.write
Jan 25 23:25:13.9284 ereport.io.scsi.cmd.disk.dev.rqs.merr.write
Jan 25 23:25:13.9285 ereport.io.scsi.cmd.disk.dev.rqs.merr.write
Jan 25 23:25:13.9286 ereport.io.scsi.cmd.disk.dev.rqs.merr.write
Jan 25 23:25:13.9287 ereport.io.scsi.cmd.disk.dev.rqs.merr.write
Jan 25 23:25:13.9288 ereport.fs.zfs.dev.merr.write
Jan 25 23:25:13.9290 ereport.io.scsi.cmd.disk.dev.rqs.merr.write
Jan 25 23:25:13.9294 ereport.io.scsi.cmd.disk.dev.rqs.merr.write
Jan 25 23:25:13.9301 ereport.io.scsi.cmd.disk.dev.rqs.merr.write
Jan 25 23:25:13.9306 ereport.io.scsi.cmd.disk.dev.rqs.merr.write
Jan 25 23:50:44.7195 ereport.io.scsi.cmd.disk.dev.rqs.derr
Jan 25 23:50:44.7306 ereport.io.scsi.cmd.disk.dev.rqs.derr
Jan 25 23:50:44.7434 ereport.io.scsi.cmd.disk.dev.rqs.derr
Jan 25 23:53:31.4386 ereport.io.scsi.cmd.disk.dev.rqs.derr
Jan 25 23:53:31.4579 ereport.io.scsi.cmd.disk.dev.rqs.derr
Jan 25 23:53:31.4710 ereport.io.scsi.cmd.disk.dev.rqs.derr
These seem to be media errors and disk errors on the zpools/zvols that
back the LUNs for this zpool… I am wondering why this happens.
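
The next step is probably to look at the verbose ereports and the
per-device error counters, which should show exactly which LUN each event
refers to (plain fmdump/iostat options, nothing exotic):

  fmdump -eV | more   # verbose ereports: driver, device path and sense data per event
  iostat -En          # per-device soft/hard/transport error counters, as Richard suggested
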
Stephan