[OmniOS-discuss] issue importing zpool on S11.1 from omniOS LUNs
Stephan Budach
stephan.budach at jvm.de
Wed Jan 25 23:01:14 UTC 2017
Oops… I should have waited to send that message until after I had rebooted
the S11.1 host…
Am 25.01.17 um 23:41 schrieb Stephan Budach:
> Hi Richard,
>
> Am 25.01.17 um 20:27 schrieb Richard Elling:
>> Hi Stephan,
>>
>>> On Jan 25, 2017, at 5:54 AM, Stephan Budach <stephan.budach at JVM.DE
>>> <mailto:stephan.budach at jvm.de>> wrote:
>>>
>>> Hi guys,
>>>
>>> I have been trying to import a zpool built on a 3-way mirror of LUNs
>>> provided by three omniOS boxes via iSCSI. This zpool had been working
>>> flawlessly until some random reboot of the S11.1 host; since then,
>>> S11.1 has been unable to import it.
>>>
>>> The zpool consists of three 108TB LUNs, each backed by a zvol on a
>>> raidz2 pool… yeah, I know, we shouldn't have done that in the first
>>> place, but performance was not the primary goal here, as this one is a
>>> backup/archive pool.
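>>>
>>> For context, each LUN is simply a zvol exported via COMSTAR from its
>>> omniOS box, created roughly along these lines (the pool/zvol names and
>>> the LU GUID are placeholders, only the size matches the real setup):
>>>
>>>   zfs create -s -V 108T tank/vsmPool10-lun     # sparse zvol on the raidz2 pool
>>>   stmfadm create-lu /dev/zvol/rdsk/tank/vsmPool10-lun
>>>   stmfadm add-view <LU-GUID-from-create-lu>    # export the LU to all hosts
>>>   itadm create-target                          # iSCSI target the S11.1 host logs in to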
>>>
>>> When issuing a zpool import, it reports this:
>>>
>>> root at solaris11atest2:~# zpool import
>>>    pool: vsmPool10
>>>      id: 12653649504720395171
>>>   state: DEGRADED
>>>  status: The pool was last accessed by another system.
>>>  action: The pool can be imported despite missing or damaged devices. The
>>>          fault tolerance of the pool may be compromised if imported.
>>>     see: http://support.oracle.com/msg/ZFS-8000-EY
>>>  config:
>>>
>>>         vsmPool10                                  DEGRADED
>>>           mirror-0                                 DEGRADED
>>>             c0t600144F07A3506580000569398F60001d0  DEGRADED  corrupted data
>>>             c0t600144F07A35066C00005693A0D90001d0  DEGRADED  corrupted data
>>>             c0t600144F07A35001A00005693A2810001d0  DEGRADED  corrupted data
>>>
>>> device details:
>>>
>>>         c0t600144F07A3506580000569398F60001d0  DEGRADED  scrub/resilver needed
>>>         status: ZFS detected errors on this device.
>>>                 The device is missing some data that is recoverable.
>>>
>>>         c0t600144F07A35066C00005693A0D90001d0  DEGRADED  scrub/resilver needed
>>>         status: ZFS detected errors on this device.
>>>                 The device is missing some data that is recoverable.
>>>
>>>         c0t600144F07A35001A00005693A2810001d0  DEGRADED  scrub/resilver needed
>>>         status: ZFS detected errors on this device.
>>>                 The device is missing some data that is recoverable.
>>>
>>> However, when actually running zpool import -f vsmPool10, the system
>>> starts to perform a lot of writes on the LUNs, and iostat reports an
>>> alarming increase in h/w errors:
>>>
>>> root at solaris11atest2:~# iostat -xeM 5
>>>                   extended device statistics        ---- errors ----
>>> device    r/s    w/s  Mr/s  Mw/s wait actv  svc_t  %w  %b  s/w  h/w  trn  tot
>>> sd0       0.0    0.0   0.0   0.0  0.0  0.0    0.0   0   0    0    0    0    0
>>> sd1       0.0    0.0   0.0   0.0  0.0  0.0    0.0   0   0    0    0    0    0
>>> sd2       0.0    0.0   0.0   0.0  0.0  0.0    0.0   0   0    0   71    0   71
>>> sd3       0.0    0.0   0.0   0.0  0.0  0.0    0.0   0   0    0    0    0    0
>>> sd4       0.0    0.0   0.0   0.0  0.0  0.0    0.0   0   0    0    0    0    0
>>> sd5       0.0    0.0   0.0   0.0  0.0  0.0    0.0   0   0    0    0    0    0
>>>                   extended device statistics        ---- errors ----
>>> device    r/s    w/s  Mr/s  Mw/s wait actv  svc_t  %w  %b  s/w  h/w  trn  tot
>>> sd0      14.2  147.3   0.7   0.4  0.2  0.1    2.0   6   9    0    0    0    0
>>> sd1      14.2    8.4   0.4   0.0  0.0  0.0    0.3   0   0    0    0    0    0
>>> sd2       0.0    4.2   0.0   0.0  0.0  0.0    0.0   0   0    0   92    0   92
>>> sd3     157.3   46.2   2.1   0.2  0.0  0.7    3.7   0  14    0   30    0   30
>>> sd4     123.9   29.4   1.6   0.1  0.0  1.7   10.9   0  36    0   40    0   40
>>> sd5     142.5   43.0   2.0   0.1  0.0  1.9   10.2   0  45    0   88    0   88
>>>                   extended device statistics        ---- errors ----
>>> device    r/s    w/s  Mr/s  Mw/s wait actv  svc_t  %w  %b  s/w  h/w  trn  tot
>>> sd0       0.0  234.5   0.0   0.6  0.2  0.1    1.4   6  10    0    0    0    0
>>> sd1       0.0    0.0   0.0   0.0  0.0  0.0    0.0   0   0    0    0    0    0
>>> sd2       0.0    0.0   0.0   0.0  0.0  0.0    0.0   0   0    0   92    0   92
>>> sd3       3.6   64.0   0.0   0.5  0.0  4.3   63.2   0  63    0  235    0  235
>>> sd4       3.0   67.0   0.0   0.6  0.0  4.2   60.5   0  68    0  298    0  298
>>> sd5       4.2   59.6   0.0   0.4  0.0  5.2   81.0   0  72    0  406    0  406
>>>                   extended device statistics        ---- errors ----
>>> device    r/s    w/s  Mr/s  Mw/s wait actv  svc_t  %w  %b  s/w  h/w  trn  tot
>>> sd0       0.0  234.8   0.0   0.7  0.4  0.1    2.2  11  10    0    0    0    0
>>> sd1       0.0    0.0   0.0   0.0  0.0  0.0    0.0   0   0    0    0    0    0
>>> sd2       0.0    0.0   0.0   0.0  0.0  0.0    0.0   0   0    0   92    0   92
>>> sd3       5.4   54.4   0.0   0.3  0.0  2.9   48.5   0  67    0  384    0  384
>>> sd4       6.0   53.4   0.0   0.3  0.0  4.6   77.7   0  87    0  519    0  519
>>> sd5       6.0   60.8   0.0   0.3  0.0  4.8   72.5   0  87    0  727    0  727
>>
>> h/w errors are a summary classification of other, more specific errors.
>> The full error list is available from "iostat -E" and will be important
>> for tracking this down.
>>
>> A better, more detailed analysis can be gleaned from the "fmdump -e"
>> ereports that should be associated with each h/w error. However, there
>> are dozens of possible causes, so we don't have enough info here yet to
>> fully understand what is going on.
>> — richard
>>
> Well… I can't provide the output of fmdump -e right now, since I am
> currently unable to get the '-' typed into the console due to some fancy
> keyboard-layout issue, and I'm not able to log in via ssh either (I can
> authenticate, but I never get a shell, which may be due to the running
> zpool import). I can confirm, though, that plain fmdump shows nothing at
> all. I could just reset the S11.1 host after removing the zpool.cache
> file, so that the system will not try to import the zpool right away
> upon restart…
>
> …plus I might get the chance to set the keyboard layout right after the
> reboot, but that's another issue…
>
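Regarding the zpool.cache removal mentioned above: without the cache file
the host will not try to re-import the pool on its own at boot. Roughly,
assuming the standard Solaris location, and moving the file aside rather
than deleting it:

  mv /etc/zfs/zpool.cache /etc/zfs/zpool.cache.bak   # no cached pools, so no auto-import at boot
  reboot
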
After resetting the S11.1 host and getting the keyboard layout right, I
issued a fmdump -e and there they are… lots of:
Jan 25 23:25:13.5643 ereport.io.scsi.cmd.disk.dev.rqs.merr.write
Jan 25 23:25:13.8944 ereport.io.scsi.cmd.disk.dev.rqs.merr.write
Jan 25 23:25:13.8945 ereport.io.scsi.cmd.disk.dev.rqs.merr.write
Jan 25 23:25:13.8946 ereport.io.scsi.cmd.disk.dev.rqs.merr.write
Jan 25 23:25:13.9274 ereport.io.scsi.cmd.disk.dev.rqs.merr.write
Jan 25 23:25:13.9275 ereport.io.scsi.cmd.disk.dev.rqs.merr.write
Jan 25 23:25:13.9276 ereport.io.scsi.cmd.disk.dev.rqs.merr.write
Jan 25 23:25:13.9277 ereport.io.scsi.cmd.disk.dev.rqs.merr.write
Jan 25 23:25:13.9282 ereport.io.scsi.cmd.disk.dev.rqs.merr.write
Jan 25 23:25:13.9284 ereport.io.scsi.cmd.disk.dev.rqs.merr.write
Jan 25 23:25:13.9285 ereport.io.scsi.cmd.disk.dev.rqs.merr.write
Jan 25 23:25:13.9286 ereport.io.scsi.cmd.disk.dev.rqs.merr.write
Jan 25 23:25:13.9287 ereport.io.scsi.cmd.disk.dev.rqs.merr.write
Jan 25 23:25:13.9288 ereport.fs.zfs.dev.merr.write
Jan 25 23:25:13.9290 ereport.io.scsi.cmd.disk.dev.rqs.merr.write
Jan 25 23:25:13.9294 ereport.io.scsi.cmd.disk.dev.rqs.merr.write
Jan 25 23:25:13.9301 ereport.io.scsi.cmd.disk.dev.rqs.merr.write
Jan 25 23:25:13.9306 ereport.io.scsi.cmd.disk.dev.rqs.merr.write
Jan 25 23:50:44.7195 ereport.io.scsi.cmd.disk.dev.rqs.derr
Jan 25 23:50:44.7306 ereport.io.scsi.cmd.disk.dev.rqs.derr
Jan 25 23:50:44.7434 ereport.io.scsi.cmd.disk.dev.rqs.derr
Jan 25 23:53:31.4386 ereport.io.scsi.cmd.disk.dev.rqs.derr
Jan 25 23:53:31.4579 ereport.io.scsi.cmd.disk.dev.rqs.derr
Jan 25 23:53:31.4710 ereport.io.scsi.cmd.disk.dev.rqs.derr
These seem to be media errors and disk errors on the zpools/zvols that
back the LUNs for this zpool… I am wondering why this happens.
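
The next step is probably to look at the verbose ereports and the
per-device error counters, which should show exactly which LUN each event
refers to (plain fmdump/iostat options, nothing exotic):

  fmdump -eV | more   # verbose ereports: driver, device path and sense data per event
  iostat -En          # per-device soft/hard/transport error counters, as Richard suggested
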
Stephan