[OmniOS-discuss] ZFS/COMSTAR - zpool reports errors
Stephan Budach
stephan.budach at JVM.DE
Wed Aug 12 14:50:49 UTC 2015
On 12.08.15 at 16:04, Stephan Budach wrote:
> Hi everyone,
>
> yesterday I was alerted that one of my zpools was reporting an
> uncorrectable error. When I checked, I was presented with a rather
> generic error on one of my iSCSI zvols:
>
> root@nfsvmpool08:/root# zpool status -v sataTank
>   pool: sataTank
>  state: ONLINE
> status: One or more devices has experienced an error resulting in data
>         corruption. Applications may be affected.
> action: Restore the file in question if possible. Otherwise restore the
>         entire pool from backup.
>    see: http://illumos.org/msg/ZFS-8000-8A
>   scan: scrub repaired 0 in 6h2m with 0 errors on Wed Aug 12 05:08:51 2015
> config:
>
>         NAME                         STATE     READ WRITE CKSUM
>         sataTank                     ONLINE       0     0     0
>           mirror-0                   ONLINE       0     0     0
>             c1t5000CCA22BC4ACEDd0    ONLINE       0     0     0
>             c1t5000CCA22BC51C04d0    ONLINE       0     0     0
>           mirror-1                   ONLINE       0     0     0
>             c1t5000CCA22BC4896Dd0    ONLINE       0     0     0
>             c1t5000CCA22BC4B18Ed0    ONLINE       0     0     0
>           mirror-2                   ONLINE       0     0     0
>             c1t5000CCA22BC4AFFBd0    ONLINE       0     0     0
>             c1t5000CCA22BC5135Ed0    ONLINE       0     0     0
>         logs
>           mirror-3                   ONLINE       0     0     0
>             c1t50015179596C598Ed0p2  ONLINE       0     0     0
>             c1t50015179596B0A1Fd0p2  ONLINE       0     0     0
>         cache
>           c1t5001517959680E33d0      ONLINE       0     0     0
>           c1t50015179596B0A1Fd0p3    ONLINE       0     0     0
>
> errors: Permanent errors have been detected in the following files:
>
>         sataTank/nfsvmpool08-sata-03:<0x1>
> root@nfsvmpool08:/root#
>
> As you can see, I ran a scrub, but it didn't find any issue with any
> of the data in the pool. Checking fmdump also revealed nothing, so I
> wonder what I should do about this. I recall that a topic like this
> was discussed before, but I cannot seem to find it anywhere; it must
> have been either on this list or on the OpenIndiana list.
>
> This zvol is actually part of a mirrored RAC volume group, so I went to
> the RAC nodes, but neither of them noticed anything strange either…
>
> So, my main question is: how can I diagnose this further, if possible?
>
> Thanks,
> Stephan
>
Ahh… that was too soon… ;) Actually, one of the RAC nodes did notice an
error at, or rather a couple of minutes before, the time of this issue,
when it reported this:
Aug 11 20:25:04 btierasm01 kernel: sd 75:0:0:1: [sdx] Unhandled sense code
Aug 11 20:25:04 btierasm01 kernel: sd 75:0:0:1: [sdx] Result: hostbyte=invalid driverbyte=DRIVER_SENSE
Aug 11 20:25:04 btierasm01 kernel: sd 75:0:0:1: [sdx] Sense Key : Medium Error [current]
Aug 11 20:25:04 btierasm01 kernel: sd 75:0:0:1: [sdx] Add. Sense: Unrecovered read error
Aug 11 20:25:04 btierasm01 kernel: sd 75:0:0:1: [sdx] CDB: Read(10): 28 00 60 b2 54 28 00 00 80 00
Aug 11 20:25:04 btierasm01 kernel: end_request: critical target error, dev sdx, sector 1622299688
Aug 11 20:25:04 btierasm01 kernel: end_request: critical target error, dev dm-25, sector 1622299688
Aug 11 20:25:04 btierasm01 kernel: ADVMK-0020: A read error was reported to the ASM instance for volume vg_nfs07fwd-16 in diskgroup DG_NFS07SA
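For reference, that Read(10) CDB decodes as follows (assuming 512-byte
logical blocks on the LUN): opcode 0x28, LBA 0x60b25428 = 1622299688,
transfer length 0x0080 = 128 blocks. That matches the sector in the
end_request lines, i.e. a 64 KiB read starting at byte offset
1622299688 * 512 = 830617440256, roughly 773.6 GiB into the LUN.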
This vg_nfs07fwd-16 is a RAC volume that is presented via NFS from the
RAC cluster nodes to some Oracle VM hosts, but none of those hosts had
any issues with that volume at any time, so I assume the read request
came from the RAC node itself; I will dig into its logs to see what it
actually tried to do with the volume.
I am still wondering whether this issue is related to COMSTAR or to the
zpool itself.
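Meanwhile, here is a rough sketch of what I plan to check next on the
OmniOS side (standard illumos/COMSTAR commands; the dataset name and the
<0x1> object are the ones from the zpool status output above):

  fmdump -eV
      the FMA error (ereport) log; plain fmdump only shows faults, so
      any zfs checksum/io ereports from around Aug 11 20:25 may still
      be in there
  zdb -dddd sataTank/nfsvmpool08-sata-03 1
      dump the flagged object; as far as I know, the <0x1> is the
      object number in hex, and since a zvol has no file paths there
      is no name for zpool status to print
  stmfadm list-lu -v  (or sbdadm list-lu)
      confirm which COMSTAR LU is backed by that zvol
  iostat -En
      per-device error counters on the OmniOS box
  zpool clear sataTank, followed by another scrub
      if I understand the persistent error log correctly, the entry is
      only dropped after a clear and a subsequently completed scrub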
Thanks,
Stephan