[OmniOS-discuss] ZFS/COMSTAR - zpool reports errors

Stephan Budach stephan.budach at JVM.DE
Wed Aug 12 17:13:12 UTC 2015


On 12.08.15 at 17:19, Michael Rasmussen wrote:
> On Wed, 12 Aug 2015 16:50:49 +0200
> Stephan Budach <stephan.budach at JVM.DE> wrote:
>
>> Ahh… that was too soon… ;) Actually, one of the RAC nodes noticed an
>> error at, or rather a couple of minutes before, this issue, when it
>> reported this:
>>
>> Aug 11 20:25:04 btierasm01 kernel: sd 75:0:0:1: [sdx] Unhandled sense code
>> Aug 11 20:25:04 btierasm01 kernel: sd 75:0:0:1: [sdx] Result:
>> hostbyte=invalid driverbyte=DRIVER_SENSE
>> Aug 11 20:25:04 btierasm01 kernel: sd 75:0:0:1: [sdx] Sense Key : Medium
>> Error [current]
>> Aug 11 20:25:04 btierasm01 kernel: sd 75:0:0:1: [sdx]  Add. Sense:
>> Unrecovered read error
>> Aug 11 20:25:04 btierasm01 kernel: sd 75:0:0:1: [sdx] CDB: Read(10): 28
>> 00 60 b2 54 28 00 00 80 00
>> Aug 11 20:25:04 btierasm01 kernel: end_request: critical target error,
>> dev sdx, sector 1622299688
>> Aug 11 20:25:04 btierasm01 kernel: end_request: critical target error,
>> dev dm-25, sector 1622299688
>> Aug 11 20:25:04 btierasm01 kernel: ADVMK-0020: A read error was reported
>> to the ASM instance for volume vg_nfs07fwd-16 in diskgroup DG_NFS07SA
>>
>> This vg_nfs07fwd-16 is a RAC volume, which is presented via NFS from the
>> RAC cluster nodes to some Oracle VM hosts, but neither of those hosts
>> had any issues with that volume at any time, so I assume the request
>> came from the RAC node itself, and I will dig into the logs to see what
>> it actually tried to do with the volume.
>>
>> I am still wondering if this issue is somewhat related to COMSTAR or the
>> zpool itself.
>>
> I wonder whether this is a hardware issue (e.g. drive firmware). What if
> the firmware has marked a sector bad and moved it elsewhere? Could
> this move have taken place unnoticed by ZFS?
>
I think if the drive's firmware remaps a sector, it would go 
unnoticed by ZFS, but then it shouldn't have any effect on the zpool 
itself, since in order to remap a sector, the drive has to be able to 
read it successfully in the end, no? Otherwise the zpool would have to 
encounter some read errors, which should show up in fmdump… but then, maybe not.
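As a quick cross-check of the quoted kernel messages: in a SCSI Read(10) CDB, bytes 2-5 carry a big-endian 32-bit LBA and bytes 7-8 the transfer length, so the CDB from the log can be decoded by hand and compared against the reported sector:

```shell
# CDB from the log: 28 00 60 b2 54 28 00 00 80 00
#                   op fl [--- LBA ---] gr [len] ctl
printf 'LBA: %d\n' 0x60b25428      # matches "sector 1622299688" in end_request
printf 'blocks: %d\n' 0x0080       # 128 blocks = 64 KiB at 512-byte sectors
```

So the unrecovered read error and the end_request line refer to the same spot on sdx, as expected.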

The drives are all HGST HUS72404-A3B0, connected to an LSI 9207-8i with 
FW 19.

The other question then is: how can this error be cleared, given that a scrub 
doesn't turn up any faulty data?
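For what it's worth, the usual sequence on the OmniOS side would be something like the following (the pool name "tank" is a placeholder for the actual pool):

```shell
# Inspect the FMA error telemetry for ZFS/SCSI ereports
fmdump -eV | less

# Show per-vdev READ/WRITE/CKSUM counters and any files flagged as damaged
zpool status -v tank

# Re-verify every block's checksum, then watch progress via zpool status
zpool scrub tank

# Once the cause is understood, reset the error counters
zpool clear tank
```

Note that `zpool clear` only resets the counters; if the underlying device keeps throwing medium errors, they will come back on the next read or scrub.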
