[OmniOS-discuss] zfs recv assertion failed when scrubbing source pool

Thu Oct 8 07:59:20 UTC 2015

We're sending nightly incremental replication snapshots of a large
filesystem tree (about 3900 filesystems) to a backup host. It's been
working mostly okay - we scrub the source pool every month and that
hasn't had any effect on the sends/receives. However, on Sep 21, I
upgraded the backup host from entire@11-0.151014:20150402T192159Z to
entire@11-0.151014:20150914T123242Z, and during the zpool scrub on the
source host at the start of October we got this:

    Assertion failed: ilen <= SPA_MAXBLOCKSIZE, file ../common/libzfs_sendrecv.c, line 1706, function recv_read

It seemed to be a transient failure, as I was at first unable to reproduce
it, but firing off another scrub on the source pool did cause it to happen
again the following night, while the scrub was still running. I further
upgraded the backup host to the Sep 29 151014 update (which apparently
didn't bump the 'entire' version), and it's still happening. The source
host is currently in production and still running omnios-170cea2 (or
entire@11-0.151014:20150402T192159Z); we're scheduled to upgrade it next
Monday. It had a cache device up until Dan's recent advice to remove it;
I suspected maybe we'd been hit by corruption, but that doesn't explain
why the assertion happens only when the source pool is scrubbing.

We use this kind of command to send snapshots to the backup host:

    zfs send -R -i $yesterday ${filesystem}@today | ssh backuphost zfs recv -ud $targetfs
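
To correlate failures with scrubs, a wrapper along these lines could log the
scrub state and both exit codes for each run. This is only a sketch, not our
actual script: the function names (scrub_in_progress, run_replication) are
made up for illustration, the pool name is assumed to be the first component
of the filesystem name, and it relies on `zpool status` printing "scrub in
progress" on its scan line while a scrub is running.

```shell
#!/usr/bin/env bash
# Sketch only: function names and arguments are hypothetical.

# Reads `zpool status` output on stdin; succeeds if a scrub is running.
scrub_in_progress() {
    grep -q 'scrub in progress'
}

# Runs the send/recv pipeline from this post and prints one log line with
# the scrub state and the exit status of each side of the pipe.
# Args: previous snapshot, source filesystem, backup host, target filesystem.
run_replication() {
    local prev=$1 fs=$2 host=$3 target=$4
    local pool=${fs%%/*}   # assumption: pool name is the first path component
    local state=idle
    local -a rc

    if zpool status "$pool" | scrub_in_progress; then
        state=scrubbing
    fi

    zfs send -R -i "$prev" "${fs}@today" | ssh "$host" zfs recv -ud "$target"
    rc=("${PIPESTATUS[@]}")   # must be captured immediately after the pipeline

    echo "$(date -u '+%Y-%m-%dT%H:%M:%SZ') fs=$fs scrub=$state send=${rc[0]} recv=${rc[1]}"
    # Report failure if either side of the pipeline failed.
    [ "${rc[0]}" -eq 0 ] && [ "${rc[1]}" -eq 0 ]
}
```

Called as `run_replication "$yesterday" "$filesystem" backuphost "$targetfs"`,
the log would show whether nonzero recv statuses only ever coincide with
scrub=scrubbing.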

We're not running either send or recv as root, opting to use delegations
instead. Don't know if that's relevant or not.
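
For reference, a delegated setup of this kind typically looks something like
the following. This is a sketch under assumptions, not our exact
configuration: the user and dataset names are placeholders, the helper
function is hypothetical, and receiving a -R stream that carries properties
may need further per-property permissions beyond the ones listed. The helper
only prints the commands so they can be reviewed before being run on each
host.

```shell
#!/usr/bin/env bash
# Sketch only: prints (does not run) `zfs allow` commands.
# The function name and all arguments are hypothetical placeholders.
delegation_cmds() {
    local user=$1 src=$2 dst=$3
    # Source host: permissions commonly delegated for snapshot + send.
    printf 'zfs allow -u %s send,snapshot,hold %s\n' "$user" "$src"
    # Backup host: receive also requires the create and mount permissions.
    printf 'zfs allow -u %s receive,create,mount %s\n' "$user" "$dst"
}
```

Running `zfs allow <dataset>` with no other arguments on either host prints
the delegations currently in effect, which makes it easy to compare against
the above.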

Any clues?

-- 
Lauri Tirkkonen | lotheac @ IRCnet