[OmniOS-discuss] core dump while trying to import pool

Sun Dec 6 06:49:59 UTC 2015

I did not run a zdb check, since this pool was over 200TB and I figured it would take weeks to finish. Maybe more, maybe not? I just planned for the worst-case scenario before the reboot, and I'm sure glad I did.

The pool was scrubbed several times between the time the l2arc devices were removed and the reboot, and every scrub reported no errors. The problem surfaces (at least in my case) when a particular volume tries to mount as rw. After lots of googling, I found a few other reports with the same backtrace whose authors said they were able to work around a similar issue by mounting the volumes read-only first and then, once they were mounted, updating the mount to rw. I didn't try that, but maybe it would have worked? If so, it sounded like that would only have been a temporary fix until the next reboot...
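For anyone who wants to try the read-only-first approach described in those reports, here is a rough sketch of the command sequence as I understand it. I have not tested it against this bug. The pool name "tank" and dataset name "tank/vol" are made-up placeholders, and the helper names are mine. It only prints the commands unless dry_run is turned off.

```python
import subprocess


def readonly_first_commands(pool, dataset):
    """Build the command sequence for the read-only-first workaround.

    Assumption: the pool is imported without mounting anything (-N),
    the dataset is then mounted read-only, and finally remounted rw.
    """
    return [
        ["zpool", "import", "-N", pool],                 # import, mount nothing
        ["zfs", "mount", "-o", "ro", dataset],           # temporary read-only mount
        ["zfs", "mount", "-o", "remount,rw", dataset],   # switch the mount to rw
    ]


def run(commands, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for cmd in commands:
        print(" ".join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # "tank" and "tank/vol" are hypothetical names, not from my system.
    run(readonly_first_commands("tank", "tank/vol"))
```

Leaving dry_run at its default lets you review the exact commands before anything touches the pool.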

At any rate, a clean scrub alone is not an indicator of pool health regarding this bug. No clue if a zdb analysis would be a more conclusive test. My personal advice is to plan for the worst and hope for the best, with backups on hand. Better to plan for it than to let a fluke bug or power incident reveal it unexpectedly.

Since I didn't zdb it first, maybe your nerves can be a bit more at ease? Good luck, and let me know how things turn out.

Michael

> On Dec 5, 2015, at 6:17 PM, Paul B. Henson <henson at acm.org> wrote:
> 
>> On Fri, Dec 04, 2015 at 11:33:09AM -0800, Michael Talbott wrote:
>> I also came upon this same issue after rebooting one of my OmniOS machines.
>> I did have l2arc devices on my pool until the announcement of the bug
>> found. At that point I immediately removed my l2arc devices and didn't
>> reboot the machine until a convenient time where if something bad were to
>> happen I could manage it. Well, it was good I planned for that reboot ;)
> 
> Hmm, out of curiosity, did you run a scrub and a zdb analysis of your
> pool before you rebooted? I'm in a similar boat, I have a pool which had
> L2ARC devices and might have been impacted by the bug. I removed the
> devices, ran a scrub and zdb, with no complaints from either, which left
> me reasonably hopeful the pool wasn't corrupted 8-/. I still haven't
> rebooted it though, there's really no good time for a pool to go belly
> up and potentially be unrecoverable :(. I was planning to do it over
> Christmas break, but if you scrubbed and zdb'd your pool successfully
> before rebooting and it still died that's gonna make me (extra) nervous
> <sigh>.
> 
> Thanks...