[OmniOS-discuss] Testing RSF-1 with zpool/nfs HA
Stephan Budach
stephan.budach at JVM.DE
Fri Feb 19 06:10:56 UTC 2016
On 18.02.16 at 22:56, Richard Elling wrote:
> comments below...
>
>> On Feb 18, 2016, at 12:57 PM, Schweiss, Chip <chip at innovates.com
>> <mailto:chip at innovates.com>> wrote:
>>
>>
>>
>> On Thu, Feb 18, 2016 at 5:14 AM, Michael Rasmussen<mir at miras.org
>> <mailto:mir at miras.org>>wrote:
>>
>> On Thu, 18 Feb 2016 07:13:36 +0100
>> Stephan Budach <stephan.budach at JVM.DE
>> <mailto:stephan.budach at JVM.DE>> wrote:
>>
>> >
>> > So, when I issue a simple ls -l on the folder of the vdisks,
>> > while the switchover is happening, the command sometimes concludes
>> > in 18 to 20 seconds, but sometimes ls will just sit there for minutes.
>> >
>> This is a known limitation in NFS. NFS was never intended to be
>> clustered, so what you experience is that the NFS process on the
>> client side keeps kernel locks for the now-unavailable NFS server,
>> and any request to the process hangs waiting for those locks to be
>> resolved. This can be compared to a situation where you hot-swap a
>> drive in the pool without notifying the pool.
>>
>> The only way to resolve this is to forcefully kill all NFS client
>> processes and then restart the NFS client.
>>
>
> ugh. No, something else is wrong. I've been running such clusters for
> almost 20 years,
> it isn't a problem with the NFS server code.
>
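[For completeness: if a client really does get wedged on a stale mount, the
usual recovery on an illumos/Solaris client would look roughly like the
sketch below. The mount point is a placeholder and the SMF service names
are from memory, so please verify them locally - and, as Richard says, a
healthy failover should not require any of this.]

    # list (and, with -k, SIGKILL) everything using the stale mount
    fuser -c /vdisks
    fuser -ck /vdisks
    # restart the NFS client-side services
    svcadm restart network/nfs/nlockmgr
    svcadm restart network/nfs/client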
>>
>>
>> I've been running RSF-1 on OmniOS since about r151008. All my
>> clients have always been NFSv3 and NFSv4.
>>
>> My memory is a bit fuzzy, but when I first started testing RSF-1,
>> OmniOS still had the Sun lock manager, which was later replaced with
>> the BSD lock manager. That replacement has had many difficulties.
>>
>> I do remember that failovers when I first started with RSF-1 never
>> had these stalls; I believe this was because the lock state was
>> stored in the pool and the server taking over the pool would inherit
>> that state too. That state is now lost when a pool is imported with
>> the BSD lock manager.
>>
>> When I did testing, I would do full-speed reading and writing to
>> the pool and force failovers, both from the command line and by
>> killing power on the active server. Never did a failover take more
>> than about 30 seconds for NFS to fully resume data flow.
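[For reference, this is roughly how I observe the stalls on my side - a
crude probe that repeatedly lists the vdisk folder while the switchover
runs; the path is just a placeholder for my test mount.]

    while true; do
        date '+%H:%M:%S'
        # how long does one directory listing take while the switchover runs?
        time ls -l /vdisks > /dev/null
        sleep 1
    done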
>
> Clients will back off, but the back-off algorithm is not universal, so
> we do expect to see different retry intervals for different clients. For
> example, the retries can exceed 30 seconds for Solaris clients after a
> minute or two (alas, I don't have the detailed data at my fingertips
> anymore :-( ). Hence we work hard to make sure failovers occur as fast
> as feasible.
>
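[On the client side the retry behaviour can at least be made explicit in
the mount options. A minimal example for a Solaris-ish client - the export
name is made up and the values are illustrative, not a recommendation:]

    # timeo is the initial timeout in tenths of a second, retrans the
    # retry count before the "NFS server not responding" message;
    # 'hard' keeps retrying until the server comes back
    mount -F nfs -o vers=3,hard,timeo=30,retrans=5 nfssrv:/tank/vdisks /vdisks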
>>
>> Others who know more about the BSD lock manager vs the old Sun lock
>> manager may be able to tell us more. I'd also be curious if Nexenta
>> has addressed this.
>
> The lock manager itself is an issue, and though we're currently testing
> the BSD lock manager in anger, we haven't seen this behaviour.
>
> Related to the lock manager is name lookup. If you use name services,
> you add a latency dependency to failover for name lookups, which is why
> we often disable DNS or other network name services on high-availability
> services as a best practice.
> -- richard
This is why I always put each host name involved in my cluster setups
into /etc/hosts on each node.
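Something along these lines on every node (the addresses and names below
are just placeholders for my setup), plus making sure nsswitch.conf
consults files before DNS:

    # /etc/hosts
    192.168.10.11   nfs-node-a
    192.168.10.12   nfs-node-b
    192.168.10.20   nfs-vip     # floating service address used by the clients

    # /etc/nsswitch.conf
    hosts: files dns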
Cheers,
Stephan