[OmniOS-discuss] Testing RSF-1 with zpool/nfs HA
Stephan Budach
stephan.budach at JVM.DE
Fri Feb 19 06:10:56 UTC 2016
On 18.02.16 at 22:56, Richard Elling wrote:
> comments below...
>
>> On Feb 18, 2016, at 12:57 PM, Schweiss, Chip <chip at innovates.com
>> <mailto:chip at innovates.com>> wrote:
>>
>>
>>
>> On Thu, Feb 18, 2016 at 5:14 AM, Michael Rasmussen<mir at miras.org
>> <mailto:mir at miras.org>>wrote:
>>
>> On Thu, 18 Feb 2016 07:13:36 +0100
>> Stephan Budach <stephan.budach at JVM.DE
>> <mailto:stephan.budach at JVM.DE>> wrote:
>>
>> >
>> > So, when I issue a simple ls -l on the folder of the vdisks,
>> > while the switchover is happening, the command sometimes concludes
>> > in 18 to 20 seconds, but sometimes ls will just sit there for minutes.
>> >
>> This is a known limitation in NFS. NFS was never intended to be
>> clustered, so what you experience is that the NFS process on the
>> client side keeps kernel locks for the now-unavailable NFS server,
>> and any request to the process hangs waiting for those locks to be
>> resolved. This can be compared to a situation where you hot-swap a
>> drive in the pool without notifying the pool.
>>
>> The only way to resolve this is to forcefully kill all NFS client
>> processes and then restart the NFS client.
>>
>
> ugh. No, something else is wrong. I've been running such clusters for
> almost 20 years,
> it isn't a problem with the NFS server code.
>
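[For completeness: if a client really does get wedged on a stale mount, the
usual recovery on an illumos/Solaris client would look roughly like the
sketch below. The mount point is a placeholder and the SMF service names
are from memory, so please verify them locally - and, as Richard says, a
healthy failover should not require any of this.]

    # list (and, with -k, SIGKILL) everything using the stale mount
    fuser -c /vdisks
    fuser -ck /vdisks
    # restart the NFS client-side services
    svcadm restart network/nfs/nlockmgr
    svcadm restart network/nfs/client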
>>
>>
>> I've been running RSF-1 on OmniOS since about r151008. All my
>> clients have always been NFSv3 and NFSv4.
>>
>> My memory is a bit fuzzy, but when I first started testing RSF-1,
>> OmniOS still had the Sun lock manager, which was later replaced with
>> the BSD lock manager. That replacement has had many difficulties.
>>
>> I do remember that failovers when I first started with RSF-1 never
>> had these stalls; I believe this was because the lock state was
>> stored in the pool and the server taking over the pool would inherit
>> that state too. That state is now lost when a pool is imported with
>> the BSD lock manager.
>>
>> When I did testing, I would do full-speed reading and writing to
>> the pool and force failovers, both from the command line and by
>> killing power on the active server. Never did a failover take more
>> than about 30 seconds for NFS to fully resume data flow.
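[For reference, this is roughly how I observe the stalls on my side - a
crude probe that repeatedly lists the vdisk folder while the switchover
runs; the path is just a placeholder for my test mount.]

    while true; do
        date '+%H:%M:%S'
        # how long does one directory listing take while the switchover runs?
        time ls -l /vdisks > /dev/null
        sleep 1
    done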
>
> Clients will back off, but the back-off algorithm is not universal, so
> we do expect to see different retry intervals for different clients. For
> example, the retries can exceed 30 seconds for Solaris clients after a
> minute or two (alas, I don't have the detailed data at my fingertips
> anymore :-( ). Hence we work hard to make sure failovers occur as fast
> as feasible.
>
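[On the client side the retry behaviour can at least be made explicit in
the mount options. A minimal example for a Solaris-ish client - the export
name is made up and the values are illustrative, not a recommendation:]

    # timeo is the initial timeout in tenths of a second, retrans the
    # retry count before the "NFS server not responding" message;
    # 'hard' keeps retrying until the server comes back
    mount -F nfs -o vers=3,hard,timeo=30,retrans=5 nfssrv:/tank/vdisks /vdisks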
>>
>> Others who know more about the BSD lock manager vs the old Sun lock
>> manager may be able to tell us more. I'd also be curious if Nexenta
>> has addressed this.
>
> The lock manager itself is an issue, and though we're currently testing
> the BSD lock manager in anger, we haven't seen this behaviour.
>
> Related to the lock manager is name lookup. If you use name services,
> you add a latency dependency to failover for name lookups, which is why
> we often disable DNS or other network name services on high-availability
> services as a best practice.
> -- richard
This is why I always put each host name involved in my cluster setups
into /etc/hosts on each node.
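Something along these lines on every node (the addresses and names below
are just placeholders for my setup), plus making sure nsswitch.conf
consults files before DNS:

    # /etc/hosts
    192.168.10.11   nfs-node-a
    192.168.10.12   nfs-node-b
    192.168.10.20   nfs-vip     # floating service address used by the clients

    # /etc/nsswitch.conf
    hosts: files dns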
Cheers,
Stephan