[OmniOS-discuss] Testing RSF-1 with zpool/nfs HA

Michael Talbott mtalbott at lji.org
Thu Feb 18 08:01:12 UTC 2016


If that's the case, perhaps you should check whether the NFS ports are open upon failover. If they open just as quickly as the pings resume, then I would blame the NFS lock manager, or NFS in general. Remedying that is beyond my scope, other than forcing a remount client-side or restarting the NFS server service, which I imagine RSF-1 already does? I would be interested to know how RSF-1 handles file locking during failover, if at all.
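
Something along these lines is roughly what I have in mind — untested, and assuming a VIP of 192.168.10.10 and the stock illumos/OmniOS NFS service names:

    # On the client: watch whether NFS (RPC program "nfs" over TCP) answers
    # again as quickly as ping does after the failover.
    while true; do
        date
        rpcinfo -t 192.168.10.10 nfs || echo "NFS not answering yet"
        sleep 1
    done

    # On the OmniOS head that took over: kick the NFS services if they look wedged.
    svcadm restart svc:/network/nfs/server:default
    svcadm restart svc:/network/nfs/nlockmgr:default

If rpcinfo comes back right away but the mount still hangs, that points more at lock recovery than at the network.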

Michael

> On Feb 17, 2016, at 11:30 PM, Stephan Budach <stephan.budach at jvm.de> wrote:
> 
> Hi Michael,
> 
>> Am 18.02.16 um 08:17 schrieb Michael Talbott:
>> While I don't have a setup like you've described, I'm going to take a wild guess and say: check your switches' (and servers') ARP tables. Perhaps the switch isn't updating your VIP address with the other server's MAC address fast enough. Maybe, as part of the failover script, throw a command at your switch to update the ARP entry or clear its ARP table. Another, perhaps simpler, fix / diagnostic would be to have the server ping your router via the VIP interface and address right after the failover, to tickle the switch into updating its MAC table. It's also possible the clients need an ARP flush; see the sketch below.
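>> 
>> Roughly what I mean (untested; the VIP 192.168.10.10, gateway 192.168.10.1 and client NIC eth0 are just placeholders):
>> 
>>     # On the newly active head, right after takeover: generate some traffic
>>     # towards the gateway so the switch relearns which port the VIP lives on.
>>     ping 192.168.10.1
>> 
>>     # On the (Linux) client: drop a possibly stale ARP entry for the VIP.
>>     arp -d 192.168.10.10                     # net-tools style
>>     ip neigh del 192.168.10.10 dev eth0      # iproute2 style
>> 
>>     # If the switch has a scriptable (Cisco-style) CLI, something like:
>>     #   clear arp-cache
>>     #   clear mac address-table dynamic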
>> 
>> If this is the case, another possibility is to have both servers spoof the same MAC address, with only one of them ever up at a time, controlled by the failover script (or bad things will happen).
>> 
>> Just a thought.
>> 
>> Michael
>> Sent from my iPhone
>> 
>>> On Feb 17, 2016, at 10:13 PM, Stephan Budach <stephan.budach at JVM.DE> wrote:
>>> 
>>> Hi,
>>> 
>>> I have been test driving RSF-1 for the last week to accomplish the following:
>>> 
>>> - cluster a zpool that is made up of 8 mirrored vdevs, which are based on 8 x 2 SSD mirrors presented via iSCSI from another OmniOS box
>>> - export an NFS share from the above zpool via a VIP
>>> - have RSF-1 provide the failover and VIP moving
>>> - use the NFS share as a repository for my Oracle VM guests and vdisks
>>> 
>>> The setup seems to work fine, but I do have one issue I can't seem to get solved. Whenever I fail over the zpool, any in-flight NFS data is stalled for some unpredictable time. Sometimes it takes not much longer than the "move" time of the resources, but sometimes it takes up to 5 minutes until the NFS client on my VM server comes alive again.
>>> 
>>> So, when I issue a simple ls -l on the folder of the vdisks while the switchover is happening, the command sometimes concludes in 18 to 20 seconds, but sometimes ls will just sit there for minutes.
>>> 
>>> I wonder if there's anything I could do about that. I have already played with several timeouts, NFS-wise and TCP-wise, but nothing seems to have any effect on this issue. Does anyone know some tricks to speed up the recovery of in-flight data?
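>>> 
>>> For reference, this is the sort of thing I have been trying on the OL6 client so far — not a fix, just to illustrate the knobs (VIP and paths are placeholders):
>>> 
>>>     # NFS mount options on the OL6 client: hard mount, TCP, shorter
>>>     # retransmission timeout (timeo is in tenths of a second)
>>>     mount -t nfs -o vers=3,proto=tcp,hard,timeo=50,retrans=3 \
>>>         192.168.10.10:/tank/ovm /OVS/Repositories/test
>>> 
>>>     # TCP-level retransmission tuning on the client, so dead connections
>>>     # are given up on (and re-established) sooner than the default
>>>     sysctl -w net.ipv4.tcp_retries2=8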
>>> 
>>> Thanks,
>>> Stephan
>>> 
>>> 
> I don't think the switches are the problem: when I ping the VIP from the VM host (OL6 based), the pings only cease for the time it takes RSF-1 to move the services, and afterwards they continue just normally. The only thing I wonder is whether it's more of an NFS issue or a TCP-in-general one. Maybe I should also test some other IP protocol to see if it stalls for that long as well.
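> 
> Something like this is what I have in mind for the next test, run in parallel on the OL6 host while the failover happens (VIP and mount point are placeholders, and it assumes nc is installed):
> 
>     # 1) ICMP to the VIP
>     ping 192.168.10.10
> 
>     # 2) A plain TCP connect to the NFS port, once per second
>     while true; do
>         date +%T
>         nc -z -w 2 192.168.10.10 2049 && echo "tcp/2049 open" || echo "tcp/2049 closed"
>         sleep 1
>     done
> 
>     # 3) NFS itself, via a timed directory listing on the mount
>     while true; do
>         date +%T
>         time ls /OVS/Repositories/test >/dev/null
>         sleep 1
>     done
> 
> Comparing when each of the three recovers should show whether it is the IP layer, the TCP connection, or NFS itself that is holding things up.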
> 
> Cheers,
> Stephan

