[OmniOS-discuss] Testing RSF-1 with zpool/nfs HA

Michael Talbott mtalbott at lji.org
Thu Feb 18 07:17:11 UTC 2016


While I don't have a setup like the one you've described, I'm going to take a wild guess and say check the ARP tables on your switches (and servers). Perhaps the switch isn't updating the entry for your VIP with the other server's MAC address fast enough. As part of the failover script, you could send a command to your switch to update the ARP entry or clear its ARP table. Another, perhaps simpler, solution/diagnostic is to run a ping from the server to your router via the VIP interface and address right after the failover and record its output; the outgoing traffic may also tickle the switch into updating its MAC table. It's also possible the clients need their ARP caches flushed.
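To make the "tell the network the VIP has moved" idea concrete, here is a minimal Python sketch that only builds a gratuitous ARP frame. It follows the ARP Announcement layout from RFC 5227: an ARP request whose sender and target IP are both the VIP, broadcast from the new owner's MAC. The VIP and MAC in the usage line are made-up placeholders. Actually sending the frame needs a raw link-layer socket, which is platform-specific, and whether RSF-1 already sends something like this on failover is not covered here; treat that as an assumption to verify.

```python
import socket
import struct


def mac_to_bytes(mac: str) -> bytes:
    """Convert 'aa:bb:cc:dd:ee:ff' into 6 raw bytes."""
    parts = mac.split(":")
    if len(parts) != 6:
        raise ValueError(f"bad MAC address: {mac!r}")
    return bytes(int(p, 16) for p in parts)


def gratuitous_arp_frame(vip: str, mac: str) -> bytes:
    """Build an Ethernet frame announcing that `vip` is now reachable at `mac`."""
    src_mac = mac_to_bytes(mac)
    broadcast = b"\xff" * 6
    ip = socket.inet_aton(vip)  # 4-byte IPv4 address

    # Ethernet header: destination (broadcast), source, EtherType 0x0806 (ARP)
    eth = broadcast + src_mac + struct.pack("!H", 0x0806)

    # ARP header: htype=1 (Ethernet), ptype=0x0800 (IPv4), hlen=6, plen=4, oper=1 (request)
    arp = struct.pack("!HHBBH", 1, 0x0800, 6, 4, 1)
    # Sender MAC/IP, then target MAC (zero in a request) and target IP (also the VIP)
    arp += src_mac + ip + b"\x00" * 6 + ip
    return eth + arp


if __name__ == "__main__":
    # Placeholder values; substitute the real VIP and the new owner's MAC.
    print(gratuitous_arp_frame("192.0.2.10", "02:00:00:aa:bb:cc").hex())
```

The frame is 42 bytes before the NIC pads it to the Ethernet minimum. Because every host and switch on the segment sees the broadcast, it refreshes stale ARP and MAC-table entries in one shot, which is the same effect the ping trick above tries to get indirectly.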

If this is the case, another option is to have both servers spoof the same MAC address, with only one of them ever up at a time and both controlled by the failover script (or bad things will happen).
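The key rule with a shared MAC is ordering: the old holder must go down before the new one comes up. The toy Python controller below illustrates only that invariant. `Node` and `SharedMacFailover` are hypothetical names, not part of RSF-1 or OmniOS, and setting `up` stands in for whatever the real failover script does to plumb or unplumb the interface.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    """Hypothetical cluster head; `up` means its shared-MAC/VIP interface is active."""
    name: str
    up: bool = False


class SharedMacFailover:
    """Toy controller: both nodes share one MAC, so at most one may be up at a time."""

    def __init__(self, a: Node, b: Node):
        self.nodes = (a, b)

    def active(self) -> Optional[Node]:
        up = [n for n in self.nodes if n.up]
        # Two nodes answering for the same MAC would thrash the switch's MAC table.
        if len(up) > 1:
            raise RuntimeError("split brain: both nodes have the shared MAC up")
        return up[0] if up else None

    def fail_over_to(self, target: Node) -> None:
        if not any(target is n for n in self.nodes):
            raise ValueError(f"{target.name} is not part of this pair")
        current = self.active()
        if current is target:
            return
        # Take the old holder down first, then bring the new one up; never both at once.
        if current is not None:
            current.up = False
        target.up = True
```

Note that this sketch assumes the old node can always be told to go down. A real setup has to handle the case where it is hung or unreachable, typically by fencing it before the survivor brings the interface up.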

Just a thought.

Michael
Sent from my iPhone

> On Feb 17, 2016, at 10:13 PM, Stephan Budach <stephan.budach at JVM.DE> wrote:
> 
> Hi,
> 
> I have been test driving RSF-1 for the last week to accomplish the following:
> 
> - cluster a zpool made up of 8 mirrored vdevs, each a pair of SSDs served via iSCSI from another OmniOS box
> - export an NFS share from the above zpool via a VIP
> - have RSF-1 provide the failover and VIP moving
> - use the NFS share as a repository for my Oracle VM guests and vdisks
> 
> The setup seems to work fine, but I do have one issue that I can't seem to solve. Whenever I fail over the zpool, any in-flight NFS data stalls for an unpredictable amount of time. Sometimes it takes not much longer than the "move" time of the resources, but sometimes it takes up to 5 minutes until the NFS client on my VM server becomes alive again.
> 
> So, when I issue a simple ls -l on the vdisks folder while the switchover is happening, the command sometimes concludes in 18 to 20 seconds, but sometimes ls will just sit there for minutes.
> 
> I wonder if there's anything I could do about that. I have already played with several timeouts, NFS-wise and TCP-wise, but nothing seems to have any effect on this issue. Does anyone know some tricks to speed up the recovery of in-flight data?
> 
> Thanks,
> Stephan
> 
> 
> 
> _______________________________________________
> OmniOS-discuss mailing list
> OmniOS-discuss at lists.omniti.com
> http://lists.omniti.com/mailman/listinfo/omnios-discuss

