[OmniOS-discuss] RSF-1 and Zones

Michael Talbott mtalbott at lji.org
Fri Apr 21 23:50:45 UTC 2017


All,

I'm answering my own question here, but a few other users reached out to me about it, so I'm posting the solution I put together below. It has worked very well with my particular setup.

The key to making a zone highly available (assuming RSF-1 is already working properly) is to configure the zone on every node, make sure each configuration points to the shared storage, and make sure all VNICs or other networking are already in place (with no naming conflicts). If you update the zone configuration on one node, you need to make the same change on all of them so that a failover doesn't leave you with a different config. I didn't bother automating that part, since it can be tricky and adds complexity that I don't need in my case.
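
For anyone who does want at least a check for config drift between nodes, a minimal sketch might look like the following. It is not part of my setup: the function name check_zone_sync, the peer hostname, and the assumption of passwordless root ssh between the nodes are all hypothetical. It only compares the output of "zonecfg export" on both sides and reports zones that differ.

```bash
#!/bin/bash
# Sketch only: report zones whose exported configuration differs between
# this node and a peer. Assumes passwordless ssh to the peer (an assumption,
# not something RSF-1 sets up for you).

check_zone_sync() {  # $1 = peer host, remaining args = zone names
    local peer=$1 zone local_cfg peer_cfg rc=0
    shift
    for zone in "$@"; do
        local_cfg=$(zonecfg -z "$zone" export 2>/dev/null)
        # -n keeps ssh from reading our stdin
        peer_cfg=$(ssh -n "$peer" zonecfg -z "$zone" export 2>/dev/null)
        # An empty local export (zone not configured here) counts as a mismatch
        if [ -z "$local_cfg" ] || [ "$local_cfg" != "$peer_cfg" ]; then
            echo "MISMATCH: $zone"
            rc=1
        fi
    done
    return $rc
}
```

Calling it as "check_zone_sync othernode zone1 zone2" (hypothetical names) prints one MISMATCH line per zone that differs and returns non-zero if any were found, so it could be run from cron and mailed to yourself.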

Here's the core of the failover logic needed:

In /opt/HAC/RSF-1/etc/rc.appliance.c/ add these scripts:


First, a kill script that halts ONLY the zones on the shared storage that is going down. RSF-1 is kind enough to give us an exported variable, RSF_SERVICE, that we can extract some important info from :D

K70_zones

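#!/bin/bash

# RSF-1 exports RSF_SERVICE; "servicename" is only a placeholder fallback
SERVICE=${RSF_SERVICE:-"servicename"}

# halt running non-global zones whose config references the failing service
ZONES=$(zoneadm list | egrep -v '^global$')
while read -r ZONE; do
    [[ -z "$ZONE" ]] && continue    # no running non-global zones
    if zonecfg -z "$ZONE" export | grep -q "$SERVICE"; then
        zoneadm -z "$ZONE" halt
    fi
done <<< "$ZONES"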
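
One caveat with the grep above is that it matches the service name anywhere in the exported config, including as a substring of something unrelated. A stricter variant could look only at the zonepath. The sketch below assumes the service name is also the pool name and that the pool is mounted at /<service> (true for my layout, but an assumption in general); zone_on_service is a hypothetical helper name.

```bash
#!/bin/bash
# Sketch only: succeed if the zone's zonepath lives under /<service>/.
# Assumes the RSF-1 service name equals the pool name and that the pool
# is mounted at /<service>; adjust the pattern if your layout differs.

zone_on_service() {  # $1 = zone name, $2 = service (pool) name
    local zonepath
    # "zonecfg info zonepath" prints a line like "zonepath: /tank/zones/web"
    zonepath=$(zonecfg -z "$1" info zonepath 2>/dev/null | awk '{print $2}')
    case "$zonepath" in
        "/$2/"*) return 0 ;;
        *)       return 1 ;;
    esac
}
```

In the kill script, the zonecfg/grep test would then become something like: zone_on_service "$ZONE" "$SERVICE" && zoneadm -z "$ZONE" halt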


And a start script to attach and bring up any/all accessible/attached/installed zones:

S70_zones

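#!/bin/bash

# attach configured zones (-F forces the attach without validation)
CZONES=$(zoneadm list -c | egrep -v '^global$')
while read -r ZONE; do
    [[ -z "$ZONE" ]] && continue
    zoneadm -z "$ZONE" attach -F
done <<< "$CZONES"

# boot installed zones (-i lists installed zones, -c lists all configured ones)
IZONES=$(zoneadm list -i | egrep -v '^global$')
while read -r ZONE; do
    [[ -z "$ZONE" ]] && continue
    zoneadm -z "$ZONE" boot
done <<< "$IZONES"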
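
The start script simply tries every zone and lets zoneadm complain about the ones that are already attached or running. If the noise bothers you, one possible refinement is to pick zones by state using the parseable output of "zoneadm list -cp", whose first three colon-separated fields are zoneid, zonename and state. This is only a sketch; zones_in_state is a hypothetical helper name.

```bash
#!/bin/bash
# Sketch only: print the names of non-global zones in a given state,
# based on the colon-separated output of "zoneadm list -cp"
# (field 2 = zone name, field 3 = state).

zones_in_state() {  # $1 = state, e.g. configured, installed, running
    zoneadm list -cp | awk -F: -v s="$1" '$2 != "global" && $3 == s { print $2 }'
}
```

The attach loop could then iterate over the output of "zones_in_state configured" and the boot loop over "zones_in_state installed", so each command only runs against zones that are actually in the state it expects.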



There's obviously room for improvement, but this is all I need in order to make my LX zones (hosting BeeGFS storage servers) highly available for our HPC cluster :)


If anyone has any suggestions to make this better, I'm all ears.


Michael


> On Apr 18, 2017, at 1:35 PM, Michael Talbott <mtalbott at lji.org> wrote:
> 
> Anyone out there use RSF-1 (http://www.high-availability.com) for ZFS HA and have some good scripts (or best practices) for handling failing over the zones along with the storage pools?
> 
> Since I started using some zones on my storage nodes, it occurred to me that if a failover were to happen it would probably hang, since the zones would be in use and there are no scripts currently in place to stop them, export the config, import the config, and start them up on the other node.
> 
> Before I go and write my own implementation I figured I'd see if anyone else has a good solution already written.
> 
> Thanks,
> 
> 
> Michael
> 
