Here is a subject that does not get enough attention. With RHCS/CMAN, a node's heartbeat uses broadcast by default to let the other nodes know that it is alive and well and is a member of the cluster. Broadcast, in turn, restricts us to a single network for our heartbeat, and in most cases this is fine… But what if we want to cluster across multiple networks? Say I have node01 on 10.1.1.0/24 and node02 on 10.1.2.0/24, and I want them to heartbeat to each other. Well, this is not as trivial a feat as you might assume… In fact, I had to employ some of the finest network technicians (well, not really, but they are pretty smart!) in the south-east to get this to work…

You can configure CMAN to perform its heartbeat operation using multicast instead of broadcast, which gets us on the right path toward clustering across networks, but first there are a few things you will need to do on your switch/router to make sure that multicast will actually work across networks… In our test environment, we are using a Cisco Catalyst 4510R-E core switch hosting multiple VLANs, but for the purposes of this example, we will be clustering between the 10.3.1.0/24 and 10.3.16.0/24 networks. Here are the commands to globally enable IP multicast routing and set a rendezvous-point on one of the VLANs so that the multicast traffic has a place to go. VLAN 1 is the 10.3.1.0/24 network and VLAN 316 is the 10.3.16.0/24 network…

router#conf t
router(config)#ip multicast-routing
router(config)#ip pim rp-address 10.3.1.241
router(config)#int vlan 1
router(config-if)#ip pim sparse-mode
router(config-if)#int vlan 316
router(config-if)#ip pim sparse-mode
router(config-if)#exit
router(config)#exit
router#

10.3.1.241 is the default gateway of VLAN 1 and serves as a suitable rendezvous-point for the multicast traffic; it could just as easily have been defined as the default gateway of the 10.3.16.0/24 network. We were also able to run almost exactly the same commands on a Force10 E1200 core switch, and multicast traffic quickly began flowing between networks… So, that's Step 1. Step 2 happens in the cluster configuration. Basically, we are going to make the heartbeat use multicast instead of broadcast, and we're going to define a specific, static multicast address for this cluster to use. It is probably a good idea not to reuse the same multicast address between different clusters in your organization. In theory nothing should go wrong if you did, but I have seen it cause some funny behavior once either cluster grows to a large number of nodes (i.e., nodes frequently fall out of the cluster and start fencing each other).
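Before touching the cluster configuration, it is worth sanity-checking that multicast traffic actually crosses the two VLANs. Here is a minimal sketch, assuming the omping utility is available on both nodes (it is packaged separately from the cluster suite) and that 239.192.132.95 is the multicast address you plan to use:

# run simultaneously on node01 and node02; each side reports whether it
# receives the other's unicast AND multicast packets
omping -m 239.192.132.95 node01 node02

# or simply watch for the group's traffic arriving on the wire
tcpdump -i eth0 host 239.192.132.95

If omping shows unicast replies but no multicast replies, the problem lives on the switch/router side (PIM, the RP, or IGMP snooping) rather than in the cluster configuration.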

On to the bare metal…

Let’s start with a sample /etc/cluster/cluster.conf. In this example, I am using a two-node cluster, but the multicast configuration for a cluster of three or more nodes is exactly the same:

<?xml version="1.0"?>
<cluster name="MY_CLUSTER" config_version="8">
  <clusternodes>
    <clusternode name="node01" votes="1" nodeid="1">
      <multicast addr="239.192.132.95" interface="eth0"/>
      <fence>
        <method name="single">
          <device name="manual"/>
        </method>
      </fence>
    </clusternode>
    <clusternode name="node02" votes="1" nodeid="2">
      <multicast addr="239.192.132.95" interface="eth0"/>
      <fence>
        <method name="single">
          <device name="manual"/>
        </method>
      </fence>
    </clusternode>
  </clusternodes>
  <cman two_node="1" expected_votes="1">
    <multicast addr="239.192.132.95"/>
  </cman>
  <fencedevices><fencedevice name="manual" agent="fence_manual"/></fencedevices>
  <rm>
    <failoverdomains/>
    <resources/>
  </rm>
</cluster>

The key elements to pay attention to here are the pieces that specifically reference multicast. For each node, you have to specify what multicast address it will be using and what network interface it will be sending/receiving multicast traffic on… then, in the global CMAN configuration, you specify the multicast address for the cluster as a whole. Once you have done this, save your configuration and propagate it to each of the nodes in the cluster.
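For reference, here is one way to push the file out, sketched on the assumption that you edited it on node01 and remembered to bump config_version (propagation is keyed off the version number):

# if the cluster stack is already running, let it distribute the new file
ccs_tool update /etc/cluster/cluster.conf

# if the nodes are not clustered yet, just copy it over by hand
scp /etc/cluster/cluster.conf node02:/etc/cluster/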

A quick note on this: if you are using the system-config-cluster utility for your configuration, it may throw errors about this setup… At least the latest version (1.0.53-7) doesn't seem to like the multicast address being configured on each of the clusternode elements. (GUIs are for losers anyway, right?)

Step 3: You will need to make sure that multicast traffic originating from your cluster nodes will actually traverse your networks. This is really stupid, but the clustering daemon sends its multicast packets with a TTL of 1… and a TTL of 1 won't make it past the inside interface of your router! So, we'll need a little iptables magic to increase the TTL of our multicast packets so that the other nodes will see us (and we them)…

# iptables -t mangle -A OUTPUT -d 239.192.132.95 -j TTL --ttl-set 3

Setting the TTL to 3 for our multicast address gets the traffic through the router, from node01 to node02 and vice-versa. You will need to perform this step on each of the nodes, and save your iptables configuration when you're done so that the change persists across reboots.
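On a stock RHEL-family box, persisting the rule can look like this (assuming you are using the standard iptables init script):

# confirm the mangle rule took, then write the running rules to /etc/sysconfig/iptables
iptables -t mangle -L OUTPUT -n
service iptables save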

Finally, restart the cluster services on each of your nodes and watch them happily join and go quorate.
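For reference, a sketch of the restart-and-verify cycle on each node (service names are from the stock RHEL 5 cluster suite; adjust if your stack differs):

# restart the cluster stack
service rgmanager stop
service cman stop
service cman start
service rgmanager start

# verify membership and quorum
cman_tool status
clustat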

Here are some basics on multicast that may help you better understand this from the switch side (thanks to Chris Bell for performing the work, providing very detailed documentation, bearing with me while we figured this out, and eventually completely figuring it out):

  • PIM – Protocol Independent Multicast. A routing protocol that routers use to track which multicast packets to forward to which routers and on to their respective LANs.
  • RP – Rendezvous Point. An IP address to which routers with directly connected hosts interested in joining a multicast group can send join messages. This is usually the default gateway or the router interface closest to the source of the multicast traffic. The RP keeps track of multicast groups and sends join messages toward the source. This works in conjunction with PIM.
  • Sender – Host sending the multicast transmission.
  • Listener – Host interested in receiving the multicast transmission.
  • Querier – Network device that sends messages to discover which hosts are members of a multicast group.
  • Host – Listener that sends report messages in response to the querier, informing it of group membership.
  • Multicast Group – Set of hosts and queriers that receive multicast messages from the same sender via a multicast address (group address).
  • Sparse Mode – Multicast traffic is forwarded only to listeners that have specifically requested it.
  • Dense Mode – Multicast traffic is flooded out all interfaces.
  • IGMP Snooping – Allows the switch to examine multicast packets and update its group table accordingly.


Note:
A host does not have to belong to the group to be a sender, but it must belong to the multicast group to be a listener. Queriers and hosts use IGMP messages to join and leave multicast groups. The RP is simply a 'gateway' of sorts: a place for multicast listeners to send their requests to join, and a target for the source to send its multicast traffic.
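You can watch this from the cluster nodes themselves by listing the multicast groups the host has joined (i.e., the memberships it reports via IGMP). A quick sketch, assuming eth0 is the cluster interface:

# list multicast group memberships on eth0; once cman is up, the cluster's
# address (239.192.132.95 here) should appear in the list
ip maddr show dev eth0

# older boxes can pull the same information from netstat
netstat -g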

Enjoy.

-dan

Special thanks to Allan Howard for doing his magic with this on the VMware side.
