The CLUSTERIP target is used to create simple clusters of nodes answering to the same IP and MAC address in a round robin fashion. This is a simple form of clustering where you set up a Virtual IP (VIP) on all hosts participating in the cluster, and then use the CLUSTERIP target on each host that is supposed to answer the requests. The CLUSTERIP target requires no special load balancing hardware or machines; it simply does its work on each host that is part of the cluster. It is a very simple clustering solution and not suited for large and complex clusters. It does not have built-in heartbeat handling either, but that should be easy to implement as a simple script.
All servers in the cluster use a common Multicast MAC for a VIP, and a special hash algorithm within the CLUSTERIP target then figures out which of the cluster participants should respond to each connection. A Multicast MAC is a MAC address with 01:00:5e as its first 24 bits. An example of a Multicast MAC would be 01:00:5e:00:00:20. The VIP can be any IP address, but it must be the same on all hosts as well.
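As a small illustration of the prefix rule above, the following sketch checks whether an address starts with 01:00:5e before it is used as a cluster MAC. The function name is_multicast_mac is made up for this example, and the check only looks at the prefix and the overall shape of the address, not at whether every digit is valid hex.

```shell
# Return 0 if the given MAC address starts with 01:00:5e, 1 otherwise.
# is_multicast_mac is a hypothetical helper, not part of iptables.
is_multicast_mac() {
    # Lower-case the address so that 01:00:5E:... is accepted as well.
    mac=$(printf '%s' "$1" | tr 'A-F' 'a-f')
    case "$mac" in
        01:00:5e:??:??:??) return 0 ;;
        *) return 1 ;;
    esac
}
```

It could be used as a guard in a setup script, for example: is_multicast_mac 01:00:5e:00:00:20 || echo "not a Multicast MAC".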
Remember that the CLUSTERIP target might break protocols such as SSH. The first connection will go through properly, but if you later connect to the same address again, you might end up on another machine in the cluster with a different set of host keys, and your ssh client might then refuse to connect or give you errors. For this reason, the target will not work very well with some protocols, and it might be a good idea to add separate addresses that can be used for maintenance and administration. Another solution is to use the same SSH keys on all hosts participating in the cluster.
The cluster can be load-balanced with three kinds of hashmodes. The first one hashes only the source IP (sourceip), the second the source IP and source port (sourceip-sourceport), and the third the source IP, source port and destination port (sourceip-sourceport-destport). The first one might be a good idea where you need to remember state between connections, for example a webserver with a shopping cart. With this hashmode the load-balancing might become a little bit uneven, with some machines getting a higher load than others, since all connections from the same source IP go to the same server. The sourceip-sourceport hash might be a good idea where you want the load-balancing to be a little more even, and where state does not have to be kept between connections on each server. A large informational website with perhaps a simple search engine is one example. The third and last hashmode, sourceip-sourceport-destport, might be a good idea where you have a host with several services running that do not require any state to be preserved between connections. This might for example be a simple ntp, dns and www server on the same host. Each connection to each new destination would hence be "renegotiated"; actually no negotiation goes on, it is basically just a round robin system where each host receives one connection each.
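To make the difference between the three hashmodes more concrete, here is a toy sketch that picks a node number from the fields each hashmode looks at. The function name pick_node is hypothetical, and cksum is only a stand-in chosen for this example. It is not the hash that the CLUSTERIP target uses, so the node numbers it prints will not match a real cluster.

```shell
# Pick a node number (1..total_nodes) for a connection.
# NOTE: cksum is an assumption used for illustration only; it is not
# the hash algorithm used inside the CLUSTERIP target.
pick_node() {
    hashmode=$1 total_nodes=$2 src_ip=$3 src_port=$4 dst_port=$5
    # Build the hash key from the fields the chosen hashmode considers.
    case "$hashmode" in
        sourceip)                     key="$src_ip" ;;
        sourceip-sourceport)          key="$src_ip:$src_port" ;;
        sourceip-sourceport-destport) key="$src_ip:$src_port:$dst_port" ;;
        *) echo "unknown hashmode: $hashmode" >&2; return 1 ;;
    esac
    # cksum prints "<checksum> <byte count>"; keep only the checksum.
    sum=$(printf '%s' "$key" | cksum | cut -d ' ' -f 1)
    echo $(( sum % total_nodes + 1 ))
}
```

With sourceip, pick_node sourceip 2 10.0.0.7 40000 80 and pick_node sourceip 2 10.0.0.7 40001 443 always give the same node, since only the source IP goes into the key. With the other two hashmodes the source port, and for the last one also the destination port, can change the result.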
Each CLUSTERIP cluster gets a separate file in the /proc/net/ipt_CLUSTERIP directory, named after the VIP of the cluster. If the VIP is 192.168.0.5, for example, you could run cat /proc/net/ipt_CLUSTERIP/192.168.0.5 to see which nodes this machine is answering for. To make the machine answer for another node as well, let's say node 2, add it using echo "+2" >> /proc/net/ipt_CLUSTERIP/192.168.0.5. To remove it again, run echo "-2" >> /proc/net/ipt_CLUSTERIP/192.168.0.5.
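The commands above can be wrapped in a few small shell functions, which might be a starting point for the kind of simple heartbeat script mentioned earlier. This is only a sketch: the function names and the CLUSTERIP_PROC_DIR variable are made up for this example, and writing to the real /proc files requires root.

```shell
# Directory holding one file per VIP. CLUSTERIP_PROC_DIR is a
# hypothetical variable; it can be overridden, for example to try the
# functions against a scratch directory.
CLUSTERIP_PROC_DIR=${CLUSTERIP_PROC_DIR:-/proc/net/ipt_CLUSTERIP}

# Show which node numbers this machine currently answers for on VIP $1.
clusterip_show() {
    cat "$CLUSTERIP_PROC_DIR/$1"
}

# Start answering for node $2 on VIP $1.
clusterip_take() {
    echo "+$2" >> "$CLUSTERIP_PROC_DIR/$1"
}

# Stop answering for node $2 on VIP $1.
clusterip_release() {
    echo "-$2" >> "$CLUSTERIP_PROC_DIR/$1"
}
```

A heartbeat script could then call clusterip_take 192.168.0.5 2 when node 2 stops responding, and clusterip_release 192.168.0.5 2 once it is back. How the script detects that a node is down is left open here.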
Table 11-2. CLUSTERIP target options
|Option||--new|
|Example||iptables -A INPUT -p tcp -d 192.168.0.5 --dport 80 -j CLUSTERIP --new ...|
|Explanation||This creates a new CLUSTERIP entry. It must be set on the first rule for a VIP, and is used to create a new cluster. If you have several rules connecting to the same CLUSTERIP you can omit the --new keyword in any secondary references to the same VIP.|
|Option||--hashmode|
|Example||iptables -A INPUT -p tcp -d 192.168.0.5 --dport 443 -j CLUSTERIP --new --hashmode sourceip ...|
|Explanation||The --hashmode keyword specifies the kind of hash that should be created. The hashmode can be any of the following three: sourceip, sourceip-sourceport or sourceip-sourceport-destport. The hashmodes have been extensively explained above. Basically, sourceip will give better performance and makes it simpler to keep state between connections, but does not load-balance as well between the machines. sourceip-sourceport will give slightly slower hashing and is not as good at maintaining state between connections, but will give better load-balancing properties. The last one may create very slow hashing that consumes a lot of memory, but will on the other hand also create very good load-balancing properties.|
|Option||--clustermac|
|Example||iptables -A INPUT -p tcp -d 192.168.0.5 --dport 80 -j CLUSTERIP --new --hashmode sourceip --clustermac 01:00:5e:00:00:20 ...|
|Explanation||The MAC address that the cluster is listening to for new connections. This is a shared Multicast MAC address that all the hosts are listening to. See above for a deeper explanation of this.|
|Option||--total-nodes|
|Example||iptables -A INPUT -p tcp -d 192.168.0.5 --dport 80 -j CLUSTERIP --new --hashmode sourceip --clustermac 01:00:5e:00:00:20 --total-nodes 2 ...|
|Explanation||The --total-nodes keyword specifies how many hosts are participating in the cluster and will answer requests. See above for a deeper explanation.|
|Option||--local-node|
|Example||iptables -A INPUT -p tcp -d 192.168.0.5 --dport 80 -j CLUSTERIP --new --hashmode sourceip --clustermac 01:00:5e:00:00:20 --total-nodes 2 --local-node 1|
|Explanation||This is the number that this machine has in the cluster. The cluster answers in a round-robin fashion, so once a new connection is made to the cluster, the next machine answers, and then the next after that, and so on.|
|Option||--hash-init|
|Example||iptables -A INPUT -p tcp -d 192.168.0.5 --dport 80 -j CLUSTERIP --new --hashmode sourceip --clustermac 01:00:5e:00:00:20 --hash-init 1234|
|Explanation||Specifies a random seed for hash initialization.|
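Putting the options from the table together, every host in a cluster gets the same rule except for its own --local-node number. The sketch below only prints the rule for one node rather than running it, so that it can be inspected first. The function name clusterip_rule is made up for this example, and the hashmode is fixed to sourceip to keep it short.

```shell
# Print (not run) the CLUSTERIP rule for one node of a cluster.
# clusterip_rule is a hypothetical helper; it only assembles the
# options described in the table above.
clusterip_rule() {
    vip=$1 port=$2 mac=$3 total=$4 node=$5
    echo "iptables -A INPUT -p tcp -d $vip --dport $port -j CLUSTERIP --new" \
         "--hashmode sourceip --clustermac $mac" \
         "--total-nodes $total --local-node $node"
}
```

For a two-node cluster, clusterip_rule 192.168.0.5 80 01:00:5e:00:00:20 2 1 prints the rule for the first host, and the same call with 2 as the last argument prints the rule for the second host.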
This target violates RFC 1812 (Requirements for IP Version 4 Routers), so be wary of any problems that may arise. Specifically, it violates section 3.3.2, which specifies that a router must never trust another host or router that says it is using a multicast MAC.
Works under late Linux 2.6 kernels, where it is marked experimental.