When you offer a high-availability service, you often balance users among nodes using DNS. The problem with DNS is propagation time: resolvers cache records until their TTL expires, so when a node fails, a quick response is very important. This is why I developed a tiny script for the bind9 DNS server that checks the status of the balanced nodes.
The script needs a small modification to your bind9 zone files. This is the syntax:
    [..]
    ftp.example.com 3600 IN A     258.421.3.125
    ;; BALANCED_AUTOCHECK::80
    balancer-www    120  IN A     258.421.3.182
    balancer-www    120  IN A     258.421.3.183
    balancer-http   3600 IN CNAME balancer-www.example.com.
    [..]
Here the subdomain balancer-www is balanced between two hosts with IPs 258.421.3.182 and 258.421.3.183. Just above these A records is the "code" that we have to add so the script knows how to proceed. The syntax is simple: ;; BALANCED_AUTOCHECK::<service_port>. The BALANCED_AUTOCHECK part is only a matching pattern, and <service_port> is the port of the service to check. In the example above we are checking a balancer for an HTTP service, so we use port 80.
NOTE: For your interest, the rule matches the regular expression: ;;\s*BALANCED_AUTOCHECK::\d+$
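As an illustration of how such a marker line can be recognized, here is a minimal Python sketch. It is not taken from the script itself: the helper name marker_port is hypothetical, and the capture group around the port is added to the pattern above so the port number can be extracted.

```python
import re

# Pattern from the note above; the capture group around \d+ is an
# addition for this sketch so the port can be read back.
MARKER_RE = re.compile(r";;\s*BALANCED_AUTOCHECK::(\d+)$")


def marker_port(line):
    """Return the port named by a marker line, or None if the line is not a marker.

    Hypothetical helper, for illustration only.
    """
    match = MARKER_RE.search(line)
    return int(match.group(1)) if match else None


print(marker_port(";; BALANCED_AUTOCHECK::80"))         # 80
print(marker_port(";;BALANCED_AUTOCHECK::443"))         # 443
print(marker_port(";; BALANCED_AUTOCHECK::http"))       # None
print(marker_port("balancer-www 120 IN A 192.0.2.10"))  # None
```

Because the pattern ends with $, anything after the port number on the same line (for example a trailing comment) prevents the match.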
Please keep in mind that no protocol check is made (e.g. HTTP 400 errors), only a plain socket connection. If the socket connection fails, the IP is marked as down by commenting out its A record; when a recovery is detected, the A record is uncommented.
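The check-and-toggle idea can be sketched in a few lines of Python. This is only an approximation under stated assumptions, not the script's real code: the function names is_reachable and toggle_record are hypothetical, a single leading ";" is assumed as the comment style, and 192.0.2.10 is a placeholder documentation address.

```python
import socket


def is_reachable(ip, port, timeout=5.0):
    """Return True if a plain TCP connection to (ip, port) succeeds.

    No protocol-level check is done; only the socket connection is tried.
    """
    try:
        with socket.create_connection((ip, port), timeout=timeout):
            return True
    except OSError:  # refused, timed out, unreachable, bad address...
        return False


def toggle_record(line, up):
    """Comment out an A record when its host is down, uncomment it on recovery.

    Assumes a single leading ';' marks a disabled record (an assumption
    of this sketch). Meant for A record lines only, not for marker lines.
    """
    commented = line.lstrip().startswith(";")
    if up and commented:
        return line.lstrip().lstrip(";").lstrip()
    if not up and not commented:
        return ";" + line
    return line  # already in the desired state


record = "balancer-www 120 IN A 192.0.2.10"
down = toggle_record(record, up=False)
print(down)                          # ;balancer-www 120 IN A 192.0.2.10
print(toggle_record(down, up=True))  # balancer-www 120 IN A 192.0.2.10
```

In a full run, the result of is_reachable for each balanced IP would decide the up argument, and the zone file would only be rewritten (and the reload command executed) if at least one line changed.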
Here is the help output of the script:
    Usage: bind9_check_balancer [options] [dns_files]

    Options:
      -h, --help            show this help message and exit
      -c COMMAND, --command=COMMAND
                            Command to be executed if changes are made.
                            (example:'service bind9 restart')
      -t TIMEOUT, --timeout=TIMEOUT
                            Socket timeout for unreachable hosts in seconds.
                            Default: 5
I think it explains quite well how it works but, just in case, here are some examples:
    # Check test.net.hosts and test.com.hosts. If a change is made
    # (down/recover detected), exec the command: /etc/init.d/bind9 reload
    bind9_check_balancer -c '/etc/init.d/bind9 reload' /etc/named/test.net.hosts /etc/bind/test.com.hosts

    # Check all files in /etc/named/zones directory and set a timeout
    # of 1 second for connection checks. Also exec: service bind9 reload
    bind9_check_balancer -t 1 -c 'service bind9 reload' /etc/named/zones/*
This script is intended to be run as a cron job every minute or every 5 minutes (or as often as you want). You can get the script from the dgtool GitHub repository at: https://github.com/diego-XA/dgtool/blob/master/ha/bind9_check_balancer