This is a rough-edged how-to that I wrote for a MySQL and Apache HA setup at home. It is basically a log of what I did, but I am putting it here for the benefit of everyone. This was done in mid 2006, so you may need to adjust a few things as the technology has advanced.

Setup DNS:

vi /var/named/chroot/var/named/example.com.fwd

$ORIGIN example.com.
$TTL 86400
@ IN SOA homeserver.homedomain.com. kamran.wbitt.com. (
20060421 ; serial (d. adams)
3H ; refresh
15M ; retry
1W ; expiry
1D ) ; minimum

@ IN NS homeserver.homedomain.com.
@ IN MX 10 homeserver.homedomain.com.

www1.example.com. IN A 192.168.0.201
www2.example.com. IN A 192.168.0.202
www.example.com. IN A 192.168.0.203

vi /var/named/chroot/var/named/0.168.192.in-addr.arpa.zone

$ORIGIN 0.168.192.in-addr.arpa.
$TTL 86400
@ IN SOA homeserver.homedomain.com. kamran.wbitt.com. (
20060323 ; serial (d. adams)
3H ; refresh
15M ; retry
1W ; expiry
1D ) ; minimum

@ IN NS homeserver.homedomain.com.
@ IN MX 10 homeserver.homedomain.com.

254.0.168.192.in-addr.arpa. IN PTR homeserver.homedomain.com.
201.0.168.192.in-addr.arpa. IN PTR www1.example.com.
202.0.168.192.in-addr.arpa. IN PTR www2.example.com.
203.0.168.192.in-addr.arpa. IN PTR www.example.com.

service named restart
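The PTR owner names in the reverse zone above are simply the address octets in reverse order, followed by in-addr.arpa. A minimal sketch of that mapping follows; `reverse_ptr` is a hypothetical helper name (it is not part of BIND) and it does no validation of its input.

```shell
#!/bin/sh
# reverse_ptr: print the in-addr.arpa owner name for a dotted-quad IPv4 address.
# Hypothetical helper for illustration; it does not validate the octets.
reverse_ptr() {
    # Split the argument on dots into four positional fields,
    # then print them in reverse order.
    old_ifs=$IFS
    IFS=.
    set -- $1
    IFS=$old_ifs
    printf '%s.%s.%s.%s.in-addr.arpa.\n' "$4" "$3" "$2" "$1"
}

reverse_ptr 192.168.0.201
reverse_ptr 192.168.0.203
```

Once named is running, the printed names can be compared with what a lookup such as `host 192.168.0.201` returns from the server.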

On both Web servers (this step is not required; it creates a problem with heartbeat):

vi /etc/sysconfig/network-scripts/ifcfg-lo:0

DEVICE=lo:0
BOOTPROTO=static
IPADDR=192.168.0.203
NETMASK=255.255.255.255
ONBOOT=yes
NAME=loopback


Both Web servers:

vi /etc/sysctl.conf

. . .
. . .

# When an arp request is received on eth0, only respond if that address is
# configured on eth0. In particular, do not respond if the address is
# configured on lo
net.ipv4.conf.eth0.arp_ignore = 1

# Ditto for eth1, add for all ARPing interfaces
#net.ipv4.conf.eth1.arp_ignore = 1

# Enable configuration of arp_announce option
net.ipv4.conf.all.arp_announce = 2

# When making an ARP request sent through eth0 Always use an address that
# is configured on eth0 as the source address of the ARP request. If this
# is not set, and packets are being sent out eth0 for an address that is on
# lo, and an arp request is required, then the address on lo will be used.
# As the source IP address of arp requests is entered into the ARP cache on
# the destination, it has the effect of announcing this address. This is
# not desirable in this case as addresses on lo on the real-servers should
# be announced only by the linux-director.
net.ipv4.conf.eth0.arp_announce = 2

# Ditto for eth1, add for all ARPing interfaces
#net.ipv4.conf.eth1.arp_announce = 2

Then:

sysctl -p


Create two partitions on both servers, with IDENTICAL sizes on both:

one for DRBD meta-data (not less than 128 MB)

one for /data (the partition holding the data that needs to be replicated; I created a 200 MB partition only for testing)

**Node 1:**

~]# fdisk -l

Disk /dev/hda: 8455 MB, 8455200768 bytes
255 heads, 63 sectors/track, 1027 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

Device Boot Start End Blocks Id System
/dev/hda1 * 1 510 4096543+ 83 Linux
/dev/hda3 995 1027 265072+ 82 Linux swap
/dev/hda4 511 994 3887730 5 Extended
/dev/hda5 511 529 152586 83 Linux
/dev/hda6 530 554 200781 83 Linux

**Node 2:**

~]# fdisk -l

Disk /dev/hda: 8455 MB, 8455200768 bytes
255 heads, 63 sectors/track, 1027 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

Device Boot Start End Blocks Id System
/dev/hda1 * 1 510 4096543+ 83 Linux
/dev/hda2 511 543 265072+ 82 Linux swap
/dev/hda4 544 1027 3887730 5 Extended
/dev/hda5 544 562 152586 83 Linux
/dev/hda6 563 587 200781 83 Linux

Make sure you DO NOT mount these two partitions on either server, and do not list them in /etc/fstab.

NTP:

Time should be the same on both nodes, ideally synchronized against an external time source.

I use my homeserver as the time source for all nodes, as it runs the ntpd service. So, on both nodes:

~]# vi /etc/ntp.conf


server 192.168.0.254
….

Make sure the service ntpd is not running on nodes right now, then:

~]# ntpdate -u 192.168.0.254  
~]# ntpdate -u 192.168.0.254  
~]# ntpdate -u 192.168.0.254  
  
  
~]# service ntpd start

Starting ntpd:                                             [  OK  ]
  
~]# chkconfig --level 35 ntpd on  
  

DRBD installation on both Web servers

**Both nodes:**

~]# rpm -ivh drbd-0.7.23-1.el4.centos.i386.rpm
~]# rpm -ivh kernel-module-drbd-2.6.9-42.EL-0.7.21-1.c4.i686.rpm

On BOTH Web servers:

vi /etc/drbd.conf  
  
resource r0 {
  protocol C;
  incon-degr-cmd "halt -f";

  startup {
    degr-wfc-timeout 120;    # 2 minutes.
  }

  disk {
    on-io-error detach;
  }

  net {
  }

  syncer {
    rate 10M;
    group 1;
    al-extents 257;
  }

  on www1.example.com {            # ** EDIT ** the hostname of server 1 (uname -n)
    device    /dev/drbd0;
    disk      /dev/hda6;           # ** EDIT ** data partition on server 1
    address   192.168.0.201:7788;  # ** EDIT ** IP address on server 1
    meta-disk /dev/hda5[0];        # ** EDIT ** 128MB partition for DRBD on server 1
  }

  on www2.example.com {            # ** EDIT ** the hostname of server 2 (uname -n)
    device    /dev/drbd0;
    disk      /dev/hda6;           # ** EDIT ** data partition on server 2
    address   192.168.0.202:7788;  # ** EDIT ** IP address on server 2
    meta-disk /dev/hda5[0];        # ** EDIT ** 128MB partition for DRBD on server 2
  }
}

Both Web servers:

drbdadm up all

Check the status of /proc/drbd on both Web servers and you should see something like:

~]# cat /proc/drbd

[root@www1 ~]# cat /proc/drbd
version: 0.7.21 (api:79/proto:74)
SVN Revision: 2326 build by buildsvn@build-i386, 2006-08-26 20:44:47
0: cs:Connected st:Secondary/Secondary ld:Consistent
ns:0 nr:4 dw:4 dr:0 al:0 bm:1 lo:0 pe:0 ua:0 ap:0
1: cs:Unconfigured

[root@www2 ~]# cat /proc/drbd
version: 0.7.21 (api:79/proto:74)
SVN Revision: 2326 build by buildsvn@build-i386, 2006-08-26 20:44:47
0: cs:Connected st:Secondary/Secondary ld:Consistent
ns:4 nr:0 dw:0 dr:4 al:0 bm:1 lo:0 pe:0 ua:0 ap:0
1: cs:Unconfigured
[root@www2 ~]#

You see that both Web servers report themselves as Secondary. On a fresh setup the data is also reported as inconsistent, because no initial sync has been made yet.

I want to make WWW1 the primary Web server and WWW2 the hot standby. If WWW1 fails, WWW2 takes over; when WWW1 comes back, all data that changed in the meantime is mirrored back from WWW2 to WWW1, so that the data is always consistent.

Only on WWW1:-

~]# drbdadm -- --do-what-I-say primary all

This will start the sync process:

[root@nfs1 extras]# cat /proc/drbd
version: 0.7.17 (api:77/proto:74)
SVN Revision: 2093 build by buildcentos@build-i386, 2006-04-13 14:38:33
0: cs:SyncSource st:Primary/Secondary ld:Consistent
ns:18548 nr:0 dw:0 dr:19468 al:0 bm:27 lo:0 pe:25 ua:230 ap:0
[==>……………..] sync’ed: 12.0% (182332/200784)K
finish: 0:04:20 speed: 668 (540) K/sec
1: cs:Unconfigured
[root@nfs1 extras]#

This will take some time. I used only a 200MB data partition. On larger partitions, it may take hours!

While the sync is running, you should see its progress in /proc/drbd on both servers:

[root@nfs2 extras]# cat /proc/drbd
version: 0.7.17 (api:77/proto:74)
SVN Revision: 2093 build by buildcentos@build-i386, 2006-04-13 14:38:33
0: cs:SyncTarget st:Secondary/Primary ld:Inconsistent
ns:0 nr:96776 dw:96776 dr:0 al:0 bm:31 lo:0 pe:245 ua:0 ap:0
[===>…………….] sync’ed: 19.4% (104008/123188)K
finish: 0:04:20 speed: 176 (636) K/sec
1: cs:Unconfigured
[root@nfs2 extras]#

Notice the role change on both servers above.

The following shows that Syncing is complete:

[root@www1 ~]# cat /proc/drbd
version: 0.7.21 (api:79/proto:74)
SVN Revision: 2326 build by buildsvn@build-i386, 2006-08-26 20:44:47
0: cs:Connected st:Primary/Secondary ld:Consistent
ns:0 nr:0 dw:4 dr:0 al:0 bm:1 lo:0 pe:0 ua:0 ap:0
1: cs:Unconfigured

[root@www2 ~]# cat /proc/drbd
version: 0.7.21 (api:79/proto:74)
SVN Revision: 2326 build by buildsvn@build-i386, 2006-08-26 20:44:47
0: cs:Connected st:Secondary/Primary ld:Consistent
ns:4 nr:0 dw:0 dr:4 al:0 bm:1 lo:0 pe:0 ua:0 ap:0
1: cs:Unconfigured
[root@www2 ~]#
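The lines to watch in /proc/drbd are the ones that start with a device number: `cs` is the connection state, `st` is the local/peer role, and `ld` is the state of the local data. Here is a minimal sketch that pulls those three fields out of text read from standard input. `drbd_summary` is a hypothetical helper name, and the field layout is assumed to match the DRBD 0.7 samples in this howto.

```shell
#!/bin/sh
# drbd_summary: read /proc/drbd-style text on stdin and print one line per
# configured device in the form "<minor> <cs> <st> <ld>".
# Hypothetical helper; assumes the DRBD 0.7 layout shown in this howto.
drbd_summary() {
    awk '
        # Device lines look like: "0: cs:Connected st:Primary/Secondary ld:Consistent"
        $1 ~ /^[0-9]+:$/ && $2 != "cs:Unconfigured" {
            minor = $1
            sub(/:$/, "", minor)
            cs = "-"; st = "-"; ld = "-"
            for (i = 2; i <= NF; i++) {
                if ($i ~ /^cs:/) cs = substr($i, 4)
                if ($i ~ /^st:/) st = substr($i, 4)
                if ($i ~ /^ld:/) ld = substr($i, 4)
            }
            print minor, cs, st, ld
        }
    '
}

# Example with inline sample text; on a real node use: drbd_summary < /proc/drbd
drbd_summary <<'EOF'
version: 0.7.21 (api:79/proto:74)
 0: cs:Connected st:Primary/Secondary ld:Consistent
    ns:0 nr:0 dw:4 dr:0 al:0 bm:1 lo:0 pe:0 ua:0 ap:0
 1: cs:Unconfigured
EOF
```

Running it on both nodes side by side makes the mirrored roles (Primary/Secondary on one, Secondary/Primary on the other) easy to compare.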


Now you will need to format this newly created data partition:

On WWW1 only:

~]# mke2fs -v -j /dev/drbd0

Now we want to make a directory /data on both Web servers and mount the DRBD data partition on it. Then we will create two directories in it: /data/sites for Web content and /data/mysql for the database. You do not need to create /data/mysql by hand, for the reason discussed below.

If you create a symbolic link from /data/mysql to /var/lib/mysql, it will NOT work, and mysql WILL FAIL TO START every time. MySQL expects this to be a directory. The solution is to edit /etc/my.cnf and change the directives from "/var/lib" to "/data". When mysqld is then started, it will automatically create the /data/mysql directory and will start fine. Thanks to Naveed Ahmad (naveed.ahmad@wbitt.com) for pointing this out.

~]# vi /etc/my.cnf

[mysqld]
# datadir=/var/lib/mysql # original
datadir=/data/mysql
# socket=/var/lib/mysql/mysql.sock # original
socket=/data/mysql/mysql.sock
# Default to using old password format for compatibility with mysql 3.x
# clients (those using the mysqlclient10 compatibility package).
old_passwords=1

[mysql.server]
user=mysql
# basedir=/var/lib # original
basedir=/data

[mysqld_safe]
err-log=/var/log/mysqld.log
pid-file=/var/run/mysqld/mysqld.pid

Proof of failure (using the original my.cnf):

[root@www1 extras]# service mysqld start
Initializing MySQL database:                               [  OK  ]
Timeout error occurred trying to start MySQL Daemon.
Starting MySQL:                                            [FAILED]
You have mail in /var/spool/mail/root
[root@www1 extras]# service mysqld start
Timeout error occurred trying to start MySQL Daemon.
Starting MySQL:                                            [FAILED]
[root@www1 extras]#

And from the mysqld.log file:

[root@www1 ~]# tail -f /var/log/mysqld.log
070225 21:08:01 InnoDB: Starting log scan based on checkpoint at
InnoDB: log sequence number 0 43634.
InnoDB: Doing recovery: scanned up to log sequence number 0 43634
070225 21:08:01 InnoDB: Flushing modified pages from the buffer pool…
070225 21:08:01 InnoDB: Started; log sequence number 0 43634
070225 21:08:01 [ERROR] /usr/libexec/mysqld: Can’t find file: ‘./mysql/host.frm’ (errno: 13)
070225 21:08:01 [ERROR] /usr/libexec/mysqld: Can’t find file: ‘./mysql/host.frm’ (errno: 13)
070225 21:08:01 [ERROR] Fatal error: Can’t open and lock privilege tables: Can’t find file: ‘./mysql/host.frm’ (errno: 13)
070225 21:08:01 mysqld ended

On **Both Web servers:**

~]# mkdir /data

Then, on WWW1 only, mount the partition on the /data mount point:

~]# mount -t ext3 /dev/drbd0 /data

Check using “mount” and “df -h” commands:

[root@www1 ~]# mount
. . .
. . .

/dev/drbd0 on /data type ext3 (rw)

[root@www1 ~]# df -h
Filesystem Size Used Avail Use% Mounted on
/dev/hda1 3.9G 711M 3.0G 20% /
none 58M 0 58M 0% /dev/shm
/dev/drbd0 190M 5.6M 175M 4% /data

Now on **Node 1:**

[root@www1 ~]# mkdir /data/sites/example.com -p

On node 2, /data is not mounted yet, so the sites and mysql directories are not there yet either.

Now let's install Heartbeat on **Both nodes:**

First some required packages from CentOS main RPMS:

~]# rpm -ivh curl-7.12.1-8.rhel4.i386.rpm perl-Crypt-SSLeay-0.51-5.i386.rpm perl-HTML-Parser-3.35-6.i386.rpm perl-LDAP-0.31-5.noarch.rpm perl-Net-DNS-0.48-1.i386.rpm perl-libwww-perl-5.79-5.noarch.rpm libidn-0.5.6-1.i386.rpm perl-Convert-ASN1-0.18-3.noarch.rpm perl-Digest-HMAC-1.01-13.noarch.rpm perl-HTML-Tagset-3.03-30.noarch.rpm perl-URI-1.30-4.noarch.rpm perl-XML-SAX-0.12-7.noarch.rpm perl-Digest-SHA1-2.07-5.i386.rpm perl-XML-NamespaceSupport-1.08-6.noarch.rpm gnutls-1.0.20-3.2.2.i386.rpm libglade2-2.4.0-5.i386.rpm lm_sensors-2.8.7-2.40.3.i386.rpm pygtk2-2.4.0-1.i386.rpm atk-1.8.0-2.i386.rpm gtk2-2.4.13-19.i386.rpm pango-1.6.0-9.i386.rpm

then HeartBeat packages from CentOS extras:
~]# rpm -ivh perl-Mail-IMAPClient-2.2.9-1.rf.noarch.rpm ipvsadm-1.24-6.i386.rpm heartbeat-2.0.7-1.c4.i386.rpm heartbeat-ldirectord-2.0.7-1.c4.i386.rpm heartbeat-pils-2.0.7-1.c4.i386.rpm heartbeat-stonith-2.0.7-1.c4.i386.rpm perl-MailTools-1.74-1.c4.noarch.rpm perl-Net-IMAP-Simple-1.16-1.c4.noarch.rpm perl-Net-IMAP-Simple-SSL-1.3-1.c4.noarch.rpm perl-Mail-POP3Client-2.17-1.c4.noarch.rpm perl-TimeDate-1.16-1.c4.noarch.rpm perl-IO-Socket-SSL-1.01-1.c4.noarch.rpm perl-Net-SSLeay-1.25-3.rf.i386.rpm


**Both Web servers:**

vi /etc/ha.d/ha.cf

logfacility local0
keepalive 2
#deadtime 30 # USE THIS!!!
deadtime 10
#bcast eth0
serial /dev/ttyS0
baud 19200
auto_failback off
node www1.example.com www2.example.com


WWW1:

vi /etc/ha.d/haresources

www1.example.com IPaddr::192.168.0.203/24/eth0 drbddisk::r0 Filesystem::/dev/drbd0::/data::ext3 httpd mysqld

WWW2:

vi /etc/ha.d/haresources

www2.example.com IPaddr::192.168.0.203/24/eth0 drbddisk::r0 Filesystem::/dev/drbd0::/data::ext3 httpd mysqld
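The order of the entries on a haresources line matters: heartbeat starts the resources from left to right and stops them from right to left, which is why the IP address, the DRBD role and the filesystem come before httpd and mysqld. The sketch below only prints that start order for a given line. `haresources_order` is a hypothetical helper name and is not something heartbeat ships.

```shell
#!/bin/sh
# haresources_order: read haresources lines on stdin and print the node name,
# then the resources numbered in start order (left to right).
# Hypothetical helper for illustration only; heartbeat does not ship it.
haresources_order() {
    awk '
        /^[ \t]*(#|$)/ { next }          # skip comments and blank lines
        {
            print "node: " $1
            for (i = 2; i <= NF; i++)
                printf "%d. %s\n", i - 1, $i
        }
    '
}

echo 'www1.example.com IPaddr::192.168.0.203/24/eth0 drbddisk::r0 Filesystem::/dev/drbd0::/data::ext3 httpd mysqld' | haresources_order
```

On a node, the same function can be fed the real file with `haresources_order < /etc/ha.d/haresources`.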


**Both nodes:**

~]# vi /etc/ha.d/authkeys

auth 3
3 md5 redhat

~]# chmod 600 /etc/ha.d/authkeys

To make things really interesting, configure httpd.conf on both servers and set the following directives:

ServerName, DocumentRoot

Node1:

ServerName www1.example.com:80
. . .
. . .
DocumentRoot "/data/sites/example.com"

<Directory "/data/sites/example.com">
Options Indexes FollowSymLinks
AllowOverride None
Order allow,deny
Allow from all
</Directory>

Node2:

ServerName www2.example.com:80
. . .
. . .
DocumentRoot "/data/sites/example.com"

<Directory "/data/sites/example.com">
Options Indexes FollowSymLinks
AllowOverride None
Order allow,deny
Allow from all
</Directory>

~]# echo "www.example.com served using DRBD and heartbeat." > /data/sites/example.com/index.html

Let's start the Apache service on both nodes to make sure things are in order.

[root@www1 extras]# service httpd restart
Stopping httpd: [FAILED]
Starting httpd: [ OK ]
[root@www1 extras]#

If it runs, good. Shut it down again, as heartbeat will control it.

Great! Now on Node 2:

[root@www2 extras]# service httpd restart
Stopping httpd: [FAILED]
Starting httpd: Syntax error on line 266 of /etc/httpd/conf/httpd.conf:
DocumentRoot must be a directory
[FAILED]
[root@www2 extras]#

Ouch !!!!!!

This error is thrown because /data/sites/example.com does not exist on node 2 at this point. Once DRBD fails over to this node, /data will become available here, and only then will heartbeat start the httpd service. So this error will not occur at that time; don't worry about it right now.

Also try to start and then stop the mysql service on both sides. You will encounter the same situation: it will succeed only on node 1 at the moment. Also note that the mysql service will take time to start, as it will be initializing its DB on node 1 while DRBD copies the contents to the other side.

~]# vi /etc/my.cnf
[mysqld]
# datadir=/var/lib/mysql # original
datadir=/data/mysql
# socket=/var/lib/mysql/mysql.sock # original
socket=/data/mysql/mysql.sock
# Default to using old password format for compatibility with mysql 3.x
# clients (those using the mysqlclient10 compatibility package).
old_passwords=1

[mysql.server]
user=mysql
# basedir=/var/lib # original
basedir=/data

[mysqld_safe]
err-log=/var/log/mysqld.log
pid-file=/var/run/mysqld/mysqld.pid

IMPORTANT: Make sure you copy this file to node 2 as well, or disaster will occur.

[root@www1 ~]# scp /etc/my.cnf   192.168.0.202:/etc/  


[root@www1 RPMS]# service mysqld start  
Initializing MySQL database:                               [  OK  ]  
Timeout error occurred trying to start MySQL Daemon.  
Starting MySQL:                                            [FAILED]  
[root@www1 RPMS]# service mysqld start  
Timeout error occurred trying to start MySQL Daemon.  
Starting MySQL:                                            [FAILED]  
[root@www1 RPMS]# service mysqld status  
mysqld (pid 6491) is running...  
[root@www1 RPMS]# service mysqld restart  
Stopping MySQL:                                            [  OK  ]  
Timeout error occurred trying to start MySQL Daemon.  
Starting MySQL:                                            [FAILED]  
[root@www1 RPMS]# service mysqld status  
mysqld (pid 7067) is running...  
[root@www1 RPMS]#  

The service start says FAILED, the status command says “running” , the log file says:

[root@www1 ~]# tail -f /var/log/mysqld.log
070225 21:20:45 InnoDB: Shutdown completed; log sequence number 0 43634
070225 21:20:45 [Note] /usr/libexec/mysqld: Shutdown complete

070225 21:20:45 mysqld ended

070225 21:24:25 mysqld started
InnoDB: The first specified data file ./ibdata1 did not exist:
InnoDB: a new database to be created!
070225 21:24:25 InnoDB: Setting file ./ibdata1 size to 10 MB
InnoDB: Database physically writes the file full: wait…
070225 21:24:53 InnoDB: Log file ./ib_logfile0 did not exist: new to be created
InnoDB: Setting log file ./ib_logfile0 size to 5 MB
InnoDB: Database physically writes the file full: wait…
070225 21:25:07 InnoDB: Log file ./ib_logfile1 did not exist: new to be created
InnoDB: Setting log file ./ib_logfile1 size to 5 MB
InnoDB: Database physically writes the file full: wait…
070225 21:25:09 mysqld started
070225 21:25:09 [ERROR] Can’t start server: Bind on TCP/IP port: Address already in use
070225 21:25:09 [ERROR] Do you already have another mysqld server running on port: 3306 ?
070225 21:25:09 [ERROR] Aborting

070225 21:25:09 [Note] /usr/libexec/mysqld: Shutdown complete

070225 21:25:09 mysqld ended

InnoDB: Doublewrite buffer not found: creating new
InnoDB: Doublewrite buffer created
InnoDB: Creating foreign key constraint system tables
InnoDB: Foreign key constraint system tables created
070225 21:25:30 InnoDB: Started; log sequence number 0 0
/usr/libexec/mysqld: ready for connections.
Version: ‘4.1.20’ socket: ‘/data/mysql/mysql.sock’ port: 3306 Source distribution
070225 21:25:52 [Note] /usr/libexec/mysqld: Normal shutdown

070225 21:25:52 InnoDB: Starting shutdown…
070225 21:25:54 InnoDB: Shutdown completed; log sequence number 0 43634
070225 21:25:54 [Note] /usr/libexec/mysqld: Shutdown complete

070225 21:25:54 mysqld ended

070225 21:25:55 mysqld started
070225 21:25:55 InnoDB: Started; log sequence number 0 43634
/usr/libexec/mysqld: ready for connections.
Version: ‘4.1.20’ socket: ‘/data/mysql/mysql.sock’ port: 3306 Source distribution

I do not understand this behavior !!!!!!!

[root@www1 RPMS]# mysql -u root -p
Enter password:
ERROR 2002 (HY000): Can't connect to local MySQL server through socket '/var/lib/mysql/mysql.sock' (2)
[root@www1 RPMS]# vi /etc/my.cnf
[root@www1 RPMS]# mysql -u root -p
[root@www1 RPMS]# man mysql
[root@www1 RPMS]# mysql -u root -S /data/mysql/mysql
mysql/      mysql.sock
[root@www1 RPMS]# mysql -u root -S /data/mysql/mysql.sock -p
Enter password:
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 1 to server version: 4.1.20

Type 'help;' or '\h' for help. Type '\c' to clear the buffer.

mysql> use mysql
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A

Database changed
mysql> select * from hosts;
ERROR 1146 (42S02): Table 'mysql.hosts' doesn't exist
mysql> select * from host;
Empty set (0.00 sec)

mysql> quit
  

This is nonsense !!!!!!!!!!!!!

Here is the reason:

The /etc/init.d/mysqld script has this section:

# Spin for a maximum of N seconds waiting for the server to come up.
# Rather than assuming we know a valid username, accept an "access
# denied" response as meaning the server is functioning.
echo "result of starting mysqld_safe is $ret"
if [ $ret -eq 0 ]; then
    STARTTIMEOUT=30
    while [ $STARTTIMEOUT -gt 0 ]; do
        RESPONSE=`/usr/bin/mysqladmin -uUNKNOWN_MYSQL_USER ping 2>&1` && break
        echo "$RESPONSE" | grep -q "Access denied for user" && break
        sleep 1
        let STARTTIMEOUT=${STARTTIMEOUT}-1
    done
    if [ $STARTTIMEOUT -eq 0 ]; then
        echo "Timeout error occurred trying to start MySQL Daemon."
        action $"Starting $prog: " /bin/false
    else
        action $"Starting $prog: " /bin/true
    fi
else
    action $"Starting $prog: " /bin/false
fi

The cause of the error / FAILED status is this line:

RESPONSE=`/usr/bin/mysqladmin -uUNKNOWN_MYSQL_USER ping 2>&1` && break

The problem is that the mysqld_safe line in this script (not shown here) is told to load its settings from the defaults file /etc/my.cnf, but the RESPONSE line shown here does not account for the socket path and keeps looking for a socket at the default location, /var/lib/mysql/mysql.sock, whereas we have moved the entire /var/lib/mysql to /data, so to speak. The solution is to pass "--socket=$socketfile" or "-S $socketfile" to mysqladmin in the RESPONSE line, changing it to:

RESPONSE=`/usr/bin/mysqladmin -S $socketfile -uUNKNOWN_MYSQL_USER ping 2>&1` && break

After saving this file you can easily start and stop mysql. Make sure that you copy the modified script to the other node as well.

[root@www1 ~]# scp /etc/init.d/mysqld 192.168.0.202:/etc/init.d/
mysqld 100% 4619 4.5KB/s 00:00
[root@www1 ~]#

[root@www1 RPMS]# service mysqld start
datadir is /data/mysql
socket file is /data/mysql/mysql.sock
result of starting mysqld_safe is 0
Starting MySQL: [ OK ]
[root@www1 RPMS]#

Now on both Web servers start the DRBD and heartbeat services:

**Node 1:**

[root@www1 RPMS]# chkconfig --level 35 drbd on  
[root@www1 RPMS]# chkconfig --level 35 heartbeat on  
  
[root@www1 RPMS]# service drbd start  

Starting DRBD resources: [ ].

[root@www1 RPMS]# service heartbeat start
Starting High-Availability services:
2007/02/25_22:05:36 INFO: IPaddr Running OK
2007/02/25_22:05:36 CRITICAL: Resource IPaddr::192.168.0.203/24/eth0 is active, and should not be!
2007/02/25_22:05:36 CRITICAL: Non-idle resources can affect data integrity!
2007/02/25_22:05:36 info: If you don’t know what this means, then get help!
2007/02/25_22:05:36 info: Read the docs and/or source to /usr/lib/heartbeat/ResourceManager for more details.
CRITICAL: Resource IPaddr::192.168.0.203/24/eth0 is active, and should not be!
CRITICAL: Non-idle resources can affect data integrity!
info: If you don’t know what this means, then get help!
info: Read the docs and/or the source to /usr/lib/heartbeat/ResourceManager for more details.
2007/02/25_22:05:36 CRITICAL: Non-idle resources will affect resource takeback!
2007/02/25_22:05:36 CRITICAL: Non-idle resources may affect data integrity!
[ OK ]

This error is because of the /etc/sysconfig/network-scripts/ifcfg-lo:0 file: the IP 192.168.0.203 is active on lo:0, which I had set up incorrectly. It was not required in the first place. So:

Both nodes:-

~]# ifdown lo:0

~]# rm /etc/sysconfig/network-scripts/ifcfg-lo:0
rm: remove regular file `/etc/sysconfig/network-scripts/ifcfg-lo:0’? y

Now stop heartbeat and drbd on all nodes and restart them in this order:

First DRBD on Node 1, then on Node 2.

Then heartbeat on Node 1, then on Node 2.

As follows:

Node 1:

[root@www1 RPMS]# service drbd start
Starting DRBD resources: [ d0 s0 n0 ].
……….
***************************************************************
DRBD’s startup script waits for the peer node(s) to appear.
- In case this node was already a degraded cluster before the
reboot the timeout is 120 seconds. [degr-wfc-timeout]
- If the peer was available before the reboot the timeout will
expire after 0 seconds. [wfc-timeout]
(These values are for resource ‘r0’; 0 sec -> wait forever)
To abort waiting enter ‘yes’ [ 35]:

As soon as you see the notice above, you should know that the drbd on the other node is not running. So start it:

Node 2:

[root@www2 extras]# service drbd start
Starting DRBD resources: [ d0 s0 n0 ].

Now start the heartbeat :

Node 1:

[root@www1 RPMS]# service heartbeat start
Starting High-Availability services:
2007/02/25_22:17:32 INFO: IPaddr Resource is stopped
[ OK ]

Node 2:

[root@www2 extras]# service heartbeat start
Starting High-Availability services:
2007/02/25_22:19:30 INFO: IPaddr Resource is stopped
[ OK ]

Now check that httpd and MySQL are alive on www1 as well as the IP and DRBD role:

[root@www1 ~]# ifconfig
eth0 Link encap:Ethernet HWaddr 00:0A:5E:05:97:B4
inet addr:192.168.0.201 Bcast:192.168.0.255 Mask:255.255.255.0
inet6 addr: fe80::20a:5eff:fe05:97b4/64 Scope:Link
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:135182 errors:8764 dropped:0 overruns:0 frame:8764
TX packets:137915 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:38226250 (36.4 MiB) TX bytes:134849046 (128.6 MiB)
Interrupt:5 Base address:0x7080

eth0:0 Link encap:Ethernet HWaddr 00:0A:5E:05:97:B4
inet addr:192.168.0.203 Bcast:192.168.0.255 Mask:255.255.255.0
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
Interrupt:5 Base address:0x7080

lo Link encap:Local Loopback
inet addr:127.0.0.1 Mask:255.0.0.0
inet6 addr: ::1/128 Scope:Host
UP LOOPBACK RUNNING MTU:16436 Metric:1
RX packets:59 errors:0 dropped:0 overruns:0 frame:0
TX packets:59 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:0
RX bytes:6196 (6.0 KiB) TX bytes:6196 (6.0 KiB)

[root@www1 RPMS]# service httpd status
httpd (pid 12986 12985 12984 12983 12977 12976 12975 12974 12973) is running…
[root@www1 RPMS]# service mysqld status
datadir is /data/mysql
socket file is /data/mysql/mysql.sock
mysqld (pid 13156) is running…
[root@www1 RPMS]#

[root@www1 RPMS]# cat /proc/drbd
version: 0.7.21 (api:79/proto:74)
SVN Revision: 2326 build by buildsvn@build-i386, 2006-08-26 20:44:47
0: cs:Connected st:Primary/Secondary ld:Consistent
ns:96 nr:0 dw:32 dr:2695 al:0 bm:2 lo:0 pe:0 ua:0 ap:0

Check the same on Node 2:

[root@www2 extras]# ifconfig
eth0 Link encap:Ethernet HWaddr 00:40:26:5F:5C:60
inet addr:192.168.0.202 Bcast:192.168.0.255 Mask:255.255.255.0
inet6 addr: fe80::240:26ff:fe5f:5c60/64 Scope:Link
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:125148 errors:0 dropped:0 overruns:0 frame:0
TX packets:114584 errors:0 dropped:0 overruns:0 carrier:0
collisions:33839 txqueuelen:1000
RX bytes:132306637 (126.1 MiB) TX bytes:12236087 (11.6 MiB)
Interrupt:5 Base address:0x7080

lo Link encap:Local Loopback
inet addr:127.0.0.1 Mask:255.0.0.0
inet6 addr: ::1/128 Scope:Host
UP LOOPBACK RUNNING MTU:16436 Metric:1
RX packets:55 errors:0 dropped:0 overruns:0 frame:0
TX packets:55 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:0
RX bytes:5670 (5.5 KiB) TX bytes:5670 (5.5 KiB)

[root@www2 extras]# service httpd status
httpd is stopped

[root@www2 extras]# service mysqld status
datadir is /data/mysql
socket file is /data/mysql/mysql.sock
mysqld is stopped
[root@www2 extras]#

[root@www2 extras]# cat /proc/drbd
version: 0.7.21 (api:79/proto:74)
SVN Revision: 2326 build by buildsvn@build-i386, 2006-08-26 20:44:47
0: cs:Connected st:Secondary/Primary ld:Consistent
ns:0 nr:96 dw:96 dr:0 al:0 bm:2 lo:0 pe:0 ua:0 ap:0

Let's try a failover. Note that Node 1 is the master / primary for DRBD, HTTPD and MYSQL at the moment. This should change as soon as we manually "FAIL" Node 1:

[root@www1 RPMS]# service heartbeat stop
Stopping High-Availability services:
[ OK ]
[root@www1 RPMS]#

[root@www1 RPMS]# ifconfig
eth0 Link encap:Ethernet HWaddr 00:0A:5E:05:97:B4
inet addr:192.168.0.201 Bcast:192.168.0.255 Mask:255.255.255.0
inet6 addr: fe80::20a:5eff:fe05:97b4/64 Scope:Link
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:135969 errors:8765 dropped:0 overruns:0 frame:8765
TX packets:138601 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:38329258 (36.5 MiB) TX bytes:134983955 (128.7 MiB)
Interrupt:5 Base address:0x7080

lo Link encap:Local Loopback
inet addr:127.0.0.1 Mask:255.0.0.0
inet6 addr: ::1/128 Scope:Host
UP LOOPBACK RUNNING MTU:16436 Metric:1
RX packets:59 errors:0 dropped:0 overruns:0 frame:0
TX packets:59 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:0
RX bytes:6196 (6.0 KiB) TX bytes:6196 (6.0 KiB)

[root@www1 RPMS]# cat /proc/drbd
version: 0.7.21 (api:79/proto:74)
SVN Revision: 2326 build by buildsvn@build-i386, 2006-08-26 20:44:47
0: cs:Connected st:Secondary/Primary ld:Consistent
ns:159 nr:35 dw:130 dr:2695 al:0 bm:39 lo:0 pe:0 ua:0 ap:0

[root@www1 RPMS]# service httpd status
httpd is stopped

[root@www1 RPMS]# service mysqld status
datadir is /data/mysql
socket file is /data/mysql/mysql.sock
mysqld is stopped
[root@www1 RPMS]#

Check node2 and status of all applications:

[root@www2 extras]# ifconfig
eth0 Link encap:Ethernet HWaddr 00:40:26:5F:5C:60
inet addr:192.168.0.202 Bcast:192.168.0.255 Mask:255.255.255.0
inet6 addr: fe80::240:26ff:fe5f:5c60/64 Scope:Link
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:126155 errors:0 dropped:0 overruns:0 frame:0
TX packets:115389 errors:0 dropped:0 overruns:0 carrier:0
collisions:33863 txqueuelen:1000
RX bytes:132459094 (126.3 MiB) TX bytes:12356739 (11.7 MiB)
Interrupt:5 Base address:0x7080

eth0:0 Link encap:Ethernet HWaddr 00:40:26:5F:5C:60
inet addr:192.168.0.203 Bcast:192.168.0.255 Mask:255.255.255.0
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
Interrupt:5 Base address:0x7080

lo Link encap:Local Loopback
inet addr:127.0.0.1 Mask:255.0.0.0
inet6 addr: ::1/128 Scope:Host
UP LOOPBACK RUNNING MTU:16436 Metric:1
RX packets:57 errors:0 dropped:0 overruns:0 frame:0
TX packets:57 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:0
RX bytes:5838 (5.7 KiB) TX bytes:5838 (5.7 KiB)

[root@www2 extras]# cat /proc/drbd
version: 0.7.21 (api:79/proto:74)
SVN Revision: 2326 build by buildsvn@build-i386, 2006-08-26 20:44:47
0: cs:Connected st:Primary/Secondary ld:Consistent
ns:35 nr:159 dw:194 dr:2631 al:0 bm:2 lo:0 pe:0 ua:0 ap:0

[root@www2 extras]# service httpd status
httpd (pid 11668 11667 11666 11665 11664 11663 11662 11659 11658) is running…

[root@www2 extras]# service mysqld status
datadir is /data/mysql
socket file is /data/mysql/mysql.sock
mysqld (pid 11841) is running…
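The manual checks above boil down to a few questions per node, the first being whether the node currently holds the shared address 192.168.0.203. Here is a small sketch of that one check, working on ifconfig-style text from standard input. `has_addr` is a hypothetical helper name, and the "inet addr:" format is assumed to be the one shown in the listings above.

```shell
#!/bin/sh
# has_addr ADDR: succeed if ifconfig-style text on stdin lists "inet addr:ADDR".
# Hypothetical helper; assumes the net-tools output format shown in this howto.
has_addr() {
    # -F matches the string literally; the trailing space keeps 192.168.0.20
    # from matching 192.168.0.203.
    grep -qF "inet addr:$1 "
}

# On a real node:
#   ifconfig | has_addr 192.168.0.203 && echo "this node holds the service IP"
```

The same pattern (read the command output, test for one string) can be repeated for the `st:Primary` marker in /proc/drbd and for the "is running" text from the service status commands.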

Now Apache website and mysql transaction testing:

Since Node 2 is primary at the moment, let's create a db, a table and a sample record on it.

[root@www2 extras]# mysql -u root -S /data/mysql/mysql.sock

Welcome to the MySQL monitor. Commands end with ; or \g.
Your MySQL connection id is 2 to server version: 4.1.20

Type 'help;' or '\h' for help. Type '\c' to clear the buffer.

mysql> create database kamitest;
Query OK, 1 row affected (0.02 sec)

mysql> use kamitest;
Database changed
mysql> create table students (id int, name varchar(10));
Query OK, 0 rows affected (0.02 sec)

mysql> insert into students values (1,'Imran');
Query OK, 1 row affected (0.00 sec)

mysql> quit
Bye
[root@www2 extras]#

Let's bring Node 1 back up and then manually FAIL Node 2:

[root@www1 RPMS]# service heartbeat start
Starting High-Availability services:
2007/02/25_22:38:00 INFO: IPaddr Resource is stopped
[ OK ]

Note: Since auto_failback is off, Node 1 will NOT take the resources back from Node 2. It will just wait for a failover to occur.

Let’s fail node 2.

[root@www2 extras]# service heartbeat stop
Stopping High-Availability services:
[ OK ]
[root@www2 extras]#

Check the status of httpd and mysql on node 2 to make sure heartbeat has stopped them:
[root@www2 extras]# service httpd status
httpd is stopped
[root@www2 extras]# service mysqld status
datadir is /data/mysql
socket file is /data/mysql/mysql.sock
mysqld is stopped
[root@www2 extras]#

Now our node 1 is active, we should have data in the mysql table we created earlier.

[root@www1 RPMS]# mysql -u root
ERROR 2002 (HY000): Can't connect to local MySQL server through socket '/var/lib/mysql/mysql.sock' (2)

[root@www1 RPMS]# mysql -u root -S /data/mysql/mysql.sock
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 2 to server version: 4.1.20

Type 'help;' or '\h' for help. Type '\c' to clear the buffer.

mysql> use kamitest;
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A

Database changed
mysql> select * from students;
+------+-------+
| id   | name  |
+------+-------+
|    1 | Imran |
+------+-------+
1 row in set (0.00 sec)

mysql> quit
  

Great!

But why did it not work without the socket option? Will investigate later. Maybe the mysql client is not intelligent enough? But the man page says:

Default options are read from the following files in the given order:
/etc/my.cnf /var/lib/mysql/my.cnf ~/.my.cnf

This means that the mysql client program "should" be able to pick up the socket option from /etc/my.cnf! Wait a minute! Do we have a [mysql] section in /etc/my.cnf?

[root@www1 RPMS]# cat /etc/my.cnf
[mysqld]
# datadir=/var/lib/mysql # original
datadir=/data/mysql
# socket=/var/lib/mysql/mysql.sock # original
socket=/data/mysql/mysql.sock
# Default to using old password format for compatibility with mysql 3.x
# clients (those using the mysqlclient10 compatibility package).
old_passwords=1

[mysql.server]
user=mysql
# basedir=/var/lib # original
basedir=/data

[mysqld_safe]
err-log=/var/log/mysqld.log
pid-file=/var/run/mysqld/mysqld.pid
[root@www1 RPMS]#

Answer: No. So maybe we need to add that section? Let's see.

Add this to /etc/my.cnf on both nodes:

[mysql]
socket=/data/mysql/mysql.sock

save the file and check mysql client:

[root@www2 ~]# mysql -u root
Welcome to the MySQL monitor. Commands end with ; or \g.
Your MySQL connection id is 2 to server version: 4.1.20

Type 'help;' or '\h' for help. Type '\c' to clear the buffer.

mysql> quit
Bye
[root@www2 ~]#
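Option files are read per section: the server takes its settings from [mysqld], while the mysql client only looks at its own groups, which is why the socket path had to be repeated under [mysql]. (The [client] group is read by all the standard client programs, so it is another place this setting could live.) To see which sections of a file define a given key, a rough sketch like the following can help. `mycnf_get` is a hypothetical helper name; it understands only simple key=value lines and ignores include directives.

```shell
#!/bin/sh
# mycnf_get FILE SECTION KEY: print the last value of KEY inside [SECTION].
# Exits non-zero if the key is not set in that section.
# Hypothetical helper for illustration; it handles only simple key=value
# lines and ignores !include directives.
mycnf_get() {
    awk -v section="$2" -v key="$3" '
        /^[ \t]*#/ { next }                       # skip comment lines
        /^[ \t]*\[/ {                             # section header, e.g. [mysqld]
            name = $0
            gsub(/^[ \t]*\[|\][ \t]*$/, "", name)
            in_section = (name == section)
            next
        }
        in_section {
            split($0, kv, "=")
            k = kv[1]
            gsub(/[ \t]/, "", k)
            if (k == key) {
                value = $0
                sub(/^[^=]*=[ \t]*/, "", value)
                sub(/[ \t]+$/, "", value)
                found = 1
            }
        }
        END { if (found) print value; else exit 1 }
    ' "$1"
}

# Example: which sections of /etc/my.cnf define a socket?
# for s in mysqld mysql client; do
#     printf '%s: ' "$s"
#     mycnf_get /etc/my.cnf "$s" socket || echo '(not set)'
# done
```

Running the commented loop on both nodes is a quick way to confirm that the two copies of /etc/my.cnf agree.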

Alhumdulillah! Great! Without being given the socket path, the client found the server.

Make sure the updated /etc/my.cnf is on both nodes, then try a failover again. It should work.