WBITT

We Bring In Tomorrows Technology

MySQL+Apache High Availability

Apache, MySQL High Availability
====================
Author: Muhammad Kamran Azeem
OS: CentOS 4.4
Date: Mid 2006

This is a very rough-edged howto that I wrote for a MySQL and Apache HA setup at home. It is basically a log of the steps I performed, and I am putting it here for the benefit of everyone. This was done in mid 2006, so you may need to adjust a few things as technology has advanced.

 

Setup DNS:

vi /var/named/chroot/var/named/example.com.fwd  

$ORIGIN example.com.
$TTL    86400
@       IN SOA  homeserver.example.com.    [e-mail address withheld] . (
20060421        ; serial (d. adams)
3H              ; refresh
15M             ; retry
1W              ; expiry
1D )            ; minimum

@       IN NS           homeserver.homedomain.com.
@       IN MX  10       homeserver.homedomain.com.

www1.example.com.       IN      A       192.168.0.201
www2.example.com.       IN      A       192.168.0.202
www.example.com.        IN      A       192.168.0.203

----------------------------------------------------

vi /var/named/chroot/var/named/0.168.192.in-addr.arpa.zone

$ORIGIN 0.168.192.in-addr.arpa.
$TTL    86400
@       IN SOA  homeserver.homedomain.com.    [e-mail address withheld] . (
20060323        ; serial (d. adams)
3H              ; refresh
15M             ; retry
1W              ; expiry
1D )            ; minimum

@       IN NS           homeserver.homedomain.com.
@       IN MX  10       homeserver.homedomain.com.

254.0.168.192.in-addr.arpa.     IN      PTR     homeserver.homedomain.com.
201.0.168.192.in-addr.arpa.     IN      PTR     www1.example.com.
202.0.168.192.in-addr.arpa.     IN      PTR     www2.example.com.
203.0.168.192.in-addr.arpa.     IN      PTR     www.example.com.


service named restart
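
Each PTR record in the reverse zone above is simply the address with its octets reversed under in-addr.arpa. As a small cross-check against typos, the sketch below derives that owner name from an IPv4 address. The function name ptr_name is my own invention for illustration and is not part of BIND.

```shell
# ptr_name: print the in-addr.arpa owner name for a dotted-quad IPv4 address.
# Hypothetical helper for illustration only; not a BIND tool.
ptr_name() {
    # Split the address on dots and emit the four octets in reverse order.
    echo "$1" | awk -F. '{ printf "%s.%s.%s.%s.in-addr.arpa.\n", $4, $3, $2, $1 }'
}
```

For example, "ptr_name 192.168.0.203" prints 203.0.168.192.in-addr.arpa., which is the owner name on the www.example.com PTR line.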


--------------------------

On both Web servers (this step is not required; it creates a problem with heartbeat):

vi  /etc/sysconfig/network-scripts/ifcfg-lo\:0

DEVICE=lo:0
BOOTPROTO=static
IPADDR=192.168.0.203
NETMASK=255.255.255.255
ONBOOT=yes
NAME=loopback

----------------------

Both Web servers:

vi /etc/sysctl.conf

. . .
. . .
# When an arp request is received on eth0, only respond if that address is
# configured on eth0. In particular, do not respond if the address is
# configured on lo
net.ipv4.conf.eth0.arp_ignore = 1

# Ditto for eth1, add for all ARPing interfaces
#net.ipv4.conf.eth1.arp_ignore = 1


# Enable configuration of arp_announce option
net.ipv4.conf.all.arp_announce = 2

# When making an ARP request sent through eth0 Always use an address that
# is configured on eth0 as the source address of the ARP request.  If this
# is not set, and packets are being sent out eth0 for an address that is on
# lo, and an arp request is required, then the address on lo will be used.
# As the source IP address of arp requests is entered into the ARP cache on
# the destination, it has the effect of announcing this address.  This is
# not desirable in this case as addresses on lo on the real-servers should
# be announced only by the linux-director.
net.ipv4.conf.eth0.arp_announce = 2

# Ditto for eth1, add for all ARPing interfaces
#net.ipv4.conf.eth1.arp_announce = 2


Then:

sysctl -p
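
Because sysctl.conf mixes active and commented-out settings, it is easy to misread which ARP values are really set. The sketch below prints the value of one key from a sysctl.conf-style file and ignores commented lines. The helper name conf_value is my own, and it assumes the simple "key = value" layout shown above.

```shell
# conf_value KEY FILE: print the last uncommented value of KEY in FILE.
# Hypothetical helper; assumes one "key = value" setting per line.
conf_value() {
    awk -F= -v key="$1" '
        /^[ \t]*#/ { next }                      # skip commented-out settings
        {
            k = $1; gsub(/[ \t]/, "", k)         # key with whitespace removed
            if (k == key) { v = $2; gsub(/[ \t]/, "", v); val = v }
        }
        END { if (val != "") print val }
    ' "$2"
}
```

After "sysctl -p" you can compare "conf_value net.ipv4.conf.eth0.arp_ignore /etc/sysctl.conf" with the running value reported by "sysctl -n net.ipv4.conf.eth0.arp_ignore".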

-----------------------

Create two partitions on each of the two servers. They must be the SAME size on both servers:


one for the DRBD meta-data (not less than 128 MB)

one for /data (the partition holding the data that needs to be replicated; I created a 200MB partition, only for testing)


Node 1:

~]# fdisk -l

Disk /dev/hda: 8455 MB, 8455200768 bytes
255 heads, 63 sectors/track, 1027 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

Device Boot      Start         End      Blocks   Id  System
/dev/hda1   *           1         510     4096543+  83  Linux
/dev/hda3             995        1027      265072+  82  Linux swap
/dev/hda4             511         994     3887730    5  Extended
/dev/hda5             511         529      152586   83  Linux
/dev/hda6             530         554      200781   83  Linux



Node 2:

~]# fdisk -l

Disk /dev/hda: 8455 MB, 8455200768 bytes
255 heads, 63 sectors/track, 1027 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

Device Boot      Start         End      Blocks   Id  System
/dev/hda1   *           1         510     4096543+  83  Linux
/dev/hda2             511         543      265072+  82  Linux swap
/dev/hda4             544        1027     3887730    5  Extended
/dev/hda5             544         562      152586   83  Linux
/dev/hda6             563         587      200781   83  Linux
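
Since the partitions must be the same size on both nodes, it helps to compare the Blocks column rather than the cylinder ranges, which differ between the two listings above. The sketch below pulls the block count for one device out of "fdisk -l" text on standard input. The helper name part_blocks is hypothetical, and it assumes the old column layout shown in this howto.

```shell
# part_blocks DEVICE: read "fdisk -l" output on stdin and print the Blocks
# value for DEVICE. Hypothetical helper; assumes the column layout shown above.
part_blocks() {
    awk -v dev="$1" '
        $1 == dev {
            # A bootable partition has an extra "*" column after the device name.
            b = ($2 == "*") ? $5 : $4
            sub(/\+$/, "", b)        # drop the trailing "+" fdisk adds to odd sizes
            print b
        }'
}
```

Run "fdisk -l /dev/hda | part_blocks /dev/hda6" on each node; in the listings above both nodes give 200781 for hda6 and 152586 for hda5.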




------------------


Make sure you "DO NOT" mount either of these two partitions on either server, and do not list them in /etc/fstab.

---------------------

NTP:

The time should be the same on both nodes, ideally synchronized to an external time source.

I use my homeserver as the time source for all nodes, as it runs the ntpd service. So, on both nodes:


~]# vi /etc/ntp.conf

...
server 192.168.0.254
....


Make sure the service ntpd is not running on nodes right now, then:

~]# ntpdate -u 192.168.0.254
~]# ntpdate -u 192.168.0.254
~]# ntpdate -u 192.168.0.254


~]# service ntpd start
Starting ntpd:                                             [  OK  ]

~]# chkconfig --level 35 ntpd on

------------------


DRBD installation on both Web servers


Both Nodes:

~]# rpm -ivh drbd-0.7.23-1.el4.centos.i386.rpm
~]# rpm -ivh kernel-module-drbd-2.6.9-42.EL-0.7.21-1.c4.i686.rpm

-------------------------------------

On BOTH Web servers:

vi /etc/drbd.conf

resource r0 {
  protocol C;
  incon-degr-cmd "halt -f";

  startup {
    degr-wfc-timeout 120;    # 2 minutes.
  }

  disk {
    on-io-error   detach;
  }

  net {
  }

  syncer {
    rate 10M;
    group 1;
    al-extents 257;
  }

  on www1.example.com {          # ** EDIT ** the hostname of server 1 (uname -n)
    device     /dev/drbd0;        #
    disk       /dev/hda6;         # ** EDIT ** data partition on server 1
    address    192.168.0.201:7788; # ** EDIT ** IP address on server 1
    meta-disk  /dev/hda5[0];      # ** EDIT ** 128MB partition for DRBD on server 1
  }

  on www2.example.com {          # ** EDIT ** the hostname of server 2 (uname -n)
    device    /dev/drbd0;         #
    disk      /dev/hda6;          # ** EDIT ** data partition on server 2
    address   192.168.0.202:7788;  # ** EDIT ** IP address on server 2
    meta-disk /dev/hda5[0];       # ** EDIT ** 128MB partition for DRBD on server 2
  }
}


-------------------------


Both Web servers:

drbdadm up all

--------------

Check the status of /proc/drbd on both Web servers and you should see something like:


~]# cat /proc/drbd

[root@www1 ~]# cat /proc/drbd
version: 0.7.21 (api:79/proto:74)
SVN Revision: 2326 build by buildsvn@build-i386, 2006-08-26 20:44:47
0: cs:Connected st:Secondary/Secondary ld:Consistent
ns:0 nr:4 dw:4 dr:0 al:0 bm:1 lo:0 pe:0 ua:0 ap:0
1: cs:Unconfigured


[root@www2 ~]# cat /proc/drbd
version: 0.7.21 (api:79/proto:74)
SVN Revision: 2326 build by buildsvn@build-i386, 2006-08-26 20:44:47
0: cs:Connected st:Secondary/Secondary ld:Consistent
ns:4 nr:0 dw:0 dr:4 al:0 bm:1 lo:0 pe:0 ua:0 ap:0
1: cs:Unconfigured
[root@www2 ~]#

--------------------------------------


On a fresh setup, both Web servers report that they are Secondary and that the data is Inconsistent, because no initial sync has been made yet. (The output above shows ld:Consistent, presumably because it was captured after these disks had already been synced once.)

I want to make WWW1 the primary Web server and WWW2 the "hot standby". If WWW1 fails, WWW2 takes over; when WWW1 comes back, all data that changed in the meantime is mirrored back from WWW2 to WWW1, so that the data is always consistent.



Only on WWW1:-

~]# drbdadm -- --do-what-I-say primary all

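This starts the sync process. (The sample output below comes from hosts named nfs1 and nfs2 running DRBD 0.7.17; the fields read the same way on www1 and www2.)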

[root@nfs1 extras]# cat /proc/drbd
version: 0.7.17 (api:77/proto:74)
SVN Revision: 2093 build by buildcentos@build-i386, 2006-04-13 14:38:33
0: cs:SyncSource st:Primary/Secondary ld:Consistent
ns:18548 nr:0 dw:0 dr:19468 al:0 bm:27 lo:0 pe:25 ua:230 ap:0
[==>.................] sync'ed: 12.0% (182332/200784)K
finish: 0:04:20 speed: 668 (540) K/sec
1: cs:Unconfigured
[root@nfs1 extras]#


This will take some time. I used only a 200MB data partition; on larger partitions it may take hours!



While the sync is running, you can follow its progress in /proc/drbd on both servers. Here is the other node:

[root@nfs2 extras]# cat /proc/drbd
version: 0.7.17 (api:77/proto:74)
SVN Revision: 2093 build by buildcentos@build-i386, 2006-04-13 14:38:33
0: cs:SyncTarget st:Secondary/Primary ld:Inconsistent
ns:0 nr:96776 dw:96776 dr:0 al:0 bm:31 lo:0 pe:245 ua:0 ap:0
[===>................] sync'ed: 19.4% (104008/123188)K
finish: 0:04:20 speed: 176 (636) K/sec
1: cs:Unconfigured
[root@nfs2 extras]#


Notice the role change in both servers above ----^


The following shows that Syncing is complete:

[root@www1 ~]# cat /proc/drbd
version: 0.7.21 (api:79/proto:74)
SVN Revision: 2326 build by buildsvn@build-i386, 2006-08-26 20:44:47
0: cs:Connected st:Primary/Secondary ld:Consistent
ns:0 nr:0 dw:4 dr:0 al:0 bm:1 lo:0 pe:0 ua:0 ap:0
1: cs:Unconfigured


[root@www2 ~]# cat /proc/drbd
version: 0.7.21 (api:79/proto:74)
SVN Revision: 2326 build by buildsvn@build-i386, 2006-08-26 20:44:47
0: cs:Connected st:Secondary/Primary ld:Consistent
ns:4 nr:0 dw:0 dr:4 al:0 bm:1 lo:0 pe:0 ua:0 ap:0
1: cs:Unconfigured
[root@www2 ~]#
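
Reading /proc/drbd by eye works, but a script that waits for the initial sync needs the cs, st and ld fields on their own. The sketch below extracts one field for a given device and checks for the "Connected / Consistent" state shown above. Both function names are hypothetical, and the parsing assumes the DRBD 0.7 status format printed in this howto.

```shell
# drbd_field MINOR KEY: read /proc/drbd-style text on stdin and print the
# value of KEY (cs, st or ld) for device MINOR. Hypothetical helper.
drbd_field() {
    awk -v dev="$1:" -v key="$2" '
        $1 == dev {
            for (i = 2; i <= NF; i++) {
                n = index($i, ":")
                if (n > 0 && substr($i, 1, n - 1) == key) print substr($i, n + 1)
            }
        }'
}

# drbd_synced MINOR [FILE]: succeed when the device is connected and consistent.
# FILE defaults to /proc/drbd; passing another file is useful for dry runs.
drbd_synced() {
    status_file=${2:-/proc/drbd}
    [ "$(drbd_field "$1" cs < "$status_file")" = "Connected" ] &&
    [ "$(drbd_field "$1" ld < "$status_file")" = "Consistent" ]
}
```

A loop such as "while ! drbd_synced 0; do sleep 10; done" then blocks until the sync has finished.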


---------------------

Now you will need to format this newly created data partition:

On WWW1 only:

~]# mke2fs -v  -j /dev/drbd0

---------------------------


Now we want to make a directory "/data" on both Web servers and mount the DRBD data partition on it. Then we will create /data/sites for Web content. MySQL data will live in /data/mysql, but you do not need to create that directory by hand, for the reason discussed below.

If you create a symbolic link between /data/mysql and /var/lib/mysql, it will NOT work: MySQL WILL FAIL TO START every time, because it expects a real directory. The solution is to edit /etc/my.cnf and change the paths from "/var/lib" to "/data". When mysqld is then started, it automatically creates the /data/mysql directory and starts fine. Thanks to Naveed Ahmad for pointing this out.

~]# vi /etc/my.cnf

[mysqld]
# datadir=/var/lib/mysql # original
datadir=/data/mysql
# socket=/var/lib/mysql/mysql.sock # original
socket=/data/mysql/mysql.sock
# Default to using old password format for compatibility with mysql 3.x
# clients (those using the mysqlclient10 compatibility package).
old_passwords=1

[mysql.server]
user=mysql
# basedir=/var/lib # original
basedir=/data

[mysqld_safe]
err-log=/var/log/mysqld.log
pid-file=/var/run/mysqld/mysqld.pid



Proof of failure (using the original my.cnf):

[root@www1 extras]# service mysqld start
Initializing MySQL database:                               [  OK  ]
Timeout error occurred trying to start MySQL Daemon.
Starting MySQL:                                            [FAILED]
You have mail in /var/spool/mail/root
[root@www1 extras]# service mysqld start
Timeout error occurred trying to start MySQL Daemon.
Starting MySQL:                                            [FAILED]
[root@www1 extras]#


And from the mysqld.log file:

[root@www1 ~]# tail -f /var/log/mysqld.log
070225 21:08:01  InnoDB: Starting log scan based on checkpoint at
InnoDB: log sequence number 0 43634.
InnoDB: Doing recovery: scanned up to log sequence number 0 43634
070225 21:08:01  InnoDB: Flushing modified pages from the buffer pool...
070225 21:08:01  InnoDB: Started; log sequence number 0 43634
070225 21:08:01 [ERROR] /usr/libexec/mysqld: Can't find file: './mysql/host.frm' (errno: 13)
070225 21:08:01 [ERROR] /usr/libexec/mysqld: Can't find file: './mysql/host.frm' (errno: 13)
070225 21:08:01 [ERROR] Fatal error: Can't open and lock privilege tables: Can't find file: './mysql/host.frm' (errno: 13)
070225 21:08:01  mysqld ended







On both Web Servers:

~]# mkdir /data


Then, on WWW1 only, mount the partition on the /data mount point:

~]# mount -t ext3 /dev/drbd0 /data


Check using "mount" and "df -h" commands:

[root@www1 ~]# mount
. . .
. . .

/dev/drbd0 on /data type ext3 (rw)


[root@www1 ~]# df -h
Filesystem            Size  Used Avail Use% Mounted on
/dev/hda1             3.9G  711M  3.0G  20% /
none                   58M     0   58M   0% /dev/shm
/dev/drbd0            190M  5.6M  175M   4% /data



Now on node 1:

[root@www1 ~]# mkdir /data/sites/example.com -p


On node 2 we have not mounted /data yet, so the sites and mysql directories do not exist there yet either.

===================================================  

Now let's install HeartBeat on both nodes:

First some required packages from CentOS main RPMS:

~]# rpm -ivh curl-7.12.1-8.rhel4.i386.rpm perl-Crypt-SSLeay-0.51-5.i386.rpm perl-HTML-Parser-3.35-6.i386.rpm perl-LDAP-0.31-5.noarch.rpm perl-Net-DNS-0.48-1.i386.rpm perl-libwww-perl-5.79-5.noarch.rpm libidn-0.5.6-1.i386.rpm perl-Convert-ASN1-0.18-3.noarch.rpm perl-Digest-HMAC-1.01-13.noarch.rpm perl-HTML-Tagset-3.03-30.noarch.rpm perl-URI-1.30-4.noarch.rpm perl-XML-SAX-0.12-7.noarch.rpm perl-Digest-SHA1-2.07-5.i386.rpm perl-XML-NamespaceSupport-1.08-6.noarch.rpm  gnutls-1.0.20-3.2.2.i386.rpm libglade2-2.4.0-5.i386.rpm lm_sensors-2.8.7-2.40.3.i386.rpm pygtk2-2.4.0-1.i386.rpm atk-1.8.0-2.i386.rpm gtk2-2.4.13-19.i386.rpm pango-1.6.0-9.i386.rpm


then HeartBeat packages from CentOS extras:
~]# rpm -ivh perl-Mail-IMAPClient-2.2.9-1.rf.noarch.rpm ipvsadm-1.24-6.i386.rpm heartbeat-2.0.7-1.c4.i386.rpm heartbeat-ldirectord-2.0.7-1.c4.i386.rpm heartbeat-pils-2.0.7-1.c4.i386.rpm heartbeat-stonith-2.0.7-1.c4.i386.rpm  perl-MailTools-1.74-1.c4.noarch.rpm perl-Net-IMAP-Simple-1.16-1.c4.noarch.rpm perl-Net-IMAP-Simple-SSL-1.3-1.c4.noarch.rpm perl-Mail-POP3Client-2.17-1.c4.noarch.rpm perl-TimeDate-1.16-1.c4.noarch.rpm perl-IO-Socket-SSL-1.01-1.c4.noarch.rpm perl-Net-SSLeay-1.25-3.rf.i386.rpm


----------------------------

BOTH Web servers:

vi /etc/ha.d/ha.cf

logfacility     local0
keepalive 2
#deadtime 30 # USE THIS!!!
deadtime 10
#bcast   eth0
serial /dev/ttyS0
baud 19200
auto_failback off
node www1.example.com www2.example.com


-----------------


WWW1:

vi /etc/ha.d/haresources

www1.example.com  IPaddr::192.168.0.203/24/eth0 drbddisk::r0 Filesystem::/dev/drbd0::/data::ext3 httpd mysqld

WWW2:

vi /etc/ha.d/haresources

www2.example.com  IPaddr::192.168.0.203/24/eth0 drbddisk::r0 Filesystem::/dev/drbd0::/data::ext3 httpd mysqld

---------------


Both nodes:

~]# vi /etc/ha.d/authkeys

auth 3
3 md5 redhat


~]# chmod 600 /etc/ha.d/authkeys
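
The example above uses the word "redhat" as the shared secret, which is fine for a home test. For anything less private you may prefer a random key. The sketch below writes an authkeys file with a random 128-bit key and the sha1 method, which heartbeat accepts alongside md5. The function name write_authkeys and the choice of key index 1 are my own assumptions.

```shell
# write_authkeys FILE: create a heartbeat authkeys file with a random key.
# Hypothetical helper; the howto's real file is /etc/ha.d/authkeys.
write_authkeys() {
    # 16 random bytes rendered as 32 hex characters.
    hb_key=$(head -c 16 /dev/urandom | od -An -tx1 | tr -d ' \n')
    printf 'auth 1\n1 sha1 %s\n' "$hb_key" > "$1"
    # heartbeat refuses to start if the file is readable by others.
    chmod 600 "$1"
}
```

The key must be identical on both nodes, so generate the file once on node 1 and copy it to node 2 with scp, as is done for the other configuration files in this howto.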

--------------------------


Just to make things really interesting, configure httpd.conf on both servers and set the following directives:

ServerName, DocumentRoot


Node1:

ServerName www1.example.com:80
. . .
. . .
DocumentRoot "/data/sites/example.com"


<Directory "/data/sites/example.com">
Options Indexes FollowSymLinks
AllowOverride None
Order allow,deny
Allow from all
</Directory>



Node2:

ServerName www2.example.com:80
. . .
. . .
DocumentRoot "/data/sites/example.com"


<Directory "/data/sites/example.com">
Options Indexes FollowSymLinks
AllowOverride None
Order allow,deny
Allow from all
</Directory>



~]# echo "

www.example.com served using DRBD and heartbeat.

" > /data/sites/example.com/index.html



Let's start the Apache service on both nodes to make sure things are in order.

[root@www1 extras]# service httpd restart
Stopping httpd:                                            [FAILED]
Starting httpd:                                            [  OK  ]
[root@www1 extras]#


If it runs, good. Shut it down again, as heartbeat will control it.

Great! Now on node 2:

[root@www2 extras]# service httpd restart
Stopping httpd:                                            [FAILED]
Starting httpd: Syntax error on line 266 of /etc/httpd/conf/httpd.conf:
DocumentRoot must be a directory
[FAILED]
[root@www2 extras]#


Ouch!

This error is thrown because /data/sites/example.com does not exist on node 2 at this point. Once DRBD fails over to this node, /data WILL become available here, and only then will heartbeat start the httpd service. So this error will not occur at that time; don't worry about it right now.


Also try to start and then stop the mysql service on both nodes. You will encounter the same situation: it succeeds only on node 1 at the moment. Also note that the mysql service will take time to start, as it will be initializing its database on node 1 while DRBD copies the contents to the other node at the same time.

~]# vi /etc/my.cnf
[mysqld]
# datadir=/var/lib/mysql # original
datadir=/data/mysql
# socket=/var/lib/mysql/mysql.sock # original
socket=/data/mysql/mysql.sock
# Default to using old password format for compatibility with mysql 3.x
# clients (those using the mysqlclient10 compatibility package).
old_passwords=1

[mysql.server]
user=mysql
# basedir=/var/lib # original
basedir=/data

[mysqld_safe]
err-log=/var/log/mysqld.log
pid-file=/var/run/mysqld/mysqld.pid


IMPORTANT: Make sure you copy this file to node 2 as well, or disaster will occur.

[root@www1 ~]# scp /etc/my.cnf   192.168.0.202:/etc/




[root@www1 RPMS]# service mysqld start
Initializing MySQL database:                               [  OK  ]
Timeout error occurred trying to start MySQL Daemon.
Starting MySQL:                                            [FAILED]
[root@www1 RPMS]# service mysqld start
Timeout error occurred trying to start MySQL Daemon.
Starting MySQL:                                            [FAILED]
[root@www1 RPMS]# service mysqld status
mysqld (pid 6491) is running...
[root@www1 RPMS]# service mysqld restart
Stopping MySQL:                                            [  OK  ]
Timeout error occurred trying to start MySQL Daemon.
Starting MySQL:                                            [FAILED]
[root@www1 RPMS]# service mysqld status
mysqld (pid 7067) is running...
[root@www1 RPMS]#


The service start says FAILED, the status command says "running", and the log file says:

[root@www1 ~]# tail -f /var/log/mysqld.log
070225 21:20:45  InnoDB: Shutdown completed; log sequence number 0 43634
070225 21:20:45 [Note] /usr/libexec/mysqld: Shutdown complete

070225 21:20:45  mysqld ended

070225 21:24:25  mysqld started
InnoDB: The first specified data file ./ibdata1 did not exist:
InnoDB: a new database to be created!
070225 21:24:25  InnoDB: Setting file ./ibdata1 size to 10 MB
InnoDB: Database physically writes the file full: wait...
070225 21:24:53  InnoDB: Log file ./ib_logfile0 did not exist: new to be created
InnoDB: Setting log file ./ib_logfile0 size to 5 MB
InnoDB: Database physically writes the file full: wait...
070225 21:25:07  InnoDB: Log file ./ib_logfile1 did not exist: new to be created
InnoDB: Setting log file ./ib_logfile1 size to 5 MB
InnoDB: Database physically writes the file full: wait...
070225 21:25:09  mysqld started
070225 21:25:09 [ERROR] Can't start server: Bind on TCP/IP port: Address already in use
070225 21:25:09 [ERROR] Do you already have another mysqld server running on port: 3306 ?
070225 21:25:09 [ERROR] Aborting

070225 21:25:09 [Note] /usr/libexec/mysqld: Shutdown complete

070225 21:25:09  mysqld ended



InnoDB: Doublewrite buffer not found: creating new
InnoDB: Doublewrite buffer created
InnoDB: Creating foreign key constraint system tables
InnoDB: Foreign key constraint system tables created
070225 21:25:30  InnoDB: Started; log sequence number 0 0
/usr/libexec/mysqld: ready for connections.
Version: '4.1.20'  socket: '/data/mysql/mysql.sock'  port: 3306  Source distribution
070225 21:25:52 [Note] /usr/libexec/mysqld: Normal shutdown

070225 21:25:52  InnoDB: Starting shutdown...
070225 21:25:54  InnoDB: Shutdown completed; log sequence number 0 43634
070225 21:25:54 [Note] /usr/libexec/mysqld: Shutdown complete

070225 21:25:54  mysqld ended

070225 21:25:55  mysqld started
070225 21:25:55  InnoDB: Started; log sequence number 0 43634
/usr/libexec/mysqld: ready for connections.
Version: '4.1.20'  socket: '/data/mysql/mysql.sock'  port: 3306  Source distribution




----------

I do not understand this behavior!


[root@www1 RPMS]# mysql -u root -p
Enter password:
ERROR 2002 (HY000): Can't connect to local MySQL server through socket '/var/lib/mysql/mysql.sock' (2)
[root@www1 RPMS]# vi /etc/my.cnf
[root@www1 RPMS]# mysql -u root -p
[root@www1 RPMS]# man mysql
[root@www1 RPMS]# mysql -u root -S /data/mysql/mysql
mysql/      mysql.sock
[root@www1 RPMS]# mysql -u root -S /data/mysql/mysql.sock -p
Enter password:
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 1 to server version: 4.1.20

Type 'help;' or '\h' for help. Type '\c' to clear the buffer.

mysql> use mysql
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A

Database changed
mysql> select * from hosts;
ERROR 1146 (42S02): Table 'mysql.hosts' doesn't exist
mysql> select * from host;
Empty set (0.00 sec)

mysql>quit


This is nonsense!
---------------



Here is the reason:

The /etc/init.d/mysqld script has this section:

# Spin for a maximum of N seconds waiting for the server to come up.
# Rather than assuming we know a valid username, accept an "access
# denied" response as meaning the server is functioning.
echo "result of starting mysqld_safe is $ret"
if [ $ret -eq 0 ]; then
    STARTTIMEOUT=30
    while [ $STARTTIMEOUT -gt 0 ]; do
        RESPONSE=`/usr/bin/mysqladmin -uUNKNOWN_MYSQL_USER ping 2>&1` && break
        echo "$RESPONSE" | grep -q "Access denied for user" && break
        sleep 1
        let STARTTIMEOUT=${STARTTIMEOUT}-1
    done
    if [ $STARTTIMEOUT -eq 0 ]; then
        echo "Timeout error occurred trying to start MySQL Daemon."
        action $"Starting $prog: " /bin/false
    else
        action $"Starting $prog: " /bin/true
    fi
else
    action $"Starting $prog: " /bin/false
fi


The cause of the FAILED status is this line:

RESPONSE=`/usr/bin/mysqladmin -uUNKNOWN_MYSQL_USER ping 2>&1` && break

The mysqld_safe line in this script (not shown here) is told to load its settings from /etc/my.cnf, so the server itself picks up the new socket path. The RESPONSE line, however, does not pass a socket path, so mysqladmin keeps looking for the socket at the default location, /var/lib/mysql/mysql.sock, whereas we have moved everything from /var/lib/mysql to /data/mysql. The server is therefore running, but the probe never reaches it and the script times out. The solution is to pass "--socket $socketfile" or "-S $socketfile" to mysqladmin on the RESPONSE line, changing it to:

RESPONSE=`/usr/bin/mysqladmin -S $socketfile -uUNKNOWN_MYSQL_USER ping 2>&1` && break
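
If you would rather not edit the init script by hand on each node, the same one-line change can be applied with sed. This is only a sketch: the function name add_socket_flag is my own, and it assumes the mysqladmin call appears in the script exactly as quoted above.

```shell
# add_socket_flag: filter that reads an init script on stdin and inserts
# "-S $socketfile" into the mysqladmin probe. Hypothetical helper name.
add_socket_flag() {
    # Single quotes keep $socketfile literal, so the init script itself
    # expands it at run time. A line that is already patched no longer
    # matches the pattern, so running the filter twice is harmless.
    sed 's|/usr/bin/mysqladmin -uUNKNOWN_MYSQL_USER|/usr/bin/mysqladmin -S $socketfile -uUNKNOWN_MYSQL_USER|'
}
```

One cautious way to use it is "add_socket_flag < /etc/init.d/mysqld > /tmp/mysqld.new", then review the result with diff before copying it into place.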


After saving this file you can easily start and stop mysql. Make sure that you copy the modified script to the other node as well.

[root@www1 ~]# scp /etc/init.d/mysqld 192.168.0.202:/etc/init.d/
mysqld                                          100% 4619     4.5KB/s   00:00
[root@www1 ~]#


[root@www1 RPMS]# service mysqld start
datadir is /data/mysql
socket file is /data/mysql/mysql.sock
result of starting mysqld_safe is 0
Starting MySQL:                                            [  OK  ]
[root@www1 RPMS]#
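
The mysql client session earlier in this log also needed "-S /data/mysql/mysql.sock", because my.cnf sets the socket only under [mysqld]. MySQL client programs read a [client] group from the same file, so adding the socket there should spare you the -S flag. The sketch below appends such a group to a my.cnf-style file. The helper name add_client_socket is hypothetical, and I have not tested this on the CentOS 4.4 setup described here.

```shell
# add_client_socket FILE SOCKET: append a [client] group with the socket
# path to FILE, unless the file already has a [client] group.
# Hypothetical helper for illustration.
add_client_socket() {
    if grep -q '^\[client\]' "$1"; then
        echo "[client] group already present in $1; edit it by hand" >&2
        return 1
    fi
    printf '\n[client]\nsocket=%s\n' "$2" >> "$1"
}
```

Usage would be "add_client_socket /etc/my.cnf /data/mysql/mysql.sock" on node 1, followed by copying my.cnf to node 2 again.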


===================

Now on both Web servers start the DRBD and heartbeat services:


Node 1:

[root@www1 RPMS]# chkconfig --level 35 drbd on
[root@www1 RPMS]# chkconfig --level 35 heartbeat on

[root@www1 RPMS]# service drbd start
Starting DRBD resources:    [ ].


[root@www1 RPMS]# service heartbeat start
Starting High-Availability services:
2007/02/25_22:05:36 INFO: IPaddr Running OK
2007/02/25_22:05:36 CRITICAL: Resource IPaddr::192.168.0.203/24/eth0 is active, and should not be!
2007/02/25_22:05:36 CRITICAL: Non-idle resources can affect data integrity!
2007/02/25_22:05:36 info: If you don't know what this means, then get help!
2007/02/25_22:05:36 info: Read the docs and/or source to /usr/lib/heartbeat/ResourceManager for more details.
CRITICAL: Resource IPaddr::192.168.0.203/24/eth0 is active, and should not be!
CRITICAL: Non-idle resources can affect data integrity!
info: If you don't know what this means, then get help!
info: Read the docs and/or the source to /usr/lib/heartbeat/ResourceManager for more details.
2007/02/25_22:05:36 CRITICAL: Non-idle resources will affect resource takeback!
2007/02/25_22:05:36 CRITICAL: Non-idle resources may affect data integrity!
[  OK  ]

This error is caused by the /etc/sysconfig/network-scripts/ifcfg-lo:0 file: the IP 192.168.0.203 is active on lo:0, which I set up by mistake. It was not required in the first place. So:

Both nodes:-

~]# ifdown lo:0

~]# rm /etc/sysconfig/network-scripts/ifcfg-lo\:0
rm: remove regular file `/etc/sysconfig/network-scripts/ifcfg-lo:0'? y


Now stop heartbeat and drbd on all nodes and restart them in this order:

First DRBD on node 1, then on node 2.

Then heartbeat on node 1, then on node 2.

As follows:


Node 1:

[root@www1 RPMS]# service drbd start
Starting DRBD resources:    [ d0 s0 n0 ].
..........
***************************************************************
DRBD's startup script waits for the peer node(s) to appear.
- In case this node was already a degraded cluster before the
reboot the timeout is 120 seconds. [degr-wfc-timeout]
- If the peer was available before the reboot the timeout will
expire after 0 seconds. [wfc-timeout]
(These values are for resource 'r0'; 0 sec -> wait forever)
To abort waiting enter 'yes' [  35]:

As soon as you see the notice above, you should know that the drbd on the other node is not running. So start it:

Node 2:

[root@www2 extras]# service drbd  start
Starting DRBD resources:    [ d0 s0 n0 ].


Now start the heartbeat :


Node 1:

[root@www1 RPMS]# service heartbeat start
Starting High-Availability services:
2007/02/25_22:17:32 INFO: IPaddr Resource is stopped
[  OK  ]

Node 2:

[root@www2 extras]# service heartbeat  start
Starting High-Availability services:
2007/02/25_22:19:30 INFO: IPaddr Resource is stopped
[  OK  ]


Now check that httpd and MySQL are alive on www1 as well as the IP and DRBD role:

[root@www1 ~]# ifconfig
eth0      Link encap:Ethernet  HWaddr 00:0A:5E:05:97:B4
inet addr:192.168.0.201  Bcast:192.168.0.255  Mask:255.255.255.0
inet6 addr: fe80::20a:5eff:fe05:97b4/64 Scope:Link
UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
RX packets:135182 errors:8764 dropped:0 overruns:0 frame:8764
TX packets:137915 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:38226250 (36.4 MiB)  TX bytes:134849046 (128.6 MiB)
Interrupt:5 Base address:0x7080

eth0:0    Link encap:Ethernet  HWaddr 00:0A:5E:05:97:B4
inet addr:192.168.0.203  Bcast:192.168.0.255  Mask:255.255.255.0
UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
Interrupt:5 Base address:0x7080

lo        Link encap:Local Loopback
inet addr:127.0.0.1  Mask:255.0.0.0
inet6 addr: ::1/128 Scope:Host
UP LOOPBACK RUNNING  MTU:16436  Metric:1
RX packets:59 errors:0 dropped:0 overruns:0 frame:0
TX packets:59 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:0
RX bytes:6196 (6.0 KiB)  TX bytes:6196 (6.0 KiB)


[root@www1 RPMS]# service httpd status
httpd (pid 12986 12985 12984 12983 12977 12976 12975 12974 12973) is running...
[root@www1 RPMS]# service mysqld status
datadir is /data/mysql
socket file is /data/mysql/mysql.sock
mysqld (pid 13156) is running...
[root@www1 RPMS]#


[root@www1 RPMS]# cat /proc/drbd
version: 0.7.21 (api:79/proto:74)
SVN Revision: 2326 build by buildsvn@build-i386, 2006-08-26 20:44:47
0: cs:Connected st:Primary/Secondary ld:Consistent
ns:96 nr:0 dw:32 dr:2695 al:0 bm:2 lo:0 pe:0 ua:0 ap:0



Check the same on node 2:

[root@www2 extras]# ifconfig
eth0      Link encap:Ethernet  HWaddr 00:40:26:5F:5C:60
inet addr:192.168.0.202  Bcast:192.168.0.255  Mask:255.255.255.0
inet6 addr: fe80::240:26ff:fe5f:5c60/64 Scope:Link
UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
RX packets:125148 errors:0 dropped:0 overruns:0 frame:0
TX packets:114584 errors:0 dropped:0 overruns:0 carrier:0
collisions:33839 txqueuelen:1000
RX bytes:132306637 (126.1 MiB)  TX bytes:12236087 (11.6 MiB)
Interrupt:5 Base address:0x7080

lo        Link encap:Local Loopback
inet addr:127.0.0.1  Mask:255.0.0.0
inet6 addr: ::1/128 Scope:Host
UP LOOPBACK RUNNING  MTU:16436  Metric:1
RX packets:55 errors:0 dropped:0 overruns:0 frame:0
TX packets:55 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:0
RX bytes:5670 (5.5 KiB)  TX bytes:5670 (5.5 KiB)

[root@www2 extras]# service httpd status
httpd is stopped

[root@www2 extras]# service mysqld status
datadir is /data/mysql
socket file is /data/mysql/mysql.sock
mysqld is stopped
[root@www2 extras]#   


[root@www2 extras]# cat /proc/drbd
version: 0.7.21 (api:79/proto:74)
SVN Revision: 2326 build by buildsvn@build-i386, 2006-08-26 20:44:47
0: cs:Connected st:Secondary/Primary ld:Consistent
ns:0 nr:96 dw:96 dr:0 al:0 bm:2 lo:0 pe:0 ua:0 ap:0




Let's try a failover. Note that node 1 is the master/primary for DRBD, HTTPD and MYSQL at the moment. This should change as soon as we manually "FAIL" node 1:

[root@www1 RPMS]# service heartbeat stop
Stopping High-Availability services:
[  OK  ]
[root@www1 RPMS]#



[root@www1 RPMS]# ifconfig
eth0      Link encap:Ethernet  HWaddr 00:0A:5E:05:97:B4
inet addr:192.168.0.201  Bcast:192.168.0.255  Mask:255.255.255.0
inet6 addr: fe80::20a:5eff:fe05:97b4/64 Scope:Link
UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
RX packets:135969 errors:8765 dropped:0 overruns:0 frame:8765
TX packets:138601 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:38329258 (36.5 MiB)  TX bytes:134983955 (128.7 MiB)
Interrupt:5 Base address:0x7080

lo        Link encap:Local Loopback
inet addr:127.0.0.1  Mask:255.0.0.0
inet6 addr: ::1/128 Scope:Host
UP LOOPBACK RUNNING  MTU:16436  Metric:1
RX packets:59 errors:0 dropped:0 overruns:0 frame:0
TX packets:59 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:0
RX bytes:6196 (6.0 KiB)  TX bytes:6196 (6.0 KiB)

[root@www1 RPMS]# cat /proc/drbd
version: 0.7.21 (api:79/proto:74)
SVN Revision: 2326 build by buildsvn@build-i386, 2006-08-26 20:44:47
0: cs:Connected st:Secondary/Primary ld:Consistent
ns:159 nr:35 dw:130 dr:2695 al:0 bm:39 lo:0 pe:0 ua:0 ap:0

[root@www1 RPMS]# service httpd status
httpd is stopped

[root@www1 RPMS]# service mysqld  status
datadir is /data/mysql
socket file is /data/mysql/mysql.sock
mysqld is stopped
[root@www1 RPMS]#


Check node2 and status of all applications:


[root@www2 extras]# ifconfig
eth0      Link encap:Ethernet  HWaddr 00:40:26:5F:5C:60
inet addr:192.168.0.202  Bcast:192.168.0.255  Mask:255.255.255.0
inet6 addr: fe80::240:26ff:fe5f:5c60/64 Scope:Link
UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
RX packets:126155 errors:0 dropped:0 overruns:0 frame:0
TX packets:115389 errors:0 dropped:0 overruns:0 carrier:0
collisions:33863 txqueuelen:1000
RX bytes:132459094 (126.3 MiB)  TX bytes:12356739 (11.7 MiB)
Interrupt:5 Base address:0x7080

eth0:0    Link encap:Ethernet  HWaddr 00:40:26:5F:5C:60
inet addr:192.168.0.203  Bcast:192.168.0.255  Mask:255.255.255.0
UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
Interrupt:5 Base address:0x7080

lo        Link encap:Local Loopback
inet addr:127.0.0.1  Mask:255.0.0.0
inet6 addr: ::1/128 Scope:Host
UP LOOPBACK RUNNING  MTU:16436  Metric:1
RX packets:57 errors:0 dropped:0 overruns:0 frame:0
TX packets:57 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:0
RX bytes:5838 (5.7 KiB)  TX bytes:5838 (5.7 KiB)

[root@www2 extras]# cat /proc/drbd
version: 0.7.21 (api:79/proto:74)
SVN Revision: 2326 build by buildsvn@build-i386, 2006-08-26 20:44:47
0: cs:Connected st:Primary/Secondary ld:Consistent
ns:35 nr:159 dw:194 dr:2631 al:0 bm:2 lo:0 pe:0 ua:0 ap:0

[root@www2 extras]# service httpd status
httpd (pid 11668 11667 11666 11665 11664 11663 11662 11659 11658) is running...

[root@www2 extras]# service mysqld  status
datadir is /data/mysql
socket file is /data/mysql/mysql.sock
mysqld (pid 11841) is running...
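
The two /proc/drbd listings above differ mainly in the st: field, which shows the local role first and the peer role second. As a rough sketch, that field can be pulled out with a small shell helper instead of being read by eye. The function names drbd_field and drbd_local_role are made up for this example, and the parsing assumes the DRBD 0.7 line layout shown above ("0: cs:... st:... ld:..."); other DRBD versions may print different fields.

```shell
#!/bin/sh
# drbd_field FIELD [FILE]
# Print the value of one field (cs, st or ld) from the device 0 line.
# FILE defaults to /proc/drbd; passing a file makes it easy to try on saved output.
drbd_field() {
    field=$1
    file=${2:-/proc/drbd}
    awk -v f="$field" '
        $1 == "0:" {
            # Fields look like "cs:Connected", "st:Primary/Secondary", ...
            for (i = 2; i <= NF; i++) {
                split($i, kv, ":")
                if (kv[1] == f) { print kv[2]; exit }
            }
        }' "$file"
}

# drbd_local_role [FILE]
# Print only the local role, i.e. the part of st: before the slash.
drbd_local_role() {
    drbd_field st "$1" | cut -d/ -f1
}

# Example use on a node (not executed here):
#   [ "$(drbd_local_role)" = "Primary" ] && echo "this node holds the data"
```

On www2 in the listing above, drbd_local_role would print Primary; on www1 it would print Secondary.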

----------------------------------------------

Now test the Apache website and MySQL transactions:

Since node 2 is primary at the moment, let's create a database, a table and a sample record on it.

[root@www2 extras]# mysql -u root -S /data/mysql/mysql.sock

Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 2 to server version: 4.1.20

Type 'help;' or '\h' for help. Type '\c' to clear the buffer.

mysql> create database kamitest;
Query OK, 1 row affected (0.02 sec)

mysql> use kamitest;
Database changed
mysql> create table students (id int, name varchar(10));
Query OK, 0 rows affected (0.02 sec)

mysql> insert into students values (1,'Imran');
Query OK, 1 row affected (0.00 sec)

mysql> quit
Bye
[root@www2 extras]#
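
The same test data can be loaded non-interactively, which is handy when the failover test is repeated. This is only a minimal sketch: the function name test_sql is hypothetical, and IF NOT EXISTS has been added (it is not in the session above) so that the database and table statements can be run more than once. Each run still inserts another row.

```shell
#!/bin/sh
# test_sql: print the SQL statements used in the manual session above.
test_sql() {
    cat <<'EOF'
CREATE DATABASE IF NOT EXISTS kamitest;
USE kamitest;
CREATE TABLE IF NOT EXISTS students (id INT, name VARCHAR(10));
INSERT INTO students VALUES (1, 'Imran');
EOF
}

# Example use on the primary node (not executed here):
#   test_sql | mysql -u root -S /data/mysql/mysql.sock
```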


Let's bring up heartbeat on node 1, and then manually fail node 2:

[root@www1 RPMS]# service heartbeat start
Starting High-Availability services:
2007/02/25_22:38:00 INFO: IPaddr Resource is stopped
[  OK  ]

Note: Since autofailback is off, node1 will NOT acquire resources from Node2. It will just wait for the FAILOVER to occur.


Let's fail node 2.

[root@www2 extras]# service heartbeat stop
Stopping High-Availability services:
[  OK  ]
[root@www2 extras]#


Check the status of httpd and mysqld on node 2 to make sure heartbeat has stopped them:
[root@www2 extras]# service httpd status
httpd is stopped
[root@www2 extras]# service mysqld  status
datadir is /data/mysql
socket file is /data/mysql/mysql.sock
mysqld is stopped
[root@www2 extras]#
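
After heartbeat is stopped on node 2, node 1 needs a few moments to take over the resources. Rather than guessing when it is done, one could poll from node 1. This is only a sketch: wait_until is a hypothetical helper, the 30-second limit in the example is an arbitrary choice, and mysqladmin ping is used as the probe on the assumption that root can connect without a password, as in the sessions here.

```shell
#!/bin/sh
# wait_until SECONDS COMMAND [ARGS...]
# Run COMMAND once per second until it succeeds or SECONDS have passed.
# Returns 0 as soon as COMMAND succeeds, 1 if the time runs out.
wait_until() {
    timeout=$1
    shift
    elapsed=0
    while :; do
        # Try the probe; its output is discarded, only the exit status matters.
        if "$@" >/dev/null 2>&1; then
            return 0
        fi
        if [ "$elapsed" -ge "$timeout" ]; then
            return 1
        fi
        sleep 1
        elapsed=$((elapsed + 1))
    done
}

# Example use on node 1 after failing node 2 (not executed here):
#   wait_until 30 mysqladmin -u root -S /data/mysql/mysql.sock ping \
#       && echo "mysqld is up on this node"
```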


Now that node 1 is active, the MySQL table we created earlier should still have its data.

[root@www1 RPMS]# mysql -u root
ERROR 2002 (HY000): Can't connect to local MySQL server through socket '/var/lib/mysql/mysql.sock' (2)

[root@www1 RPMS]# mysql -u root -S /data/mysql/mysql.sock
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 2 to server version: 4.1.20

Type 'help;' or '\h' for help. Type '\c' to clear the buffer.

mysql> use kamitest;
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A

Database changed
mysql> select * from students;
+------+-------+
| id   | name  |
+------+-------+
|    1 | Imran |
+------+-------+
1 row in set (0.00 sec)

mysql> quit


Great!

But why did it not work without the socket option? As the error shows, the mysql client fell back to its default socket, /var/lib/mysql/mysql.sock. Maybe the client is not intelligent enough? But the man page says:

Default options are read from the following files in the given order:
/etc/my.cnf /var/lib/mysql/my.cnf ~/.my.cnf

This means that the mysql client program "should" be able to pick up the socket option from /etc/my.cnf. Wait a minute! Do we have a [mysql] section in /etc/my.cnf?

[root@www1 RPMS]# cat /etc/my.cnf
[mysqld]
# datadir=/var/lib/mysql # original
datadir=/data/mysql
# socket=/var/lib/mysql/mysql.sock # original
socket=/data/mysql/mysql.sock
# Default to using old password format for compatibility with mysql 3.x
# clients (those using the mysqlclient10 compatibility package).
old_passwords=1

[mysql.server]
user=mysql
# basedir=/var/lib # original
basedir=/data

[mysqld_safe]
err-log=/var/log/mysqld.log
pid-file=/var/run/mysqld/mysqld.pid
[root@www1 RPMS]#


Answer: No. The socket line under [mysqld] is read only by the server, so the client needs a section of its own.

Let's add this to /etc/my.cnf on both nodes:

[mysql]
socket=/data/mysql/mysql.sock


Save the file and check the mysql client:

[root@www2 ~]# mysql -u root
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 2 to server version: 4.1.20

Type 'help;' or '\h' for help. Type '\c' to clear the buffer.

mysql> quit
Bye
[root@www2 ~]#

Alhumdulillah! Great! Without being given the socket path, the client connected to MySQL.

Make sure the updated /etc/my.cnf is on both nodes, then try a failover in each direction. It should work.
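
To confirm that a node's my.cnf really carries the socket line in the client section, the file can be checked with a small helper instead of by eye. This is a rough sketch under a few assumptions: cnf_get is a hypothetical name, the parser only understands plain key=value lines and whole-line # comments, and it prints the first match in a section, which is a simplification of how MySQL itself reads option files.

```shell
#!/bin/sh
# cnf_get SECTION KEY [FILE]
# Print the value of KEY inside [SECTION]; print nothing if it is absent.
# FILE defaults to /etc/my.cnf.
cnf_get() {
    section=$1
    key=$2
    file=${3:-/etc/my.cnf}
    awk -v s="[$section]" -v k="$key" '
        /^[ \t]*#/  { next }                              # skip comment lines
        /^[ \t]*\[/ { gsub(/[ \t]/, ""); cur = $0; next } # remember current section
        cur == s {
            eq = index($0, "=")
            if (eq == 0) next
            name = substr($0, 1, eq - 1)
            gsub(/[ \t]/, "", name)
            if (name == k) {
                val = substr($0, eq + 1)
                sub(/^[ \t]+/, "", val)
                sub(/[ \t]+$/, "", val)
                print val
                exit
            }
        }' "$file"
}

# Example use on each node (not executed here):
#   [ "$(cnf_get mysql socket)" = "$(cnf_get mysqld socket)" ] \
#       && echo "client and server agree on the socket path"
```

The section name is matched exactly, so [mysql.server] is not mistaken for [mysql]. Note also that the standard MySQL client programs read a [client] section, so the same socket line placed there would cover tools such as mysqladmin as well.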
