
WebSphere MQ – High Availability on Linux using DRBD and Heartbeat

Overview

High availability

High availability refers to the ability of a system or component to be operational and accessible when required for use over a specified period of time. A highly available system is designed to handle faults and unplanned outages gracefully so that it continues to provide the intended functionality.

What is DRBD

DRBD (Distributed Replicated Block Device) is a block device driver designed for building high availability clusters. It works by mirroring an entire block device from one server to another over a (preferably dedicated) network link.

What is heartbeat

Heartbeat is a daemon that provides cluster infrastructure services. It allows clients to know about the presence of peer processes on other machines and to easily exchange messages with them. The heartbeat daemon needs to be combined with a cluster resource manager, which has the task of starting and stopping the services (e.g. IP addresses, WebSphere MQ, etc.) that the cluster will make highly available.

Installation (HAMQ)

In this tutorial we will set up a highly available server providing WebSphere MQ (WMQ) services to clients. Should one server become unavailable, WMQ will continue to be available to users from the other. However, WMQ clients that communicate with a queue manager subject to restart or takeover should be written to tolerate a broken connection and to repeatedly attempt to reconnect.

WMQ server1: techish-mq-a IP address: 10.10.1.21
WMQ server2: techish-mq-b IP address: 10.10.1.22
WMQ Server Virtual IP address: 10.10.1.20
We will use the /drbd directory to hold the highly available queue managers' data.

To begin, set up two Ubuntu 10.04 LTS servers. You can use RAID, LVM, etc. as per your requirements. I'll assume you are using LVM to manage your disks; setting up LVM itself is beyond the scope of this tutorial.

Install/Configure DRBD

The following partition scheme will be used for the DRBD data:

/dev/data/meta-disk -- 1  GB DRBD meta data
/dev/data/drbdlv    -- 20 GB unmounted DRBD device
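
If you have not created these logical volumes yet, a minimal sketch follows, assuming the volume group data already exists (as in the lvdisplay output below):

lvcreate --name drbdlv --size 20G data
lvcreate --name meta-disk --size 1G data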

Sample output from lvdisplay:

--- Logical volume ---
LV Name                /dev/data/drbdlv
VG Name                data
LV UUID                GCJWiy-0eGD-S5ti-19yy-9QAN-E6tJ-j3mnce
LV Write Access        read/write
LV Status              available
# open                 1
LV Size                20.00 GiB
Current LE             78336
Segments               2
Allocation             inherit
Read ahead sectors     auto
- currently set to     256
Block device           251:0

--- Logical volume ---
LV Name                /dev/data/meta-disk
VG Name                data
LV UUID                XaDR2x-cNhV-Sxgb-auKi-YNPZ-HxLf-GYZxoE
LV Write Access        read/write
LV Status              available
# open                 1
LV Size                1.00 GiB
Current LE             256
Segments               1
Allocation             inherit
Read ahead sectors     auto
- currently set to     256
Block device           251:1

The isolated network between the two servers will be:

WMQ server1:     mq-a-private IP address: 192.168.0.21
WMQ server2:     mq-b-private IP address: 192.168.0.22

Ensure that /etc/hosts contains the names and IP addresses of the two servers.

Sample /etc/hosts:

127.0.0.1         localhost
10.10.1.21        techish-mq-a    node1
10.10.1.22        techish-mq-b    node2
192.168.0.21      mq-a-private
192.168.0.22      mq-b-private

Install NTP to ensure both servers have the same time.

apt-get install ntp
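
Optionally, you can confirm that the NTP daemon has actually synchronized to a peer (this assumes the stock ntp package and its default server list):

ntpq -p

The peer the daemon is currently synchronized to is flagged with an asterisk in the first column.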

Alternatively, simply compare the output of the date command on both servers. Next, install drbd and heartbeat.

apt-get install drbd8-utils heartbeat

Now create a resource configuration file named mq.res and place it in /etc/drbd.d/

An example /etc/drbd.d/mq.res:

resource mq {

    protocol C;

    handlers {
        pri-on-incon-degr "echo o > /proc/sysrq-trigger ; halt -f";
        pri-lost-after-sb "echo o > /proc/sysrq-trigger ; halt -f";
        local-io-error "echo o > /proc/sysrq-trigger ; halt -f";
        outdate-peer "/usr/lib/heartbeat/drbd-peer-outdater -t 5";
    }

    startup {
        degr-wfc-timeout 120;
    }

    disk {
        on-io-error detach;
    }

    net {
        cram-hmac-alg sha1;
        shared-secret "password";
        after-sb-0pri disconnect;
        after-sb-1pri disconnect;
        after-sb-2pri disconnect;
        rr-conflict disconnect;
    }

    syncer {
        rate 100M;
        verify-alg sha1;
        al-extents 257;
    }

    on node1 {
        device    /dev/drbd0;
        disk      /dev/data/drbdlv;
        address   192.168.0.21:7788;
        meta-disk /dev/data/meta-disk[0];
    }

    on node2 {
        device    /dev/drbd0;
        disk      /dev/data/drbdlv;
        address   192.168.0.22:7788;
        meta-disk /dev/data/meta-disk[0];
    }
}
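
On Ubuntu 10.04 the packaged /etc/drbd.conf should already contain an include line for "drbd.d/*.res"; add it if yours does not. You can then ask drbdadm to parse the resource and echo it back, which catches syntax errors early:

drbdadm dump mq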

Duplicate the DRBD resource configuration to the other server (and /etc/drbd.conf as well, if you have modified it):

scp /etc/drbd.d/mq.res root@10.10.1.22:/etc/drbd.d/

As we are using heartbeat with drbd, we need to change ownership and permissions on several DRBD related files on both servers:

chgrp haclient /sbin/drbdsetup
chmod o-x /sbin/drbdsetup
chmod u+s /sbin/drbdsetup
chgrp haclient /sbin/drbdmeta
chmod o-x /sbin/drbdmeta
chmod u+s /sbin/drbdmeta

Initialize the meta-data disk on both servers.

drbdadm create-md mq
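
Before promoting either node, the resource needs to be up and connected on both servers. One way to do this on both nodes is via the init script (drbdadm up mq does the same for this single resource):

service drbd start
cat /proc/drbd

At this point both nodes should report cs:Connected ro:Secondary/Secondary ds:Inconsistent/Inconsistent.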

Decide which server will act as the primary for the DRBD device and initiate the first full sync between the two servers. Here we use node1, so execute the following on node1:

drbdadm -- --overwrite-data-of-peer primary mq

You can view the current status of DRBD with:

cat /proc/drbd  

version: 8.3.7 (api:88/proto:86-91)
GIT-hash: ea9e28dbff98e331a62bcbcc63a6135808fe2917 build by mq@techish, 2011-03-14 20:36:57
 0: cs:SyncSource ro:Primary/Secondary ds:UpToDate/Inconsistent C r----
    ns:1687824 nr:168 dw:1655320 dr:42512 al:572 bm:205 lo:1 pe:17 ua:2032 ap:0 ep:1 wo:b oos:1015968
    [>....................] sync'ed:  3.6% (1015968/1048576)K
    finish: 0:01:32 speed: 10,868 (10,868) K/sec

I prefer to wait for the initial sync to complete. Once completed, format /dev/drbd0 and mount it on node1:

mkfs.ext3 /dev/drbd0
mkdir -p /drbd
mount /dev/drbd0 /drbd
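
A quick check that the new filesystem is mounted where we expect it:

df -h /drbd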

To ensure replication is working correctly, create test data on node1 and then switch node2 to be primary.

Create data:

dd if=/dev/zero of=/drbd/test.techish bs=1M count=100

Switch to node2 and make it the Primary DRBD device:

On node1:
[node1]umount /drbd
[node1]drbdadm secondary mq
On node2:
[node2]mkdir -p /drbd
[node2]drbdadm primary mq
[node2]mount /dev/drbd0 /drbd

You should now see the 100MB file in /drbd on node2. Now delete this file and make node1 the primary DRBD server to ensure replication is working in both directions.

On node2:
[node2]rm /drbd/test.techish
[node2]umount /drbd
[node2]drbdadm secondary mq
On node1:
[node1]drbdadm primary mq
[node1]mount /dev/drbd0 /drbd

Performing an ls -lh /drbd on node1 will verify the file is now removed and synchronization successfully occurred in both directions.

At this point we have configured the highly available DRBD-backed location /drbd and verified that replication works correctly in both directions.

Install/Configure WMQ

Install WMQ on node1 and node2. Please refer to the blog post Install IBM WebSphere MQ on Ubuntu.

Note: remove any runlevel init script links for the wsmq service on node1 and node2, so that heartbeat alone controls when WMQ starts.
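
For example, if the init script from that post is installed as /etc/init.d/wsmq (an assumption; adjust the name to match your setup), the runlevel links can be removed on both nodes with:

update-rc.d -f wsmq remove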

Relocate the queue manager data directory and configuration to our DRBD device.

On node1:
[node1]mount /dev/drbd0 /drbd
[node1]mv /var/mqm/ /drbd/
[node1]ln -s /drbd/mqm/ /var/mqm
On node2:
[node2]rm -rf /var/mqm
[node2]ln -s /drbd/mqm/ /var/mqm
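
With /var/mqm now living on the DRBD device, a quick smoke test on node1 (which currently holds the mount) is to create, start and stop a throwaway queue manager; HA.TEST is a hypothetical name:

[node1]su - mqm -c "crtmqm HA.TEST"
[node1]su - mqm -c "strmqm HA.TEST"
[node1]su - mqm -c "dspmq"
[node1]su - mqm -c "endmqm -i HA.TEST"

dspmq should show HA.TEST as Running before it is ended, and its data should appear under /drbd/mqm/qmgrs/.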

Configure Heartbeat

Configure heartbeat to control a virtual IP address and to fail WMQ over to the surviving node in the case of a node failure.

On node1, define the cluster within /etc/heartbeat/ha.cf. Example /etc/heartbeat/ha.cf:

logfacility     local0
keepalive 2
deadtime 30
warntime 10
initdead 120
bcast eth0
bcast eth1
node node1
node node2

On node1, define the authentication mechanism the cluster will use within /etc/heartbeat/authkeys. Example /etc/heartbeat/authkeys:

auth 3
3 md5 password

Change the permissions of /etc/heartbeat/authkeys.

chmod 600 /etc/heartbeat/authkeys

On node1, define the resources that will run on the cluster within /etc/heartbeat/haresources. We will define the master node for the resource, the virtual IP address, the file system used, and the service (wsmq) to start. Example /etc/heartbeat/haresources:

node1 IPaddr::10.10.1.20/24/eth1 drbddisk::mq Filesystem::/dev/drbd0::/drbd::ext3 wsmq
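
The wsmq entry at the end refers to an init-style script (in /etc/init.d or /etc/ha.d/resource.d) that heartbeat calls with start and stop arguments. A minimal sketch of such a script, assuming a single queue manager named HA.TEST (hypothetical; substitute your own):

#!/bin/sh
# /etc/init.d/wsmq -- minimal heartbeat resource script for WebSphere MQ
# Starts and stops the listed queue managers as the mqm user.
QMGRS="HA.TEST"   # adjust to the queue manager(s) in your environment

case "$1" in
  start)
    for qm in $QMGRS; do
      su - mqm -c "strmqm $qm"
    done
    ;;
  stop)
    for qm in $QMGRS; do
      su - mqm -c "endmqm -i $qm"
    done
    ;;
  *)
    echo "Usage: $0 {start|stop}"
    exit 1
    ;;
esac
exit 0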

Copy the cluster configuration files from node1 to node2.

[node1]scp /etc/heartbeat/ha.cf root@10.10.1.22:/etc/heartbeat/
[node1]scp /etc/heartbeat/authkeys root@10.10.1.22:/etc/heartbeat/
[node1]scp /etc/heartbeat/haresources root@10.10.1.22:/etc/heartbeat/

Reboot both servers.
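
After the reboot, node1 should come up owning the virtual IP, the /drbd mount and the queue managers. A simple failover test (a sketch; verify each step on your own systems) is to stop heartbeat on the active node and watch node2 take over:

[node1]cat /proc/drbd
[node1]df -h /drbd
[node1]ip addr show | grep 10.10.1.20
[node1]/etc/init.d/heartbeat stop
[node2]cat /proc/drbd
[node2]df -h /drbd
[node2]su - mqm -c "dspmq"

node2 should promote the DRBD device, mount /drbd, bring up 10.10.1.20 and start the wsmq resource. Restart heartbeat on node1 when you are done.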

How to Completely Uninstall MySQL on Ubuntu

I have always found it a bit tricky to do a clean re-install of MySQL. Here are some steps I have learned from experience.

Note: this will delete *everything* associated with MySQL on Ubuntu; I believe that is what "complete uninstall" means when the goal is a fresh installation. (Tested on 10.04 LTS)

a) Using apt

apt-get --purge remove mysql-server
apt-get --purge remove mysql-client
apt-get --purge remove mysql-common
apt-get autoremove
apt-get autoclean
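
To see which MySQL-related packages, if any, are still installed after the purge:

dpkg -l | grep mysql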

b) Using aptitude

I have noticed that aptitude does a better job of removing dependencies:

aptitude remove mysql-client
aptitude remove mysql-server
aptitude remove mysql-common

c) See if anything depends on the installed packages

apt-cache rdepends mysql-server
apt-cache rdepends mysql-client

d) Only if you've changed AppArmor settings, revert them in the following file

vi /etc/apparmor.d/usr.sbin.mysqld 

e) Delete the configuration (the following command deletes all of MySQL's config)

rm -rf /etc/mysql

f) Find all remaining files whose names start with "mysql" under / and delete them

find / -iname 'mysql*' -exec rm -rf {} \;
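
Tip: since the command above removes everything it matches, you may want to preview the matches first by running find without the -exec action:

find / -iname 'mysql*'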


After all of the above, follow the steps below for a clean install

a) Install MySQL server and client

apt-get install mysql-server mysql-client

b) Check to see if mysql is running

service mysql status

and you should see output similar to:

mysql start/running, process 1234

c) Check with mysqladmin

mysqladmin -u root -p status
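
As a final sanity check you can log in and list the default databases (you will be prompted for the MySQL root password set during installation):

mysql -u root -p -e "SHOW DATABASES;"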