This repository has been archived by the owner on Mar 19, 2024. It is now read-only.

Setup On FreeBSD

vpetersson edited this page Feb 27, 2012 · 4 revisions

If someone tells you setting up PostgreSQL in a failover setup is easy, they have either never done it in a real production environment or they're really good. Most likely, it's the former. Yes, there are plenty of blog-posts out there on this topic, but most of them are far too simplistic and lack good logic to prevent data loss/split-brain situations.

Before we begin, I recommend that you go and fetch a cup of coffee, because this is going to get messy.

I've been working with PostgreSQL in production for a few years, and I've never found a setup that I've been really happy with. This is the closest I've gotten, and it's far from perfect. Compared to something like MongoDB's replica sets, this setup is very primitive. Yet, to the best of my knowledge, it is the best setup available, and it uses the latest tools available for PostgreSQL. It should also be said that while MySQL might be a tad better than PostgreSQL when it comes to the ease of setting up a master/slave setup, it's not that much better. Besides, PostgreSQL comes with other benefits.

Here are the core objectives that I wanted to meet:

  • Master/slave setup with automatic failover
  • Connection pooling
  • Load-balance SELECT-queries to the slave

To accomplish this, I will be using the following:

  • PostgreSQL 9.1 with streaming replication
  • PGPool-II
  • UCARP
  • The pg_ha.sh helper scripts

All of this will be running on FreeBSD 9.0, but it should in theory run on any UNIX or Linux system with minor modifications.

The architecture and possible scenarios

When we are done with the entire setup, there are four possible states the cluster can be in (or possibly five, if you count a slave that is still in the middle of recovering). The scenarios are as follows:

  • Scenario 1: Master / slave. Both active.
  • Scenario 2: Master goes down. Slave promoted to master.
  • Scenario 3: New master / slave. The old master becomes slave.
  • Scenario 4: Slave down. The master takes all load.

As you can see in the above scenarios, PGPool-II will always be served from the master server. We will be using UCARP to determine which server is the master, and to trigger actions based on that.

You can also see in the above scenarios that we have two servers involved: box0 (192.168.10.10/24) and box1 (192.168.10.11/24). The shared IP we will use for UCARP is 192.168.10.1/32. All client requests to PostgreSQL must go via PGPool (i.e. 192.168.10.1:9999); clients cannot connect to PostgreSQL directly on box0 or box1.

Now before we move on to setting up the various components, let's talk about how the architecture can move between the various scenarios. It goes without saying that the default, and most desirable, state is scenario 1. In that case the master accepts all write queries (INSERT, UPDATE, ALTER etc.), and the slave offloads the master by handling read-only queries (SELECT). The master then streams all changes to the slave using streaming replication.

From scenario 1, either scenario 2 (the master goes down) or scenario 4 (the slave goes down) can happen. To recover to scenario 1 again from scenario 2, we must go via scenario 3 (the old master gets demoted to slave). It's important to know that moving from scenario 2 back to scenario 1 requires manual intervention (to avoid a split-brain situation), but we will talk more about that later on.

Setting up UCARP

UCARP is the key to the automatic failover. It works similarly to VRRP on Linux and the kernel-level CARP in FreeBSD. The reason I've decided to use UCARP instead of CARP is partially that FreeBSD's CARP doesn't fully work in virtualized environments, but perhaps more importantly, UCARP lets you script actions upon failover.

Unlike CARP, UCARP isn't a virtual network interface; it's just a trigger mechanism (or heartbeat system, if you prefer). This is both good and bad. The drawback is that you cannot bind to the shared IP while in backup (slave) mode. The benefit, however, is that you can have actions take place before the node takes over. We will use this and tie it into PGPool.
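
To make the idea concrete, here is a sketch of the kind of up/down scripts UCARP can call. This is NOT the actual ucarp_up.sh/ucarp_down.sh from the pg_ha repository; the shared IP matches this article's addressing, the interface name is an assumption, and the commands are echoed rather than executed so you can see the order of operations:

```shell
#!/bin/sh
# Sketch only -- not the real pg_ha scripts. UCARP invokes the up-script
# when this node becomes master and the down-script when it drops to backup.
SHARED_IP="192.168.10.1"

ucarp_up() {
    iface="$1"
    # On promotion: claim the shared IP as an alias, then have pg_ha.sh
    # switch PostgreSQL/PGPool into master mode.
    echo "ifconfig ${iface} alias ${SHARED_IP}/32"
    echo "pg_ha.sh master"
}

ucarp_down() {
    iface="$1"
    # On demotion: release the shared IP, then drop back to slave mode.
    echo "ifconfig ${iface} -alias ${SHARED_IP}/32"
    echo "pg_ha.sh slave"
}

ucarp_up vtnet1
```

The important point is the ordering: the shared IP is claimed before PGPool is started, so by the time clients can reach 192.168.10.1:9999, the node is already acting as master.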

To install UCARP on FreeBSD, simply run:

cd /usr/ports/net/ucarp && make clean install

Once installed, we need to configure UCARP on both nodes. Add the following to /etc/rc.conf:

ucarp_enable="YES"
ucarp_if="vtnet1"
ucarp_vhid="1"
ucarp_pass="carppass"
ucarp_preempt="YES"
ucarp_facility="daemon"
ucarp_src="192.168.10.10"
ucarp_addr="192.168.10.1"
ucarp_advbase="2"
ucarp_advskew="0"
ucarp_upscript="/usr/local/bin/ucarp_up.sh"
ucarp_downscript="/usr/local/bin/ucarp_down.sh"

You can read more about the various variables in this post, but as you can see, this is the configuration for box0 (since ucarp_src is set to 192.168.10.10). You need to install UCARP on both nodes, and then add the above to rc.conf with ucarp_src adjusted for each node. Also, note that we have advskew set to 0 on both nodes to make sure there is no preferred master.

As you can see above, we call on the scripts 'ucarp_upscript' and 'ucarp_downscript' when the UCARP interface goes up (master) and down (slave). These scripts (together with the script 'pg_ha.sh' that we will talk about later on) are what will orchestrate PostgreSQL and make it behave the way we want it to in the various scenarios.

You need to install the scripts on both nodes. The scripts are available on GitHub, and you can install them with these commands:

wget https://raw.githubusercontent.com/vpetersson/pg_ha/master/bin/pg_ha.sh -O /usr/local/bin/pg_ha.sh
wget https://raw.githubusercontent.com/vpetersson/pg_ha/master/bin/ucarp_up.sh -O /usr/local/bin/ucarp_up.sh
wget https://raw.githubusercontent.com/vpetersson/pg_ha/master/bin/ucarp_down.sh -O /usr/local/bin/ucarp_down.sh
chmod +x /usr/local/bin/ucarp*
chmod +x /usr/local/bin/pg_ha.sh

With all those components in place, you can now start UCARP. Since we don't have PostgreSQL installed yet, the actions pg_ha.sh triggers won't do anything. All it will do is set up an IP alias on the master node (which will be the first node you start UCARP on).

/usr/local/etc/rc.d/ucarp start

You can verify the status using 'pg_ha.sh status'. You can also verify that the system works as expected by simply pinging the UCARP IP (192.168.10.1) from a third node.

In the rest of the setup, we are going to assume that box0 is the master. So if box1 is the master right now, simply stop UCARP on box1 for a few seconds and then start it again; box0 should then be promoted to master.

PostgreSQL

Let's move on to setting up PostgreSQL. This is where it starts to get messy.

First, let's install PostgreSQL and enable it in rc.conf:

cd /usr/ports/databases/postgresql91-server && make clean install
echo -e "postgresql_enable=\"YES\"" >> /etc/rc.conf

Now, on box0, initialize an empty database:

/usr/local/etc/rc.d/postgresql initdb
/usr/local/etc/rc.d/postgresql start

Setting up password-less authentication

Next, we will need to set up key-based authentication between the two servers using SSH and rsync. We will also need sudo. If you don't have rsync and sudo installed already, install them with:

cd /usr/ports/net/rsync && make clean install
cd /usr/ports/security/sudo && make clean install

With those installed, we now need to generate and install the keys. On both servers, run (we assume pgsql is the user PostgreSQL runs under):

sudo su - pgsql
ssh-keygen
cat ~/.ssh/id_rsa.pub

Now create a file called ~/.ssh/authorized_keys and populate it with the contents of id_rsa.pub from the other node. We also need to ensure that it has the correct permissions. On box0, run:

sudo su - pgsql
chmod 600 ~/.ssh/authorized_keys
ssh box1

Do the same thing on box1 (but replace box1 with box0, of course). Once you have validated that you can SSH from box0 to box1 and from box1 to box0 without a password, we can move on. If you can't get past this step, there's no point in moving forward, as this is a critical component.

Now let's set a password for PostgreSQL's pgsql user so that the two nodes can communicate with each other. Start by opening up a session with PostgreSQL on box0 (remember that we don't have PostgreSQL running on box1 yet, and that's on purpose):

psql postgres pgsql

Once you have the session open, we'll set the password of 'pgsql' to 'foobar':

ALTER USER pgsql WITH password 'foobar';
\d

Now we need to grant clients on the local network access to PostgreSQL. Open up the file /usr/local/pgsql/data/pg_hba.conf and add the following lines (yes, you need both 'all' and 'replication'):

host    replication     all     192.168.10.0/24         password
host    all             all     192.168.10.0/24         password

Once you've done that, you will need to restart PostgreSQL:

/usr/local/etc/rc.d/postgresql restart

Now verify that you can connect from box1 to box0 using the following command:

psql -hbox0 -Upgsql

Provide the password you specified above. If that works, let's move on.

Next up, we need to set up password-less authentication between the two PostgreSQL servers. To do this, we need to store the password for the pgsql user in a file called ~/.pgpass. On both box0 and box1, do the following:

sudo su - pgsql
echo -e "box0:5432:*:pgsql:foobar\nbox1:5432:*:pgsql:foobar" > ~/.pgpass
chmod 600 ~/.pgpass
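
The chmod step is not cosmetic: libpq silently ignores ~/.pgpass if it is readable by group or others, which shows up as an unexplained password prompt. A small helper (my own sketch, not from the article) that flags a .pgpass file that would be ignored:

```shell
#!/bin/sh
# Check whether a .pgpass file has permissions libpq will accept.
# libpq requires mode 0600 (or stricter) and otherwise ignores the file.
check_pgpass() {
    f="$1"
    # GNU stat uses -c; fall back to BSD stat's -f on FreeBSD.
    perms=$(stat -c '%a' "$f" 2>/dev/null || stat -f '%Lp' "$f")
    case "$perms" in
        600|400) echo "ok" ;;
        *) echo "ignored (mode $perms, want 600)" ;;
    esac
}
```

Run it as, e.g., check_pgpass ~/.pgpass after creating the file on each node.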

On box1, make sure you can access box0 without a password:

sudo su - pgsql
psql -hbox0 -Upgsql

If that worked, we can move forward. If not, verify your settings.

Configuring PostgreSQL for streaming replication

On box0, start by editing /usr/local/pgsql/data/postgresql.conf. Make sure you have the following settings:

listen_addresses = '192.168.10.10,127.0.0.1'
synchronous_commit = off
wal_level = hot_standby
max_wal_senders = 1
wal_keep_segments = 32
hot_standby = on
wal_buffers = 16MB
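
A quick sanity check on wal_keep_segments: this setting bounds how far the slave can fall behind before streaming replication breaks and a full re-sync is needed. With PostgreSQL's default 16 MB WAL segment size:

```shell
# Each WAL segment is 16 MB by default, so wal_keep_segments = 32 means the
# master retains roughly this much WAL for a lagging slave to catch up on:
echo "$((32 * 16)) MB"   # -> 512 MB
```

If your write volume can exceed that during a typical outage, raise wal_keep_segments (at the cost of disk space in pg_xlog).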

Continue by adding the recovery.bak file. Since both recovery.bak and pg_ha.sh include hard-coded hostnames for the other node, you might need to adjust them.

wget https://raw.githubusercontent.com/vpetersson/pg_ha/master/conf/recovery.bak -O /usr/local/pgsql/data/recovery.bak
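
For reference, a 9.1-era streaming-replication recovery file boils down to just a few settings. A minimal equivalent looks something like this (the trigger_file path is an illustrative assumption, not taken from the repository):

```
standby_mode = 'on'
primary_conninfo = 'host=box0 port=5432 user=pgsql'
trigger_file = '/usr/local/pgsql/data/failover.trigger'
```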

Now restart PostgreSQL:

/usr/local/etc/rc.d/postgresql restart

Make sure you can connect to it afterwards; if not, check your log files.

If that worked, and you were able to connect, it's time to configure box1. Start by shutting down PostgreSQL on box0:

/usr/local/etc/rc.d/postgresql stop

Now hop over to box1; it's time to copy the whole shebang from box0 to box1. From box1, run the following (note that we pull the data from box0):

sudo su - pgsql
rsync -aP box0:/usr/local/pgsql/data/ /usr/local/pgsql/data/

Once the copying is done, you need to adjust the following files:

  • /usr/local/pgsql/data/recovery.bak -- change the 'host' to box0.
  • /usr/local/pgsql/data/postgresql.conf -- change 'listen_addresses' to match the local IP.
  • /usr/local/bin/pg_ha.sh -- change 'REMOTE' to box0.

With all the changes made, we can now test whether streaming replication actually works the way we want. Start by firing up PostgreSQL on box0:

/usr/local/etc/rc.d/postgresql start

Assuming that went well, it's now time to start box1:

sudo -u pgsql cp -f /usr/local/pgsql/data/recovery.bak  /usr/local/pgsql/data/recovery.conf
/usr/local/etc/rc.d/postgresql start

If everything went well, box0 should now stream to box1 and we have our Streaming Replication setup up and running. The quickest way to check this is to run the following on both boxes:

pg_ha.sh status

When you've gotten this far and everything works, you most certainly deserve a break.

PGPool-II

I'd like to tell you that the worst is over, but that'd be a lie. Wrapping your head around PGPool is far from easy and the lack of good documentation doesn't make it any easier. Even after spending a significant amount of time perfecting this setup, I'm still not sure that I fully understand how PGPool really works. Hence, this should serve as a warning that I might be implementing it incorrectly. It should also be said that I will not be installing the web-based management tool pgpoolAdmin.

Let's start with the easy part -- installing PGPool-II:

cd /usr/ports/databases/pgpool-II/ && make clean install
echo -e "pgpool_enable=\"YES\"" >> /etc/rc.conf

This of course, needs to be done on both box0 and box1.

Next, on box0 (the master), open up a session to PostgreSQL and create a user and database for pgpool:

CREATE USER pgpool WITH PASSWORD 'foobar';
CREATE DATABASE pgpool OWNER pgpool;

Now we need to populate the data:

cd /usr/local/share/pgpool-II
psql pgpool pgpool < system_db.sql

We also need to create a few config-files and folders for PGPool (this needs to be done on both box0 and box1):

mkdir -p /var/run/pgpool
chown -R nobody:nobody /var/run/pgpool
touch /usr/local/etc/pool_passwd
chown nobody:nobody /usr/local/etc/pool_passwd
touch /usr/local/etc/pcp.conf
chown nobody:nobody /usr/local/etc/pcp.conf
wget https://raw.githubusercontent.com/vpetersson/pg_ha/master/conf/pgpool.conf -O /usr/local/etc/pgpool.conf

Let's move on to configuring PCP, which is PGPool's management interface. This is fairly straightforward. The config file (pcp.conf) is simply a password file with md5-hashed passwords in the format "username:md5-hash". First you need to hash your password. You can do that with the following command:

pg_md5 foobar

Take the output you got from that command and add it to the file /usr/local/etc/pcp.conf. For instance, if my password was 'foobar' and my user was pooladm, I'd have "pooladm:3858f62230ac3c915f300c664312c63f" in my pcp.conf.
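
pg_md5 produces a plain, unsalted MD5 of the password, so if you want to build the pcp.conf line on a machine without pgpool installed, any MD5 tool will do. A sketch (assuming openssl is available):

```shell
# Equivalent of `pg_md5 foobar`: pcp.conf just wants username:md5(password).
# openssl prints "(stdin)= <hash>", so grab the last field.
hash=$(printf '%s' foobar | openssl md5 | awk '{print $NF}')
echo "pooladm:${hash}"   # -> pooladm:3858f62230ac3c915f300c664312c63f
```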

Now let's dive into pgpool.conf. It should be said that this file will not look identical on both systems. Ideally, you start by editing it on one server and then copy it over. There are a lot of variables to configure, but most of them are self-explanatory based on the template I've set up. Make sure that pgpool2_hostname is set to the local hostname and that all the passwords are correct.

Once done, you can start up PGPool on the master (box0):

/usr/local/etc/rc.d/pgpool start

Putting it all together

With all of the components sorted, let's see if we can orchestrate the entire setup now using the pg_ha.sh.

On the master, run:

pg_ha.sh master

In this particular case, all this does is more or less start PGPool.

Now, on the slave, run the following command:

pg_ha.sh slave

You can now verify that everything works with our good old friend pg_ha.sh status on both nodes. If everything worked, you deserve a pat on the back.

If not, try running the following on the slave:

pg_ha.sh init-slave

If that still doesn't work, check the logs.

Now let's take a look at what PGPool thinks about our system. To do that, we'll have to use PGPool's CLI, which is a suite of commands called pcp_*. To check the status, we'll use the command pcp_pool_status. By default it will spit out a lot of data, but right now we're only interested in the status of the two nodes:

pcp_pool_status 10 192.168.10.1 9898 pooladm foobar | grep -A1 "backend_status"

If you got the value 1 reported on both backends, that means you're up and running in 'scenario 1'. If not, you need to investigate your log-files.

Now let's see what happens if we bring down the slave. Stop PostgreSQL on box1, then open a session to PGPool with psql and run a few commands. You'll notice in the logs on box0 that it detected box1 as down. Now re-run the command from above. You'll see that "backend_status1" has changed to 3 (assuming box1 was backend 1). That means that it is down.

It is important to stress that PGPool won't bring the node back to status 1 even if you start PostgreSQL back up on box1. To avoid a split-brain situation, you need to explicitly tell PGPool to re-attach box1.

pcp_attach_node 10 192.168.10.1 9898 pooladm foobar 1
pcp_pool_status 10 192.168.10.1 9898 pooladm foobar | grep -A1 "backend_status"

Notice that backend_status1 is now 1. I honestly do not know what the difference between the statuses is. At first I thought they meant the following:

  • Status 1: Accept read/write queries
  • Status 2: Accept read queries
  • Status 3: Down.

But this does not seem to be the case. Statuses 1 and 2 both seem to accept read and write queries. I am pretty sure, however, that status 3 means down.
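
If you end up scripting around these codes anyway, a tiny helper makes them easier to eyeball. The meanings below are my assumption (they match how later pgpool-II versions document node status for SHOW pool_nodes: 0 = unused, 1 = up with no connections yet, 2 = up with pooled connections, 3 = down), not something confirmed by the documentation of this era:

```shell
#!/bin/sh
# Map pcp_pool_status backend_status codes to labels, assuming the
# SHOW pool_nodes semantics from later pgpool-II releases apply.
backend_status_name() {
    case "$1" in
        0) echo "unused" ;;
        1) echo "up (no connections yet)" ;;
        2) echo "up (connections pooled)" ;;
        3) echo "down" ;;
        *) echo "unknown ($1)" ;;
    esac
}

backend_status_name 3
```

This would also explain why statuses 1 and 2 both accept read and write queries: the difference is only whether a client has connected yet.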

This is an example of what I meant by poor documentation. I can't find anything on PGPool's website that explains the different statuses. Sure, I could read the source code, but it's pretty silly to have to resort to that.

Simulating a failover

We assumed earlier that box0 was the master and box1 was the slave, so let's simulate the most important situation: the master goes down (i.e. we go from scenario 1 to scenario 2). To do this, the easiest way is to simply reboot box0 (shutdown -r now).

If everything went as planned, box1 took over as master, and box0 came back up. If you run the pcp_pool_status, you'll most likely have the other node in status 3 (i.e. down) and the local server as status 1. You can also verify the status of the two nodes using pg_ha.sh status. Once you've verified that the node is back up and is receiving the WAL stream, you can bring it back in using pcp_attach_node as discussed above.

In a real situation, it isn't unlikely that the slave won't come back up, because so much has changed in the database that the node cannot catch up. There are two possible ways to recover using the pg_ha.sh script.

The first, and fastest, way to recover the slave is to run:

pg_ha.sh slave

This will take a database dump from the master, apply it to the local database, and then put the node into slave mode. Sometimes, however, this isn't enough. In those situations we need to resort to the more aggressive approach and invoke:

pg_ha.sh init-slave

If you have a large database, this may take some time, as we literally copy all the files over from the master to the slave and then start the slave back up.

As a safety precaution, both pg_ha.sh slave and pg_ha.sh init-slave store a complete backup of both the local and the remote database prior to modifying any data. This is to avoid a situation where data is lost in a failover. These backups are stored in /usr/local/pgsql/ha_backups, so this folder can grow large if you fail over back and forth a lot.
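
Since nothing cleans /usr/local/pgsql/ha_backups up automatically, a small cron-able helper along these lines can keep it in check. This is hypothetical, not part of pg_ha.sh; the directory and retention period are parameters you'd choose yourself:

```shell
#!/bin/sh
# Hypothetical housekeeping helper, not part of pg_ha.sh: delete failover
# backups older than N days so the backup directory doesn't fill the disk.
prune_backups() {
    dir="$1"
    days="$2"
    # -mtime +N matches files last modified more than N days ago.
    find "$dir" -type f -mtime +"$days" -delete
}

# Example (e.g. from a daily cron job), keeping two weeks of backups:
# prune_backups /usr/local/pgsql/ha_backups 14
```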

Final words

As you can probably tell, this isn't something I put together in an afternoon. I've worked on these scripts for some time to perfect the logic and handle all possible situations. It is, however, very likely that I've missed some situation or that something isn't configured perfectly. It is also worth mentioning that I haven't said anything about tuning PostgreSQL; I consider that a separate discussion.

All of the scripts I've talked about in this article have been posted here such that they can easily be updated.