PostgreSQL Replication over Linux Platforms

arizonahoopleData Management

Nov 28, 2012 (5 years and 5 months ago)


Open Source Science Journal Vol. 2, No. 3, 2010


PostgreSQL Replication over Linux Platforms

Elena-Mirela DROG
Academy of Economic Studies, Bucharest, Romania

Abstract: This article focuses on ways to place multiple instances for one single data-
base with high availability, backup or for a no-downtime migration to a new version. It is fo-
cusing on the replication tool, which will keep all of them in sync. We will use the replication
in the backup strategy, scale-out and reliability. As part of a backup strategy, replication
avoids shutting down the master if backups are made from the slave. Replication can allow
the distribution of reads over multiple replication slaves. Replication can provide a ready
backup in the event of a failure. The goal of the replication architecture is to have additional
database servers standing by in case of hardware failure, DBA/Application mistakes and
maintenance operations, because turning on a standby means failing the master.

Keywords: Replication, Linux, Databases, Security, Backup, Failure, Synchronization.

1. Some preliminary considerations upon existing replication tools for PostgreSQL

PostgreSQL replication is a technology designed to allow storing identical data in mul-
tiple locations. This article aims to examine why replication may be useful and how it solves
common data distribution problems. There are several classical examples in which Post-
greSQL replication solves business problems. One of the most popular ones is the case when
a business has mobile employees, who need to access data from their portable computers
while they are away from the office. Another example is when the workforce of a business is
distributed around the world and all employees need to access one and the same set of data,
but network connectivity has poor quality. In both the above examples using replication is the
right thing to do. Replication is used in many other scenarios as well for example as a backup
solution, and for offloading database intensive processing like reporting and data mining from
main live databases.
Sales force automation, field service, retail, and other mass deployment applications
typically require data to be periodically synchronized between central database systems and a
large number of small, remote sites, which are often disconnected from the central database.
Members of a sales force must be able to complete transactions, regardless of whether they
are connected to the central database. In this case, remote sites must be autonomous. On the
other hand, applications such as call centers and Internet systems require data on multiple
servers to be synchronized in a continuous, nearly instantaneous manner to ensure that the
service provided is available and equivalent at all times. For example, a retail Web site on the
Internet must ensure that customers see the same information in the online catalog at each
site. Here, data consistency is more important than site autonomy.
Database management systems have been a crucial component of infrastructures for
many years now. [1] PostgreSQL is an advanced, object-relational database management sys-
tem that is frequently used to provide such services. It has more than 15 years of active devel-
opment and a proven architecture that has earned it a strong reputation for reliability, data in-
tegrity, and correctness. It runs on all major operating systems, including Linux, UNIX (AIX,
BSD, HP-UX, SGI IRIX, Mac OS X, Solaris, Tru64), and Windows. It is fully ACID compli-
Open Source Science Journal Vol. 2, No. 3, 2010


ant, has full support for foreign keys, joins, views, triggers, and stored procedures, in multiple
languages. It includes most SQL92 and SQL99 data types, including INTEGER, NUMERIC,
storage of binary large objects, including pictures, sounds, or video. It has native program-
ming interfaces for C/C++, Java, .Net, Perl, Python, Ruby, Tcl, ODBC, among others. pgAd-
min III is the most popular and feature rich Open Source administration and development
platform for PostgreSQL, the most advanced Open Source database in the world. The applica-
tion may be used on Linux, FreeBSD, OpenSUSE, Solaris, Mac OSX and Windows platforms
to manage PostgreSQL 7.3 and above running on any platform, as well as commercial and de-
rived versions of PostgreSQL such as EnterpriseDB, Mammoth PostgreSQL, Bizgres and
Greenplum database. pgAdmin III is designed to answer the needs of all users, from writing
simple SQL queries to developing complex databases. The graphical interface supports all
PostgreSQL features and makes administration easy. The application also includes a syntax
highlighting SQL editor, a server-side code editor, an SQL/batch/shell job scheduling agent,
support for the Slony-I replication engine and much more.
Server connection may be made using TCP/IP or Unix Domain Sockets and may be
SSL encrypted for security. No additional drivers are required to communicate with the data-
base server.
The two available open-source replication solutions in the past, rserv and ERServer, had
serious limitations and needed replacement. Fortunately, such a replacement recently became
available. Each new option for replication offers its own features, benefits and issues.
PGCluster is a multi-master no delay synchronous replication for load sharing or high
availability. Large objects are now supported. It has the features that are needed in real high
available environments, think financial, legal industries. PGCluster 1.0 started with this, 1.1
added large-object replication and 1.3 is a nice refinement.
Slony-I is a master to multi-slave cascading and almost-failover. The Slony-I (one)
package is well developed, documented and works with PostgreSQL 7.3 and newer [3]. Slony
does not do multi-master replication like PGCluster and does not support large objects. Slony
is also suitable for use in environments that require high-availability.
DBBalancer Alpha is a load balancing multithreaded PostgreSQL connection pool. Also
has a replication mode to keep in sync the load balanced backend servers, but needs the ACE
framework libraries.
Pgpool is a connection pooling server that supports up two PostgreSQL servers in syn-
chronous replication. This replication mode operates in degeneration mode when one server
has failed, the data must be manually synchronized before restarting. The load-balancing fea-
tures are restricted to non-transactional SELECT queries.
Pg_comparator is effectively rsync for PostgreSQL. Only one table can be replicated at
a time. This is a nice package for moving development code into test or test to live. Recover-
ing from a Slony failure this will come in handy.
When speaking about PostgreSQL we also have to mention the possibility to implement
a lazy replication [12]. Lazy replication replicates a PostgreSQL database across a loose net-
work of unreliable servers. This solution is suited for data that does not have to be correct at
all times and does not change quickly; examples are common bodies of knowledge like ency-
clopedias, genealogy trees, sharing scientific data. The functioning principles are based on the
fact that one master server periodically exports a change packet and makes it available to
slaves and by the other hand, slaves pick up the change packet and apply it to their local data-
base mirror.
As a comparison, PGCluster and Slony are the two most advanced packages mentioned
here. PGCluster has a multi-master capability and large object support; it can be used in rapid
recovery from failure when combining its load-balancer with the replicator. Slony is designed
Open Source Science Journal Vol. 2, No. 3, 2010


more for data centers to keep hot-backups of the data servers. Slony does not support hot-
failover, but does support planned switch over.
Pg_pool is a simpler concept than PGCluster and Slony and understandably less fea-
tured, designed for pooling not so much for high availability and load-balancing as PGCluster
or Slony. Pg_comparator is not replication, its best used for synchronizing two tables. This
type of tool could be used when bringing a failed Slony master back-online.
Bottom line, it seems like PGCluster is the package to use for those that really need that.
For almost real-time backups one could use Slony, remember its failover features aren't as
complete as PGCluster. A mostly static database, say a read-only copy for web content could
benefit from pgpool by using non-transactional selects. Pg_comparator could be used for this
read-only database server pool to push changes into the live server.
The lazy replication can use a loose collection of potentially unreliable servers scattered
across the globe, and you may find it attractive if your project is short on cash and you choose
it as method of sharing the hosting burden. Lazy replication can also be used to make money
for your project, if someone needs to have continuous updates, charge them for this service

2. Slony-I and PGCluster

Slony-I is a trigger-based master to multiple slaves replication system for PostgreSQL
being developed by Jan Wieck [2]. This enterprise-level replication solution works asynchro-
nously and offers all key features required by data centers.
Among the key Slony-I usage scenarios are:
a. Database replication from the head office to various branches to reduce bandwidth us-
age or speed up database requests.
b. Database replication to offer load balancing in all instances, this can be particularly
useful for report generators or dynamic Web sites.
c. Database replication to offer high availability of database services.
d. Hot backup using a standby server or upgrades to a new release of PostgreSQL.
PGCluster is the synchronous replication system of the multi-master composition for
PostgreSQL [7]. PGCluster consists of three kinds of servers: a load balancer, a cluster DB
and a replication server.
The structure of a PGCluster replication:
a. Clusters are represented by the database instance, containing the data’s that are going
to be replicated.
b. Load balancers exist to share the query load between all the databases in the replication
c. The replicator is used to replicate, or synchronize, data between all the clusters.

3. Installation and configuration of Slony-I and PGCluster

The steps required to install Slony-I, replicate a simple database located on the same
machine and making it a high-available solution to provide automatic failover could be as fol-
lows. First make sure that the installation of PostgreSQL from source worked fine, supposing
that the version installed is 7.3.2 or higher [5]. Proceed with the following commands:

% tar -zxvf slony1-1.0.5.tar.gz
% cd slony1-1.0.5
% ./configure --with-pgsourcetree=/usr/src/redhat/BUILD/postgresql-7.4.5
Open Source Science Journal Vol. 2, No. 3, 2010


% make install

Here we tell the Slony-I's configure script to look in /usr/src/redhat/BUILD/postgresql-
7.4.5/ for the location of the PostgreSQL sources, the directory used when building the Post-
greSQL 7.4.5 RPMs on Red Hat Enterprise Linux. The last command compiles Slony-I and
installs the following files:
is the administration and configuration script utility of Slony-I used to
modify Slony-I replication systems [4].
is a multithreaded engine that use the information from the replication
schema to communicate with other engines, creating the distributed replication system.
contains the C functions and triggers.
contains additional data type to store transaction IDs safely.
is the replication schema.
contains replication functions.
is a script used to load the additional data type previously
For creating the database to be replicated let's create a user UserApplication, the DBAp-
plication database and activate the plpgsql programming language to this newly created Post-
greSQL database by proceeding with the following commands [10]:
% su - postgres
% createuser --pwprompt UserApplication
Enter password for user "UserApplication": (specify a password)
Enter it again: Shall the new user be allowed to create databases? (y/ n) y
Shall the new user be allowed to create more new users? (y/ n) n
% createdb -O UserApplication DBApplication
% createlang -U postgres -h localhost plpgsql \

For this purpose create a sequence and the table in the database to be replicated and in-
sert some information in the table:
% psql -U UserApplication DBApplication
DBApplication=> create sequence contact_seq start with 1;
DBApplication=> create table contact (
cid int4 primary key,
name varchar(50),
address varchar(255),
phonenumber varchar(15)
DBApplication=> insert into contact (cid, name, address,phonenumber) values ((se-
lect nextval('contact_seq')),'Joe', '1 Foo Street', '(592) 471-8271');
DBApplication=> insert into contact (cid, name, address,phonenumber) values ((se-
lect nextval('contact_seq')),'Robert', '4 Bar Roard', '(515) 821-3831');
DBApplication=> \q

Now create a second database on the same system in which we will replicate the infor-
mation from the DBApplication database. Proceed with the following commands to create the
database, add plpgsql programming language support and import the schema without any data
from the DBApplication database:
% su - postgres
% createdb -O UserApplication DBApplication_slave
% createlang -U postgres -h localhost plpgsql \
Open Source Science Journal Vol. 2, No. 3, 2010


% pg_dump -s -U postgres -h localhost DBApplication | \
psql -U postgres -h localhost DBApplication_slave

Once the databases are created, we are ready to create our database cluster containing a
master and a single slave. The Slonik script to execute may look like the following one.

slonik <<_EOF_
cluster name = $CLUSTER;
node 1 admin conninfo = 'dbname=$DB1 host=$H1 user=$U';
node 2 admin conninfo = 'dbname=$DB2 host=$H2 user=$U';

init cluster (id = 1, comment = 'Node 1');
create set (id = 1, origin = 1,
comment = 'contact table');
set add table (set id = 1, origin = 1, id = 1,
full qualified name = '',
comment = 'Table contact');
set add sequence (set id = 1, origin = 1, id = 2,
full qualified name = 'public.contact_seq',
comment = 'Sequence contact_seq');
store node (id = 2, comment = 'Node 2');
store path (server = 1, client = 2,
conninfo = 'dbname=$DB1 host=$H1 user=$U');
store path (server = 2, client = 1,
conninfo = 'dbname=$DB2 host=$H2 user=$U');
store listen (origin = 1, provider = 1, receiver = 2);
store listen (origin = 2, provider = 2, receiver = 1);

The instruction

defines the namespace where all Slony-I-specific functions, procedures,
tables and sequences are defined.
In Slony-I, a node is a collection of a database and a slon process, and a cluster is a col-
lection of nodes, connected using paths between each other. Then, the connection information
for node 1 and 2 is specified, and the first node is initialized (init cluster). Once completed,
the script creates a new set to replicate, which is essentially a collection containing the pub- table and the public.contact_seq sequence. After the creation of the set, the script
adds the contact table to it and the contact_seq sequence. The store node command is used to
initialize the second node (id = 2) and add it to the cluster (sql_cluster). Once completed, the
scripts define how the replication system of node 2 connects to node 1 and how node 1 con-
nects to node 2. Finally, the script tells both nodes to listen for events (stores listen) for every
other node in the system.
Once this script has been executed, just start the slon replication processes, that always
must run in order for the replication to take place. A slon process is needed on the master and
slave nodes. If for some reason they must be stopped, simply restarting allows them to con-
Open Source Science Journal Vol. 2, No. 3, 2010


tinue where they left off. To start the replication engines, proceed with the following com-
% slon sql_cluster "dbname=DBApplication user=postgres" &
% slon sql_cluster "dbname=DBApplication_slave user=postgres" &

Next, we need to subscribe to the newly created set. Subscribing to the set causes the
second node, the subscriber, to start replicating the information of the contact table and con-
tact_seq sequence from the first node. Here it how the content of the subscription script might
look like.
slonik <<_EOF_
cluster name = $CLUSTER;
node 1 admin conninfo = 'dbname=$DB1 host=$H1 user=$U';
node 2 admin conninfo = 'dbname=$DB2 host=$H2 user=$U';

subscribe set (id = 1, provider = 1, receiver = 2, forward = yes);

The script starts by defining the cluster namespace and the connection information for
the two nodes. Once completed, the subscribe set command causes the first node to start rep-
licating the set containing a single table and sequence to the second node using the slon proc-
esses. Once the script has been executed, connect to the DBApplication_slave database and
examine the content of the contact table.
At any moment, you should see that the information was replicated correctly:

% psql -U UserApplication DBApplication_slave
DBApplication_slave=> select * from contact;

Now, connect to the DBApplication database and insert a row [11]:

% psql -U contact DBApplication
DBApplication=> begin;
insert into contact (cid, name,address, phonenumber) values
((select nextval('contact_seq')), 'William','81 Zot Street', '(918) 817-6381');

If you examine the content of the contact table of the DBApplication_slave database
once more, you will notice that the row was replicated. Now, delete a row from the DBAppli-
cation database:

DBApplication=> begin;
delete from contact where cid = 2; commit;

Again, by examining the content of the contact table of the DBApplication_slave data-
base, you will notice that the row was removed from the slave node correctly.
Instead of comparing the information for DBApplication and DBApplication_slave ma-
nually, we easily can automate this process with a simple script. Such a script could be
Open Source Science Journal Vol. 2, No. 3, 2010


executed regularly to ensure that all nodes are in sync, notifying the administrator if that is no
longer the case.
echo -n "Comparing the databases..."
psql -U $U -h $H1 $DB1 >dump.tmp.1.$$ <<_EOF_
select 'contact'::text, cid, name, address,
phonenumber from contact order by cid;
psql -U $U -h $H2 $DB2 >dump.tmp.2.$$ <<_EOF_
select 'contact'::text, cid, name, address,
phonenumber from contact order by cid;
if diff dump.tmp.1.$$ dump.tmp.2.$$ >dump.diff ; then
echo -e "\nSuccess! Databases are identical."
rm dump.diff
echo -e "\nFAILED - see dump.diff."
rm dump.tmp.?.$$

For taking this experiment with a replication system on nodes located on separate com-
puters, just modify the DB2, H1 and H2 environment variables. Normally, DB2 would be set
to the same value as DB1, so an application always refers to the same database name. The
host environment variables would need to be set to the fully qualified domain name of the two
nodes. You also would need to make sure that the slon processes are running on both com-
puters. Finally, it is good practice to synchronize the clocks of all nodes using ntpd or some-
thing similar. Later, if you want to add more tables or sequences to the initial replication set,
you can create a new set and use the merge set slonik command. Alternatively, you can use
the set move table and set move sequence commands to split the set.
In case of a failure from the master node, due to an operating system crash or hardware
problem, for example, Slony-I does not provide any automatic capability to promote a slave
node to become a master. This is problematic because human intervention is required to pro-
mote a node, and applications demanding highly available database services should not de-
pend on this. Luckily, plenty of solutions are available that can be combined with Slony-I to
offer automatic failover capabilities. The Linux-HA Heartbeat (Fig. 1) program is one of
Open Source Science Journal Vol. 2, No. 3, 2010


Fig. 1. Heartbeat switches the IP alias to the slave node in case the master fails.

When having a master and slave node connected together using an Ethernet and serial
link, in this configuration, the Heartbeat is used to monitor the node's availability through
those two links. The application makes use of the database services by connecting to Post-
greSQL through an IP alias, which is activated on the master node by the Heartbeat. If the
Heartbeat detects that the master node has failed, it brings the IP alias up on the slave node
and executes the slonik script to promote the slave as the new master [6].
The script is relatively simple. The following listing shows the content of the script that
would be used to promote a slave node, running on, so it starts offering all
the database services that offered.

su - postgres -c slonik <<_EOF_
cluster name = $CLUSTER;
node 1 admin conninfo = 'dbname=$DB1 host=$H1 user=$U';
node 2 admin conninfo = 'dbname=$DB2 host=$H2 user=$U';

failover (id = 1, backup node = 2);
drop node (id = 1, event node = 2);

The failover Slonik command is used to indicate that the node with id = 1, the node
running on, has failed, and that the node with id = 2 will take over all
sets from the failed node. The second command, drop node, is used to remove the node with
id = 1 from the replication system completely. Eventually, you might want to bring back the
failed node in the cluster. To do this, you must configure it as a slave and let Slony-I replicate
any missing information. Eventually, you can proceed with a switch back to the initial master
node by locking the set (lock set), waiting for all events to complete (wait for event), moving
the set to a new origin (move set) and waiting for a confirmation that the last command has
For PostgreSQL replication with PGCluster we will configure two different computers,
Open Source Science Journal Vol. 2, No. 3, 2010


and first of all we will proceed with the installation of Ubuntu 9.04. The model of replication
uses 2 entities: one with a cluster database, a load balancer and a replication server and the
other one with a cluster database and a replication server. Next install some compiler tools in
order to build PGCluster, this includes gcc, bison, and flex, as well as their respective -devel
packages. To install these packages, simply run the following from the command prompt on
each computer [15]:

apt-get -y install gcc g++ flex bison build-essential libreadline-dev zlib1g-dev

Let’s now download and compile PGCluster. There are 2 different methods for building
PGCluster. The first involves downloading a patch to the original PostgreSQL source
distribution. Simply apply the patch prior to compiling PostgreSQL in order to add the
PGCluster support. This method is very useful if there are other patches you wish to install
prior to building PostgreSQL. The second method involves downloading the complete
PGCluster distribution. This distribution includes the PostgreSQL source tree already patched
with PGCluster. We will use the second, full-distribution method for simplicity.



tar -xzvf pgcluster-1.9.0rc5.tar.gz

Move to the PGCluster folder in order to proceed with the installation:

cd pgcluster-1.9.0rc5
./configure && make && make install

Prior to configuring PGCluster, it is important to ensure that the hostnames for all the
PGCluster components can be resolved to IP addresses. You may do this using DNS, or sim-
ply add the required entries to the /etc/hosts file. In our case, clusterdb1, pglb, and pgrepl1 all
resolve to Both clusterdb2 and pgrepl2 resolve to
We open /etc/hosts and add the following two lines, for implementing those settings: clusterdb1 pglb pgrepl1 clusterdb2 pgrepl2

Now add the postgres user:
useradd postgres

mkdir /home/postgres
chmod 700 /home/postgres/
chown postgres /home/postgres

In /etc/passwd we change the line


Give rights to postgres user on pgsql folder:
chown -R postgres /usr/local/pgsql

Append these lines to bashrc file:
gedit ~/.bashrc
set PGDATA=/usr/local/
Open Source Science Journal Vol. 2, No. 3, 2010


export PGDATA=/usr/local/
set LD_LIBRARY_PATH=/usr/local/pgsql/lib
export LD_LIBRARY_PATH=/usr/local/pgsql/lib

For the purpose of initializing the DB, we will use the initdb command, as postgres us-

su -l postgres
cd /usr/local/pgsql/
/usr/local/pgsql/bin/initdb -D /usr/local/pgsql/data

PGCluster makes use of several configuration files, each specific to the component you
are installing. First, we will configure each of the clusterdb instances.
In the /usr/local/pgsql/data directory, you will find three files that need to be modified:
cluster.conf, postgresql.conf, and pg_hba.conf.
The cluster.conf file defines characteristics of the database cluster, as well as the repli-
cation server that will be used.
The postgresql.conf is the standard PostgreSQL configuration file.
The pg_hba.conf file is used for defining PostgreSQL security and access controls. This
file must be modified to trust connections originating from all other databases in the cluster.
Let’s now edit each one, and apply the corresponding changes in order to have the rep-
lication working.
gedit /usr/local/pgsql/data/cluster.conf
The hostname must be changed to the value “pgrepl1” for the first computer or
“pgrepl2” for the other computer. For using SSH cor rectly, just change “ssh -1” with “ssh -2”.

gedit /usr/local/pgsql/data/postgresql.conf
Here is just one single change to be done, uncomment the port 5432 for enabling con-
nections using the default port for PostgreSQL.

gedit /usr/local/pgsql/data/pg_hba.conf
First add the ip addresses [9]:
host all all trust
host all all trust
cp /usr/local/pgsql/share/pgreplicate.conf.sample
gedit /usr/local/pgsql/data/pgreplicate.conf

Modify at the beginning the existing script with the following lines:


The configuration must include some settings for the load balancer, so where you find
the next four lines

# <Host_Name></Host_Name>
# <Recovery_Port>6001</Recovery_Port>
Open Source Science Journal Vol. 2, No. 3, 2010



just delete them and insert as follows:

Modify the hostname

depending on the computer you are referring at.

Only for computer 1 who will have in the configuration scheme a load balancer we must
run some configuration settings:

cp /usr/local/pgsql/share/pglb.conf.sample /usr/local/pgsql/data/pglb.conf
gedit /usr/local/pgsql/data/pglb.conf

Modify the lines from the script referring to clusterDB configuration and load balancer



<Port> 5432</Port>




<Host_Name> clusterdb2</Host_Name>
<Host_Name>pglb </Host_Name>

In order to have an active communication/connection between the two computers we
just settled, we must disable firewall on each computer, and for Ubuntu 9.04 the command is:

ufw disable.
It is possible to use SSH to encrypt the network connection between clients and a Post-
greSQL server. Done properly, this provides an adequately secure network connection, even
for non-SSL-capable clients.

Installation of SSH consists of some basic steps:
apt-get install ssh
To generate the ssh key on the first computer run as follows:

mkdir ~/.ssh
chmod 700 ~/.ssh
ssh-keygen -q -f ~/.ssh/id_rsa -t rsa
Enter passphrase (empty for no passphrase): …
Enter same passphrase again: …
chmod go-w ~/
chmod 700 ~/.ssh
chmod go-rwx ~/.ssh/*
scp ~/.ssh/ clusterdb2:

and now we are connected using ssh to clusterdb2, where we shall continue the ssh settings:

mkdir ~/.ssh
chmod 700 ~/.ssh
cat ~/ >> ~/.ssh/authorized_keys
Open Source Science Journal Vol. 2, No. 3, 2010


chmod 600 ~/.ssh/authorized_keys
rm ~/

For the second computer which will connect via ssh to clusterdb1 there must be done
the same steps done with the first computer. To check that ssh was installed correctly just use
this command
ssh - PreferredAuthentications=publickey clusterdb1.

With our configuration out of the way, we are ready to start the services and begin
working with PGCluster. The order in which you start the services is important. The first
command is referring to the Cluster DB service.
/usr/local/pgsql/bin/pg_ctl -D /usr/local/pgsql/data -o "-i" start
To start the load balancer service, run the following command as the postgres user on
the computer with a load balancer configuration.
/usr/local/pgsql/bin/pglb -D /usr/local/pgsql/data
/usr/local/pgsql/bin/pgreplicate -D /usr/local/pgsql/data
If you encounter errors just stop and then restart pgreplicate and pg_ctl on both com-

4. Comparison of different solutions implemented for replication purposes

Shared Disk Failover
Shared disk failover [11] avoids synchronization overhead by having only one copy of
the database. It uses a single disk array that is shared by multiple servers. If the main database
server fails, the standby server is able to mount and start the database as though it were recov-
ering from a database crash. This allows rapid failover with no data loss.
Shared hardware functionality is common in network storage devices. Using a network
file system is also possible, though care must be taken that the file system has full POSIX be-
havior. One significant limitation of this method is that if the shared disk array fails or be-
comes corrupt, the primary and standby servers are both nonfunctional. Another issue is that
the standby server should never access the shared storage while the primary server is running.

File System (Block-Device) Replication
All changes to a file system are mirrored to a file system residing on another computer.
The only restriction is that the mirroring must be done in a way that ensures the standby
server has a consistent copy of the file system — s pecifically, writes to the standby must be
done in the same order as those on the master. DRBD is a popular file system replication solu-
tion for Linux.

Warm and Hot Standby Using Point-In-Time Recovery (PITR)
Warm and hot standby servers can be kept current by reading a stream of write-ahead
log (WAL) records. If the main server fails, the standby contains almost all of the data of the
main server, and can be quickly made the new master database server. This is asynchronous
and can only be done for the entire database server.

Trigger-Based Master-Standby Replication
A master-standby replication setup sends all data modification queries to the master
server. The master server asynchronously sends data changes to the standby server. The
standby can answer read-only queries while the master server is running. The standby server
is ideal for data warehouse queries.Slony-I is an example of this type of replication, with per-
table granularity, and support for multiple standby servers. Because it updates the standby
server asynchronously (in batches), there is possible data loss during fail over.

Open Source Science Journal Vol. 2, No. 3, 2010


Statement-Based Replication Middleware
A program intercepts every SQL query and sends it to one or all servers. Each server
operates independently. Read-write queries are sent to all servers, while read-only queries can
be sent to just one server, allowing the read workload to be distributed. Pgpool-II and Sequoia
are examples of this type of replication.

Asynchronous Multimaster Replication
For servers that are not regularly connected, like laptops or remote servers, keeping data
consistent among servers is a challenge. Using asynchronous multimaster replication, each
server works independently, and periodically communicates with the other servers to identify
conflicting transactions. The conflicts can be resolved by users or conflict resolution rules.
Bucardo is an example of this type of replication.

Synchronous Multimaster Replication
In synchronous multimaster replication, each server can accept write requests, and
modified data is transmitted from the original server to every other server before each transac-
tion commits. Heavy write activity can cause excessive locking, leading to poor performance.
In fact, write performance is often worse than that of a single server. Read requests can be
sent to any server. Some implementations use shared disk to reduce the communication over-
head. Synchronous multimaster replication is best for mostly read workloads, though its big
advantage is that any server can accept write requests — there is no need to partition work-
loads between master and standby servers, and because the data changes are sent from one
server to another, there is no problem with non-deterministic functions like random().

Commercial Solutions
Because PostgreSQL is open source and easily extended, a number of companies have
taken PostgreSQL and created commercial closed-source solutions with unique failover, rep-
lication, and load balancing capabilities.

There are a few solutions that do not fit into the above categories:
Data Partitioning
Data partitioning splits tables into data sets. Each set can be modified by only one serv-
er. For example, data can be partitioned by offices, e.g., London and Paris, with a server in
each office. If queries combining London and Paris data are necessary, an application can
query both servers, or master/standby replication can be used to keep a read-only copy of the
other office's data on each server.

Multiple-Server Parallel Query Execution
Many of the above solutions allow multiple servers to handle multiple queries, but none
allow a single query to use multiple servers to complete faster. This solution allows multiple
servers to work concurrently on a single query. It is usually accomplished by splitting the data
among servers and having each server execute its part of the query and return results to a cen-
tral server where they are combined and returned to the user. Pgpool-II has this capability. Al-
so, this can be implemented using the PL/Proxy tool set.

5. Open source vs. Commercial products

Open source databases have revolutionized the economics of database management by
promising a wide range of businesses the opportunity to build high value systems with a low
Open Source Science Journal Vol. 2, No. 3, 2010


cost of entry and incremental cost scaling by the horizontal addition of databases. This model
is especially attractive for businesses like SaaS, ISPs, social networking sites, and many en-
terprise systems that need to scale investment as the value of the applications grows. The
problem is that while horizontal database scaling works well for small - to medium - sized ap-
plications, large clusters of open source databases are expensive to implement, difficult to
manage, and poses a variety of operational risks, including poor availability, data loss, ineffi-
cient usage of resources, and problems with performance scaling.
This leaves customers with a difficult choice - invest in the management capabilities to
make open source databases work, migrate to far more expensive commercial database sys-
tems, or consider even more radical options like discarding SQL databases. In all cases, cus-
tomers end up with enormously increased costs that severely limit profitability or even cause
their businesses to fail.

Fig. 2. Incremental costs for Open Source vs. Commercial Database

What the market really needs is a solution to make existing open source DBMS busi-
ness manageable: lower overall cost, raise flexibility, and lower risks of operation as busi-
nesses scale. In short, the market needs a solution to the cost-scaling problem (Figure 2).
MySQL has revolutionized the simplicity and accessibility of databases in general and
replication in particular. For example, MySQL users have created cost-effective read scaling
solutions using master/slave replication for years. By the principle that you can spend time to
save money, the community driven MySQL replication monitoring tools are a collection of
scripts in SQL and other languages, to achieve the same result that others can get with expen-
sive applications. Everything in this project is made with community involvement, by open
source principles. The main goal of this project is to take advantage of MySQL 5.1 features to
create a portable, SQL-based monitoring system. By the other hand breaking MySQL’s repli-
cation (Figure 3) isn’t hard [13]. The simple implementation that make it easy to set up, also
means there are many ways to stop, confuse, and otherwise disrupt it. We have to mention
common issues you might have to deal with by implementing a MySQL replication configu-
ration: errors caused by data corruption or loss, unexpected master shutdown, unexpected
slave shutdown, binary logs corrupted on the master, relay logs corrupted on the slave, binary
log out of sync with the InnoDB transaction log, bytes changed but the event is still valid
SQL, bytes changed and the event is invalid SQL, bytes omitted and/or the event’s length is
wrong, several events corrupted or overwritten, or offsets have shifted and the next event
starts at the wrong offset.
For a variety of reasons, MySQL replication is not very resilient to crashes, power out-
ages and corruption caused by disk, memory, or network errors. You’ll almost certainly have
to restart replication at some point due to one of these problems.
Open Source Science Journal Vol. 2, No. 3, 2010


Fig. 3. MySQL replication architecture [15]

On the other hand, Oracle replication products offer state-of-the-art solutions for avail-
ability, heterogeneous replication, application upgrade, and other problems. For example, [14]
Oracle Streams and Golden Gate TDM offer very advanced solutions to the problem of data
migration with minimal downtime. The big problem with these solutions is not capabilities,
but administrative complexity and cost.
Despite ubiquity, replication is complex to use and in the case of commercial databases
quite expensive to boot.
Oracle's four basic types of replication are as follows:

Table 1. Replication Types for Oracle
Replication Type Description
Read-only materialized views
A master table is copied to one or more databases. Changes in the
master table are reflected in the snapshot tables whenever the
snapshot refreshes. The snapshot site determin
es the frequency of
the refreshes; data is pulled.
Updateable materialized views Similar to read-
only snapshots, except that the snapshot sites are
able to modify the data and send their changes back to the master.
The snapshot site determines the frequen
cy of the refreshes and
the frequency with which updates are sent back to the master.
Multi-master replication
A table is copied to one or more databases, and each database has
the ability to insert, update, or delete records from it. Modifica-
tions are pu
shed to the other database at an interval that the DBA
sets for each replication group. The highest theoretical frequency
is once per second.
Procedural replication A call to a packaged procedure or function is repli
cated to one or
more databases.

As you can see, these types of replication are quite different, and each is suited for spe-
cific kinds of uses. A single environment can utilize all of these methods; they are not mutu-
ally exclusive.
Coming back to PostgreSQL, two or more database servers can work together to allow a
second server to take over quickly if the primary server fails (high availability), or to allow
Open Source Science Journal Vol. 2, No. 3, 2010


several computers to serve the same data (load balancing). Ideally, database servers could
work together seamlessly. Web servers serving static web pages can be combined quite easily
by merely load-balancing web requests to multiple machines. In fact, read-only database serv-
ers can be combined relatively easy too. Unfortunately, most database servers have a
read/write mix of requests, and read/write servers are much harder to combine. This is be-
cause though read-only data needs to be placed on each server only once, a write to any server
has to be propagated to all servers so that future read requests to those servers return consis-
tent results.
This synchronization problem is the fundamental difficulty for servers working to-
gether. Because there is no single solution that eliminates the impact of the sync problem for
all use cases, there are multiple solutions. Each solution addresses this problem in a different
way, and minimizes its impact for a specific workload.
Some solutions deal with synchronization by allowing only one server to modify the
data. Servers that can modify data are called read/write, master or primary servers. Servers
that track changes in the master are called standby or slave servers. A standby server that can-
not be connected to until it is promoted to a master server is called a warm standby server,
and one that can accept connections and serves read-only queries is called a hot standby
Some solutions are synchronous, meaning that a data-modifying transaction is not con-
sidered committed until all servers have committed the transaction. This guarantees that a
failover will not lose any data and that all load-balanced servers will return consistent results
no matter which server is queried. In contrast, asynchronous solutions allow some delay be-
tween the time of a commit and its propagation to the other servers, opening the possibility
that some transactions might be lost in the switch to a backup server, and that load balanced
servers might return slightly stale results. Asynchronous communication is used when syn-
chronous would be too slow.
Solutions can also be categorized by their granularity. Some solutions can deal only
with an entire database server, while others allow control at the per-table or per-database lev-
Performance must be considered in any choice. There is usually a trade-off between
functionality and performance. For example, a fully synchronous solution over a slow net-
work might cut performance by more than half, while an asynchronous one might have a
minimal performance impact.

6. Conclusions

Linux provides a high performance, reliable, open-source solution and its low cost
makes it the platform of choice for companies that are targeting the new Web economy. The
solution presented enhances the stability and power of Linux (Fig. 4) with high availability
Linux data protection, software solution that provides fault-resilience to support mission-
critical applications. This software application ensures the continuous availability of applica-
tions by maintaining system uptime. It maintains the high availability of Linux cluster sys-
tems by monitoring system and application health, maintaining client connectivity and provid-
ing uninterrupted data access regardless of where clients reside - on the corporate Internet,
intranet or extranet.

This implementation works on both physical servers and virtual ma-

It provides fault resilience for Linux cluster environments by enabling other servers in
a cluster to take over for failed servers or failed applications. Total cost of ownership is re-
duced, because it supports an active-active and N+1 server configurations. These models
eliminate the need for extra servers dedicated for hot backup and allow clients and applica-
Open Source Science Journal Vol. 2, No. 3, 2010


tions to failover to other production servers in the cluster.

Fig. 4. Replication scheme for Linux clusters

If an event creates an interruption in a server's availability, it automatically moves the
protected resources and applications to another server in the Linux cluster (Figure 5). Because
this switchover is transparent to clients, a system failure does not impact users' productivity. It
migrates all applications and transfers connectivity in such a way that clients have continuous
access to applications and data. This ensures that all clients - from internal users to customers
shopping online - are not affected by unanticipated system failures.

Fig. 5. Linux Database Server Failure
Open Source Science Journal Vol. 2, No. 3, 2010


Replicating databases using Slony-I is relatively simple. Combined with the Linux-HA
Heartbeat, this allows you to offer high availability of your database services. Although the
combination of Slony-I and Linux HA-Heartbeat is an attractive solution, it is important to
note that this is not a substitute for good hardware for your database servers.
Even with its small limitations, like not being able to propagate schema changes or rep-
licate large objects, Slony-I is a great alternative to both rserv and ERServer and is now, in
fact, the preferred solution for replicating PostgreSQL databases.
PGCluster is an extension of the PostgreSQL database, designed to give it synchronous,
multi-master replication. Replication is one of the areas in which PostgreSQL is lacking in
comparison to other proprietary databases, such as Oracle or Microsoft SQL Server. Since a
replication system is a synchronous replication, delay does not occur with the data duplicate
between the Cluster DBs. Since a server is multi-master composition, two or more the Cluster
DBs can receive access from a user simultaneously. PGCluster consists of three kinds of serv-
ers, Cluster DB, a load balancer, and a replication server.
The performance of processing reference queries in PGCluster rises as the number of
cluster servers increase because the load is distributed to more servers. However, update que-
ries are synchronous and require replication, which results in slower responses.
Data replication (Figure 6) allows replication of data between a source server where
data changes are captured, and a target „server” where data changes are applied.

Fig. 6. How replication works

Linux Data Replication Challenges:
- Ensuring that your data is available 24x7x365 regardless of the type of failure: storage
subsystem, user error or site loss
- Overcoming bandwidth and latency limitations associated with WAN replication
- Protection of data across virtual, physical or mixed environments
- Protecting data using your existing hardware and infrastructure
- Protection for all data - not just specific applications


[1] PostgreSQL, Available at:
[2] Slony-I, Available at:
[3] Slony-I, Available at:
Open Source Science Journal Vol. 2, No. 3, 2010


[4] Slonik Command Summary, , Available at:
[5] Slony-I Mailing List, Available at:

[6] Linux-HA Heartbeat, Available at:

[7] PGCluster, Available at:


[9] A. Brookins and M. Holloway, Practical PostgreSQL,

Commandprompt, Inc, 2001.

[10] N. Matthew and R. Stones, Beginning Databases with PostgreSQL: From Novice to Pro-
fessional, Second Edition, Apress, 2005.

[11] E. Geschwinde and H. J. Schoenig, PostgreSQL Developer's Handbook, Sams, Decem-
ber 28, 2001.
[12] O'Reilly's Open Source Conference, 2004.
[13] B. Schwarts, P. Zaitsev, V. Tkachenko, J. D. Zawodny, A. Lentz, D. J. Balling, High
Performance MySQL, O’Reilly Media, Inc., 2008.

[14] Oracle® Database Advanced Replication 10g Release 1 (10.1), December 2003.

[15] B. Momjian, PostgreSQL: Introduction and Concepts, Addison-Wesley, December 2000.


Elena - Mirela DROG has graduated the Faculty of Cybernetics, Statistics and
Computer Science in 2009 and the Faculty of Finances, Insurances, Banks and
Stock Exchanges in 2010. Amongst her fields of interest are DB Administration
and also the development and usage of Open Source Technologies. She is currently
working as a Database Administrator at Vodafone Romania, learning and
practicing the principles of Data Warehouse, administration and software com-
parative research.