Building and Configuring Slony

Published on ONLamp.com (http://www.onlamp.com/)
http://www.onlamp.com/pub/a/onlamp/2004/11/18/slony.html

Introducing Slony

by A. Elein Mustain

11/18/2004

Slony is the Russian plural for elephant. It is also the name of the new replication project being developed by Jan Wieck. The mascot for Slony, Slon, is a good variation of the usual Postgres elephant mascot, created by Jan.


Figure 1. Slon, the Slony mascot.

Slony-I, the first iteration of the project, is an asynchronous replicator of a single master database to multiple replicas, which in turn may have cascaded replicas. It will include all features required to replicate large databases with a reasonable number of replicas. Jan has targeted Slony-I toward data centers and backup sites, implying that all nodes in the network are always available.

The master is the primary database with which the applications interact. Replicas are replications, or copies, of the primary database. Since the master database is always changing, data replication is the system that keeps secondary, or replica, databases up to date as the master database changes. In synchronous replication systems, the master and the replica are consistent exact copies. The client does not receive a commit until all replicas have the transaction in question. Asynchronous replication loosens that binding and allows the replica to copy transactions from the master, rolling forward at its own pace. The server issues a commit to the master client based on the state of the master database transaction.

Cascading replicas over a WAN minimizes bandwidth use, enabling better scalability, and also enables read-only (for example, reporting) applications to take advantage of replicas.




Figure 2. Cascading replicas

Assume you have a primary site, with a database server and a replica as backup server. Then you create a remote backup center with its own main server and its backup replica. The remote primary server is a direct replica, replicating from the master over the WAN, while the remote secondary server is a cascaded replica, replicating from the primary server via the LAN. This avoids transferring all of the transactions twice over the WAN. More importantly, this configuration enables you to have a remote backup with its own local failover already in place for cases such as a data center failure.

Slony's design goals differentiate it from other replication systems. The initial plan was to enable a few very important key features as a basis for implementing these design goals. An underlying theme to the design is to update only that which changes, enabling scalable replication for a reliable failover strategy.

The design goals for Slony are:

1. The ability to install, configure, and create a replica and let it join and catch up with a running database.

This allows the replacement of both masters and replicas. This idea also enables cascading replicas, which in turn adds scalability, limitation of bandwidth, and proper handling of failover situations.

2. Allowing any node to take over for any other node that fails.

In the case of a failure of a replica that provides data to other replicas, the other replicas can continue to replicate from another replica or directly from the master.


Figure 3. Replication continues after a failure



In the case where a master node fails, a replica can receive a promotion to become a master. Any other replicas can then replicate from the new master. Because Slony-I is asynchronous, the different replicas may be ahead of or behind each other. When a replica becomes a master, it synchronizes itself with the state of the most recent other replica.

In other replication solutions, this roll forward of the new master is not possible. In those solutions, when promoting a replica to master, any other replicas that exist must rebuild from scratch in order to synchronize with the new master correctly. A failover of a 1TB database leaves the new master with no failover of its own for quite a while.

The Slony design handles the case where multiple replicas may be at different synchronization times with the master and are able to resynchronize when a new master arises. For example, different replicas could logically be in the future, compared to the new master. There is a way to detect and correct this. If there weren't, you would have to dump and restore the other replicas from the new master to synchronize again.

It's possible to roll forward the new master, if necessary, from other replicas because of the packaging and saving of the replication transactions. Replication data is packaged into blocks of transactions and sent to each replica. Each replica knows what blocks it has consumed. Each replica can also pass those blocks along to other servers -- this is the mechanism of cascading replicas. A new master may be on transaction block 17 relative to the old master, when another replica is on transaction block 20 relative to the old master. Switching to the new master causes the other replicas to send blocks 18, 19, and 20 to the new master.

Jan said, "This feature took me a while to develop, even in theory."

3. Backup and point-in-time capability with a twist.

It is possible, with some scripting, to maintain a delayed replica as a backup that might, for example, be two hours behind the master. This is done by storing and delaying the application of the transaction blocks. With this technique, it is possible to do a point-in-time recovery anytime within the last two hours on this replica. The time it takes to recover depends only on the time to which you choose to recover. Choosing "45 minutes ago" would take about one hour and 15 minutes, for example, independent of database size.

4. Hot PostgreSQL installation and configuration.

For failover, it must be possible to put a new master into place and reconfigure the system to allow the reassignment of any replica to the master or to cascade from another replica. All of this must be possible without taking down the system.

This means that it must be possible to add and synchronize a new replica without disrupting the master. When the new replica is in place, the master switch can happen.

This is particularly useful when the new replica is a different PostgreSQL version than the previous one. If you create an 8.0 replica from your 7.4 master, it is now possible to promote the 8.0 to master as a hot upgrade to the new version.

5. Schema changes.



Schema changes require special consideration. The bundling of the replication transactions must be able to join all of the pertinent schema changes together, whether or not they took place in the same transaction. Identifying these change sets is very difficult.

In order to address this issue, Slony-I has a way to execute SQL scripts in a controlled fashion. This means that it is even more important to bundle and save your schema changes in scripts. Tracking your schema changes in scripts is a key DBA procedure for keeping your system in order and your database recreatable.

The first part of Slony-I also does not address any of the user interface features required to set up and configure the system. After the core engine of Slony-I becomes available, development of the configuration and maintenance interface can begin. There may be multiple interfaces available, depending on who develops the user interface and how.

Jan points out that "replication will never be something where you type SETUP and all of a sudden your existing enterprise system will nicely replicate in a disaster recovery scenario." Designing how to set up your replication is a complex problem.

The user interface(s) will be important to clarify and simplify the configuration and maintenance of your replication system. Some of the issues to address include the configuration of which tables to replicate, the requirement of primary keys, and the handling of sequence and trigger coordination.

The Slony-I release does not address the issues of multi-master, synchronous replication or sporadically synchronizable nodes (the "sales person on the road" scenario). However, Jan is considering these issues in the architecture of the system so that future Slony releases may implement some of them. It is critical to design future features into the system; analysis of existing replication systems has shown that it is next to impossible to add fundamental features to an existing replication system.

The primary question to ask regarding the requirements for a failover system is how much downtime you can afford. Is five minutes acceptable? Is one hour? Must the failover be read/write, or is it acceptable to have a read-only temporary failover? The second question you must ask is whether you are willing to invest in the hardware required to support multiple copies of your database. A clear cost/benefit analysis is necessary, especially for large databases.

References

General Bits Slony Articles on Tidbits

The Slony-I Project documentation on GBorg

Slonik Commands

Jan Wieck's Original Slony-I Talk and Scripts, July 2004 in Portland, OR, sponsored by Afilias Global Registry Services

Information from IRC's #slony on freenode.net

The Slony1-general@gborg.postgresql.org mailing list


A. Elein Mustain has more than 15 years of experience working with databases, 10 of those working exclusively with object relational database systems.




Building and Configuring Slony

by A. Elein Mustain

12/16/2004


Figure 1. Slony mascot

Editor's note: in Introducing Slony, A. Elein Mustain explained the goals of Slony, the replication project for PostgreSQL. This follow-up explains how to install, configure, and start using it.

Building

I am pleased to report that the basic instructions for the download, build, and install of Slony-I release 1.0.5 were perfect.

Slony-I is fairly version independent, but you still need to build it for each PostgreSQL version (7.3 or later) and installation on each machine participating in the replication. The same technique applies when the installations live on different machines.

On one machine, I run several versions of PostgreSQL, each built from source. My plan is to replicate between my 7.4 installation and my 8.0 installation, so I configured and built Slony-I against each of those source trees. That took less than a minute for both.

Repeat these steps for each source tree and installation:

./configure --with-pgsourcetree=/local/src/postgresql-version
make all
sudo make install


Setting Up Slony-I

These step-by-step instructions will replicate a small database named gb. The plan is to replicate from a PostgreSQL 7.4 installation to a PostgreSQL 8.0 installation, making it possible to upgrade the database.

Slonik is the command-line interface that defines the replication system. There will be Slonik scripts to create, update, and change the replication cluster for gb. There are also tools under development to simplify the creation of replication systems with Slony-I; however, this description will explore the underlying Slonik requirements. It is important to learn the basic Slonik commands.

About the database

gb is a simple eight-table database containing issues and articles for my General Bits web site. The database is normalized, and all tables have natural primary keys.

There are several prerequisites:





Each installation that will participate in replication must have Slony-I built and installed. The Slony-I Project on GBorg gives instructions for building and installing Slony-I. My experience with building Slony-I from source against PostgreSQL 7.4 and 8.0Beta3 was very good. Following the instructions provided clean and fast builds.



You need a set of master database tables to replicate and at least one other installation containing the same schema objects. The other installation will be the replica. To achieve this initially, I dumped and restored the schema for the master database on 7.4 into the 8.0 installation:

pg_dump -p 5434 -C -s gb | psql -p 5430

As you can see, these installations are on the same host and have different port numbers.



The real-time clocks of the servers hosting the nodes must be in sync. I recommend using NTP.



The pg_hba.conf files on each installation must allow each machine to contact the other (see the sketch after this list).
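As an illustration only (the addresses and auth method here are assumptions, not from the article), entries along these lines in each server's pg_hba.conf would let the postgres user on each node reach the other:

# hypothetical pg_hba.conf entries -- adjust addresses and auth method to your site
host    all    postgres    192.168.0.11/32    trust
host    all    postgres    192.168.0.12/32    trust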

Slonik

Slonik is a command-line interface for Slony-I. It can connect to the various databases involved in the replication scheme to perform specific actions. It is an independent helper of Slony-I and of PostgreSQL.

The first commands for most Slonik scripts constitute the identity of a group of databases and servers and the connection parameters for accessing each database in the group. Each database and Slony-I connection is a numbered node. The numbers are simply identifiers. The next parameter is the action you wish to process.

Slonik commands work well when they are embedded in shell scripts, as in this example. (The next section covers the commands to identify the cluster and node connection information.)

#!/bin/bash

slonik << _END_
cluster name = gb_cluster;
node 1 admin conninfo = 'dbname=db host=localhost port=5432 user=postgres';
node 2 admin conninfo = 'dbname=db host=localhost port=5430 user=postgres';

...additional nodes...

...slonik commands...

_END_

Both Slonik and Slony-I have full sets of commands.

Node networks

A node is the combination of a database in an installation and one slon process "belonging to" that database. A cluster is a set of nodes cooperating in a replication scheme.

The documentation suggests that all nodes have a path to all other nodes. With only two nodes, this is simple to describe. With more nodes, be sure to include a path to all other nodes, regardless of whether you expect replication to take the paths.




Figure 2. A Slony-I cluster

Our first Slonik script initializes the cluster, defines each node, and defines the paths from each node to every other node. Notice that each node has an identifying number. init cluster defines the cluster on the first node. store node adds each subsequent node. The user is the slony superuser -- in this case, postgres. You can choose any privileged user established as the Postgres superuser on each installation.

The path is defined by designating one node as a server and the other as a client for messaging. The terminology does not relate to the replicator/replica relationship; instead it references the possible network path. The connection information in each command belongs to the server node. The client's slon daemon will connect to the server node using that connection information.

#!/bin/bash
#
# 01: Initialize Cluster
#
slonik << _END_
cluster name = gbcluster;
node 1 admin conninfo = 'dbname=gb host=localhost port=5434 user=postgres';
node 2 admin conninfo = 'dbname=gb host=localhost port=5430 user=postgres';

#
# Initialize the cluster and create the second node
#
init cluster (id=1, comment='gb 7.4 5434');
echo 'Initializing gb cluster';
echo 'Node 1 on pgsql74 port 5434 defined';

store node (id=2, comment='gb 8.0 5430');
echo 'Node 2 on pgsql80b port 5430 defined';

#
# create paths in both directions
#
store path (server=1, client=2, conninfo='dbname=gb host=localhost
    port=5434 user=postgres');
store path (server=2, client=1, conninfo='dbname=gb host=localhost
    port=5430 user=postgres');
echo 'path from server node 1 to client node 2 created.';
echo 'path from server node 2 to client node 1 created.';

_END_

Using Slonik's echo command can help log and track the commands in any Slonik script.



Listening for events

Events will occur throughout the cluster, and you must tell Slony-I which nodes listen to which nodes to receive these events. The events may be replication information or administrative information that requires propagation throughout the cluster.

In the simple case of two nodes, they listen to each other. In any case, all nodes should be able to listen to all other nodes. The paths' definitions intentionally make this possible.

Specifying the origin identifies which node the receiver is listening for. The origin of an event may or may not provide the event to the receiver; however, the default is to do so. It is possible for node 3 to listen for events initiated on node 1 and have those events provided by node 2 (which, one assumes, is also listening for events from node 1).

In our case, we are having both nodes listen for events on the other, with the events provided by the origin node.

#!/bin/bash
#
# 02: Listen
#
slonik << _END_
cluster name = gbcluster;
node 1 admin conninfo = 'dbname=gb host=localhost port=5434 user=postgres';
node 2 admin conninfo = 'dbname=gb host=localhost port=5430 user=postgres';

#
# make the nodes listen on the paths
# in both directions
#
store listen (origin=1, receiver=2, provider=1);
store listen (origin=2, receiver=1, provider=2);

_END_



Starting the
slon

processes

Once the nodes can listen to each other for events, start
slon
. Each database participating in the
replication needs a
slon

process. Give
slon

a chance to start itself and its threads.

The output in our example goes to two logs, which you can
ta
il

to watch the activity and look for
errors.

slon

is essentially an event and messaging system. The events involve the replication of data and
administrative information to facilitate the replication of data.

#!/bin/sh
#
# 02: Start up Slon processes
#
# Start Slony for each node
#
slon gbcluster "dbname=gb user=postgres port=5434 host=localhost" > slon_gb_74.out 2>&1 &
slon gbcluster "dbname=gb user=postgres port=5430 host=localhost" > slon_gb_80.out 2>&1 &
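A quick way to watch both daemons come up (a sketch; the log names are the ones chosen in the startup script above):

tail -f slon_gb_74.out slon_gb_80.out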

Creating sets

Replication in Slony-I works by subscribing to sets of tables. The set usually should comprise the group of related tables for an application or an entire schema.

To make this work, first define a set and designate the origin for the set. Then add the tables by naming the set ID, the origin of the set, a table ID, the fully qualified table name, and an optional alternate key. Make sure to enter the origin of the set as it was in the set creation (redundantly). All of the tables participating in the replication must have a primary key. If the table does not have one, you can have Slony-I add one for replication purposes only.

Be careful when setting the ID number of a table; it also designates the order in which Slony will lock the tables. This means that master tables should have IDs lower than those of detail tables. The relationship hierarchy of your schema should help you determine the order of the numbers. If the ordering of the table IDs is backward or incorrect, there may be problems with deadlocking the slon process or PostgreSQL.

In our example, the issues table is the topmost master, followed by articles. Each of the other tables is a lookup table for those, so their numbers are higher, accordingly.


Figure 3. General Bits schema

You can create a set only once, without any active subscribers. To add tables to a replication set, create a new set. You can later combine two sets by using Slonik's MERGE SET command.

#!/bin/sh
#
# 03: Create Set
#
slonik << _END_
#
# Define cluster namespace and node connection information
#
cluster name = gbcluster;
node 1 admin conninfo = 'dbname=gb host=localhost port=5434 user=postgres';
node 2 admin conninfo = 'dbname=gb host=localhost port=5430 user=postgres';

create set (id=1, origin=1, comment='gb tables');
echo 'Set created';
set add table (set id=1, origin=1, id=1,
    fully qualified name = 'public.issues', comment='Issues table');
set add table (set id=1, origin=1, id=2,
    fully qualified name = 'public.articles', comment='Articles table');
set add table (set id=1, origin=1, id=3,
    fully qualified name = 'public.arttext', comment='Article Text table');
set add table (set id=1, origin=1, id=4,
    fully qualified name = 'public.sqlbase', comment='Full SQL keywords');
set add table (set id=1, origin=1, id=5,
    fully qualified name = 'public.whoall', comment='All contributors');
set add table (set id=1, origin=1, id=6,
    fully qualified name = 'public.contrib', comment='Contributors by Article');
set add table (set id=1, origin=1, id=7,
    fully qualified name = 'public.keywords', comment='Keywords by Article');
set add table (set id=1, origin=1, id=8,
    fully qualified name = 'public.sqlkw', comment='Subset of SQL keywords');
echo 'set 1 of gb tables created';
_END_

Subscribing to sets

The nodes can now subscribe to the newly created sets. To subscribe to a set, identify the set, the node that can provide the set, the receiver of the set, and whether the receiver of this set should be able to forward the set to another node. In our case, the origin node of the set is the same as the provider of the set, but for cascading subscriptions that is not necessarily the case.

Even though this replication system has only two nodes, we are saying that the receiving node may forward the set. This is for the case in which we may want to switch masters or add other nodes to the cluster. Here, node 2 is subscribing to set 1, originating on node 1 and provided by node 1.

#!/bin/sh
#
# gb_subscribeset.sh
#
slonik << _END_
#
# Define cluster namespace and node connection information
#
cluster name = gbcluster;
node 1 admin conninfo = 'dbname=gb host=localhost port=5434 user=postgres';
node 2 admin conninfo = 'dbname=gb host=localhost port=5430 user=postgres';

subscribe set (id=1, provider=1, receiver=2, forward=yes);
echo 'set 1 of gb tables subscribed by node 2';

_END_

Of course, you should assume that these scripts have no typos and that you've run them exactly as intended. Yeah, right. Fortunately, you can recover from mistakes.



Undoing

By this time, you probably have made a typo or two and need to know how to start over. The simplest way of undoing is to start fresh. There are subtler ways of correcting mistakes by updating the underlying tables. However, I don't recommend those unless you have intimate knowledge of the underlying tables.



To terminate the slon processes, list their process IDs and use kill -TERM to terminate the oldest of the processes for each node (a sketch appears after the note below).

To completely remove all Slony-I definitions from your database, uninstall each node:



#!/bin/sh
#
# gb_uninstallnode.sh
#
slonik << _END_
#
# Define cluster namespace and node connection information
#
cluster name = gbcluster;
node 1 admin conninfo = 'dbname=gb host=localhost port=5434 user=postgres';
node 2 admin conninfo = 'dbname=gb host=localhost port=5430 user=postgres';
echo 'Cluster defined, nodes identified';

#
# UnInstall both nodes
#
uninstall node (id=1);
uninstall node (id=2);
echo 'Nodes 1 and 2 Removed';
_END_

NOTE: UNINSTALL NODE removes all definitions, and you must start cleanly after that.
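A minimal sketch of the termination step (the ps/kill usage is plain shell, not part of Slony-I; the PIDs shown are made up):

# list the slon processes for the cluster and note the oldest PID per node
ps -ef | grep '[s]lon gbcluster'

# send SIGTERM to the oldest process for each node (example PIDs)
kill -TERM 4242 4243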

Slony-I schema

The underlying tables for Slony-I are fairly straightforward. The cluster name is the name of the schema in the database in which the Slony tables reside. (Use set search_path in psql.) You can verify your commands to add nodes, listens, paths, and so on by examining these tables. It also looks tempting to "fix" things by just changing the underlying tables. Resist doing so, however. Use Slonik so that it can trigger the appropriate events to perform the updates in an orderly fashion across all nodes.
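As an example, here is a hedged sketch of browsing that configuration (the leading underscore on the schema name follows the installation notes later in this document; catalog tables such as sl_node and sl_path belong to Slony-I):

psql -p 5434 gb
gb=# set search_path to _gbcluster;
gb=# select * from sl_node;       -- the nodes you stored
gb=# select * from sl_path;       -- the paths between them
gb=# select * from sl_listen;     -- the listen network
gb=# select * from sl_subscribe;  -- the set subscriptions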




Figure 4. Slony schema

References

General Bits Slony Articles on Tidbits

The Slony-I Project documentation on GBorg

Slonik Commands

Jan Wieck's Original Slony-I Talk and Scripts, July 2004 in Portland, Oregon, sponsored by Afilias Global Registry Services

Information from IRC #slony on freenode.net

Mailing List: Slony1-general@gborg.postgresql.org


Elein Mustain has more than 15 years of experience working with databases, 10 of those working exclusively with object relational database systems.




Modifying Slony Clusters

by A. Elein Mustain

03/17/2005


Replication clusters have much forethought applied to their creation. However, in the course of systems development, some changes are always necessary sooner or later. This article walks through the steps required to:

Add a node to the cluster.

Switch data providers of a table set.

Promote a replica to master.

Apply schema changes to the replication cluster.

These examples will use the replication scheme originally set up in "Introducing Slony."


Figure 1.

Adding a Node

Suppose that in addition to the replica you created for the gb database, you want another replica of the same database for reporting. Here's how to add a replica of the gb table set on a second database in the 8.0 installation. The 7.4 Node 1, database gb, will originate the data set and replicate it directly to Node 2, also database gb, and Node 3, database gb2.




Figure 2.

Before starting, be sure to create gb2 in the 8.0 installation, seeding it with the same empty schema as the other two databases in this cluster. You do not want a dump of the schema of gb as it is now, but rather as it was before you defined the Slony-I cluster.

Next, define Node 3 and ensure there are paths from Node 3 to and from Nodes 1 and 2. From there, enable listening along each path mirroring the expected table set replication. The listening of 2 and 3 via Node 1 reflects this mirroring, rather than having a direct listen path between 2 and 3.

This is a really good time to remember that the connection information in the store path command pertains to the server node. This is also a pretty good time to look up drop path and drop listen, two more slonik commands (sketched below).
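As a sketch of their shape only (the IDs are illustrative; check the Slonik Commands reference for the exact options):

#!/bin/sh
slonik << _END_
cluster name = gbcluster;
node 1 admin conninfo = 'dbname=gb host=localhost port=5434 user=postgres';
node 3 admin conninfo = 'dbname=gb2 host=localhost port=5430 user=postgres';

# remove a mistaken listen first, then the path it relied on
drop listen (origin=1, receiver=3, provider=1);
drop path (server=1, client=3);
_END_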

#!/bin/sh
slonik << _END_
#
# Define cluster namespace and node connection information
#
cluster name = gbcluster;
node 1 admin conninfo = 'dbname=gb host=localhost port=5434 user=postgres';
node 2 admin conninfo = 'dbname=gb host=localhost port=5430 user=postgres';
node 3 admin conninfo = 'dbname=gb2 host=localhost port=5430 user=postgres';
echo 'Cluster defined, nodes identified';

#
# Create the third node
#
store node (id=3, comment='gb2 8.0 5430');

#
# create paths
#
store path (server=1, client=3, conninfo='dbname=gb host=localhost
    port=5434 user=postgres');
store path (server=2, client=3, conninfo='dbname=gb host=localhost
    port=5430 user=postgres');
store path (server=3, client=1, conninfo='dbname=gb2 host=localhost
    port=5430 user=postgres');
store path (server=3, client=2, conninfo='dbname=gb2 host=localhost
    port=5430 user=postgres');

#
# Enable listening along each path
#
store listen (origin=1, receiver=3, provider=1);
store listen (origin=3, receiver=1, provider=3);
store listen (origin=2, receiver=3, provider=1);
store listen (origin=3, receiver=2, provider=1);

_END_

Now you are ready to start a new slon process for Node 3, the 8.0 installation with the gb2 database. Because there are two replica databases on the 8.0 installation:

#! /bin/bash
#
# varlena Slony Start Up
# ==============================
slon gbcluster "dbname=gb2 host=localhost port=5430 user=postgres" > slon_gb_3.out 2>&1 &

Once the new slon process is up and running, you can subscribe Table Set 1, originating at Node 1, to Node 3. At this point the log files are invaluable; tail -f the log files to watch for progress and errors. The log files in this case are slon_gb_1.out, slon_gb_2.out, and slon_gb_3.out. If you see any problems, you may have to remove the paths and/or listens, replacing them with corrected ones.

The log file slon_gb_1.out looks like:

CONFIG main: slon version 1.0.5 starting up
CONFIG main: local node id = 1
CONFIG main: loading current cluster configuration
CONFIG storeNode: no_id=2 no_comment='Node 2 dbname=gb host=localhost
    port=5430 user=postgres'
CONFIG storePath: pa_server=2 pa_client=1 pa_conninfo="dbname=gb host=localhost
    port=5430 user=postgres" pa_connretry=10
CONFIG storeListen: li_origin=2 li_receiver=1 li_provider=2
CONFIG main: configuration complete - starting threads
CONFIG enableNode: no_id=2
CONFIG storeSet: set_id=1 set_origin=1 set_comment=''
CONFIG storeNode: no_id=3 no_comment='gb2 8.0 5430'
CONFIG enableNode: no_id=3
CONFIG storePath: pa_server=3 pa_client=1 pa_conninfo="dbname=gb2 host=localhost
    port=5430 user=postgres" pa_connretry=10
CONFIG storeListen: li_origin=3 li_receiver=1 li_provider=3

The one for Node 2, slon_gb_2.out, looks very similar except the paths and listens are appropriate for Node 2.

Now subscribe Node 3 to the table set.

#!/bin/sh
slonik << _END_
#
# Define cluster namespace and node connection information
#
cluster name = gbcluster;
node 1 admin conninfo = 'dbname=gb host=localhost port=5434 user=postgres';
node 2 admin conninfo = 'dbname=gb host=localhost port=5430 user=postgres';
node 3 admin conninfo = 'dbname=gb2 host=localhost port=5430 user=postgres';

subscribe set (id=1, provider=1, receiver=3, forward=yes);
echo 'set 1 of gb tables subscribed by node 3';

_END_

Verification of data is reassuring at this point.





Swapping Providers

If your replication scheme has three or more nodes, you may want to switch providers. This is not the same as failover or promotion of a new master. It simply changes the source of the replication data for a node.

In the example case, Node 1, the origin of set 1, provided the information on set 1 to Node 2. When you added Node 3, you subscribed it to Table Set 1. The Table Set both originated on and was provided by Node 1.

Now the goal is to switch providers so that Node 3 retrieves Table Set 1 information from Node 2, instead of directly from Node 1. Node 1 remains the origin of the Table Set.

Node 3 could have either Node 1 or Node 2 as a provider of that set information. Node 2 is available as a provider, because when you subscribed Node 2 to Table Set 1, you also enabled it as a forwarder of Table Set 1.

The listen paths, however, have Node 2 and Node 3 listening to each other via Node 1. One necessary change is to ensure that 2 and 3 listen to each other directly, because Node 2 will now provide the data for Node 3. Once the paths and listens are set up, simply resubscribe the set, setting the provider to Node 2 instead of Node 1.


Figure 3.

#! /bin/bash
#
# varlena Slony Initialization
# ==============================
slonik << _END_

cluster name = gbcluster;

node 1 admin conninfo = 'dbname=gb host=localhost port=5434 user=postgres';
node 2 admin conninfo = 'dbname=gb host=localhost port=5430 user=postgres';
node 3 admin conninfo = 'dbname=gb2 host=localhost port=5430 user=postgres';
# ==============================

store listen (origin=3, receiver=2, provider=3);
store listen (origin=2, receiver=3, provider=2);

subscribe set ( id=1, provider=2, receiver=3, forward=yes);

_END_

After running this script, examine the log files to see that Slony stored the listens and updated the subscription.

Switching Masters

In the example, the database called gb (mastered on a PostgreSQL 7.4 installation) replicates to two nodes on the 8.0 installation. The decision has been made to move the database forward so that the 8.0 installation is the master. (If you are using Slony-I, be prepared to take advantage of a fast upgrade using master promotion.)

Before you even consider swapping masters, you must have in hand a complete set of steps to follow to switch your applications accessing the 7.4 installation to the 8.0 installation. These steps are application-dependent, and so you are on your own. However, the steps probably consist of stopping each application, changing the connection information (ideally in one place), and bringing the application back up after the switchover. It is imperative for a smooth and fast switchover that you have the application switch information at hand. Write the steps down and save the instructions in your source code control mechanism. You will never be sorry you did that.

One more cautionary note: I highly recommend that you test your application on a copy of the 8.0 database. This copy should not be the replica, if your application writes to the database; it should be another copy. Remember that replicas are read-only databases. Oh, yes, one more thing. Back up your databases before performing any major operations on them. Switching replication masters is a major operation. No excuses. Back up!

Everything that happens in a Slony-I replication happens because of an Event. One of the important Events is a SYNC event. Every subscribing node replicates data up to a SYNC event and then commits the changes to the database. All nodes capable of forwarding subscriptions also log those changes until every node subscribed to that set has confirmed the SYNC event. This ensures that replication data remains available in the system until Slony is sure that no node needs it.

To change the master of a set, you must first ensure that there are listeners for any of the new paths. The example provided listening paths from and to both of the nodes already, so there are no new listener paths required. Before swapping a master on a subscribed set, the set must be locked to ensure that no updates occur during the swap-over. Then you may move the set. Finally, the newly designated replica node, formerly the master node, must subscribe to the set.

Before you run the script, ensure that write access to your master database is OFF.

#!/bin/sh

slonik << _EOF_

cluster name = gbcluster;
node 1 admin conninfo = 'dbname=gb host=localhost port=5434 user=postgres';
node 2 admin conninfo = 'dbname=gb host=localhost port=5430 user=postgres';
node 3 admin conninfo = 'dbname=gb2 host=localhost port=5430 user=postgres';

# add listener paths if required

#
# lock and move set
#
lock set (id=1, origin=1);
move set (id=1, old origin=1, new origin=2);

# subscribe set if required
subscribe set (id=1, provider=2, receiver=1, forward=yes);

_EOF_

After this script, Slony-I is ready for the change. Restart your application and have it point to the new master for writes. This process should only take seconds.

Failover

The replication example here began as a 7.4 database gb (Node 1), replicating to an 8.0 installation (Node 2). Then you added a third node on the 8.0 installation, database gb2. Initially the third node replicated directly from Node 1. You switched Node 3's provider to Node 2, then promoted Node 2 to be the master. Now you have an 8.0 master at Node 2 and two replicas: one on 7.4 and one, gb2, in the same 8.0 installation. Consider the second replica, gb2, as if it were on another machine or installation.

Suppose the master database failed or lost power and had to be taken offline. (Remember Node 3, for the sake of this discussion, is not on this machine, and so this hypothetical situation treats it as if it were live when the machine died.) What to do next is a human decision. You must prepare for this failover scenario. In this case, you have decided to fail over to Node 1 in case of a failure on Node 2.

The applications communicating with the database will also start to fail, so you must take them offline and restart them, pointing at the new master quickly after the master switch takes effect.

The failover procedure for Slony is a combination of a provider change and a master change, both of which "Building and Configuring Slony" covered. Previously, you added Node 3 and had Node 1 provide for it. Then you changed Node 3's provider to Node 2, and finally, promoted Node 2 to master. In the failover case, Slony must do the reverse using the failover command:

Promote Node 1 to master.

Switch Node 3 to pull from Node 1.

Then you can safely remove Node 2 for repair.

#!/bin/sh

slonik <<_EOF_

cluster name = gbcluster;
node 1 admin conninfo = 'dbname=gb host=localhost user=postgres port=5434';
node 2 admin conninfo = 'dbname=gb host=localhost user=postgres port=5430';
node 3 admin conninfo = 'dbname=gb2 host=localhost user=postgres port=5430';

failover (id=2, backup node = 1);

_EOF_

At this point, the slon process for Node 2 should be dead. When you are ready to put Node 2 back into place, add it as a fresh empty database into the replication scheme. (See "Adding a Node," above.) When it catches up, you can switch masters so that Node 2 can again be the master of the cluster. (See "Switching Masters.")

While resolving some of the problems that I ran into, I found that it was easiest and clearest for me to drop Node 2. drop node "erases" the node from the entire replication cluster. This is different than uninstall node, which removes the Slony-I schema from a specific database instance. Both tools are useful, but don't confuse them (see the sketch below).
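A hedged sketch of that drop (the drop node form with an event node option also appears in the failover notes later in this document; the IDs match the example cluster):

#!/bin/sh
slonik << _END_
cluster name = gbcluster;
node 1 admin conninfo = 'dbname=gb host=localhost port=5434 user=postgres';
node 3 admin conninfo = 'dbname=gb2 host=localhost port=5430 user=postgres';

# erase node 2 everywhere; node 1 announces the event to the cluster
drop node (id=2, event node=1);
_END_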

The other issue I ran into was that while quickly cutting and pasting, I had mismatched paths, listens, and connection information. It is very worthwhile to check each of these commands by hand to verify that the commands are exactly what you meant. Also, don't forget that the connection information for store path pertains to the server node. Undoing bad paths and listen connections is a delicate operation, and it is very tempting to throw the whole thing away and start from scratch.

Schema Changes

Changing the schema of a database being replicated is not simple. The schema changes must occur at the exact transactional point in time. Direct schema changes could lead to serious corruption in Slony-I due to the handling and disabling of triggers, rules, and constraints in the replica.

Slony-I provides a way to execute SQL statements via the Event mechanism. This provides the transactional integrity necessary for schema changes, as well as the trigger, rule, and constraint changes required for the replicas. You must initiate Events on the master node.

To add a "dummy" column to the artfile table, issue an Event to the master Node 1 pointing to the ALTER TABLE script file so as to synchronize it between databases. The EXECUTE SCRIPT command in Slonik will do this. Remember, Slony must be able to find your change script.

#!/bin/sh
#
# Create a new column in a table in the replicated set
#
echo "Creating new column in the artfile table"
slonik <<_EOF_
cluster name = gbcluster;
node 1 admin conninfo = 'dbname=gb host=localhost user=postgres port=5434';
node 2 admin conninfo = 'dbname=gb host=localhost user=postgres port=5430';
node 3 admin conninfo = 'dbname=gb2 host=localhost user=postgres port=5430';

execute script ( SET ID = 1, FILENAME = 'changes20050219.sql', EVENT NODE = 1);
_EOF_

Once this change propagates, you can do an update to populate the new column and verify that it is being updated on all replicas; a sketch follows.
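For instance, a hypothetical change script and follow-up check (the column name and contents are illustrative; the article does not show the script itself):

-- changes20050219.sql (hypothetical contents)
ALTER TABLE artfile ADD COLUMN dummy text;

-- afterward, run an update on the master:
UPDATE artfile SET dummy = 'populated';

-- then, on each replica, confirm the change arrived:
SELECT count(*) FROM artfile WHERE dummy IS NOT NULL;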

References

General Bits Slony Articles on Tidbits

The Slony-I Project documentation on GBorg

Slonik Commands

Jan Wieck's Original Slony-I Talk and Scripts (PDF), July 2004 in Portland, OR, sponsored by Afilias Global Registry Services

Information from IRC #slony on freenode.net

Mailing list: Slony1-general@gborg.postgresql.org


A. Elein Mustain has more than 15 years of experience working with databases, 10 of those working exclusively with object relational database systems.




http://gborg.postgresql.org/project/slony1/projdisplay.php

http://gborg.postgresql.org/project/slony1/genpage.php?slonik_commands

http://www.varlena.com/varlena/GeneralBits/Tidbits/




Why yet another replication system?

Slony-I was born from an idea to create a replication system that was not tied to a specific version of PostgreSQL, and allowed to be started and stopped on an existing database without the need for a dump/reload cycle.


What Slony-I is:

Slony-I is a "master to multiple slaves" replication system with cascading and slave promotion. The big picture for the development of Slony-I is a master-slave system that includes all features and capabilities needed to replicate large databases to a reasonably limited number of slave systems. Slony-I is a system for data centers and backup sites, where the normal mode of operation is that all nodes are available.


What Slony-I is not:

Slony-I is not a network management system. Slony-I does not have any functionality within it to detect a node failure, or automatically promote a node to a master or other data origin. Slony-I is not multi-master; it's not a connection broker, and it doesn't make you coffee and toast in the morning.


Why doesn't Slony-I do automatic fail-over/promotion?

This is the job of network monitoring software, not Slony. Every site's configuration and fail-over path is different. For example, keep-alive monitoring with redundant NICs and intelligent HA switches that guarantee race-condition-free takeover of a network address and disconnecting the "failed" node varies in every network setup, vendor choice, and hardware/software combination. This is clearly the realm of network management software and not Slony-I. Let Slony-I do what it does best: provide database replication.


Current Limitations:

Slony-I does not automatically propagate schema changes, nor does it have any ability to replicate large objects.




Getting started with Slony-I

Installation

Before you can begin replicating your databases with Slony-I, you need to install it.


Requirements:

Any platform that can run PostgreSQL should be able to run Slony-I. The platforms that have received specific testing at the time of this release are FreeBSD-4X-i386, FreeBSD-5X-i386, FreeBSD-5X-alpha, osX-10.3, Linux-2.4X-i386, Linux-2.6X-i386, Linux-2.6X-amd64, Solaris-2.8-SPARC, Solaris-2.9-SPARC, and OpenBSD-3.5-sparc64.


All the servers used within the replication cluster need to have their real-time clocks in sync. This is to ensure that slon doesn't error with messages indicating that a slave is already ahead of the master during replication. We recommend you use ntpd running on the master, with the slaves using it as their time peer.


The following software packages are required to build Slony-I:

GNU make. Other make programs will not work. GNU make is often installed under the name gmake; this document will always refer to it by that name. (On some systems GNU make is the default tool with the name make.) To test for GNU make, enter gmake --version. Version 3.76 or later is good. Previous versions may not be.


You need an ISO/ANSI C compiler. Recent versions of GCC work.


You also need a recent version of PostgreSQL *source*. Slony-I depends on namespace support, so you must have version 7.3 or newer to be able to build and use Slony-I.


If you need to get a GNU package, it comes in the standard packaging for your operating system, or you can find it at your local GNU mirror (see http://www.gnu.org/order/ftp.html for a list) or at ftp://ftp.gnu.org/gnu.


If you need to obtain PostgreSQL source, you can download it from your favorite PostgreSQL mirror, or via BitTorrent.


Also check to make sure you have sufficient disk space. You will need approximately 5MB for the source tree during build and installation.


Getting Slony-I Source

You can get the Slony-I source from the Slony-I project page on GBorg. After you have obtained the file, unpack it.

gunzip slony.tar.gz
tar xf slony.tar


This will create a directory Slony-I under the current directory with the Slony-I sources. Change into that directory for the rest of the installation procedure.



Short Version

./configure --with-pgsourcetree=
gmake all
gmake install


1. Configuration

The first step of the installation procedure is to configure the source tree for your system. This is done by running the configure script. Configure needs to know where your PostgreSQL source tree is; this is done with the --with-pgsourcetree= option.


Example:

./configure --with-pgsourcetree=/usr/local/src/postgresql-7.4.3


This script will run a number of tests to guess values for various dependent variables and try to detect some quirks of your system. Slony-I is known to need a modified version of libpq on specific platforms such as Solaris 2.X on SPARC; a patch is available.


2. Build

To start the build process, type

gmake all

(Remember to use GNU make.) The build may take anywhere from 30 seconds to 2 minutes depending on your hardware. The last line displayed should be

All of Slony-I is successfully made. Ready to install.


3. Installing Slony-I

To install Slony-I, enter

gmake install

This will install files into the PostgreSQL install directory as specified by the --prefix option used in the PostgreSQL configuration. Make sure you have appropriate permissions to write into that area. Normally you need to do this as root.





Replicating Your First Database

In this example, we will be replicating a brand new pgbench database. The mechanics of replicating an existing database are covered here; however, we recommend that you learn how Slony-I functions by using a fresh new non-production database.


The Slony-I replication engine is trigger-based, allowing us to replicate databases (or portions thereof) running under the same postmaster.


This example will show how to replicate the pgbench database running on localhost (master) to the pgbench slave database also running on localhost (slave). We make a couple of assumptions about your PostgreSQL configuration:

1. You have tcpip_socket=true in your postgresql.conf and
2. You have localhost set to trust in pg_hba.conf


The REPLICATIONUSER needs to be a PostgreSQL superuser. This is typically postgres or pgsql.


You should also set the following shell variables:

CLUSTERNAME=slony_example
MASTERDBNAME=pgbench
SLAVEDBNAME=pgbenchslave
MASTERHOST=localhost
SLAVEHOST=localhost
REPLICATIONUSER=pgsql
PGBENCHUSER=pgbench


Here are a couple of examples for setting variables in common shells:

bash/sh:

export CLUSTERNAME=slony_example

(t)csh:

setenv CLUSTERNAME slony_example


Creating the pgbench user

createuser -A -D $PGBENCHUSER


Preparing the databases

createdb -O $PGBENCHUSER -h $MASTERHOST $MASTERDBNAME
createdb -O $PGBENCHUSER -h $SLAVEHOST $SLAVEDBNAME

pgbench -i -s 1 -U $PGBENCHUSER -h $MASTERHOST $MASTERDBNAME


Because Slony-I depends on the databases having the pl/pgSQL procedural language installed, we had better install it now. It is possible that you have installed pl/pgSQL into the template1 database, in which case you can skip this step because it's already installed into the $MASTERDBNAME.

createlang plpgsql -h $MASTERHOST $MASTERDBNAME


Slony-I does not yet automatically copy table definitions from a master when a slave subscribes to it, so we need to import this data. We do this with pg_dump.

pg_dump -s -U $REPLICATIONUSER -h $MASTERHOST $MASTERDBNAME | psql -U $REPLICATIONUSER -h $SLAVEHOST $SLAVEDBNAME


To illustrate how Slony-I allows for on-the-fly replication subscription, let's start up pgbench. If you run the pgbench application in the foreground of a separate terminal window, you can stop and restart it with different parameters at any time. You'll need to re-export the variables so they are available in this session as well.


The typical command to run pgbench would look like:

pgbench -s 1 -c 5 -t 1000 -U $PGBENCHUSER -h $MASTERHOST $MASTERDBNAME


This will run pgbench with 5 concurrent clients, each processing 1000 transactions against the pgbench database running on localhost as the pgbench user.


Configuring the Database for Replication

Creating the configuration tables, stored procedures, triggers, and configuration is all done through the slonik tool. It is a specialized scripting aid that mostly calls stored procedures in the master/slave (node) databases. The script to create the initial configuration for the simple master-slave setup of our pgbench database looks like this:


#!/bin/sh

slonik <<_EOF_
    # --
    # define the namespace the replication system uses; in our example it is
    # slony_example
    # --
    cluster name = $CLUSTERNAME;

    # --
    # admin conninfo's are used by slonik to connect to the nodes, one for each
    # node on each side of the cluster; the syntax is that of PQconnectdb in
    # the C-API
    # --
    node 1 admin conninfo = 'dbname=$MASTERDBNAME host=$MASTERHOST user=$REPLICATIONUSER';
    node 2 admin conninfo = 'dbname=$SLAVEDBNAME host=$SLAVEHOST user=$REPLICATIONUSER';

    # --
    # init the first node. Its id MUST be 1. This creates the schema
    # _$CLUSTERNAME containing all replication system specific database
    # objects.
    # --
    init cluster ( id=1, comment = 'Master Node');

    # --
    # Because the history table does not have a primary key or other unique
    # constraint that could be used to identify a row, we need to add one.
    # The following command adds a bigint column named
    # _Slony-I_$CLUSTERNAME_rowID to the table. It will have a default value
    # of nextval('_$CLUSTERNAME.s1_rowid_seq'), and have UNIQUE and NOT NULL
    # constraints applied. All existing rows will be initialized with a
    # number.
    # --
    table add key (node id = 1, fully qualified name = 'public.history');

    # --
    # Slony-I organizes tables into sets. The smallest unit a node can
    # subscribe is a set. The following commands create one set containing
    # all 4 pgbench tables. The master or origin of the set is node 1.
    # --
    create set (id=1, origin=1, comment='All pgbench tables');
    set add table (set id=1, origin=1, id=1, fully qualified name = 'public.accounts', comment='accounts table');
    set add table (set id=1, origin=1, id=2, fully qualified name = 'public.branches', comment='branches table');
    set add table (set id=1, origin=1, id=3, fully qualified name = 'public.tellers', comment='tellers table');
    set add table (set id=1, origin=1, id=4, fully qualified name = 'public.history', comment='history table', key = serial);

    # --
    # Create the second node (the slave), tell the 2 nodes how to connect to
    # each other, and how they should listen for events.
    # --
    store node (id=2, comment = 'Slave node');
    store path (server = 1, client = 2, conninfo='dbname=$MASTERDBNAME host=$MASTERHOST user=$REPLICATIONUSER');
    store path (server = 2, client = 1, conninfo='dbname=$SLAVEDBNAME host=$SLAVEHOST user=$REPLICATIONUSER');
    store listen (origin=1, provider = 1, receiver =2);
    store listen (origin=2, provider = 2, receiver =1);
_EOF_



Is the pgbench still running? If not, start it again.


At this point we have 2 databases that are fully prepared. One is the master database, in which pgbench is busy accessing and changing rows. It's now time to start the replication daemons.


On $MASTERHOST the command to start the replication engine is

slon $CLUSTERNAME "dbname=$MASTERDBNAME user=$REPLICATIONUSER host=$MASTERHOST"

Likewise we start the replication system on node 2 (the slave):

slon $CLUSTERNAME "dbname=$SLAVEDBNAME user=$REPLICATIONUSER host=$SLAVEHOST"


Even though we have slon running on both the master and slave, and they are both spitting out diagnostics and other messages, we aren't replicating any data yet. The notices you are seeing are the synchronization of cluster configurations between the 2 slon processes.


To start replicating the 4 pgbench tables (set 1) from the master (node id 1) to the slave (node id 2), execute the following script.


#!/bin/sh
slonik <<_EOF_
    # ----
    # This defines which namespace the replication system uses
    # ----
    cluster name = $CLUSTERNAME;

    # ----
    # Admin conninfo's are used by the slonik program to connect
    # to the node databases. So these are the PQconnectdb arguments
    # that connect from the administrators workstation (where
    # slonik is executed).
    # ----
    node 1 admin conninfo = 'dbname=$MASTERDBNAME host=$MASTERHOST user=$REPLICATIONUSER';
    node 2 admin conninfo = 'dbname=$SLAVEDBNAME host=$SLAVEHOST user=$REPLICATIONUSER';

    # ----
    # Node 2 subscribes set 1
    # ----
    subscribe set ( id = 1, provider = 1, receiver = 2, forward = no);
_EOF_



Any second now, the replication daemon on $SLAVEHOST will start to copy the current content of all 4 replicated tables. While doing so, of course, the pgbench application will continue to modify the database. When the copy process is finished, the replication daemon on $SLAVEHOST will start to catch up by applying the accumulated replication log. It will do this in little steps, 10 seconds' worth of application work at a time. Depending on the performance of the two systems involved, the sizing of the two databases, the actual transaction load, and how well the two databases are tuned and maintained, this catchup process can be a matter of minutes, hours, or eons.


You have now successfully set up your first basic master/slave replication system, and the 2 databases, once the slave has caught up, contain identical data. That's the theory. In practice, it's good to check that the datasets are in fact the same.


The following script will create ordered dumps of the 2 databases and compare them. Make sure that pgbench has completed its testing, and that your slon sessions have caught up.



#!/bin/sh
echo -n "**** comparing sample1 ... "
psql -U $REPLICATIONUSER -h $MASTERHOST $MASTERDBNAME >dump.tmp.1.$$ <<_EOF_
    select 'accounts:'::text, aid, bid, abalance, filler
        from accounts order by aid;
    select 'branches:'::text, bid, bbalance, filler
        from branches order by bid;
    select 'tellers:'::text, tid, bid, tbalance, filler
        from tellers order by tid;
    select 'history:'::text, tid, bid, aid, delta, mtime, filler,
        "_Slony-I_${CLUSTERNAME}_rowID"
        from history order by "_Slony-I_${CLUSTERNAME}_rowID";
_EOF_
psql -U $REPLICATIONUSER -h $SLAVEHOST $SLAVEDBNAME >dump.tmp.2.$$ <<_EOF_
    select 'accounts:'::text, aid, bid, abalance, filler
        from accounts order by aid;
    select 'branches:'::text, bid, bbalance, filler
        from branches order by bid;
    select 'tellers:'::text, tid, bid, tbalance, filler
        from tellers order by tid;
    select 'history:'::text, tid, bid, aid, delta, mtime, filler,
        "_Slony-I_${CLUSTERNAME}_rowID"
        from history order by "_Slony-I_${CLUSTERNAME}_rowID";
_EOF_

if diff dump.tmp.1.$$ dump.tmp.2.$$ >$CLUSTERNAME.diff ; then
    echo "success - databases are equal."
    rm dump.tmp.?.$$
    rm $CLUSTERNAME.diff
else
    echo "FAILED - see $CLUSTERNAME.diff for database differences"
fi


If this script returns "FAILED", please contact the developers at http://slony.org/.




Doing switchover and failover with Slony-I

Foreword

Slony-I is an asynchronous replication system. Because of that, it is almost certain that at the moment the current origin of a set fails, the last transactions committed have not propagated to the subscribers yet. They always fail under heavy load, and you know it. Thus the goal is to prevent the main server from failing. The best way to do that is frequent maintenance.



Opening the case of a running server is not exactly what we all consider professional system maintenance. And interestingly, those users who use replication for backup and failover purposes are usually the ones that have a very low tolerance for words like "downtime". To meet these requirements, Slony-I has not only failover capabilities, but controlled master role transfer features too.



It is assumed in this document that the reader is familiar with the slonik utility and knows at least how to set up a simple 2 node replication system with Slony-I.


Switchover

We assume a current "origin" as node1 (AKA master) with one "subscriber" as node2 (AKA slave). A web application on a third server is accessing the database on node1. Both databases are up and running and replication is more or less in sync.



Step 1)

At the time of this writing, switchover to another server requires the application to reconnect to the database. So in order to avoid any complications, we simply shut down the web server. Users who use pgpool for the application's database connections can shut down the pool only.



Step 2)

A small slonik script executes the following commands:

lock set (id = 1, origin = 1);
wait for event (origin = 1, confirmed = 2);
move set (id = 1, old origin = 1, new origin = 2);
wait for event (origin = 1, confirmed = 2);



After these commands, the origin (master role) of data set 1 is now on node2. It is not simply transferred. It is done in a fashion so that node1 is now a fully synchronized subscriber, actively replicating the set. So the two nodes completely switched roles.



Step 3)

After reconfiguring the web application (or pgpool) to connect to the database on node2 instead, the web server is restarted and resumes normal operation.



Done in one shell script that does the shutdown, slonik, config file moves, and startup all together, this entire procedure takes less than 10 seconds.






It is now possible to simply shut down node1 and do whatever is required. When node1 is restarted later, it will start replicating again and eventually catch up after a while. At this point the whole procedure is executed with exchanged node IDs and the original configuration is restored.




Failover

Because of the possibility of missing committed but not-yet-replicated transactions, failover is the worst thing that can happen in a master-slave replication scenario. If there is any possibility of bringing back the failed server, even if only for a few minutes, we strongly recommend that you follow the switchover procedure above.



Slony does not provide any automatic detection for failed systems. Abandoning committed transactions is a business decision that cannot be made by a database. If someone wants to put the commands below into a script executed automatically from the network monitoring system, well ... it's your data.



Step 1)

The slonik command

failover (id = 1, backup node = 2);



causes node2 to assume the ownership (origin) of all sets that have node1 as their current origin. If there were more nodes, all direct subscribers of node1 would be instructed that this is happening. Slonik would also query all direct subscribers to figure out which node has the highest replication status (latest committed transaction) for each set, and the configuration would be changed in a way that node2 first applies those last-minute changes before actually allowing write access to the tables.



In addition, all nodes that subscribed directly from node1 will now use node2 as data provider for the set. This means that after the failover command has succeeded, no node in the entire replication setup will receive anything from node1 any more.



Step 2)

Reconfigure and restart the application (or pgpool) to cause it to reconnect to node2.



Step 3)

After the failover is complete and node2 accepts write operations against the tables, remove all remnants of node1's configuration information with the slonik command

drop node (id = 1, event node = 2);



After failover, getting back node1

After the above failover, the data stored on node1 must be considered out of sync with the rest of the nodes. Therefore, the only way to get node1 back and transfer the master role to it is to rebuild it from scratch as a slave, let it catch up, and then follow the switchover procedure.