replication is the creation and maintenance of multiple copies of the same database.
In most implementations of database replication, one
maintains the master copy of the
database and additional
maintain slave copies of the database.
base writes are sent to the master database
and are then replicated by the slave database
Database reads are divided among all of the database servers, which results in a large performance
advantage due to load sharing.
In addition, database replication can also improve availability because the slave database servers can be
configured to t
ake over the master role if the master database server becomes unavailable.
is the process of sharing information so as to ensure consistency between redundant
resources, such as
components, to improve reliability,
accessibility. It could be
if the same data is stored on
if the same computing task is executed many times. A computational task is
replicated in space
, i.e. executed on separate device
s, or it could be
replicated in time
, if it is
executed repeatedly on a single device.
The access to a replicated entity is typically uniform with access to a single, non
replicated entity. The
replication itself should be
to an external user. Also, in a failure scenario, a
replicas is hidden as much as possible.
It is common to talk about active and passive rep
lication in systems that replicate data or services.
is performed by processing the same request at every replica. In
each single request is processed on a single replica and then its state is transferred to the othe
If at any time one master replica is designated to process all the requests, then we are talking about the
scheme) predominant in
. On the other
side, if any replica processes a request and then distributes a new state, then this is a
in database field). In the multi
primary scheme, some form of
must be used, such as
distributed lock manager
is different from task replication, since it distributes a load of different (not the
computations across machines, and allows a single computation to be dropped in case of failure. Load
balancing, however, sometimes uses data replication (esp. multi
master) internally, to distribute its data
is different from replication, since it saves a copy of data unchanged for a long period of time.
Replicas on the other hand are frequently updated and quickly lose any historical state.
Replication in distributed systems
Replication is one of the oldest and most important topics in the overall area of
Whether one replicates data or computation, the objective is to have some group of pro
handle incoming events. If we replicate data, these processes are passive and operate only to maintain
the stored data, reply to read requests, and apply updates. When we replicate computation, the usual
goal is to provide fault
example, a replicated service might be used to control a
telephone switch, with the objective of ensuring that even if the primary controller fails, the backup can
take over its functions. But the underlying needs are the same in both cases: by ensuring th
replicas see the same events in equivalent orders, they stay in consistent states and hence any replica
can respond to queries.
lication models in distributed systems
A number of widely cited models exist for data replication, each having its own properties and
. This is the model for replicating transactional data, for example a
database or some other form of transactional storage structure. The
model is employed in this case, which defines legal outcomes of a transaction on replicated data
in accordance with the overall
properties that transactional systems s
eek to guarantee.
State machine replication
This model assumes that replicated process is a
of every event is possible. It is based on a distributed
computing problem called
and has a great deal in common with the
transactional replication model. This is sometimes mistakenly used as synonym of
. This computational model is used when a group of processes cooperate to
memory data or to coordinate actions. The model defines a new distributed entity
. A process can join a group, which is
much like opening a file: the
process is added to the group, but is also provided with a checkpoint containing the current state
of the data replicated by group members. Processes can then send events (
) to the
group and will see incoming events
in the identical order, even if events are sent concurrently.
Membership changes are handled as a special kind of platform
generated event that delivers a
to the processes in the group.
Levels of performance vary widely depending on t
he model selected. Transactional replication is
slowest, at least when one
copy serializability guarantees are desired (better performance can be
obtained when a database uses log
based replication, but at the cost of possible inconsistencies if a
causes part of the log to be lost). Virtual synchrony is the fastest of the three models, but the
handling of failures is less rigorous than in the transactional model. State machine replication lies
somewhere in between; the model is faster than transacti
ons, but much slower than virtual synchrony.
The virtual synchrony model is popular[
] in part because it allows the developer to use
either active or passive replication.
In contrast, state machine replication and transactional replication
are highly constraining and are often embedded into products at layers where end
users would not be
] to access them.
] Database replication
replication can be used on many
database management systems
, usually with a master/slave
relationship between the original and the copies. The master logs the updates, which then ripple
through to the slaves. The sl
ave outputs a message stating that it has received the update successfully,
thus allowing the sending (and potentially re
sending until successfully applied) of subsequent updates.
, where updates can be submitted to any database node, and then ripple through
to other servers, is often desired, but introduces substantially increased costs and complexity which
may make it impractical in some situations. The most com
mon challenge that exists in multi
replication is transactional conflict prevention or resolution. Most synchronous or eager replication
solutions do conflict prevention, while asynchronous solutions have to do conflict resolution. For
a record is changed on two nodes simultaneously, an eager replication system would detect
the conflict before confirming the commit and abort one of the transactions. A
ould allow both transactions to commit and run a conflict resolution during resynchronization. The
resolution of such a conflict may be based on a timestamp of the transaction, on the hierarchy of the
origin nodes or on much more complex logic, which decid
es consistently on all nodes.
Database replication becomes difficult when it scales up. Usually, the scale up goes with two
dimensions, horizontal and vertical: horizontal scale up has more data replicas, vertical scale up has
data replicas located further
away in distance. Problems raised by horizontal scale up can be alleviated
by a multi
view access protocol. Vertical scale up runs into less trouble when the internet
reliability and performance are improving.
] Disk storage replication
replication is usually implemented by distributing updates of a
to several physical
. This way, any
supported by the
replicated without modification, as the file system code works on a level above the block device driver
layer. It is implemented either in hardware (in a
disk array controller
) or in software (in a
The most basic method is
, typical for locally
Notably, the storage industry narrows the definitions, so
is a local (short
is extendable across a
, so the disks can be located in physically distant
locations. The purpose is to avoid damage done by, and improve availability in case of local failures or
ly the above
theoretical replication model is applied. The main
characteristic of such solutions is handling write operations:
guarantees "zero data loss" b
y the means of
i.e. write either completes on both sides or not at all. Write is not considered complete until
acknowledgement by both local and remote storage. Most appl
ications wait for a write
transaction to complete before proceeding with further work, hence overall performance
decreases considerably. Inherently, performance drops proportionally to distance, as
speed of light
. For 10 km distance, the fastest possible roundtrip takes 67 μs, whereas
nowadays a whole local cached write completes in about 10
of synchronous replication is the fact, that failure of either
replica or even just the
stops by definition any and all writes
(freezing the storage system). This is the behaviour that guarantees zero data loss.
However, many commer
cial systems at such potentially dangerous point do not freeze,
but just proceed with local writes, losing the desired zero
recovery point objective
write is considered complete as soon as local storage acknowledges
it. Remote storage is updated, but probably with a small
. Performance is greatly i
but in case of losing a local storage, the remote storage is not guaranteed to have the current
copy of data and most recent data may be lost.
introduced by some vendors, probably as a sales pitch;
exact meaning va
that are replicated instead of
Most important implementations:
odule for Linux.
Symantec Veritas Volume Replicator
] Distributed shared memory replication
Another example of using replication appears in
distributed shared memory
systems, where it may
happen that many nodes of the system share the
same page of the memory
which usually means, that
each node has a separate copy (replica) of this page.
backup and multi
Many classical approaches to replication are based on a primary/backup model where one device or
process has unilateral control over one or more other processes or devices. For example, the primary
might perform some computation, streaming a
log of updates to a backup (standby) process, which can
then take over if the primary fails. This approach the most common one for replicating databases,
despite the risk that if a portion of the log is lost during a failure, the backup might not be in a s
identical to the one the primary was in, and transactions could then be lost.
A weakness of primary/backup schemes is that in settings where both processes could have been active,
only one is actually performing operations. We're gaining fault
ce but spending twice as much
money to get this property. For this reason, starting in the period around 1985, the distributed systems
research community began to explore alternative methods of replicating data. An outgrowth of this
work was the emergence
of schemes in which a group of replicas could cooperate, with each process
backup up the others, and each handling some share of the workload.
, a towering figure within the
database community, analyzed multi
schemes under the transactional model and ultimately published a widely cited paper skeptical of the
"The Dangers of Replicatio
n and a Solution"
). In a nutshell, he argued that unless data splits
in some natural way so that the database can be treated as
databases, concurrency control
conflicts will result in seriously degraded performance and the group of replicas
will probably slow
down as a function of
. Indeed, he suggests that the most common approaches are likely to result in
degradation that scales as
. His solution, which is to partition the data, is only viable in situations
where data actually has a
natural partitioning key.
The situation is not always so bleak. For example, in the 1985
1987 period, the
model was proposed and emerged as a widely adopted standard (it was u
sed in the Isis Toolkit, Horus,
Transis, Ensemble, Totem,
Ensemble, Phoenix and Quicksilver systems, and is the basis for
the CORBA fault
tolerant computing standard; the model is also u
sed in IBM Websphere to replicate
business logic and in Microsoft's Windows Server 2008
synchrony permits a multi
primary approach in which a g
roup of processes cooperate to parallelize
some aspects of request processing. The scheme can only be used for some forms of in
but when feasible, provides linear speedups in the size of the group.
A number of modern products support similar s
chemes. For example, the
same virtual synchrony model and can be used to implement a multi
primary replication scheme; it
would also be possible to use C
Ensemble or Q
uicksilver in this manner. WANdisco permits active
replication where every node on a network is an exact copy or
and hence every node on the
network is active at at one time; this scheme is optim
ized for use in a
wide area network
From Wikipedia, the free encyclopedia
(also known as lazy replication
) is a strategy for
replicas are allowed to diverge. Traditional pessimistic replication systems are based on the principle of
copy consistency. that is, users should observe the system to behave as if there was
copy of the data. Optimistic replication does away with this in favor of
that replicas are guaranteed to converge only when a system is idle.
An optimistic replication algorithm consists of five elements:
: Users submit operations at independent sites.
Each site shares the operations it knows about with the rest of the system.
: Each site decides on an order for the operations it knows about.
: If there are any conflicts among the operations a site has scheduled, it must
odify them in some way.
: The sites agree on a final schedule and conflict resolution result, and the
operations are made permanent.
There are two strategies for propagation: state transfer, where sites propagate a representation of the
state, and operation transfer, where sites propagate the operations that were performed
(essentially, a list of instructions on how to reach the new state).
Scheduling and conflict resolution can either be syntactic or semantic. Syntactic systems rely on
general information, such as when or where an operation was submitted. Semantic systems are able to
make use of application
specific information to make smarter decisions. Note that state transfer
systems generally have no information about the semantics o
f the data being transferred, and so they
have to use syntactic scheduling and conflict resolution.
known example of a system ba
sed on optimistic replication is the
, or any other version control system which uses the
paradigm. CVS covers
each of the five elements:
Operation submission: Users edit local versions of files.
Propagation: Users manually pull updates f
rom a central server, or push changes out once the
user feels they are ready.
Scheduling: Operations are scheduled in the order that they are received by the central server.
Conflict resolution: When a user pushes to or pulls from the central repository,
will be flagged for that user to fix manually.
Commitment: Once the central server accepts the changes which a user pushes, they are
A special case of replication is
, where there are only two replicas. For example,
personal digital assistants (PDAs)
allow users to edit data either on the PDA or a computer, and then t
these two datasets together. Note, however, that replication is a broader problem than
synchronization, since there may be more than two replicas.
May 18, 2004
ion in MySQL
An introduction to replication
Recently, while having the knots pounded out of my body during a par
ticularly painful shiatsu
lesson, I reflected on what put them there in the first place. Yes, 'the database' was once more to
blame. A busy database I work with saw one of its tables jump from 3GB to 7GB overnight, as we
imported archive data. As expected,
this had some performance impact. Unfortunately, I had not
expected quite the knock, and it turned out that this single database server could no longer
handle the load. No matter how much more I tried to
optimize the queries
bump up the query
, it was not enough. The machine could not take any
more memory, and a hardware upgrade would do little good (at least with the kind of budget I
have to play with). However, MySQL does not claim to be enterprise
ready for nothing, and Yahoo
volume users of MySQL certainly do not run on one database server. There are a
number of techniques to handle high volumes, one of which I will introduce this month
replication (I will look at others in future articles).
you to take one database, make an exact copy of it on another server, and set
one of them (the slave) to take all its updates from the other (the master). The slave reads the
, which store all statements that change a database, and repeats these on
its database, keeping the two in exact sync. Since a replicating database simply repeats
statements, the databases are not necessarily exactly in sync, and advan
ced users can take
advantage of this. That is a topic for another article however, and we will look at simple replication
getting one database to be an exact copy of another one.
What replication is not
Replication is not a backup policy. A m
istyped DELETE statement will be replicated on the
slave too, and you could end up with two, perfectly synchronized, empty databases.
Replication can help protect against hardware failure though.
Replication is not an answer to all performance problems. A
lthough updates on the slave
are more optimized than if you ran the updates normally, if you use MyISAM tables, table
locking will still occur, and databases under high
load could still struggle.
Replication is not a guarantee that the slave will be in sy
nc with the master at any one
point in time. Even assuming the connection is always up, a busy slave may not yet have
caught up with the master, so you can't simply interchange SELECT queries across master
and slave servers.
How to start replicating
Grant the slave permission to replicate with the REPLICATION SLAVE privilege, for example
GRANT REPLICATION SLAVE ON *.* TO slave_user IDENTIFIED BY
If the master is not using the binary update log, add the foll
owing lines to the
configuration file, and restart the server:
By convention, the master is usually server
id 1, and any slaves from 2 onwards, though
you can change this if you wish. If the master is already using t
he binary update log, either
take note of the offset at the moment of the backup (the next step), or use the
statement to clear all binary logs and immediately begin the backup. You may
want to make a copy of the binary logs before doing this,
in case you need to use the
binary logs to restore from backup.
Make a backup of the database. You will use this to start the slave server. Note the
comments about the binary log above. You can also skip this step if you use the
DATA FROM MASTER
ement, but see the comments about locking the master below
Add the following to the configuration file on the slave:
The slave user and slave password
are those to which you set when you granted
REPLICATION SLAVE permission on the master. The server
id must be a unique number,
different to the master or any other slaves in the system. There are also two other options:
, used if the master is
running on a non
standard port (3306 is default), and
, a time in seconds for the slave to attempt to reconnect if the master
goes down. 60 seconds is default.
Restore the data from the master, either as you would normally restore a ba
ckup, or with
LOAD DATA FROM MASTER
. The latter will lock the master for the duration
of the operation, which could be quite lengthy, so you may not be able to spare the
Replication in action
Once the slave has started, replication
should begin. Besides the obvious SELECT queries, you can
make sure this is working correctly with the following statements:
mysql> SHOW SLAVE STATUS
*************************** 1. row ***************************
This is a mature slave that has been running a while. The master is already on the 54th binary
log. You can see if the slave is running correctly by looking at the
. The most important field is the
mysql> SHOW MASTER STATUS;
| File | Position | Binlog_do_db | Binlog_ignore_db |
bin.054 | 16664104
| | |
The above is from a master that has been running a while. It is already on binlog 54.
Starting to replicate from a particular point in the binary logs
you need to force the slave to begin at a certain point, usually when the master has been
running with an active binary log, you can do so as follows. The following starts with the 3rd
binary log, as position 420. You can find the position using
mysql> CHANGE MASTER TO MASTER_LOG_FILE='master
Query OK, 0 rows affected (0.00 sec)
mysql> SLAVE START
The SLAVE START and SLAVE STOP commands are used to manually stop and start the slave. The
slave will also always stop
if it comes across an error while replicating.
Removing old binary logs
On active databases, the binary logs tend to grow quite quickly. You may have used RESET
MASTER in the past to clear them, but you cannot do this to the master while replicating! The
statement to use is PURGE MASTER LOGS. First, make sure that all slaves have replicated to at
least the log beyond which you want to remove. For example, in our earlier example, both the
slave and the master are on log
, so we can safely rem
before, as follows:
mysql> PURGE MASTER LOGS TO 'master
MySQL will not allow you to remove a log that the master is still using though.
Replicating specific databases only
As mentioned earlier, the master and slave databa
se server do not need to be entirely in sync,
with all databases and all tables from the master replicated onto the slave. By default, the slave
will replicate everything, but you can change this behavior with the following options in the slave
db=db_name (replicate this database)
db=db_name (don't replicate this database)
table=db_name.table_name (replicate this table)
table=db_name.table_name (don't replicate this table)
(allows wildcards, e.g db% would be all databases beginning with db)
table=db_name.table_name (ignore all specified
tables, with wildcards)
These options can all be used multiple times in a single configuration. A couple of other useful
>slave_db (allows you to use map databases
that use different database names on each server)
update (writes replicated statements to the slaves binary logs)
A few complexities
Before shutting down a slave server (
, make sure
has no temporary tables open (these may be needed for a statement to be replicated). You
can do this with
to see the value of
annoying feature should be fixed soon, so please check the latest documentation.
USER(), UUID(), LOAD_FILE() and CONNECTION_ID() (before MySQL 4.1.1) functions
do not work reliably on the slave (they are replicated without changes).
Before MySQL 4.1.1, FLUSH, ANALYZE, OPTIMIZE, and REPAIR statements are not
replicated. This means tha
t if you change permissions on the master by editing the tables
directly, you will need to manually FLUSH PRIVILEGES on the slave too.
Make sure the slave and the master are using the same character set.
is a method of
to transfer data or changes
to data across multiple computers within a group.
can be contrasted with a
method (also known as
The term Multi
master can also be applied to systems in general where a single piece of information
can be updated by one of several systems. That is, no one system can be said to own the information
and be able to control it consistency and accuracy.
If one master fails, other masters will continue to update the
Masters can be located in several physical sites i.e. distributed across the network.
master replication systems are only loosely consistent, i.e. lazy and asynchronous,
Eager replication systems are complex and int
roduce some communication latency.
Issues such as conflict resolution can become intractable as the number of nodes involved rises
and the required latency decreases
log is referenced to capture changes made to the database. For log
transaction capturing, database changes can only be
Triggers at the subscriber capture changes
made to the database and submit them to the publisher. With
based transaction capturing, database changes can be distributed either
Many directory servers based on
] Active Directory
For example, one of the more prevalent of multi
master replication implementations in directory
, objects that are updated on one
are then replicated to other
domain controllers through multi
master replication. It
is not required for all domain controllers to replicate with each other's domain controller as this would
cause excessive network traffic in large Active Directory implementations. Instead, domain co
have a complex update pattern that ensures that all servers are updated in a timely fashion without
excessive replication traffic. Some
needs are better served by
Flexible single master
master replication using one of two methods.
master replication commits data changes to a
deferred transaction queue
which is periodically
processed on all databases in the cluster.
master replication uses Oracle's two phase
commit functionality to ensure that all databases with the cluster have a consistent
ships with replication support. Using features introduced in MySQL 5.0, it is possi
achieve a multi
master replication scheme.
offers multiple solutions for multi
master replication, including solutions based on two
phase commit. There's
PgPool and PgPool
as well as some
proprietary solutions. Another promising approach, implementing eager
but asynchronous replication is
. It's still in development.
Within Ingres Replicator, objects that are updated on one Ingres server can then replicated to
other servers whether local or remote through multi
master replication. If one server fails, client
n be re
directed to another server. It is not required for all Ingres servers in an
environment to replicate with each other as this could cause excessive network traffic in large
implementations. Instead, Ingres Replicator provides an elegant and sophisti
cated design that allows
the appropriate data to be replicated to the appropriate servers without excessive replication traffic.
This means that some servers in the environment can serve as failover candidates while other servers
can meet other requirement
s such as managing a subset of columns or tables for a departmental
solution, a subset of rows for a geographical region or one
way replication for a reporting server. In the
event of a source, target, or network failure, data integrity is enforced through
protocol by ensuring that either the whole transaction is replicated, or none of it is. In addition, Ingres
Replicator can operate over RDBMS’s from multiple vendors to connect them.
is a powerful open
source Java application for network
heterogeneous database replication or filtered synch
ronization. Supported databases include:
Firebird (database server)