Lecture 3 notes, Database Replication.doc - Scott Streit Content

needmoreneedmoreΔιαχείριση Δεδομένων

28 Νοε 2012 (πριν από 4 χρόνια και 9 μήνες)

403 εμφανίσεις

Database

replication is the creation and maintenance of multiple copies of the same database.

In most implementations of database replication, one
database server

maintains the master copy of the
database and additional
database servers

maintain slave copies of the database.

Data
base writes are sent to the master database
server

and are then replicated by the slave database
server
s
.

Database reads are divided among all of the database servers, which results in a large performance
advantage due to load sharing.

In addition, database replication can also improve availability because the slave database servers can be
configured to t
ake over the master role if the master database server becomes unavailable.


Replication

is the process of sharing information so as to ensure consistency between redundant
resources, such as
software

o
r
hardware

components, to improve reliability,
fault
-
tolerance
, or
accessibility. It could be
data replication

if the same data is stored on
multiple
storage devices
, or
computation replication

if the same computing task is executed many times. A computational task is
typically
replicated in space
, i.e. executed on separate device
s, or it could be
replicated in time
, if it is
executed repeatedly on a single device.

The access to a replicated entity is typically uniform with access to a single, non
-
replicated entity. The
replication itself should be
transparent

to an external user. Also, in a failure scenario, a
failover

of
replicas is hidden as much as possible.

It is common to talk about active and passive rep
lication in systems that replicate data or services.
Active replication

is performed by processing the same request at every replica. In
passive replication
,
each single request is processed on a single replica and then its state is transferred to the othe
r replicas.
If at any time one master replica is designated to process all the requests, then we are talking about the
primary
-
backup

scheme (
master
-
slave

scheme) predominant in
high
-
availability clusters
. On the other
side, if any replica processes a request and then distributes a new state, then this is a
multi
-
primary

scheme (called
multi
-
master

in database field). In the multi
-
primary scheme, some form of
distributed
concurrency control

must be used, such as
distributed lock manager
.

Load balancing

is different from task replication, since it distributes a load of different (not the
same)
computations across machines, and allows a single computation to be dropped in case of failure. Load
balancing, however, sometimes uses data replication (esp. multi
-
master) internally, to distribute its data
among machines.

Backup

is different from replication, since it saves a copy of data unchanged for a long period of time.
Replicas on the other hand are frequently updated and quickly lose any historical state.


Replication in distributed systems

Replication is one of the oldest and most important topics in the overall area of
distributed systems
.

Whether one replicates data or computation, the objective is to have some group of pro
cesses that
handle incoming events. If we replicate data, these processes are passive and operate only to maintain
the stored data, reply to read requests, and apply updates. When we replicate computation, the usual
goal is to provide fault
-
tolerance. For
example, a replicated service might be used to control a
telephone switch, with the objective of ensuring that even if the primary controller fails, the backup can
take over its functions. But the underlying needs are the same in both cases: by ensuring th
at the
replicas see the same events in equivalent orders, they stay in consistent states and hence any replica
can respond to queries.

[
edit
] Rep
lication models in distributed systems

A number of widely cited models exist for data replication, each having its own properties and
performance:

1.

Tran
sactional replication
. This is the model for replicating transactional data, for example a
database or some other form of transactional storage structure. The
one
-
copy serializability

model is employed in this case, which defines legal outcomes of a transaction on replicated data
in accordance with the overall
ACID

properties that transactional systems s
eek to guarantee.

2.

State machine replication
.

This model assumes that replicated process is a
deterministi
c finite
state machine

and that
atomic broadcast

of every event is possible. It is based on a distributed
computing problem called
distributed consensus

and has a great deal in common with the
transactional replication model. This is sometimes mistakenly used as synonym of
active
replication
.

3.

Virtual synchrony
. This computational model is used when a group of processes cooperate to
replicate in
-
memory data or to coordinate actions. The model defines a new distributed entity
called a
process group
. A process can join a group, which is

much like opening a file: the
process is added to the group, but is also provided with a checkpoint containing the current state
of the data replicated by group members. Processes can then send events (
multicasts
) to the
group and will see incoming events

in the identical order, even if events are sent concurrently.
Membership changes are handled as a special kind of platform
-
generated event that delivers a
new
membership view

to the processes in the group.

Levels of performance vary widely depending on t
he model selected. Transactional replication is
slowest, at least when one
-
copy serializability guarantees are desired (better performance can be
obtained when a database uses log
-
based replication, but at the cost of possible inconsistencies if a
failure
causes part of the log to be lost). Virtual synchrony is the fastest of the three models, but the
handling of failures is less rigorous than in the transactional model. State machine replication lies
somewhere in between; the model is faster than transacti
ons, but much slower than virtual synchrony.

The virtual synchrony model is popular[
citation needed
] in part because it allows the developer to use
either active or passive replication.

In contrast, state machine replication and transactional replication
are highly constraining and are often embedded into products at layers where end
-
users would not be
able[
citation ne
eded
] to access them.

[
edit
] Database replication

Database

replication can be used on many

database management systems
, usually with a master/slave
relationship between the original and the copies. The master logs the updates, which then ripple
through to the slaves. The sl
ave outputs a message stating that it has received the update successfully,
thus allowing the sending (and potentially re
-
sending until successfully applied) of subsequent updates.

Multi
-
master replication
, where updates can be submitted to any database node, and then ripple through
to other servers, is often desired, but introduces substantially increased costs and complexity which
may make it impractical in some situations. The most com
mon challenge that exists in multi
-
master
replication is transactional conflict prevention or resolution. Most synchronous or eager replication
solutions do conflict prevention, while asynchronous solutions have to do conflict resolution. For
instance, if
a record is changed on two nodes simultaneously, an eager replication system would detect
the conflict before confirming the commit and abort one of the transactions. A
lazy replication

system
w
ould allow both transactions to commit and run a conflict resolution during resynchronization. The
resolution of such a conflict may be based on a timestamp of the transaction, on the hierarchy of the
origin nodes or on much more complex logic, which decid
es consistently on all nodes.

Database replication becomes difficult when it scales up. Usually, the scale up goes with two
dimensions, horizontal and vertical: horizontal scale up has more data replicas, vertical scale up has
data replicas located further

away in distance. Problems raised by horizontal scale up can be alleviated
by a multi
-
layer multi
-
view access protocol. Vertical scale up runs into less trouble when the internet
reliability and performance are improving.

[
edit
] Disk storage replication

Active (real
-
time)
storage

replication is usually implemented by distributing updates of a
block device

to several physical
hard disks
. This way, any
file system

supported by the
operating system

can be
replicated without modification, as the file system code works on a level above the block device driver
layer. It is implemented either in hardware (in a
disk array controller
) or in software (in a
device
driver
).

The most basic method is
di
sk mirroring
, typical for locally
-
connected disks.

Notably, the storage industry narrows the definitions, so
mirroring

is a local (short
-
distance) operation.
A
replication

is extendable across a
computer network
, so the disks can be located in physically distant
locations. The purpose is to avoid damage done by, and improve availability in case of local failures or
disasters
. Typical
ly the above
master
-
slave

theoretical replication model is applied. The main
characteristic of such solutions is handling write operations:



Synchronous

replication
-

guarantees "zero data loss" b
y the means of
atomic

write operation,
i.e. write either completes on both sides or not at all. Write is not considered complete until
acknowledgement by both local and remote storage. Most appl
ications wait for a write
transaction to complete before proceeding with further work, hence overall performance
decreases considerably. Inherently, performance drops proportionally to distance, as
latenc
y

is
caused by
speed of light
. For 10 km distance, the fastest possible roundtrip takes 67 μs, whereas
nowadays a whole local cached write completes in about 10
-
20 μs.



An often
-
overlooked aspect

of synchronous replication is the fact, that failure of either
remote

replica or even just the
interconnection

stops by definition any and all writes
(freezing the storage system). This is the behaviour that guarantees zero data loss.
However, many commer
cial systems at such potentially dangerous point do not freeze,
but just proceed with local writes, losing the desired zero
recovery point objective
.



Asynchronous

replication
-

write is considered complete as soon as local storage acknowledges
it. Remote storage is updated, but probably with a small
lag
. Performance is greatly i
ncreased,
but in case of losing a local storage, the remote storage is not guaranteed to have the current
copy of data and most recent data may be lost.



Semi
-
synchronous replication
-

introduced by some vendors, probably as a sales pitch;
exact meaning va
ries.



Point
-
in
-
time replication
-

introduces periodic
snapshots

that are replicated instead of
primary storage.

Most important implementations:



DRBD

m
odule for Linux.



EMC SRDF




IBM PPRC

and
Global Mirror




Hitachi TrueCopy




Symantec Veritas Volume Replicator

(VVR)

[
ed
it
] Distributed shared memory replication

Another example of using replication appears in
distributed shared memory

systems, where it may
happen that many nodes of the system share the

same page of the memory
-

which usually means, that
each node has a separate copy (replica) of this page.

[
edit
] Primary
-
backup and multi
-
primar
y replication

Many classical approaches to replication are based on a primary/backup model where one device or
process has unilateral control over one or more other processes or devices. For example, the primary
might perform some computation, streaming a
log of updates to a backup (standby) process, which can
then take over if the primary fails. This approach the most common one for replicating databases,
despite the risk that if a portion of the log is lost during a failure, the backup might not be in a s
tate
identical to the one the primary was in, and transactions could then be lost.

A weakness of primary/backup schemes is that in settings where both processes could have been active,
only one is actually performing operations. We're gaining fault
-
toleran
ce but spending twice as much
money to get this property. For this reason, starting in the period around 1985, the distributed systems
research community began to explore alternative methods of replicating data. An outgrowth of this
work was the emergence
of schemes in which a group of replicas could cooperate, with each process
backup up the others, and each handling some share of the workload.

Jim Gray
, a towering figure within the

database community, analyzed multi
-
primary replication
schemes under the transactional model and ultimately published a widely cited paper skeptical of the
approach (
"The Dangers of Replicatio
n and a Solution"
). In a nutshell, he argued that unless data splits
in some natural way so that the database can be treated as
n

disjoint sub
-
databases, concurrency control
conflicts will result in seriously degraded performance and the group of replicas

will probably slow
down as a function of
n
. Indeed, he suggests that the most common approaches are likely to result in
degradation that scales as
O(n³)
. His solution, which is to partition the data, is only viable in situations
where data actually has a
natural partitioning key.

The situation is not always so bleak. For example, in the 1985
-
1987 period, the
virtual synchrony

model was proposed and emerged as a widely adopted standard (it was u
sed in the Isis Toolkit, Horus,
Transis, Ensemble, Totem,
Spread
, C
-
Ensemble, Phoenix and Quicksilver systems, and is the basis for
the CORBA fault
-
tolerant computing standard; the model is also u
sed in IBM Websphere to replicate
business logic and in Microsoft's Windows Server 2008
enterprise clustering

technology). Virtual
synchrony permits a multi
-
primary approach in which a g
roup of processes cooperate to parallelize
some aspects of request processing. The scheme can only be used for some forms of in
-
memory data,
but when feasible, provides linear speedups in the size of the group.

A number of modern products support similar s
chemes. For example, the
Spread Toolkit

supports this
same virtual synchrony model and can be used to implement a multi
-
primary replication scheme; it
would also be possible to use C
-
Ensemble or Q
uicksilver in this manner. WANdisco permits active
replication where every node on a network is an exact copy or
replica

and hence every node on the
network is active at at one time; this scheme is optim
ized for use in a
wide area network
.

Optimistic replication
From Wikipedia, the free encyclopedia
Jump to:
na
vigation
,
search
Optimistic replication
[1]

(also known as lazy replication
[2]
[3]
) is a strategy for
replication

in which
replicas are allowed to diverge. Traditional pessimistic replication systems are based on the principle of
single
-
copy consistency. that is, users should observe the system to behave as if there was
only one
copy of the data. Optimistic replication does away with this in favor of
eventual consistency
, meaning
that replicas are guaranteed to converge only when a system is idle.

[
edit
] Algorithms

An optimistic replication algorithm consists of five elements:

1.

Operation submission
: Users submit operations at independent sites.

2.

Propagation
:

Each site shares the operations it knows about with the rest of the system.

3.

Scheduling
: Each site decides on an order for the operations it knows about.

4.

Conflict resolution
: If there are any conflicts among the operations a site has scheduled, it must
m
odify them in some way.

5.

Commitment
: The sites agree on a final schedule and conflict resolution result, and the
operations are made permanent.

There are two strategies for propagation: state transfer, where sites propagate a representation of the
current

state, and operation transfer, where sites propagate the operations that were performed
(essentially, a list of instructions on how to reach the new state).

Scheduling and conflict resolution can either be syntactic or semantic. Syntactic systems rely on
general information, such as when or where an operation was submitted. Semantic systems are able to
make use of application
-
specific information to make smarter decisions. Note that state transfer
systems generally have no information about the semantics o
f the data being transferred, and so they
have to use syntactic scheduling and conflict resolution.

[
edit
] Examples

One well
-
known example of a system ba
sed on optimistic replication is the
CVS

version control
system
, or any other version control system which uses the
copy
-
modify
-
merge

paradigm. CVS covers
each of the five elements:

1.

Operation submission: Users edit local versions of files.

2.

Propagation: Users manually pull updates f
rom a central server, or push changes out once the
user feels they are ready.

3.

Scheduling: Operations are scheduled in the order that they are received by the central server.

4.

Conflict resolution: When a user pushes to or pulls from the central repository,

any conflicts
will be flagged for that user to fix manually.

5.

Commitment: Once the central server accepts the changes which a user pushes, they are
permanently committed.

A special case of replication is
synchronization
, where there are only two replicas. For example,
personal digital assistants (PDAs)

allow users to edit data either on the PDA or a computer, and then t
o
merge

these two datasets together. Note, however, that replication is a broader problem than
synchronization, since there may be more than two replicas.

May 18, 2004

Database Replicat
ion in MySQL

By
Ian Gilfillan

An introduction to replication

Recently, while having the knots pounded out of my body during a par
ticularly painful shiatsu
lesson, I reflected on what put them there in the first place. Yes, 'the database' was once more to
blame. A busy database I work with saw one of its tables jump from 3GB to 7GB overnight, as we
imported archive data. As expected,

this had some performance impact. Unfortunately, I had not
expected quite the knock, and it turned out that this single database server could no longer
handle the load. No matter how much more I tried to
optimize the queries
,
tweak the
variables

or
bump up the query

cache
, it was not enough. The machine could not take any
more memory, and a hardware upgrade would do little good (at least with the kind of budget I
have to play with). However, MySQL does not claim to be enterprise
-
ready for nothing, and Yahoo
and othe
r high
-
volume users of MySQL certainly do not run on one database server. There are a
number of techniques to handle high volumes, one of which I will introduce this month
-

MySQL
replication (I will look at others in future articles).

Replication allows
you to take one database, make an exact copy of it on another server, and set
one of them (the slave) to take all its updates from the other (the master). The slave reads the
master's
binary logs
, which store all statements that change a database, and repeats these on
its database, keeping the two in exact sync. Since a replicating database simply repeats
statements, the databases are not necessarily exactly in sync, and advan
ced users can take
advantage of this. That is a topic for another article however, and we will look at simple replication
this month
-

getting one database to be an exact copy of another one.

What replication is not



Replication is not a backup policy. A m
istyped DELETE statement will be replicated on the
slave too, and you could end up with two, perfectly synchronized, empty databases.
Replication can help protect against hardware failure though.



Replication is not an answer to all performance problems. A
lthough updates on the slave
are more optimized than if you ran the updates normally, if you use MyISAM tables, table
-
locking will still occur, and databases under high
-
load could still struggle.



Replication is not a guarantee that the slave will be in sy
nc with the master at any one
point in time. Even assuming the connection is always up, a busy slave may not yet have
caught up with the master, so you can't simply interchange SELECT queries across master
and slave servers.

How to start replicating
-

the

master server



Grant the slave permission to replicate with the REPLICATION SLAVE privilege, for example
as follows:
GRANT REPLICATION SLAVE ON *.* TO slave_user IDENTIFIED BY
'slave_password'




If the master is not using the binary update log, add the foll
owing lines to the
my.cnf

or
my.ini

configuration file, and restart the server:



log
-
bin

server
-
id=1

By convention, the master is usually server
-
id 1, and any slaves from 2 onwards, though
you can change this if you wish. If the master is already using t
he binary update log, either
take note of the offset at the moment of the backup (the next step), or use the
RESET
MASTER

statement to clear all binary logs and immediately begin the backup. You may
want to make a copy of the binary logs before doing this,

in case you need to use the
binary logs to restore from backup.



Make a backup of the database. You will use this to start the slave server. Note the
comments about the binary log above. You can also skip this step if you use the
LOAD
DATA FROM MASTER

stat
ement, but see the comments about locking the master below
first.



Add the following to the configuration file on the slave:

master
-
host=master
-
hostname

master
-
user=slave
-
user

master
-
password=slave
-
password

server
-
id=2




The slave user and slave password

are those to which you set when you granted
REPLICATION SLAVE permission on the master. The server
-
id must be a unique number,
different to the master or any other slaves in the system. There are also two other options:
master
-
port
, used if the master is
running on a non
-
standard port (3306 is default), and
master
-
connect
-
retry
, a time in seconds for the slave to attempt to reconnect if the master
goes down. 60 seconds is default.



Restore the data from the master, either as you would normally restore a ba
ckup, or with
the statement
LOAD DATA FROM MASTER
. The latter will lock the master for the duration
of the operation, which could be quite lengthy, so you may not be able to spare the
downtime.

Replication in action

Once the slave has started, replication

should begin. Besides the obvious SELECT queries, you can
make sure this is working correctly with the following statements:

mysql> SHOW SLAVE STATUS
\
G

*************************** 1. row ***************************


Master_Host: masterhostname.c
o.za


Master_User: slave_server


Master_Port: 3306


Connect_retry: 60


Master_Log_File: master
-
bin.054


Read_Master_Log_Pos: 16664104


Relay_Log_File: slave
-
relay
-
bin.045


Relay_Log_Pos: 17657643

Relay_Master_Log
_File: master
-
bin.054


Slave_IO_Running: Yes


Slave_SQL_Running: Yes


Replicate_do_db: vne


Replicate_ignore_db:


Last_errno: 0


Last_error:


Skip_counter: 0


Exec_master_log_pos: 16664104


Relay_log_space:
17657643


This is a mature slave that has been running a while. The master is already on the 54th binary
log. You can see if the slave is running correctly by looking at the
Slave_IO_Running

and
Slave_SQL_Running
. The most important field is the
Last_error

field.

mysql> SHOW MASTER STATUS;

+
----------------
+
----------
+
--------------
+
------------------
+

| File | Position | Binlog_do_db | Binlog_ignore_db |

+
----------------
+
----------
+
--------------
+
------------------
+

| master
-
bin.054 | 16664104
| | |

+
----------------
+
----------
+
--------------
+
------------------
+


The above is from a master that has been running a while. It is already on binlog 54.

Starting to replicate from a particular point in the binary logs

If
you need to force the slave to begin at a certain point, usually when the master has been
running with an active binary log, you can do so as follows. The following starts with the 3rd
binary log, as position 420. You can find the position using
mysqlbinlo
g
.

mysql> CHANGE MASTER TO MASTER_LOG_FILE='master
-
bin.003' MASTER_LOG_POS=420;

Query OK, 0 rows affected (0.00 sec)

mysql> SLAVE START


The SLAVE START and SLAVE STOP commands are used to manually stop and start the slave. The
slave will also always stop

if it comes across an error while replicating.

Removing old binary logs

On active databases, the binary logs tend to grow quite quickly. You may have used RESET
MASTER in the past to clear them, but you cannot do this to the master while replicating! The

statement to use is PURGE MASTER LOGS. First, make sure that all slaves have replicated to at
least the log beyond which you want to remove. For example, in our earlier example, both the
slave and the master are on log
master
-
bin.054
, so we can safely rem
ove
master
-
log.053

and
before, as follows:

mysql> PURGE MASTER LOGS TO 'master
-
bin.053';

MySQL will not allow you to remove a log that the master is still using though.

Replicating specific databases only

As mentioned earlier, the master and slave databa
se server do not need to be entirely in sync,
with all databases and all tables from the master replicated onto the slave. By default, the slave
will replicate everything, but you can change this behavior with the following options in the slave
configurati
on file:

replicate
-
do
-
db=db_name (replicate this database)

replicate
-
ignore
-
db=db_name (don't replicate this database)

replicate
-
do
-
table=db_name.table_name (replicate this table)

replicate
-
ignore
-
table=db_name.table_name (don't replicate this table)

repli
cate
--
wild
-
do
-
table=db_name.table_name


(allows wildcards, e.g db% would be all databases beginning with db)

replicate
-
wild
-
ignore
-
table=db_name.table_name (ignore all specified

tables, with wildcards)

These options can all be used multiple times in a single configuration. A couple of other useful
options:

replicate
-
rewrite
-
db=master_db
-
>slave_db (allows you to use map databases


that use different database names on each server)

log
-
slave
-
update (writes replicated statements to the slaves binary logs)


A few complexities



Before shutting down a slave server (
mysqladmin
-
shutdown
, not
STOP SLAVE
, make sure

it
has no temporary tables open (these may be needed for a statement to be replicated). You
can do this with
SHOW STATUS
to see the value of
Slave_open_temp_tables
. This
annoying feature should be fixed soon, so please check the latest documentation.



The

USER(), UUID(), LOAD_FILE() and CONNECTION_ID() (before MySQL 4.1.1) functions
do not work reliably on the slave (they are replicated without changes).



Before MySQL 4.1.1, FLUSH, ANALYZE, OPTIMIZE, and REPAIR statements are not
replicated. This means tha
t if you change permissions on the master by editing the tables
directly, you will need to manually FLUSH PRIVILEGES on the slave too.



Make sure the slave and the master are using the same character set.



Multi
-
master replication

is a method of
replication

employed by
databases

to transfer data or changes
to data across multiple computers within a group.
Multi
-
master replicat
ion

can be contrasted with a
master
-
slave

method (also known as
single
-
master replication
).

The term Multi
-
master can also be applied to systems in general where a single piece of information
can be updated by one of several systems. That is, no one system can be said to own the information
and be able to control it consistency and accuracy.

Bene
fits of
Multi
-
master replication

:



If one master fails, other masters will continue to update the
database
.



Masters can be located in several physical sites i.e. distributed across the network.

Disadv
antages of
Multi
-
master replication

:



Most multi
-
master replication systems are only loosely consistent, i.e. lazy and asynchronous,
violating
ACID

properties.



Eager replication systems are complex and int
roduce some communication latency.



Issues such as conflict resolution can become intractable as the number of nodes involved rises
and the required latency decreases


Methods

[
edit
] Log
-
Based

A
database transaction

log is referenced to capture changes made to the database. For log
-
based
transaction capturing, database changes can only be

distributed
asynchronously
.

[
edit
] Trigger
-
Based

Triggers at the subscriber capture changes
made to the database and submit them to the publisher. With
trigger
-
based transaction capturing, database changes can be distributed either
synchronously

or
asynchronously
.

[
edit
] Implementations

Many directory servers based on
LDAP

implement multi
-
master replication.
OpenDS

implements
"synchronization" instead.

[
edit
] Active Directory

For example, one of the more prevalent of multi
-
master replication implementations in directory
servers is
Microsoft
's
Active Directory
. Within
Active Directory
, objects that are updated on one
Domain Controller

are then replicated to other

domain controllers through multi
-
master replication. It
is not required for all domain controllers to replicate with each other's domain controller as this would
cause excessive network traffic in large Active Directory implementations. Instead, domain co
ntrollers
have a complex update pattern that ensures that all servers are updated in a timely fashion without
excessive replication traffic. Some
Active Directory

needs are better served by
Flexible single master
operation
.

[
edit
] Oracle

Oracle database

clusters

implement multi
-
master replication using one of two methods.
Asynchronous

multi
-
master replication commits data changes to a
deferred transaction queue

which is periodically
processed on all databases in the cluster.
Synchronous

multi
-
master replication uses Oracle's two phase
commit functionality to ensure that all databases with the cluster have a consistent
dataset
.

[
edit
] MySQL

MySQL

ships with replication support. Using features introduced in MySQL 5.0, it is possi
ble to
achieve a multi
-
master replication scheme.
[1]

[
edit
] Post
greSQL

PostgreSQL

offers multiple solutions for multi
-
master replication, including solutions based on two
phase commit. There's
Bucardo
,
PgPool and PgPool
-
II
,
PgCluster

and
Sequoia

as well as some
proprietary solutions. Another promising approach, implementing eager
but asynchronous replication is
Postgres
-
R
. It's still in development.

[
edit
] Ingres

Ingres

Within Ingres Replicator, objects that are updated on one Ingres server can then replicated to
other servers whether local or remote through multi
-
master replication. If one server fails, client
connections ca
n be re
-
directed to another server. It is not required for all Ingres servers in an
environment to replicate with each other as this could cause excessive network traffic in large
implementations. Instead, Ingres Replicator provides an elegant and sophisti
cated design that allows
the appropriate data to be replicated to the appropriate servers without excessive replication traffic.
This means that some servers in the environment can serve as failover candidates while other servers
can meet other requirement
s such as managing a subset of columns or tables for a departmental
solution, a subset of rows for a geographical region or one
-
way replication for a reporting server. In the
event of a source, target, or network failure, data integrity is enforced through

this two
-
phase commit
protocol by ensuring that either the whole transaction is replicated, or none of it is. In addition, Ingres
Replicator can operate over RDBMS’s from multiple vendors to connect them.

[
edit
] DBReplicator

DBReplicator

is a powerful open
-
source Java application for network
-
based multi
-
master
heterogeneous database replication or filtered synch
ronization. Supported databases include:
CloudScape

/
Apache Derby

/
JavaDB
,
DaffodilDB

/
One$DB
,
IBM DB2
,
Firebird (database server)
,
MySQL
,
Oracle Data
base
,
PointBase
,
PostgreSQL
,
SQLServer
,
Sybase
.