6.4 Data and File Replication

Dec 2, 2013

Neha Purohit

Why replicate


Performance



Reliability



Resource sharing



Network resource saving

Challenges


Transparency

Replication

Concurrency control

Failure recovery

Serialization


Atomicity


In database systems, atomicity is one of the
ACID transaction properties. An atomic
transaction is a series of database operations
which either all occur, or all do not occur[1].



All or nothing


Atomicity


In DFS (Distributed File System), replicated objects (data or file) should follow atomicity rules, i.e., all copies should be updated (synchronously or asynchronously) or none.
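
The all-or-nothing rule can be illustrated with a minimal sketch. The `Replica` class, its `fail_on_write` flag, and `atomic_update` are hypothetical names introduced only for this example; a real DFS would use a commit protocol rather than an in-memory undo.

```python
class Replica:
    """Hypothetical in-memory replica holding a single value."""

    def __init__(self, name, fail_on_write=False):
        self.name = name
        self.value = None
        self.fail_on_write = fail_on_write  # simulates an unavailable copy

    def write(self, value):
        if self.fail_on_write:
            raise IOError(f"{self.name} is unavailable")
        self.value = value


def atomic_update(replicas, new_value):
    """Update every replica, or restore the old values if any write fails."""
    old_values = [r.value for r in replicas]
    updated = []
    try:
        for r in replicas:
            r.write(new_value)
            updated.append(r)
    except IOError:
        # 'updated' is a prefix of 'replicas', so zip pairs each replica
        # that was already written with its own previous value.
        for r, old in zip(updated, old_values):
            r.value = old
        return False
    return True
```

If one copy cannot be written, the copies that were already changed are rolled back, so a reader never observes a partially applied update once the call returns.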

Goal


One-copy serializability:



The effect of transactions performed by clients
on replicated objects should be the same as if
they had been performed one at a time on a
single set of objects.[2]


Architecture[3]

[Figure: architecture diagram showing Clients, FSAs (file service agents), and RMs (replica managers)]

Read operations [3]


Read-one-primary: the FSA reads only from a primary RM, which enforces consistency



Read-one: the FSA may read from any RM, which improves concurrency



Read-quorum: the FSA must read from a quorum of RMs to decide the currency of data

Write Operations[3]


Write-one-primary: write only to the primary RM; the primary RM updates all other RMs



Write-all: update all RMs



Write-all-available: write to all functioning RMs. A faulty RM needs to be synchronized before it is brought back online.

Write Operations


Write-quorum: update a predefined quorum of RMs


Write-gossip: update any RM; the update is lazily propagated to the other RMs

Read one primary, write one primary


Other RMs are backups of primary RM


No concurrency


Easily serialized


Simple to implement


Achieves one-copy serializability


Primary RM is a performance bottleneck


Read one, Write all


Provides concurrency



Concurrency control protocol needed to ensure consistency (serialization)



Achieves one-copy serializability



Difficult to implement (a failed RM will block any update)

Read one, Write all available


Variation of Read one, Write all


May not guarantee one-copy serializability


Issue of lost conflicts between transactions


Read quorum, Write quorum


A version number is attached to each replicated object


On a read, the object with the highest version number is the latest object.


A write operation advances the version by 1


Write quorum > half of all object copies


Write quorum + read quorum > number of all object copies
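
The quorum rules above can be sketched as follows. `QuorumStore` and its parameters are hypothetical names for illustration; replicas are modelled as in-memory `(version, value)` pairs, and quorum members are picked at random to show that any choice satisfying the two inequalities works.

```python
import random


class QuorumStore:
    """Hypothetical store with n copies, each holding (version, value)."""

    def __init__(self, n, read_quorum, write_quorum):
        # Enforce: write quorum > n / 2, and read quorum + write quorum > n.
        if not (2 * write_quorum > n and read_quorum + write_quorum > n):
            raise ValueError("quorum sizes violate the overlap rules")
        self.n = n
        self.r = read_quorum
        self.w = write_quorum
        self.replicas = [(0, None) for _ in range(n)]

    def read(self):
        """Read from any r copies and return the highest-versioned one."""
        picked = random.sample(range(self.n), self.r)
        return max((self.replicas[i] for i in picked), key=lambda t: t[0])

    def write(self, value):
        """Write to any w copies, advancing the latest version by 1."""
        picked = random.sample(range(self.n), self.w)
        # Any two write quorums overlap, so the quorum sees the latest version.
        latest = max(self.replicas[i][0] for i in picked)
        for i in picked:
            self.replicas[i] = (latest + 1, value)
        return latest + 1
```

With 5 copies and both quorums set to 3, every read quorum shares at least one copy with the most recent write quorum, so the highest version seen by a read is always the latest write.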

Gossip Update


Suited to situations with frequent reads and infrequent updates


Increased performance


Typically read-one, write-gossip


Uses timestamps

Basic Gossip Update


Used for overwrite


Three operations: read, update, and gossip arrival


Read: if TSfsa <= TSrm, the RM has recent data and returns it; otherwise wait for gossip, or try another RM


Update: if TSfsa > TSrm, perform the update, advance TSrm, and send gossip. Otherwise, the handling depends on the application: perform the update or reject it


Gossip arrival: update the RM if the gossip carries new updates.
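
The three operations above can be sketched as follows. `ReplicaManager`, `on_gossip`, and the scalar integer timestamps are assumptions made for this example; in the stale-update case the sketch simply rejects, which is only one of the application-dependent choices.

```python
class ReplicaManager:
    """Hypothetical RM holding one overwritable value and a timestamp TSrm."""

    def __init__(self):
        self.ts = 0          # TSrm
        self.value = None
        self.peers = []      # other RMs that receive gossip from this one

    def read(self, ts_fsa):
        """Return (True, value) if this RM is at least as recent as the FSA."""
        if ts_fsa <= self.ts:
            return True, self.value
        # Caller should wait for gossip or try another RM.
        return False, None

    def update(self, ts_fsa, value):
        """Apply an overwrite only if it is newer than the RM's state."""
        if ts_fsa > self.ts:
            self.ts, self.value = ts_fsa, value
            return True
        return False  # this sketch rejects; an application may choose otherwise

    def gossip(self):
        """Lazily propagate the current state to the peers."""
        for peer in self.peers:
            peer.on_gossip(self.ts, self.value)

    def on_gossip(self, ts, value):
        """Adopt gossiped state only if it carries a newer update."""
        if ts > self.ts:
            self.ts, self.value = ts, value
```

A read against an RM that has not yet received the gossip returns `(False, None)`, which is the signal to wait or to try another RM.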

Causal Order Gossip Protocol[3]


Used for read-modify updates



Assumes a fixed RM configuration



Uses vector timestamps



Uses a buffer to preserve the order
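
A minimal sketch of how vector timestamps and a buffer can keep updates in causal order. It uses the standard causal-delivery rule and is not necessarily the exact protocol described in [3]; `CausalRM` and its methods are hypothetical names.

```python
class CausalRM:
    """Hypothetical RM in a fixed group of n RMs, identified by 0..n-1."""

    def __init__(self, rm_id, n):
        self.id = rm_id
        self.clock = [0] * n   # vector timestamp, one entry per RM
        self.buffer = []       # gossip messages that arrived too early
        self.log = []          # updates applied, in causal order

    def local_update(self, op):
        """Apply a local update and return the gossip message for it."""
        self.clock[self.id] += 1
        self.log.append(op)
        return (self.id, list(self.clock), op)

    def _deliverable(self, sender, ts):
        # Next update from the sender, and nothing it depends on is missing.
        return ts[sender] == self.clock[sender] + 1 and all(
            ts[k] <= self.clock[k] for k in range(len(ts)) if k != sender
        )

    def receive(self, msg):
        """Buffer the message, then apply everything that is now in order."""
        self.buffer.append(msg)
        progress = True
        while progress:
            progress = False
            for m in list(self.buffer):
                sender, ts, op = m
                if self._deliverable(sender, ts):
                    self.buffer.remove(m)
                    self.clock[sender] = ts[sender]
                    self.log.append(op)
                    progress = True
```

If an RM receives an update that depends on one it has not yet seen, the update stays in the buffer until the missing one arrives, and both are then applied in causal order.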

Windows Server 2008[4]


Supports DFS



“State-based, multi-master” scheduled replication



Uses namespaces for transparent file sharing



Uses Remote Differential Compression to propagate only the changes, saving bandwidth


FUTURE WORK IN FILE REPLICATION

DISTRIBUTED FILE SYSTEMS: STATE OF THE ART


GFS: Google File System


Google


C/C++


HDFS: Hadoop Distributed File System


Yahoo


Java, Open Source


Sector: Distributed Storage System


University of Illinois at Chicago


C++, Open Source


FILE SYSTEMS OVERVIEW


System that permanently stores data


• Usually layered on top of a lower-level physical storage medium


• Divided into logical units called “files”


Addressable by a filename (“foo.txt”)


Usually supports hierarchical nesting (directories)


• A file path joins file & directory names into a relative or
absolute address to identify a file (“/home/aaron/foo.txt”)


SHARED/PARALLEL/DISTRIBUTED FILE SYSTEMS

• Support access to files on remote servers


• Must support concurrency


Make varying guarantees about locking, who “wins” with
concurrent writes, etc...



Must gracefully handle dropped connections


• Can offer support for replication and local caching


• Different implementations sit in different places on
complexity/feature scale


SECTOR AND SPHERE


Sector: Distributed Storage System



Sphere: Run-time middleware that supports simplified distributed data processing.



Open source software, GPL, written in C++.



Started in 2006, current version 1.18



http://sector.sf.net


Sector

Brief Definition


Sector is an open source data cloud model.


Its assumptions: high-bandwidth data links exist among the racks in a data center and also among different data centers, and individual applications may have to process large streams of data and produce equally large output streams.







SECTOR: DISTRIBUTED STORAGE SYSTEMS


Sector stores files on the native/local file system of each slave
node.


• Sector does not split files into blocks


Pro: simple/robust, suitable for wide area


Con: file size limit


• Sector uses replication for better reliability and availability


• The master node maintains the file system metadata. No
permanent metadata is needed.


• Topology aware


SECTOR:WRITE/READ

• Write is exclusive


• Replicas are updated in a chained manner: the client updates one replica, and then this replica updates another, and so on. All replicas are updated upon the completion of a Write operation.


• Read: different replicas can serve different clients at the same time. Nearest replica to the client is chosen whenever possible.
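
The write and read behaviour above can be sketched as follows. This does not use the actual Sector API: `SectorReplica`, `chain`, `exclusive_write`, `read_nearest`, and the numeric `distance` field are hypothetical, and a process-local lock stands in for Sector's exclusive write.

```python
import threading


class SectorReplica:
    """Hypothetical replica of one file on a slave node."""

    def __init__(self, name, distance):
        self.name = name
        self.distance = distance  # assumed network distance to the client
        self.data = b""
        self.next = None          # next replica in the update chain

    def write(self, data):
        self.data = data
        if self.next is not None:
            self.next.write(data)  # this replica updates the next one


def chain(replicas):
    """Link the replicas into a chain and return the head."""
    for cur, nxt in zip(replicas, replicas[1:]):
        cur.next = nxt
    return replicas[0]


_write_lock = threading.Lock()


def exclusive_write(head, data):
    """Only one writer at a time; returns once every replica is updated."""
    with _write_lock:
        head.write(data)


def read_nearest(replicas):
    """Serve a read from the replica closest to the client."""
    return min(replicas, key=lambda r: r.distance)
```

The client touches only the head of the chain, and the write call returns after the last replica has been updated, matching the rule that all replicas are current when a Write completes.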






SECTOR: TOOLS AND API



Supported file system operations: ls, stat, mv, cp, mkdir, rm, upload, download


Wildcard characters supported


• System monitoring: sysinfo.


• C++ API: list, stat, move, copy, mkdir, remove, open, close, read, write, sysinfo.





Sector: Architecture

Sector manages its data with the help of the following components.



Security Server: Authenticates the clients and the slave servers


Master Server: Contains file metadata and schedules the work among the slave servers.


Slave Servers: Contain datasets divided amongst them in the form of Linux files.






Sphere: A Brief Definition


Sphere is a programming model built over the Sector architecture of data cloud.



It falls under the Single Instruction, Multiple Data category of Flynn’s taxonomy.





Sphere: Computing Model

Sphere identifies the individual records in a file with the help of index files.


Each record, or a group of records, is treated as an independent data entity that can be processed in parallel by different slave nodes, similar to data parallelism.


These slave nodes are managed by Sphere Processing Engines (SPE), and the SPEs are scheduled by Sphere.


Multiple stages of SPEs can be coupled together.
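
The record-oriented model above can be sketched as follows. This is not the Sphere API: `build_index`, `read_records`, and `run_stage` are hypothetical helpers, newline-terminated records are an assumption, and a thread pool stands in for SPEs running on slave nodes.

```python
from concurrent.futures import ThreadPoolExecutor

NEWLINE = 0x0A  # assumed record terminator for this example


def build_index(data):
    """Return an (offset, length) entry for each record in 'data'."""
    index, start = [], 0
    for i, byte in enumerate(data):
        if byte == NEWLINE:
            index.append((start, i - start))
            start = i + 1
    if start < len(data):
        index.append((start, len(data) - start))
    return index


def read_records(data, index):
    """Use the index to slice the file into independent records."""
    return [data[offset:offset + length] for offset, length in index]


def run_stage(items, func, workers=4):
    """Process items independently and in parallel, preserving order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(func, items))


# Two coupled stages: count the words per record, then scale each count.
data = b"a b\nc d e\nf\n"
index = build_index(data)
stage1 = run_stage(read_records(data, index), lambda rec: len(rec.split()))
stage2 = run_stage(stage1, lambda count: count * 10)
```

The output of the first stage is fed directly to the second, which mirrors how multiple SPE stages can be coupled.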



MAP REDUCE


Similarity to Map Reduce

The Sphere model is quite similar to Map Reduce because both deal with data parallelism and allow coupling of at least two stages of processing elements.



Map Reduce shuffles the data among the slave nodes in the second stage, whereas Sphere allows the output of the first stage to be distributed among different processing nodes of the second stage.




References

[1] Wikipedia; http://en.wikipedia.org/wiki/Atomicity


[2] M. T. Harandi; J. Hou (modified: I. Gupta); "Transactions with Replication"; http://www.crhc.uiuc.edu/~nhv/428/slides/repl-trans.ppt


[3] Randy Chow, Theodore Johnson; "Distributed Operating Systems & Algorithms"; 1998


[4] "Overview of the Distributed File System Solution in Microsoft Windows Server 2003 R2"; http://technet2.microsoft.com/WindowsServer/en/library/d3afe6ee-3083-4950-a093-8ab748651b761033.mspx?mfr=true


[5] "Distributed File System Replication: Frequently Asked Questions"; http://technet2.microsoft.com/WindowsServer/en/library/f9b98a0f-c1ae-4a9f-9724-80c679596e6b1033.mspx?mfr=true


[6] http://code.google.com/edu/parallel/dsd-tutorial.html


[7] http://code.google.com/edu/parallel/mapreduce-tutorial.html


[8] http://static.googleusercontent.com/external_content/untrusted_dlcp/labs.google.com/en/us/papers/gfs-sosp2003.pdf


[9] http://arxiv.org/ftp/arxiv/papers/0809/0809.1181.pdf