Why does Distributed Computing Matter?

hedgebornabalone, Software & software engineering, 2 Dec 2013


Clay Shirky's 4 rungs of group interaction

Here Comes Everybody (2008)


Links in a fully connected mesh grow quadratically with the number of nodes:

nodes * (nodes - 1) / 2
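
As a quick check of the formula, the following minimal sketch counts the links for a few group sizes. The function name `mesh_links` is an illustrative choice, not something from the slides.

```python
def mesh_links(nodes: int) -> int:
    """Number of links in a fully connected mesh with `nodes` members."""
    if nodes < 0:
        raise ValueError("nodes must be non-negative")
    # Every node links to every other node; each link is shared by two nodes.
    return nodes * (nodes - 1) // 2


for n in (2, 5, 10, 100):
    print(n, mesh_links(n))
```

Going from 10 to 100 members multiplies the number of possible links by 110 (45 to 4950), which is why larger groups need more coordination.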


Internet technology enables ridiculously easy group forming


e.g. email Reply All button


Describes an evolution from:

- Simple/selfish sharing
  - example: Delicious
- Conversation
  - example: high dynamic range photography on Flickr
- Collaboration
  - example: writing software to subtitle Japanese anime for Western consumption
- Collective action
  - examples: flash mobs in Belarus, now Tunisia and Egypt

Each rung requires greater coordination amongst members.


A Distributed System

[Figure: Machine A, Machine B and Machine C, each running its own OS (e.g. Windows, Mac OS X, Linux) and connected by a network. A Middleware Services layer sits above the operating systems, with Distributed Applications on top.]

A distributed system is a collection of independent computers that appears to its
users as a single coherent system.





- Heterogeneous computers: vendors/OSs should be able to interoperate
- Should mask the heterogeneity from users (applications)
- Should be easy to expand and scale
- Should be permanently available (even though parts of it are not)
- Communication is based on messages


Layering




The distributed applications and middleware we will
consider occupy the Application Layer of the OSI
model









Some interact more or less closely with underlying
layers


Typically these systems have various layers within
them as well.



Layering




For example, a Web Service:

- Serializes programming objects to XML, perhaps using the W3C XML Schema specification
- Wraps this data in a SOAP envelope
- Sticks the SOAP envelope into an HTTP POST message payload
- Then HTTP uses what it uses: TCP/IP, DNS

A Peer to Peer application may:

- Define custom messages for file sharing
- Use UPnP to negotiate NATs
- Tunnel custom messages through another protocol (e.g. HTTP)

In the end, systems will do whatever they need to get their job done and don’t care much for nicely defined layers.
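
The Web Service layering can be sketched in a few lines of Python using only the standard library. This is an illustration under stated assumptions, not a complete SOAP client: the `GetQuote` operation, its `symbol` field, the host `example.invalid` and the path `/quote` are all made up, and headers a real service may require (such as `SOAPAction`) are omitted. Nothing is sent over the network; the code only builds the bytes that would be handed to TCP.

```python
import xml.etree.ElementTree as ET

SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 envelope namespace


def to_xml(name, fields):
    """Layer 1: serialize a simple object (a dict here) to an XML element."""
    element = ET.Element(name)
    for key, value in fields.items():
        ET.SubElement(element, key).text = str(value)
    return element


def wrap_in_soap(payload):
    """Layer 2: wrap the XML payload in a SOAP envelope."""
    envelope = ET.Element(f"{{{SOAP_NS}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_NS}}}Body")
    body.append(payload)
    return ET.tostring(envelope, encoding="utf-8")


def build_http_post(host, path, soap_bytes):
    """Layer 3: place the envelope in the payload of an HTTP POST message."""
    headers = (
        f"POST {path} HTTP/1.1\r\n"
        f"Host: {host}\r\n"
        "Content-Type: text/xml; charset=utf-8\r\n"
        f"Content-Length: {len(soap_bytes)}\r\n"
        "\r\n"
    )
    return headers.encode("ascii") + soap_bytes


# Hypothetical operation and endpoint, for illustration only.
request = build_http_post(
    "example.invalid",
    "/quote",
    wrap_in_soap(to_xml("GetQuote", {"symbol": "ABC"})),
)
print(request.decode("utf-8"))
```

Each function only knows about the layer directly beneath it, which is the layering idea; the closing remark on the slide still applies, since real systems often reach across these boundaries.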


Messages




- These define the protocol of the distributed application or middleware.
- Provide the insulation from heterogeneous OSs.
- Are capable of traversing the network.
  - This involves some form of serialization, i.e. the message can be converted into a stream of bytes and sent over the wire.
  - Often an in-memory representation of an Object is serialized at the sending side, and deserialized at the receiving side to create an in-memory representation.
  - Examples: strings, XML, Java serialization, bencoding
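
To make serialization concrete, here is a minimal sketch of bencoding, the format BitTorrent uses for its metadata. It covers the four bencoded types (integers, byte strings, lists, dictionaries) but does little validation, and the example message with its `type`, `piece` and `peers` fields is invented for illustration.

```python
def bencode(value):
    """Serialize ints, strings, lists and dicts to bencoded bytes."""
    if isinstance(value, bool):
        raise TypeError("bencoding has no boolean type")
    if isinstance(value, int):
        return b"i%de" % value
    if isinstance(value, str):
        value = value.encode("utf-8")
    if isinstance(value, bytes):
        return b"%d:%s" % (len(value), value)
    if isinstance(value, list):
        return b"l" + b"".join(bencode(item) for item in value) + b"e"
    if isinstance(value, dict):
        # Keys are byte strings and are written in sorted order.
        items = [(k.encode("utf-8") if isinstance(k, str) else k, v)
                 for k, v in value.items()]
        items.sort(key=lambda pair: pair[0])
        return b"d" + b"".join(bencode(k) + bencode(v) for k, v in items) + b"e"
    raise TypeError(f"cannot bencode {type(value).__name__}")


def bdecode(data, pos=0):
    """Return (value, next_position) decoded from `data` starting at `pos`."""
    lead = data[pos:pos + 1]
    if lead == b"i":
        end = data.index(b"e", pos)
        return int(data[pos + 1:end]), end + 1
    if lead == b"l":
        pos += 1
        items = []
        while data[pos:pos + 1] != b"e":
            item, pos = bdecode(data, pos)
            items.append(item)
        return items, pos + 1
    if lead == b"d":
        pos += 1
        result = {}
        while data[pos:pos + 1] != b"e":
            key, pos = bdecode(data, pos)
            result[key], pos = bdecode(data, pos)
        return result, pos + 1
    # Otherwise a byte string of the form <length>:<bytes>.
    colon = data.index(b":", pos)
    length = int(data[pos:colon])
    start = colon + 1
    return data[start:start + length], start + length


# Sending side: in-memory object -> bytes on the wire.
wire = bencode({"type": "request", "piece": 7, "peers": ["a", "b"]})
# Receiving side: bytes -> a new in-memory representation.
received, _ = bdecode(wire)
print(wire)
print(received)
```

Note that the receiver gets byte strings back, not Python `str` objects: the wire format, not the sender's language, defines what a message contains, which is what insulates the two sides from each other.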




Messages




- All messages are one-way.
  - Protocols may build on top of this to create two-way message exchange patterns.
  - E.g. HTTP is request/response based.
  - RPC emulates a local computing paradigm with the concept of a remote procedure with return values.
- Messages are typically independent of the transport protocol they use (TCP, UDP).
  - But some rely on or presume certain transport protocols.
  - E.g. BitTorrent and HTTP keep the TCP connection open to reduce overhead. This relies on TCP's connection-oriented stream and would not work over plain UDP.
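
The idea that request/response is built from two one-way messages can be sketched with in-memory queues standing in for the network. Everything here (the `send` helper, the message fields `id`, `body` and `reply_to`, and the upper-casing "service") is an assumption made for illustration.

```python
import queue
import threading


def send(channel, message):
    """One-way send: returns immediately and carries no reply."""
    channel.put(message)


def server(inbox):
    """Answer each request by sending a separate one-way message back."""
    while True:
        message = inbox.get()
        if message is None:  # shutdown signal used only by this sketch
            break
        reply = {"id": message["id"], "body": message["body"].upper()}
        send(message["reply_to"], reply)


def request(server_inbox, body, request_id, timeout=1.0):
    """Request/response: two one-way messages matched by an id."""
    reply_channel = queue.Queue()
    send(server_inbox, {"id": request_id, "body": body, "reply_to": reply_channel})
    reply = reply_channel.get(timeout=timeout)  # raises queue.Empty on timeout
    if reply["id"] != request_id:
        raise RuntimeError("reply does not match the request")
    return reply["body"]


server_inbox = queue.Queue()
worker = threading.Thread(target=server, args=(server_inbox,))
worker.start()
answer = request(server_inbox, "hello", request_id=1)
send(server_inbox, None)
worker.join()
print(answer)
```

The caller sees something that looks like a function call with a return value, which is the RPC illusion; the timeout is a reminder that the second message may never arrive.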




Distributed vs Local Systems

Distributed systems are inherently different from non-distributed systems.

- Latency: network speed
- Memory access: not shared
- Partial failure
  - remote failure does not mean local failure
  - no global coordination (like an OS)
- Guaranteed concurrency
  - combined with latency, events are not necessarily received in the same order as they are generated
- Indeterminacy
  - Your system is not in control of the whole system
  - With partial failure, a system may just disappear with no indication of status.
  - Was it the remote machine, the remote user or a network link?
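
The ordering point can be shown with a small deterministic model: three events, each with a send time and a network latency. The numbers are made up; the only claim is that arrival order is send time plus latency, so it need not match generation order.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    name: str
    sent_at: float  # time at which the sender generated the event
    latency: float  # network delay before the receiver sees it

    @property
    def received_at(self) -> float:
        return self.sent_at + self.latency


events = [
    Event("A", sent_at=0.0, latency=5.0),
    Event("B", sent_at=1.0, latency=1.0),
    Event("C", sent_at=2.0, latency=2.0),
]

generated = [e.name for e in sorted(events, key=lambda e: e.sent_at)]
received = [e.name for e in sorted(events, key=lambda e: e.received_at)]
print(generated)  # ['A', 'B', 'C']
print(received)   # ['B', 'C', 'A']
```

Event A is generated first but arrives last, so a receiver that assumes arrival order equals generation order would draw the wrong conclusion.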




Some Terminology

- Resource: any hardware or software resource shared on a distributed network, e.g. a file storage system, RAM, CPU, a file, a service or a communication channel
- Node (computer/device): generic term for any computer on a distributed network
- Server: a provider of data
- Client: a consumer of data
- Peer: both a provider and a consumer of data
- Service: “a network-enabled entity that provides some capability”


Taxonomy for Distributed Systems

The taxonomy is based on the following factors and their relation to centralization:

1. Resource Discovery: the mechanism for discovering resources on a distributed system.
   Examples: DNS, JXTA Rendezvous, Jini LUS, UDDI, etc.

2. Resource Availability: scalability.
   - Do resources scale with the network?
   - Does access to them scale with the network?
   See example...


Mp3.com, Napster and Gnutella

[Figure: three scenarios. MP3.com scenario: a User and Mp3.com. Napster scenario: a User and Napster.com. Gnutella scenario.]


Taxonomy for Distributed Systems

3. Resource Communication: two types:
   - Brokered communication (centralized): communication is passed through a central server; resources do not have direct references to each other.
   - Point to point (decentralized, peer to peer): a direct connection between the sender and the receiver.



Centralization of Point-to-Point Connections

True Peer to Peer, e.g. Gnutella: equal peers. Communication is supposed to be even, i.e. each provider is also a consumer of information and each node has an equal number of connections. This is not always the case, as we will learn in lecture 4.

Web Server: a many-to-one relationship between users and the web server, and therefore this can be considered centralized communication.


Taxonomy for Distributed Systems

- Centralized systems: typically client/server based systems
- Decentralized systems: Peer to Peer (P2P)
- Hybrid: combinations of the 2 extremes, e.g. brokered architecture

3. Resource Communication: two types:
   - Brokered communication (centralized): communication is passed through a central server; resources do not have direct references to each other.
   - Point to point (decentralized, peer to peer): a direct connection (although the connection may be multi-hop) between the sender and the receiver.



A Web Server: Centralized

- Clients (i.e. users) use their web browser to navigate web pages on one or more web sites.
- A web site is static to a particular domain.

- Discovery: centralized, DNS
- Availability: available or not
- Communication: centralized to the particular web server

[Figure: Web Server placed on three axes, Resource Discovery, Resource Availability and Resource Communication, each running from Centralized to Decentralized.]

The Web as a whole

- Discovery: ad hoc
  - Often highly centralized, e.g. Google, but also highly decentralized: the Web of links, e.g. the blogosphere and out of bounds
- Availability: depends on the granularity of the request
  - There is a level of replication on the Web, and caching can be used to duplicate availability
- Communication: centralized
  - Communication happens via a centralized entity, e.g. Facebook, MySpace, Flickr, blogs, etc.


Napster: Brokered

Clients search through the Napster web site (well, they used to).

- Discovery: centralized through the web site
- Availability: once discovered via the web site, availability is decentralized
- Communication: decentralized between peers (MP3 sharers)

[Figure: Napster placed on three axes, Resource Discovery, Resource Availability and Resource Communication, each running from Centralized to Decentralized, with a User and Napster.com.]
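
The brokered pattern can be sketched as a central index that only answers "who has this file?", while the transfer itself happens directly between peers. The classes `CentralIndex` and `Peer`, the peer names and the file names are hypothetical stand-ins; this is not Napster's actual protocol.

```python
class CentralIndex:
    """Hypothetical stand-in for the central index (centralized discovery)."""

    def __init__(self):
        self._owners = {}  # title -> set of peer names

    def register(self, peer_name, titles):
        for title in titles:
            self._owners.setdefault(title, set()).add(peer_name)

    def search(self, title):
        return sorted(self._owners.get(title, ()))


class Peer:
    def __init__(self, name, files):
        self.name = name
        self.files = dict(files)  # title -> file contents

    def serve(self, title):
        """Direct peer-to-peer transfer; the index is not involved."""
        return self.files[title]


peers = {
    peer.name: peer
    for peer in (
        Peer("alice", {"song.mp3": b"AAA"}),
        Peer("bob", {"song.mp3": b"AAA", "talk.mp3": b"BBB"}),
    )
}

index = CentralIndex()
for peer in peers.values():
    index.register(peer.name, peer.files)

owners = index.search("talk.mp3")          # discovery goes through the broker
data = peers[owners[0]].serve("talk.mp3")  # communication is peer to peer
print(owners, data)
```

If the index disappears, the files are still on the peers but nobody can find them, which is the weakness of centralized discovery in a hybrid design.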


Gnutella: Decentralized

- Discovery: decentralized through Gnutella messages (ping/pong mechanisms)
- Availability: often an alternate path to the resource
- Communication: point to point, decentralized between peers

[Figure: Gnutella placed on three axes, Resource Discovery, Resource Availability and Resource Communication, each running from Centralized to Decentralized.]
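
Decentralized discovery by ping/pong can be approximated as a flood over the overlay network that is limited by a time-to-live (TTL) counter. The sketch below is a simplified model, not the Gnutella wire protocol: the overlay graph and peer names are invented, and a "pong" is modelled simply as the peer being added to the result list.

```python
from collections import deque


def flood_ping(neighbours, origin, ttl):
    """Return the peers that would answer a ping flooded from `origin`.

    `neighbours` maps each peer to the peers it is directly connected to.
    Each forwarding hop uses up one unit of TTL, and peers drop pings
    they have already seen.
    """
    seen = {origin}
    frontier = deque([(origin, ttl)])
    discovered = []
    while frontier:
        peer, remaining = frontier.popleft()
        if remaining == 0:
            continue  # TTL exhausted: this peer does not forward the ping
        for nxt in neighbours.get(peer, ()):
            if nxt not in seen:
                seen.add(nxt)
                discovered.append(nxt)  # this peer would reply with a pong
                frontier.append((nxt, remaining - 1))
    return discovered


# Hypothetical overlay: a-b, a-c, b-d, c-d, d-e
overlay = {
    "a": ["b", "c"],
    "b": ["a", "d"],
    "c": ["a", "d"],
    "d": ["b", "c", "e"],
    "e": ["d"],
}

print(flood_ping(overlay, "a", ttl=1))  # ['b', 'c']
print(flood_ping(overlay, "a", ttl=2))  # ['b', 'c', 'd']
print(flood_ping(overlay, "a", ttl=3))  # ['b', 'c', 'd', 'e']
```

Peer d is reachable through either b or c, which illustrates the "alternate path to the resource" point: losing one of them does not hide d from a.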


SETI@Home

- Launched in 1999
- Scientific experiment: uses Internet-connected computers in the Search for Extraterrestrial Intelligence (SETI)
- Distributes a screen saver based application to users
- Applies signal analysis algorithms to different data sets to process radio-telescope data
- Has more than 3 million users; used over a million years of CPU time to date

SETI@HOME (Client/Server)

1. Install screen saver
2. SETI client (screen saver) starts
3. SETI client gets data from the server and runs
4. Client sends results back to the server

[Figure: Main Server holding radio-telescope data, with SETI clients fetching data from it and returning results; labels Client/Server and P2P.]
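
Steps 3 and 4 amount to a simple work-unit loop, sketched below. The `MainServer` class, the work-unit format and the `analyse` function (which just picks the largest sample) are placeholders invented for illustration; the real client's signal analysis is far more involved.

```python
class MainServer:
    """Hypothetical stand-in for the central server that hands out work."""

    def __init__(self, work_units):
        self._pending = list(work_units)  # (unit_id, samples) pairs
        self.results = {}

    def get_work(self):
        """Return the next work unit, or None when nothing is left."""
        return self._pending.pop(0) if self._pending else None

    def submit(self, unit_id, result):
        self.results[unit_id] = result


def analyse(samples):
    """Placeholder for the real signal analysis: report the peak sample."""
    return max(samples)


def run_client(server):
    """Fetch data from the server, process it, and send the result back."""
    processed = 0
    while True:
        unit = server.get_work()
        if unit is None:
            return processed
        unit_id, samples = unit
        server.submit(unit_id, analyse(samples))
        processed += 1


server = MainServer([(1, [0.1, 0.9, 0.3]), (2, [0.4, 0.2]), (3, [0.7])])
counts = [run_client(server) for _ in range(2)]  # two volunteers, one after the other
print(counts, server.results)
```

Clients never talk to each other in this loop; all data and results flow through the main server, which is why the slide labels the system client/server.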


SETI@home

[Figure: SETI@home placed on three axes, Resource Discovery, Resource Availability and Resource Communication, each running from Centralized to Decentralized.]

- Search for Extraterrestrial Intelligence@home
- Volunteer computing system
- Generalized to the BOINC API


Web Services

Note: this is for the current Web Services technology stack; in principle you can host web services in a number of ways.

- Discovery: centralized through a registry
- Availability: once discovered via the registry, availability is decentralized
- Communication: decentralized between provider and consumer

[Figure: Web Services placed on three axes, Resource Discovery, Resource Availability and Resource Communication, each running from Centralized to Decentralized.]


Concluding Remarks

1. Taxonomy
   a) Criteria
      a) Resource Discovery
      b) Resource Availability
      c) Resource Communication
   b) Taxonomy
      a) Centralized
      b) Hybrid
      c) Decentralized
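
The taxonomy can be summarized as a small lookup table plus a rule: a system that is centralized on all three criteria is Centralized, one that is decentralized on all three is Decentralized, and anything in between is Hybrid. The table below follows the per-system slides; reading the web server's availability ("available or not") as centralized and Gnutella's ("often an alternate path") as decentralized is an interpretation, and the rule itself is a simplification for illustration.

```python
CENTRALIZED = "centralized"
DECENTRALIZED = "decentralized"

# (discovery, availability, communication) for each example system.
# Assumption: availability for "Web server" and "Gnutella" is inferred
# from the slide wording rather than stated outright.
SYSTEMS = {
    "Web server": (CENTRALIZED, CENTRALIZED, CENTRALIZED),
    "Napster": (CENTRALIZED, DECENTRALIZED, DECENTRALIZED),
    "Gnutella": (DECENTRALIZED, DECENTRALIZED, DECENTRALIZED),
    "Web services": (CENTRALIZED, DECENTRALIZED, DECENTRALIZED),
}


def classify(discovery, availability, communication):
    """Map the three criteria onto Centralized, Hybrid or Decentralized."""
    kinds = {discovery, availability, communication}
    if kinds == {CENTRALIZED}:
        return "Centralized"
    if kinds == {DECENTRALIZED}:
        return "Decentralized"
    return "Hybrid"


for name, criteria in SYSTEMS.items():
    print(f"{name}: {classify(*criteria)}")
```

Under this rule Napster and Web services both come out as Hybrid, matching the brokered architecture described earlier: a central place to find things, direct connections to use them.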