ICS362 Distributed Systems

Oct 23, 2013

Dr. Ken Cosh

Week 4

Review


Processes


Threads


Virtualisation


Clients


Servers


Code Migration

This Week


Communication


Fundamentals


Remote Procedure Calls (RPC)


Message Oriented Communication


Stream Oriented Communication


Multicast Communication

Communication


Inter-process communication is at the heart
of all distributed systems.


Crucially communication is governed by
‘protocols’


And protocols are layered


Remember the Open Systems
Interconnection Reference Model?


OSI Model

Layered Protocols


When process A wants to communicate with
process B, it:


Builds a message in its own address space


Executes a system call for the OS to send the message
across the network


Works OK, but both sides have to agree on a
protocol.

Layered Protocols


Agreements are needed at many levels


How many volts signal ‘0’ and how many signal
‘1’?


How do you know the last bit in the message?


How do you know if the message is damaged or
missing parts?


How are data items represented?


These questions are answered at different
layers of the OSI model

Connection Oriented vs
Connectionless Protocols


With a connection oriented protocol the sender and
receiver


Establish a connection


Agree on protocols


Terminate connection


E.G. Telephone Call


With a connectionless protocol


No set up needed, the message is just sent


E.G. Dropping a letter in a mailbox

OSI Model

7  Application

6  Presentation

5  Session

4  Transport

3  Network

2  Data Link

1  Physical

The two stacks are connected by the network.

OSI Model


Each layer manages a specific aspect of
communication


The problem is divided into manageable pieces each
solvable independently.


Each layer provides an interface to the layer above
it.


Each layer adds a header to the front of a message and
passes the result down to the layer below


Each header is removed by the corresponding layer at the
recipient.

Layers & Headers

[Figure: a message as it appears on the network, from front to back:]

Data Link Layer Header

Network Layer Header

Transport Layer Header

Session Layer Header

Presentation Layer Header

Application Layer Header

Message

Data Link Layer Trailer
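
The header handling above can be sketched in a few lines of Python. This is only an illustration: the bracketed text headers and the names LAYERS, send and receive are assumptions made for this sketch, not a real protocol stack.

```python
# Hypothetical layer names and header format, for illustration only.
LAYERS = ["application", "presentation", "session",
          "transport", "network", "data link"]
TRAILER = "[data link trailer]"


def send(message: str) -> str:
    """Pass the message down the stack; each layer prepends its header."""
    frame = message
    for layer in LAYERS:                 # top layer first
        frame = f"[{layer}]" + frame
    return frame + TRAILER               # only the data link layer adds a trailer


def receive(frame: str) -> str:
    """Pass the frame up the stack; each layer removes its own header."""
    if not frame.endswith(TRAILER):
        raise ValueError("missing data link trailer")
    frame = frame[:-len(TRAILER)]
    for layer in reversed(LAYERS):       # bottom layer first
        header = f"[{layer}]"
        if not frame.startswith(header):
            raise ValueError(f"missing {layer} header")
        frame = frame[len(header):]
    return frame
```

Calling receive(send("hi")) gives back "hi". The data link header ends up at the front of the frame because it is added last.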

Why Layers?


Manager of Zippy Airlines asks secretary to
contact the Sales Manager’s secretary at
Mushy Meals to order 100,000 boxes of
rubber chicken.


The secretaries once communicated by
telephone; now they use email, without affecting
the communication between the managers.


The Manager can change the order to Goat Ribs
without affecting the secretaries.

OSI Physical Layer


Deals with transmitting 0 and 1.


How many volts for 0, how many for 1?


How many bits per second?


Can simultaneous 2 way communication happen?


What about the physical plug?


How many pins?


What shape?


What does each pin do?


OSI Data Link Layer


Networks are prone to errors, so the Data
Link Layer detects and corrects errors


Group bits into frames, check that each frame is
correctly received.


Mark the start and end of each frame.


The sender computes a ‘checksum’ over all the bits in the
frame; the receiver recomputes it and checks that it gets the
same value.
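
A minimal sketch of the checksum idea, assuming a CRC-32 checksum appended to the end of each frame. The frame layout and the names make_frame and check_frame are assumptions for this example. Note that it only detects damage; correcting it (for example by asking for the frame again) is not shown.

```python
import zlib


def make_frame(payload: bytes) -> bytes:
    """Sender side: append a 4-byte CRC-32 checksum to the payload."""
    checksum = zlib.crc32(payload)
    return payload + checksum.to_bytes(4, "big")


def check_frame(frame: bytes) -> bool:
    """Receiver side: recompute the checksum and compare with the one sent."""
    payload = frame[:-4]
    received = int.from_bytes(frame[-4:], "big")
    return zlib.crc32(payload) == received
```

Flipping a single bit of the payload makes check_frame return False, so the receiver knows the frame was damaged.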

OSI Network Layer


Essentially choosing the best route for each
message


The shortest route is not always the best


Each packet can take a different route, routed
independently

OSI Transport Layer


Ideally the Application layers can pass a
message to the Transport layer and assume
that it arrives.


The transport layer puts the pieces of a message back in
order, maintaining the illusion that messages
arrive undamaged in the same order they were
sent.
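
One way to picture this, assuming each piece of a message carries a sequence number (the function names and the tuple layout are assumptions for this sketch):

```python
def split_message(message: bytes, size: int) -> list[tuple[int, bytes]]:
    """Sender: cut the message into pieces and number them."""
    return [(seq, message[start:start + size])
            for seq, start in enumerate(range(0, len(message), size))]


def reassemble(segments: list[tuple[int, bytes]]) -> bytes:
    """Receiver: sort the pieces by sequence number and join them."""
    ordered = sorted(segments, key=lambda segment: segment[0])
    return b"".join(data for _, data in ordered)
```

Even if the network delivers the pieces in reverse order, reassemble returns the original message.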

Upper OSI Layers


Application, Presentation & Session layers can
essentially be combined.


Session layer keeps track of who is currently talking
(synchronisation facilities). Allows check points to be
inserted to go back to rather than returning to the start if
something goes wrong.


Presentation layer deals with the meaning of the bits: it allows structured
information such as someone’s name to be sent, rather than a random bit string.


Application layer is a container for any application using the
communication protocols


Remember, the OSI model is only a reference model.

Middleware Alternative

6  Application

5  Middleware

4  Transport

3  Network

2  Data Link

1  Physical

The two stacks are connected by the network.

Types of Communication


Persistent vs Transient


Synchronous vs Asynchronous


Discrete vs Streaming

Types of Communication


Persistent


Message is stored by communication middleware
as long as it takes to deliver.


E.g. email


Transient


Message is stored only while sender and receiver
are executing, and then it is dropped.

Types of Communication


Synchronous


Sender is blocked until request is known to be
accepted


Communication happens concurrently


Asynchronous


Message is temporarily stored by middleware and
sender continues

Types of Communication


Discrete


Parties communicate with messages which form a
complete unit of information


Streaming


Multiple messages sent after each other, with
each message linked due to some temporal
relationship

Communication Methods


Remote Procedure Call (RPC)


Message Oriented Communication


Stream Oriented Communication


Multicast Communication

Remote Procedure Call


Allowing one machine to call a function on a
different machine.


Refreshingly simple concept, but has challenges,
for example when passing parameters.


The idea being to make a remote function
call look as much as possible like a local
one.


i.e. the calling procedure need not be aware
where the called procedure is being executed.

Conventional Procedures


count = read(fd, buf, nbytes);


The runtime stack:

[Figure: before the call, the stack holds the main program’s local
variables. During the call it also holds the parameters (fd, buf, nbytes),
the return address and read’s local variables, with the stack pointer
at the top.]

RPC

[Figure: timeline of an RPC between client and server. The client sends
a request and waits for the result; the server provides the service
and sends a reply.]

RPC Calls


1) Client procedure makes function call in usual way


2) Client ‘stub’ builds message and calls local OS


3) Client OS sends message to remote OS


4) Remote OS gives message to server ‘stub’


5) Server stub unpacks parameters and calls server


6) Server does work and returns results to stub


7) Server stub packs in a message and calls OS


8) Server OS sends message to client’s OS


9) Client OS gives message to client stub


10) Client stub unpacks the result and returns it to the client.
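
The ten steps can be mimicked inside a single Python process. This is a sketch under loud assumptions: JSON stands in for the message format, the network function stands in for both operating systems and the wire, and the names add, PROCEDURES, client_stub and server_stub are made up for the example.

```python
import json


def add(a, b):
    """The procedure that lives on the server."""
    return a + b


PROCEDURES = {"add": add}                 # hypothetical server-side registry


def server_stub(request: bytes) -> bytes:
    call = json.loads(request)                          # 5) unpack parameters
    result = PROCEDURES[call["proc"]](*call["args"])    # 5-6) call the server
    return json.dumps({"result": result}).encode()      # 7) pack the result


def network(request: bytes) -> bytes:
    """Stand-in for steps 3-4 and 8-9: the two OSs moving messages."""
    return server_stub(request)


def client_stub(proc: str, *args):
    request = json.dumps({"proc": proc, "args": args}).encode()  # 2) build message
    reply = network(request)
    return json.loads(reply)["result"]                  # 10) unpack the result


def remote_add(a, b):
    """1) To the caller this looks like an ordinary local function."""
    return client_stub("add", a, b)
```

The caller writes remote_add(2, 3) and never sees the packing, the sending or the unpacking.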

Passing Value Parameters


A copy of the parameter is made and sent to the
server.


As long as client and server machines are identical, this
shouldn’t be a problem…


But different machines often have different
representations


IBM mainframe uses EBCDIC character code


IBM personal computer uses ASCII


Character data can’t be sent between these machines without conversion

Gulliver’s Travels


In Lilliput a royal edict says one must open
one’s boiled egg at the small end!


Little Endians


In Blefuscu they crack theirs at the large end


Big Endians


The two nations nearly go to war over such a
‘trivial’ problem…

Endianness


Intel Pentium processors number their bytes from right to left (little endian)


Sun SPARC processors number their bytes from left to right (big endian)


"Swift's point is that the difference between breaking the egg
at the little-end and breaking it at the big-end is trivial.
Therefore, he suggests, that everyone does it in his own
preferred way. We agree that the difference between
sending eggs with the little- or the big-end first is trivial, but
we insist that everyone must do it in the same way, to avoid
anarchy. Since the difference is trivial we may choose either
way, but a decision must be made." (Cohen)

Sending an Integer (5) and a 4
character string (JILL)

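Message on Pentium

The integer 5 is stored with its least significant byte first,
followed by the characters J, I, L, L.

As Received on SPARC

The bytes arrive in the same order, but the SPARC treats the first
byte as the most significant, so the integer reads as
83,886,080 (5 × 2^24), not 5! The string is still JILL.

Suppose we just reverse it?

Reversing the bytes in each word gives 5 again, but the string
becomes LLIJ.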
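
The same experiment can be tried with Python's struct module. The "<" and ">" prefixes select little-endian and big-endian layouts; the helper invert_words is an assumption written for this sketch.

```python
import struct


def invert_words(data: bytes) -> bytes:
    """Reverse the bytes inside every 4-byte word."""
    return b"".join(data[i:i + 4][::-1] for i in range(0, len(data), 4))


# Little-endian sender (Pentium style): integer 5 followed by "JILL".
sent = struct.pack("<i4s", 5, b"JILL")

# Big-endian receiver (SPARC style) reads the very same bytes.
as_received = struct.unpack(">i4s", sent)            # (83886080, b"JILL")

# Inverting every word repairs the integer but scrambles the string.
after_inversion = struct.unpack(">i4s", invert_words(sent))   # (5, b"LLIJ")

# Agreeing on one wire format ("!" is network byte order) avoids both problems.
agreed = struct.unpack("!i4s", struct.pack("!i4s", 5, b"JILL"))  # (5, b"JILL")
```

Blindly reversing bytes is not enough: the receiver has to know which parts of the message are integers and which are strings.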

Sending Reference Parameters


How do we send a pointer?


Or an array?


If I send a reference to a memory address on
my computer, it won’t be the same on the
server.


One option is ‘copy/restore’


Another depends on whether the parameter is
being sent for input, or just for output.


We may not need to send the parameter, or get it back...
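
A rough sketch of copy/restore, assuming the parameter is a list and that copying it stands in for sending it over the network. The names server_increment_all and call_with_copy_restore are hypothetical.

```python
def server_increment_all(values: list) -> list:
    """Server procedure: works only on the copy it was sent."""
    for i in range(len(values)):
        values[i] += 1
    return values


def call_with_copy_restore(procedure, array: list) -> None:
    copy = list(array)          # copy: the client stub ships the array's contents
    result = procedure(copy)    # the server never sees the client's memory
    array[:] = result           # restore: overwrite the caller's array in place
```

After call_with_copy_restore(server_increment_all, data), the caller's list has changed as if a reference had been passed. If the array is input only, the restore step can be skipped; if it is output only, the copy need not be sent.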

Asynchronous RPC


Conventionally when a client calls a remote
procedure it will block until it receives a reply.


But what happens if there is no returned value?


What happens if the client doesn’t even need to
know if the server completes the action?

Asynchronous RPC

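[Figure: timeline of an asynchronous RPC. The client sends a request
and waits only for acceptance; the server replies to accept the request
and then provides the service.]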

2 Asynchronous RPCs

[Figure: timeline of a client and server interacting through two
asynchronous RPCs, with the same labels: Request, Reply, Provide Service,
Wait for Acceptance.]
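
The flavour of an asynchronous call can be imitated locally with a thread pool, which here plays the role of the server. This is an analogy only, not a real RPC system; slow_service, async_call and log are names invented for the sketch.

```python
import time
from concurrent.futures import Future, ThreadPoolExecutor

log = []                                      # what the "server" has done so far
executor = ThreadPoolExecutor(max_workers=1)  # stands in for the remote server


def slow_service(item: str) -> str:
    """The work carried out on the server side."""
    time.sleep(0.05)
    log.append(item)
    return "done"


def async_call(item: str) -> Future:
    """Return as soon as the request is accepted; the work happens later."""
    return executor.submit(slow_service, item)
```

The client can ignore the returned Future if it does not care about the outcome, or call its result() method later to collect the answer, which resembles the second figure.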

Asynchronous?


RPC (and Remote Object Invocation) is useful for
access transparency & communication hiding


But, what happens if the receiver isn’t executing?


The client is blocked until it receives a response


It is essentially synchronous


So how about Message Oriented Communication?

Message Oriented Communication


Message Queuing Systems


Or


Message Oriented Middleware (MOM)



Support for persistent asynchronous
communication


Intermediate term storage for messages without
requiring either sender or receiver to be executing

Message Queuing Systems


Applications communicate by putting
messages in specific queues.


Messages are then forwarded through the
network, even if the receiver is down.


Each application has its own private queue to
which other applications can send messages


The queue can only be read by the specified
application, although applications could share
queues

Message Queuing Systems


An application is guaranteed that a message
will eventually be inserted into the recipient’s
queue


But no guarantee about when.


Or if the message will be read or responded to.


This permits the sender and receiver to
execute independently


Asynchronously

Message Queuing Combinations

Message Queuing Systems


An application places a message in a local
queue


The source queue


The message includes the specification of
the destination queue


The message queuing system deals with the
rest
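
A toy queue manager, kept in memory, shows the idea. The class name QueueManager and its put and get methods are assumptions for this sketch; real message queuing products differ in their interfaces and store messages durably.

```python
from collections import defaultdict, deque


class QueueManager:
    """Holds one queue per destination name."""

    def __init__(self):
        self.queues = defaultdict(deque)

    def put(self, destination: str, message: str) -> None:
        """Sender: append to the destination queue and carry on."""
        self.queues[destination].append(message)

    def get(self, name: str):
        """Receiver: take the oldest message, or None if the queue is empty."""
        queue = self.queues[name]
        return queue.popleft() if queue else None
```

The sender can call put while the receiver is not running at all; the message simply waits in the queue until get is called.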


Message Oriented Middleware

MOM


Ideally the Queue Manager looks up the address of
the recipient and transfers the message to the
recipient’s queue


But is that scalable?


All queue managers need a complete address list?


So, some queue managers exist as relays (or
routers)


Each queue needs to know the address of its nearest relay


The relays need an updated list of addresses

Message Brokers


An important application is integrating existing and
new applications into a single coherent distributed
information system.


Can I send a Gmail message to a Lotus Notes client?


One option is to agree on a common message
format


But this can be limiting with higher level abstractions


Message brokers can be used as special nodes to
handle conversions between messages.
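
As a small illustration, a broker can be thought of as a pair of conversion functions around a common format. Both message formats below are invented for the example.

```python
def app_a_to_common(message: dict) -> dict:
    """Convert application A's (hypothetical) format to a common one."""
    return {"sender": message["from"], "body": message["text"]}


def common_to_app_b(message: dict) -> str:
    """Convert the common format to application B's (hypothetical) format."""
    return f"{message['sender']}|{message['body']}"


def broker(message: dict) -> str:
    """The broker node: A's format in, B's format out."""
    return common_to_app_b(app_a_to_common(message))
```

Neither application needs to know the other's format; only the broker holds the conversion rules.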

Stream Oriented Communication


Thus far we have considered discrete pieces of
information


A message


A function call


With this it doesn’t matter when communication
takes place


While the system may perform too slowly, it doesn’t affect
correctness.


With streams the continuous communication is
subject to more rigorous timing constraints.

Transmission Modes


Asynchronous Transmission Mode


Data items are transmitted one after another, but
no further timing constraints


Synchronous Transmission Mode


There is a maximum end-to-end delay for each
unit in a data stream, but units can be quicker


Isochronous Transmission Mode


There is both a maximum and a minimum end-to-end delay


Interesting for distributing multimedia streams
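
The three modes can be written down as a check on the measured delay of each data unit. The function name satisfies_mode and its parameters are assumptions for this sketch.

```python
def satisfies_mode(delays, mode, max_delay=None, min_delay=None) -> bool:
    """Check whether per-unit end-to-end delays respect a transmission mode."""
    if mode == "asynchronous":
        return True                                   # no timing constraints
    if mode == "synchronous":
        return all(d <= max_delay for d in delays)    # upper bound only
    if mode == "isochronous":
        return all(min_delay <= d <= max_delay for d in delays)  # both bounds
    raise ValueError(f"unknown mode: {mode}")
```

A unit that arrives very quickly is fine for synchronous transmission but breaks the isochronous constraint.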

Streams


Simple


Single sequence of data


Complex


Multiple related substreams


E.g. 2 substreams for stereo audio, further
substream for video, and further for subtitles


Synchronisation of these substreams is essential.

QoS


Quality of Service


Timing (and other non-functional requirements) is
a Quality of Service (QoS) requirement


i.e. what the underlying network needs to provide
in order for a stream to be preserved


However, networks generally cannot guarantee this


They offer only a best-effort delivery service


So a distributed system tries to conceal, as best as
possible, the lack of QoS

Buffering

Buffering


Packet #8 was too late arriving!


Solution could be to increase the buffer size


Increasing the buffer size increases the delay
before playback.


Maybe not a problem with youtubing, but what
about during live chat?
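
The trade-off can be tried out with a small calculation. The assumptions here: packet i should be played i intervals after playback starts, and playback starts a fixed startup delay after the first packet arrives. The function name late_packets is hypothetical.

```python
def late_packets(arrivals, interval, startup_delay):
    """Return the indexes of packets that arrive after their playback time."""
    start = arrivals[0] + startup_delay       # when playback begins
    return [i for i, arrived in enumerate(arrivals)
            if arrived > start + i * interval]
```

With arrival times [0.0, 1.2, 2.1, 5.5, 4.3] and one packet played per second, a startup delay of 1 second leaves packet 3 too late, while a startup delay of 3 seconds hides the problem at the cost of a longer wait before playback.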

Lost Packets


Packets may get lost.


One option is to encode outgoing packets such
that k out of n packets are needed to reconstruct
the stream


If a packet gets lost it may lead to a gap
between frames


Interleaving can help with this, but again, may
cause the start to be delayed.

Interleaving
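
A sketch of the effect, assuming frames are identified by number and the number of frames divides evenly into packets. The names packetize and longest_gap are made up for the example.

```python
def packetize(frames, per_packet, interleave):
    """Group frames into packets, consecutively or interleaved."""
    n_packets = len(frames) // per_packet
    if interleave:
        # Packet k carries frames k, k + n_packets, k + 2 * n_packets, ...
        return [frames[k::n_packets] for k in range(n_packets)]
    return [frames[k * per_packet:(k + 1) * per_packet]
            for k in range(n_packets)]


def longest_gap(packets, lost, total_frames):
    """Length of the longest run of missing frames if one packet is lost."""
    received = {frame for index, packet in enumerate(packets)
                if index != lost for frame in packet}
    longest = current = 0
    for frame in range(total_frames):
        current = 0 if frame in received else current + 1
        longest = max(longest, current)
    return longest
```

With 16 frames in packets of 4, losing one packet leaves a gap of 4 frames without interleaving but only isolated single-frame gaps with it. The receiver must wait for all 4 packets before it can play the first 4 frames, which is the start-up delay mentioned above.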

Stream Synchronisation


If dealing with multiple substreams, they
need to be precisely synchronised


A stereo audio stream will be distorted if the 2
streams have a difference of more than 20 µsec


Obviously all streams need to be at the
recipient’s machine


But should the synchronisation take place on the
sending or receiving machine?


Stream Synchronisation


Different substreams could be subject to
different delays, so it’s better to merge the
substreams at the sender and let the receiver
split the channels when the merged stream arrives.

Multicast Communication


Disseminating data to multiple receivers



Nodes can connect via an overlay network


Either as a tree, or a mesh graph



Messages can then be distributed efficiently
by creating a minimal spanning tree
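
One possible way to build such a tree is Prim's algorithm, sketched below. The slides do not say which algorithm is used; the overlay is assumed to be given as a dictionary of link costs, and the function name is hypothetical.

```python
import heapq


def minimal_spanning_tree(graph, root):
    """graph: {node: {neighbour: cost}}, undirected. Returns (parent, child) edges."""
    visited = {root}
    edges = [(cost, root, neighbour) for neighbour, cost in graph[root].items()]
    heapq.heapify(edges)
    tree = []
    while edges:
        cost, parent, child = heapq.heappop(edges)    # cheapest edge leaving the tree
        if child in visited:
            continue
        visited.add(child)
        tree.append((parent, child))
        for neighbour, link_cost in graph[child].items():
            if neighbour not in visited:
                heapq.heappush(edges, (link_cost, child, neighbour))
    return tree
```

A message sent from the root then travels along each tree edge exactly once, so every node receives it once.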

Gossip Based Dissemination


Based on Epidemic Behaviour


Nodes are either infected, susceptible or removed
(not willing or able to spread data)


A node P picks another node Q at random and
propagates updates by 1 of 3 models


P only pushes updates to Q


P only pulls updates from Q


P and Q send updates to each other (Push/Pull)


Which is better?

Which option?


Push only is a bad choice.


If many nodes are infected, the probability of
choosing a susceptible node is small.


Pull only is better ( O(log(N)) )


When many nodes are infected, the chance of
choosing an infected node is high.


Push/Pull is clearly the best
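
The three models can be compared with a small simulation. It is only a sketch: node 0 starts infected, every node contacts one random partner per round, and an update learned during a round is only passed on from the next round. The function name, parameters and these rules are assumptions, not taken from the slides.

```python
import random


def rounds_to_spread(n, mode, seed=0):
    """Count rounds until all n nodes are infected under the given model."""
    if mode not in ("push", "pull", "push-pull"):
        raise ValueError(f"unknown mode: {mode}")
    rng = random.Random(seed)
    infected = {0}
    rounds = 0
    while len(infected) < n:
        rounds += 1
        snapshot = set(infected)              # who was infected when the round began
        for p in range(n):
            q = rng.choice([x for x in range(n) if x != p])   # P picks Q at random
            if mode in ("push", "push-pull") and p in snapshot:
                infected.add(q)               # P pushes its update to Q
            if mode in ("pull", "push-pull") and q in snapshot:
                infected.add(p)               # P pulls the update from Q
    return rounds
```

Running it for each mode with different seeds and node counts gives a feel for how the number of rounds grows with N.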