2. Distributed Computing


COS 497 - Cloud Computing

The operation of the Cloud rests on the parallel/distributed computing paradigm. The next slides give an overview of this mode of computing.


Reference: https://computing.llnl.gov/tutorials/parallel_comp/


First, some jargon



Parallel computing is a form of computation in which a large problem is divided into a number of smaller, discrete, relatively independent parts, with their execution carried out simultaneously.


There are several different forms (or granularities) of parallel computing: bit-level, instruction-level, data, and task parallelism.


Parallelism has been used for many years, but interest in it has increased in recent years, mainly in the form of multi-core processors.


Parallel computers can be roughly classified according to the level at which the hardware supports parallelism, with multi-core and multi-processor computers having multiple processing units within a single machine, while clusters, grids and clouds use multiple, distributed computers to work on the same task.

Synchronization

The coordination of parallel tasks in real time, very often associated with communications. Often implemented by establishing a synchronization point within an application, where a task may not proceed further until other tasks reach the same or a logically equivalent point.
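
A minimal sketch of a synchronization point, not from the original slides: Python's threading.Barrier blocks each task until all of them arrive (the task names and two-phase structure are invented for illustration):

    import threading

    # All three workers must reach the barrier before any may continue.
    barrier = threading.Barrier(3)

    def worker(name):
        print(name, "phase 1 done")
        barrier.wait()   # the synchronization point
        print(name, "phase 2 starts only after everyone arrives")

    threads = [threading.Thread(target=worker, args=("task-%d" % i,)) for i in range(3)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()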


Granularity

In parallel computing, granularity is a qualitative measure of the ratio of computation to communication.

Coarse: relatively large amounts of computational work are done between communication events.

Fine: relatively small amounts of computational work are done between communication events.

[Figure: a fine-to-coarse granularity spectrum]

Example

Sequential: the traditional mode of computation, with instructions executed in sequence.

Parallel: problems (i.e. programs) are divided into smaller parts and executed simultaneously/concurrently on different processors.

[Figure: a sequential instruction stream contrasted with the same work split across processors]

Parallel computers can be roughly classified according to the level at which the hardware supports parallelism. For the Cloud, an important class is distributed systems.



Distributed System: a loosely-coupled form of parallel computing.

o Use multiple computers to perform computations in parallel.

o Computers are connected via a network, i.e. the computers are distributed in "space".

Distributed System: uses a "distributed memory". Message passing is typically used to exchange information between the processors, as each one has its own private memory.

[Figure: a small distributed system]

Hardware Models

Flynn's Taxonomy for Computer Architectures

                      Instructions: Single (SI)        Instructions: Multiple (MI)
Data: Single (SD)     SISD (single-threaded process)   MISD (pipeline architecture)
Data: Multiple (MD)   SIMD (vector processing)         MIMD (multi-threaded programming)

SISD

Single Instruction stream, Single Data stream

[Figure: one processor consuming a single instruction stream and a single data stream]

A single processor executes a single stream of instructions (i.e. a single program), operating on a single set of data. The traditional form of computation.

SIMD

Single Instruction stream, Multiple Data streams

[Figure: several processors executing the same instruction stream, each on its own data stream D1 ... Dn]

A number of processors execute copies of the same program (Single Instruction stream), but with different sets of data (Multiple Data streams). A type of parallel computer. Two varieties: processor arrays and vector pipelines. And Cloud!

MIMD

Multiple Instruction streams, Multiple Data streams

[Figure: multiple processors, each executing its own instruction stream on its own data stream]

A number of processors execute different programs (Multiple Instruction streams) with different sets of data (Multiple Data streams). Provides true parallel processing. Examples: most current supercomputers, networked parallel computer clusters and "grids", multi-processor SMP computers, multi-core PCs. And Cloud!

Memory Typology: Shared

[Figure: processors P0 ... Pn connected to one shared memory, each with a small private memory]

Programs executing on different processors share, i.e. have access to, the same common memory (usually via a bus) and communicate with each other via this memory. Memory accesses need to be synchronized.

Memory Typology: Distributed

[Figure: processor/memory pairs connected by a network]

Each processor has its own private memory. Computational tasks can only operate on local data; if remote data is required, the computational task must communicate with one or more remote processors via a network link.

Memory Typology: Hybrid

Distributed Shared Memory

[Figure: clusters of processors, each cluster sharing a memory, with the clusters connected by a network]

Each processor of a cluster has access to a large shared memory; in addition, each processor has access to remote data via a network link.

Programming Models

Mirror the Hardware Models

Patterns for Parallelism

Parallel computing has been around for decades. Here are some well-known architectural patterns.

Master/Slaves

[Figure: a master node dispatching work to several slave nodes]

One of the simplest parallel programming paradigms is "master/slaves". The main computation (the master) generates many sub-problems, which are fired off to be executed by "someone else" (a slave).

The only interaction between the master and slave computations is that the master starts the slave computation, and the slave computation returns the result to the master. There are no significant dependencies among the slave computations.
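
A hedged sketch of master/slaves, not from the original slides: Python's multiprocessing.Pool plays the slaves, and the square function and input range are invented for illustration:

    from multiprocessing import Pool

    def solve(subproblem):
        # A slave: solves one independent sub-problem.
        return subproblem * subproblem

    if __name__ == "__main__":
        # The master generates sub-problems and fires them off to the slaves.
        with Pool(processes=4) as pool:
            results = pool.map(solve, range(10))
        print(results)   # the slaves' results, returned to the master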

Producer/Consumer Flow

[Figure: producer threads feeding work items to consumer threads, in stages]

Producer "threads" create work items.

Consumer "threads" process them.

Can be "daisy-chained", i.e. pipelined.
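
A minimal producer/consumer sketch (illustrative only; the item count and sentinel value are invented), using Python's thread-safe queue to pass work items from producer to consumer:

    import queue
    import threading

    q = queue.Queue()

    def producer():
        for item in range(5):
            q.put(item)        # create work items
        q.put(None)            # sentinel: no more work

    def consumer():
        while True:
            item = q.get()
            if item is None:   # sentinel seen, stop consuming
                break
            print("processed", item)

    p = threading.Thread(target=producer)
    c = threading.Thread(target=consumer)
    p.start(); c.start()
    p.join(); c.join()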

Work Queues

[Figure: producers and consumers sharing a single queue of work items]

Used in the Cloud, e.g. Windows Azure.

The work queue parallel processing model consists of a queue of work items and processes to produce and complete these work items.

Each participant can take a work item off the queue and, if necessary, each participant can add newly generated work items to the queue.

As each participant completes its work item, it does not wait for some participant to assign it a new task, but instead takes the next item off the work queue and begins execution.
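
A hedged sketch of the work queue model (the size threshold and splitting rule are invented for illustration): each worker takes items off a shared queue and may push newly generated items back onto it:

    import queue
    import threading

    work = queue.Queue()
    for n in (8, 12, 20):
        work.put(n)

    def worker():
        while True:
            try:
                n = work.get(timeout=0.5)   # take the next item off the queue
            except queue.Empty:
                return                      # queue drained, worker exits
            if n > 4:                       # item too big: split it and
                work.put(n // 2)            # add the new items to the queue
                work.put(n - n // 2)
            else:
                print("completed item of size", n)

    threads = [threading.Thread(target=worker) for _ in range(3)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()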

Cloud Computing

A cloud provider has 100s of thousands of nodes (aka servers).

Cloud computing is massively-parallel computing with multi-processors (i.e. many multi-core processors).

In principle, your application may run on one, two, … thousands of servers (i.e. processors).

For your application to run on one, two, … thousands of servers, your application code or data must be parallelized, i.e. split up into independent or relatively independent parts.

Parallelizing code is real hard work!

- Splitting a program up into relatively independent parts, which communicate now and then with each other.

- Multi-threaded programs are a form of parallelism.

- But the general case is still a big research problem.

Splitting data up into smaller chunks is easy, though. Most Cloud applications are based on data parallelism.

For parallel processing to work, the computational problem should be able to:

- Be broken apart into discrete pieces of work that can be solved simultaneously.

- Execute multiple program instructions at any time.

- Be solved in less time with multiple compute resources than with a single compute resource.

The compute resources might be:

- A single computer with multiple processors

- An arbitrary number of computers connected by a network

- A combination of both

Cloud Computing!

Divide and Conquer

[Figure: "Work" is partitioned into pieces w1, w2, w3; each piece goes to a "worker", producing results r1, r2, r3, which are combined into the final "Result"]

Popular cloud approach.

Approach used by MapReduce (a sketch follows).
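
A minimal partition/work/combine sketch (illustrative only; the word-count workload is invented, and this is not Hadoop's MapReduce API):

    from multiprocessing import Pool

    def count_words(chunk):
        # "worker": each one handles its own partition of the work
        return len(chunk.split())

    if __name__ == "__main__":
        pieces = ["the quick brown fox", "jumps over", "the lazy dog"]  # partition
        with Pool() as pool:
            partial = pool.map(count_words, pieces)   # workers run in parallel
        print(sum(partial))                           # combine: total is 9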

Different Workers?

May be:

- Different threads in the same core

- Different cores in the same CPU

- Different CPUs in a multi-processor system

- Different machines in a distributed system

Parallelization Problems

What is the common theme in all of these problems?

- How do we assign work units to workers?

- What if we have more work units than workers?

- What if workers need to share partial results?

- How do we aggregate partial results?

- How do we know all the workers have finished?

- What if workers die?

Common Theme?

Parallelization problems can arise from:

- Communication between workers

- Access to shared resources (e.g. data)

Thus, we need a synchronization mechanism! Some mechanism that allows workers to keep themselves in step with other workers.

This is tricky:

- Finding bugs is hard

- Fixing bugs is even harder

Managing Multiple Workers

Difficult because we:

- (Often) do not know the order in which workers run

- (Often) do not know where the workers are running

- (Often) do not know when workers interrupt each other

Thus, we need synchronization primitives (used in operating systems!):

- Semaphores (lock, unlock)

- Condition variables (wait, notify, broadcast)

- Barriers

Still, lots of insidious (i.e. very nasty!) problems: deadlock, livelock, race conditions, ...

Moral of the story: be careful! Even trickier if the workers are on different machines. A race condition and its fix are sketched below.
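
A hedged sketch of a race condition on shared state and its fix with a lock (the counter workload is invented for illustration):

    import threading

    counter = 0
    lock = threading.Lock()

    def worker():
        global counter
        for _ in range(100_000):
            with lock:        # without the lock, these updates can interleave
                counter += 1  # read-modify-write on shared state

    threads = [threading.Thread(target=worker) for _ in range(4)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    print(counter)   # always 400000 with the lock; may be less without it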

Parallel Programming Models

There are several parallel programming models in common use:

- Shared Memory

- Threads

- Distributed Memory / Message Passing

- Data Parallel

- Hybrid

- Single Program Multiple Data (SPMD)

- Multiple Program Multiple Data (MPMD)

Parallel programming models exist as an abstraction above hardware and memory architectures.

Although it might not seem apparent, these models are not specific to a particular type of machine or memory architecture. In fact, any of these models can (theoretically) be implemented on any underlying hardware.

Shared Memory Model

In this programming model, tasks share a common address space, which they read and write to asynchronously (aka when they need to).

Various mechanisms, such as locks/semaphores, may be used to control access to the shared memory, i.e. to synchronize access to memory.

An advantage of this model from the programmer's point of view is that the notion of data "ownership" is lacking, so there is no need to specify explicitly the communication of data between tasks.

Threads Model

This programming model is a type of shared memory programming.

In the threads model of parallel programming, a single "heavyweight" process (i.e. program) can have multiple "lightweight", concurrent execution paths, i.e. threads.

Example

A program a.out is scheduled to run by the operating system.

- a.out loads and acquires all of the necessary system and user resources to run. This is the "heavyweight" process.

- a.out performs some sequential work, and then creates a number of internal tasks, i.e. threads, that can be scheduled and run by the operating system concurrently.

Each thread has local data, but also shares the entire resources of a.out.

- This saves the overhead associated with replicating a program's resources for each thread ("lightweight").

- Each thread also benefits from a global memory view because it shares the memory space of a.out.

A thread may best be described as a block of code within the main program.

- Any thread can execute its code at the same time as other threads.

Threads communicate with each other through global memory (updating address locations).

- This requires synchronization constructs to ensure that no more than one thread is updating the same global address at any time.

Threads can come and go, but a.out remains present to provide the necessary shared resources until the application has completed.

A number of languages support threads, such as Java, C# and Python.
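
A hedged Python sketch of the threads model (the work function is invented): one process spawns lightweight threads that share the process's global memory:

    import threading

    shared = []                 # global memory, shared by all threads
    lock = threading.Lock()

    def task(tid):
        local = tid * 10        # each thread has its own local data
        with lock:              # synchronize updates to shared memory
            shared.append(local)

    # The "heavyweight" process creates lightweight concurrent threads.
    threads = [threading.Thread(target=task, args=(i,)) for i in range(4)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    print(sorted(shared))       # [0, 10, 20, 30]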

Distributed Memory - Message Passing Model

This model demonstrates the following characteristics:

- A set of tasks that use their own local memory during computation.

- Multiple tasks can reside on the same physical machine and/or across an arbitrary number of machines.

- Tasks exchange data through communications, by sending and receiving messages.

- Data transfer usually requires cooperative operations to be performed by each process. For example, a send operation must have a matching receive operation.

The standard for message passing is the Message Passing Interface (MPI) library. A number of languages support this library.
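
A hedged sketch of a matched send/receive pair using MPI through Python's mpi4py bindings (assuming mpi4py is installed; run with e.g. mpiexec -n 2 python script.py):

    from mpi4py import MPI

    comm = MPI.COMM_WORLD
    rank = comm.Get_rank()      # each process learns its identity

    if rank == 0:
        data = {"answer": 42}
        comm.send(data, dest=1, tag=11)      # the send must be matched...
    elif rank == 1:
        data = comm.recv(source=0, tag=11)   # ...by a corresponding receive
        print("rank 1 received", data)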

Data Parallel Model

This model demonstrates the following characteristics:

- Address space is treated globally.

- Most of the parallel work focuses on performing operations on a data set.

- The data set is typically organized into a common data structure, such as an array.

- A set of tasks work collectively on the same data structure. However, each task works on a different partition of the same data structure.

- Tasks perform the same operation on their partition of work, for example, "add 4 to every array element".

On shared memory architectures, all tasks may have access to the data structure through global memory.

On distributed memory architectures, the data structure is split up and resides as "chunks" in the local memory of each task.

Single Program, Multiple Data (SPMD)

SPMD is actually a "high level" programming model that can be built upon any combination of the previously mentioned parallel programming models.

Single Program: All tasks execute their copy of the same program simultaneously. This program can be threads, message passing, data parallel or hybrid.

Multiple Data: All tasks may use different data.

SPMD programs may have the necessary logic programmed into them to allow different tasks to branch or conditionally execute only those parts of the program they are designed to execute.

- That is, tasks do not necessarily have to execute the entire program - perhaps only a portion of it.

The SPMD model, using message passing or hybrid programming, is probably the most commonly-used parallel programming model for multi-node clusters.

MapReduce is based on this model. A sketch of this branching follows.
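
A hedged SPMD sketch with mpi4py (same assumptions as the message passing sketch above): every process runs the same program, but branches on its rank so each executes only its own portion:

    from mpi4py import MPI

    rank = MPI.COMM_WORLD.Get_rank()   # same program, different identity

    if rank == 0:
        print("rank 0: coordinating")  # only rank 0 executes this branch
    else:
        print("rank %d: computing my partition of the data" % rank)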

Multiple Program, Multiple Data (MPMD)

Like SPMD, MPMD is actually a "high level" programming model that can be built upon any combination of the previously mentioned parallel programming models.

Multiple Program: Tasks may execute different programs simultaneously. The programs can be threads, message passing, data parallel or hybrid.

Multiple Data: All tasks may use different data.

MPMD applications are not as common as SPMD applications, but may be better suited for certain types of problems, particularly those that lend themselves better to functional (i.e. code) decomposition than domain (i.e. data) decomposition.

[Figure: SPMD versus MPMD]

Designing Parallel Programs

Partitioning

One of the first steps in designing a parallel program is to break the problem into discrete "chunks" of work that can be distributed to multiple tasks. This is known as decomposition or partitioning.

There are two basic ways to partition computational work among parallel tasks: domain decomposition and functional decomposition.

Domain Decomposition

In this type of partitioning, the data associated with a problem is broken into smaller chunks. Each parallel task then works on a portion of the data.

MapReduce works like this, with all tasks being identical.

Functional Decomposition

In this approach, the focus is on the computation that is to be performed, rather than on the data manipulated by the computation. The problem is decomposed according to the work that must be done. Each task then performs a portion of the overall work.

Multi-Tier Cloud Applications

A cloud application is typically made up of different components:

- Front end: e.g. load-balanced, stateless web servers

- Middle worker tier: e.g. number/data crunching

- Backend storage: e.g. SQL tables or files

- Multiple instances of each, for scalability and availability

Cloud Computing Architecture

[Figure: layer stack, bottom to top: Hardware, Infrastructure, Platform, Application]

Generally speaking, the architecture of a cloud computing environment can be divided into four layers: the hardware layer, the infrastructure layer, the platform layer and the application layer.

Sounds familiar?

The hardware layer:

This layer is responsible for managing the physical resources of the cloud, including physical servers, routers, switches, power and cooling systems.

In practice, the hardware layer is implemented in data centers. A data center usually contains thousands of servers that are organized in racks and interconnected through switches, routers or other "fabrics".

Typical issues at the hardware layer include hardware configuration, fault tolerance, traffic management, and power and cooling resource management.

The infrastructure layer:

Also known as the virtualization layer, the infrastructure layer creates a pool of storage and computing resources by partitioning the physical resources using virtualization technologies such as those provided by Xen, VMware and KVM.

The infrastructure layer is an essential component of cloud computing, since many key features, such as dynamic resource assignment, are only made available through virtualization technologies.

The platform layer:

Built on top of the infrastructure layer, the platform layer consists of operating systems and application frameworks.

The purpose of the platform layer is to minimize the burden of deploying applications directly into VM containers.

For example, Google App Engine operates at the platform layer to provide API support for implementing the storage, database and business logic of typical web applications.

The application layer:

At the highest level of the hierarchy, the application layer consists of the actual cloud applications.

Unlike traditional applications, cloud applications can leverage the automatic-scaling feature to achieve better performance, availability and lower operating cost.

Grid Computing

What is it?

What is Grid Computing?

- Multiple independent computing clusters which act like a "grid" because they are composed of resource nodes not located within a single administrative domain.

- Grid computing depends on software to divide and apportion pieces of a program among several computers, sometimes up to many thousands.

- Grid computing can also be thought of as distributed, large-scale cluster computing, as well as a form of network-distributed parallel processing.

Cloud versus Grid

Similar concepts:

- Grid comes from academia.

- Cloud comes from enterprise.

Similarities:

- Distributed computing.

- Large-scale clusters.

- Commodity hardware.

- Heterogeneous clusters.

Differences:

- Cloud: Elasticity and pay-as-you-go (if not, it is not Cloud). Built around massive data centers.

- A Grid ...

  ... can be more loosely-coupled and geographically dispersed than a Cloud.

  ... is built around distributed clusters (i.e. racks) of processors, not on the scale of Cloud data centers.

  ... may use the user's computer as a part of it (volunteer computing).

Google Trends

[Figure: Google Trends chart comparing the terms, summer 2013]

Questions?