Parallel Computing




CS 147: Computer Architecture

Instructor: Professor Sin-Min Lee

Spring 2011

By: Alice Cotti



Background


Amdahl's law and Gustafson's law


Dependencies


Race conditions, mutual exclusion,
synchronization, and parallel slowdown


Fine-grained, coarse-grained, and embarrassing parallelism



Amdahl's Law

The speed-up of a program from parallelization is limited by how much of the program can be parallelized.
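In symbols (the formula below is my addition, not from the slide; P is the fraction of the program that can be parallelized and N is the number of processors):

    S(N) = \frac{1}{(1 - P) + P/N},
    \qquad
    \lim_{N \to \infty} S(N) = \frac{1}{1 - P}

For example, if 90% of a program can be parallelized (P = 0.9), no number of processors can speed it up by more than a factor of 10.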



Dependencies

Consider the following function, which demonstrates a flow dependency:


1: function Dep(a, b)
2:    c := a∙b
3:    d := 2∙c
4: end function


Operation 3 in Dep(a, b) cannot be executed before (or even in parallel with) operation 2, because operation 3 uses a result from operation 2. It violates the first of Bernstein's conditions (the input of operation 3 overlaps the output of operation 2), and thus introduces a flow dependency.



Dependencies

Now consider the following function:



1: function NoDep(a, b)
2:    c := a∙b
3:    d := 2∙b
4:    e := a+b
5: end function


In this example, there are no dependencies between
the instructions, so they can all be run in parallel.
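As a minimal sketch (my addition, not from the slides), the three independent assignments of NoDep can be executed concurrently with OpenMP sections in C. It assumes an OpenMP-capable compiler (e.g. gcc -fopenmp); without OpenMP the pragmas are ignored and the code simply runs sequentially.

/* The assignments from NoDep(a, b) have no dependencies,
 * so each one can run in its own OpenMP section, in any order. */
#include <stdio.h>

int main(void) {
    double a = 3.0, b = 4.0;
    double c, d, e;

    #pragma omp parallel sections
    {
        #pragma omp section
        c = a * b;
        #pragma omp section
        d = 2 * b;
        #pragma omp section
        e = a + b;
    }

    printf("c=%g d=%g e=%g\n", c, d, e);
    return 0;
}

Because each section writes a different variable, the result is the same regardless of the order in which the sections run.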


Race condition


A flaw whereby the
output or result of the
process is unexpectedly
and critically dependent
on the sequence or
timing of other events.


Can occur in electronic systems, logic circuits, and multithreaded software.

Figure: a race condition in a logic circuit. Here, ∆t1 and ∆t2 represent the propagation delays of the logic elements. When the input value (A) changes, the circuit outputs a short spike of duration ∆t1.
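In multithreaded software, the same flaw appears when two threads update shared data without synchronization. A minimal sketch (my addition, not from the slides; POSIX threads, compile with -pthread): two threads increment a shared counter, and a mutex provides the mutual exclusion mentioned earlier.

/* Sketch of a software race condition and its fix via mutual exclusion.
 * Without the lock/unlock calls, the two threads' read-modify-write
 * sequences interleave and the final count is usually below 2000000. */
#include <pthread.h>
#include <stdio.h>

static long counter = 0;
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

static void *worker(void *arg) {
    (void)arg;
    for (int i = 0; i < 1000000; i++) {
        pthread_mutex_lock(&lock);    /* mutual exclusion: only one thread */
        counter++;                    /* updates the counter at a time     */
        pthread_mutex_unlock(&lock);
    }
    return NULL;
}

int main(void) {
    pthread_t t1, t2;
    pthread_create(&t1, NULL, worker, NULL);
    pthread_create(&t2, NULL, worker, NULL);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    printf("counter = %ld (expected 2000000)\n", counter);
    return 0;
}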


Fine-grained, coarse-grained, and embarrassing parallelism

Applications are often classified according to how
often their subtasks need to synchronize or
communicate with each other.


Fine-grained parallelism: subtasks must communicate many times per second


Coarse-grained parallelism: subtasks do not communicate many times per second


Embarrassingly parallel: subtasks rarely or never have to communicate. Embarrassingly parallel applications are the easiest to parallelize.


Types of parallelism


Data parallelism (see the sketch below)


Task parallelism


Bit-level parallelism


Instruction-level parallelism


Figure: a five-stage pipelined superscalar processor, capable of issuing two instructions per cycle. It can have two instructions in each stage of the pipeline, for a total of up to ten instructions (shown in green) being executed simultaneously.
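A minimal sketch of the first item, data parallelism (my addition, not from the slides): the same operation is applied to every element of an array, so the loop iterations are independent and can be divided among threads, here with an OpenMP parallel loop.

/* Data parallelism: each iteration applies the same operation to a
 * different element, so OpenMP can split the iterations across cores. */
void scale(double *x, int n, double k) {
    #pragma omp parallel for
    for (int i = 0; i < n; i++)
        x[i] *= k;
}

Task parallelism, by contrast, runs entirely different computations at the same time, as in the earlier OpenMP sections sketch.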


Hardware


Memory and communication


Classes of parallel computers


Multicore computing


Symmetric multiprocessing


Distributed computing


Multicore Computing


PROS


better than dual core


won't use the same bandwidth and bus, and can therefore be even faster

CONS

heat dissipation problems

more expensive


Software


Parallel programming languages


Automatic parallelization


Application checkpointing


Parallel programming languages


Concurrent programming languages,
libraries, APIs, and parallel programming
models (such as Algorithmic Skeletons)
have been created for programming
parallel computers.



These can generally be divided into classes based on the memory architecture they assume:


Shared memory


Distributed memory


Shared distributed memory



Automatic parallelization

Automatic parallelization of a sequential program by a compiler is the holy grail of parallel computing. Despite decades of work by compiler researchers, it has had only limited success.


Mainstream parallel programming languages
remain either explicitly parallel or (at best)
partially implicit, in which a programmer gives
the compiler directives for parallelization.


A few fully implicit parallel programming languages exist: SISAL, Parallel Haskell, and (for FPGAs) Mitrion-C.


Application checkpointing


The larger and more complex a computer
is, the more that can go wrong and the
shorter the mean time between failures.



Application checkpointing is a technique
whereby the computer system takes a
"snapshot" of the application. This
information can be used to restore the
program if the computer should fail.


Algorithmic methods

Parallel computing is used in a wide range
of fields, from bioinformatics to economics.
Common types of problems found in
parallel computing applications are:


Dense linear algebra


Sparse linear algebra


Dynamic programming


Finite-state machine simulation


Programming


The parallel architectures of supercomputers
often dictate the use of special programming
techniques to exploit their speed.


The base language of supercomputer code is, in general, Fortran or C, using special libraries to share data between nodes (a small sketch of this follows below).


The new massively parallel GPGPUs have
hundreds of processor cores and are
programmed using programming models such
as CUDA and OpenCL.
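As a minimal sketch of such a node-to-node library (my addition, not from the slides), MPI is the most widely used one; here rank 0 sends one integer to rank 1. It assumes an MPI installation (e.g. build with mpicc and run with mpirun -np 2).

/* Message passing between two processes, possibly on different nodes.
 * Rank 0 sends an integer over the network; rank 1 receives and prints it. */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv) {
    int rank, value;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == 0) {
        value = 42;
        MPI_Send(&value, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);   /* dest = rank 1, tag = 0 */
    } else if (rank == 1) {
        MPI_Recv(&value, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        printf("rank 1 received %d\n", value);
    }

    MPI_Finalize();
    return 0;
}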


Classes of parallel computers

Parallel computers can be roughly classified
according to the level at which the
hardware supports parallelism.



Multicore computing


Symmetric multiprocessing


Distributed computing


Specialized parallel computers


Multicore computing


Includes multiple execution units ("cores") on the
same chip.


Can issue multiple instructions per cycle from
multiple instruction streams. Each core in a
multicore processor can potentially be
superscalar.


Simultaneous multithreading has only one execution unit, but when that unit is idling (such as during a cache miss), it processes a second thread. IBM's Cell microprocessor, designed for use in the Sony PlayStation 3, is a prominent multicore processor.


Symmetric multiprocessing


A computer system with multiple identical
processors that share memory and connect via a
bus.


Bus contention prevents bus architectures from
scaling. As a result, SMPs generally do not
comprise more than 32 processors.


Because of the small size of the processors and the significant reduction in the requirements for bus bandwidth achieved by large caches, such symmetric multiprocessors are extremely cost-effective.


Distributed computing


A distributed memory
computer system in
which the processing
elements are
connected by a
network.


Highly scalable.

Figure: (a), (b) a distributed system; (c) a parallel system.


Specialized parallel computers

Within parallel computing, there are
specialized parallel devices that tend to be
applicable to only a few classes of parallel
problems.


Reconfigurable computing


General-purpose computing on graphics processing units


Application-specific integrated circuits


Vector processors


Questions?



