# Parallel Computing


SJSU SPRING 2011

PARALLEL COMPUTING

CS 147: Computer Architecture

Instructor: Professor Sin-Min Lee

Spring 2011

By: Alice Cotti


## Background

- Amdahl's law and Gustafson's law
- Dependencies
- Race conditions, mutual exclusion, synchronization, and parallel slowdown
- Fine-grained, coarse-grained, and embarrassing parallelism


## Amdahl's Law

The speed-up of a program from parallelization is limited by how much of the program can be parallelized.

Figure: Amdahl's Law.
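In symbols, Amdahl's law gives the maximum speed-up as S(N) = 1 / ((1 - p) + p/N), where p is the fraction of the program that can be parallelized and N is the number of processors. A minimal Python sketch (the function name and sample values are illustrative):

```python
def amdahl_speedup(p, n):
    """Upper bound on speed-up for parallel fraction p on n processors."""
    return 1.0 / ((1.0 - p) + p / n)

print(amdahl_speedup(0.9, 4))     # ~3.08 with 4 processors
print(amdahl_speedup(0.9, 1000))  # approaches the 1/(1-p) = 10x ceiling
```

Note that even with unlimited processors, a program that is 90% parallelizable can never exceed a 10x speed-up, since the serial 10% always runs at the original speed.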


## Dependencies

Consider the following function, which demonstrates a flow dependency:

1: function Dep(a, b)
2:    c := a∙b
3:    d := 2∙c
4: end function

Operation 3 in Dep(a, b) cannot be executed before (or even in parallel with) operation 2, because operation 3 uses a result from operation 2. It violates the first of Bernstein's conditions (one operation's output overlaps the other's input), and thus introduces a flow dependency.


## Dependencies

Consider the following function:

1: function NoDep(a, b)
2:    c := a∙b
3:    d := 2∙b
4:    e := a+b
5: end function

In this example, there are no dependencies between the instructions, so they can all be run in parallel.
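Because each of the three assignments reads only the inputs a and b and writes a distinct variable, they can be dispatched concurrently. A minimal sketch using Python's standard concurrent.futures module (the structure is the point, not a recommended style for arithmetic this cheap):

```python
from concurrent.futures import ThreadPoolExecutor

def no_dep(a, b):
    """All three operations are independent (no operation reads
    another's result), so they may run concurrently."""
    with ThreadPoolExecutor(max_workers=3) as pool:
        fc = pool.submit(lambda: a * b)  # c := a*b
        fd = pool.submit(lambda: 2 * b)  # d := 2*b
        fe = pool.submit(lambda: a + b)  # e := a+b
        return fc.result(), fd.result(), fe.result()

print(no_dep(3, 4))  # (12, 8, 7)
```

In Dep(a, b), by contrast, the submit for d would have to wait on fc.result(), serializing the two operations.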


## Race condition

A flaw whereby the output or result of a process is unexpectedly and critically dependent on the sequence or timing of other events. Race conditions can occur in electronic systems, logic circuits, and software.

Figure: Race condition in a logic circuit. Here, ∆t1 and ∆t2 represent the propagation delays of the logic elements. When the input value (A) changes, the circuit outputs a short spike of duration ∆t1.
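The same hazard arises in software whenever two threads update shared state. A minimal Python sketch of mutual exclusion with a lock (the counter and thread counts are illustrative); without the lock, the read-modify-write on the shared counter can interleave between threads and lose updates:

```python
import threading

counter = 0
lock = threading.Lock()

def increment(n):
    global counter
    for _ in range(n):
        # The lock makes the read-modify-write atomic; removing it
        # reintroduces the race condition and an unpredictable total.
        with lock:
            counter += 1

threads = [threading.Thread(target=increment, args=(100_000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(counter)  # always 400000 with the lock held
```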


## Fine-grained, coarse-grained, and embarrassing parallelism

Applications are often classified according to how often their subtasks need to synchronize or communicate with each other.

- Fine-grained parallelism: subtasks communicate many times per second.
- Coarse-grained parallelism: subtasks do not communicate many times per second.
- Embarrassingly parallel: subtasks rarely or never have to communicate. Embarrassingly parallel applications are the easiest to parallelize.
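An embarrassingly parallel workload maps directly onto a worker pool, since the subtasks never communicate with one another. A minimal sketch with Python's concurrent.futures (for CPU-bound work in CPython one would use ProcessPoolExecutor instead of threads; the structure is identical):

```python
from concurrent.futures import ThreadPoolExecutor

def f(x):
    """A pure function: each input is processed independently, so the
    calls never need to synchronize (embarrassingly parallel)."""
    return x * x

with ThreadPoolExecutor() as pool:
    results = list(pool.map(f, range(10)))

print(results)  # [0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
```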


## Types of parallelism

- Data parallelism
- Bit-level parallelism
- Instruction-level parallelism

Figure: A five-stage pipelined superscalar processor, capable of issuing two instructions per cycle. It can have two instructions in each stage of the pipeline, for a total of up to 10 instructions (shown in green) being simultaneously executed.


## Hardware

- Memory and communication
- Classes of parallel computers
  - Multicore computing
  - Symmetric multiprocessing
  - Distributed computing


## Multicore Computing

Pros:

- Performs better than a dual-core processor.
- Cores that do not share the same bandwidth and bus can be even faster.

Cons:

- Heat dissipation problems
- More expensive


## Software

- Parallel programming languages
- Automatic parallelization
- Application checkpointing


## Parallel programming languages

Concurrent programming languages, libraries, APIs, and parallel programming models (such as algorithmic skeletons) have been created for programming parallel computers. They are generally divided by the memory architecture they assume:

- Shared memory
- Distributed memory
- Shared distributed memory
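The distinction between the memory models can be sketched in a few lines: under message passing (the distributed-memory style), a worker sees only the messages sent to it, never shared variables, which is what makes the model scale across machines. A minimal Python illustration using queues as the message channels (all names are illustrative):

```python
import threading
import queue

inbox = queue.Queue()   # the only channel into the worker
outbox = queue.Queue()  # the only channel out of the worker

def worker():
    """Touches no shared variables; it only receives and sends messages."""
    total = 0
    while True:
        msg = inbox.get()
        if msg is None:  # sentinel: no more work
            break
        total += msg
    outbox.put(total)

t = threading.Thread(target=worker)
t.start()
for x in [1, 2, 3, 4]:
    inbox.put(x)
inbox.put(None)
t.join()
result = outbox.get()
print(result)  # 10
```

In a shared-memory model the worker would instead read and write the same variables as the main thread, guarded by locks as in the race-condition example.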


## Automatic parallelization

Automatic parallelization of a sequential program by a compiler is the holy grail of parallel computing. Despite decades of work by compiler researchers, automatic parallelization has had only limited success.

Mainstream parallel programming languages remain either explicitly parallel or (at best) partially implicit, in which a programmer gives the compiler directives for parallelization.

A few fully implicit parallel programming languages exist, such as Mitrion-C (for FPGAs).


## Application checkpointing

The larger and more complex a computer is, the more that can go wrong and the shorter the mean time between failures.

Application checkpointing is a technique whereby the computer system takes a "snapshot" of the application. This information can be used to restore the program if the computer should fail.
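A checkpointing loop can be sketched in a few lines: periodically serialize the program state, and on startup restore the latest snapshot if one exists. A minimal Python sketch (the filename, state layout, and checkpoint interval are illustrative, not a production scheme):

```python
import os
import pickle

CHECKPOINT = "state.pkl"  # illustrative snapshot filename

def run(total):
    """Accumulate sum(range(total)), snapshotting state so a crashed
    run could resume from the last checkpoint instead of from zero."""
    if os.path.exists(CHECKPOINT):        # restore the prior snapshot
        with open(CHECKPOINT, "rb") as f:
            state = pickle.load(f)
    else:                                 # or start fresh
        state = {"i": 0, "acc": 0}
    while state["i"] < total:
        state["acc"] += state["i"]
        state["i"] += 1
        if state["i"] % 100 == 0:         # periodic snapshot
            with open(CHECKPOINT, "wb") as f:
                pickle.dump(state, f)
    return state["acc"]

result = run(1000)
print(result)  # sum of 0..999 = 499500
if os.path.exists(CHECKPOINT):
    os.remove(CHECKPOINT)
```

Real checkpointing systems snapshot far more (memory images, open files, in-flight messages), but the restore-or-start-fresh structure is the same.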


## Algorithmic methods

Parallel computing is used in a wide range of fields, from bioinformatics to economics. Common types of problems found in parallel computing applications are:

- Dense linear algebra
- Sparse linear algebra
- Dynamic programming
- Finite-state machine simulation


## Programming

The parallel architectures of supercomputers
often dictate the use of special programming
techniques to exploit their speed.

The base language of supercomputer code is, in
general, Fortran or C, using special libraries to
share data between nodes.

The new massively parallel GPGPUs have
hundreds of processor cores and are
programmed using programming models such
as CUDA and OpenCL.


## Classes of parallel computers

Parallel computers can be roughly classified according to the level at which the hardware supports parallelism.

- Multicore computing
- Symmetric multiprocessing
- Distributed computing
- Specialized parallel computers


## Multicore computing

A multicore processor includes multiple execution units ("cores") on the same chip. It can issue multiple instructions per cycle from multiple instruction streams, and each core can potentially be superscalar.

Temporal multithreading, by contrast, uses a single execution unit; when that unit is idling (such as during a cache miss), it processes a second thread. IBM's Cell microprocessor, designed for use in the Sony PlayStation 3, is a prominent multicore processor.


## Symmetric multiprocessing

A symmetric multiprocessor (SMP) is a computer system with multiple identical processors that share memory and connect via a bus.

Bus contention prevents bus architectures from scaling. As a result, SMPs generally do not comprise more than 32 processors.

Because of the small size of the processors and the significant reduction in bus bandwidth requirements achieved by large caches, such symmetric multiprocessors are extremely cost-effective.


## Distributed computing

A distributed computer is a distributed-memory computer system in which the processing elements are connected by a network. Distributed computers are highly scalable.

Figure: (b) A distributed system. (c) A parallel system.


## Specialized parallel computers

Within parallel computing, there are specialized parallel devices that tend to be applicable to only a few classes of parallel problems.

- Reconfigurable computing
- General-purpose computing on graphics processing units
- Application-specific integrated circuits
- Vector processors


## Questions?


## References

- Wikipedia.org