INTRODUCTION TO HIGH PERFORMANCE COMPUTING


Susan Thomas Brown, Ph.D.

Summer 2010

WHAT IS HIGH PERFORMANCE COMPUTING?


Why do we do HPC?


What differentiates HPC from just “C”?


What are the different types of HPC?


Name some applications of HPC…

CONCEPTS


Serial vs. Parallel Computing


Types of parallel computers


Terminology (not to be feared!)


COMPUTING BASICS

Von Neumann Architecture (1945)

Every computer comprises four main components:

1. Memory
2. Control Unit
3. Arithmetic Logic Unit
4. Input/Output

SERIAL COMPUTING

A problem is broken down into a discrete series of computations (discretized) that can be executed one after another on one CPU.

[Diagram: Input → Oper 1 → Oper 2 → … → Output]
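To make this concrete, here is a minimal C sketch (the array and the summing operation are illustrative, not from the slides): each loop iteration is one discrete computation, and the single CPU executes them strictly one after another.

    #include <stdio.h>

    int main(void) {
        double data[8] = {1, 2, 3, 4, 5, 6, 7, 8};
        double sum = 0.0;

        /* One CPU executes the discrete operations in sequence:
           Oper 1, Oper 2, ... with no overlap. */
        for (int i = 0; i < 8; i++) {
            sum += data[i];    /* the i-th discrete computation */
        }

        printf("sum = %f\n", sum);
        return 0;
    }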

FLYNN’S TAXONOMY (1966)

SISD: Single Instruction, Single Data

SIMD: Single Instruction, Multiple Data

MISD: Multiple Instruction, Single Data

MIMD: Multiple Instruction, Multiple Data

SISD

SINGLE INSTRUCTION, SINGLE DATA


A serial (non-parallel) computer

Single instruction: only one instruction stream is being acted on by the CPU during any one clock cycle

Single data: only one data stream is being used as input during any one clock cycle

Deterministic execution

This is the oldest type of computer and, even today, the most common

Examples: older-generation mainframes, minicomputers, and workstations; most modern-day PCs

SIMD

SINGLE INSTRUCTION, MULTIPLE DATA


A type of parallel computer

Single instruction: all processing units execute the same instruction at any given clock cycle

Multiple data: each processing unit can operate on a different data element

Best suited for specialized problems characterized by a high degree of regularity, such as graphics/image processing

Synchronous (lockstep) and deterministic execution

Two varieties: Processor Arrays and Vector Pipelines

Examples:

Processor Arrays: Connection Machine CM-2, MasPar MP-1 & MP-2, ILLIAC IV

Vector Pipelines: IBM 9000, Cray X-MP, Y-MP & C90, Fujitsu VP, NEC SX-2, Hitachi S820, ETA10

Most modern computers, particularly those with graphics processing units (GPUs), employ SIMD instructions and execution units.
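As a small illustration of lockstep execution, the sketch below uses x86 SSE intrinsics in C (an assumption for illustration; the slides do not prescribe an instruction set): a single add instruction operates on four data elements at once.

    #include <stdio.h>
    #include <xmmintrin.h>   /* SSE intrinsics; assumes an x86 CPU */

    int main(void) {
        float a[4] = {1, 2, 3, 4};
        float b[4] = {10, 20, 30, 40};
        float c[4];

        __m128 va = _mm_loadu_ps(a);    /* load four floats */
        __m128 vb = _mm_loadu_ps(b);
        __m128 vc = _mm_add_ps(va, vb); /* ONE instruction, FOUR additions */
        _mm_storeu_ps(c, vc);

        printf("%.0f %.0f %.0f %.0f\n", c[0], c[1], c[2], c[3]);
        return 0;
    }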

SIMD EXAMPLES

From IBM’s brochure for the Enterprise IBM/9000, printed in 1990

ILLIAC IV (Burroughs and the University of Illinois, 1965); photo courtesy of “Lexikon’s History of Computing Encyclopedia on CD ROM,” LLNL Archives and Research Center

Cray C90 (antero): 1996–1999

MISD

MULTIPLE INSTRUCTION, SINGLE DATA


A single data stream is fed into multiple processing units.


Each processing unit operates on the data independently via independent
instruction streams.


Few actual examples of this class of parallel computer have ever existed. One is the experimental Carnegie-Mellon C.mmp computer (1971).


Some conceivable uses might be:


multiple frequency filters operating on a single signal stream


multiple cryptography algorithms attempting to crack a single coded
message.

MIMD

MULTIPLE INSTRUCTION, MULTIPLE DATA


Currently the most common type of parallel computer; most modern computers fall into this category

Multiple instruction: every processor may be executing a different instruction stream

Multiple data: every processor may be working with a different data stream

Execution can be synchronous or asynchronous, deterministic or non-deterministic

Examples: most current supercomputers, networked parallel computer clusters and “grids,” multi-processor SMP computers, multi-core PCs

Note: many MIMD architectures also include SIMD execution sub-components


MIMD EXAMPLES

Pingo at ARSC: Cray XT5, 3456 Compute Cores

Midnight at ARSC: Sun cluster w/ Opteron processors, 2280 Compute Cores

SunFire workstations at ARSC: 4 Compute Cores

GPGPU ARCHITECTURES (~2000)


Use of graphics processing units (GPUs) to boost performance

Time-intensive calculations are off-loaded to the GPU

PARALLEL COMPUTING MEMORY
ARCHITECTURE

Shared Memory


All CPUs access the same central memory space

Uniform Memory Access (UMA) vs. Non-Uniform Memory Access (NUMA)

PARALLEL COMPUTING MEMORY
ARCHITECTURE (2)

Distributed Memory


Each processor has its own local memory

Memory on one processor is not directly accessible by the other CPUs

Hybrid Distributed-Shared Memory

The largest computers in use today employ this model

Why?


PARALLEL PROGRAMMING MODELS


Shared Memory


Threads


Message Passing


Data Parallel


Hybrid


Higher Level Models

SHARED MEMORY PROGRAMMING MODEL


All tasks share a common address space, which they read and write to asynchronously

Communication of data between tasks is simplified for the programmer because no single task “owns” the data

Controlling the locality of the data is difficult, however, which complicates programming

Possible slow-downs due to:

Location of memory

“Traffic jams” when many tasks contend for the same memory
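A minimal POSIX-threads sketch of these points (the shared counter and thread count are invented for illustration): both threads read and write the same address, and the lock that keeps them correct is exactly the kind of “traffic jam” the slide warns about.

    #include <pthread.h>
    #include <stdio.h>

    /* Shared address space: both threads see the same 'counter'. */
    static long counter = 0;
    static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

    static void *work(void *arg) {
        for (int i = 0; i < 100000; i++) {
            pthread_mutex_lock(&lock);   /* contention: writers serialize here */
            counter++;
            pthread_mutex_unlock(&lock);
        }
        return NULL;
    }

    int main(void) {
        pthread_t t1, t2;
        pthread_create(&t1, NULL, work, NULL);
        pthread_create(&t2, NULL, work, NULL);
        pthread_join(t1, NULL);
        pthread_join(t2, NULL);
        printf("counter = %ld\n", counter);  /* prints 200000 */
        return 0;
    }

(Compile with: cc -pthread shared.c)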


THREADS MODEL


A single process can have multiple, concurrent execution paths

Analogy: a.out is the main program; the subroutines are the threads

The “threads” share all the common resources of a.out, but execute concurrently

Examples:

POSIX Threads (Pthreads)

OpenMP (see the sketch below)
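The OpenMP example below is a minimal sketch of the analogy above: one a.out process spawns a team of threads, and each thread runs the same block concurrently while sharing the process's memory.

    #include <omp.h>
    #include <stdio.h>

    int main(void) {
        /* The single a.out process forks a team of threads here. */
        #pragma omp parallel
        {
            int id = omp_get_thread_num();
            printf("hello from thread %d of %d\n",
                   id, omp_get_num_threads());
        }
        return 0;
    }

(Compile with: cc -fopenmp threads.c)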

MESSAGE PASSING MODEL


Tasks use their own local memory during computation

Multiple tasks can reside on the same physical machine as well as across an arbitrary number of machines

Tasks exchange data through communications, by sending and receiving messages

MPI Forum formed in 1992

Message Passing Interface (MPI) released in 1994

Industry standard

MPI-2 released in 1997

Name areas where the programmer could look for slow performance issues
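A minimal two-task MPI sketch of the model (the message value 42 is arbitrary): each rank computes in its own local memory, and data moves only through explicit send/receive calls.

    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv) {
        int rank;
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        if (rank == 0) {
            int msg = 42;                 /* lives in rank 0's local memory */
            MPI_Send(&msg, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
        } else if (rank == 1) {
            int msg;                      /* rank 1's own local copy */
            MPI_Recv(&msg, 1, MPI_INT, 0, 0, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
            printf("rank 1 received %d\n", msg);
        }

        MPI_Finalize();
        return 0;
    }

(Assumes at least two tasks; run with, e.g., mpirun -np 2 ./a.out)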

DATA PARALLEL MODEL


Most of the parallel work focuses on performing operations on a data set

A set of tasks works collectively on the same data structure, but each on a different partition of the structure

Tasks perform the same operation on their own partition of the work

Examples: Fortran 90 and 95 (array syntax; see the sketch below)
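In Fortran 90/95 the whole operation could be a single array statement (A = 2.0*B); the C/OpenMP sketch below expresses the same data-parallel idea under that assumption: every thread applies the identical operation, each to its own partition of the shared array.

    #include <stdio.h>

    #define N 1000000

    static double a[N], b[N];

    int main(void) {
        for (int i = 0; i < N; i++) b[i] = i;

        /* Same operation everywhere; the runtime splits the index
           range so each thread handles a different partition. */
        #pragma omp parallel for
        for (int i = 0; i < N; i++)
            a[i] = 2.0 * b[i];

        printf("a[N-1] = %f\n", a[N - 1]);
        return 0;
    }

(Compile with: cc -fopenmp dataparallel.c)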


HIGHER LEVEL MODELS


Model hierarchy:

Assembly or machine language: simple, fast, low overhead; heavy knowledge burden on the programmer; not very portable

First level of portable programming languages: C, C++, Fortran (F77, F90), MPI

Higher-level models: intuitive, GUI-based, little burden on the programmer, high overhead

Typical higher-level models for parallel computing:

Single Program Multiple Data (SPMD)

Multiple Program Multiple Data (MPMD)

Python

Matlab

TOP500 LIST


Started in 1993

Based on the LINPACK benchmark

Published twice a year; the November list is announced at the Supercomputing Conference

Freely available: http://www.top500.org/lists/2011/11


GROUP EXERCISE


Break up into 4 groups of 2

Each group takes an envelope

You have 15 minutes to build your “computer” using the architecture defined in your envelope

The components are given

Don’t actually compute; just indicate what each component will do

Use the equation in the problem statement

Ask for help if you need it

1. Which architecture do you think would be best to solve this problem?

2. Can any architecture be used to solve any problem?

3. What “cost” is incurred in using the non-ideal architecture for the problem at hand?

SOURCES

1. Barney, Blaise, “Introduction to Parallel Computing,” Lawrence Livermore National Laboratory, https://computing.llnl.gov/tutorials/parallel_comp/#Abstract

2. http://www.nvidia.com/object/what-is-gpu-computing.html

3. http://www.top500.org/list