Fundamental of Computer Architecture



By Panyayot Chaikan

panyayot@coe.psu.ac.th

240-208

November 01, 2003


Chapter 10

Introduction to Parallel processing



Contents

Introduction to parallel processing architectures

Multiprocessors, vector computers, clusters

Types of interconnection network

Introduction to parallel programming




High performance computer


Large computing capacity


Required to process large amounts of data in a reasonable amount of time


Often called a supercomputer



Supercomputer Applications


Weather forecasting


Finite element analysis in structural design


Fluid flow analysis


Simulation of large, complex physical systems


Computer Aided Design (CAD)





Parallel processing

Picture from http://www.byte.com/art/9601/img/509029c2.htm



3 ways to construct a supercomputer


Vector processing


Multiprocessing


Distributed computer system





Vector Supercomputing


Uses the fastest possible circuits


Wide paths for access to a large main memory


Extensive I/O capability


Dissipates considerable power and requires expensive cooling arrangements


Provides excellent performance, but at a very high price



Vector Supercomputing


NEC SX-5


Cray: CRAY-1, Y-MP


Fujitsu VP5000


Hitachi SR8000



Cray Supercomputer

Picture from http://www.meteo.fr/scem/images/cray.gif



Multiprocessor


Uses a large number of processors designed for the workstation or PC market


Has an efficient, high-bandwidth medium for communication among

    the processors

    memory

    I/O


Provides high performance, but is cheaper than vector processing



Distributed computer system


Uses many workstations connected by a local area network


Provides large computing capability at a reasonable cost



Multiprocessing performance


Many computations can proceed in parallel


Difficulty:

    The application must be broken down into small tasks that can be assigned to individual processors

    Processors must communicate with each other to exchange data



Classification of Parallel structure


Proposed by Flynn [1966]


4 types of computation


SISD


SIMD


MIMD


MISD




SISD


Single Instruction stream, Single Data stream


Used in single-processor computer systems



SIMD


Single Instruction stream, Multiple Data stream


A single stream of instructions is broadcast to a number of processors


Each processor operates on its own data


Each processor has its own memory


All processors execute the same program but operate on different data
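
The SIMD points above can be sketched in a few lines of Python. This is only a sequential simulation under stated assumptions: the names `ProcessingElement` and `broadcast` are hypothetical, and the loop stands in for hardware that would run every element in the same step.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class ProcessingElement:
    """One simulated processor with its own private memory (a single value)."""
    local_data: int


def broadcast(instruction: Callable[[int], int],
              elements: List[ProcessingElement]) -> None:
    """Apply the same instruction to every element's own data.

    A real SIMD machine would do this for all elements at once;
    this sketch visits them one after another.
    """
    for pe in elements:
        pe.local_data = instruction(pe.local_data)


def run_example() -> List[int]:
    """Broadcast two instructions to four processing elements."""
    elements = [ProcessingElement(v) for v in (1, 2, 3, 4)]
    broadcast(lambda x: x * 2, elements)  # one instruction, many data items
    broadcast(lambda x: x + 1, elements)
    return [pe.local_data for pe in elements]
```

Every element receives the same instruction stream, but each result depends on the data held in that element's own memory.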




MIMD


Multiple Instruction stream, Multiple Data stream


Each of many processors executes a different program and accesses its own sequence of data



MISD


Multiple Instruction stream, Single Data stream


A common data structure is manipulated by separate processors


Each processor executes a different program


This form does not occur often in practice



Array processing


Is the SIMD form of parallel processing


Instructions are broadcast from a central processor


Figure: a control processor broadcasts instructions to a grid of processing elements


2 types of array processing


Use a small number of powerful processors

    ILLIAC-IV: 64 processors, each processor is 64-bit


Use a large number of very simple processors

    CM-2: 65536 processors, each processor is 1-bit

    MP-1216: 16384 processors, each processor is 4-bit

    Gamma II plus: 4096 processors, each processor is 8-bit



Array processing


Well suited to numerical problems that can be expressed in matrix or vector format



The structure of general-purpose multiprocessors


UMA multiprocessor


NUMA multiprocessor


Distributed memory system



A UMA multiprocessor

Figure: processors P1, P2, ..., Pn and memory modules M1, M2, ..., Mn attached to an interconnection network


A NUMA multiprocessor

Figure: processors P1, P2, ..., Pn and memory modules M1, M2, ..., Mn attached to an interconnection network


A distributed memory system

Figure: processors P1, P2, ..., Pn and memory modules M1, M2, ..., Mn attached to an interconnection network


Taxonomy of parallel processing

Processor organizations
    SISD
    SIMD
        Vector processor
        Array processor
    MISD
    MIMD
        Shared memory (tightly coupled)
            UMA
            NUMA
        Distributed memory (loosely coupled)
            Clusters



Interconnection network


Single bus


Crossbar networks


Multistage networks


Hypercube networks


Mesh networks


Tree networks


Ring networks



Crossbar interconnection network

Figure: crossbar connecting units Q1, Q2, Q3, ..., Qn


Multistage shuffle network

Figure: multistage shuffle network connecting units Q0 through Q7


A 3-dimensional hypercube network

Figure: eight nodes labelled with 3-bit addresses: N0 (000), N1 (001), N2 (010), N3 (011), N4 (100), N5 (101), N6 (110), N7 (111)
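
The binary node labels in the figure suggest the usual hypercube wiring, in which two nodes are linked when their addresses differ in exactly one bit. The sketch below assumes that wiring; the function names `hypercube_neighbors` and `hop_count` are hypothetical and do not come from the slides.

```python
from typing import List


def hypercube_neighbors(node: int, dimensions: int = 3) -> List[int]:
    """Return the nodes directly linked to `node` in a hypercube.

    Each neighbor is found by flipping one bit of the node's address,
    so a d-dimensional hypercube gives every node d neighbors.
    """
    if not 0 <= node < 2 ** dimensions:
        raise ValueError("node address out of range")
    return [node ^ (1 << bit) for bit in range(dimensions)]


def hop_count(src: int, dst: int) -> int:
    """Minimum number of links between two nodes (the Hamming distance)."""
    return bin(src ^ dst).count("1")
```

For example, `hypercube_neighbors(0)` gives nodes 1, 2 and 4 (001, 010, 100), and a message from N0 (000) to N7 (111) needs `hop_count(0, 7)` = 3 links.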


A 2-dimensional mesh network



Four-way tree network



Fat tree network



Ring network

Figure labels: upper ring, lower ring


HP Convex architecture

Picture from http://www.byte.com/art/9601/img/509029g2.htm



HP Convex Hypernode

Picture from http://www.byte.com/art/9601/img/509029f2.htm



SGI Power Challenge

Picture from http://www.byte.com/art/9601/img/509029l2.htm



Clustered Supercomputer

Picture from http://arstechnica.com/cpu/2q00/klat2/klat2-1.html



Clusters

Figure: two nodes, each with two processors (P), a memory (M) and two I/O modules, connected by a high-speed message link


Benefits of clustering


Incremental scalability


High availability


Superior price/performance



Parallel programming


A task must be broken down into small tasks that can be assigned to individual processors at the program level


Needs operating system support


Different architectures require different programming methods



A sequential program to compute the dot product

integer array a[1..N], b[1..N]
integer dot_product
.
.
read a[1..N] from vector_a
read b[1..N] from vector_b
dot_product := 0
do_dot(a, b)
print dot_product
.
.
do_dot(integer array x[1..N], integer array y[1..N])
    for k := 1 to N
        dot_product := dot_product + x[k] * y[k]
    end for
end do_dot



First attempt at a 2-processor computation

shared integer array a[1..N], b[1..N]
shared integer dot_product
shared lock dot_product_lock
shared barrier done
.
.
read a[1..N] from vector_a
read b[1..N] from vector_b
dot_product := 0
create_thread(do_dot, a, b)
do_dot(a, b)
print dot_product
.
.
do_dot(integer array x[1..N], integer array y[1..N])
    private integer id
    id := mypid()
    for k := (id * N / 2) + 1 to (id + 1) * N / 2
        lock(dot_product_lock)
        dot_product := dot_product + x[k] * y[k]
        unlock(dot_product_lock)
    end for
    barrier(done)
end do_dot



An efficient 2-processor computation on a shared-memory machine

shared integer array a[1..N], b[1..N]
shared integer dot_product
shared lock dot_product_lock
shared barrier done
.
.
read a[1..N] from vector_a
read b[1..N] from vector_b
dot_product := 0
create_thread(do_dot, a, b)
do_dot(a, b)
print dot_product
.
.
do_dot(integer array x[1..N], integer array y[1..N])
    private integer local_dot_product
    private integer id
    id := mypid()
    local_dot_product := 0
    for k := (id * N / 2) + 1 to (id + 1) * N / 2
        local_dot_product := local_dot_product + x[k] * y[k]
    end for
    lock(dot_product_lock)
    dot_product := dot_product + local_dot_product
    unlock(dot_product_lock)
    barrier(done)
end do_dot
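
The same structure can be tried out in a real language. The sketch below uses Python's standard `threading` module, which is not what the slides use. The function name `parallel_dot` and its parameter `num_threads` are hypothetical, and `join()` stands in for the barrier. In standard CPython the global interpreter lock typically prevents a real speedup for pure-Python arithmetic, so this shows only the shape of the program: a private partial sum per thread and a single locked update.

```python
import threading
from typing import Sequence


def parallel_dot(a: Sequence[int], b: Sequence[int], num_threads: int = 2) -> int:
    """Dot product using per-thread partial sums and one locked update each."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    n = len(a)
    total = 0
    total_lock = threading.Lock()

    def do_dot(thread_id: int) -> None:
        nonlocal total
        # Each thread takes one contiguous slice of the index range.
        start = thread_id * n // num_threads
        stop = (thread_id + 1) * n // num_threads
        local_dot_product = 0
        for k in range(start, stop):
            local_dot_product += a[k] * b[k]
        # The lock is taken once per thread, not once per element.
        with total_lock:
            total += local_dot_product

    threads = [threading.Thread(target=do_dot, args=(i,))
               for i in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()  # wait for every thread before reading the result
    return total
```

Calling `parallel_dot([1, 2, 3, 4], [5, 6, 7, 8])` returns 70, the same value the sequential loop would produce.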



Performance considerations

Figure: speedup (S) plotted against number of processors (P), with regions S > P, S = P and S < P
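
The regions in the figure can be tied to numbers using the usual definition of speedup, S = T1 / TP, where T1 is the run time on one processor and TP is the run time on P processors. The slide does not spell this definition out, so treat it as an assumption here; the function names and the timings in the usage note are made up for illustration.

```python
def speedup(t_one: float, t_parallel: float) -> float:
    """Speedup S = T1 / TP for run times measured in the same unit."""
    if t_one <= 0 or t_parallel <= 0:
        raise ValueError("run times must be positive")
    return t_one / t_parallel


def classify(s: float, p: int) -> str:
    """Name the region of the S-versus-P plot that a measurement falls in."""
    if s > p:
        return "S > P"
    if s == p:
        return "S = P"
    return "S < P"
```

With a hypothetical program that takes 100 s on one processor and 40 s on four, `speedup(100.0, 40.0)` is 2.5, so `classify(2.5, 4)` places it in the S < P region.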


End of Chapter 10