
1 Dec 2013

OpenMP in a Heterogeneous World

Ayodunni Aribuki

Advisor: Dr. Barbara Chapman


HPCTools Group

University of Houston

Top 10 Supercomputers (June 2011)


Why OpenMP

- Shared memory parallel programming model
  - Extends C, C++, and Fortran
- Directive-based
  - Single code for sequential and parallel versions
- Incremental parallelism
  - Little code modification
- High-level
  - Leave multithreading details to compiler and runtime
- Widely supported by major compilers
  - Open64, Intel, GNU, IBM, Microsoft, …
- Portable

www.openmp.org

OpenMP Example
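The example shown on this slide did not survive extraction. As a stand-in, here is a minimal sketch of the directive-based style described earlier; the function name `dot` and the choice of a dot product are illustrative assumptions, not the original slide content. A single pragma parallelizes the loop, and the same source still compiles and runs sequentially when OpenMP is disabled.

```c
#include <stddef.h>

/* Dot product of two vectors of length n.
 * With OpenMP enabled, the loop iterations are divided among threads and
 * the per-thread partial sums are combined by the reduction clause.
 * Without OpenMP the pragma is ignored and the loop runs sequentially. */
double dot(const double *x, const double *y, size_t n)
{
    double sum = 0.0;
    #pragma omp parallel for reduction(+:sum)
    for (long i = 0; i < (long)n; i++) {
        sum += x[i] * y[i];
    }
    return sum;
}
```

With GCC, building with `-fopenmp` enables the directive; omitting the flag yields the sequential version from the same file.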

Present/Future Architectures & Challenges They Pose

[Figure: four nodes (Node 0 to Node 3), each with its own memory, and an accelerator with separate memory]

- Many more CPUs
- Location
- Heterogeneity
- Scalability

Heterogeneous Embedded Platform

[Figure: four nodes (Node 0 to Node 3), each with its own memory]

Heterogeneous High-Performance Systems

Each node has multiple CPU cores, and some of the nodes are equipped with additional computational accelerators, such as GPUs.

www.olcf.ornl.gov/wp-content/uploads/.../Exascale-ASCR-Analysis.pdf


Programming Heterogeneous Multicore: Issues

- Must map data/computations to specific devices
- Usually involves substantial rewrite of code
- Verbose code
  - Move data to/from device x
  - Launch kernel on device
  - Wait until y is ready/done
- Portability becomes an issue
  - Multiple versions of same code
  - Hard to maintain

Always hardware-specific!

Programming Models? Today’s Scenario

    // Run one OpenMP thread per device per MPI node
    #pragma omp parallel num_threads(devCount)
    if (initDevice())
    {
        // Block and grid dimensions
        dim3 dimBlock(12,12);
        kernel<<<1,dimBlock>>>();
        cudaThreadExit();  // deprecated since CUDA 4.0; cudaDeviceReset() is its replacement
    }
    else
    {
        printf("Device error on %s\n", processor_name);
    }
    MPI_Finalize();
    return 0;
    }  // closes the enclosing function, whose opening is not shown on the slide

Source: www.cse.buffalo.edu/faculty/miller/Courses/CSE710/heavner.pdf

OpenMP in the Heterogeneous World

- All threads are equal
  - No vocabulary for heterogeneity, separate device
- All threads must have access to the memory
  - Distributed memories common in embedded systems
  - Memories may not be coherent
- Implementations rely on OS and threading libraries
  - Memory allocation and synchronization (e.g. Linux, Pthreads)
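The assumptions listed on this slide are visible even in a tiny program. In the sketch below (the function name `record_team` is hypothetical and not from the talk), every thread of a parallel region runs the same code and writes directly into one array that all of them can address. That is the model that stops working once a device has its own, possibly non-coherent, memory.

```c
#ifdef _OPENMP
#include <omp.h>
#endif

/* Each thread of the team stores its own ID into the shared array ids[]
 * (capacity max), and the function returns the team size.
 * Every thread executes the same region and addresses the same memory;
 * nothing in the code distinguishes one kind of core from another. */
int record_team(int *ids, int max)
{
    int team = 1;              /* stays 1 when compiled without OpenMP */
    #pragma omp parallel shared(ids, team)
    {
        int me = 0;
#ifdef _OPENMP
        me = omp_get_thread_num();
        #pragma omp single
        team = omp_get_num_threads();
#endif
        if (me < max) {
            ids[me] = me;      /* distinct index per thread: no data race */
        }
    }
    return team;
}
```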

Extending OpenMP: Example

[Figure: general-purpose processor cores with main memory holding the application data, and an HWA with device cores and its own copy of the application data. Arrows between them: upload remote data, remote procedure call, download remote data.]
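The upload, remote call, and download steps in the figure can be sketched with the `target` and `map` directives that OpenMP 4.0 later standardized. This is an assumption made for illustration: it is not necessarily the extension syntax proposed in this talk, and the function name `scale_on_device` is hypothetical.

```c
/* Scales in[0..n-1] by factor into out[0..n-1], offloading when a device
 * is available. */
void scale_on_device(const float *in, float *out, int n, float factor)
{
    /* map(to: ...)   uploads in[] to the device before the region runs
     * map(from: ...) downloads out[] to the host when the region ends
     * target         executes the enclosed loop on the device */
    #pragma omp target map(to: in[0:n]) map(from: out[0:n])
    #pragma omp parallel for
    for (int i = 0; i < n; i++) {
        out[i] = in[i] * factor;
    }
}
```

When no device or offloading support is present, an OpenMP compiler runs the region on the host, and a compiler without OpenMP ignores the pragmas, so the same source remains usable in each case.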

Heterogeneous OpenMP Solution Stack

[Figure: OpenMP Parallel Computing Solution Stack]

- User layer: OpenMP Application
- Prog. layer (OpenMP API): Directives, Compiler; OpenMP library; Environment variables
- System layer: Runtime library; OS/system support for shared memory
- Core 1, Core 2, … Core n

Annotations on the stack:

- Language extensions
- Efficient code generation
- Target Portable Runtime Interface
- MCAPI, MRAPI, MTAPI

Summarizing My Research

- OpenMP on heterogeneous architectures
  - Expressing heterogeneity
  - Generating efficient code for GPUs/DSPs
  - Managing memories
    - Distributed
    - Explicitly managed
- Enabling portable implementations

Backup



MCA: Generic Multicore Programming

- Solve portability issue in embedded multicore programming
- Defining and promoting open specifications for
  - Communication: MCAPI
  - Resource Management: MRAPI
  - Task Management: MTAPI

(www.multicore-association.org)

Heterogeneous Platform: CPU + Nvidia GPU