Dec 1, 2013

BGM course outline


The second two days of the HPC workshop are designed for anyone writing, debugging,
or optimizing parallel applications, or for someone in the reseller channel who has
customers who will be doing so. It begins with a module on the effective design of a
parallel application. We build on that by covering the Intel compilers and libraries that
are most frequently encountered in the HPC environment. The workshop next covers the
debugging of a parallel program using one of the most popular debuggers, TotalView
from Etnus. Finally, we cover the optimization of the parallel program using the Intel
Trace Collector and the Intel Trace Analyzer. Though they are not called out explicitly,
each module is supported by a series of hands-on lab projects that cover the most
frequently used functions of each library or tool.



1) Parallel Programming Algorithms
   a) Distributed and Shared Computing
   b) Parallel Program Design Methodology
      i) Partitioning
      ii) Communication
      iii) Agglomeration
      iv) Mapping
   c) Examples for Distributed and Shared programs

2) Basic MPI Programming
   a) What is MPI?
   b) Configuring and Installing MPI
      i) MPI configuration and installation
      ii) Compiling and running MPI programs
   c) The Core MPI Library
      i) Communicators
      ii) Basic point-to-point communication
      iii) Blocking versus non-blocking communication
      iv) Collective communication

3) MPI Debugging with TotalView*
   a) Basic Debugging
      i) Starting a simple program under TotalView*
      ii) Controlling program execution
      iii) Looking at data
   b) Parallel Debugging
      i) Starting your MPICH* application under TotalView
      ii) Parallel program control
      iii) Examining program state including message queues

4) MPI Tuning with the Intel® Trace Analyzer and the Intel® Trace Collector
   a) Introduction to Intel® Trace Analyzer
      i) Concept, GUI and example traces
      ii) Live demonstration and best known methods (BKMs)
   b) Introduction to Intel® Trace Collector
      i) Concept, compiling and linking
      ii) Default tracing
      iii) Recording names of functions or regions
   c) Advanced Profiling Options

5) MPI ScaLAPACK* Cluster Tools
   a) ScaLAPACK* Overview
   b) Calling ScaLAPACK
   c) Data Layouts and Optimization
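
As a small illustration of the data layout topic that closes module 5: ScaLAPACK distributes each matrix dimension over the process grid in a block-cyclic fashion. The sketch below is not taken from the course material. It shows the index arithmetic for a single dimension in plain C, assuming 0-based indices and a distribution that starts at process 0. The names bc_location, bc_locate and bc_local_count are hypothetical helpers introduced only for this example.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper type: where a global index lands in a
 * 1D block-cyclic layout. */
typedef struct {
    size_t proc;   /* owning process, 0-based */
    size_t local;  /* index inside that process's local storage */
} bc_location;

/* Map a 0-based global index g to its owner and local index.
 * Assumption: blocks of nb elements are dealt round-robin to nprocs
 * processes, and the first block goes to process 0. */
bc_location bc_locate(size_t g, size_t nb, size_t nprocs)
{
    assert(nb > 0 && nprocs > 0);
    bc_location loc;
    size_t block = g / nb;              /* global block that holds g */
    loc.proc  = block % nprocs;         /* blocks are dealt round-robin */
    loc.local = (block / nprocs) * nb   /* earlier local blocks */
              + g % nb;                 /* offset inside the block */
    return loc;
}

/* Count how many of the n elements in one dimension live on process p.
 * Written as a plain loop for clarity rather than speed. */
size_t bc_local_count(size_t n, size_t nb, size_t nprocs, size_t p)
{
    assert(nb > 0 && nprocs > 0 && p < nprocs);
    size_t count = 0;
    for (size_t g = 0; g < n; ++g)
        if ((g / nb) % nprocs == p)
            ++count;
    return count;
}
```

With 10 elements, a block size of 2 and 3 processes, the blocks go to processes 0, 1, 2, 0, 1 in turn, so process 2 holds only two elements while the others hold four. Uneven shares like this are one reason the choice of block size matters when tuning a layout.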



In addition:


VTune Performance Analyzer
- Navigation of the VTune analyzer User Interface
- Performance analysis and sound principles of performance tuning using a
  methodology
- Configuration of the VTune analyzer environment
- Basic sampling, hot spot identification, event ratio analysis
- Call graph profiling
- Linux* considerations
- A highly efficient methodology for selecting and using Itanium 2 performance
  counters

Intel® Performance Libraries
- The Intel Performance Libraries are presented as a highly efficient approach to
  code tuning
- Using the Intel® Math Kernel Library for optimizing math-intensive floating-point
  applications
- Loop independence concepts
- Using the Intel® Integrated Performance Primitives to develop multimedia
  applications

Intel® Compilers and Source Code Tuning
- Extracting the best performance from the Intel Itanium architecture by using the
  Intel compilers: compiler switch usage and expected results
- Improving an application's performance by modifying its source code: analyzing
  compiler reports, employing loop independence concepts, and assisting the
  compiler in software-pipelining performance-critical code
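
To make the loop independence idea above concrete, here is a minimal C sketch. It is an illustration under stated assumptions, not part of the course labs. The function names scaled_add and running_sum are hypothetical. The first loop has independent iterations, and the C99 `restrict` qualifier tells the compiler that the two arrays do not overlap, which is the kind of information a compiler needs before it can overlap iterations. The second loop has a loop-carried dependence, because each iteration reads the value written by the previous one.

```c
#include <assert.h>
#include <stddef.h>

/* Independent iterations: y[i] depends only on x[i] and the old y[i].
 * `restrict` is the caller's promise that x and y do not overlap, so
 * the compiler is free to reorder or overlap iterations. */
void scaled_add(size_t n, double a,
                const double *restrict x, double *restrict y)
{
    assert(x != NULL && y != NULL);
    for (size_t i = 0; i < n; ++i)
        y[i] += a * x[i];
}

/* Loop-carried dependence: iteration i reads v[i - 1], which iteration
 * i - 1 has just written, so the iterations must run in order. */
void running_sum(size_t n, double *v)
{
    assert(v != NULL);
    for (size_t i = 1; i < n; ++i)
        v[i] += v[i - 1];
}
```

Whether a given compiler actually pipelines or vectorizes the first loop depends on the compiler and the switches used, which is why the compiler reports mentioned above are worth checking.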