BGM course outline
The second two days of the HPC workshop are designed for anyone writing, debugging,
or optimizing parallel applications, and for resellers whose customers will be doing
so. The workshop begins with a module on the effective design of a parallel
application. We build on that by covering the Intel compilers and libraries that
are most frequently encountered in the HPC environment. The workshop next covers the
debugging of a parallel program using one of the most popular debuggers, TotalView
from Etnus. Finally, we cover the optimization of the parallel program using the Intel
Trace Collector and the Intel Trace Analyzer.
Though they are not called out explicitly, each module is supported by a series of
hands-on lab projects that cover the most frequently used functions of each library
or tool.
1) Parallel Programming Algorithms
   a) Distributed and Shared Computing
   b) Parallel Program Design Methodology
      i) Partitioning
      ii) Communication
      iii) Agglomeration
      iv) Mapping
   c) Examples for Distributed and Shared programs
2) Basic MPI Programming
   a) What is MPI?
   b) Configuring and Installing MPI
      i) MPI configuration and installation
      ii) Compiling and running MPI programs
   c) The Core MPI Library
      i) Communicators
      ii) Basic point-to-point communication
      iii) Blocking versus non-blocking communication
      iv) Collective communication
3) MPI Debugging with TotalView*
   a) Basic Debugging
      i) Starting a simple program under TotalView*
      ii) Controlling program execution
      iii) Looking at data
   b) Parallel Debugging
      i) Starting your MPICH* application under TotalView
      ii) Parallel program control
      iii) Examining program state including message queues
4) MPI Tuning with the Intel® Trace Analyzer and the Intel® Trace Collector
   a) Introduction to Intel® Trace Analyzer
      i) Concept, GUI and example traces
      ii) Live demonstration and BKMs
   b) Introduction to Intel® Trace Collector
      i) Concept, compiling and linking
      ii) Default tracing
      iii) Recording names of functions or regions
   c) Advanced Profiling Options
5) MPI ScaLAPACK* Cluster Tools
   a) ScaLAPACK* Overview
   b) Calling ScaLAPACK
   c) Data Layouts and optimizing
In addition:

VTune Performance Analyzer
   Navigation of the VTune analyzer user interface
   Performance analysis and sound principles of performance tuning using a methodology
   Configuration of the VTune analyzer environment
   Basic sampling, hot spot identification, event ratio analysis
   Call graph profiling
   Linux* considerations
   A highly efficient methodology for selecting and using Itanium 2 performance counters

Intel® Performance Libraries
   The Intel Performance Libraries are presented as a highly efficient approach to code tuning
   Using the Intel® Math Kernel Library for optimizing math-intensive floating-point applications
   Loop independence concepts
   Using the Intel® Integrated Performance Primitives to develop multimedia applications

Intel® Compilers and Source Code Tuning
   Extracting the best performance from the Intel Itanium architecture by using the Intel compilers: compiler switch usage and expected results
   Improving an application's performance by modifying its source code: analyzing compiler reports, employing loop independence concepts, and assisting the compiler in software pipelining performance-critical code