Upshot + Java = Jumpshot - Utk


Nov 3, 2013


1

Scalable Performance Visualization with Jumpshot

Omer Zaki, Rusty Lusk, Bill Gropp, Debbie Swider

Mathematics and Computer Science Division

Argonne National Laboratory

2

Not Included


Getting performance with MPI


applications


implementation research topics


MPI-2


contents


implementation availability


MPICH


The MPI-2 approach to parallel I/O

3

Outline


What is Jumpshot and where did it come from?


Related efforts


Our requirements for a logfile-based performance visualization system


Producing logfiles: CLOG


Visualizing logfiles: Jumpshot


Java issues


Future work

4

What is Jumpshot?


Tool for understanding
the behavior of parallel
programs


Post-mortem


Logfile-based


Includes logging
package (CLOG)


Primarily for MPI
programs


Written in Java

[Diagram labels: processes, CLOG, logfile, Jumpshot, display]

5

Typical Jumpshot Screen

6

Instrumented version of PETSc

7

Where Did Jumpshot Come From?


The history of logfile-based performance-analysis tools at Argonne is also the
history of the search for a programming environment in which to implement
simple graphics plus a GUI.


Gist (BBN) -- raw X


black and white, not portable (BBN Butterfly only)


Upshot -- raw X plus Athena widgets, used with ALOG logfile format


painful, especially to get performance (1990)


Upshot redone in Tcl/Tk


easy to write, but graphics too slow

8

History (continued)


Nupshot -- Upshot redone in Tcl/Tk/C for speed


good performance


Tcl/C interface unstable


CLOG -- new log format for many reasons


Java + Upshot = Jumpshot


uses CLOG


has new features


explores Java technology


next up: JPython?

9

Related Efforts


Gist survives in Dolphin’s TotalView as
TimeScan


PICL/ParaGraph - Pat Worley, Mike Heath, Jennifer Etheridge, Al Geist


VAMPIR - Pallas


Traceview - Al Malony


Pablo - Dan Reed, Ruth Aydt


XPVM - Jim Kohl, Al Geist


XMPI - Raja Daoud, now Notre Dame


Paradyn - Bart Miller, Myron Livny


others

10

Why Do It Again?

Requirements for a new system (not all of them yet
satisfied by Jumpshot):


stable environment for long lifetime to
accommodate future research


portable, even unto Microsoft


support for upshot-type views that we have found most useful


process timelines with scrolling and zooming


histograms of state durations, message properties,
mountain ranges


animation not so useful

11

Requirements (continued)


flexible, extensible logfile format to
accommodate new types of events, states,
concepts


end-user-defined states


scalable performance


control of logging at source


aggregation


at least tens of thousands of events


nested and overlapping states


nested more important than overlapping


connection of displayed events back to source
code.

12

Requirements (continued)


MPI awareness (communicators, semi-transparency of collective operations)


ability to query details of specific events,
messages, and states.


ability to locate “interesting” parts of display
(research topic)


a new one every week....



13

The CLOG Logging Library: Background

Characteristics of old ALOG:


fixed-format records (6 ints and short string)


good for parsing, storing, access


bad for extensibility


timestamps an integer number of microseconds


OK in 1990


not accurate enough now


ASCII format in file


good for portability, easy to read


can’t store binary data conveniently
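
To make the ALOG trade-off concrete, here is a minimal sketch of reading one fixed-format ASCII record (six integers followed by a short string). The names `AlogRecord` and `parse_record` are hypothetical, and the slide does not say which of the six integers is the microsecond timestamp, so the sketch keeps the integers as an anonymous tuple in file order.

```python
from typing import NamedTuple, Tuple


class AlogRecord(NamedTuple):
    ints: Tuple[int, ...]   # the six integer fields, in file order
    text: str               # the short trailing string (may be empty)


def parse_record(line: str) -> AlogRecord:
    """Parse one ASCII record: six whitespace-separated ints, then a string."""
    parts = line.split(None, 6)          # at most 6 splits -> up to 7 pieces
    if len(parts) < 6:
        raise ValueError(f"expected 6 integer fields, got {len(parts)}")
    ints = tuple(int(p) for p in parts[:6])
    text = parts[6].strip() if len(parts) == 7 else ""
    return AlogRecord(ints, text)
```

Parsing is a single split, which is the "good for parsing" point. Any new kind of information has to be squeezed into the same six integers or the string, which is the extensibility problem.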

14

CLOG: Requirements


Efficient enough to not interfere with behavior
of program; I/O only when program finishes


Only one logfile at end is convenient


Timestamps not assumed synchronized


Flexibility in record type, but not completely self-describing


Portable: logfiles can be read on a different machine from the one
where they were written.

15

CLOG: How It Works


relies on MPI, for portability


calls MPI_Wtime to get timestamps


reasonably, but not ultimately, efficient on any
given architecture/OS


multiple record formats with types, plus “raw”
type


user can define own types, states, colors


log records accumulate in big buffers in
memory until malloc fails, then stop
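
The buffering policy above can be sketched as follows. This is an illustration in Python, not CLOG's actual C code: the class `BufferedLogger` and its methods are hypothetical, a fixed record budget stands in for "until malloc fails", and an injectable clock stands in for MPI_Wtime.

```python
import time


class BufferedLogger:
    """In-memory event log: no I/O while running, logging stops when full."""

    def __init__(self, max_records, clock=time.perf_counter):
        self.max_records = max_records   # stand-in for running out of memory
        self.clock = clock               # stand-in for MPI_Wtime
        self.records = []
        self.dropped = 0

    def log(self, event_type, data=""):
        """Append a timestamped record; return False once the buffer is full."""
        if len(self.records) >= self.max_records:
            self.dropped += 1            # buffer exhausted: stop recording
            return False
        self.records.append((self.clock(), event_type, data))
        return True

    def finalize(self):
        """Hand back all buffered records; a real logger would write them here."""
        out, self.records = self.records, []
        return out
```

Deferring all output to `finalize` keeps the cost of each `log` call small and predictable, at the price of losing events once the buffer budget is used up.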

16

How CLOG Works (cont.)


At end (CLOG_Finalize is
collective)


process-local data is added to buffers


timestamps are adjusted,
using simple algorithm and
communication with other
processes


processes form a binary tree and do local 3-way merge in parallel up to
process 0, which writes file


file is in Java (MPI-2’s “external32”) format
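
The finalize step can be sketched as a single-process simulation. The assumptions here are not from the slides: the children of rank r are taken to be 2r+1 and 2r+2, records are `(timestamp, rank, event)` tuples, and the clock correction is a constant per-process offset. The slide says only that a "simple algorithm" is used, so `adjust` is a placeholder for it.

```python
import heapq


def adjust(records, offset):
    """Shift one process's timestamps by a constant offset (an assumption)."""
    return [(t - offset, rank, event) for (t, rank, event) in records]


def tree_merge(local_logs, rank=0):
    """Merge the time-sorted logs of `rank` and of its subtree.

    local_logs[r] is the record list of process r. Each node performs a
    3-way merge: its own records plus one merged stream per child.
    """
    streams = [local_logs[rank]]
    for child in (2 * rank + 1, 2 * rank + 2):
        if child < len(local_logs):
            streams.append(tree_merge(local_logs, child))
    return list(heapq.merge(*streams))   # inputs are already sorted
```

Calling `tree_merge(logs)` returns what process 0 would write. In a real run the subtrees are merged concurrently on different processes, so the merge depth grows with the logarithm of the process count.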

17

“Normal” Jumpshot Features


Scrolling and zooming in timeline view


Arrows to represent messages


Click on arrows and states for details


Histograms of state durations, message
bandwidth


Mountain range view to aggregate state info


Can select/deselect states, messages

18

Timelines and Mountain Range

19

“Unusual” Jumpshot Features


Scrolling and zooming in histogram view


Can focus on extreme durations/bandwidths


calculate top/bottom 1%, 5%, ... based on assumed normal
distribution


blink corresponding state instances, arrows; can help locate
“outlier” events in large confusing display.


Can scroll timelines individually to fine-tune clock synchronization


Inherited from Java:


portability


can be run as applet


fancy GUI features

» multiple look-and-feel

» tear-off subwindows
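
The "top/bottom 1%, 5%" selection based on an assumed normal distribution can be sketched as below. The function name `extreme_durations` is hypothetical, and fitting the population mean and standard deviation to the observed durations is one plausible reading of the slide, not a description of Jumpshot's code.

```python
from statistics import NormalDist, mean, pstdev


def extreme_durations(durations, fraction=0.05):
    """Return (low, high): durations in the bottom/top `fraction` tails
    of a normal distribution fitted to the data."""
    mu, sigma = mean(durations), pstdev(durations)
    if sigma == 0:
        return [], []                      # all equal: nothing is extreme
    dist = NormalDist(mu, sigma)
    lo_cut = dist.inv_cdf(fraction)        # e.g. 5th percentile of the fit
    hi_cut = dist.inv_cdf(1 - fraction)    # e.g. 95th percentile of the fit
    low = [d for d in durations if d < lo_cut]
    high = [d for d in durations if d > hi_cut]
    return low, high
```

The state instances whose durations land in `low` or `high` are the ones a display would blink to draw the eye to outliers.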

20

Java Issues - Good


Portable (Sun, SGI, RS6000, Windows, NT)


Can be run either as normal X application or
with a browser as an applet


Graphics are fast enough


Widget set is more than adequate for GUI construction - Swing.

21

Java Issues - Awkward


Java still rapidly evolving in this area (1.0 - 1.1.2 - 1.1.6 - 1.2beta)


made many big changes during two-month period to deal with bugs,
re-implemented features


add-ons also evolving (e.g. Swing)


applet behavior not completely consistent with
application behavior


Inconveniences arising from the fact that a Java program is not
self-contained (e.g., CLASSPATH environment variable)


partially resolved with use of JRE


We are still committed.

22

Jumpshot Distribution


Jumpshot currently comes with


CLOG library for creating logfiles, uses any MPI


mpe logging library with ALOG/CLOG switch


MPI profiling library for automatic instrumentation
of MPI programs


Is distributed


as part of MPICH distribution (version 1.1.1)


separately as part of mpe library, for use with any
MPI implementation


separately, as Jumpshot only

23

Future Work


Meet more of the requirements


connection to code (by logging __FILE__, __LINE__)


select by MPI communicator (not so easy, because of
communicator id issue)


vertical scrolling


Research on scalability issues; useful
agglomerations of data


Research in automated detection of performance
anomalies

24

The End