presentation - NOBUGS 2008 Overview


A survival toolkit for the technology outback

Michael Aivazis (aivazis@caltech.edu)
California Institute of Technology

NOBUGS (ok, maybe a few bugs)
Sydney, 3-5 November 2008


Lay of the land


Evolutionary pressures


Alternatives: evolve, uplift or perish…


Debunking stereotypes


Dymaxion map (B. Fuller)



User stereotypes


End-user


occasional user of prepackaged and specialized analysis tools


Application author


author of prepackaged specialized tools


Expert user


investigator with a specific scientific goal


Domain expert


author of analysis, modeling or simulation software


Software integrator


responsible for extending software with new technology


Framework maintainer


responsible for maintaining and extending the infrastructure


Technical challenges


Sources of complexity


Project size:


asset complexity: number of lines of code, files, entry points


dependencies: number of modules, third-party libraries


runtime complexity: number of object types and instances


Problem size:


number of processors needed, amount of memory, cpu time


Project longevity:


life cycle, duty cycle


cost/benefit of reuse


managing change: people, hardware, technologies


Locality of needed resources


compute/persist: where, how, when, who


User interfaces


younger users will have no tolerance for either “bad” or “ugly”


Adapting to new technology



Turning craft into: science, engineering, … art


Past successes


Projects:


Caltech ASC Center (DOE)


GeoFramework (NSF)


Computational Infrastructure in Geodynamics (NSF)

DANSE (NSF)

Caltech PSAAP Center (DOE)



Large collaborations


faculty, post-docs, students


geographically distributed



Challenges


independent but coherent evolution


integration


Leveraging


Flexibility through scripting


Scripting enables us to


Organize the large number of simulation parameters


Allow the simulation environment to discover new capabilities without the need for recompilation or re-linking


Integration framework is written in Python


The interpreter


modern object oriented language


robust, portable, mature, well supported, well documented


easily extensible


rapid application development


Support for parallel programming


trivial embedding of the interpreter in an MPI compliant manner


a python interpreter on each compute node


MPI is fully integrated: bindings + OO layer


No measurable impact on either performance or scalability
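The point about discovering new capabilities without recompilation or re-linking can be illustrated with a minimal sketch. It is written for Python 3 and uses only the standard `importlib` module; the function name `load_capability` is hypothetical and is not part of pyre.

```python
import importlib


def load_capability(module_name, attribute):
    """Import a module by name at run time and return one of its attributes."""
    module = importlib.import_module(module_name)
    return getattr(module, attribute)


# the capability is selected by a string, e.g. one read from a configuration file
sqrt = load_capability("math", "sqrt")
result = sqrt(16.0)
```

Because the module is named by a string, a new capability only needs to be importable; nothing in the calling code has to be rebuilt.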



Pyre


Pyre is a software architecture:

a specification of the organization of the software system

a description of the crucial structural elements and their interfaces

a specification for the possible collaborations of these elements

a strategy for the composition of structural and behavioral elements

Pyre is multi-layered


flexibility


complexity management


robustness under evolutionary pressures




Pyre is a component framework



Figure labels: application-general, application-specific, framework, computational engines


Choosing your gear


Some pyre services


journal


flexible control over the generation and delivery of simulation diagnostics from the compute nodes to the workstation


monitor


a distributed service for low bandwidth, on the fly visualizations


currently used mostly for status monitoring and debugging


timer: embedded performance monitor


ipa: user authentication


passwords, SSL certificates, … Grid authentication


weaver


a general source code generation facility


support for many languages


FORTRAN, C, C++, python, HTML, XML


automatic web page creation for cgi scripts


blade: a toolkit-independent UI generator


opal: web based UI and application hosting


pyre based cgi scripts


auto-generation of html/javascript


ajax support in progress (jQuery)
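The idea behind the journal service, named diagnostic channels whose generation and delivery are controlled independently, can be approximated with the standard `logging` module. This is a minimal Python 3 sketch, not pyre's journal API; the channel names `demo.solver` and `demo.mesher` and the `ListHandler` class are hypothetical.

```python
import logging


class ListHandler(logging.Handler):
    """Deliver records to an in-memory list instead of a terminal."""

    def __init__(self):
        super().__init__()
        self.messages = []

    def emit(self, record):
        self.messages.append("%s: %s" % (record.name, record.getMessage()))


delivery = ListHandler()

# two named diagnostic channels that share one delivery mechanism
solver = logging.getLogger("demo.solver")
mesher = logging.getLogger("demo.mesher")
for channel in (solver, mesher):
    channel.addHandler(delivery)
    channel.propagate = False

solver.setLevel(logging.DEBUG)   # active: debugging output is delivered
mesher.setLevel(logging.ERROR)   # quiet: only errors are delivered

solver.debug("residual = %g", 1e-6)
mesher.debug("generated %d cells", 1000)
mesher.error("inverted element")
```

Generation is decided per channel by its level, while delivery is decided by the handler, so either can be changed without touching the code that emits the messages.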


Distributed computing


gsl:


a package that completely encapsulates the middleware



provides both user space and grid-enabled solutions



User space:


ssh, scp


pyre service factories and component management


Web services


full pyre/opal support for “science gateways”

pyGridWare from Keith Jackson’s group



Advanced features


dynamic discovery for optimized deployment


reservation system for computational resources



Pyre components


Component based solutions are ideal for complex systems

encourage the decomposition of the problem into manageable functional units

expose the interaction mechanisms between these units

enable the nearly independent evolution of the parts

Component frameworks enable an incremental and evolutionary approach

existing codes can start producing results immediately

new services can be incorporated incrementally


Figure labels: Component, input ports, output ports, properties, component core, name, control


Component anatomy


Core: encapsulation of computational engines

middleware that manages the interaction between the framework and codes written in low level languages

Harness: an intermediary between a component’s core and the external world


framework services:


control


port deployment


core services:


deployment


launching


teardown
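The anatomy described above, a core wrapped by a harness that exposes a name, properties and ports, can be sketched in plain Python 3. The `Component` class, the `connect` helper and the two core functions are hypothetical illustrations and do not reproduce pyre's classes.

```python
class Component:
    """A named unit with properties, ports, and a core that does the work."""

    def __init__(self, name, core, **properties):
        self.name = name
        self.core = core                  # callable standing in for the engine
        self.properties = dict(properties)
        self.inputs = {}                  # input ports: port name -> value
        self.outputs = {}                 # output ports: port name -> value

    def run(self):
        """Harness: hand inputs and properties to the core, publish its results."""
        self.outputs = self.core(self.inputs, self.properties)
        return self.outputs


def connect(source, out_port, sink, in_port):
    """Copy the value on an output port to an input port of another component."""
    sink.inputs[in_port] = source.outputs[out_port]


def make_samples(inputs, props):
    return {"samples": [props["start"] + i for i in range(props["count"])]}


def scale_samples(inputs, props):
    return {"scaled": [props["factor"] * x for x in inputs["samples"]]}


source = Component("source", make_samples, start=1, count=3)
scaler = Component("scaler", scale_samples, factor=10)

source.run()
connect(source, "samples", scaler, "samples")
result = scaler.run()
```

The cores only see dictionaries of inputs and properties, so either one can be replaced without the other noticing.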


Component cores


Three tier encapsulation of access to computational engines

engine

bindings

facility implementation by extending abstract framework services

Cores enable the lowest integration level available

suitable for integrating large codes that interact with one another by exchanging complex data structures

UI: text editor
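The three tiers listed above can be sketched as follows. This is a Python 3 illustration only: in practice the engine would typically be compiled code and the bindings an extension module, whereas here a pure Python trapezoid rule stands in for the engine. All names (`engine_integrate`, `integrate_binding`, `Integrator`, `TrapezoidIntegrator`) are hypothetical.

```python
import math


# tier 1: the engine, standing in for a compiled numerical library
def engine_integrate(f, a, b, n):
    """Composite trapezoid rule on n equal intervals."""
    h = (b - a) / n
    total = 0.5 * (f(a) + f(b))
    for i in range(1, n):
        total += f(a + i * h)
    return h * total


# tier 2: the bindings, a thin layer that adapts argument conventions
def integrate_binding(function, interval, resolution):
    a, b = interval
    return engine_integrate(function, float(a), float(b), int(resolution))


# tier 3: the abstract service and an implementation built on the bindings
class Integrator:
    """Abstract service: anything with an integrate(function, interval) method."""

    def integrate(self, function, interval):
        raise NotImplementedError


class TrapezoidIntegrator(Integrator):
    def __init__(self, resolution=1000):
        self.resolution = resolution

    def integrate(self, function, interval):
        return integrate_binding(function, interval, self.resolution)


area = TrapezoidIntegrator().integrate(math.sin, (0.0, math.pi))
```

Callers depend only on the `Integrator` interface, so a different engine can be substituted by writing another subclass.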



Figure labels: public interface, bindings, computational engine, core


Application archiving


Produce a fully repeatable execution by recording


scripts


user choices


sources (cvs/svn tags or even the files themselves)


build procedure


required third party libraries


version of as many runtime components as can be determined


generated data sets (urls, actual files)


Implementation


meta-data in PostgreSQL

HDF5

embed XML meta-data


parsed for deducing the layout of the file as format evolves


can be extracted for easy indexing
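A minimal sketch of the recording step, assuming the archive is a JSON manifest rather than the PostgreSQL and HDF5 storage named above. It is written for Python 3 with the standard library only; `build_manifest`, its field names and the library entry `example-lib` are hypothetical.

```python
import hashlib
import json
import platform
import sys


def build_manifest(script_text, user_choices, libraries):
    """Collect what is needed to repeat a run into a JSON-serializable dict."""
    return {
        "script_sha256": hashlib.sha256(script_text.encode("utf-8")).hexdigest(),
        "user_choices": dict(user_choices),
        "libraries": dict(libraries),
        "python": platform.python_version(),
        "platform": sys.platform,
    }


manifest = build_manifest(
    script_text="print('Hello world!')\n",
    user_choices={"friend": "Michael"},
    libraries={"example-lib": "1.2.3"},   # hypothetical third party library
)
record = json.dumps(manifest, sort_keys=True, indent=2)
```

Hashing the script keeps the manifest small while still detecting any change to the source that produced a data set.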



Services for computational engines


Normal engine life cycle:


deployment


staging, instantiation, static initialization, dynamic initialization, resource allocation


launching


input delivery, execution control, hauling of output


teardown


resource de-allocation, archiving, execution statistics


Exceptional events


core dumps, resource allocation failures


diagnostics: errors, warnings, informational messages


monitoring: debugging information, self consistency checks


Distributed computing


Parallel processing
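The normal life cycle and the handling of exceptional events can be sketched as a small driver. This is a Python 3 illustration under the assumption that an engine exposes `deploy`, `launch` and `teardown` methods; the `Engine` class and `run_engine` function are hypothetical.

```python
class Engine:
    """Stand-in engine that records the life cycle calls it receives."""

    def __init__(self, fail=False):
        self.fail = fail
        self.log = []

    def deploy(self):
        self.log.append("deploy")

    def launch(self):
        self.log.append("launch")
        if self.fail:
            raise RuntimeError("resource allocation failure")
        return "output"

    def teardown(self):
        self.log.append("teardown")


def run_engine(engine):
    """Drive deployment, launching and teardown; teardown always runs."""
    diagnostics = []
    output = None
    engine.deploy()
    try:
        output = engine.launch()
    except RuntimeError as error:
        diagnostics.append("error: %s" % error)
    finally:
        engine.teardown()
    return output, diagnostics


ok = run_engine(Engine())
failed = run_engine(Engine(fail=True))
```

Placing teardown in the `finally` clause means resources are released whether the launch succeeds or raises.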


HelloApp: hello world


    # access to the base class
    from pyre.application.Application import Application


    class HelloApp(Application):

        def main(self):
            print "Hello world!"
            return

        def __init__(self):
            Application.__init__(self, "hello")
            return


    # main
    if __name__ == "__main__":
        app = HelloApp()
        app.run()

Output:

    > ./hello.py
    Hello world!


Properties



Named attributes that are under direct user control


automatic conversions from strings to all supported types


Properties have


name


default value


optional validator functions


Accessible from pyre.inventory

factory methods: str, bool, int, float, sequence, dimensional

validators: less, greater, range, choice

    import pyre.inventory

    flag = pyre.inventory.bool(name="some-flag", default=True)
    style = pyre.inventory.str(name="my-style", default="boring")
    scale = pyre.inventory.float(
        name="scale", default=1.0,
        validator=pyre.inventory.greater(0))

You can derive your own property type from pyre.inventory.Property
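How a property might combine a name, a default value, string conversion and a validator can be sketched with a Python 3 descriptor. This is an illustration of the idea only, not pyre's implementation; the `Property` and `Settings` classes and this `greater` function are hypothetical stand-ins.

```python
def greater(bound):
    """Validator factory: accept values strictly greater than bound."""
    def check(value):
        if not value > bound:
            raise ValueError("%r is not greater than %r" % (value, bound))
        return value
    return check


class Property:
    """Descriptor with a name, a converter, a default and an optional validator."""

    def __init__(self, name, convert, default, validator=None):
        self.name = name
        self.convert = convert
        self.default = default
        self.validator = validator

    def __get__(self, instance, owner=None):
        if instance is None:
            return self
        return instance.__dict__.get(self.name, self.default)

    def __set__(self, instance, value):
        if isinstance(value, str):
            value = self.convert(value)      # e.g. text from the command line
        if self.validator is not None:
            value = self.validator(value)
        instance.__dict__[self.name] = value


class Settings:
    scale = Property("scale", float, default=1.0, validator=greater(0))


settings = Settings()
default_scale = settings.scale
settings.scale = "2.5"
```

Assigning the string `"2.5"` stores the float `2.5`, while assigning `"-1"` raises `ValueError` and leaves the previous value in place.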


HelloApp: adding properties


    from pyre.application.Application import Application


    class HelloApp(Application):

        class Inventory(Application.Inventory):

            import pyre.inventory

            friend = pyre.inventory.str("friend", default="world")







HelloApp: using properties


Now you can say hello to your friend…


    from pyre.application.Application import Application


    class HelloApp(Application):

        def main(self):
            print "Hello %s!" % self.inventory.friend
            return

        def __init__(self):
            Application.__init__(self, "hello")
            return

    > ./hello.py --friend="Michael"
    Hello Michael!
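The route from a command line argument to a property value can be sketched without pyre. This Python 3 example assumes arguments of the form `--name=value` keyed by the property name `friend` declared in the inventory; `parse_settings` and `greeting` are hypothetical helpers, not framework functions.

```python
def parse_settings(argv, defaults):
    """Apply --name=value arguments on top of a dictionary of defaults."""
    settings = dict(defaults)
    for argument in argv:
        if not argument.startswith("--") or "=" not in argument:
            continue                      # ignore anything else in this sketch
        name, value = argument[2:].split("=", 1)
        if name not in settings:
            raise KeyError("unknown property: %s" % name)
        settings[name] = value
    return settings


def greeting(argv):
    settings = parse_settings(argv, {"friend": "world"})
    return "Hello %s!" % settings["friend"]
```

Calling `greeting([])` falls back to the default, and `greeting(["--friend=Michael"])` uses the value supplied by the user; the shell removes the quotes around `"Michael"` before the script sees the argument.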


Units



Properties can have units:


the framework provides the type dimensional

Support for units is in pyre.units

all SI base and derived units

most common abbreviations and alternative unit systems

correct handling of all arithmetic operations

addition, multiplication, functions from math

parsing expressions from the command line

    import pyre.inventory
    from pyre.units.time import s, hour
    from pyre.units.length import m, km, mile

    speed = pyre.inventory.dimensional(
        name="speed", default=50*mile/hour)

    v = pyre.inventory.dimensional(
        name="velocity", default=(0.0*m/s, 0.0*m/s, 10*km/s))
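What "correct handling of arithmetic operations" means for quantities with units can be shown with a small stand-in class. This Python 3 sketch tracks only metres and seconds and is not the `pyre.units` implementation; the `Quantity` class is hypothetical, and the unit names merely mirror those imported above.

```python
class Quantity:
    """A value in SI units plus the exponents of (metre, second)."""

    def __init__(self, value, dims=(0, 0)):
        self.value = value
        self.dims = tuple(dims)

    def __mul__(self, other):
        if isinstance(other, Quantity):
            dims = tuple(a + b for a, b in zip(self.dims, other.dims))
            return Quantity(self.value * other.value, dims)
        return Quantity(self.value * other, self.dims)

    __rmul__ = __mul__

    def __truediv__(self, other):
        if isinstance(other, Quantity):
            dims = tuple(a - b for a, b in zip(self.dims, other.dims))
            return Quantity(self.value / other.value, dims)
        return Quantity(self.value / other, self.dims)

    def __add__(self, other):
        if not isinstance(other, Quantity) or self.dims != other.dims:
            raise TypeError("cannot add quantities with different dimensions")
        return Quantity(self.value + other.value, self.dims)


m = Quantity(1.0, (1, 0))
s = Quantity(1.0, (0, 1))
km = 1000 * m
hour = 3600 * s
mile = 1609.344 * m

speed = 50 * mile / hour
```

Multiplication and division combine the exponents, while addition refuses operands whose dimensions differ, so `m + s` raises `TypeError`.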



Parallel HelloApp


    from mpi.Application import Application


    class HelloApp(Application):

        def main(self):
            import mpi
            world = mpi.world()
            print "[%03d/%03d] Hello world" % (world.rank, world.size)
            return

        def __init__(self):
            Application.__init__(self, "hello")
            return


    # main
    if __name__ == "__main__":
        app = HelloApp()
        app.run()


Facilities and components



A design pattern that enables the assembly of application components at run time under user control

Facilities are named abstract application requirements

Components are concrete named engines that satisfy the requirements



Dynamic control:


the application script author provides


a specification of application facilities as part of the Application definition

a component to be used as the default

the user can construct scripts that create alternative components that comply with the facility interface


the end user can


configure the properties of the component


select which component is to be bound to a given facility at runtime
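The division of roles described above (the author declares a facility with a default component, the end user selects another at run time) can be sketched in plain Python 3. The classes `Greeter`, `Plain`, `Formal` and this `HelloApp`, along with the `registry` dictionary, are hypothetical and do not use pyre.

```python
class Greeter:
    """Facility interface: anything that provides greet(name)."""

    def greet(self, name):
        raise NotImplementedError


class Plain(Greeter):
    def greet(self, name):
        return "Hello %s!" % name


class Formal(Greeter):
    def greet(self, name):
        return "Good day, %s." % name


class HelloApp:
    # the facility: a named requirement together with its default component
    facilities = {"greeter": Plain}

    def __init__(self, registry, choices=None):
        choices = choices or {}
        self.components = {}
        for facility, default in self.facilities.items():
            selected = choices.get(facility)
            factory = registry[selected] if selected else default
            self.components[facility] = factory()

    def main(self, friend):
        return self.components["greeter"].greet(friend)


registry = {"plain": Plain, "formal": Formal}
default_app = HelloApp(registry)
formal_app = HelloApp(registry, choices={"greeter": "formal"})
```

The application code in `main` is identical in both cases; only the binding of the `greeter` facility changes.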


Inversion of control



A feature of component frameworks


applications require facilities and invoke the services they promise


component instances that satisfy these requirements are injected at the latest possible time



The pyre solution to this problem


eliminates the complexity by using "service locators"


takes advantage of the dynamic programming possible in python


treats components and their initialization state fully symmetrically


provides simple but acceptable persistence (performance, scalability)


XML files, python scripts


databases (PostgreSQL, MySQL)


can easily take advantage of other object stores


is ideally suited for both parallel and distributed applications
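A service locator in its simplest form maps names to factories and builds an instance only when it is first requested. The sketch below is generic Python 3 and makes no claim about how pyre implements its locators; `ServiceLocator`, `Application` and `make_square_solver` are hypothetical names.

```python
class ServiceLocator:
    """Map service names to factories; build instances only when requested."""

    def __init__(self):
        self._factories = {}
        self._instances = {}

    def register(self, name, factory):
        self._factories[name] = factory
        self._instances.pop(name, None)   # a new binding replaces any old instance

    def locate(self, name):
        if name not in self._instances:
            self._instances[name] = self._factories[name]()
        return self._instances[name]


class Application:
    """The application asks for services by name; it never constructs them."""

    def __init__(self, locator):
        self.locator = locator

    def main(self):
        return self.locator.locate("solver")(2.0)


created = []


def make_square_solver():
    created.append("square")
    return lambda x: x * x


locator = ServiceLocator()
locator.register("solver", make_square_solver)
app = Application(locator)
created_before_main = list(created)
answer = app.main()
```

Nothing is constructed when the application is created; the solver comes into existence on the first call to `locate`, which is the "latest possible time" in this sketch.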


Are we there yet?


Cost/benefit … rationalizations



Drawbacks


some reengineering required


paradigm shift


learning curve


not helped by the (current) lack of documentation…



Benefits


clear path forward for “legacy” applications


easy, normalized access to a large number of facilities

structured way for enabling engines in modern computational environments

rigorous separation of UI from computational engines

easy re-hosting of compliant applications


Forecast


Some things change


languages, platforms, networks


algorithms, tools, processes






Some things don’t


people: your colleagues, your customers, your boss, his boss…


money/time always short


there will always be bugs


users never know what they want


but they know how you should implement it…


security always annoying but necessary


collaboration is difficult


see items above



Design for “constrained change”
