Accelerating Scientific Exploration Using Workflow Automation Systems

Terence Critchlow (LLNL)
Ilkay Altintas (SDSC)
Scott Klasky (ORNL)
Mladen Vouk (NCSU)
Steve Parker (Univ. of Utah)
Bertram Ludaescher (UC Davis)


SIAM CSE Conference

February, 2007


UCRL-PRES-228193


What is a “scientific workflow”?

Definition: A workflow is a predefined sequence of actions that performs a specific task.

Can be arbitrarily complex
Conditionals, loops / iterations, parallel execution
Human interactions

A scientific workflow is any workflow performed in order to accomplish a larger scientific goal

Scientific workflows exist in all domains

[Figures: Promoter Identification; ROADNet workflow. Courtesy of A. Rajasekar, SDSC]

If we can automate a workflow, application scientists can spend more time doing science

An executable workflow is defined within a tool in a way that allows the task to be run

There are many workflow engines available
http://kepler-project.org/

A “Director” is responsible for task scheduling
An “Actor” is a single task that the workflow needs to schedule
I/O Ports connect actors
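As a rough illustration of the three concepts above, the following Python sketch models actors whose ports are connected to one another, and a director that schedules them. It is not Kepler's API; the class names `Actor` and `Director`, the queue-per-port design, and the "fire when all inputs are present" rule are assumptions made for this example only.

```python
from collections import deque


class Actor:
    """A single task. It can fire once every input port holds a token.

    Assumes n_inputs >= 1 (sources are seeded by the director instead).
    """

    def __init__(self, name, func, n_inputs=1):
        self.name = name
        self.func = func
        self.inputs = [deque() for _ in range(n_inputs)]  # one queue per input port
        self.links = []  # (downstream actor, downstream port index)

    def connect(self, downstream, port=0):
        """Wire this actor's output port to an input port of another actor."""
        self.links.append((downstream, port))

    def ready(self):
        return all(len(queue) > 0 for queue in self.inputs)


class Director:
    """Schedules actors: keeps firing any actor whose inputs are all present."""

    def __init__(self, actors):
        self.actors = actors

    def run(self, seed_tokens):
        # seed_tokens: iterable of (actor, port index, value) that start the graph
        for actor, port, value in seed_tokens:
            actor.inputs[port].append(value)
        results = {}
        fired = True
        while fired:
            fired = False
            for actor in self.actors:
                if actor.ready():
                    args = [queue.popleft() for queue in actor.inputs]
                    value = actor.func(*args)
                    results[actor.name] = value
                    for downstream, port in actor.links:
                        downstream.inputs[port].append(value)
                    fired = True
        return results


double = Actor("double", lambda x: 2 * x)
add = Actor("add", lambda a, b: a + b, n_inputs=2)
double.connect(add, port=0)

# The order in which actors are listed does not matter; data availability does.
results = Director([add, double]).run([(double, 0, 3), (add, 1, 10)])
print(results)
```

Here `add` cannot fire until `double` has delivered a value to its first port, so the director decides the execution order from the data dependencies rather than from the order in which the actors were declared.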

Creating an executable workflow requires precisely defining what needs to be done

Submit a batch job to a supercomputer
When the job starts running
  Track progress of simulation
  Move output files to an archive
  Move output files to analysis machine
Clean up
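The steps listed above can be written down precisely as a small control loop. The sketch below is only an illustration in Python: `FakeBatchSystem`, its `submit` and `poll` methods, and the file names are hypothetical stand-ins, not the interface of any real scheduler or of the workflow built by the authors.

```python
import time


class FakeBatchSystem:
    """Hypothetical stand-in for a real batch scheduler; not a real API."""

    def __init__(self):
        self.polls = 0

    def submit(self):
        return "job-1"

    def poll(self, job_id):
        # Pretend: queued on the first poll, running on the next two polls
        # (one new output file each time), finished on the fourth poll.
        self.polls += 1
        if self.polls == 1:
            return "queued", []
        if self.polls <= 3:
            return "running", [f"out_{self.polls - 1}.dat"]
        return "done", []


def run_simulation_workflow(batch, poll_seconds=0.0):
    steps = []  # ordered record of what the workflow did
    job_id = batch.submit()
    steps.append(f"submit {job_id}")
    while True:
        state, new_files = batch.poll(job_id)
        if state == "done":
            break
        if state == "running":
            steps.append("track progress")
            for name in new_files:
                steps.append(f"archive {name}")
                steps.append(f"move to analysis machine {name}")
        time.sleep(poll_seconds)  # wait before asking the scheduler again
    steps.append("clean up")
    return steps


steps = run_simulation_workflow(FakeBatchSystem())
for step in steps:
    print(step)
```

Nothing happens to output files while the job is still queued; archiving and moving files only start once the scheduler reports the job as running, and cleanup runs once at the end.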

Overall architect (& prototypical user): Scott Klasky (ORNL)

WF design & implementation: Norbert Podhorszki (UC Davis)

Execution Log (=> Data Provenance)

Splitting output enables parallel processing of same data

Each actor executes in parallel as long as it has needed inputs

Submit job

Monitor progress and do analysis

Cleanup
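The idea of splitting one output so that several actors can work on the same data in parallel can be sketched with Python's standard `concurrent.futures` module. The branch functions `log_status`, `archive`, and `make_image` and the helper `fan_out` are hypothetical names chosen for this example; a workflow engine would do the equivalent wiring graphically.

```python
from concurrent.futures import ThreadPoolExecutor


def log_status(data):
    return f"logged {len(data)} values"


def archive(data):
    return f"archived {len(data)} values"


def make_image(data):
    return f"image with max={max(data)}"


def fan_out(data, branches):
    """Send the same data to every branch and run the branches concurrently."""
    with ThreadPoolExecutor(max_workers=len(branches)) as pool:
        futures = {branch.__name__: pool.submit(branch, data) for branch in branches}
        # result() blocks until that branch has finished
        return {name: future.result() for name, future in futures.items()}


outputs = fan_out([1, 5, 3], [log_status, archive, make_image])
print(outputs)
```

Each branch starts as soon as its input is available and does not wait for the others, which is the behaviour the callouts describe.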

Wait for files to appear

Convert files to new data format

Send files to archive

Generate image

Configure parameters based on user and machine

Image generated using SCIRun (Univ of Utah)

Now that I have an executable workflow, so what?

Instead of performing the task by hand each time, you can update the workflow parameters, start the workflow executing, and do other things

Monitoring for files
File transfer with automatic restart on failure
Automatic generation of images


Mundane data management tasks are taken care of

Actors can be reused across workflows


Workflow executes in parallel


Logging, archiving, and image generation proceed in parallel without additional coding

Provenance tracking

Log files both reflect the current status of the simulation run and provide a permanent record of execution

Improved provenance tracking is a major focus of ongoing work.
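One simple way to picture an execution log that doubles as a provenance record is to wrap every step so that its inputs, output, and timing are recorded. The Python sketch below assumes a hypothetical helper `run_logged` and an in-memory list as the log; it is not the logging format used by the workflow shown in the slides.

```python
import json
import time


def run_logged(step_name, func, inputs, log):
    """Run one step and append a record of what went in and what came out."""
    started = time.time()
    output = func(*inputs)
    log.append({
        "step": step_name,
        "inputs": list(inputs),
        "output": output,
        "started": started,
        "finished": time.time(),
    })
    return output


log = []
doubled = run_logged("double", lambda x: 2 * x, (21,), log)
shifted = run_logged("shift", lambda x, offset: x + offset, (doubled, 1), log)

# Serialising each record as one JSON line gives a permanent, replayable trail.
for record in log:
    print(json.dumps(record))
```

While the run is in progress, the last record shows the current status; afterwards, the full list shows which inputs produced which outputs.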

Scientific workflow automation has the potential to reduce the data management burden

As experimental and simulation data grows, managing the data efficiently becomes increasingly important

Scientific workflow technology removes much of the mundane data management burden, freeing scientists to do science


"The CIPRES project has as a key goal the creation of software infrastructure that allows developers in the community to easily contribute new software tools, ... The modular nature of Kepler met our requirements, as it is a JAVA platform that allows users to construct linear, looping, and complex workflows from just the kinds of components the CIPRES community is developing. By adopting this tool, we were able to focus on developing appropriate framework and registry tools for our community, and use the friendly Kepler user application interface as an entrée to our services. We are very excited about the progress we have made, and think the tool will be revolutionary for our user base."

- Mark A. Miller, PI, NSF CIPRES project, 2006

This work was performed under the auspices of the U.S.
Department of Energy by University of California Lawrence
Livermore National Laboratory under contract No. W-7405-ENG-48.