Objective - ViroLab Virtual Laboratory

SCS Colloquium - UvA, Amsterdam, 26.09.2008
Environment for Collaborative e-Science Applications
Piotr Nowakowski
ACC CYFRONET AGH
AGH University of Science and Technology
Krakow, Poland
P.Nowakowski@cyfronet.pl

System-Level Science
An approach to scientific investigations which, besides analyses of individual phenomena, integrates different, interdisciplinary sources of knowledge about a complex system, to acquire understanding of the system as a whole.
Foster, I., Kesselman, C., Scaling system-level science: Scientific exploration and its implications. IEEE Computer 39 (11) 2006
Outline

Motivation – applications, requirements

Idea of an “experiment”

Method

scripts, grid object abstraction

Examples of experiments

Architecture in short

Demos, demos, demos …

Summary
Users
[Use-case diagram]
Actors: Experiment developer, Scientist, Clinical virologist
Use-case groups: Experiment planning, Experiment use, Decision Support System use
Use cases and their notes (three <<include>> relations connect use cases):
  ViroLab Gem Development: publishes new ViroLab Gems
  Data Source registration: adds data resources inside the virtual lab
  Experiment Planning: uses various ViroLab Gems and available data resources to create experiments
  Experiment Execution: runs prepared experiments to obtain results
  Results management
  Results sharing: discusses and analyses the results
  Results storing: stores the results in the laboratory data store
  Experience feedback: helps the developer improve the experiment
  Decision support: the DSS relies on rules to give information on drug resistance; results regarding drug resistance of virus mutants may become new rules for the DSS
Required Functionality

Development:
  focus on computational functionality
  easy access to remote data
Experiment sharing:
  creation of experiments by teams of developers
Browsing experiments:
  web application for browsing and executing experiments
Running experiments:
  single-click experiment execution
  interactive communication between experiment and user
Gathering results:
  dedicated renderers for script input and output parameters
  provenance storage and searching
Feedback:
  easy communication between end users and developers
Layered Structure of Environment
[Layer diagram, top to bottom]
Users: Experiment developer, Scientist, Clinical Virologist
Interfaces: Experiment Planning Environment, Experiment scenario, ViroLab Portal, Patient Treatment Support
Runtime: Virtual Laboratory runtime components (required to select resources and execute experiment scenarios)
Services:
  Computational services (services (WS, WTS, WS-RF), components (MOCCA), jobs (EGEE, AHE))
  Data services (DAS data sources, standalone databases)
Infrastructure: Grids, Clusters, Computers, Network
Experiment

Experiment - a process that combines data with a set of activities that act on that data
Experiment Lifecycle

Experiment Pipeline - a collaborative planning and execution process that may create a new experiment
High-level Script Approach

Application composition based on a dynamic scripting language
  Rapid application development, prototyping, experiments (scientific applications)
High-level API
  Rich functionality (standard library), few lines of code needed
  Easy to learn, clear and readable code
  Supports full set of control structures (high expressiveness)
Candidate languages
  Python, Ruby, Lua, Perl …
Solution: JRuby
  Object oriented, clear syntax, dynamic
  Integration with Java (important for use of existing technologies)
Uniform interface to computational resources provided by the Grid Operation Invoker library
  high level of abstraction
Grid Object Abstraction

Grid Object Class – a group that has similar functionality with regard to its domain operations
Grid Operation – specification of an activity with descriptions of input parameters and output results
Grid Object Implementation – realization of a Grid Object Class in a concrete technology (e.g. WS, MOCCA)
Grid Object Instance – an implementation that is deployed and can be accessed by the Invoker
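
The four terms above can be sketched as a small data model. This is only an illustration in plain Ruby: every class, operation name and endpoint below is hypothetical and not taken from the GridSpace code base, and a lambda stands in for the remote code that a real implementation would call.

```ruby
# Hypothetical model of the Grid Object abstraction (not the GridSpace API).

# Grid Operation: an activity described by its inputs and its output.
GridOperation = Struct.new(:name, :inputs, :output)

# Grid Object Class: a named group of domain operations.
class GridObjectClass
  attr_reader :name, :operations

  def initialize(name, operations)
    @name = name
    @operations = operations.map { |op| [op.name, op] }.to_h
  end
end

# Grid Object Implementation: the class realized in one concrete technology.
class GridObjectImplementation
  attr_reader :grid_class, :technology

  def initialize(grid_class, technology, handlers)
    @grid_class = grid_class
    @technology = technology
    @handlers = handlers # operation name => callable standing in for remote code
  end

  def call(operation, args)
    @handlers.fetch(operation).call(*args)
  end
end

# Grid Object Instance: a deployed implementation reachable at an endpoint.
class GridObjectInstance
  attr_reader :implementation, :endpoint

  def initialize(implementation, endpoint)
    @implementation = implementation
    @endpoint = endpoint
  end

  # Checks the call against the operation specification, then delegates.
  def invoke(operation, *args)
    spec = implementation.grid_class.operations.fetch(operation) {
      raise NoMethodError, "unknown Grid Operation: #{operation}"
    }
    unless args.size == spec.inputs.size
      raise ArgumentError, "#{operation} expects #{spec.inputs.size} argument(s)"
    end
    implementation.call(operation, args)
  end
end

reverse = GridOperation.new(:reverse, [:sequence], :sequence)
tools = GridObjectClass.new('example.SequenceTools', [reverse])
impl = GridObjectImplementation.new(tools, :ws, { reverse: ->(s) { s.reverse } })
instance = GridObjectInstance.new(impl, 'http://localhost:8080/tools')
puts instance.invoke(:reverse, 'ACGT') # prints "TGCA"
```

Keeping the operation specification on the class rather than on the implementation means that two implementations in different technologies can be checked against the same contract.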
Sample application
require 'cyfronet/gridspace/goi/core/g_obj'
begin
  # Instantiate Grid Object representative
  drs = GObj.create('org.virolab.DrugRankingSystem2')
  # Prepare input parameters: a simple conversion of a string into an array of strings
  mutations = 'P1M I2L S3T P4Q E6G T7C V10N K11F V35T'.split(' ')
  # Invoke the Grid Operation that finds an optimum drug set for a patient
  # with the given HIV mutations according to the ANRS rule set
  ranking = drs.drs('ANRS', 'reverse_transcriptase', mutations)
  puts ranking
end

Development and Running Applications
Virus Genotype Analysis
http://virolab.cyfronet.pl/trac/vlvl/wiki/ExperimentDemo
Objective: loads the nucleotide sequence of an HIV virus strain and provides its mutations and drug ranking information
Gems used:
  Alignment
  Subtype detection
  Drug ranking
  Data Access Service
Input: virus nucleotide sequence
Output: various analyses

Experiment Plan
# Parameters
patientID = 6
region = "rt"
# Genotype retrieval
remoteDB = DACConnector.new("DAS", "virolab.hlrs.de")
sequences = remoteDB.executeQuery(
  "select nucleotides from nt_sequence where patient_ii=#{patientID.to_s};")
# Alignment + mutation detection
regaDBMutationsTool = GObj.create('regadb.RegaDBMutationsTool')
regaDBMutationsTool.align(sequences, region.upcase)
mutations = regaDBMutationsTool.getResult
# Subtyping
regaDBSubtypingTool = GObj.create('regadb.RegaDBSubtypingTool')
regaDBSubtypingTool.subtype(sequences[0])
puts regaDBSubtypingTool.getResult
# Drug ranking (drs is not created on this slide; presumably it is a
# Drug Ranking System Grid Object obtained with GObj.create beforehand)
puts drs.drs('retrogram', region, 100, mutations)
Protein Folding
http://virolab.cyfronet.pl/trac/exampleExperiments/wiki/exex/Folding
Objective: demonstrate the usage of the Virtual Laboratory for proteomics applications
Input: protein and chain ID
Output: 3D structure of protein
Gems used:
  Protein Data Bank (PDB) Web Service
  Early-stage protein folding
    Bryliński M, Jurkowski W, Konieczny L, Roterman I. Limited conformational space for early-stage protein folding simulation. Bioinformatics 20(2), 199-205 (2004)
  DAC and WebDAV for result storage
Data Mining with Weka
http://virolab.cyfronet.pl/trac/exampleExperiments/wiki/exex/WekaAdv
Objective: to analyze the quality of various classification algorithms on large datasets using the Weka data mining library and the MOCCA component framework.
Input: sample dataset
Output: quality of predictions
Gems used:
  Web services for data retrieval, conversion, splitting and testing
  MOCCA components wrapping algorithms from Weka
  WebDAV server for data storage
Virtual Laboratory Architecture
Experiment Planning Environment
Objective: to facilitate the experiment development process by providing an integrated and collaborative environment
Assist in the development of experiment plans with a powerful GScript editor
Execute experiments by integrating with the GSEngine runtime system
Support collaboration between VL user groups by:
  Sharing experiments within groups of experiment developers
  Releasing experiments for scientific users
  Collecting feedback from experiment users
Enable extensibility with new features (e.g. GRR and Onto Browser plugins)
Experiment Management Interface
Objective: to provide ViroLab users with an easily accessible and convenient facility for experiment and result management
Browse experiment repositories in search of a suitable experiment plan
Interactively run any number of experiment plans and monitor their execution status
Retrieve, analyze and store scientific results
Extensible support for different security models with full Shibboleth integration
Demos, demos, demos …
Technologies Used
GridSphere and Google Web Toolkit as the user portal
Eclipse RCP as the developer UI
JRuby for experiment planning / middleware
OGSA-DAI based on GT4 for data source integration
OWL, Jena, eXist for ontology storage and processing of semantically-rich provenance data
GEMINI, Ganglia and JMX for experiment and infrastructure monitoring
Web Services, XML-RPC for integration
Shibboleth as Authentication and Authorization framework
Scientific Contribution

Development of collaborative applications
A new method of collaborative application development
Abstract layers to hide technological changes
Semantic description of applications
Integration of provenance recording and tracking
Constructing the virtual laboratory
Semantic description of resources
Deployment on available Grid systems, clusters, and single CEs
Integration of WS, WSRF, components, jobs
Secure data access and integration
Software engineering aspects
Team

Runtime system: Tomasz Gubala, Marek Kasztelnik, Piotr Nowakowski, Eryk Ciepiela, Asia Kocot
Middleware: Maciej Malawski, Tomasz Bartynski, Jan Meizner
Presentation and collaboration tools: Wlodzimierz Funika, Dariusz Krol, Daniel Harezlak, Alfredo Tirado
Data Access: Matthias Assel, Stefan Wesner, (Aenne Loehden, Bettina Krammer), Piotr Nowakowski
Provenance: Bartosz Balis, Jakub Wach, Michal Pelczar
Integration: Tomasz Gubala, Marek Kasztelnik
ViroLab Project (Peter Sloot): www.virolab.org
VLvl description, demos, downloads: virolab.cyfronet.pl
Supporting slides
Workflow Systems and Virtual Laboratories
Virtual Laboratory   App construction       Middleware
Kepler               Drag&Drop              Ptolemy II, Globus Toolkit, WS
Triana               Drag&Drop              Globus Toolkit, GAT
myGrid, Taverna      Drag&Drop              Globus Toolkit, WS
Geodise              Matlab scripts         Computational Toolbox, Jython, WS, Java CoG (1.1), GT2
NESSgrid             webpage with tools     Globus Toolkit, WS
VL-e                 Drag&Drop              Globus Toolkit, SoapLab
VL PSNC              JWS apps, batch jobs   Globus Toolkit, GAT

Conclusions:
  There is no solution that fulfills all requirements
  Many useful ideas: semantic modeling, tool registry, provenance tracking etc.
  Using a scripting language is useful (Geodise)
Domain Ontology Store
Objective: to base the integration between middleware components and user interfaces on domain-related taxonomies and concepts.
Domain knowledge is modeled with OWL-transcribed models
Models are stored in a secure, persistent store with an open access protocol
Other components and tools are able to query the store using several query languages
Ontology Browser Plugin for experiment developers to use their domain concepts when planning new experiments
[Diagram: Ontology Store Facade; Domain A Container with Domain A Data Model and Domain A Object Model artifacts; Domain B Container; Experiment Planning Environment with Ontology Browser Plugin; Grid Resources Registry. Edge labels: manages, links to model content, queries model content]
Experiment Repository
Objective: to store and version Experiment Plans, which can be executed by the GSEngine interpreter
Experiment Plans arranged in a specific structure
Experiment Repository Client – provides access to Experiment Repositories
Facilitates releasing Experiment Plans to the repository from EPE
Enables downloading Experiment Plans for execution by EMI and the GSEngine Server interpreter
Enables managing feedback in EMI
Pluggable adapters for different version control systems
Grid Resource Registry

Objective: to store information about all available and accessible computational resources, hide the technological complexity and provide uniform and user-friendly access to web services, component and grid infrastructures
Two layers of resource description:
  technology-independent
  technology-specific
ResBrowser Plugin assists developers during experiment development
GRRAdmin Plugin allows managing information stored inside the registry
[Diagram: GRR Admin Plugin, Resources Browser Plugin, Grid Operation Invoker, GrAppO Optimizer, Registry Service, Registry Core, Registry DB (OR mapping), Registry Notification Listener, Monitoring System, Domain Ontology Store, PROToS. Edge labels: browse registry, administer registry, get tech info, delegate request, notify about resource status, submit event, link to ontologies]
GridSpace Engine
Objective: to provide a Virtual Laboratory engine for running experiments and an entry point for other capabilities of the Virtual Laboratory
Environment for running experiments
  Interpreter of GScript based on the Ruby programming language
  Contains libraries dedicated to the Virtual Laboratory written in Ruby or Java
  Experiments provided with environmental context (security, monitoring, etc.)
Entry point for the Virtual Laboratory
  Proxying Data Access Service
  Running experiments stored within Experiment Repository
  Integration with monitoring infrastructure and provenance
Grid Operation Invoker
Objective: to provide developers of virtual experiments with a uniform interface for computational resources that enables transparent and coherent usage of web services, components and grid infrastructures, thus allowing easy integration of tools exposed with diverse middleware technologies.
Adapters for various middleware technologies
Simple and clear developer API
Full integration with GScript syntax
Usable as a standalone component
[Diagram: GridObj and GObj; Grid Operation Invoker with Optimizer Client (GrAppO), Registry Client (GRR) and adapters WSAdapter, MOCCAAdapter, WTSAdapter, EDGAdapter, AHEAdapter, WSRFAdapter]
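
The adapter idea can be sketched in plain Ruby as follows. This is a toy model, not the Grid Operation Invoker itself: the `...Sketch` classes, the in-memory registry, the endpoints and the use of `method_missing` to turn script calls into operations are all assumptions made for illustration, and the adapters only return a description of the call instead of contacting a remote service.

```ruby
# Hypothetical adapters: each hides one middleware technology behind the
# same invoke(endpoint, operation, args) signature.
class WSAdapterSketch
  def invoke(endpoint, operation, args)
    "SOAP call #{operation}(#{args.join(', ')}) at #{endpoint}"
  end
end

class MOCCAAdapterSketch
  def invoke(endpoint, operation, args)
    "component call #{operation}(#{args.join(', ')}) at #{endpoint}"
  end
end

# Stand-in for a registry lookup: Grid Object Class name => deployed instance.
REGISTRY_SKETCH = {
  'example.Aligner' => { technology: :ws, endpoint: 'http://host-a/aligner' },
  'example.Classifier' => { technology: :mocca, endpoint: 'mocca://host-b/classifier' }
}.freeze

ADAPTERS_SKETCH = { ws: WSAdapterSketch.new, mocca: MOCCAAdapterSketch.new }.freeze

# Proxy handed to the script; any method call is forwarded as an operation.
class GObjSketch
  def self.create(class_name)
    entry = REGISTRY_SKETCH.fetch(class_name) {
      raise ArgumentError, "unknown Grid Object Class: #{class_name}"
    }
    new(ADAPTERS_SKETCH.fetch(entry[:technology]), entry[:endpoint])
  end

  def initialize(adapter, endpoint)
    @adapter = adapter
    @endpoint = endpoint
  end

  def method_missing(operation, *args)
    @adapter.invoke(@endpoint, operation, args)
  end

  def respond_to_missing?(_name, _include_private = false)
    true
  end
end

aligner = GObjSketch.create('example.Aligner')
puts aligner.align('ACGT', 'RT')
```

The script that calls `aligner.align` never names a technology; swapping the registry entry to `:mocca` would change the adapter without touching the script.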
Middleware
Objective: to harness diverse computation models and technologies (Web Services, components, job submission – grid systems) and make them available in the Virtual Laboratory in a convenient way
Pluggable Grid Operation Invoker adapters for supported technologies
Development and experiments with component-based middleware: MOCCA
Component composition
Inter-framework interoperability
Security for component-based middleware
Monitoring of component- and service-oriented middleware
Optimization of component deployment planning

Problem: mapping of components to computing nodes of the Grid
Model: weights of components and their connections
Goal: modeling of co-allocation and exclusive access
Notation:
  Weight of connection i:
  Weight of component i:
  The number of the node component i is mapped to:
  Locality flag:
  Planning matrix:
Plan Cost:
Examples of cost minimization

Domain decomposition: N x N nodes
  C – weight of computing component
  L – weight of communication link
  I – internal communication weight
  1 – weight of communication component
Task farm
  M – weight of master component
  S – weight of slave
Plan cost:
Optimization in GridSpace

Problem: how to select best resources to invoke Grid Operations
Similar to workflow scheduling problem
Short-sighted and medium-sighted optimization
Realized in GrAppO
GrAppO Architecture

GrAppO Manager – coordinates GrAppO components
Optimization Engine – runs optimization algorithms
Performance Predictor – estimates performance of possible solutions using:
  Historical Data Analyzer – analyzes historical performance data
  Resource Condition Data Analyzer – analyzes current state of resources
Application Analyzer – retrieves the application graph and analyzes it
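
How a predictor might combine the two analyzers can be sketched as below. This is not GrAppO's algorithm: the `Candidate` structure, the averaging of past runtimes, the load-based slowdown factor and the greedy choice of one instance per request (one reading of "short-sighted") are all assumptions made for illustration.

```ruby
# Hypothetical candidate instances of a single Grid Object Class.
# past_runtimes are in seconds; current_load is a fraction in [0, 1).
Candidate = Struct.new(:endpoint, :past_runtimes, :current_load)

# Stand-in for a historical data analyzer: mean of past runtimes.
# An instance with no history is treated as infinitely slow.
def historical_estimate(candidate)
  runs = candidate.past_runtimes
  return Float::INFINITY if runs.empty?

  runs.sum.to_f / runs.size
end

# Stand-in for a resource condition analyzer: slowdown caused by current load.
def condition_factor(candidate)
  1.0 / (1.0 - candidate.current_load)
end

# Stand-in for a performance predictor: combines both estimates.
def predicted_runtime(candidate)
  historical_estimate(candidate) * condition_factor(candidate)
end

# Greedy selection: choose the instance with the lowest predicted runtime
# for this one request, ignoring the rest of the application.
def select_instance(candidates)
  candidates.min_by { |c| predicted_runtime(c) }
end

candidates = [
  Candidate.new('host-a', [10.0, 12.0], 0.5), # 11 s mean, doubled by load
  Candidate.new('host-b', [15.0, 15.0], 0.0), # 15 s mean, idle
  Candidate.new('host-c', [], 0.1)            # no history
]
puts select_instance(candidates).endpoint # prints "host-b"
```

Treating instances without history as infinitely slow keeps the sketch short, but it also means a new instance is never tried while any measured one exists; a real optimizer would need some exploration policy.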
Far-sighted scheduling with ASKALON

The ASKALON environment includes a Grid workflow metascheduler.
We proposed an interface between GridSpace and ASKALON.
A GridSpace script must be mapped to the ASKALON workflow language (AGWL).
Execution is performed by GridSpace; ASKALON supplies scheduling decisions.