Mgjansen-190909x - BioAssist Main Page


Dec 14, 2013



Engineering workflow elements

Machiel Jansen


E-science support

SARA Amsterdam

Who am I

Living in Baarn

with Jacqueline



No pets…but



Who am I


Education


Psychology: Artificial Intelligence


Philosophy: mainly logic.


Computer Science


PhD on Formal explorations of knowledge
intensive tasks.

And now…

Work: IPS, Getronics, UvA, VU (VL-e), Collexis, SARA


Languages: Java, Prolog, and many more I try to
forget.


Fields of expertise: Knowledge representation, logic,
information retrieval, software engineering, formal
languages, Grid, cognition and, as hobbies, Dutch
architecture, art history, literature, birds, plants and
history in general.

And now…

E-science


E-science is about sharing and collaboration



It aims for generic software components on a
shared infrastructure



In BioAssist this should be realized by means
of the Grid, Web Services and workflows.

Sharing and collaboration



Sharing data

and sharing functionality



Sharing involves agreeing on common formats
and the meaning of the information shared



It involves setting up contracts for the
functionality of software artefacts.



Sharing means that software should be generic
and reusable

Workflow and Web Services

The vision:


Workflow and Web Services offer you a way to
share and collaborate.


You publish and share your own Web Services and use
those of others. The same goes for workflows.


But what are Web Services?

Let’s limit it to SOAP/WSDL…


SOAP


Originally Simple Object Access Protocol, but
now SOAP just stands for SOAP!


(Called XP for XML Protocol for a while)



Originates from doing Remote Procedure Calls (RPC)
over XML and HTTP



Initiated by Microsoft, later moved to the W3C



Originally there were a number of competitors:
XML-RPC etc.

Remote Procedure Call style


    String myVar = foo("hello world", 6);

    String foo(String s, Integer i) {
        // ... body not shown on the slide; it computes result ...
        return result;
    }


The call looks the same whether foo runs in the same program,
on another machine across a LAN, or on a server across the
Internet.

SOAP Web Services


Independent of programming language



The objects that you send are serialized and
“marshalled” into XML.



XML data is received, validated and processed.



SOAP is a way to do RPC. But not the only one.
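To make "serialized and marshalled into XML" concrete, here is a minimal Java sketch that turns the call foo("hello world", 6) into a SOAP 1.1-style envelope and reads it back on the "server" side with the JDK's built-in DOM parser. The element names foo, s and i, and the body of foo, are assumptions for illustration; a real SOAP stack generates this from the WSDL and typically does much more (including schema validation, which this sketch skips).

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Sketch: marshal foo("hello world", 6) into a SOAP 1.1-style envelope
// and unmarshal it again. Element names <foo>, <s>, <i> are assumptions.
class SoapSketch {
    static final String SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/";

    // Minimal escaping so the string argument stays well-formed XML.
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    // Client side: serialize operation name and arguments as XML.
    static String marshalFooCall(String s, int i) {
        return "<soap:Envelope xmlns:soap=\"" + SOAP_NS + "\"><soap:Body>"
                + "<foo><s>" + escape(s) + "</s><i>" + i + "</i></foo>"
                + "</soap:Body></soap:Envelope>";
    }

    // The procedure being called remotely (placeholder logic).
    static String foo(String s, int i) {
        return s + " #" + i;
    }

    // Server side: parse the XML, read the arguments back, invoke foo.
    // No schema validation here; a real SOAP stack may validate the
    // message against the service's WSDL/XSD first.
    static String dispatch(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        DocumentBuilder builder = factory.newDocumentBuilder();
        Document doc = builder.parse(
                new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        Element body = (Element) doc.getElementsByTagNameNS(SOAP_NS, "Body").item(0);
        Element call = (Element) body.getFirstChild(); // our envelope has no whitespace
        if (!"foo".equals(call.getLocalName())) {
            throw new IllegalArgumentException("unknown operation: " + call.getLocalName());
        }
        String s = call.getElementsByTagName("s").item(0).getTextContent();
        int i = Integer.parseInt(call.getElementsByTagName("i").item(0).getTextContent());
        return foo(s, i);
    }

    public static void main(String[] args) throws Exception {
        String request = marshalFooCall("hello world", 6);
        System.out.println(request);
        System.out.println(dispatch(request)); // hello world #6
    }
}
```

Note that neither side cares which language the other is written in; all they share is the XML format, which is exactly the language independence mentioned above.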

Web Services in workflow


In a workflow, WS are just workflow elements. In
Taverna a WS can be replaced by a local program or
another technical implementation.



Each workflow element should be generic. This
means it can play a role in more than one
application.



Workflows can be seen as big abstract programs and
the Web Services as generic software modules.
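The idea that a WS is "just a workflow element" that could be swapped for a local program can be sketched in Java as one interface with two implementations. The interface name WorkflowElement and the reverse-complement operation are hypothetical choices; the remote variant does not really call a service, its transport is a function passed in so the sketch runs offline.

```java
import java.util.function.UnaryOperator;

// The workflow only knows this interface (a hypothetical name).
interface WorkflowElement {
    String run(String input);
}

// Implementation 1: a local program (reverse complement of a DNA string).
class LocalReverseComplement implements WorkflowElement {
    public String run(String dna) {
        StringBuilder out = new StringBuilder(dna.length());
        for (int k = dna.length() - 1; k >= 0; k--) {
            switch (dna.charAt(k)) {
                case 'A': out.append('T'); break;
                case 'T': out.append('A'); break;
                case 'C': out.append('G'); break;
                case 'G': out.append('C'); break;
                default: throw new IllegalArgumentException("not a base: " + dna.charAt(k));
            }
        }
        return out.toString();
    }
}

// Implementation 2: a client for a Web Service offering the same operation.
// The transport is injected as a function; a real client would send a
// SOAP request here and unmarshal the response.
class RemoteReverseComplement implements WorkflowElement {
    private final UnaryOperator<String> transport;

    RemoteReverseComplement(UnaryOperator<String> transport) {
        this.transport = transport;
    }

    public String run(String dna) {
        return transport.apply(dna);
    }
}

class SwapDemo {
    // Workflow code: identical whichever implementation is plugged in.
    static String runStep(WorkflowElement step, String input) {
        return step.run(input);
    }

    public static void main(String[] args) {
        WorkflowElement local = new LocalReverseComplement();
        // Simulated "server" that happens to run the local code.
        WorkflowElement remote = new RemoteReverseComplement(local::run);
        System.out.println(runStep(local, "ATGC"));  // GCAT
        System.out.println(runStep(remote, "ATGC")); // GCAT
    }
}
```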





Important issues



How do you identify Workflow Elements?



They should play a role in more than one
workflow



They should be independent of other workflow
elements



They should be elegant and easy to use


More important issues


They should be resistant to change



They should have a clear interface to the outside
world.







Identifying workflow elements


Workflow elements are high-level objects. Use
OO techniques and engineering.



Start big, partition into smaller chunks and
apply the criteria of loose coupling and high
cohesion.

(Loose) coupling


This is the criterion for independence between
elements (objects).



Loose coupling means independence. Loosely
coupled components do not rely on each other.



How well are they separated?



A workflow element normally is completely
decoupled from any other until you create the
workflow. Then you couple elements together.
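"Decoupled until you create the workflow" can be illustrated with plain Java functions: each element below is defined without any reference to the others, and the coupling only appears in the one line that assembles them. The element names and the GC-count task are hypothetical examples, not Taverna constructs.

```java
import java.util.Locale;
import java.util.function.Function;

class LooseCouplingSketch {
    // Three independent elements; none of them refers to another.
    static final Function<String, String> NORMALIZE =
            s -> s.trim().toUpperCase(Locale.ROOT);
    static final Function<String, Long> COUNT_GC =
            s -> s.chars().filter(c -> c == 'G' || c == 'C').count();
    static final Function<Long, String> REPORT =
            n -> "GC count: " + n;

    // Coupling happens only here, when the "workflow" is assembled.
    static final Function<String, String> WORKFLOW =
            NORMALIZE.andThen(COUNT_GC).andThen(REPORT);

    public static void main(String[] args) {
        System.out.println(WORKFLOW.apply("  atgcgc ")); // GC count: 4
    }
}
```

The compiler only accepts the chain because each element's output type matches the next element's input type; that requirement is a small instance of the format coupling discussed next.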

Workflow coupling


Workflow elements may be coupled in time,
meaning that the availability of one system affects
the other. (Synchronous WS!)



Workflow elements may be coupled in format,
meaning that differences in data models have to
be resolved to achieve integration.



Workflow elements may be coupled in function,
meaning that using them separately is not very useful (or
likely).
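Coupling in time is easiest to see by comparing a synchronous call with an asynchronous one. In this Java sketch the slow service is simulated with a sleep (a hypothetical stand-in for a remote WS); CompletableFuture only frees the caller from waiting, the service still has to answer eventually, so it loosens time coupling rather than removing it.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.TimeUnit;

class TimeCouplingSketch {
    // Stand-in for a slow remote service (hypothetical).
    static String slowService(String query) {
        try {
            TimeUnit.MILLISECONDS.sleep(200);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return "result for " + query;
    }

    public static void main(String[] args) {
        // Synchronous: the caller blocks until the service answers,
        // so it is coupled in time to the service.
        System.out.println(slowService("P69905"));

        // Asynchronous: the request runs on another thread and the caller
        // carries on; the result is collected later with join().
        CompletableFuture<String> pending =
                CompletableFuture.supplyAsync(() -> slowService("P68871"));
        System.out.println("caller keeps working...");
        System.out.println(pending.join());
    }
}
```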

Data coupling



What comes out of one workflow element must be
reformatted before it is sent as input to the next.



In Taverna: shims and Beanshell scripts. With WS this is
very hard to get around: WS are stateless.



In OO you would use decoupling techniques like
factories and dependency injection.
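A shim in its simplest form: element A emits a tab-delimited line, element B expects a list, and a small piece of glue converts one into the other. This Java sketch is only an analogy for a Taverna shim or Beanshell script; the method names and the sample accession strings are assumptions.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;

class ShimSketch {
    // Element A emits protein identifiers as one tab-delimited line.
    static String listProteins() {
        return "P69905\tP68871\tP02144";
    }

    // Element B expects a Java list of identifiers.
    static int countProteins(List<String> ids) {
        return ids.size();
    }

    // The shim: glue that exists only because A's output format does not
    // match B's input format. It ties A and B together (data coupling).
    static final Function<String, List<String>> TAB_TO_LIST =
            line -> Arrays.asList(line.split("\t"));

    public static void main(String[] args) {
        System.out.println(countProteins(TAB_TO_LIST.apply(listProteins()))); // 3
    }
}
```

If A ever switches to XML, the shim breaks even though neither A's nor B's purpose changed; that fragility is what the slide calls tight data coupling.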

Factories and dependency injection


Briefly…



Suppose a class that lists movies. It should be
independent of HOW the movies are stored.



Create an interface that publishes an abstract method,
then give different implementations.



Inject the needed implementation into the class that
lists the movies.


Wanna know more? Read Fowler, use Spring, read Design
Patterns.
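The three steps above, sketched in Java in the spirit of Fowler's movie lister example. The class and method names here are illustrative and not copied from the article or from Spring; the main method plays the role of the assembler that a container such as Spring would automate.

```java
import java.util.ArrayList;
import java.util.List;

class Movie {
    final String title;
    final String director;

    Movie(String title, String director) {
        this.title = title;
        this.director = director;
    }
}

// The abstraction the lister depends on: WHAT, not HOW movies are stored.
interface MovieFinder {
    List<Movie> findAll();
}

// Implementation 1: movies kept in memory.
class InMemoryMovieFinder implements MovieFinder {
    private final List<Movie> movies;
    InMemoryMovieFinder(List<Movie> movies) { this.movies = movies; }
    public List<Movie> findAll() { return movies; }
}

// Implementation 2: "title:director" lines, e.g. read from a text file.
class ColonDelimitedMovieFinder implements MovieFinder {
    private final List<String> lines;
    ColonDelimitedMovieFinder(List<String> lines) { this.lines = lines; }

    public List<Movie> findAll() {
        List<Movie> result = new ArrayList<>();
        for (String line : lines) {
            String[] parts = line.split(":", 2);
            result.add(new Movie(parts[0], parts[1]));
        }
        return result;
    }
}

// The lister gets its finder through the constructor (constructor injection).
class MovieLister {
    private final MovieFinder finder;
    MovieLister(MovieFinder finder) { this.finder = finder; }

    List<String> titlesDirectedBy(String director) {
        List<String> titles = new ArrayList<>();
        for (Movie m : finder.findAll()) {
            if (m.director.equals(director)) titles.add(m.title);
        }
        return titles;
    }
}

class MovieListerDemo {
    public static void main(String[] args) {
        // The assembler chooses the implementation; MovieLister never knows which.
        MovieFinder finder = new ColonDelimitedMovieFinder(List.of(
                "Once Upon a Time in the West:Sergio Leone", "Alien:Ridley Scott"));
        MovieLister lister = new MovieLister(finder);
        System.out.println(lister.titlesDirectedBy("Sergio Leone"));
    }
}
```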

Uh?


Well…


Suppose your WS returns a list of proteins.



One WS wants it tab-delimited, another in XML. How do you deal with
that? Do you at all?



If not, the workflow programmer has to write a shim. But it’s the job of the
workflow engine to wire workflow elements. The shim means tight data
coupling.



You can provide different methods, but then the interface publishes
non-functional details. For each new type the interface changes!



What you want is that the workflow engine does the wiring. It should first
set the type in the service (injection) and then make a general call for a
protein list.



But WS are stateless. Still, this is food for thought.
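As food for thought, here is the wiring idea as an in-process Java sketch: the "engine" first injects the output format the next element needs (setter injection), then makes one general call. The names ProteinListFormat, ProteinService and setFormat are hypothetical, and the service keeps the format as state, which a stateless WS cannot do; there the format would have to travel with every request.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical output-format abstraction.
interface ProteinListFormat {
    String format(List<String> ids);
}

class TabDelimitedFormat implements ProteinListFormat {
    public String format(List<String> ids) {
        return String.join("\t", ids);
    }
}

class XmlFormat implements ProteinListFormat {
    public String format(List<String> ids) {
        return ids.stream()
                .map(id -> "<protein>" + id + "</protein>")
                .collect(Collectors.joining("", "<proteins>", "</proteins>"));
    }
}

// One generic operation; the format is injected beforehand.
class ProteinService {
    private ProteinListFormat format = new TabDelimitedFormat(); // default

    void setFormat(ProteinListFormat format) { // setter injection
        this.format = format;
    }

    String getProteinList() {
        List<String> ids = List.of("P69905", "P68871"); // placeholder data
        return format.format(ids);
    }
}

class EngineSketch {
    public static void main(String[] args) {
        // What the workflow engine would do: inject the format the next
        // element expects, then make the general call.
        ProteinService service = new ProteinService();
        service.setFormat(new XmlFormat());
        System.out.println(service.getProteinList());
    }
}
```

The interface stays the same when a new format is added: only a new ProteinListFormat implementation is needed, which answers the "for each new type the interface changes" problem.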



Cohesion


Cohesion is about
how the activities within a single
module are related to one another.


Cohesion

is the measure of the strength of
functional relatedness of elements within a module.





High cohesion



Functional cohesion


If you can sum up everything that the module
accomplishes as one problem-related function, then that
module is functionally cohesive.



A functionally cohesive module contains elements that
all contribute to the execution of one and only one
problem-related task.



COMPUTE COSINE OF ANGLE

VERIFY ALPHABETIC SYNTAX

READ TRANSACTION RECORD



Reuse is good.
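Two of the examples above written as functionally cohesive Java methods: each does one problem-related task and nothing else, which is what makes it easy to reuse. The method names and the decision to take the angle in degrees are assumptions; READ TRANSACTION RECORD is left out because it needs I/O.

```java
class FunctionallyCohesive {
    // COMPUTE COSINE OF ANGLE (angle given in degrees).
    static double cosineOfAngle(double degrees) {
        return Math.cos(Math.toRadians(degrees));
    }

    // VERIFY ALPHABETIC SYNTAX: true if the string is non-empty and all letters.
    static boolean isAlphabetic(String s) {
        return !s.isEmpty() && s.chars().allMatch(Character::isLetter);
    }

    public static void main(String[] args) {
        System.out.println(cosineOfAngle(60));    // approximately 0.5
        System.out.println(isAlphabetic("Baarn")); // true
        System.out.println(isAlphabetic("B4arn")); // false
    }
}
```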






Cohesion and workflows


A workflow element should be a functionally
cohesive unit at a high level of granularity.



If a workflow element is highly functionally
cohesive, it can be used in many different
workflows.

Sequential cohesion


A sequentially cohesive module is one whose
elements are involved in activities such that output
data from one activity serves as input data to the
next.


CLEAN CAR BODY

FILL IN HOLES IN CAR

SAND CAR BODY

APPLY PRIMER



Reuse is more problematic: the activities do not form
a functional entity.
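The car example as a Java sketch of a sequentially cohesive module: the only public operation runs the whole chain, and each step's output is the next step's input. The CarBody class that records its steps is a hypothetical model. A caller who only wants to sand a car body cannot reuse this module, which is the reuse problem described above.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model: the car body records what has been done to it.
class CarBody {
    final List<String> steps = new ArrayList<>();
}

class PaintShop {
    // Sequentially cohesive: each activity's output feeds the next one.
    static CarBody prepareForPaint(CarBody body) {
        return applyPrimer(sand(fillHoles(clean(body))));
    }

    // Private steps: the module only offers the whole chain.
    private static CarBody clean(CarBody b)       { b.steps.add("clean"); return b; }
    private static CarBody fillHoles(CarBody b)   { b.steps.add("fill holes"); return b; }
    private static CarBody sand(CarBody b)        { b.steps.add("sand"); return b; }
    private static CarBody applyPrimer(CarBody b) { b.steps.add("primer"); return b; }

    public static void main(String[] args) {
        System.out.println(prepareForPaint(new CarBody()).steps);
        // prints [clean, fill holes, sand, primer]
    }
}
```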

Procedural cohesion


A procedurally cohesive module is one whose elements
are involved in different and possibly unrelated activities
in which control flows from each activity to the next.
(Remember that in a sequentially cohesive module data,
not control, flows from one activity to the next.)


CLEAN UTENSILS FROM PREVIOUS MEAL

PREPARE TURKEY FOR ROASTING

MAKE PHONE CALL

TAKE SHOWER

CHOP VEGETABLES

SET TABLE


Such modules are hard to reuse. They may only fit a specific
scenario.


The problem with workflow elements

Workflow elements are not Web Services by
definition. They are generic, loosely coupled,
functionally cohesive software modules.


This means that a proper workflow element
should be easy to use in different
kinds of workflows.


This is difficult!

Difficulties


Organizational -- Developing reusable software requires a deep understanding of
application developer needs and business requirements. As the number of developers and
projects employing reusable assets increases, it becomes hard to structure an organization
that can provide effective feedback.





Economic -- Developing reusable systems takes more effort and time, and hence money.
These investments should pay off later.



Administrative -- Although it's common to scavenge small classes or functions
opportunistically from existing programs, developers often find it hard to locate suitable
reusable modules outside of their immediate workgroups.



Political -- Groups that develop reusable middleware platforms are often viewed with
suspicion by application developers, who resent the fact that they may no longer be
empowered to make key architectural decisions. Likewise, rivalries among different groups may
prevent reuse.



Psychological -- Application developers may also perceive “top down” reuse efforts as an
indication that management lacks confidence in their technical abilities. In addition, the
“not invented here” syndrome is ubiquitous in many organizations, particularly among
highly talented programmers (reinventing the wheel).