
Report on the State-of-the-Art and Requirements Analysis

Deliverable 5.1




DIP

Data, Information and Process Integration with Semantic Web Services

FP6-507483

Deliverable D5.1

Report on the State-of-the-Art and Requirements Analysis

(WP 5: Service Mediation)

Emilia Cimpian
Christian Drumm
Michael Stollberg
Ion Constantinescu
Liliana Cabral
John Domingue
Farshad Hakimpour
Atanas Kiryakov

05 November 2013



EXECUTIVE SUMMARY

This deliverable covers the current state-of-the-art in data, information, and process mediation, and provides an analysis of mediation requirements for the DIP Mediation Component.

The document treats the mediation of data and information separately from process mediation, since process mediation requires the interpretation of goals and workflow as well as flexible Web Service invocation, which are not required for data and information mediation.

This document consists of two main parts. The first part provides an overview of the current state-of-the-art in mediation, describing some of the existing approaches and projects. In this part, industrial and research approaches to data and information mediation are treated separately.

The second part of the document provides an analysis of mediation requirements. Three types of requirements need to be considered here: requirements regarding the general architecture of the DIP Mediation Component (which can be requirements for the run-time environment or for the design-time tool), requirements for data and information mediation, and requirements for process mediation.



Document Information

IST Project Number: FP6-507483
Acronym: DIP
Full title: Data, Information and Process Integration with Semantic Web Services
Project URL: http://dip.semanticweb.org
Document URL: https://bscw.dip.deri.ie/bscw/bscw.cgi/0/521
EU Project officer: Daniele Rizzi

Deliverable: Number 5.1; Title: Report on the State-of-the-Art and Requirements Analysis
Work package: Number 5; Title: Service Mediation
Date of delivery: Contractual: M6; Actual: 30-June-04
Status: version 0.2, final
Nature: Prototype / Report / Dissemination
Dissemination Level: Public / Consortium

Authors (Partner): Emilia Cimpian (NUIG), Christian Drumm (SAP), Michael Stollberg (UIBK), Ion Constantinescu (EPFL), Liliana Cabral (OU), John Domingue (OU), Farshad Hakimpour (OU), Atanas Kiryakov (SIRMA)
Responsible Author: Emilia Cimpian; Email: emilia.cimpian@deri.ie; Partner: NUIG; Phone: +353-91-512640



Abstract (for dissemination): In the twelve years since Gio Wiederhold [Wiederhold, 1992] first came up with the idea of mediation and mediation systems, intensive research has been done in this field. In this report, we provide an overview of the current state-of-the-art in data, information, and process mediation, and an analysis of the requirements for constructing a mediation system.

Keywords: Data and information mediation; process mediation; schema matching; ontology mapping, merging and alignment; choreography; orchestration; collaboration

Version log/Date | Change | Author
27-Feb-04 | First draft of the skeleton of the deliverable | Emilia Cimpian
19-Feb-04 | Changes on the skeleton conforming to the discussions during the Wiesbaden meeting | Emilia Cimpian
22-March-04 | Paragraphs added regarding the state-of-the-art in data mediation | Christian Drumm


29-March-04 | Outline of the state-of-the-art analysis for process mediation | Michael Stollberg
05-April-04 | IRS II description | John Domingue, Liliana Cabral
21-April-04 | Bullet points on requirements analysis | Christian Drumm
29-April-04 | D5.1a: Survey of Industrial Data Integration Systems (separate from this document, but part of the same deliverable) | Vladimir Alexiev, Atanas Kiryakov
29-April-04 | Compilation of work on ontology-based data mediation | Farshad Hakimpour
30-April-04 | Restructuring of the document | Emilia Cimpian, Christian Drumm, Michael Stollberg
07-May-04 | EPFL contribution and process mediation included | Ion Constantinescu
14-May-04 | Paragraphs added to the introduction | Michael Stollberg
17-May-04 | SAP XI and XMapper descriptions added; also significant changes to the state-of-the-art in data and information mediation | Christian Drumm
20-May-04 | Paragraphs added to “Overview on Data and Information Mediation Approach” and on the requirements; reordering of the references in alphabetical order | Emilia Cimpian
21-May-04 | More requirements | Christian Drumm
25-May-04 | More requirements | Christian Drumm
28-May-04 | Update on process mediation | Michael Stollberg
1-June-04 | Reference added to D5.1a | Atanas Kiryakov, Emilia Cimpian
02-June-04 | MS BizTalk description added; restructuring of the research state-of-the-art in data and information mediation | Emilia Cimpian
03-June-04 | More requirements | Christian Drumm
04-June-04 | Small changes concerning the form of the document | Emilia Cimpian
22-June-04 | Changes in the entire document, conforming to the reviewers' comments | Emilia Cimpian, Christian Drumm, Michael Stollberg
24-June-04 | Small changes after the proof-reading | Emilia Cimpian



Project Consortium Information

Partner | Acronym | Contact
National University of Ireland Galway | NUIG | Prof. Dr. Christoph Bussler, Digital Enterprise Research Institute (DERI), National University of Ireland, Galway, Galway, Ireland; Email: chris.bussler@deri.ie; Tel: +353 91 512460
Fundacion de la Innovacion Bankinter | Bankinter | Monica Martinez Montes, Fundacion de la Innovacion Bankinter, Paseo Castellana, 29, 28046 Madrid, Spain; Email: mmtnez@bankinter.es; Tel: 916234238
Berlecon Research GmbH | Berlecon | Dr. Thorsten Wichmann, Berlecon Research GmbH, Oranienburger Str. 32, 10117 Berlin, Germany; Email: tw@berlecon.de; Tel: +49 30 2852960
British Telecommunications Plc. | BT | Dr John Davies, BT Exact (Orion Floor 5 pp12), Adastral Park Martlesham, Ipswich IP5 3RE, United Kingdom; Email: john.nj.davies@bt.com; Tel: +44 1473 609583
Swiss Federal Institute of Technology, Lausanne | EPFL | Prof. Karl Aberer, Distributed Information Systems Laboratory, École Polytechnique Fédérale de Lausanne, Bât. PSE-A, 1015 Lausanne, Switzerland; Email: Karl.Aberer@epfl.ch; Tel: +41 21 693 4679
Essex County Council | Essex | Mary Rowlatt, Essex County Council, PO Box 11, County Hall, Duke Street, Chelmsford, Essex, CM1 1LX, United Kingdom; Email: maryr@essexcc.gov.uk; Tel: +44 (0)1245 436524
Forschungszentrum Informatik | FZI | Andreas Abecker, Forschungszentrum Informatik, Haid-und-Neu Strasse 10-14, 76131 Karlsruhe, Germany; Email: abecker@fzi.de; Tel: +49 721 96540
Institut für Informatik, Leopold-Franzens Universität Innsbruck | UIBK | Prof. Dieter Fensel, Institute of computer science, University of Innsbruck, Technikerstr. 25, A-6020 Innsbruck, Austria; Email: dieter.fensel@deri.org; Tel: +43 512 5076485
ILOG SA | ILOG | Christian de Sainte Marie, 9 Rue de Verdun, 94253 Gentilly, France; Email: csma@ilog.fr; Tel: +33 1 49082981
inubit AG | Inubit | Torsten Schmale, inubit AG, Lützowstraße 105-106, D-10785 Berlin, Germany; Email: ts@inubit.com; Tel: +49 307261120
Intelligent Software Components, S.A. | iSOCO | Dr. V. Richard Benjamins, Director R&D, Intelligent Software Components, S.A., Pedro de Valdivia 10, 28006 Madrid, Spain; Email: rbenjamins@isoco.com; Tel: +34 913 349 797
Net Dynamics Internet Technologies GmbH u. Co KG | Net Dynamics | Peter Smolle, Net Dynamics Internet Technologies GmbH & Co KG, Prinz-Eugen-Strasse 68-70, A-1040 Wien, Austria; Email: peter.smolle@netdynamics-tech.com; Tel: +43 1 503982615
The Open University | OU | Dr. John Domingue, Knowledge Media Institute, The Open University, Walton Hall, Milton Keynes, MK7 6AA, United Kingdom; Email: j.b.domingue@open.ac.uk; Tel: +44 1908 655014
SAP AG | SAP | Dr. Elmar Dorner, SAP Research, CEC Karlsruhe, SAP AG, Vincenz-Priessnitz-Str. 1, 76131 Karlsruhe, Germany; Email: elmar.dorner@sap.com; Tel: +49 721 690231
Sirma AI Ltd. | Sirma | Atanas Kiryakov, Ontotext Lab, Sirma AI EAD, Office Express IT Centre, 3rd Floor, 135 Tzarigradsko Chausse, Sofia 1784, Bulgaria; Email: atanas.kiryakov@sirma.bg; Tel: +359 2 9768 303
Tiscali Österreich GmbH | Tiscali | Dieter Haacker, Tiscali Österreich GmbH, Diefenbachgasse 35, A-1150 Vienna, Austria; Email: Dieter.Haacker@at.tiscali.com; Tel: +43 1 899 33 160
Unicorn Solutions Ltd. | Unicorn | Jeff Eisenberg, Unicorn Solutions Ltd, Malcha Technology Park 1, Jerusalem 96951, Israel; Email: Jeff.Eisenberg@unicorn.com; Tel: +972 2 6491111
Vrije Universiteit Brussel | VUB | Carlo Wouters, Starlab - VUB, Vrije Universiteit Brussel, Pleinlaan 2, G-10, 1050 Brussel, Belgium; Email: carlo.wouters@vub.ac.be; Tel: +32 (0) 2 629 3719




TABLE OF CONTENTS

EXECUTIVE SUMMARY
TABLE OF CONTENTS
1 INTRODUCTION
2 STATE-OF-THE-ART ANALYSIS
2.1 Mediation of Data and Information
2.1.1 Importance of Data and Information Mediation in Semantic Web Services
2.1.2 Industrial State-of-the-Art
2.1.2.1 Approaches and Projects
2.1.2.2 Conclusions
2.1.3 Research State-of-the-Art
2.1.3.1 Classification Based on the Scope of Mediation
2.1.3.2 Classification Based on the Classes of Application
2.1.3.3 Approaches in Constructing the Mediator
2.1.3.4 Conclusions
2.2 Mediation of Processes
2.2.1 Processes and Process Technologies
2.2.1.1 Usage of Process Technologies within Semantic Web Services
2.2.1.2 Need for Process Mediation
2.2.1.2.1 Choreography
2.2.1.2.2 Orchestration
2.2.2 Technologies for Process Mediation
2.2.2.1 Existing Process Representation Technologies
2.2.2.2 Formalization of Process Representation
2.2.2.2.1 Logics for Representing Interaction Protocols
2.2.2.2.2 Formalizing Choreography Description
2.2.2.2.3 Formalizing Web Service Orchestrations
2.2.2.3 Process Integration by Process Composition
2.2.2.3.1 Situation Calculus for Service Composition
2.2.2.3.2 Hierarchical Task Planning for Service Composition
2.2.2.3.3 Type Based Service Composition
2.2.3 Conclusion
3 REQUIREMENT ANALYSIS
3.1 Architectural Requirements for DIP Mediation Component
3.1.1 Requirements for the Run-Time Environment
3.1.2 Requirements on the Design-Time Tool
3.2 Requirements for Data Level Mediation
3.3 Requirements for Process Level Mediation
4 CONCLUSIONS
5 REFERENCES




FP6 504083, Deliverable 5.1

1 INTRODUCTION

Due to an ever-increasing number of resources available on-line, end-users are presented with large amounts of data and information and can find it nearly impossible to extract the relevant information items. Possibly the best solution to this information overload is to mediate between different heterogeneous sources and to provide the user with a single, relevant information source obtained by combining and relating a wide variety of different sources.

The systems used for integrating heterogeneous sources are called mediators [Wiederhold, 1992]. The basic idea of mediators and of the mediation architecture introduced by Wiederhold is that a mediator resolves mismatches at the run-time of a system; it does not wrap resources before they are used in the system. In addition to performing the mediation, a mediator also provides an authoring environment in which the mediation can be defined. From the business data viewpoint, it is a mapping tool that maps concepts from different data sources without losing or altering their semantics. Mediation and integration, or more generally, dealing with heterogeneity, become very important in distributed systems, in Internet applications and the Semantic Web, and especially when dealing with Semantic Web Services [Bussler, 2003].

Mediation is based on a high-level technique for describing the structure of resources. Firstly, a mechanism checks the resources to be integrated (that is, to be made interoperable) according to their structure, and then provides mapping functionality that makes the resources interoperable. Such a mechanism rests on an exhaustive, declarative description technique that allows all features of the resources to be described: a powerful ontology language for ontologies and a suitable process description language for business processes. (For ontologies, we adopt the definition given by Gruber [Gruber, 1993]: an ontology is an explicit specification of a conceptualization. By processes, we mean a set of activities and transitions, together with the conditions for those transitions.)


Secondly, an algebra is needed on top of this that defines the computable relations between the modeling primitives and the operations between them [Papakonstantinou et al., 1996].

Thirdly, a classification of the mismatches that can occur between resources is required; the mismatch identification scheme should also classify the types of mismatches according to their resolvability.

The fourth component of a mediator is a mechanism that works on the representation language and resolves a subset of the mismatches, using the algebra as its basis.
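To make the third and fourth components more concrete, the following Python sketch tags each mismatch type with its resolvability and lets a resolver handle only the resolvable subset, leaving the rest as open issues. The mismatch categories, the `Mismatch` type, and the `resolve` function are illustrative assumptions and not part of any DIP or WSMF specification; the algebra (second component) is left implicit.

```python
from dataclasses import dataclass
from enum import Enum


class Resolvability(Enum):
    """How far a mismatch type can be handled by an automated mechanism."""
    AUTOMATIC = "automatic"            # resolvable by the mechanism alone
    SEMI_AUTOMATIC = "semi-automatic"  # needs a mapping defined by a human
    UNRESOLVABLE = "unresolvable"      # outside what the mechanism can express


# Hypothetical mismatch classification (third component):
# every mismatch type is tagged with its resolvability.
MISMATCH_TYPES = {
    "unit": Resolvability.AUTOMATIC,            # e.g. cents vs. euros
    "naming": Resolvability.SEMI_AUTOMATIC,     # e.g. "zip" vs. "postcode"
    "granularity": Resolvability.SEMI_AUTOMATIC,
    "coverage": Resolvability.UNRESOLVABLE,     # target concept has no source
}


@dataclass
class Mismatch:
    kind: str
    source: str
    target: str


def resolve(mismatches, human_mappings):
    """Hypothetical fourth component: resolve the subset of mismatches that
    the classification marks as resolvable and report the remaining ones."""
    resolved, open_issues = [], []
    for m in mismatches:
        level = MISMATCH_TYPES.get(m.kind, Resolvability.UNRESOLVABLE)
        if level is Resolvability.AUTOMATIC:
            resolved.append(m)
        elif level is Resolvability.SEMI_AUTOMATIC and (m.source, m.target) in human_mappings:
            resolved.append(m)  # a human has already defined this mapping
        else:
            open_issues.append(m)
    return resolved, open_issues


found = [
    Mismatch("unit", "price_cents", "price_eur"),
    Mismatch("naming", "zip", "postcode"),
    Mismatch("coverage", "-", "vat_id"),
]
done, todo = resolve(found, human_mappings={("zip", "postcode")})
```

Only the "unit" and "naming" mismatches end up in `done`; the "coverage" mismatch stays open, reflecting the point made below that automated mechanisms can realize only partial solutions.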

In general, mediation is an unbounded problem field, and only partial solutions can be realized by automated mechanisms, because it is considered impossible to define an algebra and mismatch-resolving mechanisms for all kinds of heterogeneity that can appear [Yerneni et al., 1999]. Most mediation technologies are only semi-automatic when resolving conceptual mismatches, in which both resources model some aspect differently but correctly; that is, they require human intervention when defining the mappings between concepts.

The Web Service Modeling Framework (WSMF) [Fensel and Bussler, 2002], which is the basis of DIP and serves as the conceptual foundation of the Web Service Modeling Ontology (WSMO) [Roman et al., 2004], distinguishes three levels of mediation that are needed for Semantic Web Services:

- Data Level: establishes interoperability between heterogeneous data sources. As DIP uses ontologies for data representation, special attention should be given to ontology mediation.
- Process Level: establishes interoperability between heterogeneous processes. Mismatches may occur in the externally visible behaviour of Web Services that ought to interact (for example, mismatches in the sequence of activities); these have to be resolved in order to make the processes interoperable.
- Protocol Level: establishes interoperability between resources that request and/or use heterogeneous messaging patterns or messaging sequences.¹

Note that the automated handling of Semantic Web Services requires mediation on all three levels.
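As an illustration of a process level mismatch in the sequence of activities, the following Python sketch assumes a requester that sends an order before the delivery address, while the provider expects the address first. A mediator buffers the incoming messages and forwards each one as soon as it is the provider's turn to receive it. The message types and the `mediate_sequence` helper are hypothetical.

```python
from collections import deque


def mediate_sequence(incoming, expected_order):
    """Buffer messages from the requester and release them in the order
    the provider's (hypothetical) interaction protocol expects.

    incoming:       iterable of (message_type, payload) in requester order
    expected_order: list of message types in provider order
    """
    buffer = {}
    pending = deque(expected_order)
    delivered = []
    for msg_type, payload in incoming:
        if msg_type not in expected_order:
            raise ValueError(f"provider does not accept message type {msg_type!r}")
        buffer[msg_type] = payload
        # Forward every buffered message whose turn has come.
        while pending and pending[0] in buffer:
            next_type = pending.popleft()
            delivered.append((next_type, buffer.pop(next_type)))
    if pending:
        raise RuntimeError(f"requester never sent: {list(pending)}")
    return delivered


requester_msgs = [("order", {"item": "book"}), ("address", {"city": "Galway"})]
provider_view = mediate_sequence(requester_msgs, ["address", "order"])
```

Buffering is the simplest resolution strategy; it only works when every message the provider expects is eventually sent, which is why the sketch raises an error for messages that never arrive.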

The objective of DIP Work Package (WP) 5 is to specify and develop the DIP Mediation Component, which handles all mediation aspects within the DIP architecture (see [Altenhofen et al., 2004]). The challenge for WP 5 is to specify the architecture and functionality of the DIP Mediation Component so that it can be used to resolve mismatches between different DIP components, and so that adequate mediation facilities can be invoked within the DIP architecture.

The DIP Mediation Component poses two major challenges: firstly, the implementation of a mediation-oriented architecture in accordance with the concept of mediators and their usage in modern system architectures as outlined in [Wiederhold, 1992]; and secondly, the development of suitable mediation facilities, that is, mechanisms that actually resolve mismatches between possibly heterogeneous resources at the three levels of mediation identified above. The mechanisms to be developed within WP 5 have to support the representation languages and ontological structures of the different components in DIP, and they should apply and extend existing mediation techniques in order to achieve a high degree of resolvability for the mismatches that may occur.

The aim of this deliverable is to provide an introduction to the field of mediation and to analyse existing technologies with respect to their applicability to the DIP Mediation Component, resulting in a requirements analysis for WP 5. The document will serve as the basis of WP 5. To achieve this, the document covers the following aspects: for the requirements analysis of the DIP Mediation Component architecture, existing architectures have to be taken into consideration on the one hand (see the survey in DIP D5.1a), and interoperability and usability within the overall DIP architecture have to be ensured on the other.

¹ There have been terminological differences in the naming of the different levels of mediation. While WSMF understands protocol level mediation to be concerned with messaging sequences, another interpretation is that protocol level mediation is concerned with different communication protocols, for example HTTP, SOAP, FIPA, and so on. The latter is the understanding that underlies the DIP WP 5 structure: here, process level mediation is understood to deal with heterogeneities between business processes (the workflow from an application domain point of view) and with the externally visible behaviour of services, along with possible mismatches in messaging sequences. With protocol level mediation, DIP WP 5 refers to the aspects of the communication protocol technology used. The WP 5 consortium agreed that establishing interoperability between the externally visible behaviours and the messaging patterns of Web Services is a fundamental challenge to be addressed in the DIP project. Thus, the Protocol Mediation Level has been included in the Process Level aspects in this WP 5 (at least at this point in time).

Regarding the mediation facilities to be developed in WP 5, the objective is to create high-quality mediation facilities that extend existing approaches and techniques for the different levels of mediation. Therefore, we have to investigate existing mediation technologies and systems exhaustively, so that a sound requirements analysis and specification of the mediation facilities can be derived from them. For mediation within Semantic Web Services, the primary interest is in data level mediation and process level mediation.

Protocol level mediation is covered by the fact that SOAP-based communication protocols are used for messaging within Semantic Web Services, so no mismatches are expected at this level. (At least, protocol level mediation does not have the same priority as data level and process level mediation, given the distinction explained in footnote 1.) Thus, in the following we concentrate the state-of-the-art and requirements analysis on data and process level mediation; the requirements analysis covers both the architecture of the DIP Mediation Component and the development of mediation facilities for the data and the process level.

This document is structured as follows:
- Chapter 2 reports on the state-of-the-art in mediation facilities and technologies for the data and the process level, illustrating current techniques and existing mediation systems;
- Chapter 3 provides the requirements analysis for the DIP Mediation Component;
- Chapter 4 concludes the document.



2 STATE-OF-THE-ART ANALYSIS

As mentioned in Chapter 1, mediation in Semantic Web Services can take place at three different levels: data, process, and protocol. In this report, however, we concentrate only on data (information) and process mediation.

The relationship between data and information has to be stated clearly before continuing with this report. Any facts handled on the Web are considered to be data, but they remain meaningless if interpreted out of context; data only becomes meaningful information when it is interpreted in a context in which it makes sense to humans. As a consequence, the mediation of pure data can be done only at a syntactic level. To obtain meaningfully mediated data, the semantic aspects must be considered, and the mediation must be based on the information inferred from the data. The mediation of data therefore actually implies the mediation of information and requires semantic mapping capabilities, as well as mapping and integration techniques specialized for a specific application context.
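A small example illustrates why mediating pure data is not enough. The sketch below assumes two hypothetical sources that both publish a field named `temp`, one in Fahrenheit and one in Celsius. A purely syntactic merge produces well-formed but meaningless output, while a merge that consults each source's context produces comparable values. The source names and the context table are illustrative assumptions.

```python
# Hypothetical context annotations: the same field name carries a different
# unit in each source, which is invisible at the purely syntactic level.
SOURCE_CONTEXT = {
    "weather_us": {"temp": "fahrenheit"},
    "weather_eu": {"temp": "celsius"},
}


def to_celsius(value, unit):
    if unit == "celsius":
        return value
    if unit == "fahrenheit":
        return (value - 32) * 5 / 9
    raise ValueError(f"unknown unit {unit!r}")


def syntactic_merge(records):
    """Copy the field as-is: the output is well-formed but not meaningful."""
    return [record["temp"] for _, record in records]


def semantic_merge(records):
    """Interpret each value in the context of its source before combining."""
    return [to_celsius(record["temp"], SOURCE_CONTEXT[source]["temp"])
            for source, record in records]


records = [("weather_us", {"temp": 86.0}), ("weather_eu", {"temp": 30.0})]
```

Here `syntactic_merge(records)` yields `[86.0, 30.0]`, two numbers that look different but describe the same temperature, whereas `semantic_merge(records)` yields `[30.0, 30.0]`.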

While data and information are static structures, when mediating processes we have to
take into consideration the dynamic aspects: the order of the activities, the transactions
that may occur,
and the conditions for transitions. In other words, we have to consider
the interpretation of goals, as well as workflow and flexible Web Service invocation.

Considering this difference between static and dynamic aspects, we present the current state-of-the-art in data and information mediation separately from the state-of-the-art in process mediation.




2.1 Mediation of Data and Information

In the last twelve years, intense research activity in this field has resulted in the development of many mediation techniques. The work has followed two different directions, which strongly shaped the resulting solutions: the industrial direction and the research direction.

In the industrial area, the main focus has been the rapid development of mediation systems suited to particular needs. Industrial systems are application-oriented, offering simple, low-risk solutions based on technologies backed by years of experience.

Research activity, in contrast, concentrates on finding new and innovative solutions that improve the quality of results and reduce the effort required from the human user, with the aim of semi-automating mediation systems. One of the most ambitious approaches treats semantics as an indispensable factor in mediation solutions. Combined with the already well-refined syntactic techniques, this approach is expected to play a crucial role in the emergence of Semantic Web Services.

Unfortunately, both the research and the industrial approaches still rely on input from a human user.

In the following sections, we first provide a short rationale for the use of data and information mediation in Semantic Web Services, and then present the current state-of-the-art in both the industrial and the research fields.



2.1.1 Importance of Data and Information Mediation in Semantic Web Services

The main reason for using Web Services is to provide a standard means of loosely coupled interoperation between different software applications running on a variety of platforms [Booth et al., 2004].

The purpose of adding semantics to the Web and to Web Services is to define meanings that enable computers to handle the information they manage more appropriately.


The process of engaging a Web Service (or a Semantic Web Service) consists of the following steps [Booth et al., 2004]:
1. the requester and provider entities become known to each other;
2. the requester and provider entities agree on the service description and semantics that will govern the interaction between the requester and provider agents;
3. the service description and semantics are realized by the requester and provider agents;
4. the requester and provider agents interact by exchanging messages.

These steps are illustrated in the following figure:

[Figure 1: Engaging a Web Service²]

² Source: [Booth et al., 2004]



A misunderstanding between the service requester and the service provider can arise during step 1 or step 2, because the two entities involved in the process may use different data sources. This makes mediation at the data and information level necessary to facilitate communication between the requester and the provider of a service.
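The following Python sketch illustrates how such a misunderstanding blocks step 2 and how a data level mapping removes it. It assumes a requester and a provider that describe the same concepts with different terms; the vocabularies, the mapping table, and the `can_agree` helper are hypothetical and greatly simplify real service descriptions.

```python
# Hypothetical vocabularies: the requester and the provider describe the same
# concepts with different terms, so agreeing on a service description fails
# unless a mediator maps one vocabulary onto the other.
REQUESTER_TO_PROVIDER = {
    "Customer": "Client",
    "PostalCode": "ZipCode",
    "Order": "PurchaseOrder",
}


def translate_description(terms, mapping):
    """Map the requester's terms to the provider's; report unmapped terms."""
    translated, unmapped = [], []
    for term in terms:
        if term in mapping:
            translated.append(mapping[term])
        else:
            unmapped.append(term)
    return translated, unmapped


def can_agree(requester_terms, provider_terms, mapping):
    """Agreement is possible only if every requester term has a counterpart
    that the provider actually understands."""
    translated, unmapped = translate_description(requester_terms, mapping)
    return not unmapped and set(translated) <= set(provider_terms)


provider_terms = {"Client", "PurchaseOrder", "ZipCode"}
agreed = can_agree(["Customer", "Order"], provider_terms, REQUESTER_TO_PROVIDER)
unmediated = can_agree(["Customer", "Order"], provider_terms, {})
```

With the mapping, `agreed` is `True`; with an empty mapping (no mediation), `unmediated` is `False` because neither requester term is known to the provider.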



2.1.2 Industrial State-of-the-Art


The current state-of-the-art in industrial applications is a central integration server, which intercepts messages between different systems and translates each message from a given source format into the required target format. The transformations that perform these translations are static scripts executed by the integration server. The decision as to which script to execute is taken either at design time by the system designer or at run-time based on predefined rules. Examples of such systems are MS BizTalk [MS BizTalk, 2004], Seeburger Business Integration Server [SBIS, 2004], and SAP Exchange Infrastructure [SAP XI, 2004].

Most current integration servers also allow messages to be routed dynamically at run-time based on their content. This enables some dynamic behaviour of the system, as the target of a given message can be determined at run-time and need not be defined at design time; SAP Master Data Management is one example.
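Content-based routing of this kind can be sketched in a few lines of Python. The rules, target system names, and threshold below are hypothetical; the point is only that the target is computed from the message itself at run-time rather than being fixed when the message flow is designed.

```python
# Hypothetical routing rules: each rule is a predicate over the message
# content plus the target system that should receive matching messages.
# Rules are checked in order; the first match wins.
ROUTING_RULES = [
    (lambda m: m.get("country") == "DE", "erp_germany"),
    (lambda m: m.get("amount", 0) > 10_000, "approval_service"),
]
DEFAULT_TARGET = "erp_central"


def route(message):
    """Return the target of the first matching rule, else the default."""
    for predicate, target in ROUTING_RULES:
        if predicate(message):
            return target
    return DEFAULT_TARGET


targets = [
    route({"country": "DE", "amount": 50}),
    route({"country": "IE", "amount": 20_000}),
    route({"country": "IE"}),
]
```

The three sample messages are routed to `erp_germany`, `approval_service`, and `erp_central` respectively.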

The creation of a single view over multiple databases is also a problem that is addressed by several products.

The points of interest in the industrial state-of-the-art are how the necessary transformations are created and how they are executed; the rest of the mediation process depends strongly on the implementation and is not relevant for this state-of-the-art analysis.


1. Creation of Transformation

The creation of transformations in all existing solutions relies heavily on input from a domain expert; that is, they are semi-automatic or entirely manual. Two very different approaches to creating transformations exist: they are created either with a graphical tool or by programming them directly in some kind of scripting language. Both approaches have advantages and disadvantages; for example, graphical tools become very confusing for large message schemas, while direct programming gives no feedback on which elements have already been treated or which elements were probably forgotten. Some companies, for example Seeburger, combine the two approaches in an IDE for creating transformations. This enables the user to choose the approach that is most suitable for the given problem.
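The feedback that graphical tools can offer stems from the mapping being kept as declarative data rather than buried in program code. The following Python sketch assumes a hypothetical target schema and a mapping from target elements to source paths, of the kind a graphical mapping tool might record. Because the mapping is data, a tool can list the target elements that have not been treated yet, which is the feedback direct programming lacks.

```python
# Hypothetical declarative mapping (target element -> '/'-separated source
# path), the kind of artefact a graphical mapping tool might record.
TARGET_SCHEMA = ["buyerName", "buyerCity", "orderDate", "totalAmount"]
MAPPING = {
    "buyerName": "customer/name",
    "buyerCity": "customer/address/city",
    "totalAmount": "order/sum",
}


def unmapped_targets(target_schema, mapping):
    """Feedback made possible by the declarative form:
    which target elements have not been treated yet."""
    return [element for element in target_schema if element not in mapping]


def apply_mapping(source, mapping):
    """Execute the mapping on a nested dict by following each source path."""
    result = {}
    for target, path in mapping.items():
        value = source
        for step in path.split("/"):
            value = value[step]
        result[target] = value
    return result


src = {"customer": {"name": "Ann", "address": {"city": "Galway"}}, "order": {"sum": 99}}
```

For this mapping, `unmapped_targets(TARGET_SCHEMA, MAPPING)` reports `["orderDate"]` as the element that was probably forgotten.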


2.

Execution of Transformation

There are two basic approaches to the execution of transformations. One is to interpret the scripting language used to define the transformation at run-time. For example, one could use XSLT³ [W3C, XSLT, 1999] to program a transformation and use a standard XSLT processor to execute it. The second approach is to compile the transformation into an executable program, for example, a Java class, and simply call this program at run-time.
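To make the second approach concrete, the sketch below expresses a transformation as an ordinary Python function, standing in for a generated Java class, that is simply called at run-time. The message formats (`Order` to `PurchaseOrder`) are invented for illustration. The interpreted approach would instead load an XSLT stylesheet and hand it to an XSLT processor (in Python, for example, the third-party lxml library provides `etree.XSLT`); that variant is not shown here.

```python
import xml.etree.ElementTree as ET

def order_to_purchase_order(source: str) -> str:
    """A 'compiled' transformation: the mapping logic is ordinary code, called at run-time.

    Hypothetical formats: <Order><Buyer>..</Buyer><Amount>..</Amount></Order> is mapped to
    <PurchaseOrder customer=".."><Total currency="EUR">..</Total></PurchaseOrder>.
    """
    src = ET.fromstring(source)
    target = ET.Element("PurchaseOrder", customer=src.findtext("Buyer", ""))
    total = ET.SubElement(target, "Total", currency="EUR")  # constant value set by the mapping
    total.text = src.findtext("Amount", "0")
    return ET.tostring(target, encoding="unicode")

print(order_to_purchase_order("<Order><Buyer>ACME</Buyer><Amount>42</Amount></Order>"))
# prints: <PurchaseOrder customer="ACME"><Total currency="EUR">42</Total></PurchaseOrder>
```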





³ XSLT is a language for transforming XML documents into other XML documents.


FP6


504083

Deliverable 5.1







9

2.1.2.1 Approaches and Projects

A broad survey is provided as a separate sub-deliverable, D5.1a "Survey of Industrial Data Integration Systems". It is provided separately because of its size and complex internal structure.

The survey provides an introduction to industrial data integration systems and an overview of the systems provided by a few of the leaders in database management and enterprise application integration, that is, ORACLE, IBM, Microsoft, WebLogic, and Cape Clear. Sub-deliverable D5.1a also presents the current "paradigms" and structuring of the data integration area without a bias towards Web Services, ontologies, or the Semantic Web, because most of the experience, industrial approaches, and technologies used for data mediation and integration are non-semantic.

In order to point out some of the interesting features of existing systems, we now describe SAP Exchange Infrastructure, Seeburger Business Integration Server, and Microsoft BizTalk Server.


The SAP Exchange Infrastructure [SAP XI, 2004] is a system that enables the integration of different enterprise applications on different platforms. It offers a run-time infrastructure for message exchange, configuration options for managing message flows and business processes, and support for the creation of the necessary message transformations. An overview of the architecture of SAP XI is shown in Figure 2.


Figure 2: SAP Exchange Infrastructure Overview⁴


The SAP Exchange Infrastructure consists of three main parts: the Integration Repository, the Integration Directory, and the Integration Server. The Integration Repository is used to capture all the information available at design time about an integration project. This information consists of interface descriptions, components, mappings, and business processes. In addition to this information, the Integration Directory contains further configuration information, such as information about the system landscape or business partners. The central component of SAP XI is the Integration Server, which is the central communication engine for messages sent between different systems. The Integration Server is responsible for the routing and mapping of messages; at run-time it uses the information stored in the Integration Directory to perform these tasks dynamically, based on the content of each message.

⁴ Source: [SAP XI, 2004]

SAP XI is built upon the Java 2 Enterprise Edition platform. It supports a large number of open standards in order to ensure wide interoperability. Examples of supported standards include WSDL for the description of interfaces, as well as XML and the SOAP with Attachments specification, on which the communication is built.

The Integration Server is currently capable of executing two types of transformations: XSLT scripts and Java classes. XSLT is supported only to enable the integration of pre-existing mappings; therefore, SAP XI offers no support for creating XSLT mappings. At run-time, the appropriate mappings are dynamically selected from the Integration Directory based on the message header or contents, using user-defined rules.

For the creation of Java mapping programs, SAP XI offers a graphical tool that is very similar to many other products in this area. A user creates a mapping by graphically connecting elements of the source message schema with elements of the target message schema. In order to support mappings more complex than one-to-one element mappings, the tool offers the possibility of inserting arbitrary functions into the data flow. The tool offers a large number of predefined functions and the possibility of creating new user-defined ones if needed. Figure 3 shows a screenshot of the SAP XI Mapping Tool.




Figure 3: The SAP XI Mapping Tool⁵
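The idea of graphically connecting source and target elements, with functions inserted into the data flow, can be modelled as a list of connections, each linking one or more source fields through a function to a target field. The sketch below is a hypothetical Python model of such a mapping definition over flat messages; it does not reproduce the internal representation of the SAP XI Mapping Tool.

```python
from typing import Callable, Dict, List, NamedTuple

class Connection(NamedTuple):
    """One 'line' in a graphical mapping: source fields -> function -> target field."""
    sources: List[str]
    function: Callable[..., str]
    target: str

# Functions that can be inserted into the data flow (a predefined one and a user-defined one).
def identity(value: str) -> str:
    return value

def concat(first: str, last: str) -> str:
    return f"{first} {last}"

def execute(mapping: List[Connection], message: Dict[str, str]) -> Dict[str, str]:
    """Apply every connection to a flat source message and build the target message."""
    return {c.target: c.function(*(message[s] for s in c.sources)) for c in mapping}

mapping = [
    Connection(["OrderNo"], identity, "Id"),                    # one-to-one element mapping
    Connection(["FirstName", "LastName"], concat, "Customer"),  # many-to-one via a function
    Connection(["Currency"], str.upper, "CurrencyCode"),        # value conversion
]
source = {"OrderNo": "4711", "FirstName": "Ada", "LastName": "Lovelace", "Currency": "eur"}
print(execute(mapping, source))
# prints: {'Id': '4711', 'Customer': 'Ada Lovelace', 'CurrencyCode': 'EUR'}
```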


Another industrial product is the Seeburger Business Integration Server (BIS) [SBIS, 2004], which is an integration engine for B2B integration. An overview of its architecture is shown in Figure 4.

The core of BIS is the so-called Workflow Engine. This engine controls all integration processes that are handled by BIS. It is connected to different Event Sources that can trigger the execution of a workflow. Events can be triggered by different sources, for example files, databases, or messages. Another possible source of events is the Web Services interface: BIS is capable of offering Web Services to other systems and of using requests to a Web Service to trigger the execution of a workflow. The connection of BIS to legacy systems is achieved through Components. The available Components include Connectors to standard ERP or eBusiness solutions, Communication Components to enable communication with external partners, Converters that enable easy conversion between different communication standards, and further components, for example security components that enable secure communication.

While the overall architecture of BIS is very similar to the architecture of most industrial integration systems, BIS has two special features. Firstly, it contains a large number of converters and connectors: there are transformations that convert among all major communication standards, and connectors to other available business systems. This allows a large number of integration problems to be solved easily, as development can focus on the integration of workflows and does not need to be concerned with the creation of transformations.

⁵ Source: [SAP XI, 2004]


Figure 4: Overview of the Seeburger BIS⁶


The second special feature that we want to highlight is the tool for generating transformations, the so-called Mapping Designer. The Mapping Designer is a very advanced tool and can be seen as an integrated development environment for transformations. Rather than offering either a graphical interface or a scripting language, as most other tools do, it uses both approaches in an integrated fashion. The user can either write a transformation script in a special scripting language and immediately see a graphical representation of the created transformation, or create transformations graphically, with the tool generating the resulting script. This allows a development process similar to those supported, for example, by well-known UML tools. The Mapping Designer also integrates debugging support in order to support the complete development process.


The last project described here is Microsoft BizTalk Server [MS BizTalk, 2004], which enables the connection of diverse applications using a graphical user interface to create and modify business processes that use services from those applications. In order to do this, the Microsoft BizTalk Server engine must provide a way to specify the business processes and a mechanism for communicating between the applications that the business processes use.

⁶ Source: [SBIS, 2004]

The main components of BizTalk Server 2004 are send and receive adapters, send and receive pipelines, orchestrations, the BizTalk Server MessageBox, and the business rules engine (Figure 5).


Figure 5: BizTalk Server Architecture⁷

The following way of processing messages is also illustrated in Figure 5:

1. A message is received through a receive adapter and then processed through a receive pipeline (this processing can include converting the message from its native format into an XML document and validating its digital signature).
2. The message is delivered to the so-called MessageBox database, a database that uses Microsoft SQL Server.
3. The message is dispatched to its target orchestration, which takes whatever action the business process requires. The result is usually another message, which is also saved in the MessageBox database.
4. The new message is processed by a send pipeline (the processing can include its conversion from the internal XML format used by BizTalk Server 2004 to the format required by its destination, and the addition of a digital signature). The message is then sent to the send adapter.
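The four steps can be sketched as a chain of functions around a shared message store. In the hypothetical Python sketch below, a list stands in for the SQL Server-backed MessageBox, a simple `key=value;...` format stands in for a native message format, and the orchestration is a plain function; signature handling is omitted, and none of these names correspond to BizTalk APIs.

```python
import xml.etree.ElementTree as ET
from typing import Callable, List

message_box: List[str] = []  # stand-in for the MessageBox database

def receive_pipeline(native: str) -> str:
    """Step 1: convert a native 'key=value;...' message into an XML document."""
    root = ET.Element("Message")
    for pair in native.split(";"):
        key, value = pair.split("=")
        ET.SubElement(root, key).text = value
    return ET.tostring(root, encoding="unicode")

def orchestration(xml_msg: str) -> str:
    """Step 3: business logic; here it simply produces an acknowledgement message."""
    order_id = ET.fromstring(xml_msg).findtext("id")
    return f"<Ack><id>{order_id}</id></Ack>"

def send_pipeline(xml_msg: str) -> str:
    """Step 4: convert the internal XML format into the destination's format."""
    root = ET.fromstring(xml_msg)
    return ";".join(f"{child.tag}={child.text}" for child in root)

def process(native: str, send_adapter: Callable[[str], None]) -> None:
    xml_msg = receive_pipeline(native)   # 1. receive adapter + receive pipeline
    message_box.append(xml_msg)          # 2. persist the message in the MessageBox
    result = orchestration(xml_msg)      # 3. dispatch to the target orchestration
    message_box.append(result)           #    the result is saved in the MessageBox as well
    send_adapter(send_pipeline(result))  # 4. send pipeline, then hand over to the send adapter

sent: List[str] = []
process("id=42;item=book", sent.append)
print(sent)  # ['id=42']
```

Keeping every intermediate message in the store decouples the stages, which is the main design idea behind a central MessageBox.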

BizTalk Server 2004 is built completely around the Microsoft .NET Framework and Microsoft Visual Studio .NET. It also has native support for communicating using Web Services, along with the ability to import and export business processes described in Business Process Execution Language.

⁷ Source: [MS BizTalk, 2004]


2.1.2.2 Conclusions

The mediation systems described in this section are, like most (or all) industrial mediation systems, application-oriented and appropriate for particular needs. One of the major challenges in industry is not to obtain a general solution, open to new and innovative techniques, but to offer a simple solution with a low risk factor.

The first impression may be that these approaches do not raise any challenges, and that they are not appropriate for mediation in the context of Semantic Web Services. However, they should not be ignored: when developing a new mediation system, the option of improving one of the existing industrial mediators, which are robust and reliable, should be considered.



2.1.3 Research State-of-the-Art

The main focus of research activity in data and information mediation is the development of approaches and systems that require minimal human intervention. Ideally, no human input would be required at all, but this remains an unsolved problem.

The large number of projects concerning the mediation of data and information prevents us from trying to enumerate or describe all of them. Instead, we provide a short list of possible classifications for these projects, illustrating each class with projects based on the corresponding approach.

The classification criteria we chose are based on:

A) Scope of mediation
B) Classes of application
C) Approaches in constructing the mediator⁸


2.1.3.1 Classification Based on the Scope of Mediation

Based on the scope of mediation, three strategies for ontology mediation are distinguished: ontology mapping, ontology merging, and ontology alignment [Noy and Musen, 2000].



Ontology Mapping

In so-called ontology mapping, rules are defined to enable interoperability between two or more ontologies. The rules and the source ontologies are kept separate after integration.

The advantage of this approach is that the mapping rules, once defined, can be reused as many times as needed; a mapping rule must be rewritten only when one of the ontologies changes.
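A minimal sketch of this idea, assuming that instances are represented as simple attribute dictionaries: the mapping rules are stored as data separate from both ontologies, so they can be applied to every instance and only need editing when one of the ontologies changes. The ontologies, attribute names, and the pound-to-kilogram conversion are invented for illustration.

```python
from typing import Any, Callable, Dict, List, NamedTuple

class MappingRule(NamedTuple):
    source_attr: str                # attribute in ontology A
    target_attr: str                # corresponding attribute in ontology B
    convert: Callable[[Any], Any]   # value conversion applied during translation

# The rules live outside both ontologies and are reused for every instance.
RULES_A_TO_B: List[MappingRule] = [
    MappingRule("name", "hasName", lambda v: v),
    MappingRule("weight_lb", "hasWeightKg", lambda v: round(v * 0.45359237, 2)),
]

def translate(instance: Dict[str, Any], rules: List[MappingRule]) -> Dict[str, Any]:
    """Express an instance of ontology A in terms of ontology B."""
    return {r.target_attr: r.convert(instance[r.source_attr])
            for r in rules if r.source_attr in instance}

print(translate({"name": "Parcel-1", "weight_lb": 10}, RULES_A_TO_B))
# prints: {'hasName': 'Parcel-1', 'hasWeightKg': 4.54}
```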

A project that implements this approach is the COIN (COntext INterchange) project [Goh et al., 1999], which implements a suitable architecture for semantic interoperability.

The COIN framework consists of three main components: the domain model, elevation axioms, and context axioms. The domain model defines the application domain in terms of primitive types and semantic types. The elevation axioms identify the correspondences between the attributes of the source ontologies and the types defined in the domain model. The last components, the context axioms, correspond to the named contexts associated with different sources, providing the semantics of data in terms of the values assigned to semantic objects. Associated with a source ontology, the context axioms provide the articulation of the data semantics.
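The role of context axioms can be illustrated with a money amount reported under different conventions, such as currency and scale factor. In the hypothetical sketch below, each context is a small record of such assumptions, and a value is converted from a source context into the receiver's context. The contexts and the exchange rate are made-up constants, and the code is not COIN's actual mediation engine.

```python
from typing import Dict, NamedTuple

class Context(NamedTuple):
    currency: str
    scale_factor: int  # e.g. 1000 means values are reported in thousands

# Hypothetical contexts for two sources and one receiver.
CONTEXTS: Dict[str, Context] = {
    "source_us": Context("USD", 1000),
    "source_eu": Context("EUR", 1),
    "receiver": Context("EUR", 1),
}
RATES_TO_EUR = {"USD": 0.5, "EUR": 1.0}  # illustrative constant, not a real exchange rate

def to_context(value: float, src: str, dst: str) -> float:
    """Convert a money amount from the conventions of context src to those of context dst."""
    s, d = CONTEXTS[src], CONTEXTS[dst]
    in_eur = value * s.scale_factor * RATES_TO_EUR[s.currency]
    return in_eur / RATES_TO_EUR[d.currency] / d.scale_factor

print(to_context(2.5, "source_us", "receiver"))  # 1250.0 (2.5 thousand USD at the assumed rate)
```

Because the conventions are declared per context rather than hard-coded per source pair, adding a new source only requires describing its context.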

Articulating data based on a domain model (ontology) and relating the data to the domain model are important aspects of their architecture. However, a domain model is closer to a conceptual schema than to an ontology. In an example in [Goh et al., 1999], one can see that "money amount" is considered a subtype of "semantic number", while number is only a primitive type for representing the value, and "currency type" is a subtype of "semantic string". However, according to [Gruber, 1993], the definition of an ontology is based on the conceptualizations of the people in a community. Therefore, "money amount" is an amount or a quantity. Treating "semantic number" as a supertype results from the influence of application development, while "money amount" and "currency type" are related to a value of type number or string, respectively, only for representation purposes.

⁸ These classification criteria are not disjoint. A project can be classified using any (or all) of these criteria.


Ontology Merging

In the ontology merging approach, the two source ontologies are united into a single ontology that comprises all the information of the source ontologies [Noy and Musen, 2000]. The algorithm used for merging the ontologies should be able to eliminate any duplicates or inconsistencies that may occur (the original ontologies may cover similar or overlapping domains, which implies that some concepts may be defined in both of them, not always in a similar or even consistent manner).
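The core of a merge step can be sketched as taking the union of the concept definitions of both ontologies, collapsing duplicates and flagging concepts that are defined inconsistently. The sketch assumes that an ontology is just a mapping from concept name to its set of direct superclasses, and that identically named concepts denote the same thing; real merging algorithms must also detect synonyms and resolve the conflicts they find, which is not shown.

```python
from typing import Dict, List, Set, Tuple

Ontology = Dict[str, Set[str]]  # concept name -> set of direct superclasses

def merge(a: Ontology, b: Ontology) -> Tuple[Ontology, List[str]]:
    """Union two ontologies, removing duplicates and reporting inconsistently defined concepts."""
    merged: Ontology = {}
    conflicts: List[str] = []
    for concept in sorted(a.keys() | b.keys()):
        supers_a, supers_b = a.get(concept), b.get(concept)
        if supers_a is not None and supers_b is not None and supers_a != supers_b:
            conflicts.append(concept)  # defined in both ontologies, but not in the same way
        merged[concept] = (supers_a or set()) | (supers_b or set())
    return merged, conflicts

a = {"Vehicle": set(), "Car": {"Vehicle"}}
b = {"Vehicle": set(), "Car": {"MotorVehicle"}, "MotorVehicle": {"Vehicle"}}
merged, conflicts = merge(a, b)
print(conflicts)  # ['Car']
```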



The KRAFT project [Visser et al., 1999] implements the ontology merging approach. It is a project for the integration of heterogeneous information that uses ontologies to resolve semantic problems. The approach is to extract the vocabulary of the community and the definitions of terms from documents existing in an application domain. KRAFT uses a shared ontology [Jones, 1998] as a basis for mapping between ontology definitions and for communication between agents.

In [Visser et al., 1999], the architecture is "chosen to make shared ontology as expressive as the 'union' of the ontologies". However, the definition of the union of ontologies, and its similarities to or differences from a shared ontology, is not stated. KRAFT detects a set of ontology mismatches (as described in [Visser et al., 1999]) and establishes mappings between the shared ontology and local ontologies.


Ontology Alignment

The alignment of ontologies is accomplished by establishing links between them. As a consequence of the alignment, the two ontologies can reuse information from one another.




The ontology alignment approach is applied in Observer [Mena et al., 2000], which uses ontologies to allow queries against heterogeneous sources. It replaces terms in user queries with suitable terms in target ontologies by means of inter-ontology relations. Observer uses description logic as both the ontology definition language and the query language.

There are three steps in processing a query: query construction, access to underlying data, and controlled query expansion to new ontologies. The first step, query construction, needs human intervention for selecting the user ontology (which contains information about the semantics of the query) and for editing the query. The query is executed in the second step (access to underlying data), when the user ontology is accessed. If the user is not satisfied with the query results, then other ontologies containing related terms are visited (this being the third step, controlled query expansion to new ontologies).
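The term substitution performed by Observer can be sketched, in a strongly simplified form, as rewriting each query term through a table of inter-ontology relations: meaning-preserving relations are always used, while broader terms are only admitted when the query is expanded. This assumes queries are plain lists of terms and relations are a lookup table; Observer itself works on description logic expressions with a reasoner. All terms are hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical inter-ontology relations: user-ontology term -> [(target term, relation)].
RELATIONS: Dict[str, List[Tuple[str, str]]] = {
    "Book": [("Publication", "hypernym"), ("Monograph", "synonym")],
    "Author": [("Creator", "synonym")],
}

def translate_query(terms: List[str], allow_broader: bool = False) -> List[str]:
    """Replace user terms with target-ontology terms.

    Synonyms preserve meaning; hypernyms broaden the query and are only used when
    allow_broader is True (a crude stand-in for controlled query expansion).
    """
    result = []
    for term in terms:
        candidates = [t for t, rel in RELATIONS.get(term, [])
                      if rel == "synonym" or (allow_broader and rel == "hypernym")]
        result.append(candidates[0] if candidates else term)  # keep the term if no relation applies
    return result

print(translate_query(["Book", "Author"]))                      # ['Monograph', 'Creator']
print(translate_query(["Book", "Author"], allow_broader=True))  # ['Publication', 'Creator']
```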


A graphical representation of these three approaches is shown in Figure 6.


[Diagram: Ontology Mapping (with Mapping Rules), Ontology Merging, Ontology Alignment; annotation: "Ontology A is made compatible with ontology B"]

Figure 6: Ontology Integration Strategies

It is important to note that ontology merging and ontology alignment cannot be considered totally independently of ontology mapping, as mappings are still necessary to make the merging or the alignment possible. A good example of this is the Observer project, which maps the query results obtained by consulting remote ontologies to the results obtained by consulting the user ontology.


The choice of one of these mediation strategies is determined by the application field. If the only requirement is to express instances of one ontology in terms of the other, then ontology mapping is the most appropriate technique. If a set of rules and links is needed that permits the use and interoperation of two ontologies, the ontology alignment approach is more suitable. Finally, if the purpose is to obtain an ontology containing information from two different sources (ontologies), then merging them is the right solution.


2.1.3.2 Classification Based on the Classes of Application

Considering the classes of application, we can distinguish the following approaches [Madhavan et al., 2002]: information integration and Semantic Web Services, ontology merging, and data migration.


Information Integration and Semantic Web Services

The information integration and Semantic Web Services approach is appropriate when there is a need to use many heterogeneous sources without explicitly referring to each of them. The user simply queries a mediated logical schema containing the information relevant to the application. [Wache and Fensel, 2000] proposed so-called intelligent integration, which would allow the integration of a large variety of data sources, should be based on semantics by means of the ontologies used, and should provide an advanced query processor that includes facilities for content extraction, data abstraction, and a semantics-based query interface.


An example of a system that uses this approach is IRS-II (Internet Reasoning Service) [Motta et al., 2003]. Since this system addresses mediation in the context of Semantic Web Services, it is described in more detail than the previously presented systems.

The mediation component of IRS is called a bridge (a type of adapter) and exists within a framework for describing knowledge components. IRS bridges are not explicitly modeled (as they are, for example, in the Web Service Modeling Ontology [Roman et al., 2004]), but they have specific roles, as discussed below.



IRS-II [Motta et al., 2003] is a Semantic Web Services framework that allows applications to semantically describe and execute Web services.

IRS-II is based on the Unified Problem Solving Method Development Language (UPML) framework [Omelayenko et al., 2003], which distinguishes between the following categories of components, each specified by means of an appropriate ontology:




- Domain models: these describe the domain of an application, for example, vehicles or a medical disease.
- Task models: these provide a generic description of the task to be solved, specifying the input and output types, the goal to be achieved, and applicable preconditions.
- Problem Solving Methods (PSMs): these provide abstract, implementation-independent descriptions of reasoning processes that can be applied to solve tasks in a specific domain.
- Bridges: these specify mappings between the different model components within an application.

The IRS implementation of the UPML framework covers semantic mappings amongst knowledge components and integration techniques for task-centred invocation of Web Services. The publishing platforms of IRS-II facilitate the invocation of Semantic Web Services by mediating between the server of semantic descriptions and the actual Web service. The definitions of task, problem solving method (PSM), and bridge in IRS correspond to the definitions of goal, Web service description, and mediator in WSMF [Fensel and Bussler, 2002], since both approaches derive from the UPML framework.

The process of semantically describing services in IRS involves several mediation activities: mapping generic tasks and PSMs to a domain model, mapping PSMs to tasks, or, in general, adapting existing resources. More specifically, in the UPML framework, the knowledge components of a library can be described and connected together in different running systems through the creation of explicit mediating elements: adapters. In particular, bridge adapters connect two kinds of components by way of mapping relations between the ontologies of both components, such as:

- Task-Domain bridge
- PSM-Domain bridge
- Task-PSM bridge




IRS supports the direct acquisition of the value of an input role, according to the task ontology. If the domain knowledge does not conform to the task ontology, IRS supports users in constructing a mapping relation between the task role and the corresponding domain knowledge. A domain-task mapping relation defines the transformation of a piece of domain knowledge or attributes into an instantiated input role for the task; mappings may also be required for task outputs to conform to the domain ontology.

The UPML description of the library may also include a set of PSM-task bridges. If these are not already provided in the library, IRS supports the creation of such bridges to map the inputs and outputs of the described task to those of the selected PSM. IRS users specify the domain entities that fill in the input/output roles of the PSM. Some of the roles of the PSM are inherited from the configured task through a corresponding PSM-task bridge. In addition, the selected PSM may define supplemental roles. For example, a PSM can define the notion of an abstractor, a function that computes abstract types from raw data. IRS supports the acquisition of domain-method mapping knowledge in a way similar to the domain-task mapping during task description.

The invocation process consists of running the Web Service associated with the PSM to realize the specified task, with domain case data entered by the user. IRS first acquires case data from the user and instantiates the case inputs of the PSM by interpreting the Task-Domain, Task-PSM, and Domain-PSM mapping relations. IRS also checks the preconditions of the PSM and task on the mapped case data. IRS then runs the Web Service with the mapped inputs by accessing the publishing platform used for registering the service for the PSM; the publishing platform provides knowledge about the location and type of the PSM code. Finally, IRS fills in the domain outputs with the results of PSM execution, possibly transformed with Domain-PSM mapping relations defined at PSM description time.
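The invocation steps can be summarised in a short sketch. Everything in it is hypothetical and is not the IRS-II API: mapping relations are plain functions over dictionaries, preconditions are predicates, and the "Web Service" is a local function standing in for a service located through the publishing platform.

```python
from typing import Any, Callable, Dict, List

Data = Dict[str, Any]

def invoke_task(case_data: Data,
                input_mappings: List[Callable[[Data], Data]],
                preconditions: List[Callable[[Data], bool]],
                service: Callable[[Data], Data],
                output_mapping: Callable[[Data], Data]) -> Data:
    """Run the service realising a task, following the steps described above."""
    psm_inputs: Data = {}
    for mapping in input_mappings:       # Task-Domain, Task-PSM and Domain-PSM relations
        psm_inputs.update(mapping(case_data))
    if not all(check(psm_inputs) for check in preconditions):
        raise ValueError("preconditions of the task/PSM are not satisfied")
    raw_outputs = service(psm_inputs)    # call the (here: local) Web Service
    return output_mapping(raw_outputs)   # Domain-PSM mapping for the outputs

# Hypothetical example: classify a patient's temperature given in Fahrenheit.
result = invoke_task(
    {"patient_temp_f": 102.2},
    [lambda d: {"temperature_c": round((d["patient_temp_f"] - 32) * 5 / 9, 1)}],
    [lambda i: "temperature_c" in i],
    lambda i: {"label": "fever" if i["temperature_c"] >= 38 else "normal"},
    lambda o: {"diagnosis": o["label"]},
)
print(result)  # {'diagnosis': 'fever'}
```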

The IRS-Protégé implementation supports a structured methodology for mapping the input/output roles of reasoning resources to relevant domain entities. The methodology provides a typology of mapping-relation templates, that is, a mapping ontology, which covers a wide range of mapping relations, from simple renaming mappings to complex numerical or lexical transformations of entities.


Data Migration

The data migration approach is used for importing external data and then mapping, merging, or aligning it with internal application data (see Section 2.1.3.1 for more details about these three approaches). The decision as to which of these three techniques is most appropriate is again dictated by the application field, the main criterion being that the mismatches between internal and external data should be covered as much as possible.


The Clio project [Popa et al., 2002] is an example of a project that implements a data migration approach using queries for ontology mapping. Clio is a high-level schema-mapping tool that guides the user to the mapping specification by using so-called value correspondences. These value correspondences specify how the values of the source attributes are mapped to values of the target attributes.

The entire process consists of two main steps: semantic translation and data translation. The first step involves understanding the given value correspondences, which means that the semantic mappings must be understood and converted into logical mappings, while in the second step the logical mappings are transformed into low-level mappings, in this case queries.
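The second step, turning mappings into queries, can be illustrated with a deliberately simplified sketch: value correspondences over a single source table are compiled into one SQL `INSERT ... SELECT` statement, which is then executed with Python's built-in `sqlite3` module. Clio's logical-mapping stage also handles joins, nesting, and constraints, none of which is covered here; the table and column names are hypothetical.

```python
import sqlite3
from typing import List, Tuple

# Value correspondence: (source attribute, target attribute, SQL expression over the source).
# The first field only documents where the value comes from.
Correspondence = Tuple[str, str, str]

def to_query(source: str, target: str, corrs: List[Correspondence]) -> str:
    """Compile value correspondences into one INSERT ... SELECT statement."""
    cols = ", ".join(t for _, t, _ in corrs)
    exprs = ", ".join(e for _, _, e in corrs)
    return f"INSERT INTO {target} ({cols}) SELECT {exprs} FROM {source}"

corrs = [
    ("emp_name", "full_name", "emp_name"),
    ("salary_month", "salary_year", "salary_month * 12"),
]
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE emp (emp_name TEXT, salary_month INTEGER)")
conn.execute("CREATE TABLE staff (full_name TEXT, salary_year INTEGER)")
conn.execute("INSERT INTO emp VALUES ('Ada', 1000)")
conn.execute(to_query("emp", "staff", corrs))
print(conn.execute("SELECT * FROM staff").fetchall())  # [('Ada', 12000)]
```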


Ontology Merging

The ontology merging approach was described in the previous section (Classification Based on the Scope of Mediation).


The three approaches illustrated in this section are by no means disjoint. A separation between them may be theoretically possible, but in practice information integration or data migration cannot be implemented without combining it with ontology merging (or with another of the techniques described in the previous section).


2.1.3.3 Approaches in Constructing the Mediator

There are three main approaches to constructing a mediator: machine learning [Doan et al., 2002], structure-based analysis (schema matching), and linguistic/lexical analysis [Rahm and Bernstein, 2001].


Machine Learning

In the machine learning approach, the mapping rules are “learned” from existing examples of mappings. These mappings are usually constructed manually or semi-automatically (in which case the system makes mapping suggestions, but input from a domain expert is still needed). The larger the training set, the more accurate the results obtained with this approach.

As an example of a system that uses machine learning techniques to assist the ontology mapping process, we describe here the Glue system [Doan et al., 2002]. By applying probabilistic definitions of similarity measures, Glue is able to find the most similar concepts between two heterogeneous data sources. The architecture of the system is shown in Figure 7.





Figure 7: GLUE Architecture⁹


The main elements of the architecture are the distribution estimator, the similarity estimator, and the relaxation labeler. The distribution estimator applies machine learning techniques to compute the joint probability distribution of two concepts belonging to two different taxonomies (the probability that the two concepts have the same semantics). The similarity estimator applies a user-supplied function to this probability distribution, obtaining a similarity factor for each pair of concepts. Over the entire taxonomies, these similarity factors form a similarity matrix, which the relaxation labeler processes, together with domain-specific constraints and heuristic knowledge, to obtain a mapping configuration.
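The pipeline can be sketched with the Jaccard coefficient, P(A∩B)/P(A∪B), as the user-supplied similarity function (one of the similarity functions discussed for Glue). Two simplifications are assumed: instance membership of each concept is known directly, whereas Glue estimates the distributions with learned classifiers, and the relaxation labeler is replaced by a greedy choice of the most similar concept. Concept and instance names are hypothetical.

```python
from typing import Dict, Set

def jaccard(a: Set[str], b: Set[str], universe: Set[str]) -> float:
    """P(A and B) / P(A or B), estimated by counting instances in a shared universe."""
    n = len(universe)
    p_and = len(a & b) / n
    p_or = len(a | b) / n
    return p_and / p_or if p_or else 0.0

def similarity_matrix(t1: Dict[str, Set[str]], t2: Dict[str, Set[str]]) -> Dict[str, Dict[str, float]]:
    """Similarity factor for every pair of concepts from the two taxonomies."""
    universe = set().union(*t1.values(), *t2.values())
    return {c1: {c2: jaccard(i1, i2, universe) for c2, i2 in t2.items()} for c1, i1 in t1.items()}

def best_matches(matrix: Dict[str, Dict[str, float]]) -> Dict[str, str]:
    """Greedy stand-in for the relaxation labeler: most similar concept for each row."""
    return {c1: max(row, key=row.get) for c1, row in matrix.items()}

t1 = {"Faculty": {"p1", "p2", "p3"}, "Staff": {"p4", "p5"}}
t2 = {"Professor": {"p1", "p2"}, "Employee": {"p4", "p5", "p6"}}
print(best_matches(similarity_matrix(t1, t2)))  # {'Faculty': 'Professor', 'Staff': 'Employee'}
```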


Schema Matching and Linguistic/Lexical Analysis

In this case, the internal structure of the concepts is analyzed. At the same time, heuristic functions based on linguistic similarities may be used (for example, consulting a dictionary or a thesaurus to find lexical relations between concept names, such as synonymy or hyponymy).
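A minimal sketch of such a linguistic heuristic, assuming a tiny hand-written thesaurus in place of a real dictionary: concept names are first compared through the thesaurus, and otherwise by plain string similarity using `difflib.SequenceMatcher` from the Python standard library. The thesaurus entries and the scores (1.0 for synonyms, 0.8 for hyponyms) are arbitrary assumptions.

```python
from difflib import SequenceMatcher
from typing import Dict, Set

# Hypothetical thesaurus entries.
SYNONYMS: Dict[str, Set[str]] = {"car": {"automobile"}, "client": {"customer"}}
HYPONYMS: Dict[str, Set[str]] = {"vehicle": {"car", "truck"}}  # general term -> more specific terms

def related(a: str, b: str, table: Dict[str, Set[str]]) -> bool:
    """True if a and b are linked in either direction in the given relation table."""
    return b in table.get(a, set()) or a in table.get(b, set())

def name_similarity(a: str, b: str) -> float:
    """Score the similarity of two concept names in [0, 1]."""
    a, b = a.lower(), b.lower()
    if a == b or related(a, b, SYNONYMS):
        return 1.0
    if related(a, b, HYPONYMS):
        return 0.8
    return SequenceMatcher(None, a, b).ratio()  # fallback: character-level similarity

print(name_similarity("Client", "Customer"), name_similarity("Vehicle", "Car"))  # 1.0 0.8
```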


The XMapper system¹⁰ is a suitable illustration of the structure-based approach. It was developed specifically to create transformations between different XML message formats. XMapper uses only instance information to create the transformations. Figure 8 shows the functionality of the XMapper system.




⁹ Source: [Doan et al., 2002]
¹⁰ http://citeseer.nj.nec.com/kurgan02semantic.html



To create a transformation, XMapper first extracts a number of XML message instances from each data source. From these instances, XMapper then extracts the structure of the message, all XML elements of each message, and a set of possible values for each of these elements. In the next step, a feature vector containing 22 features (such as type, allowed values, and length) is created for each XML element. Sixteen elements of each feature vector are created by the constraint analysis and six by the learning component, using an algorithm called DataSqueezer [Larson et al., 1989]. After a feature vector has been created for each XML element, a Distance Table is built by calculating the distance between every two elements of the different sources. The transformation can then easily be found by mapping the elements of the two sources that have the shortest distance.
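The last two steps, building a distance table from feature vectors and pairing the closest elements, can be sketched as follows. The three toy features (a text flag, the average value length, and the fraction of numeric values) are an invented stand-in for XMapper's 22 features, and Euclidean distance is an assumption; the sketch does not reproduce the constraint analysis or the DataSqueezer learning step.

```python
import math
from typing import Dict, List, Tuple

Vector = List[float]

def features(values: List[str]) -> Vector:
    """Toy feature vector for an XML element, computed from its observed instance values."""
    numeric = sum(v.replace(".", "", 1).isdigit() for v in values) / len(values)
    avg_len = sum(len(v) for v in values) / len(values)
    is_text = 0.0 if numeric == 1.0 else 1.0
    return [is_text, avg_len, numeric]

def distance_table(src: Dict[str, Vector], tgt: Dict[str, Vector]) -> Dict[Tuple[str, str], float]:
    """Distance between every element of the source and every element of the target."""
    return {(s, t): math.dist(vs, vt) for s, vs in src.items() for t, vt in tgt.items()}

def closest_pairs(table: Dict[Tuple[str, str], float]) -> Dict[str, str]:
    """Map each source element to the target element at the shortest distance."""
    best: Dict[str, str] = {}
    for (s, t), d in table.items():
        if s not in best or d < table[(s, best[s])]:
            best[s] = t
    return best

src = {"price": features(["10.5", "7.25"]), "name": features(["Widget", "Gadget"])}
tgt = {"cost": features(["3.99", "12.0"]), "title": features(["Gizmo", "Doohickey"])}
print(closest_pairs(distance_table(src, tgt)))  # {'price': 'cost', 'name': 'title'}
```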


Figure 8: Functionality of the XMapper system¹¹


As in the previous section (Classification Based on the Classes of Application), the techniques illustrated here are often combined. The XMapper system actually uses both of these approaches, by “learning” while it creates the 22-feature vectors.






¹¹ Source: http://citeseer.nj.nec.com/kurgan02semantic.html



2.1.3.4 Conclusions

In this section (the research state-of-the-art on data and information mediation, Section 2.1), we have described some of the currently existing research approaches to data and information mediation. The projects presented were selected based on the classification criteria and the approaches identified; for each approach, we present a project that illustrates its particularities.

The vast number of research approaches and projects in this area leads to only one conclusion: the research is far from over, and better solutions may yet be found. As previously stated in this section, an attempt to implement a particular approach while completely disregarding all the others is neither an optimal nor a feasible solution. The best solution is probably to extract the most important features (from the functionality point of view) of all these approaches and to combine them in order to achieve the desired functionality.




2.2 Mediation of Processes

To analyse the state-of-the-art in process mediation, we first need a clear understanding of the related concepts. Therefore, we examine uses of processes in systems, paying special attention to the usage of process technologies within Semantic Web Services. Then, we point out where process mediation is needed and the specific requirements that arise for different process mediation scenarios. This section investigates these aspects. Starting with a general overview of process technologies, we point out the application scenarios for process-level mediation within Semantic Web Services, and then investigate the existing technologies and approaches that can serve as a starting point for the development of the Process Level Mediation Module of the DIP Mediation Component.


2.2.1 Processes and Process Technologies

This section gives a brief overview of process technologies and their use within
Semantic Web Services, and the resulting requirements for process level mediation
scenarios.



2.2.1.1 Usage of Process Technologies within Semantic Web Services

A process is a set of activities and transitions with conditions for transitions. Depending on the specific process, its tasks could be a combination of services that stand for queries, transactions, applications, and administrative activities. These services can be distributed within or across enterprises and are coordinated by constraining the control and data flow among them. The services can themselves be composite, that is, implemented as processes, thus introducing nested processes and a recursive definition of processes.
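To make this definition concrete, the following minimal Python sketch models a process as a set of steps connected by conditional transitions, where a step is either an atomic activity or another process (a nested process). All names (`Activity`, `Transition`, `Process`, the shared `ctx` dictionary, the order example) are illustrative assumptions, not part of any DIP component or standard; a transition is followed only if its condition holds on the current data.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List, Optional, Union

# The application data passed between steps (illustrative assumption).
Context = Dict[str, Any]


@dataclass
class Activity:
    """Atomic step; here simply realized by a Python callable."""
    name: str
    run: Callable[[Context], None]


@dataclass
class Transition:
    """Passage from one step to another, taken only if `condition` holds."""
    source: str
    target: str
    condition: Callable[[Context], bool] = lambda ctx: True


@dataclass
class Process:
    """Steps plus conditional transitions; a step may itself be a Process."""
    name: str
    start: str
    steps: Dict[str, Union[Activity, Process]] = field(default_factory=dict)
    transitions: List[Transition] = field(default_factory=list)

    def execute(self, ctx: Context, trace: Optional[List[str]] = None) -> List[str]:
        """Run the process and return the names of the executed activities."""
        trace = [] if trace is None else trace
        current: Optional[str] = self.start
        while current is not None:
            step = self.steps[current]
            if isinstance(step, Process):
                step.execute(ctx, trace)  # nested process, same data context
            else:
                step.run(ctx)
                trace.append(step.name)
            # Follow the first outgoing transition whose condition holds;
            # the process ends when no transition applies.
            current = next(
                (t.target for t in self.transitions
                 if t.source == current and t.condition(ctx)),
                None,
            )
        return trace


# Usage: check an order, then run a nested shipping process or reject it.
shipping = Process("shipping", "pack",
                   steps={"pack": Activity("pack", lambda ctx: None),
                          "send": Activity("send", lambda ctx: None)},
                   transitions=[Transition("pack", "send")])
order = Process("order", "check",
                steps={"check": Activity("check", lambda ctx: ctx.update(ok=ctx["amount"] <= 100)),
                       "ship": shipping,
                       "reject": Activity("reject", lambda ctx: None)},
                transitions=[Transition("check", "ship", lambda ctx: ctx["ok"]),
                             Transition("check", "reject", lambda ctx: not ctx["ok"])])

print(order.execute({"amount": 40}))   # ['check', 'pack', 'send']
print(order.execute({"amount": 500}))  # ['check', 'reject']
```

Because a `Process` can appear wherever an `Activity` can, the recursive definition of processes mentioned above falls out of the data structure directly.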

Before explaining the requirements and the challenges arising for process technologies, we first describe the general building blocks of processes and their definition [Bussler, 2003]:

- Activity/Action/State: a step in a process that can be realized in an arbitrary way, that is, by a simple program, by a more complex one, by another process (‘sub-process’ or ‘nested process’), or by a manual activity. Activities represent the basic building blocks of what is done or achieved in a process. At the level of abstraction represented by the process, activities are not split into smaller building blocks.

- Transition: a transition is the passage from one activity to another. Transitions are realized by conditions that determine whether the transition is taken.

- Data Flow: process technology allows specifying, executing, and controlling complex, multi-step information processing. Thus, the information to be processed has to be passed through the building blocks of the process. Data flow is concerned with real application data, in contrast to control flow, which deals with process technology information. The duty of data flow technology for processes is to ensure that each activity in a process receives the information it needs for execution.

- Control Flow: control information is needed in order to provide the means for defining the nature of a process and for controlling its execution. Control flow primitives can be distinguished as:



a. Process Logic Primitives: control flow elements for the specification of control flow structures that can be combined into more complex algorithms. The most common process logic description primitives are (naming in accordance with BPEL4WS, see [Curbera et al., 2002]):

   - sequence, for serial execution
   - while, to implement a loop
   - switch, for multiple-way branching
   - flow, for parallel execution
   - pick, for choosing among alternative paths based on an external event

b. Execution Control Primitives: primitives for defining the execution handling of a process or its activities. This group (optionally) contains primitives for:

   - Timing: handling of timeouts and similar time constraints during process execution
   - Event Handling: support for event-driven execution of processes
   - Interaction: interoperation between parties

Adequate process technologies face a number of technical challenges. Firstly, they have to support modeling processes and ensure correctness of execution with respect to the model and to the constraints of the underlying services and their resources. Normal execution of a process is easy when the process model specifies a partial order of the activities in the process. Exception conditions can be more difficult to model and handle. More importantly, because interesting business processes are often long running, interactions among them are non-atomic, so the information they take as input may be subject to revision, which invalidates their own results. Exceptions and revisions are the main sources of complications in the modeling of a process.
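The contrast between normal execution and exception handling can be sketched as follows. Activities are executed in an order consistent with a partial order, using Python's standard `graphlib.TopologicalSorter`; if one activity fails, the already completed activities are undone in reverse order. This compensation strategy is one common way of dealing with non-atomic, long-running processes and is an assumption made here for illustration; it does not address revisions of input data, which remain harder to handle. The function and activity names are hypothetical.

```python
from graphlib import TopologicalSorter
from typing import Callable, Dict, List, Set

Action = Callable[[], None]


def run_with_compensation(depends_on: Dict[str, Set[str]],
                          actions: Dict[str, Action],
                          compensations: Dict[str, Action]) -> List[str]:
    """Execute activities in an order consistent with the partial order
    `depends_on` (activity -> activities that must complete first).
    If an activity raises, undo the completed ones in reverse order and
    re-raise the exception. Returns the completed activities."""
    completed: List[str] = []
    try:
        for name in TopologicalSorter(depends_on).static_order():
            actions[name]()
            completed.append(name)
    except Exception:
        for name in reversed(completed):
            compensations.get(name, lambda: None)()
        raise
    return completed


# Usage with hypothetical activities: 'reserve' and 'check' are unordered
# with respect to each other; both precede 'charge', which precedes 'ship'.
log: List[str] = []


def do(name: str, fail: bool = False) -> Action:
    def run() -> None:
        if fail:
            raise RuntimeError(f"{name} failed")
        log.append(f"do {name}")
    return run


def undo(name: str) -> Action:
    return lambda: log.append(f"undo {name}")


graph: Dict[str, Set[str]] = {"reserve": set(), "check": set(),
                              "charge": {"reserve", "check"}, "ship": {"charge"}}
try:
    run_with_compensation(
        graph,
        {"reserve": do("reserve"), "check": do("check"),
         "charge": do("charge"), "ship": do("ship", fail=True)},
        {name: undo(name) for name in graph},
    )
except RuntimeError as exc:
    print(exc)  # ship failed
print(log)  # order of 'reserve'/'check' may vary; the undo entries mirror it
```

The partial order only fixes what must happen before what; any linearization is acceptable for normal execution, whereas the failure case needs extra modeling (the compensation actions) that the partial order alone does not provide.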

Secondly, a suitable process technology has to support interfacing the process with the underlying functionalities, that is, the resources that actually fulfil the distinct activities in a process. Within database systems, this would require linkage to the concurrency control and recovery mechanisms of a DBMS; within Semantic Web Services, a linkage to the execution control of Web Services is required.
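A minimal sketch of such a linkage between a process activity and the underlying service: a wrapper turns a service invocation into an activity that writes its result into the process data and enforces a timeout, one of the execution control primitives listed earlier. `bind_service` and the two services are hypothetical; `invoke` stands in for a real Web Service client call, and the thread pool and timeout values are assumptions.

```python
import concurrent.futures as cf
import time
from typing import Any, Callable, Dict

Context = Dict[str, Any]

# Shared worker pool for the (hypothetical) service calls.
_POOL = cf.ThreadPoolExecutor(max_workers=4)


def bind_service(invoke: Callable[[Context], Any], output_key: str,
                 timeout_s: float) -> Callable[[Context], None]:
    """Wrap a service invocation as a process activity.

    `invoke` stands in for the real Web Service client call. The activity
    passes a copy of the process data to the service, stores the result
    in ctx[output_key], and raises TimeoutError if no answer arrives
    within `timeout_s` seconds."""
    def activity(ctx: Context) -> None:
        future = _POOL.submit(invoke, dict(ctx))
        try:
            ctx[output_key] = future.result(timeout=timeout_s)
        except cf.TimeoutError:
            future.cancel()  # only succeeds if the call has not started yet
            raise TimeoutError(
                f"service for '{output_key}' timed out after {timeout_s}s") from None
    return activity


# Usage with two stand-in services.
def quote_service(data: Context) -> int:
    return data["amount"] + 5


def slow_service(data: Context) -> int:
    time.sleep(0.5)  # simulates a service that answers too late
    return 0


ctx: Context = {"amount": 100}
bind_service(quote_service, "price", timeout_s=1.0)(ctx)
print(ctx["price"])  # 105
try:
    bind_service(slow_service, "price", timeout_s=0.1)(ctx)
except TimeoutError as exc:
    print(exc)  # service for 'price' timed out after 0.1s
```

Passing a copy of the data isolates the service from concurrent changes to the process state. Cancellation is only best effort: a call that is already running cannot be interrupted this way, which is exactly the kind of gap a real linkage to Web Service execution control would have to close.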

A major use of process technologies is to allow the automation of business processes within organizations. This area is commonly referred to as “workflow technologies”, which are a special type of processes