1
Matching and Reuse of XML Schemas
2
Sample XML Schema
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="car">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="make" type="xs:string"/>
        <xs:element name="model" type="xs:string"/>
        <xs:element name="year" type="xs:string"/>
        <xs:element name="color" type="xs:string"/>
        <xs:element name="driver">
          <xs:complexType>
            <xs:sequence>
              <xs:element name="first" type="xs:string"/>
              <xs:element name="last" type="xs:string"/>
              <xs:element name="license" type="xs:string"/>
            </xs:sequence>
          </xs:complexType>
        </xs:element>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
3
What is XML schema matching
Matching: identifying the relations among the corresponding elements of two schemas
e.g. customer/firstName <==> client/name/first
customer/name <==> concatenate(client/name/first, client/name/last)
Calculate the distance between two schemas
E.g., the distance between customer.xsd and client.xsd is 0.67.
4
Why XML Schema matching
From the data integration point of view:
Purpose: automatically identifying corresponding elements between two schemas
Relevant works:
Database schema matching/mapping, e.g., A. Doan, et al., Reconciling schemas of disparate data sources: A machine-learning approach. SIGMOD, 2001.
Generic schema mapping, e.g., J. Madhavan, P. A. Bernstein, E. Rahm. Generic schema matching with Cupid. VLDB, 2001.
XML Schema matching, e.g., H. Do, E. Rahm. COMA: A system for flexible combination of schema matching approaches. VLDB, 2002.
From the web service composition point of view:
e.g., matching the output type of one service with the input of another in sequential composition
From the software reuse point of view:
Purpose: build XML Schema categories and search engines
Relevant works:
Software component search: A. Mili, R. Mili, R. T. Mittermeir, A survey of software reuse libraries, Annals of Software Engineering, 1998.
Agent and service matching: Katia Sycara, Jianguo Lu, Matthias Klusch, Interoperability among Heterogeneous Software Agents on the Internet, Technical Report CMU-RI-TR-98-22, CMU.
5
What are the problems
Modelling
As graph
As tree matching
Node similarity
Name, type, cardinality.
Structure similarity
Tree edit distance
K. Zhang, D. Shasha. Simple fast algorithms for the editing distance between trees and related problems. SIAM Journal on Computing, 1989.
6
Overview of our system
[Figure: system architecture with the components XML Schema (two inputs), Modelling, Name Relations, Node Relations, Structural Relations, Name Similarity, Node Similarity, Structural Similarity, and Results retrieval]
7
Three similarities
[Figure: the three similarities and their inputs]
Name similarity: WordNet, string matching, Hungarian method
Node similarity: node name; compatibility tables for built-in and user-defined data types; cardinality
Structural similarity: hierarchical structure; tree matching algorithm
8
Modelling
<xs:element name="driver" type="driverType"/>
<xs:attribute name="license" type="xs:string"/>
Model schemas as trees
9
Modelling
[Figure: example schema trees. A customerOrder schema with shipping and billing branches (date, ship2Add, bill2Add) that use an address type; a reference/paper schema with author, title, contents, refNo; and two address definitions, Address_ca.xsd (street, province, postcode) and Address_us.xsd (street, state, zip)]
Model schemas as trees
Reference
Importing and Inclusion
Recursion
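As a rough illustration of the modelling step, the sketch below turns a cut-down version of the sample car schema into a nested (name, children) tree using Python's standard xml.etree.ElementTree. The function names and the tuple representation are assumptions made for this example, not the authors' implementation, and the sketch does not resolve references, imports/includes, or recursive types.

```python
import xml.etree.ElementTree as ET

XS = "{http://www.w3.org/2001/XMLSchema}"

# Cut-down version of the sample car schema (illustrative input only).
SAMPLE_XSD = """<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="car">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="make" type="xs:string"/>
        <xs:element name="driver">
          <xs:complexType>
            <xs:sequence>
              <xs:element name="first" type="xs:string"/>
              <xs:element name="last" type="xs:string"/>
            </xs:sequence>
          </xs:complexType>
        </xs:element>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>"""


def declared_children(node):
    """Yield the nearest xs:element / xs:attribute declarations below node."""
    for child in node:
        if child.tag in (XS + "element", XS + "attribute"):
            yield child
        else:
            # Look through wrappers such as complexType, sequence, all, choice.
            yield from declared_children(child)


def to_tree(decl):
    """Model one declaration as (name, [child trees])."""
    return (decl.get("name"), [to_tree(c) for c in declared_children(decl)])


def schema_to_tree(xsd_text):
    """Model a whole schema document as a tree rooted at 'schema'."""
    root = ET.fromstring(xsd_text)
    return ("schema", [to_tree(c) for c in declared_children(root)])


tree = schema_to_tree(SAMPLE_XSD)
print(tree)
```

Elements and attributes both become plain tree nodes here, which matches the idea of treating the schema purely as a labelled tree.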
10
Information excluded in Modelling
Related to elements or attributes
Default value, value range, unique, nullable…
Related to structure
Sequence
All
Choice
[Figure: two trees, name(first, last) and name(last, first)]
Model schemas as trees
11
Computing node similarity
Computing name similarity with the help of:
WordNet and its API
String matching
Hungarian method
Add the similarity of other information
Data type
Minimum cardinality
Maximum cardinality
Node similarity
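One way to read "add the similarity of other information" is as a weighted sum of the name, data type, and cardinality similarities. The sketch below assumes exactly that; the weights, the Node class, and the small type compatibility table are hypothetical values chosen for illustration and do not come from the slides.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    name: str
    data_type: str
    min_occurs: int
    max_occurs: int  # use a large number to stand for "unbounded"


# Hypothetical compatibility table for built-in types; the values are made up.
TYPE_COMPATIBILITY = {
    frozenset(["xs:int", "xs:integer"]): 0.9,
    frozenset(["xs:string", "xs:int"]): 0.4,
}


def type_similarity(a, b):
    """Identical types score 1.0; other pairs are looked up in the table."""
    if a == b:
        return 1.0
    return TYPE_COMPATIBILITY.get(frozenset([a, b]), 0.0)


def node_similarity(n1, n2, name_sim, weights=(0.7, 0.1, 0.1, 0.1)):
    """Weighted sum of name, type, min and max cardinality similarity.

    name_sim is computed elsewhere (e.g. from token lists); the weights
    are an assumption of this sketch.
    """
    w_name, w_type, w_min, w_max = weights
    return (
        w_name * name_sim
        + w_type * type_similarity(n1.data_type, n2.data_type)
        + w_min * float(n1.min_occurs == n2.min_occurs)
        + w_max * float(n1.max_occurs == n2.max_occurs)
    )


a = Node("year", "xs:int", 1, 1)
b = Node("yr", "xs:integer", 0, 1)
print(node_similarity(a, b, name_sim=0.5))
```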
12
Name similarity from token lists
Tokenize names
E.g. clientName -> client name
submittedReports -> submit report
Similarity between two token lists
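Using the Hungarian method for Weighted Bipartite Graph Matching (WBGM)
[Figure: bipartite graph for customerDeliveryAddress vs. clientRequiredShippingAddress, with the tokens customer, delivery, address on one side and client, require, shipping, address on the other; edge weights are sim(i,j), starting from sim(0,0)]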
Node similarity
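The token-list comparison can be sketched in a few lines. Two substitutions are made here and should be read as assumptions: difflib's SequenceMatcher stands in for the WordNet and string-matching scores, and a brute-force search over assignments stands in for the Hungarian method (it finds the same optimum, but is only practical for the handful of tokens in a name). Stemming (submitted -> submit) is not attempted, and dividing by the longer list length is a normalization chosen for this example.

```python
import re
from difflib import SequenceMatcher
from itertools import permutations


def tokenize(name):
    """Split a camelCase identifier into lowercase tokens."""
    parts = re.findall(r"[A-Z]?[a-z]+|[A-Z]+(?![a-z])|\d+", name)
    return [p.lower() for p in parts]


def token_similarity(a, b):
    """String similarity in [0, 1]; a stand-in for a WordNet-based score."""
    return SequenceMatcher(None, a, b).ratio()


def name_similarity(name1, name2):
    """Best one-to-one token assignment, normalized by the longer list."""
    t1, t2 = tokenize(name1), tokenize(name2)
    if not t1 or not t2:
        return 0.0
    short, long_ = sorted((t1, t2), key=len)
    # Try every way of assigning the shorter list's tokens to distinct
    # tokens of the longer list and keep the highest total weight.
    best = max(
        sum(token_similarity(s, l) for s, l in zip(short, perm))
        for perm in permutations(long_, len(short))
    )
    return best / len(long_)


print(tokenize("customerDeliveryAddress"))
print(name_similarity("customerDeliveryAddress", "clientRequiredShippingAddress"))
```

Because the assignment is one-to-one and order-free, clientName and nameClient score as identical under this sketch.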
13
Determine the structural relation
[Figure: Tree 1 and Tree 2]
Structure similarity
14
Common substructure
[Figure: two car trees. One has car(make, model, year, color, driver(firstName, lastName, license)); the other has car(make, model, year, color, driver(first, last, license))]
Structure similarity
15
Approximate Common Structure
[Figure: the same two car trees, car(make, model, year, color, driver(firstName, lastName, license)) and car(make, model, year, color, driver(first, last, license))]
Structure similarity
16
Mappings in an ACS
[Figure: car tree with nodes make, model, year, color, driver, first (firstName), last (lastName), license, divided into ACS1 and ACS2]
m_ACS1 = {(s1.car, s2.car),
(s1.make, s2.make),
(s1.year, s2.year),
(s1.color, s2.color)}
m_ACS2 = {(s1.driver, s2.driver),
(s1.first, s2.firstName),
(s1.last, s2.lastName),
(s1.license, s2.license)}
Structure similarity
17
Evaluation
Criteria
Matching outcomes
Mappings
Schema similarity
Execution time
Collected four groups of Schemas
Purchase orders used in COMA (5)
Large schemas from XML.org (86)
Schemas on hospitality domain (95)
Extract from WSDL (419)
Evaluation
18
Comparison with edit distance algorithm: element mapping on data group 1
Evaluation
Method 1: our algorithm
Method 2: edit distance
19
Comparison with edit distance: schema similarity on data groups 3 and 4
Evaluation
Method 1: our algorithm
Method 2: edit distance
20
Comparison with edit distance: performance on data group 2
Evaluation
Method 1: our algorithm
Method 2: edit distance
21
Comparison with COMA (Mapping)
            COMA 'All'    COMA 'All+SchemaM'    Our algorithm
Precision   about 0.95    about 0.93            0.88
Recall      about 0.78    about 0.89            0.87
Overall     0.73          0.82                  0.75
Overall is a measure that combines precision and recall. It reflects the effort of removing incorrect mappings and adding missing ones.
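The Overall figures can be reproduced approximately from precision and recall, assuming the definition used in the COMA evaluation, Overall = Recall * (2 - 1/Precision). The slide itself does not state the formula, so treat it as an assumption of this sketch; the labels and inputs are the rounded values reported above.

```python
def overall(precision, recall):
    """Overall = recall * (2 - 1/precision); assumed COMA-style definition."""
    return recall * (2 - 1 / precision)


# Rounded precision/recall values taken from the comparison above.
reported = [
    ("COMA 'All'", 0.95, 0.78),
    ("COMA 'All+SchemaM'", 0.93, 0.89),
    ("Our algorithm", 0.88, 0.87),
]

for label, p, r in reported:
    print(f"{label}: {overall(p, r):.2f}")
```

Since two of the inputs are only given as "about", the computed values can differ from the reported Overall row in the second decimal place.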
Evaluation
22
Conclusion
Scalable schema matching
Wang Lian, David W. Cheung, Nikos Mamoulis, and Siu-Ming Yiu, An Efficient and Scalable Algorithm for Clustering XML Documents by Structure, TKDE, 2005.
Subtyping
Apply to web service matching
23
Web service synthesis
24
Web Service Composition
Composite web service: “service implemented by combining the functionality provided by other web services” (G. Alonso et al.)
Web service composition: the process of developing a composite web service
Approaches to web service composition:
Conventional programming languages, such as Java, C#;
Web service composition languages, such as BPEL;
Workflow, pi-calculus, Petri net, automata…
Web service synthesis.
composition
25
Web Service Synthesis
BPEL and the like are still programming languages
They describe exactly how to compose the web services.
Web service synthesis
We describe what the service is, but not how to implement it;
We don’t even know which component services are involved;
The relevant services are discovered and invoked dynamically;
The implementation is synthesized automatically from the web service specification.
Program synthesis has a long history.
composition
26
Web Service Synthesis
[Figure: a web service WS with a syntactic specification (WSDL), a semantic specification (Datalog), and a service implementation; and a composite WS with a service specification (WSDL/Datalog) and a service implementation (BPEL) built from WS1 and WS2]
composition
27
Synthesis Example
Chapters:
Syntactic specification: …
Semantic specification:
chapters(ISBN, PRICE, TITLE, AUTHOR) <- Chapters(ISBN, PRICE), Book1(TITLE, ISBN, AUTHOR).
amazon:
Service implementation: Java code, database
Service specification
Syntactic specification: WSDL file
Semantic specification:
amazon(ISBN, PRICE, RATE, TITLE, AUTHOR) <- Amazon(ISBN, PRICE), Book1(TITLE, ISBN, AUTHOR), Book2(ISBN, COMMENT, RATE).
MetaSearchService:
Service specification
Syntactic: interface definition defined by WSDL
Semantic:
Q(ISBN, PRICE, TITLE, RATE) <- Chapters(ISBN, PRICE), Book1(TITLE, ISBN, AUTHOR), Book2(ISBN, COMMENT, RATE).
MetaSearchService implementation: ??
composition
28
Generate the abstract implementation by query rewriting
Chapters:
Syntactic specification: …
Semantic specification:
chapters(ISBN, PRICE, TITLE, AUTHOR) <- Chapters(ISBN, PRICE), Book1(TITLE, ISBN, AUTHOR).
amazon:
Service implementation: Java code, database
Service specification
Syntactic specification: WSDL file
Semantic specification:
amazon(ISBN, PRICE, RATE, TITLE, AUTHOR) <- Amazon(ISBN, PRICE), Book1(TITLE, ISBN, AUTHOR), Book2(ISBN, COMMENT, RATE).
MetaSearchService:
Service specification
Syntactic: interface definition defined by WSDL
Semantic:
Q(ISBN, PRICE, TITLE, RATE) <- Chapters(ISBN, PRICE), Book1(TITLE, ISBN, AUTHOR), Book2(ISBN, COMMENT, RATE).
MetaSearchService abstract implementation:
Q(ISBN, PRICE, TITLE, RATE) <- amazon(ISBN, PRICE, RATE, TITLE', AUTHOR'), chapters(ISBN, PRICE0, TITLE, AUTHOR).
composition
29
Generate the Concrete Implementation
Chapters:
Syntactic specification: …
Semantic specification:
chapters(ISBN, PRICE, TITLE, AUTHOR) <- Chapters(ISBN, PRICE), Book1(TITLE, ISBN, AUTHOR).
amazon:
Service implementation: Java code, database
Service specification
Syntactic specification: WSDL file
Semantic specification:
amazon(ISBN, PRICE, RATE, TITLE, AUTHOR) <- Amazon(ISBN, PRICE), Book1(TITLE, ISBN, AUTHOR), Book2(ISBN, COMMENT, RATE).
MetaSearchService:
Service specification
Syntactic: interface definition defined by WSDL
Semantic:
Q(ISBN, PRICE, PRICE0, TITLE, RATE) <- …
MetaSearchService abstract implementation:
Q(ISBN, PRICE, PRICE0, TITLE, RATE) <- amazon(ISBN, PRICE, RATE, TITLE', AUTHOR'), chapters(ISBN, PRICE0, TITLE, AUTHOR).
MetaSearchService concrete implementation:
Invoke amazon;
Invoke chapters;
Combine the output;
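The three steps of the concrete implementation (invoke amazon, invoke chapters, combine the output) amount to a join on ISBN. A minimal sketch follows, in which the two invoke functions are hypothetical stubs returning made-up rows in place of real web service calls, and ISBN is assumed to be unique within the chapters result.

```python
def invoke_amazon():
    """Hypothetical stub: rows of (ISBN, PRICE, RATE, TITLE, AUTHOR)."""
    return [
        ("111", 30.0, 4.5, "XML Basics", "Lee"),
        ("222", 45.0, 3.9, "Datalog", "Kim"),
    ]


def invoke_chapters():
    """Hypothetical stub: rows of (ISBN, PRICE, TITLE, AUTHOR)."""
    return [
        ("111", 28.0, "XML Basics", "Lee"),
        ("333", 20.0, "SOA", "Ng"),
    ]


def meta_search():
    """Q(ISBN, PRICE, PRICE0, TITLE, RATE) <- amazon(...), chapters(...)."""
    # Index the chapters rows by ISBN so the join is a dictionary lookup.
    chapters_by_isbn = {row[0]: row for row in invoke_chapters()}
    results = []
    for isbn, price, rate, _title, _author in invoke_amazon():
        match = chapters_by_isbn.get(isbn)
        if match is not None:
            _, price0, title, _ = match
            results.append((isbn, price, price0, title, rate))
    return results


print(meta_search())
```

Only books known to both services appear in the result, as the shared ISBN variable in the rule body requires.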
composition
30
It is a lightweight approach…
Web services are restricted to be database queries or functions that can be described by database queries or Datalog;
Semantic specification is Datalog instead of a more powerful specification mechanism employing ontology;
Compositions are restricted to data composition instead of full-blown process specification such as BPEL.
All those choices are meant for the construction of a practical web service synthesis system…
composition
31
Mapping between Datalog and Web Services
Database vendors also provide wrappers for web services
Behind a web service there is a SQL query that corresponds to the web service;
SQL defines the semantics of the web service.
Major database vendors support the mapping between SQL and web services;
We experimented with DB2WS.
Malaika, S. et al. DB2 and Web Services. IBM Systems Journal, 41(4), pp. 666-685, 2002.
composition
32
Generate the Abstract Implementation by Query Rewriting
Definition: Given a query Q and a set of views V, a rewriting of Q using V is a query Q’ such that Q’ is equivalent to Q (Q = Q’) and Q’ refers to one or more views in V.
Query: Q <- T1, T2, T3.
Views: V1 <- T1, T2.
V2 <- T2, T3.
Rewriting 1: Q <- V1, T3.
Rewriting 2: Q <- V1, V2.
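The small example can be checked mechanically by unfolding each view into the tables it is defined over. The sketch below is a deliberate simplification and not the authors' rewriting system: it treats rule bodies as sets of relation names and ignores variables, whereas a real equivalence test needs containment mappings between conjunctive queries. The names VIEWS, QUERY, unfold, and is_rewriting are made up for this illustration.

```python
# Views and query from the example, as sets of relation names only.
VIEWS = {"V1": {"T1", "T2"}, "V2": {"T2", "T3"}}
QUERY = {"T1", "T2", "T3"}


def unfold(body, views):
    """Replace every view in a rule body by the tables it is defined over."""
    expanded = set()
    for atom in body:
        expanded |= views.get(atom, {atom})
    return expanded


def is_rewriting(body, query, views):
    """True if the body uses at least one view and unfolds to the query."""
    uses_view = any(atom in views for atom in body)
    return uses_view and unfold(body, views) == query


print(is_rewriting({"V1", "T3"}, QUERY, VIEWS))  # Rewriting 1
print(is_rewriting({"V1", "V2"}, QUERY, VIEWS))  # Rewriting 2
```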
composition
33
Our query rewriting system
composition
34
Limitations of our approach
Focus on database web services;
Datalog is not expressive enough.
Query rewriting in Description Logic, or OWL.
Assume the existence of global database schemas:
Service providers need to provide the semantic definition of web services in terms of a global database schema;
New service specification is also defined using the common schema
Schema matching
composition
35
Other threads
Web service collection and clustering
From UDDI, Crawler, Search engines such as Google
Master's thesis to be finished this summer
Web service metrics
Schema subtyping
Based on regular tree grammar
Master's thesis to be finished this summer
Bottom up web service composition
Semantic web service
36
Service Oriented Architecture
[Figure: service-oriented architecture triangle with the roles Discovery agency, Provider, and Requester and the operations publish, find, and interact]
37
Web service discovery
Keyword search
Based on IR techniques, such as vector space model
Fast, but not accurate
Signature matching
Decide subtype relations between input and output of web services
Used in service composition, to find composable web services
Relaxed matching
Approximate matching, allowing small deviations in both structure and words/tags
Semantic matching
Matching functional requirements of web services
Used in adaptive, autonomous systems
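Keyword search with a vector space model can be sketched as ranking service descriptions by cosine similarity to the query. The version below uses raw term counts with whitespace tokenization (no TF-IDF weighting or stemming), and the service names and descriptions are invented for the example.

```python
import math
from collections import Counter


def vectorize(text):
    """Bag-of-words term-frequency vector."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def search(query, services):
    """Return service names ranked by similarity, dropping zero scores."""
    q = vectorize(query)
    scored = [(cosine(q, vectorize(desc)), name) for name, desc in services.items()]
    return [name for score, name in sorted(scored, reverse=True) if score > 0]


# Hypothetical service descriptions, e.g. taken from WSDL documentation.
SERVICES = {
    "BookPrice": "look up book price by isbn",
    "BookReview": "book rating and review",
    "Weather": "weather forecast by city",
}

print(search("book price", SERVICES))
```

The ranking depends only on shared words, which is why this kind of search is fast but cannot tell whether the inputs and outputs of a service actually fit.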