Resolving Structural Conflicts in the Integration of XML Schemas: A Semantic Approach

1

Resolving Structural Conflicts in the Integration of XML Schemas: A Semantic Approach

Xia Yang, Mong Li Lee, Tok Wang Ling

National University of Singapore

2

Outline

Introduction
Background
Preliminaries
Motivating Example
Integration Algorithm
Related Work
Conclusion

3

Introduction

Recent research on integrating XML data sources has concentrated mainly on schema matching.

XML Schema and DTD lack semantics.

The source schemas are heterogeneous and contain various conflicts, including naming conflicts, cardinality conflicts and structural conflicts.

4

Background

Most work on XML integration has focused on the matching problem: finding equivalent elements among the different sources.

LSD [4] applies machine learning techniques to instance information in its integration work.

E. Jeong and C.-N. Hsu [7] use schema learning to generate a set of tree grammar rules from the DTDs in a class, then optimize the rules and transform them into an integrated view.

[4] A. Doan, P. Domingos, A. Levy. Learning Source Descriptions for Data Integration. WebDB, 2000.

[7] E. Jeong, C.-N. Hsu. Induction of Integrated View for XML Data with Heterogeneous DTDs. ACM CIKM, 2001.

5

Background (cont.)

The approach of E. Jeong and C.-N. Hsu [7] has three steps:

DTD clustering groups DTDs from similar domains into classes.

Schema learning applies a tree grammar inference technique to generate a set of tree grammar rules from the DTDs in a class produced by the previous step.

Minimization optimizes the rules generated in the previous step and transforms them into an integrated view.

6

Background (cont.)

These works lack semantic meaning, which may lead to a wrong integrated schema.

None of them takes into consideration the importance of the individual data sources or how the majority of the local schemas model their data.

They are binary strategies, which cannot take the importance of sources into account.


7

Preliminaries

ORA-SS Model

The ORA-SS model (Object-Relationship-Attribute model for Semi-Structured data) is a semantically rich data model designed for semi-structured data [5].

The ORA-SS model distinguishes between objects, relationships and attributes.

[5] G. Dobbie, X. Wu, T.W. Ling, M.L. Lee. ORA-SS: An Object-Relationship-Attribute Model for Semi-structured Data. Technical Report TR21/00, National University of Singapore, 2000.

8

Preliminaries (cont.)

ORA-SS Schema Diagram example.

[Figure: ORA-SS schema diagram with the labels project, part, supplier, funds, project manager, jno, pno, sno, uno, mno, name and quantity, and the relationship sets jp (jp, 2, 1:n, 1:n) and jps (jps, 3, 1:n, 1:n).]
9

Preliminaries (cont.)

Assumptions for the algorithm

The input to the proposed integration algorithm is a set of ORA-SS schemas, each with a source weight.

The output of the algorithm is an integrated schema, also modeled in ORA-SS.

The integrated schema should contain all the information modeled in the original schemas. Further, it should be as simple and concise as possible to facilitate users’ understanding.

For meaningful integration to occur, we assume that the various sources model similar domains.

Object classes and relationship sets with the same label name are considered to be semantically equivalent. Attributes of the same object class (or relationship set) with the same label name are also semantically equivalent.

10

Motivating Example

[Figure: four source schemas in ORA-SS. Labels recovered for each panel:]

(a) Schema S1, sw1=1: project, part, jno, pno, local funds, lno, foreign funds, fno, project manager.

(b) Schema S2, sw2=1: project manager, mno, name, email, organization, org name, abbreviation, full name.

(c) Schema S3, sw3=7: project, supplier, part, project manager, staff, ordinary staff, jno, sno, pno, mno, eno, name, org name, abbreviation, full name, address; relationship sets js (js, 2, 1:n, 1:n) and jsp (jsp, 3, 1:n, 1:n).

(d) Schema S4, sw4=1: project, part, supplier, funds, project manager, jno, pno, sno, uno, mno, name, quantity; relationship sets jp (jp, 2, 1:n, 1:n) and jps (jps, 3, 1:n, 1:n).

The swi under each schema indicates the source weight, i.e., the importance of a source. It is determined by users or computed from statistical information.
11

Motivating Example

A. Resolve attribute-object class conflict.
B. Resolve generalizations and specializations.
C. Merge the schemas to obtain an integrated graph.
D. Transform integrated graph to resolve structural conflicts and remove redundancy.
E. Augment Graph with Attributes.

12

A. Resolve attribute-object class conflict.

This conflict occurs when a concept is modeled as an attribute in one schema and as an object class in another schema.

It can be resolved by transforming the attribute into an object class.

13

A. Resolve attribute-object class conflict. (cont.) (example)

[Figure: three schema panels. Labels recovered for each panel:]

(a) Schema S1, sw1=1: project, part, jno, pno, local funds, lno, foreign funds, fno, project manager.

(b) Schema S2, sw2=1: project manager, mno, name, email, organization, org name, abbreviation, full name.

(c) Schema S1’: project, part, jno, pno, local funds, lno, foreign funds, fno, project manager, mno. The attribute “project manager” in schema S1 has been transformed into an object class “project manager” in S1’.
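The transformation from S1 to S1’ can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the dictionary layout (`attributes`, `edges`), the function name `attribute_to_object_class` and the choice of which labels are attributes of "project" are hypothetical and are not taken from the slides.

```python
from copy import deepcopy


def attribute_to_object_class(schema, owner, attribute):
    """Return a copy of `schema` in which `attribute` of `owner` is an object class.

    Hypothetical schema layout (an assumption of this sketch):
      schema["attributes"]: {object class: [attribute names]}
      schema["edges"]:      [(parent object class, child object class)]
    """
    result = deepcopy(schema)
    # The concept is no longer an attribute of its owner ...
    result["attributes"][owner].remove(attribute)
    # ... it becomes an object class of its own (with no attributes yet) ...
    result["attributes"].setdefault(attribute, [])
    # ... nested under the object class that used to own it.
    result["edges"].append((owner, attribute))
    return result


# Illustrative fragment of S1; the attribute placement is assumed.
s1 = {
    "attributes": {"project": ["jno", "project manager"], "part": ["pno"]},
    "edges": [("project", "part")],
}
s1_prime = attribute_to_object_class(s1, "project", "project manager")
```

After the call, "project manager" can be matched by label against the object class of the same name in S2, which is what the equivalence assumption of the algorithm relies on.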

14

B. Resolve generalizations and specializations.

A generalization exists when an object class in one schema is the union of several object classes in another schema.

The integrated schema will include the generalization (isa) hierarchy.

15

B. Resolve generalizations and specializations. (cont.) (example)

[Figure: Schema S1 (sw1=1), Schema S4 (sw4=1), and a generalization hierarchy with the labels funds, local funds (lno) and foreign funds (fno).]

A generalization hierarchy is built from part of S1; it is used in the next step to generate an integrated graph.

16

C. Merge the schemas to obtain an integrated graph.

Each node in the graph denotes an object class, and edges represent the relationship sets among the object classes.

To facilitate processing, attributes are first omitted from the integrated graph. They are incorporated later, into the final integrated schema.

Compute the edge weight.


17

C. Merge the schemas to obtain an integrated graph. (cont.)

Compute the edge weight.

For each original source, the weight of an edge is the source weight multiplied by the number of relationship sets that involve this edge.

The weight of an edge in the integrated graph is the sum of the weights of this edge over all the original sources.
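The two rules above can be combined into a single formula. The symbols $w(e)$, $r_i(e)$ and $n$ are notation introduced here for illustration; only the source weight $sw_i$ appears on the slides.

```latex
w(e) = \sum_{i=1}^{n} sw_i \cdot r_i(e)
```

Here $n$ is the number of source schemas, $sw_i$ is the weight of source $i$, and $r_i(e)$ is the number of relationship sets in source $i$ that involve edge $e$ (zero if the edge does not occur in that source).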

18

C. Merge the schemas to obtain an integrated graph. (cont.) (example)

[Figure: the four source schemas of the motivating example: (a) Schema S1, sw1=1; (b) Schema S2, sw2=1; (c) Schema S3, sw3=7; (d) Schema S4, sw4=1.]
19

C. Merge the schemas to obtain an integrated graph. (cont.)

Example of edge weight

Since “project” is the parent of “project manager” in schemas S1 and S4, the weight of the edge from “project” to “project manager” is the sum of the weights of these schemas, that is, 1+1=2.

Since “project” is the parent of “staff” in schema S3 only, the weight of this edge is 7. Since the edge from “project” to “supplier” in S3 is involved in two relationship sets, js and jsp, its edge weight is 7*2=14.
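The arithmetic of this example can be reproduced with a short Python sketch. The function name `integrated_edge_weights` and the input layout are assumptions made for illustration, and only the three edges named on this slide are encoded, not the full schemas.

```python
from collections import defaultdict


def integrated_edge_weights(sources):
    """Sum, per edge, source weight times the number of relationship sets on it.

    `sources` is an iterable of (source_weight, edge_counts) pairs, where
    edge_counts maps (parent, child) to the number of relationship sets
    that involve this edge in that source (hypothetical layout).
    """
    weights = defaultdict(int)
    for source_weight, edge_counts in sources:
        for edge, relationship_set_count in edge_counts.items():
            weights[edge] += source_weight * relationship_set_count
    return dict(weights)


sources = [
    # S1 (after step A), sw1=1
    (1, {("project", "project manager"): 1}),
    # S3, sw3=7; project-supplier is involved in both js and jsp
    (7, {("project", "staff"): 1, ("project", "supplier"): 2}),
    # S4, sw4=1
    (1, {("project", "project manager"): 1}),
]
weights = integrated_edge_weights(sources)
# weights[("project", "project manager")] is 2, ("project", "staff") is 7,
# and ("project", "supplier") is 14, matching the values on the slide.
```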

20

C. Merge the schemas to obtain an integrated graph. (cont.) (example)

[Figure: integrated graph over the nodes project, supplier, part, project manager, staff, ordinary staff, organization, org name, funds, local funds and foreign funds, with the relationship sets js (js, 2, 1:n, 1:n) and jsp (jsp, 3, 1:n, 1:n) and a weight on each edge.]

Integrated graph obtained from the schemas on page 10.

21

D. Transform integrated graph to resolve structural conflicts and remove redundancy.

D-1. Differentiate semantically different relationship sets among equivalent object classes.
D-2. Remove relationship sets that are projections of higher degree relationship sets.
D-3. Resolve ancestor-descendant conflicts.
D-4. Remove transitive relationship sets.
D-5. Remove other types of multiple parent nodes.

22

Multiple parent nodes

If a node has more than one incoming edge in an integrated graph, it is called a multiple parent node.
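Detecting such nodes only requires counting incoming edges. The sketch below assumes the integrated graph is given as a list of (parent, child) pairs; the function name and the sample edges are hypothetical.

```python
from collections import Counter


def multiple_parent_nodes(edges):
    """Return the nodes with more than one incoming edge, sorted by name."""
    incoming = Counter(child for _parent, child in edges)
    return sorted(node for node, count in incoming.items() if count > 1)


# Illustrative graph: "student" is nested under both "school" and "project".
edges = [("school", "student"), ("project", "student"), ("school", "project")]
shared = multiple_parent_nodes(edges)
```

Each node reported this way is then handled by one of the transformation steps D-1 to D-5.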

23

D-1. Differentiate semantically different relationship sets among equivalent object classes.

If the relationship sets among the same object classes are semantically different, duplicate the nodes and make foreign key-key references in the integrated graph. Move the object classes involved in an n-ary (n>2) relationship set.

24

D-1. Differentiate semantically different relationship sets among equivalent object classes. (cont.) (example)

[Figure: four panels. Labels recovered for each panel:]

Schema S5: person, house, name, address, type; relationship set ph1 (ph1, 2, 1:n, 1:n).

Schema S6: person, house, contract, name, address, type, cid, time; relationship sets ph2 (ph2, 2, 1:n, 1:n) and phc (phc, 3, 1:1, 1:n).

Integrated graph G56: person, house, contract; relationship sets ph1, ph2 and phc.

Transformed graph G56’: person, house1, house2, house, contract, address, type; relationship sets ph1, ph2 and phc. The graph is transformed because ph1 and ph2 are semantically different.

25

D-2. Remove relationship sets that are projections of higher degree relationship sets.

A schema may model a relationship set that is a projection of a relationship set in another schema.

Keep only the complete (higher degree) relationship set in the integrated schema.
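A minimal sketch of this step follows. Whether one relationship set is a projection of another is a semantic fact that the schemas alone do not reveal, so it is passed in explicitly; the function name, the data layout and the participant lists are assumptions made for illustration.

```python
def remove_projections(relationship_sets, projections):
    """Drop relationship sets declared to be projections of larger ones.

    relationship_sets: {name: tuple of participating object classes}
    projections: iterable of (projected_name, complete_name) pairs supplied
                 by the designer (hypothetical input of this sketch).
    """
    kept = dict(relationship_sets)
    for projected, complete in projections:
        if projected in kept and complete in kept:
            # Sanity check: a projection involves a proper subset of the
            # object classes of the complete relationship set.
            if set(kept[projected]) < set(kept[complete]):
                del kept[projected]
    return kept


relationship_sets = {
    "jp": ("project", "part"),
    "jsp": ("project", "supplier", "part"),
}
kept = remove_projections(relationship_sets, [("jp", "jsp")])
```

With jp declared a projection of jsp, only jsp remains; without that declaration both relationship sets are kept.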

26

D-2. Remove relationship sets that are projections of higher degree relationship sets. (cont.) (example)

[Figure: Schema S1 with relationship set jp; Schema S3 with relationship sets js (js, 2, 1:n, 1:n) and jsp (jsp, 3, 1:n, 1:n); part of the integrated graph G13 and part of the transformed graph G13’, both over the nodes project, supplier and part.]

The transformation applies if jp is a projection of jsp.
27

D-3. Resolve ancestor-descendant conflicts.

An ancestor-descendant conflict arises when one schema models an object class A as an ancestor of object class B, and another schema models B as an ancestor of A. Such conflicts appear as cycles in the integrated graph.

The simplest form of this conflict is the parent-child conflict.

In an ancestor-descendant conflict cycle, remove the edge with the smallest edge weight, provided this relationship set can be derived from the other relationship sets in the cycle. If two edges share the same smallest edge weight, remove either one.
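The rule can be sketched as a loop that finds a cycle and deletes its lightest derivable edge. This is an illustration under assumptions: derivability is a semantic judgment, so it is supplied as a set of edges; the function names are hypothetical; and the sketch raises an error when no lightest edge is derivable, a case the slides do not cover.

```python
def find_cycle(edges):
    """Return one directed cycle as a list of (parent, child) edges, or None."""
    graph = {}
    for parent, child in edges:
        graph.setdefault(parent, []).append(child)
    state = {}  # node -> "active" (on the current DFS path) or "done"

    def visit(node, path):
        state[node] = "active"
        path.append(node)
        for child in graph.get(node, []):
            if state.get(child) == "active":
                # Back edge found: the cycle runs from `child` along the path.
                nodes = path[path.index(child):] + [child]
                return list(zip(nodes, nodes[1:]))
            if child not in state:
                cycle = visit(child, path)
                if cycle:
                    return cycle
        path.pop()
        state[node] = "done"
        return None

    for node in list(graph):
        if node not in state:
            cycle = visit(node, [])
            if cycle:
                return cycle
    return None


def resolve_ancestor_descendant_conflicts(weights, derivable):
    """weights: {(parent, child): edge weight}; derivable: set of edges whose
    relationship set can be derived from the rest of their cycle."""
    weights = dict(weights)
    while (cycle := find_cycle(weights)) is not None:
        lightest = min(weights[edge] for edge in cycle)
        candidates = [e for e in cycle if weights[e] == lightest and e in derivable]
        if not candidates:
            raise ValueError("no lightest edge in the cycle is derivable")
        del weights[candidates[0]]  # ties: remove either one
    return weights


# Cycle among department, head and head secretary; the lightest edge is derivable.
weights = {
    ("department", "head"): 2,
    ("head", "head secretary"): 2,
    ("head secretary", "department"): 1,
}
resolved = resolve_ancestor_descendant_conflicts(
    weights, derivable={("head secretary", "department")}
)
```

In this run the edge from "head secretary" to "department" is removed and the remaining two edges form an acyclic hierarchy.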

28

D-3. Resolve ancestor-descendant conflicts. (cont.) (example)

Parent-child conflict

[Figure: the integrated graph over the nodes project, supplier, part, project manager, staff, ordinary staff, organization, org name, funds, local funds and foreign funds, with the relationship sets js and jsp and a weight on each edge.]
29

D-3. Resolve ancestor-descendant conflicts. (cont.) (example)

Parent-child conflict: the edge from part to supplier is removed.

[Figure: the integrated graph after the edge from part to supplier has been removed.]
30

D-3. Resolve ancestor-descendant conflicts. (cont.) (example)

[Figure: six panels. Labels recovered for each panel:]

Schema S7: department, head, head secretary, dname, hid, sid; relationship sets dh and hs.

Schema S8: head secretary, department, sid, dname; relationship set sd.

Integrated graph G78: department, head, head secretary; relationship sets dh, hs and sd.

Case 1 (if sd can be derived from dh and hs, and sw7=2 and sw8=1): transformed graph G78’, with department, head, head secretary and the relationship sets dh and hs.

Case 2(a) (if hs can be derived from sd and dh, and sw7=1 and sw8=2): transformed graph G78’’(a), with head secretary, department, head and the relationship sets sd and dh.

Case 2(b) (if dh can be derived from hs and sd, and sw7=1 and sw8=2): transformed graph G78’’(b), with head, head secretary, department and the relationship sets hs and sd.
31

D-4. Remove transitive relationship sets and redundant object classes.

If a relationship set from object class A to object class B can be derived from relationship sets that lead from A through other object classes to B, it is called a transitive relationship set.

Transitive relationship sets are redundant and can be removed to keep the integrated graph concise, provided the intermediate node has attributes or other sub-object classes.

If the intermediate node has no attributes and no other sub-object classes, it is considered a redundant object class.
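The removal of transitive relationship sets can be sketched as follows. The sketch is deliberately partial: it assumes the set of derivable edges is supplied by the designer, it uses hypothetical function names, and it does not model the check on the intermediate node's attributes or the removal of redundant object classes.

```python
def has_indirect_path(edges, source, target):
    """True if `target` is reachable from `source` without the direct edge."""
    graph = {}
    for parent, child in edges:
        if (parent, child) != (source, target):
            graph.setdefault(parent, []).append(child)
    stack, seen = [source], {source}
    while stack:
        node = stack.pop()
        for child in graph.get(node, []):
            if child == target:
                return True
            if child not in seen:
                seen.add(child)
                stack.append(child)
    return False


def remove_transitive_edges(edges, derivable):
    """Drop each direct edge that is declared derivable and is also
    covered by a longer path through other object classes."""
    kept = list(edges)
    for edge in edges:
        if edge in derivable and has_indirect_path(kept, *edge):
            kept.remove(edge)
    return kept


# A reaches B directly and through the intermediate object class C
# (C is a name introduced for this example).
edges = [("A", "C"), ("C", "B"), ("A", "B")]
kept = remove_transitive_edges(edges, derivable={("A", "B")})
```

The direct edge from A to B is dropped only because it is both declared derivable and backed by the path through C; an edge that is merely parallel to a longer path is left alone.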


32

D-4. Remove transitive relationship sets and redundant object classes. (cont.) (example)

[Figure: the integrated graph before this step, over the nodes project, supplier, part, project manager, staff, ordinary staff, organization, org name, funds, local funds and foreign funds, with the relationship sets js and jsp and a weight on each edge.]
33

D-4. Remove transitive relationship sets and redundant object classes. (cont.) (example)

[Figure: the integrated graph after this step, over the nodes project, supplier, part, project manager, staff, ordinary staff, org name, funds, local funds and foreign funds, with the relationship sets js and jsp and a weight on each edge. The node organization no longer appears.]
34

D-5. Remove other types of multiple parent nodes.

If a node has more than one incoming edge in an integrated graph, it is called a multiple parent node.

Case 1 (D-1): Different relationship sets among the same object classes.
Case 2 (D-2): Relationship sets that are projections of higher degree relationship sets.
Case 3 (D-3): Ancestor-descendant conflicts.
Case 4 (D-4): Transitive relationship sets.
Case 5 (D-5): Others, as in the example on the following page.


35

D-5. Remove multiple parent nodes. (cont.) (example)

Case 5:

[Figure: four panels. Labels recovered for each panel:]

Schema S9: school, student, scname, snu, email.

Schema S10: project, student, jno, snu, address, mark; relationship set jd.

Integrated graph G9-10: school, student, project.

Transformed graph G9-10’: school, student1, student2, student, project, snu, mark, email, address; relationship set jd.

36

Transformed graph (summary)

Original integrated graph

[Figure: the original integrated graph over the nodes project, supplier, part, project manager, staff, ordinary staff, organization, org name, funds, local funds and foreign funds, with the relationship sets js and jsp and a weight on each edge.]
37

Transformed graph (cont.)

Transformed graph

[Figure: the transformed graph over the nodes project, supplier, part, project manager, staff, ordinary staff, org name, funds, local funds and foreign funds, with the relationship sets js and jsp and a weight on each edge.]
38

E. Augment Graph with Attributes

Augment the graph with the attributes of object classes in the integrated schema.

Augment the graph with the attributes of relationship sets in the integrated schema.

For object classes duplicated in cases D-1 and D-5, the attributes become attributes of the original object classes, not of the duplicates.

For a relationship set that has been removed, its attributes become attributes of the relationship sets from which it can be derived.

39

Final integrated schema (example)

[Figure: final integrated schema with the labels project, supplier, part, project manager, staff, ordinary staff, funds, local funds, foreign funds, jno, sno, pno, mno, eno, lno, fno, name, org name, abbreviation, full name, email, quantity and address, and the relationship sets js (js, 2, 1:n, 1:n) and jsp (jsp, 3, 1:n, 1:n).]

40

Integration Algorithm

1. Preprocessing.
   a. Resolve attribute-object class conflict.
   b. Resolve generalizations and specializations.
2. Construct integrated graph.
3. Transform graph.
4. Augment graph with attributes.


41

Step 3 Transform Graph

3.1 Differentiate semantically different relationship sets among equivalent object classes.
3.2 Remove relationship sets that are projections of higher degree relationship sets.
3.3 Resolve any ancestor-descendant conflicts which create cycles in G.
3.4 Remove transitive relationship sets and redundant object classes.
3.5 Remove other multiple parent nodes.





42

Related work (compare with [7])

[Figure: integrated schema with the labels project, supplier, part, project manager, staff, ordinary staff, funds, local funds, foreign funds, jno, sno, pno, mno, eno, lno, fno, uno, name, quantity, email, address, org name, abbreviation and full name, and the relationship sets js (js, 2, 1:n, 1:n) and jsp (jsp, 3, 1:n, 1:n).]

Problems with the approach by Jeong and C.-N. Hsu [7].

43

Related work (compare with [7]) (cont.)

[Figure: integrated schema with the labels project, supplier, part, project manager, staff, ordinary staff, funds, local funds, foreign funds, jno, sno, pno, mno, eno, lno, fno, name, org name, abbreviation, full name, email, quantity and address, and the relationship sets js (js, 2, 1:n, 1:n) and jsp (jsp, 3, 1:n, 1:n).]

Integrated schema obtained by our approach

44

Related work (compare with [7]) (cont.)

Our proposed method employs the ORA-SS conceptual model, which is able to capture the semantics necessary for the resolution of structural conflicts during integration.

We can take into consideration the importance of the individual data sources and how the majority of the local schemas model their data.

We employ an n-ary strategy, whereas a binary strategy is unable to utilize the source importance and how the majority of the sources model the data. The n-ary strategy is also faster than the binary strategy.

(For example, suppose sw1=2, sw2=1, sw3=1, sw4=1, and schemas 2, 3 and 4 are the same. The binary strategy might treat schema 1 as the most important schema, while in fact schemas 2, 3 and 4 together are.)
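The example in parentheses can be checked by summing source weights per distinct modeling, which is what looking at all sources at once makes possible. The function name and the labels "modeling A" and "modeling B" are placeholders introduced for this sketch.

```python
from collections import defaultdict


def weight_per_modeling(sources):
    """Sum the source weights of all sources that model the data the same way.

    `sources` is an iterable of (source_weight, modeling) pairs; `modeling`
    is any hashable stand-in for a schema structure (hypothetical layout).
    """
    totals = defaultdict(int)
    for source_weight, modeling in sources:
        totals[modeling] += source_weight
    return dict(totals)


# sw1=2 for schema 1; schemas 2, 3 and 4 are the same, each with weight 1.
sources = [
    (2, "modeling A"),
    (1, "modeling B"),
    (1, "modeling B"),
    (1, "modeling B"),
]
totals = weight_per_modeling(sources)
preferred = max(totals, key=totals.get)
```

Schema 1 has the largest individual weight (2), but the modeling shared by schemas 2, 3 and 4 carries a combined weight of 3 and is therefore preferred.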

45

Conclusion

In this paper, we have introduced a semantic approach to resolve structural conflicts in the integration of XML schemas.

We employed the ORA-SS semantic data model to capture the implicit semantics in an XML schema.

We presented a comprehensive n-ary algorithm to integrate XML schemas.

Our algorithm takes into account the data semantics, the importance of a source, and how the majority of the sources model their data.

Structural conflicts such as attribute/object class conflicts and ancestor-descendant conflicts are resolved in our approach. We also remove redundant object classes and relationship sets, such as transitive relationship sets and relationship sets that are projections of higher degree relationship sets, in order to obtain a concise integrated schema.


46

References

1. S. Castano, V. Antonellis, S. C. Vimercati, M. Melchiori. An XML-Based Framework for Information Integration over the Web. IIWAS, 2000.
2. Y.B. Chen, T.W. Ling, M.L. Lee. Designing Valid XML Views. ER, 2002.
3. P. Buneman, S. Davidson, W. Fan, C. Hara, W.C. Tan. Keys for XML. WWW, 2001.
4. A. Doan, P. Domingos, A. Levy. Learning Source Descriptions for Data Integration. WebDB, 2000.
5. G. Dobbie, X. Wu, T.W. Ling, M.L. Lee. ORA-SS: An Object-Relationship-Attribute Model for Semi-structured Data. Technical Report TR21/00, National University of Singapore, 2000.
6. E. Rahm, P. Bernstein. On Matching Schemas Automatically. MSR Tech. Report MSR-TR-2001-17, 2001.
7. E. Jeong, C.-N. Hsu. Induction of Integrated View for XML Data with Heterogeneous DTDs. ACM CIKM, 2001.
8. T.W. Ling, M.L. Lee. Relational to Entity-Relationship Schema Translation Using Semantic and Inclusion Dependencies. Journal of Integrated Computer-Aided Engineering, John-Wiley Publishers, Vol 2, No 2, pages 125-145, 1995.
9. M.L. Lee, T.W. Ling. Resolving Structural Conflicts in the Integration of Entity-Relationship Schemas. OOER, 1995.
10. M.L. Lee, T.W. Ling. Resolving Constraint Conflicts in the Integration of Entity-Relationship Schemas. ER, 1997.
11. M.L. Lee, T.W. Ling, W.L. Low. Designing Functional Dependencies for XML. EDBT, 2002.
12. M.L. Lee, L.H. Yang, W. Hsu, X. Yang. XClust: Clustering XML Schemas for Effective Integration. ACM CIKM, 2002.
13. D. Maier. Theory of Relational Databases. Computer Science Press, 1983.
14. J. Madhavan, P.A. Bernstein, E. Rahm. Generic Schema Matching with Cupid. VLDB, 2001.
15. R. Mello, S. Castano, C.A. Heuser. A Method for the Unification of XML. Information and Software Technology Journal, 2002.
16. P. Mitra, G. Wiederhold, J. Jannink. Semi-automatic Integration of Knowledge Sources. Fusion, 1999.
17. P. Mitra, G. Wiederhold, M. Kersten. A Graph-Oriented Model for Articulation of Ontology Interdependencies. EDBT, 2000.
18. F. Naumann, U. Leser, J.C. Freytag. Quality-driven Integration of Heterogeneous Information Systems. VLDB, 1999.
19. C. Reynaud, J.-P. Sirot, D. Vodislav. Semantic Integration of XML Heterogeneous Data Sources. IDEAS, 2001.
20. http://www.cogsci.princeton.edu/~wn
21. Xyleme. A dynamic warehouse for XML Data of the Web. IEEE Data Engineering Bulletin 24(2):40-47, 2001.
22. L.L. Yan, T.W. Ling. Translating Relational Schema with Constraints into OODB Schema. IFIP DS-5 Semantics of Interoperable Database Systems, 1992.