
Nov 15, 2013

iFuice: Information Fusion utilizing Instance Correspondences and Peer Mappings

Erhard Rahm, Andreas Thor, David Aumueller, Hong-Hai Do, Nick Golovin, Toralf Kirsten

University of Leipzig, Germany

http://dbs.uni-leipzig.de

Motivating scenario

Integrating ...

Citeseer, ACM, DBLP, Eventseer, Google Scholar
Who published at SIGMOD as a PC member?
Who are the candidates for the SIGMOD test of time award?
Additional relationships / attributes (Eventseer, Google Scholar)

Local file
Who referenced publications of my favorite authors?
Hand-picked private data (local file)

PubMed, SwissProt, MIM
What information system is used to support biological cancer analysis?
Sources from different domains (SwissProt, MIM)

Schema vs. instance based integration

Data integration using query mediator approach
Mediated (global) schema
Matching / views between global and local schemas

Problems
Construction/evolution of global schema
Sources without or semi-structured schema
Heterogeneous/dirty data, mapping to artificial schema

Instance correspondences
Represent semantic relationships between instances
Allow integration of sources without schema
Can be inferred by weblinks

iFuice approach

iFuice = Information Fusion utilizing Instance Correspondences and Peer Mappings

Bottom-up integration
High-level operators
Generic way to perform dynamic information fusion

Mediator
Controls mapping / operator execution
Utilizes a domain model

P2P-like infrastructure
Correspondences between autonomous data sources
Easy link-up of a new source "where it fits best"

Agenda

Motivation & iFuice approach
Meta data model
Operators
iFuice scripts
Architecture
Summary & outlook

Data sources

Physical data source (PDS)
Web data (DBLP), local data (files), ...
Split into logical data sources

Logical data source (LDS)
Refers to one object type
Contains object instances

Object instance
Refers to a real-world entity
Set of attributes
One attribute is the id

[Figure: PDS DBLP with its LDS Publication, Conference and Author; example Publication instance from DBLP:]
Name: Generic schema matching with Cupid
URL: http://vldb.org...
Conference: VLDB 2001
Authors: Jayant Madhavan, Philip A. Bernstein, Erhard Rahm
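The PDS / LDS / object instance hierarchy can be sketched as a few Python dataclasses. This is a minimal illustration under stated assumptions, not iFuice's actual data structures: the class and method names, the Type@PDS naming (borrowed from the Conf@DBLP notation used with queryInstances) and the instance id "cupid" are all hypothetical; the attribute values come from the example instance above.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class ObjectInstance:
    """Refers to one real-world entity: an id plus a set of attributes."""
    id: str
    attributes: Dict[str, str] = field(default_factory=dict)


@dataclass
class LogicalDataSource:
    """An LDS holds instances of exactly one object type of one PDS."""
    object_type: str
    pds: str
    instances: Dict[str, ObjectInstance] = field(default_factory=dict)

    @property
    def name(self) -> str:
        # Hypothetical naming scheme, e.g. "Publication@DBLP".
        return f"{self.object_type}@{self.pds}"

    def add(self, instance: ObjectInstance) -> None:
        self.instances[instance.id] = instance


@dataclass
class PhysicalDataSource:
    """A PDS (web source, local file, ...) split into one LDS per object type."""
    name: str
    lds: Dict[str, LogicalDataSource] = field(default_factory=dict)

    def logical(self, object_type: str) -> LogicalDataSource:
        # Create the LDS for an object type the first time it is requested.
        if object_type not in self.lds:
            self.lds[object_type] = LogicalDataSource(object_type, self.name)
        return self.lds[object_type]


dblp = PhysicalDataSource("DBLP")
for object_type in ("Publication", "Conference", "Author"):
    dblp.logical(object_type)

# "cupid" is a made-up id; the attribute values come from the example instance.
dblp.logical("Publication").add(ObjectInstance("cupid", {
    "Name": "Generic schema matching with Cupid",
    "Conference": "VLDB 2001",
    "Authors": "Jayant Madhavan, Philip A. Bernstein, Erhard Rahm",
}))
```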

Mappings

Directed relationship between LDS

Meta data: meaning of the mapping
Semantic mapping type, e.g., "publications of author"
Same mappings vs. association mappings
same = "equality" relationship between PDS, e.g., DBLP publication (id) ↔ ACM publication (id)
Id mappings vs. query mappings

Instance data: instance correspondences
Materialized: mapping tables
On-the-fly: execution result (e.g., from web service)
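A minimal Python sketch of mappings, assuming a mapping returns instance correspondences as (domain id, range id) pairs for a set of input ids, i.e., it only models id mappings, not query mappings. The helper names materialized and on_the_fly, the mapping name DBLP-ACM.SamePub and all ids are hypothetical; the lambda merely stands in for a web service call.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Set, Tuple

Correspondences = Set[Tuple[str, str]]


@dataclass(frozen=True)
class Mapping:
    """A directed mapping between two LDS plus metadata on its meaning."""
    name: str
    domain_lds: str
    range_lds: str
    semantic_type: str  # e.g. "publications of conference", or "same"
    execute: Callable[[Iterable[str]], Correspondences]

    @property
    def is_same(self) -> bool:
        # Same mappings link instances of one real-world object across PDS.
        return self.semantic_type == "same"


def materialized(table: Correspondences) -> Callable[[Iterable[str]], Correspondences]:
    """Materialized mapping: select the input's correspondences from a mapping table."""
    def run(ids: Iterable[str]) -> Correspondences:
        wanted = set(ids)
        return {(a, b) for a, b in table if a in wanted}
    return run


def on_the_fly(lookup: Callable[[str], List[str]]) -> Callable[[Iterable[str]], Correspondences]:
    """On-the-fly mapping: compute correspondences per input id, e.g. via a service call."""
    def run(ids: Iterable[str]) -> Correspondences:
        return {(a, b) for a in set(ids) for b in lookup(a)}
    return run


conf_pubs = Mapping(
    "DBLP.ConfPubs", "Conf@DBLP", "Publication@DBLP", "publications of conference",
    materialized({("sigmod95", "p1"), ("sigmod95", "p2"), ("vldb01", "cupid")}),
)
# Stand-in for a web service lookup; the ids are invented.
acm_ids = {"cupid": ["acm-1"]}
same_acm = Mapping(
    "DBLP-ACM.SamePub", "Publication@DBLP", "Publication@ACM", "same",
    on_the_fly(lambda pid: acm_ids.get(pid, [])),
)
```

Both mapping kinds expose the same call shape, so code using a mapping need not know whether its correspondences are stored or computed.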

Metadata model

Source mapping model
[Figure: source mapping model linking LDS (Publication, Conference, Author) of DBLP, ACM and Google Scholar via mappings, including same mappings]
Used by mediator for mapping/operator execution

Domain model
Indicates available object types and relationships
[Figure: domain model extract with object types Author, Publication and Conference; legend: LDS, PDS, mapping, same mapping]
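The slides state only that the mediator uses the source mapping model for mapping and operator execution. One plausible use, sketched here purely as an assumption, is finding a chain of mappings between two LDS with a breadth-first search over the model. Apart from DBLP.ConfPubs, the mapping names are invented.

```python
from collections import deque
from typing import Dict, List, Optional, Tuple

# Hypothetical source mapping model: mapping name -> (domain LDS, range LDS).
# Only DBLP.ConfPubs appears in the slides; the other names are invented.
SOURCE_MAPPING_MODEL: Dict[str, Tuple[str, str]] = {
    "DBLP.ConfPubs": ("Conf@DBLP", "Publication@DBLP"),
    "DBLP.PubAuthors": ("Publication@DBLP", "Author@DBLP"),
    "DBLP-GS.SamePub": ("Publication@DBLP", "Publication@GoogleScholar"),
    "ACM.PubAuthors": ("Publication@ACM", "Author@ACM"),
}


def find_mapping_path(source: str, target: str) -> Optional[List[str]]:
    """Breadth-first search for a shortest chain of mappings from source to target LDS."""
    outgoing: Dict[str, List[Tuple[str, str]]] = {}
    for name, (domain, range_lds) in SOURCE_MAPPING_MODEL.items():
        outgoing.setdefault(domain, []).append((name, range_lds))

    queue = deque([(source, [])])
    visited = {source}
    while queue:
        lds, path = queue.popleft()
        if lds == target:
            return path
        for name, next_lds in outgoing.get(lds, []):
            if next_lds not in visited:
                visited.add(next_lds)
                queue.append((next_lds, path + [name]))
    return None  # mappings are directed, so some targets are unreachable
```

Because mappings are directed, this search follows them only from domain to range; a reverse path would need an explicit inverse mapping in the model.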

Operators

Query language capabilities + scripting support

Set-oriented operators
Input: set of object or mapping instances + parameters / query specification
Output: set of object / mapping instances
Can be combined bottom-up within scripts

Operators overview

Object instances (OI)
Query → OI: queryInstances, queryMatch, attrTransf
OI → OI: getInstances, traverse, traverseSame, map

Aggregated objects (AO)
OI → AO: agg, disagg, fuseAttributes
AO → AO: aggregateSame, aggregateTraverse, aggregateMap

Generic
union, diff, intersect
domain, range, compose
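If mapping instances are modeled as sets of (domain id, range id) pairs, the generic operators reduce to set operations plus relational composition. The following Python sketch assumes that representation; range_of is named so to avoid shadowing Python's built-in range, and all ids are invented.

```python
from typing import Set, Tuple

Pair = Tuple[str, str]  # one instance correspondence (domain id, range id)


def union(a: set, b: set) -> set:
    return a | b


def diff(a: set, b: set) -> set:
    return a - b


def intersect(a: set, b: set) -> set:
    return a & b


def domain(mapping: Set[Pair]) -> Set[str]:
    """Objects that appear on the left-hand side of the correspondences."""
    return {a for a, _ in mapping}


def range_of(mapping: Set[Pair]) -> Set[str]:
    """Objects that appear on the right-hand side of the correspondences."""
    return {b for _, b in mapping}


def compose(first: Set[Pair], second: Set[Pair]) -> Set[Pair]:
    """(a, c) whenever (a, b) is in first and (b, c) is in second."""
    targets = {}
    for b, c in second:
        targets.setdefault(b, set()).add(c)
    return {(a, c) for a, b in first for c in targets.get(b, set())}


# Invented ids for illustration.
conf_pubs = {("sigmod95", "p1"), ("sigmod95", "p2"), ("vldb01", "p3")}
pub_authors = {("p1", "a1"), ("p2", "a1"), ("p2", "a2"), ("p3", "a3")}
conf_authors = compose(conf_pubs, pub_authors)  # conference -> author
```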

Operators for object instances

queryInstances: executes a query on a peer
$S := queryInstances (Conf@DBLP, Series="SIGMOD")
returns all SIGMOD conferences from DBLP

map: executes a mapping
map ($S, DBLP.ConfPubs)
returns all tuples (conference, publication)

traverse: returns the range of a mapping
$P := traverse ($S, DBLP.ConfPubs)
returns all publications of these conferences

traverseSame: "navigates" to corresponding objects of another physical source
traverseSame ($P, GoogleScholar)
returns "equal" publications at GoogleScholar
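The four operators can be imitated over in-memory data to show how set-oriented results feed into each other. This is a hypothetical Python sketch: the dictionaries, ids and function names (query_instances, map_, traverse, traverse_same) are assumptions, and traverse_same here takes only the target source because all inputs are assumed to come from DBLP.

```python
from typing import Dict, Set, Tuple

# Invented toy data; only Conf@DBLP, DBLP.ConfPubs and GoogleScholar come from the slides.
INSTANCES: Dict[str, Dict[str, Dict[str, str]]] = {
    "Conf@DBLP": {
        "sigmod95": {"Series": "SIGMOD", "Name": "SIGMOD 1995"},
        "sigmod96": {"Series": "SIGMOD", "Name": "SIGMOD 1996"},
        "vldb01": {"Series": "VLDB", "Name": "VLDB 2001"},
    },
}
MAPPINGS: Dict[str, Set[Tuple[str, str]]] = {
    "DBLP.ConfPubs": {("sigmod95", "p1"), ("sigmod96", "p2"), ("vldb01", "p3")},
}
# Same mappings from DBLP to other PDS: (DBLP id, counterpart id).
SAME: Dict[str, Set[Tuple[str, str]]] = {
    "GoogleScholar": {("p1", "gs-17"), ("p3", "gs-42")},
}


def query_instances(lds: str, **conditions: str) -> Set[str]:
    """queryInstances: ids of the LDS instances matching all attribute conditions."""
    return {oid for oid, attrs in INSTANCES[lds].items()
            if all(attrs.get(k) == v for k, v in conditions.items())}


def map_(ids: Set[str], mapping: str) -> Set[Tuple[str, str]]:
    """map: the mapping's correspondences whose domain object is in ids."""
    return {(a, b) for a, b in MAPPINGS[mapping] if a in ids}


def traverse(ids: Set[str], mapping: str) -> Set[str]:
    """traverse: the range of map(ids, mapping)."""
    return {b for _, b in map_(ids, mapping)}


def traverse_same(ids: Set[str], target_pds: str) -> Set[str]:
    """traverseSame: corresponding objects of another PDS via its same mapping."""
    return {b for a, b in SAME[target_pds] if a in ids}


S = query_instances("Conf@DBLP", Series="SIGMOD")
P = traverse(S, "DBLP.ConfPubs")
gs_pubs = traverse_same(P, "GoogleScholar")
```

Publication p2 has no Google Scholar counterpart in the toy data, so traverse_same drops it.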

Instance fusion

Object instances referring to the same real-world object
Aggregated object
Auxiliary fusion operators: agg / disagg, fuseAttributes

Publication (DBLP)
Name: Generic schema matching with Cupid
URL: http://vldb.org...
Conference: VLDB 2001
Authors: Jayant Madhavan, Philip A. Bernstein, Erhard Rahm

Publication (GS)
Name: Generic schema matching with Cupid
URL: http://data.cs.washington.edu...
NoOfCit: 243
Authors: J Madhavan, PA Bernstein, E Rahm

agg: one aggregated Publication object containing both the DBLP and the GS instance

fuseAttributes: one Publication with fused attributes, each value tagged with its source
Name: Generic schema matching with Cupid
URL: http://vldb.org... (DBLP); http://data.cs.washington.edu... (GS)
Authors: Jayant Madhavan, Philip A. Bernstein, Erhard Rahm (DBLP); J Madhavan, PA Bernstein, E Rahm (GS)
Conf.: VLDB 2001 (DBLP)
NoOfCit: 243 (GS)
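A minimal Python sketch of agg, disagg and fuseAttributes using the attribute values from the figure (URLs omitted). The slides do not say how fuseAttributes resolves conflicting values, so this version simply keeps each distinct value tagged with its source; the dictionary representation of an aggregated object is likewise an assumption.

```python
from typing import Dict, List, Tuple

Instance = Dict[str, str]
AggregatedObject = Dict[str, Instance]  # source name -> that source's instance


def agg(*parts: Tuple[str, Instance]) -> AggregatedObject:
    """agg: combine instances of the same real-world object, tagged by source."""
    return {source: dict(attrs) for source, attrs in parts}


def disagg(ao: AggregatedObject) -> List[Tuple[str, Instance]]:
    """disagg: split an aggregated object back into its source instances."""
    return [(source, dict(attrs)) for source, attrs in ao.items()]


def fuse_attributes(ao: AggregatedObject) -> Dict[str, List[Tuple[str, str]]]:
    """fuseAttributes: one value list per attribute; each value keeps its source.
    A value already contributed by an earlier source is not repeated."""
    fused: Dict[str, List[Tuple[str, str]]] = {}
    for source, attrs in ao.items():
        for name, value in attrs.items():
            values = fused.setdefault(name, [])
            if all(v != value for v, _ in values):
                values.append((value, source))
    return fused


dblp = {"Name": "Generic schema matching with Cupid", "Conference": "VLDB 2001",
        "Authors": "Jayant Madhavan, Philip A. Bernstein, Erhard Rahm"}
gs = {"Name": "Generic schema matching with Cupid", "NoOfCit": "243",
      "Authors": "J Madhavan, PA Bernstein, E Rahm"}
ao = agg(("DBLP", dblp), ("GS", gs))
fused = fuse_attributes(ao)
```

Keeping the source next to each value leaves the choice between, say, the two author spellings to a later step rather than hiding it inside the operator.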

Operators for aggregated objects

aggregateSame
Identify corresponding objects in another source (traverseSame)
Aggregate resulting objects with input objects (agg)
aggregateSame ($P, GoogleScholar)
returns AOs of (DBLP + GoogleScholar) publications

[Figure: the DBLP publication "Generic schema matching with Cupid" is linked via traverseSame to its Google Scholar counterpart; agg combines both instances into one aggregated object]
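aggregateSame can be sketched as traverseSame followed by agg. In this hypothetical Python version the toy data, ids and function names are invented, and an input object without a Google Scholar counterpart still yields an aggregated object with an empty Google Scholar part; whether iFuice behaves this way is not stated in the slides.

```python
from typing import Dict, List, Set, Tuple

Instance = Dict[str, str]

# Invented toy data: DBLP and Google Scholar publications and a same mapping.
DBLP_PUBS: Dict[str, Instance] = {
    "cupid": {"Name": "Generic schema matching with Cupid", "Conference": "VLDB 2001"},
    "p2": {"Name": "Placeholder title", "Conference": "SIGMOD 1995"},
}
GS_PUBS: Dict[str, Instance] = {
    "gs-42": {"Name": "Generic schema matching with Cupid", "NoOfCit": "243"},
}
SAME_DBLP_GS: Set[Tuple[str, str]] = {("cupid", "gs-42")}


def traverse_same(ids: Set[str]) -> Set[Tuple[str, str]]:
    """traverseSame step, keeping correspondences so each match stays tied to its input."""
    return {(a, b) for a, b in SAME_DBLP_GS if a in ids}


def aggregate_same(ids: Set[str]) -> List[Dict[str, List[Instance]]]:
    """aggregateSame = traverseSame + agg: one aggregated object per input publication."""
    correspondences = traverse_same(ids)
    result = []
    for oid in sorted(ids):
        matches = sorted(b for a, b in correspondences if a == oid)
        result.append({
            "DBLP": [DBLP_PUBS[oid]],
            # Inputs without a counterpart keep an empty GoogleScholar part (an assumption).
            "GoogleScholar": [GS_PUBS[b] for b in matches],
        })
    return result


aos = aggregate_same({"cupid", "p2"})
```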

iFuice scripts

Batch execution of operators
Store (intermediate) results in variables
Scripts can be interpreted as mappings
Other iFuice scripts can utilize these "script mappings"

Example: SIGMOD test of time award
$SIGMODPubs := queryTraverse (LDS=DBLP.Conf, {Name="SIGMOD 1995"}, DBLP.ConfPubs)
$CombinedConfPub := aggregateSame ($SIGMODPubs, GoogleScholar)
$CleanedPubs := fuseAttributes ($CombinedConfPub)
$Result := sort ($CleanedPubs, "NoOfCitings")


Example: SIGMOD test of time award


Mediator architecture

[Figure: iFuice mediator architecture]
Applications (e.g., Bio navigator, Personal Information Manager) access the iFuice mediator via the mediator interface (request / response), offered as a web service or Java library, either in script / batch mode or interactively (step by step)
iFuice mediator: mapping handler, duplicate detection, fusion control unit, cache for mapping results, and the meta data model, which is loaded from and stored in a repository
Mapping execution service: receives mapping calls and returns mapping results; wraps different mapping implementations (web service, SQL query, Java class, iFuice script)
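The split between the mediator's mapping handler, its cache and the mapping execution service can be sketched in Python as follows. The class names, the caching key (mapping name plus the set of input ids) and the two lambdas standing in for an SQL query and a web service are assumptions for illustration, not the prototype's design.

```python
from typing import Callable, Dict, FrozenSet, Iterable, Set, Tuple

Correspondences = Set[Tuple[str, str]]
MappingImpl = Callable[[FrozenSet[str]], Correspondences]


class MappingExecutionService:
    """Wraps different mapping implementations behind a single call interface."""

    def __init__(self) -> None:
        self._impls: Dict[str, MappingImpl] = {}

    def register(self, name: str, impl: MappingImpl) -> None:
        self._impls[name] = impl

    def call(self, name: str, ids: FrozenSet[str]) -> Correspondences:
        return self._impls[name](ids)


class MappingHandler:
    """Mediator-side handler that caches mapping results per (mapping, input ids)."""

    def __init__(self, service: MappingExecutionService) -> None:
        self.service = service
        self.cache: Dict[Tuple[str, FrozenSet[str]], Correspondences] = {}
        self.calls = 0  # mapping calls actually sent to the execution service

    def execute(self, name: str, ids: Iterable[str]) -> Correspondences:
        key = (name, frozenset(ids))
        if key not in self.cache:
            self.calls += 1
            self.cache[key] = self.service.call(name, key[1])
        return set(self.cache[key])  # copy so callers cannot alter the cache


service = MappingExecutionService()
# Stand-in for a mapping backed by an SQL query: a lookup table (invented ids).
conf_pubs_table = {("sigmod95", "p1"), ("sigmod95", "p2")}
service.register("DBLP.ConfPubs",
                 lambda ids: {(a, b) for a, b in conf_pubs_table if a in ids})
# Stand-in for a web service mapping: results computed per call.
service.register("DBLP-GS.SamePub", lambda ids: {(i, "gs-" + i) for i in ids})

handler = MappingHandler(service)
first = handler.execute("DBLP.ConfPubs", ["sigmod95"])
again = handler.execute("DBLP.ConfPubs", {"sigmod95"})  # served from the cache
```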

Summary & outlook

iFuice: generic way to perform dynamic information fusion
Based on instance correspondences of P2P sources
Mediator-controlled data fusion

Two working modes
Script mode: powerful operators for information fusion tasks (with explicit source selection or source-transparent)
Explorative mode: navigation in information space

Future work
Finish prototype implementation
Different domains, e.g., bioinformatics and e-commerce
Tool-supported (semi-)automatic integration of local / private data sources