Infrastructure for Semantic Web

09/12/2003, Peer-to-Peer Information Systems, WS 03/04

Piazza: Data Management Infrastructure for Semantic Web Applications

Alon Y. Halevy, Zachary G. Ives, Peter Mork, Igor Tatarinov.

Speaker: Sergey Chernov

Tutor: Jens Graupmann


Outline

1. INTRODUCTION. SEMANTIC WEB.
2. PIAZZA: SYSTEM OVERVIEW
3. IMPLEMENTATION DETAILS
   3.1 MAPPING LANGUAGE
   3.2 QUERY ANSWERING ALGORITHM
4. CONCLUSIONS.


Introduction

Goal:
  Data Integration and Knowledge Management
Problem:
  Web data lacks machine-understandable semantics
Solution:
  Semantic Web?



The Semantic Web*

Web sites include structural annotations
  You can pose meaningful queries on them.
  Ontologies provide the semantic glue.
  Internal implementation of web sites left open.
Agents perform tasks:
  Query one or more web sites
  Perform updates (e.g., set schedules)
  Coordinate actions
  Trust each other (or not).
I.e., agents operating on a gigantic heterogeneous distributed database.

(*View by A. Halevy)


General requirements

Robust infrastructure for querying
  Peer data management systems.
Facilitate mapping between different structures. Need tools for:
  Locating relevant structures
  Easily joining the semantic web.
Get data into structured form
  Should we worry about the legacy web?



Using views for specifying mappings

Local-As-View (LAV).
  Data sources can be described as views over the mediated schema.

Global-As-View (GAV).
  Mediated schema can be described as a set of views over the data sources.

[Figure: two diagrams, one per approach, each showing a Mediated Schema connected to Site A, Site B and Site C.]
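
A minimal sketch of the two styles, assuming toy relations and data that are not from the slides (site_a, site_b and the mediated relation Researcher are hypothetical names). Under GAV the mediated relation is computed by a view over the sources, so a query is answered by unfolding that view. Under LAV a source is described by a view over the mediated schema, read here as a sound but possibly incomplete description.

```python
# Toy source relations (hypothetical data, not from the slides).
site_a = {("alice", "DB"), ("bob", "AI")}   # (name, area)
site_b = {("carol", "DB")}                  # (name, area)


def researcher_gav():
    """GAV: the mediated relation Researcher(name, area) is a view
    (here simply a union) over the data sources."""
    return site_a | site_b


def db_researchers_gav():
    """A query over the mediated schema, answered by unfolding the GAV view."""
    return {name for (name, area) in researcher_gav() if area == "DB"}


def site_b_description(mediated):
    """LAV: site_b is described as a view over the mediated schema,
    namely the Researcher rows whose area is DB."""
    return {(name, area) for (name, area) in mediated if area == "DB"}


def is_consistent(mediated):
    """Assumption: sources are sound, so every tuple stored at site_b must be
    produced by its description on the given mediated instance."""
    return site_b <= site_b_description(mediated)


print(sorted(db_researchers_gav()))
print(is_consistent(researcher_gav()))
```

The GAV function fixes how the mediated relation is built, whereas the LAV description only constrains which mediated instances are compatible with the source.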


Mapping

Mapping AB specifies how structured data in the schema of node A is represented in the schema of node B.

[Figure: Site A, Site B, Site C and the Mediated Schema (MS), connected by the mappings “AB”, “BA”, “BC”, “CB”, “A-MS”, “MS-A”, “C-MS” and “MS-C”.]


Piazza: Peer Data-Management System

Goal:
  Large scale autonomous sharing of structured data
Peer data management system (PDMS)
Autonomous Peers export data in their own schemas
Pair-wise mappings between peers
Generalization of a Data Integration system
NOT a P2P file sharing system


Relationship of PDMS to…

P2P overlay networks (the “Structured World”)
Data integration systems (no central logical mediated schema)
Federated databases (scale, ad-hoc nature)
Distributed databases (no central administration)


Representing Data

A spectrum of possibilities:
  Relational tables, some integrity constraints
  XML: can encode relational, hierarchical
    XQuery: emerging standard query language (SQL for XML)
  RDF: “XML on drugs”.
    Sees only the logic; ignores other aspects.
  DAML+OIL
    Full-blown knowledge representation language.
They all have semantics; just different expressive powers.
We keep the data simple. Mappings between data at different peers are more complex.


Peer Data Management

Mappings are query expressions
  DbResearcher(x) [...] Researcher(x), Area(x, DB)
  DbResearcher(x), Office(x, DBLab) = DbLabMember(x)

[Figure: a network of peers (DB Projects, MIT, UW, UCB, Stanford) connected by mappings. The peer schemas shown in the figure are listed below, one group per peer.]

Area(areaID, name, descr)
Project(projID, name, sponsor)
ProjArea(projID, areaID)
Pubs(pubID, projName, title, venue, year)
Author(pubID, author)
Member(projName, member)

Project(projID, name, descr)
Student(studID, name, status)
Faculty(facID, name, rank, office)
Advisor(facID, studID)
ProjMember(projID, memberID)
Paper(papID, title, forum, year)
Author(authorID, paperID)

Area(areaID, name, descr)
Project(projID, areaID, name)
Pub(pubID, title, venue, year)
PubAuthor(pubID, authorID)
PubProj(pubID, projID)
Member(memID, projID, name, pos)
Alumn(name, year, thesis)

Members(memID, name)
Projects(projID, name, startDate)
ProjFaculty(projID, facID)
ProjStudents(projID, studID)

Direction(dirID, name)
Project(pID, dirID, name)




Piazza mapping language (1)

XML/XML Example

Target:
  pubs
    book*
      title
      author*
        name
      publisher*
        name

Source:
  authors
    author*
      full-name
      publication*
        title
        pub-type

<pubs>
  <book>
    {: $a IN document("source.xml")/authors/author,
       $t IN $a/publication/title,
       $typ IN $a/publication/pub-type
       WHERE $typ = "book" :}
    <title> { $t } </title>
    <author>
      <name> {: $a/full-name :} </name>
    </author>
  </book>
</pubs>
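
To make the effect of this mapping concrete, the following is a rough Python sketch that applies the same bindings to a small source document. The sample data and the function name apply_mapping are assumptions made for illustration; this is not the Piazza implementation, only a literal reading of the mapping using the standard library's ElementTree.

```python
import xml.etree.ElementTree as ET

# Hypothetical source document following the Source schema of the slide.
SOURCE_XML = """
<authors>
  <author>
    <full-name>Jane Doe</full-name>
    <publication><title>Data Integration</title><pub-type>book</pub-type></publication>
  </author>
  <author>
    <full-name>John Roe</full-name>
    <publication><title>A Short Note</title><pub-type>article</pub-type></publication>
  </author>
</authors>
"""


def apply_mapping(source_xml):
    """Build the target <pubs> tree from the source <authors> tree."""
    authors = ET.fromstring(source_xml)          # root element is <authors>
    pubs = ET.Element("pubs")
    for a in authors.findall("author"):                      # $a
        for t in a.findall("publication/title"):             # $t
            for typ in a.findall("publication/pub-type"):    # $typ
                # As in the mapping text, $t and $typ are bound independently.
                if typ.text == "book":                       # WHERE $typ = "book"
                    book = ET.SubElement(pubs, "book")
                    ET.SubElement(book, "title").text = t.text
                    author = ET.SubElement(book, "author")
                    ET.SubElement(author, "name").text = a.findtext("full-name")
    return pubs


print(ET.tostring(apply_mapping(SOURCE_XML), encoding="unicode"))
```

Only the author whose publication has pub-type "book" contributes a book element to the target.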



Piazza mapping language (2)

piazza:id attribute

Target:
  pubs
    book*
      title
      author*
        name
      publisher*
        name

Source:
  authors
    author*
      full-name
      publication*
        title
        pub-type

<pubs>
  <book piazza:id={$t}>
    {: $a IN document("source.xml")/authors/author,
       $t IN $a/publication/title,
       $typ IN $a/publication/pub-type
       WHERE $typ = "book" :}
    <title piazza:id={$t}> { $t } </title>
    <author piazza:id={$t}>
      <name> {: $a/full-name :} </name>
    </author>
  </book>
</pubs>


Piazza mapping language (3)

Partial mapping

Target:
  pubs
    book*
      title
      author*
        name
      publisher*
        name

Source:
  authors
    author*
      full-name
      publication*
        title
        pub-type

<pubs>
  <book piazza:id={$t}>
    {: $a IN document("source.xml")/authors/author,
       $t IN $a/publication/title,
       $typ IN $a/publication/pub-type
       WHERE $typ = "book"
       PROPERTY $t >= 'A' AND $t < 'B'
    :}
    [: <publisher>
         <name>
           {: PROPERTY $this IN {"PrintersInc", "PubsInc"} :}
         </name>
       </publisher> :]
  </book>
</pubs>


Query Answering Algorithm

Problem
  Evaluate query Q at P1 given a network of mappings
  Reformulate the query over all relevant peers
  Chaining of mappings using a combination of query composition and query rewriting

Q_P1(x) :- DbResearcher(x)

Query Composition
  M: DbResearcher(x) [...] Researcher(x), Area(x, DB)
  Q_P2(x) :- Researcher(x), Area(x, DB)

Query Rewriting
  M: DbResearcher(x), Office(x, DBLab) = DbLabMember(x)
  Q_P3(x) :- DbLabMember(x)
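
The two reformulation steps on this slide can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the Piazza algorithm: the first mapping is read as a definition of DbResearcher in terms of Researcher and Area (its connective is not legible in the slide), variables are assumed to be named identically across query and mappings, and the names DEFINITIONS, EQUALITIES, compose and rewrite are hypothetical.

```python
# An atom is (predicate, args); a conjunctive query body is a list of atoms.
Q_P1 = [("DbResearcher", ("x",))]

# Assumption: the first mapping defines DbResearcher(x) by this body.
DEFINITIONS = {
    "DbResearcher": (("x",), [("Researcher", ("x",)), ("Area", ("x", "DB"))]),
}

# Equality mapping: the conjunction on the left equals the atom on the right.
EQUALITIES = [
    ([("DbResearcher", ("x",)), ("Office", ("x", "DBLab"))],
     ("DbLabMember", ("x",))),
]


def substitute(atom, binding):
    """Replace formal parameters in an atom by the actual arguments."""
    pred, args = atom
    return (pred, tuple(binding.get(a, a) for a in args))


def compose(query):
    """Query composition: unfold every atom that has a definition."""
    result = []
    for pred, args in query:
        if pred in DEFINITIONS:
            params, body = DEFINITIONS[pred]
            binding = dict(zip(params, args))
            result.extend(substitute(atom, binding) for atom in body)
        else:
            result.append((pred, args))
    return result


def rewrite(query):
    """Query rewriting: use the right-hand side of an equality mapping as a
    view. Simplified soundness test: every query atom occurs in the view's
    definition, so each answer of the view is also an answer of the query."""
    return [[rhs] for lhs, rhs in EQUALITIES
            if all(atom in lhs for atom in query)]


print(compose(Q_P1))   # body of Q_P2
print(rewrite(Q_P1))   # body of Q_P3
```

Under these assumptions, composition yields the body of Q_P2, and rewriting yields the body of Q_P3, which returns only those DbResearcher answers that are also DbLabMember tuples.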



Query Reformulation (1)

Mapping:

<S2>
  <people> {: $people=/S1/people :}
    <faculty> {: $name=$people/faculty/name/text() :}
      { $name }
    </faculty>
    <student> {: $student=$people/student/text() :}
      <name> { $student } </name>
      <advisor> {: $faculty=$people/faculty,
                   $name=$faculty/name/text(),
                   $advisee=$faculty/advisee/text()
                   where $advisee=$student :}
        { $name }
      </advisor>
    </student>
  </people>
</S2>

Query:

<result> {
  for $faculty in /S1/people/faculty,
      $name in $faculty/name/text(),
      $advisee in $faculty/advisee/text()
  where $name = "Ullman"
  return
    <student> {$advisee} </student>
}
</result>


Query Reformulation (2)

Query:

<result> {
  for $faculty in /S1/people/faculty,
      $name in $faculty/name/text(),
      $advisee in $faculty/advisee/text()
  where $name = "Ullman"
  return
    <student> {$advisee} </student>
}
</result>

[Figure: Query tree pattern: <result> over S1/people/faculty with children name and advisee, condition $name = "Ullman", output <student> {$advisee}. Mapping tree pattern: <S2> over S1; <people> over people; <faculty> {$name} over faculty/name; <student> over student with <name> {$student}; <advisor> {$name} over faculty with children name and advisee, condition $advisee=$student.]


Query Reformulation (3)

[Figure: Query tree pattern: <result> over S1/people/faculty with children name and advisee, condition $name = "Ullman", output <student> {$advisee}. Mapping tree pattern: <S2> over S1; <people> over people; <faculty> {$name} over faculty/name; <student> over student with <name> {$student}; <advisor> {$name} over faculty with children name and advisee, condition $advisee=$student.]

Reformulated query:

<result> {
  for $student in /S2/people/student,
      $advisor in $student/advisor/text(),
      $name in $student/name/text()
  where $advisor = "Ullman"
  return
    <student> { $name } </student>
}
</result>
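
As a sanity check on the reformulation, the sketch below evaluates the original query over a small S1 document, materialises S2 through the mapping, and evaluates the reformulated query over S2. The sample data and the function names (query_s1, apply_mapping, query_s2) are assumptions for illustration; the code mirrors the slide's queries with ElementTree and says nothing about how Piazza derives the reformulation.

```python
import xml.etree.ElementTree as ET

# Hypothetical S1 instance.
S1_XML = """
<S1><people>
  <faculty><name>Ullman</name><advisee>Alice</advisee><advisee>Bob</advisee></faculty>
  <faculty><name>Widom</name><advisee>Carol</advisee></faculty>
  <student>Alice</student><student>Bob</student><student>Carol</student>
</people></S1>
"""


def query_s1(s1):
    """Original query: advisees of the faculty member named Ullman."""
    return [advisee.text
            for faculty in s1.findall("people/faculty")
            for name in faculty.findall("name")
            for advisee in faculty.findall("advisee")
            if name.text == "Ullman"]


def apply_mapping(s1):
    """Materialise S2 from S1 following the mapping of the slides."""
    s2 = ET.Element("S2")
    for people in s1.findall("people"):
        people2 = ET.SubElement(s2, "people")
        for name in people.findall("faculty/name"):
            ET.SubElement(people2, "faculty").text = name.text
        for student in people.findall("student"):
            student2 = ET.SubElement(people2, "student")
            ET.SubElement(student2, "name").text = student.text
            for faculty in people.findall("faculty"):
                for name in faculty.findall("name"):
                    for advisee in faculty.findall("advisee"):
                        if advisee.text == student.text:
                            ET.SubElement(student2, "advisor").text = name.text
    return s2


def query_s2(s2):
    """Reformulated query: names of students whose advisor is Ullman."""
    return [name.text
            for student in s2.findall("people/student")
            for advisor in student.findall("advisor")
            for name in student.findall("name")
            if advisor.text == "Ullman"]


s1 = ET.fromstring(S1_XML)
print(query_s1(s1))
print(query_s2(apply_mapping(s1)))
```

On this sample data both queries return the same students. They would differ if S1 listed an advisee who does not also appear as a student element, since the mapping only creates S2 students from S1 student elements.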


Reformulation times

Table 1: The test queries and their respective running times.

Query  Description                                        Reformulation time  # of reformulations
Q1     XML-related projects.                              0.5 sec             12
Q2     Co-authors who reviewed each other's work.         0.9 sec             25
Q3     PC members with a paper at the same conference.    0.2 sec             3
Q4     PC chairs of recent conferences + their projects.  0.5 sec             24
Q5     Conflicts-of-interest of PC members.               0.7 sec             36


Current and the Future

Current status
  Demo scenario using XML
  Looking at real domains (Bio dbs, NASA dbs)
Future Work
  More efficient reformulation algorithm
  Semantic network analysis: eliminate redundant mappings and inconsistent mappings
  Query caching to speed up query evaluation


Conclusions

Mapping language for mapping between sets of XML source nodes with different document structures
Architecture that uses the transitive closure of mappings to answer queries
Algorithm for query answering over this transitive closure of mappings, which is able to follow mappings in both forward and reverse directions
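
The idea of following mappings transitively and in both directions can be illustrated with a plain graph traversal. The peer names and the mapping list below are hypothetical, and the sketch only computes which peers are reachable; it does not reformulate queries.

```python
from collections import deque

# Hypothetical directed mappings (source peer, target peer).
MAPPINGS = [("P1", "P2"), ("P3", "P2"), ("P3", "P4")]


def reachable_peers(start, mappings, both_directions=True):
    """Breadth-first traversal of the mapping graph from one peer."""
    neighbours = {}
    for src, dst in mappings:
        neighbours.setdefault(src, set()).add(dst)        # forward direction
        if both_directions:
            neighbours.setdefault(dst, set()).add(src)    # reverse direction
    seen = {start}
    queue = deque([start])
    while queue:
        peer = queue.popleft()
        for nxt in sorted(neighbours.get(peer, ())):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


print(sorted(reachable_peers("P1", MAPPINGS)))
print(sorted(reachable_peers("P1", MAPPINGS, both_directions=False)))
```

With forward traversal only, a query at P1 in this toy network reaches just P2; using mappings in reverse as well also brings P3 and P4 into scope.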





Thank You!

