PowerPoint Presentation - GDM@FUDAN (Graph Data Management ...

Oct 21, 2013

Graph Data Management Lab, School of Computer Science

GDM@FUDAN

http://gdm.fudan.edu.cn

Email: zerup123@gmail.com

Reporter: Qi Liu

YAGO

What is YAGO?


A semantic web


A knowledge base


A combination of WordNet and Wikipedia

Semantic web


Advocated by the W3C (World Wide Web Consortium)


Aimed at reconstructing the WWW


A standard framework: RDF (Resource Description Framework)
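
RDF describes resources as subject-predicate-object statements. A minimal sketch of that data model follows, assuming plain strings stand in for resource identifiers; the `Triple` type, the `objects_of` helper and the sample statements are illustrative names, not part of any RDF library.

```python
from typing import List, NamedTuple


class Triple(NamedTuple):
    """One RDF-style statement: subject, predicate, object."""
    subject: str
    predicate: str
    obj: str


# Hypothetical sample statements, chosen only for illustration.
triples: List[Triple] = [
    Triple("Albert_Einstein", "type", "physicist"),
    Triple("physicist", "subClassOf", "scientist"),
    Triple("Albert_Einstein", "bornInYear", "1879"),
]


def objects_of(subject: str, predicate: str, data: List[Triple]) -> List[str]:
    """Return every object o such that (subject, predicate, o) is in data."""
    return [t.obj for t in data if t.subject == subject and t.predicate == predicate]


print(objects_of("Albert_Einstein", "type", triples))
```

A real RDF store would use URIs and typed literals instead of bare strings.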


Knowledge base


What it is:


A special database for knowledge management


What it does:


Provides a means for collecting, organising, searching and utilising information


Three types:


Machine-readable knowledge bases (DBpedia)


Human-readable knowledge bases (Wikipedia)


Knowledge base analysis and design

WordNet


What it is:


A lexical database of English, under development since 1985


What it does:


Groups words into synsets


Provides short, general definitions


Records the semantic relations between these synsets


25 basic noun groups & 15 verb groups
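
The grouping of words into synsets can be pictured with a small data structure. This is a sketch under stated assumptions: the `Synset` class, the glosses and the hypernym link are made up for illustration and are not quoted from WordNet.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Synset:
    """A set of synonymous words with a short definition (gloss)."""
    words: List[str]
    gloss: str
    hypernyms: List[str] = field(default_factory=list)  # more general synsets


# Illustrative entries; glosses and links are assumptions, not WordNet data.
synsets: Dict[str, Synset] = {
    "city": Synset(
        words=["city", "metropolis", "urban center"],
        gloss="a large, densely populated urban area",
        hypernyms=["municipality"],
    ),
    "municipality": Synset(
        words=["municipality"],
        gloss="an urban district with its own local government",
    ),
}


def synsets_for(word: str) -> List[str]:
    """Return the keys of all synsets that contain the given word."""
    return sorted(key for key, s in synsets.items() if word in s.words)


print(synsets_for("metropolis"))
```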

Key Concepts


Ontology vs Taxonomy


Lexicon: the bridge between a language and the knowledge expressed in that language



Syntactic (there vs their)


Semantic (sight vs site)


Pragmatic (infer vs imply)

Semantics of YAGO


Five relations:


Domain


Range


subRelationOf


Type


subClassOf


Entities:


Domain


Relation


Range


Literal


......


Axiomatic rules


Reasoning rules


correctness and completeness
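
Reasoning rules derive new facts from existing ones until nothing more can be added. The sketch below applies three rules of that kind to a fixed point; the rule set is an assumption chosen for illustration (transitivity of subClassOf, inheritance of type along subClassOf, inheritance of facts along subRelationOf), and the `closure` function and sample facts are hypothetical.

```python
from typing import Set, Tuple

Fact = Tuple[str, str, str]


def closure(facts: Set[Fact]) -> Set[Fact]:
    """Apply the three assumed rules repeatedly until no new fact appears."""
    derived = set(facts)
    while True:
        new: Set[Fact] = set()
        for (x, r, y) in derived:
            for (a, s, b) in derived:
                # (x subClassOf y), (y subClassOf b) => (x subClassOf b)
                if r == "subClassOf" and s == "subClassOf" and y == a:
                    new.add((x, "subClassOf", b))
                # (x type y), (y subClassOf b) => (x type b)
                if r == "type" and s == "subClassOf" and y == a:
                    new.add((x, "type", b))
                # (x r y), (r subRelationOf b) => (x b y)
                if s == "subRelationOf" and a == r:
                    new.add((x, b, y))
        if new <= derived:  # fixed point reached
            return derived
        derived |= new


facts = {
    ("Albert_Einstein", "type", "physicist"),
    ("physicist", "subClassOf", "scientist"),
    ("scientist", "subClassOf", "person"),
    ("Albert", "givenNameOf", "Albert_Einstein"),
    ("givenNameOf", "subRelationOf", "means"),
}
full = closure(facts)
print(sorted(full - facts))
```

Running `closure` on its own output adds nothing further, which is the sense in which the derived set is complete with respect to these rules.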

The YAGO system


Knowledge extraction


YAGO storage


Enriching YAGO


Knowledge extraction


TYPE relation


SUBCLASSOF relation


MEANS relation


Other relations


Meta-relations

TYPE relation extraction


The Wikipedia Category System


Types: conceptual, administrative, relational, thematic


Identifying Conceptual Categories


Conceptual => TYPE


Administrative and relational ones: excluded by hand


Apply shallow linguistic parsing (Noun Group Parser) to the remaining two kinds (conceptual and thematic)


E.g. Naturalized citizens of the United States


Domain and range extracted at the same time
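
The shallow parsing step can be approximated as follows. This is a crude stand-in for the Noun Group Parser, not the parser itself: it assumes the head noun is the last word before the first preposition, and it assumes that a plural head marks a conceptual category. The preposition list, the irregular-plural list and both function names are hypothetical.

```python
from typing import Tuple

# Assumed, incomplete word lists for illustration only.
PREPOSITIONS = {"of", "in", "from", "by", "at", "for", "on"}
IRREGULAR_PLURALS = {"people", "men", "women", "children"}


def split_category(name: str) -> Tuple[str, str, str]:
    """Split a category name into (pre-modifier, head, post-modifier)."""
    words = name.split()
    cut = next(
        (i for i, w in enumerate(words) if w.lower() in PREPOSITIONS),
        len(words),
    )
    group, post = words[:cut], words[cut:]
    if not group:
        return ("", "", " ".join(post))
    return (" ".join(group[:-1]), group[-1], " ".join(post))


def looks_conceptual(name: str) -> bool:
    """Heuristic: treat the category as conceptual if its head is plural."""
    head = split_category(name)[1].lower()
    if head in IRREGULAR_PLURALS:
        return True
    return head.endswith("s") and not head.endswith("ss")


print(split_category("Naturalized citizens of the United States"))
print(looks_conceptual("American people in Japan"))
```

The suffix test is easily fooled (a singular head such as "Physics" passes it), so a real system needs a proper plural check.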



SUBCLASSOF relation extraction


Wikipedia categories


DAG (directed acyclic graph)


Reflects merely the thematic structure


Use only the leaf categories of Wikipedia


Integrating WordNet Synsets


Match or prefer WordNet


Establishing subClassOf


American people in Japan


Exceptions


Correct manually

MEANS relation extraction


Exploiting WordNet Synsets


A synset {urban center, metropolis, city}


Create a class ‘city’ for the synset; each word in it MEANS that class


Exploiting Wikipedia Redirects


Search “Einstein, Albert”, redirected to “Albert Einstein”


Parsing Person Names


givenNameOf subRelationOf means


familyNameOf subRelationOf means
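
The redirect and person-name sources of MEANS facts can be sketched like this. The sketch assumes that redirects are available as a mapping from source title to target title, and that a name splits on whitespace into a given name first and a family name last, which does not hold for many naming conventions. Both helper names are hypothetical.

```python
from typing import Dict, Set, Tuple

Fact = Tuple[str, str, str]


def means_from_redirects(redirects: Dict[str, str]) -> Set[Fact]:
    """Each redirect source title MEANS the page it redirects to."""
    return {(source, "means", target) for source, target in redirects.items()}


def name_facts(full_name: str, entity: str) -> Set[Fact]:
    """Naive split: first token is the given name, last token the family name."""
    parts = full_name.split()
    if len(parts) < 2:
        return set()
    return {
        (parts[0], "givenNameOf", entity),
        (parts[-1], "familyNameOf", entity),
    }


redirect_facts = means_from_redirects({"Einstein, Albert": "Albert Einstein"})
person_facts = name_facts("Albert Einstein", "Albert Einstein")
print(sorted(redirect_facts | person_facts))
```

Because givenNameOf and familyNameOf are sub-relations of means, each of the two name facts also implies a MEANS fact.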

Other relations extraction


BornInYear & DiedInYear


EstablishedIn & LocatedIn


WrittenInYear


PoliticianOf


HasWonPrize


Filtering the Results


Meta-relations extraction


Descriptions


Individual DESCRIBES URL


Witness


Fact FoundIn URL (of its witness page)


ExtractedBy


Context


Links between A and B: A Context B

YAGO storage


Model independent of storage


Storage:


Text files, XML, database tables, RDF
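
Because the model is independent of storage, the same set of facts can be written to any of these formats and read back unchanged. A minimal sketch for the text-file case, assuming one tab-separated fact per line; the layout and the function names are illustrative and not the actual YAGO file format.

```python
import csv
import io
from typing import Set, Tuple

Fact = Tuple[str, str, str]


def dump_facts(facts: Set[Fact]) -> str:
    """Serialize facts as tab-separated lines, sorted for stable output."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    for fact in sorted(facts):
        writer.writerow(fact)
    return buffer.getvalue()


def load_facts(text: str) -> Set[Fact]:
    """Parse tab-separated lines back into a set of facts."""
    reader = csv.reader(io.StringIO(text), delimiter="\t")
    return {(row[0], row[1], row[2]) for row in reader if len(row) == 3}


stored = {
    ("Albert_Einstein", "type", "physicist"),
    ("Einstein, Albert", "means", "Albert_Einstein"),
}
text = dump_facts(stored)
print(text)
```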

Enriching YAGO


Add the fact (x, r, y)


Map x, y to existing entities (word sense disambiguation)


If mapping failed, add a new entity.


Map r to the YAGO ontology


If the mapped fact already exists, add a FoundIn relation


If it does not, add a new fact!
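
One possible reading of the steps above, as a sketch: a fact that is already known only gains a new FoundIn witness, while an unknown fact is added. Word sense disambiguation is replaced here by a plain alias lookup, which is a strong simplification. The `TinyKB` class, its fields and the example URLs are hypothetical.

```python
from typing import Dict, Set, Tuple

Fact = Tuple[str, str, str]


class TinyKB:
    """Toy knowledge base that tracks where each fact was found."""

    def __init__(self, relations: Set[str]) -> None:
        self.relations = set(relations)          # fixed set of known relations
        self.entities: Set[str] = set()
        self.aliases: Dict[str, str] = {}        # stand-in for disambiguation
        self.witnesses: Dict[Fact, Set[str]] = {}

    def resolve(self, name: str) -> str:
        """Map a name to an entity; unknown names become new entities."""
        entity = self.aliases.get(name, name)
        self.entities.add(entity)
        return entity

    def add_fact(self, x: str, r: str, y: str, source: str) -> bool:
        """Add (x, r, y) found at source; return True if the fact is new."""
        if r not in self.relations:
            raise ValueError(f"unknown relation: {r}")
        fact = (self.resolve(x), r, self.resolve(y))
        is_new = fact not in self.witnesses
        self.witnesses.setdefault(fact, set()).add(source)  # FoundIn
        return is_new


kb = TinyKB({"bornInYear"})
kb.aliases["Einstein"] = "Albert_Einstein"
first = kb.add_fact("Einstein", "bornInYear", "1879", "http://example.org/a")
second = kb.add_fact("Albert_Einstein", "bornInYear", "1879", "http://example.org/b")
print(first, second)
```

In this toy version literals such as the year are stored in the same set as entities; a fuller model would keep them apart.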

Summary on YAGO1


1M entities & 5M facts


Accuracy around 95%


YAGO2: In Time, Space and Many Languages


YAGO: about 100 manually defined relations


The YAGO2 architecture is built on four kinds of rules:


Factual rules


E.g. exceptions, definitions of all relations, domains, ranges and classes


Implication rules


Infer new facts from the facts in the database


Replacement rules


Normalize numbers, tags and other formats


Extraction rules


Extracting facts from a given source text

Temporal Dimension


People: wasBornOnDate & diedOnDate


Groups: wasCreatedOnDate & wasDestroyedOnDate


Artifacts (buildings, songs, cities): same as above


Events: startedOnDate & endedOnDate


=> startExistingOnDate & endExistingOnDate


Facts


Entities in a fact


=> subjectStartRelation & objectStartRelation
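
The mapping from entity-specific date relations to a generic existence span can be sketched as a lookup. The relation names are the ones listed above; the grouping into start and end sets and the `existence_facts` function are assumptions made for illustration.

```python
from typing import Set, Tuple

Fact = Tuple[str, str, str]

# Assumed grouping of the relations named on the slide.
START_RELATIONS = {"wasBornOnDate", "wasCreatedOnDate", "startedOnDate"}
END_RELATIONS = {"diedOnDate", "wasDestroyedOnDate", "endedOnDate"}


def existence_facts(facts: Set[Fact]) -> Set[Fact]:
    """Derive generic start/end-of-existence facts from specific relations."""
    derived: Set[Fact] = set()
    for subject, relation, date in facts:
        if relation in START_RELATIONS:
            derived.add((subject, "startExistingOnDate", date))
        elif relation in END_RELATIONS:
            derived.add((subject, "endExistingOnDate", date))
    return derived


dated = {
    ("Albert_Einstein", "wasBornOnDate", "1879-03-14"),
    ("Albert_Einstein", "diedOnDate", "1955-04-18"),
    ("Albert_Einstein", "type", "physicist"),
}
span = existence_facts(dated)
print(sorted(span))
```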

GEO-SPATIAL Dimension


All physical objects have a location in space!


Define it with geographical coordinates, i.e. latitude and longitude


=> yagoGeoCoordinates


=> hasGeoCoordinates


Two sources:


Wikipedia


GeoNames


locatedIn & hasGeoCoordinates & <location, TYPE, class>

Textual Dimension


hasWikipediaAnchorText


hasWikipediaCategory


hasCitationTitle


subClassOf hasContext

Integrating UWN (Universal WordNet) to include 200 languages