Roi Adadi David Ben-David


Oct 21, 2013




Semantic Web Document (SWD)


A web page that serializes an RDF graph.


Uses one of the recommended RDF syntax languages, namely RDF/XML, N-Triples or N3.


Semantic Web Term (SWT)


An RDF resource that represents an instance of rdfs:Class or rdf:Property, and can be universally referenced by its URI reference (URIref).


Semantic Web Ontology (SWO)


An SWD is considered to be an SWO when a significant proportion of
the statements it makes defines new SWTs.


Semantic Web Database (SWDB)


An SWD that does not define or extend a significant number of
terms.


Introduces individuals and makes assertions about them.


Makes assertions about individuals defined in other SWDs.

<!-- The default namespace and xml:base are hypothetical, added so the example parses -->
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
         xmlns:owl="http://www.w3.org/2002/07/owl#"
         xmlns="http://example.org/univ#"
         xml:base="http://example.org/univ">

  <rdfs:Class rdf:ID="Department"/>
  <rdfs:Class rdf:ID="Course"/>

  <rdf:Property rdf:ID="name">
    <rdfs:domain>
      <owl:Class>
        <owl:unionOf rdf:parseType="Collection">
          <rdfs:Class rdf:about="#Department"/>
          <rdfs:Class rdf:about="#Course"/>
        </owl:unionOf>
      </owl:Class>
    </rdfs:domain>
    <rdfs:range rdf:resource="http://www.w3.org/2000/01/rdf-schema#Literal"/>
  </rdf:Property>

  <rdf:Property rdf:ID="number">
    <rdfs:domain rdf:resource="#Course"/>
    <rdfs:range rdf:resource="http://www.w3.org/2000/01/rdf-schema#Literal"/>
  </rdf:Property>

  <rdf:Property rdf:ID="department">
    <rdfs:domain rdf:resource="#Course"/>
    <rdfs:range rdf:resource="#Department"/>
  </rdf:Property>

  <rdf:Property rdf:ID="creditPts">
    <rdfs:domain rdf:resource="#Course"/>
    <rdfs:range rdf:resource="http://www.w3.org/2000/01/rdf-schema#Literal"/>
  </rdf:Property>

  <Department rdf:ID="dept_cs">
    <name>Computer Science</name>
  </Department>

  <Course rdf:ID="cs236703">
    <name>Object Oriented Programming</name>
    <department rdf:resource="#dept_cs"/>
    <creditPts>3.0</creditPts>
  </Course>

</rdf:RDF>



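In the example above, the whole document is an SWD, and each of the six terms it defines (Department, Course, name, number, department and creditPts) is an SWT.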

Example SWO: FOAF (http://xmlns.com/foaf/spec/index.rdf) contains 12 classes and 51 properties (in 466 triples) and no individuals; e.g. the class Document, the class Organization and the property mbox.

Example SWDB: the FOAF description for Tim Finin (www.cs.umbc.edu/~finin//foaf.rdf) defines three individuals and makes statements about them (e.g. name and nickname statements), but defines no classes or properties.


Current form of the Semantic Web


A web of Semantic Web Documents (SWDs)



Navigating the Semantic Web is difficult


Paucity of explicit hyperlinks (beyond the namespaces in URIrefs).


Relations such as rdfs:seeAlso and owl:imports are rare.



There is a need for a search engine customized for SWDs.


Find and analyze SWDs on the web.


Suggest a measure for SWDs’ importance (ranking).


Semantic Web researchers


Search for SWTs and SWOs for publishing their
knowledge.



Software Agents


Search SWDs for external knowledge.


Retrieve SWOs to fully understand SWTs.


Conventional web navigation and ranking
models are not suitable for the Semantic Web.



They do not differentiate SWDs from other
web pages.



They do not parse and use the internal structure of SWDs or the external semantic links among SWDs.


They are designed to work with natural language (NL) and unstructured text.


Finding appropriate ontologies


Qualified search (Terms + Types)


Ontologies are sorted by their popularity.



Finding instance data


Querying SWDs with constraints on the classes and
properties used by them.


Helps to integrate Semantic Web data on the web.



Characterizing the Semantic Web


Structural properties



Ontology-Based Annotation Systems


SHOE, Ontobroker, webKB, QuizRDF, CREAM, …


Annotating online documents.


Documents are indexed based on the annotations rather than on the entire document.


Use their own ontologies, which might not suit some SWDs.



Ontology Repositories


DAML Ontology Library, SemWebCentral, SchemaWeb, …


Collect ontologies (simply store the entire RDF
document).


Do not automatically discover SWDs but rather
require people to submit URLs.


Constitute a small portion of the Semantic Web.



Semantic Web Browsers


W3C’s Ontaria


Searchable and browsable directory of RDF documents developed by the W3C.


Do not automatically discover SWDs.


Stores the full RDF graphs.


Indexes individuals of well-known classes, e.g. foaf:Person, rss:Item.



Experiments show:


Swoogle outperforms them all!


Swoogle: a crawler-based indexing and retrieval system for the Semantic Web.


Discovers Semantic Web documents.


Computes relations between documents.


Stores and reasons over extracted metadata.


The system is designed to scale up to tens of millions of documents.


Enables rich query constraints on semantic relations.



Collects candidate URLs to find and cache SWDs from:


Submitted URLs.


A Web crawler.


A customized meta-crawler (using conventional search engines).


SwoogleBot (a Semantic Web crawler), which analyzes SWDs to produce new candidates.
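As a rough illustration of how an analyzed SWD can yield new candidate URLs, the sketch below treats every URIref's namespace (the URL with its #fragment removed) as a candidate document. It assumes the SWD is already parsed into (subject, predicate, object) string triples, with blank nodes written as "_:id" and literals wrapped in double quotes; the function name `new_candidates` and this triple representation are hypothetical, not SwoogleBot's actual code.

```python
from urllib.parse import urldefrag


def new_candidates(triples, seen):
    """Return namespace URLs referenced by an SWD that have not been seen yet.

    Assumption: triples are (subject, predicate, object) strings; blank nodes
    look like "_:b0" and literals are wrapped in double quotes.
    """
    candidates = set()
    for triple in triples:
        for node in triple:
            # Only http(s) URIrefs can point to another document.
            if not node.startswith(("http://", "https://")):
                continue
            # Drop the "#fragment" to get the URL of the defining document.
            url, _fragment = urldefrag(node)
            if url not in seen:
                candidates.add(url)
    return candidates


RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
triples = [
    ("http://example.org/univ#cs236703", RDF_TYPE, "http://example.org/univ#Course"),
    ("http://example.org/univ#cs236703", "http://example.org/univ#name",
     '"Object Oriented Programming"'),
    ("_:b0", "http://www.w3.org/2002/07/owl#unionOf", "_:b1"),
]
print(sorted(new_candidates(triples, seen={"http://example.org/univ"})))
# ['http://www.w3.org/1999/02/22-rdf-syntax-ns', 'http://www.w3.org/2002/07/owl']
```

A real crawler would still have to fetch each candidate and check that it actually serializes an RDF graph before treating it as an SWD.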





Analyzes the discovered SWDs


Generates the bulk of Swoogle’s metadata about the Semantic Web.


Characterizes features associated with SWDs and
SWTs.


Tracks relations among SWDs and SWTs.


Analyzes the generated metadata.


Classification of SWOs and SWDBs.


Hosts the modular ranking mechanisms.


OntologyRank.




Provides search services to software agents and users, allowing them to access metadata and navigate the Semantic Web.


Swoogle Search: searches SWDs using constraints on URLs, the SWTs being used or defined, etc.


Ontology Dictionary: searches ontologies at the term level and offers more navigational paths.



SWD metadata is collected to make SWD
search more efficient and effective.


Derived from the content of SWDs as well as the relations among SWDs.


3 categories of metadata:


Basic metadata


Relations among SWDs


Analytical results


Language Features


properties describing
the syntactic or semantic features of an SWD.


Encoding


syntactic encoding of an SWD.


“RDF/XML”, “N-TRIPLE” and “N3”.


Language


the language used by an SWD.


“OWL”, “DAML+OIL”, “RDFS” and “RDF”.


OWL Species


the language species of an SWD
written in OWL.


“OWL-LITE”, “OWL-DL” and “OWL-FULL”.
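The slides list the possible values of the Language feature but not how Swoogle assigns them. One plausible heuristic, shown here purely as an assumption, is to pick the most expressive vocabulary whose namespace appears in the SWD's triples. The function `detect_language` is hypothetical, and OWL species detection (Lite/DL/Full) is not attempted, since it requires checking OWL's syntactic restrictions.

```python
OWL_NS = "http://www.w3.org/2002/07/owl#"
DAML_NS = "http://www.daml.org/2001/03/daml+oil#"
RDFS_NS = "http://www.w3.org/2000/01/rdf-schema#"


def detect_language(triples):
    """Guess the Language feature of an SWD from the vocabularies it uses.

    Hypothetical heuristic: the most expressive vocabulary found wins,
    in the order OWL > DAML+OIL > RDFS > RDF.
    """
    nodes = {node for triple in triples for node in triple}

    def uses(namespace):
        return any(node.startswith(namespace) for node in nodes)

    for namespace, language in ((OWL_NS, "OWL"), (DAML_NS, "DAML+OIL"), (RDFS_NS, "RDFS")):
        if uses(namespace):
            return language
    return "RDF"


doc = [
    ("http://example.org/univ#name", RDFS_NS + "domain", "_:b0"),
    ("_:b0", OWL_NS + "unionOf", "_:b1"),
]
print(detect_language(doc))  # OWL
```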




RDF Statistics



properties summarizing node
distribution of the RDF graph of an SWD.


How an SWD defines new classes, properties and
individuals.


Let foo be an SWD and let C(foo), P(foo) and I(foo) be the sets of classes, properties and individuals defined in the SWD foo, respectively. The ontology-ratio R(foo) is calculated by:

$$R(foo) = \frac{|C(foo)| + |P(foo)|}{|C(foo)| + |P(foo)| + |I(foo)|}$$

R(foo) ranges from 0 to 1, where 0 implies that foo is a pure SWDB and 1 implies that foo is a pure SWO.
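A minimal sketch of the ontology-ratio, assuming R(foo) = (|C| + |P|) / (|C| + |P| + |I|), the form consistent with the 0 (pure SWDB) to 1 (pure SWO) range above. It also assumes a resource counts as "defined" when the SWD gives it an rdf:type: classes are typed rdfs:Class or owl:Class, properties rdf:Property, and anything else with a type is an individual; anonymous (blank) nodes are skipped. The base URI `http://example.org/univ#` is hypothetical. Applied to the university example earlier in this section (2 classes, 4 properties, 2 individuals) it gives 6/8.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RDF_PROPERTY = "http://www.w3.org/1999/02/22-rdf-syntax-ns#Property"
RDFS_CLASS = "http://www.w3.org/2000/01/rdf-schema#Class"
OWL_CLASS = "http://www.w3.org/2002/07/owl#Class"


def ontology_ratio(triples):
    """R = (|C| + |P|) / (|C| + |P| + |I|), counting only named, typed resources."""
    classes, properties, individuals = set(), set(), set()
    for s, p, o in triples:
        if p != RDF_TYPE or s.startswith("_:"):
            continue  # anonymous nodes (e.g. the owl:unionOf class) are not "defined"
        if o in (RDFS_CLASS, OWL_CLASS):
            classes.add(s)
        elif o == RDF_PROPERTY:
            properties.add(s)
        else:
            individuals.add(s)
    terms = len(classes) + len(properties)
    total = terms + len(individuals)
    return terms / total if total else 0.0


EX = "http://example.org/univ#"  # hypothetical base URI for the university example
example = (
    [(EX + c, RDF_TYPE, RDFS_CLASS) for c in ("Department", "Course")]
    + [(EX + p, RDF_TYPE, RDF_PROPERTY) for p in ("name", "number", "department", "creditPts")]
    + [(EX + "dept_cs", RDF_TYPE, EX + "Department"),
       (EX + "cs236703", RDF_TYPE, EX + "Course"),
       ("_:union", RDF_TYPE, OWL_CLASS)]
)
print(ontology_ratio(example))  # 0.75
```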









Ontology Annotations


properties that
describe an SWD as an ontology.


The SWD has an instance of owl:Ontology.


Swoogle records the following properties:


label (rdfs:label)


comment (rdfs:comment)


versionInfo (owl:versionInfo / daml:versionInfo)
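The annotation step above can be sketched as follows, assuming the SWD is available as (subject, predicate, object) string triples with literals in double quotes. The sketch only looks at subjects typed owl:Ontology, as the slide describes; the function name `ontology_annotations` and the output shape (a dict from property name to a list of values) are illustrative choices, not Swoogle's schema.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
OWL_ONTOLOGY = "http://www.w3.org/2002/07/owl#Ontology"
ANNOTATIONS = {
    "http://www.w3.org/2000/01/rdf-schema#label": "label",
    "http://www.w3.org/2000/01/rdf-schema#comment": "comment",
    "http://www.w3.org/2002/07/owl#versionInfo": "versionInfo",
    "http://www.daml.org/2001/03/daml+oil#versionInfo": "versionInfo",
}


def ontology_annotations(triples):
    """Collect label/comment/versionInfo values attached to owl:Ontology instances."""
    ontologies = {s for s, p, o in triples if p == RDF_TYPE and o == OWL_ONTOLOGY}
    record = {}
    for s, p, o in triples:
        if s in ontologies and p in ANNOTATIONS:
            record.setdefault(ANNOTATIONS[p], []).append(o)
    return record  # empty if the SWD declares no owl:Ontology


doc = [
    ("http://example.org/univ", RDF_TYPE, OWL_ONTOLOGY),
    ("http://example.org/univ", "http://www.w3.org/2000/01/rdf-schema#label",
     '"University ontology"'),
    ("http://example.org/univ", "http://www.w3.org/2002/07/owl#versionInfo", '"1.0"'),
]
print(ontology_annotations(doc))
# {'label': ['"University ontology"'], 'versionInfo': ['"1.0"']}
```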






Capturing and analyzing relations at the RDF node level is hard.


Swoogle generalizes RDF node-level relations and focuses on SWD-level relations.


Swoogle captures the following SWD-level relations:


TM/IN: an SWD uses terms defined by some other SWDs.


IM: an ontology imports another ontology.


EX: an ontology extends another ontology.


PV: an ontology is a prior version of another.


CPV: an ontology is a prior version of another and is compatible with it.


IPV: an ontology is a prior version of another and is incompatible with it.

Indicators of inter-ontology relations


OntologyRank: inspired by Google’s PageRank algorithm.



Underlying Random Surfing Model:


The surfer jumps to a random URL.


With probability d, the surfer randomly chooses a link to follow.


With probability 1 - d, the surfer jumps to another random URL.


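Given a document A, A’s PageRank is computed by:

$$PR(A) = (1 - d) + d \left( \frac{PR(T_1)}{C(T_1)} + \cdots + \frac{PR(T_n)}{C(T_n)} \right)$$

where T_1, …, T_n are web documents that link to A; C(T) is the total number of outlinks of T; and d is a damping factor, typically set to 0.85.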
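The PageRank formula PR(A) = (1 - d) + d * sum(PR(T)/C(T)) over the documents T linking to A can be evaluated by simple fixed-point iteration. The sketch below assumes the link graph is given as a dict from each document to the list of documents it links to; the fixed iteration count of 50 is an arbitrary choice rather than a convergence test.

```python
def pagerank(links, d=0.85, iterations=50):
    """Iterate PR(A) = (1 - d) + d * sum(PR(T) / C(T) for each T linking to A).

    links maps each document to the list of documents it links to.
    """
    docs = set(links) | {t for targets in links.values() for t in targets}
    pr = {doc: 1.0 for doc in docs}
    for _ in range(iterations):
        new_pr = {doc: 1.0 - d for doc in docs}
        for t, targets in links.items():
            for a in targets:
                # T passes an equal share of its rank to each of its C(T) outlinks.
                new_pr[a] += d * pr[t] / len(targets)
        pr = new_pr
    return pr


ranks = pagerank({"A": ["B", "C"], "B": ["C"], "C": ["A"]})
print({doc: round(r, 3) for doc, r in sorted(ranks.items())})
```

Here C, which receives links from both A and B, ends up with the highest rank, and B, which receives only half of A's rank, the lowest. Every link is treated alike, which is exactly the uniform-probability assumption OntologyRank relaxes.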



The graph formed by SWDs has a richer set of relations.


The edges have explicit semantics.


Users can navigate the Semantic Web within or across the web and RDF graphs through 7 groups of navigational paths.


The semantics of links leads to a non-uniform probability of following a particular outgoing link.


Given SWDs A and B, Swoogle classifies inter-SWD links into four categories:


imports(A,B): A imports all content of B.


uses-term(A,B): A uses some of the terms defined by B (without importing B).


extends(A,B): A extends the definitions of terms defined by B.


asserts(A,B): A makes assertions about the individuals defined by B.


Each category is assigned a different weight, which represents the probability of following that kind of link.
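The slides name the four link categories but not the exact rules for detecting them. The sketch below is one hypothetical set of rules: A imports B if it states owl:imports B; A extends B if it makes statements whose subject is a class or property defined by B; A asserts about B if the subject is one of B's individuals; any other use of B's terms counts as uses-term. The function `link_categories` and its inputs (A's triples plus B's URL, terms and individuals) are assumptions for illustration.

```python
OWL_IMPORTS = "http://www.w3.org/2002/07/owl#imports"


def link_categories(a_triples, b_url, b_terms, b_individuals):
    """Classify the links from SWD A to SWD B (hypothetical rules, see above)."""
    categories = set()
    for s, p, o in a_triples:
        if p == OWL_IMPORTS and o == b_url:
            categories.add("imports")
        elif s in b_terms:
            categories.add("extends")  # A says something new about B's class/property
        elif s in b_individuals:
            categories.add("asserts")  # A says something about B's individual
        elif p in b_terms or o in b_terms:
            categories.add("uses-term")
    if "imports" in categories:
        categories.discard("uses-term")  # uses-term means "without importing B"
    return categories


B = "http://b.org/onto"
b_terms = {B + "#Person", B + "#name"}
b_individuals = {B + "#tim"}
a_doc = [
    ("http://a.org/doc#me", B + "#name", '"Me"'),
    (B + "#tim", "http://a.org/doc#knows", "http://a.org/doc#me"),
]
print(sorted(link_categories(a_doc, B, b_terms, b_individuals)))  # ['asserts', 'uses-term']
```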


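Given an SWD a, Swoogle computes its raw rank by:

$$rawPR(a) = (1 - d) + d \sum_{x \in L(a)} rawPR(x) \frac{f(x,a)}{f(x)}$$

$$f(x,a) = \sum_{l \in links(x,a)} weight(l), \qquad f(x) = \sum_{y \in T(x)} f(x,y)$$

where L(a) is the set of SWDs that link to a, T(x) is the set of SWDs that x links to, links(x,a) is the set of links from x to a, and weight(l) is the weight assigned to the category of link l.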


Then, Swoogle computes the rank for SWDB and SWO by:

$$PR_{SWDB}(a) = rawPR(a)$$

$$PR_{SWO}(a) = \sum_{x \in TC(a)} rawPR(x)$$

where TC(a) is the transitive closure of SWOs imported by a.
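A sketch of the weighted ranking described above, assuming rawPR(a) = (1 - d) + d * sum over x in L(a) of rawPR(x) * f(x,a)/f(x), where f(x,a) is the total weight of the links from x to a and f(x) sums f(x,y) over all y that x links to. The category weights in `WEIGHTS` are hypothetical, since the slides do not give them, and the SWO rank assumes TC(a) contains a itself in addition to the SWOs it transitively imports.

```python
def raw_rank(links, weight, d=0.85, iterations=50):
    """rawPR(a) = (1 - d) + d * sum over x in L(a) of rawPR(x) * f(x, a) / f(x).

    links[x][a] lists the categories of the links from x to a;
    weight maps each category to its (hypothetical) weight.
    """
    docs = set(links) | {a for out in links.values() for a in out}
    # f(x, a): total weight of the links from x to a.
    f_xa = {x: {a: sum(weight[c] for c in cats) for a, cats in out.items()}
            for x, out in links.items()}
    # f(x): total weight of all links leaving x, i.e. summed over T(x).
    f_x = {x: sum(out.values()) for x, out in f_xa.items()}
    rank = {doc: 1.0 for doc in docs}
    for _ in range(iterations):
        new_rank = {doc: 1.0 - d for doc in docs}
        for x, out in f_xa.items():
            if f_x[x] == 0:
                continue
            for a, w in out.items():
                new_rank[a] += d * rank[x] * w / f_x[x]
        rank = new_rank
    return rank


def swo_rank(a, raw, imports):
    """PR_SWO(a): sum of rawPR over TC(a); assumes TC(a) also contains a itself."""
    closure, stack = {a}, [a]
    while stack:
        for b in imports.get(stack.pop(), []):
            if b not in closure:
                closure.add(b)
                stack.append(b)
    return sum(raw[x] for x in closure)


WEIGHTS = {"imports": 4.0, "extends": 2.0, "uses-term": 1.0, "asserts": 1.0}  # hypothetical
links = {"db": {"onto1": ["uses-term"], "onto2": ["asserts"]},
         "onto1": {"onto2": ["imports"]}}
raw = raw_rank(links, WEIGHTS)
print(raw["db"], swo_rank("onto1", raw, {"onto1": ["onto2"]}))
```

For the SWDB "db" the final rank is simply its raw rank; the SWO "onto1" also collects the raw rank of "onto2", which it imports.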


The problem of Indexing and Searching SWDs


Significant semantic information is encoded in marked-up documents.


Reasoning over a large collection of documents can be expensive.



Traditional information retrieval techniques


Faster (coarse view of the text).


Can quickly retrieve a set of SWDs based on similarities of the source text alone.



SWDs are not entirely markup.


Search should be applied to both structured and
unstructured components of the document.



We may want SWDs to be available to commonly used search engines.


Documents must be transformed to a form that a
standard IR engine can understand and manipulate.



Well-researched methods exist for ranking matches, computing similarities between documents and employing relevance feedback.





Look at a document as a collection of either tokens or N-Grams.


URIrefs of classes, properties and individuals correspond to words in natural languages.


Apply the following process to an SWD


Reduce it to triples.


Extract URIrefs (with duplicates).


Discard blank nodes (they have no URIrefs).


Hash each URI to a token.


Index the document.
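The five steps above can be sketched as follows, assuming step 1 has already produced (subject, predicate, object) string triples with blank nodes written as "_:id" and literals in double quotes. Hashing with a truncated MD5 digest and the dict-based inverted index are illustrative choices, not necessarily what Swoogle uses; duplicates are kept so that they become term frequencies.

```python
import hashlib
from collections import Counter


def swd_tokens(triples):
    """Map an SWD (already reduced to triples) to a bag of URIref tokens."""
    tokens = Counter()
    for triple in triples:
        for node in triple:
            # Skip blank nodes ("_:id") and literals (quoted); keep only URIrefs.
            if node.startswith("_:") or node.startswith('"'):
                continue
            # Hash each URIref to a short, word-like token (MD5 chosen arbitrarily).
            token = hashlib.md5(node.encode("utf-8")).hexdigest()[:12]
            tokens[token] += 1  # duplicates are kept and become term frequencies
    return tokens


def index_document(index, doc_id, triples):
    """Add the document to a simple inverted index: token -> {doc_id: frequency}."""
    for token, freq in swd_tokens(triples).items():
        index.setdefault(token, {})[doc_id] = freq


RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
EX = "http://example.org/univ#"  # hypothetical base URI
doc = [
    (EX + "cs236703", RDF_TYPE, EX + "Course"),
    (EX + "cs236703", EX + "name", '"Object Oriented Programming"'),
    ("_:b0", RDF_TYPE, EX + "Course"),
]
index = {}
index_document(index, "univ.rdf", doc)
print(len(index))  # 4 distinct URIrefs: cs236703, rdf:type, Course, name
```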




Swoogle indexes by either N-Grams or URIrefs.
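The slide does not say what the N-Grams are taken over. Purely as an assumption, the sketch below uses lowercase character trigrams of a URIref's local name, which lets a term such as "Course" partially match "Courses", whereas URIref tokens match only when the whole URI is identical. Both helper functions are hypothetical.

```python
def char_ngrams(text, n=3):
    """Overlapping, lowercased character n-grams of a string."""
    text = text.lower()
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def local_name(uriref):
    """The part of a URIref after its last '#' or '/'."""
    return uriref.rstrip("/").rsplit("#", 1)[-1].rsplit("/", 1)[-1]


uri = "http://example.org/univ#Course"
print(local_name(uri), char_ngrams(local_name(uri)))
# Course ['cou', 'our', 'urs', 'rse']
```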