Knowledge Systems Course

From data source to Guided Exploration:

a tool stack for Semantic Web navigation


May 2006


Aduna, Jeen Broekstra

jeen.broekstra@aduna-software.com



Time table


[8:45 - 9:00] Introduction
    Aduna
    RDF and the Semantic Web

[9:00 - 9:30] Software stack: Middleware
    Sesame: storage and querying for RDF
    Aperture: retrieving metadata from data

[9:45 - 10:15] Software stack: Presentation
    Spectacle: RDF-based Facet Navigation
    AutoFocus: Cluster Visualization

[10:15 - 10:30] Demo + discussion




About Aduna


Where are we:



Amersfoort, the Netherlands


What do we do:


Develop software for effective navigation
and visualization of large information
sources


Use Semantic Web technology to enable
better search



Aduna and Software


Software Components:


Aperture
    a framework for extracting metadata from various kinds of
    sources (e.g. Word files, e-mail, PDF, images, …)

Sesame
    a toolkit/database for scalable storage and querying of
    RDF, RDFS and OWL

Spectacle
    efficient facet navigation

Cluster Map
    visualization component




RDF in one slide


Data model for expressing knowledge

basic building block: statement

    <person001> <name> "Jeen" .

groups of statements form graphs

[Figure: example RDF graph. Nodes: person001, project001, "Jeen", "Sesame", j.broekstra@tue.nl. Edge labels: name, email, worksIn, name, projectMemberEmail.]
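The statement-and-graph model on this slide can be sketched with plain tuples. This is only an illustration, not how Sesame stores data: the edge placement is read off the slide's labels and is an assumption, and projectMemberEmail is left out because its endpoints cannot be recovered from the slide text. The helper names `outgoing` and `nodes` are made up for the example.

```python
# Each statement is a (subject, predicate, object) tuple.
# Identifiers come from the slide; which node each edge connects is assumed.
triples = [
    ("person001", "name", "Jeen"),
    ("person001", "email", "j.broekstra@tue.nl"),
    ("person001", "worksIn", "project001"),
    ("project001", "name", "Sesame"),
]


def outgoing(graph, subject):
    """Return the (predicate, object) edges that leave a node."""
    return [(p, o) for s, p, o in graph if s == subject]


def nodes(graph):
    """Every subject and every object is a node of the graph."""
    return {s for s, _, _ in graph} | {o for _, _, o in graph}


print(outgoing(triples, "person001"))
print(sorted(nodes(triples)))
```

A group of statements forms a graph simply because statements share nodes: here person001 and project001 each appear in more than one tuple.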



RDF Schema in one more slide


RDF Schema is a Vocabulary Description Language
    it allows specification of domain vocabulary and a way to structure it
    Class, Property, subClassOf, subPropertyOf, domain, range

Formal semantics add simple reasoning capabilities:
    class and property subsumption
    domain and range inference

[Figure: example RDF Schema graph. Nodes: person001, Researcher, Person, name, rdf:Property, rdfs:Class. Edge labels: rdf:type (three times), rdfs:domain, rdfs:subClassOf.]
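The two kinds of inference named on this slide, subsumption and domain inference, can be illustrated with a naive forward-chaining loop. This is a sketch under assumptions, not Sesame's reasoner: it covers only three rules (subClassOf transitivity, type inheritance, rdfs:domain), and the example data assumes that Researcher is a subclass of Person and that name has domain Person, which is how the figure labels most plausibly connect. The second person, person002, is invented for the example.

```python
def rdfs_closure(triples):
    """Apply three RDFS rules repeatedly until no new statement is derived."""
    inferred = set(triples)
    while True:
        new = set()
        for s, p, o in inferred:
            if p == "rdfs:subClassOf":
                # Rule 1: subClassOf is transitive.
                new |= {(s, p, o2) for s2, p2, o2 in inferred
                        if p2 == "rdfs:subClassOf" and s2 == o}
                # Rule 2: an instance of a subclass is an instance of the superclass.
                new |= {(x, "rdf:type", o) for x, p2, c in inferred
                        if p2 == "rdf:type" and c == s}
            elif p == "rdfs:domain":
                # Rule 3: using property s on a subject implies the subject has type o.
                new |= {(x, "rdf:type", o) for x, p2, _ in inferred if p2 == s}
        if new <= inferred:
            return inferred
        inferred |= new


schema_and_data = {
    ("Researcher", "rdfs:subClassOf", "Person"),   # assumed from the figure
    ("name", "rdfs:domain", "Person"),             # assumed from the figure
    ("person001", "rdf:type", "Researcher"),
    ("person002", "name", "Lora"),                 # hypothetical extra instance
}
closure = rdfs_closure(schema_and_data)
print(("person001", "rdf:type", "Person") in closure)
print(("person002", "rdf:type", "Person") in closure)
```

Neither of the two printed statements is in the input; the first follows from class subsumption and the second from domain inference.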



The tool stack

[Figure: the tool stack, with layers labeled presentation and middleware. Components shown: Aperture (metadata extraction) and Sesame (metadata storage and reasoning).]



Aperture



What is Aperture?


Aperture is a Java framework for
extracting and querying full-text content
and metadata from various information
systems (e.g. file systems, web sites,
mail boxes) and the file formats (e.g.
documents, images) occurring in these
systems.


Open Source project by Aduna and DFKI:

http://aperture.sourceforge.net/



Aperture Features


Crawl information systems such as file
systems, websites, mail boxes and mail servers


Extract full-text and metadata from many
common file formats


View files in their native applications


Ease of use: easy to learn, easy to code, easy
to deploy in industrial projects


Flexible architecture: can be extended with
custom file formats, data sources, etc., with
support for deployment on OSGi platforms


Data exchange based on Semantic Web
standards (e.g. RDF, SPARQL, ...)



Supported File Formats


Plain text


HTML, XHTML


XML


PDF (Portable Document Format)


RTF (Rich Text Format)


Microsoft Office: Word, Excel, PowerPoint, Visio, Publisher


Microsoft Works


OpenOffice 1.x: Writer, Calc, Impress, Draw


StarOffice 6.x - 7.x+: Writer, Calc, Impress, Draw


OpenDocument (OpenOffice 2.x, StarOffice 8.x)


Corel WordPerfect, Quattro, Presentations


Emails (.eml files)




The Sesame Framework



What is Sesame?


A framework for storage, querying and inferencing
of RDF and RDF Schema

A Java Library for handling RDF

A Database Server for (remote) access
to repositories of RDF data


Open Source project by Aduna

http://www.openRDF.org/







Sesame features


Light-weight yet powerful Java API

Highly expressive query and transformation languages
    SeRQL, SPARQL

High scalability (O(10^7) RDF triples on desktop hardware)

Various backends
    Native Store
    RDBMS (MySQL, Oracle 10, DB2, PostgreSQL)
    main memory

Reasoning support
    RDF Schema reasoner
    OWL DLP (OWLIM)
    domain reasoning (custom rule engine)

Rio Toolkit: parsers and writers for different RDF syntaxes:
    RDF/XML, Turtle, N3, N-Triples, TriX



Sesame 2 architecture

[Figure: Sesame 2 architecture. Components shown: RDF Model, Rio, SAIL API, SAIL Query Model (SeRQL, SPARQL), Repository Access API, HTTP Server. Applications use the stack either directly or over the HTTP / SPARQL protocol.]





Sesame 2 architecture

[Figure: Sesame 2 architecture, annotated. The component labels and their annotations are listed below.]

RDF Model
    The core RDF model, containing objects and interfaces for URIs,
    blank nodes, literals, statements.

Rio
    RDF I/O
    Set of parsers and writers for RDF/XML, Turtle, N3, N-Triples.
    Can be used separately.

SAIL API
    Storage And Inference Layer
    System API for 'wrapping' storage backend

SAIL Query Model (SeRQL, SPARQL)
    Declarative Querying and other 'higher-level' functions on SAILs

Repository Access API
    Main Access API of Sesame
    Offers developer-friendly methods for manipulating RDF data
    (query, adding, removing, updating)

application (local)
    Local apps can just include (parts of) Sesame as a Java library and
    use it to process RDF data efficiently.

HTTP Server
    Allows deployment of Sesame as a web-enabled database
    server (e.g. in Tomcat).
    Implements a superset of SPARQL protocol (HTTP REST)

application (remote), HTTP / SPARQL protocol
    Remote apps can communicate over the Web with a Sesame server and
    update data or do queries



The SAIL API



Storage And Inferencing Layer

Abstraction from physical storage
    allows other Sesame components to function
    on any type of store
    can be used as a wrapper layer for a
    particular data source

System Internal API
    application developers typically do not use it
    directly
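The idea of an abstraction layer that both hides the physical store and lets an inferencer wrap any store can be sketched as follows. This is not the SAIL API: the class and method names (`Store`, `MemoryStore`, `SubClassInferencer`, `add`, `match`) are hypothetical, and the inferencer handles only one-step subclass subsumption to keep the example short.

```python
from abc import ABC, abstractmethod


class Store(ABC):
    """Hypothetical storage-layer interface; a stand-in, not the real SAIL API."""

    @abstractmethod
    def add(self, s, p, o):
        """Add one statement."""

    @abstractmethod
    def match(self, s=None, p=None, o=None):
        """Yield statements matching the pattern; None acts as a wildcard."""


def _matches(pattern, statement):
    return all(want is None or want == got for want, got in zip(pattern, statement))


class MemoryStore(Store):
    """One possible physical store: a set held in main memory."""

    def __init__(self):
        self._statements = set()

    def add(self, s, p, o):
        self._statements.add((s, p, o))

    def match(self, s=None, p=None, o=None):
        return (t for t in self._statements if _matches((s, p, o), t))


class SubClassInferencer(Store):
    """Wraps any Store and adds inferred rdf:type statements when reading."""

    def __init__(self, base):
        self._base = base

    def add(self, s, p, o):
        self._base.add(s, p, o)

    def match(self, s=None, p=None, o=None):
        statements = set(self._base.match())
        supers = {(c, d) for c, q, d in statements if q == "rdfs:subClassOf"}
        # One-step subsumption only: instances of c are also instances of d.
        statements |= {(x, "rdf:type", d)
                       for x, q, c in statements if q == "rdf:type"
                       for c2, d in supers if c2 == c}
        return (t for t in statements if _matches((s, p, o), t))


store = SubClassInferencer(MemoryStore())
store.add("Researcher", "rdfs:subClassOf", "Person")
store.add("person001", "rdf:type", "Researcher")
types = {o for _, _, o in store.match(s="person001", p="rdf:type")}
print(sorted(types))
```

Code that calls `match` does not need to know whether it talks to the memory store or to the wrapped one, which is the point of putting the abstraction below the query components.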




The Repository Access API


A single Java object representation for a
Sesame database, offering methods for


evaluating a query and retrieving the result


adding RDF data from local file, from the
web, as a text string, etc.


adding/removing (sets of) RDF statements


starting/stopping transactions
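The list above describes a single object that stands for the whole database. A minimal sketch of such a facade is shown below. All names (`Repository`, `begin`, `commit`, `rollback`, `add`, `remove`, `query`) are hypothetical and chosen for the example; they are not Sesame's actual method names, and the "query" here is only a wildcard pattern match rather than SeRQL or SPARQL evaluation.

```python
class Repository:
    """Hypothetical single-object facade over a set of RDF statements."""

    def __init__(self):
        self._statements = set()
        self._pending = None  # working copy while a transaction is open

    def begin(self):
        """Start a transaction by staging a copy of the current statements."""
        self._pending = set(self._statements)

    def commit(self):
        """Make the staged changes permanent."""
        if self._pending is None:
            raise RuntimeError("no transaction in progress")
        self._statements, self._pending = self._pending, None

    def rollback(self):
        """Discard the staged changes."""
        self._pending = None

    def _target(self):
        return self._statements if self._pending is None else self._pending

    def add(self, *statements):
        """Add one or more (subject, predicate, object) statements."""
        self._target().update(statements)

    def remove(self, *statements):
        """Remove one or more statements; missing ones are ignored."""
        self._target().difference_update(statements)

    def query(self, s=None, p=None, o=None):
        """Return matching statements, sorted; None acts as a wildcard."""
        pattern = (s, p, o)
        return sorted(t for t in self._target()
                      if all(w is None or w == g for w, g in zip(pattern, t)))


repo = Repository()
repo.add(("person001", "name", "Jeen"))
repo.begin()
repo.add(("person002", "name", "Lora"))
repo.rollback()  # the second statement is discarded
print(repo.query(p="name"))
```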




Querying RDF


RDF is a labeled, directed graph of
semistructured data


no rigid schema


An RDF query language needs to be able
to address this:


graph path expressions


dealing with semistructured nature of RDF


flexible querying of both data and schema



SeRQL


Language proposal based on best practices


Redesign of RQL to make it easier to use,
incorporating ideas from many other query
languages


Developed in the Sesame project


Expressive language, but still fairly easy to
use


Support for RDF Schema


Implementation: Sesame



SeRQL path expressions


{X} geo:hasCapital {geo:Amsterdam}


{X} geo:hasCapital {Y}


{X} P {Y}

[Figure: example graph. Netherlands hasCapital Amsterdam; Amsterdam areacode "020".]
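The three path expressions on this slide differ only in which positions are fixed and which are variables. The following sketch matches one such pattern against a small graph. It is an illustration, not SeRQL's evaluator: variables are marked with a leading "?" purely as a convention of this example (SeRQL itself writes them as bare names such as X), and the function name `match_pattern` is made up.

```python
geo = [
    ("geo:Netherlands", "geo:hasCapital", "geo:Amsterdam"),
    ("geo:Amsterdam", "geo:areacode", "020"),
]


def match_pattern(graph, subj, pred, obj):
    """Match one {subj} pred {obj} pattern; terms starting with '?' are variables."""
    results = []
    for triple in graph:
        binding = {}
        for term, value in zip((subj, pred, obj), triple):
            if term.startswith("?"):
                # A variable binds on first use and must agree on reuse.
                if binding.setdefault(term, value) != value:
                    break
            elif term != value:
                break
        else:
            results.append(binding)
    return results


# {X} geo:hasCapital {geo:Amsterdam}
print(match_pattern(geo, "?X", "geo:hasCapital", "geo:Amsterdam"))
# {X} geo:hasCapital {Y}
print(match_pattern(geo, "?X", "geo:hasCapital", "?Y"))
# {X} P {Y}: every statement matches
print(match_pattern(geo, "?X", "?P", "?Y"))
```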



Chaining, branching and comparing

Chaining:

    {X} geo:hasCapital {Y} geo:areacode {Z}

Branching:

    {X} rdf:type {Y};
        geo:areacode {Z}

Comparison operators:

    String comparison:
        X like "*Netherlands"
        Y like "A*"

    boolean comparison:
        X < Y, X <= Y, Z < 20, Z = Y, etc.

[Figure: example graph. Netherlands hasCapital Amsterdam; Amsterdam areacode "020".]
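Chaining and string comparison can be sketched in a few lines. A chain is a join in which the object of the first step is the subject of the second; `like` is approximated here with Python's `fnmatchcase`, which is an assumption of the sketch: it treats "*" as a wildcard as the slide does, but it also gives "?" and "[...]" special meaning, which may not match SeRQL. The function names are made up.

```python
from fnmatch import fnmatchcase

graph = [
    ("Netherlands", "hasCapital", "Amsterdam"),
    ("Amsterdam", "areacode", "020"),
]


def chain(statements, first_pred, second_pred):
    """Evaluate {X} first_pred {Y} second_pred {Z} as a join on Y."""
    return [(x, y, z)
            for x, p1, y in statements if p1 == first_pred
            for y2, p2, z in statements if p2 == second_pred and y2 == y]


def like(value, pattern):
    """Case-sensitive wildcard comparison; '*' matches any run of characters."""
    return fnmatchcase(value, pattern)


rows = chain(graph, "hasCapital", "areacode")
print(rows)
# Keep only the rows whose capital starts with "A".
print([row for row in rows if like(row[1], "A*")])
```

Adding a projection step on top (keep X and Y, drop Z) turns this into the select-from-where shape SeRQL uses.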



SeRQL query composition


Using the building blocks, we can compose complex
queries.


SeRQL uses a select-from-where syntax

    SELECT X, Y
    FROM {X} geo:hasCapital {Y} geo:areacode {Z}
    WHERE Z like "020"
    USING NAMESPACE
        geo = <http://www.geography.org/schema.rdf#>



Optional path expressions


RDF is semi-structured

Even when the schema says some object
should have a particular property, it may not
always be present in the data:

    Persons have names and email addresses, but
    Lora is a person without a known email address

[Figure: example graph. person001 has type Person, name "Jeen" and email j.broekstra@tue.nl; person002 has type Person and name "Lora", with no email edge.]



Optional path expressions (2)


To be able to query for all persons, their
first names, and, if known, their email
address, SeRQL introduces optional path
expressions:

    SELECT Person, Name, Email
    FROM {Person} my:name {Name};
         [my:email {Email}]
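The effect of the optional part of this query is that a person without an email still produces a row, with the Email column left unbound. A sketch of that behaviour, using None for the unbound value, is given below. It simplifies in one respect that should be stated: at most one email per person is kept, whereas a real query would return one row per email. The function name is made up.

```python
people = [
    ("person001", "my:name", "Jeen"),
    ("person001", "my:email", "j.broekstra@tue.nl"),
    ("person002", "my:name", "Lora"),
]


def persons_with_optional_email(graph):
    """Return (person, name, email) rows; email is None when it is not in the data."""
    # Simplification: keeps at most one email per person.
    emails = {s: o for s, p, o in graph if p == "my:email"}
    rows = [(s, o, emails.get(s)) for s, p, o in graph if p == "my:name"]
    return sorted(rows, key=lambda row: row[0])


for row in persons_with_optional_email(people):
    print(row)
```

Without the optional brackets, the equivalent of this function would drop person002 entirely, because the my:email step would fail to match.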



CONSTRUCT queries


CONSTRUCT-queries return RDF statements
    each RDF statement matching the query pattern
    is returned

The query result is
    a subgraph of the original graph, or
    a transformed graph

This mechanism also allows formulation of
simple rules



SeRQL construct-queries

Subgraph query:

    CONSTRUCT *
    FROM {X} geo:hasCapital {Y}

    Result: Netherlands hasCapital Amsterdam

Transformation query:

    CONSTRUCT {Y} my:inCountry {X}
    FROM {X} geo:hasCapital {Y}

    Result: Amsterdam inCountry Netherlands
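Both kinds of construct-query can be mimicked with set comprehensions over tuples, which also shows why the transformation form works as a simple rule: for every match of the pattern, a new statement is emitted. The function names are made up, and the sketch hard-codes the two queries from this slide rather than parsing SeRQL.

```python
graph = {
    ("Netherlands", "geo:hasCapital", "Amsterdam"),
    ("Amsterdam", "geo:areacode", "020"),
}


def construct_subgraph(statements):
    """CONSTRUCT * FROM {X} geo:hasCapital {Y}: return the matching statements as they are."""
    return {(x, p, y) for x, p, y in statements if p == "geo:hasCapital"}


def construct_in_country(statements):
    """CONSTRUCT {Y} my:inCountry {X} FROM {X} geo:hasCapital {Y}: emit reversed statements."""
    return {(y, "my:inCountry", x) for x, p, y in statements if p == "geo:hasCapital"}


print(construct_subgraph(graph))
print(construct_in_country(graph))
```

The first result is a subset of the input graph; the second contains a statement that did not exist before, which is the transformation case.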



SeRQL vs. SPARQL


Both: expressive query and transformation language
    SELECT and CONSTRUCT
    optional path expressions
    support for context/named graphs

SeRQL ("circle")
    nested queries, language tags, …
    user-friendly syntax (but YMMV)
    very efficient Sesame implementation

SPARQL ("sparkle")
    W3C Standard (in progress)
    tool interoperability: Jena, Redland, 3Store, Sesame, …





SeRQL vs. SPARQL example

SPARQL:

    PREFIX geo: <http://www.geography.org/schema.rdf#>

    SELECT ?x ?y
    WHERE {
        ?x geo:hasCapital ?y .
        ?y geo:areacode ?z .
        FILTER (?z = "020")
    }

SeRQL:

    SELECT X, Y
    FROM {X} geo:hasCapital {Y} geo:areacode {Z}
    WHERE Z like "020"
    USING NAMESPACE
        geo = <http://www.geography.org/schema.rdf#>



Presentation

How to navigate ontology-based information



An ontology is not enough


End users do not necessarily think in the
same terms in which an ontology is
modeled


Search and navigation tools need to
provide user-oriented access
to the information


views


multiple access paths


recognizable options


quick results



Navigation problems 1


Too many links or categories


overwhelming offer


Deep hierarchies


information remains hidden



Examples



Navigation problems 2


Query overspecification


zero results!


Query underspecification


millions of hits!



Examples



Faceted navigation 1


Facet = meta-data element

    e.g. 'author', 'title', 'date', 'type'


Facets have values


e.g. 'author is J. Brown'


In collections facet values are related


e.g. author 'J. Brown' is connected to title
'Once upon a time ...'


Faceted navigation = choose a facet value
and see all related facets and values
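The definition above can be made concrete with a small collection. Choosing a facet value narrows the collection, and the remaining facets are then recounted over what is left, so the user only ever sees options that lead to at least one result. The sketch below is an illustration only: apart from the 'J. Brown' and 'Once upon a time ...' values taken from the slide, the records are made up, and `facet_counts` and `select` are hypothetical helper names rather than Spectacle functions.

```python
from collections import Counter

# Illustrative records; only the first author/title pair comes from the slide.
documents = [
    {"author": "J. Brown", "title": "Once upon a time ...", "type": "PDF"},
    {"author": "J. Brown", "title": "Field notes", "type": "HTML"},
    {"author": "A. Smith", "title": "Annual report", "type": "PDF"},
]


def facet_counts(items, facet):
    """Count how many items carry each value of the given facet."""
    return Counter(item[facet] for item in items if facet in item)


def select(items, facet, value):
    """Narrow the collection to the items with the chosen facet value."""
    return [item for item in items if item.get(facet) == value]


print(facet_counts(documents, "type"))
by_brown = select(documents, "author", "J. Brown")
# The related facets are recounted over the narrowed collection.
print(facet_counts(by_brown, "type"))
print(sorted(facet_counts(by_brown, "title")))
```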



Faceted navigation 2


Problem solved
    user has problems specifying query
    over- and underspecification

Solution
    showing all options
    give ways to drill down the information

Applied
    database selection (e.g. job sites), e-commerce
    (e.g. travel), enhancement of (full text) search



Example of faceted navigation

[Screenshot with annotations:]

Facet: Type

Facet values: Adobe AD, HTML Document, XML Document

Nr. of instances per facet value



Facets are Data Views


Each navigation facet is driven by a
SeRQL query on the underlying Sesame
repository


SeRQL queries can retrieve and
transform the data to provide a facet
‘view’


Spectacle uses the query results to
populate the facet with values



Information visualization 1


Types


Model visualization


Instance visualization


Examples


Hyperbolic tree, InXight


Graph visualisation, AquaBrowser


Claim of visualization: show things that
you can't (easily) express in words or lists



Information visualization 2


Cluster Map = instance visualization


visualization of the search results


instances can be things like files, jobs, and
people


Map shows AND, OR and NOT of query arguments
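One way to see how a single picture can show AND, OR and NOT at once is to group the instances by the exact set of query arguments they match. The sketch below does only that grouping, not any drawing, and is not Aduna's implementation; the function name `cluster` and the sample file names are made up.

```python
from collections import defaultdict


def cluster(instances_by_term):
    """Group instances by the exact set of query terms they match."""
    terms_of = defaultdict(set)
    for term, instances in instances_by_term.items():
        for instance in instances:
            terms_of[instance].add(term)
    clusters = defaultdict(set)
    for instance, terms in terms_of.items():
        clusters[frozenset(terms)].add(instance)
    return dict(clusters)


# Hypothetical search results for two query arguments.
hits = {
    "rdf": {"a.pdf", "b.html"},
    "sesame": {"b.html", "c.eml"},
}
clusters = cluster(hits)
for terms, members in clusters.items():
    print(sorted(terms), sorted(members))
```

The cluster keyed by both terms is the AND, a cluster keyed by one term alone is that term AND NOT the other, and the union of all clusters is the OR.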



Cluster Map examples



Aduna AutoFocus

[Screenshot with callouts:]

    Relations shown in a Cluster Map
    Combined full text search in documents, websites and e-mail
    Automatically generated suggestions help to refine the question
    Support for multiple data sources: documents, websites, e-mail boxes

AutoFocus helps you to explore data sources like files, websites and
e-mail with Guided Exploration.

AutoFocus scans data sources and automatically makes suggestions after
you enter a search term. So if you are not completely sure what to
look for, AutoFocus will help you with suggestions for refinement.

In addition, you no longer have to store or search for information in
complex directory hierarchies. AutoFocus will retrieve it anyway.



Aduna Spectacle

[Screenshot with callouts:]

    Visitors find what they want without negative feedback like 'zero results'
    Navigation on multiple facets of information collections
    Use of information increases with faceted navigation
    Easy to implement on top of your existing information sources

Aduna Spectacle helps website visitors to find what they want with
Guided Exploration.

Aduna Spectacle supports faceted navigation. Users drill down step by
step, making choices on multiple meta-data facets.

Spectacle overcomes problems related to over- and underspecification.
The user gets the right answer.



Pointers


Aduna
    http://aduna-software.com/

AutoFocus
    http://aduna-software.com/products/autofocus/

Spectacle
    http://aduna-software.com/products/spectacle

Sesame
    http://www.openrdf.org/

Aperture
    http://aperture.sourceforge.net/



Demo & Discussion Time