Compass


Semantic search






www.ovitas.no


12.10.2006

TMRA '06


Basics



Knowledge model based information
retrieval


Fulltext search enhanced with Topic
Maps = Semantic search


Search driven navigation



Search technologies

[Diagram: search technologies plotted by level of precision ("intelligence") against data volume (domain size). Labelled regions: full-text search, conceptual search, semantic search, Compass.]


Given...


a web site with a lot of text,


which is unstructured (no markup, no
tags),


a controlled domain (we know what the
discourse domain is), and


an inadequate search engine...


We would like to...


get relevant hits within a meaningful
context,


avoid the work of structuring our data,


add semantics to the content by defining
a knowledge model.


Compass-bowl:

Take a fulltext search engine.

Take a Topic Maps engine.

Add a hint of semantics.

Define the correct processes for
orchestrating the components.

Mix them thoroughly.

Serve to public!


Full text search engine


Apache Lucene (open source)


Possible to index most file formats


html, asp, php, jsp, pdf, rtf, txt, doc, ppt, xls, pst…


The index is independent of the model


No need to re-index when changes are made to the model


Small index size


typically less than 10% of the size of the data


Fast index lookup


less than 20 ms for index size >20000


The knowledge model


Based on the ISO International Standard
for Topic Maps


Semantic model of the discourse domain


Concept words = topic names/synonyms


Semantic relationships through
associations


Compass Weight defines “closeness”
between topics


property on association types


Example

[Diagram: topic Ovitas linked to topic Christopher by an association of type hasEmployee (CW=0.7), and to topic Compass by an association of type hasProduct (CW=0.8).]
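
The example can also be written down as data. What follows is a minimal sketch in Python, assuming a hypothetical in-memory representation: the class names `AssociationType` and `Association`, the helper `neighbours`, the 0 to 1 range of CW, and the two-way traversal of associations are all illustrative assumptions, not TMCore's API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AssociationType:
    """An association type carrying a Compass Weight (CW), assumed to lie in [0, 1]."""
    name: str
    cw: float


@dataclass(frozen=True)
class Association:
    """A typed link between two topics, identified here by topic name."""
    source: str
    target: str
    type: AssociationType


# Values taken from the example slide.
HAS_EMPLOYEE = AssociationType("hasEmployee", 0.7)
HAS_PRODUCT = AssociationType("hasProduct", 0.8)

ASSOCIATIONS = [
    Association("Ovitas", "Christopher", HAS_EMPLOYEE),
    Association("Ovitas", "Compass", HAS_PRODUCT),
]


def neighbours(topic: str) -> dict[str, float]:
    """Return topics directly associated with `topic`, mapped to the CW of the link.

    Associations are treated as traversable in both directions (an assumption).
    """
    result: dict[str, float] = {}
    for a in ASSOCIATIONS:
        if a.source == topic:
            result[a.target] = a.type.cw
        elif a.target == topic:
            result[a.source] = a.type.cw
    return result
```

Because the weight sits on the association type rather than on each individual association, changing the CW of `hasEmployee` would change the closeness of every employee link at once.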


Compass orchestrator


Guides the processes of the search:

1. Search for the term in the topic map
2. Expand the map to relevant/related topics
3. Send all these terms off to a fulltext search
4. Calculate relevance (based on the combination of CW and Lucene weights) and prepare the result list as an XML instance
5. Render the XML as desired
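
The steps of the orchestrator can be sketched as follows. This is a minimal illustration, not the Compass implementation: the topic map is a plain dictionary, `fulltext_search` is a hypothetical stand-in for a Lucene query, the document URLs and scores are made up, and multiplying CW by the full-text score is an assumed combination rule (the slides do not give the formula).

```python
import xml.etree.ElementTree as ET

# Hypothetical topic map: topic -> {related topic: Compass Weight}.
TOPIC_MAP = {
    "Ovitas": {"Christopher": 0.7, "Compass": 0.8},
    "Christopher": {"Ovitas": 0.7},
    "Compass": {"Ovitas": 0.8},
}

# Hypothetical stand-in for the full-text index: term -> {document URL: score}.
FULLTEXT = {
    "Ovitas": {"http://example.org/about": 0.9},
    "Compass": {"http://example.org/products": 0.5},
}


def expand(term, hops):
    """Steps 1-2: find the term in the topic map and collect related topics.

    The weight of a topic reached over several hops is assumed to be the
    product of the CWs along the path; the strongest path wins.
    """
    if term not in TOPIC_MAP:
        return {}
    weights = {term: 1.0}
    frontier = {term: 1.0}
    for _ in range(hops):
        next_frontier = {}
        for topic, weight in frontier.items():
            for related, cw in TOPIC_MAP.get(topic, {}).items():
                candidate = weight * cw
                if candidate > weights.get(related, 0.0):
                    weights[related] = candidate
                    next_frontier[related] = candidate
        frontier = next_frontier
    return weights


def fulltext_search(term):
    """Step 3: placeholder for the full-text engine query."""
    return FULLTEXT.get(term, {})


def search(term, hops=1):
    """Steps 3-4: query every expanded term, rank the hits, build the XML result."""
    root = ET.Element("results", query=term)
    hits = []
    for topic, weight in expand(term, hops).items():
        for url, score in fulltext_search(topic).items():
            hits.append((weight * score, topic, url))
    for relevance, topic, url in sorted(hits, reverse=True):
        ET.SubElement(root, "hit", topic=topic, url=url,
                      relevance=f"{relevance:.2f}")
    return ET.tostring(root, encoding="unicode")
```

In this toy data, `search("Christopher", hops=2)` finds no document containing the search term itself, yet still returns hits grouped under the related topics Ovitas and Compass. Step 5 (rendering) is left to whatever consumes the XML.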

[Screenshots; callout labels:]

Topic Map expansion

Search term

Hits in the fulltext grouped by the related topics

Relevant documents ranked by the weighting result

Search term in the topic map, but not in the text

Relevant information about “Chris Searle”

Synonym search


Creating/maintaining the model


An MS Excel plug-in serves as the topic
map editor


Can be put under version control


Import the model into the topic map
engine: one click only


For complex topic maps a custom user
interface can be used to enter instance
data


Navigation


Navigation through the associations
between topics


Navigation by search


User configurations


What pages to index


What topic map to use


The number of hops to perform


The threshold for relevance
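
As an illustration of how these four settings might be held together, here is a minimal sketch. The class `CompassConfig`, its field names, the helper `apply_threshold`, and the assumption that relevance lies between 0 and 1 are all hypothetical; the slides do not describe the actual configuration format.

```python
from dataclasses import dataclass, field


@dataclass
class CompassConfig:
    """Hypothetical container for the four user settings."""
    pages_to_index: list[str] = field(default_factory=list)  # URLs to index
    topic_map: str = ""                # identifier of the topic map to use
    hops: int = 1                      # how many associations to follow
    relevance_threshold: float = 0.0   # hits scoring below this are dropped

    def __post_init__(self):
        if self.hops < 0:
            raise ValueError("hops must be non-negative")
        if not 0.0 <= self.relevance_threshold <= 1.0:
            raise ValueError("relevance_threshold must be in [0, 1]")


def apply_threshold(hits, config):
    """Keep only (url, relevance) pairs at or above the configured threshold."""
    return [(url, rel) for url, rel in hits if rel >= config.relevance_threshold]
```

More hops widen the set of related topics that are searched, while a higher threshold trims weakly related hits from the result list, so the two settings pull in opposite directions.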


Content lifecycle management


Easy to integrate with content
repositories


A content management or publishing
system can send a request to the indexer
to re-index a particular resource


Incremental indexing: add, update or
delete documents


HTTP is used as the basic mechanism to
address content
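
The add/update/delete cycle can be illustrated with a toy inverted index keyed by document URL. This is a sketch under stated assumptions: `IncrementalIndex` is a hypothetical class, not Lucene's API, tokenization is a plain whitespace split, and the HTTP request that would trigger each operation is left out.

```python
from collections import defaultdict


class IncrementalIndex:
    """Toy inverted index keyed by document URL (not Lucene's API)."""

    def __init__(self):
        self._postings = defaultdict(set)   # term -> set of URLs
        self._terms_by_url = {}             # URL -> set of indexed terms

    def delete(self, url):
        """Remove a document; unknown URLs are ignored."""
        for term in self._terms_by_url.pop(url, set()):
            self._postings[term].discard(url)
            if not self._postings[term]:
                del self._postings[term]

    def add(self, url, text):
        """Index a document, replacing any earlier version at the same URL."""
        self.delete(url)
        terms = set(text.lower().split())
        self._terms_by_url[url] = terms
        for term in terms:
            self._postings[term].add(url)

    update = add  # an update is a delete followed by an add

    def lookup(self, term):
        """Return the URLs of documents containing the term."""
        return set(self._postings.get(term.lower(), set()))
```

Because each operation touches only the postings of one URL, a publishing system can ask for a single resource to be re-indexed without rebuilding the whole index.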


Architecture


SOA (service oriented architecture), no
dependency on platform or components


Web service interface (HTTP/REST)


.NET platform


Integrated components:


TMCore: Topic Maps engine by NetworkedPlanet


Apache Lucene: full-text engine


[Architecture diagram. Labelled elements: TM Core, Full Text, Excel Editor, Compass Service, TM, Nav, TM editor person, User, Publishing System Services.]


Compass: Summary


Semantic search based on Topic Maps


Search in any document format


Organize information in a topic-oriented
manner


Link to relevant information without touching
the data content


Conceptual navigation by Topic Maps


Tools for maintaining/evolving the
classification


Fast and easy implementation