DCMI Workshop on Metadata and Search


15 Nov 2013


Vendor Panel Presentation

Bradley P. Allen

ballen@siderean.com

http://www.siderean.com


Copyright © 2003 Siderean Software LLC. All rights reserved.

Overview


Our perspective is that of a Semantic
Web application vendor


Our belief is that faceted search will be
the first killer application of the
Semantic Web


Our goal is to show how this is possible
and what the benefits are


But first, some general statements…



Tools that leverage Dublin Core


Do supportable tools exist that take
advantage of Dublin Core and other
metadata standards to enhance search
results?


Yes, our work is a case in point


Also relevant:


Weblog CMS


RSS aggregators


Other RDF applications


What's missing?




What do people need to be able to do
to actually use metadata effectively on
their intranets?


Start using what’s out there


Data in relational tables


CMS-generated metadata


A lot of metadata is lying around
unexploited


Are Dublin Core guidelines sufficient?


What additional specifications are needed?


None: DC is an excellent minimal vocabulary that
has achieved broad acceptance


What we need are best practices, e.g.:


Encouraging resource values over literal
values for DC attributes as good style


dc:subject using controlled vocabularies


dc:creator using authority records


dc:date using temporal hierarchies


Implementing DCMI validation services
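
The "resource values over literal values" practice can be illustrated with a minimal sketch. The document URIs, the concept URI and the subject strings below are hypothetical; only the Dublin Core element namespace is real. The sketch assumes grouping is done on the exact value of dc:subject.

```python
# Dublin Core element set namespace (real); everything under example.org is hypothetical.
DC = "http://purl.org/dc/elements/1.1/"

# Literal style: free-text subjects that differ only in case and spacing.
literal_style = [
    ("http://example.org/doc/1", DC + "subject", "Semantic web"),
    ("http://example.org/doc/2", DC + "subject", "semantic Web "),
]

# Resource style: both documents point at one controlled-vocabulary concept.
resource_style = [
    ("http://example.org/doc/1", DC + "subject", "http://example.org/cv/semantic-web"),
    ("http://example.org/doc/2", DC + "subject", "http://example.org/cv/semantic-web"),
]


def group_by_subject_value(statements):
    """Group document URIs by the exact value of dc:subject."""
    groups = {}
    for doc, predicate, value in statements:
        if predicate == DC + "subject":
            groups.setdefault(value, []).append(doc)
    return groups


literal_groups = group_by_subject_value(literal_style)
resource_groups = group_by_subject_value(resource_style)
print(len(literal_groups))   # free-text variants fall into separate groups
print(len(resource_groups))  # a shared concept URI yields a single group
```

With literals, a search application has to normalize strings before it can group documents; with resource values the grouping key is already unambiguous.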


Is XML the primary coding language?


Is it being used for Dublin Core and
other metadata applications?


Yes, for all the right reasons


Open standards


Leverage of existing tools


What other encoding methods are being
used?


RDF/N3 for some RDF-based applications


Our application: Seamark


A navigation engine built on three key ideas


Metadata represented in Resource Description
Framework (RDF) is aggregated from existing
enterprise content and data


Faceted metadata retrieval turns the RDF
into a navigation web service


Web services make navigation applications
easy to install and integrate with existing
Web applications
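
As a rough illustration of faceted metadata retrieval over RDF, the sketch below treats metadata as (subject, predicate, object) triples and computes, for a set of selected facet values, the matching items and the value counts a navigation interface could display. It is not Seamark's implementation; the triples, the prefixes and the function name `facet_counts` are assumptions made for the example.

```python
from collections import Counter, defaultdict

# Hypothetical triples (subject, predicate, object); prefixes are shorthand only.
TRIPLES = [
    ("doc:1", "dc:creator", "person:allen"),
    ("doc:1", "dc:subject", "cv:search"),
    ("doc:2", "dc:creator", "person:allen"),
    ("doc:2", "dc:subject", "cv:rdf"),
    ("doc:3", "dc:creator", "person:powers"),
    ("doc:3", "dc:subject", "cv:rdf"),
]


def facet_counts(triples, selections):
    """Return (matching items, per-facet value counts) for the selected (facet, value) pairs."""
    by_item = defaultdict(set)
    for s, p, o in triples:
        by_item[s].add((p, o))
    # An item matches when it carries every selected (facet, value) pair.
    matches = sorted(s for s, pairs in by_item.items() if selections <= pairs)
    counts = defaultdict(Counter)
    for s in matches:
        for p, o in by_item[s]:
            counts[p][o] += 1
    return matches, counts


matches, counts = facet_counts(TRIPLES, {("dc:subject", "cv:rdf")})
print(matches)
print(dict(counts["dc:creator"]))
```

Passing an empty set of selections returns every item, which corresponds to the initial, unrefined view of a collection.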


Faceted search and RDF: why?


Enabling more effective retrieval is a major goal for
the Semantic Web


RDF is a superb foundation for faceted search


RDF as an open standard for metadata exchange


RDF Schema as a framework for defining facets


The Semantic Web will enable faceted search to
become pervasive


Widespread sharing and reuse of ontologies, vocabularies
and DC instance data becomes possible


The blogosphere as an existence proof


“View Source” for the Semantic Web


Seamark, Dublin Core, and CVs


Enables Dublin Core


Using RDF encodings of DC


Handles controlled vocabularies


Using emerging RDF-based standards like
TIF(S)


Supports building and maintaining controlled
vocabularies


Concepts and terms represented as resources
and encoded in RDF in the same way as other
content


Therefore the same tools apply
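
One way to picture "the same tools apply": if vocabulary concepts are stored as triples alongside content metadata, a hierarchical lookup is just another traversal of the same data. In the sketch below the property name `cv:broader` and all concept and document identifiers are hypothetical stand-ins, not terms taken from any particular vocabulary standard.

```python
# Hypothetical data: vocabulary structure and content metadata in one triple list.
TRIPLES = [
    ("cv:photographs", "cv:broader", "cv:pictures"),
    ("cv:daguerreotypes", "cv:broader", "cv:photographs"),
    ("cv:drawings", "cv:broader", "cv:pictures"),
    ("doc:1", "dc:subject", "cv:daguerreotypes"),
    ("doc:2", "dc:subject", "cv:drawings"),
    ("doc:3", "dc:subject", "cv:photographs"),
]


def narrower_closure(triples, concept):
    """Return the concept plus every concept transitively narrower than it."""
    found = {concept}
    frontier = [concept]
    while frontier:
        current = frontier.pop()
        for s, p, o in triples:
            if p == "cv:broader" and o == current and s not in found:
                found.add(s)
                frontier.append(s)
    return found


def items_about(triples, concept):
    """Return items whose dc:subject is the concept or any narrower concept."""
    concepts = narrower_closure(triples, concept)
    return sorted(s for s, p, o in triples if p == "dc:subject" and o in concepts)


print(items_about(TRIPLES, "cv:photographs"))
print(items_about(TRIPLES, "cv:pictures"))
```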


Seamark’s search interface

Use of flat or hierarchical controlled vocabularies

Transparency and customizability of results ranking

Parametric search with customizable pull-down menus


Lookups into large CVs in Seamark

Use of standard vocabularies represented in RDF
(e.g. LC's Thesaurus for Graphic Materials)

Faceted search over controlled vocabulary terms

Syndication of CVs, instance data and ontologies for sharing


Query processing in Seamark


Based on XML for Retrieval By Reformulation
(XRBR)


A query language that


Provides support for query reformulation and refinement
while minimizing roundtrips


Supports a stateless protocol for faceted metadata
retrieval with SOAP as a transport mechanism


Handles very large result sets gracefully


Think of XRBR as an application profile in the digital
library sense


Specifies a view over heterogeneous metadata schemas
with hints as to its interpretation and display
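
The properties listed above (stateless requests, refinement with few roundtrips, graceful handling of large result sets) can be sketched in a few lines. This is not XRBR syntax or Seamark's API: the dictionary-based query shape, the field names and the function `answer` are all assumptions chosen for illustration. Each request carries its full set of constraints, and each response returns one page of results together with the refinements available from that point.

```python
from collections import Counter

# Hypothetical items; field names are illustrative only.
ITEMS = [
    {"id": "doc:1", "creator": "allen", "subject": "search"},
    {"id": "doc:2", "creator": "allen", "subject": "rdf"},
    {"id": "doc:3", "creator": "powers", "subject": "rdf"},
    {"id": "doc:4", "creator": "powers", "subject": "search"},
]
FACETS = ("creator", "subject")


def answer(query):
    """Answer one self-contained query; no server-side session state is consulted."""
    constraints = query.get("constraints", {})
    offset = query.get("offset", 0)
    limit = query.get("limit", 2)
    hits = [item for item in ITEMS
            if all(item.get(k) == v for k, v in constraints.items())]
    # Refinements for every facet not yet constrained, returned with the results
    # so the client does not need a second request to offer the next step.
    refinements = {
        facet: dict(Counter(item[facet] for item in hits))
        for facet in FACETS if facet not in constraints
    }
    return {
        "total": len(hits),
        "page": [item["id"] for item in hits[offset:offset + limit]],
        "refinements": refinements,
    }


first = answer({"constraints": {"subject": "rdf"}})
refined = answer({"constraints": {"subject": "rdf", "creator": "powers"}})
print(first)
print(refined)
```

Because the refined query restates every constraint, any server can answer it; the `offset` and `limit` fields let a client page through a large result set instead of receiving it in one response.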


Query processing in Seamark


Disambiguation


Suggestions provide this implicitly


Query expansion and concept mapping


RDF models plus XRBR structure queries
provide a general mechanism for this


Entity extraction


XSLT extensions at import augment raw
metadata with additional extracted attributes


Natural language processing


Direct manipulation now; QA to come


Searching across collections


Metadata aggregation using RDF
provides a general platform for
federated search


We can directly leverage emerging SW
approaches to:


Thesaurus mapping


tif:concept-equivalence


Schema mapping


rdfs:subPropertyOf
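
Schema mapping with rdfs:subPropertyOf can be sketched as follows. Under RDF Schema semantics, a statement made with a sub-property also holds for its super-property, so a query phrased against dc:creator can reach records from collections that use their own author properties. The properties `lib:author` and `news:byline`, and all item identifiers, are hypothetical.

```python
# Hypothetical schema mappings from two collections onto dc:creator.
SCHEMA = [
    ("lib:author", "rdfs:subPropertyOf", "dc:creator"),
    ("news:byline", "rdfs:subPropertyOf", "dc:creator"),
]
# Hypothetical instance data aggregated from both collections.
DATA = [
    ("book:1", "lib:author", "person:allen"),
    ("story:7", "news:byline", "person:allen"),
    ("book:2", "lib:author", "person:powers"),
]


def super_properties(schema, prop):
    """Return prop and every property it is transitively a sub-property of."""
    found = {prop}
    frontier = [prop]
    while frontier:
        current = frontier.pop()
        for s, p, o in schema:
            if p == "rdfs:subPropertyOf" and s == current and o not in found:
                found.add(o)
                frontier.append(o)
    return found


def query(schema, data, prop, value):
    """Find subjects with the given value for prop or for any of its sub-properties."""
    return sorted(s for s, p, o in data
                  if o == value and prop in super_properties(schema, p))


print(query(SCHEMA, DATA, "dc:creator", "person:allen"))
print(query(SCHEMA, DATA, "lib:author", "person:allen"))
```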


Setup and maintenance


Installation and configuration for Windows,
Linux and Mac OS X


Administration


Simple web-based administration interface for
aggregating feeds and specifying initial queries


Training


135-page tutorial


Extensive on-line API documentation


Courses



One-day on-site introduction




Setup and maintenance


Shelley Powers, “Practical RDF”,
O'Reilly & Associates, 2003:


“... the application is easily installed and
configured, and comes with considerable
documentation”


“What I was most impressed with about
the product, though, was how quickly and
easily it integrated my RDF/XML data …
into a sophisticated query engine with
little or no effort.”


Seamark’s administration interface

Users can specify URLs serving RDF to load into a given model

… then load them manually or on a scheduled basis

Alternatively, queries can be executed against an SQL database

XSLT stylesheets transform XML documents and SQL result sets into RDF

Aggregated models can be dumped to RDF
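
The step of turning SQL result sets into RDF can be pictured with a small sketch. Seamark is described as doing this with XSLT stylesheets; the example below swaps in plain Python over an in-memory SQLite database purely to show the shape of the mapping. The table, the column-to-property map, the base URI and the function name `rows_to_triples` are all assumptions.

```python
import sqlite3


def rows_to_triples(connection, query, base_uri, property_map):
    """Map each result row to triples; the first selected column is used as the row key."""
    cursor = connection.execute(query)
    columns = [description[0] for description in cursor.description]
    triples = []
    for row in cursor:
        subject = base_uri + str(row[0])
        for column, value in zip(columns[1:], row[1:]):
            # Skip NULLs and columns with no mapped property.
            if value is not None and column in property_map:
                triples.append((subject, property_map[column], value))
    return triples


# Hypothetical table standing in for existing enterprise data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE reports (id INTEGER PRIMARY KEY, title TEXT, author TEXT)")
conn.executemany(
    "INSERT INTO reports VALUES (?, ?, ?)",
    [(1, "Faceted search", "A. Author"), (2, "Untitled", None)],
)

triples = rows_to_triples(
    conn,
    "SELECT id, title, author FROM reports ORDER BY id",
    "http://example.org/report/",
    {"title": "dc:title", "author": "dc:creator"},
)
for triple in triples:
    print(triple)
```

The resulting triples could then be serialized to RDF/XML or N3 by whatever RDF toolkit is in use.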


Sites using Seamark
