Enabling Semantic Searching - Betaversion.org

21 Oct 2013

stefano@apache.org

Enabling Semantic Searching

by Stefano Mazzocchi

<stefano@apache.org>

What is the “Semantic Web”?

The Semantic Web is an extension of the current web in which information is given well-defined meaning, better enabling computers and people to work in cooperation.

[Tim Berners-Lee, James Hendler, Ora Lassila]

Didn’t get it? Let’s try again


The web is the most successful publishing medium in the history of mankind.


And it is still growing!



The ‘semantic web’ dream is to have machines that help us consume that much information!

What do we need to build a semantic web?


Data identification and retrieval


Development of vocabularies


Model constraints


Assertions and proofs


[Eric Prud’hommeaux]

All that?


Unfortunately, yes…


…but each time we reach one of these steps, the resulting capabilities turn out to be surprising!

One example above all: Google!


Google infers page importance from the global
web hyperlink topology.


This is possible because the semantics of hyperlinks are well defined, and thus understandable by machines.


The results of such a simple computation are astonishing.
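The slides do not say how Google turns link topology into importance; the best-known published method is Brin and Page's PageRank. A minimal power-iteration sketch follows, assuming a toy link graph (the page names are hypothetical) and the commonly cited damping factor of 0.85. It illustrates the idea of ranking from hyperlinks only and is not Google's production algorithm.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Power-iteration PageRank over a dict {page: [pages it links to]}."""
    # Every page that appears as a source or a target takes part in the ranking.
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}  # start from a uniform distribution
    for _ in range(iterations):
        # The "random jump" share every page receives regardless of links.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page splits its rank evenly among the pages it links to.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # Dangling page (no outgoing links): spread its rank over all pages.
                for p in pages:
                    new_rank[p] += damping * rank[page] / n
        rank = new_rank
    return rank


# Hypothetical four-page web: c.example is linked to by three other pages.
web = {
    "a.example": ["b.example", "c.example"],
    "b.example": ["c.example"],
    "c.example": ["a.example"],
    "d.example": ["c.example"],
}
ranks = pagerank(web)
for page, score in sorted(ranks.items(), key=lambda kv: -kv[1]):
    print(f"{page}: {score:.3f}")
```

Nothing in the computation looks at page content: the ordering comes entirely from the agreed meaning of a hyperlink, which is the point the slide makes.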

Semantic Searching

The act of looking for data with the help of information inferred from some well-defined meaning of the data itself.
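To make the definition concrete, here is a minimal sketch assuming a toy in-memory set of triples whose "type" and "subClassOf" predicates loosely mimic rdf:type and rdfs:subClassOf. The resource and class names are hypothetical. A query for a broad class finds items that were never literally labelled with it, because class membership is inferred from the declared class hierarchy.

```python
# Toy triple store: (subject, predicate, object). All names are hypothetical.
TRIPLES = {
    ("Sonnet18", "type", "Sonnet"),
    ("Hamlet", "type", "Tragedy"),
    ("Sonnet", "subClassOf", "Poem"),
    ("Tragedy", "subClassOf", "Play"),
    ("Poem", "subClassOf", "LiteraryWork"),
    ("Play", "subClassOf", "LiteraryWork"),
}


def subclasses(cls, triples):
    """Return cls plus every class that is (transitively) a subclass of it."""
    result = {cls}
    frontier = [cls]
    while frontier:
        current = frontier.pop()
        for s, p, o in triples:
            if p == "subClassOf" and o == current and s not in result:
                result.add(s)
                frontier.append(s)
    return result


def find_instances(cls, triples):
    """Semantic search: resources typed as cls or as any of its subclasses."""
    classes = subclasses(cls, triples)
    return {s for s, p, o in triples if p == "type" and o in classes}


# No triple says anything is a "LiteraryWork", yet both works are found.
print(sorted(find_instances("LiteraryWork", TRIPLES)))  # ['Hamlet', 'Sonnet18']
print(sorted(find_instances("Poem", TRIPLES)))          # ['Sonnet18']
```

A plain keyword match on "LiteraryWork" would return nothing here; the extra results come only from the meaning attached to the data.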

Warning: Problems Ahead!


The Babel Problem


The Chicken-Egg Problem


The ROI Problem


The Screen-Scrape Problem


The Marginal Cost Problem

The Babel Problem (1)


XML makes it possible to create new markup languages to fit every small need.


In many situations, existing markups are complex and their learning curve is too steep… thus:


We see an explosion of markup languages.

The Babel Problem (2)


It is not obvious that this trend will reach saturation (especially with the advent of SOAP-based web services).


Automatic translation between markups is not always algorithmically possible.

The Chicken-Egg Problem


People won’t feel the need to publish information in more semantically meaningful languages until there is some use for it.


And no use will emerge until there is enough such semantic information to work on.

The ROI Problem


If writing ‘semantized’ information is more expensive than writing ‘non-semantized’ information…


…and the return on these extra costs doesn’t pay them off, it simply won’t happen!

The Screen-Scrape Problem


The great majority of web information is published using HTML, which has intrinsically poor semantic capabilities.


If semantic information is extracted from HTML by ‘screen-scraping’, the costs will always exceed the benefits!
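One source of that cost is fragility: a scraper encodes presentation details, so every redesign breaks it. The sketch below contrasts a regular-expression scraper with reading a semantic variant of the same resource. Both HTML snippets, the XML vocabulary and the price example are hypothetical, chosen only to illustrate the argument.

```python
import re
import xml.etree.ElementTree as ET

# Two hypothetical renderings of the same page: only the layout changed.
HTML_V1 = '<td class="price">EUR 12.50</td>'
HTML_V2 = '<span class="cost">12.50 &euro;</span>'

# A hypothetical semantic (XML) variant of the same resource.
XML_VARIANT = '<product><price currency="EUR">12.50</price></product>'


def scrape_price(html):
    """Screen-scraping: tied to the markup and wording of one specific layout."""
    match = re.search(r'<td class="price">EUR ([0-9.]+)</td>', html)
    return float(match.group(1)) if match else None


def read_price(xml):
    """Reading a semantic variant: tied only to the agreed vocabulary."""
    element = ET.fromstring(xml).find("price")
    return float(element.text), element.get("currency")


print(scrape_price(HTML_V1))    # 12.5
print(scrape_price(HTML_V2))    # None: the scraper broke on a redesign
print(read_price(XML_VARIANT))  # (12.5, 'EUR')
```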

The Marginal Cost Problem


If the marginal cost of adding semantic information while authoring some text grows linearly with the text size, the whole semantic web might never scale economically! (especially together with the ROI problem)

Enabling semantic searching


We need a way to solve all the previous problems, or there will never be anything better than Google.

Enter the solutions!


XML-based Web Publishing


Standardized semantic HTTP variants


Semantic-aware content editors

XML-based Web Publishing


XML-based web publishing systems make it ‘economically worthwhile’ to create XML content.


This partially solves the chicken-egg and ROI problems, since such systems give people immediate benefits (especially those with cross-media publishing needs).
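The cross-media benefit comes from writing content once in XML and deriving each medium from it. A minimal sketch follows, assuming a hypothetical article vocabulary and using plain Python tree transformations where a real publishing system such as Cocoon would typically use XSLT stylesheets. The XML source itself remains available as a machine-readable variant.

```python
import xml.etree.ElementTree as ET

# Single XML source in a hypothetical article vocabulary.
SOURCE = """<article>
  <title>Enabling Semantic Searching</title>
  <author>Stefano Mazzocchi</author>
  <para>The web is still growing.</para>
</article>"""


def to_html(root):
    """Render the article for browsers."""
    html = ET.Element("html")
    body = ET.SubElement(html, "body")
    ET.SubElement(body, "h1").text = root.findtext("title")
    ET.SubElement(body, "p", {"class": "author"}).text = root.findtext("author")
    for para in root.iter("para"):
        ET.SubElement(body, "p").text = para.text
    return ET.tostring(html, encoding="unicode")


def to_text(root):
    """Render the same article as plain text (e.g. for e-mail)."""
    lines = [root.findtext("title").upper(), "by " + root.findtext("author"), ""]
    lines += [para.text for para in root.iter("para")]
    return "\n".join(lines)


root = ET.fromstring(SOURCE)
print(to_html(root))
print(to_text(root))
```

Each new medium costs one more transformation, not one more copy of the content, which is where the immediate return on writing XML comes from.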

HTTP Variants!


HTTP/1.1 has the notion of ‘resource variants’, so it is possible to ask for a specific flavor of a given resource.


If ‘semantic variants’ were standardized, this might solve, together with XML-based publishing systems, the Screen-Scrape problem.


Apache Cocoon already implements
such a concept with ‘resource views’.
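One standard HTTP/1.1 way of asking for a flavor is proactive content negotiation on the Accept header, with q-values expressing preference. The sketch below is simplified (it ignores media-type parameters other than q) and is not how Cocoon's resource views are implemented. Using application/rdf+xml as the ‘semantic variant’ is an assumption for illustration, since the slides leave the variant format open.

```python
def parse_accept(header):
    """Parse an Accept header into (media_range, q) pairs."""
    prefs = []
    for item in header.split(","):
        parts = [p.strip() for p in item.split(";")]
        media_range, q = parts[0], 1.0
        for param in parts[1:]:
            name, _, value = param.partition("=")
            if name.strip() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0
        if media_range:
            prefs.append((media_range, q))
    return prefs


def specificity(media_range):
    """*/* < type/* < type/subtype: the most specific match sets the q-value."""
    if media_range == "*/*":
        return 0
    if media_range.endswith("/*"):
        return 1
    return 2


def matches(media_range, media_type):
    if media_range == "*/*":
        return True
    if media_range.endswith("/*"):
        return media_type.split("/")[0] == media_range[:-2]
    return media_range == media_type


def quality(media_type, prefs):
    best = None  # (specificity, q) of the most specific matching range
    for media_range, q in prefs:
        if matches(media_range, media_type):
            s = specificity(media_range)
            if best is None or s > best[0]:
                best = (s, q)
    return best[1] if best else 0.0


def choose_variant(accept_header, variants):
    """Return the variant with the highest q-value, or None (406 Not Acceptable)."""
    prefs = parse_accept(accept_header)
    best, best_q = None, 0.0
    for media_type in variants:
        q = quality(media_type, prefs)
        if q > best_q:
            best, best_q = media_type, q
    return best


# Hypothetical variants of one resource: a human view and a semantic view.
VARIANTS = {"text/html": "page.html", "application/rdf+xml": "page.rdf"}

print(choose_variant("application/rdf+xml, text/html;q=0.5", VARIANTS))
print(choose_variant("text/html,application/xhtml+xml,*/*;q=0.8", VARIANTS))
```

A browser keeps getting HTML while a semantic agent asking for RDF gets the machine-readable variant from the same URL, with no scraping involved.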

Semantic-aware Content Editors


A simple and cost-effective solution for semantic-aware content editing is a conditio sine qua non for the production of semantically meaningful content.

Conclusions (1)


Searching is the first usage scenario for semantic web technologies, since it doesn’t require the whole infrastructure to be in place.


Still, many problems must be faced, especially the socio-economic ones that academia is currently ignoring.

Conclusions (2)


Without an incremental and economically feasible plan of adoption, the semantic web is unlikely to happen.


The proposed plan of adoption uses XML publishing on the server side along with standardized semantic HTTP variants.

Conclusions (3)


Still, the biggest problems to face are semantic-aware content editing and solving the Babel problem without requiring the creation of huge ontologies, which are very unlikely to be manageable for the entire web.

ToDo (1)


Agree on a way to publish the different
resource variants!


Agree on markups/metadata or, at
least, provide mechanical ways to
translate one into another.


Enforce the use of namespaced XML (despite the lack of namespace-aware validation in DTDs and the lack of coherence between the infoset and the syntax).
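Namespaces are what make a mechanical translation between markups tractable: each element name is globally unique, so a mapping table can be written once per pair of vocabularies. A minimal sketch follows, assuming two hypothetical vocabularies under example.org namespace URIs and a one-to-one element mapping. Real vocabularies rarely map this cleanly, which is the Babel problem's point, so the sketch fails loudly on any element it cannot translate; attributes are copied unchanged.

```python
import xml.etree.ElementTree as ET

# Two hypothetical vocabularies describing the same kind of document.
SRC_NS = "http://example.org/vocab/a"
DST_NS = "http://example.org/vocab/b"

# Mechanical translation table, keyed by namespace-qualified names ({uri}local).
MAPPING = {
    f"{{{SRC_NS}}}doc": f"{{{DST_NS}}}document",
    f"{{{SRC_NS}}}heading": f"{{{DST_NS}}}title",
    f"{{{SRC_NS}}}creator": f"{{{DST_NS}}}author",
}

SOURCE = f"""<a:doc xmlns:a="{SRC_NS}">
  <a:heading>Enabling Semantic Searching</a:heading>
  <a:creator>Stefano Mazzocchi</a:creator>
</a:doc>"""


def translate(element):
    """Recursively rename elements via MAPPING; raise if no mapping exists."""
    if element.tag not in MAPPING:
        raise KeyError(f"no mapping for {element.tag}")
    copy = ET.Element(MAPPING[element.tag], element.attrib)  # attributes copied as-is
    copy.text, copy.tail = element.text, element.tail
    for child in element:
        copy.append(translate(child))
    return copy


ET.register_namespace("b", DST_NS)  # serialize the target namespace with prefix "b"
result = translate(ET.fromstring(SOURCE))
print(ET.tostring(result, encoding="unicode"))
```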

ToDo (2)


Think about semantic-aware editing (which is not only XML-aware, but also RDF-aware!)


Research less expressive (than RDF) but more practical and cost-effective solutions that encode semantic information in the schemas instead of in their content (semantic sheets? semantic relevance ratings?)

Thanks!


Any questions?