Mining the web to improve semantic-based multimedia search and digital libraries





http://gate.ac.uk/

http://nlp.shef.ac.uk/




Horacio Saggion

Kalina Bontcheva

University of Sheffield


21 November 2006

IST Event 2006

Web Mining and Semantic Web: Networking with industry and academia



[This work has been partially supported by the SEKT (http://sekt.semanticweb.org/), PrestoSpace (http://www.prestospace.org) and TAO (http://www.tao-project.eu/) projects]



Web mining and semantic annotation: why?


Semantic annotation produces an explicit representation of knowledge, given some content

Knowledge is often implicit in the data sources

…or hard to extract automatically with sufficient accuracy

Knowledge can frequently be mined from the web and merged with the original content to improve semantic search and reasoning capabilities


Web mining and semantic annotation: how?


GATE is a widely used open-source infrastructure for text mining (http://gate.ac.uk):

Ten years old, with 1000s of users at 100s of sites

Supports major document formats and languages

Helps build semantic annotation components

Integrate these with content and knowledge mined from the web

Create, test, and deploy these into an end-to-end application (some examples next)


RichNews: Multimedia Annotation


The problem:


Access to archive material in the BBC is provided by some form of semantic annotation and indexing

Manual annotation is time-consuming (up to 10x real time) and expensive

Rich News (developed within the PrestoSpace project) aims to (partially) automate the annotation of news programs

Developed on BBC TV and radio news

A human can be kept in the loop if desired

Recordings of broadcasts go in one end

An index of semantic metadata describing each news story comes out the other

http://gate.ac.uk/sale/www05/web-assisted-annotation.pdf



Web mining in RichNews


Why web mining:


Speech recognition produces poor-quality transcripts with many mistakes

Closed captions/subtitles are not always available

These news stories can also be found on the BBC and other web sites


The solution:


Obtain key terms from the ASR (automatic speech recognition) transcripts

Search the web for related stories from the same date

Find the best-matching stories

Obtain semantic annotations from this richer text

Merge with the semantic annotations on the transcript to obtain more precise knowledge, grounded in the video stream
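
The first three steps of the solution can be illustrated with a minimal sketch. It is not the Rich News implementation: the frequency-based key-term extraction, the cosine similarity over word counts, the stopword list, the function names, the example.org URLs and the sample dates are all assumptions made for illustration, and the hard-coded candidate list stands in for real web search results.

```python
import math
import re
from collections import Counter

# Small illustrative stopword list (an assumption, not taken from the slides).
STOPWORDS = {"the", "and", "for", "was", "has", "with", "that", "will", "said"}


def tokenize(text):
    """Lower-case the text and keep alphabetic content words of 3+ letters."""
    words = re.findall(r"[a-z]+", text.lower())
    return [w for w in words if len(w) > 2 and w not in STOPWORDS]


def key_terms(transcript, k=3):
    """Return the k most frequent content words of a noisy ASR transcript."""
    return [term for term, _ in Counter(tokenize(transcript)).most_common(k)]


def cosine(tokens_a, tokens_b):
    """Cosine similarity between two bags of words."""
    a, b = Counter(tokens_a), Counter(tokens_b)
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def best_match(transcript, candidates, broadcast_date):
    """Pick the candidate web story from the broadcast date closest to the transcript."""
    same_day = [c for c in candidates if c["date"] == broadcast_date]
    if not same_day:
        return None
    tokens = tokenize(transcript)
    return max(same_day, key=lambda c: cosine(tokens, tokenize(c["text"])))


# Hypothetical ASR output with typical recognition errors ("to day", "taxis").
transcript = (
    "prime minister announce budget to day budget cuts taxis "
    "prime minister says budget fair"
)

# Hypothetical stand-in for web search results.
candidates = [
    {"url": "http://example.org/budget", "date": "2006-03-22",
     "text": "The prime minister defended the budget, saying the tax cuts were fair."},
    {"url": "http://example.org/football", "date": "2006-03-22",
     "text": "The football final ended with a late goal."},
    {"url": "http://example.org/old-budget", "date": "2005-03-16",
     "text": "Prime minister budget budget budget prime minister"},
]

match = best_match(transcript, candidates, "2006-03-22")
print(key_terms(transcript), match["url"])
```

The matched web story is cleaner than the transcript, so semantic annotation would then be run on the story text and the results attached to the corresponding segment of the broadcast.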


http://gate.ac.uk/sale/www05/web-assisted-annotation.pdf



RichNews Example


TAO: Augmenting Software Artefacts with Semantics


TAO project

http://www.tao-project.eu

Transitioning Applications to Ontologies

Case study on augmenting software artefacts with semantics

Learning ontologies from multiple software artefacts

Knowledge about a software project often spread across different sources on the web:

Source code, discussion messages, bug descriptions, documentation
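
One simple way to picture learning from multiple artefacts is to collect candidate concept terms that recur across different kinds of source. The sketch below is only an illustrative heuristic, not the TAO project's method: the function names, the minimum word length, the two-source threshold and the sample artefacts are all assumptions.

```python
import re
from collections import defaultdict


def split_identifier(name):
    """Split camelCase or snake_case identifiers into lower-case words."""
    parts = re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+", name)
    return [p.lower() for p in parts]


def candidate_concepts(artefacts, min_sources=2):
    """Return terms that occur in at least min_sources kinds of artefact.

    artefacts maps a source kind (e.g. "source code") to a list of raw strings.
    """
    seen_in = defaultdict(set)
    for kind, texts in artefacts.items():
        for text in texts:
            for word in re.findall(r"[A-Za-z_]+", text):
                for term in split_identifier(word):
                    if len(term) > 3:  # drop very short words (assumed cut-off)
                        seen_in[term].add(kind)
    return sorted(t for t, kinds in seen_in.items() if len(kinds) >= min_sources)


# Hypothetical artefacts from a software project.
artefacts = {
    "source code": ["class AnnotationSet", "def load_document(path)"],
    "bug descriptions": ["Crash when the annotation offsets exceed the document length"],
    "documentation": ["A document holds text; a corpus is a collection of documents."],
}

concepts = candidate_concepts(artefacts)
print(concepts)
```

Terms that appear in code identifiers as well as in bug reports or documentation are plausible starting points for ontology concepts. A real system would also need lemmatisation (so that "documents" and "document" are merged) and relation extraction.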


New Challenges


Moving towards mining and semantically annotating Web 2.0

Opinion mining from blogs and discussion forums


Mining wikis


Social network analysis


Mining multimedia content


Initial experiments in ongoing projects, but we need further work on this emerging socially oriented web


Thank you!

These slides:


http://gate.ac.uk/sale/talks/ist06/ist-event06.ppt

Further details:


RichNews: http://gate.ac.uk/sale/www05/web-assisted-annotation.pdf

SEKT: http://gate.ac.uk/sale/iswc06/iswc06.pdf

TAO: http://www.tao-project.eu