Oct 22, 2013 (3 years and 7 months ago)

Annotating Documents for the Semantic Web
Using Data-Extraction Ontologies


Dissertation Proposal


Yihong Ding

2

Motivation


The representation of web content
limits its usability



A machine-understandable web


Shared, explicit, formal
conceptualizations (ontologies)


The semantic web

3

A Problem



How can the current web be transformed
into the semantic web?

4

A Solution:

Semantic Annotation


Add explicit, formal, and
unambiguous metadata to web
documents



Explicit: publicly accessible


Formal: publicly agreeable


Unambiguous: publicly identifiable

5

Annotation
Representation

Explicit
Annotation

Implicit
Annotation

6

Semantic Annotation
Current Research Status



Manual annotation through friendly
interfaces [Annotea, etc.]



Automatic annotation with ontology
generation [SCORE]



Automatic annotation using automated IE
tools based on pre-defined ontologies
[SemTag, MnM, etc.]

7

Current Automatic Annotator


a typical paradigm






Domain Ontology

Non-ontology-based IE Wrapper

Rules and extracting categories

Document

(1) Extraction

(2) Alignment

(3) Annotation

8

Current Automatic Annotator


Problems






Domain Ontology

Document

(1) Problem of data recognition

(2) Problem of concept disambiguation

(3) Problem of annotation formatting, storing, indexing, sharing

(4) Problem of assembling ontologies

Non-ontology-based IE Wrapper

Rules and extracting categories

9

“Main Drawback of Using
Automated IE”

[Kiryakov04]


“none of these approaches expects an input or
produces output with respect to ontologies”



“a set of heuristics for post-processing and
mapping of the IE results to an ontology … not
sufficient for large-scale, domain-independent
semantic annotation.”



“IE and wrapper induction techniques need to use
the ontology more directly during the process of
extraction.”

10

Ontology-driven Paradigm
(Data-Extraction Ontology)
for Semantic Annotation






Document

Non-ontology-based IE Wrapper

Ontology-based IE Wrapper

Document

11

Ontology-driven Paradigm
for Semantic Annotation

Some Arguments



Resiliency w.r.t. web page layouts (helps scale to large sets
of web pages)



Adaptiveness w.r.t. domain specifications (helps scale to
large domains)



Creation of ontologies: still a problem but no longer a
drawback



Speed of execution: still a drawback (but we are going to
propose a solution next)
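
The layout-resiliency argument can be illustrated with a minimal sketch. It assumes that each concept in a data-extraction ontology carries a recognizer, here a plain regular expression; the concept names, patterns, and sample pages are hypothetical and are not taken from the proposal. Because recognition works on the page text rather than on its markup, the same ontology yields the same result on a table layout and on a prose layout.

```python
import re
from html.parser import HTMLParser


class _TextCollector(HTMLParser):
    """Collects the text nodes of an HTML page and ignores the markup."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


def visible_text(html):
    """Return the page text with whitespace normalized."""
    parser = _TextCollector()
    parser.feed(html)
    parser.close()
    return " ".join(" ".join(parser.chunks).split())


# Hypothetical ontology: each concept maps to an illustrative recognizer.
ONTOLOGY = {
    "Price": re.compile(r"\$\d{1,3}(?:,\d{3})*"),
    "Year": re.compile(r"\b(?:19|20)\d{2}\b"),
}


def extract(html):
    """Apply every concept recognizer to the text of the page."""
    text = visible_text(html)
    return {concept: rx.findall(text) for concept, rx in ONTOLOGY.items()}


# Two pages with the same content but different layouts.
TABLE_PAGE = "<table><tr><td>2001</td><td>Honda Civic</td><td>$4,500</td></tr></table>"
PROSE_PAGE = "<p>Honda Civic from <i>2001</i>, asking <b>$4,500</b>.</p>"

table_result = extract(TABLE_PAGE)
prose_result = extract(PROSE_PAGE)
```

Both calls return the same price and year, which is the point of the argument: the wrapper depends on the ontology, not on the HTML structure. The cost is that every recognizer runs over every page, which is the speed drawback noted above.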

12

Two-Layer Annotation Model

Conceptual Annotator using an ontology-based IE tool

Document

Structural Annotator

Sample Annotation Process

Similar Documents

Massive Annotation Process

13

Structural Annotator


Major components


HTML hierarchical path that leads to concept
locations


Local context around locations


Dependencies among multiple semantic
categories


Significance


Identify both categories and their semantic
meanings
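
A minimal sketch of the first component, the HTML hierarchical path, under simplifying assumptions: pages are well-formed XHTML, the conceptual annotator has already labeled a few values on one sample page, and a path is just a list of child indexes. Local context and category dependencies are left out. All names and sample pages are hypothetical.

```python
import xml.etree.ElementTree as ET


def find_path(root, value):
    """Return the child-index path to the first element whose text equals value."""
    def walk(node, path):
        if (node.text or "").strip() == value:
            return path
        for index, child in enumerate(node):
            found = walk(child, path + [index])
            if found is not None:
                return found
        return None
    return walk(root, [])


def follow_path(root, path):
    """Follow a child-index path and return the text found at its end."""
    node = root
    for index in path:
        node = node[index]
    return (node.text or "").strip()


def learn_layout(sample_html, annotations):
    """Sample annotation process: record where each labeled value sits."""
    root = ET.fromstring(sample_html)
    return {concept: find_path(root, value) for concept, value in annotations.items()}


def annotate(html, layout):
    """Massive annotation process: reuse the learned paths on a similar page."""
    root = ET.fromstring(html)
    return {concept: follow_path(root, path) for concept, path in layout.items()}


SAMPLE = ("<html><body><h1>Honda Civic</h1>"
          "<div><span>Price</span><b>$4,500</b></div></body></html>")
SIMILAR = ("<html><body><h1>Ford Focus</h1>"
           "<div><span>Price</span><b>$3,900</b></div></body></html>")

# Assumed output of the conceptual annotator on the sample page.
sample_annotations = {"Model": "Honda Civic", "Price": "$4,500"}

layout = learn_layout(SAMPLE, sample_annotations)
result = annotate(SIMILAR, layout)
```

Here `result` maps `Model` to `Ford Focus` and `Price` to `$3,900` without running any recognizer on the second page, which is where the speed gain on machine-generated pages would come from.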

14

Ontology Factors in
Semantic Annotation Tasks


Knowledge specification


Semantic web community


Web Ontology Language (OWL)



Knowledge instantiation


IE and database community


Object-oriented System Model in XML
(OSMX)

15

Ontology Conversion


Similarities (OWL vs. OSMX)


Class vs. object set


ObjectProperty vs. relationship set


Cardinality restriction vs. participation constraint


subClassOf vs. is-a relationship


Unique features


OWL


subPropertyOf


symmetric and transitive properties


namespace declaration


ontology importing


OSMX


arbitrary n-ary relationship sets


data frames


general constraints
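
The correspondences listed above can be sketched as a lookup table. This is only an illustration of the idea: the `(kind, name)` tuple representation and the sample constructs are assumptions, not the converter's actual data model. Constructs with no counterpart are reported rather than silently dropped.

```python
# Shared constructs, as listed on the slide.
OWL_TO_OSMX = {
    "Class": "object set",
    "ObjectProperty": "relationship set",
    "cardinality restriction": "participation constraint",
    "subClassOf": "is-a relationship",
}
OSMX_TO_OWL = {osmx: owl for owl, osmx in OWL_TO_OSMX.items()}


def convert(constructs, table):
    """Translate (kind, name) pairs; return (converted, unmapped)."""
    converted, unmapped = [], []
    for kind, name in constructs:
        if kind in table:
            converted.append((table[kind], name))
        else:
            unmapped.append((kind, name))
    return converted, unmapped


# Hypothetical OWL fragment: two shared constructs and one OWL-only feature.
owl_fragment = [
    ("Class", "Car"),
    ("ObjectProperty", "hasPrice"),
    ("subPropertyOf", "hasListPrice"),
]
converted, unmapped = convert(owl_fragment, OWL_TO_OSMX)
```

In this sketch the `subPropertyOf` entry ends up in `unmapped`, mirroring the unique-features list: such constructs need a dedicated encoding or are lost in conversion.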

16

Ontology Construction



An Unavoidable Problem



Semantic annotation tasks require
ontologies.



An ontology for a specific semantic
annotation task is not guaranteed to be
available.

17

Ontology Construction



General and Special


Generally speaking


Until now, the mainstream approach has been manual construction


Automatic and semi-automatic ontology generation: many
research papers, few if any practical tools, a very hard
problem



Specific to semantic annotation


Highly dynamic and varied domains


Much overlapping information


Limited scope for any one web page


Flat structure

18

Ontology Construction



Knowledge Reusing


“What has been will be again, what has
been done will be done again; there is
nothing new under the sun.”
(The Holy Bible, Ecclesiastes 1:9, NIV translation)



A “new” ontology is a new assembly, built by
unions and projections of several
pre-existing ontologies.
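
A minimal sketch of assembly by projection and union, assuming an ontology can be reduced to a set of concepts plus a set of (source, relation, target) triples. The two component ontologies and the chosen concepts are hypothetical.

```python
def project(ontology, keep):
    """Keep only the named concepts and the relationships among them."""
    concepts = ontology["concepts"] & keep
    relations = {
        (source, name, target)
        for (source, name, target) in ontology["relations"]
        if source in concepts and target in concepts
    }
    return {"concepts": concepts, "relations": relations}


def union(*ontologies):
    """Merge several ontologies into one."""
    return {
        "concepts": set().union(*(o["concepts"] for o in ontologies)),
        "relations": set().union(*(o["relations"] for o in ontologies)),
    }


# Two hypothetical pre-existing ontologies.
vehicle = {
    "concepts": {"Car", "Make", "Engine"},
    "relations": {("Car", "has", "Make"), ("Car", "has", "Engine")},
}
sale = {
    "concepts": {"Item", "Price", "Seller"},
    "relations": {("Item", "has", "Price"), ("Item", "soldBy", "Seller")},
}

# A "new" car-ad ontology assembled from parts of both.
assembled = union(
    project(vehicle, {"Car", "Make"}),
    project(sale, {"Item", "Price"}),
)
```

The sketch leaves out the hard part, which is deciding which concepts to keep for a given web page and how to connect concepts that come from different sources (here `Car` and `Item` remain unlinked).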

19

Architecture on
Dynamically Assembling
Domain of Interest

Web Page

(1) Knowledge-component selection

(2) Ontology assembly

Collection of Knowledge

Selected Knowledge Components

Assembled Ontology

20

Thesis Statement


Propose a new solution to perform semantic
annotation on normal HTML web pages,
specifically

1. apply ontology-based automatic IE techniques

2. augment OWL with a knowledge-recognition extension

3. combine a conceptual annotator and a layout-based annotator

4. assemble a new domain ontology for an annotation task dynamically

21

Standard Evaluation


Annotation performance


Precision


Recall


Speed of execution


Test bed


5 ~ 10 different domains, with over 10 lexical
concepts in each domain ontology


20 ~ 50 web pages on each domain
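
For reference, precision and recall over annotations can be computed as below. The sketch assumes an annotation is a (concept, value) pair and that a predicted pair counts as correct only on an exact match with the gold standard; the sample data is made up.

```python
def precision_recall(predicted, gold):
    """Return (precision, recall) for two collections of annotations."""
    predicted, gold = set(predicted), set(gold)
    correct = len(predicted & gold)
    precision = correct / len(predicted) if predicted else 0.0
    recall = correct / len(gold) if gold else 0.0
    return precision, recall


gold = {("Model", "Honda Civic"), ("Price", "$4,500"), ("Year", "2001")}
predicted = {
    ("Model", "Honda Civic"),
    ("Price", "$4,500"),
    ("Year", "2010"),   # wrong value
    ("Color", "red"),   # spurious annotation
}

precision, recall = precision_recall(predicted, gold)
# precision is 2/4 and recall is 2/3 for this example
```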

22

Ontology Converter Test


A complete and sound check is costly and
difficult to implement.



Our simple test


Start with an OSMX ontology A


Convert it to OWL and then transform it back into an
OSMX ontology B


Process both A and B to annotate the same set of web
pages (say 30 to 50 web pages)


Annotation results should be identical
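
The round-trip test can be sketched as a small harness. The converters and the annotator are passed in as functions because their real interfaces are not given here; the toy stand-ins below (an ontology as a dictionary of regular expressions, a lossy converter that drops one concept) are assumptions made only to show the check at work.

```python
import re


def round_trip_check(ontology_a, pages, to_owl, to_osmx, annotate):
    """Return the pages on which A and its round-tripped copy B disagree."""
    ontology_b = to_osmx(to_owl(ontology_a))
    return [
        page for page in pages
        if annotate(ontology_a, page) != annotate(ontology_b, page)
    ]


# Toy stand-ins, not the real converter or annotator.
def toy_annotate(ontology, page):
    return {concept: re.findall(pattern, page) for concept, pattern in ontology.items()}


def faithful_to_owl(osmx):
    return {"classes": dict(osmx)}


def lossy_to_owl(osmx):
    return {"classes": {c: p for c, p in osmx.items() if c != "Price"}}


def toy_to_osmx(owl):
    return dict(owl["classes"])


ontology_a = {"Price": r"\$\d[\d,]*", "Year": r"\b(?:19|20)\d{2}\b"}
pages = ["2001 Honda Civic for $4,500", "No price listed, 1998 model"]

ok = round_trip_check(ontology_a, pages, faithful_to_owl, toy_to_osmx, toy_annotate)
bad = round_trip_check(ontology_a, pages, lossy_to_owl, toy_to_osmx, toy_annotate)
```

An empty list means the conversion preserved everything the annotator relies on for these pages. As the slide notes, this is weaker than a complete and sound check: differences that the test pages never exercise go unnoticed.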

23

Two-Layer Annotation
Model Evaluation


Standard evaluation



In addition


About five large web sites with
machine-generated web pages, each of
which contains at least dozens of web
pages

24

Dynamic Ontology
Assembler Evaluation


Regular precision and recall study according to
selected knowledge components



A pilot study on when the ontology assembler works
better than manual ontology construction


Record the time to use a tool to create an ontology from
scratch


Record the time to assemble the same ontology

Compare their differences and the special conditions for
each case


Make empirical suggestions about how to build a
knowledge base that favors ontology assembly

25

Delimitations


Automatic ontology creation from scratch



Annotation storing, indexing, and sharing
mechanisms



Semantic annotation for multimedia content



Parallel or distributed computing to further
scale the semantic annotation system to a large
number of web pages

26

Contributions


To convert current web pages into machine-understandable semantic web
pages



Producing a pure ontology-driven semantic annotator using an ontology-based
IE wrapper



Proposing a novel two-layer annotation model to do fast, accurate, and
resilient annotation



Studying a dynamic ontology assembler that helps maximize the reuse of
existing knowledge and minimize the load of manual ontology creation



Implementing an ontology converter so that this work is useful to the rest
of the semantic web community.