Research of semantic annotation prototype based on domain ontology


Oct 21, 2013 (3 years and 11 months ago)


Yong LIU 1-2, Zhen-Zhen LI 2

(1 The Department of Computer, Ocean University of China, QingDao, ShanDong 266061; 2 The College of Information Science and Technology, Qingdao University of Science and Technology, QingDao, ShanDong 266061)

Abstract: This paper briefly surveys and analyzes current semantic annotation prototypes at several levels and, in response to their shortcomings, proposes a Chinese semantic annotation method based on a home appliance domain ontology. The method combines article level and lexical level annotation, using SVM classification and NLP technology, which improves the completeness of annotation and provides a theoretical basis for intelligent retrieval systems.

Key words: Semantic Web; domain ontology; semantic annotation; SVM

1. Introduction

Tim Berners-Lee, the founder of the Web, first put forward the idea of the Semantic Web [1] in 1998. The Semantic Web is an extension of the current Web rather than a new Web. It focuses on how to make Web information understandable and processable by computers, that is, on Web information that carries semantic elements. Currently, Semantic Web research and application face many difficulties and remain far from the ideal application scenario. For computers to understand the semantics of Web content, semantic tags must be added to existing unstructured, semi-structured and structured data so that implicit semantic information becomes explicit; this is where the need for semantic annotation arises. Semantic annotation technologies are the key to realizing the Semantic Web vision. They directly determine the availability and scale of the Semantic Web, and they are one of the core issues in Semantic Web research and application.

The Semantic Annotation home page [2] listed more than ten ontology-based semantic annotation tools as of October 2004, such as Annotea, Annozilla, Briefing Associate, GATE, Melita, OntoMat Annotizer, Semantic Markup Plug-in for IE, SemanticWord, SHOE Knowledge Annotator, SMORE, Yawas, MnM and so on. Drawing on a comparative analysis of representative annotation tools [3], this paper briefly introduces the current state of research on semantic annotation tools and analyzes the shortcomings of semantic annotation prototypes. A Chinese semantic annotation prototype is then proposed, and its annotation framework and process are described in detail. Finally, the paper summarizes the work and points out directions for future research.

2. Annotation Technology Based on Ontology

2.1 Analysis of Semantic Annotation Prototypes

Semantic annotation prototypes are analyzed below along four dimensions: annotation location, annotation language, level of annotation automation, and annotation granularity within the document.

1) Annotation prototypes can be divided into three groups according to where the annotations are stored [3]: on an annotation server, embedded in the annotated documents, or in separate local files. Embedded annotation means that the annotations of a page are stored in the annotated document itself (as in SMORE, etc.); MnM and OntoMat Annotizer store their annotations in separate files; Annotea and CHOSE store their annotations on a separate annotation server.

2) As for annotation language, few semantic annotation tools support the Web Ontology Language (OWL) recommended by the W3C; GATE, Melita and MnM support XML; only Annotea and SMORE support RDF. OWL builds on RDF and extends it.

3) Annotation automation falls into three types: manual, semi-automatic and automatic. Automatic annotation means that the user only needs to input a document and select an ontology, and the annotation of the document is generated automatically without any interaction, as in AeroDAML. Semi-automatic annotation requires user interaction during annotation, or user checking of automatically generated annotations. Most annotation tools are semi-automatic, such as SMORE, MnM, OntoMat Annotizer and so on. Manual annotation means that the user enters the tagging script by hand, as in SHOE Knowledge Annotator.

4) There are two levels of annotation granularity: article level annotation (annotating the article as a whole) and lexical level annotation (tagging based on the article content). Lexical level annotation is semantic annotation in the true sense.

2.2 Shortcomings of current semantic annotation prototypes

Based on the current state of semantic annotation research and the comparative analysis of current semantic annotation tools [3], the shortcomings of these tools can be summarized as follows.

1) The majority of these annotation tools support only manual annotation; a small number support semi-automatic annotation, which still requires user guidance. The level of automation is insufficient and accuracy is poor.

2) The majority of the prototypes embed annotations in Web pages, adding many new elements and attributes. Elements and attributes that do not appear in the corresponding DTD cause DTD validation to fail. Link-type annotation is therefore an effective solution: the annotations are stored in separate documents, with link elements added that point to the annotated pages.

3) Most tools do not support ontologies, and only a few support editing, modifying and extending the vocabulary.

4) Many tools were developed abroad and support annotation only in English, not in Chinese.

3. An Annotation Method Based On Domain Ontology

The traditional keyword-based approach to document indexing does not work well in a network environment because it lacks the ability to perform semantic inference. Thanks to their well-organized concept structure and their support for logical inference, ontologies are proving their value in information retrieval and related applications. Ontology-based annotation prototypes support semantic annotation and add semantic information to Web data, making the data understandable to both machines and humans.


Most current semantic annotation tools support only manual annotation, their level of automation and accuracy are insufficient, and they do not support ontologies; Chinese-language tools in particular are mostly still at the research stage. Paper [9] proposes a semantic annotation prototype based on a teaching plan domain ontology, which requires human assistance. As for annotation methods, documents can be annotated directly through domain ontology mapping [4], but ontology mapping technology is not yet mature; alternatively, Web information can first be converted into semantic documents and then annotated [5], but converting every document into a semantic resource is inefficient. Paper [7] proposes a method based on triple extraction, which is effective at extracting semantic features from documents. A method based on semantic role labeling with a maximum entropy classifier is proposed in paper [10]. Semantic annotation technologies are the key link in realizing the Semantic Web vision, yet semantic tools with good completeness and feasibility have not yet been developed. Although some experts and researchers have proposed annotation prototypes and carried out simulation experiments, semantic annotation for Chinese is still at the research stage.

This paper proposes a method that combines article level annotation with lexical level annotation, and triple extraction with combined triple extraction based on an SVM classifier, improving the efficiency and completeness of semantic annotation.

3.1 System Architecture

According to the analysis above, there are two types of annotation. Article level annotation treats the entire document as a unit: metadata is extracted from the document, and a relation is established between the ontology-based metadata annotation and the entire document, which implements article level semantic annotation. Lexical level annotation takes the segments and vocabulary of the document, rather than the whole article, as its annotation units; it extracts information from across the whole document and annotates it, and is thus a more detailed refinement of the article level. Lexical level annotation is semantic annotation in the true sense.

The Support Vector Machine (SVM) algorithm is widely used [6] and has been shown to be fast and efficient in text classification applications [8]. This paper proposes a model that combines an SVM classifier, RDF triple extraction and combined triple extraction; the system framework is shown in Figure 1. Because the work is based on a home appliance domain ontology, the domain ontology must be established before annotation and parsed into collections of classes, attributes, relationships, triples and so on, all of which are stored in the ontology repository for later use.
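
The parsing step can be sketched as follows. This is a minimal illustration rather than the authors' implementation: the ontology is assumed to be available already as (subject, predicate, object) triples, and the sample home appliance terms, the `parse_ontology` function, and the mapping of OWL datatype properties to "attributes" and object properties to "relationships" are assumptions made for the example.

```python
# Hypothetical home appliance ontology, given as (subject, predicate, object).
ONTOLOGY_TRIPLES = [
    ("Appliance", "rdf:type", "owl:Class"),
    ("Refrigerator", "rdf:type", "owl:Class"),
    ("Brand", "rdf:type", "owl:Class"),
    ("capacity", "rdf:type", "owl:DatatypeProperty"),
    ("producedBy", "rdf:type", "owl:ObjectProperty"),
    ("Refrigerator", "rdfs:subClassOf", "Appliance"),
    ("Refrigerator", "producedBy", "Brand"),
]

# Assumed mapping from OWL declarations to the collections named in the text.
DECLARATION_KINDS = {
    "owl:Class": "classes",
    "owl:DatatypeProperty": "attributes",
    "owl:ObjectProperty": "relationships",
}


def parse_ontology(triples):
    """Split ontology triples into the collections kept in the repository."""
    repository = {
        "classes": set(),
        "attributes": set(),
        "relationships": set(),
        "triples": [],
    }
    for subject, predicate, obj in triples:
        if predicate == "rdf:type" and obj in DECLARATION_KINDS:
            # A declaration: record the declared name in its collection.
            repository[DECLARATION_KINDS[obj]].add(subject)
        else:
            # Any other statement is kept as a triple for later matching.
            repository["triples"].append((subject, predicate, obj))
    return repository


repository = parse_ontology(ONTOLOGY_TRIPLES)
```

Keeping the non-declaration statements in a separate `triples` collection means that the later matching of extracted triples against the ontology becomes a simple membership test.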

In preprocessing Web resources, a filter is used to remove the resources that do not belong to the home appliance domain, and the remaining resources are stored in the home appliance domain repository.

The semantic annotation model performs the semantic annotation and stores the annotated resources. The home appliance domain resources correspond to the annotated documents; the mapping is one-to-one, and each can be accessed from the other. In the semantic annotation process, some preprocessing is performed first, such as tokenization, part-of-speech (POS) tagging, named entity recognition, metadata extraction, etc. Then the ontology concepts are used to train the SVM classifier, and the classified home appliance domain resources are annotated separately: metadata annotation at the article level, and semantic annotation of the whole document at the lexical level. Finally, metadata annotation documents and semantic annotation documents are created and stored in the annotation repository for later searching, retrieval, etc.
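
The paper does not say which SVM implementation or features are used, so the following is only a sketch of how a linear SVM could separate home appliance documents from other documents. It assumes bag-of-words features over already tokenized text and trains with a Pegasos-style sub-gradient method on the hinge loss, without a bias term; the function names, the toy training documents and the parameters `lam` and `epochs` are all hypothetical.

```python
def score(weights, tokens):
    """Dot product between the weight vector and a bag-of-words document."""
    return sum(weights.get(token, 0.0) for token in tokens)


def train_linear_svm(samples, lam=0.1, epochs=20):
    """Pegasos-style training: samples are (tokens, label), label is +1 or -1."""
    weights = {}
    step = 0
    for _ in range(epochs):
        for tokens, label in samples:
            step += 1
            eta = 1.0 / (lam * step)            # decreasing learning rate
            margin = label * score(weights, tokens)
            for token in weights:               # shrink: regularization term
                weights[token] *= 1.0 - eta * lam
            if margin < 1.0:                    # hinge loss is active
                for token in tokens:
                    weights[token] = weights.get(token, 0.0) + eta * label
    return weights


def predict(weights, tokens):
    """Return +1 for in-domain documents; a zero score counts as out of domain."""
    return 1 if score(weights, tokens) > 0 else -1


# Toy training data (assumed): +1 = home appliance domain, -1 = other.
TRAINING = [
    (["fridge", "washer", "price"], 1),
    (["football", "movie", "ticket"], -1),
    (["fridge", "compressor"], 1),
    (["movie", "actor"], -1),
]

weights = train_linear_svm(TRAINING)
in_domain = predict(weights, ["washer", "fridge"])
out_of_domain = predict(weights, ["football", "movie"])
```

In a fuller system the features could be restricted to terms that appear in the ontology, which would be one way to read "the ontology concepts are used to train the SVM classifier"; that reading is an assumption.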

3.2 Tokenization and Part of Speech Annotating

Chinese differs from English and has its own unique characteristics. Currently, the Chinese Lexical Analysis System ICTCLAS from the Institute of Computing Technology is a relatively mature software system for Chinese tokenization, part-of-speech (POS) tagging and named entity recognition, so this paper uses ICTCLAS for these three functions. The core algorithms of ICTCLAS are the shortest path algorithm and the hidden Markov model (HMM): the shortest path algorithm is used for tokenization, and the HMM is used for POS tagging. Its tokenization precision is 97.58% (according to a recent official evaluation under the national 973 project), its recall for unknown words recognized using role tagging is more than 90%, its recognition of Chinese names reaches nearly 98%, and its processing speed is 31.5 Kbytes/s. For these reasons, ICTCLAS is chosen as the tokenization and POS tool.
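
To illustrate the idea of shortest-path tokenization, the sketch below segments a sentence into the fewest possible dictionary words using dynamic programming over character positions. It is a simplified stand-in and not the ICTCLAS algorithm itself: every edge has the same cost, single characters are always allowed as a fallback, and the dictionary and the function name `shortest_path_segment` are assumptions. HMM-based POS tagging is not shown.

```python
def shortest_path_segment(sentence, dictionary):
    """Segment a sentence into the fewest words (unit-cost shortest path)."""
    n = len(sentence)
    max_len = max((len(word) for word in dictionary), default=1)
    # best[i] is the fewest words needed to cover sentence[:i].
    best = [0] + [float("inf")] * n
    back = [0] * (n + 1)
    for end in range(1, n + 1):
        for start in range(max(0, end - max_len), end):
            piece = sentence[start:end]
            # An edge exists for dictionary words and for any single character.
            if piece in dictionary or end - start == 1:
                if best[start] + 1 < best[end]:
                    best[end] = best[start] + 1
                    back[end] = start
    # Walk the back pointers from the end of the sentence to recover the words.
    words = []
    position = n
    while position > 0:
        words.append(sentence[back[position]:position])
        position = back[position]
    return words[::-1]


# Hypothetical mini dictionary; "调制" creates a competing, longer path.
DICTIONARY = {"空调", "制冷", "效果", "好", "调制"}
segments = shortest_path_segment("空调制冷效果好", DICTIONARY)
```

Here the path 空调 / 制冷 / 效果 / 好 uses four words, while any path through 调制 needs five, so the four-word segmentation is selected.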

3.3 Article Level Annotation

Metadata is data about data, used to describe other data. This paper defines the metadata describing a document as follows: title, author, keywords, literature source, abstract, etc. Article level annotation extracts this metadata from the document and is only a simple form of annotation. First, the collected Web data is filtered, and the domain ontology is parsed into collections of classes, attributes, relationships, triples and so on. Secondly, metadata resources are extracted according to the defined home appliance domain metadata, and an XML metadata document is created. Finally, the ontology concepts are used to train the SVM classifier, and the classified resources and XML metadata annotation documents are stored in the annotation repository. When home appliance domain resources need to be searched, users can retrieve the XML metadata annotation documents from the annotation repository and then find the home appliance domain resources through the relationship between each annotation document and its home appliance domain document. Users do not have to search the entire resource repository, which improves query efficiency.
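
A minimal sketch of article level annotation and retrieval is given below, assuming the metadata has already been extracted into a dictionary. The XML element names, the `resource` attribute that links an annotation back to its document, the example URLs and both function names are assumptions; the paper does not specify the XML schema.

```python
import xml.etree.ElementTree as ET

# Metadata fields named in the text; "source" stands for literature source.
METADATA_FIELDS = ("title", "author", "keywords", "source", "abstract")


def build_metadata_annotation(resource_uri, metadata):
    """Create an XML metadata annotation that points back to its document."""
    root = ET.Element("annotation", {"resource": resource_uri})
    for field in METADATA_FIELDS:
        if field in metadata:
            ET.SubElement(root, field).text = metadata[field]
    return ET.tostring(root, encoding="unicode")


def find_resources(annotation_repository, keyword):
    """Search only the annotation documents, then follow the link to the resource."""
    hits = []
    for xml_text in annotation_repository:
        root = ET.fromstring(xml_text)
        keywords = (root.findtext("keywords") or "").split(";")
        if keyword in [item.strip() for item in keywords]:
            hits.append(root.get("resource"))
    return hits


annotation_repository = [
    build_metadata_annotation(
        "http://example.org/docs/fridge-review.html",
        {"title": "Refrigerator review", "keywords": "refrigerator; energy saving"},
    ),
    build_metadata_annotation(
        "http://example.org/docs/tv-guide.html",
        {"title": "Television guide", "keywords": "television; screen"},
    ),
]
matches = find_resources(annotation_repository, "refrigerator")
```

Because each annotation document carries exactly one `resource` link, the one-to-one relationship between annotation and resource is preserved, and a query only has to scan the small annotation documents.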

3.4 Lexical Level Annotation

Lexical level annotation is semantic annotation in the true sense. It annotates the segments and vocabulary of the documents, extends article level annotation, and has more practical significance. The annotation process, shown in Figure 2, is as follows:


1. Preprocess the home appliance domain resources: tokenization, part-of-speech (POS) tagging and named entity recognition.

2. Judge whether triples need to be newly combined. If NO, continue with the next step; otherwise, jump to step 6.

3. After part-of-speech tagging, each sentence consists of one or more patterns such as NVN, NVNPN, etc.; extract the triples that follow the NVN pattern.

4. Match the triples extracted from the document resources against the triples parsed from the ontology model.

5. Judge whether the user is satisfied with the match results. If YES, jump to step 9; otherwise, jump to step 2 to carry out triple combination.

6. After tokenization and part-of-speech tagging, classify the words by POS into verbs and nouns, match the classified verbs and nouns against the ontology vocabulary, and filter out the words that do not belong to the ontology vocabulary.

7. Compute word frequency statistics on the filtered verbs and nouns, and obtain the high-frequency nouns and verbs separately.

8. Generate triples of the NVN pattern from the high-frequency words, match the newly generated triples against the triples parsed from the ontology model, and obtain the matching triples.

9. Carry out manual checking: delete the triples that the user does not want, and selectively add manual annotations.

Finally, produce the semantic annotation documents.
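
Steps 3 to 8 above can be sketched as follows. This is an illustrative simplification under several assumptions: the input is already tokenized and POS tagged as (word, tag) pairs with the hypothetical tags "n" for nouns and "v" for verbs, English words stand in for Chinese ones, the user's satisfaction check in step 5 is replaced by "at least one triple matched", and the preprocessing of step 1 and the manual checking of step 9 are omitted. All function names and sample data are hypothetical.

```python
from collections import Counter
from itertools import permutations


def extract_nvn(tagged):
    """Step 3: collect (noun, verb, noun) runs of three consecutive tokens."""
    return [
        (a[0], b[0], c[0])
        for a, b, c in zip(tagged, tagged[1:], tagged[2:])
        if (a[1], b[1], c[1]) == ("n", "v", "n")
    ]


def combine_triples(tagged, ontology_triples, top_k=3):
    """Steps 6 to 8: build NVN candidates from frequent ontology words."""
    noun_vocab = {s for s, _, _ in ontology_triples} | {o for _, _, o in ontology_triples}
    verb_vocab = {p for _, p, _ in ontology_triples}
    # Step 6: keep only nouns and verbs that belong to the ontology vocabulary.
    nouns = Counter(w for w, tag in tagged if tag == "n" and w in noun_vocab)
    verbs = Counter(w for w, tag in tagged if tag == "v" and w in verb_vocab)
    # Step 7: take the most frequent nouns and verbs separately.
    top_nouns = [w for w, _ in nouns.most_common(top_k)]
    top_verbs = [w for w, _ in verbs.most_common(top_k)]
    # Step 8: generate NVN candidates and keep those present in the ontology.
    candidates = {(s, v, o) for s, o in permutations(top_nouns, 2) for v in top_verbs}
    return sorted(candidates & set(ontology_triples))


def annotate(tagged, ontology_triples):
    """Steps 3 to 5: try direct extraction first, then fall back to combining."""
    matched = sorted(set(extract_nvn(tagged)) & set(ontology_triples))  # step 4
    return matched if matched else combine_triples(tagged, ontology_triples)


ONTOLOGY = [
    ("refrigerator", "has", "compressor"),
    ("compressor", "cools", "cabinet"),
]
direct = annotate(
    [("refrigerator", "n"), ("has", "v"), ("compressor", "n")], ONTOLOGY
)
combined = annotate(
    [("compressor", "n"), ("quietly", "d"), ("cools", "v"), ("cabinet", "n")], ONTOLOGY
)
```

In the second sentence the adverb breaks the NVN run, so direct extraction finds nothing and the combination path recovers the triple from the ontology vocabulary instead.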

4. Conclusion

At present, semantic annotation for Chinese is still at the research stage, and a fully automatic annotation method that covers the entire Web has not yet emerged. This paper proposes a semantic annotation prototype based on a home appliance domain ontology. It combines article level with lexical level annotation, and triple extraction with combined triple extraction, which improves the completeness of semantic annotation; manual correction is added, which improves annotation accuracy. Semantic annotation is useful in intelligent retrieval and similar applications, and its development is of profound significance to other related domains, so it deserves further in-depth research.

References

[1] Tim Berners-Lee. Semantic Web Road Map [EB/OL]. http://www.w3.org/DesignIssues/Semantic, 1998-10

[2] Semantic Web Annotation and Authoring Community homepage [DB/OL]. http://annotation.semanticweb.org/current 2004-04-18

[3] Shu-mei Liao. Comments on Ontology Based Semantic Annotation Prototypes [J]. Computer Engineering & Science, Vol. 28, No. 9, 2006

[4] Nian-Yun Shi, Chen Yang. Towards domain ontology-based semantic annotation research [J]. Computer Engineering and Design, Dec 2007, Vol. 28, No. 24

[5] Guo-Bing Zhou. Research and Realization of Semantic Search Engine Technology Based On Domain Ontology [M]. ShangHai University. 2006.12.01

[6] SVM Application List, 2005. Available from URL: http://www.clopinet.com/isabelle/Projects/SVM/applist.html

[7] He Hu, Xiaoyong Du. ConAnnotator: Ontology Aided Collaborative Annotation System. Proceedings of the 10th International Conference on Computer Supported Cooperative Work in Design. 2006

[8] K. Crammer, Y. Singer. On the algorithmic implementation of multi-class kernel based vector machines, Journal of Machine Learning Research, 2:265-292, 2001.

[9] Mei Li. Teaching Plan Domain Ontology Construction and Teaching Plan Resource Annotation Based On OWL [M]. Southeast University. 2005

[10] Ting Liu, Wan-Xiang Che, Sheng Li. Semantic Role Labeling with Maximum Entropy Classifier. Journal of Software, Vol.18, No.3, March 2007


Author Introduction:

1. Yong Liu, Associate Professor, Directions of Research: Network and Semantic Web. Tel: 13906426319, email: liuyong0202@sohu.com. Address: 506 mail box of Qingdao University of Science and Technology, SongLing Road, Gaoxin Area, Qingdao, ShanDong, 266061

2. Zhen-zhen Li, Master, Directions of Research: Semantic Web, Semantic Annotation, and Intelligent Search Engine. Tel: 13791988843, e-mail: lzz261575820@163.com