Question Answering System Based on Ontology and Semantic Web

Qinglin Guo and Ming Zhang
Department of Computer Science and Technology
Peking University, Beijing 100871, China
qlguo88@sohu.com, mz@163.com
Abstract.
The Semantic Web and ontologies are key technologies for Question
Answering (QA) systems. Ontology is becoming the pivotal methodology for
representing domain-specific conceptual knowledge in order to improve
the semantic capability of a QA system. In this paper we present a QA
system in which the domain knowledge is represented by means of an
ontology. In addition, a Chinese natural language human-machine interface
is implemented, mainly through a natural language (NL) parser. An initial
evaluation shows the feasibility of building such a semantic QA system
based on an ontology, the effectiveness of personalized semantic QA, the
extensibility of the ontology and knowledge base, and the possibility of
self-produced knowledge based on semantic relations in the ontology. The
experiments indicate that the method is a feasible basis for developing a
QA system and that it merits further study.
Keywords: WWW, Ontology, Semantic Web, Question Answering.
1 Introduction
Semantic Web technologies bring new benefits to knowledge-based Question
Answering (QA) systems. In particular, ontology is becoming the pivotal
methodology for representing domain-specific conceptual knowledge in order
to improve the semantic capability of a QA system. Research on QA has
advanced over the past few years, driven particularly by TREC-QA [1]. The QA
competitions focus on open-domain systems that can potentially answer any
generic question. In contrast, a QA system working on a specific technical
domain can use the domain-dependent terminology to recognize the true
meaning of a segment of natural language text. Terminology therefore plays
a pivotal role in a technical domain such as Java programming. A great deal
of work has been done on representing domain-specific concepts and
terminology by means of ontologies [2]. Recent advances in knowledge
representation with the Semantic Web and ontologies have shown that this
methodology can improve the semantic capability of a QA system.
The Semantic Web is a Web that includes documents, or portions of
documents, describing explicit relationships between things and containing
semantic information intended for automated processing by machines. It
operates on the principle of shared data. When you define what a particular
type of data is, you can link it to other pieces of data and say “that’s the
same”, or state some other relation. Although it gets more complicated than
this, that is basically what the Semantic Web is all about: sharing data
through ontologies and processing it logically. Trust is also important, as
the decision to trust a given source rests entirely with the user. Although
the Semantic Web is a Web of data, it is intended primarily for humans; it
uses machine processing and databases to take away some of the burdens we
currently face so that we can concentrate on the more important things that
we can use the Web for.
For example, recent research in information processing has focused on
health care consumers [3]. These users often experience frustration while
seeking online information, due to their lack of understanding of medical
concepts and unfamiliarity with effective search strategies. We are
exploring the use of semantic relationships as a way of addressing these
issues. Semantic information can guide the lay health consumer by suggesting
concepts not overtly expressed in an initial query. We present an analysis
of semantic relationships that were manually extracted from questions asked
by health consumers as well as answers provided by physicians. Our work
concentrates on samples from Ask-the-Doctor Web sites. The Semantic Network
from the Unified Medical Language System (UMLS) [4] served as a source for
semantic relationship types, and this inventory was modified as we gained
experience with relationship types identified in the texts. A semantic
relationship associates two concepts expressed in text and conveys a meaning
connecting those concepts. A large variety of such relationships have been
identified in several disciplines, including linguistics, philosophy,
computer science, and information science. Some researchers have organized
hierarchies of semantic relationships into meaningful but not formal
structures.
G. Wang et al. (Eds.): RSKT 2008, LNAI 5009, pp. 652–659, 2008.
© Springer-Verlag Berlin Heidelberg 2008
2 Semantic Web and Agent-Based Semantic Web Services Query
Making the Web more meaningful and open to manipulation by software
applications is the objective of the Semantic Web initiative. Knowledge
representation and logical inference techniques form the backbone.
Annotations expressing meaning help software agents to obtain semantic
information about documents [5]. For annotations to be meaningful for both
their creator and their user, a shared understanding of precisely defined
annotations is required. Ontologies, the key to a semantic Web, express
terminologies and semantic properties and create shared understanding. Web
ontologies can be defined in DAML+OIL, an ontology language based on XML
and RDF/RDF Schema. Some effort has already been made to exploit Semantic
Web and ontology technology for the software engineering domain [6]. DAML-S
is a DAML+OIL ontology for describing properties and capabilities of Web
services, which shows the potential of this technology for software
engineering. Formality in the Semantic Web framework facilitates machine
understanding and automated reasoning. DAML+OIL is equivalent to a very
expressive description logic [7]. This fruitful connection provides
well-defined semantics and reasoning systems. In the conventional Web
Services approach exemplified by WSDL, or even by DAML Services, the
communicative intent of a message is not separated from the application
domain. This is at odds with the convention from the multi-agent systems
world, where there is a clear separation between the intent of a message,
which is expressed using an agent communication language, and the
application domain. This separation between intent and domain is beneficial
because it reduces the brittleness of a system. If the character of the
application domain changes, then only the component that deals with the
domain-specific information needs to change; the agent communication
language component remains unchanged.
When the service in the QA example is invoked, the value of the input
parameter should be an instance of the class restriction that is given as
the input parameter type in both the profile and the process descriptions.
For the various query performatives, this input parameter contains the
query expression that would be contained in the message content in a
conventional agent-based system. However, there is as yet no standard query
language for RDF, DAML+OIL or OWL, although there are several under
development, including DAML Rules [8,9]. As an example, the domain ontology
that we have designed for this application is centred on events and reports
of events. We have taken the approach that communication in the system will
be about these events and reports, so the queries can be expressed using
the anonymous resource technique, that is, by specifying the properties
that the report must possess. It should be noted, however, that we did not
specifically design the ontology in this report to circumvent the
expressive limitations of our chosen query language; rather, the query
language was chosen because it was appropriate for use with the domain
ontology that we had already designed.
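The anonymous resource technique can be illustrated with a minimal sketch. It is not the query language used by the system; it only shows the idea of describing a wanted report by the properties it must possess and matching that description against stored statements. The triples and the names `ex:Report`, `ex:describes` and `ex:author` are hypothetical.

```python
# Statements are (subject, property, value) triples.
# All identifiers below are hypothetical and used only for illustration.
TRIPLES = [
    ("report1", "rdf:type", "ex:Report"),
    ("report1", "ex:describes", "event1"),
    ("report1", "ex:author", "alice"),
    ("report2", "rdf:type", "ex:Report"),
    ("report2", "ex:describes", "event2"),
    ("report2", "ex:author", "bob"),
    ("event1", "rdf:type", "ex:Event"),
]


def match_anonymous_resource(triples, required):
    """Return the subjects that carry every (property, value) pair in `required`."""
    properties = {}
    for subject, prop, value in triples:
        properties.setdefault(subject, set()).add((prop, value))
    wanted = set(required.items())
    return sorted(s for s, pairs in properties.items() if wanted <= pairs)


# The query names no resource; it only lists the properties a report must have.
query = {"rdf:type": "ex:Report", "ex:describes": "event1"}
matches = match_anonymous_resource(TRIPLES, query)
print(matches)
```

Because the query is just a set of required properties, adding a new property to the ontology does not change the matching code, only the data and the query.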
3 The Stochastic Syntax-Parse Model Named LSF of Knowledge-Information in QAS
Local context information has long been regarded as an important means of
word sense disambiguation (WSD) in sentence structure analysis [10].
However, in traditional rule-based language models that assign
probabilities to rules, the probability of a grammar production is
determined only by the non-terminals and is independent of the lexical
items in the parse tree. This lack of lexical information makes such
probability models inadequate for describing linguistic phenomena.
Therefore, QAS adopts the stochastic syntax-parse model named LSF.
Here we describe a basic dependency-based probability model, which for
convenience we call the lexical semantic frame (LSF for short [11]).
Suppose that an LSF is an analysis of the word string $s = w_i \ldots w_j$,
and let $SR(R, h, w_i)$ denote that the word $w_i$ in the LSF depends on
the word $h$ through the semantic relation $R$. We can then write
$SR(i) = SR(R, h, w_i)$. The semantic probability
$p(SR(i) \mid h, w_i)$ between words is analyzed on the basis of this
model. The model assumes a strong association between the dependency
relation $R$ and the Hyponym node, so that the data sparseness problem is
less severe. We can therefore assign the LSF an analysis probability given
$w_i \ldots w_j$. Unlike the rule-based probability model, the parameters
of a probability model based on lexical association are usually obtained by
supervised training on a tagged corpus. The reasons that we use both the
words in the corpus and their Hyponym part-of-speech (POS) information to
estimate $P(LSF \mid w_i \ldots w_j)$ are:
(1) Vocabulary information plays a vital role in a QA system.
(2) Given the limited size of the corpus, a word is unlikely to recur often
enough for sentence analysis, so the statistics must be smoothed [12].
Vocabulary information needs to be generalized (“magnified”) with the help
of the Hyponym part of speech in order to reduce data sparseness. Closed
word classes such as prepositions or adverbs, however, use the statistics
of the words themselves. Parameter smoothing techniques may also be used.
During analysis, the dynamic programming, pruning and probability
computation processes are similar to those of the rule-based probability
model. If two analyses in the same chart cell have the same attribute
structure, the one with the lower probability is discarded and does not
take part in the subsequent analysis and combination steps.
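The pruning rule described above can be sketched as follows. This is an illustrative sketch rather than the system's parser: the `Analysis` record, its fields and the probabilities are hypothetical, and the attribute structure is reduced to a plain string.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Analysis:
    span: tuple         # (start, end) word indices covered by the analysis
    attributes: str     # stand-in for the attribute structure of the constituent
    probability: float  # probability assigned by the model (made-up values here)
    label: str          # human-readable description, used only for display


def prune_cell(analyses):
    """Keep only the most probable analysis for each (span, attributes) pair."""
    best = {}
    for analysis in analyses:
        key = (analysis.span, analysis.attributes)
        if key not in best or analysis.probability > best[key].probability:
            best[key] = analysis
    return list(best.values())


# Two analyses share an attribute structure, so only the likelier one survives.
cell = [
    Analysis((1, 5), "VP", 0.003, "modifier attaches to the noun"),
    Analysis((1, 5), "VP", 0.0001, "modifier attaches to the verb"),
    Analysis((1, 5), "NP", 0.002, "different attribute structure"),
]
survivors = prune_cell(cell)
for analysis in survivors:
    print(analysis.attributes, analysis.probability, analysis.label)
```

Discarding the weaker analysis at each cell keeps the number of partial analyses that enter later combination steps small, at the cost of never revisiting a discarded alternative.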
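Suppose that we input the sentence “She eats pizza without anchovies” into
QAS. The two candidate analyses $T_1$ and $T_2$ then have the probabilities:
$P(T_1) = P(\mathrm{AGT} \mid \mathrm{eat}, \mathrm{she}) \, P(\mathrm{OBJ} \mid \mathrm{eat}, \mathrm{pizza}) \, P(\mathrm{MOD} \mid \mathrm{pizza}, \mathrm{anchovies})$ (1)
$P(T_2) = P(\mathrm{AGT} \mid \mathrm{eat}, \mathrm{she}) \, P(\mathrm{OBJ} \mid \mathrm{eat}, \mathrm{pizza}) \, P(\mathrm{MOD} \mid \mathrm{eat}, \mathrm{anchovies})$ (2)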
Suppose that the relevant model parameters have been obtained from corpus
statistics, as in Table 1. Then:
$P(T_1) = 0.0025 \times 0.002 \times 0.003 = 1.5 \times 10^{-8}$
$P(T_2) = 0.0025 \times 0.002 \times 0.0001 = 5 \times 10^{-10}$
$T_1$ is therefore chosen as the correct analysis. If we replace
“anchovies” with “hesitation”, then $P(T_1) = 5 \times 10^{-10}$ and
$P(T_2) = 4 \times 10^{-9}$, so $T_2$ is preferred. The language model thus
helps us to choose a sound analysis as the words in the sentence change,
which is its main merit.
Table 1. Interrelated model parameters
Parameter                      Value
P(AGT | eat, she)              0.0025
P(OBJ | eat, pizza)            0.002
P(MOD | pizza, anchovies)      0.003
P(MOD | eat, anchovies)        0.0001
P(MOD | pizza, hesitation)     0.0001
P(MOD | eat, hesitation)       0.0008
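The comparison of the two analyses can be reproduced from the Table 1 parameters with a short sketch. It assumes, as in Equations (1) and (2), that the probability of an analysis is the product of its relation probabilities; the function and variable names are illustrative.

```python
from math import prod

# Parameters copied from Table 1, keyed as (relation, head, dependent).
PARAMS = {
    ("AGT", "eat", "she"): 0.0025,
    ("OBJ", "eat", "pizza"): 0.002,
    ("MOD", "pizza", "anchovies"): 0.003,
    ("MOD", "eat", "anchovies"): 0.0001,
    ("MOD", "pizza", "hesitation"): 0.0001,
    ("MOD", "eat", "hesitation"): 0.0008,
}


def parse_probability(relations):
    """Multiply the probabilities of all semantic relations in one analysis."""
    return prod(PARAMS[relation] for relation in relations)


def candidate_parses(modifier):
    """T1 attaches the modifier to 'pizza'; T2 attaches it to 'eat'."""
    shared = [("AGT", "eat", "she"), ("OBJ", "eat", "pizza")]
    return {
        "T1": shared + [("MOD", "pizza", modifier)],
        "T2": shared + [("MOD", "eat", modifier)],
    }


def best_parse(modifier):
    """Return the name of the most probable analysis and all scores."""
    scores = {
        name: parse_probability(relations)
        for name, relations in candidate_parses(modifier).items()
    }
    return max(scores, key=scores.get), scores


for word in ("anchovies", "hesitation"):
    winner, scores = best_parse(word)
    print(word, winner, scores)
```

With these parameters the sketch selects T1 for “anchovies” and T2 for “hesitation”, which is the lexical sensitivity the model is meant to provide.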
4 Explaining Answers from the Semantic Web
The Semantic Web aims to enable applications to generate portable and
distributed justifications for any answer they produce. Users need to
decide when to trust answers before they can use those answers with
confidence. We believe that the key to trust is understanding. Explanations
of knowledge provenance and derivation history can be used to provide that
understanding [13]. In one simple case, users may need to inspect
information contained in the deductive proof trace that was used to derive
implicit information before they trust the system's answer. Some users will
decide to trust the deductions if they know what reasoner was used to
deduce answers and what data sources were used in the proof. Other users
may need additional information, including how an answer was deduced,
before they decide to trust the answer. Users may also obtain information
from hybrid and distributed systems, and they may need help integrating
answers and solutions. Inference Web addresses the issues of knowledge
provenance with its registry infrastructure, called Semantic Web Ontology
[14]. It addresses the issues concerned with inspecting proofs and
explanations with its browser, and the issues of explanations with its
language axioms and rewrite rules.
In order to present the findings, the analyst may need to defend the
conclusions by exposing the reasoning path used, along with the source of
the information. In order to reuse previous work, the analyst will also
need to decide whether the source information and assumptions used
previously are still valid. Inference Web includes a new explanation
dialogue component that was motivated by usage observations. The goal is to
present a simple format that is a typical abstraction of useful information
supporting a conclusion. The current instantiation provides a presentation
of the question and answer, the ground facts on which the answer depended,
and an abstraction of the meta-information about those facts. There is also
a follow-up action option that allows users to browse the proof or
explanation, obtain the assumptions that were used, get more information
about the sources, provide input to the system, etc.
5 Implementation of QAS
Our automatic question answering system includes three models: a semantic
comprehension model for questions, based on Ontology and the Semantic Web;
an FAQ-based question similarity matching model; and a
document-warehouse-based automatic answer fetching model. The semantic
comprehension model combines several natural language processing
techniques, including Ontology and Semantic Web processing, segmentation
and part-of-speech tagging, identification of the question type, extraction
and expansion of keywords, and identification of the knowledge unit.
Through these steps the intention of the user is captured, which greatly
helps the later stages of the system. The FAQ-based question similarity
matching model is implemented with a semantic sentence similarity
computation that our system improves; this model can answer frequently
asked questions quickly and concisely. The document-warehouse-based
automatic answer fetching model first pre-processes the document warehouse
and constructs an inverted index, then uses an efficient information
retrieval model to search the warehouse and return relevant documents, and
finally uses an answer extraction technique to obtain the answer from these
documents and present it to the user. For questions that cannot be answered
from the FAQ base, this model can automatically return an exact answer
quickly.
The document repository pre-processing module includes Web page crawling,
HTML format filtering, segmentation, tagging and so on. We obtain a
term-document matrix by computing word frequencies. This matrix is then
analyzed to derive our latent semantic structure model for later document
retrieval and passage retrieval. The question analysis module is important
to the QA system. Given a question, the system generates a number of
weighted rewrite strings and then transforms the query into a vector using
those weighted rewrite strings. This module places emphasis on question
classification. The system classifies a query into predefined classes based
on the type of answer it is looking for, and then uses the question type to
identify a candidate answer within the retrieved sentences. The answer
extraction module includes document retrieval, passage retrieval and answer
matching. The system provides a variable method to calculate weights and
sorts the answers by weight. Finally, the answer is restricted to at most
50 words and returned to the user.
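The retrieval steps named above (inverted index, frequency-based ranking, 50-word answer limit) can be sketched in a few lines. This is a simplified illustration under stated assumptions, not the system's implementation: the mini-corpus is invented, tokenization is whitespace splitting rather than Chinese segmentation, and ranking uses summed term frequencies instead of the latent semantic structure model.

```python
from collections import Counter, defaultdict

# Hypothetical mini-corpus standing in for the document warehouse.
DOCS = {
    "d1": "an interface declares methods that a class must implement",
    "d2": "a class can implement more than one interface in java",
    "d3": "garbage collection frees memory that is no longer referenced",
}


def build_index(docs):
    """Build an inverted index mapping each term to {doc_id: term frequency}."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for term, freq in Counter(text.lower().split()).items():
            index[term][doc_id] = freq
    return index


def retrieve(index, query):
    """Rank documents by the summed frequency of the query terms they contain."""
    scores = Counter()
    for term in query.lower().split():
        for doc_id, freq in index.get(term, {}).items():
            scores[doc_id] += freq
    return [doc_id for doc_id, _ in scores.most_common()]


def truncate_answer(text, max_words=50):
    """Restrict an answer to at most `max_words` words."""
    return " ".join(text.split()[:max_words])


index = build_index(DOCS)
ranked = retrieve(index, "interface methods")
answer = truncate_answer(DOCS[ranked[0]])
print(ranked, answer)
```

The rows of the index, read term by term, are the word-frequency entries of a term-document matrix; a latent semantic analysis step would factorize that matrix before ranking.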
The QAS focuses on the key techniques of question answering based on
pattern knowledge [15]. We designed and implemented the question answering
system and took part in the evaluation of the Text Retrieval Conference. We
also applied the pattern matching technique to a new, related research
area, reading comprehension, and obtained satisfactory results. The key
task in implementing the pattern matching technique is to construct a good
pattern knowledge base. We put forward a novel question classification
hierarchy that is based on answer type and question pattern. It retains the
semantic and structural information of questions. We use the questions in
the FAQ base as our training and test data. The answer patterns for
different question types are studied and evaluated automatically. We have
implemented pattern learning for questions with complex structure. It is
more effective and reliable to extract the correct answer with answer
patterns containing multiple question terms. For higher precision, we apply
semantic restrictions to candidate answers that are extracted by answer
patterns. We generalize answer patterns using named entity information,
which gives the answer patterns better coverage; the constituent elements
of an answer pattern contain both morphological and semantic information,
which improves robustness. We evaluate all the answer patterns by the
concepts of confidence and support, which are borrowed from data mining.
Answer patterns with higher confidence lead to answers with greater
reliability. Table 2 shows the experimental results of QAS.
Table 2. Experimental results of QAS
Questions   Answered correctly   Answered incorrectly   No response   Accuracy (%)   Recall (%)
2000        1641                 198                    161           82.05          91.95
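The paper does not state how Accuracy and Recall are computed. The sketch below is an inferred reading, adopted only because it reproduces both figures in Table 2: accuracy as the share of all questions answered correctly, and recall as the share of all questions that received any answer.

```python
def accuracy_percent(correct, total):
    """Share of all questions that were answered correctly (assumed definition)."""
    return 100 * correct / total


def recall_percent(correct, incorrect, total):
    """Share of all questions that received any answer (assumed definition)."""
    return 100 * (correct + incorrect) / total


# Counts copied from Table 2.
total, correct, incorrect, no_response = 2000, 1641, 198, 161
assert correct + incorrect + no_response == total

accuracy = accuracy_percent(correct, total)
recall = recall_percent(correct, incorrect, total)
print(round(accuracy, 2), round(recall, 2))
```

Under this reading, the gap between the two figures is exactly the share of questions answered incorrectly.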
6 Conclusions
An initial evaluation was performed on our QA system, focusing on four
aspects: the feasibility of building such a semantic QA system based not on
traditional natural language text but on an ontology, the effectiveness of
personalized semantic QA, the extensibility of the ontology and knowledge
base, and the possibility of self-produced knowledge based on semantic
relations in the ontology. The test set includes 100 questions sampled from
a set of questions asked by students in a one-semester programming course,
excluding questions about reading a segment of a program, writing a small
program to implement a function and so on, which are beyond the ability of
a QA system. All 100 questions were checked to be within the scope covered
by the system. Given the scale of the initial evaluation, we do not
distinguish between no answer and a false answer. These two situations are
regarded as the same: no answer.
The initial evaluation result shows the feasibility of building a semantic
QA system based on Ontology and the Semantic Web. Personalized answering
based on a user model helps to focus the user's attention on fresh learning
material. A user can get direct answers to some questions through semantic
QA, which shows the effectiveness of the system. Among the no-answer cases,
a large proportion can be resolved by simply adding the answer to the
knowledge ontology without conflicting with the semantic relations defined
in the ontology, which shows good extensibility. Only a small proportion
require the ontology itself to be expanded, in which case ontology
consistency must be ensured. How can the possibility of self-produced
knowledge based on semantic relations in the ontology be shown? A simple
example is that the property require is transitive, so if the facts that
document A requires document B and document B requires document C are
stated in the knowledge ontology, a new relation, document A requires
document C, is self-produced by the system's inference. Afterwards, the
updated inter-dependency between documents can bring a new answer to a
question. The experiments indicate that it is feasible to use the method
based on Ontology and the Semantic Web to develop a Question Answering
System, and that the method merits further study.
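The transitive-property example can be made concrete with a small sketch. It is an illustration only, not the system's reasoner: it computes the transitive closure of a set of stated requires facts and reports the facts that were not stated explicitly.

```python
def transitive_closure(pairs):
    """Return all (x, z) pairs derivable from `pairs` by transitivity."""
    closure = set(pairs)
    changed = True
    while changed:
        changed = False
        for a, b in list(closure):
            for c, d in list(closure):
                # a requires b and b requires d together imply a requires d.
                if b == c and (a, d) not in closure:
                    closure.add((a, d))
                    changed = True
    return closure


# Stated facts: document A requires document B, and B requires C.
stated = {("A", "B"), ("B", "C")}
inferred = transitive_closure(stated) - stated
print(inferred)
```

The inferred pair is the self-produced relation from the example: document A requires document C.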
Acknowledgments
We would like to acknowledge the support from the National Natural Science
Foundation of China (90412010, 70572090), the National High Technology
Research and Development Program (863 Program in China: 2004AA1Z2450), HP
Labs China under “On line course organization”, and NSCF Grant #60573166.
References
1. Voorhees, E.M.: The TREC Question Answering Track. Natural Language
Engineering 17, 361–378 (2006)
2. Laura, A., Thomas, C.: Semantic representation of consumer questions and
physician answers. Int. J. of Meth. Inform. 11, 513–529 (2006)
3. Lindberg, D., Humphrey, B., McCray, C.: The Unified Medical Language
System. Int. J. Meth. Inform. Med. 32, 281–289 (2003)
4. McCray, C., Hole, W.: The scope and structure of the first version of the
UMLS Semantic Network. Int. J. Annu. Symp. Comput. 16, 126–130 (2006)
5. W3C Semantic Web Activity (2002), http://www.w3.org/sw
6. Paolucci, M., Kawamura, K., Sycara, K.: Semantic Matching of Web Services
Capabilities. In: Cruz, I., Decker, S., Allemang, D., Preist, C., Schwabe,
D., Mika, P., Uschold, M., Aroyo, L.M. (eds.) ISWC 2006. LNCS, vol. 4273,
p. 279. Springer, Heidelberg (2006)
7. Baader, F., McGuiness, D., Schneider, P.P.: The Description Logic
Handbook. Cambridge University Publishers, Baader, England (2003)
8. Decker, S.: DAML Rules: An RDF Query, Inference and Transformation
Language (2007),
http://wwwdb.stanford.edu/stefan/daml/2007/07/03/rules/damlrules.ps
9. Paolucci, M., Kawamura, T., Payne, T.R.: Semantic matching of Web
services capabilities. In: Horrocks, I., Hendler, J. (eds.) ISWC 2002.
LNCS, vol. 2342, pp. 206–211. Springer, Heidelberg (2002)
10. Collins, M.: A New Statistical Parser Based on Bigram Lexical
Dependencies. In: Proceedings ACL 2006 - 44th Annual Meeting of the ACL,
Jose, America, pp. 184–191 (2006)
11. Terje, B., John, A.: Natural language analysis for semantic document
modeling. Int. J. of Data and Know. Eng. 28, 45–62 (2005)
12. Shichao, A., Mnhammed, J.: RMining Multiple Data Sources: Local Pattern
Analysis. Int. J. of Data Mining and Knowledge Discovery 12, 121–125 (2006)
13. McGuinness, D.L.: Trusting answers on the web. New Directions in Question
Answering, Berlin, Germany (2005)
14. Lambrix, P.: Evaluation of ontology development tools for
bioinformatics. Int. J. of Bioinformatics 19, 1564–1571 (2005)
15. Eilbeck, K.: The sequence ontology: a tool for the unification of genome
annotations. Int. J. Genome Biol. 6, 44–49 (2005)