Pavlin Dobrev <p.dobrev@prosyst.com>
Scientific Advisor: Galia Angelova
Linguistic Modeling Department,
Institute for Parallel Processing,
Bulgarian Academy of Sciences
Knowledge representation, modeling, acquisition and management with applications in Natural Language Processing (PhD Thesis proposal)
The concept of knowledge
• Meaning of knowledge in the PhD thesis
  – Linguistic knowledge
  – Knowledge related to the particular domain: concept type hierarchy, relation type hierarchy, definitions
• Formalisms for knowledge representation
  – Conceptual graphs
  – Ontologies/hierarchical structures
Current results and publications
• Workgroup for conceptual graphs:
  – Formats for knowledge representation
  – Representation and modeling of knowledge with conceptual graphs
  – Editing of knowledge represented with conceptual graphs
  – Architectures of applications for processing conceptual graphs
• Extraction of conceptual graphs from controlled English
• Semantic Web and natural language processing
• Use of conceptual graphs for semantic annotations in the Semantic Web
• Integration of applications for processing conceptual graphs
Conceptual Graph
• As defined in the Conceptual Graph Standard, a conceptual graph (CG or graph) is an abstract representation of logic with nodes called concepts and conceptual relations, linked together by arcs
• They express meaning in a form that is:
  – logically precise
  – humanly readable
  – computationally tractable
• In CGWorld, a conceptual graph is any collection of concepts and relations linked by their appropriate arrows or co-referent links
Representation of the sentence "John is going to Boston by bus" as a conceptual graph
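The figure for this slide did not survive extraction. As a rough illustration only, the sentence can be modelled as concepts linked by conceptual relations. The type labels (Go, Person, City, Bus) and relation labels (Agnt, Dest, Inst) follow the usual textbook rendering of this sentence and are assumptions here, as are the class and function names, which are hypothetical and not part of CGWorld.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Concept:
    type_label: str
    referent: str = ""  # empty referent = generic concept

    def linear(self) -> str:
        # Linear-form box: [Type] or [Type: Referent]
        if self.referent:
            return f"[{self.type_label}: {self.referent}]"
        return f"[{self.type_label}]"


@dataclass(frozen=True)
class Relation:
    type_label: str
    source: Concept
    target: Concept

    def linear(self) -> str:
        return f"{self.source.linear()} -> ({self.type_label}) -> {self.target.linear()}"


def to_linear(graph):
    """Render every relation of the graph in a simple linear notation."""
    return [relation.linear() for relation in graph]


# Assumed labels for "John is going to Boston by bus".
go = Concept("Go")
john = Concept("Person", "John")
boston = Concept("City", "Boston")
bus = Concept("Bus")

graph = [
    Relation("Agnt", go, john),
    Relation("Dest", go, boston),
    Relation("Inst", go, bus),
]

for line in to_linear(graph):
    print(line)
```

The three relations share the same Go concept, which is what makes them one graph rather than three separate statements.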
CGWorld – from Conceptual Graph Theory to the Implementation
CGWorld was first introduced at ICCS 2000. Further development was presented at ICCS conferences during the following years. The main goals followed in the design and development of the CGWorld workbench are:
• to allow for collaborative, distributed acquisition and editing of a CG knowledge base;
• to provide easy search and navigation in a large KB;
• to maintain different representation languages, thus accommodating the needs of different users of CGWorld and the different applications the KB of CGs is used in;
• to provide a graphical editor and viewer for CGs that is easy to use by non-experts in CG theory;
• to integrate and add Web access to previously developed CG applications, written in different programming languages.
High level architecture
Conceptual Graph editor
Primary Market is defined as Financial Market where newly issued financial instruments are traded.
Main features of the CGWorld Editor
• Portable across all platforms (it has been tested with the most popular browsers: Opera, Mozilla, Netscape and Internet Explorer);
• Any number of graph windows may be opened for editing;
• Concepts, relations, arcs, coreference links and contexts are supported for editing via a simple drag-and-drop interface;
• Ability to customize the color, position and size of conceptual objects;
• Ability to assign any number of additional properties to the conceptual objects (e.g. number, definite marker, comment);
• Zooming capability;
• Storing and retrieving of conceptual graphs to/from the application server.
Conceptual Graph Knowledge Base (Visual and CGIF)
A convertible bond is one which is convertible into the company common stock
When a bond is converted to common stock, the corporate debt is reduced
A bond is converted into common stock
Formats for knowledge representations (1/2)
Bond is a security which represents debt of corporation
Different formats of knowledge representation:
• CGIF
• FOL
• CGLex
• CGXML
Formats for knowledge representations (2/2)
Different formats of knowledge representation
• NL: A bond is converted into common stock.
• FOL: exists(A1,exists(A0,convert_into(A0,A1) & bond(A0) & common_stock(A1)))
• CGLex:
cgc(55,simple,'bond',[fs(num,sing)],[]).
cgc(53,simple,'common_stock',[fs(num,sing)],[]).
cg(155,[cgr(convert_into, [55, 53], _)], none,
   [fs(kind,'body_of_context'),
    fs(comment,'A bond is converted into common stock')]).
• XML:
<relation type="convert_into">
  <concept type="bond">
    <number type="single" />
  </concept>
  <concept type="common_stock">
    <number type="single" />
  </concept>
</relation>
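To make the relationship between these formats concrete, here is a minimal sketch that generates the FOL string and the XML fragment of this slide from one relation and its two concept types. It is not the conversion code of CGWorld; the function names are hypothetical, and the singular number marker is hard-coded as an assumption taken from this single example.

```python
import xml.etree.ElementTree as ET


def to_fol(relation, concept_types):
    """Build an existentially quantified formula in the style of the slide."""
    variables = [f"A{i}" for i in range(len(concept_types))]
    body = f"{relation}({','.join(variables)})"
    for variable, concept_type in zip(variables, concept_types):
        body += f" & {concept_type}({variable})"
    # Wrap one quantifier per variable; the last variable ends up outermost.
    for variable in variables:
        body = f"exists({variable},{body})"
    return body


def to_xml(relation, concept_types):
    """Build the nested relation/concept/number elements as an XML string."""
    root = ET.Element("relation", {"type": relation})
    for concept_type in concept_types:
        concept = ET.SubElement(root, "concept", {"type": concept_type})
        # Assumption: every concept is singular, as in the slide's example.
        ET.SubElement(concept, "number", {"type": "single"})
    return ET.tostring(root, encoding="unicode")


fol = to_fol("convert_into", ["bond", "common_stock"])
xml_text = to_xml("convert_into", ["bond", "common_stock"])
print(fol)
print(xml_text)
```

Both outputs are derived from the same two inputs (a relation type and an ordered list of concept types), which is the sense in which the formats on this slide carry the same knowledge.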
Join Operation
Type Contraction Operation
Challenge: to acquire formal specifications from NL
• too complicated a task -> at present feasible for controlled NL only
• there are many approaches to acquire CGs from controlled NL (see the proceedings); in general:
  – with limited vocabulary,
  – recognition of phrases and/or simple sentences,
  – often missing syntactic analysis as a separate module,
  – type labels are juxtaposed to sentence words,
  – relation types: either thematic roles or keywords,
  – limited capacity to acquire contexts and coreferences.
Our approach: start from an NLU machine
• PARASITE (Allan Ramsay, UMIST, Manchester, UK) provides syntactic analysis and processes extended discourse (i.e. recognises coreferences)
• builds a “model” for every semantically correct discourse (and logical forms for each sentence)
• checks contradictions with the given meaning postulates
• Our prototype CGExtract is focused on the proper KB issues
CGExtract
• Acquires from English sentences
  – type hierarchy
  – type definitions
  – graphs
• checks for loop (circular) definitions and contradictions between the newly defined graph and the existing KB facts
• Visualisation provided by CGWorld modules (see windows and menus in the text)
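The slide does not say how CGExtract detects loop definitions. One generic way to do it, sketched here under that caveat, is a depth-first search over a map from each defined type to the types its definition uses. The function name and the sample type names are hypothetical.

```python
def find_definition_loop(definitions):
    """Return one circular chain of type definitions, or None.

    definitions maps a type name to the set of type names that
    appear in its definition.
    """
    in_progress, done = set(), set()
    path = []

    def visit(type_name):
        in_progress.add(type_name)
        path.append(type_name)
        for used in sorted(definitions.get(type_name, ())):
            if used in in_progress:
                # Reached a type that is still being expanded: a loop.
                return path[path.index(used):] + [used]
            if used not in done:
                loop = visit(used)
                if loop:
                    return loop
        path.pop()
        in_progress.discard(type_name)
        done.add(type_name)
        return None

    for type_name in sorted(definitions):
        if type_name not in done:
            loop = visit(type_name)
            if loop:
                return loop
    return None


# Hypothetical definitions: the second KB defines "security" via "convertible_bond".
consistent_kb = {
    "convertible_bond": {"bond", "common_stock"},
    "bond": {"security"},
    "security": set(),
}
looping_kb = dict(consistent_kb, security={"convertible_bond"})

print(find_definition_loop(consistent_kb))
print(find_definition_loop(looping_kb))
```

A check like this would run before a new type definition is added to the KB, so that a definition closing a cycle is rejected rather than stored.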
Extracting conceptual graphs from a correct discourse
Logical Model of the Input Text
• for(n541, n546(n541)).
• issue(n541).
• theta(n541, agent, n543).
• theta(n541, object, n544).
• local_government(n543).
• authority(n543).
• municipal_bond(n544).
• theta(n544, purpose, n545).
• theta(n545, agent, n543).
• pay(n545).
• community_infrastructure_project(n546(n541)).
• predication(n925).
• theta(n925, topic, n929).
• theta(n925, pred, n929).
• investor(n928).
• income_tax(n929).
• free(n929).
• of(n929, n928).
• interest(n929).
A local government authority issues a municipal bond to pay for a community infrastructure project. An interest of the municipal bond is an income tax free
Generated Conceptual Graph from the Input Text
cgc(101,simple,interest,[],_3661).
cgc(102,simple,free,[],_4244).
cgc(103,simple,income_tax,[],_4788).
cgc(104,simple,predication,[],_5332).
cgc(105,simple,community_infrastructure_project,[],_5902).
cgc(106,simple,pay,[],_6459).
cgc(107,simple,municipal_bond,[],_7029).
cgc(108,simple,government_authority,[],_7612).
cgc(109,simple,issue,[],_8182).
cg(110,[cgr(of,[101,107],_8734),cgr(pred,[104,101],_8776),
cgr(topic,[104,101],_8818),cgr(agent,[106,108],_8860),
cgr(purpose,[107,106],_8902),cgr(object,[109,107],_8944),
cgr(agent,[109,108],_8986),cgr(for,[109,105],_9028)],[],
[fs(kind,normal),fs(comment,)]).
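The step from the logical model to the generated graph can be pictured with a deliberately simplified sketch: unary predicates become concepts, and theta(Event, Role, Filler) facts become relations named after the role. This is an assumption about the general idea, not the CGExtract algorithm; in particular it does not merge or split type labels the way the real output above does, and the function name and identifier numbering are hypothetical.

```python
def facts_to_cg(facts, first_id=101):
    """Map logical-model facts to CGLex-style concept and relation terms.

    facts is a list of tuples such as ("issue", "n541") or
    ("theta", "n541", "agent", "n543").
    """
    types = {}       # entity -> type label, in order of first mention
    relations = []   # (relation label, source entity, target entity)
    for name, *args in facts:
        if name == "theta":
            event, role, filler = args
            relations.append((role, event, filler))
        elif len(args) == 1:
            if args[0] in types:
                # Several type labels per entity are out of scope for this sketch.
                raise ValueError(f"second type label for {args[0]}")
            types[args[0]] = name
        elif len(args) == 2:
            relations.append((name, args[0], args[1]))
    ids = {entity: first_id + i for i, entity in enumerate(types)}
    concepts = [f"cgc({ids[e]},simple,{t},[],_)." for e, t in types.items()]
    # Every entity used in a relation is assumed to have a type label.
    arcs = [f"cgr({r},[{ids[a]},{ids[b]}],_)" for r, a, b in relations]
    return concepts, arcs


# A subset of the facts above, restricted to one type label per entity.
sample_facts = [
    ("issue", "n541"),
    ("theta", "n541", "agent", "n543"),
    ("theta", "n541", "object", "n544"),
    ("authority", "n543"),
    ("municipal_bond", "n544"),
]
concepts, arcs = facts_to_cg(sample_facts)
print("\n".join(concepts))
print(arcs)
```

The argument order of each relation (event first, role filler second) matches the cgr terms in the generated graph, for example cgr(object,[109,107],...) for the issue event and the municipal bond.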
Negative sides of our approach
• Too many resources are required for filling in the linguistic data (e.g. the lexicon; fortunately most of the English syntax is embedded in PARASITE)
• Special effort is needed to understand the existing PARASITE prover
Positive sides of our approach
• (1) A lexicon, (2) meaning postulates (similar to canonical graphs) and (3) an initial type hierarchy are always obligatory for automatic KA; all systems need to have them, so what we gain is the syntactic analysis and the embedded semantic analysis of the linguistic semantics
• This allows us to focus on the proper KB consistency
Semantic Web Challenges
V. Richard Benjamins, Jesús Contreras, Oscar Corcho and Asunción Gómez-Pérez, The six challenges for the Semantic Web. White paper, 2002
• Challenge 1: The Availability of Content
• Challenge 2: Ontology Availability, Development and Evolution
• Challenge 3: Scalability of Semantic Web Content
• Challenge 4: Multilinguality
• Challenge 5: Visualization
• Challenge 6: Semantic Web Languages Standardization
Semantic Web Challenges
• Challenge 1: Almost no annotated content
• Challenge 2: According to the results of the Interoperability Working Days in Madrid (October 10th-11th 2005), we are still far from achieving interoperability of ontology development tools using RDF(S) as an interchange format.
• Challenge 3: We cannot talk about scalability because content is not available.
• Challenge 4: Most work in the Semantic Web area is only for English
• Challenge 5: No standards for visualization
  – Maybe CG
• Challenge 6: Semantic Web Languages Standardization
  – RDF and OWL were expected to be available in 2002. The World Wide Web Consortium issued the RDF and OWL Recommendations on 10 Feb 2004
Ontology visualization for the semantic web
• Simultaneous view of a concept in the ontology hierarchy and its instances on the web page
• Lines drawn between a concept and its instances
• Showing both the language context of a concept’s usage and its ontological environment
• Application: support the user’s comprehension while reading a web page
Semantic Annotations
• Manual annotation strictly depends on the individual -> the result is ambiguous
• Fully automatic annotation is impossible; human intervention is always necessary
• Prototype that uses:
  – NLP for automatic extraction of formal knowledge (CGs)
  – CGs for visualization and enrichment of annotations
Extract conceptual graphs from texts
• Visualization of concepts’ properties and their relationships as Conceptual Graphs (CGs)
• CGs querying and inference capabilities can be exploited
• Concept -> view assertions relevant to this concept
• CGs are extracted from a CG KB developed by a previous project and extended by the prototype
Simplify conceptual graph
• Simplify Conceptual Graph
  – Type Contraction Operation
typedef “verb” is
[AgentType] <- (agnt) <- [verb] -> (obj) -> [ObjectType].
[Concept] -> (def) -> [Concept: CG].
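One possible reading of the typedef above is that a verb concept with its agnt and obj arcs can be contracted into a single relation named after the verb. The sketch below implements only that reading, on a list of (relation, source, target) triples. The function name is hypothetical, and the actual operation used in the prototype may differ.

```python
def contract_verb(relations, verb):
    """Replace the agnt/obj arcs of a verb concept by one verb relation.

    relations is a list of (relation, source, target) triples in which
    the verb concept is the source of both its agnt and obj arcs.
    """
    agent = obj = None
    remaining = []
    for relation, source, target in relations:
        if relation == "agnt" and source == verb:
            agent = target
        elif relation == "obj" and source == verb:
            obj = target
        else:
            remaining.append((relation, source, target))
    if agent is None or obj is None:
        # The definition body does not match; leave the graph unchanged.
        return list(relations)
    return remaining + [(verb, agent, obj)]


expanded = [
    ("agnt", "convert_into", "bond"),
    ("obj", "convert_into", "common_stock"),
]
print(contract_verb(expanded, "convert_into"))
```

Under this reading, the contracted form is the compact convert_into relation between bond and common_stock that appears in the format examples earlier in the slides.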
CG Tools Integration
• The formal approach for integration is chosen based on the definition of the Levels of Conceptual Interoperability Model (LCIM):
• Andreas Tolk, James Muguira, The Levels of Conceptual Interoperability Model (LCIM), Fall Simulation Interoperability Workshop, Orlando, FL, September 2003
• [T04] Andreas Tolk, Composable Mission Spaces and M&S Repositories - Applicability of Open Standards, Spring Simulation Interoperability Workshop, Washington, D.C., April 2004
• LCIM is expected to become part of the Software Integration standard of the Software Engineering Institute of CMU: http://www.sei.cmu.edu/isis/guide/introduction/lcim.htm
Levels of Conceptual Interoperability (LCIM)
• On level 0, no connection is established at all.
• On level 1, the technical level, physical connectivity is established, allowing bits and bytes to be exchanged.
• On level 2, the syntactical level, data can be exchanged in standardized formats, i.e., the same protocols and formats are supported.
• On level 3, the semantic level, not only data but also its contexts, i.e. information, can be exchanged. The unambiguous meaning of data is defined by common reference models.
• On level 4, the pragmatic/dynamical level, information and its use and applicability, i.e. knowledge, can be exchanged. The applicability of information is here defined in an unambiguous form.
• On level 5, the conceptual level, a common view of the world is established, i.e. an epistemology. This level not only comprises the implemented knowledge, but also the interrelations between these elements.
Amine Platform
Man eats food with spoon
CharGer
CharGer uses an XML file for CGs
WebKB
Interoperability problems of current systems based on conceptual graphs
• All available CG tools are results of research projects
• Implementation of CGIF; lack of available content
• Software platform and test suites
• No standardized display form
• No Web services available
• No standardization of CGs in the Semantic Web languages
• Reasoning with conceptual graphs
• Persistence and scalability
• Internal formats of CG representation used in the tools
Levels of Conceptual Interoperability (LCIM) for existing systems (1/2)
• Level 1, the technical level – we have it. Most of the tools are in Java and/or can interact with Java (e.g. HTTP access to WebKB). The authors of CG tools must find information exchange points that can be used to exchange data and/or components.
• Level 2, the syntactical level – we do not have it. We are far from achieving CGIF interoperability. If we implement common standards like XML, WSDL and UDDI and register services in a common registry, we will have it.
Levels of Conceptual Interoperability (LCIM) for existing systems (2/2)
• Level 3, the semantic level – we do not have it. We will achieve it if we agree on a common understanding of what CGIF and CG are and how they must be processed and visualized by the CG tools.
• Level 4, the pragmatic/dynamical level – we do not have it. We need to define a common architecture or standard that is open enough to allow components from one tool to be reused in another, together with test data sets and services interoperability.
• Level 5, the conceptual level – we do not have it. Ontology standards are required in order to achieve it. A good direction is the Standard Upper Ontology (http://suo.ieee.org/).
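The assessment on the last two slides can be summarised as data. The sketch below encodes the six LCIM levels and the "we have it / we do not have it" verdicts, then reports the highest level reached. Treating the levels as cumulative (a level counts only if every lower level is also reached) is an assumption made for this sketch, and all identifiers are hypothetical.

```python
from enum import IntEnum


class LCIM(IntEnum):
    NO_CONNECTION = 0
    TECHNICAL = 1
    SYNTACTICAL = 2
    SEMANTIC = 3
    PRAGMATIC = 4
    CONCEPTUAL = 5


# Verdicts for the existing CG tools, as stated on the slides.
CG_TOOLS_STATUS = {
    LCIM.TECHNICAL: True,
    LCIM.SYNTACTICAL: False,
    LCIM.SEMANTIC: False,
    LCIM.PRAGMATIC: False,
    LCIM.CONCEPTUAL: False,
}


def highest_level(status):
    """Return the highest level reached without skipping a lower one."""
    reached = LCIM.NO_CONNECTION
    for level in sorted(status):
        if not status[level]:
            break
        reached = level
    return reached


print(highest_level(CG_TOOLS_STATUS).name)
```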
Current results (1/2)
• The formal methods for knowledge representation, modeling, acquisition and management are analyzed and classified.
• The main methods for visualization of knowledge bases are studied.
• Algorithms for representing knowledge in different formalisms are created, as well as algorithms for converting between them.
• A component architecture for a system for knowledge representation, modeling, acquisition and management is designed. It is based on natural language processing technologies.
• An approach for using automatically annotated texts is developed, including editing of the annotations by knowledge engineers.
Current results (2/2)
• CGWorld - A Web Based Workbench for Conceptual Graphs Management and Applications
• Application of CGWorld in Larflast – financial domain.
• Integration of existing systems (DBR-MAT)
• CGExtract – extraction of conceptual graphs from controlled English
• ViSem prototype for semantic annotations using conceptual graphs.
• CGWorld is available at: http://larflast.bas.bg:8080/
Conclusion and Further Work
• The general idea is to provide a set of components that can be used as building blocks for CG applications
• The integration of CGs in web page annotation enables:
  – better visualization
  – easy editing and enrichment of annotations
• Future directions
  – more on visualization of semantic web knowledge
  – a lot of work must be done before we can really say that we have interoperability between the CG tools.
Publications and references
• Number of publications related to the PhD thesis – 10
• Conferences: ICCS 2000, ICCS 2001, ICCS 2002, AIMSA 2004, ICCS 2005, BIS 21++ Information Days, ICCS 2006 (accepted for publication)
• Citations of these publications – 12. With the largest number of citations – 4:
  – CGWorld - A Web Based Workbench for Conceptual Graphs Management and Applications
  – CGExtract: towards Extraction of Conceptual Graphs from Controlled English
• References to the CGWorld home page – 8
Publications related to the PhD thesis
1. P. Dobrev. CG Tools Interoperability and the Semantic Web Challenges, accepted for publication in Contributions to ICCS 2006 - 14th International Conference on Conceptual Structures, Aalborg University Press
2. P. Dobrev. Knowledge Management in Natural Language Processing Using Conceptual Graph, BIS 21++ Information Days, 21-23 March 2006, Velingrad, Hotel Kamena, http://www.euromap.bas.bg/velingrad/PDobrev.zip
3. P. Dobrev, A. Strupchanska. Conceptual Graphs and Annotated Semantic Web Pages. In Common Semantics for Sharing Knowledge: Contributions to ICCS 2005, 13th International Conference on Conceptual Structures, ICCS 2005, Kassel, Germany, pp. 54-65, ISBN 3-89958-138-5
4. Dobrev P., Strupchanska A., Angelova G., Towards a Better Understanding of the Language Content in the Semantic Web, AIMSA 2004, Varna, Bulgaria, September 2004
5. Dobrev P., Toutanova K., CGWorld - Architecture and Features, ICCS 2002, Borovets, Bulgaria, July 2002, Lecture Notes in Computer Science 2393, Springer 2002, ISBN 3-540-43901-3
6. Dobrev P., Strupchanska A., Toutanova K., CGWorld - from Conceptual Graph Theory to the Implementation, Applications with Conceptual Structures Workshop at ICCS-2002, Borovets, Bulgaria, July 2002
7. Boytcheva Sv., P. Dobrev and G. Angelova. CGExtract: towards Extraction of Conceptual Graphs from Controlled English. In: G. Mineau (Ed.), Conceptual Structures: Extracting and Representing Semantics, Contributions to ICCS-2001, the 9th Int. Conference on Conceptual Structures, Stanford, California, August 2001, pp. 89-116.
8. Dobrev P., Strupchanska A., Toutanova K., CGWorld-2001 - New Features and New Directions, CGTools Workshop at ICCS-2001, Stanford, CA, USA, August 2001, electronic proceedings at http://www.cs.nmsu.edu/~hdp/CGTools/proceedings/papers/CGWorld.pdf
9. A. Strupchanska, P. Dobrev, S. Boytcheva, T. Nikolov, K. Toutanova, Sample Knowledge Base in Finance, Contribution to CGTools Workshop at ICCS 2001 (http://www.ksl.stanford.edu/iccs2001/CGTools/)
10. Dobrev P., Toutanova K., CGWorld - A Web Based Workbench for Conceptual Graphs Management and Applications, In Proceedings of the ICCS-2000 (Working with Conceptual Structures), Darmstadt, Germany, August 2000
Plan for finishing of the PhD thesis
• A paper for the journal Information and Control that presents a summary of the results of the PhD thesis.
• A paper for ICCS 2007 that includes new results of the PhD student.
• Finishing the full text of the PhD thesis in 9-12 months:
  – Extending the survey of the current methods and systems for knowledge representation, modeling, acquisition and management
  – Including new results of the PhD student
  – Extending the explanation of the existing systems that are based on the results included in the PhD thesis
  – Extending the English-Bulgarian dictionary for the domain