
Oct 21, 2013 (3 years and 5 months ago)



Pavlin Dobrev <p.dobrev@prosyst.com>


Scientific Advisor: Galia Angelova


Linguistic Modeling Department,

Institute for Parallel Processing,

Bulgarian Academy of Sciences

Knowledge representation, modeling, acquisition and management with applications in Natural Language Processing

(PhD Thesis proposal)


The concept of knowledge


Meaning of knowledge in the PhD thesis

Linguistic knowledge

Knowledge related to the particular domain: concept type hierarchy, relation type hierarchy, definitions


Formalism for knowledge representation


Conceptual Graph


Ontologies/hierarchical structures


Current results and publications


Workgroup for conceptual graphs:

Formats for knowledge representation

Representation and modeling of knowledge with conceptual graphs

Editing of knowledge represented with conceptual graphs

Architectures of applications for processing conceptual graphs

Extraction of conceptual graphs from controlled English

Semantic web and natural language processing

Use of conceptual graphs in the area of the semantic web for semantic annotations

Integration of applications for processing conceptual graphs

Conceptual Graph


As defined in the Conceptual Graph Standard, a conceptual graph (CG or graph) is an abstract representation of logic with nodes called concepts and conceptual relations, linked together by arcs

They express meaning in a form that is:

logically precise

humanly readable

computationally tractable

In CGWorld, a conceptual graph is any collection of concepts and relations linked by their appropriate arrows or co-referent links

Representation of the sentence “John is going to Boston by bus” as a conceptual graph
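The figure for this slide is not preserved in the text, so the following is only a minimal sketch of how such a graph could be held in memory and printed in a linear form. The type labels (Go, Person, City, Bus) and relation names (Agnt, Dest, Inst) follow the way this sentence is usually drawn in the CG literature and are assumptions here, not a transcription of the slide. The classes Concept and Graph are hypothetical and are not part of CGWorld.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Concept:
    """A concept node: a type label plus an optional referent."""
    type: str
    referent: str = ""  # empty string means a generic concept

    def __str__(self):
        if self.referent:
            return f"[{self.type}: {self.referent}]"
        return f"[{self.type}]"


@dataclass
class Graph:
    """A conceptual graph stored as (relation, source, target) triples."""
    relations: list = field(default_factory=list)

    def relate(self, name, source, target):
        self.relations.append((name, source, target))

    def linear(self):
        # One line per relation, in the order the relations were added.
        return [f"{s} -> ({n}) -> {t}" for n, s, t in self.relations]


# Assumed reading of the sentence: a Go event with an agent,
# a destination and an instrument.
go = Concept("Go")
graph = Graph()
graph.relate("Agnt", go, Concept("Person", "John"))
graph.relate("Dest", go, Concept("City", "Boston"))
graph.relate("Inst", go, Concept("Bus"))

lines = graph.linear()
for line in lines:
    print(line)
```

Storing relations as triples keeps the sketch close to the "concepts and relations linked by arcs" definition above; contexts and coreference links are left out.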

CGWorld - from Conceptual Graph Theory to the Implementation

CGWorld was first introduced at ICCS 2000. Further development was presented at ICCS conferences in the following years.

The main goals followed in the design and development of the CGWorld workbench are:

to allow for collaborative, distributed acquisition and editing of a CG knowledge base;

to provide easy search and navigation in a large KB;

to maintain different representation languages, thus accommodating the needs of different users of CGWorld and of the different applications in which the KB of CGs is used;

to provide a graphical editor and viewer for CGs that is easy to use by non-experts in CG theory;

to integrate and add Web access to previously developed CG applications, written in different programming languages.

High level architecture

Conceptual Graph editor

Primary Market is defined as Financial Market where newly issued financial instruments are traded.

Main features of the CGWorld Editor

Portable across all platforms (it has been tested with the most popular browsers: Opera, Mozilla, Netscape and Internet Explorer);

Any number of graph windows may be opened for editing;

Concepts, relations, arcs, coreference links and contexts are supported for editing via a simple Drag & Drop interface;

Ability to customize the color, position and size of conceptual objects;

Ability to assign any number of additional properties to the conceptual objects (e.g. number, definite marker, comment);

Zooming capability;

Storing and retrieving of conceptual graphs to/from the application server.

Conceptual Graph Knowledge Base (Visual and CGIF)

A convertible bond is one which is convertible into the company common stock

When a bond is converted to common stock, the corporate debt is reduced

A bond is converted into common stock

Formats for knowledge representations (1/2)

A bond is a security which represents the debt of a corporation

Different formats of knowledge representation:

CGIF

FOL

CGLex

CGXML

Formats for knowledge representations (2/2)

Different formats of knowledge representation

NL: A bond is converted into common stock.

FOL: exists(A1,exists(A0,convert_into(A0,A1) & bond(A0) & common_stock(A1)))

CGLex:

cgc(55,simple,'bond',[fs(num,sing)],[]).
cgc(53,simple,'common_stock',[fs(num,sing)],[]).
cg(155,[cgr(convert_into,[55,53],_)],
   none,
   [fs(kind,'body_of_context'),
    fs(comment,'A bond is converted into common stock')]).

XML:

<relation type="convert_into">
  <concept type="bond">
    <number type="single" />
  </concept>
  <concept type="common_stock">
    <number type="single" />
  </concept>
</relation>
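As an illustration of how the same small graph can be rendered in more than one of these formats, here is a minimal Python sketch that produces the FOL string and the XML fragment shown above from one in-memory structure. It is not the conversion code used in CGWorld; the function names to_fol and to_xml, the dictionary layout, and the hard-coded "single" number attribute are assumptions made for this example.

```python
import xml.etree.ElementTree as ET

# Concept identifiers and type labels taken from the CGLex example.
concepts = {55: "bond", 53: "common_stock"}
relation = ("convert_into", [55, 53])


def to_fol(concepts, relation):
    """Render one relation and its concepts as an existential FOL formula."""
    name, args = relation
    # Name the variables A0, A1, ... in argument order.
    variables = {cid: f"A{i}" for i, cid in enumerate(args)}
    body = f"{name}({','.join(variables[c] for c in args)})"
    for cid in args:
        body += f" & {concepts[cid]}({variables[cid]})"
    # Wrap the quantifiers so that the first argument ends up innermost.
    for cid in args:
        body = f"exists({variables[cid]},{body})"
    return body


def to_xml(concepts, relation):
    """Render the same relation as a nested XML fragment."""
    name, args = relation
    rel = ET.Element("relation", type=name)
    for cid in args:
        concept = ET.SubElement(rel, "concept", type=concepts[cid])
        # Assumption: every concept is singular, as in the slide.
        ET.SubElement(concept, "number", type="single")
    return ET.tostring(rel, encoding="unicode")


fol = to_fol(concepts, relation)
xml_text = to_xml(concepts, relation)
print(fol)
print(xml_text)
```

Keeping one internal structure and writing a small renderer per format is one simple way to support several representation languages side by side.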

Join Operation

Type Contraction Operation

Challenge: to acquire formal specifications from NL

too complicated a task -> at present feasible for controlled NL only

there are many approaches to acquiring CGs from controlled NL (see the proceedings); in general:

with limited vocabulary,

recognition of phrases and/or simple sentences,

often missing syntactic analysis as a separate module,

type labels are juxtaposed to sentence words,

relation types: either thematic roles or keywords,

limited capacity to acquire contexts and coreferences.

Our approach: start from an NLU machine

PARASITE (Allan Ramsay, UMIST, Manchester, UK) provides syntactic analysis and processes extended discourse (i.e. recognises coreferences)

builds a “model” for every semantically correct discourse (and logical forms for each sentence)

checks contradictions with the given meaning postulates

Our prototype CGExtract is focused on the proper KB issues

CGExtract


Acquires from English sentences:

type hierarchy

type definitions

graphs

Checks for circular (loop) definitions and for contradictions between the newly defined graph and the existing KB facts

Visualisation is provided by CGWorld modules (see windows and menus in the text)
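The slides do not say how CGExtract detects loop definitions. One simple way to picture the check is as cycle detection over a "type T is defined using types U, V" relation; the sketch below assumes exactly that. The function find_cycle, the dictionary layout and the sample type names are hypothetical.

```python
def find_cycle(definitions, start):
    """Return a list of types forming a loop reachable from `start`, or None.

    `definitions` maps a type name to the list of types used in its
    definition. Types with no entry are treated as primitive.
    """
    path = []

    def visit(type_name):
        if type_name in path:
            # The type is already on the current path: a loop is closed.
            return path[path.index(type_name):] + [type_name]
        path.append(type_name)
        for used in definitions.get(type_name, ()):
            cycle = visit(used)
            if cycle:
                return cycle
        path.pop()
        return None

    return visit(start)


# Existing KB (assumed): no loops.
kb = {
    "convertible_bond": ["bond", "common_stock"],
    "bond": ["security"],
}
ok = find_cycle(kb, "convertible_bond")

# A new definition that refers back to convertible_bond closes a loop.
kb_with_loop = dict(kb, security=["convertible_bond"])
loop = find_cycle(kb_with_loop, "convertible_bond")

print(ok)
print(loop)
```

Running the check each time a new definition is acquired, before it is stored, would keep the KB free of circular definitions; contradiction checking against existing facts is a separate problem and is not covered here.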

Extracting conceptual graphs in correct discourse

Logical Model of the Input Text


for(n541, n546(n541)).


issue(n541).


theta(n541, agent, n543).


theta(n541, object, n544).


local_government(n543).


authority(n543).


municipal_bond(n544).


theta(n544, purpose, n545).


theta(n545, agent, n543).


pay(n545).


community_infrastructure_project(n546(n541)).


predication(n925).


theta(n925, topic, n929).


theta(n925, pred, n929).


investor(n928).


income_tax(n929).


free(n929).


of(n929, n928).


interest(n929).

A local government authority issues a municipal bond to pay for a community
infrastructure project. An interest of the municipal bond is an income tax free


Generated Conceptual Graph from the Input Text

cgc(101,simple,interest,[],_3661).

cgc(102,simple,free,[],_4244).

cgc(103,simple,income_tax,[],_4788).

cgc(104,simple,predication,[],_5332).

cgc(105,simple,community_infrastructure_project,[],_5902).

cgc(106,simple,pay,[],_6459).

cgc(107,simple,municipal_bond,[],_7029).

cgc(108,simple,government_authority,[],_7612).

cgc(109,simple,issue,[],_8182).

cg(110,[cgr(of,[101,107],_8734),cgr(pred,[104,101],_8776),
cgr(topic,[104,101],_8818),cgr(agent,[106,108],_8860),
cgr(purpose,[107,106],_8902),cgr(object,[109,107],_8944),
cgr(agent,[109,108],_8986),cgr(for,[109,105],_9028)],[],
[fs(kind,normal),fs(comment,)]).
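Comparing the logical model with the generated graph suggests a regular pattern: each typed discourse referent becomes a cgc concept, and each theta(Event, Role, Filler) fact becomes a cgr relation from the event concept to the filler concept. The Python sketch below reproduces that pattern for the first sentence only. It is an illustration under that assumption, not CGExtract's algorithm; build_graph is a hypothetical name, the type label government_authority is copied from the generated graph, and the concept numbers it assigns differ from those on the slide.

```python
# Typed referents from the logical model (first sentence only).
types = {
    "n541": "issue",
    "n543": "government_authority",
    "n544": "municipal_bond",
    "n545": "pay",
}

# theta(Event, Role, Filler) facts from the logical model.
thetas = [
    ("n541", "agent", "n543"),
    ("n541", "object", "n544"),
    ("n544", "purpose", "n545"),
    ("n545", "agent", "n543"),
]


def build_graph(types, thetas, first_id=101):
    """Map referents to cgc/3-style concepts and theta facts to cgr relations."""
    ids = {}
    concepts = []
    for offset, (node, type_label) in enumerate(types.items()):
        ids[node] = first_id + offset
        concepts.append(f"cgc({ids[node]},simple,{type_label},[],_).")
    relations = [
        f"cgr({role},[{ids[event]},{ids[filler]}],_)"
        for event, role, filler in thetas
    ]
    return concepts, relations


concepts, relations = build_graph(types, thetas)
for line in concepts:
    print(line)
print("cg(110,[" + ",".join(relations) + "],[],[fs(kind,normal)]).")
```

The same referent (n543) fills the agent role of both issue and pay, so both relations point at one concept; this is how the coreference recognised in the discourse shows up in the graph.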


Negative sides of our approach

Considerable resources are required to fill in the linguistic data (e.g. the lexicon; fortunately, most of the English syntax is embedded in PARASITE)

Special effort is needed to understand the existing PARASITE prover

Positive sides of our approach

(1) A lexicon, (2) meaning postulates (similar to canonical graphs) and (3) an initial type hierarchy are always obligatory for automatic KA; all systems need to have them. So what we gain is the syntactic analysis and the embedded semantic analysis of the linguistic semantics

This allows us to focus on the proper KB consistency

Semantic Web Challenges

V. Richard Benjamins, Jesús Contreras, Oscar Corcho and Asunción Gómez-Pérez, The six challenges for the Semantic Web. White paper, 2002





Challenge 1: The Availability of Content



Challenge 2: Ontology Availability, Development and
Evolution



Challenge 3: Scalability of Semantic Web Content



Challenge 4: Multilinguality



Challenge 5: Visualization



Challenge 6: Semantic Web Languages Standardization


Semantic Web Challenges



Challenge 1: Almost no annotated content

Challenge 2: According to the results of the Interoperability Working Days in Madrid (October 10th-11th, 2005), we are still far from achieving interoperability of ontology development tools using RDF(S) as an interchange format.

Challenge 3: We cannot talk about scalability because the content is not available.

Challenge 4: Most work in the Semantic Web area is for English only

Challenge 5: No standards for visualization. Maybe CG

Challenge 6: Semantic Web Languages Standardization. RDF and OWL were expected to be available in 2002. The World Wide Web Consortium issued the RDF and OWL Recommendations on 10 Feb 2004

Ontology visualization for the semantic web

Simultaneous view of a concept in the ontology hierarchy and its instances on the web page

draw lines between the concept and its instances

Showing both the language context of the concept’s usage and its ontological environment

Application: support the user’s comprehension while reading a web page

Semantic Annotations


Manual annotation strictly depends on the individual -> result is ambiguous

Fully automatic annotation is impossible - human intervention is always necessary

Prototype that uses:

NLP for automatic extraction of formal knowledge (CGs)

CGs for visualization and enrichment of annotations

Extract conceptual graphs from texts

Visualization of concepts’ properties and their relationships as Conceptual Graphs (CGs)

CG querying and inference capabilities can be exploited

Concept -> view assertions relevant to this concept

CGs are extracted from a CG KB developed by a previous project and extended by the prototype

Simplify conceptual graph

Type Contraction Operation

typedef “verb” is
[AgentType] <- (agnt) <- [verb] -> (obj) -> [ObjectType].

[Concept] -> (def) -> [Concept: CG].
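To make the idea of type contraction concrete, here is a small Python sketch: when a concept of the genus type carries all the relations listed in a type definition, the concept is relabelled with the defined type and the now redundant relations are dropped. The example definition (a convertible bond is a bond that is converted into common stock) comes from the knowledge base slide; everything else, including the function contract, the exact-match test on type labels (no type hierarchy lookup) and the sample graph, is an assumption made for illustration and is not the CGWorld implementation.

```python
def contract(concepts, relations, new_type, genus, differentia):
    """Replace a matched definition body by a single concept of `new_type`.

    concepts    : dict mapping concept id -> type label
    relations   : list of (relation name, source id, target id)
    differentia : list of (relation name, target type) that must all leave
                  a concept of type `genus` for the definition to match
    Returns new (concepts, relations); the inputs are not modified.
    """
    concepts = dict(concepts)
    relations = list(relations)
    for cid in list(concepts):
        if concepts.get(cid) != genus:
            continue
        matched = []
        for rel_name, target_type in differentia:
            hit = next(
                (r for r in relations
                 if r[0] == rel_name and r[1] == cid
                 and concepts.get(r[2]) == target_type
                 and r not in matched),
                None,
            )
            if hit is None:
                break  # this concept does not satisfy the definition
            matched.append(hit)
        else:
            # Every differentia relation was found: contract.
            concepts[cid] = new_type
            for r in matched:
                relations.remove(r)
                target = r[2]
                # Drop the target concept if nothing else refers to it.
                if not any(target in (s, t) for _, s, t in relations):
                    del concepts[target]
    return concepts, relations


# Assumed input: "a corporation issues a bond that is converted into common stock"
concepts = {1: "corporation", 2: "issue", 3: "bond", 4: "common_stock"}
relations = [("agent", 2, 1), ("object", 2, 3), ("convert_into", 3, 4)]

simple_concepts, simple_relations = contract(
    concepts, relations,
    new_type="convertible_bond",
    genus="bond",
    differentia=[("convert_into", "common_stock")],
)
print(simple_concepts)
print(simple_relations)
```

The contracted graph has fewer nodes and says the same thing as long as the definition of convertible_bond is available in the KB, which is why the operation is useful for simplifying a graph before it is displayed.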


CG Tools Integration


The formal approach to integration is chosen based on the definition of the Levels of Conceptual Interoperability Model (LCIM):

Andreas Tolk, James Muguira, The Levels of Conceptual Interoperability Model (LCIM), Fall Simulation Interoperability Workshop, Orlando, FL, September 2003

[T04] Andreas Tolk, Composable Mission Spaces and M&S Repositories - Applicability of Open Standards, Spring Simulation Interoperability Workshop, Washington, D.C., April 2004

LCIM is expected to become part of the Software Integration standard of the Software Engineering Institute of CMU: http://www.sei.cmu.edu/isis/guide/introduction/lcim.htm

Levels of Conceptual Interoperability (LCIM)


On level 0, no connection is established at all.

On level 1, the technical level, physical connectivity is established, allowing bits and bytes to be exchanged.

On level 2, the syntactical level, data can be exchanged in standardized formats, i.e., the same protocols and formats are supported.

On level 3, the semantic level, not only data but also its contexts, i.e. information, can be exchanged. The unambiguous meaning of data is defined by common reference models.

On level 4, the pragmatic/dynamical level, information and its use and applicability, i.e. knowledge, can be exchanged. The applicability of information is here defined in an unambiguous form.

On level 5, the conceptual level, a common view of the world is established, i.e. an epistemology. This level comprises not only the implemented knowledge, but also the interrelations between these elements.

Amine Platform

Man eats food with spoon

CharGer

CharGer uses an XML file for CGs

WebKB



Interoperability problems of current systems based on conceptual graphs

All available CG tools are results of research projects

CGIF implementation; non-availability of content

Software platform and test suites

No standardized display form

No Web services available

No standardization of CG in the Semantic Web Languages

Reasoning with Conceptual Graphs

Persistence and Scalability

Internal formats of CG representation used in the tools

Levels of Conceptual Interoperability (LCIM) for existing systems (1/2)

Level 1, the technical level: we have it. Most of the tools are written in Java and/or can interact with Java (e.g. HTTP access to WebKB). The authors of CG tools must find information exchange points that can be used to exchange data and/or components.

Level 2, the syntactical level: we do not have it. We are far from achieving CGIF interoperability. If we implement common standards like XML, WSDL, UDDI and requester services in a common registry, we will have it.

Levels of Conceptual Interoperability (LCIM) for existing systems (2/2)

Level 3, the semantic level: we do not have it. We will achieve it if we agree on a common understanding of what CGIF and CG are and how they must be processed and visualized by the CG tools.

Level 4, the pragmatic/dynamical level: we do not have it. We need to define a common architecture or standard that is open enough to allow components from one tool to be reused in another, together with test data sets and services interoperability.

Level 5, the conceptual level: we do not have it. Ontology standards are required in order to achieve it. A good direction is the Standard Upper Ontology (http://suo.ieee.org/).
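The assessment on these two slides can be written down as a tiny data structure. The sketch below treats the LCIM levels as an ordered scale and reports the highest level reached without a gap, on the assumption that each level builds on the ones below it. The enum member names and the function achieved_level are made up for this example and do not come from the LCIM papers.

```python
from enum import IntEnum


class LCIM(IntEnum):
    """The six levels of the model, from no connection to a shared world view."""
    NONE = 0
    TECHNICAL = 1
    SYNTACTIC = 2
    SEMANTIC = 3
    PRAGMATIC = 4
    CONCEPTUAL = 5


def achieved_level(satisfied):
    """Return the highest level such that it and all levels below are satisfied."""
    level = LCIM.NONE
    for candidate in LCIM:
        if candidate is LCIM.NONE:
            continue
        if candidate not in satisfied:
            break
        level = candidate
    return level


# Assessment of the existing CG tools as stated on the slides:
# only the technical level is in place.
cg_tools_today = {LCIM.TECHNICAL}
current = achieved_level(cg_tools_today)
print(current.name)
```

Under this reading, agreeing on semantics (level 3) before CGIF exchange works (level 2) would not raise the overall level, which matches the order in which the slides list the remaining work.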

Current results (1/2)


The formal methods for knowledge representation, modeling, acquisition and management are analyzed and classified.

Research on the main methods for visualization of knowledge bases is performed.

Algorithms for representing knowledge using different formalisms are created, as well as algorithms for converting between them.

The component architecture of a system for knowledge representation, modeling, acquisition and management is designed. It is based on natural language processing technologies.

An approach for using automatically annotated texts is developed, including editing of the annotations by knowledge engineers.

Current results (2/2)


CGWorld - A Web Based Workbench for Conceptual Graphs Management and Applications

Application of CGWorld in Larflast: financial domain.

Integration of existing systems (DBR-MAT)

CGExtract: extraction of conceptual graphs from controlled English

ViSem: prototype for semantic annotations using conceptual graphs.

CGWorld is available at: http://larflast.bas.bg:8080/


Conclusion and Further Work


The general idea is to provide a set of components that can be used as building blocks for CG applications

The integration of CGs in web page annotation enables:

better visualization

easy editing and enrichment of annotations

Future directions

more on visualization of semantic web knowledge

A lot of work must be done before we can really say that we have interoperability between the CG tools.


Publications and references


Number of references related to the PhD thesis: 10

Conferences: ICCS 2000, ICCS 2001, ICCS 2002, AIMSA 2004, ICCS 2005, BIS 21++ Information Days, ICCS 2006 (accepted for publication)

References to these publications: 12. With a large number of references (4):

CGWorld - A Web Based Workbench for Conceptual Graphs Management and Applications

CGExtract: towards Extraction of Conceptual Graphs from Controlled English

References to the CGWorld home page: 8

Publications related to the PhD thesis

1. P. Dobrev. CG Tools Interoperability and the Semantic Web Challenges, accepted for publication in Contributions to ICCS 2006 - 14th International Conference on Conceptual Structures, Aalborg University Press.

2. P. Dobrev. Knowledge Management in Natural Language Processing Using Conceptual Graph, BIS 21++ Information Days, 21-23 March 2006, Velingrad, Hotel Kamena, http://www.euromap.bas.bg/velingrad/PDobrev.zip

3. P. Dobrev, A. Strupchanska. Conceptual Graphs and Annotated Semantic Web Pages. In Common Semantics for Sharing Knowledge: Contributions to ICCS 2005, 13th International Conference on Conceptual Structures, ICCS 2005, Kassel, Germany, pp. 54-65, ISBN 3-89958-138-5.

4. P. Dobrev, A. Strupchanska, G. Angelova. Towards a Better Understanding of the Language Content in the Semantic Web, AIMSA 2004, Varna, Bulgaria, September 2004.

5. P. Dobrev, K. Toutanova. CGWorld - Architecture and Features, ICCS 2002, Borovets, Bulgaria, July 2002, Lecture Notes in Computer Science 2393, Springer 2002, ISBN 3-540-43901-3.

6. P. Dobrev, A. Strupchanska, K. Toutanova. CGWorld - from Conceptual Graph Theory to the Implementation, Applications with Conceptual Structures Workshop at ICCS-2002, Borovets, Bulgaria, July 2002.

7. Sv. Boytcheva, P. Dobrev, G. Angelova. CGExtract: towards Extraction of Conceptual Graphs from Controlled English. In: G. Mineau (Ed.), Conceptual Structures: Extracting and Representing Semantics, Contributions to ICCS-2001, the 9th Int. Conference on Conceptual Structures, Stanford, California, August 2001, pp. 89-116.

8. P. Dobrev, A. Strupchanska, K. Toutanova. CGWorld-2001 - New Features and New Directions, CGTools Workshop at ICCS-2001, Stanford, CA, USA, August 2001, electronic proceedings at http://www.cs.nmsu.edu/~hdp/CGTools/proceedings/papers/CGWorld.pdf

9. A. Strupchanska, P. Dobrev, S. Boytcheva, T. Nikolov, K. Toutanova. Sample Knowledge Base in Finance, Contribution to CGTools Workshop at ICCS 2001 (http://www.ksl.stanford.edu/iccs2001/CGTools/).

10. P. Dobrev, K. Toutanova. CGWorld - A Web Based Workbench for Conceptual Graphs Management and Applications, In Proceedings of the ICCS-2000 (Working with Conceptual Structures), Darmstadt, Germany, August 2000.


Plan for finishing the PhD thesis

A paper for the journal Information and Control that presents a summary of the results of the PhD thesis.

A paper for ICCS 2007 that includes new results of the PhD student.

Finishing the full text of the PhD thesis in 9-12 months:

Extending the survey of the current methods and systems for knowledge representation, modeling, acquisition and management

Including new results of the PhD student

Extending the explanation of the existing systems that are based on the results included in the PhD thesis

Extending the English-Bulgarian dictionary for the domain