Word Knowledge, World Knowledge, and Ontology

Towards a linguistic infrastructure for knowledge representation and knowledge engineering

Chu-Ren Huang

Academia Sinica


From Sense to Ontology: Lexical Structure of Knowledge


City University of Hong Kong

June 23, 2003

Prelude



Seeing is Believing?

What Does This Tell Us?


Information does not automatically translate into knowledge

Information is sharable and reusable only when it is situated


同志之愛 (literally "love between comrades")

Camaraderie?

Shared party membership?

Homosexuality?

Nothing?

Outline


Language and Knowledge


Semantic Web, Ontology and Language


Ontology and WordNet


SUMO at http://ontology.teknowledge.com/


http://ckip.iis.sinica.edu.tw/CKIP/ontology/


I. Introduction

Language and Knowledge


What is the ultimate function of Language?

To communicate: to send and receive information (and misinformation)

Language is the carrier of information and the representational structure of knowledge

Language encodes information, and decodes knowledge (= acquired information)

Communication includes both the transmission of information and the reception of knowledge


Language mediates what we know (word knowledge) and what is there to be known (world knowledge)

-- a knowledge-based lexicalist view

Language is humanity's unique tool to represent and manipulate world knowledge



By adopting a lexicalist approach, we assume

1. That a person's lexicon contains a set of conventionalized terms which are conceptual atoms to that person.

2. That knowledge and conceptual structure are lexically accessed

Summary

A speaker's word knowledge is the comprehensive interface to his/her world knowledge

II. Semantic Web (語意網)

A new form of Web content that is meaningful to computers will unleash a revolution of new possibilities

The Semantic Web will become the next generation of the Internet

Scientific American, May 2001 (Chinese edition: 科學人, August 2002, pp. 46-56)

SW Websites


http://www.w3.org/2001/sw

http://www.SemanticWeb.org

http://www.sciam.com


From World Wide Web to Semantic Web

WWW is simply a medium for people to exchange and acquire information. The computer does not read anything, people do.

In order to turn WWW into SW, content must be added specifically for machines to read and manipulate.

Semantic but Non-linguistic

The Semantic Web will enable machines to COMPREHEND semantic documents and data, not human speech and writings.

Berners-Lee, Tim, James Hendler and Ora Lassila. The Semantic Web. Scientific American. May 2001.

What Are the Computers Reading?

RDF: Resource Description Framework
- What is it?

URI: Uniform Resource Identifier
- Where is it?

Ontology (知識本體)
- How to interpret the content?
- how the information is 'situated'
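The division of labour between RDF statements and URIs can be pictured with a minimal sketch. It assumes plain Python tuples rather than any RDF library, and every URI and predicate name in it is a hypothetical placeholder, not something taken from the slides.

```python
# All URIs below are hypothetical placeholders, not taken from the slides.
EX = "http://example.org/"

# RDF says WHAT is stated: each statement is a (subject, predicate, object) triple.
# URIs say WHERE each resource or term is identified.
TRIPLES = [
    (EX + "WordNet", EX + "hasLanguage", "English"),
    (EX + "WordNet", EX + "type", EX + "LexicalResource"),
    (EX + "SUMO", EX + "type", EX + "UpperOntology"),
]


def match(subject=None, predicate=None, obj=None):
    """Return the triples that agree with every argument that is not None."""
    return [
        (s, p, o)
        for (s, p, o) in TRIPLES
        if subject in (None, s) and predicate in (None, p) and obj in (None, o)
    ]
```

A call such as `match(predicate=EX + "type")` returns every typing statement. What a type like `LexicalResource` means is not in the triples themselves; that is the part an ontology would have to supply.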

What is Ontology?

Ontology (本體論) (in Philosophy, the original meaning)

A theory/description of the nature of existence, which predicts what types of things exist.

Ontology (知識本體) (in computer science, the meaning in use)

A document defining all (conceptual) terms in a system by describing all their relations.
- An ontology usually contains a taxonomy of elements, as well as some inference rules.
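As a rough illustration of "a taxonomy plus some inference rules", the sketch below stores a handful of subclass links and applies one rule, the transitivity of ISA. The four links use SUMO concept names, but the choice of links and the helper names `ancestors` and `is_a` are assumptions made for this example only.

```python
# Taxonomy: child -> parent. Only a few links, chosen for illustration.
SUBCLASS = {
    "Physical": "Entity",
    "Abstract": "Entity",
    "Object": "Physical",
    "Process": "Physical",
}


def ancestors(term):
    """Follow subclass links upward and return the chain of superclasses."""
    chain = []
    while term in SUBCLASS:
        term = SUBCLASS[term]
        chain.append(term)
    return chain


def is_a(term, candidate):
    """Inference rule: ISA is reflexive and transitive over the taxonomy."""
    return term == candidate or candidate in ancestors(term)
```

Nothing states directly that a Process is an Entity; `is_a("Process", "Entity")` derives it from two stored links, which is the kind of inference the taxonomy licenses.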

SW's New Challenges

Is Language still the Essential Gateway to Knowledge?

If language is the web that we use to catch fish (ontology), can we discard the web after the fish is caught? (得魚忘筌, 可乎? "Once the fish is caught, may the trap be forgotten?")

Is Language the Essential Gateway to Knowledge?

Can knowledge be acquired and manipulated without Language?
- An Engineer's Question

Can this exercise help us understand how Language relates to knowledge representation and conceptual manipulation?
- A Cognitive Scientist/Linguist's Question

The First Question


Which Language(s) will the Semantic Web use?

Answer A

English


Since it is already the dominant language in the WWW

Ten Years From Now: Web and Chinese

B.F. Chu (朱邦復): 900 million Chinese peasants will use the WWW.

M. Zhou (周明, MSR Asia): 500 million Chinese will be online

C.R. Huang: One in four web users will be Chinese

Source: panel on "Chinese Language Processing: 10 Years from Now". The First SigHan Workshop on Chinese Language Processing. COLING 2002. Sept. 1. Taipei.

Answer B

Any Language(s)


Since SW is knowledge-based, NOT language-based

Other languages: OWL, XML, etc.

The Second Question


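Does SW need LT (Language Technology) if SW is ontology-based and NOT language-based?

That is, if the Semantic Web relies mainly on ontology, is language-specific knowledge processing still necessary?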

Answer A

No, it doesn't.

Since ontology by definition is language independent

Answer B

Yes, it does.
- Since humans rely on language to express and access information
- Language offers a shared structure to situate information (i.e. knowledge representation)

Each Ontology is the Conceptual Backbone of a Knowledge Domain

Ontological Variations

Sources: culture, domain, environment, ethnicity, media, science, society

Instantiation and Representation: Shared linguistic use; i.e. Sub-language and Sub-lexicon

The Challenges that Semantic Web Faces

Multilingual Processing

Multi-domain knowledge

How many ontologies must be built?

How efficiently and robustly can ontologies be built?

Linguistic Ontology

LingOnto: The conceptual structure that all speakers of a language implicitly adopt

Each major language has a LingOnto with more expert users than any constructed ontology

Linguistic Ontology can be successfully constructed from a (sub-)language with good LT

Linguistic Ontologies can be used to signal and detect language variations and changes

(The linguistic anchoring project of Taiwan's NDAP)

Towards a Linguistic Ontology for SW

Linguistic Ontology mediates human cognition and real world situations, hence it should be a natural basis for SW ontology

Candidates: WordNet, FrameNet, Qualia Structure etc.

also note LanguageWeb


The Multilingual and Multicultural Aspects of the Semantic Web

Making the semantic web accessible in many languages

Allowing the semantic web to represent many different cultures

An EU NoE Proposal with Comprehensive Asian Participation

Summary


SW can use any human language

SW does need Language Technology

SW can be multilingual and multicultural

Are we ready for the challenge to provide Linguistic Ontologies?

III. Ontology and WordNet


WordNet 1.7 (詞彙網路)

1990

www.cogsci.princeton.edu/~wn/

Monolingual: English

SUMO: Suggested Upper Merged Ontology

http://ontology.teknowledge.com

Open resource created under an initiative from IEEE Standard Upper Ontology Working Group



Wordnet as a Representation of Linguistic Ontology

Wordnet Atoms

Formal Atoms: Lemmas

Content Atoms: Senses

Relation Atoms: Lexical Semantic Relations (LSR)

Organization of Wordnets

Concept-driven: All words sharing the same sense form a SynSet (同義詞集)
- each synset is an instance of linguistic conceptualization

(Note that a word is a unique lemma-sense pair)

Relation-based: A language wordnet is the network formed by instantiating all LSRs between each pair of synsets
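The two organizing principles can be sketched in a few lines of Python. This is a toy model, not WordNet's actual data format: the sense identifiers, the sample words, and the reading of each relation triple are all assumptions made for the example.

```python
from collections import defaultdict

# Hypothetical toy data; sense identifiers are made up for illustration.
# A word is a unique (lemma, sense) pair.
WORDS = [
    ("car", "sense_car"),
    ("automobile", "sense_car"),
    ("vehicle", "sense_vehicle"),
    ("wheel", "sense_wheel"),
]

# (source synset, relation, target synset), read here as
# "target is the <relation> of source" (a convention assumed for this sketch).
LSR = [
    ("sense_car", "hypernym", "sense_vehicle"),
    ("sense_car", "meronym", "sense_wheel"),
]


def build_synsets(words):
    """Concept-driven: group all lemmas that share a sense into one synset."""
    synsets = defaultdict(set)
    for lemma, sense in words:
        synsets[sense].add(lemma)
    return dict(synsets)


def related(synsets, sense, relation):
    """Relation-based: follow one LSR from a synset to its target synsets."""
    return [synsets[t] for (s, r, t) in LSR if s == sense and r == relation]
```

Because the relations hold between synsets rather than lemmas, "car" and "automobile" inherit the same hypernym without it being stated twice.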

Wordnet and Ontology

Synset: Lexicon driven concept identification

LSR: Lexically entailed knowledge inference

Lexical Semantic Relations (詞義關係)

antonymy (反義關係)
hypernymy (上位關係)
hyponymy (下位關係)
holonymy (整體-部份關係)
meronymy (部份-整體關係)
metonymy (轉指關係)
near-synonymy (近義關係)
synonymy (同義關係)
troponymy (方式關係)

EuroWordNet

EWN (歐語詞網)

Since 1996

http://www.hum.uva.nl/~ewn/

Multilingual: Basque, Catalan, Czech, Dutch, French, English, Estonian, German, Italian, Spanish, (Swedish, Norwegian, Danish, Greek, Portuguese, Romanian, Lithuanian, Russian, Bulgarian, Slovenian)

HowNet

知網 (Dong Zhendong's Chinese semantic network)

Since 1996

http://www.keenage.com

Bilingual: English-Chinese (EC), Chinese-English (CE)

SUMO: Suggested Upper Merged Ontology

SUMO Atoms

Concepts: around 1000
- note that concepts are not necessarily linguistically realized

Relations (ISA): See SUMO Graph

Axioms: for inference

SUMO's Top Conceptual Hierarchy


Entity
Physical
Object
SelfConnectedObject
Region
Collection
Agent
Process
Abstract
SetOrClass
Relation
Quantity
Number
PhysicalQuantity
Attribute
Proposition
Graph
GraphElement


Definition of the Concept CorpuscularObject in SUMO

(subclass CorpuscularObject SelfConnectedObject)

(disjoint CorpuscularObject Substance)

(documentation CorpuscularObject "A &%SelfConnectedObject whose parts have properties that are not shared by the whole.")

One of the Axioms Involving CorpuscularObject

(=>
  (instance ?OBJ CorpuscularObject)
  (exists (?SUBSTANCE1 ?SUBSTANCE2)
    (and
      (subclass ?SUBSTANCE1 Substance)
      (subclass ?SUBSTANCE2 Substance)
      (material ?SUBSTANCE1 ?OBJ)
      (material ?SUBSTANCE2 ?OBJ)
      (not (equal ?SUBSTANCE1 ?SUBSTANCE2)))))
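Read informally, the axiom says that every instance of CorpuscularObject is made of at least two distinct kinds of Substance. The sketch below checks that condition over a small hand-built model. The instance names, the substance classes, and the helper names `substances_of` and `violations` are hypothetical; this is not how SUMO axioms are actually evaluated by a theorem prover.

```python
# Hypothetical toy model; class and instance names are illustrative only.
SUBSTANCE_SUBCLASSES = {"Wood", "Metal", "Water"}  # (subclass ?S Substance)
MATERIAL = {                                       # (material ?S ?OBJ)
    ("Wood", "hammer1"),
    ("Metal", "hammer1"),
    ("Water", "icecube1"),
}
CORPUSCULAR_INSTANCES = {"hammer1", "icecube1"}    # (instance ?OBJ CorpuscularObject)


def substances_of(obj):
    """Distinct subclasses of Substance that are stated as material of obj."""
    return {s for (s, o) in MATERIAL if o == obj and s in SUBSTANCE_SUBCLASSES}


def violations():
    """Instances of CorpuscularObject lacking two distinct substances."""
    return sorted(
        obj for obj in CORPUSCULAR_INSTANCES if len(substances_of(obj)) < 2
    )
```

In this toy model `icecube1` is reported as a violation: declaring it a CorpuscularObject while giving it a single material contradicts the axiom, which is the sort of inconsistency the axioms exist to expose.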

Summary: Synergy of Ontologies and Linguistic Knowledge Representation

Upper Ontology provides a constant and universal infrastructure for knowledge representation and inference.

Linguistic knowledge representation (such as wordnets) provides a basis for quick bootstrapping across various languages and domains.

Towards a linguistic infrastructure for knowledge representation and knowledge engineering


The Criteria


Lexicon-driven

Multilingual

Domain Inter-operable


Our Prototype

An English-Chinese Bilingual Interface of General and Domain-specific Ontologies

http://ckip.iis.sinica.edu.tw/CKIP/ontology

Applying Ontology in Linguistics/NLP

Metaphors are mappings of source domain knowledge to target domain knowledge. E.g. Journey → Life. Lakoff (1993)

The Mapping is governed by Conceptual Mapping Rules. Ahrens (2002)


-- 時間如流水 ("time is like flowing water") vs. 花錢如流水 ("spending money is like flowing water")

Same knowledge domain but different knowledge?

Ontology-based representation

Ahrens, Chung, and Huang (2003)

-- How to describe source and domain knowledge?
-- How to predict the mapping rules?


Metaphors used to refer to economy (經濟) in Chinese

- ECONOMY IS A PERSON (121 instances)

chen2zhang3 'grow', shuai1tui4 'regression/decay', chen2zhang3chi2 'growth period', bing4zhuang4 'symptoms', ming4mai4 'lifeblood'

- ECONOMY IS A CONTEST

- ECONOMY IS WAR


The Structure of Knowledge Domains as Shown by Metaphors

An organism is an agent of a process that holds the attribute of being alive for a duration.

(=>
  (and
    (instance ?ORGANISM Organism)
    (agent ?PROCESS ?ORGANISM))
  (holdsDuring (WhenFn ?PROCESS) (attribute ?ORGANISM Living)))



Economy is understood as a person in terms of the life cycle

Contest
  ViolentContest
    War

ECONOMY IS WAR is an elaboration of ECONOMY IS A CONTEST, and both deal with two agents vying for purposeful gains.

Conclusion


Knowledge is situated information

Web content will be situated for the computers by ontology in the Semantic Web

Human word knowledge is linguistically situated by LSR

The synergy of Upper Ontology and Linguistic Ontology mediates word knowledge and world knowledge

Upper Ontology and NLP

an NLPKE03 special session

2003 International Conference on Natural Language Processing and Knowledge Engineering (NLP-KE)

Oct 26-29, 2003 -- Beijing, China


Paper Submission


BY: July 15, 2003


TO: cqzong@nlpr.ia.ac.cn



Special Session on Upper Ontology and NLP


Invited Speaker: Junichi Tsujii


Panel: Calzolari, Dong, Huang, Pease, Zhao


Program Committee: Calzolari, Huang, Pease, Yu



BY: July 18, 2003


TO: hrzhang@pku.edu.cn

http://www.cie-china.org/nlpke2003/

What is in our Onto-Base

English-Chinese Translation Equivalents Database
- includes all WordNet entries
- manually checked, up to 3 Chinese translations for each English entry

SUMO as upper ontology
- C-E bilingual ontology nodes
- lexical-conceptual link

Domain Tag for Each Entry (under construction)
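One way to picture a single Onto-Base record is sketched below. The field names, the `Entry` class, and the sample data are hypothetical and are not the project's actual schema. Only the three-translation cap, the link to a SUMO node, and the optional domain tag come from the description above.

```python
from dataclasses import dataclass, field

MAX_TRANSLATIONS = 3  # up to 3 Chinese translations per English entry


@dataclass
class Entry:
    """One English entry with its Chinese equivalents (hypothetical schema)."""
    english: str
    chinese: list = field(default_factory=list)
    sumo_node: str = ""  # lexical-conceptual link to an upper-ontology node
    domain: str = ""     # domain tag, left empty while under construction

    def add_translation(self, word):
        """Append a Chinese equivalent, enforcing the three-translation cap."""
        if len(self.chinese) >= MAX_TRANSLATIONS:
            raise ValueError("at most 3 Chinese translations per entry")
        self.chinese.append(word)


def english_for(entries, chinese_word):
    """Chinese-to-English lookup across a list of entries."""
    return [e.english for e in entries if chinese_word in e.chinese]
```

Storing the SUMO node on the entry rather than on each translation reflects the idea that both languages point at one shared concept, so the same record serves lookups in either direction.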

Domain Ontology


The EMELD Domain Ontology of Linguistic Concepts

http://emeld.org/tools/ontology.cfm

http://emeld.douglass.arizona.edu:8080/searchindex.html

E-MELD: Electronic Metastructure for Endangered Language Data