Semantic Lexicons in Multilingual Information Management

© Paul Buitelaar: eJustice Presentation, July 15th, 2004

Ontologies


Contributions from Language Technology



Paul Buitelaar



DFKI GmbH

Language Technology Lab

DFKI Competence Center Semantic Web

Saarbrücken, Germany

Overview

Ontologies and the Semantic Web
  Semantic Web Intro
  Ontologies and Knowledge Markup
  Ontology Development
  Ontology Lifecycle & Language Technology

Language Technology
  Levels of Automatic Linguistic Analysis

Ontologies in Multilingual Information Access
  A Medical Example: MuchMore Project
  Semantic Resources in the Medical Domain
  Demo MuchMore System
  Language Technology in Annotation and Indexing

Conclusions
  MuchMore for the Legal Domain…

Semantic Web

  Intelligent Man-Machine Interface
  Knowledge Markup
  Ontologies
  Semantic Web Services

Ontology-based Knowledge Markup

Semantic Metadata
  Metadata, e.g. Dublin Core: Title, Author, etc.
  Semantic: Formal Properties of Objects of Class Author

Knowledge Markup

  <jobs:systems-analyst xmlns:jobs="http://www.jobs.org/daml+oil-jobs-ontology#">
    John Smith
  </jobs:systems-analyst>
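
As a minimal sketch of how such markup can be read by a program, the following Python snippet parses the example with the standard library and recovers the ontology namespace, the class name, and the instance. It assumes the namespace is declared with xmlns:jobs on the element itself; the helper name split_qualified_name is hypothetical and introduced only for illustration.

```python
import xml.etree.ElementTree as ET

# The knowledge markup example, assuming the namespace is declared via xmlns:jobs.
MARKUP = """\
<jobs:systems-analyst xmlns:jobs="http://www.jobs.org/daml+oil-jobs-ontology#">
  John Smith
</jobs:systems-analyst>
"""


def split_qualified_name(tag):
    """Split ElementTree's '{namespace}local' tag form into (namespace, local)."""
    if tag.startswith("{"):
        namespace, _, local = tag[1:].partition("}")
        return namespace, local
    return "", tag


element = ET.fromstring(MARKUP)
namespace, class_name = split_qualified_name(element.tag)
instance = element.text.strip()

print("ontology:", namespace)
print("class:   ", class_name)
print("instance:", instance)
```

The point of the sketch is that the element name is not just a label: together with the namespace it identifies a class in the ontology, and the element content names an instance of that class.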

Semantic Web Architecture

Layered Architecture (Tim Berners-Lee)

Knowledge Markup Languages

XML Schema

Namespaces


Interpretation Context

RDF Schema

OWL (DAML+OIL)

Formalization: Classes (Inheritance), Properties

Formalization: Classes, Class Definitions, Properties, Property Types (e.g. Transitivity)

Data Types

XML

RDF

Syntax

Semantics
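
To illustrate what a property type such as transitivity adds, here is a small Python sketch that derives the statements implied by declaring a property transitive. The part_of facts are hypothetical illustration data, not taken from any ontology in this presentation, and the function is a naive fixed-point computation rather than how an OWL reasoner is actually implemented.

```python
def transitive_closure(pairs):
    """Return all (a, c) pairs implied by treating the relation as transitive."""
    closure = set(pairs)
    while True:
        # Chain every (a, b) with every (b, d) to derive (a, d).
        derived = {(a, d) for (a, b) in closure for (c, d) in closure if b == c}
        if derived <= closure:
            return closure
        closure |= derived


# Hypothetical asserted facts for a transitive part_of property.
PART_OF = {("chip", "motherboard"), ("motherboard", "computer")}

closure = transitive_closure(PART_OF)
for part, whole in sorted(closure):
    print(part, "part_of", whole)
```

Only two statements are asserted, but the transitivity declaration licenses a third one (chip part_of computer) without anyone writing it down.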

Ontologies: Basic Idea

Definition
  Explicit, Formal Specification of a Shared Conceptualization of a Domain of Interest
  T. Gruber, Towards principles for the design of ontologies used for knowledge sharing. Int. J. of Human and Computer Studies, 1994

Purpose
  Knowledge Sharing (e.g. between Agents)
  Inference (over Sets of Instances)

Related Areas, e.g.
  Terminologies, Controlled Vocabulary, Thesauri, Taxonomies, Semantic Lexicons, Wordnets, etc.
  Conceptual Models, Schemas, etc.

Ontologies: Applications, e.g.

Semantic Web Services
  Interoperability for (Semantic) Web Services

Intelligent Agents
  Domain Models for Intelligent Agents

Text Interpretation
  Ontology-aware Information Extraction

Multimedia Integration
  Ontology-based Alignment of Extracted Objects in Text, Audio, Video

Intelligent Search/Navigation
  Ontology-based Indexing in Web-Retrieval

Ontologies: Development

Ontology Editor / KB Management
  Most Widely Used: Protégé (Stanford University, Medical Informatics, USA)
    Originally for Development and Maintenance of Medical Expert Systems
  Other, e.g.
    KAON: University of Karlsruhe - AIFB, Germany
    WebOde: UPM Ontology Group, Madrid, Spain
    WebOnto: Open University - KMI, UK
  Overview at XML.com by Michael Denny: Ontology Building: A Survey of Editing Tools

Class Hierarchy

Slot Descriptions

http://dmag.upf.es/ontologies/2003/12/ipronto.owl

Ontology Lifecycle

Creating

Populating

Validating

Evolving

Maintaining

Deploying

LT in the Ontology Lifecycle

Language Technology (LT) for Ontology:
Language Technology = Automated Linguistic Analysis

Creating & Evolving
  Linguistic Analysis to Extract Classes / Relations

Populating (Knowledge Base Generation)
  Linguistic Analysis to Extract Instances

Diagram labels: Documents (Text); Ontology (Knowledge); Classes, Relations/Properties; Instances

Linguistic Analysis: Example

The Dell computer with a flat screen had to be rejected because of a failure in the motherboard.

Diagram labels: Dell computer, flat screen, motherboard, has-a, has-a, reject, failure, location-of, animate-entity

Part-of-Speech, Morphology

Part-of-Speech
  e.g.: noun, verb, adjective, preposition, …
  PoS tag sets may have between 10 and 50 (or more) tags

Morphology
  Most languages have inflection and declension, e.g.:
    Singular/Plural: computer, computers
    Present/Past: reject, rejected
  Many languages also have complex (de)composition, e.g.:
    Flachbildschirm (flat screen)
      > flach + Bildschirm
      > flach + Bild + Schirm
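
The decomposition step above can be sketched in Python as an exhaustive split of a compound against a lexicon. This is a toy illustration under stated assumptions: LEXICON is a hypothetical four-word list, and real German decompounding also has to handle linking elements and rank competing analyses, which this sketch ignores.

```python
def decompose(word, lexicon):
    """Return every way to split `word` into a sequence of lexicon entries."""
    word = word.lower()
    if not word:
        return [[]]  # one way to split the empty string: no parts
    splits = []
    for i in range(1, len(word) + 1):
        head = word[:i]
        if head in lexicon:
            # Keep this head and decompose whatever follows it.
            for rest in decompose(word[i:], lexicon):
                splits.append([head] + rest)
    return splits


# Hypothetical toy lexicon (lower-cased), just large enough for the slide's example.
LEXICON = {"flach", "bild", "schirm", "bildschirm"}

for parts in decompose("Flachbildschirm", LEXICON):
    print(" + ".join(parts))
```

Both analyses from the slide come out of the same procedure; which one is preferred for indexing is a separate decision that the sketch does not make.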

Phrases, Terms, Named Entities

Semantic Units

  Phrases (e.g. nominal - NP, prepositional - PP)
    NP              a flat screen
    PP              with a flat screen
    NP (recursive)  the Dell computer with a flat screen
                    a failure in the motherboard

  Terms (domain-specific phrases)
    Dell computer
    Dell computer with a flat screen

  Named Entities (phrases corresponding to dates, names, …)
    COMPANY  Dell
    COMPANY  Dell Computer Corporation
    PERSON   Michael Dell

Dependency Structure

Semantic Structure
  Dependencies between Predicates and Arguments

  the Dell computer with a flat screen had to be rejected
    PRED: reject
    ARG1: ENTITY
    ARG2: ‘the Dell computer with a flat screen’

  ‘Logical Form’: reject(x,y) & animate-entity(x) & computer(y) & …

  The Dell computer that has been rejected was claimed to have suffered from handling.
    reject(e1,x1,y1) & animate-entity(x1) & Dell_computer(y1)
    & claim(e2,x2,e3) & animate-entity(x2)
    & suffer_from(e3,y1,y2) & handling(y2)
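
One way to see what the shared variables in such a logical form express is to store the conjuncts as data and index them by variable. The Python sketch below does this for the second example; the tuple representation and the function name predicates_by_variable are assumptions made for illustration, not a format used by any particular parser.

```python
from collections import defaultdict

# The conjuncts of the logical form as (predicate, arguments) pairs.
LOGICAL_FORM = [
    ("reject", ("e1", "x1", "y1")),
    ("animate-entity", ("x1",)),
    ("Dell_computer", ("y1",)),
    ("claim", ("e2", "x2", "e3")),
    ("animate-entity", ("x2",)),
    ("suffer_from", ("e3", "y1", "y2")),
    ("handling", ("y2",)),
]


def predicates_by_variable(form):
    """Map each variable to the predicates it occurs in, in order of appearance."""
    index = defaultdict(list)
    for predicate, arguments in form:
        for variable in arguments:
            index[variable].append(predicate)
    return dict(index)


index = predicates_by_variable(LOGICAL_FORM)
print("y1:", index["y1"])
print("e3:", index["e3"])
```

The variable y1 links the rejected object to the thing that suffered from handling, and e3 links the claim to the suffering event, which is exactly the dependency information the flat word sequence does not make explicit.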

MuchMore Project

Demonstration Prototype
  Real-Life Medical Scenario for Cross-Lingual Information Retrieval

Research & Development
  Combined Data- and Knowledge-Driven

Performance Evaluation
  Performance Comparison of Existing and Novel Methods

http://muchmore.dfki.de

Semantic Resources

General
  WordNet (EN), GermaNet (DE), EuroWordNet (“linked”)

Medical Domain
  UMLS: Unified Medical Language System
    Medical MetaThesaurus (only MeSH2001 is used)
      English, German, Spanish, …
      730,000 Concepts
      9 Relations (Broader, Narrower, …)
    Semantic Network
      134 Semantic Types
      54 Semantic Relations

MetaThesaurus, SemNet

C0019682|ENG|P|L0019682|PF|S0048631|HIV|0|
C0019682|ENG|S|L0020103|PF|S0049688|HTLV-III|0|
C0019682|ENG|S|L0020128|VS|S0049756|Human Immunodeficiency Virus|0|
C0019682|ENG|S|L0020128|VWS|S0098727|Virus, Human Immunodeficiency|0|
C0019682|FRE|P|L0168651|PF|S0233132|HIV|3|
C0019682|FRE|S|L0206547|PF|S0277133|VIRUS IMMUNODEFICIENCE HUMAINE|3|
C0019682|GER|P|L0413854|PF|S0538136|HIV|3|
C0019682|GER|S|L1261793|PF|S1503739|Humanes T-Zell-lymphotropes Virus Typ III|3|

Concept Names: 1,734,706
  ENGLISH 1,462,202
  GERMAN 66,381
  other languages

Each CUI (Concept Unique Identifier) is mapped to one out of 134 Semantic Types or TUI (Type Unique Identifier)
  Clozapine: C0009079
  Pharmacologic Substance: T121

Semantic Types are organized in a Network through 54 Relations
  T121|T154|T047
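
The pipe-delimited concept-name records shown above lend themselves to a simple multilingual lookup table. The Python sketch below groups names by concept and language. The field names in FIELDS are an assumption, following the column order of the legacy UMLS MRCON file (concept ID, language, term status, term ID, string type, string ID, string, restriction level); only a subset of the records is used as sample data.

```python
from collections import defaultdict

# Assumed column names, following the legacy MRCON layout.
FIELDS = ("cui", "lat", "ts", "lui", "stt", "sui", "str", "lrl")

SAMPLE = """\
C0019682|ENG|P|L0019682|PF|S0048631|HIV|0|
C0019682|ENG|S|L0020128|VS|S0049756|Human Immunodeficiency Virus|0|
C0019682|FRE|S|L0206547|PF|S0277133|VIRUS IMMUNODEFICIENCE HUMAINE|3|
C0019682|GER|P|L0413854|PF|S0538136|HIV|3|
"""


def parse_line(line):
    """Turn one pipe-delimited record into a dict keyed by FIELDS."""
    # The trailing '|' yields an empty last item, which the slice discards.
    values = line.rstrip("\n").split("|")[: len(FIELDS)]
    return dict(zip(FIELDS, values))


def names_by_language(lines):
    """Group concept names as index[cui][language] -> list of strings."""
    index = defaultdict(lambda: defaultdict(list))
    for line in lines:
        if line.strip():
            row = parse_line(line)
            index[row["cui"]][row["lat"]].append(row["str"])
    return index


index = names_by_language(SAMPLE.splitlines())
for language, names in sorted(index["C0019682"].items()):
    print(language, names)
```

Because every language variant carries the same concept identifier, a query term in one language can be mapped to the concept and from there to its names in another language, which is the basis for cross-lingual retrieval over such a resource.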


Annotation & Indexing

Token (with Part-of-Speech)
  German: Kreuzbandes
  English: ligaments

Lemma (or Sequence of Lemmas - Decomposition)
  German: Faserknorpel > Faser + Knorpel
  English: ligament

UMLS Concept Code and Semantic Type
  ligament : C0022745_T030

MeSH Code
  A2.513

Semantic Relation (over a Pair of UMLS Concepts)
  C0022745_T030 interconnects C0047693_T065

Relations

UMLS Semantic Network specifies 54 types of relations between 134 semantic types
  Pharmacologic Substance affects Cell Function

Relations are generic and potentially false
  Therapeutic Procedure method_of Occupation, Discipline
  *discectomy method_of history

Relations are ambiguous
  Therapeutic Procedure prevents Neoplastic Process
  Therapeutic Procedure complicates Neoplastic Process
  Therapeutic Procedure affects Neoplastic Process
  Therapeutic Procedure treats Neoplastic Process
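
The ambiguity problem can be made concrete with a small Python sketch that stores the type-level relations listed on this slide and looks up the candidates for a pair of semantic types. The data structure and the function name candidate_relations are assumptions for illustration; the triples are the ones given above, not the full Semantic Network.

```python
from collections import defaultdict

# Type-level triples taken from the slide: (subject type, relation, object type).
TRIPLES = [
    ("Pharmacologic Substance", "affects", "Cell Function"),
    ("Therapeutic Procedure", "method_of", "Occupation, Discipline"),
    ("Therapeutic Procedure", "prevents", "Neoplastic Process"),
    ("Therapeutic Procedure", "complicates", "Neoplastic Process"),
    ("Therapeutic Procedure", "affects", "Neoplastic Process"),
    ("Therapeutic Procedure", "treats", "Neoplastic Process"),
]


def candidate_relations(triples):
    """Map each (subject type, object type) pair to its possible relations."""
    network = defaultdict(list)
    for subject, relation, obj in triples:
        network[(subject, obj)].append(relation)
    return network


network = candidate_relations(TRIPLES)
pair = ("Therapeutic Procedure", "Neoplastic Process")
print(pair, "->", network[pair])

# Pairs with more than one candidate cannot be resolved from the types alone.
ambiguous = {p: rels for p, rels in network.items() if len(rels) > 1}
print(len(ambiguous), "ambiguous type pair(s)")
```

For an ambiguous pair, the semantic types of two co-occurring concepts only narrow the relation down to a candidate set; choosing among the candidates needs additional evidence from the text.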

Example

Discontinuation of heparin is a simple and essential maneuvre, and anticoagulation has to be continued by alternative drugs.

Example: Terms/Concepts

Terms:
  C0019134  Heparin
  C0005790  Blood coagulation tests
  C0013227  Pharmaceutical preparations


Example: Relations

Relations:
  C0019134 interacts_with C0013227
  C0005790 analyses C0019134
  C0005790 analyses C0013227


Conclusions

MuchMore for the Legal Domain…

Resources
  Legal Domain Ontology with…
    …Large-scale Terminology for Multiple Languages, or if not available…
    …Large Legal Domain Corpora in Multiple Languages for Term Extraction…
    …and for Relation Extraction if Ontology Needs to be Constructed/Adapted

Tools
  Linguistic Analysis (PoS, Morphology, Term Grammars, etc.)…
    …for Multiple Languages…
    …Tuned to the Legal Domain…
  Information Retrieval Infrastructure, Interface Design, etc.