Applications of memory-based natural language processing


Antal van den Bosch and Roser Morante
ILK Research Group
Tilburg University
Prague, June 24, 2007
Current ILK members

Principal investigator: Antal van den Bosch

Post-doc researchers: Piroska Lendvai, Martin Reynaert, Roser Morante, Erwin Marsi

Ph.D. students: Sander Canisius, Toine Bogers, Marieke van Erp, Herman Stehouwer

Scientific programmers: Ko van der Sloot, Steve Hunt, Peter Berck

Guest researchers: Erik Tjong Kim Sang, Iris Hendrickx, Walter Daelemans
Outline of the talk

1. Scientific embedding
   1.1 NLP as classification
   1.2 Inference in NLP
2. Memory-based NLP applications
3. Embedded memory-based applications
4. Software and infrastructure
5. e-Learning?
1 Scientific embedding (1)

• Language processing is memory-based
• Learning consists of:
  – Storing instances in memory
  – Drawing analogies with the stored instances to deal with new experiences
• Learning is a supervised process
  – Annotated data are needed
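Representation of instances

Task: assigning part-of-speech tags

[Figure: an instance consists of a focus word flanked by its left and right context words; the class to be predicted (the focus word's part-of-speech tag) is marked "?". Example words in the figure: "were", "always", "accepted", "."]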
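A windowed representation like the one in the figure can be sketched as a sliding window over a tagged sentence. Each token becomes one instance made of its left context, the focus word and its right context, and its tag is the class. This is a minimal sketch under a few assumptions: a window of one word on each side, "_" as padding beyond the sentence boundaries, and a hypothetical function name `make_instances` and example sentence.

```python
from typing import List, Tuple

PAD = "_"  # assumed padding symbol for context positions outside the sentence


def make_instances(words: List[str], tags: List[str],
                   left: int = 1, right: int = 1) -> List[Tuple[Tuple[str, ...], str]]:
    """Turn a tagged sentence into (features, class) instances.

    Features: `left` words of left context, the focus word, and
    `right` words of right context. Class: the focus word's tag.
    """
    padded = [PAD] * left + words + [PAD] * right
    instances = []
    for i, tag in enumerate(tags):
        # word i sits at position i + left in `padded`, so this slice is
        # exactly its left context, the word itself, and its right context
        window = tuple(padded[i:i + left + 1 + right])
        instances.append((window, tag))
    return instances


if __name__ == "__main__":
    # Hypothetical example sentence with Penn Treebank-style tags
    words = ["They", "were", "always", "accepted", "."]
    tags = ["PRP", "VBD", "RB", "VBN", "."]
    for features, cls in make_instances(words, tags):
        print(features, "->", cls)
```

Because every instance has a fixed number of features, the same memory and distance function can be reused for any windowed task. Only the class labels change.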
1 Scientific embedding (2)

Language processing has simplicity constraints:
• Context is a local phenomenon
• Abstraction is harmful (discarding stored exceptions hurts generalization)
1 Scientific embedding (3)

Language processing can be reduced to:
• Classification
  – Segmentation, mapping
• Inference
  – Finding the optimal sequence/structure
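The slide reduces segmentation to classification but does not say how. One widely used encoding, assumed here for illustration, gives every token a class that marks whether it begins (B-), continues (I-), or lies outside (O) a segment such as a base noun phrase. A per-token classifier can then predict these tags, and the segments are decoded afterwards. The function names are hypothetical.

```python
from typing import List, Tuple

Span = Tuple[int, int, str]  # (start, end_exclusive, label)


def spans_to_iob(n_tokens: int, spans: List[Span]) -> List[str]:
    """Encode labelled segments as one B-/I-/O class per token."""
    tags = ["O"] * n_tokens
    for start, end, label in spans:
        tags[start] = "B-" + label
        for i in range(start + 1, end):
            tags[i] = "I-" + label
    return tags


def iob_to_spans(tags: List[str]) -> List[Span]:
    """Decode per-token classes back into segments.

    An I- tag that does not continue a segment with the same label is
    treated as the start of a new segment (a lenient decoding choice).
    """
    spans: List[Span] = []
    start, label = None, None
    for i, tag in enumerate(tags + ["O"]):  # sentinel "O" closes a final open segment
        starts_new = tag.startswith("B-") or (tag.startswith("I-") and tag[2:] != label)
        if tag == "O" or starts_new:
            if start is not None:
                spans.append((start, i, label))
                start, label = None, None
            if tag != "O":
                start, label = i, tag[2:]
    return spans


if __name__ == "__main__":
    tokens = "The old man saw a dog".split()
    tags = spans_to_iob(len(tokens), [(0, 3, "NP"), (4, 6, "NP")])
    print(list(zip(tokens, tags)))
    print(iob_to_spans(tags))
```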
1.1 NLP as classification (1)

Classification: given a new test instance X,
– compare it to all memory instances:
  • compute a distance between X and memory instance Y;
  • update the top k of closest instances (nearest neighbors);
– when done, take the majority class of the k nearest neighbors as the class of X.
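The procedure above is k-nearest-neighbor classification. Below is a minimal Python sketch that assumes symbolic features compared with the overlap metric. The distance is the number of features whose values differ, optionally weighted per feature. The function names and the tie-breaking rules are illustrative choices, not TiMBL's implementation: earlier-stored instances win distance ties, and among classes with equal votes the one with the nearest neighbor wins.

```python
import heapq
from collections import Counter
from typing import List, Optional, Sequence, Tuple

# A stored instance: (tuple of symbolic feature values, class label)
Instance = Tuple[Tuple[str, ...], str]


def overlap_distance(x: Sequence[str], y: Sequence[str],
                     weights: Optional[Sequence[float]] = None) -> float:
    """Sum of (optionally weighted) mismatches between two feature vectors."""
    if weights is None:
        weights = [1.0] * len(x)
    return sum(w for xi, yi, w in zip(x, y, weights) if xi != yi)


def knn_classify(x: Sequence[str], memory: List[Instance], k: int = 1,
                 weights: Optional[Sequence[float]] = None) -> str:
    """Majority class among the k memory instances closest to x."""
    # Max-heap of the k closest instances seen so far, stored as
    # (-distance, -index, class) because heapq is a min-heap;
    # heap[0] is therefore the farthest of the current top k.
    heap: List[Tuple[float, int, str]] = []
    for idx, (features, cls) in enumerate(memory):
        item = (-overlap_distance(x, features, weights), -idx, cls)
        if len(heap) < k:
            heapq.heappush(heap, item)
        elif item > heap[0]:  # strictly closer than the current farthest neighbor
            heapq.heapreplace(heap, item)
    # Vote with neighbors ordered closest-first, so that among classes with
    # equal vote counts the one with the nearest neighbor is returned.
    neighbors = sorted(heap, reverse=True)
    votes = Counter(cls for _, _, cls in neighbors)
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    # Hypothetical memory of windowed part-of-speech instances
    memory = [
        (("_", "they", "were"), "PRP"),
        (("they", "were", "always"), "VBD"),
        (("were", "always", "accepted"), "RB"),
        (("always", "accepted", "."), "VBN"),
    ]
    print(knn_classify(("we", "were", "never"), memory, k=1))  # VBD
```

Every memory instance is scanned for each test instance, which matches the slide's description. This is why lazy learners trade cheap training for costly classification.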
1.1 NLP as classification (2)

Examples: sentence accent placement; dependency relation assignment
1.2 Inference in NLP

• Local classifications → global solution
• Open up a search space
  – in which there is an optimal global solution
• Search algorithms
  – Constraint satisfaction inference
  – Beam search
  – Viterbi
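Viterbi is one of the listed ways to turn local classifications into an optimal global sequence. The sketch below makes two assumptions. First, each local classifier output has been converted into a log-score per position and class. Second, a log-score for each transition between consecutive classes is given. The slide does not say how either score is obtained, and all names are hypothetical. The search runs in O(n·|C|²) time for n positions and |C| classes.

```python
import math
from typing import Dict, List, Sequence


def viterbi(local: Sequence[Dict[str, float]],
            trans: Dict[str, Dict[str, float]]) -> List[str]:
    """Class sequence maximising the summed local and transition log-scores.

    local[i][c] -- log-score of class c at position i (from a local classifier)
    trans[a][b] -- log-score of class b directly following class a
    """
    classes = list(local[0])
    # best[i][c]: highest total score of any sequence ending in class c at i
    best: List[Dict[str, float]] = [dict(local[0])]
    back: List[Dict[str, str]] = [{}]  # back[i][c]: best predecessor of c at i
    for i in range(1, len(local)):
        best.append({})
        back.append({})
        for c in classes:
            prev = max(classes, key=lambda p: best[i - 1][p] + trans[p][c])
            best[i][c] = best[i - 1][prev] + trans[prev][c] + local[i][c]
            back[i][c] = prev
    # Follow back-pointers from the best-scoring final class.
    path = [max(classes, key=lambda c: best[-1][c])]
    for i in range(len(local) - 1, 0, -1):
        path.append(back[i][path[-1]])
    return path[::-1]


if __name__ == "__main__":
    log = math.log
    local = [{"A": log(0.6), "B": log(0.4)},
             {"A": log(0.4), "B": log(0.6)}]
    trans = {"A": {"A": log(0.9), "B": log(0.1)},
             "B": {"A": log(0.5), "B": log(0.5)}}
    # A greedy per-position choice gives A, B; the global optimum is A, A.
    print(viterbi(local, trans))  # ['A', 'A']
```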
2 Memory-based NLP apps

• Basic NLP
  – Spelling correction
  – Speech synthesis
  – Morpho-syntax
  – Semantics
  – Machine translation
• Embedded NLP
  – Dialogue systems
  – Professional document writing
  – Knowledge enrichment
2.1 Morpho-phonology

2.2 Morpho-syntax

2.3 Semantics

Semantic relations: content-container
2.4 Machine Translation

Memory-based text-to-text processing
• Machine translation
• Language modelling
• Confusible disambiguation
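Confusible disambiguation can be cast as classification. The class is the member of a confusible set (for example "then" vs. "than") that belongs in a slot, and the features are the surrounding words. The sketch below stores context counts from correct text. At prediction time it looks up the most specific stored context first and backs off to less specific ones. This back-off scheme, the one-word context on each side, and the class name `ConfusibleMemory` are assumptions for illustration, not the method behind the slide.

```python
from collections import Counter, defaultdict
from typing import DefaultDict, Dict, List, Sequence, Tuple

Key = Tuple[str, ...]
LEVELS = ("both", "left", "right")  # from most to least specific context


class ConfusibleMemory:
    """Hypothetical memory-based confusible disambiguator with back-off."""

    def __init__(self, confusibles: Sequence[str]):
        self.confusibles = set(confusibles)
        # One table per context level: context key -> counts of confusibles seen there
        self.tables: Dict[str, DefaultDict[Key, Counter]] = {
            level: defaultdict(Counter) for level in LEVELS
        }
        self.prior: Counter = Counter()  # overall frequency of each confusible

    @staticmethod
    def _keys(left: str, right: str) -> Dict[str, Key]:
        return {"both": (left, right), "left": (left,), "right": (right,)}

    def train(self, tokens: List[str]) -> None:
        """Store the context of every confusible occurring in correct text."""
        padded = ["_"] + tokens + ["_"]  # "_" pads sentence boundaries
        for i in range(1, len(padded) - 1):
            word = padded[i]
            if word in self.confusibles:
                for level, key in self._keys(padded[i - 1], padded[i + 1]).items():
                    self.tables[level][key][word] += 1
                self.prior[word] += 1

    def predict(self, left: str, right: str) -> str:
        """Most frequent confusible in the most specific context seen in training.

        Falls back to the overall most frequent confusible (requires some training data).
        """
        keys = self._keys(left, right)
        for level in LEVELS:
            counts = self.tables[level].get(keys[level])  # .get does not insert
            if counts:
                return counts.most_common(1)[0][0]
        return self.prior.most_common(1)[0][0]


if __name__ == "__main__":
    m = ConfusibleMemory(["then", "than"])
    for sentence in ["it is bigger than a house",
                     "we ate and then we slept",
                     "she is taller than him"]:
        m.train(sentence.split())
    print(m.predict("and", "they"))  # then
```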
3 Embedded Memory-Based Apps

• Dialogue systems
  – NWO IMIX: ROLAQUAD
• Professional document writing
  – SenterNovem IOP-MMI: À Propos
• Knowledge enrichment in domains
  – NWO CATCH: MITCH
3.1 Semantic Classification in QA

Answer retrieval from domain documents through alignment of question analyses with off-line document analyses.


3.2 Professional Document Writing

Pro-active personalization for professional document writing
• Recommend related articles for a 'focus' online news article
• Retrieve similar passages
• Classify experts
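A simple baseline for recommending articles related to a focus article, or for retrieving similar passages, is to rank candidate texts by cosine similarity between bag-of-words vectors. This is an assumption for illustration: the slide does not say which representation or similarity the project used. The tokenizer is deliberately naive, and the function names are hypothetical.

```python
import math
import re
from collections import Counter
from typing import List, Tuple


def bag_of_words(text: str) -> Counter:
    """Lower-cased word counts (a deliberately naive tokenizer)."""
    return Counter(re.findall(r"\w+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[word] for word, count in a.items())  # missing words count as 0
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def recommend(focus: str, articles: List[str], n: int = 3) -> List[Tuple[float, int]]:
    """(similarity, index) pairs for the n articles most similar to the focus article."""
    focus_vec = bag_of_words(focus)
    scored = [(cosine(focus_vec, bag_of_words(text)), i) for i, text in enumerate(articles)]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))  # best first; ties by index
    return scored[:n]


if __name__ == "__main__":
    articles = ["The election results were announced today",
                "A new beetle species was found in the museum collection",
                "Voters await the election results"]
    for score, i in recommend("Election results expected tonight", articles, n=2):
        print(f"{score:.3f}  {articles[i]}")
```

In practice, term weighting such as tf-idf and stop-word removal usually matter more than the choice of similarity function.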
3.3 Knowledge Enrichment

• Mining information from texts in the cultural heritage domain
• From documents to knowledge bases and ontologies
• Goal: research and develop techniques to discover new meaning in large collections of partially structured data that are available at Naturalis
3.4 Text Mining in Animal Data
In sum

[Diagram: language technology modules and applications, arranged along an axis from text to meaning]

LT modules: lexical / morphological analysis, syntactic analysis, semantic analysis, discourse analysis; tagging, chunking, word sense disambiguation, grammatical relation finding, named entity recognition, reference resolution

Applications: OCR, spelling error correction, grammar checking, information retrieval, information extraction, summarization, machine translation, document classification, ontology extraction and refinement, question answering, dialogue systems
4 Software and Infrastructure

• Open Source (GPL) software, among others:
  – TiMBL, MBT: machine learning and sequence processing
  – NeXTeNS: text-to-speech conversion
  – POS tagging, lemmatization, morphological analysis, shallow parsing (Tadpole)
• Demos
  – Web interfaces
• Computing infrastructure
  – One supercomputer; one high-end file server
  – Approx. 20 computing servers, 4 web/data servers, 20 desktops
  – Parallelisation: Dimbl, Mumbl
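The slide names Dimbl and Mumbl for parallelisation without further detail. A generic way to parallelise memory-based classification, assumed here and not a description of those tools, works in three steps:

- split the instance memory into shards;
- find the k nearest neighbors within each shard independently;
- merge the partial results.

Every global nearest neighbor is among its own shard's k nearest, so the merge reproduces the sequential answer. Threads keep the sketch short. In CPython, real speedups would need separate processes or machines. The names are hypothetical.

```python
import heapq
from concurrent.futures import ThreadPoolExecutor
from typing import List, Sequence, Tuple

Instance = Tuple[Tuple[str, ...], str]  # (feature values, class)
Neighbor = Tuple[int, int, str]         # (distance, global index, class)


def overlap(x: Sequence[str], y: Sequence[str]) -> int:
    """Number of mismatching feature values."""
    return sum(a != b for a, b in zip(x, y))


def shard_neighbors(x: Sequence[str], shard: List[Tuple[int, Instance]],
                    k: int) -> List[Neighbor]:
    """The k nearest neighbors of x within one shard of the memory."""
    scored = ((overlap(x, feats), idx, cls) for idx, (feats, cls) in shard)
    return heapq.nsmallest(k, scored)


def parallel_neighbors(x: Sequence[str], memory: List[Instance],
                       k: int = 3, n_shards: int = 4) -> List[Neighbor]:
    """k nearest neighbors of x, searching n_shards slices of memory concurrently."""
    indexed = list(enumerate(memory))
    shards = [indexed[s::n_shards] for s in range(n_shards)]  # round-robin split
    with ThreadPoolExecutor(max_workers=n_shards) as pool:
        partial = list(pool.map(lambda shard: shard_neighbors(x, shard, k), shards))
    # Merge: keeping the k smallest of all per-shard lists gives the same
    # (distance, index) ranking as one sequential pass over the whole memory.
    candidates = [nb for part in partial for nb in part]
    return heapq.nsmallest(k, candidates)


if __name__ == "__main__":
    memory = [(("a", "b"), "X"), (("a", "c"), "Y"), (("c", "b"), "Y"),
              (("c", "c"), "X"), (("a", "b"), "Z")]
    print(parallel_neighbors(("a", "b"), memory, k=3, n_shards=2))
```

The merged neighbor list can be passed to the same majority vote used in sequential classification.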
e-Learning?

• Better accessibility
  – Recommendation tools
  – Multi-lingual NLP & MT
• Creating better e-Learning apps with more natural interfaces
  – Speech synthesis
  – QA, dialogue systems
• Language e-Learning
  – “Help the computer learn language”
  – Win-win situation, “open mind”
Thanks for your attention!

You will find more information in: http://ilk.uvt.nl
Partners

• Academic
  – CNTS, University of Antwerp
  – Project partners: Nijmegen, Groningen, Maastricht, Utrecht, Eindhoven, Leuven
  – University of Bergen, Dublin City University, Polytechnic University of Catalunya, Saarland University, University of Illinois at Urbana-Champaign
• Non-commercial
  – Naturalis Museum of Natural History
• Industrial
  – Textkernel
  – Project partners: Polderland, SEC, Irion, Trezorix
Spin-off

• Textkernel B.V.
  – Information extraction
  – Robust text matching
  – Dialogue systems
• Foundation for Inductive Learning Applications
  – Broker for Tilburg and Antwerp university software
  – Consultancy
Eager vs Lazy Learning