Natural Language and Speech Processing


24 Oct 2013




Creation of computational models of the understanding and
the generation of natural language.


Different fields coming together, looking at speech and
language processing from different perspectives.


Computational Linguistics (Linguistics)


Natural Language Processing (Computer Science)


Speech Recognition (Electrical Engineering)


Computational Psycholinguistics (Psychology)


Different Levels of Speech and Language Processing


Phonetics and Phonology


The study of sounds in language


Morphology


The study of components of words


Syntax


The study of structural relationships between words


Semantics


The study of meaning


Pragmatics


The study of how language is used to accomplish goals


Discourse


The study of linguistic units larger than a single sentence


Ambiguity in Language

Ambiguity arises at almost every level, and one of the
main tasks in NLP is to resolve it.

"I made her duck" can mean:


I cooked waterfowl for her.


I cooked waterfowl belonging to her.


I created the (plastic?) duck she owns.


I caused her to quickly lower her body.


I waved my magic wand and turned her into a waterfowl.


Time flies like an arrow vs. Fruit flies like a banana
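A minimal sketch of why such sentences are hard, assuming a toy lexicon invented for this example (the tag names and word entries are not from the slides). Each word may take several parts of speech, so the number of candidate readings multiplies before any syntax or meaning is considered.

```python
from itertools import product

# Hypothetical toy lexicon: each word maps to the parts of speech it could take.
LEXICON = {
    "time": ["NOUN", "VERB"],
    "flies": ["NOUN", "VERB"],
    "like": ["VERB", "PREP"],
    "an": ["DET"],
    "arrow": ["NOUN"],
}

def tag_sequences(sentence):
    """Return every tag sequence the lexicon allows for the sentence."""
    words = sentence.lower().split()
    options = [LEXICON[w] for w in words]
    return [list(zip(words, tags)) for tags in product(*options)]

readings = tag_sequences("Time flies like an arrow")
print(len(readings))  # 2 * 2 * 2 * 1 * 1 = 8 candidate tag sequences
```

Most of the 8 sequences are not grammatical; a tagger or parser has to pick the plausible ones.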

Models and Algorithms for NLP


Taken mainly from Computer Science, Mathematics and
Linguistics


State Machines and Automata: Finite-state automata &
transducers, weighted automata, Markov models…


Formal Rule Systems: Regular grammars, CFGs,
Unification Grammars…


Logic: First-order Calculus, Predicate Logic…


Probability Theory: Statistical Processing, Machine
Learning…
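As an illustration of the first family above, here is a minimal sketch of a deterministic finite-state automaton. The toy language (a "b", then two or more "a", then "!") and all names are assumptions chosen for this example, not taken from the slides.

```python
# Transition table of a deterministic finite-state automaton.
# Hypothetical toy language: "b", then two or more "a", then "!".
TRANSITIONS = {
    (0, "b"): 1,
    (1, "a"): 2,
    (2, "a"): 3,
    (3, "a"): 3,  # loop: any number of further "a"
    (3, "!"): 4,
}
START = 0
ACCEPTING = {4}

def accepts(string):
    """Return True if the automaton ends in an accepting state."""
    state = START
    for symbol in string:
        state = TRANSITIONS.get((state, symbol))
        if state is None:  # no transition defined: reject
            return False
    return state in ACCEPTING

print(accepts("baaa!"))  # True
print(accepts("ba!"))    # False: only one "a"
```

The table-driven design keeps the recognizer separate from the language, so a different regular language only needs a different table.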


The Turing Test


Alan Turing (1950): Empirical test for Artificial
Intelligence. A human interrogator puts questions to a
human and to a machine through a teletype, and tries to
find out which is the human and which is the machine.


Q: Please write me a sonnet on the topic of the Forth Bridge.

A: Count me out on this one. I never could write poetry.

Q: Add 34957 to 70764.

A: (Pause for 30 seconds) 105621.


ELIZA


Weizenbaum (1966): Program imitating the responses of a
psychotherapist.

User: You are like my father in some ways.

ELIZA: What resemblance do you see?

User: You are not very aggressive but I think you don’t want me to notice that.

ELIZA: What makes you think I am not very aggressive?

User: You don’t argue with me.

ELIZA: Why do you think I don’t argue with you?



Used simple pattern matching, without any deeper
knowledge of the world or of the conversation.


http://www-ai.ijs.si/cgi-bin/eliza/eliza_script
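The pattern-matching idea can be sketched in a few lines. This is not Weizenbaum's script: the three rules and the fallback reply below are hypothetical, written only to reproduce the sample exchange above with regular expressions.

```python
import re

# Hypothetical rules: (pattern, response template). First match wins.
RULES = [
    (re.compile(r"you are like (.+) in some ways", re.I),
     "What resemblance do you see?"),
    (re.compile(r"you are (.+?)(?: but .*)?$", re.I),
     "What makes you think I am {0}?"),
    (re.compile(r"you don't (.+?) me\b", re.I),
     "Why do you think I don't {0} you?"),
]
DEFAULT = "Please go on."  # assumed fallback when no rule matches

def respond(utterance):
    """Answer by echoing part of the input; no model of meaning is used."""
    # Normalise curly apostrophes and drop trailing punctuation before matching.
    text = utterance.replace("’", "'").strip().rstrip(".!?")
    for pattern, template in RULES:
        match = pattern.search(text)
        if match:
            return template.format(*match.groups())
    return DEFAULT

print(respond("You don't argue with me."))
```

The program only copies captured words into a template, which is why it works without any knowledge of the world or of the conversation.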

Foundational Insights: 1940s and 1950s



Automata.


Based on Turing’s computational model.


Led to formal language theory (Chomsky).


Probabilistic and Information-Theoretic Models.


Transmission of language and communication treated
as a noisy channel and decoding problem.


First machine speech recognizers (1952).
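The noisy-channel view mentioned above is usually written as a decoding problem. The symbols are assumptions of this sketch: W is the intended word sequence and O is the observed, noisy signal.

```latex
\hat{W} = \arg\max_{W} P(W \mid O)
        = \arg\max_{W} \frac{P(O \mid W)\, P(W)}{P(O)}
        = \arg\max_{W} P(O \mid W)\, P(W)
```

P(O) can be dropped because it does not depend on W. P(O | W) models the channel and P(W) models the language.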


Two Camps: 1957-1970


Symbolic vs. Stochastic Paradigm.


Symbolic


Formal language theory, generative syntax (Chomsky)


Implementation of first parsers


Artificial Intelligence


Stochastic


Bayesian Methods


Optical Character Recognition


Authorship Identification

Four Paradigms: 1970-1983


Stochastic Paradigm


Speech Recognition Algorithms (Hidden Markov
Models)


Logic-Based Paradigm


Work that led to Prolog, Functional Grammars and
Unification


Natural Language Understanding


SHRDLU


Question-answering Systems


Discourse Modeling


Automatic Reference Resolution

Empiricism and Finite-State Models: 1983-1993


Return of Empiricism and Finite State Methods.


Not so popular in the previous decades.


Finite-state models:


Phonology and morphology


Syntax


Probabilistic models:


Speech recognition



Part of speech tagging


Probabilistic parsing

The Field Comes Together: 1994-


Spread of probabilistic and data-driven methods to all
kinds of problems.


Increase in computer speed led to commercial exploitation
of speech and language technologies.


The web led to emphasis on information retrieval and
extraction.


Somewhat less emphasis on theoretical work.

Practical Application Areas


Information-accessing Systems


Database queries


Information Retrieval


Information Extraction


Task-oriented Systems


Text editors


Robots


Educational Systems


Intelligent Tutoring


Student Modelling


Translation Systems


Machine Translation


Computer-aided translation

Practical Application Areas

System Modality


Text


Speech


Multimodal applications


System Initiatives


Analysis


Generation



Theoretical Applications


Theory-specification tools


Transformational Grammar, ATNs, LFG, GPSG, HPSG,
Systemic Grammar, Functional Unification Grammar



Theoretical modeling

1. Processing models: Parsing, Semantics, Speech Recognition.

2. Acquisition models: Language Learning Models



Current Research

http://cslu.cse.ogi.edu/HLTsurvey/HLTsurvey.html



Spoken Language Input


Written Language Input


Language Analysis and Understanding


Language Generation


Spoken Output Technologies


Discourse and Dialogue


Document Processing


Multilinguality


Multimodality


Transmission and Storage


Mathematical Methods


Language Resources


Evaluation


Course Topics


Computational Morphology


Regular Grammars, Finite-state Automata and Transducers


Corpus Linguistics


N-Grams, Part-of-speech Tagging


Parsing and Context-free Grammars


Unification Grammars


Lexical Semantics and WordNet


Word Sense Disambiguation and Information Retrieval


Machine Translation