Natural Language in AI


24 Oct 2013



Natural Language

INFO 629 Dr. R. Weber

Copyright R. Weber
Natural Language in AI



Outline


Text-based natural language


Dialogue-based natural language



Methods in Natural Language
Processing

Methods in NLP address two
categories of tasks:


NL generation


NL understanding



Natural Language problems


dialogue-based


NL interfaces


spoken and written communication


uses natural language understanding


discourse (any string longer than one sentence)


text-based


text categorization, text generation, information
extraction, machine translation


Text-Based Natural Language


Text-based NL problems


story/text understanding;


information extraction: extracting information
from text;


translating documents, manuals,
communications;


drafting documents;


summarizing texts;


text generation, categorization or clustering,
text DB retrieval, text mining, topic
identification;


Text-based Natural Language Topics



Information extraction


Machine translation


Drafting


Text summarization


Information Extraction


Extracting specific types of information from large
volumes of unrestricted text;


The IE system must be given domain guidelines
that specify what to find and what to extract;


IE systems look for the portions of text that are likely
to contain the relevant information;


IE systems are not required to fully understand
the source text.
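As a minimal sketch of the points above, the domain guidelines are assumed here to be a hypothetical mapping from template slots to regular expressions. The extractor only looks for matching portions of the text, copies the matched strings into the slots, and ignores everything else, so no full understanding of the text is needed. The slot names, patterns, and sample sentence are illustrative assumptions, not taken from a real IE system.

```python
import re

# Hypothetical domain guidelines: which slots to fill and what each one looks like.
MONTHS = ("January|February|March|April|May|June|July|August|"
          "September|October|November|December")
GUIDELINES = {
    "company": re.compile(r"\b[A-Z][a-z]+ (?:Inc|Corp|Ltd)\."),
    "amount": re.compile(r"\$\d+(?:\.\d+)? (?:million|billion)"),
    "date": re.compile(r"\b(?:" + MONTHS + r") \d{1,2}, \d{4}"),
}

def extract(text):
    """Fill one template with the first match per slot, copied from the source text."""
    template = {}
    for slot, pattern in GUIDELINES.items():
        match = pattern.search(text)
        # Slots with no match stay empty; the rest of the text is simply ignored.
        template[slot] = match.group(0) if match else None
    return template

record = extract("On March 3, 1998, Acme Corp. agreed to buy a rival for $40 million. "
                 "Analysts were surprised.")
```

Hand-written patterns like these correspond to the knowledge-based approach; the machine learning approach would instead induce them from annotated examples.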


Types of IE


Knowledge-based Information Extraction


Machine learning IE


Template-based, Wrappers


Template Mining


Knowledge-based Information Extraction


uses linguistic patterns to support the interpretation
of input texts.

Machine learning IE


uses an inductive learning mechanism to automatically
construct a knowledge base of patterns.

Types of IE


Template-based, Wrappers


IE’s output is a populated database, which can be
used as a case base


The values for the slots are strings from the source
text


The resulting database works as a template

Template Mining


well suited to areas “where the text is terse and
sentences are unambiguous and declarative in
nature”.

Types of IE


Relation between IE and
NLP

Using linguistic patterns:


knowledge-based (represents patterns)


inductive learning based (learns patterns)


template mining (skips parsing)



NLP is needed whenever negation must be
disambiguated or word order makes a difference
in meaning
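The negation point can be illustrated with a minimal sketch, assuming a hypothetical "fraud" keyword slot. A pattern that skips parsing fires on any sentence containing the keyword, including one that negates it; even a crude check for hypothetical negation cues in a short window before the keyword changes the answer. A real system would need syntactic analysis to determine the scope of the negation.

```python
import re

KEYWORD = re.compile(r"\bfraud\b", re.IGNORECASE)
# Hypothetical negation cues; deciding real negation scope requires parsing.
NEGATION_CUES = {"no", "not", "never", "without"}

def naive_match(sentence):
    """Template-style match: fires whenever the keyword appears."""
    return bool(KEYWORD.search(sentence))

def negation_aware_match(sentence, window=3):
    """Fires only if none of the `window` words before the keyword is a negation cue."""
    words = re.findall(r"[a-z']+", sentence.lower())
    for i, word in enumerate(words):
        if word == "fraud":
            preceding = words[max(0, i - window):i]
            if not NEGATION_CUES.intersection(preceding):
                return True
    return False

positive = "Auditors found fraud in the accounts."
negated = "Auditors found no evidence of fraud."
```

Both sentences trigger `naive_match`, but only the first triggers `negation_aware_match`.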


Examples of applications of
IE






Information extraction


Machine translation


Drafting


Text summarization

Text-based Natural Language Topics


Can you translate this
sentence?

Ever since computers were invented, it has been natural
to wonder whether they might be able to learn.



By Tom Mitchell




Describe the steps you used to
translate the sentence


List the words you used in the
translated sentence and associate each one with
its counterpart in the source sentence



English (source)    Portuguese (translation)
Ever since          Desde que
computers           computadores
were                foram
invented            inventados
it has been         tem sido
natural             natural
to wonder           imaginar
whether             que
they                eles
might be            sejam
able                capazes de
to learn.           aprender.
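The word associations above can be read as a tiny phrase table. The sketch below is a hypothetical illustration, not a real MT system: it translates by greedily matching the longest known English phrase at each position and emits the Portuguese counterparts in the same order. It reproduces this sentence only because the table was built from it. It does no reordering, agreement, or disambiguation, which is exactly where word-for-word translation breaks down on new input.

```python
# Phrase table built from the word associations above (English -> Portuguese).
PHRASE_TABLE = {
    ("ever", "since"): "desde que",
    ("computers",): "computadores",
    ("were",): "foram",
    ("invented",): "inventados",
    ("it", "has", "been"): "tem sido",
    ("natural",): "natural",
    ("to", "wonder"): "imaginar",
    ("whether",): "que",
    ("they",): "eles",
    ("might", "be"): "sejam",
    ("able",): "capazes de",
    ("to", "learn"): "aprender",
}
MAX_LEN = max(len(key) for key in PHRASE_TABLE)

def translate(sentence):
    """Greedy longest-match, monotone (no reordering) phrase-by-phrase translation."""
    words = [w.strip(",.").lower() for w in sentence.split()]
    output, i = [], 0
    while i < len(words):
        # Try the longest phrase first, then shorter ones.
        for n in range(min(MAX_LEN, len(words) - i), 0, -1):
            phrase = tuple(words[i:i + n])
            if phrase in PHRASE_TABLE:
                output.append(PHRASE_TABLE[phrase])
                i += n
                break
        else:
            output.append(f"<{words[i]}?>")  # unknown word: no entry in the table
            i += 1
    return " ".join(output)

result = translate("Ever since computers were invented, it has been natural "
                   "to wonder whether they might be able to learn.")
```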


Online translators



http://babelfish.altavista.com/babelfish/tr

http://world.altavista.com/tr

http://www.systransoft.com/



What’s wrong with them?


…cursing my head for things that I've said till
I finally died, which started the whole world
living…



Can you translate this
sentence?


What works?

The KANT project:



Knowledge-based, Accurate Translation for technical
documentation



founded in 1989



large-scale, practical translation systems



for technical documentation


KANT project homepage:


http://www.lti.cs.cmu.edu/Research/Kant/


KANT


uses a controlled vocabulary and grammar for each language


explicit yet focused semantic models for each technical domain


achieves very high accuracy in translation


multilingual document production


has been applied to the domains:


electric power utility management



heavy equipment technical documentation.
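A controlled language only pays off if authors can check their text against it before translation. The sketch below assumes a tiny hypothetical approved vocabulary and a single hypothetical grammar restriction (a maximum sentence length). It flags sentences that fall outside the controlled language so a human can rewrite them. It illustrates the idea only and is not KANT's actual checker.

```python
import re

# Hypothetical controlled vocabulary for a heavy-equipment manual.
APPROVED_WORDS = {
    "remove", "the", "bolt", "from", "bracket", "check", "oil", "level",
    "before", "you", "start", "engine", "install", "new", "filter",
}
MAX_WORDS = 20  # hypothetical limit on sentence length

def check(text):
    """Return (sentence, problem) pairs for sentences outside the controlled language."""
    problems = []
    for sentence in re.split(r"(?<=[.!?])\s+", text.strip()):
        words = re.findall(r"[a-z]+", sentence.lower())
        unknown = sorted(set(words) - APPROVED_WORDS)
        if unknown:
            problems.append((sentence, "unapproved words: " + ", ".join(unknown)))
        if len(words) > MAX_WORDS:
            problems.append((sentence, "sentence too long"))
    return problems

report = check("Check the oil level before you start the engine. Yank the old filter off.")
```

Only the second sentence is flagged, because "yank", "old", and "off" are not in the approved vocabulary.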


Machine Translation


Unrestricted MT is still inadequate. Will it ever
change?


Why should MT aim to outperform human
translation?


An alternative is using humans to edit the original
document into a subset of the original language
(canonical form)

Cost of MT


lexicons of 20,000-100,000 words


grammars with 100 to 10,000 rules




Information extraction


Machine translation


Drafting


Text summarization

Text-based Natural Language Topics


Drafting


applications in the legal domain


drafting of wills


petitions for restraining orders


use of rhetorical structure

Example Rhetorical
Structure




Information extraction


Machine translation


Drafting


Text summarization

Text-based Natural Language Topics


Summarize text


Describe the steps you used to
summarize text


Text summarization
applications



Generate a summary of many documents;


Generate a summary of one document
only;


Headline generation;


Text summarization

The traditional idea of summarization is to extract sentences and
concatenate them.


Human beings produce summaries of documents by creating new
sentences that capture the most salient pieces of information in
the original document, that are grammatical, and that cohere with
one another.


Given that large collections of text/abstract pairs are available
online, it is now possible to envision algorithms that are trained to
mimic this process.


From Knight, K. and Marcu, D. 2000.


Text summarization steps



Identify most relevant segments;


Apply rules for deleting redundant parts;


Compress/aggregate long sentences;


Assess coherence of segments;


Revise.
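The first two steps, combined with the traditional "extract and concatenate" strategy, can be sketched as follows. Here relevance is assumed to be the average document frequency of a sentence's content words, a common heuristic and not the trained method of Knight and Marcu. Redundancy is assumed to mean high word overlap (Jaccard similarity) with a sentence already selected. The stopword list and thresholds are illustrative assumptions. Compression, coherence checking, and revision are not attempted.

```python
import re
from collections import Counter

# Hypothetical stopword list; content words are everything else.
STOPWORDS = {"the", "a", "an", "of", "to", "in", "and", "is", "are", "was", "it", "on", "for"}

def content_words(sentence):
    return [w for w in re.findall(r"[a-z]+", sentence.lower()) if w not in STOPWORDS]

def jaccard(a, b):
    return len(a & b) / len(a | b) if a | b else 1.0

def summarize(text, max_sentences=2, redundancy_threshold=0.4):
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    freq = Counter(w for s in sentences for w in content_words(s))

    def relevance(s):
        words = content_words(s)
        return sum(freq[w] for w in words) / len(words) if words else 0.0

    # Step 1: identify the most relevant segments (here, whole sentences).
    ranked = sorted(range(len(sentences)), key=lambda i: relevance(sentences[i]), reverse=True)
    chosen = []
    for i in ranked:
        words_i = set(content_words(sentences[i]))
        if not words_i:
            continue
        # Step 2: delete redundant parts, i.e. skip near-repeats of chosen sentences.
        if any(jaccard(words_i, set(content_words(sentences[j]))) >= redundancy_threshold
               for j in chosen):
            continue
        chosen.append(i)
        if len(chosen) == max_sentences:
            break
    # Concatenate the extracted sentences in their original order.
    return " ".join(sentences[i] for i in sorted(chosen))

summary = summarize("Speech recognition converts speech to text. "
                    "Speech recognition turns speech into text. "
                    "The weather was sunny. Recognition is hard.")
```

The second sentence scores well but is dropped as a near-repeat of the first, so the summary keeps the first and last sentences.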


Example



Dialogue-based natural language





NL Understanding


Speech recognition


intonation, pronunciation, speed


Natural Language Processing


syntactic, semantic, pragmatic analysis

Natural Language Generation


intention, generation, speech synthesis



Dialogue-based natural language



the analog signal from the voice is digitized


the phonemes produced are identified


template matching attempts to match the sounds
produced against a library of phoneme sounds


the outcome is a list of phonemes and probabilities


the words are found using hidden Markov modeling


Speech recognition
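Finding the most likely hidden sequence behind a list of noisy observations is what the Viterbi algorithm does for a hidden Markov model. In the minimal sketch below, the hidden states are two phonemes and the observations are hypothetical labels from the template matcher. All probabilities are made up for illustration. A real recognizer chains phoneme models into word models, uses far more states, and works with log probabilities to avoid numerical underflow.

```python
def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return the most likely hidden-state path and its probability."""
    # best[t][s] = (probability of the best path ending in s at time t, previous state)
    best = [{s: (start_p[s] * emit_p[s].get(observations[0], 0.0), None) for s in states}]
    for t in range(1, len(observations)):
        column = {}
        for s in states:
            column[s] = max(
                (best[t - 1][p][0] * trans_p[p][s] * emit_p[s].get(observations[t], 0.0), p)
                for p in states
            )
        best.append(column)
    # Pick the best final state, then follow the back-pointers.
    last = max(states, key=lambda s: best[-1][s][0])
    path = [last]
    for t in range(len(observations) - 1, 0, -1):
        path.append(best[t][path[-1]][1])
    path.reverse()
    return path, best[-1][last][0]

# Hypothetical two-phoneme model; "hiss" and "buzz" are made-up acoustic labels.
STATES = ("s", "z")
START_P = {"s": 0.6, "z": 0.4}
TRANS_P = {"s": {"s": 0.7, "z": 0.3}, "z": {"s": 0.4, "z": 0.6}}
EMIT_P = {"s": {"hiss": 0.8, "buzz": 0.2}, "z": {"hiss": 0.3, "buzz": 0.7}}

path, prob = viterbi(["hiss", "hiss", "buzz"], STATES, START_P, TRANS_P, EMIT_P)
```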


How to recognize speech


How to wreck a nice beach

Ice cream

I scream
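"Ice cream" and "I scream" correspond to the same phoneme sequence, so the recognizer must choose a word segmentation. The sketch below assumes a hypothetical four-word pronunciation lexicon (ARPAbet-style symbols) and hypothetical bigram probabilities standing in for a language model. It enumerates every segmentation and ranks the candidates. Both readings are valid without context; only the language model (or later pragmatic analysis) separates them.

```python
import math

# Hypothetical pronunciation lexicon.
LEXICON = {
    "I": ("AY",),
    "ice": ("AY", "S"),
    "scream": ("S", "K", "R", "IY", "M"),
    "cream": ("K", "R", "IY", "M"),
}
# Hypothetical bigram probabilities; "<s>" marks the start of the utterance.
BIGRAMS = {("<s>", "I"): 0.3, ("I", "scream"): 0.01,
           ("<s>", "ice"): 0.05, ("ice", "cream"): 0.5}
UNSEEN = 1e-6  # probability assumed for bigrams not in the table

def segmentations(phonemes):
    """All word sequences whose pronunciations concatenate to `phonemes`."""
    if not phonemes:
        return [[]]
    results = []
    for word, pron in LEXICON.items():
        if phonemes[:len(pron)] == pron:
            for rest in segmentations(phonemes[len(pron):]):
                results.append([word] + rest)
    return results

def score(words):
    """Log probability of a word sequence under the bigram table."""
    history = ["<s>"] + words
    return sum(math.log(BIGRAMS.get(pair, UNSEEN)) for pair in zip(history, history[1:]))

candidates = segmentations(("AY", "S", "K", "R", "IY", "M"))
best = max(candidates, key=score)
```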


Speech Recognition
Methods


speech recognition can also be implemented with an
inductive method such as neural networks


individual and continuous recognizers


a controlled vocabulary can increase the chances of
success, e.g., Jupiter


recognizers may be limited to one speaker; when multiple
speakers are needed, retraining is often necessary


speech understanding includes speech recognition
and understanding of the recognized utterance



- Syntactic Analysis
- Parsing
- Semantics
- Pragmatics


Natural Language
Understanding


Syntactic analysis


a parser recovers the phrase structure of an
utterance, given a grammar (rules of syntax)


the parser’s output is the structure (groups of
words and their parts of speech)


phrase structure is represented in a parse tree


Parsing is the first step towards determining the
meaning of an utterance


Parsing


Parsing: method to analyze a sentence to
determine its structure according to the
grammar


Grammar: formal specification of the
structures allowable in the language




Examples of Symbols in a
Grammar


(S) sentence


(NP) noun phrase


(VP) verb phrase


(PP) prepositional phrase


(RelClause) relative clause


(Det) determiner


Grammar rules

S → NP VP
NP → Det Adjective N
S → VP VP
VP → V Adjective
S → VP PP
NP → Adjective N
S → NP VP VP
VP → V S
VP → V NP
VP → V PP
NP → Noun
PP → P Noun
NP → Det Noun

Dictionary entries:

V → ate
NAME → John
Det(art) → the
N → cat

Parsing Tree

Parse tree for "The terrain is insurmountable":

S
  NP
    Article: The
    Noun: terrain
  VP
    Verb: is
    Adjective: insurmountable
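A parse tree like the one above can be produced by a small top-down (recursive-descent) parser. In the sketch below, the grammar and lexicon are a hypothetical subset written to cover this one sentence using the tree's category names. The parser enumerates every complete parse by backtracking. A left-recursive rule such as NP → NP PP would make this naive strategy loop forever, which is one reason chart parsers are preferred in practice.

```python
# Hypothetical grammar and lexicon covering the example sentence.
GRAMMAR = {
    "S": [["NP", "VP"]],
    "NP": [["Article", "Noun"], ["Article", "Adjective", "Noun"]],
    "VP": [["Verb", "Adjective"], ["Verb", "NP"]],
}
LEXICON = {
    "the": {"Article"},
    "terrain": {"Noun"},
    "is": {"Verb"},
    "insurmountable": {"Adjective"},
}

def parse(symbol, words, pos):
    """Yield (tree, next_pos) for every way `symbol` can start at words[pos]."""
    if symbol not in GRAMMAR:  # part-of-speech category: consume exactly one word
        if pos < len(words) and symbol in LEXICON.get(words[pos], set()):
            yield (symbol, words[pos]), pos + 1
        return
    for rhs in GRAMMAR[symbol]:
        yield from parse_sequence(symbol, rhs, words, pos, [])

def parse_sequence(symbol, rhs, words, pos, children):
    """Match the right-hand side `rhs` child by child, backtracking on failure."""
    if not rhs:
        yield (symbol, *children), pos
        return
    for child, next_pos in parse(rhs[0], words, pos):
        yield from parse_sequence(symbol, rhs[1:], words, next_pos, children + [child])

def parse_sentence(sentence):
    """Return every tree rooted in S that covers the whole sentence."""
    words = sentence.lower().split()
    return [tree for tree, end in parse("S", words, 0) if end == len(words)]

trees = parse_sentence("The terrain is insurmountable")
```

Each tree is a nested tuple whose first element is the category, mirroring the indented tree above.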



the outcome of the syntactic analysis can still
be a set of alternative structures with
associated probabilities


sometimes grammar rules can disambiguate a
sentence:



“John set the set of chairs”

Sometimes they can’t.




…the next step is semantic analysis
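A minimal sketch of how grammar rules alone can settle the part of speech of each "set" follows. It assumes a hypothetical lexicon in which "set" can be a verb (V) or a noun (N), and a hypothetical grammar S → NAME V NP, NP → Det N (PP), PP → P N. This grammar is small and non-recursive, so it can be flattened into a regular expression over tag sequences. Of the four possible tag assignments, only one is accepted.

```python
import itertools
import re

# Hypothetical lexicon: "set" is ambiguous between verb and noun.
LEXICON = {"john": ["NAME"], "set": ["V", "N"], "the": ["Det"], "of": ["P"], "chairs": ["N"]}
# S -> NAME V NP ; NP -> Det N (PP) ; PP -> P N, flattened into one pattern over tags.
SENTENCE = re.compile(r"NAME V Det N( P N)?")

def readings(sentence):
    """Return every tag sequence for `sentence` that the grammar accepts."""
    words = sentence.lower().split()
    options = [LEXICON[w] for w in words]
    return [tags for tags in itertools.product(*options)
            if SENTENCE.fullmatch(" ".join(tags))]

accepted = readings("John set the set of chairs")
```

The first "set" must be the verb and the second the noun, because no other assignment fits the grammar.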


Semantic analysis


semantics provide a partial representation
for meaning


represents the sentence in meaningful
parts


uses possible syntactic structures and
meaning


builds a parse tree with associated
semantics


semantics typically represented with logic


Compositional semantics


The semantics of a phrase is a function of the
semantics of its sub-phrases


It does not depend on any other phrase


So, if we know the meaning of the sub-phrases, then we
know the meaning of the phrase



“A goal of semantic interpretation is to find a way that
the meaning of the whole sentence can be put
together in a simple way from the meanings of the
parts of the sentence.” (Alison, 1997, p. 112)
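Compositionality can be made concrete by giving each word a meaning and each grammar rule a way of combining meanings. In the minimal sketch below, which illustrates the principle rather than any particular textbook formalism, names denote constants. A transitive verb denotes a function that takes an object and then a subject, while an intransitive verb takes only a subject. The rules S → NP VP and VP → V NP are plain function application, and the result is a logic-style string. The lexicon and rules are assumptions.

```python
# Word meanings. Transitive verbs are curried: object first, then subject.
LEXICON = {
    "John": "john",
    "Mary": "mary",
    "loves": lambda obj: lambda subj: f"loves({subj}, {obj})",  # transitive
    "jumps": lambda subj: f"jumps({subj})",                     # intransitive
}

def vp_transitive(verb, obj):
    """VP -> V NP: apply the verb's meaning to the object's meaning."""
    return LEXICON[verb](LEXICON[obj])

def vp_intransitive(verb):
    """VP -> V: an intransitive verb's meaning already expects only a subject."""
    return LEXICON[verb]

def sentence(subj, vp_meaning):
    """S -> NP VP: apply the VP's meaning to the subject's meaning."""
    return vp_meaning(LEXICON[subj])

meaning_1 = sentence("John", vp_transitive("loves", "Mary"))
meaning_2 = sentence("Mary", vp_intransitive("jumps"))
```

Each rule only looks at the meanings of its own sub-phrases, so the meaning of the whole sentence is assembled bottom-up from the meanings of its parts.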



the transitivity of a verb adds meaning to a
parse tree (e.g., jump is intransitive, love is
transitive)


-
John died Mary

Is there a period missing or is it:


-
John dyed Mary

Semantic analysis


Pragmatic analysis


uses context


uses partial representation


includes purpose and performs
disambiguation


Where, when, by whom an utterance was
said


Example using Ontology


Fred saw the plane flying over Zurich.


Fred saw the mountains flying over Zurich.

Traditional NL systems will have difficulty resolving this
syntactic ambiguity, but because CYC knows that planes
fly and mountains do not, it will be able to parse these
sentences just as easily as a human.

It's difficult to see how this could be done without relying on
a large database of common sense.

http://www.cyc.com/products2.html
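The CYC example comes down to deciding who is doing the flying. The sketch below replaces CYC with a hypothetical two-entry common-sense table. It assumes the sentences follow the fixed pattern "X saw the Y flying over Z", so a regular expression can stand in for the parser. It illustrates the idea only and is not CYC's API.

```python
import re

# Hypothetical common-sense facts standing in for CYC's knowledge base.
CAN_FLY = {"plane": True, "mountains": False}

PATTERN = re.compile(r"(\w+) saw the (\w+) flying over (\w+)")

def interpret(sentence):
    """Attach 'flying over Z' to the seen object if it can fly, else to the subject."""
    subject, obj, place = PATTERN.match(sentence).groups()
    flyer = obj if CAN_FLY.get(obj, False) else subject
    return f"flying({flyer}, {place})"

plane_reading = interpret("Fred saw the plane flying over Zurich.")
mountain_reading = interpret("Fred saw the mountains flying over Zurich.")
```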



because it includes context, it can also take into account
the sentence that follows: The man saw the plane flying over
Zurich. It was dark; when he looked up at the sky again, the
plane was gone.


Another interpretation would be given if the
following sentence were: The man saw the plane
flying over Zurich. He also saw the building where
the plane crashed.

Example using Ontology


Pronoun disambiguation:

The police arrested the demonstrators because they feared
violence.

The police arrested the demonstrators because they
advocated violence.

Mary saw the coat in the store window and wanted it.

Mary saw the coat in the store window and pressed her nose
up against it.



Pronoun disambiguation


using Ontology
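Each pair of sentences differs only in what is said about the pronoun's referent, so a resolver needs knowledge of which entities plausibly do what. The sketch below assumes a hypothetical table of plausibility scores for (entity, predicate) pairs, standing in for an ontology. It picks the candidate antecedent with the highest score. The scores and predicate labels are illustrative assumptions.

```python
# Hypothetical plausibility of "<entity> <predicate>", standing in for an ontology.
PLAUSIBILITY = {
    ("police", "feared violence"): 0.8,
    ("demonstrators", "feared violence"): 0.2,
    ("police", "advocated violence"): 0.05,
    ("demonstrators", "advocated violence"): 0.7,
    ("coat", "is wanted"): 0.9,
    ("window", "is wanted"): 0.1,
    ("coat", "has a nose pressed against it"): 0.1,
    ("window", "has a nose pressed against it"): 0.9,
}

def resolve(candidates, predicate):
    """Return the candidate antecedent for which the predicate is most plausible."""
    return max(candidates, key=lambda c: PLAUSIBILITY.get((c, predicate), 0.0))
```

For example, `resolve(["police", "demonstrators"], "advocated violence")` selects the demonstrators.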


Communication and
Planning


Deciding what to say relates to planning


Understanding relates to plan recognition


Current NLP


logic-based NLP is less accurate


statistical natural language processing
increases accuracy to around 98%


this is still not good enough: given the average length
of a newspaper sentence, this accuracy can result in about
1 error per sentence
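The arithmetic behind this claim can be checked quickly. Assume (the slide does not say so) that 98% is a per-word accuracy and that errors are independent. A sentence of n words then contains n × 0.02 errors on average, and it contains at least one error with probability 1 - 0.98^n. The sentence lengths below are assumptions chosen to show the trend.

```python
def expected_errors(n_words, accuracy=0.98):
    """Average number of wrongly analysed words in an n-word sentence."""
    return n_words * (1 - accuracy)

def p_at_least_one_error(n_words, accuracy=0.98):
    """Probability that an n-word sentence contains at least one error (independent errors)."""
    return 1 - accuracy ** n_words

for n in (10, 25, 50):
    print(n, round(expected_errors(n), 2), round(p_at_least_one_error(n), 2))
```

Under these assumptions, a 50-word sentence averages one error, and even a 25-word sentence has roughly a 40% chance of containing at least one.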


Processes in NL
communication

Communication involves three steps by the
speaker:


the intention to convey an idea (what to say)


the mental generation of words (how to say)


their synthesis (say it)

Natural Language Generation


what to say


text planning


utterances that achieve a goal, may include
ordering


result of reasoning (e.g., retrieval)


a confirmation or thanks (Jupiter sounds a
beep)


question motivated by need of confirmation


question motivated by need of missing
information


how to say


how to convert a semantic representation into a
sentence


grammatically correct


proper choice of words


in limited problem types, templates are helpful


e.g., JUPITER says “I have no knowledge of that”


starts sentences with:


In (city) (day of the week), chances…


finishes sentences with:


“Is there something else?” or “Can I help you with something else?”
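In a restricted domain, template-based generation amounts to filling slots in fixed sentence frames. The sketch below uses the opening phrase, closing phrases, and fallback reply listed on this slide. The slot names (city, day, body) are hypothetical, and the forecast text passed in as `body` is an illustrative placeholder, not JUPITER's actual wording.

```python
OPENING = "In {city} {day}, "
CLOSINGS = ["Is there something else?", "Can I help you with something else?"]
NO_KNOWLEDGE = "I have no knowledge of that."

def respond(frame, turn=0):
    """Render a reply from a frame of slots; fall back when a slot is missing."""
    if not frame.get("city") or not frame.get("day") or not frame.get("body"):
        return NO_KNOWLEDGE
    closing = CLOSINGS[turn % len(CLOSINGS)]  # alternate the closing phrase across turns
    return OPENING.format(city=frame["city"], day=frame["day"]) + frame["body"] + " " + closing

reply = respond({"city": "Boston", "day": "Tuesday", "body": "chances of rain are high."})
```

Templates guarantee grammatical output and consistent word choice, at the cost of covering only the situations someone anticipated.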


say it!


speech synthesis


from words into speech signal


applications of neural networks


templates with recordings from humans


record every word in a dictionary


record every phoneme (worst choice!)


JUPITER uses a commercial speech
synthesizer


Example


Nitrogen is a prototype natural language
generation system



that combines symbolic rules with linguistic information
gathered statistically from large online text corpora.




http://www.isi.edu/natural-language/mt/nitrogen/

http://www.mri.mq.edu.au/~peba/MLPeba/system.html

http://cslu.cse.ogi.edu/HLTsurvey/ch4node3.html#SECTION4


JUPITER


1-888-573-8255

http://www.sls.lcs.mit.edu/sls/whatwedo/applications/jupiter.html


"What will the weather be like in Boston tomorrow?" Jupiter invokes the following
procedure:


- Speech recognition: SUMMIT converts the spoken sentence into text
- Language understanding: TINA parses the text into a semantic frame, a grammatical structure containing the basic terms needed to query the Jupiter database
- Language generation: GENESIS uses the semantic frame's basic terms to build a Structured Query Language (SQL) query for the database
- Information retrieval: Jupiter executes the SQL query and retrieves the requested information from the database
- Language generation: TINA and GENESIS convert the query result into a natural language sentence
- Information delivery: Jupiter delivers the generated sentence to the user via voice (using a speech synthesizer) and/or display
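The middle of the JUPITER pipeline (semantic frame, then SQL query, then result, then sentence) can be sketched with Python's built-in sqlite3 module. Everything here is a hypothetical stand-in. The frame is written by hand instead of coming from TINA, and the table name `forecasts`, its columns, and the data are assumptions, not JUPITER's real schema. Speech recognition and speech synthesis are left out.

```python
import sqlite3

def build_query(frame):
    """Turn a semantic frame into a parameterized SQL query (hypothetical schema)."""
    sql = "SELECT conditions, high_f FROM forecasts WHERE city = ? AND day = ?"
    return sql, (frame["city"], frame["day"])

def generate(frame, row):
    """Turn the query result back into a natural language sentence."""
    if row is None:
        return "I have no knowledge of that."
    conditions, high = row
    return f"In {frame['city']} {frame['day']}, expect {conditions} with a high of {high} degrees."

# Hypothetical in-memory weather database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE forecasts (city TEXT, day TEXT, conditions TEXT, high_f INTEGER)")
conn.execute("INSERT INTO forecasts VALUES ('Boston', 'tomorrow', 'light rain', 58)")

# A frame that language understanding might produce for
# "What will the weather be like in Boston tomorrow?"
frame = {"topic": "weather", "city": "Boston", "day": "tomorrow"}
sql, params = build_query(frame)
answer = generate(frame, conn.execute(sql, params).fetchone())
```

Using `?` placeholders keeps the values from the frame out of the SQL text itself, which avoids quoting problems and SQL injection.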