ICT619 Intelligent Systems


17 Nov 2013




Topic 9: Natural Language Processing and Language Technology

What is natural language processing (NLP)?

An ideal goal for human-computer communication is the ability to communicate in a natural language

NLP grew as a sub-domain of AI and linguistics - the task of developing software capable of understanding information (commands, text) expressed in a natural language in order to achieve specific goals

Understanding natural languages is a challenging task for computers, due to ambiguities, frequent use of context and the overall knowledge acquisition and use problem

Speech (voice) recognition and natural language processing

Speech recognition concerns understanding spoken commands or sentences from voice inputs

Example: Telstra’s directory assistance

A speech recognition system must first extract and recognise words from audio input

We might also like the system to be able to answer in speech - this requires speech generation as well

In NLP, input is already available in machine-readable form (eg words as Unicode text)

Future improvements in speech recognition will to some extent depend on progress in NLP

Speech Recognition

The state of the art

60-90% accuracy - good enough for general dictation

Speaker dependent - needs training

Cheap desktop software available

Example: IBM ViaVoice, Dragon NaturallySpeaking

Issues:

Isolated vs. continuous speech

Vocabulary size

Better speaker independence

Language Technology

Covers all areas related to NLP with a practical focus

Language technology is defined as: the application of knowledge about human language in computer-based solutions

Applications covered by language technology include:

Spoken language dialogue systems (speech recognition, some understanding, and speech generation)

Machine translation

Text summarisation

Information retrieval



Language Technology (cont’d)

The input to a language technology system may be provided through speech recognition, optical character recognition (OCR) or handwriting recognition

The output may be in the form of speech, tailored documents, or web pages


Approaches to natural language processing

Main approaches

Keyword searching

Linguistic analysis

AI-based

ANN-based

Statistical analysis

Keyword searching systems

Early NLP systems - and some in use today - are based on keyword searching (pattern matching)


Keyword searching NLP systems

Selected keywords or phrases are searched for in the input sentence

The program responds with specific pre-stored responses based on the keywords or phrases

The program may actually construct a response based on a partial reply coupled with keywords and phrases from the input

No real understanding of the input is involved

Keyword searching NLP systems (cont’d)

The best-known example is the ELIZA program from MIT, mid-1960s
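
The keyword-searching behaviour described on these slides can be sketched in a few lines of Python. This is only an illustration in the spirit of ELIZA: the rule table, the response wording and the function name `respond` are assumptions made up for this sketch, not ELIZA's actual script.

```python
import re

# Hypothetical keyword table: (pattern, partial reply). Not ELIZA's real script.
RULES = [
    (re.compile(r"\bi need (.+)", re.IGNORECASE), "Why do you need {0}?"),
    (re.compile(r"\b(mother|father)\b", re.IGNORECASE), "Tell me more about your {0}."),
    (re.compile(r"\b(hello|hi)\b", re.IGNORECASE), "Hello. What would you like to talk about?"),
]
DEFAULT_RESPONSE = "Please go on."


def respond(sentence: str) -> str:
    """Return the reply for the first rule whose keyword pattern matches."""
    text = sentence.strip().rstrip(".!?")
    for pattern, template in RULES:
        match = pattern.search(text)
        if match:
            # Splice words captured from the input into the pre-stored partial reply.
            return template.format(*(g.lower() for g in match.groups()))
    # Nothing in the look-up table matched: fall back to a canned reply.
    return DEFAULT_RESPONSE
```

For example, `respond("I need a holiday.")` reuses the user's own words in the reply, while any input with no listed keyword gets the default response, which is exactly the "no real understanding" limitation noted above.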

Keyword systems

Limitations

Inflexible - really just reactive responses

Unable to cope with anything not in their keyword look-up tables

No knowledge modelling

Today’s more sophisticated NLP systems

Try to understand the content of language by doing syntactic, semantic and pragmatic analyses

May be able to do some conceptual modelling

Better able to maintain continuous dialogues

Attempt to cope with the ambiguity and other features common in natural language


Other approaches to NLP

Linguistic analysis approach

Based on encoding formal grammar rules for sentence-level processing

A linguistically-oriented system focuses on syntax and semantics

AI-based systems

Focus on using world knowledge to understand language

One example of an AI-based NLP system is BORIS, written by Michael Dyer, a student of Roger Schank's

BORIS is a story understanding program that reads a narrative and answers questions about it

AI-based NLP example - BORIS

Richard hadn’t heard from his college roommate Paul for years. Richard had borrowed
money from Paul which was never paid back. But now he had no idea where to find his
old friend. When a letter finally arrived from San Francisco, Richard was anxious to
find out how Paul was.


Q: What happened to Richard at home?

BORIS: Richard got a letter from Paul.

Q: Who is Paul?

BORIS: Richard’s friend.

Q: Did Richard want to see Paul?

BORIS: Yes, Richard wanted to know how Paul was.

Q: Had Paul helped Richard?

BORIS: Yes, Paul lent money to Richard.

The BORIS system (from Roger Schank and Peter Childers, The Cognitive Computer).

Artificial neural networks based NLP

ANN-based systems

Use ANNs for processing language, particularly for lexical disambiguation

A neural net is trained to disambiguate by using context

Training presents units of six or so words containing the target word to be learned

Example: Disambiguation of the word “bank” in “We got a bank loan to buy a house”

Two possible senses: money sense, river sense

Groups of co-occurring words (neighbourhoods):

Money sense: bank money loan branch fee robbery

River sense: bank river bridge erosion earth slope
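
The idea of disambiguating by context can be illustrated without a neural network. The sketch below is a deliberately simpler stand-in: it counts how many words near “bank” fall into each neighbourhood listed above, rather than training an ANN. The function name `disambiguate`, the window size and the tie-breaking behaviour are assumptions for illustration.

```python
# Neighbourhoods taken from the slide; the target word itself is left out.
NEIGHBOURHOODS = {
    "money": {"money", "loan", "branch", "fee", "robbery"},
    "river": {"river", "bridge", "erosion", "earth", "slope"},
}


def disambiguate(sentence, target="bank", window=3):
    """Pick the sense whose neighbourhood overlaps most with nearby words.

    A simple overlap count, not a trained neural network. Looks at `window`
    words on each side of the target (about six words in total).
    Returns None if the target is absent or no context word matches.
    """
    words = [w.strip(".,!?\"'").lower() for w in sentence.split()]
    if target not in words:
        return None
    i = words.index(target)
    context = set(words[max(0, i - window):i] + words[i + 1:i + 1 + window])
    scores = {sense: len(context & hood) for sense, hood in NEIGHBOURHOODS.items()}
    best = max(scores, key=scores.get)  # on a tie, the first sense listed wins
    return best if scores[best] > 0 else None
```

With the slide's sentence, the only neighbourhood word in the window is “loan”, so the money sense is chosen. An ANN-based system would learn such associations from training examples instead of using a hand-written table.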

Statistical approach to NLP

Based on extracting statistically significant information - tags - from large corpora or bodies of text (millions of words) and using these as very general indexes to model parts or responses

Valuable because it does not require as much hand-modelling of knowledge, but acquires the tags automatically

Statistical methods are now receiving much attention, and more systems are likely to incorporate them in future.

Most NLP systems use a combination of the linguistic and AI approaches

Linguistic approach

Components of NLP systems

Five major elements: the parser, the lexicon, the semantic analyser, the knowledge base, and the generator

Components of NLP systems (cont’d)

A syntactical parser analyses the input sentence using the language's grammar, or rules of syntax

The output produced is a structural description of the sentence, known as a parse tree

Some rules of syntax for English:

S = NP + VP

S: sentence, NP: noun phrase, VP: predicate or verb phrase

The noun phrase can be more than a single noun

NP = D + ADJ + N

D: determiner (eg “a”, “this”), ADJ: adjective, N: main noun

Components of NLP systems (cont’d)

The lexicon

An internal dictionary used to perform the syntactic and semantic analysis

Contains semantic and grammatical information (eg part of speech) about words or word strings

Fig. An example parse tree for the sentence “Mary had a little lamb”
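
As a rough illustration of how a parser and a lexicon work together, the sketch below builds a parse tree for “Mary had a little lamb” using the rules S = NP + VP and NP = D + ADJ + N. Two further rules are assumptions added so the example sentence parses: NP = N (a single noun) and VP = V + NP. The lexicon entries and function names are likewise made up for this sketch.

```python
# Toy lexicon: word -> part of speech (assumed entries for this example only).
LEXICON = {"mary": "N", "had": "V", "a": "D", "little": "ADJ", "lamb": "N"}


def parse_np(tokens, pos):
    """NP = D + ADJ + N, or a single N (assumed). Returns (tree, next_pos) or None."""
    tags = [tag for _, tag in tokens[pos:pos + 3]]
    if tags == ["D", "ADJ", "N"]:
        return ("NP", *tokens[pos:pos + 3]), pos + 3
    if tags[:1] == ["N"]:
        return ("NP", tokens[pos]), pos + 1
    return None


def parse_vp(tokens, pos):
    """VP = V + NP (assumed rule). Returns (tree, next_pos) or None."""
    if pos < len(tokens) and tokens[pos][1] == "V":
        result = parse_np(tokens, pos + 1)
        if result:
            np_tree, end = result
            return ("VP", tokens[pos], np_tree), end
    return None


def parse_sentence(sentence):
    """S = NP + VP. Returns a nested-tuple parse tree, or None if no parse."""
    words = sentence.lower().rstrip(".").split()
    tokens = [(w, LEXICON.get(w, "UNK")) for w in words]  # lexicon look-up
    result = parse_np(tokens, 0)
    if not result:
        return None
    np_tree, pos = result
    result = parse_vp(tokens, pos)
    if not result:
        return None
    vp_tree, end = result
    # Accept only if the whole input was consumed.
    return ("S", np_tree, vp_tree) if end == len(tokens) else None
```

Calling `parse_sentence("Mary had a little lamb")` yields a tree with “Mary” as the subject NP and “had a little lamb” as the VP; a word order the rules do not cover, such as “had Mary”, returns `None`.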

The semantic analyser and the knowledge base

The semantic analyser uses the parse tree and the knowledge base (KB) to try to determine what the sentence means

It creates another data structure that represents the meaning of the input sentences

It can also draw inferences from input statements using general knowledge in the KB

The semantic analyser's data structure and those in the KB should be in a common knowledge representation, such as KQML or Conceptual Graphs

The Generator

The generator uses the KB data structure created by the semantic analyser to create a usable output

The response depends in part on the pragmatics of the input language, eg greetings require greetings, questions require answers, commands require actions

The data structure can be used to initiate some action, eg when the language system is a front-end to a DBMS, the generator writes commands in a query language to begin a search

Simple generators feed standard pre-stored output responses to the user based on the built meaning representation

More sophisticated generators construct an original response by instantiating templates based on models of language use
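
A minimal sketch of a generator acting as a DBMS front-end follows. It assumes the semantic analyser has already produced a small dictionary as the meaning representation; that dictionary's shape, the `sales` table, the column names and the response template are all hypothetical and chosen only for illustration. The query is written in SQL and run through Python's standard `sqlite3` module.

```python
import sqlite3

# Hypothetical meaning representation, as a semantic analyser might build it.
meaning = {
    "act": "question",
    "measure": "amount",
    "table": "sales",
    "filters": {"region": "Central", "year": 1992},
}

# Tables and columns the generator is allowed to mention (assumed schema).
ALLOWED = {"sales": {"amount", "region", "year"}}


def generate_query(meaning):
    """Write a query-language command (SQL) from the meaning representation."""
    table, measure = meaning["table"], meaning["measure"]
    columns = ALLOWED[table]
    if measure not in columns or not set(meaning["filters"]) <= columns:
        raise ValueError("meaning refers to unknown columns")
    where = " AND ".join(f"{col} = ?" for col in meaning["filters"])
    sql = f"SELECT SUM({measure}) FROM {table} WHERE {where}"
    return sql, list(meaning["filters"].values())


def generate_response(meaning, total):
    """Instantiate a response template with values from the meaning and result."""
    template = "Total {table} for the {region} region in {year}: {total}"
    return template.format(table=meaning["table"], total=total, **meaning["filters"])


def answer(conn, meaning):
    """A question requires an answer: run the query, then fill in the template."""
    sql, params = generate_query(meaning)
    total = conn.execute(sql, params).fetchone()[0]
    return generate_response(meaning, total)
```

Filter values are passed as query parameters rather than pasted into the SQL text, and table and column names are checked against a known schema before being used.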

Applications of NLP - Natural language interfaces (NLI)

An NLP system can be the front-end of information systems to provide a more user-friendly interface

Eg, the command “List details of all files in this folder sorted by time of creation” is much friendlier than “ls -atl”, especially using voice input

An NLI processes sentences exchanged between a user and an application

The task is made easier by a restricted domain of discourse, eg databases, which contain information on a single area of application

Interfaces for expert systems, operating systems and document retrieval systems are also being developed

Homer: A Language-using Agent

Source: Vere, S. & Bickmore, T., A Basic Agent, Computational Intelligence, 1990, 6, 4, 41-60.

Natural Language - Homer

STEVE> What is in front of you?

HOMER> A log.

STEVE> Do you own the log?

HOMER> No I don’t.

STEVE> The log belongs to you.

HOMER> Oh.

STEVE> Cows eat grass.

HOMER> I know.

STEVE> Do you own the log now?

HOMER> Yes I do.

TIM> Drop the package at the barge next Saturday at 9pm.

HOMER> OK.

TIM> Are you going to be at the pier next Saturday?

HOMER> Yes.

TIM> Why are you going to the pier?

HOMER> I’ll go to the pier in order to pick up the package.

TIM> What time are you going to pick it up?

HOMER> 8:56pm.

STEVE> Where will the package be next Saturday?

HOMER> At the barge.

Examples of commercial NLIs: Intellect

Intellect (Trinzic Corp.)

One of the most widely used natural language front-end interfaces available for mainframes

Designed for use with DBMSs under IBM operating system environments

In addition to allowing access to data in a database, Intellect allows creation of databases using natural language

The built-in lexicon may be modified to fit a particular application

Q&A (Symantec Corp.)

A basic file manager with a natural language front-end called “The Intelligent Assistant”

Parses common English input questions and converts them into queries that the file manager can understand

Paraphrases input requests to ensure full understanding of what the user wants

Eg, user input:

Show the total 1992 sales for the Central Region

Q&A Intelligent Assistant’s response:

Shall I do the following?

Create a report showing the amount of sales for the central region in 1992?

Y(es) - Continue

N(o) - Cancel request

Symantec discontinued Q&A and then sold it to a German company called CAB GmbH.

Machine translation

Goal:

To support translation of text from one language into another

Applications include:

Desktop and web-based translation services

Spoken language translation services (eg phone-based)

Requirements:

Understanding the meaning of input sentences, which involves a semantic analysis of the input using semantic knowledge

An automatic translation system is expected to be robust and not stop whenever it encounters an item it cannot understand


Machine translation (cont’d)

Current approaches use a transfer grammar

Input text -> partial analysis -> 1st intermediate representation of content (related to the source language)

1st intermediate representation -> transformation using a transfer grammar -> 2nd intermediate representation (related to the target language)

2nd intermediate representation -> NL generator -> text in target language

Machine translation as performed since the mid-1960s is not true “understanding” of text

By 1991, systems that could process sentences with limited vocabulary started appearing

Current state of the art of machine translation

Broad coverage MT systems are already available on the Web with fast turnaround time and acceptable error rate

Higher accuracy is achieved by domain-specific systems

For example, controlled language used in Caterpillar manuals

Machine translation products

Bowne Global Solutions’ iTranslator: www.itranslator.com

Systran’s Babel Fish (used by AltaVista): www.systransoft.com


Current state of the art of machine translation (cont’d)

An example: Systran’s Web-based Translator

Spoken language dialogue systems

Communicate with users via automatic speech recognition and text-to-speech interfaces

Mediate the user’s access to a back-end database

Examples:

Information services: stock quotes, timetables

Transaction services: banking, betting, flight reservations

Current technology has been claimed to be capable of reducing call centre costs from $75 to 18c a call

Some issues:

Telephony-based systems cannot afford a training period

Making a conversation too realistic falsely raises user expectations and can confuse the system


Spoken language dialogue systems (cont’d)

More issues:

Error handling is a significant issue

Giving initiative to the user increases difficulty

Some relatively successful examples:

A Sydney taxi booking service (about 30% of cases have to go to human operators)

Telstra directory assistance service (15-20% accuracy, but 15-20% automation may be useful enough)

Fielded spoken language dialogue system applications:

Nuance (www.nuance.com)

ScanSoft/SpeechWorks (www.scansoft.com)

Philips (www.speech.philips.com)

Text processing

A number of different applications dealing with the processing of continuous text may be grouped together under this heading

Editing tools

Most common example: spelling and syntax (or grammar) checkers

Characterised by avoidance of deep semantic processing

Content extraction

Concerns extraction of specific information from texts

Examples: extraction of information related to a financial transaction from a bank telex, or of bibliographic information from research papers


Text processing (cont’d)

Content extraction (cont’d)

Requires deep semantic analysis, which is aided by the restricted domain and a priori knowledge of the information to be extracted

Commercial systems exist for electronic mail processing, banking systems and automatic summary generation

Examples:

ATRANS from Cognitive Systems

DEAL-READER from Gecosys

Text processing (cont’d)

Text summarisation

Objective: To produce a version of a document shorter than the original document

Applications of text summarisation are found in

Information browsing

Voice delivery of Web pages and email

Issues concerning text summarisation

Different kinds of summaries: indicative (what is it about?) vs informative (what is there of interest to the user?)

Real summarisation requires real understanding

Text summarisation state of the art

Commercial systems work on a ‘sentence-extraction’ model

Sentences regarded as ‘important’ are extracted and put together

Importance of sentences is decided on the basis of location, inclusion of key words, and statistical information such as frequency
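
The sentence-extraction model can be sketched as a scoring function over sentences. The example below combines the three cues named above (location, key words, word frequency); the stop-word list, the weights and the function name `summarise` are arbitrary assumptions for illustration, not taken from any commercial system.

```python
import re
from collections import Counter

# Small, assumed stop-word list; real systems use much longer ones.
STOP_WORDS = {"the", "a", "an", "of", "and", "is", "are", "to", "in", "it", "on"}


def summarise(text, keywords=(), max_sentences=2):
    """Extract the highest-scoring sentences and return them in original order."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    tokenised = [re.findall(r"[a-z']+", s.lower()) for s in sentences]
    freq = Counter(w for words in tokenised for w in words if w not in STOP_WORDS)
    keyset = {k.lower() for k in keywords}

    def score(index):
        words = [w for w in tokenised[index] if w not in STOP_WORDS]
        if not words:
            return 0.0
        frequency = sum(freq[w] for w in words) / len(words)  # statistical cue
        key_bonus = 2.0 * len(keyset & set(words))            # key-word cue
        location = 1.0 if index == 0 else 0.0                 # location cue
        return frequency + key_bonus + location

    ranked = sorted(range(len(sentences)), key=score, reverse=True)[:max_sentences]
    return " ".join(sentences[i] for i in sorted(ranked))
```

Nothing here models what the text means: a sentence scores highly only because of where it sits and which words it contains.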



Current systems are relatively knowledge-free

Not based on real understanding of the text

Some text summarisation applications currently available:

CognIT’s CORPORUM (www.cognit.com)

Inxight’s Summarizer (www.inxight.com)

MS Word’s summarisation tool

Search and Information Retrieval

There is an ever increasing amount of information available worldwide, particularly on the Internet

Searching for and retrieving information relevant to a topic of interest is an active area of research and application.

Document retrieval (DR)

Also known as text retrieval

Involves retrieving text ranging from paragraph to book length for humans to read

DR may involve

searching well-maintained bibliographic databases

scanning hard disks for missing files

searching thousands of Web servers for natural language articles on a topic of interest

Search and Information Retrieval (cont’d)

Efficacy of a DR system is measured by

Precision: the proportion of retrieved documents that are relevant, and

Recall: the proportion of relevant documents that are retrieved
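
The two measures can be computed directly from the set of retrieved documents and the set of relevant documents. The short sketch below follows the definitions above; the function name and the convention of returning 0.0 for an empty set are assumptions.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for collections of document identifiers."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)  # relevant documents that were retrieved
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall
```

For example, if a system retrieves four documents of which two are relevant, and three relevant documents exist in total, precision is 2/4 and recall is 2/3.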



Retrieval depends on indexing - indicating what documents are about

Indexing requires an indexing language, a term vocabulary, and a method for constructing requests and document descriptions

Both controlled language indexing and the more sophisticated natural language indexing require NLP capabilities

Compact descriptions of a document’s significance may increase the efficiency of matching

Increasing both recall and precision is the fundamental goal of index languages

Search and Information Retrieval (cont’d)

Current topics of interest in search and information retrieval include:

In a concept-based search, documents are characterised by relevant concepts and not just key words

For example, a search for ‘car’ should also retrieve documents on ‘automobiles’

Named entity recognition involves recognising names of people, places, organisations etc.

One person or organisation can be referred to by many name variants, eg John Howard, Mr. Howard, J.W. Howard, the PM

Many persons or organisations can share the same name, eg politician John Howard, actor John Howard

Search and Information Retrieval (cont’d)

Search and information retrieval state of the art

Current trend (eg Google) is to expand the search vocabulary by using thesauri (eg ‘car’ to ‘automobile’)

Linguistic analysis to identify phrases relevant to the initial query

Key phrases can be more useful than just key words

Can be used to expand an initial user query (Khan & Khor 2004)
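
Thesaurus-based expansion of a query can be sketched as follows. This is a toy illustration only: the thesaurus contents, the function names and the whole-word matching are assumptions, and it does not reproduce the method of Khan & Khor (2004).

```python
# Hypothetical thesaurus; a real system would draw on a much larger resource.
THESAURUS = {"car": ["automobile"], "automobile": ["car"]}


def expand_query(query):
    """Return the query terms plus their thesaurus entries, without duplicates."""
    expanded = []
    for term in query.lower().split():
        for candidate in [term, *THESAURUS.get(term, [])]:
            if candidate not in expanded:
                expanded.append(candidate)
    return expanded


def search(query, documents):
    """Return ids of documents containing any expanded query term as a word."""
    terms = set(expand_query(query))
    return [doc_id for doc_id, text in documents.items()
            if terms & set(text.lower().split())]
```

A search for “car” now also finds a document that mentions only “automobile”, which raises recall; expansion with loosely related terms could equally lower precision.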



Some current search and information retrieval applications:

Ultra Find: www.ultradesign.com/untrafind/ultrafind.html

Lotus Discovery Server: www.lotus.com/products/discserver.nsf

Smart text processing suites:

Inxight: www.inxight.com

Verity: wwwl.verity.com

Challenges faced by NLP

A good NLP system must be capable of handling common linguistic problems caused by ambiguities and the use of context

Prepositional phrase attachment

A sentence can often be analysed in more than one way, producing multiple parse trees for the sentence.

Example sentence: “John saw the boy in the park with a telescope” has 3 possible parses

Without contextual knowledge, it is not known whether John was looking through the telescope, the boy had a telescope, or the park had a telescope in it.

Challenges faced by NLP (cont’d)

Lexical ambiguity

When words have multiple meanings

A classic example:

Time flies like an arrow.

Fruit flies like a banana.

In the first case, “flies” is a verb and “like” is a preposition

In the second case, “flies” is a noun and “like” is a verb.

Challenges faced by NLP (cont’d)

Anaphoric reference or pronoun resolution

Problem of figuring out what a pronoun refers to

Example:

(1) Give me the names of all managers and how much they earn.

(2) Mary went to see Jane. She was happy to see her.

In (1), it is easy to decide that “they” refers to the managers already mentioned

In (2), it is difficult to decide who “she” and “her” refer to: was Mary happy to see Jane, or was Jane happy to see Mary?

Challenges faced by NLP (cont’d)

Ellipsis

Sentences appearing to have parts missing

Example: John works in Personnel, Mary in Accounting.

“Mary in Accounting” lacks a verb but is understandable using the context of the entire sentence

“Mary in Accounting” is an elliptical form of “Mary works in Accounting”.

Challenges faced by NLP (cont’d)

Quantifier scope

Quantifiers such as “all”, “every”, “some”, and “no” can be ambiguous

Example: Every employee does not like Mr Smith

Meaning - not a single employee likes Mr Smith

or - some do and some don’t.
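
The two readings differ in whether the negation falls inside or outside the scope of “every”. One way to write them in first-order logic is given below; the predicate names are chosen here purely for illustration.

```latex
% Reading 1: negation inside the quantifier (not a single employee likes Mr Smith)
\forall x \, \bigl(\mathrm{Employee}(x) \rightarrow \lnot \mathrm{Likes}(x, \mathrm{Smith})\bigr)

% Reading 2: negation outside the quantifier (not every employee likes Mr Smith)
\lnot \forall x \, \bigl(\mathrm{Employee}(x) \rightarrow \mathrm{Likes}(x, \mathrm{Smith})\bigr)
\;\equiv\;
\exists x \, \bigl(\mathrm{Employee}(x) \land \lnot \mathrm{Likes}(x, \mathrm{Smith})\bigr)
```

Strictly, the second formula only says that at least one employee does not like Mr Smith; the “some do” half of that reading is a conversational implication rather than part of the logical form.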



No current NLP system can handle all of these problems - there is no unrestricted NLP system yet

Yet some, such as HOMER, can handle the most common forms

REFERENCES

Germain, E., Introducing Natural Language Processing, AI Expert, August 1992, pp. 30-35.

Lewis, D.D., and Jones, K.S., Natural Language Processing for Information Retrieval, Communications of the ACM, Vol. 39, No. 1 (January 1996), pp. 92-100.

Turban, E., Decision Support and Expert Systems, Prentice Hall, Englewood Cliffs, New Jersey, 1995, pp. 242-257.

Thayse, A. (Editor), From Natural Language Processing to Logic for Expert Systems, John Wiley & Sons, 1991.

Cole, R., Zaenen, A., & Zampolli, A. (eds), Survey of the State of the Art in Human Language Technology, Cambridge University Press, 1998. Available on the web: http://cslu.cse.ogi.edu/HLTsurvey/

Dale, R., Language Technology: Applications and Techniques, Tutorial 2004, The 8th Pacific Rim Int. Conf. on Artificial Intelligence, Auckland, 9-13 August 2004.

Khan, M.S., and Khor, S., “Automatic Query Expansion for Enhanced Web Document Retrieval”, Journal of the American Society for Information Science and Technology, Vol. 55, No. 1, 2004, pp. 29-40.