A literature review of the implementation of Dialogue based Natural Language Chatbot and their practical applications



Lok Hey Young and Wang Xiang

Advisor: Dr. Supratip Ghose


Abstract. In this paper, we survey current recommender systems that use natural-language dialogue systems such as chatbots. In particular, we investigate the potential of the ALICE chatbot system to work as a recommender system or as a web guide in a topic-specific domain. The survey concludes that proper knowledge acquisition in natural-language chatbots can significantly improve recommender systems and can allow a chatbot to act as a web guide in any suitable domain.

Keywords: Natural Language Chatbot, Dialogue Systems, Alice Chatbot, Recommender Systems, Web Guide, Knowledge Acquisition

1 Introduction

One trade-off of the information revolution is that end users find it increasingly challenging to locate relevant information and services quickly and easily. The traditional Web search engine, for example, is an essential tool for information access, but even today’s leading search engines find it difficult to cope with the continued growth of the Internet and the poor quality of typical user queries.

While phenomenally successful in terms of the amount of accessible content and the number of users, today’s Web is a relatively simple artifact. Web content consists mainly of distributed hypertext and hypermedia accessible via keyword-based search and link navigation. Simplicity is one of the Web’s great strengths and an important factor in its popularity and growth; even naïve users quickly learn to use it and even create their own content.

However, the explosion in both the range and quantity of Web content also highlights serious shortcomings in the hypertext paradigm. The required content becomes increasingly difficult to locate via search and browse; for example, finding information about people with common names can be frustrating.

Many electronic-commerce sites attempt to solve the problem by providing keyword search capabilities. However, keyword search engines usually require that users know domain-specific jargon so that their keywords can match the indexing terms used in the product catalogue or documents. Keyword search does not allow users to describe their intentions precisely or to specify relational operators (for example, less than, cheapest) on product attributes. A search for shirt can return dozens or even hundreds of items, which are useless to somebody who has a specific style and pattern in mind. Moreover, keyword search systems lack an understanding of the semantic meaning of the search words and phrases. For example, keyword search systems usually cannot understand that summer dress should be looked up in women’s clothing under dress, whereas dress shirt most likely belongs in men’s clothing under shirt. Finally, search engines do not accommodate business rules, for example, a prohibition against displaying cheap earrings alongside more expensive ones.
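The contrast between plain keyword matching and a search that understands phrases and relational operators can be sketched in a few lines of Python. The catalogue, the phrase-to-category table, and both function names are hypothetical; the table merely stands in for real semantic understanding, so the sketch illustrates the limitations described above rather than any particular e-commerce system.

```python
# Hypothetical mini-catalogue; names, categories and prices are illustrative only.
CATALOGUE = [
    {"name": "Floral summer dress", "category": "women/dress", "price": 45.0},
    {"name": "Linen summer dress", "category": "women/dress", "price": 60.0},
    {"name": "White dress shirt", "category": "men/shirt", "price": 35.0},
    {"name": "Striped dress shirt", "category": "men/shirt", "price": 28.0},
]

# Hypothetical phrase-to-category table standing in for semantic understanding.
PHRASE_CATEGORIES = {"summer dress": "women/dress", "dress shirt": "men/shirt"}


def keyword_search(query):
    """Return every item whose name contains each query word as a substring (no semantics)."""
    words = query.lower().split()
    return [item["name"] for item in CATALOGUE
            if all(word in item["name"].lower() for word in words)]


def semantic_search(query, max_price=None, cheapest=False):
    """Resolve the phrase to a category, then apply relational constraints on price."""
    category = PHRASE_CATEGORIES.get(query.lower())
    hits = [item for item in CATALOGUE if item["category"] == category]
    if max_price is not None:  # "less than" on the price attribute
        hits = [item for item in hits if item["price"] < max_price]
    if cheapest and hits:      # "cheapest" as a minimum over the price attribute
        hits = [min(hits, key=lambda item: item["price"])]
    return [item["name"] for item in hits]
```

With this data, keyword_search("dress") returns all four items, dresses and shirts alike, while semantic_search("dress shirt", cheapest=True) returns only the striped shirt.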

Given this context, personalization has become an important business strategy in business-to-consumer commerce, where a user explicitly wants the e-commerce site to consider her own information, such as preferences, in order to improve access to relevant products. Recommender systems have become integral to e-commerce, providing technology that suggests products to a visitor based on previous purchases or rating history.

Recommender systems offer a solution to the problem of successful information search in the knowledge reservoirs of the Internet by providing individual recommendations.
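As a rough illustration of recommending from previous purchases, the sketch below scores items by how often they co-occur with a visitor’s purchases in other customers’ baskets. This co-occurrence scheme is an assumption chosen for brevity, not the method of any system cited in this survey; the purchase data and the recommend function are hypothetical.

```python
from collections import Counter


def recommend(histories, user, k=2):
    """Suggest up to k unseen items, weighted by basket overlap with other customers."""
    owned = set(histories[user])
    scores = Counter()
    for other, items in histories.items():
        if other == user:
            continue
        basket = set(items)
        overlap = len(owned & basket)  # how similar this customer's basket is
        if overlap == 0:
            continue
        for item in basket - owned:
            scores[item] += overlap
    # Highest score first; ties broken alphabetically so results are deterministic.
    ranked = sorted(scores.items(), key=lambda pair: (-pair[1], pair[0]))
    return [item for item, _ in ranked[:k]]


# Hypothetical purchase histories.
histories = {
    "ann": ["laptop", "mouse"],
    "bob": ["laptop", "mouse", "laptop bag"],
    "cat": ["laptop", "docking station"],
    "dan": ["camera", "tripod"],
}
```

Here recommend(histories, "ann") suggests the laptop bag first, because Bob shares two purchases with Ann, and the docking station second.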

A chatterbot (chatbot) system is a software program that interacts with users using natural language and is capable of engaging in conversation with a user. In most applications, chatterbots are used as guides that can show the user around a website.

The purpose of a chatbot system is to simulate a human conversation; the chatbot architecture integrates a language model and computational algorithms to emulate informal chat communication between a human user and a computer using natural language. Given the current trends in personalization and recommender systems, user preferences can be extracted from the dialogue session with ALICE, and from these a user profile can be built to recommend relevant products to users or to help them navigate large websites with thousands of products and pages, in which users can easily get lost in the vast amount of information.
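The idea of extracting preferences from a dialogue session can be sketched with a few keyword patterns that fill a user profile turn by turn. The regular expressions, slot names, and update_profile function are hypothetical and are not part of ALICE; a practical system would need much richer language handling.

```python
import re

# Hypothetical preference patterns; each captures the value for one profile slot.
PREFERENCE_PATTERNS = {
    "budget": re.compile(r"\bunder \$?(\d+)"),
    "likes": re.compile(r"\bi (?:like|love|prefer) ([a-z ]+)"),
    "dislikes": re.compile(r"\bi (?:hate|dislike|don't like) ([a-z ]+)"),
}


def update_profile(profile, utterance):
    """Add any preferences found in one user turn to the profile dict."""
    text = utterance.lower()
    for slot, pattern in PREFERENCE_PATTERNS.items():
        for match in pattern.findall(text):
            value = match.strip()
            if slot == "budget":
                profile["budget"] = int(value)  # the latest budget overrides earlier ones
            else:
                profile.setdefault(slot, []).append(value)
    return profile


# Hypothetical dialogue session with a recommending chatbot.
session = ["Hi, I like science fiction", "I don't like horror", "Something under $20 please"]
profile = {}
for turn in session:
    update_profile(profile, turn)
```

After the three turns, the profile records a liking for science fiction, a dislike of horror, and a budget of 20, which a recommender could then use to filter products.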

The objective of this survey is therefore to study the implementation of chatbots and to identify their applications, alongside other types of dialogue systems, in complementing search systems.

This survey investigates the knowledge acquisition activities of a chatterbot program that mimics human conversation. Moreover, it investigates how the chatterbot program can act as a guide and how it can be used as a tool for knowledge acquisition.

Besides, it shows how chatbots are useful in daily life, for example as help desk tools, automatic telephone answering systems, and tools to aid in education, business, and e-commerce.

A web-based chatterbot system can provide an easy, natural extension to knowledge acquisition and recommender systems.


An artificially intelligent natural language chat robot would be of great value in several situations. For example, consider a university environment. Even though most of the information is available on the web, students often like to have personal interaction with the advisor. In such an environment, a chat robot could be designed to provide academic advice.

In business, a chat robot can answer questions on various government services or provide social security information. A chat robot serving in the above environments will have to understand natural language inputs from users, locate the requested information, and disseminate information through a natural language interface.

The survey summarizes the existing literature on the ALICE chatbot and how it fits into the framework of the current information systems paradigm. In Section 2, the study reflects on the algorithm and general structure of the ALICE chatbot and its application in information systems. In Section 3, the study summarizes the main techniques taken from the literature, and in Section 4 the study draws conclusions regarding the survey.

2 Literature review

The main topic of this survey is the ALICE chatbot, its algorithm, and its language. The survey intends to extend the algorithm of the ALICE chatbot so that it can serve as a web guide. Therefore, the subsequent sections focus on the construction of ALICE and on how ALICE is used in different systems and application areas.


2.1 Alice Chatbot

A.L.I.C.E. is an artificial intelligence natural language chat robot based on an experiment specified by Alan M. Turing in 1950. The A.L.I.C.E. software utilizes AIML, an XML language designed for creating stimulus-response chat robots.

Some view A.L.I.C.E. and AIML as a simple extension of the old ELIZA psychiatrist program [Weizenbaum, J. 1966]. The comparison is fair regarding the stimulus-response architecture. But the A.L.I.C.E. bot currently has more than 40,000 categories of knowledge, whereas the original ELIZA had only about 200. Another innovation was provided by the web, which made natural language sample data collection possible on an unprecedented scale.

A.L.I.C.E. won the Loebner Prize, an annual Turing Test, in 2000 and 2001. Although no computer has ever ranked higher than the humans in the contest, she was ranked the “most human computer” by the two panels of judges. What it means to “pass the Turing Test” is not so obvious. Factors such as the age, intellect, and expectations of the judges have a tremendous impact on their perceptions of intelligence. Alan Turing himself did not describe only one “Turing Test.” His original imitation game involved determining the gender of the players, not their relative humanness.

ALICE uses XML knowledge bases to match user input against a predefined response set. The shortcoming of this system is that it cannot adequately answer all of the queries given to it: Russell et al. [2] contend that ALICEbots have no cognitive theory behind them and instead blindly rely on canned responses to matched queries. ALICEbots are also able to expand their present knowledge bases through the XML-based AIML (Artificial Intelligence Markup Language) (Wallace 2003). This would imply that ALICEbots could be given an ‘expert appearance’ within a particular domain of knowledge. This expansion has already been witnessed in the areas of foreign language fluency and specific domain-related knowledge fields, and it can either be supervised by an interceding chatterbot master or be unsupervised, with knowledge gathered en masse from trusted sources.


In this section, the main features of ALICE are briefly presented; more information can be found at A.L.I.C.E.: Artificial Intelligence Foundation. A.L.I.C.E. matches the user input against a set of stored categories in reverse lexicographical order. The categories are specified in Artificial Intelligence Markup Language (AIML). Each category comprises one or more patterns and a template. If any of the patterns matches the user input, the string specified in the template becomes the robot’s response. For example, consider the following category:

<category>
  <pattern>HI</pattern>
  <template>Hello, nice to meet you.</template>
</category>

When the user input matches the pattern “HI”, A.L.I.C.E. responds with the template reply “Hello, nice to meet you.” Even though the above strategy appears to be a naïve pattern matching approach, when used in conjunction with other techniques it can generate seemingly intelligent responses [Wallace, R.S. 2000].

A.L.I.C.E. provides a powerful capability named symbolic reduction [Wallace, R.S. 2000]. Symbolic reduction can be used to “jump” from one category to another. For example, consider the case where the user input “Hello, how are you” is matched with the pattern specified in one of the categories. Another user input, “Hi, how are you?”, could be matched with a different category. Even though the matching categories in the two cases might be different, the two user inputs essentially mean the same thing. So, instead of specifying individual responses for the two different inputs, both could be mapped to a single category (“how are you”) by using symbolic reduction.
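A minimal Python sketch of symbolic reduction, assuming plain dictionaries in place of AIML categories: both greetings are reduced to the canonical input “HOW ARE YOU”, so only that one entry needs a reply. The tables, the reply text, and the normalize and respond functions are hypothetical.

```python
import re

# Hypothetical <srai> table: surface form -> canonical input it reduces to.
REDUCTIONS = {
    "HELLO HOW ARE YOU": "HOW ARE YOU",
    "HI HOW ARE YOU": "HOW ARE YOU",
}
# Only the canonical category carries a reply (hypothetical text).
RESPONSES = {"HOW ARE YOU": "I am fine, thank you."}


def normalize(text):
    """Strip punctuation and upper-case the input, roughly as an AIML interpreter would."""
    return " ".join(re.sub(r"[^\w\s]", " ", text).upper().split())


def respond(text, depth=0):
    """Answer directly if possible, otherwise follow the reduction to a simpler input."""
    key = normalize(text)
    if key in RESPONSES:
        return RESPONSES[key]
    if key in REDUCTIONS and depth < 10:  # bound the recursion in case of reduction cycles
        return respond(REDUCTIONS[key], depth + 1)
    return "I have no answer for that."
```

Adding a third variant, such as “Hey, how are you?”, then needs only one more reduction entry rather than another copy of the reply.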

There are three types of categories: atomic categories, default categories, and recursive categories.

a. Atomic categories: are those with patterns that do not have the wildcard symbols _ and *, e.g.:

<category>
  <pattern>10 DOLLARS</pattern>
  <template>Wow, that is cheap.</template>
</category>

In the above category, if the user inputs “10 dollars”, then ALICE answers “Wow, that is cheap.”


b. Default categories: are those with patterns having the wildcard symbols * or _. The wildcard symbols match any input, but they differ in their alphabetical order. Assuming the previous input 10 Dollars, if the robot does not find the previous category with an atomic pattern, then it will try to find a category with a default pattern, such as:

<category>
  <pattern>10 *</pattern>
  <template>It is ten.</template>
</category>

So ALICE answers “It is ten.”
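The atomic-before-default lookup in the two examples above can be approximated as follows. This is a simplified sketch assuming two plain dictionaries and a regular-expression translation of the wildcard; it ignores the real Graphmaster ordering, in which “_” is tried before literal words and “*” after them.

```python
import re

# Hypothetical category stores mirroring the two examples above.
ATOMIC = {"10 DOLLARS": "Wow, that is cheap."}
DEFAULT = {"10 *": "It is ten."}


def wildcard_to_regex(pattern):
    """Turn an AIML-style pattern into a regex in which * or _ matches one or more words."""
    parts = [r"\S+(?: \S+)*" if word in ("*", "_") else re.escape(word)
             for word in pattern.split()]
    return re.compile("^" + " ".join(parts) + "$")


def reply(normalized_input):
    """Try atomic categories first, then fall back to default (wildcard) categories."""
    if normalized_input in ATOMIC:
        return ATOMIC[normalized_input]
    for pattern, template in DEFAULT.items():
        if wildcard_to_regex(pattern).match(normalized_input):
            return template
    return None  # no category matched
```

An input such as “10 EUROS” finds no atomic category and so falls through to “10 *”, whereas “20 DOLLARS” matches neither store.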


c. Recursive categories: are those with templates having <srai> and <sr/> tags, which refer to recursive reduction rules. Recursive categories have many applications: symbolic reduction, which reduces complex grammatical forms to simpler ones; divide and conquer, which splits an input into two or more subparts and combines the responses to each; and dealing with synonyms by mapping different ways of saying the same thing to the same reply.

c.1 Symbolic reduction

<category>
  <pattern>DO YOU KNOW WHAT THE * IS</pattern>
  <template>
    <srai>WHAT IS <star/></srai>
  </template>
</category>

In this example, <srai> is used to reduce the input to the simpler form “WHAT IS *”.

c.2 Divide and conquer

<category>
  <pattern>YES *</pattern>
  <template>
    <srai>YES</srai> <sr/>
  </template>
</category>

The input is partitioned into two parts, “YES” and the remainder; the remainder, matched by *, is handled by the <sr/> tag, which is shorthand for <srai><star/></srai>.

c.3 Synonyms

<category>
  <pattern>HALO</pattern>
  <template>
    <srai>HELLO</srai>
  </template>
</category>

The input is mapped to another form, which has the same meaning.
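The three recursive uses can be imitated by turning each <srai> or <sr/> into a recursive Python call. The category table below mirrors examples c.1 to c.3; the HELLO, YES, and WHAT IS * replies, the DEFINITIONS table, and the first-match ordering are assumptions of this sketch rather than ALICE’s actual behaviour. The table contains no reduction cycles, so no recursion guard is included.

```python
import re

# Hypothetical canned knowledge used by the "WHAT IS *" category.
DEFINITIONS = {"AIML": "AIML is an XML dialect for writing chat robot categories."}


def what_is(star):
    return DEFINITIONS.get(star, f"I do not know what {star} is.")


# (pattern, template) pairs; each template receives the text matched by "*".
# <srai>X</srai> becomes reply("X"); <sr/> becomes reply(star).
CATEGORIES = [
    ("HELLO", lambda star: "Hello, nice to meet you."),
    ("YES", lambda star: "Good."),
    ("WHAT IS *", what_is),
    ("DO YOU KNOW WHAT THE * IS", lambda star: reply(f"WHAT IS {star}")),  # c.1
    ("YES *", lambda star: reply("YES") + " " + reply(star)),             # c.2
    ("HALO", lambda star: reply("HELLO")),                                 # c.3
]


def compile_pattern(pattern):
    """Translate an AIML-style pattern into a regex; "*" captures one or more characters."""
    words = ["(.+)" if word == "*" else re.escape(word) for word in pattern.split()]
    return re.compile("^" + " ".join(words) + "$")


COMPILED = [(compile_pattern(p), template) for p, template in CATEGORIES]


def reply(text):
    """Run the template of the first category whose pattern matches the normalized text."""
    for regex, template in COMPILED:
        found = regex.match(text)
        if found:
            star = found.group(1) if found.groups() else ""
            return template(star)
    return "I have no answer for that."
```

For example, “YES HALO” is split by the divide-and-conquer category into “YES” and “HALO”, and the second part is then mapped to “HELLO” by the synonym category.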

ALICE Pattern matching algorithm

Before the matching process starts, a normalization process is applied to each input: all punctuation is removed, the input is split into two or more sentences if appropriate, and the input is converted to uppercase. For example, if the input is “I do not know. Do you, or will you, have a robots.txt file?”, then after normalization the second sentence becomes “DO YOU OR WILL YOU HAVE A ROBOTS DOT TXT FILE”.
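The normalization step can be sketched as below. The substitution table is an assumption: it replaces “.txt” with “ dot txt” before punctuation is stripped, which is one way the word DOT in the example output can arise; a real AIML interpreter uses a much longer substitution list.

```python
import re

# Hypothetical substitution applied first, so "robots.txt" is not split at its dot.
SUBSTITUTIONS = [(".txt", " dot txt")]


def normalize(text):
    """Return the normalized sentences: substituted, split, de-punctuated, upper-cased."""
    for old, new in SUBSTITUTIONS:
        text = text.replace(old, new)
    result = []
    for sentence in re.split(r"[.!?]+", text):  # sentence boundaries
        words = re.sub(r"[^\w\s]", " ", sentence).upper().split()  # drop remaining punctuation
        if words:
            result.append(" ".join(words))
    return result
```

Applied to the example input, the sketch yields two sentences, “I DO NOT KNOW” and “DO YOU OR WILL YOU HAVE A ROBOTS DOT TXT FILE”, each of which would be matched separately.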

After normalization, the AIML interpreter tries to match word by word to obtain the longest pattern match, as this is normally expected to be the best one. This behaviour can be described in terms of the Graphmaster, a set of files and directories with a set of nodes called nodemappers and branches representing the first words of all patterns and wildcard symbols (Wallace, 2003).

Assume the user input starts with word X and the root of this tree structure is a folder of the file system that contains all patterns and templates. The pattern matching algorithm uses depth-first search:

1. If the folder has a subfolder starting with an underscore, turn to “_/” and scan through it to match all suffixes of the input following X. If there is no match:

2. Go back to the folder and try to find a subfolder starting with word X. If there is one, turn to “X/” and scan for a match of the tail of the input following X. If there is still no match:

3. Go back to the folder and try to find a subfolder starting with the star notation. If there is one, turn to “*/” and try all remaining suffixes of the input following X to see if one matches. If no match was found, change directory back to the parent of this folder and put X back on the head of the input.

When a match is found, the process stops, and the template that belongs to that
category is processed by the interpreter to construct the output.
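The three-step search above can be sketched as a word trie built from nested dictionaries. This is an in-memory approximation, assuming dictionaries instead of the file-system folders described above; the key None marks a stored template, and the function names and sample categories are hypothetical.

```python
def build_graphmaster(categories):
    """Build a word trie from {pattern: template}; the key None marks a stored template."""
    root = {}
    for pattern, template in categories.items():
        node = root
        for word in pattern.split():
            node = node.setdefault(word, {})
        node[None] = template
    return root


def match(node, words):
    """Depth-first search trying "_", then the literal word, then "*" at each node."""
    if not words:
        return node.get(None)
    head, tail = words[0], words[1:]
    for key in ("_", head, "*"):
        child = node.get(key)
        if child is None:
            continue
        if key == head:
            result = match(child, tail)
        else:
            # A wildcard consumes the head plus zero or more further words:
            # try every remaining suffix of the input, shortest consumption first.
            result = None
            for i in range(len(tail) + 1):
                result = match(child, tail[i:])
                if result is not None:
                    break
        if result is not None:
            return result
    return None  # the caller backtracks and tries its next branch with `head` still in place


# Hypothetical categories echoing the examples in this section.
CATEGORIES = {
    "HI": "Hello, nice to meet you.",
    "10 DOLLARS": "Wow, that is cheap.",
    "10 *": "It is ten.",
    "*": "I have no answer for that.",
}
GRAPHMASTER = build_graphmaster(CATEGORIES)
```

With these categories, “10 DOLLARS MORE” first follows the 10/DOLLARS branch, fails, and backtracks to 10/*, returning “It is ten.” Adding a category such as “_ DOLLARS” would take priority over “10 DOLLARS”, because the underscore branch is tried first at the root.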

There are more than 50,000 categories in the current public-domain ALICE “brain”, slowly built up over several years by the botmaster, Richard Wallace, the researcher who maintained and edited the database of the original ALICE. However, all these categories are manually “hand coded”, which is time-consuming and restricts adaptation to new discourse domains and new languages.

In the following section we will present the automation process we developed to re-train ALICE using a corpus-based approach.


2.2 Chatbots and Dialogue Systems

A number of chat robots have been developed to serve different application domains. Their design has become increasingly sophisticated, and they have been adopted in education (Jia J.), commerce [De Angeli A., Creative Virtual], entertainment [Wacky Web], and the public sector [e.g., West Ham and Plaistow NDC].

ELIZA was developed to serve as a psychoanalyst who simply frames questions derived from the user input itself. ELIZA, though an innovative program, is “memoryless” and does not really “understand” user inputs.

The eGain chatbot and the IKEA help centre chatbot [IEKA.com] are lifelike conversational agents that provide an interactive and personal way for users to get answers and assistance on any website. A customer simply chats with an assistant, and the assistant acts as an agent, providing answers, processing data, and solving customer problems. A chatbot provides frontline support so that customer service staff can concentrate on more complex tasks. Chatbots on business sites are regarded as shopping bots; once installed, they greet users on the site, answer FAQs, take users on a tour of the site, and conduct surveys. Such a chatbot learns from each customer visit and keeps customers coming back by remembering their successive visits, and it can prompt users based on prior conversations. These chatbots have a reporting tool that allows vendors to follow their customers’ thought processes and identify ways to enhance the site to better meet their needs. Reporting provides real-time access to all customer dialogs, together with a query tool for generating graphical reports showing how each customer is responding to survey questions. Moreover, since the assistant always asks permission to gather information, customer privacy is assured. Commercial chatbots can be engaging and can have the personality and character of a human representative: one that provides entertainment and support and evokes trust, commitment, and loyalty in users.

In studies conducted on ALICEbots in particular, some interesting results have been obtained. One study focused on using an ALICEbot as a social theory tutor for students. It was discovered that students were more interested in using the system as a search engine to answer assignment questions than as the conversational tutor it was designed to be. This sentence-based information retrieval aspect has traditionally been confined to the arena of search engine design. In another learning-style experiment [Jia 2002], a modified version of an ALICEbot was used as a learning tool to teach Chinese students either English or German. The study focused more on user attitudes than on chatterbot efficiency. It was discovered that 62% of users chatted for 10 lines or less, and that 8.5% of the time the ALICEbot had no specific pattern to match the given input and had to rely on root-level generic responses. These conversational entities all have in common the difficulty of maintaining dialogue for a sustained period of time.

YPA is a natural language dialogue system that allows users to retrieve information from British Telecom’s Yellow Pages [Kruschwitz et al. 1999, 2000]. YPA is composed of a dialogue manager, a natural language front end, a query construction component, and the backend database.


Dialogue systems carry out conversations with users in natural language, whether spoken or typed. The main tasks performed by dialogue systems are language interpretation, language generation, and dialogue management. The simplest dialogue managers are based on finite-state automata in which the states correspond to questions and the arcs correspond to actions that depend on a user-provided response [Stent, Dowding 1999; Winograd Flores, 1986].

These systems support what are called fixed- or system-initiative conversations, in which only one of the participants controls the actions, whether it is the system helping the user or the user asking questions of the system. Next in complexity are frame- or template-based systems, in which questions can be asked and answered in any order [Bobrow et al., 1977]. Next, true mixed-initiative systems allow either dialogue participant to contribute to the interaction as their knowledge permits [Allen, 1999; Haller & McRoy, 1998; Pieraccini, Levin, & Eckert, 1997]. Thus, the conversational focus can change at any time on the user’s (or system’s) initiative. Finally, different approaches that support sophisticated dialogues include plan-based systems [Allen et al., 1995; Cohen & Perrault, 1979] and systems using models of rational interaction [Sadek, Bretier, & Panaget, 1997].
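A finite-state dialogue manager of the simplest kind described above can be sketched as a table of states, each holding a question and arcs keyed by the recognised answer. The advising questions, state names, and run_dialogue function are hypothetical, chosen to echo the graduate-advisor setting of Section 3; the system keeps the initiative throughout.

```python
# Hypothetical system-initiative dialogue: each state holds a question and its arcs,
# which map a recognised answer to the next state.
STATES = {
    "start": ("Are you an undergraduate or a graduate student?",
              {"undergraduate": "ug_year", "graduate": "pg_topic"}),
    "ug_year": ("Which year are you in?",
                {"1": "done", "2": "done", "3": "done", "4": "done"}),
    "pg_topic": ("Is your question about coursework or your thesis?",
                 {"coursework": "done", "thesis": "done"}),
    "done": ("Thank you, I will point you to the relevant page.", {}),
}


def run_dialogue(answers):
    """Drive the automaton with scripted user answers and return the system's questions."""
    state, transcript = "start", []
    answers = iter(answers)
    while True:
        question, arcs = STATES[state]
        transcript.append(question)
        if not arcs:                   # final state: no outgoing arcs
            return transcript
        answer = next(answers, None)
        if answer is None:             # the user stopped answering
            return transcript
        # An unrecognised answer keeps the current state, so the question is asked again.
        state = arcs.get(answer.strip().lower(), state)
```

Because only the system asks questions and the arcs fix their order, the user cannot volunteer information out of turn; a frame-based or mixed-initiative manager relaxes exactly that restriction.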


2.3 Recommender Systems and Natural Language Assistant

Natural Language Assistant (NLA) allows customers to make requests in natural language and directs them towards appropriate web pages. For example, [Chai et al., 2002] presented an architecture in which the NLA was deployed in a pilot study at an IBM external web site in order to drive natural language search. The work involved engineering the dialog for presenting merchandise to the user, using user interface studies to guide both the form and the content, and designing the system to support business rules and business processes for updating the data, for example, when offerings change. With natural language search, a user specification can be formulated as “The fastest computer under $1400”. Based on its understanding of the input, NLA retrieves the laptop (a ThinkPad R30 model) that has the fastest central processing unit (CPU) speed among all laptops priced under $1400. This example demonstrates the tremendous advantage of natural language search: the user is able to obtain the desired product in one interaction rather than navigating up and down several layers of menus (menu-driven navigation) or browsing among several irrelevant pages (keyword search). The results showed that users clearly preferred dialog-based searches to non-dialog-based searches. Asking users questions about their lifestyle and how they were going to use the computer accounted for improved performance in directed dialog systems.
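The benefit of a single natural language request can be sketched by mapping a superlative and a price ceiling onto a query over a product table. The catalogue, the superlative lexicon, and the functions below are hypothetical and do not reproduce the NLA architecture; they only show how “The fastest computer under $1400” becomes a filter followed by a maximisation.

```python
import re

# Hypothetical catalogue; model names and figures are invented for illustration.
LAPTOPS = [
    {"model": "Model A", "cpu_ghz": 1.2, "price": 1099},
    {"model": "Model B", "cpu_ghz": 1.6, "price": 1549},
    {"model": "Model C", "cpu_ghz": 1.4, "price": 1199},
]

# Assumed mini-lexicon: superlative -> (attribute, True if larger is better).
SUPERLATIVES = {"fastest": ("cpu_ghz", True), "cheapest": ("price", False)}


def parse_request(text):
    """Extract a superlative and an optional price ceiling from a request."""
    text = text.lower()
    superlative = next((word for word in SUPERLATIVES if word in text), None)
    ceiling = re.search(r"under \$?(\d+)", text)
    return superlative, (int(ceiling.group(1)) if ceiling else None)


def answer(text):
    """Filter by the price ceiling, then pick the best item for the superlative."""
    superlative, max_price = parse_request(text)
    candidates = [item for item in LAPTOPS
                  if max_price is None or item["price"] < max_price]
    if superlative is None or not candidates:
        return [item["model"] for item in candidates]
    attribute, larger_is_better = SUPERLATIVES[superlative]
    pick = max if larger_is_better else min
    return [pick(candidates, key=lambda item: item[attribute])["model"]]
```

With this data, the price ceiling removes Model B before the speed comparison, so the request returns Model C, whereas “The fastest computer” without a ceiling returns Model B.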


The Adaptive Place Advisor proposed by [Thompson C. A., Goker M. H. and Langley P.] presented a novel user model that influences both item search and the questions asked during a conversation. It is one of the first conversation-based spoken dialog systems to build a model of user preferences from conversations.


2.4 Knowledge Acquisition Techniques with ALICE

Knowledge acquisition is the process of transferring existing knowledge and its structure into a computer-interpretable form (Potter 2001). When coupled with the internet, knowledge acquisition inherits new problems of scale, such as information quality and reliability issues.

Knowledge acquisition has been a sought-after goal since the early days of Artificial Intelligence. The automated or semi-automated approaches fall into several sub-approaches: pattern and template matching, and text data mining (Lavrac and Mozetic 1992; Potter 2001). In pattern and template matching, natural language input is matched against some preconceived template, where the response is pre-coded and dictated by the matching template.

One of these approaches is the Eliza-type chatterbot, which played the role of a Rogerian psychotherapist to pseudo-extract information from its patients (Weizenbaum 1966). With Eliza, it became possible to carry on generalized conversations in a reasonable manner. A dialog system can adequately carry out conversations with the user and can log them, and these logs can be a good source for knowledge acquisition on domain-specific topics. Therefore, knowledge acquisition techniques were rightly used in [Schumaker P. R., Liu Y., Ginsburg M., Chen H.] with their AZ-ALICE chatbot, an extension of the ALICE chatbot. They tested their system with a number of participants, using domain-specific knowledge acquisition as a testbed, and showed that mass knowledge acquisition improved the domain-specific chatterbot responses. They hypothesized that a chatterbot used as a knowledge acquisition tool is a stable instrument for gathering both conversation and domain-related knowledge, and their statistics suggested a promising future in domain-restricted areas.
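One way logged conversations could feed knowledge acquisition is to turn reviewed question/answer pairs into new AIML categories. The sketch below assumes the botmaster has already approved the pairs; the function names and sample log are hypothetical and do not describe how AZ-ALICE was actually built. It uses only Python’s standard xml.etree.ElementTree module.

```python
import re
import xml.etree.ElementTree as ET


def to_pattern(question):
    """Normalize a logged question into an AIML-style pattern: no punctuation, upper case."""
    return " ".join(re.sub(r"[^\w\s]", " ", question).upper().split())


def log_to_aiml(pairs):
    """Turn reviewed (question, answer) pairs into an <aiml> document string."""
    root = ET.Element("aiml", version="1.0")
    seen = set()
    for question, answer in pairs:
        pattern = to_pattern(question)
        if not pattern or pattern in seen:  # skip empty and duplicate patterns
            continue
        seen.add(pattern)
        category = ET.SubElement(root, "category")
        ET.SubElement(category, "pattern").text = pattern
        ET.SubElement(category, "template").text = answer.strip()
    return ET.tostring(root, encoding="unicode")


# Hypothetical log excerpt from an advising help-bot.
log = [
    ("When is the add/drop deadline?", "Please check the Academic Registry calendar."),
    ("when is the add/drop deadline", "A duplicate question, so this answer is skipped."),
    ("Who is my advisor?", "Please contact the Graduate Advisor."),
]
aiml_text = log_to_aiml(log)
```

Keeping only the first answer for each normalized pattern is a deliberate simplification; a botmaster might instead merge competing answers with AIML’s <random> element.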


3 Implementation of the Project and Hypothesis

The implementation of such a project in a university environment is quite new. This help-bot can be used to assist the Graduate Advisor in giving advice to students. It is hoped that this help-bot will lessen the current workload of the Graduate Advisor.

In this project, the botmasters were the authors of the survey; a botmaster is the master of the chatbot. Our responsibilities included reading the dialogues, analyzing the responses, and creating new replies for the patterns. We intend to serve the UIC graduate advisor with our chatbot and to implement it as a help-bot that complements the Academic Registry of UIC in advising undergraduate and graduate students.
The main questions arising from the survey are the following:

1. Can we store the correct student model so that the chatbot can successfully recommend coursework to, or advise, a graduate student?

2. Can the conversation be driven in a particular manner so that it successfully elicits knowledge from the conversation log and increases the number of responses from the user?

3. Can it help users find information properly in a navigation space as large as UIC’s?

4 Conclusion of the Survey

Chatbots are becoming very common on many websites today, where they serve to give the site a personality. The survey suggests that a chatbot can be used as a help-bot to successfully guide a user through a vast navigation space. Next, it can be used successfully in many systems for conversational recommendation. Finally, chatbots can be used to gather domain-specific knowledge and can appropriately serve as an FAQ system in a graduate advisor system.



References


A.L.I.C.E. Artificial Intelligence Foundation. ALICE and AIML Documentation. http://www.alicebot.org/documentation

Allen, J.F. (1995). Natural Language Understanding (second edition). Benjamin/Cummings, Menlo Park, CA.

Chai, J., Horvath, V., Nicolov, N., Stys, N., K., Zadrozny, W., Melville, P. (2002). Natural Language Assistant: A dialogue system for online product recommendation. AI Magazine, Summer 2002, 23(2), pp. 63-75. ProQuest Science Journals.

Creative Virtual. UK Lingubot Customers. 2004-2006. Listing of major companies using Lingubot technology. www.creativevirtual.com/customers.php. Accessed 14/12/05.

Cohen, W., Schapire, R., & Singer, Y. (1999). Learning to order things. Journal of Artificial Intelligence Research, 10, 243-270.

De Angeli, A., Johnson, G.I., and Coventry, L. (2001). The unfriendly user: exploring social reactions to chatterbots. In Helander, Khalid, and Tham (Eds.), International Conference on Affective Human Factors Design. Asean Academic Press, London.

eGain Service. Web Self-Service Products. http://www.egain.com/docs/overviews/egain_overview_chatbot.pdf

Haller, S., McRoy, S. (1998). Preface to the special issue on computational models of mixed-initiative interaction. User Modeling and User-Adapted Interaction, 8, 167-170.

IEKA.com help centre Bot. http://www.ieka.com

Kruschwitz, U., De Roeck, A., Scott, P., Steel, S., Turner, R., and Webb, N. (2000). Extracting semistructured data: lessons learnt. In Proceedings of the 2nd International Conference on Natural Language Processing (NLP2000), pp. 406-417.

Jia, J. (2002). The Study of the Application of a Keywords-based Chatbot System on the Teaching of Foreign Languages. University of Augsburg, Augsburg, Germany.

Lavrac, N. and Mozetic, I., Eds. (1992). Second generation knowledge acquisition methods and their application to medicine. In Deep Models for Medical Knowledge Engineering. Elsevier, New York.

Potter, S. (2001). A Survey of Knowledge Acquisition from Natural Language. TMA of Knowledge Acquisition from Natural Language. Edinburgh. 2003. http://www.aiai.ed.ac.uk/project/akt/work/stephenp/TMA%20of%20KAfromNL.pdf

Sadek, M., Bretier, P., Panaget, F. (1997). ARTIMIS: Natural dialogue meets rational agency. In Proceedings of the Fifteenth International Joint Conference on Artificial Intelligence, pp. 1030-1035. Nagoya, Japan. Morgan Kaufmann.

Stent, A., Dowding, J., Gawron, J., Bratt, E., & Moore, R. (1999). The CommandTalk spoken dialogue system. In Proceedings of the Thirty-Seventh Annual Meeting of the Association for Computational Linguistics, pp. 183-190. College Park, MD. Association for Computational Linguistics.

Thompson, C. A., Goker, M. H., Langley, P. A Personalized System for Conversational Recommendations.

Wacky Web Fun Ltd. RacingFrogz.org (c) 2005. Online game with chatbot capabilities. www.racingfrogz.org. Accessed 17.05.06.

Wallace, R.S. (2000). Symbolic Reductions in AIML. http://alicebot.org/srai.html

Weizenbaum, J. (1966). ELIZA: A Computer Program for the Study of Natural Language Communication between Man and Machine. CACM 9(1), 36-45.