Management & Change, Volume 13, Number 1 (2009)
Perspectives
A PATTERN RECOGNITION IN A VIRTUAL WORLD
Samia L. Jones Omprakash K. Gupta
Learning is perceived as a gateway to understanding the problem of intelligence. As seeing is a form of intelligence, learning is also becoming a key to the study of artificial vision. Thus, instead of building a machine or computer program to solve a specific visual task, computer vision and pattern recognition are used to develop systems that can be trained with examples of visual tasks. Vision systems that 'learn and adapt' represent one of the most important directions in computer vision and pattern recognition research. This paper presents a learning-system simulator that requires no prior knowledge about its users. The users play the game not against each other but to complement each other: they work in teams to solve game problems with strategies that may differ every time, and the setting is a virtual world.
Key words: Pattern recognition, reinforcement learning, computer vision, Second Life.
INTRODUCTION
Increasing automation and demands for quality mean that inspection by human
beings becomes impractical, driving the continuing development of new
machine vision applications. The cameras and computers that make up a
machine vision system don't get tired, make mistakes or take subjective decisions;
they work with constant precision at the pace of the production process. A
machine vision system consists of computer hardware and software working
together with cameras and lighting to capture images of objects for the
purpose of making a quality control decision (Ghosh & Pal, 2005). Once
the image is captured and stored in memory, it is algorithmically compared
to a predefined image or quality standard in an effort to detect defects.
Machine vision technology can be applied in a number of different
applications. Such systems are programmed to perform narrowly defined tasks, such
as counting objects or searching for surface defects. They are capable of
processing images consistently; however, they are typically designed to
perform single, repetitive tasks. Moreover, no machine vision or computer
vision system can yet match some capabilities of human vision in terms of
image comprehension.
Thus, machine vision typically replaces random sampling techniques or
human visual inspection techniques as a means of monitoring users and
their behaviors in different situations. A machine vision trends and
technologies study, published in the December 2008 Control Engineering North American
print edition Product Research article, reveals that slightly more than a third
of respondents (38 per cent) said they currently use an integrator. Of the 63 per cent who
do not now use an integrator, nearly 22 per cent said that they plan to do so
in the next 12 months.
Reinforcement Learning (RL) is a Machine Learning technique which
has become very popular in recent years. The technique has been applied to
a variety of artificial domains, such as game playing, as well as to real-world
problems. In principle, a Reinforcement Learning agent learns from its
experience by interacting with the environment. The agent is not told how
to behave and is allowed to explore the environment freely. However, once it
has taken its actions, the agent is rewarded if its actions were good and punished
if they were bad. This system of rewards and punishments teaches the agent
which actions to take in the future, and guides it towards a better outcome.
In Reinforcement Learning (RL) the agent learns from its experience
which moves lead to a winning situation and which moves should
be avoided. RL algorithms do not require a complete model of the situation,
only its rules and final outcomes. A popular RL algorithm is Temporal
Difference (TD) learning. In TD, the estimated value of the current solution
is updated based on the immediate reward and the estimated value of the
subsequent solution. In the case of deterministic settings such as mathematics
problems, RL algorithms can be extended to learn the values of subsequent
states (afterstates) instead of the usual state-action values. If n states lead
to the same afterstate, then by visiting just one of those states the agent can
assign correct values to all n states (Sutton & Barto, 1998).
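To make the afterstate idea concrete, the following sketch (a hypothetical Python illustration, not taken from the paper; the board encoding and function names are assumptions) keys the value table by the position that results from a move, so every state-action pair that produces the same resulting position shares a single learned value.

    # Hypothetical sketch of afterstate values for a small board game.
    # The table is keyed by the position that results from a move (the
    # "afterstate"), so any (state, action) pair producing the same
    # position shares one learned value.
    from collections import defaultdict

    values = defaultdict(float)   # afterstate -> estimated value
    ALPHA = 0.1                   # learning rate

    def afterstate(state, action):
        """Board that results from marking cell `action` of `state` with 'X'."""
        board = list(state)
        board[action] = 'X'
        return tuple(board)

    def update(state, action, target):
        """Move the afterstate's value toward a target (e.g. a TD target)."""
        key = afterstate(state, action)
        values[key] += ALPHA * (target - values[key])

    # Two different states can lead to the same afterstate and therefore
    # automatically share the value learned for it.
    s1, s2 = ('X', '.', '.'), ('.', 'X', '.')
    update(s1, 1, target=1.0)           # learn from a visit through s1
    print(values[afterstate(s2, 0)])    # s2 with action 0 reaches the same
                                        # afterstate, so it is already valued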
RL algorithms have been extensively applied to games. The most
successful application was in Backgammon (Tesauro 1992, 1994, 1995),
where a program (TD-Gammon) trained via self-play was able to match top
human players. Other applications include: Chess (Beal & Smith 2000;
Dazeley 2001; Thrun 1995), Connect Four (Ghory 2004), Go (Schraudolph,
et al., 2000), Othello (Tournavitis, 2003; Walker, et al., 1994; Pal, 2006;
Pal & Mitra, 2004), and Shogi (Beal & Smith, 1999).
REINFORCEMENT LEARNING GENERAL MODEL
In Reinforcement Learning the agent's surroundings are called the
environment. The agent interacts with the environment at each discrete
time step t = 0, 1, 2, 3, … At each time step the agent receives the
environment's representation of the state s_t from a set of possible states S.
Based on state s_t, the agent selects an appropriate action a_t from the set of
actions available in that state, A(s_t). As seen in Fig. 1, as a consequence of
the action a_t, the agent receives a reward r_{t+1} and finds itself in a new state
s_{t+1} one time step later (Sutton & Barto, 1998).
The basic reinforcement learning model, as applied to a Markov Decision
Process (MDP), consists of:
• a set of states S,
• a set of actions A,
• a reward function R: S × A → ℝ, and
• a state transition function T: S × A → Π(S), where T(s, a, s') gives the probability of making a transition from state s to state s' using action a.
From the above list we can see that MDPs are important since they
can describe the majority of Reinforcement Learning problems (Sutton &
Barto, 1998). Given any state s and action a, the probability of each possible
next state s' is given by:

    P^a_{ss'} = Pr{ s_{t+1} = s' | s_t = s, a_t = a }

Fig. 1 The Agent-Environment Interaction in Reinforcement Learning
Similarly, given any current state s, action a and any next state s', the
expected value of the next reward is:

    R^a_{ss'} = E{ r_{t+1} | s_t = s, a_t = a, s_{t+1} = s' }

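As a minimal illustration of these two quantities (the paper does not specify a concrete MDP; the two-state example, its action names and its numbers below are invented for exposition), the transition probabilities and expected rewards can be stored as nested tables, here in Python:

    # Hypothetical two-state MDP illustrating the quantities defined above:
    # P[s][a][s2] = probability of moving from s to s2 under action a,
    # R[s][a][s2] = expected immediate reward for that transition.
    P = {
        "s0": {"stay": {"s0": 0.9, "s1": 0.1},
               "move": {"s0": 0.2, "s1": 0.8}},
        "s1": {"stay": {"s0": 0.0, "s1": 1.0},
               "move": {"s0": 0.7, "s1": 0.3}},
    }
    R = {
        "s0": {"stay": {"s0": 0.0, "s1": 1.0},
               "move": {"s0": 0.0, "s1": 1.0}},
        "s1": {"stay": {"s0": 0.0, "s1": 0.0},
               "move": {"s0": -1.0, "s1": 0.0}},
    }

    def expected_reward(s, a):
        """E[r_{t+1} | s_t = s, a_t = a], averaged over next states."""
        return sum(P[s][a][s2] * R[s][a][s2] for s2 in P[s][a])

    print(expected_reward("s0", "move"))   # 0.2*0.0 + 0.8*1.0 = 0.8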
A state-value function is a function that estimates the quality of a given
state. The quality of a state is defined in terms of the future rewards that
can be expected from that state. Given that the expected rewards depend
on the actions taken by the agent, the state-value function is defined in
terms of the policy used (Sutton & Barto, 1998). The value of a state s under
the policy π is denoted by V^π(s) and defined as:

    V^π(s) = E_π{ R_t | s_t = s }

where E_π denotes the expected value given that the agent follows policy π.
At each time step, we use the value function of the next state to estimate
the value of the current state. In this way, the reinforcement learning algorithm
learns by iteratively reducing the discrepancy between value-function
estimates for adjacent states.
This is called Temporal Difference (TD) learning (Kaelbling, et al.,
1996; Sutton, 1988; Sutton & Barto, 1998), an error-driven method
that is without doubt the most central idea of Reinforcement Learning.
The TD method uses its experience with policy π to update the estimate
of V^π. The method updates the estimate of V(s_t) based on what
happens after its visit to state s_t. The update can occur immediately at time
t+1, when the method can form a target based on the observed reward
r_{t+1} and the estimate V(s_{t+1}). Once the target is formed, the error term
can be calculated and the estimate of V(s_t) updated (Sutton & Barto, 1998),
as follows:
Step 1: target ← r_{t+1} + γ V(s_{t+1})
Step 2: error ← target − V(s_t)
Step 3: V(s_t) ← V(s_t) + α · error
In the above, α is the learning rate and γ is the discount factor. The
learning rate determines how much V(s_t) is updated after each time step;
it is set between 0 and 1.
So the process goes like this:

    Initialise V(s) with random values
    For each learning episode
        Initialise state s
        For each step of the episode
            Execute action a given by π for s
            Observe reward r and the next state s'
            V(s) ← V(s) + α[ r + γ V(s') − V(s) ]
            s ← s'
        Until s is terminal
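The paper states that its reinforcement learning program was written in MATLAB but gives no code. The following is a minimal Python sketch of the tabular TD(0) loop above; the five-state random-walk environment is a stand-in invented here for illustration, not the paper's tutoring environment.

    # Minimal tabular TD(0), following the update loop above.
    # Environment: a hypothetical 5-state random walk (states 0..4,
    # terminal at both ends, reward 1 for reaching the right end).
    import random

    ALPHA, GAMMA = 0.1, 1.0
    N_STATES = 5
    TERMINALS = {0, N_STATES - 1}

    def policy(s):
        """A fixed random policy: step left or right with equal probability."""
        return random.choice([-1, +1])

    def step(s, a):
        """Apply the action; reward 1 only when the right end is reached."""
        s2 = s + a
        return (1.0 if s2 == N_STATES - 1 else 0.0), s2

    V = [0.0] * N_STATES            # the paper initialises V(s) randomly

    for episode in range(5000):
        s = N_STATES // 2           # start each episode in the middle
        while s not in TERMINALS:
            a = policy(s)                      # action given by the policy for s
            r, s2 = step(s, a)                 # observe reward and next state
            target = r + GAMMA * (0.0 if s2 in TERMINALS else V[s2])
            V[s] += ALPHA * (target - V[s])    # V(s) <- V(s) + alpha*[target - V(s)]
            s = s2

    print([round(v, 2) for v in V])   # non-terminal values approach 0.25, 0.5, 0.75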
RESEARCH OBJECTIVES
The basic goal of this paper concerns visual perception: the ability to
"understand" the environment visually. It is an ever-changing environment
with lots of different information, which is essential to the core goal.
In this research, computer vision is used as follows: users sign on to
their Second Life virtual space with their virtual names and join a study
session, while computer vision is mainly focused on processing their images;
this machine vision digital input/output is fed to computer networks to
observe their behavior in the different situations.
Most machine vision systems consist of the following (a software sketch tying these parts together appears after the list):
1. A camera for acquiring images as well as digitizing them (a "frame grabber")
2. A processor (PC)
3. Input/output hardware (e.g. a network connection to report results)
4. A program to process images and detect relevant features
5. A synchronizing sensor for part detection (often an optical or magnetic sensor) to trigger image acquisition and processing.
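As a rough sketch of how these components fit together in software (purely illustrative; the study actually used Camtasia screen recordings rather than a live camera, and the defect test here is invented), an acquisition-and-inspection loop might look like this in Python with OpenCV:

    # Illustrative acquisition-and-inspection loop tying the components
    # together: camera/frame grabber, processing program, and an I/O step
    # that reports results. A hardware trigger is simulated by processing
    # a fixed number of frames.
    import cv2   # OpenCV for camera capture and image processing

    def has_defect(frame, threshold=30):
        """Toy feature detector: flag frames that are unusually dark."""
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        return gray.mean() < threshold

    cap = cv2.VideoCapture(0)          # 1. camera + frame grabber
    try:
        for _ in range(100):           # 5. stands in for the trigger sensor
            ok, frame = cap.read()
            if not ok:
                break
            if has_defect(frame):          # 4. program detecting relevant features
                print("defect detected")   # 3. report the result over I/O
    finally:
        cap.release()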
In this research, Camtasia software was used to capture the session
and create a digital input/output file for use in the reinforcement learning
program, which was written in MATLAB and run on a PC.
At the appointed time for the session, all the users join the recorded
session under their virtual names and participate in the environment
while being observed and recorded.
At a different time the same process is repeated with different
groups to study the patterns of behaviours while doing the same tasks.
Afterwards, all the collected data are analyzed in the same way. This was repeated
twenty-seven (27) times, each time with a group of ten (10) users.
MATHEMATICS SIMULATION SYSTEM: THE ENVIRONMENT MODEL
This method is simple. The user is represented by two variables: knowing
or guessing. So we only need to know whether the user is close to the right answer
at all times. A guessing user's status may change all the time. At each time
step, the user only needs to know how close he is to the answer (Fig. 2). Each
solution consists of a number of segments/steps. All the segments can be
divided into three categories: right, wrong, or go back.
In order to provide this information to the user, we must first find out
which segment of the problem he/she is in. In other words, we need to
know at any time step whether the user is right in the current segment and can
be allowed to enter the next one. Then, we execute different equations to solve
the problem, with respect to the different types of segments. By tracking
the user's steps/segments, we can evaluate the progress of the user and
how close he/she is to the right solution.
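A hedged sketch of this environment model in Python (the class names and category names below are illustrative assumptions, not taken from the paper): the state tracks the user's current segment and whether he/she appears to be knowing or guessing, and each observed step is classified as right, wrong, or go back.

    # Illustrative encoding of the tutoring environment described above.
    from dataclasses import dataclass
    from enum import Enum

    class StepType(Enum):
        RIGHT = "right"
        WRONG = "wrong"
        GO_BACK = "go back"

    @dataclass
    class UserState:
        segment: int      # index of the current segment/step of the solution
        knowing: bool     # True if the user seems to know, False if guessing

    def advance(state: UserState, step: StepType) -> UserState:
        """Enter the next segment only when the current step is right."""
        if step is StepType.RIGHT:
            return UserState(state.segment + 1, knowing=True)
        if step is StepType.GO_BACK:
            return UserState(max(0, state.segment - 1), state.knowing)
        return UserState(state.segment, knowing=False)   # wrong step: guessing

    s = UserState(segment=0, knowing=True)
    print(advance(s, StepType.RIGHT))   # UserState(segment=1, knowing=True)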
Segments for Sample Tasks
• Identify all variables that change with time
• Write an equation that relates the variables identified
• Take the derivative with respect to time t.
• Substitute known rates and values into the differentiated equation
• Solve for the derivative, to get the desired answer.
Reward Function
There are two types of situations where the user should be given a negative
reward: one is when the user is far away from the suggested answer, and
the other is when it is clear that the user is guessing. When either of these
situations occurs, the user receives a (-1) reward and then restarts
from the beginning of the segment/step. Otherwise he gets a value of 1.
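A minimal sketch of this reward rule (the variable names and the distance threshold are assumptions made for illustration, not specified in the paper):

    # Reward rule as described above: -1 (and a restart of the current
    # segment) when the user is far from the suggested answer or clearly
    # guessing, +1 otherwise.
    def reward(distance_from_answer: float, is_guessing: bool,
               far_threshold: float = 1.0) -> int:
        """Immediate reward for the user's latest step."""
        if distance_from_answer > far_threshold or is_guessing:
            return -1    # negative reward; the segment is restarted
        return 1

    print(reward(0.2, False))   # 1
    print(reward(2.5, False))   # -1 -> restart the segment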
Value Function
The value function learns the long-run reward of the state-action pair at
each time step, given the immediate reward above. The optimal policy
can then be constructed based on this value function. Here we use a decision
tree to find out the current state and the corresponding action to take, as in
Fig. 2.
Fig. 2 Simulation Model, Decision Tree
Sample problem:
An airplane is climbing at a 45-degree angle at a speed of 300 miles per
hour. How fast is the altitude increasing (in miles per hour and in feet per
second)?
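A worked solution (not given in the original) follows directly from the climb geometry: the rate of altitude gain is the speed times the sine of the climb angle,

    dh/dt = 300 sin 45° = 150√2 ≈ 212.1 miles per hour ≈ 212.1 × 5280 / 3600 ≈ 311.1 feet per second.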
 
Second Life Environment
Second Life is a 3-D virtual world developed by Linden Lab in 2003 and
accessible free of charge via the Internet by downloading a client program called the
Second Life Viewer. It enables its users, called Residents, to interact with
each other through avatars. Residents can explore, meet other Residents,
socialize, and participate in individual and group activities.
From the moment a user enters the World, he/she will discover a vast
digital continent, teeming with people. He/she will also be surrounded by the
creations of fellow Residents.
Users are organized in Second Life to discuss the current set of questions.
These meetings are typically held in the evening and are usually scheduled to
maximize user access. The assessments specifically analyze the ability
of avatar-based learning environments to address the best way of learning
as well as gender, cultural, and racial issues in the classroom.
The pattern of the users is studied, and their behavior up to a certain step is
graphed as a linear profile. The graph is analyzed to find known patterns. A
matrix of true and false values is created for every case. All these matrices
are compared against a database of such matrices to obtain the prevailing
trend among the users. In the end the system is trained and able to give advice
to users whose behavior resembles the patterns studied; such users were given
valuable advice that helped them carry out the task in almost half the time they
would have spent without the coaching of the system. Users' comments were very
positive and encouraged continuing the same experiment with new tasks.
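A hedged sketch of this matching step in Python (the exact matrix layout, labels, and similarity measure are not specified in the paper; here each case is a small true/false matrix and similarity is the fraction of entries that agree):

    # Illustrative matching of a new true/false behaviour matrix against a
    # database of stored matrices, keeping the closest known pattern.
    def similarity(m1, m2):
        """Fraction of corresponding entries that agree."""
        flat1 = [c for row in m1 for c in row]
        flat2 = [c for row in m2 for c in row]
        return sum(a == b for a, b in zip(flat1, flat2)) / len(flat1)

    def best_match(case, database):
        """Return the stored (label, matrix) pair most similar to the case."""
        return max(database, key=lambda item: similarity(case, item[1]))

    database = [
        ("methodical", [[True, True], [True, False]]),
        ("guessing",   [[False, True], [False, False]]),
    ]
    new_case = [[True, True], [True, True]]
    label, _ = best_match(new_case, database)
    print(label)   # 'methodical' (3 of 4 entries agree)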
CONCLUSION
Users finished the tasks in 48 per cent less time, which is almost half the time. They
were happier by the end of the task and had a positive experience. Most
(98 per cent) recommended using similar methods for other subjects and tasks
because they thought it made them happy that someone intelligent (the
machine) understood them and assisted them along the way!
REFERENCES
Beal, D.F. & M.C. Smith (1999) "First Results from Using Temporal
Difference Learning in Shogi", Computers and Games, 1558: 113-25.
Beal, D.F. & M.C. Smith (2000) "Temporal Difference Learning for Heuristic
Search and Game Playing", Information Sciences, 122 (1): 3-21.
Control Engineering Staff (2008) Machine Vision: Trends and
Technologies, North American Print Edition Product Research Article,
December 11, 2008. (http://www.controleng.com/article/276139/
Machine_vision_product_research_positive_attitudes_outlooks_new
products.php)
Dazeley, R. (2001) "Investigations into Playing Chess Endgames Using
Reinforcement Learning", Honours Thesis, University of Tasmania.
Ghory, I. (2004) "Reinforcement Learning in Board Games", Masters
thesis, University of Bristol, UK.
Ghosh, A. & S.K. Pal (2005) Pattern Recognition and Machine Intelligence.
In Proceedings First International Conference PReMI 2005, Kolkata
December 20-22.
Kaelbling, L.P., M.L. Littman & A.W. Moore (1996) "Reinforcement
Learning: A Survey", Journal of Artificial Intelligence Research, 4
:237-85.
Pal, P. (2006) Advances in Pattern Recognition. New York: World Scientific
Publishing Co. Inc.
Pal, S.K. & P. Mitra (2004) Pattern Recognition Algorithms for Data
Mining. London: Chapman & Hall.
Schraudolph, N.N., P. Dayan, & T.J. Sejnowski (2000) "Learning to Evaluate
Go Positions Via Temporal Difference Methods". In Jain, L.C. & N.
Baba (eds.) Soft Computing Techniques in Game Playing. Berlin:
Springer Verlag.
Sutton, R.S. (1988) "Learning to Predict by the Method of Temporal
Differences", Machine Learning, 3: 9-44.
Sutton, R.S. & A.G. Barto (1998) Reinforcement Learning: An
Introduction. Boston: MIT Press.
Tesauro, G. (1992) "Practical Issues in Temporal Difference Learning",
Machine Learning, 8 (3-4): 257-77.
Tesauro, G. (1994) "TD-Gammon, a Self-Teaching Backgammon Program,
Achieves Master-Level Play", Neural Computation, 6 (2): 215-9.
Tesauro, G. (1995) "Temporal Difference Learning and TD-Gammon",
Communications of the ACM, 38 (3): 58-68.
Thrun, S. (1995) "Learning to Play the Game of Chess", Advances in Neural
Information Processing Systems, 7:1069-76.
Tournavitis, K. (2003) "MOUSE(mu): A Self-Teaching Algorithm that
Achieved Master Strength at Othello", Computers and Games, 2883:
11-28.
Walker, S., R. Lister & T. Downs (1994) "A Noisy Temporal Difference
Approach to Learning Othello: A Deterministic Board Game", ACNN,
5: 113-7.