Learning agents

Project Mid-semester Report

October 22nd, 2002

Group participants:

Huayan Gao (huayan.gao@uconn.edu),
Thibaut Jahan (thj@ifrance.com),
David Keil (dmkeil@att.net),
Jian Lian (lianjian@yahoo.com)

Students in CSE 333 Distributed Component Systems
Professor Steven Demurjian
Department of Computer Science & Engineering
The University of Connecticut





CONTENTS

1. Objectives and goals
2. Topic summary
2.1 Definition and classification of agents and intelligent agents
2.2 Learning
2.3 Platform
3. Topic breakdown
3.1 Machine learning (David)
3.2 A maze problem (David)
3.3 Agent platform (Jian)
3.4 Agent computing (Huayan)
3.5 Distributed computing (Jian)
3.6 Implementation using Together, UML, and Java (Thibaut)
3.7 Extension to UML needed for multi-agent systems (Huayan)
4. Progress on project, changes in direction and focus
5. Planned activities
5.1 Oct. 23 - Oct. 29
5.2 Oct. 30 - Nov. 5
5.3 Nov. 6 - Nov. 12
5.4 Nov. 13 - Nov. 19
5.5 Nov. 20 - Nov. 26
5.6 Nov. 27 - Dec. 2
6. References
Appendix A: Risks
Appendix B: Categories of agent computing
Appendix C: Q-learning Algorithm




1. Objectives and goals

Our ambition is to build a general-architecture model of components for learning agents. The project will investigate current research on software learning agents and will implement a simple system of such agents. We will demonstrate our work with a distributed learning-agent system that interactively finds a policy for navigating a maze. Our implementation will be component-based, using UML and Java.

We will begin with the notion of an intelligent agent, seeking to implement it in a distributed agent environment on a pre-existing agent platform. We will refer to the mobile or distributed agents we implement as "deployed agents."

We will implement the different "generic" components so they can be assembled easily into an agent. The project may also include investigation of the scalability, robustness, and adaptability of the system. Four candidate components of a distributed learning agent are perception, action, communication, and learning.

Our design and implementation effort will focus narrowly on an artifact of realistic, limited scope that solves a well-defined, arbitrarily simplifiable maze problem using Q-learning. We will relate the features of our implementation to recent research in the same narrow area and to broader concepts encountered in the sources.

We select JADE (Java Agent Development Framework) as our software development framework, aimed at developing multi-agent systems and applications conforming to FIPA (Foundation for Intelligent Physical Agents) standards for learning agents.



2. Topic summary

In this section we will discuss the following questions in detail: What is an agent? What is learning? How are learning and agents combined? What agent platform will we use?

2.1 Definition and classification of agents and intelligent agents

Researchers involved in agent research have offered a variety of definitions. Some general features that characterize agents are: autonomy, goal-orientedness, collaboration, flexibility, ability to be self-starting, temporal continuity, character, adaptiveness, mobility, and capacity to learn.

According to a definition from IBM, "Intelligent agents are software entities that carry out some set of operations on behalf of a user or another program with some degree of independence or autonomy, and in so doing, employ some knowledge or representation of the user's goals or desires."

"An autonomous agent is a system situated within and a part of an environment that senses that environment and acts on it, over time, in pursuit of its own agenda and so as to effect what it senses in the future" [fra-gra96]. The latter broad definition is close to the notion of intelligent agent used in the artificial-intelligence field, replacing the logic-programming, knowledge-base-oriented paradigm.

2.2 Learning

Machine learning is a branch of artificial intelligence concerned with enabling intelligent agents to improve their behavior. Among many categories of learning, we will focus on reinforcement learning and the special case, Q-learning.


Reinforcement learning is online rational policy search and uses ideas associated with adaptive systems and related to optimal control and dynamic programming [sut-bar98]. It is distinguished from traditional machine-learning research approaches that assumed offline learning, in which knowledge is acquired during a separate training phase, apart from its application.

In the broader definition of intelligent agents, the agent responds to its environment under a policy, which maps from a perceived state of the environment (determined by agent percepts) to actions. An agent's actions are a series of responses to previously unknown, dynamically generated percepts. A rational agent is one that acts to maximize its expected future reward or performance measure. Because its actions may affect the environment, such an agent must incorporate thinking or planning ahead into its computations. Because it obtains information from its environment only through percepts, it may have incomplete knowledge of the environment. The agent must conduct a trial-and-error search for a policy that obtains a high performance measure. Reinforcement by means of rewards is part of that search.

For intelligent agents that use reinforcement learning, unlike systems that learn from training examples, the issue arises of exploitation of obtained knowledge versus exploration to obtain new information. Exploration gains no immediate reward and is only useful if it can improve utility by improving future expected reward. Failing to explore, however, means sacrificing any benefit of learning.
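
One standard way to manage this tradeoff is an epsilon-greedy policy: exploit the current value estimates most of the time, but explore a random action with small probability. The following minimal Java sketch is our own illustration; the class name and parameters are not taken from any existing package:

    import java.util.Random;

    /** Epsilon-greedy action selection: a minimal sketch of the
     *  exploitation-versus-exploration tradeoff described above. */
    public class EpsilonGreedyPolicy {
        private final double epsilon;              // probability of exploring
        private final Random random = new Random();

        public EpsilonGreedyPolicy(double epsilon) {
            this.epsilon = epsilon;
        }

        /** qValues[a] is the current estimate of the future reward of action a. */
        public int chooseAction(double[] qValues) {
            if (random.nextDouble() < epsilon) {
                // Explore: a random action, forgoing immediate reward.
                return random.nextInt(qValues.length);
            }
            // Exploit: the action with the highest current estimate.
            int best = 0;
            for (int a = 1; a < qValues.length; a++) {
                if (qValues[a] > qValues[best]) best = a;
            }
            return best;
        }
    }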

2.3 Platform

JADE (Java Agent Development Framework) is a software framework fully implemented in the Java language. It simplifies the implementation of multi-agent systems through a middleware platform that claims to comply with the FIPA specifications and through a set of tools that supports the debugging and deployment phase. The agent platform can be distributed across machines, and the configuration can be controlled via a remote GUI.

According to the FIPA specification, agents communicate via asynchronous message passing, where objects of the ACLMessage class are the exchanged payloads. JADE creates and manages a queue of incoming ACL messages; agents can access their queue via a combination of several modes: blocking, polling, timeout, and pattern-matching based.

As for the transport mechanism protocol, Java RMI, event notification, and IIOP are currently used.
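
To illustrate these access modes, here is a minimal sketch of a JADE agent that pattern-matches its queue for INFORM messages and suspends when the queue is empty. It is written against the JADE API as we understand it from the documentation; the agent class itself is our own illustration:

    import jade.core.Agent;
    import jade.core.behaviours.CyclicBehaviour;
    import jade.lang.acl.ACLMessage;
    import jade.lang.acl.MessageTemplate;

    /** Sketch of a JADE agent that combines pattern matching (a message
     *  template) with blocking access to its private ACL message queue. */
    public class ReceiverAgent extends Agent {
        protected void setup() {
            final MessageTemplate template =
                MessageTemplate.MatchPerformative(ACLMessage.INFORM);

            addBehaviour(new CyclicBehaviour(this) {
                public void action() {
                    ACLMessage msg = myAgent.receive(template); // non-blocking poll
                    if (msg != null) {
                        System.out.println("Received: " + msg.getContent());
                    } else {
                        block();  // suspend this behaviour until a message arrives
                    }
                }
            });
        }
    }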

The standard model of an agent platform is represented in the following figure.

Fig. 2.3.1 The standard model of an agent platform

JADE is a FIPA-compliant agent platform, which includes the AMS (Agent Management System), the DF (Directory Facilitator), and the ACC (Agent Communication Channel). All three components are automatically activated at agent-platform start-up. The AMS provides white-page and life-cycle services, maintaining a directory of agent identifiers (AIDs) and agent states. Each agent must register with an AMS in order to get a valid AID. The Directory Facilitator (DF) is the agent that provides the default yellow-page service in the platform. The Message Transport System, also called the Agent Communication Channel (ACC), is the software component controlling all exchange of messages within the platform, including messages to and from remote platforms.


3. Topic breakdown

Our project will be focused on grid-based problems for learning agents. Its aim is similar to the one expounded in [rus-nor95], but we have extended that simple example further. Our realization will be more interesting, using multiple learning agents and possibly changing rewards and walls. We will use JADE (Java Agent Development Framework) as our main agent platform to develop and implement the maze.

3.1 Machine learning (David)

Part of this project will consist of investigating the literature on machine learning, particularly reinforcement learning. David will lead this work.

The problem of learning in interaction with the agent's environment is that of reinforcement learning (RL). The learner executes a policy search, in some solutions using a critic that interprets reward inputs as guides to improving the policy (see figure below).

Fig. 3.1.1 Learning agent


Within reinforcement learning we will address Q-learning, a variant in which the agent incrementally computes, from its interaction with its environment, a table of expected aggregate future rewards, with values discounted as they extend into the future. As it proceeds, the agent modifies the values in the table to refine its estimates. The Q function returns the expected value of an action taken in a given state; the optimal action in a state is the one that maximizes Q. The evolving table of estimated Q values is called Q̂.
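
For the small, discrete mazes we consider, Q̂ can be stored as a plain two-dimensional table. The following Java sketch is our own illustration; encoding states and actions as small integers is an assumption of the sketch:

    /** The evolving estimate Q-hat: a table of discounted expected future
     *  rewards indexed by (state, action), both encoded as small integers. */
    public class QTable {
        private final double[][] q;   // q[state][action]

        public QTable(int numStates, int numActions) {
            q = new double[numStates][numActions];   // starts at all zeros
        }

        public double get(int state, int action) {
            return q[state][action];
        }

        public void set(int state, int action, double value) {
            q[state][action] = value;
        }

        /** The action currently estimated best in the given state. */
        public int bestAction(int state) {
            int best = 0;
            for (int a = 1; a < q[state].length; a++) {
                if (q[state][a] > q[state][best]) best = a;
            }
            return best;
        }

        /** The highest estimated value in the given state (the max term
         *  in the update rule of Appendix C). */
        public double maxValue(int state) {
            return q[state][bestAction(state)];
        }
    }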

3.2 A maze problem (David)

The concrete problem described below will help to define how the project breaks down into components.

Both [mitchelt97] and [sut-bar98] present a simple example consisting of a maze for which the learner must find a policy, where the reward is determined by eventually reaching or not reaching a goal location in the maze.

Fig. 3.2.1 A maze problem

We propose to modify the original problem definition by permitting multiple distributed agents that communicate, either directly or via the environment. Either the multi-agent system, or each agent, will use Q-learning. The mazes can be made arbitrarily simple or complex to fit the speed, computational power, and effectiveness of the system we are able to develop in the time available.


A further interesting variant of the problem would be to allow the maze to change dynamically, either autonomously or in response to the learning agents. Robust reinforcement learners will adapt successfully to such changes.

3.3 Agent platform (Jian)

There are many kinds of agent platforms to choose from (see http://www.ece.arizona.edu/~rinda/compareagents.html for a comparison). We choose JADE (Java Agent Development Framework) as our deployed-agent platform.

JADE (Java Agent Development Framework) is a software framework fully implemented in the Java language. It simplifies the implementation of multi-agent systems through middleware and a set of tools that supports the debugging and deployment phase. The agent platform can be distributed across machines (which do not even need to share the same OS), and the configuration can be controlled via a remote GUI. The configuration can even be changed at run time by moving agents from one machine to another, as and when required.


3.4 Agent computing (Huayan)

We will survey the agent paradigm of computing, focusing on rational agents, as described in Section 2 above. We will apply these concepts to the problem of machine learning, as is done in much reinforcement-learning research.

We have defined an intelligent agent as a software entity that can monitor its environment and act autonomously on behalf of a user or creator. To do this, an agent must perceive relevant aspects of its environment, plan and carry out proper actions, and communicate its knowledge to other agents and users. Learning agents will help us to achieve these goals. Learning agents are adaptive, so that in difficult, changing environments they may change their behavior based on their previous experience.

The real problem with any intelligent agent system is the amount of trust placed in the agent's ability to cope with the information provided by its sensors in its environment. Sometimes the agent's learning capability is not good enough to achieve the anticipated goal. This will be an emphasis when we study the agent.

Advantages of learning agents are their ability to adapt to environmental change, their customizability, and their manageable flexibility in anticipated ways. Disadvantages are the time needed to learn or relearn, their ability only to automate preexisting patterns, and thereby their lack of common sense.

3.5 Distributed computing (Jian)

In multi-agent learning in the strong sense, a common learning goal is pursued; in the weaker sense, agents pursue separate goals but share information. Distributed agents may identify or execute distinct learning subtasks [weiss99]. We will survey the literature on distributed computing, looking for connections to learning agents, and will apply what we find in an attempt to build a distributed system of cooperating learning agents.

3.6 Implementation using Together, UML, and Java (Thibaut)

The maze described above could be represented as a bitmap or a two-dimensional array of squares, as in the sketch below. Starting with a simple example is useful in order to concentrate on good component design and successful implementation.
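
As an illustration, a minimal Java sketch of the two-dimensional-array representation follows; the integer encoding of free squares, walls, and the goal is our own choice, not a settled design decision:

    /** A maze as a two-dimensional array of squares:
     *  0 = free square, 1 = wall, 2 = goal (our own encoding). */
    public class Maze {
        private final int[][] grid;

        public Maze(int[][] grid) { this.grid = grid; }

        public boolean isWall(int row, int col) { return grid[row][col] == 1; }
        public boolean isGoal(int row, int col) { return grid[row][col] == 2; }

        /** Reward for entering a square: positive only at the goal. */
        public double reward(int row, int col) {
            return isGoal(row, col) ? 1.0 : 0.0;
        }

        public static void main(String[] args) {
            // A 3x4 example maze with one wall and the goal in a corner.
            int[][] grid = { {0, 0, 0, 2},
                             {0, 1, 0, 0},
                             {0, 0, 0, 0} };
            Maze maze = new Maze(grid);
            System.out.println("Goal at (0,3)? " + maze.isGoal(0, 3));
        }
    }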

We used the Together CC software to reverse-engineer existing code of examples of learning agents. We used two examples, the cat-and-mouse example and the dog-and-cat example, explained below. We are using these examples to extract from the class diagrams a possible design for our agents.

Multi-agent systems being both actors and software, their design does not follow typical UML design. The paper by Flake and Geiger [fla-gei01] suggests that UML does not offer the full possibility of designing these agents.

We plan on using the Together CC software to implement these agents, starting with their UML design. We have so far identified several distinct components that we think will be used in these learning agents.

These Java-implemented agents would then be executed through the JADE environment. The communication component will have to be specific to the Agent Communication Language (ACL) used in JADE. This should be the only environment-dependent component. We will try to make the other components (learning, perception, action) as "generic" as possible; a sketch of such interfaces follows.
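
As a sketch of what "generic" could mean here, the four candidate components of Section 1 might be specified as Java interfaces, with only the communication component implemented against JADE's ACL. The names and signatures below are our own illustration, not a settled design:

    /** Candidate "generic" components of a distributed learning agent
     *  (Section 1). Names and signatures are illustrative only; just the
     *  Communication implementation would be tied to JADE's ACL. */
    interface Perception {
        int perceiveState();        // map raw percepts to a discrete state
    }

    interface Action {
        void execute(int action);   // carry out an action in the environment
    }

    interface Learning {
        int selectAction(int state);  // the agent's (possibly exploratory) policy
        void update(int state, int action, double reward, int nextState);
    }

    interface Communication {
        void send(String receiverName, String content);  // ACL-specific in JADE
        String receive();                                // null if queue is empty
    }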

Besides the design and implementation of the agents, we also have to design the environment (maze, ...).

3.7 Extension to UML needed for multi-agent systems (Huayan)

The Unified Modeling Language (UML) is now widely used in software engineering, so it is natural to think of applying UML to the design of agent systems. But many UML applications focus on macro aspects of agent systems, such as agent interaction and communication; the design of micro aspects of such agents, such as goals, complex strategies, and knowledge, has often been left out. The standard UML therefore cannot provide complete solutions for multi-agent systems. A detailed description of how to use extended UML to implement multi-agent systems can be found in [fla-gei01]. A Dog-Cat use-case diagram is given as follows:


Fig. 3.7.1 Dog-Cat Use-Case Diagram

In the above diagram, agents are modeled as actors with square heads, and elements of the environment are modeled as clouds. A goal case serves as a means of capturing the high-level goals of an agent. Reaction cases are used to model how the environment directly influences agents. An arc between an actor and a reactive use case expresses that the actor is the source of events triggering this use case.

Figure 3.7.1 illustrates the Dog-Cat use case: the dog triggers the reactive use case DogDetected in the cat agent. In the environment, the tree triggers the TreeDetected use case in the cat.

In the following, we give similar use-case diagrams for Cat-Mouse and for the maze.

The rules of the Cat and Mouse game are: the cat catches the mouse and the mouse escapes the cat; the mouse catches the cheese; and the game is over when the cat catches the mouse.

The Cat-Mouse use-case diagram is as follows:



Fig. 3.7.2 Cat-Mouse Use-Case Diagram

For the well-known maze problem, mentioned in Section 3.2, we give the following use-case diagram:




Fig. 3.7.3 The Maze Problem Use-Case Diagram



4. Progress on project, changes in direction and focus

We meet at least every Tuesday after class. Our main change of focus has been the identification of an existing Q-learning package, "Cat and Mouse" (URL: http://www.cse.unsw.edu.au/~aek/catmouse/followup.html), implemented in Java, and an existing agent platform, JADE.

Thibaut generated a class diagram of the Cat and Mouse Java code using Together. Jian installed the Java code into the JADE platform to create a distributed environment for the learner. Our goal is to implement agents that learn to pursue moving or stationary goals (cat pursues mouse, mouse pursues cheese) or avoid negative rewards (mouse flees cat).

Huayan found a similar example, "Dog and Cat," described with use cases, and located other sources related to agent-based systems.

The source for "Dog and Cat" [fla-gei01] raised the issue of the limitations of standard UML use-case diagramming for the purpose of depicting multi-agent systems. Cat, for example, has the use case Escape while Dog has Chase. But these two use cases denote the same set of events, seen from opposite perspectives.

David coded a simple maze reinforcement learner based on [rus-nor95] in C++, writing classes for the maze, individual states in the maze, and the learning agent. At a later stage this code could easily be ported to Java.

David also wrote C++ code for a system based on (Michie, Chambers, 1968) that uses reinforcement learning to solve the classic problem of pole balancing, in which a controller nudges a cart that sits on a track, with a pole balanced on it, trying to avoid letting the pole fall. In this problem, physical states are on a continuum in four dimensions, but they may be quantized into a tractable number of discrete states from the standpoint of the learner, leading to a solution.

The two directions taken so far by group members are somewhat complementary. The group may have to choose between them, however. Use of the existing Cat-and-Mouse system will certainly allow us to address harder problems, where the learner's environment changes in response to the agent (e.g., cat flees dog). Using JADE has the best chance of allowing us to attain our goal of implementing distributed learning agents that communicate. We may then seek to extend the existing solution by adding to its Java code.

The approach of coding known solutions from scratch, on the other hand, guarantees that at least one member of the group will understand the code, and all members will understand it if all members participate in the coding or if the coder explains the code to the others. We noticed that the Java code for Cat-and-Mouse is quite lengthy.


5. Planned activities

5.1 Oct. 23 - Oct. 29:

Consultation with the instructor on platform and problem choices to be made; discussion on selection of problem and platform. Decision on the role in this project of the UML extension for multi-agent systems.

5.2 Oct. 30 - Nov. 5:

Java implementation of the learning aspect of the agents and enhancement of communication efficiency. Each participant will code the components decided on and described in the design part. Once these components are tested, they will be integrated and the resulting system tested.

5.3 Nov. 6 - Nov. 12:

Extensions to code. Circulation of draft report.

5.4 Nov. 13 - Nov. 19:

Preliminary preparation of slides.

5.5 Nov. 20 - Nov. 26:

Preparation of the final report and last adjustments of the learning agents.

5.6 Nov. 27 - Dec. 2:

Polishing of report and slides.


6. References

[aga-bek97] Arvin Agah and George A. Bekey. Phylogenetic and ontogenetic learning in a colony of interacting robots. Autonomous Robots 4, pp. 85-100, 1997.

[anders02] Chuck Anderson. Robust reinforcement learning with static and dynamic stability. http://www.cs.colostate.edu/~anderson/res/rl/nsf2002.pdf, 2002.

[durfee99] Edmund H. Durfee. Distributed problem solving and planning. In Gerhard Weiss, Ed., Multiagent systems: A modern approach to distributed artificial intelligence, MIT Press, pp. 121ff, 1999.

[d'Inverno01] Mark d'Inverno and Michael Luck. Understanding agent systems. [PUB?] 2001.

[fla-gei01] Stephan Flake, Christian Geiger, and Jochen M. Kuster. Towards UML-based analysis and design of multi-agent systems. International NAISO Symposium on Information Science Innovations in Engineering of Natural and Artificial Intelligent Systems (ENAIS'2001), Dubai, March 2001.

[fra-gra96] Stan Franklin and Art Graesser. Is it an agent, or just a program?: A taxonomy for autonomous agents. Proceedings of the Third International Workshop on Agent Theories, Architectures, and Languages, 1996. www.msci.memphis.edu/~franklin/AgentProg.html

[huh-ste99] Michael N. Huhns and Larry M. Stephens. Multiagent systems and societies of agents. In Gerhard Weiss, Ed., Multiagent systems: A modern approach to distributed artificial intelligence, MIT Press, pp. 79-120, 1999.

[jac-byl] Ivar Jacobson and Stefan Bylund. A multi-agent system assisting software developers. Downloaded.

[Knapik98] Michael Knapik and Jay Johnson. Developing intelligent agents for distributed systems. 1998.

[lam-lyn90] Leslie Lamport and Nancy Lynch. Distributed computing: models and methods. In Jan van Leeuwen, Ed., Handbook of Theoretical Computer Science, Vol. B, MIT Press, pp. 1158-1199, 1990.

[mitchelt97] Tom M. Mitchell. Machine learning. McGraw-Hill, 1997.

[mor-mii96] David E. Moriarty and Risto Miikkulainen. Efficient reinforcement learning through symbiotic evolution. Machine Learning 22, pp. 11-33, 1996.

[petrie96] Charles J. Petrie. Agent-based engineering, the web, and intelligence. IEEE Expert, December 1996.

[rus-nor95] Stuart Russell and Peter Norvig. Artificial intelligence: A modern approach. Prentice Hall, 1995.

[SAG97] Software Agents Group, MIT Media Laboratory. "CHI97 Software Agents Tutorial", http://pattie.www.media.mit.edu/people/pattie/CHI97/.

[sandho99] Tuomas W. Sandholm. Distributed rational decision making. In Gerhard Weiss, Ed., Multiagent systems: A modern approach to distributed artificial intelligence, MIT Press, pp. 201-258, 1999.

[sen-wei99] Sandip Sen and Gerhard Weiss. Learning in multiagent systems. In Gerhard Weiss, Ed., Multiagent systems: A modern approach to distributed artificial intelligence, MIT Press, pp. 259-298, 1999.

[shen94] Wei-Min Shen. Autonomous learning from the environment. Computer Science Press, 1994.

[sut-bar98] Richard S. Sutton and Andrew G. Barto. Reinforcement learning: An introduction. MIT Press, 1998.

[syc-pan96] Katia Sycara, Anandeep Pannu, Mike Williamson, Dajun Zeng, and Keith Decker. Distributed intelligent agents. IEEE Expert, December 1996, pp. 36-45.

[venners97] Bill Venners. The architecture of aglets. JavaWorld, April 1997.

[wal-wya94] Jim Waldo, Geoff Wyant, Ann Wollrath, and Sam Kendall. A note on distributed computing. Sun Microsystems technical report SMLI TR-94-29, November 1994.

[weiss99] Gerhard Weiss, Ed. Multiagent systems: A modern approach to distributed artificial intelligence. MIT Press, 1999.

[wooldr99] Michael Wooldridge. Intelligent agents. In Gerhard Weiss, Ed., Multiagent systems: A modern approach to distributed artificial intelligence, MIT Press, pp. 27-77, 1999.

Reference to get title, author:

[xx99] http://www.cs.helsinki.fi/research/hallinto/TOIMINTARAPORTIT/1999/report99/node2.html.


Appendix A: Risks

Our objectives include avoiding several possible risks, including (1) the construction of "toy worlds," i.e., problem specifications tailored to the envisioned solution; (2) complexity of design without performance gain; (3) overfitting the generalizable components to the specific problem at hand, putting reusability at risk; (4) premature commitment to a specific solution (Q-learning) as opposed to exploration of various alternatives.

Appendix B: Categories of agent computing

A wide range of agent types exists.

Interface agents are computer programs that employ artificial intelligence techniques to provide active assistance to a user with computer-based tasks.

Mobile agents are software processes capable of moving around networks such as the World Wide Web, interacting with hosts, gathering information on behalf of their owner, and returning with requested information that is found.

Co-operative agents can communicate with, and react to, other agents in a multi-agent system within a common environment. Such an agent's view of its environment might be very narrow due to its limited sensory capacity. Co-operation exists when the actions of an agent achieve not only the agent's own goals, but also the goals of agents other than itself.

Reactive agents do not possess internal symbolic models of their environment. Instead, the reactive agent "reacts" to a stimulus or input that is governed by some state or event in its environment. This environmental event triggers a reaction or response from the agent.

The application field of agent computing includes economics, business (commercial databases), management, telecommunications (network management), and e-societies (as for e-commerce). Techniques from databases, statistics, and machine learning are widely used in agent applications. In the telecommunications field, agent technology is used to support efficient (in terms of both cost and performance) service provision to fixed and mobile users in competitive telecommunications environments.

Appendix C: Q-learning Algorithm

With a known model (M below) of the learner's transition probabilities given a state and an action, the following constraint equation holds for Q-values, where a is an action, i and j are states, and R is a reward:

    Q(a, i) = R(i) + \sum_j M^a_{ij} \max_{a'} Q(a', j)

Using the temporal-difference learning approach, which does not require a model, we have the following update formula, calculated after the learner goes from state i to state j:

    Q(a, i) \leftarrow Q(a, i) + \alpha \left( R(i) + \max_{a'} Q(a', j) - Q(a, i) \right)
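
A direct Java transcription of this update rule follows, as a sketch reusing the Q-table sketched in Section 3.1; alpha here is the learning rate α in the formula above:

    /** One Q-learning update, applied after the learner moves from state i
     *  to state j by action a and observes reward R(i). Transcribes the
     *  formula above; reuses the QTable sketched in Section 3.1. */
    public class QLearner {
        private final QTable table;
        private final double alpha;   // learning rate, the alpha in the formula

        public QLearner(QTable table, double alpha) {
            this.table = table;
            this.alpha = alpha;
        }

        public void update(int i, int a, double reward, int j) {
            double old = table.get(i, a);
            // R(i) + max_a' Q(a', j); the discounting mentioned in
            // Section 3.1 would multiply the max term by a factor gamma.
            double target = reward + table.maxValue(j);
            table.set(i, a, old + alpha * (target - old));
        }
    }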

Within the objective of a simple implementation, we will aim to provide an analysis of the time complexity, adaptability to dynamic environments, and scalability of Q-learning agents as compared to more primitive reinforcement learners.