An Incremental Machine Learning Mechanism Applied to Robot Navigation




Nawwaf N. Kharma, Majd Alwan, and Peter Y. K. Cheung

Department of Electrical Engineering, Imperial College of Science, Technology and Medicine,
London SW7 2BT, U.K.


Fax: +44 171 5814419

Abstract

In this paper we apply an incremental machine learning algorithm to the problem of robot navigation. The learning algorithm is applied to a simple robot simulation to automatically induce a list of declarative rules. The rules are pruned in order to remove those that are operationally useless. The final set is first used to control the robot navigating an obstacle-free path planned in a polygonal environment, with satisfactory results. The crisp conditions used in the rules are then replaced by fuzzy conditions fashioned by a human expert. The new set of rules is shown to produce better results.

Keywords

incremental machine learning, tripartite rules, schema, robot navigation.

1. Introduction

Both Classifier Systems and Q-Learning techniques [1] have two major common deficiencies. These are:

A. The Two-Part Rule Form

Production Rules (PR) (both classifiers and Q-Learning rules are PR) have two parts only: a left side representing the conditions that have to be met before the right side, the action, is taken. Thus PR in this form are situation-action rules. The information in schemas (see 2.2) can be coded in a two-part PR syntax. However, for many reasons this is not suitable:

1. There is evidence from animal ethology [2,3] indicating that animals learn an action-result association, and that this association, as a unit, is then linked with the context.

2. The number of rules in PR systems is determined by the number of combinations of contexts, actions, and results that could make up a rule. This can result in a very large number of rules. In contrast, the schemas used in this paper are built incrementally and hence require less memory.

B. Implicit Representation of Result Values

1. Q-learning as well as Classifier Systems assigns a strength to each rule that implicitly expresses the operational value of the rule. This contrasts with schemas, which have explicit declarative result components.

2. The rule selection mechanism in PR systems chooses high-strength rules, or rules that are in the vicinity of high-strength ones. This means that learning only takes place along the fringe of the state-space that has already been connected to a goal, whereas animats should be allowed to seek knowledge that may not have immediate use for the goal at hand.

The Learning Mechanism, on the other hand, is a system that learns incrementally (i.e. every rule is built in a number of steps) using explicit units of representation (schemas). The algorithm aims to enable robots to acquire and use sensorimotor knowledge autonomously, without a priori knowledge of the skills concerned. This algorithm is based on the Schema Mechanism developed by Drescher [4]. It was altered and amended significantly in three main ways to make it more suitable for sensorimotor robotic applications. Our learning mechanism:

* Aims at the automatic induction of a list of declarative rules that describe the interaction between an agent (e.g. a robot) and its environment.

* Is simplified and made practically useful for real-time applications.

* Is amended to take into account the richness and the inherent uncertainties of the real world.

This paper has four main sections: the first presents the basic assumptions and terms that are needed to make the algorithm work; the second describes the main algorithm itself; the third outlines the specific problem and the experiments carried out; and the fourth shows and discusses the results obtained.

2. Assumptions and Terms

2.1 Basic Assumptions

The following are the basic assumptions made in order for the main algorithm to work in line with expectations:

* All learned information may be put in the form of a list of declarative rules.

* The nature of the environment is static. The set of laws that govern the environment does not change over time.

* There are no hidden states. The relevant aspects of the environment are all detectable through the robot's sensors.

* Crisp conditions are initially sufficient for learning.

* Disjunctions of conjunctions of conditions are enough to characterise any state.

* The Temporal Credit Allocation problem [5] may be overlooked.

* Actions are taken serially. They are finite in duration, and do not destabilise the system.

* Relevant results can be pre-defined in terms of a combination of conditions.

* There may be any number of agents in the environment. Any one of them may be monitored by the learning algorithm.

2.2 Definition of Terms

Schema and Schema Space

Fig. 1 A schema.

The main structure of the learning mechanism is the Schema (or rule) Space. The Schema Space is the collection of all schemas. At any time, the Schema Space of the robot represents all of its knowledge. The job of the learning algorithm is simply to create, modify, delete, and possibly link schemas.

A schema representation is made of two main structures: a Main Body (which comprises a context, an action and a result), and the extended structures (see Fig. 1). The main body is a tripartite rule representing a counterfactual assertion. The extended structures keep information that is mainly used for creating (or spinning off) new schemas.

A schema has both declarative and procedural aspects. Declaratively, it is a unit of information about an interaction between the robot and the environment. Procedurally, a schema represents a possible action to be taken in situations where its context is fulfilled and its result is desired.

The components of a schema are:

Main Body:

- Conditions: A condition may be viewed as a function representing the degree of membership (or D.O.M.) of a sensor's output in a set representing that condition. In the crisp D.O.M. case, a condition can either be true or false.

- Context, Result and Action: A Context (and similarly a Result) is a conjunction of one or more conditions (and their negations). A Result can be either predefined or created at run-time. Contexts of reliable schemas are automatically added, at run-time, to the set of results. An Action represents a command to an effector. If an Action is taken, then its command is executed.



Extended Structures

Each schema has extended structures that contain two main sets of correlation statistics. These statistics are necessary for the development of schemas. The first set contains the Positive Transition Correlation (PTC) and the Negative Transition Correlation (NTC), which are used to find relevant results of an action. A relevant result of an action is a result that has empirically been shown to follow the execution of that action significantly more often than it follows other actions from the robot's repertoire. The PTC discovers positive results while the NTC discovers negative ones. The second set of statistics contains the Positive Success Correlation (PSC) and the Negative Success Correlation (NSC). PSC is used to find conditions that, when included in the context of a schema, will make its result follow more reliably than before adding these conditions. NSC has the same function as PSC, except that it is used to find conditions that need to be excluded from the context of a schema to make its result follow more reliably. The reliability of a schema is measured by the ratio of the number of times that its action is executed, in the right context, and leads to the fulfilment of its result, to the total number of times that its action is executed in the right context.
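Written out, with $N_{\mathrm{ctx}}(S)$ denoting the number of times schema $S$'s action is executed in the right context, and $N_{\mathrm{ctx \wedge res}}(S)$ the number of those executions after which the result is fulfilled (the symbols are ours, transcribing the definition above):

$$\mathrm{reliability}(S) = \frac{N_{\mathrm{ctx \wedge res}}(S)}{N_{\mathrm{ctx}}(S)}$$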



Configuration Parameters and others

- Result spin-off: a new schema made out of a copy of a previous one, by adding a condition to the result side.

- Context spin-off: a new schema made out of a copy of a previous one, by adding a condition to the context side.

- θ1: the relevance threshold, for producing result spin-offs.

- N1: the total number of experiments that need to be taken before a result spin-off is allowed.

- θ2: the reliability threshold, used for producing context spin-offs.

- N2: the number of activations that a result spin-off schema needs to go through before it is allowed to produce a context spin-off.
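As a concrete illustration of the structures defined in this section, the following minimal Python sketch shows a schema's main body together with the bookkeeping behind the reliability measure. Field and method names are ours; the extended correlation statistics are elided because the paper does not spell out their formulas.

from dataclasses import dataclass

@dataclass
class Schema:
    """A tripartite rule: context -> action -> result (main body only).

    The extended structures (the PTC/NTC and PSC/NSC tables) are omitted
    here since their exact formulas are not given in the paper; only the
    counts behind the reliability ratio are sketched.
    """
    context: frozenset      # conjunction of conditions (and negations)
    action: str             # command to an effector
    result: frozenset       # conjunction of conditions
    activations: int = 0    # times the action ran in the right context
    successes: int = 0      # activations after which the result held

    def reliability(self) -> float:
        """Successful activations over all activations in context."""
        return self.successes / self.activations if self.activations else 0.0

# Example: one of the orientation rules learnt in section 5,
# "IF right_small ^ left_slight THEN centre".
rule = Schema(context=frozenset({"right_small"}),
              action="left_slight",
              result=frozenset({"centre"}))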

3. The Main Algorithm

The learning mechanism is best described by explaining the main algorithm that it embodies. This main algorithm goes through the following main steps:

1. Randomly select an action and execute it.

2. Use the data collected before, during and after taking the action of the schema in its context, to update the two sets of correlation statistics.

3. Based on the statistics in step 2, the rule base may be updated as detailed in the algorithmic notation.

4. Repeat steps 1 to 3 above until the predetermined number of experiments is met.

The two phases of the rule base update are best described in the following algorithmic notation:


If (no. of experiments > N1
    AND PTC/NTC(Result_i) >= θ1
    AND Result_i not used before)
then Result spin-off

When the update is completed, and once the PSC and NSC are known, a context spin-off takes place according to:

If (no. of activations > N2
    AND PSC/NSC(Condition_i) >= θ2
    AND Condition_i not used before)
then Context spin-off
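For illustration, the four steps and the two spin-off tests can be put together in a short control-flow sketch. Everything beyond the loop structure is an assumption: execute, observe, and the schema methods (update_statistics, ptc_ntc, psc_nsc, candidate_results, candidate_conditions, spin_off_result, spin_off_context and the used-before checks) are hypothetical stand-ins for machinery the paper does not spell out.

import random

def learning_run(schemas, actions, execute, observe,
                 theta1, theta2, n1, n2, n_experiments):
    """Sketch of the main algorithm's loop; helper methods are hypothetical."""
    for t in range(n_experiments):
        # Step 1: randomly select an action and execute it.
        action = random.choice(actions)
        before = observe()
        execute(action)
        after = observe()

        # Step 2: update both sets of correlation statistics.
        for s in schemas:
            s.update_statistics(before, action, after)

        # Step 3: spin-offs, per the algorithmic notation above.
        for s in list(schemas):
            if t > n1:
                for r in s.candidate_results():
                    if s.ptc_ntc(r) >= theta1 and not s.has_used_result(r):
                        schemas.append(s.spin_off_result(r))
            if s.activations > n2:
                for c in s.candidate_conditions():
                    if s.psc_nsc(c) >= theta2 and not s.has_used_condition(c):
                        schemas.append(s.spin_off_context(c))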

4. Problem and Experiments

The learning algorithm is now applied to the problem of robot navigation. The goals of this application are to:

* Show that the algorithm is capable of deducing a list of rules that is able (if properly pruned) to control the navigational behaviour of the robot navigating an obstacle-free path planned in a given environment.

* Investigate the results of fuzzifying the context/result conditions on the execution of the deduced rule base.

4.1 The Robot Simulation and the Task to Learn


The robot has a cylindrical body and a differential drive with two independently driven motorised wheels that perform both the driving and the steering. Four castors support the mobile base on the floor (see Fig. 2).

Fig. 2 A schematic of the Mobile Robot's Base.

Steering results from driving the left and right wheels at different speeds. This arrangement enables the robot to turn around its centre.

The robot is equipped with an on-board electronic compass, and odometry for localisation.

The robot requires two commands: linear speed and change of direction. These are separated into individual rotational speed commands for the two driving motors, which are put in a velocity closed-loop control. The global position control loop is closed by the feedback coming from the localisation system.
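The separation of the two commands into wheel speeds follows standard differential-drive kinematics. The sketch below is ours, and the track parameter (the distance between the driven wheels) is an assumed symbol, not one from the paper.

def wheel_speeds(v, omega, track):
    """Split linear speed v (m/s) and turn rate omega (rad/s) into
    left/right wheel speed setpoints for a differential drive.
    With v = 0 the wheels run equal and opposite, turning the robot
    around its centre, as described above."""
    v_left = v - omega * track / 2.0
    v_right = v + omega * track / 2.0
    return v_left, v_right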

The task we want the robot to learn is navigating along a path consisting of straight line segments. The learnt navigation rule base should be able to control the robot to traverse the planned path smoothly.

A simulation of the kinematics and dynamics of the described mobile robot base was used for testing the learnt control rules. The robot simulation links to the FuzzyTECH 3.1 development environment [6], where the rules and the input/output membership functions (including crisp ones) can be graphically edited.

4.2 Experimental Set-up for Learning

The learning algorithm goes through two runs: one to discover the block of rules relevant to orientation control, and a second to discover the block of rules that control the linear velocity of the robot.

The learning algorithm is configured as follows: θ1 := 2, N1 := the total number of experiments taken, θ2 := 1, N2 := 3. The negative spin-off mechanisms are disabled.

The sets of conditions and actions used are:

DirDif = {right_big, right_small, centre, left_small, left_big}
Dist = {very_near, near, medium, far}
SpIn = {zero, slow, medium, high}
DirOut = {left_far, left_slight, straight, right_slight, right_far}
SpOut = {zero, slow, medium, high}
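The conditions in these sets are degree-of-membership functions over the corresponding sensor values (heading error, distance, speed). As a minimal illustration of the crisp conditions used during learning versus the fuzzy replacements of section 5.2, consider the sketch below; the breakpoints are invented for illustration and are not taken from the paper.

def crisp(x, lo, hi):
    """Crisp membership: degree is 1 inside [lo, hi), 0 outside."""
    return 1.0 if lo <= x < hi else 0.0

def triangular(x, a, b, c):
    """Fuzzy triangular membership: rises from a to a peak at b, falls to c."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x <= b else (c - x) / (c - b)

# Hypothetical 'centre' condition over heading error in degrees:
# the crisp version is all-or-nothing, the fuzzy version grades smoothly.
print(crisp(4.0, -5.0, 5.0))               # -> 1.0
print(triangular(4.0, -10.0, 0.0, 10.0))   # -> 0.6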

5. Results

5.1 Learning Algorithm Results

A series of experiments is fed to the learning algorithm. These experiments were chosen such that they cover, on a uniformly random basis, the context space of the actions concerned. The learning algorithm is run and a series of rules is produced. If the direction control action left_slight is taken as an example, we find that the following rules are produced:

IF right_big ^ left_slight THEN right_small
IF right_small ^ left_slight THEN centre
IF centre ^ left_slight THEN left_small
IF left_small ^ left_slight THEN left_big

These rules were found with different reliability values, depending on the specific series of experiments taken. The rules produced by the learning algorithm are then pruned using two criteria:

1. Relevance to the goal (heading towards the goal),

2. High reliability.

The above rules become:

IF right_big ^ left_slight THEN right_small
IF right_small ^ left_slight THEN centre

With respect to rule block 1, the final list becomes (put in operational form):

IF DirDif: right_big THEN DirOut: left_far
IF right_big THEN left_slight
IF right_small THEN left_far
IF right_small THEN left_slight
IF centre THEN straight
IF left_small THEN right_slight
IF left_small THEN right_far
IF left_big THEN right_slight
IF left_big THEN right_far
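In this operational form, block 1 is a pure lookup from the heading-error term to one or two steering terms. Transcribed as a table (a sketch of one possible encoding, not the FuzzyTECH representation), it reads:

# Rule block 1 as a lookup table. Terms that map to two outputs fire two
# rules; in the fuzzy case their degrees of fulfilment arbitrate between
# a far and a slight turn.
ORIENTATION_RULES = {
    "right_big":   ("left_far", "left_slight"),
    "right_small": ("left_far", "left_slight"),
    "centre":      ("straight",),
    "left_small":  ("right_slight", "right_far"),
    "left_big":    ("right_slight", "right_far"),
}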

For the second block of rules, those concerned with the control of the linear velocity of the robot when heading towards a goal, a number of constraints is placed on the learning algorithm:

1. Due to inertia, the robot is prevented from taking an experiment in which the speed changes suddenly, e.g. from slow to high or from high to zero. Speed can only change gradually. This corresponds to real robots with dynamics, as opposed to mere kinematic simulations.


2. We prune the first list of rules according to different criteria (from those used for the orientation block). These criteria are:

A. Highest reliability.

B. Maximum distance traversal at each step.

C. Zero speed at the goal.

This gives us the following list of rules:

IF Dist: X ^ SpIn: zero THEN SpOut: slow
IF medium ^ medium THEN medium
IF far ^ medium THEN high
IF far ^ slow THEN medium
IF far ^ high THEN high
IF medium ^ high THEN medium
IF medium ^ slow THEN medium
IF near ^ slow THEN slow
IF very_near ^ slow THEN zero

Since the two blocks of rules are learned separately, their separation is enforced in action. This is done by adding another block of rules which makes sure that the speed rules are only active when the robot is heading towards the goal. This special block is:

IF DirOut: right_far THEN SpOut: zero
IF left_far THEN zero
IF right_slight THEN zero
IF left_slight THEN zero

This means that at execution the robot should first execute the first set, making sure that the robot is heading in the right direction, and then the second block starts executing.
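The special block effectively gates the speed block on the orientation block's output. A minimal sketch of that execution order, assuming hypothetical orientation_rules and speed_rules lookups built from the blocks above:

def control_step(dir_dif, dist, speed_in):
    """Orientation block first; the speed block only acts once the robot
    is heading (roughly) towards the goal, per the special block."""
    dir_out = orientation_rules(dir_dif)           # rule block 1
    if dir_out != "straight":
        return dir_out, "zero"                     # special block: hold speed
    return dir_out, speed_rules(dist, speed_in)    # rule block 2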

5.2 Simulation Results

Fig. 3 Navigation using Crisp Conditions.

Fig. 3 shows the robot navigating a planned path using the learnt rules with crisp membership functions for the context conditions. It is clear that the robot's centre moved off the straight path segments, due to the lack of overlap between the straight membership function and its neighbours, and to its width. This is unsuitable in cluttered environments (e.g. a narrow corridor). Had the straight membership function been narrower, the robot would have swung right and left of the path in a zigzag, because exactly the same rules are activated regardless of the required amount of direction change.


Fig. 4 Navigation using Fuzzy Conditions.

However, when appropriate fuzzy membership functions replace the crisp ones, the performance of the learnt navigation rules improves significantly, as Fig. 4 shows. This is because as the robot comes closer to the direction of the goal, the final output of the orientation control rules is reduced according to the degree of fulfilment.
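This is the standard fuzzy-inference property that each fired rule contributes in proportion to its degree of fulfilment; a schematic centre-of-gravity style combination (our own minimal illustration, not FuzzyTECH's internals):

def weighted_output(firing_degrees, output_values):
    """Combine fired rules' outputs, each weighted by its degree of
    fulfilment, so small heading errors yield small corrections."""
    total = sum(firing_degrees)
    if total == 0.0:
        return 0.0
    return sum(d * v for d, v in zip(firing_degrees, output_values)) / total

# E.g. 'centre' fired at 0.8 (output 0 deg) and 'left_small' at 0.2
# (output +15 deg, a hypothetical value) -> a gentle 3-degree correction.
print(weighted_output([0.8, 0.2], [0.0, 15.0]))  # -> 3.0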

6. Conclusions and Recommendations

The learning algorithm succeeded in finding the declarative rules that represent, in their totality, the interaction between the robot and the environment. Many of these rules were operationally useless, and had to be pruned (according to the criteria mentioned previously). Once pruned, the resulting rules (in both crisp and fuzzy forms) were effective in controlling the robot in navigation.

We have shown that the performance of the learnt schemas improves as the context conditions are fuzzified. Hence, our future work will be making the learning schema mechanism a fuzzy one, which would be more general and capable of learning tasks in the continuous real world. Our learning mechanism, presented in this paper, readily allows this extension.

7. References

[1] Dorigo M. et al. (1994) "A comparison of Q-learning and classifier systems." In From Animals to Animats 3, edited by D. Cliff et al. MIT Press, Cambridge, MA.

[2] Rescorla R. (1990) "Evidence for an association between the discriminative stimulus and the response-outcome association in instrumental learning." Journal of Experimental Psychology: Animal Behavior Processes, 16, 326-334.

[3] Roitblat H. (1994) "Mechanism and process in animal behavior: models of animals, animals as models." In From Animals to Animats 3, edited by D. Cliff et al. MIT Press, Cambridge, MA.

[4] Drescher G. (1990) "Made-up minds: a constructivist approach to artificial intelligence." MIT Press, Cambridge, MA.

[5] Holland J. H. (1992) "Adaptation in Natural and Artificial Systems." MIT Press, Cambridge, MA.

[6] FuzzyTECH 3.1 Software Manuals.