An Incremental Machine Learning Mechanism Applied to Robot Navigation
Nawwaf N Kharma, Majd Alwan, Peter Y K Cheung
Department of Electrical Engineering, Imperial College of Science, Technology and Medicine, London SW7 2BT, U.K.
Fax: +44 171 5814419
In this paper we apply an incremental machine learning algorithm to the problem of robot navigation. The learning algorithm is applied to a simple robot simulation to automatically induce a list of declarative rules. The rules are then pruned in order to remove those that are operationally useless. The final set is initially used to control the robot navigating an obstacle-free path planned in a polygonal environment, with satisfactory results. The crisp conditions used in the rules are then replaced by fuzzy conditions fashioned by a human expert. The new set of rules is shown to produce better results.
Keywords: incremental machine learning, tripartite rules, schema, robot navigation.
1. Introduction
Both Classifier Systems and Q-Learning techniques have two major deficiencies in common. These are:
A. The Two-Part Rule Form
Production Rules (PR) (both classifiers and Q-rules are PR) have two parts only: a left side representing the conditions that have to be met before the right side, the action, is taken. Thus PR in this form are condition-action rules. The information in schemas (see 2.2) can be coded in a two-part PR syntax. However, for several reasons, this is not suitable.
There is evidence from animal ethology [2,3] indicating that animals learn an action-result association, and that this association, as a unit, is then linked with the context.
The number of rules in PR systems is determined by the number of combinations of contexts, actions, and results that could make up a rule. This could result in a very large number of rules. In contrast, the schemas used in this paper are built incrementally, and hence far fewer rules are required.
B. Implicit Representation of Result Values
Q-Learning as well as Classifier Systems assigns a strength to each rule that implicitly expresses the operational value of the rule. This contrasts with schemas, which have explicit declarative result components.
The rule selection mechanism in PR systems chooses high-strength rules, or rules that are in the vicinity of high-strength ones. This means that learning only takes place along the fringe of the state space that has already been connected to a goal, whereas animats should be allowed to seek knowledge that may have no immediate use for the goal at hand.
The Learning Mechanism, on the other hand, is a system that learns incrementally (i.e. every rule is built in a number of steps) using explicit units of representation. The algorithm aims to enable robots to acquire and use sensorimotor knowledge autonomously, without prior knowledge of the skills concerned.
This algorithm is based on the Schema Mechanism developed by Drescher [4]. It was altered and amended significantly in three main ways to make it more suitable for sensorimotor robotic applications. Our algorithm:
* Aims at the automatic induction of a list of declarative rules that describe the interaction between an agent (e.g. a robot) and its environment.
* Is simplified and made practically useful for real robots.
* Is amended to take into account the richness and the inherent uncertainties of the real world.
This paper has four main sections: the first presents the basic assumptions and terms that are needed to make the algorithm work; the second describes the main algorithm itself; the third outlines the specific problem and the experiments carried out; and the fourth shows and discusses the results obtained.
2. Assumptions and Definitions
2.1 Basic Assumptions
The following are the basic assumptions made in order for the main algorithm to work:
* All learned information may be put in the form of a list of declarative rules.
* The nature of the environment is static: the set of laws that governs the environment does not change over time.
* There are no hidden states. The relevant aspects of the environment are all detectable through the robot's sensors.
* Crisp conditions are initially sufficient for learning.
* Disjunctions of conjunctions of conditions are enough to characterise any state.
* The Temporal Credit Allocation problem may be ignored.
* Actions are taken serially. They are finite in duration, and do not destabilise the system.
* Relevant results can be pre-defined in terms of a combination of conditions.
* There may be any number of agents in the environment. Any one of them may be monitored by the learning mechanism.
2.2 Definition of Terms
Schema and Schema Space
Fig. 1 A schema.
The main structure of the learning mechanism is the Schema (or rule) Space: the collection of all schemas. At any time, the Schema Space of the robot represents all its knowledge. The job of the learning algorithm is simply to create, modify, delete, and possibly link schemas.
A schema representation is made of two main structures: a Main Body (which comprises a context, an action and a result), and the extended structures (see Fig. 1). The main body is a tripartite rule expressing a counterfactual assertion. The extended structures keep information that is mainly used for creating (spinning off) new schemas.
A schema has both declarative and procedural aspects. Declaratively, it is a unit of information about an interaction between the robot and the environment. Procedurally, a schema represents a possible action to be taken in situations where its context is fulfilled and its result is desired.
The components of a schema are:
Conditions: A condition may be viewed as a function representing the degree of membership (or D.O.M.) of a sensor's output in a set representing that condition. In the crisp DOM case, a condition can be either true or false.
Context, Result and Action: A Context (and similarly a Result) is a conjunction of one or more conditions (and their negations). A Result can be either predefined or created at run-time. Contexts of reliable schemas are automatically added, at run-time, to the set of results. An Action represents a command to an effector to take an action. If an Action is taken then its command is executed.
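As a concrete illustration, the schema structure described above can be sketched as a small data class. The class and field names below are our own assumptions for the sake of the sketch, not the paper's implementation:

```python
from dataclasses import dataclass, field

# Hypothetical sketch of a schema's main body (context, action, result)
# plus a simple tally dict standing in for the extended structures.

@dataclass
class Schema:
    context: frozenset            # conjunction of conditions (and negations)
    action: str                   # command to an effector
    result: frozenset             # conjunction of conditions
    stats: dict = field(default_factory=dict)  # PTC/NTC/PSC/NSC tallies

    def propose(self, situation, desired):
        """Procedural view: offer the action when the context is
        fulfilled and the result is desired."""
        if self.context <= situation and self.result & desired:
            return self.action
        return None

s = Schema(frozenset({"right_big"}), "turn_right_big", frozenset({"on_course"}))
print(s.propose({"right_big", "moving"}, {"on_course"}))  # turn_right_big
```

Declaratively the instance is just a readable triple; procedurally, `propose` captures the "context fulfilled and result desired" activation rule.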
Each schema has extended structures that contain two main sets of correlation statistics. These statistics are necessary for the development of schemas. The first set contains the Positive Transition Correlation (PTC) and the Negative Transition Correlation (NTC), which are used to discover the relevant results of an action. A relevant result of an action is a result that has empirically been shown to follow the execution of that action significantly more often than other actions from the robot's repertoire. The PTC discovers positive results while the NTC discovers negative ones. The second set of statistics contains the Positive Success Correlation (PSC) and the Negative Success Correlation (NSC). PSC is used to find conditions that, when included in the context of a schema, will make its result follow more reliably than before adding them. NSC has the same function as PSC, except that it is used to find conditions that need to be excluded from the context of a schema to make its result follow more reliably.
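The paper does not give formulas for these statistics, so the following is only one plausible reading of a transition correlation: compare how often a result follows the given action with how often it follows the robot's other actions.

```python
# Illustrative sketch only: PTC/NTC-style relevance as a ratio of
# conditional frequencies. A ratio well above 1 marks a positive
# (relevant) result of the action; well below 1, a negative one.

def transition_correlation(followed_action, trials_action,
                           followed_other, trials_other):
    p_with = followed_action / trials_action      # P(result | this action)
    p_without = followed_other / trials_other     # P(result | other actions)
    return p_with / p_without if p_without else float("inf")

# Say 'obstacle_near' followed 'move_forward' in 8 of 10 trials, but
# followed the remaining actions in only 1 of 10 trials:
print(transition_correlation(8, 10, 1, 10))  # 8.0
```

The condition names and counts are invented for illustration; only the "significantly more often than other actions" comparison comes from the text.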
The Reliability of a schema is measured by the ratio of the number of times that its action is executed, in the right context, and leads to the fulfilment of its result, to the total number of times that its action is executed in the right context.
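This ratio translates directly into code; the function name is our own:

```python
def reliability(successes: int, activations: int) -> float:
    """successes: in-context activations whose result was fulfilled;
    activations: all in-context activations of the schema's action."""
    return successes / activations if activations else 0.0

print(reliability(9, 12))  # 0.75
```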
Configuration Parameters and others
Result spin-off: a new schema made out of a copy of a previous one, by adding a condition to the result side.
Context spin-off: a new schema made out of a copy of a previous one, by adding a condition to the context side.
θ1: the relevance threshold, for producing result spin-offs.
N1: the total number of experiments that need to be taken before a result spin-off is allowed.
θ2: the reliability threshold, used for producing context spin-offs.
N2: the number of activations that a result spin-off schema needs to go through before it is allowed to produce a context spin-off.
3. The Main Algorithm
The learning mechanism is best described by explaining the main algorithm that it embodies. This algorithm goes through the following main steps:
1. Randomly select an action and execute it.
2. Use the data collected before, during and after taking the action of the schema in its context, to update the two sets of correlation statistics.
3. Based on the statistics in step 2, the rule base may be updated as detailed in the algorithmic notation.
4. Repeat steps 1 to 3 above until the predetermined number of experiments is met.
The two phases of rule-base update are best described in the following algorithmic notation:
If (no. of experiments > N1 ∧ the relevance exceeds its threshold ∧ the condition has not been used before) then a result spin-off takes place.
When the update is completed, and once the PSC and NSC are known, context spin-off takes place according to:
If (no. of experiments > N2 ∧ the reliability exceeds its threshold ∧ the condition has not been used before) then a context spin-off takes place.
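The four-step loop can be sketched as a runnable toy. Everything here besides the loop structure is an assumption of ours: the helper names, the 'heading_changed' result, the stubbed environment odds, and the use of a single relevance threshold; context spin-offs are omitted for brevity.

```python
import random

# Toy version of the main algorithm: random action, statistics update,
# then a result spin-off once a result looks relevant (experiment count
# past N1 and relevance above theta1, cf. Sec. 2.2).

def run_experiments(actions, n_experiments, N1, theta1, seed=0):
    random.seed(seed)
    stats = {a: {"trials": 0, "followed": 0} for a in actions}
    rules = []
    for experiment in range(1, n_experiments + 1):
        action = random.choice(actions)              # 1. random action
        # stubbed world: 'turn' is usually followed by 'heading_changed'
        followed = random.random() < (0.9 if action == "turn" else 0.1)
        st = stats[action]                           # 2. update statistics
        st["trials"] += 1
        st["followed"] += followed
        relevance = st["followed"] / st["trials"]    # 3. maybe spin off
        if (experiment > N1 and relevance > theta1
                and (action, "heading_changed") not in rules):
            rules.append((action, "heading_changed"))
    return rules                                     # 4. repeat until done

print(run_experiments(["turn", "forward"], 200, 20, 0.7))
```

With these odds, only the action that reliably produces the result spins off a rule.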
4. Problem and Experiments
The learning algorithm is now applied to the problem of robot navigation. The goals of this application are to:
* Show that the algorithm is capable of deducing a list of rules that is capable (if properly pruned) of controlling the navigational behaviour of the robot navigating an obstacle-free path planned in a given environment.
* Investigate the results of fuzzifying the context/result conditions on the execution of the deduced rule base.
4.1 The Robot Simulation and the Task to Learn
The robot has a cylindrical body and a differential drive with two independently driven motorised wheels that perform both the driving and the steering. Four castors support the mobile base on the floor (see Fig. 2).
Fig. 2 A schematic of the Mobile Robot's Base.
Steering results from driving the left and right wheels at different speeds. This arrangement enables the robot to turn around its centre.
The robot is equipped with an on-board compass, and odometry for localisation.
The robot requires two commands: linear speed and change of direction. These are separated into individual rotational speed commands for the two driving motors, which are put in a velocity closed-loop control. The global position control loop is closed by the feedback coming from the localisation system.
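The split of the two commands into wheel commands follows standard differential-drive kinematics; the function and the wheel_base parameter are our own names, not the paper's:

```python
# Standard differential-drive relation: linear speed v plus rotational
# speed omega map to left/right wheel linear speeds. wheel_base is the
# distance between the two driven wheels.

def wheel_speeds(v, omega, wheel_base):
    """v in m/s, omega in rad/s; returns (left, right) wheel speeds."""
    left = v - omega * wheel_base / 2.0
    right = v + omega * wheel_base / 2.0
    return left, right

# Turning around the centre (v = 0): wheels spin in opposite directions.
print(wheel_speeds(0.0, 1.0, 0.4))  # (-0.2, 0.2)
```

This also shows why the arrangement lets the robot turn on the spot: with zero linear speed the wheel speeds are equal and opposite.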
The task we want to learn is navigating our robot on a path consisting of straight-line segments. The learnt navigation rule base should be able to control the robot to traverse the planned path successfully.
A simulation of the kinematics and dynamics of the described mobile robot base was used for testing the learnt control rules. The robot simulation links to the FuzzyTECH 3.1 development environment [6], where the rules and the input/output membership functions (including crisp ones) can be graphically edited.
4.2 Experimental set-up for learning
The learning algorithm goes through two runs: one to discover the block of rules that are relevant to the orientation control; the second discovers the block of rules that control the linear velocity of the robot.
The learning algorithm is configured as follows:
N1 := the total number of experiments taken,
N2 := 3. Negative spin-off mechanisms are disabled.
The sets of conditions and actions used are:
5.1 Learning Algorithm Results
A series of experiments is fed to the learning algorithm. These experiments were chosen such that they cover, on a uniformly random basis, the context space of the actions concerned. The learning algorithm is run and a series of rules is produced. If the direction control action is taken as an example, we find that the following rules are produced:
right_big ^ left_slight
They were found with different reliability values, depending on the specific series of experiments taken.
The rules produced by the learning algorithm are then pruned using the criteria of:
1. Relevance to the goal (heading towards the goal),
2. High reliability.
The above rules become:
right_big ^ left_slight
With respect to rule block 1, the final list becomes (put in operational form):
For the second block of rules, those concerned with the control of the linear velocity of the robot when heading towards a goal, a number of constraints is placed on the learning algorithm:
1. Due to inertia, the robot is prevented from taking an experiment in which the speed changes suddenly; speed can only change gradually. This corresponds to real robots with dynamics, as opposed to mere kinematic simulations.
2. We prune the first list of rules according to criteria different from those used for the orientation block. These criteria are:
A. Highest reliability.
B. Maximum distance traversal at each step.
C. Zero speed at the goal.
This gives us the following list of rules:
Since the two blocks of rules are learned separately, separation is enforced in action. This is done by adding another block of rules which makes sure that the speed rules are only active when the robot is heading towards the goal. This special block is:
This means that, at execution, the robot should first execute the first set, making sure that the robot is heading in the right direction, and then the second block starts executing.
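A pruning pass of the kind applied to both rule blocks can be sketched as follows. Only the reliability criterion is shown (the distance and zero-speed criteria are task-specific), and the rule encoding and values are our own assumptions:

```python
# Illustrative pruning: among competing rules for the same context,
# keep only the one with the highest reliability.

def prune(rules):
    best = {}
    for context, action, rel in rules:
        if context not in best or rel > best[context][2]:
            best[context] = (context, action, rel)
    return list(best.values())

rules = [("right_big", "turn_right_big", 0.95),
         ("right_big", "turn_left_big", 0.10),
         ("ahead", "go_straight", 0.90)]
print(prune(rules))
```

Operationally useless rules (here, turning left when the goal is far to the right) are the low-reliability competitors that this pass discards.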
5.2 Simulation Results
Fig. 3 Navigation using Crisp Conditions.
Fig. 3 shows the robot navigating a planned path using the learnt rules with crisp membership functions for the conditions. It is clear that the robot's centre moved off the straight path segments, due to the absence of overlap between contiguous membership functions, and to their width. This is unsuitable in cluttered environments (e.g. a narrow corridor). Had the membership functions been narrower, the robot would have swung right and left of the path in a zigzag, due to the activation of exactly the same rules regardless of the required amount of direction change.
Fig. 4 Navigation using Fuzzy Conditions.
However, when appropriate fuzzy membership functions replace the crisp ones, the performance of the learnt navigation rules improves significantly, as Fig. 4 shows. This is because, as the robot gets closer to the direction of the goal, the final output of the orientation control rules is significantly reduced according to the degree of fulfilment.
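The effect can be illustrated with a toy triangular membership function; the breakpoints (0, 15 and 30 degrees) are our own, not the paper's:

```python
# A crisp condition fires at full strength anywhere in its interval, so
# the same correction is applied for a 3-degree and a 29-degree heading
# error. A triangular fuzzy condition grades the error instead.

def mu_right_slight(error_deg):
    """Triangular membership peaking at 15 deg, zero at 0 and 30 deg."""
    if 0.0 < error_deg < 15.0:
        return error_deg / 15.0
    if 15.0 <= error_deg < 30.0:
        return (30.0 - error_deg) / 15.0
    return 0.0

def crisp_right_slight(error_deg):
    return 1.0 if 0.0 < error_deg < 30.0 else 0.0

# Near the goal direction the fuzzy rule fires weakly, shrinking the
# steering output; the crisp rule still fires at full strength.
print(mu_right_slight(3.0), crisp_right_slight(3.0))  # 0.2 1.0
```

This is exactly the mechanism behind Fig. 4: the degree of fulfilment scales the orientation correction down as the heading error shrinks.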
6. Conclusions and Recommendations
The learning algorithm succeeded in finding the declarative rules that represent, in their totality, the interaction between the robot and the environment. Many of these rules were operationally useless, and had to be pruned (according to the criteria mentioned previously). Once pruned, the resulting rules (both in crisp and fuzzy forms) were effective in controlling the robot in navigation.
We have shown that the performance of the learnt schemas improves as the context conditions are fuzzified. Hence, our future work will be to make the learning schema mechanism a fuzzy one, which would be more general and capable of learning tasks in the continuous real world. Our learning mechanism, as presented in this paper, readily allows this extension.
References
[1] Dorigo M. et al. (1994) "A comparison of Q-learning and classifier systems." In From Animals to Animats 3, edited by D. Cliff et al. MIT Press, Cambridge, MA.
[2] Rescorla R. (1990) "Evidence for an association between the discriminative stimulus and the response-outcome association in instrumental learning." Journal of Experimental Psychology: Animal Behavior Processes, 16, 326.
[3] Roitblat H. (1994) "Mechanism and process in animal behavior: models of animals, animals as models." In From Animals to Animats 3, edited by D. Cliff et al. MIT Press, Cambridge, MA.
[4] Drescher G. (1991) "Made-up Minds: A Constructivist Approach to Artificial Intelligence." MIT Press, Cambridge, MA.
[5] Holland J H. (1992) "Adaptation in Natural and Artificial Systems." MIT Press, Cambridge, MA.
[6] FuzzyTECH 3.1 Software Manuals.