
Cernic, S., Jezierski, E., Britos, P., Rossi, B.

& García Martínez R.

Buenos Aires Institute of Technology

Madero 399. (1106) Buenos Aires. ARGENTINA

Keywords: genetic algorithms - robotics - adaptive control - intelligent control.


This paper presents the results of our work on the use of genetic algorithms for optimizing robot navigation controllers, specifically neural-network based ones. Both efficiency and effectiveness were measured in a set of different scenarios, and the results suggest interesting conclusions relating these indicators to the complexity of the environment.

1. Introduction

The problem of robot navigation has been approached in various ways, and it is now an accepted ‘useful’ test and model problem for both control and optimization schemes. Many solution classes have been proposed for it, but one of the most promising is behavior control based on neural networks. Due to their intrinsic properties, neural networks have been successfully applied to many problems of a very diverse nature. The purpose of this study is to evaluate the feasibility of optimizing neural network controllers by means of genetic algorithms.
The use of genetic algorithms has proved useful in a large set of optimization problems of diverse types and complexities (the Boeing engines being a most remarkable example), but the optimization of a neural network remains a demanding application of this methodology, mainly due to the high multimodality and size of the search space.
To carry out the experiments, we first selected a robot model and a simulation platform on which to measure the results. Then we designed a ‘basic’ problem for the controller to handle. Once we had defined the robot’s objective, we designed a neural network architecture for its controller and trained it on a very basic and simple set of behavior rules. With the trained network, we then created a population of nets that were subjected to genetic-algorithm-based optimization, selecting those with better performance. Once the performance gains became insubstantial, the best individual was extracted from the population and benchmarked against implementations of the model problem of increasing complexity.
2. State Of The Art

We selected the robot model used by Korsten, Kopeczak and Szapakowizc [1989], which describes the behavior of autonomous agents with respect to various environments. For the description of the environment model we rely on the models suggested by [Lozano-Perez and Wesley, 1979; Iyengar et al., 1985; Gil de Lamadrid and Gini, 1987; Mckendrick, 1988; Dudek et al., 1991; Borenstein and Korent, 1991; Evans et al., 1992], which establish two-dimensional environments for the study of learning, planning and simulation processes. The environment can be simulated as a grid in which each element represents a portion of the space, which can be an obstacle, an energy point, or a point of space passable by the robot.
The system can be described as an exploring robot that perceives its environment. From the situation in which it finds itself, the system attempts to determine a sequence of actions, called a plan, that allows it to reach a nearby objective. This sequence is presented to the plan evaluator, which determines its acceptability. The plan execution controller is entrusted with verifying that the plan is fulfilled successfully. Every movement of the robot is accompanied by a description of its environment; the conjunction of the applied action, that description, and the resulting situation constitutes the learning input of the system. If this knowledge was already learnt, it is reinforced; otherwise it is incorporated, and mutant theories are generated.
The model of the sensing system was extracted and modified from the one proposed by [Mahadevan and Connell, 1992], who suggested a system of 24 sectors distributed in three levels. García Martínez [1992c] suggested that the model should instead present 8 sectors organized in two levels and distributed in three regions: a Lateral Left, a Frontal and a Lateral Right region. The Frontal region is divided vertically into two subregions. As mentioned above, each region possesses two range levels: a near sensing level and a distant sensing level. The sensing system thus possesses eight sectors, each corresponding to a binary value; together these values form a set that describes the perception of a situation.
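As a minimal sketch of this sensing model (the names, the sector ordering, and the obstacle-set interface are our assumptions; the original system's representation is not given, and the paper contains no code, so this and later sketches use Python):

```python
# Sector layout: three regions (Lateral Left, Frontal, Lateral Right), the
# Frontal region split vertically in two, and each region with a near and a
# distant sensing level. Names and ordering are our assumptions.
SECTORS = [
    ("lateral_left", "near"), ("lateral_left", "distant"),
    ("frontal_left", "near"), ("frontal_left", "distant"),
    ("frontal_right", "near"), ("frontal_right", "distant"),
    ("lateral_right", "near"), ("lateral_right", "distant"),
]

def perceive(active):
    """Map the set of sectors currently sensing something to the 8-bit
    situation descriptor the text mentions (1 = detection in that sector)."""
    return tuple(1 if s in active else 0 for s in SECTORS)

bits = perceive({("frontal_left", "near"), ("lateral_right", "distant")})
```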
3. Problem Identification

To have consistent measures of the robot
performance, a constant controller architecture
and problem class was designed. These are
described below.

3.1. Problem Class

From those available on the market, the ‘Kheppera’ simulation platform was chosen. In this environment, we built ‘worlds’ with two elements: obstacles (bricks) and lights.
In a typical world, there are four lights uniformly distributed, and a number of walls, corridors, and obstacles. The robot’s objective in a given world is to find all four lights in the least possible time (cycles). Once the robot finds a given light, that light stops registering on its sensors, and so the robot takes off in search of another. This process goes on until all four lights are found or a given maximum time has been reached.
The created worlds are divided into complexity classes, given by the amount of bricks in each one of them. This measure was chosen because it is proportional to the difficulty, or amount of obstacles, that the robot will find in its way. Although it may seem that a larger amount of bricks in equally-sized worlds implies that the robot obtains more information about the world through its sensors (and therefore solves the problem faster), this is not so, because the bricks themselves block the light sources and because the relative brick-light configurations were varied from world to world.
Complexities of 10, 30 and 50 bricks per world were taken into consideration for the benchmarking of the controller.

3.2. Controller Architecture

In the Kheppera environment a robot has infrared distance and light intensity sensors, and two motors to which it can apply power in either direction. The input of the eight sensors has a noise level of approximately 10%, and the two measure types (light and distance) have different resolutions.
The designed controller does not include any kind of ‘memory’ or state retention (internal or external), so it works in an instantaneous or stateless mode: it takes an input and decides how to operate each motor (what power to apply) based only on that input.
The controller output is therefore these
power values, which are applied to the robot’s left
and right motors. It was decided that we would
have only the four basic outputs: going forward,
backwards, and turning left or right on the spot.
These four movements are accomplished
combining equal and/or opposite power values on
both motors.
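The four basic movements described above can be sketched as (left, right) motor power pairs; the power magnitude is our assumption, only the equal/opposite sign pattern comes from the text:

```python
# The four basic movements as (left, right) motor power pairs; the power
# magnitude P is an assumed placeholder value.
P = 1.0
MOVES = {
    "forward":    ( P,  P),   # equal positive power on both motors
    "backward":   (-P, -P),   # equal negative power
    "turn_left":  (-P,  P),   # opposite powers spin the robot on the spot
    "turn_right": ( P, -P),
}
left, right = MOVES["turn_left"]
```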
To simplify and reduce the controller size, the high input resolution provided by the sensors was reduced. The final resolution was set to eight distance levels and four light intensity levels. Based on both of these inputs for each of the eight sensors, the robot decides which of the four movements to make.
Taking into account these world and
controller designs, a solution using a neural
network and genetic algorithms was proposed,
and, after the optimization process, we observed
if, and with what efficiency, the robot
accomplished the given objectives.
The genetic algorithms and neural
networks were implemented in the C Language,
running on a Linux / X-Windows box –so as to be
able to function with the Kheppera simulator.
Some solution design decisions, which are shown in the next section, were influenced by the OS and environment limitations.

4. Proposed Solution

As described earlier, the proposed work
scheme was to build a neural network, train it
with a minimum set of rules, optimize it using
genetic algorithms and then benchmark the best
controller on a number of different worlds. In this section, we will show the decisions taken and the specific parameter choices in each of these steps.
To start with, the controller is a feed-forward neural network with three fully connected layers. As mentioned above, the controller input consists of the measurements from the 8 light sensors and 8 distance sensors, with 4 and 8 resolution steps respectively. From this we conclude that the input to the neural network is 40 bits:
24 (8 x 3) for distance and 16 (8 x 2) for light. A
straightforward binary coding from the simulator-
provided value to the appropriate neural net input
was chosen for simplicity.
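The straightforward binary coding described above might look like the following sketch; the function name and the bit order within each reading are our assumptions:

```python
def encode_inputs(distances, lights):
    """Code 8 distance readings (levels 0-7, 3 bits each) and 8 light
    readings (levels 0-3, 2 bits each) into the 40-bit net input."""
    bits = []
    for d in distances:                      # 8 x 3 = 24 distance bits
        bits += [(d >> i) & 1 for i in (2, 1, 0)]
    for l in lights:                         # 8 x 2 = 16 light bits
        bits += [(l >> i) & 1 for i in (1, 0)]
    return bits

x = encode_inputs([7] * 8, [3] * 8)
```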
The controller’s outputs are the
directions of both left and right motors, so each
motor output was coded as a single bit: 1 means
‘forward’, and 0 means ‘backwards’. This gives a
total of 2 bits for the neural net output.
In the hidden layer, we placed 20 neurons. The neuron outputs are continuous values between –1 and 1 and the axon weights are unbounded. Only one bias neuron, valued at +1, was used per layer.
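A minimal sketch of a forward pass through this 40-20-2 architecture, assuming tanh activations (which keep outputs in (-1, 1) as stated) and random placeholder weights; the trained weight values are of course not reproduced here:

```python
import math
import random

def forward(x, W1, W2):
    """One pass through the 40-20-2 feed-forward net: each layer receives
    one extra +1 bias input, which gives the 41x20 and 21x2 weight shapes
    mentioned later in the text."""
    h = [math.tanh(sum(w * v for w, v in zip(row, x + [1.0]))) for row in W1]
    y = [math.tanh(sum(w * v for w, v in zip(row, h + [1.0]))) for row in W2]
    return [1 if o > 0 else 0 for o in y]  # per motor: 1 = forward, 0 = backward

rng = random.Random(0)
W1 = [[rng.uniform(-1, 1) for _ in range(41)] for _ in range(20)]
W2 = [[rng.uniform(-1, 1) for _ in range(21)] for _ in range(2)]
out = forward([0.0] * 40, W1, W2)
```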
Once we had designed the neural net, we
proceeded to train it with approximately 30 basic
behavior rules (which were mainly instructions so
as to prevent head-on collisions to walls, to turn
when obstacles were encountered and to chase
lights). This training was used using classic
backpropagation (with
=0.3), with the only
addition of non-compensated inertia (with
and of the inclusion of a random noise in the
training process (which altered a randomly chosen
axon value) with a probability of 0.0005 per
cycle. The training process continued until we
obtained a MSE of 0.01, achieved after
approximately 150 training cycles.
Once we had this net with trained
weights, we structured the chromosome that
would be processed by the genetic algorithm.
Each axon weight was coded using 20 bits, so our
neural net of 40, 20 and 2 neurons (with its 41x20
+ 21x2 =862 axons) ended up coded as a 17240
bits long chromosome.
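A hedged sketch of the 20-bit-per-weight chromosome coding; the representable weight range [-10, 10] and the linear quantization scheme are our assumptions, since the paper does not specify them:

```python
WEIGHT_BITS = 20
N_WEIGHTS = 41 * 20 + 21 * 2      # 862 axon weights, as in the text
LO, HI = -10.0, 10.0              # assumed representable weight range

def encode(weights):
    """Quantize each weight to 20 bits and concatenate them into one
    chromosome (862 x 20 = 17240 bits)."""
    chrom = []
    for w in weights:
        q = round((w - LO) / (HI - LO) * (2 ** WEIGHT_BITS - 1))
        chrom += [(q >> i) & 1 for i in range(WEIGHT_BITS - 1, -1, -1)]
    return chrom

def decode(chrom):
    """Inverse mapping: 20-bit groups back to real-valued weights."""
    ws = []
    for i in range(0, len(chrom), WEIGHT_BITS):
        q = int("".join(map(str, chrom[i:i + WEIGHT_BITS])), 2)
        ws.append(LO + q / (2 ** WEIGHT_BITS - 1) * (HI - LO))
    return ws

chrom = encode([0.0] * N_WEIGHTS)
```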
Now we had to optimize the controller
behavior using genetic algorithms. We decided to
use the traditional non-reordering operators
(mutation and cross-over). Interesting investigations might be carried out to determine whether the inclusion of reordering operators (such as PMX and inversion) increases the optimization effectiveness and efficiency.
We used a population of 20 individuals, with a per-bit mutation probability of Pm = 0.00002 and a crossover probability of Pc = 0.7 (crossover chance per parent couple).
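The two traditional operators with the stated probabilities might be sketched as follows; single-point crossover is an assumption, as the paper does not name the exact variant:

```python
import random

P_MUT = 0.00002   # per-bit mutation probability from the text
P_CROSS = 0.7     # crossover probability per parent couple

def mutate(chrom, rng):
    """Flip each bit independently with probability P_MUT."""
    return [b ^ 1 if rng.random() < P_MUT else b for b in chrom]

def crossover(a, b, rng):
    """Single-point crossover between two equal-length chromosomes."""
    if rng.random() >= P_CROSS:
        return a[:], b[:]             # no crossover for this couple
    cut = rng.randrange(1, len(a))
    return a[:cut] + b[cut:], b[:cut] + a[cut:]

rng = random.Random(42)
c1, c2 = crossover([0] * 100, [1] * 100, rng)
```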
We then modified the Kheppera simulator so it would allow continuous and unattended operation of different robots (different controllers, really), so that the evaluation of each individual could be as realistic as possible. For the selection process, a score based on performance (a function of how well it did finding lights and how many collisions it had) was given to each individual. The performance measure of an individual was obtained by setting it as the robot controller, inserting it in a test world, letting it run for 15,000 cycles and then assigning it a score based on its achievements. The score-assigning function is summarized in the following table.

Performance scoring used in genetic optimization:

  Base score:          3750   (15000 / 4)
  Collision penalty:   applied on each cycle the robot collides
  Objective bonus:
    Finding 1 light    8250   (15000 x (0.25 + 0.3))
    Finding 2 lights  13500   (15000 x (0.50 + 0.4))
    Finding 3 lights  18750   (15000 x (0.75 + 0.5))
    Finding 4 lights  24000   (15000 x (1.00 + 0.6))
After 60 generations of unattended execution (which took roughly 2 days of evaluation on the machine we could get), we extracted the best individual in the population and prepared the controller for the performance benchmark.

5. Results and Conclusions

To benchmark the performance of the best genetic-algorithm-produced individual, we modified the Kheppera simulator again so it would allow us to use this controller, set it on a preset initial position, point it in a certain direction, and then let it run until it either found the four lights or 30,000 cycles elapsed. If the controller found all 4 lights, the time used (in cycles) was stored. If, on the other hand, after 30,000 cycles the controller was still searching, we stored which lights it had found and then proceeded with the next evaluation.
For the benchmark, we used worlds that fell into 3 complexity levels. Recall that complexity levels 1, 2 and 3 had 10, 30 and 50 bricks respectively. For each complexity level 3 worlds were built, each with the 4 lights and the appropriate number of bricks.
Four points were chosen on each world to be the starting spots of the robot for each measurement. We also decided to start the robot facing 4 initial directions (North, East, South and West) at each of these startup points, so as to make sure that no particular direction would prove useful in some worlds and detrimental in others.
Measurements were then taken for each complexity, each world, and each of the 16 initial states (4 positions x 4 directions). In each of these situations, 4 sample measurements were taken, for a total of 576 simulation runs.
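The measurement grid described above multiplies out as follows (the tuple layout is just illustrative):

```python
# The benchmark grid: 3 complexity levels x 3 worlds each x 4 start points
# x 4 headings x 4 repeated samples = 576 simulation runs.
runs = [(comp, world, point, heading, sample)
        for comp in (1, 2, 3)
        for world in range(3)
        for point in range(4)
        for heading in ("N", "E", "S", "W")
        for sample in range(4)]
```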
Two result sets are shown. The first
compares the performance in terms of the
achievements – i.e. how many lights were found.
So the comparisons say, for example, how many
robots found four lights, or only three lights, etc.
The second set makes a comparison of
cycles used to find all four lights (for those robots
that did, of course). This would be a measurement
of controller efficiency in the successful case.
The following graph (“Avg. Lights Found by Complexity”) allows us to start the analysis. This graph shows the average amount of lights that were found over all measurements, for all worlds of a certain complexity. The first interesting result is that the choice of complexity measurement (amount of bricks) proved successful, refuting the suspicion that with a larger amount of blocks the problem would prove easier to solve.

Fig. 1: Average lights reached for all worlds of a certain complexity (complexity 1: 3.83; complexity 2: 3.78; complexity 3: 3.16).

Delving a bit deeper into these results, we show “Total Lights Found per Individual”. Here we can see how many individuals found only 1 light, only 2, etc., for each complexity. It is interesting to highlight the shift of the maximum point (located at 4 lights for complexities 1 and 2) towards 3 lights in the high-complexity case. This suggests a relationship between the mode of the lights found and complexity. Perhaps with a larger range of complexities a trend could be clearly exposed.

Total Lights Found per Individual

            1 light  2 lights  3 lights  4 lights
  Comp. 1      3        2         19       168
  Comp. 2      4        1         29       158
  Comp. 3     10       16         99        67

Fig. 2: Amount of lights found per robot for different complexities.

A more easily understood trend appears when one graphs the cumulative version of the previous chart: for a certain complexity, not the number of robots that found exactly N lights, but those that found at least N, even if they later found more. Thus for 1 light we get 192 individuals in each complexity (which means that every robot found at least one light), and for complexity 3 we get the minimum mark for four lights (67, which is 34% of the maximum possible value).
As in the previous case, we can see a trend in the leftwards shift of the inflection point (the point where there is a considerable drop in performance) with increasing world complexity. As mentioned before, with a larger number of complexity classes this trend might have been more evident.

Cumulative Lights Found per Individual

            >=1 light  >=2 lights  >=3 lights  4 lights
  Comp. 1      192        189         187        168
  Comp. 2      192        188         187        158
  Comp. 3      192        182         166         67

Fig. 3: Cumulative graph of lights found, per complexity.
We will now start analyzing the controller efficiency. We previously defined this as the time used to find all four lights, in those cases in which all four lights were found. This quantity is expressed in cycles, and it ranges from 1 to 30,000.
An interesting result is shown in the following chart, “Number of Successful Runs”, which shows the amount of individuals that found all four lights in each of the fifteen 2000-cycle time spans.

Fig. 4: Amount of robots that reached the goal, in 2000-cycle long time spans.

The chart shows a shift of the start of the curve as complexity increases, which seems to suggest that the minimum time needed to reach the goal is a function of it.
Another interesting issue is that once the first 14,000 cycles have elapsed, the difference between complexities is small and noisy. The interesting point is that this would seem to suggest that, from a certain point onwards in this experiment type, performance stops being a function of complexity and becomes more a matter of chance. It would also suggest that our choice of 15,000 cycles for the individual evaluation (during the genetic algorithm optimization phase) was lucky, to say the least.
However, could it be that this efficiency drop was caused by the fact that all controllers were given only 15,000 cycles to show their performance? With this question in mind, we proceeded to observe the workings of the controller in various cases, trying to see something that would confirm this hypothesis or deny it. But given a number of factors, such as the fact that the controller is ‘stateless’ and that the performance drop was due more to the robot entering behavior cycles (both large and small) than to anything else, we dropped the theory. Perhaps increasing the evaluation time might have raised the performance limit a bit, by eliminating those controllers with a tendency to enter large cycles, but the difference would have been minimal.
Finally, we show the overall measurement results for all three complexities (see “Min., Avg. and Max. Times per Complexity”). Here we show how the minimum and average times required to reach the goal increase with complexity. It is interesting to point out that the maximum time would seem not to be a function of the complexity of the problem, setting aside the probably random smaller time for complexity 3.

Fig. 5: Minimum, average and maximum times used per complexity (axis scale 0 to 30,000 cycles).

6. References

Carbonell, J., Michalski, R. and Mitchell, T. 1983. Machine Learning: The Artificial Intelligence Approach, Vol. I. Morgan Kaufmann.
Carbonell, J., Michalski, R. and Mitchell, T. 1986. Machine Learning: The Artificial Intelligence Approach, Vol. II. Morgan Kaufmann.
Fritz, W. 1984. The Intelligent System. ACM SIGART Newsletter, No. 90, October.
Fritz, W., García Martínez, R., Blanqué, J., Rama, A., Adobbati, R. and Sarno, M. 1989. The Autonomous Intelligent System. Robotics and Autonomous Systems, Vol. 5, No. 2, pp. 109-125. Elsevier.
García Martínez, R. 1990. Un Algoritmo de Generación de Estrategias para Sistemas Inteligentes Autónomos. Proceedings II Iberoamerican Congress on Artificial Intelligence, pp. 669-674. LIMUSA. México.
García Martínez, R. 1992. Aprendizaje Basado en Formación de Teorías sobre el Efecto de las Acciones en el Entorno. Master Thesis. Artificial Intelligence Department, School of Computer Science, Polytechnic University of Madrid.
García Martínez, R. 1993a. Aprendizaje Automático basado en Método Heurístico de Formación y Ponderación de Teorías. Tecnología, Vol. 15, pp. 159-182. Brasil.
García Martínez, R. 1993b. Heuristic theory formation as a machine learning method. Proceedings VI International Symposium on Artificial Intelligence, pp. 294-298. LIMUSA. México.
García Martínez, R. 1993c. Heuristic-based theory formation and a solution to the theory reinforcement problem in autonomous intelligent systems. Proceedings III Argentine Symposium on Artificial Intelligence, pp. 101-108. Science and Technology Secretary Press. Argentina.
García Martínez, R. 1993d. Measures for theory formation in autonomous intelligent systems. Proceedings RPIC'93, pp. 451-455. Tucumán University Press. Argentina.
Goldberg, D. 1993. Genetic Algorithms in Search and Optimization. Addison-Wesley.
Kodratoff, Y. and Carbonell, J. 1990. Machine Learning: The Artificial Intelligence Approach, Vol. III. Morgan Kaufmann.
Matheus, C. Feature Construction: An Analytic Framework and An Application to Decision Trees. Ph.D. Thesis.