Decision-Making Systems in Mobile Robotics

Ján Vaščák
Technical University of Košice, Slovakia
Abstract. This chapter describes means for decision-making in mobile robotics as well as several integrated decision-making systems. The examples are mostly oriented to robotic soccer, because it allows a whole range of robotic problems to be addressed, from optimal control of individual actuators up to the construction of complex multiagent systems for teams of robots. The spectrum of decision-making means used in robotics today is very broad and it is not possible to mention them all. Therefore, only means that seem promising for robotic decision-making processes are described, namely naive Bayes classifiers, decision trees, Markov decision processes and fuzzy cognitive maps. Based on these means, several autonomous decision systems are presented.
Keywords. decision-making systems, mobile robotics, robotic soccer, multiagent systems
Introduction
Robots are devices of diverse construction which perform various tasks and work in manifold environments. Therefore, not only are many different decision and control structures possible, but the definition of the notion of decision can also vary from one application to another. Besides this notion we also meet expressions like behaviour, strategy choice, or simply control, because the border line between decision and control is not clear either. This is caused not only by the differing definitions of researchers but also by the fact that many mutually interconnected processes are performed, and separating them would often be questionable.
Mobile robots, and especially humanoids, represent the top level of robotics. Because of their complexity they are the best objects for depicting all possible robotic activities and related tasks such as sensing (especially vision), low-level control, image recognition, localization, strategy choice, navigation and planning. Decision is contained in some form in all of the mentioned activities. Therefore, in this chapter we will concentrate on mobile robots. Robots can be placed either in static or dynamic environments, where the elements of the environment change in time. Further, they work individually or as members of a team, which requires additional abilities to communicate as well as to cooperate with each other. To show the basic information modules of such a robot, the relations among them and possible approaches to implementing decision-making methods suitable for robotics, it is useful to propose a complex task in which all these elements are present. The development of such tasks, of the means for their solution and of their composition into a meaningful whole led to the establishment of robotic soccer, which will mostly serve as our benchmark problem, on which we will show individual means as elements of complex decision-making systems.
Figure 1 shows the basic information modules and relations divided into individual Decision Levels (DL) from DL-0 to DL-2. These DLs differ not only by their subordination (DL-0 as the lowest and DL-2 as the highest) and their tasks but also by the means used.
Figure 1. Basic information structure of a mobile robot configured by decision levels DL-0 – DL-2.
Control can also be viewed as a special kind of decision-making, although its methods are far from human imaginations about decision-making. However, the required reaction times for the lowest DL are very short, so DL-0 utilizes mostly 'classical' approaches like PID controllers and Kalman filters based on an analytical description of the inverse kinematics task by physical equations. Another advantage of these methods is that they allow precise analyses, mainly concerning stability and robustness. These means are used for creating fundamental motion primitives such as stepping, turning of wheels or obstacle avoidance, which are then combined by instructions from higher DLs. A special but not the least module is reactive navigation, which solves collisions; because of the very quick reactions required it is placed into this DL. Processing sensor data can be divided into two basic groups: (a) basic (rough) processing and (b) advanced processing. Especially for mobile robots, visual data from cameras are the most significant, although data processing is not limited to vision. Similarly to motion control, the first group is based on primary operations like filtering, image segmentation, edge detection using gradients, etc. In the case of vision, the output is a compressed image with potentially extractable (but not yet extracted) and highlighted information relevant for decision-making. As can be seen, DL-0 modules use methods and mathematical calculi that are the most foreign from the viewpoint of a human and represent a special independent research area. However, they stand at the beginning of the whole decision-making chain, in the form of basic sensor data processing, as well as at its end, in the form of motion control, and therefore omitting them would harm the completeness of this introductory overview. As the means used in DL-1 and DL-2 have much more in common, we will further deal with these two levels.
DL-1 is built on top of DL-0 and enables a robot to act independently. The approaches used reflect human manners much more than in the case of DL-0. Many of them belong to the area of artificial intelligence, like decision trees, reinforcement learning, rule-based systems, fuzzy logic, cognitive maps, etc. The ability of adaptivity and self-learning has become an inseparable part of this DL, too. Thus we can include all adaptation methods, e.g. evolutionary algorithms and neural networks. The advanced sensor data processing is practically the second stage after the basic one is done. The compressed image is still in the form of raw data. However, we need to extract e.g. obstacles, targets, paths, etc. from the image, i.e. firstly to recognize them, secondly to localize them and eventually to describe them in more detail. In reality, we need to extract from a pixel matrix a scene with relevant objects and their mutual relations, which constitutes the extracted image information. Based on this information we can make decisions. Basically, we can divide decision-making into two stages (in simpler designs they merge into one): a strategic and a tactical one. Drawing a border line between these two stages is rather speculative. It depends on individual designs, but roughly we can think about strategy in relation to behaviour. For instance, in soccer (football) there are defenders, attackers, goalies, etc. In principle, any robot can take any of these roles. Their assignment will depend only on the type of strategy, e.g. a defensive or an offensive one. For successful performance of the strategy choice module the situation evaluation is the most important. The resulting activities from this module, like planning of activities, choice of a convenient path, etc., belong to the subordinated tactical stage. If we consider only one robot in a given environment without any other ones, DL-1 will be the top level and no additional modules are necessary.
However, if there are at least two robots performing their tasks for the same goal, they need to communicate and to cooperate. In such a case we get a Multiagent System (MAS), which represents the highest level DL-2 and to a large degree supersedes the strategic stage in DL-1. For solving complex robotic tasks MAS becomes an intrinsic part of any decision-making design and therefore we will deal with MAS in a separate section. Of course, all data, information and control flows create a very complex structure and hence require special programming techniques for their handling, like memory sharing, assigning priorities, etc. Such a managing system represents a decision-making system in itself. However, its description requires an extension to programming techniques and therefore it will not be a subject of this chapter.
In the next section (Section 1) some basic means used in decision-making systems will be described. As our benchmark problem is robotic soccer, we will concentrate our attention on the means used in this area. Nowadays, two main associations exist which deal with robotic soccer: FIRA (http://www.fira.net/) and RoboCup (http://www.robocup.org/). Their characterization and their common as well as differing features are described in [1]. These associations offer a number of various competitions and we will focus particularly on the Standard Platform League (SPL), see (http://www.tzi.de/spl/bin/view/Website/WebHome), where descriptions and reports of participating teams can be found; they will be cited as RoboCup papers. Although all designs are hybrids of several mixed approaches, one fact is obvious: more than half of the teams utilize means of artificial intelligence to a lesser or greater degree, and their popularity is steadily growing. This is the reason why our descriptions will concentrate mainly on artificial intelligence means. After that, some autonomous robotic decision-making systems composed of these means will be featured in Section 2.
1. Means for Decision-Making in Robotics
Decision-making processes need two basic groups of means: (a) process description and (b) object description. The means of the first group are mostly graph-based approaches like decision trees, Markov chains and the related Markov decision processes, cognitive maps, finite automata and Petri nets, which enable observing information flows as well as transitions among states (situations) caused by actions. Objects (the second group) can be of diverse nature and therefore we need a palette of convenient means for their description, e.g. probabilistic and fuzzy models or emotional models for man–machine interaction. Usually, objects are projected into graph nodes by the means of the first group. Of course, such a division of means is fictitious in practice and they are mutually interconnected; it should only help us to realize the nature of these two aspects. All these approaches are able to describe most parts of the same information. The only difference is their knowledge representation, and each type of knowledge representation has some strengths which are useful for a given application.
As the fundamental knowledge representation the rule base can be considered, and the others may be derived from it. As an example we can mention the rule base of a goal keeper [2]:
1. Search the ball.
2. Calculate direction and distance to the ball.
3. If the ball is far then go to the center of the goal.
4. If the ball is close and straight then wait.
5. If the ball is close, not moving and straight then go to kick it.
6. If the ball is close and not straight then walk sideways.
Figure 2. A rule set of the goal keeper's behaviour.
Figure 2 depicts only a very simple and vague rule base. The descriptions of notions like walk sideways or close are missing, the problem of obstacles in the form of other players is not solved, etc. Besides, rules usually create decision (implication) chains and even closed loops. We can see that adding further rules would make such a rule base hard to read. This is just one reason why other, more advanced means for describing the decision-making process are necessary. In the remaining parts of this section some of the mentioned means will be described using examples from robotic soccer applications, as in the illustrative sketch below.
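The rule base in Figure 2 can be coded directly as a small decision procedure. The following is a minimal, hypothetical transcription in Python; the thresholds and the returned action names are illustrative assumptions, not part of the cited design [2].

import math

def goalie_action(ball_visible, distance, angle, ball_speed,
                  far_threshold=2.0, close_threshold=0.5, straight_angle=0.15):
    """Minimal rule-based goal keeper (illustrative thresholds in metres/radians)."""
    if not ball_visible:
        return "search_ball"                      # rule 1
    # rule 2 is implicit: distance and angle are assumed to be already computed
    if distance > far_threshold:
        return "go_to_goal_center"                # rule 3
    straight = abs(angle) < straight_angle
    if distance <= close_threshold and straight:
        if ball_speed < 1e-3:
            return "kick"                         # rule 5
        return "wait"                             # rule 4
    if distance <= close_threshold and not straight:
        return "walk_sideways"                    # rule 6
    return "wait"                                 # fallback for uncovered cases

print(goalie_action(True, 0.3, math.radians(30), 0.0))   # -> "walk_sideways"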
1.1. Fuzzy Naive Bayes Classifier
In any task there is a set of possible actions and the robot decides which of them will be performed next, e.g. whether to kick the ball at the goal or rather to pass it to a teammate, etc. We must also take into account the fact that robots are strongly limited by computational power and sampling times (the time intervals within which robots must react), which calls first of all for simple solutions. Naive Bayes Classifiers (NBC) correspond exactly to these requirements. Although their main area of use is pattern recognition, some applications appear in robotics as well.
Let us suppose a set (class) of possible actions C = {c_1, ..., c_k}. If any action is to be performed, some conditions must be fulfilled, which are described by feature variables X_1, ..., X_n. Each X_i has a finite set of values x_i ∈ ⟨x_{i,min}, x_{i,max}⟩. If the feature variables reach certain values, which can be measured, then we can ask about the possibility that C = c_j will be chosen. Hence the conditional probability of occurrence of C = c_j, called the conditional model and written generally as P(C | X_1, ..., X_n), can be computed using Bayes' rule as

P(C \mid X_1, \dots, X_n) = \frac{P(X_1, \dots, X_n \mid C) \cdot P(C)}{P(X_1, \dots, X_n)}    (1)
As the denominator in (1) is a constant independent of any given action, it can be treated as a normalization value and omitted in the following without loss of correctness. In addition, if we suppose the feature variables are mutually independent, then (1) can be rewritten as

P(C \mid X_1, \dots, X_n) = P(C) \cdot \prod_{i=1}^{n} P(X_i \mid C)    (2)
The expression (2), known as the naive Bayes probabilistic model, shows that we can handle one-dimensional distributions instead of the compound one in (1); hereby the task is simplified, computational effort is spared and the so-called curse of dimensionality is obviated. Moreover, even if the conditional independence assumption is not fulfilled, which is very often the case in practice, the NBC will mostly still work well. Robustness is a typical feature of NBC.
An NBC combines its probabilistic model with a decision rule. In this case we search for the hypothesis which is the most probable. Therefore, we use the so-called maximum a posteriori decision rule and the NBC is defined as the function

NBC(a) = \arg\max_{c_j \in C} P(c_j) \cdot \prod_{i=1}^{n} P(x_i \mid c_j),    (3)

where X_i = x_i, C = c_j and a is a combination of the feature variables X_i with given values x_i, i.e. a = {X_1 = x_1, ..., X_n = x_n}, which represents the example being classified by the NBC. The output of the classifier is the action c_j for which (3) attains its maximum value. The NBC can be rewritten into a set of k rules in which the ranges of the given feature variables are defined. If a not yet classified example enters the NBC, evaluating (3) will result in the action c_j with the highest probability. In the case of deciding which of three teammates a ball should be passed to, there will be three rules (one for each teammate) with nine intervals for the feature variables, namely the distances D_PB, D_OB, D_T1, D_T2, D_T3 and the angles φ, φ_1, φ_2 and φ_3, as shown in Figure 3.
Figure 3. Feature variables of the NBC for passing; P – passer, T – teammate, O – opponent, B – ball.
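To make the decision rule (3) concrete, the following minimal Python sketch evaluates the maximum a posteriori rule over a small table of pre-estimated probabilities. The probability values and action names are purely illustrative placeholders, not taken from the cited experiments.

import math

# Illustrative, hand-made model: P(c_j) and P(x_i | c_j) for discretized features.
priors = {"pass_T1": 0.4, "pass_T2": 0.35, "pass_T3": 0.25}
likelihoods = {                      # P(feature value | action), per feature
    "pass_T1": {"D_T": {"short": 0.7, "long": 0.3}, "angle": {"close": 0.6, "wide": 0.4}},
    "pass_T2": {"D_T": {"short": 0.4, "long": 0.6}, "angle": {"close": 0.5, "wide": 0.5}},
    "pass_T3": {"D_T": {"short": 0.2, "long": 0.8}, "angle": {"close": 0.3, "wide": 0.7}},
}

def nbc_classify(example):
    """Return the action maximizing P(c_j) * prod_i P(x_i | c_j), cf. Eq. (3)."""
    best_action, best_log_p = None, -math.inf
    for action, prior in priors.items():
        log_p = math.log(prior)
        for feature, value in example.items():
            log_p += math.log(likelihoods[action][feature][value])
        if log_p > best_log_p:
            best_action, best_log_p = action, log_p
    return best_action

print(nbc_classify({"D_T": "short", "angle": "close"}))   # -> "pass_T1"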
A classical NBC supposes discrete intervals x_i ∈ ⟨x_{i,min}, x_{i,max}⟩ for the feature variables, but in reality we operate with continuous quantities (in our case distances and angles). Further, it could also be advantageous to deal with linguistic labels of these variables, like e.g. short distance or wide angle. The definitions of short and wide would each cover a range of values. Thus we get a finite number of linguistic values which cover the intervals of the feature variables continuously. In such a manner we obtain a fuzzy variant of the NBC (FNBC). In addition, fuzzy systems are typically robust, which manifests itself in this case as a continuous transition from one action to another. In [3] the following modification of the NBC (3) is proposed:

FNBC(a) = \arg\max_{c_j \in C} P(c_j) \cdot \prod_{i=1}^{n} \sum_{x_i = x_{i,min}}^{x_{i,max}} P(x_i \mid c_j) \cdot \mu_{Lx_i},    (4)

where \mu_{Lx_i} is a membership function and Lx_i stands for a linguistic label like short distance or wide angle, see Figure 4.
NBCs as well as FNBCs have another very important advantage: they can be constructed automatically from a relatively small number r of examples e = {X_1 = x_1, ..., X_n = x_n, C = c_j} from the training set L (e ∈ L) using supervised learning, i.e. we know the desired relations between inputs (feature variables X_i) and outputs (actions c_j). Precisely for this reason the transformation (1) was necessary. To calculate (4) the following formulae were proposed:
Figure 4. Membership functions for the feature variables: (a) distance D_PB, (b) distances D_OB and D_{T1B} – D_{T3B}, (c) angle φ, (d) angles φ_1 – φ_3; S – small, M – medium, L – large, C – close, W – wide [4].
P(C = c_j) = \frac{1 + \sum_{e \in L} \mu^{e}_{c_j}}{r + k},    (5)

P(X_i = x_i) = \frac{1 + \sum_{e \in L} \mu^{e}_{Lx_i}}{r + nmf_i},    (6)

P(X_i = x_i \mid C = c_j) = \frac{1 + \sum_{e \in L} \mu^{e}_{Lx_i} \cdot \mu^{e}_{c_j}}{nmf_i + \sum_{e \in L} \mu^{e}_{c_j}},    (7)

where nmf_i is the number of all defined membership functions for the feature variable X_i and \mu^{e}_{c_j} is the truth value of an action c_j in a given example e. For correctness we require all membership functions to form a fuzzy partition, i.e. \sum_{\forall i} \mu_i(u) = 1 for every u ∈ U (U being a general universe of discourse).
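The counting formulae (5)–(7) are essentially Laplace-smoothed frequency estimates weighted by membership degrees. The sketch below shows one possible way to compute them for a single feature variable; the membership functions and training examples are invented for illustration only and are not the ones used in [3,4].

# Hedged sketch of Eqs. (5)-(7): fuzzy-weighted, Laplace-smoothed estimates.
# Two membership functions for one feature ("distance"), forming a fuzzy partition.
def mu_short(d):  return max(0.0, min(1.0, (2.0 - d) / 2.0))
def mu_long(d):   return 1.0 - mu_short(d)            # partition: degrees sum to 1

memberships = {"short": mu_short, "long": mu_long}
actions = ["pass", "kick"]                             # k = 2

# Training set L: (distance value, chosen action); invented numbers.
L = [(0.5, "pass"), (1.0, "pass"), (2.5, "kick"), (3.0, "kick"), (1.5, "pass")]
r, k, nmf = len(L), len(actions), len(memberships)

def p_action(c):                                       # Eq. (5)
    return (1 + sum(1.0 for _, a in L if a == c)) / (r + k)

def p_value(lx):                                       # Eq. (6)
    return (1 + sum(memberships[lx](d) for d, _ in L)) / (r + nmf)

def p_value_given_action(lx, c):                       # Eq. (7)
    num = 1 + sum(memberships[lx](d) for d, a in L if a == c)
    den = nmf + sum(1.0 for _, a in L if a == c)
    return num / den

for c in actions:
    print(c, round(p_action(c), 3),
          {lx: round(p_value_given_action(lx, c), 3) for lx in memberships})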
In [4] a slightly modified variation of the FNBC for passing a ball is described, which was used in [2] on a simulation platform. The scenario in Figure 3 is simplified to only one teammate. A series of experiments for various configurations of ball, passer, opponent and teammate is performed. At the beginning of each experiment the feature variables are recorded (D_PB, D_OB, D_T and the corresponding angles). Then both the opponent and the teammate try to win the ball. If the teammate gets the ball, the experiment is tagged as success, otherwise as miss; the classification is thus dichotomic (either success or miss). After the learning stage the classifier is applied in a play with four teammates and four opponents in the following way. The passer uses the proposed FNBC to evaluate individual duels, i.e. one teammate versus one opponent, over all possible combinations. The lowest probability of success is stored for each teammate and finally the teammate with the maximum of these probabilities is chosen, which is the so-called min-max approach. In spite of the fact that the real play differs from the one in the learning stage, the quality of such a classifier is satisfactory, which again proves the robustness of the FNBC.
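The min-max selection step can be written compactly: for each teammate, take the worst-case (lowest) success probability over all opposing defenders, then pass to the teammate whose worst case is best. A small Python sketch, with a placeholder success-probability function standing in for the learned FNBC, might look as follows.

def choose_pass_target(teammates, opponents, p_success):
    """Min-max pass selection: maximize the worst-case success probability.

    p_success(teammate, opponent) is assumed to return the FNBC estimate of a
    successful pass to `teammate` when contested by `opponent` (placeholder here).
    """
    best_mate, best_worst_case = None, -1.0
    for mate in teammates:
        worst_case = min(p_success(mate, opp) for opp in opponents)
        if worst_case > best_worst_case:
            best_mate, best_worst_case = mate, worst_case
    return best_mate, best_worst_case

# Illustrative use with a dummy probability table instead of a trained FNBC.
table = {("T1", "O1"): 0.8, ("T1", "O2"): 0.4, ("T2", "O1"): 0.6, ("T2", "O2"): 0.55}
mate, p = choose_pass_target(["T1", "T2"], ["O1", "O2"], lambda t, o: table[(t, o)])
print(mate, p)   # -> T2 0.55 (its worst case beats T1's worst case of 0.4)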
1.2. Decision Trees
Decision Trees (DT) represent one of the oldest means of decision-making. They have several significant advantages, like their simplicity and hereby the ability to process large data sets. Further, they are capable of generalization, where their nodes behave like centers of clusters in a state space [5]. In addition, they are easy to analyse, e.g. by calculating information gain or entropy and other statistical evaluations. Finally, from the viewpoint of comprehensibility they are of great importance as means for explanation mechanisms and for the graphical representation of basic notions and their relations. Among the variety of DT learning methods, Quinlan's C4.5 [6] and its commercial improvement C5 are especially popular. However, comparison of results obtained by conventional DTs with NBCs showed a somewhat lower efficacy of DTs in a number of applications, e.g. in the problem of passing (Section 1.1) [4]. Only more advanced means derived from the theory of DTs are able to overcome NBCs.
In the next part we will focus on a special type of DT which was developed specifically for motion planning and was used as a part of the so-called behavioral control for purposes of robotic soccer [7]. Rapidly exploring Dense Trees (RDTs) [8] represent a family of trees whose main idea is to incrementally construct a search tree which continuously improves its resolution. In this case decision-making is transformed into a search task: to find a route (trajectory) from a starting point q_S to the goal q_G. These DTs are constructed with the help of a dense sequence α composed of samples α(i) which cover a given area C. If α is a random sequence, then we call such a DT a Rapidly exploring Random Tree (RRT), which is the subject of the following considerations.
The RRT algorithm starts with the first sample α(1), which creates the first node of the tree. Then the second sample α(2) does the same and the two nodes are mutually connected. After this initial step, each further sample tries to find the nearest point q_n of the gradually growing tree and to create a direct link between q_n and α(i). If q_n is a node, then a new edge is added. If q_n lies somewhere on an existing edge, then the edge is split at q_n, which becomes a new node connecting the graph with α(i). Thus in each step one or two new edges arise and α(i) becomes a new node. If an obstacle lies between q_n and α(i), then a new node q_O is created on the border of the obstacle closer to the tree and α(i) is returned somewhere into the sequence α, see Figure 5.
The RRT algorithm can be rewritten into pseudocode as depicted in Figure 6, from which we can see that the number of tree nodes is at least equal to the number of samples α(i). If the samples are dense enough and properly dispersed in the investigated area, then we get an accurate map of the free area S (S ⊆ C) with indicated obstacles and further limitations. Individual edges are possible paths which can be chained into a trajectory from a starting point q_S to a goal q_G using any graph search algorithm. For generating a motion plan we start creating the tree at q_S and continue expanding (exploring the area) until a node at q_G is created (logical tag Path = 1) or until, after a given number of cycles, we give up, i.e. Path = 0. The functions NEAREST and SEARCH_BORDER can be realized in various ways. The main effort lies in minimizing their computational demands, because these two functions are used very often. Also the expansion of the tree can be modified, e.g. as the simultaneous expansion of two trees from q_S and q_G, respectively, until they mutually merge. A more detailed description of RDTs can be found in [8].
Figure 5. Process of creating an RRT; O – obstacle.
1.3. Markov Decision Processes and Reinforcement Learning
A real robot is a dynamic system whose states change. To describe it exactly, we need to consider its current state, actions and also partially the history of its states, e.g. in the form of a series of significant derivatives. This means a kind of memory is necessary and the problem becomes too complicated (both to describe and to solve). However, if the sampling is dense enough, we can make a simplification based on the assumption that a new state s' of the robot depends only on its current state s (s, s' ∈ S, S being the set of all possible states) and the performed action a. If we suppose the transition s → a → s' is not deterministic, we describe it by a probability p in the form of the transition function T(s, a, s'), which is in other words the conditional probability p(s' | s, a). Considering states as nodes of a graph and actions as its edges, we can depict all possible combinations s → a → s' as seen in the example in Figure 7, where the edge labels a_i : p_j denote the probability p_j that action a_i transforms the input state of a given edge into the output one. We can see that the sum of all probabilities for any action a_i of a given node must equal one (the second Kolmogorov probability axiom).
Besides, a reward function r: S → R is defined, which enables sorting individual states. Then a chain of transitions from one state to another is started, and each transition is rewarded. After a series of such transitions the immediate rewards r are accumulated in a certain way into a cumulative reward (utility) u.
Path ← 0
Initialize RRT
for i = 1 to K
    q_n ← NEAREST(RRT, α(i))
    if q_n ≠ node
        Split edge
        Add q_n as a node to RRT
    end if
    if edge (q_n, α(i)) ⊂ S
        Add α(i) as a node to RRT
        Add edge (q_n, α(i))
        if α(i) = q_G
            Path ← 1
            return
        end if
    else
        Add node q_O ← SEARCH_BORDER(edge (q_n, α(i)))
        Add edge (q_n, q_O)
        Include α(i) into α
    end if
end for
Figure 6. Pseudocode of the RRT creation algorithm.
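For readers who prefer running code, the following is a minimal 2-D Python sketch of the same idea (random sampling, nearest-point search, collision-checked edge insertion). The obstacle model, the crude edge check and the goal test are deliberately simplified assumptions and do not reproduce the exact NEAREST and SEARCH_BORDER variants discussed above.

import math, random

def rrt(q_start, q_goal, is_free, bounds, max_iters=2000, goal_tol=0.5):
    """Grow an RRT as a parent map {point: parent}; return a path to the goal region or None.

    is_free((x, y)) is a user-supplied collision test; bounds = (xmin, xmax, ymin, ymax).
    New samples are connected directly to their nearest tree node (no steering step).
    """
    tree = {q_start: None}
    for _ in range(max_iters):
        sample = (random.uniform(bounds[0], bounds[1]),
                  random.uniform(bounds[2], bounds[3]))
        q_n = min(tree, key=lambda q: math.dist(q, sample))      # NEAREST
        # crude edge collision check: probe a few points along q_n -> sample
        if all(is_free((q_n[0] + t * (sample[0] - q_n[0]),
                        q_n[1] + t * (sample[1] - q_n[1]))) for t in (0.25, 0.5, 0.75, 1.0)):
            tree[sample] = q_n
            if math.dist(sample, q_goal) < goal_tol:             # goal region reached
                path, q = [], sample
                while q is not None:                             # walk back to the start
                    path.append(q)
                    q = tree[q]
                return list(reversed(path))
    return None                                                  # gave up (Path = 0)

path = rrt((0.0, 0.0), (9.0, 9.0),
           is_free=lambda p: not (3 < p[0] < 5 and 0 < p[1] < 6),   # one box obstacle
           bounds=(0, 10, 0, 10))
print(len(path) if path else "no path found")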
Of course, a robot (agent) tries to maximize u. How to solve this task under the mentioned conditions is the objective of the so-called Markov Decision Process (MDP), whose output is usually an optimal policy formulated as

\pi^{*}(s) = \arg\max_{a \in A} \sum_{\forall s'} T(s, a, s') \cdot u(s').    (8)
In other words, π is a mapping from states to actions, π: S → A (A being the set of all actions). The influence of future rewards is often required to be decreased by a discount factor γ, e.g. γ^0·r(s_1) + γ^1·r(s_2) + γ^2·r(s_3) + ... for a sequence of states s_1, s_2, s_3, ... Considering such a sequence of states, we can estimate the cumulative reward u for each state s and thus obtain a system of n Bellman equations (if there are n states):

u(s) = r(s) + \gamma \cdot \max_{a} \sum_{\forall s'} T(s, a, s') \cdot u(s').    (9)
For reasons of computational complexity, the equations (9) are mostly solved iteratively. A more detailed description can be found in [9].
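The iterative solution referred to here is essentially value iteration: start from arbitrary utilities and apply Eq. (9) as an update rule until the values stop changing. A compact sketch on a toy MDP (states, transitions and rewards invented purely for illustration) could look like this.

# Value iteration sketch for Eq. (9); the tiny MDP below is invented for illustration.
states = ["s1", "s2", "s3"]
actions = ["a1", "a2"]
reward = {"s1": 0.0, "s2": 0.0, "s3": 1.0}                      # r(s)
T = {                                                           # T(s, a) -> {s': prob}
    ("s1", "a1"): {"s1": 0.2, "s2": 0.8}, ("s1", "a2"): {"s3": 0.1, "s1": 0.9},
    ("s2", "a1"): {"s3": 0.7, "s1": 0.3}, ("s2", "a2"): {"s2": 1.0},
    ("s3", "a1"): {"s3": 1.0},            ("s3", "a2"): {"s1": 1.0},
}

def value_iteration(gamma=0.9, eps=1e-6):
    u = {s: 0.0 for s in states}
    while True:
        u_new = {}
        for s in states:
            best = max(sum(p * u[s2] for s2, p in T[(s, a)].items()) for a in actions)
            u_new[s] = reward[s] + gamma * best
        if max(abs(u_new[s] - u[s]) for s in states) < eps:     # converged
            return u_new
        u = u_new

u = value_iteration()
policy = {s: max(actions, key=lambda a: sum(p * u[s2] for s2, p in T[(s, a)].items()))
          for s in states}                                       # greedy policy, Eq. (8)
print(u, policy)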
We can see that an MDP simulates processes in time and at the same time gradually enhances their quality and improves the assignment of correct actions to individual states; it is thus also an adaptive (self-learning) means. MDP is a general calculus and can be used not only in simulation processes (e.g. epidemic spreading) but also in planning or multiagent systems. In [10] an MDP was used for the localization of robots, where this approach was combined with a Kalman filter: the MDP was used for estimating the rotation angle and the Kalman filter for estimating the translational position of the robot. Such a hybrid approach simplified the complexity of this probabilistic task and hereby shortened the computational time in a setting where the information from sensors is not totally precise.
Figure 7. Graphical representation of an MDP – an example [9].
However, a fully described MDP as in Figure 7 is often not available. In reality, robots must operate in unknown areas where the states, their relations and even the rewards are unknown. Thus an MDP without some modifications is not usable. One of the means for solving such a problem is Reinforcement Learning (RL), which represents a broad set of methods that use an environment description based on MDPs. For this reason the MDP scheme is modified in such a sense that rewards are placed on the edges instead of on the nodes (states). So we get the reward function in the form r(s, a): S × A → R. In other words, for each state s we search for the best action a which maximizes the cumulative reward u(s, a), and the Bellman equations take the form

u(s, a) = r(s, a) + \gamma \cdot \max_{a'} \sum_{\forall s'} T(s, a, s') \cdot u(s', a').    (10)
Although RL is a supervised type of learning, unlike conventional supervised learning RL does not use pairs of input/desired output but, like MDP, concentrates on winning the biggest cumulative reward during a time period, i.e. on finding the best policy π*(s). The way the maximization process is handled divides RL approaches into two groups: (a) model-free and (b) model-based methods. The first group concentrates on searching for the optimum action for each state. Its main advantage is low computational complexity, but the quality is lower than for the model-based methods. Therefore the methods of group (a) are convenient for systems with a small amount of dynamics; a typical representative is Q-learning [11]. The methods of group (b) first try to approximate the transition and reward functions, hereby creating their inner models, in which actions are simulated. There is one more very valuable property of RL: similarly to NBCs, these types of algorithms do not need large sets of training data, which is a very important condition. Next we will describe a hybridized form of model-based RL using DTs (RL-DT), which was used for training the bipedal robot NAO to score penalty kicks [12].
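Since Q-learning is named as the typical model-free representative, a brief sketch of its one-step update may help. The rule below is the standard tabular form; the environment interface (env.reset, env.step) and all parameter values are placeholders, not part of the cited robotic application.

import random
from collections import defaultdict

def q_learning(env, actions, episodes=500, alpha=0.1, gamma=0.9, epsilon=0.1):
    """Tabular Q-learning sketch; `env` is assumed to expose reset() and step(s, a)
    returning (next_state, reward, done). Q(s, a) approximates u(s, a) of Eq. (10)."""
    Q = defaultdict(float)
    for _ in range(episodes):
        s, done = env.reset(), False
        while not done:
            # epsilon-greedy action choice
            if random.random() < epsilon:
                a = random.choice(actions)
            else:
                a = max(actions, key=lambda a_: Q[(s, a_)])
            s_next, r, done = env.step(s, a)
            best_next = max(Q[(s_next, a_)] for a_ in actions)
            # one-step temporal-difference update toward r + gamma * max_a' Q(s', a')
            Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])
            s = s_next
    return Q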
As computational complexity and the related response times are critical in robotic applications, the main idea of RL-DT is to minimize the so-called exploration mode, even though the obtained solution need not be the optimum. For more efficient exploration, DTs are used, which are able to quickly reduce the search area by generalizing similar states into one common state. The DTs are used for constructing T(s, a, s') and r(s, a) [5]. If a state s is characterized by n state variables X_i, then it has the form of a vector with values x_i, i.e. s = (x_1, ..., x_n), x_i ∈ X_i. Thus, using e.g. Quinlan's C4.5 algorithm, n + 1 DTs are constructed, one for each variable X_i and one for the reward r(s, a). These DTs predict the changes when their input is in the form (a, x_1, ..., x_n). Each tree is processed, resulting in a leaf dependent on the input. For the transition DTs we get the probabilities of state variable changes p(x^c_i | s, a) (x^c_i ∈ X^c_i being an element of the state change s^c). As we suppose the variables X_i are independent, we get the change of the total transition function as

T(s^{c}, a, s) = \prod_{i=0}^{n} p(x^{c}_i \mid s, a).    (11)
Finally, the next state s' is computed as the sum of the current state s_m and the change s^c. A similar treatment applies in the case of the reward DT. This activity represents the UPDATE_MODEL function used in the RL-DT algorithm (see Figure 9). Figure 8 shows an example of a DT for relative changes of the variable X when a robot is moving on a grid. The DT first investigates whether the action 'go to the left' has been performed. Then the decision-making flow is divided by the individual results of the given questions till a leaf (a rectangular node) is reached. If the robot is in the position indicated by a bullet and it moves to one of the black fields (the robot's orientation is also important), then the DT will not generate any changes.
Figure 8. An example of a DT for change prediction of the variable X [5].
The RL-DT algorithm divides its processing into two separate stages. Beside the exploration mode there is also the so-called exploitation mode. During exploration the models T(s, a, s') and r(s, a) are updated for new (yet unknown) states. During exploitation we suppose the models are correct and therefore the optimal policy is calculated in the way defined for MDPs. The algorithm starts with a state s, which is added to the set of already visited states S_M. At the same time s and all connected actions a form state–action pairs, whose numbers of visits are counted in counters visits(s, a). After that a loop is started, in which the action a with the highest cumulative reward u(s, a) is chosen and the state s' is obtained. If it is not yet in S_M, it is added. The function UPDATE_MODEL is applied using the C4.5 algorithm, which incrementally updates the models of T(s, a, s') and r(s, a). If some updates are done, the logical variable CH (change) is set to 1. At this moment the algorithm decides between the exploration and exploitation mode with the help of a heuristic incorporated in the function CHECK_MODEL. If the prediction of an immediate reward is less than 40% of the maximum one, then exploration is switched on and the logical variable exp is set to 1. This heuristic causes exhaustive exploring in the early stages, when knowledge of the domain is low. After gaining positive rewards the exploitation mode is chosen. Finally, a new optimal policy is computed by the function COMPUTE_VALUES if the models T(s, a, s') and r(s, a) were changed. The loop counter is incremented and a new loop is started with s' as s. The pseudocode of the RL-DT algorithm is depicted in Figure 9. Further details of this algorithm can be found in [13].
Include s into S_M
∀a ∈ A: visits(s, a) ← 0
loop
    a ← arg max_{∀a} u(s, a)
    Execute a, obtain reward r(s, a), observe s'
    Increment visits(s, a)
    if s' ∉ S_M
        Include s' into S_M
        ∀a ∈ A: visits(s', a) ← 0
    end if
    Perform UPDATE_MODEL
    exp ← CHECK_MODEL
    if CH = 1
        COMPUTE_VALUES
    end if
    s ← s'
end loop
Figure 9. Pseudocode of the general RL-DT algorithm.
1.4. Fuzzy Cognitive Maps
As mentioned above, rule bases are considered the basic knowledge representation. However, in the area of mobile robotics we often have to handle decision chains, many of which create closed loops. Thus individual rules are causally interconnected (see rows 1–2 in Figure 2) and cannot be ordered in a simple list of uniform IF–THEN rules as is usual for the so-called simple rule-based systems, e.g. rows 3–6 in Figure 2. In such a case the rules are not mutually independent. In a more complex rule-based system the relations between rules become incomprehensible and a graphical means is necessary to capture them clearly. Fuzzy Cognitive Maps (FCM) seem to be just such an aid, not only for knowledge representation but also for analyzing and processing complex rule-based systems [14].
The notion of a Cognitive Map (CM) was introduced by the political scientist Robert Axelrod, primarily to model social processes [15]. It is an oriented graph whose nodes represent notions and whose edges are causal relations. Mostly, notions are states or conditions and edges are actions or transfer functions which transform a state in one node into another state in another node. A CM is able to describe complex dynamic systems. It is possible to investigate cycles, collisions, etc., and to define strengths (weights) of relations, too; originally these were represented by three values: −1, 0 and 1. Another advantage is its human-friendly knowledge representation and its ability to describe various types of relations (for more detail see e.g. [16]).
An FCM represents an extension of a CM and was proposed by Kosko in 1986 [17]. The extension is based on strength values taken from the interval [−1, 1]; the nodes can be represented by activation values from the interval [0, 1] or by membership functions as well, see Figure 10. Strengths, after their combination, correspond to rule weights in rule-based systems, too.
Figure 10. An example of an FCM.
There are two basic formal definitions of FCMs, [18] and [19]. In the following, the definition by Chen will be used, where an FCM is defined as a 4-tuple:

FCM = (C, E, \alpha, \beta),    (12)

where:
C – a finite set of cognitive units (nodes) described by their states C = {C_1, C_2, ..., C_n};
E – a finite set of oriented connections (edges) between nodes E = {e_11, e_12, ..., e_nn};
\alpha – a mapping \alpha: C → [0, 1] (originally proposed as [−1, 1]);
\beta – a mapping \beta: E → [−1, 1].
In other words, α is a computational method for evaluating the numerical values of the set of nodes C. For the sake of correctness, it is necessary to mention that a cognitive unit is represented by two values: (a) its symbolic (linguistic) meaning denoted as C_i and (b) its numerical activation value A_i ∈ [0, 1]. C_i represents the qualitative aspect of a node and A_i the quantitative one. However, for reasons of simplicity we can omit this distinction and we will use the symbol C_i, although in the following we will handle only activation values. On the other hand, β represents the way in which the weights of the edges e_ij ∈ [−1, 1] are determined. The signs in this case define the character of the relationship between nodes: either strengthening (+) or weakening (−).
The set of connections E forms the so-called connection (adjacency) matrix, which can be used for computing new activation values of the nodes C. For the example in Figure 10 it looks as follows:

E = \begin{pmatrix} 0 & -1 & 0 & -0.5 & 1 \\ -0.9 & 0 & 0.3 & 0 & 0 \\ 0 & 0 & 0 & 0 & 1 \\ 0 & 0 & -0.4 & 0 & -1 \\ 0 & 0 & 0 & 0.7 & 0 \end{pmatrix}.    (13)
The cognitive units are at each time step k in a certain state. Using E we can compute their states for the next time step k + 1, and so on repeatedly for further steps. Similarly as for differential equations, we can draw phase portraits. To keep the values within the prescribed limits, a limiting (threshold or transformation) function L is used. So we can compute the states C_i for k + 1 as follows [19]:

C_i(k+1) = L\left( \sum_{j=1}^{n} e_{ij} \cdot C_j(k) \right).    (14)
The formula (14) can be generalized for the computation of the whole state vector C using a matrix product, i.e. C(k + 1) = L(E · C(k)), which is a special form of α in the FCM definition. There also exists an incremental modification of (14), where C(k) is incremented by the computed values, e.g. [20]:

C_i(k+1) = L\left( C_i(k) + \sum_{j=1}^{n} e_{ij} \cdot C_j(k) \right).    (15)
The primary role of the function L is to keep the activation values in the interval [0, 1]. A number of functions fulfil this condition; however, the sigmoid function seems to be the most advantageous [21]. For a special kind of FCM with active inputs [22], any kind of typical membership function can be used as L.
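A minimal numeric sketch of the update rule (14), using the connection matrix (13) and a sigmoid as the limiting function L, is given below; the initial activation vector and the sigmoid steepness are arbitrary illustrative choices.

import math

# Connection matrix E from Eq. (13); row i holds the weights e_ij feeding node C_i,
# so that the update is C(k+1) = L(E . C(k)) as stated in the text.
E = [
    [0.0, -1.0,  0.0, -0.5,  1.0],
    [-0.9, 0.0,  0.3,  0.0,  0.0],
    [0.0,  0.0,  0.0,  0.0,  1.0],
    [0.0,  0.0, -0.4,  0.0, -1.0],
    [0.0,  0.0,  0.0,  0.7,  0.0],
]

def L(x, steepness=5.0):
    """Sigmoid limiting function keeping activations in [0, 1]."""
    return 1.0 / (1.0 + math.exp(-steepness * x))

def fcm_step(c):
    """One synchronous update of all node activations, cf. Eq. (14)."""
    return [L(sum(e_ij * c_j for e_ij, c_j in zip(row, c))) for row in E]

c = [0.8, 0.2, 0.5, 0.1, 0.6]      # illustrative initial activations
for k in range(10):                # iterate until the map (hopefully) settles
    c = fcm_step(c)
print([round(v, 3) for v in c])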
If we compare a complex system like that in Figure 10 with a simple system of m independent rules as in Figure 11 (e.g. rows 3–6 in Figure 2), we can see that FCMs are an extension of simple fuzzy rule-based systems. Fuzzy rules are totally independent in simple systems because their consequents do not have any mutual influence, which is possible only in simpler decision cases. Simple rule-based systems do not enable any decision chains or the representation of temporal information. From this point of view they are only a very special and restricted case of FCMs. This can be depicted as in the example in Figure 11, where a set of m rules, with antecedent parts LX_i that represent already merged particular inputs of the form x_j is LX_ij (x_j ∈ X) and with outputs LU_i (i = 1, ..., m), is drawn in the form of an FCM. The evaluation process resulting in an accumulated (aggregated) value LU_c, which is defuzzified into a crisp form LU*_c, is depicted as well. A more detailed description of fuzzy reasoning methods can be found e.g. in [23].
Figure 11. An example of an FCM representing a simple rule set.
2. Autonomous Decision-Making Systems in a Multiagent Environment
The means for decision-making support mentioned in the previous section represent elements of a decision and control structure, which formulates the main goal of a given system and interconnects these means to fulfil that goal. If we consider an individual robot, then only DL-0 and DL-1 are necessary, see Figure 1. However, robotic soccer is a team play and the robots must be supplied with further abilities for their mutual cooperation, which extend our basic structure to DL-2, whose task is to create an overall structure for control, decision-making, communication, cooperation and behaviour management of a distributed system with several more or less independent units called agents. Such an overall structure is the objective of a MAS.
The concept of MAS was inspired by several apparently different disciplines like economy, sociology and biology, and it represents common interconnections among them. From the typological viewpoint of information sciences it originates from an intersection of distributed computing and artificial intelligence, as a part of distributed artificial intelligence [24], which differs from distributed computing by using communication and cooperation instead of parallelization and synchronization.
In our case agents play the key role in a MAS. There does not exist any general definition of an agent, because this term is too widespread, exists on various levels of complexity and can be implemented in a number of forms. However, we can characterize it by some minimal properties that must be fulfilled [25]:
1. Autonomy: An agent is able to operate independently of a human. It has individual internal states and goals. Autonomy cannot be real without the proactivity attribute, where the agent takes the initiative and does not wait passively for a stimulus from outside.
2. Cooperation: This is the core property; without it we cannot speak about any MAS. It is the ability to interact with other agents (and eventually humans) using a communication language.
3. Learning: Maybe not always imperative but a very useful property, which enables the agent to be flexible and able to adapt to changes of the environment.
Intersections of these properties enable us to define the following categories of agents: collaborative agents, collaborative learning agents, interface agents and smart agents, see Figure 12. For purposes of robotic soccer mainly collaborative and smart agents come into consideration. If any central command unit, like a coach or captain, is allowed, then collaborative learning agents can also be used.
Figure 12. Basic typology of agents according to [25].
Another division of agents is possible by the kind of control and its approach: reactive, deliberative, hybrid and behaviour-based [26].
Reactive control represents the simplest type. Its approach can be expressed as: "Do not think, act." Such an agent only reacts immediately to an external stimulus. It uses a finite set of actions known in advance, and so it does not need any inner model of the surrounding world or communication with other agents. For this reason such agents are known as 'memoryless'. Their main advantage is computational simplicity and speed. They are used on the lowest decision (control) level based on direct inputs from sensors. Although these agents seem very primitive, a suitable growth of their complexity can lead to apparently intelligent behaviour with the help of the so-called subsumption architecture, which belongs to this category, too.
Deliberative control is based on: "Think hard, then act." It is characterized by a proactive and intentional approach. Its main feature is communication with its surroundings and creating its own inner world model with the aim of reaching determined goals. The world model is strictly centralized and based on state–action sequences. Its purpose is to design plans of actions. Beside apparent advantages there are also drawbacks like high information dependency and computational complexity. Such an approach runs into needs such as prediction, processing of inaccurate information, changing the model in a dynamic environment, etc. Therefore, the use of deliberative control is reasonable on the highest decision levels.
The hybrid approach is characterized by: "Think and act independently, in parallel." It tries to eliminate the disadvantages of the two mentioned types of control. Comparing the real needs of robotic teams we can see the necessity of both approaches. Therefore, the reactive as well as the deliberative architecture are merged into one unit, but they work independently of each other. The reactive part ensures real-time responses on the level of sensor data, e.g. to avoid collisions with obstacles. The deliberative part works on the high decision-making levels, using an abstract world representation to make strategic decisions and plans.
The behaviour-based approach is of a somewhat different nature. Its basic idea is: "Think the way you act." This approach found its inspiration in biology. It is based on observed patterns of various kinds of behaviour. These behaviours are derived from interactions between the agent and its environment. Thus incremental adaptation is supposed, where first basic (survival) actions are created and more advanced patterns are continuously added, e.g. starting with simple obstacle avoidance up to e.g. passing a ball. Individual patterns are stored in the system in the form of distributed layers which are mutually competitive. This is the main difference from the hybrid type, which also determines the use of these two control types.
Hybrid agents are used mostly in single-robot control, whereas behaviour-based control is used in multi-robot applications like robotic soccer, where a quick response to the changes of such a dynamic game is especially required. In the next parts of this section we will deal mainly with MAS architectures that are related first of all to behaviour-based control.
There is yet another division of the agents on which a MAS is constructed: (a) homogeneous, i.e. the agents are mutually equivalent (in hardware as well as software) and perform the same tasks, and (b) heterogeneous, if they differ by having different roles (with e.g. the possibility to incorporate hierarchies), although their hardware could be identical. Heterogeneous agents are more interesting from the viewpoint of research and also of utilization potential, and therefore most MAS architectures also try to solve communication and role allocation among heterogeneous agents.
Summarizing the mentioned facts, we can state that MAS are suitable especially for situations where agents have incomplete information and the data are decentralized. In addition, any control from a global level is absent and calculations are performed asynchronously. As robotic soccer can be characterized by just such properties, it seems to be an excellent candidate for using MAS as well. In general, MAS enable a shortening of computational response (thanks to parallelism) as well as of communication demands, because agents are functionally independent units. Besides, the operability and reliability of such a system are enhanced because of the flexibility of MAS architectures in general.
In the next parts we will introduce some types of decision-making systems for robotic teams in which the principles of MAS are more or less visible. However, up to the present day there does not exist any decision-making system with the full functionality of MAS as described above. We must take into consideration that there are still lots of teething problems which must be solved, and robots do not yet possess processors powerful enough to solve these problems in full breadth and depth.
2.1. Cooperative Control Architecture ALLIANCE
ALLIANCE is one of the best-known control architectures for multi-robot applications [27]. Although it is not directly used in robotic soccer, because its properties do not suppose the existence of adversaries, some other approaches (e.g. [28]) are more or less inspired by ALLIANCE.
ALLIANCE is proposed for the mutual cooperation of heterogeneous robots in small- and medium-sized teams. It is intended for dynamic environments, where problems with faulty sensing or incomplete information can occur to some extent. Mutual communication is not always necessary and there is no central unit with complete information about the environment; the agents (robots) are fully independent. However, on the other hand, it is required that robots should be able at least partially to sense their position and the results of their activities as well as those of their teammates. First of all, full cooperation among all robots is supposed, i.e. no tricks, no lying or even adverse behaviour. Just this last condition contradicts the needs of robotic soccer, but there are many other application possibilities where we can conform to these requirements, like surveillance systems, object (garbage) collecting or handling (box pushing).
As in practice some problems with communication can occur, it is more convenient to be able to recognize the behaviour of other robots in order to predict their future performance than to ask them directly about their activities. The principles of ALLIANCE are based on such action recognition. The basic architecture of ALLIANCE is depicted in Figure 13 and is implemented on each agent. We can see it is a fully distributed cooperative control scheme of the behaviour-based architecture. It is based on behaviour sets, which represent individual roles or modes. For instance, in our case a robot can change the role of an attacker to a defender depending on the situation. These roles are connected with various activities and behaviours on the higher decision levels. A behaviour set can be either active or inactive at a given moment, and only one behaviour set can be active on each robot. On the contrary, the lower control levels (indicated as layers) are allowed to be active continuously, e.g. for collision prevention. The selection of a behaviour set is performed by motivational behaviours, one for each behaviour set. A motivational behaviour evaluates the suitability of a given behaviour set and activates or inhibits it, which is indicated in Figure 13 by gate-shaped symbols. Therefore, these modules are mutually interconnected by cross-inhibitions.
The main idea of this approach is based on the assumption that each agent should perform a task only as long as it demonstrates its ability to produce the desired effect on the environment. This means the agents are self-checking and only they alone decide about their activities. Such behaviour is based on two principal motivations: impatience and acquiescence. Impatience helps the agent to act if other agents fail. On the contrary, if the agent is performing a task which continues over a certain time limit, then it becomes more willing to interrupt it and to let other agents continue, with a chance that the task will be accomplished. At the moment other agents start performing the same task, the given agent gives it up. As an example we can mention transferring boxes from one place to another and placing them into racks. If an agent (A) starts transferring, then agent (B) will try e.g. to place them into racks; this is caused by the acquiescence motivation of B. After a certain time A may slow down its activities, e.g. due to discharged batteries, which can be observed in that B must wait for new boxes. Then B will also start transferring boxes, i.e. the impatience motivation shows itself. At the same time, agent A will give up transferring boxes due to the acquiescence motivation and another behaviour set will be activated, e.g. recharging the batteries. In other words, robot A should allow robot B to finish its task before it again becomes impatient towards B. On the contrary, there is no reason for A to stay inactive if a given task is unfinished and no other robot starts accomplishing it. If both motivations are balanced to be mutually complementary, then both agents will cooperate without any conflicts.
Figure 13. The ALLIANCE architecture [29].
A behaviour set is composed of activities which together lead to accomplishing a task. It depends on the concrete task whether these actions are mutually interconnected by some decision rules or simply chained into a series. For instance, we can define a task passing. How to realize it and what sequence of steps to choose (e.g. leading the ball, turning to an appropriate kicking angle, deciding about eventual obstacle avoidance and finally kicking, etc.) is the objective of the concrete tasks contained in a behaviour set. Therefore, each behaviour set is handled as a unit.
Similarly to the alternating of activities among robots, behaviour sets also alternate mutually. As already mentioned, at any one moment only one behaviour set can be active on robot i. However, if it has been active for a considerably long time and the motivation value m_ij(t) of another behaviour set j exceeds a certain activation threshold θ, then the first set will be inhibited and the other one activated. The motivation m_ij(t) depends on several factors described by functions like sensory feedback sf(t), inter-robot communication irc(t), suppression from active behaviour sets sabs(t), as well as the already mentioned impatience imp(t) and acquiescence acq(t). For a time step t the total motivation m_ij(t) (with m_ij(0) = 0) is calculated as

m_{ij}(t) = \bigl( m_{ij}(t-1) + imp(t) \bigr) \times sf(t) \times irc(t) \times sabs(t) \times acq(t).    (16)

The functions sf(t), irc(t), sabs(t) and acq(t) have the form of threshold functions whose outputs are 0 or 1. These values indicate the occurrence of conditions either for growing m_ij(t) (value 1) or for resetting it to 0. The impatience function imp(t) defines the rate of motivation growth. All these functions contain a number of parameters. To make setting them up easier, an adaptive (learning) modification of ALLIANCE named L-ALLIANCE was designed. A more detailed description of ALLIANCE as well as L-ALLIANCE can be found in [29].
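The motivation dynamics of Eq. (16) can be simulated with a few lines of code. In the sketch below the gating functions are reduced to booleans supplied by the caller, and the impatience rate and threshold are illustrative parameters, not the values used in the original ALLIANCE implementation.

def update_motivation(m_prev, impatience_rate, sensory_ok, no_better_robot,
                      no_other_set_active, not_acquiescing):
    """One step of Eq. (16): grow motivation by the impatience rate, or reset it to 0
    whenever any of the boolean gating conditions (sf, irc, sabs, acq) fails."""
    gate = sensory_ok and no_better_robot and no_other_set_active and not_acquiescing
    return (m_prev + impatience_rate) * (1.0 if gate else 0.0)

# Illustrative run: a behaviour set becomes active once motivation crosses a threshold.
theta, m = 1.0, 0.0
for step in range(8):
    m = update_motivation(m, impatience_rate=0.2,
                          sensory_ok=True, no_better_robot=True,
                          no_other_set_active=True, not_acquiescing=(step < 6))
    print(step, round(m, 2), "ACTIVATE" if m > theta else "")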
2.2. Behavioral Kinodynamic Planning
MAS approaches tend to divide a given control system into hierarchical levels. Such a way is indeed systematic, giving an inner order, but the individual control levels lose immediate contact with each other, because contact is restricted to exactly defined communication links. Further, many tasks are in reality hard to separate because they are conditionally interconnected. Such a simplified division into modules and levels can be too rough. Thus the higher strategic levels lose contact with the physics of the robot and its environment. Similarly, the lower control levels have no notion of strategy. Further, soccer (football) is considerably characterized by its randomness. For these two reasons the so-called Randomized Behavioral Kinodynamic Planning (RBKP) of tactics was proposed in [7].
RBKP is based on embedding predefined basic skills and tactics into a kinodynamic planning framework, which utilizes the physical principles of kinematics and dynamics. Such an approach is strongly problem-oriented, i.e. it requires specific knowledge, but its efficacy can be very high. Using RRT-based planning (see Section 1.2) makes it possible to quickly define tactics as the best nondeterministically modeled combination of skills. Robotic soccer is thus seen as a randomized kinodynamic planning problem. The structure of RBKP is depicted in Figure 14. If we compare this structure with Figure 1, we can see that the modules of strategy choice and action planning in DL-1 are merged in this approach.
For modelling tactics a nondeterministic Finite State Machine (FSM) is used. This approach makes it possible to generate various sequences of state transitions, i.e. tactics, which also reduces the predictability of the behaviour for opponents. Accordingly, the skills are nondeterministic as well. The system searches over a variety of simulated executions until it finds an instance that leads to the goal. The skills do not execute deterministic actions based on the state of the environment but have a certain freedom to randomly choose from alternatives of their typical behaviour. For instance, if a skill named kick to goal usually tries to kick the ball into the biggest free area of the goal, it can randomly change its habit to kicking into a corner of the goal and so surprise the opponent goalie. In such a way the system becomes more creative, experimenting with various alternatives, and thus it has a chance of continuously finding optimal tactics as well as skills. In other words, it is a self-learning approach.
To transform our task of robotic soccer into the graph space of RRTs it is necessary first to define the state space X, consisting of the robotic movement rm as well as the inner states of the robots described by the FSM states fsm, together with information about time t, i.e. x = (t, fsm, rm), where x ∈ X is a concrete state of the whole playground. The robotic movement represents a subspace of X that includes e.g. the positions, rotations, velocities, etc. of all objects on the playground, i.e. not only the robots but also the ball and other obstacles. Hence X is basically a multi-dimensional space, generally of high dimensionality. To simplify such a form of representation it is transformed into a space Q of decision trees, in our case RRTs.
Figure 14. Structure of the randomized behavioral kinodynamic planning system [7].
The process of constructing a behavioral kinodynamic plan is the following. Using the start state, the tree T is initialized. Then, using the function SAMPLE_RANDOM_STATE, which uses an internal probability distribution, a sample state x_sample from the state space, in the form of α(i), enters the RRT creation algorithm, see Figure 6. In comparison with Section 1.2, the calculation of the nearest point q_n requires a more sophisticated metric than the Euclidean one, see [7]; this is provided by the function KINEMATIC_NEAREST_NEIGHBOUR. As behaviour depends also on the inner states of the robot, the state transition must be calculated using the nondeterministic FSM (function FSM_TRANSITION), too. Then we verify whether this new internal state fsm_state is valid (function IS_VALID_FSM_STATE), i.e. whether all conditions and limitations are kept. For instance, there is a rule that only the goalie can operate in the penalty area; the others are prohibited from entering it. If this is not fulfilled, a new α(i) is chosen. If the new state is acceptable, an action a, which is a skill, is calculated by APPLY_FSM_BEHAVIOUR. This function associates concrete inner states with actions into pairs (fsm_state – a), which is the task of the user's imagination about the style of playing; the pairs can also be expressed in the form of rules like IF fsm_state THEN a. The action a is usually a vector with several parameters describing e.g. angles for individual joints. At this moment a and q_n enter the kinodynamic simulator (function SIMULATE), the so-called Physics Engine in Figure 14, where after a simulation over ∆t a new state x_new is computed. This value corresponds to a point on the line segment between q_n and α(i). After a certain number of simulations it will reach α(i) or, in the case of an obstacle, the point q_O. The engine also includes a function IS_VALID_PHYSICS_STATE, which checks that all limitations are fulfilled from the viewpoint of physics as well. If it is correct, then x_new will be added to T as q_new. Again a new x_sample is generated and the whole cycle is repeated until a goal is shot or we give up (or the game is timed out). If a goal or its surroundings X_goal is reached, we can apply a search tree algorithm (Figure 14) in the backward direction until the initial state is reached. After reversing we get a sequence of skills, i.e. a tactic. In Figure 15 the pseudocode of the described RBKP algorithm is depicted, which was also used for two-wheeled robots in RoboCup [28].
Initialize RRT
for i = 1 to K
    α(i) ← SAMPLE_RANDOM_STATE()
    q_n ← KINEMATIC_NEAREST_NEIGHBOUR(T, α(i))
    fsm_state ← FSM_TRANSITION(q_n)
    if IS_VALID_FSM_STATE(fsm_state)
        a ← APPLY_FSM_BEHAVIOUR(fsm_state, q_n, α(i))
        x_new ← SIMULATE(q_n, a, ∆t)
        if IS_VALID_PHYSICS_STATE(x_new)
            Add node q_new to T
            Add edge (q_n, q_new) with value a
            if x_new ∈ X_goal
                return x_new
            end if
        end if
    end if
end for
return failed
Figure 15. Pseudocode of the RBKP algorithm [7].
Although this method does not guarantee the optimality of the solution, it has one interesting potential. Thanks to the randomness, opponents can predict the behaviour of our team only with great difficulty, yet we can predict the behaviour of the opponents. As the physics of each investigated point x is exhaustively described, if we incorporate some basic and supposed efforts of the opponents, like e.g. the effort always to catch the ball, then using our physics engine we can model the opponents' behaviour and so undertake some steps in advance. It is only necessary to construct a special FSM for the opponents.
2.3. Negotiation-Based Approaches
The previous two MAS methods have their own strengths but also limitations. ALLIANCE is a very strong method in real applications where the environment is uncertain, sensors and actuators are inaccurate, there are noises of various origins or other agents simply fail. However, this approach in its pure form is not convenient for applications where we meet adversarial behaviour. Behavioral kinodynamic planning needs first of all an accurate physical model of a given domain, and it can be a serious problem to obtain it. On the other hand, its unpredictability for opponents and quick planning make it very suitable for adversarial environments. However, neither approach guarantees optimality and full efficacy. The so-called negotiation-based approaches try to optimize cooperation among robots and to find the best robot for a given task. In general, these approaches are fully decentralized. An agent which needs a task to be fulfilled first offers it to the others, e.g. by broadcasting. Then the agent with the best bid is assigned the given task and hereby becomes fully responsible for its accomplishment. There are various modifications and extensions of this approach; further on we will deal with the so-called market-driven approach (MDA), which was derived from the auction-based task allocation system MURDOCH [30].
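As a minimal illustration of the announce–bid–award cycle just described, a short sketch follows; the classes and cost values are invented here for the example and do not reproduce the actual interfaces of MURDOCH [30].

from dataclasses import dataclass
from typing import Callable, Dict, Optional

@dataclass
class Task:
    name: str

class Agent:
    """Hypothetical bidder; cost_fn stands for whatever metric the robot uses
    (time, distance, energy, ...)."""
    def __init__(self, name: str, cost_fn: Callable[[Task], float]):
        self.name = name
        self.cost_fn = cost_fn

    def bid(self, task: Task) -> float:
        return self.cost_fn(task)

def auction(task: Task, agents) -> Optional[str]:
    """Single round: broadcast the task, collect bids, award it to the
    lowest-cost bidder, who then becomes responsible for the task."""
    bids: Dict[str, float] = {a.name: a.bid(task) for a in agents}
    return min(bids, key=bids.get) if bids else None

# Example: two robots bidding on one partial task.
robots = [Agent("R1", lambda t: 4.2), Agent("R2", lambda t: 2.7)]
print(auction(Task("pass the ball"), robots))   # -> R2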
MDA [31] is based on the principle of free markets, which relies on the supposition that if each participant of a market tries to maximize its own profit, then the profit of the whole team increases, too. Of course, communication among the participants of a common team is necessary, and they exchange all required information. The team pursues a common objective, in our case winning the game. Usually, such a common objective is reachable only after a series of steps (partial objectives) into which it is decomposed. For instance, shooting a goal is the result of several combined steps (objectives or tasks) such as passing, catching and dribbling the ball. For each partial task an auction is held. There is a virtual auctioneer and a group of bidders, i.e. players in this context. The bidders compute the supposed costs necessary to perform a given task and offer them to the auctioneer. In the case of mobile robots the costs take the form of time consumption for translation or rotation during movement to a given position or for obstacle avoidance. The auction winner is the one offering the lowest task costs, and its reward is the permission to perform this task. Incorporating further auction rules depends on the level of complexity that we need to reach, e.g. embedding a certain momentum element to prevent too frequent exchanges of player roles.
The concrete implementation is based on estimating these costs [31] for each robot
separately:
C_ES = w_1 · t_d + w_2 · t_a + w_3 · COA_g,   (17)

C_A = C_ES + w_4 · COA_b,   (18)

C_D = w_5 · t_d + w_6 · t_a + w_7 · COA_d.   (19)
The value C_ES represents estimated score costs, computed as a sum of costs required to move for a specified distance t_d and angle of rotation t_a as well as costs caused by obstacle avoidance (COA) on the path to the opponent goal area, COA_g. C_ES is just the value offered to the auctioneer by the robot i, so the equations (17) – (19) should correctly be indexed by i. However, the real costs for the winner (the attacker) are given by (18), where it is still necessary to take into account the costs of obstacle avoidance on the path to the ball, COA_b, which are in reality the first calculated costs, preceding C_ES. However, it is supposed that the initial goal position is more important than the position of the ball, which can be passed by another teammate to the attacker, i.e. the auction winner. The defender's costs are calculated similarly to (17), where COA_d represents the costs of obstacle avoidance on the path to the midpoint of the line between the own goal and the ball position. The weights w_1, ..., w_7 determine the playing strategy. For instance, if w_1 has a high value then the winning robot will try to shoot a goal only from the immediate vicinity of the opponent goal, because only robots occurring in that area have a chance to become the attacker. There are several ways how to design these weights. Beside expert evaluations, several setting-up methods for w_i were proposed, e.g. reinforcement learning, neural networks and genetic algorithms.
MDA is based on dynamical assignment of the attacker and defender roles. Beside the goalie, whose role is given statically for the whole game, the other robots change their tasks according to their positions relative to the goals (own or opponent) and the costs of obstacle avoidance. Each robot (beside the goalie) computes (17) – (19) in each sampling step and offers them to two auctions — one for choosing the attacker and one for the defender. The robot with the lowest value of C_ES becomes the attacker and, similarly, the robot with the lowest C_D takes over the role of the defender. Other robots can be in the position of secondary attackers or further defenders. If, as is most common in robotic soccer, there are only three robots — one goalie, one attacker and one defender — the task is simplified. It is necessary to calculate only (17), and the defender will be the remaining robot. The whole role assigning algorithm is depicted in Figure 16, where in the module of role assignment a given robot is assigned either the role of a defender or some auxiliary position. This approach was successfully used in SPL [32], too.
Figure 16. Role assigning algorithm using MDA.
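To illustrate how (17) – (19) and the two auctions of Figure 16 might be combined, a small Python sketch follows. The weight values, the cost figures and the exclusion of the attacker from the defender auction are assumptions made for this example, not the settings of [31].

from dataclasses import dataclass

# Illustrative playing-strategy weights w1..w7 from (17)-(19); not the values of [31].
W = dict(w1=1.0, w2=0.5, w3=2.0, w4=2.0, w5=1.0, w6=0.5, w7=2.0)

@dataclass
class Bid:
    robot: str
    t_d: float     # translation cost to the target position
    t_a: float     # rotation cost
    coa_g: float   # obstacle avoidance towards the opponent goal area
    coa_b: float   # obstacle avoidance towards the ball
    coa_d: float   # obstacle avoidance towards the own defensive midpoint

    def c_es(self):   # (17) estimated score costs, offered to the auctioneer
        return W["w1"] * self.t_d + W["w2"] * self.t_a + W["w3"] * self.coa_g

    def c_a(self):    # (18) real attacker costs, adding the path to the ball
        return self.c_es() + W["w4"] * self.coa_b

    def c_d(self):    # (19) defender costs
        return W["w5"] * self.t_d + W["w6"] * self.t_a + W["w7"] * self.coa_d

def assign_roles(bids):
    """Two auctions per sampling step: the lowest C_ES wins the attacker role,
    the lowest C_D among the remaining robots the defender role."""
    attacker = min(bids, key=Bid.c_es)
    rest = [b for b in bids if b is not attacker]
    defender = min(rest, key=Bid.c_d) if rest else None
    return attacker.robot, (defender.robot if defender else None)

# Example with two field players (the goalie keeps its static role).
bids = [Bid("R1", 1.2, 0.3, 0.5, 0.2, 1.0), Bid("R2", 2.5, 0.1, 0.2, 0.4, 0.3)]
print(assign_roles(bids))   # -> ('R1', 'R2') for these illustrative numbers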
In [33] a hybrid Integrated Decision-making system of Players (IDP) was designed, which connects MDA and FCMs. Its basic structure is shown in Figure 17. Cyclically, at certain time steps, vision with image processing and scene recognition are performed, followed by the strategy choice realized by MDA. After that, the possibility to kick the ball is verified. If a robot is in the kicking position, regardless of its current role, it will try to kick the ball in correspondence with its role. Otherwise the robots continue their movements during one time step and then the playing situation is evaluated again. Parameters for the equations (17) – (19) are obtained from cameras, and COA values are estimated in a special navigation module proposed in [34], whose output is utilized in the module of strategy choice.
Figure 17. Basic architecture of the IDP approach.
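The cycle of Figure 17 can be outlined as below; every method of the hypothetical IDPRobot class is only a placeholder for the corresponding IDP module (vision, MDA strategy choice, the kicking decision and the low-level controller), with the simplest possible toy behaviour.

import random

class IDPRobot:
    """Hypothetical stand-in for one player; each method is a placeholder."""
    def perceive(self):                      # vision, image processing, scene recognition
        return {"ball_dist": random.uniform(0.0, 3.0)}
    def choose_strategy(self, scene):        # MDA auctions using (17)-(19)
        return "attacker" if scene["ball_dist"] < 1.5 else "defender"
    def can_kick(self, scene):               # "Possible kick?" module
        return scene["ball_dist"] < 0.3
    def select_kick_target(self, scene):     # kicking FCM (Figure 18)
        return "goal"
    def kick(self, target):                  # low-level controller drives the actuators
        print("kick towards", target)
    def check_stability(self):               # detect a crash or fall after the kick
        pass
    def move_one_step(self, role, dt):       # continue movement for one time step
        print(role, "moves for", dt, "s")

def idp_cycle(robot, steps=5, dt=0.1):
    """One pass per time step: perceive, choose the strategy by MDA, then
    either kick or keep moving and re-evaluate the playing situation."""
    for _ in range(steps):
        scene = robot.perceive()
        role = robot.choose_strategy(scene)
        if robot.can_kick(scene):
            robot.kick(robot.select_kick_target(scene))
            robot.check_stability()
        else:
            robot.move_one_step(role, dt)

idp_cycle(IDPRobot())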
In MDA the whole strategy of play is implicitly formulated rather offensively than defensively, because the players always try to win and kick the ball regardless of their role. The formulae (17) – (19) serve for role assignment, and the weights w_i concretely determine the conditions under which the individual roles are assigned. However, how individual players play depends on the particular rules that we define for each role. We can propose either only simple rules, e.g. a defender will always try to pass the ball to the attacker, or more complicated sets of rules can be defined, combining e.g. passing and dribbling or mutual passing among several players. In IDP these rules are implemented in FCMs (in the strategy choice module).
The decision-making module about the kicking possibility (Possible kick?) detects whether a robot is already able to kick the ball according to the instructions of its role or not. If yes, then it determines the kicking target (goal or teammate), invokes a robot controller from the low control level for controlling all its actuators, and the robot kicks. The decision-making module also checks whether kicking has caused any instabilities (e.g. a crash or a fall). For determining the kicking target a special system is proposed, based also on FCMs, where the shooting and catching rules are included (see Figure 18); it is a part of the overall IDP FCM. Each player has its range of interest, which is projected into the supports of the membership functions used in the nodes TM_i (see Figure 19). Players try to move just within this range. The wider the support, the bigger the range of interest. The kicking FCM differs from other parts of the overall FCM in the fact that it is dynamic, i.e. its membership functions and weights of connections change in time and depend on the current playing situation. Peaks of the membership functions in the nodes TM_i represent angles of view of the players seen from the shooter's perspective, and similarly for the opponent O. The successfulness of passing the ball depends on the mutual positions of the players as well as on the shooter's speed. Therefore, the weights of the connections w_si are inversely proportional to the distances between the shooter and the teammates. Values of the nodes A_j are also inversely proportional to the size of the angular rotation needed by the shooter. These nodes (A_j) represent possible angular ranges (intervals) of shooting to individual teammates. The calculation of the resultant kicking angle is performed in the node S.
Figure 18. General structure of the kicking FCM.
In each time step new membership functions in the nodes TM_i and O are set up, having their peaks at the measured angle values of the players' positions. Simultaneously, the weights w_si and the values in the nodes A_j are calculated in the already mentioned manner. As the membership functions in the nodes TM_i should help attract the ball and, conversely, the function in the node O should repulse it, this function is constructed in a reverse manner to the other functions, i.e. it has its minimum at the measured angle of the opponent, see Figure 19. Then the values of the nodes A_j are multiplied by the weights w_si. Thus we get the total costs of shooting, which include rotation and distance. The functions TM_i are scaled using these costs, and finally in the node S all membership functions are united using a t-conorm S_K.
Figure 19. Example of the evaluation process in a kicking FCM based on membership functions TM_i and O.
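A possible reading of this evaluation in code is sketched below. The triangular shape of the membership functions, the 1/(1 + x) scalings standing in for w_si and A_j, the choice of max as the t-conorm S_K and the multiplicative combination with the repulsive node O are all assumptions made for illustration; the concrete forms are those of [33].

import numpy as np

def tri(angles, peak, width):
    """Triangular membership function peaked at `peak` (shape is an assumption)."""
    return np.clip(1.0 - np.abs(angles - peak) / width, 0.0, 1.0)

def kicking_angle(teammates, opponent, shooter_heading,
                  angles=np.linspace(-np.pi, np.pi, 361), width=0.6):
    """Sketch of the kicking FCM (Figures 18-19): teammates is a list of
    (angle, distance) pairs seen from the shooter, opponent is the opponent's angle."""
    combined = np.zeros_like(angles)
    for angle_i, dist_i in teammates:
        tm_i = tri(angles, angle_i, width)                    # node TM_i, peak at the teammate
        w_si = 1.0 / (1.0 + dist_i)                           # weight ~ inverse distance
        a_j = 1.0 / (1.0 + abs(angle_i - shooter_heading))    # node A_j ~ inverse rotation
        combined = np.maximum(combined, w_si * a_j * tm_i)    # t-conorm S_K (here: max)
    combined *= 1.0 - tri(angles, opponent, width)            # node O repulses the ball
    return angles[np.argmax(combined)]                        # node S: resultant kicking angle

# Example: teammates at -0.4 rad (2 m away) and 0.9 rad (1 m away), opponent at 0.9 rad.
print(kicking_angle([(-0.4, 2.0), (0.9, 1.0)], opponent=0.9, shooter_heading=0.0))
# -> an angle close to -0.4 rad, i.e. the unmarked teammate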
3. Concluding Remarks and Outlook into the Future
In this chapter an overview of some means and decision-making systems has been given. As robotic soccer represents a set of tasks and problems that appear in real applications, it was chosen for demonstrating these means. It is only a short and incomplete digest from the variety of used methods. We can observe that there are efforts to use means from various research areas, starting from conventional control theory and ending in artificial intelligence, as well as to interconnect them although they can be of very diverse principles. There is only one criterion for their choice — the efficiency of an individual means for a given individual task. It is often remarkable how these means can mutually cooperate. Hybridization has become a watchword in constructing decision-making systems.
As the present technological level of robotics does not yet enable using overly complex algorithms and decision structures, designers are still forced to develop computation-saving systems, which is reflected in their decision structures, too. However, in this area we can expect a very rapid change within only a few years. Especially in the last five years, many advanced mobile robotic platforms have appeared with relatively powerful processors as well as quality sensors and actuators, e.g. the robotic dog Aibo or the bipedal robot Nao.
In the future we can suppose that research in robotics will be more oriented towards interactions with humans, which will lead to the use of emotional models and fuzzy logic to a greater extent. We will observe the development of decision-making as well as control systems from precise sensing and description of the environment (world) towards systems able to work in uncertain and hardly predictable environments. Now the period of using probabilistic and uncertain methods is starting in robotics, too.
References
[1] J. Baltes, N. M. Mayer, J. Anderson, K.-Y. Tu and A. Liu, The humanoid leagues in robot soccer competitions, In Proc. of the IJCAI Workshop on Competitions in Artificial Intelligence and Robotics, Pasadena, California, USA (2009), 9–16.
[2] D. García, E. Carbajal, C. Quintana, E. Torres, I. Gonzalez, C. Bustamante and L. Garrido, Borregos: Team description, In RoboCup papers (2010).
[3] H. P. Störr, A compact fuzzy extension of the naive Bayesian classification algorithm, In Proc. InTech/VJFuzzy (2002), 172–177.
[4] C. Bustamante, L. Garrido and R. Soto, Fuzzy naive Bayesian classification in RoboSoccer 3D: A hybrid approach to decision making, In Proc. of the RoboCup International Symposium, Bremen, Germany (2006), 507–515.
[5] T. Hester, M. Quinlan and P. Stone, Generalized model learning for reinforcement learning on a humanoid robot, In Proc. IEEE Int. Conf. on Robotics and Automation (ICRA), Anchorage, USA (2010), 2369–2374.
[6] J. R. Quinlan, C4.5: Programs for Machine Learning, Morgan Kaufmann Publishers, 1993.
[7] S. Zickler and M. Veloso, Playing creative soccer: Randomized behavioral kinodynamic planning of robot tactics, In Proc. of RoboCup 2008: Robot Soccer World Cup XII (2008), 414–425.
[8] S. M. LaValle, Planning Algorithms, Cambridge University Press, Cambridge, United Kingdom, 2006.
[9] J. M. Vidal, Fundamentals of Multiagent Systems with NetLogo Examples, Unpublished, 2010.
[10] J. Brindza, A. Lee, A. Majumdar, B. Scharfman, A. Schneider, R. Shor and D. Lee, UPennalizers: Team report, In RoboCup papers (2009).
[11] C. Watkins, Learning from Delayed Rewards, PhD thesis, University of Cambridge, 1989.
[12] S. Barrett, K. Genter, M. Hausknecht, T. Hester, P. Khandelwal, J. Lee, A. Setapen, A. Tian, M. Quinlan, M. Sridharan and P. Stone, TT-UT Austin Villa: Team description, In RoboCup papers (2010).
[13] T. Hester and P. Stone, Generalized model learning for reinforcement learning in factored domains, In Proc. of the 8th International Conference on Autonomous Agents and Multiagent Systems (AAMAS), Budapest, Hungary (2009), 717–724.
[14] S. Oblak, I. Škrjanc and S. Blažič, If approximating nonlinear areas, then consider fuzzy systems, IEEE Potentials 25 (2006), 18–23.
[15] R. Axelrod, Structure of Decision: The Cognitive Maps of Political Elites, Princeton University Press, 1976.
[16] P. P. Groumpos, Fuzzy cognitive maps: Basic theories and their application to complex systems, In Fuzzy Cognitive Maps, Studies in Fuzziness and Soft Computing 247 (2010), 1–23.
[17] B. Kosko, Fuzzy cognitive maps, International Journal of Man-Machine Studies 24 (1986), 65–75.
[18] S. M. Chen, Cognitive-map-based decision analysis based on NPN logics, Fuzzy Sets and Systems 71 (1995), 155–163.
[19] W. Stach, L. Kurgan, W. Pedrycz and M. Reformat, Genetic learning of fuzzy cognitive maps, Fuzzy Sets and Systems 153 (2005), 371–401.
[20] E. I. Papageorgiou, K. E. Parsopoulos, C. D. Stylios, P. P. Groumpos and M. N. Vrahatis, Fuzzy cognitive maps learning using particle swarm optimization, International Journal of Intelligent Information Systems 25 (2005), 95–121.
[21] S. Bueno and J. L. Salmeron, Benchmarking main activation functions in fuzzy cognitive maps, Expert Systems with Applications 36 (2009), 5221–5229.
[22] J. Vaščák and M. Paľa, Adaptation of fuzzy cognitive maps for navigation purposes by migration algorithms, International Journal of Artificial Intelligence (in press).
[23] Z. C. Johanyák and S. Kovács, A brief survey and comparison on various interpolation-based fuzzy reasoning methods, Acta Polytechnica Hungarica 3 (2006), 91–105.
[24] P. Stone and M. Veloso, Multiagent systems: A survey from a machine learning perspective, Autonomous Robots 8 (2000), 345–383.
[25] H. S. Nwana, Software agents: An overview, The Knowledge Engineering Review 11 (1996), 205–244.
[26] M. J. Matarić, Learning in behavior-based multi-robot systems: Policies, models, and other agents, Cognitive Systems Research 2 (2001), 81–93.
[27] L. E. Parker, Evaluating success in autonomous multi-robot teams: Experiences from ALLIANCE architecture implementations, Journal of Experimental and Theoretical Artificial Intelligence 13 (2001), 95–98.
[28] J. Bruce, S. Zickler, M. Licitra and M. Veloso, CMDragons: Dynamic passing and strategy on a champion robot soccer team, In Proc. IEEE Int. Conf. on Robotics and Automation (ICRA), Pasadena, USA (2008), 4074–4079.
[29] L. E. Parker, ALLIANCE: An architecture for fault tolerant multirobot cooperation, IEEE Transactions on Robotics and Automation 14 (1998), 220–240.
[30] B. P. Gerkey and M. J. Matarić, Sold!: Auction methods for multi-robot coordination, IEEE Transactions on Robotics and Automation 18 (2002), 758–786.
[31] H. Köse, C. Mericli, K. Kaplan and H. L. Akin, All bids for one and one does for all: Market-driven multi-agent collaboration in robot soccer domain, In Proc. 18th International Symposium of Computer and Information Sciences (ISCIS), Antalya, Turkey (2003), 529–536.
[32] H. L. Akin, T. Mericli, N. E. Özkucur, C. Kavaklioglu and B. Gökce, Cerberus'10 SPL team, In RoboCup papers (2010).
[33] J. Vaščák and K. Hirota, Integrated decision-making system for robot soccer, Journal of Advanced Computational Intelligence and Intelligent Informatics 15 (2011), 156–163.
[34] C. Pozna, F. Troester, R. E. Precup, J. K. Tar and S. Preitl, On the design of an obstacle avoiding trajectory: Method and simulation, Mathematics and Computers in Simulation 79 (2009), 2211–2226.