DecisionMaking Systems in Mobile
Robotics
Ján VAŠ
ˇ
CÁK
Technical University of Košice,Slovakia
Abstract.This chapter deals with description of means for decisionmaking in mo
bile robotics as well as some integrated decisionmaking systems.They are mostly
oriented to applications in robotic soccer because it enables to solve a number
of robotic problems from optimal control of individual actuators up to construct
ing complex multiagent systems for teams of robots.The spectrum of nowadays
decisionmaking means that are used in robotics is much spread and it is not possi
ble to mention themall.Therefore,there are described only such means that seems
to be perspective for their use in robotic decisionmaking processes like naive
Bayes classiﬁers,decision trees,Markov decision processes and fuzzy cognitive
maps.Based on these means several autonomous decision systems are described.
Keywords.decisionmaking systems,mobile robotics,robotic soccer,multiagent
systems
Introduction
Robots are devices of diverse constructions,which perform various tasks and work in
manifold environments.Therefore there are possible not only many different decision
and control structures but also the deﬁnition of the notion decision can vary from one
application to another one.We can meet beside this notion also with expressions like
behaviour,strategy choice or control only.Because the border line between decision and
control is also not clear.This fact is not only caused by deﬁnition views of researchers
but also by a fact there are performed many mutually interconnected processes and their
separation would be often suspicious.
Mobile robots and especially humanoids represent the top level of robotics.Because
of their complexity they are the best objects for depicting all possible robotic activities
and related tasks as sensing,especially vision,lowlevel control,image recognition,lo
calization,strategy choice,navigation and planning.In all of the mentioned activities de
cision is contained in any form.Therefore we will concentrate our focus just on mobile
robots in this chapter.Robots can be placed either in static or dynamic environments,
where their elements are changed in time.Further,they work individually or as members
of a team,which requires additional abilities to communicate as well as to cooperate
each other.To show basic information modules of such a robot,relations among them
and possible approaches how to implement decisionmaking methods being convenient
for robotics it is useful to propose a complex task,where all these elements would be
presented.Development of the mentioned tasks and means for their solution and their
composition into a meaningful entity led to establishing robotic soccer,which will be
mostly our benchmark problem,where we will show individual means as elements of
complex decisionmaking systems.
Figure 1 shows basic information modules and relations divided into individual De
cision Levels (DL) from DL0 to DL2.These DLs do not differ only by their subordi
nation (DL0 as the lowest and DL2 as the highest) and their tasks but also by means
used.
Figure 1.Basic information structure of a mobile robot conﬁgured by decision levels DL0 – DL2.
Control can be also viewed as a special kind of decisionmaking,although its meth
ods are far from human imaginations about decisionmaking and approaches.However,
required reaction times for the lowest DL are very short so DL0 utilizes mostly ’classi
cal’ approaches like PID controllers and Kalman ﬁlters based on analytical description
of the inverse kinematics task by physical equations.Another advantage of these meth
ods is their ability of doing precise analyses concerning mainly the stability and robust
ness.These means are used for creating fundamental motion primitives as e.g.stepping,
turning of wheals or obstacle avoidance,which are then combined by instructions from
higher DLs.A special but not the least module is reactive navigation which solves colli
sions and because of very quick reactions required it is placed into this DL.Processing
sensor data can be divided into two basic groups:(a) basic (rough) processing and (b)
advanced one.Especially for mobile robots there are visual data from cameras the most
signiﬁcant although the data processing is not limited only to vision.Similarly as motion
control the ﬁrst group is based on primary operations like ﬁltering,image segmentation,
edge detection using gradients,etc.The output is in the case of vision a compressed im
age with potentially extractable (but not yet extracted) and highlighted information rele
vant for decisionmaking.As seen DL0 modules use the most strange methods as well as
mathematical calculi fromthe viewpoint of a human and represent a special independent
research area.However,they are at the beginning in formof basic sensor data processing
as well as the end as motion control of the whole decisionmaking chain and therefore
their omitting would lose completeness of this introductory overview.As the means used
in DL1 and DL2 have much more common properties we will further deal with these
two ones.
DL1 is an overbuilt on DL0,which enables to a robot to act independently.The ap
proaches used reﬂect much more human manners than in the case of DL0.Many of them
belong into the area of artiﬁcial intelligence like decision trees,reinforcement learning,
rulebased systems,fuzzy logic,cognitive maps,etc.The ability of adaptivity and self
learning became an inseparable part of this DL,too.Thus we can include all adaptation
methods,e.g.evolutionary algorithms and neural networks.The advanced sensor data
processing is practically the second stage after the basic one is done.The compressed
image is still in form of rough data.However,we need to extract e.g.obstacles,target,
paths,etc.fromthe image,i.e.ﬁrstly to recognize,secondly to localize themand eventu
ally to describe themin a more detail.In reality,we need to extract froma data pixel ma
trix a scene with relevant objects and mutual relations,which is the extracted image in
formation.Based on this information we can do some decisions.Basically,we can divide
decisionmaking into two stages (in simpler designs they merge into one) —a strategic
and a tactic one.Doing a border line between these two stages is much speculative.They
depend on individual designs but roughly we can think about strategy in relation to be
haviour.For instance,in a soccer (football) there are defenders,attackers,goalies,etc.
Principally,any robot can do any of these roles.Their assignment will depend only on
the type of strategy,e.g.defensive or offensive one.For successful performance of the
strategy choice module the situation evaluation is the most important.Resulting activi
ties fromthis module like planning of activities,choice of a convenient path,etc.belong
to the subordinated tactic stage.If we consider only one robot in a given environment
without any other ones DL1 will be the top level and not any additional modules are
necessary.
However,if there are at least two robots doing their tasks for the same goal they need
to communicate and to cooperate.In such a case we get a Multiagent System (MAS),
which represents the highest level DL2 and supersedes the strategic stage in DL1 in
a large measure.For solving complex robotic tasks MAS becomes intrinsic part of any
decisionmaking design and therefore we will deal with MAS in a special section.Of
course,all data,information and control ﬂows create a very complex structure and hereby
they require special programming techniques for their handling like memory sharing,
assigning priorities,etc.Such a managing system represents a decisionmaking system
about itself.However,its description needs extending to programming techniques and
therefore it will not be a subject of this chapter.
In next section 1 some basic means used in decisionmaking systems will be
described.As our benchmark problem is robotic soccer we will concentrate our at
tention just on the means used in this area.Nowadays,two main associations ex
ist,which deal with robotic soccer — FIRA (http://www.ﬁra.net/) and RoboCup
(http://www.robocup.org/).Their characterization,common as well as different features
are described in [1].These associations offer a number of various competitions and we
will focus particularly on the Standard Platform League (SPL),see (http://www.tzi.de/
spl/bin/view/Website/WebHome),where descriptions and reports of participating teams
can be found.They will be cited like RoboCup papers.Although all designs use hybrid
approaches of several mixed approaches one fact is obvious.More than one half of teams
utilize less or more means of artiﬁcial intelligence and their popularity is steadily grow
ing.This is the reason why we will concentrate in descriptions mainly on artiﬁcial intel
ligence means.After that some autonomous robotic decisionmaking systems composed
of these means will be featured in the section 2.
1.Means for DecisionMaking in Robotics
Decisionmaking processes need two basic groups of means:(a) process description and
(b) object description.The means of the ﬁrst group are mostly graphbased approaches
like decision trees,Markov chains and related Markov decision processes,cognitive
maps,ﬁnite automata and Petri nets,which enable to observe information ﬂows as well
as transitions among states (situations) by actions.Objects (the second group) can be of
diverse matter and therefore we need a palette of convenient means for their description,
e.g.probabilistic and fuzzy models or emotional models for interaction man–machine.
Usually,objects are by means of the ﬁrst group projected into graph nodes.Of course,
such a division of means is ﬁctive in praxis and they are mutually interconnected.It
should only help us to realize the matter of these two aspects.All these approaches are
able to describe most parts of the same information.The only difference is their knowl
edge representation.Each type of knowledge representation has some strengths,which
are useful for a given application.
As the fundamental knowledge representation the rule base can be considered and
others may be derived from it.Like an example we can mention the rule base of a goal
keeper [2]:
1.
Search the ball.
2.
Calculate direction and distance to the ball.
3.
If the ball is far then go to the center of the goal.
4.
If the ball is close and straight then wait.
5.
If the ball is close,not moving and straight then go to kick it.
6.
If the ball is close and not straight then walk sideways.
Figure 2.A rule set of the goal keeper’s behaviour.
Figure 2 depicts only a very simple and uncertain rule base.The descriptions of
notions like walk sideways or close are missing.The problem of obstacles in form of
players is not solved,etc.Besides,rules create usually decision (implication) chains and
even closed loops.We can see adding other rules will cause such a rule base will be hard
readable.This is just one reason why other more advanced means for decisionmaking
process description are necessary.In next parts of this section some of mentioned means
will be described on examples of robotic soccer applications.
1.1.Fuzzy Naive Bayes Classiﬁer
In any task there is a set of possible actions and robots decide,which one of themwill be
performed next time,e.g.to kick a ball to a goal or rather to pass it to a teammate,etc.We
must take into account also a fact that robots are strongly limited by computational power
and sampling times (time intervals,whose robots must react in),which requires ﬁrst of all
simple solutions.Naive Bayes Classiﬁers (NBC) correspond just to these requirements.
Although their main area of use is in pattern recognition but some applications appear
also in robotics.
Let us suppose a set (class) of possible actions C = {c
1
;:::;c
k
}.If any action
could be performed some conditions must be fulﬁlled,which are described by feature
variables X
1
;:::;X
n
.Each X
i
has a ﬁnite set of values x
i
∈< x
i;min
;x
i;max
>.If
feature variables achieve certain values,which can be measured then we can ask about
possibility that C = c
j
will be chosen.Hence the conditional probability of occurrence
of C = c
j
named as conditional model and described generally as P(CX
1
;:::;X
n
)
can be computed using Bayes’ rule as
P(CX
1
;:::;X
n
) =
P(X
1
;:::;X
n
C):P(C)
P(X
1
;:::;X
n
)
:(1)
As the denominator in (1) is a constant being independent from any given action it can
be assumed like a normalization value and for further omitted without the loss of cor
rectness.In addition,if we suppose feature variables are mutually independent then (1)
can be rewritten as
P(CX
1
;:::;X
n
) = P(C):
n
∏
i=1
P(X
i
C):(2)
The expression (2),known as naive Bayes probabilistic model,shows we can handle one
dimensional distributions instead of a compound one in (1) hereby the task is simpliﬁed
sparing computational efforts and thus the socalled curse of dimensionality is obviated.
Moreover,if the conditional independence assumption is not fulﬁlled,which is a very
often case in praxis,in spite of that NBC will mostly work well.Robustness is a typical
feature of NBC.
NBCcombines its probabilistic model with a decision rule.In this case we search for
a hypothesis,which is the most probable.Therefore,we will use the socalled maximum
a posteriori decision rule and NBC is deﬁned as a function
NBC(a) = arg max
c
j
∈C
P(c
j
):
n
∏
i=1
P(x
i
c
j
);(3)
where X
i
= x
i
,C = c
j
and a is a combination of feature variables X
i
for given values
x
i
,i.e.a = {X
1
= x
1
;:::;X
n
= x
n
},which represents an example being classiﬁed
by NBC.The output of the classiﬁer is such an action c
j
,where (3) has the maximum
value.NBC can be rewritten into a set of k rules,where ranges of given feature variables
are deﬁned.If a not yet classiﬁed example enters the NBC evaluating (3) will result in an
action c
j
with the highest probability.In the case of decision,which of three teammates
a ball should be passed to,there will be three rules (each for one teammate) with nine
intervals for feature variables of distances D
PB
,D
OB
,D
T1
,D
T2
,D
T2
and angles ,
1
,
2
and
3
as shown in Figure 3.
Figure 3.Feature variables of NBC for passing;P – passer,T – teammate,O – opponent, – ball.
A classical NBC supposes for feature variables discrete intervals x
i
∈< x
i;min
;
x
i;max
> but in reality we operate with continuous intervals (in our case distances and
angles).Further,it could also be advantageous to deal with linguistic indications of these
variables like e.g.short distance or wide angle.The deﬁnitions for short and wide would
cover a range of values.Thus we get a ﬁnite number of linguistic values,which cover
intervals of feature variables continuously.In such a manner we obtain a fuzzy variant
of NBC (FNBC).In addition,fuzzy systems are typical with their robustness,which is
testiﬁed in this case by continuous transition from one action to another one.In [3] the
following modiﬁcation of NBC (3) is proposed:
FNBC(a) = arg max
c
j
∈C
P(c
j
):
n
∏
i=1
x
i;max
∑
x
i
=x
i;min
P(x
i
c
j
):
Lx
i
;(4)
where
Lx
i
is a membership function and Lx
i
stands for any linguistic indication like
short distance or wide angle,see Figure 4.
NBCs as well as FNBCs have another very important advantage.They can be con
structed automatically using a relatively small amount of r examples e = {X
1
=
x
1
;:::;X
n
= x
n
;C = c
j
} from the training set L (e ∈ L) using supervised learning,
i.e.we know desired relations between inputs (feature variables X
i
) and outputs (ac
tions c
j
).Just fromthis reason the transformation (1) was necessary.To calculate (4) the
following formulae were proposed:
(a) (b)
(c) (d)
Figure 4.Membership functions for feature variables:(a) distance D
PB
,(b) distances D
OB
and D
T
1
B
–
D
T
3
B
,(c) angle ,(d) angles
1
–
3
;S – small,M– medium,L – large,C – close,W– wide [4].
P(C = c
j
) =
1 +
∑
e∈L
e
c
j
r +k
;(5)
P(X
i
= x
i
) =
1 +
∑
e∈L
e
Lx
i
r +nmf
i
;(6)
P(X
i
= x
i
C = c
j
) =
1 +
∑
e∈L
e
Lx
i
:
e
c
j
nmf +
∑
e∈L
e
c
j
;(7)
where nmf is the number of all deﬁned membership functions for the feature variable
X
i
and
e
c
j
is a truth function of an action c
j
in a given example e.For correctness we
require for all membership functions the condition of fuzzy partition,i.e.
∑
∀i
i
(u) = 1,
u ∈ U (U as a general universe of discourse).
In [4] a slightly modiﬁed variation of FNBC for passing a ball is described,which
was used in [2] on a simulation platform.The scenario in Figure 3 is simpliﬁed only for
one teammate.A series of experiments for various conﬁgurations ball,passer,opponent
and teammate is performed.At the beginning of each experiment the feature variables are
recorded (D
PB
,D
OB
,D
T
,,).Then both the opponent and the teammate try to win
the ball.If the teammate gets the ball the experiment will be tagged as success otherwise
miss.In this case the classiﬁcation is dichotomic (either success or miss).After the learn
ing stage the classiﬁer is applied in a play with four teammates and four opponents in the
following way.The passer uses the proposed FNBC for evaluation of individual com
petitions,i.e.one teammate versus one opponent doing all possible combinations.The
lowest probability of success is stored for each teammate and ﬁnally the teammate with
maximum probability of success is chosen,which is the socalled minmax approach.
In spite of the fact that the real play is different from that one in the learning stage the
quality of such a classiﬁer is satisfactory,which again proves the robustness of FNBC.
1.2.Decision Trees
Decision Trees (DT) represent one of the oldest means for decisionmaking.They have
several signiﬁcant advantages like their simplicity and hereby ability to process with
large data sets.Further,they are able of generalization,where their nodes behave like
centers of clusters in a state space [5].In addition,they are easy to be analysed,e.g.
information gain or entropy calculation and other statistical evaluations.Finally,fromthe
viewpoint of comprehensibility they are of great importance like means for explanation
mechanisms and graphical representation of basic notions and their relations.From a
variety of DT learning methods the Quinlan’s C4.5 [6] and its commercial improvement
C5 are especially very popular.However,comparison of results obtained by conventional
DTs with NBCs proved a little smaller efﬁcacy for DTs in a number of applications,e.g.
in the problemof passing (section 1.1) [4].Only more advanced means derived fromthe
theory of DTs are able to overcome NBCs.
In next part we will focus on a special type of DTs,which was especially devel
oped for motion planning and was used as a part of the socalled behavioral control for
purposes of robotic soccer [7].Rapidly exploring Dense Trees (RDTs) [8] represent a
family of trees,whose main idea is to incrementally construct a search tree,which would
continuously improve its resolution.In this case decisionmaking is transformed into a
searching task to ﬁnd a route (trajectory) from a starting point q
S
to the goal q
G
.These
DTs are constructed with the help of a dense sequence compound from samples (i),
which describe given area C.If is a randomsequence then we will call such a DT as a
Rapidly exploring Random Tree (RRT),which is the objective of next considerations.
The RRT algorithm starts with the ﬁrst sample (1),which creates the ﬁrst node of
the tree.Then the second sample (2) will do the same and the nodes will be mutually
connected.After this initial step other samples will try to ﬁnd the nearest point q
n
of
such a gradually growing tree and to create a direct link between q
n
and (i).If q
n
is
a node then a new edge will be added.If q
n
lies somewhere on an existing edge then it
will be divided in the place of q
n
,which will become a new node connecting the graph
with (i).Thus in each step one or two newedges will arise and (i) will become a new
node.If between q
n
and (i) an obstacle exists then a new node q
O
will be created on
the border with the obstacle closer to the tree and (i) will be returned somewhere to the
sequence ,see Figure 5.
The RRT algorithm can be rewritten into a pseudocode as depicted in Figure 6,
where we can see the number of tree nodes is at least equivalent to the number of (i).
If they are dense enough and properly dispersed in the investigated area then we will get
an accurate map of free area S (S ⊆ C) with indicated obstacles and further limitations.
Individual edges are possible paths,which can be chained into a trajectory froma starting
point q
S
to a goal q
G
using any graph search algorithm.For generating a motion plan we
start creating the tree with q
S
and continue expanding (exploring the area) until a node
in q
G
(logical tag Path = 1) is created or after a given number of cycles we will give
up,i.e.Path = 0.The functions NEAREST and SEARCH_BORDER can be realized
variously.The main effort is in minimizing computational demands because these two
functions are often used.Also expanding the tree can be modiﬁed,e.g.as simultaneous
Figure 5.Process of creating a RRT;O – obstacle.
expanding of two trees fromq
S
and q
G
,respectively until they mutually merge.A more
detailed description of RDTs can be found in [8].
1.3.Markov DecisionMaking Processes and Reinforcement Learning
A real robot is a dynamic system,where its states change.To exactly describe it,we
need to consider its current state,actions and also partially history of its states,which
is in form of a series of signiﬁcant derivatives.It means a kind of memory is necessary
and the problem becomes too complicated (description as well as solving).However,if
the sampling is dense enough we can do a simpliﬁcation based on the assumption that a
new state s
′
of the robot will depend only on its current state s (s;s
′
∈ S,S – set of all
possible states) and performed action a.If we suppose there is no deterministic transition
s → a → s
′
we will describe it as a probability p in form of the transition function
T(s;a;s
′
),which is in other words a conditional probability p(s
′
s;a),too.Considering
states as nodes of a graph and actions as its edges we can depict all possible combinations
s → a → s
′
as seen on an example in Figure 7,where edge expressions a
i
:p
j
denote
probability p
j
that action a
i
will transform the input state of a given edge to the output
one.We can see it is required to keep a sum of all probabilities for any action a
i
of a
given node equal to one (the second Kolmogorov probability axiom).
Besides,a reward function r:S → R is deﬁned,which enables to sort individual
states.Then we will start a chain of transitions from one state to another one,etc.Each
transition will be rewarded.After a series of such transitions the immediate rewards r
will be in a way accumulated into a cumulative reward (utility) u.Of course,a robot
Path ←0
Initialize RRT
for i = 1 to K
q
n
←NEAREST(RRT;(i))
if q
n
̸=node
Split edge
Add q
n
as a node to RRT
end if
if edge (q
n
;(i)) ⊂ S
Add (i) as a node to RRT
Add edge (q
n
;(i))
if (i) = q
G
Path ←1
return
end if
else
Add node q
O
←SEARCH_BORDER(edge (q
n
;(i)))
Add edge (q
n
;q
O
)
Include (i) into
end if
end for
Figure 6.Pseudocode of RRT creating algorithm.
(agent) tries to maximize u.How to solve this task under mentioned conditions it is
the objective of the socalled Markov Decision Process (MDP),whose output is usually
optimal policy formulated as
∗
(s) = arg max
a∈A
∑
∀s
′
T(s;a;s
′
):u(s
′
):(8)
In other words, is a mapping from states to actions :S → A (A – the set of all
actions).The inﬂuence of future rewards is often required to be decreased by a discount
factor like e.g.
0
:r(s
1
)+
1
:r(s
2
)+
2
:r(s
3
):::for a sequence of states s
1
;s
2
;s
3
;:::
Considering a similar sequence of states we can estimate the cumulative reward u for
each state s and so we will get a systemof n Bellman equations (if there are n states) as
u(s) = r(s) + :max
a
∑
∀s
′
T(s;a;s
′
):u(s
′
):(9)
From computational complexity reasons the equations (9) are mostly solved iteratively.
A more detailed description can be found in [9].
We see that MDP simulates processes in time as well as it also gradually enhances
their quality and improves assigning correct actions to individual states.It is an adaptive
(selflearning) means,too.MDP is a general calculus and can be used not only in sim
ulation processes (e.g.epidemic spreading) but also in planning or multiagent systems.
In [10] MDP was used for localization of robots,where this approach was combined
with a Kalman ﬁlter.MDP was used for estimation of the rotation angle and Kalman
ﬁlter for estimation of translational robot’s position.Such hybrid approach simpliﬁed
Figure 7.Graphical representation of MDP —an example [9].
the complexity of this probabilistic task and hereby it shortened the computational time,
where the information fromsensors is not totaly precise.
However,a fully described MDP as in Figure 7 is often not available.In reality,
robots must operate in unknown areas where states,their relations and even rewards are
unknown.Thus MDP without some modiﬁcations is not usable.One of means for solving
such a problem is Reinforcement Learning (RL) representing a broad set of methods,
which use environment description based on MDP.For this reason the MDP scheme
is modiﬁed in such a sense that rewards are placed on the edges instead of the nodes
(states).So we get the reward function in formr(s;a):S ×A →R.In other words,for
each state s we search the best action a,which maximizes the cumulative reward u(s;a)
and Bellman equations are in form
u(s;a) = r(s;a) + :max
a
∑
∀s
′
T(s;a;s
′
):u(s
′
;a
′
):(10)
Although RL is a supervised type of learning unlike conventional supervised learn
ing RL does not use pairs input/desired output but it is concentrated like MDP on win
ning the biggest cumulative reward during a time period,i.e.ﬁnding the best policy
∗
(s).Methods for handling the maximization process divide RL approaches into two
groups:(a) modelfree and (b) modelbased methods.The ﬁrst group is concentrated on
searching the optimum action for each state.The main advantage is low computational
complexity but the quality is less than for the modelbased methods.Therefore methods
of the group (a) are convenient for systems with small amount of dynamics.A typical
representative is Qlearning [11].Methods of the group (b) at ﬁrst try to approximate
the transition and reward functions hereby creating their inner models,which actions are
simulated in.There is still one more very worth property of RLs.Similarly like NBCs
these types of algorithms do not need large data sets of training data,which is a very
important condition.Next we will describe a hybridized form of a modelbased RL us
ing DTs (RLDT),which was used for training a bipedal robot NAO to score penalty
kicks [12].
As computational complexity and related time responses are critical in robotic appli
cations the main idea of RLDT is to minimize the socalled exploration mode although
the obtained solution needs not to be the optimum.For a more efﬁcient exploration DTs
are used,which are able quickly to minimize the search area by generalizing similar
states into one common.DTs are used for constructing T(s;a;s
′
) and r(s;a) [5].If a
state s is characterized by n state variables X
i
then it is in form of a vector with val
ues x
i
,i.e.s = (x
1
;:::;x
n
),x
i
∈ X
i
.Thus using e.g.Quinlan’s C4.5 algorithm there
are constructed n +1 DTs for variables X
i
and the reward r(s;a).These DTs perform
predictions of the changes when their input is in form (a;x
1
;:::;x
n
).Each tree is pro
cessed resulting in a leaf dependent on the input.For transition DTs we get probabilities
of state variables changes p(x
c
i
s;a) (x
c
i
∈ X
c
i
as an element of the state change s
c
).
As we suppose the variables X
i
are independent then we will get a change of the total
transition function as
T(s
c
;a;s) =
n
∏
i=0
p(x
c
i
s;a):(11)
Finally,next state s
′
is computed as a sum of the current state s
m
and s
c
.A similar
treatment is in the case of the reward DT.This activity represents the UPDATE_MODEL
function used in the RLDT algorithm(see Figure 9).Figure 8 shows an example of a DT
for relative changes of the variable X if a robot is moving on a grid.DT ﬁrst investigates
whether the action ’go to the left’ has been performed.Then the decisionmaking ﬂowis
divided by individual results of given questions till a leaf (a rectangular node) is reached.
If a robot is in the position indicated by a bullet and it moves to one of the black ﬁelds
(there is important also the robot’s orientation) then DT will not generate any changes.
RLDT algorithm divides its processing into two separate stages.Beside the explo
ration mode there is also the socalled exploitation mode.During exploration the models
T(s;a;s
′
) and r(s;a) for new (yet unknown) states are updated.During exploitation we
suppose the models are correct and therefore the optimal policy is calculated in a way as
deﬁned for MDPs.The algorithmstarts with a state s,which is added to a set of already
visited states S
M
.At the same time s and all connected actions a construct state —ac
tion pairs,whose numbers of visits are counted in counters visits(s;a).After that a loop
will be started,where the action a with the highest cumulative reward u(s;a) is chosen
and the state s
′
is obtained.If it is not yet in S
M
it will be added.The function UP
DATE_MODEL is applied using C4.5 algorithm,which incrementally updates models
for T(s;a;s
′
) and r(s;a).If some updates are done the logical variable CH (change)
will be set to 1.At this moment the algorithm decides between the exploration and ex
ploitation mode with help of a heuristic incorporated in the function CHECK_MODEL.
If the prediction of an immediate reward is less than 40%of the maximum one then the
exploration will be switched on and the logical variable exp will be set to 1.This heuris
tic causes exhaustive exploring in the early stages,when the knowledge of the domain is
low.After gaining positive rewards the exploitation mode will be chosen.Finally,a new
optimal policy will be computed by the function COMPUTE_VALUES if the models
T(s;a;s
′
) and r(s;a) were changed.The loop counter is incremented and a new loop is
Figure 8.An example of a DT for change prediction of the variable X [5].
started with s
′
as s.The pseudocode of RLDT algorithmis depicted in Figure 9.Further
details of this algorithmare in [13].
Include s into S
M
∀a ∈ A:visits(s;a) ←0
loop
a ←arg max
∀a
u(s;a)
Execute a,obtain reward r(s;a),observe s
′
Increment visits(s;a)
if s
′
̸∈ S
M
Include s
′
into S
M
∀a ∈ A:visits(s
′
;a) ←0
end if
PerformUPDATE_MODEL
exp ←CHECK_MODEL
if CH = 1
COMPUTE_VALUES
end if
s ←s
′
end loop
Figure 9.Pseudocode of the general RLDT algorithm.
1.4.Fuzzy Cognitive Maps
As mentioned above rule bases are considered as the basic knowledge representation.
However,in the area of mobile robotics we have often to handle with decisions chains,
where many of them create closed loops.Thus individual rules are causally intercon
nected (see rows 1–2 in Figure 2) and cannot be ordered in a simple list of uniformIF —
THEN rules as it is usual for the socalled simple rulebased systems,e.g.rows 3–6 in
Figure 2.In such a case rules are not mutually independent.In a more complex rule
based system relations between rules become incomprehensible and a graphical means
is necessary to clearly catch them.Fuzzy Cognitive Maps (FCM) seem to be just such
an aid not only for knowledge representation but also for its analyzing and processing
complex rulebased systems [14].
The notion Cognitive Map (CM) was introduced by a political scientist Robert Axel
rod primarily to model social processes [15].It is an oriented graph where its nodes rep
resent notions and edges are causal relations.Mostly,notions are states or conditions and
edges are actions or transfer functions,which transform a state in a node to another one
in another node.CM is able to describe complex dynamic systems.It is possible to in
vestigate cycles,collisions,etc.and to deﬁne strengths (weights) of relations,too.Origi
nally,they were represented by three values −1;0 and 1.Another advantage is its human
friendly knowledge representation and ability to describe various types of relations (in a
more detail see e.g.[16]).
FCMrepresents an extension of CMand was proposed by Kosko in 1986 [17].The
extension is based on strength values that are froman interval [−1;1] as well as the nodes
can be represented by activation values from an interval [0;1] or by membership func
tions as well,see Figure 10.Strengths after their combining correspond to rule weights
in rule based systems,too.
Figure 10.An example of FCM.
There are two basic formal deﬁnitions of FCM[18] and [19].Further,the deﬁnition
by Chen will be used where FCMis deﬁned as a 4tuple:
FCM = (C;E;;);(12)
where:
C  ﬁnite set of cognitive units (nodes) described by their states
C = {C
1
;C
2
;:::;C
n
};
E  ﬁnite set of oriented connections (edges) between nodes
E = {e
11
;e
12
;:::;e
nn
};
 mapping :C →[0;1] (originally proposed as [−1;1]);
 mapping :E →[−1;1].
In other words, is a computational method for evaluating numerical values of the
set of nodes C.On behalf of correctness,it is necessary to mention that a cognitive unit
is represented by two values:(a) its symbolic (linguistic) meaning denoted as C
i
and (b)
its numerical activation value A
i
∈ [0;1].C
i
represents the qualitative aspect and A
i
the
quantitative one of a node.However,fromsimplicity reasons we can omit this distinction
and we will use the symbol C
i
although further we will handle only activation values.
On the other hand, represents the way how the weights of edges e
ij
∈ [−1;1] are
determined.The signs in this case deﬁne the character of relationship between nodes —
either strengthening (+) or weakening (−).
The set of connections E forms the socalled connection (adjacency) matrix,which
can be used for computing of new activation values of nodes C.For the example in
Figure 10 it will look as:
E =
0 −1 0 −0;5 1
−0;9 0 0;3 0 0
0 0 0 0 1
0 0 −0;4 0 −1
0 0 0 0;7 0
:(13)
Cognitive units are in each time step k in a certain state.Using E we can compute
their states for next time step k + 1 and thus repeatedly for further steps.Similarly as
for differential equations we can draw phase portraits.To preserve values in prescribed
limits a limiting (threshold or transform) function L is used,too.So we can compute the
states C
i
for k +1 as follows [19]:
C
i
(k +1) = L
n
∑
j=1
e
ij
:C
j
(k)
:(14)
The formula (14) can be generalized for computation of the whole state vector C
using a matrix product,i.e.C(k + 1) = L(E:C(k)),which is a special form of in
the FCMdeﬁnition.There exists still an incremental modiﬁcation of (14) where C(k) is
incremented by computed values,e.g.[20]:
C
i
(k +1) = L
C
i
(k) +
n
∑
j=1
e
ij
:C
j
(k)
:(15)
The primary role of the function Lis to keep activation values in the interval [0;1].A
number of functions fulﬁl this condition.However,as the most advantageous the sigmoid
function seems to be [21].For a special kind of FCMwith active inputs [22] any kind of
typical membership functions can be used as L.
If we compare a complex system like in Figure 10 and a simple system with m
independent rules as Figure 11 (e.g.rows 3–6 in Figure 2) we can see FCMs are an
extension of simple fuzzy rulebased systems.Fuzzy rules are totally independent in
simple systems because their consequents do not have any mutual inﬂuence,which is
possible only in simpler decision cases.Simple rulebased systems do not enable any
decision chains or representation of temporal information.From this point of view they
are only a very special and restrained case of FCMs,which can be depicted as an example
in Figure 11 where a set of mrules with antecedent parts LX
i
,which represent already
merged particular inputs like x
j
is LX
i
j
(x
j
∈ X) and outputs LU
i
(i = 1;:::;m)
is ﬁgured in form of a FCM.There is depicted the evaluation process resulting in an
accumulated (aggregated) value LU
c
being defuzziﬁed into a crisp form LU
∗
c
,too.A
more detailed description of fuzzy reasoning methods is e.g.in [23].
Figure 11.An example of FCMrepresenting a simple rule set.
2.Autonomous DecisionMaking Systems in Multiagent Environment
Mentioned means for decisionmaking support fromthe previous section represent some
elements of a decision and control structure,which formulates the main goal of a given
systemand interconnects these means to fulﬁl the goal.If we consider an individual robot
then there will be necessary only DL0 and DL1,see Figure 1.However,robotic soccer
is a teamplay and robots must be supplied by other abilities for their mutual cooperation,
which extend our basic structure to DL2,whose task is to create an overall structure for
control,decisionmaking,communication,cooperation and behaviour management of a
distributed system with several more or less independent units called as agents.Such an
overall structure is the objective of MAS.
The concept of MAS was inspired by several apparently different disciplines like
economy,sociology and biology and it represents common interconnections among
them.From the typological viewpoint of information sciences it originates from an in
tersection of distributed computing and artiﬁcial intelligence as a part of distributed ar
tiﬁcial intelligence [24],which differs fromdistributed computing using communication
and cooperation instead of parallelization and synchronization.
In our case agents play a key role of MAS.There does not exist any general deﬁnition
of an agent because this termis too widespread,existing on various levels of complexity
and it can be implemented in a number of forms.However,we can characterize it by
some minimal properties that must be fulﬁlled [25]:
1.
Autonomy:An agent is able to operate independently froma human.It has indi
vidual internal states and goals.Autonomy cannot be real without the proactivity
attribute,where it takes the initiative and does not wait passively for a stimulus
fromoutside.
2.
Cooperation:It is the core property,without it we cannot speak about any MAS.
It is the ability to interact with other agents (eventually human) using a commu
nication language.
3.
Learning:Maybe not always imperative but very useful property,which enables
to the agent to be ﬂexible and able to adapt to changes of environment.
Intersections of these properties enable us to deﬁne the following categories of agents:
collaborative agents,collaborative learning agents,interface agents and smart agents,
see Figure 12.For purposes of robotic soccer mainly collaborative and smart agents come
into consideration.If there is allowed any central command unit like a couch or captain
then also collaborative learning agents can be used.
Figure 12.Basic typology of agents by [25].
Another division of agents is possible by kinds of control and their approaches:
reactive,deliberative,hybrid and behaviourbased [26].
Reactive control represents the simplest type.Its approach can be expressed like:
"Do not think,act."Such an agent only reacts immediately to an external stimulus.It uses
a ﬁnite set of actions being known in advance and so it does not need any inner model
of the surrounding world or communication with other agents.From this reason such
agents are known as ’memoryless’.Their main advantage is computational simplicity
and speed.Their use is on the lowest decision (control) level based on direct inputs
fromsensors.Although,these agents seemto be very primitive but convenient growth of
their complexity can lead to apparently intelligent behaviour with help of the socalled
subsumption architecture,which belongs to this category,too.
Deliberative control is based on:"Think hard,then act."It is characteristic by proac
tive and intentional approach.Its main feature is communication with its surrounding
and creating own inner world model with the aim to reach determined goals.The world
model is strictly centralized and based on sequences state — action.Its purpose is to
design plans of actions.Beside apparent advantages there are also drawbacks like their
high information dependency and computational complexity.Such an approach strikes
on needs as prediction,processing inaccurate information,change of model in dynamic
environment,etc.Therefore,the use of deliberative control is reasonable on the highest
decision levels.
Hybrid approach is characteristic by:Think and act independently,in parallel."It
tries to eliminate the disadvantages of the two mentioned types of control.Comparing
real needs of robotic teams we can see the necessity of both approaches.Therefore,the
reactive as well as the deliberative architecture are merged into one unit but they work
independently each other.The reactive part ensures realtime responses on the level of
sensorial data,e.g.to avoid collisions with obstacles.The deliberative part works on high
decisionmaking levels using abstract world representation to make strategic decisions
and plans.
Behaviourbased approach is a little of different nature.Its basic idea is:"Think the
way,you act."This approach found its inspiration in biology.It is based on observed pat
terns of various kinds of behaviour.These behaviours are derived from interactions be
tween the agent and its environment.So incremental adaptation is supposed,where ﬁrst
basic (survival) actions are created and continuously more advanced patterns are added,
e.g.starting with simple obstacle avoidance up to e.g.passing a ball.Individual patterns
are stored in the system in form of distributed layers,which are mutually competitive.
This is the main difference to the hybrid type,which also determines the use of these two
control types.
Hybrid agents are used mostly in single robot control whereas behaviourbased con
trol is used in multirobot applications like robotic soccer,where a quick response to
changes of such a dynamic game is especially required.In next parts of this section we
will deal mainly with such MAS architectures,which are related ﬁrst of all to behaviour
based control.
There is still another division of agents in MAS,which are constructed over them:
(a) homogenous,i.e.agents are mutually equivalent (hardware as well as software) per
forming the same tasks and (b) heterogeneous if they differ by different roles (e.g.pos
sibility to incorporate hierarchies) although their hardware could be identical.Heteroge
neous agents are from the viewpoint of research and also of utilization potential more
interesting and therefore most MAS architectures try to solve communication and role
allocation among heterogeneous agents,too.
Summarizing mentioned facts we can state that MAS are suitable especially for
situations where agents have incomplete information as well as data are decentralized.In
addition,there is absent any control from a global level and calculations are performed
asynchronously.As robotic soccer can be characterized just by such properties it seems
to be an excellent candidate for using MAS as well.In general,MAS enable shortening
of computational response (thank to parallelism) as well as communication demands
because agents are functionally independent units.Besides,operability and reliability of
such a systemis enhanced because of ﬂexibility of MAS architectures,in general.
In next parts we will introduce some types of decisionmaking systems for robotic
teams where the principles of MAS can be more or less visible.However,till nowadays
there does not exist any decisionmaking system with full functionality of MAS as de
scribed above.We must take into consideration there are still lots of children’s problems,
which must be solved and robots do not yet own so powerful processors they would be
able to solve these problems in full breadth and depth.
2.1.Cooperative Control Architecture ALLIANCE
ALLIANCE is one of the most known control architectures for multirobot applica
tions [27].Although it is not directly used in robotic soccer because of its proper
ties,where the existence of adversaries is not supposed,but some of other approaches
(e.g.[28]) are more or less inspired by ALLIANCE.
ALLIANCE is proposed for mutual cooperation of heterogeneous robots in small
and medium sized teams.It is determined for dynamic environments,where problems
with faulty sensing or incomplete information can occur to some extent.Mutual com
munication is not always necessary and there is no central unit with complete informa
tion about the environment.Agents (robots) are fully independent.However,on the other
hand side,it is required that robots should be able at least partially to sense their position
and results of their activities as well as of that of their teammates,in general.First of
all,full cooperation among all robots is supposed,i.e.no tricks,no lying or even ad
verse behaviour.Just the last condition contradicts needs of robotic soccer but there are
many other application possibilities,where we can conform to these requirements,like
watching systems,objects (garbage) collecting or handling (box pushing).
As in praxis some problems with communication can happen it is more convenient
to be able to recognize behaviour of other robots to predict their future performance than
to ask them directly about their activities.Principles of ALLIANCE are based on such
an action recognition.The basic architecture of ALLIANCE is depicted in Figure 13,
which is implemented on each agent.We can see it is a fully distributed cooperative
control scheme of the behaviourbased architecture.It is based on behaviour sets,which
represent individual roles or modes.For instance,in our case a robot can change a role
of an attacker to a defender depending on the situation.These roles are connected with
various activities and behaviour on higher decision levels.A behaviour set can be either
active or nonactive at one moment.There can be active only one behaviour set for each
robot.On contrary,the lower control levels (indicated as layers) are allowed to be active
continuously,i.e.for collision prevention.The selection of a behaviour set is performed
by motivation behaviour,each for one behaviour set extra.Motivation behaviour evalu
ates suitability of a given behaviour set and activates or inhibits it,which is in Figure 13
indicated by gateshaped symbols.Therefore,these modules are mutually interconnected
by crossinhibitions.
The main idea of this approach is based on an assumption that each agent should
perform a task only as long as it demonstrates its ability to perform the desired effect
on the environment.It means,agents are selfchecking and only they alone decide about
their activities.Such a behaviour is based on two principal motivations —impatience
and acquiescence.The impatience helps the agent to act if other agents fail.On contrary,
if the agent is performing a task,which continues over a certain time limit,then it will
become more willing to interrupt it and to enable other agents to continue with a chance
it will be accomplished.At the moment if other agents start performing the same task
the given agent will give it up.As an example we can mention transferring boxes from
one place to another one and their placing into racks.If an agent (A) starts transferring
then the agent (B) will try e.g.to place theminto racks.It is caused by the acquiescence
Figure 13.The ALLIANCE architecture [29].
motivation of B.After a certain time Acan decelerate its activities,e.g.due to discharged
batteries,which can be observed that B must wait for new boxes.Then B will start also
transferring boxes,i.e.the impatience motivation is shown.Contemporarily,the agent A
will give up transferring boxes due to the acquiescence motivation and another behaviour
set will be activated,e.g.starting recharging batteries.In other words,the robot Ashould
allow to the robot B to ﬁnish its task before it will again become impatient to B.On
contrary,there is no reason for Ato stay inactive if a given task is unﬁnished and together
no other robot starts accomplishing it.If both motivations are balanced to be mutually
complementary then both agents will cooperate without any eventual conﬂicts.
A behaviour set is composed of activities,which together direct to accomplishing a
task.It depends on a concrete task whether these actions are mutually interconnected by
some decision rules or they are simply chained into a series.For instance,we can deﬁne
a task passing.How to realize it,what sequence of steps to choose (e.g.leading a ball,
turning to an appropriate kicking angle,deciding about eventual obstacle avoidance and
ﬁnally kicking it,etc.),it is the objective of concrete tasks contained in a behaviour set.
Therefore,each behaviour set is handled as a unit.
Similarly,as in alternating activities among robots also behaviour sets alternate mu
tually.As already mentioned at one moment only one behaviour set can be active on
the robot i.However,if it is active considerably long and another behaviour set j ex
ceeds a certain activation threshold of its motivation value m
ij
(t) then the ﬁrst set will
be inhibited and the another one activated.The motivation m
ij
(t) depends on several
factors described by functions like sensory feedback sf(t),interrobot communication
irc(t),suppression fromactive behaviour sets sabs(t),as well as already mentioned im
patience imp(t) and acquiescence acq(t).For a time step t the total motivation m
ij
(t)
(m
ij
(0) = 0) is calculated as
m
ij
(t) = (m
ij
(t −1) +imp(t)) ×sf(t) ×irc(t) ×sabs(t) ×acq(t):(16)
The functions sf(t),irc(t),sabs(t) and acq(t) are in formof threshold functions,whose
outputs are 0 or 1.These values indicate occurrence of some conditions for growing
m
ij
(t) (value 1) or resetting it to 0.The impatience function imp(t) deﬁnes the rate
of motivation growth.All these functions contain a number of parameters.To easier set
them up an adaptation (learning) modiﬁcation of ALLIANCE named as LALLIANCE
was designed.More detailed description of ALLIANCE as well as LALLIANCE can
be found in [29].
2.2.Behavioral Kinodynamic Planning
MAS approaches tend to divide a given control system into hierarchical levels.Such a
way is indeed systematic giving an inner order but individual control levels lose imme
diate contacts each other because it is restricted only to exactly deﬁned communication
links.Further,many tasks are in reality hardly to separate because they are conditionally
interconnected.Such a simpliﬁed division into modules and levels can be too rough.Thus
higher strategic levels lose the contact with physics of a robot and its environment.Simi
larly,lower control levels do not have any mind about strategy.Further,soccer (football)
is considerably characterized by its randomness.From these two reasons the socalled
Randomized Behavioral Kinodynamic Planning (RBKP) of tactics was proposed in [7].
RBKP is based on embedding predeﬁned basic skills and tactics into a kinody
namic planning framework,which utilizes physical principles of kinematics and dynam
ics.Such an approach is strongly problemoriented,i.e.it requires speciﬁc knowledge
but its efﬁcacy can be very high.Using RRTsbased planning (see section 1.2) enables
quickly to deﬁne tactics as the best nondeterministically modeled combination of skills.
Robotic soccer is thus seen as a randomized kinodynamic planning problem.The struc
ture of RBKP is depicted in Figure 14.If we compare this structure with Figure 1 we
can see the modules of strategy choice and action planning in DL1 are merged in this
approach.
For modelling tactics a nondeterministic Finite State Machine (FSM) is used.This
approach enables to generate various sequences of state transitions,i.e.tactics,which
reduces the behaviour predictability for opponents,too.Accordingly,skills will be also
nondeterministic.The system searches over a variety of simulated executions until it
ﬁnds such an instance that leads to the goal.The skills do not execute deterministic
actions based on the state of the environment but they have certain freedomto randomly
chose from alternatives of their typical behaviour.For instance,if a skill named kick
to goal tries usually to kick the ball to the biggest free area of the goal it can change
randomly its habit to kicking into a corner of the goal and so to surprise the opponent
goalie.In such a way the system becomes more creative experimenting with various
alternatives and thus it has chance continuously to ﬁnd optimal tactics as well as skills.
In other words,it is a selflearning approach.
To transformour task of robotic soccer into the graph space of RRTs it is necessary
at ﬁrst to deﬁne the state space X of robotic movement rm as well as inner states of
robots described by FSM states fsm together with information about time t,i.e.x =
(t;fsm;rm),where x ∈ X is a concrete state of the whole playground.The robotic
movement represents a subspace of X,where e.g.positions,rotations,velocities,etc.
of all objects of the playground are included,i.e.not only robots but also the ball and
Figure 14.Structure of the randomized behavioral kinodynamic planning system[7].
other obstacles,etc.Hence X is basically a multidimensional space,generally with a
high dimensionality.To simplify such a form of representation it will be transformed in
a space Qof decision trees,in our case RRTs.
The process of constructing behavioral kinodynamic planning is following.Using
the start state the tree T is initialized.Then using the function SAMPLE_RANDOM_
STATE,which uses internal probability distribution,a sample state x
sample
from the
state space in form of (i) will enter the RRT creating algorithm,see Figure 6.In com
parison to section 1.2 the calculation of the nearest point q
n
requires using a more so
phisticated metrics than using Euclidean one,see [7].This is provided by the function
KINEMATIC_NEAREST_NEIGHBOUR.As behaviour depends also on inner states of
the robot it is needed to calculate state transition using a nondeterministic FSM(function
FSM_TRANSITION),too.Then we verify whether this new internal state fsm_state
is valid (function IS_VALID_FSM_STATE),i.e.all conditions and limitations are kept.
For instance,there is a rule that only the goalie can operate in the penalty area.Others
are prohibited to enter it.If it is not fulﬁlled a new (i) is chosen.If the new state is ac
ceptable an action a,which is a skill,is calculated by APPLY_FSM_BEHAVIOUR.This
function associates concrete inner states with actions into pairs (fsm_state −a),which
is the task of user’s imagination about the style of playing and they can be expressed also
in form of rules like IF fsm_state THEN a.The action a is usually a vector with sev
eral parameters describing e.g.angles for individual joints.At this moment a and q
n
will
enter the kinodynamic simulator (function SIMULATE),the socalled Physics Engine in
Figure 14,where after a ∆t simulation a new state x
new
is computed.This value corre
sponds to a point on the line segment q
n
−(i).After a certain number of simulations
it will reach (i) or in the case of an obstacle the point q
O
.The engine includes still a
function IS_VALID_PHYSICS_STATE,which checks also from viewpoint of physics
that all limitations are fulﬁlled.If it is correct then x
new
will be as q
new
added to T.
Again a new x
sample
is generated and the whole cycle will be repeated until a goal will
be shot or we will give up (or the game will be timed out).If a goal or its surrounding
X
goal
is reached we can apply a search tree algorithm(Figure 14) in backward direction
until the initial state is reached.After reversing we will get a sequence of skills,i.e.a
tactics.In Figure 15 the pseudocode of the described RBKP algorithmis depicted,which
was used for twowheeled robots in RoboCup [28],too.
Initialize RRT
for i = 1 to K
(i) ←SAMPLE_RANDOM_STATE()
q
n
←KINEMATIC_NEAREST_NEIGHBOUR(T;(i))
fsm_state ←FSM_TRANSITION(q
n
)
if IS_VALID_FSM_STATE(fsm_state)
a ←APPLY_FSM_BEHAVIOUR(fsm_state;q
n
;(i))
x
new
←SIMULATE(q
n
;a;∆t)
if IS_VALID_PHYSICS_STATE(x
new
)
Add node q
new
to T
Add edge (q
n
;q
new
) with value a
if x
new
∈ X
goal
return x
new
end if
end if
end if
end for
return failed
Figure 15.Pseudocode of RBKP algorithm[7].
Although this method does not guarantee the optimality of solution but there is one
interesting potential of this method.Thanks to the randomness the opponents can pre
dict the behaviour of our team only very difﬁcult but we can predict the behaviour of
opponents.As the physics of each investigated point x is exhaustively described and if
we incorporate some basic and supposed efforts of opponents like e.g.effort always to
catch the ball then using our physics engine we can model opponents’ behaviour and so
to undertake some steps in advance.It is only necessary to construct a special FSMfor
opponents.
2.3.NegotiationBased Approaches
Previous two MAS methods have their own strengths but also limitations.ALLIANCE
is a very strong method in real applications,where the environment is uncertain,sensors
and actuators are inaccurate,there are noises of various origin or other agents simply
fail.However,this approach in its pure formis not convenient for applications,where we
meet with adversarial behaviour.Behavioral kinodynamic planning needs ﬁrst of all an
accurate physical model of a given domain and it can be a serious problem to obtain it.
On the other hand side,its unpredictability for opponents and quick planning makes it
very suitable just for adversarial environments.However,both approaches do not guar
antee optimality and full efﬁcacy.Just the socalled negotiationbased approaches try to
optimize cooperation among robots and to ﬁnd the best one for a given task.In general,
these approaches are fully decentralized.An agent,which needs to fulﬁl a task at ﬁrst
offers it e.g.by broadcasting to others.Then the agent with the best bid will be assigned
the given task and hereby it will be fully responsible for its accomplishing.There are var
ious modiﬁcations and extensions of this approach and further we will deal with the so
called marketdriven approach (MDA),which was derived from an auctionbased task
allocation systemMURDOCH [30].
MDA[31] is based on the principle of freemarkets,which utilize a supposition that
if each participant of a market tries to maximize its own proﬁt then the proﬁt of the
whole team will be increased,too.Of course,the communication among participants of
a common team is necessary and they interchange all necessary information.The team
tries for a common objective,in our case a game win.Usually,such a common objective
is reachable only after a series of steps (partial objectives) into them it is decomposed.
For instance,shooting a goal is a result of several combined steps (objectives or tasks) as
passing,catching and dribbling the ball.For each partial task an auction is done.There
is a virtual auctioneer and a group of bidders,i.e.players in this context.The bidders
compute supposed costs necessary to performa given task and offer themto the auction
eer.In the case of mobile robots the costs are in the formof time consumption for doing
translation or rotation during movement to a given position or obstacle avoidance.The
auction winner is that one offering the lowest task costs and its reward is the allowance to
performthis task.Incorporating further auction rules depends on the level of complexity
that we need to reach,e.g.embedding a certain momentumelement to prevent too often
exchange of player roles.
The concrete implementation is based on estimating these costs [31] for each robot
separately:
C
ES
= w
1
· t
d
+w
2
· t
a
+w
3
· COA
g
;(17)
C
A
= C
ES
+w
4
· COA
b
;(18)
C
D
= w
5
· t
d
+w
6
· t
a
+w
7
· COA
d
:(19)
The value C
ES
represents estimated score costs computed as a sum of costs required
to move for a speciﬁed distance t
d
and angle of rotation t
a
as well as costs caused by
obstacle avoidance (COA) on the path to the opponent goal area COA
g
.C
ES
is just the
value being offered to the auctioneer by the robot i,so the equations (17) – (19) should
be correctly indexed by i.However,real costs for the winner (the attacker) is by (18),
where it is still necessary to take into account costs of obstacle avoidance on the path to
the ball COA
b
,which are in the reality the ﬁrst calculated costs preceding C
ES
.How
ever,it is supposed that the initial goal position is more important then the position to
the ball,which can be passed by another teammate to the attacker,i.e.auction winner.
Similarly to (17),the defender’s costs are calculated,where COA
d
represents costs of
obstacle avoidance on the path to the midpoint on the line between the own goal and
the ball position.The weights w
1
;:::;w
7
determine the playing strategy.For instance,
if w
1
is a high value then the winning robot will try to shoot a goal only fromimmediate
distance to the opponent goal because only robots occurring in that area have the chance
to become an attacker.There are several ways how to design these weights.Beside ex
pert evaluations several settingup methods for w
i
were proposed,e.g.reinforcement
learning,neural networks and genetic algorithms.
MDA is based on dynamical assigning roles of attackers and defenders.Beside the
goalie whose role is given statically for the whole game another robots change their tasks
according their positions to the goals (own or opponent) and costs of obstacle avoidance.
Each robot (beside the goalie) computes in each sampling step (17) – (19) and offers
them to two auctions —for choosing one attacker and one defender.The robot with the
lowest value of C
ES
will become an attacker and similarly for the lowest C
D
it will
take over the role of a defender.Other robots can be in position of secondary attackers
or further defenders.If,as mostly in robotic soccer,there are only three robots —one
goalie,one attacker and one defender the task will be simpliﬁed.It will be necessary only
to calculate (17) and the defender will be the remaining robot.The whole role assigning
algorithmis depicted in Figure 16,where in the module of role assignment either the role
of a defender is assigned to a given robot or of any auxiliary position.This approach was
successfully used in SPL [32],too.
Figure 16.Role assigning algorithmusing MDA.
In [33] a hybrid Integrated Decisionmaking system of Players (IDP) was designed,
which connects MDA and FCMs.Its basic structure is in Figure 17.Cyclically,at cer
tain time steps there are performed vision with image processing and scene recognition
followed by strategy choice realized by MDA.After that the possibility to kick the ball
is veriﬁed.If a robot is in the kicking position regardless of its current role it will try
to kick the ball in correspondence with its role.Otherwise the robots will continue their
movements during one time step and then the playing situation will be again evaluated.
Parameters for equations (17) – (19) are obtained from cameras as well as COA values
are estimated in a special navigation module proposed in [34],whose output is utilized
in the module of strategy choice.
Figure 17.Basic architecture of IDP approach.
In MDA the whole strategy of play is formulated implicitly rather offensively than
defensively because players try always to win and kick the ball regardless of their role.
The formulae (17) – (19) serve for role assignment and the weights w
i
will concretely
determine conditions when individual roles will be assigned.However,how individual
players will play it depends on particular rules that we deﬁne for each role.We can pro
pose either only simple rules,e.g.a defender will try always to pass the ball to the attacker
or there can be deﬁned also more complicated sets of rules combining e.g.passing and
dribbling or mutual passing among several players.In IDP these rules are implemented
into FCMs (in the module strategy choice).
The decisionmaking module about kicking possibility (Possible kick?) detects a
state whether a robot is already able to kick the ball by the instructions of its role or not.
If yes,then it will determine the kicking target (goal or teammate) and invoke a robot
controller from the low control level for controlling all its actuators and the robot will
kick.The decisionmaking module checks whether kicking has caused any instabilities
(e.g.crash or downfall).For determining the kicking target there is proposed a special
system based also on FCMs,where the shooting and catching rules are included (see
Figure 18),which is a part of the overall IDP FCM.Each player has its range of interest,
which is projected into supports of membership functions used in the nodes TM
i
(see
Figure 19).Just in this range players try to move.The wider the support the bigger the
range of interest is.FCMof kicking differs fromother parts of the overall FCMin a fact it
is dynamic,i.e.its membership functions and weights of connections change in time and
depend on the current playing situation.Peaks of the membership functions in the nodes
TM
i
represent angles of view of players seen from the shooter’s perspective,similarly
for the opponent O.The successfulness of passing the ball depends on mutual positions
of players as well as on shooter’s speed.Therefore,the weights of the connections w
si
are indirectly proportional to distances between the shooter and teammates.Values of
nodes A
j
are also indirectly proportional to the size of the angular rotation needed for
the shooter.These nodes (A
j
) represent possible angular ranges (intervals) of shooting
to individual teammates.The calculation of resultant kicking angle is performed in the
node S.
Figure 18.General structure of the kicking FCM.
In each time step new membership functions in the nodes TM
i
and O are setup,
having their peaks at measured angle values of players’ positions.Simultaneously,the
weights w
si
and values in the nodes A
j
are calculated in the already mentioned manner.
As the membership functions in the nodes TM
i
should help attracting the ball and re
versely the function in the node Oshould repulse it thus this function will be constructed
in a reverse manner than other functions,i.e.it will be minimum at the measured angle
of the opponent,see Figure 19.Then the values of the nodes A
j
will be multiplied to
gether with the weights w
si
.Thus we get total costs of shooting that include rotation and
distance.The functions TM
i
will be scaled using these costs and ﬁnally in the node S
all membership functions will be united using a tconormS
K
.
Figure 19.Example of the evaluation process in a kicking FCMbased on membership functions TM
i
and O.
3.Concluding Remarks and Outlook into Future
In this chapter an overview about some means and decisionmaking systems has been
done.As robotic soccer represents a set of tasks and problems that appear in real appli
cations it was chosen for demonstrating these means.It is only a short and incomplete di
gest froma variety of used methods.We can observe there are efforts to use means from
various research areas starting from conventional control theory and ending in artiﬁcial
intelligence as well as to interconnect them although they can be of very diverse princi
ples.There is only one criterion for their choice —efﬁciency of an individual means for
a given individual task.It is often miraculous how these means can mutually cooperate.
Hybridization has become to a slogan in constructing decisionmaking systems.
As present technological level of robotics does not yet enable using too complex
algorithms and decision structures the designers are still forced to develop calculation
saving systems,which is reﬂected on their decision structures,too.However,in this area
we can await a very rapid change in only a few years.Especially,in last not only ﬁve
years many advanced mobile robotic platforms have appeared with relatively powerful
processors as well as quality sensors and actuators,e.g.robotic dog Aibo or bipedal robot
Nao,etc.
In future we can suppose the research in robotics will be more oriented on inter
actions with humans,which will lead to use of emotional models and fuzzy logic in a
greater measure.We will observe development of decisionmaking as well as control
systems fromprecise sensing and description of the environment (world) to systems be
ing able to work in uncertain and heavily predictable environments.Now the period of
using probabilistic and uncertain methods is starting in robotics,too.
References
[1]
J.Baltes,N.M.Mayer,J.Anderson,K.Y.Tu and A.Liu,The humanoid leagues in robot soccer
competitions,In Proc.of the IJCAI Workshop on Competitions in Artiﬁcial Intelligence and Robotics,
Pasadena,California,USA (2009),9–16.
[2]
D.García,E.Carbajal,C.Quintana,E.Torres,I.Gonzalez,C.Bustamante and L.Garrido,Borregos:
Teamdescription,In RoboCup papers (2010).
[3]
H.P.Störr,A compact fuzzy extension of the naive Bayesian classiﬁcation algorithm,In Proc.In
Tech/VJFuzzy (2002),172–177.
[4]
C.Bustamante,L.Garrido and R.Soto,Fuzzy naive Bayesian classiﬁcation in RoboSoccer 3D:Ahybrid
approach to decision making,In Proc.of the RoboCup International Symposium,Bremen,Germany
(2006),507–515.
[5]
T.Hester,M.Quinlan and P.Stone,Generalized model learning for reinforcement learning on a hu
manoid robot,In Proc.IEEE Int.Conf.on Robotics and Automation (ICRA),Anchorage,USA (2010),
2369–2374.
[6]
J.R.Quinlan,C4.5:Programs for Machine Learning,Morgan Kaufman Publishers,1993.
[7]
S.Zickler and M.Veloso,Playing creative soccer:Randomized behavioral kinodynamic planning of
robot tactics,In Proc.of RoboCup 2008:Robot Soccer World Cup XII (2008),414–425.
[8]
S.M.LaValle,Planning Algorithms,Cambridge University Press,Cambridge,United Kingdom,2006.
[9]
J.M.Vidal,Fundamentals of Multiagent Systems with NetLogo Examples,Unpublished,2010.
[10]
J.Brindza,A.Lee,A.Majumdar,B.Scharfman,A.Schneider,R.Shor and D.Lee,UPennalizers:Team
report,In RoboCup papers (2009).
[11]
C.Watkins,Learning from Delayed Rewards,PhD thesis,University of Cambridge,1989.
[12]
S.Barrett,K.Genter,M.Hausknecht,T.Hester,P.Khandelwal,J.Lee,A.Setapen,A.Tian,M.Quinlan,
M.Sridharan and P.Stone,TTUT Austin Villa:Teamdescription,In RoboCup papers (2010).
[13]
T.Hester and P.Stone,Generalized model learning for reinforcement learning in factored domains,
In Proc.of the 8th International Conference on Autonomous Agents and Multiagent Systems (AAMAS),
Budapest,Hungary (2009),717–724.
[14]
S.Oblak,I.Škrjanc and S.Blažiˇc,If approximating nonlinear areas,then consider fuzzy systems,IEEE
Potentials 25 (2006),18–23.
[15]
R.Axelrod,Structure of Decision:The Cognitive Maps of Political Elites,Princeton University Press,
1976.
[16]
P.P.Groumpos,Fuzzy cognitive maps:Basic theories and their application to complex systems,In
Fuzzy Cognitive Maps,Studies in Fuzziness and Soft Computing 247 (2010),1–23.
[17]
B.Kosko,Fuzzy cognitive maps,International Journal of ManMachine Studies 24 (1986),65–75.
[18]
S.M.Chen,Cognitivemapbased decision analysis based on NPN logics,Fuzzy Sets and Systems
71 (1995),155–163.
[19]
W.Stach,L.Kurgan,W.Pedrycz and M.Reformat,Genetic learning of fuzzy cognitive maps,Fuzzy
Sets and Systems 153 (2005),371–401.
[20]
E.I.Papageorgiou,K.E.Parsopoulos,C.D.Stylios,P.P.Groumpos and M.N.Vrahatis,Fuzzy cogni
tive maps learning using particle swarm optimization,International Journal of Intelligent Information
Systems 25 (2005),95–121.
[21]
S.Bueno and J.L.Salmeron,Benchmarking main activation functions in fuzzy cognitive maps,Expert
Systems Applications,36 (2009),5221–5229.
[22]
J.Vašˇcák and M.Pal’a,Adaptation of fuzzy cognitive maps for navigation purposes by migration
algorithms,International Journal of Artiﬁcial Intelligence (in press).
[23]
Z.C.Johanyák and S.Kovács,A brief survey and comparison on various interpolationbased fuzzy
reasoning methods,Acta Polytechnica Hungarica 3 (2006),91–105.
[24]
P.Stone and M.Veloso,Multiagent systems:Asurvey froma machine learning perspective,Autonomous
Robots 8 (2000),345–383.
[25]
H.S.Nwana,Software agents:An overview,The Knowledge Engineering Review,11 (1996),205–244.
[26]
M.J.Matari´c,Learning in behaviorbased multirobot systems:Policies,models,and other agents,
Cognitive Systems Research,2 (2001),81–93.
[27]
L.E.Parker,Evaluating success in autonomous multirobot teams:Experiences from ALLIANCE
architecture implementations,Journal of Experimental and Theoretical Artiﬁcial Intelligence 13 (2001),
95–98.
[28]
J.Bruce,S.Zickler,M.Licitra and M.Veloso,CMDragons:Dynamic passing and strategy on a cham
pion robot soccer team,In Proc.IEEE Int.Conf.on Robotics and Automation (ICRA),Pasadena,USA
(2008),4074–4079.
[29]
L.E.Parker,ALLIANCE:An architecture for fault tolerant multirobot cooperation,IEEE Transactions
on Robotics and Automation 14 (1998),220–240.
[30]
B.P.Gerkey and M.J.Matari
´
c,Sold!:Auction methods for multirobot coordination,IEEETransactions
on Robotics and Automation 18 (2002),758–786.
[31]
H.Köse,C.Mericli,K.Kaplan and H.L.Akin,All bids for one and one does for all:Marketdriven
multiagent collaboration in robot soccer domain,In Proc.18th International Symposium of Computer
and Information Sciences (ISCIS),Antalya,Turkey (2003),529–536.
[32]
H.L.Akin,T.Mericli,N.E.Özkucur,C.Kavaklioglu and B.Gökce,Cerberus’10 SPL team,In
RoboCup papers (2010).
[33]
J.Vašˇcák and K.Hirota,Integrated decisionmaking system for robot soccer,Journal of Advanced
Computational Intelligence and Intelligent Informatics 15 (2011),156–163.
[34]
C.Pozna,F.Troester,R.E.Precup,J.K.Tar and S.Preitl,On the design of an obstacle avoiding
trajectory:Method and simulation,Mathematics and Computers in Simulation 79 (2009),2211–2226.
Enter the password to open this PDF file:
File name:

File size:

Title:

Author:

Subject:

Keywords:

Creation Date:

Modification Date:

Creator:

PDF Producer:

PDF Version:

Page Count:

Preparing document for printing…
0%
Σχόλια 0
Συνδεθείτε για να κοινοποιήσετε σχόλιο