Decision-Making Systems in Mobile Robotics

Ján VAŠČÁK

Technical University of Košice, Slovakia

Abstract. This chapter describes means for decision-making in mobile robotics as well as some integrated decision-making systems. They are mostly oriented to applications in robotic soccer, because it allows solving a number of robotic problems, from optimal control of individual actuators up to constructing complex multiagent systems for teams of robots. The spectrum of decision-making means used in robotics today is very broad and it is not possible to mention them all. Therefore, only such means are described that seem promising for use in robotic decision-making processes, namely naive Bayes classifiers, decision trees, Markov decision processes and fuzzy cognitive maps. Based on these means, several autonomous decision systems are described.

Keywords. decision-making systems, mobile robotics, robotic soccer, multiagent systems

Introduction

Robots are devices of diverse constructions, which perform various tasks and work in manifold environments. Therefore, not only are many different decision and control structures possible, but the definition of the notion decision itself can vary from one application to another. Beside this notion we can also meet expressions like behaviour, strategy choice or simply control, because the border line between decision and control is not clear either. This is caused not only by the differing definitions of researchers but also by the fact that many mutually interconnected processes are performed, whose separation would often be questionable.

Mobile robots, and especially humanoids, represent the top level of robotics. Because of their complexity they are the best objects for depicting all possible robotic activities and related tasks such as sensing (especially vision), low-level control, image recognition, localization, strategy choice, navigation and planning. Decision-making is contained in some form in all of the mentioned activities. Therefore, we will concentrate on mobile robots in this chapter. Robots can be placed either in static or in dynamic environments, whose elements change in time. Further, they work individually or as members of a team, which requires the additional abilities to communicate and to cooperate with each other. To show the basic information modules of such a robot, the relations among them, and possible approaches to implementing decision-making methods convenient for robotics, it is useful to propose a complex task in which all these elements are present. The development of such tasks, of the means for their solution, and of their composition into a meaningful entity led to establishing robotic soccer, which will mostly be our benchmark problem; on it we will show individual means as elements of complex decision-making systems.

Figure 1 shows the basic information modules and relations divided into individual Decision Levels (DL) from DL-0 to DL-2. These DLs differ not only in their subordination (DL-0 as the lowest and DL-2 as the highest) and their tasks but also in the means used.

Figure 1. Basic information structure of a mobile robot configured by decision levels DL-0 – DL-2.

Control can also be viewed as a special kind of decision-making, although its methods are far from human intuitions about decision-making. However, the required reaction times at the lowest DL are very short, so DL-0 utilizes mostly 'classical' approaches like PID controllers and Kalman filters based on an analytical description of the inverse kinematics task by physical equations. Another advantage of these methods is the possibility of precise analyses, mainly concerning stability and robustness. These means are used for creating fundamental motion primitives such as stepping, turning of wheels or obstacle avoidance, which are then combined by instructions from higher DLs. A special but not least important module is reactive navigation, which solves collisions and, because of the very quick reactions required, is placed into this DL. Processing sensor data can be divided into two basic groups: (a) basic (rough) processing and (b) advanced processing. Especially for mobile robots, visual data from cameras are the most significant, although data processing is not limited to vision. Similarly to motion control, the first group is based on primary operations like filtering, image segmentation, edge detection using gradients, etc. In the case of vision the output is a compressed image with potentially extractable (but not yet extracted) and highlighted information relevant for decision-making. As can be seen, from the viewpoint of a human the DL-0 modules use the most unfamiliar methods and mathematical calculi and represent a special independent research area. However, they stand at the beginning of the whole decision-making chain, in the form of basic sensor data processing, as well as at its end, in the form of motion control; omitting them would therefore harm the completeness of this introductory overview. As the means used in DL-1 and DL-2 have many more common properties, we will further deal with these two levels.

DL-1 is built on top of DL-0 and enables a robot to act independently. The approaches used reflect human manners much more than in the case of DL-0. Many of them belong to the area of artificial intelligence, like decision trees, reinforcement learning, rule-based systems, fuzzy logic, cognitive maps, etc. The abilities of adaptivity and self-learning have become an inseparable part of this DL, too; thus we can include all adaptation methods, e.g. evolutionary algorithms and neural networks. Advanced sensor data processing is practically the second stage after the basic one is done. The compressed image is still in the form of rough data; however, we need to extract e.g. obstacles, targets, paths, etc. from the image, i.e. first to recognize them, second to localize them and eventually to describe them in more detail. In reality, we need to extract from a pixel matrix a scene with relevant objects and their mutual relations, which is the extracted image information. Based on this information we can make decisions. Basically, we can divide decision-making into two stages (in simpler designs they merge into one): a strategic and a tactical one. Drawing a border line between these two stages is rather speculative; it depends on the individual design, but roughly we can think about strategy in relation to behaviour. For instance, in soccer (football) there are defenders, attackers, goalies, etc. In principle, any robot can take any of these roles; their assignment will depend only on the type of strategy, e.g. a defensive or an offensive one. For successful performance of the strategy choice module, situation evaluation is the most important. The activities resulting from this module, like planning of activities, choice of a convenient path, etc., belong to the subordinated tactical stage. If we consider only one robot in a given environment without any others, DL-1 will be the top level and no additional modules are necessary.

However, if there are at least two robots working towards the same goal, they need to communicate and cooperate. In such a case we get a Multiagent System (MAS), which represents the highest level, DL-2, and to a large extent supersedes the strategic stage of DL-1. For solving complex robotic tasks MAS becomes an intrinsic part of any decision-making design and therefore we will deal with MAS in a dedicated section. Of course, all the data, information and control flows create a very complex structure and hereby require special programming techniques for their handling, like memory sharing, assigning priorities, etc. Such a managing system represents a decision-making system in itself. However, its description would require an excursion into programming techniques and therefore it will not be a subject of this chapter.

In section 1 some basic means used in decision-making systems will be described. As our benchmark problem is robotic soccer, we will concentrate our attention on the means used in this area. Nowadays, two main associations exist which deal with robotic soccer: FIRA (http://www.fira.net/) and RoboCup (http://www.robocup.org/). Their characterization, with common as well as different features, is described in [1]. These associations offer a number of various competitions and we will focus particularly on the Standard Platform League (SPL), see (http://www.tzi.de/spl/bin/view/Website/WebHome), where descriptions and reports of participating teams can be found; they will be cited as RoboCup papers. Although all designs are hybrids of several mixed approaches, one fact is obvious: more than one half of the teams utilize, to a lesser or greater extent, means of artificial intelligence, and their popularity is steadily growing. This is the reason why we will concentrate our descriptions mainly on artificial intelligence means. After that, some autonomous robotic decision-making systems composed of these means will be featured in section 2.

1. Means for Decision-Making in Robotics

Decision-making processes need two basic groups of means: (a) process description and (b) object description. The means of the first group are mostly graph-based approaches like decision trees, Markov chains and the related Markov decision processes, cognitive maps, finite automata and Petri nets, which enable observing information flows as well as transitions among states (situations) caused by actions. Objects (the second group) can be of diverse nature and therefore we need a palette of convenient means for their description, e.g. probabilistic and fuzzy models or emotional models for man–machine interaction. Usually, objects are projected into graph nodes by the means of the first group. Of course, such a division of means is fictive in practice, where they are mutually interconnected; it should only help us to realize the essence of these two aspects. All these approaches are able to describe most parts of the same information. The only difference is their knowledge representation, and each type of knowledge representation has strengths that are useful for a given application.

The rule base can be considered the fundamental knowledge representation, and others may be derived from it. As an example we can mention the rule base of a goalkeeper [2]:

1. Search the ball.
2. Calculate direction and distance to the ball.
3. If the ball is far then go to the center of the goal.
4. If the ball is close and straight then wait.
5. If the ball is close, not moving and straight then go to kick it.
6. If the ball is close and not straight then walk sideways.

Figure 2. A rule set of the goalkeeper's behaviour.

Figure 2 depicts only a very simple and uncertain rule base. The descriptions of notions like walk sideways or close are missing, the problem of obstacles in the form of players is not solved, etc. Besides, rules usually create decision (implication) chains and even closed loops. We can see that adding further rules would make such a rule base hard to read. This is just one reason why other, more advanced means for describing decision-making processes are necessary. In the following parts of this section some of the mentioned means will be described on examples of robotic soccer applications.
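A rule base of the kind shown in Figure 2 can be sketched as a simple prioritized evaluator in Python. The distance threshold, the angle tolerance and the action names below are illustrative assumptions, not values from the chapter:

```python
# Hypothetical goalkeeper rule base; thresholds are illustrative assumptions.
CLOSE = 0.5  # metres; anything nearer counts as "close"

def goalie_action(ball_dist, ball_angle, ball_moving):
    """Return an action for the goalkeeper given simple ball features.

    ball_dist   -- distance to the ball in metres
    ball_angle  -- angle to the ball in radians (0 = straight ahead)
    ball_moving -- whether the ball is currently moving
    """
    straight = abs(ball_angle) < 0.2          # roughly in front of the robot
    if ball_dist >= CLOSE:                    # rule 3: ball is far
        return "go_to_goal_center"
    if straight and not ball_moving:          # rule 5: close, still, straight
        return "kick"
    if straight:                              # rule 4: close and straight
        return "wait"
    return "walk_sideways"                    # rule 6: close, not straight

print(goalie_action(2.0, 0.0, True))   # far ball
print(goalie_action(0.3, 0.0, False))  # close, still, straight
```

Even this tiny sketch shows how rule ordering hides implicit priorities, which is one reason such lists become hard to read as they grow.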

1.1. Fuzzy Naive Bayes Classifier

In any task there is a set of possible actions and a robot decides which of them will be performed next, e.g. whether to kick the ball at the goal or rather to pass it to a teammate. We must also take into account the fact that robots are strongly limited by computational power and sampling times (the time intervals within which robots must react), which requires, first of all, simple solutions. Naive Bayes Classifiers (NBC) correspond exactly to these requirements. Although their main area of use is pattern recognition, some applications appear also in robotics.

Let us suppose a set (class) of possible actions $C = \{c_1, \ldots, c_k\}$. If any action is to be performed, some conditions must be fulfilled; they are described by feature variables $X_1, \ldots, X_n$. Each $X_i$ has a finite set of values $x_i \in \langle x_{i,min}, x_{i,max} \rangle$. If the feature variables achieve certain values, which can be measured, then we can ask about the possibility that $C = c_j$ will be chosen. Hence the conditional probability of the occurrence of $C = c_j$, named the conditional model and described generally as $P(C \mid X_1, \ldots, X_n)$, can be computed using Bayes' rule as

$$P(C \mid X_1, \ldots, X_n) = \frac{P(X_1, \ldots, X_n \mid C) \cdot P(C)}{P(X_1, \ldots, X_n)}. \quad (1)$$

As the denominator in (1) is a constant independent of any given action, it can be treated as a normalization value and further omitted without loss of correctness. In addition, if we suppose the feature variables are mutually independent, then (1) can be rewritten as

$$P(C \mid X_1, \ldots, X_n) = P(C) \cdot \prod_{i=1}^{n} P(X_i \mid C). \quad (2)$$

The expression (2), known as the naive Bayes probabilistic model, shows that we can handle one-dimensional distributions instead of the compound one in (1); hereby the task is simplified, computational effort is spared, and the so-called curse of dimensionality is obviated. Moreover, even if the conditional independence assumption is not fulfilled, which is very often the case in practice, NBC will mostly still work well. Robustness is a typical feature of NBC.

NBC combines its probabilistic model with a decision rule. In this case we search for the most probable hypothesis. Therefore, we will use the so-called maximum a posteriori decision rule and NBC is defined as a function

$$NBC(a) = \arg\max_{c_j \in C} P(c_j) \cdot \prod_{i=1}^{n} P(x_i \mid c_j), \quad (3)$$

where $X_i = x_i$, $C = c_j$ and $a$ is a combination of feature variables $X_i$ with given values $x_i$, i.e. $a = \{X_1 = x_1, \ldots, X_n = x_n\}$, which represents an example being classified by the NBC. The output of the classifier is the action $c_j$ for which (3) has the maximum value. NBC can be rewritten into a set of $k$ rules, where the ranges of the given feature variables are defined. If a not yet classified example enters the NBC, evaluating (3) will result in the action $c_j$ with the highest probability. In the case of deciding which of three teammates a ball should be passed to, there will be three rules (one per teammate) with nine intervals for the feature variables: distances $D_{PB}$, $D_{OB}$, $D_{T1}$, $D_{T2}$, $D_{T3}$ and angles $\alpha$, $\alpha_1$, $\alpha_2$ and $\alpha_3$, as shown in Figure 3.

Figure 3. Feature variables of NBC for passing; P – passer, T – teammate, O – opponent, B – ball.
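As a minimal sketch of the decision rule (3), the following fragment picks the action maximizing the product of a prior and per-feature likelihoods. All probability values, feature names and actions are invented for illustration:

```python
import math

# Hypothetical NBC for choosing which of two teammates to pass to.
# Priors P(c_j) and likelihood tables P(x_i | c_j) are invented numbers.
priors = {"pass_T1": 0.5, "pass_T2": 0.5}
likelihoods = {
    "pass_T1": {("dist_T1", "short"): 0.7, ("angle_1", "close"): 0.6},
    "pass_T2": {("dist_T1", "short"): 0.2, ("angle_1", "close"): 0.3},
}

def nbc(example):
    """Return the action maximizing P(c) * prod_i P(x_i | c), as in (3)."""
    best, best_score = None, -math.inf
    for action, prior in priors.items():
        score = prior
        for feature in example:
            score *= likelihoods[action].get(feature, 1e-6)
        if score > best_score:
            best, best_score = action, score
    return best

# pass_T1 scores 0.5*0.7*0.6 = 0.21; pass_T2 scores 0.5*0.2*0.3 = 0.03
print(nbc([("dist_T1", "short"), ("angle_1", "close")]))
```

In practice the products of many small probabilities are usually summed in log space to avoid underflow; the direct product is kept here only for readability.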

A classical NBC supposes discrete intervals $x_i \in \langle x_{i,min}, x_{i,max} \rangle$ for the feature variables, but in reality we operate with continuous intervals (in our case distances and angles). Further, it can also be advantageous to deal with linguistic labels of these variables, like e.g. short distance or wide angle, where the definitions of short and wide each cover a range of values. Thus we get a finite number of linguistic values, which cover the intervals of the feature variables continuously. In such a manner we obtain a fuzzy variant of NBC (FNBC). In addition, fuzzy systems are typically robust, which manifests in this case as a continuous transition from one action to another. In [3] the following modification of NBC (3) is proposed:

$$FNBC(a) = \arg\max_{c_j \in C} P(c_j) \cdot \prod_{i=1}^{n} \sum_{x_i = x_{i,min}}^{x_{i,max}} P(x_i \mid c_j) \cdot \mu_{Lx_i}, \quad (4)$$

where $\mu_{Lx_i}$ is a membership function and $Lx_i$ stands for a linguistic label like short distance or wide angle, see Figure 4.

NBCs as well as FNBCs have another very important advantage: they can be constructed automatically, using a relatively small number $r$ of examples $e = \{X_1 = x_1, \ldots, X_n = x_n, C = c_j\}$ from the training set $L$ ($e \in L$), by supervised learning, i.e. we know the desired relations between inputs (feature variables $X_i$) and outputs (actions $c_j$). Just for this reason the transformation (1) was necessary. To calculate (4) the following formulae were proposed:

Figure 4. Membership functions for feature variables: (a) distance $D_{PB}$, (b) distances $D_{OB}$ and $D_{T_1B}$ – $D_{T_3B}$, (c) angle $\alpha$, (d) angles $\alpha_1$ – $\alpha_3$; S – small, M – medium, L – large, C – close, W – wide [4].

$$P(C = c_j) = \frac{1 + \sum_{e \in L} \mu^{e}_{c_j}}{r + k}, \quad (5)$$

$$P(X_i = x_i) = \frac{1 + \sum_{e \in L} \mu^{e}_{Lx_i}}{r + nmf_i}, \quad (6)$$

$$P(X_i = x_i \mid C = c_j) = \frac{1 + \sum_{e \in L} \mu^{e}_{Lx_i} \cdot \mu^{e}_{c_j}}{nmf + \sum_{e \in L} \mu^{e}_{c_j}}, \quad (7)$$

where $nmf$ is the number of all defined membership functions for the feature variable $X_i$ and $\mu^{e}_{c_j}$ is a truth function of the action $c_j$ in a given example $e$. For correctness we require that all membership functions fulfil the condition of a fuzzy partition, i.e. $\sum_{\forall i} \mu_i(u) = 1$, $u \in U$ ($U$ being a general universe of discourse).
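The smoothed, membership-weighted counting behind (5) and (7) can be sketched as follows. The membership functions, the training set and all numbers are invented for illustration; only the structure of the formulae comes from the text:

```python
# Sketch of the smoothed estimates (5) and (7) with fuzzy (membership-
# weighted) counts. Data and membership shapes are invented examples.
def mu_short(d):
    """Assumed triangular membership for 'short distance'."""
    return max(0.0, 1.0 - d / 2.0)

def mu_long(d):
    """Complement, so that mu_short + mu_long = 1 (fuzzy partition)."""
    return 1.0 - mu_short(d)

# Training examples: (distance, chosen action c_j)
L = [(0.5, "pass"), (1.5, "pass"), (3.0, "kick"), (2.5, "kick")]
r, k, nmf = len(L), 2, 2   # examples, actions, membership functions

def p_action(c):
    """Formula (5): Laplace-smoothed prior of action c."""
    return (1 + sum(1 for _, cj in L if cj == c)) / (r + k)

def p_feature_given_action(mu, c):
    """Formula (7): membership-weighted, smoothed likelihood."""
    num = 1 + sum(mu(x) for x, cj in L if cj == c)
    den = nmf + sum(1 for _, cj in L if cj == c)
    return num / den

print(round(p_action("pass"), 3))
print(round(p_feature_given_action(mu_short, "pass"), 3))
```

The "+1" terms are the usual Laplace smoothing, which keeps every probability nonzero even when a linguistic value never occurs in the small training set.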

In [4] a slightly modified variation of FNBC for passing a ball is described, which was used in [2] on a simulation platform. The scenario in Figure 3 is simplified to only one teammate. A series of experiments for various configurations of ball, passer, opponent and teammate is performed. At the beginning of each experiment the feature variables are recorded ($D_{PB}$, $D_{OB}$, $D_{T}$, $\alpha$, $\alpha_1$). Then both the opponent and the teammate try to win the ball. If the teammate gets the ball, the experiment is tagged as success, otherwise as miss; the classification is thus dichotomic. After the learning stage the classifier is applied in a play with four teammates and four opponents in the following way. The passer uses the proposed FNBC to evaluate the individual competitions, i.e. one teammate versus one opponent, over all possible combinations. The lowest probability of success is stored for each teammate and finally the teammate with the maximum of these probabilities is chosen, which is the so-called min-max approach. In spite of the fact that the real play differs from the learning stage, the quality of such a classifier is satisfactory, which again proves the robustness of FNBC.
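The min-max choice described above can be sketched as follows, assuming an invented table of per-duel success probabilities as produced by the FNBC:

```python
# Hypothetical success probabilities P(success | teammate, opponent)
# as evaluated by the FNBC for each one-on-one competition.
success = {
    "T1": {"O1": 0.8, "O2": 0.4},
    "T2": {"O1": 0.6, "O2": 0.7},
}

def choose_teammate(success):
    """Min-max rule: keep each teammate's worst case, pass to the best one."""
    worst = {t: min(p.values()) for t, p in success.items()}
    return max(worst, key=worst.get)

print(choose_teammate(success))  # T1's worst is 0.4, T2's worst is 0.6 -> T2
```

The pessimistic inner minimum makes the choice robust against the most dangerous opponent rather than the average one.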

1.2. Decision Trees

Decision Trees (DT) represent one of the oldest means for decision-making. They have several significant advantages, like their simplicity and hereby the ability to process large data sets. Further, they are capable of generalization, where their nodes behave like centers of clusters in a state space [5]. In addition, they are easy to analyse, e.g. by information gain or entropy calculation and other statistical evaluations. Finally, from the viewpoint of comprehensibility they are of great importance as means for explanation mechanisms and for the graphical representation of basic notions and their relations. From the variety of DT learning methods, Quinlan's C4.5 [6] and its commercial improvement C5 are especially popular. However, comparisons of results obtained by conventional DTs and NBCs have shown a somewhat smaller efficacy of DTs in a number of applications, e.g. in the problem of passing (section 1.1) [4]. Only more advanced means derived from the theory of DTs are able to overcome NBCs.
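The entropy and information-gain quantities on which C4.5-style learners rely can be computed for a toy data set as follows (the data are invented):

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy of a label sequence, in bits."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(rows, labels, attr):
    """Entropy reduction from splitting (rows, labels) on attribute attr."""
    total = entropy(labels)
    n = len(rows)
    for value in set(row[attr] for row in rows):
        subset = [lab for row, lab in zip(rows, labels) if row[attr] == value]
        total -= len(subset) / n * entropy(subset)
    return total

# Toy data: the 'close' attribute perfectly predicts the action.
rows = [{"close": 1}, {"close": 1}, {"close": 0}, {"close": 0}]
labels = ["kick", "kick", "walk", "walk"]
print(information_gain(rows, labels, "close"))  # 1.0 bit
```

C4.5 greedily picks the attribute with the highest such gain (or gain ratio) at every node, which is what produces the compact, analysable trees mentioned above.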

In the next part we will focus on a special type of DT, which was developed especially for motion planning and was used as a part of the so-called behavioral control for purposes of robotic soccer [7]. Rapidly exploring Dense Trees (RDTs) [8] represent a family of trees whose main idea is to incrementally construct a search tree that continuously improves its resolution. In this case decision-making is transformed into a search task: to find a route (trajectory) from a starting point $q_S$ to the goal $q_G$. These trees are constructed with the help of a dense sequence $\alpha$ of samples $\alpha(i)$, which describe a given area $C$. If $\alpha$ is a random sequence, then we call such a tree a Rapidly exploring Random Tree (RRT), which is the object of the following considerations.

The RRT algorithm starts with the first sample $\alpha(1)$, which creates the first node of the tree. The second sample $\alpha(2)$ then does the same, and the two nodes are connected. After this initial step, each further sample tries to find the nearest point $q_n$ of the gradually growing tree and to create a direct link between $q_n$ and $\alpha(i)$. If $q_n$ is a node, then a new edge is added. If $q_n$ lies somewhere on an existing edge, then this edge is divided at $q_n$, which becomes a new node connecting the graph with $\alpha(i)$. Thus in each step one or two new edges arise and $\alpha(i)$ becomes a new node. If an obstacle lies between $q_n$ and $\alpha(i)$, then a new node $q_O$ is created on the border of the obstacle closer to the tree and $\alpha(i)$ is returned somewhere into the sequence $\alpha$, see Figure 5.

Figure 5. Process of creating an RRT; O – obstacle.

The RRT algorithm can be rewritten into pseudocode as depicted in Figure 6, where we can see that the number of tree nodes is at least equal to the number of samples $\alpha(i)$. If the samples are dense enough and properly dispersed in the investigated area, then we get an accurate map of the free area $S$ ($S \subseteq C$) with indicated obstacles and further limitations. Individual edges are possible paths, which can be chained into a trajectory from a starting point $q_S$ to a goal $q_G$ using any graph search algorithm. For generating a motion plan we start creating the tree at $q_S$ and continue expanding it (exploring the area) until a node in $q_G$ is created (logical tag Path = 1), or we give up after a given number of cycles, i.e. Path = 0. The functions NEAREST and SEARCH_BORDER can be realized in various ways; the main effort lies in minimizing their computational demands, because these two functions are called very often. Expanding the tree can also be modified, e.g. as simultaneous expanding of two trees from $q_S$ and $q_G$, respectively, until they mutually merge. A more detailed description of RDTs can be found in [8].
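A minimal Python sketch of the RRT idea follows. For brevity it connects each sample to the nearest existing node only (edges are not split as in the full algorithm), and the planar area, the obstacle test and all parameters are assumptions:

```python
import math, random

def rrt(start, goal, is_free, n_samples=2000, goal_tol=1.0, seed=1):
    """Minimal RRT sketch on an assumed 10x10 plane.

    Nearest-node variant: samples attach to the closest existing node.
    is_free(p, q) reports whether the straight segment p-q avoids obstacles.
    Returns a node path from start to the goal region, or None (Path = 0).
    """
    random.seed(seed)
    nodes, parent = [start], {start: None}
    for _ in range(n_samples):
        sample = (random.uniform(0, 10), random.uniform(0, 10))
        near = min(nodes, key=lambda p: math.dist(p, sample))  # NEAREST
        if is_free(near, sample):
            nodes.append(sample)
            parent[sample] = near
            if math.dist(sample, goal) < goal_tol:   # goal region reached
                path = [sample]
                while parent[path[-1]] is not None:  # walk back to the root
                    path.append(parent[path[-1]])
                return list(reversed(path))          # Path = 1
    return None                                      # Path = 0

path = rrt((0.0, 0.0), (9.0, 9.0), lambda p, q: True)  # obstacle-free plane
print(path is not None)
```

A real implementation would additionally split edges at the nearest point, clip blocked segments at the obstacle border (SEARCH_BORDER), and use a spatial index for NEAREST, which dominates the running time.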

1.3. Markov Decision Processes and Reinforcement Learning

A real robot is a dynamic system whose states change. To describe it exactly, we would need to consider its current state, actions and also partially the history of its states, e.g. in the form of a series of significant derivatives. This means a kind of memory is necessary and the problem becomes too complicated (in description as well as in solving). However, if the sampling is dense enough, we can make a simplification based on the assumption that a new state $s'$ of the robot depends only on its current state $s$ ($s, s' \in S$, $S$ – the set of all possible states) and the performed action $a$. If we suppose the transition $s \to a \to s'$ is not deterministic, we describe it by a probability $p$ in the form of the transition function $T(s, a, s')$, which is in other words the conditional probability $p(s' \mid s, a)$. Considering states as nodes of a graph and actions as its edges, we can depict all possible combinations $s \to a \to s'$, as seen in the example in Figure 7, where an edge label $a_i : p_j$ denotes the probability $p_j$ that the action $a_i$ transforms the input state of the given edge into the output one. Note that the sum of the probabilities over all edges of a given node belonging to the same action $a_i$ must equal one (the second Kolmogorov probability axiom).

Besides, a reward function $r: S \to \mathbb{R}$ is defined, which enables sorting individual states. Then a chain of transitions from one state to another is started, and each transition is rewarded. After a series of such transitions the immediate rewards $r$ are in some way accumulated into a cumulative reward (utility) $u$. Of course, a robot (agent) tries to maximize $u$. How to solve this task under the mentioned conditions is the objective of the so-called Markov Decision Process (MDP), whose output is usually an optimal policy formulated as

$$\pi^{*}(s) = \arg\max_{a \in A} \sum_{\forall s'} T(s, a, s') \cdot u(s'). \quad (8)$$

Path ← 0
Initialize RRT
for i = 1 to K
    q_n ← NEAREST(RRT, α(i))
    if q_n is not a node
        Split the edge containing q_n
        Add q_n as a node to RRT
    end if
    if edge (q_n, α(i)) ⊂ S
        Add α(i) as a node to RRT
        Add edge (q_n, α(i))
        if α(i) = q_G
            Path ← 1
            return
        end if
    else
        Add node q_O ← SEARCH_BORDER(edge (q_n, α(i)))
        Add edge (q_n, q_O)
        Include α(i) into α
    end if
end for

Figure 6. Pseudocode of the RRT creating algorithm.

In other words, $\pi$ is a mapping from states to actions, $\pi: S \to A$ ($A$ – the set of all actions). The influence of future rewards is often decreased by a discount factor $\gamma$, e.g. $\gamma^{0} \cdot r(s_1) + \gamma^{1} \cdot r(s_2) + \gamma^{2} \cdot r(s_3) + \ldots$ for a sequence of states $s_1, s_2, s_3, \ldots$ Considering such a sequence of states, we can estimate the cumulative reward $u$ for each state $s$ and so we get a system of $n$ Bellman equations (if there are $n$ states) as

$$u(s) = r(s) + \gamma \cdot \max_{a} \sum_{\forall s'} T(s, a, s') \cdot u(s'). \quad (9)$$

For reasons of computational complexity, the equations (9) are mostly solved iteratively. A more detailed description can be found in [9].

We see that MDP simulates processes in time, gradually enhances their quality and improves the assignment of correct actions to individual states; it is thus also an adaptive (self-learning) means. MDP is a general calculus and can be used not only in simulation processes (e.g. epidemic spreading) but also in planning or multiagent systems. In [10] MDP was used for the localization of robots, where this approach was combined with a Kalman filter: MDP estimated the rotation angle and the Kalman filter estimated the robot's translational position. Such a hybrid approach simplified the complexity of this probabilistic task and hereby shortened the computational time in a setting where the information from sensors is not totally precise.

Figure 7. Graphical representation of MDP – an example [9].
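The iterative solution of (9) (value iteration), together with extracting the policy (8), can be sketched on a tiny invented MDP; the states, transitions and rewards are toy values, not taken from the chapter:

```python
# Value-iteration sketch for the Bellman equations (9) on a toy MDP.
GAMMA = 0.9
states = ["s1", "s2"]
actions = ["a1", "a2"]
T = {  # T[s][a] = list of (s', probability); probabilities sum to one
    "s1": {"a1": [("s1", 0.2), ("s2", 0.8)], "a2": [("s1", 1.0)]},
    "s2": {"a1": [("s2", 1.0)], "a2": [("s1", 0.5), ("s2", 0.5)]},
}
r = {"s1": 0.0, "s2": 1.0}

u = {s: 0.0 for s in states}
for _ in range(100):                     # iterate (9) to a near-fixed point
    u = {s: r[s] + GAMMA * max(sum(p * u[s2] for s2, p in T[s][a])
                               for a in actions)
         for s in states}

policy = {s: max(actions, key=lambda a: sum(p * u[s2] for s2, p in T[s][a]))
          for s in states}               # extract the policy as in (8)
print(policy["s1"])
```

Each sweep contracts the error by the factor $\gamma$, which is why a fixed number of iterations suffices in practice.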

However, a fully described MDP as in Figure 7 is often not available. In reality, robots must operate in unknown areas, where the states, their relations and even the rewards are unknown; thus MDP without modifications is not usable. One of the means for solving such a problem is Reinforcement Learning (RL), representing a broad set of methods which use an environment description based on MDP. For this reason the MDP scheme is modified in the sense that rewards are placed on the edges instead of on the nodes (states), so we get the reward function in the form $r(s, a): S \times A \to \mathbb{R}$. In other words, for each state $s$ we search for the best action $a$, which maximizes the cumulative reward $u(s, a)$, and the Bellman equations take the form

$$u(s, a) = r(s, a) + \gamma \cdot \max_{a'} \sum_{\forall s'} T(s, a, s') \cdot u(s', a'). \quad (10)$$

Although RL is sometimes viewed as a supervised type of learning, unlike conventional supervised learning it does not use input/desired-output pairs but concentrates, like MDP, on winning the biggest cumulative reward during a time period, i.e. on finding the best policy $\pi^{*}(s)$. The methods for handling the maximization process divide RL approaches into two groups: (a) model-free and (b) model-based methods. The first group concentrates on searching for the optimum action for each state. Its main advantage is low computational complexity, but the quality is lower than for the model-based methods; therefore, methods of group (a) are convenient for systems with little dynamics. A typical representative is Q-learning [11]. Methods of group (b) first try to approximate the transition and reward functions, hereby creating inner models in which actions are then simulated. There is one more very valuable property of RL: similarly to NBCs, these algorithms do not need large sets of training data, which is a very important condition. Next we will describe a hybridized form of model-based RL using DTs (RL-DT), which was used for training a bipedal robot NAO to score penalty kicks [12].

As computational complexity and the related response times are critical in robotic applications, the main idea of RL-DT is to minimize the so-called exploration mode, although the obtained solution need not be the optimum. For a more efficient exploration DTs are used, which are able to quickly reduce the search area by generalizing similar states into a common one. DTs are used for constructing $T(s, a, s')$ and $r(s, a)$ [5]. If a state $s$ is characterized by $n$ state variables $X_i$, then it has the form of a vector of values $x_i$, i.e. $s = (x_1, \ldots, x_n)$, $x_i \in X_i$. Thus, using e.g. Quinlan's C4.5 algorithm, $n + 1$ DTs are constructed for the variables $X_i$ and the reward $r(s, a)$. These DTs predict the changes when their input is of the form $(a, x_1, \ldots, x_n)$; each tree is processed, resulting in a leaf dependent on the input. From the transition DTs we get probabilities of state variable changes $p(x^{c}_i \mid s, a)$ ($x^{c}_i \in X^{c}_i$ being an element of the state change $s^{c}$). As we suppose the variables $X_i$ are independent, we get the change of the total transition function as

$$T(s^{c}, a, s) = \prod_{i=1}^{n} p(x^{c}_i \mid s, a). \quad (11)$$

Finally, the next state $s'$ is computed as a sum of the current state $s$ and $s^{c}$. A similar treatment applies in the case of the reward DT. This activity represents the UPDATE_MODEL function used in the RL-DT algorithm (see Figure 9). Figure 8 shows an example of a DT for relative changes of the variable $X$ if a robot is moving on a grid. The DT first investigates whether the action 'go to the left' has been performed. Then the decision-making flow is divided by the results of the individual questions until a leaf (a rectangular node) is reached. If a robot is in the position indicated by a bullet and moves to one of the black fields (the robot's orientation is also important here), then the DT will not generate any changes.

The RL-DT algorithm divides its processing into two separate stages: beside the exploration mode there is also the so-called exploitation mode. During exploration the models $T(s, a, s')$ and $r(s, a)$ are updated for new (yet unknown) states. During exploitation we suppose the models are correct and therefore the optimal policy is calculated in the way defined for MDPs. The algorithm starts with a state $s$, which is added to the set of already visited states $S_M$. At the same time, $s$ and all connected actions $a$ form state–action pairs, whose numbers of visits are counted in the counters $visits(s, a)$. After that a loop is started, where the action $a$ with the highest cumulative reward $u(s, a)$ is chosen and the state $s'$ is obtained. If $s'$ is not yet in $S_M$, it is added. The function UPDATE_MODEL is applied using the C4.5 algorithm, which incrementally updates the models for $T(s, a, s')$ and $r(s, a)$. If some updates are done, the logical variable $CH$ (change) is set to 1. At this moment the algorithm decides between the exploration and the exploitation mode with the help of a heuristic incorporated in the function CHECK_MODEL: if the predicted immediate reward is less than 40% of the maximum one, then exploration is switched on and the logical variable $exp$ is set to 1. This heuristic causes exhaustive exploring in the early stages, when the knowledge of the domain is low; after gaining positive rewards the exploitation mode is chosen. Finally, a new optimal policy is computed by the function COMPUTE_VALUES if the models $T(s, a, s')$ and $r(s, a)$ were changed. The loop counter is incremented and a new loop is started with $s'$ as $s$. The pseudocode of the RL-DT algorithm is depicted in Figure 9; further details of this algorithm can be found in [13].

Figure 8. An example of a DT for change prediction of the variable X [5].

Include s into S_M
∀a ∈ A: visits(s, a) ← 0
loop
    a ← arg max_{∀a} u(s, a)
    Execute a, obtain reward r(s, a), observe s′
    Increment visits(s, a)
    if s′ ∉ S_M
        Include s′ into S_M
        ∀a ∈ A: visits(s′, a) ← 0
    end if
    Perform UPDATE_MODEL
    exp ← CHECK_MODEL
    if CH = 1
        COMPUTE_VALUES
    end if
    s ← s′
end loop

Figure 9. Pseudocode of the general RL-DT algorithm.

1.4. Fuzzy Cognitive Maps

As mentioned above, rule bases are considered the basic knowledge representation. However, in the area of mobile robotics we often have to handle decision chains, many of which create closed loops. Thus individual rules are causally interconnected (see rows 1–2 in Figure 2) and cannot be ordered in a simple list of uniform IF–THEN rules, as is usual for the so-called simple rule-based systems (e.g. rows 3–6 in Figure 2); in such a case the rules are not mutually independent. In a more complex rule-based system the relations between rules become incomprehensible and a graphical means is necessary to capture them clearly. Fuzzy Cognitive Maps (FCM) seem to be just such an aid, not only for knowledge representation but also for analyzing it and for processing complex rule-based systems [14].

The notion Cognitive Map (CM) was introduced by the political scientist Robert Axelrod, primarily to model social processes [15]. It is an oriented graph whose nodes represent notions and whose edges are causal relations. Mostly, notions are states or conditions and edges are actions or transfer functions, which transform a state in one node into another state in another node. A CM is able to describe complex dynamic systems. It is possible to investigate cycles, collisions, etc. and to define strengths (weights) of relations, too. Originally, these were represented by three values: −1, 0 and 1. Another advantage is its human-friendly knowledge representation and the ability to describe various types of relations (in more detail see e.g. [16]).

FCM represents an extension of CM and was proposed by Kosko in 1986 [17]. The extension is based on strength values taken from the interval [−1, 1]; the nodes can be represented by activation values from the interval [0, 1] or by membership functions as well, see Figure 10. Strengths, after their combination, correspond to rule weights in rule-based systems, too.

Figure 10. An example of an FCM.

There are two basic formal definitions of FCM, [18] and [19]. Further, the definition by Chen will be used, where an FCM is defined as a 4-tuple:

FCM = (C, E, α, β),    (12)

where:

C – finite set of cognitive units (nodes) described by their states C = {C_1, C_2, ..., C_n};
E – finite set of oriented connections (edges) between nodes E = {e_11, e_12, ..., e_nn};
α – mapping α: C → [0, 1] (originally proposed as [−1, 1]);
β – mapping β: E → [−1, 1].

In other words, α is a computational method for evaluating the numerical values of the set of nodes C. For the sake of correctness, it is necessary to mention that a cognitive unit is represented by two values: (a) its symbolic (linguistic) meaning denoted as C_i and (b) its numerical activation value A_i ∈ [0, 1]. C_i represents the qualitative aspect of a node and A_i the quantitative one. However, for reasons of simplicity we can omit this distinction and we will use the symbol C_i although further on we will handle only activation values. On the other hand, β represents the way in which the weights of the edges e_ij ∈ [−1, 1] are determined. The signs in this case define the character of the relationship between nodes: either strengthening (+) or weakening (−).

The set of connections E forms the so-called connection (adjacency) matrix, which can be used for computing new activation values of the nodes C. For the example in Figure 10 it will look as:

E =
[  0     −1     0    −0.5    1 ]
[ −0.9    0     0.3    0     0 ]
[  0      0     0      0     1 ]
[  0      0    −0.4    0    −1 ]
[  0      0     0     0.7    0 ] .    (13)

Cognitive units are in each time step k in a certain state. Using E we can compute their states for the next time step k + 1 and thus repeatedly for further steps. Similarly as for differential equations, we can draw phase portraits. To preserve the values within the prescribed limits a limiting (threshold or transform) function L is used, too. So we can compute the states C_i for k + 1 as follows [19]:

C_i(k + 1) = L( Σ_{j=1}^{n} e_ij · C_j(k) ).    (14)

The formula (14) can be generalized for computation of the whole state vector C using a matrix product, i.e. C(k + 1) = L(E · C(k)), which is a special form of α in the FCM definition. There also exists an incremental modification of (14), where C(k) is incremented by the computed values, e.g. [20]:

C_i(k + 1) = L( C_i(k) + Σ_{j=1}^{n} e_ij · C_j(k) ).    (15)

The primary role of the function L is to keep the activation values in the interval [0, 1]. A number of functions fulfil this condition; however, the sigmoid function seems to be the most advantageous [21]. For a special kind of FCM with active inputs [22], any kind of typical membership function can be used as L.
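As an illustration, the synchronous update (14) with a sigmoid L can be coded directly over the matrix from (13). The slope of the sigmoid, the initial activation vector and the convention that e_ij weighs the influence of node j on node i are assumptions of this sketch:

```python
import math

# Adjacency matrix E from (13); entry E[i][j] is assumed to weigh node j -> node i.
E = [
    [ 0.0, -1.0,  0.0, -0.5,  1.0],
    [-0.9,  0.0,  0.3,  0.0,  0.0],
    [ 0.0,  0.0,  0.0,  0.0,  1.0],
    [ 0.0,  0.0, -0.4,  0.0, -1.0],
    [ 0.0,  0.0,  0.0,  0.7,  0.0],
]

def L(x, slope=5.0):
    """Sigmoid limiting function keeping activations in [0, 1]."""
    return 1.0 / (1.0 + math.exp(-slope * x))

def fcm_step(C):
    """One synchronous update C(k+1) = L(E . C(k)), cf. (14)."""
    return [L(sum(E[i][j] * C[j] for j in range(len(C)))) for i in range(len(E))]

C = [1.0, 0.0, 0.0, 0.0, 0.0]   # initial activation vector (illustrative)
for _ in range(10):
    C = fcm_step(C)             # iterate the map; C stays within [0, 1]
```

Repeating `fcm_step` traces the trajectory of the map, which is exactly the phase-portrait view mentioned above; the incremental variant (15) would only add `C[i]` inside the sum.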

If we compare a complex system like that in Figure 10 with a simple system of m independent rules as in Figure 11 (e.g. rows 3–6 in Figure 2), we can see that FCMs are an extension of simple fuzzy rule-based systems. Fuzzy rules are totally independent in simple systems because their consequents have no mutual influence, which is possible only in simpler decision cases. Simple rule-based systems do not enable any decision chains or representation of temporal information. From this point of view they are only a very special and restrained case of FCMs. This can be depicted as in the example in Figure 11, where a set of m rules with antecedent parts LX_i, which represent already merged particular inputs such as x_j is LX_ij (x_j ∈ X), and outputs LU_i (i = 1, ..., m) is figured in the form of an FCM. The evaluation process is depicted there as well, resulting in an accumulated (aggregated) value LU_c being defuzzified into a crisp form LU*_c. A more detailed description of fuzzy reasoning methods is given e.g. in [23].

Figure 11. An example of an FCM representing a simple rule set.

2. Autonomous Decision-Making Systems in Multiagent Environment

The means for decision-making support mentioned in the previous section represent elements of a decision and control structure, which formulates the main goal of a given system and interconnects these means to fulfil the goal. If we consider an individual robot, then only DL-0 and DL-1 will be necessary, see Figure 1. However, robotic soccer is a team play and robots must be supplied with further abilities for their mutual cooperation, which extends our basic structure to DL-2, whose task is to create an overall structure for control, decision-making, communication, cooperation and behaviour management of a distributed system with several more or less independent units called agents. Such an overall structure is the objective of MAS.

The concept of MAS was inspired by several apparently different disciplines like economy, sociology and biology, and it represents common interconnections among them. From the typological viewpoint of information sciences it originates from an intersection of distributed computing and artificial intelligence as a part of distributed artificial intelligence [24], which differs from distributed computing by using communication and cooperation instead of parallelization and synchronization.

In our case agents play the key role in MAS. There does not exist any general definition of an agent because this term is too widespread, existing on various levels of complexity, and it can be implemented in a number of forms. However, we can characterize it by some minimal properties that must be fulfilled [25]:

1. Autonomy: An agent is able to operate independently from a human. It has individual internal states and goals. Autonomy cannot be real without the proactivity attribute, whereby the agent takes the initiative and does not wait passively for a stimulus from outside.

2. Cooperation: This is the core property; without it we cannot speak about any MAS. It is the ability to interact with other agents (eventually humans) using a communication language.

3. Learning: Maybe not always an imperative but a very useful property, which enables the agent to be flexible and able to adapt to changes of the environment.

Intersections of these properties enable us to define the following categories of agents: collaborative agents, collaborative learning agents, interface agents and smart agents, see Figure 12. For purposes of robotic soccer mainly collaborative and smart agents come into consideration. If any central command unit like a coach or captain is allowed, then collaborative learning agents can be used as well.

Figure 12. Basic typology of agents by [25].

Another division of agents is possible by the kinds of control and their approaches: reactive, deliberative, hybrid and behaviour-based [26].

Reactive control represents the simplest type. Its approach can be expressed as: "Do not think, act." Such an agent only reacts immediately to an external stimulus. It uses a finite set of actions known in advance and so it does not need any inner model of the surrounding world or communication with other agents. For this reason such agents are known as 'memoryless'. Their main advantages are computational simplicity and speed. They are used on the lowest decision (control) level, based on direct inputs from sensors. Although these agents seem to be very primitive, a suitable growth of their complexity can lead to apparently intelligent behaviour with the help of the so-called subsumption architecture, which belongs to this category, too.
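A subsumption-style reactive controller can be sketched as a priority-ordered list of simple behaviours, where a higher-priority layer that fires suppresses all layers below it. The sensor fields, thresholds and command values used here are purely illustrative assumptions:

```python
def avoid_obstacle(sensors):
    """Highest-priority survival layer: turn away when an obstacle is close."""
    if sensors["front_dist"] < 0.3:
        return {"turn": 1.0, "speed": 0.0}
    return None                 # no opinion; let lower-priority layers act

def wander(sensors):
    """Default layer: simply drive forward."""
    return {"turn": 0.0, "speed": 0.5}

def subsumption_control(sensors, layers=(avoid_obstacle, wander)):
    """Layers are tried in priority order; the first one that fires subsumes the rest."""
    for layer in layers:
        command = layer(sensors)
        if command is not None:
            return command
    return {"turn": 0.0, "speed": 0.0}   # safe fallback: stop
```

With an obstacle ahead (`front_dist` below the threshold) the avoidance layer wins; on a free path the wander layer drives the robot. More advanced patterns, up to e.g. passing a ball, would simply be added as further layers.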

Deliberative control is based on: "Think hard, then act." It is characterized by a proactive and intentional approach. Its main feature is communication with its surroundings and creating its own inner world model with the aim of reaching the determined goals. The world model is strictly centralized and based on state-action sequences. Its purpose is to design plans of actions. Beside apparent advantages there are also drawbacks like high information dependency and computational complexity. Such an approach runs up against needs such as prediction, processing inaccurate information, changing the model in a dynamic environment, etc. Therefore, the use of deliberative control is reasonable on the highest decision levels.

The hybrid approach is characterized by: "Think and act independently, in parallel." It tries to eliminate the disadvantages of the two mentioned types of control. Comparing the real needs of robotic teams, we can see the necessity of both approaches. Therefore, the reactive as well as the deliberative architecture are merged into one unit, but they work independently of each other. The reactive part ensures real-time responses on the level of sensor data, e.g. to avoid collisions with obstacles. The deliberative part works on high decision-making levels using an abstract world representation to make strategic decisions and plans.

The behaviour-based approach is of a somewhat different nature. Its basic idea is: "Think the way you act." This approach found its inspiration in biology. It is based on observed patterns of various kinds of behaviour. These behaviours are derived from interactions between the agent and its environment. So incremental adaptation is supposed, where first basic (survival) actions are created and more advanced patterns are continuously added, e.g. starting with simple obstacle avoidance up to e.g. passing a ball. Individual patterns are stored in the system in the form of distributed layers, which are mutually competitive. This is the main difference from the hybrid type, and it also determines the use of these two control types.

Hybrid agents are used mostly in single-robot control, whereas behaviour-based control is used in multi-robot applications like robotic soccer, where a quick response to the changes of such a dynamic game is especially required. In the next parts of this section we will deal mainly with such MAS architectures as are related first of all to behaviour-based control.

There is still another division of agents in MAS, which are constructed over them: (a) homogeneous, i.e. agents are mutually equivalent (in hardware as well as software), performing the same tasks, and (b) heterogeneous, if they differ by their roles (e.g. the possibility to incorporate hierarchies), although their hardware could be identical. Heterogeneous agents are, from the viewpoint of research and also of utilization potential, more interesting, and therefore most MAS architectures try to solve communication and role allocation among heterogeneous agents, too.

Summarizing the mentioned facts, we can state that MAS are suitable especially for situations where agents have incomplete information and data are decentralized. In addition, any control from a global level is absent and calculations are performed asynchronously. As robotic soccer can be characterized by just such properties, it seems to be an excellent candidate for using MAS as well. In general, MAS enable a shortening of computational response (thanks to parallelism) as well as of communication demands because agents are functionally independent units. Besides, the operability and reliability of such a system are enhanced because of the flexibility of MAS architectures in general.

In the next parts we will introduce some types of decision-making systems for robotic teams where the principles of MAS are more or less visible. However, up to now there does not exist any decision-making system with the full functionality of MAS as described above. We must take into consideration that there are still lots of teething problems which must be solved, and robots are not yet equipped with processors powerful enough to solve these problems in full breadth and depth.

2.1. Cooperative Control Architecture ALLIANCE

ALLIANCE is one of the best-known control architectures for multi-robot applications [27]. Although it is not directly used in robotic soccer because of its properties, where the existence of adversaries is not supposed, some other approaches (e.g. [28]) are more or less inspired by ALLIANCE.

ALLIANCE is proposed for the mutual cooperation of heterogeneous robots in small- and medium-sized teams. It is determined for dynamic environments, where problems with faulty sensing or incomplete information can occur to some extent. Mutual communication is not always necessary and there is no central unit with complete information about the environment. Agents (robots) are fully independent. However, on the other hand, it is required that robots should be able, at least partially, to sense their position and the results of their activities, as well as those of their teammates, in general. First of all, full cooperation among all robots is supposed, i.e. no tricks, no lying or even adverse behaviour. Just this last condition contradicts the needs of robotic soccer, but there are many other application possibilities where we can conform to these requirements, like watching systems, object (garbage) collecting or handling (box pushing).

As in practice some problems with communication can happen, it is more convenient to be able to recognize the behaviour of other robots, and so predict their future performance, than to ask them directly about their activities. The principles of ALLIANCE are based on such action recognition. The basic architecture of ALLIANCE, which is implemented on each agent, is depicted in Figure 13. We can see it is a fully distributed cooperative control scheme of the behaviour-based architecture. It is based on behaviour sets, which represent individual roles or modes. For instance, in our case a robot can change the role of an attacker to a defender depending on the situation. These roles are connected with various activities and behaviour on higher decision levels. A behaviour set can be either active or inactive at a given moment. Only one behaviour set can be active for each robot. On the contrary, the lower control levels (indicated as layers) are allowed to be active continuously, e.g. for collision prevention. The selection of a behaviour set is performed by motivational behaviour, one for each behaviour set separately. The motivational behaviour evaluates the suitability of a given behaviour set and activates or inhibits it, which is indicated in Figure 13 by gate-shaped symbols. Therefore, these modules are mutually interconnected by cross-inhibitions.

The main idea of this approach is based on the assumption that each agent should perform a task only as long as it demonstrates its ability to produce the desired effect on the environment. It means agents are self-checking and they alone decide about their activities. Such behaviour is based on two principal motivations, impatience and acquiescence. Impatience helps the agent to act if other agents fail. On the contrary, if the agent is performing a task which continues over a certain time limit, then it will become more willing to interrupt it and to enable other agents to continue, with a chance that it will be accomplished. At the moment when other agents start performing the same task, the given agent will give it up. As an example we can mention transferring boxes from one place to another and placing them into racks. If an agent (A) starts transferring, then the agent (B) will try e.g. to place them into racks. This is caused by the acquiescence motivation of B. After a certain time A can decelerate its activities, e.g. due to discharged batteries, which can be observed in that B must wait for new boxes. Then B will also start transferring boxes, i.e. the impatience motivation is shown. At the same time, the agent A will give up transferring boxes due to the acquiescence motivation and another behaviour set will be activated, e.g. starting to recharge the batteries. In other words, the robot A should allow the robot B to finish its task before it again becomes impatient with B. On the contrary, there is no reason for A to stay inactive if a given task is unfinished and at the same time no other robot starts accomplishing it. If both motivations are balanced to be mutually complementary, then both agents will cooperate without any eventual conflicts.

Figure 13. The ALLIANCE architecture [29].

A behaviour set is composed of activities which together lead to accomplishing a task. It depends on the concrete task whether these actions are mutually interconnected by some decision rules or simply chained into a series. For instance, we can define a task 'passing'. How to realize it, what sequence of steps to choose (e.g. leading the ball, turning to an appropriate kicking angle, deciding about eventual obstacle avoidance and finally kicking it, etc.), is the objective of the concrete tasks contained in a behaviour set. Therefore, each behaviour set is handled as a unit.

Similarly, as activities alternate among robots, behaviour sets alternate mutually, too. As already mentioned, at one moment only one behaviour set can be active on the robot i. However, if it has been active for a considerably long time and another behaviour set j exceeds a certain activation threshold of its motivation value m_ij(t), then the first set will be inhibited and the other one activated. The motivation m_ij(t) depends on several factors described by functions like sensory feedback sf(t), inter-robot communication irc(t), suppression from active behaviour sets sabs(t), as well as the already mentioned impatience imp(t) and acquiescence acq(t). For a time step t the total motivation m_ij(t) (m_ij(0) = 0) is calculated as

m_ij(t) = (m_ij(t − 1) + imp(t)) × sf(t) × irc(t) × sabs(t) × acq(t).    (16)

The functions sf(t), irc(t), sabs(t) and acq(t) are in the form of threshold functions, whose outputs are 0 or 1. These values indicate the occurrence of some conditions for growing m_ij(t) (value 1) or resetting it to 0. The impatience function imp(t) defines the rate of motivation growth. All these functions contain a number of parameters. To set them up more easily, an adaptation (learning) modification of ALLIANCE named L-ALLIANCE was designed. A more detailed description of ALLIANCE as well as L-ALLIANCE can be found in [29].
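The update (16) is straightforward to code. In the sketch below the concrete impatience rate, the threshold values and the gate inputs are illustrative assumptions, not parameters taken from [29]:

```python
def motivation_update(m_prev, imp, sf, irc, sabs, acq):
    """One step of (16); sf, irc, sabs and acq are 0/1 threshold-function outputs."""
    return (m_prev + imp) * sf * irc * sabs * acq

def select_behaviour_set(active, motivations, thresholds):
    """Activate behaviour set j once its motivation exceeds the activation threshold."""
    for j, m in motivations.items():
        if j != active and m >= thresholds[j]:
            return j                         # inhibit the active set, activate j
    return active

# Motivation grows while all gates stay open, and resets once any gate drops to 0.
m = 0.0
for _ in range(5):
    m = motivation_update(m, imp=0.2, sf=1, irc=1, sabs=1, acq=1)
m_reset = motivation_update(m, imp=0.2, sf=1, irc=1, sabs=0, acq=1)  # suppressed -> 0
```

After the five open-gate steps `m` has grown to about 1.0; a single closed gate (here `sabs = 0`) multiplies the whole expression by zero, which is exactly the resetting behaviour described above.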

2.2. Behavioral Kinodynamic Planning

MAS approaches tend to divide a given control system into hierarchical levels. Such a way is indeed systematic, giving an inner order, but the individual control levels lose immediate contact with each other because it is restricted only to exactly defined communication links. Further, many tasks are in reality hard to separate because they are conditionally interconnected. Such a simplified division into modules and levels can be too rough. Thus higher strategic levels lose contact with the physics of a robot and its environment. Similarly, lower control levels have no notion of strategy. Further, soccer (football) is considerably characterized by its randomness. For these two reasons the so-called Randomized Behavioral Kinodynamic Planning (RBKP) of tactics was proposed in [7].

RBKP is based on embedding predefined basic skills and tactics into a kinodynamic planning framework, which utilizes the physical principles of kinematics and dynamics. Such an approach is strongly problem-oriented, i.e. it requires specific knowledge, but its efficacy can be very high. Using RRTs-based planning (see section 1.2) enables tactics to be quickly defined as the best nondeterministically modeled combination of skills. Robotic soccer is thus seen as a randomized kinodynamic planning problem. The structure of RBKP is depicted in Figure 14. If we compare this structure with Figure 1, we can see that the modules of strategy choice and action planning in DL-1 are merged in this approach.

For modelling tactics a nondeterministic Finite State Machine (FSM) is used. This approach enables the generation of various sequences of state transitions, i.e. tactics, which reduces the behaviour predictability for opponents, too. Accordingly, the skills will also be nondeterministic. The system searches over a variety of simulated executions until it finds an instance that leads to the goal. The skills do not execute deterministic actions based on the state of the environment; instead, they have a certain freedom to randomly choose from alternatives of their typical behaviour. For instance, if a skill named kick to goal usually tries to kick the ball into the biggest free area of the goal, it can randomly change its habit to kicking into a corner of the goal and so surprise the opponent goalie. In such a way the system becomes more creative, experimenting with various alternatives, and thus it has a chance to continuously find optimal tactics as well as skills. In other words, it is a self-learning approach.

To transform our task of robotic soccer into the graph space of RRTs it is necessary first to define the state space X of robotic movement rm as well as the inner states of robots described by FSM states fsm, together with information about time t, i.e. x = (t, fsm, rm), where x ∈ X is a concrete state of the whole playground. The robotic movement represents a subspace of X, where e.g. the positions, rotations, velocities, etc. of all objects of the playground are included, i.e. not only the robots but also the ball, other obstacles, etc. Hence X is basically a multi-dimensional space, generally with a high dimensionality. To simplify such a form of representation it will be transformed into a space Q of decision trees, in our case RRTs.

Figure 14. Structure of the randomized behavioral kinodynamic planning system [7].

The process of constructing behavioral kinodynamic planning is the following. Using the start state, the tree T is initialized. Then, using the function SAMPLE_RANDOM_STATE, which uses an internal probability distribution, a sample state x_sample(i) from the state space enters the RRT creating algorithm, see Figure 6. In comparison to section 1.2 the calculation of the nearest point q_n requires a more sophisticated metric than the Euclidean one, see [7]. This is provided by the function KINEMATIC_NEAREST_NEIGHBOUR. As the behaviour depends also on the inner states of the robot, it is necessary to calculate the state transition using a nondeterministic FSM (function FSM_TRANSITION), too. Then we verify whether this new internal state fsm_state is valid (function IS_VALID_FSM_STATE), i.e. all conditions and limitations are kept. For instance, there is a rule that only the goalie can operate in the penalty area; others are prohibited from entering it. If this is not fulfilled, a new x_sample(i) is chosen. If the new state is acceptable, an action a, which is a skill, is calculated by APPLY_FSM_BEHAVIOUR. This function associates concrete inner states with actions into pairs (fsm_state, a), which is the task of the user's imagination about the style of playing, and they can be expressed also in the form of rules like IF fsm_state THEN a. The action a is usually a vector with several parameters describing e.g. the angles of individual joints. At this moment a and q_n enter the kinodynamic simulator (function SIMULATE), the so-called Physics Engine in Figure 14, where after a simulation over ∆t a new state x_new is computed. This value corresponds to a point on the line segment between q_n and x_sample(i). After a certain number of simulations it will reach x_sample(i) or, in the case of an obstacle, the point q_O. The engine also includes a function IS_VALID_PHYSICS_STATE, which checks that all limitations are fulfilled from the viewpoint of physics, too. If it is correct, then x_new will be added to T as q_new. Again a new x_sample is generated and the whole cycle is repeated until a goal is shot or we give up (or the game is timed out). If a goal or its surrounding X_goal is reached, we can apply a search tree algorithm (Figure 14) in the backward direction until the initial state is reached. After reversing we obtain a sequence of skills, i.e. a tactic. In Figure 15 the pseudocode of the described RBKP algorithm is depicted, which was also used for two-wheeled robots in RoboCup [28].

Initialize RRT
for i = 1 to K
    x_sample(i) ← SAMPLE_RANDOM_STATE()
    q_n ← KINEMATIC_NEAREST_NEIGHBOUR(T, x_sample(i))
    fsm_state ← FSM_TRANSITION(q_n)
    if IS_VALID_FSM_STATE(fsm_state)
        a ← APPLY_FSM_BEHAVIOUR(fsm_state, q_n, x_sample(i))
        x_new ← SIMULATE(q_n, a, ∆t)
        if IS_VALID_PHYSICS_STATE(x_new)
            Add node q_new to T
            Add edge (q_n, q_new) with value a
            if x_new ∈ X_goal
                return x_new
            end if
        end if
    end if
end for
return failed

Figure 15. Pseudocode of RBKP algorithm [7].
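The pseudocode of Figure 15 can be turned into a runnable sketch by substituting toy stand-ins for each function: a one-dimensional "playground" where a state is a single position, a hypothetical two-state FSM, and a trivial physics engine. Every stand-in below is an assumption made only to show the control flow of the loop, not the kinodynamic machinery of [7]:

```python
import random

ACTIONS = (-0.1, 0.1)                     # toy action set: move left/right

def sample_random_state():
    return random.uniform(0.0, 1.0)

def kinematic_nearest_neighbour(tree, x):
    # Euclidean stand-in for the kinematic metric of [7]
    return min(tree, key=lambda q: abs(q - x))

def fsm_transition(q):
    return "dribble" if q < 0.8 else "kick"   # hypothetical two-state FSM

def is_valid_fsm_state(state):
    return state in ("dribble", "kick")

def apply_fsm_behaviour(fsm_state, q, x):
    # the skill usually drives q toward the sample, with a random twist
    preferred = 0.1 if x > q else -0.1
    return preferred if random.random() < 0.9 else -preferred

def simulate(q, a, dt=1.0):
    return q + a * dt                         # trivial "physics engine"

def is_valid_physics_state(x):
    return 0.0 <= x <= 1.0                    # stay on the playground

def rbkp(K=500, seed=1, goal=0.95):
    random.seed(seed)
    tree = [0.0]                              # RRT initialized with the start state
    for _ in range(K):
        x_s = sample_random_state()
        q_n = kinematic_nearest_neighbour(tree, x_s)
        fsm_state = fsm_transition(q_n)
        if is_valid_fsm_state(fsm_state):
            a = apply_fsm_behaviour(fsm_state, q_n, x_s)
            x_new = simulate(q_n, a)
            if is_valid_physics_state(x_new):
                tree.append(x_new)            # add node q_new and edge (q_n, q_new)
                if x_new >= goal:             # x_new ∈ X_goal
                    return x_new
    return None                               # failed
```

`rbkp()` grows the tree from the start position until a node lands in the goal region; in the real planner the backward pass over the tree would then recover the sequence of skills, i.e. the tactic.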

Although this method does not guarantee the optimality of the solution, it has one interesting potential. Thanks to the randomness, the opponents can predict the behaviour of our team only with great difficulty, but we can predict the behaviour of the opponents. As the physics of each investigated point x is exhaustively described, if we incorporate some basic and supposed efforts of the opponents, like e.g. the effort always to catch the ball, then using our physics engine we can model the opponents' behaviour and so undertake some steps in advance. It is only necessary to construct a special FSM for the opponents.

2.3. Negotiation-Based Approaches

The previous two MAS methods have their own strengths but also limitations. ALLIANCE is a very strong method in real applications, where the environment is uncertain, sensors and actuators are inaccurate, there are noises of various origins or other agents simply fail. However, this approach in its pure form is not convenient for applications where we meet with adversarial behaviour. Behavioral kinodynamic planning needs first of all an accurate physical model of a given domain, and it can be a serious problem to obtain it. On the other hand, its unpredictability for opponents and quick planning make it very suitable just for adversarial environments. However, both approaches guarantee neither optimality nor full efficacy. The so-called negotiation-based approaches try to optimize cooperation among robots and to find the best one for a given task. In general, these approaches are fully decentralized. An agent which needs a task to be fulfilled at first offers it, e.g. by broadcasting, to the others. Then the agent with the best bid will be assigned the given task and thereby it will be fully responsible for its accomplishment. There are various modifications and extensions of this approach; further on we will deal with the so-called market-driven approach (MDA), which was derived from the auction-based task allocation system MURDOCH [30].

MDA [31] is based on the principle of free markets, which utilizes the supposition that if each participant of a market tries to maximize its own profit, then the profit of the whole team will be increased, too. Of course, communication among the participants of a common team is necessary and they interchange all necessary information. The team strives for a common objective, in our case winning the game. Usually, such a common objective is reachable only after a series of steps (partial objectives) into which it is decomposed. For instance, shooting a goal is a result of several combined steps (objectives or tasks) such as passing, catching and dribbling the ball. For each partial task an auction is held. There is a virtual auctioneer and a group of bidders, i.e. players in this context. The bidders compute the supposed costs necessary to perform a given task and offer them to the auctioneer. In the case of mobile robots the costs are in the form of time consumption for translation or rotation during movement to a given position, or for obstacle avoidance. The auction winner is the one offering the lowest task costs, and its reward is the allowance to perform this task. Incorporating further auction rules depends on the level of complexity that we need to reach, e.g. embedding a certain momentum element to prevent too frequent exchanges of player roles.

The concrete implementation is based on estimating these costs [31] for each robot separately:

C_ES = w_1 · t_d + w_2 · t_a + w_3 · COA_g,    (17)

C_A = C_ES + w_4 · COA_b,    (18)

C_D = w_5 · t_d + w_6 · t_a + w_7 · COA_d.    (19)

The value C_ES represents the estimated score costs, computed as a sum of the costs required to move a specified distance t_d and angle of rotation t_a, as well as the costs caused by obstacle avoidance (COA) on the path to the opponent goal area, COA_g. C_ES is just the value offered to the auctioneer by the robot i, so the equations (17) – (19) should correctly be indexed by i. However, the real costs for the winner (the attacker) are given by (18), where it is still necessary to take into account the costs of obstacle avoidance on the path to the ball, COA_b, which are in reality the first calculated costs, preceding C_ES. However, it is supposed that the initial goal position is more important than the position relative to the ball, which can be passed by another teammate to the attacker, i.e. the auction winner. Similarly to (17), the defender's costs are calculated, where COA_d represents the costs of obstacle avoidance on the path to the midpoint of the line between the own goal and the ball position. The weights w_1, ..., w_7 determine the playing strategy. For instance, if w_1 is a high value, then the winning robot will try to shoot a goal only from the immediate vicinity of the opponent goal, because only robots occurring in that area have the chance to become an attacker. There are several ways how to design these weights. Beside expert evaluations, several setting-up methods for w_i were proposed, e.g. reinforcement learning, neural networks and genetic algorithms.

MDA is based on dynamically assigning the roles of attackers and defenders. Beside the goalie, whose role is given statically for the whole game, the other robots change their tasks according to their positions relative to the goals (own or opponent) and the costs of obstacle avoidance. Each robot (beside the goalie) computes (17) – (19) in each sampling step and offers them to two auctions, for choosing one attacker and one defender. The robot with the lowest value of C_ES will become an attacker and, similarly, for the lowest C_D it will take over the role of a defender. Other robots can be in the position of secondary attackers or further defenders. If, as mostly in robotic soccer, there are only three robots, one goalie, one attacker and one defender, the task will be simplified. It will be necessary only to calculate (17) and the defender will be the remaining robot. The whole role-assigning algorithm is depicted in Figure 16, where in the module of role assignment either the role of a defender or of any auxiliary position is assigned to a given robot. This approach was successfully used in SPL [32], too.

Figure 16. Role assigning algorithm using MDA.
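The cost formulae (17) and (19) and the two auctions can be sketched as follows. The weight values, the robot data and the simplification that both roles use the same illustrative weights are assumptions of this example, not values from [31]:

```python
def estimated_costs(t_d, t_a, coa_g, w=(1.0, 0.5, 2.0)):
    """C_ES by (17): distance time, rotation time, obstacle avoidance to the goal."""
    return w[0] * t_d + w[1] * t_a + w[2] * coa_g

def defender_costs(t_d, t_a, coa_d, w=(1.0, 0.5, 2.0)):
    """C_D by (19); illustratively reusing the same weights for w5..w7."""
    return w[0] * t_d + w[1] * t_a + w[2] * coa_d

def assign_roles(robots):
    """Each field robot bids into two auctions: the lowest C_ES wins the attacker
    role, and the lowest C_D among the remaining robots wins the defender role."""
    bids_es = {name: estimated_costs(*r["to_goal"]) for name, r in robots.items()}
    attacker = min(bids_es, key=bids_es.get)
    bids_d = {name: defender_costs(*r["to_defence_point"])
              for name, r in robots.items() if name != attacker}
    defender = min(bids_d, key=bids_d.get)
    return attacker, defender

# Illustrative (t_d, t_a, COA) tuples for two field robots:
robots = {
    "r1": {"to_goal": (2.0, 0.4, 0.0), "to_defence_point": (5.0, 1.0, 0.5)},
    "r2": {"to_goal": (6.0, 0.2, 1.0), "to_defence_point": (1.5, 0.3, 0.0)},
}
# r1 has the cheaper path toward the opponent goal and becomes the attacker,
# leaving r2 as the defender.
```

In a full implementation the bids would be recomputed in every sampling step, and a momentum element could be added to keep a role from oscillating between robots with nearly equal costs.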

In [33] a hybrid Integrated Decision-making system of Players (IDP) was designed, which connects MDA and FCMs. Its basic structure is shown in Figure 17. Cyclically, at certain time steps, vision with image processing and scene recognition is performed, followed by strategy choice realized by MDA. After that the possibility to kick the ball is verified. If a robot is in a kicking position, regardless of its current role, it will try to kick the ball in correspondence with its role. Otherwise the robots continue their movements for one time step and then the playing situation is evaluated again. Parameters for equations (17) – (19) are obtained from cameras, and the COA values are estimated in a special navigation module proposed in [34], whose output is utilized in the module of strategy choice.
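This perceive–decide–act cycle can be summarized as a simple control loop. The sketch below is a structural outline only; the callback names are placeholders, not the actual module interfaces of [33]:

```python
def idp_step(perceive, choose_strategy, possible_kick, role):
    """One sampling step of the IDP cycle: perceive the scene, choose a
    strategy via MDA/FCM rules, then either kick (if the 'Possible kick?'
    module allows it for the current role) or keep moving one time step."""
    scene = perceive()                       # vision + scene recognition
    action = choose_strategy(scene, role)    # strategy choice module
    if possible_kick(scene, role):           # kicking possibility check
        return ("kick", action)              # invoke low-level kick controller
    return ("move", action)                  # continue movement, re-evaluate next step

# Toy example: an attacker close enough to the ball decides to kick.
result = idp_step(
    perceive=lambda: {"ball_dist": 0.1},
    choose_strategy=lambda scene, role: "toward_goal",
    possible_kick=lambda scene, role: scene["ball_dist"] < 0.2,
    role="attacker",
)
# -> ("kick", "toward_goal")
```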

Figure 17. Basic architecture of the IDP approach.

In MDA the overall playing strategy is formulated implicitly rather offensively than defensively, because the players always try to win the ball and kick it regardless of their role. The formulae (17) – (19) serve for role assignment, and the weights w_i concretely determine the conditions under which individual roles are assigned. However, how individual players actually play depends on the particular rules that we define for each role. We can propose either simple rules only, e.g. a defender always tries to pass the ball to the attacker, or more complicated sets of rules combining e.g. passing and dribbling or mutual passing among several players. In IDP these rules are implemented in FCMs (in the strategy choice module).

The decision-making module about kicking possibility (Possible kick?) detects whether a robot is already able to kick the ball according to the instructions of its role. If yes, it determines the kicking target (goal or teammate) and invokes a robot controller from the low control level to drive all its actuators, and the robot kicks. The decision-making module then checks whether kicking has caused any instabilities (e.g. a crash or fall). For determining the kicking target a special system, also based on FCMs, is proposed, in which the shooting and catching rules are included (see Figure 18); it is a part of the overall IDP FCM. Each player has its range of interest, which is projected into the supports of the membership functions used in the nodes TM_i (see Figure 19). Players try to move just within this range: the wider the support, the bigger the range of interest. The kicking FCM differs from other parts of the overall FCM in that it is dynamic, i.e. its membership functions and connection weights change in time and depend on the current playing situation. The peaks of the membership functions in the nodes TM_i represent the angles of view of the players seen from the shooter's perspective, and similarly for the opponent O. The success of passing the ball depends on the mutual positions of the players as well as on the shooter's speed. Therefore the weights of the connections w_si are inversely proportional to the distances between the shooter and the teammates. The values of the nodes A_j are also inversely proportional to the size of the angular rotation needed by the shooter. These nodes (A_j) represent the possible angular ranges (intervals) of shooting to individual teammates. The calculation of the resultant kicking angle is performed in the node S.

Figure 18. General structure of the kicking FCM.

In each time step new membership functions in the nodes TM_i and O are set up, with their peaks at the measured angle values of the players' positions. Simultaneously, the weights w_si and the values in the nodes A_j are calculated in the already mentioned manner. As the membership functions in the nodes TM_i should help to attract the ball and, conversely, the function in the node O should repulse it, the latter is constructed in a reverse manner to the other functions, i.e. it has its minimum at the measured angle of the opponent, see Figure 19. Then the values of the nodes A_j are multiplied by the weights w_si; thus we get the total costs of shooting, which include rotation and distance. The functions TM_i are scaled using these costs and finally, in the node S, all membership functions are united using a t-conorm S_K.

Figure 19. Example of the evaluation process in a kicking FCM based on membership functions TM_i and O.
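A minimal numerical sketch of this evaluation follows, under the proportionality assumptions stated above and with triangular memberships and max as the t-conorm S_K; all shapes and numbers are illustrative, not the actual functions used in [33]:

```python
def kick_angle(teammates, opponent_angle, width=30.0):
    """Pick a kicking angle: each teammate contributes a triangular
    membership TM_i peaked at its measured angle, scaled by a weight w_si
    inversely proportional to its distance and a value A_j inversely
    proportional to the rotation the shooter needs; the opponent O
    contributes a repulsive (inverted) membership.  Node S unites the
    scaled memberships with max (a t-conorm) and returns the best angle.
    `teammates` is a list of (angle_deg, distance_m) pairs."""
    def tri(x, peak):                    # triangular membership, support 2*width
        return max(0.0, 1.0 - abs(x - peak) / width)

    best_angle, best_value = None, -1.0
    for angle in range(-90, 91):         # candidate shooting directions (deg)
        value = 0.0
        for tm_angle, dist in teammates:
            w_si = 1.0 / dist                            # closer -> higher weight
            a_j = 1.0 / (1.0 + abs(tm_angle) / 90.0)     # less rotation -> higher value
            value = max(value, w_si * a_j * tri(angle, tm_angle))  # t-conorm S_K
        value *= 1.0 - tri(angle, opponent_angle)        # opponent O repulses the ball
        if value > best_value:
            best_angle, best_value = angle, value
    return best_angle

# Teammates at 20 deg (2 m) and -40 deg (4 m); an opponent covering 20 deg
# suppresses the nearer pass, so the kick goes to the teammate at -40 deg:
print(kick_angle([(20, 2.0), (-40, 4.0)], opponent_angle=20))   # -> -40
```

The same call with the opponent moved out of the way (e.g. `opponent_angle=-80`) selects the nearer, less-rotated teammate at 20 degrees, which mirrors the attract/repulse behaviour of the TM_i and O functions described in the text.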

3. Concluding Remarks and Outlook into the Future

In this chapter an overview of some means and decision-making systems has been given. As robotic soccer represents a set of tasks and problems that appear in real applications, it was chosen for demonstrating these means. It is only a short and incomplete digest from a variety of used methods. We can observe efforts to use means from various research areas, starting from conventional control theory and ending in artificial intelligence, as well as to interconnect them although they can be of very diverse principles. There is only one criterion for their choice — the efficiency of an individual means for a given individual task. It is often remarkable how these means can mutually cooperate. Hybridization has become a watchword in constructing decision-making systems.

As the present technological level of robotics does not yet enable the use of too complex algorithms and decision structures, designers are still forced to develop computation-saving systems, which is reflected in their decision structures, too. However, in this area we can expect a very rapid change within only a few years. Especially in roughly the last five years, many advanced mobile robotic platforms have appeared with relatively powerful processors as well as quality sensors and actuators, e.g. the robotic dog Aibo or the bipedal robot Nao.

In the future we can suppose that research in robotics will be oriented more towards interactions with humans, which will lead to the use of emotional models and fuzzy logic to a greater degree. We will observe the development of decision-making as well as control systems from precise sensing and description of the environment (world) towards systems able to work in uncertain and hardly predictable environments. The period of using probabilistic and uncertainty-handling methods is now starting in robotics, too.

References

[1] J. Baltes, N. M. Mayer, J. Anderson, K.-Y. Tu and A. Liu, The humanoid leagues in robot soccer competitions, In Proc. of the IJCAI Workshop on Competitions in Artificial Intelligence and Robotics, Pasadena, California, USA (2009), 9–16.

[2] D. García, E. Carbajal, C. Quintana, E. Torres, I. Gonzalez, C. Bustamante and L. Garrido, Borregos: Team description, In RoboCup papers (2010).

[3] H. P. Störr, A compact fuzzy extension of the naive Bayesian classification algorithm, In Proc. InTech/VJFuzzy (2002), 172–177.

[4] C. Bustamante, L. Garrido and R. Soto, Fuzzy naive Bayesian classification in RoboSoccer 3D: A hybrid approach to decision making, In Proc. of the RoboCup International Symposium, Bremen, Germany (2006), 507–515.

[5] T. Hester, M. Quinlan and P. Stone, Generalized model learning for reinforcement learning on a humanoid robot, In Proc. IEEE Int. Conf. on Robotics and Automation (ICRA), Anchorage, USA (2010), 2369–2374.

[6] J. R. Quinlan, C4.5: Programs for Machine Learning, Morgan Kaufmann Publishers, 1993.

[7] S. Zickler and M. Veloso, Playing creative soccer: Randomized behavioral kinodynamic planning of robot tactics, In Proc. of RoboCup 2008: Robot Soccer World Cup XII (2008), 414–425.

[8] S. M. LaValle, Planning Algorithms, Cambridge University Press, Cambridge, United Kingdom, 2006.

[9] J. M. Vidal, Fundamentals of Multiagent Systems with NetLogo Examples, Unpublished, 2010.

[10] J. Brindza, A. Lee, A. Majumdar, B. Scharfman, A. Schneider, R. Shor and D. Lee, UPennalizers: Team report, In RoboCup papers (2009).

[11] C. Watkins, Learning from Delayed Rewards, PhD thesis, University of Cambridge, 1989.

[12] S. Barrett, K. Genter, M. Hausknecht, T. Hester, P. Khandelwal, J. Lee, A. Setapen, A. Tian, M. Quinlan, M. Sridharan and P. Stone, TT-UT Austin Villa: Team description, In RoboCup papers (2010).

[13] T. Hester and P. Stone, Generalized model learning for reinforcement learning in factored domains, In Proc. of the 8th International Conference on Autonomous Agents and Multiagent Systems (AAMAS), Budapest, Hungary (2009), 717–724.

[14] S. Oblak, I. Škrjanc and S. Blažič, If approximating nonlinear areas, then consider fuzzy systems, IEEE Potentials 25 (2006), 18–23.

[15] R. Axelrod, Structure of Decision: The Cognitive Maps of Political Elites, Princeton University Press, 1976.

[16] P. P. Groumpos, Fuzzy cognitive maps: Basic theories and their application to complex systems, In Fuzzy Cognitive Maps, Studies in Fuzziness and Soft Computing 247 (2010), 1–23.

[17] B. Kosko, Fuzzy cognitive maps, International Journal of Man-Machine Studies 24 (1986), 65–75.

[18] S. M. Chen, Cognitive-map-based decision analysis based on NPN logics, Fuzzy Sets and Systems 71 (1995), 155–163.

[19] W. Stach, L. Kurgan, W. Pedrycz and M. Reformat, Genetic learning of fuzzy cognitive maps, Fuzzy Sets and Systems 153 (2005), 371–401.

[20] E. I. Papageorgiou, K. E. Parsopoulos, C. D. Stylios, P. P. Groumpos and M. N. Vrahatis, Fuzzy cognitive maps learning using particle swarm optimization, International Journal of Intelligent Information Systems 25 (2005), 95–121.

[21] S. Bueno and J. L. Salmeron, Benchmarking main activation functions in fuzzy cognitive maps, Expert Systems with Applications 36 (2009), 5221–5229.

[22] J. Vaščák and M. Paľa, Adaptation of fuzzy cognitive maps for navigation purposes by migration algorithms, International Journal of Artificial Intelligence (in press).

[23] Z. C. Johanyák and S. Kovács, A brief survey and comparison on various interpolation-based fuzzy reasoning methods, Acta Polytechnica Hungarica 3 (2006), 91–105.

[24] P. Stone and M. Veloso, Multiagent systems: A survey from a machine learning perspective, Autonomous Robots 8 (2000), 345–383.

[25] H. S. Nwana, Software agents: An overview, The Knowledge Engineering Review 11 (1996), 205–244.

[26] M. J. Matarić, Learning in behavior-based multi-robot systems: Policies, models, and other agents, Cognitive Systems Research 2 (2001), 81–93.

[27] L. E. Parker, Evaluating success in autonomous multi-robot teams: Experiences from ALLIANCE architecture implementations, Journal of Experimental and Theoretical Artificial Intelligence 13 (2001), 95–98.

[28] J. Bruce, S. Zickler, M. Licitra and M. Veloso, CMDragons: Dynamic passing and strategy on a champion robot soccer team, In Proc. IEEE Int. Conf. on Robotics and Automation (ICRA), Pasadena, USA (2008), 4074–4079.

[29] L. E. Parker, ALLIANCE: An architecture for fault tolerant multirobot cooperation, IEEE Transactions on Robotics and Automation 14 (1998), 220–240.

[30] B. P. Gerkey and M. J. Matarić, Sold!: Auction methods for multi-robot coordination, IEEE Transactions on Robotics and Automation 18 (2002), 758–786.

[31] H. Köse, C. Mericli, K. Kaplan and H. L. Akin, All bids for one and one does for all: Market-driven multi-agent collaboration in robot soccer domain, In Proc. 18th International Symposium of Computer and Information Sciences (ISCIS), Antalya, Turkey (2003), 529–536.

[32] H. L. Akin, T. Mericli, N. E. Özkucur, C. Kavaklioglu and B. Gökce, Cerberus'10 SPL team, In RoboCup papers (2010).

[33] J. Vaščák and K. Hirota, Integrated decision-making system for robot soccer, Journal of Advanced Computational Intelligence and Intelligent Informatics 15 (2011), 156–163.

[34] C. Pozna, F. Troester, R. E. Precup, J. K. Tar and S. Preitl, On the design of an obstacle avoiding trajectory: Method and simulation, Mathematics and Computers in Simulation 79 (2009), 2211–2226.
