Towards Semantic Navigation System

grassquantityAI and Robotics

Nov 15, 2013 (4 years and 6 months ago)


Recent Advances in Intelligent Information Systems
ISBN 978-83-60434-59-8,pages 711–720
Towards Semantic Navigation System
Barbara Siemiątkowska,Jacek Szklarski,Michal Gnatowski,
and Adam Borkowski
Institute of Fundamental Technological Research,Polish Academy of Sciences,
We present an approach for semantic navigation of a mobile robot in a human-made
environment.Our system consists of following parts:map building,localization,
path planning.The dual map represents the environment at different levels of
abstraction:metric information is used for obstacle avoidance,semantic informa-
tion is usefull for robot-human interaction.Our path-planning module allows us
to specify the goal for the robot in natural language.
1 Introduction
Mobile robots find many applications nowadays.Starting from a tiny vacuum
cleaner (iRobot,2009) and ending with huge machines for cutting wood (Halme
and Vainio,1998),such robots replace human in performing hard,boring or dan-
gerous work.Poland has found its niche in mobile robotics:a family of remotely
operated robots supporting police and antiterrorist units has been developed at the
Industrial Institute of Automation and Measurements (PIAP) in Warsaw (PIAP,
2009).Several universities produce in a small scale mobile robots for education
and research (compare,e.g.(Chojecki and Olszewski,2009)).
In order to perform any useful job,mobile robot has to know where it is at
present.It must also posses a map of surrounding area that would allow the
robot to plan its further movements.The ability to localize itself,to build the
map and to plan routes on such map is incorporated in a navigation module that
plays important role in each controller of a mobile robot.For robots working
inside buildings (in-door robots) the main subproblems of navigation,namely,the
localization problem,the mapping problem and the path planning problem are
at present to much extent satisfactorily solved.A comprehensive overview of the
existing navigation techniques can be found in Thrun et al.(2005).
A serious drawback of the work done so far on navigation issues comes out from
a circumstance that this work was oriented solely towards geometrical description
of the world.Current position of the robot,locations of the obstacles and a
collision-free trajectory leading the robot to the next goal where expressed as
numbers,possibly random,referring to some fixed system of coordinates.In order
to act purposefully in the building,the mobile robot must be aware of the meaning
of surrounding world.A capability of breaking through the barrier that separates
the acquisition of geometrical features from the understanding of semantics will
712 Barbara Siemiątkowska et al.
determine the success of robots in the future.Most researchers recognize this
challenge and first attempts to add meaning to environment maps are reported in
the literature (Mozos et al.,2007;Vasudevan and Siegwart,2008).These works
follow a pattern recognition scheme.It is assumed that the robot is given certain
knowledge about the building (an ontology).Such knowledge allows the robot
to recognize particular components of the floor layout (windows,doors,etc.) on
the current map,or even to detect functionality of a particular subspace of the
building (a hall,a staircase,etc.).
Learning semantics autonomously is a difficult task.An obvious solution is to
do it interactively with the help of human operator.After the robot has attached
semantic labels to the objects in his working environment,two gains are visible
at the first glance.Firstly,the human-robot interaction is considerably simplified.
The operator can command go to the library instead of defining a digital position
of the goal.Secondly,the action planner of the robot can work at much higher
level of abstraction than purely geometrical reasoning.Thus,a request of bringing
a book would trigger a chain of reasoning leading to sub-goals go_to_library,
find_a_shelf,find_a_book,etc.One of the obvious ways to achieve such reasoning
is a rule-based inference engine,containing the ontology of buildings.Fortunately,
much of the latter is already available through Industry Foundation Classes (IFC)
(BuildingSMART,2008) and Building Information Modeling (BIM) (Eastman,
In this paper we show that certain semantic information can be extracted from
scans produced by a laser-based ranging sensor.This extraction is enabled by
a combination of image analysis techniques,a learn-capable feature filter and a
background knowledge about typical elements of the building.
2 Related Work
Historically mapping was divided into metric and topological approaches.The
former describes geometric properties of the environment while the latter describes
the environment as a list of connected places.The typical examples of metric
maps are well known Elves and Moravec’s occupancy grid mapping algorithm
which models the environment by free and occupied grids,and describing the
geometry of environment by a set of polygons proposed by Chatila and Laumond.
In topological approaches connections among given places contain information how
to navigate between each other (Thrun,2002).
In the last decade more popular are probabilistic maps,which can be divided
into:Kalman filters,Expectation-Maximization (EM) algorithms and semantic
maps.Kalman filters estimate the map which describes the location of artificial or
natural landmarks,as well as robot location.The EMalgorithm (Dempster et al.,
1977) is used to find out whether sensors measurements recorded at different points
in time correspond to the same physical object.Semantic maps identify objects of
the environment which may be ceiling,floor,walls,doors,etc.This kind of maps
are practical in human-robot interaction due to labeled objects of interest,which
simplify human-friendly commands like go to the library.A snapshot technique
in taking pictures from the real world and storing them in a semantic map is
Towards Semantic Navigation System 713
presented in Nielsen et al.(2004).
Another taxonomy distinguishes between global and local coordinate mapping.
The former does not contain information about the sensor measurements,while
the latter describes the sensor measurements that a robot would receive at different
locations.Local maps seems to be more convenient but it is difficult to extrapolate
unknown area and it is difficult to distinguish between two similar places (Thrun,
Image similarity based on local feature descriptions and geometric constraints
can be used for building an appearance graph representation.The graph is used
for representing semantic information about the space,and can be used for visual
navigation (B.Krose and Zivkovic,2007).
A nodding SICK in orthogonal 3D-SLAMfor indoor environments is proposed
for feature extraction in the fast SLAM algorithm.3D images are segmented into
planar patches with polygonal boundaries (Harati and Siegwart,2007).
In Vasudevan and Siegwart (2008) a robot identifies high level features (walls,
doors,etc) based on sensory information.Recognized objects are grouped into
abstractions along spatial and semantic dimensions.Spatial abstractions are per-
ceptual formations (walls,doors etc.),while semantic abstractions are conceptual
and functionality arrangements like work space,coffee space,etc.
3 Data Acquisition
The experiments have been performed on a mobile robot “Elektron” which has been
made at the Institute of Automatic Control and Robotics of Warsaw University
of Technology.The basic sensor is the Sick LMS 200 indoor laser mounted on a
rotating support which allows to take 3-dimensional snapshots representing the
environment.The scanning laser measures distance to obstacles in 2D plane,its
ray sweeps space for angles (90


) with resolution 0.5

giving 361 values.
The head can rotate the scanner about the horizontal axis within angular range
from -15

to +90

and the support rotates the laser vertically,which allows to
make 3D scans.One 3D scan gives usually about N  65000 points.
4 Object recognition based on image analysis
As discussed above the laser scanner provides measurements as a set of 3-tuples
g where  and  represent the horizontal angle of the laser ray and vertical
inclination angle of the laser base respectively,r
is the measured distance.It is
common that the next step of data analysis is to transform these values into a
point cloud which is a set of 3D points in the Cartesian coordinate system with
the robot in its center.However,we propose here a rather different and novel
approach in which we convert the measurements into a 2D image and then apply
fast and well-known algorithms used in image analysis.
The most straightforward way to transform data from the laser scanner is to
use (;) as pixel coordinates and assign the pixel color according to the mea-
sured distance,see Fig.1,left.However,since this approach does not lead to a
satisfactory method of representing geometrical properties of the environment we
714 Barbara Siemiątkowska et al.
propose to map three coordinates associated with normal vector for each pixel to
RGB values of an ordinary color image.For each pixel (j;k) we find its position
p in 3D Cartesian coordinates with the robot in its center.Then we analyze its
neighborhood so we can find a normalized vector n normal to surface at p.A
color RGB image is constructed by assigning values of the coordinates n
as colors red,green and blue respectively,see Fig.1,right.Straightforward map-
ping of angles ; as pixel position does not have to lead to construction of images
optimal for subsequent processing and good results are achieved when using,e.g.,
Albers equal-area conic projection (Snyder,1987).
Figure 1:Left:a sample gray-scale image representing distance from the robot in
(;) space.Here,for clarity,we show all distances greater than 4 meters as black.Note
that on the left side of the image there is a black area due to some error in scanner
readings (connected with strong scattering of the laser beam).Right:data for the
scene transformed in order to obtain normal vectors n for each pixel.Values of (x;y;z)
coordinates of n are presented as red,green and blue components of this 8-bit RGB
image (e.g.,n
= 1 gives red=0,n
= 1 gives blue=255,etc.).
4.1 Segmentation of the image
The purpose of this step is to perform a fast segmentation of the gathered data
into areas,each one representing a flat polygon in the real scene.In the first step
the flood fill algorithmwith floating range is run on an image representing distance
(cf.Fig.1) with pixel (0,0) as the seed point (the threshold corresponds to about
5 cm).If the resulting area is large enough it is marked with number 1 and then
the same procedure is applied for the next pixel which has not been assigned to
any area,the procedure is repeated until all the pixels are assigned to one of the n
areas.Then the areas are divided into smaller ones by running flood fill algorithm
on the RGB image constructed from normals.Each area i,1  i  n,is used as a
mask,so only pixels belonging to the area i are used as seed points and all pixels
from area outside i act as borders for filling.This step gives a list of m n areas
each representing a more or less flat surface of the real scene,see Fig.2,left.
In the final phase of the segmentation process the m areas are converted into
polygons in full three-dimensional space.Taking 3D point from all pixels which
form an edge of an area we obtain list of vertices defining a connected component
Towards Semantic Navigation System 715
– polygon – related to the area j.These polygons are used in later stage in rule-
based classification.Such classification is usually based on adjacency relations and
geometrical size.For example,a door is an abject which is adjacent to walls and
floor,it has „neutral” blue component  128,and its size lies in a certain range.
However,in order to classify more complex objects it is necessary to include more
features as input to the rule based system.
Figure 2:Left:areas detected after the second stage of the segmentation process applied
to data presented in Fig.1 Each area is marked with different,arbitrary chosen color.
The segmentation is done on an image which has been firstly projected using Albers conic
projection.Right:part of a 3D scene with areas labeled by the rule-based classification
system.Visible are:the door,the ceiling,the wall and unclassified areas.
4.2 Rapid object detection using Haar-like features
Treating laser scanner data as an image makes possible to directly apply well
known methods for object detection and pattern recognition from image analysis
field.Here we show how to enrich our classification system by using scheme based
on a boosted cascade of simple features to detect objects.Algorithms which are
applied here implement methods proposed by Viola and Jones (2001) for basic
set of Haar-like features and by Lienhart and Maydt (2002) for rotated features
which enhance the basic set.After the system is trained for recognition of certain
objects new images can be analyzed very fast while maintaining good hit rate
and reasonably low false alarm rate.This makes the method interesting and
practical for our purposes.Moreover,having direct geometrical information about
a detected object we can often reject false classifications by analysing its real size.
Images generated from laser data in the way proposed above have,of course,
different properties than usual visual images produced by ordinary cameras.For
example illumination and any lighting conditions are not of our interest here.
Either in bright light with many shadows or in completely dark room one gets the
same image.On the other hand,when the robot equipped with the scanner moves
on a floor,color components of the image change so there is some dependency
upon its position.This,however,does not have to lead to problems with object
detection by image analysis,since many of methods used for that purpose operate
on gray-scale images.
716 Barbara Siemiątkowska et al.
In the first stage of our classifier it is necessary to train it for objects of inter-
est.In order to perform training we generate a large set of positive and negative
examples.Positive examples are small images containing the object placed on an
arbitrary background,and negative examples are just randomly chosen images not
containing any “positive” object.
When the training process is completed,the classifier can be applied to any
region of an image giving true if the region is likely to contain pattern similar
to one of those from the positive samples set,false otherwise.Analysis is very
fast so one can try many different regions with varied sizes so entire image can be
searched for an object.Figure 3 shows result of searching with classifier trained
for detection of objects similar to a sink.
Areas corresponding to objects detected with the discussed method can be later
processed in our classification scheme when building semantic map.Naturally,each
object which is going to be recognized has to have its own specifically trained Haar-
like features classifier.In more sophisticated approach it is possible to combine
object recognition from both:visual images and images constructed from laser
Figure 3:The left panel shows 21 images – a small subset of set of positive samples
which are used to train classifier for a sink-like object.On the right result of classification
for test image is presented (the black rectangle represents area which has been recognized
by the classifier).Obviously the test image was excluded fromthe training process.Note,
that when using Haar-like features classifier we generate RGB image by taking absolute
values of components jn
5 Path-planning
Our approach for path planning combines the advantages of the diffusion method
Choset et al.(2005) (lack of local minima) and the potential field method Hwang
and Ahuja (1992);Latombe (1991) (the planned path does not go too close to
obstacles).It also allows us to indicate the goal for the robot using semantic
labels.For example it is possible to aske the vehicle to go to a nearest door or to
a specific table.The proposed algorithm is an extension of the diffusion method.
It is implemented using the Cellular Neural Network (CNN).
CNN consists of neurons (cells) which interact locally.Usually cells are ar-
ranged in a form of N  M array.The symbol c
denotes cell i;j,its state is
Towards Semantic Navigation System 717
described by x
(t),its input signals u
2 R,output signals y
(t) 2 R,a bias
I 2 R,a
2 R - interconnection weights,b
2 R - the feed-forward parameters.
Symbol N
denotes the neighborhood of the cell c
5.1 Application of CNN for collision free path planning
The algorithm for path planning consists of following parts:
 the N M grid-based map is created,
 the N M CNN is built,each cell of the map corresponds to the cell of CNN,
 the weights a
are computed,a
is proportional to the distance between
centers of gravity of areas which are represented by cells c
and c
 two neurons are distinguished in CNN:c
which represents the position of the
goal and c
- the position of the robot,
 input signals u
are computed,
 value f
is given by:f
= f
),where r is the radius
of the neighborhood,f
is the function of input signals,
 the state of the cell c
is described as follows:
(t +1) =

K f
if c
= c
(t) a
) f
;0) if c
6= c
 output signal y
of the cell c
is given by y
(t) = x
The process of diffusion is continued until stability,8
(t) =
(t + 1).If the cell c
represents the position of the robot then next po-
sition of the vehicle is indicated by the neuron c
2 N
for which x
(t) =
(t)g);where c
2 N
is fulfilled.
If r = 0 and f
= u
and u
= K (if c
indicates the area occupied by an
obstacle) or u
= 0 (if c
represents traversable part of the environment) then
the planned path usually goes too close to obstacles.If u
2 [0;K] then values of
additional parameters (for example the quality of the ground) have the influance
during path planning.
Usually the robot is represented as a point and all obstacles are extended in
order to take into account the dimension of the vehicles.This method is not
convenient when robots with different size plan their paths based on the same
grid-based map.In our approach,the dimension of the robot is taken into account
during path planning.The parameter r is r 
,where xx is the size of a
cell in grid-based map,2T 2T is the size of rectangular area which is occupied by
the robot.Feed-forward parameters are computed,b
indicates the influence of an
obstacle represented by the cell c
to the cell c
.It is assumed that:0  b
 1.
If b
= 0 then c
does not influence c
,if b
= 1 the influence is maximal.The
function f
can be described by formulae:
= maxfb
 u
 u
 u
g (2)
718 Barbara Siemiątkowska et al.
 u
In order to determine the optimal linear and angular velocities of the robot a
Dynamic Window method (Fox et al.,1997) is used.
Figure 4:Experimental results:a) the path traversed by the robot,b) linear velocities,
c) angular velocities
Fig.4a) presents the path traversed by the robot (red dots).The radius of
influence of the obstacles equals 70cm(7 cells).The servo tick of the control loop
equals 300ms.Fig.4b and fig.4c show the linear and angular velocities for selected
points on the path.The robot goes along the center of the corridor but it is able
to go through gaps.
5.2 Path planning based on dual map of the environment
In our system the dual grid-based and semantic map of an environment is built.
Each cell contains semantic information (a list of objects),and traversability level
of corresponding area (Gu et al.,2008).Figure 5 shows the map of environment
presented in Fig.2.White rectangles represent the area free from the obstacles
(a floor),red rectangles show parts of the environment which are not traversable,
gray cells indicate the positions of the door (light-gray) and the wall (dark-gray).
The goal for the robot is indicated using semantic labels,for example the door.
Semantic goal is transformed into metric one.All cells of the CNN are initialized:
(t) =

K f
if c
2 c
0 if c
=2 c
is the set of cells which correspond to the object described by the semantic
If the task for the robot is to move to the nearest object,then all cells which
correspond to the position of any door are considered as goals.
Towards Semantic Navigation System 719
Figure 5:The map of the environment
Based on semantic information values of input signals u
can be modified.It
is possible to take into account smoothness of the terrain or to force the robot to
avoid some specific area.
6 Conclusions
Our experience gained so far brings the hope that the semantic barrier can be
overcome even when crude data supplied by sensors are noisy and incomplete.
Reverting to colors as basic coordinates extracts surprisingly many objects from
the initial cloud of measured points.Further classification can be performed in
different ways.We demonstrate that cascade filters working on simple features of
the objects subject to classification work quite well.
We believe that dual maps will play increasingly significant role in future nav-
igation systems.Path planning on such maps is very efficient and a possibility
to reason automatically about the goals that the robot has to achieve seems to
be very important.Needless to say,understanding semantics of the surrounding
world will enable the robot to consult its plans with human operator in a natu-
ral or almost natural language.This will considerably simplify the human-robot
interface and will make it less prone to errors.
7 Acknowledgment
This work has been partially supported by the Polish Ministry of Science and
Higher Education (grant:4311/B/T02/2007/33).
O.Booij B.Krose and Z.Zivkovic (2007),Ageometrically constrained image similarity
measure for visual mapping,localization and navigation,in Proceedings of the European
Conference on Mobile Robots (ECMR07),pp.168–173.
BuildingSMART (2008),Industry Foundation Classes,Version 2X4,World Wide Web
electronic publication,URL
720 Barbara Siemiątkowska et al.
R.Chojecki and M.Olszewski (2009),A Mobile Robot for Laboratory Purposes,
Pomiary Automatyka Kontrola (PAK),3(55):190–193.
and S.Thrun (2005),Principles of Robot Motion:Theory,Algorithms,and Imple-
mentations (Intelligent Robotics and Autonomous Agents,MIT Press.
A.P.Dempster,N.M.Laird,and D.B.Rubin (1977),Maximum likelihood from
incomplete data via the EMalgorithm,Journal of The Royal Statistical Society,Series
C.Eastman (2008),BIM Handbook.A Guide to Building Information Modeling for
Owners,Managers,Designers,Engineers,and Contractors,Hoboken,NJ:Wiley.
D.Fox,W.Burgard,and S.Thrun (1997),The dynamic window approach to collision
avoidance,IEEE Robot.Autom.Mag.,4:23–33.
J.Gu,Q.Cao,and Y.Huang (2008),Rapid Traversability Assesment in 2.5Dgrid based
map on rough terrain,Internationl Journal of Advanced Robotics Systems,5(4):389–
A.Halme and M.Vainio (1998),Forestry Robotics —Why,What and When,in Auto-
nomic Robotic Systems,volume 236 of LNCS,pp.149–162,Springer.
A.Harati and R.Siegwart (2007),Orthogonal 3D-SLAM for Indoor Environments
Using Right Angle Corners,in Proceedings of the European Conference on Mobile
Robots (ECMR07),pp.144–149.
Y.K.Hwang and N.Ahuja (1992),A potential field approach to path planning,IEEE
Trans.Robotics and Aut.,pp.23–32.
iRobot (2009),Household Robots,World Wide Web electronic publication,URL
J.Latombe (1991),Robot Motion Planning (Kluwer International Series in Engineering
and Computer Science,Kluwer.
R.Lienhart and J.Maydt (2002),An Extended Set of Haar-like Features for Rapid
Object Detection,IEEE ICIP 2002,1:900–903.
O.M.Mozos,R.Triebel,P.Jensfelt,A.Rottmann,and W.Burgard (2007),
Supervised semantic labeling of places using information extracted from sensor data,
Robotics and Autonomous Systems,5(55):391–402.
M.C.Walton (2004),Snapshots for semantic maps,in SMC (3),pp.2853–2858.
PIAP (2009),Security Engineering,World Wide Web electronic publication,URL
J.P.Snyder (1987),Map Projections – A Working Manual.,United States Government
S.Thrun (2002),Robotic Mapping:A Survey,in Exploring Artificial Intelligence in the
New Millenium,Morgan Kaufmann.
S.Thrun,D.Fox,and W.Burgard (2005),Probabilistic robotics,MIT Press.
Sh.Vasudevan and R.Siegwart (2008),Bayesian space conceptualization and place
classification for semantic maps in mobile robotics,Robot.Auton.Syst.,56(6):522–537,
ISSN 0921-8890,doi:
P.Viola and M.J.Jones (2001),Rapid Object Detection using a Boosted Cascade of
Simple Features,IEEE CVPR.