Discovering Whereabouts through Localization and Mapping.


Administrative Information

Project Title: Discovering Whereabouts through Localization and Mapping

Principal Investigator:

Name: Daniel Asmar
Institution: American University of Beirut
Position: -
Address: -
E-mail: da20@aub.edu.lb
Telephone: 350000

Co-Workers:

Name: Darwish
Institution: American University of Beirut
E-mail: darwish@aub.edu.lb

Duration: This was a 2-year project that started on December 31st, 2007 and ran until July 31st, 2010 (given a 6-month extension).


Scientific Information

Objectives

The objective of the project is to develop a system for the 3D mapping of an indoor environment.



Achievements

Year 1:

In the original scope of our proposal, the intent was to use an omnidirectional panoramic camera (a camera looking up at a paraboloid mirror) capable of delivering 360° panoramic images at each pose. Lemaire et al. [5] propose such a system but do not mention that the required processing time is high. Indeed, once the system was implemented, it was observed that the cost of capturing the images, transferring them to the required format, and processing them amounted to approximately 15 seconds per image. This frame rate is too low for the following reason: visual servoing works by tracking visual features in consecutive frames and estimating the motion of the robot based on the displacement of the matched features between frames. If consecutive frames are too spread out in time, it is difficult to match features, given that they are not invariant under changes in viewpoint (rotation and scale) [12, 8] or under the non-affine transformations that are observed in panoramic images. We attempted to alleviate the issue of non-affine transformations by unwrapping each panoramic image using the procedure of Xiong et al. [15], but this added the issue of unwrapping noise, which exacerbated the problem of feature matching because of the low Signal to Noise Ratio (SNR).
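To make the matching step concrete, here is a minimal sketch of frame-to-frame feature matching of the kind visual servoing relies on, using OpenCV's SIFT and Lowe's ratio test; the 0.75 ratio threshold is an illustrative choice, not a value taken from the project.

```python
import cv2

def match_features(prev_gray, curr_gray, ratio=0.75):
    """Match SIFT features between consecutive frames (8-bit grayscale)."""
    sift = cv2.SIFT_create()
    kp1, des1 = sift.detectAndCompute(prev_gray, None)
    kp2, des2 = sift.detectAndCompute(curr_gray, None)
    matches = cv2.BFMatcher(cv2.NORM_L2).knnMatch(des1, des2, k=2)
    # Lowe ratio test: keep a match only if it is clearly better than
    # the second-best candidate.
    good = [m for m, n in matches if m.distance < ratio * n.distance]
    # The displacements of these pairs drive the robot motion estimate.
    return [(kp1[m.queryIdx].pt, kp2[m.trainIdx].pt) for m in good]
```

At roughly 15 seconds per panoramic frame, consecutive inputs to a routine like this differ too much in viewpoint for the ratio test to retain many matches, which is the failure mode described above.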

On another note, we observed that while omni-cameras generate panoramic information, the robot must still navigate to different locations if the 3D reconstruction system requires images of a scene from different viewpoints. Eye-in-hand methods were not included in last year's work but are addressed in the second part of this project.

Once it was decided to use a stereo rig instead of the omnicamera, we sought to develop a stereo algorithm capable of yielding smooth surfaces. Options included the works of Pons et al. [11], the variational stereo algorithm by Faugeras et al. [3], the multi-view reconstruction method of Bradley et al. [1], and the work of Wang et al. [14]. Of these systems, those of Faugeras et al. and of Wang et al. were implemented because they appeared the most promising for our system.

Faugeras et al. [3]: In this method, a surface is deformed at multiple resolutions, based on the two images obtained from the two cameras, by minimizing an energy function. Given the high computational demand of this method, it was implemented on a Graphics Processing Unit in [6] under the Cg programming language [9] and then under CUDA [10] once a stable release was available.
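The level-set surface evolution of [3] is too involved to sketch briefly, but the variational idea behind it can be illustrated on a simpler 2D disparity field: gradient descent on an energy combining a photoconsistency data term with a smoothness regularizer. This simplified analogue, and the lam/tau/iters values, are assumptions for illustration rather than the authors' formulation.

```python
import numpy as np
from scipy.ndimage import laplace, map_coordinates

def refine_disparity(I_L, I_R, d0, lam=0.1, tau=0.01, iters=200):
    """Descend E(d) = sum (I_L(x) - I_R(x - d))^2 + lam * |grad d|^2."""
    H, W = I_L.shape
    ys, xs = np.mgrid[0:H, 0:W].astype(np.float64)
    I_Rx = np.gradient(I_R, axis=1)            # horizontal intensity gradient
    d = d0.astype(np.float64).copy()
    for _ in range(iters):
        coords = np.stack([ys, xs - d])        # sample right image at x - d
        I_R_w = map_coordinates(I_R, coords, order=1, mode='nearest')
        I_Rx_w = map_coordinates(I_Rx, coords, order=1, mode='nearest')
        residual = I_L - I_R_w                 # photoconsistency error
        # Data-term gradient plus smoothness term (-2*lam*Laplacian of d).
        d -= tau * (2.0 * residual * I_Rx_w - 2.0 * lam * laplace(d))
    return d
```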

Wang et al. [14] start with a color segmentation of the two images using mean shift [2], with a segmentation window parameter that generates many small segments of uniform color. They then make an initial guess of the disparity map using any readily available fast stereo algorithm. Planes are then fit to the small segments using a voting technique. Matching and refinement of these planes is done per segment, and the optimization proceeds iteratively using a cooperative optimization approach in which each segment finds its best solution while, at the same time, compromising that solution based on the solutions obtained by its neighboring segments.
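A rough sketch of the first two stages (over-segmentation and per-segment plane fitting), assuming OpenCV's mean-shift filtering and a plain least-squares fit in place of the paper's voting technique; labeling segments by their filtered color is a simplification of a proper connected-component labeling.

```python
import cv2
import numpy as np

def fit_segment_planes(img_bgr, disparity, sp=5, sr=10):
    """Over-segment by mean shift, then fit d = a*x + b*y + c per segment."""
    shifted = cv2.pyrMeanShiftFiltering(img_bgr, sp, sr)  # small windows -> many segments
    # Crude labeling: one label per unique filtered color.
    _, labels = np.unique(shifted.reshape(-1, 3), axis=0, return_inverse=True)
    labels = labels.reshape(disparity.shape)
    ys, xs = np.mgrid[0:disparity.shape[0], 0:disparity.shape[1]]
    planes = {}
    for s in np.unique(labels):
        m = labels == s
        if m.sum() < 10:                       # skip tiny segments
            continue
        A = np.column_stack([xs[m], ys[m], np.ones(m.sum())])
        planes[s], *_ = np.linalg.lstsq(A, disparity[m], rcond=None)
    return labels, planes                      # plane (a, b, c) per segment
```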

The 3D maps extracted using the above techniques were lacking, primarily because they were built in a batched approach [7] using several depth maps extracted from different poses. Alternatively, one can resort to acquiring a multitude of images of a scene at one instance and reconstructing the 3D scene in one shot, as proposed in [4, 1, 13]. Of these systems, the work of Furukawa et al. [4] was adopted, primarily because it did not require objects to be pre-segmented, while the other two did. Also, the evaluation done in [13] shows that the method of Furukawa et al. gave the most complete models when the fewest number of images was used (16 images from the dataset). Finally, their algorithm seemed readily parallelizable, which made an implementation with a reasonable running time possible.

Year 2:

During the first quarter of the second year we continued developing the 3D reconstruction algorithm of Furukawa, but unfortunately the model was lacking and we had to drive the research in a different direction. The approach now relied on using a sparse set of image features extracted from a stereo camera to build the 3D map. Stereo vision algorithms tend to produce less than desirable results in non-textured environments, and considering that most walls in indoor places consist of one color, it is worthwhile to investigate alternative methods that differ in approach from traditional dense stereo matching algorithms. This approach detects the ground and ceiling using a sparse set of image features, with their 3D coordinates calculated via triangulation, and a segmentation algorithm. The feature type used is the Scale Invariant Feature Transform (SIFT), and the image segmentation type is the graph-cut algorithm, but these can be substituted by any other robust feature and segmentation algorithms without affecting the technique described here. After the ground is detected, line segments are fit to its boundaries incrementally. Then, at the extremities of these line segments, an edge detection test is performed to detect whether a vertical line exists there. If it does, a vertical line is formed perpendicular to the plane until it reaches the ceiling. These newly formed vertical lines form the sides of walls. The detected features are also used as input to a feature-based Extended Kalman Filter (EKF) Simultaneous Localization And Mapping (SLAM) algorithm. The corrected path output from SLAM is then used to merge the 3D models calculated at different pit stops.
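A minimal sketch of the triangulation-and-ground-test step under stated assumptions: a rectified stereo rig, pixel coordinates already centered on the principal point, and a camera held parallel to the floor (as the report assumes elsewhere). The tolerance and all names are illustrative.

```python
import numpy as np

def triangulate_ground(pts_left, disp, f, B, cam_height, tol=0.05):
    """Back-project matched features and flag those near the ground plane.

    pts_left   : (N, 2) centered pixel coordinates (x, y), y pointing down
    disp       : (N,) horizontal disparities of the stereo matches
    f, B       : focal length in pixels, baseline in meters
    cam_height : camera height above the floor in meters
    """
    Z = f * B / disp                        # depth from disparity
    X = pts_left[:, 0] * Z / f
    Y = pts_left[:, 1] * Z / f              # positive Y points at the floor
    pts3d = np.column_stack([X, Y, Z])
    # A floor point lies cam_height below the camera; loosen with depth.
    on_ground = np.abs(Y - cam_height) < tol * Z
    return pts3d, on_ground
```

Line segments can then be fit incrementally to the boundary of the region these ground points delimit, as described above.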

When the robot's camera is pointed at a window, a common situation in indoor environments, the pixels inside the window and around it are saturated to a completely white color, a common phenomenon in photography called overexposure. While attempting to extract features from images and match them, overexposed pixels proved quite problematic: no details remain in that area of the image from which to extract any features. In addition, the boundaries of that area show a synthetic contrast, a product only of the inability of the camera to capture the whole dynamic range of the scene intensities, and this causes false features which are not stable. To solve this issue, a photographic technique known as High Dynamic Range (HDR) imaging was used. At periodic intervals of time, the robot is stopped, pictures are taken at multiple exposures, and the pictures are combined using Iterative Closest Point (ICP). This technique, although widely used in photography and computer graphics, has rarely been used in Computer Vision. Adding the HDR technique to the robot's image acquisition process provided images of consistent intensities in revisited areas of the environment.
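The report does not spell out how the exposures are combined; as a stand-in, here is a minimal sketch using OpenCV's Mertens exposure fusion, one standard way to merge a bracketed stack without knowing exposure times. It is not necessarily the project's method, and the capture() helper and bracket values are hypothetical.

```python
import cv2
import numpy as np

def merge_exposures(images_bgr):
    """Fuse a bracketed exposure stack into one well-exposed 8-bit image."""
    fused = cv2.createMergeMertens().process(images_bgr)  # float32 in [0, 1]
    return np.clip(fused * 255, 0, 255).astype(np.uint8)

# Usage sketch: stop the robot, bracket exposures, fuse.
# imgs = [capture(exposure_ms=e) for e in (5, 20, 80)]    # capture() is hypothetical
# stable_frame = merge_exposures(imgs)
```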


Bibliography:

[1] D. Bradley, T. Boubekeur, and W. Heidrich. Accurate multi-view reconstruction using robust binocular stereo and surface meshing. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 1-8, June 2008.

[2] D. Comaniciu and P. Meer. Mean shift: A robust approach toward feature space analysis. IEEE Transactions on Pattern Analysis and Machine Intelligence, 24:603-619, 2002.

[3] O. Faugeras and R. Keriven. Variational principles, surface evolution, PDE's, level set methods, and the stereo problem. IEEE Transactions on Image Processing, 7:336-344, 1998.

[4] Y. Furukawa and J. Ponce. Accurate, dense, and robust multi-view stereopsis. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 1-8, 2007.

[5] T. Lemaire and S. Lacroix. SLAM with panoramic vision. Journal of Field Robotics, 24(1-2):91-111, 2007.

[6] J. Mairal, R. Keriven, and A. Chariot. A GPU implementation of variational stereo, 2005. Available at http://certis.enpc.fr/publications/papers/05certis13.pdf.

[7] P. Merrell, A. Akbarzadeh, Liang Wang, P. Mordohai, J.-M. Frahm, Ruigang Yang, D. Nister, and M. Pollefeys. Real-time visibility-based fusion of depth maps. In IEEE International Conference on Computer Vision (ICCV), pages 1-8, Oct. 2007.

[8] K. Mikolajczyk and C. Schmid. A performance evaluation of local descriptors. IEEE Transactions on Pattern Analysis and Machine Intelligence, 27(10):1615-1630, October 2005.

[9] Nvidia. Cg (C for Graphics). http://developer.nvidia.com/page/cg_main.html.

[10] Nvidia. CUDA (Compute Unified Device Architecture). http://www.nvidia.com/cuda.

[11] J. Pons, R. Keriven, and O. Faugeras. Multi-view stereo reconstruction and scene flow estimation with a global image-based matching score. International Journal of Computer Vision, 72(2):179-193, 2007.

[12] C. Schmid, R. Mohr, and C. Bauckhage. Evaluation of interest point detectors. International Journal of Computer Vision, 37(2):151-172, 2000.

[13] S. M. Seitz, B. Curless, J. Diebel, D. Scharstein, and R. Szeliski. Middlebury multi-view stereo dataset and evaluation page. http://vision.middlebury.edu/mview/.

[14] Zeng-Fu Wang and Zhi-Gang Zheng. A region based stereo matching algorithm using cooperative optimization. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 1-8, June 2008.

[15] Z. Xiong, M. Zhang, Y. Wang, T. Li, and S. Li. Fast panorama unrolling of catadioptric omni-directional images for cooperative robot vision system. In 11th International Conference on Computer Supported Cooperative Work in Design (CSCWD), pages 1100-1104, April 2007.





Perspectives


Given that the quality of our depth sensor is tightly linked to the quality of the original segmentation, it is anticipated that any advances in segmentation techniques coming out of the computer vision community will have a direct impact on the efficiency and future success of this procedure. We are in the process of studying the efficiency of statistical segmentation methods such as the Statistical Region Merging (SRM) of Nock and Nielsen [16]. The choice of segmentation system will always be a compromise between efficiency and required processing time.

The depth sensor that is developed here is designed to operate in an indoor structured environment, where the ground is flat, walls are perpendicular to the ground, and the camera orientation remains parallel to the ground. Nevertheless, we can see how this system could be extended to a more elaborate scene that is piecewise flat, such as an urban setting. Here, the system would have to be able to segment and group patches at different heights. We are currently investigating how such a system could be implemented.

Current limitations of the system also include the presence of obstacles such as furniture, as these are not taken into account in this work. However, the idea that we present for detecting planes can be extended in future work to include all the planes that are present in the scene regardless of their height. These planes would then be incorporated into a 3D instead of a 2D grid map in order to allow the system to segment both vertical and horizontal planes. The granularity of the 3D grid would also allow for arbitrarily oriented planes; arbitrarily shaped objects can then be modeled with local planes, akin to the polygonal models that are used in 3D games and virtual reality systems.
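As a rough illustration of the proposed extension, here is a minimal 3D occupancy grid that registers arbitrary 3D points as occupied voxels; the grid dimensions and resolution are illustrative assumptions, not parameters from the project.

```python
import numpy as np

class VoxelGrid:
    """Minimal 3D occupancy grid: planes at any height become sets of
    occupied voxels, which a 2D ground-plane grid cannot represent."""

    def __init__(self, shape=(100, 100, 30), res=0.1):
        self.occ = np.zeros(shape, dtype=bool)
        self.res = res                        # voxel edge length in meters

    def mark(self, points):
        """Mark the voxels containing the given (N, 3) points as occupied."""
        idx = np.floor(points / self.res).astype(int)
        inside = np.all((idx >= 0) & (idx < self.occ.shape), axis=1)
        self.occ[tuple(idx[inside].T)] = True
```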


[16] R. Nock and F. Nielsen. Statistical region merging. Pattern Analysis and Machine Intelligence (PAMI), IEEE Transactions on, 26(11), November 2004.





Publications & Communications

[1] Daniel Asmar and Samir Shaker. Real-Time Occupancy-Grid SLAM of Structured Indoor Environments using a Single Camera. IEEE Transactions on Robotics. (Submitted)

[2] Samir Shaker, Daniel Asmar, and Imad Elhajj. 3D Reconstruction of Indoor Scenes by Casting Visual Rays in an Occupancy Grid. International Conference on Robotics and Biomimetics (ROBIO), December 14-18, Tianjin, China.



Abstract


Three-dimensional maps are useful for many applications, from the gaming industry, to augmented reality, to the development of tour guides of important landmarks such as museums or university campuses. The generation of such maps is very labor intensive and has therefore justified its automation using robots with range sensors such as lasers or cameras. This work presents an automated 3D reconstruction system for indoor environments, which relies on a vision-based occupancy-grid SLAM (Simultaneous Localization and Mapping) to detect the ground. The novelty in our work is the method in which 3D information is extracted and fed to SLAM. The ground plane is determined based on the orientation of segmented regions, and virtual rays are cast into the field of view from the camera center to the intersection of each ray's 2D projection with the ground boundaries. Dense depth information can then be inferred from these rays and input to SLAM. Our system produces high-quality maps and reduces the high computational cost of dense stereo matching by processing only a sparse set of highly reliable salient features. Experiments were conducted in a lab setting, and the results demonstrate the success of the system.
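The ray-casting update described in this abstract can be sketched in a few lines; the log-odds increments and the simple linear stepping (rather than a Bresenham traversal) are illustrative assumptions, not details reported by the project.

```python
import numpy as np

def cast_ray(grid, start, end):
    """Update a 2D occupancy grid along one virtual ray.

    grid  : 2D array of log-odds occupancy values, modified in place
    start : (row, col) grid cell under the camera center
    end   : (row, col) cell where the ray's 2D projection meets the
            detected ground boundary
    """
    L_FREE, L_OCC = -0.4, 0.9                 # illustrative log-odds steps
    (r0, c0), (r1, c1) = start, end
    n = int(max(abs(r1 - r0), abs(c1 - c0), 1))
    for i in range(n):                        # cells crossed by the ray
        t = i / n
        r = int(round(r0 + t * (r1 - r0)))
        c = int(round(c0 + t * (c1 - c0)))
        grid[r, c] += L_FREE                  # traversed space is free
    grid[r1, c1] += L_OCC                     # boundary cell is occupied
    return grid
```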