Stereo Matching Algorithm with Refinement Stage


Oct 18, 2013









Michael Nachaat

maykelnawar@hotmail.com

Mahmoud Nabil

mah.nabil@yahoo.com

Miriam Azmy

miriam_azmy@hotmail.com



Abstract

Stereo matching is one of the most active research areas in computer vision. It has provoked a great deal of research into computer vision systems with two inputs that exploit the knowledge of their own relative geometry to derive depth information from the two views they receive. Depth information can be used to track moving objects in 3D space, gather distance information for scene features, or construct a 3D spatial model of a scene.

In this paper, we present a local algorithm for binocular stereo matching. The algorithm uses pixel-based matching in the cost computation, a fixed window in the cost aggregation, and trivial assignment in the disparity computation. We then present an algorithm for refining the disparities obtained by the first algorithm (a sub-pixel refinement algorithm), and we compare the results of the stereo matching algorithm with and without the refinement algorithm.

In order to establish a software implementation and a collection of data sets to show the results of the algorithms, we have found a flexible Matlab implementation that enables the analysis of the two algorithms. We have also found multi-frame stereo data sets with ground truth, and we are making both the code and the data sets available on the CD.

Finally, we compare the algorithm we have used with other algorithms used for stereo matching.



1. Introduction

The binocular (two-eyed) human vision system captures two different views of a scene. The human brain processes each view and matches similarities. Most of the information captured in one view is congruent with the information captured in the other; however, some information is not. These differences allow the human brain to build depth information.

In binocular stereo matching, if two calibrated cameras observe the same scene point P (refer to figure 1), its 3D coordinates can be computed as the intersection of the two viewing rays through its projections in the two images. This is the basic principle of stereo vision, which typically consists of three steps:

• Camera calibration.

• Establishing point correspondences between pairs of points from the left and the right images.

• Reconstruction of the 3D coordinates of the points in the scene.


Figure 1 - Stereo Camera

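If the two cameras are calibrated so that they are perfectly aligned and have the same focal length, then the depth can be easily calculated as shown in the equations below (refer to figure 2).

Let $O_R$ and $O_T$ be the optical centres of the reference and target cameras, $b$ the baseline between them, $f$ the focal length, $Z$ the depth of the scene point $P$, and $x_R$ and $x_T$ the horizontal coordinates of its projections $p$ and $p'$. By considering similar triangles ($P O_R O_T$ and $P p p'$):

$$\frac{b}{Z} = \frac{(b + x_T) - x_R}{Z - f}$$

Let $d = x_R - x_T$ be the disparity, and note that $b$ and $f$ are fixed for the pair of cameras. Then

$$Z = \frac{b \cdot f}{d}$$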

Figure 2 - Aligned Stereo Cameras with the Same Focal Length

Since the disparity is inversely proportional to the depth of the point (maximum disparity ≡ minimum depth), a near point has a high disparity and a far point has a low disparity (refer to figure 3).


Figure 3 - Disparity and Depth Relationship
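The inverse relationship between disparity and depth can be checked numerically. The following is a minimal sketch in base Matlab; the baseline, focal length and disparity values are purely illustrative assumptions and do not come from the data sets used in this paper.

```matlab
% Hypothetical stereo rig: all values below are assumed for illustration.
b = 0.12;            % baseline between the two cameras, in metres
f = 700;             % focal length, in pixels
d = [70 35 7];       % disparities of three scene points, in pixels

% Depth of each point: inversely proportional to its disparity.
Z = (b * f) ./ d;    % depth in metres, one value per disparity
```

Halving the disparity (from 70 to 35 pixels) doubles the depth, and the smallest disparity gives the farthest point.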


As figure 3 shows, the depth of a pixel in a reference image (left image) can be determined by knowing its disparity from its corresponding pixel in a target image (right image). In the next section we introduce the methodology for calculating the disparity of a given pixel in the reference image from its corresponding pixel in the target image.

2. Methodology

In this section, we describe the algorithm we used for stereo matching.

Basically, there exist two different (not mutually exclusive) strategies:

• Local algorithms: in order to increase the SNR (reduce ambiguity), the matching costs are aggregated over a support window, as discussed later.

• Global (and semi-global²) approaches: many algorithms search for the disparity assignment that minimizes a certain cost function over the whole³ stereo pair.



We used a local algorithm, which consists of four main steps:

1. Matching cost computation.

2. Cost aggregation.

3. Disparity computation.

4. Disparity refinement (optional).


2.1. Matching cost computation

First we have to introduce the correspondence problem, which tries to figure out which parts of an image correspond to which parts of another image, as shown in figure 4.


Figure 4 - The Correspondence Problem


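We used the absolute-differences pixel-based matching cost, where the matching energy function is equal to the absolute difference between the pixel in the reference image and the pixel in the target image, as shown in figure 5:

$$e(x, y, d) = \left| I_R(x, y) - I_T(x + d, y) \right| \qquad \text{Equation (1)}$$

where $I_R$ is the reference image, $I_T$ is the target image, and $d$ is the candidate disparity.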



Figure 5 - Pixel-Based Matching Cost

By applying Equation (1) using different disparities from 1 to $d_{max}$ (the maximum disparity), we obtain what is known as the disparity space image (DSI), as shown in figure 6.

² Subset of the stereo pair.

³ Subset of the stereo pair.



Figure 6 - DSI

The DSI is a 3D matrix in which each element $DSI(x, y, d)$ represents the cost of the correspondence between the reference-image pixel $I_R(x, y)$ and the target-image pixel $I_T(x + d, y)$.

The DSI can be calculated in Matlab as shown in the code sample of figure 7.
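To make the structure of the DSI concrete, the loop of figure 7 can be traced on a toy scanline. This is only a sketch under stated assumptions: the two 1-by-6 rows are hypothetical, base `abs` stands in for the toolbox function `imabsdiff`, and unmatched entries are initialised to `Inf`, which is our choice for the illustration.

```matlab
% Hypothetical scanlines: the right row is the left row shifted 2 columns to the left.
imgleft  = [0 0 9 5 7 0];
imgright = [9 5 7 0 0 0];

D = 4;                              % number of disparity planes (shifts 0..3)
widthL = size(imgleft, 2);
pcost = inf(1, widthL, D);          % entries with no valid match stay at Inf

for Dc = 1 : D
    maxL = widthL + 1 - Dc;         % last right-image column that still has a match
    % Plane Dc compares left column x with right column x - (Dc - 1).
    pcost(:, Dc : widthL, Dc) = abs(imgright(:, 1 : maxL) - imgleft(:, Dc : widthL));
end

% Cost of every disparity plane for the left pixel in column 4.
costs = squeeze(pcost(1, 4, :))';
[bestCost, bestPlane] = min(costs);
```

For the left pixel in column 4, the cost is lowest in plane 3, that is, at a shift of two columns, which is how the toy right row was built.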




An area-based matching cost can also be used to calculate the matching cost. In this case the energy function is accumulated over S, the area covered around the pixel.


Figure 8 - Area-Based Matching Cost
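As an illustration only, two widely used area-based costs of this kind are the sum of absolute differences (SAD) and the sum of squared differences (SSD). The notation below is an assumption: $I_R$ and $I_T$ denote the reference and target images, $d$ the candidate disparity, and $S(x, y)$ the area covered around the pixel $(x, y)$.

```latex
\begin{align}
E_{SAD}(x, y, d) &= \sum_{(i, j) \in S(x, y)} \left| I_R(i, j) - I_T(i + d, j) \right| \\
E_{SSD}(x, y, d) &= \sum_{(i, j) \in S(x, y)} \left( I_R(i, j) - I_T(i + d, j) \right)^2
\end{align}
```

When $S(x, y)$ contains only the pixel $(x, y)$ itself, the SAD cost reduces to the pixel-based absolute difference.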

2.2. Cost aggregation

In order to increase the SNR (reduce ambiguity), the matching costs are aggregated over a support window.

In the proposed algorithm we aggregate the matching costs of the DSI horizontally and then vertically, using the simplest cost aggregation strategy, the Fixed Window (FW), as shown in figure 9. The Matlab code for this step is shown in figure 10.


Figure 9 - Fixed Window Cost Aggregation
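A fixed-window sum can be computed incrementally with an integral image (summed-area table), so that the cost of a window does not depend on its size. The sketch below uses base Matlab only; the 4-by-4 cost slice `A`, the window bounds and the variable names are hypothetical and are not taken from the implementation described in this paper.

```matlab
% Hypothetical 4x4 slice of matching costs (one disparity plane).
A = magic(4);

% Integral image, padded with a leading row and column of zeros
% so that windows touching the image border need no special case.
I = zeros(size(A) + 1);
I(2:end, 2:end) = cumsum(cumsum(A, 1), 2);

% Window covering rows r1..r2 and columns c1..c2 (arbitrary choice).
r1 = 2; r2 = 3; c1 = 2; c2 = 4;

% Four look-ups replace the sum over the whole window.
boxSum = I(r2 + 1, c2 + 1) - I(r1, c2 + 1) - I(r2 + 1, c1) + I(r1, c1);

% Direct summation, for comparison only.
direct = sum(sum(A(r1:r2, c1:c2)));
```

Both `boxSum` and `direct` give the same aggregated cost, but the integral-image version needs four look-ups regardless of the window size.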


2.2.1. Pitfalls in Fixed Window cost aggregation

FW fails in most points for the following reasons:

1. It implicitly assumes fronto-parallel surfaces.

Figures 11 and 12 show that the assumption behind fixed window cost aggregation is violated when dealing with curved and slanted surfaces.




Figure 11 - Fixed Windows with Curves

Figure 12 - Fixed Windows with Slanted Surfaces

2. It ignores depth discontinuities.

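    % Calculate pixel cost: plane Dc of pcost holds the absolute difference
    % between the left image and the right image shifted by Dc - 1 columns.
    for Dc = 1 : D
        maxL = widthL + 1 - Dc;   % last right-image column that still has a match
        pcost(:, Dc : widthL, Dc) = imabsdiff(imgright(:, 1 : maxL), imgleft(:, Dc : widthL));
    end

Figure 7 - Matlab Code Sample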

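    % Corner kernel: applied to an integral image, it differences four corners to give a box sum
    h = zeros(WS, WS, 'double');
    h(1,1) = 1; h(1,WS) = -1; h(WS,1) = -1; h(WS,WS) = 1;

    % Calculate integral cost (running sum down the rows, then along the columns)
    icost = single(pcost);
    icost = cumsum(cumsum(icost), 2);

    % Calculate window cost
    wcost = imfilter(icost, h, 'same', 'symmetric');

Figure 10 - Matlab Code Sample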



The implicit assumption of a fronto-parallel surface in the real scene is violated near depth discontinuities.

Figure 13 - Fixed Windows with Discontinuous Areas

Aggregating the matching costs of two populations at different depths (aligned foreground and misaligned background, i.e. outliers) results in the typical inaccurate localization of depth borders.

Figure 14 - Effects of Discontinuous Areas

3. It does not deal explicitly with ambiguous regions (uniform areas).

If the fixed window is smaller than the ambiguous region or uniform area, the fixed window approach will not be able to determine the real disparity of the pixel, and the algorithm gives bad results.

Figure 15 - Fixed Windows with Big Ambiguous Regions

4. It does not deal explicitly with repetitive patterns.

If the fixed window is smaller than the repetitive pattern, the fixed window approach will not be able to determine the real disparity of the pixel, and the algorithm gives bad results.

Figure 16 - Effects of Repetitive Patterns


2.2.2. Advantages of the FW algorithm

• Easy to implement.

• Fast, thanks to incremental calculation schemes.

• Runs in real time on standard processors (SIMD).

• Has limited memory requirements.

• Hardware implementations (FPGA) run in real time with limited power consumption (<1W).

Other approaches used in cost aggregation:

• Shiftable windows

• Multiple windows

• Variable windows

• Segmentation

2.3. Disparity computation

This step aims at finding the best disparity assignment (e.g. the best path/surface within the DSI) that minimizes a cost function over the whole stereo pair.

As mentioned above, the differences between the two images give depth information. The key step to obtaining accurate depth information is therefore finding a detailed and accurate disparity map. Disparity maps can be visualized in grayscale: close objects result in a large disparity value, which is translated into light grayscale values, while objects further away appear darker.

In our algorithm we used the Matlab function

    [DisparityCost, Disparitymap] = min(DSI, [], 3);

where DisparityCost is a 2D matrix that holds the cost of the best disparity assignment for each pixel, and Disparitymap is a 2D matrix that contains the indices of those disparity assignments.

For example, if the best disparity assignment for the point (x, y) lies at a large disparity in the DSI (i.e. the object is close to the camera), then a large index is returned in the disparity map, which means a light grayscale value, and vice versa.
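The behaviour of `min` along the third dimension can be seen on a very small example. The 2-by-2-by-3 cost volume below is hypothetical and was chosen only so that the result is easy to check by hand.

```matlab
% Hypothetical cost volume: 2x2 pixels, 3 disparity planes.
DSI = cat(3, [5 1; 2 9], ...    % costs in plane 1
             [3 4; 0 9], ...    % costs in plane 2
             [7 0; 6 1]);       % costs in plane 3

% For each pixel, keep the lowest cost and the plane in which it occurs.
[DisparityCost, Disparitymap] = min(DSI, [], 3);
```

Each entry of `Disparitymap` is the index of the plane with the lowest cost for that pixel, so the map can be displayed directly as a grayscale image after scaling.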

In general, however, the energy function in global stereo matching algorithms has two terms:

$$E(d) = E_{data}(d) + E_{smooth}(d)$$

• The data term $E_{data}$ measures how well the assignment fits the stereo pair (in terms of overall matching cost). Several approaches rely on simple pixel-based cost functions, but effective support aggregation strategies have also been successfully adopted.

• The smoothness/regularization term $E_{smooth}$ explicitly enforces piecewise assumptions (continuity) about the scene. This term penalizes disparity variations, and large variations are allowed only at (unknown) depth borders. The plausibility of a depth border is often related to edges.

Finding the best assignment that minimizes this energy function is an NP-hard problem.

Relevant approaches are:

• Graph Cuts

• Belief Propagation

• Cooperative optimization

2.4. Disparity refinement

Most stereo correspondence algorithms compute a set of disparity estimates in some discretized space. For applications such as robot navigation or people tracking, these may be perfectly adequate. However, for image-based rendering, such quantized maps lead to very unappealing view synthesis results (the scene appears to be made up of many thin shearing layers). To remedy this situation, sub-pixel disparity estimates can be computed in a variety of ways, including iterative gradient descent and fitting a curve to the matching costs at discrete disparity levels. This provides an easy way to increase the resolution of a stereo algorithm with little additional computation. However, to work well, the intensities being matched must vary smoothly, and the regions over which these estimates are computed must be on the same (correct) surface.

We used sub-pixel interpolation, where the sub-pixel disparity is obtained by interpolating three matching costs with a second-degree function, as shown in figure 17.

Figure 17 - Sub-Pixel Interpolation

This method is computationally inexpensive and reasonably accurate.
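One common way to realise such a second-degree interpolation is to fit a parabola through the cost at the winning disparity and the costs of its two neighbours, and to take the position of its vertex. The sketch below assumes this parabola fit; the cost values and variable names are hypothetical, and the guard for a flat cost curve is our own addition.

```matlab
% Hypothetical costs around the winning integer disparity d.
d     = 5;     % integer disparity chosen by the minimum search
cPrev = 4;     % aggregated cost at d - 1
cBest = 1;     % aggregated cost at d (the minimum)
cNext = 2;     % aggregated cost at d + 1

% Vertex of the parabola through (d-1, cPrev), (d, cBest), (d+1, cNext).
denom = cPrev - 2 * cBest + cNext;
if denom > 0
    % The offset lies in [-0.5, 0.5] whenever cBest is the smallest of the three costs.
    offset = (cPrev - cNext) / (2 * denom);
else
    offset = 0;    % flat or degenerate cost curve: keep the integer disparity
end
dSub = d + offset;
```

Here the cost at `d + 1` is lower than the cost at `d - 1`, so the refined disparity moves from 5 towards 6.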



2.5. Other methods

Not all binocular stereo correspondence algorithms can be described in terms of our basic local algorithm. Here we briefly mention some additional algorithms that are not covered by our paper.

A uni-valued representation of the disparity map is not essential. Multi-valued representations, which can represent several depth values along each line of sight, have been extensively studied recently, especially for large multi-view data sets. Another way to represent a scene with more complexity is to use multiple layers, each of which can be represented by a plane plus residual parallax. Finally, deformable surfaces of various kinds have also been used to perform 3D shape reconstruction from multiple images.

3. Experimental Results

In this section, we describe the experiments used to evaluate the stereo algorithms. Using the implementation framework we have found, we examine the two main algorithm components identified in the abstract.

We use the Teddy, Tsukuba, Cones, and Venus data sets in all experiments and report results on subsets of these images. The complete set of results (all experiments run on all data sets) is available on the CD.

3.1. Resultant Stereo Match

The image used is Tsukuba (figure 18).

Figure 18 - Tsukuba Image

The resultant disparity map without sub-pixel interpolation is shown in figure 19.

Figure 19 - Tsukuba Disparity Map without Sub-Pixel Interpolation

The resultant disparity map with sub-pixel interpolation is shown in figure 20, where the scene is smoother.

Figure 20 - Tsukuba Disparity Map with Sub-Pixel Interpolation

The matching cost of Tsukuba is shown in figure 21, where the light pixels have a higher cost than the dark ones.

Figure 21 - Tsukuba Matching Cost

The pixel cost histogram and the window cost histogram of one pixel of the Tsukuba image (e.g. x=100 and y=100) are shown in figure 22.

Figure 22 - Cost of Pixel x=100 and y=100

It is clear that aggregating the cost over a support window reduces the ambiguity of the disparity assignment.

4. Conclusion

In this paper we have introduced the methodology of binocular stereo matching and a local algorithm used in stereo matching. We have also introduced other methods that could be used in stereo matching, such as:

• Area-based algorithms in the matching cost stage.

• Shiftable windows, multiple windows, variable window sizes, and segmentation in the cost aggregation stage.

• Graph cuts, belief propagation, and cooperative optimization in the disparity computation stage.

• Iterative gradient descent in the disparity refinement stage.

We have shown the role of stereo matching in the formation of a complete 3D scene using the algorithm explained.

Although the algorithm we have used does not provide the best accuracy, it is widely used because of its speed. Applications in which this algorithm is used include:

1) 3D tracking

   • People counting (building, bus, train)

   • Safety

   • Surveillance and security

2) 3D graffiti detection

3) 3D scanning

4) Space-time stereo

5) 3D motion detection

Finally, stereo matching is one of the most active research areas in computer vision because of its importance in real-time applications, and the biggest challenge in this area of research is to design an algorithm that finds the best match in the minimum time.
