Stereo Matching Algorithm with Refinement Stage
Michael Nachaat
maykelnawar@hotmail.com
Mahmoud Nabil
mah.nabil@yahoo.com
Miriam Azmy
miriam_azmy@hotmail.com
Abstract
Stereo matching is one of the most active research areas in
computer vision. It has provoked a great deal of research into
computer vision systems with two inputs that exploit the
knowledge of their own relative geometry to derive depth
information from the two views they receive. Depth information
can be used to track moving objects in 3D space, gather distance
information for scene features, or construct a 3D spatial model
of a scene.
In this paper, we present a local algorithm for binocular stereo
matching. The algorithm uses pixel-based matching in the cost
computation, a fixed window in the cost aggregation, and trivial
assignment in the disparity computation. We then present a
sub-pixel refinement algorithm that refines the disparities
obtained by the first algorithm, and we compare the results of
the stereo matching algorithm with and without the refinement
algorithm.
In order to establish a software implementation and a collection
of data sets to show the results of the algorithms, we have found
a flexible Matlab implementation that enables the analysis of the
two algorithms. We have also found multi-frame stereo data sets
with ground truth, and we are making both the code and the data
sets available on the CD.
Finally, we compare the algorithm we have used with other
algorithms used for stereo matching.
1. Introduction
The binocular (two-eyed) human vision system captures two
different views of a scene. The human brain processes each view
and matches similarities. Most of the information captured in one
view is congruent with the information captured in the other;
however, some information is not. The differences allow the human
brain to build depth information.
In binocular stereo matching, if two calibrated cameras observe
the same scene point p (refer to figure 1), its 3D coordinates
can be computed as the intersection of the two viewing rays, one
from each camera. This is the basic principle of stereo vision,
which typically consists of three steps:
• Camera calibration.
• Establishing point correspondences between pairs of points from
the left and the right images.
• Reconstruction of the 3D coordinates of the points in the scene.
Figure 1 Stereo Camera
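If the two cameras are calibrated such that they are perfectly
aligned and have the same focal length, then the depth can be
easily calculated as shown in the equations below (refer to
figure 2).
By considering the similar triangles $P O_R O_T$ and $P p p'$:
\[ \frac{b}{Z} = \frac{(b + x_T) - x_R}{Z - f} \]
where $x_R$ and $x_T$ are the horizontal image coordinates of $p$
and $p'$, $f$ is the focal length, and $Z$ is the depth of $P$.
Let $d = x_R - x_T$ be the disparity and $b$ the baseline for the
pair of cameras. Then
\[ Z = \frac{b \cdot f}{d} \]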
Figure 2 Aligned Stereo Cameras with the same focal length
Since the disparity is inversely proportional to the depth of the
point (maximum disparity ≡ minimum depth), a near point has a
high disparity and a far point has a low disparity (refer to
figure 3).
Figure 3 Disparity and Depth Relationship
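The inverse relationship between disparity and depth can be checked numerically. The short Matlab sketch below assumes the standard relation for aligned cameras, Z = b·f/d (b the baseline, f the focal length, d the disparity). The rig parameters and disparities are hypothetical values chosen only for illustration; they are not taken from our data sets.

```matlab
% Hypothetical rig parameters (assumed for illustration only)
f = 700;          % focal length in pixels
b = 0.12;         % baseline in metres
d = [70 35 7];    % disparities in pixels, from near to far

% Depth is inversely proportional to disparity: Z = b * f / d
Z = (b * f) ./ d; % depth in metres, one value per disparity
disp(Z);
```

With these assumed values, halving the disparity doubles the depth, and the smallest disparity gives the largest depth.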
As shown in figure 3, the depth of a pixel in a reference image
(left image) can be determined from its disparity with respect to
its corresponding pixel in a target image (right image). In the
next section we introduce the methodology for calculating the
disparity of a given pixel in a reference image from its
corresponding pixel in a target image.
2. Methodology
In this section, we describe the algorithm we used for stereo
matching.
Basically, there exist two different (not mutually exclusive)
strategies:
• Local algorithms: in order to increase the SNR (reduce
ambiguity), the matching costs are aggregated over a support
window, which is discussed later.
• Global (and semi-global) approaches: many algorithms search for
the disparity assignment that minimizes a certain cost function
over the whole stereo pair (semi-global approaches use a subset
of the stereo pair).
We used a local algorithm, which consists of four main steps:
1. Matching cost computation.
2. Cost aggregation.
3. Disparity computation.
4. Disparity refinement (optional).
2.1. Matching cost computation
First we have to introduce the correspondence problem, which
tries to figure out which parts of an image correspond to which
parts of another image, as shown in figure 4.
Figure 4 The Correspondence Problem
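We used the absolute-differences pixel-based matching cost, where
the matching energy function is equal to the absolute difference
between the pixel in the reference image and the pixel in the
target image, as shown in figure 5:
\[ e(x, y, d) = \left| I_R(x, y) - I_T(x - d, y) \right| \]
Equation (1)
where $I_R$ and $I_T$ are the intensities of the reference and
target images and $d$ is the candidate disparity.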
Figure 5 Pixel Based Matching Cost
Applying Equation (1) for disparities from 1 to $d_{max}$ (the
maximum disparity) results in what is known as the disparity
space image (DSI), as shown in figure 6.
Figure 6 DSI
The DSI is a 3D matrix in which each element $DSI(x, y, d)$
represents the cost of the correspondence between the reference
pixel $I_R(x, y)$ and the target pixel $I_T(x - d, y)$.
Calculation of the DSI can be done in Matlab as shown in the code
sample of figure 7.
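Area-based matching costs can also be used to calculate the
matching cost, where the energy function is accumulated over a
support area, for example as the sum of absolute differences
(SAD):
\[ C(x, y, d) = \sum_{(i, j) \in S} \left| I_R(i, j) - I_T(i - d, j) \right| \]
where $S$ is the covered area around $(x, y)$ and $I_R$, $I_T$
are the reference and target images.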
Figure 8 Area Based Matching Cost
2.2. Cost aggregation
Cost aggregation is used to increase the SNR (reduce ambiguity):
the matching costs are aggregated over a support window.
In the proposed algorithm we aggregate the matching costs of the
DSI horizontally and then vertically, using the simplest fixed
window (FW) cost aggregation strategy, as shown in figure 9. The
Matlab code for this step is shown in figure 10.
Figure 9 Fixed Window Cost Aggregation
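The idea of fixed window aggregation can be sketched with base Matlab functions only. The example below is an illustrative alternative to the integral-image code of figure 10, not that code itself: it sums every slice of a toy cost volume with a separable box filter through conv2. The cost volume, the window size WS = 3 and the zero padding at the image border are assumptions made for this sketch.

```matlab
% Toy pixel-cost volume (rows x cols x disparities); values are random
pcost = rand(6, 8, 4);
WS = 3;                          % assumed odd window size

wcost = zeros(size(pcost));
for k = 1 : size(pcost, 3)
    % Separable box filter: ones(WS, 1) sums along the columns and
    % ones(1, WS) sums along the rows. 'same' keeps the slice size
    % and treats pixels outside the image as zero.
    wcost(:, :, k) = conv2(ones(WS, 1), ones(1, WS), pcost(:, :, k), 'same');
end

% Each interior element now holds the sum of a WS x WS neighbourhood
expected = sum(sum(pcost(2:4, 3:5, 2)));
```

Because every window has the same size and shape, the aggregated cost at each pixel mixes in all of its neighbours regardless of their depth, which is the source of the pitfalls discussed next.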
2.2.1. Pitfalls in Fixed Window cost aggregation
FW fails at most points for the following reasons:
1. Implicitly assumes frontal-parallel surfaces.
Figures 11 and 12 show that the assumption behind fixed window
cost aggregation is violated when dealing with curved and slanted
surfaces.
Figure 11 Fixed Windows with Curves
Figure 12 Fixed Windows with Slanted Surfaces
2. Ignores depth discontinuities.
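% Calculate pixel cost: absolute difference for each candidate disparity
for Dc = 1 : D
    % Number of columns that overlap when the images are shifted by Dc - 1
    maxL = widthL + 1 - Dc;
    pcost(:, Dc : widthL, Dc) = ...
        imabsdiff(imgright(:, 1 : maxL), imgleft(:, Dc : widthL));
end
Figure 7 Matlab Code Sample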
% Corner kernel: combines four corner values of the integral image
% into a window sum
h = zeros(WS, WS, 'double');
h(1,1) = 1; h(1,WS) = -1; h(WS,1) = -1; h(WS,WS) = 1;
% Calculate integral cost
icost = single(pcost);
icost = cumsum(cumsum(icost), 2);
% Calculate window cost
wcost = imfilter(icost, h, 'same', 'symmetric');
Figure 10 Matlab Code Sample
The implicit assumption of frontal-parallel surfaces in the real
scene is violated near depth discontinuities.
Figure 13 Fixed Windows with Discontinued Areas
Aggregating the matching costs of two populations at different
depths (aligned foreground and misaligned background (outliers))
results in the typical inaccurate localization of depth borders.
Figure 14 Effects of Discontinued Areas
3. Does not deal explicitly with ambiguous regions (uniform
areas).
If the fixed window is smaller than the ambiguous regions and the
uniform areas, the fixed window approach will not be able to
determine the real disparity of the pixel, and the algorithm
gives bad results.
Figure 15 Fixed Windows with Big Ambiguous Regions
4. Does not deal explicitly with repetitive patterns.
If the fixed window is smaller than the repetitive pattern, the
fixed window approach will not be able to determine the real
disparity of the pixel, and the algorithm gives bad results.
Figure 16 Effects of Repetitive Patterns
2.2.2. Advantages of the FW algorithm
• Easy to implement.
• Fast, thanks to incremental calculation schemes.
• Runs in real time on standard processors (SIMD).
• Has limited memory requirements.
• Hardware implementations (FPGA) run in real time with limited
power consumption (<1W).
Other approaches used in cost aggregation:
• Using shiftable windows
• Using multiple windows
• Using variable windows
• Segmentation
2.3. Disparity computation
This step aims at finding the best disparity
assignment (e.g. the best path/surface within the DSI)
that minimizes a cost function over the whole stereo pair.
As mentioned above, differences between two images give depth
information. The key step to obtaining accurate depth information
is therefore finding a detailed and accurate disparity map.
Disparity maps can be visualized in grayscale. Close objects
result in a large disparity value, which is translated into light
grayscale values. Objects further away appear darker.
In our algorithm we used the Matlab function
[DisparityCost, Disparitymap] = min(DSI, [], 3);
where DisparityCost is a 2D matrix that contains the cost of the
best disparity assignment for each pixel and Disparitymap is a 2D
matrix that contains the indices of those disparity assignments.
For example, if the best disparity assignment for the point
(x, y) lies at a large disparity in the DSI (i.e. an object close
to the camera), then a large index is returned in the disparity
map, which means a light grayscale value, and vice versa.
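The behaviour of min along the third dimension can be seen on a tiny example. The sketch below builds a toy DSI with made-up costs for two pixels and three candidate disparities, then scales the returned indices to the range [0, 1] for grayscale display. The scaling step is an assumption added for illustration and is not part of our implementation.

```matlab
% Toy DSI: 1 x 2 image with 3 candidate disparities (made-up costs)
DSI = zeros(1, 2, 3);
DSI(1, 1, :) = [5 2 9];   % pixel (1,1): lowest cost at index 2
DSI(1, 2, :) = [7 8 1];   % pixel (1,2): lowest cost at index 3

% Minimum cost and its index along the disparity dimension
[DisparityCost, Disparitymap] = min(DSI, [], 3);

% Map indices to [0, 1]: larger disparity (closer object) is lighter
gray = (Disparitymap - 1) / (size(DSI, 3) - 1);
```

Here DisparityCost is [2 1], Disparitymap is [2 3], and the second pixel, which has the larger disparity index, receives the lighter gray value.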
In global stereo matching algorithms, however, the energy
function generally has two terms:
\[ E(d) = E_{data}(d) + E_{smooth}(d) \]
• The data term $E_{data}$ measures how well the assignment fits
the stereo pair (in terms of overall matching cost). Several
approaches rely on simple pixel-based cost functions, but
effective support aggregation strategies have also been
successfully adopted.
• The smoothness/regularization term $E_{smooth}$ explicitly
enforces piecewise assumptions (continuity) about the scene. This
term penalizes disparity variations, and large variations are
allowed only at (unknown) depth borders. The plausibility of a
depth border is often related to edges.
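To make the two terms concrete, the sketch below evaluates such an energy for one candidate disparity assignment on a toy DSI. The data term sums the DSI costs selected by the assignment. The smoothness term used here, the sum of absolute disparity differences between horizontally and vertically neighbouring pixels, is only one simple choice assumed for illustration. All values are made up.

```matlab
% Toy DSI (2 x 2 pixels, 2 disparities) and a candidate assignment
DSI  = cat(3, [1 4; 3 2], [5 0; 1 6]);
dmap = [1 2; 2 1];                 % candidate disparity index per pixel

% Data term: sum of the DSI costs picked out by the assignment
[rows, cols] = ndgrid(1 : size(dmap, 1), 1 : size(dmap, 2));
idx   = sub2ind(size(DSI), rows, cols, dmap);
Edata = sum(DSI(idx(:)));

% Smoothness term (assumed form): absolute disparity differences
% between horizontal and vertical neighbours
Esmooth = sum(sum(abs(diff(dmap, 1, 2)))) + sum(sum(abs(diff(dmap, 1, 1))));

E = Edata + Esmooth;               % total energy of this assignment
```

A global method would search over all possible assignments dmap for the one with the lowest E, which is what makes the problem so much harder than the per-pixel minimum used in our local algorithm.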
Finding the best assignment that minimizes the energy function is
therefore an NP-hard problem. Relevant approaches are:
• Graph cuts
• Belief propagation
• Cooperative optimization
2.4. Disparity refinement
Most stereo correspondence algorithms compute a set of disparity
estimates in some discretized space. For applications such as
robot navigation or people tracking, these may be perfectly
adequate. However, for image-based rendering, such quantized maps
lead to very unappealing view synthesis results (the scene
appears to be made up of many thin shearing layers). To remedy
this situation, sub-pixel disparity estimates can be computed in
a variety of ways, including iterative gradient descent and
fitting a curve to the matching costs at discrete disparity
levels. This provides an easy way to increase the resolution of a
stereo algorithm with little additional computation. However, to
work well, the intensities being matched must vary smoothly, and
the regions over which these estimates are computed must be on
the same (correct) surface.
We used sub-pixel interpolation, where the sub-pixel disparity is
obtained by interpolating three matching costs with a
second-degree function, as shown in figure 17.
Figure 17 Sub-Pixel Interpolation
This method is computationally inexpensive and reasonably
accurate.
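A minimal sketch of second-degree interpolation follows. Fitting a parabola through the costs at d - 1, d and d + 1 and taking its minimum gives the offset (cm - cp) / (2 (cm - 2 c0 + cp)), where cm, c0 and cp denote the three costs. The variable names and the cost values are hypothetical, and the guard for a flat cost curve is an assumption added for safety.

```matlab
% Made-up aggregated costs around the winning integer disparity
d  = 5;     % integer disparity with the lowest cost
c0 = 4;     % cost at d
cm = 10;    % cost at d - 1
cp = 6;     % cost at d + 1

% Vertex of the parabola through (-1, cm), (0, c0), (1, cp)
denom = cm - 2 * c0 + cp;
if denom > 0
    offset = (cm - cp) / (2 * denom);
else
    offset = 0;   % flat or degenerate cost curve: keep integer disparity
end
dsub = d + offset;   % refined disparity, here 5.25
```

Since the cost at d + 1 is lower than the cost at d - 1, the refined disparity moves toward d + 1, and the offset always stays within half a pixel when c0 is the minimum of the three costs.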
2.5. Other methods
Not all binocular stereo correspondence algorithms can be
described in terms of our basic local algorithm. Here we briefly
mention some additional algorithms that are not covered by our
paper.
A uni-valued representation of the disparity map is not
essential. Multi-valued representations, which can represent
several depth values along each line of sight, have been
extensively studied recently, especially for large multi-view
data sets. Another way to represent a scene with more complexity
is to use multiple layers, each of which can be represented by a
plane plus residual parallax. Finally, deformable surfaces of
various kinds have also been used to perform 3D shape
reconstruction from multiple images.
3. Experimental Results
In this section, we describe the experiments used to evaluate the
stereo algorithms. Using the implementation framework we have
found, we examine the two main algorithm components identified in
the abstract: the local stereo matching algorithm and the
sub-pixel refinement algorithm.
We use the Teddy, Tsukuba, Cones, and Venus data
sets in all experiments and report results on subsets of
these images. The complete set of results (all experiments
run on all data sets) is available on the CD.
3.1. Resultant Stereo Match
The image used is Tsukuba, shown in figure 18.
Figure 18 Tsukuba image
The resultant disparity map without sub-pixel interpolation is
shown in figure 19.
Figure 19 Tsukuba Disparity Map without Sub-Pixel Interpolation
The resultant disparity map with sub-pixel interpolation is shown
in figure 20, where the scene is smoother.
Figure 20 Tsukuba Disparity Map with Sub-Pixel Interpolation
The matching cost of Tsukuba is shown in figure 21, where the
light pixels have a higher cost than the dark ones.
Figure 21 Tsukuba Matching Cost
The pixel cost histogram and the window cost histogram of one
pixel (e.g. x = 100 and y = 100) in the Tsukuba image are shown
in figure 22.
Figure 22 Cost of Pixel x=100 and y=100
It is clear that aggregating the cost over a support window
reduces the ambiguity of the disparity assignment.
4. Conclusion
In this paper we have introduced the methodology of binocular
stereo matching and a local algorithm used in stereo matching. We
have also introduced other methods that could be used in stereo
matching, such as:
• Area-based algorithms in the matching cost stage.
• Shiftable windows, multiple windows, variable window sizes, and
segmentation in the cost aggregation stage.
• Graph cuts, belief propagation, and cooperative optimization in
the disparity computation stage.
• Iterative gradient descent, as an alternative to curve fitting,
in the disparity refinement stage.
We have shown the role of stereo matching in the formation of a
complete 3D scene using the algorithm explained.
Although the algorithm we have used does not provide the best
accuracy, it is widely used because of its speed. Applications
where this algorithm is used include:
1) 3D tracking:
• People counting (building, bus, train)
• Safety
• Surveillance and security
2) 3D graffiti detection
3) 3D scanning
4) Space-time stereo
5) 3D motion detection
Finally, stereo matching is one of the most active research areas
in computer vision due to its importance in real-time
applications, and the biggest challenge in this area of research
is to design an algorithm that finds the best match in the
minimum time.