
Oct 29, 2013




Recovering 3-D structure from a set of 2-D images is an important problem in the field of computer vision. One method for collecting a set of images is the temporal accumulation of information through a monocular observer. The relationship between subsequent still images in a video stream provides a wealth of information in the form of spatio-temporal change. The temporal integration of such velocity information in both 2-D and 3-D space is essential for solving shape-from-motion [26], collision [8], object tracking [20], object recognition [2], and figure-ground problems.

It is intuitively sound to suggest that changes in intensity on an image plane are coupled with the projection of the apparent motion of the 3-D space surrounding the plane. It is incorrect, however, to say that such projections are unique and complete. The loss of a dimension, the quantization of intensity, the discrete sampling of infinitesimal spatial data and sensor noise make the problem of recovering 3-D motion from a 2-D intensity distribution ill-posed.

There are two levels of ill-posedness in recovering 3-D motion from a sequence of intensity distributions. The first level involves recovering the true 2-D velocity field from the subsequent intensity images formed on the retina of the 2-D imaging device. The second problem involves constructing a fully determined system of 2-D flow fields, such that the 3-D motion can be fully, uniquely and stably recovered.

Under constraints of rigid motion and weak perspective the second problem becomes tractable [26]. This literature review will concern itself with the more difficult problem of recovering the 2-D flow from the intensity fields. This is commonly referred to in the literature as the optical flow problem. Barron et al. [3] define the optical flow problem as that of “computing the approximation to the 2-D motion field, a projection of the 3-D velocities of surface points onto the imaging surface, from spatio-temporal patterns”.

Two comprehensive papers on the subject of optical flow performance exist. Barron et al. [3] have produced a paper that compares nine classic flow algorithms on the basis of accuracy and density. They provide a clear test set of image sequences that can be used for quantitative and qualitative comparison of the different algorithms. More recent work by Liu et al. [16] has improved on Barron et al.'s study by including efficiency in the evaluation of the algorithms. They provide a coordinate system that compares accuracy with efficiency. A curve is constructed in this coordinate system by changing the search areas of the different algorithms.

As stated above, recovering the 2-D velocity field from a sequence of intensity images is ill-posed [13]. The problem is ill-posed because local intensity alone fails to completely encode motion information. For example, in regions of constant intensity, motion cannot be detected and an infinite number of solutions exist. Even when the intensity is constant in only a given direction, the solution is still only partially available. Under such conditions it is only possible to recover the component of the velocity that is perpendicular to the direction of constant intensity. This is referred to as the aperture problem [21].
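
The aperture problem can be illustrated numerically: for a pattern whose intensity varies along only one direction, the system of local gradient constraints is rank-deficient, so only the normal component of velocity is constrained. A minimal numpy sketch (array sizes and window are arbitrary choices):

```python
import numpy as np

# A pattern that varies only along x (vertical stripes): inside any small
# window the gradient always points in the x direction, so only the motion
# component along x (normal to the stripes) is constrained.
x = np.arange(64, dtype=float)
img = np.tile(np.sin(0.3 * x), (64, 1))          # all rows identical

Ix = np.gradient(img, axis=1)                    # spatial derivative in x
Iy = np.gradient(img, axis=0)                    # zero everywhere here

# Normal-equations matrix of the local gradient constraints for one window.
win = (slice(20, 30), slice(20, 30))
G = np.array([[np.sum(Ix[win] ** 2), np.sum(Ix[win] * Iy[win])],
              [np.sum(Ix[win] * Iy[win]), np.sum(Iy[win] ** 2)]])

print(np.linalg.matrix_rank(G))   # 1: the y component is unconstrained
```

The rank-1 matrix means the y component of the velocity can take any value without violating the local constraints, which is exactly the aperture ambiguity.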

Another example of how intensity fails to describe motion is Horn's mirror ball problem [14]. Consider a sphere with no texture. As the sphere rotates about its center, no change in intensity is observed, yet the sphere does possess a motion field.

Because the problem of recovering the 2-D velocity field is ill-posed, additional information must be added to the problem statement to clearly define a closed set in which a single stable solution exists. This additional information takes the form of additional constraints imposed on the problem and of regularization strategies.

In [14], Horn formalizes the image flow constraint, thus creating an incomplete correlation between the motion domain and the intensity domain. This constraint can be interpreted as the assumption that a point in the 3-D shape, when projected onto the 2-D image plane, maintains a constant intensity over time. Mathematically this is formulated as

I(x + u dt, y + v dt, t + dt) = I(x, y, t),     (1)

where I(x, y, t) is the intensity of the image at a pixel position (x, y) at a time t, and (u, v) is the velocity of the projected point.
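
Brightness constancy can be checked directly on a synthetic pair of frames related by a pure integer-pixel translation (a sketch only; np.roll wraps around the border, which a real camera would not):

```python
import numpy as np

# Brightness constancy: a point keeps its intensity as it moves. For a pure
# integer-pixel translation, frame2 is frame1 shifted by (u, v), so
# I(x + u, y + v, t + 1) == I(x, y, t) holds exactly.
rng = np.random.default_rng(0)
frame1 = rng.random((32, 32))
u, v = 3, 2                                     # displacement in x (cols), y (rows)
frame2 = np.roll(frame1, shift=(v, u), axis=(0, 1))

# Check the constraint on the interior region unaffected by the wrap-around.
assert np.allclose(frame2[v:, u:], frame1[:-v, :-u])
print("brightness constancy holds for an integer translation")
```
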
Simoncelli [23] presents this constraint from a Bayesian perspective by suggesting that the maximum a posteriori 2-D displacement v of a point in 3-D space, when projected into an intensity plane I, is obtained as

v* = arg max_v p(v | I).


On top of constraints, regularization strategies are used to interpolate and propagate flow estimates from areas of greater certainty to those of lower certainty where the aperture problem prevails. Some algorithms simply perform post-filtering, while other algorithms provide a measure of confidence for each estimate, thus providing a criterion for better redistribution of the velocity estimates of high confidence to areas of lower confidence.

Thus an optical flow algorithm is specified by three elements [3]:

- the spatio-temporal operators that are applied to the image sequence to extract features and improve the signal-to-noise ratio,

- how velocity estimates are produced from a gradient search of the extracted feature space, and

- the form of regularization applied to the flow field, considering confidence measures if they exist.



Barron et al. [3] classify optical flow algorithms by their signal extraction stage. This provides four groups: differential techniques, energy-based methods, phase-based techniques and region-based matching.

Liu et al. [16] prefer to classify algorithms into two groups: those that perform a gradient search on extracted structure of the image sequence and those that do not. This effectively groups the differential techniques, energy-based methods, and phase-based methods into a single class of algorithms referred to as gradient methods, while leaving the region-matching techniques in a class of their own. This classification is not truly justified, as region-based matching also performs hill climbing, only on a much coarser level than the filter-dependent algorithms. The coarser search provides quicker, more robust results. The cost of the improved efficiency is reduced accuracy.

Differential Techniques

Differential techniques are characterized by a gradient search performed on extracted first and second order spatial derivatives, and temporal derivatives. From the Taylor expansion of the flow constraint equation, the gradient constraint equation is obtained:

I_x u + I_y v + I_t = 0,

where I_x, I_y and I_t are the partial derivatives of the intensity with respect to x, y and t.

Horn and Schunk [15] combine a global smoothness term with the gradient constraint equation to obtain a functional for estimating optical flow. Their choice of smoothness term minimizes the absolute gradient of the velocity:

integral of (I_x u + I_y v + I_t)^2 + alpha^2 (|grad u|^2 + |grad v|^2) over the image.


This functional can be reduced to a pair of recursive equations that must be solved iteratively.
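A sketch of that iteration, using the commonly cited Horn and Schunk update in which each step corrects the local flow averages along the image gradient (the exact update form and the constant alpha here are assumptions of this sketch, not quoted from [15]):

```python
import numpy as np

# Horn-Schunck-style iteration (sketch): each step replaces (u, v) by its
# local average, corrected along the image gradient; alpha weights the
# smoothness term against the gradient constraint.
def horn_schunck(Ix, Iy, It, alpha=1.0, n_iter=200):
    u = np.zeros_like(Ix)
    v = np.zeros_like(Ix)

    def local_avg(f):   # 4-neighbour average (wrap-around borders)
        return (np.roll(f, 1, 0) + np.roll(f, -1, 0) +
                np.roll(f, 1, 1) + np.roll(f, -1, 1)) / 4.0

    for _ in range(n_iter):
        u_bar, v_bar = local_avg(u), local_avg(v)
        t = (Ix * u_bar + Iy * v_bar + It) / (alpha ** 2 + Ix ** 2 + Iy ** 2)
        u = u_bar - Ix * t
        v = v_bar - Iy * t
    return u, v

# A uniform ramp translating at one pixel/frame gives Ix = 1, Iy = 0, It = -1,
# whose consistent solution is (u, v) = (1, 0).
Ix = np.ones((8, 8)); Iy = np.zeros((8, 8)); It = -np.ones((8, 8))
u, v = horn_schunck(Ix, Iy, It)
print(np.round(u.mean(), 4), np.round(v.mean(), 4))   # 1.0 0.0
```
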

Lucas and Kanade [18] also construct a flow estimation technique based on first order derivatives of the image sequence. In contrast to Horn and Schunk's post-regularization, they choose to pre-smooth the data before using the gradient constraint equation, minimizing

sum over x of W^2(x) [I_x(x) u + I_y(x) v + I_t(x)]^2,

where W is a window that gives more influence to constraints near the center of the neighborhood. This can be reduced to a closed form solution for the flow estimates:

v = (A^T W^2 A)^-1 A^T W^2 b,     (8)

where the rows of A hold the spatial gradients (I_x, I_y) at each pixel of the neighborhood, W is the diagonal matrix of window weights, and b holds the corresponding values of -I_t.






One important advantage of this approach over Horn and Schunk [15] is the existence of a confidence measure. The smallest eigenvalue of the resulting 2x2 system matrix provides a measure to distinguish estimates of normal velocity from full 2-D velocity.
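
The closed-form solve and its eigenvalue confidence measure can be sketched for a single window as follows (the names A, W and b and the uniform default window are notational assumptions of this sketch):

```python
import numpy as np

# Lucas-Kanade closed-form solve for one window: stack the gradient
# constraints into A v = b and take the weighted least-squares solution
# v = (A^T W^2 A)^-1 A^T W^2 b. The smallest eigenvalue of A^T W^2 A
# serves as the confidence measure.
def lucas_kanade_window(Ix, Iy, It, w=None):
    a = np.column_stack([Ix.ravel(), Iy.ravel()])   # A: one row per pixel
    b = -It.ravel()
    w2 = np.ones(len(b)) if w is None else w.ravel() ** 2
    g = a.T @ (w2[:, None] * a)                     # A^T W^2 A  (2x2)
    rhs = a.T @ (w2 * b)
    vel = np.linalg.solve(g, rhs)
    confidence = np.linalg.eigvalsh(g)[0]           # smallest eigenvalue
    return vel, confidence

# Synthetic window with varied gradients and true velocity (1, 2).
rng = np.random.default_rng(1)
Ix = rng.standard_normal((5, 5)); Iy = rng.standard_normal((5, 5))
It = -(Ix * 1.0 + Iy * 2.0)
vel, conf = lucas_kanade_window(Ix, Iy, It)
print(np.round(vel, 6))                             # [1. 2.]
```

With the stripe pattern of the aperture-problem example, the same matrix would be singular and the smallest eigenvalue near zero, flagging a normal-velocity-only estimate.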

Barron et al. report in their survey [3] that Lucas and Kanade's algorithm provides the second most accurate results. Liu et al. [16] evaluate Lucas and Kanade as providing the third best accuracy curve. This has motivated much work on Lucas and Kanade's algorithm.

Fleet and Langley [10] attempt a more efficient implementation using IIR temporal pre-filtering and temporal recursive estimation for regularization. The temporal support was reduced to 3 frames, and the computation time improved, while only slightly diminishing performance.

Accuracy issues are also tackled, especially in the domain of discontinuities. Discontinuities provide information about occlusion and shape. Thus researchers have attempted to reduce the effects of smoothing along steep intensity gradients. Nagel and Enkelmann [19] were the first to formulate an oriented smoothness constraint, replacing the isotropic smoothness term with one weighted by a matrix

D(grad I) = (p p^T + delta Id) / (|grad I|^2 + 2 delta),

where p is the vector perpendicular to the image gradient, Id is the identity matrix and delta is a fixed constant.
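
A small numerical illustration of the oriented-smoothness idea, built from the direction perpendicular to the intensity gradient (the exact weight-matrix form used here is an assumption of this sketch):

```python
import numpy as np

# Oriented-smoothness weight matrix (sketch of the assumed Nagel-Enkelmann
# form): D projects the smoothness penalty onto the direction perpendicular
# to the intensity gradient, so flow is smoothed along edges, not across them.
def oriented_weight(grad, delta=0.1):
    gx, gy = grad
    perp = np.array([-gy, gx])                   # direction along the edge
    D = (np.outer(perp, perp) + delta * np.eye(2)) / (gx**2 + gy**2 + 2*delta)
    return D

# Strong horizontal gradient (a vertical edge): smoothing across the edge
# (x direction) is attenuated relative to smoothing along it (y direction).
D = oriented_weight((10.0, 0.0))
print(D[0, 0] < D[1, 1])   # True
```
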

Ghosal and Vanek [11] look at weighted anisotropic filtering to reduce the loss of discontinuity information when regularizing. Using the eigenvalues of (8), when W is the identity matrix, they establish weights for imposing isotropic smoothness along the x and y directions.



In the same spirit, Spetsakis [25] uses an adaptive Gaussian filter approach to minimize the loss of occluding information. He applies this Gaussian to the normal equations of (8) and obtains a Gaussian-weighted system of constraints (10). A velocity estimate is obtained from the solution of the system when the constraints are zero.

The size of the Gaussian filter applied to a given pixel should be governed by the following rules: the Gaussian should be larger when

- the flow is smooth, and

- instabilities in the system of equations occur.

Spetsakis derives a measure of confidence called the incompatible measure. This value is determined from the residual values that result when the estimated velocity is injected back into (10). If the residual is high, the solution is considered robust. If the residual is low, then the Gaussian has not gathered enough information and its size must be increased.

More generally, differential methods can be seen as band-pass signal extraction. They only provide a local representation in frequency space and thus are restricted to performing well on an interval of velocities characterized by the pre-smoothing of the spatio-temporal signal before numerical differentiation. Flow information may not always be limited to a tight frequency band. These problems are somewhat tackled by Spetsakis, and by Ghosal and Vanek, through adaptive filtering. Heeger [12], and Fleet and Jepson [9], however, suggest that providing information for multiple frequency bands is much more accurate. This approach is much more computationally expensive as well [16].

Energy-Based Methods

The advantage of energy-based methods is the hierarchical decomposition of the image sequence in the frequency domain. Energy-based techniques extract velocities by using families of band-pass filters that are velocity and orientation tuned. The Fourier transform of a translating 2-D pattern I(x, t) = I0(x - v t) is

F{I}(k, w) = F{I0}(k) delta(w + v . k),

where F{I0} is the Fourier transform of I0, delta is a Dirac delta function, w denotes temporal frequency, and k denotes spatial frequency. This effectively implies that all power associated with the translation will be mapped to a plane that traverses the origin in frequency space.
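
This plane property is easy to verify numerically in one spatial dimension, where the plane reduces to the line w = -v k (the signal, sizes and velocity below are arbitrary choices):

```python
import numpy as np

# Numerical check of the translation property: a 1-D pattern moving at
# velocity v puts all its spectral power on the line w = -v * k, the 1-D
# analogue of the plane through the origin described above.
N, T, vel = 64, 64, 3                        # pixels, frames, pixels/frame
x = np.arange(N)
seq = np.array([np.sin(2 * np.pi * 5 * (x - vel * t) / N) for t in range(T)])

F = np.fft.fft2(seq)                         # axes: (temporal, spatial)
wi, ki = np.unravel_index(np.argmax(np.abs(F)), F.shape)
w = np.fft.fftfreq(T)[wi] * T                # integer temporal frequency
k = np.fft.fftfreq(N)[ki] * N                # integer spatial frequency
print(w + vel * k)                           # 0.0: power lies on w = -v*k
```
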

As such, Heeger [12] uses a family of twelve Gabor filters of different spatial resolutions to extract velocity information from the image sequences. By using Gabor filters [9], which provide simultaneous spatio-temporal and frequential localization, a clean band-pass representation is obtained. A least-squares fit is then applied to the resulting distribution in frequency space.

Phase-Based Techniques

The most accurate optical flow estimations are produced using Fleet and Jepson's [9] phase-based approach [3,16]. Phase-based methods also use a family of velocity-tuned filters to extract a local frequency representation of the image sequence. Flow estimates are provided by a gradient search in the phase space of the extracted signatures.

Motivation for this approach is based on the argument that the evolution of phase contours provides a good approximation to the projected motion field. The phase output of band-pass filters is generally more stable than the amplitude when small translations in the scene are sought. As optical flow is a localized measurement, it is often characterized by small displacements. Thus deriving the velocity from phase as opposed to magnitude is advantageous.

Region Matching

Region matching is particular in that it forms the filters for feature extraction from the previous
image in the sequence. Tiles from the previ
ous image are correlated with the next image using
some distance measure. The best match provides the most likely displacement. This is
equivalent to searching a spatially shifted and temporally differentiated space.

This approach provides more robustness with respect to numerical differentiation and is generally quicker, since it constructs a highly quantized gradient distribution. As mentioned earlier, this distribution is so coarse that Liu et al. [16] classify it as a non-gradient algorithm.

The distance measure used by more classical algorithms, such as Anandan's [1] and Singh and Allen's [24], is referred to as the sum of squared differences (SSD). It is formulated as

SSD(x, d) = sum over (i, j) of W(i, j) [I1(x + (i, j)) - I2(x + d + (i, j))]^2,

where W is a 2-D window function and d denotes the suggested displacement vector.
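
A minimal region-matching sketch using this SSD measure (window size and search radius are arbitrary choices):

```python
import numpy as np

# Minimal SSD region matching: slide a tile from frame 1 over a search window
# in frame 2 and keep the displacement with the smallest sum of squared
# differences.
def ssd_match(frame1, frame2, top, left, size=8, search=4):
    tile = frame1[top:top + size, left:left + size]
    best, best_d = np.inf, (0, 0)
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            cand = frame2[top + dy:top + dy + size, left + dx:left + dx + size]
            ssd = np.sum((tile - cand) ** 2)
            if ssd < best:
                best, best_d = ssd, (dx, dy)
    return best_d

rng = np.random.default_rng(2)
f1 = rng.random((32, 32))
f2 = np.roll(f1, shift=(2, 3), axis=(0, 1))     # true displacement (dx=3, dy=2)
print(ssd_match(f1, f2, top=12, left=12))       # (3, 2)
```
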

Anandan constructs a multi-scale method based on the Burt Laplacian pyramid [6]. A coarse-to-fine strategy is adopted such that larger displacements are first determined from less resolved versions of the images and then improved with more accurate, higher resolution versions of the image. This strategy is well suited for large-scale displacements but is less successful for sub-pixel velocities.

Confidence measures, which are based on the principal curvatures of the SSD surface, are used to steer the smoothing process. The smoothness constraint is based on the principal axes of the SSD surface, the estimated displacement and the sought best-fit velocity estimate. Anandan also includes Horn and Schunk's [15] formulation of the smoothness constraint. Mathematically, the minimized functional takes the form

sum of [ lambda(|grad u|^2 + |grad v|^2) + c_max((v - d) . e_max)^2 + c_min((v - d) . e_min)^2 ],

where e_max and e_min are the principal axes of the SSD surface, c_max and c_min the associated confidence measures, d the estimated displacement and v the sought best-fit velocity estimate.


Singh and Allen [24] provide another approach using the SSD. They use a three-frame approach to the region matching method to average out temporal error in the SSD. For a frame 0, they form an SSD distribution with respect to frame -1 and frame +1 as such:

SSD_0(x, d) = SSD_{0,+1}(x, d) + SSD_{0,-1}(x, -d).

From this distribution, Singh and Allen build a probability distribution:

R(d) = k e^(-beta SSD_0(d)),

where k is a normalization constant. The sub-pixel flow estimates are then obtained by considering the mean of the distribution with respect to d:

d_hat = sum over d of R(d) d.



Singh and Allen [24] employ a Laplacian pyramid strategy similar to that of Anandan [1]. This provides a more symmetric distribution about displacement estimates in the SSD. A covariance matrix is then constructed from these estimates as such:

S = sum over d of R(d)(d - d_hat)(d - d_hat)^T / sum over d of R(d),     (17)

where d_hat is the mean displacement estimate. Singh suggests that the eigenvalues of the inverse of S provide a measure of confidence for the estimate.
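
Singh and Allen's estimation step can be sketched as follows; the exponential form of R, the constant beta and the grid of candidate displacements are assumptions of this sketch:

```python
import numpy as np

# Sketch of Singh-and-Allen-style estimation: convert an SSD distribution
# over candidate displacements into probabilities R(d) = k*exp(-beta*SSD(d)),
# then take the mean for a sub-pixel estimate and the covariance for a
# confidence measure.
def subpixel_estimate(ssd, beta=5.0):
    n = ssd.shape[0] // 2                          # assumes an odd square grid
    dy, dx = np.mgrid[-n:n + 1, -n:n + 1]
    R = np.exp(-beta * ssd)
    R /= R.sum()                                   # k normalizes the distribution
    mean = np.array([np.sum(R * dx), np.sum(R * dy)])
    du, dv = dx - mean[0], dy - mean[1]
    cov = np.array([[np.sum(R * du * du), np.sum(R * du * dv)],
                    [np.sum(R * du * dv), np.sum(R * dv * dv)]])
    return mean, cov

# A synthetic SSD bowl whose minimum sits between integer displacements
# yields a sub-pixel mean estimate.
dy, dx = np.mgrid[-3:4, -3:4]
mean, cov = subpixel_estimate((dx - 0.5) ** 2 + dy ** 2)
```
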

For a given flow field, the least-squares estimate in a neighborhood about a point can be obtained from the weighted mean of the neighboring velocities:

v_bar = sum over i of w_i v_i / sum over i of w_i,     (18)

where the w_i weight the velocities in the neighborhood. A covariance matrix can then be generated in the same manner as (17) from (18). Flow regularization is then obtained by minimizing the sum of the Mahalanobis distances between the estimated flow field and the two distributions. The eigenvalues of the covariance matrix serve as confidence measures for the regularization process.

Benoits and Ferrie [4] build a more robust and simplified region matching metric that is similar to the SSD. They compare a tile of pixels in frame 1 at a given position with a corresponding pattern in frame 2 at a displaced position. The match distance between the two tiles is provided by a combination of the absolute difference and the sum of the intensities of the tiles.

Thresholding is applied to the difference and sum distributions. When the sum of pixel intensities is small, the data is considered unusable. When the difference between successive inputs is small, the intensities should be considered the same. From the total difference-to-sum ratio distribution, an average pixel-matching error is summarized.

Flow consistency is implemented using adaptive diffusion. Similarity measures are constructed for neighborhoods of flow based on magnitude and direction. These are averaged to form a single similarity metric.

An interesting region matching flow algorithm is Camus' quantized flow [7]. Camus constructs a real-time flow algorithm on the idea that performing a search over time instead of over space is linear in nature rather than quadratic. A quantized sub-pixel displacement field results. Liu et al. [16] report that Camus' algorithm provides one of the two best accuracy-efficiency ratios.

The SSD search space constructed in Camus' algorithm is limited spatially to small areas yet extends itself in time. This is denoted as a temporal search S frames deep, with the spatial search over a (2n+1)x(2n+1) pixel area. The success of this algorithm is based on the idea that support for a faster frame rate reduces the necessary area of spatial search by providing a better sampling rate. Another efficient element of this algorithm is its suitability for integer arithmetic. However, it only provides a quantized flow field containing a limited number of different possible displacements.
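
The linear-versus-quadratic argument can be illustrated with a back-of-the-envelope candidate count (an illustration of the scaling only, not Camus' exact search scheme):

```python
# Candidate counts per pixel: covering larger displacements spatially costs
# (2n+1)^2 candidates (quadratic in the search radius n), while covering the
# same maximum speed with a temporal search S frames deep and a fixed small
# spatial radius costs S*(2n+1)^2 with n held at 1 (linear in S).
def spatial_cost(n):            # single frame pair, search radius n
    return (2 * n + 1) ** 2

def temporal_cost(S, n=1):      # S frame pairs, fixed small radius n
    return S * (2 * n + 1) ** 2

# Covering speeds up to 4 pixels/frame:
print(spatial_cost(4))          # 81 candidates
print(temporal_cost(4))         # 36 candidates
```
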

Other Algorithms

The only other algorithm that obtains comparable efficiency-accuracy results to Camus' approach is that of Liu et al. [17]. Using a general steady 3-D motion model in combination with 3-D Hermite polynomial differentiation filters, an efficient and accurate algorithm is constructed [16]. Their approach is similar to that of Heeger [12] and Fleet and Jepson [9], as it requires generating a family of spatio-temporal filters. The filters are thus tuned to reflect the 3-D motion model when projected onto a 2-D perspective model.

Hermite polynomial filters offer several advantages. Orthogonal and Gaussian properties ensure stability. They are extensible to higher order derivatives. Finally, they reflect numerous physiological models that support receptive fields being modeled by Gaussian derivatives of various widths.

Other original approaches to flow estimation include Nesi et al.'s [21] work. They obtain better discontinuous flow estimates using clustering techniques. They use the Combinatorial Hough Transform to propose a multipoint solution with maximum-likelihood estimation. They argue that clustering techniques provide a better approach to solving the flow-blurring problem than more traditional techniques that use least-squares estimation.

Using the flow constraint equation, Nesi et al. build a line parameterization of the linear system provided by a neighborhood of pixels: each pixel contributes the line

I_x u + I_y v + I_t = 0

in the (u, v) velocity space. Votes are accumulated by counting the number of lines that intersect in any given neighborhood of the parameterized space. This effectively provides a discriminating function for possible solutions. Outliers are ignored and multiple velocities in the polled neighborhood are segregated, thus avoiding the aliasing that results from traditional least-squares filter estimation methods.
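
A toy version of this Hough-style voting can be sketched as follows: each gradient constraint is a line in velocity space, and a quantized accumulator collects votes from the lines passing near each candidate velocity (grid range, resolution and tolerance are arbitrary choices of this sketch):

```python
import numpy as np

# Hough-style voting on the flow constraint: each pixel's constraint
# Ix*u + Iy*v + It = 0 is a line in (u, v) space; cells of a quantized
# velocity grid accumulate votes from lines passing near them, and the
# peak cell gives a robust multi-pixel velocity estimate.
def hough_flow(Ix, Iy, It, v_range=3.0, bins=61, tol=0.05):
    us = np.linspace(-v_range, v_range, bins)
    acc = np.zeros((bins, bins))
    U, V = np.meshgrid(us, us)                      # candidate velocity grid
    for ix, iy, it in zip(Ix.ravel(), Iy.ravel(), It.ravel()):
        norm = np.hypot(ix, iy)
        if norm < 1e-9:
            continue                                # no constraint at this pixel
        dist = np.abs(ix * U + iy * V + it) / norm  # distance to the line
        acc += dist < tol                           # vote for nearby cells
    j, i = np.unravel_index(np.argmax(acc), acc.shape)
    return us[i], us[j]                             # (u, v) with most votes

# All constraints consistent with the velocity (1, -2):
rng = np.random.default_rng(3)
Ix = rng.standard_normal(50); Iy = rng.standard_normal(50)
It = -(Ix * 1.0 + Iy * (-2.0))
print(hough_flow(Ix, Iy, It))
```

Outliers simply vote for other cells and do not drag the peak, which is the robustness argument made above.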

In [13], Heitz and Bouthemy also use a statistical model to provide better estimation of discontinuous flow fields. They suggest that using the intersection of solution sets provided by complementary constraints provides a more robust estimate. Data fusion between constraints is formulated using a Bayesian framework associated with Markov Random Fields.

A two-constraint system is provided in [13]. The first is the traditional image flow constraint (1). The second is Bouthemy's feature-based motion constraint [5]. It is based on spatio-temporal surface modeling and hypothesis testing techniques and incorporates occlusion information into the flow estimate.

Heitz and Bouthemy’s data fusion approach is interesting as
it is scalable to incorporate any
number of constraints (correlation, similarity functions, etc.).

The final point of interest of this literature review considers how one measures the validity of the estimated flow field that results from the image formation process with respect to the actual 2-D velocity field that results from the projection of a 3-D motion field onto a perspective plane. The work of Verri and Poggio [27] should be mentioned. These authors contend that the flow constraint model provides a near correct solution set for the true 2-D flow along areas of high curvature in the intensity domain. This bridges the gap between the confidence measure of Lucas and Kanade [18] and the confidence of the estimate with respect to the actual 2-D motion field.

In conclusion, there has been much work done on the optical flow problem. Researchers have experimented with different representations of the image sequence, different regularization techniques and different confidence measures to provide a large family of flow algorithms. Some algorithms provide near real-time frame rates with poorer accuracy while others provide more accurate results at a higher computational cost. Some provide sparser flow fields that are more accurate while others provide more estimates that are smoothed out. It is clear that the choice of flow algorithm when implementing a computer vision system is dependent on the application in question. It is also clear that with advances in computing power and parallel processing, the frame rates of these algorithms will continue to improve.




References

[1] Anandan, P., "A Computational Framework and an Algorithm for Measurement of Visual Motion", International Journal of Computer Vision, Vol. 2, pp. 283-310, 1989.

[2] Arbel, T., Ferrie, F.P. and Mitran, M., "Recognizing Objects From Curvilinear Motion", Submitted to the International Conference on Computer Vision and Pattern Recognition.

[3] Barron, J.L., Fleet, D.J. and Beauchemin, S.S., "Performance of Optical Flow Techniques", International Journal of Computer Vision, 12:1, pp. 43-77, 1994.

[4] Benoits, S.M. and Ferrie, F.P., "Monocular Optical Flow for Real-Time Vision Systems", Technical Report, Center for Intelligent Machines, McGill University, 1996.

[5] Bouthemy, P., "A Maximum Likelihood Framework for Determining Moving Edges", IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 11, No. 5, pp. 499-511, May 1989.

[6] Burt, P.J., "Fast Filter Transforms for Image Processing", Computer Graphics and Image Processing, Vol. 16, pp. 20-51, 1981.

[7] Camus, T., "Real-Time Quantized Optical Flow", Proceedings of IEEE Conference on Computer Architecture for Machine Perception, Como, Italy, pp. 126-131, 1995.

[8] De Micheli, E., Torre, V. and Uras, S., "The Accuracy of the Computation of Optical Flow and of the Recovery of Motion Parameters", IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 15, No. 5, May 1993.

[9] Fleet, D.J. and Jepson, A.D., "Computation of Component Image Velocity from Local Phase Information", International Journal of Computer Vision, 5:1, pp. 77-104, 1990.

[10] Fleet, D.J. and Langley, K., "Recursive Filters for Optical Flow", IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 17, No. 1, pp. 61-67, Jan. 1995.

[11] Ghosal, S. and Vanek, P., "A Fast Scalable Algorithm for Discontinuous Optical Flow Estimation", IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 18, No. 2, pp. 181-194, Feb. 1996.

[12] Heeger, D.J., "Optical Flow using Spatiotemporal Filters", International Journal of Computer Vision, Vol. 1, pp. 279-302, 1988.

[13] Heitz, F. and Bouthemy, P., "Multimodal Estimation of Discontinuous Optical Flow Using Markov Random Fields", IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 15, No. 12, pp. 1217-1232, Dec. 1993.

[14] Horn, B.K.P., "Robot Vision", The MIT Press, Cambridge, Massachusetts, 1986.

[15] Horn, B.K.P. and Schunk, B.G., "Determining Optical Flow", Artificial Intelligence, Vol. 17, pp. 185-201, 1981.

[16] Liu, H., Hong, T.H., Herman, M., Camus, T. and Chellappa, R., "Accuracy vs Efficiency Trade-offs in Optical Flow Algorithms", Computer Vision and Image Understanding, Vol. 72, No. 3, pp. 271-286, 1998.

[17] Liu, H., Hong, T.H., Herman, M. and Chellappa, R., "A Generalized Motion Model for Estimating Optical Flow Using 3-D Hermite Polynomials", Proceedings of the IEEE International Conference on Pattern Recognition, Jerusalem, Israel, pp. 360-366, 1994.

[18] Lucas, B. and Kanade, T., "An Iterative Image Registration Technique with Applications in Stereo Vision", Proceedings of the DARPA Image Understanding Workshop, pp. 121-130, 1981.

[19] Nagel, H.H. and Enkelmann, W., "An Investigation of Smoothness Constraints for the Estimation of Displacement Vector Fields from Image Sequences", IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 8, pp. 565-593, 1986.

[20] Negahdaripour, S., Yu, C.H. and Shokrollahi, A.H., "Recovering Shape and Motion From Undersea Images", IEEE Journal of Oceanic Engineering, Vol. 15, No. 3, pp. 189-198, July 1990.

[21] Nesi, P., Del Bimbo, A. and Ben-Tzvi, D., "A Robust Algorithm for Optical Flow Estimation", Computer Vision and Image Understanding, Vol. 62, No. 1, pp. 59-68, July 1995.

[22] Schunk, B.G., "The Image Flow Constraint Equation", Computer Vision, Graphics and Image Processing, Vol. 35, pp. 20-46, 1986.

[23] Simoncelli, E.P., "Distributed Representation and Analysis of Visual Motion", Ph.D. dissertation, Dept. of Electrical Engineering and Computer Science, MIT, 1993.

[24] Singh, A. and Allen, P., "Image-Flow Computation: An Estimation-Theoretic Framework and a Unified Perspective", Computer Vision, Graphics and Image Processing, Vol. 56, pp. 152-177, Sept. 1992.

[25] Spetsakis, M.E., "Optical Flow Estimation Using Discontinuity Conforming Filters", Computer Vision and Image Understanding, Vol. 68, No. 3, pp. 276-289, Dec. 1997.

[26] Weber, J. and Malik, J., "Rigid Body Segmentation and Shape Description from Dense Optical Flow Under Weak Perspective", IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 19, No. 2, pp. 139-143, Feb. 1997.

[27] Verri, A. and Poggio, T., "Motion Field and Optical Flow: Qualitative Properties", IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 11, No. 5, pp. 490-498, May 1989.