The task is to track vehicles in a video stream accurately enough to provide a
trajectory of each vehicle as it moves through a traffic structure of interest. The
approach to tracking vehicles is based on feature-based motion segmentation.
Vehicles are identified by looking for regions of the video image which have a great deal
of motion. After an area of motion has been identified and clustered, that
region may be tracked, as shown below in Figure 1.
Figure 1 shows a track as a green curve trailing behind the white vehicle in the lower
left. The image region which is being tracked is within the red ellipse. The yellow
ellipse defines a boundary within which no other track may be initialized. The white car
in the lower left and the dark truck in the upper right both have yellow dots indicating
the center of a cluster of moving features. The motion of the vehicle is used to provide
an initial location which may be tracked throughout the frame.
Kanade-Lucas-Tomasi (KLT) features may be used to track general motions within
video images. The KLT algorithm finds several thousand features in each frame of
video. It then attempts to find a correspondence between the features in one frame and
the features in the next.
All the KLT features found for the video frame in Figure 1 are drawn as red dots, each
with a line connecting the feature to its location in the previous frame. Most of the
KLT features do not move, and appear simply as dots. The white vehicle in the lower left
of Figure 3 has lines trailing behind the vehicle showing where that vehicle’s features
were located in the previous frame.
The KLT algorithm finds image features at locations where the minimum eigenvalue of
the 2x2 symmetric matrix G,

$$G = \sum_{\mathbf{x} \in W} g(\mathbf{x})\, g(\mathbf{x})^T,$$

is above some threshold, where g is the local image intensity gradient vector, a 2x1
matrix, and W is a window about some image region.
Generally, image locations which satisfy the above criterion appear to be corner-like
regions, which can more easily be tracked. The KLT algorithm also uses the G
matrix to find the displacement of a feature between frames. The above criterion for the
eigenvalues of G attempts to ensure that the KLT’s solution for the displacement is of a
higher quality at the selected points than at points which do not meet the criterion.
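The selection criterion can be illustrated with a minimal NumPy sketch. It is not the OpenCV implementation; the function name `min_eigenvalue_map`, the square window, and the central-difference gradient are assumptions made only for illustration.

```python
import numpy as np

def min_eigenvalue_map(image, half_window=2):
    """Smaller eigenvalue of G at every pixel (illustrative helper, not from OpenCV).

    G is the sum of g g^T over a (2*half_window+1)^2 window, where g is the
    intensity gradient. Windows that overlap the border are zero padded.
    """
    img = image.astype(np.float64)
    gy, gx = np.gradient(img)  # derivatives along rows (y) and columns (x)

    def window_sum(a):
        # Box-filter sum computed from a zero-padded integral image.
        k = half_window
        n = 2 * k + 1
        padded = np.pad(a, k)
        integral = np.cumsum(np.cumsum(padded, axis=0), axis=1)
        integral = np.pad(integral, ((1, 0), (1, 0)))
        return (integral[n:, n:] - integral[:-n, n:]
                - integral[n:, :-n] + integral[:-n, :-n])

    gxx = window_sum(gx * gx)
    gxy = window_sum(gx * gy)
    gyy = window_sum(gy * gy)

    # Closed-form smaller eigenvalue of the 2x2 matrix [[gxx, gxy], [gxy, gyy]].
    half_trace = 0.5 * (gxx + gyy)
    root = np.sqrt((0.5 * (gxx - gyy)) ** 2 + gxy ** 2)
    return half_trace - root
```

On an image of a bright square, the map is large at the square's corners and close to zero along straight edges and in flat areas. Straight edges are rejected because they constrain the displacement in only one direction.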
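A solution for the displacement d may be found by finding the d which
minimizes the sum of the squares of the intensity differences in the following cost
function:

$$\epsilon(d) = \sum_{\mathbf{x} \in W} \left[ I(\mathbf{x} - d) - J(\mathbf{x}) \right]^2$$

where I and J are image intensity maps which are adjacent in time.
Simplifying the above equation by using a two term Taylor series expansion for I,
$I(\mathbf{x} - d) \approx I(\mathbf{x}) - g^T d$ with g the intensity gradient of I,
differentiating with respect to d, and setting the result equal to zero gives:

$$\sum_{\mathbf{x} \in W} \left[ I(\mathbf{x}) - J(\mathbf{x}) - g^T d \right] g = 0$$

The above can be expressed as the following matrix equation:

$$G\, d = e, \qquad e = \sum_{\mathbf{x} \in W} \left[ I(\mathbf{x}) - J(\mathbf{x}) \right] g$$

where G is the same 2x2 matrix used to find the image features. The criteria used to find
trackable features help ensure that this equation can be solved for d.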
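A single linearized solve of this kind can be sketched in NumPy as follows. The sketch assumes the sign convention J(x) = I(x - d), a displacement well under one pixel, and a window that lies fully inside the image. The function name `klt_displacement` is hypothetical, and a practical tracker would iterate this step and use image pyramids.

```python
import numpy as np

def klt_displacement(I, J, center, half_window=7):
    """One linearized KLT step: solve G d = e for d = (dx, dy).

    Assumes J(x) ~= I(x - d), i.e. the patch seen in I appears shifted by d in J.
    Illustrative helper only; not the OpenCV implementation.
    """
    I = I.astype(np.float64)
    J = J.astype(np.float64)
    gy, gx = np.gradient(I)  # gradient of I along rows (y) and columns (x)

    r, c = center
    k = half_window
    win = (slice(r - k, r + k + 1), slice(c - k, c + k + 1))

    g = np.stack([gx[win].ravel(), gy[win].ravel()], axis=1)  # (n, 2) gradients
    diff = (I[win] - J[win]).ravel()                          # I - J per pixel

    G = g.T @ g      # 2x2 matrix: sum of g g^T over the window
    e = g.T @ diff   # 2-vector: sum of (I - J) g over the window
    return np.linalg.solve(G, e)
```

If the window contains too little texture, G is nearly singular and `np.linalg.solve` either fails or returns an unreliable d. The minimum eigenvalue test on G is meant to avoid that case.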
This algorithm uses the implementation of the KLT found in the OpenCV computer
vision library.
Classifying Feature Movement
The movement of the KLT features is classified by finding an affine transform for all
features over the entire image. The KLT features whose residual with respect to this
transform exceeds some threshold are assumed to be located on a moving object.
All of the features judged to be moving are shown in Figure 3.
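One way to realize this classification is a least-squares affine fit followed by a residual test, sketched below. The text does not say how the global transform is estimated, so the least-squares fit, the function names, and the threshold value are all assumptions.

```python
import numpy as np

def fit_affine(src, dst):
    """Least-squares 2x3 affine A such that dst ~= A[:, :2] @ p + A[:, 2]."""
    X = np.hstack([src, np.ones((len(src), 1))])      # (n, 3) rows of [x, y, 1]
    coeffs, *_ = np.linalg.lstsq(X, dst, rcond=None)  # (3, 2) solution
    return coeffs.T

def classify_moving(prev_pts, next_pts, threshold=2.0):
    """Flag features whose motion disagrees with the global affine transform.

    Returns a boolean mask (True = judged to be moving) and the residuals in
    pixels. Hypothetical helper; the threshold is an arbitrary example value.
    """
    A = fit_affine(prev_pts, next_pts)
    predicted = prev_pts @ A[:, :2].T + A[:, 2]
    residual = np.linalg.norm(next_pts - predicted, axis=1)
    return residual > threshold, residual
```

This works when most features lie on the static background, so that the moving features have little influence on the fit. If moving features made up a large fraction of the image, a robust estimator would be a safer choice.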
The moving KLT features are clustered by finding groups of features in which each
member lies within some threshold distance of another member of the group. This
clustering is accomplished by finding a Delaunay triangulation for the moving
points, and then recursively traversing the Delaunay triangulation without traversing an
edge whose length exceeds the threshold distance.
Figure 4 shows a Delaunay triangulation of the moving features shown in Figure 3. The
clustering algorithm uses the Delaunay triangulation by selecting a random vertex, and
traversing all adjacent vertices where an adjacent vertex is connected by an edge whose
length is less than some threshold.
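The traversal can be sketched with `scipy.spatial.Delaunay`. This is an assumed reconstruction rather than the original code: it visits seed vertices in index order instead of at random, which yields the same clusters, and it uses an explicit stack in place of recursion.

```python
import numpy as np
from scipy.spatial import Delaunay

def cluster_by_delaunay(points, max_edge):
    """Label points so that each cluster is connected by short Delaunay edges.

    Needs at least three points that are not all collinear. Returns an integer
    label per point. Illustrative helper; names are not from the original work.
    """
    points = np.asarray(points, dtype=np.float64)
    tri = Delaunay(points)

    # Adjacency lists restricted to edges no longer than max_edge.
    neighbors = [set() for _ in range(len(points))]
    for simplex in tri.simplices:
        for a, b in ((0, 1), (1, 2), (0, 2)):
            i, j = int(simplex[a]), int(simplex[b])
            if np.linalg.norm(points[i] - points[j]) <= max_edge:
                neighbors[i].add(j)
                neighbors[j].add(i)

    labels = np.full(len(points), -1)
    current = 0
    for seed in range(len(points)):
        if labels[seed] != -1:
            continue  # already reached from an earlier seed
        labels[seed] = current
        stack = [seed]
        while stack:  # depth-first traversal over the short edges only
            v = stack.pop()
            for w in neighbors[v]:
                if labels[w] == -1:
                    labels[w] = current
                    stack.append(w)
        current += 1
    return labels
```

Isolated features end up in clusters of size one or two. A caller could discard clusters below some minimum size to suppress outlying vertices.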
Figure 5 shows Figure 4’s vertices without any edges. The vertices have been clustered
into two primary regions: the green vertices in the lower left and the red vertices in the
upper right. There are a few outlying vertices near the upper right hand cluster.
Tracking an Object’s Movement
The location of moving objects in the video may be interpreted as the centroid of the
moving KLT feature clusters. However, finding the center of motion of a particular
image region is not a very good method of tracking an object. It only provides
evidence that a vehicle probably exists near that location.
The moving cluster centroid does not stay fixed with respect to the object
which produced it.
A better method of updating the location of a moving object is to use the local KLT
features to find a transform which maps the current KLT features to features in a
subsequent frame. The resulting transform is then used to update the moving object’s
location. Because the transform is not dependent on whether an object is moving,
the transform may be used to update motionless objects as well.
This particular algorithm uses a 2D affine transform for no reason other than ease of use.
Random Sample Consensus Affine Transform Estimate
The local affine transform is found by Random Sample Consensus (RANSAC).
RANSAC repeatedly selects three KLT features within some threshold distance of a
current track location. These three points are used to find a hypothesis for the local
affine transform. RANSAC then selects the affine transform hypothesis which has the most
support among the local KLT features.
If a reasonable affine transform with a small residual is found at a current track location,
the transform is used to update the position and velocity of the track. Figure 6 shows
four vehicles being tracked.
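The RANSAC estimate and the track update can be sketched as follows. The search radius, inlier tolerance, iteration count, and all function names are illustrative assumptions, not values taken from the original implementation.

```python
import numpy as np

def affine_from_three(src, dst):
    """Exact 2x3 affine mapping three src points onto three dst points."""
    X = np.hstack([src, np.ones((3, 1))])
    return np.linalg.solve(X, dst).T  # raises LinAlgError for collinear points

def ransac_affine(prev_pts, next_pts, track_xy, radius=40.0,
                  inlier_tol=1.0, iterations=200, seed=0):
    """Return the affine hypothesis with the most support near track_xy."""
    rng = np.random.default_rng(seed)
    near = np.linalg.norm(prev_pts - track_xy, axis=1) <= radius
    src, dst = prev_pts[near], next_pts[near]
    if len(src) < 3:
        return None, 0

    best_A, best_support = None, 0
    for _ in range(iterations):
        idx = rng.choice(len(src), size=3, replace=False)
        try:
            A = affine_from_three(src[idx], dst[idx])
        except np.linalg.LinAlgError:
            continue  # degenerate sample, draw again
        err = np.linalg.norm(src @ A[:, :2].T + A[:, 2] - dst, axis=1)
        support = int(np.sum(err <= inlier_tol))  # features agreeing with A
        if support > best_support:
            best_A, best_support = A, support
    return best_A, best_support

def update_track(track_xy, A):
    """Move the track with the transform; velocity is displacement per frame."""
    new_xy = A[:, :2] @ track_xy + A[:, 2]
    return new_xy, new_xy - track_xy
```

A caller would typically accept the hypothesis only if its support is a large enough fraction of the local features, and otherwise leave the track unchanged for that frame.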
The video frame shown in Figure 8 demonstrates some of the difficulties encountered
when tracking in the presence of deep shadows. The tracked regions of the vehicles tend
to be centered on the bright side away from the shadows.
B. Delaunay. Sur la sphère vide. Izvestia Akademii Nauk SSSR, Otdelenie
Matematicheskikh i Estestvennykh Nauk.
J. Shi and C. Tomasi. Good features to track. In Proc. IEEE International Conf. on
Computer Vision and Pattern Recognition (CVPR). IEEE Press, 1994.
OpenCV Open Source Computer Vision Library.