Monocular 3D Human Tracking


Nov 13, 2013


Cristian Sminchisescu
(University of Toronto)

Bill Triggs
(INRIA Rhône-Alpes)


Kinematic Jump Processes for
Monocular 3D Human Tracking

Goal: track human body motion in monocular
video and estimate 3D joint motion



Why Monocular?



Movies, archival footage



Resynthesis, e.g. change point of view or actor



Tracking / interpretation of actions & gestures (HCI)



How do humans do this so well?

Overall Modeling Approach

1.
Generative Human Model


Kinematics, geometry, photometry


Predicts images or descriptors


Priors and anatomical constraints


2.
Model-image matching cost function


Robust, probabilistically motivated


Contour and intensity based


3.
Tracking by search / optimization


Discovers well-supported configurations of the matching cost
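
As an illustration of what "robust" means for the matching cost in item 2, here is a minimal sketch. It assumes a Lorentzian-style error function, which is a common robust choice but is not stated on the slides; the names `lorentzian`, `matching_cost` and `sigma` are hypothetical.

```python
import math

def lorentzian(residual, sigma):
    """Robust error: grows like residual**2 near zero, only logarithmically far away."""
    return math.log(1.0 + 0.5 * (residual / sigma) ** 2)

def matching_cost(predicted, observed, sigma=1.0):
    """Sum of robust errors between model-predicted and observed image features.

    `predicted` and `observed` are equal-length sequences of scalar
    descriptors (an assumption made to keep the sketch small).
    """
    return sum(lorentzian(p - o, sigma) for p, o in zip(predicted, observed))
```

A single gross outlier then contributes a bounded amount rather than dominating the sum, as it would with a squared error.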


Why is 3D-from-monocular hard?

Image matching ambiguities

Depth ambiguities

Violations of physical constraints

How many local minima are there?

Thousands, even without image matching ambiguities …

Examples of Kinematic Ambiguities


Minima are separated by large distances in parameter space
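
A short calculation suggests where a count of this size can come from. As a sketch, assume a scaled orthographic camera looking along the depth axis z (the slides do not state the camera model) and a rigid link of known length l whose two end joints project to image points a distance d apart. The symbols l, d, Δz and n are introduced here for illustration only.

```latex
\Delta z \;=\; \pm\sqrt{\,l^{2} - d^{2}\,}, \qquad d \le l
\qquad\Longrightarrow\qquad
\#\{\text{configurations with identical joint projections}\} \;\le\; 2^{\,n}
\quad \text{for } n \text{ links}
```

Each link can tilt towards or away from the camera with the same image projection, so n = 10 to 12 links already give 1024 to 4096 candidates, before physical constraints prune some of them.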

Monocular 3D Tracking Methods


CONDENSATION (discrete, motion models)


Deutscher et al.’00: annealing, walking


Sidenbladh et al.’00,02: importance sampling (walking + snippets)



CSS, ET/HS/Hyperdynamics (continuous, cost-sensitive)


Sminchisescu&Triggs’01,02

Covariance Scaled Sampling (CSS)

Hyperdynamics

Hypersurface Sweeping (HS)

Search Globality and Adaption


Cost-sensitive continuous search methods are


Efficient - avoid the large wastage factors of random sampling


Generic - no assumptions on known motions


Focus on locating transition states and nearby minima



But


Still local (i.e. sometimes myopic)


Minima are typically far in parameter space


No knowledge of global long-range minimum structure



Want to search quasi-globally, yet preserve generality


Can we find other minima more efficiently by exploiting intrinsic problem structure?

Kinematic Jump Sampling


For any given model configuration, we can explicitly build the interpretation tree of alternative kinematic solutions with identical joint projections


Work outwards from the root of the kinematic tree, recursively evaluating forward/backward ‘flips’ for each body part


Alternatively, sample by generating flips randomly


… or, for tracking, sample shallowly and treat each limb quasi-independently
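
The flip enumeration described above can be sketched in a few lines. This is an illustration under stated assumptions, not the authors' implementation: it assumes an orthographic camera looking along z (so a flip simply negates a link's depth offset), a single chain rather than a full body tree, and hypothetical names `enumerate_flips`, `random_flip` and `link_lengths`.

```python
import math
import random
from itertools import product

def link_lengths(chain):
    """3D length of every link in a chain of (x, y, z) joint centres."""
    return [math.dist(a, b) for a, b in zip(chain, chain[1:])]

def _apply_signs(chain, signs):
    """Rebuild the chain from the root, flipping link k in depth when signs[k] == -1."""
    alt = [chain[0]]
    z = chain[0][2]
    for k, sign in enumerate(signs):
        z += sign * (chain[k + 1][2] - chain[k][2])  # forward or backward flip
        x, y, _ = chain[k + 1]                       # image projection is unchanged
        alt.append((x, y, z))
    return alt

def enumerate_flips(chain):
    """All 2**n leaves of the interpretation tree for an n-link chain (root first).

    A link with zero depth offset yields duplicate leaves.
    """
    n = len(chain) - 1
    return [_apply_signs(chain, signs) for signs in product((1, -1), repeat=n)]

def random_flip(chain, rng=random):
    """Sample one leaf by flipping each link independently with probability 1/2."""
    signs = [rng.choice((1, -1)) for _ in range(len(chain) - 1)]
    return _apply_signs(chain, signs)
```

Every returned chain keeps the same (x, y) joint projections and the same link lengths as the input, which is what makes the alternatives indistinguishable from joint projections alone. Sampling shallowly, as suggested for tracking, corresponds to calling this on short sub-chains instead of the whole body.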

Efficient Inverse Kinematics


The inverse kinematics is simple, efficient to solve


Constrained by many observations (3D articulation centers)


The quasi-spherical articulation of the body


Mostly in closed form



The iterative solution is also very competitive


Optimize over model-hypothesized 3D joint assignments


One local optimization per new minimum found


An adaptive diffusion method (CSS) is necessary for correspondence ambiguities
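
To illustrate why known 3D articulation centers make the inverse kinematics largely closed form, here is a minimal sketch for a single limb. The spherical parameterization (azimuth in the x-y plane, elevation towards z) and the names `limb_angles` and `limb_endpoint` are assumptions made for this example; the slides do not specify the body model's actual joint parameterization.

```python
import math

def limb_angles(parent, child):
    """Closed-form (length, azimuth, elevation) of the limb from parent to child.

    `parent` and `child` are (x, y, z) joint centres.
    """
    dx, dy, dz = (c - p for p, c in zip(parent, child))
    length = math.sqrt(dx * dx + dy * dy + dz * dz)
    if length == 0.0:
        raise ValueError("coincident joint centres: limb direction undefined")
    azimuth = math.atan2(dy, dx)
    # Clamp guards against |dz / length| exceeding 1 through rounding error.
    elevation = math.asin(max(-1.0, min(1.0, dz / length)))
    return length, azimuth, elevation

def limb_endpoint(parent, length, azimuth, elevation):
    """Forward kinematics for the same parameterization (inverse of limb_angles)."""
    px, py, pz = parent
    horizontal = length * math.cos(elevation)
    return (px + horizontal * math.cos(azimuth),
            py + horizontal * math.sin(azimuth),
            pz + length * math.sin(elevation))
```

With z taken as the depth axis of an orthographic camera, negating the elevation reproduces a forward/backward flip: the end joint keeps its x and y but mirrors its depth about the parent joint.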

The KJS Algorithm

[Flow diagram: p_{t-1}, with modes m_i, is propagated to p_t through the steps below; a vote over the candidate sampling chains C_1 ... C_M drives chain selection.]

C=SelectSamplingChain(m_i)

s=CovarianceScaledSampling(m_i)

S=BuildInterpretationTree(s,C)

E=InverseKinematics(S)

Prune and locally optimize E
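
The control flow of one KJS step can be sketched as follows. This is a toy under loud assumptions, not the authors' tracker: every stage is passed in as a callable, the state is a single number, the cost has two mirror-image minima at +1 and -1 standing in for a forward/backward flip, and the CSS scaling factor is applied to the standard deviation (the slides do not say whether it scales the covariance or the standard deviation). All names below are hypothetical.

```python
import random

def kjs_step(modes, select_chain, sample, build_tree, solve_ik, refine, cost, keep):
    """One tracking step: modes of p_{t-1} in, `keep` refined hypotheses out."""
    hypotheses = []
    for mode in modes:
        chain = select_chain(mode)            # C = SelectSamplingChain(m_i)
        s = sample(mode)                      # s = CovarianceScaledSampling(m_i)
        tree = build_tree(s, chain)           # S = BuildInterpretationTree(s, C)
        hypotheses.extend(solve_ik(tree))     # E = InverseKinematics(S)
    hypotheses.sort(key=cost)                 # prune: keep the lowest-cost candidates
    return [refine(e) for e in hypotheses[:keep]]  # locally optimize the survivors

# --- Toy stand-ins, all hypothetical ---

def toy_cost(x):
    """Two equally good minima at x = +1 and x = -1."""
    return (x * x - 1.0) ** 2

def toy_refine(x, step=0.05, iters=200):
    """Plain gradient descent on toy_cost."""
    for _ in range(iters):
        x -= step * 4.0 * x * (x * x - 1.0)
    return x

def demo(seed=0, scale=4.0):
    rng = random.Random(seed)
    modes = [(0.9, 0.05)]  # one mode as (mean, standard deviation), near x = +1
    return kjs_step(
        modes,
        select_chain=lambda mode: "whole state",
        sample=lambda mode: rng.gauss(mode[0], scale * mode[1]),
        build_tree=lambda s, chain: [s, -s],  # the sample and its mirror image
        solve_ik=lambda tree: tree,           # identity in this 1D toy
        refine=toy_refine,
        cost=toy_cost,
        keep=2,
    )
```

Starting from a single mode near +1, `demo()` returns one hypothesis near +1 and one near -1. The second minimum is reached by the explicit jump rather than by diffusion alone, which is the point of combining the two.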

Tracking Experiments


4s agile dancing sequence, 25 frames per second


Cluttered background, self-occlusion, motion in depth


Automatically select kinematic jump samples (KJS) from short 3-link chains (rooted at hips, shoulders, neck)


8 modes, CSS diffusion with scaling 4

Jump Sampling in Action

Quantitative Search Statistics


Initialize in one minimum, different sampling regimes


Improved minima localization by KJS


Local optimization often not necessary

Summary


Kinematic Jump Sampling

Algorithm


Construct interpretation trees of 3D joint positions corresponding to monocular kinematic ambiguities


Solve efficiently using closed-form inverse kinematics


Highly accurate hypothesis generator for long-range search


Local optimization polishing often unnecessary


Explicit kinematic jumps + cost-sensitive sampling


Address both depth and image matching ambiguities


Future work


Scene constraints (ground plane, equilibrium)


Jump strategies for image matching


Prior knowledge (Sminchisescu&Jepson03 upcoming)

The End
