# Learning Human Pose and Motion Models for Animation

Oct 15, 2013


Aaron Hertzmann

University of Toronto

Animation is maturing …

… but it’s still hard to create

Keyframe animation


[Figure: keyframe poses q_1, q_2, q_3 interpolated into a continuous trajectory q(t)]
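As a concrete (toy) illustration, keyframe animation reduces to evaluating an interpolating curve q(t) through the keyframe poses. The sketch below uses piecewise-linear interpolation on a single made-up DOF; real systems use splines over hundreds of controls:

```python
import numpy as np

# Keyframes: times t_i and poses q_i (a single joint angle here;
# the values are illustrative, not from the talk).
key_times = np.array([0.0, 1.0, 2.0])
key_poses = np.array([0.0, 1.2, 0.4])   # q_1, q_2, q_3

def q(t):
    """Piecewise-linear interpolation through the keyframes: q(t)."""
    return np.interp(t, key_times, key_poses)

# The animation system evaluates q(t) at every frame time.
frames = q(np.linspace(0.0, 2.0, 5))
```
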

Characters are very complex

Woody:

- 200 facial controls
- 700 controls in his body

http://www.pbs.org/wgbh/nova/specialfx2/mcqueen.html

Motion capture

[Images from NYU and UW]


Mocap is not a panacea

Goal: model human motion

What motions are likely?

Applications:

Computer animation

Computer vision

Related work: physical models

Accurate, in principle

Too complex to work with

(but see [Liu, Hertzmann, Popović 2005])

Computationally expensive

Related work: motion graphs

Input: raw motion capture

“Motion graph”

(slide from J. Lee)

Approach: statistical models of motions

Learn a PDF over motions, and synthesize
from this PDF
[Brand and Hertzmann 1999]
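A minimal sketch of the learn-then-synthesize loop, assuming toy data: fit the simplest possible PDF over poses (a single Gaussian) and sample new poses from it. The talk's models are far richer, but the loop is the same:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "mocap" pose data: 100 poses with 5 DOFs (a stand-in for real data).
Y = rng.normal(size=(100, 5)) * np.array([1.0, 0.5, 2.0, 0.1, 1.0])

# Learn the PDF: here, just a Gaussian fit to the poses.
mean = Y.mean(axis=0)
cov = np.cov(Y, rowvar=False)

# Synthesize new poses by sampling from the learned PDF.
samples = rng.multivariate_normal(mean, cov, size=10)
```
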

What PDF do we use?

Style-Based Inverse Kinematics

with: Keith Grochow, Steve
Martin, Zoran Popović

Motivation

Body parameterization

Pose at time t: q_t

- Root position/orientation (6 DOFs)
- Joint angles (29 DOFs)

Motion

X = [q_1, …, q_T]

Forward kinematics

Pose to 3D positions: q_t → [x_i, y_i, z_i]_t (FK)
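A minimal FK sketch for a planar two-joint chain (made-up link lengths; a real skeleton chains many more transforms):

```python
import numpy as np

def fk(q, lengths=(1.0, 1.0)):
    """Forward kinematics for a planar 2-joint chain:
    maps joint angles q = (q1, q2) to the 2D position of each joint."""
    q1, q2 = q
    # First joint rotates about the origin.
    p1 = np.array([np.cos(q1), np.sin(q1)]) * lengths[0]
    # Second joint's angle is relative to the first link.
    p2 = p1 + np.array([np.cos(q1 + q2), np.sin(q1 + q2)]) * lengths[1]
    return p1, p2

p1, p2 = fk((0.0, np.pi / 2))
```
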

Problem Statement

Generate a character pose based on a chosen
style subject to constraints

Constraints

Degrees of freedom (DOFs): q

Real-time Pose Synthesis

Off-Line Learning

Approach

[Diagram: Motion → Learning → Style (off-line); Style + Constraints → Synthesis → Pose (real-time)]

y(q) = [ q, orientation(q), velocity(q) ]
     = [ q_0 q_1 q_2 … r_0 r_1 r_2 … v_0 v_1 v_2 … ]

Features
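The feature vector can be sketched as a simple concatenation. The dimensions and the finite-difference velocity below are illustrative assumptions, not the talk's exact definitions:

```python
import numpy as np

def features(q_t, q_prev, root_orientation, dt=1.0 / 120.0):
    """Feature vector y(q) = [joint angles, root orientation, velocities].
    Velocity is approximated by finite differences between frames."""
    velocity = (q_t - q_prev) / dt
    return np.concatenate([q_t, root_orientation, velocity])

# Toy pose with 4 joint-angle DOFs (the talk uses 29).
q_prev = np.zeros(4)
q_t = np.array([0.1, 0.2, 0.0, -0.1])
r = np.array([0.0, 0.0, 1.0])   # illustrative root-orientation features
y = features(q_t, q_prev, r)
```
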

Goals for the PDF

Learn PDF from any data

Smooth and descriptive

Minimal parameter tuning

Real-time synthesis

Mixtures-of-Gaussians

GPLVM

[Figure: latent space (x_1, x_2) mapped to feature space (y_1, y_2, y_3)]

Gaussian Process Latent Variable
Model [Lawrence 2004]

x ~ N(0, I)
y ~ GP(x; β)

Learning: arg max p(X, β | Y) ∝ p(Y | X, β) p(X) p(β)
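A sketch of the quantity being optimized, assuming toy data: the negative log of the GP marginal likelihood of features Y given latent positions X under an RBF kernel. Learning would minimize this over X and the hyperparameters; the optimizer itself is omitted:

```python
import numpy as np

def rbf_kernel(X, beta=(1.0, 1.0, 1e-2)):
    """RBF kernel matrix; beta = (signal, width, noise) hyperparameters."""
    s, w, noise = beta
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    return s * np.exp(-0.5 * d2 / w ** 2) + noise * np.eye(len(X))

def neg_log_likelihood(X, Y, beta=(1.0, 1.0, 1e-2)):
    """-log p(Y | X, beta) up to an additive constant:
    D/2 log|K| + 1/2 tr(K^-1 Y Y^T)."""
    K = rbf_kernel(X, beta)
    D = Y.shape[1]
    sign, logdet = np.linalg.slogdet(K)
    return 0.5 * D * logdet + 0.5 * np.trace(np.linalg.solve(K, Y @ Y.T))

rng = np.random.default_rng(1)
Y = rng.normal(size=(20, 6))   # 20 poses, 6 features (toy data)
X = rng.normal(size=(20, 2))   # 2D latent coordinates (to be optimized)
nll = neg_log_likelihood(X, Y)
```
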

Scaled Outputs

Different DOFs have different
“importances”

Solution:

RBF kernel function k(x, x′)

k_i(x, x′) = k(x, x′) / w_i^2

Equivalently: learn x → W y, where W = diag(w_1, w_2, …, w_D)
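The equivalence is easy to see in code: dividing output i's kernel by w_i^2 is the same as modeling the scaled features W y. The weights below are illustrative:

```python
import numpy as np

# Per-DOF weights w_i: a larger weight means that DOF matters more.
w = np.array([1.0, 2.0, 0.5])
W = np.diag(w)

# Two toy feature vectors (rows).
Y = np.array([[0.1, 0.2, 0.3],
              [0.4, 0.5, 0.6]])

# Using kernel k_i(x, x') = k(x, x') / w_i^2 per output is equivalent
# to learning the GP on the scaled features W y instead of y.
Y_scaled = Y @ W.T
```
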

Precision in Latent Space

σ^2(x)

SGPLVM Objective Function

[Figure: latent space (x_1, x_2) and feature space (y_1, y_2, y_3)]

Baseball Pitch

Track Start

Jump Shot

Style interpolation

Given two styles p_1 and p_2, can we “interpolate” them?

Approach: interpolate in the log-domain

Style interpolation: p(q) ∝ (1 − s) p_1(q) + s p_2(q)

Style interpolation in log space: p(q) ∝ p_1(q)^(1−s) p_2(q)^s
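A toy sketch of the log-domain blend, using isotropic Gaussians as stand-in style models (the talk's style models are SGPLVMs): the log-space interpolation averages log-densities, i.e. multiplies the densities raised to powers (1−s) and s:

```python
import numpy as np

def log_gauss(q, mu, var):
    """Log-density of an isotropic Gaussian (toy stand-in for a style model)."""
    return (-0.5 * np.sum((q - mu) ** 2) / var
            - 0.5 * len(q) * np.log(2 * np.pi * var))

def interpolated_log_style(q, s, style1, style2):
    """Log-space interpolation: log p = (1-s) log p1 + s log p2,
    i.e. p ∝ p1^(1-s) * p2^s."""
    return ((1 - s) * log_gauss(q, *style1)
            + s * log_gauss(q, *style2))

# Two illustrative "styles": (mean pose, variance).
style1 = (np.array([0.0, 0.0]), 1.0)
style2 = (np.array([2.0, 2.0]), 1.0)

# At s = 0.5 the blended density peaks halfway between the two means.
mid = interpolated_log_style(np.array([1.0, 1.0]), 0.5, style1, style2)
```
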

Interactive Posing


Multiple motion styles

Real-time Motion Capture

Style Interpolation

Trajectory Keyframing

Posing from an Image

Modeling motion

GPLVM doesn’t model motions

Velocity features are a hack

How do we model and learn dynamics?

Gaussian Process
Dynamical Models

with: David Fleet, Jack Wang

Dynamical models

x_t → x_{t+1}

Hidden Markov Model (HMM)

Linear Dynamical Systems (LDS)

[van Overschee et al ‘94; Doretto et al ‘01]

Switching LDS

[Ghahramani and Hinton ’98; Pavlovic et
al ‘00; Li et al ‘02]

Nonlinear Dynamical Systems

[e.g., Ghahramani and Roweis ‘00]


Gaussian Process Dynamical Model
(GPDM)

Marginalize out the mapping parameters, and then optimize the latent positions to simultaneously minimize pose reconstruction error and (dynamic) prediction error on the training data.

Latent dynamical model:

x_t = f(x_{t-1}; A) + n_{x,t}   (latent dynamics)
y_t = g(x_t; B) + n_{y,t}   (pose reconstruction)

Assume IID Gaussian noise, with Gaussian priors on A and B.

Dynamics

The latent dynamic process on X = [x_1, …, x_T] has a similar (subspace dynamical model) form:

p(X | ᾱ) ∝ exp( −½ tr( K_X⁻¹ X_out X_outᵀ ) ),   X_out = [x_2, …, x_T]

where K_X is a kernel matrix defined by kernel function k_X(x, x′) with hyperparameters ᾱ.

Markov Property

Remark: conditioned on the dynamics parameters, the dynamical model is 1st-order Markov, but the marginalization introduces longer temporal dependence.

Learning

To estimate the latent coordinates and kernel parameters, we minimize the negative log of the GPDM posterior with respect to X and the hyperparameters.

GPDM posterior:

p(X, ᾱ, β̄ | Y) ∝ p(Y | X, β̄) p(X | ᾱ) p(ᾱ) p(β̄)

Y: training motions; X: latent trajectories; ᾱ, β̄: hyperparameters
p(Y | X, β̄): reconstruction likelihood; p(X | ᾱ): dynamics likelihood; p(ᾱ), p(β̄): priors

Motion Capture Data

~2.5 gait cycles (157 frames)

Learned latent coordinates (1st-order prediction, RBF kernel)

56 joint angles + 3 global translational velocities + 3 global orientations, from the CMU motion capture database

3D GPLVM Latent
Coordinates

large “jumps” in latent space

Reconstruction Variance

Volume visualization of the reconstruction variance σ^2(x)
(1st-order prediction, RBF kernel)

Motion Simulation

Animation of mean motion (200-step sequence)

initial state

Random trajectories from MCMC
(~1 gait cycle, 60 steps)

Simulation: 1st-Order Mean Prediction

Red: 200 steps of mean prediction

Green: 60-step MCMC mean
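Mean-prediction simulation can be sketched by iterating a mean update x_{t+1} = f(x_t) from an initial latent state. Here f is a hand-built toy rotation that traces a cyclic, gait-like latent trajectory, an assumption for illustration; the GPDM's mean comes from the GP posterior:

```python
import numpy as np

def mean_prediction(x):
    """Toy stand-in for learned mean dynamics: a small rotation in a
    2D latent space, completing one "gait cycle" every 60 steps."""
    theta = 2 * np.pi / 60
    R = np.array([[np.cos(theta), -np.sin(theta)],
                  [np.sin(theta),  np.cos(theta)]])
    return R @ x

# Simulate 200 steps of mean prediction from an initial latent state.
x = np.array([1.0, 0.0])
trajectory = [x]
for _ in range(200):
    x = mean_prediction(x)
    trajectory.append(x)
trajectory = np.array(trajectory)
```
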

Animation

Missing Data

50 of 147 frames dropped
(almost a full gait cycle)

spline interpolation

Missing Data: RBF Dynamics

Determining hyperparameters

GPDM

Neil’s parameters

MCEM

Data: six distinct walkers

Where do we go from here?

Let’s look at some limitations of the model

[Figure: 60 Hz vs. 120 Hz]

What do we want?

[Figure: a walk cycle in latent space (x_1, x_2), with axes for phase and variation]

Branching motions

Walk

Run

Stylistic variation

Current work: manifold GPs

[Figure: manifold GP mapping between latent space (x) and data space (y)]

Summary

GPLVM and GPDM provide priors from
small data sets

Dependence on initialization, hyperpriors,
latent dimensionality

Open problems: modeling data topology and stylistic variation