Dynamic Markov Random Fields


P.H.S. Torr

Department of Computing

Oxford Brookes University, UK



Markov Random Fields, Structure from Motion, Segmentation,
Pose Estimation, Blending online and offline


In this talk I will outline some of the recent work undertaken
by the Oxford Brookes Vision Group. A common theme
underlying much of the research is to cast vision problems in
terms of combinatorial optimization, which provides a rich and
deep theory for understanding them, with many new and
exciting results.



As is the nature of a keynote talk, much of the work has been
published before, so here I will only provide a brief overview
of the work to be presented, together with references for the
interested reader and some high-level points related to the [...]

The structure of the talk is as follows: first I will talk about
the classical SFM problem, and show how it can be solved by
graph cuts. However, this is very slow and of course can still [...]

Computer vision is a difficult task and might not be solved
for a long time, so it becomes interesting to ask how the
human might be included in the loop. If we are to design
systems that are truly useful, then instead of aiming for a fully
automated system (which is still some way in the future),
another philosophy might be to think of computer vision as a
“power assist” for a human operator, dealing with the dull
tasks and allowing the human to add the intelligence for the
difficult tasks. Human time is the valuable resource and the
research should be aimed at conserving it.

This is the philosophy behind VideoTrace. The assumption is
that, given the input video, we have some allotted time
beforehand to precompute whatever is needed for the task in
hand, in order to minimize the amount of user interaction
necessary. The interesting research is in the blending of what
should be computed offline and what should be done online,
and in what algorithms can be exploited or developed to help
the meeting of offline and online information.

This leads to our recent research on dynamic graph cuts, one
such algorithm that can combine offline and online
information in an optimal manner. Given a solution to one
graph cut, if the graph is slightly altered (say by user
interaction), a new graph cut may be found highly efficiently.

Our work on Dynamic Graph Cuts led to other fruitful
applications, in particular in segmentation, recognition and
pose estimation.

It should be no surprise that there is a strong link between 3D
reconstruction (which can be seen as 3D segmentation) and
2D image segmentation, and graph cuts have been exploited
for this purpose by Boykov and Jolly, who suggested an
ingenious use of graph cuts for interactive segmentation.

In the spirit of the philosophy suggested above, we would like
to provide a “power up” to the user to make the task of image
segmentation/editing easier. Typically a user is concerned
with editing semantically meaningful objects in a scene, e.g.
faces, dogs etc., and so it would seem logical to seek to
combine low-level segmentation methods such as graph
cuts with higher-level information provided by object
recognition. This is one of the original vision problems: how
to combine top-down and bottom-up information. One
solution is given by our ObjCut method. Another is given by
PoseCut, in which the dynamic graph cut is used to make the
algorithm efficient and to provide an estimate of the pose of
the object in addition to the segmentation.

Next, brief abstracts are provided for each of the techniques.

2 The Offline: VogCuts, Volumetric Graph Cuts for Multi View SFM

First I shall discuss a new volumetric formulation for the
multi-view stereo problem which is amenable to a
computationally tractable global optimisation using graph
cuts. Our approach is to seek the optimal partitioning of 3D
space into two regions labelled as ‘object’ and ‘empty’ under
a cost functional consisting of the following two terms: (1) a
term that forces the boundary between the two regions to pass
through photo-consistent locations and (2) a ballooning term
that inflates the ‘object’ region. To take account of the effect
of occlusion on the first term we use an occlusion robust
photo-consistency metric based on Normalised
Cross-Correlation, which does not assume any geometric
knowledge about the reconstructed object. The globally
optimal 3D partitioning can be obtained as the minimum cut
solution of a weighted graph.
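One way to write such a cost functional, as a sketch only: let S be the boundary surface between the ‘object’ and ‘empty’ regions, V(S) the volume it encloses, ρ(x) ≥ 0 a photo-consistency cost that is small where the point x is photo-consistent, and λ > 0 a ballooning weight. The symbols S, V(S), ρ and λ are notation assumed here for illustration and do not appear in the text above.

```latex
E(S) \;=\;
\underbrace{\iint_{S} \rho(\mathbf{x})\, \mathrm{d}A}_{\text{(1) photo-consistency term}}
\;-\;
\underbrace{\lambda \iiint_{V(S)} \mathrm{d}V}_{\text{(2) ballooning term}},
\qquad \rho(\mathbf{x}) \ge 0,\; \lambda > 0 .
```

Minimising the first term alone would favour an empty surface, since ρ is non-negative; the second term lowers the cost as the enclosed volume grows, so the minimum is a trade-off in which the boundary settles on locations of low ρ.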


3 The Online: VideoTrace, ideas for interactive SFM.

Based on offline computations, such as those described above,
one can now refine the structure using online computation.

VideoTrace is a system for interactively generating realistic
models of objects from video, models that might be
inserted into a video game, a simulation environment, or
another video sequence. The user interacts with VideoTrace
by tracing the shape of the object to be modelled over one or
more frames of the video. By interpreting the sketch drawn by
the user in light of 3D information obtained from computer
vision techniques, a small number of simple 2D interactions
can be used to generate a realistic 3D model. Each of the
sketching operations in VideoTrace provides an intuitive and
powerful means of modelling shape from video, and executes
quickly enough to be used interactively. Immediate feedback
allows the user to model rapidly those parts of the scene
which are of interest and to the level of detail required. The
combination of automated and manual reconstruction allows
VideoTrace to model parts of the scene not visible, and to
succeed in cases where automated approaches would fail.


4 Merging Offline and Online: Dynamic Graph Cuts

We present a fast new fully dynamic algorithm for the
st-mincut/max-flow problem. We show how this algorithm can
be used to efficiently compute MAP solutions for certain
dynamically changing MRF models in computer vision such
as image segmentation. Specifically, given the solution of the
max-flow problem on a graph, the dynamic algorithm
efficiently computes the maximum flow in a modified version
of the graph. The time taken by it is roughly proportional to
the total amount of change in the edge weights of the graph.
Our experiments show that, when the number of changes in
the graph is small, the dynamic algorithm is significantly
faster than the best known static graph cut algorithm. We test
the performance of our algorithm on one particular problem:
the object-background segmentation problem for video. It
should be noted that the application of our algorithm is not
limited to the above problem; the algorithm is generic and can
be used to yield similar improvements in many other cases
that involve dynamic change.
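The idea of reusing a previous max-flow solution can be illustrated with a small Python sketch. This is not the algorithm of the paper: it swaps in plain breadth-first augmenting paths (Edmonds-Karp), supports changes to terminal links only, and keeps the old flow feasible by adding the same constant to both terminal links of a node, which raises every cut by that constant and so leaves the minimum cut unchanged. The class `DynamicCut`, its methods, the helper `build` and the four-pixel chain are all hypothetical names and toy data chosen for illustration.

```python
from collections import deque


class DynamicCut:
    """Toy dynamic st-mincut for a binary MRF (illustrative names throughout).

    Nodes 0..n-1 are pixels; n is the source s and n+1 the sink t.
    Residual capacities survive between calls to solve(), so a re-solve
    after a small change only pushes the flow that is still missing.
    """

    def __init__(self, n):
        self.n, self.s, self.t = n, n, n + 1
        self.res = [dict() for _ in range(n + 2)]  # res[u][v]: residual capacity
        self.cap_s = [0] * n    # stored capacity of s -> i
        self.cap_t = [0] * n    # stored capacity of i -> t
        self.shift = [0] * n    # constant added to both t-links of node i
        self.flow = 0           # total flow pushed so far
        self.pushes = 0         # number of augmenting paths used so far
        for i in range(n):
            for u, v in ((self.s, i), (i, self.t)):
                self.res[u][v] = 0
                self.res[v][u] = 0

    def add_nlink(self, i, j, w):
        """Symmetric pairwise link; this sketch keeps n-links fixed."""
        self.res[i][j] = self.res[i].get(j, 0) + w
        self.res[j][i] = self.res[j].get(i, 0) + w

    def set_tlinks(self, i, to_source, to_sink):
        """Set the t-link capacities of node i, keeping the old flow feasible."""
        s, t = self.s, self.t
        f_s = self.cap_s[i] - self.res[s][i]   # flow currently on s -> i
        f_t = self.cap_t[i] - self.res[i][t]   # flow currently on i -> t
        # Smallest constant k that keeps both stored capacities at or above
        # the flow already routed through them.
        k = max(0, f_s - to_source, f_t - to_sink)
        self.shift[i] = k
        self.cap_s[i], self.cap_t[i] = to_source + k, to_sink + k
        self.res[s][i] = self.cap_s[i] - f_s
        self.res[i][t] = self.cap_t[i] - f_t

    def solve(self):
        """Augment from the current residual graph; return (energy, source side)."""
        s, t = self.s, self.t
        while True:
            parent = {s: None}
            queue = deque([s])
            while queue and t not in parent:   # BFS for a shortest augmenting path
                u = queue.popleft()
                for v, c in self.res[u].items():
                    if c > 0 and v not in parent:
                        parent[v] = u
                        queue.append(v)
            if t not in parent:
                break                          # no path left: flow is maximal
            path, v = [], t
            while parent[v] is not None:
                path.append((parent[v], v))
                v = parent[v]
            b = min(self.res[u][v] for u, v in path)
            for u, v in path:
                self.res[u][v] -= b
                self.res[v][u] += b
            self.flow += b
            self.pushes += 1
        energy = self.flow - sum(self.shift)   # undo the constant shifts
        return energy, sorted(v for v in parent if v != s)


def build(tlinks, weight=2):
    """Chain of pixels with equal n-link weights (toy data, not from the talk)."""
    g = DynamicCut(len(tlinks))
    for i in range(len(tlinks) - 1):
        g.add_nlink(i, i + 1, weight)
    for i, (a, b) in enumerate(tlinks):
        g.set_tlinks(i, a, b)
    return g


tlinks = [(9, 1), (6, 4), (3, 7), (1, 9)]
g = build(tlinks)
first = g.solve()
pushes_before = g.pushes
g.set_tlinks(2, 100, 0)            # user marks pixel 2 as object
second = g.solve()                 # reuses the flow from the first solve
tlinks[2] = (100, 0)
scratch = build(tlinks).solve()    # same problem solved from nothing
```

On this toy chain the first solve gives energy 11 with pixels 0 and 1 on the source side. After the single terminal-link change, the re-solve gives energy 8 with pixels 0, 1 and 2 on the source side, which matches the from-scratch solve while needing only one further augmenting path.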



5 Combining Top Down and Bottom-up Cues for Segmentation.

We present a principled probabilistic method for detecting
and segmenting instances of a particular object category
within an image. Our approach overcomes the deficiencies of
previous segmentation techniques based on traditional
grid Markov random fields (MRF), namely (i) they require
the user to provide seed pixels for the object and the
background; and (ii) they provide a poor prior for specific
shapes due to the small neighbourhood size of grid MRF.
Specifically, we replace the manual interaction by automatic
object detection. Furthermore, we propose a new
model which includes strong shape parameters for the object
in addition to the grid clique terms. This model incorporates
top-down information that is global across the image together
with the bottom-up information used in previous approaches.
We represent the articulated and non-articulated object
categories using a novel layered pictorial structures model
and set of exemplars respectively.
These object category models have the advantage that they
can handle large intra-class shape, appearance and spatial
variation. We develop an efficient method, OBJCUT, to
obtain segmentations using our probabilistic framework.
Novel aspects of this method include efficient algorithms for
sampling the object category models of our choice, and the
observation that the expected log likelihood of the model can
be increased by a single graph cut. Results are presented on
several articulated (e.g. animals) and non-articulated (e.g.
fruits) object categories. We compare our method with the
state of the art in object category specific image segmentation
and demonstrate significant improvements.
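As a rough sketch of what a model of this kind looks like (the notation is assumed here for illustration and is not the exact formulation of the paper): let m_x be the object/background label of pixel x, D the image data, Θ the shape parameters of the object category model, and N the set of neighbouring pixel pairs on the grid.

```latex
E(\mathbf{m}, \Theta) \;=\;
\sum_{x} \Big[
  \underbrace{\phi(D \mid m_x)}_{\text{appearance}}
  + \underbrace{\phi(m_x \mid \Theta)}_{\text{shape (top-down)}}
\Big]
\;+\;
\sum_{(x,y) \in \mathcal{N}}
  \underbrace{\psi(m_x, m_y; D)}_{\text{grid clique (bottom-up)}}
```

For a fixed Θ this is a binary labelling energy with unary and pairwise terms, so when the pairwise terms are submodular its minimum over m can be found with one graph cut; the shape term is what carries global, category-specific information into that cut.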

6 PoseCut: Simultaneous Segmentation and Pose Estimation of
Humans using Dynamic Graph Cuts


We present a novel algorithm for performing integrated
segmentation and 3D pose estimation of a human body from
multiple views. Unlike other related state of the art techniques
which focus on either segmentation or pose estimation
individually, our approach tackles these two tasks together.
Normally, when optimizing for pose, it is traditional to use
some fixed set of features, e.g. edges or chamfer maps. In
contrast, our novel approach consists of optimizing a cost
function based on a Markov Random Field (MRF). This has
the advantage that we can use all the information in the
image: edges, background and foreground appearances, as
well as the prior information on the shape and pose of the
subject, and combine them in a Bayesian framework.
Previously, optimizing such a cost function would have been
computationally infeasible. However, our recent research in
dynamic graph cuts allows this to be done much more
efficiently than before. We demonstrate the efficacy of our
approach on challenging motion sequences. Note that
although we target the human pose inference problem in the
paper, our method is completely generic and can be used to
segment and infer the pose of any specified rigid, deformable
or articulated object.
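The joint problem can be sketched as a nested minimisation. The notation is assumed here for illustration: x denotes the binary segmentation labels, θ the pose parameters, and E the MRF cost function.

```latex
\theta^{*} \;=\; \arg\min_{\theta} \Big[ \min_{\mathbf{x}} E(\mathbf{x}, \theta) \Big],
\qquad
\mathbf{x}^{*} \;=\; \arg\min_{\mathbf{x}} E(\mathbf{x}, \theta^{*})
```

Each evaluation of the inner minimum is one graph cut. When θ changes only slightly between evaluations, the graph changes only slightly, which is the situation in which a dynamic graph cut can reuse the previous solution instead of starting again.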


7 ObjCut for Face Detection and Segmentation.

Object detection and segmentation are important problems of
computer vision and have numerous commercial applications
such as pedestrian detection, surveillance and gesture
recognition. Image segmentation has been an extremely active
area of research in recent years. In particular, segmentation of
the face is of great interest due to such applications as
Windows Messenger and Skype.

This paper proposes a novel, simple and efficient method for
face segmentation which works by coupling face detection
and segmentation in a single framework. We use the
OBJCUT formulation, which allows for a smooth combination
of object detection and Markov Random Field for
segmentation, to produce a real-time face segmentation. It
should be noted that our algorithm is extremely efficient and
runs in real time.


This talk has explored several themes:

How can we combine online and offline information?

Exploitation and development of recent advances in the
theory of combinatorial optimization, specifically
graph cuts and their relation to MRFs.

How to combine top-down and bottom-up
information in a principled manner.


This work was supported by the EPSRC research grant
GR/T21790/01(P), EPSRC EP/C006631/1(P),
Discovery Grant DP0558318, and the IST Programme of the
European Community, under the PASCAL Network of
Excellence, as well as internal research grants from Oxford
Brookes University.

Also, this work was done jointly with all the co-authors in the
publications listed below.


A. Hengel, A. Dick, T. Thormahlen, B. Ward and P.H.S.
Torr. VideoTrace: Rapid interactive scene modelling from
video. In ACM Transactions on Graphics, to appear, 2007.

G. Vogiatzis, C.H. Esteban, P.H.S. Torr, and R. Cipolla.
Multi-view stereo via Volumetric Graph-cuts and Occlusion
Robust Photo-Consistency. In IEEE Trans Pattern Analysis
and Machine Intelligence, to appear, 2007.

P. Kohli and P.H.S. Torr. Dynamic Graph Cuts for Efficient
Inference in Markov Random Fields. In IEEE Trans Pattern
Analysis and Machine Intelligence, to appear, 2007.

A. Hengel, A. Dick, T. Thormahlen, B. Ward and P.H.S.
Torr. VideoTrace: Rapid interactive scene modelling from
video. In [...]. (oral)

P. Kohli, P. Kumar, and P.H.S. Torr. [...] & Beyond: Solving
Energies with Higher Order Cliques. In Proceedings IEEE
Conference of Computer Vision and Pattern Recognition, [...]


Y. Sun, P. Kohli, M. Bray, and P.H.S. Torr. Using Strong
Shape Priors for Stereo. In [...]. IAPR Best paper
award at conference.

J. Rihan, P. Kohli, and P.H.S. Torr. ObjCut for Face [...].
In [...]. (oral)

Y. Sun, M. Bray, A. Thayananthan, B. Yuan, and P.H.S.
Torr. [...] Based Human Motion Capture From Voxel [...].
In Proceedings British Machine Vision Conference, [...]. (poster)

A. Hengel, A. Dick, T. Thormahlen, B. Ward and P.H.S.
Torr. Building Models of Regular Scenes from Structure and
[...]. In Proceedings British Machine Vision Conference,
[...]. (poster)

A. Hengel, A. Dick, T. Thormahlen, B. Ward and P.H.S.
Torr. Rapid Interactive Modelling from Video with Graph
[...]. In [...]


P. Kohli and P.H.S. Torr. Measuring Uncertainty in Graph
Cut Solutions - Efficiently Computing Min-marginal Energies
using Dynamic Graph Cuts. In the Proceedings of the Ninth
European Conference on Computer Vision, 2006.

P. Kohli, P.H.S. Torr and Mathieu Bray.
Simultaneous Segmentation and 3D Pose Estimation of
Humans using Dynamic Graph Cuts. In the Proceedings of
the Ninth European Conference on Computer Vision, 2006.

P. Kohli and P.H.S. Torr. Efficiently Solving Dynamic
Markov Random Fields Using Graph Cuts. In IEEE Tenth
International Conference on Computer Vision, [...]
(patent pending).

G. Vogiatzis, P.H.S. Torr, and R. Cipolla. Multi-view stereo
via Volumetric Graph [...]. In Proceedings IEEE Conference
of Computer Vision and Pattern Recognition,
pages 391-[...]

M. Pawan Kumar, P.H.S. Torr, and A. Zisserman. [...].
Accepted to IEEE Conference of Computer Vision and
Pattern Recognition, pages 18-25, 2005.


M. Pawan Kumar, P.H.S. Torr, and A. Zisserman. [...]
Articulated Objects Using Pictorial Structures. In
Proceedings British Machine Vision Conference,
pages 789-798, 2004.


A. Blake, C. Rother, M. Brown, P. Perez, P.H.S. Torr.
Interactive Image Segmentation Using an Adaptive GMMRF.
In Eighth European Conference on Computer Vision,
pages 428-442, 2004.