Dynamic Markov Random Fields
P.H.S. Torr
Department of Computing
Oxford Brookes University, UK
http://cms.brookes.ac.uk/research/visiongroup/
philiptorr@brookes.ac.uk
Keywords: Markov Random Fields, Structure from Motion, Segmentation,
Pose Estimation, Blending online and offline Computation.
Abstract
In this talk I will outline some of the recent work undertaken
by the Oxford Brookes Vision Group. A common theme
underlying much of the research is to cast vision problems in
terms of combinatorial optimization, which provides a rich and
deep theory for understanding them, with many new and
exciting results.
1 Introduction
As is the nature of a keynote talk, much of the work has been
published before, so here I will only provide a brief overview
of the work to be presented, together with references for the
interested reader and some high-level points related to the
work.
The structure of the talk is as follows. First I will talk about
the classical structure from motion (SFM) problem and show how
it can be solved by graph cuts. However, this is very slow and
of course can still fail.
Computer Vision is a difficult task and might not be solved
for a long time, so it becomes interesting to ask how the
human might be included in the loop. If we are to design
systems that are truly useful, then instead of aiming for a fully
automated system (which is still some way in the future),
another philosophy might be to think of computer vision as a
“power assist” for a human operator, dealing with the dull
tasks and allowing the human to add the intelligence for the
difficult tasks. Human time is the valuable resource, and the
research should be aimed at conserving it.
This is the philosophy behind VideoTrace. The assumption is
that, given the input video, we have some allotted time
beforehand to precompute whatever is needed for the task in
hand, in order to minimize the amount of user interaction
necessary.
The interesting research is in the blending of what should be
computed offline and what should be done online, and in
what algorithms can be exploited or developed to help the
meeting of offline and online information.
This leads to our recent research on dynamic graph cuts, one
such algorithm that can combine offline and online
information in an optimal manner. Given a solution to one
graph cut, if the graph is slightly altered (say by user
interaction) a new graph cut may be found highly efficiently.
Our work on Dynamic Graph Cuts has led to other fruitful
applications, in particular in segmentation, recognition and
pose estimation.
It should be no surprise that there is a strong link between 3D
reconstruction (which can be seen as 3D segmentation) and
2D image segmentation, and graph cuts have been exploited
for this purpose by Boykov and Jolly, who suggested an
ingenious use of graph cuts for interactive segmentation.
In the spirit of the philosophy suggested above we would like
to provide a “power up” to the user to make the task of image
segmentation/editing easier. Typically a user is concerned
with editing semantically meaningful objects in a scene, e.g.
faces, dogs etc., and so it would seem logical to seek to
combine the low level segmentation methods such as graph
cuts with higher level information provided by object
recognition. This is one of the original vision problems, how
to combine top down and bottom up information, and one
solution is given by our ObjCut method. Another is given by
PoseCut, in which the dynamic graph cut is used to make the
algorithm efficient and to provide an estimate of the pose of
the object in addition to the segmentation.
Next, brief abstracts are provided for each of the techniques.
2 The Offline: VogCuts - Volumetric Graph Cuts for Multi View SFM
First I shall discuss a new volumetric formulation for the
multi-view stereo problem which is amenable to a
computationally tractable global optimisation using
Graph-cuts. Our approach is to seek the optimal partitioning of 3D
space into two regions labelled as ‘object’ and ‘empty’ under
a cost functional consisting of the following two terms: (1) a
term that forces the boundary between the two regions to pass
through photo-consistent locations and (2) a ballooning term
that inflates the ‘object’ region. To take account of the effect
of occlusion on the first term we use an occlusion robust
photo-consistency metric based on Normalised Cross
Correlation, which does not assume any geometric knowledge
about the reconstructed object. The globally optimal 3D
partitioning can be obtained as the minimum cut solution of a
weighted graph.
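To make the two terms concrete, the following is a minimal sketch on a toy
one-dimensional "volume": a row of voxels running from a voxel known to be
inside the object to a voxel known to be empty. It is not the
implementation used in the VogCuts work. The function names, the toy
photo-consistency costs `rho`, the ballooning weight `lam`, and the use of a
plain Edmonds-Karp max-flow routine are all assumptions made for
illustration.

```python
from collections import deque


def min_cut_source_side(cap, s, t):
    """Edmonds-Karp max-flow; returns (cut cost, nodes on the source side)."""
    # Residual graph: copy capacities and give every edge a reverse entry.
    res = {u: dict(nbrs) for u, nbrs in cap.items()}
    for u, nbrs in cap.items():
        for v in nbrs:
            res.setdefault(v, {}).setdefault(u, 0)
    flow = 0
    while True:
        # Breadth-first search for a shortest augmenting path.
        parent = {s: None}
        queue = deque([s])
        while queue and t not in parent:
            u = queue.popleft()
            for v, c in res[u].items():
                if c > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if t not in parent:
            # No path left: the nodes reached from s form the source side.
            return flow, set(parent)
        bottleneck, v = float("inf"), t
        while parent[v] is not None:
            bottleneck = min(bottleneck, res[parent[v]][v])
            v = parent[v]
        v = t
        while parent[v] is not None:
            u = parent[v]
            res[u][v] -= bottleneck
            res[v][u] += bottleneck
            v = u
        flow += bottleneck


def carve_ray(rho, lam, big=10**9):
    """Label a row of voxels as object/empty (hypothetical toy set-up).

    rho[i] is the cost of placing the surface between voxel i and i+1
    (low where the images agree); lam is the ballooning weight, charged
    for every interior voxel that ends up labelled 'empty'.
    """
    n = len(rho) + 1
    cap = {"s": {}, "t": {}}
    for i in range(n):
        cap[i] = {}
    cap["s"][0] = big        # first voxel is known to be object
    cap[n - 1]["t"] = big    # last voxel is known to be empty
    for i, r in enumerate(rho):
        cap[i][i + 1] = r    # photo-consistency term
        cap[i + 1][i] = r
    for i in range(1, n - 1):
        cap["s"][i] = lam    # ballooning term
    cost, side = min_cut_source_side(cap, "s", "t")
    return cost, sorted(v for v in side if v != "s")


print(carve_ray([9, 8, 2, 7, 3], lam=0))
print(carve_ray([9, 8, 2, 7, 3], lam=1))
```

With `lam=0` the cut settles on the cheapest boundary, giving cost 2 and
object voxels `[0, 1, 2]`. With `lam=1` the ballooning term outweighs the
small difference in photo-consistency and the object region inflates to
`[0, 1, 2, 3, 4]` at cost 3.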
3 The Online: VideoTrace, ideas for interactive SFM.
Based on offline computations, such as those described above,
one can now refine the structure using online computation.
VideoTrace is a system for interactively generating realistic
3D models of objects from video: models that might be
inserted into a video game, a simulation environment, or
another video sequence.
The user interacts with VideoTrace by tracing the shape of the
object to be modelled over one or more frames of the video. By
interpreting the sketch drawn by the user in light of 3D
information obtained from computer vision techniques, a small
number of simple 2D interactions can be used to generate a
realistic 3D model. Each of the sketching operations in
VideoTrace provides an intuitive and powerful means of
modelling shape from video, and executes quickly enough to be
used interactively. Immediate feedback allows the user to model
rapidly those parts of the scene which are of interest and to
the level of detail required. The combination of automated and
manual reconstruction allows VideoTrace to model parts of the
scene not visible, and to succeed in cases where purely
automated approaches would fail.
4 Merging Offline and Online: Dynamic Graph Cuts.
We present a fast new fully dynamic algorithm for the
st-mincut/max-flow problem. We show how this algorithm can
be used to efficiently compute MAP solutions for certain
dynamically changing MRF models in computer vision such
as image segmentation. Specifically, given the solution of the
max-flow problem on a graph, the dynamic algorithm
efficiently computes the maximum flow in a modified version
of the graph. The time taken by it is roughly proportional to
the total amount of change in the edge weights of the graph.
Our experiments show that, when the number of changes in
the graph is small, the dynamic algorithm is significantly
faster than the best known static graph cut algorithm. We test
the performance of our algorithm on one particular problem:
the object-background segmentation problem for video. It
should be noted that the application of our algorithm is not
limited to the above problem; the algorithm is generic and can
be used to yield similar improvements in many other cases
that involve dynamic change.
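The idea of reusing a previous max-flow solution can be sketched with a
deliberately simplified example. It covers only the easy case in which an
edge capacity increases, so the flow already computed remains feasible and
the search simply resumes from the existing residual graph. General updates,
including capacity decreases, need extra bookkeeping that is omitted here.
The class `ResidualGraph`, its methods, and the toy graph are assumptions
made for illustration; this is not the algorithm as published.

```python
from collections import deque


class ResidualGraph:
    """Residual capacities plus the flow pushed so far (illustrative helper)."""

    def __init__(self, edges=()):
        self.res = {}
        self.flow = 0
        for u, v, c in edges:
            self.add_capacity(u, v, c)

    def add_capacity(self, u, v, delta):
        """Raise the capacity of u->v by delta >= 0; current flow stays valid."""
        self.res.setdefault(u, {}).setdefault(v, 0)
        self.res.setdefault(v, {}).setdefault(u, 0)
        self.res[u][v] += delta

    def augment(self, s, t):
        """Push flow along shortest augmenting paths; return how many were used."""
        paths = 0
        while True:
            parent = {s: None}
            queue = deque([s])
            while queue and t not in parent:
                u = queue.popleft()
                for v, c in self.res[u].items():
                    if c > 0 and v not in parent:
                        parent[v] = u
                        queue.append(v)
            if t not in parent:
                return paths
            bottleneck, v = float("inf"), t
            while parent[v] is not None:
                bottleneck = min(bottleneck, self.res[parent[v]][v])
                v = parent[v]
            v = t
            while parent[v] is not None:
                u = parent[v]
                self.res[u][v] -= bottleneck
                self.res[v][u] += bottleneck
                v = u
            self.flow += bottleneck
            paths += 1


edges = [("s", "a", 4), ("s", "b", 3), ("a", "t", 2), ("b", "t", 4), ("a", "b", 1)]

# Warm start: solve once offline, apply a small edit, then resume.
warm = ResidualGraph(edges)
warm.augment("s", "t")
warm.add_capacity("a", "t", 2)
warm_paths = warm.augment("s", "t")

# Cold start: rebuild the edited graph and solve it from scratch.
cold = ResidualGraph(
    [(u, v, c + 2 if (u, v) == ("a", "t") else c) for u, v, c in edges]
)
cold_paths = cold.augment("s", "t")

print(warm.flow, cold.flow, warm_paths, cold_paths)
```

Both runs reach the same maximum flow of 7, but the warm start needs one
augmenting path after the edit whereas the cold start needs two. On a graph
this small the saving is trivial; the point is only that the work done after
an edit depends on how much the graph changed, not on its total size.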
5 ObjCut, Combining Top Down and Bottom up Cues for Segmentation.
We present a principled probabilistic method for detecting
and segmenting instances of a particular object category
within an image. Our approach overcomes the deficiencies of
previous segmentation techniques based on traditional
grid Markov random fields (MRF), namely (i) they require
the user to provide seed pixels for the object and the
background; and (ii) they provide a poor prior for specific
shapes due to the small neighbourhood size of grid MRFs.
Specifically, we replace the manual interaction by automatic
object detection. Furthermore, we propose a new probabilistic
model which includes strong shape parameters for the object
in addition to the grid clique terms. This model incorporates
top-down information that is global across the image together
with the bottom-up information used in previous approaches.
We represent the articulated and non-articulated object
categories using a novel layered pictorial structures model
and a set of exemplars, respectively.
These object category models have the advantage that they
can handle large intra-class shape, appearance and spatial
variation. We develop an efficient method, OBJCUT, to
obtain segmentations using our probabilistic framework.
Novel aspects of this method include efficient algorithms for
sampling the object category models of our choice, and the
observation that the expected log likelihood of the model can
be increased by a single graph cut. Results are presented on
several articulated (e.g. animals) and non-articulated (e.g.
fruits) object categories. We compare our method with the
state of the art in object category specific image segmentation
and demonstrate significant improvements.
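As a rough guide to what a model with shape parameters in addition to grid
clique terms can look like, one generic way of writing such an energy is
sketched below. The symbols are illustrative assumptions and are not
necessarily the notation of the ObjCut paper: $m_x$ is the
object/background label of pixel $x$, $D$ the image data, $\Theta$ the
shape parameters supplied by the object category model, and $\mathcal{N}$
the set of neighbouring pixel pairs on the grid.

```latex
E(\mathbf{m}, \Theta) =
  \sum_{x} \Big[ \underbrace{\phi(D \mid m_x)}_{\text{bottom-up appearance}}
               + \underbrace{\phi(m_x \mid \Theta)}_{\text{top-down shape}} \Big]
  + \sum_{(x,y) \in \mathcal{N}} \underbrace{\psi(m_x, m_y \mid D)}_{\text{grid clique term}}
```

For a fixed $\Theta$ the shape term is simply another per-pixel cost, so when
the pairwise term is of the usual submodular form the labelling
$\mathbf{m}$ can still be found with a single graph cut.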
6 PoseCut: Simultaneous Segmentation and 3D Pose Estimation of Humans using Dynamic Graph-Cuts
We present a novel algorithm for performing integrated
segmentation and 3D pose estimation of a human body from
multiple views. Unlike other related state of the art techniques,
which focus on either segmentation or pose estimation
individually, our approach tackles these two tasks together.
When optimizing for pose, it is traditional to use
some fixed set of features, e.g. edges or chamfer maps. In
contrast, our novel approach consists of optimizing a cost
function based on a Markov Random Field (MRF). This has
the advantage that we can use all the information in the
image: edges, background and foreground appearances, as
well as the prior information on the shape and pose of the
subject, and combine them in a Bayesian framework.
Previously, optimizing such a cost function would have been
computationally infeasible. However, our recent research in
dynamic graph cuts allows this to be done much more
efficiently than before. We demonstrate the efficacy of our
approach on challenging motion sequences. Note that
although we target the human pose inference problem in the
paper, our method is completely generic and can be used to
segment and infer the pose of any specified rigid, deformable
or articulated object.
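One way to read the joint problem is as a nested minimisation, sketched
below. The notation is an assumption made for illustration: $\mathbf{x}$ is
the pixel labelling, $\theta$ the pose parameters, and
$E(\mathbf{x}, \theta)$ an MRF energy whose shape prior depends on the pose.

```latex
\theta^{*} = \arg\min_{\theta} \Big( \min_{\mathbf{x}} E(\mathbf{x}, \theta) \Big),
\qquad
\mathbf{x}^{*} = \arg\min_{\mathbf{x}} E(\mathbf{x}, \theta^{*})
```

Each evaluation of the inner minimum is a graph cut. A small step in
$\theta$ changes only some of the edge weights, which is the situation in
which reusing the previous max-flow solution pays off.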
7 ObjCut for Face Detection and Segmentation.
Object detection and segmentation are important problems of
computer vision and have numerous commercial applications
such as pedestrian detection, surveillance and gesture
recognition. Image segmentation has been an extremely active
area of research in recent years. In particular, segmentation of
the face is of great interest due to such applications as
Windows Messenger and Skype.
This paper proposes a novel, simple and efficient method for
face segmentation which works by coupling face detection
and segmentation in a single framework. We use the
OBJCUT formulation that allows for a smooth combination
of object detection and Markov Random Field for
segmentation, to produce a real-time face segmentation. It
should be noted that our algorithm is extremely efficient and
runs in real time.
Conclusion
This talk has explored several themes:
1. How can we combine online and offline computation?
2. Exploitation and development of recent advances in the
theory of combinatorial optimization, specifically
graph cuts and their relation to MRFs.
3. How to combine top down and bottom up
information in a principled manner.
Acknowledgements
This work was supported by the EPSRC research grant
GR/T21790/01(P), EPSRC EP/C006631/1(P), ARC
Discovery Grant DP0558318, and the IST Programme of the
European Community, under the PASCAL Network of
Excellence, as well as by internal research grants from Oxford
Brookes University. This work was done jointly with all
the co-authors of the publications listed below.
References
A. Hengel, A. Dick, T. Thormahlen, B. Ward and P.H.S. Torr.
VideoTrace: Rapid interactive scene modelling from video. In
ACM Transactions on Graphics, to appear, 2007.
G. Vogiatzis, C.H. Esteban, P.H.S. Torr, and R. Cipolla.
Multi-view stereo via Volumetric Graph-cuts and Occlusion
Robust Photo Consistency. In IEEE Trans Pattern Analysis
and Machine Intelligence, to appear, 2007.
P. Kohli and P.H.S. Torr. Dynamic Graph Cuts for Efficient
Inference in Markov Random Fields. In IEEE Trans Pattern
Analysis and Machine Intelligence, to appear, 2007.
A. Hengel, A. Dick, T. Thormahlen, B. Ward and P.H.S. Torr.
VideoTrace: Rapid interactive scene modelling from video. In
SIGGRAPH, 2007. (oral)
P. Kohli, P. Kumar, and P.H.S. Torr. P^3 & Beyond: Solving
Energies with Higher Order Cliques. In Proceedings IEEE
Conference of Computer Vision and Pattern Recognition,
2007. (oral)
Y. Sun, P. Kohli, M. Bray, and P.H.S. Torr. Using Strong
Shape Priors for Stereo. In ICVGIP, 2006. IAPR Best paper
award at conference. (oral)
J. Rihan, P. Kohli, and P.H.S. Torr. ObjCut for Face
Detection. In ICVGIP, 2006. (oral)
Y. Sun, M. Bray, A. Thayananthan, B. Yuan, and P.H.S.
Torr. Regression-Based Human Motion Capture From Voxel
Data. In Proceedings British Machine Vision Conference,
2006. (poster)
A. Hengel, A. Dick, T. Thormahlen, B. Ward and P.H.S. Torr.
Building Models of Regular Scenes from Structure and
Motion. In Proceedings British Machine Vision Conference,
2006. (poster)
A. Hengel, A. Dick, T. Thormahlen, B. Ward and P.H.S. Torr.
Rapid Interactive Modelling from Video with Graph Cuts. In
Proceedings Eurographics, 2006.
P. Kohli and P.H.S. Torr. Measuring Uncertainty in Graph
Cut Solutions - Efficiently Computing Min-marginal Energies
using Dynamic Graph Cuts. In the Proceedings of the Ninth
European Conference on Computer Vision, 2006. (oral)
P. Kohli, P.H.S. Torr and M. Bray. PoseCut:
Simultaneous Segmentation and 3D Pose Estimation of
Humans using Dynamic Graph-Cuts. In the Proceedings of
the Ninth European Conference on Computer Vision, 2006. (oral)
P. Kohli and P.H.S. Torr. Efficiently Solving Dynamic
Markov Random Fields Using Graph Cuts. In IEEE Tenth
International Conference on Computer Vision, 2005. (oral)
(patent pending)
G. Vogiatzis, P.H.S. Torr, and R. Cipolla. Multi-view stereo
via Volumetric Graph-cuts. In Proceedings IEEE Conference
of Computer Vision and Pattern Recognition, pages 391-398,
2005. (oral)
M. Pawan Kumar, P.H.S. Torr, and A. Zisserman. OBJCUT.
In IEEE Conference of Computer Vision and Pattern
Recognition, pages 18-25, 2005. (oral)
M. Pawan Kumar, P.H.S. Torr, and A. Zisserman. Detecting
Articulated Objects Using Pictorial Structures. In
Proceedings British Machine Vision Conference,
pages 789-798, 2004. (poster)
A. Blake, C. Rother, M. Brown, P. Perez, P.H.S. Torr.
Interactive Image Segmentation Using an Adaptive GMMRF
Model. In The Proceedings Eighth European Conference on
Computer Vision, pages 428-442, 2004. (poster)