Final project: Bae&Kim
Nov. 27, 2006
Clustering appearance and shape by Jigsaw, and comparing it with Epitome

Soo Hyun Bae, Woo Young Kim
Georgia Institute of Technology
CS 7495 Fall 2006
Due: November 28, 2006
1. Overview
This project involves four papers: two are the basis of our implementation, and the other two describe the corresponding original methods. Our initial intention was to implement the advanced techniques presented in the paper, "Epitomic analysis of appearance and shape," by N. Jojic, B. J. Frey, and A. Kannan, although an implementation of the epitome-learning portion had already been publicly released. While implementing applications of the epitomic model, including image segmentation and image denoising, we found that two recent papers propose improvements to the original epitomic analysis: (1) "Video epitomes," by V. Cheung, B. J. Frey, and N. Jojic, and (2) "Clustering appearance and shape by learning jigsaws," by A. Kannan, J. Winn, and C. Rother.
Paper (1) extends the original work to the spatio-temporal domain while retaining the generative model of the given source and its training methods; it also provides more detail on how to exploit higher-dimensional sources (e.g., videos). Paper (2) suggests a flexible technique for generating a texture map, called a Jigsaw in that paper, and provides more feasible results for images.
Accordingly, we decided to implement the "Jigsaw" algorithm of paper (2) and to compare the results with epitomic learning and reconstruction. To achieve this goal, we had to employ the graph-cut algorithm introduced in "Fast approximate energy minimization via graph cuts," by Y. Boykov, O. Veksler, and R. Zabih, since the Jigsaw is obtained through iterative EM steps: the offset map, which forms a bridge from a given image to a jigsaw map, is obtained by an optimization procedure based on a graph-cut algorithm.
2. Papers
[1] A. Kannan, J. Winn, and C. Rother. Clustering appearance and shape by learning jigsaws. To appear in Advances in Neural Information Processing Systems, Volume 19, 2006.

[2] Y. Boykov, O. Veksler, and R. Zabih. Fast approximate energy minimization via graph cuts. IEEE Trans. Pattern Analysis and Machine Intelligence, 23(11), 2001.

[3] V. Cheung, B. J. Frey, and N. Jojic. Video epitomes. In Proc. IEEE Conf. Computer Vision and Pattern Recognition (CVPR), 2005.

[4] N. Jojic, B. J. Frey, and A. Kannan. Epitomic analysis of appearance and shape. In Proc. IEEE Conf. Computer Vision (ICCV), 2003.
3. Algorithms
The Epitomic and Jigsaw models are patch-based probabilistic generative models. Such representations of a given image I are much more compact than the source representation, yet retain most of the constitutive elements needed to reconstruct it. The difference between the two is as follows: while epitomic analysis and other patch-based models must specify the appropriate set of patch dimensions and shapes before learning, the Jigsaw model can effectively learn the patch sizes and shapes during the learning process.
3.1. Epitome
We consider an epitome as a set of Gaussian distributions; that is, each entry has a mean and a variance as its parameters. To learn the parameters of the epitome, we train on a set of patches from a set of images I, and update the parameters by averaging patch values under the appropriate mappings T. The training patches z_k are systematically selected, for example, as all the possible or partially overlapping patches in I.
Figure 1 shows the graphical model of the epitome. According to it, for each pixel x in the image I to be generated, we consider the patches z_k that share x. Then

$x = \frac{1}{N} \sum_{k:\, x \in z_k} z_k(x) + \text{noise},$

where N is the number of patches that share x. Each patch is generated from the epitome e using the corresponding mapping T. The entire model then has the following joint distribution.
Maximizing the joint distribution leads to the following update rules, which are iterated. The first and second values are computed in the E-step, and the third and last are updated in the M-step. The value in (3) determines the epitomic mean and (4) the epitomic variance. Note that (1) is an estimated value assuming that the epitome and the mappings are known. We must, however, decide the patch sizes and shapes before learning; in practice the patch shape is chosen to be a rectangle.

[Equations (1)-(4): the epitome update rules, which appeared here as images in the original document.]
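The generative rule above, in which each pixel is the average of the patches covering it, can be sketched in a few lines. This is a minimal NumPy illustration with our own names, not code from [4]; in the real model the patches z_k are themselves predicted from the epitome e through the mappings T, whereas here they are passed in directly.

```python
import numpy as np

def reconstruct_from_patches(shape, patches, positions):
    """Average overlapping patches back into an image: each pixel x
    receives (1/N) times the sum of the N patch values covering it."""
    acc = np.zeros(shape)
    cnt = np.zeros(shape)
    for z, (r, c) in zip(patches, positions):
        h, w = z.shape
        acc[r:r + h, c:c + w] += z
        cnt[r:r + h, c:c + w] += 1
    return acc / np.maximum(cnt, 1)  # uncovered pixels stay 0

# toy check: two overlapping 2x2 patches on a 3x4 canvas
img = reconstruct_from_patches(
    (3, 4),
    [np.full((2, 2), 1.0), np.full((2, 2), 3.0)],
    [(0, 0), (0, 1)],
)
```

In the overlap column, the pixel value is the average of the two patch values, exactly the (1/N)-sum of the generative rule.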
3.2. Jigsaw
The Jigsaw is also a patch-based probabilistic generative model, and each of its entries is likewise a Gaussian distribution. Although it learns a jigsaw from a set of training patches of input images, as the epitome does, the patches are not selected manually; instead, the model learns the appropriate sizes and shapes of the patches automatically. In other words, the main difference of the jigsaw from the epitome is its associated offset map L, of the same size as the input image I.
Figure 2 shows the graphical model of the generative process of the Jigsaw. Each pixel z in a jigsaw J is defined by an intensity μ(z) and a variance 1/λ(z) (λ is called the precision), analogous to the epitome mean and variance. The model joins together pieces of the jigsaw using a hidden offset map of the same size as the input, and then adds Gaussian noise with the variance given by the jigsaw. Note that the offset map L defines a position in the jigsaw for each pixel in the input.

Going into the details of learning the jigsaw, the model makes the following assumptions. First, each pixel of the input image I is independent conditioned on the jigsaw J and the offset map L. To preserve local consistency in the input, so that neighboring pixels tend to have the same offsets, the model defines a Markov random field over L.
Figure 1. Graphical model of the epitome: e is the epitome, T is the hidden mapping, and the z's are the patches predicted from e. The x's are measurements generated by adding noise to the patches.
[Equation (5): the MRF prior over the offset map L, which appeared here as an image in the original document.]
Here E is the set of edges in the 4-connected grid. In order to allow unused pixels in the jigsaw during learning, the model places the following Normal-Gamma prior on the jigsaw entries. The jigsaw is then learned by iteratively maximizing the following joint probability.
Just as the epitome alternates between the hidden mapping T and the parameters (the mean and variance of each Gaussian distribution), the jigsaw alternates between the hidden offset map L and its parameters, the mean and precision.
Figure 2. Graphical model of the Jigsaw: J is the jigsaw, L is the hidden offset map, and X is an input.
[Equations (6)-(8): the Normal-Gamma prior and the joint probability, which appeared here as images in the original document.]
The offset map L is updated by applying the alpha-expansion graph-cut algorithm of [2]. With this L, the means and precisions are then updated as follows, where X(z) is the set of image pixels corresponding to pixel z of the jigsaw.

[Equations (9) and (10): the mean and precision updates, which appeared here as images in the original document.]
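As an illustration of the bookkeeping behind these updates, the following sketch gathers X(z) for each jigsaw entry under a given offset map and sets the mean to the sample average. The Normal-Gamma prior terms of [1] are deliberately omitted for brevity, the wrap-around indexing is our own convention, and all names are ours.

```python
import numpy as np

def update_jigsaw_mean(image, offsets, jigsaw_shape):
    """For each jigsaw pixel z, average the image pixels X(z) that the
    offset map sends to it. Prior terms are deliberately omitted."""
    JH, JW = jigsaw_shape
    acc = np.zeros(jigsaw_shape)
    cnt = np.zeros(jigsaw_shape)
    for (r, c), x in np.ndenumerate(image):
        dr, dc = offsets[r, c]                  # offset map L at this pixel
        z = ((r + dr) % JH, (c + dc) % JW)      # wrap-around index into J
        acc[z] += x
        cnt[z] += 1
    return np.where(cnt > 0, acc / np.maximum(cnt, 1), 0.0), cnt

# toy check: zero offsets map each image pixel to its own jigsaw cell
mean, cnt = update_jigsaw_mean(np.array([[1., 2.], [3., 4.]]),
                               np.zeros((2, 2, 2), dtype=int), (2, 2))
```

With all-zero offsets each X(z) holds exactly one pixel, so the jigsaw mean reproduces the input.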
4. Implementation
We implemented the learning process of the Jigsaw model and reconstruction with it. As part of learning the Jigsaw model, we implemented the graph-cut of paper [2] in order to obtain the offset map.
4.1. Jigsaw
There are two main parts. First, we initialize the jigsaw by setting the precisions to their expected value under the prior, b/a, where b is the shape and a is the scale parameter of the Gamma prior of the jigsaw. The mean is set according to the Gaussian distribution of the input. This helps us avoid falling into a local maximum far from the globally optimal solution. Beginning with these initial jigsaw parameters, we obtain the offset map L by the graph-cut algorithm; this process is detailed in the next subsection. After the offset map L is obtained for the input data, we update the means and precisions using it, as explained in Section 3.2. We repeat this process until convergence.
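The initialization step can be sketched as follows. This is a hypothetical illustration with our own names; the hyperparameter values a and b below are placeholders, not the values we actually used.

```python
import numpy as np

def init_jigsaw(image, jigsaw_shape, a=2.0, b=1.0):
    """Initialization as in Sec. 4.1: precisions start at the prior
    expectation b/a; means are drawn from a Gaussian matched to the
    input image's mean and standard deviation."""
    rng = np.random.default_rng(0)
    mean = rng.normal(image.mean(), image.std() + 1e-8, size=jigsaw_shape)
    precision = np.full(jigsaw_shape, b / a)
    return mean, precision

mean, precision = init_jigsaw(np.ones((4, 4)), (2, 2))
```

The EM loop then alternates this jigsaw with the graph-cut offset-map update of the next subsection until convergence.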
4.2. Graph-cut
The focus of paper [2] is the minimization of energy in early-vision problems. The goal is to find a labeling f that assigns each pixel p a label $f_p \in L$, where f is both piecewise smooth and consistent with the observed data. Many vision problems can be naturally formulated in terms of minimizing the sum of two energy terms:

$E(f) = E_{data}(f) + E_{smooth}(f).$    (11)
Here, $E_{smooth}$ measures the extent to which f is not piecewise smooth, while $E_{data}(f)$ measures the disagreement between f and the observed data. In generating the optimized offset map for a given jigsaw, the label set L corresponds to the set of indices into the jigsaw, so $f_p$ is a jigsaw index. The data energy is formulated as

$E_{data}(f) = \sum_{p} D_p(f_p).$    (12)
At each pixel location p, we calculate the data cost for a given offset. We set the distortion function to be a squared norm: the data cost $D_p(f_p)$ is the squared difference between the pixel value and the corresponding jigsaw value mapped by the offset,

$D_p(f_p) = \| i_p - f(p) \|^2.$    (13)
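For concreteness, the data-cost computation can be sketched as building a per-pixel cost volume over candidate offsets. This is a NumPy sketch with our own names; the wrap-around indexing into the jigsaw is one plausible convention, not necessarily the one in [1].

```python
import numpy as np

def build_data_cost(image, jigsaw_mean, offset_list):
    """D_p(f) = (i_p - J[f(p)])^2 for every pixel p and candidate
    offset f; returns an (H, W, n_offsets) cost volume. The size of
    this volume is the source of the memory problem discussed in
    Sec. 5.1."""
    H, W = image.shape
    JH, JW = jigsaw_mean.shape
    rows = np.arange(H)[:, None]
    cols = np.arange(W)[None, :]
    cost = np.empty((H, W, len(offset_list)))
    for k, (dr, dc) in enumerate(offset_list):
        j = jigsaw_mean[(rows + dr) % JH, (cols + dc) % JW]
        cost[:, :, k] = (image - j) ** 2
    return cost

# toy check: a 1x2 image against a 1x2 jigsaw with two candidate offsets
cost = build_data_cost(np.array([[1., 2.]]),
                       np.array([[1., 0.]]),
                       [(0, 0), (0, 1)])
```

Each slice `cost[:, :, k]` is the squared distortion of equation (13) for one candidate offset.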
The smoothing term considers, for each pair of neighboring locations p and q, the distance between the corresponding jigsaw labels:

$E_{smooth}(f) = \sum_{\{p,q\}} V_{\{p,q\}}(f_p, f_q).$    (14)
We tested several distortion functions and found that a weighted first-order norm works well:

$V(f_p, f_q) = w_{pq}\, | f_p - f_q |,$

where we set $w_{pq}$ to 0.2 for all neighboring pairs.
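Putting the two terms together, the total energy of a candidate labeling can be evaluated as follows (a sketch under our own naming; the 4-connected smoothness differences are taken along rows and columns):

```python
import numpy as np

def labeling_energy(data_cost, labels, w=0.2):
    """E(f) = sum_p D_p(f_p) + sum_{p,q} w * |f_p - f_q| over the
    4-connected grid; data_cost is (H, W, n_labels), labels is (H, W)."""
    H, W = labels.shape
    e_data = data_cost[np.arange(H)[:, None],
                       np.arange(W)[None, :], labels].sum()
    e_smooth = w * (np.abs(np.diff(labels, axis=0)).sum()
                    + np.abs(np.diff(labels, axis=1)).sum())
    return e_data + e_smooth

# toy check: a 1x2 grid with two labels; label 1 is costly at pixel (0, 0)
dc = np.zeros((1, 2, 2))
dc[0, 0, 1] = 5.0
energy = labeling_energy(dc, np.array([[0, 1]]))
```

A graph-cut solver such as alpha-expansion searches for the labeling that minimizes this quantity; the function above only evaluates it for a given labeling.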
In paper [1], the authors describe utilizing the alpha-expansion graph-cut algorithm to obtain an optimized offset map for a given jigsaw map. However, from a number of experiments, we conjecture that the alpha-expansion algorithm does not work correctly here; we would say that only the general expansion graph-cut algorithm works for this optimal offset-map generation.
4.3. Epitome
For the purpose of comparison, we implemented the reconstruction of the original image with the help of the open-source code for training an epitome. We used the training implementation of "Video epitomes" instead of the original image epitome, believing that the former code gives better performance.
Results
We focused on comparing the Jigsaw results with the epitome, since the Jigsaw appears superior to the epitome in the sense that it can automatically learn the appropriate patch sizes and shapes. For the purpose of clear comparison, even though there are applications other than reconstruction, we applied the Jigsaw model to reconstruction only, and then compared the results with the epitome.
[Figure 3 panels, shown for each of the three test images: Input | Jigsaw | Reconstruction | Offset map | Epitome | Reconstruction (average) | Reconstruction (no avg.)]
As a result, we demonstrate the comparison on three images. We present the Jigsaw mean and the image reconstructed with the inferred offset map; the epitome mean and the reconstructed images are also shown. For the reconstruction with the epitome, we show two reconstructed images: one uses only one patch per pixel, the other a number of patches per pixel. In fact, the former method is the most analogous to the jigsaw reconstruction, since the Jigsaw algorithm needs only one offset per pixel.

As we can see, even when averaging patches with the epitome, the jigsaw does a much better job of reconstruction. Notice that the image with one patch per pixel is very blocky and the other is blurry, whereas with the Jigsaw we obtained an almost perfect reconstruction. Note, however, that the image of the Jigsaw mean still requires an additional clustering step if we want to obtain the jigsaw images presented in the paper. Because of the ambiguous explanation in the paper, we could not implement this clustering part yet.
Figure 3. Comparison of the epitome and the Jigsaw via reconstruction. For better understanding, we used the same images as in the Jigsaw paper, shown in the second and third rows; the top image was selected as a more complicated image for comparison.
5. Problems

There were, however, several problems that we encountered while implementing the papers.
5.1. Out of memory
When we implemented the graph-cut algorithm to obtain the appropriate offset map in Jigsaw learning, we faced a critical problem of insufficient memory. Again, we want to minimize the data cost $E_{data}(f) = \sum_p D_p(f_p)$. For every pixel, the cost over the entire set of jigsaw values must be generated. The size of the "bird" image is 268x179 and the jigsaw size was set to 30x30; thus the data cost map requires 268 x 179 x 30 x 30 x 16 bytes, about 690 MB, at 'double' precision, which is beyond the maximum limit of 32-bit MATLAB. Even if the costs are truncated to 32-bit integers, it still requires around 170 MB, which is still not affordable. Therefore, we implemented the graph-cut algorithm with MEX, a C implementation of a MATLAB function. The current implementation only works on a 64-bit Linux system with more than 8 GB of system memory.
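The memory figures above follow directly from the stated per-element sizes (16 bytes per entry as counted in the text, 4 bytes for 32-bit integers):

```python
H, W = 268, 179        # "bird" image dimensions
JH, JW = 30, 30        # jigsaw dimensions
entries = H * W * JH * JW            # one cost per pixel per jigsaw cell
double_mb = entries * 16 / 1e6       # ~690 MB at the stated 16 B/entry
int32_mb = entries * 4 / 1e6         # ~170 MB when truncated to int32
```

The 690 MB and 170 MB figures quoted in the text are these two quantities, rounded.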
5.2. Clustering the output of the Jigsaw image
The third image shown in the previous table is the same as the image shown in paper [1], but our jigsaw is not the same as the corresponding jigsaw figure. In paper [1], the authors mention that they applied a clustering step to determine the jigsaw pieces, but give no further details. We did not implement the clustering algorithm this time around, but we note that such a clustering algorithm would help present the jigsaw pieces in a comprehensible representation.