Hidden Markov Trees for Statistical Signal/Image Processing

Nov 5, 2013

Xiaoning Qian
ECEN689–613 Probability Models
Texas A&M University
Part I
Papers
M. S. Crouse, R. D. Nowak, R. G. Baraniuk, "Wavelet-Based Statistical Signal Processing Using Hidden Markov Models," IEEE Trans. Signal Processing, 46(4), 1998.
H. Choi, R. G. Baraniuk, "Multiscale Image Segmentation Using Wavelet-Domain Hidden Markov Models," IEEE Trans. Image Processing, 10(9), 2001.
J. Romberg, M. Wakin, H. Choi, R. G. Baraniuk, "A Geometric Hidden Markov Tree Wavelet Model," dsp.rice.edu, 2003.
Part II
Wavelet Transform
Wavelet
What is a wavelet?
Wikipedia:
A wavelet series representation of a square-integrable function is with respect to either a complete, orthonormal set of basis functions, or an overcomplete set, a frame of a vector space (also known as a Riesz basis), for the Hilbert space of square-integrable functions.
What is a wavelet?
The main idea of wavelets comes from function representation. Wavelets are closely related to multiscale/multiresolution analysis:
decompose functions into different scales/frequencies and study each component with a resolution that matches its scale.
Wavelets are a class of functions used to localize a given function in both space and scale/frequency.
http://www.amara.com/current/wavelet.html
An example – Haar basis
Example
Haar wavelet: the wavelet function (mother wavelet) ψ(t) and the scaling function (father wavelet) φ(t):

ψ(t) = { 1, 0 ≤ t < 1/2;  −1, 1/2 ≤ t < 1;  0, otherwise };  φ(t) = { 1, 0 ≤ t < 1;  0, otherwise }.

“Daughter” wavelets: ψ_{a,b}(t) = (1/√|a|) ψ((t − b)/a), a – scale, b – shift;
ψ_{J,K}(t) = ψ(2^J t − K).

Multi-dimensional wavelet – tensor product of 1-dimensional wavelets.
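As a quick numerical check (not part of the slides; a minimal NumPy sketch), the Haar pair above can be evaluated on a grid and its orthonormality verified by quadrature:

```python
import numpy as np

def psi(t):
    """Haar mother wavelet psi(t)."""
    return np.where((0 <= t) & (t < 0.5), 1.0,
           np.where((0.5 <= t) & (t < 1.0), -1.0, 0.0))

def phi(t):
    """Haar father (scaling) function phi(t)."""
    return np.where((0 <= t) & (t < 1.0), 1.0, 0.0)

def psi_jk(t, J, K):
    """Dyadic daughter wavelet psi_{J,K}(t) = psi(2^J t - K)."""
    return psi(2.0**J * t - K)

# Numerical checks on a fine grid over [0, 1):
t = np.linspace(0, 1, 100000, endpoint=False)
dt = t[1] - t[0]
print(np.sum(psi(t) * phi(t)) * dt)                  # ~0: psi orthogonal to phi
print(np.sum(psi(t)**2) * dt)                        # ~1: unit norm
print(np.sum(psi_jk(t, 1, 0) * psi_jk(t, 1, 1)) * dt)  # ~0: disjoint supports
```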
Why wavelet?
Wavelets are localized in both space and frequency, whereas the standard Fourier transform is only localized in frequency.
Multiscale analysis
Lower computational complexity
...
Wavelet transform
Continuous wavelet transform (CWT):

z(t) = ∫_R W_ψ{z}(a, b) ψ_{a,b}(t) db

W_ψ{z}(a, b) = ∫_R z(t) ψ*_{a,b}(t) dt

∫_R ψ_{a,b}(t) ψ*_{c,d}(t) dt = δ_{ac} δ_{bd}
Wavelet transform
Discrete wavelet transform (DWT):

z(t) = Σ_K u_K φ_{J_0,K}(t) + Σ_{J=−∞}^{J_0} Σ_K w_{J,K} ψ_{J,K}(t)

w_{J,K} = ∫ z(t) ψ*_{J,K}(t) dt

∫ ψ_{J′,K′}(t) ψ*_{J,K}(t) dt = δ_{JJ′} δ_{KK′}
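For intuition, here is a minimal sketch (not from the slides) of one level of the orthonormal Haar DWT on a discrete signal, together with its inverse, showing perfect reconstruction:

```python
import numpy as np

def haar_dwt(x):
    """One level of the orthonormal Haar DWT:
    scaling coefficients u and wavelet (detail) coefficients w."""
    x = np.asarray(x, dtype=float)
    u = (x[0::2] + x[1::2]) / np.sqrt(2)  # coarse approximation
    w = (x[0::2] - x[1::2]) / np.sqrt(2)  # detail coefficients
    return u, w

def haar_idwt(u, w):
    """Invert one level: perfect reconstruction."""
    x = np.empty(2 * len(u))
    x[0::2] = (u + w) / np.sqrt(2)
    x[1::2] = (u - w) / np.sqrt(2)
    return x

x = np.array([4.0, 6.0, 10.0, 12.0, 8.0, 6.0, 5.0, 5.0])
u, w = haar_dwt(x)
print(np.allclose(haar_idwt(u, w), x))  # True
```

Because the transform is orthonormal, the coefficient energy Σu² + Σw² equals the signal energy Σx².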
Properties for wavelet transform
Locality: Each wavelet is localized simultaneously in space and frequency.
Multiresolution: Wavelets are compressed and dilated to analyze at a nested set of scales.
Compression: The wavelet transforms of real-world signals tend to be sparse.
“Secondary” properties may be useful.
Clustering: If a particular wavelet coefficient is large/small, adjacent coefficients are very likely to also be large/small.
Persistence: Large/small values of wavelet coefficients tend to propagate across scales.
Part III
Signal processing problems with wavelet
applications
Application
Denoising or signal detection
Example
Denoising or signal detection
Note: The signal model is in the wavelet domain.
Signal model:

w_i^k = y_i^k + n_i^k,

where w_i^k is the i-th wavelet coefficient obtained by transforming the k-th sample. The task of denoising or detection is to estimate y_i^k.
The traditional assumption is that the coefficients follow independent Gaussian distributions. When n_i is white noise, thresholding is enough for denoising, based on the “compression” property.
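As a sketch of this thresholding baseline (not the HMT method; the synthetic sparse signal, the noise level, and the universal-threshold choice sigma_n·sqrt(2 log N) are illustrative assumptions):

```python
import numpy as np

def denoise_hard_threshold(w, sigma_n):
    """Hard-threshold noisy wavelet coefficients w = y + n.

    Keeps only coefficients above sigma_n * sqrt(2 log N); justified by
    the "compression" property: the true y is sparse in the wavelet domain.
    """
    thresh = sigma_n * np.sqrt(2 * np.log(len(w)))
    return np.where(np.abs(w) > thresh, w, 0.0)

rng = np.random.default_rng(0)
y = np.zeros(1024)
y[:8] = 20.0                       # a few large "signal" coefficients
w = y + rng.normal(0, 1, size=1024)
y_hat = denoise_hard_threshold(w, sigma_n=1.0)
print(np.mean((y_hat - y)**2) < np.mean((w - y)**2))  # True
```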
Image segmentation
Example
Image segmentation
Modeling the statistical dependency in images.
Image model: f(x_r|c_i), where the c_i are the labels for different objects in an image and the x_r are image regions with the same label; c = {c_i, ∀i} can be considered a random field while x_r is the observation.
The model for c can be considered prior knowledge.
Maximum likelihood segmentation: max_c Π_r f(x_r|c)
Maximum a posteriori segmentation: max_c Π_r f(x_r|c) f(c)
Note: The model can be either in the image domain or the wavelet domain.
Multiscale image segmentation
Multiscale image segmentation: the window size varies across scales.
Note: the model in multiscale segmentation is again in the wavelet domain; the label random field has a quadtree structure.
Different statistical properties of wavelet coefficients correspond to different image regions.
Singularity structures (edges) have large wavelet coefficients (useful for heterogeneous regions).
Multiscale image segmentation
Example
Basic assumptions in these applications
Independent Gaussian distributions for wavelet coefficients.
Better assumptions? The “secondary” properties?
Part IV
Hidden Markov Trees
HMT
Graphical models as probability models
General settings: c – random field (latent/hidden variables); x – observations.
Independent c: f(c_i), and f(x|c) = Π_i f(x_i|c_i)
Markov random field (hidden Markov model): f(c_i|x, c) = f(c_i|N_i), and f(x|c) = Π_i f(x_i|c_i), where N_i denotes the neighborhood of node i
Conditional random field: f(c_i|x, c) = f(c_i|x, N_i)
Independent c
Simplest assumption: the c's are all independent: f(c_i) and f(x|c) = Π_i f(x_i|c_i)
Classification algorithms
Hidden Markov chain model
c follows a Markov chain structure: f(c_i|x, c) = f(c_i|c_{i−1}, c_{i+1})
EM algorithms
More general hidden Markov model
c has a more complex neighbor structure: f(c_i|x, c) = f(c_i|c_{i−2}, c_{i−1}, c_{i+1}, c_{i+2})
EM algorithms
Conditional random field
c has a Markov structure globally conditioned on x: f(c_i|x, c) = f(c_i|x, N_i)
We usually assume that the probability (or the transition function and state function) has some special form.
Belief propagation algorithms
Graphical model in the image domain
Independent model for homogeneous image regions
Simple classifiers for pixel intensities
Graphical model in the image domain
Markov random field for noisy images or texture images
Adding a prior on hidden states for “neighbors”
Graphical model in the image domain
Conditional random field for more complicated appearance
Graphical model in the image domain
Hidden random field for image regions with different parts (hidden states for parts and hidden states for regions)
What is an appropriate model?
Tradeoff between accuracy and complexity.
Small sample size
Overfitting...
Hidden Markov trees for wavelet coefficients
Residual dependency structure (nested structure) – the “secondary” properties;
A model that reflects these properties would be appropriate, flexible but not too complicated;
Nested multiscale graph (tree, to be specific) model:
Independent mixture model for wavelet coefficients
A mixture model provides an appropriate approximation for non-Gaussian real-world signals.
Hidden Markov chain model for wavelet coefficients
Hidden Markov chain at the same scale:
Hidden Markov tree for wavelet coefficients
Dependence across scales, according to the “secondary properties” of wavelet coefficients:
Hidden Markov tree for images
Probabilities in hidden Markov trees
For a single wavelet coefficient, as the real-world signal is always non-Gaussian, we model it with a mixture model:

f(w) = Σ_m f(w|c = m) f(c = m)

Independent mixture model; hidden Markov chain model; hidden Markov tree model:
for the tree root c_0: f(c_0);
for the tree nodes other than the root – transition probability: f(c_i|c_{ρ(i)}), where ρ(i) denotes the parent node of i.
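To make the generative model concrete, the following sketch (parameter values are illustrative assumptions, not from the papers) samples a two-state HMT on a complete binary tree, with a transition matrix chosen so that large/small states persist from parent to child:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical two-state HMT on a complete binary tree stored in an
# array: node 0 is the root; the parent of node i > 0 is (i - 1) // 2.
n_nodes = 2**5 - 1
root_prob = np.array([0.5, 0.5])       # f(c_0)
trans = np.array([[0.9, 0.1],          # trans[n, m] = f(c_i = m | c_parent = n):
                  [0.1, 0.9]])         # states tend to persist across scales
sigma = np.array([0.1, 2.0])           # state 0: "small", state 1: "large"

states = np.empty(n_nodes, dtype=int)
coeffs = np.empty(n_nodes)
states[0] = rng.choice(2, p=root_prob)
coeffs[0] = rng.normal(0, sigma[states[0]])
for i in range(1, n_nodes):
    parent = (i - 1) // 2
    states[i] = rng.choice(2, p=trans[states[parent]])
    coeffs[i] = rng.normal(0, sigma[states[i]])  # zero-mean mixture component

# Persistence: a node's state usually matches its parent's state.
agree = np.mean([states[i] == states[(i - 1) // 2] for i in range(1, n_nodes)])
print(agree)  # typically close to 0.9
```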
Parameters in HMT
f(c_0), f(c_i|c_{ρ(i)});
mixture means and variances: µ_{i,m}, σ²_{i,m}
Notice the conditional independence properties of the model.
Problems of HMT
As with all graphical models, we need to solve:
Training the model;
Computing the likelihood of the given observations;
Estimating the latent/hidden states.
Expectation–Maximization algorithm
General settings for estimation: max_θ f(x|θ) (ML) or max_θ f(θ|x) ⇔ max_θ f(x|θ) f(θ) (MAP).
The EM algorithm provides a greedy, iterative way to solve the general estimation problem based on the hidden/latent variables c.
log f(x|θ) = log f(x, c|θ) − log f(c|x, θ). Since the algorithm is iterative, we take the expectation with respect to c under the previously estimated parameters θ^{k−1}:

∫ log f(x|θ) f(c|x, θ^{k−1}) dc = ∫ log f(x, c|θ) f(c|x, θ^{k−1}) dc − ∫ log f(c|x, θ) f(c|x, θ^{k−1}) dc
Expectation–Maximization algorithm
Jensen’s inequality:

∫ log f(c|x, θ) f(c|x, θ^{k−1}) dc ≤ ∫ log f(c|x, θ^{k−1}) f(c|x, θ^{k−1}) dc

To guarantee the increase of the likelihood log f(x|θ), we only need to solve:

θ^k = arg max_θ ∫ log f(x, c|θ) f(c|x, θ^{k−1}) dc

Hence, the E-step is for computing f(c|x, θ^{k−1}); the M-step is to solve the above optimization problem.
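The E-step/M-step pair can be illustrated on the simplest case from the slides, the independent Gaussian mixture model (a minimal 1-D sketch; the synthetic data and the initialization are my own assumptions):

```python
import numpy as np

def em_gmm_1d(x, n_iter=50):
    """EM for a two-component 1-D Gaussian mixture: a minimal sketch of
    the E-step/M-step structure. (In the HMT, the E-step is the analogous
    posterior computation, done by message passing on the tree.)"""
    # Crude initialization from the data range.
    pi = np.array([0.5, 0.5])
    mu = np.array([x.min(), x.max()])
    var = np.array([1.0, 1.0])
    for _ in range(n_iter):
        # E-step: posterior responsibilities f(c | x, theta_{k-1}).
        dens = np.exp(-(x[:, None] - mu)**2 / (2 * var)) / np.sqrt(2 * np.pi * var)
        resp = pi * dens
        resp /= resp.sum(axis=1, keepdims=True)
        # M-step: responsibility-weighted (conditional-mean) updates.
        nk = resp.sum(axis=0)
        pi = nk / len(x)
        mu = (resp * x[:, None]).sum(axis=0) / nk
        var = (resp * (x[:, None] - mu)**2).sum(axis=0) / nk
    return pi, mu, var

rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(-4, 1, 500), rng.normal(4, 1, 500)])
pi, mu, var = em_gmm_1d(x)
print(np.sort(mu))  # approximately [-4, 4]
```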
Training hidden Markov trees with EM
In the HMT, θ = {f(c_0), f(c_i|c_{ρ(i)}), µ_{i,m}, σ²_{i,m}}, where i denotes each wavelet coefficient and m denotes each component in the mixture.
We update with the analogous equation:

θ^k = arg max_θ ∫ log f(w, c|θ) f(c|w, θ^{k−1}) dc

We need several tricks to complete the EM algorithm here, since we do not have an easy form for f(w, c|θ).
Training hidden Markov trees with EM
The main task is to estimate the marginal state distribution f(c_i = m|w, θ) and the parent–child joint distribution f(c_i = m, c_{ρ(i)} = n|w, θ).
Based on the conditional independence we have for the HMT (writing w_{T_i} for the coefficients in the subtree T_i rooted at node i, and w_{T̂_i} for the rest), we can write:

f(c_i = m, w|θ) = f(w_{T_i}|w_{T̂_i}, c_i = m, θ) f(c_i = m, w_{T̂_i}|θ) = f(w_{T_i}|c_i = m, θ) f(c_i = m, w_{T̂_i}|θ) = β_i(m) α_i(m);

and similarly,

f(c_i = m, c_{ρ(i)} = n, w|θ) = β_i(m) f(c_i = m|c_{ρ(i)} = n) α_{ρ(i)}(n) β_{ρ(i)\i}(n).

Since f(w|θ) = Σ_m f(c_i = m, w|θ) = Σ_m α_i(m) β_i(m), we have these distributions expressed in terms of the α’s and β’s.
Training hidden Markov trees with EM
For the computation, we follow the downward algorithm from coarse to fine levels to estimate the α’s, and the upward algorithm from fine to coarse levels to estimate the β’s, as described in the paper.
The M-step is simply conditional means, due to the Gaussian assumption.
Note the tricks to handle K trees and tying.
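For intuition about what these quantities represent, the state posteriors on a tiny tree can be checked by brute-force enumeration of the HMT likelihood f(w, c) = f(c_0) Π_i f(c_i|c_{ρ(i)}) Π_i f(w_i|c_i) (a toy sketch with made-up parameters; real implementations use the upward–downward recursions from the paper instead of enumeration):

```python
import numpy as np
from itertools import product

# Tiny 3-node HMT (root 0 with children 1 and 2), two hidden states.
# Hypothetical parameters; in practice these are learned with EM.
root_prob = np.array([0.6, 0.4])            # f(c_0)
trans = np.array([[0.8, 0.2],               # trans[n, m] = f(c_i = m | c_parent = n)
                  [0.3, 0.7]])
sigma = np.array([0.5, 3.0])                # per-state std of zero-mean f(w_i | c_i)
parent = {1: 0, 2: 0}
w = np.array([0.2, -4.0, 0.1])              # observed wavelet coefficients

def gauss(w_i, m):
    """Zero-mean Gaussian mixture-component density f(w_i | c_i = m)."""
    return np.exp(-w_i**2 / (2 * sigma[m]**2)) / (np.sqrt(2 * np.pi) * sigma[m])

# Brute force over all 2^3 joint state assignments.
joint = {}
for c in product(range(2), repeat=3):
    p = root_prob[c[0]]
    for i in (1, 2):
        p *= trans[c[parent[i]], c[i]]
    for i in range(3):
        p *= gauss(w[i], c[i])
    joint[c] = p

f_w = sum(joint.values())                   # f(w) = sum_c f(w, c)
post1 = sum(p for c, p in joint.items() if c[1] == 1) / f_w
print(post1)  # f(c_1 = 1 | w) is close to 1, since w_1 = -4 is a 'large' coefficient
```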
Coming back to the denoising problem...
With the EM-trained parameters, including f(c_i = m|w, θ), the σ_{i,m}’s, and σ_n, the estimation of the signal is as simple as solving the conditional mean estimates:

E(y_i|w, θ) = Σ_m f(c_i = m|w, θ) · σ²_{i,m} / (σ²_{i,m} + σ²_n) · w_i
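The estimator above is a posterior-weighted Wiener shrinkage of each coefficient. A minimal vectorized sketch (the array shapes and all numbers here are illustrative assumptions):

```python
import numpy as np

def hmt_conditional_mean(w, post, sig2, sig2_n):
    """Conditional-mean denoising estimate:
    E[y_i | w] = sum_m f(c_i = m | w) * sig2[i, m] / (sig2[i, m] + sig2_n) * w_i

    w:      (N,)   noisy wavelet coefficients
    post:   (N, M) posterior state probabilities f(c_i = m | w, theta)
    sig2:   (N, M) per-state signal variances; sig2_n: noise variance
    """
    shrink = sig2 / (sig2 + sig2_n)          # per-state Wiener gains
    return (post * shrink).sum(axis=1) * w

# Hypothetical numbers: two coefficients, two states ("small", "large").
w = np.array([0.5, 8.0])
post = np.array([[0.95, 0.05],               # likely in the "small" state
                 [0.02, 0.98]])              # likely in the "large" state
sig2 = np.array([[0.1, 9.0], [0.1, 9.0]])
y_hat = hmt_conditional_mean(w, post, sig2, sig2_n=1.0)
print(y_hat)  # the small coefficient is shrunk strongly, the large one kept
```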
Image segmentation
2D hidden Markov trees
Similar setting as in the 1D signal model.
Differences:
Subband independence: f(w|Θ) = f(w^{LH}|Θ^{LH}) f(w^{HL}|Θ^{HL}) f(w^{HH}|Θ^{HH}) (scaling); this leads to a different expansion of the α’s and β’s;
Context-based interscale fusion: prior f(c) via a context vector;
A different EM algorithm.
Image segmentation examples (figures)
Extended hidden Markov trees
Geometric hidden Markov trees:
Modeling contours explicitly;
Hidden state space: c_i = {d_m, θ_m}
New conditional distribution of wavelet coefficients: f(w_i|c_i) ∝ exp(−dist(w_i, e_m)² / (2σ²_g)), where e_m is the response for edges with fixed distance d_m and angle θ_m (filter banks)
New transition probability: f(n|m) ∝ exp(−HD(l_m, l_n)), where HD(l_m, l_n) is the Hausdorff distance between the lines determined by distance and angle, restricted to a square in the plane.
Take home message
Know available tools;
Do not force one tool for every problem;
Have a right model and appropriate assumptions;
Work hard to find the simplest (elegant) solution.