Hidden Markov Trees for Statistical Signal/Image Processing


Hidden Markov Trees
for Statistical Signal/Image Processing
Xiaoning Qian
ECEN689–613 Probability Models
Texas A&M University
Part I
Papers
M. S. Crouse, R. D. Nowak, R. G. Baraniuk, "Wavelet-Based Statistical Signal Processing Using Hidden Markov Models," IEEE Transactions on Signal Processing, 46(4), 1998.
H. Choi, R. G. Baraniuk, "Multiscale Image Segmentation Using Wavelet-Domain Hidden Markov Models," IEEE Transactions on Image Processing, 10(9), 2001.
J. Romberg, M. Wakin, H. Choi, R. G. Baraniuk, "A Geometric Hidden Markov Tree Wavelet Model," dsp.rice.edu, 2003.
Wavelet
Part II
Wavelet Transform
Wavelet
What is a wavelet?
Wikipedia: A wavelet series representation of a square-integrable function is with respect to either a complete, orthonormal set of basis functions, or an overcomplete set, or frame, of a vector space (also known as a Riesz basis), for the Hilbert space of square-integrable functions.
Wavelet
What is a wavelet?
The main idea of wavelets comes from function representation. Wavelets are closely related to multiscale/multiresolution analysis: decompose a function into different scales/frequencies and study each component with a resolution that matches its scale.
Wavelets are a class of functions used to localize a given function in both space and scale/frequency.
For more information:
http://www.amara.com/current/wavelet.html
Wavelet
An example – Haar basis
Example
Haar wavelet: the wavelet function (mother wavelet) ψ(t) and the scaling function (father wavelet) φ(t):

\[
\psi(t) = \begin{cases} 1 & 0 \le t < 1/2 \\ -1 & 1/2 \le t < 1 \\ 0 & \text{otherwise} \end{cases},
\qquad
\phi(t) = \begin{cases} 1 & 0 \le t < 1 \\ 0 & \text{otherwise} \end{cases}.
\]

"Daughter" wavelets: \( \psi_{a,b}(t) = \frac{1}{\sqrt{|a|}}\, \psi\!\left(\frac{t-b}{a}\right) \), where a is the scale and b the shift;
\( \psi_{J,K}(t) = \psi(2^J t - K) \).

Multi-dimensional wavelets – tensor products of 1-dimensional wavelets.
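To make the definitions concrete, here is a minimal Python sketch (not from the original slides) that evaluates the Haar mother and father wavelets and a scaled/shifted daughter wavelet; the function names and sample points are illustrative only.

```python
import numpy as np

def haar_psi(t):
    """Haar mother wavelet: 1 on [0, 1/2), -1 on [1/2, 1), 0 elsewhere."""
    t = np.asarray(t, dtype=float)
    return np.where((t >= 0) & (t < 0.5), 1.0,
           np.where((t >= 0.5) & (t < 1.0), -1.0, 0.0))

def haar_phi(t):
    """Haar father (scaling) function: 1 on [0, 1), 0 elsewhere."""
    t = np.asarray(t, dtype=float)
    return np.where((t >= 0) & (t < 1.0), 1.0, 0.0)

def daughter(t, a, b):
    """Scaled/shifted wavelet psi_{a,b}(t) = |a|^{-1/2} psi((t - b) / a)."""
    return haar_psi((np.asarray(t, dtype=float) - b) / a) / np.sqrt(abs(a))

# psi_{J,K}(t) = psi(2^J t - K) corresponds to a = 2^{-J}, b = K * 2^{-J}
# (up to the 1/sqrt(|a|) normalization).
t = np.linspace(-1, 2, 7)
print(daughter(t, a=0.5, b=0.5))  # J = 1, K = 1
```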
Wavelet
An example – Haar basis
Example (figure)
Wavelet
Why wavelet?
Wavelets are localized in both space and frequency, whereas the standard Fourier transform is only localized in frequency.
Multiscale analysis
Lower computational complexity
...
Wavelet
Wavelet transform
Continuous wavelet transform (CWT):
\[
z(t) = \int_{\mathbb{R}} W_\psi\{z\}(a,b)\, \psi_{a,b}(t)\, db
\]
\[
W_\psi\{z\}(a,b) = \int_{\mathbb{R}} z(t)\, \psi^*_{a,b}(t)\, dt
\]
\[
\int_{\mathbb{R}} \psi_{a,b}(t)\, \psi^*_{c,d}(t)\, dt = \delta_{ac}\, \delta_{bd}
\]
Wavelet
Wavelet transform
Discrete wavelet transform (DWT):
\[
z(t) = \sum_{K} u_K\, \phi_{J_0,K}(t) + \sum_{J=-\infty}^{J_0} \sum_{K} w_{J,K}\, \psi_{J,K}(t)
\]
\[
w_{J,K} = \int z(t)\, \psi^*_{J,K}(t)\, dt
\]
\[
\int \psi_{J',K'}(t)\, \psi^*_{J,K}(t)\, dt = \delta_{JJ'}\, \delta_{KK'}
\]
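A minimal sketch of the DWT for the Haar case (not from the slides): each level splits the current approximation into coarser approximation coefficients u and detail coefficients w. The test signal and number of levels are illustrative.

```python
import numpy as np

def haar_dwt_level(x):
    """One level of the orthonormal Haar DWT: returns (approximation, detail)."""
    x = np.asarray(x, dtype=float)
    u = (x[0::2] + x[1::2]) / np.sqrt(2.0)   # scaling (approximation) coefficients
    w = (x[0::2] - x[1::2]) / np.sqrt(2.0)   # wavelet (detail) coefficients
    return u, w

def haar_dwt(x, levels):
    """Multilevel Haar DWT: returns the coarsest approximation and the detail bands
    (finest level first). The input length must be divisible by 2**levels."""
    details = []
    approx = np.asarray(x, dtype=float)
    for _ in range(levels):
        approx, w = haar_dwt_level(approx)
        details.append(w)
    return approx, details

# Example: length-8 signal, 3 levels of decomposition.
u, ws = haar_dwt([4, 6, 10, 12, 8, 6, 5, 5], levels=3)
print(u, [w.tolist() for w in ws])
```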
Wavelet
Properties for wavelet transform
Locality:Each wavelet is localized simultaneously in space
and frequency.
Multiresolution:Wavelets are compressed and dilated to
analyze at a nested set of scales.
Compression:The wavelet transforms of real-world signals
tend to be sparse.
Wavelet
“Secondary” properties may be useful.
Clustering: If a particular wavelet coefficient is large/small, adjacent coefficients are very likely to also be large/small.
Persistence: Large/small values of wavelet coefficients tend to propagate across scales.
Application
Part III
Signal processing problems with wavelet
applications
Application
Denoising or signal detection
Example (figure)
Application
Denoising or signal detection
Note: the signal model is in the wavelet domain.
Signal model:
\[
w_i^k = y_i^k + n_i^k,
\]
where \( w_i^k \) is the i-th wavelet coefficient obtained by transforming the k-th sample. The task of denoising or detection is to estimate \( y_i^k \).
The traditional assumption is that the coefficients follow independent Gaussian distributions. With \( n_i \) being white noise, adaptive thresholding is enough for denoising, based on the "compression" property.
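A minimal, self-contained sketch of this idea (one-level Haar transform with simple hard thresholding; the threshold value and test signal are illustrative, not from the slides):

```python
import numpy as np

def haar_denoise(x, threshold):
    """One-level Haar wavelet denoising by hard thresholding.
    Relies on the "compression" property: signal energy concentrates in a few
    large detail coefficients, while white noise spreads over all of them."""
    x = np.asarray(x, dtype=float)
    u = (x[0::2] + x[1::2]) / np.sqrt(2.0)        # approximation coefficients
    w = (x[0::2] - x[1::2]) / np.sqrt(2.0)        # detail coefficients
    w = np.where(np.abs(w) > threshold, w, 0.0)   # keep only large details
    y = np.empty_like(x)                          # inverse Haar transform
    y[0::2] = (u + w) / np.sqrt(2.0)
    y[1::2] = (u - w) / np.sqrt(2.0)
    return y

rng = np.random.default_rng(0)
clean = np.repeat([2.0, 8.0], 8)                  # piecewise-constant test signal
noisy = clean + 0.4 * rng.standard_normal(clean.size)
print(haar_denoise(noisy, threshold=0.8))
```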
Application
Image segmentation
Example (figure)
Application
Image segmentation
Modeling the statistical dependency in images.
Image model: \( f(x_r \mid c_i) \), where the \( c_i \) are the labels for different objects in an image and the \( x_r \) are image regions with the same label; \( c = \{c_i, \forall i\} \) can be considered a random field while \( x_r \) is the observation.
The model for c can be considered as prior knowledge.
Maximum likelihood segmentation: \( \max_c \prod_r f(x_r \mid c) \)
Maximum a posteriori segmentation: \( \max_c \left[\prod_r f(x_r \mid c)\right] f(c) \)
Note: the model can be either in the image domain or the wavelet domain; a small sketch follows.
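A minimal sketch of pixelwise maximum-likelihood labeling under Gaussian class models (the class means/variances and toy image are illustrative; a MAP version would additionally weight each labeling by a prior f(c), e.g. a Markov random field):

```python
import numpy as np

def ml_segment(image, means, variances):
    """Pixelwise maximum-likelihood segmentation: assign each pixel the label
    whose Gaussian likelihood f(x | c) is largest. No spatial prior f(c) is
    used here, so this is the ML (not MAP) rule."""
    x = np.asarray(image, dtype=float)[..., None]          # shape (H, W, 1)
    mu = np.asarray(means, dtype=float)                    # shape (K,)
    var = np.asarray(variances, dtype=float)               # shape (K,)
    log_lik = -0.5 * np.log(2 * np.pi * var) - (x - mu) ** 2 / (2 * var)
    return np.argmax(log_lik, axis=-1)                     # label per pixel

# Toy example: two classes (dark object ~ 0.2, bright background ~ 0.8).
img = np.array([[0.1, 0.2, 0.9],
                [0.3, 0.8, 0.7],
                [0.2, 0.9, 0.8]])
print(ml_segment(img, means=[0.2, 0.8], variances=[0.01, 0.01]))
```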
Application
Multiscale image segmentation
Multiscale image segmentation: segmentation at different window sizes.
Note: the model in multiscale segmentation is again in the wavelet domain; the label random field has a quadtree structure.
Different statistical properties of wavelet coefficients correspond to different image regions.
Singularity structures (edges) have large wavelet coefficients (useful for heterogeneous regions).
Application
Multiscale image segmentation
Example (figure)
Application
Basic assumptions in these applications
Independent Gaussian for wavelet coefficients
Better assumptions?
Secondary properties?
HMT
Part IV
Hidden Markov Trees
HMT
Graphical models as probability models
General settings: c – random field (latent/hidden variables); x – observations.
Independent c: \( f(c_i) \) and \( f(x \mid c) = \prod_i f(x_i \mid c_i) \)
Markov random field (hidden Markov model): \( f(c_i \mid x, c) = f(c_i \mid N_i) \) and \( f(x \mid c) = \prod_i f(x_i \mid c_i) \)
Conditional random field: \( f(c_i \mid x, c) = f(c_i \mid x, N_i) \)
HMT
Independent c
Simplest assumption: the c's are all independent: \( f(c_i) \) and \( f(x \mid c) = \prod_i f(x_i \mid c_i) \)
Classification algorithms
HMT
Hidden Markov chain model
c follows a Markov chain structure:
\( f(c_i \mid x, c) = f(c_i \mid c_{i-1}, c_{i+1}) \)
EM algorithms
HMT
More general hidden Markov model
c has a complex neighbor structure:
\( f(c_i \mid x, c) = f(c_i \mid c_{i-2}, c_{i-1}, c_{i+1}, c_{i+2}) \)
EM algorithms
HMT
Conditional random field
c has a Markov structure globally conditioned on x:
\( f(c_i \mid x, c) = f(c_i \mid x, N_i) \)
We usually assume that the probability (or transition function
and state function) has some special form.
Belief propagation algorithms
HMT
Graphical model in the image domain
Independent model for homogeneous image regions
Simple classifiers for pixel intensities
(figure: image pixels and their hidden states)
HMT
Graphical model in the image domain
Markov random field for noisy images or texture images
Adding prior on hidden states for “neighbors”
(figure: image pixels and their hidden states)
HMT
Graphical model in the image domain
Conditional random field for more complicated appearance
(figure: image pixels and their hidden states)
HMT
Graphical model in the image domain
Hidden random field for image regions with different parts
HMT
Graphical model in the image domain
Hidden random field for image regions with different parts
(figure: image pixels with hidden states for parts and hidden states for regions)
HMT
What is an appropriate model?
Tradeoff between accuracy and complexity.
Small sample size
Overfitting...
HMT
Hidden Markov trees for wavelet coefficients
Residual dependency structure (nested structure) – the "secondary" properties;
A model that reflects these properties would be appropriate: flexible but not too complicated;
Nested multiscale graph (tree, to be specific) model:
HMT
Independent mixture model for wavelet coefficients
A mixture model provides an appropriate approximation for non-Gaussian real-world signals.
HMT
Hidden Markov chain model for wavelet coefficients
Hidden Markov chain at the same scale:
HMT
Hidden Markov tree for wavelet coefficients
Dependence across scales according to the “secondary
properties” of wavelet coefficients:
HMT
Hidden Markov tree for images
HMT
Probabilities in hidden Markov trees
For a single wavelet coefficient, since real-world signals are generally non-Gaussian, we model it with a mixture model:
\[
f(w) = \sum_m f(w \mid c = m)\, f(c = m)
\]
Independent mixture model; hidden Markov chain model; hidden Markov tree model:
for the tree root \( c_0 \): \( f(c_0) \);
for tree nodes other than the root – transition probability: \( f(c_i \mid c_{\rho(i)}) \), where ρ(i) denotes the parent node of i.
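A minimal sketch of this generative structure for a small binary tree with two hidden states per node (all numeric values are illustrative; node 0 is the root and ρ(i) = (i-1)//2, with zero-mean Gaussians per state):

```python
import numpy as np

rng = np.random.default_rng(0)

# Two hidden states per node: 0 = "small" coefficient, 1 = "large" coefficient.
root_prior = np.array([0.6, 0.4])               # f(c_0)
transition = np.array([[0.9, 0.1],              # f(c_i = n | c_rho(i) = m), rows = parent state
                       [0.2, 0.8]])
sigma = np.array([0.1, 1.0])                    # per-state std devs of the zero-mean mixture

def sample_hmt(num_nodes):
    """Draw hidden states down the tree, then a wavelet coefficient per node:
    f(w_i | c_i = m) = N(0, sigma_m^2)."""
    states = np.empty(num_nodes, dtype=int)
    states[0] = rng.choice(2, p=root_prior)
    for i in range(1, num_nodes):
        parent = (i - 1) // 2                    # rho(i) for a binary tree stored in an array
        states[i] = rng.choice(2, p=transition[states[parent]])
    coeffs = rng.normal(0.0, sigma[states])
    return states, coeffs

print(sample_hmt(7))
```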
HMT
Parameters in HMT
\( f(c_0) \), \( f(c_i \mid c_{\rho(i)}) \);
mixture means and variances: \( \mu_{i,m}, \sigma^2_{i,m} \)
Notice the conditional independence properties of the model.
HMT
Problems of HMT
As with all graphical models, we need to solve:
Training the model;
Computing the likelihood of the given observations;
Estimating the latent/hidden states.
HMT
Expectation–Maximization algorithm
General settings for estimation: \( \max_\theta f(x \mid \theta) \) (ML) or \( \max_\theta f(\theta \mid x) \Leftrightarrow \max_\theta f(x \mid \theta)\, f(\theta) \) (MAP).
The EM algorithm provides a greedy, iterative way to solve the general estimation problem based on the hidden/latent variables c.
\( \log f(x \mid \theta) = \log f(x, c \mid \theta) - \log f(c \mid x, \theta) \). Since this is an iterative algorithm, we take the expectation with respect to c under the previously estimated parameters \( \theta^{k-1} \):
\[
\int \log f(x \mid \theta)\, f(c \mid x, \theta^{k-1})\, dc = \int \log f(x, c \mid \theta)\, f(c \mid x, \theta^{k-1})\, dc - \int \log f(c \mid x, \theta)\, f(c \mid x, \theta^{k-1})\, dc
\]
HMT
Expectation–Maximization algorithm
Jensen's inequality:
\[
\int \log f(c \mid x, \theta)\, f(c \mid x, \theta^{k-1})\, dc \le \int \log f(c \mid x, \theta^{k-1})\, f(c \mid x, \theta^{k-1})\, dc
\]
To guarantee an increase of the likelihood \( \log f(x \mid \theta) \), we only need to solve:
\[
\theta^k = \arg\max_\theta \int \log f(x, c \mid \theta)\, f(c \mid x, \theta^{k-1})\, dc
\]
Hence, the E-step computes \( f(c \mid x, \theta^{k-1}) \); the M-step solves the above optimization problem.
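To ground the E-step/M-step split, here is a minimal EM sketch for the simplest case mentioned earlier: an independent two-component, zero-mean Gaussian mixture over wavelet coefficients (no tree dependencies; the initialization and synthetic data are illustrative):

```python
import numpy as np

def em_gaussian_mixture(w, iters=50):
    """EM for w_i ~ sum_m pi_m N(0, sigma_m^2), i.i.d. coefficients.
    E-step: posterior state probabilities; M-step: closed-form updates."""
    w = np.asarray(w, dtype=float)
    pi = np.array([0.5, 0.5])                    # mixture weights
    var = np.array([0.1, 1.0]) * np.var(w)       # illustrative initialization
    for _ in range(iters):
        # E-step: responsibilities f(c_i = m | w_i, theta^{k-1}).
        lik = pi / np.sqrt(2 * np.pi * var) * np.exp(-w[:, None] ** 2 / (2 * var))
        resp = lik / lik.sum(axis=1, keepdims=True)
        # M-step: maximize the expected complete-data log-likelihood.
        pi = resp.mean(axis=0)
        var = (resp * w[:, None] ** 2).sum(axis=0) / resp.sum(axis=0)
    return pi, var

rng = np.random.default_rng(1)
w = np.concatenate([rng.normal(0, 0.1, 800), rng.normal(0, 1.0, 200)])
print(em_gaussian_mixture(w))
```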
HMT
Training hidden Markov trees with EM
In the HMT, \( \theta = \{ f(c_0),\ f(c_i \mid c_{\rho(i)}),\ \mu_{i,m},\ \sigma^2_{i,m} \} \), where i indexes the wavelet coefficients and m the components of the mixture.
Update via the analogous equation:
\[
\theta^k = \arg\max_\theta \int \log f(w, c \mid \theta)\, f(c \mid w, \theta^{k-1})\, dc
\]
We need several tricks to complete the EM algorithm here, since we do not have an easy form for \( f(w, c \mid \theta) \).
HMT
Training hidden Markov trees with EM
The main task is to estimate the marginal state distribution \( f(c_i = m \mid w, \theta) \) and the parent-child joint distribution \( f(c_i = m, c_{\rho(i)} = n \mid w, \theta) \).
Based on the conditional independence properties of the HMT, we can write:
\[
f(c_i = m, w \mid \theta) = f(w_{T_i} \mid w_{\hat{T}_i}, c_i = m, \theta)\, f(c_i = m, w_{\hat{T}_i} \mid \theta) = f(w_{T_i} \mid c_i = m, \theta)\, f(c_i = m, w_{\hat{T}_i} \mid \theta) = \beta_i(m)\, \alpha_i(m);
\]
and similarly,
\[
f(c_i = m, c_{\rho(i)} = n, w \mid \theta) = \beta_i(m)\, f(c_i = m \mid c_{\rho(i)} = n)\, \alpha_{\rho(i)}(n)\, \beta_{\rho(i)\setminus i}(n).
\]
Since \( f(w \mid \theta) = \sum_m f(c_i = m, w \mid \theta) = \sum_m \alpha_i(m)\, \beta_i(m) \), we can express these distributions in terms of the α's and β's.
HMT
Training hidden Markov trees with EM
For the computation, we follow the downward algorithm from coarse to fine levels to estimate the α's and the upward algorithm from fine to coarse levels to estimate the β's, as described in the paper; a small sketch follows below.
The M-step is simply conditional means, due to the Gaussian assumption.
Note the tricks used to handle K trees and tying.
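A minimal sketch of these upward (β) and downward (α) recursions for a two-state HMT on a small binary tree, following the quantities defined on the previous slide (zero-mean Gaussians per state; all parameter values are illustrative, and no log-domain scaling tricks are included):

```python
import numpy as np

def upward_downward(w, root_prior, transition, sigma):
    """Posterior state marginals f(c_i = m | w, theta) for a two-state HMT on a
    binary tree stored in an array (root at index 0, children of i at 2i+1, 2i+2).
    beta_i(m) = f(w_{T_i} | c_i = m): computed upward (fine to coarse);
    alpha_i(m) = f(c_i = m, w_{T-hat_i}): computed downward (coarse to fine)."""
    w = np.asarray(w, dtype=float)
    n = w.size
    # Per-node, per-state Gaussian likelihoods g(w_i; 0, sigma_m^2), shape (n, 2).
    g = np.exp(-w[:, None] ** 2 / (2 * sigma ** 2)) / np.sqrt(2 * np.pi * sigma ** 2)

    beta = np.zeros((n, 2))
    beta_to_parent = np.zeros((n, 2))   # beta_{i->rho(i)}(m) = sum_n f(c_i=n|c_rho=m) beta_i(n)
    for i in reversed(range(n)):        # upward pass: leaves first
        beta[i] = g[i]
        for j in (2 * i + 1, 2 * i + 2):
            if j < n:
                beta[i] = beta[i] * beta_to_parent[j]
        beta_to_parent[i] = transition @ beta[i]   # transition[m, n] = f(c_child=n | c_parent=m)

    alpha = np.zeros((n, 2))
    alpha[0] = root_prior               # alpha at the root is just f(c_0)
    for i in range(1, n):               # downward pass: root first
        p = (i - 1) // 2                # parent rho(i)
        alpha[i] = transition.T @ (alpha[p] * beta[p] / beta_to_parent[i])

    post = alpha * beta                 # f(c_i = m, w | theta) = alpha_i(m) beta_i(m)
    return post / post.sum(axis=1, keepdims=True)

root_prior = np.array([0.6, 0.4])
transition = np.array([[0.9, 0.1], [0.2, 0.8]])
sigma = np.array([0.1, 1.0])
print(upward_downward([0.05, 1.3, -0.02, 0.9, -1.1, 0.01, 0.03], root_prior, transition, sigma))
```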
HMT
Coming back to the denoising problem...
With the EM-trained parameters, including \( f(c_i = m \mid w, \theta) \), the \( \sigma_{c_i} \)'s, and \( \sigma_n \), estimating the signal is as simple as computing the conditional mean:
\[
E(y_i \mid w, \theta) = \sum_m f(c_i = m \mid w, \theta)\, \frac{\sigma^2_{i,m}}{\sigma^2_{i,m} + \sigma^2_n}\, w_i
\]
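A minimal sketch of this shrinkage rule, assuming the per-node posteriors f(c_i = m | w, θ) have already been obtained (e.g. from the upward-downward pass above); all array values are illustrative:

```python
import numpy as np

def conditional_mean_denoise(w, post, sigma_signal, sigma_noise):
    """E(y_i | w, theta) = sum_m f(c_i=m | w, theta) * s_m^2 / (s_m^2 + sn^2) * w_i,
    i.e. posterior-weighted Wiener-style shrinkage of each wavelet coefficient."""
    w = np.asarray(w, dtype=float)                       # noisy coefficients, shape (N,)
    post = np.asarray(post, dtype=float)                 # posteriors, shape (N, M)
    gain = sigma_signal ** 2 / (sigma_signal ** 2 + sigma_noise ** 2)   # shape (M,)
    return (post * gain).sum(axis=1) * w

w = np.array([0.05, 1.3, -0.02])
post = np.array([[0.95, 0.05], [0.10, 0.90], [0.97, 0.03]])   # f(c_i = m | w, theta)
print(conditional_mean_denoise(w, post, sigma_signal=np.array([0.1, 1.0]), sigma_noise=0.3))
```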
HMT
Image segmentation
2D hidden Markov trees
Similar setting as in the 1D signal model.
Differences:
Subband independence: \( f(w \mid \Theta) = f(w^{LH} \mid \Theta^{LH})\, f(w^{HL} \mid \Theta^{HL})\, f(w^{HH} \mid \Theta^{HH}) \) (scaling);
Leads to different expansions of the α's and β's;
Context-based interscale fusion: prior f(c) via a context vector
Different EM
HMT
Image segmentation (example figures)
HMT
Extended hidden Markov trees
Geometric hidden Markov trees:
Modeling contours explicitly;
Hidden state space: \( c_i = \{ d_m, \theta_m \} \)
New conditional distribution of wavelet coefficients: \( f(w_i \mid c_i) \propto \exp\!\left(-\operatorname{dist}(w_i, e_m)^2 / (2\sigma^2_g)\right) \), where \( e_m \) is the response for edges with fixed distance \( d_m \) and angle \( \theta_m \) (filter banks)
New transition probability: \( f(n \mid m) \propto \exp(-HD(l_m, l_n)) \), where \( HD(l_m, l_n) \) is the Hausdorff distance between the lines determined by the distances and angles, restricted to a square in the plane.
HMT
Take home message
Know available tools;
Do not force one tool for every problem;
Have the right model and appropriate assumptions;
Work hard
to find the simplest (elegant) solution.