Hidden Markov Trees

for Statistical Signal/Image Processing

Xiaoning Qian

ECEN689–613 Probability Models

Texas A&M University

Part I

Papers

M. S. Crouse, R. D. Nowak, R. G. Baraniuk, "Wavelet-Based Statistical Signal Processing Using Hidden Markov Models," IEEE Trans. Signal Processing, 46(4), 1998.

H. Choi, R. G. Baraniuk, "Multiscale Image Segmentation Using Wavelet-Domain Hidden Markov Models," IEEE Trans. Image Processing, 10(9), 2001.

J. Romberg, M. Wakin, H. Choi, R. G. Baraniuk, "A Geometric Hidden Markov Tree Wavelet Model," dsp.rice.edu, 2003.

Wavelet

Part II

Wavelet Transform


What is a wavelet?

Wikipedia: a wavelet series representation of a square-integrable function is with respect to either a complete, orthonormal set of basis functions, or an overcomplete set or frame of a vector space (also known as a Riesz basis), for the Hilbert space of square-integrable functions.


What is a wavelet?

The main idea of wavelets comes from function representation. Wavelets are closely related to multiscale/multiresolution analysis: decompose a function into different scales/frequencies and study each component with a resolution that matches its scale.

Wavelets are a class of functions used to localize a given function in both space and scale/frequency.

For more information:

http://www.amara.com/current/wavelet.html


An example – Haar basis

Example

Haar wavelet: the wavelet function (mother wavelet) ψ(t) and the scaling function (father wavelet) φ(t):

ψ(t) = 1 for 0 ≤ t < 1/2; −1 for 1/2 ≤ t < 1; 0 otherwise.

φ(t) = 1 for 0 ≤ t < 1; 0 otherwise.

“Daughter” wavelets: ψ_{a,b}(t) = (1/√|a|) ψ((t − b)/a), where a is the scale and b the shift; on the dyadic grid, ψ_{J,K}(t) = ψ(2^J t − K).

Multi-dimensional wavelets: tensor products of 1-dimensional wavelets.
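These definitions are easy to check numerically. A minimal Python/NumPy sketch (function names are ours), evaluating the Haar pair and the dyadic daughters; the 2^{J/2} normalization factor, omitted in the slide's ψ_{J,K}, is included here so that the family is unit-norm:

```python
import numpy as np

def psi(t):
    """Haar mother wavelet: 1 on [0, 1/2), -1 on [1/2, 1), 0 elsewhere."""
    return np.where((t >= 0) & (t < 0.5), 1.0,
                    np.where((t >= 0.5) & (t < 1.0), -1.0, 0.0))

def phi(t):
    """Haar scaling (father) function: 1 on [0, 1), 0 elsewhere."""
    return np.where((t >= 0) & (t < 1.0), 1.0, 0.0)

def psi_jk(t, J, K):
    """Dyadic daughter psi_{J,K}(t) = 2^{J/2} psi(2^J t - K).
    (The slide writes psi(2^J t - K); the 2^{J/2} factor makes it unit-norm.)"""
    return 2.0 ** (J / 2) * psi(2.0 ** J * t - K)
```

Numerical integration on a fine grid confirms ∫ψ² = 1, ∫ψφ = 0, and that daughters at different shifts are orthogonal.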

An example – Haar basis (figure)


Why wavelet?

Wavelets are localized in both space and frequency, whereas the standard Fourier transform is localized only in frequency.

Multiscale analysis

Less computationally complex

...


Wavelet transform

Continuous wavelet transform (CWT):

z(t) = ∫_R W_ψ{z}(a, b) ψ_{a,b}(t) db

W_ψ{z}(a, b) = ∫_R z(t) ψ*_{a,b}(t) dt

∫_R ψ_{a,b}(t) ψ*_{c,d}(t) dt = δ_{ac} δ_{bd}


Wavelet transform

Discrete wavelet transform (DWT):

z(t) = Σ_K u_K φ_{J_0,K}(t) + Σ_{J=−∞}^{J_0} Σ_K w_{J,K} ψ_{J,K}(t)

w_{J,K} = ∫ z(t) ψ*_{J,K}(t) dt

∫ ψ_{J′,K′}(t) ψ*_{J,K}(t) dt = δ_{JJ′} δ_{KK′}
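For the Haar basis these formulas reduce to pairwise sums and differences. A minimal NumPy sketch of one analysis/synthesis level (function names are ours), illustrating perfect reconstruction and the energy preservation that follows from orthonormality:

```python
import numpy as np

def haar_dwt(z):
    """One level of the orthonormal Haar DWT.
    Returns scaling coefficients u and wavelet coefficients w."""
    z = np.asarray(z, dtype=float)
    u = (z[0::2] + z[1::2]) / np.sqrt(2.0)  # local averages -> coarse scale
    w = (z[0::2] - z[1::2]) / np.sqrt(2.0)  # local differences -> detail
    return u, w

def haar_idwt(u, w):
    """Inverse of haar_dwt (perfect reconstruction)."""
    z = np.empty(2 * len(u))
    z[0::2] = (u + w) / np.sqrt(2.0)
    z[1::2] = (u - w) / np.sqrt(2.0)
    return z
```

Because the transform is orthonormal, Σu² + Σw² = Σz²; iterating haar_dwt on u yields the full multiscale decomposition.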


Properties for wavelet transform

Locality: each wavelet is localized simultaneously in space and frequency.

Multiresolution: wavelets are compressed and dilated to analyze at a nested set of scales.

Compression: the wavelet transforms of real-world signals tend to be sparse.


“Secondary” properties may be useful.

Clustering: if a particular wavelet coefficient is large/small, adjacent coefficients are very likely to also be large/small.

Persistence: large/small values of wavelet coefficients tend to propagate across scales.

Application

Part III

Signal processing problems with wavelet

applications


Denoising or signal detection (figure)


Denoising or signal detection

Note: the signal model is in the wavelet domain.

Signal model: w_i^k = y_i^k + n_i^k,

where w_i^k is the i-th wavelet coefficient obtained by transforming the k-th sample. The task of denoising or detection is to estimate y_i^k.

The traditional assumption is that the coefficients follow independent Gaussian distributions. When n_i is white noise, adaptive thresholding based on the “compression” property is enough for denoising.
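A sketch of the standard thresholding rules under this assumption (function names are ours); the universal threshold σ√(2 log N) of Donoho and Johnstone is one common choice, motivated by the fact that i.i.d. Gaussian noise rarely exceeds it while large signal coefficients do:

```python
import numpy as np

def soft_threshold(w, t):
    """Soft thresholding: sign(w) * max(|w| - t, 0)."""
    return np.sign(w) * np.maximum(np.abs(w) - t, 0.0)

def hard_threshold(w, t):
    """Hard thresholding: keep w where |w| > t, zero it elsewhere."""
    return np.where(np.abs(w) > t, w, 0.0)

def universal_threshold(sigma_n, n):
    """Donoho-Johnstone universal threshold sigma_n * sqrt(2 log n)."""
    return sigma_n * np.sqrt(2.0 * np.log(n))
```

In practice one transforms the noisy samples, thresholds the detail coefficients, and inverts the transform.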


Image segmentation (figure)


Image segmentation

Modeling the statistical dependency in images.

Image model: f(x_r | c_i), where the c_i are the labels for different objects in an image and the x_r are image regions sharing the same label; c = {c_i, ∀i} can be considered a random field, while x_r is the observation.

The model for c can be considered prior knowledge.

Maximum likelihood segmentation: max_c Π_r f(x_r | c)

Maximum a posteriori segmentation: max_c Π_r f(x_r | c) f(c)

Note: the model can be either in the image domain or the wavelet domain.
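A toy numerical contrast between the two criteria, with hypothetical likelihoods and a label prior that factorizes over regions (a simplification of the random-field prior f(c)): a strong prior can flip labels that the likelihood alone would assign differently.

```python
import numpy as np

# Hypothetical numbers: 3 regions, 2 labels.
# lik[r, m] = f(x_r | c = m): likelihood of region r under label m.
lik = np.array([[0.60, 0.40],
                [0.30, 0.70],
                [0.45, 0.55]])
prior = np.array([0.8, 0.2])  # f(c = m), assumed independent per region

ml_labels = np.argmax(lik, axis=1)           # max_c f(x_r | c)
map_labels = np.argmax(lik * prior, axis=1)  # max_c f(x_r | c) f(c)
```

Here ML assigns labels [0, 1, 1], while the strong prior on label 0 pulls the MAP assignment to [0, 0, 0] for the two ambiguous regions.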


Multiscale image segmentation

Multiscale image segmentation: window size.

Note: the model in multiscale segmentation is again in the wavelet domain; the label random field has a quadtree structure.

Diﬀerent statistical properties for wavelet coeﬃcients

correspond to diﬀerent image regions.

Singularity structures (edges) have large wavelet coeﬃcients

(useful for heterogeneous regions).


Multiscale image segmentation (figure)


Basic assumptions in these applications

Independent Gaussian for wavelet coefficients.

Better assumptions? Secondary properties?

HMT

Part IV

Hidden Markov Trees


Graphical models as probability models

General setting: c – random field (latent/hidden variables); x – observations.

Independent c: f(c_i), and f(x|c) = Π_i f(x_i|c_i)

Markov random field (hidden Markov model): f(c_i|x, c) = f(c_i|N_i), and f(x|c) = Π_i f(x_i|c_i)

Conditional random field: f(c_i|x, c) = f(c_i|x, N_i)


Independent c

Simplest assumption: the c's are all independent: f(c_i), and f(x|c) = Π_i f(x_i|c_i)

Classiﬁcation algorithms


Hidden Markov chain model

c follows a Markov chain structure: f(c_i|x, c) = f(c_i|c_{i−1}, c_{i+1})

EM algorithms


More general hidden Markov model

c has a more complex neighborhood structure: f(c_i|x, c) = f(c_i|c_{i−2}, c_{i−1}, c_{i+1}, c_{i+2})

EM algorithms


Conditional random ﬁeld

c has a Markov structure globally conditioned on x: f(c_i|x, c) = f(c_i|x, N_i)

We usually assume that the probability (or transition function

and state function) has some special form.

Belief propagation algorithms


Graphical model in the image domain

Independent model for homogeneous image regions

Simple classiﬁers for pixel intensities

Image

Pixel

Hidden state


Graphical model in the image domain

Markov random ﬁeld for noisy images or texture images

Adding prior on hidden states for “neighbors”

Image

Pixel

Hidden state


Graphical model in the image domain

Conditional random ﬁeld for more complicated appearance

Image

Pixel

Hidden state


Graphical model in the image domain

Hidden random ﬁeld for image regions with diﬀerent parts

Image

Pixel

Hidden state for parts

Hidden state for regions


What is an appropriate model?

Tradeoﬀ between accuracy and complexity.

Small sample size

Overﬁtting...


Hidden Markov trees for wavelet coeﬃcients

Residual dependency structure (nested structure) –

“secondary” properties;

A model that reﬂects these properties would be appropriate,

ﬂexible but not too complicated;

Nested multiscale graph (tree to be speciﬁc) model:


Independent mixture model for wavelet coeﬃcients

A mixture model provides an appropriate approximation for non-Gaussian real-world signals.


Hidden Markov chain model for wavelet coeﬃcients

Hidden Markov chain at the same scale:


Hidden Markov tree for wavelet coeﬃcients

Dependence across scales according to the “secondary

properties” of wavelet coeﬃcients:


Hidden Markov tree for images


Probabilities in hidden Markov trees

For a single wavelet coefficient, since real-world signals are generally non-Gaussian, we model it with a mixture model:

f(w) = Σ_m f(w | c = m) f(c = m)

Independent mixture model; hidden Markov chain model; hidden Markov tree model:

for the tree root c_0: f(c_0);

for the tree nodes other than the root – a transition probability f(c_i | c_{ρ(i)}), where ρ(i) denotes the parent node of i.
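A minimal sketch of a two-state, zero-mean Gaussian mixture for one coefficient (the state prior and variances below are hypothetical): state 0 models the many small coefficients, state 1 the few large ones, and the state posterior f(c = m | w) follows from Bayes' rule.

```python
import numpy as np

def gauss(w, mu, var):
    """Gaussian density N(w; mu, var)."""
    return np.exp(-(w - mu) ** 2 / (2.0 * var)) / np.sqrt(2.0 * np.pi * var)

p_c = np.array([0.8, 0.2])   # f(c = m): the "small" state is more probable
var = np.array([0.1, 4.0])   # sigma^2_m: small vs. large coefficient variance

def f_w(w):
    """Marginal f(w) = sum_m f(w | c = m) f(c = m)."""
    return sum(p_c[m] * gauss(w, 0.0, var[m]) for m in range(2))

def posterior(w):
    """State posterior f(c = m | w) by Bayes' rule."""
    joint = np.array([p_c[m] * gauss(w, 0.0, var[m]) for m in range(2)])
    return joint / joint.sum()
```

A large observed coefficient makes the "large" state nearly certain; the persistence property is what the tree then propagates across scales.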


Parameters in HMT

f(c_0) and f(c_i | c_{ρ(i)});

mixture means and variances: μ_{i,m}, σ²_{i,m}.

Notice the conditional independence properties of the model.


Problems of HMT

As with all graphical models, we need to solve:

Training the model;

Computing the likelihood with the given observations;

Estimating the latent/hidden states.


Expectation–Maximization algorithm

General setting for estimation: max_θ f(x|θ) (ML) or max_θ f(θ|x) ⇔ max_θ f(x|θ) f(θ) (MAP).

The EM algorithm provides a greedy, iterative way to solve the general estimation problem based on the hidden/latent variables c.

log f(x|θ) = log f(x, c|θ) − log f(c|x, θ). Since the algorithm is iterative, we take the expectation with respect to c under the previously estimated parameters θ^{k−1}:

∫ log f(x|θ) f(c|x, θ^{k−1}) dc = ∫ log f(x, c|θ) f(c|x, θ^{k−1}) dc − ∫ log f(c|x, θ) f(c|x, θ^{k−1}) dc


Expectation–Maximization algorithm

Jensen’s inequality:

∫ log f(c|x, θ) f(c|x, θ^{k−1}) dc ≤ ∫ log f(c|x, θ^{k−1}) f(c|x, θ^{k−1}) dc

To guarantee an increase of the likelihood log f(x|θ), we only need to solve:

θ^k = arg max_θ ∫ log f(x, c|θ) f(c|x, θ^{k−1}) dc

Hence, the E-step computes f(c|x, θ^{k−1}); the M-step solves the above optimization problem.
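As a concrete special case, here is a sketch of EM for a two-component scalar Gaussian mixture, where both steps have closed forms (the initialization heuristic is our assumption; full HMT training additionally needs the upward/downward recursions for the E-step):

```python
import numpy as np

def em_gmm2(x, n_iter=100):
    """EM for a two-component 1-D Gaussian mixture.
    E-step: responsibilities f(c | x, theta^{k-1});
    M-step: closed-form weighted-ML update of theta = (pi, mu, var)."""
    x = np.asarray(x, dtype=float)
    pi = np.array([0.5, 0.5])          # mixing weights
    mu = np.array([x.min(), x.max()])  # crude initialization from the data
    var = np.array([x.var(), x.var()])
    for _ in range(n_iter):
        # E-step: posterior of the hidden component for each sample
        dens = (np.exp(-(x[:, None] - mu) ** 2 / (2.0 * var))
                / np.sqrt(2.0 * np.pi * var))
        resp = pi * dens
        resp /= resp.sum(axis=1, keepdims=True)
        # M-step: maximize the expected complete-data log-likelihood;
        # this step never decreases log f(x | theta)
        nk = resp.sum(axis=0)
        pi = nk / len(x)
        mu = (resp * x[:, None]).sum(axis=0) / nk
        var = (resp * (x[:, None] - mu) ** 2).sum(axis=0) / nk
    return pi, mu, var
```

On well-separated data the iteration recovers the component means and weights; each pass is one E-step followed by one M-step, exactly the scheme above.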


Training hidden Markov trees with EM

In the HMT, θ = {f(c_0), f(c_i | c_{ρ(i)}), μ_{i,m}, σ²_{i,m}}, where i indexes the wavelet coefficients and m the components of the mixture.

Update via the analogous equation:

θ^k = arg max_θ ∫ log f(w, c | θ) f(c | w, θ^{k−1}) dc

We need several tricks to complete the EM algorithm here, since we do not have an easy form for f(w, c | θ).


Training hidden Markov trees with EM

The main task is to estimate the marginal state distribution f(c_i = m | w, θ) and the parent–child joint distribution f(c_i = m, c_{ρ(i)} = n | w, θ).

Based on the conditional independence properties of the HMT (with w_{T_i} the coefficients in the subtree rooted at node i and w_{T̂_i} those in the rest of the tree), we can write:

f(c_i = m, w | θ) = f(w_{T_i} | w_{T̂_i}, c_i = m, θ) f(c_i = m, w_{T̂_i} | θ) = f(w_{T_i} | c_i = m, θ) f(c_i = m, w_{T̂_i} | θ) = β_i(m) α_i(m);

and similarly,

f(c_i = m, c_{ρ(i)} = n, w | θ) = β_i(m) f(c_i = m | c_{ρ(i)} = n) α_{ρ(i)}(n) β_{ρ(i)\i}(n).

Since f(w | θ) = Σ_m f(c_i = m, w | θ) = Σ_m α_i(m) β_i(m), we have these distributions expressed in terms of the α's and β's.


Training hidden Markov trees with EM

For the computation, we follow the downward algorithm from coarse to fine levels to compute the α's and the upward algorithm from fine to coarse levels to compute the β's, as described in the paper.

The M-step reduces to conditional means due to the Gaussian assumption.

Note the tricks for handling K trees and tying.


Coming back to the denoising problem...

With the EM-trained parameters, including f(c_i = m | w, θ), the σ_{c_i}'s, and σ_n, estimating the signal reduces to computing the conditional mean:

E(y_i | w, θ) = Σ_m f(c_i = m | w, θ) [σ²_{i,m} / (σ²_{i,m} + σ²_n)] w_i
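The estimate is a posterior-weighted Wiener shrinkage of each coefficient. A minimal vectorized sketch (the array-shape conventions are ours):

```python
import numpy as np

def conditional_mean_estimate(w, post, var_signal, var_noise):
    """E(y_i | w, theta) = sum_m f(c_i = m | w, theta)
                           * sigma^2_{i,m} / (sigma^2_{i,m} + sigma^2_n) * w_i
    w:          observed wavelet coefficients, shape (N,)
    post:       state posteriors f(c_i = m | w, theta), shape (N, M)
    var_signal: per-state signal variances sigma^2_{i,m}, shape (N, M)
    var_noise:  noise variance sigma^2_n (scalar)
    """
    shrink = var_signal / (var_signal + var_noise)  # per-state Wiener gain
    return (post * shrink).sum(axis=1) * w
```

Coefficients assigned to the low-variance "small" state are shrunk heavily (mostly noise), while those in the "large" state pass through almost unchanged.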


Image segmentation

2D hidden Markov trees

Similar setting as in 1D signal model

Differences:

Subband independence: f(w|Θ) = f(w^{LH}|Θ^{LH}) f(w^{HL}|Θ^{HL}) f(w^{HH}|Θ^{HH}) (scaling);

leads to a different expansion of the α's and β's.

Context-based interscale fusion: prior f(c) via a context vector.

A different EM algorithm.

Image segmentation (example results, figures)

Extended hidden Markov trees

Geometric hidden Markov trees:

Modeling contours explicitly;

Hidden state space: c_i = {d_m, θ_m};

New conditional distribution of wavelet coefficients: f(w_i | c_i) ∝ exp(−dist(w_i, e_m)² / (2σ²_g)), where e_m is the response for edges with fixed distance d_m and angle θ_m (filter banks);

New transition probability: f(n | m) ∝ exp(−HD(l_m, l_n)), where HD(l_m, l_n) is the Hausdorff distance between the lines determined by distance and angle, restricted to a square in the plane.


Take home message

Know available tools;

Do not force one tool for every problem;

Have the right model and appropriate assumptions;

Work hard to find the simplest (elegant) solution.
