# Wavelets, Approximation Theory, and Signal Processing

Güntürk, Fall 2010
Scribe: Evan Chou
References: Mallat, A Wavelet Tour of Signal Processing
Week 1 (9/9/2010)
• Fourier Transform
• Uncertainty Principles
• Time Frequency Representations
Week 2 (9/16/2010)
• Wavelet Transform
• The Haar System
Week 3 (9/23/2010)
• Convergence of wavelet series
Week 4 (9/30/2010)
• Multiresolution Analysis (MRA) Framework (Meyer, Mallat)
Week 5 (10/7/2010)
• Finding the Mother Wavelet
Week 6 (10/14/2010)
Week 7 (10/21/2010)
• Designing Wavelets
• Meyer's construction of wavelets in $\mathcal{S}$
Week 8 (10/28/2010)
• Vanishing Moments
• Spline Wavelets
Week 9 (11/4/2010)
• Compactly Supported Wavelets
Week 10 (11/11/2010)
Week 11 (11/24/2010)
• Vanishing Moments, Again
• Decay of wavelet coefficients from vanishing moments
• Lipschitz regularity from decay of wavelet coefficients
Week 12 (12/1/2010)
• $L^p$ convergence of wavelet series
• Approximation by wavelet series
Week 13 (12/9/2010)
• Linear Approximation v Nonlinear Approximation
• Besov Space
Week 1 (9/9/2010)
Fourier Transform
We begin with the Fourier transform, studying its deficiencies and how to overcome them.
The convention for the Fourier transform used in this course is the one with $2\pi$ in the exponent, for ease of inversion, Plancherel, and sampling at integers without factors of $2\pi$, among other reasons:
$$\mathcal{F}f(\xi) = \hat f(\xi) = \int_{\mathbb{R}} f(t)\, e^{-2\pi i \xi t}\, dt$$
Initially this makes sense for $f \in L^1$, and as an operator $\mathcal{F}: L^1 \to L^\infty$ (in fact it maps into $C_0$). By the standard extension procedure we can define it on $L^2$ (approximate by functions in $L^1 \cap L^2$), and $\mathcal{F}: L^2 \to L^2$ is an isometry, so that $\|\hat f\|_2 = \|f\|_2$. By interpolation (Riesz-Thorin), we also have $\mathcal{F}: L^p \to L^{p'}$ for $1 < p < 2$, where $1/p + 1/p' = 1$, a result known as the Hausdorff-Young inequality.
Also, if we consider the Fourier transform as an operator on the Schwartz space $\mathcal{S}$, then $\mathcal{F}: \mathcal{S} \to \mathcal{S}$ and by duality $\mathcal{F}: \mathcal{S}' \to \mathcal{S}'$, an operator on the space of tempered distributions, which contains objects such as the Dirac delta.
To recover $f$ from $\hat f$, we have the inversion formula
$$f(t) = \int \hat f(\xi)\, e^{2\pi i \xi t}\, d\xi$$
and viewing $\hat f = \mathcal{F}f$ as an analysis of $f$, this can be considered the synthesis (or reconstruction) of $f$ from its frequency content $\hat f$.
Deficiency. This is a very nonlocal transform. To recover $f(t)$ for even a single $t$, we need the frequency data $\hat f(\xi)$ for all $\xi \in \mathbb{R}$.
A quick reason for this is that the complex exponential $e^{2\pi i \xi t}$ is globally supported. This is the main deficiency of the Fourier transform when local information about a signal is needed. For instance, in music we can discern which frequencies are being played at which time; the Fourier transform of the music does not give this information directly. Instead, it gives only a vague sense of which frequencies are present in the entire song. Such local frequency information is useful in general for signals with transients: seismic signals, medical signals, images and edges, just to name a few.
Here we investigate why the Fourier transform does not provide an efficient way of capturing local frequency information.
Uncertainty Principles
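These essentially say that a function cannot be simultaneously localized in time and frequency.
0. Scaling properties: for $\delta > 0$,
$$\widehat{f(\delta\,\cdot)}(\xi) = \frac{1}{\delta}\, \hat f\!\left(\frac{\xi}{\delta}\right)$$
Thus compressing $f$ in time ($\delta > 1$) has the effect of spreading out $\hat f$, and vice versa.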
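1. If $f \neq 0$, then at most one of $\operatorname{supp} f$ and $\operatorname{supp} \hat f$ can be compact.
The reason for this is that if $\operatorname{supp} f$ is compact, then
$$\hat f(z) = \int f(t)\, e^{-2\pi i z t}\, dt$$
is analytic in $z \in \mathbb{C}$, and therefore $\hat f(\xi)$ cannot be compactly supported unless it vanishes identically.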
2. Heisenberg Uncertainty Principle. Suppose $f \in L^2$ with $\|f\|_2 = 1$. As in physics, we can think of $|f(t)|^2$ as a pdf. In this context we define the mean
$$m_f := \int t\, |f(t)|^2\, dt$$
and the variance
$$\sigma_f^2 := \int (t - m_f)^2\, |f(t)|^2\, dt$$
Similarly we define $m_{\hat f}$ and $\sigma_{\hat f}^2$. Then we have
Theorem 1. (Heisenberg Uncertainty Principle)
$$\sigma_f\, \sigma_{\hat f} \ge \frac{1}{4\pi}$$
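Proof. Note a few facts: $\sigma_f$ and $\sigma_{\hat f}$ are invariant under translations and modulations, i.e. if we define $f_{\alpha,\beta}(t) = e^{2\pi i \alpha t} f(t - \beta)$ then $\sigma_{f_{\alpha,\beta}} = \sigma_f$. The same holds for $\hat f$, since the Fourier transform exchanges translations and modulations. Thus without loss of generality we may assume that $m_f = m_{\hat f} = 0$.
For what follows, we assume $f$ is Schwartz; the general case can be obtained by approximation. Integrating by parts (the boundary term vanishes),
$$
\begin{aligned}
1 = \int |f|^2\, dt &= \int f \bar f\, dt \\
&= \Big[\, t\, f \bar f\, \Big]_{-\infty}^{\infty} - \int t\, (f \bar f)'\, dt \\
&= -2\, \mathrm{Re} \int t\, f\, \bar f'\, dt \\
&\le 2 \left( \int t^2 |f|^2\, dt \right)^{1/2} \left( \int |\bar f'|^2\, dt \right)^{1/2} && \text{(Cauchy-Schwarz)} \\
&= 2\, \sigma_f \left( \int |2\pi i \xi\, \hat f(\xi)|^2\, d\xi \right)^{1/2} && \text{(Plancherel)} \\
&= 4\pi\, \sigma_f\, \sigma_{\hat f}
\end{aligned}
$$
We conclude that
$$\frac{1}{4\pi} \le \sigma_f\, \sigma_{\hat f}$$
For general $f$, we can approximate with Schwartz functions and take limits. ∎
The minimizers of $\sigma_f \sigma_{\hat f}$ turn out to be Gaussians.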
Examples:
• $f = \delta_0$, $\hat f = 1$. (Not quite an instance of the above, but an extreme case.) Here we have perfect localization in time, but no frequency localization.
• $f = 1_{[-\frac12, \frac12]}$, $\hat f(\xi) = \dfrac{\sin \pi\xi}{\pi\xi}$, where $\sigma_f < \infty$ but $\sigma_{\hat f} = \infty$. This is a milder case.
• $f(t) = e^{-\pi t^2/2}$. Then $\sigma_f\, \sigma_{\hat f} = \dfrac{1}{4\pi}$. This is the balanced case.
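3. Amrein-Berthier. If $f \in L^1$ and $f \neq 0$, then
$$|\operatorname{supp} f| \cdot |\operatorname{supp} \hat f| = \infty$$
where $|A|$ denotes the Lebesgue measure of $A$. In other words, the two supports cannot both have finite measure.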
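4. Donoho-Stark. If
$$\left( \int_{T^c} |f|^2 \right)^{1/2} \le \varepsilon_T\, \|f\|_2 \quad\text{and}\quad \left( \int_{\Omega^c} |\hat f|^2 \right)^{1/2} \le \varepsilon_\Omega\, \|\hat f\|_2,$$
then
$$|T|\,|\Omega| \ge (1 - \varepsilon_T - \varepsilon_\Omega)^2$$
so that even if we only want sets $T$, $\Omega$ such that $f|_T$ and $\hat f|_\Omega$ capture most of the norms of $f$ and $\hat f$, there is a limitation on how small we can make $T$ and $\Omega$.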
5. Uniform Uncertainty Principle(s) (Candès, Tao, and others). For $f \in \mathbb{R}^N$, and $k, m$ related by
$$m \ge c\, k \log N,$$
if $\#(\operatorname{supp} f) \le k$, then there exists a set $\Lambda \subset \{1, \dots, N\}$ such that $|\Lambda| \le m$ and $\hat f|_\Lambda$ determines $f$.
This means that if $f, g$ are two signals with supports of size at most $k$, and $\hat f|_\Lambda = \hat g|_\Lambda$, then $f = g$.
Also, if $h \neq 0$ is a signal with support of size at most $2k$, then $\hat h$ cannot vanish on $\Lambda$; otherwise we could write $h$ as a difference of two signals of support size at most $k$ with disjoint supports, whose Fourier transforms agree on $\Lambda$. So this is a discrete version which says that a signal of small support must have some frequency content in $\Lambda$; equivalently, if the Fourier transform of a nonzero signal vanishes on $\Lambda$, then the support of the signal must be larger than $2k$.
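As a numerical illustration of Theorem 1, the following sketch estimates $\sigma_f \sigma_{\hat f}$ on a grid. It is not part of the lecture: the helper name `spreads`, the grid, and the second test function $t\,e^{-\pi t^2/2}$ are assumptions chosen for illustration, and the frequency spread is computed through the Plancherel identity $\sigma_{\hat f}^2 = \frac{1}{4\pi^2}\int |f'|^2$ used in the proof (valid here because the test functions are real, so $m_{\hat f} = 0$).

```python
import numpy as np

def spreads(f, t):
    """Return (sigma_f, sigma_fhat) for samples f of a real, smooth,
    rapidly decaying function on the uniform grid t."""
    dt = t[1] - t[0]
    f = f / np.sqrt(np.sum(np.abs(f) ** 2) * dt)   # normalize so ||f||_2 = 1
    density = np.abs(f) ** 2                       # |f|^2 viewed as a pdf
    mean = np.sum(t * density) * dt                # m_f
    sigma_t = np.sqrt(np.sum((t - mean) ** 2 * density) * dt)
    # Plancherel: int |2 pi i xi fhat|^2 d xi = int |f'|^2 dt  (m_fhat = 0 for real f)
    df = np.gradient(f, dt)
    sigma_xi = np.sqrt(np.sum(np.abs(df) ** 2) * dt) / (2 * np.pi)
    return sigma_t, sigma_xi

t = np.linspace(-10.0, 10.0, 20001)
gauss = np.exp(-np.pi * t**2 / 2)          # the balanced example from the notes
hermite1 = t * np.exp(-np.pi * t**2 / 2)   # a non-Gaussian comparison (assumption)

bound = 1 / (4 * np.pi)
product_gauss = np.prod(spreads(gauss, t))
product_hermite1 = np.prod(spreads(hermite1, t))
print(bound, product_gauss, product_hermite1)
```

The Gaussian should sit at the bound up to discretization error, while the second function should land strictly above it.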
Time Frequency Representations
We will investigate different ways of analyzing the local frequency content of functions, starting with the short time (windowed) Fourier transform; later we will look at wavelets (which analyze functions at different time scales).
Since the Fourier transform does not give us a good view of the frequency content of a signal near particular times, the idea is to preprocess the signal by restricting it to values near a particular time (to a window of time) and then feeding it to the Fourier transform.
[Figure: a signal $f$ together with the window $\varphi(\cdot - \tau)$ centered at time $\tau$.]
Above, $\varphi$ is the window, translated to time $\tau$. We define the windowed Fourier transform to be
$$(T_\varphi f)(\xi, \tau) := \int f(t)\, \overline{\varphi(t - \tau)}\, e^{-2\pi i \xi t}\, dt = \langle f, \varphi_{\xi,\tau} \rangle$$
which can be interpreted as inner products with the family of functions $\varphi_{\xi,\tau}(t) = e^{2\pi i \xi t}\, \varphi(t - \tau)$. This is a continuous transform, and later we will consider whether we can sample at discrete points on a lattice in the time-frequency plane $(\xi, \tau)$.
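How do we recover $f$? Since these are inner products, it is tempting to try placing the coefficients back with the functions, and this does work (with the window normalized so that $\|\varphi\|_2 = 1$):
Theorem 2. (Reconstruction from windowed Fourier transform)
$$f(t) = \iint (T_\varphi f)(\xi, \tau)\, \varphi_{\xi,\tau}(t)\, d\xi\, d\tau$$
Proof. Again we will assume that $f$ is sufficiently nice to apply Fubini. The rest is just computation:
$$
\begin{aligned}
\iint (T_\varphi f)(\xi,\tau)\, e^{2\pi i \xi t} \varphi(t - \tau)\, d\xi\, d\tau
&= \iiint f(s)\, \overline{\varphi(s - \tau)}\, e^{-2\pi i \xi s}\, ds\; e^{2\pi i \xi t} \varphi(t - \tau)\, d\xi\, d\tau \\
&= \iint f(s)\, e^{2\pi i \xi (t - s)} \underbrace{\left[ \int \varphi(t - \tau)\, \overline{\varphi(s - \tau)}\, d\tau \right]}_{=:\ \Phi_t(s)} ds\, d\xi \\
&= \int e^{2\pi i \xi t}\, \mathcal{F}\{ f\, \Phi_t \}(\xi)\, d\xi \\
&= f(t)\, \Phi_t(t) \\
&= f(t)
\end{aligned}
$$
where the last step uses $\Phi_t(t) = \int |\varphi(t - \tau)|^2\, d\tau = \|\varphi\|_2^2 = 1$. ∎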
This gives us a better idea about which frequencies are present near a particular time. Also, in the reconstruction formula for $f(t)$, we note that if $\varphi$ has sufficiently fast decay, we can truncate the integral at the cost of only a small error, so we do not really need all values $T_\varphi f(\xi, \tau)$ for $(\xi, \tau) \in \mathbb{R}^2$ to recover $f(t)$ approximately. Also, we suspect that the information in $T_\varphi f$ is fairly redundant: at a glance, we are using a function on $\mathbb{R}^2$ to represent a signal $f$ defined on $\mathbb{R}$. There is an example that makes this redundancy very apparent.
Example 3. Let $\varphi = 1_{[0,1]}$. Then it is already clear that we lose no information by restricting $\tau$ to integers, as the translates satisfy $\sum_k \varphi(t - k) = 1$ and so form a partition of unity. Furthermore, each piece $f(t)\, \varphi(t - k)$, supported on $[k, k+1]$, has a Fourier series representation, so we can sample $\xi$ at integers also. In fact, $\{\varphi_{\xi,\tau} : (\xi, \tau) \in \mathbb{Z}^2\}$ forms an orthonormal basis for $L^2(\mathbb{R})$, and thus
$$f = \sum_{(\xi,\tau) \in \mathbb{Z}^2} \langle f, \varphi_{\xi,\tau} \rangle\, \varphi_{\xi,\tau} = \sum_{(\xi,\tau) \in \mathbb{Z}^2} (T_\varphi f)(\xi, \tau)\, \varphi_{\xi,\tau}$$
As a quick note, we cannot sample any less here, or else we will actually lose part of the original signal. The other thing to note is that $1_{[0,1]}$ is not a nice window: $f\, 1_{[0,1]}$ is in general not even continuous (even on the torus $\mathbb{T}$), and thus the decay of the Fourier series coefficients will be slow, which does not allow us to truncate the series efficiently for a finite approximation.
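A discrete analogue of Example 3 can be sketched as follows: cut a sampled signal into consecutive blocks (the rectangular windows) and take a unitary DFT of each block (the integer frequencies). This is only an illustration under assumptions not in the notes; the function names and the block length are hypothetical, and the discrete block DFT stands in for the Fourier series on each $[k, k+1]$.

```python
import numpy as np

def block_fourier(x, n):
    """Coefficients of x against rectangular-window 'Gabor atoms':
    row tau holds the unitary DFT of the tau-th block of length n."""
    blocks = np.asarray(x).reshape(-1, n)          # len(x) must be a multiple of n
    return np.fft.fft(blocks, axis=1, norm="ortho")

def block_fourier_inverse(coeffs):
    """Put the coefficients back on the atoms (inverse unitary DFT per block)."""
    return np.fft.ifft(coeffs, axis=1, norm="ortho").reshape(-1)

rng = np.random.default_rng(0)
x = rng.standard_normal(64)
coeffs = block_fourier(x, 8)                       # 8 time slots x 8 frequencies
x_rec = block_fourier_inverse(coeffs).real

print(np.allclose(x_rec, x))                                  # perfect reconstruction
print(np.isclose(np.sum(np.abs(coeffs) ** 2), np.sum(x**2)))  # Parseval
```

There are exactly as many coefficients as samples, which mirrors the remark that this sampling lattice has no redundancy left to remove.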
In the 1940s, Gabor suggested using Gaussian windows, since they balance smoothness and locality well. However, it turns out that for Gaussian windows, sampling the windowed Fourier transform at integers does not work.
Notation: For $\xi_0 > 0$, $\tau_0 > 0$, we write
$$\varphi_{m,n}^{\xi_0,\tau_0} := \varphi_{m\xi_0,\, n\tau_0}$$
and
$$G(\varphi, \xi_0, \tau_0) = \left\{ \varphi_{m,n}^{\xi_0,\tau_0} : (m,n) \in \mathbb{Z}^2 \right\}$$
The family $G$ we will call a Gabor system. We can then ask the following questions.
Question 1: When is a given Gabor system an orthonormal basis?
We already have an example above. A necessary condition is that $\xi_0 \tau_0 = 1$. But this condition is not sufficient, as the earlier example with $\varphi = 1_{[0,1]}$ shows: we cannot sample at $(\xi_0, \tau_0) = (\tfrac12, 2)$, for instance. In general, consider $\varphi$ with $\operatorname{supp}(\varphi) \subset [0, \tau_0]$.
Question 2: When is the system complete? (i.e. it has the property that if $\langle f, \varphi_{m,n} \rangle = 0$ for all $m, n$, then $f = 0$).
A necessary condition is that $\xi_0 \tau_0 \le 1$. But this condition is not sufficient, for the same reason.
Later we will also show
Theorem 4. (Balian-Low) If $G(\varphi, 1, 1)$ is an orthonormal basis, then either $\sigma_\varphi = \infty$ or $\sigma_{\hat\varphi} = \infty$.
In particular, this means that Gaussians can never yield a Gabor system that is an orthonormal basis.
Week 2 (9/16/2010)
When is a good time-frequency localization possible for $\varphi$? Last time we saw a limitation from the Heisenberg Uncertainty Principle (Theorem 1):
$$\sigma_f\, \sigma_{\hat f} \ge \frac{1}{4\pi}$$
Now we turn to proving the Balian-Low theorem concerning Gabor systems.
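Proof. (of Theorem 4) Suppose $G(\varphi, 1, 1) = \{\varphi_{m,n}\}_{m,n \in \mathbb{Z}}$ is an orthonormal basis. Define
$$(Pf)(x) := x f(x), \qquad (Qf)(x) := (P \hat f)^{\vee}(x) = \frac{1}{2\pi i} \frac{df}{dx}$$
(i.e. $(Qf)^{\wedge} = P \hat f$). We want to show that either $P\varphi \notin L^2$ or $Q\varphi \notin L^2$. Also, without loss of generality we can center the functions at 0, i.e. we may assume $m_\varphi = m_{\hat\varphi} = 0$. Suppose, for contradiction, that both $P\varphi$ and $Q\varphi$ are in $L^2$. We thus have
$$\langle P\varphi, Q\varphi \rangle = \sum_{m,n} \langle P\varphi, \varphi_{m,n} \rangle \langle \varphi_{m,n}, Q\varphi \rangle$$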
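First, note that
$$\langle P\varphi, \varphi_{m,n} \rangle = \langle P\varphi, \varphi_{m,n} \rangle - n\, \langle \varphi, \varphi_{m,n} \rangle$$
since $\varphi = \varphi_{0,0}$ and thus $\langle \varphi_{0,0}, \varphi_{m,n} \rangle = 0$ for $(m,n) \neq (0,0)$, and otherwise $n = 0$. Continuing, we have that
$$
\begin{aligned}
\langle P\varphi, \varphi_{m,n} \rangle &= \langle (P - nI)\varphi, \varphi_{m,n} \rangle \\
&= \int (x - n)\, \varphi(x)\, e^{-2\pi i m x}\, \overline{\varphi(x - n)}\, dx \\
&= \int x\, \varphi(x + n)\, e^{-2\pi i m x}\, \overline{\varphi(x)}\, dx \\
&= \langle \varphi_{-m,-n}, P\varphi \rangle
\end{aligned}
$$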
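We have a similar identity with $Q\varphi$, which we prove from the identity for $P\varphi$:
$$
\begin{aligned}
\langle \varphi_{m,n}, Q\varphi \rangle &= \langle (\varphi_{m,n})^{\wedge}, (Q\varphi)^{\wedge} \rangle \\
&= \langle (\hat\varphi)_{-n,m}, P\hat\varphi \rangle \\
&= \langle P\hat\varphi, (\hat\varphi)_{n,-m} \rangle \\
&= \langle (Q\varphi)^{\wedge}, (\varphi_{-m,-n})^{\wedge} \rangle \\
&= \langle Q\varphi, \varphi_{-m,-n} \rangle
\end{aligned}
$$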
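This implies that
$$
\begin{aligned}
\langle P\varphi, Q\varphi \rangle &= \sum_{m,n} \langle P\varphi, \varphi_{m,n} \rangle \langle \varphi_{m,n}, Q\varphi \rangle \\
&= \sum_{m,n} \langle Q\varphi, \varphi_{m,n} \rangle \langle \varphi_{m,n}, P\varphi \rangle \\
&= \langle Q\varphi, P\varphi \rangle
\end{aligned}
$$
Note also that
$$\langle Pf, g \rangle = \int x f(x)\, \overline{g(x)}\, dx = \langle f, Pg \rangle$$
$$\langle Qf, g \rangle = \langle (Qf)^{\wedge}, \hat g \rangle = \langle P \hat f, \hat g \rangle = \langle \hat f, P \hat g \rangle = \langle \hat f, (Qg)^{\wedge} \rangle = \langle f, Qg \rangle$$
so that both $P$ and $Q$ are self-adjoint. Thus,
$$\langle \varphi, PQ\varphi \rangle = \langle P\varphi, Q\varphi \rangle = \langle Q\varphi, P\varphi \rangle = \langle \varphi, QP\varphi \rangle$$
and
$$\langle \varphi, (PQ - QP)\varphi \rangle = 0$$
But also, we can compute $(PQ - QP)\varphi$:
$$(PQ - QP)\varphi = \frac{x}{2\pi i} \frac{d\varphi}{dx} - \frac{1}{2\pi i} \frac{d}{dx}(x\varphi) = -\frac{1}{2\pi i}\, \varphi$$
So we have a contradiction, since
$$\left| \langle \varphi, (PQ - QP)\varphi \rangle \right| = \frac{1}{2\pi}\, \|\varphi\|_2^2 \neq 0$$
∎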
As mentioned earlier, the Balian-Low theorem implies that $G(\varphi, 1, 1)$ is not an orthonormal basis for any Gaussian $\varphi$. It is true that if $\xi_0 \tau_0 < 1$ then $G(\varphi, \xi_0, \tau_0)$ forms a frame (Seip, Lyubarski).
It turns out that we can get around this theorem if we measure the variance slightly differently.
Generalizations:
Define for any $p > 0$,
$$\sigma_{f,p} := \inf_m \left( \int |t - m|^{2p}\, |f|^2\, dt \right)^{1/2p}$$
For $p = 1$, the $m$ that minimizes this is exactly the mean $m_f$, and hence $\sigma_{f,1} = \sigma_f$. Note that if $\sigma_{f,p} < \infty$, then $\sigma_{f,q} < \infty$ for $q < p$ by Hölder (treat $|f|^2\, dt$ as a probability measure).
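We then have the following results, which we will not prove:
Theorem 5. (Balian) There exists $\varphi$ such that $\sigma_{\varphi,p}\, \sigma_{\hat\varphi,p} < \infty$ for all $p < 1$ and $G(\varphi, 1, 1)$ is an orthonormal basis for $L^2$.
Note that the example of the rectangular window $\varphi = 1_{[-1/2,1/2]}$ does not satisfy this, as $\hat\varphi(\xi) = \dfrac{\sin \pi\xi}{\pi\xi}$, and in order for $\sigma_{\hat\varphi,p} < \infty$ we need $p < 1/2$, so that the integrand $|\xi|^{2p} |\hat\varphi(\xi)|^2$ still decays like $o\!\left(\frac{1}{|\xi|}\right)$.
Theorem 6. (Steges) For any orthonormal basis $\{\psi_k\}$ of $L^2(\mathbb{R})$, and for any $p > 1$,
$$\left( \sup_k \sigma_{\psi_k,p} \right) \left( \sup_k \sigma_{\hat\psi_k,p} \right) = \infty$$
Theorem 7. (Bourgain) There exists an orthonormal basis $\{\psi_k\}$ of $L^2(\mathbb{R})$ such that
$$\left( \sup_k \sigma_{\psi_k} \right) \left( \sup_k \sigma_{\hat\psi_k} \right) < \infty$$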
Wavelet Transform
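Let $\psi \in L^2(\mathbb{R})$. We will use the notation
$$\psi_{a,b}(t) = \frac{1}{\sqrt{a}}\, \psi\!\left( \frac{t - b}{a} \right)$$
for $a > 0$, $b \in \mathbb{R}$. The $1/\sqrt{a}$ normalization is chosen so that $\psi_{a,b}$ has the same $L^2$ norm as $\psi$. The continuous wavelet transform is defined as
$$(W_\psi f)(a, b) = \langle f, \psi_{a,b} \rangle$$
where $a$ is the scale and $b$ is the position. The reconstruction formula is similar to that of the windowed Fourier transform:
$$f(t) = \frac{1}{c_\psi} \int_{-\infty}^{\infty} db \int_0^{\infty} \frac{da}{a^2}\, (W_\psi f)(a, b)\, \psi_{a,b}(t)$$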
where $c_\psi = \displaystyle\int \frac{|\hat\psi(\xi)|^2}{|\xi|}\, d\xi < \infty$ (assumed to be finite). In particular, for this condition to be satisfied it is necessary that $\hat\psi(0) = \int \psi = 0$, and a sufficient condition is that $\psi \in L^2 \cap L^1$, $\hat\psi(0) = \int \psi = 0$, and $\hat\psi$ has some regularity at $\xi = 0$.
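Theorem 8. Assume $c_\psi = \displaystyle\int \frac{|\hat\psi(\xi)|^2}{|\xi|}\, d\xi < \infty$. Then for all $f, g \in L^2(\mathbb{R})$,
$$c_\psi\, \langle f, g \rangle = \int_{-\infty}^{\infty} db \int_0^{\infty} \frac{da}{a^2}\, (W_\psi f)(a, b)\, \overline{(W_\psi g)(a, b)}$$
This implies the reconstruction formula, since if $\langle f_1, g \rangle = \langle f_2, g \rangle$ for all $g$, then $f_1 = f_2$ (take $g = f_1 - f_2$ so that $\langle f_1 - f_2, g \rangle = \|f_1 - f_2\|^2 = 0$). Note that
$$\int_{-\infty}^{\infty} db \int_0^{\infty} \frac{da}{a^2}\, (W_\psi f)(a, b)\, \overline{(W_\psi g)(a, b)} = \left\langle \int_{-\infty}^{\infty} db \int_0^{\infty} \frac{da}{a^2}\, (W_\psi f)(a, b)\, \psi_{a,b}(\cdot),\; g \right\rangle$$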
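Proof. By properties of the Fourier transform,
$$(\psi_{a,b})^{\wedge}(\xi) = \sqrt{a}\, e^{-2\pi i b \xi}\, \hat\psi(a\xi)$$
Also,
$$W_\psi f(a, b) = \langle f, \psi_{a,b} \rangle = \langle \hat f, \hat\psi_{a,b} \rangle = \int \hat f(\xi)\, \overline{\sqrt{a}\, e^{-2\pi i b \xi}\, \hat\psi(a\xi)}\, d\xi$$
Computation:
$$
\begin{aligned}
\int db \int \frac{da}{a^2}\, W_\psi f(a,b)\, \overline{W_\psi g(a,b)}
&= \int db \int \frac{da}{a} \left( \int d\xi\, \hat f(\xi)\, \overline{\hat\psi(a\xi)}\, e^{2\pi i b \xi} \right) \left( \int d\xi'\, \overline{\hat g(\xi')}\, \hat\psi(a\xi')\, e^{-2\pi i b \xi'} \right) \\
&= \int \frac{da}{a} \int db\, \Big[ \hat f(\cdot)\, \overline{\hat\psi(a\,\cdot)} \Big]^{\vee}(b)\; \overline{\Big[ \hat g(\cdot)\, \overline{\hat\psi(a\,\cdot)} \Big]^{\vee}(b)} \\
&= \int \frac{da}{a} \int d\xi\, \hat f(\xi)\, \overline{\hat\psi(a\xi)}\; \overline{\hat g(\xi)}\, \hat\psi(a\xi) \\
&= \int d\xi\, \hat f(\xi)\, \overline{\hat g(\xi)} \int \frac{da}{a}\, |\hat\psi(a\xi)|^2 \\
&= \int d\xi\, \hat f(\xi)\, \overline{\hat g(\xi)} \int \frac{da}{a}\, |\hat\psi(a)|^2 \\
&= c_\psi\, \langle \hat f, \hat g \rangle = c_\psi\, \langle f, g \rangle
\end{aligned}
$$
∎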
We can also talk about the discrete wavelet transform, which samples $a, b$ in the continuous wavelet transform. For sampling dilations and translations it is natural to look at dyadic scales
$$a \in \{2^{-j} : j \in \mathbb{Z}\}, \qquad b \in \{2^{-j} k : j \in \mathbb{Z},\ k \in \mathbb{Z}\}$$
so that for each scale $a$, the supports of $\psi_{a,b}$ cover all of $\mathbb{R}$. We will use the notation
$$\psi_{j,k}(x) := \psi_{2^{-j},\, 2^{-j}k}(x) = 2^{j/2}\, \psi\big(2^j (x - 2^{-j} k)\big) = 2^{j/2}\, \psi(2^j x - k)$$
As we saw earlier, a necessary admissibility condition is that $\hat\psi(0) = 0$. Recall
$$\hat\psi_{j,k}(\xi) = 2^{-j/2}\, e^{-2\pi i\, 2^{-j} k\, \xi}\, \hat\psi(2^{-j} \xi)$$
Here $k$ only affects the phase of $\hat\psi_{j,k}$, whereas $j$ dilates $\hat\psi$, giving higher frequency localization for smaller $j$. If we consider the time-frequency locality of the family of functions $\psi_{j,k}$, we see a different tiling of the time-frequency plane. In the windowed Fourier transform, the sampled functions all have the same time-frequency localization. In the wavelet transform, $\psi_{j,k}$ has higher frequency localization for negative $j$ and higher time localization for positive $j$.
[Figure: dyadic tiling of the time-frequency plane (axes $\tau$ and $\xi$) for $j = -2, -1, 0, 1$, together with a plot of $\hat\psi_{j,k}(\xi)$ against $\xi$ for the same values of $j$.]
The Haar System
(A. Haar, 1910) For $H = L^2([0,1])$, define
$$H(x) = \psi(x) = \begin{cases} 1 & x \in [0, 1/2) \\ -1 & x \in [1/2, 1] \end{cases}$$
and $H_{j,k} = \psi_{j,k}$. Then $\{1_{[0,1]}\} \cup \{H_{j,k}\}_{j \ge 0,\ k = 0, \dots, 2^j - 1}$ forms an orthonormal basis for $H$. Orthogonality is easy to check. $1_{[0,1]}$ and $H_{j,k}$ are orthogonal since each $H_{j,k}$ has mean zero. Otherwise, for $\langle H_{j,k}, H_{j',k'} \rangle$, if $j = j'$ and $k \neq k'$ then the supports are disjoint, so the inner product is zero. If $j \neq j'$, then either the supports are disjoint or the wider function is constant on the support of the narrower function, which has mean zero, and thus the inner product is again zero.
We will show completeness next time.
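The orthonormality argument above can be checked numerically. The sketch below (an illustration, not from the lecture; the name `haar_family` and the choice of depth are assumptions) samples $1_{[0,1]}$ and the $H_{j,k}$ with $0 \le j < J$ at the midpoints of the $2^J$ dyadic cells of $[0,1]$. Since these functions are constant on each cell, the Riemann sum with weight $2^{-J}$ computes their $L^2([0,1])$ inner products exactly.

```python
import numpy as np

def haar_family(J):
    """Rows: 1_[0,1], then H_{j,k} for 0 <= j < J and 0 <= k < 2^j,
    sampled at the midpoints of the 2^J dyadic cells of [0, 1]."""
    n = 2 ** J
    x = (np.arange(n) + 0.5) / n
    rows = [np.ones(n)]
    for j in range(J):
        for k in range(2 ** j):
            y = 2 ** j * x - k                        # argument of psi in psi_{j,k}
            plus = ((0 <= y) & (y < 0.5)).astype(float)
            minus = ((0.5 <= y) & (y < 1)).astype(float)
            rows.append(2 ** (j / 2) * (plus - minus))
    return np.array(rows)

B = haar_family(4)                  # 16 functions on 16 cells
gram = B @ B.T / B.shape[1]         # matrix of inner products <b_i, b_k>
print(np.allclose(gram, np.eye(B.shape[0])))
```

The family has $1 + \sum_{j<J} 2^j = 2^J$ members, so on the space of functions constant on the $2^J$ cells it is also a basis, which is the finite-dimensional shadow of the completeness claim.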
Week 3 (9/23/2010)
Convergence of wavelet series
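Some definitions:
• $I_{j,k} = [k 2^{-j}, (k+1) 2^{-j})$. The family $\{I_{j,k} : k \in \mathbb{Z}\}$ partitions $\mathbb{R}$ for every $j$.
• $V_j = \{f \in L^2 : f \text{ is constant on each } I_{j,k},\ k \in \mathbb{Z}\}$. Equivalently,
$$f \in V_j \iff f = \sum_k \alpha_k\, 1_{I_{j,k}} \text{ with } \sum_k |\alpha_k|^2 < \infty$$
Note that $V_j \subset V_{j+1}$ for all $j$.
• $P_j :=$ orthogonal projection on $V_j$.
Fact: $\min_\alpha \|f - \alpha\|_{L^2(A)}$ is attained at the average value $\alpha^* = \frac{1}{|A|} \int_A f$.
Given any other $\alpha$, $f - \alpha^*$ and $\alpha^* - \alpha$ are orthogonal with respect to the inner product $\langle f, g \rangle = \frac{1}{|A|} \int_A f \bar g\, dx$, and the rest follows by the Pythagorean identity
$$\|f - \alpha\|^2 = \|f - \alpha^*\|^2 + \|\alpha^* - \alpha\|^2 \ge \|f - \alpha^*\|^2$$
Thus
$$P_j f = \sum_{k \in \mathbb{Z}} \bar f_{j,k}\, 1_{I_{j,k}}$$
where $\bar f_{j,k} = \frac{1}{|I_{j,k}|} \int_{I_{j,k}} f = 2^j \int_{I_{j,k}} f$.
• Let $\varphi = 1_{[0,1)}$, and define
$$\varphi_{j,k}(x) = 2^{j/2}\, \varphi(2^j x - k) = 2^{j/2}\, 1_{I_{j,k}}(x)$$
Then $\{\varphi_{j,k} : k \in \mathbb{Z}\}$ is an orthonormal basis for $V_j$ for all $j$.
If we set $c_{j,k}(f) = \langle f, \varphi_{j,k} \rangle = 2^{-j/2}\, \bar f_{j,k}$, then $P_j f = \sum_k \langle f, \varphi_{j,k} \rangle\, \varphi_{j,k}$.
• Note that $P_j$ is well-defined on $L^1_{\mathrm{loc}}$ by the same formula.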
Now we define $W_j$ to be the orthogonal complement of $V_j$ in $V_{j+1}$, i.e.
$$V_{j+1} = V_j \oplus W_j$$
and let $Q_j$ be the orthogonal projection on $W_j$. We call $W_j$ the "detail space" at scale $2^{-j}$. Note that $Q_j = P_{j+1} - P_j$, and
$$Q_j f = P_{j+1} f - P_j f = \sum_k \bar f_{j+1,k}\, 1_{I_{j+1,k}} - \sum_k \bar f_{j,k}\, 1_{I_{j,k}}$$
Note that $I_{j,k} = I_{j+1,2k} \cup I_{j+1,2k+1}$, so that $\bar f_{j,k} = \frac12 (\bar f_{j+1,2k} + \bar f_{j+1,2k+1})$. Then above we can split the first sum into even and odd terms and compute:
$$
\begin{aligned}
Q_j f &= \sum_k \Big( \bar f_{j+1,2k}\, 1_{I_{j+1,2k}} + \bar f_{j+1,2k+1}\, 1_{I_{j+1,2k+1}} - \bar f_{j,k}\, (1_{I_{j+1,2k}} + 1_{I_{j+1,2k+1}}) \Big) \\
&= \sum_k \Big( [\bar f_{j+1,2k} - \bar f_{j,k}]\, 1_{I_{j+1,2k}} - [\bar f_{j,k} - \bar f_{j+1,2k+1}]\, 1_{I_{j+1,2k+1}} \Big)
\end{aligned}
$$
and since $2 \bar f_{j,k} = \bar f_{j+1,2k} + \bar f_{j+1,2k+1}$, we have that $\bar f_{j+1,2k} - \bar f_{j,k} = \bar f_{j,k} - \bar f_{j+1,2k+1}$. Denoting this quantity by $b_{j,k}$, we have that
$$Q_j f = \sum_k b_{j,k}\, (1_{I_{j+1,2k}} - 1_{I_{j+1,2k+1}}) = \sum_k b_{j,k}\, 2^{-j/2}\, \psi_{j,k}$$
where $\psi_{j,k}$ was defined earlier for the Haar system. Note that $\{\psi_{j,k} : k \in \mathbb{Z}\}$ forms an orthonormal system in $W_j$. In fact, it is a basis by the above computation, where we can express any $Q_j f \in W_j$ in terms of the elements $\psi_{j,k}$. Define
$$d_{j,k} := \langle f, \psi_{j,k} \rangle = 2^{-j/2}\, b_{j,k}$$
Now pick any pair $J_0 < J_1$. We have that
$$V_{J_1} = V_{J_0} \oplus W_{J_0} \oplus W_{J_0+1} \oplus \cdots \oplus W_{J_1-1}$$
by repeatedly applying the decomposition $V_{j+1} = V_j \oplus W_j$. $V_{J_0}$ holds the approximation of the function at the coarser scale $2^{-J_0}$, and $W_{J_0}, W_{J_0+1}, \dots$ are successive refinements (detail spaces). Equivalently,
$$P_{J_1} = P_{J_0} + \sum_{j=J_0}^{J_1-1} Q_j$$
Naturally, we want to know what happens as we take $J_0 \to -\infty$ and $J_1 \to \infty$.
Theorem 9. For all $f \in L^2(\mathbb{R})$, $P_j f \to f$ in $L^2$ as $j \to \infty$, i.e. $\|P_j f - f\|_{L^2} \to 0$.
Proof. The first observation is that it suffices to check this for a dense subset $X$ of $L^2$. This is a standard approximation argument: for $\varepsilon > 0$, we can find $g \in X$ with $\|f - g\|_2 \le \varepsilon$. Then
$$\|P_j f - f\|_2 \le \|P_j (f - g)\|_2 + \|P_j g - g\|_2 + \|g - f\|_2 \le 2\varepsilon + \|P_j g - g\|_2$$
(note $\|P_j (f - g)\|_2 \le \|f - g\|_2$ since $P_j$ is an orthogonal projection in $L^2$).
Now if we take the limsup of both sides as $j \to \infty$, we know that $\|P_j g - g\|_2 \to 0$, so
$$\limsup_j \|P_j f - f\|_2 \le 2\varepsilon$$
As $\varepsilon$ is arbitrary, we have that $\limsup_j \|P_j f - f\|_2 \le 0$ and thus $\lim_j \|P_j f - f\|_2 = 0$.
So now we find a convenient dense subspace to work with. Here we will use $X = C_c(\mathbb{R})$, the continuous functions with compact support, which is dense in $L^2$. Note that these functions are uniformly continuous, i.e. the modulus of continuity satisfies
$$\omega_g(\delta) := \sup_{|x - y| < \delta} |g(x) - g(y)| \to 0 \text{ as } \delta \to 0$$
This means that
$$\sup_k \|g - \bar g_{j,k}\|_{L^\infty(I_{j,k})} \le \omega_g(2^{-j}) \to 0 \text{ as } j \to \infty$$
This implies that $\|P_j g - g\|_{L^\infty} \to 0$ as $j \to \infty$ (recall $P_j g$ is just $\bar g_{j,k}$ on $I_{j,k}$), and thus by Hölder
$$\|P_j g - g\|_{L^2} \le |\operatorname{supp}(P_j g - g)|^{1/2}\, \|P_j g - g\|_{L^\infty} \to 0$$
noting that we can find a constant $C$ with $|\operatorname{supp}(P_j g - g)| \le C$ for large $j$ (in fact $C = |\operatorname{supp}(g)| + 1$ works for $j > 0$). ∎
We therefore have the following immediate corollary:
Corollary 10. For any $f \in L^2(\mathbb{R})$,
$$f = P_0 f + \sum_{j=0}^{\infty} Q_j f$$
(Take $J_1 \to \infty$ and $J_0 = 0$.)
If we restrict ourselves to functions on $[0,1]$, then this says that
$$f = \bar f_{0,0} + \sum_{j=0}^{\infty} \sum_{k=0}^{2^j - 1} \langle f, \psi_{j,k} \rangle\, \psi_{j,k}$$
for any $f \in L^2([0,1])$. Furthermore, if $f$ is continuous, then the convergence is uniform (directly from the proof). Compare this with Fourier series, where uniform convergence is not guaranteed for continuous functions (in fact, not even pointwise convergence). The Haar basis was the first orthonormal basis for $L^2[0,1]$ found with this property. Here the basis is
$$\{1_{[0,1]}\} \cup \{\psi_{j,k} : j \ge 0,\ 0 \le k \le 2^j - 1\}$$
For $L^2(\mathbb{R})$, we have the orthonormal basis
$$\{\varphi_{0,k} : k \in \mathbb{Z}\} \cup \{\psi_{j,k} : j \ge 0,\ k \in \mathbb{Z}\}$$
for which we have uniform convergence of the basis expansion of $f \in L^2$ if $f$ is uniformly continuous.
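The uniform convergence $P_j g \to g$ for continuous $g$ can be observed numerically. The sketch below is an illustration under assumptions not in the notes: $g$ is taken to be the hat function $\max(0, 1 - |x|)$, sampled at midpoints of a fine dyadic grid on $[-2, 2)$, and `project_haar` (a hypothetical helper) approximates $P_j$ by averaging the samples over each $I_{j,k}$. Since this $g$ is 1-Lipschitz, $\omega_g(2^{-j}) \le 2^{-j}$, so the sup error should stay below $2^{-j}$.

```python
import numpy as np

def project_haar(samples, cells_per_unit, j):
    """Approximate P_j on fine-grid samples: replace the samples by their
    mean over each dyadic interval I_{j,k} (requires 2^j <= cells_per_unit)."""
    block = cells_per_unit // 2 ** j            # fine cells per interval of length 2^-j
    means = samples.reshape(-1, block).mean(axis=1)
    return np.repeat(means, block)

J = 10                                           # fine grid: cells of length 2^-J
x = -2 + (np.arange(4 * 2 ** J) + 0.5) / 2 ** J  # midpoints covering [-2, 2)
g = np.maximum(0.0, 1.0 - np.abs(x))             # continuous, compactly supported

sup_errors = [np.max(np.abs(project_haar(g, 2 ** J, j) - g)) for j in range(8)]
for j, err in enumerate(sup_errors):
    print(j, err, 2.0 ** -j)                     # error next to the bound omega_g(2^-j)
```

The grid starts at an integer, so the averaging blocks line up with the dyadic intervals $I_{j,k}$ for every $j \ge 0$.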
Now let's consider what happens if we take $J_0 \to -\infty$.
Theorem 11. For all $f \in L^2$, $P_j f \to 0$ in $L^2$ as $j \to -\infty$.
Proof. Again, it suffices to show the result for a dense subspace $X \subset L^2$, with the exact same reasoning. For the dense subspace, it will be convenient to work with $X = L^1 \cap L^2$, since as $j \to -\infty$ the intervals $I_{j,k}$ grow to fill $\mathbb{R}$, in which case we will be integrating $f$ on larger and larger sets.
Note $P_j g = \sum_k \langle g, \varphi_{j,k} \rangle\, \varphi_{j,k}$, and thus by the triangle inequality,
$$\|P_j g\|_2 \le \sum_k |\langle g, \varphi_{j,k} \rangle| = \sum_k \left| 2^{j/2} \int_{I_{j,k}} g \right| \le 2^{j/2}\, \|g\|_1$$
which tends to 0 as $j \to -\infty$. ∎
Corollary 12. Taking $J_1 \to \infty$ and $J_0 \to -\infty$, we therefore have that
$$f = \sum_{j=-\infty}^{\infty} Q_j f = \sum_{j,k} \langle f, \psi_{j,k} \rangle\, \psi_{j,k}$$
where convergence is in $L^2$.
Hence, $\{\psi_{j,k} : j, k \in \mathbb{Z}\}$ is an orthonormal basis for $L^2(\mathbb{R})$.
Remark: Note that $\int \psi_{j,k} = 0$ for all $j, k$, yet $\int f$ need not be zero, even though we can write $f$ as a linear combination of the $\psi_{j,k}$! In fact, this will be the case for any $f \in L^1 \cap L^2$ with $\int f \neq 0$. Note that this is not contradictory, since the convergence is not in $L^1$.
We can also talk about $L^1$ convergence.
Proposition 13.
1. If $f \in L^1$, then $P_j f \to f$ in $L^1$ as $j \to \infty$.
2. It is not necessarily true that $P_j f \to 0$ in $L^1$ as $j \to -\infty$.
Proof. To show (1), it suffices to show that the operator norms $\|P_j\|_{1,1}$ are uniformly bounded as $j \to \infty$. Note that in the proof of Theorem 9, we can use the same dense subspace $X = C_c$ to show that $P_j g \to g$ in $L^1$ as $j \to \infty$ (we can transfer from uniform convergence to $L^1$ convergence). Now if we take $g \in X$ with $\|f - g\|_1 \le \varepsilon$, and use the triangle inequality,
$$\|P_j f - f\|_1 \le \|P_j (f - g)\|_1 + \|P_j g - g\|_1 + \|g - f\|_1 \le (1 + \|P_j\|_{1,1})\, \varepsilon + \|P_j g - g\|_1$$
we would be able to carry out the same argument if we can bound $\|P_j\|_{1,1}$ uniformly.
Note that if $f \ge 0$ then $P_j f \ge 0$, so that $P_j$ is a positive operator ($P_j$ replaces $f$ by its average value on intervals, and if $f$ is never negative, the average value will never be negative). This means that since $-|h| \le h \le |h|$, we have $-P_j |h| \le P_j h \le P_j |h|$ and thus
$$
\begin{aligned}
\|P_j h\|_1 &\le \int P_j |h| \\
&= \sum_k \int_{I_{j,k}} \left( 2^j \int_{I_{j,k}} |h|\, dx \right) dy \\
&= \sum_k 2^{-j}\, 2^j \int_{I_{j,k}} |h|\, dx \\
&= \int |h| = \|h\|_1
\end{aligned}
$$
This implies that $\|P_j\|_{1,1} \le 1$ (operator norm), and we can follow through with the density argument as described above.
To show (2), note that $\int P_j f = \int f$ by the same reasoning as above (note we need to apply Fubini, which is okay since $f \in L^1$), and therefore
$$\left| \int f \right| = \left| \int P_j f \right| \le \int |P_j f| = \|P_j f\|_1$$
Thus if $\int f \neq 0$, then $\|P_j f\|_1 \not\to 0$ as $j \to -\infty$. ∎
Remarks
• Note that this tells us that Haar's expansion $f = \langle f, \varphi_{0,0} \rangle\, \varphi_{0,0} + \sum_{j,k} \langle f, \psi_{j,k} \rangle\, \psi_{j,k}$ on $[0,1]$ converges in the $L^1$ sense as well (which is not true for Fourier series).
• It might be tempting to think of $P_j f$ as a conditional expectation $E[f \mid \mathcal{F}_j]$, where $\mathcal{F}_j$ is the $\sigma$-field generated by the intervals $I_{j,k}$; however, we are dealing with Lebesgue measure on $\mathbb{R}$, which is not a probability distribution. Interestingly, if we replace the Lebesgue measure with any probability measure $\mu$ (which would require renormalization of the orthonormal basis), then we can think of conditional expectations, and in fact $P_j f = E[f \mid \mathcal{F}_j] \to f$ (in $L^p$, $1 \le p < \infty$, including $p = \infty$ if $f$ is uniformly continuous) for $j \to \infty$ would follow from the martingale convergence theorem (although the elementary proof above works just as well).
Note that
$$\{\emptyset, (-\infty, 0), [0, \infty), \mathbb{R}\} = \mathcal{F}_{-\infty} \subset \cdots \subset \mathcal{F}_{-1} \subset \mathcal{F}_0 \subset \cdots \subset \mathcal{F}_\infty = \mathcal{B}(\mathbb{R}), \text{ the Borel sets on } \mathbb{R}$$
For $j \to -\infty$, we would have $E[f \mid \mathcal{F}_j] \to E[f \mid \mathcal{F}_{-\infty}]$, which is $\frac{1}{\mu(x < 0)} \int_{x < 0} f$ for $x < 0$ and $\frac{1}{\mu(x > 0)} \int_{x > 0} f$ for $x > 0$. This can be proved directly as well. The idea is that given $\varepsilon > 0$, we can find some (negative) $j$ for which $\mu([-2^{-j}, 2^{-j})) \ge 1 - \varepsilon$. Thus, the two intervals $I_{j,-1}$ and $I_{j,0}$ give almost the entire space, and thus $P_j f$ will be close to $E[f \mid \mathcal{F}_{-\infty}]$.
Implementing the Haar System
The point of the Haar expansion is that it allows us to work with approximations of a function at different scales. As we saw earlier, we can go from a coarse scale $P_{J_0} f$ to a fine scale $P_{J_1} f$ with the successive refinements $Q_k f$, $J_0 \le k \le J_1 - 1$. Let us look at how to transition between coarse and fine scales algorithmically.
Using the notation $d_{j,k}(f) = \langle f, \psi_{j,k} \rangle$ and $c_{j,k}(f) = \langle f, \varphi_{j,k} \rangle$ as before, we note that since $\varphi_{j,k} = 2^{j/2}\, 1_{I_{j,k}}$ and $\psi_{j,k} = 2^{j/2} (1_{I_{j+1,2k}} - 1_{I_{j+1,2k+1}})$, we have that
$$\psi_{j,k} = \frac{1}{\sqrt 2} \left[ \varphi_{j+1,2k} - \varphi_{j+1,2k+1} \right], \qquad \varphi_{j,k} = \frac{1}{\sqrt 2} \left[ \varphi_{j+1,2k} + \varphi_{j+1,2k+1} \right]$$
Thus we have the formulas
$$c_{j,k} = \frac{1}{\sqrt 2} \left[ c_{j+1,2k} + c_{j+1,2k+1} \right], \qquad d_{j,k} = \frac{1}{\sqrt 2} \left[ c_{j+1,2k} - c_{j+1,2k+1} \right]$$
Suppose we have $P_{J_1} f$, the finer approximation of $f$. This is described entirely by the coefficients $c_{J_1,k}$. Using the formulas we obtain $c_{J_1-1,k}$, $d_{J_1-1,k}$ from $c_{J_1,k}$, and so on until we obtain $c_{J_0}$, $d_{J_0}$:
$$
\begin{array}{ccccccccc}
c_{J_1} & \to & c_{J_1-1} & \to & c_{J_1-2} & \to \cdots \to & c_{J_0+1} & \to & \boxed{c_{J_0}} \\
 & \searrow & & \searrow & & & & \searrow & \\
 & & \boxed{d_{J_1-1}} & & \boxed{d_{J_1-2}} & \cdots & \boxed{d_{J_0+1}} & & \boxed{d_{J_0}}
\end{array}
$$
The boxed elements above give the fine to coarse decomposition, and of course we can invert the formulas to go backwards as well. Let's examine how the algorithm works in the discrete setting (or on the finite interval $[0,1]$):
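$$
\begin{array}{cccccccc}
c_{30} & c_{31} & c_{32} & c_{33} & c_{34} & c_{35} & c_{36} & c_{37} \\
c_{20} & d_{20} & c_{21} & d_{21} & c_{22} & d_{22} & c_{23} & d_{23} \\
c_{10} & d_{10} & & & c_{11} & d_{11} & & \\
c_{00} & d_{00} & & & & & &
\end{array}
$$
[Figure: in the original diagram, lines join each pair $c_{j+1,2k}, c_{j+1,2k+1}$ to $c_{j,k}$ and $d_{j,k}$ in the row below.]
This is a picture in the case where we are going between $P_3 f$ and $(P_0 f, Q_k f, 0 \le k \le 2)$. We can reconstruct the coefficients $c_{3,k}$ from $c_{0,0}, d_{0,0}, d_{1,*}, d_{2,*}$ and vice versa.
Each line in the diagram represents a multiply-add operation to transfer between successive levels. At the top row we need $16 = 2^4$ operations, at the second we need $8 = 2^3$, and at the bottom we need $4 = 2^2$ operations. In general, to transfer between the coefficients of $P_J f$ and $(P_0 f, Q_k f, 0 \le k \le J - 1)$ we need a total of $2^{J+1} + 2^J + \cdots + 2^2 = 2^{J+2} - 4 \approx 4N$ operations, where $N = 2^J$ is the number of coefficients. This is a linear time algorithm (compare to the $O(N \log N)$ cost of the FFT).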
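The fine-to-coarse pass and its inverse can be sketched directly from the formulas $c_{j,k} = \frac{1}{\sqrt 2}[c_{j+1,2k} + c_{j+1,2k+1}]$ and $d_{j,k} = \frac{1}{\sqrt 2}[c_{j+1,2k} - c_{j+1,2k+1}]$. The function names and the output layout (a scalar $c_{0,0}$ plus a list of detail arrays) are choices made for this illustration, and the input length is assumed to be a power of 2.

```python
import numpy as np

def haar_analysis(c_fine):
    """Fine-to-coarse pass: from c_{J,k} (length 2^J) to c_{0,0} and the
    details d_{j,k}, 0 <= j < J. Returns (c00, details) with details[j] = d_{j,*}."""
    c = np.asarray(c_fine, dtype=float)
    details = []
    while c.size > 1:
        even, odd = c[0::2], c[1::2]
        details.append((even - odd) / np.sqrt(2))   # d_{j,k}
        c = (even + odd) / np.sqrt(2)               # c_{j,k}
    return c[0], details[::-1]                      # reorder so coarsest level comes first

def haar_synthesis(c00, details):
    """Coarse-to-fine pass: invert haar_analysis level by level."""
    c = np.array([c00], dtype=float)
    for d in details:
        fine = np.empty(2 * c.size)
        fine[0::2] = (c + d) / np.sqrt(2)           # c_{j+1,2k}
        fine[1::2] = (c - d) / np.sqrt(2)           # c_{j+1,2k+1}
        c = fine
    return c

c3 = np.arange(8.0)                                 # stands in for c_{3,0..7}
c00, details = haar_analysis(c3)
print([d.size for d in details])                    # sizes of d_0, d_1, d_2
print(np.allclose(haar_synthesis(c00, details), c3))
```

Each level touches every current coefficient a constant number of times and the level sizes halve, which is where the linear operation count comes from. Because every step is an orthogonal change of basis, the sum of squares of the coefficients is preserved.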
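We can also represent this transformation as a matrix, whose rows are scaled by $\frac{1}{\sqrt 8}$, $\frac{1}{\sqrt 4}$, and $\frac{1}{\sqrt 2}$ according to their level:
$$
\begin{pmatrix}
\frac{1}{\sqrt 8} \big(\, 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 \,\big) \\
\frac{1}{\sqrt 8} \big(\, 1 & 1 & 1 & 1 & -1 & -1 & -1 & -1 \,\big) \\
\frac{1}{\sqrt 4} \big(\, 1 & 1 & -1 & -1 & 0 & 0 & 0 & 0 \,\big) \\
\frac{1}{\sqrt 4} \big(\, 0 & 0 & 0 & 0 & 1 & 1 & -1 & -1 \,\big) \\
\frac{1}{\sqrt 2} \big(\, 1 & -1 & 0 & 0 & 0 & 0 & 0 & 0 \,\big) \\
\frac{1}{\sqrt 2} \big(\, 0 & 0 & 1 & -1 & 0 & 0 & 0 & 0 \,\big) \\
\frac{1}{\sqrt 2} \big(\, 0 & 0 & 0 & 0 & 1 & -1 & 0 & 0 \,\big) \\
\frac{1}{\sqrt 2} \big(\, 0 & 0 & 0 & 0 & 0 & 0 & 1 & -1 \,\big)
\end{pmatrix}
\begin{pmatrix} c_{30} \\ c_{31} \\ c_{32} \\ c_{33} \\ c_{34} \\ c_{35} \\ c_{36} \\ c_{37} \end{pmatrix}
=
\begin{pmatrix} c_{00} \\ d_{00} \\ d_{10} \\ d_{11} \\ d_{20} \\ d_{21} \\ d_{22} \\ d_{23} \end{pmatrix}
$$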
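The matrix form can be generated for any power-of-2 size by a doubling recursion. This construction is an assumption of the sketch rather than something stated in the notes: going from size $m$ to $2m$, the existing rows are stretched with a Kronecker product against $(1, 1)$, the new finest-level differences come from $I_m \otimes (1, -1)$, and everything is scaled by $\frac{1}{\sqrt 2}$. The resulting row order is $c_{00}, d_{00}, d_{10}, d_{11}, d_{20}, \dots$

```python
import numpy as np

def haar_matrix(n):
    """Orthonormal Haar analysis matrix of size n (a power of 2), with rows
    ordered c_00, d_00, d_10, d_11, d_20, ... as in the 8 x 8 example."""
    h = np.array([[1.0]])
    while h.shape[0] < n:
        m = h.shape[0]
        top = np.kron(h, [1.0, 1.0])               # coarser rows stretched to the finer grid
        bottom = np.kron(np.eye(m), [1.0, -1.0])   # differences at the new finest level
        h = np.vstack([top, bottom]) / np.sqrt(2)
    return h

H = haar_matrix(8)
print(np.allclose(H @ H.T, np.eye(8)))             # rows are orthonormal
print(np.round(H * np.sqrt(8), 3))                 # rescaled so the entries are easy to read
```

Since the matrix is orthogonal, the inverse (coarse-to-fine) transform is just its transpose. Applying the dense matrix costs $O(N^2)$, so it is useful for checking small cases rather than as a replacement for the level-by-level algorithm.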

The Haar basis functions (wavelets) are not ideal, since they are not smooth. In terms of applying the Haar decomposition to functions, we will not be able to obtain good estimates (especially not if we want to apply Fourier transforms). We will need to work towards finding similar systems with smooth wavelets.
Week 4 (9/30/2010)
We have that the Haar basis is a basis for $L^2$ as well as $L^p$, and also approximates $C[0,1]$ in the $L^\infty$ norm (though since the Haar basis functions are not continuous themselves, it is not a basis for $(C[0,1], \|\cdot\|_\infty)$). In general, there is a framework that captures more general bases, and we will use this to find smooth basis functions of any specified degree.
Multiresolution Analysis (MRA) Framework (Meyer,Mallat)
Definition 14. An MRA is a sequence $(V_j)_{j \in \mathbb{Z}}$ of closed linear subspaces of $L^2(\mathbb{R})$ with the following properties:
1. $V_j \subset V_{j+1}$ for all $j \in \mathbb{Z}$
2. $\overline{\bigcup_{j \in \mathbb{Z}} V_j} = L^2(\mathbb{R})$, i.e. $L^2(\mathbb{R})$ can be approximated by the subspaces $V_j$.
3. $\bigcap_{j \in \mathbb{Z}} V_j = \{0\}$
4. $f \in V_j \iff f(2\,\cdot) \in V_{j+1}$ for all $j \in \mathbb{Z}$, i.e. the $V_j$ are scaled copies of one another.
5. There exists $\varphi \in V_0$ such that $\{\varphi(\cdot - k) : k \in \mathbb{Z}\}$ is an orthonormal basis for $V_0$.
(4) and (5) together imply that $\{2^{j/2} \varphi(2^j x - k) : k \in \mathbb{Z}\}$ is an orthonormal basis for $V_j$. We will also see that the conditions are redundant: (3) can be derived from the other properties.
(5) can also be relaxed to require that $\{\varphi(\cdot - k) : k \in \mathbb{Z}\}$ be a Riesz basis rather than an orthonormal basis, i.e. $\{\varphi(\cdot - k) : k \in \mathbb{Z}\}$ spans $V_0$ and
$$C_1 \|c\|_{\ell^2} \le \left\| \sum_k c_k\, \varphi(\cdot - k) \right\|_{L^2} \le C_2 \|c\|_{\ell^2}, \text{ for all } c \in \ell^2$$
As in the Haar basis case, we define
• $P_j$ as the orthogonal projection on $V_j$.
• $W_j$ as the orthogonal complement of $V_j$ in $V_{j+1}$, i.e. $V_{j+1} = V_j \oplus W_j$. Note that $f \in W_j$ if and only if $f(2\,\cdot) \in W_{j+1}$.
• $Q_j$ as the orthogonal projection on $W_j$, so $P_{j+1} = P_j + Q_j$.
(1) and (2) above say that
$$\lim_{j \to \infty} \|P_j f - f\|_2 = 0$$
which we proved for the Haar system in Theorem 9, and (1) and (3) say that
$$\lim_{j \to -\infty} \|P_j f\|_2 = 0, \text{ for all } f \in L^2$$
which we proved for the Haar system in Theorem 11.
As in Corollary 12, we also have
$$f = \sum_{j \in \mathbb{Z}} Q_j f$$
with convergence in $L^2$.
Summarizing,
Proposition 15.
1. $\lim_{j \to \infty} \|P_j f - f\|_2 = 0$
2. $\lim_{j \to -\infty} \|P_j f\|_2 = 0$, for all $f \in L^2$
3. $f = \sum_{j \in \mathbb{Z}} Q_j f$ with convergence in $L^2$
Proposition 16. (3) follows from properties (1), (2), (4), (5).
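Proof. Let $f \in \bigcap_{j \in \mathbb{Z}} V_j$. Then $f \in V_{-j}$ for $j > 0$. Property (4) implies that $f(2^j\,\cdot) \in V_0$ for $j > 0$. Property (5) implies that $2^{j/2} f(2^j x) = \sum_k \alpha_k^{(j)}\, \varphi(x - k)$, with $\|\alpha^{(j)}\|_{\ell^2} = \|f\|_{L^2}$ for all $j$. Taking the Fourier transform, we have that
$$2^{-j/2}\, \hat f(2^{-j} \xi) = \left[ \sum_k \alpha_k^{(j)}\, e^{-2\pi i k \xi} \right] \hat\varphi(\xi)$$
Denote $m_j(\xi) := \sum_k \alpha_k^{(j)}\, e^{-2\pi i k \xi}$, a 1-periodic function. Note that $\|m_j\|_{L^2(\mathbb{T})} = \|\alpha^{(j)}\|_{\ell^2} = \|f\|_{L^2}$. Replace $\xi$ by $2^j \xi$ above:
$$\hat f(\xi) = 2^{j/2}\, m_j(2^j \xi)\, \hat\varphi(2^j \xi)$$
We will show that $\hat f = 0$ by showing that $\int_I |\hat f| = 0$ for any interval $I$. Let $0 < a < b$ first. Then
$$
\begin{aligned}
\int_a^b |\hat f(\xi)|\, d\xi &\le \left( \int_a^b |m_j(2^j \xi)|^2\, d\xi \right)^{1/2} \left( \int_a^b 2^j\, |\hat\varphi(2^j \xi)|^2\, d\xi \right)^{1/2} \\
&= \left( 2^{-j} \int_{a 2^j}^{b 2^j} |m_j(\xi)|^2\, d\xi \right)^{1/2} \left( \int_{a 2^j}^{b 2^j} |\hat\varphi(\xi)|^2\, d\xi \right)^{1/2} \\
&\le \left( 2^{-j} \lceil 2^j (b - a) \rceil\, \|f\|_2^2 \right)^{1/2} \left( \int_{a 2^j}^{b 2^j} |\hat\varphi(\xi)|^2\, d\xi \right)^{1/2} \\
&\le C \left( \int_{a 2^j}^{b 2^j} |\hat\varphi(\xi)|^2\, d\xi \right)^{1/2} \to 0
\end{aligned}
$$
as $j \to \infty$, since $\hat\varphi \in L^2$ and the interval $[a 2^j, b 2^j]$ moves off to infinity. This argument works the same if $a, b < 0$ as well, and in general we can split the interval into three parts, $[a, -\delta]$, $[-\delta, \delta]$, $[\delta, b]$, noting that the integral over $[-\delta, \delta]$ can be made arbitrarily small. This implies that $\hat f = 0$ and therefore $f = 0$. ∎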
Definition: A wavelet $\psi$ is a function in $L^2$ such that
$$\{\psi_{j,k} := 2^{j/2}\, \psi(2^j x - k) : j, k \in \mathbb{Z}\}$$
is an orthonormal basis for $L^2(\mathbb{R})$.
We say that the wavelet is associated to a given MRA $(V_j)_{j \in \mathbb{Z}}$ if $\{\psi(\cdot - k)\}_{k \in \mathbb{Z}}$ is an orthonormal basis for $W_0$, in which case the family $\{\psi_{j,k}\}$ defined above is an orthonormal basis for $L^2(\mathbb{R})$.
Remark 17. We can find wavelets $\psi$ for which $\{\psi_{j,k} : j, k \in \mathbb{Z}\}$ is a basis, but $\psi$ is not associated to any MRA. In this case, if we try to find the associated MRA, which by definition must be
$$V_j = \bigoplus_{l = -\infty}^{j-1} W_l$$
property (5) can fail. Nevertheless, the MRA framework is very convenient to work with, and many examples will fall under this framework.
Example 18.Let V
j
￿ {f ∈L
2
:supp(f
ˆ
) ⊂[−2
j−1
,2
j−1
]},j ∈Z.Properties (1),(2),(3),(4) are all trivial.
V
0
={f ∈L
2
:supp(f
ˆ
) ⊂[−1/2,1/2]}
18
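Set $\varphi = \mathcal{F}^{-1}(1_{[-1/2,1/2]})$. Since $\{e^{-2\pi i k\xi}\hat\varphi,\ k\in\mathbb{Z}\}$ is an orthonormal basis for $L^2[-1/2,1/2]$, we have that $\{\varphi(x-k),\ k\in\mathbb{Z}\}$ is an orthonormal basis for $V_0$, and $\varphi(x) = \frac{\sin\pi x}{\pi x}$. Also,
$$W_0 = \{f\in L^2 : \operatorname{supp}(\hat f)\subset[-1,-1/2]\cup[1/2,1]\}$$
and the wavelet is $\psi = \mathcal{F}^{-1}(1_{[-1,-1/2]\cup[1/2,1]})$. Note that $J = [-1,-1/2]\cup[1/2,1]$ is congruent to $[0,1)$ mod $\mathbb{Z}$, i.e. $\{J+k,\ k\in\mathbb{Z}\}$ is a partition of $\mathbb{R}$. This means that $\{e^{-2\pi i k\xi}1_J,\ k\in\mathbb{Z}\}$ is an orthonormal basis for $L^2(J) = \mathcal{F}(W_0)$, and thus $\{\psi(\cdot-k),\ k\in\mathbb{Z}\}$ is an orthonormal basis for $W_0$. Computing $\psi$ explicitly:
$$\hat\psi = 1_{[-1,1]} - 1_{[-1/2,1/2]}$$
$$\psi(x) = 2\,\frac{\sin(2\pi x)}{2\pi x} - \frac{\sin\pi x}{\pi x} = \frac{\sin\pi x}{\pi x}\,[2\cos\pi x - 1]$$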
which still has bad localization properties. This is often called the Shannon wavelet in connection to the sampling theorem, which says that if $f\in V_0$ then $f(x) = \sum_k f(k)\,\varphi(x-k)$.
This is related to the Littlewood-Paley decomposition, which replaces $\hat\varphi$ with smoother cutoffs, but in the process property (5) no longer holds.
So we have the Haar wavelet, which lacks smoothness in time but has good localization in time, and the Shannon wavelet, which lacks localization in time but has great smoothness in time. (In frequency the roles reverse.) They lie at the two extremes of time-frequency localization.
Goal: Find intermediate cases. Daubechies found orthonormal bases of compactly supported wavelets in $C^k$ for any specified $k$.
Further Goal: Understand/characterize smoothness function spaces (Sobolev, Besov, etc.) in terms of the wavelet coefficients. (This is impossible without good localization/smoothness.)
First, we pose the following question:
Question: For what $g\in L^2$ is $\{g(\cdot-k),\ k\in\mathbb{Z}\}$ an orthonormal system? Or equivalently, when is $\{e^{-2\pi i k\xi}\hat g(\xi),\ k\in\mathbb{Z}\}$ an orthonormal system? (These are equivalent since $\mathcal{F}$ is an isometry.)
Theorem 19. $\{g(\cdot-k),\ k\in\mathbb{Z}\}$ is an orthonormal system if and only if $\sum_{l\in\mathbb{Z}}|\hat g(\xi+l)|^2 = 1$ a.e.
(Already we have the example where $g(x) = \frac{\sin\pi x}{\pi x}$.)
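Proof. Define $G(\xi) = \sum_{l\in\mathbb{Z}}|\hat g(\xi+l)|^2$, which is 1-periodic, and also well defined since
$$\int_0^1 G(\xi)\,d\xi = \int_{-\infty}^{\infty}|\hat g(\xi)|^2\,d\xi = \|g\|_{L^2}^2 < \infty$$
so that $G\in L^1(\mathbb{T})$ and hence is finite a.e. We can then look at the Fourier series
$$\hat G(k) := \int_0^1 G(\xi)e^{-2\pi i k\xi}\,d\xi = \int_{-\infty}^{\infty} e^{-2\pi i k\xi}\,\hat g(\xi)\,\overline{\hat g(\xi)}\,d\xi = \int_{-\infty}^{\infty} g(x-k)\,\overline{g(x)}\,dx$$
using Plancherel. This implies that
$$G\equiv 1 \text{ on } \mathbb{T} \iff \hat G(k) = \langle g, g(\cdot-k)\rangle = \delta_k \iff \langle g(\cdot-k), g(\cdot-k')\rangle = \delta_{k,k'}$$
which gives the result. □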
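As a sanity check on Theorem 19, the following is a minimal sketch (not part of the lecture) that evaluates a truncated version of $\sum_l |\hat g(\xi+l)|^2$ for the Haar scaling function $g = 1_{[0,1]}$, whose transform has modulus $|\sin(\pi\xi)/(\pi\xi)|$. The function names and the truncation length `L` are illustrative assumptions; the truncated sum only approaches 1 up to a tail of order $1/L$.

```python
import math

def haar_phi_hat_sq(xi):
    """|g_hat(xi)|^2 for g = indicator of [0,1]: (sin(pi xi)/(pi xi))^2."""
    if xi == 0.0:
        return 1.0
    t = math.pi * xi
    return (math.sin(t) / t) ** 2

def periodization(xi, L=20000):
    """Truncated sum over |l| <= L of |g_hat(xi + l)|^2 (L is an arbitrary cutoff)."""
    return sum(haar_phi_hat_sq(xi + l) for l in range(-L, L + 1))

# Sample a few frequencies in [0, 1); each value should be close to 1.
values = [periodization(xi) for xi in (0.0, 0.1, 0.25, 0.5, 0.9)]
```

Since integer translates of $1_{[0,1]}$ are orthonormal, every entry of `values` should agree with 1 up to the truncation error.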
Exercise 1. Let $\hat\psi = 1_J$ where $J = I\cup(-I)$ and $I = [2/7,1/2]\cup[2,2+2/7]$. Then $|J| = 1$, and check that
• $\{2^jJ,\ j\in\mathbb{Z}\}$ and $\{J+l,\ l\in\mathbb{Z}\}$ are both partitions of $\mathbb{R}$
• Defining $W_j = \{f\in L^2,\ \operatorname{supp}(\hat f)\subset 2^jJ\}$ and $V_j = \bigoplus_{l=-\infty}^{j-1}W_l$, Property (5) fails.
Short term goal: Given an MRA, what is the corresponding wavelet $\psi$? We will cover this next time.
Week 5 (10/7/2010)
Finding the Mother Wavelet
Given an MRA, we are looking for the function $\psi$ (mother wavelet) for which $\{\psi(\cdot-k)\}$ is an orthonormal basis for $W_0$; then $\{\psi_{j,k},\ j,k\in\mathbb{Z}\}$ will be an orthonormal basis for $L^2(\mathbb{R})$.
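Basic Relations
• $f\in V_0$ if and only if $f = \sum_k\langle f,\varphi_{0,k}\rangle\varphi_{0,k}$, and denoting $c_k(f) = \langle f,\varphi_{0,k}\rangle$, this is true if and only if
$$\hat f(\xi) = \Big[\sum_k c_k e^{-2\pi i k\xi}\Big]\hat\varphi(\xi)$$
Call $a_f(\xi) = \sum_k c_k e^{-2\pi i k\xi}$, a one-periodic function, and we note $\|a_f\|_{L^2(\mathbb{T})} = \|c_k(f)\|_{\ell^2} = \|f\|_{L^2(\mathbb{R})}$.
• Since $V_0\subset V_1$, $f\in V_0$ implies that $f\in V_1$, so that $f(x) = \sum_k\langle f,\varphi_{1,k}\rangle\varphi_{1,k}$ where $\varphi_{1,k} = \sqrt2\,\varphi(2\cdot-k)$. Then
$$\hat f(\xi) = \Big[\frac{1}{\sqrt2}\sum_k\langle f,\varphi_{1,k}\rangle e^{-\pi i k\xi}\Big]\hat\varphi(\xi/2)$$
Call $m_f(\xi/2) = \frac{1}{\sqrt2}\sum_k\langle f,\varphi_{1,k}\rangle e^{-\pi i k\xi}$, then we have $\|m_f\|_{L^2(\mathbb{T})} = \frac{1}{\sqrt2}\|f\|_{L^2(\mathbb{R})}$.
So $\hat f(\xi) = m_f(\xi/2)\,\hat\varphi(\xi/2)$.
Notation: For $f = \varphi$, we will write $m_0 := m_\varphi$.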
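For $f = \varphi$, we have that
$$\varphi(x) = \sum_k h_k\,\varphi_{1,k}(x)$$
where $h_k = \langle\varphi,\varphi_{1,k}\rangle$ and $\varphi_{1,k}(x) = \sqrt2\,\varphi(2x-k)$. Then
$$\varphi(x) = \sqrt2\sum_k h_k\,\varphi(2x-k)$$
with $\|h\|_2 = 1$. This is called the refinement equation (2-scale difference equation).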
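In the case of the Haar basis, $\varphi = 1_{[0,1]}$ and
$$1_{[0,1]} = \sqrt2\left(\frac{1}{\sqrt2}1_{[0,1/2]} + \frac{1}{\sqrt2}1_{[1/2,1]}\right)$$
so that $h_0 = h_1 = \frac{1}{\sqrt2}$, with $\varphi_{1,0} = \sqrt2\,1_{[0,1/2]}$ and $\varphi_{1,1} = \sqrt2\,1_{[1/2,1]}$.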
Let us examine $m_0(\xi) = \frac{1}{\sqrt2}\sum_k h_k e^{-2\pi i k\xi}$. We have $\hat\varphi(\xi) = m_0(\xi/2)\,\hat\varphi(\xi/2)$. Recall from last time (Theorem 19) that if $\{g(\cdot-k),\ k\in\mathbb{Z}\}$ form an orthonormal system, then $\sum_l|\hat g(\xi+l)|^2 = 1$ for a.e. $\xi$. Applying this to $g = \varphi$, we have for a.e. $\xi$
$$\begin{aligned}
1 &= \sum_{l\in\mathbb{Z}}|\hat\varphi(2\xi+l)|^2 = \sum_{l\in\mathbb{Z}}|\hat\varphi(2\xi+2l)|^2 + \sum_{l\in\mathbb{Z}}|\hat\varphi(2\xi+2l+1)|^2\\
&= \sum_{l\in\mathbb{Z}}|m_0(\xi+l)|^2|\hat\varphi(\xi+l)|^2 + \sum_{l\in\mathbb{Z}}|m_0(\xi+l+1/2)|^2|\hat\varphi(\xi+l+1/2)|^2\\
&= |m_0(\xi)|^2 + |m_0(\xi+1/2)|^2
\end{aligned}$$
Above we have used the fact that $m_0$ is 1-periodic, so that $m_0(\xi+l) = m_0(\xi)$. Summarizing the facts, we have
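Proposition 20.
• If $f\in V_0$, then $\hat f(\xi) = m_f(\xi/2)\,\hat\varphi(\xi/2)$ for a 1-periodic $m_f\in L^2(\mathbb{T})$. In particular, $\hat\varphi(\xi) = m_0(\xi/2)\,\hat\varphi(\xi/2)$.
• $1 = |m_0(\xi)|^2 + |m_0(\xi+1/2)|^2$ for a.e. $\xi$.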
Example 21.
1. For Haar, we have $\varphi = 1_{[0,1]} = 1_{[-1/2,1/2]}(\cdot-1/2)$, so $\hat\varphi(\xi) = e^{-\pi i\xi}\frac{\sin\pi\xi}{\pi\xi}$, and
$$m_0(\xi) = \frac{1}{\sqrt2}\left(\frac{1}{\sqrt2} + \frac{1}{\sqrt2}e^{-2\pi i\xi}\right) = \frac{1+e^{-2\pi i\xi}}{2} = e^{-\pi i\xi}\cos(\pi\xi)$$
It is easy to see that both properties above are satisfied using the half angle formula. Also, we note that $m_0(\xi)$ is a low-pass filter, mainly supported near $0\in\mathbb{T}$, and $m_0(\xi+1/2)$ is a high-pass filter, mainly supported near $1/2\in\mathbb{T}$.
2. For Shannon, we have $\hat\varphi = 1_{[-1/2,1/2]}$ so $\varphi(x) = \frac{\sin\pi x}{\pi x}$. Since
$$1_{[-1/2,1/2]}(\xi) = 1_{[-1/2,1/2]}(\xi)\,1_{[-1,1]}(\xi),$$
we get $m_0(\xi) = 1_{[-1/4,1/4]}(\xi)$ (extended 1-periodically). Again, it is easy to check the two properties and that $m_0(\xi)$ is low-pass and $m_0(\xi+1/2)$ is high-pass.
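The two filters of Example 21 can be checked numerically against the identity $|m_0(\xi)|^2 + |m_0(\xi+1/2)|^2 = 1$. This is a minimal sketch with hypothetical helper names; the Shannon filter is taken on the half-open interval $[-1/4,1/4)$ so that the identity also holds at the (measure zero) endpoints, which is a choice made here and not something the notes specify.

```python
import cmath
import math

def m0_haar(xi):
    """Haar low-pass filter (1 + e^{-2 pi i xi}) / 2."""
    return (1 + cmath.exp(-2j * math.pi * xi)) / 2

def m0_shannon(xi):
    """1-periodic extension of the indicator of [-1/4, 1/4) (half-open by assumption)."""
    r = xi - round(xi)  # representative of xi in [-1/2, 1/2]
    return 1.0 if -0.25 <= r < 0.25 else 0.0

def qmf_defect(m0, xi):
    """| |m0(xi)|^2 + |m0(xi + 1/2)|^2 - 1 |, which should vanish."""
    return abs(abs(m0(xi)) ** 2 + abs(m0(xi + 0.5)) ** 2 - 1.0)

grid = [k / 97 for k in range(97)]
haar_defect = max(qmf_defect(m0_haar, xi) for xi in grid)
shannon_defect = max(qmf_defect(m0_shannon, xi) for xi in grid)
```

Both defects should be zero up to rounding; the Haar case is just $\cos^2(\pi\xi) + \sin^2(\pi\xi) = 1$.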
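Let us now characterize the detail space $W_0$. $f\in W_0$ if and only if $f\in V_1\cap V_0^{\perp}$. Recall that $f\in V_1$ if and only if $\hat f(\xi) = m_f(\xi/2)\,\hat\varphi(\xi/2)$ with $\|m_f\|_{L^2(\mathbb{T})} = \frac{1}{\sqrt2}\|f\|_{L^2}$.
$f$ is perpendicular to $V_0$ if and only if $f$ is perpendicular to $\varphi_{0,k}$ for all $k\in\mathbb{Z}$, which is true if and only if $\hat f$ is perpendicular to $e^{-2\pi i k\xi}\hat\varphi(\xi)$. Computationally (the same computation as in Theorem 19), this means that
$$0 = \int_{\mathbb{R}}\hat f(\xi)\,\overline{\hat\varphi(\xi)}\,e^{2\pi i k\xi}\,d\xi = \int_0^1\Big(\sum_l\hat f(\xi+l)\,\overline{\hat\varphi(\xi+l)}\Big)e^{2\pi i k\xi}\,d\xi\quad\text{for all } k$$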
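noting that $\sum_l\hat f(\xi+l)\,\overline{\hat\varphi(\xi+l)}$ is a one-periodic function that is well-defined, as
$$\begin{aligned}
\Big\|\sum_l\hat f(\cdot+l)\,\overline{\hat\varphi(\cdot+l)}\Big\|_{L^1(\mathbb{T})}
&\le \sum_l\big\|\hat f(\cdot+l)\,\overline{\hat\varphi(\cdot+l)}\big\|_{L^1(\mathbb{T})}\\
&\le \sum_l\|\hat f(\cdot+l)\|_{L^2(\mathbb{T})}\,\|\hat\varphi(\cdot+l)\|_{L^2(\mathbb{T})}\\
&\le \Big(\sum_l\|\hat f(\cdot+l)\|_{L^2(\mathbb{T})}^2\Big)^{1/2}\Big(\sum_l\|\hat\varphi(\cdot+l)\|_{L^2(\mathbb{T})}^2\Big)^{1/2}\\
&= \|\hat f\|_{L^2}\,\|\hat\varphi\|_{L^2} < \infty
\end{aligned}$$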
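Continuing on, the previous statement is equivalent to
$$\sum_l\hat f(\xi+l)\,\overline{\hat\varphi(\xi+l)} \equiv 0\quad\text{a.e. }\xi$$
Now we split into evens and odds again, and replace $\xi$ by $2\xi$. We additionally use the fact that $f\in V_1$, so that $\hat f(\xi) = m_f(\xi/2)\,\hat\varphi(\xi/2)$.
$$\begin{aligned}
0 &= \sum_l\hat f(2\xi+2l)\,\overline{\hat\varphi(2\xi+2l)} + \sum_l\hat f(2\xi+2l+1)\,\overline{\hat\varphi(2\xi+2l+1)}\\
&= \sum_l m_f(\xi+l)\hat\varphi(\xi+l)\,\overline{m_0(\xi+l)\hat\varphi(\xi+l)} + \sum_l m_f(\xi+l+\tfrac12)\hat\varphi(\xi+l+\tfrac12)\,\overline{m_0(\xi+l+\tfrac12)\hat\varphi(\xi+l+\tfrac12)}\\
&= m_f(\xi)\,\overline{m_0(\xi)}\sum_l|\hat\varphi(\xi+l)|^2 + m_f(\xi+1/2)\,\overline{m_0(\xi+1/2)}\sum_l|\hat\varphi(\xi+l+1/2)|^2\\
0 &= m_f(\xi)\,\overline{m_0(\xi)} + m_f(\xi+1/2)\,\overline{m_0(\xi+1/2)}
\end{aligned}$$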
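We use the following elementary lemma: Let $(z_1,z_2)\in\mathbb{C}^2$ and $(\nu_1,\nu_2)\in\mathbb{C}^2$, $(\nu_1,\nu_2)\ne(0,0)$, be such that $z_1\bar\nu_1 + z_2\bar\nu_2 = 0$. Then there exists $\lambda\in\mathbb{C}$ such that $(z_1,z_2) = \lambda(\bar\nu_2,-\bar\nu_1)$. (Proof: if $\nu_1\ne0$, set $\lambda = -z_2/\bar\nu_1$; otherwise $\nu_2\ne0$ and we set $\lambda = z_1/\bar\nu_2$.)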
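All of this implies that $f\in W_0$ if and only if there exists $\lambda_f(\xi)$ such that
$$\big(m_f(\xi),\,m_f(\xi+1/2)\big) = \lambda_f(\xi)\big(\overline{m_0(\xi+1/2)},\,-\overline{m_0(\xi)}\big)$$
i.e. $m_f(\xi) = \lambda_f(\xi)\,\overline{m_0(\xi+1/2)}$ and $m_f(\xi+1/2) = -\lambda_f(\xi)\,\overline{m_0(\xi)}$. Combining these (replace $\xi$ by $\xi+1/2$ in the first relation and use that $m_0$ is 1-periodic) shows that
$$\lambda_f(\xi+1/2)\,\overline{m_0(\xi)} = m_f(\xi+1/2) = -\lambda_f(\xi)\,\overline{m_0(\xi)}$$
so that $\lambda_f(\xi+1/2) = -\lambda_f(\xi)$ whenever $m_0(\xi)\ne0$. Note the second property of $m_0$ says that $|m_0(\xi)|^2 + |m_0(\xi+1/2)|^2 = 1$ for a.e. $\xi$, so that either $m_0(\xi)\ne0$ or $m_0(\xi+1/2)\ne0$ for a.e. $\xi$. In the first case, this means that $\lambda_f(\xi+1/2) = -\lambda_f(\xi)$, and in the second case, this means that $\lambda_f(\xi+1) = \lambda_f(\xi) = -\lambda_f(\xi+1/2)$, so in fact $\lambda_f(\xi+1/2) = -\lambda_f(\xi)$ for a.e. $\xi$.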
Now define $\nu_f(\xi) := e^{-\pi i\xi}\lambda_f(\xi/2)$. $\nu_f$ is one-periodic by the anti-symmetry of $\lambda_f$:
$$\nu_f(\xi+1) = -e^{-\pi i\xi}\lambda_f(\xi/2+1/2) = e^{-\pi i\xi}\lambda_f(\xi/2) = \nu_f(\xi)$$
Thus we have the following characterization for $W_0$:
Proposition 22. $f\in W_0$ if and only if
$$\hat f(\xi) = e^{\pi i\xi}\,\nu_f(\xi)\,\overline{m_0(\xi/2+1/2)}\,\hat\varphi(\xi/2)$$
for some $\nu_f\in L^2(\mathbb{T})$ with $\|\nu_f\|_{L^2(\mathbb{T})} = \|f\|_{L^2(\mathbb{R})}$.
$\nu_f$ contains information specific to $f$, and $\overline{m_0(\xi/2+1/2)}$ acts as a high-pass filter.
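Proof. We already have that $f\in V_1$ implies $\hat f(\xi) = m_f(\xi/2)\,\hat\varphi(\xi/2)$. We just showed that
$$m_f(\xi/2) = \lambda_f(\xi/2)\,\overline{m_0(\xi/2+1/2)} = e^{\pi i\xi}\,\nu_f(\xi)\,\overline{m_0(\xi/2+1/2)}$$
for some $\nu_f$ (note we can transfer between $\lambda_f$ and $\nu_f$ with the definition). Also,
$$\begin{aligned}
\|f\|_{L^2(\mathbb{R})}^2 = 2\|m_f\|_{L^2(\mathbb{T})}^2
&= 2\int_0^{1/2}|m_f(\xi)|^2\,d\xi + 2\int_0^{1/2}|m_f(\xi+1/2)|^2\,d\xi\\
&= 2\int_0^{1/2}|\lambda_f(\xi)|^2\big(|m_0(\xi+1/2)|^2 + |m_0(\xi)|^2\big)\,d\xi\\
&= 2\int_0^{1/2}|\lambda_f(\xi)|^2\,d\xi = \int_0^1|\nu_f(\xi)|^2\,d\xi
\end{aligned}$$
as desired.
□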
Now for a function in $W_0$ to be the mother wavelet, there is an additional property to be satisfied (from Theorem 19). We will show that
Theorem 23. Given an MRA with scaling function $\varphi$ and the associated low-pass filter $m_0$, $\psi\in W_0$ is a mother wavelet for the given MRA (i.e. $\{\psi(\cdot-k)\}$ is an orthonormal basis for $W_0$) if and only if
$$\hat\psi(\xi) = e^{\pi i\xi}\,\overline{m_0(\xi/2+1/2)}\,\hat\varphi(\xi/2)\,\gamma(\xi)$$
where $\gamma$ is 1-periodic and $|\gamma(\xi)| = 1$ for a.e. $\xi$.
Week 6 (10/14/2010)
Proof. First assume that the equation for $\hat\psi$ holds. By Proposition 22, with $\nu_f = \gamma$, we have that $\psi\in W_0$ and that $\|\psi\|_{L^2(\mathbb{R})} = \|\gamma\|_{L^2(\mathbb{T})} = 1$. We want to check that $\{\psi(\cdot-k),\ k\in\mathbb{Z}\}$ forms an orthonormal basis for $W_0$. For orthonormality, by Theorem 19 we just need to check that $\sum_l|\hat\psi(\xi+l)|^2 = 1$ a.e. We already know that $\sum_l|\hat\varphi(\xi+l)|^2 = 1$ a.e., since $\{\varphi(\cdot-k),\ k\in\mathbb{Z}\}$ is an orthonormal basis for $V_0$. Applying the usual computation, we look at $2\xi$ and split into evens and odds:
$$\begin{aligned}
\sum_l|\hat\psi(2\xi+l)|^2 &= \sum_l|m_0(\xi+l/2+1/2)|^2\,|\hat\varphi(\xi+l/2)|^2\\
&= \sum_l|m_0(\xi+l+1/2)|^2\,|\hat\varphi(\xi+l)|^2 + \sum_l|m_0(\xi+l+1)|^2\,|\hat\varphi(\xi+l+1/2)|^2\\
&= |m_0(\xi+1/2)|^2 + |m_0(\xi+1)|^2 = 1
\end{aligned}$$
noting the facts from Proposition 20 for the last line, and that $m_0$ is 1-periodic. Thus $\{\psi(\cdot-k),\ k\in\mathbb{Z}\}$ forms an orthonormal system. Now we show that this orthonormal system spans $W_0$. Again from Proposition 22, we know that for any $f\in W_0$ we can write
$$\hat f(\xi) = e^{\pi i\xi}\,\nu_f(\xi)\,\overline{m_0(\xi/2+1/2)}\,\hat\varphi(\xi/2) = \frac{\nu_f(\xi)}{\gamma(\xi)}\,\hat\psi(\xi)$$
so that $\hat f(\xi)$ is the product of a 1-periodic function and $\hat\psi(\xi)$. Taking the Fourier series of $\nu_f(\xi)/\gamma(\xi)$ and inverting the Fourier transform (see the remark below) shows that
$$f(x) = \sum_k c_k\,\psi(x-k)$$
so that $f\in\operatorname{span}\{\psi(\cdot-k),\ k\in\mathbb{Z}\}$. This shows that $\psi$ is a wavelet for the given MRA.
Conversely, now assume that $\psi$ is a wavelet for the given MRA. Then by Proposition 22 we have some $\nu_\psi$ such that $\hat\psi(\xi) = e^{\pi i\xi}\,\nu_\psi(\xi)\,\overline{m_0(\xi/2+1/2)}\,\hat\varphi(\xi/2)$ and $\|\nu_\psi\|_{L^2(\mathbb{T})} = \|\psi\|_{L^2(\mathbb{R})} = 1$. We also have that $\sum_l|\hat\psi(\xi+l)|^2 = 1$ a.e., since the translates of $\psi$ form an orthonormal system. Then by the same computation as above, we have
$$\begin{aligned}
1 = \sum_l|\hat\psi(2\xi+l)|^2 &= \sum_l|\nu_\psi(2\xi+l)|^2\,|m_0(\xi+l/2+1/2)|^2\,|\hat\varphi(\xi+l/2)|^2\\
&= |\nu_\psi(2\xi)|^2\Big[\sum_l|m_0(\xi+l+1/2)|^2|\hat\varphi(\xi+l)|^2 + \sum_l|m_0(\xi+l+1)|^2|\hat\varphi(\xi+l+1/2)|^2\Big]\\
&= |\nu_\psi(2\xi)|^2
\end{aligned}$$
so that $|\nu_\psi(\xi)| = 1$ for a.e. $\xi$. □
The following is a useful characterization of when a function lies in the span of translates, and we used it in the previous proof:
Proposition 24. Let $\{g(\cdot-k),\ k\in\mathbb{Z}\}$ be an orthonormal basis for $Y\subset L^2$. Then $f\in Y = \operatorname{span}\{g(\cdot-k),\ k\in\mathbb{Z}\}$ if and only if $\hat f(\xi) = \lambda_f(\xi)\,\hat g(\xi)$ where $\lambda_f$ is a 1-periodic function.
Proof. We showed this in the previous proof by using the Fourier series $\lambda_f(\xi) = \sum_k\lambda_f(k)\,e^{-2\pi i k\xi}$, so that
$$\hat f(\xi) = \sum_k\lambda_f(k)\,e^{-2\pi i k\xi}\,\hat g(\xi)\ \Longrightarrow\ f(x) = \sum_k\lambda_f(k)\,g(x-k)$$
Conversely, if $f\in Y$, so that $f = \sum_k c_k\,g(x-k)$, then taking the Fourier transform shows that $\hat f(\xi) = \big[\sum_k c_k e^{-2\pi i k\xi}\big]\hat g(\xi)$. □
From this, we can show that for any 1-periodic $\gamma$ with $|\gamma| = 1$, if we set $\hat g_\gamma(\xi) := \hat g(\xi)\,\gamma(\xi)$, then $\{g_\gamma(\cdot-k),\ k\in\mathbb{Z}\}$ also forms an orthonormal basis for $Y$. This follows from the fact that
$$\sum_l|\hat g(\xi+l)|^2 = 1\ \text{a.e.}\iff\sum_l|\hat g_\gamma(\xi+l)|^2 = 1\ \text{a.e.}$$
so that $\{g_\gamma(\cdot-k),\ k\in\mathbb{Z}\}$ forms an orthonormal system. That the span is the same follows immediately from Proposition 24, since for any $f\in Y$,
$$\hat f(\xi) = \lambda_f(\xi)\,\hat g(\xi) = \frac{\lambda_f(\xi)}{\gamma(\xi)}\,\hat g_\gamma(\xi)$$
just as in the previous proof.
For example, if we take $\gamma(\xi) = e^{2\pi i N\xi}$, this shifts $\psi$ by $N$, which is saying the obvious fact that if $\{\psi(\cdot-k),\ k\in\mathbb{Z}\}$ is an orthonormal basis for $W_0$, then $\{\psi(\cdot-N-k),\ k\in\mathbb{Z}\}$ is an orthonormal basis for $W_0$.
Essentially this observation says that the $\gamma$ of Theorem 23 can be an arbitrary 1-periodic function with $|\gamma| = 1$. The canonical choice of course is just to set $\gamma(\xi) = 1$.
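This allows us to compute $\psi$ explicitly in terms of the refinement equation. Recall
$$m_0(\xi) = \frac{1}{\sqrt2}\sum_k h_k e^{-2\pi i k\xi}$$
where $h_k = \langle\varphi,\varphi_{1,k}\rangle$. Take $\gamma\equiv1$, so that
$$\begin{aligned}
\hat\psi(\xi) = e^{\pi i\xi}\,\overline{m_0(\xi/2+1/2)}\,\hat\varphi(\xi/2)
&= \frac{1}{\sqrt2}\sum_k\bar h_k\,e^{2\pi i k(\xi/2+1/2)}\,e^{\pi i\xi}\,\hat\varphi(\xi/2)\\
&= \frac{1}{\sqrt2}\sum_k\bar h_k\,(-1)^k\,e^{\pi i(k+1)\xi}\,\hat\varphi(\xi/2)\\
(k=-l-1)\quad&= \frac{1}{\sqrt2}\sum_l(-1)^{-l-1}\,\bar h_{-l-1}\,e^{-\pi i l\xi}\,\hat\varphi(\xi/2)
\end{aligned}$$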
Inverting the Fourier transform gives
Proposition 25.
$$\psi(x) = \sqrt2\sum_l\bar h_{-l-1}\,(-1)^{l-1}\,\varphi(2x-l)$$
so $\psi$ is built from the coefficients $h_k$, except flipped and with modulated signs.
Example 26. Check this with the Haar system. In the case of the Haar basis, we had $\varphi = 1_{[0,1]}$,
$$m_0(\xi) = \frac{1}{\sqrt2}\left(\frac{1}{\sqrt2} + \frac{1}{\sqrt2}e^{-2\pi i\xi}\right) = e^{-\pi i\xi}\cos(\pi\xi)$$
and $h_0 = h_1 = \frac{1}{\sqrt2}$. Then
$$\psi(x) = \sqrt2\left(\frac{1}{\sqrt2}1_{[0,1]}(2x+1) - \frac{1}{\sqrt2}1_{[0,1]}(2x+2)\right) = 1_{[0,1]}(2x+1) - 1_{[0,1]}(2x+2) = \begin{cases}1 & -\frac12\le x\le0\\ -1 & -1\le x\le-\frac12\end{cases}$$
Note that it is a shifted and flipped version of what we used before, which of course is equivalent from the earlier remarks.
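Proposition 25 can be turned into a small computation. The sketch below (helper names are illustrative, not from the lecture) builds the coefficients $g_l = (-1)^{l-1}\bar h_{-l-1}$ from a filter stored as a dictionary `{k: h_k}` and evaluates $\psi(x) = \sqrt2\sum_l g_l\,\varphi(2x-l)$ for the Haar scaling function. The half-open convention $\varphi = 1_{[0,1)}$ is an assumption made here to avoid double counting at the breakpoints.

```python
import math

def wavelet_filter(h):
    """Return {l: g_l} with g_l = (-1)**(l-1) * conj(h_{-l-1}), for h given as {k: h_k}."""
    g = {}
    for k, hk in h.items():
        l = -k - 1
        sign = 1 if (l - 1) % 2 == 0 else -1
        g[l] = sign * hk.conjugate()
    return g

def phi_haar(x):
    """Haar scaling function, taken as the indicator of [0, 1)."""
    return 1.0 if 0.0 <= x < 1.0 else 0.0

def psi_from_refinement(x, g, phi):
    """psi(x) = sqrt(2) * sum_l g_l * phi(2x - l)."""
    return math.sqrt(2) * sum(gl * phi(2 * x - l) for l, gl in g.items())

h_haar = {0: 1 / math.sqrt(2), 1: 1 / math.sqrt(2)}
g_haar = wavelet_filter(h_haar)  # supported on l = -1 and l = -2

psi_values = [psi_from_refinement(x, g_haar, phi_haar) for x in (-0.75, -0.25, 0.25)]
```

The three sample points lie in $[-1,-1/2)$, $[-1/2,0)$ and outside the support, so `psi_values` should be close to $-1$, $1$ and $0$, matching Example 26.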
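Computing the Wavelet Coefficients
Define $g_k := (-1)^{k-1}\,\bar h_{-k-1}$ (the coefficients of Proposition 25), so that $\varphi(x) = \sqrt2\sum_k h_k\,\varphi(2x-k)$ and $\psi(x) = \sqrt2\sum_k g_k\,\varphi(2x-k)$. This means that
$$\varphi_{j,k}(x) = 2^{j/2}\varphi(2^jx-k) = 2^{\frac{j+1}{2}}\sum_l h_l\,\varphi(2(2^jx-k)-l) = \sum_l h_l\,\varphi_{j+1,l+2k}(x)$$
Likewise,
$$\psi_{j,k} = \sum_l g_l\,\varphi_{j+1,l+2k}$$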
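As before, we will set $c_{j,k}(f) = \langle f,\varphi_{j,k}\rangle$ and $d_{j,k}(f) = \langle f,\psi_{j,k}\rangle$. Then taking the inner product with the previous relations,
$$c_{j,k} = \sum_l\bar h_l\,c_{j+1,l+2k} = \sum_l\bar h_{l-2k}\,c_{j+1,l}$$
We write $\sum_l\bar h_{-(2k-l)}\,c_{j+1,l}$ as the convolution $\big(\bar h_{-(\cdot)}\ast c_{j+1,(\cdot)}\big)_{2k}$:
$$c_{j,k} = \sum_l\bar h_{l-2k}\,c_{j+1,l} = \big(\bar h_{-(\cdot)}\ast c_{j+1,(\cdot)}\big)_{2k}$$
$$d_{j,k} = \sum_l\bar g_{l-2k}\,c_{j+1,l} = \big(\bar g_{-(\cdot)}\ast c_{j+1,(\cdot)}\big)_{2k}$$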
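As a block diagram,
$$(c_{j+1})\ \longrightarrow\ \boxed{(\bar h_{-n})}\ \longrightarrow\ \boxed{\downarrow2}\ \longrightarrow\ (c_j)$$
$$(c_{j+1})\ \longrightarrow\ \boxed{(\bar g_{-n})}\ \longrightarrow\ \boxed{\downarrow2}\ \longrightarrow\ (d_j)$$
where the first blocks denote convolution and $\downarrow2$ denotes the downsampling operator
$$(c_0,c_1,c_2,c_3,\dots)\mapsto(c_0,c_2,\dots)$$
(note we want the $2k$-th coefficient of the convolution for the $k$-th coefficients of $c_j$, $d_j$). As before, given the coefficients for a function $f$ at the finer level $V_{j+1}$, we can decompose them into the coefficients at the coarser level $V_j$ and the coefficients in the detail space $W_j$. In matrix form, we have
$$\begin{pmatrix}\vdots\\ c_{j,k}\\ d_{j,k}\\ c_{j,k+1}\\ d_{j,k+1}\\ \vdots\end{pmatrix} = \begin{pmatrix}\ddots & & & & & & \\ \cdots & \bar h_0 & \bar h_1 & \bar h_2 & \bar h_3 & \bar h_4 & \cdots\\ \cdots & \bar g_0 & \bar g_1 & \bar g_2 & \bar g_3 & \bar g_4 & \cdots\\ \cdots & \bar h_{-2} & \bar h_{-1} & \bar h_0 & \bar h_1 & \bar h_2 & \cdots\\ \cdots & \bar g_{-2} & \bar g_{-1} & \bar g_0 & \bar g_1 & \bar g_2 & \cdots\\ & & & & & & \ddots\end{pmatrix}\begin{pmatrix}\vdots\\ c_{j+1,2k}\\ c_{j+1,2k+1}\\ c_{j+1,2k+2}\\ c_{j+1,2k+3}\\ c_{j+1,2k+4}\\ \vdots\end{pmatrix}$$
where the displayed columns correspond to $c_{j+1,2k},\dots,c_{j+1,2k+4}$, so each pair of rows is the previous pair shifted two columns to the right. This matrix is an orthogonal (unitary) matrix.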
Exercise 2. Write out $|m_0(\xi)|^2 + |m_0(\xi+1/2)|^2 = 1$ in terms of $h_k$ only. Then $\sum_k h_k\,\bar h_{k+2n} = \delta_n$.
Doing the same for $m_\psi(\xi)\,\overline{m_0(\xi)} + m_\psi(\xi+1/2)\,\overline{m_0(\xi+1/2)} = 0$ shows that $\sum_k h_k\,\bar g_{k+2n} = 0$ for all $n$.
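For the reconstruction formula, we note that
$$P_{j+1}f = P_jf + Q_jf = \sum_l c_{j,l}\,\varphi_{j,l} + \sum_l d_{j,l}\,\psi_{j,l}$$
so that
$$\begin{aligned}
c_{j+1,k} = \langle f,\varphi_{j+1,k}\rangle &= \langle P_{j+1}f,\varphi_{j+1,k}\rangle\\
&= \sum_l c_{j,l}\,\langle\varphi_{j,l},\varphi_{j+1,k}\rangle + \sum_l d_{j,l}\,\langle\psi_{j,l},\varphi_{j+1,k}\rangle\\
&= \sum_l c_{j,l}\,h_{k-2l} + \sum_l d_{j,l}\,g_{k-2l}
\end{aligned}$$
noting that
$$\langle\varphi_{j,l},\varphi_{j+1,k}\rangle = 2^{\frac j2}2^{\frac{j+1}{2}}\int\varphi(2^jx-l)\,\overline{\varphi(2^{j+1}x-k)}\,dx = \sqrt2\int\varphi(x)\,\overline{\varphi(2x-k+2l)}\,dx = \langle\varphi_{0,0},\varphi_{1,k-2l}\rangle = h_{k-2l}$$
and a similar computation holds for $g_{k-2l}$. We can write the above as
$$c_{j+1,k} = \big(h\ast\tilde c_{j,(\cdot)}\big)_k + \big(g\ast\tilde d_{j,(\cdot)}\big)_k$$
where $\tilde c_j = (\dots,c_{-1},0,c_0,0,c_1,0,\dots)$ and likewise for $\tilde d$. This operation is called upsampling, which we will denote by $\uparrow2$ in the block diagram:
$$(c_j)\ \longrightarrow\ \boxed{\uparrow2}\ \longrightarrow\ \boxed{h}\ \longrightarrow\ \oplus\ \longrightarrow\ (c_{j+1})$$
$$(d_j)\ \longrightarrow\ \boxed{\uparrow2}\ \longrightarrow\ \boxed{g}\ \longrightarrow\ \oplus$$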
Note that for practical purposes, instead of working directly with the functions $f\in L^2$, $\varphi$, $\psi$, these equations allow us to work entirely in terms of the coefficient sequences $(c_j)$, $(d_j)$ and the filters $(h_k)$, $(g_k)$.
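The decomposition and reconstruction formulas above translate directly into code. The following is a minimal sketch under assumptions that are not in the notes: finite sequences are treated as periodic (indices taken mod $N$), filters are stored as dictionaries `{k: h_k}`, and the 4-tap Daubechies filter is used as a test case (the lecture has not yet introduced it). It implements $c_{j,k} = \sum_m\bar h_m\,c_{j+1,m+2k}$, $d_{j,k} = \sum_m\bar g_m\,c_{j+1,m+2k}$ and $c_{j+1,k} = \sum_l c_{j,l}h_{k-2l} + \sum_l d_{j,l}g_{k-2l}$.

```python
import math

def wavelet_filter(h):
    """g_l = (-1)**(l-1) * conj(h_{-l-1}) for a filter given as {k: h_k}."""
    return {-k - 1: (1 if (-k - 2) % 2 == 0 else -1) * hk.conjugate()
            for k, hk in h.items()}

def analyze(c_fine, h, g):
    """One decomposition step (periodic indexing): returns coarse c and detail d."""
    n = len(c_fine)
    c = [sum(hm.conjugate() * c_fine[(m + 2 * k) % n] for m, hm in h.items())
         for k in range(n // 2)]
    d = [sum(gm.conjugate() * c_fine[(m + 2 * k) % n] for m, gm in g.items())
         for k in range(n // 2)]
    return c, d

def synthesize(c, d, h, g):
    """One reconstruction step: upsample by 2, filter with h and g, and add."""
    n = 2 * len(c)
    out = [0.0] * n
    for l in range(len(c)):
        for m, hm in h.items():
            out[(m + 2 * l) % n] += c[l] * hm
        for m, gm in g.items():
            out[(m + 2 * l) % n] += d[l] * gm
    return out

# Assumed test filter: Daubechies' 4-tap filter, normalized so that sum |h_k|^2 = 1.
s3 = math.sqrt(3)
h_d4 = {k: v / (4 * math.sqrt(2))
        for k, v in enumerate([1 + s3, 3 + s3, 3 - s3, 1 - s3])}
g_d4 = wavelet_filter(h_d4)

signal = [4.0, 2.0, 5.0, 5.0, 1.0, 0.0, 3.0, 7.0]
c, d = analyze(signal, h_d4, g_d4)
rebuilt = synthesize(c, d, h_d4, g_d4)
```

Because the analysis matrix is orthogonal, `rebuilt` should reproduce `signal` up to rounding, and the energy $\sum c_k^2 + \sum d_k^2$ should equal $\sum$ `signal`$_k^2$. Periodic wrapping is only one way to handle finite data; other boundary treatments change the values near the ends.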
For finite sequences $c_j$, let us estimate the cost of computing the decomposition $d_{j-1},d_{j-2},\dots,d_0,c_0$. As long as $h$, $g$ are finite length sequences, the cost is linear in the input length. Computing $c_{j-1},d_{j-1}$ from $c_j$ takes a linear amount of computation, say $CN$ (with $C$ depending on the support of $h$, $g$). Then, we note that there are half as many coefficients in $c_{j-1}$, so to compute $c_{j-2},d_{j-2}$ from $c_{j-1}$ takes half the amount of computation, $CN/2$, and so forth. Adding these up gives a geometric series of length at most $\log_2N$ (where $N$ is the number of coefficients in $(c_j)$), so the total amount of computation is $CN\,[2^0+2^{-1}+\dots+2^{-\log_2N}]\le2CN$. (Again, compare to the FFT on $N$ points, which takes $N\log N$ time.) The multiresolution structure is what buys us this speedup.
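The geometric-series count can be made concrete. This small sketch (the function name and the cost model of one multiplication per filter tap per output coefficient are assumptions for illustration) tallies the multiplications of a full pyramid decomposition of $N = 2^J$ coefficients.

```python
def pyramid_cost(n, filter_len):
    """Multiplications for a full decomposition of n = 2**J coefficients.

    Each level produces n/2 coarse and n/2 detail coefficients, and each
    output coefficient costs filter_len multiplications.
    """
    total = 0
    while n > 1:
        total += 2 * filter_len * (n // 2)
        n //= 2
    return total

N, taps = 1024, 4
cost = pyramid_cost(N, taps)   # taps * (N + N/2 + ... + 2) = taps * (2N - 2)
bound = 2 * taps * N           # the bound 2CN from the text, with C = taps
```

With these numbers the cost is $4(2\cdot1024-2) = 8184$, just under the bound $2CN = 8192$.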
Week 7 (10/21/2010)
Designing Wavelets
What do we want in a wavelet basis?
1. A basis not just for $L^2$ but for other function spaces (Sobolev, Hölder, Besov). This typically requires that $\psi$ has sufficient smoothness.
2. Local expansions. This requires that $\psi$ be well localized in time, ideally compactly supported with small support, otherwise with fast decay.
3. If we have $f = \sum d_{j,k}\,\psi_{j,k}$, we want sparse decompositions, or decay of the wavelet coefficients for large $|j|$ (the scale). This is analogous to the case of the Fourier basis, where the smoothness of the function corresponds to decay in the Fourier coefficients. For $k$ (the position) there is no corresponding property, as decay in positional coefficients only describes the decay of the function, and not the smoothness. In any case, we care about wavelets for which the wavelet coefficients decay in $|j|$ at a rate that corresponds to the smoothness class of $f$.
This requires vanishing moments for $\psi$, i.e. $\int x^k\psi(x)\,dx = 0$, or equivalently $\hat\psi^{(k)}(0) = 0$, for $k = 0,1,\dots,r$. This is because if functions are locally smooth, then locally they are well-approximated by polynomials. In this case the wavelet coefficients are nothing more than
$$\int f\,\psi_{j,k} = \int(f-p_f)\,\psi_{j,k}$$
if the degree of the polynomial $p_f$ is at most $r$. If $p_f$ approximates $f$ well near where $\psi_{j,k}$ is supported, then the wavelet coefficient is small.
The reason we care about decay is for truncation purposes: we can describe $f$ with fewer coefficients while minimizing the distortion when we reconstruct $f$.
4. Fast algorithms (using the MRA framework and good time localization).
So far we've seen the route MRA$((V_j),\varphi)\Rightarrow\psi$, characterizing the wavelet $\psi$ for an MRA. Now we want to design $\varphi$ so that we construct an MRA corresponding to a wavelet with the properties that we want, i.e. the route we take now is
$$\varphi\ \Rightarrow\ \text{MRA}((V_j),\varphi)\ \Rightarrow\ \psi$$
Given some $\varphi$, we can start by setting $V_j = \overline{\operatorname{span}}\{\varphi_{j,k},\ k\in\mathbb{Z}\}$. We then ask when this gives rise to an MRA. Here is one ingredient:
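Proposition 27. Let $\varphi\in L^2(\mathbb{R})$ be such that $\{\varphi(\cdot-k),\ k\in\mathbb{Z}\}$ forms an orthonormal system. If $\hat\varphi$ is continuous at 0 and $\hat\varphi(0)\ne0$, then $\overline{\bigcup_{j\in\mathbb{Z}}V_j} = L^2(\mathbb{R})$.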
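Proof. We want to show that if $f\perp V_j$ for all $j\in\mathbb{Z}$ then $f\equiv0$. Let $f$ be such a function, and let $\varepsilon>0$. There exists $g\in L^2$ such that $\operatorname{supp}\hat g$ is compact, say $\operatorname{supp}\hat g\subset[-2^{J-1},2^{J-1}]$, and $\|f-g\|\le\varepsilon$. Since $P_jf = 0$, this implies that $\|P_jg\|_2 = \|P_j(f-g)\|_2\le\varepsilon$. We will be interested in the finer scales $j\ge J$. Consider the function
$$h_j(\xi) = \hat g(\xi)\,\overline{\hat\varphi(2^{-j}\xi)}\quad\text{on }[-2^{j-1},2^{j-1}]$$
and write the Fourier series expansion with respect to $2^{-j/2}e^{2\pi i k\xi2^{-j}}$, $k\in\mathbb{Z}$:
$$\begin{aligned}
\|h_j\|_{L^2(\mathbb{R})}^2 &= \int_{-2^{J-1}}^{2^{J-1}}|\hat g(\xi)|^2\,|\hat\varphi(2^{-j}\xi)|^2\,d\xi\\
\text{(Plancherel)}\quad&= \sum_{k\in\mathbb{Z}}|\hat h_j(k)|^2\\
\text{(support of $\hat g$ is in $[-2^{J-1},2^{J-1}]$)}\quad&= \sum_{k\in\mathbb{Z}}\left|\int_{\mathbb{R}}\hat g(\xi)\,\overline{\hat\varphi(2^{-j}\xi)}\,2^{-j/2}e^{-2\pi i k\xi2^{-j}}\,d\xi\right|^2\\
&= \sum_{k\in\mathbb{Z}}|\langle\hat g,\hat\varphi_{j,k}\rangle|^2 = \sum_{k\in\mathbb{Z}}|\langle g,\varphi_{j,k}\rangle|^2\quad\text{(Parseval)}\\
&= \|P_jg\|_{L^2(\mathbb{R})}^2
\end{aligned}$$
This means that $\|h_j\|_2 = \|P_jg\|_2\le\varepsilon$. We know that as $j\to\infty$, $\|h_j\|_2^2\to\|\hat g\|_2^2\,|\hat\varphi(0)|^2$, since $\hat\varphi(\xi2^{-j})\to\hat\varphi(0)$ uniformly on $\xi\in[-2^{J-1},2^{J-1}]$, using the fact that $\hat\varphi$ is continuous at 0. Thus,
$$\|\hat g\|_2\le\frac{\varepsilon}{|\hat\varphi(0)|}$$
Now we use that $\hat\varphi(0)\ne0$, and transfer to $f$:
$$\|f\|_2\le\|f-g\|_2+\|g\|_2\le\varepsilon+\frac{\varepsilon}{|\hat\varphi(0)|}\le C\varepsilon$$
and since $\varepsilon$ is arbitrary, this implies $\|f\|_2 = 0$ and $f\equiv0$.
□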
Note that if $(V_j,\varphi)$ also satisfies the other properties of an MRA, then earlier we showed that $\|P_jg\|_2\to\|g\|_2$ (this needs the nesting of the $V_j$, Proposition 15). Since above we showed that $\|P_jg\|_2\to\|g\|_2\,|\hat\varphi(0)|$, this implies that $|\hat\varphi(0)| = 1$. In other words,
Proposition 28. Let $(V_j,\varphi)$ be an MRA. If $\hat\varphi$ is continuous at 0 and $\hat\varphi(0)\ne0$, then $|\hat\varphi(0)| = 1$.
Thus, if we want to construct an MRA in this manner, we would find $\varphi$ satisfying $\hat\varphi$ continuous at 0 with $|\hat\varphi(0)| = 1$.
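If we assume something even stronger, that $\varphi\in L^1$ so that $\hat\varphi$ is continuous everywhere, then
Corollary 29. Let $(V_j,\varphi)$ be an MRA. If $\varphi\in L^1\cap L^2$ and $\hat\varphi(0)\ne0$, then
1. $\hat\varphi(k) = 0$ for all $k\in\mathbb{Z}\setminus\{0\}$
2. $\sum_k\varphi(x+k) = 1$ a.e. (after normalizing so that $\hat\varphi(0) = 1$)
Proof. Since $\varphi$ is in $L^1$, $\hat\varphi$ is continuous, and since the translates of $\varphi$ form an orthonormal system (MRA), $\sum_l|\hat\varphi(\xi+l)|^2 = 1$ a.e., and by continuity this in fact holds everywhere. But since $|\hat\varphi(0)| = 1$ (assumption and the previous proposition), it must be the case that $\hat\varphi(k) = 0$ for $k\in\mathbb{Z}\setminus\{0\}$, and this shows (1).
Now consider $\Phi(x) = \sum_k\varphi(x+k)$, which is 1-periodic. Taking the Fourier series,
$$\underbrace{\hat\Phi(l)}_{\text{F.S.}} = \underbrace{\hat\varphi(l)}_{\text{F.T.}} = \delta_l$$
so that $\Phi\equiv1$ a.e., which shows (2). □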
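Another relation on $\varphi$, $\psi$.
Earlier we had derived the following relations:
$$\begin{aligned}
\hat\varphi(\xi) &= m_0(\xi/2)\,\hat\varphi(\xi/2)\\
\hat\psi(\xi) &= e^{i\pi\xi}\,\overline{m_0(\xi/2+1/2)}\,\hat\varphi(\xi/2)\,\gamma(\xi)\\
1 &= |m_0(\xi)|^2+|m_0(\xi+1/2)|^2
\end{aligned}$$
where $m_0$, $\gamma$ are 1-periodic, in $L^2(\mathbb{T})$, and $|\gamma| = 1$ (Proposition 20, Theorem 23).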
These imply that
$$|\hat\varphi(\xi)|^2+|\hat\psi(\xi)|^2 = |\hat\varphi(\xi/2)|^2\quad\text{for a.e. }\xi$$
As an aside, we note that plugging in $\xi = 0$, if we choose $\varphi$ so that $\hat\varphi$ is continuous at 0, then $\hat\psi(0) = 0$, i.e. $\int\psi = 0$ (note this is not a necessary condition for $(V_j,\varphi)$ to be an MRA).
Now rewrite the above:
$$\begin{aligned}
|\hat\varphi(\xi)|^2 &= |\hat\varphi(2\xi)|^2+|\hat\psi(2\xi)|^2\\
&= |\hat\varphi(4\xi)|^2+|\hat\psi(4\xi)|^2+|\hat\psi(2\xi)|^2\\
&= |\hat\varphi(2^N\xi)|^2+\sum_{j=1}^N|\hat\psi(2^j\xi)|^2
\end{aligned}$$
Note that since $\sum_l|\hat\varphi(\xi+l)|^2 = 1$, $|\hat\varphi(\xi)|^2\le1$ a.e. This implies that both $\sum_{j=1}^N|\hat\psi(2^j\xi)|^2$ and $|\hat\varphi(2^N\xi)|^2$ converge a.e. as $N\to\infty$. Since $\hat\varphi(2^N\cdot)\to0$ in $L^2$, it must be that $\hat\varphi(2^N\cdot)\to0$ a.e. Finally, this means that
$$|\hat\varphi(\xi)|^2 = \sum_{j=1}^{\infty}|\hat\psi(2^j\xi)|^2\quad\text{a.e.}$$
Note that this does not hold everywhere, since plugging in $\xi = 0$ gives 0 on the right but 1 on the left, and plugging in other integer $\xi$ gives 0 on the left but not necessarily on the right.
In any case, this allows us to recover $|\hat\varphi|$ from $|\hat\psi|$ and vice versa. (One can look at what this looks like graphically, and compare to the Haar system.)
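For the Haar system the identity $|\hat\varphi(\xi)|^2 = \sum_{j\ge1}|\hat\psi(2^j\xi)|^2$ can be checked numerically. This is a sketch under the closed forms $|\hat\varphi(\xi)|^2 = (\sin\pi\xi/\pi\xi)^2$ and $|\hat\psi(\xi)|^2 = |m_0(\xi/2+1/2)|^2|\hat\varphi(\xi/2)|^2 = \sin^2(\pi\xi/2)\,|\hat\varphi(\xi/2)|^2$ from Example 21 and Theorem 23; the function names and the number of levels are illustrative choices.

```python
import math

def sinc_sq(x):
    """(sin(pi x) / (pi x))^2, i.e. |phi_hat(x)|^2 for the Haar scaling function."""
    if x == 0.0:
        return 1.0
    t = math.pi * x
    return (math.sin(t) / t) ** 2

def haar_psi_hat_sq(xi):
    """|psi_hat(xi)|^2 = sin^2(pi xi / 2) * |phi_hat(xi / 2)|^2 for Haar."""
    return math.sin(math.pi * xi / 2) ** 2 * sinc_sq(xi / 2)

def detail_sum(xi, levels=40):
    """Partial sum over j = 1..levels of |psi_hat(2^j xi)|^2."""
    return sum(haar_psi_hat_sq(2 ** j * xi) for j in range(1, levels + 1))

gaps = [abs(detail_sum(xi) - sinc_sq(xi)) for xi in (0.3, 0.7, 1.3)]
at_zero = (detail_sum(0.0), sinc_sq(0.0))  # the exceptional point xi = 0
```

The entries of `gaps` should be negligible, while `at_zero` shows the failure at $\xi = 0$ mentioned above: the right side is 0 and the left side is 1.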
Summary
We have the following ingredients. We will choose $\varphi\in L^2$ such that
1. $\sum_{l\in\mathbb{Z}}|\hat\varphi(\xi+l)|^2 = 1$, ensuring that $\{\varphi(\cdot-k),\ k\in\mathbb{Z}\}$ is an orthonormal system.
2. $\hat\varphi$ is continuous at 0, and $|\hat\varphi(0)| = 1$ (this satisfies the spanning criterion $\overline{\bigcup_jV_j} = L^2$).
3. $\hat\varphi(\xi) = m_0(\xi/2)\,\hat\varphi(\xi/2)$ for some 1-periodic function $m_0\in L^2(\mathbb{T})$ (a necessary condition for $\varphi\in V_0$ in an MRA). Note that the same iteration procedure (as above for $|\hat\varphi|$, $|\hat\psi|$) allows us to recover $\hat\varphi$ from $m_0$.
Then it turns out that all the properties of an MRA will be satisfied by $(V_j,\varphi)$.
We need to check the conditions of Definition 14. As mentioned above, the second condition follows from (2) and the fifth condition follows from (1).
By construction, we have that $\{\varphi_{j,k},\ k\in\mathbb{Z}\}$ is an orthonormal basis for $V_j$. Then from Proposition 24, we have
$$f\in V_0\iff\hat f(\xi) = (\text{1-periodic fn})\,\hat\varphi(\xi)\iff\tfrac12\hat f(\xi/2) = (\text{2-periodic fn})\,\hat\varphi(\xi/2)\iff f(2\,\cdot)\in V_1$$
which is the fourth condition. The final condition is the nesting condition $V_j\subset V_{j+1}$. Now we use (3), which gives the refinement equation
$$\varphi(x) = \sum_kc_k\,\varphi(2x-k)$$
with $c_k = 2\hat m_0(-k)$, where $\hat m_0$ denotes the Fourier coefficients of $m_0$ (plug in and take Fourier transforms). This gives the nesting, since for $f\in V_0$,
$$f = \sum_ja_j(f)\,\varphi_{0,j} = \sum_ja_j(f)\sum_kc_k\,\varphi(2x-2j-k) = \frac{1}{\sqrt2}\sum_{j,k}a_j(f)\,c_k\,\varphi_{1,k+2j}\in V_1$$
Note that we showed earlier that the third condition is implied by the others.
Meyer's construction of wavelets in $\mathcal{S}$
We will construct a wavelet $\psi$ with $\hat\psi$ compactly supported and $\psi\in C^{\infty}$. In particular $\psi$ will not be compactly supported, which makes it less useful in practice. In fact, there is no wavelet that is both compactly supported and $C^{\infty}$ (one can only obtain smoothness up to a fixed order $C^k$). The idea is as follows: Let $\hat\varphi\in C^{\infty}$ satisfy:
a) $\operatorname{supp}(\hat\varphi)\subset[-2/3,2/3]$
b) $\hat\varphi(\xi) = 1$ on $[-1/3,1/3]$
c) $\hat\varphi$ real valued and even (so that $\varphi$ is also real valued and even)
d) $|\hat\varphi(\xi)|^2+|\hat\varphi(\xi-1)|^2 = 1$ for all $\xi\in[0,1]$.
We will refer to the ingredients (1, 2, 3) in the previous discussion.
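(a, d) imply (1), since the support condition on $\hat\varphi$ makes it so that the infinite sum $\sum_{l\in\mathbb{Z}}|\hat\varphi(\xi+l)|^2$ consists of at most two nonzero terms for every $\xi$, which will be consecutive, and by (d) the sum will be 1.
(b) will imply (2) by construction. (a-d) will imply (3) with
$$m_0(\xi) = \begin{cases}\hat\varphi(2\xi) & |\xi|\le1/3\\ 0 & \frac13\le|\xi|\le\frac12\end{cases}\qquad\text{(and periodized of course)}$$
Note
$$\hat\varphi(2\xi) = \begin{cases}m_0(\xi) & |\xi|<1/3\\ 0 & |\xi|\ge1/3\end{cases}$$
and
$$m_0(\xi)\,\hat\varphi(\xi) = \begin{cases}m_0(\xi) & |\xi|\le1/3\quad\text{(from (b))}\\ 0 & 1/3\le|\xi|\le2/3\quad\text{($m_0(\xi) = 0$ there, by periodicity)}\\ 0 & |\xi|\ge2/3\quad\text{($\hat\varphi(\xi) = 0$ on $[2/3,\infty)$)}\end{cases}$$
so that $m_0(\xi)\,\hat\varphi(\xi) = \hat\varphi(2\xi)$ for all $\xi$.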
How do we satisfy (d)? Since we are looking for even $\hat\varphi$, we want $|\hat\varphi(\xi)|^2+|\hat\varphi(1-\xi)|^2 = 1$. We want to use $|\cos(\xi)|^2+|\cos(\frac\pi2-\xi)|^2 = |\cos(\xi)|^2+|\sin(\xi)|^2 = 1$, but $\cos(\xi)$ is not compactly supported. However, setting $\hat\varphi(\xi) = \cos(\eta(|\xi|))$, the problem reduces to finding $\eta\in C^{\infty}$ for which $\eta(\xi)+\eta(1-\xi) = \frac\pi2$ for all $\xi$, with $\eta(\xi) = \frac\pi2$ for $\xi\ge2/3$ and $\eta(\xi) = 0$ for $\xi\le1/3$.
The construction is as follows. First we find a smooth function which is 1 for $x<0$ and 0 for $x>1$ and strictly between 0 and 1 otherwise. We simply convolve a smooth bump with support $[-\varepsilon,\varepsilon]$, $\varepsilon<1/2$, with the function $H(x)$, where $H$ is 1 for $x<1/2$ and 0 for $x>1/2$. Call this function $f$.
Now we take $\eta(x) = \frac{f(x)}{f(x)+f(1-x)}$, which is smooth since $f$ is smooth and $f(x)+f(1-x)>0$. Then by construction we have $\eta(x)+\eta(1-x) = \frac{f(x)+f(1-x)}{f(x)+f(1-x)} = 1$, and $\eta(x) = 1$ for $x<0$ (since $f(x) = 1$ and $f(1-x) = 0$ when $x<0$) and $\eta(x) = 0$ for $x>1$.
Then just reflect, shift and scale $\eta$ to fit the requirements above.
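∗ Alternatively, take a symmetric smooth bump $\alpha(t)$ with support $[1/3,2/3]$, centered at $1/2$, with integral $\frac\pi2$, and consider
$$\eta(\xi) = \int_{-\infty}^{\xi}\alpha(t)\,dt$$
Note that $\eta(\xi)+\eta(1-\xi) = \int_{-\infty}^{\xi}\alpha+\int_{-\infty}^{1-\xi}\alpha = \int_{-\infty}^{\xi}\alpha+\int_{\xi}^{\infty}\alpha = \frac\pi2$ (using symmetry of $\alpha$).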
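The shape of the construction can be tried out numerically. The sketch below is an illustration only: instead of a $C^\infty$ function $\eta$ it uses the polynomial step $\nu(x) = x^4(35-84x+70x^2-20x^3)$, which satisfies $\nu(x)+\nu(1-x) = 1$ but is only finitely smooth, so the resulting $\hat\varphi$ satisfies (a)-(d) without being $C^\infty$. All names are assumptions of this example.

```python
import math

def nu(x):
    """Polynomial step: 0 for x <= 0, 1 for x >= 1, with nu(x) + nu(1 - x) = 1."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    return x ** 4 * (35 - 84 * x + 70 * x ** 2 - 20 * x ** 3)

def eta(xi):
    """eta = 0 for xi <= 1/3, pi/2 for xi >= 2/3, and eta(xi) + eta(1 - xi) = pi/2."""
    return (math.pi / 2) * nu(3 * xi - 1)

def phi_hat(xi):
    """Meyer-type scaling function in frequency: cos(eta(|xi|)), real and even."""
    return math.cos(eta(abs(xi)))

def periodized(xi, L=3):
    """sum over |l| <= L of phi_hat(xi + l)^2; only two terms are non-negligible."""
    return sum(phi_hat(xi + l) ** 2 for l in range(-L, L + 1))

grid = [k / 50 for k in range(51)]
defect = max(abs(periodized(xi) - 1.0) for xi in grid)
```

Here `defect` measures how far ingredient (1) is from holding on a grid in $[0,1]$; it should be at rounding level. Replacing `nu` by a genuinely $C^\infty$ step, as in the text, changes only the smoothness of $\hat\varphi$.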
In summary, $\hat\varphi$ is $C^{\infty}$ and compactly supported, and thus in $\mathcal{S}$, so that $\varphi$ is in $\mathcal{S}$. If $\psi$ is chosen according to our convention, then $\psi\in\mathcal{S}$ as well. Moreover, $\hat\psi$ vanishes in a neighborhood of 0, which we can see from
$$|\hat\psi(\xi)|^2 = |\hat\varphi(\xi/2)|^2-|\hat\varphi(\xi)|^2$$
noting that $\hat\varphi(\xi)$ is 1 in a neighborhood of zero, and thus $\hat\psi(\xi)$ is 0 in a neighborhood of 0. This means that $\hat\psi^{(k)}(0) = 0$ for all $k = 0,1,2,\dots$, so all moments vanish: $\int x^k\psi(x)\,dx = 0$ for all $k$.
As mentioned previously, the Meyer wavelet is not usable in practice since it is not compactly supported.
Week 8 (10/28/2010)
Vanishing Moments
First, we revisit specifics about vanishing moments. If $f$ is smooth and $\psi$ has vanishing moments, then $\langle f,\psi_{j,k}\rangle$ decays fast as $j\to\infty$, depending on the smoothness of $f$ and the number of vanishing moments. Here is a basic case:
Theorem 30. Let $f\in\mathrm{Lip}_\alpha\cap L^2$, i.e. $|f(x)-f(y)|\le|f|_{\mathrm{Lip}_\alpha}|x-y|^{\alpha}$, where $0<\alpha\le1$, and let $\psi$ be such that $\int\psi(x)\,dx = 0$. If $c_\psi := \int|x|^{\alpha}|\psi(x)|\,dx<\infty$, then $|\langle f,\psi_{j,k}\rangle|\le C(f,\psi)\,2^{-j(\alpha+1/2)}$.
Remark: The result should not be confused with exponential decay in frequency, since $\psi_{j,k}$ in the frequency domain is concentrated around $|\xi|\sim2^{j}$, so really the decay in frequency is like $|\xi|^{-\alpha}$ (up to the normalization factor $2^{-j/2}$). Note that the condition $c_\psi<\infty$ means that $\psi$ should have some order of decay as $|x|\to\infty$.
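Proof. Note
$$\langle f,\psi_{j,k}\rangle = 2^{j/2}\int f(x)\,\overline{\psi(2^jx-k)}\,dx = 2^{j/2}\int\big[f(x)-f(k2^{-j})\big]\,\overline{\psi(2^jx-k)}\,dx$$
noting that $\psi$ integrates to 0 and $\psi_{j,k}$ centers around $k2^{-j}$, so we subtract $f(k2^{-j})$ from $f(x)$. Then taking absolute values inside, applying the Lipschitz condition, and substituting $u = 2^jx-k$:
$$\begin{aligned}
|\langle f,\psi_{j,k}\rangle| &\le 2^{j/2}\,|f|_{\mathrm{Lip}_\alpha}\int|x-k2^{-j}|^{\alpha}\,|\psi(2^jx-k)|\,dx\\
&= 2^{-j(1/2+\alpha)}\,|f|_{\mathrm{Lip}_\alpha}\int|2^jx-k|^{\alpha}\,|\psi(2^jx-k)|\,2^j\,dx\\
&= 2^{-j(1/2+\alpha)}\,|f|_{\mathrm{Lip}_\alpha}\,c_\psi
\end{aligned}$$
□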
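Theorem 30 can be tested on a simple example. This sketch assumes the Haar wavelet in its usual form ($\psi = 1$ on $[0,1/2)$, $-1$ on $[1/2,1)$) and the Lipschitz function $f(x) = |x-0.3|$, so that $\alpha = 1$, $|f|_{\mathrm{Lip}_1} = 1$ and $c_\psi = \int_0^1x\,dx = 1/2$; the coefficients are computed exactly from an antiderivative of $f$. The kink location and helper names are arbitrary choices made for illustration.

```python
import math

A = 0.3  # location of the kink of f(x) = |x - A|; f is Lipschitz with constant 1

def F(x):
    """Antiderivative of f(x) = |x - A|."""
    return (x - A) * abs(x - A) / 2

def haar_coeff(j, k):
    """<f, psi_{j,k}> for the Haar wavelet psi = 1 on [0,1/2), -1 on [1/2,1)."""
    left, mid, right = k / 2 ** j, (k + 0.5) / 2 ** j, (k + 1) / 2 ** j
    return 2 ** (j / 2) * ((F(mid) - F(left)) - (F(right) - F(mid)))

def worst_scaled_coeff(j):
    """max over k of 2^{3j/2} |<f, psi_{j,k}>| for the dyadic intervals inside [0, 1]."""
    return max(abs(haar_coeff(j, k)) for k in range(2 ** j)) * 2 ** (1.5 * j)

scaled = [worst_scaled_coeff(j) for j in range(11)]
```

The theorem predicts $2^{3j/2}|\langle f,\psi_{j,k}\rangle|\le|f|_{\mathrm{Lip}_1}c_\psi = 1/2$ at every scale. On intervals where $f$ is linear with slope $\pm1$ the scaled coefficient equals $1/4$, so the entries of `scaled` should stay near $1/4$ and never exceed $1/2$.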
Thus, having the 0-th vanishing moment for $\psi$ gives results for $\mathrm{Lip}_\alpha$. For higher vanishing moments, we can subtract a polynomial instead of the constant, and for this we need more smoothness of $f$ to get (local) polynomial approximation. $l$ vanishing moments can handle $f\in\mathrm{Lip}_{l+\alpha}$. The proof is essentially the same.
Note that for $f$ in a range of $\mathrm{Lip}_\alpha$ spaces, and say $\psi$ has all vanishing moments, to obtain the optimal decay in $j$ we need to look at how $|f|_{\mathrm{Lip}_\alpha}$ and $c_\psi$ depend on $\alpha$ and optimize over $\alpha$ (there is a tradeoff between the decay rate of $2^{-j\alpha}$ and the growth of $|f|_{\mathrm{Lip}_\alpha}$, $c_\psi(\alpha)$).
Spline Wavelets
At this point, Meyer wavelets can be used to characterize spaces, but implementation-wise we will need other wavelets. In particular we want wavelets whose corresponding high-pass and low-pass filters have finite support. As an intermediate step to the Daubechies compactly supported wavelets, we will look at splines, piecewise polynomials on uniformly spaced knot sequences. (Knots are the points at which the polynomial piece changes. Uniform spacing will give us shift invariance.)
First, some basic background on splines. Let $d\ge0$ be an integer (degree of the spline), and fix $a>0$ (spacing). Define
$$S_d(a\mathbb{Z}) := \left\{f:\mathbb{R}\to\mathbb{C},\ f|_{[ak,a(k+1)]}\text{ is a polynomial of degree}\le d\text{ for every }k,\ f\in C^{d-1}\right\}$$
e.g. $S_0(a\mathbb{Z})$ are piecewise constants and $S_1(a\mathbb{Z})$ are piecewise linear functions which are still continuous.
A basis for $S_d(\mathbb{Z})$ is given by B-splines, $\{B_d(\cdot-k),\ k\in\mathbb{Z}\}$, where $B_0 = 1_{[0,1]}$ and $B_{n+1} = B_n\ast1_{[0,1]}$.
(Note that as $n\to\infty$, $\sqrt n\,B_n(\sqrt n\,\cdot)$, suitably centered, tends to a Gaussian, by the central limit theorem.)
We can check the following properties (inductively):
• $B_d\in S_d(\mathbb{Z})$. Assuming $B_n$ is piecewise polynomial on each $[k,k+1]$, we note that
$$B_{n+1}(x) = \int_{x-1}^{x}B_n(y)\,dy$$
and for $x\in[k,k+1]$, we have
$$B_{n+1}(x) = \int_{x-1}^{k}B_n(y)\,dy+\int_{k}^{x}B_n(y)\,dy$$
which is polynomial since the antiderivative of a polynomial is again a polynomial.
Also, we show that $B_{n+1}\in C^n$ assuming $B_n\in C^{n-1}$. It suffices to check near integer points $k\in\mathbb{Z}$, since polynomials are smooth. Using properties of convolution, we have that $B_{n+1}^{(n)} = \frac{d}{dx}\big[B_n^{(n-1)}\ast1_{[0,1]}\big]$, then
$$B_{n+1}^{(n)}(x) = \frac{d}{dx}\left[\int_{x-1}^{x}B_n^{(n-1)}(y)\,dy\right] = B_n^{(n-1)}(x)-B_n^{(n-1)}(x-1)$$
which shows also that $B_{n+1}^{(n)}$ is continuous (since $B_n^{(n-1)}$ is continuous by our induction hypothesis).
• $B_d$ is symmetric about $\frac{d+1}{2}$.
We wish to show that $B_{n+1}(x) = B_{n+1}(n+2-x)$ given that $B_n(x) = B_n(n+1-x)$:
$$\begin{aligned}
B_{n+1}(x) &= \int_{x-1}^{x}B_n(y)\,dy\\
&= \int_{x-1}^{x}B_n(n+1-y)\,dy\\
&= \int_{n+1-x}^{n+2-x}B_n(y)\,dy\\
&= B_{n+1}(n+2-x)
\end{aligned}$$
• $\operatorname{supp}(B_d) = [0,d+1]$.
This follows from convolution properties, $\operatorname{supp}(f\ast g) = \operatorname{supp}(f)+\operatorname{supp}(g)$ (Minkowski sum).
• $\sum_kB_d(x-k)\equiv1$ for all $x$, i.e. the $B_d(\cdot-k)$ form a partition of unity.
Let $G(x) = \sum_kB_d(x-k)$, then $\hat G(l) = \hat B_d(l) = \big[\hat B_0(l)\big]^{d+1}$ (on the left $\hat G$ is a Fourier series coefficient and on the right $\hat B_d$ is a Fourier transform). Noting that $\hat B_0(\xi) = e^{-i\pi\xi}\frac{\sin(\pi\xi)}{\pi\xi}$, this implies that $\hat G(l) = \delta_l$, so that $G\equiv1$.
• $\{B_d(\cdot-k),\ k\in\mathbb{Z}\}$ form a basis for $S_d(\mathbb{Z})$.
The support property implies that the functions are linearly independent. Suppose we have a finite linear combination $\sum_ka_kB_d(x-k) = 0$. Note that the functions at the end (or beginning) have support where the other functions are not supported, and thus the corresponding coefficient must be 0. By repeating this process, we conclude that all the coefficients are 0, by peeling off the functions that are furthest away from the origin.
To show that these functions span the spline space is more difficult. (There's a reference in the Wikipedia article for "B-spline".)
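The properties above are easy to test numerically. Rather than evaluating the convolution $B_{n+1} = B_n\ast1_{[0,1]}$ directly, the sketch below uses the standard recurrence for cardinal B-splines, $B_d(x) = \frac{x}{d}B_{d-1}(x)+\frac{d+1-x}{d}B_{d-1}(x-1)$, which is a swapped-in technique not derived in these notes. The half-open convention $B_0 = 1_{[0,1)}$ is also an assumption, chosen so that the partition of unity holds at integer points as well.

```python
import math

def bspline(d, x):
    """Cardinal B-spline B_d(x) with B_0 = indicator of [0, 1), via a two-term recurrence."""
    if d == 0:
        return 1.0 if 0.0 <= x < 1.0 else 0.0
    return (x * bspline(d - 1, x) + (d + 1 - x) * bspline(d - 1, x - 1)) / d

def translate_sum(d, x):
    """Sum over k of B_d(x - k); only k with 0 <= x - k <= d + 1 can contribute."""
    k_hi = math.floor(x)
    return sum(bspline(d, x - k) for k in range(k_hi - d - 1, k_hi + 2))

partition = [translate_sum(3, x) for x in (0.0, 0.37, 1.5, 2.99)]
symmetry_gap = abs(bspline(3, 1.2) - bspline(3, 4 - 1.2))   # symmetry about (d+1)/2 = 2
outside = (bspline(3, -0.1), bspline(3, 4.5))               # outside supp B_3 = [0, 4]
```

Each entry of `partition` should equal 1 up to rounding, `symmetry_gap` should be negligible, and both entries of `outside` should be 0.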
Spline MRA ($d$ is fixed)
Define $V_j := S_d(2^{-j}\mathbb{Z})\cap L^2(\mathbb{R})$.
Note that we immediately have V
j
⊂ V
j+1
(knot sequence of V
j
is a subsequence of the knot sequence of