FAST WAVELET TECHNIQUES FOR NEAR-OPTIMAL IMAGE PROCESSING

RONALD A. DEVORE
Department of Mathematics, University of South Carolina, Columbia, South Carolina 29208
devore@math.scarolina.edu

BRADLEY J. LUCIER
Department of Mathematics, Purdue University, West Lafayette, Indiana 47907
lucier@math.purdue.edu
1. Introduction

In recent years many authors have introduced certain nonlinear algorithms for data compression, noise removal, and image reconstruction. These methods include wavelet and quadrature-mirror filters combined with quantization of filter coefficients for image compression, wavelet shrinkage estimation for Gaussian noise removal, and certain minimization techniques for image reconstruction. In many cases these algorithms have outperformed linear algorithms (such as convolution filters, or projections onto fixed linear subspaces) according to both objective and subjective measures. Very recently, several authors, e.g., [5], [7], [14], have developed mathematical theories that explain the improved performance in terms of how one measures the smoothness of images or signals. In this paper we present a unified mathematical approach that allows one to formulate both linear and nonlinear algorithms in terms of minimization problems related to the so-called K-functionals of harmonic analysis. We then summarize the previously developed mathematics that analyzes the image compression and Gaussian noise removal algorithms. We do not know of a specific formulation of the image reconstruction problem that supports an analysis or even a definition of an optimal solution. Although our framework and analysis can be applied to any $d$-dimensional signals ($d = 2$ for images, $d = 1$ for audio signals, etc.), we restrict our discussion in this paper to images.
The outline of our paper is as follows. We want to find an approximation $\tilde f$ to a given image $f$ on a square domain $I$, either to compress the image, remove noise from the image, etc. The size of the difference between $f$ and $\tilde f$ is measured by a norm, which we shall take in this paper to be the $L_2(I)$ (mean-square) norm. (We emphasize that we do not believe that the $L_2(I)$ norm matches the spatial frequency-contrast response of the human visual system; we use the $L_2(I)$ norm here only because the presentation is simpler; see, e.g., [5], where we develop a theory of image compression in $L_p(I)$, with $0 < p < \infty$.) We wish to balance the smoothness of $\tilde f$ with the goodness of fit $\|f - \tilde f\|_{L_2(I)}$; to this end we consider the problem of minimizing

$$(1)\qquad \|f - g\|_{L_2(I)} + \lambda \|g\|_Y,$$

[MILCOM '92, IEEE Military Communications Conference Record, San Diego, California, October 11-14, 1992, IEEE, Piscataway, NJ, 1992, pages 1129-1135.]
where $Y$ is a space that measures the smoothness of the approximations $g$. We take $\tilde f$ to be a function that minimizes (1). If the positive parameter $\lambda$ is large, then the smoothness of $g$ is important; if it is small, the approximation error between $f$ and $g$ is important. We consider two families of spaces in which to measure the smoothness of $\tilde f$: the Sobolev spaces $W^\alpha(L_2(I))$, whose functions have $\alpha$ "derivatives" in $L_2(I)$, and the Besov spaces $B^\alpha_\sigma(L_\sigma(I))$ with $\sigma = 2/(1+\alpha)$, which contain functions with $\alpha$ "derivatives" in $L_\sigma(I)$. The first family of spaces will likely be more familiar to the reader than the second, but the second family is also quite natural to consider, as it is precisely the scale of spaces with minimal smoothness to be embedded in $L_2(I)$. By using wavelet decompositions of the functions $f$ and $\tilde f$, we show that one can find near-optimal solutions to the variational problem (1) (or one close to it). This is possible because the norms $\|f\|_{L_2(I)}$, $\|f\|_{W^\alpha(L_2(I))}$, and $\|f\|_{B^\alpha_\sigma(L_\sigma(I))}$ can be determined simultaneously by examining the wavelet coefficients of $f$. The approximate minimizers $\tilde f$ of (1) that arise from these algorithms have surprisingly good approximation properties to $f$. In fact, when considering the problem of image compression, by setting $Y = W^\alpha(L_2(I))$ the procedure above defines a mapping $f \to \tilde f$ that is a near-optimal approximation from all linear approximation procedures to functions in $W^\alpha(L_2(I))$; and by setting $Y = B^\alpha_\sigma(L_\sigma(I))$, one finds a mapping $f \to \tilde f$ that is near-optimal among all nonlinear approximations to functions in $B^\alpha_\sigma(L_\sigma(I))$ that satisfy a certain continuous selection property. Thus, one derives both linear and nonlinear near-optimal algorithms by considering the same principle, only with different families of smoothness spaces. The analysis for noise removal methods is not as complete as for compression methods, but we report here several results of Donoho and Johnstone [13], [14], and, to a lesser extent, the present authors [8], about nonlinear noise reduction techniques. We finish with sample images that illustrate the theory in the text.
2. Images, smoothness spaces, and minimization problems

We begin with a function $f$ that represents a grey-scale image. To be specific, we assume that as $x := (x_1, x_2)$, $0 \le x_1 \le 1$, $0 \le x_2 \le 1$, ranges over the unit square $I := [0,1]^2 := [0,1] \times [0,1]$, $f(x)$ denotes the intensity of the image at location $x$. (We use $:=$ to mean "defined by.") We want to "approximate" $f$ by a "smoother" function $g$. To be precise, we must define what we mean by "approximate" and "smoother," and how we wish to trade off the size of the approximation error and the smoothness of $g$. To this end, we introduce the "error" function space $X$ with norm $\|\cdot\|_X$ and the "smoothness" function space $Y$ with corresponding norm $\|\cdot\|_Y$. (The dot in the previous expressions indicates where the missing arguments are to be placed. Thus, we could talk of a function $f$ or $f(\cdot)$.) The smoothness of the function $g$ will be measured by $\|g\|_Y$ and the size of the error between $f$ and $g$ will be given by $\|f - g\|_X$.

We shall require that any smooth function have a finite $X$-norm, so that there exists a constant $C$ such that for all $g$ in $Y$, $\|g\|_X \le C\|g\|_Y$. We shall assume also that the space $Y$ is dense in $X$, so that given any function $f$ and any positive $\epsilon$, there is a $g \in Y$ with $\|f - g\|_X < \epsilon$.

Finally, we set up our minimization problem as: fix a positive parameter $\lambda$ and find a function $\tilde f$ that minimizes the right side of

$$(2)\qquad K(f, \lambda; X, Y) := \inf_{g \in Y} \{\|f - g\|_X + \lambda \|g\|_Y\}.$$

The function $K(f, \lambda; X, Y)$ is called the K-functional between $X$ and $Y$. The parameter $\lambda$ determines the relative importance of the smoothness of $g$ and the approximation error of $f$.

We note several properties of $K(f, \lambda; X, Y)$. First, $K(f, \lambda; X, Y) \le \|f\|_X$, since we can take $g = 0$. Second, $K(f, \lambda; X, Y) \to 0$ as $\lambda \to 0$, since $Y$ is dense in $X$. In fact, one can measure the smoothness of any $f$ in $X$ in terms of how quickly $K(f, \lambda; X, Y) \to 0$ as $\lambda \to 0$. We say that $f$ is in the interpolation space $(X, Y)_{\theta,q}$, $0 < \theta < 1$, $0 < q \le \infty$, if $K(f, \lambda; X, Y)$ decays fast enough as $\lambda \to 0$ that

$$\|f\|_{\theta,q} := \left(\int_0^\infty [\lambda^{-\theta} K(f, \lambda; X, Y)]^q \, \frac{d\lambda}{\lambda}\right)^{1/q} < \infty.$$

The spaces $(X, Y)_{\theta,q}$ are in some sense intermediate in smoothness between $X$ and $Y$.
We must choose spaces $X$ and $Y$ useful for image processing. To simplify our presentation, we shall assume that $X$ is the space $L_2(I)$, the space of square-integrable functions. (Other spaces are of both theoretical and practical importance; see, e.g., image compression algorithms in $L_p(I)$, $p \ne 2$, as described in [5].) A function $f$ is in $L_2(I)$ if

$$\|f\|_{L_2(I)}^2 := \int_I |f(x)|^2 \, dx < \infty.$$
Two families of spaces are useful for measuring the smoothness of $g$. The more classical spaces are the Sobolev spaces $W^\alpha(L_2(I))$, which consist of all functions with $\alpha$ derivatives in $L_2(I)$. When $\alpha$ is an integer, we can define

$$\|g\|_{W^\alpha(L_2(I))}^2 := \sum_{|m| \le \alpha} \|D^m g\|_{L_2(I)}^2;$$

here $m := (m_1, m_2)$, $m_i$ is a nonnegative integer, $|m| := m_1 + m_2$, $D^m g := D_1^{m_1} D_2^{m_2} g$, and $D_i g := \partial g / \partial x_i$. When $\alpha$ is not an integer, the Sobolev spaces $W^\alpha(L_2(I))$ can be defined in terms of the Besov spaces $B^\alpha_q(L_p(I))$. Functions in these spaces have, roughly, $\alpha$ "derivatives" in $L_p(I)$; the parameter $q$, which measures more subtle gradations in smoothness, is necessary for certain theorems. A precise definition of the Besov spaces of interest to us can be found in [5]. The spaces $W^\alpha(L_2(I))$ can be defined by $W^\alpha(L_2(I)) = B^\alpha_2(L_2(I))$. In computations, it is more convenient to consider the functional

$$(3)\qquad \inf_{g \in W^\alpha(L_2(I))} \{\|f - g\|_{L_2(I)}^2 + \lambda \|g\|_{W^\alpha(L_2(I))}^2\},$$

which, by a certain abuse of notation, we shall also call $K(f, \lambda; L_2(I), W^\alpha(L_2(I)))$.
We consider a second class of smoothness spaces, the Besov spaces $B^\alpha_\sigma(L_\sigma(I))$ with $1/\sigma = \alpha/2 + 1/2$. These spaces have $\alpha$ "derivatives" in $L_\sigma(I)$, where $\sigma = 2/(\alpha + 1)$. Since $\sigma < 2$, and, indeed, if $\alpha > 1$ then $\sigma < 1$, we see that functions need less smoothness to be in $B^\alpha_\sigma(L_\sigma(I))$ than to be in $W^\alpha(L_2(I)) = B^\alpha_2(L_2(I))$. Stated another way, there are more functions (images) with $\alpha$ derivatives in $L_\sigma(I)$ than there are functions with $\alpha$ derivatives in $L_2(I)$.

For the pair $(X, Y) = (L_2(I), B^\alpha_\sigma(L_\sigma(I)))$ we shall consider the functional

$$(4)\qquad \inf_{g \in B^\alpha_\sigma(L_\sigma(I))} \{\|f - g\|_{L_2(I)}^2 + \lambda \|g\|_{B^\alpha_\sigma(L_\sigma(I))}^\sigma\},$$

which we, yet again, denote by $K(f, \lambda; L_2(I), B^\alpha_\sigma(L_\sigma(I)))$. One can show that all these choices of $K(f, \lambda; X, Y)$ yield the same family of interpolation spaces $(X, Y)_{\theta,q}$ with the natural transformation of parameters; see Bergh and Löfström [1], page 68.

The spaces $B^\alpha_\sigma(L_\sigma(I))$ are quite natural when considering approximation in $L_2(I)$, because this family of spaces has the minimal smoothness to be embedded in $L_2(I)$. This means that given a pair $\alpha$ and $\sigma = 2/(1+\alpha)$, there is no $\alpha' < \alpha$ or $\sigma' < \sigma$ such that $B^{\alpha'}_\sigma(L_\sigma(I))$ or $B^\alpha_{\sigma'}(L_{\sigma'}(I))$ is contained in $L_2(I)$.
One can ask two questions: which of these spaces can contain common images, and what are the interpolation spaces between $L_2(I)$ and $W^\alpha(L_2(I))$ or $B^\alpha_\sigma(L_\sigma(I))$? First, one should note that the intensity of images often is discontinuous across lines or one-dimensional curves, and one finds that, necessarily, images with such a property (hereinafter called "images with edges") cannot be in $W^\alpha(L_2(I))$ if $\alpha \ge 1/2$, and cannot be in $B^\alpha_\sigma(L_\sigma(I))$ if $\alpha \ge 1$; see [5] for the calculation. So images can be in smoother classes $B^\alpha_\sigma(L_\sigma(I))$ than $W^\alpha(L_2(I))$. Second, the interpolation spaces between $L_2(I)$ and our smoothness spaces are again spaces in the same class; specifically, for $\beta = \theta\alpha$, $0 < \theta < 1$,

$$(L_2(I), W^\alpha(L_2(I)))_{\theta,2} = W^\beta(L_2(I))$$

(see Bergh and Löfström [1]) and

$$(L_2(I), B^\alpha_\sigma(L_\sigma(I)))_{\theta,\mu} = B^\beta_\mu(L_\mu(I)), \qquad \mu = 2/(1 + \beta)$$

(see DeVore and Popov [11]).
3. Wavelet transforms

Even though our methods and analysis apply to any wavelet transform, we shall consider here only the well-known Haar transform of images; this avoids certain technicalities near the boundary of the square $I$. Some readers may object that the Haar transform lacks smoothness. In reply, we point out that if one works with images with edges, then the Haar transform has enough smoothness to prove optimal convergence results in $L_2(I)$. For example, no smoother wavelet transforms can achieve higher global asymptotic rates of convergence in image compression applications; using smoother wavelets may decrease the error by a constant factor, because of their better performance in smooth regions of an image, but they will not achieve a better rate of convergence.
To briefly recall the Haar transform, we introduce the functions

$$\psi^{(1)}(x) := \begin{cases} 1, & x_1 \le \tfrac12, \\ -1, & x_1 > \tfrac12, \end{cases} \qquad \psi^{(2)}(x) := \begin{cases} 1, & x_2 \le \tfrac12, \\ -1, & x_2 > \tfrac12, \end{cases}$$

$\psi^{(3)}(x) = \psi^{(1)}(x)\,\psi^{(2)}(x)$, and $\psi^{(4)}(x) \equiv 1$ for $x \in I$. If we define

$$\Psi_k := \begin{cases} \{\psi^{(1)}, \psi^{(2)}, \psi^{(3)}, \psi^{(4)}\}, & k = 0, \\ \{\psi^{(1)}, \psi^{(2)}, \psi^{(3)}\}, & k > 0, \end{cases}$$

and $Z_{2^k} := \{j := (j_1, j_2) \mid 0 \le j_1 < 2^k,\ 0 \le j_2 < 2^k\}$, then the set

$$\{\psi_{j,k}(x) := 2^k \psi(2^k x - j),\ x \in I \mid \psi \in \Psi_k,\ j \in Z_{2^k},\ k \ge 0\}$$

forms an orthonormal basis for $L_2(I)$. This means that any function $f \in L_2(I)$ can be written as

$$f = \sum_{0 \le k} \sum_{j \in Z_{2^k}} \sum_{\psi \in \Psi_k} c_{j,k,\psi}\, \psi_{j,k}, \qquad c_{j,k,\psi} = \int_I f(x)\, \psi_{j,k}(x)\, dx.$$
More importantly, we have the following equivalences (see, e.g., [15]):

$$(5)\qquad \begin{aligned} \|f\|_{L_2(I)}^2 &= \sum_{0 \le k} \sum_{j \in Z_{2^k}} \sum_{\psi \in \Psi_k} |c_{j,k,\psi}|^2, \\ \|f\|_{W^\alpha(L_2(I))}^2 &\approx \sum_{0 \le k} \sum_{j \in Z_{2^k}} \sum_{\psi \in \Psi_k} 2^{2\alpha k} |c_{j,k,\psi}|^2, \qquad \alpha < 1/2, \\ \|f\|_{B^\alpha_\sigma(L_\sigma(I))}^\sigma &\approx \sum_{0 \le k} \sum_{j \in Z_{2^k}} \sum_{\psi \in \Psi_k} |c_{j,k,\psi}|^\sigma \end{aligned}$$

for $\alpha < 1$ and $\sigma = 2/(1+\alpha)$. (Two notes: We have $1/\sigma = \alpha/d + 1/2$ for $d$-dimensional signals, and $A(f) \approx B(f)$ means that $A(f)/B(f)$ is bounded above and below by positive constants that are independent of $f$.) Thus, the norms $\|f\|_{L_2(I)}$, $\|f\|_{W^\alpha(L_2(I))}$, and $\|f\|_{B^\alpha_\sigma(L_\sigma(I))}$ are simultaneously equivalent to the corresponding sequence norms on the right-hand side of (5). We haven't yet bothered to define precisely the norms $\|f\|_{W^\alpha(L_2(I))}$ and $\|f\|_{B^\alpha_\sigma(L_\sigma(I))}$; we now use the sequence norms on the right side of (5) as the definition of these quantities.
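The discrete analogue of the Haar decomposition and of the first (Parseval) line of (5) can be sketched numerically. The helper below is ours, not the authors' code; it works on a raw $2^m \times 2^m$ pixel array (so norms are preserved up to the fixed $2^{-m}$ pixel scaling), and each pass maps a $2\times 2$ block onto four orthonormal combinations corresponding to the averages and the three detail shapes $\psi^{(1)}, \psi^{(2)}, \psi^{(3)}$:

```python
import numpy as np

def haar2d(img):
    """Orthonormal 2D Haar analysis of a 2^m x 2^m array.

    Returns (overall average-type coefficient, list of (d1, d2, d3)
    detail arrays from finest to coarsest).  Each 2x2 block (A, B; C, D)
    is mapped to four orthonormal combinations, so the sum of squares of
    all coefficients equals the sum of squares of the pixels.
    """
    a = np.asarray(img, dtype=float)
    details = []
    while a.shape[0] > 1:
        A, B = a[0::2, 0::2], a[0::2, 1::2]
        C, D = a[1::2, 0::2], a[1::2, 1::2]
        avg = (A + B + C + D) / 2.0   # psi^(4)-type averages, carried up one level
        d1 = (A - B + C - D) / 2.0    # detail oscillating in one coordinate
        d2 = (A + B - C - D) / 2.0    # detail oscillating in the other coordinate
        d3 = (A - B - C + D) / 2.0    # diagonal (psi^(3)-type) detail
        details.append((d1, d2, d3))
        a = avg
    return a.item(), details
```

The returned coefficients can then be weighted by $2^{2\alpha k}$ or raised to the power $\sigma$ to evaluate the other two sequence norms in (5).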
4. Solving minimization problems

We consider how to minimize (3). We decompose $f$ as above, and write $g = \sum_{0 \le k} \sum_{j \in Z_{2^k}} \sum_{\psi \in \Psi_k} d_{j,k,\psi}\, \psi_{j,k}$. Using the equivalent sequence norms, we wish to minimize

$$\sum_{0 \le k} \sum_{j \in Z_{2^k}} \sum_{\psi \in \Psi_k} [(c_{j,k,\psi} - d_{j,k,\psi})^2 + \lambda\, 2^{2\alpha k} |d_{j,k,\psi}|^2].$$

Clearly, one minimizes the entire expression by minimizing each term separately. So we consider the problem: given $t$ and $\mu > 0$ (here $\mu = \lambda\, 2^{2\alpha k}$), find $s$ that minimizes

$$E := (t - s)^2 + \mu s^2.$$

It is easy to see that this expression attains its minimum of $\mu t^2/(1+\mu)$ when $s = t/(1+\mu)$. By considering separately the cases $\mu \le 1$ and $\mu \ge 1$, one sees that for all $\mu > 0$,

$$\tfrac12 \min(\mu, 1) \le \frac{\mu}{1+\mu} \le \min(\mu, 1),$$

so that the following algorithm, which gives $E$ a value of $\min(\mu, 1)\, t^2$, minimizes $E$ to within a factor of 2: If $\mu < 1$ then set $s = t$, otherwise set $s = 0$. Thus, an approximate minimizer of (3) is

$$(6)\qquad \tilde f = \sum_{2^{2\alpha k} \lambda < 1} \sum_{j \in Z_{2^k}} \sum_{\psi \in \Psi_k} c_{j,k,\psi}\, \psi_{j,k}.$$

This sum consists of all $c_{j,k,\psi}\, \psi_{j,k}$ with $k < K$, where $K$ is the smallest integer for which $2^{2\alpha K} \lambda \ge 1$; this is a linear algorithm for finding $\tilde f$.
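The per-term trade-off and the level cutoff behind (6) can be sketched as follows (function names are ours, for illustration only):

```python
import math

def exact_term_min(t, mu):
    """Exact minimum over s of (t - s)^2 + mu*s^2, attained at s = t/(1 + mu)."""
    return mu * t * t / (1.0 + mu)

def approx_term_min(t, mu):
    """Keep-or-kill rule: s = t when mu < 1, s = 0 otherwise.

    Its value, min(mu, 1)*t^2, is within a factor of 2 of the exact minimum.
    """
    return min(mu, 1.0) * t * t

def cutoff_level(lam, alpha):
    """Smallest K with 2^(2*alpha*K)*lam >= 1.

    The linear rule (6) keeps every coefficient at levels k < K, since
    there mu = lam * 2^(2*alpha*k) < 1.
    """
    if lam >= 1.0:
        return 0
    return math.ceil(math.log2(1.0 / lam) / (2.0 * alpha))
```

For example, `cutoff_level(0.0625, 0.5)` keeps levels 0 through 3, since $2^{2 \cdot 0.5 \cdot 4} \cdot 0.0625 = 1$.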
When $Y = B^\alpha_\sigma(L_\sigma(I))$, we wish to minimize (4), or

$$\sum_{0 \le k} \sum_{j \in Z_{2^k}} \sum_{\psi \in \Psi_k} (c_{j,k,\psi} - d_{j,k,\psi})^2 + \lambda \sum_{0 \le k} \sum_{j \in Z_{2^k}} \sum_{\psi \in \Psi_k} |d_{j,k,\psi}|^\sigma.$$

Again, one should minimize each term separately, so we consider the problem of minimizing

$$E := (t - s)^2 + \lambda |s|^\sigma.$$

We can assume without loss of generality that $t \ge 0$ and, trivially, $0 \le s \le t$. Note that if $s \le t/2$ then $E \ge t^2/4$, and if $s \ge t/2$ then $E \ge \lambda t^\sigma/2^\sigma$. Since $\sigma < 2$, $2^\sigma < 4$ and $E$ is bounded below by

$$\tfrac14 \min(t^2, \lambda t^\sigma).$$

Thus, the following algorithm, which gives a value of $\min(t^2, \lambda t^\sigma)$ to $E$, minimizes $E$ to within a factor of 4: If $\lambda t^\sigma \le t^2$, set $s = t$; otherwise, set $s = 0$. Thus, an approximate minimizer of (4) is

$$(7)\qquad \tilde f = \sum_{\lambda |c_{j,k,\psi}|^\sigma \le |c_{j,k,\psi}|^2} c_{j,k,\psi}\, \psi_{j,k}.$$

Note that we choose all $c_{j,k,\psi}$ greater than a certain threshold, namely $|c_{j,k,\psi}| \ge \lambda^{1/(2-\sigma)}$. Thus, this is a nonlinear method for choosing $\tilde f$.
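The rule behind (7) is hard thresholding of the wavelet coefficients; a minimal sketch (our naming; `sigma` plays the role of the Besov exponent $\sigma = 2/(1+\alpha)$):

```python
def hard_threshold(coeffs, lam, sigma):
    """Approximate term-by-term minimizer of sum (c - d)^2 + lam*|d|^sigma:
    keep c when lam*|c|^sigma <= c^2, i.e. when |c| >= lam**(1/(2 - sigma)),
    and replace it by 0 otherwise."""
    eps = lam ** (1.0 / (2.0 - sigma))
    return [c if abs(c) >= eps else 0.0 for c in coeffs]
```

With `lam = 0.25` and `sigma = 1`, the threshold is $0.25$, so small coefficients are zeroed while large ones pass through unchanged.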
5. Image compression

We assume the support of our image is the square $I$, and that a function $F(x)$ measures the intensity distribution of light at each point $x$ in $I$. We shall assume that $I$ is covered by $2^{2m}$ pixels, in $2^m$ columns and $2^m$ rows, and that each pixel value $p_j$, $j \in Z_{2^m}$, is the average of the intensity $F(x)$ on $I_{j,m}$, which we define to be the square of side length $2^{-m}$ with lower left corner located at $j/2^m$. (This is a fair approximation of what happens with CCD cameras.) It is not hard to see that if we set $f(x) = p_j$ for $x \in I_{j,m}$ and

$$F = \sum_{0 \le k} \sum_{j \in Z_{2^k}} \sum_{\psi \in \Psi_k} c_{j,k,\psi}\, \psi_{j,k}, \qquad \text{then} \qquad f = \sum_{0 \le k < m} \sum_{j \in Z_{2^k}} \sum_{\psi \in \Psi_k} c_{j,k,\psi}\, \psi_{j,k}.$$
In what follows, we consider smoothness of any order $0 < \alpha < \infty$, because we want to make clear that our algorithms depend only on which family of smoothness spaces we use, not on the precise value of $\alpha$. Clearly, the $L_2(I)$, $W^\alpha(L_2(I))$, $0 < \alpha < 1/2$, and $B^\alpha_\sigma(L_\sigma(I))$, $0 < \alpha < 1$ and $\sigma = 2/(1+\alpha)$, norms of $f$ are bounded by the corresponding norms of $F$.

The linear approximation $\tilde f$ to $f$ is given by

$$\tilde f = \sum_{0 \le k < K} \sum_{j \in Z_{2^k}} \sum_{\psi \in \Psi_k} c_{j,k,\psi}\, \psi_{j,k},$$

where $K$ is the smallest integer such that $2^{2\alpha K} \lambda \ge 1$; see (6). There are $N := 2^{2K}$ terms in this sum, and the error between $f$ and $\tilde f$ is bounded by

$$(8)\qquad \|f - \tilde f\|_{L_2(I)}^2 = \sum_{K \le k < m} \sum_{j \in Z_{2^k}} \sum_{\psi \in \Psi_k} |c_{j,k,\psi}|^2 \le 2^{-2\alpha K} \sum_{K \le k < m} \sum_{j \in Z_{2^k}} \sum_{\psi \in \Psi_k} 2^{2\alpha k} |c_{j,k,\psi}|^2 \le 2^{-2\alpha K} \|f\|_{W^\alpha(L_2(I))}^2.$$

Thus, we achieve a rate of approximation

$$\|f - \tilde f\|_{L_2(I)} \le N^{-\alpha/2} \|f\|_{W^\alpha(L_2(I))}.$$

This is an optimal rate of linear approximation for functions in $W^\alpha(L_2(I))$; see [17], pp. 3 ff. Recall that $\alpha < 1/2$ for images with edges.
The analysis of the nonlinear algorithm is only slightly more difficult. Here, our approximation has the form

$$\tilde f = \sum_{|c_{j,k,\psi}| \ge \epsilon} c_{j,k,\psi}\, \psi_{j,k},$$

where $\epsilon = \lambda^{1/(2-\sigma)}$. Again, we wish to bound (1) the number of terms $N$ in this sum and (2) the error $\|f - \tilde f\|_{L_2(I)}$ in terms of $N$ and $\|f\|_{B^\alpha_\sigma(L_\sigma(I))}$. We note that this is an optimal way to partition the coefficients: If $\Lambda$ is any set of $N$ coefficients, then, because we include the $N$ largest coefficients in $\tilde f$,

$$\|f - \tilde f\|_{L_2(I)} \le \Big\| f - \sum_{c_{j,k,\psi} \in \Lambda} c_{j,k,\psi}\, \psi_{j,k} \Big\|_{L_2(I)}.$$
Because

$$\sum_{0 \le k < m} \sum_{j \in Z_{2^k}} \sum_{\psi \in \Psi_k} |c_{j,k,\psi}|^\sigma = \|f\|_{B^\alpha_\sigma(L_\sigma(I))}^\sigma,$$

it is clearly necessary that $N \epsilon^\sigma \le \|f\|_{B^\alpha_\sigma(L_\sigma(I))}^\sigma$. Consequently, $\epsilon^\sigma \le N^{-1} \|f\|_{B^\alpha_\sigma(L_\sigma(I))}^\sigma$ and, if $N > 0$,

$$\epsilon^{2-\sigma} \le N^{1 - 2/\sigma} \|f\|_{B^\alpha_\sigma(L_\sigma(I))}^{2-\sigma}.$$

Therefore, we have

$$(9)\qquad \|f - \tilde f\|_{L_2(I)}^2 = \sum_{|c_{j,k,\psi}| < \epsilon} |c_{j,k,\psi}|^2 \le \epsilon^{2-\sigma} \sum_{|c_{j,k,\psi}| < \epsilon} |c_{j,k,\psi}|^\sigma \le N^{1-2/\sigma} \|f\|_{B^\alpha_\sigma(L_\sigma(I))}^{2-\sigma} \|f\|_{B^\alpha_\sigma(L_\sigma(I))}^\sigma = N^{-\alpha} \|f\|_{B^\alpha_\sigma(L_\sigma(I))}^2,$$

since $1 - 2/\sigma = -\alpha$. Thus, for the nonlinear algorithm,

$$\|f - \tilde f\|_{L_2(I)} \le N^{-\alpha/2} \|f\|_{B^\alpha_\sigma(L_\sigma(I))}.$$
This is an optimal rate of nonlinear approximation for functions in $B^\alpha_\sigma(L_\sigma(I))$ among all methods that satisfy a so-called continuous selection property; see DeVore and Yu [12]. Even though the linear and nonlinear methods have the same convergence rate of $O(N^{-\alpha/2})$, the nonlinear method is better for two reasons: For a given $\alpha$, the nonlinear method achieves this convergence rate for more functions, since $B^\alpha_\sigma(L_\sigma(I))$ is much larger (contains many more images) than $W^\alpha(L_2(I))$; and, while $\alpha$ must be less than $1/2$ for the linear method applied to images with edges, the nonlinear method allows $1/2 \le \alpha < 1$.

Note that neither the linear nor the nonlinear algorithm depends on the precise value of $\alpha$. In the linear case, we choose all coefficients $c_{j,k,\psi}$ with $k$ less than a predetermined value $K$. This achieves results similar to linear filtering or projections onto any other finite-dimensional subspace of dimension $2^{2K}$ of $L_2(I)$. In the nonlinear case, we choose all coefficients $c_{j,k,\psi}$ with $|c_{j,k,\psi}|$ greater than a predetermined parameter $\epsilon$. In the engineering literature, this is known as threshold coding. In each case, only the resulting rates of convergence depend on $\alpha$.
6. Removing Gaussian noise from images

In the previous section, we assumed that the measured pixel values $p_j$, $j \in Z_{2^m}$, were the exact averages of the intensity $F(x)$ on the pixel squares $I_{j,m}$. In this section, we assume that our measurements are corrupted by Gaussian noise; that is, we measure not $p_j$, but $\bar p_j := p_j + \delta_j$, where the $\delta_j$ are i.i.d. (independent, identically distributed) normal random variables with mean 0 and variance $\sigma_0^2$ (denoted by $N(0, \sigma_0^2)$). (This $\sigma_0$ should not be confused with the $\sigma$ in $B^\alpha_\sigma(L_\sigma(I))$.) Because the mapping $\{2^{-m} \bar p_j\} \to \{\bar c_{j,k,\psi}\}$, which takes scaled pixel values to wavelet coefficients, is an orthonormal transformation, the noisy image can be represented by

$$\bar f = \sum_{0 \le k < m} \sum_{j \in Z_{2^k}} \sum_{\psi \in \Psi_k} \bar c_{j,k,\psi}\, \psi_{j,k},$$

where $\bar c_{j,k,\psi} := c_{j,k,\psi} + \delta_{j,k,\psi}$ and the $\delta_{j,k,\psi}$ are i.i.d. $N(0, 2^{-2m}\sigma_0^2)$ random variables. This model assumes that the expected value of the noise,

$$E(\|\bar f - f\|_{L_2(I)}^2) = \sigma_0^2,$$

is independent of the number of pixels. We now examine how the linear and nonlinear algorithms for calculating $\tilde f$ can be adapted for noise removal.
can be adapted for noise removal.
Starting with
f,the linear algorithm (6) calculates
~
f =
X
0k<K
X
j2Z
2
k
X
2Ψ
k
c
j;k;
j;k
;
we will choose K (hence ) to minimize E(kF −
~
fk
2
L
2
(I)
).
Using the wavelet decompositions of
~
f and F,we calculate
E(kF −
~
fk
2
L
2
(I)
) =
X
0k<K
X
j2Z
2
k
X
2Ψ
k
E([c
j;k;
−c
j;k;
]
2
)
+
X
Kk
X
j2Z
2
k
X
2Ψ
k
jc
j;k;
j
2
X
0k<K
X
j2Z
2
k
X
2Ψ
k
E(
2
j;k;
)
+2
−2K
kFk
2
W
(L
2
(I))
= 2
2K−2m
2
0
+2
−2K
kFk
2
W
(L
2
(I))
:
The inequality follows from our estimate (8),and the sec
ond equality holds because the 2
2K
randomvariables
j;k;
each have variance 2
−2m
2
0
.
Again we set N:= 2
2K
and we minimize E(kF −
~
fk
2
L
2
(I)
) with respect to N.Calculus shows that we over
estimate the error by at most a factor of 2 if we set the two
terms in our bound equal to each other,i.e.,N
2
0
2
−2m
=
N
−
kFk
2
W
(L
2
(I))
.This yields
N =
kFk
2
W
(L
2
(I))
2
2m
2
0
1=(+1)
;
and
E(kF −
~
fk
2
L
2
(I)
) 2(2
−2m
2
0
)
=(+1)
kFk
2=(+1)
W
(L
2
(I))
:
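Balancing the two terms of the bound is a one-line computation; a sketch under our naming (not the authors' code):

```python
def balanced_N(norm_W, sigma0, m, alpha):
    """N for which the noise term N*sigma0^2*2^(-2m) equals the
    approximation term N^(-alpha)*norm_W^2; the sum at this N
    overestimates the true minimum by at most a factor of 2."""
    return (norm_W ** 2 * 2.0 ** (2 * m) / sigma0 ** 2) ** (1.0 / (alpha + 1.0))
```

For instance, with `norm_W = sigma0 = 1`, `m = 4`, and `alpha = 1`, both terms equal $16 \cdot 2^{-8} = 0.0625$ at $N = 16$.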
This linear algorithm removes all terms $\bar c_{j,k,\psi}\, \psi_{j,k}$ with $k$ greater than or equal to a threshold $K$; these terms can be considered to have frequency at least $2^K$. Thus, the linear method considers any low-frequency structure to be signal, and any high-frequency structure to be noise, no matter how large $\bar c_{j,k,\psi}$, the scaled amplitude of the signal, might be. This is not acceptable to people, such as astronomers, who deal with high-amplitude, small-extent (and hence high-frequency) signals. The nonlinear algorithm presented next recognizes high-amplitude, high-frequency structures as signals. It has been used in astronomical calculations, e.g., by White [20].

The bound for the nonlinear algorithm is much more complicated, and will not be derived here. We will, however, give the following lower bound, derived independently by Donoho and Johnstone [13], which is based on the assumption that we have extra information about which of the true coefficients $c_{j,k,\psi}$ are large.
We limit our estimator to the form

$$\tilde f = \sum_{0 \le k} \sum_{j \in Z_{2^k}} \sum_{\psi \in \Psi_k} \tilde c_{j,k,\psi}\, \psi_{j,k},$$

where either $\tilde c_{j,k,\psi} = \bar c_{j,k,\psi}$ or $\tilde c_{j,k,\psi} = 0$. In the first case, $E([c_{j,k,\psi} - \tilde c_{j,k,\psi}]^2) = E(\delta_{j,k,\psi}^2) = 2^{-2m}\sigma_0^2$, and in the second case, $E([c_{j,k,\psi} - \tilde c_{j,k,\psi}]^2) = |c_{j,k,\psi}|^2$. Thus, if we knew which coefficients $c_{j,k,\psi}$ satisfy $|c_{j,k,\psi}|^2 < 2^{-2m}\sigma_0^2$, we would obtain an optimal estimator

$$\tilde f = \sum_{|c_{j,k,\psi}| \ge 2^{-m}\sigma_0} \bar c_{j,k,\psi}\, \psi_{j,k}.$$
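Each term of this ideal keep-or-kill estimator contributes the smaller of $|c_{j,k,\psi}|^2$ and the per-coefficient noise variance; a minimal sketch of that "oracle" risk (our naming):

```python
def oracle_risk(true_coeffs, noise_var):
    """Expected squared error of the ideal keep-or-kill estimator that
    keeps a noisy coefficient exactly when the true |c|^2 is at least
    the per-coefficient noise variance (here noise_var = 2^(-2m)*sigma0^2).
    Kept terms contribute the noise variance; killed terms contribute c^2.
    """
    return sum(min(c * c, noise_var) for c in true_coeffs)
```

A large coefficient costs only the noise variance, while a small one costs its own (small) energy, which is why high-amplitude, high-frequency structure survives.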
From Section 5, the coefficients satisfying $|c_{j,k,\psi}| \ge 2^{-m}\sigma_0$ number at most $(2^{-m}\sigma_0)^{-\sigma} \|F\|_{B^\alpha_\sigma(L_\sigma(I))}^\sigma$, while

$$\sum_{|c_{j,k,\psi}| < 2^{-m}\sigma_0} |c_{j,k,\psi}|^2 \le (2^{-m}\sigma_0)^{2-\sigma} \|F\|_{B^\alpha_\sigma(L_\sigma(I))}^\sigma.$$

Thus, $E(\|F - \tilde f\|_{L_2(I)}^2)$ equals

$$\sum_{|c_{j,k,\psi}| \ge 2^{-m}\sigma_0} 2^{-2m}\sigma_0^2 + \sum_{|c_{j,k,\psi}| < 2^{-m}\sigma_0} |c_{j,k,\psi}|^2 \le 2\,(2^{-m}\sigma_0)^{2-\sigma} \|F\|_{B^\alpha_\sigma(L_\sigma(I))}^\sigma = 2\,(2^{-2m}\sigma_0^2)^{\alpha/(\alpha+1)} \|F\|_{B^\alpha_\sigma(L_\sigma(I))}^{2/(\alpha+1)},$$

since $2 - \sigma = 2\alpha/(\alpha+1)$ and $\sigma = 2/(\alpha+1)$. By considering functions $F$ all of whose nonzero coefficients $c_{j,k,\psi}$ satisfy either $|c_{j,k,\psi}| = 2^{-m}\sigma_0$ or $|c_{j,k,\psi}| = 2^{-m}\sigma_0 - \epsilon$ for some small $\epsilon > 0$, we see that half this value is the promised lower bound. Note that we get the same rate of approximation as in the linear case, but we are using a much weaker norm to measure the smoothness of the image $F$.
In practice, we must guess from the noisy coefficients $\bar c_{j,k,\psi}$ which of the $c_{j,k,\psi}$ are large. Our nonlinear approximation algorithm, when applied to the noisy data $\bar c_{j,k,\psi}$, is of the form

$$(10)\qquad \tilde f = \sum_{|\bar c_{j,k,\psi}| > \epsilon} \bar c_{j,k,\psi}\, \psi_{j,k},$$
and the problem is how to choose $\epsilon$ ($= \lambda^{1/(2-\sigma)}$) to minimize the maximum expected error. If one chooses $\epsilon = a 2^{-m}\sigma_0$ with $a$ fixed, then the expected error for $F \equiv 0$ is

$$\sigma_0^2\, \frac{1}{\sqrt{2\pi}} \int_{|x| > a} x^2 e^{-x^2/2}\, dx,$$

which does not tend to zero as the number of data points $2^{2m}$ increases. Thus, $a$ must increase without bound as the number of data points tends to infinity.
The problem of optimal estimation was studied in several papers by Donoho and Johnstone culminating in [14], and in [8] by the present authors. In various settings, Donoho and Johnstone show that, asymptotically, the optimal choice of $a$ is $C\sqrt{\log(2^{2m})}$. Instead of keeping all coefficients with $|\bar c_{j,k,\psi}| \ge a\sigma_0 2^{-m}$, they take

$$\tilde c_{j,k,\psi} = \begin{cases} \bar c_{j,k,\psi} - a\sigma_0 2^{-m}, & \bar c_{j,k,\psi} > a\sigma_0 2^{-m}, \\ 0, & |\bar c_{j,k,\psi}| \le a\sigma_0 2^{-m}, \\ \bar c_{j,k,\psi} + a\sigma_0 2^{-m}, & \bar c_{j,k,\psi} < -a\sigma_0 2^{-m}, \end{cases}$$

a technique they call "wavelet shrinkage." In [8] by the present authors, it is shown for $F$ in $B^\alpha_\sigma(L_\sigma(I))$ that $a = \sqrt{C\log(2^{2m})}$ for some $C$ yields an asymptotically near-optimal algorithm of the form (10). With this choice, the error is proportional to

$$(\log 2^{2m})^{\alpha/(\alpha+1)}\, (2^{-2m})^{\alpha/(\alpha+1)}.$$

Thus, we incur an extra factor of $(\log 2^{2m})^{\alpha/(\alpha+1)}$ because we don't know which coefficients $c_{j,k,\psi}$ are large.
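The shrinkage rule above is soft thresholding; a minimal sketch, with the threshold `t` standing in for $a\sigma_0 2^{-m}$:

```python
def soft_threshold(c_bar, t):
    """Donoho-Johnstone wavelet shrinkage: move each noisy coefficient
    toward zero by the threshold t, and clamp to zero inside [-t, t]."""
    if c_bar > t:
        return c_bar - t
    if c_bar < -t:
        return c_bar + t
    return 0.0
```

Unlike the keep-or-kill rule (10), every surviving coefficient is also shrunk by `t`, which removes the systematic upward bias in the magnitudes of the kept noisy coefficients.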
7. Image reconstruction

Several authors (e.g., [2], [16]) use functionals of the form

$$\min_{\|g\|_Y < \infty} \{\|f - g\|_{L_2(I)}^2 + \lambda \|g\|_Y\}$$

for purposes of image reconstruction, where $Y$ is close to the norms considered here. For example, Bouman and Sauer [2] investigate minimizing, for $p \ge 1$,

$$\sum_{j \in Z_{2^m}} |f_j - g_j|^2 + \lambda \sum_{j \in Z_{2^m}} [\,|g_j - g_{j+(1,0)}|^p + |g_j - g_{j+(0,1)}|^p\,],$$
where the $f_j$ are the pixels of $f$ and the $g_j$ are the pixels of $g$. In the continuous limit, this is similar to minimizing

$$\|f - g\|_{L_2(I)}^2 + \lambda [\,\|D_1 g\|_{L_p(I)}^p + \|D_2 g\|_{L_p(I)}^p\,], \qquad p \ge 1,$$

where $\|g\|_{L_p(I)}^p := \int_I |g(x)|^p\, dx$. They found that for images with edges, $p = 1$ was most effective in reconstructing the original image after blurring. In this case, their functional is precisely

$$(11)\qquad \|f - g\|_{L_2(I)}^2 + \lambda \|g\|_{BV},$$

where $BV$ is the space of functions of bounded variation on $I$. We have $B^1_1(L_1(I)) \subset BV \subset B^1_\infty(L_1(I))$, and $B^1_1(L_1(I))$ is one of our nonlinear spaces. Thus, minimizing (11) should yield a reconstructed $\tilde f$ for $f$ that is close to the usual $\tilde f$ we get from our nonlinear algorithm.
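The discrete Bouman-Sauer functional can be evaluated directly on pixel arrays; a sketch under our naming ($p = 1$ gives the BV-like roughness penalty):

```python
import numpy as np

def discrete_functional(f, g, lam, p):
    """Data fit plus lam-weighted roughness on pixel arrays: a discrete
    analogue of ||f - g||_2^2 + lam*(||D1 g||_p^p + ||D2 g||_p^p),
    using forward differences between horizontally and vertically
    neighboring pixels."""
    f = np.asarray(f, dtype=float)
    g = np.asarray(g, dtype=float)
    fit = float(((f - g) ** 2).sum())
    rough = float((np.abs(np.diff(g, axis=1)) ** p).sum()) \
          + float((np.abs(np.diff(g, axis=0)) ** p).sum())
    return fit + lam * rough
```

A reconstruction method would minimize this quantity over `g`; the sketch only evaluates it, which already shows the trade-off: `g = f` pays the full roughness penalty, while a constant `g` pays only the fit term.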
Acknowledgments

This work was supported in part by the National Science Foundation (grants DMS-8922154 and DMS-9006219), the Air Force Office of Scientific Research (contract 89-0455-DEF), the Office of Naval Research (contracts N00014-91-J-1152 and N00014-91-J-1076), the Defense Advanced Research Projects Agency (AFOSR contract 90-0323), and the Army High Performance Computing Research Center (Army Research Office contract DAAL03-89-C-0038).
References

K-functionals are discussed in [1]. Aspects of wavelet theory and practice are surveyed in, e.g., [3], [9], [15], and [18]. The theory of nonlinear data compression using wavelets is developed in [7], [4], [5], [6], and [10]. Nonparametric estimation using functionals similar to (3) is surveyed in [19].

1. J. Bergh and J. Löfström, Interpolation Spaces: An Introduction, Springer-Verlag, New York, 1976.
2. C. Bouman and K. Sauer, Bayesian estimation of transmission tomograms using segmentation based optimization, IEEE Trans. Nuclear Science 39 (1992), no. 4 (to appear).
3. I. Daubechies, Ten Lectures on Wavelets, CBMS-NSF Regional Conference Series in Applied Mathematics, vol. 61, SIAM, Philadelphia, 1992.
4. R. DeVore, B. Jawerth, and B. Lucier, Data compression using wavelets: Error, smoothness, and quantization, DCC 91: Data Compression Conference (J. Storer and J. Reif, eds.), IEEE Computer Society Press, Los Alamitos, CA, 1991, pp. 186-195.
5. R. DeVore, B. Jawerth, and B. Lucier, Image compression through wavelet transform coding, IEEE Trans. Information Theory 38 (1992), no. 2, 719-746, Special issue on Wavelet Transforms and Multiresolution Analysis.
6. R. DeVore, B. Jawerth, and B. Lucier, Surface compression, Computer Aided Geom. Design (to appear).
7. R. DeVore, B. Jawerth, and V. Popov, Compression of wavelet decompositions, Amer. J. Math. (to appear).
8. R. DeVore and B. Lucier, Nonlinear wavelet algorithms for removing Gaussian noise, in preparation.
9. R. DeVore and B. Lucier, Wavelets, Acta Numerica 1992 (A. Iserles, ed.), Cambridge University Press, New York, 1992, pp. 1-56.
10. R. A. DeVore, P. Petrushev, and X. M. Yu, Nonlinear wavelet approximations in the space C(R^d), Proceedings of the US/USSR Conference on Approximation, Tampa, Springer-Verlag, New York, 1991 (to appear).
11. R. DeVore and V. Popov, Interpolation of Besov spaces, Trans. Amer. Math. Soc. 305 (1988), 397-414.
12. R. DeVore and X. Yu, Nonlinear n-widths in Besov spaces, Approximation Theory VI: Vol. 1 (C. K. Chui, L. L. Schumaker, and J. D. Ward, eds.), Academic Press, New York, 1989, pp. 203-206.
13. D. Donoho and I. Johnstone, Ideal spatial adaptation by wavelet shrinkage, preprint.
14. D. Donoho and I. Johnstone, Minimax estimation via wavelet shrinkage, preprint.
15. Y. Meyer, Ondelettes et Opérateurs I: Ondelettes, Hermann, Paris, 1990.
16. D. Mumford and J. Shah, Optimal approximations by piecewise smooth functions and associated variational problems, Comm. Pure Appl. Math. XLII (1989), 577-685.
17. A. Pinkus, n-Widths in Approximation Theory, Springer-Verlag, New York, 1985.
18. O. Rioul and M. Vetterli, Wavelets and signal processing, IEEE Signal Processing Magazine 8 (1991), no. 4, 14-38.
19. G. Wahba, Spline Models for Observational Data, CBMS-NSF Regional Conference Series in Applied Mathematics, vol. 59, SIAM, Philadelphia, 1990.
20. R. White, High-performance compression of astronomical images (abstract only), DCC 92: Data Compression Conference (J. Storer and M. Cohn, eds.), IEEE Computer Society Press, Los Alamitos, CA, 1992, p. 403.
Fig. 1. 512 × 512 lena, green component of color image.
Fig. 2. Compressed with the linear method, N = 16384.
Fig. 3. Compressed with the nonlinear method, N = 16384.
Fig. 4. With 32 grey scales RMS noise added.
Fig. 5. Noise removed with linear method, N = 16384.
Fig. 6. Noise removed with wavelet shrinkage, a = 1.