Non-local means: a look at non-local self-similarity of images
IT 530, LECTURE NOTES
Partial Differential Equations (PDEs): Heat Equation
• Inspired from thermodynamics.
• Blurs out edges.
Executing several iterations of this PDE on a noisy image is equivalent to convolving the same image with a Gaussian! The variance (“sigma” squared) of the Gaussian is directly proportional to the number of time-steps of the PDE.
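The equivalence between heat diffusion and Gaussian blurring can be tried out numerically. The sketch below assumes the standard heat equation I_t = I_xx + I_yy, an explicit Euler scheme with a 4-neighbour Laplacian and replicated borders; the function names, the step size dt and the test image are illustrative choices, not taken from the slides. For this PDE, a total diffusion time t = n_steps * dt corresponds to a Gaussian of standard deviation sqrt(2t).

```python
import numpy as np

def heat_step(img, dt=0.2):
    """One explicit Euler step of I_t = I_xx + I_yy (dt <= 0.25 for stability)."""
    p = np.pad(img, 1, mode="edge")  # replicated borders: no flux across the boundary
    lap = p[:-2, 1:-1] + p[2:, 1:-1] + p[1:-1, :-2] + p[1:-1, 2:] - 4.0 * img
    return img + dt * lap

def heat_diffuse(img, n_steps, dt=0.2):
    """Run n_steps iterations of the heat equation."""
    out = np.asarray(img, dtype=float).copy()
    for _ in range(n_steps):
        out = heat_step(out, dt)
    return out

# Hypothetical test image: a vertical step edge plus Gaussian noise.
rng = np.random.default_rng(0)
clean = np.zeros((32, 32))
clean[:, 16:] = 100.0
noisy = clean + rng.normal(0.0, 10.0, clean.shape)
smooth = heat_diffuse(noisy, n_steps=20)
```

The noise in flat regions is reduced, but the step edge is smeared over several pixels, which is the motivation for the edge-preserving methods that follow.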
PDEs: Anisotropic Diffusion
• Diffusivity function “g”.
• Decreasing function of gradient magnitude.
• Preserve edges: diffuse along edges, not across.
Several papers: Perona and Malik [IEEE PAMI 1990], Total variation method [Rudin et al., 1992], Beltrami flow [Sochen et al., IEEE TIP 1998], etc.
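A minimal sketch of Perona-Malik style diffusion, to make the role of the diffusivity “g” concrete. It assumes the exponential diffusivity g(s) = exp(-(s/kappa)^2) and the usual 4-neighbour discretization; kappa, dt and the test image are illustrative values, not ones given in the slides.

```python
import numpy as np

def g(grad_mag, kappa):
    """Diffusivity: a decreasing function of the gradient magnitude."""
    return np.exp(-(grad_mag / kappa) ** 2)

def perona_malik(img, n_steps=20, kappa=15.0, dt=0.2):
    out = np.asarray(img, dtype=float).copy()
    for _ in range(n_steps):
        p = np.pad(out, 1, mode="edge")
        # Differences to the four nearest neighbours (north, south, west, east).
        diffs = (p[:-2, 1:-1] - out, p[2:, 1:-1] - out,
                 p[1:-1, :-2] - out, p[1:-1, 2:] - out)
        # Each flux is damped where the local difference is large (an edge).
        out = out + dt * sum(g(np.abs(d), kappa) * d for d in diffs)
    return out

# Hypothetical test image: a step edge of height 100 with noise of sigma 5.
rng = np.random.default_rng(0)
clean = np.zeros((32, 32))
clean[:, 16:] = 100.0
noisy = clean + rng.normal(0.0, 5.0, clean.shape)
out = perona_malik(noisy)
```

With these values the small noise differences are diffused almost as in the heat equation, while the flux across the step edge is practically zero.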
PDEs: Total Variation
• Total variation denoising seeks to minimize the following energy functional:
Euler-Lagrange equation (partial differential equation): exhibits anisotropic behaviour due to the gradient magnitude term in the denominator. Diffusion is low across strong edges.
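The slide's own formulas did not survive extraction. For orientation, one common way of writing the total variation model (the Rudin-Osher-Fatemi form) is given below; the fidelity weight \lambda and the noisy input I_0 are notation assumed here and may differ from the slide.

```latex
\begin{align*}
E(I) &= \int_{\Omega} |\nabla I| \, dx \, dy
      + \frac{\lambda}{2} \int_{\Omega} (I - I_0)^2 \, dx \, dy \\
0 &= \nabla \cdot \left( \frac{\nabla I}{|\nabla I|} \right) - \lambda \, (I - I_0)
  && \text{(Euler-Lagrange equation)} \\
\frac{\partial I}{\partial t} &= \nabla \cdot \left( \frac{\nabla I}{|\nabla I|} \right) - \lambda \, (I - I_0)
  && \text{(gradient descent PDE)}
\end{align*}
```

The factor 1/|\nabla I| acts as the diffusivity: it is small where the gradient magnitude is large, so little diffusion takes place across strong edges.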
[Figure panels: Heat equation; Perona-Malik PDE; Total variation]
Neighborhood Filters for Denoising
Simple averaging filter: will cause blurring of edges and textures in the image.
Denoising with a neighborhood filter
Neighborhood Filters for Denoising: Lee Filter
• Weigh the pixels in the neighborhood by factors that decrease with the distance between the central pixel and the particular pixel used for weighting.
• This is expressed as:
More weight to nearby pixels
Anisotropic Neighborhood Filter (Yaroslavsky Filter)
• Weigh the pixels in the neighborhood by factors that decrease with the difference between the intensity values at those pixels and the intensity value of the pixel to be denoised.
• This is expressed as:
More weight to pixels with similar intensity values: better preservation of edges/boundaries
Bilateral Filter (Lee + Yaroslavsky Filter)
• Weigh the pixels in the neighborhood by factors that decrease with the difference between the intensity values at those pixels and the intensity value of the pixel to be denoised, and with the difference in pixel locations.
• This is expressed as:
More weight to pixels with similar intensity values: better preservation of edges/boundaries
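The three neighborhood filters can be sketched with a single function. The version below assumes Gaussian weights for both the spatial term and the intensity term, which is a common choice but not necessarily the exact form on the slides; the names half, sigma_s and sigma_r are introduced here. Setting sigma_r to infinity leaves only the spatial weights (the Lee-type filter described above), and setting sigma_s to infinity leaves only the intensity weights (the Yaroslavsky-type filter).

```python
import numpy as np

def bilateral(img, half=3, sigma_s=2.0, sigma_r=20.0):
    """Bilateral filter over a (2*half+1) x (2*half+1) neighborhood."""
    img = np.asarray(img, dtype=float)
    H, W = img.shape
    p = np.pad(img, half, mode="edge")
    num = np.zeros_like(img)
    den = np.zeros_like(img)
    for dy in range(-half, half + 1):
        for dx in range(-half, half + 1):
            # Neighbor at offset (dy, dx) for every pixel at once.
            shifted = p[half + dy: half + dy + H, half + dx: half + dx + W]
            w_space = np.exp(-(dx * dx + dy * dy) / (2.0 * sigma_s ** 2))
            w_range = np.exp(-((shifted - img) ** 2) / (2.0 * sigma_r ** 2))
            w = w_space * w_range
            num += w * shifted
            den += w
    return num / den

# Hypothetical test image: a clean step edge of height 100.
step = np.zeros((32, 32))
step[:, 16:] = 100.0
edge_kept = bilateral(step)                      # both weights active
edge_blurred = bilateral(step, sigma_r=np.inf)   # spatial weights only
```

On the step image the spatial-only filter smears the edge, while the bilateral weights keep it almost intact because pixels on the other side of the edge receive negligible weight.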
Comparative Results
• The anisotropic diffusion algorithm performs better than the others.
• In the Yaroslavsky/Bilateral filter, the comparison between the intensity values is not very robust. This creates artifacts around the edges.
• Performance difference between the Yaroslavsky and bilateral filters is minor.
• All the aforementioned filters are based on the principle of piece-wise constant intensity images.
Non-local self-similarity
Non-local self-similarity is very useful in denoising (and almost everything else in image processing).
For denoising, you could simply take an average of all those patches that were “similar” (modulo noise).
Non-local Means
Natural images have a great deal of redundancy: patches from different regions can be very similar.
NL-Means: a non-local pixel-based method (Buades et al., 2005)
• Awate and Whitaker (PAMI 2007)
• Popat and Picard (TIP 1998)
• De Bonet (MIT Tech report 1998)
• Wang et al. (IEEE SPL 2003)
Difference between patches
Non-local means: Basic Principle
• Non-local means compares entire patches (not individual pixel intensity values) to compute weights for denoising pixel intensities.
• Comparison of entire patches is more robust, i.e. if two patches are similar in a noisy image, they will be similar in the underlying clean image with very high probability.
• We will see this informally and prove it mathematically in due course.
Non-local means: Variant
The Euclidean distance between two patches is weighted by a Gaussian with maximum weight at the center of the two patches and decaying outwards.
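The NL-means estimate is usually written as below. The notation is assumed here: P_i is the patch centered at pixel i, I(j) the noisy intensity at pixel j, h a smoothing parameter, and G_a a Gaussian kernel of standard deviation a defined over the patch offsets k.

```latex
\begin{align*}
\hat{I}(i) &= \frac{\sum_{j} w(i,j) \, I(j)}{\sum_{j} w(i,j)}, \qquad
w(i,j) = \exp\!\left( -\frac{\| P_i - P_j \|^2}{h^2} \right) \\
\| P_i - P_j \|_{2,a}^2 &= \sum_{k} G_a(k) \, \bigl( P_i(k) - P_j(k) \bigr)^2
  && \text{(Gaussian-weighted variant)}
\end{align*}
```

In the variant, the plain squared distance in w(i,j) is replaced by the Gaussian-weighted one, so that pixels near the patch center count more than pixels near its border.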
Three principles to evaluate denoising algorithms
• (1): The residual image (also called “method noise”), defined as the difference between the noisy image and the denoised image, should look like (and have all the properties of) a pure noise image.
• (2): A denoising algorithm should transform a pure noise image into another noise image (of lower variance).
• (3): A competent denoising algorithm should find, for any pixel ‘i’, all and only those pixels ‘j’ that have the same model as ‘i’ (i.e. those pixels whose intensity would have most likely been the same as that of ‘i’, if there were no noise).
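The first two principles are easy to check numerically. The sketch below uses a plain 3 x 3 averaging filter as the denoiser under test; this choice, the function names and the test image are assumptions made for illustration. The residual of the averaging filter is far from pure noise: it is concentrated along the edge, which means image structure has been removed together with the noise.

```python
import numpy as np

def box_filter(img, half=1):
    """Simple averaging over a (2*half+1) x (2*half+1) neighborhood."""
    img = np.asarray(img, dtype=float)
    H, W = img.shape
    k = 2 * half + 1
    p = np.pad(img, half, mode="edge")
    acc = np.zeros_like(img)
    for dy in range(k):
        for dx in range(k):
            acc += p[dy:dy + H, dx:dx + W]
    return acc / k ** 2

def method_noise(noisy, denoised):
    """Residual image: what the denoiser removed."""
    return noisy - denoised

rng = np.random.default_rng(0)
clean = np.zeros((32, 32))
clean[:, 16:] = 100.0
noisy = clean + rng.normal(0.0, 5.0, clean.shape)

residual = method_noise(noisy, box_filter(noisy))
edge_energy = np.abs(residual[:, 15:17]).mean()   # columns next to the edge
flat_energy = np.abs(residual[:, :12]).mean()     # a constant-intensity region

# Principle 2: pure noise in, lower-variance noise out.
pure_noise = rng.normal(0.0, 5.0, (32, 32))
filtered_noise = box_filter(pure_noise)
```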
Principle 1: Residual Image
Principle 2: Noise to noise
Principle 3: Correct models?
The pixels with high weight in anisotropic diffusion or bilateral filters do NOT line up with our expectation (in all images!). This is because noise affects the gradient computation or the single-intensity-driven weights.
In NL-means, the comparison between patches is MUCH more robust to noise!
Non-local means: Implementation details
• A drawback of the algorithm is its very high time complexity: O(N x N), i.e. O(N^2), for an image with N pixels.
• Heuristic work-around: given a reference patch, restrict the search for similar patches to a window of size S x S (called the “search zone”) around the center of the reference patch.
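A minimal sketch of NL-means with a restricted search zone, assuming unweighted patch distances (mean squared difference per pixel) and weights exp(-distance / h^2). The parameter names patch_half, search_half and h are introduced here; h plays the role of the “sigma” that controls the weights. The loop runs over the S x S offsets of the search zone instead of over all pixel pairs, so the cost is roughly N x S^2 patch comparisons rather than N^2.

```python
import numpy as np

def nl_means(img, patch_half=1, search_half=5, h=10.0):
    img = np.asarray(img, dtype=float)
    H, W = img.shape
    k = 2 * patch_half + 1
    pad = patch_half + search_half
    p = np.pad(img, pad, mode="reflect")
    hp, wp = H + 2 * patch_half, W + 2 * patch_half
    # The image padded by patch_half only: patch of pixel (r, c) is base[r:r+k, c:c+k].
    base = p[search_half:search_half + hp, search_half:search_half + wp]
    num = np.zeros_like(img)
    den = np.zeros_like(img)
    for dy in range(-search_half, search_half + 1):
        for dx in range(-search_half, search_half + 1):
            shifted = p[search_half + dy:search_half + dy + hp,
                        search_half + dx:search_half + dx + wp]
            d2 = (base - shifted) ** 2
            # Sum squared differences over each patch, then normalize per pixel.
            dist = sum(d2[a:a + H, b:b + W] for a in range(k) for b in range(k)) / k ** 2
            w = np.exp(-dist / h ** 2)
            num += w * shifted[patch_half:patch_half + H, patch_half:patch_half + W]
            den += w
    return num / den

# Hypothetical test image: step edge of height 100, noise of sigma 5.
rng = np.random.default_rng(0)
clean = np.zeros((32, 32))
clean[:, 16:] = 100.0
noisy = clean + rng.normal(0.0, 5.0, clean.shape)
denoised = nl_means(noisy)
mse_noisy = np.mean((noisy - clean) ** 2)
mse_denoised = np.mean((denoised - clean) ** 2)
```

Patches on the other side of the edge have a very large distance and hence a negligible weight, so the edge stays sharp while flat regions are averaged over many patches.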
Non-local means: Implementation details
• The parameter sigma to compute the weights will depend on the noise variance. Heuristic relation is:
• Patch-size is a free parameter; usually some size between 7 x 7 and 21 x 21 is chosen. Larger patch-size: better discrimination of the truly similar patches, but more expensive and more (over)smoothing.
• Smaller patch-size: less smoothing.
Patch-size selection
Patch-size too small: mottling effect (fake edges/patterns in constant intensity regions)
Patch-size too large: oversmoothing of subtle textures and edges
Ref: Duval and Gousseau, “A bias-variance approach for the non-local means”
[Figure: gray region (containing patch P), black region (containing patch Q), and noisy gray region (containing patch U(x))]
Assume patch-size is s x s. Assume noise from N(0,1).
This is a zero-mean Gaussian random variable with variance 1.
By definition of erfc, this probability decreases as ‘s’ increases.
Discriminability improves as patch-size increases! It explains why NL-means outperforms single-pixel neighborhood filters!
Ref: Duval and Gousseau, “A bias-variance approach for the non-local means”
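The slide's derivation is not recoverable from the text, but the claim can be illustrated under a simple assumed setup: P is a constant patch of intensity 0, Q a constant patch of intensity gap, and U is P plus N(0,1) noise. U is wrongly judged closer to Q than to P exactly when the sum of the noise values divided by s exceeds gap * s / 2. That normalized sum is a zero-mean Gaussian with variance 1, so the error probability is 0.5 * erfc(gap * s / (2 * sqrt(2))). The function names and the value of gap are illustrative.

```python
import math
import numpy as np

def p_confuse_closed_form(s, gap):
    """P(noisy copy of P is closer to Q than to P) for s x s patches."""
    return 0.5 * math.erfc(gap * s / (2.0 * math.sqrt(2.0)))

def p_confuse_monte_carlo(s, gap, trials=20000, seed=0):
    """Empirical estimate of the same probability."""
    rng = np.random.default_rng(seed)
    P = np.zeros((s, s))
    Q = np.full((s, s), float(gap))
    U = P + rng.normal(0.0, 1.0, size=(trials, s, s))
    d_P = ((U - P) ** 2).sum(axis=(1, 2))
    d_Q = ((U - Q) ** 2).sum(axis=(1, 2))
    return float(np.mean(d_Q < d_P))

# Error probability for a gap equal to one noise standard deviation.
sizes = [1, 3, 5, 7]
closed = [p_confuse_closed_form(s, gap=1.0) for s in sizes]
empirical = [p_confuse_monte_carlo(s, gap=1.0) for s in sizes]
```

The case s = 1 corresponds to comparing single pixel intensities, as the Yaroslavsky and bilateral filters do; the error probability drops quickly as s grows.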
Extension to Video denoising
• For video denoising, simply denoising each individual frame independently ignores temporal similarity or redundancy.
• Most video denoising algorithms first perform a motion compensation step: (1) estimate the motion between consecutive frames, and (2) align each successive frame to its previous frame.
• Motion estimation is typically performed by exploiting the “brightness constancy assumption”, i.e. that the intensity of any physical point is unchanged throughout the video.
Extension to Video denoising
• The most popular motion compensation algorithms also assume that the motion of nearby pixels is similar (motion smoothness assumption).
• You will study this in more detail in computer vision: optical flow.
• Denoising is done after motion compensation (assuming that pixels at the same coordinate in successive frames will have the same or similar intensities).
Extension to Video denoising
• There are some problems in motion estimation, even more so if the video is noisy.
• One such issue is called the aperture problem: for any block in one frame, there are many matching blocks in the next frame.
Extension to Video denoising
• The motion smoothness assumption is one way to alleviate the aperture problem (again, you will study this in more detail in computer vision).
• On the next slide, we will see the performance of the Lee filter and the Yaroslavsky filter, with and without motion compensation.
NL-means performs much better!
NL-Means for video denoising
• Video data has tremendous redundancy (more than individual frames).
• Any reference patch in one frame will have many similar patches in other frames, so the aperture problem is NO problem for video denoising!
• So forget about motion compensation!
• Run NL-means on each frame, using similar patches from that frame as well as from nearby frames.
• Advantages: avoids all the inevitable errors in motion estimation, AND saves computational cost!
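A sketch of this idea, under the same assumptions as a basic single-image NL-means (unweighted patch distances, weights exp(-distance / h^2)): the search zone is simply extended to the neighboring frames, with no motion estimation at all. The function name, the parameter frame_radius and the synthetic video are introduced here for illustration.

```python
import numpy as np

def nl_means_video(video, t, patch_half=1, search_half=3, frame_radius=1, h=10.0):
    """Denoise frame t of a (T, H, W) stack using patches from frames t-r .. t+r."""
    video = np.asarray(video, dtype=float)
    T, H, W = video.shape
    k = 2 * patch_half + 1
    pad = patch_half + search_half
    hp, wp = H + 2 * patch_half, W + 2 * patch_half
    ref = np.pad(video[t], pad, mode="reflect")
    base = ref[search_half:search_half + hp, search_half:search_half + wp]
    num = np.zeros((H, W))
    den = np.zeros((H, W))
    for f in range(max(0, t - frame_radius), min(T, t + frame_radius + 1)):
        other = np.pad(video[f], pad, mode="reflect")
        for dy in range(-search_half, search_half + 1):
            for dx in range(-search_half, search_half + 1):
                shifted = other[search_half + dy:search_half + dy + hp,
                                search_half + dx:search_half + dx + wp]
                d2 = (base - shifted) ** 2
                dist = sum(d2[a:a + H, b:b + W] for a in range(k) for b in range(k)) / k ** 2
                w = np.exp(-dist / h ** 2)
                num += w * shifted[patch_half:patch_half + H, patch_half:patch_half + W]
                den += w
    return num / den

# Hypothetical video: a step edge that moves one pixel per frame, plus noise.
rng = np.random.default_rng(0)
clean_video = np.zeros((3, 24, 24))
for f in range(3):
    clean_video[f][:, 11 + f:] = 100.0
noisy_video = clean_video + rng.normal(0.0, 5.0, clean_video.shape)
denoised_frame = nl_means_video(noisy_video, t=1)
mse_noisy = np.mean((noisy_video[1] - clean_video[1]) ** 2)
mse_denoised = np.mean((denoised_frame - clean_video[1]) ** 2)
```

Because the edge has moved by only one pixel between frames, matching patches in the neighboring frames are found inside the search zone without any explicit alignment.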
An information-theoretic (and iterated) variant of NL-Means: UINTA
• UINTA = Unsupervised information-theoretic adaptive filter.
• UINTA is again based on the principle of non-local similarity.
• It uses tools from information theory (conditional entropy) and kernel density estimation.
• Uses a simple observation about the entropy of natural images.
Ref: Awate and Whitaker, “Higher-order image statistics for unsupervised, information-theoretic, adaptive image filtering”
Principle of UINTA
• The conditional entropy of the intensity of a central pixel given its neighbors is low in a “clean” natural image.
• As noise is added, this entropy increases.
[Figure: a central pixel X surrounded by its neighbors y1, y2, ..., y24]
To denoise, you can minimize the following quantity at each pixel:
Overview of UINTA algorithm
• For each pixel location i, we seek to minimize the following quantity:
• For this, do a gradient descent (at each location) until convergence:
Mathematical details
• For image neighborhoods with n pixels, we first need to estimate probability density functions of random variables having n (or n-1) dimensions.
• Suppose the neighborhoods are denoted as follows:
• The expression for the PDF of Z is as follows:
Mathematical details
• The expression for the entropy is:
• The gradient descent is given on the following slide.
A projection vector that extracts only the dimension corresponding to the central pixel
Chain rule
Independent of value of x
Central pixel to be denoised
Neighborhood
Note! If you set the derivative of the conditional entropy to zero (you do this since you want to minimize the conditional entropy) and rearrange the terms, you get the NL-means update for denoising. So UINTA can be considered an iterated form of NL-means!
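The formulas on the UINTA slides were lost in extraction. The following is a sketch of the quantities they describe, in notation assumed here: z_i = (x_i, y_i) is the neighborhood of pixel i with central intensity x_i and remaining neighbors y_i, A_i is the set of neighborhoods used for the kernel density estimate at z_i, B is the set of pixels over which the entropy is averaged, G_\sigma is an isotropic Gaussian kernel, e is the projection vector that picks out the central-pixel dimension, and \lambda is the step size. The gradient keeps only the term in which z_i is the evaluation point.

```latex
\begin{align*}
h(X \mid Y) &= h(X, Y) - h(Y) = h(Z) - h(Y) \\
p(z_i) &\approx \frac{1}{|A_i|} \sum_{j \in A_i} G_\sigma(z_i - z_j), \qquad
G_\sigma(u) = \frac{1}{(2\pi\sigma^2)^{n/2}} \exp\!\left( -\frac{\|u\|^2}{2\sigma^2} \right) \\
h(Z) &\approx -\frac{1}{|B|} \sum_{i \in B} \log p(z_i) \\
\frac{\partial h(X \mid Y)}{\partial x_i}
  &= e^{\top} \frac{\partial h(Z)}{\partial z_i}
  \approx \frac{1}{|B| \, \sigma^2} \sum_{j \in A_i} w_{ij} \, (x_i - x_j), \qquad
w_{ij} = \frac{G_\sigma(z_i - z_j)}{\sum_{k \in A_i} G_\sigma(z_i - z_k)} \\
x_i &\leftarrow x_i - \lambda \, \frac{\partial h(X \mid Y)}{\partial x_i}
  && \text{(gradient descent)} \\
\frac{\partial h(X \mid Y)}{\partial x_i} = 0
  &\;\Longrightarrow\; x_i = \sum_{j \in A_i} w_{ij} \, x_j
  && \text{(NL-means update)}
\end{align*}
```

The term h(Y) does not depend on x_i, which is why only h(Z) contributes to the derivative; the last line uses the fact that the weights w_{ij} sum to one.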
Earlier work on non-local similarity
• A technique similar (in principle) to UINTA was developed by Popat and Picard in 1997.
• A training set of clean and degraded images was used to learn the joint probability density of degraded neighborhoods and clean central pixels.
• Given a noisy image, a pixel value is restored using an MAP estimate.
• Unlike UINTA, this method requires prior training.
Texture synthesis or completion: another use of non-local similarity
Ref: Efros and Leung, “Texture Synthesis by Non-parametric Sampling”
Remember: a texture image contains very high repetition of “similar” patches all over!
Method:
• For every pixel (x,y) that needs to be filled, collect valid neighboring intensity values.
• Search throughout the image to find “similar” neighborhoods.
• Assign the intensity at (x,y) as some weighted combination of such central pixel values.
• Free parameters: size of the neighborhood and the definition of “similar neighborhoods”.
• For pseudo-code, see http://graphics.cs.cmu.edu/people/efros/research/EfrosLeung.html
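The steps above can be sketched as follows. This is a simplified illustration, not the pseudo-code at the link: missing pixels are visited in raster order, neighborhoods are compared with an unweighted squared distance over the valid pixels only, “similar” means a distance within a factor (1 + eps) of the best match, and the central values of all similar neighborhoods are averaged. The function name and the parameters half and eps are introduced here. The code assumes half >= 1 and that at least one fully known neighborhood exists in the image.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def fill_missing(img, known, half=2, eps=0.1):
    """Fill pixels where known is False using similar neighborhoods from the image."""
    img = np.asarray(img, dtype=float)
    known = np.asarray(known, dtype=bool)
    k = 2 * half + 1
    # Candidate neighborhoods: every k x k window made only of known pixels.
    windows = sliding_window_view(img, (k, k))
    full = sliding_window_view(known, (k, k)).all(axis=(2, 3))
    cands = windows[full]                   # shape (M, k, k)
    centers = cands[:, half, half]
    out = np.pad(img, half)                 # zero padding, marked invalid below
    valid = np.pad(known, half)             # padded entries are False
    for r, c in zip(*np.nonzero(~known)):   # raster order over missing pixels
        mask = valid[r:r + k, c:c + k]
        target = np.where(mask, out[r:r + k, c:c + k], 0.0)
        # Squared distance to every candidate, using valid pixels only.
        d = (((cands - target) ** 2) * mask).sum(axis=(1, 2))
        similar = d <= (1.0 + eps) * d.min() + 1e-12
        out[r + half, c + half] = centers[similar].mean()
        valid[r + half, c + half] = True    # a filled pixel counts as valid afterwards
    return out[half:-half, half:-half]

# Hypothetical texture: vertical stripes with period 4 and a 3 x 3 hole.
texture = np.tile(np.arange(16) % 4, (16, 1)).astype(float)
known = np.ones(texture.shape, dtype=bool)
known[6:9, 6:9] = False
damaged = np.where(known, texture, 0.0)
restored = fill_missing(damaged, known)
```

On this perfectly periodic texture the best-matching neighborhoods agree exactly with the valid pixels around the hole, so the missing stripes are reproduced.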
Some more results
Something similar in Natural Language Processing
• Collect sequences of n consecutive words (or letters) from a large corpus of English text (e.g. newspapers, books, etc.).
• Compute the probability of occurrence of the (n+1)-th word given a preceding sequence of n words.
• Sampling from such a conditional probability table allows the construction of plausible English-like text.
Ref: Shannon, “A mathematical theory of communication”, 1948
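A small sketch of this procedure at the word level. The toy corpus and the function names are made up for illustration. Storing every observed follower in a list (with repetitions) means that a uniform random choice from the list samples the (n+1)-th word with its empirical conditional probability.

```python
import random
from collections import defaultdict

def build_table(words, n):
    """Map each sequence of n consecutive words to the list of words that followed it."""
    table = defaultdict(list)
    for i in range(len(words) - n):
        table[tuple(words[i:i + n])].append(words[i + n])
    return table

def generate(table, seed_words, length, rng):
    """Extend seed_words by sampling up to `length` further words from the table."""
    out = list(seed_words)
    n = len(seed_words)
    for _ in range(length):
        followers = table.get(tuple(out[-n:]))
        if not followers:      # context never seen in the corpus: stop
            break
        out.append(rng.choice(followers))
    return out

corpus = "the cat sat on the mat and the cat ate the fish on the mat".split()
table = build_table(corpus, n=2)
text = generate(table, ("the", "cat"), length=8, rng=random.Random(0))
```

Every run of three consecutive words in the generated text also occurs somewhere in the corpus, which is the textual counterpart of copying the central pixel of a matching neighborhood in texture synthesis.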