PCA vs ICA vs LDA
How to represent images?
• Why are representation methods needed?
  – Curse of dimensionality (width x height x channels)
  – Noise reduction
  – Signal analysis & visualization
• Representation methods
  – Representation in the frequency domain: linear transforms
    • DFT, DCT, DST, DWT, …
    • Used as compression methods
  – Subspace derivation
    • PCA, ICA, LDA
    • Linear transforms derived from training data
  – Feature extraction methods
    • Edge (line) detection
    • Feature maps obtained by filtering
    • Gabor transform
    • Active contours (snakes)
    • …
What is subspace? (1/2)
• Find a basis in a low-dimensional subspace:
  − Approximate vectors by projecting them into a low-dimensional subspace:
    (1) Original space representation:
        $x = a_1 v_1 + a_2 v_2 + \dots + a_N v_N$
        where $v_1, v_2, \dots, v_N$ is a basis in the original N-dimensional space
    (2) Lower-dimensional subspace representation:
        $\hat{x} = b_1 u_1 + b_2 u_2 + \dots + b_K u_K$
        where $u_1, u_2, \dots, u_K$ is a basis in the K-dimensional subspace (K < N)
• Note: if K = N, then $\hat{x} = x$
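A minimal NumPy sketch of the two representations above; the random orthonormal basis, the sizes N = 8 and K = 3, and all variable names are illustrative assumptions, not from the slides:

```python
import numpy as np

rng = np.random.default_rng(0)
N, K = 8, 3                      # original dimension and subspace dimension (K < N)

# An orthonormal basis v_1..v_N of the original space (here: random, via QR)
V, _ = np.linalg.qr(rng.normal(size=(N, N)))
U = V[:, :K]                     # keep only u_1..u_K as the subspace basis

x = rng.normal(size=N)           # an arbitrary vector

a = V.T @ x                      # coefficients a_i = v_i^T x (exact representation)
b = U.T @ x                      # coefficients b_i = u_i^T x (subspace representation)

x_exact = V @ a                  # x    = a_1 v_1 + ... + a_N v_N
x_hat   = U @ b                  # x^   = b_1 u_1 + ... + b_K u_K (an approximation)

print(np.allclose(x, x_exact))   # True: using all N basis vectors recovers x exactly
print(np.linalg.norm(x - x_hat)) # > 0: projection error when K < N
```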
What is subspace? (2/2)
• Example (K = N):
PRINCIPAL COMPONENT ANALYSIS (PCA)
Why Principal Component Analysis?
• Motive
  – Find bases which have high variance in the data
  – Encode the data with a small number of bases with low MSE
Derivation of PCs
• Assume that $E[x] = 0$.
• Projection onto a unit vector q: $a = q^T x = x^T q$
• Variance of the projection:
  $\sigma^2 = E[a^2] = E[(q^T x)(x^T q)] = q^T E[x x^T] q = q^T R q$
• Find the q's maximizing this, subject to $\|q\| = (q^T q)^{1/2} = 1$
• The solution is the eigenvalue problem $R q_j = \lambda_j q_j$, i.e.
  $R Q = Q \Lambda$, where $Q = [q_1, q_2, \dots, q_j, \dots, q_m]$,
  $\Lambda = \mathrm{diag}[\lambda_1, \lambda_2, \dots, \lambda_j, \dots, \lambda_m]$, $j = 1, 2, \dots, m$
• Principal components q can be obtained by eigenvector decomposition, e.g. via SVD!
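As a sanity check on this derivation, the sketch below (synthetic zero-mean data; all names are illustrative) estimates $R = E[x x^T]$, eigendecomposes it, and confirms that the projected variance $q^T R q$ is maximized by the leading eigenvector:

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(1000, 5)) @ rng.normal(size=(5, 5))   # correlated samples
X -= X.mean(axis=0)                                        # enforce E[x] = 0

R = X.T @ X / len(X)                  # correlation matrix R = E[x x^T]
eigvals, Q = np.linalg.eigh(R)        # R Q = Q Lambda (eigh: ascending eigenvalues)
q1 = Q[:, -1]                         # eigenvector with the largest eigenvalue

var_along_q1 = q1 @ R @ q1            # sigma^2 = q^T R q for the first PC
q_rand = rng.normal(size=5)
q_rand /= np.linalg.norm(q_rand)      # a random unit vector for comparison
var_along_rand = q_rand @ R @ q_rand

print(var_along_q1 >= var_along_rand)          # True: q1 maximizes the projected variance
print(np.isclose(var_along_q1, eigvals[-1]))   # True: the maximum equals lambda_max
```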
Dimensionality Reduction (1/2)
[Bar chart: variance (%) explained by each principal component, PC1–PC10]
Can ignore the components of less significance. You do lose some information, but if the eigenvalues are small, you don't lose much (a sketch follows after this list):
  – n dimensions in the original data
  – calculate the n eigenvectors and eigenvalues
  – choose only the first p eigenvectors, based on their eigenvalues
  – the final data set has only p dimensions
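The four steps above, sketched with NumPy; the helper name pca_reduce, the array shapes, and the choice p = 3 are assumptions for illustration:

```python
import numpy as np

def pca_reduce(X, p):
    """Project the n-dimensional rows of X onto the first p principal components."""
    mean = X.mean(axis=0)
    Xc = X - mean                             # center the data
    R = np.cov(Xc, rowvar=False)              # n x n covariance matrix
    eigvals, eigvecs = np.linalg.eigh(R)      # n eigenvectors and eigenvalues
    order = np.argsort(eigvals)[::-1]         # sort by decreasing eigenvalue
    Up = eigvecs[:, order[:p]]                # keep only the first p eigenvectors
    return Xc @ Up, Up, mean                  # the final data set has only p dimensions

X = np.random.default_rng(2).normal(size=(200, 10))
Y, Up, mean = pca_reduce(X, p=3)
print(Y.shape)                                # (200, 3)
```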
Dimensionality Reduction (2/2)
[Plot: retained variance vs. dimensionality]
Reconstruction from PCs
[Figure: an image reconstructed from q = 1, 2, 4, 8, 16, 32, 64, 100, … principal components, alongside the original image]
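A rough sketch of what the figure illustrates: reconstructing data from its first q principal components via $\hat{x} = \text{mean} + U_q U_q^T (x - \text{mean})$. The synthetic data and the helper name are assumptions:

```python
import numpy as np

def reconstruct(X, q):
    """Approximate each row of X from its first q principal components."""
    mean = X.mean(axis=0)
    Xc = X - mean
    _, eigvecs = np.linalg.eigh(np.cov(Xc, rowvar=False))
    Uq = eigvecs[:, ::-1][:, :q]          # first q eigenvectors (largest eigenvalues)
    return mean + Xc @ Uq @ Uq.T          # x^ = mean + U_q U_q^T (x - mean)

X = np.random.default_rng(3).normal(size=(100, 64))
for q in (1, 4, 16, 64):
    err = np.linalg.norm(X - reconstruct(X, q))
    print(q, err)                         # error shrinks as q grows; q = 64 is (near) exact
```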
LINEAR DISCRIMINANT ANALYSIS (LDA)
Limitations of PCA
Are the maximal-variance dimensions the relevant dimensions for preservation?
Linear Discriminant Analysis (1/6)
• What is the goal of LDA?
  − Perform dimensionality reduction "while preserving as much of the class discriminatory information as possible".
  − Seeks to find directions along which the classes are best separated.
  − Takes into consideration not only the scatter within classes but also the scatter between classes.
  − For example, in face recognition, LDA is more capable of distinguishing image variation due to identity from variation due to other sources such as illumination and expression.
Linear Discriminant Analysis (2/6)
• Scatter matrices (a sketch follows below):
  Within-class scatter matrix:  $S_w = \sum_{i=1}^{c} \sum_{j=1}^{n_i} (Y_j - M_i)(Y_j - M_i)^T$
  Between-class scatter matrix: $S_b = \sum_{i=1}^{c} (M_i - M)(M_i - M)^T$
  where $Y_j$ are the samples of class i, $M_i$ is the mean of class i, and M is the global mean.
  Projection: $y = U^T x$, where U is the projection matrix.
− LDA computes a transformation that maximizes the between-class scatter while minimizing the within-class scatter:
  $\max \dfrac{|\tilde{S}_b|}{|\tilde{S}_w|} = \max \dfrac{|U^T S_b U|}{|U^T S_w U|}$   (products of eigenvalues!)
  where $\tilde{S}_b, \tilde{S}_w$ are the scatter matrices of the projected data y.
  The columns of U are found from $S_w^{-1} S_b U = U \Lambda$.
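A small sketch of these definitions; the function name and the toy three-class data are illustrative, $S_b$ is left unweighted exactly as on the slide, and $S_w$ is assumed nonsingular (this assumption is discussed next):

```python
import numpy as np

def lda_directions(X, labels, p):
    """Most discriminative directions U from the scatter matrices S_w and S_b."""
    classes = np.unique(labels)
    n_feat = X.shape[1]
    M = X.mean(axis=0)                               # global mean
    Sw = np.zeros((n_feat, n_feat))                  # within-class scatter
    Sb = np.zeros((n_feat, n_feat))                  # between-class scatter
    for c in classes:
        Yi = X[labels == c]
        Mi = Yi.mean(axis=0)                         # class mean M_i
        Sw += (Yi - Mi).T @ (Yi - Mi)
        Sb += np.outer(Mi - M, Mi - M)
    # Conventional eigenvalue problem S_w^{-1} S_b U = U Lambda (assumes S_w nonsingular)
    eigvals, eigvecs = np.linalg.eig(np.linalg.solve(Sw, Sb))
    order = np.argsort(eigvals.real)[::-1]
    return eigvecs.real[:, order[:p]]                # at most C-1 useful columns

rng = np.random.default_rng(4)
means = np.array([[0, 0, 0], [3, 0, 0], [0, 3, 0]], dtype=float)
X = np.vstack([rng.normal(loc=m, size=(50, 3)) for m in means])
labels = np.repeat([0, 1, 2], 50)
U = lda_directions(X, labels, p=2)                   # C = 3 classes -> at most 2 directions
Y = X @ U                                            # projected data y = U^T x
print(Y.shape)                                       # (150, 2)
```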
Linear Discriminant Analysis (3/6)
− c.f. Since $S_b$ has at most rank C−1, the max number of eigenvectors with non-zero eigenvalues is C−1 (i.e., the max dimensionality of the subspace is C−1)
• Does $S_w^{-1}$ always exist?
  − If $S_w$ is non-singular, we can obtain a conventional eigenvalue problem by writing: $S_w^{-1} S_b U = U \Lambda$
  − In practice, $S_w$ is often singular since the data are image vectors with large dimensionality while the size of the data set is much smaller (M << N)
Linear Discriminant Analysis (4/6)
• Does $S_w^{-1}$ always exist? (cont.)
  − To alleviate this problem, we can use PCA first (a sketch follows below):
    1) PCA is first applied to the data set to reduce its dimensionality.
    2) LDA is then applied to find the most discriminative directions.
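A possible sketch of this two-step recipe on toy "image-like" data where M << N, so that $S_w$ would be singular without the PCA step; all names, sizes, and the SVD-based PCA are assumptions:

```python
import numpy as np

def pca_then_lda(X, labels, k_pca, p_lda):
    """PCA to k_pca dims (making S_w nonsingular), then LDA to p_lda dims."""
    # 1) PCA via SVD of the centered data (right singular vectors = principal axes)
    mean = X.mean(axis=0)
    Xc = X - mean
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    W_pca = Vt[:k_pca].T                              # N x k_pca
    Z = Xc @ W_pca                                    # M x k_pca reduced data

    # 2) LDA on the reduced data Z
    classes = np.unique(labels)
    Mg = Z.mean(axis=0)
    Sw = np.zeros((k_pca, k_pca))
    Sb = np.zeros((k_pca, k_pca))
    for c in classes:
        Zi = Z[labels == c]
        Mi = Zi.mean(axis=0)
        Sw += (Zi - Mi).T @ (Zi - Mi)
        Sb += np.outer(Mi - Mg, Mi - Mg)
    eigvals, eigvecs = np.linalg.eig(np.linalg.solve(Sw, Sb))
    W_lda = eigvecs.real[:, np.argsort(eigvals.real)[::-1][:p_lda]]
    return W_pca @ W_lda                              # overall N x p_lda projection

# Toy data: N = 1024 "pixels", only M = 60 samples in 3 classes (S_w singular in raw space)
rng = np.random.default_rng(5)
class_templates = rng.normal(size=(3, 1024)) * 3
labels = np.repeat([0, 1, 2], 20)
X = class_templates[labels] + rng.normal(size=(60, 1024))
U = pca_then_lda(X, labels, k_pca=30, p_lda=2)
print(((X - X.mean(axis=0)) @ U).shape)               # (60, 2) discriminative features
```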
Linear Discriminant Analysis (5/6)
[Figure: example projections obtained by PCA vs. LDA]
D. Swets, J. Weng, "Using Discriminant Eigenfeatures for Image Retrieval", IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 18, no. 8, pp. 831-836, 1996
Linear Discriminant Analysis (6/6)
•
Factors unrelated to classification
−
MEF vectors show the tendency of PCA to capture major variations
in the training set such as lighting direction.
−
MDF vectors discount those factors unrelated to classification.
D. Swets, J. Weng, "Using Discriminant Eigenfeatures for Image
Retrieval", IEEE Transactions on Pattern Analysis and Machine
Intelligence, vol. 18, no. 8, pp. 831

836, 1996
INDEPENDENT COMPONENT ANALYSIS (ICA)
PCA vs ICA
• PCA
  – Focuses on uncorrelated, Gaussian components
  – Second-order statistics
  – Orthogonal transformation
• ICA
  – Focuses on independent, non-Gaussian components
  – Higher-order statistics
  – Non-orthogonal transformation
Independent Component Analysis (1/5)
• Concept of ICA
  – A given signal (x) is generated by linear mixing (A) of independent components (s): $x = As$
  – ICA is a statistical analysis method to estimate those independent components (z) and the unmixing rule (W): $z = Wx = WAs$
[Diagram: sources $s_1, \dots, s_M$ are mixed by A into observations $x_1, \dots, x_M$; the matrix W maps the observations to the estimates $z_1, \dots, z_M$]
• We know neither s nor A (both are unknown), so some optimization function is required!
Independent Component Analysis (2/5)
[Figure: the mixing matrix A maps the sources onto the observed data X; the unmixing matrix W (ideally $A^{-1}$) maps X back]
Independent Component Analysis (3/5)
• What is an independent component?
  – If one variable cannot be estimated from the other variables, it is independent.
  – By the Central Limit Theorem, a sum of two independent random variables is more Gaussian than the original variables → the distributions of the independent components are non-Gaussian (see the numerical check below).
  – To estimate the ICs, z should have a non-Gaussian distribution, i.e. we should maximize non-Gaussianity.
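A quick numerical illustration of the Central Limit Theorem argument; the uniform sources and the kurt helper are my own, using the kurtosis definition from the next slide:

```python
import numpy as np

def kurt(z):
    """kurt(z) = E{z^4} - 3 (E{z^2})^2 (zero for a Gaussian variable)."""
    z = z - z.mean()
    return np.mean(z ** 4) - 3 * np.mean(z ** 2) ** 2

rng = np.random.default_rng(6)
s1 = rng.uniform(-1, 1, 200_000)          # sub-Gaussian source (negative kurtosis)
s2 = rng.uniform(-1, 1, 200_000)          # an independent source
mix = (s1 + s2) / np.sqrt(2)              # sum, scaled to keep the same variance

print(kurt(s1))                           # clearly negative
print(kurt(mix))                          # closer to zero: the sum is "more Gaussian"
```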
Independent Component Analysis (4/5)
• What is non-Gaussianity?
  – Super-Gaussian
  – Sub-Gaussian
  – Low entropy
[Figure: Gaussian, super-Gaussian, and sub-Gaussian probability density functions]
Independent Component Analysis (5/5)
• Measuring non-Gaussianity by kurtosis
  – Kurtosis: the 4th-order cumulant of a random variable:
    $\mathrm{kurt}(z) = E\{z^4\} - 3\,(E\{z^2\})^2$
    • If kurt(z) is zero: Gaussian
    • If kurt(z) is positive: super-Gaussian
    • If kurt(z) is negative: sub-Gaussian
  – Maximization of kurt(z) by the gradient method:
    $\Delta w \propto \dfrac{\partial\,|\mathrm{kurt}(w^T x)|}{\partial w} = 4\,\mathrm{sign}(\mathrm{kurt}(w^T x))\,[\,E\{x\,(w^T x)^3\} - 3\,w\,\|w\|^2\,]$, then $w \leftarrow w / \|w\|$
    Since the term $-3\,w\,\|w\|^2$ simply changes the norm of w, it can be dropped:
    $\Delta w \propto \mathrm{sign}(\mathrm{kurt}(w^T x))\,E\{x\,(w^T x)^3\}$
  – Fast fixed-point algorithm:
    $w \leftarrow E\{x\,(w^T x)^3\} - 3\,w$, then normalize $w \leftarrow w / \|w\|$
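A minimal sketch of the fast fixed-point update on a toy two-source problem; the whitening step is not on the slide but is assumed as standard preprocessing, and all names and sizes are illustrative:

```python
import numpy as np

rng = np.random.default_rng(7)
n = 20_000
S = rng.uniform(-1, 1, size=(2, n))             # two independent sub-Gaussian sources
A = rng.normal(size=(2, 2))                     # unknown mixing matrix
X = A @ S                                       # observed mixtures x = A s

# Whitening (assumed preprocessing): afterwards E{x x^T} = I
X = X - X.mean(axis=1, keepdims=True)
d, E = np.linalg.eigh(np.cov(X))
X = E @ np.diag(d ** -0.5) @ E.T @ X

# Fast fixed-point iteration for one component: w <- E{x (w^T x)^3} - 3 w, then renormalize
w = rng.normal(size=2)
w /= np.linalg.norm(w)
for _ in range(200):
    w_new = (X * (w @ X) ** 3).mean(axis=1) - 3 * w
    w_new /= np.linalg.norm(w_new)
    if abs(w_new @ w) > 1 - 1e-10:              # converged (direction fixed up to sign)
        w = w_new
        break
    w = w_new

z = w @ X                                       # one estimated independent component
# |correlation| with one of the true sources should be close to 1
print(max(abs(np.corrcoef(z, S[0])[0, 1]), abs(np.corrcoef(z, S[1])[0, 1])))
```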
PCA vs LDA vs ICA
• PCA: suited to dimensionality reduction
• LDA: suited to pattern classification if the number of training samples per class is large
• ICA: suited to blind source separation, or to classification using ICs when the class IDs of the training data are not available
References
• Simon Haykin, "Neural Networks: A Comprehensive Foundation," 2nd Edition, Prentice Hall
• Marian Stewart Bartlett, "Face Image Analysis by Unsupervised Learning," Kluwer Academic Publishers
• A. Hyvärinen, J. Karhunen and E. Oja, "Independent Component Analysis," John Wiley & Sons, Inc.
• D. L. Swets and J. Weng, "Using Discriminant Eigenfeatures for Image Retrieval," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 18, no. 8, August 1996