# PCA vs ICA vs LDA

Artificial Intelligence and Robotics


How to represent images?

Why are representation methods needed?

Curse of dimensionality

width x height x channels

Noise reduction

Signal analysis & Visualization

Representation methods

Representation in the frequency domain: linear transform

DFT, DCT, DST, DWT, …

Used as compression methods

Subspace derivation

PCA, ICA, LDA

Linear transform derived from training data

Feature extraction methods

Edge (line) detection

Feature map obtained by filtering

Gabor transform

Active contours (Snakes)

What is a subspace? (1/2)

Find a basis in a low-dimensional sub-space:

Approximate vectors by projecting them onto a low-dimensional sub-space:

(1) Original space representation:

$$\mathbf{x} = a_1 \mathbf{v}_1 + a_2 \mathbf{v}_2 + \dots + a_N \mathbf{v}_N$$

where $\mathbf{v}_1, \mathbf{v}_2, \dots, \mathbf{v}_N$ is a basis of the original N-dimensional space.

(2) Lower-dimensional sub-space representation:

$$\hat{\mathbf{x}} = b_1 \mathbf{u}_1 + b_2 \mathbf{u}_2 + \dots + b_K \mathbf{u}_K$$

where $\mathbf{u}_1, \mathbf{u}_2, \dots, \mathbf{u}_K$ is a basis of the K-dimensional sub-space (K < N).

Note: if K = N, then $\hat{\mathbf{x}} = \mathbf{x}$.

What is a subspace? (2/2)

Example (K=N):
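As a minimal numerical sketch of these two representations (assuming an orthonormal basis, so every coefficient is just a dot product; the sizes N and K below are illustrative), the snippet recovers x exactly with all N coefficients and approximates it with only the first K:

```python
import numpy as np

N, K = 8, 3
rng = np.random.default_rng(0)

# Orthonormal basis v_1..v_N of the original N-dimensional space
# (columns of V), taken from the QR decomposition of a random matrix.
V, _ = np.linalg.qr(rng.normal(size=(N, N)))

x = rng.normal(size=N)

# (1) Original space representation: x = a_1 v_1 + ... + a_N v_N
a = V.T @ x                 # coefficients a_i = v_i^T x (orthonormal basis)
x_full = V @ a              # exact reconstruction

# (2) Sub-space representation with the first K basis vectors (K < N)
U = V[:, :K]                # u_1 .. u_K
b = U.T @ x                 # coefficients b_i = u_i^T x
x_hat = U @ b               # approximation of x in the K-dimensional sub-space

print(np.allclose(x, x_full))        # True: with K = N, x_hat equals x
print(np.linalg.norm(x - x_hat))     # approximation error for K < N
```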

## PRINCIPAL COMPONENT ANALYSIS (PCA)

Why Principal Component Analysis?

Motivation

Find bases which have high variance in the data.

Encode the data with a small number of basis vectors with low MSE.

Derivation of PCs

Assume that $E[\mathbf{x}] = \mathbf{0}$, and project $\mathbf{x}$ onto a unit vector $\mathbf{q}$:

$$a = \mathbf{q}^T \mathbf{x} = \mathbf{x}^T \mathbf{q}$$

The variance of the projection is

$$E[a^2] = E[(\mathbf{q}^T\mathbf{x})(\mathbf{x}^T\mathbf{q})] = \mathbf{q}^T E[\mathbf{x}\mathbf{x}^T]\,\mathbf{q} = \mathbf{q}^T \mathbf{R}\,\mathbf{q}$$

Find the q's maximizing this, subject to $\|\mathbf{q}\|^2 = \mathbf{q}^T\mathbf{q} = 1$.

The solutions are the eigenvectors of $\mathbf{R}$:

$$\mathbf{R}\mathbf{q}_j = \lambda_j \mathbf{q}_j, \qquad j = 1, 2, \dots, m$$

or, in matrix form, $\mathbf{R}\mathbf{Q} = \mathbf{Q}\boldsymbol{\Lambda}$ with $\mathbf{Q} = [\mathbf{q}_1, \dots, \mathbf{q}_j, \dots, \mathbf{q}_m]$ and $\boldsymbol{\Lambda} = \mathrm{diag}[\lambda_1, \dots, \lambda_j, \dots, \lambda_m]$.

The principal components q can be obtained by eigenvector decomposition, e.g. via SVD!
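A small numpy sketch of this derivation (illustrative, not from the slides): estimate R = E[xx^T] from zero-mean samples, take its eigenvectors, and check that each principal direction has unit norm and satisfies Rq = λq.

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(500, 5)) @ rng.normal(size=(5, 5))   # 500 samples, 5 dims
X = X - X.mean(axis=0)                                     # enforce E[x] = 0

R = (X.T @ X) / len(X)                  # sample estimate of R = E[x x^T]
eigvals, Q = np.linalg.eigh(R)          # R Q = Q Lambda (ascending order)
eigvals, Q = eigvals[::-1], Q[:, ::-1]  # sort PCs by decreasing variance

# Each column q_j is a unit-norm principal direction with variance lambda_j
j = 0
print(np.allclose(R @ Q[:, j], eigvals[j] * Q[:, j]))   # R q = lambda q
print(np.isclose(np.linalg.norm(Q[:, j]), 1.0))         # ||q|| = 1
```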

[Figure: variance (%) explained by each principal component, PC1 through PC10]
Dimensionality Reduction (1/2)

We can ignore the components of less significance.

You do lose some information, but if the eigenvalues are small, you don't lose much:

- n dimensions in the original data
- calculate n eigenvectors and eigenvalues
- choose only the first p eigenvectors, based on their eigenvalues
- the final data set has only p dimensions (see the sketch below)
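A hedged sketch of this selection step (the 95% threshold and the helper name `choose_p` are arbitrary illustrative choices): keep the smallest p whose leading eigenvalues explain the desired fraction of the total variance.

```python
import numpy as np

def choose_p(eigvals, threshold=0.95):
    """Smallest p whose leading eigenvalues explain `threshold` of the variance."""
    ratio = np.cumsum(eigvals) / np.sum(eigvals)
    return int(np.searchsorted(ratio, threshold) + 1)

# Eigenvalues sorted in decreasing order (e.g. from the PCA sketch above)
print(choose_p(np.array([5.0, 3.0, 1.0, 0.6, 0.4])))   # -> 4
# The reduced data set then keeps only the first p projections:
# X_reduced = X @ Q[:, :p]
```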

Dimensionality Reduction (2/2)

[Figure: retained variance as a function of the number of dimensions kept]

Reconstruction from PCs

[Figure: reconstructions of an image using q = 1, 2, 4, 8, 16, 32, 64, 100, … PCs, alongside the original image]
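A minimal sketch of such a reconstruction (illustrative; random low-rank data stands in for the vectorized images): project each sample onto the first q PCs, map it back, and watch the reconstruction error shrink as q grows.

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(200, 10)) @ rng.normal(size=(10, 100))   # low-rank "images"
X += 0.1 * rng.normal(size=(200, 100))                        # plus a little noise
mean = X.mean(axis=0)
Xc = X - mean

# Principal directions = right singular vectors of the centered data matrix
_, _, Vt = np.linalg.svd(Xc, full_matrices=False)

for q in (1, 2, 4, 8, 16, 32, 64, 100):
    U_q = Vt[:q].T                       # first q principal directions
    X_hat = mean + (Xc @ U_q) @ U_q.T    # project onto q PCs, then reconstruct
    err = np.linalg.norm(X - X_hat) / np.linalg.norm(X)
    print(f"q={q:3d}  relative reconstruction error = {err:.3f}")
```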

## LINEAR DISCRIMINANT ANALYSIS (LDA)

Limitations of PCA

Are the maximal variance dimensions the
relevant dimensions for preservation?

Linear Discriminant Analysis (1/6)

What is the goal of LDA?

Perform dimensionality reduction "while preserving as much of the class discriminatory information as possible".

Seeks to find directions along which the classes are best separated.

Takes into consideration not only the within-class scatter but also the between-class scatter.

For example, in face recognition, LDA is more capable of distinguishing image variation due to identity from variation due to other sources such as illumination and expression.

Linear Discriminant Analysis (2/6)

Within-class scatter matrix:

$$S_w = \sum_{i=1}^{c} \sum_{j=1}^{n_i} (Y_j - M_i)(Y_j - M_i)^T$$

Between-class scatter matrix:

$$S_b = \sum_{i=1}^{c} (M_i - M)(M_i - M)^T$$

where $Y_j$ denotes the j-th sample of class $i$, $M_i$ is the mean of class $i$, $M$ is the grand mean, and $c$ is the number of classes.

Projection: $\mathbf{y} = U^T \mathbf{x}$, where $U$ is the projection matrix.
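A numpy sketch of these definitions (illustrative; the function and variable names are mine, and the toy data is made up): compute S_w and S_b from labeled samples and take the leading eigenvectors of S_w^{-1} S_b as the columns of U, using the eigenvalue problem stated just below.

```python
import numpy as np

def lda_projection(X, y):
    """Scatter matrices S_w, S_b and projection U from S_w^{-1} S_b U = U Lambda."""
    classes = np.unique(y)
    M = X.mean(axis=0)                              # grand mean
    d = X.shape[1]
    S_w, S_b = np.zeros((d, d)), np.zeros((d, d))
    for c in classes:
        Xc = X[y == c]
        Mc = Xc.mean(axis=0)                        # class mean M_i
        S_w += (Xc - Mc).T @ (Xc - Mc)              # within-class scatter
        S_b += np.outer(Mc - M, Mc - M)             # between-class scatter
    eigvals, U = np.linalg.eig(np.linalg.inv(S_w) @ S_b)
    order = np.argsort(eigvals.real)[::-1]          # largest eigenvalues first
    return eigvals.real[order], U.real[:, order]

# Two well-separated classes in 3 dimensions -> at most C - 1 = 1 useful direction
rng = np.random.default_rng(3)
X = np.vstack([rng.normal(0, 1, (50, 3)), rng.normal(3, 1, (50, 3))])
y = np.array([0] * 50 + [1] * 50)
eigvals, U = lda_projection(X, y)
print(np.round(eigvals, 4))        # only the first eigenvalue is clearly non-zero
Y = X @ U[:, :1]                   # y = U^T x for every sample
```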

LDA computes a transformation that maximizes the between-class scatter while minimizing the within-class scatter:

$$\max_U \frac{|\tilde{S}_b|}{|\tilde{S}_w|} = \max_U \frac{|U^T S_b U|}{|U^T S_w U|}$$

(the determinants are products of eigenvalues!)

$\tilde{S}_b$, $\tilde{S}_w$: scatter matrices of the projected data $\mathbf{y}$.

The columns of $U$ are found from the eigenvalue problem $S_w^{-1} S_b\, U = U \Lambda$.
Linear Discriminant Analysis (3/6)

c.f. Since $S_b$ has at most rank C-1, the maximum number of eigenvectors with non-zero eigenvalues is C-1 (i.e., the maximum dimensionality of the sub-space is C-1).

Does $S_w^{-1}$ always exist?

If $S_w$ is non-singular, we can obtain a conventional eigenvalue problem by writing:

$$S_w^{-1} S_b\, U = U \Lambda$$

In practice, $S_w$ is often singular since the data are image vectors with large dimensionality while the size of the data set is much smaller (M << N).
Linear Discriminant Analysis (4/6)

Does $S_w^{-1}$ always exist? (cont.)

To alleviate this problem, we can use PCA first:

1) PCA is first applied to the data set to reduce its dimensionality.

2) LDA is then applied to find the most discriminative directions (a sketch of this two-step pipeline is given below).
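A hedged sketch of this two-step pipeline using scikit-learn (assuming it is available; the class count, sample sizes, and component numbers are arbitrary illustrative choices):

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

rng = np.random.default_rng(4)
# Stand-in for M = 60 high-dimensional image vectors (N = 400) from 3 classes,
# i.e. M << N, so S_w computed on the raw vectors would be singular.
X = rng.normal(size=(60, 400)) + np.repeat(np.arange(3), 20)[:, None]
y = np.repeat(np.arange(3), 20)

# 1) PCA reduces the dimensionality (keep fewer components than samples)
pca = PCA(n_components=30)
X_pca = pca.fit_transform(X)

# 2) LDA finds the most discriminative directions (at most C - 1 = 2 of them)
lda = LinearDiscriminantAnalysis(n_components=2)
X_lda = lda.fit_transform(X_pca, y)
print(X_lda.shape)   # (60, 2)
```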

Linear Discriminant Analysis (5/6)

D. Swets, J. Weng, "Using Discriminant Eigenfeatures for Image Retrieval", IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 18, no. 8, pp. 831-836, 1996.

[Figure: basis images obtained by PCA vs. LDA]

Linear Discriminant Analysis (6/6)

Factors unrelated to classification

MEF (Most Expressive Feature) vectors show the tendency of PCA to capture major variations in the training set, such as lighting direction.

MDF (Most Discriminating Feature) vectors discount those factors unrelated to classification.

D. Swets, J. Weng, "Using Discriminant Eigenfeatures for Image Retrieval", IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 18, no. 8, pp. 831-836, 1996.

## INDEPENDENT COMPONENT ANALYSIS (ICA)

PCA vs ICA

PCA

- Focus on uncorrelated and Gaussian components
- Second-order statistics
- Orthogonal transformation

ICA

- Focus on independent and non-Gaussian components
- Higher-order statistics
- Non-orthogonal transformation
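As a small illustration of the second-order vs. higher-order distinction (a made-up example, not from the slides): two variables can be perfectly uncorrelated, which is all second-order statistics can detect, while still being strongly dependent, which a higher-order statistic reveals.

```python
import numpy as np

rng = np.random.default_rng(7)
s = rng.uniform(-1, 1, 100_000)
t = s ** 2 - np.mean(s ** 2)     # deterministic function of s, so NOT independent

# Second-order statistics (what PCA uses): s and t are uncorrelated
print(round(float(np.mean(s * t)), 4))          # ~0.0  (covariance)

# A higher-order statistic (what ICA exploits) exposes the dependence
print(round(float(np.mean(s ** 2 * t)), 4))     # clearly non-zero
```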

Independent Component Analysis (1/5)

Concept of ICA

A given signal (x) is generated by linear mixing (A) of independent components (s).

ICA is a statistical analysis method to estimate those independent components (z) and the mixing rule (W).

[Figure: sources s = (s_1, ..., s_M) are mixed through A into observations x = (x_1, ..., x_M); the unmixing matrix W maps x to the estimates z = (z_1, ..., z_M)]

$$\mathbf{z} = W\mathbf{x} = WA\mathbf{s}$$

We do not know A or s; both are unknowns, so some optimization function is required!
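A tiny numpy illustration of this generative model (the sources and mixing matrix below are made up): two independent non-Gaussian sources s are mixed by A into observations x; ICA must estimate an unmixing matrix W from x alone, whereas here we cheat and form the ideal W = A^{-1} just to show the relation z = Wx = WAs = s.

```python
import numpy as np

rng = np.random.default_rng(5)
n = 2000

# Independent non-Gaussian sources (uniform and Laplacian)
s = np.vstack([rng.uniform(-1, 1, n), rng.laplace(0, 1, n)])

A = np.array([[1.0, 0.6],      # unknown mixing matrix
              [0.4, 1.0]])
x = A @ s                      # observed mixtures: x = A s

# With the true A known, the ideal unmixing matrix recovers the sources:
W = np.linalg.inv(A)
z = W @ x                      # z = W x = W A s = s
print(np.allclose(z, s))       # True; ICA has to find such a W without knowing A
```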

Independent Component Analysis (2/5)

[Figure: block diagram of mixing by A and unmixing by W (= A^{-1}) between the sources U and the observations X]
Independent Component Analysis (3/5)

What is an independent component?

If one variable cannot be estimated from the other variables, it is independent.

By the Central Limit Theorem, a sum of two independent random variables is more Gaussian than the original variables → the distributions of the independent components are non-Gaussian.

To estimate the ICs, z should have a non-Gaussian distribution, i.e. we should maximize non-Gaussianity.

Independent Component Analysis (4/5)

What is non-Gaussianity?

Super-Gaussian and sub-Gaussian distributions; both have lower entropy than a Gaussian of the same variance.

[Figure: example super-Gaussian, Gaussian, and sub-Gaussian densities]

Independent Component Analysis (5/5)

Measuring non-Gaussianity by kurtosis

Kurtosis: the 4th-order cumulant of a random variable:

$$\mathrm{kurt}(z) = E\{z^4\} - 3\,(E\{z^2\})^2$$

If kurt(z) is zero, Gaussian.

If kurt(z) is positive, super-Gaussian.

If kurt(z) is negative, sub-Gaussian.

Maximization of |kurt(z)| by the gradient method:

$$\frac{\partial\,|\mathrm{kurt}(\mathbf{w}^T\mathbf{x})|}{\partial\mathbf{w}} = 4\,\mathrm{sign}\!\left(\mathrm{kurt}(\mathbf{w}^T\mathbf{x})\right)\left[E\{\mathbf{x}(\mathbf{w}^T\mathbf{x})^3\} - 3\,\mathbf{w}\,\|\mathbf{w}\|^2\right], \qquad \mathbf{w} \leftarrow \mathbf{w}/\|\mathbf{w}\|$$

Fast fixed-point algorithm: since only the direction of w matters, simply replace the gradient step with

$$\mathbf{w} \leftarrow E\{\mathbf{x}(\mathbf{w}^T\mathbf{x})^3\} - 3\,\mathbf{w}$$

and then normalize the norm of w.
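Below is a minimal sketch of this one-unit fixed-point iteration (illustrative; it assumes the data have been whitened first, and the mixing setup reuses the earlier toy example):

```python
import numpy as np

rng = np.random.default_rng(6)
n = 5000

# Two independent non-Gaussian sources mixed into two observations
s = np.vstack([rng.uniform(-1, 1, n), rng.laplace(0, 1, n)])
x = np.array([[1.0, 0.6], [0.4, 1.0]]) @ s

# Whitening: zero mean, identity covariance (assumed by the fixed-point update)
x = x - x.mean(axis=1, keepdims=True)
d, E = np.linalg.eigh(np.cov(x))
x = E @ np.diag(d ** -0.5) @ E.T @ x

# One-unit fixed-point iteration with the kurtosis nonlinearity
w = rng.normal(size=2)
w /= np.linalg.norm(w)
for _ in range(100):
    w_new = (x * (w @ x) ** 3).mean(axis=1) - 3 * w   # w <- E{x (w^T x)^3} - 3 w
    w_new /= np.linalg.norm(w_new)                    # renormalize w
    if np.abs(w_new @ w) > 1 - 1e-9:                  # converged (up to sign)
        w = w_new
        break
    w = w_new

z = w @ x                                             # one estimated independent component
# z should match one of the original sources (up to sign and ordering)
corr = np.abs(np.corrcoef(np.vstack([z, s]))[0, 1:])
print(np.round(corr, 2))
```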

PCA vs LDA vs ICA

PCA: proper for dimension reduction.

LDA: proper for pattern classification if the number of training samples of each class is large.

ICA: proper for blind source separation, or for classification using ICs when the class IDs of the training data are not available.

References

Simon Haykin, "Neural Networks: A Comprehensive Foundation," 2nd Edition, Prentice Hall.

Marian Stewart Bartlett, "Face Image Analysis by Unsupervised Learning," Kluwer Academic Publishers, 2001.