Independent Component Analysis
CMPUT 466/551
Nilanjan Ray
The Origin of ICA: Factor Analysis
• Multivariate data are often thought of as indirect measurements arising from underlying sources that cannot be directly measured or observed.
• Examples:
  – Educational and psychological tests use the answers to questionnaires to measure the underlying intelligence and other mental abilities of subjects.
  – EEG brain scans measure the neuronal activity in various parts of the brain indirectly, via electromagnetic signals recorded at sensors placed at various positions on the head.
• Factor analysis is a classical technique developed in the statistical literature that aims at identifying these latent sources.
• Independent component analysis (ICA) is a kind of factor analysis that can uniquely identify the latent variables.
Latent Variables and Factor Analysis

Latent variable model:

  X_1 = a_11 S_1 + a_12 S_2 + … + a_1p S_p
  X_2 = a_21 S_1 + a_22 S_2 + … + a_2p S_p
  ⋮
  X_p = a_p1 S_1 + a_p2 S_2 + … + a_pp S_p

or, X = AS, where X is the observed variable, S holds the latent components, and A is the mixing matrix.

Factor analysis attempts to find out both the mixing coefficients and the latent components given some instances of the observed variables.
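The model X = AS can be sketched in a few lines of numpy (the source distribution and the 3×3 mixing matrix below are illustrative assumptions, not part of the slides):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical example: p = 3 observed variables from 3 latent sources.
N = 1000
S = rng.laplace(size=(3, N))          # latent components, one per row
A = rng.standard_normal((3, 3))       # mixing matrix

X = A @ S                             # observed variables: X = AS
print(X.shape)                        # (3, 1000)
```

Each column of X is one observed instance; factor analysis sees only X and must recover both A and S.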
Latent Variables and Factor Analysis…

Typically we require the latent variables to have unit variance and to be uncorrelated. Thus, in the model X = AS, cov(S) = I.

This representation has an ambiguity. Consider, for example, an orthogonal matrix R:

  X = AS = (A R^T)(R S) = A* S*,  with A* = A R^T and S* = R S.

Then

  cov(S*) = cov(R S) = R cov(S) R^T = R I R^T = R R^T = I.

So X = A* S* is also a factor model with unit-variance, uncorrelated latent variables. Classical factor analysis cannot remove this ambiguity; ICA can.
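The rotational ambiguity is easy to verify numerically (the rotation angle below is arbitrary):

```python
import numpy as np

rng = np.random.default_rng(1)

# Uncorrelated, unit-variance sources: cov(S) ≈ I
N = 100_000
S = rng.standard_normal((2, N))

# Any orthogonal R (here a 30-degree rotation) gives S* = RS with cov(S*) = R R^T = I
t = np.pi / 6
R = np.array([[np.cos(t), -np.sin(t)],
              [np.sin(t),  np.cos(t)]])
S_star = R @ S

print(np.round(np.cov(S), 2))       # ≈ identity
print(np.round(np.cov(S_star), 2))  # ≈ identity too: the rotation is invisible in second-order statistics
```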
Classical Factor Analysis

Model:

  X_1 = a_11 S_1 + a_12 S_2 + … + a_1q S_q + ε_1
  X_2 = a_21 S_1 + a_22 S_2 + … + a_2q S_q + ε_2
  ⋮
  X_p = a_p1 S_1 + a_p2 S_2 + … + a_pq S_q + ε_p

The ε_i's are zero-mean, uncorrelated Gaussian noise. Here q < p, i.e., the number of underlying latent factors is assumed less than the number of observed components.

The covariance matrix takes this form:

  Σ = A A^T + D,

where D is a diagonal matrix. Maximum likelihood estimation is used to estimate A.

However, the previous problem of ambiguity still remains here too…
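A small numpy check of the covariance structure Σ = A Aᵀ + D (the dimensions, loadings, and noise variances are illustrative):

```python
import numpy as np

rng = np.random.default_rng(2)

p, q, N = 4, 2, 200_000
A = rng.standard_normal((p, q))        # factor loadings
D = np.diag(rng.uniform(0.5, 1.0, p))  # diagonal noise variances

S = rng.standard_normal((q, N))        # latent factors, cov(S) = I
eps = np.sqrt(np.diag(D))[:, None] * rng.standard_normal((p, N))
X = A @ S + eps                        # classical factor model

# The sample covariance of X should match the model covariance A A^T + D
print(np.max(np.abs(np.cov(X) - (A @ A.T + D))))  # small
```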
Independent Component Analysis

Model: X = AS

• Step 1: Center the data:
    x_i ← x_i − x̄,  where x̄ = (1/N) Σ_{i=1}^N x_i
• Step 2: Whiten the data: compute the SVD of the centered N×p data matrix, X = U D V^T, and replace X by N^{1/2} U.
  – After whitening, in the factor model the covariance of x is cov(x) = I, and A becomes orthogonal.
• Step 3: Find an orthogonal A and unit-variance, non-Gaussian, independent S.
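Steps 1 and 2 can be sketched in numpy (the correlated data matrix below is synthetic):

```python
import numpy as np

rng = np.random.default_rng(3)

# Correlated data matrix X: N observations in rows, p variables in columns
N, p = 10_000, 2
X = rng.standard_normal((N, p)) @ np.array([[2.0, 0.0],
                                            [1.0, 0.5]])

# Step 1: center
X = X - X.mean(axis=0)

# Step 2: whiten via the SVD of the centered data matrix, X = U D V^T
U, D, Vt = np.linalg.svd(X, full_matrices=False)
Z = np.sqrt(N) * U        # whitened data: cov(Z) ≈ I

print(np.round(np.cov(Z, rowvar=False), 2))
```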
Example: PCA and ICA

Blind source separation (cocktail party problem). Model:

  [x_1]   [a_11  a_12] [s_1]
  [x_2] = [a_21  a_22] [s_2]

[Figure: scatter plots comparing the PCA and ICA directions found for the mixed data]
PCA vs. ICA

PCA:
1. Find projections to minimize reconstruction error
   • Variance of the projected data is as large as possible
2. 2nd-order statistics needed (cov(x))

ICA:
1. Find “interesting” projections
   – Projected data look as non-Gaussian and independent as possible
2. Higher-order statistics needed to measure the degree of independence
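The role of higher-order statistics can be illustrated with excess kurtosis, a fourth-order statistic that vanishes for Gaussian data (the distributions and sample size below are illustrative):

```python
import numpy as np

rng = np.random.default_rng(4)

def excess_kurtosis(x):
    """Fourth-order statistic: E[x^4]/E[x^2]^2 - 3 (zero for a Gaussian)."""
    x = x - x.mean()
    return np.mean(x**4) / np.mean(x**2)**2 - 3

N = 200_000
print(round(excess_kurtosis(rng.standard_normal(N)), 2))  # ≈ 0    (Gaussian)
print(round(excess_kurtosis(rng.uniform(-1, 1, N)), 2))   # ≈ -1.2 (sub-Gaussian)
print(round(excess_kurtosis(rng.laplace(size=N)), 2))     # ≈ 3    (super-Gaussian)
```

Second-order statistics (the covariance) cannot distinguish these three cases once the data are whitened; a fourth-order statistic like this can.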
Computing ICA

Model: X = AS

Step 3: find an orthogonal A and unit-variance, non-Gaussian, independent S. The computational approaches are mostly based on information-theoretic criteria:
• Kullback–Leibler (KL) divergence
• Negentropy

Another, more recently emerged approach is called the “Product Density Approach”.
ICA: KL Divergence Criterion

• x is zero-mean and whitened.
• KL divergence measures the “distance” between two probability densities.
  – Find A such that the KL divergence between the joint density of S and the product of its independent (marginal) densities is minimized.

H is the differential entropy:

  H(y) = −∫ f(y) log(f(y)) dy = −E[log(f(y))]
ICA: KL Divergence Criterion…

• The theorem for random-variable transformation says that for S = A^T x with orthogonal A,

  H(S) = H(x) + log |det(A^T)| = H(x).

So the KL divergence between the joint density and the product of the marginal densities is

  KL = Σ_j H(S_j) − H(S) = Σ_j H(S_j) − H(x).

Hence, since H(x) does not depend on A, minimize Σ_j H(S_j) with respect to orthogonal A.
ICA: Negentropy Criterion

• Differential entropy H(·) is not invariant to scaling of the variable.
• Negentropy is a scale-normalized version of H(·):

  J(S) = H(Z) − H(S),

  where Z is a Gaussian random variable with the same variance as S.
• Negentropy measures the departure of a r.v. S from a Gaussian r.v. with the same variance.
• Optimization criterion: find the orthogonal A maximizing the total negentropy Σ_j J(S_j) of the components.
ICA: Negentropy Criterion…

• Approximate the negentropy from data. A commonly used approximation (the original slide’s formula was lost; this is the standard one based on a nonquadratic contrast function G) is

  J(S) ≈ [ E{G(S)} − E{G(Z)} ]^2,

  where Z is standard Gaussian and, for example, G(u) = (1/a) log cosh(a u).
• FastICA (http://www.cis.hut.fi/projects/ica/fastica/) is based on negentropy. Free software in Matlab, C++, Python…
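A compact numpy sketch of a symmetric FastICA-style fixed-point iteration with g(u) = tanh(u). This is an illustrative implementation, not the FastICA package itself; the sources, mixing matrix, and iteration count are assumptions.

```python
import numpy as np

rng = np.random.default_rng(5)

# Two independent non-Gaussian (uniform) sources with unit variance, mixed by A
N = 20_000
S = rng.uniform(-np.sqrt(3), np.sqrt(3), size=(2, N))
A = np.array([[1.0, 0.5],
              [0.5, 1.0]])
X = A @ S

# Center and whiten (eigendecomposition of the covariance)
X = X - X.mean(axis=1, keepdims=True)
d, E = np.linalg.eigh(np.cov(X))
Z = E @ np.diag(d ** -0.5) @ E.T @ X          # cov(Z) ≈ I

# Symmetric fixed-point iteration: w <- E[z g(w^T z)] - E[g'(w^T z)] w, all rows at once
W = rng.standard_normal((2, 2))
for _ in range(100):
    G = np.tanh(W @ Z)
    W_new = (G @ Z.T) / N - np.diag((1 - G**2).mean(axis=1)) @ W
    U, _, Vt = np.linalg.svd(W_new)           # symmetric decorrelation:
    W = U @ Vt                                # W <- (W W^T)^{-1/2} W, keeps W orthogonal

S_hat = W @ Z   # recovered sources, up to sign and permutation
C = np.abs(np.corrcoef(np.vstack([S_hat, S]))[:2, 2:])
print(np.round(C, 2))  # each row should have one entry near 1
```

The symmetric decorrelation step enforces the orthogonality of the unmixing matrix that whitening guarantees for the true A.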
ICA Filter Bank for Image Processing

An image patch is modeled as a weighted sum of basis images (basis functions):

  x = As = [a_1 a_2 … a_N] s,

where x is the image patch and the columns a_i of A are the basis functions (a.k.a. the ICA filter bank). Inverting the model,

  s = A^{-1} x = A^T x   (A is orthogonal after whitening),

so the rows of A^T, i.e., the columns of A, are the filters, and the entries of s are the filter responses.

Jenssen and Eltoft, “ICA filter bank for segmentation of textured images,” 4th International Symposium on ICA and BSS, Nara, Japan, 2003.
Texture and ICA Filter Bank

[Figure: training textures and the 12x12 ICA basis functions (ICA filters) learned from them]
Segmentation By ICA FB

Pipeline: image I → ICA filter bank with n filters → filter-response images I_1, I_2, …, I_n → clustering → segmented image C.

The above is an unsupervised setting. Segmentation (i.e., classification in this context) can also be performed by a supervised method on the output feature images I_1, I_2, …, I_n.
[Figure: a texture image and its segmentation]
[Figure: the corresponding filter responses]
On PCA and ICA

• PCA and ICA differ in choosing projection directions:
  – Different principles: least squares (PCA) vs. independence (ICA)
• For data compression, PCA would be a good choice.
• For discovering structure in data, ICA would be a reasonable choice.