Independent Component Analysis



CMPUT 466/551

Nilanjan Ray

The Origin of ICA: Factor Analysis


Multivariate data are often thought to be indirect measurements
arising from some underlying sources, which cannot be directly
measured/observed.


Examples


Educational and psychological tests use the answers to questionnaires
to measure the underlying intelligence and other mental abilities of
subjects


EEG brain scans measure the neuronal activity in various parts of the
brain indirectly via electromagnetic signals recorded at sensors placed
at various positions on the head.


Factor analysis is a classical technique developed in the statistical literature that aims at identifying these latent sources.


Independent component analysis (ICA) is a kind of factor analysis
that can uniquely identify the latent variables.

Latent Variables and Factor Analysis

Latent variable model:

X_1 = a_{11} S_1 + a_{12} S_2 + … + a_{1p} S_p
X_2 = a_{21} S_1 + a_{22} S_2 + … + a_{2p} S_p
⋮
X_p = a_{p1} S_1 + a_{p2} S_2 + … + a_{pp} S_p

or, in matrix form, X = A S.

X: observed variables; S: latent components; A: mixing matrix.

Factor analysis attempts to find out both the mixing coefficients and the latent components, given some instances of the observed variables.

Latent Variables and Factor Analysis…

Typically we require the latent variables to have unit variance and to be uncorrelated. Thus, in the model X = A S, cov(S) = I.

This representation has an ambiguity. Consider, for example, an orthogonal matrix R:

X = A S = (A R^T)(R S) = A* S*,  with A* = A R^T and S* = R S.

cov(S*) = cov(R S) = R cov(S) R^T = R I R^T = R R^T = I.

So X = A* S* is also a factor model with unit-variance, uncorrelated latent variables.

Classical factor analysis cannot remove this ambiguity; ICA can remove this ambiguity.
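A minimal numerical sketch of this ambiguity (assuming NumPy; the mixing matrix, the rotation angle, and the simulated sources are made-up illustrations): rotating the sources by any orthogonal R and absorbing R^T into the mixing matrix leaves both the observations and the source covariance unchanged.

```python
import numpy as np

rng = np.random.default_rng(0)

# Made-up example: 2 unit-variance, uncorrelated sources, 10,000 samples.
S = rng.standard_normal((2, 10_000))
A = np.array([[1.0, 0.5],
              [0.3, 2.0]])                 # arbitrary mixing matrix
X = A @ S

# Any orthogonal R yields an equally valid factor model.
theta = 0.7
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
A_star = A @ R.T
S_star = R @ S

print(np.allclose(X, A_star @ S_star))     # True: identical observations
print(np.round(np.cov(S_star), 1))         # ~identity: still unit-variance, uncorrelated
```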

Classical Factor Analysis

Model:

X_1 = a_{11} S_1 + a_{12} S_2 + … + a_{1q} S_q + ε_1
X_2 = a_{21} S_1 + a_{22} S_2 + … + a_{2q} S_q + ε_2
⋮
X_p = a_{p1} S_1 + a_{p2} S_2 + … + a_{pq} S_q + ε_p

The ε_j's are zero-mean, uncorrelated Gaussian noise.

q < p, i.e., the number of underlying latent factors is assumed to be less than the number of observed components.

The covariance matrix takes the form cov(X) = A A^T + D, where D is a diagonal matrix.

Maximum likelihood estimation is used to estimate A.

However, the previous problem of ambiguity still remains here too…
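A small numerical sketch of the covariance structure above (assuming NumPy; the dimensions, mixing matrix A, and noise variances are made-up): simulating the classical factor model and checking that the sample covariance of X matches A A^T + D.

```python
import numpy as np

rng = np.random.default_rng(1)

p, q, N = 4, 2, 200_000
A = rng.standard_normal((p, q))                    # made-up p x q mixing matrix
d = rng.uniform(0.1, 0.5, size=p)                  # noise variances (diagonal of D)

S = rng.standard_normal((q, N))                    # unit-variance, uncorrelated factors
eps = rng.standard_normal((p, N)) * np.sqrt(d)[:, None]   # zero-mean Gaussian noise
X = A @ S + eps

empirical = np.cov(X)                              # sample covariance of the observations
theoretical = A @ A.T + np.diag(d)
print(np.round(empirical - theoretical, 1))        # approximately the zero matrix
```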

Independent Component Analysis


Step 1: Center data:

x_i ← x_i − x̄,  where x̄ = (1/N) ∑_{i=1}^{N} x_i

Step 2: Whiten data: compute the SVD of the centered data matrix,

[x_1, x_2, …, x_N]^T = U D V^T,

and replace the data by √N U (equivalently, x_i ← √N D^{-1} V^T x_i).

After whitening, in the factor model X = A S, the covariance of x is cov(x) = I, and A becomes orthogonal.

Step 3: Find an orthogonal A and unit-variance, non-Gaussian, independent S.

Model: X = A S
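A minimal NumPy sketch of Steps 1 and 2 (centering and SVD whitening), assuming the data matrix has one observation per row; the array names and the made-up test data are illustrative.

```python
import numpy as np

def center_and_whiten(X):
    """Center an (N x p) data matrix and whiten it via the SVD,
    so the returned data has (approximately) identity sample covariance."""
    N = X.shape[0]
    X_centered = X - X.mean(axis=0)                # Step 1: subtract the column means
    U, d, Vt = np.linalg.svd(X_centered, full_matrices=False)   # Step 2: X = U D V^T
    return np.sqrt(N) * U                          # whitened data: sqrt(N) * U

# Quick check on made-up correlated data.
rng = np.random.default_rng(0)
X = rng.standard_normal((5_000, 3)) @ np.array([[2.0, 0.0, 0.0],
                                                [1.0, 1.0, 0.0],
                                                [0.5, 0.3, 0.2]])
Xw = center_and_whiten(X)
print(np.round(np.cov(Xw, rowvar=False), 2))       # ~ identity matrix
```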

Example: PCA and ICA

Blind source separation (the cocktail party problem).

Model:

x_1 = a_{11} s_1 + a_{12} s_2
x_2 = a_{21} s_1 + a_{22} s_2
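A hedged sketch of this two-source separation problem using scikit-learn's FastICA (not the course's FastICA package); the square-wave and sawtooth sources and the 2x2 mixing matrix are made-up illustrations.

```python
import numpy as np
from sklearn.decomposition import FastICA

N = 5_000
t = np.linspace(0, 8, N)

# Two made-up non-Gaussian sources ("speakers").
s1 = np.sign(np.sin(3 * t))              # square wave
s2 = (t % 1.0) - 0.5                     # sawtooth wave
S = np.c_[s1, s2]

# Mix them with an arbitrary 2x2 mixing matrix: x = A s.
A = np.array([[1.0, 0.6],
              [0.4, 1.2]])
X = S @ A.T                              # observed "microphone" signals (N x 2)

# Recover the sources (up to permutation, sign, and scale).
ica = FastICA(n_components=2, random_state=0)
S_hat = ica.fit_transform(X)

# Each true source should correlate ~ +/-1 with exactly one estimated component.
print(np.round(np.corrcoef(S.T, S_hat.T), 2))
```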

PCA vs. ICA

PCA:
1. Find projections to minimize reconstruction error (the variance of the projected data is as large as possible).
2. 2nd-order statistics needed (cov(x)).

ICA:
1. Find "interesting" projections (the projected data look as non-Gaussian and independent as possible).
2. Higher-order statistics needed to measure the degree of independence.

Computing ICA

Step 3: Find an orthogonal A and unit-variance, non-Gaussian, independent S.

The computational approaches are mostly based on information-theoretic criteria:

Kullback-Leibler (KL) divergence
Negentropy

Another approach that has emerged recently is called the "Product Density Approach."

Model: X = A S

ICA: KL Divergence Criterion


x is zero-mean and whitened.

KL divergence measures the "distance" between two probability densities.

Find A such that KL(.) is minimized between the joint density f_S(s) of S = A^T x and the independent (product) density ∏_j f_{S_j}(s_j).

H is differential entropy:

H(y) = −∫ f(y) log(f(y)) dy = −E[log(f(y))]

ICA: KL Divergence Criterion…


The theorem for transformation of random variables says that, for S = A^T x with A orthogonal, f_S(s) = f_x(x) / |det(A^T)| = f_x(x).

So, H(S) = H(x).

Hence, KL(f_S ∥ ∏_j f_{S_j}) = ∑_j H(S_j) − H(S) = ∑_j H(S_j) − H(x), where H(x) does not depend on A.

Minimize ∑_j H(S_j) with respect to orthogonal A.

ICA: Negentropy Criterion

Differential entropy H(.) is not invariant to scaling of the variable.

Negentropy is a scale-normalized version of H(.):

J(S_j) = H(Z_j) − H(S_j),  where Z_j is a Gaussian random variable with the same variance as S_j.

Negentropy measures the departure of a random variable S_j from a Gaussian random variable with the same variance.

Optimization criterion: maximize ∑_j J(S_j) with respect to orthogonal A (for unit-variance components this is equivalent to minimizing ∑_j H(S_j)).


ICA: Negentropy Criterion…

Approximate the negentropy from data by

J(S_j) ≈ [E{G(S_j)} − E{G(Z)}]^2,

where G is a non-quadratic contrast function (e.g., G(u) = (1/a) log cosh(a u)) and Z is a standard Gaussian variable.

FastICA (http://www.cis.hut.fi/projects/ica/fastica/) is based on negentropy. Free software is available in Matlab, C++, Python, …
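A minimal sketch of this criterion (assuming NumPy; the uniform sources, mixing matrix, and contrast G(u) = log cosh(u) are illustrative): on whitened 2-D data the demixing matrix is a rotation, so one can scan the rotation angle and pick the one maximizing the summed approximate negentropy of the components.

```python
import numpy as np

rng = np.random.default_rng(0)

def G(u):
    # Non-quadratic contrast used in the negentropy approximation.
    return np.log(np.cosh(u))

def approx_negentropy(y):
    # J(y) ~ (E[G(y)] - E[G(z)])^2 for a standard Gaussian sample z.
    z = rng.standard_normal(y.size)
    return (G(y).mean() - G(z).mean()) ** 2

# Two made-up non-Gaussian (uniform) unit-variance sources, mixed and whitened.
S = rng.uniform(-np.sqrt(3), np.sqrt(3), size=(2, 20_000))
X = np.array([[1.0, 0.4], [0.3, 1.5]]) @ S
X -= X.mean(axis=1, keepdims=True)
U, d, Vt = np.linalg.svd(X, full_matrices=False)
Xw = np.sqrt(X.shape[1]) * np.diag(1.0 / d) @ U.T @ X    # whitened: cov ~ I

# Scan rotation angles; the best one maximizes the summed negentropy.
angles = np.linspace(0, np.pi / 2, 91)
scores = []
for theta in angles:
    R = np.array([[np.cos(theta), -np.sin(theta)],
                  [np.sin(theta),  np.cos(theta)]])
    Y = R @ Xw
    scores.append(approx_negentropy(Y[0]) + approx_negentropy(Y[1]))
best = angles[int(np.argmax(scores))]
print(f"best demixing rotation ~ {np.degrees(best):.1f} degrees")
```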

ICA Filter Bank for Image Processing

An image patch is modeled as a weighted sum of basis images (basis functions):

x = A s = [a_1, a_2, …, a_N] s

Here x is the image patch (written as a vector) and the columns a_j of A are the basis functions (a.k.a. the ICA filter bank).

s = A^{-1} x = A^T x

The rows of A^T (equivalently, the columns of A) act as filters, and s holds the filter responses.

Jenssen and Eltoft, "ICA filter bank for segmentation of textured images," 4th International Symposium on ICA and BSS, Nara, Japan, 2003.
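A hedged sketch of learning such a filter bank with scikit-learn's FastICA on random image patches; the image source, patch size, and number of filters are illustrative choices, not the paper's exact setup.

```python
import numpy as np
from sklearn.decomposition import FastICA

def learn_ica_filter_bank(image, patch_size=12, n_filters=16, n_patches=10_000, seed=0):
    """Learn ICA basis functions from random patches of a grayscale image
    and return the filters (rows of the estimated unmixing matrix)."""
    rng = np.random.default_rng(seed)
    H, W = image.shape
    ys = rng.integers(0, H - patch_size, n_patches)
    xs = rng.integers(0, W - patch_size, n_patches)
    patches = np.stack([image[y:y + patch_size, x:x + patch_size].ravel()
                        for y, x in zip(ys, xs)])
    patches = patches - patches.mean(axis=1, keepdims=True)   # remove patch mean (DC)
    ica = FastICA(n_components=n_filters, random_state=seed, max_iter=500)
    ica.fit(patches)
    # ica.components_ plays the role of the unmixing matrix (A^{-1} above):
    # filter responses for a centered patch x are roughly components_ @ x.
    return ica.components_.reshape(n_filters, patch_size, patch_size)

# Usage with a made-up random "texture" image:
image = np.random.default_rng(1).random((256, 256))
filters = learn_ica_filter_bank(image)
print(filters.shape)   # (16, 12, 12)
```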

Texture and ICA Filter Bank

Training textures

12x12 ICA basis functions or ICA filters

Jenssen and Eltoft, "ICA filter bank for segmentation of textured images," 4th International Symposium on ICA and BSS, Nara, Japan, 2003.

Segmentation By ICA FB

Pipeline: image I → ICA filter bank with n filters → filter-response images I_1, I_2, …, I_n → clustering → segmented image C.

The above is an unsupervised setting.

Segmentation (i.e., classification in this context) can also be performed by a supervised method on the output feature images I_1, I_2, …, I_n.

[Figure: a texture image, its filter responses, and the resulting segmentation]

Jenssen and Eltoft, "ICA filter bank for segmentation of textured images," 4th International Symposium on ICA and BSS, Nara, Japan, 2003.
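A hedged sketch of the unsupervised pipeline above, assuming SciPy and scikit-learn; the local-energy feature, smoothing window, and number of clusters are illustrative choices, and `filters` is the array from the earlier sketch.

```python
import numpy as np
from scipy.ndimage import convolve, uniform_filter
from sklearn.cluster import KMeans

def segment_with_ica_fb(image, filters, n_segments=3, smooth=15, seed=0):
    """Unsupervised texture segmentation: filter the image with each ICA filter,
    build a per-pixel feature vector from locally smoothed filter energies,
    and cluster the pixels with k-means."""
    responses = [convolve(image, f, mode="reflect") for f in filters]      # I_1, ..., I_n
    features = [uniform_filter(r ** 2, size=smooth) for r in responses]    # local energy
    feat = np.stack(features, axis=-1).reshape(-1, len(filters))
    labels = KMeans(n_clusters=n_segments, n_init=10, random_state=seed).fit_predict(feat)
    return labels.reshape(image.shape)                                     # segmented image C

# Usage (with `image` and `filters` from the earlier sketch):
# C = segment_with_ica_fb(image, filters, n_segments=2)
```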

On PCA and ICA


PCA and ICA differ in how they choose projection directions:

Different principles: least squares (PCA) vs. independence (ICA).

For data compression, PCA would be a good choice.

For discovering structure in the data, ICA would be a reasonable choice.