Independent Component Analysis

What is ICA?

"Independent component analysis (ICA) is a method for finding underlying factors or components from multivariate (multi-dimensional) statistical data. What distinguishes ICA from other methods is that it looks for components that are both statistically independent, and nongaussian."

A. Hyvärinen, J. Karhunen, E. Oja, 'Independent Component Analysis'

ICA

Blind Signal Separation (BSS) or Independent Component Analysis (ICA) is the identification & separation of mixtures of sources with little prior information.

Applications include:

Audio processing

Medical data

Finance

Array processing (beamforming)

Coding

… and most applications where Factor Analysis and PCA are currently used.

While PCA seeks the directions that represent the data best in a Σ ||x_0 − x||^2 sense, ICA seeks the directions that are most independent from each other.

Often used for time-series separation of multiple targets.


ICA estimation principles

by A. Hyvärinen, J. Karhunen, E. Oja, 'Independent Component Analysis'

Principle 1: "Nonlinear decorrelation. Find the matrix W so that for any i ≠ j, the components y_i and y_j are uncorrelated, and the transformed components g(y_i) and h(y_j) are uncorrelated, where g and h are some suitable nonlinear functions."

Principle 2: "Maximum nongaussianity. Find the local maxima of nongaussianity of a linear combination y = Wx under the constraint that the variance of y is constant."

Each local maximum gives one independent component.

ICA mathematical approach

from A. Hyvärinen, J. Karhunen, E. Oja, 'Independent Component Analysis'

"Given a set of observations of random variables x_1(t), x_2(t), …, x_n(t), where t is the time or sample index, assume that they are generated as a linear mixture of independent components: y = Wx, where W is some unknown matrix. Independent component analysis now consists of estimating both the matrix W and the y_i(t), when we only observe the x_i(t)."

The simple “Cocktail Party” Problem

Sources s_1, s_2 are turned into observations x_1, x_2 by the mixing matrix A:

x = As

n sources, m = n observations.
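As a concrete illustration, here is a minimal NumPy sketch of the two-source cocktail party model. The signals and the mixing matrix are invented for the example:

import numpy as np

t = np.linspace(0, 1, 1000)

# Two hypothetical independent sources: a sine tone and a sawtooth
s1 = np.sin(2 * np.pi * 5 * t)
s2 = 2 * (t * 3 % 1) - 1
S = np.vstack([s1, s2])            # shape (n_sources, n_samples)

# Example mixing matrix A (its entries would depend on mic/speaker distances)
A = np.array([[0.8, 0.3],
              [0.4, 0.7]])

X = A @ S                          # observations x = As, one row per microphone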

Classical ICA (FastICA) estimation

[Figure: three observed mixture signals V1, V2, V3 and the original source signal V4, amplitude vs. sample index 0–250.]

Motivation

Two independent sources, mixed at two mics:

x_1(t) = a_11 s_1 + a_12 s_2
x_2(t) = a_21 s_1 + a_22 s_2

The a_ij depend on the distances of the microphones from the speakers.

Motivation

Get the Independent Signals out of the Mixture

ICA Model (Noise Free)

Use a statistical "latent variables" model

Random variables s_k instead of time signals

x_j = a_j1 s_1 + a_j2 s_2 + … + a_jn s_n, for all j

x = As

The ICs s are latent variables and are unknown, AND the mixing matrix A is also unknown.

Task: estimate A and s using only the observable random vector x.

Let's assume that the number of ICs equals the number of observable mixtures, and that A is square and invertible.

So after estimating A, we can compute W = A^-1 and hence

s = Wx = A^-1 x
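Continuing the sketch above: if A were known, unmixing would be plain matrix inversion; the whole point of ICA is that W must be estimated blindly. A hypothetical check, reusing A, S, X from the earlier snippet:

# If A were known (in the ICA setting it is not), recovery is exact:
W = np.linalg.inv(A)
S_rec = W @ X                      # s = Wx = A^-1 x
assert np.allclose(S_rec, S)       # recovers the sources up to numerics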


Illustration

2 ICs with uniform distribution:

p(s_i) = 1/(2√3) if |s_i| ≤ √3, and 0 otherwise

Zero mean and variance equal to 1.

Mixing matrix A is

A = | 2  3 |
    | 2  1 |

The edges of the parallelogram (the joint density of x_1, x_2 is uniform on a parallelogram) are in the directions of the columns of A. So if we can estimate the joint pdf of x_1 & x_2 and then locate the edges, we can estimate A.
Restrictions

The s_i are statistically independent:

p(s_1, s_2) = p(s_1) p(s_2)

Nongaussian distributions:

The joint density of unit-variance gaussian s_1 & s_2,

p(x_1, x_2) = (1/(2π)) exp( −(x_1^2 + x_2^2) / 2 ),

is rotationally symmetric, so it doesn't contain any information about the directions of the columns of the mixing matrix A. Hence A can't be estimated.

If only one IC is gaussian, the estimation is still possible.
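A quick numerical illustration of why gaussian sources break ICA (a sketch; the rotation angle is an arbitrary choice): mixing standard gaussian sources with any orthogonal matrix leaves the joint distribution unchanged, so the data carry no trace of A.

import numpy as np

rng = np.random.default_rng(1)
S = rng.standard_normal((2, 100_000))         # two gaussian "sources"

theta = 0.7                                   # arbitrary rotation angle
A = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
X = A @ S

# The mixed data are again uncorrelated unit-variance gaussians:
print(np.cov(X))                              # ~ identity, same as np.cov(S)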

Ambiguities

Can't determine the variances (energies) of the ICs:

Both s and A are unknown, so any scalar multiple of one of the sources can always be cancelled by dividing the corresponding column of A by it: x = As = (A D^-1)(D s) for any diagonal scaling D.

Fix the magnitudes of the ICs by assuming unit variance: E{s_i^2} = 1. Only the ambiguity of sign remains.

Can't determine the order of the ICs:

The terms can be freely reordered, because both s and A are unknown. So we can call any IC the first one.

ICA Principle (Non-Gaussian is Independent)

The key to estimating A is nongaussianity.

The distribution of a sum of independent random variables tends toward a gaussian distribution (by the Central Limit Theorem).

[Figure: densities f(s_1), f(s_2), and f(x_1) = f(s_1 + s_2); the sum is closer to gaussian than either source.]

Consider y = w^T x = w^T As = z^T s, where w is one of the rows of matrix W and z = A^T w.

y is a linear combination of the s_i, with weights given by the z_i.

Since a sum of independent random variables is more gaussian than the individual variables, z^T s is more gaussian than any single s_i, and it becomes least gaussian when it equals one of the s_i.

So we could take w to be a vector that maximizes the nongaussianity of w^T x. Such a w corresponds to a z with only one nonzero component, so we get back one of the s_i.
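A small empirical check of this principle (a sketch with invented uniform sources; a uniform variable has negative excess kurtosis, and the mixture's kurtosis is closer to the gaussian value 0):

import numpy as np

def kurt(y):
    # Excess kurtosis: kurt(y) = E{y^4} - 3 (E{y^2})^2, zero for a gaussian
    return np.mean(y**4) - 3 * np.mean(y**2)**2

rng = np.random.default_rng(2)
s1 = rng.uniform(-np.sqrt(3), np.sqrt(3), 100_000)   # unit-variance uniform
s2 = rng.uniform(-np.sqrt(3), np.sqrt(3), 100_000)
mix = (s1 + s2) / np.sqrt(2)                         # unit-variance mixture

print(kurt(s1), kurt(s2))    # ~ -1.2 each
print(kurt(mix))             # ~ -0.6: closer to 0, i.e. more gaussian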



Measures of Non-Gaussianity

We need a quantitative measure of nongaussianity for ICA estimation.

Kurtosis: gauss = 0 (sensitive to outliers)

kurt(y) = E{y^4} − 3 (E{y^2})^2

Entropy: gauss = largest

H(y) = −∫ f(y) log f(y) dy

Negentropy: gauss = 0 (difficult to estimate)

J(y) = H(y_gauss) − H(y)

Approximations:

J(y) ≈ (1/12) E{y^3}^2 + (1/48) kurt(y)^2

J(y) ∝ [ E{G(y)} − E{G(v)} ]^2

where v is a standard gaussian random variable and:

G_1(u) = (1/a_1) log cosh(a_1 u),   G_2(u) = −exp(−u^2 / 2)
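A sketch of these estimators in NumPy (the sample-average approximations, the Monte Carlo estimate of E{G(v)}, and the constant a_1 = 1 are choices made here for illustration):

import numpy as np

def negentropy_logcosh(y, a1=1.0, n_gauss=1_000_000, seed=0):
    """Approximate J(y) ~ (E{G(y)} - E{G(v)})^2 with G(u) = log(cosh(a1*u))/a1."""
    y = (y - y.mean()) / y.std()                 # J assumes zero-mean, unit-variance y
    v = np.random.default_rng(seed).standard_normal(n_gauss)
    G = lambda u: np.log(np.cosh(a1 * u)) / a1
    return (G(y).mean() - G(v).mean()) ** 2

def negentropy_moments(y):
    """Approximate J(y) ~ E{y^3}^2 / 12 + kurt(y)^2 / 48."""
    y = (y - y.mean()) / y.std()
    kurt = np.mean(y**4) - 3.0                   # excess kurtosis at unit variance
    return np.mean(y**3)**2 / 12 + kurt**2 / 48

Both return approximately 0 for gaussian input and grow with nongaussianity.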




Data Centering & Whitening

Centering:

x = x′ − E{x′}

This doesn't mean that ICA cannot estimate the mean; it just simplifies the algorithm.

The ICs are also zero-mean, because E{s} = W E{x}.

After ICA, add W E{x′} back to the zero-mean ICs.

Whitening:

We transform the x's linearly so that the x~ are white (uncorrelated, unit variance). This is done by eigenvalue decomposition (EVD):

x~ = (E D^-1/2 E^T) x = E D^-1/2 E^T A s = A~ s

where E{x x^T} = E D E^T.

So we only have to estimate the orthonormal matrix A~.

An orthonormal matrix has n(n−1)/2 degrees of freedom, so for large-dimensional A we have to estimate only about half as many parameters. This greatly simplifies ICA.

Reducing the dimensionality of the data (keeping only the dominant eigenvalues) during whitening also helps.


Computing the pre-processing steps for ICA

0) Centering = make the signals centred at zero

x_i ← x_i − E[x_i] for each i

1) Sphering = make the signals uncorrelated, i.e. apply a transform V to x such that Cov(Vx) = I   // where Cov(y) = E[y y^T] denotes the covariance matrix

V = E[x x^T]^-1/2   // can be done using the 'sqrtm' function in MatLab

x ← Vx   // for all t (indexes t dropped here)

// bold lowercase refers to a column vector; bold uppercase to a matrix

Scope: to make the remaining computations simpler. It is known that independent variables must be uncorrelated, so this can be fulfilled before proceeding to the full ICA.
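A NumPy equivalent of these two steps (a sketch; the EVD-based inverse square root plays the role of MatLab's sqrtm applied to the inverse covariance, and a full-rank covariance is assumed):

import numpy as np

def preprocess(X):
    """Center and sphere X of shape (n_signals, n_samples). Returns Vx and V."""
    X = X - X.mean(axis=1, keepdims=True)     # 0) centering: x_i <- x_i - E[x_i]
    C = np.cov(X)                             # sample covariance E[x x^T]
    d, E = np.linalg.eigh(C)                  # EVD: C = E diag(d) E^T, assumes d > 0
    V = E @ np.diag(d ** -0.5) @ E.T          # V = E[x x^T]^(-1/2)
    Z = V @ X                                 # sphered data: Cov(Vx) = I
    return Z, V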

Fixed Point Algorithm

Input: X

Random init of W

Iterate until convergence:

S = g(W^T X)
W ← X S^T
W ← W (W^T W)^-1/2

Output: W, S

This is the fixed-point form of the stationarity condition of the objective

Obj = Σ_{t=1..T} G(W^T x_t) + Λ (W^T W − I)

∂Obj/∂W = X g(W^T X)^T − ΛW = 0

where g(.) is the derivative of G(.),

W is the rotation transform sought,

Λ is a Lagrange multiplier enforcing that W is an orthogonal transform, i.e. a rotation.

Solve by fixed-point iterations. The effect of Λ is an orthogonal de-correlation.

Aapo Hyvarinen (97)
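A compact sketch of this fixed-point loop (assumptions: whitened input Z from the preprocess step above, g = tanh as the nonlinearity, symmetric decorrelation via EVD, and a fixed iteration count instead of a convergence test; this follows the slide's simplified update, not the full FastICA update with the g′ term):

import numpy as np

def fixed_point_ica(Z, n_iter=200, seed=0):
    """Estimate the rotation W on whitened data Z of shape (n, T). Returns W, S."""
    n, T = Z.shape
    W = np.random.default_rng(seed).standard_normal((n, n))  # random init

    for _ in range(n_iter):
        S = np.tanh(W.T @ Z)                  # S = g(W^T X), here g = tanh
        W = Z @ S.T                           # W <- X S^T
        d, E = np.linalg.eigh(W.T @ W)
        W = W @ E @ np.diag(d ** -0.5) @ E.T  # W <- W (W^T W)^(-1/2): keep W orthogonal

    return W, W.T @ Z                         # unmixed sources S = W^T Z

On whitened mixtures, W.T @ Z should recover the sources up to sign and permutation, as discussed under Ambiguities.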

Computing the rotation step

This is based on the maximisation of an objective function G(.) which contains an approximate non-Gaussianity measure.

The overall transform that takes X back to S is then (W^T V).

There are several options for g(.); each works best in special cases. See the FastICA software / tutorial for details.

Application domains of ICA

Blind source separation (Bell & Sejnowski, Te-Won Lee, Girolami, Hyvarinen, etc.)

Image denoising (Hyvarinen)

Medical signal processing: fMRI, ECG, EEG (Makeig)

Modelling of the hippocampus and visual cortex (Lorincz, Hyvarinen)

Feature extraction, face recognition (Marni Bartlett)

Compression, redundancy reduction

Watermarking (D. Lowe)

Clustering (Girolami, Kolenda)

Time series analysis (Back, Valpola)

Topic extraction (Kolenda, Bingham, Kaban)

Scientific data mining (Kaban, etc.)

Image denoising

[Figure: original image, noisy image, Wiener-filtered result, ICA-filtered result.]

Noisy ICA Model

x = As + n

A … m×n mixing matrix
s … n-dimensional vector of ICs
n … m-dimensional random noise vector

Same assumptions as for the noise-free model, if we use measures of nongaussianity which are immune to gaussian noise.

So gaussian moments are used as contrast functions, i.e.

J(y) ∝ [ E{G(y)} − E{G(v)} ]^2, with G(x) = c exp(−x^2 / (2c^2))

However, in pre-whitening the effect of noise must be taken into account:

x~ = (E{x x^T} − Σ)^-1/2 x

x~ = B s + n~
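A sketch of this noise-adjusted whitening step (assumptions: the noise covariance Sigma is known or estimated elsewhere, and the adjusted covariance is positive definite; the function name is hypothetical):

import numpy as np

def noisy_whiten(X, Sigma):
    """Quasi-whiten X of shape (n, T): x~ = (E{xx^T} - Sigma)^(-1/2) x."""
    X = X - X.mean(axis=1, keepdims=True)
    C = np.cov(X) - Sigma                      # signal part of the covariance
    d, E = np.linalg.eigh(C)                   # assumes C is positive definite
    return (E @ np.diag(d ** -0.5) @ E.T) @ X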






Exercise (part 1, updated Nov 10)

How would you calculate the PCA efficiently for data where the dimensionality d is much larger than the number of vector observations n?

Download the Wisconsin data from the UC Irvine repository, extract the principal components from the data, inspect scatter plots of the original data and of the data projected onto the principal components, and plot the eigenvalues.

Ex1. Part 2

Send to ninbbelt@gmail.com, subject: Ex1 and last names.

1. Given high-dimensional data, is there a way to know whether all possible projections of the data are Gaussian? Explain.

What if there is some additive Gaussian noise?

Ex1. (cont.)

2. Use FastICA (easily found via Google):
http://www.cis.hut.fi/projects/ica/fastica/code/dlcode.html

Choose your favorite two songs.

Create 3 mixing matrices and mix them.

Apply FastICA to de-mix.

Ex1 (cont.)

Discuss the results.

What happens when the mixing matrix is symmetric?

Why did you get different results with different mixing matrices?

Demonstrate that you got close to the original files.

Try different nonlinearities of FastICA. Which one is best? Can you see that from the data?

References

Feature extraction (Images, Video):
http://hlab.phys.rug.nl/demos/ica/

Aapo Hyvarinen: ICA (1999):
http://www.cis.hut.fi/aapo/papers/NCS99web/node11.html

ICA demo step-by-step:
http://www.cis.hut.fi/projects/ica/icademo/

Lots of links:
http://sound.media.mit.edu/~paris/ica.html

Object-based audio capture demos:
http://www.media.mit.edu/~westner/sepdemo.html

Demo for BSS with "CoBliSS" (wav files):
http://www.esp.ele.tue.nl/onderzoek/daniels/BSS.html

Tomas Zeman's page on BSS research:
http://ica.fun-thom.misto.cz/page3.html

Virtual Laboratories in Probability and Statistics:
http://www.math.uah.edu/stat/index.html