Machine learning techniques for quantifying neural synchrony:

zoomzurichAI and Robotics

Oct 16, 2013 (3 years and 11 months ago)

106 views

Machine learning techniques for
quantifying neural synchrony:


application to the diagnosis of
Alzheimer's disease

from EEG

Justin Dauwels

LIDS, MIT

LMI, Harvard Medical School

Amari Research Unit, Brain Science Institute, RIKEN


June 9, 2008

RIKEN Brain Science Institute



RIKEN Wako Campus (near Tokyo)




about 400 researchers and staff (20% foreign)




300 research fellows and visiting scientists




about 60 laboratories




research covers most aspects of brain science

Collaborators

François
Vialatte
*, Theo Weber
+
, Shun
-
ichi

Amari
*,
Andrzej

Cichocki
* (*RIKEN,
+
MIT)


Project

Early diagnosis of Alzheimer’s disease based on
EEG


Financial Support

Research Overview





EEG (RIKEN, MIT, MGH)



diagnosis of Alzheimer’s disease



detection/prediction of epileptic seizures



analysis of EEG evoked by visual/auditory stimuli



EEG during meditation



projects related to brain
-
computer interface (BMI)




Calcium imaging (RIKEN, NAIST, MIT)


effect of calcium on neural growth


role of calcium propagation in
gliacells

and neurons





Diffusion MRI (
Brigham&Women’s

Hospital, Harvard Medical School, MIT)



estimation and clustering of tracts (future project)

Machine learning & signal processing for applications in
NEUROSCIENCE


= development of
ALGORITHMS

to analyze
brain signals

subject
of this
talk

Overview


Alzheimer’s Disease (AD)


EEG of AD patients: decrease in synchrony


Synchrony measure in time
-
frequency domain


Pairs of EEG signals


Collections of EEG signals


Numerical Results


Outlook



A
lzheimer's

disease

Outside glimpse: clinical perspective


Mild (early stage)

-
becomes less energetic or spontaneous

-
noticeable cognitive deficits

-
still independent (able to compensate)


Moderate (middle stage)

-
Mental abilities decline

-
personality changes

-
become dependent on caregivers


Severe (late stage)

-
complete deterioration of the personality

-
loss of control over bodily functions

-
total dependence on caregivers

Apathy

Memory

(forgetting

relatives)

Evolution of the disease (stages)

One disease,

many symptoms

Loss of

Self
-
control

Video sources: Alzheimer society


2 to 5 years before

-
mild cognitive impairment (often unnoticed)

-
6 to 25 % progress to Alzheimer's per year

memory, language, executive functions,

apraxia, apathy, agnosia, etc…



2% to 5% of people over 65 years old


up to 20% of people over 80


Jeong 2004 (Nature)

EEG data

A
lzheimer's

disease

Inside glimpse: brain atrophy

Video source: P. Thompson, J.Neuroscience, 2003

Images: Jannis Productions.

(R. Fredenburg; S. Jannis)

amyloid plaques and

neurofibrillary tangles

Video source:

Alzheimer society

Overview


Alzheimer’s Disease (AD)


EEG of AD patients: decrease in synchrony


Synchrony measure in time
-
frequency domain


Pairs of EEG signals


Collections of EEG signals


Numerical Results


Outlook



A
lzheimer's

disease

Inside glimpse: abnormal EEG


AD vs. MCI
(Hogan et al. 203; Jiang et al., 2005)


AD vs. Control
(Hermann,
Demilrap
, 2005,
Yagyu

et al. 1997;
Stam

et al., 2002;
Babiloni

et al. 2006)


MCI
vs.
mildAD

(
Babiloni

et al., 2006).


Decrease of synchrony

Brain “slow
-
down”

slow rhythms (0.5
-
8 Hz)


fast rhythms (8
-
30 Hz)






(
Babiloni

et al., 2004;
Besthorn

et al., 1997;
Jelic

et al. 1996,
Jeong

2004;
Dierks

et al., 1993).

Images: www.cerebromente.org.br

EEG system: inexpensive, mobile, useful for screening

focus of this project

Spontaneous (scalp) EEG

Fourier power

f (Hz)

t (sec)

amplitude

Fourier
|X(f)|
2


EEG

x(t)


Time
-
frequency
|X(t,f)|
2

(wavelet transform)

Time
-
frequency patterns

(“bumps”)

Fourier transform

High frequency

Low frequency

Frequency

1









2

3

2

1

3

Windowed Fourier transform

*

=

Fourier basis functions

Window

function

windowed basis

functions

Windowed

Fourier

Transform

t

f

Spontaneous EEG

Fourier power

f (Hz)

t (sec)

amplitude

Fourier
|X(f)|
2


EEG

x(t)


Time
-
frequency
|X(t,f)|
2

(wavelet transform)

Time
-
frequency patterns

(“bumps”)

Signatures of local synchrony

f (Hz)

t (sec)

Time
-
frequency patterns

(“bumps”)

EEG stems from
thousands

of neurons

bump

if neurons are
phase
-
locked

= local synchrony








A
lzheimer's

disease

Inside glimpse: abnormal EEG


AD vs. MCI
(Hogan et al. 203; Jiang et al., 2005)


AD vs. Control
(Hermann,
Demilrap
, 2005,
Yagyu

et al. 1997;
Stam

et al., 2002;
Babiloni

et al. 2006)


MCI
vs.
mildAD

(
Babiloni

et al., 2006).


Decrease of synchrony

Brain “slow
-
down”

slow rhythms (0.5
-
8 Hz)


fast rhythms (8
-
30 Hz)






(
Babiloni

et al., 2004;
Besthorn

et al., 1997;
Jelic

et al. 1996,
Jeong

2004;
Dierks

et al., 1993).

Images: www.cerebromente.org.br

EEG system: inexpensive, mobile, useful for screening

focus of this project

Overview


Alzheimer’s Disease (AD)


EEG of AD patients: decrease in synchrony


Synchrony measure in time
-
frequency domain


Pairs of EEG signals


Collections of EEG signals


Numerical Results


Outlook



Comparing EEG signal
rhythms ?





PROBLEM I:


Signals of 3
seconds sampled at 100 Hz (


㌰〠
獡浰s敳
)

Time
-
frequency representation of one signal = about
25 000 coefficients


2 signals







Numerous
neighboring

pixels

Comparing EEG signal rhythms ?(2)

One pixel

PROBLEM II:


Shifts in time
-
frequency!


Sparse representation: bump model

Assumptions
:



1.
time
-
frequency

map

is

suitable

representation


2.
oscillatory

bursts

(“bumps”)

convey

key

information

Bumps


Sparse
representation

F. Vialatte et al. “
A machine learning approach to the analysis of time
-
frequency maps and its application to neural dynamics
”, Neural Networks (2007).

Normalization
:





10
4
-

10
5

coefficients

about 10
2

parameters

t (sec)

f(Hz)

f(Hz)

t (sec)

f(Hz)

t (sec)

Similarity of bump models...

How

similar


or

synchronous


are two bump models
?

=
GLOBAL

synchrony

Reminder: bumps due to
LOCAL

synchrony

=
MULTI
-
SCALE

approach

... by matching bumps

y
1

y
2

Some bumps
match

Offset

between matched bumps

SIMILAR
bump models if:

Many
matches

Strongly

overlapping
matches


... by matching bumps (2)



Bumps in
one

model, but
NOT

in other


→ fraction of “spurious” bumps
ρ
spur




Bumps in
both

models, but with offset


→ Average time offset
δ
t

(delay)


→ Timing jitter with variance
s
t


→ Average frequency offset
δ
f



→ Frequency

jitter with variance
s
f


Synchrony:
only
s
t
and
ρ
spur
relevant




PROBLEM:

Given two bump models, compute
(
ρ
spur
,

δ
t
,

s
t
,

δ
f
,
s
f

)

Stochastic Event Synchrony (SES)


=

(
ρ
spur
,

δ
t
,

s
t
,

δ
f
,
s
f

)

Overview


Alzheimer’s Disease (AD)


EEG of AD patients: decrease in synchrony


Synchrony measure in time
-
frequency domain


Pairs of EEG signals


Collections of EEG signals


Numerical Results


Outlook



Average synchrony

3. SES for each pair of models

4. Average the SES parameters

1.
Group electrodes in regions

2.
Bump model for each region

Beyond pairwise interactions...

Pairwise similarity

Multi
-
variate similarity

...by clustering

y
1

y
2

y
3

y
4

y
5

y
1

y
2

y
3

y
4

y
5

Constraint: in each cluster
at most
one bump from each signal

Models similar if



few deletions/large clusters



little jitter

HARD combinatorial problem!

Overview


Alzheimer’s Disease (AD)


EEG of AD patients: decrease in synchrony


Synchrony measure in time
-
frequency domain


Pairs of EEG signals


Collections of EEG signals


Numerical Results


Outlook



EEG Data

EEG data provided by Prof. T. Musha



EEG of 22
Mild Cognitive Impairment
(MCI) patients and 38 age
-
matched


control

subjects (CTR) recorded while in
rest with closed eyes




spontaneous EEG




All
22 MCI patients suffered from
Alzheimer’s disease
(AD) later on




Electrodes located on
21 sites
according to 10
-
20 international system




Electrodes grouped into
5 zones
(reduces number of pairs)


1 bump model per zone




Used continuous “artifact
-
free” intervals of
20s




Band pass filtered
between 4 and 30 Hz



Similarity measures


Correlation

and
coherence


Granger causality
(linear system):
DTF, ffDTF, dDTF, PDC, PC, ...






Phase Synchrony:
compare

instantaneous phases
(wavelet/Hilbert transform)









State space

based measures


sync likelihood, S
-
estimator, S
-
H
-
N
-
indices, ...




Information
-
theoretic

measures


KL divergence, Jensen
-
Shannon divergence, ...



No Phase Locking

Phase Locking

TIME

FREQUENCY

Sensitivity (average synchrony)

Granger

Info. Theor.

State Space

Phase

SES

Corr/Coh

Mann
-
Whitney test:
small p value suggests large difference in statistics of both groups

Significant differences for ffDTF and
ρ
!

Classification


Clear

separation, but
not yet

useful as diagnostic tool


Additional

indicators needed (fMRI, MEG, DTI, ...)


Can be used for
screening

population (inexpensive, simple, fast)

ffDTF

Strong (anti
-
) correlations „families“ of sync measures

Correlations

Overview


Alzheimer’s Disease (AD)


EEG of AD patients: decrease in synchrony


Synchrony measure in time
-
frequency domain


Pairs of EEG signals


Collections of EEG signals


Numerical Results


Outlook



Ongoing work


Time
-
varying

similarity parameters















s
t


low s
t

high s
t

high s
t

no stimulus

no stimulus

stimulus

low s
t

high s
t

high
s
t

Future work


Matching event
patterns

instead of
single

events











= allows us to extract
patterns

in time
-
frequency map of EEG!



HYPOTHESIS:

Perhaps specific patterns occur in time
-
frequency EEG maps


of
AD patients


before onset of
epileptic

seizures


REMARK:

Such patterns are
ignored

by classical approaches:
STATIONARITY/AVERAGING!

coupling between

frequency bands

t (sec)

f(Hz)

Conclusions


Measure for similarity of
point processes
(„
stochastic event synchrony
“)



Key idea:
alignment

of events



Solved by
statistical inference



Application:
EEG synchrony of MCI patients



About
85%

correctly classified; perhaps useful for
screening

population



Ongoing/future work
: time
-
varying SES, extracting patterns of bumps









References + software


References


Quantifying Statistical Interdependence by Message Passing on Graphs: Algorithms and
Application to Neural Signals,
Neural Computation (under revision)


A Comparative Study of Synchrony Measures for the Early Diagnosis of Alzheimer's Disease Based
on EEG,
NeuroImage

(under revision)


Measuring Neural Synchrony by Message Passing,
NIPS 2007


Quantifying the Similarity of Multiple Multi
-
Dimensional Point Processes by Integer Programming
with Application to Early Diagnosis of Alzheimer's Disease from EEG,
EMBC 2008 (submitted)









Software


MATLAB implementation
of the synchrony

measures








Machine learning techniques for
quantifying neural synchrony:


application to the diagnosis of
Alzheimer's disease

from EEG

Justin Dauwels

LIDS, MIT

LMI, Harvard Medical School

Amari Research Unit, Brain Science Institute, RIKEN


June 9, 2008

Machine learning for neuroscience


Multi
-
scale

in time and space



Data fusion
: EEG, fMRI, spike data, bio
-
imaging, ...



Large
-
scale inference



Visualization


Behavior ↔ Brain ↔ Brain Regions ↔ Neural Assemblies ↔ Single neurons ↔ Synapses ↔ Ion channels

Estimation

Deltas: average offset

Sigmas: var of offset

...where

Simple closed form expressions

artificial observations (conjugate prior)

Large
-
scale synchrony

Apparently,
all
brain regions affected...

A
lzheimer's

disease

0
2
4
6
8
10
12
14
1980
1990
2000
2010
2020
2030
2040
2050
Outside glimpse: the future (prevalence)

USA (Hebert et al. 2003)

0
20
40
60
80
100
120
2000
2030
2050
Developped countries
Developping countries
World (
Wimo

et al. 2003)

Million of sufferers

Million of sufferers



2% to 5% of people
over 65 years old


Up to 20% of people
over 80


Jeong 2004 (Nature)

Ongoing and future work

Applications





alternative inference techniques (e.g., MCMC, linear programming)



time dependent (Gaussian processes)



multivariate (T.Weber)


Fluctuations of EEG synchrony


Caused by auditory stimuli and music (T. Rutkowski)


Caused by visual stimuli (F. Vialatte)


Yoga professionals (F. Vialatte)


Professional shogi players (RIKEN & Fujitsu)


Brain
-
Computer Interfaces (T. Rutkowski)


Spike data from interacting monkeys (N. Fujii)


Calcium propagation in gliacells (N. Nakata)


Neural growth (Y. Tsukada & Y. Sakumura)


...


Algorithms


Fitting bump models



Signal

Bump

Initialisation

After adaptation

Adaptation

gradient method

F. Vialatte et al. “
A machine learning approach to the analysis of time
-
frequency maps and its application to neural dynamics
”, Neural Networks (2007).

Boxplots

SURPRISE!

No

increase in
jitter
, but significantly
less

matched activity
!


Physiological
interpretation



neural assemblies more
localized
?



harder to establish
large
-
scale

synchrony?

Similarity of bump models...

How

similar


or

synchronous


are two bump models?

POINT ESTIMATION:
θ
(i+1)

= argmax
x

log p(y, y’,
c
(i+1)

,
θ

)


Uniform

prior p(
θ
):
δ
t
,
δ
f

= average offset,
s
t
, s
f

= variance of offset


Conjugate

prior p(
θ
): still closed
-
form expression

Other kind of

prior p(
θ
): numerical optimization (gradient method)


Probabilistic inference

MATCHING:
c
(i+1)

= argmax
c

log p(y, y’,
c
,
θ
(i)

)

ALGORITHMS




Polynomial
-
time

algorithms gives
optimal

solution(s) (Edmond
-
Karp and Auction algorithm)



Linear programming
relaxation: extreme points of LP polytope are
integral



Max
-
product
algorithm

gives optimal solution if
unique
[Bayati et al. (2005), Sanghavi (2007)]

EQUIVALENT to (imperfect) bipartite max
-
weight matching problem


c
(i+1)
= argmax
c
log p(y, y’,
c
,
θ
(i)

) = argmax
c
Σ
kk’

w
kk’
(i)

c
kk’



s.t.
Σ
k’

c
kk’
≤ 1

and
Σ
k

c
kk’
≤ 1


and c
kk’
2

{0,1}

Probabilistic inference

not

necessarily
perfect

find
heaviest

set of
disjoint

edges

p(y, y’,
c
,
θ
)
/

I(
c
) p
θ
(
θ
)
Π
kk’

(
N(t
k’



t
k ;

δ
t ,
s
t,kk’
)

N(f
k’



f
k ;

δ
f ,
s
f, kk’
)

β
-
2
)
c
kk’

Max
-
product algorithm

MATCHING:
c
(i+1)

= argmax
c

log p(y, y’,
c
,
θ
(i)

)

Generative model

Max
-
product algorithm

MATCHING:
c
(i+1)

= argmax
c

log p(y, y’,
c
,
θ
(i)

)

μ↑

μ↑

μ↓

μ↓

Conditioning on
θ


Max
-
product algorithm (2)



Iteratively compute
messages




















At convergence, compute
marginals

p(c
kk’
) =
μ↓
(c
kk’
)

μ↓
(c
kk’
)

μ↑
(c
kk’
)



Decisions
: c*
kk’
= argmax
c
kk


p(c
kk’
)




Algorithm

MATCHING → max
-
product

ESTIMATION → closed
-
form

PROBLEM:

Given two bump models, compute (
ρ
spur
,

δ
t
,

s
t
,

δ
f
, s
f

)

APPROACH:
(
c*
,
θ
*
) = argmax
c,
θ

log p(y, y’,
c
,
θ
)

θ

SOLUTION:
Coordinate descent



c
(i+1)

= argmax
c

log p(y, y’,
c
,
θ
(i)

)


θ
(i+1)

= argmax
x

log p(y, y’,
c
(i+1)

,
θ

)


Generative model

Generate bump model (
hidden
)




geometric prior for number n of bumps


p(n) = (1
-

λ

S) (
λ

S)
-
n




bumps are uniformly distributed in rectangle




amplitude, width (in t and f) all i.i.d
.

Generate two
“noisy” observations




offset between hidden and observed bump


= Gaussian random vector with


mean

(
±
δ
t
/2,
±
δ
f

/2)


covariance
diag(s
t
/2, s
f
/2)




amplitude, width (in t and f) all i.i.d.




“deletion” with probability p
d

y
hidden

y

y


Easily
extendable

to more than 2 observations…

(
-
δ
t
/2,
-
δ
f

/2)

(
δ
t
/2,
δ
f

/2)

Generative model (2)



Binary variables
c
kk’




c
kk’

= 1
if
k

and
k’

are observations of same hidden bump, else
c
kk’

= 0
(e.g.,

c
ii’
= 1 c
ij’
= 0
)




Constraints: b
k

=
Σ
k’

c
kk’
and b
k’

=
Σ
k

c
kk’
are
binary
(“matching constraints”)




Generative Model

p(y, y’, y
hidden

,

c
,
δ
t

,

δ
f
,

s
t

,

s
f
) (
symmetric

in y and y’)





Eliminate y
hidden

→ offset is Gaussian RV with
mean = (
δ
t
,
δ
f

) and

covariance
diag (s
t
, s
f
)






Probabilistic Inference:

(
c*
,
θ
*
) = argmax
c,
θ

log p(y, y’,
c
,
θ
)


y

y


(
-
δ
t
/2,
-
δ
f

/2)

(
δ
t
/2,
δ
f

/2)

i

i’

j’

p(y, y’,
c
,
θ
) =


p(y, y’, y
hidden

,

c
,
θ
) dy
hidden


θ



Bumps in
one

model, but
NOT

in other


→ fraction of “spurious” bumps
ρ
spur




Bumps in
both

models, but with offset


→ Average time offset
δ
t

(delay)


→ Timing jitter with variance
s
t


→ Average frequency offset
δ
f



→ Frequency

jitter with variance
s
f


PROBLEM:

Given two bump models, compute (
ρ
spur
,

δ
t
,

s
t
,

δ
f
,
s
f

)

APPROACH:
(
c*
,
θ
*
) = argmax
c,
θ

log p(y, y’,
c
,
θ
)

θ

Summary

Objective function



Logarithm of model: log p(y, y’,
c
,
θ
) =
Σ
kk’

w
kk’

c
kk’
+ log I(
c
) + log p
θ
(
θ
) +
γ


w
kk


=
-
(
1/
s
t

(t
k’



t
k


δ
t
)
2

+ 1/
s
f

(f
k’



f
k


δ
f
)
2

)
-


2

log
β




β


= p
d

(
λ
/V)
1/2

Euclidean distance between bump centers



Large

w
kk


if : a) bumps are
close
b)

small
p
d

c)
few bumps
per volume element




No need to specify p
d

,
λ
, and V, they only appear through
β

=
knob

to control # matches

y

y


(
-
δ
t
/2,
-
δ
f

/2)

(
δ
t
/2,
δ
f

/2)

i

i’

j’

Distance measures


w
kk


= 1/
s
t,kk’

(t
k’



t
k


δ
t
)
2
+

1/
s
f,kk’

(f
k’



f
k


δ
f
)
2

+

2

log
β


s
t,kk’
=
(
Δ
t
k
+
Δ
t’
k
)
s
t
s
f,kk’
=
(
Δ
f
k
+
Δ
f’
k
)
s
f




Scaling

Non
-
Euclidean

p(y, y’,
c
,
θ
)
/

I(
c
) p
θ
(
θ
)
Π
kk’

(
N(t
k’



t
k ;

δ
t ,
s
t,kk’
)

N(f
k’



f
k ;

δ
f ,
s
f, kk’
)

β
-
2
)
c
kk’

Generative model







Expect bumps to appear at about
same frequency
, but
delayed


Frequency shift requires non
-
linear transformation, less likely than delay


Conjugate priors
for
s
t

and
s
f

(scaled inverse chi
-
squared):






Improper prior for
δ
t

and

δ
t
:
p(
δ
t
) = 1 = p(
δ
f
)

Prior for parameters

CTR

MCI

Preliminary results for multi
-
variate model




linear comb of p
c

Probabilistic inference

MATCHING

POINT ESTIMATION

PROBLEM:

Given two bump models, compute (
ρ
spur
,

δ
t
,

s
t
,

δ
f
, s
f

)

APPROACH:
(
c*
,
θ
*
) = argmax
c,
θ

log p(y, y’,
c
,
θ
)

θ

SOLUTION:
Coordinate descent



c
(i+1)

= argmax
c

log p(y, y’,
c
,
θ
(i)

)


θ
(i+1)

= argmax
x

log p(y, y’,
c
(i+1)

,
θ

)


X

Y

Min
x
2

X
,

y
2
Y


d(
x
,
y
)

Generative model

Generate bump model (
hidden
)




geometric prior for number n of bumps


p(n) = (1
-

λ

S) (
λ

S)
-
n




bumps are uniformly distributed in rectangle




amplitude, width (in t and f) all
i.i.d
.

Generate
M

“noisy” observations




offset between hidden and observed bump


= Gaussian random vector with


mean

(
δ
t,m

/2,
δ
f,m

/2)


covariance
diag
(
s
t,m
/2,
s
f,m

/2)




amplitude, width (in t and f) all
i.i.d
.




“deletion” with probability p
d


(other prior
p
c0

for cluster size)

y
hidden

y
1

y
2

y
3

y
4

y
5

Parameters:
θ
=
δ
t,m
,

δ
f,m ,
s
t,m
, s
f,m
,

p
c

p
c

(i) = p(cluster size = i |y) (i = 1,2,…,M)



(Hebb 1949, Fuster 1997)

Stimuli

Consolidation

Stimulus

Voice

Face

Voice

Role of local synchrony

Assembly activation

Hebbian consolidation

Assembly recall

Probabilistic inference

CLUSTERING (IP or MP)

POINT ESTIMATION

PROBLEM:

Given
M
bump models, compute
θ
=
δ
t,m
,

δ
f,m ,
s
t,m
, s
f,m
,

p
c

APPROACH:
(
c*
,
θ
*
) = argmax
c,
θ

log p(y, y’,
c
,
θ
)

SOLUTION:
Coordinate descent



c
(i+1)

= argmax
c

log p(y, y’,
c
,
θ
(i)

)


θ
(i+1)

= argmax
x

log p(y, y’,
c
(i+1)

,
θ

)


Integer program



Max
-
product algorithm (MP) on sparse graph



Integer programming methods (e.g., LP relaxation)