Machine learning techniques for
quantifying neural synchrony:
application to the diagnosis of
Alzheimer's disease
from EEG
Justin Dauwels
LIDS, MIT
LMI, Harvard Medical School
Amari Research Unit, Brain Science Institute, RIKEN
June 9, 2008
RIKEN Brain Science Institute
•
RIKEN Wako Campus (near Tokyo)
•
about 400 researchers and staff (20% foreign)
•
300 research fellows and visiting scientists
•
about 60 laboratories
•
research covers most aspects of brain science
Collaborators
François
Vialatte
*, Theo Weber
+
, Shun

ichi
Amari
*,
Andrzej
Cichocki
* (*RIKEN,
+
MIT)
Project
Early diagnosis of Alzheimer’s disease based on
EEG
Financial Support
Research Overview
•
EEG (RIKEN, MIT, MGH)
•
diagnosis of Alzheimer’s disease
•
detection/prediction of epileptic seizures
•
analysis of EEG evoked by visual/auditory stimuli
•
EEG during meditation
•
projects related to brain

computer interface (BMI)
•
Calcium imaging (RIKEN, NAIST, MIT)
•
effect of calcium on neural growth
•
role of calcium propagation in
gliacells
and neurons
•
Diffusion MRI (
Brigham&Women’s
Hospital, Harvard Medical School, MIT)
•
estimation and clustering of tracts (future project)
Machine learning & signal processing for applications in
NEUROSCIENCE
= development of
ALGORITHMS
to analyze
brain signals
subject
of this
talk
Overview
Alzheimer’s Disease (AD)
EEG of AD patients: decrease in synchrony
Synchrony measure in time

frequency domain
Pairs of EEG signals
Collections of EEG signals
Numerical Results
Outlook
A
lzheimer's
disease
Outside glimpse: clinical perspective
•
Mild (early stage)

becomes less energetic or spontaneous

noticeable cognitive deficits

still independent (able to compensate)
•
Moderate (middle stage)

Mental abilities decline

personality changes

become dependent on caregivers
•
Severe (late stage)

complete deterioration of the personality

loss of control over bodily functions

total dependence on caregivers
Apathy
Memory
(forgetting
relatives)
Evolution of the disease (stages)
One disease,
many symptoms
Loss of
Self

control
Video sources: Alzheimer society
•
2 to 5 years before

mild cognitive impairment (often unnoticed)

6 to 25 % progress to Alzheimer's per year
memory, language, executive functions,
apraxia, apathy, agnosia, etc…
•
2% to 5% of people over 65 years old
•
up to 20% of people over 80
Jeong 2004 (Nature)
EEG data
A
lzheimer's
disease
Inside glimpse: brain atrophy
Video source: P. Thompson, J.Neuroscience, 2003
Images: Jannis Productions.
(R. Fredenburg; S. Jannis)
amyloid plaques and
neurofibrillary tangles
Video source:
Alzheimer society
Overview
Alzheimer’s Disease (AD)
EEG of AD patients: decrease in synchrony
Synchrony measure in time

frequency domain
Pairs of EEG signals
Collections of EEG signals
Numerical Results
Outlook
A
lzheimer's
disease
Inside glimpse: abnormal EEG
•
AD vs. MCI
(Hogan et al. 203; Jiang et al., 2005)
•
AD vs. Control
(Hermann,
Demilrap
, 2005,
Yagyu
et al. 1997;
Stam
et al., 2002;
Babiloni
et al. 2006)
•
MCI
vs.
mildAD
(
Babiloni
et al., 2006).
Decrease of synchrony
Brain “slow

down”
slow rhythms (0.5

8 Hz)
fast rhythms (8

30 Hz)
(
Babiloni
et al., 2004;
Besthorn
et al., 1997;
Jelic
et al. 1996,
Jeong
2004;
Dierks
et al., 1993).
Images: www.cerebromente.org.br
EEG system: inexpensive, mobile, useful for screening
focus of this project
Spontaneous (scalp) EEG
Fourier power
f (Hz)
t (sec)
amplitude
Fourier
X(f)
2
EEG
x(t)
Time

frequency
X(t,f)
2
(wavelet transform)
Time

frequency patterns
(“bumps”)
Fourier transform
High frequency
Low frequency
Frequency
1
2
3
2
1
3
Windowed Fourier transform
*
=
Fourier basis functions
Window
function
windowed basis
functions
Windowed
Fourier
Transform
t
f
Spontaneous EEG
Fourier power
f (Hz)
t (sec)
amplitude
Fourier
X(f)
2
EEG
x(t)
Time

frequency
X(t,f)
2
(wavelet transform)
Time

frequency patterns
(“bumps”)
Signatures of local synchrony
f (Hz)
t (sec)
Time

frequency patterns
(“bumps”)
EEG stems from
thousands
of neurons
bump
if neurons are
phase

locked
= local synchrony
A
lzheimer's
disease
Inside glimpse: abnormal EEG
•
AD vs. MCI
(Hogan et al. 203; Jiang et al., 2005)
•
AD vs. Control
(Hermann,
Demilrap
, 2005,
Yagyu
et al. 1997;
Stam
et al., 2002;
Babiloni
et al. 2006)
•
MCI
vs.
mildAD
(
Babiloni
et al., 2006).
Decrease of synchrony
Brain “slow

down”
slow rhythms (0.5

8 Hz)
fast rhythms (8

30 Hz)
(
Babiloni
et al., 2004;
Besthorn
et al., 1997;
Jelic
et al. 1996,
Jeong
2004;
Dierks
et al., 1993).
Images: www.cerebromente.org.br
EEG system: inexpensive, mobile, useful for screening
focus of this project
Overview
Alzheimer’s Disease (AD)
EEG of AD patients: decrease in synchrony
Synchrony measure in time

frequency domain
Pairs of EEG signals
Collections of EEG signals
Numerical Results
Outlook
Comparing EEG signal
rhythms ?
PROBLEM I:
Signals of 3
seconds sampled at 100 Hz (
㌰〠
獡浰s敳
)
Time

frequency representation of one signal = about
25 000 coefficients
2 signals
Numerous
neighboring
pixels
Comparing EEG signal rhythms ?(2)
One pixel
PROBLEM II:
Shifts in time

frequency!
Sparse representation: bump model
Assumptions
:
1.
time

frequency
map
is
suitable
representation
2.
oscillatory
bursts
(“bumps”)
convey
key
information
Bumps
Sparse
representation
F. Vialatte et al. “
A machine learning approach to the analysis of time

frequency maps and its application to neural dynamics
”, Neural Networks (2007).
Normalization
:
10
4

10
5
coefficients
about 10
2
parameters
t (sec)
f(Hz)
f(Hz)
t (sec)
f(Hz)
t (sec)
Similarity of bump models...
How
“
similar
”
or
“
synchronous
”
are two bump models
?
=
GLOBAL
synchrony
Reminder: bumps due to
LOCAL
synchrony
=
MULTI

SCALE
approach
... by matching bumps
y
1
y
2
Some bumps
match
Offset
between matched bumps
SIMILAR
bump models if:
Many
matches
Strongly
overlapping
matches
... by matching bumps (2)
•
Bumps in
one
model, but
NOT
in other
→ fraction of “spurious” bumps
ρ
spur
•
Bumps in
both
models, but with offset
→ Average time offset
δ
t
(delay)
→ Timing jitter with variance
s
t
→ Average frequency offset
δ
f
→ Frequency
jitter with variance
s
f
Synchrony:
only
s
t
and
ρ
spur
relevant
PROBLEM:
Given two bump models, compute
(
ρ
spur
,
δ
t
,
s
t
,
δ
f
,
s
f
)
Stochastic Event Synchrony (SES)
=
(
ρ
spur
,
δ
t
,
s
t
,
δ
f
,
s
f
)
Overview
Alzheimer’s Disease (AD)
EEG of AD patients: decrease in synchrony
Synchrony measure in time

frequency domain
Pairs of EEG signals
Collections of EEG signals
Numerical Results
Outlook
Average synchrony
3. SES for each pair of models
4. Average the SES parameters
1.
Group electrodes in regions
2.
Bump model for each region
Beyond pairwise interactions...
Pairwise similarity
Multi

variate similarity
...by clustering
y
1
y
2
y
3
y
4
y
5
y
1
y
2
y
3
y
4
y
5
Constraint: in each cluster
at most
one bump from each signal
Models similar if
•
few deletions/large clusters
•
little jitter
HARD combinatorial problem!
Overview
Alzheimer’s Disease (AD)
EEG of AD patients: decrease in synchrony
Synchrony measure in time

frequency domain
Pairs of EEG signals
Collections of EEG signals
Numerical Results
Outlook
EEG Data
EEG data provided by Prof. T. Musha
•
EEG of 22
Mild Cognitive Impairment
(MCI) patients and 38 age

matched
control
subjects (CTR) recorded while in
rest with closed eyes
→
spontaneous EEG
•
All
22 MCI patients suffered from
Alzheimer’s disease
(AD) later on
•
Electrodes located on
21 sites
according to 10

20 international system
•
Electrodes grouped into
5 zones
(reduces number of pairs)
1 bump model per zone
•
Used continuous “artifact

free” intervals of
20s
•
Band pass filtered
between 4 and 30 Hz
Similarity measures
•
Correlation
and
coherence
•
Granger causality
(linear system):
DTF, ffDTF, dDTF, PDC, PC, ...
•
Phase Synchrony:
compare
instantaneous phases
(wavelet/Hilbert transform)
•
State space
based measures
sync likelihood, S

estimator, S

H

N

indices, ...
•
Information

theoretic
measures
KL divergence, Jensen

Shannon divergence, ...
No Phase Locking
Phase Locking
TIME
FREQUENCY
Sensitivity (average synchrony)
Granger
Info. Theor.
State Space
Phase
SES
Corr/Coh
Mann

Whitney test:
small p value suggests large difference in statistics of both groups
Significant differences for ffDTF and
ρ
!
Classification
•
Clear
separation, but
not yet
useful as diagnostic tool
•
Additional
indicators needed (fMRI, MEG, DTI, ...)
•
Can be used for
screening
population (inexpensive, simple, fast)
ffDTF
Strong (anti

) correlations „families“ of sync measures
Correlations
Overview
Alzheimer’s Disease (AD)
EEG of AD patients: decrease in synchrony
Synchrony measure in time

frequency domain
Pairs of EEG signals
Collections of EEG signals
Numerical Results
Outlook
Ongoing work
Time

varying
similarity parameters
s
t
low s
t
high s
t
high s
t
no stimulus
no stimulus
stimulus
low s
t
high s
t
high
s
t
Future work
Matching event
patterns
instead of
single
events
= allows us to extract
patterns
in time

frequency map of EEG!
HYPOTHESIS:
Perhaps specific patterns occur in time

frequency EEG maps
of
AD patients
before onset of
epileptic
seizures
REMARK:
Such patterns are
ignored
by classical approaches:
STATIONARITY/AVERAGING!
coupling between
frequency bands
t (sec)
f(Hz)
Conclusions
Measure for similarity of
point processes
(„
stochastic event synchrony
“)
Key idea:
alignment
of events
Solved by
statistical inference
Application:
EEG synchrony of MCI patients
About
85%
correctly classified; perhaps useful for
screening
population
Ongoing/future work
: time

varying SES, extracting patterns of bumps
References + software
References
Quantifying Statistical Interdependence by Message Passing on Graphs: Algorithms and
Application to Neural Signals,
Neural Computation (under revision)
A Comparative Study of Synchrony Measures for the Early Diagnosis of Alzheimer's Disease Based
on EEG,
NeuroImage
(under revision)
Measuring Neural Synchrony by Message Passing,
NIPS 2007
Quantifying the Similarity of Multiple Multi

Dimensional Point Processes by Integer Programming
with Application to Early Diagnosis of Alzheimer's Disease from EEG,
EMBC 2008 (submitted)
Software
MATLAB implementation
of the synchrony
measures
Machine learning techniques for
quantifying neural synchrony:
application to the diagnosis of
Alzheimer's disease
from EEG
Justin Dauwels
LIDS, MIT
LMI, Harvard Medical School
Amari Research Unit, Brain Science Institute, RIKEN
June 9, 2008
Machine learning for neuroscience
Multi

scale
in time and space
Data fusion
: EEG, fMRI, spike data, bio

imaging, ...
Large

scale inference
Visualization
Behavior ↔ Brain ↔ Brain Regions ↔ Neural Assemblies ↔ Single neurons ↔ Synapses ↔ Ion channels
Estimation
Deltas: average offset
Sigmas: var of offset
...where
Simple closed form expressions
artificial observations (conjugate prior)
Large

scale synchrony
Apparently,
all
brain regions affected...
A
lzheimer's
disease
0
2
4
6
8
10
12
14
1980
1990
2000
2010
2020
2030
2040
2050
Outside glimpse: the future (prevalence)
USA (Hebert et al. 2003)
0
20
40
60
80
100
120
2000
2030
2050
Developped countries
Developping countries
World (
Wimo
et al. 2003)
Million of sufferers
Million of sufferers
•
2% to 5% of people
over 65 years old
•
Up to 20% of people
over 80
Jeong 2004 (Nature)
Ongoing and future work
Applications
alternative inference techniques (e.g., MCMC, linear programming)
time dependent (Gaussian processes)
multivariate (T.Weber)
Fluctuations of EEG synchrony
Caused by auditory stimuli and music (T. Rutkowski)
Caused by visual stimuli (F. Vialatte)
Yoga professionals (F. Vialatte)
Professional shogi players (RIKEN & Fujitsu)
Brain

Computer Interfaces (T. Rutkowski)
Spike data from interacting monkeys (N. Fujii)
Calcium propagation in gliacells (N. Nakata)
Neural growth (Y. Tsukada & Y. Sakumura)
...
Algorithms
Fitting bump models
Signal
Bump
Initialisation
After adaptation
Adaptation
gradient method
F. Vialatte et al. “
A machine learning approach to the analysis of time

frequency maps and its application to neural dynamics
”, Neural Networks (2007).
Boxplots
SURPRISE!
No
increase in
jitter
, but significantly
less
matched activity
!
Physiological
interpretation
•
neural assemblies more
localized
?
•
harder to establish
large

scale
synchrony?
Similarity of bump models...
How
“
similar
”
or
“
synchronous
”
are two bump models?
POINT ESTIMATION:
θ
(i+1)
= argmax
x
log p(y, y’,
c
(i+1)
,
θ
)
Uniform
prior p(
θ
):
δ
t
,
δ
f
= average offset,
s
t
, s
f
= variance of offset
Conjugate
prior p(
θ
): still closed

form expression
Other kind of
prior p(
θ
): numerical optimization (gradient method)
Probabilistic inference
MATCHING:
c
(i+1)
= argmax
c
log p(y, y’,
c
,
θ
(i)
)
ALGORITHMS
•
Polynomial

time
algorithms gives
optimal
solution(s) (Edmond

Karp and Auction algorithm)
•
Linear programming
relaxation: extreme points of LP polytope are
integral
•
Max

product
algorithm
gives optimal solution if
unique
[Bayati et al. (2005), Sanghavi (2007)]
EQUIVALENT to (imperfect) bipartite max

weight matching problem
c
(i+1)
= argmax
c
log p(y, y’,
c
,
θ
(i)
) = argmax
c
Σ
kk’
w
kk’
(i)
c
kk’
s.t.
Σ
k’
c
kk’
≤ 1
and
Σ
k
c
kk’
≤ 1
and c
kk’
2
{0,1}
Probabilistic inference
not
necessarily
perfect
find
heaviest
set of
disjoint
edges
p(y, y’,
c
,
θ
)
/
I(
c
) p
θ
(
θ
)
Π
kk’
(
N(t
k’
–
t
k ;
δ
t ,
s
t,kk’
)
N(f
k’
–
f
k ;
δ
f ,
s
f, kk’
)
β

2
)
c
kk’
Max

product algorithm
MATCHING:
c
(i+1)
= argmax
c
log p(y, y’,
c
,
θ
(i)
)
Generative model
Max

product algorithm
MATCHING:
c
(i+1)
= argmax
c
log p(y, y’,
c
,
θ
(i)
)
μ↑
μ↑
μ↓
μ↓
Conditioning on
θ
Max

product algorithm (2)
•
Iteratively compute
messages
•
At convergence, compute
marginals
p(c
kk’
) =
μ↓
(c
kk’
)
μ↓
(c
kk’
)
μ↑
(c
kk’
)
•
Decisions
: c*
kk’
= argmax
c
kk
’
p(c
kk’
)
Algorithm
MATCHING → max

product
ESTIMATION → closed

form
PROBLEM:
Given two bump models, compute (
ρ
spur
,
δ
t
,
s
t
,
δ
f
, s
f
)
APPROACH:
(
c*
,
θ
*
) = argmax
c,
θ
log p(y, y’,
c
,
θ
)
θ
SOLUTION:
Coordinate descent
c
(i+1)
= argmax
c
log p(y, y’,
c
,
θ
(i)
)
θ
(i+1)
= argmax
x
log p(y, y’,
c
(i+1)
,
θ
)
Generative model
Generate bump model (
hidden
)
•
geometric prior for number n of bumps
p(n) = (1

λ
S) (
λ
S)

n
•
bumps are uniformly distributed in rectangle
•
amplitude, width (in t and f) all i.i.d
.
Generate two
“noisy” observations
•
offset between hidden and observed bump
= Gaussian random vector with
mean
(
±
δ
t
/2,
±
δ
f
/2)
covariance
diag(s
t
/2, s
f
/2)
•
amplitude, width (in t and f) all i.i.d.
•
“deletion” with probability p
d
y
hidden
y
y
’
Easily
extendable
to more than 2 observations…
(

δ
t
/2,

δ
f
/2)
(
δ
t
/2,
δ
f
/2)
Generative model (2)
•
Binary variables
c
kk’
c
kk’
= 1
if
k
and
k’
are observations of same hidden bump, else
c
kk’
= 0
(e.g.,
c
ii’
= 1 c
ij’
= 0
)
•
Constraints: b
k
=
Σ
k’
c
kk’
and b
k’
=
Σ
k
c
kk’
are
binary
(“matching constraints”)
•
Generative Model
p(y, y’, y
hidden
,
c
,
δ
t
,
δ
f
,
s
t
,
s
f
) (
symmetric
in y and y’)
•
Eliminate y
hidden
→ offset is Gaussian RV with
mean = (
δ
t
,
δ
f
) and
covariance
diag (s
t
, s
f
)
•
Probabilistic Inference:
(
c*
,
θ
*
) = argmax
c,
θ
log p(y, y’,
c
,
θ
)
y
y
’
(

δ
t
/2,

δ
f
/2)
(
δ
t
/2,
δ
f
/2)
i
i’
j’
p(y, y’,
c
,
θ
) =
∫
p(y, y’, y
hidden
,
c
,
θ
) dy
hidden
θ
•
Bumps in
one
model, but
NOT
in other
→ fraction of “spurious” bumps
ρ
spur
•
Bumps in
both
models, but with offset
→ Average time offset
δ
t
(delay)
→ Timing jitter with variance
s
t
→ Average frequency offset
δ
f
→ Frequency
jitter with variance
s
f
PROBLEM:
Given two bump models, compute (
ρ
spur
,
δ
t
,
s
t
,
δ
f
,
s
f
)
APPROACH:
(
c*
,
θ
*
) = argmax
c,
θ
log p(y, y’,
c
,
θ
)
θ
Summary
Objective function
•
Logarithm of model: log p(y, y’,
c
,
θ
) =
Σ
kk’
w
kk’
c
kk’
+ log I(
c
) + log p
θ
(
θ
) +
γ
w
kk
’
=

(
1/
s
t
(t
k’
–
t
k
–
δ
t
)
2
+ 1/
s
f
(f
k’
–
f
k
–
δ
f
)
2
)

2
log
β
β
= p
d
(
λ
/V)
1/2
Euclidean distance between bump centers
•
Large
w
kk
’
if : a) bumps are
close
b)
small
p
d
c)
few bumps
per volume element
•
No need to specify p
d
,
λ
, and V, they only appear through
β
=
knob
to control # matches
y
y
’
(

δ
t
/2,

δ
f
/2)
(
δ
t
/2,
δ
f
/2)
i
i’
j’
Distance measures
w
kk
’
= 1/
s
t,kk’
(t
k’
–
t
k
–
δ
t
)
2
+
1/
s
f,kk’
(f
k’
–
f
k
–
δ
f
)
2
+
2
log
β
s
t,kk’
=
(
Δ
t
k
+
Δ
t’
k
)
s
t
s
f,kk’
=
(
Δ
f
k
+
Δ
f’
k
)
s
f
Scaling
Non

Euclidean
p(y, y’,
c
,
θ
)
/
I(
c
) p
θ
(
θ
)
Π
kk’
(
N(t
k’
–
t
k ;
δ
t ,
s
t,kk’
)
N(f
k’
–
f
k ;
δ
f ,
s
f, kk’
)
β

2
)
c
kk’
Generative model
Expect bumps to appear at about
same frequency
, but
delayed
Frequency shift requires non

linear transformation, less likely than delay
Conjugate priors
for
s
t
and
s
f
(scaled inverse chi

squared):
Improper prior for
δ
t
and
δ
t
:
p(
δ
t
) = 1 = p(
δ
f
)
Prior for parameters
CTR
MCI
Preliminary results for multi

variate model
linear comb of p
c
Probabilistic inference
MATCHING
POINT ESTIMATION
PROBLEM:
Given two bump models, compute (
ρ
spur
,
δ
t
,
s
t
,
δ
f
, s
f
)
APPROACH:
(
c*
,
θ
*
) = argmax
c,
θ
log p(y, y’,
c
,
θ
)
θ
SOLUTION:
Coordinate descent
c
(i+1)
= argmax
c
log p(y, y’,
c
,
θ
(i)
)
θ
(i+1)
= argmax
x
log p(y, y’,
c
(i+1)
,
θ
)
X
Y
Min
x
2
X
,
y
2
Y
d(
x
,
y
)
Generative model
Generate bump model (
hidden
)
•
geometric prior for number n of bumps
p(n) = (1

λ
S) (
λ
S)

n
•
bumps are uniformly distributed in rectangle
•
amplitude, width (in t and f) all
i.i.d
.
Generate
M
“noisy” observations
•
offset between hidden and observed bump
= Gaussian random vector with
mean
(
δ
t,m
/2,
δ
f,m
/2)
covariance
diag
(
s
t,m
/2,
s
f,m
/2)
•
amplitude, width (in t and f) all
i.i.d
.
•
“deletion” with probability p
d
(other prior
p
c0
for cluster size)
y
hidden
y
1
y
2
y
3
y
4
y
5
Parameters:
θ
=
δ
t,m
,
δ
f,m ,
s
t,m
, s
f,m
,
p
c
p
c
(i) = p(cluster size = i y) (i = 1,2,…,M)
(Hebb 1949, Fuster 1997)
Stimuli
Consolidation
Stimulus
Voice
Face
Voice
Role of local synchrony
Assembly activation
Hebbian consolidation
Assembly recall
Probabilistic inference
CLUSTERING (IP or MP)
POINT ESTIMATION
PROBLEM:
Given
M
bump models, compute
θ
=
δ
t,m
,
δ
f,m ,
s
t,m
, s
f,m
,
p
c
APPROACH:
(
c*
,
θ
*
) = argmax
c,
θ
log p(y, y’,
c
,
θ
)
SOLUTION:
Coordinate descent
c
(i+1)
= argmax
c
log p(y, y’,
c
,
θ
(i)
)
θ
(i+1)
= argmax
x
log p(y, y’,
c
(i+1)
,
θ
)
Integer program
•
Max

product algorithm (MP) on sparse graph
•
Integer programming methods (e.g., LP relaxation)
Comments 0
Log in to post a comment