Probabilistic (Bayesian network) modeling of single cell data for elucidation of signaling pathways


Alpha Symposium

June 17, 2005


Karen Sachs

MIT

Lauffenburger Lab


2

From Phospho-molecular profiling to Signaling pathways

High throughput data

[Table: one row per cell (Cell1, Cell2, Cell3, Cell4, ... Cell600); one column per measured molecule (Raf, Erk, p38, PKA, PKC, Jnk, PIP2, PIP3, Plc, Akt, ...)]

Signaling Pathways

Flow Measurements

Picture: John Albeck

3


$$\frac{d[R]}{dt} = k_1[LR] - k_2[R][L]$$
...
Spectrum of Modeling Tools in Systems Biology

4

The modeling approach

“A description of a process that could have generated the observed data”


Biomolecules (Proteins, genes, small molecules: )


Dependence relationships (Activation, repression: )

[Figure: one network over Pr1, Pr2 and Pr3 "or" an alternative network over the same nodes; label: YFM]

5

Model Based Approach

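Abstract class of models + Data

Learning

Specific model: best explains data

analysis

Prediction; Biological insights and understanding

[Figure: the abstract class is drawn as alternative networks over Pr1, Pr2 and Pr3; the specific model is one network over the same nodes]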

6

What is a Bayesian Network?

Protein A

Protein B

Protein C

Protein D

Protein E

A Graph...

+ A Mathematical (probabilistic) description of the connections in the graph

P(B=1|A=0) = 0.8

P(B=1|A=1) = 0.3

If Protein A is Off, Protein B is On with probability 0.8
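
A minimal sketch of what "a graph plus a probabilistic description" can look like in code, for the single edge Protein A -> Protein B. The two conditional probabilities are the ones on the slide; the prior `P_A_ON = 0.5` and all function names are assumptions made only for illustration.

```python
# Conditional probability table (CPT) for the edge Protein A -> Protein B,
# using the two numbers from the slide. States: 0 = Off, 1 = On.
P_B_ON_GIVEN_A = {0: 0.8, 1: 0.3}

# Assumption (not from the slide): a prior for the root node A.
P_A_ON = 0.5


def p_a(a: int) -> float:
    """P(A = a)."""
    return P_A_ON if a == 1 else 1.0 - P_A_ON


def p_b_given_a(b: int, a: int) -> float:
    """P(B = b | A = a), read off the CPT."""
    p_on = P_B_ON_GIVEN_A[a]
    return p_on if b == 1 else 1.0 - p_on


def joint(a: int, b: int) -> float:
    """The graph A -> B factorizes the joint as P(A) * P(B | A)."""
    return p_a(a) * p_b_given_a(b, a)


if __name__ == "__main__":
    for a in (0, 1):
        for b in (0, 1):
            print(f"P(A={a}, B={b}) = {joint(a, b):.2f}")
```

With more proteins, each node would carry one such table conditioned on its parents in the graph, and the joint distribution would be the product of all the tables.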

7

How do we use Bayesian Networks to infer pathways?



Based on data, we can assess how good a model is

Data:

A    B
Off  On
Off  Off
Off  On
Off  On
Off  On
On   Off
On   Off
On   Off
On   Off

[Figure: candidate graphs over Protein A and Protein B (one also including Protein C), labeled "Bad model" and "Good models"]

The Bayesian network does this using probability theory (with more variables and more complex relationships)

8

How do we use Bayesian Networks to infer pathways?

The Technical Details

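$$\mathrm{BayesianScore}(S) = \log P(S \mid D) = \log P(S) + \log P(D \mid S) + c$$

Score candidate models

Use a heuristic search to find high scoring models

$$P(D \mid S) = \int \cdots \int P(D \mid \Theta, S)\, P(\Theta \mid S)\, d\Theta_1 \cdots d\Theta_n = \int \cdots \int P(D, \Theta \mid S)\, d\Theta_1 \cdots d\Theta_n$$

(analytical solution!)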
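
To make the "analytical solution" concrete, here is a hedged sketch of the score for discrete (On/Off) data. It assumes an independent Dirichlet(1, 1) prior on each row of each conditional probability table, a common choice that the slide does not specify, under which the integral over the parameters has a closed form. The function names, the `structure` dictionary format and the toy data are all illustrative.

```python
from collections import Counter
from math import lgamma


def node_log_marginal(data, child, parents, alpha=1.0):
    """log P(D_child | parents) for binary variables.

    Assumes an independent Dirichlet(alpha, alpha) prior on each row of the
    conditional probability table, which gives the closed form used here.
    """
    counts = Counter((tuple(row[p] for p in parents), row[child]) for row in data)
    total = 0.0
    for config in {cfg for cfg, _ in counts}:
        n = [counts[(config, state)] for state in (0, 1)]
        total += lgamma(2 * alpha) - lgamma(2 * alpha + sum(n))
        total += sum(lgamma(alpha + n_k) - lgamma(alpha) for n_k in n)
    return total


def bayesian_score(data, structure, log_prior=0.0):
    """log P(S) + log P(D | S), dropping the constant c.

    `structure` maps each variable name to a tuple of its parents.
    """
    return log_prior + sum(
        node_log_marginal(data, child, parents)
        for child, parents in structure.items()
    )


if __name__ == "__main__":
    # Toy data: B is On in 8 of 10 cells when A is Off, 3 of 10 when A is On.
    data = (
        [{"A": 0, "B": 1}] * 8 + [{"A": 0, "B": 0}] * 2
        + [{"A": 1, "B": 0}] * 7 + [{"A": 1, "B": 1}] * 3
    )
    with_edge = bayesian_score(data, {"A": (), "B": ("A",)})
    no_edge = bayesian_score(data, {"A": (), "B": ()})
    print(f"A -> B: {with_edge:.2f}   no edge: {no_edge:.2f}")
```

On this toy data the structure with the edge A -> B scores higher than the structure without it, which is the sense in which the score separates better models from worse ones.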

9

Model searching by heuristics

For a given set of variables:

Implement a greedy random search:

Pick edge to add, delete or reverse

Retain edge if score increases

Score: -63

Score: -58

Keep edge
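
The search loop above can be sketched as follows. This is an illustration under stated assumptions, not the implementation used in the talk: variables are binary and indexed by integers, the score is a closed-form log marginal likelihood with a uniform prior, and candidate graphs that contain a cycle are rejected. All names are hypothetical.

```python
import random
from collections import Counter
from math import lgamma


def node_score(data, child, parents):
    """Closed-form log marginal likelihood of one binary node (uniform prior)."""
    counts = Counter((tuple(row[p] for p in parents), row[child]) for row in data)
    score = 0.0
    for config in {cfg for cfg, _ in counts}:
        n0, n1 = counts[(config, 0)], counts[(config, 1)]
        score += lgamma(n0 + 1) + lgamma(n1 + 1) - lgamma(n0 + n1 + 2)
    return score


def graph_score(data, edges, n_vars):
    """Sum of node scores; `edges` is a set of (parent, child) pairs."""
    return sum(
        node_score(data, v, sorted(p for p, c in edges if c == v))
        for v in range(n_vars)
    )


def is_acyclic(edges, n_vars):
    """Repeatedly strip nodes that have no incoming edge (Kahn's algorithm)."""
    remaining, nodes = set(edges), set(range(n_vars))
    while nodes:
        roots = {v for v in nodes if not any(c == v for _, c in remaining)}
        if not roots:
            return False
        nodes -= roots
        remaining = {(p, c) for p, c in remaining if p not in roots}
    return True


def greedy_search(data, n_vars, steps=500, seed=0):
    rng = random.Random(seed)
    edges = set()
    best = graph_score(data, edges, n_vars)
    for _ in range(steps):
        parent, child = rng.sample(range(n_vars), 2)
        candidate = set(edges)
        if (parent, child) in candidate:
            candidate.remove((parent, child))      # delete the edge ...
            if rng.random() < 0.5:
                candidate.add((child, parent))     # ... or reverse it
        else:
            candidate.add((parent, child))         # add a new edge
        if not is_acyclic(candidate, n_vars):
            continue
        score = graph_score(data, candidate, n_vars)
        if score > best:                           # retain only if score increases
            edges, best = candidate, score
    return edges, best


if __name__ == "__main__":
    # Toy data: variable 1 copies variable 0; variable 2 is independent of both.
    data = [(a, a, c) for a in (0, 1) for c in (0, 1)] * 10
    print(greedy_search(data, n_vars=3))
```

A purely greedy search can stall in a local optimum; restarts or occasional acceptance of worse moves are common remedies, and nothing here should be read as the talk's exact procedure.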

10

Assessing models

[Figure: candidate graphs over A, B and C]

A    B    C
On   On   On
On   Off  Off
On   On   Off
On   On   On
On   On   On
On   On   On
Off  Off  Off
Off  On   On
Off  Off  Off
Off  Off  Off
Off  Off  Off

11

Assessing models

P(B=‘On’|A=‘On’) = 5/6 = 0.83

12

Assessing models

P(B=‘On’|A=‘On’) = 0.83

P(B=‘Off’|A=‘Off’) = 4/5 = 0.8

13

Assessing models

P(B=‘On’|A=‘On’) = 0.83

P(B=‘Off’|A=‘Off’) = 0.8

P(C=‘On’|A=‘On’) = 4/6 = 0.67

14

Assessing models

P(B=‘On’|A=‘On’) = 0.83

P(B=‘Off’|A=‘Off’) = 0.8

P(C=‘On’|A=‘On’) = 0.67

P(C=‘On’|B=‘On’) = 4/5 = 0.8
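
The conditional probabilities on these slides are simple count ratios. The sketch below reproduces three of them. The eleven rows are read off the slide's A/B/C table row by row, which is an assumption about how the extracted cells line up; the function and variable names are illustrative.

```python
# Observations read from the slide's table, one tuple (A, B, C) per row.
# Assumption: the extracted On/Off cells are ordered row by row.
ROWS = [
    ("On", "On", "On"),
    ("On", "Off", "Off"),
    ("On", "On", "Off"),
    ("On", "On", "On"),
    ("On", "On", "On"),
    ("On", "On", "On"),
    ("Off", "Off", "Off"),
    ("Off", "On", "On"),
    ("Off", "Off", "Off"),
    ("Off", "Off", "Off"),
    ("Off", "Off", "Off"),
]
COLUMN = {"A": 0, "B": 1, "C": 2}


def conditional(target, target_state, given, given_state, rows=ROWS):
    """Count-based estimate of P(target = target_state | given = given_state).

    Returns (hits, total) so the fraction can be shown as on the slides.
    """
    matching = [r for r in rows if r[COLUMN[given]] == given_state]
    hits = sum(r[COLUMN[target]] == target_state for r in matching)
    return hits, len(matching)


if __name__ == "__main__":
    for query in [("B", "On", "A", "On"), ("B", "Off", "A", "Off"), ("C", "On", "A", "On")]:
        hits, total = conditional(*query)
        print(f"P({query[0]}={query[1]} | {query[2]}={query[3]}) = {hits}/{total} = {hits / total:.2f}")
```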

15

What kind of data do we need?

The dependencies we find are probabilistic

Need many observations of the system

Expression data

Population average

mRNA levels are not ideal for inferring signaling pathways

Hard to get large numbers of chips on same cells, same conditions, etc.

12 COLOR FLOW

16

Flow Cytometry: Single Cell Analysis

Thousands of datapoints

17


Conditions (96 well format): perturbation a, perturbation b, ... perturbation n

12 Color Flow Cytometry

Datasets of cells: condition ‘a’, condition ‘b’, ... condition ‘n’

T-Lymphocyte Data

Primary human T-Cells

9 conditions (6 specific interventions)

9 phosphoproteins, 2 phospholipids

600 cells per condition

5400 data-points

18

Using Correlations

[Figure: correlation graph over PKC, Raf, Erk, Mek, Plc, PKA, Akt, Jnk, P38, PIP2, PIP3. Legend: Phospho-Proteins; Phospho-Lipids]

19

Statistical Dependencies

But how can statistical dependencies determine directionality?

[Figure: network over A, B, C, D, E; scatter plot of Phospho A vs. Phospho B]

20

The Power of Interventions

[Figure: scatter plots of Phospho A vs. Phospho B under three conditions (No Manipulations, A inhibited, B inhibited), with the candidate edge directions between A and B]
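
A small simulation can illustrate why interventions reveal direction. It assumes a hypothetical two-node model in which A activates B, with made-up probabilities. Inhibition is modeled by clamping a node to Off. Inhibiting the upstream node changes the downstream one, while inhibiting the downstream node leaves the upstream one unchanged.

```python
import random


def sample_cell(rng, inhibit=None):
    """Draw (A, B) from a hypothetical model A -> B; `inhibit` names a node clamped to 0."""
    a = 0 if inhibit == "A" else int(rng.random() < 0.5)
    # B is On with probability 0.9 when A is On, 0.1 otherwise (illustrative numbers).
    b = 0 if inhibit == "B" else int(rng.random() < (0.9 if a else 0.1))
    return a, b


def mean_levels(inhibit=None, n=10_000, seed=0):
    """Fraction of cells with A On and with B On under the given manipulation."""
    rng = random.Random(seed)
    cells = [sample_cell(rng, inhibit) for _ in range(n)]
    return sum(a for a, _ in cells) / n, sum(b for _, b in cells) / n


if __name__ == "__main__":
    for label, inhibit in [("No manipulations", None), ("A inhibited", "A"), ("B inhibited", "B")]:
        mean_a, mean_b = mean_levels(inhibit)
        print(f"{label:17s} mean A = {mean_a:.2f}   mean B = {mean_b:.2f}")
```

If the true edge were B -> A instead, the pattern would flip: inhibiting B would shift A, and inhibiting A would leave B unchanged.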

21

Dismissing Edges

[Figure: scatter plots of Phospho A vs. Phospho B, Phospho B vs. Phospho C, and Phospho A vs. Phospho C; network over A, B, C, D, E]

Edges A->B and B->C explain the dependence of A and C, dismissing the edge between them
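
The reason the A-C edge can be dismissed is conditional independence: in a chain A -> B -> C, A and C are dependent, but once B is known, A carries no further information about C. The sketch below checks this exactly on a hypothetical chain; the probability tables are made up for illustration.

```python
from itertools import product

# Hypothetical CPTs for the chain A -> B -> C (states: 0 = Off, 1 = On).
P_A = {0: 0.5, 1: 0.5}
P_B_GIVEN_A = {0: {0: 0.9, 1: 0.1}, 1: {0: 0.2, 1: 0.8}}  # P_B_GIVEN_A[a][b]
P_C_GIVEN_B = {0: {0: 0.9, 1: 0.1}, 1: {0: 0.2, 1: 0.8}}  # P_C_GIVEN_B[b][c]

# Full joint distribution implied by the chain.
JOINT = {
    (a, b, c): P_A[a] * P_B_GIVEN_A[a][b] * P_C_GIVEN_B[b][c]
    for a, b, c in product((0, 1), repeat=3)
}


def prob(**fixed):
    """Marginal probability of the given assignments, e.g. prob(a=1, c=1)."""
    return sum(
        p for (a, b, c), p in JOINT.items()
        if all({"a": a, "b": b, "c": c}[k] == v for k, v in fixed.items())
    )


def p_c_on(**given):
    """P(C = 1 | given)."""
    return prob(c=1, **given) / prob(**given)


if __name__ == "__main__":
    # A and C are dependent when B is ignored ...
    print(f"P(C=1 | A=0) = {p_c_on(a=0):.2f}   P(C=1 | A=1) = {p_c_on(a=1):.2f}")
    # ... but A adds nothing once B is known.
    print(f"P(C=1 | A=0, B=1) = {p_c_on(a=0, b=1):.2f}   P(C=1 | A=1, B=1) = {p_c_on(a=1, b=1):.2f}")
```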

22

Context Specificity

[Figure: scatter plots of Phospho B vs. Phospho D, for all cells and for cells where E is high]

B and D seem unrelated

Relationship is revealed by considering simultaneous measurement of E

Demonstrates the need for simultaneous measurements of variables

Pairwise computational analysis (e.g. correlations) insufficient
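
A toy dataset shows how a pairwise correlation can hide a context-specific relationship. The cells below are hypothetical: D tracks B only when E is high, and is unrelated to B otherwise. The Pearson correlation is written out by hand so the sketch has no dependencies.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Hypothetical cells (B, D, E). When E is low (0), every combination of B and D
# occurs once, so they are unrelated; when E is high (1), D equals B.
LEVELS = [1.0, 2.0, 3.0, 4.0, 5.0]
CELLS = [(b, d, 0) for b in LEVELS for d in LEVELS] + [(b, b, 1) for b in LEVELS]


def corr_b_d(cells):
    return pearson([b for b, _, _ in cells], [d for _, d, _ in cells])


if __name__ == "__main__":
    high_e = [cell for cell in CELLS if cell[2] == 1]
    print(f"all cells:    r = {corr_b_d(CELLS):.2f}")
    print(f"E high only:  r = {corr_b_d(high_e):.2f}")
```

The weak overall correlation comes from pooling the two contexts; conditioning on E, which requires E to be measured in the same cell, recovers the dependence.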

23

Indirect Edges


What would happen if B was not measured?

[Figure: network over A, B, C, D, E; scatter plot of Phospho A vs. Phospho C]

24

Overview

Conditions (96 well format): perturbation a, perturbation b, ... perturbation n

Multiparameter Flow Cytometry

Datasets of cells: condition ‘a’, condition ‘b’, ... condition ‘n’

Bayesian Network Analysis

Influence diagram of measured variables

25

Conditions


Stimuli: anti-CD3, anti-CD28, ICAM-2

Inhibitors to: Akt, PKC, PIP3, Mek

Activators of: PKC, PKA

26

Inferred Network

[Figure: inferred network over PKC, Raf, P44/42, Mek, Plc, PKA, Akt, Jnk, P38, PIP2, PIP3. Legend: Phospho-Proteins; Phospho-Lipids; Perturbed in data]

27

How well did we do?

Direct phosphorylation

[Figure: inferred network over PKC, Raf, P44/42, Mek, Plc, PKA, Akt, Jnk, P38, PIP2, PIP3. Legend: Phospho-Proteins; Phospho-Lipids; Perturbed in data]

28

Features of Approach


Direct phosphorylation: Mek -> Erk

Difficult to detect using other forms of high-throughput data:

- Protein-protein interaction data

- Microarrays

29

How well did we do?

[Figure: inferred network over PKC, Raf, P44/42, Mek, Plc, PKA, Akt, Jnk, P38, PIP2, PIP3. Legend: Phospho-Proteins; Phospho-Lipids; Perturbed in data]

30

How well did we do?

Indirect Signaling

[Figure: inferred network over PKC, Raf, P44/42, Mek, Plc, PKA, Akt, Jnk, P38, PIP2, PIP3. Legend: Phospho-Proteins; Phospho-Lipids; Perturbed in data]

31

Features of Approach


Indirect signaling

[Figure: inferred edge PKC -> Jnk; underlying path PKC -> Mapkkk -> Mapkk -> Jnk, with Mapkkk and Mapkk not measured]

Indirect connections can be found even when the intermediate molecule(s) are not measured

Explaining away

[Figure: Raf -> Mek -> Erk]

Unnecessary edges do NOT appear

32

Indirect signaling - Complex example

Is this a mistake?

[Figure: inferred edges among PKC, Raf and Mek]

The real picture

[Figure: PKC, Ras, Raf s259, Raf s497 and Mek]

Phospho-protein specific

More than one pathway of influence

33

How well did we do?

[Figure: inferred network over PKC, Raf, P44/42, Mek, Plc, PKA, Akt, Jnk, P38, PIP2, PIP3. Edge legend: Expected Pathway. Node legend: Phospho-Proteins; Phospho-Lipids; Perturbed in data]

15/17 Classic

34

How well did we do?

[Figure: inferred network over PKC, Raf, P44/42, Mek, Plc, PKA, Akt, Jnk, P38, PIP2, PIP3. Edge legend: Expected Pathway; Reported; Missed; Reversed. Node legend: Phospho-Proteins; Phospho-Lipids; Perturbed in data]

15/17 Classic

17/17 Reported

3 Missed

35

Prediction

Erk1/2 unperturbed

Erk influence on Akt previously reported in colon cancer cell lines

Predictions:

Erk1/2 influences Akt

While correlated, Erk1/2 does not influence PKA

[Figure: subnetwork over PKC, Raf, Mek, Erk1/2, Akt, PKA]

36

Validation

siRNA on Erk1/Erk2

Select transfected cells

Measure Akt and PKA

[Figure: flow cytometry histograms comparing "control, stimulated" with "Erk1 siRNA, stimulated". Panel P-Akt (axis APC-A: p-akt-647): P=9.4e-5. Panel P-PKA (axis PE-A: p-pka-546): P=0.28]

37

Power of Interventions

[Figure: network over PKC, Raf, Erk, Mek, Plc, PKA, Akt, Jnk, P38, PIP2, PIP3]

Dataset: 1200 samples: 2 conditions, no interventions

              A. Lacking           Complete
              Intervention data    Dataset
Expected      7/10                 15/17
Reported      1/10                 2/17
Reversed      N/A                  1
Unexplained   2                    0
Missed        11                   3

38

Power of Large Dataset

[Figure: network over PKC, Raf, Erk, Mek, Plc, PKA, Akt, Jnk, P38, PIP2, PIP3]

Dataset: 420 samples: 14 conditions, 30 random cells each

              A. Lacking           B. Truncated    Complete
              Intervention data    data            Dataset
Expected      7/10                 7/14            15/17
Reported      1/10                 1/14            2/17
Reversed      N/A                  4               1
Unexplained   2                    6               0
Missed        11                   10              3

39

Power of Single Cell

[Figure: network over PKC, Raf, Erk, Mek, Plc, PKA, Akt, Jnk, P38, PIP2, PIP3]

Simulated western blot: 420 samples: 14 conditions, each point average of 20 random cells

              A. Lacking           B. Truncated    C. “Western    Complete
              Intervention data    data            blot”          Dataset
Expected      7/10                 7/14            6/16           15/17
Reported      1/10                 1/14            1/16           2/17
Reversed      N/A                  4               3              1
Unexplained   2                    6               8              0
Missed        11                   10              12             3
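
The two reduced datasets can be built from single-cell data roughly as follows. This is a sketch of the sampling described on the slides (30 random cells per condition for the truncated set; averages of 20 random cells per point for the simulated western blot), not the code used for the talk. The input layout, a dictionary from condition name to a list of per-cell tuples, and the placeholder data are assumptions.

```python
import random
from statistics import fmean


def truncate(cells_by_condition, n_cells=30, seed=0):
    """Keep `n_cells` random single cells per condition."""
    rng = random.Random(seed)
    return {c: rng.sample(cells, n_cells) for c, cells in cells_by_condition.items()}


def simulated_western(cells_by_condition, n_points=30, cells_per_point=20, seed=0):
    """Replace single cells by averages over small random groups of cells."""
    rng = random.Random(seed)
    out = {}
    for condition, cells in cells_by_condition.items():
        points = []
        for _ in range(n_points):
            group = rng.sample(cells, cells_per_point)
            # Average each measured variable across the group.
            points.append(tuple(fmean(column) for column in zip(*group)))
        out[condition] = points
    return out


if __name__ == "__main__":
    # Placeholder data: 14 conditions x 600 cells x 11 measured variables.
    rng = random.Random(1)
    data = {
        f"condition_{i}": [tuple(rng.random() for _ in range(11)) for _ in range(600)]
        for i in range(14)
    }
    truncated = truncate(data)
    western = simulated_western(data)
    print(sum(len(v) for v in truncated.values()), "truncated samples")
    print(sum(len(v) for v in western.values()), "simulated western blot samples")
```

Both reduced sets have 14 x 30 = 420 samples, so comparing them isolates the effect of averaging over cells from the effect of sample size.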

40

Summary


Proof of principle: Automated reconstruction of signaling pathway in human cells

Advantages:

In vivo

Directed edges (causality)

Detects direct and indirect influences

Single cell

Choose sub-populations of interest

Disadvantages:

Static, cells fixed and stained

Acyclic

Sachs et al, Science 2005

41

What next?


Can be applied to less studied pathways, given an initial guess of related players

Dynamic models over time

Comparison of influences (sub-populations, disease states, treatment conditions, cell types)

Larger network from overlapping sets

Associate phospho-state with cell-fate, cellular decisions.

42

Comparison of signaling pathways


Often, the same players participate in very different outcomes

Similar cues may give different outcomes in context-dependent manner (cell lines, disease states, doses)

What characterizes the differences leading to different outcomes?

[Figure: Cue1 and Cue2 leading to Response1 and Response2]

43

Differences in signaling


Starting with an ‘Averaged model’

Which edges dominate under which condition?

Are certain paths invoked more under different conditions? (e.g. if there are multiple paths to ‘C’)

Can we find qualitative and quantitative differences in the dependencies? (hidden nodes)


44

Cell fate across treatment conditions

45

Different conditions & outcomes


ICAM2 has a ‘protective’ (or delaying) effect on apoptosis under conditions of CD3, CD28 stimulation

What differences in signaling exist between the two conditions?


46

Averaged Network

[Figure: averaged network over PKC, Raf, P44/42, Mek, Plc, PKA, Akt, Jnk, P38, PIP2, PIP3. Legend: Phospho-Proteins; Phospho-Lipids; Perturbed in data]

47

Which edges are stronger?

48

Arc strengths - preliminary results

[Figure: network over PKC, Raf, P44/42, Mek, Plc, PKA, Akt, Jnk, P38, PIP2, PIP3. Edge legend: stronger under condition inducing more apoptosis; stronger under condition protecting from apoptosis; no difference in signal strength]

49

Differences in dependencies?

50

Acknowledgements

Omar Perez

Garry Nolan

Dana Pe’er



DAL lab

LGG lab

Gifford group

John Albeck

Doug Lauffenburger

52

Need Diverse Computational Mining & Modeling Approaches

[Figure: spectrum of approaches on an axis from ABSTRACTED to SPECIFIED. Approaches shown: statistical mining, Bayesian networks, Markov chains, Boolean / Fuzzy logic, differential equations, + molecular structure / interaction models. What they capture: relationships, influences, mechanisms]

53

What kind of data do we need?

The dependencies we find are probabilistic

[Table: paired On/Off observations of A and B, labeled "Data:", with one observation annotated "noise"]


Need many observations of the system

Expression data

Population average

mRNA levels are not ideal for inferring signaling pathways

Hard to get large numbers of chips on same cells, same conditions, etc.

12 COLOR FLOW