ppt - Bayesian Gene Expression


Nov 30, 2013


Lewin A¹, Richardson S¹, Marshall C¹, Glazier A² and Aitman T² (2006), Biometrics 62, 1-9.


1: Imperial College Dept. Epidemiology

2: Imperial College Microarray Centre


Bayesian Modelling of Differential Gene Expression


Outline

Introduction to microarrays and differential expression

Bayesian hierarchical model for differential expression

Decision rules

Predictive model checks

Gene Ontology analysis for differentially expressed genes

Further work

Microarrays measure gene expression (mRNA)

(1) Array contains thousands of spots. Millions of strands of DNA of known sequence fixed to each spot.

(2) Sample (unknown sequences of cDNA) labelled with fluorescent dye

(3) Matching sequences of DNA and cDNA hybridize together

(4) Array washed: only matching samples left (see which from fluorescent spots)

Pictures courtesy of Affymetrix

DNA  TGCT
cDNA ACGA
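The pairing shown above (TGCT against ACGA) follows the usual base-pairing rule, A with T and C with G. A minimal sketch of that rule follows; the function names are hypothetical, and strand direction is ignored, as it is in the slide.

```python
# Watson-Crick base pairing: A pairs with T, C pairs with G.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def complement(strand: str) -> str:
    """Return the base-by-base complement of a DNA strand (no reversal)."""
    return "".join(COMPLEMENT[base] for base in strand)


def hybridizes(probe: str, sample: str) -> bool:
    """True if the sample is the exact base-by-base complement of the probe."""
    return len(probe) == len(sample) and complement(probe) == sample


print(complement("TGCT"))          # ACGA
print(hybridizes("TGCT", "ACGA"))  # True
```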

Microarray experiment to find genes associated with Cd36

Cd36: gene known to be important in insulin resistance

Aitman et al 1999, Nature Genet 21:76-83

Microarray Data

3 SHR compared with 3 transgenic rats (with Cd36)

3 wildtype (normal) mice compared with 3 mice with Cd36 knocked out

12000 genes on each array

Biological Question

Find genes which are expressed differently between animals with and without Cd36.




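Bayesian hierarchical model for differential expression

1st level

$$y_{g1r} \mid \alpha_g, \delta_g, \sigma_{g1} \sim N(\alpha_g - \tfrac{1}{2}\delta_g + \beta_{r(g)1},\ \sigma_{g1}^2)$$

$$y_{g2r} \mid \alpha_g, \delta_g, \sigma_{g2} \sim N(\alpha_g + \tfrac{1}{2}\delta_g + \beta_{r(g)2},\ \sigma_{g2}^2)$$

$y_{gsr}$ is log gene expression (gene $g$, condition $s$, replicate $r$)

$\alpha_g$: overall gene expression (fixed effect)

$\delta_g$: differential effect for gene g between 2 conditions (fixed effect or mixture prior)

$\beta_{r(g)s}$: array effect or normalisation (function of $\alpha_g$)

$\sigma_{gs}^2$: variance for each gene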


Prior for gene variances

2nd level

$$\sigma_{gs}^2 \mid \mu_s, \tau_s \sim \text{logNorm}(\mu_s, \tau_s)$$

Hyper-parameters $\mu_s$ and $\tau_s$ can be influential, so these are estimated in the model.

3rd level

$$\mu_s \sim N(c, d)$$

$$\tau_s \sim \text{Gamma}(e, f)$$

Variances estimated using information from all measurements (~12000 x 3) rather than just 3
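To make the first two levels concrete, here is a minimal generative sketch for a single gene. It is not the authors' code. Two choices are assumptions made only for illustration: the second log-normal parameter is treated as the variance of log(sigma^2), and the array effect is set to zero. The function name and the numeric values are hypothetical.

```python
import math
import random


def simulate_gene(alpha_g, delta_g, mu, tau, n_rep=3, rng=None):
    """Simulate log expression for one gene under conditions 1 and 2.

    Assumptions for illustration only: tau is the variance of log(sigma^2),
    and the array (normalisation) effect is zero.
    """
    if rng is None:
        rng = random.Random()
    data = {}
    for s, sign in ((1, -0.5), (2, 0.5)):
        # 2nd level: gene- and condition-specific variance from a log-normal.
        sigma2 = rng.lognormvariate(mu, math.sqrt(tau))
        # 1st level: replicates centred on alpha_g -/+ delta_g / 2.
        mean = alpha_g + sign * delta_g
        data[s] = [rng.gauss(mean, math.sqrt(sigma2)) for _ in range(n_rep)]
    return data


sim = simulate_gene(alpha_g=7.0, delta_g=1.0, mu=-2.0, tau=0.5,
                    rng=random.Random(0))
for condition, values in sim.items():
    print(condition, [round(v, 2) for v in values])
```

With only 3 replicates per condition, the per-gene sample variance of such data is very noisy, which is the motivation for sharing information across genes through the second level.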

3 wildtype mice

Prior for array effects (Normalization)

Spline Curve

Array effect $\beta_{r(g)s}$ = quadratic in $\alpha_g$ (overall gene expression) for $a_{rs(k-1)} \le \alpha_g \le a_{rs(k)}$, with coeff ($b_{rsk}^{(1)}$, $b_{rsk}^{(2)}$), k = 1, …, #breakpoints

Locations of break points not fixed

Must do sensitivity checks on # break points

[Figure: Array effect as function of gene effect. Curves: loess and Bayesian posterior mean; break points $a_0$, $a_1$, $a_2$, $a_3$.]

Decision Rules for Inference: Fixed Effects Model

Inference on δ

(1) $d_g = E(\delta_g \mid \text{data})$, posterior mean

Like point estimate of log fold change.

Decision Rule: gene g is DE if $|d_g| > \delta_{cut}$

(2) $p_g = P(|\delta_g| > \delta_{cut} \mid \text{data})$, posterior probability (incorporates uncertainty)

Decision Rule: gene g is DE if $p_g > p_{cut}$

This allows the biologist to specify what size of effect is interesting (not just statistical significance)
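Both summaries can be estimated from posterior draws of δ_g (for example, MCMC output). The sketch below assumes such draws are available as a plain list; the function name, the draws, and the cut-offs are hypothetical, and taking log(2) as a natural logarithm is also an assumption.

```python
import math


def posterior_summaries(delta_draws, delta_cut):
    """Return (d_g, p_g) estimated from posterior draws of delta_g."""
    n = len(delta_draws)
    d_g = sum(delta_draws) / n                               # posterior mean
    p_g = sum(abs(d) > delta_cut for d in delta_draws) / n   # P(|delta_g| > cut | data)
    return d_g, p_g


draws = [0.9, 1.1, 0.5, 0.8, 1.2]   # hypothetical posterior draws for one gene
delta_cut = math.log(2)             # effect size of biological interest
p_cut = 0.7                         # required posterior probability

d_g, p_g = posterior_summaries(draws, delta_cut)
print("rule (1), |d_g| > delta_cut:", abs(d_g) > delta_cut)
print("rule (2), p_g > p_cut:", p_g > p_cut)
```

Rule (2) uses the whole posterior rather than a point estimate, so a gene with a large but very uncertain fold change can pass rule (1) and still fail rule (2).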

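Illustration of decision rule

$p_g = P(|\delta_g| > \log(2) \text{ and } \alpha_g > 4 \mid \text{data})$

$|\delta_g| > \log(2)$: biological interest

$\alpha_g > 4$: biological interest

$p_g > 0.8$: statistical confidence

[Figure: 3 wildtype v. 3 knock-out mice. Legend: x marks genes with $p_g > 0.8$; Δ marks genes with t-statistic > 2.78 (95% CI).]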



Predictive Model Checks

Key Points

Predict new data from the model (using the posterior distribution)

Get Bayesian p-value for each gene

Use all genes together (1000’s) to assess model fit (p-value distribution close to Uniform if model is good)
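The key points above can be sketched as follows, assuming that predicted values of a per-gene statistic are already available for each gene as a list of draws. The function names are hypothetical, and the one-sided "greater than" convention for the p-value is an assumption.

```python
def bayesian_p_value(predicted_draws, observed):
    """Fraction of predicted statistics that exceed the observed one."""
    return sum(s > observed for s in predicted_draws) / len(predicted_draws)


def histogram(p_values, n_bins=10):
    """Count p-values in equal-width bins on [0, 1]."""
    counts = [0] * n_bins
    for p in p_values:
        # p == 1.0 is placed in the last bin.
        counts[min(int(p * n_bins), n_bins - 1)] += 1
    return counts


print(bayesian_p_value([1, 2, 3, 4], 2.5))   # 0.5
print(histogram([0.05, 0.15, 0.95, 1.0]))    # [1, 1, 0, 0, 0, 0, 0, 0, 0, 2]
```

With thousands of genes, roughly equal counts across the bins are what a well-fitting model would be expected to give; a pile-up near 0 or 1 points to misfit.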

Mixed Predictive Checks

[Diagram: graphical model; nodes include $\mu, \tau$, $\sigma_g$, $\sigma_g^{pred}$, $\bar{y}_g$, $S_g$, post. pred. $S_g$ and mixed pred. $S_g$]

Mixed prediction is less conservative than posterior prediction

Bayesian predictive p-values



Gene Ontology: network of terms

Picture from Gene Ontology website

Links connect more general to more specific terms

Directed Acyclic Graph

~16,000 terms

Annotations of genes to a node

Picture from Gene Ontology website

Each term may have 1000s of genes annotated (or none)

Gene may be annotated to several GO terms

Gene annotated to term A → annotated to all ancestors of A
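The rule that an annotation to term A implies annotation to all ancestors of A can be sketched as a traversal of the graph. The term names, gene names, and the toy graph below are hypothetical and are not taken from the Gene Ontology.

```python
def ancestors(term, parents):
    """All terms reachable from `term` by following parent links."""
    found = set()
    stack = [term]
    while stack:
        current = stack.pop()
        for parent in parents.get(current, ()):
            if parent not in found:
                found.add(parent)
                stack.append(parent)
    return found


def propagate(direct, parents):
    """Extend each gene's direct annotations with all ancestor terms."""
    full = {}
    for gene, terms in direct.items():
        closure = set(terms)
        for term in terms:
            closure |= ancestors(term, parents)
        full[gene] = closure
    return full


# Toy graph: A has two parents (B and C), which share the parent "root".
parents = {"A": ["B", "C"], "B": ["root"], "C": ["root"]}
direct = {"gene1": ["A"], "gene2": ["B"]}

full = propagate(direct, parents)
print(sorted(full["gene1"]))  # ['A', 'B', 'C', 'root']
print(sorted(full["gene2"]))  # ['B', 'root']
```

Because a term can have several parents, the structure is a directed acyclic graph and not a tree, which is why visited terms are tracked in a set.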

GO annotations of genes associated with the insulin-resistance gene Cd36

Compare GO annotations of genes most and least differentially expressed

Most differentially expressed: $p_g$ > 0.5 (280 genes)

Least differentially expressed: $p_g$ < 0.2 (11171 genes)

GO annotations of genes associated with the insulin-resistance gene Cd36

Use Fisher’s test to compare GO annotations of genes most and least differentially expressed (one test for each GO term)

None significant with simple multiple testing adjustment, but there are many dependencies

Inflammatory response recently found to be important in insulin resistance
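For a single GO term, the comparison of most and least differentially expressed genes reduces to a 2x2 table. The sketch below computes a one-sided Fisher exact p-value for over-representation from the hypergeometric distribution. The function name and the annotation counts (12 and 150) are hypothetical; only the group sizes 280 and 11171 come from the slides, and the one-sided form is an assumption.

```python
from math import comb


def fisher_over_representation(a, b, c, d):
    """One-sided Fisher exact p-value for over-representation of a term.

    a: most-DE genes annotated to the term      b: most-DE genes not annotated
    c: least-DE genes annotated to the term     d: least-DE genes not annotated
    """
    n_most = a + b
    n_annot = a + c
    n_total = a + b + c + d
    k_max = min(n_most, n_annot)
    # Probability of seeing a or more annotated genes among the most-DE genes.
    numerator = sum(
        comb(n_annot, k) * comb(n_total - n_annot, n_most - k)
        for k in range(a, k_max + 1)
    )
    return numerator / comb(n_total, n_most)


# Hypothetical term: 12 of 280 most-DE genes and 150 of 11171 least-DE genes.
p_value = fisher_over_representation(12, 280 - 12, 150, 11171 - 150)
print(p_value)
```

One such test is run per GO term, so the resulting p-values need a multiple testing adjustment, and the tests are dependent because annotations propagate to ancestor terms.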

Summary of work in Biometrics paper


Bayesian hierarchical model flexible, estimates variances robustly

Predictive model checks show exchangeable prior good for gene variances

Useful to find GO terms over-represented in the most differentially-expressed genes



BGmix: mixture model for differential expression

Change the prior on the differential expression parameters $\delta_g$

Group genes into 3 classes:
non-DE
over-expressed
under-expressed

Estimation and classification is simultaneous

BGmix: mixture model for differential expression

Choice of Null Distribution

True log fold changes = 0

‘Nugget’ null: true log fold changes = small but not necessarily zero

Choice of DE genes distributions

Gammas
Uniforms
Normal

Outputs

Point estimates (and s.d.) of log fold changes (stabilised and smoothed)

Posterior probability for gene to be in each group

Estimate of proportion of differentially expressed genes based on grouping (parameter of model)

BGmix: mixture model for differential expression

Obtaining gene lists

Threshold on posterior probabilities (posterior probability of classification in the null < threshold → gene is DE)

Estimate of False Discovery Rate for any gene list (estimate = average, over the genes in the list, of the posterior probabilities of classification in the null)

Very simple estimate!

Choice of decision rule:

Bayes Rule
Fix False Discovery Rate
More complex rules for mixture of 3 components
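The thresholding rule and the FDR estimate can be sketched directly from per-gene posterior probabilities of belonging to the null component. This is an illustrative sketch, not BGmix code; the function names, gene names, and probabilities are hypothetical.

```python
def gene_list(null_probs, threshold):
    """Genes whose posterior probability of being null is below the threshold."""
    return [g for g, p in null_probs.items() if p < threshold]


def estimated_fdr(null_probs, genes):
    """Average posterior null probability over the genes in the list."""
    if not genes:
        return 0.0
    return sum(null_probs[g] for g in genes) / len(genes)


def largest_list_at_fdr(null_probs, target_fdr):
    """Longest list, ranked by null probability, with estimated FDR <= target."""
    ranked = sorted(null_probs, key=null_probs.get)
    chosen, total = [], 0.0
    for g in ranked:
        total += null_probs[g]
        # The running mean never decreases along the ranking, so stop here.
        if total / (len(chosen) + 1) > target_fdr:
            break
        chosen.append(g)
    return chosen


null_probs = {"g1": 0.01, "g2": 0.05, "g3": 0.6, "g4": 0.09}
selected = gene_list(null_probs, threshold=0.1)
print(selected)                              # ['g1', 'g2', 'g4']
print(largest_list_at_fdr(null_probs, 0.1))  # ['g1', 'g2', 'g4']
print(estimated_fdr(null_probs, selected))
```

The first function corresponds to thresholding the posterior probabilities; the third corresponds to fixing the False Discovery Rate and finding the list that meets it.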

BGmix: mixture model for differential expression

Predictive Checks for Mixture Model

Model checks for differential expression parameters $\delta_g$

More complex for mixture model

Important point: we check each mixture component separately

[Diagram: graphical model; nodes include $\mu, \tau$, $\sigma_g$, $\sigma_g^{pred}$, $z_g$, $\eta$, $w$, $\bar{y}_g$, $S_g$, mixed pred. $\bar{y}_g$ and mixed pred. $S_g$]

Bayesian p-values for Mixture Model

[Figure panels: Simulated data from incorrect model; Simulated data from correct model]

Acknowledgements

Co-authors

Sylvia Richardson, Clare Marshall (IC Epidemiology)
Tim Aitman, Anne-Marie Glazier (IC Microarray Centre)

Collaborators on BGX Grant

Anne-Mette Hein, Natalia Bochkina (IC Epidemiology)
Helen Causton (IC Microarray Centre)
Peter Green (Bristol)


BBSRC Exploiting Genomics Grant

Papers and Software

Software:

WinBUGS code for model in Biometrics paper
BGmix (R package) includes mixture model

Papers:

BGmix paper, submitted
Paper on predictive checks for mixture prior, in preparation


http://www.bgx.org.uk/