Lewin A¹, Richardson S¹, Marshall C¹, Glazier A² and Aitman T² (2006), Biometrics 62, 1-9.
1: Imperial College Dept. Epidemiology
2: Imperial College Microarray Centre
Bayesian Modelling of Differential Gene Expression
Outline
Introduction to microarrays and differential expression
Bayesian hierarchical model for differential expression
Decision rules
Predictive model checks
Gene Ontology analysis for differentially expressed genes
Further work
Microarrays measure gene expression (mRNA)
(1) Array contains thousands of spots. Millions of strands of DNA of known sequence fixed to each spot
(2) Sample (unknown sequences of cDNA) labelled with fluorescent dye
(3) Matching sequences of DNA and cDNA hybridize together
(4) Array washed: only matching samples left (see which from fluorescent spots)
Pictures courtesy of Affymetrix
DNA TGCT
cDNA ACGA
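The pairing shown on the slide (DNA TGCT against cDNA ACGA) can be sketched in a few lines. This is only an illustration of base-by-base matching as drawn; it ignores strand orientation, and the function names are hypothetical.

```python
# Watson-Crick pairing, position by position, as drawn on the slide.
PAIR = {"A": "T", "T": "A", "C": "G", "G": "C"}

def complement(seq: str) -> str:
    """Return the base-by-base complement of a DNA sequence."""
    return "".join(PAIR[base] for base in seq.upper())

def hybridizes(probe: str, target: str) -> bool:
    """True if every probe base pairs with the target base at the same position."""
    return len(probe) == len(target) and complement(probe) == target.upper()

print(complement("TGCT"))          # ACGA
print(hybridizes("TGCT", "ACGA"))  # True
print(hybridizes("TGCT", "ACGT"))  # False
```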
Microarray experiment to find genes associated with Cd36
Cd36: gene known to be important in insulin resistance
Aitman et al 1999, Nature Genet 21:76-83
Microarray Data
3 SHR compared with 3 transgenic rats (with Cd36)
3 wildtype (normal) mice compared with 3 mice with Cd36 knocked out
12000 genes on each array
Biological Question
Find genes which are expressed differently between animals with and without Cd36.
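Bayesian hierarchical model for differential expression
1st level
$y_{g1r} \mid \alpha_g, \delta_g, \sigma_{g1} \sim N(\alpha_g - \tfrac{1}{2}\delta_g + \beta_{r(g)1},\ \sigma_{g1}^2)$
$y_{g2r} \mid \alpha_g, \delta_g, \sigma_{g2} \sim N(\alpha_g + \tfrac{1}{2}\delta_g + \beta_{r(g)2},\ \sigma_{g2}^2)$
$y_{gsr}$ is log gene expression (gene g, condition s, replicate r)
$\alpha_g$: overall gene expression (fixed effect)
$\delta_g$: differential effect for gene g between 2 conditions (fixed effect or mixture prior)
$\beta_{r(g)s}$: array effect or normalisation (function of $\alpha_g$)
$\sigma_{gs}^2$: variance for each gene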
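A minimal simulation sketch of the first level, assuming normal replicates around an overall level alpha, shifted by minus or plus half the differential effect delta, plus an array effect beta. The function name, the argument names and all numeric values are hypothetical; the array effects are set to zero here for simplicity.

```python
import random
import statistics

def simulate_gene(alpha, delta, sigma1, sigma2, n_rep=3, beta1=0.0, beta2=0.0, rng=random):
    """Draw replicate log expressions for one gene under both conditions.

    Condition 1 mean: alpha - delta/2 + beta1
    Condition 2 mean: alpha + delta/2 + beta2
    """
    y1 = [rng.gauss(alpha - 0.5 * delta + beta1, sigma1) for _ in range(n_rep)]
    y2 = [rng.gauss(alpha + 0.5 * delta + beta2, sigma2) for _ in range(n_rep)]
    return y1, y2

rng = random.Random(0)
y1, y2 = simulate_gene(alpha=7.0, delta=1.0, sigma1=0.2, sigma2=0.2, rng=rng)

# With no array effects, the difference of condition means estimates delta,
# and the average of the two condition means estimates alpha.
delta_hat = statistics.mean(y2) - statistics.mean(y1)
alpha_hat = 0.5 * (statistics.mean(y1) + statistics.mean(y2))
print(round(delta_hat, 2), round(alpha_hat, 2))
```

With only 3 replicates per condition these crude estimates are noisy, which is the motivation for borrowing information across genes in the levels that follow.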
Prior for gene variances
2nd level
$\sigma_{gs}^2 \mid \mu_s, \tau_s \sim \text{logNorm}(\mu_s, \tau_s)$
Hyper-parameters $\mu_s$ and $\tau_s$ can be influential, so these are estimated in the model.
3rd level
$\mu_s \sim N(c, d)$
$\tau_s \sim \text{Gamma}(e, f)$
Variances estimated using information from all measurements (~12000 x 3) rather than just 3
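The three-level structure for the variances can be sketched by forward sampling. The slide does not say how the second arguments are parameterised, so this sketch assumes d is a variance, f is a rate, and tau is the variance of the log of each gene variance; the values of c, d, e and f are hypothetical.

```python
import math
import random

def sample_gene_variances(n_genes, c, d, e, f, rng):
    """Draw mu ~ N(c, d), tau ~ Gamma(e, f), then sigma2_g ~ logNorm(mu, tau).

    Assumptions (not stated on the slide): d is a variance, f is a rate,
    and tau is the variance of log(sigma2_g).
    """
    mu = rng.gauss(c, math.sqrt(d))
    tau = rng.gammavariate(e, 1.0 / f)  # gammavariate takes (shape, scale)
    sd_log = math.sqrt(tau)
    sigma2 = [rng.lognormvariate(mu, sd_log) for _ in range(n_genes)]
    return mu, tau, sigma2

rng = random.Random(42)
mu, tau, sigma2 = sample_gene_variances(n_genes=12000, c=-2.0, d=1.0, e=2.0, f=2.0, rng=rng)
print(len(sigma2), min(sigma2) > 0)
```

All gene variances share the same mu and tau, which is how each gene's variance estimate draws on every gene rather than on its own 3 replicates alone.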
3 wildtype mice
Prior for array effects (Normalization)
Spline Curve
$\beta_{r(g)s}$ = quadratic in $\alpha_g$ for $a_{rs(k-1)} \le \alpha_g \le a_{rs(k)}$
with coeff $(b_{rsk}^{(1)}, b_{rsk}^{(2)})$, k = 1, …, #breakpoints
Locations of break points not fixed
Must do sensitivity checks on # break points
[Figure: Array effect as function of gene effect. Curves: loess and Bayesian posterior mean. Break points $a_0$, $a_1$, $a_2$, $a_3$ marked.]
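A piecewise-quadratic curve with two coefficients per segment can be evaluated as in the sketch below. The parameterisation is an assumption made for illustration: each segment starts from the value reached at its left break point, so the curve is continuous and only a linear and a quadratic coefficient are free per segment. The function name and the example numbers are hypothetical.

```python
from bisect import bisect_right

def piecewise_quadratic(x, breakpoints, coeffs, value_at_start=0.0):
    """Evaluate a continuous piecewise-quadratic curve at x.

    breakpoints: increasing list [a_0, a_1, ..., a_K]
    coeffs: list of K pairs (b1_k, b2_k); on segment k the curve is
            f(a_{k-1}) + b1_k * t + b2_k * t**2 with t = x - a_{k-1}.
    """
    if not breakpoints[0] <= x <= breakpoints[-1]:
        raise ValueError("x outside the range covered by the break points")
    # Index of the segment containing x (the last segment includes its right end).
    k = min(bisect_right(breakpoints, x) - 1, len(coeffs) - 1)
    value = value_at_start
    # Accumulate the value reached at the start of segment k.
    for j in range(k):
        width = breakpoints[j + 1] - breakpoints[j]
        b1, b2 = coeffs[j]
        value += b1 * width + b2 * width ** 2
    b1, b2 = coeffs[k]
    t = x - breakpoints[k]
    return value + b1 * t + b2 * t ** 2

breakpoints = [0.0, 1.0, 2.0]
coeffs = [(1.0, 0.0), (0.0, 1.0)]
print(piecewise_quadratic(1.5, breakpoints, coeffs))  # 1.25
```

In the model the break point locations are themselves unknown, so a curve like this would be re-evaluated for every posterior draw of the break points and coefficients.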
Decision Rules for Inference: Fixed Effects Model
Inference on $\delta$
(1) $d_g = E(\delta_g \mid \text{data})$, posterior mean
Like point estimate of log fold change.
Decision Rule: gene g is DE if $|d_g| > \delta_{cut}$
($\delta_{cut}$: biological interest)
(2) $p_g = P(|\delta_g| > \delta_{cut} \mid \text{data})$, posterior probability (incorporates uncertainty)
Decision Rule: gene g is DE if $p_g > p_{cut}$
($\delta_{cut}$: biological interest; $p_{cut}$: statistical confidence)
This allows biologist to specify what size of effect is interesting (not just statistical significance)
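Both rules can be computed directly from posterior draws of the differential effect for one gene. The sketch below assumes such draws are already available (for example from an MCMC run); the draws, the function names and the choice of natural logarithm for the cutoff are hypothetical.

```python
import math
import statistics

def posterior_mean(draws):
    """Rule (1) summary: point estimate of the log fold change."""
    return statistics.fmean(draws)

def prob_exceeds(draws, delta_cut):
    """Rule (2) summary: fraction of draws with |delta| above the cutoff."""
    return sum(abs(d) > delta_cut for d in draws) / len(draws)

def is_de_by_mean(draws, delta_cut):
    return abs(posterior_mean(draws)) > delta_cut

def is_de_by_prob(draws, delta_cut, p_cut):
    return prob_exceeds(draws, delta_cut) > p_cut

draws = [0.9, 1.1, 0.6, 1.3, 0.8]  # hypothetical posterior draws of delta for one gene
delta_cut = math.log(2)            # size of effect judged biologically interesting

print(is_de_by_mean(draws, delta_cut))       # True
print(prob_exceeds(draws, delta_cut))        # 0.8
print(is_de_by_prob(draws, delta_cut, 0.5))  # True
```

The second rule separates the two choices: the cutoff on the effect size encodes biological interest, while the cutoff on the probability encodes how confident one wants to be.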
Illustration of decision rule
$p_g = P(|\delta_g| > \log(2) \text{ and } \alpha_g > 4 \mid \text{data})$
[Figure: 3 wildtype v. 3 knock-out mice. Legend: x marks $p_g > 0.8$; Δ marks t-statistic > 2.78 (95% CI).]
Predictive Model Checks
Key Points
Predict new data from the model (using the posterior distribution)
Get Bayesian p-value for each gene
Use all genes together (1000's) to assess model fit (p-value distribution close to Uniform if model is good)
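Mixed Predictive Checks
[Diagram: graphical model with nodes $\mu, \tau$; $\sigma_g$; $\sigma_g^{pred}$; $\alpha_g$; $\bar{y}_g$; $S_g$; posterior predictive $S_g$; mixed predictive $S_g$]
Mixed prediction is less conservative than posterior prediction
Bayesian predictive p-values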
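A mixed predictive p-value for one gene's sample variance can be sketched by Monte Carlo. The idea assumed here is that, for each posterior draw of the variance hyperparameters, a new gene variance is drawn from the second-level prior and a new sample variance is then predicted from it; a posterior predictive check would instead reuse the gene's own variance draws. The function name, the log-scale standard deviation parameterisation and the numbers are hypothetical.

```python
import math
import random

def mixed_predictive_pvalue(s2_obs, hyper_draws, n_rep, rng):
    """Monte Carlo p-value P(S2_pred > s2_obs) for one gene.

    hyper_draws: posterior draws (mu, sd_log) of the variance hyperparameters.
    For each draw, a NEW gene variance is drawn from the second-level prior
    (the 'mixed' step), then a new sample variance is predicted from it.
    """
    df = n_rep - 1
    exceed = 0
    for mu, sd_log in hyper_draws:
        sigma2_pred = rng.lognormvariate(mu, sd_log)
        # Sample variance of n_rep normal replicates: sigma2 * chi2_df / df,
        # where chi2_df is a Gamma(shape=df/2, scale=2) draw.
        chi2 = rng.gammavariate(df / 2.0, 2.0)
        s2_pred = sigma2_pred * chi2 / df
        exceed += s2_pred > s2_obs
    return exceed / len(hyper_draws)

rng = random.Random(0)
hyper_draws = [(-2.0, 0.5)] * 2000  # hypothetical posterior draws of (mu, sd_log)
p = mixed_predictive_pvalue(s2_obs=0.1, hyper_draws=hyper_draws, n_rep=3, rng=rng)
print(0.0 <= p <= 1.0)  # True
```

Repeating this for every gene gives thousands of p-values whose histogram can be compared with a Uniform distribution.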
Gene Ontology: network of terms
Links connect more general to more specific terms
Directed Acyclic Graph
~16,000 terms
Picture from Gene Ontology website
Annotations of genes to a node
Each term may have 1000s of genes annotated (or none)
Gene may be annotated to several GO terms
Gene annotated to term A is also annotated to all ancestors of A
Picture from Gene Ontology website
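The rule that a gene annotated to a term is also annotated to all of its ancestors amounts to a traversal of the directed acyclic graph. A minimal sketch, using a hypothetical toy ontology with made-up term and gene names rather than real GO identifiers:

```python
def ancestors(term, parents):
    """All ancestors of a term in a DAG given a child -> list-of-parents map."""
    seen = set()
    stack = list(parents.get(term, []))
    while stack:
        t = stack.pop()
        if t not in seen:
            seen.add(t)
            stack.extend(parents.get(t, []))
    return seen

def propagate(direct, parents):
    """Extend each gene's direct annotations to all ancestor terms."""
    full = {}
    for gene, terms in direct.items():
        closure = set(terms)
        for t in terms:
            closure |= ancestors(t, parents)
        full[gene] = closure
    return full

# Hypothetical toy ontology: term D has two parents, B and C, which share parent A.
parents = {"D": ["B", "C"], "B": ["A"], "C": ["A"]}
direct = {"gene1": ["D"], "gene2": ["B"]}

print(sorted(propagate(direct, parents)["gene1"]))  # ['A', 'B', 'C', 'D']
print(sorted(propagate(direct, parents)["gene2"]))  # ['A', 'B']
```

Because a term can have several parents, the same ancestor may be reached by more than one path; the `seen` set ensures it is counted once.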
GO annotations of genes associated with the insulin-resistance gene Cd36
Compare GO annotations of genes most and least differentially expressed
Most differentially expressed ↔ $p_g$ > 0.5 (280 genes)
Least differentially expressed ↔ $p_g$ < 0.2 (11171 genes)
GO annotations of genes associated with the insulin-resistance gene Cd36
Use Fisher's test to compare GO annotations of genes most and least differentially expressed (one test for each GO term)
None significant with simple multiple testing adjustment, but there are many dependencies
Inflammatory response recently found to be important in insulin resistance
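For a single GO term, Fisher's test on the two gene lists reduces to a hypergeometric tail probability. The sketch below computes a one-sided p-value for over-representation in the most differentially expressed list; the one-sided choice, the Bonferroni-style adjustment, the function names and the annotated counts are assumptions for illustration, while the list sizes 280 and 11171 are taken from the slides.

```python
from math import comb

def fisher_one_sided(k_most, n_most, k_least, n_least):
    """P(X >= k_most) under the hypergeometric null of no association.

    k_most / n_most: genes annotated to the term / total genes in the most-DE list
    k_least / n_least: the same counts for the least-DE list
    """
    total = n_most + n_least
    annotated = k_most + k_least
    denom = comb(total, n_most)
    upper = min(annotated, n_most)
    tail = sum(
        comb(annotated, x) * comb(total - annotated, n_most - x)
        for x in range(k_most, upper + 1)
    )
    return tail / denom

def bonferroni(p, n_tests):
    """One simple multiple testing adjustment: multiply by the number of tests."""
    return min(1.0, p * n_tests)

# Hypothetical counts of annotated genes (12 and 150) for one GO term.
p = fisher_one_sided(k_most=12, n_most=280, k_least=150, n_least=11171)
print(p, bonferroni(p, n_tests=1000))
```

An adjustment of this kind treats the tests as unrelated, whereas GO terms share genes with their ancestors, which is the dependency the slide points to.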
Summary of work in Biometrics paper
Bayesian hierarchical model flexible, estimates variances robustly
Predictive model checks show exchangeable prior good for gene variances
Useful to find GO terms over-represented in the most differentially-expressed genes
BGmix: mixture model for differential expression
Group genes into 3 classes: non-DE, over-expressed, under-expressed
Estimation and classification is simultaneous
Change the prior on the differential expression parameters $\delta_g$
BGmix: mixture model for differential expression
Choice of Null Distribution
True log fold changes = 0
'Nugget' null: true log fold changes = small but not necessarily zero
Choice of DE genes distributions
Gammas
Uniforms
Normal
Outputs
Point estimates (and s.d.) of log fold changes (stabilised and smoothed)
Posterior probability for gene to be in each group
Estimate of proportion of differentially expressed genes based on grouping (parameter of model)
BGmix: mixture model for differential expression
Obtaining gene lists
Threshold on posterior probabilities (posterior probability of classification in the null < threshold → gene is DE)
Estimate of False Discovery Rate for any gene list (estimate = average, over the genes in the list, of the posterior probabilities of classification in the null)
Very simple estimate!
Choice of decision rule:
Bayes Rule
Fix False Discovery Rate
More complex rules for mixture of 3 components
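The gene list and its estimated False Discovery Rate follow directly from the posterior probabilities of classification in the null. A minimal sketch, assuming those probabilities have already been computed; the function names and the probabilities are hypothetical, and the last function is one possible way to fix the False Discovery Rate rather than the rule used in BGmix.

```python
def gene_list(null_probs, threshold):
    """Indices of genes whose posterior null probability is below the threshold."""
    return [g for g, p in enumerate(null_probs) if p < threshold]

def estimated_fdr(null_probs, selected):
    """Average posterior null probability over the selected genes."""
    if not selected:
        return 0.0
    return sum(null_probs[g] for g in selected) / len(selected)

def largest_list_with_fdr(null_probs, target_fdr):
    """Grow the list from the smallest null probability while the estimate stays <= target."""
    order = sorted(range(len(null_probs)), key=lambda g: null_probs[g])
    chosen, running = [], 0.0
    for g in order:
        if (running + null_probs[g]) / (len(chosen) + 1) > target_fdr:
            break
        running += null_probs[g]
        chosen.append(g)
    return chosen

null_probs = [0.01, 0.90, 0.05, 0.30, 0.60]  # hypothetical posterior null probabilities

selected = gene_list(null_probs, threshold=0.5)
print(selected)                                # [0, 2, 3]
print(round(estimated_fdr(null_probs, selected), 2))  # 0.12
print(largest_list_with_fdr(null_probs, 0.1))  # [0, 2]
```

Each selected gene contributes its own probability of being a false discovery, so the average over the list estimates the proportion of false discoveries in it.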
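Predictive Checks for Mixture Model
[Diagram: graphical model for the mixture model with nodes $\mu, \tau$; $\sigma_g$; $\sigma_g^{pred}$; $\alpha_g$; $\delta_g$; $\delta_g^{pred}$; $z_g$; $\eta$; $w$; $\bar{y}_g$; $S_g$; mixed predictive $\bar{y}_g$; mixed predictive $S_g$]
Model checks for differential expression parameters $\delta_g$
More complex for mixture model
Important point: we check each mixture component separately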
Bayesian p-values for Mixture Model
[Figure: panels "Simulated data from incorrect model" and "Simulated data from correct model"]
Acknowledgements
Co-authors
Sylvia Richardson, Clare Marshall (IC Epidemiology)
Tim Aitman, Anne-Marie Glazier (IC Microarray Centre)
Collaborators on BGX Grant
Anne-Mette Hein, Natalia Bochkina (IC Epidemiology)
Helen Causton (IC Microarray Centre)
Peter Green (Bristol)
BBSRC Exploiting Genomics Grant
Papers and Software
Software:
WinBUGS code for model in Biometrics paper
BGmix (R package) includes mixture model
Papers:
BGmix paper, submitted
Paper on predictive checks for mixture prior, in preparation
http://www.bgx.org.uk/