Using Bayesian Networks to Analyze Expression Data
By Nir Friedman, Michal Linial, Iftach Nachman, Dana Pe'er (2000)
Presented by
Nikolaos Aravanis
Lysimachos Zografos
Alexis Ioannou
Outline
• Introduction
• Bayesian Networks
• Application to expression data
• Application to cell cycle expression patterns
• Discussion and future work
The Road to Microarray Data Analysis
• Development of microarrays
– Measure all the genes of an organism
• Enormous amount of data
• Challenge: analyze the datasets and infer biological interactions
Most Common Analysis Tool
• Clustering algorithms
• Allocate groups of genes with similar expression patterns over a set of experiments
• Discover genes that are co-regulated
Problems
• Data give only a partial picture
– Key events are not reflected (translation and protein (in)activation)
• The small number of samples gives little information for constructing fully detailed models
• With current technologies, even these few samples have a high noise-to-signal ratio
Possible Solution
• Analyze gene expression patterns to uncover properties of the transcriptional program
• Examine dependence and conditional independence in the data
• Bayesian Networks
Bayesian Networks
• Represent the dependence structure between multiple interacting quantities
• Capable of handling noise and estimating the confidence in the different features of the network
• Focus on interactions whose signal is strong
• Useful for describing processes composed of locally interacting components
• The statistical foundations for learning Bayesian networks are well understood and have been successfully applied
• Provide models of causal influence
Informal Introduction to Bayesian Networks
• Let P(X,Y) be a joint distribution over variables X and Y
• X and Y are independent if P(X,Y) = P(X)P(Y) for all values of X and Y
• Gene A is a transcription factor of gene B
• We expect their expression levels to be dependent
• A is a parent of B
• B is a transcription factor of C
• The expression levels of each pair are dependent
• If A does not directly affect C, then fixing the expression level of B makes A and C independent
• P(A|B,C) = P(A|B) (A and C are conditionally independent given B): I(A;C|B)
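The chain A → B → C above can be checked numerically. In the toy simulation below (binary expression levels, made-up probabilities), A and C are dependent overall, but become independent once B is fixed:

```python
import random

random.seed(0)

# Hypothetical chain A -> B -> C; the probabilities are invented for illustration.
def sample():
    a = 1 if random.random() < 0.5 else 0
    b = 1 if random.random() < (0.9 if a else 0.1) else 0   # B depends on A
    c = 1 if random.random() < (0.8 if b else 0.2) else 0   # C depends only on B
    return a, b, c

data = [sample() for _ in range(200_000)]

# A and C are marginally dependent: P(A,C) != P(A)P(C)
p_a1 = sum(a for a, b, c in data) / len(data)
p_c1 = sum(c for a, b, c in data) / len(data)
p_a1c1 = sum(a and c for a, b, c in data) / len(data)
print(abs(p_a1c1 - p_a1 * p_c1) > 0.01)          # True: dependent

# ...but conditionally independent given B: P(A,C|B=1) ~= P(A|B=1) P(C|B=1)
given_b1 = [(a, c) for a, b, c in data if b == 1]
p_a1_b = sum(a for a, c in given_b1) / len(given_b1)
p_c1_b = sum(c for a, c in given_b1) / len(given_b1)
p_a1c1_b = sum(a and c for a, c in given_b1) / len(given_b1)
print(abs(p_a1c1_b - p_a1_b * p_c1_b) < 0.01)    # True: independent given B
```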
Informal Introduction to Bayesian Networks (cont'd)
• A key component of Bayesian networks is that each variable is a stochastic function of its parents
• Stochastic models are natural in the gene expression domain
• The biological models we want to capture are stochastic
• Measurements are noisy
Representing Distributions with Bayesian Networks
• Representation of a joint probability distribution consisting of 2 components:
• A directed acyclic graph G
• A conditional distribution for each variable given its parents in G
• G encodes the Markov assumption: each variable Xi is independent of its non-descendants given its parents in G, I(Xi; NonDescendants(Xi) | Parents(Xi))
• By applying the chain rule, the joint distribution decomposes into the product form
P(X1, ..., Xn) = ∏i P(Xi | Parents(Xi))
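As an illustration of this product form, here is a minimal sketch with a hypothetical 3-gene DAG (A → B, A → C); all conditional probability tables are invented. The joint built from the local models is a proper distribution:

```python
# Hypothetical DAG: A -> B, A -> C; CPTs invented for illustration.
p_a = {1: 0.3, 0: 0.7}                      # P(A)
p_b_given_a = {(1, 1): 0.9, (1, 0): 0.2}    # P(B=1 | A=a)
p_c_given_a = {(1, 1): 0.6, (1, 0): 0.1}    # P(C=1 | A=a)

def joint(a, b, c):
    """P(A=a, B=b, C=c) as a product over the DAG's local models."""
    pb = p_b_given_a[(1, a)]
    pc = p_c_given_a[(1, a)]
    return p_a[a] * (pb if b else 1 - pb) * (pc if c else 1 - pc)

# The decomposition defines a proper distribution: it sums to 1.
total = sum(joint(a, b, c) for a in (0, 1) for b in (0, 1) for c in (0, 1))
print(round(total, 10))  # 1.0
```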
Equivalence Classes of BNs
A BN implies further independence assumptions => Ind(G)
More than one graph can imply the same assumptions => networks are equivalent if Ind(G) = Ind(G')
Example: Ind(G) = Ind(G') = Ø
Equivalence Classes of BNs
For equivalent networks:
• DAGs have the same underlying undirected graph
• PDAGs (partially directed acyclic graphs) are used to represent them
[Figure: equivalent DAGs and the PDAG representing them; the disagreeing edge is left undirected]
Learning BNs
Question: Given a dataset D, which BN B = <G, Θ> best matches D?
Answer: Use a statistically motivated scoring function to evaluate each BN, e.g. the Bayesian score
S(G:D) = log P(G|D) = log P(D|G) + log P(G) + C,
where C is a constant independent of G, and
P(D|G) = ∫ P(D|G,Θ) P(Θ|G) dΘ
is the marginal likelihood, averaging over all parameters for G.
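A small sketch of how such a decomposable Bayesian score can be computed, assuming Dirichlet parameter priors (the closed-form per-family term underlying BDe-style scores). The toy data and the two candidate structures are invented for illustration:

```python
import random
from collections import Counter
from math import lgamma

def family_log_marglik(child_counts, alpha=1.0):
    """Log marginal likelihood of one (variable, parent-config) family under
    a symmetric Dirichlet(alpha) prior -- the term that makes the score
    decomposable. child_counts maps child value -> observed count."""
    n = sum(child_counts.values())
    k = len(child_counts)
    score = lgamma(k * alpha) - lgamma(k * alpha + n)
    for c in child_counts.values():
        score += lgamma(alpha + c) - lgamma(alpha)
    return score

random.seed(1)
data = []
for _ in range(500):
    x = random.randint(0, 1)
    y = x if random.random() < 0.9 else 1 - x   # Y strongly tracks X
    data.append((x, y))

def score_empty(data):
    """Score of the graph with no edges: X and Y each parentless."""
    cx = Counter(x for x, y in data)
    cy = Counter(y for x, y in data)
    return family_log_marglik(cx) + family_log_marglik(cy)

def score_x_to_y(data):
    """Score of the graph X -> Y: one family of Y per parent value."""
    s = family_log_marglik(Counter(x for x, y in data))
    for px in (0, 1):
        s += family_log_marglik(Counter(y for x, y in data if x == px))
    return s

print(score_x_to_y(data) > score_empty(data))   # True: the edge is preferred
```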
Learning BNs (cont'd)
Steps:
– Decide priors (P(Θ|G), P(G)) => use BDe priors (structure-equivalent, decomposable)
– Find G to maximize S(G:D)
Finding the optimal G is an NP-hard problem => local search using local modifications of candidate G (Heckerman et al. 1995)
Learning Causal Patterns
– A Bayesian network is a model of dependencies
– We are interested in modelling the process that generated them
=> model the flow of causality in the system of interest and create a Causal Network (CN)
A causal network models the probability distribution as well as the effects of causality.
CNs vs BNs:
• CNs interpret parents as immediate causes (cf. BNs)
• CNs and BNs coincide under the Causal Markov Assumption: "given the values of a variable's immediate causes, it is independent of its earlier causes"; if this holds, then BN == CN
Learning Causal Patterns
Example: X→Y and Y→X are equivalent BNs but not equivalent CNs.
Applying BNs to Expression Data
The expression level of each gene is modelled as a random variable.
Other attributes that affect the system (e.g. temperature, experimental conditions) can also be modelled as random variables.
The Bayesian network / dependency structure can answer queries.
CON: problems in computational complexity and in the statistical significance of the resulting networks.
PRO: genetic regulation networks are sparse.
Representing Partial Models
– Gene networks involve many variables => more than one plausible model (not enough data)
– We can learn only up to an equivalence class.
Focus on feature learning in order to work towards a causal network:
Representing Partial Models
Features:
• Markov relations (e.g. Markov blanket)
• Order relations (e.g. X is an ancestor of Y in all networks)
Feature learning leads towards a Causal Network.
Statistical Confidence of Features
– The likelihood that a given feature is actually true.
– We can't calculate the posterior P(G|D) => Bootstrap method:
for i = 1 ... n:
    resample D with replacement -> D'
    learn G' from D'
Statistical Confidence of Features
Individual feature confidence (IFC):
IFC = (1/n) ∑ f(G')
where f(G') = 1 if the feature exists in G', and 0 otherwise
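The bootstrap loop and the IFC estimate above can be sketched together as follows. Here `learn` and `feature` are hypothetical stand-ins for a structure-learning procedure and a feature test:

```python
import random

def bootstrap_confidence(data, learn, feature, n=100, seed=0):
    """Estimate a feature's confidence as the fraction of bootstrap
    networks in which it appears. `learn` maps a dataset to a graph;
    `feature` returns True/False for a graph."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n):
        resampled = [rng.choice(data) for _ in data]  # resample with replacement
        hits += feature(learn(resampled))
    return hits / n

# Toy stand-ins (hypothetical): "learn" returns the edge X->Y iff X and Y
# agree in most of the resampled dataset; the feature asks for that edge.
data = [(x, x) for x in (0, 1)] * 50               # perfectly correlated pairs
learn = lambda d: {("X", "Y")} if sum(x == y for x, y in d) > len(d) / 2 else set()
feature = lambda g: ("X", "Y") in g
print(bootstrap_confidence(data, learn, feature))  # 1.0
```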
Efficient Learning Algorithms
– Vast search space => need efficient algorithms
– Focus attention on relevant regions of the search space => Sparse Candidate Algorithm
Efficient Learning Algorithms
Sparse Candidate Algorithm:
– Identify a small number of candidate parents for each gene based on simple local statistics (e.g. correlation).
– Restrict the search to networks with those candidate parents.
– Potential pitfall: early choice of candidates
=> Solution: adaptive algorithm
Discretization
The practical side: we need to define the local probability model for each variable.
=> Discretize the experimental data into -1, 0, 1
(expression level lower than, similar to, or higher than control)
Set the control by averaging.
Set a threshold ratio for significantly higher/lower.
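A minimal sketch of this discretization, assuming the threshold is applied to the log2 expression ratio (the paper reports a threshold of 0.5 in log base-2 scale):

```python
from math import log2

def discretize(value, control, threshold=0.5):
    """Map an expression measurement relative to control onto three levels:
    -1 (lower than control), 0 (similar), +1 (higher). The threshold is on
    the log2 ratio."""
    ratio = log2(value / control)
    if ratio > threshold:
        return 1
    if ratio < -threshold:
        return -1
    return 0

control = 100.0  # e.g. the average expression over the time series
print([discretize(v, control) for v in (40.0, 95.0, 210.0)])  # [-1, 0, 1]
```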
Application to Cell Cycle Expression Patterns
• 76 gene expression measurements of the mRNA levels of 6177 Saccharomyces cerevisiae ORFs. Six time series under different cell cycle synchronization methods (Spellman et al. 1998).
• 800 genes were differentially expressed; 250 of these clustered into 8 distinct clusters. Variables in the networks represent the expression levels of the 800 genes.
• Introduced an additional variable denoting the cell cycle phase, to deal with the temporal nature of the cell cycle process, and forced it to be a root in the network.
• Applied the Sparse Candidate Algorithm to a 200-fold bootstrap of the original data.
• Used no prior biological knowledge in the learning algorithm.
[Figure: network with all edges]
[Figure: network with only the edges representing relations with confidence above 0.3]
YNL058C Local Map
[Figure: local map around YNL058C, showing edges, Markov relations, ancestors, and descendants, with links to the SGD and YPD entries]
Robustness analysis
• Used the 250-gene dataset for robustness analysis
• Created a random dataset by permuting the order of experiments independently for each gene
• No "real" features are expected to be found
Robustness analysis (contd’)
•
Lower confidence for order and
Markov relations in the random
data set
•
Longer and heavier tail in the
high confidence region in the
original data set
•
Sparser networks learned from
real data
•
Features learned in original
data with high confidence level are
not an artifact of the bootstrap
estimation
Robustness analysis (contd’)
•
Compared confidence level of learned features between 250
and 800 gene data set
•
Strong linear correlation
•
Compared confidence level of learned features between
different discretization thresholds
•
Definite linear tendency
Biological Analysis
Order relations
• Dominant genes indicate potential causal sources of the cell cycle process
• Dominance score of X:
d_o(X) = ∑_{Y : C(X,Y) > t} C(X,Y)^k
where C(X,Y) is the confidence in X being an ancestor of Y, k is used to reward high-confidence features, and t is a threshold to discard low-confidence ones
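The dominance score can be sketched directly from the formula above; the confidence values below are invented for illustration:

```python
def dominance_score(confidences, source, k=2, t=0.3):
    """Sketch of the dominance score d_o(X): sum of C(X,Y)**k over targets Y
    whose confidence exceeds the threshold t. `confidences` maps
    (ancestor, descendant) -> confidence in [0, 1]."""
    return sum(
        c ** k
        for (x, y), c in confidences.items()
        if x == source and c > t
    )

# Hypothetical confidences: g1 is a high-confidence ancestor of two genes.
conf = {("g1", "g2"): 0.9, ("g1", "g3"): 0.8, ("g1", "g4"): 0.1,
        ("g2", "g3"): 0.4}
print(round(dominance_score(conf, "g1"), 2))  # 1.45 (= 0.9**2 + 0.8**2)
```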
Biological Analysis (contd’)
•
Dominant genes are key genes in basic
cell functions
Biological Analysis (contd’)
•
Top Markov relations reveal functional relations between genes
1. Both genes known: The relations make sense biologically
2. One unknown gene: Firm homologies to proteins functionally
related to the other gene
3. Two unknown genes: Physically adjacent to the chromosome,
presumably regulated by the same mechanism
•
FAR1

ASH1, low correlation, different clusters, known though to participate in a mating type
switch
•
CLN2 is likely to be a parent to RNR3, SVS1, SRO4 and RAD41. Appeared in same cluster. No
links between the 4 genes. CLN2 is known to be a central cycle control and there is no clear
biological relationship between the others
Markov relations
Discussion and Future Work
• Applied the Sparse Candidate Algorithm and bootstrap resampling to extract a Bayesian network from Spellman's 800-gene dataset
• Used no prior biological knowledge
• Derived biologically plausible conclusions
• Demonstrated the capability to discover causal relationships, interactions between genes, and rich structure between clusters
• Developing hybrid algorithms with clustering algorithms to learn models over clustered genes
• Extensions:
– Learn local probability models that deal with continuous data
– Improve theory and algorithms
– Include biological knowledge as prior knowledge
– Improve search heuristics
– Apply Dynamic Bayesian Networks to temporal data
– Discover causal patterns (using interventional data)