Using Bayesian Networks to Analyze Expression Data

By Nir Friedman, Michal Linial, Iftach Nachman, and Dana Pe'er (2000)


Presented by

Nikolaos Aravanis

Lysimachos Zografos

Alexis Ioannou


Outline


Introduction


Bayesian Networks


Application to expression data


Application to cell cycle expression patterns


Discussion and future work

The Road to Microarray Data Analysis


Development of microarrays

Measure all the genes of an organism

Enormous amount of data

Challenge: analyze the datasets and infer biological interactions

Most Common Analysis Tool


Clustering Algorithms

Identify groups of genes with similar expression patterns over a set of experiments

Discover genes that are co-regulated

Problems


Data give only a partial picture

Key events are not reflected (translation and protein (in)activation)

The small number of samples gives little information for constructing fully detailed models

With current technologies, even these few samples have a high noise-to-signal ratio





Possible Solution


Analyze gene expression patterns to uncover properties of the transcriptional program

Examine dependence and conditional independence in the data


Bayesian Networks


Bayesian Networks


Represent dependence structure between multiple
interacting quantities


Capable of handling noise and estimating the confidence
in the different features of the network


Focus on interactions whose signal is strong


Useful for describing processes composed of locally
interacting components


The statistical foundations for learning Bayesian networks are well understood and have been successfully applied


Provide models of causal influence

Informal Introduction to Bayesian Networks

Let P(X,Y) be a joint distribution over variables X and Y

X and Y are independent if P(X,Y) = P(X)P(Y) for all values of X and Y

Gene A is a transcription factor of gene B

We expect their expression levels to be dependent

A is a parent of B

B is a transcription factor of C

The expression levels of each pair are dependent

If A does not directly affect C, then once we fix the expression level of B, we observe that A and C are independent

P(A|B,C) = P(A|B) (A and C are conditionally independent given B)

In notation: I(A; C | B)
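As a quick illustration of this chain, here is a toy simulation (the probabilities are my own, not from the paper): A drives B, B drives C, so A and C are correlated overall, but become (nearly) independent once the level of B is fixed.

```python
# Toy simulation of the chain A -> B -> C with made-up probabilities.
import numpy as np

rng = np.random.default_rng(0)
n = 200_000

a = rng.binomial(1, 0.5, n)                       # A: transcription factor on/off
b = rng.binomial(1, np.where(a == 1, 0.9, 0.1))   # B depends on A
c = rng.binomial(1, np.where(b == 1, 0.8, 0.2))   # C depends only on B

def corr(x, y):
    return np.corrcoef(x, y)[0, 1]

print("corr(A, C):        %.3f" % corr(a, c))     # clearly nonzero
for val in (0, 1):
    m = b == val
    print("corr(A, C | B=%d): %.3f" % (val, corr(a[m], c[m])))  # close to 0
```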

Informal Introduction to Bayesian Networks (cont'd)

A key component of Bayesian networks is that each variable is a stochastic function of its parents

Stochastic models are natural in the gene expression domain

The biological processes we want to model are stochastic

Measurements are noisy

Representing Distributions with Bayesian Networks

Representation of a joint probability distribution consisting of 2 components:

Directed acyclic graph (G)

A conditional distribution for each variable given its parents in G


G encodes the Markov Assumption: each variable Xi is independent of its non-descendants, given its parents in G

I(Xi ; NonDescendants(Xi) | Pa(Xi))

By applying the chain rule, the joint distribution decomposes into the product form

P(X1, …, Xn) = ∏i P(Xi | Pa(Xi))
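A small sketch of this product-form decomposition for the A -> B -> C chain above, with made-up CPD entries: the joint is built family by family, and summing over all eight assignments gives 1.

```python
# Toy CPDs for the chain A -> B -> C (illustrative values, not from the paper).
P_A = {1: 0.5, 0: 0.5}
P_B_given_A = {(1, 1): 0.9, (0, 1): 0.1, (1, 0): 0.1, (0, 0): 0.9}  # key: (b, a)
P_C_given_B = {(1, 1): 0.8, (0, 1): 0.2, (1, 0): 0.2, (0, 0): 0.8}  # key: (c, b)

def joint(a, b, c):
    # Markov assumption: each variable depends only on its parents.
    return P_A[a] * P_B_given_A[(b, a)] * P_C_given_B[(c, b)]

# Sanity check: the joint sums to 1 over all assignments.
total = sum(joint(a, b, c) for a in (0, 1) for b in (0, 1) for c in (0, 1))
print(total)  # 1.0
```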
Equivalence Classes of BNs

A BN implies further independence statements => Ind(G)

More than one graph can imply the same statements => equivalent networks if Ind(G) = Ind(G')

Example: X → Y and Y → X are equivalent, with Ind(G) = Ind(G') = Ø


Equivalence Classes of BNs

For equivalent networks:

DAGs have the same underlying undirected graph

PDAGs (partially directed acyclic graphs) are used to represent them: edges whose direction disagrees across the class are left undirected


Learning BNs

Question: Given a dataset D, what BN B = <G,Θ> best matches D?

Answer: A statistically motivated scoring function to evaluate each BN, e.g. the Bayesian Score

S(G:D) = log P(G|D) = log P(D|G) + log P(G) + C,

where C is a constant independent of G, and

P(D|G) = ∫ P(D|G,Θ) P(Θ|G) dΘ

is the marginal likelihood over all parameters for G.
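The BDe marginal likelihood computation is involved; as a hedged stand-in, here is a sketch that scores a candidate structure with the decomposable BIC approximation of log P(D|G) for binary variables. The function names and the add-one smoothing are my own, not the paper's.

```python
# BIC-style decomposable score for a structure over binary variables.
# `data` is an (n_samples, n_vars) numpy array of 0/1 values.
import numpy as np
from itertools import product

def family_bic(data, child, parents):
    """Log-likelihood of one family's CPD minus its BIC complexity penalty."""
    n = len(data)
    ll, n_params = 0.0, 0
    for pa_vals in product([0, 1], repeat=len(parents)):
        mask = np.ones(n, dtype=bool)
        for p, v in zip(parents, pa_vals):
            mask &= data[:, p] == v
        rows = data[mask, child]
        if len(rows):
            ones = rows.sum()
            p1 = (ones + 1) / (len(rows) + 2)            # smoothed estimate
            ll += ones * np.log(p1) + (len(rows) - ones) * np.log(1 - p1)
        n_params += 1
    return ll - 0.5 * n_params * np.log(n)

def bic_score(data, graph):
    """graph maps child -> list of parents; the score decomposes over families."""
    return sum(family_bic(data, x, pa) for x, pa in graph.items())
```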

Learning BNs (cont'd)

Steps:

Decide priors (P(Θ|G), P(G)) => use of BDe priors (structure equivalent, decomposable)

Find G to maximize S(G:D): an NP-hard problem => local search using local perturbations of the candidate G (Heckerman et al. 1995); a sketch follows
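A minimal greedy local-search sketch over the score above, simplified to single-edge additions and deletions (standard local search also tries edge reversals, which are omitted here). It reuses bic_score from the previous sketch.

```python
def is_acyclic(graph):
    """DFS cycle check; graph maps child -> list of parents."""
    WHITE, GRAY, BLACK = 0, 1, 2
    color = {v: WHITE for v in graph}

    def visit(v):
        color[v] = GRAY
        for p in graph[v]:
            if color[p] == GRAY or (color[p] == WHITE and not visit(p)):
                return False  # back edge => cycle
        color[v] = BLACK
        return True

    return all(visit(v) for v in graph if color[v] == WHITE)

def hill_climb(data, n_vars):
    graph = {v: [] for v in range(n_vars)}       # start from the empty graph
    best = bic_score(data, graph)
    improved = True
    while improved:
        improved = False
        for child in range(n_vars):
            for parent in range(n_vars):
                if parent == child:
                    continue
                g = {v: list(pa) for v, pa in graph.items()}
                if parent in g[child]:
                    g[child].remove(parent)      # try deleting the edge
                else:
                    g[child].append(parent)      # try adding the edge
                if not is_acyclic(g):
                    continue
                s = bic_score(data, g)
                if s > best:                     # greedy: keep any improvement
                    graph, best, improved = g, s, True
    return graph
```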

Learning Causal Patterns

A Bayesian network is a model of dependencies

We are interested in modelling the process that generated them

=> model the flow of causality in the system of interest and create a Causal Network (CN)

A Causal Network models the probability distribution as well as the effects of interventions (causality)

CNs vs BNs:

- CNs interpret parents as immediate causes (cf. BNs)

- CNs and BNs coincide under the Causal Markov Assumption: "given the values of a variable's immediate causes, it is independent of its earlier causes"; if this holds, then BN == CN

Learning Causal Patterns

X → Y and Y → X: equivalent BNs, but not equivalent CNs

Applying BNs to Expression Data

The expression level of each gene is modelled as a random variable

Other attributes that affect the system (e.g. temperature, experimental conditions) can also be modelled as random variables

The Bayesian network / dependency structure can answer queries

CON: problems in computational complexity and in the statistical significance of the resulting networks

PRO: genetic regulation networks are sparse

Representing Partial Models

Gene networks: many variables => more than one plausible model (not enough data)

We can learn only up to an equivalence class

Focus on feature learning in order to work towards a causal network:

Representing Partial Models

Features:

- Markov relations (e.g. the Markov blanket)

- Order relations (e.g. X is an ancestor of Y in all networks)

Feature learning leads to a Causal Network


Statistical Confidence of Features

Likelihood that a given feature is actually true

We can't calculate the posterior P(G|D) => Bootstrap method:

for i = 1 … n:
    resample D with replacement -> D'
    learn G' from D'
end


Statistical Confidence of Features

Individual feature confidence (IFC):

IFC = (1/n) ∑ f(G')

where f(G') = 1 if the feature exists in G', and 0 otherwise
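A direct sketch of this bootstrap estimate for edge features, reusing hill_climb from the earlier sketch; the matrix representation and the seed handling are my own choices (the application below uses a 200-fold bootstrap, matching the default here).

```python
# Bootstrap confidence: fraction of resampled-data networks containing each edge.
import numpy as np

def edge_confidence(data, n_boot=200, seed=0):
    rng = np.random.default_rng(seed)
    n, n_vars = data.shape
    counts = np.zeros((n_vars, n_vars))
    for _ in range(n_boot):
        d_boot = data[rng.integers(0, n, size=n)]   # resample D with replacement
        g = hill_climb(d_boot, n_vars)              # learn G' from D'
        for child, parents in g.items():
            for p in parents:
                counts[p, child] += 1               # f(G') = 1 if edge is in G'
    return counts / n_boot                          # IFC per directed edge
```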

Efficient Learning Algorithms

Vast search space => need efficient algorithms

Focus attention on relevant regions of the search space

=> Sparse Candidate Algorithm

Efficient Learning Algorithms

Sparse Candidate Algorithm (sketched below):

Identify a small number of candidate parents for each gene, based on simple local statistics (e.g. correlation)

Restrict the search to networks whose parents come from the candidate sets

Potential pitfall: early choice of candidates => Solution: an adaptive algorithm that re-estimates the candidate sets
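A minimal sketch of the candidate-selection step, assuming plain absolute correlation as the local statistic and a fixed k candidates per gene; the adaptive re-estimation of candidate sets between search rounds is omitted here.

```python
# Candidate parents per gene: the k most correlated other genes.
import numpy as np

def candidate_parents(data, k=5):
    """For each gene (column of `data`), return the k most correlated genes."""
    corr = np.abs(np.corrcoef(data.T))     # gene-by-gene correlation matrix
    np.fill_diagonal(corr, -np.inf)        # a gene cannot be its own parent
    return {g: list(np.argsort(corr[g])[-k:]) for g in range(data.shape[1])}
```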

Discretization

The practical side: we need to define the local probability model for each variable

=> discretize experimental data into -1, 0, 1 (expression level lower than, similar to, or higher than control)

Set the control by averaging

Set a threshold ratio for significantly higher/lower
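A minimal sketch of this discretization, assuming log2 expression ratios against a control set by averaging; the 0.5 threshold is a placeholder of mine, not a value taken from the slides.

```python
# Discretize expression into -1 / 0 / 1: lower / similar / higher than control.
import numpy as np

def discretize(expression, threshold=0.5):
    """`expression` is a (genes, experiments) array of positive values."""
    control = expression.mean(axis=1, keepdims=True)   # control by averaging
    ratio = np.log2(expression / control)
    out = np.zeros_like(ratio, dtype=int)
    out[ratio > threshold] = 1                         # significantly higher
    out[ratio < -threshold] = -1                       # significantly lower
    return out
```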

Application to Cell Cycle Expression Patterns

76 gene expression measurements of the mRNA levels of 6177 Saccharomyces cerevisiae ORFs; six time series under different cell cycle synchronization methods (Spellman et al. 1998)

800 genes were differentially expressed; 250 of them fell into 8 distinct clusters. The variables of the networks represent the expression levels of the 800 genes

Introduced an additional variable denoting the cell cycle phase, to handle the temporal nature of the cell cycle process, and forced it to be a root of the network

Applied the Sparse Candidate Algorithm to a 200-fold bootstrap of the original data

Used no prior biological knowledge in the learning algorithm

(Figure: network with all edges)

(Figure: network with only the edges representing relations with confidence above 0.3)

YNL058C Local Map

(Figure: local map around YNL058C, showing edges, Markov relations, ancestors, and descendants, with links to the SGD and YPD entries)

Robustness analysis

Use the 250-gene data set for robustness analysis

Create a random data set by permuting the order of the experiments independently for each gene

No "real" features are expected to be found in the random data

Robustness analysis (cont'd)

Lower confidence for order and Markov relations in the random data set

Longer and heavier tail in the high-confidence region of the original data set

Sparser networks learned from the real data

Features learned from the original data with high confidence are not an artifact of the bootstrap estimation

Robustness analysis (cont'd)

Compared the confidence levels of learned features between the 250- and 800-gene data sets: strong linear correlation

Compared the confidence levels of learned features between different discretization thresholds: definite linear tendency

Biological Analysis

Order relations

Dominant genes indicate potential causal sources of the cell cycle process

Dominance score of X:

D(X) = ∑_{Y : C_o(X,Y) > t} C_o(X,Y)^k

where C_o(X,Y) is the confidence in X being an ancestor of Y, k is used to reward high-confidence features, and t is a threshold to discard low-confidence ones
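A direct transcription of the dominance score into code; the values of k and t below are placeholders, not the constants used in the paper.

```python
# Dominance score D(X) = sum over {Y : C_o(X,Y) > t} of C_o(X,Y)**k.
import numpy as np

def dominance(conf, k=2, t=0.3):
    """conf[x, y] = C_o(X,Y), the confidence that X is an ancestor of Y."""
    c = np.where(conf > t, conf, 0.0)   # discard low-confidence features
    np.fill_diagonal(c, 0.0)            # ignore self-relations
    return (c ** k).sum(axis=1)         # one score per gene X
```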


Biological Analysis (cont'd)

Dominant genes are key genes in basic cell functions

Biological Analysis (cont'd)

Top Markov relations reveal functional relations between genes:

1. Both genes known: the relations make sense biologically

2. One gene unknown: firm homologies to proteins functionally related to the other gene

3. Both genes unknown: physically adjacent on the chromosome, presumably regulated by the same mechanism

FAR1-ASH1: low correlation and different clusters, yet both are known to participate in the mating-type switch

CLN2 is likely to be a parent of RNR3, SVS1, SRO4 and RAD51. All appeared in the same cluster, with no links among the four; CLN2 is known to be a central cell cycle control, and there is no clear biological relationship among the others

(Figure: top Markov relations)

Discussion and Future Work

Applied the Sparse Candidate Algorithm and bootstrap resampling to extract a Bayesian network for Spellman's 800-gene data set

Used no prior biological knowledge

Derived biologically plausible conclusions

Showed the capability of discovering causal relationships, interactions between genes, and rich structure between clusters

Developing hybrid algorithms with clustering algorithms, to learn models over clustered genes

Extensions:

Learn local probability models that deal with continuous data

Improve theory and algorithms

Include biological knowledge as prior knowledge

Improve search heuristics

Apply Dynamic Bayesian Networks to temporal data

Discover causal patterns (using interventional data)