# Using Bayesian Networks to Analyze Expression Data

AI and Robotics

Nov 7, 2013

Using Bayesian Networks to
Analyze Expression Data

By Nir Friedman, Michal Linial,
Iftach Nachman, Dana Pe'er (2000)

Presented by

Nikolaos Aravanis

Lysimachos Zografos

Alexis Ioannou

Outline

Introduction

Bayesian Networks

Application to expression data

Application to cell cycle expression patterns

Discussion and future work

Analysis

Development of microarrays

Measure all the genes of an organism

Enormous amount of data

Challenge: Analyze datasets and infer biological
interactions

Most Common Analysis Tool

Clustering Algorithms

Identify groups of genes with similar expression
patterns over a set of experiments

Discover genes that are co-regulated

Problems

Data give only a partial picture

Key events are not reflected (translation and
protein (in)activation)

The small number of samples gives little information
for constructing fully detailed models

With current technologies, even these few samples
have a high noise-to-signal ratio

Possible Solution

Analyze gene expression patterns that uncover
properties of the transcriptional program

Examine dependence and conditional
independence in the data

Bayesian Networks

Bayesian Networks

Represent dependence structure between multiple
interacting quantities

Capable of handling noise and estimating the confidence
in the different features of the network

Focus on interactions whose signal is strong

Useful for describing processes composed of locally
interacting components

The statistical foundations for learning Bayesian
networks are well understood and have been
successfully applied

Provide models of causal influence

Informal Introduction to
Bayesian Networks

Let P(X,Y) be a joint distribution over variables X and Y

X and Y are independent if P(X,Y) = P(X)P(Y) for all values of X and Y

If gene A is a transcription factor of gene B,
we expect their expression levels to be dependent:
A is a parent of B

If B is in turn a transcription factor of C,
the expression levels of each pair are dependent

If A does not directly affect C, then once we fix the expression
level of B, we observe that A and C are independent:

P(A|B,C) = P(A|B) (A and C are conditionally independent given B)

Written I(A;C|B)
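The chain intuition above can be checked numerically. A minimal sketch, where the CPD numbers are made up for illustration and are not taken from the paper:

```python
import random

random.seed(0)

# Hypothetical chain A -> B -> C: B tends to follow A, C tends to follow B,
# and C never looks at A directly.
def sample():
    a = int(random.random() < 0.5)
    b = int(random.random() < (0.9 if a else 0.1))
    c = int(random.random() < (0.8 if b else 0.2))
    return a, b, c

data = [sample() for _ in range(200_000)]

def p_c(a=None, b=None):
    """Empirical P(C=1) among samples matching the given values of A and/or B."""
    rows = [r for r in data
            if (a is None or r[0] == a) and (b is None or r[1] == b)]
    return sum(r[2] for r in rows) / len(rows)

# Marginally, A and C are dependent...
print(p_c(a=0), p_c(a=1))            # clearly different (about 0.26 vs 0.74)
# ...but fixing B renders them (empirically) independent: I(A;C|B)
print(p_c(a=0, b=1), p_c(a=1, b=1))  # nearly equal
```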

Informal Introduction to
Bayesian Networks (contd’)

A key component of Bayesian networks is that each variable
is a stochastic function of its parents

Stochastic models are natural in the gene expression domain:

The biological processes we want to model are stochastic

Measurements are noisy

Representing Distributions with Bayesian Networks

Representation of a joint probability distribution,
consisting of two components:

A directed acyclic graph G

A conditional distribution for each variable given its
parents in G

G encodes the Markov Assumption: each variable X_i is
independent of its non-descendants, given its parents in G:

I(X_i ; NonDescendants(X_i) | Parents(X_i))

By applying the chain rule, the joint distribution
decomposes into the product form

P(X_1, ..., X_n) = Π_i P(X_i | Parents(X_i))
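A minimal sketch of the product-form decomposition for a three-node chain A -> B -> C; the conditional probability tables below are illustrative, not from the paper:

```python
# With graph A -> B -> C, the Markov assumption gives
# P(A, B, C) = P(A) * P(B | A) * P(C | B).
p_a = {0: 0.5, 1: 0.5}
p_b_given_a = {0: {0: 0.9, 1: 0.1}, 1: {0: 0.1, 1: 0.9}}  # p_b_given_a[a][b]
p_c_given_b = {0: {0: 0.8, 1: 0.2}, 1: {0: 0.2, 1: 0.8}}  # p_c_given_b[b][c]

def joint(a, b, c):
    """Joint probability assembled from the per-variable conditional tables."""
    return p_a[a] * p_b_given_a[a][b] * p_c_given_b[b][c]

# The factors define a proper distribution: the 8 assignments sum to 1.
total = sum(joint(a, b, c) for a in (0, 1) for b in (0, 1) for c in (0, 1))
print(total)
```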
Equivalence Classes of BNs

A BN implies further independence assumptions => Ind(G)

More than one graph can imply the same assumptions

=> Equivalent networks if Ind(G) = Ind(G')
(e.g. X -> Y and X <- Y, for which Ind(G) = Ind(G') = Ø)


Equivalence Classes of BNs

For equivalent networks:

DAGs have the same underlying undirected graph

PDAGs (partially directed acyclic graphs) are used to
represent them: an edge stays directed only where all
members of the class agree on its direction


Learning BNs

Question: given a dataset D, what BN B = <G,Θ> best
matches D?

Use a statistically motivated scoring function to
evaluate each BN, e.g. the Bayesian Score

S(G:D) = log P(G|D) = log P(D|G) + log P(G) + C,

where C is a constant independent of G, and

P(D|G) = ∫ P(D|G,Θ) P(Θ|G) dΘ

is the marginal likelihood, which averages over all
parameter values for G


Learning BNs (contd)

Steps:

Decide priors (P(Θ|G), P(G))
=> use BDe priors (structure equivalent, decomposable)

Find G to maximize S(G:D)

NP-hard problem
=> local search using local perturbations (edge addition,
deletion, reversal) of a candidate G
(Heckerman et al. 1995)
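A minimal sketch of that local search, with a placeholder scoring function standing in for the BDe score; graphs are dicts mapping node -> set of parents, and everything here is illustrative:

```python
import itertools

def acyclic(graph):
    """True if the parent sets in `graph` contain no directed cycle."""
    seen, stack = set(), set()
    def visit(n):
        if n in stack:
            return False
        if n in seen:
            return True
        stack.add(n)
        ok = all(visit(p) for p in graph[n])
        stack.discard(n)
        seen.add(n)
        return ok
    return all(visit(n) for n in graph)

def neighbors(graph):
    """Yield graphs reachable by one edge addition, deletion, or reversal."""
    for x, y in itertools.permutations(graph, 2):
        g = {n: set(ps) for n, ps in graph.items()}
        if x in g[y]:
            g[y].discard(x)                       # delete x -> y
            yield g
            g2 = {n: set(ps) for n, ps in g.items()}
            g2[x].add(y)                          # reverse: now y -> x
            if acyclic(g2):
                yield g2
        else:
            g[y].add(x)                           # add x -> y
            if acyclic(g):
                yield g

def hill_climb(nodes, score):
    """Greedy local search: start from the empty graph, take improving moves."""
    graph = {n: set() for n in nodes}
    best = score(graph)
    improved = True
    while improved:
        improved = False
        for g in neighbors(graph):
            if score(g) > best:
                graph, best, improved = g, score(g), True
                break
    return graph

# Toy score (hypothetical): rewards exactly the single edge A -> B.
toy = lambda g: 1.0 if g['B'] == {'A'} and not g['A'] else 0.0
print(hill_climb(['A', 'B'], toy))  # {'A': set(), 'B': {'A'}}
```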

Learning Causal Patterns

A Bayesian network is a model of dependencies; we are
also interested in modelling the process that generated
them

=> model the flow of causality in the system of
interest and create a Causal Network (CN)

A Causal Network models the probability distribution
as well as the effect of causality

CNs vs BNs:

- CNs interpret parents as immediate causes (c.f. BNs)

- CNs and BNs relate when using the Causal Markov
Assumption: "given the values of a variable's immediate
causes, it is independent of its earlier causes";
if this holds, then BN == CN

Learning Causal Patterns

X -> Y and X <- Y are equivalent BNs,
but not equivalent CNs

Applying BNs to Expression
Data

Model the expression level of each gene as a random
variable

Other attributes (e.g. temperature, experimental
conditions) that affect the system can also be
modelled as random variables

A Bayesian network / dependency structure can then be
learned over these variables

CON: problems in computational complexity
and in the statistical significance of the resulting
networks

PRO: genetic regulation networks are sparse

Representing Partial Models

Gene networks involve many variables

=> more than one plausible model (not enough data);
we can only learn up to an equivalence class

Focus instead on feature learning in order to
characterize the causal network:

Representing Partial Models

Features:

- Markov relations (e.g. Markov blanket)

- Order relations (e.g. X is an ancestor of Y in all
networks)

Statistical Confidence of
Features

The likelihood that a given feature is actually true

We cannot calculate the posterior P(G|D) directly

=> Bootstrap method:

for i = 1...n
    resample D with replacement -> D_i
    learn G_i from D_i
end

Statistical Confidence of
Features

Individual feature confidence (IFC)

IFC = (1/n) ∑_i f(G_i)

where f(G_i) = 1 if the feature exists in G_i, and 0 otherwise
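The bootstrap loop plus the IFC average can be sketched as follows; `learn_graph` and the feature predicate below are hypothetical stand-ins for the real structure learner and feature test:

```python
import random

def feature_confidence(data, learn_graph, has_feature, n_boot=200, seed=0):
    """IFC = (1/n) * sum_i f(G_i), with G_i learned from a resampled data set."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_boot):
        resampled = [rng.choice(data) for _ in data]  # sample with replacement
        hits += 1 if has_feature(learn_graph(resampled)) else 0
    return hits / n_boot

# Toy stand-in learner: reports an X -> Y edge when X and Y agree often.
def learn_graph(rows):
    agree = sum(x == y for x, y in rows) / len(rows)
    return {('X', 'Y')} if agree > 0.7 else set()

data = [(v, v) for v in (0, 1) * 50]  # X and Y always agree in this toy data
conf = feature_confidence(data, learn_graph, lambda g: ('X', 'Y') in g)
print(conf)  # 1.0: the edge survives every resample
```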

Efficient Learning Algorithms

Vast search space => need efficient algorithms

Focus attention on relevant regions of the search
space

=> Sparse Candidate Algorithm

Efficient Learning Algorithms

Sparse Candidate Algorithm

Identify a small number of candidate parents
for each gene based on simple local statistics
(e.g. correlation)

Restrict the search to networks built from these
candidate parents

Potential pitfall: an early choice of candidates can
exclude a true parent
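A minimal sketch of the candidate-selection step, using absolute correlation as the "simple local statistic"; the mini data set is invented for illustration:

```python
def candidate_parents(expr, gene, k=3):
    """Pick the k genes whose profiles correlate most strongly with `gene`.

    expr maps gene name -> list of expression levels across experiments.
    """
    def corr(xs, ys):
        n = len(xs)
        mx, my = sum(xs) / n, sum(ys) / n
        cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
        sx = sum((x - mx) ** 2 for x in xs) ** 0.5
        sy = sum((y - my) ** 2 for y in ys) ** 0.5
        return cov / (sx * sy) if sx and sy else 0.0
    scores = {g: abs(corr(expr[g], expr[gene])) for g in expr if g != gene}
    return sorted(scores, key=scores.get, reverse=True)[:k]

# Hypothetical mini data set: g2 tracks g1, g3 is anti-correlated, g4 is noisy.
expr = {
    'g1': [1, 2, 3, 4, 5],
    'g2': [2, 4, 6, 8, 10],
    'g3': [5, 4, 3, 2, 1],
    'g4': [3, 1, 4, 1, 5],
}
print(candidate_parents(expr, 'g1', k=2))  # g2 and g3, in some order
```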

Discretization

The practical side:

We need to define a local probability model for
each variable

=> discretize the experimental data into -1, 0, 1
(expression level lower than, similar to, or higher
than control)

Set the control by averaging

Set a threshold ratio for significantly higher/lower
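A minimal sketch of this discretization; the log-ratio form and the 0.5 threshold are one plausible choice, not necessarily the paper's exact setting:

```python
import math

def discretize(level, control, threshold=0.5):
    """Map an expression level to -1 / 0 / 1 relative to its control."""
    ratio = math.log2(level / control)   # log-ratio against the control level
    if ratio < -threshold:
        return -1   # significantly lower than control
    if ratio > threshold:
        return 1    # significantly higher than control
    return 0        # similar to control

control = sum([100, 110, 90, 100]) / 4   # control set by averaging, as above
print([discretize(x, control) for x in (40, 100, 260)])  # [-1, 0, 1]
```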

Application to Cell Cycle
Expression Patterns

76 gene expression measurements of the mRNA levels
of 6177 Saccharomyces cerevisiae ORFs: six time
series under different cell cycle synchronization
methods (Spellman 1998)

800 genes were differentially expressed; 250 of them fall
into 8 distinct clusters. The variables of the networks
represent the expression levels of the 800 genes

Introduced an additional variable denoting the cell
cycle phase, to deal with the temporal nature of the cell
cycle process, and forced it to be a root in the network

Applied the Sparse Candidate Algorithm to a 200-fold
bootstrap of the original data

Used no prior biological knowledge in the learning
algorithm

[Figure: the learned network, shown once with all edges and
once with only the edges whose confidence level is above 0.3]

[Figure: local map around YNL058C, showing its edges, Markov
relations, ancestors, and descendants, with its SGD and YPD
entries]

Robustness analysis

Use the 250-gene data set for robustness analysis

Create a random data set by permuting the order of
experiments independently for each gene

No "real" features are expected to be found in it

Robustness analysis (contd’)

Lower confidence for order and Markov relations in the
random data set

Longer and heavier tail in the high-confidence region
in the original data set

Sparser networks learned from real data

=> Features learned from the original data with high
confidence are not an artifact of the bootstrap
estimation

Robustness analysis (contd’)

Compared the confidence levels of learned features between
the 250- and 800-gene data sets

Strong linear correlation

Compared the confidence levels of learned features between
different discretization thresholds

Definite linear tendency

Biological Analysis

Order relations

Dominant genes indicate potential causal
sources of the cell cycle process

Dominance score of X:

d_o(X) = ∑_Y C_o(X,Y)^k

where C_o(X,Y) = C(X,Y) if C(X,Y) > t, and 0 otherwise;
C(X,Y) is the confidence in X being an ancestor of Y,
k is used to reward high-confidence features, and t is a
threshold to discard low-confidence ones
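The dominance score can be sketched directly from its definition; the confidence values in the table below are invented for illustration (the gene names echo the CLN2 example later in the deck):

```python
def dominance(gene, C, k=2, t=0.4):
    """d_o(X) = sum over Y of C(X,Y)**k, counting only confidences above t."""
    return sum(c ** k for (x, _), c in C.items() if x == gene and c > t)

# Hypothetical confidence table for "X is an ancestor of Y" (values made up).
C = {
    ('CLN2', 'RNR3'): 0.9,
    ('CLN2', 'SVS1'): 0.8,
    ('CLN2', 'SRO4'): 0.3,   # below the threshold t: discarded
    ('RNR3', 'SVS1'): 0.5,
}
print(round(dominance('CLN2', C), 2))  # 1.45 = 0.9**2 + 0.8**2
```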

Biological Analysis (contd’)

Dominant genes are key genes in basic
cell functions

Biological Analysis (contd’)

Top Markov relations reveal functional relations between genes:

1. Both genes known: the relations make sense biologically

2. One gene unknown: firm homologies to proteins functionally
related to the other gene

3. Both genes unknown: physically adjacent on the chromosome,
presumably regulated by the same mechanism

FAR1 - ASH1: low correlation, in different clusters, yet both
known to participate in the mating type switch

CLN2 is likely to be a parent of RNR3, SVS1, SRO4 and RAD41.
All appeared in the same cluster, with no links among the four
genes. CLN2 is known to be central to cell cycle control, and
there is no clear biological relationship among the others


Discussion and Future Work

Applied the Sparse Candidate Algorithm and bootstrap resampling to extract a
Bayesian network from the 800-gene data set of Spellman

Used no prior biological knowledge

Derived biologically plausible conclusions

Demonstrated the capability of discovering causal relationships, interactions
between genes, and rich structure between clusters

Future work: develop hybrid algorithms with clustering algorithms, to learn
models over clustered genes

Extensions:

Learn local probability models that deal with continuous data

Improve theory and algorithms

Include biological knowledge as prior knowledge

Improve search heuristics

Apply Dynamic Bayesian Networks to temporal data

Discover causal patterns (using interventional data)