
Bioinformática-55

Regulatory Networks

Bayesian Networks


Bayesian Networks

• A Bayesian Network (BN) is a representation of a joint probability distribution
  – Compact and intuitive representation
  – Useful for describing processes composed of locally interacting components
  – Has a good statistical foundation
  – Provides models of causal influence
  – Deals with noisy data
  – Has efficient model-learning algorithms


Bayesian Networks

• Why is it suitable for this problem?
  – Gene expression is an inherently stochastic phenomenon
  – To capture the nature of interactions between genes, especially the causal connection (A -> B)
  – Microarray techniques are associated with missing and noisy data values


Analyzing Data

• Practical problem: small data sets
  – Variables: hundreds or thousands of genes
  – Samples: just tens of microarray experiments
• On the positive side, genetic regulation networks are sparse
• Characterize and learn features that are common to most of these networks


Analyzing Data

• The first feature: Markov relations
  – Symmetric relation: Y is in X’s Markov blanket iff there is either an edge between them, or both are parents of another variable (Pearl 98).
  – Biological interpretation: a Markov relation indicates that the two genes are related in some joint biological interaction or process


Analyzing Data

• The second feature: order relations
  – Global property: A is an ancestor of B in all the equivalent Bayesian networks learned
  – Biological interpretation: an order relation indicates that the transcription of one gene is a cause, not necessarily a direct one, of the transcription of another gene


Representing Distributions

• Consider a set of assertions and a variety of ways in which they support each other
• Each assertion establishes a value for an attribute and is of the form $X_i = x_i$
• The variables are $X_1, \ldots, X_n$
• We would know everything we need to know about the world described by these assertions if we had the joint probability $P(X_1, \ldots, X_n)$


Representing Distributions

• From the previous probability function it is possible to compute any other probability, such as
  – $P(X_2)$ or
  – $P(X_2 \mid X_3, X_5)$
• Assuming, for simplicity, that the variables are binary, the representation complexity of $P(X_1, \ldots, X_n)$ is $2^n$
  – Impractical even for small values of $n$


Bayesian Networks

• BNs simplify this problem by taking advantage of
  – Existing causal connections between assertions
  – Assumptions about conditional independence
• A BN representation consists of two components
  – $G$, a DAG whose vertices correspond to the random variables $X_1, \ldots, X_n$
  – $\Theta$, a conditional distribution for each variable, given its parents in $G$
• These two components specify a unique distribution on $X_1, \ldots, X_n$


Bayesian Networks

• The graph $G$ represents conditional independence assumptions that allow the joint distribution to be decomposed, economizing on the number of parameters
• The graph $G$ encodes the Markov Assumption
  – Each variable $X_i$ is independent of its non-descendants, given its parents in $G$


Bayesian Networks

• Definition: we say that $x$ is conditionally independent of $y$ given $z$ if
  $P(x, y \mid z) = P(x \mid z)\,P(y \mid z)$
  or
  $P(x \mid y, z) = P(x \mid z)$
• We write $I(X; Y \mid Z)$ to mean that $X$ is independent of $Y$ conditioned on $Z$
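The two forms of the definition say the same thing. A short derivation, assuming $P(y \mid z) > 0$ so that the conditional on $y$ and $z$ is defined, shows how the second form follows from the first:

```latex
\begin{aligned}
P(x \mid y, z) &= \frac{P(x, y \mid z)}{P(y \mid z)} \\
               &= \frac{P(x \mid z)\,P(y \mid z)}{P(y \mid z)} \\
               &= P(x \mid z)
\end{aligned}
```

Multiplying both sides of the last line by $P(y \mid z)$ recovers the first form, so either can be taken as the definition.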


Bayesian Networks

• Any joint distribution that satisfies the Markov assumption can be decomposed in the product form
  $$P(X_1, \ldots, X_n) = \prod_{i=1}^{n} P(X_i \mid \mathrm{Parents}_G(X_i))$$
  where $\mathrm{Parents}_G(X_i)$ is the set of parents of $X_i$ in $G$


Example

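[Figure: network over the nodes E, B, A, R, C with edges E -> R, E -> A, B -> A, A -> C]

This network structure implies several conditional independence statements:
$I(E; B)$, $I(A; R \mid E, B)$, $I(C; E, R, B \mid A)$, ...

The network structure also implies that the joint distribution has the product form:
$$P(A, B, E, C, R) = P(E)\,P(B)\,P(R \mid E)\,P(A \mid B, E)\,P(C \mid A)$$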
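As a rough illustration of the product form for the E, B, A, R, C example, the sketch below builds the joint distribution as P(E) P(B) P(R|E) P(A|B,E) P(C|A) and checks two consequences: the joint sums to one, and E and B are marginally independent. All numeric CPT values and helper names are hypothetical, chosen only for illustration; they do not come from the slides.

```python
from itertools import product

# Hypothetical CPT values for binary variables (illustration only).
p_e = {True: 0.01, False: 0.99}            # P(E)
p_b = {True: 0.02, False: 0.98}            # P(B)
p_r_given_e = {True: 0.9, False: 0.001}    # P(R = true | E)
p_a_given_be = {                           # P(A = true | B, E)
    (True, True): 0.95, (True, False): 0.9,
    (False, True): 0.3, (False, False): 0.01,
}
p_c_given_a = {True: 0.7, False: 0.05}     # P(C = true | A)


def bern(p_true, value):
    """Probability of `value` for a binary variable with P(true) = p_true."""
    return p_true if value else 1.0 - p_true


def joint(a, b, e, c, r):
    """P(A,B,E,C,R) as the product of the local conditional distributions."""
    return (p_e[e] * p_b[b]
            * bern(p_r_given_e[e], r)
            * bern(p_a_given_be[(b, e)], a)
            * bern(p_c_given_a[a], c))


states = list(product([True, False], repeat=5))
total = sum(joint(*s) for s in states)

# Marginal P(E = true, B = true), summing out A, C and R.
p_eb = sum(joint(a, b, e, c, r) for a, b, e, c, r in states if b and e)
```

Five small tables (1 + 1 + 2 + 4 + 2 free parameters) replace the 31 free parameters of an unrestricted joint over five binary variables.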


Bayesian Networks

• The representation complexity is $O(n \cdot 2^{k+1})$, where $n$ is the number of variables and $k$ is the maximum number of possible parents
• The set of local conditional probability tables for all variables, together with the set of conditional independence assumptions described by the network, describe the full joint probability distribution for the network
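To make the saving concrete, the following sketch compares the size of a full joint table over n binary variables with the bound n · 2^(k+1) on the total size of the local tables. The function names and the choice n = 30, k = 3 are illustrative assumptions.

```python
def full_joint_size(n):
    """Number of entries in a full joint table over n binary variables."""
    return 2 ** n


def bn_table_size(n, k):
    """Upper bound on total CPT entries: n tables, each with at most
    2**k parent configurations times 2 values of the child."""
    return n * 2 ** (k + 1)


n, k = 30, 3
print(full_joint_size(n))   # 1073741824
print(bn_table_size(n, k))  # 480
```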


Example

[Figure: network over the nodes Storm, BustourGroup, Lightning, Campfire, Thunder, ForestFire]

Associated with each node is a conditional probability table. Table for Campfire (C), given Storm (S) and BustourGroup (B):

        S,B    S,notB   notS,B   notS,notB
C       0.4    0.1      0.8      0.2
notC    0.6    0.9      0.2      0.8
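One simple way to hold such a table in code is a dictionary keyed by the parent configuration. The sketch below stores the Campfire table from the slide (reading S as Storm, B as BustourGroup and C as Campfire); the variable and function names are assumptions made for this example.

```python
# P(Campfire = true | Storm, BustourGroup), keyed by (storm, bus_tour_group).
cpt_campfire = {
    (True, True): 0.4,
    (True, False): 0.1,
    (False, True): 0.8,
    (False, False): 0.2,
}


def p_campfire(campfire, storm, bus_tour_group):
    """Look up P(Campfire = campfire | Storm = storm, BustourGroup = bus_tour_group)."""
    p_true = cpt_campfire[(storm, bus_tour_group)]
    return p_true if campfire else 1.0 - p_true
```

Only the C row needs to be stored, because each column of the table sums to one and the notC row is its complement.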


Bayesian Networks

• Conditional distribution specification
  – Discrete variables: in the case of finite-valued variables, we can represent the conditional distributions as tables
  – Continuous variables: when the variable and its parents are real valued, there is no representation of all possible densities. Use linear Gaussian conditional densities to represent multivariate continuous distributions
  – Hybrid networks: when the network contains continuous variables with discrete parents, use conditional Gaussian distributions. The case of a discrete variable with continuous parents is not allowed


Inference

• Given a Bayesian network, we might want to answer many types of questions that involve the joint probability
  – What is the probability of X=x given observation of some of the other variables?
  – Are X and Y independent once we observe Z?
• Example: Linkage Analysis
  – “Linkage” refers to the tendency of certain genes to be inherited together. Linkage analysis is a tool that enables us to describe a family genotype tree and to find from which parent a specific gene has been inherited
  – We can learn about a child given information about its parents (no grandparent information is required)


Equivalence Classes

• A Bayesian network structure $G$ implies a set of independence assumptions.
• Let $\mathrm{Ind}(G)$ be the set of independence statements that hold in all distributions satisfying these Markov assumptions
• More than one graph can imply exactly the same set of independencies
• Two graphs $G$ and $G'$ are equivalent if $\mathrm{Ind}(G) = \mathrm{Ind}(G')$
  – Both graphs are alternative ways of describing the same set of independencies


Equivalence Classes

• The notion of equivalence is crucial, since when we examine observations from a distribution, we cannot distinguish between equivalent graphs
• Theorem (Pearl & Verma 1991): two DAGs are equivalent if and only if they have the same underlying undirected graph and the same v-structures (i.e., converging directed edges into the same node, such as a -> b <- c)
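The theorem gives a direct test for equivalence. The sketch below compares two DAGs, each given as a collection of (parent, child) pairs, by their undirected skeleton and their v-structures. It uses the usual refinement that the two converging parents must not themselves be adjacent; the function names and the edge-list representation are assumptions made for this example.

```python
from itertools import combinations


def skeleton(edges):
    """Underlying undirected graph: each arc with its direction dropped."""
    return {frozenset(e) for e in edges}


def v_structures(edges):
    """Triples (a, child, c) where a and c both point to child
    and a and c are not adjacent in the skeleton."""
    skel = skeleton(edges)
    parents = {}
    for a, b in edges:
        parents.setdefault(b, set()).add(a)
    found = set()
    for child, ps in parents.items():
        for a, c in combinations(sorted(ps), 2):
            if frozenset((a, c)) not in skel:
                found.add((a, child, c))
    return found


def equivalent(edges_1, edges_2):
    """True if the two DAGs have the same skeleton and the same v-structures."""
    return (skeleton(edges_1) == skeleton(edges_2)
            and v_structures(edges_1) == v_structures(edges_2))


chain = [("a", "b"), ("b", "c")]       # a to b to c
reversed_chain = [("b", "a"), ("c", "b")]  # c to b to a
collider = [("a", "b"), ("c", "b")]    # a and c both point to b
print(equivalent(chain, reversed_chain))  # True
print(equivalent(chain, collider))        # False
```

The two chains cannot be told apart from observations, while the collider implies a different set of independencies.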


Learning Bayesian Networks

• Given a training set $D = \{x_1, \ldots, x_N\}$ of independent instances of $X$, find a network $B = \langle G, \Theta \rangle$ that best matches $D$
  – $\Theta$ corresponds to the parameters that specify the conditional distributions
• More precisely, we search for an equivalence class of networks that best matches $D$


Learning Bayesian Networks

• Introduce a statistically motivated scoring function that evaluates each network with respect to the training data
• Search for the optimal network according to this score
• It’s possible to derive a score using Bayesian considerations
  – Evaluate the posterior probability of a graph given the data


Learning Bayesian Networks

$$S(G : D) = \log P(G \mid D) = \log P(D \mid G) + \log P(G) + C$$
where $C$ is a constant independent of $G$ and
$$P(D \mid G) = \int P(D \mid G, \Theta)\,P(\Theta \mid G)\,d\Theta$$
is the marginal likelihood, which averages the probability of the data over all possible parameter assignments to $G$


Learning Bayesian Networks

• The particular choice of priors $P(G)$ and $P(\Theta \mid G)$ for each $G$ determines the exact Bayesian score
• Under mild assumptions on the priors, this score is asymptotically consistent. This means that, given a sufficiently large number of instances, learning procedures can pinpoint the exact network structure up to the correct equivalence class


Bayes Theorem

• Provides a direct method to calculate the probability of a hypothesis based on its prior probability, the probabilities of observing various data given the hypothesis, and the observed data itself
• Notation:
  – $P(h)$: the initial probability that hypothesis $h$ holds, before we have observed the training data (prior probability)
  – $P(D)$: the prior probability that training data $D$ will be observed
  – $P(D \mid h)$: the probability of observing data $D$ given some world in which hypothesis $h$ holds
  – $P(h \mid D)$: the probability that $h$ holds given the observed training data $D$ (posterior probability of $h$)


Bayes Theorem

• In machine learning problems we are interested in the probability $P(h \mid D)$
• This probability reflects the influence of the training data $D$, in contrast to the prior probability $P(h)$, which is independent of $D$
  $$P(h \mid D) = \frac{P(D \mid h)\,P(h)}{P(D)}$$


Bayes Theorem

• In many learning scenarios, the learner considers some set of candidate hypotheses $H$ and is interested in finding the most probable hypothesis $h \in H$ given the observed data $D$
• Such a maximally probable hypothesis is called a maximum a posteriori (MAP) hypothesis
• Use Bayes theorem to calculate the posterior probability of each candidate hypothesis


Bayes Theorem

$h_{MAP}$ is a MAP hypothesis provided that
$$h_{MAP} \equiv \arg\max_{h \in H} P(h \mid D) = \arg\max_{h \in H} \frac{P(D \mid h)\,P(h)}{P(D)} = \arg\max_{h \in H} P(D \mid h)\,P(h)$$
The term $P(D)$ is dropped in the last step because it is a constant independent of $h$


Example

• Consider a medical diagnosis problem in which there are two alternative hypotheses:
  – (1) the patient has a particular form of cancer
  – (2) the patient does not
• The available data is from a particular laboratory test with two possible outcomes: Y (positive) and N (negative)
• As prior knowledge: over the entire population, only a fraction 0.008 of people have this disease


Example

• Furthermore, the lab test is only an imperfect indicator of the disease
• The test returns a correct positive result in only 98% of the cases in which the disease is actually present and a correct negative result in only 97% of the cases in which the disease is not present
• Consider the following probabilities:
  $P(\text{cancer}) = 0.008$, $P(\neg\text{cancer}) = 0.992$
  $P(Y \mid \text{cancer}) = 0.98$, $P(N \mid \text{cancer}) = 0.02$
  $P(Y \mid \neg\text{cancer}) = 0.03$, $P(N \mid \neg\text{cancer}) = 0.97$


Example

• Suppose we now observe a new patient for whom the lab test returns a positive result
• Should we diagnose the patient as having cancer or not?


Example

• The maximum a posteriori hypothesis can be found:
  $P(Y \mid \text{cancer})\,P(\text{cancer}) = 0.98 \times 0.008 = 0.0078$
  $P(Y \mid \neg\text{cancer})\,P(\neg\text{cancer}) = 0.03 \times 0.992 = 0.0298$
  thus $h_{MAP} = \neg\text{cancer}$
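The same calculation can be written in a few lines. The sketch below uses the probabilities given in the example, picks the MAP hypothesis from the unnormalized products P(Y|h) P(h), and then normalizes them to obtain the actual posteriors. The dictionary keys and variable names are assumptions made for this example.

```python
# Prior and likelihood of a positive test, taken from the example.
prior = {"cancer": 0.008, "not_cancer": 0.992}
p_positive_given = {"cancer": 0.98, "not_cancer": 0.03}

# Unnormalized posteriors P(Y | h) * P(h).
unnormalized = {h: p_positive_given[h] * prior[h] for h in prior}

# The MAP hypothesis does not need P(Y), since it is the same for every h.
h_map = max(unnormalized, key=unnormalized.get)

# Normalizing by P(Y) = sum over h of P(Y | h) P(h) gives the posteriors.
evidence = sum(unnormalized.values())
posterior = {h: value / evidence for h, value in unnormalized.items()}

print(h_map)                          # not_cancer
print(round(posterior["cancer"], 3))  # 0.209
```

Even after a positive test, the posterior probability of cancer is only about 0.21, because the disease is rare and false positives are comparatively common.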


Learning Bayesian Networks

• Priors for hybrid networks of multinomial distributions and conditional Gaussian distributions
• Assuming that the data set is complete, several properties are satisfied by these priors:
  – The priors are structure equivalent (if $G$ and $G'$ are equivalent structures they are guaranteed to have the same score)
  – The priors are decomposable. The score can be rewritten as the sum
    $$\mathrm{Score}(G : D) = \sum_i \mathrm{ScoreContribution}(X_i, \mathrm{Parents}_G(X_i) : D)$$
    where the contribution of every variable $X_i$ to the total network score depends only on its own value and the values of its parents in $G$
  – The local contribution for each variable can be computed using a closed form equation


Learning Bayesian Networks

• Finding the structure $G$ that maximizes the score is known to be NP-hard
• Heuristic search:
  – A local search procedure: changes one arc at each move; efficiently evaluates the gains made by adding, removing or reversing a single arc
  – Greedy hill-climbing algorithm
    • At each step performs the local change that results in the maximal gain, until it reaches a local maximum
    • Although this procedure does not necessarily find a global maximum, it does perform well in practice
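A minimal sketch of greedy hill-climbing over DAG structures is given below. It assumes the caller supplies a `score` function mapping a set of (parent, child) arcs to a number; the function names and the toy score in the usage lines are hypothetical. For simplicity the sketch rescores every neighbour from scratch, whereas a real implementation would exploit a decomposable score and recompute only the local terms that change.

```python
from itertools import permutations


def has_cycle(nodes, edges):
    """Depth-first search for a directed cycle."""
    children = {v: [] for v in nodes}
    for a, b in edges:
        children[a].append(b)
    state = {v: 0 for v in nodes}  # 0 = unvisited, 1 = in progress, 2 = done

    def visit(v):
        state[v] = 1
        for w in children[v]:
            if state[w] == 1 or (state[w] == 0 and visit(w)):
                return True
        state[v] = 2
        return False

    return any(state[v] == 0 and visit(v) for v in nodes)


def neighbours(nodes, edges):
    """All graphs reachable by adding, removing or reversing a single arc."""
    for a, b in permutations(nodes, 2):
        if (a, b) in edges:
            yield edges - {(a, b)}                 # remove the arc
            yield (edges - {(a, b)}) | {(b, a)}    # reverse the arc
        elif (b, a) not in edges:
            yield edges | {(a, b)}                 # add a new arc


def hill_climb(nodes, score, start=frozenset()):
    """Repeatedly take the best single-arc change until none improves the score."""
    current = frozenset(start)
    current_score = score(current)
    while True:
        candidates = [frozenset(g) for g in neighbours(nodes, current)
                      if not has_cycle(nodes, g)]
        best = max(candidates, key=score, default=None)
        if best is None or current_score >= score(best):
            return current, current_score   # local maximum reached
        current, current_score = best, score(best)


# Toy usage: the score counts disagreements with a known target structure.
target = {("a", "b"), ("b", "c")}
graph, value = hill_climb(["a", "b", "c"], lambda g: -len(set(g) ^ target))
```

With this toy score the search recovers the target in two moves; with a real data-driven score it may stop at a local maximum, as the slide notes.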


Learning Causal Patterns

• Model the flow of causality in the system of interest (e.g. gene transcription)
• A causal network is a model of such causal processes
• It is similar to a Bayesian network, but it views the parents of a variable as its immediate causes
• Example: X is a TF of Y, so there is an edge X -> Y
• Relate causal networks and Bayesian networks by assuming the Causal Markov Assumption


Learning Causal Patterns

• When can we learn a causal network from observations?
• Distinction between:
  – Observation: a passive measurement of our domain (i.e., a sample from X)
  – Intervention: setting the values of some variables using forces outside the causal model (e.g., gene knockout or over-expression)
• Interventions are an important tool for inferring causality


Learning Causal Patterns

• What is surprising is that some causal relations can be inferred from observations alone
• When learning an equivalence class from the data, if a directed arc X -> Y is in the graph then all the networks in the class agree that X is an immediate cause of Y
• The situation is more complex when we have a combination of observations and results of different interventions


Applying Bayesian Networks to Expression Data

• The expression level of each gene is modeled as a random variable
• Other attributes that affect the system can also be modeled as random variables
  – Attributes of the sample, such as experimental conditions, temporal indicators, background variables, etc.
• Possible queries about the system
  – Does the expression level of a particular gene depend on the experimental conditions?
  – Is the dependence direct or indirect?


Learn a model from expression data

• We attempt to build a model which is a joint distribution over a collection of random variables
• This involves:
  – Statistical aspects of interpreting the results
  – Algorithmic complexity issues in learning from the data
  – The choice of local probability models


Learn a model from expression data

• Learning difficulties
  – Expression data involves thousands of genes while current datasets contain at most a few dozen samples
    • This raises problems in computational complexity and the statistical significance of the resulting networks
• The positive side
  – Genetic regulation networks are sparse
    • Bayesian networks are especially suited for learning in such sparse domains


Network Features

• When learning models with many variables, small datasets are not sufficiently informative
• Many different networks should be considered as reasonable explanations of the given data
  – We would like to analyze this set of plausible networks
• From a Bayesian perspective
  – The posterior probability over the models is not dominated by a single model


Network Features

• Characterize features that are common to most of these networks
• Focus on learning them
• Two classes of features involving pairs of variables (this type of analysis is not restricted to pairwise features)
  – Markov Relations
  – Order Relations


Markov Relations

• A relation of this type specifies if Y is in the Markov blanket of X
• The Markov blanket of X is the minimal set of variables that shield X from the rest of the variables in the model
• This relation is symmetric
  – Y is in X’s Markov blanket if and only if there is either an edge between them, or both are parents of another variable
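The definition translates directly into code: the Markov blanket of X is its parents, its children, and the other parents of its children. The sketch below assumes a DAG given as (parent, child) pairs; the function name and the example edges (the E, B, A, R, C network from the earlier example) are used only for illustration.

```python
def markov_blanket(x, edges):
    """Parents, children and co-parents of x in a DAG given as (parent, child) arcs."""
    parents = {a for a, b in edges if b == x}
    children = {b for a, b in edges if a == x}
    co_parents = {a for a, b in edges if b in children and a != x}
    return parents | children | co_parents


edges = [("E", "R"), ("E", "A"), ("B", "A"), ("A", "C")]
print(sorted(markov_blanket("E", edges)))  # ['A', 'B', 'R']
print(sorted(markov_blanket("C", edges)))  # ['A']
```

The symmetry shows up in the output: B is in the blanket of E because both are parents of A, and by the same argument E is in the blanket of B.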


Markov Relations

• In the context of gene expression analysis, a Markov relation indicates that the two genes are related in some joint biological interaction or process
• Two variables in a Markov relation are directly linked in the sense that no variable in the model mediates the dependence between them
• It remains possible that an unobserved variable (e.g., protein activation) is an intermediate in their interaction


Order Relations

• Is X an ancestor of Y in all the networks of a given equivalence class?
• This type of relation does not involve only a close neighborhood, but rather captures a global property
• Learning that X is an ancestor of Y would imply that X is a cause of Y
  – We view such a relation as an indication, rather than evidence, that X might be a causal ancestor of Y


Estimating Statistical Confidence in Features

• To what extent do the data support a given feature?
• We want to estimate a measure of confidence in the features of the learned network
  – “Confidence” approximates the likelihood that a given feature is actually true
• An effective approach for estimating confidence is the bootstrap method (Efron and Tibshirani 1993)


Methods

• Learning algorithm: induce network structure
  – Sparse Candidate Algorithm
• Feature estimate: extract useful features
  – A Bootstrap Approach


Sparse Candidate Algorithm

• A heuristic, iterative approach
• At each iteration $n$, for each variable $X_i$, the algorithm chooses the set $C_i^n = \{Y_1, \ldots, Y_k\}$ of variables which are the most promising candidate parents for $X_i$
• Search for $G_n$, a high scoring network in which $\mathrm{Parents}_{G_n}(X_i) \subseteq C_i^n$
• The network found is then used to guide the selection of better candidate sets for the next iteration


Sparse Candidate Algorithm

• Method for choosing $C_i^n$
  – Assign each variable $X_j$ some score of relevance to $X_i$ (such as correlation)
  – Choose variables with the highest score
• How to measure the relevance of potential parent $X_j$ to $X_i$?
  $$S(X_i, \mathrm{Parents}_{G_{n-1}}(X_i) \cup \{X_j\} : D) - S(X_i, \mathrm{Parents}_{G_{n-1}}(X_i) : D)$$
• Restrict the search to networks in which only the candidate parents of a variable can be its parents
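The candidate-selection step can be sketched as follows. The relevance of a potential parent is measured as the gain in the local score when it is added to the current parent set, and the k variables with the highest gain are kept. The `local_score` callback (standing in for S with the data D fixed), the function names and the toy weights are all hypothetical; this is not the full Sparse Candidate algorithm, only its selection step.

```python
def score_gain(local_score, x_i, current_parents, x_j):
    """S(X_i, Pa + {X_j} : D) - S(X_i, Pa : D), with D captured inside local_score."""
    base = frozenset(current_parents)
    return local_score(x_i, base | {x_j}) - local_score(x_i, base)


def choose_candidates(x_i, variables, current_parents, local_score, k):
    """Return the k variables with the highest relevance to x_i."""
    others = [v for v in variables if v != x_i]
    ranked = sorted(
        others,
        key=lambda x_j: score_gain(local_score, x_i, current_parents, x_j),
        reverse=True,
    )
    return ranked[:k]


# Toy usage: a made-up additive local score over hypothetical gene names.
weights = {("g1", "g2"): 3.0, ("g1", "g3"): 1.0, ("g1", "g4"): 2.0}


def toy_local_score(x, parents):
    return sum(weights.get((x, p), 0.0) for p in parents)


candidates = choose_candidates("g1", ["g1", "g2", "g3", "g4"], set(), toy_local_score, 2)
```

The structure search for the next iteration would then only consider arcs into `g1` that come from `candidates`.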


Bootstrap Method

• Generate “perturbed” versions of the original data set, and learn from them
• Collect many networks, all of which are fairly reasonable models of the data
• The networks show how small perturbations to the data can affect many of the features
• Experiments show that features induced with high confidence are rarely false positives, even in cases where the data sets are small compared to the system being learned


Bootstrap Method

• For $i = 1, \ldots, m$
  – Resample with replacement $N$ instances from $D$. Denote by $D_i$ the resulting dataset
  – Learn on $D_i$ to induce a network structure $G_i$
• For each feature $f$ of interest calculate
  $$\mathrm{conf}(f) = \frac{1}{m} \sum_{i=1}^{m} f(G_i)$$
  where $f(G_i) = 1$ if $f$ is a feature in $G_i$, and 0 otherwise
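The bootstrap loop can be sketched in a few lines. The structure learner is passed in as a function, since the slides do not fix one here; `learn_structure`, the feature predicates and the toy data below are placeholders invented for this example.

```python
import random


def bootstrap_confidence(data, learn_structure, features, m=200, seed=0):
    """Estimate conf(f) = (1/m) * sum_i f(G_i) over m bootstrap resamples.

    data            : list of N instances
    learn_structure : function mapping a dataset to a learned graph
    features        : dict mapping a feature name to a predicate on graphs
    """
    rng = random.Random(seed)
    n = len(data)
    counts = {name: 0 for name in features}
    for _ in range(m):
        # Resample N instances from D with replacement.
        resample = [data[rng.randrange(n)] for _ in range(n)]
        graph = learn_structure(resample)
        for name, holds in features.items():
            counts[name] += 1 if holds(graph) else 0
    return {name: count / m for name, count in counts.items()}


# Toy usage with a stand-in learner that returns a fixed arc when the mean is high.
def toy_learner(dataset):
    return {("x", "y")} if sum(dataset) / len(dataset) > 0.5 else set()


conf = bootstrap_confidence(
    [1] * 10,
    toy_learner,
    {"x->y": lambda g: ("x", "y") in g, "y->x": lambda g: ("y", "x") in g},
    m=50,
)
```

In a real analysis the features would be Markov or order relations between genes, and the learner would be a structure search such as the one described on the earlier slides.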


Experiment

• The map on the left is an example of Markov relation features for gene SVS1.
• The width of each edge corresponds to the confidence.


Bayesian Networks

• Advantages of Bayesian Network models
  – Can describe local interaction components
  – Can reveal the structure of the transcription regulation process
  – Provide clear methodologies for learning from
  – Can deal with incomplete data sets


Slides source

• Nir Friedman. 2002. Analysis of Gene Expression Data

• Friedman N. et al. 2000. Using Bayesian Networks to Analyze Expression Data. ICCMB
