Regulatory Networks: Bayesian Networks

Bayesian Networks
• A Bayesian Network (BN) is a representation of a joint probability distribution
– Compact and intuitive representation
– Useful for describing processes composed of locally interacting components
– Has a sound statistical foundation
– Provides models of causal influence
– Deals with noisy data
– Efficient model-learning algorithms exist
Bayesian Networks
• Why are BNs suitable for this problem?
– Gene expression is an inherently stochastic phenomenon
– BNs capture the nature of interactions between genes, especially causal connections
– Microarray techniques are associated with missing and noisy data values
Analyzing Data
• Practical problem: small data sets
– Variables: hundreds or thousands of genes
– Samples: just tens of microarray experiments
• On the positive side, genetic regulatory networks are sparse!
• Characterize and learn features that are common to most of these networks
Analyzing Data
• The first feature: Markov relations
– Symmetric relation: Y is in X's Markov blanket iff there is either an edge between them, or both are parents of another variable (Pearl 1988)
– Biological interpretation: a Markov relation indicates that the two genes are related in some joint biological interaction or process
Analyzing Data
• The second feature: order relations
– Global property: A is an ancestor of B in all the equivalent Bayesian networks learned
– Biological interpretation: an order relation indicates that the transcription of one gene is a direct cause of the transcription of another gene
Representing Distributions
• Consider a set of assertions and a variety of ways in which they support each other
• Each assertion establishes a value for an attribute and is of the form Xi = xi
• The variables are X1, ..., Xn
• We would know everything we need to know about the world described by these assertions if we had the joint probability P(X1, ..., Xn)
Representing Distributions
• From this joint distribution it is possible to compute any other probability, such as
– P(X2), or
– P(X2 | X3, X5)
• Assuming, for simplicity, that the variables are binary, the representation complexity of P(X1, ..., Xn) is 2^n
– Impractical even for small values of n
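For example, a full joint table over just n = 30 binary variables already requires 2^30, roughly 10^9, entries.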
Bayesian Networks
• BNs simplify this problem by taking advantage of
– Existing causal connections between assertions
– Assumptions about conditional independence
• A BN representation consists of two components
– G, a DAG whose vertices correspond to the random variables X1, ..., Xn
– Θ, a conditional distribution for each variable given its parents in G
• These two components specify a unique distribution on X1, ..., Xn
Bayesian Networks
• The graph G represents conditional independence assumptions that allow the joint distribution to be decomposed, economizing on the number of parameters
• The graph G encodes the Markov Assumption
– Each variable Xi is independent of its non-descendants, given its parents in G
Bayesian Networks
• Definition: we say that X is conditionally independent of Y given Z if
P(X,Y|Z) = P(X|Z) P(Y|Z)
or, equivalently,
P(X|Y,Z) = P(X|Z)
• We write I(X;Y|Z) to mean that X is independent of Y conditioned on Z
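As a quick sanity check of this definition (not from the slides), a short Python snippet that builds a toy distribution satisfying I(X;Y|Z) and verifies the first identity numerically; all numbers are made up:

```python
import itertools

# A toy joint distribution over binary X, Y, Z built so that X and Y
# are independent given Z: P(x,y,z) = P(z) P(x|z) P(y|z).
p_z = {0: 0.6, 1: 0.4}
p_x = {0: {0: 0.9, 1: 0.1}, 1: {0: 0.3, 1: 0.7}}  # p_x[z][x] = P(X=x | Z=z)
p_y = {0: {0: 0.2, 1: 0.8}, 1: {0: 0.5, 1: 0.5}}  # p_y[z][y] = P(Y=y | Z=z)

joint = {(x, y, z): p_z[z] * p_x[z][x] * p_y[z][y]
         for x, y, z in itertools.product((0, 1), repeat=3)}

def prob(pred):
    """Total probability of all assignments satisfying a predicate."""
    return sum(v for k, v in joint.items() if pred(*k))

# Check P(x,y|z) == P(x|z) P(y|z) for every assignment.
for x, y, z in itertools.product((0, 1), repeat=3):
    pz = prob(lambda a, b, c: c == z)
    lhs = prob(lambda a, b, c: (a, b, c) == (x, y, z)) / pz
    rhs = (prob(lambda a, b, c: a == x and c == z) / pz) * \
          (prob(lambda a, b, c: b == y and c == z) / pz)
    assert abs(lhs - rhs) < 1e-12
print("I(X;Y|Z) holds in this distribution")
```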
Bayesian Networks
• Any joint distribution that satisfies the Markov assumption can be decomposed into the product form

P(X1, ..., Xn) = Π_{i=1}^n P(Xi | Parents_G(Xi))

where Parents_G(Xi) is the set of parents of Xi in G
Example
[Figure: DAG with nodes E, B, A, R, C; edges E → R, E → A, B → A, A → C]

This network structure implies several conditional independence statements:

I(E;B), I(A;R|E,B), I(C;E,R,B|A), ...

The network structure also implies that the joint distribution has the product form:

P(A,B,E,C,R) = P(E) P(B) P(R|E) P(A|B,E) P(C|A)
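To make the factorization concrete, here is a minimal Python sketch of this example network; the CPT values are invented for illustration, since the slides do not give any:

```python
# The factorization of the example network E -> R, E -> A, B -> A, A -> C.
# All CPT numbers below are invented for illustration.
p_e = {True: 0.01, False: 0.99}                        # P(E)
p_b = {True: 0.02, False: 0.98}                        # P(B)
p_r = {True: {True: 0.9, False: 0.1},                  # p_r[e][r] = P(R=r | E=e)
       False: {True: 0.01, False: 0.99}}
p_a = {(True, True): 0.95, (True, False): 0.8,         # P(A=True | B=b, E=e)
       (False, True): 0.3, (False, False): 0.01}
p_c = {True: 0.7, False: 0.05}                         # P(C=True | A=a)

def joint(a, b, e, c, r):
    """P(A,B,E,C,R) = P(E) P(B) P(R|E) P(A|B,E) P(C|A)."""
    pa = p_a[(b, e)] if a else 1 - p_a[(b, e)]
    pc = p_c[a] if c else 1 - p_c[a]
    return p_e[e] * p_b[b] * p_r[e][r] * pa * pc

print(joint(a=True, b=False, e=True, c=True, r=False))  # one joint entry
```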
Bayesian Networks
• The representation complexity is O(n · 2^{k+1}), where n is the number of variables and k is the maximum number of parents per variable
• The set of local conditional probability tables for all variables, together with the set of conditional independence assumptions described by the network, describes the full joint probability distribution for the network
Example
[Figure: network with nodes Storm, BusTourGroup, Lightning, Campfire, Thunder, ForestFire]

Associated with each node is a conditional probability table. For Campfire (C), with parents Storm (S) and BusTourGroup (B):

          S,B    S,notB   notS,B   notS,notB
  C       0.4    0.1      0.8      0.2
  notC    0.6    0.9      0.2      0.8
Bayesian Networks
• Conditional distribution specification
– Discrete variables: in the case of finite-valued variables, the conditional distributions can be represented as tables
– Continuous variables: when the variable and its parents are real-valued, there is no representation of all possible densities; linear Gaussian conditional densities are used to represent multivariate continuous distributions
– Hybrid networks: when the network contains continuous variables with discrete parents, conditional Gaussian distributions are used; the case of a discrete variable with continuous parents is not allowed
Inference
• Given a Bayesian network, we might want to answer many types of questions that involve the joint probability
– What is the probability of X = x given observations of some of the other variables?
– Are X and Y independent once we observe Z?
• Example: linkage analysis
– "Linkage" refers to the tendency of certain genes to be inherited together. Linkage analysis is a tool that lets us describe a family genotype tree and find from which parent a specific gene has been inherited
– We can learn about a child given information about its parents (no grandparent information is required)
Equivalence Classes
• A Bayesian network structure G implies a set of independence assumptions
• Let Ind(G) be the set of independence statements that hold in all distributions satisfying these Markov assumptions
• More than one graph can imply exactly the same set of independencies
• Two graphs G and G' are equivalent if Ind(G) = Ind(G')
– Both graphs are alternative ways of describing the same set of independencies
Equivalence Classes
• The notion of equivalence is crucial: when we examine observations from a distribution, we cannot distinguish between equivalent graphs
• Theorem (Pearl & Verma 1991): two DAGs are equivalent if and only if they have the same underlying undirected graph and the same v-structures (i.e., converging directed edges into the same node, such as a → b ← c)
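For example, the chains a → b → c and a ← b ← c and the fork a ← b → c all share the same skeleton and contain no v-structures, so they are equivalent (each encodes I(a;c|b)); the collider a → b ← c contains a v-structure and encodes the different statement I(a;c), so it belongs to a different equivalence class.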
Learning Bayesian Networks
• Given a training set D = {x1,...,xN} of
independent instances of X, find a network
B = <G,Θ> that best matches D
– Θ corresponds to the parameters that specify the
conditional distributions
• More precisely, we search for an equivalence
class of networks that best matches D
Learning Bayesian Networks
• Introduce a statistically motivated scoring
function that evaluates each network with
respect to the training data
• Search for the optimal network according to
this score
• It’s possible to derive a score using Bayesian
considerations
– Evaluate the posterior probability of a graph given
the data
Learning Bayesian Networks
The Bayesian score:

S(G:D) = log P(G|D) = log P(D|G) + log P(G) + C

where C is a constant independent of G, and

P(D|G) = ∫ P(D|G,Θ) P(Θ|G) dΘ

is the marginal likelihood, which averages the probability of the data over all possible parameter assignments to G
Learning Bayesian Networks
• The particular choice of priors P(G) and P(Θ|G) for each G determines the exact Bayesian score
• The Bayesian score is asymptotically consistent: given a sufficiently large number of instances in large data sets, learning procedures can pinpoint the exact network structure up to the correct equivalence class
Bayes Theorem
• Provides a direct method to calculate the probability
of a hypothesis based on its prior probability, the
probabilities of observing various data given the
hypothesis, and the observed data itself
• Notation:
P(h) : the initial probability that hypothesis h holds, before we
have observed the training data (prior probability)
P(D) : the prior probability that training data D will be observed
P(D|h) : the probability of observing data D given some world in
which hypothesis h holds
P(h|D) : the probability that h holds given the observed training
data D (posterior probability of h)
Bayes Theorem
• In machine learning problems we are
interested in the probability P(h|D)
• This probability reflects the influence of the
training data D, in contrast to the prior
probability P(h), which is independent of D
P(h|D) = P(D|h) P(h) / P(D)
Bayes Theorem
• In many learning scenarios, the learner considers
some set of candidate hypotheses H and is interested
in finding the most probable hypothesis h ∈ H given
the observed data D
• Such maximally probable hypothesis is called a
maximum a posteriori (MAP) hypothesis
• Use Bayes theorem to calculate the posterior
probability of each candidate hypothesis
Bayes Theorem
h_MAP is the MAP hypothesis:

h_MAP = argmax_{h∈H} P(h|D)
      = argmax_{h∈H} P(D|h) P(h) / P(D)
      = argmax_{h∈H} P(D|h) P(h)

In the final step, the term P(D) is dropped because it is a constant independent of h
Example
• Consider a medical diagnosis problem in which there are two alternative hypotheses:
– (1) the patient has a particular form of cancer
– (2) the patient does not
• The available data come from a particular laboratory test with two possible outcomes: Y (positive) and N (negative)
• As prior knowledge: over the entire population, only 0.008 of people have this disease
Example
• Furthermore, the lab test is only an imperfect indicator of the disease
• The test returns a correct positive result in only 98% of the cases in which the disease is actually present, and a correct negative result in only 97% of the cases in which the disease is not present
• Consider the following probabilities:

P(cancer) = 0.008        P(¬cancer) = 0.992
P(Y | cancer) = 0.98     P(N | cancer) = 0.02
P(Y | ¬cancer) = 0.03    P(N | ¬cancer) = 0.97
Example
• Suppose we now observe a new patient
for whom the lab test returns a positive
result
• Should we diagnose the patient as
having cancer or not?
Example
• The maximum a posteriori hypothesis can be found:

P(Y | cancer) P(cancer) = 0.98 × 0.008 = 0.0078
P(Y | ¬cancer) P(¬cancer) = 0.03 × 0.992 = 0.0298

thus h_MAP = ¬cancer
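The same computation as a small Python check, using the probabilities from the slides (0.0078 and 0.0298 above are these products rounded):

```python
# Values from the slides
p_cancer, p_not_cancer = 0.008, 0.992
p_y_given_cancer, p_y_given_not_cancer = 0.98, 0.03

post_cancer = p_y_given_cancer * p_cancer            # 0.98 * 0.008 = 0.00784
post_not    = p_y_given_not_cancer * p_not_cancer    # 0.03 * 0.992 = 0.02976

h_map = "cancer" if post_cancer > post_not else "not cancer"
print(h_map)                                         # -> not cancer
# Normalizing gives the actual posterior P(cancer | Y):
print(post_cancer / (post_cancer + post_not))        # ~0.21
```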
Learning Bayesian Networks
• Priors for hybrid networks: multinomial distributions and conditional Gaussian distributions
• Assuming that the data set is complete, several properties are satisfied by these priors:
– The priors are structure equivalent (if G and G' are equivalent structures, they are guaranteed to have the same score)
– The priors are decomposable: the score can be rewritten as the sum

Score(G:D) = Σ_i ScoreContribution(Xi, Parents_G(Xi) : D)

where the contribution of every variable Xi to the total network score depends only on its own value and the values of its parents in G
– The local contribution of each variable can be computed using a closed-form equation
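The slides do not spell out the closed form; one standard choice is the Dirichlet (BDe-style) marginal likelihood for a discrete variable, sketched below under that assumption:

```python
import math
from collections import Counter

def local_score(data, child, parents, alpha=1.0):
    """Log marginal likelihood of one variable under a Dirichlet prior
    (a BDe-style closed form). `data` is a list of dicts mapping
    variable name -> discrete value; `alpha` is the total pseudocount."""
    parents = sorted(parents)
    child_vals = sorted({row[child] for row in data})
    counts = Counter((tuple(row[p] for p in parents), row[child]) for row in data)
    configs = {tuple(row[p] for p in parents) for row in data}
    a_jk = alpha / (len(configs) * len(child_vals))   # per-cell pseudocount
    score = 0.0
    for j in configs:
        n_j = sum(counts[(j, k)] for k in child_vals)
        a_j = a_jk * len(child_vals)
        score += math.lgamma(a_j) - math.lgamma(a_j + n_j)
        for k in child_vals:
            score += math.lgamma(a_jk + counts[(j, k)]) - math.lgamma(a_jk)
    return score
```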
Learning Bayesian Networks
• Finding the structure G that maximizes the
score is known to be NP-hard
• Heuristic search:
– A local search procedure changes one arc at each move and efficiently evaluates the gain made by adding, removing or reversing a single arc
– Greedy hill-climbing algorithm (see the sketch below): at each step, perform the local change that results in the maximal gain, until a local maximum is reached
• Although this procedure does not necessarily find a global maximum, it performs well in practice
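A compact sketch of the greedy hill-climber, assuming a decomposable `local_score` such as the one above; for brevity only add/remove arc moves are shown (the slides also mention reversal), and acyclicity is checked naively:

```python
import itertools

def is_acyclic(parents):
    """DFS cycle check on the parents-dict representation
    (edge p -> v for every p in parents[v])."""
    WHITE, GRAY, BLACK = 0, 1, 2
    color = {v: WHITE for v in parents}
    def visit(v):
        color[v] = GRAY
        for p in parents[v]:
            if color[p] == GRAY or (color[p] == WHITE and visit(p)):
                return True              # found a cycle
        color[v] = BLACK
        return False
    return not any(visit(v) for v in parents if color[v] == WHITE)

def hill_climb(variables, data, local_score):
    """Greedy hill-climbing over single-arc add/remove moves."""
    parents = {v: set() for v in variables}
    score = {v: local_score(data, v, parents[v]) for v in variables}
    while True:
        best_gain, best_move = 0.0, None
        for x, y in itertools.permutations(variables, 2):
            new_ps = parents[y] ^ {x}             # toggle the arc x -> y
            trial = dict(parents); trial[y] = new_ps
            if x not in parents[y] and not is_acyclic(trial):
                continue                           # adding x -> y would create a cycle
            gain = local_score(data, y, new_ps) - score[y]
            if gain > best_gain:
                best_gain, best_move = gain, (y, new_ps)
        if best_move is None:
            return parents                         # local maximum reached
        y, new_ps = best_move
        parents[y] = new_ps
        score[y] = local_score(data, y, new_ps)
```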
Learning Causal Patterns
• Model the flow of causality in the system of interest (e.g. gene transcription)
• A causal network is a model of such causal processes
• It is similar to a Bayesian network, but it views the parents of a variable as its immediate causes
• Example: X is a TF (transcription factor) of Y, so there is an edge X → Y
• Causal networks are related to Bayesian networks by assuming the Causal Markov Assumption
Learning Causal Patterns
• When can we learn a causal network from
observations?
• Distinction between:
– Observation: a passive measurement of our domain (i.e., a sample from X)
– Intervention: setting the values of some variables using forces outside the causal model (e.g., gene knockout or over-expression)
• Interventions are an important tool for inferring
causality
Learning Causal Patterns
• What is surprising is that some causal relations can be inferred from observations alone
• When learning an equivalence class from the data, if a directed arc X → Y is in the graph, then all the networks in the class agree that X is an immediate cause of Y
• The situation is more complex when we have a combination of observations and results of different interventions
Applying Bayesian Networks to
Expression Data
• The expression level of each gene is modeled as a random variable
• Other attributes that affect the system can also be modeled as random variables
– Attributes of the sample, such as experimental conditions, temporal indicators, background variables, etc.
• Possible queries about the system
– Does the expression level of a particular gene depend on the experimental conditions?
– Is the dependence direct or indirect?
Learn a model from expression data
• We attempt to build a model which is a joint distribution over a collection of random variables
• This involves:
– Statistical aspects of interpreting the results
– Algorithmic complexity issues in learning from the data
– The choice of local probability models
Learn a model from expression data
• Learning difficulties
– Expression data involve thousands of genes, while current datasets contain at most a few dozen samples
• This raises problems of computational complexity and of the statistical significance of the resulting networks
• The positive side
– Genetic regulatory networks are sparse
• Bayesian networks are especially suited for learning in such sparse domains
Network Features
• When learning models with many variables, small datasets are not sufficiently informative
• Many different networks must be considered as reasonable explanations of the given data
– We would like to analyze this set of plausible networks
• From a Bayesian perspective
– The posterior probability over models is not dominated by a single model
Network Features
• Characterize features that are common to most of these networks
• Focus on learning them
• Two classes of features involving pairs of variables (although this type of analysis is not restricted to pairwise features):
– Markov relations
– Order relations
Markov Relations
• A relation of this type specifies whether Y is in the Markov blanket of X
• The Markov blanket of X is the minimal set of variables that shields X from the rest of the variables in the model
• This relation is symmetric
– Y is in X's Markov blanket if and only if there is either an edge between them, or both are parents of another variable (see the helper below)
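A small helper illustrating this definition, assuming a network represented as a dict mapping each variable to its set of parents (as in the earlier sketches):

```python
def markov_blanket(parents, x):
    """Markov blanket of x: its parents, its children, and the other
    parents of its children (this matches the edge/co-parent definition)."""
    children = {v for v, ps in parents.items() if x in ps}
    co_parents = {p for c in children for p in parents[c]} - {x}
    return set(parents[x]) | children | co_parents

# In the earlier example network E -> R, E -> A, B -> A, A -> C:
net = {"E": set(), "B": set(), "R": {"E"}, "A": {"B", "E"}, "C": {"A"}}
print(markov_blanket(net, "E"))   # -> {'R', 'A', 'B'}
```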
Markov Relations
• In the context of gene expression analysis, a Markov relation indicates that the two genes are related in some joint biological interaction or process
• Two variables in a Markov relation are directly linked, in the sense that no variable in the model mediates the dependence between them
• It remains possible that an unobserved variable (e.g., protein activation) is an intermediate in their interaction
Order Relations
• Is X an ancestor of Y in all the networks of a given equivalence class?
• This type of relation does not involve only a close neighborhood; rather, it captures a global property
• Learning that X is an ancestor of Y would imply that X is a cause of Y
– We view such a relation as an indication, rather than evidence, that X might be a causal ancestor of Y
Estimating Statistical Confidence in
Features
• To what extent do the data support a given feature?
• We want to estimate a measure of confidence in the features of the learned network
– "Confidence" approximates the likelihood that a given feature is actually true
• An effective approach for estimating confidence is the bootstrap method (Efron and Tibshirani 1993)
Methods
• Learning algorithm (induce network structure)
– Sparse Candidate Algorithm
• Feature estimation (extract useful features)
– A bootstrap approach
Sparse Candidate Algorithm
• A heuristic, iterative approach
• At each iteration n, for each variable Xi, the algorithm chooses the set C_i^n = {Y1,...,Yk} of variables that are the most promising candidate parents for Xi
• It then searches for G^n, a high-scoring network in which Parents_{G^n}(Xi) ⊆ C_i^n
• The network found is then used to guide the selection of better candidate sets for the next iteration
Sparse Candidate Algorithm
• Method for choosing C_i^n (sketched in code after this slide)
– Assign each variable Xj some score of relevance to Xi (such as correlation)
– Choose the variables with the highest scores
• How to measure the relevance of a potential parent Xj to Xi? By the score gain

S(Xi, Parents_{G^{n-1}}(Xi) ∪ {Xj} : D) − S(Xi, Parents_{G^{n-1}}(Xi) : D)

• Restrict the search to networks in which only the candidate parents of a variable can be its parents
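A schematic sketch of the Sparse Candidate loop under the same parents-dict representation; `relevance` and `search_restricted` are hypothetical placeholders for a relevance measure (e.g. correlation or the score gain above) and a candidate-restricted structure search:

```python
def sparse_candidate(variables, data, local_score, relevance,
                     search_restricted, k=5, n_iters=5):
    """Schematic Sparse Candidate loop. `relevance(data, xi, xj, ps)` and
    `search_restricted(variables, data, local_score, candidates)` are
    hypothetical placeholders: a relevance measure and a structure search
    constrained to Parents(Xi) being a subset of candidates[Xi]."""
    parents = {v: set() for v in variables}        # G^0: empty network
    for _ in range(n_iters):
        candidates = {}
        for xi in variables:
            others = [xj for xj in variables if xj != xi and xj not in parents[xi]]
            others.sort(key=lambda xj: relevance(data, xi, xj, parents[xi]),
                        reverse=True)
            # keep current parents, top up with the most relevant others
            candidates[xi] = set(parents[xi]) | set(others[:k - len(parents[xi])])
        parents = search_restricted(variables, data, local_score, candidates)
    return parents
```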
Bootstrap Method
• Generate "perturbed" versions of the original data set, and learn from them
• Collect many networks, all of which are fairly reasonable models of the data
• The networks show how small perturbations of the data can affect many of the features
• Experiments show that features induced with high confidence are rarely false positives, even in cases where the data sets are small compared to the system being learned
Bootstrap Method
• For i = 1, ..., m (sketched in code below):
– Resample with replacement N instances from D; denote the resulting dataset by D_i
– Learn on D_i to induce a network structure G_i
• For each feature f of interest calculate

conf(f) = (1/m) Σ_{i=1}^m f(G_i)

where f(G_i) = 1 if f is a feature in G_i, and 0 otherwise
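The loop above as a short Python sketch; `learn_network` stands in for any structure learner (e.g. the Sparse Candidate sketch) and features are predicates over the learned graph:

```python
import random

def bootstrap_confidence(data, learn_network, features, m=100):
    """Estimate conf(f) by resampling. `learn_network(dataset)` is a
    stand-in for any structure learner returning a parents dict; each
    feature is a predicate over that dict."""
    counts = {f: 0 for f in features}
    for _ in range(m):
        resample = [random.choice(data) for _ in range(len(data))]  # N draws with replacement
        g = learn_network(resample)
        for f in features:
            counts[f] += 1 if f(g) else 0      # f(G_i) = 1 if the feature holds
    return {f: c / m for f, c in counts.items()}

# Example feature: is there an edge X -> Y in the learned graph?
edge_x_to_y = lambda g: "X" in g.get("Y", set())
```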
Experiment
• The graph (in the original slide figure) shows an example of Markov relation features for the gene SVS1
• The width of each edge corresponds to its confidence
Bayesian Networks
• Advantages of Bayesian network models
– Can describe local interaction components
– Can reveal the structure of the transcription regulation process
– Provide clear methodologies for learning from data
– Can deal with incomplete data sets
Slides source
• Nir Friedman. 2002. Analysis of Gene Expression Data
• Friedman N. et al. 2000. Using Bayesian Networks to Analyze
Expression Data. ICCMB