Probabilistic graphical models and regulatory networks
BMI/CS 576
www.biostat.wisc.edu/bmi576.html
Sushmita Roy
sroy@biostat.wisc.edu
Nov 27th, 2012
Two main questions in regulatory networks

Example: the transcription factors Sko1 and Hot1 regulate the gene HSP12.

• Structure: who are the regulators? (e.g., Hot1 regulates HSP12; HSP12 is a target of Hot1)
• Function: how do the regulators determine expression levels? The function ψ(X1, X2) mapping regulator levels X1, X2 to a target level Y can be BOOLEAN, LINEAR, DIFF. EQNS, PROBABILISTIC, ...
Graphical models for representing regulatory networks

• Bayesian networks
• Dependency networks

Structure: random variables encode expression levels (e.g., Msb2, Sho1, Ste20); edges from regulators (X1, X2) to a target (Y3) correspond to some form of statistical dependency.

Function: Y3 = f(X1, X2)
Bayesian networks: estimate a set of conditional probability distributions

Function: each target Yi (the child) has a conditional probability distribution (CPD) given its regulators (the parents).
Dependency networks: a set of regression problems

Function: linear regression. Each target Yi is modeled as a linear function of its potential regulators X1, ..., Xp (p = number of genes):

  Yi = b1 X1 + ... + bp Xp

where the weight vector b is estimated by minimizing the squared error plus a regularization term, which keeps most weights at zero so that only a few regulators are selected per target.
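As a sketch of the dependency-network idea, the regression for one target gene can be set up with NumPy. The data here are synthetic, the "true" regulators (genes 2 and 7) are an assumption of the example, and a ridge (L2) penalty stands in for whichever regularization term a given method actually uses (often an L1/lasso penalty):

```python
import numpy as np

# Sketch: estimate the regulators of one target gene by regularized
# linear regression on the other genes' expression levels.
# Ridge (L2) is used here for its closed-form solution; dependency-network
# methods often use an L1 (lasso) penalty instead.

rng = np.random.default_rng(0)
n_samples, n_genes = 50, 10            # expression matrix: samples x genes
X = rng.normal(size=(n_samples, n_genes))
# Synthetic target: regulated by genes 2 and 7 plus noise (assumed example)
y = 1.5 * X[:, 2] - 2.0 * X[:, 7] + 0.1 * rng.normal(size=n_samples)

lam = 1.0                              # regularization strength
# argmin_b ||y - Xb||^2 + lam * ||b||^2 has the closed form below
b = np.linalg.solve(X.T @ X + lam * np.eye(n_genes), X.T @ y)

# The largest |b_j| point to the inferred regulators of the target
top = np.argsort(-np.abs(b))[:2]
assert sorted(top.tolist()) == [2, 7]  # the two planted regulators dominate
```

In practice one such regression is solved per target gene, and the nonzero (or largest) weights define that gene's incoming edges.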
Bayesian networks

• a BN is a Directed Acyclic Graph (DAG) in which
  – the nodes denote random variables
  – each node X has a conditional probability distribution (CPD) representing P(X | Parents(X))
• A type of probabilistic graphical model (PGM)
  – A graph
  – Parameters (for probability distributions)
• the intuitive meaning of an arc from X to Y is that X directly influences Y
• Provides a tractable way to work with large joint distributions
Bayesian networks for representing regulatory networks

Each target Yi (the child) has a conditional probability distribution (CPD) given its regulators (the parents).
Example Bayesian network

Assume each Xi is binary, and consider a child node and its parents among X1, ..., X5.

• With no independence assertions, the full joint distribution over all five variables needs 2^5 measurements.
• With the independence assertions of the DAG, the CPD over the child and its parents needs only 2^3 measurements.
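The 2^5 versus 2^3 counts above are just powers of two; a quick check (assuming, for the 2^3 case, a child with two binary parents):

```python
# Counting table sizes for binary variables (numbers as on the slide).
n_vars = 5

# No independence assertions: the full joint over 5 binary variables
# has one entry per configuration of (X1, ..., X5).
full_joint_entries = 2 ** n_vars
assert full_joint_entries == 32      # 2^5

# With the DAG's independence assertions, each CPD P(Xi | parents) is a
# table over Xi and its parents only. For a child with two binary
# parents, three binary variables are involved:
cpd_entries = 2 ** 3
assert cpd_entries == 8              # 2^3

print(full_joint_entries, cpd_entries)
```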
Representing CPDs for discrete variables

• CPDs can be represented using tables or trees
• consider the following case with Boolean variables A, B, C, D

P(D | A, B, C) as a table:

  A  B  C | Pr(D=t)  Pr(D=f)
  t  t  t |   0.9      0.1
  t  t  f |   0.9      0.1
  t  f  t |   0.9      0.1
  t  f  f |   0.9      0.1
  f  t  t |   0.8      0.2
  f  t  f |   0.5      0.5
  f  f  t |   0.5      0.5
  f  f  f |   0.5      0.5

P(D | A, B, C) as a tree: split first on A. If A = t, Pr(D=t) = 0.9. If A = f, split on B: if B = f, Pr(D=t) = 0.5; if B = t, split on C, with Pr(D=t) = 0.8 when C = t and 0.5 when C = f.
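The table and tree above can be written down directly; a small sketch checking that the two representations of P(D=t | A, B, C) agree:

```python
# Two equivalent representations of P(D=t | A, B, C) from the slide.

# 1) Full table: one row per configuration of (A, B, C)
cpd_table = {
    (True,  True,  True):  0.9,
    (True,  True,  False): 0.9,
    (True,  False, True):  0.9,
    (True,  False, False): 0.9,
    (False, True,  True):  0.8,
    (False, True,  False): 0.5,
    (False, False, True):  0.5,
    (False, False, False): 0.5,
}

# 2) Tree: shared subtrees collapse identical rows
def cpd_tree(a, b, c):
    if a:                         # A = t: D is independent of B and C
        return 0.9
    if not b:                     # A = f, B = f: C is irrelevant
        return 0.5
    return 0.8 if c else 0.5      # A = f, B = t: split on C

# The two representations agree on every configuration
assert all(cpd_table[(a, b, c)] == cpd_tree(a, b, c)
           for a in (True, False)
           for b in (True, False)
           for c in (True, False))
```

The tree needs only four leaves where the table needs eight rows, which is the point of the tree representation: it exploits context-specific regularities in the CPD.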
Representing CPDs for continuous variables

Function: conditional Gaussian. For example, a child X3 with parents X1 and X2 can be modeled as a Gaussian whose mean is a linear function of the parents; the parameters are the regression weights and the variance.
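A conditional Gaussian CPD can be fit by ordinary least squares; a minimal sketch on synthetic data (the generating coefficients 0.5, 2.0, and -1.0 are assumptions of the example):

```python
import numpy as np

# Sketch: fit a conditional Gaussian P(X3 | X1, X2) = N(a + b1*X1 + b2*X2, s^2)
# by least squares on synthetic data.

rng = np.random.default_rng(1)
n = 200
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
# Assumed generating model: mean 0.5 + 2.0*x1 - 1.0*x2, noise sd 0.2
x3 = 0.5 + 2.0 * x1 - 1.0 * x2 + 0.2 * rng.normal(size=n)

# Design matrix with an intercept column
A = np.column_stack([np.ones(n), x1, x2])
coef, *_ = np.linalg.lstsq(A, x3, rcond=None)
resid_sd = np.std(x3 - A @ coef)

print(coef, resid_sd)   # coefficients close to [0.5, 2.0, -1.0], sd near 0.2
```

The fitted weights and residual standard deviation are exactly the "parameters" of the conditional Gaussian CPD.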
The learning problems

• Parameter learning:
  – Given a network structure, learn the parameters from data
• Structure learning:
  – Given data, learn the structure (and parameters)
  – Subsumes parameter learning
Structure learning

The score can be defined in either framework:
• Maximum likelihood framework
• Bayesian framework
The structure learning task

• structure learning methods have two main components
  1. a scheme for scoring a given BN structure
  2. a search procedure for exploring the space of structures
Structure learning using score-based search

Score each candidate graph (e.g., by maximum likelihood) and keep the best graph found.
Decomposability of scores

• The score decomposes over variables: S = Σ_i S(X_i).
• Thus we can independently optimize each S(X_i).
• It all boils down to accurately estimating the conditional probability distributions.
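A toy illustration of decomposability (the network X1 → X2 and the data below are made up for the example): the total log-likelihood equals the sum of the per-variable family scores, so each family can be optimized on its own.

```python
import math
from collections import Counter

# Toy network X1 -> X2 over binary data (assumed example).
data = [(0, 0), (0, 1), (1, 1), (1, 1), (0, 0), (1, 0)]

# Maximum-likelihood CPD estimates from counts
c1 = Counter(x1 for x1, _ in data)
c12 = Counter(data)

def p_x1(v):
    return c1[v] / len(data)

def p_x2_given_x1(v2, v1):
    return c12[(v1, v2)] / c1[v1]

# Per-family scores S(X1) and S(X2)
s1 = sum(math.log(p_x1(x1)) for x1, _ in data)
s2 = sum(math.log(p_x2_given_x1(x2, x1)) for x1, x2 in data)

# Joint log-likelihood computed directly from the factored joint
joint = sum(math.log(p_x1(x1) * p_x2_given_x1(x2, x1)) for x1, x2 in data)

assert abs((s1 + s2) - joint) < 1e-9   # the score decomposes over variables
```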
Structure search operators

Given the current network (say, over nodes A, B, C, D) at some stage of the search, we can...
• add an edge
• delete an edge
Check for cycles after each operation.
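The cycle check for an add-edge operator can be a reachability test; a minimal sketch (the helper name creates_cycle is our own):

```python
# Sketch: before applying "add edge u -> v", check whether v can already
# reach u in the current DAG; if so, the new edge would close a cycle.

def creates_cycle(adj, u, v):
    """Would adding edge u -> v create a directed cycle in adj?"""
    # DFS from v: a cycle appears iff u is reachable from v
    stack, seen = [v], set()
    while stack:
        node = stack.pop()
        if node == u:
            return True
        if node in seen:
            continue
        seen.add(node)
        stack.extend(adj.get(node, ()))
    return False

# Current network: A -> B -> C, plus an isolated D
adj = {"A": ["B"], "B": ["C"], "C": [], "D": []}
assert creates_cycle(adj, "C", "A")        # C -> A would close a cycle
assert not creates_cycle(adj, "A", "D")    # A -> D is safe
```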
Bayesian network search: hill-climbing

given: data set D, initial network B0

  i = 0
  Bbest ← B0
  while stopping criteria not met {
      for each possible operator application a {
          Bnew ← apply(a, Bi)
          if score(Bnew) > score(Bbest)
              Bbest ← Bnew
      }
      ++i
      Bi ← Bbest
  }
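A runnable sketch of this loop, with a toy score function standing in for a data-based one (the rewarded edge set and variable names are assumptions of the example):

```python
# Toy hill-climbing over edge sets on three variables. A real
# implementation would score each candidate network against the data
# (e.g. by log-likelihood) and check for cycles.

TARGET = {("A", "B"), ("B", "C")}        # edges the toy score rewards

def score(edges):
    # +1 per rewarded edge present, -1 per extra edge
    return len(edges & TARGET) - len(edges - TARGET)

def operator_applications(edges, variables=("A", "B", "C")):
    """Candidate networks one add-edge or delete-edge move away."""
    for u in variables:
        for v in variables:
            if u == v:
                continue
            e = (u, v)
            yield edges - {e} if e in edges else edges | {e}

def hill_climb(b0=frozenset(), max_iters=20):
    best = frozenset(b0)
    for _ in range(max_iters):
        current = best
        for candidate in operator_applications(current):
            if score(candidate) > score(best):
                best = frozenset(candidate)
        if best == current:              # stopping criterion: no improvement
            break
    return best

assert hill_climb() == frozenset(TARGET)
```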
Learning networks from expression is difficult due to too few measurements

• Reduce the candidate parents
  – Sparse candidate
  – Prior knowledge
  – MinREG
• Reduce the target set
  – Module networks
Bayesian network search: the Sparse Candidate algorithm
[Friedman et al., UAI 1999]

Given: data set D, initial network B0, parameter k

• to identify candidate parents in the first iteration, we can compute the mutual information between pairs of variables:

  I(X; Y) = Σ_{x,y} P̂(x, y) log [ P̂(x, y) / (P̂(x) P̂(y)) ]

• where P̂ denotes the probabilities estimated from the data set
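A sketch of this first-iteration mutual-information computation on toy discretized data (the observations are made up):

```python
import math
from collections import Counter

# Empirical mutual information I(X;Y) from paired discrete observations,
# as used in Sparse Candidate's first restrict step (toy data).

pairs = [(0, 0), (0, 0), (1, 1), (1, 1), (0, 1), (1, 0), (0, 0), (1, 1)]
n = len(pairs)

# Empirical (hat) probabilities from counts
p_xy = {k: c / n for k, c in Counter(pairs).items()}
p_x = {k: c / n for k, c in Counter(x for x, _ in pairs).items()}
p_y = {k: c / n for k, c in Counter(y for _, y in pairs).items()}

mi = sum(p * math.log(p / (p_x[x] * p_y[y]))
         for (x, y), p in p_xy.items())
assert mi > 0          # these toy variables are positively associated
```

Computing this for all pairs costs O(n^2) MI evaluations over the variables, which is still far cheaper than scoring all parent sets.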
The restrict step in Sparse Candidate

• suppose we know the true network structure over A, B, C, and D
• we're selecting two candidate parents for A, and I(A; C) > I(A; D) > I(A; B)
• the candidate parents for A would then be C and D; how could we get B as a candidate parent on the next iteration?
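Choosing the k highest-MI variables as candidates can be sketched directly (the MI numbers below are invented to match the ordering in the example):

```python
# Pick the k candidate parents for A with highest mutual information.
# The MI values are made up to satisfy I(A;C) > I(A;D) > I(A;B).

mi_with_A = {"C": 0.30, "D": 0.20, "B": 0.05}
k = 2
candidates = sorted(mi_with_A, key=mi_with_A.get, reverse=True)[:k]
assert candidates == ["C", "D"]    # B is left out on this iteration
```

Getting B in on a later iteration requires replacing raw MI with a measure that accounts for what the current network already explains, which is where the KL-based discrepancy on the next slides comes in.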
The restrict step in Sparse Candidate

• Kullback–Leibler (KL) divergence provides a distance measure between two distributions P and Q:

  D_KL(P || Q) = Σ_x P(x) log [ P(x) / Q(x) ]

• mutual information can be thought of as the KL divergence between the joint distribution P(X, Y) and the product of marginals P(X) P(Y) (which assumes X and Y are independent):

  I(X; Y) = D_KL( P(X, Y) || P(X) P(Y) )
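A numeric check of this identity on a toy joint distribution (the numbers are assumed):

```python
import math

# Check: I(X;Y) equals KL( P(X,Y) || P(X)P(Y) ) on a toy distribution.

p_xy = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.1, (1, 1): 0.4}
p_x = {0: 0.5, 1: 0.5}   # marginals of p_xy
p_y = {0: 0.5, 1: 0.5}

def kl(p, q):
    """KL divergence between two distributions over the same support."""
    return sum(p[k] * math.log(p[k] / q[k]) for k in p)

# Product-of-marginals distribution (what independence would predict)
q_indep = {(x, y): p_x[x] * p_y[y] for x in p_x for y in p_y}

# Mutual information computed directly from its definition
mi = sum(p * math.log(p / (p_x[x] * p_y[y])) for (x, y), p in p_xy.items())

assert abs(kl(p_xy, q_indep) - mi) < 1e-12
```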
The restrict step in Sparse Candidate

• we can use KL to assess the discrepancy between the network's estimate Pnet(X, Y) and the empirical estimate P̂(X, Y), i.e., between the current Bayes net (over A, B, C, D) and the true distribution
The restrict step in Sparse Candidate

• How might we calculate Pnet(A, B)?
• Each variable's current parents Pa(Xi) are always retained among its candidates; this is important to ensure monotonic improvement of the score.
The maximize step in Sparse Candidate

• hill-climbing search with add-edge, delete-edge, and reverse-edge operators, restricted to each variable's candidate parents
• test to ensure that cycles aren't introduced into the graph
Efficiency of Sparse Candidate
(n = number of variables)

  approach                           | possible parent sets | changes scored on | changes scored on
                                     | for each node        | first iteration   | subsequent iterations
                                     |                      | of search         |
  ordinary greedy search             |                      |                   |
  greedy search w/ at most k parents |                      |                   |
  Sparse Candidate                   |                      |                   |