Probabilistic graphical models and regulatory networks


BMI/CS 576

www.biostat.wisc.edu/bmi576.html

Sushmita Roy

sroy@biostat.wisc.edu


Nov 27th, 2012


Two main questions in regulatory networks

Structure: who are the regulators?
(Figure: the transcription factors Sko1 and Hot1 regulating the target gene HSP12.)
We say "Hot1 regulates HSP12", or equivalently, "HSP12 is a target of Hot1".

Function: how do the regulators determine expression levels?
Y = ψ(X1, X2), where the function ψ over regulator levels X1 and X2 may be Boolean, linear, differential equations, probabilistic, ...

Graphical models for representing regulatory networks

- Bayesian networks
- Dependency networks

Structure: random variables encode expression levels; edges correspond to some form of statistical dependencies.
(Figure: a signaling example with Msb2, Sho1, and Ste20, drawn as regulators X1, X2 with target Y3.)

Function: Y3 = f(X1, X2)

Bayesian networks: estimate a set of conditional probability distributions

Function: conditional probability distribution (CPD)
(Figure: a target (child) Yi with edges from its regulators (parents).)
Dependency networks: a set of regression problems

Function: linear regression. Each target Yi is modeled as a linear function of its candidate regulators X1, ..., Xp:

$$\underbrace{Y_i}_{d \times 1} = \underbrace{\begin{bmatrix} X_1 \cdots X_p \end{bmatrix}}_{d \times p} \underbrace{b}_{p \times 1}$$

where p is the number of genes, d is the number of measurements, and the coefficients b_j are estimated with a regularization term added to the fit.

(Figure: regulators with edges into the target Yi.)
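The per-gene regressions can be sketched in a few lines. This is a minimal sketch, assuming an expression matrix of measurements × genes and using scikit-learn's Lasso for the regularization; the names `expr` and `fit_dependency_network` are illustrative, not from the slides:

```python
import numpy as np
from sklearn.linear_model import Lasso

def fit_dependency_network(expr, alpha=0.1):
    """Fit one regularized regression per gene (column of expr).

    expr: (d, p) array of d measurements for p genes.
    Returns a (p, p) matrix B where B[i, j] is the weight of
    regulator j in the linear model for target i.
    """
    d, p = expr.shape
    B = np.zeros((p, p))
    for i in range(p):
        y = expr[:, i]                        # target gene Y_i
        X = np.delete(expr, i, axis=1)        # all other genes as candidate regulators
        model = Lasso(alpha=alpha).fit(X, y)  # L1 penalty encourages few regulators
        B[i, np.arange(p) != i] = model.coef_
    return B

# illustrative use on random data
expr = np.random.randn(50, 10)   # 50 measurements, 10 genes
B = fit_dependency_network(expr)
print("regulators of gene 0:", np.nonzero(B[0])[0])
```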

Bayesian networks

a BN is a Directed Acyclic Graph (DAG) in which
- the nodes denote random variables
- each node X has a conditional probability distribution (CPD) representing P(X | Parents(X))

A type of probabilistic graphical model (PGM), consisting of
- a graph
- parameters (for the probability distributions)

the intuitive meaning of an arc from X to Y is that X directly influences Y

Provides a tractable way to work with large joint distributions
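Concretely, the tractability comes from the factorization that the DAG implies; this standard identity (implicit in the slide) lets the joint distribution be stored and computed one family at a time:

$$P(X_1, \ldots, X_n) = \prod_{i=1}^{n} P\big(X_i \mid \mathrm{Parents}(X_i)\big)$$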


Bayesian networks for representing regulatory networks

Conditional probability distribution (CPD)
(Figure: a target (child) Yi with edges from its regulators (parents).)

Example Bayesian network

(Figure: a five-node network over X1, ..., X5, with one node marked as the child and its parents marked.)

Assume each Xi is binary.
- with no independence assertions: needs 2^5 = 32 measurements
- with the network's independence assertions: needs only 2^3 = 8 measurements

Representing CPDs for discrete variables

- CPDs can be represented using tables or trees
- consider the following case with Boolean variables A, B, C, D

P(D | A, B, C) as a table:

A   B   C   Pr(D = t)   Pr(D = f)
t   t   t   0.9         0.1
t   t   f   0.9         0.1
t   f   t   0.9         0.1
t   f   f   0.9         0.1
f   t   t   0.8         0.2
f   t   f   0.5         0.5
f   f   t   0.5         0.5
f   f   f   0.5         0.5

P(D | A, B, C) as a tree:

A = t: Pr(D = t) = 0.9
A = f:
    B = f: Pr(D = t) = 0.5
    B = t:
        C = t: Pr(D = t) = 0.8
        C = f: Pr(D = t) = 0.5
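The two representations can be sketched directly in code. The dictionary and nested-tuple encodings below are illustrative choices, not from the slides:

```python
# P(D = t | A, B, C) as a table: one entry per parent configuration
cpd_table = {
    (True,  True,  True):  0.9, (True,  True,  False): 0.9,
    (True,  False, True):  0.9, (True,  False, False): 0.9,
    (False, True,  True):  0.8, (False, True,  False): 0.5,
    (False, False, True):  0.5, (False, False, False): 0.5,
}

# The same CPD as a tree: internal nodes test a variable, leaves hold Pr(D = t).
# A node is (var_index, subtree_if_true, subtree_if_false); a leaf is a float.
cpd_tree = (0,                 # test A
            0.9,               # A = t
            (1,                # A = f: test B
             (2, 0.8, 0.5),    # B = t: test C (t -> 0.8, f -> 0.5)
             0.5))             # B = f

def lookup(tree, assignment):
    """Walk the tree CPD for an (A, B, C) truth assignment."""
    while not isinstance(tree, float):
        var, if_true, if_false = tree
        tree = if_true if assignment[var] else if_false
    return tree

for abc in cpd_table:
    assert lookup(cpd_tree, abc) == cpd_table[abc]  # the two representations agree
print("Pr(D=t | A=f, B=t, C=t) =", lookup(cpd_tree, (False, True, True)))
```

The tree needs only 4 leaves instead of 8 rows because several parent configurations share the same distribution.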

Representing CPDs for continuous variables

Conditional Gaussian: e.g., for a child X3 with parents X1 and X2,

$$P(X_3 \mid X_1, X_2) = \mathcal{N}\big(a_0 + a_1 X_1 + a_2 X_2,\ \sigma^2\big)$$

Parameters: the coefficients a0, a1, a2 and the variance σ².
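A minimal sketch of estimating such a CPD from data by least squares; the simulated data and variable names are illustrative, not from the slides:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
x1, x2 = rng.normal(size=n), rng.normal(size=n)
x3 = 1.0 + 2.0 * x1 - 0.5 * x2 + rng.normal(scale=0.3, size=n)  # simulated child

# Least-squares estimate of the mean parameters a0, a1, a2
A = np.column_stack([np.ones(n), x1, x2])
coef, *_ = np.linalg.lstsq(A, x3, rcond=None)
sigma2 = np.mean((x3 - A @ coef) ** 2)   # residual variance estimate

print("a0, a1, a2 =", np.round(coef, 2), " sigma^2 =", round(float(sigma2), 3))
```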

The learning problems

- Parameter learning: given a network structure, learn the parameters from data
- Structure learning: given data, learn the structure (and parameters); subsumes parameter learning

Structure learning

Maximum likelihood framework: score a graph by the likelihood of the data under its best-fitting parameters,

$$\mathrm{score}(G) = \log P(D \mid G, \hat{\theta}_G)$$

Bayesian framework: score a graph by its posterior,

$$\log P(G \mid D) = \log P(D \mid G) + \log P(G) - \log P(D)$$

In both frameworks, the score measures how well a candidate graph G fits the data D.

The structure learning task

structure learning methods have two main components:
1. a scheme for scoring a given BN structure
2. a search procedure for exploring the space of structures

Structure learning using score-based search

(Figure: candidate structures are scored against the data, e.g., by maximum likelihood, and the best graph is returned.)

Decomposability of scores

Score decomposes over variables:

$$S(G) = \sum_i S\big(X_i, \mathrm{Pa}(X_i)\big)$$

Thus we can independently optimize each S(X_i).

It all boils down to accurately estimating the conditional probability distributions.

Structure search operators

given the current network at some stage of the search, we can:
- add an edge (check for cycles)
- delete an edge

(Figure: a four-node network over A, B, C, D shown before and after each operator.)

Bayesian network search: hill-climbing

given: data set D, initial network B0

    i = 0
    B_best ← B0
    while stopping criteria not met
    {
        for each possible operator application a
        {
            B_new ← apply(a, B_i)
            if score(B_new) > score(B_best)
                B_best ← B_new
        }
        ++i
        B_i ← B_best
    }
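A runnable toy version of this loop, as a sketch rather than the slides' implementation: all names are illustrative, it uses a decomposable BIC-style score over binary data (so each candidate is scored family by family, per the decomposability slide above), and its operators are add-edge with a cycle check and delete-edge:

```python
import itertools
import math
import numpy as np

def family_score(data, child, parents):
    """BIC-style score of one family P(child | parents): log-likelihood minus a penalty."""
    loglik = 0.0
    for cfg in itertools.product([0, 1], repeat=len(parents)):
        if parents:
            mask = np.all(data[:, list(parents)] == cfg, axis=1)
        else:
            mask = np.ones(len(data), dtype=bool)
        n = int(mask.sum())
        if n == 0:
            continue
        n1 = int(data[mask, child].sum())
        for count in (n1, n - n1):
            if count > 0:
                loglik += count * math.log(count / n)
    return loglik - 0.5 * math.log(len(data)) * (2 ** len(parents))

def bn_score(data, graph):
    """Decomposable score: sum of family scores (graph maps child -> tuple of parents)."""
    return sum(family_score(data, c, ps) for c, ps in graph.items())

def creates_cycle(graph, u, v):
    """Would adding edge u -> v create a directed cycle? DFS from v looking for u."""
    stack, seen = [v], set()
    while stack:
        node = stack.pop()
        if node == u:
            return True
        if node not in seen:
            seen.add(node)
            stack.extend(child for child, ps in graph.items() if node in ps)
    return False

def hill_climb(data, n_vars, max_iters=100):
    graph = {i: () for i in range(n_vars)}          # B_0: the empty network
    best = bn_score(data, graph)
    for _ in range(max_iters):                      # stop when no move improves the score
        improved = False
        for u, v in itertools.permutations(range(n_vars), 2):
            cand = dict(graph)
            if u in graph[v]:                       # operator: delete edge u -> v
                cand[v] = tuple(p for p in graph[v] if p != u)
            elif not creates_cycle(graph, u, v):    # operator: add edge u -> v (check for cycles)
                cand[v] = graph[v] + (u,)
            else:
                continue
            s = bn_score(data, cand)
            if s > best:                            # greedy: keep any improving neighbor
                graph, best, improved = cand, s, True
        if not improved:
            break
    return graph

# toy data: column 2 is a noisy copy of column 1; column 0 is independent
rng = np.random.default_rng(0)
x1 = rng.integers(0, 2, 500)
data = np.column_stack([rng.integers(0, 2, 500), x1,
                        (x1 ^ (rng.random(500) < 0.1)).astype(int)])
print(hill_climb(data, 3))   # expect an edge between columns 1 and 2
```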





Learning networks from expression is difficult due to too few measurements

- Reduce the candidate parents
  - Sparse Candidate
  - Prior knowledge
  - MinREG
- Reduce the target set
  - Module networks

Bayesian network search: the Sparse Candidate algorithm [Friedman et al., UAI 1999]

Given: data set D, initial network B0, parameter k

to identify candidate parents in the first iteration, we can compute the mutual information between pairs of variables:

$$I(X; Y) = \sum_{x, y} \hat{P}(x, y) \log \frac{\hat{P}(x, y)}{\hat{P}(x)\,\hat{P}(y)}$$

where $\hat{P}$ denotes the probabilities estimated from the data set.
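A minimal empirical estimator of this quantity for two discrete variables (a hand-rolled sketch; the helper name is illustrative):

```python
import numpy as np
from collections import Counter

def mutual_information(x, y):
    """Estimate I(X;Y) in nats from paired discrete samples, using empirical probabilities."""
    n = len(x)
    pxy = Counter(zip(x, y))        # joint counts
    px, py = Counter(x), Counter(y) # marginal counts
    mi = 0.0
    for (a, b), c in pxy.items():
        p_ab = c / n
        mi += p_ab * np.log(p_ab / ((px[a] / n) * (py[b] / n)))
    return mi

x = np.random.randint(0, 2, 1000)
y = (x + (np.random.rand(1000) < 0.1)) % 2                     # y: noisy copy of x
print(mutual_information(x, y))                                # high MI
print(mutual_information(x, np.random.randint(0, 2, 1000)))    # near zero
```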

The restrict step in Sparse Candidate

suppose the true network structure is:

(Figure: a four-node network over A, B, C, D.)

we're selecting two candidate parents for A, and I(A; C) > I(A; D) > I(A; B)

the candidate parents for A would then be C and D; how could we get B as a candidate parent on the next iteration?

The restrict step in Sparse Candidate

mutual information can be thought of as the KL divergence between the distributions P(X, Y) and P(X)P(Y) (the latter assumes X and Y are independent):

$$I(X; Y) = D_{KL}\big(P(X, Y) \,\|\, P(X)P(Y)\big)$$

Kullback-Leibler (KL) divergence provides a distance measure between two distributions, P and Q:

$$D_{KL}(P \,\|\, Q) = \sum_{x} P(x) \log \frac{P(x)}{Q(x)}$$

The restrict step in Sparse Candidate

we can use KL to assess the discrepancy between the network's estimate P_net(X, Y) and the empirical estimate P̂(X, Y):

$$M(X, Y) = D_{KL}\big(\hat{P}(X, Y) \,\|\, P_{net}(X, Y)\big)$$

(Figure: the true distribution over A, B, C, D versus the current Bayes net.)

The restrict step in Sparse Candidate

How might we calculate P_net(A, B)?

The restrict step in Sparse Candidate

Pa(X_i): the current parents of X_i

when choosing the k candidate parents for X_i, always keep Pa(X_i) among the candidates; this is important to ensure monotonic improvement of the score across iterations.

The maximize step in Sparse Candidate

- hill-climbing search with add-edge, delete-edge, reverse-edge operators
- test to ensure that cycles aren't introduced into the graph

Efficiency of Sparse Candidate

(Table: compares ordinary greedy search, greedy search with at most k parents, and Sparse Candidate on the number of possible parent sets for each node, the changes scored on the first iteration of search, and the changes scored on subsequent iterations.)

n = number of variables