# Probabilistic graphical models and regulatory networks


BMI/CS 576
www.biostat.wisc.edu/bmi576.html

Sushmita Roy
sroy@biostat.wisc.edu

Nov 27th, 2012

## Two main questions in regulatory networks

[Figure: a small network in which the transcription factors Hot1 and Sko1 regulate the target gene HSP12]

- **Structure**: who are the regulators? E.g., Hot1 regulates HSP12; equivalently, HSP12 is a target of Hot1.
- **Function**: how do the regulators determine expression levels? The function ψ(X1, X2) can be Boolean, linear, differential equations, probabilistic, ...

## Graphical models for representing regulatory networks

Two examples: Bayesian networks and dependency networks.

[Figure: regulators X1, X2 (e.g., Msb2, Sho1) with edges into a target Y3 (e.g., Ste20)]

- **Structure**: random variables encode expression levels; edges correspond to some form of statistical dependency between regulators and target.
- **Function**: Y3 = f(X1, X2).

## Bayesian networks: estimate a set of conditional probability distributions

Function: each target (child) Yi is given a conditional probability distribution (CPD) conditioned on its regulators (parents).

## Dependency networks: a set of regression problems

Function: linear regression. For each gene, regress the target Yi on the candidate regulators X1, ..., Xp:

Y_i = Σ_j b_j X_j + regularization term

where Yi and each Xj are d×1 vectors of expression measurements (d samples), b is the p×1 vector of regression coefficients, and the regularization term penalizes the coefficients b_j. This is repeated once per gene, so the number of regression problems equals the number of genes.
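As a sketch of one such regression problem, here is regularized linear regression in plain NumPy. The choice of an L2 (ridge) penalty, the value of the regularization weight `lam`, and the toy data are our own illustrative assumptions, not the slides' specific setup:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: d samples (expression arrays), p candidate regulators.
d, p = 50, 10
X = rng.normal(size=(d, p))                 # expression of candidate regulators
b_true = np.zeros(p)
b_true[:2] = [2.0, -1.5]                    # only two true regulators
y = X @ b_true + 0.1 * rng.normal(size=d)   # target gene expression

# Ridge regression: minimize ||y - Xb||^2 + lam * ||b||^2
# Closed form: b = (X^T X + lam I)^{-1} X^T y
lam = 1.0
b_hat = np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

# The largest-magnitude coefficients point to the inferred regulators.
top = np.argsort(-np.abs(b_hat))[:2]
```

With this data the two largest coefficients recover regulators 0 and 1; sparsity-inducing penalties (e.g., L1) are a common alternative when most candidate regulators should get exactly zero weight.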

## Bayesian networks

A Bayesian network (BN) is a type of probabilistic graphical model (PGM), specified by:

- a graph: a directed acyclic graph (DAG) in which the nodes denote random variables
- parameters (for the probability distributions): each node X has a conditional probability distribution (CPD) representing P(X | Parents(X))

The intuitive meaning of an arc from X to Y is that X directly influences Y. BNs provide a tractable way to work with large joint distributions.

## Bayesian networks for representing regulatory networks

Each target (child) Yi has a conditional probability distribution (CPD) given its regulators (parents).

## Example Bayesian network

[Figure: five variables X1, ..., X5, with parents among X1–X4 and child X5]

Assume each Xi is binary:

- with no independence assertions, estimating the distribution requires 2^5 measurements (one per joint configuration)
- with the network's independence assertions, it requires only 2^3 measurements
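The table-size arithmetic above can be checked directly. The assumption that the reduced CPD involves three binary variables is ours, since the figure's exact parent structure is not recoverable here:

```python
# Entries needed to specify distributions over binary variables.
def full_joint_entries(n):
    """A full joint over n binary variables has one entry per assignment."""
    return 2 ** n

def cpd_entries(num_vars):
    """Configurations needed to estimate a CPD involving num_vars binary
    variables: one per joint setting of those variables."""
    return 2 ** num_vars

full = full_joint_entries(5)   # no independence assertions: 2^5 = 32
local = cpd_entries(3)         # with independence assertions: 2^3 = 8
```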

## Representing CPDs for discrete variables

CPDs can be represented using tables or trees. Consider the following case with Boolean variables A, B, C, D.

P(D | A, B, C) as a table:

| A | B | C | P(D = t) | P(D = f) |
|---|---|---|----------|----------|
| t | t | t | 0.9 | 0.1 |
| t | t | f | 0.9 | 0.1 |
| t | f | t | 0.9 | 0.1 |
| t | f | f | 0.9 | 0.1 |
| f | t | t | 0.8 | 0.2 |
| f | t | f | 0.5 | 0.5 |
| f | f | t | 0.5 | 0.5 |
| f | f | f | 0.5 | 0.5 |

P(D | A, B, C) as a tree:

- A = t: Pr(D = t) = 0.9
- A = f: test B
  - B = f: Pr(D = t) = 0.5
  - B = t: test C
    - C = f: Pr(D = t) = 0.5
    - C = t: Pr(D = t) = 0.8
## Representing CPDs for continuous variables

A common choice is the conditional (linear) Gaussian: the child is normally distributed with a mean that is a linear function of its parents. For parents X1, X2 and child X3:

P(X3 | X1, X2) = N(a0 + a1·X1 + a2·X2, σ²)

The parameters are the coefficients a0, a1, a2 and the variance σ².
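A minimal sketch of evaluating a conditional linear Gaussian CPD; the parent values and coefficients are made-up illustrations:

```python
import math

def conditional_gaussian_mean(parents, a0, coeffs):
    """Mean of a conditional (linear) Gaussian CPD:
    E[X | u_1..u_k] = a0 + sum_i a_i * u_i."""
    return a0 + sum(a * u for a, u in zip(coeffs, parents))

def gaussian_pdf(x, mean, var):
    """Density of N(mean, var) at x."""
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

# Hypothetical parameters for P(X3 | X1, X2) with X1 = 1.0, X2 = 2.0.
mu = conditional_gaussian_mean([1.0, 2.0], a0=0.5, coeffs=[0.3, -0.2])
density = gaussian_pdf(0.4, mu, var=1.0)
```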

## The learning problems

- Parameter learning: given a network structure, learn the parameters from data.
- Structure learning: given data, learn the structure (and parameters). This subsumes parameter learning.

## Structure learning

Structure learning methods have two main components:

1. a scheme for scoring a given BN structure (e.g., a maximum likelihood or Bayesian score)
2. a search procedure for exploring the space of structures

## Structure learning using score-based search

Under the maximum likelihood framework, the best graph is the one that maximizes the (log) likelihood of the data:

G* = argmax_G log P(D | G)

## Decomposability of scores

The score decomposes over variables:

S(G) = Σ_i S(X_i, Pa(X_i))

Thus we can independently optimize each term S(X_i, Pa(X_i)). It all boils down to accurately estimating the conditional probability distributions.

## Structure search operators

Given the current network at some stage of the search, we can add an edge, delete an edge, or reverse an edge, checking that no cycles are introduced.

[Figure: three variants of a four-node network over A, B, C, D, each produced by applying one operator]

## Bayesian network search: hill-climbing

given: data set D, initial network B_0

    i = 0
    B_best ← B_0
    while stopping criteria not met {
        for each possible operator application a {
            B_new ← apply(a, B_i)
            if score(B_new) > score(B_best)
                B_best ← B_new
        }
        ++i
        B_i ← B_best
    }
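The pseudocode above can be fleshed out. This Python sketch swaps the data-based score for a toy score (rewarding edges of a hypothetical "true" network) so it runs standalone, but keeps the add/delete/reverse operators and the cycle check:

```python
import itertools

def has_cycle(nodes, edges):
    """DFS-based cycle check on a directed graph given as a set of edges."""
    adj = {v: [] for v in nodes}
    for u, v in edges:
        adj[u].append(v)
    state = {v: 0 for v in nodes}  # 0 = unvisited, 1 = on stack, 2 = done
    def dfs(v):
        state[v] = 1
        for w in adj[v]:
            if state[w] == 1 or (state[w] == 0 and dfs(w)):
                return True
        state[v] = 2
        return False
    return any(state[v] == 0 and dfs(v) for v in nodes)

def neighbors(nodes, edges):
    """Yield every network reachable by one add / delete / reverse operator,
    skipping any that would introduce a cycle."""
    for u, v in itertools.permutations(nodes, 2):
        e = (u, v)
        if e in edges:
            yield edges - {e}                        # delete edge
            rev = (edges - {e}) | {(v, u)}
            if not has_cycle(nodes, rev):
                yield rev                            # reverse edge
        elif not has_cycle(nodes, edges | {e}):
            yield edges | {e}                        # add edge

def hill_climb(nodes, score, b0=frozenset(), max_iters=100):
    """Greedy search: move to the best-scoring neighbor until none improves."""
    best = frozenset(b0)
    for _ in range(max_iters):
        cand = frozenset(max(neighbors(nodes, best), key=score, default=best))
        if score(cand) <= score(best):
            return best                              # local optimum reached
        best = cand
    return best

# Toy score: reward edges of a hypothetical "true" network, penalize extras.
TRUE = {("A", "B"), ("B", "C")}
score = lambda es: len(es & TRUE) - len(es - TRUE)
learned = hill_climb(["A", "B", "C"], score)
```

With a real data set, `score` would be a decomposable likelihood-based score as described above.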

## Learning networks from expression is difficult due to too few measurements

Two ways to cope:

- Reduce the candidate parents: Sparse Candidate; prior knowledge (e.g., MinREG)
- Reduce the target set: module networks

## Bayesian network search: the Sparse Candidate algorithm

[Friedman et al., UAI 1999]

Given: data set D, initial network B_0, parameter k (the size of each node's candidate-parent set).

To identify candidate parents in the first iteration, we can compute the mutual information between pairs of variables:

I(X; Y) = Σ_{x,y} P̂(x, y) log [ P̂(x, y) / ( P̂(x) P̂(y) ) ]

where P̂ denotes the probabilities estimated from the data set.
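The mutual-information computation can be sketched directly from raw (x, y) samples; the example data below are invented:

```python
import math
from collections import Counter

def mutual_information(pairs):
    """Empirical mutual information I(X;Y) in bits, with all probabilities
    estimated from the observed (x, y) samples."""
    n = len(pairs)
    pxy = Counter(pairs)
    px = Counter(x for x, _ in pairs)
    py = Counter(y for _, y in pairs)
    mi = 0.0
    for (x, y), c in pxy.items():
        p_joint = c / n
        mi += p_joint * math.log2(p_joint / ((px[x] / n) * (py[y] / n)))
    return mi

# Perfectly correlated binary variables carry 1 bit; independent ones, 0 bits.
dependent = [(0, 0), (1, 1)] * 50
independent = [(0, 0), (0, 1), (1, 0), (1, 1)] * 25
```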

## The restrict step in Sparse Candidate

Suppose the true network structure is:

[Figure: a four-node network over A, B, C, D]

We're selecting two candidate parents for A, and I(A; C) > I(A; D) > I(A; B). The candidate parents for A would then be C and D; how could we get B as a candidate parent on the next iteration?

## The restrict step in Sparse Candidate

Kullback-Leibler (KL) divergence provides a distance measure between two distributions, P and Q:

D_KL(P || Q) = Σ_x P(x) log [ P(x) / Q(x) ]

Mutual information can be thought of as the KL divergence between the joint distribution P(X, Y) and the product of marginals P(X) P(Y) (which assumes X and Y are independent):

I(X; Y) = D_KL( P(X, Y) || P(X) P(Y) )
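The identity I(X;Y) = D_KL( P(X,Y) || P(X) P(Y) ) can be verified numerically; the joint distribution here is a made-up example:

```python
import math

def kl_divergence(p, q):
    """D_KL(P || Q) = sum_x P(x) log2( P(x) / Q(x) ), over P's support."""
    return sum(px * math.log2(px / q[x]) for x, px in p.items() if px > 0)

# A joint distribution over (X, Y) and its marginals.
p_xy = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.1, (1, 1): 0.4}
p_x = {0: 0.5, 1: 0.5}
p_y = {0: 0.5, 1: 0.5}

# Product-of-marginals distribution Q(x, y) = P(x) P(y): X, Y independent.
q_xy = {(x, y): p_x[x] * p_y[y] for x in p_x for y in p_y}

# I(X;Y) equals the KL divergence between the joint and the product.
mi = kl_divergence(p_xy, q_xy)
```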

## The restrict step in Sparse Candidate

We can use KL divergence to assess the discrepancy between the network's estimate P_net(X, Y) and the empirical estimate P̂(X, Y).

[Figure: the true distribution (a network over A, B, C, D) alongside the current Bayes net over the same variables]

Two issues arise: how might we calculate P_net(A, B)? And it is important to ensure monotonic improvement of the score from one iteration to the next.

## The restrict step in Sparse Candidate

Notation: Pa(X_i) denotes the current parents of X_i. Keeping Pa(X_i) among the candidates is what ensures monotonic improvement across iterations.

## The maximize step in Sparse Candidate

Hill-climbing search with add-edge, delete-edge, and reverse-edge operators, restricted to the candidate parent sets; test to ensure that cycles aren't introduced into the graph.

## Efficiency of Sparse Candidate

(n = number of variables)

[Table comparing ordinary greedy search, greedy search with at most k parents, and Sparse Candidate on three quantities: possible parent sets for each node, changes scored on the first iteration of search, and changes scored on subsequent iterations; the cell values are not recoverable]