Using Bayesian Networks to Analyze Expression Data

ocelotgiant · Artificial Intelligence and Robotics

7 Nov 2013




Jeong, Jong Cheol

Nir Friedman, Michal Linial, Iftach Nachman & Dana Pe’er

Dept. Electrical Engineering & Computer Science

University of Kansas


Bayesian Networks


Representation of a joint probability distribution

A directed acyclic graph (DAG) G:
- Nodes: random variables $X_1,\dots,X_n$
- Each node: a conditional distribution of the variable given its parents

Conditional independence assumption:
- Each variable is independent of its non-descendants, given its parents in G

Conditional independence assumption


Under this assumption, the joint distribution decomposes into product form:

$P(X_1,\dots,X_n) = \prod_{i=1}^{n} P(X_i \mid \mathrm{Pa}^G(X_i))$

where $\mathrm{Pa}^G(X_i)$ is the set of parents of $X_i$ in $G$.
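As a small sketch of the product form, the Python below evaluates $P(X_1,\dots,X_n)$ for binary variables as a product of one local term per variable. The dictionary representation of parents and tables (`parents`, `cpt`) and the numbers in the two-node example are hypothetical, chosen only for illustration.

```python
from itertools import product

def joint_probability(assignment, parents, cpt):
    """P(x_1, ..., x_n) = prod_i P(x_i | pa(x_i)) for binary variables.

    parents: dict var -> tuple of parent names (hypothetical representation)
    cpt:     dict var -> dict mapping parent-value tuple -> P(var = 1 | parents)
    """
    p = 1.0
    for var, pa in parents.items():
        p1 = cpt[var][tuple(assignment[u] for u in pa)]
        p *= p1 if assignment[var] == 1 else 1.0 - p1
    return p

# Two-node example X -> Y with made-up numbers
parents = {"X": (), "Y": ("X",)}
cpt = {"X": {(): 0.3}, "Y": {(0,): 0.1, (1,): 0.8}}

# Summing the factorized joint over all assignments gives 1
total = sum(
    joint_probability(dict(zip(["X", "Y"], vals)), parents, cpt)
    for vals in product([0, 1], repeat=2)
)
```

Only one table entry per variable is touched for each assignment, which is why the factorized form needs far fewer parameters than a full joint table.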

Bayesian Networks

Conditional independence (example network: A → B ← E, B → C, A → D)

I(A; E), I(B; D | A, E),

I(C; A, D, E | B)

I(D; B, C, E | A), I(E; A, D)

P(A,B,C,D,E) = P(A)∙P(B|A,E) ∙P(C|B) ∙P(D|A) ∙P(E)
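To make the example concrete, one can pick made-up conditional probabilities for this network, build the full joint from the factorization, and check two of the listed independencies numerically. The numbers in `CPT` are hypothetical; any values would satisfy the same independencies, because they follow from the graph structure alone.

```python
from itertools import product

VARS = ["A", "B", "C", "D", "E"]
PARENTS = {"A": (), "B": ("A", "E"), "C": ("B",), "D": ("A",), "E": ()}
# Hypothetical P(var = 1 | parent values); numbers are made up for illustration
CPT = {
    "A": {(): 0.4},
    "E": {(): 0.7},
    "B": {(0, 0): 0.1, (0, 1): 0.5, (1, 0): 0.6, (1, 1): 0.9},
    "C": {(0,): 0.2, (1,): 0.75},
    "D": {(0,): 0.3, (1,): 0.85},
}
IDX = {v: i for i, v in enumerate(VARS)}

def joint(a):
    """P(A)P(B|A,E)P(C|B)P(D|A)P(E) for one full assignment a."""
    p = 1.0
    for v in VARS:
        p1 = CPT[v][tuple(a[u] for u in PARENTS[v])]
        p *= p1 if a[v] == 1 else 1 - p1
    return p

TABLE = {vals: joint(dict(zip(VARS, vals))) for vals in product([0, 1], repeat=5)}

def prob(**fixed):
    """Marginal probability of a partial assignment, e.g. prob(A=1, E=0)."""
    return sum(p for vals, p in TABLE.items()
               if all(vals[IDX[k]] == x for k, x in fixed.items()))

# I(A; E): both numbers are ≈ 0.28
print(prob(A=1, E=1), prob(A=1) * prob(E=1))
# I(D; B, C, E | A): both numbers are ≈ 0.85
print(prob(D=1, A=1, B=0, C=1, E=1) / prob(A=1, B=0, C=1, E=1),
      prob(D=1, A=1) / prob(A=1))
```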

Specifying conditional distributions

Discrete variables:
- Use a table that specifies the probability of each value of $X$ for every joint assignment of its parents, i.e. $P(X \mid U_1,\dots,U_k)$, where $\{U_1,\dots,U_k\}$ are the parents of $X$
- If all variables are binary, the table specifies $2^k$ distributions

Continuous variables:
- Use a linear Gaussian distribution:
$P(X \mid u_1,\dots,u_k) \sim N\left(a_0 + \sum_i a_i \cdot u_i,\ \sigma^2\right)$
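A short sketch of both parameterizations, assuming binary parents for the table case. `cpt_rows` and `linear_gaussian_logpdf` are illustrative names, not part of any library; the second function evaluates the log-density of $N(a_0 + \sum_i a_i u_i, \sigma^2)$ at a value `x`.

```python
import math
from itertools import product

def cpt_rows(k):
    """Parent assignments of a table with k binary parents: one distribution per row, 2**k rows."""
    return list(product([0, 1], repeat=k))

def linear_gaussian_logpdf(x, u, a0, a, sigma):
    """log N(x; a0 + sum_i a_i * u_i, sigma^2) for parent values u."""
    mean = a0 + sum(ai * ui for ai, ui in zip(a, u))
    return -0.5 * math.log(2 * math.pi * sigma ** 2) - (x - mean) ** 2 / (2 * sigma ** 2)
```

The contrast is in parameter count: the table grows exponentially in $k$, while the linear Gaussian needs only $k + 2$ numbers ($a_0,\dots,a_k$ and $\sigma$).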


Equivalence Classes of Bayesian Networks


Two directed acyclic graphs are equivalent if and only if they have the same underlying undirected graph and the same v-structures (Pearl & Verma 1991).
- v-structure: two directed edges converging into the same node, $a \rightarrow b \leftarrow c$, with no edge between $a$ and $c$

Ind(G): the set of independence statements that hold in G
- More than one graph can imply exactly the same set of independencies, e.g. $\mathrm{Ind}(G) = \mathrm{Ind}(G')$ where
$G: X \rightarrow Y$ and $G': X \leftarrow Y$

How can we distinguish between equivalent graphs?
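The Pearl & Verma criterion can be checked mechanically. A minimal sketch, assuming a DAG is given as a list of `(parent, child)` edge tuples with string node names (a hypothetical representation):

```python
def skeleton(edges):
    """Underlying undirected graph: each edge as an unordered pair."""
    return {frozenset(e) for e in edges}

def v_structures(edges):
    """Triples (a, b, c) with a -> b <- c and no edge between a and c (a < c avoids duplicates)."""
    parents = {}
    for a, b in edges:
        parents.setdefault(b, set()).add(a)
    skel = skeleton(edges)
    found = set()
    for b, pa in parents.items():
        for a in pa:
            for c in pa:
                if a < c and frozenset((a, c)) not in skel:
                    found.add((a, b, c))
    return found

def equivalent(g1, g2):
    """Same skeleton and same v-structures (Pearl & Verma 1991)."""
    return skeleton(g1) == skeleton(g2) and v_structures(g1) == v_structures(g2)
```

With this check, `X -> Y` and `Y -> X` come out equivalent, while the v-structure `A -> B <- C` is not equivalent to the chain `A -> B -> C`, even though both share the same skeleton.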
Learning Bayesian Networks


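Training set $D = \{x_1,\dots,x_N\}$:
- Find a network $B = \langle G, \Theta \rangle$ that best matches $D$, where $\Theta$ parameterizes the conditional distributions $P(X_i \mid \mathrm{Pa}^G(X_i))$

Score function:
- Evaluate the posterior probability of a graph given the data:
$S(G : D) = \log P(G \mid D) = \log P(D \mid G) + \log P(G) + C$
where $C$ is a constant independent of $G$, and
$P(D \mid G) = \int P(D \mid G, \Theta)\, P(\Theta \mid G)\, d\Theta$
is the marginal likelihood.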
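For intuition about the marginal likelihood, consider the simplest possible case (an assumption for illustration, not the model used in the paper): a single binary variable with parameter $\theta = P(X = 1)$ and a $\mathrm{Beta}(\alpha, \beta)$ prior. The integral then has a closed form, which the sketch below compares against direct numerical integration with the midpoint rule.

```python
import math

def log_beta(a, b):
    """log of the Beta function B(a, b)."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)

def log_marginal_closed_form(n1, n0, alpha=1.0, beta=1.0):
    """log of the integral of theta^n1 (1 - theta)^n0 Beta(theta; alpha, beta) dtheta."""
    return log_beta(alpha + n1, beta + n0) - log_beta(alpha, beta)

def marginal_numeric(n1, n0, alpha=1.0, beta=1.0, steps=100_000):
    """Midpoint-rule approximation of the same integral over theta in (0, 1)."""
    norm = math.exp(-log_beta(alpha, beta))  # 1 / B(alpha, beta)
    h = 1.0 / steps
    total = 0.0
    for s in range(steps):
        t = (s + 0.5) * h
        prior = norm * t ** (alpha - 1) * (1 - t) ** (beta - 1)
        total += t ** n1 * (1 - t) ** n0 * prior * h  # P(D | theta) P(theta) dtheta
    return total

# Data with 3 ones and 2 zeros under a uniform prior: both give 1/60
closed = math.exp(log_marginal_closed_form(3, 2))
numeric = marginal_numeric(3, 2)
```

The closed form depends on the data only through the counts, which is what makes per-variable closed-form score contributions possible.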

Properties of priors

Structure equivalent: if graphs G and G' are equivalent, they are guaranteed to have the same posterior score.

Decomposable: the contribution of a variable $X_i$ to the total network score depends only on its own value and the values of its parents in G:
$S(G : D) = \sum_i \mathrm{ScoreContribution}(X_i, \mathrm{Pa}^G(X_i) : D)$

Local contributions for each variable can be computed using a closed-form equation.
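A decomposable, structure-equivalent score can be sketched with the BDeu prior, one standard choice for discrete data (swapped in here for illustration; the slides do not name a specific prior). The total score is a sum of local terms, each computed in closed form from counts, and the equivalent graphs $X \rightarrow Y$ and $Y \rightarrow X$ receive the same score. Function names, the dict-per-row data layout, and the equivalent sample size `ess` are assumptions of this sketch.

```python
import math
from collections import Counter

def bdeu_local(data, child, parents, card, ess=1.0):
    """ScoreContribution(X_i, Pa(X_i) : D) under a BDeu prior with equivalent sample size ess."""
    r = card[child]
    q = math.prod(card[p] for p in parents)  # number of parent configurations (1 if none)
    a_ij, a_ijk = ess / q, ess / (q * r)
    n_ijk = Counter((tuple(row[p] for p in parents), row[child]) for row in data)
    n_ij = Counter(tuple(row[p] for p in parents) for row in data)
    s = 0.0
    for cfg, n in n_ij.items():  # unseen parent configurations contribute 0
        s += math.lgamma(a_ij) - math.lgamma(a_ij + n)
        for k in range(r):
            c = n_ijk[(cfg, k)]
            s += math.lgamma(a_ijk + c) - math.lgamma(a_ijk)
    return s

def network_score(data, graph, card, ess=1.0):
    """Decomposable score: sum of local contributions; graph maps child -> list of parents."""
    return sum(bdeu_local(data, x, pa, card, ess) for x, pa in graph.items())

pairs = [(0, 0), (0, 0), (0, 1), (1, 1), (1, 1), (1, 0), (1, 1)]
data = [{"X": x, "Y": y} for x, y in pairs]
card = {"X": 2, "Y": 2}
s_xy = network_score(data, {"X": [], "Y": ["X"]}, card)
s_yx = network_score(data, {"Y": [], "X": ["Y"]}, card)  # equal to s_xy
```

Because the score decomposes, a local search that changes one edge only needs to recompute the term of the variable whose parent set changed.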
Learning Causal Patterns


Bayesian network: a model of dependencies between multiple measurements.

Causal network: a model with a stricter interpretation of the meaning of edges (i.e., the parents of a variable are its immediate causes).

$X \rightarrow Y$ and $X \leftarrow Y$ are equivalent as Bayesian networks, but not as causal networks.
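The difference shows up under interventions. The sketch below uses one made-up joint distribution over binary $X$ and $Y$ that both $X \rightarrow Y$ and $Y \rightarrow X$ can represent exactly, and computes $P(Y = 1 \mid do(X = 1))$ under each causal reading. The `do` semantics used here (conditioning when $X$ causes $Y$, no effect when $Y$ causes $X$) is the standard one for a two-variable causal graph with no hidden confounders, stated as an assumption.

```python
# One joint distribution P(X, Y) over binary variables (made-up numbers)
P = {(0, 0): 0.3, (0, 1): 0.2, (1, 0): 0.1, (1, 1): 0.4}

def p_x(x):
    return sum(p for (xv, _), p in P.items() if xv == x)

def p_y(y):
    return sum(p for (_, yv), p in P.items() if yv == y)

def p_y_given_x(y, x):
    return P[(x, y)] / p_x(x)

def p_y_do_x(y, x, model):
    """P(Y = y | do(X = x)) for two hypothetical causal models of the same P."""
    if model == "X->Y":  # X causes Y: forcing X acts like conditioning on X
        return p_y_given_x(y, x)
    if model == "Y->X":  # Y causes X: forcing X leaves Y's marginal unchanged
        return p_y(y)
    raise ValueError(f"unknown model: {model}")

print(p_y_do_x(1, 1, "X->Y"), p_y_do_x(1, 1, "Y->X"))  # ≈ 0.8 vs ≈ 0.6
```

Observational data alone cannot separate the two models, since both fit `P` perfectly; they disagree only about what happens when $X$ is set externally.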
Using Bayesian Networks to Analyze Expression Data
(Nir Friedman et al.)


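Goal
- Build Bayesian networks that can model interactions among genes
- Examine two kinds of features:
1. Markov relations: is $Y$ in the Markov blanket of $X$?
2. Order relations: is $X$ an ancestor of $Y$ in all networks of a given equivalence class?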
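Markov relations are read off the Markov blanket of a variable: its parents, its children, and the other parents of its children. A minimal sketch on the five-variable example network ($A \rightarrow B \leftarrow E$, $B \rightarrow C$, $A \rightarrow D$), with edges as hypothetical `(parent, child)` tuples:

```python
EDGES = [("A", "B"), ("E", "B"), ("B", "C"), ("A", "D")]

def markov_blanket(x, edges):
    """Parents, children, and the children's other parents of x in a DAG."""
    parents = {a for a, b in edges if b == x}
    children = {b for a, b in edges if a == x}
    spouses = {a for a, b in edges if b in children and a != x}
    return parents | children | spouses

def markov_related(x, y, edges):
    """Markov relation feature: y lies in the Markov blanket of x."""
    return y in markov_blanket(x, edges)
```

For example, `markov_blanket("A", EDGES)` is `{"B", "D", "E"}`: $E$ enters only as a co-parent of $B$, even though $A$ and $E$ are marginally independent.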

Estimating Statistical Confidence in Features

Using the bootstrap method:

for $i = 1,\dots,m$:
- $D_i$: sample $N$ instances from $D$ with replacement
- Apply the learning procedure on $D_i$ to induce a network structure $G_i$

For each feature $f$, calculate

$\mathrm{conf}(f) = \frac{1}{m} \sum_{i=1}^{m} f(G_i)$

where $f(G) = 1$ if $f$ is a feature in $G$, and $0$ otherwise.
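A minimal sketch of the bootstrap loop, assuming a hypothetical `toy_learner` that stands in for the real structure-learning procedure (it simply links variables whose absolute Pearson correlation exceeds a threshold). Only the resampling and the $\mathrm{conf}(f)$ average follow the slide.

```python
import random

def pearson(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return 0.0 if vx == 0 or vy == 0 else cov / (vx * vy) ** 0.5

def toy_learner(sample, names, threshold=0.3):
    """Hypothetical stand-in for structure learning: undirected edges with |r| > threshold."""
    cols = list(zip(*sample))
    return {
        (names[a], names[b])
        for a in range(len(names))
        for b in range(a + 1, len(names))
        if abs(pearson(cols[a], cols[b])) > threshold
    }

def bootstrap_confidence(data, names, learner, m=200, seed=0):
    """conf(f) = (1/m) * sum_i f(G_i), with G_i learned from a resample D_i of D."""
    rng = random.Random(seed)
    n = len(data)
    counts = {}
    for _ in range(m):
        d_i = [data[rng.randrange(n)] for _ in range(n)]  # N draws with replacement
        for f in learner(d_i, names):
            counts[f] = counts.get(f, 0) + 1
    return {f: c / m for f, c in counts.items()}  # features never learned have conf 0

# Y copies X; Z is unrelated to both
data = [(i % 2, i % 2, (i // 2) % 2) for i in range(100)]
conf = bootstrap_confidence(data, ["X", "Y", "Z"], toy_learner)
```

A robust dependency such as X-Y is recovered in essentially every resample, while spurious edges involving Z appear only occasionally, so their confidence stays low.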




Sparse Candidate algorithm

Repeat
- Choose the most promising candidate parents for each $X_i$: $C_i^n = \{Y_1,\dots,Y_k\}$
- Search for a high-scoring network $G_n$ in which $\mathrm{Pa}^{G_n}(X_i) \subseteq C_i^n$
- If $\mathrm{Score}(G_n) < \mathrm{Score}(G_{n-1})$: $G_n \leftarrow G_{n-1}$

Until $C_i^n$ no longer changes

Measuring the relevance of a potential parent $X_j$ to $X_i$:

$\mathrm{ScoreContribution}(X_i, \mathrm{Pa}^{G_{n-1}}(X_i) \cup \{X_j\} : D) - \mathrm{ScoreContribution}(X_i, \mathrm{Pa}^{G_{n-1}}(X_i) : D)$
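The candidate-selection step can be sketched as follows. As assumptions of this sketch: the local score is BIC on discrete data (a stand-in for the Bayesian ScoreContribution), the current parents are always kept in $C_i^n$, and the remaining slots go to the variables with the largest relevance as defined above. All function names are hypothetical.

```python
import math
from collections import Counter

def bic_local(data, child, parents, card):
    """BIC stand-in for ScoreContribution(X_i, parents : D) on discrete data."""
    n = len(data)
    joint = Counter((tuple(r[p] for p in parents), r[child]) for r in data)
    marg = Counter(tuple(r[p] for p in parents) for r in data)
    loglik = sum(c * math.log(c / marg[cfg]) for (cfg, _), c in joint.items())
    q = math.prod(card[p] for p in parents)
    penalty = 0.5 * math.log(n) * q * (card[child] - 1)
    return loglik - penalty

def select_candidates(data, child, current_parents, card, k, local_score=bic_local):
    """Build C_i^n: keep the current parents, then add the most relevant other variables."""
    base = local_score(data, child, current_parents, card)
    relevance = {
        xj: local_score(data, child, list(current_parents) + [xj], card) - base
        for xj in card
        if xj != child and xj not in current_parents
    }
    ranked = sorted(relevance, key=relevance.get, reverse=True)
    chosen = list(current_parents)
    chosen += ranked[: max(0, k - len(chosen))]
    return chosen

# Y copies X; Z is unrelated
data = [{"X": i % 2, "Y": i % 2, "Z": (i // 2) % 2} for i in range(40)]
card = {"X": 2, "Y": 2, "Z": 2}
```

Here `select_candidates(data, "Y", [], card, k=1)` returns `["X"]`: adding $X$ raises the local score of $Y$ sharply, while adding $Z$ only adds penalty. Keeping the current parents in the candidate set means the restricted search can always return the previous network, so the score cannot decrease between iterations.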




Application to Cell Cycle Expression Patterns

Data set: S. cerevisiae ORFs
- 76 gene expression measurements of the mRNA levels of 6177 ORFs

Sparse Candidate algorithm with a 200-fold bootstrap

Experiments with two models of the conditional distributions:
- the discrete multinomial distribution
- the linear Gaussian distribution

Markov features


Local map for the gene SVS1 (figure not reproduced; panels: Multinomial, Linear Gaussian)


References


Friedman, N., Linial, M., Nachman, I., & Pe'er, D. (2000). Using Bayesian networks to analyze expression data. Journal of Computational Biology, 7(3-4), 601-620.