Using Bayesian Networks to

tripastroturfΤεχνίτη Νοημοσύνη και Ρομποτική

7 Νοε 2013 (πριν από 3 χρόνια και 11 μήνες)

69 εμφανίσεις

Using Bayesian Networks to
Analyze Expression Data

N
. Friedman, M.
Linial
, I.
Nachman
,
D.
Pe’er

@ Hebrew University

What I will cover


Domain background


Overview of their work


Causal networks vs.
Bayes

networks


Application


Results


BACKGROUND INFORMATION


What are gene expressions?


It is the process in which information is used in
the synthesis of a functional gene product (protein
or
Rna
).


Think of it as a menu for a dinner given a
certain holiday.


Need certain ingredients / food to pull it off right.


Too much or too little of something can lead to
odd results.



Advancement in technology lead to DNA
Microarrays.


Snapshot of internals of a cell at a given moment in
time.


No more having to look at one gene at a time for
comparison.


Most computational analysis has focused on
clustering algorithms.


Cluster like genes with like genes.


Useful for finding co
-
regulated genes but not really for
finding the structure of the regulation process.

OVERVIEW

Overview


How to discover key relations in cellular
systems given large amounts of micro
array
data.


Propose a Bayesian Network framework for
gene interaction discovery from micro array
data.


Trying to build statistical dependencies.


Understand interactions from multiple expression
measurements.


Overview


Want to uncover properties of the network by
examining the dependence and conditional
dependence of the gene data.


How does one gene interact with another etc.


Can use this information to determine causal
influence.

BAYES

NETS

Bayesian Network


Bayesian Network


Useful for a few reasons


Great for describing locally interacting entities.


Well understood array of algorithms and
successful use in many areas.


Can be used to infer a causal network even though
they are not mathematically defined as such.


Able to handle noise fairly well.

Causal Network


Very similar to a typical Bayesian net.


Bayesian network with a strict requirement
that the relationships are causal.


X causes something about Y.


Learning multiple networks with the same
directed path could mean there is a causal
indication between X and Y.


Bayes

vs

Causal


Bayesian Network generally deals with
dependence.


Causal Networks deal with strict relationships.


Bayesian Network can have equivalent
networks.


X


Y is equivalent to Y


X


Causal Network


The above cannot hold due to the definition of
Causal networks.

Learning Causal Patterns


Need to determine a causal interpretation of
the network.


Observation


Passive domain measurement.


Intervention


Setting variable values using outside forces.

Causal Markov assumption


Given the values of a variables immediate
causes, it is independent of its earlier causes.


Once we know the makeup of the genes parents,
we don’t care about the ancestors anymore in
terms of the current gene.

Analyzing Expression Data



Consider distributions over all possible states (
can include environmental states etc)


State of the system is a series of random
variables.


Each random variable denotes expression level of
each gene.


Take all of these variables and build the joint
distribution.


Difficult to learn from expression data due to
involving transcript levels from thousands of
genes!


However these gene networks are sparse so
Bayes

Nets are still well suited.

Learning the model


Markov relations are a feature that indicates if
two genes are related in a joint biological
process.


Order relations are a feature that captures a
global property about the network.


Used as an indication of some causality between X
and Y. Its not certain though.

Confidence of features


Produce m different networks and for each
feature of interest calculate its confidence.





Where f(G) is 1 if f is a feature of G, 0
otherwise.





m
i
i
G
f
m
f
conf
1
)
(
1
)
(
Learning the network structure


Issues


Extremely large search space (super
-
exponential
in the number of variables)


Need to id potential parents for each gene
using simple statistics to build the network.


Reduces search space to networks that only
contain the candidate parents as parents of some
variables X
i

.

Different local probability models


Multinomial Model


Treat each variable as discrete and learn
multinomial distribution to describe the possible
state of each child given the stat of the parents.


Linear Gaussian Model


Linear regression model for the child given its
parents.

Results


Applied Cell Cycle Expression patterns.


76 gene expression measurements.


Treat each measurement as an independent
sample.


Performed the boot strapping algorithm along
with the sparse search algorithm to extract
learned features.


Performed on only 250 genes

Test robustness


Tested their confidence assessment by using a
randomly created data set. Random
permutation of the order of experiments per
gene.


Found that random data did not perform well due
to not finding real features that correspond in the
data.


Tells us that the learned features are not artifacts
of the boot strapping estimation.


Managed to extract plausible biological
knowledge without use of priors.


Framework builds a much “richer” structure
from the data compared to clustering
techniques.


Capable of discovering causal relationships
between genes from expression data.