Using Bayesian Networks to Analyze Expression Data
N. Friedman, M. Linial, I. Nachman, D. Pe’er @ Hebrew University
What I will cover
• Domain background
• Overview of their work
• Causal networks vs. Bayes networks
• Application
• Results
BACKGROUND INFORMATION
• What is gene expression?
  – The process by which information from a gene is used in the synthesis of a functional gene product (protein or RNA).
• Think of it as the menu for a holiday dinner.
  – You need certain ingredients to pull it off right.
  – Too much or too little of something can lead to odd results.
• Advances in technology led to DNA microarrays.
  – A snapshot of the internal state of a cell at a given moment in time.
  – No longer necessary to examine one gene at a time for comparison.
• Most computational analysis has focused on clustering algorithms.
  – Group genes with similar expression profiles together.
  – Useful for finding co-regulated genes, but not for finding the structure of the regulation process.
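To make this limitation concrete, here is a minimal Python sketch (NumPy assumed; the toy data and the 0.8 threshold are hypothetical) of correlation-based grouping. It places a regulator and its target in the same cluster, but because correlation is symmetric it says nothing about which gene drives which.

```python
import numpy as np

# Hypothetical toy data: 3 genes measured in 50 experiments (rows = genes).
rng = np.random.default_rng(0)
a = rng.normal(size=50)                  # gene A
b = 2.0 * a + 0.1 * rng.normal(size=50)  # gene B, driven by A
c = rng.normal(size=50)                  # gene C, unrelated to A and B
expr = np.vstack([a, b, c])

corr = np.corrcoef(expr)  # 3x3 Pearson correlations between gene profiles

def threshold_clusters(corr, threshold=0.8):
    """Single-linkage grouping: genes are linked if |correlation| > threshold."""
    labels = list(range(corr.shape[0]))
    for i in range(len(labels)):
        for j in range(i + 1, len(labels)):
            if abs(corr[i, j]) > threshold:
                # Merge j's cluster into i's cluster.
                old, new = labels[j], labels[i]
                labels = [new if lab == old else lab for lab in labels]
    return labels

print(threshold_clusters(corr))            # A and B share a label; C is alone
print(np.isclose(corr[0, 1], corr[1, 0]))  # True: correlation has no direction
```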
OVERVIEW
Overview
• Goal: discover key relationships in cellular systems from large amounts of microarray data.
• They propose a Bayesian network framework for discovering gene interactions from microarray data.
  – Model the statistical dependencies between genes.
  – Understand interactions from multiple expression measurements.
Overview
• Want to uncover properties of the network by examining dependence and conditional independence in the gene data.
  – E.g., how one gene interacts with another.
  – This information can suggest causal influence.
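A minimal sketch of the difference between dependence and conditional independence, assuming a hypothetical linear-Gaussian chain X → Y → Z (NumPy assumed). Under Gaussian assumptions a partial correlation of zero corresponds to conditional independence; this is an illustration only, not the procedure used in the paper.

```python
import numpy as np

# Hypothetical chain X -> Y -> Z with linear-Gaussian links (unit variances).
rng = np.random.default_rng(1)
n = 10_000
x = rng.normal(size=n)
y = 0.8 * x + rng.normal(scale=0.6, size=n)
z = 0.8 * y + rng.normal(scale=0.6, size=n)

def partial_corr(a, b, given):
    """Correlation of a and b after regressing `given` (plus intercept) out of each."""
    design = np.column_stack([np.ones_like(given), given])
    res_a = a - design @ np.linalg.lstsq(design, a, rcond=None)[0]
    res_b = b - design @ np.linalg.lstsq(design, b, rcond=None)[0]
    return np.corrcoef(res_a, res_b)[0, 1]

marginal = np.corrcoef(x, z)[0, 1]   # clearly nonzero: X and Z are dependent
conditional = partial_corr(x, z, y)  # near zero: X and Z independent given Y
print(f"corr(X,Z) = {marginal:.3f}, partial corr(X,Z | Y) = {conditional:.3f}")
```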
BAYES NETS
Bayesian Network
• Useful for several reasons:
  – Well suited to describing processes made of locally interacting components.
  – A well-understood set of algorithms, used successfully in many areas.
  – Can be used to infer causal structure, even though Bayesian networks are not mathematically defined as causal models.
  – Handle noisy data fairly well.
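As a sketch of the "locally interacting" idea: a Bayesian network stores one local conditional distribution per gene, and the joint probability is their product, P(X_1, ..., X_n) = ∏_i P(X_i | Pa(X_i)). The three-gene network and its probabilities below are hypothetical.

```python
from itertools import product

# Hypothetical 3-gene network A -> B, A -> C with binary states (0 = low, 1 = high).
parents = {"A": (), "B": ("A",), "C": ("A",)}
# cpt[gene][parent_values] = P(gene = 1 | parents = parent_values)
cpt = {
    "A": {(): 0.3},
    "B": {(0,): 0.1, (1,): 0.9},
    "C": {(0,): 0.2, (1,): 0.7},
}

def joint_prob(assignment):
    """P(assignment) as the product of each gene's local conditional probability."""
    p = 1.0
    for gene, pa in parents.items():
        pa_vals = tuple(assignment[q] for q in pa)
        p_high = cpt[gene][pa_vals]
        p *= p_high if assignment[gene] == 1 else 1.0 - p_high
    return p

genes = list(parents)
total = sum(joint_prob(dict(zip(genes, vals))) for vals in product((0, 1), repeat=3))
print(joint_prob({"A": 1, "B": 1, "C": 0}))  # 0.3 * 0.9 * 0.3 = 0.081
print(total)  # sums to 1 (up to floating-point rounding)
```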
Causal Network
• Very similar to a typical Bayesian net.
• A Bayesian network with the strict requirement that the relationships are causal.
  – An edge X → Y means X has a direct causal influence on Y.
• If many learned networks share the same directed path from X to Y, this may indicate a causal relationship between X and Y.
Bayes vs. Causal
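• Bayesian networks generally deal with (conditional) dependence.
• Causal networks deal with strict, directed cause-and-effect relationships.
• Different Bayesian networks can be equivalent (represent the same distributions).
  – X → Y is equivalent to Y → X.
• Causal network
  – This equivalence cannot hold, since by definition X → Y (X causes Y) and Y → X (Y causes X) are different claims.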
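The equivalence can be checked directly: any joint distribution over two variables can be factored either as P(X)P(Y | X) or as P(Y)P(X | Y), so observational data alone cannot distinguish X → Y from Y → X. The joint table below is hypothetical.

```python
# Hypothetical joint distribution over two binary genes X and Y: (x, y) -> P(x, y).
joint = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.2, (1, 1): 0.3}

def marginal(joint, axis):
    """Marginal distribution of the variable at position `axis` (0 = X, 1 = Y)."""
    out = {0: 0.0, 1: 0.0}
    for key, p in joint.items():
        out[key[axis]] += p
    return out

px, py = marginal(joint, 0), marginal(joint, 1)

# Network X -> Y: P(x, y) = P(x) * P(y | x)
p_y_given_x = {(x, y): joint[(x, y)] / px[x] for (x, y) in joint}
fwd = {(x, y): px[x] * p_y_given_x[(x, y)] for (x, y) in joint}

# Network Y -> X: P(x, y) = P(y) * P(x | y)
p_x_given_y = {(x, y): joint[(x, y)] / py[y] for (x, y) in joint}
bwd = {(x, y): py[y] * p_x_given_y[(x, y)] for (x, y) in joint}

# Both factorizations reproduce the same joint, so the data cannot tell them apart.
print(all(abs(fwd[k] - joint[k]) < 1e-12 and abs(bwd[k] - joint[k]) < 1e-12 for k in joint))
```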
Learning Causal Patterns
• Need to determine a causal interpretation of the network.
• Observation
  – Passive measurement of the domain.
• Intervention
  – Setting variable values through outside forces.
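The difference between observing and intervening can be sketched on a hypothetical causal network X → Y. Observing Y = 1 changes our belief about X through Bayes' rule; forcing Y = 1 from outside cuts the edge into Y, so X keeps its prior. The probabilities are made up for illustration.

```python
# Hypothetical causal network X -> Y over binary genes.
p_x1 = 0.3                       # P(X = 1)
p_y1_given_x = {0: 0.2, 1: 0.9}  # P(Y = 1 | X = x)

def p_x1_observe_y1():
    """Observation: condition on seeing Y = 1 (Bayes' rule)."""
    num = p_x1 * p_y1_given_x[1]
    den = num + (1 - p_x1) * p_y1_given_x[0]
    return num / den

def p_x1_do_y1():
    """Intervention: forcing Y = 1 cuts the edge X -> Y, so X keeps its prior."""
    return p_x1

print(round(p_x1_observe_y1(), 3))  # about 0.659 (= 0.27 / 0.41)
print(p_x1_do_y1())                 # 0.3
```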
Causal Markov assumption
• Given the values of a variable’s immediate causes, the variable is independent of its earlier causes.
  – Once we know the state of a gene’s parents, its more distant ancestors tell us nothing more about that gene.
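A small numeric check of this assumption on a hypothetical chain A → B → C: computing P(C | B, A) from the full joint gives the same value as P(C | B), so the grandparent A adds no information once the parent B is known. All probabilities are illustrative.

```python
from itertools import product

# Hypothetical chain A -> B -> C over binary genes; P(child = 1 | parent).
p_a1 = 0.4
p_b1 = {0: 0.1, 1: 0.8}  # P(B = 1 | A)
p_c1 = {0: 0.3, 1: 0.6}  # P(C = 1 | B)

def bern(p1, v):
    """Probability of value v for a binary variable with P(v = 1) = p1."""
    return p1 if v == 1 else 1 - p1

joint = {(a, b, c): bern(p_a1, a) * bern(p_b1[a], b) * bern(p_c1[b], c)
         for a, b, c in product((0, 1), repeat=3)}

def cond_c1(b, a=None):
    """P(C = 1 | B = b), or P(C = 1 | B = b, A = a) if `a` is given."""
    keep = [k for k in joint if k[1] == b and (a is None or k[0] == a)]
    den = sum(joint[k] for k in keep)
    num = sum(joint[k] for k in keep if k[2] == 1)
    return num / den

# Knowing the grandparent A adds nothing once the parent B is known.
for b in (0, 1):
    print(b, cond_c1(b), cond_c1(b, a=0), cond_c1(b, a=1))
```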
Analyzing Expression Data
• Consider probability distributions over all possible states of the system (these can include environmental conditions, etc.).
• The state of the system is described by a set of random variables.
  – Each random variable denotes the expression level of one gene.
• Learn the joint distribution over all of these variables.
• This is difficult to learn from expression data, because it involves transcript levels of thousands of genes!
• However, gene networks are sparse (each gene is directly influenced by only a few others), so Bayes nets are still well suited.
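A rough count shows why sparsity matters. The sketch below compares the free parameters of an unrestricted joint table with an upper bound for a network in which each gene has at most k parents. Binary gene states and k = 3 are assumptions for illustration; 250 matches the number of genes in the results.

```python
def full_joint_params(n_genes, n_states=2):
    """Free parameters of an unrestricted joint table over n_genes discrete variables."""
    return n_states ** n_genes - 1

def sparse_bn_params(n_genes, max_parents, n_states=2):
    """Upper bound on free parameters when every gene has at most max_parents parents."""
    # Each gene needs (n_states - 1) numbers per configuration of its parents.
    return n_genes * (n_states - 1) * n_states ** max_parents

print(full_joint_params(250))                # a 76-digit integer, about 1.8e75
print(sparse_bn_params(250, max_parents=3))  # 250 * 1 * 8 = 2000
```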
Learning the model
• Markov relations: a feature indicating whether two genes are related in a joint biological process (Y is in the Markov blanket of X).
• Order relations: a feature capturing a global property of the network (X is an ancestor of Y).
  – Used as an indication of some causal influence of X on Y, though it is not certain.
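A minimal sketch of both features on a single hypothetical DAG, stored as a child → parents mapping. The paper’s order relation is defined over an equivalence class of networks; checking one DAG, as here, is a simplification.

```python
# Hypothetical learned DAG, stored as child -> list of parents.
parents = {"A": [], "B": ["A"], "C": ["A", "D"], "D": [], "E": ["C"]}

def children(dag, x):
    return [c for c, ps in dag.items() if x in ps]

def markov_blanket(dag, x):
    """Parents, children, and the children's other parents of x."""
    mb = set(dag[x]) | set(children(dag, x))
    for c in children(dag, x):
        mb |= set(dag[c])
    mb.discard(x)
    return mb

def is_ancestor(dag, x, y):
    """True if there is a directed path x -> ... -> y (search upward from y)."""
    stack, seen = list(dag[y]), set()
    while stack:
        p = stack.pop()
        if p == x:
            return True
        if p not in seen:
            seen.add(p)
            stack.extend(dag[p])
    return False

print(markov_blanket(parents, "A"))    # contains B, C and D
print(is_ancestor(parents, "A", "E"))  # True: A -> C -> E
print(is_ancestor(parents, "E", "A"))  # False
```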
Confidence of features
• Learn m different networks G_1, ..., G_m (via bootstrap resampling of the data) and, for each feature of interest f, calculate its confidence:
$$\mathrm{conf}(f) = \frac{1}{m} \sum_{i=1}^{m} f(G_i)$$
• Here f(G) is 1 if f is a feature of G, and 0 otherwise.
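The confidence formula translates almost directly into code. In this sketch a feature is any predicate on a network, and `learn` stands in for an unspecified structure-learning routine (a hypothetical placeholder, not the paper’s implementation); the edge-set networks are made-up toy values.

```python
import random

def confidence(feature, networks):
    """conf(f) = (1/m) * sum_i f(G_i), where feature(G) returns True or False."""
    return sum(1 for g in networks if feature(g)) / len(networks)

def bootstrap_networks(data, learn, m, seed=0):
    """Learn m networks, each from a resample (with replacement) of the data rows.

    `learn` is a hypothetical placeholder for any routine mapping a list of
    samples to a network.
    """
    rng = random.Random(seed)
    return [learn([rng.choice(data) for _ in data]) for _ in range(m)]

# Toy usage: networks represented as sets of directed edges (hypothetical values).
nets = [{("A", "B")}, {("A", "B"), ("B", "C")}, {("B", "C")}, {("A", "B")}]

def has_ab(g):
    return ("A", "B") in g

print(confidence(has_ab, nets))  # 3 of 4 networks contain A -> B: 0.75
```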
Learning the network structure
• Issues
  – Extremely large search space (super-exponential in the number of variables).
• Identify a small set of potential (candidate) parents for each gene using simple statistics, then build the network.
  – This restricts the search to networks in which the parents of each variable X_i are drawn only from its candidate set.
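Two small sketches for this slide. `count_dags` uses Robinson’s recurrence for the number of labeled DAGs to show how quickly the search space grows; `candidate_parents` is a simplified stand-in for candidate selection that ranks the other genes by absolute Pearson correlation (an assumed statistic, not necessarily the one the authors used).

```python
from functools import lru_cache
from math import comb

import numpy as np

@lru_cache(maxsize=None)
def count_dags(n):
    """Number of labeled DAGs on n nodes (Robinson's recurrence)."""
    if n == 0:
        return 1
    return sum((-1) ** (k + 1) * comb(n, k) * 2 ** (k * (n - k)) * count_dags(n - k)
               for k in range(1, n + 1))

def candidate_parents(data, k):
    """For each column (gene), pick the k other genes with the largest |correlation|.

    data: 2-D array, rows = samples, columns = genes.
    """
    corr = np.abs(np.corrcoef(data, rowvar=False))
    np.fill_diagonal(corr, -1.0)  # never pick a gene as its own candidate
    return {i: sorted(np.argsort(corr[i])[::-1][:k].tolist())
            for i in range(corr.shape[0])}

print([count_dags(n) for n in range(1, 6)])  # [1, 3, 25, 543, 29281]
```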
Different local probability models
• Multinomial model
  – Treat each variable as discrete and learn a multinomial distribution describing the possible states of each child given the state of its parents.
• Linear Gaussian model
  – A linear regression model (with Gaussian noise) for the child given its parents.
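Both local models can be fit with a few lines of NumPy. This is a sketch under assumptions: for the multinomial model the child and parent values are already discretized into integer states, and the linear Gaussian model is fit by ordinary least squares with an intercept.

```python
import numpy as np

def fit_multinomial_cpt(child, parent_cols, n_states):
    """Estimate P(child | parents) by counting, one entry per observed parent configuration.

    child: 1-D int array of states in range(n_states).
    parent_cols: 2-D int array, one column per parent.
    """
    cpt = {}
    for config in {tuple(int(v) for v in row) for row in parent_cols}:
        mask = np.all(parent_cols == np.array(config), axis=1)
        counts = np.bincount(child[mask], minlength=n_states)
        cpt[config] = counts / counts.sum()
    return cpt

def fit_linear_gaussian(child, parents):
    """Fit child ~ N(w0 + w . parents, sigma^2) by least squares; returns (w, sigma^2)."""
    design = np.column_stack([np.ones(len(child)), parents])
    w, *_ = np.linalg.lstsq(design, child, rcond=None)
    resid = child - design @ w
    return w, resid.var()

# Toy usage with hypothetical discretized data: one parent column.
child = np.array([0, 1, 1, 1, 0, 0, 1, 1])
par = np.array([[0], [0], [1], [1], [0], [0], [1], [1]])
print(fit_multinomial_cpt(child, par, n_states=2))  # parent 0: [0.75, 0.25]; parent 1: [0, 1]
```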
Results
• Applied to cell-cycle expression patterns.
• 76 gene expression measurements.
• Treat each measurement as an independent sample.
• Ran the bootstrap procedure together with the sparse candidate search algorithm to extract learned features.
  – Performed on only 250 genes.
Test robustness
• Tested their confidence assessment on a randomized data set, created by randomly permuting the order of experiments independently for each gene.
  – The randomized data yielded few high-confidence features, since the permutation removes any real dependencies between genes.
  – This suggests that the features learned from the real data are not artifacts of the bootstrap estimation.
• Extracted plausible biological knowledge without using prior biological knowledge.
• The framework builds a much “richer” structure from the data than clustering techniques.
• Capable of discovering causal relationships between genes from expression data.
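The randomization used in the robustness test can be sketched as an independent shuffle of each gene’s row: every gene keeps its own values, but their alignment across experiments, and therefore any real dependency, is destroyed. The two-gene data set below is hypothetical (NumPy assumed).

```python
import numpy as np

def permute_per_gene(expr, seed=0):
    """Shuffle each gene's measurements independently (rows = genes, columns = experiments)."""
    rng = np.random.default_rng(seed)
    return np.array([rng.permutation(row) for row in expr])

# Hypothetical data: gene 1 tracks gene 0 closely across 76 experiments.
rng = np.random.default_rng(0)
g0 = rng.normal(size=76)
expr = np.vstack([g0, g0 + 0.1 * rng.normal(size=76)])

shuffled = permute_per_gene(expr)
print(np.corrcoef(expr)[0, 1])      # close to 1
print(np.corrcoef(shuffled)[0, 1])  # typically near 0
```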