Genetic network inference: from co-expression clustering to reverse ...


12 Dec 2012 (8 years and 7 months ago)


Genetic network inference: from co-expression clustering to reverse engineering

Patrik D’haeseleer, Shoudan Liang and Roland Somogyi


The goal of this review


Principles of genetic network
organization


Computational methods for extracting
network architectures from
experimental data


Outline


Introduction



A conceptual approach to complex network
dynamics


Inference of regulation through clustering of
gene expression data


Modeling methodologies


Gene network inference: reverse engineering


Conclusions and Outlook


Genes encode proteins, some of which
in turn regulate other genes



determine the structure of this
intricate network of genetic regulatory
interactions


Traditional approach: local


Examining and collecting data on a single
gene, a single protein or a single reaction
at a time



functional genomics


Functional Genomics


Specifically, functional genomics refers to the development and application of global experimental approaches to assess gene function by making use of the information and reagents provided by structural genomics.


High-throughput, large-scale experimental methodologies combined with statistical and computational analysis of the results.

Functional Genomics (Cont.)


We need to define the mapping from
sequence space to functional space.

Intermediate representation


Focus at the level of single cells


A biological system can be considered to be a state machine, where the change in internal state of the system depends on both its current internal state and any external inputs.


The goal



Observe the state of a cell and how it
changes under different circumstances,
and from this to derive a model of how
these state changes are generated


The state of cell


All those variables determining its behavior

Example


A simple, 6-node regulatory network

A conceptual approach to complex network dynamics


The global gene expression pattern is
the result of the collective behavior of
individual regulatory pathways


Gene function depends on its cellular
context; thus understanding the
network as a whole is essential.

Boolean Networks


Each gene is considered as a binary variable (either ON or OFF), regulated by other genes through logical or Boolean functions.


Even with this simplification, the network behavior is already extremely rich.

Boolean Networks (Cont.)



Cell differentiation corresponds to
transitions from one global gene
expression pattern to another.
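
To make the Boolean picture concrete, here is a minimal sketch of a synchronous Boolean network. The three genes and their update rules are invented for illustration and are not taken from the review. Iterating the network from every starting state shows that all trajectories end in a small number of repeating expression patterns (attractors), and a perturbation that moves the system from one basin to another would correspond to a transition between global expression patterns.

```python
from itertools import product

# Hypothetical 3-gene network: each rule maps the current state (a, b, c)
# to the next value of one gene. The wiring is invented for illustration.
RULES = (
    lambda a, b, c: b and not c,  # gene A: activated by B, repressed by C
    lambda a, b, c: a or c,       # gene B: activated by A or C
    lambda a, b, c: a,            # gene C: copies A
)

def step(state):
    """Synchronously update every gene from the current global state."""
    return tuple(bool(rule(*state)) for rule in RULES)

def find_attractor(state):
    """Iterate until a state repeats; return the cycle of states reached."""
    seen = []
    while state not in seen:
        seen.append(state)
        state = step(state)
    return seen[seen.index(state):]

def all_attractors():
    """Collect the distinct attractors over all 2**3 initial states."""
    attractors = set()
    for start in product((False, True), repeat=3):
        cycle = find_attractor(start)
        # Rotate the cycle to a canonical start so duplicates collapse.
        k = cycle.index(min(cycle))
        attractors.add(tuple(cycle[k:] + cycle[:k]))
    return attractors

if __name__ == "__main__":
    for attractor in sorted(all_attractors()):
        print(len(attractor), "state(s):", attractor)
```

With these particular rules the eight possible states fall into two attractors: a fixed point with all genes OFF and a cycle of three states.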

Inference of regulation through clustering of gene expression data

Scoring methods


Whether there has been a significant
change at any one condition


Whether there has been a significant
aggregate change over all conditions


Whether the fluctuation pattern shows
high diversity according to Shannon
entropy
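
The entropy criterion can be sketched as follows. This is only one possible reading: the equal-width binning and the choice of three bins are assumptions, and `shannon_entropy` is a hypothetical helper name. A gene whose expression values spread evenly over the bins scores high, while a flat profile scores zero.

```python
import math

def shannon_entropy(profile, n_bins=3):
    """Entropy (bits) of one gene's expression values after equal-width binning."""
    lo, hi = min(profile), max(profile)
    if hi == lo:
        return 0.0  # a flat profile carries no information
    width = (hi - lo) / n_bins
    counts = [0] * n_bins
    for v in profile:
        # Clamp the maximum value into the last bin.
        idx = min(int((v - lo) / width), n_bins - 1)
        counts[idx] += 1
    n = len(profile)
    return -sum(c / n * math.log2(c / n) for c in counts if c)

if __name__ == "__main__":
    print(shannon_entropy([1.0, 1.0, 1.0, 1.0]))        # flat profile
    print(shannon_entropy([0.0, 0.0, 1.0, 1.0, 2.0, 2.0]))  # evenly spread profile
```

Genes can then be ranked by this score, keeping those above a chosen threshold for further analysis.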

Guilt By Association


Select a gene


Determine its nearest neighbors in expression space within a certain user-defined distance cut-off
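
A minimal sketch of this neighbor search, assuming a root-mean-square form of the Euclidean distance between profiles. The gene names, expression values and cut-off are invented toy data, and `neighbors` is a hypothetical helper.

```python
import math

def distance(p, q):
    """Root-mean-square difference between two equal-length expression profiles."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)) / len(p))

def neighbors(profiles, query, cutoff):
    """Return (gene, distance) pairs within `cutoff` of `query`, nearest first."""
    hits = [
        (gene, distance(profiles[query], profile))
        for gene, profile in profiles.items()
        if gene != query
    ]
    return sorted((h for h in hits if h[1] <= cutoff), key=lambda h: h[1])

# Invented toy data: scaled expression of four genes over three samples.
profiles = {
    "geneA": [0.1, 0.9, 0.5],
    "geneB": [0.2, 0.8, 0.5],
    "geneC": [0.9, 0.1, 0.4],
    "geneD": [0.1, 0.7, 0.6],
}

if __name__ == "__main__":
    for gene, d in neighbors(profiles, "geneA", cutoff=0.2):
        print(gene, round(d, 3))
```

Here geneB and geneD fall inside the cut-off around geneA, while geneC, whose profile runs in the opposite direction, does not.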

Clustering


Extract groups of genes that are tightly co-expressed over a range of different experiments.


Caution


Different clustering methods can have
very different results


It’s not yet clear which clustering
methods are most useful for gene
expression analysis.


Definition: Gene Expression Profile


An expression profile $e_j$ of an ordered list of N samples (k = 1 to N) for a particular gene j is a vector of scaled expression values $v_{jk}$.


The expression profile is:


$e_j = (v_{j1}, v_{j2}, v_{j3}, \ldots, v_{jN})$


Definition: Gene Expression Profile (Cont.)


A difference between two genes p and q may be estimated as an N-dimensional metric “distance” between $e_p$ and $e_q$.


Euclidean distance:


$d_{pq} = \sqrt{\frac{1}{N}\sum_{k=1}^{N} (v_{pk} - v_{qk})^2}$

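
As a small illustration, the distance between profiles can be computed for every pair of genes to give a distance matrix. This sketch assumes the squared differences are averaged over the N samples before the square root is taken; dropping the division by `n` gives the unnormalized Euclidean form, which ranks gene pairs identically. The function names are hypothetical.

```python
import math

def profile_distance(e_p, e_q):
    """Root-mean-square difference between two profiles of equal length N."""
    n = len(e_p)
    return math.sqrt(sum((vp - vq) ** 2 for vp, vq in zip(e_p, e_q)) / n)

def distance_matrix(profiles):
    """Symmetric matrix of pairwise distances between all profiles."""
    return [[profile_distance(p, q) for q in profiles] for p in profiles]

if __name__ == "__main__":
    toy = [[0.0, 0.0], [3.0, 4.0], [3.0, 3.0]]  # invented values
    for row in distance_matrix(toy):
        print([round(d, 3) for d in row])
```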
Clustering algorithms


Non-hierarchical methods


Cluster N objects into K groups in an iterative process until certain goodness criteria are optimized


E.g. K-means
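
The iterative process can be sketched with the standard Lloyd-style K-means loop. This is a plain-Python illustration, not a tuned implementation: the random initialization, fixed seed and iteration cap are assumptions, and the goodness criterion being reduced is the within-cluster sum of squared distances.

```python
import random

def kmeans(points, k, n_iter=100, seed=0):
    """Alternate between assigning points and updating centroids."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(n_iter):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [
            min(range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing: a local optimum is reached
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster empties
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids

if __name__ == "__main__":
    # Invented toy profiles forming two obvious groups.
    data = [[0.0, 0.0], [0.1, 0.2], [0.2, 0.1], [5.0, 5.0], [5.1, 4.9], [4.9, 5.2]]
    print(kmeans(data, k=2)[0])
```

K must be chosen in advance, and different seeds can give different partitions on less clearly separated data.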


Clustering algorithms


Hierarchical methods


Return a hierarchy of nested clusters, where each cluster typically consists of the union of two or more smaller clusters.


Agglomerative methods


Start with single object clusters and recursively
merge them into larger clusters


Divisive methods


Start with the cluster containing all objects and
recursively divide it into smaller clusters
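
An agglomerative method can be sketched in a few lines. The version below assumes average linkage and Euclidean distance, which is only one of several common choices, and `average_linkage` is a hypothetical helper. It starts from single-object clusters and repeatedly merges the closest pair, recording each merge so that the nested hierarchy can be read back.

```python
import math

def average_linkage(points, n_clusters):
    """Merge the two closest clusters until `n_clusters` remain.

    Returns the final clusters (lists of point indices) and the merge history.
    """
    def linkage(a, b):
        # Average pairwise distance between members of clusters a and b.
        total = sum(math.dist(points[i], points[j]) for i in a for j in b)
        return total / (len(a) * len(b))

    clusters = [[i] for i in range(len(points))]  # single-object clusters
    history = []
    while len(clusters) > n_clusters:
        # Find the closest pair of clusters under average linkage.
        a, b = min(
            ((x, y) for x in range(len(clusters)) for y in range(x + 1, len(clusters))),
            key=lambda pair: linkage(clusters[pair[0]], clusters[pair[1]]),
        )
        history.append((clusters[a], clusters[b]))
        merged = clusters[a] + clusters[b]
        clusters = [c for idx, c in enumerate(clusters) if idx not in (a, b)] + [merged]
    return clusters, history

if __name__ == "__main__":
    data = [[0.0, 0.0], [0.1, 0.2], [0.2, 0.1], [5.0, 5.0], [5.1, 4.9], [4.9, 5.2]]
    clusters, history = average_linkage(data, n_clusters=2)
    print(clusters)
    print(history)
```

Running the loop down to a single cluster yields the full tree; stopping earlier cuts the tree at the chosen number of clusters.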

Other applications of co-expression clusters


Extraction of regulatory motifs


Genes in the same expression cluster share biological functions


Inference of functional annotation


Functions of unknown genes may be hypothesized from genes with known function within the same cluster


As a molecular signature in distinguishing cell
or tissue types


mRNA expression

Which clustering method to use?



There is no single best criterion for
obtaining a partition because no precise
and workable definition of ‘cluster’
exists.


Clusters can be of any arbitrary shapes
and sizes in a multidimensional pattern
space.

Challenge in cluster analysis


A gene could be a member of several
clusters, each reflecting a particular
aspect of its function and control


Solutions


Clustering methods that partition genes into non-exclusive clusters


Several clustering methods could be used
simultaneously

Modeling methodologies

Level of biochemical detail


abstract


Boolean networks


concrete



Full biochemical interaction models with stochastic kinetics, as in Arkin et al. (1998)

Forward and inverse modeling


Forward modeling approach


Inverse modeling, or reverse
engineering


Given an amount of data, what can we
deduce about the unknown underlying
regulatory network?


Requires the use of a parametric model, the parameters of which are then fit to the real-world data.

Gene network inference: reverse engineering

Goal of network inference


Construct a coarse-scale model of the network of regulatory interactions between the genes


It’s possible to reverse engineer a
network from its activity profiles

Data requirements


To infer the regulation of a single gene correctly, we need to observe the expression of that gene under many different combinations of expression levels of its regulatory inputs


Use data from different sources


Deal with different data types

Estimates for network models


A sparse network model of N genes, where each gene is only affected by K other genes on average.


A sparsely connected, directed graph with N nodes and NK edges.

Estimate for network models (Cont.)


To specify the correct model, we need


$\log_2 C^{NK}_{N^2} = \log_2 \frac{(N^2)!}{(NK)!\,(N^2 - NK)!}$


bits of information, which is on the order of


$NK \log(N/K)$
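
As a rough numerical check of this estimate, the exact count of possible wirings can be compared with the leading term. The sketch below assumes that any of the N² ordered gene pairs may carry an edge and uses base-2 logarithms throughout; the values of N and K are arbitrary examples and the function names are hypothetical.

```python
import math

def bits_exact(n_genes, k_inputs):
    """log2 of the number of ways to place N*K edges among N**2 possible ones."""
    return math.log2(math.comb(n_genes ** 2, n_genes * k_inputs))

def bits_leading_term(n_genes, k_inputs):
    """Leading-order estimate N*K*log2(N/K)."""
    return n_genes * k_inputs * math.log2(n_genes / k_inputs)

if __name__ == "__main__":
    for n, k in [(10, 2), (100, 3), (1000, 3)]:
        print(n, k, round(bits_exact(n, k)), round(bits_leading_term(n, k)))
```

The exact value stays above the leading term because lower-order contributions are ignored, but both grow in the same way with N and K.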

Correlation Metric Construction


Adam Arkin and John Ross


A method to reconstruct reaction
networks from measured time series of
the component chemical species.


The system is driven using inputs for
some of the chemical species and the
concentration of all the species is
monitored over time.




Correlation Metric Construction (Cont.)


The time-lagged correlation matrix is calculated


From this a distance matrix is constructed
based on the maximum correlation between
any two chemical species


This distance matrix is then fed into a simple
clustering algorithm to generate a tree of
connections between the species


The results are mapped into a two-dimensional graph for visualization
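
The first two steps can be sketched as below. This is an illustration under stated assumptions rather than the published procedure: correlations are plain Pearson correlations on shifted series, lags are searched in both directions up to a hypothetical `max_lag`, and the conversion from maximum correlation c to distance uses sqrt(2(1 - c)), which is one common choice. The resulting matrix could then be passed to any simple clustering routine.

```python
import math

def lagged_correlation(x, y, lag):
    """Pearson correlation between x(t) and y(t + lag), for lag >= 0."""
    a, b = x[: len(x) - lag], y[lag:]
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    var_a = sum((u - ma) ** 2 for u in a)
    var_b = sum((v - mb) ** 2 for v in b)
    if var_a == 0 or var_b == 0:
        return 0.0  # a constant segment has no defined correlation
    return cov / math.sqrt(var_a * var_b)

def max_abs_correlation(x, y, max_lag):
    """Largest |correlation| over lags in either direction up to max_lag."""
    best = 0.0
    for lag in range(max_lag + 1):
        best = max(best,
                   abs(lagged_correlation(x, y, lag)),
                   abs(lagged_correlation(y, x, lag)))
    return best

def cmc_distance_matrix(series, max_lag):
    """Distance sqrt(2 * (1 - c_ij)), with c_ij the maximum lagged correlation."""
    n = len(series)
    return [
        [math.sqrt(max(0.0, 2 * (1 - max_abs_correlation(series[i], series[j], max_lag))))
         for j in range(n)]
        for i in range(n)
    ]

if __name__ == "__main__":
    x = [1.0, 3.0, 2.0, 5.0, 4.0, 6.0, 2.0, 1.0, 7.0, 3.0]  # invented driver series
    y = [0.0] + x[:-1]                                       # y follows x by one step
    for row in cmc_distance_matrix([x, y], max_lag=2):
        print([round(d, 4) for d in row])
```

In the toy data, y is a copy of x delayed by one time step, so the two species end up at a distance close to zero once the lag is taken into account.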

Additive regulation models


Property


The regulatory inputs are combined using
a weighted sum


Can be used as a first-order approximation to the gene network


Additive regulation models


The change in each variable over time is given by a weighted sum of all other variables:


$\Delta y_i = \sum_j w_{ji}\, y_j + b_i$


$y_i$ is the level of the i-th variable


$b_i$ is a bias term indicating whether i is expressed or not in the absence of regulatory inputs


$w_{ji}$ represents the influence of j on the regulation of i

Use of such models


We can infer regulatory interactions
directly from the data, by fitting these
simple network models to large scale
gene expression data.
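
A minimal sketch of such a fit, assuming the additive model is discretized with a fixed time step `dt` and that the data are noise-free. The three-gene weights, the helper names (`simulate`, `solve`, `fit_gene`) and the use of ordinary least squares via the normal equations are all illustrative assumptions. Several short time series from different starting states are pooled, so that each gene is observed under many combinations of its inputs.

```python
import random

def simulate(W, b, y0, steps, dt=0.1):
    """Euler steps of the additive model; W[j][i] is the influence of j on i."""
    n, ys = len(b), [list(y0)]
    for _ in range(steps):
        y = ys[-1]
        rate = [sum(W[j][i] * y[j] for j in range(n)) + b[i] for i in range(n)]
        ys.append([y[i] + dt * rate[i] for i in range(n)])
    return ys

def solve(A, rhs):
    """Solve A x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    M = [row[:] + [r] for row, r in zip(A, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            M[r] = [a - f * c for a, c in zip(M[r], M[col])]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x

def fit_gene(trajectories, i, dt=0.1):
    """Least-squares estimate of (w_1i, ..., w_ni, b_i) for target gene i."""
    X, t = [], []
    for ys in trajectories:
        for cur, nxt in zip(ys, ys[1:]):
            X.append(cur + [1.0])              # inputs plus a constant for the bias
            t.append((nxt[i] - cur[i]) / dt)   # observed change per unit time
    m = len(X[0])
    XtX = [[sum(row[a] * row[c] for row in X) for c in range(m)] for a in range(m)]
    Xty = [sum(row[a] * ti for row, ti in zip(X, t)) for a in range(m)]
    return solve(XtX, Xty)

if __name__ == "__main__":
    # Invented ground-truth network used only to generate toy data.
    W = [[-1.0, 0.8, 0.0], [0.0, -1.0, 0.5], [-0.6, 0.0, -1.0]]
    b = [0.5, 0.2, 0.3]
    rng = random.Random(1)
    data = [simulate(W, b, [rng.random() for _ in range(3)], steps=5) for _ in range(10)]
    print([round(v, 3) for v in fit_gene(data, i=0)])
```

With real, noisy measurements and many more genes than time points, the fit is underdetermined, so some form of constraint or regularization would be needed.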



Conclusion


Conceptual foundations for
understanding complex biological
networks


Several practical methods for data
analysis