Modeling and Simulation of Genetic Regulatory Systems

paper by Hidde de Jong

reviewed by Ulrich Basters and Christian Hahn

Seminar Bioinformatics - Modelling and Simulation of Genetic Regulatory Systems - Christian Hahn, Ulrich Basters, 08/27/2003

0.1 Overview

Introduction
Directed and undirected graphs
Bayesian networks
Boolean networks
Generalized logical networks
Nonlinear ordinary differential equations
Piecewise-linear differential equations
Qualitative differential equations
Partial differential equations
Stochastic master equations
Rule-based formalisms
Conclusion


1.1 Genetic Regulatory Systems

In order to understand the functioning of organisms on the molecular level, we need to know which genes are expressed, when and where in the organism, and to what extent.

The regulation of gene expression is achieved through genetic regulatory systems, structured by networks of interactions between DNA, RNA, proteins, and small molecules.

An intuitive understanding of the overall dynamics of these systems is hard to obtain

Consequence: formal methods and computer tools for modeling and simulation offer a possible approach


1.2 Genetic Regulatory Systems

Genes influence each other, as they produce proteins that act as activators or repressors of other genes.

Complex system in which different concentrations of an agent trigger different actions


1.3 Motivation

Genetic regulatory systems are hard to understand in their whole complexity

GOAL: complexity reduction by appropriate models and formalisms
  better understanding of GRSes
  intuitive visualization of GRSes
  better analysis of GRSes

Models can give hints on where to continue research on dependencies

Models point out important parts of the system

Gaining an understanding of how complex patterns of behavior emerge from the interactions between genes in a regulatory network


1.4 Modeling Life-Cycle

Process model for the development and refinement of a model of a GRN:


2.1 Directed and Undirected Graphs

Motivation/Definition

Probably the most straightforward way to model a GRN

$G = \langle V, E \rangle$
  $V$: set of vertices
  $E$: set of edges $\langle i, j \rangle$, where $i, j \in V$ are the head and tail of the edge

Additional labels on the edges denote positive/negative influence


2.2 Directed and Undirected Graphs - Summary

Advantages:

Intuitive way of visualization

Common and well-explored graph algorithms can make biologically relevant predictions about GRSes:
  paths between genes may reveal missing regulatory interactions or provide clues about redundancy
  cycles in the network point at feedback relations
  connectivity characteristics give an indication of the complexity
  loosely connected subgraphs point at functional modules

Disadvantages:

Time does not play a role

Too much abstraction: a very simplified model, far from reality
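The graph analyses listed above take only a few lines on a small example. The gene names and signed edges below are hypothetical, and classifying a feedback loop as positive or negative by the product of its edge signs is a common convention assumed here, not something stated on the slide.

```python
# Hypothetical signed regulatory graph: (regulator, target) -> +1 (activation) or -1 (inhibition).
EDGES = {
    ("a", "b"): +1,
    ("b", "c"): -1,
    ("c", "a"): +1,
    ("a", "a"): +1,  # self-activation
    ("c", "d"): +1,
}


def successors(edges, v):
    """Targets directly regulated by vertex v."""
    return [j for (i, j) in edges if i == v]


def has_path(edges, src, dst):
    """Depth-first search: is there a chain of regulations from src to dst?"""
    stack, seen = [src], {src}
    while stack:
        v = stack.pop()
        if v == dst:
            return True
        for w in successors(edges, v):
            if w not in seen:
                seen.add(w)
                stack.append(w)
    return False


def simple_cycles(edges):
    """Enumerate simple directed cycles, each reported once, starting from its smallest vertex."""
    cycles = []

    def extend(start, v, path):
        for w in successors(edges, v):
            if w == start:
                cycles.append(list(path))
            elif w > start and w not in path:
                path.append(w)
                extend(start, w, path)
                path.pop()

    for s in sorted({v for edge in edges for v in edge}):
        extend(s, s, [s])
    return cycles


def loop_sign(edges, cycle):
    """A feedback loop is positive if it contains an even number of inhibitions."""
    sign = 1
    for a, b in zip(cycle, cycle[1:] + cycle[:1]):
        sign *= edges[(a, b)]
    return "positive" if sign > 0 else "negative"
```

Here the loop a → b → c → a contains one inhibition and is therefore a negative feedback loop, while the self-activation of a is a positive one. For real networks, a graph library such as NetworkX already provides cycle enumeration (`networkx.simple_cycles`).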


3.1 Bayesian Networks - Definition

Directed acyclic graph $G = \langle V, E \rangle$

Vertices $i \in V$, $1 \le i \le n$, represent genes or other elements and correspond to random variables $X_i$

For each $X_i$ a conditional distribution $p(X_i \mid \mathrm{parents}(X_i))$ is defined, where $\mathrm{parents}(X_i)$ denotes the direct regulators

Conditional independence: $I(X_i; Y \mid Z)$ expresses the fact that $X_i$ is independent of $Y$ given $Z$


3.2 Bayesian Networks

Markov Assumption

The graph encodes the Markov assumption, stating that for every gene $i$ in $G$ the conditional independence $I(X_i; \mathrm{nondescendants}(X_i) \mid \mathrm{parents}(X_i))$ holds: given its direct regulators, $X_i$ is independent of its non-descendants

The method is used to analyse dependencies between genes; it is not applicable to simulating the system

Learning techniques rely on a score that evaluates how well a network matches the data, and search for the network with the optimal score

Graphs are said to be equivalent if they imply the same set of independencies, thus forming an equivalence class (useful for determining important subgraphs)

Looking at Markov and order relations between pairs of genes may point to a relationship between the genes
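Under the Markov assumption the joint distribution factorizes into the local conditional distributions, $p(X_1, \dots, X_n) = \prod_i p(X_i \mid \mathrm{parents}(X_i))$. The sketch below checks this numerically on a hypothetical three-gene network A → B, A → C with made-up probabilities: B and C are dependent, but become independent once their common regulator A is known.

```python
from itertools import product

# Hypothetical DAG A -> B, A -> C over binary variables (1 = gene expressed).
P_A = {0: 0.7, 1: 0.3}
P_B_GIVEN_A = {0: {0: 0.8, 1: 0.2}, 1: {0: 0.1, 1: 0.9}}  # P_B_GIVEN_A[a][b]
P_C_GIVEN_A = {0: {0: 0.9, 1: 0.1}, 1: {0: 0.4, 1: 0.6}}  # P_C_GIVEN_A[a][c]


def joint(a, b, c):
    """Factorization implied by the Markov assumption: p(A) p(B|A) p(C|A)."""
    return P_A[a] * P_B_GIVEN_A[a][b] * P_C_GIVEN_A[a][c]


def prob(**fixed):
    """Marginal probability of the assignment `fixed`, summing the joint over the rest."""
    total = 0.0
    for a, b, c in product((0, 1), repeat=3):
        values = {"a": a, "b": b, "c": c}
        if all(values[name] == v for name, v in fixed.items()):
            total += joint(a, b, c)
    return total


def cond(target, given):
    """Conditional probability p(target | given) for dicts of variable assignments."""
    return prob(**target, **given) / prob(**given)
```

For these numbers, `cond({"b": 1}, {"a": 1, "c": 1})` and `cond({"b": 1}, {"a": 1})` are both 0.9, whereas `cond({"b": 1}, {"c": 1})` (about 0.70) differs from `prob(b=1)` (0.41).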


3.3 Bayesian Networks

Summary

Advantages

Attractive because of a solid basis in statistics (enables dealing with stochastic aspects and noisy measurements in a natural way)

Applicable even if only incomplete knowledge about the system is available

Reveals the important parts of the system: usually only a few genes play an important role in large systems

Disadvantages

Incomplete knowledge underdetermines the network (at best a few dozen experiments provide information on the transcription of thousands of genes)

Finding the network with the optimal score is NP-hard; heuristics are used, but they do not guarantee finding a globally optimal solution

Static network: leaves out dynamic aspects (addressed by Dynamic Bayesian Networks)


4.1 Boolean Networks - Definition

The state of a gene can be expressed by a Boolean variable indicating whether it is active (=1) or inactive (=0)

Interactions between genes can be represented by Boolean functions that calculate the state of a gene from the activation of other genes

This results in a Boolean Network:


4.2 Boolean Networks

Definition/Properties

The method is similar to digital circuitry

An $n$-vector of variables in a Boolean Network represents the state of a regulatory system of $n$ elements, each having the value 0 or 1

So the system has $2^n$ possible states

The state of an element at time point $t+1$ is computed by a Boolean function, or rule, from the states of $k$ of the $n$ elements at time point $t$: the function maps $k$ inputs to an output value


4.3 Boolean Networks

Properties

Transitions between states are deterministic and synchronous (the outputs of all elements are updated simultaneously)

A sequence of states forms a trajectory of the system

As the number of states is finite, a trajectory will eventually reach either a steady state (point attractor) or a state cycle (dynamic attractor)
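The synchronous update and the search for attractors can be sketched as follows. The three rules are a hypothetical example, not taken from the reviewed paper.

```python
from itertools import product

# Hypothetical 3-gene network: rule i computes gene i at t+1 from the state at t.
RULES = (
    lambda s: s[2],           # gene 0 is activated by gene 2
    lambda s: s[0],           # gene 1 is activated by gene 0
    lambda s: int(not s[1]),  # gene 2 is repressed by gene 1
)


def step(state):
    """Synchronous update: every gene is computed from the same state at time t."""
    return tuple(rule(state) for rule in RULES)


def find_attractor(state):
    """Follow the trajectory until a state repeats and return the attractor reached."""
    first_seen = {}  # state -> index in the trajectory
    trajectory = []
    while state not in first_seen:
        first_seen[state] = len(trajectory)
        trajectory.append(state)
        state = step(state)
    return trajectory[first_seen[state]:]


def all_attractors():
    """Try all 2^n initial states and collect the distinct attractors."""
    return {frozenset(find_attractor(s)) for s in product((0, 1), repeat=len(RULES))}
```

The three genes form a negative feedback loop, so none of the $2^3 = 8$ states is a point attractor: every trajectory ends in either a 6-state cycle or a 2-state cycle. Because the state space is finite, the loop in `find_attractor` always terminates.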



4.4 Boolean Networks

Summary

Advantages

Efficient analysis of large regulatory networks

Positive/negative feedback cycles can be modeled with BNs

Disadvantages

Strong simplifying assumptions: a gene is either on or off, with no intermediate states

Transitions are assumed to occur synchronously: this is usually not the case, so certain behaviors may not be predicted by the simulation algorithm

There are situations where the Boolean idealisation is not appropriate: more general methods are required


5.1 Generalized Logical Networks (GLN)

Definition

Generalize Boolean Networks: allow variables to have more than two values

Transitions between states occur asynchronously

Discrete, so-called logical variables are abstractions of the real concentration values $x_i$

The possible values of element $i$ are defined by thresholds of its influence on other elements: if an element has influence on $p$ other elements, it may have $p$ different thresholds


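5.2 GLN

Definition

Formally:

If an element $i$ influences $p$ other elements, then it will have $p$ distinct thresholds $\theta_i^{(1)} < \theta_i^{(2)} < \dots < \theta_i^{(p)}$

The logical variable $\bar{x}_i$ has the possible values $\{0, \dots, p\}$ and is defined by:

$$\bar{x}_i = \begin{cases} 0, & x_i < \theta_i^{(1)} \\ k, & \theta_i^{(k)} < x_i < \theta_i^{(k+1)}, \quad 0 < k < p \\ p, & x_i > \theta_i^{(p)} \end{cases}$$

The vector $\bar{x} = [\bar{x}_1, \dots, \bar{x}_n]$ denotes the logical state of the regulatory network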
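The threshold definition above maps directly onto a binary search over the sorted thresholds. This is a minimal sketch; the function names and the example thresholds are hypothetical.

```python
from bisect import bisect_left


def logical_value(x, thresholds):
    """Logical abstraction of a concentration x, given sorted thresholds theta_1 < ... < theta_p.

    Returns 0 below theta_1, k between theta_k and theta_(k+1), and p above theta_p.
    The formalism leaves x == theta_k undefined; bisect_left assigns it to the lower
    interval, an arbitrary choice made for this sketch.
    """
    return bisect_left(thresholds, x)


def logical_state(x, thresholds_per_element):
    """The logical state (x_bar_1, ..., x_bar_n) for a vector of concentrations x."""
    return tuple(logical_value(xi, th) for xi, th in zip(x, thresholds_per_element))
```

For example, with thresholds `[[0.5], [0.3, 0.8]]` (element 1 influences one element, element 2 influences two), the concentrations (0.7, 0.5) give the logical state (1, 1) and (0.2, 0.9) give (0, 2).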




5.3 GLN

Definition

The pattern of interactions is described by logical equations of the form:

$$X_i = b_i(\bar{x}), \quad 1 \le i \le n$$

$X_i$ is called the image of $\bar{x}_i$; it denotes the value towards which $\bar{x}_i$ tends when the logical state is $\bar{x}$

Positive and negative feedback loops can be modeled

Refinement of the simple on/off variables in Boolean Networks


5.4 GLN

Properties

A logical steady state occurs when the logical state equals its image: $\bar{x} = X$, i.e. $\bar{x}_i = b_i(\bar{x})$ for all $i$

Since the number of logical states is finite, one can test all of them for logical steady states; the other states are called transient logical states

If the system is in a transient logical state, it will make a transition into another logical state

Since a logical variable moves in the direction of its image, the successor states can be deduced by comparing the value of each logical variable with that of its image

The logical states and the transitions among them can be organized in a state transition graph

In the analysis of state transitions, time delays due to translation and transport can be taken into account

Improves on the standard Boolean Network model
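The construction of the state transition graph can be sketched for a small hypothetical example. The update rule used here, in which a single logical variable moves one step towards its image per transition (asynchronous update), is an assumption in the spirit of the slide, not code from the reviewed paper.

```python
from itertools import product

# Hypothetical two-element mutual repression; each element has one threshold,
# so both logical variables take values in {0, 1}.
MAX_VALUE = (1, 1)


def image(state):
    """Logical equations X_i = b_i(x_bar): each element tends to be on iff the other is off."""
    x1, x2 = state
    return (1 if x2 == 0 else 0, 1 if x1 == 0 else 0)


def successors(state):
    """Asynchronous rule (assumed): one variable at a time moves one step towards its image."""
    result = []
    for i, (value, target) in enumerate(zip(state, image(state))):
        if value != target:
            nxt = list(state)
            nxt[i] = value + (1 if target > value else -1)
            result.append(tuple(nxt))
    return result


def transition_graph():
    """Map every logical state to its successor states."""
    states = product(*(range(m + 1) for m in MAX_VALUE))
    return {s: successors(s) for s in states}


def steady_states(graph):
    """Logical steady states equal their image, so they have no successors."""
    return [s for s, succ in graph.items() if not succ]
```

For mutual repression the graph has two logical steady states, (0, 1) and (1, 0). From (0, 0) and from (1, 1) the system can move to either of them, which illustrates the nondeterminism introduced by asynchronous transitions.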


6.1 Nonlinear Ordinary Differential Equations - Definition

Models the concentrations of RNA, proteins and other molecules by time-dependent variables

Gene regulation is modeled by rate equations, expressing the rate of production of a component as a function of the concentrations of other components

Rate equations have the following form:

$$\dot{x}_i = f_i(x), \quad 1 \le i \le n$$

where $x = [x_1, \dots, x_n] \ge 0$ denotes the vector of concentrations and $f_i: \mathbb{R}^n \to \mathbb{R}$ a usually nonlinear function

Discrete time delays $\tau_{i1}, \dots, \tau_{in} > 0$ can also be represented:

$$\dot{x}_i(t) = f_i\left(x_1(t - \tau_{i1}), \dots, x_n(t - \tau_{in})\right), \quad 1 \le i \le n$$


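6.2 ODE - Definition

Goal: specifying the functions $f_i$, e.g. for a feedback loop of $n$ genes in which gene $n$ regulates gene 1 and gene $i-1$ regulates gene $i$:

$$\dot{x}_1 = k_{1,n}\, r(x_n) - \gamma_1 x_1$$
$$\dot{x}_i = k_{i,i-1}\, x_{i-1} - \gamma_i x_i, \quad 1 < i \le n$$

$k_{1,n}, k_{2,1}, \dots, k_{n,n-1} > 0$ are production constants and $\gamma_1, \dots, \gamma_n > 0$ are degradation constants

The rate equations express a balance between the number of molecules appearing and disappearing per unit time

For $x_1$, a regulation function $r: \mathbb{R} \to \mathbb{R}$ is involved, whereas for $i > 1$ the production of $x_i$ increases linearly in $x_{i-1}$

An often used regulation function is the so-called Hill curve:

$$h^+(x_j, \theta_j, m) = \frac{x_j^m}{x_j^m + \theta_j^m}$$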


6.3 ODE - Definition

$\theta_j > 0$ describes the threshold for the regulatory influence of $x_j$ on a target gene

$m$ is a steepness parameter

The function $h^+$ ranges from 0 to 1

An increase in $x_j$ will tend to increase the expression rate of the gene (activation)

In order to express that an increase of $x_j$ will tend to decrease the expression rate (inhibition), the regulation function is replaced by:

$$h^-(x_j, \theta_j, m) = 1 - h^+(x_j, \theta_j, m)$$
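A minimal numerical sketch of the feedback loop from the previous slides, with $r = h^-$ (negative feedback). Forward Euler, the parameter values and all function names are choices made for illustration; in practice an adaptive ODE solver would be used.

```python
def hill_plus(x, theta, m):
    """Activating Hill curve h+(x, theta, m) = x^m / (x^m + theta^m), between 0 and 1."""
    return x**m / (x**m + theta**m)


def hill_minus(x, theta, m):
    """Inhibiting regulation function h-(x, theta, m) = 1 - h+(x, theta, m)."""
    return 1.0 - hill_plus(x, theta, m)


def rates(x, k, gamma, theta, m):
    """Right-hand side of the feedback loop, choosing r = h- (negative feedback).

    dx_1/dt = k_1,n r(x_n) - gamma_1 x_1 and dx_i/dt = k_i,i-1 x_(i-1) - gamma_i x_i;
    with 0-based lists, k[0] plays the role of k_1,n and k[i] that of k_i,i-1.
    """
    n = len(x)
    dx = [k[0] * hill_minus(x[n - 1], theta, m) - gamma[0] * x[0]]
    for i in range(1, n):
        dx.append(k[i] * x[i - 1] - gamma[i] * x[i])
    return dx


def simulate(x0, k, gamma, theta, m, dt=0.01, steps=5000):
    """Forward-Euler integration; keeps x >= 0 as long as dt * gamma_i < 1."""
    x = list(x0)
    for _ in range(steps):
        dx = rates(x, k, gamma, theta, m)
        x = [xi + dt * dxi for xi, dxi in zip(x, dx)]
    return x
```

With three genes, $k_i = \gamma_i = 1$, $\theta = 0.5$ and $m = 2$, an equilibrium must satisfy $x = 0.25/(x^2 + 0.25)$, whose only real root is $x = 0.5$; `simulate([0.0, 0.0, 0.0], [1.0] * 3, [1.0] * 3, 0.5, 2)` settles there.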


6.4 ODE - Properties

Advantages

More "realistic" way of modeling

Disadvantages

Lack of in vivo or in vitro measurements of the kinetic parameters in the rate equations

Numerical parameter values are available for only a handful of well-studied systems (e.g. phage λ)

In most cases parameter values had to be chosen such that the models were able to reproduce the observed qualitative behavior

For larger models, finding appropriate values may be difficult

Solution

The growing availability of data could solve the problem to some extent


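7.1 Piecewise-Linear Differential Equations (PLDE) - Definition

Special case of the rate equations, with two simplifications:
  interactions are described by directly relating the expression levels of the genes in the network
  continuous sigmoid curves are approximated by discontinuous step functions

PLDEs have the following form:

$$\dot{x}_i = g_i(x) - \gamma_i x_i, \quad 1 \le i \le n$$

where $x_i$ denotes the cellular concentration of the product of gene $i$ and $\gamma_i > 0$ its degradation rate

The function $g_i: \mathbb{R}^n_{\ge 0} \to \mathbb{R}_{\ge 0}$ is defined as:

$$g_i(x) = \sum_{l \in L} k_{il}\, b_{il}(x)$$

where $L$ is an index set, $k_{il} > 0$ is a rate parameter and $b_{il}: \mathbb{R}^n_{\ge 0} \to \{0, 1\}$ a combination of step functions such as $s^+(x_j, \theta_j)$ (1 if $x_j > \theta_j$, 0 if $x_j < \theta_j$) and $s^-(x_j, \theta_j) = 1 - s^+(x_j, \theta_j)$

$b_{il}$ is the arithmetic equivalent of a Boolean function, expressing the conditions under which the gene is expressed at rate $k_{il}$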


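7.2 PLDE - Graphical simplification

Consider the $n$-dimensional hyperbox defined by:

$$\Omega = \{ x \in \mathbb{R}^n_{\ge 0} \mid 0 \le x_i \le max_i, \ 1 \le i \le n \}$$

Assume that for all threshold concentrations $\theta_{ik}$ of the protein encoded by gene $i$ it holds that $\theta_{ik} < max_i$

The $(n-1)$-dimensional hyperplanes defined by the thresholds divide the box into orthants

In each orthant the PLDEs reduce to linear ODEs with a constant production term $\mu_i$, the sum of the rate parameters $k_{il}$ whose $b_{il}$ equal 1 in that orthant:

$$\dot{x}_i = \mu_i - \gamma_i x_i, \quad 1 \le i \le n$$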
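Because the production terms are constant within an orthant, the solution there is available in closed form: each $x_i$ relaxes exponentially towards the focal point $\mu_i / \gamma_i$. The sketch below uses a hypothetical two-gene mutual repression network with made-up parameters.

```python
import math

# Hypothetical two-gene mutual repression: gene 1 is expressed at rate K1 while x2 is
# below THETA2, gene 2 at rate K2 while x1 is below THETA1.
K1, K2 = 2.0, 2.0
GAMMA = (1.0, 1.0)
THETA1, THETA2 = 1.0, 1.0


def s_plus(x, theta):
    """Step function s+(x, theta): 1 above the threshold, 0 below it."""
    return 1 if x > theta else 0


def s_minus(x, theta):
    """Step function s-(x, theta) = 1 - s+(x, theta)."""
    return 1 - s_plus(x, theta)


def production(x):
    """g(x) = (K1 s-(x2, THETA2), K2 s-(x1, THETA1)); constant (= mu) within an orthant."""
    x1, x2 = x
    return (K1 * s_minus(x2, THETA2), K2 * s_minus(x1, THETA1))


def focal_point(x):
    """Within an orthant dx_i/dt = mu_i - gamma_i x_i, so x_i tends towards mu_i / gamma_i."""
    return tuple(mu / g for mu, g in zip(production(x), GAMMA))


def solve_in_orthant(x0, t):
    """Exact solution x_i(t) = phi_i + (x0_i - phi_i) exp(-gamma_i t) of the linear ODEs.

    Only valid while the trajectory stays in the orthant that contains x0.
    """
    phi = focal_point(x0)
    return tuple(p + (xi - p) * math.exp(-g * t) for xi, p, g in zip(x0, phi, GAMMA))
```

Starting at (0.5, 1.5), the focal point (0, 2) lies in the same orthant, so the trajectory converges to a stable steady state. Starting at (0.5, 0.5), the focal point (2, 2) lies outside the orthant, so the trajectory eventually crosses a threshold and continues under the equations of a neighbouring orthant, a switch that `solve_in_orthant` does not handle.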


7.3 PLDE - Example

State equations corresponding to the orthant $0 \le x_1 < \theta_{21}$, $\theta_{12} < x_2 \le max_2$ and $\theta_{33} < x_3 \le max_3$



8.1 Qualitative Differential Equations (QDE) - Definition

Incomplete understanding of GRNs and absence of quantitative knowledge: need for qualitative simulation techniques

Idea behind QDEs: abstract a discrete description from a continuous model

The discrete abstraction is then used to draw conclusions about the dynamics of the system

QDEs are abstractions of ODEs of the form:

$$\dot{x}_i = f_i(x), \quad 1 \le i \le n$$

where $f_i: \mathbb{R}^n \to \mathbb{R}$, and the variables $x_i$ take qualitative values, each composed of a qualitative magnitude and a qualitative direction



8.2 QDE - Properties

The qualitative magnitude of a variable $x_i$ is a discrete abstraction of its real value; the qualitative direction is the sign of its derivative

The function $f_i$ is abstracted into a set of qualitative constraints

An algorithm (QSIM) generates a tree of qualitative behaviors from an initial qualitative state consisting of qualitative values

Each behavior in the tree describes a possible sequence of state transitions from the initial state

Every qualitatively distinct behavior of the ODE corresponds to a behavior in the tree generated from the QDE (the reverse may not be true)
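QSIM derives behaviors from the qualitative constraints alone, without any numbers. The sketch below only illustrates the abstraction itself: it maps numeric values and derivatives onto qualitative magnitudes (landmarks or open intervals between them) and directions. The landmarks and the sample trajectory are hypothetical.

```python
import math

LANDMARKS = [("0", 0.0), ("theta", 1.0), ("max", 2.0)]  # hypothetical landmark values


def qualitative_value(x, dx, landmarks=LANDMARKS, tol=1e-9):
    """Abstract (x, dx/dt) into a (qualitative magnitude, qualitative direction) pair.

    The magnitude is a landmark name or an open interval between two landmarks;
    the direction is "inc", "std" or "dec" according to the sign of dx/dt.
    """
    magnitude = next((name for name, v in landmarks if abs(x - v) <= tol), None)
    if magnitude is None:
        for (lo_name, lo), (hi_name, hi) in zip(landmarks, landmarks[1:]):
            if lo < x < hi:
                magnitude = (lo_name, hi_name)
                break
    direction = "inc" if dx > tol else "dec" if dx < -tol else "std"
    return magnitude, direction


def abstract_trajectory(samples, landmarks=LANDMARKS):
    """Map numeric (x, dx/dt) samples to qualitative states, dropping consecutive repeats."""
    behavior = []
    for x, dx in samples:
        q = qualitative_value(x, dx, landmarks)
        if not behavior or behavior[-1] != q:
            behavior.append(q)
    return behavior


def sample_solution(ts):
    """Numeric solution x(t) = 2(1 - e^-t) of the hypothetical ODE dx/dt = 2 - x."""
    return [(2 * (1 - math.exp(-t)), 2 * math.exp(-t)) for t in ts]
```

Sampling $x(t) = 2(1 - e^{-t})$ every 0.1 time units on $[0, 5]$ yields the qualitative behavior (0, inc) → ((0, theta), inc) → ((theta, max), inc).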


8.3 QDE - Summary

Problems

Limited scalability: behavior trees quickly grow out of bounds

Solutions

Using a simulation algorithm tailored to the equations, larger networks with complex feedback loops can be treated

Advantages

Allow the use of weak numerical information

Integration of numerical information is more difficult to achieve in logical approaches


8.4 QDE

HYPGENE / GENSIM

Qualitative process theory is used for the construction and revision of gene regulation models

GENSIM uses a user definition and a knowledge base to simulate a proposed experiment

If the predictions do not match the experimental observations, the HYPGENE algorithm generates hypotheses to explain the discrepancies

HYPGENE revises assumptions about the experimental conditions

Helps to refine the model

Both programs have been able to partially reproduce the experimental reasoning about the attenuation mechanism regulating the synthesis of tryptophan in E. coli


9.1 Partial Differential Equations (PDE)

Motivation

So far, regulatory systems were assumed to be spatially homogeneous

In certain situations it is important to drop this assumption:
  to distinguish between different compartments of a cell, for example nucleus and cytoplasm, or between multiple cells affecting each other
  to model the diffusion of regulatory proteins or metabolites from one compartment to another

This is a critical feature in embryonic development


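9.2 Partial Differential Equations (PDE)

Definition

The reaction-diffusion equation (for a row of $p$ cells):

$$\frac{dx_i^{(l)}}{dt} = f_i(x^{(l)}) + \delta_i \left( x_i^{(l+1)} - 2x_i^{(l)} + x_i^{(l-1)} \right), \quad 1 \le i \le n, \ 1 \le l \le p$$

where $x^{(l)}$ is the vector of concentrations in cell $l$ and $\delta_i$ the diffusion constant of the product of gene $i$

Can be adapted to other one-dimensional or higher-dimensional spatial configurations

If the number of cells is large enough, the discrete variable $l$ can be replaced by a continuous spatial variable $\lambda$ ranging over the size of the system

The concentration variables are then defined as functions of $\lambda$ and $t$, and the reaction-diffusion equations become a partial differential equation (PDE):

$$\frac{\partial x_i}{\partial t} = f_i(x) + \delta_i \frac{\partial^2 x_i}{\partial \lambda^2}, \quad 1 \le i \le n$$

Using modes, or eigenfunctions, of the Laplacian operator gives: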




9.3 Partial Differential Equations (PDE)

Definition

The product of gene 1, the activator, must positively regulate itself; the product of gene 2, the inhibitor, must negatively regulate gene 1

Activator-inhibitor systems were extensively used to study the emergence of segmentation patterns in the early Drosophila embryo

The observed spatial and temporal expression patterns of genes closely resemble the modes of the model

Numerical simulations demonstrated that some aspects of stripe formation in the Drosophila blastoderm can indeed be reproduced this way
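The row-of-cells reaction-diffusion equation can be integrated numerically with an explicit Euler scheme. This is a minimal sketch: the zero-flux boundary treatment, the function name and the example are assumptions, and the explicit scheme is only stable for small steps (roughly $\delta_i \Delta t \le 1/2$ for unit cell spacing).

```python
def rd_step(cells, reaction, delta, dt):
    """One explicit-Euler step of the reaction-diffusion equations for a row of cells.

    cells[l][i] is the concentration x_i in cell l, reaction(x) returns f(x) for one
    cell, and delta[i] is the diffusion constant of species i. Zero-flux boundaries
    are assumed: an end cell uses itself as its missing neighbour.
    """
    p = len(cells)
    new_cells = []
    for l, x in enumerate(cells):
        left = cells[l - 1] if l > 0 else x
        right = cells[l + 1] if l < p - 1 else x
        f = reaction(x)
        new_cells.append([
            # reaction term plus discrete Laplacian (x^(l+1) - 2 x^(l) + x^(l-1))
            x[i] + dt * (f[i] + delta[i] * (right[i] - 2 * x[i] + left[i]))
            for i in range(len(x))
        ])
    return new_cells
```

With a `reaction` that returns zeros, a concentration spike in the first of four cells spreads out until every cell holds 0.25, while the total amount is conserved. An activator-inhibitor pair would be modeled by a `reaction` over two species with two different diffusion constants.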


9.4 Partial Differential Equations (PDE)

Properties

The formula shown is still not applicable in all situations; more complex formulas were developed for several special cases

Predictions are quite sensitive to the shape of the spatial domain, the boundary conditions and the chosen parameter values

Models need to be simple and usually are strong abstractions of biological processes (e.g. they only consider the concentrations of a few gene products)

For larger and more complex models, the computational cost of finding an optimal fit between data and parameters may be prohibitively high


10.1 Stochastic Master Equations - Motivation

Differential equations describe gene regulation in great detail

Differential equations presuppose that the concentrations of substances vary continuously and deterministically

Both assumptions are questionable in the case of gene regulation, where some molecules are present in small numbers and reactions occur randomly

So we prefer to use a discrete and stochastic approach

Discrete amounts $X$ of molecules are taken as state variables, and a joint probability distribution $p(X, t)$ is introduced to express the probability that at time $t$ the cell contains $X_1$ molecules of the first species, $X_2$ molecules of the second, etc.


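10.2 Stochastic Master Equations - Definition

The time evolution of $p(X, t)$ can be expressed as:

$$p(X, t + \Delta t) = p(X, t)\left(1 - \sum_{j=1}^{m} \alpha_j \Delta t\right) + \sum_{j=1}^{m} \beta_j \Delta t$$

where $m$ is the number of reactions,
  $\alpha_j \Delta t$ is the probability that reaction $j$ will occur in the interval $[t, t + \Delta t]$ given that the system is in state $X$ at time $t$
  $\beta_j \Delta t$ is the probability that reaction $j$ will bring the system into state $X$ from another state in $[t, t + \Delta t]$

Rearranging and taking the limit $\Delta t \to 0$ gives the master equation:

$$\frac{\partial p(X, t)}{\partial t} = \sum_{j=1}^{m} \left( \beta_j - \alpha_j\, p(X, t) \right)$$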


10.3 Stochastic Master Equations - Properties

Master equations can be approximated by stochastic differential equations

An alternative approach is to disregard the master equations and directly simulate the time evolution of the system

Based on the stochastic simulation approach, which:
  determines when the next reaction occurs and of which type it will be
  revises the state in accordance with this reaction
  continues from the resulting next state

Master equations deal with the averages over all possible behaviors, whereas stochastic simulation provides information on individual behaviors
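The three steps of the stochastic simulation approach correspond to Gillespie's direct method, sketched below; the propensity $a_j(X)$ plays the role of $\alpha_j$ from the master equation. The function and its argument layout are illustrative, not the implementation used in the reviewed paper.

```python
import random


def gillespie(x0, reactions, t_end, rng=None):
    """Gillespie's direct method for a well-mixed system of m reactions.

    reactions: list of (propensity, change) pairs, where propensity(x) returns
    a_j(x) >= 0 (the alpha_j of the master equation in state x) and change is
    the vector added to x when reaction j fires.
    Returns the (time, state) pairs visited up to t_end.
    """
    rng = rng or random.Random()
    t, x = 0.0, list(x0)
    history = [(t, tuple(x))]
    while True:
        props = [a(x) for a, _ in reactions]
        total = sum(props)
        if total <= 0:
            break  # no reaction can occur any more
        # When does the next reaction occur? Exponential waiting time with rate `total`.
        t += rng.expovariate(total)
        if t > t_end:
            break
        # Which reaction? Reaction j is chosen with probability a_j / total.
        j = rng.choices(range(len(reactions)), weights=props)[0]
        # Revise the state and continue from it.
        x = [xi + di for xi, di in zip(x, reactions[j][1])]
        history.append((t, tuple(x)))
    return history
```

For example, for a hypothetical protein produced at a constant rate of 10 and degraded at rate $1 \cdot X$, the reactions are `[(lambda x: 10.0, (+1,)), (lambda x: 1.0 * x[0], (-1,))]`. A single run is one individual behavior fluctuating around 10 molecules; averaging many runs approximates the distribution described by the master equation (for this birth-death process, a Poisson distribution with mean 10 at stationarity).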



10.4 Stochastic Master Equations - Summary

Advantages

Simulation results are closer approximations to the molecular reality of gene regulation

Disadvantages

The need for stochastic simulation is not always evident

Requires detailed knowledge of the reaction mechanisms

Simulation is computationally costly


11.1 Rule-Based Formalisms (RBF) - Definition

Knowledge-based or rule-based simulation formalisms allow rich knowledge about a system to be expressed in a single formalism

Consist of two components: facts and rules

The rules consist of two parts: a condition and an action

Advantages

Capability to deal with a richer variety of biological knowledge

Disadvantages

Difficulties in maintaining a consistent knowledge base

RBFs cannot compete quantitatively with the formalisms discussed before
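The fact/rule structure can be illustrated with a tiny forward-chaining engine. Facts are strings, each rule is a (condition, action) pair, and the example rules are hypothetical; real knowledge-based simulators are considerably richer.

```python
# Hypothetical facts and (condition, action) rules; names are illustrative only.
RULES = [
    # if signal S is present, protein A becomes active
    (lambda facts: "S present" in facts, lambda facts: {"A active"}),
    # if A is active and repressor R is absent, gene B is expressed
    (lambda facts: "A active" in facts and "R present" not in facts,
     lambda facts: {"B expressed"}),
]


def forward_chain(facts, rules, max_rounds=100):
    """Fire every rule whose condition holds until no new facts are derived.

    Actions here only add facts (a simplification; real rule-based systems may
    also retract or modify facts).
    """
    facts = set(facts)
    for _ in range(max_rounds):
        derived = set()
        for condition, action in rules:
            if condition(facts):
                derived |= action(facts)
        if derived <= facts:
            break  # fixed point: nothing new can be concluded
        facts |= derived
    return facts
```

Starting from `{"S present"}`, the engine derives "A active" and then "B expressed"; adding the fact "R present" blocks the second rule.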



12.1 Conclusions

Major difficulties in modeling and simulating genetic regulatory networks:
  biochemical reaction mechanisms are unknown or only incompletely known
  quantitative information on kinetic parameters and molecular concentrations is only seldom available

The formalisms discussed allow GRSes to be modeled in quite different ways, depending on the application:


12.2 Expectations

The emergence of new experimental techniques promises to relieve the data bottleneck

Increasing knowledge of the molecular mechanisms allows regulatory systems to be modeled at a finer level of granularity

The use of quantitative models permits larger systems to be studied with higher precision

These developments will bring researchers nearer to the ultimate goal: models that integrate gene regulation with metabolism, signal transduction, replication and repair, and a variety of other cellular processes

Each of the approaches above has its merits, but neither of them seems sufficient in itself

It can be expected that a combination of the two approaches, exploiting a wide range of structural and functional information on regulatory networks, will be most effective


13.1 References / Acknowledgements

References (also the source of all images)

Hidde de Jong. Modeling and simulation of genetic regulatory systems: a literature review. J Comput Biol. 2002;9(1):67-103. Review.

Acknowledgements

Thanks to Marite Sirava and Thomas Schäfer at ZBI of Universität des Saarlandes for supporting us in working out this talk