Modeling and Simulation of Genetic Regulatory Systems
Paper by Hidde de Jong
Reviewed by Ulrich Basters and Christian Hahn
Seminar Bioinformatics

Modelling and Simulation of Genetic Regulatory Systems

Christian Hahn, Ulrich Basters, 08/27/2003
0.1 Overview
Introduction
Directed and undirected graphs
Bayesian networks
Boolean networks
Generalized logical networks
Nonlinear ordinary differential equations
Piecewise linear differential equations
Qualitative differential equations
Partial differential equations
Stochastic master equations
Rule-based formalisms
Conclusion
1.1 Genetic Regulatory Systems
To understand how organisms function at the molecular level, we need to know which genes are expressed, when and where in the organism, and to what extent.
The regulation of gene expression is achieved through genetic regulatory systems, structured by networks of interactions between DNA, RNA, proteins, and small molecules.
An intuitive understanding of the overall dynamics is hard to obtain.
Consequence: formal methods and computer tools for modeling and simulation offer a possible approach.
1.2 Genetic Regulatory Systems
Genes influence each other because they produce proteins that act as activators or repressors of other genes.
The result is a complex system in which different concentrations of an agent trigger different actions.
1.3 Motivation
Genetic regulatory systems (GRSes) are hard to understand in their full complexity
GOAL: Complexity reduction by appropriate models and formalisms
Better understanding of GRSes
Intuitive visualization of GRSes
Better analysis of GRSes
Models can give hints on where to continue the search for dependencies
Models point out important parts of the system
Gaining an understanding of how complex patterns of behavior emerge from interactions between genes in a regulatory network
1.4 Modeling Life Cycle
Process model for refining the development of a technique that models a genetic regulatory network (GRN):
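2.1 Directed and Undirected Graphs – Motivation/Definition
Probably the most straightforward way to model a GRN
A graph is a pair $G = \langle V, E \rangle$, where $V$ is a set of vertices and $E$ a set of edges
A directed edge is a tuple $\langle i, j \rangle$ with $i, j \in V$, the head and tail of the edge
Additional labels on the edges denote positive/negative influence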
2.2 Directed and Undirected Graphs – Summary
Advantages:
Intuitive way of visualization
Common and well-explored graph algorithms can make biologically relevant predictions about GRSes:
paths between genes may reveal missing regulatory interactions or provide clues about redundancy
cycles in the network point to feedback relations
connectivity characteristics give an indication of the complexity
loosely connected subgraphs point to functional modules
Disadvantages:
Time does not play a role
Too much abstraction: a very simplified model, far from reality
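The path and cycle analyses listed among the advantages can be sketched in a few lines of Python. This is only a minimal illustration: the four-gene network `EDGES`, its signs, and all function names are hypothetical and not taken from the paper.

```python
from collections import deque

# Hypothetical toy network: (regulator, target) -> sign of the influence.
EDGES = {
    ("a", "b"): "+",
    ("b", "c"): "-",
    ("c", "a"): "+",
    ("c", "d"): "+",
}

def successors(edges):
    """Adjacency list of the directed graph."""
    adj = {}
    for src, dst in edges:
        adj.setdefault(src, []).append(dst)
        adj.setdefault(dst, [])
    return adj

def find_path(edges, start, goal):
    """Breadth-first search; returns a shortest regulatory path or None."""
    adj = successors(edges)
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        for nxt in adj.get(path[-1], []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None

def on_feedback_cycle(edges, gene):
    """A gene lies on a feedback cycle if it can reach itself again."""
    adj = successors(edges)
    return any(find_path(edges, nxt, gene) for nxt in adj.get(gene, []))

def path_sign(edges, path):
    """Overall influence along a path: negative if it has an odd number of '-' edges."""
    negatives = sum(edges[(u, v)] == "-" for u, v in zip(path, path[1:]))
    return "-" if negatives % 2 else "+"
```

In this toy network, `find_path(EDGES, "a", "d")` finds the indirect route from `a` to `d`, and `path_sign(EDGES, ["a", "b", "c", "a"])` classifies the cycle through `a` as a negative feedback loop. Note that nothing in the sketch involves time, which is exactly the limitation named above.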
3.1 Bayesian Networks – Definition
Directed acyclic graph $G = \langle V, E \rangle$
Vertices $i \in V$, $1 \le i \le n$, represent genes or other elements and correspond to random variables $X_i$
For each $X_i$ there is a conditional distribution $p(X_i \mid \text{parents}(X_i))$, where $\text{parents}(X_i)$ denotes the direct regulators of $i$
Conditional independency: $i(X_i; Y \mid Z)$ expresses the fact that $X_i$ is independent of $Y$ given $Z$
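A standard consequence of this definition is that the joint distribution factorizes as $p(X) = \prod_i p(X_i \mid \text{parents}(X_i))$. The sketch below illustrates this for a hypothetical three-gene network in which gene A regulates B and C; the structure, the probability values, and the function names are all made up for illustration.

```python
from itertools import product

# Hypothetical network: gene A regulates B and C (1 = expressed, 0 = not expressed).
PARENTS = {"A": (), "B": ("A",), "C": ("A",)}

# CPT[gene][parent values] = probability that the gene is expressed (made-up numbers).
CPT = {
    "A": {(): 0.3},
    "B": {(0,): 0.1, (1,): 0.8},
    "C": {(0,): 0.5, (1,): 0.2},
}

def joint(assignment):
    """p(X) as the product of p(X_i | parents(X_i)) over all genes."""
    prob = 1.0
    for gene, parents in PARENTS.items():
        p_on = CPT[gene][tuple(assignment[p] for p in parents)]
        prob *= p_on if assignment[gene] == 1 else 1.0 - p_on
    return prob

def prob_of(query):
    """Marginal probability of a partial assignment, by brute-force summation."""
    genes = list(PARENTS)
    total = 0.0
    for values in product((0, 1), repeat=len(genes)):
        full = dict(zip(genes, values))
        if all(full[g] == v for g, v in query.items()):
            total += joint(full)
    return total
```

With these numbers, B and C are conditionally independent given A: `prob_of({"A": 1, "B": 1, "C": 1}) / prob_of({"A": 1})` equals the product of the two single-gene conditionals. Brute-force summation is only feasible for toy networks.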
3.2 Bayesian Networks – Markov Assumption
The graph encodes the Markov assumption, stating that for every gene $i$ in $G$ the conditional independency $i(X_i; \text{non-descendants}(X_i) \mid \text{parents}(X_i))$ holds
The method is used to analyse dependencies between genes; it is not applicable to system simulation
Learning techniques rely on a matching score to evaluate networks and search for the network with the optimal score
Graphs are said to be equivalent if they imply the same set of independencies, thus forming an equivalence class (useful for determining important subgraphs)
Looking at Markov and order relations between pairs of genes may point to a relationship between the genes
3.3 Bayesian Networks – Summary
Advantages
Attractive because of their solid basis in statistics (makes it possible to deal with stochastic aspects and noisy measurements in a natural way)
Applicable even if only incomplete knowledge about the system is available
Highlights important parts of the system – usually only a few genes play an important role in large systems
Disadvantages
Incomplete knowledge underdetermines the network (at best a few dozen experiments provide information on the transcription of thousands of genes)
The search is known to be NP-hard. Heuristics are used, but they do not guarantee finding a globally optimal solution
Static network – leaves out dynamic aspects → addressed by Dynamic Bayesian Networks
4.1 Boolean Networks – Definition
The state of a gene can be expressed by a Boolean variable indicating that it is active (=1) or inactive (=0)
Interactions between genes can be represented by Boolean functions that calculate the state of a gene from the activation of other genes
The result is a Boolean network
4.2 Boolean Networks – Definition/Properties
The method is similar to digital circuitry
An $n$-vector of variables in a Boolean network represents the state of a regulatory system of $n$ elements, each of which has the value 0 or 1
So the system has $2^n$ possible states
The state of an element at time point $t+1$ is computed by a Boolean function, or rule, from the states of $k$ of the $n$ elements at time point $t$; the rule maps $k$ inputs to an output value
4.3 Boolean Networks – Properties
Transitions between states are deterministic and synchronous (outputs of elements are updated simultaneously)
A sequence of states forms a trajectory of the system
Because the number of states is finite, a trajectory will reach either a steady state (point attractor) or a state cycle (dynamic attractor)
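Synchronous updating and the two kinds of attractor can be illustrated with a small simulation. The three rules in `RULES` describe a hypothetical network (genes 0 and 1 activate each other, gene 2 needs both) chosen only because it shows both point attractors and a state cycle; it is not an example from the paper.

```python
# Hypothetical 3-gene network; each rule maps the state at time t to one gene's state at t+1.
RULES = (
    lambda s: s[1],           # gene 0 is activated by gene 1
    lambda s: s[0],           # gene 1 is activated by gene 0
    lambda s: s[0] and s[1],  # gene 2 needs both gene 0 and gene 1
)

def step(state):
    """Synchronous update: all genes are updated simultaneously from the old state."""
    return tuple(int(bool(rule(state))) for rule in RULES)

def attractor(state):
    """Follow the trajectory until a state repeats; return the repeating part."""
    seen = []
    while state not in seen:
        seen.append(state)
        state = step(state)
    return seen[seen.index(state):]
```

Starting from `(1, 1, 0)` the trajectory settles in the steady state `(1, 1, 1)`, whereas `(1, 0, 0)` alternates with `(0, 1, 0)` forever. Because there are only $2^3 = 8$ states, the loop in `attractor` is guaranteed to terminate.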
4.4 Boolean Networks – Summary
Advantages
Efficient analysis of large regulatory networks (RNs)
Positive and negative feedback cycles can be modeled with Boolean networks
Disadvantages
Strong simplifying assumptions – a gene is either on or off, with no intermediate states
Transitions are assumed to occur synchronously – this is not usually the case, so certain behaviors may not be predicted by the simulation algorithm
There are situations where the Boolean idealisation is not appropriate – more general methods are required
5.1 Generalized Logical Networks (GLN) – Definition
Generalizes Boolean networks – allows variables to have more than 2 values
Transitions between states occur asynchronously
Discrete, so-called logical variables are abstractions of the real concentration values $x_i$
The possible values of element $i$ are defined by the thresholds of its influence on other elements – if an element influences $p$ other elements, it may have $p$ different thresholds
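5.2 GLN – Definition
Formally: if an element $i$ influences $p$ other elements, then it will have $p$ distinct thresholds $\theta_i^{(1)} < \theta_i^{(2)} < \dots < \theta_i^{(p)}$
The logical variable $\hat{x}_i$ has the possible values $\{0, \dots, p\}$ and is defined by:
$$\hat{x}_i = \begin{cases} 0, & x_i < \theta_i^{(1)} \\ 1, & \theta_i^{(1)} < x_i < \theta_i^{(2)} \\ \;\vdots \\ p, & x_i > \theta_i^{(p)} \end{cases}$$
The vector $\hat{x}$ denotes the logical state of the RN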
5.3 GLN – Definition
The pattern of interactions is described by logical equations of the form:
$$\hat{X}_i(t) = b_i(\hat{x}(t)), \quad 1 \le i \le n$$
$\hat{X}_i$ is called the image of the logical variable $\hat{x}_i$; it denotes the value towards which $\hat{x}_i$ tends when the logical state is $\hat{x}$
Positive and negative feedback loops can be modeled
Refinement of the simple on/off variables in Boolean networks
5.4 GLN – Properties
A logical steady state occurs when the logical state equals its image
Since the number of logical states is finite, each of them can be tested for being a logical steady state; all other states are called transient logical states
If the system is in a transient logical state, it will make a transition into another logical state
Since a logical variable moves in the direction of its image, the successor states can be deduced by comparing the value of each logical variable with that of its image
The logical states and the transitions among them can be organized in a state transition graph
In the analysis of state transitions, time delays arising from translation and transport can be taken into account
Improves on the standard Boolean network model
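The steps above (discretizing a concentration by its thresholds, comparing a logical state with its image, and deriving asynchronous successor states) can be sketched as follows. The two-element network encoded in `image` is hypothetical: element 0 has two thresholds (values 0 to 2), element 1 has one (values 0 to 1), and the two elements repress each other. The function names are assumptions made for this illustration.

```python
from bisect import bisect_right
from itertools import product

def logical_value(concentration, thresholds):
    """Number of thresholds exceeded, i.e. a value in 0..p for p thresholds.
    (A concentration exactly on a threshold is counted as above it here.)"""
    return bisect_right(sorted(thresholds), concentration)

def image(state):
    """Hypothetical logical equations of a mutual-repression network."""
    x0, x1 = state
    img0 = 2 if x1 == 0 else 0   # element 1 represses element 0
    img1 = 1 if x0 < 2 else 0    # element 0 represses element 1 above its 2nd threshold
    return (img0, img1)

def is_steady(state):
    """A logical steady state equals its own image."""
    return image(state) == state

def successors(state):
    """Asynchronous update: a single variable moves one step towards its image."""
    result = []
    for i, (value, target) in enumerate(zip(state, image(state))):
        if value != target:
            nxt = list(state)
            nxt[i] = value + (1 if target > value else -1)
            result.append(tuple(nxt))
    return result

# Exhaustive enumeration is possible because the number of logical states is finite.
STATES = list(product(range(3), range(2)))
GRAPH = {s: successors(s) for s in STATES}   # state transition graph
```

For this network the enumeration finds two logical steady states, `(0, 1)` and `(2, 0)`; the transient state `(0, 0)` has two possible successors, which reflects the non-determinism introduced by asynchronous updating.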
6.1 Nonlinear Ordinary Differential Equations – Definition
Models the concentrations of RNA, proteins and other molecules by time-dependent variables
Gene regulation is modeled by rate equations, expressing the rate of production of a component as a function of the concentrations of other components
Rate equations have the following form:
$$\frac{dx_i}{dt} = f_i(x), \quad 1 \le i \le n$$
where $x = [x_1, \dots, x_n] \ge 0$ denotes the vector of concentrations and $f_i : \mathbb{R}^n \to \mathbb{R}$ is a usually nonlinear function
Discrete time delays $\tau_{i1}, \dots, \tau_{in} > 0$ can also be represented:
$$\frac{dx_i}{dt} = f_i\bigl(x_1(t - \tau_{i1}), \dots, x_n(t - \tau_{in})\bigr), \quad 1 \le i \le n$$
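6.2 ODE – Definition
Goal: specifying the functions $f_i$, for example:
$$\frac{dx_1}{dt} = k_{1,n}\, r(x_n) - \gamma_1 x_1$$
$$\frac{dx_i}{dt} = k_{i,i-1}\, x_{i-1} - \gamma_i x_i, \quad 1 < i \le n$$
$k_{1,n}, k_{2,1}, \dots, k_{n,n-1} > 0$ are production constants and $\gamma_1, \dots, \gamma_n > 0$ are degradation constants
The rate equations express a balance between the number of molecules appearing and disappearing per unit time
For $x_1$, a regulation function $r : \mathbb{R} \to \mathbb{R}$ is involved, whereas the production rate for $i > 1$ increases linearly in $x_{i-1}$
An often used regulation function is the so-called Hill curve:
$$h^+(x_j, \theta_j, m) = \frac{x_j^m}{x_j^m + \theta_j^m}$$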
6.3 ODE – Definition
$\theta_j > 0$ describes the threshold for the regulatory influence of $x_j$ on a target gene
$m > 0$ is a steepness parameter
The function $h^+$ ranges from 0 to 1
An increase in $x_j$ ($x_j \to \infty$) will tend to increase the expression rate of a gene (activation)
To express that an increase in $x_j$ will tend to decrease the expression rate (inhibition), the regulation function is replaced by:
$$h^-(x_j, \theta_j, m) = 1 - h^+(x_j, \theta_j, m)$$
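As a rough numerical illustration, the sketch below integrates a three-component chain in which the last component inhibits production of the first. It assumes the standard Hill form $h^+(x, \theta, m) = x^m / (x^m + \theta^m)$ with $h^- = 1 - h^+$; the forward-Euler scheme, the parameter values, and the function names are choices made for this example, not part of the reviewed paper.

```python
def hill_plus(x, theta, m):
    """Activating Hill curve: rises from 0 towards 1, half-maximal at x = theta."""
    return x**m / (x**m + theta**m)

def hill_minus(x, theta, m):
    """Inhibiting Hill curve: falls from 1 towards 0."""
    return 1.0 - hill_plus(x, theta, m)

def simulate(x, k, gamma, theta, m, dt=0.01, steps=20000):
    """Forward-Euler integration of a chain with end-product inhibition.

    k[0] scales the regulated production of x[0] (inhibited by the last component);
    k[i] for i > 0 scales the linear production of x[i] from x[i-1].
    """
    n = len(x)
    x = list(x)
    for _ in range(steps):
        dx = [k[0] * hill_minus(x[n - 1], theta, m) - gamma[0] * x[0]]
        dx += [k[i] * x[i - 1] - gamma[i] * x[i] for i in range(1, n)]
        x = [xi + dt * dxi for xi, dxi in zip(x, dx)]
    return x
```

With all production and degradation constants set to 1, $\theta = 1$ and $m = 2$, the three concentrations settle to a common steady-state value $x^*$ satisfying $x^* = 1 / (1 + x^{*2})$. Forward Euler is used only for brevity; a library ODE solver would normally be preferable.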
6.4 ODE – Properties
Advantages
More "realistic" way of modeling
Disadvantages
Lack of in vivo or in vitro measurements of the kinetic parameters in the rate equations
Numerical parameter values are available for only a handful of well-studied systems (e.g. λ phage)
In most cases parameter values had to be chosen such that the models were able to reproduce the observed qualitative behavior
For larger models, finding appropriate values may be difficult
Solution
The growing availability of data could alleviate the problem to some extent
7.1 Piecewise-Linear Differential Equations (PLDE) – Definition
Special case of rate equations, with two simplifications:
Interactions are modeled by directly relating the expression levels of the genes in the network
Continuous sigmoid curves are approximated by discontinuous step functions
PLDEs have the following form:
$$\frac{dx_i}{dt} = g_i(x) - \gamma_i x_i, \quad 1 \le i \le n$$
where $x_i$ denotes the cellular concentration of the product of gene $i$ and $\gamma_i > 0$ the degradation rate
The function $g_i : \mathbb{R}^n_{\ge 0} \to \mathbb{R}_{\ge 0}$ is defined as:
$$g_i(x) = \sum_{l \in L} k_{il}\, b_{il}(x)$$
where $k_{il} > 0$ is a rate parameter and $b_{il} : \mathbb{R}^n_{\ge 0} \to \{0, 1\}$ a combination of step functions
$b_{il}$ is the arithmetic equivalent of a Boolean function, expressing the conditions under which the gene is expressed at rate $k_{il}$
7.2 PLDE – Graphical simplification
Consider an $n$-dimensional hyperbox defined by:
$$0 \le x_i \le \max_i, \quad 1 \le i \le n$$
Assume that for all threshold concentrations $\theta_{ik}$ of the protein encoded by gene $i$ it holds that $\theta_{ik} < \max_i$
The $(n-1)$-dimensional hyperplanes defined by the thresholds divide the box into orthants
Within each orthant the PLDEs reduce to ODEs with a constant production term $\mu_i$ composed of rate parameters in $b_i$:
$$\frac{dx_i}{dt} = \mu_i - \gamma_i x_i, \quad 1 \le i \le n$$
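The behavior inside a single orthant can be illustrated with a hypothetical two-gene network in which each gene represses the other through a step function. The network, the parameter values, and the names `step_minus`, `derivatives` and `simulate` are assumptions made for this sketch.

```python
def step_minus(x, theta):
    """Decreasing step function: 1 below the threshold, 0 above it."""
    return 1.0 if x < theta else 0.0

def derivatives(x, k, gamma, theta):
    """dx_i/dt = g_i(x) - gamma_i * x_i for two mutually repressing genes."""
    x1, x2 = x
    g1 = k[0] * step_minus(x2, theta[1])   # gene 1 is expressed while x2 is below its threshold
    g2 = k[1] * step_minus(x1, theta[0])   # gene 2 is expressed while x1 is below its threshold
    return (g1 - gamma[0] * x1, g2 - gamma[1] * x2)

def simulate(x, k, gamma, theta, dt=0.001, steps=20000):
    """Forward-Euler integration of the piecewise-linear system."""
    for _ in range(steps):
        d = derivatives(x, k, gamma, theta)
        x = (x[0] + dt * d[0], x[1] + dt * d[1])
    return x
```

Starting at `(1.5, 0.2)` with `k = (2, 2)`, `gamma = (1, 1)` and thresholds `(1, 1)`, the system stays in the orthant $x_1 > \theta_1$, $x_2 < \theta_2$, where the production terms are constant ($\mu = (2, 0)$). Each variable then relaxes exponentially towards $\mu_i / \gamma_i$, so the state approaches $(2, 0)$.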
7.3 PLDE – Example
State equations corresponding to the orthant $0 \le x_1 < \theta_{21}$, $\theta_{12} < x_2 \le \max_2$ and $\theta_{33} < x_3 \le \max_3$
8.1 Qualitative Differential Equations (QDE) – Definition
Incomplete understanding of GRNs and the absence of quantitative knowledge → need for qualitative simulation techniques
Idea behind QDE: abstract a discrete description from a continuous model
The discrete abstraction is then used to draw conclusions about the dynamics of the system
QDEs are abstractions of ODEs of the form:
$$\frac{dx_i}{dt} = f_i(x), \quad 1 \le i \le n$$
where $f_i : \mathbb{R}^n \to \mathbb{R}$ and the variables $x_i$ take a qualitative value composed of a qualitative magnitude and direction
8.2 QDE – Properties
The qualitative magnitude of a variable $x_i$ is a discrete abstraction of its real value; the qualitative direction is the sign of its derivative
The function $f_i$ is abstracted into a set of qualitative constraints
The QSIM algorithm generates a tree of qualitative behaviors from an initial qualitative state consisting of qualitative values
Each behavior in the tree describes a possible sequence of state transitions from the initial state
Every qualitatively distinct behavior of the ODE corresponds to a behavior in the tree generated from the QDE (the reverse may not be true)
8.3 QDE – Summary
Problems
Limited upscalability: behavior trees quickly grow out of bounds
Solutions
Using a simulation algorithm tailored to the equations, larger networks with complex feedback loops can be treated
Advantages
Allows weak numerical information to be used
Integrating numerical information is more difficult to achieve in logical approaches
8.4 QDE – HYPGENE / GENSIM
Qualitative process theory is used for the construction and revision of gene regulation models
A user definition and a knowledge base are used by GENSIM to simulate a proposed experiment
If the predictions do not match the observations, the HYPGENE algorithm generates hypotheses to explain the discrepancies
HYPGENE revises assumptions about the experimental conditions
Helps to refine the model
Both algorithms have been able to partially reproduce the experimental reasoning on the attenuation mechanism regulating the synthesis of tryptophan in E. coli
9.1 Partial Differential Equations (PDE) – Motivation
So far, regulatory systems have been assumed to be spatially homogeneous
In certain situations it is important to drop this assumption
One may need to distinguish between different compartments of a cell, for example nucleus and cytoplasm, or between multiple cells affecting each other
Diffusion of regulatory proteins or metabolites from one compartment to another
This is a critical feature in embryonic development
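9.2 Partial Differential Equations (PDE) – Definition
The reaction-diffusion equations (for a row of $p$ cells with cell index $l$ and diffusion constants $\delta_i$):
$$\frac{dx_i^{(l)}}{dt} = f_i(x^{(l)}) + \delta_i\bigl(x_i^{(l+1)} - 2x_i^{(l)} + x_i^{(l-1)}\bigr), \quad 1 \le i \le n,\ 1 < l < p$$
Can be adapted to other one- or higher-dimensional spatial configurations
If the number of cells is large enough, the discrete variable $l$ can be replaced by a continuous variable $\lambda$ ranging over the size of the system
The concentration variables are now defined as functions of $\lambda$ and $t$, and the reaction-diffusion equations become a partial differential equation (PDE):
$$\frac{\partial x_i}{\partial t} = f_i(x) + \delta_i \frac{\partial^2 x_i}{\partial \lambda^2}, \quad 1 \le i \le n$$
Using modes or eigenfunctions of the Laplacian operator gives: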
9.3 Partial Differential Equations (PDE) – Definition
The product of gene 1, the activator, must positively regulate itself; the product of gene 2, the inhibitor, must negatively regulate gene 1
Activator-inhibitor systems have been used extensively to study the emergence of segmentation patterns in the early Drosophila embryo
Observed spatial and temporal expression patterns of genes closely resemble the modes of the model
Numerical simulations have demonstrated that some aspects of stripe formation in the Drosophila blastoderm can indeed be reproduced this way
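The discrete reaction-diffusion scheme for a row of cells can be sketched for a single diffusing species, where each cell exchanges material with its two neighbours at a rate proportional to the concentration difference. This is deliberately simpler than an activator-inhibitor system: the single species, the zero-flux boundary condition, the forward-Euler scheme, and the names `simulate_row`, `reaction` and `delta` are all assumptions of this illustration.

```python
def simulate_row(x, reaction, delta, dt=0.01, steps=1000):
    """Forward-Euler integration of
    dx_l/dt = reaction(x_l) + delta * (x_{l+1} - 2 x_l + x_{l-1})
    for a row of cells with zero-flux boundaries (an assumption of this sketch)."""
    x = list(x)
    p = len(x)
    for _ in range(steps):
        new = []
        for l in range(p):
            left = x[l - 1] if l > 0 else x[l]        # no neighbour: mirror own value
            right = x[l + 1] if l < p - 1 else x[l]
            diffusion = delta * (left - 2.0 * x[l] + right)
            new.append(x[l] + dt * (reaction(x[l]) + diffusion))
        x = new
    return x
```

With no reaction term (`reaction=lambda c: 0.0`), a concentration spike in the middle cell spreads out until the profile is nearly flat, while the total amount is conserved. Pattern formation as described above additionally requires suitable nonlinear reaction terms for at least two interacting species.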
9.4 Partial Differential Equations (PDE) – Properties
The formula shown is still not applicable in all situations; more complex formulas have been developed for several special cases
Predictions are quite sensitive to the shape of the spatial domain, the boundary conditions and the chosen parameter values
Models need to be simple and are usually strong abstractions of the biological processes (e.g. they only consider the concentrations of a few gene products)
For larger and more complex models, the computational costs of finding an optimal fit between data and parameters may be prohibitively high
10.1 Stochastic Master Equations – Motivation
Differential equations describe gene regulation in great detail
Differential equations presuppose that the concentrations of substances vary continuously and deterministically
Both assumptions are questionable in the case of gene regulation
So a discrete and stochastic approach is preferable
Discrete amounts $X$ of molecules are taken as state variables, and a joint probability distribution $p(X, t)$ is introduced to express the probability that at time $t$ the cell contains $X_1$ molecules of the first species, $X_2$ of the second, etc.
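10.2 Stochastic Master Equations – Definition
The time evolution of $p(X, t)$ can be expressed as:
$$p(X, t + \Delta t) = p(X, t)\Bigl(1 - \sum_{j=1}^{m} \alpha_j \Delta t\Bigr) + \sum_{j=1}^{m} \beta_j \Delta t$$
where $m$ is the number of reactions, $\alpha_j \Delta t$ the probability that reaction $j$ will occur in the interval $[t, t + \Delta t]$ given that the system is in state $X$ at time $t$, and $\beta_j \Delta t$ the probability that reaction $j$ will bring the system into state $X$ from another state in $[t, t + \Delta t]$
Rearranging and taking the limit $\Delta t \to 0$ gives the master equation:
$$\frac{\partial}{\partial t}\, p(X, t) = \sum_{j=1}^{m} \bigl(\beta_j - \alpha_j\, p(X, t)\bigr)$$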
10.3 Stochastic Master Equations – Properties
Master equations can be approximated by stochastic differential equations
An alternative approach is to disregard the master equation and directly simulate the time evolution of the system
This is the basis of the stochastic simulation approach, which:
Determines when the next reaction occurs and of which type it will be
Revises the state in accordance with this reaction
Continues at the resulting next state
Master equations deal with the average behavior; stochastic simulation provides information on individual behaviors
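The three simulation steps listed above correspond closely to Gillespie's direct method. The sketch below applies it to a hypothetical one-species system with constant production and first-order degradation; the reaction system, the rate constants, and the function name are assumptions made for illustration.

```python
import math
import random

def gillespie(x, k_prod, k_deg, t_end, rng):
    """Stochastic simulation of production (rate k_prod) and
    degradation (rate k_deg * x) of a single molecular species."""
    t, trajectory = 0.0, [(0.0, x)]
    while True:
        rates = (k_prod, k_deg * x)            # rates of the two reactions
        total = rates[0] + rates[1]
        # When does the next reaction occur? Exponentially distributed waiting time.
        t += -math.log(1.0 - rng.random()) / total
        if t > t_end:
            return trajectory
        # Of which type is it? Chosen with probability proportional to its rate.
        if rng.random() * total < rates[0]:
            x += 1                             # production
        else:
            x -= 1                             # degradation
        trajectory.append((t, x))              # continue at the resulting state
```

A call such as `gillespie(0, 10.0, 1.0, 50.0, random.Random(0))` returns one individual trajectory of molecule counts; different seeds give different trajectories, which is the information on individual behaviors that the master equation itself does not provide. Every reaction event is simulated explicitly, which is why the approach becomes costly for large molecule numbers.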
10.4 Stochastic Master Equations – Summary
Advantages
Simulation results in closer approximations to the molecular reality of gene regulation
Disadvantages
The benefit of stochastic simulation is not always evident
Requires detailed knowledge of the reaction mechanisms
Simulation is computationally costly
11.1 Rule-Based Formalisms (RBF) – Definition
Knowledge-based or rule-based simulation formalisms permit rich knowledge about the system to be expressed in a single formalism
Consist of two components: facts and rules
The rules consist of two parts: condition and action
Advantages
Capability to deal with a richer variety of biological knowledge
Disadvantages
Difficulties in maintaining a consistent knowledge base
RBFs cannot compete quantitatively with the preceding formalisms
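The facts-and-rules structure can be illustrated with a minimal forward-chaining loop. The representation (sets of strings as facts, pairs of sets as rules) and the simplified lac operon rules are hypothetical and serve only to show the condition/action mechanism; real rule-based systems are considerably richer.

```python
def forward_chain(facts, rules):
    """Apply (condition, action) rules until no rule adds a new fact."""
    facts = set(facts)
    changed = True
    while changed:
        changed = False
        for condition, action in rules:
            # A rule fires when all facts in its condition are present.
            if condition <= facts and not action <= facts:
                facts |= action
                changed = True
    return facts

# Hypothetical, strongly simplified rules for illustration only.
RULES = [
    (frozenset({"lactose present", "glucose absent"}),
     frozenset({"lac operon transcribed"})),
    (frozenset({"lac operon transcribed"}),
     frozenset({"beta-galactosidase produced"})),
]
```

Given the facts `{"lactose present", "glucose absent"}`, both rules fire in turn; with `{"lactose present"}` alone, no rule fires. The sketch is purely qualitative, which reflects the quantitative limitation noted above.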
12.1 Conclusions
Major difficulties in modeling and simulating genetic regulatory networks:
Biochemical reaction mechanisms are not known or are incompletely known
Quantitative information on molecular concentrations is only seldom available
The formalisms discussed allow GRSes to be modeled in quite different ways, depending on the application
12.2 Expectations
The emergence of new experimental techniques promises to relieve the data bottleneck
Increasing knowledge of molecular mechanisms allows regulatory systems to be modeled at a finer level of granularity
The use of quantitative models permits larger systems to be studied at a higher precision
These developments will bring researchers nearer to the ultimate goal: to use models that integrate gene regulation with metabolism, signal transduction, replication and repair, and a variety of other cellular processes
Each of the approaches above has its merits, but neither of
them seems sufficient in itself
It can be expected that a combination of the two
approaches, exploiting a wide range of structural and
functional information on regulatory networks, will be most
effective
13.1 References / Acknowledgements
References (and source of all images)
Hidde de Jong, Modeling and simulation of genetic regulatory systems: a literature review; J Comput Biol. 2002;9(1):67-103. Review.
Acknowledgements
Thanks to Marite Sirava and Thomas Schäfer at ZBI of Universität des Saarlandes for supporting us in preparing this talk