15/07/2003 1/17
Mathematical Systems Biology: Genomic Cybernetics



Olaf Wolkenhauer

Dept. Biomolecular Sciences and Dept.
Electrical Eng. & Electronics
Control Systems Centre
UMIST
Manchester M60 1QD, UK
o.wolkenhauer@umist.ac.uk
Walter Kolch

Institute of Biomedical and Life Sciences

University of Glasgow
CRC Beatson Laboratories
Garscube Estate, Switchback Road
Glasgow G61 1BD, UK
wkolch@beatson.gla.ac.uk
Kwang-Hyun Cho

School of Electrical Engineering
University of Ulsan
Ulsan, 680-749, Korea
ckh@mail.ulsan.ac.kr




Abstract

The purpose of mathematical systems biology is to investigate genome expression and regulation
through mathematical modeling and systems theory in particular. The principal idea is to treat
gene expression and regulatory mechanisms of the cell cycle, morphological development, cell
differentiation and environmental responses as controlled dynamic systems.

Although it is common knowledge that cellular systems are dynamic and regulated processes, to
date they have not been investigated and represented as such. The kinds of experimental techniques
that have been available in molecular biology have largely determined the material reductionism
that describes gene expression by means of molecular characterization.

Instead of trying to identify genes as causal agents for some function, role, or change in
phenotype we ought to relate these observations to sequences of events. In other words, in
systems biology, instead of looking for a gene that is the reason, explanation or cause of some
phenomenon we seek an explanation in the dynamics (sequences of events ordered by time) that
led to it.

In mathematical systems biology we are aiming at developing a systems theory for the dynamics
of a cell. In this text we first define the concept of complexity in the context of gene expression
and regulation before we discuss the challenges and problems in developing mathematical models
of cellular dynamics, and provide an example to illustrate systems biology and the challenges and
perspectives of this emerging area of research.

Introduction: Action vs. Interactions

Gene expression is the process by which information stored in the DNA is transformed via RNA
into proteins. While the availability of genome sequences is without doubt a revolutionary
development in the life sciences, providing a basis for technologies such as microarrays, the
principal aim of the post-genome era is to understand the organization (structure) and dynamics
(behavior) of genetic pathways. The area of genomics reflects this shift of focus from molecular
characterization of components to an understanding of the functional activity of genes, proteins
and metabolites. This shift of focus in genomics requires a change in the way we formally
investigate cellular processes: Here we suggest a dynamic systems approach to gene expression
and regulation, an approach we refer to as systems biology or genomic cybernetics.

Further below we are going to provide an example for intra-cellular dynamics by means of a
mathematical model for a signaling pathway. However, looking at cells interacting in the
morphological development of an organism provides another example of the importance of a
dynamic-systems perspective of gene expression and regulation. For the differentiation of cells in
development, relating the genome of a cell to the reactions which occur in that cell requires a
conceptual framework for both spatial and temporal aspects, in order to capture the relationship
between an internal programme and dynamic interactions between the cell and its environment. The
environment may be other cells, physical constraints or external signals to which the cellular
system can respond. While the cells in a developing organism can possess the same genome, they
can nevertheless develop and respond completely differently from one another. To answer why and
how this can happen one ought to study gene expression as a temporal process. The principal
challenge for systems biology is then to answer the following questions [adopted from Sole 2000]:
1. How do cells act and interact within the context of the organism to generate coherent
wholes?
2. How do genes act and interact within the context of the cell so as to bring about structure
and function?
Asking how genetic pathways are dynamically regulated and spatially organized, we distinguish
between the action and interaction of genes and cells respectively (intra- and inter-cellular
dynamics). For example, considering morphological development, to what extent do genes
control the process, or do genes only participate in a reactive fashion? Many decisions in
development are induction events mediated by the contact with the surroundings. The
multicellular context therefore determines what happens to the individual cell. For example,
cancer cells have lost this ability to respond and therefore disregard tissue organisation and grow
unrestricted and invasively. It seems that cells and eventually organs have an inherent
developmental programme which they execute unless instructed otherwise.

It has been known since the 1960s that the most basic cellular processes are dynamic and feedback
controlled, and that cells display anticipatory behavior. In the 1960s, investigating regulatory
proteins and the interactions of allosteric enzymes, François Jacob and Jacques Monod introduced
the distinction between ‘structural genes’ (coding for proteins) and ‘regulatory genes’, which
control the rate at which structural genes are transcribed. This control of the rate of protein
synthesis gave the first indication of such processes being most appropriately viewed as dynamic
systems. With the lack of experimental time-course data, mathematical models of gene regulatory
networks have so far focused on ordinary or stochastic differential equations and automata
[Tyson 2001, Hasty 2001]. For such models to be specific, they consider only a small number of
genes; for simulations of many interacting genes, the relation to experimental data is lost. This
problem, also known as Zadeh’s uncertainty principle, is further discussed below. It is clearly
important to explore the principal limits of how we can balance the composition of components
on a large scale, preserving the integrity of the whole system, with the individuality of its
components, and without losing too much accuracy on the small scale. Since the two
organizational levels (gene vs. genome or cell vs. tissue/colony) are very different with regard to
how we can observe and represent them, different areas of research have evolved around these
organizational and descriptional levels. For example, while differential equations have been used
to develop accurate or predictive models of individual genes in a particular organism and context
[Tyson 2001], Boolean networks modeling hundreds and thousands of interacting genes have
been successful in capturing evolutionary aspects at the genome level [Kauffman 1995].
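
The discrete end of this spectrum can be illustrated with a minimal sketch of a synchronous Boolean network. The three genes and their update rules are hypothetical, chosen only for illustration; they are not taken from [Kauffman 1995].

```python
from itertools import product

# Hypothetical three-gene network; the rules are illustrative assumptions.
RULES = {
    "a": lambda s: not s["c"],         # c represses a
    "b": lambda s: s["a"],             # a activates b
    "c": lambda s: s["a"] and s["b"],  # a and b jointly activate c
}

def step(state):
    """Synchronously update every gene from the current state."""
    return {gene: bool(rule(state)) for gene, rule in RULES.items()}

def attractor(state, max_steps=100):
    """Iterate until a state repeats; return the cycle of states reached."""
    seen = []
    key = tuple(state[g] for g in sorted(state))
    while key not in seen and len(seen) < max_steps:
        seen.append(key)
        state = step(state)
        key = tuple(state[g] for g in sorted(state))
    return seen[seen.index(key):] if key in seen else []

# Enumerate all 2^3 initial states and collect the attractors they reach.
attractors = {frozenset(attractor(dict(zip("abc", bits))))
              for bits in product([False, True], repeat=3)}
```

In such a model the 'behavior' of the network is the set of attractors (fixed points or cycles) rather than a concentration trajectory, which is why it scales to many genes but loses quantitative contact with measured expression levels.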

The challenge is to develop a conceptual framework, which integrates these models through
abstraction (i.e., generalization). For even the simplest of biological systems we find that a whole
range of techniques, from time series analysis (regression models) and dynamic systems
theory (rate equations, behavioral models) to automata theory (finite state machines) and various
others, are likely to be considered. The validation and evaluation of any mathematical model with
experimental data will further require pattern recognition techniques such as multivariate
clustering and component analysis. There is therefore a great need to integrate
mathematical models and to formalize the modeling process itself. Possible approaches which
may be able to integrate or unify these distinct methodologies are briefly discussed in the
following section.

Integrating Organizational and Descriptional Levels of Explanation

Depending on what biological problem is investigated, a number of quite distinct mathematical
concepts are used to represent the system under consideration. While it is often possible to take
alternative perspectives on the same problem, there are situations in which a certain conceptual
framework is more ‘natural’ and has been established as the most appropriate representation. An
important question for mathematical modeling in the post-genome era is therefore to compare and
contrast different organizational and descriptional levels and to identify the most appropriate
mathematical framework. Some interesting questions arising from this are:
• Why are there alternative formal representations?
• What are the limitations of formal representations, and how do these depend on the available
experimental data as well as the descriptional and organizational level of the system
under consideration?
• How can we relate and combine different mathematical models?

An investigation into the questions above would generate a ‘wish-list’ of mathematical research
that is required to address the challenges provided by post-genome life science research. While
the question of how to integrate mathematical models is relatively new, the need to integrate
various software tools has long been recognized in the area of bioinformatics. Over the last few
years a number of software tools have been developed to describe various aspects of gene
expression and regulation. Depending on which organizational or descriptional level of the
biological problem is addressed, these tools are usually not alternatives but complement each
other. It is therefore generally recognized that there is no all-in-one package providing a solution
but instead a common interface is necessary. The ‘Systems Biology Workbench’ and ‘Systems
Biology Markup Language’ [Hucka 2001] are the result of such considerations. The present text
suggests a complementary effort at the theoretical (mathematical) level.

In modeling gene expression and regulation we are particularly interested in representing intra-
and inter-cellular dynamics by combining two modeling paradigms: Components (cells or the
expression of particular genes) are represented by continuous dynamics, i.e., rate equations
(differential or difference equations) based on the well-known enzyme kinetics in biochemistry
while multi-cellular dynamics are modeled using discrete representations such as finite state
machines (Discrete Event Modeling) [Ramadge 1989, Cao 1990, Cho 1999].
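
A minimal sketch of how the two paradigms might be combined is given below: a single rate equation is integrated with Euler's method, while a two-state machine switches synthesis on and off when threshold events occur. The equation, the thresholds and all parameter values are assumptions made for illustration, not a model from the cited literature.

```python
def simulate(t_end=50.0, dt=0.01, k_syn=1.0, k_deg=0.2, low=1.0, high=4.0):
    """Euler integration of dx/dt = k_syn*u - k_deg*x with a two-state switch.

    x is a (hypothetical) product concentration; u is 1 in state ON, 0 in OFF.
    """
    x, mode, t = 0.0, "ON", 0.0
    trace = []
    while t < t_end:
        # Discrete part: threshold-crossing events trigger state transitions.
        if mode == "ON" and x >= high:
            mode = "OFF"
        elif mode == "OFF" and x <= low:
            mode = "ON"
        # Continuous part: rate equation for the current discrete state.
        u = 1.0 if mode == "ON" else 0.0
        x += dt * (k_syn * u - k_deg * x)
        t += dt
        trace.append((t, x, mode))
    return trace
```

With these illustrative values the concentration rises towards the upper threshold, synthesis is switched off, the concentration decays to the lower threshold, and the cycle repeats; the discrete state carries the 'regulation' and the differential equation the 'transformation'.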

For a formal representation, one conceptual framework which could possibly unify these
different mathematical models is closely related to Rosen’s Metabolic-Repair or (M,R)-systems
[Rosen 1958, Wolkenhauer 2001a]. Rosen uses category theory to discuss limitations of
reductionism and modeling in the Newtonian realm. Another important application of category
theory to biological systems is the Memory Evolutive Systems (MES) of Ehresmann and
Vanbremeersch [http://perso.wanadoo.fr/vbm-ehr/AnintroT.htm]. Ehresmann and Vanbremeersch
have developed a mathematical model for open, self-organized, hierarchical autonomous systems
with memory and the ability to adapt to various conditions through a change of behavior. We
shall here adapt Rosen’s (M,R)-systems as Transformation-Regulation or (T,R)-systems to reflect
the more general application to gene expression and regulation. The formal representation of gene
expression and regulation therefore addresses two aspects: transformation and regulation. The
concept of regulation is either represented explicitly by control components, or realized implicitly
as an emergent phenomenon (e.g. self-organization).

The first step in this approach is to introduce initially two mathematical spaces (domain and
co-domain) representing either abstract or material objects. For example, we may want to relate genes
with function or substrates with products or, as in the context of time course experiments, relate
sequences of events. In any case, a component or system is subsequently represented by a
mapping between the associated spaces. This mapping represents some transformation, which
itself is regulated through further maps from the previously introduced co-domain and the set of
mappings between the two spaces. While Rosen captured this transformation-regulation process
using category theory, it is possible to derive conventional models such as automata, state-space
representations and regression models from them [Casti 1988]. In [Wolkenhauer 2001b] we
discussed how automata and state-space models can be considered as special cases (or
‘realizations’) of (T,R)-systems. The shift of focus from molecular characterization to
understanding the dynamics of pathways in genomics is reflected in the change of the definition
of the objects in the domain and co-domain to become sequences of data obtained from time
course experiments. Further below we return to the discussion about how the change of thinking
in genomics should be reflected in mathematical modeling of biological systems.
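
In symbols, the construction described above can be sketched as follows. The names A, B, f, Φ and H(A,B) follow the usual presentation of Rosen's (M,R)-systems and are assumptions here, since the text does not fix a notation.

```latex
% A: domain, B: co-domain, H(A,B): the set of mappings from A to B
% f: the transformation; \Phi: the regulation, which selects a transformation
f \colon A \to B, \qquad \Phi \colon B \to H(A,B),
\qquad\text{so that}\qquad
A \xrightarrow{\;f\;} B \xrightarrow{\;\Phi\;} H(A,B).
```

For time course experiments, the elements of A and B would be sequences of measurements rather than individual objects.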

Constraints on the nature of mappings, and therefore on the class or category of functions and its
structure, arise ‘naturally’ from biological considerations. For instance, gene products usually
have more than one biological function, which frequently depends on the state of the cell
(metabolic, other signaling, etc.). To give one extreme example, beta-catenin is a structural
protein of cell-cell adhesions at the cell membrane, where it helps to glue cells together. However,
it also can work as a transcription factor in the nucleus as the endpoint of the so-called wnt
pathway, which is an extremely important developmental pathway. Any deviations from expected
behavior have catastrophic consequences in the development of the organism. Thus, a mapping or
the class of mappings must be able to accommodate dynamic changes. Sometimes two different
genes may lead to the same biological function. Gene knock-out studies show that the function of
a deleted gene can sometimes be replaced by another gene or genes. For instance, there are
several Ras genes, three of which have been knocked out in mice: Harvey-Ras, Kirsten-Ras and
N-Ras. H-Ras and N-Ras knock-outs are almost normal, but the K-Ras knock-out is lethal. The
work of Casti [1988, 1988b], which extends Rosen’s work on (M,R)-systems and considers
regulation in dynamic metabolic systems, could provide an interesting starting point to investigate
this problem.

Conventional systems theory considers inputs (independent variables) transformed into outputs
(dependent variables). The input/output point of view, although suitable for the engineering and
physical sciences, is unsuitable for cellular systems or gene networks as these systems do not
have an obvious direction of signal flow. In contrast, in the ‘behavioral approach’ [Willems 1991]
systems are viewed as defined by any relation among dynamic variables and a mathematical
model is defined as a subset of a universe of possibilities. Before we accept a mathematical
model as an encoding of the natural system, all outcomes in the universe are possible. The
modeling process then defines a subset of time-trajectories, taking on values in a suitable signal
space, and thereby defines a dynamic system by its behavior rather than its inputs and outputs.
While the definition of causal entailment via designated ‘inputs’ and ‘outputs’ remains the
primary objective for the biological scientist, its definition follows that of a dynamic system in
terms of time-trajectories. Willems’ behavioral framework therefore fits very well the situation in
which we obtain experimental data. For example, microarrays provide us with large sets of short
time series for which dependencies have to be identified from the data rather than being defined a
priori.
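
The behavioral definition referred to above is commonly stated as a triple; the symbols below are the conventional ones from the behavioral literature and are introduced here only for illustration.

```latex
% T: time axis, W: signal space, W^T: all trajectories w : T -> W
% \mathcal{B}: the behavior, i.e. the trajectories the model declares possible
\Sigma = (T, W, \mathcal{B}), \qquad \mathcal{B} \subseteq W^{T}.
```

Modeling then amounts to shrinking the behavior from the full set of trajectories to the subset compatible with the laws of the system, without first declaring which components of w are inputs and which are outputs.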

Microarrays are one of the latest breakthroughs in experimental molecular biology and allow the
monitoring of gene expression for tens of thousands of genes in parallel and in time. For a
comprehensive representation of gene expression, current microarray technology lacks resolution,
and the activity of post-translational factors in regulation remains undetected by it. Many
molecules that control genetic regulatory circuits act at extremely small intracellular
concentrations. Resultant fluctuations in the reaction rates of a biochemical process (e.g. a
signaling pathway) cause large variations in, for example, rates of development and morphology.
Most of the changes that matter must therefore be comparatively large by their very nature, at
least for a short period of time to be observable with microarrays. A problem is that one tends to
look at large populations, e.g., bacterial cells in a colony grown on a Petri dish. Even massive
changes occurring in single cells will appear small if they do not occur in synchrony within a
small window of time. Nevertheless, the technology is progressing and one can expect that some
of these technical limitations will be overcome to allow system identification from time series
data [Alter 2000, Wolkenhauer 2001b].
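
The effect of population averaging described above can be sketched numerically. In the toy example below every cell shows the same all-or-nothing pulse of expression; the pulse width, the number of cells and the spread of onset times are arbitrary assumptions.

```python
import random

def population_average(onsets, width=5, t_max=100):
    """Mean expression over cells; each cell pulses from 0 to 1 for `width` steps."""
    n = len(onsets)
    return [sum(1 for t0 in onsets if t0 <= t < t0 + width) / n
            for t in range(t_max)]

rng = random.Random(0)                      # fixed seed for repeatability
synchronised = [20] * 1000                  # all cells switch at t = 20
asynchronous = [rng.randrange(0, 95) for _ in range(1000)]  # onsets spread in time

peak_sync = max(population_average(synchronised))
peak_async = max(population_average(asynchronous))
```

Although each single cell changes by the same maximal amount in both cases, the asynchronous population average never comes close to the peak of the synchronised one, which is the situation a measurement on pooled cells would report.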

Scaling and Model Integration

On an empirical level a complex system is one that exhibits the emergence of unexpected
behavior. In other words, a (complex) system is defined as an organized structure of
interdependent components whose properties and relationships are largely determined by their
function in the whole. Here we shall adopt a notion of complexity that reflects our ability to
interact with the natural system in such ways as to make its qualities available for scientific
analysis. In this context, by ‘analysis’ we understand the process of encoding a natural system
through formal systems, i.e., mathematical modeling. The more independent encodings of a given
natural system can be built, the more complex the system is. Complexity is therefore not just
treated as a property of some particular mathematical model; nor is complexity entirely an
objective property of the natural system. Summarizing, we identify the complexity of biological
systems as
• a property of an encoding (mathematical model), e.g., its dimensionality, order or number
of state-variables.
• an attribute of the natural system under consideration, e.g., the number of components,
descriptive and organizational levels that ensure its integrity.
• our ability to interact with the system, to observe it, i.e., to make measurements and
generate experimental data.

On all three accounts, genes, cells, tissue, organs, organisms and populations are individually and
as a functional whole a complex system. At any level, the notion of complex systems and the
implicit difficulties in studying them is closely related to the specific approach by which we
proceed. On a philosophical level this is related to epistemological questions while for scientific
practices this relates to the choice of a particular methodology (e.g. Bayesian approach) or model
(e.g. differential equations). We return to the choice of an appropriate mathematical model further
below.

In dynamic systems theory, one would initially ignore spatial aspects in the analysis of cell
differentiation. This approach is usually limited because both space and time are essential to
explain the physical reality of gene expression. The fact that the concepts of space and time have
no material embodiment (they are not to be found in the molecules or their DNA sequence) has been an
argument against material reductionism. Although this criticism is in principle correct, alternative
methods are in short supply. The problem is that although components of cells have a specific
location, these locations lack exact coordinates. Without spatial entailment there can be no living
cell but for formal modeling we would require a topological representation of this organization.
Notwithstanding the fact that, for example, for larger diffusion times we ought to consider partial
differential equations in biokinetic modeling, the complexity of these models frequently forces us
to compromise. It is the movement of molecules which raises most concern for the modeler;
location or compartmentalization can be dealt with by an increased number of variables covering
regions.

Although the environment of a cell is always taken as one of the essential factors for cell
differentiation, it will be difficult to separate external from internal signaling in the analysis of
experimental data. A key problem is then how we can generalize from a model which assumes
physiological homogeneity as well as a homogeneous or closed environment, to a model that
includes intracellular biochemical reaction dynamics, signaling, and cell-to-cell interactions.

Gene expression takes place within the context of a cell, between cells, organs and organisms.
While we wish to ‘isolate’ a system, conceptually ‘close’ it from its environment through the
definition of inputs and outputs, we inevitably lose information in this approach. (Conceptual
closure amounts to the assumption of constancy for the external factors and the fact that external
forces are described as a function of something inside the system). Different levels may require
different modeling strategies and ultimately we require a common conceptual framework that
integrates different models. For example, differential equations may provide the most realistic
modeling paradigm for a single-gene or single-cell representation but cell-to-cell, and large-scale
gene interaction networks are probably most appropriately represented by some finite state
machine. In addressing the problem of scaling and integration of models, there are two kinds of
system representations:
• Intra-component representations in which the state of a sub-system or component (e.g.
cell or gene) of a system is determined by a function (e.g. linking state-variables in rate
equations) and the evolution of states determines the system’s behavior.
• Inter-component discrete representations of ‘whole’ systems (e.g. clone, tissue or
genome), which do not define the state of the system explicitly but instead the state
emerges from the interactions of sub-systems or components (“cells as agents”).

A problem is how to combine these two very different representations. While a clone or colony
of bacteria might be described as optimizing a global ‘cost-function’, one could alternatively
consider cells as related but essentially independent components with an internally defined
programme for development, including mechanisms in response to environmental changes or
inputs. The comparison and combination of both modeling paradigms could lead to a number of
interesting questions related to how the scientist interprets causal entailment in biological systems.

In general, causation is a principle of explanation of change in the realm of matter. In dynamic
systems theory causation is defined as a (mathematical) relationship, not between material
objects, but between changes of states within and between components. In biology causation
cannot be formally proven and a “historical approach” is the basis for reasoning, i.e., if
correlations are observed consistently and repeatedly over an extended period of time, under
different conditions and by different researchers the relationship under consideration is
considered ‘causal’. This approach is surprisingly robust, although exceptions have been found to
almost any single dogma in biology. For instance, some viruses contain RNA genomes which
they copy into DNA for replication and then have the host cell transcribe it back into RNA.

Theory and Reality: Experimental Data and Mathematical Models

Abstract, theoretical mathematical models have, so far, played little or no role in the post-genome
era of the life sciences. The use of mathematical or probabilistic models has been mostly
restricted to the justification of algorithms in sequence analysis. Mathematical models of gene
expression or gene interactions have either been a theoretical exercise or are only concerned with
the practical application of multivariate techniques such as for instance in the analysis of array
data.

More abstract and hence general models are necessary and particularly useful in situations that
capture hierarchical systems consisting of highly interconnected components. For example,
consider the development of blood cells; there it seems that the primitive stem cells express a
whole battery of so called “lineage specific genes”, i.e. genes that are normally only expressed in
a subset of differentiated cells such as B-cells or T-cells. During differentiation, which again is
induced from outside by hormones, growth factors and other still ill defined cues, this “mess” in
gene expression is cleaned up and most genes are shut down. Thus, only the genes which
determine the proper lineage remain on. This is rather the opposite of what one would expect. In the
stem cell everything is on, and specificity in differentiation is achieved by shutting off the
expression of most genes and leaving just a few selected genes on.

Two very fundamental aspects of life are ‘transformation’ (change) and ‘maintenance’
(replication, repair, regulation). Here these processes are summarized as
‘gene expression’ - the process by which information, stored in the DNA, is transformed into
products such as proteins. While in the past biologists have studied gene expression by means of
‘molecular characterization’ (of material objects) the post-genome era is characterized by a shift
of focus towards an understanding of ‘functional activity’. While the study of structural properties
of proteins (e.g. with the purpose of determining their function) will continue to be a research area, it
is increasingly recognized that protein interactions are the basis for observations made at the
metabolic and physiological level. This shift of perspective is possible with new experimental
technologies allowing for experiments that consider temporal changes in gene expression. In
other words, it now becomes possible to study gene expression as a dynamic, regulated process.

The development of (Zermelo-Fraenkel) set theory in mathematics and the material reductionism
in biology have parallels in that both regard things as more fundamental than processes or
transformations. The limitations of the “object-centered material reductionism” in biology are
generally accepted. The books by Rosen and, more recently, those by Sole and Goodwin
(‘Signs of Life’) and Rothman (‘Lessons from the Living Cell’) discuss these issues.
With category theory, mathematicians have developed a more flexible language in which
processes and relationships are put on equal status with ‘things’. In other words, category theory
promotes a conceptual framework in which ‘things’ are described not in terms of their
constituents, but by their relationships to other things. There are other philosophical reasons to
consider such a relational perspective of biology. In particular, the philosophical system of Arthur
Schopenhauer (who essentially refined Immanuel Kant's work) provides a basis for a relational
approach following from the fact that always and everywhere each thing/object exists merely in
virtue of another thing. But for anything to be different from anything else, either space or time
has to be pre-supposed, or both. Since causation is the principle of explanation of change in the
realm of matter, causation is subsequently a relationship, not between things, but between
changes of states of things.

In order to verify theoretical concepts and mathematical models we ought to identify the model
from experimental data or at least validate the model with data. The problem of complexity
appears then in two disguises:
• Dimensionality: hundreds or thousands of variables/genes/cells.
• Uncertainty: small samples (few time points, few replicates), imprecision, noise.

Analyzing experimental data we usually rely on assumptions made about the ensemble of
samples. A statistical or ‘average perspective’ may however hide short-term effects that are the
cause for a whole sequence of events in a genetic pathway. What in statistical terms is considered
an outlier may just be the phenomenon the biologist is looking for. It is therefore important to
compare different methodologies and to question their implicit assumptions with the
consequences for the biological questions asked. To allow reasoning in the presence of
uncertainty, we have to be precise about uncertainty.

For a systems approach investigating causal entailment, it is further necessary to be able to
systematically manipulate the system. At present the “data mining” approach is the prevailing
technique to study genomic data but it is important to realize that this will only allow us to
investigate associations (e.g. quantified by means of correlation analysis). Causal
relationships can only be studied through a comparison of system behavior in response to
perturbations. This not only imposes demands on the experimental design (being able to
manipulate certain variables according to specific input patterns to the system) but further
suggests that the systems biologist should be part of the experimental design process rather than
being “delivered” a data set for analysis.
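
The difference between association and response to perturbation can be illustrated with a toy simulation. The setting is entirely hypothetical: two measured variables x and y share a hidden regulator z but do not influence each other, and the noise levels and input pattern are arbitrary assumptions.

```python
import random

def pearson(xs, ys):
    """Sample Pearson correlation coefficient."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    vx = sum((a - mx) ** 2 for a in xs)
    vy = sum((b - my) ** 2 for b in ys)
    return cov / (vx * vy) ** 0.5

def sample(rng, clamp_x=None):
    """One measurement of (x, y); both depend on a hidden regulator z."""
    z = rng.gauss(0, 1)
    x = z + rng.gauss(0, 0.3) if clamp_x is None else clamp_x
    y = z + rng.gauss(0, 0.3)
    return x, y

rng = random.Random(1)

# Passive observation ("data mining"): x and y are strongly associated.
observed = [sample(rng) for _ in range(2000)]
r_observed = pearson(*zip(*observed))

# Perturbation experiment: x is set by the experimenter (hypothetical input pattern).
inputs = [rng.choice([-2.0, 2.0]) for _ in range(2000)]
perturbed = [sample(rng, clamp_x=u) for u in inputs]
r_perturbed = pearson(*zip(*perturbed))
```

The observational correlation is high, yet y does not follow x once x is manipulated directly, so the association alone would have suggested a causal link that the perturbation experiment does not support.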

Once the experimental design is completed and data are being generated, the question of which
kind of mathematical model and which structure it should have arises. In the theory of dynamic
systems we generally have to make a decision whether to regard the process as a deterministic
non-linear system with a negligible stochastic component or to assume the nonlinearity to
be only a small perturbation of an essentially linear stochastic process. Genuine nonlinear
stochastic processes have not yet been shown to be applicable for practical time-series analysis.
Although natural phenomena are never truly linear, for a very large number of them linear
(stochastic) modeling is often the only feasible option. The dilemma with, for example,
microarray time course experiments is that hundreds of variables are sampled at only a few sample
points, with replicates considered a luxury. This naturally gives rise to questions regarding the
limitations of stochastic linear modeling in the context of such data.

An interesting question in the context of the semantics of mathematical models is the role of
‘noise’ or random fluctuations in general. In biology, the role of random variation is often
illustrated with examples related to evolution and intracellular fluctuations of regulatory
molecules. For the latter the question is usually answered by the number of molecules involved,
fewer molecules usually suggesting a stochastic model while large numbers of molecules often
permit a deterministic model. While in the former case variation is an intrinsic aspect of the
natural system under consideration, a noise term in a description or formal representation, is often
used to ‘cover up’ variations that cannot be explained with the given model and hence relates to a
limitation in the observation and explanation of the phenomena. The question then is whether a
mathematical model is meant to explain the underlying ‘mechanism’ which led to the
observations, or whether we require a model which numerically predicts a particular variable or set of
variables. Many biological systems appear to require a certain amount of noise to reach a state
with optimal conditions (e.g. equilibrium). Random variations allow the system to adapt to a
changed environment. In the extreme, without noise a biological system cannot react to change
and a purely random system has lost its ability to perform any regular function. This discussion
leads to an argument for an optimal ‘signal-to-noise’ ratio and mathematical models which allow
for a noise term. For example, in time series analysis Yule developed a conceptual framework in
which order (represented by a linear, parametric or autoregressive model) is obtained from a
sequence of independent random shocks (white noise process).

Figure 1: Mathematical modelling of biological systems can follow two routes: ‘modelling’,
guided by experimental data, and ‘identification’ from experimental data. In both cases, we rely
on numerous assumptions and simplifications [Wolkenhauer 2001b].
[Diagram labels: natural system, model, physico-chemical principles, non-linear partial
differential equations, linearisation, linear partial differential equations, reduction, ordinary
differential equations, simulation, pre-processing, measurement and observation, identification,
data set, model structure, realisation, raw data, parameter estimation.]
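
Yule's idea that an ordered, quasi-periodic series can arise from independent random shocks can be illustrated with a short simulation. The following Python sketch is not taken from the original text: it drives a second-order autoregressive model with Gaussian white noise, and the coefficients a1 = 1.6 and a2 = -0.9, the seed and the function names are illustrative assumptions.

```python
import random


def simulate_ar2(n, a1=1.6, a2=-0.9, sigma=1.0, seed=0):
    """Return (shocks, series) for x[t] = a1*x[t-1] + a2*x[t-2] + e[t].

    The coefficients are hypothetical; they are chosen so that the
    model is stable and produces a quasi-periodic series.
    """
    rng = random.Random(seed)
    shocks = [rng.gauss(0.0, sigma) for _ in range(n)]
    series = []
    prev1 = prev2 = 0.0  # the series starts from rest
    for e in shocks:
        x = a1 * prev1 + a2 * prev2 + e
        series.append(x)
        prev1, prev2 = x, prev1
    return shocks, series


def autocorrelation(values, lag):
    """Sample autocorrelation of a sequence at the given lag."""
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values)
    cov = sum((values[i] - mean) * (values[i + lag] - mean)
              for i in range(n - lag))
    return cov / var


if __name__ == "__main__":
    shocks, series = simulate_ar2(2000)
    print("lag-1 autocorrelation of the shocks:",
          round(autocorrelation(shocks, 1), 3))
    print("lag-1 autocorrelation of the series:",
          round(autocorrelation(series, 1), 3))
```

The shocks are essentially uncorrelated from one sample to the next, whereas the filtered series is strongly correlated; for these assumed coefficients the theoretical lag-1 autocorrelation is a1/(1 - a2), about 0.84. The regularity comes entirely from the linear model acting on the noise.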

Noise in the form of random fluctuations arises in pathway modeling in two ways. Internal noise
is inherent in the biochemical reactions. Its magnitude is inversely proportional to the system size,
and its origin is usually considered to be thermal. On the other hand, external noise is a variation
in one or more of the control parameters, such as the rate constants associated with a given set of
reactions. External noise then drives the system into different attractors (i.e., fixed points, limit
cycles) of the dynamical systems model. If the noise level is considered small, its effects can
often be incorporated post hoc into the rate equations as an additional term. On the other hand, if
noise is the more dominant aspect, a stochastic model may be a more appropriate conceptual
framework to start with. Biochemical processes typically only involve a small fraction of any
given signalling molecule. For instance, most receptors give a full biological response when only
10-20 percent of them are engaged by ligand. More ligand often even leads to an inhibition of
responses. For this reason one type of signaling molecule can function in several distinct
pathways and exert completely different functions (this again could be represented by a hybrid
model). While random variations appear to be an essential strategy for adaptation and survival,
many regulatory pathways in cells have highly predictable outcomes. This dynamic stability of
genetic networks is the result of redundancy and the interconnection of systems (loops). To
faithfully represent these phenomena using mathematical modeling we therefore need to model
individual sub-systems as well as a collection of components into a complex network.

Mathematical Systems Biology: Genomic Cybernetics

Systems biology is an emerging field of research focused on the application of systems and
control theory to molecular systems [Kitano 2001, Wolkenhauer 2001a]. It aims at a system-level
understanding of metabolic or regulatory pathways by investigating interrelationships
(organization or structure) and interactions (dynamics or behavior) of genes (RNA transcripts,
proteins) and the genome or cells (metabolites).

The biggest problem that any approach to mathematical modeling in biology faces is well
summarized by Zadeh's uncertainty principle which states that as the complexity of a system
increases, our ability to make precise and yet significant statements about its behavior diminishes
until a threshold is reached beyond which precision and significance (or relevance) become
almost mutually exclusive characteristics. Overly ambitious attempts to build predictive models of
cells or subcellular processes are likely to experience the fate of historians and weather
forecasters: prediction is difficult, especially if it concerns the future, and these difficulties are
independent of the time, amount of data available or technological resources (e.g. computing
power) thrown at the problem.

The problem is that perturbations to cells have multi-gene / multi-transcript / multi-protein
responses. ‘Closing’ the system, i.e., restricting the model to a small set of variables and assuming
constancy of some variables, inevitably leads to an often unacceptable level of uncertainty in the
inference. In other words, the problems of applying systems theory in biology can be summarised
by
a) the difficulty of building precise and yet general models,
b) the ‘openness’ of biological systems, i.e. the fact that these systems are hierarchical and
highly interconnected.






Dynamic Pathway Modeling as an Example

We mentioned before the need to combine continuous representations (e.g. mass action
differential equations) and process algebras (formal languages such as the π-calculus). The example
given above was motivated by combining representations of intra- and inter-cellular dynamics.
The problem of modeling signaling pathways is however another good example in which the need
for hybrid models has become clear. Intracellular signaling pathways directly govern cell
behaviour at cellular, tissue and whole-genome level and thereby influence severe pathologies
such as cancer, chronic inflammatory disease, cardiovascular disease and neurological
degeneration syndromes. Signal transduction mechanisms have been identified as important
targets for disease therapy. Signaling modules regulate fundamental biological processes
including cell proliferation, differentiation and survival. These ‘decisions’ are arrived at by
reaching thresholds in concentrations. The duration of reaching threshold matters and while some
processes are reversible others aren’t. While rate changes are best represented by differential
equations, such switching into different ‘operating modes’ is best represented using a ‘logical
formalism’. Forward and backward biochemical reactions run in parallel and ‘compete’ rendering
sequential representations unrealistic. Rate equations originate as a first approximation, whereby
internal fluctuations are ignored. These deterministic differential equations describe the evolution
of the mean value of concentrations of the various elements involved. The existence of positive
and negative feedback in a regulatory network is considered common and leads to nonlinear rate
equations.
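
The combination of a rate equation with a logical switch can be made concrete with a toy hybrid model. The sketch below is an illustration only, under assumptions that are not in the original text: a single concentration c follows dc/dt = s(t) - d*c, and an irreversible change of operating mode occurs once c has stayed above a threshold for a minimum hold time (one possible reading of the remark that the duration of reaching a threshold matters). All names and parameter values are hypothetical.

```python
def simulate_switch(pulse_length, t_end=6.0, dt=0.001, production=2.0,
                    decay=1.0, threshold=1.0, hold_time=1.0):
    """Toy hybrid model: continuous rate equation plus a logical mode.

    The input signal is switched on for `pulse_length` time units.
    Returns (committed, final_concentration).
    """
    c = 0.0              # continuous state: concentration
    time_above = 0.0     # how long c has been above the threshold
    committed = False    # discrete state: irreversible 'decision'
    for n in range(int(round(t_end / dt))):
        t = n * dt
        s = production if t < pulse_length else 0.0
        c += dt * (s - decay * c)          # forward Euler step
        if c >= threshold:
            time_above += dt
        else:
            time_above = 0.0               # the timer is reset
        if time_above >= hold_time:
            committed = True               # never switched back
    return committed, c


if __name__ == "__main__":
    for pulse in (1.0, 3.0):
        committed, c_final = simulate_switch(pulse)
        print("pulse length", pulse, "committed:", committed)
```

With these assumed values a short stimulus pushes the concentration over the threshold only briefly and the mode does not change, whereas a longer stimulus commits the system even though the concentration later decays again.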

The MAPK signaling pathway dynamics are an example of a system which has been investigated
by a number of research groups with very different modeling paradigms, including mass-action
differential equations, Monte-Carlo simulations, or process algebras. To date, none of these is
considered an all-round satisfactory solution that provides a biologically faithful and transparent
model that can be verified experimentally. Intra-cellular signaling pathways carry signals from
cell-surface receptors (where the process known as signal transduction converts the signal
produced by activation of a cell-surface receptor) to their intracellular destination. The
information flow is realized by biochemical processes, implemented by networks of proteins.
These networks have been represented and visualized by Petri nets, Boolean networks and other
graph-based networks. A number of simulation environments, for example BioSpice,
DBSolve, Gepasi, StochSim, ProMOt, Diva, Cellerator, Vcell and E-cell amongst others, are
available, and efforts such as the Systems Biology Workbench and the Systems Biology Markup
Language provide suitable computational means to integrate and combine various tools. Here we
shall consider a sub-module of a signaling pathway and focus on a description of its biokinetic
reactions by means of (nonlinear) ordinary differential equations. The difficulties and challenges
arising when this model is to be extended to cover most of the aspects discussed previously will
become apparent from the discussion below.

Figure 2: There is an interesting contrast and complementarity between modelling in the engineering
and physical sciences and inference in biology.
[Diagram labels: natural system; measurement; observation; causal entailment; inferential
entailment; accurate conclusions but imprecise reasoning; precise inference but inaccurate
conclusions; represents simple systems quantitatively; describes complex systems qualitatively;
mathematical modelling (axioms, equations, diagrams); empirical analysis (natural language,
diagrams, pictures).]


Figure 3. The Ras/Raf-1/MEK/ERK signaling pathway.

The Ras/Raf-1/MEK/ERK module (Figure 3) is a ubiquitously expressed signaling pathway that
conveys mitogenic and differentiation signals from the cell membrane to the nucleus. This kinase
cascade appears to be spatially organized in a signaling complex nucleated by Ras proteins. The
small G protein Ras is activated by many growth factor receptors and binds to the Raf-1 kinase
with high affinity when activated. This induces the recruitment of Raf-1 from the cytosol to the
cell membrane. Activated Raf-1 then phosphorylates and activates MAPK/ERK Kinase (MEK), a
kinase that in turn phosphorylates and activates Extracellular signal Regulated Kinase (ERK), the
prototypic Mitogen-Activated Protein Kinase (MAPK). Activated ERKs can translocate to the
nucleus and regulate gene expression by the phosphorylation of transcription factors. This kinase
cascade controls the proliferation and differentiation of different cell types. The specific
biological effects are crucially dependent on the amplitude and kinetics of ERK activity. The
adjustment of these parameters involves the regulation of protein interactions within this pathway
and motivates a systems biological study. Figures 4 and 5 show “circuit diagrams” of the
biokinetic reactions for which a mathematical model is used to simulate the influence that RKIP
(Raf Kinase Inhibitor Protein) has on the pathway.



[Figure 4 diagram. Recoverable labels: substrate S, enzyme E, complex ES, product P; state
variables x1, x2, x3, x4; rate constants k.]

Figure 4. The pathway model is constructed from basic reaction modules like this enzyme kinetic reaction
for which a set of four differential equations is required.

The pathway is described by ‘reaction modules’ (Figure 4), each of which can be viewed as a
(slightly modified) enzyme kinetic reaction for which the following set of differential equations is
obtained:
\[
\begin{aligned}
\frac{dx_1(t)}{dt} &= -k_1 x_1(t)\,x_2(t) + k_2 x_3(t) \\
\frac{dx_2(t)}{dt} &= -k_1 x_1(t)\,x_2(t) + k_2 x_3(t) + k_3 x_3(t) \\
\frac{dx_3(t)}{dt} &= k_1 x_1(t)\,x_2(t) - k_2 x_3(t) - k_3 x_3(t) \\
\frac{dx_4(t)}{dt} &= k_3 x_3(t) - k_4 x_4(t)
\end{aligned}
\]

The entire model, as shown in Figure 5, is composed of these modules, leading to what usually
becomes a relatively large set of differential equations for which parameter values have to be
identified.
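
A single reaction module of the kind shown in Figure 4 can be simulated numerically before many modules are combined. The following Python sketch is a minimal illustration, not the authors' implementation: it assumes the usual mass-action reading of the module (reversible binding of substrate x1 and enzyme x2 into the complex x3, followed by formation of the product x4, which is removed at rate k4), and the rate constants, initial concentrations and function names are hypothetical.

```python
def module_rhs(x, k):
    """Right-hand side of the four rate equations of one reaction module."""
    x1, x2, x3, x4 = x          # substrate, enzyme, complex, product
    k1, k2, k3, k4 = k
    binding = k1 * x1 * x2      # association of substrate and enzyme
    return (
        -binding + k2 * x3,
        -binding + k2 * x3 + k3 * x3,
        binding - k2 * x3 - k3 * x3,
        k3 * x3 - k4 * x4,
    )


def rk4_step(x, k, dt):
    """Advance the state by one classical fourth-order Runge-Kutta step."""
    def shifted(base, slope, h):
        return tuple(b + h * s for b, s in zip(base, slope))

    s1 = module_rhs(x, k)
    s2 = module_rhs(shifted(x, s1, dt / 2), k)
    s3 = module_rhs(shifted(x, s2, dt / 2), k)
    s4 = module_rhs(shifted(x, s3, dt), k)
    return tuple(
        xi + dt / 6 * (a + 2 * b + 2 * c + d)
        for xi, a, b, c, d in zip(x, s1, s2, s3, s4)
    )


def simulate(x0, k, t_end, dt=0.01):
    """Return the list of states from t = 0 to t_end (inclusive)."""
    steps = int(round(t_end / dt))
    trajectory = [tuple(x0)]
    for _ in range(steps):
        trajectory.append(rk4_step(trajectory[-1], k, dt))
    return trajectory


if __name__ == "__main__":
    # Hypothetical initial concentrations and rate constants.
    traj = simulate(x0=(2.0, 1.0, 0.0, 0.0), k=(0.5, 0.1, 0.3, 0.05),
                    t_end=10.0)
    for step in range(0, len(traj), 200):
        print(step * 0.01, [round(v, 4) for v in traj[step]])
```

A simple consistency check for such a sketch is that the total amount of enzyme, x2 + x3, stays constant along the trajectory, because the second and third rate equations sum to zero.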

[Figure 5 diagram. Recoverable labels: protein states Raf-1*, RKIP, Raf-1*/RKIP,
Raf-1*/RKIP/ERK-PP, ERK-PP, RKIP-P, ERK, MEK-PP, RP, RKIP-P/RP and MEK-PP/ERK; state
variables m1 to m11; kinetic parameters k1/k2, k3/k4, k5, k6/k7, k8, k9/k10 and k11.]

Figure 5. Graphical representation of the ERK signaling pathway regulated by RKIP: a circle represents
a state for the concentration of a protein and a bar a kinetic parameter of a reaction to be estimated. The
directed arcs (arrows) connecting a circle and a bar represent the direction of signal flow. The bi-directional
thick arrows represent an association and a dissociation rate at the same time. The thin unidirectional
arrows represent a production rate of products.

As illustrated in Figure 6, in the estimation of parameters from western blot data, the parameter
estimates usually appear as a time dependent profile since the time course data include various
uncertain factors such as transient responses, noise terms, etc. However, if the signal transduction
system itself is inherently time-invariant then the estimated parameter profile should converge to a
constant value at steady-state, and it is this convergence value that we have to find. For
time-varying systems we instead have to derive an interpolated polynomial function of time.
For reasons of cost, logistics and time management, for any particular
system under study, concentration profiles are usually obtained only for a relatively small number
of proteins and for few data points. One subsequently relies on values obtained from the literature.
But even if data are available, the estimation of parameters for nonlinear ordinary differential
equations is far from being a trivial problem. For the parameter estimation shown in Figure 6, we
discretized the given continuous differential equations with a sample time that usually
corresponds to the time of measurement, so that the continuous differential equations are
approximated by difference equations. This leads to a set of algebraic difference equations that
are linear in the parameters, so that regression techniques can be employed.
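
The discretization-and-regression idea can be sketched for the single reaction module of Figure 4. The Python example below is an illustration under stated assumptions, not the procedure used for Figure 6: the derivative is replaced by a forward difference, the resulting equations are linear in the rate constants, and each pair of constants is obtained by ordinary least squares. The synthetic data are generated with the same forward-difference scheme, so the constants are recovered exactly by construction; with measured data, discretization error and noise would affect the estimates. All names and numerical values are hypothetical.

```python
def euler_module(x0, k, h, steps):
    """Generate synthetic samples of the module with forward Euler steps."""
    k1, k2, k3, k4 = k
    states = [tuple(x0)]
    for _ in range(steps):
        x1, x2, x3, x4 = states[-1]
        binding = k1 * x1 * x2
        states.append((
            x1 + h * (-binding + k2 * x3),
            x2 + h * (-binding + k2 * x3 + k3 * x3),
            x3 + h * (binding - k2 * x3 - k3 * x3),
            x4 + h * (k3 * x3 - k4 * x4),
        ))
    return states


def fit_two_parameters(u, v, y):
    """Least-squares solution (p, q) of y[n] ~ p*u[n] + q*v[n]."""
    suu = sum(a * a for a in u)
    suv = sum(a * b for a, b in zip(u, v))
    svv = sum(b * b for b in v)
    suy = sum(a * c for a, c in zip(u, y))
    svy = sum(b * c for b, c in zip(v, y))
    det = suu * svv - suv * suv          # normal-equation determinant
    return (suy * svv - svy * suv) / det, (suu * svy - suv * suy) / det


def estimate_rate_constants(states, h):
    """Estimate (k1, k2, k3, k4) from sampled states."""
    now, nxt = states[:-1], states[1:]
    # (x1[n+1] - x1[n]) / h = k1 * (-x1[n] * x2[n]) + k2 * x3[n]
    dx1 = [(b[0] - a[0]) / h for a, b in zip(now, nxt)]
    k1, k2 = fit_two_parameters([-s[0] * s[1] for s in now],
                                [s[2] for s in now], dx1)
    # (x4[n+1] - x4[n]) / h = k3 * x3[n] + k4 * (-x4[n])
    dx4 = [(b[3] - a[3]) / h for a, b in zip(now, nxt)]
    k3, k4 = fit_two_parameters([s[2] for s in now],
                                [-s[3] for s in now], dx4)
    return k1, k2, k3, k4


if __name__ == "__main__":
    true_k = (0.5, 0.1, 0.3, 0.05)       # hypothetical rate constants
    data = euler_module((2.0, 1.0, 0.0, 0.0), true_k, h=0.1, steps=100)
    print(estimate_rate_constants(data, h=0.1))
```

Fitting the same regression on a growing window of samples would give a time profile of estimates in the spirit of the profiles described above.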

[Figure 6 plots. Four panels of parameter value against time t (0 to 10), each showing
calculated, trend and estimated curves: k1(t), titled "parameter estimation for k1: k1=0.53";
k2(t), titled "parameter estimation for k2: k2=0.0072"; k3(t), titled "parameter estimation
for k3: k3=0.625"; k4(t), titled "parameter estimation for k4: k4=0.00245".]

Figure 6. Illustration for parameter estimation from time series data: the upper left shows Raf-1*/RKIP
complex association parameter k1, the upper right shows Raf-1*/RKIP/ERK-PP association parameter k3,
the lower left shows Raf-1* and RKIP dissociation parameter k2, and the lower right shows ERK-PP and
Raf-1*/RKIP complex dissociation parameter k4.

If a satisfactory model is obtained, it can then be used in a variety of ways to validate and
generate hypotheses, or to help experimental design. Based on the mathematical model illustrated
in Figure 5, and the parameter values estimated, for example, using a discretization of
the nonlinear ordinary differential equations (as illustrated in Figure 6), we can perform
simulation studies to analyze the sensitivity of the signal transduction system to
variations of RKIP and ERK-PP. For this purpose, we first simulate the pathway model for
varying initial concentrations of RKIP (RKIP sensitivity analysis, see Figure 7).
Next we simulate the model for varying initial concentrations of ERK-PP
(ERK-PP sensitivity analysis, see Figure 8).
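
The structure of such a sensitivity study can be sketched in a few lines. Because the full set of pathway equations is not reproduced in this text, the Python example below uses the single reaction module of Figure 4 as a stand-in and varies the initial substrate concentration; it is not the simulation behind Figures 7 and 8. The rate constants, initial values and function names are hypothetical, and a simple forward Euler scheme is assumed.

```python
def final_product(initial_substrate, k=(0.5, 0.1, 0.3, 0.05),
                  initial_enzyme=1.0, t_end=10.0, dt=0.001):
    """Product concentration x4 at t_end for a given initial substrate level."""
    k1, k2, k3, k4 = k
    x1, x2, x3, x4 = initial_substrate, initial_enzyme, 0.0, 0.0
    for _ in range(int(round(t_end / dt))):
        binding = k1 * x1 * x2
        dx1 = -binding + k2 * x3
        dx2 = -binding + k2 * x3 + k3 * x3
        dx3 = binding - k2 * x3 - k3 * x3
        dx4 = k3 * x3 - k4 * x4
        # forward Euler update of all four states
        x1, x2, x3, x4 = (x1 + dt * dx1, x2 + dt * dx2,
                          x3 + dt * dx3, x4 + dt * dx4)
    return x4


def sensitivity_scan(initial_values):
    """Repeat the simulation for each initial substrate concentration."""
    return [(s0, final_product(s0)) for s0 in initial_values]


if __name__ == "__main__":
    for s0, product in sensitivity_scan([0.5, 1.0, 2.0]):
        print("initial substrate", s0, "-> product at t_end", round(product, 4))
```

For the pathway model itself the same loop would be run over the initial concentration of RKIP or ERK-PP, recording the time courses of the other proteins instead of a single end value.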


Figure 7. The simulation results according to the variation of the concentration of RKIP: The upper left
shows the change of concentration of Raf-1*, the upper right shows ERK, the lower left shows RKIP, and
the lower right shows RKIP-P.


Figure 8. The simulation results according to the variation of the concentration of ERK-PP (continued):
The upper left shows the change of concentration of MEK-PP, the upper right shows RKIP-P-RP, the lower
left shows ERK-P-MEK-PP, and the lower right shows RP.

The kind of model and the modeling approach which we introduced here have already proven to
be successful (i.e. useful to the biologist) despite the many assumptions, simplifications and
subsequent limitations of the model. A challenge for systems biology remains: how can we scale
these models up, not only to describe more complex pathways but also to integrate information
and capture dynamic regulation at the transcriptomic, proteomic and metabolomic levels?

MAP kinase pathways in particular have been investigated by various groups using a variety of
mathematical techniques [Schoeberl 2002, Asthagiri 2001], and the co-existence or generalization
of different methodologies raises questions about the biological systems considered. For
mathematical models to have explanatory power, and hence to be useful to biologists, the semantics
or interpretation of the models matters. Do we assume that a cell is essentially a computer or
machine executing logical programmes, is it a biochemical soup, an essentially random process,
or are there independent agents interacting according to a set of pre-defined rules? Is noise an
inherent part of the biological process or do we introduce it as a means to represent unknown
quantities and variations?

Real-world problems and challenges to apply and develop research in the area of systems biology
abound. For example, consider the development of mathematical models used to analyze
and simulate problems in development such as what is sometimes called asymmetrical division.
This describes the phenomenon that when a stem cell divides, one daughter cell differentiates
whereas the other remains a stem cell. Otherwise our stem cells would get depleted. This
phenomenon happens although the dividing stem cell has the same pattern of gene expression and
is exposed to the exact same environmental cues. Another application would be the mathematical
modeling of differential gene expression and regulation of transcription during a bacterial or viral
infection. The theoretical work could be guided by the analysis of DNA microarray data, which
are available for a number of organisms.

Summary and Conclusions

The discussion above outlined a dynamic systems framework for the study of gene expression
and regulation. We are interested in the interface between internal cellular dynamics and the
external environment in a multi-cellular system. A definition of complexity in the context of
modeling gene expression and regulation was given, and the background and perspective taken were
described in detail. While the motivation is to investigate some fundamental questions of
morphological development, differentiation and responses to environmental stress, the proposal is
to focus these questions on a limited set of problems, methodologies and experimental techniques.
The use of a model is to see the general in the particular, i.e., the purpose of a mathematical
model of cellular processes is not to obtain a perfect fit to experimental data, but to refine the
biological question and experiment under consideration.

The central dogma of systems biology is the fact that the cell and its inter- and intra-cellular
processes describe dynamic systems. An understanding of regulatory systems therefore requires
more than merely collecting large amounts of data by gene expression assays. If we are to go
beyond ‘association’ to an understanding of ‘causal entailment’, we need to go beyond the data
mining approach. The systems approach is characterized by systematic manipulation of the
system behavior.

Reality is described as a continuous dynamic process, best represented as a system of components
realizing a spatio-temporal relationship of events. The motivation comes from the fact that despite
the endless complexity of life, it can be organized and repeated patterns appear at different
organizational and descriptional levels. Indeed, the fact that the incomprehensible presents itself
as comprehensible has been a necessary condition for the sanity and salary of scientists. This
principle is tested in systems biology with mathematical models of gene expression and
regulation for simple and yet complex biological systems.

If this document gives the impression that molecular biology, with its focus on spatial/structural
molecular characteristics, is failing to address temporal and relational aspects, it should be said
that systems and control theory likewise misses the importance of spatial or structural
arrangements in its representations.
The problem of how to combine both temporal and spatial aspects in one model has been a major
challenge in the engineering and physical sciences and will be an even greater one for molecular
processes, which consist of a large number of interacting components.

With the shift of focus from molecular characterization to an understanding of functional activity
in genomics, systems biology can provide us with methodologies to study the organization and
dynamics of complex multivariable genetic pathways. The application of systems theory to
biology is not new and Mihajlo Mesarovic wrote in 1968 that “in spite of the considerable interest
and efforts, the application of systems theory in biology has not quite lived up to expectations. [..]
one of the main reasons for the existing lag is that systems theory has not been directly concerned
with some of the problems of vital importance in biology.” His advice for the biologists was that
progress could be made by more direct and stronger interactions with system scientists. “The real
advance in the application of systems theory to biology will come about only when the biologists
start asking questions which are based on the system-theoretic concepts rather than using these
concepts to represent in still another way the phenomena which are already explained in terms of
biophysical or biochemical principles. [..] then we will not have the ‘application of engineering
principles to biological problems’ but rather a field of systems biology with its own identity and in
its own right.”

References

O. Alter, P.O. Brown and D. Botstein (2000): Singular Value Decomposition for Genome-Wide
Expression Data Processing and Modeling. PNAS, Vol. 97, No. 18, 10101-10106, 29 August
2000.

A.R. Asthagiri and D.A. Lauffenburger (2001): A Computational Study of Feedback Effects on
Signal Dynamics in a Mitogen-Activated Protein Kinase (MAPK) Pathway Model. Biotechnol.
Prog., Vol. 17, 227-239.

M.S. Branicky, V.S. Borkar and S.K. Mitter (1998): A Unified Framework for Hybrid Control:
Model and Optimal Control Theory. IEEE Trans. Automatic Control, Vol. 43, No. 1, 31-45.

X.-R. Cao and Y.-C. Ho (1990): Models of Discrete Event Dynamic Systems. IEEE Control
Systems Magazine, Vol. 10, No. 3, 69-76.

K.-H. Cho and J.-T. Lim (1999): Mixed Centralized/Decentralized Supervisory Control of
Discrete Event Dynamic Systems. Automatica, Vol. 35, No. 1, 121-128.

J.L. Casti (1988): Linear Metabolism-Repair Systems. Int. J. General Systems, Vol. 14, 143-167.

J.L. Casti (1988b): The Theory of Metabolism-Repair Systems. Appl. Mathematics Comput., Vol.
28, 113-154.

J. Hasty, D. McMillen, F. Isaacs and J.J. Collins (2001): Computational Studies of Gene
Regulatory Networks: In Numero Molecular Biology. Nature Reviews Genetics, Vol. 2, No 4,
268-279, April 2001.

M. Hucka, A. Finney, H. Sauro, H. Bolouri, J. Doyle and H. Kitano (2001): The ERATO Systems
Biology Workbench: An Integrated Environment for Multiscale and Multitheoretic Simulations in
Systems Biology. Chapter 6 in Foundations of Systems Biology, H. Kitano (ed.), MIT Press, 2001.

S.A. Kauffman (1995): At Home in the Universe: The Search for Laws of Self-Organisation and
Complexity. Oxford University Press, New York, 1995.

C. Furusawa and K. Kaneko (1998): Emergence of Rules in Cell Society: Differentiation,
Hierarchy, and Stability. Bull. Math. Biol., Vol. 60, 46-49.

H. Kitano ed. (2001): Foundations of Systems Biology. MIT Press.


B. Lennartson, M. Tittus, B. Egardt and S. Pettersson (1996): Hybrid Systems in Process Control.
IEEE Control Systems Magazine, Vol. 16, No. 5, 45-56.

P.J. Ramadge and W.M. Wonham (1989): The Control of Discrete Event Systems. Proc. IEEE,
Vol. 77, 81-98 (Special Issue: Discrete Event Dynamic Systems).

R. Rosen (1958): The Representation of Biological Systems from the Standpoint of the Theory of
Categories. Bulletin of Mathematical Biophysics, Vol. 20, 317-341.

R. Rosen (1985): Anticipatory Systems. Pergamon Press, New York.

B. Schoeberl, C. Eichler-Jonsson, E.D. Gilles and G. Müller (2002): Computational Modelling of
the Dynamics of the MAP kinase Cascade Activated by Surface and Internalized EGF Receptors.
Nature Biotechnology, Vol. 20, April, 370-375.

R. Sole and B. Goodwin (2000): Signs of Life: How Complexity Pervades Biology? Basic Books,
New York.

J.J. Tyson and M.C. Mackey (2001): Molecular, Metabolic and Genetic Control. Chaos, Vol. 11,
No 1, March 2001 (Special Issue).

J.C. Willems (1991): Paradigms and Puzzles in the Theory of Dynamical Systems. IEEE
Transactions on Automatic Control, Vol. 36, No. 3, 259-294, March 1991.

O. Wolkenhauer (2001a): Systems Biology: The Reincarnation of Systems Theory Applied in
Biology? Briefings in Bioinformatics, Henry Stewart Publications, Vol. 2, No. 3, 258-270,
September 2001 (Special Issue: Modelling Cell Systems)

O. Wolkenhauer (2001b): Mathematical Modelling in the Post-Genome Era: Understanding
Genome Expression and Regulation - A System Theoretic Approach. BioSystems, Elsevier, In
press.

O. Wolkenhauer (2001c): Data Engineering: Fuzzy Mathematics in Systems Theory and
Data Analysis. John Wiley & Sons, New York, 2001.

H. Ye, A.N. Michel and L. Hou (1998): Stability Theory for Hybrid Dynamical Systems. IEEE
Trans. Automatic Control, Vol. 43, No. 4, 461-474.