15/07/2003 1/17

Mathematical Systems Biology: Genomic Cybernetics

Olaf Wolkenhauer

Dept. Biomolecular Sciences and Dept.

Electrical Eng. & Electronics

Control Systems Centre

UMIST

Manchester M60 1QD, UK

o.wolkenhauer@umist.ac.uk

Walter Kolch

Institute of Biomedical and Life Sciences

University of Glasgow

CRC Beatson Laboratories

Garscube Estate, Switchback Road

Glasgow G61 1BD, UK

wkolch@beatson.gla.ac.uk

Kwang-Hyun Cho

School of Electrical Engineering

University of Ulsan

Ulsan, 680-749, Korea

ckh@mail.ulsan.ac.kr

Abstract

The purpose of mathematical systems biology is to investigate genome expression and regulation

through mathematical modeling and systems theory in particular. The principal idea is to treat

gene expression and regulatory mechanisms of the cell cycle, morphological development, cell

differentiation and environmental responses as controlled dynamic systems.

Although it is common knowledge that cellular systems are dynamic and regulated processes,

to date they have not been investigated and represented as such. The kinds of experimental techniques

that have been available in molecular biology largely determined the material reductionism

which describes gene expression by means of molecular characterization.

Instead of trying to identify genes as causal agents for some function, role, or change in

phenotype we ought to relate these observations to sequences of events. In other words, in

systems biology, instead of looking for a gene that is the reason, explanation or cause of some

phenomenon we seek an explanation in the dynamics (sequences of events ordered by time) that

led to it.

In mathematical systems biology we are aiming at developing a systems theory for the dynamics

of a cell. In this text we first define the concept of complexity in the context of gene expression

and regulation before we discuss the challenges and problems in developing mathematical models

of cellular dynamics, and provide an example to illustrate systems biology together with the

challenges and perspectives of this emerging area of research.

Introduction: Action vs. Interactions

Gene expression is the process by which information stored in the DNA is transformed via RNA

into proteins. While the availability of genome sequences is without doubt a revolutionary

development in the life sciences, providing a basis for technologies such as microarrays, the

principal aim of the post-genome era is to understand the organization (structure) and dynamics

(behavior) of genetic pathways. The area of genomics reflects this shift of focus from molecular

characterization of components to an understanding of the functional activity of genes, proteins

and metabolites. This shift of focus in genomics requires a change in the way we formally

investigate cellular processes: Here we suggest a dynamic systems approach to gene expression

and regulation, an approach we refer to as systems biology or genomic cybernetics.

Further below we are going to provide an example for intra-cellular dynamics by means of a

mathematical model for a signaling pathway. However, looking at cells interacting in the

morphological development of an organism provides another example of the importance of a


dynamic-systems perspective of gene expression and regulation. To relate the genome of a cell

to the reactions which occur in it during differentiation and development, we require a

conceptual framework for both spatial and temporal aspects in order to

capture the relationship between an internal programme and dynamic interactions between the

cell and its environment. The environment may be other cells, physical constraints or external

signals to which the cellular system can respond. Although the cells in a developing

organism may possess the same genome, they can nevertheless develop and respond completely

differently from one another. To answer why and how this can happen one ought to study gene

expression as a temporal process. The principal challenge for systems biology is then to answer

the following questions [adapted from Sole 2000]:

1. How do cells act and interact within the context of the organism to generate coherent

wholes?

2. How do genes act and interact within the context of the cell so as to bring about structure

and function?

Asking how genetic pathways are dynamically regulated and spatially organized, we distinguish

between the action and interaction of genes and cells respectively (intra- and inter-cellular

dynamics). For example, considering morphological development, to what extent do genes

control the process or do genes only participate in a reactive fashion? Many decisions in

development are induction events mediated by the contact with the surroundings. The

multicellular context therefore determines what happens to the individual cell. For example,

cancer cells have lost this ability to respond and therefore disregard tissue organisation and grow

without restriction and invasively. It seems that cells and eventually organs have an inherent

developmental programme which they execute unless instructed otherwise.

It has been known since the 1960s that the most basic cellular processes are dynamic and feedback

controlled, and that cells display anticipatory behavior. At that time, investigating regulatory

proteins and the interactions of allosteric enzymes, François Jacob and Jacques Monod introduced

the distinction between ‘structural genes’ (coding for proteins) and ‘regulatory genes’, which

control the rate at which structural genes are transcribed. This control of the rate of protein

synthesis gave the first indication of such processes being most appropriately viewed as dynamic

systems. With the lack of experimental time-course data, mathematical models of gene regulatory

networks have so far focused on ordinary or stochastic differential equations and automata

[Tyson 2001, Hasty 2001]. To be specific, such models consider only a small number of

genes, whereas in simulations of many interacting genes the relation to experimental data is lost. This

problem, also known as Zadeh’s uncertainty principle, is further discussed below. It is clearly

important to explore the principal limits of how we can balance the composition of components

on a large scale, preserving the integrity of the whole system, with the individuality of its

components, and without losing too much accuracy on the small scale. Since the two

organizational levels (gene vs. genome or cell vs. tissue/colony) are very different with regard to

how we can observe and represent them, different areas of research have evolved around these

organizational and descriptional levels. For example, while differential equations have been used

to develop accurate or predictive models of individual genes in a particular organism and context

[Tyson 2001], Boolean networks modeling hundreds and thousands of interacting genes have

been successful in capturing evolutionary aspects at the genome level [Kauffman 1995].
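
To make the Boolean-network level of description concrete, the following is a minimal sketch in Python. The three genes and their wiring are hypothetical and chosen only for illustration; they are not taken from [Kauffman 1995]. Each gene is either on or off, all genes are updated synchronously, and the long-term behavior is read off as the cycle of states (attractor) that the network eventually enters.

```python
from itertools import product

# Hypothetical three-gene network; the wiring is an illustrative assumption.
RULES = {
    "a": lambda s: not s["c"],          # c represses a
    "b": lambda s: s["a"],              # a activates b
    "c": lambda s: s["a"] and s["b"],   # a and b jointly activate c
}

def step(state):
    """Synchronously update every gene from the current state."""
    return {gene: bool(rule(state)) for gene, rule in RULES.items()}

def attractor(state, max_steps=100):
    """Iterate until a state repeats and return the cycle that was entered."""
    seen = []
    for _ in range(max_steps):
        key = tuple(state[g] for g in sorted(state))
        if key in seen:
            return seen[seen.index(key):]
        seen.append(key)
        state = step(state)
    return []

if __name__ == "__main__":
    # Enumerate all 2**3 initial conditions and report the attractor length.
    for bits in product([False, True], repeat=3):
        start = dict(zip("abc", bits))
        print(bits, "->", len(attractor(start)), "state cycle")
```

Such a model says nothing about reaction rates or concentrations, which is why it scales to many genes but is hard to relate to quantitative measurements of any single gene.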

The challenge is to develop a conceptual framework that integrates these models through

abstraction (i.e., generalization). Even for the simplest of biological systems, a whole

range of techniques is likely to be considered, including time series analysis (regression models), dynamic systems

theory (rate equations, behavioral models), automata theory (finite state machines) and various

others. The validation and evaluation of any mathematical model with

experimental data will further require pattern recognition techniques such as multivariate

clustering and component analysis. There is therefore a great need to integrate

mathematical models and to formalize the modeling process itself. Possible approaches which


may be able to integrate or unify these distinct methodologies are briefly discussed in the

following section.

Integrating Organizational and Descriptional Levels of Explanation

Depending on what biological problem is investigated, a number of quite distinct mathematical

concepts are used to represent the system under consideration. While it is often possible to take

alternative perspectives on the same problem, there are situations in which a certain conceptual

framework is more ‘natural’ and has been established as the most appropriate representation. An

important task for mathematical modeling in the post-genome era is therefore to compare and

contrast different organizational and descriptional levels and to identify the most appropriate

mathematical framework. Some interesting questions arising from this are:

• Why are there alternative formal representations?

• What are the limitations of formal representations, and how do these depend on the available

experimental data as well as the descriptional and organizational level of the system

under consideration?

• How can we relate and combine different mathematical models?

An investigation into the questions above would generate a ‘wish-list’ of mathematical research

that is required to address the challenges provided by post-genome life science research. While

the question of how to integrate mathematical models is relatively new, the need to integrate

various software tools has long been recognized in the area of bioinformatics. Over the last few

years a number of software tools have been developed to describe various aspects of gene

expression and regulation. Depending on which organizational or descriptional level of the

biological problem is addressed, these tools are usually not alternatives but complement each

other. It is therefore generally recognized that there is no all-in-one package providing a solution

but instead a common interface is necessary. The ‘Systems Biology Workbench’ and ‘Systems

Biology Markup Language’ [Hucka 2001] are the result of such considerations. The present text

suggests a complementary effort at the theoretical (mathematical) level.

In modeling gene expression and regulation we are particularly interested in representing intra-

and inter-cellular dynamics by combining two modeling paradigms: Components (cells or the

expression of particular genes) are represented by continuous dynamics, i.e., rate equations

(differential or difference equations) based on the well-known enzyme kinetics in biochemistry

while multi-cellular dynamics are modeled using discrete representations such as finite state

machines (Discrete Event Modeling) [Ramadge 1989, Cao 1990, Cho 1999].
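
The combination of the two paradigms can be sketched as follows. This is a minimal illustration under stated assumptions, not a model from the cited literature: a single Michaelis-Menten rate equation (with hypothetical parameters `vmax` and `km`) is integrated with the Euler method, and a two-state machine with the hypothetical states "resting" and "active" changes state when the product concentration crosses an assumed threshold.

```python
def michaelis_menten_rate(s, vmax=1.0, km=0.5):
    """Rate of product formation for substrate concentration s."""
    return vmax * s / (km + s)

def simulate_cell(s0=2.0, threshold=1.0, dt=0.01, t_end=5.0):
    """Euler-integrate substrate and product and track a two-state machine."""
    s, p, state = s0, 0.0, "resting"
    events = []                      # list of (time, new discrete state)
    steps = int(round(t_end / dt))
    for k in range(steps):
        v = michaelis_menten_rate(s)
        s -= v * dt                  # continuous dynamics: dS/dt = -v
        p += v * dt                  #                      dP/dt = +v
        if state == "resting" and p >= threshold:
            state = "active"         # discrete event triggered by a continuous variable
            events.append(((k + 1) * dt, state))
    return s, p, events

if __name__ == "__main__":
    s, p, events = simulate_cell()
    print("final substrate:", s, "final product:", p, "events:", events)
```

In a multi-cellular setting, only the discrete events would need to be exchanged between components, while the rate equations remain internal to each cell.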

For a formal representation, one conceptual framework that could possibly unify these

different mathematical models is closely related to Rosen’s Metabolic-Repair or (M,R)-systems

[Rosen 1958, Wolkenhauer 2001a]. Rosen uses category theory to discuss limitations of

reductionism and modeling in the Newtonian realm. Another important application of category

theory to biological systems is the theory of Memory Evolutive Systems (MES) of Ehresmann and

Vanbremeersch [http://perso.wanadoo.fr/vbm-ehr/AnintroT.htm]. Ehresmann and Vanbremeersch

have developed a mathematical model for open, self-organized, hierarchical autonomous systems

with memory and the ability to adapt to various conditions through a change of behavior. We

shall here adapt Rosen’s (M,R)-systems as Transformation-Regulation or (T,R)-systems to reflect

the more general application to gene expression and regulation. The formal representation of gene

expression and regulation therefore addresses two aspects: transformation and regulation. The

concept of regulation is either represented explicitly by control components, or realized implicitly

as an emergent phenomenon (e.g. self-organization).

The first step in this approach is to introduce two mathematical spaces (domain and

co-domain) representing either abstract or material objects. For example, we may want to relate genes

with function or substrates with products or, as in the context of time course experiments, relate


sequences of events. In any case, a component or system is subsequently represented by a

mapping between the associated spaces. This mapping represents some transformation, which

itself is regulated through further maps from the previously introduced co-domain and the set of

mappings between the two spaces. While Rosen captured this transformation-regulation process

using category theory, it is possible to derive conventional models such as automata, state-space

representations and regression models from them [Casti 1988]. In [Wolkenhauer 2001b] we

discussed how automata and state-space models can be considered as special cases (or

‘realizations’) of (T,R)-systems. The shift of focus from molecular characterization to

understanding the dynamics of pathways in genomics is reflected in the change of the definition

of the objects in the domain and co-domain to become sequences of data obtained from time

course experiments. Further below we return to the discussion about how the change of thinking

in genomics should be reflected in mathematical modeling of biological systems.
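
One loose, computational reading of the transformation-regulation idea is sketched below. It is an assumption-laden toy, not Rosen's category-theoretic construction: the transformation is a function `f` from inputs to outputs, and a hypothetical regulation map `regulate` takes an observed output and selects the transformation used at the next step. The gains and the target value are invented for illustration.

```python
from typing import Callable, List

Transformation = Callable[[float], float]   # a map f from domain to co-domain

def make_transformation(gain: float) -> Transformation:
    """Return one member of the set of mappings, parameterised by a gain."""
    return lambda a: gain * a

def regulate(b: float, target: float = 1.0) -> Transformation:
    """Regulation: map an observed output to the transformation used next."""
    gain = 2.0 if b < target else 0.5
    return make_transformation(gain)

def run(inputs: List[float], f: Transformation) -> List[float]:
    outputs = []
    for a in inputs:
        b = f(a)          # transformation
        outputs.append(b)
        f = regulate(b)   # regulation selects the mapping for the next step
    return outputs

if __name__ == "__main__":
    print(run([1.0, 1.0, 1.0, 1.0], make_transformation(2.0)))
```

The point of the sketch is only that the object being regulated is the mapping itself rather than a state variable; a state-space realization would hide this distinction inside a parameter.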

Constraints on the nature of mappings, and therefore on the class or category of functions and their

structure arise ‘naturally’ from biological considerations. For instance, gene products usually

have more than one biological function which frequently depends on the state of the cell

(metabolic, other signaling, etc.). To give one extreme example, beta-catenin is a structural

protein of cell-cell adhesions at the cell membrane, where it helps to glue cells together. However,

it can also act as a transcription factor in the nucleus as the endpoint of the so-called Wnt

pathway, which is an extremely important developmental pathway. Any deviations from expected

behavior have catastrophic consequences in the development of the organism. Thus, a mapping or

the class of mappings must be able to accommodate dynamic changes. Sometimes two different

genes may lead to the same biological function. Gene knock-out studies show that the function of

a deleted gene can sometimes be replaced by another gene or genes. For instance, there are

several Ras genes, three of which have been knocked out in mice: Harvey-Ras, Kirsten-Ras and

N-Ras. H-Ras and N-Ras knock-out mice are almost normal, but the K-Ras knock-out is lethal. The

work of Casti [1988, 1988b], which extends Rosen’s work on (M,R)-systems and considers

regulation in dynamic metabolic systems, could provide an interesting starting point to investigate

this problem.

Conventional systems theory considers inputs (independent variables) transformed into outputs

(dependent variables). The input/output point of view, although suitable for the engineering and

physical sciences, is unsuitable for cellular systems or gene networks as these systems do not

have an obvious direction of signal flow. In contrast, in the ‘behavioral approach’ [Willems 1991]

systems are viewed as defined by any relation among dynamic variables and a mathematical

model is defined as a subset of a universe of possibilities. Before we accept a mathematical

model as an encoding of the natural system, all outcomes in the universe are possible. The

modeling process then defines a subset of time-trajectories, taking on values in a suitable signal

space, and thereby defines a dynamic system by its behavior rather than its inputs and outputs.

While the definition of causal entailment via designated ‘inputs’ and ‘outputs’ remains the

primary objective for the biological scientist, its definition follows that of a dynamic system in

terms of time-trajectories. Willems’ behavioral framework therefore fits very well the situation in

which we obtain experimental data. For example, microarrays provide us with large sets of short

time series for which dependencies have to be identified from the data rather than being defined a

priori.
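
The idea of a model as a subset of a universe of trajectories can be illustrated with a small sketch. The law used here (one variable follows the other with a delay of one time step) is a hypothetical example, and binary signals are assumed only to keep the universe enumerable.

```python
from itertools import product

def universe(length, alphabet=(0, 1)):
    """All trajectories of a pair of variables (w1, w2) over `length` time steps."""
    signals = list(product(alphabet, repeat=length))
    return [(w1, w2) for w1 in signals for w2 in signals]

def in_behavior(w1, w2):
    """Hypothetical law: w2 at time t+1 equals w1 at time t."""
    return all(w2[t + 1] == w1[t] for t in range(len(w1) - 1))

def behavior(length):
    """The model: the subset of the universe compatible with the law."""
    return [w for w in universe(length) if in_behavior(*w)]

if __name__ == "__main__":
    print(len(universe(3)), "possible trajectories,", len(behavior(3)), "allowed by the model")
```

Note that neither variable is declared an input or an output; the dependency of `w2` on `w1` is a property of the admitted trajectories and would, with real data, have to be identified rather than assumed.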

Microarrays are one of the latest breakthroughs in experimental molecular biology and allow the

monitoring of gene expression for tens of thousands of genes in parallel and in time. For a

comprehensive representation of gene expression, however, current microarray technology lacks resolution,

and the activity of post-translational factors in regulation remains undetected. Many

molecules that control genetic regulatory circuits act at extremely small intracellular

concentrations. Resultant fluctuations in the reaction rates of a biochemical process (e.g. a

signaling pathway) cause large variations in, for example, the rates of development and morphology.

Most of the changes that matter must therefore be comparatively large by their very nature, at


least for a short period of time to be observable with microarrays. A problem is that one tends to

look at large populations, e.g., bacterial cells in a colony grown on a Petri dish. Even massive

changes occurring in single cells will appear small if they do not occur synchronously within a

small window of time. Nevertheless the technology is progressing and one can expect that some

of these technical limitations will be overcome to allow system identification from time series

data [Alter 2000, Wolkenhauer 2001b].
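
The effect of population averaging on unsynchronized single-cell changes can be sketched numerically. All numbers below (pulse amplitude, pulse width, number of cells) are illustrative assumptions: every cell shows the same large, short pulse of expression, and only the onset time differs between cells.

```python
import random

def single_cell(onset, length=100, width=5, amplitude=10.0):
    """Expression trace with one rectangular pulse starting at `onset`."""
    return [amplitude if onset <= t < onset + width else 0.0 for t in range(length)]

def population_average(onsets, length=100):
    """Average the traces of all cells, as a population measurement would."""
    traces = [single_cell(o, length) for o in onsets]
    return [sum(column) / len(traces) for column in zip(*traces)]

def peak(trace):
    return max(trace)

if __name__ == "__main__":
    rng = random.Random(0)           # fixed seed for reproducibility
    n_cells = 1000
    synchronized = population_average([40] * n_cells)
    scattered = population_average([rng.randrange(0, 95) for _ in range(n_cells)])
    print("peak, synchronized cells:", peak(synchronized))
    print("peak, unsynchronized cells:", peak(scattered))
```

When the onsets are spread over the whole observation window, the averaged signal is a small fraction of the single-cell amplitude, although every individual cell undergoes the full change.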

Scaling and Model Integration

On an empirical level a complex system is one that exhibits the emergence of unexpected

behavior. In other words, a (complex) system is defined as an organized structure of

interdependent components whose properties and relationships are largely determined by their

function in the whole. Here we shall adopt a notion of complexity that reflects our ability to

interact with the natural system in such ways as to make its qualities available for scientific

analysis. In this context, by ‘analysis’ we understand the process of encoding a natural system

through formal systems, i.e., mathematical modeling. The more independent encodings of a given

natural system can be built, the more complex the system is. Complexity is therefore not just

treated as a property of some particular mathematical model; nor is complexity entirely an

objective property of the natural system. In summary, we identify the complexity of biological

systems as

• a property of an encoding (mathematical model), e.g., its dimensionality, order or number

of state-variables;

• an attribute of the natural system under consideration, e.g., the number of components,

descriptive and organizational levels that ensure its integrity;

• our ability to interact with the system, to observe it, i.e., to make measurements and

generate experimental data.

On all three accounts, genes, cells, tissue, organs, organisms and populations are, individually and

as a functional whole, complex systems. At any level, the notion of complex systems and the

implicit difficulties in studying them are closely related to the specific approach by which we

proceed. On a philosophical level this is related to epistemological questions while for scientific

practices this relates to the choice of a particular methodology (e.g. Bayesian approach) or model

(e.g. differential equations). We return to the choice of an appropriate mathematical model further

below.

In dynamic systems theory, one would initially ignore spatial aspects in the analysis of cell

differentiation. This approach is usually limited because both space and time are essential to

explain the physical reality of gene expression. The fact that the concepts of space and time have

no material embodiment (they are not to be found in the molecules or their DNA sequence) has been an

argument against material reductionism. Although this criticism is in principle correct, alternative

methods are in short supply. The problem is that although components of cells have a specific

location, these locations lack exact coordinates. Without spatial entailment there can be no living

cell but for formal modeling we would require a topological representation of this organization.

Although for larger diffusion times, for example, we ought to consider partial

differential equations in biokinetic modeling, the complexity of these models frequently forces us

to compromise. It is the movement of molecules that raises most concern for the modeler;

location or compartmentalization can be dealt with by an increased number of variables covering

regions.

Although the environment of a cell is always taken as one of the essential factors for cell

differentiation, it will be difficult to separate external from internal signaling in the analysis of

experimental data. A key problem is then how we can generalize from a model which assumes

physiological homogeneity as well as a homogeneous or closed environment to a model that

includes intracellular biochemical reaction dynamics, signaling, and cell-to-cell interactions.


Gene expression takes place within the context of a cell, between cells, organs and organisms.

While we wish to ‘isolate’ a system, conceptually ‘close’ it from its environment through the

definition of inputs and outputs, we inevitably lose information in this approach. (Conceptual

closure amounts to the assumption of constancy for the external factors and the fact that external

forces are described as a function of something inside the system). Different levels may require

different modeling strategies and ultimately we require a common conceptual framework that

integrates different models. For example, differential equations may provide the most realistic

modeling paradigm for a single-gene or single-cell representation but cell-to-cell, and large-scale

gene interaction networks are probably most appropriately represented by some finite state

machine. In addressing the problem of scaling and integration of models, there are two kinds of

system representations:

• Intra-component representations in which the state of a sub-system or component (e.g.

cell or gene) of a system is determined by a function (e.g. linking state-variables in rate

equations) and the evolution of states determines the system’s behavior.

• Inter-component discrete representations of ‘whole’ systems (e.g. clone, tissue or

genome), which do not define the state of the system explicitly but instead the state

emerges from the interactions of sub-systems or components (“cells as agents”).

A problem is how to combine these two very different representations. While a clone or colony

of bacteria might be described as optimizing a global ‘cost-function’, one could alternatively

consider cells as related but essentially independent components with an internally defined

programme for development, including mechanisms for responding to environmental changes or

inputs. The comparison and combination of both modeling paradigms could lead to a number of

interesting questions related to how the scientist interprets causal entailment in biological systems.

In general, causation is a principle of explanation of change in the realm of matter. In dynamic

systems theory causation is defined as a (mathematical) relationship, not between material

objects, but between changes of states within and between components. In biology causation

cannot be formally proven and a “historical approach” is the basis for reasoning, i.e., if

correlations are observed consistently and repeatedly over an extended period of time, under

different conditions and by different researchers the relationship under consideration is

considered ‘causal’. This approach is surprisingly robust, although exceptions have been found to

almost any single dogma in biology. For instance, some viruses contain RNA genomes which

they copy into DNA for replication and then have the host cell transcribe it back into RNA.

Theory and Reality: Experimental Data and Mathematical Models

Abstract, theoretical mathematical models have, so far, played little or no role in the post-genome

era of the life sciences. The use of mathematical or probabilistic models has been mostly

restricted to the justification of algorithms in sequence analysis. Mathematical models of gene

expression or gene interactions have either been a theoretical exercise or are only concerned with

the practical application of multivariate techniques such as for instance in the analysis of array

data.

More abstract and hence general models are necessary and particularly useful in situations that

involve hierarchical systems consisting of highly interconnected components. For example,

consider the development of blood cells; there it seems that the primitive stem cells express a

whole battery of so-called “lineage specific genes”, i.e. genes that are normally only expressed in

a subset of differentiated cells such as B-cells or T-cells. During differentiation, which again is

induced from outside by hormones, growth factors and other still ill-defined cues, this “mess” in

gene expression is cleaned up and most genes are shut down. Thus, only the genes which

determine the proper lineage remain on. This is rather the opposite of what one would expect. In the


stem cell everything is on, and specificity in differentiation is achieved by shutting off the

expression of most genes and leaving just a few selected ones on.

Two very fundamental aspects of life are ‘transformation’ (change) and ‘maintenance’

(replication, repair, regulation). Here these processes are summarized as

‘gene expression’ - the process by which information, stored in the DNA, is transformed into

products such as proteins. While in the past biologists have studied gene expression by means of

‘molecular characterization’ (of material objects) the post-genome era is characterized by a shift

of focus towards an understanding of ‘functional activity’. While the study of structural properties

of proteins (e.g. with the purpose of determining their function) will continue to be a research area, it

is increasingly recognized that protein interactions are the basis for observations made at the

metabolic and physiological level. This shift of perspective is possible with new experimental

technologies allowing for experiments that consider temporal changes in gene expression. In

other words, it now becomes possible to study gene expression as a dynamic, regulated process.

The development of (Zermelo-Fraenkel) set theory in mathematics and the material reductionism

in biology have parallels in that both regard things as more fundamental than processes or

transformations. The limitations of the “object-centered material reductionism” in biology are

generally accepted. The books by Rosen and, more recently, those by Solé and Goodwin

(‘Signs of Life’) and Rothman’s ‘Lessons from the Living Cell’ discuss these issues.

With category theory, mathematicians have developed a more flexible language in which

processes and relationships are put on equal status with ‘things’. In other words, category theory

promotes a conceptual framework in which ‘things’ are described not in terms of their

constituents, but by their relationships to other things. There are other philosophical reasons to

consider such a relational perspective of biology. In particular, the philosophical system of Arthur

Schopenhauer (who essentially refined Immanuel Kant's work) provides a basis for a relational

approach following from the fact that always and everywhere each thing/object exists merely in

virtue of another thing. But for anything to be different from anything else, either space or time

has to be pre-supposed, or both. Since causation is the principle of explanation of change in the

realm of matter, causation is subsequently a relationship, not between things, but between

changes of states of things.

In order to verify theoretical concepts and mathematical models we ought to identify the model

from experimental data or at least validate the model with data. The problem of complexity

then appears in two guises:

• Dimensionality: hundreds or thousands of variables/genes/cells.

• Uncertainty: small samples (few time points, few replicates), imprecision, noise.

Analyzing experimental data we usually rely on assumptions made about the ensemble of

samples. A statistical or ‘average perspective’ may however hide short-term effects that are the

cause for a whole sequence of events in a genetic pathway. What in statistical terms is considered

an outlier may just be the phenomenon the biologist is looking for. It is therefore important to

compare different methodologies and to question their implicit assumptions with the

consequences for the biological questions asked. To allow reasoning in the presence of

uncertainty, we have to be precise about uncertainty.
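
How dimensionality and small samples interact can be made tangible with a short sketch. The data here are purely random Gaussian numbers standing in for expression profiles (an assumption made so that any strong correlation is known to be spurious); the function counts how many pairs of "genes" nevertheless appear strongly correlated.

```python
import random
from itertools import combinations

def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def count_strong_pairs(n_genes, n_timepoints, cutoff=0.9, seed=0):
    """Count pairs with |r| >= cutoff in data that contain no real dependencies."""
    rng = random.Random(seed)
    data = [[rng.gauss(0.0, 1.0) for _ in range(n_timepoints)]
            for _ in range(n_genes)]
    return sum(1 for x, y in combinations(data, 2)
               if abs(pearson(x, y)) >= cutoff)

if __name__ == "__main__":
    for n_timepoints in (5, 10, 50):
        print(n_timepoints, "time points:",
              count_strong_pairs(100, n_timepoints), "strongly correlated pairs")
```

With many variables and only a handful of time points, strong correlations arise by chance alone, which is one reason why the assumptions behind a statistical method have to be stated precisely before associations are interpreted.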

For a systems approach that investigates causal entailment, it is further necessary to be able to

systematically manipulate the system. At present the “data mining” approach is the prevailing

technique to study genomic data but it is important to realize that this will only allow us to

investigate associations (e.g. quantified by means of correlation analysis). Causal

relationships can only be studied through a comparison of system behavior in response to

perturbations. This not only imposes demands on the experimental design (being able to

manipulate certain variables according to specific input patterns to the system) but further


suggests that the systems biologist should be part of the experimental design process rather than

being “delivered” a data set for analysis.

Once the experimental design is completed and data are being generated, the question of which

kind of mathematical model and which structure it should have arises. In the theory of dynamic

systems we generally have to make a decision whether to regard the process as a deterministic

non-linear system with a negligible stochastic component, or to assume the nonlinearity to

be only a small perturbation of an essentially linear stochastic process. Genuine nonlinear

stochastic processes have not yet been shown to be applicable for practical time-series analysis.

Although natural phenomena are never truly linear, for a very large number of them linear

(stochastic) modeling is often the only feasible option. The dilemma with, for example,

microarray time course experiments is that hundreds of variables are sampled at only a few sample

points with replicates considered a luxury. This naturally gives rise to questions regarding the

limitations of stochastic linear modeling in the context of such data.

An interesting question in the context of the semantics of mathematical models is the role of

‘noise’ or random fluctuations in general. In biology, the role of random variation is often

illustrated with examples related to evolution and intracellular fluctuations of regulatory

molecules. For the latter the question is usually answered by the number of molecules involved,

fewer molecules usually suggesting a stochastic model while large numbers of molecules often

permit a deterministic model. While in the former case variation is an intrinsic aspect of the

natural system under consideration, a noise term in a description or formal representation is often

used to ‘cover up’ variations that cannot be explained with the given model and hence relates to a

limitation in the observation and explanation of the phenomena. The question then is: is a

mathematical model expected to explain the underlying ‘mechanism’ which led to the

observations? Or do we require a model which numerically predicts a particular variable or set of

variables? Many biological systems appear to require a certain amount of noise to reach a state

with optimal conditions (e.g. equilibrium). Random variations allow the system to adapt to a

changed environment. In the extreme, without noise a biological system cannot react to change

and a purely random system has lost its ability to perform any regular function. This discussion

leads to an argument for an optimal ‘signal-to-noise’ ratio and mathematical models which allow

for a noise term.

[Figure 1 diagram. Box labels: NATURAL SYSTEM, MODEL. Modelling route: physico-chemical principles, non-linear partial differential equations, linearisation, linear partial differential equations, reduction, ordinary differential equations, simulation. Identification route: measurement and observation, raw data, pre-processing, data set, model structure, parameter estimation, realisation.]

Figure 1: Mathematical modelling of biological systems can follow two routes – ‘modelling’, guided by experimental data and ‘identification’ from experimental data. In both cases, we rely on numerous assumptions and simplifications [Wolkenhauer 2001b].

For example, in time series analysis Yule developed a conceptual framework in

which order (represented by a linear, parametric or autoregressive model) is obtained from a

sequence of independent random shocks (white noise process).
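The autoregressive scheme attributed to Yule above can be illustrated with a second-order process driven by white noise. The following is a minimal sketch only: the coefficients, noise level, series length and the helper names `simulate_ar2` and `fit_ar2` are assumptions chosen for this illustration and do not come from the text.

```python
import random


def simulate_ar2(a1, a2, n, sigma=1.0, seed=0):
    """Return n samples of x[t] = a1*x[t-1] + a2*x[t-2] + e[t], e[t] white noise."""
    rng = random.Random(seed)
    prev1 = prev2 = 0.0                      # x[t-1] and x[t-2]
    series = []
    for _ in range(n):
        shock = rng.gauss(0.0, sigma)        # independent random shock
        current = a1 * prev1 + a2 * prev2 + shock
        series.append(current)
        prev1, prev2 = current, prev1
    return series


def fit_ar2(series):
    """Least-squares estimate of (a1, a2) from the 2x2 normal equations."""
    s11 = s12 = s22 = b1 = b2 = 0.0
    for t in range(2, len(series)):
        p1, p2, y = series[t - 1], series[t - 2], series[t]
        s11 += p1 * p1
        s12 += p1 * p2
        s22 += p2 * p2
        b1 += p1 * y
        b2 += p2 * y
    det = s11 * s22 - s12 * s12
    a1_hat = (b1 * s22 - b2 * s12) / det
    a2_hat = (s11 * b2 - s12 * b1) / det
    return a1_hat, a2_hat


# Hypothetical, stationary coefficients chosen only for illustration.
series = simulate_ar2(a1=1.2, a2=-0.5, n=5000)
a1_hat, a2_hat = fit_ar2(series)
print(a1_hat, a2_hat)
```

The oscillatory, ordered appearance of the simulated series comes entirely from the two coefficients acting on unstructured shocks, and the regression recovers those coefficients approximately from the series alone.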

Noise in the form of random fluctuations arises in pathway modeling in two ways. Internal noise

is inherent in the biochemical reactions. Its magnitude is inversely proportional to the system size,

and its origin is usually considered to be thermal. On the other hand, external noise is a variation

in one or more of the control parameters, such as the rate constants associated with a given set of

reactions. External noise then drives the system into different attractors (i.e., fixed points, limit

cycles) of the dynamical systems model. If the noise level is considered small, its effects can

often be incorporated post hoc into the rate equations as an additional term. On the other hand, if

noise is the more dominant aspect, a stochastic model may be a more appropriate conceptual

framework to start with. Biochemical processes typically only involve a small fraction of any

given signalling molecule. For instance, most receptors give a full biological response when only

10-20 percent of them are engaged by ligand. More ligand often even leads to an inhibition of

responses. For this reason one type of signaling molecule can function in several distinct

pathways and exert completely different functions (this again could be represented by a hybrid

model). While random variations appear to be an essential strategy for adaptation and survival,

many regulatory pathways in cells have highly predictable outcomes. This dynamic stability of

genetic networks is the result of redundancy and the interconnection of systems (loops). To

faithfully represent these phenomena using mathematical modeling we therefore need to model

individual sub-systems as well as a collection of components into a complex network.
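The dependence of internal noise on system size mentioned above can be made concrete with a stochastic simulation of a single species that is produced and degraded. This is a sketch under stated assumptions: the birth-death reaction scheme, the rate values, the system-size parameter `omega` and the function name `birth_death_ssa` are all hypothetical and serve only as an illustration. For this scheme the stationary variance of the concentration is (k/g)/omega, so it should fall roughly tenfold for each tenfold increase in system size.

```python
import random


def birth_death_ssa(omega, k=5.0, g=1.0, t_end=200.0, seed=1):
    """Gillespie simulation of 0 -> X (propensity k*omega) and X -> 0 (propensity g*n).

    Returns the time-averaged mean and variance of the concentration n/omega.
    """
    rng = random.Random(seed)
    n = int(round(k * omega / g))        # start at the deterministic steady state
    t = 0.0
    weight = first = second = 0.0        # accumulators for time-weighted moments
    while t < t_end:
        birth = k * omega
        death = g * n
        total = birth + death
        wait = min(rng.expovariate(total), t_end - t)   # time to the next reaction
        conc = n / omega
        weight += wait
        first += conc * wait
        second += conc * conc * wait
        t += wait
        if rng.random() * total < birth:   # choose which reaction fires
            n += 1
        else:
            n -= 1
    mean = first / weight
    return mean, second / weight - mean * mean


results = {omega: birth_death_ssa(omega) for omega in (1, 10, 100)}
for omega, (mean, var) in results.items():
    print(f"omega={omega:4d}  mean={mean:.3f}  variance={var:.4f}")
```

For large `omega` the concentration stays close to the deterministic value k/g, which is the regime in which a rate equation with a small additive noise term is a reasonable description.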

Mathematical Systems Biology: Genomic Cybernetics

Systems biology is an emerging field of research focused on the application of systems and

control theory to molecular systems [Kitano 2001, Wolkenhauer 2001a]. It aims at a system-level

understanding of metabolic or regulatory pathways by investigating interrelationships

(organization or structure) and interactions (dynamics or behavior) of genes (RNA transcripts,

proteins) and the genome or cells (metabolites).

The biggest problem that any approach to mathematical modeling in biology faces is well

summarized by Zadeh's uncertainty principle which states that as the complexity of a system

increases, our ability to make precise and yet significant statements about its behavior diminishes

until a threshold is reached beyond which precision and significance (or relevance) become

almost mutually exclusive characteristics. Overly ambitious attempts to build predictive models of cells or

subcellular processes are likely to experience the fate of historians and weather forecasters:

prediction is difficult, especially if it concerns the future. These difficulties are independent

of the time, amount of data available or technological resources (e.g. computing power) thrown at

the problem.

The problem is that perturbations to cells have multi-gene / multi-transcript / multi-protein

responses. ‘Closing’ the system, i.e., restricting the model to a small set of variables and assuming

constancy of some variables, inevitably leads to an often unacceptable level of uncertainty in the

inference. In other words, the problems of applying systems theory in biology can be summarised

by

a) the difficulty of building precise and yet general models,

b) the ‘openness’ of biological systems, the fact that these systems are hierarchical and

highly interconnected.

Dynamic Pathway Modeling as an Example

We mentioned before the need to combine continuous representations (e.g. mass action

differential equations) and process algebras (formal languages such as π-calculus). The example

given above was motivated by combining representations of intra- and inter-cellular dynamics.

The problem of modeling signaling pathways is however another good example in which the need

for hybrid models has become clear. Intracellular signaling pathways directly govern cell

behaviour at cellular, tissue and whole-genome level and thereby influence severe pathologies

such as cancer, chronic inflammatory disease, cardiovascular disease and neurological

degeneration syndromes. Signal transduction mechanisms have been identified as important

targets for disease therapy. Signaling modules regulate fundamental biological processes

including cell proliferation, differentiation and survival. These ‘decisions’ are arrived at by

reaching thresholds in concentrations. The time taken to reach a threshold matters, and while some

processes are reversible, others are not. While rate changes are best represented by differential

equations, such switching into different ‘operating modes’ is best represented using a ‘logical

formalism’. Forward and backward biochemical reactions run in parallel and ‘compete’ rendering

sequential representations unrealistic. Rate equations originate as a first approximation, whereby

internal fluctuations are ignored. These deterministic differential equations describe the evolution

of the mean value of concentrations of the various elements involved. The existence of positive

and negative feedback in a regulatory network is considered common and leads to nonlinear rate

equations.
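The combination of a rate equation with a logical ‘operating mode’ described above can be sketched as a small hybrid simulation. Everything in this example is an assumption made for illustration: the single concentration `x`, the first-order production and decay terms, the threshold, the required holding time and the mode names are hypothetical and are not taken from any pathway discussed here.

```python
def simulate_hybrid(stimulus_duration, threshold=0.6, hold_time=2.0,
                    production=1.0, decay=1.0, dt=0.001, t_end=20.0):
    """Euler integration of dx/dt = production*u(t) - decay*x plus a discrete mode.

    The mode switches from 'resting' to 'committed' once x has stayed at or above
    `threshold` for `hold_time` time units; the switch is irreversible.
    """
    x = 0.0
    mode = "resting"
    time_above = 0.0
    for i in range(int(round(t_end / dt))):
        t = i * dt
        u = 1.0 if t < stimulus_duration else 0.0     # external stimulus on/off
        x += dt * (production * u - decay * x)        # continuous dynamics
        time_above = time_above + dt if x >= threshold else 0.0
        if mode == "resting" and time_above >= hold_time:
            mode = "committed"                        # discrete, irreversible event
    return mode, x


print(simulate_hybrid(stimulus_duration=1.5))   # short pulse
print(simulate_hybrid(stimulus_duration=5.0))   # sustained stimulus
```

In both runs the concentration crosses the threshold, but only the sustained stimulus keeps it there long enough to trigger the switch. The continuous variable returns towards zero in either case, whereas the mode does not, which is the kind of behaviour a purely differential description represents poorly.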

The MAPK signaling pathway dynamics are an example of a system which has been investigated

by a number of research groups with very different modeling paradigms, including mass-action

differential equations, Monte-Carlo simulations, or process algebras. To date, none of these is

considered an all-round satisfactory solution that provides a biologically faithful and transparent

model that can be verified experimentally. Intra-cellular signaling pathways carry signals from

cell-surface receptors (where the process known as signal transduction converted the signal

produced by activation of a cell-surface receptor) to their intracellular destination. The

information flow is realized by biochemical processes, implemented by networks of proteins.

These networks have been represented and visualized by Petri nets, Boolean networks and other

graph-based networks. A number of simulation environments, for example BioSpice,

DBSolve, Gepasi, StochSim, ProMOt, Diva, Cellerator, Vcell and E-cell, are

available, and efforts such as the Systems Biology Workbench and the Systems Biology Markup

Language provide computational means to integrate and combine various tools. Here we shall

consider a sub-module of a signaling pathway and focus on a description of its biokinetic

reactions by means of (nonlinear) ordinary differential equations. The difficulties and challenges

arising when this model is to be extended to cover most of the aspects discussed previously will

become apparent from the discussion below.

[Figure 2 diagram: a NATURAL SYSTEM (causal entailment) is linked by measurement and observation to inferential entailment. Mathematical modelling (axioms, equations, diagrams) represents simple systems quantitatively: precise inference but inaccurate conclusions. Empirical analysis (natural language, diagrams, pictures) describes complex systems qualitatively: accurate conclusions but imprecise reasoning.]

Figure 2: There is an interesting contrast and complementarity between modelling in the engineering and physical sciences and inference in biology.

Figure 3 The Ras/Raf-1/MEK/ERK signaling pathway.

The Ras/Raf-1/MEK/ERK module (Figure 3) is a ubiquitously expressed signaling pathway that

conveys mitogenic and differentiation signals from the cell membrane to the nucleus. This kinase

cascade appears to be spatially organized in a signaling complex nucleated by Ras proteins. The

small G protein Ras is activated by many growth factor receptors and binds to the Raf-1 kinase

with high affinity when activated. This induces the recruitment of Raf-1 from the cytosol to the

cell membrane. Activated Raf-1 then phosphorylates and activates MAPK/ERK Kinase (MEK), a

kinase that in turn phosphorylates and activates Extracellular signal Regulated Kinase (ERK), the

prototypic Mitogen-Activated Protein Kinase (MAPK). Activated ERKs can translocate to the

nucleus and regulate gene expression by the phosphorylation of transcription factors. This kinase

cascade controls the proliferation and differentiation of different cell types. The specific

biological effects are crucially dependent on the amplitude and kinetics of ERK activity. The

adjustment of these parameters involves the regulation of protein interactions within this pathway

and motivates a systems biological study. Figures 4 and 5 show “circuit diagrams” of the

biokinetic reactions for which a mathematical model is used to simulate the influence RKIP has

on the pathway.

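[Figure 4 diagram: an enzyme kinetic reaction in which substrate S and enzyme E form the complex ES, which yields product P; the four states are labelled x1 to x4 and the rate constants k1, k2, k3 and k4.]

Figure 4. The pathway model is constructed from basic reaction modules like this enzyme kinetic reaction

for which a set of four differential equations is required.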

The pathway is described by ‘reaction modules’ (Figure 4), each of which can be viewed as a

(slightly modified) enzyme kinetic reaction for which the following set of differential equations is

obtained:

\[
\begin{aligned}
\frac{dx_1(t)}{dt} &= -k_1 x_1(t)\, x_2(t) + k_2 x_3(t) \\
\frac{dx_2(t)}{dt} &= -k_1 x_1(t)\, x_2(t) + k_2 x_3(t) + k_3 x_3(t) \\
\frac{dx_3(t)}{dt} &= k_1 x_1(t)\, x_2(t) - k_2 x_3(t) - k_3 x_3(t) \\
\frac{dx_4(t)}{dt} &= k_3 x_3(t) - k_4 x_4(t)
\end{aligned}
\]
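A single reaction module of this kind can be simulated numerically with a few lines of code. The sketch below assumes mass-action kinetics with x1 as substrate, x2 as enzyme, x3 as complex and x4 as product, and it assumes that k4 acts as a first-order removal of the product. The rate constants, initial concentrations, step size and function names are hypothetical values chosen for illustration, and the classical Runge-Kutta integrator is one possible choice of solver.

```python
def module_rhs(x, k):
    """Right-hand side of the four-state reaction module (mass-action kinetics)."""
    x1, x2, x3, x4 = x
    k1, k2, k3, k4 = k
    association = k1 * x1 * x2
    return [
        -association + k2 * x3,              # substrate
        -association + k2 * x3 + k3 * x3,    # enzyme
        association - k2 * x3 - k3 * x3,     # complex
        k3 * x3 - k4 * x4,                   # product (k4 removal is an assumption)
    ]


def rk4_step(x, k, h):
    """One classical fourth-order Runge-Kutta step of size h."""
    def shifted(base, slope, scale):
        return [b + scale * s for b, s in zip(base, slope)]

    s1 = module_rhs(x, k)
    s2 = module_rhs(shifted(x, s1, h / 2), k)
    s3 = module_rhs(shifted(x, s2, h / 2), k)
    s4 = module_rhs(shifted(x, s3, h), k)
    return [xi + h / 6 * (a + 2 * b + 2 * c + d)
            for xi, a, b, c, d in zip(x, s1, s2, s3, s4)]


def simulate(x0, k, h=0.01, t_end=10.0):
    """Integrate from x0 and return the list of states at every step."""
    trajectory = [list(x0)]
    for _ in range(int(round(t_end / h))):
        trajectory.append(rk4_step(trajectory[-1], k, h))
    return trajectory


x0 = [2.0, 1.0, 0.0, 0.0]        # hypothetical initial concentrations
rates = (0.5, 0.2, 0.3, 0.1)     # hypothetical k1..k4
trajectory = simulate(x0, rates)
final = trajectory[-1]
print([round(v, 4) for v in final])
print("free enzyme + complex:", final[1] + final[2])
```

The sum of free enzyme and complex stays at its initial value throughout, which is a useful consistency check on the signs of the second and third equations.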

The entire model, as shown in Figure 5, is composed of these modules, leading to what usually

becomes a relatively large set of differential equations for which parameter values have to be

identified.

[Figure 5 diagram. Protein states: Raf-1*, RKIP, Raf-1*/RKIP, Raf-1*/RKIP/ERK-PP, ERK, RKIP-P, MEK-PP, MEK-PP/ERK, ERK-PP, RP and RKIP-P/RP, labelled m1 to m11. Rate parameters: k1/k2, k3/k4, k5, k6/k7, k8, k9/k10 and k11.]

Figure 5. Graphical representation of the ERK signaling pathway regulated by RKIP: a circle represents

a state for the concentration of a protein and a bar a kinetic parameter of reaction to be estimated. The

directed arc (arrows) connecting a circle and a bar represents a direction of a signal flow. The bi-directional

thick arrows represent an association and a dissociation rate at the same time. The thin unidirectional

arrows represent a production rate of products.

As illustrated in Figure 6, in the estimation of parameters from western blot data, the parameter

estimates usually appear as a time dependent profile since the time course data include various

uncertain factors such as transient responses, noise terms, etc. However, if the signal transduction

system itself is inherently time-invariant then the estimated parameter profile should converge to a

certain constant value at steady-state. Therefore we have to find this convergence value if the

system is time-invariant. Otherwise we have to derive an interpolated polynomial function of time

for time-varying systems. For reasons of cost, logistics and time management, for any particular

system under study, concentration profiles are usually obtained only for a relatively small number

of proteins and for few data points. One subsequently relies on values obtained from the literature.

But even if data are available, the estimation of parameters for nonlinear ordinary differential

equations is far from being a trivial problem. For the parameter estimation shown in Figure 6, we

discretized the given continuous differential equations along with a sample time, which usually

corresponds to the time of measurement. Then the continuous differential equations can be

approximated by difference equations. This leads to a set of linear algebraic difference equations

with respect to parameters and regression techniques can be employed.
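The idea of turning the differential equations into difference equations that are linear in the parameters can be sketched on the first equation of a reaction module, dx1/dt = -k1 x1 x2 + k2 x3. This is an illustration under stated assumptions rather than the procedure used for Figure 6: the data are synthetic and noise-free, the rate constants and sampling interval are hypothetical, central differences are used as the difference approximation, and the function names are invented for this example.

```python
def module_rhs(x, k):
    """Mass-action right-hand side of the four-state reaction module."""
    x1, x2, x3, x4 = x
    k1, k2, k3, k4 = k
    a = k1 * x1 * x2
    return [-a + k2 * x3, -a + (k2 + k3) * x3, a - (k2 + k3) * x3, k3 * x3 - k4 * x4]


def rk4_step(x, k, h):
    """One classical Runge-Kutta step, used here only to generate synthetic data."""
    def shifted(base, slope, scale):
        return [b + scale * s for b, s in zip(base, slope)]

    s1 = module_rhs(x, k)
    s2 = module_rhs(shifted(x, s1, h / 2), k)
    s3 = module_rhs(shifted(x, s2, h / 2), k)
    s4 = module_rhs(shifted(x, s3, h), k)
    return [xi + h / 6 * (a + 2 * b + 2 * c + d)
            for xi, a, b, c, d in zip(x, s1, s2, s3, s4)]


def estimate_k1_k2(samples, h):
    """Regress the central difference of x1 on (-x1*x2, x3) to estimate (k1, k2)."""
    s11 = s12 = s22 = b1 = b2 = 0.0
    for n in range(1, len(samples) - 1):
        x1, x2, x3, _ = samples[n]
        y = (samples[n + 1][0] - samples[n - 1][0]) / (2 * h)   # approximates dx1/dt
        r1, r2 = -x1 * x2, x3                                   # regressors
        s11 += r1 * r1
        s12 += r1 * r2
        s22 += r2 * r2
        b1 += r1 * y
        b2 += r2 * y
    det = s11 * s22 - s12 * s12
    return (b1 * s22 - b2 * s12) / det, (s11 * b2 - s12 * b1) / det


h = 0.02                                  # hypothetical sampling interval
true_k = (0.5, 0.2, 0.3, 0.1)             # hypothetical "true" rate constants
samples = [[2.0, 1.0, 0.0, 0.0]]
for _ in range(250):
    samples.append(rk4_step(samples[-1], true_k, h))

k1_hat, k2_hat = estimate_k1_k2(samples, h)
print(k1_hat, k2_hat)
```

With densely sampled, noise-free data the regression recovers the rate constants closely. With few, noisy data points, as is typical for western blot time courses, the difference approximation becomes coarse and the estimates correspondingly uncertain.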

[Figure 6 plots: four panels, each showing a parameter value ki(t) against time t (0 to 10) with calculated, trend and estimated curves. Panel titles: parameter estimation for k1: k1=0.53; parameter estimation for k2: k2=0.0072; parameter estimation for k3: k3=0.625; parameter estimation for k4: k4=0.00245.]

Figure 6. Illustration for parameter estimation from time series data: the upper left shows Raf-1*/RKIP

complex association parameter k1, the upper right shows Raf-1*/RKIP/ERK-PP association parameter k3,

the lower left shows Raf-1* and RKIP dissociation parameter k2, and the lower right shows ERK-PP and

Raf-1*/RKIP complex dissociation parameter k4.

If a satisfactory model is obtained, this can then be used in a variety of ways to validate and

generate hypotheses, or to help experimental design. Based on the mathematical model illustrated

in Figure 5, and the estimated parameter values as for example obtained using a discretization of

the nonlinear ordinary differential equations (as illustrated in Figure 6), we can perform

simulation studies to analyze the signal transduction system with respect to the sensitivity for the

variation of RKIP and ERK-PP. For this purpose, we first simulate the pathway model according

to the variation of the initial concentration of RKIP (RKIP sensitivity analysis – see Figure 7).

Next we perform the simulation according to the variation of the initial concentration of ERK-PP

in this case (ERK-PP sensitivity analysis – see Figure 8).
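The procedure behind such a sensitivity analysis, repeating the simulation while varying one initial concentration, can be sketched on a single reaction module. The example does not reproduce the RKIP or ERK-PP analysis of the full pathway model: it uses one enzyme kinetic module as a stand-in, sets product removal to zero so that the product accumulates, and relies on hypothetical rate constants, initial values and a simple Euler scheme.

```python
def final_product(substrate0, enzyme0=1.0, k=(0.5, 0.2, 0.3), h=0.001, t_end=10.0):
    """Euler-integrate one reaction module and return the product level at t_end.

    States: x1 substrate, x2 enzyme, x3 complex, x4 product (no product removal).
    """
    k1, k2, k3 = k
    x1, x2, x3, x4 = substrate0, enzyme0, 0.0, 0.0
    for _ in range(int(round(t_end / h))):
        association = k1 * x1 * x2
        d1 = -association + k2 * x3
        d2 = -association + (k2 + k3) * x3
        d3 = association - (k2 + k3) * x3
        d4 = k3 * x3
        x1, x2, x3, x4 = x1 + h * d1, x2 + h * d2, x3 + h * d3, x4 + h * d4
    return x4


initial_values = [0.5, 1.0, 2.0, 4.0]     # hypothetical initial substrate levels
sweep = {s0: final_product(s0) for s0 in initial_values}
for s0, product in sweep.items():
    print(f"initial substrate {s0:.1f} -> product at t_end {product:.4f}")
```

In the full pathway model the same loop would run over initial RKIP or ERK-PP concentrations and record the time courses of all states, which is what the panels of Figures 7 and 8 display.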

Figure 7. The simulation results according to the variation of the concentration of RKIP: The upper left

shows the change of concentration of Raf-1*, the upper right shows ERK, the lower left shows RKIP, and

the lower right shows RKIP-P.

Figure 8. The simulation results according to the variation of the concentration of ERK-PP (continued):

The upper left shows the change of concentration of MEK-PP, the upper right shows RKIP-P-RP, the lower

left shows ERK-P-MEK-PP, and the lower right shows RP.

The kind of model and the modeling approach which we introduced here have already proven to

be successful (i.e. useful to the biologist) despite the many assumptions, simplifications and

subsequent limitations of the model. A challenge for systems biology remains: how can we scale

these models up, not only to describe more complex pathways but also to integrate information

and capture dynamic regulation at the transcriptomic, proteomic and metabolomic levels?

MAP kinase pathways in particular have been investigated by various groups using a variety of

mathematical techniques [Schoeberl 2002, Asthagiri 2001], and the co-existence or generalization

of different methodologies raises questions about the biological systems considered. For

mathematical models to have explanatory power, and hence to be useful to biologists, the semantics

or interpretation of the models matters. Do we assume that a cell is essentially a computer or

machine executing logical programmes, is it a biochemical soup, an essentially random process,

or are independent agents interacting according to a set of pre-defined rules? Is noise an inherent

part of the biological process or do we introduce it as a means to represent unknown quantities

and variations?

Real-world problems and challenges for applying and developing research in the area of systems biology

abound. For example, consider the development of mathematical models used to analyze

and simulate problems in development such as what is sometimes called asymmetrical division.

This describes the phenomenon that when a stem cell divides, one daughter cell differentiates

whereas the other remains a stem cell. Otherwise our stem cells would get depleted. This

phenomenon happens although the dividing stem cell has the same pattern of gene expression and

is exposed to the exact same environmental cues. Another application would be the mathematical

modeling of differential gene expression and regulation of transcription during a bacterial or viral

infection. The theoretical work could be guided by the analysis of DNA microarray data, which

are available for a number of organisms.

Summary and Conclusions

The discussion above outlined a dynamic systems framework for the study of gene expression

and regulation. We are interested in the interface between internal cellular dynamics and the

external environment in a multi-cellular system. A definition of complexity in the context of

modeling gene expression and regulation is given and the background and perspective taken is

described in detail. While the motivation is to investigate some fundamental questions of

morphological development, differentiation and responses to environmental stress, the proposal is

to focus these questions on a limited set of problems, methodologies and experimental techniques.

The use of a model is to see the general in the particular, i.e., the purpose of a mathematical

model of cellular processes is not to obtain a perfect fit to experimental data, but to refine the

biological question and experiment under consideration.

The central dogma of systems biology is the fact that the cell and its inter- and intra-cellular

processes describe dynamic systems. An understanding of regulatory systems therefore requires

more than merely collecting large amounts of data by gene expression assays. If we are to go

beyond ‘association’ to an understanding of ‘causal entailment’, we need to go beyond the data

mining approach. The systems approach is characterized by systematic manipulation of the

system behavior.

Reality is described as a continuous dynamic process, best represented as a system of components

realizing a spatio-temporal relationship of events. The motivation comes from the fact that despite

the endless complexity of life, it can be organized and repeated patterns appear at different

organizational and descriptional levels. Indeed, the fact that the incomprehensible presents itself

as comprehensible has been a necessary condition for the sanity and salary of scientists. This

principle is tested in systems biology with mathematical models of gene expression and

regulation for simple and yet complex biological systems.

If this document gives the impression that molecular biology, with its focus on spatial/structural

molecular characteristics, is failing to address temporal and relational aspects, then systems

and control theory equally misses the importance of spatial or structural arrangements in its representations.

The problem of how to combine both temporal and spatial aspects in one model has been a major

challenge in the engineering and physical sciences and will be an even greater one for molecular

processes, which consist of a large number of interacting components.

With the shift of focus from molecular characterization to an understanding of functional activity

in genomics, systems biology can provide us with methodologies to study the organization and

dynamics of complex multivariable genetic pathways. The application of systems theory to

biology is not new and Mihajlo Mesarovic wrote in 1968 that “in spite of the considerable interest

and efforts, the application of systems theory in biology has not quite lived up to expectations. [..]

one of the main reasons for the existing lag is that systems theory has not been directly concerned

with some of the problems of vital importance in biology.” His advice for the biologists was that

progress could be made by more direct and stronger interactions with system scientists. “The real

advance in the application of systems theory to biology will come about only when the biologists

start asking questions which are based on the system-theoretic concepts rather than using these

concepts to represent in still another way the phenomena which are already explained in terms of

biophysical or biochemical principles. [..] then we will not have the ‘application of engineering

principles to biological problems’ but rather a field of systems biology with its own identity and in

its own right.”

References

O. Alter, P.O. Brown and D. Botstein (2000): Singular Value Decomposition for Genome-Wide

Expression Data Processing and Modeling. PNAS, Vol. 97, No. 18, 10101-10106, 29 August

2000.

A.R. Asthagiri and D.A. Lauffenburger (2001): A Computational Study of Feedback Effects on

Signal Dynamics in a Mitogen-Activated Protein Kinase (MAPK) Pathway Model. Biotechnol.

Prog., Vol. 17, 227-239.

M.S. Branicky, V.S. Borkar and S.K. Mitter (1998): A Unified Framework for Hybrid Control:

Model and Optimal Control Theory. IEEE Trans. Automatic Control, Vol. 43, No. 1, 31-45.

X.-R. Cao and Y.-C. Ho (1990): Models of Discrete Event Dynamic Systems. IEEE Control

Systems Magazine, Vol. 10, No. 3, 69-76.

K.-H. Cho and J.-T. Lim (1999): Mixed Centralized/Decentralized Supervisory Control of

Discrete Event Dynamic Systems. Automatica, Vol. 35, No. 1, 121-128.

J.L. Casti (1988): Linear Metabolism-Repair Systems. Int. J. General Systems, Vol. 14, 143-167.

J.L. Casti (1988b): The Theory of Metabolism-Repair Systems. Appl. Mathematics Comput., Vol.

28, 113-154.

J. Hasty, D. McMillen, F. Isaacs and J.J. Collins (2001): Computational Studies of Gene

Regulatory Networks: In Numero Molecular Biology. Nature Reviews Genetics, Vol. 2, No 4,

268-279, April 2001.

M. Hucka, A. Finney, H. Sauro, H. Bolouri, J. Doyle and H. Kitano (2001): The ERATO Systems

Biology Workbench: An Integrated Environment for Multiscale and Multitheoretic Simulations in

Systems Biology. Chapter 6 in Foundations of Systems Biology, H. Kitano (ed.), MIT Press, 2001.

S.A. Kauffman (1995): At Home in the Universe: The Search for Laws of Self-Organisation and

Complexity. Oxford University Press, New York, 1995.

C. Furusawa and K. Kaneko (1998): Emergence of Rules in Cell Society: Differentiation,

Hierarchy, and Stability. Bull. Math. Biol., Vol. 60, 46-49.

H. Kitano ed. (2001): Foundations of Systems Biology. MIT Press.

B. Lennartson, M. Tittus, B. Egardt and S. Pettersson (1996): Hybrid Systems in Process Control.

IEEE Control Systems Magazine, Vol. 16, No. 5, 45-56.

P.J. Ramadge and W.M. Wonham (1989): The Control of Discrete Event Systems. Proc. IEEE,

Vol. 77, 81-98 (Special Issue: Discrete Event Dynamic Systems).

R. Rosen (1958): The Representation of Biological Systems from the Standpoint of the Theory of

Categories. Bulletin of Mathematical Biophysics, Vol. 20, 317-341.

R. Rosen (1985): Anticipatory Systems. Pergamon Press, New York.

B. Schoeberl, C. Eichler-Jonsson, E.D. Gilles and G. Müller (2002): Computational Modelling of

the Dynamics of the MAP kinase Cascade Activated by Surface and Internalized EGF Receptors.

Nature Biotechnology, Vol. 20, April, 370-375.

R. Sole and B. Goodwin (2000): Signs of Life: How Complexity Pervades Biology? Basic Books,

New York.

J.J. Tyson and M.C. Mackey (2001): Molecular, Metabolic and Genetic Control. Chaos, Vol. 11,

No 1, March 2001 (Special Issue).

J.C. Willems (1991): Paradigms and Puzzles in the Theory of Dynamical Systems. IEEE

Transactions on Automatic Control, Vol. 36, No. 3, 259-294, March 1991.

O. Wolkenhauer (2001a): Systems Biology: The Reincarnation of Systems Theory Applied in

Biology? Briefings in Bioinformatics, Henry Stewart Publications, Vol. 2, No. 3, 258-270,

September 2001 (Special Issue: Modelling Cell Systems)

O. Wolkenhauer (2001b): Mathematical Modelling in the Post-Genome Era: Understanding

Genome Expression and Regulation - A System Theoretic Approach. BioSystems, Elsevier, In

press.

O. Wolkenhauer (2001c): Data Engineering: Fuzzy Mathematics in Systems Theory and

Data Analysis. John Wiley & Sons, New York, 2001.

H. Ye, A.N. Michel and L. Hou (1998): Stability Theory for Hybrid Dynamical Systems. IEEE

Trans. Automatic Control, Vol. 43, No. 4, 461-474.
