Introduction to Systems Biology
Tom MacCarthy
Math Tower 1-101
maccarth@ams.sunysb.edu
Office hours: Tue/Fri 10-12
Course website: www.ams.sunysb.edu/faculty/~maccarth
Systems Biology
Systems Biology implies a holistic (whole-system) view of biological systems
The study of the interactions between the components of biological systems, and how these interactions give rise to the function and behavior of those systems
Antithesis of the reductionist approach (study of components in isolation)
In practice, Systems Biology usually involves:
  Mathematical modeling
  Generating (lots of) experimental data
  Statistical data analysis
Here we will be dealing with computational aspects
Growth of biological data
First draft of the human genome announced June 26, 2000
Cost: approximately $3 billion
Consists of ~3 billion nucleotides (C,G,A,T)
First phase of the 1000 Genomes Project published Oct 28, 2010
Cost: $30-50 million
Growth of biological data
Comparable increases in other types of data, for example gene expression measurements, which are now increasingly performed via sequencing technologies.
The reduced cost of sequencing has led to many other projects such as:
The Cancer Genome Atlas
The 1000 Plant Genomes project
Systems Biology
The availability of large and varied amounts of biological data has created a need for computational tools for manipulation and analysis.
Mathematical modeling can be used to generate or test novel hypotheses
Example: Transcription factor networks in blood cell differentiation
*** This course is introductory and interdisciplinary, therefore my apologies to specialists ***
DNA → RNA → Protein
• Most cells contain DNA (deoxyribonucleic acid)
• Genes are segments of DNA that contain the necessary information for making proteins.
• Proteins are molecules with specific cellular functions
Gene regulation
[Figure: transcription factors TF1, TF2, TF3 and TF4 and gene expression]
• At any given moment genes may or may not be producing protein
• Proteins called transcription factors (TFs) control the level of activation (or “expression”) of each gene.
• Genes have regulatory regions which contain short DNA sequences (or “motifs”) that are recognized by the TFs.
→ In this way TFs activate or repress gene expression
Gene regulatory networks
[Figure: Gene X and Gene Y, each with a TF binding site and a coding region; labels: transcription/translation, intermediate/s, TF X, TF Y, TF Z, activation, activation/repression]
• Transcription factors themselves are proteins
• They are activated/repressed by other TFs (or by themselves)
• In this way they form gene regulatory networks
Blood cell differentiation
During blood cell differentiation GATA-1 and PU.1 are transcription factors that control erythroid and myeloid development, respectively.
The two proteins have been shown to function in an antagonistic fashion, with GATA-1 repressing PU.1 activity during erythropoiesis (red blood cells) and PU.1 repressing GATA-1 function during myelopoiesis (macrophages, etc.)
Where are GATA-1 and PU.1 binding?
ChIP-Seq was used to detect where in the (mouse) genome GATA-1 and PU.1 are binding
Where are GATA-1 and PU.1 binding?
Find 151 myelo-lymphoid genes that are occupied by GATA-1 and PU.1 and that are positively regulated by PU.1 and repressed by GATA-1, for example:
[Figure: example genes]
Mathematical modeling
Already known that GATA-1 and PU.1 are mutually antagonistic.
Also known before that PU.1 represses GATA-1 targets.
Last piece of puzzle: GATA-1 also represses PU.1 targets
Question: What are the consequences of mutual repression of the targets on gene expression dynamics?
Can compare a mathematical model with and without the repression of the targets
Mathematical modeling
A system of four coupled non-linear ordinary differential equations is used to model the GATA-1/PU.1 regulatory network
We manipulated the rate constants to evaluate the different network architectures.
For example, as K_ir → ∞ the mutual antagonism (GATA-1 ↔ PU.1) disappears.
Similarly, K_it modulates the cross-regulation of targets
[Figure: network diagram with nodes GATA-1, PU.1, GATA-1 target and PU.1 target]
Mathematical modeling
We used Matlab to simulate the system
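The four equations themselves are not reproduced on these slides. As a rough illustration of what such a simulation can look like, the sketch below assumes a symmetric toggle switch with Hill-type repression. The functional forms, the parameter values (a, n), the initial condition and all variable names are assumptions made for this example, not the model used in the study. Kir plays the role of the constant controlling mutual antagonism and Kit the one controlling repression of the opposing targets; setting either to Inf switches the corresponding repression off.

```octave
% Illustrative sketch only: a symmetric toggle switch with Hill-type
% repression. This is NOT the published GATA-1/PU.1 model.
% State vector y = [g; p; gT; pT]:
%   g, p   : levels of GATA-1 and PU.1
%   gT, pT : levels of one representative target gene of each factor
a = 4;   % assumed maximal production rate of each transcription factor
n = 2;   % assumed Hill coefficient

% Kir: threshold for mutual repression between GATA-1 and PU.1.
% Kit: threshold for repression of the opposing factor's target.
% All degradation rates are set to 1 for simplicity.
rhs = @(t, y, Kir, Kit) [ ...
    a / (1 + (y(2) / Kir)^n) - y(1); ...
    a / (1 + (y(1) / Kir)^n) - y(2); ...
    y(1) / (1 + (y(2) / Kit)^n) - y(3); ...
    y(2) / (1 + (y(1) / Kit)^n) - y(4)];

y0 = [1.5; 1.0; 0; 0];                       % slight initial bias towards GATA-1
opts = odeset('RelTol', 1e-8, 'AbsTol', 1e-10);

% Rows: both mechanisms, antagonism only, target repression only, neither.
K = [1 1; 1 Inf; Inf 1; Inf Inf];
ratios = zeros(size(K, 1), 1);
for k = 1:size(K, 1)
  f = @(t, y) rhs(t, y, K(k, 1), K(k, 2));
  [t, y] = ode45(f, [0 100], y0, opts);
  ratios(k) = y(end, 3) / y(end, 4);         % approximate steady-state gT/pT
end
disp(ratios)
```

Under these assumptions the four ratios should come out near 194, 14, 1 and 1. In this toy model the target ratio is largest when both repression mechanisms are active, larger than with mutual antagonism alone, and it collapses to 1 when the antagonism is removed.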
Mathematical modeling
Systematically modulated
(a) the mutual antagonism between GATA-1 and PU.1
(b) mutual antagonism between the targets
For every point in the plane we evaluate the steady state ratio g_T/p_T
The model behavior illustrates that mutual inhibition and repression of opposing downstream targets act synergistically to maximize the g_T/p_T ratio
Systems Biology in practice
These results suggest that the dual mechanism provides, in comparison to either cross-inhibition or target inhibition alone, more robust suppression of an alternative gene expression program during lineage specification.
The example illustrates the highly multi-disciplinary nature of much modern biological research, here combining:
1. High-throughput techniques (ChIP-Seq)
2. Data analysis
3. Mathematical modeling to test hypotheses
Many times, the hypothesis might come first from the mathematical model
Why Matlab and R?
Computational tools are indispensable for doing this kind of research
In many cases students are held back by lack of computational skills
Matlab and R are both interpreted languages, i.e. there is no separate compilation step
This generally makes them slower than compiled languages
Both have an enormous number of extension packages
Octave is a free Matlab “clone” and is available for Windows, Mac and Linux
Both languages can be used interactively, but it is more powerful to write programs.
Matlab
Advantages
Matlab allows one to easily perform numerical calculations and visualize the results.
Many additional libraries for statistics, signal processing, image processing, etc.
Note: Matlab has a Symbolic Math Toolbox; Octave has no built-in equivalent
Disadvantages
Slow, but speed can be improved via vectorization
Matlab is not well suited to complex software projects (limited OO support)
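To make the point about vectorization concrete, here is a minimal sketch that computes the same sum of squares twice, once with an explicit loop and once with a single array operation. The variable names and the choice of N are arbitrary and only serve the example.

```octave
% Sum of the squares of 1..N, computed two ways.
N = 1e5;
x = 1:N;

% Loop version: one interpreted operation per element.
total_loop = 0;
for k = 1:N
  total_loop = total_loop + x(k)^2;
end

% Vectorized version: element-wise power and sum act on the whole array.
total_vec = sum(x .^ 2);

fprintf('loop: %d, vectorized: %d\n', total_loop, total_vec);
```

Both versions give the same result; wrapping each in tic/toc is a simple way to compare how long they take on a given machine.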
Octave download and libraries
To download Octave for your home PC or laptop, go to:
http://octave.sourceforge.net/
To install a package, from within Octave, run:
pkg install package_file_name.tar.gz
For a list of packages, choose “Packages” from the top menu of the website.
Course outline
1. Learning to program in Matlab/Octave
2. Applications in Mathematical Biology, including:
   Elementary image processing
   Linear regression
   Markov processes and Fisher-Wright model
   Difference equations
   Ordinary differential equations
3. R programming
4. Statistics and Bioinformatics using R
   Linear models
   Statistical hypothesis testing and linear models
   Expression data analysis
   Analysis of high-throughput sequencing data
Further reading
There isn’t yet a good Systems Biology textbook that I’m aware of.
I do not recommend this one → [Figure: book cover]
Further reading
“MATLAB Programming for Engineers” by Stephen J. Chapman (Brooks/Cole)
“Mathematical Models in Biology” by Elizabeth S. Allman and John A. Rhodes (Cambridge University Press)
“Introductory Statistics with R” by Peter Dalgaard (Springer)
“Bioinformatics and Functional Genomics” by Jonathan Pevsner (Wiley)
Octave
To start Octave, open a terminal window and enter the command “octave”
Octave basics
Getting help
Within Octave type help <command>, e.g. “help sort”
User-friendly online help available at http://www.mathworks.com/help/techdoc/
GNU Octave help: http://www.gnu.org/software/octave/doc/interpreter/
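As a small companion to the “help sort” example, the sketch below shows the kind of usage the help text describes. The input vector is made up for illustration.

```octave
% At the prompt, "help sort" prints the documentation for sort.
v = [3 1 2];

% Ascending sort; idx holds the original position of each sorted element.
[s, idx] = sort(v);

% Descending sort, selected with the 'descend' option.
[d, jdx] = sort(v, 'descend');

disp(s); disp(idx); disp(d); disp(jdx);
```

Here s is [1 2 3] with idx [2 3 1], and d is [3 2 1] with jdx [1 3 2].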
Octave basics
Files and directories
A MATLAB script file (called an M-file) is a text (plain ASCII) file that contains one or more MATLAB commands and, optionally, comments.
The file is saved with the extension ".m".
When the filename (without the extension) is issued as a command in MATLAB, the file is opened, read, and the commands are executed as if input from the keyboard.
Download the file calc_area.m from the course website
http://www.ams.stonybrook.edu/~maccarth/teaching.shtml
Place the file in the subdirectory “work”
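The contents of calc_area.m are not reproduced on these slides. Purely as an illustration of what a short M-file looks like, a script of this kind might read as follows; the variable names, the starting radius and the output message are assumptions, not the actual course file.

```octave
% calc_area.m (hypothetical contents, for illustration only)
% Computes the area of a circle from its radius and prints it.
radius = 2.5;                 % assumed starting value
area = pi * radius^2;         % area of a circle
fprintf('The area of the circle is %g\n', area);
```

Every line after a % sign is a comment; the remaining lines are ordinary commands that would behave the same if typed at the prompt.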
MATLAB Script Files
The preceding file is executed by issuing a MATLAB command:
>> calc_area
This single command causes MATLAB to look in the current directory and, if a file calc_area.m is found, open it and execute all of the commands.
If MATLAB cannot find the file in the current working directory (or on the search path, described below), an error message will appear.
MATLAB Script Files
When the file is not in the current working directory, a cd or chdir command may be issued to change the directory.
>> cd ~/work
>> calc_area
Octave basics
The search path
Matlab/Octave also uses a search path to find M-files
The M-files are organized in directories which Matlab searches
To add a directory to the search path:
addpath('<directory_name>'); e.g.
addpath('~/work')
savepath;
You should now be able to run calc_area.m even if it is not in your current directory; simply type:
calc_area
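The effect of the search path can be tried out without touching your own files. The sketch below creates a throwaway directory with a one-line script, adds it to the path, and runs the script by name; the directory comes from tempname and the script name hello_path is made up for this example. It deliberately does not call savepath, so nothing is changed permanently.

```octave
% Create a temporary directory containing a one-line script.
d = tempname();
mkdir(d);
fid = fopen(fullfile(d, 'hello_path.m'), 'w');
fprintf(fid, 'msg = ''found via the search path'';\n');
fclose(fid);

addpath(d);      % put the directory on the search path
hello_path;      % runs hello_path.m although it is not in the current directory
disp(msg)
rmpath(d);       % take the directory off the search path again
```

After rmpath, typing hello_path again would produce an error, because the script can no longer be found.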
Now open the file calc_area.m with ‘gedit’ (Applications – Accessories – Text Editor)
Change the radius to 3 and re-run ‘calc_area’