# Session 1

Oct 4, 2013

Introduction to Systems Biology

Tom MacCarthy

Math Tower 1-101

maccarth@ams.sunysb.edu

Office hours Tue/Fri 10-12

Course website: www.ams.sunysb.edu/faculty/~maccarth

Systems Biology

Systems Biology implies a holistic (whole-system) view of
biological systems

The study of the interactions between the components of
biological systems, and how these interactions give rise to the
function and behavior of that system

Antithesis of reductionist approach (study of components in
isolation)

In practice, Systems Biology usually involves

Mathematical modeling

Generating (lots of) experimental data

Statistical data analysis

Here we will be dealing with computational aspects

Growth of biological data

First draft human genome published June 26, 2000

Cost: approximately \$3 billion

Consists of ~3 billion nucleotides (C,G,A,T)

First phase of the 1000 Genomes Project published Oct 28, 2010

Cost: \$30-50 million

Growth of biological data

Comparable increases in other types of data, for example gene expression data, which are now increasingly generated via sequencing technologies.

The reduced cost of sequencing has led to many other projects such as:

The Cancer Genome Atlas

The 1000 Plant Genomes project

Systems Biology

The availability of large and varied amounts of biological
data has created a need for computational tools for
manipulation and analysis.

Mathematical modeling can be used to generate or test
novel hypotheses

Example: Transcription factor networks in blood cell differentiation

*** This course is introductory and interdisciplinary, therefore my apologies to specialists ***

DNA → RNA → Protein

Most cells contain DNA (deoxyribonucleic acid)

Genes are segments of DNA that contain the necessary information for making proteins.

Proteins are molecules with specific cellular functions

[Figure labels: TF1, TF2, TF3, TF4]

At any given moment genes may or may not be producing protein

Proteins called transcription factors (TFs) control the level of activation (or
“expression”) of each gene.

Genes have regulatory regions, which contain short DNA sequences (or “motifs”)
that are recognized by the TFs.

In this way TFs activate or repress gene expression

Gene regulation

Gene expression

Gene regulatory networks

[Figure labels: Gene X coding region, TF binding site, X; Gene Y coding region, TF binding site, Y; Transcription/Translation; Intermediate/s; TF X, TF Y, TF Z; activation; activation/repression]

Transcription factors themselves are proteins

They are activated/repressed by other TFs (or by themselves)

In this way they form gene regulatory networks
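One common way to write such a network down is as a signed interaction matrix. The sketch below is purely illustrative: the three genes X, Y and Z and all of the interactions between them are made up for the example, not taken from the slides.

```octave
% Hypothetical three-gene network; names and interactions are illustrative only.
names = {'X', 'Y', 'Z'};

% W(i,j) = +1 if TF j activates gene i, -1 if TF j represses gene i, 0 if no interaction.
W = [ 0  0  0;    % X is not regulated by any TF in this toy example
      1  0  0;    % X activates Y
     -1  1  1];   % X represses Z, Y activates Z, Z activates itself

% List every interaction stored in the matrix.
for i = 1:numel(names)
  for j = 1:numel(names)
    if W(i, j) > 0
      fprintf('%s activates %s\n', names{j}, names{i});
    elseif W(i, j) < 0
      fprintf('%s represses %s\n', names{j}, names{i});
    end
  end
end
```

A nonzero entry on the diagonal, such as W(3,3) here, represents a TF that regulates its own gene.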

Blood cell differentiation

During blood cell differentiation, GATA-1 and PU.1 are transcription factors that control erythroid and myeloid development, respectively.

The two proteins have been shown to function in an antagonistic fashion, with GATA-1 repressing PU.1 activity during erythropoiesis (red blood cells) and PU.1 repressing GATA-1 function during myelopoiesis (macrophages, etc.)

Where are GATA-1 and PU.1 binding?

ChIP-Seq was used to detect where in the (mouse) genome GATA-1 and PU.1 are binding

Where are GATA-1 and PU.1 binding?

Find 151 myelo-lymphoid genes that are occupied by GATA-1 and PU.1 and that are positively regulated by PU.1 and repressed by GATA-1, for example:

Mathematical modeling

It was already known that GATA-1 and PU.1 are mutually antagonistic.

It was also known that PU.1 represses GATA-1 targets.

Last piece of the puzzle: GATA-1 also represses PU.1 targets

Question: What are the consequences of mutual repression of the targets on gene expression dynamics?

We can compare a mathematical model with and without the repression of the targets

Mathematical modeling

A system of four coupled non-linear ordinary differential equations is used to model the GATA-1-PU.1 regulatory network

We manipulated the rate constants to evaluate the different network architectures.

For example, as we increase $K_{ir}$, the mutual antagonism between GATA-1 and PU.1 disappears.

Similarly, $K_{it}$ modulates the cross-regulation of targets

[Figure: network diagram with nodes GATA-1, GATA-1 target, PU.1 and PU.1 target]
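The equations themselves are not reproduced on the slides, so the following is only a minimal sketch of what a four-variable model of this kind could look like. The Hill-type repression terms, the saturating activation term, the unit production and decay rates, the initial condition and all parameter values are assumptions made for illustration; only the names $K_{ir}$ and $K_{it}$ and their qualitative roles come from the text above. Forward Euler integration is used so that the sketch needs no ODE solver.

```octave
% Illustrative model only: functional forms and all parameter values are assumptions.
n   = 2;       % Hill coefficient (assumed)
Kir = 0.25;    % repression threshold between GATA-1 and PU.1; large Kir = no antagonism
rep = @(x, K) 1 ./ (1 + (x ./ K).^n);   % repression term, tends to 1 as K grows
act = @(x) x ./ (1 + x);                % saturating activation (assumed form)

% State y = [G; P; gT; pT]: GATA-1, PU.1, GATA-1 target, PU.1 target.
f = @(y, Kir, Kit) [ rep(y(2), Kir) - y(1);
                     rep(y(1), Kir) - y(2);
                     act(y(1)) * rep(y(2), Kit) - y(3);
                     act(y(2)) * rep(y(1), Kit) - y(4) ];

y0 = [1.0; 0.8; 0; 0];     % slight initial GATA-1 advantage (assumed)
dt = 0.01;                 % Euler time step
nsteps = 10000;            % integrate to t = 100

% Compare the network with and (effectively) without cross-repression of targets.
Kit_values = [0.25, 1e6];
ratio = zeros(size(Kit_values));
for k = 1:numel(Kit_values)
  y = y0;
  for s = 1:nsteps
    y = y + dt * f(y, Kir, Kit_values(k));   % forward Euler step
  end
  ratio(k) = y(3) / y(4);                    % target ratio gT/pT at the end of the run
  fprintf('Kit = %g: gT/pT = %.3g\n', Kit_values(k), ratio(k));
end
```

Setting a threshold to a very large value switches the corresponding repression off, which is how the two architectures are compared in this sketch.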

Mathematical modeling

We used Matlab to simulate the system

Mathematical modeling

Systematically modulated

(a) the mutual antagonism between GATA-1 and PU.1

(b) mutual antagonism between the targets

For every point in the plane state ratio $g_T/p_T$

The model behavior illustrates that mutual inhibition and repression of opposing downstream targets act synergistically to maximize the $G_T/P_T$ ratio
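A parameter scan of this kind can be sketched as a double loop over the two repression thresholds. Everything specific here is an assumption for illustration: the toy four-variable model (Hill-type repression, saturating activation, unit rates), the three threshold values on each axis, the initial condition and the Euler step. It is not the model behind the slide's figure.

```octave
% Toy model; functional forms and numbers are assumptions, not the published model.
n   = 2;
rep = @(x, K) 1 ./ (1 + (x ./ K).^n);   % repression, weak when K is large
act = @(x) x ./ (1 + x);                % saturating activation
% State y = [G; P; gT; pT]: GATA-1, PU.1, GATA-1 target, PU.1 target.
f = @(y, Kir, Kit) [ rep(y(2), Kir) - y(1);
                     rep(y(1), Kir) - y(2);
                     act(y(1)) * rep(y(2), Kit) - y(3);
                     act(y(2)) * rep(y(1), Kit) - y(4) ];

Kvals  = [0.25, 1, 4];    % small K = strong repression, large K = weak repression
dt     = 0.05;            % Euler time step
nsteps = 2000;            % integrate to t = 100

R = zeros(numel(Kvals));  % R(a,b): final gT/pT for Kir = Kvals(a), Kit = Kvals(b)
for a = 1:numel(Kvals)
  for b = 1:numel(Kvals)
    y = [1.0; 0.8; 0; 0]; % slight initial GATA-1 advantage (assumed)
    for s = 1:nsteps
      y = y + dt * f(y, Kvals(a), Kvals(b));
    end
    R(a, b) = y(3) / y(4);
  end
end
disp(R);
```

Rows of R vary the antagonism between the two factors and columns vary the cross-repression of targets; a finer grid could be displayed as an image with imagesc(R).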

Systems Biology in practice

These results suggest that the dual mechanism provides, in comparison to either cross-inhibition or target inhibition alone, more robust suppression of an alternative gene expression program during lineage-specification.

The example illustrates the highly multi-disciplinary nature of much modern biological research, here combining:

1. High-throughput techniques (ChIP-Seq)

2. Data analysis

3. Mathematical modeling to test hypotheses

Many times, the hypothesis might come first from the mathematical model

Why Matlab and R?

Computational tools are indispensable for doing this kind of research

In many cases students are held back by lack of computational skills

Matlab and R are both interpreted languages, i.e. no compiler

This makes them slower than compiled languages

Both have an enormous number of extension packages

Octave is a free Matlab “clone” and is available for Windows, Mac and Linux

Both languages can be used interactively, but it is more powerful to write programs.

Matlab

Matlab allows one to easily perform numerical calculations and visualize the results.

Many additional libraries for statistics, signal processing, image processing, etc.

Note: Matlab has a Symbolic Toolbox, Octave does not

Slow, but can be improved via vectorization
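As a small illustration of vectorization, the sketch below computes the same sum of squares once with an explicit loop and once with whole-array operations. The array size and the quantity being computed are arbitrary choices for the example, and the measured times will depend on the machine and the Octave or Matlab version.

```octave
N = 1e5;
x = linspace(0, 1, N);

% Loop version: one element at a time.
tic;
s_loop = 0;
for i = 1:N
  s_loop = s_loop + x(i)^2;
end
t_loop = toc;

% Vectorized version: operate on the whole array at once.
tic;
s_vec = sum(x.^2);
t_vec = toc;

fprintf('loop: %.4f s, vectorized: %.4f s\n', t_loop, t_vec);
```

Both versions give the same result up to floating-point rounding; only the way the work is expressed differs.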

Matlab is not good for complex software projects (not OO)

http://octave.sourceforge.net/

To install a package, from within octave, run:

pkg install package_file_name.tar.gz

For a list of packages choose “Packages” from the top menu:

Course outline

1. Learning to program in Matlab/octave

2. Applications in Mathematical Biology, including:

Elementary image processing

Linear regression

Markov processes and Fisher-Wright model

Difference equations

Ordinary differential equations

3. R programming

4. Statistics and Bioinformatics using R

Linear models

Statistical hypothesis testing and linear models

Expression data analysis

Analysis of high-throughput sequencing data

There isn’t yet a good Systems Biology textbook that I’m aware of.

I do not recommend this one

“MATLAB Programming for Engineers” by Stephen J. Chapman (Brooks/Cole)

“Mathematical Models in Biology” by Elizabeth S. Allman and John A. Rhodes (Cambridge Univ Press)

“Introductory Statistics with R” by Peter Dalgaard (Springer)

“Bioinformatics and Functional Genomics” by Jonathan Pevsner (Wiley)

Octave

To start octave, open a terminal window and enter the
command “octave”

Octave basics

Getting help

Within octave type

help <command>, e.g. “help sort”

User
-

http://www.mathworks.com/help/techdoc/

GNU Octave help:

http://www.gnu.org/software/octave/doc/interpreter/

Octave basics

Files and directories

A MATLAB script file (called an M-file) is a text (plain ASCII) file that contains one or more MATLAB commands.

The file is saved with the extension ".m".

When the filename (without the extension) is issued as a
command in MATLAB, the file is opened, read, and the
commands are executed as if input from the keyboard.

Download calc_area.m from the course website:

http://www.ams.stonybrook.edu/~maccarth/teaching.shtml

Place the file in subdirectory “work”
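The contents of calc_area.m are not reproduced in these slides. As a purely hypothetical sketch of what a script with that name might contain, the lines below compute the area of a circle; the variable names, the starting radius and the printed message are all assumptions, and the actual course file may differ.

```octave
% calc_area.m (hypothetical contents; the actual course file may differ)
radius = 2;                          % radius of the circle (assumed starting value)
circle_area = pi * radius^2;         % area of a circle of that radius
fprintf('Area of a circle with radius %g: %g\n', radius, circle_area);
```

Because a script is just a list of commands, editing the value assigned to radius and running the script again recomputes the area.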

MATLAB Script Files

The preceding file is executed by issuing a MATLAB
command:

>> calc_area

This single command causes MATLAB to look in the current directory and, if a file calc_area.m is found, open it and execute all of the commands.

If MATLAB cannot find the file in the current working
directory, an error message will appear.

MATLAB Script Files

When the file is not in the current working directory, a
cd or chdir command may be issued to change the
directory.

>> cd ~/work

>> calc_area

Octave basics

The search path

Matlab/Octave also uses a search path to find M-files

The M-files are organized in directories which Matlab searches

To add a directory to the search path:

addpath('<directory_name>'); e.g. addpath('~/work')

savepath;
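The effect of the search path can be tried out without touching any real files. The sketch below is only a demonstration under assumed names: the temporary directory, the script name hello_path.m and the variable path_demo_value are all hypothetical. It uses rmpath instead of savepath so that nothing is changed permanently.

```octave
% Create a throwaway directory holding a one-line script (all names are hypothetical).
demo_dir = tempname();
mkdir(demo_dir);
script_file = fullfile(demo_dir, 'hello_path.m');
fid = fopen(script_file, 'w');
fprintf(fid, 'path_demo_value = 42;\n');
fclose(fid);

addpath(demo_dir);      % the directory is now searched for M-files
hello_path;             % runs the script although it is not in the current directory
disp(path_demo_value);

% Undo the change and clean up; savepath would instead make an addpath permanent.
rmpath(demo_dir);
delete(script_file);
rmdir(demo_dir);
```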

You should now be able to run calc_area.m even if it is not in the current directory:

>> calc_area

Now open the file calc_area.m with ‘gedit’ (Applications → Accessories → Text Editor)

Change the radius to 3 and re-run ‘calc_area’