Session 1


Oct 4, 2013


Introduction to Systems Biology

Tom MacCarthy

Math Tower 1-101

maccarth@ams.sunysb.edu

Office hours Tue/Fri 10-12


Course website:
www.ams.sunysb.edu/faculty/~maccarth


Systems Biology


Systems Biology implies a holistic (whole-system) view of
biological systems


The study of the interactions between the components of
biological systems, and how these interactions give rise to the
function and behavior of that system


Antithesis of reductionist approach (study of components in
isolation)


In practice, Systems Biology usually involves


Mathematical modeling


Generating (lots of) experimental data


Statistical data analysis


Here we will be dealing with computational aspects


Growth of biological data


First draft of the human genome announced June 26, 2000 (published February 2001)


Cost: approximately $3 billion


Consists of ~3 billion nucleotides (C,G,A,T)


First phase of the 1000 Genomes Project published Oct 28, 2010


Cost: $30-50 million

Growth of biological data

Comparable increases in other types of data, for example gene expression data, now increasingly generated via sequencing technologies.

The reduced cost of sequencing has led to many other projects, such as:

The Cancer Genome Atlas

The 1000 Plant Genomes project

Systems Biology


The availability of large and varied amounts of biological
data has created a need for computational tools for
manipulation and analysis.


Mathematical modeling can be used to generate or test
novel hypotheses


Example: Transcription factor networks in blood cell differentiation


*** This course is introductory and interdisciplinary, therefore my apologies to specialists ***


DNA → RNA → Protein



Most cells contain DNA (deoxyribonucleic acid)

Genes are segments of DNA that contain the necessary information for making proteins.

Proteins are molecules with specific cellular functions

[Figure: transcription factors TF1, TF2, TF3 and TF4]



At any given moment genes may or may not be producing protein



Proteins called transcription factors (TFs) control the level of activation (or “expression”) of each gene.

Genes have regulatory regions, which contain short DNA sequences (or “motifs”) that are recognized by the TFs.

In this way TFs activate or repress gene expression

Gene regulation

Gene expression

Gene regulatory networks

[Figure labels: gene X coding region with TF binding site; gene Y coding region with TF binding site; transcription/translation; intermediate(s); TF X, TF Y, TF Z; activation; activation/repression]



Transcription factors themselves are proteins



They are activated/repressed by other TFs (or by themselves)



In this way they form gene regulatory networks
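
One common way to write such a network down, sketched here in Octave, is a signed matrix in which entry (i, j) records how TF i regulates gene j. The three-gene wiring below (X activates Y, Y represses Z, Z represses itself) is hypothetical and chosen only for illustration.

```octave
% Hypothetical three-gene network (illustrative wiring, not from the lecture).
% A(i, j) = +1: TF i activates gene j
% A(i, j) = -1: TF i represses gene j
% A(i, j) =  0: no direct effect
names = {'X', 'Y', 'Z'};
A = [0  1  0;
     0  0 -1;
     0  0 -1];

% List the regulators of each gene by scanning its column.
for j = 1:numel(names)
  regulators = find(A(:, j) ~= 0)';   % row vector of regulator indices
  for i = regulators
    if A(i, j) > 0
      effect = 'activates';
    else
      effect = 'represses';
    end
    fprintf('%s %s %s\n', names{i}, effect, names{j});
  end
end

z_regulators = find(A(:, 3) ~= 0)';   % indices of the TFs regulating gene Z
```

A non-zero diagonal entry, such as A(3, 3) here, represents a TF regulating its own gene.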

Blood cell differentiation

During blood cell differentiation, GATA-1 and PU.1 are transcription factors that control erythroid and myeloid development, respectively.

The two proteins have been shown to function in an antagonistic fashion, with GATA-1 repressing PU.1 activity during erythropoiesis (red blood cells) and PU.1 repressing GATA-1 function during myelopoiesis (macrophages, etc.)

Where are GATA-1 and PU.1 binding?

ChIP-Seq was used to detect where in the (mouse) genome GATA-1 and PU.1 are binding

Where are GATA-1 and PU.1 binding?

Find 151 myelo-lymphoid genes that are occupied by GATA-1 and PU.1 and that are positively regulated by PU.1 and repressed by GATA-1, for example:

Mathematical modeling


Already known that GATA-1 and PU.1 are mutually antagonistic.

Also known before that PU.1 represses GATA-1 targets.

Last piece of the puzzle: GATA-1 also represses PU.1 targets

Question: What are the consequences of mutual repression of the targets on gene expression dynamics?

Can compare a mathematical model with and without the repression of the targets

Mathematical modeling


A system of four coupled non-linear ordinary differential equations is used to model the GATA-1/PU.1 regulatory network

We manipulated the rate constants to evaluate the different network architectures.

For example, as we increase K_ir, the mutual antagonism between GATA-1 and PU.1 disappears.

Similarly, K_it modulates the cross-regulation of targets

[Figure: network diagram with nodes GATA-1, GATA-1 target, PU.1 and PU.1 target]

Mathematical modeling

We used Matlab to simulate the system
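
The slides do not reproduce the equations, so the following is only a rough sketch of what such a simulation could look like. The Hill-type repression terms, the activation term, the unit rate constants and the values chosen for K_ir and K_it (written Kir and Kit) are all assumptions made for illustration, not the model from the study.

```octave
% Hypothetical GATA-1/PU.1 model: equations and parameter values are
% illustrative assumptions, not those used in the study.
% State vector y = [g; p; gT; pT]:
%   g,  p  = GATA-1 and PU.1 levels
%   gT, pT = levels of a GATA-1 target and a PU.1 target
n  = 2;      % Hill coefficient (assumed)
Ka = 0.5;    % activation threshold for the targets (assumed)

% Repression term: close to 1 when x << K, close to 0 when x >> K,
% so a very large K effectively switches the repression off.
rep = @(x, K) 1 ./ (1 + (x ./ K).^n);
% Saturating activation of a target by its own regulator.
act = @(x) x ./ (Ka + x);

% Production minus first-order decay (all rate constants set to 1).
rhs = @(t, y, Kir, Kit) [rep(y(2), Kir) - y(1); ...
                         rep(y(1), Kir) - y(2); ...
                         act(y(1)) .* rep(y(2), Kit) - y(3); ...
                         act(y(2)) .* rep(y(1), Kit) - y(4)];

y0    = [0.6; 0.4; 0; 0];   % start slightly biased towards GATA-1
tspan = [0 50];             % long enough to approach steady state

% Mutual antagonism plus repression of the opposing targets.
[~, y] = ode45(@(t, y) rhs(t, y, 0.2, 0.2), tspan, y0);
ratio_dual = y(end, 3) / y(end, 4);

% Mutual antagonism only: a huge Kit removes the target repression.
[~, y] = ode45(@(t, y) rhs(t, y, 0.2, 1e6), tspan, y0);
ratio_cross_only = y(end, 3) / y(end, 4);

fprintf('gT/pT with target repression:    %g\n', ratio_dual);
fprintf('gT/pT without target repression: %g\n', ratio_cross_only);
```

Under these assumptions, making Kir very large removes the mutual antagonism in the same way that a very large Kit removes the repression of the targets.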


Mathematical modeling

Systematically modulated

(a) the mutual antagonism between GATA-1 and PU.1

(b) the mutual antagonism between the targets

For every point in the plane we evaluate the steady-state ratio g_T/p_T

The model behavior illustrates that mutual inhibition and repression of opposing downstream targets act synergistically to maximize the g_T/p_T ratio

Systems Biology in practice

These results suggest that the dual mechanism provides, in comparison to either cross-inhibition or target inhibition alone, more robust suppression of an alternative gene expression program during lineage specification.


The example illustrates the highly multidisciplinary nature of much modern biological research, here combining:

1. High-throughput techniques (ChIP-Seq)

2. Data analysis

3. Mathematical modeling to test hypotheses


Many times, the hypothesis might come first from the
mathematical model

Why Matlab and R?


Computational tools are indispensable for doing this kind
of research


In many cases students are held back by lack of
computational skills


Matlab and R are both interpreted languages, i.e. there is no separate compilation step


This typically makes them slower than compiled languages


Both have an enormous number of extension packages


Octave is a free Matlab “clone” and is available for Windows, Mac and Linux


Both languages can be used interactively, but it is more
powerful to write programs.


Matlab


Advantages


Matlab allows one to easily perform numerical calculations and visualize the results.


Many additional libraries for statistics, signal processing, image
processing, etc.


Note: Matlab has a Symbolic Math Toolbox; core Octave does not


Disadvantages


Slow, but this can be improved via vectorization


Matlab is not well suited to complex software projects (limited object-oriented support)
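
As a small illustration of vectorization, the sketch below computes the same sum of squares twice: once with an explicit loop and once with a single array expression. The variable names are arbitrary.

```octave
% Sum of squares of 1..N, computed two ways.
N = 1000;
x = 1:N;

% Loop version: one interpreted operation per element.
total_loop = 0;
for k = 1:N
  total_loop = total_loop + x(k)^2;
end

% Vectorized version: one expression operating on the whole array.
total_vec = sum(x .^ 2);

fprintf('loop: %d, vectorized: %d\n', total_loop, total_vec);
```

For large arrays the vectorized form is usually much faster, because the per-element work happens inside compiled library code rather than in the interpreter.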

Octave download and libraries

To download octave for your home PC or laptop, go to:
http://octave.sourceforge.net/

To install a package, from within octave, run:

pkg install package_file_name.tar.gz

For a list of packages, choose “Packages” from the top menu:
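
Related pkg subcommands are sketched below. The package name "statistics" is used only as an example, and the install and load lines are left as comments because they need an internet connection and an installed package respectively.

```octave
% Packages hosted on Octave Forge can usually be installed by name
% (needs an internet connection); "statistics" is only an example:
%   pkg install -forge statistics
%
% An installed package must be loaded before its functions can be used:
%   pkg load statistics

% List the packages currently installed on this machine.
installed = pkg('list');
for k = 1:numel(installed)
  fprintf('%s %s\n', installed{k}.name, installed{k}.version);
end
```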

Course outline


1. Learning to program in Matlab/Octave


2. Applications in Mathematical Biology, including:


Elementary image processing


Linear regression


Markov processes and the Fisher-Wright model


Difference equations


Ordinary differential equations


3. R programming


4. Statistics and Bioinformatics using R


Linear models


Statistical hypothesis testing and linear models


Expression data analysis


Analysis of high-throughput sequencing data

Further reading


There isn’t yet a good Systems Biology textbook that I’m
aware of.


I do not recommend this one


Further reading


“MATLAB Programming for Engineers” by Stephen J. Chapman (Brooks/Cole)

“Mathematical Models in Biology” by Elizabeth S. Allman and John A. Rhodes (Cambridge Univ Press)

“Introductory Statistics with R” by Peter Dalgaard (Springer)

“Bioinformatics and Functional Genomics” by Jonathan Pevsner (Wiley)

Octave


To start octave, open a terminal window and enter the
command “octave”


Octave basics


Getting help


Within octave type


help <command>, e.g. “help sort”


User-friendly online help available at
http://www.mathworks.com/help/techdoc/


GNU Octave help:

http://www.gnu.org/software/octave/doc/interpreter/

Octave basics


Files and directories


A MATLAB script file (called an M-file) is a plain-text (ASCII) file that contains one or more MATLAB commands and, optionally, comments.


The file is saved with the extension ".m".


When the filename (without the extension) is issued as a
command in MATLAB, the file is opened, read, and the
commands are executed as if input from the keyboard.


Download the file calc_area.m from the course website


http://www.ams.stonybrook.edu/~maccarth/teaching.shtml


Place the file in subdirectory “work”
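
The contents of calc_area.m are not reproduced in these notes. A minimal script of this kind might look like the following; the variable names and the starting radius are assumptions, and the actual course file may differ.

```octave
% calc_area.m (hypothetical contents; the actual course file may differ)
% Compute the area of a circle from its radius.
radius = 2.5;               % assumed starting value
area   = pi * radius^2;     % area of a circle: pi * r^2
fprintf('radius = %g, area = %g\n', radius, area);
```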

MATLAB Script Files


The preceding file is executed by issuing a MATLAB command:

>> calc_area

This single command causes MATLAB to look in the current directory and, if a file calc_area.m is found, open it and execute all of the commands.


If MATLAB cannot find the file in the current working
directory, an error message will appear.

MATLAB Script Files


When the file is not in the current working directory, a cd or chdir command may be issued to change the directory.

>> cd ~/work

>> calc_area


Octave basics


The search path


Matlab/Octave also uses a search path to find M-files

The M-files are organized in directories, which Matlab searches


To add a directory to the search path:


addpath('<directory_name>'); e.g. addpath('~/work')

savepath;


You should now be able to run calc_area.m even if it is not in
your current directory; simply type:


calc_area
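
The effect of the search path can be checked with a throwaway example. The sketch below creates a temporary directory containing a one-line script, adds the directory to the path, and runs the script by name. The script name demo_area and the variable demo_result are made up for this demonstration, and savepath is deliberately not called so that nothing is changed permanently.

```octave
% Create a throwaway directory containing a one-line script.
% The directory and the script name demo_area exist only for this demo.
demo_dir = tempname();
mkdir(demo_dir);
fid = fopen(fullfile(demo_dir, 'demo_area.m'), 'w');
fprintf(fid, 'demo_result = pi * 3^2;\n');
fclose(fid);

addpath(demo_dir);               % directory is now on the search path
on_path = exist('demo_area');    % 2 means "found as a file"
demo_area;                       % runs the script from any directory
disp(demo_result);

rmpath(demo_dir);                % undo the change to the path
```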


Now open the file calc_area.m with ‘gedit’ (Applications → Accessories → Text Editor)

Change the radius to 3 and re-run ‘calc_area’