

Oct 1, 2013 (4 years and 9 months ago)



Paradigms in Bioinformatics

Midterm #2

Alexandru Cioaca

While prokaryotes are simple, mostly unicellular organisms concerned with basic interactions
with the environment, eukaryotes have a much more complex organization, with cells that
specialize in different functions and self-assemble in higher-order structures such as tissues
and organs. By employing these varied functions in a more or less intentional fashion, the
cells synergize to build upon the basic needs of survival and reproduction. They benefit from
the apparent advantage of a larger set of skills through which they can interact with other
cells, individuals, species or the environment.

However, almost all the cells in a eukaryotic organism contain a copy of the DNA molecule of
the individual. In other words, each one of these cells contains a copy of the genome, so it has
immediate access to all the biological information regarding the particular individual in its
totality. This information is structured in basic units known as genes, one gene being nothing
more than a specific section of the DNA, encoded as the order in which nucleotides are laid out
in linear sequence. Through biochemical reactions, this information is used to synthesize
proteins (mostly), which accomplish various tasks inside the organism in order to support the
processes of life.

But if each body cell contains the information not only for all that the cell is designated to
do but also for what different types of cells are designated to do, then how do cells
differentiate in the first place and how do they know what subset of genes has to be active for
each cell type? At the same time, it is very intuitive that gene expression doesn’t happen all
the time for every gene. While it is true that there are some genes (called housekeeping genes)
responsible for the permanent cycle of routines sustaining metabolism, it is also true that
there are genes that get expressed only under particular circumstances, usually when their
products are really needed inside the cell. This suggests that, at the cellular level, there must
exist mechanisms regulating the activity of genes. The activity of these mechanisms falls under
the broad term of “gene regulation”.

If life can be seen as a complex system of biochemical processes consisting, at the lowest
level, of using genomic information to act in a specific manner, then gene regulation consists
in controlling these processes and the way they are carried out.

Since the pathway of gene expression has several steps, it is useful to review their order
for a better understanding of where gene regulation can intervene. The DNA molecule resides in
the nucleus of the cell and contains all the genes. When a particular gene is about to be
expressed, a temporary copy of its information is created in the form of an RNA molecule (more
specifically, mRNA). This process is called “transcription” and takes place in the nucleus. The
mRNA molecule is then processed through a couple of other chemical reactions responsible for
improving its robustness. Then, the mRNA moves from the nucleus to the cytoplasm and, based on
the information transported from the nucleus, it instructs the ribosomes to synthesize chains
of amino acids called polypeptides, which bind to form the protein product. This process is
called “translation”. The pathway can be topologically extended both at the beginning and at
the end. At the end, because the protein product is again subject to transportation or various
other reactions which might stop it from accomplishing the task it is meant for. At the
beginning, because the DNA molecule, although identical in information in each body cell
throughout the organism, differs from cell to cell in the set of genes that are active.
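
The two central steps of this pathway can be illustrated with a small sketch. It is only a toy
model: the coding sequence is hypothetical, the codon table is a tiny subset of the standard
genetic code, and mRNA processing and transport are left out entirely.

```python
# Toy model of the pathway described above: DNA -> mRNA -> polypeptide.
# The gene sequence and the reduced codon table are illustrative assumptions.

# A small subset of the standard genetic code (mRNA codon -> amino acid).
CODON_TABLE = {
    "AUG": "Met",  # start codon
    "UUU": "Phe",
    "GGC": "Gly",
    "GCU": "Ala",
    "UAA": None,   # stop codon
}

def transcribe(coding_strand: str) -> str:
    """Copy the coding DNA strand into mRNA by replacing T with U."""
    return coding_strand.upper().replace("T", "U")

def translate(mrna: str) -> list[str]:
    """Read codons from the first AUG until a stop codon is met."""
    start = mrna.find("AUG")
    if start == -1:
        return []  # no start codon: nothing is translated
    peptide = []
    for i in range(start, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE.get(mrna[i:i + 3])
        if amino_acid is None:  # stop codon, or a codon outside the toy table
            break
        peptide.append(amino_acid)
    return peptide

gene = "ATGTTTGGCGCTTAA"  # hypothetical coding sequence
mrna = transcribe(gene)
protein = translate(mrna)
print(mrna)
print(protein)
```

Each regulatory mechanism discussed in this essay can be read as a condition that decides
whether, and how often, one of these two functions gets called for a given gene.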


The steps in gene expression where control mechanisms can intervene are:

Pre-transcription (before transcription is initiated: the set of active genes)

Transcription (copying DNA into mRNA might be blocked through certain mechanisms or might
require some auxiliary cellular activity)

Post-transcription (mRNA might not be robust enough to make it to the ribosomes)

Translation (the structure of mRNA might not be suitable for use by ribosomes)

Post-translation (the protein product might be obstructed from pursuing its actions)

We can say there are two big categories of control mechanisms: some that facilitate or
induce a certain event and some that deny or inhibit a certain event.

An important observation to make is that we cannot consider as control mechanisms those various
breakdowns in the intermediary steps of protein synthesis that have a temporary or
non-deterministic origin, such as random faults or insufficient resources. Regulation implies
steadiness: under similar circumstances, similar results have to be obtained, with little to no
variation acceptable. Regulation does not happen by chance but through means which are as
deeply embedded in the organism as the processes under control. At the same time, this tells us
that regulation is a product of evolution as well, so it appeared through the same natural
principles of hit-and-miss, from mutation to mutation, until it served an advantage in survival.

What follows is a list of gene expression regulatory mechanisms in chronological order, from
pre-transcription to post-translation.


Chromatin structure

In eukaryotic organisms, the DNA strand can reach a length of 2 meters, but it has a diameter of
molecular scale. In order for it to fit inside the microscopic nucleus of cells, it is packaged
by being arranged around molecules called histones in a tight thread-and-spool fashion. These
structural units are called “nucleosomes” and are, in turn, tightly bound in structures known
as chromosomes. Studies done on DNA in vitro have shown that it makes a difference whether the
binding sites of a gene are to be found wrapped around histones or suspended between them.
Exposed sites allow certain proteins to bind to them, proteins which play an important role in
transcribing DNA to RNA. Two relevant examples are the transcriptional activator protein (TAP)
and the TATA-binding protein (TBP). In the attached figures we can see the unfavorable case
where the binding sites of these two proteins are inaccessible. However, there are several
multiprotein complexes called chromatin-remodeling complexes (CRC) that can be employed in
order to move DNA off the nucleosomes so that the sites for TAP and TBP become accessible. This
is shown in the attached figures. Once this happens, the transcription factor can attach close
to the gene represented by these sites and is joined by RNA polymerase, which transcribes the
particular gene. The different distributions of DNA onto the nucleosomes are hard to predict
from one cell to another, but due to the tight compression of DNA, it is very likely that there
will be genes whose binding sites will not be accessible. In this case, CRCs act as regulators
that make the genes transcribable. Since CRCs are nothing but protein complexes, they are
synthesized from information contained in other genes.


Altering of DNA structure

Sometimes, changes occur in the DNA sequence of somatic cells and are transmitted to their
descendants. These changes are programmed and are either deletions or transpositions. They
have direct effects on gene expression, since the sequence of nucleotides is no longer the
same. An example of programmed deletions takes place in the bone marrow-derived cells (B cells)
and thymus-derived cells (T cells) of vertebrate immune systems. They have complementary roles:
B cells produce antibodies that mark antigens for destruction, while T cells recognize this
mark and prevent them from entering the cell. Each B cell is able to synthesize only one type of
antibody and it has been discovered that this is the expression of a particular gene. However,
each one of the particular genes responsible for synthesizing antibodies was found to be a
subsequence of a longer initial sequence. This long sequence is cut and joined after mitosis in
reaction to the encounter with a type of antigen. One of the most common antibodies in the
organism is immunoglobulin G, and its structure was found to resemble the letter Y. When
targeting different antigens, most of its structure was chemically similar, except for its upper
ends. Their configuration was proved to be due to the way programmed deletions occur, which
bring the correct type of DNA sequence, associated with the antibody in question, nearer to the
constant part of the gene.

Another example of altering DNA structure is that of programmed transpositions in regulating
yeast mating type. This organism has two mating types, a and α. The difference stands only in
phenotype, as studies have revealed that the genotype contains biological information about
both mating types in the form of interchangeable cassettes. Through DNA rearrangement, yeast
can switch to either a or α in the lineage of a particular cell and mate from this point on.


Alternative promoters

There are genes that have more than one associated promoter. From the same protein-coding
regions, depending on which one of the promoters is active, different transcripts can be
obtained. In this case, the control mechanism is the active promoter. Its active status is
determined in the cell cycle. For example, the gene for alcohol dehydrogenase in Drosophila
uses one promoter in the larval stage and another one in the adult stage. This is a
fascinating and elegant solution, comparable to that of dynamic pointers in high-level
programming languages.


Epigenetic control

Epigenetic is a term that means “on genes”. It refers to a type of control over gene expression
that is not caused by altering the sequence of bases in DNA, but by an external factor that
prevents a particular sequence from being read (transcribed) for what it is supposed to be. One
example is the addition of a methyl (CH3) group to the number 5 carbon atom in the cytosine
base. This process is called methylation and it causes a lower transcription rate of the
methylated sequence. Another type of epigenetic mechanism refers to specialized proteins that
bind at a particular sequence of the DNA molecule with the same effect on the transcription
rate. Heavy methylation is associated with the inactivation of genes in the X chromosome, for
example. As cells undergo division in females, there is a moment in the cell lineage where one
of the X chromosomes becomes inactive and all descendants of that cell will inherit this
particularity. Another example is that of “genomic imprinting” in mammals, where hundreds of
genes are heavily methylated in the germ line, but in a different fashion from male to female.
This is retained throughout embryonic development but it can be reversed later on in
development. Various theories suggest that this is a parental conservation instinct at the
expense of the fetus, so there is a balance in the exchange of resources between the mother and
the progeny.


Transcriptional initiation

The initiation of transcription is probably the most used regulatory strategy. Transcription
takes place in the nucleus and produces an mRNA molecule that carries information about a gene
encoded in the DNA molecule to the cytoplasm. This is achieved by RNA polymerase, which copies
a sequence of nucleotides into mRNA. Thus, RNA polymerase has to know where the sequence of
interest is located and when to start copying it. The latter issue usually involves proteins
known as inducers and repressors. Inducers correspond to positive regulation and activate
transcription of a certain gene. Without the activity or presence of an inducer, the gene is
inactive. When the cell needs the gene to be expressed, a chain of events unfolds so that the
inducer attaches at a location close to the gene (upstream), which signals the start of
transcription. If the gene is constantly active, then it is probably regulated negatively,
through proteins called repressors that bind upstream (by themselves or along with a protein
complex) and disable the expression of the gene until further events deem it necessary to
recommence.

The most important factor in transcriptional initiation is a protein called transcriptional
activator protein (TAP). This binds upstream of the gene and recruits the transcription complex
which, in turn, triggers the recruitment of RNA polymerase. Transcriptional activator proteins
are mostly gene-specific; their action can be negatively regulated by proteins that bind to
them and block the transcription complex. Some categories of TAPs are helix motifs and zinc
fingers. As there are two types of regulation, positive and negative, this implies a large
variety of possibilities for interacting with the environment. A protein product that is
required in special circumstances could be associated with a gene that is normally inactive.
The lack of the protein product might negatively regulate another gene that is responsible for
producing the TAP which enables the transcription of the gene associated with our missing
protein product. Another plausible scenario of regulation deals with synthesizing products that
defend against a high concentration of an unwanted molecule in the cell. When the unwanted
molecule is present, a normally inactive gene might be positively regulated by the intruder and
its associated product will start being produced.

Another class of regulatory mechanisms are DNA sequences found at a variety of locations
around genes, called enhancers and silencers. As the names suggest, their molecular structure
is designed to either hasten or strengthen transcription (enhancers) by bonding with the
transcriptional complex or, on the contrary, prevent transcription (silencers).
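
The positive and negative regulation described in this section reduces, in its simplest form,
to two logical rules. The sketch below is a deliberately crude on/off abstraction; the function
names are hypothetical, and real regulation is graded rather than strictly binary.

```python
# Minimal on/off abstraction of positive and negative regulation.
# Function names are illustrative; binding is treated as a simple boolean.

def positively_regulated(inducer_bound: bool) -> bool:
    """Normally inactive gene: transcribed only while an inducer is bound."""
    return inducer_bound

def negatively_regulated(repressor_bound: bool) -> bool:
    """Normally active gene: transcribed unless a repressor is bound."""
    return not repressor_bound

def defense_gene_active(unwanted_molecule_present: bool) -> bool:
    """Scenario from the text: the unwanted molecule itself acts as the inducer."""
    return positively_regulated(inducer_bound=unwanted_molecule_present)

for bound in (False, True):
    print(f"regulator bound={bound}: "
          f"positive -> {positively_regulated(bound)}, "
          f"negative -> {negatively_regulated(bound)}")
```

Enhancers and silencers would turn these booleans into rates, which is why an analog
description is needed in addition to the digital one.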


Transcript Processing

Transcription of the same gene under the same promoter can still yield different mRNA
molecules. This is due to an important feature of the genome called “alternative splicing”.
Since most eukaryotic genes are non-contiguous blocks of coding sequences of base pairs, the
first draft of mRNA contains two types of sequences: exons, which give the final form of mRNA,
and introns, which are removed. However, by alternating the selection of exons and introns in
the post-transcription processing of mRNA, the cell can come up with more than one expression
from the same gene. For example, the 30000 human genes can encode 64000 to 90000 proteins,
based on this alternation. Thus, gene expression can be regulated to keep certain sections of
the initial mRNA molecule and discard others. This is governed by decisional factors from
within the cell as it processes the mRNA in order to obtain sequences that are viable for
translation. These decisional factors are means to regulate gene expression and act through
the same biochemical algorithms developed by evolution.
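
To see how one gene can yield several transcripts, the sketch below enumerates the mature
mRNAs obtainable from a hypothetical four-exon gene. It assumes only the simplest form of
alternative splicing (internal exons are either kept or skipped, in their original order, while
the first and last exons are always retained); the exon sequences are made up.

```python
from itertools import combinations

def splice_variants(exons):
    """Enumerate transcripts that keep the first and last exon and any
    order-preserving subset of the internal exons (exon skipping only).
    Expects at least two exons."""
    first, *internal, last = exons
    variants = []
    for k in range(len(internal) + 1):
        for kept in combinations(internal, k):
            variants.append("".join([first, *kept, last]))
    return variants

# Hypothetical exons of one gene, introns already removed.
exons = ["AUGGCU", "UUU", "GGC", "UAA"]
variants = splice_variants(exons)
for v in variants:
    print(v)
print(len(variants), "possible transcripts from one gene")
```

With n optional internal exons this scheme gives 2^n transcripts, which shows why the number
of proteins can exceed the number of genes.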


RNA Transport

Once DNA has been transcribed into mRNA and this has been processed for translation, mRNA
heads towards ribosomes in the cytoplasm for translation. Regulation factors have been found
that can stop it on its way, RNA interference being one of them. RNA interference works
through small RNA molecules that can cleave mRNA in non-translational sequences or even block
it from being translated by the ribosome. These molecules are of two types: small interfering
RNA (siRNA) and micro RNA (miRNA). They are produced in the cytoplasm from a special molecule
called double-stranded RNA and are first chopped into even smaller sequences by the Dicer
enzyme. These cleavage products are recruited by an RNA-induced silencing complex protein
(RISC) and target mRNA with complementary sequences. Their effect on the mRNA is different:
RISC with siRNA cleaves mRNA, while miRNA attaches to it and prevents its translation.


Transcript Stability

The mRNA molecule has a lifetime of about 3 hours in most eukaryotes and it is meant to be
translated in the same cell. This is due to the fact that each cell differentiates through the
active set of genes that describe the function of that cell. Under special circumstances, this
rule does not stand and the mechanisms that ensure a certain destination and length of life for
the mRNA molecule are overridden. An example occurs in newly fertilized eggs, whose metabolism
translates preexisting cytoplasmic mRNAs transcribed by the mother. This is definitely not
common practice in mature organisms. For example, the way this becomes possible in Drosophila
is through the elongation of the poly-A tail of the mRNA. Another relevant example is that of
silkworm fibroin mRNA. During cocoon formation, the silk gland synthesizes silk fibroin in
large amounts. There are three factors controlling this unusual behavior: cells become highly
polyploid, accumulating a large number of chromosomes and hence copies of the silk gene; the
promoter of this gene is strong and enhances the rate of transcription; and the transcribed
mRNA is very stable, with a lifetime of days.

At the same time, there are factors that can speed up the degradation of mRNA. One of them is
the deadenylation-dependent pathway, through which an enzyme trims the length of the poly-A
tail of the mRNA, which makes it susceptible to a decapping enzyme that removes the 5’ cap.
Without it, the mRNA is unable to initiate translation and is rapidly degraded by exonucleases.
The other one is called the deadenylation-independent pathway, which either decaps or cleaves
mRNA. These regulation mechanisms are useful to prevent the synthesis of incomplete
polypeptides in the cell.
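
The effect of transcript stability on how long a message remains available can be sketched
numerically. The model below is an assumption, not something stated in the text: it treats
degradation as a first-order process and interprets the quoted lifetime as a mean lifetime, so
the surviving fraction after time t is exp(-t / lifetime). The 72-hour value for a very stable
transcript is a hypothetical stand-in for “a lifetime of days”.

```python
import math

def fraction_remaining(hours: float, mean_lifetime_hours: float) -> float:
    """Fraction of an mRNA population left after `hours`, assuming
    first-order (exponential) decay with the given mean lifetime."""
    return math.exp(-hours / mean_lifetime_hours)

# Typical transcript (about 3 hours) versus a very stable one (assumed 72 hours).
typical = fraction_remaining(6, 3)
stable = fraction_remaining(6, 72)
print(f"after 6 h: typical mRNA {typical:.3f}, stable mRNA {stable:.3f}")
```

Under this assumption, shortening or lengthening the lifetime (for example through the poly-A
tail) directly changes how many protein copies a single transcript can give rise to.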


Initiation of Translation

Translation is the process through which mRNA is used by the ribosome to synthesize the
polypeptides that compose the protein. This process takes place outside of the nucleus and it is
independent from transcription. Eukaryotes can regulate gene expression at this level too. The
two basic types of regulation that can be imposed here are the obstruction or facilitation of
the translation of mRNA and the control of the rate at which proteins are produced. In contrast
with the examples given above in the case of transcript stability, here we are referring to a
regular messenger RNA transcript, but with an intensification or relaxation of the translation
process. The most interesting example of regulation at this level is given by recently
discovered small regulatory RNA molecules complementary in sequence with mRNA. These are called
“antisense RNA” and they act by pairing with mRNA over short sequences, the consequence being
either inhibition or activation of translation. An example of inhibition can be found in
E. coli, through the OxyS regulatory RNA. This molecule has the ability of binding at critical
sites of its target mRNA, rendering it unable to bind with the ribosome. On the other hand, the
DsrA regulatory RNA activates the translation of a gene responsible for encoding a sigma factor
for RNA polymerase that allows transcription of a new set of RNAs from a special set of
promoters at stationary phase in cell cultures, when the cell density is high and the intensity
of cell proliferation is low. The 5’ end of this gene’s mRNA is self-complementary and curls
into the shape of a hairpin, trapping the ribosome-binding site and the translational start
site. These sites become exposed under the effect of DsrA on the mRNA, so translation can be
initiated.

See attached figures.



After the protein has been synthesized by the ribosome in the form of a series of
polypeptides (chains of amino acids), its functions can be extended through further chemical
modifications consisting in joining other molecules to it, cleaving its structure at different
sites or changing some of its amino acid groups. These operations are usually performed by
specialized enzymes, which can be considered the control mechanism active at this level. One of
the organelles responsible for this type of regulation is the Golgi apparatus.


Protein Stability

Some proteins degrade faster than others. The rate at which they decay can be a consequence
of external molecules acting upon them, for example to regulate an excess of the protein in
question or a fault which caused a protein to be active in the wrong cell. Another factor of
decay can be embedded in the protein, in the form of amino acid sequences that break down more
easily over time. This means that once a protein is synthesized, there are still ways of
controlling its stability.


Protein Transport

The last step in expressing a gene consists in transporting the protein to its designated
location. Responsible for the displacement of proteins are, obviously, other proteins, called
carrier proteins. The most challenging transport occurs through the cellular membrane,
between two neighboring cells. This also hints at the possible reasons why protein transport
should be controlled: there has to exist a mechanism to check outgoing or incoming proteins
and make sure they are eligible for transport. Since these kinds of verifications can only occur
from a biochemical point of view, the structure of the carrier proteins enables them to verify
the compatibility of the protein to be transported with its destination.

A gene regulatory network (GRN), as the name implies, represents a set of genes
responsible for influencing the expression of another gene whose product is required by
the cell. The term “network” suggests that the effect they have on the gene to be expressed is
similar to that of a network of on-off switches and potentiometers, so both digital and analog.
The scientific approach towards studying and modeling GRNs employs mathematical
concepts such as graph theory and combinatorial logic.
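
The “on-off switch” view can be made concrete with a small Boolean network, one common way of
applying combinatorial logic to GRNs. The three genes and their update rules below are entirely
hypothetical, and all genes are updated synchronously, which is a modeling simplification; the
analog (“potentiometer”) aspect is not represented.

```python
# Hypothetical three-gene Boolean network:
#   C represses A, A activates B, and C needs both A and B to be expressed.

def step(state: dict) -> dict:
    """Compute the next on/off state of every gene from the current one."""
    return {
        "A": not state["C"],
        "B": state["A"],
        "C": state["A"] and state["B"],
    }

def simulate(state: dict, n_steps: int) -> list:
    """Return the list of states visited, starting with the initial state."""
    trajectory = [state]
    for _ in range(n_steps):
        state = step(state)
        trajectory.append(state)
    return trajectory

initial = {"A": True, "B": False, "C": False}
trajectory = simulate(initial, 5)
for t, s in enumerate(trajectory):
    print(t, {gene: int(on) for gene, on in s.items()})
```

Even this tiny network shows feedback: the product of C eventually switches off A, which in
turn switches off B and C, after which A can be expressed again.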


At a basic level, a GRN can be thought of as a black box that exerts one specific action of
control on the gene to be expressed, an action that can be seen as the resultant action of all
the genes that are part of the network. But looking at what happens at the molecular level, the
situation is far more complicated. The individual regulatory genes come into play at different
times and determine different characteristics of the protein synthesis process. Some of them
are interconnected as adjacent units, where the product of one gene directly communicates
(reacts) with the next one in line, while some of them can be considered on distant branches of
the network; they work in parallel and appear to be independent, but their products cumulate
after other conditions (part of the same GRN) are met.

Considering the fact that gene expression has the ultimate single goal of providing a
finite product in the form of a protein, a GRN can be seen as a converging network of
regulatory genes (like a funnel towards the final product), orchestrating the intermediate
steps by enabling or disabling, amplifying or attenuating certain biochemical reactions. Most
definitions of GRN place their action in the transcriptional scope, as the GRN determines when
and how much RNA is transcribed for synthesis in the ribosome. However, considering the
various types of gene regulation presented above, an even larger scope of GRN extends to all
the steps undertaken by the cell towards protein synthesis and usage.

Just as genes vary greatly in DNA sequence and proteins in their structure, GRNs can
come in different forms. Their common features are those given by their general role of
regulation. First of all, GRNs need to be able to read the features of interest in the
environment (cell, tissue, etc.) through input signals. These input signals could be the
concentration of a particular molecule, such as proteins and hormones. Then, GRNs need to be
able to generate the appropriate output signals through which they influence the outcome of the
target gene expression process. Since we are working in the same molecular context, it comes as
no surprise that these output signals are molecules as well, mostly proteins. It is interesting
to note, from an engineering perspective, that this type of communication is neither
asynchronous nor synchronous and that it involves neither closed nor open channels. The cell is
functioning so well exactly because there are no constraints imposed on communication. Input
signals are received from the “wild” and generate the release of output signals into the
“wild”; other similar mechanisms are responsible for transporting this output signal to its
place of action, where it will act as a decisional factor. Also, like all regulatory structures,
GRNs need a feedback loop, which can be seen as nothing more than input signals (from the GRN
perspective) that were generated by the environment after the GRN started taking action. As we
can see, input, output and feedback signals are not that different in concept, all of them
being molecules.
It is their particular structure that makes the difference, but their particular structure is
more than a symbol or a tag; it is also the actual function they are designated for. We are
literally talking about a permanent circle of life, where nothing is enforced or requested,
except that everybody plays a small part: binding with the molecules with which it is
compatible and treating them (or the event of binding) as an input signal; based on this, other
molecules are released that will serve as output signals and influence the expression of other
“coworkers”. Only when we place cellular life in an abstract framework and start separating
genes into regulatory and structural genes are we able to see causalities.

The relation of gene regulatory networks with the gene regulation mechanisms presented above
is the fact that GRNs can employ any of them as nodes of the network. Moreover, as we can see
from above, most of these mechanisms contain more than one step, so we can say that they are
gene regulatory networks as well. Taking the most trivial example of regulation, we could have
a housekeeping gene that is always active and whose biochemical pathway isn’t influenced by the
action of any other regulatory gene. Even in this case, we still have more than one regulatory
mechanism involved, as transcription has to be started by a transcriptional activator protein
which is the expression of another gene, and mRNA has to be processed and then translated into
a protein. But if we choose to consider this as not being a network, we can see that there are
no other regulatory mechanisms as straightforward; they all increase the complexity of the
pathway, so under the assumption we made, there is more than one gene contributing to the final
product. In my opinion as a control engineer, “gene regulatory networks” is nothing more than
the appropriate term for what actually happens during gene expression. It is true, however,
that we can establish degrees of complexity in these networks, based on the number of
participant genes (number of nodes) and the interactions between them (number of arcs). For
example, it would be unfair to place under the same category housekeeping genes and the complex
system of genes controlling early development in embryos. The threshold between simple networks
and complex networks is debatable, so approaches such as complexity theory and chaos theory
could probably also be suitable for their study, and they would also eliminate the bias of
classifying some of the genes as structural and some as functional, which is only a matter of
perspective.
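
The complexity measure suggested above (number of nodes and number of arcs) is easy to compute
once a network is written down as a directed graph. The two networks below are made-up
examples, with each regulator gene mapped to the set of genes it influences.

```python
def network_size(grn: dict) -> tuple:
    """Return (number of nodes, number of arcs) of a directed regulatory graph
    given as {regulator: set of target genes}."""
    nodes = set(grn)
    for targets in grn.values():
        nodes.update(targets)
    arcs = sum(len(targets) for targets in grn.values())
    return len(nodes), arcs

# Hypothetical networks: a housekeeping gene driven by a single activator,
# and a small developmental network with feedback.
housekeeping = {"TAP_gene": {"housekeeping_gene"}}
developmental = {
    "g1": {"g2", "g3"},
    "g2": {"g3", "g4"},
    "g3": {"g1"},
    "g4": set(),
}

print("housekeeping:", network_size(housekeeping))
print("developmental:", network_size(developmental))
```

Where to draw the line between a simple and a complex network on such counts remains, as noted
above, a matter of debate.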

The two papers studying chromatin structures and gene regulation I chose are:

Mechanism of Protein Access to Specific DNA Sequences in Chromatin: A Dynamic Equilibrium
Model for Gene Regulation
K. J. Polach and J. Widom
J. Mol. Biol. (1995) 254, 130

High-throughput mapping of the chromatin structure of human promoters
Fatih Ozsolak, Jun S Song, X Shirley Liu, David E Fisher
Nature Biotechnology, Vol. 25, No. 2, Feb. 2007

The first paper deals with the problem of certain DNA sequences tightly wrapped around the
nucleosomes not being accessible to transcription regulatory proteins, although they are still
transcribed. The authors present three alternatives as an answer: proteins bind before DNA is
packaged in chromosomes, the DNA sequences of interest are never actually packaged, or there
are mechanisms of active invasion in order to transcribe those sequences. They raise
objections to all three models. For the first one, cells that are prevented from replicating
their DNA still undergo transcription. The second one is dismissed through physical
considerations, as there is nothing in the structure of DNA that could enforce the same
distribution along the nucleosomes in each cell of the organism. And for the third one, the
problem is that it lacks an explanation for how proteins are able to target the right
nucleosome.

The authors try to extend the third model by considering the nucleosomes as dynamic
structures that temporarily expose stretches of their DNA. They use an approach based on
mathematically modeling the kinetics of nucleosomes and trying to prove the correctness of
their model through the laws of energy conservation. In parallel, they perform experiments in
vitro with sea urchin cells by replacing the regulatory protein with a restriction enzyme and
engineering nucleosomes with sites for the enzyme (E). In this way, they expect to detect
through gel electrophoresis the effect of the enzyme on the nucleosomes. From their experiments
they observe that all restriction sites are cleaved, hence accessible. At the same time, they
estimate an equilibrium constant that happens to quantify how deep inside the nucleosome the
restriction site is located; the further it is from exposure, the more energy is needed for
dislocating it. The authors conclude that their assumption is true and the model is a good
approximation of the underlying mechanics. Nucleosomes are not static and temporarily expose
the binding sites needed for regulatory proteins. These proteins use that short window of time
to attach to the promoter and they recruit other protein complexes which help displace the
nucleosomes for proper transcription.
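
The link between an equilibrium constant and the energy needed for exposure can be illustrated
with the standard thermodynamic relation ΔG° = -RT ln K. This is a generic textbook formula,
not necessarily the exact formulation used in the paper, and the equilibrium constants below
are hypothetical values chosen only to contrast a site near the edge of the nucleosome with a
deeply buried one.

```python
import math

R_KJ_PER_MOL_K = 8.314e-3  # gas constant in kJ/(mol*K)

def exposure_free_energy(k_eq: float, temperature_k: float = 298.15) -> float:
    """Standard free energy (kJ/mol) of site exposure, from dG = -R*T*ln(K)."""
    return -R_KJ_PER_MOL_K * temperature_k * math.log(k_eq)

# Hypothetical equilibrium constants for site exposure.
edge_site = exposure_free_energy(1e-2)     # site near the nucleosome edge
buried_site = exposure_free_energy(1e-5)   # site deep inside the nucleosome
print(f"edge site:   {edge_site:.1f} kJ/mol")
print(f"buried site: {buried_site:.1f} kJ/mol")
```

A smaller equilibrium constant translates into a larger free-energy cost, matching the
observation that sites further from exposure need more energy to become accessible.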

The second paper tries to address the problem of observing the motion of nucleosomes in
experiments. The authors present a high-throughput microarray approach and an analysis
algorithm for examining nucleosome positioning in promoters of 3600 human genes. First, they
perform an in vivo footprinting experiment on the DNA molecule in human cancer cells and
hybridize isolated nucleosomal and input DNA in the microarray. They study the data using
signal-processing techniques such as wavelet decomposition against high-frequency noise and
edge detection for curve profiling. These curves have oscillatory shapes and, based on the peaks
of the oscillations, the location of each nucleosome is inferred. Then they focus on
extrapolating from their data the locations of transcription factors and discover these to be
mostly between nucleosomes. The paper was interesting to read as it involved the use of signal
processing and statistical analysis.
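
The idea of inferring nucleosome positions from the peaks of an oscillating signal can be
sketched as follows. This is not the authors' algorithm: a plain moving average stands in for
their wavelet-based denoising, a simple local-maximum test stands in for their edge detection,
and the signal is synthetic.

```python
import math

def moving_average(signal, window=3):
    """Smooth the signal with a centered moving average (shorter at the ends)."""
    half = window // 2
    smoothed = []
    for i in range(len(signal)):
        lo, hi = max(0, i - half), min(len(signal), i + half + 1)
        smoothed.append(sum(signal[lo:hi]) / (hi - lo))
    return smoothed

def find_peaks(signal):
    """Return the indices of interior local maxima."""
    return [i for i in range(1, len(signal) - 1)
            if signal[i - 1] < signal[i] >= signal[i + 1]]

# Synthetic hybridization signal: one oscillation every 20 probes,
# with maxima placed at probes 10, 30 and 50.
signal = [1 + math.sin(2 * math.pi * (i - 5) / 20) for i in range(60)]
peaks = find_peaks(moving_average(signal))
print("inferred nucleosome centers at probes:", peaks)
```

On real microarray data the choice of smoothing method and window would matter a great deal;
here it only serves to show the step from an oscillating curve to discrete positions.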