BIOINFORMATICS IN THE POST-GENOMIC ERA




BOGAZICI UNIVERSITY

BM 591 BIOINFORMATICS IN THE POST-GENOMIC ERA

I. 2003-2004 Fall Semester Syllabus

Instructor: Isil Aksan-Kurnaz

aksan@boun.edu.tr
iakurnaz@yeditepe.edu.tr
http://hamlin.cc.boun.edu.tr/~aksan



Week 1    22 Sep

Course Introduction and organization

Introduction to basic Molecular Biology

Overview of Bioinformatics Topics:

    DNA sequencing
    Introduction to Molecular Biology Databases
    BLAST: Basic Local Alignment of Sequences
    CLUSTALW: Multiple Alignment of Sequences
    Protein sequencing
    Gene regulation and transcriptomics
    GeneScan and GeneFinder
    PromoterScan
    TRANSFAC Database
    DNA chip analysis tools
    Modeling of biological systems

Introduction and assignment of term projects:

A. BIOINFORMATICS PROJECTS
(code written in JScript, C / C++, Perl or Java, as required)

    1) siRNA designer
    2) promoter finder
    3) insulator analyzer
    4) codon analysis

B. MODELING PROJECTS
(Gepasi simulation software, MATLAB, or other)

    1) VEGF and angiogenesis
    2) nuclear translocation and transcription
    3) insulin gene regulation and blood glucose levels
    4) cell cycle checkpoints and apoptosis


Week 2    29 Sep    Individual Meetings    1 x 1 hr per student
…..                 Group Meetings         1 x 2 hr for all
Week 7    3 Nov


INDIVIDUAL MEETINGS

    Mehmet Ugur Dogan
    Ersen Kavak
    Kurtulus Baris Oner
    Ismail Burak Parlak
    Ezgi Tasdemir
    Meryem Ayse Yucel

GROUP MEETING




The individual meetings will involve feasibility discussions, regular progress meetings and reports. The group meetings will involve a 20 min presentation per week per student on the background information, methodology and progress of their term project, followed by group discussions. Please note that these group meetings and your involvement in the discussions will form part of your grading!


Required from each student:

Lab book for recording the details of experimental procedure

Proper and regular documentation on the progress of project

Literature search and Bibliography generation (ProCite or other)



Week 8     10 Nov    Choose a journal suitable for publication
                     Start preparing preliminary manuscript!
… … …
Week 12    15 Dec    Term paper manuscript discussion
Week 13    22 Dec    Final corrections of web trials and manuscript
Week 14    29 Dec    FINAL SEMINAR, open house
                     Submission of the manuscript to a journal


II. GRADING

1) Term paper (30 %)

Each student will choose a journal relevant to his/her term project and prepare a manuscript for submission by Week 12.

The manuscript will be self-critiqued and discussed by the whole group, and submitted by Week 14.

2) Final Seminar (30 %)

Each student will also be required to present a final scientific seminar to the colleagues in the institute. This will constitute his/her final grade.

3) Discussions and Weekly Reports (40 %)

On top of mastering his/her own topic, each student must also gain sufficient knowledge about the other projects in the group. Therefore, participation in group discussions will constitute 20 % of the total grade.

In addition, each student will be required to bring a written weekly report on the progress of his/her specific project, which will count for an additional 20 % of the total grade.

A MINIMUM OF 50 PTS IS REQUIRED TO PASS THE COURSE (TERM PAPER 20 + SEMINAR 15 + REPORTS 25).


III. BOOKS AND REFERENCE MATERIAL

Since each student will receive an individual project from a diverse set of topics, there will be no single course book.

Each student will require general molecular biology information during the semester. The following books are recommended, and can be found in the library:

    Alberts, B, Johnson, A, Lewis, J, Raff, M, Roberts, K, Walter, P. Molecular Biology of the Cell (BIO401 / BIO402 textbook)
    Stryer. Biochemistry (BIO301 / BIO302)

For students choosing Bioinformatics Projects, the following book is recommended:

    Krane and Raymer. Fundamental Concepts of Bioinformatics. 2003

For students choosing Modeling Projects, the following book is recommended:

    Campbell and Heyer. Discovering Genomics, Proteomics and Bioinformatics. 2003

The following web site will be an essential reference material for all:

    http://www.ncbi.nlm.nih.gov/Entrez







BASIC CONCEPTS IN MOLECULAR BIOLOGY


Molecules of Life

There are four major classes of organic molecules essential for life: lipids, carbohydrates, proteins and nucleic acids. Since the majority of the databases covered here are concerned with genes (and thus DNA, deoxyribonucleic acid) and with gene products (proteins), we will quickly cover nucleic acid and protein structure.


Nucleic acids are composed of building blocks called nucleotides, each of which contains a 5-carbon sugar, a phosphate group, and a nitrogen-containing base (Figure 1.1a) (25, 45, 74).

There are 2 major groups of nucleic acids, DNA, or deoxyribonucleic acid, and RNA, or ribonucleic acid (Figure 1.1b). DNA serves as the genetic material; in other words, genes are made up of DNA. The DNA molecule has four major building blocks, Adenine (A), Thymine (T), Cytosine (C), and Guanine (G), each attached to a deoxyribose sugar. Genes are therefore different combinations of these four letters, A, T, C, and G. RNA is also composed of nucleotide building blocks, with one letter difference: it uses A, C, and G, but Uracil (U) instead of Thymine, all attached to a ribose sugar (Figure 1.1c, Appendix 2) (25, 45, 74).
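The "one letter difference" between DNA and RNA can be illustrated with a few lines of Python. This is only a sketch: the function name `transcribe` and the example sequence are our own choices, and the function simply rewrites a DNA coding strand in the RNA alphabet rather than modeling the enzymatic process.

```python
# Sketch: DNA and RNA treated as strings over four-letter alphabets.
DNA_ALPHABET = set("ATCG")


def transcribe(coding_strand: str) -> str:
    """Return the RNA spelling of a DNA coding strand (T replaced by U)."""
    seq = coding_strand.upper()
    unknown = set(seq) - DNA_ALPHABET
    if unknown:
        raise ValueError(f"unexpected letters in DNA sequence: {sorted(unknown)}")
    return seq.replace("T", "U")


print(transcribe("ATGGCTCATTCA"))  # AUGGCUCAUUCA
```

The example sequence was picked so that its RNA form matches the mRNA drawn in Figure 1.4.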


In all eukaryotic cells, the genetic material is packaged into a special compartment called the nucleus, whereas in prokaryotes there is no nuclear compartment and the genetic material resides within the cytoplasm in a nuclear region. RNA in eukaryotes carries the genetic information from DNA in the nucleus to the cytoplasm, where the action takes place through proteins (Figure 1.2) (25, 45, 74).

















RNA molecules can be subdivided into three major subcategories, each with a different function in the cell. The messenger RNA, or mRNA, is the message copied from a gene region within the nucleus. It is then transferred out to the cytoplasm, where it is translated into a protein that carries out a function in biological interactions. Ribosomal RNA, or rRNA, is a structural component of the ribosomes, the sites of protein synthesis (Figure 1.2). Transfer RNA, or tRNA, is another key component in protein synthesis, whose function is to carry the amino acids to the mRNA and ribosomes. We need not concern ourselves greatly about the protein synthesis machinery for the purposes of this book. In summary, genes are transcribed into mRNAs within the nucleus, and protein is synthesized with the help of the translation machinery (ribosomes, tRNA, and other components) using the mRNA as a template (Figure 1.2) (25, 45, 74).

Figure 1.1 The building blocks of DNA. (a) The sugar-phosphate backbone of a double stranded DNA, and the base pairs in the middle. (b) A cartoon representation of RNA and DNA molecules. (c) The nitrogenous bases in the structure of RNA and DNA.

[Figure 1.2 diagram labels: nucleus, cytoplasm, DNA, RNA, protein synthesis machinery, PROTEIN]

Figure 1.2 A summary of gene expression in eukaryotic cells. DNA gets transcribed into RNA in the nucleus. This RNA then gets translocated into the cytoplasm, where the protein synthesis machinery (ribosomes) will translate the message into protein.


Proteins are perhaps the most diverse of all biological molecules, composed of building blocks called amino acids (Figure 1.3). There are 20 different amino acids (Appendix 3), and these may be present in any combination. Due to this diversity, proteins serve many different functions in the cell, such as structural components of the cell, cellular recognition or cell-to-cell signaling, or as enzymes, carrying out biological reactions.









Since proteins are synthesized from an mRNA template, the translation machinery must recognize the nucleotides in the mRNA (A, U, C, G) and translate this information into the amino acid code (Figure 1.4). Each group of 3 nucleotides (AUG, CCG, GCA, and so on) is called a codon and represents one amino acid (Methionine, Proline, and Alanine, respectively). These nucleotide-to-amino acid translations have been used to generate a universal table called the genetic code, which is conserved among nearly all species (exceptions are shown in Appendix 4) (25, 45, 74).
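Reading an mRNA codon by codon can be sketched in Python as follows. The 64-letter string is the standard ("universal") genetic code written in one-letter amino acid symbols with the codons ordered UUU, UUC, UUA, UUG, UCU, and so on; the function name `translate` and the choice to stop silently at the first stop codon are our own simplifications, not part of the course material.

```python
from itertools import product

BASES = "UCAG"
# Standard genetic code, one letter per codon in UCAG order; '*' marks a stop codon.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(mrna: str) -> str:
    """Translate an mRNA (read 5' to 3', starting at position 0) into one-letter amino acids."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":  # stop codon: end of the polypeptide
            break
        protein.append(amino_acid)
    return "".join(protein)


print(translate("AUGGCUCAUUCA"))  # MAHS, i.e. Met-Ala-His-Ser as in Figure 1.4
print(translate("AUGCCGGCA"))     # MPA, i.e. Met-Pro-Ala
```

A real gene finder would also have to locate the start codon and choose the reading frame; here the frame is assumed to start at the first letter.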

The function of a protein largely depends on its 3-dimensional structure and conformation. When considering a protein’s function it is therefore important to predict its 3-D (tertiary) structure, which at present cannot be done reliably from the linear amino acid sequence (primary structure) alone (Appendix 5) (25, 74). Most of the freely available prediction programs therefore initially focus on aligning different proteins and defining common ‘motifs’ which carry out certain functions, so that the function of a protein can be inferred by analogy.



Figure 1.3 Amino acids as building blocks of proteins. (a) An amino acid consists of an amino group, a carboxyl group, and a variable group (R). (b) Peptide bond between two amino acids (solid line).
















Recombinant DNA Technology

Genetic engineering, alternatively called recombinant DNA technology, is the name given to all the techniques used in laboratory-based manipulation of genes. The tools needed for genetic engineering or gene cloning have been identified within the past 20 years, the main components being restriction enzymes and plasmid vectors (74).

Restriction enzymes are bacterial enzymes that recognize specific DNA sequences and cut, or digest, the DNA into smaller fragments. They are present in bacteria to prevent bacteriophages (bacterial viruses) from infecting them, hence the name ‘restriction’ (Figure 1.7).
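A restriction digest can be sketched as a string operation. The recognition site GAATTC used in Figure 1.7 is the site cut by the enzyme EcoRI between the G and the first A; the function name `digest`, its parameters, and the example sequence are hypothetical, and only the top strand (5' to 3') is modeled, so the staggered cut on the opposite strand is not shown.

```python
def digest(dna: str, site: str = "GAATTC", cut_offset: int = 1) -> list[str]:
    """Cut the top strand of `dna` at every occurrence of `site`.

    `cut_offset` is the number of bases of the site that stay on the
    left-hand fragment (1 for G^AATTC).
    """
    fragments = []
    start = 0
    pos = dna.find(site)
    while pos != -1:
        fragments.append(dna[start:pos + cut_offset])
        start = pos + cut_offset
        pos = dna.find(site, pos + 1)
    fragments.append(dna[start:])  # whatever remains after the last cut
    return fragments


print(digest("AAGAATTCGGGAATTCTT"))  # ['AAG', 'AATTCGGG', 'AATTCTT']
```

Two sites in the example give three fragments; a sequence without the site is returned as a single, uncut fragment.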












    5’  A U G   G C U   C A U   U C A  3’      mRNA

        Met  -  Ala  -  His  -  Ser            PROTEIN

Figure 1.4 Cartoon representation of translation from messenger RNA (mRNA) into protein. Note that the codons (nucleotide triplets) are ‘read’ in the 5’ to 3’ direction. The amino acids are joined by peptide bonds (solid lines).

    GAATTC     Restriction Enzyme     G        AATTC
    CTTAAG    ------------------->    CTTAA        G

Figure 1.7 Restriction enzymes act at specific sequences and ‘digest’ double stranded DNA into two fragments at the site of digestion.




Vectors are special DNA sequences used to ‘carry’ the gene to be cloned. The most commonly used ones are derived from bacterial plasmids, small circular DNAs that replicate within the bacteria independently of chromosomal replication. Vectors are used to insert foreign DNA into the organism of study, be it bacteria, yeast, insect or mammalian cells, where this newly engineered ‘recombinant DNA’ can duplicate every time the host cell divides. If bacteria are used as host cells, millions or even billions of copies can be generated in a short period. Since bacteria divide by binary fission and produce identical progeny, all the bacteria thus produced will contain exactly the same recombinant DNA molecule; hence these bacteria are referred to as clones or colonies, and the procedure is known as gene cloning (Figure 1.8) (45, 74).


[Figure 1.8 diagram labels: gene of interest, cloning vector, RE (restriction enzyme) sites, into bacteria]

Figure 1.8 General cloning strategy. The gene of interest and the cloning vector are both digested with the same restriction enzyme(s) and ligated. The construct thus produced is then transferred into bacteria and amplified (the ‘cloning’ step).



Another commonly used technique is the Polymerase Chain Reaction, or PCR, which serves to amplify a specific DNA region. It involves the synthesis of new DNA strands from a template under laboratory conditions (in vitro). For example, in about 30 PCR cycles, one generates up to 2^30 new DNA fragments identical to the original DNA region (Figure 1.9) (45, 74).
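The doubling arithmetic behind PCR is easy to check numerically. The sketch below assumes ideal amplification by default; the `efficiency` parameter (the fraction of molecules copied in each cycle) is our own addition to show how quickly real yields fall below the theoretical 2^n.

```python
def pcr_copies(cycles: int, templates: int = 1, efficiency: float = 1.0) -> float:
    """Expected number of copies after `cycles` rounds of PCR.

    efficiency = 1.0 means every molecule is duplicated in every cycle,
    which gives templates * 2**cycles.
    """
    return templates * (1 + efficiency) ** cycles


print(f"{pcr_copies(20):,.0f}")  # 1,048,576
print(f"{pcr_copies(30):,.0f}")  # 1,073,741,824
```

With perfect doubling, 20 to 30 cycles turn a single template into roughly a million to a billion copies.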















Regulation of Gene Expression


There are an estimated 30,000 or so genes in humans (31, 49), but not all of those genes are used in all tissues. The pancreas is specialized in insulin production, muscle cells in contraction, and red blood cells in carrying oxygen and carbon dioxide, which means that each of these cell types needs a different set of genes to carry out its specialized functions.


Therefore it is necessary to control the expression of these genes in the right cell type and at the right time. This elaborate control, called gene regulation, is achieved through certain DNA elements preceding each gene, called promoters (Figure 1.10). The information as to where, when, and how much a gene is expressed is encoded in these promoter sequences (74). Regulatory proteins, or transcription factors, recognize certain regulatory elements or regions within the promoter in a molecular switch mechanism (74). In other words, transcription factors are special proteins that can recognize defined motifs or sequences on the DNA, bind that region, and either enhance or repress gene expression (by enhancing or repressing gene transcription, as required). Therefore, which genes are expressed in a given cell type largely depends on which transcription factors are active in that cell and which promoters can bind that combination of transcription factors (74).

[Figure 1.9 diagram labels: DNA, round 1 of PCR, round 2 of PCR, 20-30 cycles of PCR, millions of copies]

Figure 1.9 Polymerase Chain Reaction. The region of interest is amplified from template DNA using specific primers (short horizontal arrows). After 20 to 30 rounds of PCR the DNA region is amplified 2^20 to 2^30 times.
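As a rough illustration of how a transcription factor "recognizes defined motifs" in a promoter, the sketch below scans a DNA string for exact occurrences of a short motif. Everything specific here is hypothetical: the promoter sequence, the factor names `TF_A` and `TF_B`, and the motifs assigned to them are invented for the example, and real binding sites are degenerate rather than exact strings.

```python
import re


def find_sites(promoter: str, motif: str) -> list[int]:
    """Return the 0-based start positions of every (possibly overlapping) motif match."""
    pattern = re.compile(f"(?={re.escape(motif)})")
    return [match.start() for match in pattern.finditer(promoter)]


def factors_that_bind(promoter: str, motifs: dict[str, str]) -> list[str]:
    """Names of the factors whose motif occurs at least once in the promoter."""
    return [name for name, motif in motifs.items() if find_sites(promoter, motif)]


promoter = "GGCTATAAAAGGCGTATAAAT"                # made-up promoter sequence
motifs = {"TF_A": "TATAAA", "TF_B": "CCAAT"}      # made-up factor/motif pairs

print(find_sites(promoter, "TATAAA"))       # [3, 14]
print(factors_that_bind(promoter, motifs))  # ['TF_A']
```

Promoter analysis tools work with weighted motif descriptions instead of exact strings, but the scanning idea is the same.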


Genomics and the Post-genomic Era

With the recent developments in the Human Genome Project, genomics has gained a substantial place in the popular press. Genomics is the generic name for studies concerned with the identification and analysis of the total genomic information in a given organism.

The first part of the Human Genome Project, just like many other genome projects, aims at identifying the sequence of bases (A, T, G, C) in the human genome. This is achieved primarily through base-by-base sequencing of the DNA (Appendix 6). Considering that there are around 3 billion bases in the human genome, it becomes clear what a daunting undertaking this has been for many laboratories around the world, and these efforts have proven fruitful in the past years (62, 64). The next challenge is to identify which portions of the 3 billion bases encode genetic information and which are ‘regulatory regions’, ‘non-coding sequences’, or ‘junk DNA’ (75, 76, 77). The incredible effort of organizing and analysing the sequence data has initiated the field known as genomics. Sub-fields of genomics include comparative genomics, functional genomics, structural genomics and so on.

After the announcement of the first drafts of the human genome sequence, attention has been focused on whole new areas of post-genomic research, namely proteomics, transcriptomics and sub-fields thereof (44, 55, 60, 67). We will cover databases and analysis tools related to these fields in later chapters.


[Figure 1.10 diagram labels: promoter, GENE, transcription factors]

Figure 1.10 A simplified cartoon of gene regulation. A combination of gene regulators known as transcription factors bind to specific sites on each promoter, and upon activation lead to gene expression.



Model Organisms in Biology


Ultimately what molecular biologists are seeking is an explanation of what life consists of, and of how we turn into walking, talking, conscious human beings. Geneticists usually choose to study mutations, such as those in human genetic disorders, to find out what the normal function of a gene may be. However, it is difficult to study genetics in humans, both practically and ethically: one cannot maintain them under laboratory conditions, mate them the way a genetic experiment requires, or manipulate their genes (luckily!). Therefore other organisms, from simple prokaryotes to more complex eukaryotes, have been widely used for genetic studies all over the world, in order to gain information that may eventually benefit humans.

Figure 1.11 A micrograph of E. coli (http://education.wsu.edu/schmid/thebugzone/ecoli/e_coli.htm).







Escherichia coli is a prokaryotic organism. It contains no membrane-bound organelles, only the cytoplasm, ribosomes, cell membrane and cell wall, in addition to DNA, its genetic material, which resides in the cytoplasm. The bacterial system has been, and still is, an excellent tool: bacteria can be grown very rapidly and in large quantities at little cost, they are easy to manipulate, and in addition the whole genome of E. coli has recently been sequenced (51, 89). Bacteria have haploid genomes, i.e. they carry only a single chromosome, and they replicate by a mechanism known as binary fission. This means a single bacterium divides and generates two daughter cells identical to each other and to the parent, which can be seen as the natural forerunner of gene cloning.


Yeast is a unicellular organism. What makes yeast really popular among molecular biologists is the fact that it is a unicellular eukaryote, and as such yeast are more similar to humans than bacteria are. One of the most commonly used yeasts in the laboratory is Saccharomyces cerevisiae, or baker’s yeast, the other being Schizosaccharomyces pombe, or fission yeast. This organism’s genome, no surprise, has also been sequenced within the last 2 years (47, and references therein, 52, 89).

Figure 1.12 A micrograph of Saccharomyces cerevisiae (http://genome-www.stanford.edu/Saccharomyces/images/wildtype.GIF).


Caenorhabditis elegans, a transparent worm, has also been very popular, mainly in the field of genetics (47, and references therein, 89). It is very easy to grow and maintain in the laboratory. Furthermore, its developmental program has been extensively studied, and many of its developmental stages and events have been identified, such as all the neural connections made and the fate of each cell during development. C. elegans is also one of the organisms whose genomic sequence has been published.

Figure 1.13 Transparent worm Caenorhabditis elegans. Micrograph from http://eatworms.swmed.edu/Worm_labs/Avery/Pictures/wild-type_low.gif.



Figure 1.14 Zebrafish picture, http://www.stolaf.edu/people/colee/zebra.html.

Zebrafish, Danio rerio, is another very popular tool for geneticists and developmental biologists (45, 74, 89). Its easy-to-manipulate embryonic stages are valuable for researchers.

Figure 1.15 The fruit fly (http://animalpicturesarchive.com/animal/EngNames/fruit_fly.html).


Drosophila melanogaster, the fruit fly, is an insect that has been used quite extensively by geneticists. The studies of Thomas Morgan and his students in Drosophila led to the concept of sex-linked inheritance (74). Many developmental mechanisms have also been identified with the help of Drosophila genetics. The fruit fly is very easy to grow, maintain and manipulate in the laboratory. One can very easily introduce mutations in this organism; its ‘fate map’ has been constructed, which means its developmental program is almost entirely deciphered; and in addition, its genome has been largely sequenced (54, 89).


House mouse, Mus musculus, has been extremely popular in molecular biology, especially since, from a developmental perspective, it is the closest to humans of all the model systems described here. It has therefore established itself as a major model organism for studying human disorders and cancer, as well as regulatory mechanisms during development (74).

Figure 1.16 Mus musculus, or the house mouse, is commonly used as an animal model for human diseases. (Picture taken from http://www.smd.be/msw/eng/r&d/rlvbd/pic11.htm)



APPENDIX 1. Some useful numbers

Weight Conversions:

    1 µg    =    10^-6 g
    1 ng    =    10^-9 g
    1 pg    =    10^-12 g

Other Conversions:

    1 joule = 0.239 cal
    1 cal = 4.184 joule
    1 nm = 10 Å
    1 atm = 760 torr = 14.696 psi
    °K = °C + 273
    °C = 5/9 (°F - 32)

    Avogadro’s number (N)     =    6.022 x 10^23 molecules / mol
    Boltzmann constant (k)    =    1.38 x 10^-23 J / °K
    Curie (Ci)                =    3.7 x 10^10 dps
    Electron charge (e)       =    1.6 x 10^-19 coulomb
    Faraday constant (F)      =    96485 J / V.mol
    Gas constant (R)          =    8.31451 J / °K.mol
    Light speed (c)           =    3 x 10^8 m / s
    Planck’s constant (h)     =    6.63 x 10^-34 J s

    Henderson-Hasselbalch equation    pH = pKa + log([A-] / [HA])
    Michaelis-Menten equation         V = Vmax [S] / (KM + [S])
    Free energy change                ΔG = ΔH - TΔS
                                      ΔG = ΔG° + RT ln([products] / [reactants])
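The three equations of this appendix can be written as small Python functions for quick calculations. This is a sketch under our own naming: the function and parameter names are not from the course, concentrations are assumed to be in the same units within each ratio, and the gas constant defaults to the value listed in this appendix (J per K per mol).

```python
import math


def henderson_hasselbalch(pka: float, base_conc: float, acid_conc: float) -> float:
    """pH = pKa + log10([A-] / [HA])."""
    return pka + math.log10(base_conc / acid_conc)


def michaelis_menten(vmax: float, km: float, substrate_conc: float) -> float:
    """V = Vmax [S] / (KM + [S])."""
    return vmax * substrate_conc / (km + substrate_conc)


def free_energy_change(delta_g0: float, temp_kelvin: float,
                       products: float, reactants: float,
                       gas_constant: float = 8.31451) -> float:
    """Delta G = Delta G0 + R T ln([products] / [reactants]), in J / mol."""
    return delta_g0 + gas_constant * temp_kelvin * math.log(products / reactants)


# When [A-] equals [HA], the pH equals the pKa.
print(henderson_hasselbalch(4.76, 1.0, 1.0))  # 4.76
# When [S] equals KM, the rate is half of Vmax.
print(michaelis_menten(10.0, 2.0, 2.0))       # 5.0
```

The two printed cases are the standard sanity checks for these formulas: equal acid and base concentrations give pH = pKa, and [S] = KM gives half-maximal velocity.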





APPENDIX 2. Nucleotides

                                              Molecular Weight
    Adenosine Triphosphate         (ATP)      507.2
    Cytidine Triphosphate          (CTP)      483.2
    Guanosine Triphosphate         (GTP)      523.2
    Uridine Triphosphate           (UTP)      484.2

    deoxyAdenosine Triphosphate    (dATP)     491.2
    deoxyCytidine Triphosphate     (dCTP)     467.2
    deoxyGuanosine Triphosphate    (dGTP)     507.2
    deoxyThymidine Triphosphate    (dTTP)     482.2

1 A260 unit of double-stranded DNA = 50 µg/ml
1 A260 unit of single-stranded DNA = 33 µg/ml
1 A260 unit of single-stranded RNA = 40 µg/ml

The average MW of a deoxyribonucleotide base = 330 Dalton
The average MW of a ribonucleotide base = 340 Dalton

To convert pmol of double-stranded DNA to µg:

    pmol x (# of nucleotide pairs) x 660 pg/pmol {average MW of a nucleotide pair} x 1 µg / 10^6 pg = µg
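The conversions above are convenient to keep as helper functions. The sketch below uses only the factors listed in this appendix (660 pg/pmol per nucleotide pair, and the A260 factors for each nucleic acid type); the function names and the 3000 bp plasmid in the example are hypothetical.

```python
PG_PER_PMOL_PER_BP = 660.0  # average MW of a nucleotide pair, in pg per pmol

A260_FACTOR_UG_PER_ML = {"dsDNA": 50.0, "ssDNA": 33.0, "ssRNA": 40.0}


def dsdna_pmol_to_ug(pmol: float, n_base_pairs: int) -> float:
    """pmol x (# of nucleotide pairs) x 660 pg/pmol x 1 ug / 10^6 pg."""
    return pmol * n_base_pairs * PG_PER_PMOL_PER_BP / 1e6


def dsdna_ug_to_pmol(ug: float, n_base_pairs: int) -> float:
    """Inverse of dsdna_pmol_to_ug."""
    return ug * 1e6 / (n_base_pairs * PG_PER_PMOL_PER_BP)


def a260_to_ug_per_ml(a260: float, kind: str = "dsDNA") -> float:
    """Concentration estimated from an absorbance reading at 260 nm."""
    return a260 * A260_FACTOR_UG_PER_ML[kind]


print(dsdna_pmol_to_ug(1.0, 3000))        # about 1.98 ug for 1 pmol of a 3000 bp plasmid
print(a260_to_ug_per_ml(0.5, "dsDNA"))    # 25.0 ug/ml
```

Any dilution made before the absorbance reading has to be multiplied back in separately; the helper does not account for it.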

[Figure: structures of the nucleotides, labeled (A), (G), (dT), (U) and (C)]



APPENDIX 3. Amino Acids

    Amino acid                  Properties                                  MW
    Alanine (Ala, A)            aliphatic, hydrophobic, neutral             89
    Arginine (Arg, R)           polar, hydrophilic, +charged (basic)        174
    Asparagine (Asn, N)         polar, hydrophilic, neutral                 132
    Aspartate (Asp, D)          polar, hydrophilic, -charged (acidic)       133
    Cysteine (Cys, C)           polar, hydrophobic, neutral                 121
    Glutamine (Gln, Q)          polar, hydrophilic, neutral                 146
    Glutamate (Glu, E)          polar, hydrophilic, -charged (acidic)       147
    Glycine (Gly, G)            aliphatic, hydrophobic, neutral             75
    Histidine (His, H)          aromatic, polar, +charged (basic)           155
    Isoleucine (Ile, I)         aliphatic, hydrophobic, neutral             131
    Leucine (Leu, L)            aliphatic, hydrophobic, neutral             131
    Lysine (Lys, K)             polar, hydrophilic, +charged (basic)        146
    Methionine (Met, M)         hydrophobic, neutral                        149
    Phenylalanine (Phe, F)      aromatic, hydrophobic, neutral              165
    Proline (Pro, P)            hydrophobic, neutral                        115
    Serine (Ser, S)             polar, hydrophilic, neutral                 105
    Threonine (Thr, T)          polar, hydrophilic, neutral                 119
    Tryptophan (Trp, W)         aromatic, hydrophobic, neutral              204
    Tyrosine (Tyr, Y)           aromatic, polar, hydrophobic                181
    Valine (Val, V)             aliphatic, hydrophobic, neutral             117


Average MW of an amino acid = 110 Daltons

    1 kb coding DNA    =    333 amino acids (37 kDa protein)
    270 bp DNA         =    10 kDa protein
    1.35 kb DNA        =    50 kDa protein

    100 pmol of 10 kDa protein    =    1 µg
    100 pmol of 50 kDa protein    =    5 µg
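The rules of thumb above follow from two numbers: three bases per codon and an average of 110 Daltons per amino acid. The sketch below reproduces them; the function names are our own, and the estimate ignores the stop codon and any non-coding sequence.

```python
AVERAGE_AMINO_ACID_DA = 110  # average MW of an amino acid, in Daltons


def protein_kda_from_coding_bp(coding_bp: int) -> float:
    """Approximate protein size encoded by `coding_bp` base pairs of coding DNA."""
    n_amino_acids = coding_bp // 3  # one codon = 3 bases
    return n_amino_acids * AVERAGE_AMINO_ACID_DA / 1000


def protein_pmol_to_ug(pmol: float, protein_kda: float) -> float:
    """1 pmol of a 1 kDa protein weighs 1 ng, so pmol x kDa gives ng; divide by 1000 for ug."""
    return pmol * protein_kda / 1000


for bp in (1000, 270, 1350):
    print(bp, "bp ->", protein_kda_from_coding_bp(bp), "kDa")

print(protein_pmol_to_ug(100, 10))  # 1.0 ug
print(protein_pmol_to_ug(100, 50))  # 5.0 ug
```

The loop gives values close to 37, 10 and 50 kDa, which the table above rounds to whole numbers.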





APPENDIX 4. The Genetic Code

[Table: The Universal Code]

[Table: The Human Mitochondrial Genetic Code; highlighted entries: Trp, Met, STOP, STOP]

There are cases where amino acids other than the standard 20 can be incorporated into the growing polypeptide, selenocysteine and pyrrolysine being two of them. These amino acids are encoded by UGA and UAG, respectively, codons which otherwise function in chain termination. The translation machinery is able to discriminate when these codons should be read as these amino acids rather than as a STOP. This codon usage has been found in some Archaea and eubacteria.

[Table: The Yeast Mitochondrial Genetic Code; highlighted entries: Thr, Thr, Thr, Thr]





APPENDIX 5. Protein Structures

The Primary Structure of a protein is the defined sequence of amino acids in the respective polypeptide. It is on this level of structure that all higher order structures of proteins are based; however, we are still only scratching the surface of the principles that define higher order structure formation.




The Secondary Structure is the easily distinguishable regular local folding of proteins. There are essentially two main folding patterns, the α-helix, formed by regular folding of a single polypeptide chain, and the β-sheet, formed by adjacent chains. In between these regions one can find less structured regions, commonly known as the random coil.

[Figure labels: aa1, aa2, aa3, aa4, aa5, aa6]









Globular proteins, so named because of their compact shape, carry out essential functions in the cell, such as synthesis, metabolism and transport. The polypeptide chain forms secondary structures, which further fold onto each other, forming the Tertiary Structure of the protein. Note the 3-D arrangement of α-helices in the structure of ferritin and the barrel arrangement of β-sheets in the structure of porin in the tertiary structures below.

[Figure: tertiary structures of ferritin and porin]


Tertiary structure formation should be a thermodynamically favored process under physiological conditions. Protein folding is mediated by noncovalent bonds (charge-to-charge interactions between side chains, internal hydrogen bonds, and van der Waals interactions) as well as by hydrophobicity, i.e. hydrophobic side chains will prefer to be on the inside of the globular protein, away from the aqueous intracellular environment.



Many functional proteins within cells exist as aggregates of two or more polypeptide chains, an arrangement called the Quaternary Structure. Each folded polypeptide chain forms further interactions with other partner polypeptides, again through salt bridges, hydrogen bonds, van der Waals forces, or hydrophobic interactions, and sometimes disulfide bonds.

[Figure labels: α-helix, β-sheet]






Bovine hemoglobin above is composed of two pairs of non-identical subunits, alpha and beta, the overall arrangement being roughly tetrahedral. Each subunit is colored differently for easy visualization.




APPENDIX 6. DNA Sequencing

[Figure: DNA template (green strand) annealed to the sequencing primer (short red strand)]

The basic principle of DNA sequencing is essentially similar to the PCR technique. DNA is denatured, and the DNA template (green strand in the above picture) is annealed to the sequencing primer (short red strand). The primer is extended using DNA polymerase and deoxynucleotides (in the 5’ to 3’ direction) until terminating nucleotides are ‘randomly’ incorporated and the polymerisation reaction is stopped. The extended DNA polymers are then run through a sequencing gel, the individual fragments are visualized, and the DNA sequence is read.


Read sequence: 1 2 3 4 5 6 7 8 9 10 11 12 13



In the Sanger method of dideoxy-sequencing, the terminating nucleotides are dideoxynucleotides (ddGTP, ddATP, ddCTP, and ddTTP). Because these nucleotides lack the hydroxyl groups at both the 2’ and 3’ positions of the sugar, the next sugar-phosphate bond cannot be formed, and the polymerisation reaction stops. In this kind of sequencing, the initial polymerisation reaction is carried out in the presence of a radioactively labelled deoxynucleotide (e.g. 35S-dCTP), and then the reaction is divided into 4 tubes, each containing only one type of dideoxynucleotide. The reaction products in the 4 tubes are loaded onto different wells of the sequencing gel, and the gel is exposed to X-ray film and visualized through autoradiography.
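The logic of reading a four-lane sequencing gel can be sketched in Python. This is an idealized model of our own, not a simulation of the chemistry: it assumes that a terminated fragment exists for every position of the newly synthesized strand, counts fragment length from the first base added after the primer, and uses the hypothetical names `sanger_lanes` and `read_gel`.

```python
def sanger_lanes(new_strand: str) -> dict[str, list[int]]:
    """For each dideoxynucleotide tube (A, C, G, T), list the lengths of the
    fragments that terminate with that base."""
    lanes = {base: [] for base in "ACGT"}
    for length, base in enumerate(new_strand, start=1):
        lanes[base].append(length)
    return lanes


def read_gel(lanes: dict[str, list[int]]) -> str:
    """Read the gel from the shortest band to the longest, noting which lane each band is in."""
    bands = sorted(
        (length, base) for base, lengths in lanes.items() for length in lengths
    )
    return "".join(base for _, base in bands)


lanes = sanger_lanes("GATTACA")
print(lanes)            # {'A': [2, 5, 7], 'C': [6], 'G': [1], 'T': [3, 4]}
print(read_gel(lanes))  # GATTACA
```

Because shorter fragments migrate further, reading the bands in order of increasing length across the four lanes spells out the synthesized strand in the 5’ to 3’ direction.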













Above is a sample autoradiogram of a DNA sequence from the species C. cylindrica, AR and BR representing sequences in opposite directions.


In Automated sequencing the nucleotides are conjugated to special dyes which fluoresce upon excitation with a laser. Each of the 4 nucleotides has its own unique color, which means the reaction can be carried out in a single tube, as opposed to 4 tubes in manual sequencing.

As the sequencing reaction passes through the gel, each fluorescent fragment is detected as a single peak, which is then interpreted by the computer to generate the sequence. For convincing results, and to avoid any ambiguity, usually both DNA strands are sequenced and analysed by the computer.
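Comparing reads from both strands relies on the fact that one strand is the reverse complement of the other. A minimal sketch of that check is given below; the function names and the example reads are hypothetical, and real software aligns the two reads and tolerates partial overlap instead of requiring an exact match.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the opposite strand, written 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]


def strands_agree(forward_read: str, reverse_read: str) -> bool:
    """True if the reverse-strand read confirms the forward-strand read exactly."""
    return forward_read == reverse_complement(reverse_read)


print(reverse_complement("ATGGCT"))       # AGCCAT
print(strands_agree("ATGGCT", "AGCCAT"))  # True
print(strands_agree("ATGGCT", "AGCCAA"))  # False
```

A mismatch between the two reads flags a position that should be inspected again in the raw data.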