Bioinformatics - Karolinska Institutet

Proteomics and Bioinformatics
Bioinformatics
2010-02-08
Jan-Olov Höög

Bioinformatics
[Figure: diagram relating bioinformatics to medicine, computer science, mathematics, statistics and biology]
Jan‐Olov Höög, Dep of Medical Biochemistry and Biophysics, 
Karolinska Institutet, Stockholm

Bioinformatics
Bioinformatics is the application of information technology and computer science to the field of molecular biology. The term bioinformatics was coined by Paulien Hogeweg in 1979 for the study of informatic processes in biotic systems. Its primary use since at least the late 1980s has been in genomics and genetics, particularly in those areas of genomics involving large-scale DNA sequencing. Bioinformatics now entails the creation and advancement of databases, algorithms, computational and statistical techniques, and theory to solve formal and practical problems arising from the management and analysis of biological data.
Common activities in bioinformatics include mapping and analyzing DNA and protein sequences, aligning different DNA and protein sequences to compare them, and creating and viewing 3-D models of protein structures.

The mother of Bioinformatics
Dr. Margaret Oakley Dayhoff (1925-1983) was a pioneer in the use of computers in chemistry and biology, beginning with her PhD thesis project in 1948. Her work was multi-disciplinary, and used her knowledge of chemistry, mathematics, biology and computer science to develop an entirely new field. She is credited today as a founder of the field of Bioinformatics. This field is defined as the use of computers in solving information problems in the life sciences, mainly involving the creation of extensive electronic databases on protein sequences and genomes. Dr. Dayhoff was the first woman in the field of Bioinformatics. She was also the first woman to hold office in the Biophysical Society, serving first as Secretary and later as President.

Bioinformatics milestones
• 1965 Margaret Dayhoff’s Atlas of Protein Sequences
• 1970 Needleman-Wunsch algorithm (alignment)
• 1977 DNA sequencing and software to analyze it (Staden)
• 1978 Bioinformatics was coined
• 1981 Smith-Waterman algorithm developed (alignment)
• 1981 First protein sequence database available online
• 1982 GenBank Release 3 made public
• 1986 Swiss-Prot database available online
• 1990 BLAST: fast sequence similarity searching
• 1992 TopPred: topology predictor for membrane proteins
• 1995 First bacterial genomes completely sequenced
• 1996 Yeast genome completely sequenced
• 1997 PSI-BLAST
• 1998 SignalP: prediction of signal peptides

Sequence → structure → function

Prediction from DNA sequence
• Protein-coding genes (gene finding)
  • transcription factor binding sites
  • transcription start/stop
  • translation start/stop
  • splicing: donor/acceptor sites
• Non-coding RNA
  • tRNAs
  • rRNAs
  • miRNAs
• General features
  • Structure (curvature/bending)
  • Binding (histones etc.)
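
One of the features listed above, translation start/stop, can be illustrated with a naive open reading frame (ORF) scan. The sketch below is only an illustration, not a gene finder: it assumes the standard start codon ATG and the stop codons TAA, TAG and TGA, looks at the forward strand only, and ignores splicing. The function name `find_orfs` and the parameter `min_codons` are invented for this example.

```python
START = "ATG"
STOPS = {"TAA", "TAG", "TGA"}


def find_orfs(dna, min_codons=2):
    """Return (start, end) index pairs of ORFs on the forward strand.

    start is the index of the A in ATG; end is exclusive and includes
    the stop codon. min_codons counts codons before the stop codon.
    """
    dna = dna.upper()
    orfs = []
    for frame in range(3):
        start = None
        # Walk the sequence codon by codon in this reading frame.
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if start is None and codon == START:
                start = i
            elif start is not None and codon in STOPS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((start, i + 3))
                start = None
    return sorted(orfs)


if __name__ == "__main__":
    print(find_orfs("CCATGAAATTTTAGGG"))  # [(2, 14)]
```

Real gene finders combine many more signals (promoters, splice sites, codon statistics), which is why the slide lists them together.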

Shotgun sequencing

Prediction from amino acid sequence
• Folding / structure
• Post-Translational Modifications
  • Attachment: phosphorylation, glycosylation, lipid attachment etc.
  • Cleavage: signal peptides, propeptides
• Sorting: secretion, import into various organelles, insertion into membranes
• Interactions
• Function
  • Enzyme activity, transport, receptors, structural components etc.
• Pathway

Diagonal plot analysis:
10 A |*           *     *
   D |                *
   E |              *
   G |    *     *
   G |    *     *
 5 S |
   C |      *
   G |    *     *
   K |  *
 1 A |*           *     *
     +--------------------
      A K G C T G A E D A ....
      1       5         10

Corresponding alignment:
A K G C T G A E D A ....
A K G C S G G E D A ....
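
The diagonal plot above can be reproduced with a few lines of code: a cell is marked wherever the residue on the vertical axis equals the residue on the horizontal axis, and a run of marks along the diagonal corresponds to an alignment. The following is a minimal sketch using the two sequences from the slide; the function names `dot_matrix`, `render` and `diagonal_matches` are made up for this example, and no window or scoring filter is applied.

```python
def dot_matrix(seq_x, seq_y):
    """m[i][j] is True where seq_y[i] (vertical) equals seq_x[j] (horizontal)."""
    return [[a == b for b in seq_x] for a in seq_y]


def render(seq_x, seq_y):
    """Draw the plot as text, with the first residue of seq_y at the bottom."""
    m = dot_matrix(seq_x, seq_y)
    lines = []
    for i in range(len(seq_y) - 1, -1, -1):
        row = " ".join("*" if hit else " " for hit in m[i])
        lines.append(f"{seq_y[i]} |{row}".rstrip())
    lines.append("  +" + "-" * (2 * len(seq_x)))
    lines.append("   " + " ".join(seq_x))
    return "\n".join(lines)


def diagonal_matches(seq_x, seq_y):
    """Count identical residues on the main diagonal (ungapped alignment)."""
    return sum(a == b for a, b in zip(seq_x, seq_y))


if __name__ == "__main__":
    print(render("AKGCTGAEDA", "AKGCSGGEDA"))
```

For these two sequences the main diagonal carries 8 matches out of 10 positions; the mismatches are T/S at position 5 and A/G at position 7.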

Principles of classification
• Prediction of a protein property or feature is very often a classification problem.
• Example: Signal peptides – we want to classify every protein as either having a signal peptide or not having one
• Example: Post-translational modification – we want to classify every amino acid as either modified or not modified
• What do we want to accomplish?
  • Find as many true positives as possible = high sensitivity
  • Predict as few false positives as possible = high specificity
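
The two goals above can be expressed as numbers. The sketch below assumes the common statistical definitions, sensitivity = TP / (TP + FN) and specificity = TN / (TN + FP); note that some bioinformatics papers instead use "specificity" for TP / (TP + FP), so the definition should be checked whenever results are compared. The function names and the example labels are invented for illustration.

```python
def confusion_counts(actual, predicted):
    """Count true/false positives/negatives for two lists of booleans."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a and p)
    tn = sum(1 for a, p in pairs if not a and not p)
    fp = sum(1 for a, p in pairs if not a and p)
    fn = sum(1 for a, p in pairs if a and not p)
    return tp, tn, fp, fn


def sensitivity(tp, fn):
    """Fraction of real positives that were found."""
    return tp / (tp + fn)


def specificity(tn, fp):
    """Fraction of real negatives that were correctly rejected."""
    return tn / (tn + fp)


if __name__ == "__main__":
    # Made-up example: True = "has a signal peptide".
    actual = [True, True, True, False, False, False, False, False]
    predicted = [True, True, False, True, False, False, False, False]
    tp, tn, fp, fn = confusion_counts(actual, predicted)
    print(sensitivity(tp, fn), specificity(tn, fp))
```

A predictor that calls everything positive reaches a sensitivity of 1 but a specificity of 0, which is why both numbers are reported together.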

Classification methods in bioinformatics
• Homology-based
• Simple pattern recognition
  • Consensus patterns
• Statistical methods
  • Parameters are calculated
  • Weight matrices
  • Clustering
  • Variance analysis
  • Regression
• Machine learning
  • Parameters are estimated by iterative training rather than direct calculation
  • Hidden Markov models
  • Artificial Neural Networks
  • Support Vector Machines
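
A weight matrix is a simple case of a statistical method whose parameters are calculated directly from the data, with no iterative training. The sketch below is one possible formulation, not the one used by any particular tool: it assumes log-odds scores against a uniform background and a pseudocount of 1, and the aligned example sites are made up. The function names `weight_matrix` and `score` are hypothetical.

```python
import math
from collections import Counter


def weight_matrix(aligned_sites, alphabet="ACGT", pseudocount=1.0):
    """Build a log2-odds weight matrix from equal-length aligned sites.

    Returns one dict per position, mapping each letter to its score
    relative to a uniform background (an assumption of this sketch).
    """
    length = len(aligned_sites[0])
    background = 1.0 / len(alphabet)
    total = len(aligned_sites) + pseudocount * len(alphabet)
    matrix = []
    for pos in range(length):
        counts = Counter(site[pos] for site in aligned_sites)
        matrix.append({
            letter: math.log2((counts[letter] + pseudocount) / total / background)
            for letter in alphabet
        })
    return matrix


def score(matrix, window):
    """Sum the per-position scores for a window of the same length."""
    return sum(column[letter] for column, letter in zip(matrix, window))


if __name__ == "__main__":
    sites = ["TATAAT", "TATGAT", "TACAAT", "TATAAT"]  # invented examples
    m = weight_matrix(sites)
    print(score(m, "TATAAT") > score(m, "GCGCGC"))  # True
```

Machine-learning methods differ in that the parameters are not counted directly like this but adjusted step by step until the predictions fit the training data.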

Homology, some definitions
• HOMOLOG: a general term indicating genes or proteins that are evolutionarily related. Homologous proteins may be either orthologs or paralogs.
• ORTHOLOG: for orthologs (ortho=exact), the homology is the result of speciation, i.e. same exact gene in different organisms.
• PARALOG: for paralogs (para=in parallel), the homology is the result of a gene duplication, i.e. similar proteins, potentially within the same organism.
NB! Homology ≠ Identity ≠ Similarity

Homology-based predictions
1. For an unknown protein, identify a homolog (preferably an ortholog) for which there is information.
2. Assume that the unknown protein will resemble the known one.
[Figure: a known protein and an unknown protein linked as homologs; information about structure, function etc. is available for the known protein]
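
The two steps above can be sketched in code. This is a deliberately simplified illustration: it assumes the sequences are already aligned and of equal length, it uses percent identity as a stand-in for evidence of homology (which is not the same thing as homology itself), and the 30% cut-off, the protein names and the annotation strings are all placeholders invented for the example.

```python
def percent_identity(a, b):
    """Percent identical positions between two pre-aligned, equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    matches = sum(x == y for x, y in zip(a, b))
    return 100.0 * matches / len(a)


def transfer_annotation(query, annotated, min_identity=30.0):
    """Step 1: find the best-matching known protein.
    Step 2: assume the query resembles it and copy its annotation.

    annotated maps a name to (sequence, annotation). Returns
    (name, identity, annotation), or None if nothing passes the cut-off.
    """
    best = None
    for name, (seq, annotation) in annotated.items():
        identity = percent_identity(query, seq)
        if identity >= min_identity and (best is None or identity > best[1]):
            best = (name, identity, annotation)
    return best


if __name__ == "__main__":
    known = {
        "protA": ("AKGCSGGEDA", "placeholder annotation A"),
        "protB": ("MMMMMMMMMM", "placeholder annotation B"),
    }
    print(transfer_annotation("AKGCTGAEDA", known))
    # ('protA', 80.0, 'placeholder annotation A')
```

In practice the search step is done with an alignment tool such as BLAST rather than position-by-position comparison, and the transferred information is treated as a hypothesis to be tested.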

Ab initio predictions
• The Latin term ab initio means “from the beginning” (in some contexts called de novo instead)
• A term used to define methods for making predictions about biological features using only a computational model without direct comparison to existing data.
• Advantages:
  • Can be used for all proteins, no known homolog is needed
  • If we can understand the principles enough to be able to make ab initio predictions, we have learned a lot about how that specific biological system works
• Drawbacks:
  • Difficult for many problems!

Artificial neural networks
• Applications in bioinformatics
  • Protein sorting prediction
  • Prediction of post-translational modifications
  • Secondary-structure prediction
  • Surface accessibility prediction
  • Protein disorder prediction
• Other applications
  • Speech recognition
  • Hand-written text recognition
  • Spam filters
  • Computer game AI
  • Vehicle control (driving a car)
  • Medical diagnosis
  • Financial applications (automatic trading systems)

Molecular analysis of biological systems

A new approach to decoding life: systems biology
Systems biology studies biological systems by systematically perturbing them (biologically, genetically, or chemically); monitoring the gene, protein, and informational pathway responses; integrating these data; and ultimately, formulating mathematical models that describe the structure of the system and its response to individual perturbations.
Ideker, T. et al. (2001) Annu. Rev. Genomics Hum. Genet. 2, 343–72

Integrated physical-interaction network

The “self-surviving” cell model
A total of 127 genes in the self-surviving cell

Bioinformatics as an integrated subject within Biomedicine