pptx - Bioinformatica UPF

Oct 4, 2013 (3 years and 10 months ago)

Introduction to Bioinformatics

Roderic Guigó i Serra

roderic.guigo@crg.cat

Bioinformatics, UPF

Academic year 2012-2013

Van Leeuwenhoek


In 1676 his credibility was questioned when he sent the Royal Society a copy
of his first observations of microscopic single-celled organisms. Heretofore,
the existence of single-celled organisms was entirely unknown … The Royal
Society arranged to send an English vicar, as well as a team of respected
jurists and doctors, to Delft, Holland to determine whether it was in fact
Van Leeuwenhoek's ability to observe and reason clearly (Wikipedia)

ACTCAGCCCCAGCGGAGGTGAAGGACGTCCTTCCCCAGGAGCCGGTGAGAAGCGCAGTCGGGGGCACGGGGATG
AGCTCAGGGGCCTCTAGAAAGATGTAGCTGGGACCTCGGGAAGCCCTGGCCTCCAGGTAGTCTCAGGAGAGCTAC
TCAGGGTCGGGCTTGGGGAGAGGAGGAGCGGGGGTGAGGCCAGCAGCAGGGGACTGGACCTGGGAAGGGCTGG
GCAGCAGAGACGACCCGACCCGCTAGAAGGTGGGGTGGGGAGAGCATGTGGACTAGGAGCTAAGCCACAGCAGG
ACCCCCACGAGTTGTCACTGTCATTTATCGAGCACCTACTGGGTGTCCCCAGTGTCCTCAGATCTCCATAACTGGGA
AGCCAGGGGCAGCGACACGGTAGCTAGCCGTCGATTGGAGAACTTTAAAATGAGGACTGAATTAGCTCATAAATGG
AAAACGGCGCTTAAATGTGAGGTTAGAGCTTAGAATGTGAAGGGAGAATGAGGAATGCGAGACTGGGACTGAGATG
GAACCGGCGGTGGGGAGGGGGAGGGGGTGTGGAATTTGAACCCCGGGAGAGAAAGATGGAATTTTGGCTATGGA
GGCCGACCTGGGGATGGGGAAATAAGAGAAGACCAGGAGGGAGTTAAATAGGGAATGGGTTGGGGGCGGCTTGGT
AACTGTTTGTGCTGGGATTAGGCTGTTGCAGATAATGGAGCAAGGCTTGGAAGGCTAACCTGGGGTGGGGCCGGGT
TGGGGTCGGGCTGGGGGCGGGAGGAGTCCTCACTGGCGGTTGATTGACAGTTTCTCCTTCCCCAGACTGGCCAATC
ACAGGCAGGAAGATGAAGGTTCTGTGGGCTGCGTTGCTGGTCACATTCCTGGCAGGTATGGGGCGGGGCTTGCTCG
GTTTTCCCCGCTTCTCCCCCTCTCATCCTCACCTCAACCTCCTGGCCCCATTCAAGCACACCCTGGGCCCCCTCTTC
TTCTGCTGGTCTGTCCCCTGAGGGGAAAGCCCAGGTCTGAGGCTTCTATGCTGCTTTCTGGCTCAGAACAGCGATTT
GACGCTCTGTGAGCCTCGGTTCCTCCCCCGCTTTTTTTTTTTCAGCCAGAGTCTCACTCTGTCGCCCAGGCTGGAGT
GCAGTGGCGCAATCTCAGCTCACTGCAAGCTCCGCCTCCCGGGTTCACGCTATTCTCCCGCCTCAGCCTCCCGAGT
AGCTGGGACTACAGGCGCCCGCCACCATGCCCGGCTAATTTTTTGTACTTTGAGTAGGGAAGGGGTTTCACTGTATT
ATCCAGGATGGTCTCTATCTCCTGACCTCGTGATCTGCCCGCCTGGCCTCCCAAAGTGCTGGAATTACAGGCGTGAG
CCTCCGCGCCCGGCCTCCCCATCCTTAATATAGGAGTTAGAAGTTTTTGTTTGTTTGTTTTGTTTTGTTTTTGTTTTGTT
TTGAGATGAAGTCCCTCTGTCGCCCAGGCTGGAGTGCAGTGGCTCCCAGGCTGGAGTTCAGTGGCTGGATCTCGGC
TCACTGCAAGCTCCGCCTCCCAGGTTCACGCCATTCTCCTGCCTCAGCCTCCGGAGTAGCTGGGACTACAGGAACA
TGCCACCACACCCGACTAACTTTTTTTGTATTTTTAGTAGAGACGGGGTTTCACCATGTTGGCCAGGCTGGTCTGGAA
CTCCTGACCTCAGGTGATCTGCCTGCTTCAACCTCCCAAAGTGCTGGGATTACAGACGTGGGCCACCGCGCCCGGC
TGGGAGTTAAGAGGTTTCTAATGCATTGCATTAGAATACCAGACACGGGACAGCTGTGATCTTTATTCTCCATCACCC
CACACAGCCCTGCCTGGGGCACACAAGGACACTCAATACACGCTTTTCGGGCGCGGTGGCTCAAGCTGTAATCCCA
GCACTTTGGGAGGCTGAGGCGGGTGGTACATGAGGTCAGGAGATCGAGACCATCCTGGCTAACATGGTGAAACCC
CGTCTCTACTAAAAATACAAAAAACTAGCCCGGGCGTGGTGGCGGGCGCCTGTAGTCCCAGCTACTCGGAGGCTGA
GGCAGGAGAATGGCGTGAACCTGGGAGGCGGAGCTTGCAGTGAGCCGAGATCGCGCCACTGCACTCCAGCCTGG
GTGACACAGCGCGAGACTCCGTCTCAAAAAAAAAAAAAAAAAAAAAAAAAAAAAATACACGCTTTTCCGCTAGGCA
CGGTGGCTCACCCCTGTAATCCCAGCATTTTGGGAGGCCAAGGTGGGAGGATCACTTGAGCCCAGGAGTTCAACAC
CAGACTCAGCAACATAGTGAGACTCTCTCTACTAAAAATACAAAAATTAGCCAGGCCTGGTGCCACACACCTGTGGT
CCCAGCTACTCAGAAGGCTAAGGCAGGAGGATCGCTTAAGCCCAGAAGGTCAAGGTTGCAGTGAACCACGTTCAG
GCCACTGCAGTCCAGCCTGGGTGACAGAGCAAGACCCTGTCTGTAAATAAATAACGCTTTTCAAGTGATTAAACAGA
CTCCCCCCTCACCCTGCCCACCATGGCTCCAAAGCAGCATTTGTGGAGCACCTTCTGTGTGCCCCTAGGTACTAGCT
GCCTGGACGGGGTCAGAAGGAACCTGAACCACCTTCAACTTGTTCCACACAGGATGCCAGGCCAAGGTGGAGCAA
CCGGTGGAGCCAGAGACAGAACCCGACGTTCGCCAGCAGGCTGAGTGGCAGAGCGGCCAGCCCTGGGAGCTGG
CACTGGGTCGCTTTTGGGATTACCTGCGCTGGGTGCAGACACTGTCTGAGCAGGTGCAGGAGGAGCTGCTCAGCC
CCCAGGTCACCCAGGAACTGACGTGAGTGTCCCCATCCCGGCCCTTGACCCTCCTGGTGGGCGGCTATACCTCCCC
AGGTCCAGGTTTCATTCTGCCCCTGCCACTAAGTCTTGGGGGCCTGGGTCTCTGCTGGTTCTAGCTTCCTCTTCCCAT
TTCTGACTCCTGGCTTTAGCTCTCTGGAATTCTCTCTCTCAGTTCTGTTTCTCCCTCTTCCCTTCTGACTCAGCCTGTC
ACACTCGTCCTGGCGCTGTCTCTGTCCTTCACTAGCTCTTTTATATAGAGACAGAGAGATGGGGTCTCACTGTGTTGC
CCAGGCTGGTCTTGAACTTCTGGGCTCAAGCGATCCTCCCACCTCGCCTCCCAAAGTGCTGGGAATAGAGACATGA
GCCACCTTGCTCGGCCTCCTAGCTCTTTCTTCGTCTCTGCCTCTGCTCTCTGCGTCTGTCTTTGTCTCCTCTCTGCCTC
TGTCCCGTTCCTTCTCTCTTGGTTCACTGCCCTTCTGTCTCTCCCTGTTCTCCTTAGGAGACTCTCCTCTCTTCCTTCT
CGAGTCTCTCTGGCTGATCCCCATCTCACCCACACCTATCC

Chromosomal matter is “an aperiodic crystal”, made up of a succession of a
small number of isomeric elements*, whose specific sequence is responsible
for its functionality.





(*) “the number of atoms in such a structure need not be very
large to produce an almost unlimited number of possible
arrangements. For illustration, think of the Morse code…”

1943: Schrödinger, “What is Life?”

ENIAC

Late 40s: first digital computers

MALWTRLRPLLALLALWPPPPARAFVNQHLCGS
HLVEALYLVCGERGFFYTPKARREVEGPQVGAL
ELAGGPGAGGLEGPPQKRGIVEQCCASVCSLYQ
LENYCN


Amino acid sequence of bovine insulin

http://www.ict-science-to-society.org/

Early 60s: the genetic code

GAGTTTTATCGCTTCCATGACGCAGAAGTTAACACTTTCGGATATTTCTGATGAGT
CGAAAAATTATCTTGATAAAGCAGGAATTACTACTGCTTGTTTACGAATTAAATCG
AAGTGGACTGCTGGCGGAAAATGAGAAAATTCGACCTATCCTTGCGCAGCTCGA
GAAGCTCTTACTTTGCGACCTTTCGCCATCAACTAACGATTCTGTCAAAAACTGA
CGCGTTGGATGAGGAGAAGTGGCTTAATATGCTTGGCACGTTCGTCAAGGACTG
GTTTAGATATGAGTCACATTTTGTTCATGGTAGAGATTCTCTTGT

MALWTRLRPLLALLALWPPPPARAFVNQHLCGSHLVEALYLVCGERGFFY
TPKARREVEGPQVGALELAGGPGAGGLEGPPQKRGIVEQCCASVCSLYQ
LENYCN

1957: invention of the programming language FORTRAN

Computers become smaller and therefore faster and cheaper

During the 60s computers are introduced into banks,
financial institutions, universities and research centers

60s: Transistors and integrated circuits

Sequence alignment and comparison

Substitution matrices

Sequence alignment


The substitution matrices provided a model under
which the concept of optimal alignment could be
formalized, and computed. The optimal alignment
between two sequences is the alignment that
maximizes the sum of the amino acid substitution
values at each aligned position.



 A  R  N  D  C  Q            A  R  N  D  C  Q
 S  K  -  E  A  E            -  S  K  E  A  E
+1 +3 -1 +3 -2 +2 = 6       -1 +0 +1 +3 -2 +2 = 3
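The two candidate alignments of ARNDCQ and SKEAE can be scored mechanically. What follows is a minimal sketch, not the lecture's own code: it assumes the per-column values shown on the slide (only the residue pairs that appear in the example are listed) and a score of -1 for a residue aligned to a gap. The names `SUBST`, `GAP`, and `alignment_score` are hypothetical.

```python
# Substitution values for the residue pairs used in the slide's example.
# Only these pairs are listed; a real matrix covers all 20 x 20 pairs.
SUBST = {
    ("A", "S"): 1, ("R", "K"): 3, ("D", "E"): 3,
    ("C", "A"): -2, ("Q", "E"): 2, ("R", "S"): 0, ("N", "K"): 1,
}
GAP = -1  # assumed score for a residue aligned to a gap


def alignment_score(top, bottom):
    """Sum the per-column scores of two equal-length gapped sequences."""
    if len(top) != len(bottom):
        raise ValueError("aligned sequences must have the same length")
    total = 0
    for x, y in zip(top, bottom):
        if x == "-" or y == "-":
            total += GAP
        else:
            # The matrix is symmetric, so look the pair up in either order.
            total += SUBST[(x, y)] if (x, y) in SUBST else SUBST[(y, x)]
    return total


print(alignment_score("ARNDCQ", "SK-EAE"))  # 6
print(alignment_score("ARNDCQ", "-SKEAE"))  # 3
```

Under this model the first alignment is preferred because its total (6) is higher than that of the second (3).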

The total number of possible alignments between
two sequences of length 100 is approximately 10^200.

With dynamic programming (DP) the number of operations required to
obtain the optimal alignment is approximately 3 x 100^2.
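The operation count quoted for dynamic programming can be illustrated with a short sketch of global alignment in the style of Needleman and Wunsch. The scoring scheme (match +1, mismatch -1, gap -1) and the function name `needleman_wunsch_score` are assumptions chosen for brevity; a real protein alignment would use a substitution matrix instead. Each cell of the table compares three candidates, which is where the factor 3 x n^2 comes from.

```python
def needleman_wunsch_score(a, b, match=1, mismatch=-1, gap=-1):
    """Return (optimal global alignment score, candidate evaluations)."""
    rows, cols = len(a) + 1, len(b) + 1
    table = [[0] * cols for _ in range(rows)]
    # First column and first row: a prefix aligned entirely against gaps.
    for i in range(1, rows):
        table[i][0] = i * gap
    for j in range(1, cols):
        table[0][j] = j * gap
    evaluations = 0
    for i in range(1, rows):
        for j in range(1, cols):
            pair = match if a[i - 1] == b[j - 1] else mismatch
            diagonal = table[i - 1][j - 1] + pair  # align a[i-1] with b[j-1]
            up = table[i - 1][j] + gap             # a[i-1] against a gap
            left = table[i][j - 1] + gap           # b[j-1] against a gap
            table[i][j] = max(diagonal, up, left)
            evaluations += 3                       # three candidates per cell
    return table[-1][-1], evaluations


score, evaluations = needleman_wunsch_score("A" * 100, "A" * 100)
print(score, evaluations)  # 100 30000
```

For two sequences of length 100 the table has 100 x 100 inner cells, so the sketch performs 3 x 100^2 = 30,000 candidate evaluations instead of enumerating every possible alignment.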

Query: 25  IPREVIERLARSQIHSIRDLQRLLEIDSVGSEDSLDTSLRAHGVHATKHVPEKRPLPIRR 84
           IP E+ + L+ I S DLQRLL+ DS G ED + L H+ + R
Sbjct: 10  IPEELYKMLSGHSIRSFDDLQRLLQGDS-GKEDGAELDLNMTRSHSGGELESLA----RG 64

Query: 85  KRSI------EEAVPAVCKTRTVIYEIPRSQVDPTSANFLIWPPCVEVKRCTGCCNTSSV 138
           KRS+ E A+ A CKTRT ++EI R +D T+ANFL+WPPCVEV+RC+GCCN +V
Sbjct: 65  KRSLGSLSVAEPAMIAECKTRTEVFEISRRLIDRTNANFLVWPPCVEVQRCSGCCNNRNV 124

Query: 139 KCQPSRVHHRSVKVAKVEYVRKKPKLKEVQVRLEEHLECAC 179
           +C+P++V R V+V K+E VRKKP K+ V LE+HL C C
Sbjct: 125 QCRPTQVQLRPVQVRKIEIVRKKPIFKKATVTLEDHLACKC 165

DYNAMIC PROGRAMMING
Needleman and Wunsch, 1970
Smith and Waterman, 1981

70s: Optimal sequence alignment

Mid-70s: DNA sequencing, Sanger; Maxam and Gilbert

By the end of the sixties, hundreds of
proteins had been sequenced, but the
sequencing of nucleic acids remained
elusive



Sanger (Cambridge)


Maxam and Gilbert (Harvard)

70s: Internet.

Advanced Research Projects Agency

gagttttatcgcttccatgacgcagaagttaacactttcggatatttctgatgagtcgaaaaattatcttgataaagcaggaattactactgcttgtttacgaattaaat
cgaagtggactgctggcggaaaatgagaaaattcgacctatccttgcgcagctcgagaagctcttactttgcgacctttcgccatcaactaacgattctgtcaaaaactg
acgcgttggatgaggagaagtggcttaatatgcttggcacgttcgtcaaggactggtttagatatgagtcacattttgttcatggtagagattctcttgttgacatttta
aaagagcgtggattactatctgagtccgatgctgttcaaccactaataggtaagaaatcatgagtcaagttactgaacaatccgtacgtttccagaccgctttggcctct
attaagctcattcaggcttctgccgttttggatttaaccgaagatgatttcgattttctgacgagtaacaaagtttggattgctactgaccgctctcgtgctcgtcgctg
cgttgaggcttgcgtttatggtacgctggactttgtgggataccctcgctttcctgctcctgttgagtttattgctgccgtcattgcttattatgttcatcccgtcaaca
ttcaaacggcctgtctcatcatggaaggcgctgaatttacggaaaacattattaatggcgtcgagcgtccggttaaagccgctgaattgttcgcgtttaccttgcgtgta
cgcgcaggaaacactgacgttcttactgacgcagaagaaaacgtgcgtcaaaaattacgtgcggaaggagtgatgtaatgtctaaaggtaaaaaacgttctggcgctcgc
cctggtcgtccgcagccgttgcgaggtactaaaggcaagcgtaaaggcgctcgtctttggtatgtaggtggtcaacaattttaattgcaggggcttcggccccttacttg
aggataaattatgtctaatattcaaactggcgccgagcgtatgccgcatgacctttcccatcttggcttccttgctggtcagattggtcgtcttattaccatttcaacta
ctccggttatcgctggcgactccttcgagatggacgccgttggcgctctccgtctttctccattgcgtcgtggccttgctattgactctactgtagacatttttactttt
tatgtccctcatcgtcacgtttatggtgaacagtggattaagttcatgaaggatggtgttaatgccactcctctcccgactgttaacactactggttatattgaccatgc
cgcttttcttggcacgattaaccctgataccaataaaatccctaagcatttgtttcagggttatttgaatatctataacaactattttaaagcgccgtggatgcctgacc
gtaccgaggctaaccctaatgagcttaatcaagatgatgctcgttatggtttccgttgctgccatctcaaaaacatttggactgctccgcttcctcctgagactgagctt
tctcgccaaatgacgacttctaccacatctattgacattatgggtctgcaagctgcttatgctaatttgcatactgaccaagaacgtgattacttcatgcagcgttacca
tgatgttatttcttcatttggaggtaaaacctcttatgacgctgacaaccgtcctttacttgtcatgcgctctaatctctgggcatctggctatgatgttgatggaactg
accaaacgtcgttaggccagttttctggtcgtgttcaacagacctataaacattctgtgccgcgtttctttgttcctgagcatggcactatgtttactcttgcgcttgtt
cgttttccgcctactgcgactaaagagattcagtaccttaacgctaaaggtgctttgacttataccgatattgctggcgaccctgttttgtatggcaacttgccgccgcg
tgaaatttctatgaaggatgttttccgttctggtgattcgtctaagaagtttaagattgctgagggtcagtggtatcgttatgcgccttcgtatgtttctcctgcttatc
accttcttgaaggcttcccattcattcaggaaccgccttctggtgatttgcaagaacgcgtacttattcgccaccatgattatgaccagtgtttccagtccgttcagttg
ttgcagtggaatagtcaggttaaatttaatgtgaccgtttatcgcaatctgccgaccactcgcgattcaatcatgacttcgtgataaaagattgagtgtgaggttataac
gccgaagcggtaaaaattttaatttttgccgctgaggggttgaccaagcgaagcgcggtaggttttctgcttaggagtttaatcatgtttcagacttttatttctcgcca
taattcaaactttttttctgataagctggttctcacttctgttactccagcttcttcggcacctgttttacagacacctaaagctacatcgtcaacgttatattttgata
gtttgacggttaatgctggtaatggtggttttcttcattgcattcagatggatacatctgtcaacgccgctaatcaggttgtttctgttggtgctgatattgcttttgat
gccgaccctaaattttttgcctgtttggttcgctttgagtcttcttcggttccgactaccctcccgactgcctatgatgtttatcctttgaatggtcgccatgatggtgg
ttattataccgtcaaggactgtgtgactattgacgtccttccccgtacgccgggcaataacgtttatgttggtttcatggtttggtctaactttaccgctactaaatgcc
gcggattggtttcgctgaatcaggttattaaagagattatttgtctccagccacttaagtgaggtgatttatgtttggtgctattgctggcggtattgcttctgctcttg
ctggtggcgccatgtctaaattgtttggaggcggtcaaaaagccgcctccggtggcattcaaggtgatgtgcttgctaccgataacaatactgtaggcatgggtgatgct
ggtattaaatctgccattcaaggctctaatgttcctaaccctgatgaggccgcccctagttttgtttctggtgctatggctaaagctggtaaaggacttcttgaaggtac
gttgcaggctggcacttctgccgtttctgataagttgcttgatttggttggacttggtggcaagtctgccgctgataaaggaaaggatactcgtgattatcttgctgctg
catttcctgagcttaatgcttgggagcgtgctggtgctgatgcttcctctgctggtatggttgacgccggatttgagaatcaaaaagagcttactaaaatgcaactggac
aatcagaaagagattgccgagatgcaaaatgagactcaaaaagagattgctggcattcagtcggcgacttcacgccagaatacgaaagaccaggtatatgcacaaaatga
gatgcttgcttatcaacagaaggagtctactgctcgcgttgcgtctattatggaaaacaccaatcttcccaagcaacagcaggtttccgagattatgcgccaaatgctta
ctcaagctcaaacggctggtcagtattttaccaatgaccaaatcaaagaaatgactcgcaaggttagtgctgaggttgacttagttcatcagcaaacgcagaatcagcgg
tatggctcttctcatattggcgctactgcaaaggatatttctaatgtcgtcactgatgctgcttctggtgtggttgatatttttcatggtattgataaagctgttgccga
tacttggaacaatttctggaaagacggtaaagctgatggtattggctctaatttgtctaggaaataaccgtcaggattgacaccctcccaattgtatgttttcatgcctc
caaatcttggaggcttttttatggttcgttcttattacccttctgaatgtcacgctgattattttgactttgag

1977: ΦX174 virus genome

1982: the first electronic databases

FASTA. 1982: Wilbur and Lipman; 1985: Lipman and Pearson

BLAST, 1990: Altschul, Gish, Miller, Myers and Lipman

Accelerating database searches: hash methods

Query Sequence

Position:  1  2  3  4  5  6  7  8  9 10 11 12 13
Residue:   W  A  T  S  N  A  N  D  C  R  I  C  K

Hash table K=1

Residue  Positions
A        2, 6
C        9, 12
D        8
I        11
K        13
N        5, 7
R        10
S        4
T        3
W        1

http://www.ccl.rutgers.edu/~ouyang/5020/FASTA-BLAST.ppt
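The hash table for the query sequence WATSNANDCRICK can be built in a single pass. The sketch below is illustrative only and is not how FASTA or BLAST are implemented: the function name `build_kmer_index` is hypothetical, and positions are 1-based to match the slide.

```python
from collections import defaultdict


def build_kmer_index(sequence, k=1):
    """Map every length-k word to the 1-based positions where it starts."""
    index = defaultdict(list)
    for start in range(len(sequence) - k + 1):
        index[sequence[start:start + k]].append(start + 1)
    return dict(index)


index = build_kmer_index("WATSNANDCRICK", k=1)
for word in sorted(index):
    print(word, index[word])
```

Once the index exists, every occurrence of a word from a database sequence can be located in the query with a single dictionary lookup, without rescanning the query. Larger values of `k` give fewer, more specific hits.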


Search of the Platelet-Derived Growth Factor sequence

1982, Doolittle: relationship between oncogenes and growth factors

1990: The Human Genome Project

THE HUMAN GENOME PROGRAM (HGP) is producing large quantities of
complex map and DNA sequence data. Informatics projects in algorithms,
software, and databases are crucial in accumulating and interpreting
these data in a robust and automated fashion at genome and sequencing
centers.

Computer systems play essential roles in all aspects of genome research,
from data acquisition and analysis to data management. Without powerful
computers and appropriately designed data management systems,
high-volume genome research cannot proceed.


This proposal concerns the management of general information about accelerators and
experiments at CERN. It discusses the problems of loss of information about complex evolving
systems and derives a solution based on a distributed hypertext system
(Tim Berners-Lee)

1990: WWW at CERN

Human Genome Project Milestones

2001: the culmination of the project

bioinformatics

Medline articles with keyword "Bioinformatics".

Year         # articles
To 1990      0
1990-1994    15
1995-1999    823
2000-2004    7827
2005-2008    18822

Bioinformatics, Genomics,
Systems Biology in Medline

What's past is prologue

W. Shakespeare, The Tempest



ABI PRISM 3700 DNA Analyzer



Further Evolution of Large-scale Genome Sequencing

2000: Human genome working drafts
  Data unit of approximately 10x coverage of human
  10 years and cost about $3 billion

2008: Major genome centers can sequence the same number of base pairs every 4 days
  1000 Genomes Project launched
  World-wide capacity dramatically increasing

2009: Every 4 hours ($25,000)

2010: Every 14 minutes ($5,000)
  Illumina HiSeq 2000 machine produces 200 gigabases per 8-day run (BGI has ordered 128)

Slide from Paul Flicek. EBI Bioinformatics Advisory Council

ENIAC, 1950s

2.4 x 0.9 x 30 (m)

385 operations/second. 10^-6 operations/second/cm^3

MAC AIR, 2010s

~1 x 32.5 x 22.7 (cm)

133,656,056 operations/second. 10^5 operations/second/cm^3

CELERA GENOMICS, year 2000

1,000 m^2. 2 yr.

3 Gb at 10x

5x10^-6 Gb/day/m^3



HISEQ 2500, year 2012

119 x 94 x 76 (cm). 1 day

120 Gb

10^2 Gb/day/m^3

Moore’s Law