Presented by Jennifer Johnstone
What is Bioinformatics?
Sequence Matching Problem
The Alignment Problem
Future Research
Bioinformatics is the application of computers to biology, using algorithms, statistics and other mathematical techniques to decipher the language of DNA.
Given a string s of length n and a pattern p of length m, find the set I of indices of s at which p occurs as an exact match.
Example: Let p = ABA and s = AABAAGTABA (positions counted from 1). Then I = {2, 8}, since p occurs at position 2,
A[ABA]AGTABA
and at position 8,
AABAAGT[ABA]
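The definition above translates directly into the naive algorithm: try every shift of p along s and compare character by character. A minimal sketch, assuming 1-based reporting to match the example (the function name `naive_match` is hypothetical):

```python
def naive_match(s, p):
    """Return every 1-based index i such that p occurs in s starting at i.

    Tries each of the n - m + 1 possible shifts and compares up to m
    characters at each one, hence O(m*n) time in the worst case.
    """
    n, m = len(s), len(p)
    matches = []
    for shift in range(n - m + 1):            # 0-based shift of p along s
        if all(s[shift + k] == p[k] for k in range(m)):
            matches.append(shift + 1)         # report 1-based, as in the example
    return matches


print(naive_match("AABAAGTABA", "ABA"))       # [2, 8]
```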
Naive String Matching Algorithm, O(m*n).
String Matching with Finite Automata, O(m*|Σ| + n).
Boyer-Moore Algorithm, O(m+n) (in practice).
String Matching with Compact Suffix Trees, O(n log(n) + m*|Σ| + k).
String Matching using Suffix Arrays, O(n + m log(n) + k).
Here Σ is the alphabet and k is the number of occurrences of p in s.
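As a rough illustration of the last entry, a suffix array lists the starting positions of all suffixes of s in sorted order, so every suffix that begins with p sits in one contiguous block that two binary searches can locate (each step comparing at most m characters, hence the m log(n) term, plus k to report the hits). This is a sketch under a stated simplification: the array here is built by sorting the suffixes directly, which is easy to read but slower than the construction the O(n log(n)) bound assumes. Function names are hypothetical.

```python
def build_suffix_array(s):
    """0-based start positions of all suffixes of s, in lexicographic order.

    Simplification: sorts the suffixes directly instead of using a
    dedicated O(n log n) or linear-time construction.
    """
    return sorted(range(len(s)), key=lambda i: s[i:])


def suffix_array_search(s, sa, p):
    """Return the sorted 1-based indices where p occurs in s.

    Truncating each sorted suffix to m characters keeps the order, so the
    suffixes whose m-prefix equals p form one block found by binary search.
    """
    m = len(p)
    # Lower bound: first suffix whose m-character prefix is >= p.
    lo, hi = 0, len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if s[sa[mid]:sa[mid] + m] < p:
            lo = mid + 1
        else:
            hi = mid
    start = lo
    # Upper bound: first suffix whose m-character prefix is > p.
    hi = len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if s[sa[mid]:sa[mid] + m] <= p:
            lo = mid + 1
        else:
            hi = mid
    # The k hits are sa[start:lo]; sort them by position for reporting.
    return sorted(i + 1 for i in sa[start:lo])


s = "AABAAGTABA"
sa = build_suffix_array(s)
print(suffix_array_search(s, sa, "ABA"))     # [2, 8]
```

Once the array is built it can be reused for any number of patterns, which is why the construction cost is paid only once.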
Given a pattern p = aba and a string s = acbababa, we must first define the transition function δ(q, x), which gives the automaton's next state when it is in state q and reads the character x.
q   δ(q,a)   δ(q,b)   δ(q,c)
0   1        0        0
1   1        2        0
2   3        0        0
3   1        2        0
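The transition table can be generated mechanically: δ(q, x) is the length of the longest prefix of p that is also a suffix of p[:q] followed by x. The sketch below applies that definition directly (the straightforward construction, not the fastest one) and assumes the alphabet Σ = {a, b, c} from the example; `transition_table` is a hypothetical name.

```python
def transition_table(p, alphabet):
    """Build delta[(q, x)] for the string-matching automaton of pattern p.

    delta(q, x) is the length of the longest prefix of p that is also a
    suffix of p[:q] + x. This direct construction tests candidate prefixes
    from longest to shortest, taking O(m^3 * |alphabet|) time overall.
    """
    m = len(p)
    delta = {}
    for q in range(m + 1):
        for x in alphabet:
            text = p[:q] + x
            k = min(m, q + 1)
            # Shrink k until p[:k] is a suffix of text (k = 0 always works).
            while k > 0 and not text.endswith(p[:k]):
                k -= 1
            delta[(q, x)] = k
    return delta


delta = transition_table("aba", "abc")
for q in range(4):
    print(q, [delta[(q, x)] for x in "abc"])
# 0 [1, 0, 0]
# 1 [1, 2, 0]
# 2 [3, 0, 0]
# 3 [1, 2, 0]
```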
i   s_i   State
1   a     δ(0,a) = 1
2   c     δ(1,c) = 0
3   b     δ(0,b) = 0
4   a     δ(0,a) = 1
5   b     δ(1,b) = 2
6   a     δ(2,a) = 3
7   b     δ(3,b) = 2
8   a     δ(2,a) = 3
Now we see that the match condition (reaching the accepting state m = 3) is met for i = 6 and i = 8. The starting indices are then j = i - m + 1 = i - 3 + 1, so I = {4, 6}.
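The trace above is a single left-to-right pass that applies δ once per character, which is where the O(n) matching time comes from. A minimal sketch of that scan, with the transition table for p = aba copied in as data (the names `DELTA` and `automaton_match` are hypothetical):

```python
# Transition function delta(q, x) for p = "aba", copied from the table.
DELTA = {
    (0, "a"): 1, (0, "b"): 0, (0, "c"): 0,
    (1, "a"): 1, (1, "b"): 2, (1, "c"): 0,
    (2, "a"): 3, (2, "b"): 0, (2, "c"): 0,
    (3, "a"): 1, (3, "b"): 2, (3, "c"): 0,
}


def automaton_match(s, delta, m):
    """Scan s once and report the 1-based start index of each match.

    Reaching the accepting state m after reading s_i means a match ends
    at i, so it starts at j = i - m + 1.
    """
    q = 0
    starts = []
    for i, x in enumerate(s, start=1):    # i is 1-based, like the trace table
        q = delta[(q, x)]
        if q == m:
            starts.append(i - m + 1)
    return starts


print(automaton_match("acbababa", DELTA, 3))   # [4, 6]
```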
Given two strings, we want to generate an optimal alignment. Aligning two strings may involve inserting gaps and/or accepting mismatched entries.
Example: Consider the following possible alignment of the two strings GACGGATTATG and GATCGGAATAG (a - marks a gap):
GA-CGGATTATG
GATCGGAATA-G
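To make the example concrete, the two rows can be written with - for a gap (GA-CGGATTATG over GATCGGAATA-G, as read from the example's layout) and checked column by column. The scoring values here, +1 per match and -1 per mismatch or gap column, are an assumption for illustration; no particular scoring function is fixed above. `column_counts` is a hypothetical helper.

```python
def column_counts(top, bottom, gap="-"):
    """Count matches, mismatches and gap columns in a pairwise alignment."""
    if len(top) != len(bottom):
        raise ValueError("aligned rows must have equal length")
    matches = mismatches = gaps = 0
    for a, b in zip(top, bottom):
        if a == gap or b == gap:
            gaps += 1
        elif a == b:
            matches += 1
        else:
            mismatches += 1
    return matches, mismatches, gaps


top, bottom = "GA-CGGATTATG", "GATCGGAATA-G"
# Deleting the gaps must give back the original sequences.
assert top.replace("-", "") == "GACGGATTATG"
assert bottom.replace("-", "") == "GATCGGAATAG"

matches, mismatches, gaps = column_counts(top, bottom)
print(matches, mismatches, gaps)       # 9 1 2
# Assumed scoring: +1 per match, -1 per mismatch, -1 per gap column.
print(matches - mismatches - gaps)     # 6
```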
Dynamic Approach
Computing the optimal alignment using a dynamic programming matrix and a scoring function, in O(m*n) time.
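No specific algorithm or scoring function is named here; the classic dynamic-programming method for global alignment is Needleman-Wunsch, sketched below with an assumed scoring of +1 per match and -1 per mismatch or gap. Each cell of the (m+1) x (n+1) matrix is filled in constant time, giving the O(m*n) cost, and a traceback recovers one optimal alignment (ties may be broken differently from any alignment drawn by hand). `global_alignment` is a hypothetical name.

```python
def global_alignment(a, b, match=1, mismatch=-1, gap=-1):
    """Needleman-Wunsch global alignment with a linear gap penalty.

    Returns the optimal score and one optimal pair of aligned rows.
    Time and space are O(len(a) * len(b)).
    """
    def pair(x, y):
        return match if x == y else mismatch

    n, m = len(a), len(b)
    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + pair(a[i - 1], b[j - 1]),  # (mis)match
                score[i - 1][j] + gap,                           # gap in b
                score[i][j - 1] + gap,                           # gap in a
            )

    # Trace back from the bottom-right corner to recover one optimal alignment.
    top, bottom = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + pair(a[i - 1], b[j - 1]):
            top.append(a[i - 1])
            bottom.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            top.append(a[i - 1])
            bottom.append("-")
            i -= 1
        else:
            top.append("-")
            bottom.append(b[j - 1])
            j -= 1
    return score[n][m], "".join(reversed(top)), "".join(reversed(bottom))


best, row1, row2 = global_alignment("GACGGATTATG", "GATCGGAATAG")
print(best)    # 6
print(row1)
print(row2)
```

Under this assumed scoring the optimum is 6, so the example alignment (9 matches, 1 mismatch, 2 gaps) is one of the optimal alignments.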
Heuristic Approach
Used in practice to speed up searches of large databases. Consider the human genome, which is over 3 billion characters long and of which you may need to align only a small portion.
FASTP and FASTA Programs
BLAST Algorithm
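Both FASTA and BLAST begin by finding short word matches ("seeds") between the query and the database, and only then spend effort aligning the promising regions. The sketch below shows just that first step, assuming a word length k = 3 and an in-memory dictionary index; it is a simplified illustration of the seeding idea, not the actual FASTA or BLAST procedure, which add scoring, seed extension and statistics. Function names are hypothetical.

```python
from collections import defaultdict


def build_word_index(database, k):
    """Map every length-k word of the database to the 0-based positions where it starts."""
    index = defaultdict(list)
    for pos in range(len(database) - k + 1):
        index[database[pos:pos + k]].append(pos)
    return index


def find_seeds(query, index, k):
    """Return (query_pos, database_pos) pairs where a length-k word matches exactly.

    Each pair marks a candidate region that a real tool would then extend
    and score; this sketch only reports the hits.
    """
    seeds = []
    for qpos in range(len(query) - k + 1):
        for dpos in index.get(query[qpos:qpos + k], []):
            seeds.append((qpos, dpos))
    return seeds


database = "GACGGATTATG"   # stand-in for a much larger sequence
index = build_word_index(database, k=3)
print(find_seeds("GGATT", index, k=3))   # [(0, 3), (1, 4), (2, 5)]
```

All three hits lie on the same diagonal (database_pos - query_pos = 3), the kind of cluster that suggests a longer shared region worth aligning.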
Heuristic approaches are constantly being improved upon and researched, as the algorithms themselves are only 10-15 years old.
Development of tools that can perform a 10-way comparison of genomes.
Bioinformatics as a whole is an active field of research that strongly needs qualified professionals with an aptitude for computing and/or biology.