CS/BF549 Pattern Matching and Pattern Detection

Fall 2008


Room: KCB 102
Meeting Time: T, Th 3:30-5:00 pm







Instructor: Gary Benson
Office: Rm 903, 24 Cummington St.
Phone: 617-358-2965
Email: gbenson@bu.edu
Office Hours: W 4:00-5:00 or by appointment



Grading:
    Homework (assigned each week): 60%
    Programming project: 20%
    Final: 15%
    Class Participation: 5%



Project Due: Dec. 5
Final Exam (take home) Due: Dec. 18








Policy on Academic Honesty

Except as otherwise noted, all homework, projects, and take-home tests are to represent individual effort and are to be written up and turned in individually. In-class tests are to be taken without notes, other aids, or reference to another student's work. Violations of these policies will result in a failing grade for the assignment or test. Violations of academic honesty which exceed the purview of this class will be referred to the Dean of Students.


Approximate Week-by-Week Syllabus


Week 1: Exact Pattern Matching
    Introduction
    Naïve algorithm
    Finite Automata


Week 2: Exact Pattern Matching
    Knuth-Morris-Pratt (KMP) automaton
    KMP text scan algorithm
    KMP failure table
Readings:
    S. Baase. Computer Algorithms: Introduction to Design and Analysis. Addison-Wesley, 2nd edition, 1988, pp 212-219.
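
For orientation on the Week 2 topics, here is a minimal Python sketch of a KMP failure table and the text scan that uses it. It is an illustration only, not taken from the course materials or the Baase reading. The function names `failure_table` and `kmp_search` are hypothetical, and the table convention (longest proper prefix that is also a suffix) is one of several in common use.

```python
def failure_table(pattern):
    """Return fail, where fail[i] is the length of the longest proper
    prefix of pattern[:i + 1] that is also a suffix of it."""
    fail = [0] * len(pattern)
    k = 0  # length of the current matched prefix
    for i in range(1, len(pattern)):
        # On a mismatch, fall back to the next shorter border.
        while k > 0 and pattern[i] != pattern[k]:
            k = fail[k - 1]
        if pattern[i] == pattern[k]:
            k += 1
        fail[i] = k
    return fail


def kmp_search(text, pattern):
    """Return the start index of every occurrence of pattern in text."""
    if not pattern:
        return []
    fail = failure_table(pattern)
    matches = []
    k = 0  # number of pattern characters currently matched
    for i, ch in enumerate(text):
        while k > 0 and ch != pattern[k]:
            k = fail[k - 1]
        if ch == pattern[k]:
            k += 1
        if k == len(pattern):
            matches.append(i - k + 1)
            k = fail[k - 1]  # keep scanning for overlapping matches
    return matches
```

The text pointer never moves backward, so the scan takes time linear in the combined length of the text and the pattern.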


Week 3: Exact Pattern Matching
    Boyer-Moore (BM) algorithm
    Finding all cyclic shifts of a pattern in a text
    Substring-Prefix oracle
Readings:
    D. Gusfield. Algorithms on strings, trees, and sequences: computer science and computational biology. Cambridge University Press, 1997, pp 7-9, 16-23.


Week 4: Exact Pattern Matching
    Randomized Pattern Matching (Karp-Rabin)
Readings:
    R. Karp and M. Rabin. Efficient Randomized Pattern-Matching Algorithms. IBM Journal of Research and Development. 31:249-260, 1987.
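
As a rough illustration of the Week 4 topic, the sketch below shows fingerprint-based matching with a rolling hash in Python. It simplifies the method in the reading: the modulus here is a fixed prime (an assumption made for brevity) rather than a randomly chosen one, and every fingerprint hit is verified by direct comparison. The name `karp_rabin` and the default `base` and `modulus` values are hypothetical.

```python
def karp_rabin(text, pattern, base=256, modulus=1_000_000_007):
    """Return the start index of every occurrence of pattern in text,
    using a rolling fingerprint to skip most character comparisons."""
    n, m = len(text), len(pattern)
    if m == 0 or m > n:
        return []

    # Weight of the leading character in a window of length m.
    high = pow(base, m - 1, modulus)

    # Fingerprints of the pattern and of the first text window.
    p_hash = t_hash = 0
    for i in range(m):
        p_hash = (p_hash * base + ord(pattern[i])) % modulus
        t_hash = (t_hash * base + ord(text[i])) % modulus

    matches = []
    for s in range(n - m + 1):
        # Equal fingerprints may be a false positive, so verify.
        if p_hash == t_hash and text[s:s + m] == pattern:
            matches.append(s)
        if s < n - m:
            # Slide the window: drop text[s], append text[s + m].
            t_hash = ((t_hash - ord(text[s]) * high) * base
                      + ord(text[s + m])) % modulus
    return matches
```

Verifying each hit keeps the output exact at the cost of extra comparisons when fingerprints collide.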


Week 5: Exact Pattern Matching
    Aho-Corasick Dictionary Matching
    Algorithm
    fail links
Readings:
    D. Gusfield. Algorithms on strings, trees, and sequences: computer science and computational biology. Cambridge University Press, 1997, pp 52-59.


Week 6: Exact Pattern Matching
    Introduction to Suffix Trees
    O(m^3) time construction
    Suffix links and O(m^2) time construction
Readings:
    Dan Gusfield. Algorithms on strings, trees, and sequences: computer science and computational biology. Cambridge University Press, 1997, pp 94-103.


Week 7: Exact Pattern Matching
    Suffix trees O(m) time construction
Readings:
    Dan Gusfield. Algorithms on strings, trees, and sequences: computer science and computational biology. Cambridge University Press, 1997, pp 103-107.


Week 8: Exact Pattern Matching
    Suffix arrays
Readings:
    TBA


Week 9: Approximate Pattern Matching
    Introduction to Sequence Alignment
    Common mutations, distance and similarity scoring
    Longest common subsequence problem
Readings:
    G. Benson. An Introduction to Computational Biology. Lecture Notes, 2001, pp 1-18.
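
To make the longest common subsequence problem from Week 9 concrete, here is a minimal dynamic programming sketch in Python. It is a standard textbook formulation, not the presentation from the lecture notes, and the function name `lcs` is hypothetical.

```python
def lcs(a, b):
    """Return one longest common subsequence of strings a and b."""
    m, n = len(a), len(b)
    # table[i][j] = LCS length of a[:i] and b[:j].
    table = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if a[i - 1] == b[j - 1]:
                table[i][j] = table[i - 1][j - 1] + 1
            else:
                table[i][j] = max(table[i - 1][j], table[i][j - 1])

    # Trace back from the bottom-right cell to recover a subsequence.
    out = []
    i, j = m, n
    while i > 0 and j > 0:
        if a[i - 1] == b[j - 1]:
            out.append(a[i - 1])
            i -= 1
            j -= 1
        elif table[i - 1][j] >= table[i][j - 1]:
            i -= 1
        else:
            j -= 1
    return "".join(reversed(out))
```

Filling the table takes time and space proportional to the product of the two string lengths. The same table layout carries over to alignment scoring once matches, mismatches, and gaps are given weights.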


Week 10: Approximate Pattern Matching
    Local and Global Alignment
Readings:
    G. Benson. An Introduction to Computational Biology. Lecture Notes, 2001, pp 19-28.


Week 11: Approximate Pattern Matching
    Tandem Alignment - Wraparound Dynamic Programming
    Alignment of minisatellite maps
Readings:
    W. Miller and E. Myers. Approximate matching of regular expressions. Bulletin of Mathematical Biology, 51:5-37, 1989.
    G. Benson. An Introduction to Computational Biology. Lecture Notes, 2001, pp 28-35.
    Berard, Nicolas, Buard, Gascuel, Rivals. A fast and specific alignment method for minisatellite maps.


Week 12: Approximate Pattern Matching
    Heuristic Matching: BLAT
Readings:
    W.J. Kent. BLAT - the BLAST-like alignment tool. Genome Research. 12:656-664, 2002.


Week 13: Approximate Pattern Matching
    Seeds for heuristic matching
    Spaced seeds
    Indel seeds
Readings:
    B. Ma, J. Tromp, and M. Li. PatternHunter: faster and more sensitive homology search. Bioinformatics, 18:440-445, 2002.
    D. Mak, G. Benson, All Hit All the Time, Parameter Free Calculation of Seed Sensitivity, Proceedings of the Fifth Asia-Pacific Bioinformatics Conference (APBC 2007), 2007.
    D. Mak, Y. Gelfand, and G. Benson, Indel Seeds for Homology Search, Proceedings of the 14th Annual International Conference on Intelligent Systems for Molecular Biology (ISMB 2006), Bioinformatics, 22(14):e341-e349, 2006.


Week 14: Pattern Detection
    Multiple Short Word Methods
    Tandem Repeats Finder
    Inverted Repeats Finder
Readings:
    G. Benson. Tandem repeats finder: a program to analyze DNA sequences. Nucleic Acids Research, 27:573-580, 1999.


Week 15: Pattern Detection
    Pattern Enumeration Methods
Readings:
    Jonassen, J. Collins and D. Higgins. Finding flexible patterns in unaligned protein sequences. Protein Science, 4:1587-1595, 1995.
    A. Neuwald and P. Green. Detecting patterns in protein sequences. Journal of Molecular Biology, 239:698-712, 1994.


Additional Readings come from the following books:
    T. Cormen, C. Leiserson and R. Rivest. Introduction to Algorithms. MIT Press, 1990.
    W. Feller. An introduction to probability theory and its applications, volume I. John Wiley & Sons, 3rd edition, 1968.