BCB 567. Bioinformatics I (Fundamentals of Genome Informatics).
(Cross

listed
with COM S, CPR E.) (3

0) Cr. 3. F.
Prereq: Com S 208; Com S 330; Stat 341; credit or
enrollment in Biol 315, Stat 430.
Biology as an information science. Review of
algorithms an
d information processing. Generative models for sequences. String
algorithms. Pairwise sequence alignment. Multiple sequence alignment. Searching
sequence databases. Genome sequence assembly.
Goals:
Study methods for designing efficient algorithms and dat
a structures for problems in
Computational Biology. Analyzing the performance of algorithms for various tasks in
Computational Biology and learning to estimate their intrinsic resource requirements.
Study practical intractability in Computational Biology a
nd approaches for dealing with
it. Study models for Computational Biology.
Prerequisites:
ComS 208, ComS 330, Stat 341; Credit or enrollment in Biol 315; Stat 430
Textbook:
(1) Jones and Pevzner
–
An Introduction to Bioinformatics Algorithms; MIT Press 2
004
(required)
(2) Gusfield, Algorithms on Strings, Trees, and Sequences, Cambridge University Press
1999 (reprint with corrections)
(3) Cormen, Leiserson, Rivest, and Stein
–
Introduction to
Algorithms;
MIT Press
2009
(3
rd
edition)
(4)
Kleinberg
and Tar
dos
–
Algorithms Design; Addison

‐Wesley 2005
Another
Instructor Description
This is a course on computational techniques for reconstruction and alignment of
genome sequences. Those techniques are very useful for students to solve computational
problems in genome sequencing and analysis
. The course starts with review of concepts
in algorithm design and analysis. This course covers pairwise sequence alignment, model
of sequence evolution, fast string matching, sequence database searching, multiple
sequence alignment, and genome sequence a
ssembly.
Course Objectives
Upon completion of this course the student should be able to apply computational
techniques to solve problems in genome sequencing and analysis.
Half sheet synopsis of
BCB 567
–
summary of topics
–
Fall 2008
1. Review of conce
pts in algorithm design
An instance of a problem, the size of an input instance, the algorithm,
the time and space requirements of an algorithm as functions of input size,
the big O notation, an example of designing and analyzing an algorit
hm.
2. Pair sequence alignment
2.1 Motivation
Alignments of DNA and protein sequences are useful in studying
the evolutionary history of the sequences and finding functional
elements in the sequences, e.g., reconstruction of p
hylogenetic trees
and finding sequence regions under strong selection.
2.2 A global alignment model
Alignment configuration: matches, mismatches, deletion and insertion gaps,
substitution scoring table, affine gap scoring
function, the score of an alignment,
an optimal alignment.
2.3 A dynamic programming algorithm
the major steps of applying dynamic programming as an algorithm design technique,
developing an algorithm for computing
an optimal global alignment by applying
dynamic programming to the problem. Obtaining the time and space requirements of
the algorithm.
2.4 A linear space algorithm
The high space requirement of the standard algorithm on long se
quences.
Obtaining the necessary and sufficient conditions for finding a middle
pair of positions on an optimal global alignment.
Developing a recursive algorithm based on finding a middle pair of positions
on an optimal global
alignment.
Obtaining the time and space requirements of the algorithm.
2.5 A banded alignment algorithm
The high time requirement of the standard algorithm on long sequences.
Developing an efficient algorithm by restricting the
standard algorithm
or the linear space algorithm to a small area of the matrix.
2.6 A local alignment model
Limitation of the global alignment algorithm on sequences that are not entirely
similar
but contain local regions that a
re similar.
Definition of a local alignment.
Developing a dynamic programming algorithm for computing an optimal local
alignment
between two sequences.
Developing a linear

space algorithm for computing an optimal local alignment
.
2.7 A generalized alignment algorithm
Limitations of the global and local alignment algorithms on sequences with
similar regions (exons) separated by different regions (introns).
Introducing a new type of alignment configurati
ons called difference blocks
for dealing with different regions.
Developing an algorithm for computing an optimal alignment that consists of
similarity blocks separated by difference blocks.
3. String matching
3.1 Finding exact s
tring matches between sequences
A lookup table for finding exact matches of words of length w and its extension for
finding exact matches of strings of lengths in a multiple of w.
Or suffix trees and arrays for finding exact matches of s
trings of any length.
3.2 Finding approximate string matches between sequences
A word model of 1's and 0's with 1 indicating a match and 0 for "don't care".
Use of a lookup table for finding approximate word matches under a word model.
4
. Fast sequence comparison and database search methods
4.1 Limitations of alignment algorithms on whole genome sequences
4.2 Computing high

scoring segment pairs (HSPs) based on finding string matches.
4.3 Dynamic programming algorithm for c
omputing high

scoring chains of HSPs.
5. Construction of substitution matrices
5.1 Construction of PAM matrices based on an evolutionary model
5.2 Construction of Blosum matrices based on sequence similarity
6. Reconstruction of phylogeneti
c trees
6.1 Computation of evolutionary distances between sequences.
6.2 A distance method for building a phylogenetic tree.
7. Multiple sequence alignment
7.1 A sum

of

pair scoring scheme for a multiple sequence alignment.
7.2 Limitati
on of a dynamic programming algorithm for building a multiple sequence
alignment.
7.3 A progressive alignment method for building a multiple alignment of sequences
based on
a phylogenetic tree of the sequences.
8. Genome assembly
8.1 Term
s in genome assembly
Base sequences, quality values, pairs of reads from the ends of DNA segments,
overlaps, the layouts and consensus sequences of contigs, and scaffolds.
8.2 Algorithm for quickly computing overlaps between sequences
8.3 Algorithm for building the layouts of contigs
8.4 Algorithm for building scaffolds of contigs
8.5 Algorithm for generating the consensus sequences of contigs.
BCB Prog
ram/Orientation/2011/BCB567

Description.doc
Comments 0
Log in to post a comment