BCB567-Description - Bioinformatics and Computational Biology


1 Oct 2013

BCB 567. Bioinformatics I (Fundamentals of Genome Informatics).

(Cross-listed with COM S, CPR E.) (3-0) Cr. 3. F.
Prereq: Com S 208; Com S 330; Stat 341; credit or enrollment in Biol 315, Stat 430.

Biology as an information science. Review of algorithms and information processing.
Generative models for sequences. String algorithms. Pairwise sequence alignment.
Multiple sequence alignment. Searching sequence databases. Genome sequence assembly.


Study methods for designing efficient algorithms and data structures for problems in
Computational Biology. Analyzing the performance of algorithms for various tasks in
Computational Biology and learning to estimate their intrinsic resource requirements.
Study practical intractability in Computational Biology and approaches for dealing with
it. Study models for Computational Biology.


ComS 208, ComS 330, Stat 341; Credit or enrollment in Biol 315; Stat 430


(1) Jones and Pevzner, An Introduction to Bioinformatics Algorithms; MIT Press 2004

(2) Gusfield, Algorithms on Strings, Trees, and Sequences; Cambridge University Press 1999 (reprint with corrections)

(3) Cormen, Leiserson, Rivest, and Stein, Introduction to Algorithms; MIT Press

(4) Kleinberg and Tardos, Algorithm Design; Addison-Wesley 2005


Instructor Description

This is a course on computational techniques for reconstruction and alignment of
genome sequences. These techniques help students solve computational problems in
genome sequencing and analysis. The course starts with a review of concepts in
algorithm design and analysis. It then covers pairwise sequence alignment, models
of sequence evolution, fast string matching, sequence database searching, multiple
sequence alignment, and genome sequence assembly.

Course Objectives

Upon completion of this course the student should be able to apply computational
techniques to solve problems in genome sequencing and analysis.

Half sheet synopsis of BCB 567

summary of topics

Fall 2008

1. Review of concepts in algorithm design

An instance of a problem, the size of an input instance, the algorithm,

the time and space requirements of an algorithm as functions of input size,

the big O notation, an example of designing and analyzing an algorithm.

2. Pairwise sequence alignment

2.1 Motivation

Alignments of DNA and protein sequences are useful in studying

the evolutionary history of the sequences and finding functional

elements in the sequences, e.g., reconstruction of phylogenetic trees

and finding sequence regions under strong selection.

2.2 A global alignment model

Alignment configuration: matches, mismatches, deletion and insertion gaps,

substitution scoring table, affine gap scoring

function, the score of an alignment,

an optimal alignment.
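
As an illustration of the scoring model above, the following is a minimal sketch of how the score of a given alignment could be computed under affine gap scoring. The function name and the numeric scores (match, mismatch, gap_open, gap_extend) are assumptions chosen for the example, and a simple match/mismatch rule stands in for a full substitution scoring table.

```python
def alignment_score(row_a, row_b, match=1, mismatch=-1, gap_open=-2, gap_extend=-1):
    """Score an alignment given as two equal-length rows ('-' marks a gap).

    Assumed affine model: a gap of length k costs gap_open + k * gap_extend.
    """
    if len(row_a) != len(row_b):
        raise ValueError("alignment rows must have equal length")
    score = 0
    prev_gap = None  # which row held a gap in the previous column, if any
    for x, y in zip(row_a, row_b):
        if x == "-" and y == "-":
            raise ValueError("a column cannot consist of two gaps")
        if x == "-" or y == "-":
            gap_row = "a" if x == "-" else "b"
            if gap_row != prev_gap:
                score += gap_open  # a new gap starts in this column
            score += gap_extend
            prev_gap = gap_row
        else:
            score += match if x == y else mismatch
            prev_gap = None
    return score
```

An optimal alignment is then one whose rows maximize this score over all ways of inserting gaps into the two sequences.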

2.3 A dynamic programming algorithm

The major steps of applying dynamic programming as an algorithm design technique.

Developing an algorithm for computing an optimal global alignment by applying

dynamic programming to the problem. Obtaining the time and space requirements of

the algorithm.
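
A minimal sketch of the dynamic programming approach is given below. To keep it short it assumes a linear gap penalty (each gap column costs the same) instead of the affine function of 2.2, and the function name and score values are illustrative assumptions. It uses O(nm) time and O(nm) space for sequences of lengths n and m.

```python
def global_align(a, b, match=1, mismatch=-1, gap=-2):
    """Return (score, row_a, row_b) of one optimal global alignment."""
    n, m = len(a), len(b)
    # S[i][j] = best score of aligning a[:i] with b[:j]
    S = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        S[i][0] = i * gap
    for j in range(1, m + 1):
        S[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            S[i][j] = max(S[i - 1][j - 1] + sub, S[i - 1][j] + gap, S[i][j - 1] + gap)
    # Trace back from the bottom-right cell to recover one optimal alignment.
    i, j = n, m
    row_a, row_b = [], []
    while i > 0 or j > 0:
        sub = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and S[i][j] == S[i - 1][j - 1] + sub:
            row_a.append(a[i - 1]); row_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and S[i][j] == S[i - 1][j] + gap:
            row_a.append(a[i - 1]); row_b.append("-"); i -= 1
        else:
            row_a.append("-"); row_b.append(b[j - 1]); j -= 1
    return S[n][m], "".join(reversed(row_a)), "".join(reversed(row_b))
```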

2.4 A linear space algorithm

The high space requirement of the standard algorithm on long sequences.

Obtaining the necessary and sufficient conditions for finding a middle

pair of positions on an optimal global alignment.

Developing a recursive algorithm based on finding a middle pair of positions

on an optimal global alignment.

Obtaining the time and space requirements of the algorithm.
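
The divide-and-conquer idea can be sketched as follows, in the style of Hirschberg's algorithm. This is a sketch under assumptions: a linear gap penalty is used for brevity, and the names last_row and linear_space_align are hypothetical. The middle pair of positions is found by combining a forward pass over the first half of one sequence with a backward pass over the second half; only two rows of the matrix are kept at any time.

```python
def last_row(a, b, match=1, mismatch=-1, gap=-2):
    """Scores of aligning all of a with every prefix of b, keeping two rows only."""
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        cur = [i * gap] + [0] * len(b)
        for j in range(1, len(b) + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            cur[j] = max(prev[j - 1] + sub, prev[j] + gap, cur[j - 1] + gap)
        prev = cur
    return prev


def linear_space_align(a, b, match=1, mismatch=-1, gap=-2):
    """Return the two rows of one optimal global alignment in linear space."""
    if not a:
        return "-" * len(b), b
    if not b:
        return a, "-" * len(a)
    if len(a) == 1:
        # Base case: pair the single character with a matching position if any.
        k = max(b.find(a), 0)
        sub = match if b[k] == a else mismatch
        if sub >= 2 * gap:
            return "-" * k + a + "-" * (len(b) - k - 1), b
        return a + "-" * len(b), "-" + b
    mid = len(a) // 2
    left = last_row(a[:mid], b, match, mismatch, gap)
    right = last_row(a[mid:][::-1], b[::-1], match, mismatch, gap)
    # Middle pair of positions: the split of b that maximizes the combined score.
    split = max(range(len(b) + 1), key=lambda j: left[j] + right[len(b) - j])
    top = linear_space_align(a[:mid], b[:split], match, mismatch, gap)
    bottom = linear_space_align(a[mid:], b[split:], match, mismatch, gap)
    return top[0] + bottom[0], top[1] + bottom[1]
```

The running time stays within a constant factor of the standard algorithm, because the areas of the sub-problems halve at each level of the recursion.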

2.5 A banded alignment algorithm

The high time requirement of the standard algorithm on long sequences.

Developing an efficient algorithm by restricting the standard algorithm

or the linear space algorithm to a small area of the matrix.
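
One way to restrict the computation to a small area of the matrix is to fill only the cells within a fixed distance of the main diagonal. The sketch below assumes such a diagonal band of half-width band, a linear gap penalty, and a hypothetical function name; it computes only the score and visits O(n * band) cells.

```python
def banded_global_score(a, b, band, match=1, mismatch=-1, gap=-2):
    """Best global alignment score using only cells (i, j) with |i - j| <= band."""
    n, m = len(a), len(b)
    if abs(n - m) > band:
        raise ValueError("band is too narrow to reach the final cell")
    neg_inf = float("-inf")  # stands for cells outside the band
    prev = {j: j * gap for j in range(min(m, band) + 1)}
    for i in range(1, n + 1):
        cur = {}
        for j in range(max(0, i - band), min(m, i + band) + 1):
            if j == 0:
                cur[j] = i * gap
                continue
            sub = match if a[i - 1] == b[j - 1] else mismatch
            cur[j] = max(
                prev.get(j - 1, neg_inf) + sub,  # diagonal move
                prev.get(j, neg_inf) + gap,      # gap in b
                cur.get(j - 1, neg_inf) + gap,   # gap in a
            )
        prev = cur
    return prev[m]
```

The result equals the unrestricted optimum only when some optimal alignment stays inside the band.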

2.6 A local alignment model

Limitation of the global alignment algorithm on sequences that are not entirely

similar but contain local regions that are similar.

Definition of a local alignment.

Developing a dynamic programming algorithm for computing an optimal local

alignment between two sequences.

Developing a linear-space algorithm for computing an optimal local alignment.
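
A minimal sketch of the quadratic-space local alignment algorithm follows, in the style of Smith and Waterman. The function name, the linear gap penalty, and the score values are assumptions for illustration. Compared with the global algorithm, each cell is floored at zero and the traceback starts at the highest-scoring cell and stops at the first zero.

```python
def local_align(a, b, match=2, mismatch=-1, gap=-2):
    """Return (score, row_a, row_b) of one optimal local alignment."""
    n, m = len(a), len(b)
    H = [[0] * (m + 1) for _ in range(n + 1)]
    best, best_i, best_j = 0, 0, 0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            H[i][j] = max(0, H[i - 1][j - 1] + sub, H[i - 1][j] + gap, H[i][j - 1] + gap)
            if H[i][j] > best:
                best, best_i, best_j = H[i][j], i, j
    # Trace back from the best cell until a zero-score cell is reached.
    i, j = best_i, best_j
    row_a, row_b = [], []
    while i > 0 and j > 0 and H[i][j] > 0:
        sub = match if a[i - 1] == b[j - 1] else mismatch
        if H[i][j] == H[i - 1][j - 1] + sub:
            row_a.append(a[i - 1]); row_b.append(b[j - 1]); i -= 1; j -= 1
        elif H[i][j] == H[i - 1][j] + gap:
            row_a.append(a[i - 1]); row_b.append("-"); i -= 1
        else:
            row_a.append("-"); row_b.append(b[j - 1]); j -= 1
    return best, "".join(reversed(row_a)), "".join(reversed(row_b))
```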

2.7 A generalized alignment algorithm

Limitations of the global and local alignment algorithms on sequences with

similar regions (exons) separated by different regions (introns).

Introducing a new type of alignment configurations called difference blocks

for dealing with different regions.

Developing an algorithm for computing an optimal alignment that consists of

similarity blocks separated by difference blocks.

3. String matching

3.1 Finding exact string matches between sequences

A lookup table for finding exact matches of words of length w and its extension for

finding exact matches of strings whose lengths are multiples of w.

Alternatively, suffix trees and suffix arrays for finding exact matches of strings of any length.
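
The lookup-table idea can be sketched as below: every word of length w in one sequence is stored with its start positions, and the words of the other sequence are then looked up. The function names are hypothetical, and a Python dictionary stands in for whatever table layout the course uses.

```python
from collections import defaultdict


def build_word_table(seq, w):
    """Map each word of length w in seq to the list of its start positions."""
    table = defaultdict(list)
    for i in range(len(seq) - w + 1):
        table[seq[i:i + w]].append(i)
    return table


def exact_word_matches(query, target, w):
    """Return (query_pos, target_pos) pairs where words of length w are identical."""
    table = build_word_table(target, w)
    return [
        (i, j)
        for i in range(len(query) - w + 1)
        for j in table.get(query[i:i + w], [])
    ]
```

For example, `exact_word_matches("ACGT", "TTACGTAC", 3)` reports the words ACG and CGT at target positions 2 and 3.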

3.2 Finding approximate string matches between sequences

A word model of 1's and 0's with 1 indicating a match and 0 for "don't care".

Use of a lookup table for finding approximate word matches under a word model.
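
A word model can be applied by keeping only the characters at the 1 positions of the model before the table lookup, so that positions marked 0 may differ. The sketch below assumes the model is given as a string such as "1101"; the function names are hypothetical.

```python
from collections import defaultdict


def masked_word(seq, start, model):
    """Characters of seq under the 1 positions of the model, starting at start."""
    return "".join(seq[start + k] for k, bit in enumerate(model) if bit == "1")


def approximate_word_matches(query, target, model):
    """Return (query_pos, target_pos) pairs that agree at every 1 position."""
    span = len(model)
    table = defaultdict(list)
    for j in range(len(target) - span + 1):
        table[masked_word(target, j, model)].append(j)
    return [
        (i, j)
        for i in range(len(query) - span + 1)
        for j in table.get(masked_word(query, i, model), [])
    ]
```

With the model "1101", the words ACGT and ACTT match because they differ only at the "don't care" position.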

4. Fast sequence comparison and database search methods

4.1 Limitations of alignment algorithms on whole genome sequences

4.2 Computing high-scoring segment pairs (HSPs) based on finding string matches.

4.3 Dynamic programming algorithm for computing high-scoring chains of HSPs.
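
The chaining step of 4.3 can be sketched with a simple quadratic dynamic program. The representation of an HSP as a tuple (q_start, t_start, length, score) and the function name are assumptions, and no penalty is charged for the space between consecutive HSPs, which a practical method would likely add.

```python
def best_chain(hsps):
    """Return (score, chain) for a highest-scoring chain of non-overlapping HSPs.

    Each HSP is (q_start, t_start, length, score). One HSP may follow another
    only if it starts at or after the other's end in both sequences.
    """
    if not hsps:
        return 0, []
    order = sorted(range(len(hsps)), key=lambda k: (hsps[k][0], hsps[k][1]))
    best, back = {}, {}
    for idx, k in enumerate(order):
        q, t, _, score = hsps[k]
        best[k], back[k] = score, None
        for p in order[:idx]:
            pq, pt, plen, _ = hsps[p]
            if pq + plen <= q and pt + plen <= t and best[p] + score > best[k]:
                best[k], back[k] = best[p] + score, p
    # Follow back-pointers from the best chain end.
    end = max(best, key=best.get)
    chain, k = [], end
    while k is not None:
        chain.append(hsps[k])
        k = back[k]
    return best[end], chain[::-1]
```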

5. Construction of substitution matrices

5.1 Construction of PAM matrices based on an evolutionary model

5.2 Construction of BLOSUM matrices based on sequence similarity
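
The core of a BLOSUM-style construction is a log-odds score comparing how often two residues are observed aligned with how often they would pair by chance. The sketch below is a simplified illustration under assumptions: the input is a list of ungapped alignment columns, the clustering of similar sequences used in the real BLOSUM procedure is omitted, and the function name and scale factor are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations


def log_odds_scores(columns, scale=2.0):
    """Return {(x, y): score} with x <= y, from ungapped alignment columns."""
    pair_counts = Counter()
    for col in columns:
        for x, y in combinations(col, 2):
            pair_counts[tuple(sorted((x, y)))] += 1
    total = sum(pair_counts.values())
    observed = {pair: c / total for pair, c in pair_counts.items()}
    # Background frequency of each residue, derived from the pair frequencies.
    background = Counter()
    for (x, y), f in observed.items():
        if x == y:
            background[x] += f
        else:
            background[x] += f / 2
            background[y] += f / 2
    scores = {}
    for (x, y), f in observed.items():
        expected = background[x] ** 2 if x == y else 2 * background[x] * background[y]
        scores[(x, y)] = round(scale * math.log2(f / expected))
    return scores
```

Pairs seen more often than expected receive positive scores, and pairs seen less often receive negative scores.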

6. Reconstruction of phylogenetic trees

6.1 Computation of evolutionary distances between sequences.

6.2 A distance method for building a phylogenetic tree.
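
The outline does not name specific methods, so the sketch below picks two common ones as assumptions: the Jukes-Cantor correction for the distance between two aligned DNA sequences, and UPGMA clustering as the distance method. The function names and the nested-tuple tree representation are hypothetical.

```python
import math
from itertools import combinations


def jukes_cantor_distance(a, b):
    """Estimated substitutions per site between two aligned, ungapped sequences."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be non-empty and of equal length")
    p = sum(x != y for x, y in zip(a, b)) / len(a)
    if p >= 0.75:
        return float("inf")  # the correction is undefined at saturation
    return -0.75 * math.log(1 - 4 * p / 3)


def upgma(names, dist):
    """Build a tree as nested tuples; dist maps (name1, name2) to a distance."""
    size = {name: 1 for name in names}
    d = {frozenset(k): v for k, v in dist.items()}
    while len(size) > 1:
        # Merge the closest pair of clusters.
        x, y = min(combinations(size, 2), key=lambda p: d[frozenset(p)])
        merged = (x, y)
        for z in size:
            if z not in (x, y):
                # Size-weighted average distance to the new cluster.
                d[frozenset((merged, z))] = (
                    size[x] * d[frozenset((x, z))] + size[y] * d[frozenset((y, z))]
                ) / (size[x] + size[y])
        size[merged] = size.pop(x) + size.pop(y)
    return next(iter(size))
```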

7. Multiple sequence alignment

7.1 A sum-of-pairs scoring scheme for a multiple sequence alignment.

7.2 Limitation of a dynamic programming algorithm for building a multiple sequence alignment.

7.3 A progressive alignment method for building a multiple alignment of sequences
based on a phylogenetic tree of the sequences.
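
The scoring scheme of 7.1 can be sketched as follows: the score of a multiple alignment is the sum, over all columns, of the scores of every pair of rows in that column. The function name and score values are assumptions, each gap column is charged a flat penalty, and a pair of two gaps is assumed to score zero.

```python
from itertools import combinations


def sum_of_pairs_score(rows, match=1, mismatch=-1, gap=-2):
    """Sum-of-pairs score of a multiple alignment given as equal-length rows."""
    if len({len(r) for r in rows}) > 1:
        raise ValueError("all rows must have the same length")
    score = 0
    for column in zip(*rows):
        for x, y in combinations(column, 2):
            if x == "-" and y == "-":
                continue  # gap against gap contributes nothing (assumption)
            if x == "-" or y == "-":
                score += gap
            else:
                score += match if x == y else mismatch
    return score
```

With k rows, each column contributes k(k-1)/2 pairs, so scoring is cheap; the limitation noted in 7.2 lies in searching over all alignments, since exact dynamic programming needs a table whose size grows exponentially with k.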

8. Genome assembly

8.1 Terms in genome assembly

Base sequences, quality values, pairs of reads from the ends of DNA segments,

overlaps, the layouts and consensus sequences of contigs, and scaffolds.

8.2 Algorithm for quickly computing overlaps between sequences

8.3 Algorithm for building the layouts of contigs

8.4 Algorithm for building scaffolds of contigs

8.5 Algorithm for generating the consensus sequences of contigs.
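
Two of the steps above can be illustrated with a small sketch: finding the overlap between the end of one read and the start of another, and deriving a consensus sequence from a layout by majority vote. The sketch makes several assumptions: overlaps are exact (no sequencing errors), quality values are ignored, a layout is a list of (offset, read) pairs, and the function names are hypothetical. It uses a direct quadratic check, not the fast overlap algorithm of 8.2.

```python
from collections import Counter


def suffix_prefix_overlap(a, b, min_len=3):
    """Length of the longest suffix of a that equals a prefix of b, or 0."""
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def consensus(layout):
    """Majority-vote consensus of reads placed at the given offsets in a contig."""
    width = max(offset + len(read) for offset, read in layout)
    columns = [Counter() for _ in range(width)]
    for offset, read in layout:
        for k, base in enumerate(read):
            columns[offset + k][base] += 1
    # Uncovered columns are reported as N.
    return "".join(c.most_common(1)[0][0] if c else "N" for c in columns)
```

For example, the reads ACGTAC and TACGG overlap by three bases, so the second read is placed at offset 3 in the layout.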
