BCB567-Description - Bioinformatics and Computational Biology

vivaciousefficientΒιοτεχνολογία

1 Οκτ 2013 (πριν από 3 χρόνια και 8 μήνες)

95 εμφανίσεις

BCB 567. Bioinformatics I (Fundamentals of Genome Informatics).

(Cross
-
listed
with COM S, CPR E.) (3
-
0) Cr. 3. F.
Prereq: Com S 208; Com S 330; Stat 341; credit or
enrollment in Biol 315, Stat 430.

Biology as an information science. Review of
algorithms an
d information processing. Generative models for sequences. String
algorithms. Pairwise sequence alignment. Multiple sequence alignment. Searching
sequence databases. Genome sequence assembly.


Goals:

Study methods for designing efficient algorithms and dat
a structures for problems in
Computational Biology. Analyzing the performance of algorithms for various tasks in
Computational Biology and learning to estimate their intrinsic resource requirements.
Study practical intractability in Computational Biology a
nd approaches for dealing with
it. Study models for Computational Biology.


Prerequisites:

ComS 208, ComS 330, Stat 341; Credit or enrollment in Biol 315; Stat 430


Textbook:

(1) Jones and Pevzner


An Introduction to Bioinformatics Algorithms; MIT Press 2
004
(required)

(2) Gusfield, Algorithms on Strings, Trees, and Sequences, Cambridge University Press
1999 (reprint with corrections)

(3) Cormen, Leiserson, Rivest, and Stein


Introduction to
Algorithms;

MIT Press

2009

(3
rd

edition)


(4)

Kleinberg
and Tar
dos


Algorithms Design; Addison
--
‐Wesley 2005


Another

Instructor Description


This is a course on computational techniques for reconstruction and alignment of

genome sequences. Those techniques are very useful for students to solve computational
problems in genome sequencing and analysis
. The course starts with review of concepts
in algorithm design and analysis. This course covers pairwise sequence alignment, model
of sequence evolution, fast string matching, sequence database searching, multiple
sequence alignment, and genome sequence a
ssembly.


Course Objectives

Upon completion of this course the student should be able to apply computational
techniques to solve problems in genome sequencing and analysis.


Half sheet synopsis of
BCB 567


summary of topics


Fall 2008

1. Review of conce
pts in algorithm design



An instance of a problem, the size of an input instance, the algorithm,


the time and space requirements of an algorithm as functions of input size,


the big O notation, an example of designing and analyzing an algorit
hm.


2. Pair sequence alignment



2.1 Motivation



Alignments of DNA and protein sequences are useful in studying


the evolutionary history of the sequences and finding functional


elements in the sequences, e.g., reconstruction of p
hylogenetic trees


and finding sequence regions under strong selection.



2.2 A global alignment model




Alignment configuration: matches, mismatches, deletion and insertion gaps,


substitution scoring table, affine gap scoring

function, the score of an alignment,


an optimal alignment.




2.3 A dynamic programming algorithm




the major steps of applying dynamic programming as an algorithm design technique,


developing an algorithm for computing
an optimal global alignment by applying


dynamic programming to the problem. Obtaining the time and space requirements of


the algorithm.



2.4 A linear space algorithm



The high space requirement of the standard algorithm on long se
quences.


Obtaining the necessary and sufficient conditions for finding a middle


pair of positions on an optimal global alignment.


Developing a recursive algorithm based on finding a middle pair of positions


on an optimal global
alignment.


Obtaining the time and space requirements of the algorithm.



2.5 A banded alignment algorithm



The high time requirement of the standard algorithm on long sequences.


Developing an efficient algorithm by restricting the
standard algorithm


or the linear space algorithm to a small area of the matrix.



2.6 A local alignment model



Limitation of the global alignment algorithm on sequences that are not entirely
similar


but contain local regions that a
re similar.


Definition of a local alignment.


Developing a dynamic programming algorithm for computing an optimal local
alignment


between two sequences.


Developing a linear
-
space algorithm for computing an optimal local alignment
.



2.7 A generalized alignment algorithm



Limitations of the global and local alignment algorithms on sequences with


similar regions (exons) separated by different regions (introns).


Introducing a new type of alignment configurati
ons called difference blocks


for dealing with different regions.


Developing an algorithm for computing an optimal alignment that consists of


similarity blocks separated by difference blocks.


3. String matching



3.1 Finding exact s
tring matches between sequences



A lookup table for finding exact matches of words of length w and its extension for


finding exact matches of strings of lengths in a multiple of w.


Or suffix trees and arrays for finding exact matches of s
trings of any length.



3.2 Finding approximate string matches between sequences



A word model of 1's and 0's with 1 indicating a match and 0 for "don't care".


Use of a lookup table for finding approximate word matches under a word model.


4
. Fast sequence comparison and database search methods



4.1 Limitations of alignment algorithms on whole genome sequences



4.2 Computing high
-
scoring segment pairs (HSPs) based on finding string matches.



4.3 Dynamic programming algorithm for c
omputing high
-
scoring chains of HSPs.


5. Construction of substitution matrices




5.1 Construction of PAM matrices based on an evolutionary model



5.2 Construction of Blosum matrices based on sequence similarity


6. Reconstruction of phylogeneti
c trees



6.1 Computation of evolutionary distances between sequences.



6.2 A distance method for building a phylogenetic tree.


7. Multiple sequence alignment



7.1 A sum
-
of
-
pair scoring scheme for a multiple sequence alignment.


7.2 Limitati
on of a dynamic programming algorithm for building a multiple sequence
alignment.


7.3 A progressive alignment method for building a multiple alignment of sequences
based on


a phylogenetic tree of the sequences.


8. Genome assembly



8.1 Term
s in genome assembly




Base sequences, quality values, pairs of reads from the ends of DNA segments,


overlaps, the layouts and consensus sequences of contigs, and scaffolds.



8.2 Algorithm for quickly computing overlaps between sequences



8.3 Algorithm for building the layouts of contigs



8.4 Algorithm for building scaffolds of contigs



8.5 Algorithm for generating the consensus sequences of contigs.


BCB Prog
ram/Orientation/2011/BCB567
-
Description.doc