Annotated Bibliography




Carmen Nigro

October 9, 2009

Bioinformatics: Sequence Alignment

This research examines different algorithms for determining relationships between
sequences of amino acids or nucleotides from DNA, RNA, or proteins.


Sequence alignment can help scientists
hypothesize the function of a particular
sequence of DNA or protein. Similarities in
sequences can imply similarities in function
and structure.


D.J. Lipman, S.F. Altschul, and J.D. Kececioglu, “A Tool for Multiple Sequence
Alignment”, Proc. Natl. Acad. Sci. USA, Vol. 86, pp. 4412-4415, June 1989.

This article offers an alternative to standard dynamic programming for multiple sequence
alignment. Until recently, dynamic programming has been impractical for multiple
sequence alignment, as too much computing time would be required to compare more
than three sequences. The article suggests a program that implements the Carrillo-Lipman
algorithm, called Multiple Sequence Alignment (MSA). This program allows for
the comparison of up to six sequences. Dynamic programming involves breaking a larger
problem down into smaller, more manageable pieces. The basic dynamic programming
approach for sequence alignment finds an optimal path through a rectangular path graph.
It accomplishes this by turning one sequence into another through a series of edits. Each
edit to the sequence is associated with a particular cost, and the goal is to find the edits
that produce the lowest total cost. The algorithm employed by MSA reduces the number of
cell comparisons by calculating an upper bound for the cost of the projection of an
optimal multiple alignment onto each pair of sequences.
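
The edit-cost idea in the annotation above can be illustrated for the simplest case of
two sequences. The following is a minimal sketch only; it is not the MSA program or the
Carrillo-Lipman algorithm. The function name `edit_cost` and the unit costs for
substitutions and gaps are assumptions chosen for illustration.

```python
def edit_cost(a, b, sub_cost=1, gap_cost=1):
    """Lowest total cost of the edits that turn sequence a into sequence b.

    Illustrative costs (assumed, not taken from the article): a matching
    residue costs 0, a substitution costs sub_cost, and an insertion or
    deletion costs gap_cost.
    """
    rows, cols = len(a) + 1, len(b) + 1
    # cost[i][j] is the cheapest way to turn a[:i] into b[:j].
    cost = [[0] * cols for _ in range(rows)]
    for i in range(1, rows):
        cost[i][0] = i * gap_cost  # delete every residue of a[:i]
    for j in range(1, cols):
        cost[0][j] = j * gap_cost  # insert every residue of b[:j]
    for i in range(1, rows):
        for j in range(1, cols):
            substitute = cost[i - 1][j - 1] + (0 if a[i - 1] == b[j - 1] else sub_cost)
            delete = cost[i - 1][j] + gap_cost
            insert = cost[i][j - 1] + gap_cost
            cost[i][j] = min(substitute, delete, insert)
    # The bottom-right cell holds the cost of the optimal path through the grid.
    return cost[-1][-1]


if __name__ == "__main__":
    print(edit_cost("GATTACA", "GATCACA"))
```

Each cell of the rectangular grid is filled exactly once, so the work grows with the
product of the two sequence lengths. Extending the same grid to more sequences adds a
dimension per sequence, which is why pruning cells, as MSA does, matters.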

R. Chenna, H. Sugawara, T. Koike, R. Lopez, T.J. Gibson, D.G. Higgins, and J.D.,
“Multiple sequence alignment with the Clustal series of programs”,
Nucleic Acids Research, Vol. 31, pp. 00, 2003.

The Clustal series comprises the most widely used programs for sequence
alignment. This article describes the Clustal series’ main features. The programs would
be very suitable for conducting a student research project, because they were designed to
be a portable set of programs that provide accurate alignments in a reasonable response
time. Many web servers have been set up to run the program as a service; however, the
size of the input must not be too large. This limit can be overcome by simply downloading
the free software and running it locally. The article runs through a brief history of the
Clustal series and its different implementations, but does not go very deep into detail.
The paper touches upon the generation of trees from the multiple alignments through the
Neighbor-Joining method. The paper also touches upon sequence weighting, position-specific
gap penalties, and the automatic choice of a suitable residue comparison matrix at each
step in the multiple alignment. It may be useful to look further into these features.

J.D. Thompson, F. Plewniak, and O. Poch, “A comprehensive comparison of multiple
sequence alignment programs”, Oxford Journals: Nucleic Acids Research, Vol. 27, pp.
2690, 1999.

This article compares the most widely used programs for multiple sequence alignment.
The paper compares ten different alignment programs using BAliBASE, a database of
verified alignments, as a reference. The results show that iterative implementations often
offer improved accuracy, but at the cost of more computing time. The paper also
compares global and local methods. Global alignments attempt to compare every residue
of every sequence and are best employed when the sequences are similar and are of the
same size. Local alignments are best employed for dissimilar sequences that may have
similar regions. The success of an alignment strategy greatly depends on the
sequences to be aligned. The repartition of sequences, the sequence length, and the
presence of N/C-terminal extensions may affect the results of any program, and none of
the programs studied in this experiment performed well for all three variables. The paper
does not analyze computer time and space requirements when comparing the programs.
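
The difference between global and local alignment noted above can be seen in how the
score grid is filled. The sketch below is illustrative only and is not taken from any of
the ten programs in the paper. It assumes a simple scoring scheme (match +1, mismatch -1,
gap -1), and the function name `alignment_score` is hypothetical. The global variant
follows the Needleman-Wunsch recurrence and the local variant follows Smith-Waterman.

```python
def alignment_score(a, b, match=1, mismatch=-1, gap=-1, local=False):
    """Best alignment score of a and b under an assumed linear gap penalty.

    local=False: global score, every residue of both sequences is aligned.
    local=True:  local score, only the best-matching region is counted.
    """
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    if not local:
        # Global alignment pays for gaps at the ends of the sequences.
        for i in range(1, rows):
            score[i][0] = i * gap
        for j in range(1, cols):
            score[0][j] = j * gap
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            diagonal = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            cell = max(diagonal, score[i - 1][j] + gap, score[i][j - 1] + gap)
            if local:
                # A local alignment may restart anywhere, so scores never drop below 0.
                cell = max(cell, 0)
                best = max(best, cell)
            score[i][j] = cell
    return best if local else score[-1][-1]


if __name__ == "__main__":
    a, b = "CCCCGATTACA", "GATTACAGGGG"  # dissimilar overall, one shared region
    print("global:", alignment_score(a, b))
    print("local: ", alignment_score(a, b, local=True))
```

For two sequences that share only one region, the local score rewards that region alone,
while the global score is pulled down by the unmatched ends.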





F. Plewniak, F. Jeanmougin, D. G.,
“CLUSTAL_X windows interface: flexible strategies for multiple sequence alignment
aided by quality analysis tools”, Oxford Journals: Nucleic Acids Research, Vol. 25, pp.
4882, 1997.

This article describes the Clustal_X user interface and useful new features. The article
also describes the algorithms used to check alignment quality. The Clustal series builds
up a multiple alignment progressively, in a tree-like manner. It achieves this by aligning
the most closely related groups first; the other groups are then gradually aligned
together, without changing the earlier alignments. The initial tree is based on the
pairwise comparison of the sequences. Progressive alignment strategies are greatly
dependent on how the most related sequences are determined. A poor choice of strategy
for this initial tree may lead to inaccurate alignments. This approach works best when the
sequences are closely related. A mechanism has been added to the program that
highlights problem regions of an alignment and allows the user to manually realign these
residue ranges. The program also provides the user with the option of automatically
realigning low-scoring regions.
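
The order in which a progressive method joins sequences can be sketched as follows. This
is a simplified illustration, not the Clustal_X implementation: it assumes plain edit
distance as the pairwise measure and average-linkage clustering to pick the next pair of
groups, whereas the Clustal papers describe their own scoring and guide-tree
construction. The names `edit_distance` and `merge_order` are hypothetical.

```python
from itertools import combinations


def edit_distance(a, b):
    """Unit-cost edit distance, used here as an assumed pairwise measure."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j - 1] + (ca != cb), prev[j] + 1, curr[j - 1] + 1))
        prev = curr
    return prev[-1]


def merge_order(seqs):
    """Return the order in which groups of sequences would be aligned.

    seqs maps a unique name to a sequence. Each step joins the two groups
    with the smallest average pairwise distance (an assumption made for
    this sketch), so the most closely related sequences are joined first.
    """
    dist = {
        frozenset((x, y)): edit_distance(seqs[x], seqs[y])
        for x, y in combinations(seqs, 2)
    }

    def group_distance(g, h):
        return sum(dist[frozenset((x, y))] for x in g for y in h) / (len(g) * len(h))

    groups = [(name,) for name in seqs]
    order = []
    while len(groups) > 1:
        g, h = min(combinations(groups, 2), key=lambda pair: group_distance(*pair))
        order.append((g, h))
        # Earlier groups are kept intact and only ever merged into larger ones.
        groups = [k for k in groups if k not in (g, h)] + [g + h]
    return order


if __name__ == "__main__":
    example = {"a": "ACGTACGT", "b": "ACGTACGA", "c": "TTTTGGGG"}
    for step in merge_order(example):
        print(step)
```

Because every later step builds on the earlier joins, a misleading distance estimate at
the start carries through the whole alignment, which is the weakness the annotation
describes.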

I. M. Wallace, O. O'Sullivan, and D. G. Higgins, “Evaluation of Iterative Alignment
Algorithms for Multiple Alignment”, Oxford Journals: Bioinformatics, Vol. 21, pp.
1414, 2005.

This article compares different iterative algorithms for multiple alignment. The paper
analyzes the results of several tests that were run on iterative algorithms. The paper
concludes that iteration may be incorporated into many other methods of sequence
alignment to produce even better results. The algorithms implemented and tested were
remove first, best first, random, tree-based iterative, and tree-based splitting. The
remove first and best first algorithms performed the best overall. The remove first method
removes one sequence from the alignment at each step of the iteration and realigns it to
the remaining alignment. If this new alignment is better, it is used as the
input for the next iteration. The best first approach compensates for the greedy nature of
the remove first approach. At each iteration, every sequence is removed and realigned to
the rest. The alignment with the best score is used as input for the next iteration.
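
The remove first loop described above can be sketched as follows. This is a toy
illustration under stated assumptions, not the implementation evaluated in the paper: the
sum-of-pairs score with +1/-1/-1 values is assumed, and `slide_realign` is a deliberately
naive stand-in for a real sequence-to-alignment aligner (it only slides the ungapped
sequence along the existing columns). All function names are hypothetical.

```python
def sum_of_pairs(alignment, match=1, mismatch=-1, gap=-1):
    """Score an alignment (equal-length gapped rows) column by column."""
    total = 0
    for column in zip(*alignment):
        for i in range(len(column)):
            for j in range(i + 1, len(column)):
                x, y = column[i], column[j]
                if x == "-" and y == "-":
                    continue  # two gaps facing each other are not scored
                if x == "-" or y == "-":
                    total += gap
                elif x == y:
                    total += match
                else:
                    total += mismatch
    return total


def slide_realign(rest, sequence, index):
    """Naive stand-in realigner: try every ungapped placement of sequence."""
    width = len(rest[0])
    best = None
    for offset in range(width - len(sequence) + 1):
        row = "-" * offset + sequence + "-" * (width - len(sequence) - offset)
        candidate = rest[:index] + [row] + rest[index:]
        if best is None or sum_of_pairs(candidate) > sum_of_pairs(best):
            best = candidate
    return best


def remove_first(alignment, realign=slide_realign, max_rounds=10):
    """Remove one sequence at a time, realign it, and keep any improvement.

    Expects at least two rows of equal length.
    """
    current = list(alignment)
    for _ in range(max_rounds):
        improved = False
        for index in range(len(current)):
            rest = current[:index] + current[index + 1:]
            ungapped = current[index].replace("-", "")
            candidate = realign(rest, ungapped, index)
            # Greedy step: the first improvement found becomes the new input.
            if sum_of_pairs(candidate) > sum_of_pairs(current):
                current = candidate
                improved = True
        if not improved:
            break
    return current


if __name__ == "__main__":
    start = ["ACGT--", "ACGT--", "--ACGT"]
    refined = remove_first(start)
    print(refined, sum_of_pairs(refined))
```

A best first variant would instead realign every sequence in turn against the same
input, score all of the resulting candidates, and carry only the highest-scoring one into
the next iteration.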

Other Useful Resources:

S. Waterman, “Efficient Sequence Alignment Algorithms”, J. theor. Biol., 1984.

This article evaluates sequence alignment algorithms and compares them using big O
notation. The article proposes the use of concave weighting functions in order to increase

H. Rangwala, G. Karypis, “Incremental window-based protein
Oxford Journals: Bioinformatics, Vol. 23, pp. e17-e23, 2007.

[This article proposes a new algorithm for sequence alignment, which is based on short,
variable-length, high-scoring subsequences. The results show that this algorithm
gives comparable results to algorithms already in use.]

L. A. Newberg, “Memory efficient dynamic programming backtrace and pairwise
sequence alignment”, Oxford Journals: Bioinformatics, Vol. 24, pp.
1778, 2008.

[Because storing all intermediate values in a cache is impractical, this article
proposes a memory-efficient algorithm for recalculating these intermediate values as they
are needed. The article describes the results obtained from experiments with this
checkpointing system on pairwise local sequence alignments.]

J. Hérisson, G. Payen, and R. Gherbi, “A 3D pattern matching algorithm for DNA
sequences”, Oxford Journals: Bioinformatics, Vol. 23, pp. 680-686, 2007.

[The article proposes a 3D model for DNA rather than the traditional textual models. A
3D model would allow scientists to study syntax and other properties of DNA.]

T. W. Lam, W. K. Sung, S. L. Tam, C. K. Wong, and S. M. Yiu, “Compressed indexing
and local alignment of DNA”, Oxford Journals: Bioinformatics, Vol. 24, pp. 791

[This article focuses on finding local alignments of DNA sequences through indexing
certain sequences of DNA. This is a faster alternative to dynamic programming;
however, it is a heuristic-based approach and may not be as accurate.]

J. M. Sauder, J. W. Arthur, and R. L. Dunbrack, Jr., “Scale Comparison of
With Structure Alignments,” Proteins: Structure, Function, and Genetics, Vol. 40,
pp. 6-22, 2000.

[This paper compares a number of sequence alignment algorithms and their accuracy for
protein sequence alignments.]

L. Delcher, A. Phillippy, J. Ca, and S. L. Salzberg, “Fast algorithms for large
genome alignment and comparison”, Oxford Journals: Nucleic Acids Research, Vol. 30,
2483, 2002.

[The article proposes a suffix tree algorithm that the authors claim can align entire
sequences using minimal computer time and memory.]