OMWOMA VINCENT
P58/76972/2012
ASSIGNMENT III
DESIGN AND ANALYSIS OF ALGORITHMS
1. What is the time complexity of the bottom-up string alignment algorithm?
Complexity of CLUSTALW
CLUSTALW is one of the algorithms that implement the bottom-up method. By working out the complexity of the CLUSTALW algorithm, we therefore also work out the complexity of bottom-up string alignment.
Robert C. Edgar (2004):
It is instructive to consider the complexity of CLUSTALW. This is of intrinsic interest as CLUSTALW is currently the most widely used MSA program and, to the best of our knowledge, its complexity has not previously been stated correctly in the literature.
The similarity measure is the fractional identity computed from a global alignment, clustering is done by neighbor-joining. Global alignment of a pair of sequences or profiles is computed using the Myers-Miller linear space algorithm, which is $O(L)$ space and $O(L^2)$ time in the typical sequence length $L$. Given $N$ sequences and thus $N(N-1)/2 = O(N^2)$ pairs, it is therefore $O(N^2 L^2)$ time and $O(N^2 + L)$ space to construct the distance matrix. The neighbor-joining implementation is $O(N^2)$ space and $O(N^4)$ time, at least up to CLUSTALW 1.82, although $O(N^3)$ time is possible.
A single iteration of progressive alignment computes a profile of each subtree from its multiple alignment, which is $O(N_P L_P)$ time and space in the number of sequences in the profile $N_P$ and the profile length $L_P$, then uses Myers-Miller to align the profiles in $O(L_P)$ space and $O(L_P^2)$ time. There are $N-1$ internal nodes in a rooted binary tree and hence $O(N)$ iterations. It is often assumed that $L_P$ is $O(L)$, i.e. that $O(0)$ gaps are introduced in each iteration. However, we often observe the alignment length to grow approximately linearly, i.e. that $O(1)$ gaps are added per iteration. It is therefore more realistic to assume that $L_P$ is $O(L + N)$, making one iteration of progressive alignment $O(NL + L^2)$ in both space and time.
This analysis is summarized in Table 1.
Table 1
Step                         | O(Space)                 | O(Time)
Distance matrix              | $N^2 + L$                | $N^2 L^2$
Neighbor joining             | $N^2$                    | $N^4$
Progressive (one iteration)  | $NL_P + L_P = NL + L^2$  | $NL_P + L_P^2 = N^2 + L^2$
Progressive (total)          | $NL + L^2$               | $N^3 + NL^2$
TOTAL                        | $N^2 + L^2$              | $N^4 + L^2$
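As a rough illustration of the distance-matrix cost quoted above, the pair count and the quadratic cost per pair can be multiplied out for concrete sizes. The values $N = 100$ and $L = 300$ are illustrative assumptions, not figures from the source, and constant factors are ignored.

```latex
\[
\frac{N(N-1)}{2} = \frac{100 \cdot 99}{2} = 4950 \ \text{pairs},
\qquad
L^2 = 300^2 = 90\,000 \ \text{table cells per pair},
\]
\[
4950 \times 90\,000 = 445\,500\,000 \approx 4.5 \times 10^{8} \ \text{cell updates}.
\]
```

Doubling $N$ roughly quadruples the number of pairs, and doubling $L$ quadruples the work per pair, which is the behaviour the $O(N^2 L^2)$ bound describes.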
2. Create a top-down algorithm for the string alignment problem
In this question I am using the LGscore alignment method to demonstrate the top-down algorithm.
Arne Elofsson (2002):
while number of aligned residues > 25:
    Superimpose all residues in the model and in the correct structure.
    Calculate and store the p value for this superposition.
    Delete the pair of residues that is furthest apart from each other in the model and the correct structure.
return the best p value.
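The loop above can be sketched in Python. This is only a minimal sketch under stated assumptions, not the LGscore implementation: the superposition step is skipped (coordinates are assumed to be in a common frame already), a lower score is assumed to be better, and `mean_distance` is a hypothetical stand-in for the p value calculation. The names `best_score_by_trimming`, `mean_distance` and `min_pairs` are introduced here for illustration.

```python
import math
from typing import Callable, List, Sequence, Tuple

Point = Tuple[float, float, float]
Pair = Tuple[Point, Point]  # (residue in the model, residue in the correct structure)


def mean_distance(pairs: List[Pair]) -> float:
    """Hypothetical stand-in for the p value: mean distance over the aligned pairs."""
    return sum(math.dist(m, c) for m, c in pairs) / len(pairs)


def best_score_by_trimming(
    pairs: Sequence[Pair],
    score: Callable[[List[Pair]], float] = mean_distance,
    min_pairs: int = 25,
) -> float:
    """Score the current pair set, drop the most distant pair, and repeat.

    Returns the lowest score seen, or math.inf if the loop never runs.
    ASSUMPTION: lower is better, and coordinates are already superimposed.
    """
    current = list(pairs)
    best = math.inf
    while len(current) > min_pairs:
        best = min(best, score(current))  # "calculate and store" step
        # Index of the pair whose two residues are furthest apart.
        worst = max(range(len(current)), key=lambda i: math.dist(*current[i]))
        del current[worst]
    return best


origin = (0.0, 0.0, 0.0)
demo = [(origin, (0.0, 0.0, z)) for z in (1.0, 2.0, 5.0, 0.0)]
print(best_score_by_trimming(demo, min_pairs=2))
```

The method starts from the full set of aligned residues and shrinks it one pair at a time, which is the sense in which it is used here as a top-down procedure.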
3. What is the time complexity of the algorithm?
The Clustering Method
The clustering method is one of the algorithms that implement the top-down method. By working out the complexity of the clustering algorithm, we therefore also work out the complexity of top-down string alignment.
Kuen-Feng Huang et al. (2001):
Since the group alignment method has been introduced, the CMSA (Clustering Multiple Sequence Alignment) algorithm is shown next. The tree-based method uses a technique of "once a gap, always a gap". Our main idea is to reduce the number of gaps in each group. Thus, if we put the two sequences of the longest distance in two distinct groups, we can get a better multiple sequence alignment when the input sets of sequences are very similar. The detail of our CMSA is as follows.
Algorithm: CMSA (Clustering Multiple Sequence Alignment)
Input: A set of sequences $S = \{S_1, S_2, \ldots, S_n\}$.
Output: A multiple sequence alignment of $S$.
Step 1: If $|S| \le 1$, then stop.
Step 2: Compute the optimal alignment on each pair of sequences in $S$. Then construct the distance matrix for $S$.
Step 3: Sort all entries in the distance matrix into nonincreasing order.
Step 4: Create a set of sequences $R = S$.
Step 5: In $R$, select a pair of sequences $S_i$ and $S_j$ such that $S_i$ and $S_j$ have the longest distance.
Step 6: Let $G_1 = \{S_i\}$ and $G_2 = \{S_j\}$. $R = R - \{S_i, S_j\}$. Perform the following substeps until $R$ becomes empty.
    Step 6.1: Select $S_k \in R$ such that $\min\{d(S_k, G_1), d(S_k, G_2)\}$ is the minimum.
    Step 6.2: If $d(S_k, G_1) \le d(S_k, G_2)$, then $G_1 = G_1 \cup \{S_k\}$; otherwise $G_2 = G_2 \cup \{S_k\}$.
    Step 6.3: $R = R - \{S_k\}$.
Step 7: Recursively apply this algorithm (Algorithm CMSA) by setting the input $S = G_1$. Recursively apply this algorithm (Algorithm CMSA) by setting the input $S = G_2$.
Step 8: Perform our group alignment method (Algorithm Group Alignment) on $G_1$ and $G_2$.
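Steps 4 to 6 (splitting the input into two groups) can be sketched as follows. This is a hedged illustration rather than the authors' code. The source does not define the distance $d(S_k, G)$ between a sequence and a group, so the sketch assumes it is the smallest pairwise distance to any member of the group. The function name `split_into_two_groups` and the dictionary layout of `dist` are also assumptions made for the example.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Sequence, Tuple


def split_into_two_groups(
    names: Sequence[str],
    dist: Dict[FrozenSet[str], float],
) -> Tuple[List[str], List[str]]:
    """Partition sequence names into two groups seeded by the farthest pair."""
    if len(names) < 2:
        raise ValueError("need at least two sequences to split")

    # Step 5: the pair with the longest distance seeds the two groups.
    si, sj = max(combinations(names, 2), key=lambda p: dist[frozenset(p)])
    g1, g2 = [si], [sj]
    remaining = [s for s in names if s not in (si, sj)]  # Step 6: R = R - {Si, Sj}

    def d_group(s: str, group: List[str]) -> float:
        # ASSUMPTION: distance to a group = distance to its nearest member.
        return min(dist[frozenset((s, g))] for g in group)

    while remaining:
        # Step 6.1: pick the sequence closest to either group.
        sk = min(remaining, key=lambda s: min(d_group(s, g1), d_group(s, g2)))
        # Step 6.2: put it in the nearer group (ties go to G1).
        if d_group(sk, g1) <= d_group(sk, g2):
            g1.append(sk)
        else:
            g2.append(sk)
        remaining.remove(sk)  # Step 6.3
    return g1, g2


toy = {
    frozenset(("A", "B")): 1, frozenset(("C", "D")): 2,
    frozenset(("A", "C")): 9, frozenset(("A", "D")): 10,
    frozenset(("B", "C")): 8, frozenset(("B", "D")): 9,
}
print(split_into_two_groups(["A", "B", "C", "D"], toy))
```

Sequence names are assumed to be distinct. The sketch rescans the remaining set on every pass, so it does not use the sorted order prepared in Step 3.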
Let $L = \max\{|S_1|, |S_2|, \ldots, |S_n|\}$, where $|S_i|$ denotes the length of $S_i$. The complexity of each step is as follows.
Step 2: $O(n^2 L^2)$.
Step 3: $O(n^2 \log n)$.
Step 6: $O(n)$.
Step 8: $O(n^2 L^2)$.
The time required for one recursion is $O(n^2 L^2)$, since $L > n$ in almost all practical cases. Combining with the recursive work in Step 7, we obtain the time complexity of the algorithm as $O(n^2 L^2)$.
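One way to see how the cost of a single recursion combines with the recursive calls in Step 7 is to write a recurrence. The sketch below assumes, which the source does not state, that the two groups $G_1$ and $G_2$ are of roughly equal size; $c$ is a hypothetical constant and $T(n)$ denotes the running time on $n$ sequences.

```latex
\[
T(n) = 2\,T\!\left(\frac{n}{2}\right) + c\,n^2 L^2 .
\]
At recursion depth $k$ there are $2^k$ subproblems of size $n/2^k$, so the work at that depth is
\[
2^k \cdot c \left(\frac{n}{2^k}\right)^{2} L^2 = \frac{c\,n^2 L^2}{2^k},
\qquad
T(n) \le \sum_{k \ge 0} \frac{c\,n^2 L^2}{2^k} = 2\,c\,n^2 L^2 = O(n^2 L^2).
\]
```

The geometric decay across levels is what keeps the total at the cost of the top level. If the split were very unbalanced (for example one group of size 1 each time), the recurrence $T(n) = T(n-1) + c\,n^2 L^2$ would instead sum to $O(n^3 L^2)$, so the bound depends on reasonably even groups.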
We now explain our algorithm step by step. Let us consider the following five sequences. Suppose that in the score function, = 0, the costs of a match, a mismatch and an indel are 0, 1 and 1, respectively.
$S_1$ = AAGGCCTT
$S_2$ = CGATT
$S_3$ = AGGGAT
$S_4$ = TCGA
$S_5$ = AGGGCTT
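With costs of 0 for a match and 1 for a mismatch or an indel, the optimal pairwise alignment cost is the unit-cost edit distance, so the pairwise distances for these five sequences can be computed with a short dynamic program. The sketch below is an assumption about how such a matrix could be produced, not the authors' code. It ignores the score-function parameter whose symbol is missing above, and the names `edit_distance` and `distance_matrix` are introduced for illustration. It keeps only two rows of the table, so each pair takes quadratic time and linear space in the sequence length.

```python
from itertools import combinations
from typing import Dict, Tuple


def edit_distance(a: str, b: str) -> int:
    """Global alignment cost with match 0, mismatch 1, indel 1."""
    prev = list(range(len(b) + 1))  # aligning "" with b[:j] costs j indels
    for i, ca in enumerate(a, start=1):
        curr = [i]  # aligning a[:i] with "" costs i indels
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j - 1] + (0 if ca == cb else 1),  # match or mismatch
                prev[j] + 1,                           # indel: ca against a gap
                curr[j - 1] + 1,                       # indel: cb against a gap
            ))
        prev = curr
    return prev[-1]


def distance_matrix(seqs: Dict[str, str]) -> Dict[Tuple[str, str], int]:
    """Distance for every unordered pair of named sequences."""
    return {
        (p, q): edit_distance(seqs[p], seqs[q])
        for p, q in combinations(seqs, 2)
    }


sequences = {
    "S1": "AAGGCCTT",
    "S2": "CGATT",
    "S3": "AGGGAT",
    "S4": "TCGA",
    "S5": "AGGGCTT",
}
matrix = distance_matrix(sequences)
for (p, q), d in matrix.items():
    print(p, q, d)
```

Five sequences give ten pairs, in line with the $n(n-1)/2$ pair count behind the $O(n^2 L^2)$ cost of Step 2.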
In Step 2, we obtain a distance matrix, as shown. Our job is to divide the five sequences into two groups in Steps 5 and 6. The dividing process is shown in Figure 3. The first sequence put in each group is represented by a gray node. The number associated with each node represents the order that the sequence
References
1. Robert C Edgar; A multiple sequence alignment method with reduced time and space complexity, 2004.
2. Arne Elofsson; A Study on Protein Sequence Alignment Quality, 2002.
3. Kuen-Feng Huang, Chang-Biau Yang and Kuo-Tsung Tseng; An Efficient Algorithm for Multiple Sequence Alignment, 2001.
4. Van Walle I, Lasters I, Wyns L; Align-m - a new algorithm for multiple alignment of highly divergent sequences. Bioinformatics, 2004.
5. Chia Mao Huang and Chang Biau Yang; Approximation algorithms for constructing evolutionary trees. In Proc. of National Computer Symposium, Workshop on Algorithm and Computation Theory, pages A099–A109, 2001.