# BINF 101 Introduction to Bioinformatics Spring 2008 Final Exam



May 1, 2008

This is an open book, open notes test.

Write your name on the blue books.

1. (15 points) For the following distance matrix (of some gene sequences s1, s2, s3, s4):

|    | s1 | s2 | s3 | s4 |
|----|----|----|----|----|
| s1 | 0  | 5  | 4  | 4  |
| s2 | 5  | 0  | 3  | 7  |
| s3 | 4  | 3  | 0  | 6  |
| s4 | 4  | 7  | 6  | 0  |

(a) Is it […]? Explain why or why not.

(b) Apply the neighbor-joining algorithm (my version) to construct a phylogenetic tree.
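Part (b) refers to the instructor's own version of neighbor joining, which is not described in this sheet. For comparison only, here is a minimal sketch of the first selection step of the standard Saitou-Nei algorithm, which picks the pair minimizing Q(i, j) = (n - 2) d(i, j) - r_i - r_j, where r_i is the sum of row i. The names `D`, `names`, and `q_matrix` are illustrative assumptions, and the class version may use a different criterion.

```python
from itertools import combinations

names = ["s1", "s2", "s3", "s4"]
# Distance matrix from question 1.
D = [
    [0, 5, 4, 4],
    [5, 0, 3, 7],
    [4, 3, 0, 6],
    [4, 7, 6, 0],
]

def q_matrix(dist):
    """Standard neighbor-joining criterion: Q(i,j) = (n-2)*d(i,j) - r_i - r_j."""
    n = len(dist)
    row_sums = [sum(row) for row in dist]
    return {
        (i, j): (n - 2) * dist[i][j] - row_sums[i] - row_sums[j]
        for i, j in combinations(range(n), 2)
    }

Q = q_matrix(D)
best = min(Q.values())
# All pairs that attain the minimum Q value are valid choices for the first join.
candidates = [(names[i], names[j]) for (i, j), q in Q.items() if q == best]
print(candidates)
```

Only the first join is shown. A full run would replace the joined pair with a new node, update the distances, and repeat until three nodes remain.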

2. (15 points) For the following multiple sequence alignment of four sequences s1, s2, s3, s4, of length 3, find the maximum parsimony tree; that is, find the phylogenetic tree based on the parsimony method.

s1: GGT

s2: AGG

s3: AAC

s4: GAA
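With four sequences there are only three unrooted tree topologies, so one way to check a hand computation is to score each of them. The sketch below does this with Fitch's small-parsimony counting, placing the root on the internal edge. The helper names `fitch_pair` and `quartet_score` are illustrative, and the course may have used a different way of counting substitutions.

```python
def fitch_pair(a, b):
    """Combine two Fitch state sets; return (combined_set, substitutions_added)."""
    common = a & b
    if common:
        return common, 0
    return a | b, 1

def quartet_score(seqs, left, right):
    """Parsimony score of the unrooted four-leaf tree that groups `left` vs `right`."""
    total = 0
    for col in range(len(seqs[left[0]])):
        l_set, c1 = fitch_pair({seqs[left[0]][col]}, {seqs[left[1]][col]})
        r_set, c2 = fitch_pair({seqs[right[0]][col]}, {seqs[right[1]][col]})
        _, c3 = fitch_pair(l_set, r_set)  # cost across the internal edge
        total += c1 + c2 + c3
    return total

seqs = {"s1": "GGT", "s2": "AGG", "s3": "AAC", "s4": "GAA"}
topologies = [
    (("s1", "s2"), ("s3", "s4")),
    (("s1", "s3"), ("s2", "s4")),
    (("s1", "s4"), ("s2", "s3")),
]
scores = {topo: quartet_score(seqs, *topo) for topo in topologies}
for topo, score in scores.items():
    print(topo, score)
```

The topology with the lowest score is the most parsimonious one. Ties are possible and should be reported as such.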

3. (15 points) Assume that the score of a match is 1 and the score of a mismatch is 0, and that there is no penalty for gaps. Find the optimal alignment of the following two sequences using the dynamic programming method:

s1:

s2:

NOTE: No credit for visual matching (a computer can't see); show the algorithm step by step.
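The two sequences for this question are missing from this copy, so the sketch below uses hypothetical inputs (`"ACGT"` and `"AGT"`) purely to illustrate the table fill and traceback under the stated scoring (match 1, mismatch 0, gap 0). The function name `align` and its parameters are assumptions, not part of the exam.

```python
def align(a, b, match=1, mismatch=0, gap=0):
    """Global alignment by dynamic programming; returns (score, row_a, row_b)."""
    n, m = len(a), len(b)
    # score[i][j] is the best score for aligning a[:i] with b[:j].
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = score[i - 1][0] + gap
    for j in range(1, m + 1):
        score[0][j] = score[0][j - 1] + gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(
                score[i - 1][j - 1] + sub,  # align a[i-1] with b[j-1]
                score[i - 1][j] + gap,      # gap in b
                score[i][j - 1] + gap,      # gap in a
            )
    # Traceback from the bottom-right corner to recover one optimal alignment.
    top, bottom = [], []
    i, j = n, m
    while i > 0 or j > 0:
        sub = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub:
            top.append(a[i - 1]); bottom.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            top.append(a[i - 1]); bottom.append("-"); i -= 1
        else:
            top.append("-"); bottom.append(b[j - 1]); j -= 1
    return score[n][m], "".join(reversed(top)), "".join(reversed(bottom))

# Hypothetical sequences; the exam's own s1 and s2 are not available here.
best_score, row_a, row_b = align("ACGT", "AGT")
print(best_score)
print(row_a)
print(row_b)
```

With these scores the optimal value equals the length of the longest common subsequence, and several alignments may share it. The traceback returns just one of them.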

4. (15 points) In the maximum likelihood method for constructing the phylogenetic tree, for the following tree topology $T$ of three given leaves $x_1, x_2, x_3$ (which are nucleotides) with given branch lengths $t = (t_1, t_2, t_3, t_4)$; for example, $x_1 = A$, $x_2 = G$, $x_3 = C$, and $t_1 = 6$, $t_2 = 3$, $t_3 = 4$, $t_4 = 2$, what is the likelihood (or the probability) of the tree, $P(x_1, x_2, x_3 \mid T, t)$, in terms of the probability assignments $q(a)$ and $p(x_i \mid x_j, t)$, where $q(a)$ is the probability of assigning a nucleotide $a$ to a node and $p(x_i \mid x_j, t)$ is the (conditional) probability of mutation from $x_j$ to $x_i$ after time $t$?

[Figure: tree topology $T$ with leaves $x_1, x_2, x_3$ and branch lengths $t_1, t_2, t_3, t_4$.]
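The tree figure is not reproduced in this copy, so the following is only a sketch under an assumed topology: suppose $x_1$ and $x_2$ descend from a hidden internal node $x_4$ along branches $t_1$ and $t_2$, $x_3$ descends from a hidden root $x_5$ along $t_3$, and $x_4$ descends from $x_5$ along $t_4$. The node names $x_4$, $x_5$ and this assignment of branches are assumptions, not taken from the exam. Under them, the likelihood sums over the unobserved internal nucleotides:

$$
P(x_1, x_2, x_3 \mid T, t) = \sum_{x_4} \sum_{x_5} q(x_5)\, p(x_4 \mid x_5, t_4)\, p(x_3 \mid x_5, t_3)\, p(x_1 \mid x_4, t_1)\, p(x_2 \mid x_4, t_2)
$$

If the branches attach differently in the actual figure, the same pattern applies: one factor $q(\cdot)$ for the root, one factor $p(\text{child} \mid \text{parent}, t)$ per branch, and a sum over every internal node.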

5. (15 points) In a BLAST search, what is the meaning of the p-value? How does increasing or decreasing the word size $w$ affect the results? How does increasing or decreasing the threshold $T$ affect the results?
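To make the role of the word size concrete, the sketch below counts exact word matches ("seeds") between two hypothetical nucleotide sequences for several values of w. It models only exact-match seeding, as in nucleotide BLAST; the neighborhood words and threshold T used for protein searches are not modeled. The sequences and the function name `count_word_hits` are assumptions for illustration.

```python
def count_word_hits(query, subject, w):
    """Count pairs (i, j) such that query[i:i+w] == subject[j:j+w]."""
    # Index every length-w word of the subject by its start positions.
    index = {}
    for j in range(len(subject) - w + 1):
        index.setdefault(subject[j:j + w], []).append(j)
    # Look up each query word and add the number of matching subject positions.
    hits = 0
    for i in range(len(query) - w + 1):
        hits += len(index.get(query[i:i + w], []))
    return hits

# Hypothetical sequences chosen only for illustration.
query = "ACGTACGTGACC"
subject = "TTACGTACGAGACC"
counts = [count_word_hits(query, subject, w) for w in (2, 3, 4, 6)]
print(counts)
```

The number of seeds can only stay the same or drop as w grows, which is the usual trade-off: larger words mean fewer seeds to extend (faster search) but a higher chance of missing weakly similar regions.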

6. (15 points) How is the PAM-1 scoring matrix constructed? What is the meaning of PAM-n, for example PAM-250? How is the BLOSUM scoring matrix constructed? What is the meaning of BLOSUM-n, for example, BLOSUM-61?
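As a small illustration of the PAM-n idea, the mutation probability matrix for n PAM units is obtained by multiplying the PAM-1 mutation probability matrix by itself n times (the scoring matrix is then derived from it as log-odds). The sketch below uses a made-up 3-letter alphabet and made-up probabilities, not the real 20 x 20 Dayhoff data; `M1`, `mat_mul`, and `mat_pow` are hypothetical names.

```python
def mat_mul(a, b):
    """Multiply two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def mat_pow(m, n):
    """Raise a square matrix to the n-th power by repeated multiplication."""
    size = len(m)
    result = [[1.0 if i == j else 0.0 for j in range(size)] for i in range(size)]
    for _ in range(n):
        result = mat_mul(result, m)
    return result

# Toy one-step mutation probabilities (each row sums to 1); values are invented.
M1 = [
    [0.990, 0.006, 0.004],
    [0.006, 0.990, 0.004],
    [0.004, 0.004, 0.992],
]
M250 = mat_pow(M1, 250)
for row in M250:
    print([round(p, 3) for p in row])
```

Each row of the result still sums to 1, but the diagonal entries shrink as n grows, reflecting the larger evolutionary distance that a higher PAM number represents.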

7. (15 points) For multiple sequence alignment, we discussed two methods: multi-dimensional dynamic programming and Feng-Doolittle's progressive alignment using a guide tree (as used in CLUSTALX). What are the pros and cons of each of them?
