# phylogeny_student_worksheetx

Biotechnology

Oct 1, 2013

AP Biology Student Worksheet

Learning Evolution Using Phylogenetic Analysis
The purpose of this hands-on practice is to learn how to use bioinformatics tools to study evolution.

Prerequisites: You should have completed the evolution review homework exercises provided to you by your teacher prior to this lesson.

Pages 1 and 2 are for your information and to give you a better understanding of the big picture of what this worksheet is helping you learn.

Big Picture

When we perform phylogenetic analysis we follow these main steps:

1. Select the sequences based on which you want to compare the desired species/entities. In bioinformatics we generally use genomic or protein sequences. NCBI’s Nucleotide (http://www.ncbi.nlm.nih.gov/nucleotide/) and Protein (http://www.ncbi.nlm.nih.gov/protein/) databases are good sources of genomic and proteomic sequences. For this lesson, the sequences are provided to you at http://compbio.soe.ucsc.edu/binf-in-AP/
2. Perform multiple sequence alignment on the selected sequences. There are many online tools available that produce multiple sequence alignments. In this lesson we will be using the Clustal Omega site at http://www.ebi.ac.uk/Tools/msa/clustalo/
3. Calculate the distance matrix from the multiple sequence alignment. A distance matrix is a square matrix that indicates the distance between each pair of species in your list. In this lesson we will be using the Protdist tool in the Phylip package on the Institut Pasteur’s web site at http://mobyle.pasteur.fr/cgi-bin/portal.py#forms::protdist

Note: There are many different computational methods for building a phylogenetic tree. In this lesson we will be using distance-based tree construction. That is why we need to calculate a distance matrix. There are methods that are not based on evolutionary distance. Those methods do not require the use of a distance matrix. Some of them are regarded more highly than distance-based methods. However, they are often quite slow to run optimally. In this worksheet, we are going to use the neighbor-joining method, which belongs to the family of distance-based methods. It is a good compromise between speed and accuracy when producing phylogenies.

4. Build a phylogenetic tree. There are many tools available for building phylogenetic trees. Some are web tools and some have to be [...] of the Phylip package located on the Institut Pasteur web site. We will specifically use the Neighbor-Joining and UPGMA methods tool at http://mobyle.pasteur.fr/cgi-bin/portal.py#forms::neighbor

The Institut Pasteur is located in Paris, France, and has many useful bioinformatics tools on its web site. We encourage you to explore those tools on your own outside of this lesson. Just navigate to http://mobyle.pasteur.fr/cgi-bin/portal.py#welcome and investigate the Programs list in the top right corner of the web page. This web site provides a workbench in which you are able to click back and forth between the different tools/forms you are using and the various jobs that have been completed by those tools.

Some of the other tools you might want to explore on your own are:

Phylogeny.fr at
www.phylogeny.fr/version2_cgi/index.cgi

TreeTop at
http://www.genebee.msu.su/services/phtree_reduced.html

5. Visualize the tree as graphical output from the text representation of the tree produced by the previous step. In this lesson we will use Newicktops, Drawtree, and Drawgram in the Phylip package at

http://mobyle.pasteur.fr/cgi-bin/portal.py#forms::newicktops

http://mobyle.pasteur.fr/cgi-bin/portal.py#forms::drawtree

http://mobyle.pasteur.fr/cgi-bin/portal.py#forms::drawgram
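To make the neighbor-joining method mentioned in the note above more concrete, here is a minimal Python sketch of the idea. It is not the Phylip implementation used in this lesson; the function name `neighbor_joining`, the labels `a` to `e`, and the distance values are illustrative assumptions, not data from this worksheet. The sketch repeatedly joins the pair of nodes that minimises the neighbor-joining criterion and writes the result as a NEWICK string.

```python
from itertools import combinations


def neighbor_joining(names, matrix):
    """Build an unrooted tree by neighbor-joining; return it as a Newick string.

    names  -- list of taxon labels (hypothetical labels in the example below)
    matrix -- square list of lists; matrix[i][j] is the distance between
              names[i] and names[j]
    """
    # Distances live in a nested dict keyed by node label. Internal nodes are
    # labelled with the Newick text of the subtree they represent.
    d = {a: {b: matrix[i][j] for j, b in enumerate(names)}
         for i, a in enumerate(names)}
    nodes = list(names)

    while len(nodes) > 2:
        n = len(nodes)
        # Total distance from each node to all other current nodes.
        r = {a: sum(d[a][b] for b in nodes if b != a) for a in nodes}
        # Pick the pair that minimises the neighbor-joining criterion Q.
        a, b = min(combinations(nodes, 2),
                   key=lambda p: (n - 2) * d[p[0]][p[1]] - r[p[0]] - r[p[1]])
        # Branch lengths from the new internal node to a and to b.
        len_a = 0.5 * d[a][b] + (r[a] - r[b]) / (2 * (n - 2))
        len_b = d[a][b] - len_a
        new = f"({a}:{len_a:.4f},{b}:{len_b:.4f})"
        # Distances from the new node to every remaining node.
        rest = [k for k in nodes if k not in (a, b)]
        d[new] = {}
        for k in rest:
            d[new][k] = d[k][new] = 0.5 * (d[a][k] + d[b][k] - d[a][b])
        nodes = rest + [new]

    a, b = nodes
    # Join the last two nodes; the remaining distance is put on one branch.
    return f"({a}:{d[a][b]:.4f},{b}:0.0000);"


# Small made-up distance matrix for five hypothetical taxa.
example_names = ["a", "b", "c", "d", "e"]
example_matrix = [
    [0, 5, 9, 9, 8],
    [5, 0, 10, 10, 9],
    [9, 10, 0, 8, 7],
    [9, 10, 8, 0, 3],
    [8, 9, 7, 3, 0],
]
tree = neighbor_joining(example_names, example_matrix)
print(tree)
```

The printed tree is unrooted: the outermost pair of parentheses is only a way of writing it down, which is why an outgroup is needed later to decide where the root belongs.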

Part A

Exercise 1:

Draw a line between the two animals that are more closely related.

Quagga

Zebra

Horse

Exercise 2:

Draw a line between the two plants that are more closely related.

Banksia

Pine

Hakea

Exercise 3:

Draw a line between the two animals that are more closely related.

Horseshoe crab

Stone crab

Aquatic spider

Exercise 4:

Draw a line between the two animals that are more closely related.

Barnacle

Limpet

Shrimp

Exercise 5:

When using computational methods to determine phylogeny, we can use either genomic (DNA) or protein (amino acid) sequences. Can you think of any reasons when it is advantageous to use genomic over protein sequences and vice versa?

Part B

Exercise 1:

In order to calculate the distance between two or more species/entities based on protein sequences, we first need to know how similar those sequences are. Bioinformaticians use a technique called multiple sequence alignment for that purpose. The discussion of how multiple sequence alignment algorithms work is outside the scope of this course.

Let’s align some sequences.

1. Download the .txt sequence file from http://compbio.soe.ucsc.edu/binf-in-AP/evolution/

2. This file contains sequences of Beta Globin for these species: Homo sapiens (human), Bos taurus (cow), Salmo salar (Atlantic salmon), Mus musculus (mouse), Otolemur crassicaudatus (galago), Gorilla gorilla (gorilla), Gallus gallus (rooster), Tarsius syrichta (lemur), [...] (sloth), Dasypus novemcinctus (armadillo), Rattus norvegicus (rat), Pan troglodytes (chimp)

3. This file contains protein sequences in a format called FASTA (http://en.wikipedia.org/wiki/FASTA_format). It is just one of many file formats bioinformaticians use.

You have now completed the following steps in the phylogenetic analysis pipeline: [Figure: pipeline progress]

4. Go to the Clustal Omega site at http://www.ebi.ac.uk/Tools/msa/clustalo/

5. Either

a. Open the downloaded file in any text editor (e.g. TextEdit on MacOS or TextPad on Windows)

b. Highlight all sequences and copy the text

c. Paste the copied sequences into the Step 1 field

6. Or

a. In Clustal Omega click on Choose File, navigate to the file you downloaded in step 1 and click Open

7. Make sure that it says PROTEIN in the drop down box above the data input field

8. Click on Submit button

9. Once alignments appear, click on Show Colors button. You should see a screen like this:

10. Click on FAQ to the left of the alignment section of the screen. A new window will open with the list of questions as links. Click on the question “What do the colours mean when I show them on the alignment?”

What does each color of the alignment mean?
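The FASTA format from step 3 is simple enough to read with a few lines of code. The following Python sketch is only an illustration: the function name `parse_fasta` and the two short records are made up for this example and are not the contents of the worksheet's sequence file. Each record starts with a header line beginning with `>`, and the sequence follows on one or more lines.

```python
def parse_fasta(text):
    """Return a dict mapping each FASTA header (without '>') to its sequence."""
    records = {}
    header = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue                      # skip blank lines
        if line.startswith(">"):
            header = line[1:]             # start a new record
            records[header] = ""
        elif header is not None:
            records[header] += line       # sequences may span several lines
    return records


# Two made-up records, used only to show the layout of a FASTA file.
example = """>seq_one hypothetical protein
MVHLTPEEKS
AVTALWGKV
>seq_two hypothetical protein
MVHLTAEEKA
"""

records = parse_fasta(example)
for name, sequence in records.items():
    print(name, len(sequence))
```

Pasting a FASTA file into Clustal Omega works the same way: the tool uses the header lines to name the rows of the alignment.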

You have now completed the following steps in the phylogenetic analysis pipeline:

Exercise 2:

Let’s calculate the distance matrix for the multiple alignment we created.

1. Click on the Download Alignment File button. The alignment file is a text file containing the alignment. Either save the file and open it in a text editor (e.g. TextEdit) or choose to open it in a text editor without saving it first.

2. Highlight the contents of the text file and copy it.

3. Go to the Protdist tool in the Phylip package at http://mobyle.pasteur.fr/cgi-bin/portal.py#forms::protdist

4. Paste the alignment into the data input box.

5. Click on Run button.

6. [...] field and click OK

7. Enter captcha text into the validation box and click OK
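To see what a distance matrix looks like, here is a small Python sketch. It does not reproduce Protdist, which estimates distances with an amino acid substitution model; as a simplifying assumption it uses the plain proportion of differing positions between two aligned sequences. The function names and the three short aligned sequences are hypothetical.

```python
from itertools import combinations


def p_distance(a, b):
    """Fraction of gap-free aligned columns in which two sequences differ."""
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        raise ValueError("no gap-free columns to compare")
    return sum(x != y for x, y in pairs) / len(pairs)


def distance_matrix(aligned):
    """Return a nested dict: matrix[name1][name2] is the pairwise distance."""
    names = list(aligned)
    matrix = {n: {m: 0.0 for m in names} for n in names}
    for n, m in combinations(names, 2):
        # The matrix is symmetric, so each pair is computed once.
        matrix[n][m] = matrix[m][n] = p_distance(aligned[n], aligned[m])
    return matrix


# Made-up aligned sequences ('-' marks a gap); not the worksheet's data.
aligned = {"A": "MVHLTPEE", "B": "MVHLTAEE", "C": "MV-LSAEK"}
matrix = distance_matrix(aligned)
for name in aligned:
    print(name, [round(matrix[name][other], 3) for other in aligned])
```

The result is square and symmetric, with zeros on the diagonal, which is the same shape as the Outfile that Protdist produces.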

You have now completed the following steps in the phylogenetic analysis pipeline:

Exercise 3:

Let’s now build a phylogenetic tree from the distance matrix produced in exercise 2.

1. Highlight all the text in the Outfile (PhylipDistanceMatrix) and copy.

2. On the top left of the page you can see the list of programs provided by the Pasteur web site. Click on the plus sign next to phylogeny to expand it. Click on the plus sign next to distance to expand it. Click on the neighbor program. You will see the Neighbor-Joining and UPGMA methods tool in the Phylip package (http://mobyle.pasteur.fr/cgi-bin/portal.py#forms::neighbor)

Notice how you can click back and forth between the Forms and Jobs tabs as well as between the different subtabs within those. This workbench makes it very easy and fluid to use multiple tools.

3. Paste the distance matrix into the data input text field.

4. Click on Run button.

5. Click on the full screen view button under the Neighbor output file text box.

6. Explore the output tree in the new window/tab.

7. Please note that it says “remember: this is an unrooted tree!” in the Neighbor output file field. This is important. Even if the tree looks like it is drawn as rooted, there is no root in this tree.

8. Now, look at the Neighbor output tree file text box. What you see in there is the output tree in NEWICK format, which is one of the text formats for representing trees. In this format, each subtree is enclosed in a set of matching parentheses. Two leaf nodes or subtrees are separated by a comma. Every opening parenthesis must be matched by a closing parenthesis. The tree ends with a semicolon.

9. Can you draw a tree from the following NEWICK formatted tree?

(((A, B), (C, D)), (E, F));

10. Click on the view with archaeopteryx button under the Neighbor output tree file field. Does the tree you see look the same as or different from the one in step 7?

Troubleshooting: Archaeopteryx is a Java application. If Java is disabled on your computer then skip this step and go to Exercise 4 below.

11. Notice the menu at the top of the popup screen. Play around with the different types of tree under the Type menu option. Notice that the types available here do not change the topology of the tree. They only change the way edges are displayed.

12. Click on X in the upper right corner to close the archaeopteryx screen.
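The rules of the NEWICK format described in step 8 can be checked with a short program. The Python sketch below is a simplified illustration, not the parser used by Phylip or Archaeopteryx: it assumes a tree without branch lengths, and the function names `parse_newick`, `leaves`, and `show` are made up for this example. It turns the text into nested groups and prints each subtree one level deeper than its parent.

```python
def parse_newick(text):
    """Parse a Newick string without branch lengths into nested tuples.

    Leaves become strings; each parenthesised subtree becomes a tuple of its
    children. A trailing semicolon is accepted but not required.
    """
    text = "".join(text.split()).rstrip(";")
    pos = 0

    def peek():
        return text[pos] if pos < len(text) else ""

    def parse_node():
        nonlocal pos
        if peek() == "(":
            pos += 1                      # consume "("
            children = [parse_node()]
            while peek() == ",":
                pos += 1                  # consume ","
                children.append(parse_node())
            if peek() != ")":
                raise ValueError("unmatched opening parenthesis")
            pos += 1                      # consume ")"
            return tuple(children)
        start = pos
        while peek() not in ("", ",", "(", ")"):
            pos += 1
        if start == pos:
            raise ValueError("expected a leaf name")
        return text[start:pos]

    tree = parse_node()
    if pos != len(text):
        raise ValueError("unexpected text after the tree")
    return tree


def leaves(tree):
    """List the leaf names of a parsed tree from left to right."""
    if isinstance(tree, str):
        return [tree]
    return [leaf for child in tree for leaf in leaves(child)]


def show(tree, depth=0):
    """Print the tree with one level of indentation per nested subtree."""
    if isinstance(tree, str):
        print("  " * depth + tree)
    else:
        print("  " * depth + "+")
        for child in tree:
            show(child, depth + 1)


parsed = parse_newick("(((A, B), (C, D)), (E, F));")
show(parsed)
```

Trees produced by the neighbor program also carry branch lengths after a colon, which this simplified sketch does not handle.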

You have now completed the following steps in the phylogenetic analysis pipeline:

Exercise 4:

Let’s now learn how to visualize the tree using tools provided by Phylip.

Part A

Let’s learn the newicktops tool.

1. Highlight and copy the text in the Neighbor output tree file text box.

2. On the top right of the page, under phylogeny click on the plus sign next to display to expand it. Click on the newicktops program. You will see the Newicktops tool in the Phylip package (http://mobyle.pasteur.fr/cgi-bin/portal.py#forms::newicktops)

3. Paste the tree in NEWICK format into the data input box.

Note: [...] Part B below. Newicktops outputs a tree layout in a format that Windows cannot read.

4. Click on Run button.

5. Click on full screen view under Graphic tree file.

6. The file should open in Preview or in another browser tab.

Troubleshooting: If you are having trouble with this step then:

If you are on MacOS and the file does not open in Preview then check the bottom of your browser to see if a .ps file has been downloaded, then click on that file. Depending on your [...] file may not be at the bottom of the browser window. Check [...]

7. You should see an output similar to this:

8. Does the tree you see look the same as or different from the one viewed in archaeopteryx?

9. From this tree, which species’ beta globin is the human beta globin most closely related to?

10. From this tree, which species’ beta globin is the human beta globin second most closely related to?

Part B

Let’s learn the drawgram tool.

1. Go back to the Jobs workbench tab.

2. Click on the last subtab that says “neighbor XXX” where XXX is a timestamp (the job subtabs run from top to bottom and from left to right).

3. Highlight and copy the text in the Neighbor output tree file text box.

4. On the top right of the page, under phylogeny -> display click on the drawgram program. You will see the Drawgram tool in the Phylip package (http://mobyle.pasteur.fr/cgi-bin/portal.py#forms::drawgram)

5. Paste the tree in NEWICK format into the data input box.

6. Click on [...] and in the field Which plotter or printer will the tree be drawn on select Postscript for MacOS or Bitmap for Windows.

7. Click on Run button.

8. Examine the Standard output field.

9. Does this tool output a rooted or an unrooted tree? (Hint: The text of the output should state so.)

10. Click on full screen view under Graphic tree file.

Troubleshooting: If you are having trouble with this step then:

If you are on Windows, did you remember to select Bitmap format for your output in step 6?

Check the bottom of your browser to see if a .ps (or .bmp on Windows) file has been downloaded.

If all else fails, [...]

11. The file should open in Preview.

12. You should see an output similar to this:

13. What species does the chicken appear to be the closest to in this tree?

14. Do you believe that is a correct relationship? If not, how would you suggest fixing it?

15. Go back to the Forms workspace tab.

16. You should be back in the drawgram subtab.

17. Scroll down to the Tree grows … option and pick Horizontally in the drop down list.

18. Right below it, select Circular tree (O) in the Tree style drop down list.

19. Click on Run button.

20. Click on full screen view under Graphic tree file.

21. You should see an output similar to this:

22. Where in this tree would you place a root? (you can just draw it on the figure above)

Part C

Let’s learn the drawtree tool.

1. Go back to the Jobs workbench tab.

2. Click on the last subtab that says “neighbor XXX” where XXX is a timestamp (the job subtabs run from top to bottom and from left to right).

3. Highlight and copy the text in the Neighbor output tree file text box.

4. On the top right of the page, under phylogeny -> display click on the drawtree program. You will see the Drawtree tool in the Phylip package (http://mobyle.pasteur.fr/cgi-bin/portal.py#forms::drawtree)

5. Paste the tree in NEWICK format into the data input box.

6. Scroll down and select Yes in the drop down list for the Try to avoid label overlap option

7. Click on [...] and in the field Which plotter or printer will the tree be drawn on select Postscript for MacOS or Bitmap for Windows.

8. Click on Run button.

9. Examine the Standard output field.

10. Does this tool output a rooted or an unrooted tree? (Hint: the output text should state this.)

11. Click on full screen view under Graphic tree file.

Troubleshooting: If you are having trouble with this step then:

If you are on Windows, did you remember to select Bitmap format for your output in step 7?

Check the bottom of your browser to see if a .ps (or .bmp on Windows) file has been downloaded.

12. The file should open in Preview.

13. You should see an output similar to this:

14. Where in this tree would you place a root? (you can just draw it on the figure above)

15. Go back to the Forms workspace tab.

16. You should be back in the drawtree subtab.

17. Scroll down to the Use branch lengths option and pick No in the drop down list.

18. Click on Run button.

19. Click on full screen view under Graphic tree file.

20. You should see an output similar to this:

21. As a biologist, how would you interpret the difference between the trees produced in steps 8 and 18? What does one tree tell you that the other tree does not tell you?
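In the NEWICK text, a branch length is the number written after a colon. As a rough illustration of what the Use branch lengths option ignores, the Python sketch below removes those numbers from a tree. The function name `strip_branch_lengths` and the example tree with its lengths are made up for this illustration; Drawtree itself is not implemented this way as far as this worksheet shows.

```python
import re


def strip_branch_lengths(newick):
    """Remove every ':<number>' branch length from a Newick string."""
    return re.sub(r":[-+0-9.eE]+", "", newick)


# A made-up tree with made-up branch lengths.
with_lengths = "((human:0.01,chimp:0.02):0.05,mouse:0.30);"
topology_only = strip_branch_lengths(with_lengths)
print(with_lengths)
print(topology_only)
```

Compare the two printed lines when thinking about the question above: the groupings are identical, and only the numbers differ.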

You have now completed the following steps in the phylogenetic analysis pipeline:

Exercise 5:

As you can see from Exercise 4, without including an outlier group a tree could show misleading relationships. The trees produced in Exercise 4 led you to believe that chicken is more closely related to salmon than to any other species. Let’s now add an outlier group. We are going to add the proteomic sequence for the human myoglobin protein. An outlier group should be a related sequence whose relationship is known to be older than the one you are interested in. This could be a paralog that predates the speciation. We know that myoglobin and beta globin proteins split from the common ancestor before speciation for those species we included in our analysis.

1. Download the file ending in _2.txt from http://compbio.soe.ucsc.edu/binf-in-AP/

This file contains the same sequences we used in Exercise 1 with the addition of the outlier sequence.

2. [...]

3. Highlight all sequences and copy the text

4. Perform multiple sequence alignment as described in Exercise 1.

5. Compute the distance matrix for the alignment as described in Exercise 2.

6. Build a phylogenetic tree as described in Exercise 3.

a. When in the Neighbor-Joining and UPGMA methods tool, scroll down to the Outgroup species (default, use as outgroup species 1) option and type in 13 (this means that the 13th sequence is the outlier group)

7. Visualize the tree using one or more of the methods described in Exercise 4.

Troubleshooting: If you were unable to succeed with any visualization methods in Exercise 4 then skip through to Part C.

8. In the output tree, what species/group is chicken most closely related to?

Part C

In this section we will use phylogenetic analysis to find out which strain of the Simian immunodeficiency virus (SIV) the Human immunodeficiency virus (HIV) evolved from. SIV is able to infect at least 33 species of African primates. Different strains of this virus have been extracted from different species. We will use the protein sequence of the Group-specific antigen (GAG) protein from 4 different strains of SIV and from 1 strain of HIV to build a phylogenetic tree. The GAG gene is a characteristic component of retroviruses. Retroviruses carry an RNA genome, rather than a DNA genome, and use an enzyme called reverse transcriptase to produce DNA from their RNA genome, then incorporate that DNA into the host’s genome. Both SIV and HIV are retroviruses.

Remember that in order to produce a phylogenetic tree you follow these steps you learned in Part B:

Exercise 1:

Let’s use GAG protein sequences from different strains of SIV and HIV to build a phylogenetic tree.

1. Download the .txt sequence file from http://compbio.soe.ucsc.edu/binf-in-AP/evolution/

This file contains GAG sequences for immunodeficiency viruses found in humans, African green monkeys, Sooty mangabey monkeys, Chimpanzees, and Macaques.

1. Following the steps you learned in Part B analyze phylogeny of the provided SIV and HIV viruses and answer the following questions.

2. [...]

3. Which strain of SIV is human HIV most closely related to?

Part D

We will now perform phylogenetic analysis to determine the phylogeny of marsupials based on Retinol Binding Protein 3. This protein is a large extracellular glycoprotein that binds retinol to the contiguous layer of pigment epithelium cells. It is a well known phylogenetic marker in mammal evolution and has been used in scientific studies before. However, this protein is not specific to mammals and is present in other animals.

Remember that in order to produce a phylogenetic tree you follow these steps you learned in Part B:

Exercise 1:

Let’s use RBP3 sequences from various animals to see the evolutionary relationship between marsupials, placentals, and other animals.

2. Download the .txt sequence file from http://compbio.soe.ucsc.edu/binf-in-AP/evolution/

This file contains sequences for the following species:

Marmosops noctivagus (neotropic opossum), Ornithorhynchus anatinus (platypus), Dipodomys merriami (Merriam’s kangaroo rat), Dipodomys ordii (Ord’s kangaroo rat), Dipodomys spectabilis (banner-tailed kangaroo rat), Wallabia bicolor (swamp wallaby), Petrogale lateralis (rock wallaby), Setonix brachyurus (quokka), Onychogalea unguifera (nail-tail wallaby), [...] (beaver), Peromyscus maniculatus (deer mouse), Uranomys ruddi (white-bellied brush-furred rat), Notomys fuscus (dusky hopping mouse), Perognathus flavus (silky pocket mouse), Liomys pictus (painted spiny pocket mouse), Drosophila melanogaster (fly).

4. Which of the listed species should be the outgroup?

5. [...]

6. Following the steps you learned in Part B analyze phylogeny of the provided species and answer the following questions.

7. Which other species is the platypus most closely related to?

8. Do kangaroo rats belong to the same clade?

9. Which other species are the kangaroo rats most closely related to?