Bioinformatics Lab


1 Oct 2013


* Fictional statement. Can you guess how the “Bigfoot” sequence was generated?

**GenBank link: http://www.ncbi.nlm.nih.gov/Genbank/







Exercise 1:

Molecular Phylogeny of Humans, Primates, and … the Yeti?

(Thanks to Dr. Brian Bettencourt, UML, for the following)



Our research team recently received a hair sample, believed to be from the legendary and mysterious Yeti (“Bigfoot”, “Sasquatch”, etc.), from a group of Nepalese monks living high in the Himalayas.* Although the hair was apparently old and not preserved in any fashion, we successfully extracted mitochondrial DNA. To determine whether and how the purported Bigfoot was related to modern humans and other primates, we amplified the “hypervariable region 1” of the mitochondrial D-loop noncoding sequence. This region has been heavily used in studies of human and primate genetic diversity. If you search GenBank** for “D-loop Hypervariable Region 1,” you will find almost 2000 entries.

The D-loop, and especially its hypervariable regions 1 and 2, is non-coding, so it tends to vary quite a bit between individuals, populations, and species. D-loop sequences thus work well to determine close relations (and not well at all for deep divergence). Plus, since the mitochondrial genome is so small, it can often persist for a very long time in particular tissues: mitochondrial genes have been successfully amplified from a variety of ancient sources, including Neanderthal remains and prehistoric modern humans.


The question to be addressed is where the putative Bigfoot sequence will be placed on a phylogeny (i.e., a “family” tree) of related sequences. A hardworking instructor has done much of the legwork for this task by searching GenBank to find D-loop Hypervariable Region 1 (henceforth “HVR”) sequences from several primates, Neanderthals, ancient humans, and modern humans, and assembling the sequences into a multi-FASTA file. So, you’re ready to begin!


Technology, Engineering and Math-Science Academy for Advanced High School Students
Bioinformatics Exploration

Exercise 1 Procedure


1. On your computer’s desktop is a file named “HVR1.txt”. Double-click on the icon to open the text file and examine the sequences. Here’s how to interpret the names of the sequences in the HVR1.txt file. If you visit the “Exercise 1” page on the TEAMS Wiki, you can see photos of each “critter.”

Non-Homo sapiens sapiens sequences are prefixed as follows:

a. pongo = Pongo pygmaeus (orangutan)
b. pan = Pan troglodytes troglodytes (common chimpanzee)
c. panv = Pan troglodytes verus (West African chimpanzee)
d. bono = Pan paniscus (bonobo or pygmy chimpanzee)
e. bigfoot = Bigfoot! (or Sasquatch or the Yeti)
f. nea = Homo sapiens neanderthalensis (Neanderthal)

The rest of the DNA sequences relate to Homo sapiens sapiens of one type or another:

Ancient human sequences have names such as “ancientAUS” = an ancient Australian human.
Modern human sequences are named by country of origin.

Here’s a bit of information about the Homo sapiens DNA sequences in the HVR1.txt file:


g. luke = A sequence obtained from a body reported to be that of the Christian Saint Luke! (2000 years old?) We’re not making this up! Look in GenBank!
h. Vietnam = add brief descriptions + age for the rest of these (can look up name in GenBank; see link on wiki)
i. AncientAUS = 60,000-year-old Mungo Man skeleton unearthed in New South Wales in 1974
j. Syria = add brief descriptions
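The naming scheme above can also be applied programmatically. The following is a minimal sketch, not part of the original exercise: it assumes HVR1.txt is ordinary multi-FASTA text with the sequence name directly after each ">", and the helper names (`read_fasta`, `classify`) and the toy sequences are made up for illustration. Simple prefix matching is naive: a country name that happens to begin with one of the prefixes would be misclassified.

```python
# Hypothetical lookup table built from the prefixes listed in the handout.
PREFIXES = {
    "pongo": "Pongo pygmaeus (orangutan)",
    "panv": "Pan troglodytes verus (West African chimpanzee)",
    "pan": "Pan troglodytes troglodytes (common chimpanzee)",
    "bono": "Pan paniscus (bonobo)",
    "bigfoot": "Bigfoot",
    "nea": "Homo sapiens neanderthalensis (Neanderthal)",
    "ancient": "ancient Homo sapiens sapiens",
    "luke": "Homo sapiens sapiens (body reported to be that of Saint Luke)",
}


def read_fasta(text):
    """Parse multi-FASTA text into a dict mapping sequence name to sequence."""
    records = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Assumption: the name is the first word after ">".
            name = line[1:].split()[0]
            records[name] = ""
        elif name is not None:
            records[name] += line
    return records


def classify(name):
    """Guess what a sequence name refers to from its prefix."""
    lowered = name.lower()
    # Try longer prefixes first so that "panv" is not mistaken for "pan".
    for prefix in sorted(PREFIXES, key=len, reverse=True):
        if lowered.startswith(prefix):
            return PREFIXES[prefix]
    return "modern Homo sapiens sapiens (named by country of origin)"


# Toy input with made-up sequences, only to show the calls.
sample = ">panv1\nACGTAC\nGTTA\n>Vietnam\nACGTACGTTA\n"
for seq_name, seq in read_fasta(sample).items():
    print(seq_name, len(seq), classify(seq_name))
```

To try this on the real file, the text of HVR1.txt could be read with `open(...).read()` and passed to `read_fasta`.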

2. Open the CLUSTALX program on your desktop, then open the HVR1 sequences (browse for the HVR1.txt file in your My Documents folder).

3. Under Alignment -> Output Format Options, change “Output Order” to “Input”, then press the CLOSE button.

4. Under Alignment, select “Do complete alignment”. This process will take a few moments.

5. Next, under Trees, select “Bootstrap N-J Tree”. Make sure 1000 trials will be run, then click on Run/OK. After a few moments, this will produce a bootstrapped NJ tree file, suffixed .phb (we’ll discuss bootstrapping and NJ in lecture).
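As a preview of the lecture, here is a rough sketch of what one bootstrap trial involves: the columns of the alignment are resampled with replacement, and pairwise distances are recomputed from the resampled alignment before a new tree is built. This is an illustration under simplifying assumptions, not a description of ClustalX's internals. The function names and the toy alignment are made up, and the distance used is a plain proportion of differing sites, whereas ClustalX's own distance calculation and options may differ.

```python
import random


def p_distance(seq_a, seq_b):
    """Fraction of aligned, gap-free positions at which two sequences differ."""
    pairs = [(a, b) for a, b in zip(seq_a, seq_b) if a != "-" and b != "-"]
    if not pairs:
        return 0.0
    return sum(a != b for a, b in pairs) / len(pairs)


def bootstrap_replicate(alignment, rng):
    """Resample alignment columns with replacement, keeping each row intact."""
    length = len(next(iter(alignment.values())))
    columns = [rng.randrange(length) for _ in range(length)]
    return {
        name: "".join(seq[i] for i in columns)
        for name, seq in alignment.items()
    }


# Made-up toy alignment (all rows must have the same length).
toy = {
    "human": "ACGTACGTAC",
    "nea": "ACGTACGTTC",
    "pan": "ACGAACCTTC",
}

rng = random.Random(0)  # fixed seed so the sketch is repeatable
replicate = bootstrap_replicate(toy, rng)
for name_a, name_b in [("human", "nea"), ("human", "pan"), ("nea", "pan")]:
    original = p_distance(toy[name_a], toy[name_b])
    resampled = p_distance(replicate[name_a], replicate[name_b])
    print(name_a, name_b, original, resampled)
```

Repeating the resampling many times (1000 in step 5) and building a tree from each replicate shows how often each grouping survives the reshuffling of sites.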

6. Open the HVR1.phb bootstrapped tree using TreeView (double-click on the icon on the desktop, then File -> Open and Browse in My Documents).

• Click on the buttons for the various tree structures (Radial, Slanted Cladogram, Phylogram, etc.).
• What does each graph tell you about the relationships between the DNA sequences?

7. Open the HVR1.phb bootstrapped tree using NJPlot (double-click on the icon on the desktop, then File -> Open and browse in My Documents).

• Toggle display of Bootstrap values. What do the numbers on the clades mean?
• How well supported is each clade based on the bootstrap values?
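One way to think about a bootstrap value is as the share of bootstrap trees in which a particular grouping (clade) appears. The sketch below illustrates that idea with made-up data; the function name and the example clades are hypothetical and are not taken from the HVR1 results. Depending on the program, the value may be shown as a count out of the number of trials or as a percentage.

```python
def clade_support(clade, replicate_clades):
    """Percentage of bootstrap replicate trees that contain the given clade."""
    target = frozenset(clade)
    hits = sum(1 for clades in replicate_clades if target in clades)
    return 100.0 * hits / len(replicate_clades)


# Made-up example: each replicate tree is summarized by the clades it contains.
replicates = [
    {frozenset({"human", "nea"}), frozenset({"pan", "bono"})},
    {frozenset({"human", "nea"}), frozenset({"pan", "pongo"})},
    {frozenset({"human", "pan"}), frozenset({"nea", "bono"})},
    {frozenset({"human", "nea"}), frozenset({"pan", "bono"})},
]

print(clade_support({"human", "nea"}, replicates))  # 3 of 4 replicates
print(clade_support({"pan", "bono"}, replicates))   # 2 of 4 replicates
```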

8. Discussion Questions:

a. How are modern and ancient human sequences related to one another? (Hint: there’s a surprise)
b. What is the placement of Bigfoot relative to humans and other primates? Who are our most recent ancestors?
c. How many diagnostic substitutions differentiate humans from Neanderthals? What about human/Neanderthal shared polymorphisms?
d. Based on the level of variability in the dataset, do you think adding more sequences would increase or decrease the likelihood of supporting a model whereby humans and Neanderthals interbred?
e. Finally, can you briefly suggest a model of human evolution based on this dataset?
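For question c, it may help to pin down the two terms. In the sketch below, a "diagnostic" site is taken to be an alignment column where each group has a single base and the two groups' bases differ, and a "shared polymorphism" is a column where both groups vary and have at least two bases in common. These working definitions, the function names, and the toy sequences are assumptions made for illustration; gaps are not treated specially.

```python
def diagnostic_sites(group_a, group_b):
    """0-based columns where each group is fixed for one base and the bases differ."""
    sites = []
    for i in range(len(group_a[0])):
        bases_a = {seq[i] for seq in group_a}
        bases_b = {seq[i] for seq in group_b}
        if len(bases_a) == 1 and len(bases_b) == 1 and bases_a != bases_b:
            sites.append(i)
    return sites


def shared_polymorphic_sites(group_a, group_b):
    """0-based columns where both groups share at least two different bases."""
    sites = []
    for i in range(len(group_a[0])):
        bases_a = {seq[i] for seq in group_a}
        bases_b = {seq[i] for seq in group_b}
        if len(bases_a & bases_b) >= 2:
            sites.append(i)
    return sites


# Made-up aligned sequences, not taken from HVR1.txt.
humans = ["ACGTA", "ACGTG", "ACGTA"]
neanderthals = ["ATGTA", "ATGTG"]

print(diagnostic_sites(humans, neanderthals))          # column 1: C versus T
print(shared_polymorphic_sites(humans, neanderthals))  # column 4: A/G in both
```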






Exercise 2: 3D Modeling of a Complex Molecular Structure



In this lab, you’ll explore protein structures and how they can be represented graphically. Proteins are not just linear polymers of amino acids; proteins have very specific 3-dimensional structures.

Protein structure is divided into four categories:


1. Primary structure = the actual sequence of amino acids in the polypeptide chain.

2. Secondary structure = the three-dimensional “folded” shape that the polypeptide chain (backbone) assumes, the most common being α (alpha) helices and β (beta) pleated sheets.

3. Tertiary structure = the overall three-dimensional shape of a polypeptide made by interactions of different secondary structures within the protein.

4. Quaternary structure = more than one polypeptide chain interacting to form a multi-subunit structure (e.g., dimers, trimers, etc.).

a. “homo” multimers contain more than one molecule of the same polypeptide.
b. “hetero” multimers contain two or more different polypeptide chains.
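The homo/hetero distinction can be expressed as a small rule over the chains in a structure. The sketch below is illustrative only: the function name is made up, chains are compared by exact sequence identity, and the short sequences are invented.

```python
def describe_quaternary(chains):
    """Name the assembly formed by a list of polypeptide chain sequences."""
    if not chains:
        raise ValueError("at least one chain is required")
    size_names = {1: "monomer", 2: "dimer", 3: "trimer", 4: "tetramer"}
    count = len(chains)
    size = size_names.get(count, f"{count}-mer")
    if count == 1:
        return size
    # All chains identical -> "homo"; any difference -> "hetero".
    kind = "homo" if len(set(chains)) == 1 else "hetero"
    return kind + size


print(describe_quaternary(["MKTAY", "MKTAY"]))           # two identical chains
print(describe_quaternary(["MKTAY", "MAAGL", "MKTAY"]))  # two different chains
```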


Since proteins are large, complex molecules, determining the actual 3-D structure of any given protein can be a very arduous task that can require years of work. However, once the structure of a particular protein is known, it still is difficult for a scientist to visualize these complex molecules. With the advent of computer graphics technology, programs have been developed which use the structural data obtained for a complex protein to create a 3-D image that can be manipulated in silico. In this exercise, we will be using a very new program called Cn3D, available (free) at:

http://www.ncbi.nlm.nih.gov/Structure

THE OBJECTIVES:

1. Understand basic molecular structure.
2. Become familiar with computer visualization of molecular structures.
3. Learn how to derive information from molecular models.


Protein Structure Scavenger Hunt



You can use NCBI’s MMDB protein-structure database and accompanying Cn3D tool to hunt down, explore, and illustrate several types of protein structures. To illustrate, follow along with this first example: hunting for an alpha helix.


1. Go to MMDB at http://www.ncbi.nlm.nih.gov/Structure/

2. In the “Search Entrez Structure/MMDB” box (empty white box near top), enter “protein alpha helix” and click Go.


3. The last time we looked (10/22/07) there were 686 listings of different protein structures. You can pick one of the structures to explore by clicking on its accession number (blue, underlined): for example, the top entry in the list, 2JUW, “Nmr Solution Structure of Homodimer Protein So_2176 From Shewanella oneidensis”.

4. Clicking on an accession number will take you to an MMDB Structure Summary Page. It will tell you some information about the protein structure, by whom it was submitted, and so forth.

5. Now click “View 3D Structure” or click directly on the image.

6. This should pop up the helper application Cn3D. The structure you selected (2JUW in this example, or whichever one you chose) will be displayed in a black window, and the primary sequence(s) of the polypeptide chains will be aligned in a white window. Note that some structures are of only one sequence, some are of more than one (depending on whether a single protein or a complex had its structure solved).

7. The default rendering style (how the graphics are displayed) should be set as “Worms”, and the default coloring style should be set as “Secondary Structure”. You can check those (and/or try other styles) by using the “Style” menu, “Edit Global Style” option in the top menu bar of the structure window. This example will use the default settings.

8. In Worms view with Secondary Structure coloring, strands that are parts of alpha helices are green, loops are blue, and beta sheets are gold. Arrows always point in the N -> C terminal direction.

9. The task at hand is to illustrate an alpha helix. A good way to do this is by exploiting the sequence window (lower). You’ll notice that the amino acid residues are colored the same way as the cartoon view. So, for example, residues that are part of alpha helices are green!

10. In the sequence window, use the mouse to draw a box around the first set of aligned green residues. They will get highlighted.

11. Next, pull the slider bar to the right to see the rest of the sequence.

12. Holding down the shift key, select the other two chunks of alpha-helical sequence.

13. Now, up in the graphic window, select “Show/Hide -> Show Selected Domains”. The cartoon should now change coloring: the residues wrapped around the green cylinders are now yellow! In addition, all the non-selected features should disappear.
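Steps 10 to 12 amount to picking out every contiguous run of helical residues in the sequence window. The same idea can be sketched in code, assuming the secondary structure is written as one letter per residue (H for helix, E for beta strand, C for loop). That encoding, the function name, and the example string are assumptions made for illustration and are not produced by Cn3D in this form.

```python
from itertools import groupby


def helix_segments(secondary_structure):
    """Return (start, end) residue numbers, 1-based and inclusive, of each 'H' run."""
    segments = []
    position = 1
    for code, run in groupby(secondary_structure):
        length = len(list(run))
        if code == "H":
            segments.append((position, position + length - 1))
        position += length
    return segments


# Made-up per-residue assignment with three helical stretches.
example = "CC" + "H" * 6 + "CCC" + "E" * 4 + "CC" + "H" * 5 + "CC" + "H" * 4 + "C"
print(helix_segments(example))
```

Each returned pair corresponds to one block of green residues that would be boxed with the mouse.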

14. NOW your task is to repeat this process for each of the following structures!

• parallel beta sheet
• antiparallel beta sheet
• helix-loop-helix
• beta barrel
• leucine zipper (HIGHLIGHT THE LEUCINES)
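For the leucine zipper, the residues to highlight are the leucines, which in a zipper recur at every seventh position along the helix. A small sketch can help locate them in a sequence before selecting them in the sequence window. The function names, the `min_repeats` threshold, and the test sequence below are made up for illustration; a real zipper should be checked against the structure itself.

```python
def leucine_positions(sequence):
    """1-based positions of every leucine (one-letter code L) in a sequence."""
    return [
        i for i, residue in enumerate(sequence.upper(), start=1)
        if residue == "L"
    ]


def heptad_leucine_runs(sequence, min_repeats=4):
    """1-based start positions where leucines recur every 7th residue."""
    seq = sequence.upper()
    starts = []
    for i, residue in enumerate(seq):
        if residue != "L":
            continue
        # Skip leucines that continue a run begun seven residues earlier.
        if i >= 7 and seq[i - 7] == "L":
            continue
        count = 0
        j = i
        while j < len(seq) and seq[j] == "L":
            count += 1
            j += 7
        if count >= min_repeats:
            starts.append(i + 1)
    return starts


# Invented sequence with a leucine every seven residues.
toy_zipper = "MK" + "LAEKVEE" * 4
print(leucine_positions(toy_zipper))
print(heptad_leucine_runs(toy_zipper))
```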