CISC 4020 Bioinformatics


Oct 1, 2013








Monday, February 14

Lab Exercise #2: Pairwise Sequence Alignment

(due February 22; submit on Blackboard)

Resources:

NCBI BLAST: www.ncbi.nlm.nih.gov/BLAST/

Entrez: www.ncbi.nlm.nih.gov/Entrez/


1) Michael Crichton's fantasy about cloning dinosaurs, Jurassic Park, contains a putative
dinosaur DNA sequence. Use nucleotide-nucleotide BLAST against the “Nucleotide collection”
database to identify the real source of the following sequence:



>DinoDNA "Dinosaur DNA" from Crichton's JURASSIC PARK p. 103 nt 1-1200

GCGTTGCTGGCGTTTTTCCATAGGCTCCGCCCCCCTGACGAGCATCACAAAAATCGACGC

GGTGGCGAAACCCGACAGGACTATAAAGATACCAGGCGTTTCCCCCTGGAAGCTCCCTCG

TGTTCCGACCCTGCCGCTTACCGGATACCTGTCCGCCTTTCTCCCTTCGGGAAGCGTGGC

TGCTCACGCTGTACCTATCTCAGTTCGGTGTAGGTCGTTCGCTCCAAGCTGGGCTGTGTG

CCGTTCAGCCCGACCGCTGCGCCTTATCCGGTAACTATCGTCTTGAGTCCAACCCGGTAA

AGTAGGACAGGTGCCGGCAGCGCTCTGGGTCATTTTCGGCGAGGACCGCTTTCGCTGGAG

ATCGGCCTGTCGCTTGCGGTATTCGGAATCTTGCACGCCCTCGCTCAAGCCTTCGTCACT

CCAAACGTTTCGGCGAGAAGCAGGCCATTATCGCCGGCATGGCGGCCGACGCGCTGGGCT

GGCGTTCGCGACGCGAGGCTGGATGGCCTTCCCCATTATGATTCTTCTCGCTTCCGGCGG

CCCGCGTTGCAGGCCATGCTGTCCAGGCAGGTAGATGACGACCATCAGGGACAGCTTCAA

CGGCTCTTACCAGCCTAACTTCGATCACTGGACCGCTGATCGTCACGGCGATTTATGCCG

CACATGGACGCGTTGCTGGCGTTTTTCCATAGGCTCCGCCCCCCTGACGAGCATCACAAA

CAAGTCAGAGGTGGCGAAACCCGACAGGACTATAAAGATACCAGGCGTTTCCCCCTGGAA

GCGCTCTCCTGTTCCGACCCTGCCGCTTACCGGATACCTGTCCGCCTTTCTCCCTTCGGG

CTTTCTCAATGCTCACGCTGTAGGTATCTCAGTTCGGTGTAGGTCGTTCGCTCCAAGCTG

ACGAACCCCCCGTTCAGCCCGACCGCTGCGCCTTATCCGGTAACTATCGTCTTGAGTCCA

ACACGACTTAACGGGTTGGCATGGATTGTAGGCGCCGCCCTATACCTTGTCTGCCTCCCC

GCGGTGCATGGAGCCGGGCCACCTCGACCTGAATGGAAGCCGGCGGCACCTCGCTAACGG

CCAAGAATTGGAGCCAATCAATTCTTGCGGAGAACTGTGAATGCGCAAACCAACCCTTGG

CCATCGCGTCCGCCATCTCCAGCAGCCGCACGCGGCGCATCTCGGGCAGCGTTGGGTCCT
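
Sequences copied from a handout often pick up stray line breaks or blank lines. The following is a minimal sketch, not part of the assignment, of how a FASTA record could be joined into one string and checked before it is pasted into BLAST. The function names `read_fasta` and `invalid_bases` and the toy record are hypothetical; only the Python standard library is assumed.

```python
def read_fasta(text):
    """Parse FASTA-formatted text into a dict of {header: sequence}."""
    records = {}
    header = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue  # skip blank lines left over from copy/paste
        if line.startswith(">"):
            header = line[1:].strip()
            records[header] = []
        elif header is not None:
            # sequence line: normalize case and collect the fragment
            records[header].append(line.upper())
    return {h: "".join(parts) for h, parts in records.items()}


def invalid_bases(seq, alphabet="ACGTN"):
    """Return the set of characters in seq that are not in alphabet."""
    return set(seq) - set(alphabet)


# Toy record (hypothetical), deliberately broken across lines.
example = """
>toy_record hypothetical example
GCGTTG

CTGGCG
ttt
"""
records = read_fasta(example)
seq = records["toy_record hypothetical example"]
print(len(seq), invalid_bases(seq))  # prints: 15 set()
```

Applied to the record above, the reported length can be compared with the "nt 1-1200" range given in its header line.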



2) Mark Boguski of the NCBI noticed this and supplied Crichton with a better sequence for the
sequel, The Lost World. Identify the most likely source of this sequence using
nucleotide-nucleotide BLAST.


>DinoDNA "Dinosaur DNA" from Crichton's THE LOST WORLD p. 135

GAATTCCGGAAGCGAGCAAGAGATAAGTCCTGGCATCAGATACAGTTGGAGATAAGGACG

GACGTGTGGCAGCTCCCGCAGAGGATTCACTGGAAGTGCATTACCTATCCCATGGGAGCC

ATGGAGTTCGTGGCGCTGGGGGGGCCGGATGCGGGCTCCCCCACTCCGTTCCCTGATGAA

GCCGGAGCCTTCCTGGGGCTGGGGGGGGGCGAGAGGACGGAGGCGGGGGGGCTGCTGGCC

TCCTACCCCCCCTCAGGCCGCGTGTCCCTGGTGCCGTGGGCAGACACGGGTACTTTGGGG

ACCCCCCAGTGGGTGCCGCCCGCCACCCAAATGGAGCCCCCCCACTACCTGGAGCTGCTG

CAACCCCCCCGGGGCAGCCCCCCCCATCCCTCCTCCGGGCCCCTACTGCCACTCAGCAGC

GGGCCCCCACCCTGCGAGGCCCGTGAGTGCGTCATGGCCAGGAAGAACTGCGGAGCGACG

GCAACGCCGCTGTGGCGCCGGGACGGCACCGGGCATTACCTGTGCAACTGGGCCTCAGCC

TGCGGGCTCTACCACCGCCTCAACGGCCAGAACCGCCCGCTCATCCGCCCCAAAAAGCGC

CTGCTGGTGAGTAAGCGCGCAGGCACAGTGTGCAGCCACGAGCGTGAAAACTGCCAGACA

TCCACCACCACTCTGTGGCGTCGCAGCCCCATGGGGGACCCCGTCTGCAACAACATTCAC

GCCTGCGGCCTCTACTACAAACTGCACCAAGTGAACCGCCCCCTCACGATGCGCAAAGAC

GGAATCCAAACCCGAAACCGCAAAGTTTCCTCCAAGGGTAAAAAGCGGCGCCCCCCGGGG

GGGGGAAACCCCTCCGCCACCGCGGGAGGGGGCGCTCCTATGGGGGGAGGGGGGGACCCC

TCTATGCCCCCCCCGCCGCCCCCCCCGGCCGCCGCCCCCCCTCAAAGCGACGCTCTGTAC

GCTCTCGGCCCCGTGGTCCTTTCGGGCCATTTTCTGCCCTTTGGAAACTCCGGAGGGTTT

TTTGGGGGGGGGGCGGGGGGTTACACGGCCCCCCCGGGGCTGAGCCCGCAGATTTAAATA

ATAACTCTGACGTGGGCAAGTGGGCCTTGCTGAGAAGACAGTGTAACATAATAATTTGCA

CCTCGGCAATTGCAGAGGGTCGATCTCCACTTTGGACACAACAGGGCTACTCGGTAGGAC

CAGATAAGCACTTTGCTCCCTGGACTGAAAAAGAAAGGATTTATCTGTTTGCTTCTTGCT

GACAAATCCCTGTGAAAGGTAAAAGTCGGACACAGCAATCGATTATTTCTCGCCTGTGTG

AAATTACTGTGAATATTGTAAATATATATATATATATATATATATCTGTATAGAACAGCC

TCGGAGGCGGCATGGACCCAGCGTAGATCATGCTGGATTTGTACTGCCGGAATTC


3) Viral reverse transcriptases, such as the pol gene product encoded by HIV-1, have human
homologs. The GenBank accession number for HIV-1 reverse transcriptase is NP_057849. (Use
Entrez to confirm the accession number.) A search of Entrez reveals many human viral-related
gene products, including a retrovirus-related Pol polyprotein of 874 amino acid residues
(P10266). Perform a pairwise alignment of the two proteins using the blastp program.


The default conditions for this search include the use of the BLOSUM62 scoring matrix. The
expect value is about 1 x 10^-67, indicating that the proteins are closely related even though
they share only 28% identity over a span of 761 amino acids. Repeat the analysis using the
BLOSUM62, BLOSUM45, and BLOSUM80 scoring matrices. What is the effect of changing the search
parameters?
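
To see why the expect value shifts when the scoring matrix changes, it may help to recall the standard Karlin-Altschul relations: a raw score S is converted to a bit score S' = (lambda * S - ln K) / ln 2, and the expect value is approximately E = m * n * 2^(-S'), where m and n are the query and subject lengths. The sketch below is an illustration under stated assumptions, not BLAST's exact computation: BLAST uses length-corrected "effective" lengths, and lambda and K depend on the matrix and gap penalties. All numbers in the usage lines are hypothetical placeholders; take the real values from your own BLAST report.

```python
import math


def bit_score(raw_score, lam, k):
    """Convert a raw alignment score to a bit score, S' = (lam*S - ln K) / ln 2."""
    return (lam * raw_score - math.log(k)) / math.log(2)


def expect_value(bits, query_len, subject_len):
    """Approximate E = m * n * 2^(-S') using the uncorrected sequence lengths."""
    return query_len * subject_len * 2.0 ** (-bits)


# Hypothetical inputs: replace them with the lambda, K, raw score, and
# sequence lengths shown in your own BLAST output.
bits = bit_score(raw_score=600, lam=0.267, k=0.041)
print(bits, expect_value(bits, query_len=1000, subject_len=874))
```

Because lambda and K are matrix-specific, raw scores obtained with different matrices are not directly comparable; bit scores and expect values are.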


4) Next, perform pairwise alignments of the proteins described in exercise 3 using the PAM30
and PAM70 matrices. What are the expect values? What span of amino acid residues is aligned?
Are the results obtained with different PAM matrices similar to or different from those
obtained with different BLOSUM matrices?
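
The effect of the scoring scheme on the aligned span can be illustrated with a small local-alignment (Smith-Waterman) sketch. This is not what blastp runs internally, and the two scoring functions are toy stand-ins invented for illustration, not PAM or BLOSUM matrices; the linear gap penalty of -4 is likewise an assumption.

```python
def smith_waterman_score(a, b, score, gap=-4):
    """Return the best local alignment score of a and b (linear gap penalty)."""
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0]
        for j in range(1, len(b) + 1):
            cell = max(
                0,                                         # start a new local alignment
                prev[j - 1] + score(a[i - 1], b[j - 1]),   # align a[i-1] with b[j-1]
                prev[j] + gap,                             # gap in b
                curr[j - 1] + gap,                         # gap in a
            )
            curr.append(cell)
            best = max(best, cell)
        prev = curr
    return best


def lenient(x, y):
    """Toy scoring: mild mismatch penalty (tolerates divergence)."""
    return 2 if x == y else -1


def strict(x, y):
    """Toy scoring: harsh mismatch penalty (favors near-identical stretches)."""
    return 2 if x == y else -3


print(smith_waterman_score("ACGT", "ACTT", lenient))  # prints: 5
print(smith_waterman_score("ACGT", "ACTT", strict))   # prints: 4
```

With the lenient scoring the best alignment extends across the mismatch and covers all four positions; with the strict scoring it stops at the two-residue exact match. Stricter matrices tend to shorten the reported span in the same way.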


5) Compare modern human mitochondrial DNA to extinct Neanderthal DNA. First, obtain the
nucleotide sequence of a mitochondrially encoded gene, cytochrome oxidase, in Neanderthals.
Next, perform a BLAST search against the human genome, note the most significant alignment,
and record the percent nucleotide identity.


6) Is a hippopotamus more closely related to a pig or to a whale? To answer this question, first
find the protein sequence of hemoglobin from each of these three organisms. Next, perform
pairwise sequence alignments and record amino acid identities.
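
BLAST reports identities as matching columns over the alignment length. For reference, here is a minimal sketch of that calculation for two rows of an existing alignment. The function name `percent_identity` and the short example strings are hypothetical; they are not real hemoglobin or mitochondrial sequences.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent of alignment columns in which both rows carry the same residue.

    Both inputs must be rows of the same alignment (equal length, gaps as '-').
    Columns with a gap in one row count toward the length but not as matches.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    matches = 0
    columns = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == gap and b == gap:
            continue  # ignore columns that are gaps in both rows
        columns += 1
        if a == b and a != gap:
            matches += 1
    return 100.0 * matches / columns if columns else 0.0


# Hypothetical aligned fragments, for illustration only.
print(percent_identity("MVLSPADK", "MVLSGEDK"))  # prints: 75.0
```

For exercise 6, the same calculation would be repeated for each of the three species pairs and the resulting identities compared.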