Oct 1, 2013

Bioinformatics and Genomics Lab

Dr. Wood

Seattle Pacific University


Purpose:
The goal of this lab is to introduce you to the tools and resources used to extract
information from genome sequence data and to familiarize you with the information resources
that will enable you to apply this information in clinical practice.


Part I: What can we learn from DNA?

Scenario:
You are a laboratory assistant who has just taken a job at the Centers for Disease
Control in Atlanta, GA. Your supervisor has just begun work on a disease outbreak occurring in a
small town in Washington state called Coupeville. A large number of high school athletes have
acquired skin infections that are resistant to standard antibiotic treatment. Clinicians on site are
confident that they have identified the causative agent of the disease and are working hard to
find an effective treatment. Given the size of the outbreak, the CDC has decided to begin an
investigation into the source of the outbreak, which requires further molecular characterization.
You have been given two tasks. First, confirm the identity of the responsible bacterium. Second,
determine the cause of the antibiotic resistance. Once this is complete, the laboratory will
genetically profile all isolates to determine whether one or more individual strains are involved in
this outbreak (you will not do this during this lab).

Procedure:

1. Identify the causative agent using rDNA comparison. Your supervisor has requested that the
laboratory sequence the rDNA of the bacterium isolated from patients in the outbreak. The
sequence has been provided to you below and you must use it to verify the clinical
identification of the agent causing this outbreak.


Find the closest match to your sequence (modified from
http://rdp.cme.msu.edu/assigngen/basicinstr.jsp):



• Go to the Ribosomal Database Project at http://rdp.cme.msu.edu/index.jsp.

• Go to the Sequence Match analysis tool (also called SEQMATCH).

• Paste your unknown rDNA sequence into the text box.

• Change the options below to:

    o Strain: Both
    o Source: Isolates
    o Size: >1200
    o Quality: Good
    o Taxonomy: Nomenclatural
    o KNN matches: 1

• Click on "Submit".

• Go to "view selectable matches." Only the closest match will be displayed. Record the
  genus and species of the closest relative.

Name of organism:______________________________

What diseases are caused by this type of bacterium?
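
To make the idea of a "closest match" concrete, the sketch below scores two sequences by the short words they share. It assumes the scoring rule commonly described for SEQMATCH (unique 7-base words shared by the two sequences, divided by the smaller of the two unique word counts); treat that as an assumption to check against the RDP documentation, not as the tool's actual implementation. The function names and the toy reference sequences are hypothetical.

```python
def unique_kmers(seq: str, k: int = 7) -> set[str]:
    """Return the set of distinct length-k words found in seq."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def sab_score(query: str, reference: str, k: int = 7) -> float:
    """Shared unique k-mers divided by the smaller unique k-mer count."""
    q, r = unique_kmers(query, k), unique_kmers(reference, k)
    smaller = min(len(q), len(r))
    if smaller == 0:  # a sequence shorter than k has no words to compare
        return 0.0
    return len(q & r) / smaller


def closest_match(query: str, references: dict[str, str], k: int = 7) -> tuple[str, float]:
    """Return (name, score) of the best-scoring reference (the KNN = 1 case)."""
    name = max(references, key=lambda n: sab_score(query, references[n], k))
    return name, sab_score(query, references[name], k)


# Toy data only: these reference names and sequences are made up.
toy_refs = {
    "ref_A": "ACGTACGGTCAGTCA",
    "ref_B": "TTTTGGGGCCCCAAAA",
}
best = closest_match("ACGTACGGTCAGTCA", toy_refs)
```

A score of 1.0 means every word in the shorter word set is shared. Real reference databases hold many thousands of sequences, so the web tool remains the practical way to complete this step.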




2. Determine why the organism is resistant to antibiotics. Your supervisor has asked the
laboratory to provide you with the protein sequence encoded by a particular gene that she
believes is involved in the ability of this pathogen to resist antimicrobial therapy. This
sequence is available below. Determine the identity of the gene that encodes this sequence and
investigate the mechanism by which it confers resistance.

a. Identify the unknown gene. Using the BLAST program discussed in lab (see links below), identify
the name of the unknown gene and answer the following questions:

    I. From NCBI BLAST (http://www.ncbi.nlm.nih.gov/blast/) select "protein blast" near the
       center of the page.

    II. Paste your unknown sequence into the box and select the BLAST button near the
        bottom of the screen.

    III. Review the information and answer the following questions:

b. Questions:

    I. What is the e-value of the match between your protein and its best match? What does
       this tell you about these two proteins?

    II. Is your protein identical to the best match? If not, how many of the amino acids are
        exact matches?

    III. What is the name of your unknown protein based on similarity with its best match?

    IV. What is the name of the gene that encodes your protein?
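
The two numbers asked about above can be sketched in a few lines. BLAST reports identities as exact matches over the alignment length. An e-value is the number of alignments scoring at least this well that would be expected by chance in a search of that size. The sketch uses the standard relation E = m * n * 2^(-S'), where S' is the bit score, m the query length and n the database length. BLAST itself uses corrected "effective" lengths, so values computed this way are only approximate; the function names are hypothetical.

```python
def identity(aligned_a: str, aligned_b: str) -> tuple[int, int, float]:
    """Return (exact matches, alignment length, percent identity).

    Both inputs are rows of one pairwise alignment, with '-' marking gaps.
    """
    if not aligned_a or len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must be non-empty and equal in length")
    pairs = zip(aligned_a.upper(), aligned_b.upper())
    matches = sum(1 for x, y in pairs if x == y and x != "-")
    length = len(aligned_a)
    return matches, length, 100.0 * matches / length


def expected_hits(bit_score: float, query_len: int, db_len: int) -> float:
    """Approximate e-value from a bit score: E = m * n * 2**(-S')."""
    return query_len * db_len * 2.0 ** (-bit_score)


# Toy alignment: one mismatch in five aligned positions.
matches, length, percent = identity("MKKIK", "MKRIK")
```

Because the bit score sits in the exponent, each additional bit halves the expected number of chance hits, which is why strong matches have e-values very close to zero.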



c. Investigate protein domains. Using the PFAM program below, investigate any protein domains
in your unknown protein to help you determine its function.

    I. From the PFAM programs at the Sanger center (http://pfam.sanger.ac.uk/) select the
       Sequence Search link.

    II. Paste your unknown sequence into the box and hit go.

    III. Review the information and answer the following questions.
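
When reviewing the results, note that each domain match comes with an e-value and that smaller e-values indicate more significant matches. The sketch below shows one way to pick the best-scoring matches from a results table once you have copied it down. The record layout, domain names, positions and e-values are all placeholders invented for illustration; they are not Pfam output.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DomainHit:
    name: str      # domain family name as shown in the results table
    start: int     # first residue of the match in the query protein
    end: int       # last residue of the match in the query protein
    evalue: float  # smaller means a more significant match


def best_domains(hits: list[DomainHit], n: int = 3) -> list[DomainHit]:
    """Return the n hits with the smallest e-values, most significant first."""
    return sorted(hits, key=lambda h: h.evalue)[:n]


# Placeholder records only; replace them with the rows from your own search.
hits = [
    DomainHit("domain_A", 30, 140, 2e-5),
    DomainHit("domain_B", 150, 300, 1e-40),
    DomainHit("domain_C", 330, 640, 3e-80),
    DomainHit("domain_D", 10, 25, 0.5),
]
top = [h.name for h in best_domains(hits)]
```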

d. Questions:

    I. What is a protein domain?

    II. Evaluate the three best scoring domains. List each below along with the e-value of the
        match and briefly describe its function.

        i.

        ii.

        iii.



e. Further investigation. You should work on this section at home. You will need to research the
function of the gene you identified using internet or other sources. Feel free to work with your
lab partner or in groups to answer these questions.

    I. What class of antibiotics would you expect this pathogen to be resistant to?

    II. How does this general type of protein work in normal cells?

    III. How does the presence of this protein provide resistance to antibiotics?

    IV. What antibiotics might work to treat this disease?



Part II: The human genome and disease.

In this part of the laboratory you will investigate the link
between genetic alterations in the human genome and disease. Use the information found at
http://www.ncbi.nlm.nih.gov/disease/ and other sites you locate on the internet to answer the
following questions.


Questions:

1. What gene or genes are mutated in patients with Cystic Fibrosis (CF)?

2. How many mutations are associated with CF?

3. Which chromosome contains the gene whose alteration leads to CF?

4. What specific microorganisms are commonly associated with this disease?

5. How is this disease treated?

6. Select one other genetic disease (it does not need to be microbial in nature). Note the
   chromosome or chromosomes involved, the gene or genes involved, and the specific
   mutation or mutations, and briefly review the symptoms and treatment for the disease.





Resources:

Your unknown rDNA sequence:

TTTTATGGAGAGTTTGATCCTGGCTCAGGATGAACGCTGGCGGCGTGCCTAATACATGCAAGTCGAGCGAACGGACG
AGAAGCTTGCTTCTCTGATGTTAGCGGCGGACGGGTGAGTAACACGTGGATAACCTACCTATAAGACTGGGATAACT
TCGGGAAACCGGAGCTAATACCGGATAATATTTTGAACCGCATGGTTCAAAAGTGAAAGACGGTCTTGCTGTCACTA
TAGATGGATCCGCGCTGCATTAGCTAGTTGGTAAGGTAACGGCTTACCAAGGCAACGATGCATAGCCGACCTGAGAG
GGTGATCGGCCACACTGGAACTGAGACACGGTCCAGACTCCTACGGGAGGCAGCAGTAGGGAATCTTCCGCAATGGG
CGAAAGCCTGACGGAGCAACGCCGCGTGAGTGATGAAGGTCTTCGGATCGTAAAACTCTGTTATTAGGGAAGAACAT
ATGTGTAAGTAACTGTGCACATCTTGACGGTACCTAATCAGAAAGCCACGGCTAACTACGTGCCAGCAGCCGCGGTA
ATACGTAGGTGGCAAGCGTTATCCGGAATTATTGGGCGTAAAGCGCGCGTAGGCGGTTTTTTAAGTCTGATGTGAAA
GCCCACGGCTCAACCGTGGAGGGTCATTGGAAACTGGAAAACTTGAGTGCAGAAGAGGAAAGTGGAATTCCATGTGT
AGCGGTGAAATGCGCAGAGATATGGAGGAACACCAGTGGCGAAGGCGACTTTCTGGTCTGTAACTGACGCTGATGTG
CGAAAGCGTGGGGATCAAACAGGATTAGATACCCTGGTAGTCCACGCCGTAAACGATGAGTGCTAAGTGTTAGGGGG
TTTCCGCCCCTTAGTGCTGCAGCTAACGCATTAAGCACTCCGCCTGGGGAGTACGACCGCAAGGTTGAAACTCAAAG
GAATTGACGGGGACCCGCACAAGCGGTGGAGCATGTGGTTTAATTCGAAGCAACGCGAAGAACCTTACCAAATCTTG
ACATCCTTTGACAACTCTAGAGATAGAGCCTTCCCCTTCGGGGGACAAAGTGACAGGTGGTGCATGGTTGTCGTCAG
CTCGTGTCGTGAGATGTTGGGTTAAGTCCCGCAACGAGCGCAACCCTTAAGCTTAGTTGCCATCATTAAGTTGGGCA
CTCTAAGTTGACTGCCGGTGACAAACCGGAGGAAGGTGGGGATGACGTCAAATCATCATGCCCCTTATGATTTGGGC
TACACACGTGCTACAATGGACAATACAAAGGGCAGCGAAACCGCGAGGTCAAGCAAATCCCATAAAGTTGTTCTCAG
TTCGGATTGTAGTCTGCAACTCGACTACATGAAGCTGGAATCGCTAGTAATCGTAGATCAGCATGCTACGGTGAATA
CGTTCCCGGGTCTTGTACACACCGCCCGTCACACCACGAGAGTTTGTAAACCCGAAGCCGGTGGAGTAACCTTTTAG
GAGCTAGCCGTCGAAGGTGGGACAAATGATTGGGGTGAAGTCGTAACAAGGTAGCCGTATCGGAAGGTGCGGCTGGA
TCACCTCCTTTCT
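
Copying a long sequence out of a document can introduce stray line breaks or characters. As an optional check before pasting, the sketch below joins the lines and confirms that only A, C, G and T remain. The function name is hypothetical, and the fragment shown is just the first 40 bases of the sequence above, split across two lines for illustration.

```python
def clean_dna(raw: str) -> str:
    """Strip all whitespace, uppercase, and verify the DNA alphabet."""
    seq = "".join(raw.split()).upper()
    bad = sorted(set(seq) - set("ACGT"))
    if bad:
        raise ValueError(f"unexpected characters in DNA sequence: {bad}")
    return seq


# First 40 bases of the rDNA sequence, deliberately broken across lines.
fragment = """TTTTATGGAGAGTTTGATCC
TGGCTCAGGATGAACGCTGG"""
seq = clean_dna(fragment)
length = len(seq)
```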


Sequence of unknown protein provided by your supervisor:

mkkikivpli livvvvgfgi yfyaskdkei nntiwaiedk nfkqvykdss yisksdngev
emmereikiy nslgvkdini qdrkikkvsk nkkrvdaqyk iktnygnidr nvqfnfvked
gmwkldwdhs viipgmqkdq sihienlkse rgkiwdrnnv elantgtaye igivpknvsk
kdywaiakel iieedyikqq mqqawvqddt fvplktvkkm deylsdfakk fhlttnetes
rnyplgkats hllgyvgpin seelkqkeyk gykddavigk kgleklydkk lqhedgyrvt
ivddnsntia htliekkkkd gkdiqltida kvqksiynnm kndygsgtai hpqtgellal
vstpsydvyp fmygmsneey nvltedkkep llnkfqitts pgstqkilta miglnnktld
dktsykidgk wwqkdkswgg ynqtryevvn gniqlqqaie ssdfiffarv alelgskkfe
kgmkklgvge diesdypfyn aqisnknldn eilladsgyg qgeilinpvq ilsiysalen
ngninaphll kdtknkvwkk niiskeninl ltmgmmqvvn kthkediyrs yanligksgt
grqigwfisy dkdnpnmmma invkdvqdkg masynakisg kvydelyeng nkkydide
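
The protein sequence above is laid out in lowercase blocks of ten residues. Web search forms usually accept it as pasted, but if you prefer to submit FASTA-formatted text, the sketch below shows one way to convert it. The function name and the header text "unknown_protein" are placeholders, and the check against the 20 standard amino acid letters is an assumption about what this sequence should contain.

```python
AMINO_ACIDS = set("ACDEFGHIKLMNPQRSTVWY")  # the 20 standard residues


def to_fasta(raw: str, header: str = "unknown_protein", width: int = 60) -> str:
    """Drop spaces, digits and line breaks, then emit a FASTA record."""
    residues = "".join(ch for ch in raw if ch.isalpha()).upper()
    bad = sorted(set(residues) - AMINO_ACIDS)
    if bad:
        raise ValueError(f"unexpected residue letters: {bad}")
    lines = [residues[i:i + width] for i in range(0, len(residues), width)]
    return "\n".join([f">{header}"] + lines)


# First two blocks of the sequence, with a position number as GenBank prints it.
record = to_fasta("  1 mkkikivpli livvvvgfgi\n", header="x")
```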