Bioinformatics - SCC draft - Fgamedia.org

Oct 1, 2013

Bioinformatics


Story Centered Curriculum


Overview



Bioinformatics provides tools to analyze genomic and proteomic data, from
which mutation, especially in viruses, can be studied in nearly real time. Recent
research [1] has focused on bioinformatics comparisons of current avian influenza
with the infamous pandemic of 1918, facilitating a deeper understanding of flu
evolution, including predictions of an emerging epidemic. HIV, SARS, and avian flu
have all been investigated using phylogenetic tools, especially multiple sequence
alignment (MSA) of genomic and proteomic data. This unit provides the student or
researcher with a step-by-step approach to finding and aligning sequence data from
various viruses, focusing on the relationship between mutation and epidemiology.
The exercise requires knowledge of viral epidemiology and a foundation in
bioinformatics skills as prerequisites. This foundation is provided in the first half
of our bioinformatics class, spanning roughly three assignments. In a phylogenetics
exercise examining HIV, SARS, and then avian flu, students will progress to
‘problem posing’ using data found in public bioinformatics databases (NCBI in
particular). Students will be asked to evaluate the threat of avian flu, whether
avian flu is as serious as the Spanish flu, and what specific resources should be
mobilized to address a possible pandemic. Using bioinformatics tools, it is now
possible to model how bird flu is evolving, where and when it might enter the human
population, and the best strategy for containment.



Scenario: You are working for the World Health Organization (WHO) in summer 2005.
You have access to avian and human flu sequences from 1918, 1934, 1957, 1968, and
1997, and a basic phylogenetics tool for performing multiple sequence alignments. Your
job is to analyze the current, real-time influenza (1997 AI) sequence data from Asia,
compare it to both phylogenetic and epidemiological data from 1918, and realistically
assess the current and future threat of an influenza pandemic in humans. If needed,
you may specify what additional data you need to predict when and where bird flu
might enter the human population, and how best to contain a possible pandemic. You
have only 12 weeks to complete your initial assignment and make both a presentation and
formal recommendations to the United Nations and World Health Organization.


Learning objectives and tasks:


1) Learn the epidemiology and biology of influenza
2) Search and review the 1918 influenza bioinformatics literature (use PNAS)
3) Learn why the influenza pandemic of 1918 was so deadly
4) Perform a multiple sequence alignment (MSA) of HIV and SARS
5) Perform phylogenetic comparisons of HIV and SARS
6) From the literature, determine which flu sequences to compare
7) Get 1918 and other flu sequences from NCBI (and other sources)
8) Format the flu sequences into a single organizing text document
9) Upload the flu sequence document to Biology Workbench and run the sequences (MSA)
10) Make intuitive judgments about outbreaks of flu and antigenic drift and shift
11) Predict from all the above tasks whether avian influenza (AI) will be a pandemic
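
Retrieving sequences from NCBI (task 7) can be partly scripted. The following is a minimal sketch, assuming NCBI's E-utilities efetch service is used for retrieval. The helper name efetch_fasta_url is hypothetical, and the example accession, NC_004718, is the SARS coronavirus RefSeq cited in the SARS exercise of this unit, used only because it is the one accession number this document gives. The sketch builds the request address and does not contact NCBI.

```python
from urllib.parse import urlencode

# Base address of NCBI's E-utilities "efetch" service.
EFETCH_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def efetch_fasta_url(accession, db="protein"):
    """Return the efetch URL that requests one record as FASTA text.

    efetch_fasta_url is a hypothetical helper name, not part of any library.
    """
    params = {
        "db": db,             # e.g. "protein" or "nuccore" (nucleotide)
        "id": accession,      # accession number recorded in your journal
        "rettype": "fasta",   # ask for FASTA format
        "retmode": "text",    # plain text rather than XML
    }
    return EFETCH_BASE + "?" + urlencode(params)


# NC_004718 is the SARS coronavirus RefSeq accession cited in the SARS exercise.
print(efetch_fasta_url("NC_004718", db="nuccore"))
```

Pasting the printed address into a browser, or opening it with urllib.request.urlopen, should return the record as FASTA text that can be saved into the organizing text document of task 8.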

Detail


Foundational:


Learn the NCBI portal structure and the types of information it holds. Follow a
mascot. Do BLAST. Why use NCBI? Because 50% of all genomic information is there.
Navigation is not intuitive. A tip is to write down your travels: write down the names
of articles, keep a journal, and note where you find sequences. Learn phylogenetics
and dendrograms to interpret protein evolution. Understand how proteins are similar
to and different from each other. What is a substitution table (substitution matrix)?
Do some basic MSA.
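
To make the idea of a substitution table concrete, here is a minimal sketch of how one is used to score two aligned protein fragments. Everything in it is an assumption made for illustration: the table values are invented (they are not BLOSUM or PAM values), the gap penalty is arbitrary, and the two fragments are made up.

```python
# Toy substitution table. The values are invented for illustration only;
# real work would use a published matrix such as BLOSUM62.
TOY_SCORES = {
    ("A", "A"): 4, ("L", "L"): 4, ("I", "I"): 4, ("K", "K"): 5,
    ("L", "I"): 2,    # chemically similar residues: mild reward
    ("A", "K"): -1,   # dissimilar residues: penalty
}
DEFAULT_MISMATCH = -2  # score for any pair missing from the toy table
GAP_PENALTY = -4       # score for a residue aligned against a gap ("-")


def pair_score(a, b):
    """Score one aligned column, looking the pair up in either order."""
    if a == "-" or b == "-":
        return GAP_PENALTY
    return TOY_SCORES.get((a, b), TOY_SCORES.get((b, a), DEFAULT_MISMATCH))


def alignment_score(seq1, seq2):
    """Sum the column scores of two already-aligned sequences."""
    if len(seq1) != len(seq2):
        raise ValueError("aligned sequences must have equal length")
    return sum(pair_score(a, b) for a, b in zip(seq1, seq2))


# Hypothetical aligned fragments: A/A, L/I, K/K, then A against a gap.
print(alignment_score("ALKA", "AIK-"))  # 4 + 2 + 5 - 4 = 7
```

An alignment program searches for the arrangement of gaps that maximizes a score of this kind, which is why the choice of substitution table affects the alignment and the dendrogram built from it.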


1) Learning protein alignments
   a. Background to sequence alignment
   b. Substitution tables / substitution matrix
   c. Understanding dendrograms

2) Do the HIV mutation study task (read the Markham paper on HIV mutation)
   a. Study the HIV PPT
   b. Read the Markham paper
   c. Do the HIV upload to Biology Workbench
   d. Perform an MSA, create a dendrogram
   e. Interpret the dendrogram, reflect on the Markham paper

3) Retrovirus comparison of HIV, SIV, HTLV, and STLV
   a. Find the ENV protein for HIV, SIV, HTLV, and STLV
   b. Create a text file with that information
   c. Upload the protein sequences to Biology Workbench (Retro session)
   d. Perform an MSA with the phylogenetics tool, create a series of dendrograms
   e. Determine which of the four viruses has the ‘oldest’ envelope protein

4) Do the SARS exercise (complete it step by step, and do a talk-aloud demo)
   a. Start with the CDC sequence from May 2003
   b. Read the Science Express paper carefully, especially the figures
   c. Download protein sequences for SARS and related viruses
   d. Create a single text file for protein sequences from all viruses
   e. Upload the protein sequences to Biology Workbench (SARS session)
   f. Perform an MSA with the phylogenetics tool, create a series of dendrograms
   g. Interpret the dendrogram as it relates to SARS emerging in civet cats

5) Do influenza
   a. Read the literature on the bioinformatics of influenza
   b. Analyze what has been done before
   c. Choose an experiment to repeat
      i. NA comparison of H5N1 and H1N1
      ii. HA comparison of H5N1 and H1N1
      iii. NA comparison of all recent flu strains
      iv. HA comparison of all recent flu strains
      v. Comparison of all recent H5N1 isolates
   d. From one experiment, record the sequences (accession numbers) to download
   e. Create a text file with that information (combine all the sequence data)
   f. Upload the protein sequences to Biology Workbench (Flu session)
   g. Perform an MSA with the phylogenetics tool, create a series of dendrograms
   h. What can you tell from these data?
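
When interpreting the dendrograms in the tasks above, it helps to see the kind of number a tree is built from. The following is a minimal sketch, assuming a simple "fraction of differing positions" distance between sequences that have already been aligned. It is not the method Biology Workbench uses, and the three aligned fragments and their strain names are hypothetical, not real HA or NA sequences.

```python
from itertools import combinations


def identity_distance(a, b):
    """Fraction of differing columns between two aligned sequences.

    Columns where both sequences have a gap are skipped; a residue
    against a gap counts as a difference.
    """
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    compared = differing = 0
    for x, y in zip(a, b):
        if x == "-" and y == "-":
            continue
        compared += 1
        if x != y:
            differing += 1
    return differing / compared if compared else 0.0


def distance_table(aligned):
    """Return {(name1, name2): distance} for every pair of sequences."""
    return {
        (n1, n2): identity_distance(aligned[n1], aligned[n2])
        for n1, n2 in combinations(sorted(aligned), 2)
    }


# Hypothetical aligned fragments, used only to exercise the functions.
toy = {
    "strain_A": "MKTIIALS",
    "strain_B": "MKTIVALS",
    "strain_C": "MRT-VAFS",
}
table = distance_table(toy)
for pair, dist in sorted(table.items(), key=lambda item: item[1]):
    print(pair, dist)
```

Pairs with the smallest distance are the ones a tree-building method would typically join first, so they should appear as neighbors on the dendrogram.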


SARS exercise


Story centered warm-up.


You are working at the CDC in spring 2003, and the first SARS virus has been
sequenced. Your job is to figure out what it is like, and how it differs from
currently circulating coronaviruses and from other viruses circulating in birds and pigs.


1) Find and/or download the genomic reference sequence (RefSeq) for SARS
   coronavirus (search Genome for SARS; the accession number is NC_004718)
2) View the protein coding regions, and save all proteins in a table in FASTA format.
3) Do the same for the viruses in the Science paper "The Genome Sequence of
   the SARS-Associated Coronavirus" (this is the first published paper with
   the SARS sequence, putative protein assignments, and phylogenetics).

4) You will need to find and download the protein sequences for transmissible
   gastroenteritis virus, SARS coronavirus, porcine epidemic diarrhea virus,
   murine hepatitis virus, human coronavirus 229E, bovine coronavirus, and avian
   infectious bronchitis virus, for the spike, replicase, nucleocapsid, and membrane
   proteins. This exercise is part of the NCBI skill set.

5) Create a document similar to corona_sequence_master.txt, and make sure to
   build the document upside down, in reverse alphanumeric order.
6) Open or use your account at Biology Workbench (http://workbench.sdsc.edu/).
   Create a session for SARS, and upload the protein sequences
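
The master document in step 5 can be assembled by hand or with a short script. The following is a minimal sketch, assuming that "reverse alphanumeric order" means sorting the FASTA records by their header line in descending order, ignoring case. The function names are hypothetical, and the record labels and short sequences are placeholders, not real coronavirus proteins.

```python
def parse_fasta(text):
    """Split FASTA text into a list of (header, sequence) pairs."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def build_master(fasta_texts, width=60):
    """Merge FASTA texts into one document, headers in reverse order."""
    records = []
    for text in fasta_texts:
        records.extend(parse_fasta(text))
    # Assumption: "reverse alphanumeric" = descending, case-insensitive header.
    records.sort(key=lambda rec: rec[0].lower(), reverse=True)
    lines = []
    for header, seq in records:
        lines.append(">" + header)
        # Re-wrap each sequence to a fixed line width.
        lines.extend(seq[i:i + width] for i in range(0, len(seq), width))
    return "\n".join(lines) + "\n"


# Placeholder records; the labels and residues are invented for the demo.
downloads = [
    ">BCoV_spike\nMFLI\nLLIS\n",
    ">SARS_spike\nMFIF\n",
    ">HCoV229E_spike\nMFVL\n",
]
master = build_master(downloads)
print(master)
```

Writing the returned string to a text file gives a single document that can be uploaded to the SARS session in one step.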