Bioinformatics of viral genomes - BIDD

Oct 2, 2013 (3 years and 9 months ago)

LSM3241: Bioinformatics and Biocomputing


Lecture 2: Bioinformatics of viral genomes


Prof. Chen Yu Zong


Tel: 6874-6877

Email: csccyz@nus.edu.sg

http://xin.cz3.nus.edu.sg

Room 07-24, level 7, SOC1,

National University of Singapore


Resource of Viral Genomes

NCBI Genome Database
http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?CMD=Search&DB=genome

2,226 viral genome entries (1,524 distinct virus strains) in the
database. Early 2005 figures: 1,250 entries and 1,022 distinct strains.

1,193 complete viral genome entries. Early 2005 figure: 900.

12 entries of coronavirus genomes (8 in early 2005)

16 entries of influenza H5N1 genomes

Information on viral genomes in the database can also be retrieved by
clicking the Viruses link.

List of viral genomes: (1,927 entries in Jan 2006, 1,461 in Jan 2005)

Viral taxonomy groups:

Viral genome list:

Bioinformatics of Viral Genomes

Viral name link:

Viral genome link

All entries

Viral protein link:

Limit to title search

SARS coronavirus PP1ab PID link: it returns multiple entries from different
strains or from related species.

Viral strain

Different strains of SARS coronavirus

Note: A viral polyprotein is not a single functional protein; it is a
precursor that is cleaved into several mature proteins. Information about
these proteins can be difficult to read from the entry.

Suggestion: Look at the latest NCBI entry for the same virus from a
reputable research group.

SARS coronavirus unknown sars3a PID link:

An alternative way to find the SARS coronavirus genome: look for the latest
entry with a complete genome and good functional annotation. Not all entries
have both.
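
The search for complete-genome entries can also be scripted. The following is a minimal sketch, assuming the NCBI E-utilities ESearch endpoint and the nuccore database; the query term, the organism name and the function name build_genome_search_url are illustrative assumptions, not the search used in the lecture. It only builds the request URL, so the hits still have to be checked by hand for annotation quality.

```python
from urllib.parse import urlencode

# Base address of the NCBI E-utilities web service.
EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"


def build_genome_search_url(organism: str, max_records: int = 20) -> str:
    """Build an ESearch URL for complete-genome nucleotide records.

    The search term is an assumption for illustration: it restricts the
    organism field and asks for "complete genome" in the record title.
    """
    term = f'"{organism}"[Organism] AND "complete genome"[Title]'
    params = {
        "db": "nuccore",       # nucleotide database
        "term": term,          # Entrez query string
        "retmax": max_records, # maximum number of IDs to return
        "retmode": "json",     # ask for a JSON reply
    }
    return f"{EUTILS_BASE}/esearch.fcgi?{urlencode(params)}"


if __name__ == "__main__":
    print(build_genome_search_url("SARS coronavirus"))
```

The returned ID list says nothing about how well an entry is annotated, so each candidate entry still needs to be opened and inspected.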

The latest good entry: AY572038, SARS coronavirus civet020, complete genome
(in Jan 2005 it was AY310120, SARS coronavirus FRA).
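
Once an accession number is known, the full entry can be downloaded as a GenBank flat file. The sketch below assumes the NCBI E-utilities EFetch endpoint; the function names are hypothetical, and fetch_genbank_record needs network access, so only the URL builder is exercised here.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

# Base address of the NCBI E-utilities web service.
EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"


def build_genbank_fetch_url(accession: str) -> str:
    """Build an EFetch URL that returns one record as GenBank flat-file text."""
    params = {
        "db": "nuccore",    # nucleotide database
        "id": accession,    # accession number, e.g. AY572038
        "rettype": "gb",    # GenBank flat-file format
        "retmode": "text",  # plain text rather than XML
    }
    return f"{EUTILS_BASE}/efetch.fcgi?{urlencode(params)}"


def fetch_genbank_record(accession: str) -> str:
    """Download the record text (requires network access)."""
    with urlopen(build_genbank_fetch_url(accession)) as response:
        return response.read().decode("utf-8")


if __name__ == "__main__":
    print(build_genbank_fetch_url("AY572038"))
```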

SARS Coronavirus Genome

You are expected to find the information about each gene (genome location,
sequence, function).

Function of SARS Coronavirus Genes

Where to find the proteins in the genome entry?

Source 1: mat_peptide

Protein name

Putative 3C-like protease mat_peptide link:

Protein name

Protein function

Source 2: CDS

Protein name

Nucleocapsid protein protein_id link:

Protein name
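
The two sources above (mat_peptide and CDS) are feature keys in the FEATURES table of a GenBank flat file, so the proteins can be listed with a small parser. The following is a minimal sketch: the function name parse_features is hypothetical, the toy table uses made-up coordinates and names (it is not taken from any real entry), and qualifier values that continue over several lines (such as /translation) are only captured up to the end of their first line.

```python
import re

# A feature line starts with exactly five spaces, then the feature key
# and its location. Qualifier lines are indented further and start with "/".
FEATURE_LINE = re.compile(r"^ {5}(\S+)\s+(\S+)\s*$")
QUALIFIER_LINE = re.compile(r"^\s+/(\w+)=(.*)$")


def parse_features(feature_text, wanted=("CDS", "mat_peptide")):
    """Return the wanted features with their location and qualifiers."""
    features = []
    current = None
    for line in feature_text.splitlines():
        feature = FEATURE_LINE.match(line)
        if feature:
            current = {"key": feature.group(1),
                       "location": feature.group(2),
                       "qualifiers": {}}
            if current["key"] in wanted:
                features.append(current)
            continue
        qualifier = QUALIFIER_LINE.match(line)
        if qualifier and current is not None:
            name, value = qualifier.groups()
            current["qualifiers"][name] = value.strip().strip('"')
    return features


# Toy feature table: coordinates, names and IDs are invented for illustration.
TOY_TABLE = """\
     CDS             10..60
                     /product="toy polyprotein"
                     /protein_id="TOY00001.1"
     mat_peptide     10..30
                     /product="toy protease"
     gene            70..90
                     /gene="toyX"
"""

if __name__ == "__main__":
    for f in parse_features(TOY_TABLE):
        print(f["key"], f["location"], f["qualifiers"].get("product"))
```

For real entries with joined locations and multi-line qualifiers, an established parser such as Biopython's SeqIO with the "genbank" format is likely a safer choice than this sketch.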

How to find the name or function of a putative protein in a genome?

Medline keyword search

Google search

What if the function of a putative protein is unknown?

Sequence alignment (BLAST, PSI-BLAST). This will be further discussed in
lecture 4.

Motif analysis (conduct a PROSITE motif search).

If sequence analysis fails or is in doubt, try a machine learning method
(SVMProt, Nucleic Acids Res., 31: 3692-3697; ProtFun, Bioinformatics,
19: 635-642). This will be studied in lecture 5.
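
To illustrate what a motif search does, the sketch below converts a PROSITE-style pattern into a regular expression and scans a protein sequence with it. It is a simplified illustration, not the PROSITE scanning tool itself: the function names are hypothetical, and only the basic pattern elements are handled (single residues, x, [..], {..}, repeat counts and the < and > anchors). The example pattern N-{P}-[ST]-{P} is the PROSITE N-glycosylation site pattern; the test sequence is a made-up string.

```python
import re

# One pattern element: optional "<" anchor, a core (x, a residue, an allowed
# set [..] or a forbidden set {..}), an optional repeat count, optional ">".
ELEMENT = re.compile(
    r"(<?)(x|[A-Z]|\[[A-Z]+\]|\{[A-Z]+\})(?:\((\d+)(?:,(\d+))?\))?(>?)"
)


def prosite_to_regex(pattern: str) -> str:
    """Translate a PROSITE-style pattern into a Python regular expression."""
    parts = []
    for element in pattern.rstrip(".").split("-"):
        m = ELEMENT.fullmatch(element)
        if m is None:
            raise ValueError(f"unsupported pattern element: {element!r}")
        start, core, low, high, end = m.groups()
        if core == "x":                 # x means any residue
            core = "."
        elif core.startswith("{"):      # {P} means any residue except P
            core = "[^" + core[1:-1] + "]"
        repeat = ""
        if low:                         # (2) -> {2}, (2,4) -> {2,4}
            repeat = "{" + low + ("," + high if high else "") + "}"
        parts.append(("^" if start else "") + core + repeat + ("$" if end else ""))
    return "".join(parts)


def find_motif(sequence: str, pattern: str):
    """Return (0-based start, matched text) for every hit, overlaps included."""
    regex = re.compile("(?=(" + prosite_to_regex(pattern) + "))")
    return [(m.start(), m.group(1)) for m in regex.finditer(sequence)]


if __name__ == "__main__":
    print(prosite_to_regex("N-{P}-[ST]-{P}"))     # N[^P][ST][^P]
    print(find_motif("MNGTA", "N-{P}-[ST]-{P}"))  # [(1, 'NGTA')]
```

Short patterns such as this one match by chance very often, so a hit is only a hint about function and should be weighed together with the alignment results.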


Drug design:

Step 1: Finding the right target in the genome

A key protein involved in the viral cycle (stop the disease process)

Different from human proteins (reduce side-effects)

Step 2: Finding or making a chemical agent to stop the protein

In the majority of cases: protein inhibitors

Step 3: Testing and clinical trials
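
The two criteria of Step 1 can be written down as a simple filter. This is only a sketch under stated assumptions: the Candidate fields, the function shortlist_targets, the 0.3 identity cutoff and the example proteins are all hypothetical placeholders, and the identity values would in practice come from a sequence comparison against human proteins.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    name: str
    essential_for_viral_cycle: bool  # criterion 1: key protein in the viral cycle
    max_identity_to_human: float     # criterion 2: best identity (0..1) to any human protein


def shortlist_targets(candidates, identity_cutoff=0.3):
    """Keep essential proteins that are dissimilar to human proteins.

    The cutoff is an arbitrary illustrative value, not a recommended one.
    The result is sorted so the least human-like protein comes first.
    """
    kept = [c for c in candidates
            if c.essential_for_viral_cycle
            and c.max_identity_to_human < identity_cutoff]
    return sorted(kept, key=lambda c: c.max_identity_to_human)


if __name__ == "__main__":
    # Invented example values, for illustration only.
    proteins = [
        Candidate("protein A", True, 0.12),
        Candidate("protein B", True, 0.55),
        Candidate("protein C", False, 0.05),
    ]
    print([c.name for c in shortlist_targets(proteins)])  # ['protein A']
```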

SARS Drug design:

The target: 3C-like protease

SARS Drug design:


Inhibitor design: Finding inhibitors of similar proteins, such as
those of the same name (3C-like proteases or 3C proteases of
other species), may offer clues to inhibitor design.

Search from NCBI

Search from NCBI finds 19 references.
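
A literature search of this kind can be repeated programmatically. The sketch below assumes the NCBI E-utilities ESearch endpoint with the pubmed database and a JSON reply; the exact query used in the lecture is not given, so the search term here is an assumption, the function names are hypothetical, and the sample reply contains made-up IDs. The number of hits today will differ from the figure above.

```python
import json
from urllib.parse import urlencode

# Base address of the NCBI E-utilities web service.
EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"


def build_inhibitor_term(protein_name: str) -> str:
    """Build an illustrative PubMed query for inhibitors of a named protein."""
    return f'"{protein_name}"[Title/Abstract] AND inhibitor[Title/Abstract]'


def build_pubmed_search_url(protein_name: str, max_records: int = 50) -> str:
    """Build an ESearch URL that returns matching PubMed IDs as JSON."""
    params = {
        "db": "pubmed",
        "term": build_inhibitor_term(protein_name),
        "retmax": max_records,
        "retmode": "json",
    }
    return f"{EUTILS_BASE}/esearch.fcgi?{urlencode(params)}"


def extract_pmids(esearch_json: str):
    """Return (total hit count, list of PubMed IDs) from an ESearch JSON reply."""
    result = json.loads(esearch_json)["esearchresult"]
    return int(result["count"]), list(result["idlist"])


if __name__ == "__main__":
    print(build_pubmed_search_url("3C-like protease"))
    # Made-up reply used only to show the parsing step.
    toy_reply = '{"esearchresult": {"count": "2", "idlist": ["111", "222"]}}'
    print(extract_pmids(toy_reply))
```

Each returned ID still has to be opened and read, since the search cannot tell whether an abstract names an inhibitor of the right protein.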

Check each abstract to find the name of one or more inhibitors.

Be prepared to read the full paper to find inhibitors

Make sure the paper discusses inhibitors of the right protein.

This one actually discusses inhibitors of a protease family, and thus may
not necessarily be suitable for the SARS 3C-like protease.

Search from Google finds numerous entries.

Check each entry to find the name of one or more inhibitors.

Be prepared to read the full paper to find inhibitors

Design of SARS 3C-like protease inhibitors using rhinovirus 3C-like protease
inhibitors as templates

Summary of today's lecture

Genome database at NCBI

Viral genomes

SARS coronavirus genome as an example

Finding proteins from a genome

Therapeutic target identification from a genome and inhibitor design