Tutorial - Protein Information Resource - Georgetown University




Bio-Trac 25 (Proteomics: Principles and Methods)

March 24, 2006


Zhang-Zhi Hu, M.D.

Senior Bioinformatics Scientist, Protein Information Resource

Research Assistant Professor, Department of Biochemistry and Molecular Biology

Georgetown University Medical Center

Tutorial: Bioinformatics Resources

(http://pir.georgetown.edu/~huz/class/bioinfo_resource.html)

2

computer (information) + mouse (biology) = bioinformatics

NIH Biomedical Information Science and Technology Initiative (BISTI) Working Definition (2000):

Research, development, or application of computational tools and approaches for expanding the use of biological, medical, behavioral or health data, including those to acquire, store, organize, archive, analyze, or visualize such data.

What is Bioinformatics?

3

Molecular Biology Database Collection -- 858 key databases in 15 categories

(http://nar.oxfordjournals.org/cgi/content/full/34/suppl_1/D3/DC1)

4

Database Collection in Nucleic Acids Res.

5

http://pir.georgetown.edu/~huz/class/2005_database_update.html

Online Access to Database Collection

http://www.oxfordjournals.org/nar/database/cap/


2006

6

Overview

I. Text search / Information retrieval

II. Sequence & genomics databases

III. Protein family databases

IV. Databases of protein functions

V. Databases of protein structures

VI. Proteomics databases

Database Contents, Search and Retrieval

7

Entrez Text Searches

(http://www.ncbi.nlm.nih.gov/Entrez/)


8

PubMed Literature Database

(http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?CMD=Search&DB=PubMed)

9

UniProt Text Search

(http://www.pir.uniprot.org/cgi-bin/textSearch)

Google-type search vs. Boolean searches: AND, OR, NOT
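
The difference between the three Boolean operators can be illustrated with a small sketch. The records below are a toy, hypothetical stand-in for a protein database (the entry names appear elsewhere in this tutorial, the descriptions are abbreviated for illustration), and the `search` function is an assumption made for demonstration only. It is not how the UniProt or PIR text search engines are implemented.

```python
import re

# Hypothetical mini-database: entry name -> short free-text description.
RECORDS = {
    "CRYAA_RABIT": "Alpha crystallin A chain",
    "ARLY2_ANAPL": "Delta crystallin II (Argininosuccinate lyase)",
    "CRGE_RAT": "Gamma crystallin",
}


def tokens(text):
    """Return the set of lower-case words in a description."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def search(all_of=(), any_of=(), none_of=()):
    """Return entry names whose description satisfies the Boolean query.

    all_of  -> terms joined by AND (every term must be present)
    any_of  -> terms joined by OR  (at least one term must be present)
    none_of -> terms excluded by NOT (no term may be present)
    """
    hits = []
    for name, text in RECORDS.items():
        words = tokens(text)
        if not all(t.lower() in words for t in all_of):
            continue
        if any_of and not any(t.lower() in words for t in any_of):
            continue
        if any(t.lower() in words for t in none_of):
            continue
        hits.append(name)
    return sorted(hits)


# crystallin AND alpha
print(search(all_of=["crystallin", "alpha"]))
# alpha OR gamma
print(search(any_of=["alpha", "gamma"]))
# crystallin NOT lyase
print(search(all_of=["crystallin"], none_of=["lyase"]))
```

AND narrows a result set, OR widens it, and NOT removes entries from it. A Google-type search usually leaves that combination implicit, whereas a Boolean search makes it explicit.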

10

PIR Text Search (I)

(http://pir.georgetown.edu/pirwww/search/textsearch.html)

Search: Alpha crystallin A chain and protein family?

11

PIR Text Search (II)

Can you find which crystallin has had its 3D structure determined?

Search: Crystallins that are enzymes?

12

II. Sequence & Genomics Databases

GenBank: An annotated collection of all publicly available nucleotide and protein sequences.

RefSeq: NCBI non-redundant set of reference sequences, including genomic DNA, transcript (RNA), and protein products.

UniProt Consortium Database: Universal protein knowledgebase, a central resource of protein sequence and function from Swiss-Prot, TrEMBL and PIR.

Entrez Gene: Gene-centered information at NCBI.

UniGene: Unified clusters of ESTs and full-length mRNA sequences.

OMIM: Online Mendelian Inheritance in Man: a catalog of human genetic and genomic disorders.

Model Organism Genome Databases: MGD, RGD, SGD, FlyBase…

GeneCards: Integrated database of human genes, maps, proteins and diseases.

SNP Consortium Database

13

UniProt Consortium Databases

(http://www.uniprot.org)

2.85 million

Universal Protein Resource

UniProtKB
UniRef
UniParc

14

UniProt Sequence Report (I)

(http://www.pir.uniprot.org/cgi-bin/unipEntry?id=CRYAA_RABIT)

What’s the difference between
CRYAA_RABIT & CYRBAA?

15

UniProt Sequence Report (II)

(http://www.pir.uniprot.org/cgi-bin/unipEntry?id=UniRef100_P02489)

(http://www.pir.uniprot.org/cgi-bin/unipEntry?id=UniRef90_P02489)

16

Entrez Gene

http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=gene&cmd=Retrieve&dopt=Graphics&list_uids=12954#ubor0_RefSeq

17

OMIM: Online Mendelian Inheritance in Man

(http://www.ncbi.nlm.nih.gov/entrez/dispomim.cgi?id=123580)

18

III. Protein Family Databases

Whole Proteins

PIRSF: A Network Classification System of Protein Families
COG (Clusters of Orthologous Groups) of Complete Genomes
ProtoNet: Automated Hierarchical Classification of Proteins

Protein Domains

Pfam: Alignments and HMM Models of Protein Domains
SMART: Protein Domain Families
CDD: Conserved Domain Database

Protein Motifs

PROSITE: Protein Patterns and Profiles
BLOCKS: Protein Sequence Motifs and Alignments
PRINTS: Protein Sequence Motifs and Signatures

Integrated Family Databases

iProClass: Superfamilies/Families, Domains, Motifs, Rich Links
InterPro: Integrates Pfam, PRINTS, PROSITE, ProDom, SMART, PIRSF, SUPERFAMILY

19

Protein Clustering

COGs:

(http://www.ncbi.nlm.nih.gov/COG/)

20

KOGs: Eukaryotic Clusters

(http://www.ncbi.nlm.nih.gov/COG/new/shokog.cgi?KOG3591)

21

Domain Classification

(http://pir.georgetown.edu/cgi-bin/ipcEntry?id=CRYAA_RABIT)

(http://www.sanger.ac.uk/cgi-bin/Pfam/swisspfamget.pl?name=CRYAA_RABIT)

22

Pfam Domain

(http://www.sanger.ac.uk/cgi-bin/Pfam/getacc?PF00525)

23

Integrated Family Classification

InterPro: An integrated resource unifying PROSITE, PRINTS, ProDom, Pfam, SMART, TIGRFAMs, and PIRSF.

(http://www.ebi.ac.uk/interpro/search.html)

24

PIRSF: Full Length Classification

iProClass Family Report

(http://pir.georgetown.edu/cgi-bin/ipcSF?id=SF002280)

25

Protein Motifs


PROSITE is a database of protein families and domains. It consists of biologically significant sites, patterns and profiles.

(http://us.expasy.org/prosite/)
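
To make the idea of a PROSITE pattern concrete, here is a minimal sketch that translates PROSITE pattern syntax into a Python regular expression and scans a sequence with it. It relies on the published pattern conventions: elements separated by `-`, `x` for any residue, `[...]` for allowed residues, `{...}` for excluded residues, `(n)` or `(n,m)` for repeats, and `<` / `>` for terminal anchors. The function names and the test sequence are hypothetical. The sketch does not handle rarer forms such as anchors inside brackets, and it is not the scanning software used by PROSITE itself.

```python
import re


def prosite_to_regex(pattern):
    """Translate a PROSITE pattern string into a Python regex string."""
    regex = []
    for element in pattern.rstrip(".").split("-"):
        prefix = suffix = ""
        if element.startswith("<"):   # pattern anchored at the N-terminus
            prefix, element = "^", element[1:]
        if element.endswith(">"):     # pattern anchored at the C-terminus
            suffix, element = "$", element[:-1]
        # Split off an optional repeat count such as (2) or (2,4).
        core, low, high = re.fullmatch(
            r"(.+?)(?:\((\d+)(?:,(\d+))?\))?", element
        ).groups()
        if core == "x":               # any amino acid
            core = "."
        elif core.startswith("{"):    # any amino acid except those listed
            core = "[^" + core[1:-1] + "]"
        repeat = ""
        if low is not None:
            repeat = "{" + low + ("," + high if high else "") + "}"
        regex.append(prefix + core + repeat + suffix)
    return "".join(regex)


def find_sites(sequence, pattern):
    """Return 0-based start positions of non-overlapping pattern matches."""
    return [m.start() for m in re.finditer(prosite_to_regex(pattern), sequence)]


# N-glycosylation site pattern (PROSITE PS00001) on a made-up sequence.
print(prosite_to_regex("N-{P}-[ST]-{P}."))            # → N[^P][ST][^P]
print(find_sites("AANGSAANPSA", "N-{P}-[ST]-{P}."))   # → [2]
```

The second asparagine in the made-up sequence is followed by a proline, so the `{P}` element rejects it and only the first site is reported.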

26

IV. Databases of Protein Functions

Metabolic Pathways, Enzymes, and Compounds

Enzyme Classification: Classification and Nomenclature of Enzyme-Catalysed Reactions (EC-IUBMB)
KEGG (Kyoto Encyclopedia of Genes and Genomes): Metabolic Pathways
LIGAND (at KEGG): Chemical Compounds, Reactions and Enzymes
EcoCyc: Encyclopedia of E. coli Genes and Metabolism
MetaCyc: Metabolic Encyclopedia (Metabolic Pathways)
BRENDA: Enzyme Database
UM-BBD: Microbial Biocatalytic Reactions and Biodegradation Pathways

Cellular Regulation and Gene Networks

EpoDB: Genes Expressed during Human Erythropoiesis
BIND: Descriptions of interactions, molecular complexes and pathways
DIP: Catalogs experimentally determined interactions between proteins
BioCarta: Biological pathways of human and mouse
GO: Gene Ontology Consortium Database

27

KEGG Metabolic & Regulatory Pathways

(http://www.genome.ad.jp/dbget-bin/show_pathway?hsa00220+4.3.2.1)

KEGG is a suite of databases and associated software, integrating our current knowledge on molecular interaction networks, the information of genes and proteins, and of chemical compounds and reactions. (http://www.genome.ad.jp/kegg/kegg2.html)

28

BioCyc (EcoCyc/MetaCyc Metabolic Pathways)

The BioCyc Knowledge Library is a collection of Pathway/Genome Databases (http://biocyc.org/)

29

BioCarta Cellular Pathways

(http://www.biocarta.com/index.asp)

30

Protein-Protein Interaction: BIND

(http://www.bind.ca/)

31

Gene Ontology

(http://www.geneontology.org/)

Three GOs:

Molecular Function
Biological Process
Cellular Component

32

V. Databases of Protein Structures

Protein Structure

PDB: Structures Determined by X-ray Crystallography and NMR
PDBsum: Summaries and analyses of PDB structures
MMDB: NCBI’s database of 3D structures, part of NCBI Entrez
SWISS-MODEL Repository: Database of annotated protein 3D models
ModBase: Annotated comparative protein structure models

Structure Classification

CATH: Hierarchical Classification of Protein Domain Structures
SCOP: Familial and Structural Protein Relationships
FSSP: Protein Fold Classification Based on Structure-Structure Alignment

33

PDB: Experimental 3D Structure Repository

(http://www.rcsb.org/pdb/)

Rat gamma-crystallin, chain A, B.

Can you do a text search at PIR to find this?

34

PDBsum: Summary and Analysis

(http://www.ebi.ac.uk/thornton-srv/databases/pdbsum/)

Search

3-D structure summary

2-D structure

35

Protein Structural Classification (1)

CATH: Hierarchical domain classification of protein structures (http://www.biochem.ucl.ac.uk/bsm/cath_new/)

36

Protein Structural Classification (2)

(http://scop.mrc-lmb.cam.ac.uk/scop/data/scop.b.html)

SCOP: A comprehensive description of the structural and evolutionary relationships between all proteins whose structure is known.

37

SWISS-MODEL Repository

A database of annotated three-dimensional comparative protein structure models

(http://swissmodel.expasy.org/repository/smr.php?sptr_ac=CRGE_RAT&job=2)

38

VI. Proteomic Resources

GELBANK (http://gelbank.anl.gov): 2D-gel patterns from completed genomes; SWISS-2DPAGE (http://www.expasy.org/ch2d/)

PEP (http://cubic.bioc.columbia.edu/pep/): Predictions for Entire Proteomes: summarized analyses of protein sequences

Integr8 (http://www.ebi.ac.uk/integr8/): A browser for information relating to completed genomes and proteomes, based on data contained in Genome Reviews and the UniProt proteome sets

PRIDE (http://www.ebi.ac.uk/pride/): PRoteomics IDEntifications database

Expression Profiling databases

GPMdb (http://gpmdb.thegpm.org/): Mass Spec Proteomics Databases

39

2D-Gel Image Databases (1)

(http://us.expasy.org/ch2d/2d-index.html)

(http://us.expasy.org/cgi-bin/nice2dpage.pl?P02489)

40

2D-Gel Image Databases (2)

(http://gelbank.anl.gov/2dgels/index.asp)

41

GPMdb MS Data Search

http://gpmdb.thegpm.org/

Craig, et al., J Proteome Res. 2004, 3:1234-42.

42

iProLINK: Protein Literature Mining Resource

http://pir.georgetown.edu/iprolink/

Text mining of protein phosphorylation

Gene/protein name thesaurus: synonyms, ambiguous names…

43

Lab:

Choose additional protein IDs to browse the variety of molecular biology databases each sequence report links to.

Delta crystallin II (Argininosuccinate lyase) (UniProt: ARLY2_ANAPL/P24058)

Alpha crystallin A (UniProt: CRYAA_RABIT/P02493)
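
For the lab, sequence report links can be built from a list of protein IDs instead of typing each one by hand. The sketch below assumes the `unipEntry?id=...` address form used for the sequence reports earlier in this tutorial. The `report_url` helper is hypothetical, and the server address dates from 2006 and may no longer resolve.

```python
from urllib.parse import urlencode

# Base address taken from the sequence report links in this tutorial
# (assumption: the same form works for any UniProt entry name).
BASE = "http://www.pir.uniprot.org/cgi-bin/unipEntry"


def report_url(entry_id, base=BASE):
    """Return the sequence report URL for one protein ID."""
    return base + "?" + urlencode({"id": entry_id})


# The two lab proteins, by UniProt entry name.
for entry in ["ARLY2_ANAPL", "CRYAA_RABIT"]:
    print(report_url(entry))
```

Using `urlencode` rather than plain string concatenation keeps the link valid if an ID ever contains characters that need escaping.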