Bioinformatics & Database tools

Oct 2, 2013

Proteomics

Jen, Mona & Krishna

Introduction


What is a proteome?



The proteome is the entire complement of
proteins, including the modifications made to
a particular set of proteins, produced by an
organism or system at a particular time and
under particular conditions.



It varies with time and with the distinct
requirements, or stresses, that a cell or
organism undergoes.





What is proteomics?



Proteomics is the large-scale study of proteins,
particularly their functions and structures.


A short list of protein modifications that might be
studied under proteomics includes:

1. phosphorylation

2. ubiquitination

3. methylation

4. acetylation

5. glycosylation

6. oxidation

7. nitrosylation, etc.

Why proteomics?


Gives a better understanding of an organism than
genomics.


Limitations of genomics that made proteomics a better
approach:

1. the level of transcription of a gene gives only a rough
estimate of its level of expression into a protein.

2. many transcripts give rise to more than one protein,
through alternative splicing or alternative
post-translational modifications.

3. many proteins form complexes with other proteins or
RNA molecules, and only function in the presence of
these other molecules.

4. proteins experience post-translational modifications that
profoundly affect their activities.

5. protein degradation rate plays an important role in protein
content.



Any cell may make different sets of proteins at different
times, or under different conditions. Furthermore, any one
protein can undergo a wide range of post-translational
modifications. So a proteomics study can be complex.



Therefore, proteomics is a better approach, but a complex one.

Branches of proteomics


Proteomics analysis


Determining proteins which are post-translationally modified


Expression proteomics


Profiling of expressed proteins using quantitative
methods


Cell mapping proteomics


Identification of protein complexes

Methods

1. Gel-based proteomics (2DE):


older approach


Separates proteins according to charge in the first
dimension and according to the size in the second
dimension.


Proteins are commonly separated using polyacrylamide gel
electrophoresis (PAGE).


Identifies individual proteins in complex samples or
multiple proteins in a single sample.

2. Mass spectrometry based proteomics:


Highly accurate for extremely low mass particles.


Proteins are cleaved into peptides with a protease
and the peptide masses are detected with the help of a mass
spectrometer (e.g. TOF).


The mass spectrum of the peptides is obtained and it is
converted to a list of peptide masses that is searched
against protein sequence databases (or translated genome
databases).


Since each protein has a characteristic peptide mass fingerprint,
the peptide masses can identify the protein in the database.
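
The peptide mass fingerprinting steps above can be sketched in Python as
follows. This is a minimal illustration under stated assumptions, not the
method of any particular search engine: it assumes trypsin as the protease
(cleaving after K or R unless the next residue is P), uses standard
monoisotopic residue masses, and ranks candidates by a simple count of
matched masses. The function names, the toy sequences, and the 0.2 Da
tolerance are hypothetical choices made for this example.

```python
# Standard monoisotopic residue masses in daltons.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056  # each free peptide carries one extra water


def tryptic_digest(sequence):
    """Split a protein after K or R, except when the next residue is P."""
    peptides, start = [], 0
    for i, aa in enumerate(sequence):
        nxt = sequence[i + 1] if i + 1 < len(sequence) else ""
        if aa in "KR" and nxt != "P":
            peptides.append(sequence[start:i + 1])
            start = i + 1
    if start < len(sequence):
        peptides.append(sequence[start:])
    return peptides


def peptide_mass(peptide):
    """Monoisotopic mass of a neutral peptide."""
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER


def fingerprint(sequence):
    """Theoretical peptide mass list for one database sequence."""
    return [peptide_mass(p) for p in tryptic_digest(sequence)]


def count_matches(observed, theoretical, tol=0.2):
    """Number of observed masses within tol Da of any theoretical mass."""
    return sum(any(abs(o - t) <= tol for t in theoretical) for o in observed)


def identify(observed, database, tol=0.2):
    """Name of the database entry whose fingerprint matches the most masses."""
    return max(
        database,
        key=lambda name: count_matches(observed, fingerprint(database[name]), tol),
    )


# Toy database with made-up sequences (hypothetical, for illustration only).
toy_db = {"protA": "MKTAYIAKQR", "protB": "GGSLVPRGSHMK"}
observed = [m + 0.05 for m in fingerprint(toy_db["protB"])]  # simulated peaks
print(identify(observed, toy_db))  # protB
```

Real search tools score matches statistically and allow for missed
cleavages and modifications; the plain match count here only shows the idea.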



3. Protein arrays:


Idea is similar to cDNA arrays.


Substrate is bound on the surface of the array


Sample is introduced, binding takes place


Detection and analysis.


Analysis of protein-protein, protein-DNA or protein-RNA
interactions can be done.


Applications


Identification of potential new drugs for the treatment
of diseases. This relies on genome and proteome
information to identify proteins associated with a
disease, which computer software can then use as
targets for new drugs.


Biomarkers


A number of techniques make it possible to test for proteins produced
during a particular disease, which helps to diagnose the disease
quickly.



Examples of biomarkers


Alzheimer's disease


In Alzheimer’s disease, elevations in beta secretase create
amyloid/beta-protein. Targeting this enzyme decreases the
amyloid/beta-protein and slows the progression of the disease.



Heart disease


Standard protein biomarkers for CVD include interleukin-6,
interleukin-8, serum amyloid A protein, fibrinogen, and
troponins.





BIOINFORMATICS &
DATABASE TOOLS

Introduction


Current State


Many different informational protein
databases available online


Most databases are focused on protein
identification


Research community provides the data that
drives the database contents


Validation of Mass Spec data


Single vs. Multiple Species Support





Overview of Databases


NCBI


Protein / Peptidome


Human Gene and Protein Database (HGPD)


Human Proteinpedia / Human Protein
Reference Database (HPRD)


Dynamic Proteomics


Open Proteomics Database


Global Proteome Machine Database


Peptide Atlas


Proteomics Identifications Database (PRIDE)


UniProt Knowledgebase

NCBI


Protein / Peptidome


Two databases contained in the Entrez suite


Multi-species result sets


Protein


Provides gene information pertaining to the
expressed protein queried


Peptidome


Mass Spec based protein identification
database


Experiment based result sets

Human Gene and Protein Database (HGPD)


Several cDNA contributors, spanning the globe


Gateway Expression System


Allows for a reproducible clone library. Clones
are available for purchase.


Wheat germ cell-free protein synthesis


Protein Expression portion of the database.
Allows for visualization of the SDS-PAGE results.


Human Proteinpedia / Human
Protein Reference Database (HPRD)


Modeled after Wikipedia


Users submit and edit the data in the database


Differences


The original submitter is expected to provide experimental
evidence for the data


Only the original submitter can edit that specific data later.


Allows several protein features to be annotated


Post-translational modification


Tissue expression


Cell line expression


Subcellular localization


Enzyme substrates


Protein-protein interactions



Human Proteinpedia / Human
Protein Reference Database (HPRD)


No visual protein expression data


Protein amino acid sequence given


Raw and processed mass spec files are
available as experimental evidence


Provides links to the protein in other
databases



Dynamic Proteomics


A different type of database, focusing on the dynamics
of proteins treated with an anti-cancer drug


Shows different uses for data repositories for
proteomics


Not just an all-encompassing data source with generic data.


Using simple databases and web front ends to make more
specific types of data available to the community.



Also provides links to other databases


Can compare multiple sequences at once to search
the cDNA library.


Dynamic Proteomics

Time-lapse microscopy movies that illustrate
the protein dynamics in individual living human
cancer cells in response to an anti-cancer drug

Time Lapse Video

Open Proteomics Database


University of Texas


Multi-species results


Smaller pool of data submitted for query


Global Proteome Machine Database



Private industry involvement


Mass Spec Validation


Protein Identification


Utilizes data from other databases


Differs from the scheme of just linking to
other protein databases

Peptide Atlas


Seattle Proteome Center


Focused on subset of human proteins


Heart, Lung, Blood


Funded by NIH


Part of the Trans-Proteomic Pipeline
software suite

Proteomics Identifications Database
(PRIDE)



One of the earlier proteomic databases


European Bioinformatics Institute


Larger selection of species-specific data


Java based, available for local deployment


UniProt Knowledgebase


Swiss Institute of Bioinformatics


Also curated by the European Bioinformatics
Institute


Funded by NIH


Forced the conversion of earlier non-public
versions to become free and open


Overview of Tools


ExPASy Proteomics Server


Trans-Proteomic Pipeline


ExPASy Proteomics Server


Swiss Institute of Bioinformatics tool suite


Protein ID by amino acid sequence


Isoelectric Point Computation


Prediction of post-translational modifications
and amino acid substitutions.


Predicts protein cleavage sites


Protein identification by molecular weight
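
Two of the computations listed above, molecular weight and isoelectric
point, can be illustrated with a short Python sketch. It is a rough
illustration, not the algorithm the ExPASy tools use. It assumes standard
average residue masses, a simple Henderson-Hasselbalch charge model, and
one illustrative set of pKa values; published pKa sets differ, so the
estimated pI will differ too. All function and constant names are
hypothetical.

```python
# Standard average residue masses in daltons.
AVERAGE_MASS = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167,
    "V": 99.1326, "T": 101.1051, "C": 103.1388, "L": 113.1594,
    "I": 113.1594, "N": 114.1038, "D": 115.0886, "Q": 128.1307,
    "K": 128.1741, "E": 129.1155, "M": 131.1926, "H": 137.1411,
    "F": 147.1766, "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER_AVG = 18.0153

# Illustrative pKa values (an assumption; other published sets exist).
POSITIVE_PKA = {"K": 10.8, "R": 12.5, "H": 6.5}
NEGATIVE_PKA = {"D": 3.9, "E": 4.1, "C": 8.5, "Y": 10.1}
N_TERM_PKA, C_TERM_PKA = 8.6, 3.6


def molecular_weight(sequence):
    """Average molecular weight of an unmodified protein chain."""
    return sum(AVERAGE_MASS[aa] for aa in sequence) + WATER_AVG


def net_charge(sequence, ph):
    """Net charge at a given pH from the Henderson-Hasselbalch relation."""
    charge = 1.0 / (1.0 + 10 ** (ph - N_TERM_PKA))   # protonated N-terminus
    charge -= 1.0 / (1.0 + 10 ** (C_TERM_PKA - ph))  # deprotonated C-terminus
    for aa in sequence:
        if aa in POSITIVE_PKA:
            charge += 1.0 / (1.0 + 10 ** (ph - POSITIVE_PKA[aa]))
        elif aa in NEGATIVE_PKA:
            charge -= 1.0 / (1.0 + 10 ** (NEGATIVE_PKA[aa] - ph))
    return charge


def isoelectric_point(sequence, tol=1e-4):
    """Bisect for the pH at which the net charge crosses zero."""
    lo, hi = 0.0, 14.0
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if net_charge(sequence, mid) > 0:
            lo = mid  # still positive, so the pI lies at higher pH
        else:
            hi = mid
    return (lo + hi) / 2.0


if __name__ == "__main__":
    seq = "ACDEFGHIK"  # made-up sequence for illustration
    print(round(molecular_weight(seq), 2), round(isoelectric_point(seq), 2))
```

Bisection works here because the net charge only decreases as pH rises, so
there is a single crossing point between pH 0 and pH 14.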


Trans-Proteomic Pipeline


Seattle Proteome Center


Challenges


Large number of data sources


Parallel efforts


Validation of Mass Spec data



Future Considerations


Selection of a few ‘primary’ data repositories


Consolidation of multiple redundant efforts
being funded by the same agency


Particularly NIH


Data standards to streamline the submission
of results into multiple data sources.


Reduction of the need to perform many searches
to find information about a protein


mzXML is a start, but only covers mass spec data
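
To give a sense of what a mass spec data standard such as mzXML stores,
the sketch below decodes the text of a peaks element into (m/z, intensity)
pairs. It assumes the common layout: base64 text holding uncompressed
floats in network (big-endian) byte order, alternating m/z and intensity.
Files that use compression or a different pair order are not handled, and
the function names are hypothetical.

```python
import base64
import struct


def decode_peaks(encoded, precision=32):
    """Decode base64 peak text into a list of (m/z, intensity) pairs.

    Assumes uncompressed big-endian floats stored as m/z, intensity,
    m/z, intensity, ... (an assumption about the file being read).
    """
    raw = base64.b64decode(encoded)
    fmt_char = "f" if precision == 32 else "d"
    count = len(raw) // struct.calcsize(">" + fmt_char)
    values = struct.unpack(">%d%s" % (count, fmt_char), raw)
    return list(zip(values[0::2], values[1::2]))


def encode_peaks(peaks, precision=32):
    """Inverse of decode_peaks, used here only to build a test string."""
    fmt_char = "f" if precision == 32 else "d"
    flat = [value for pair in peaks for value in pair]
    raw = struct.pack(">%d%s" % (len(flat), fmt_char), *flat)
    return base64.b64encode(raw).decode("ascii")


if __name__ == "__main__":
    text = encode_peaks([(100.5, 2000.0), (200.25, 512.0)])  # made-up peaks
    print(decode_peaks(text))
```

A shared encoding like this is what lets one result file be submitted to
several repositories without conversion.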

Database References



NCBI


Protein
http://www.ncbi.nlm.nih.gov/protein/


Peptidome

http://www.ncbi.nlm.nih.gov/pepdome


Human Gene and Protein Database (HGPD)


http://riodb.ibase.aist.go.jp/hgpd/cgi-bin/index.cgi


Human Proteinpedia



http://www.humanproteinpedia.org/index_html


Human Protein Reference Database (HPRD)


http://www.hprd.org/


Dynamic Proteomics


http://alon-serv.weizmann.ac.il/dynamprotb/seqsrch


Open Proteomics Database


http://bioinformatics.icmb.utexas.edu/OPD/


Global Proteome Machine Database


http://thegpm.org


Peptide Atlas


http://www.peptideatlas.org/


Proteomics Identifications Database (PRIDE)


http://www.ebi.ac.uk/pride/


UniProt Knowledgebase


http://www.uniprot.org/

Tool References



ExPASy Proteomics Server


http://www.expasy.ch/


Trans-Proteomic Pipeline


http://tools.proteomecenter.org/wiki/index.php?title=Software:TPP

Applications of Proteomics


Mona Motwani

Discovery of protein biomarkers


A biomarker can be defined as any laboratory measurement or physical
sign used as a substitute for a clinically meaningful end point that measures
directly how a patient feels, functions or survives. As applied to proteomics,
a biomarker is an identified protein (or proteins) that is unique to a
particular disease state.



Biomarkers of drug efficacy and toxicity are becoming a key need in the
drug development process.



Mass spectral-based proteomic technologies are ideally suited for the
discovery of protein biomarkers in the absence of any prior knowledge of
quantitative changes in protein levels.



The success of any biomarker discovery effort will depend upon the
quality of samples analysed, the ability to generate quantitative information
on relative protein levels and the ability to readily interpret the data
generated.



Study of Tumor Metastasis and Cancers


The identification of protein molecules whose expression correlates with
the metastatic process helps to understand the metastatic mechanisms and
thus facilitates the development of strategies for therapeutic intervention
and clinical management of cancer.




Information contained within proteomic patterns has been demonstrated to
detect ovarian, breast and prostate cancers with sensitivities and specificities
greater than 90%.
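
For reference, sensitivity and specificity are computed from paired true
and predicted labels as sketched below. The labels in the usage lines are
made up purely to show the arithmetic and are not data from any of the
studies referred to above; the function names are hypothetical.

```python
def confusion_counts(actual, predicted):
    """Count (tp, fn, tn, fp) from paired boolean labels (True = cancer)."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a and p)
    fn = sum(1 for a, p in pairs if a and not p)
    tn = sum(1 for a, p in pairs if not a and not p)
    fp = sum(1 for a, p in pairs if not a and p)
    return tp, fn, tn, fp


def sensitivity(tp, fn):
    """Fraction of true cases that the test detects."""
    return tp / (tp + fn)


def specificity(tn, fp):
    """Fraction of healthy samples that the test correctly clears."""
    return tn / (tn + fp)


if __name__ == "__main__":
    # Made-up example: 10 cancer samples and 10 controls.
    actual = [True] * 10 + [False] * 10
    predicted = [True] * 9 + [False] + [False] * 9 + [True]
    tp, fn, tn, fp = confusion_counts(actual, predicted)
    print(sensitivity(tp, fn), specificity(tn, fp))
```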


Field of Neurotrauma


Neurotrauma results in complex alterations to the biological systems
within the nervous system, and these changes evolve over time.



Near-completion of the Human Genome Project has stimulated scientists
to begin looking for the next step in unraveling normal and abnormal
functions within biological systems. Consequently, there is new focus on
the role of proteins in these processes.



Proteomics is a burgeoning field that may provide a valuable approach to
evaluate the post-traumatic central nervous system (CNS). However, the
sensitivity of the tissue and the detection of potential biomarkers are
major concerns.

Renal disease diagnosis


Proteomics has also found significant application in studying the effects of
chemical insults on the kidney, particularly as a result of environmental toxins,
drugs and other bioactive agents.



Combining classic analytical techniques such as two-dimensional gel
electrophoresis with more sophisticated techniques, such as MS and liquid
chromatography, has enabled considerable progress to be made in cataloguing
and quantifying proteins present in urine and various kidney tissue
compartments in both normal and diseased physiological states.




Critical developmental tasks that still need to be accomplished are completely
defining the proteome in the various biological compartments (e.g. tissues,
serum and urine) in both health and disease, which presents a major challenge
given the dynamic range and complexity of such proteomes; and also achieving
the routine ability to accurately and reproducibly quantify proteomic
expression profiles and develop diagnostic platforms.



Neurology


In neurology and neuroscience, many applications of proteomics have
involved neurotoxicology and neurometabolism, as well as the
determination of specific proteomic aspects of individual brain areas and
body fluids in neurodegeneration.



Investigation of brain protein groups in neurodegeneration, such as
enzymes, cytoskeleton proteins, chaperones, synaptosomal proteins and
antioxidant proteins, is in progress as phenotype-related proteomics.




The concomitant detection of several hundred proteins on a gel provides
sufficiently comprehensive data to determine a pathophysiological protein
network and its peripheral representatives. An additional advantage is that
hitherto unknown proteins have been identified as brain proteins.

Autoantibody profiling


Proteomics technologies enable profiling of autoantibody responses using
biological fluids derived from patients with autoimmune disease.



They provide a powerful tool to characterize autoreactive B-cell responses
in diseases including rheumatoid arthritis, multiple sclerosis, autoimmune
diabetes, and systemic lupus erythematosus.



Autoantibody profiling may serve purposes including classification of
individual patients and subsets of patients based on their 'autoantibody
fingerprint', examination of epitope spreading and antibody isotype usage,
discovery and characterization of candidate autoantigens, and tailoring
antigen-specific therapy.


Alzheimer's disease



In Alzheimer’s disease, elevations in beta secretase create
amyloid/beta-protein, which causes plaque to build up in the patient's
brain, which is thought to play a role in dementia.



Targeting this enzyme decreases the amyloid/beta-protein and so slows the
progression of the disease.



A procedure to test for the increase in amyloid/beta-protein is
immunohistochemical staining, in which antibodies bind to
amyloid/beta-protein antigens in biological tissue.


Heart disease



Heart disease is commonly assessed using several key protein-based
biomarkers. Standard protein biomarkers for CVD include interleukin-6,
interleukin-8, serum amyloid A protein, fibrinogen, and troponins.




Cardiac troponin I (cTnI) increases in concentration within 3 to 12 hours of
initial cardiac injury and can be found elevated days after an acute
myocardial infarction.



A number of commercial antibody based assays as well as other methods
are used in hospitals as primary tests for acute MI.


Future Challenges



There is a need for biomarkers with more accurate diagnostic capability,
particularly for early-stage disease.



Also needed are a quality control sample on each chip array and
normalization of spectral data through commercially available or in-house
generated computer programs.
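
As one example of what such a normalization program might do, the sketch
below scales each spectrum by its total ion current so that spectra from
different chips can be compared. Total ion current normalization is one
common choice and is assumed here for illustration; the slides do not say
which method is meant, and the function name is hypothetical.

```python
def normalize_to_total_ion_current(intensities):
    """Scale a spectrum so that its intensities sum to 1."""
    total = sum(intensities)
    if total <= 0:
        raise ValueError("spectrum has no signal to normalize")
    return [value / total for value in intensities]


if __name__ == "__main__":
    spectrum = [1.0, 1.0, 2.0]  # made-up intensities
    print(normalize_to_total_ion_current(spectrum))  # [0.25, 0.25, 0.5]
```

Applying the same function to the quality control sample on each chip gives
a quick check that chips are comparable after normalization.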



Another challenge that proteomics techniques face lies largely in the
application of bioinformatics, i.e. the spectral data management and analysis.
The vast amount of spectral data generated demands implementation of
advanced data management and analysis strategies.



Finally, the obvious challenge, as stated by many investigators, is the
identification of the important proteins and peptides that contribute to the
proteomic analysis.