Proteogenomics ppt


4 Oct 2013


Hugo Willy


A combination of the words Proteomics and Genomics.

Proteogenomics commonly refers to studies that use proteomic information, often derived from mass spectrometry, to improve gene annotations.



Mass spectrometry is used to characterize protein sequences.

The basic idea is to ionize proteins and let them "fly" in a vacuum chamber.

The mass/charge (m/z) ratio of an ion can be deduced from its Time of Flight (TOF) to reach a detector, or from the frequency at which it circles in a magnetic field.
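
The time-of-flight relation can be sketched numerically. The example below assumes an idealized linear TOF tube, which the slides do not specify: an ion of mass m and charge z*e is accelerated through a voltage U and then drifts over a field-free length L. Energy conservation gives z*e*U = 0.5*m*v^2, and with v = L/t this leads to m/z = 2*e*U*t^2 / L^2. The function names and the instrument parameters (20 kV, 1 m) are illustrative assumptions.

```python
# Idealized linear time-of-flight (TOF) model; all instrument parameters
# are illustrative assumptions, not values from the slides.
E_CHARGE = 1.602176634e-19   # elementary charge in coulombs
DALTON = 1.66053906660e-27   # one dalton in kilograms


def mz_from_tof(flight_time_s, drift_length_m, accel_voltage_v):
    """Return m/z (daltons per elementary charge) for an ideal TOF tube.

    From z*e*U = 0.5*m*v**2 and v = L/t:  m/z = 2*e*U*t**2 / L**2.
    """
    mass_per_charge_kg = (
        2.0 * E_CHARGE * accel_voltage_v * flight_time_s ** 2 / drift_length_m ** 2
    )
    return mass_per_charge_kg / DALTON


def tof_from_mz(mz, drift_length_m, accel_voltage_v):
    """Inverse relation: flight time in seconds for a given m/z."""
    mass_per_charge_kg = mz * DALTON
    return drift_length_m * (
        mass_per_charge_kg / (2.0 * E_CHARGE * accel_voltage_v)
    ) ** 0.5


if __name__ == "__main__":
    t = tof_from_mz(1000.0, drift_length_m=1.0, accel_voltage_v=20000.0)
    print(f"flight time: {t:.3e} s")
    print(f"recovered m/z: {mz_from_tof(t, 1.0, 20000.0):.3f}")
```

In this model the flight time grows with the square root of m/z, so an ion with twice the m/z arrives about 1.41 times later.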

Some mass spectrometry techniques ionize whole proteins, but the currently popular method is to chop a protein into peptides.

The peptides are separated by their masses
before ionization and sequenced
independently.

The peptide sequences are mapped back to known protein sequences or used for de novo sequencing (very much like genome sequencing).

The peptide lengths, according to the people I met, are around 7-15 amino acids.
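
The chopping step can be illustrated with an in-silico digestion. The slides do not say which enzyme is used; the sketch below assumes trypsin with the usual simplified rule (cleave after K or R unless the next residue is P) and keeps only peptides in the 7-15 residue window mentioned above. The function name and the example sequence are hypothetical.

```python
def tryptic_digest(protein, min_len=7, max_len=15):
    """Split a protein after K or R (unless followed by P), then filter by length.

    Assumption: trypsin-like cleavage with no missed cleavages.
    """
    peptides = []
    start = 0
    for i, residue in enumerate(protein):
        next_residue = protein[i + 1] if i + 1 < len(protein) else ""
        if residue in "KR" and next_residue != "P":
            peptides.append(protein[start:i + 1])
            start = i + 1
    if start < len(protein):
        # Whatever follows the last cleavage site forms the C-terminal peptide.
        peptides.append(protein[start:])
    return [p for p in peptides if min_len <= len(p) <= max_len]


if __name__ == "__main__":
    # Hypothetical sequence: the R at position 17 is followed by P, so no cut there.
    print(tryptic_digest("GASPVTCLKNDQEMHFRPYWKAA"))
```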

Pros:

It is accurate in determining mass.

Assuming unambiguous mapping to a protein sequence, it can reliably point to the proteins that are translated in the cell; this shows which mRNAs get translated and which do not.

It can be used to quantify the amount of different proteins in the sample, as opposed to predicting it from the mRNA levels using microarrays.

Pros:

It can identify post-translational modifications, i.e.

If proteins are phosphorylated (then it is kinase-related)

If proteins are methylated and acetylated (important in the histone code)

If proteins are ubiquitinated (related to protein degradation)

It can detect (ribosomal) programmed frameshift and alternative splicing events.
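
One way modifications show up in the data is as a characteristic mass shift between the observed peptide and its unmodified sequence. The sketch below is a minimal illustration of that idea, not the method of any particular tool. The shift values are standard monoisotopic masses; the function name, the tolerance, and the choice of the GlyGly remnant as the signature of ubiquitination (which assumes tryptic digestion) are assumptions made for the example.

```python
# Monoisotopic mass shifts in daltons for the modifications named above.
# Assumption: ubiquitination is detected through the GlyGly remnant left
# on the modified lysine after tryptic digestion.
PTM_SHIFTS = {
    "phosphorylation": 79.96633,
    "methylation": 14.01565,
    "acetylation": 42.01057,
    "ubiquitination (GlyGly remnant)": 114.04293,
}


def candidate_ptms(observed_mass, unmodified_mass, tolerance=0.01):
    """Return the modifications whose mass shift explains the observed difference."""
    delta = observed_mass - unmodified_mass
    return [
        name for name, shift in PTM_SHIFTS.items()
        if abs(delta - shift) <= tolerance
    ]


if __name__ == "__main__":
    # Hypothetical peptide of 1000.0 Da observed about 79.966 Da heavier.
    print(candidate_ptms(1079.9663, 1000.0))
```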

Cons:


It is still expensive (but an expert at the RECOMB Satellite for Computational Proteomics said it is just as expensive as RNA-Seq).

It is hard to distinguish amino acids with similar mass sums (most notably leucine and isoleucine).

We do not have a reliable way to amplify proteins in the sample (a serious problem).
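
The leucine/isoleucine limitation is easy to see from the residue masses. The sketch below computes monoisotopic peptide masses from the standard residue mass table; the function name and example peptides are hypothetical. It also shows two other near-collisions of the "similar mass sum" kind: two glycines against one asparagine, and lysine against glutamine.

```python
# Standard monoisotopic residue masses in daltons.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056  # one water is added for the free N- and C-termini


def peptide_mass(peptide):
    """Monoisotopic mass of an unmodified peptide."""
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER


if __name__ == "__main__":
    # Swapping I for L leaves the mass unchanged.
    print(peptide_mass("PEPTIDE"), peptide_mass("PEPTLDE"))
    # Two glycines weigh the same as one asparagine (up to table rounding).
    print(2 * RESIDUE_MASS["G"], RESIDUE_MASS["N"])
    # Lysine and glutamine differ by only about 0.036 Da.
    print(RESIDUE_MASS["K"] - RESIDUE_MASS["Q"])
```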

Accurate prediction of translation start sites.

Accurate prediction of programmed frameshifts.

Accurate prediction of post-translational modifications.

A confirmation of whether a (pseudo)gene is actually translated.

Observation: most current gene prediction algorithms are not based on proteomic data (because such data were not available).

For a novel protein, the peptides from the mass spectrometry experiments have to be mapped to the exomes/genomes (a problem similar to RNA-Seq).

Currently they try to collect exomes (regions that are assumed to be exons) and translate them in 6 different frames (3 on each DNA strand).
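
Six-frame translation can be sketched as follows. This is a minimal illustration using the standard genetic code, not the pipeline used by the group; the function names and the frame labels ("+1" to "-3") are choices made for the example. Trailing bases that do not fill a codon are dropped, and codons with unknown bases translate to "X".

```python
from itertools import product

# Standard genetic code, listed in TCAG order (first, second, third base).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Return the reverse complement of a DNA string."""
    return dna.translate(COMPLEMENT)[::-1]


def translate(dna):
    """Translate complete codons from the start of dna; '*' marks a stop codon."""
    codons = (dna[i:i + 3] for i in range(0, len(dna) - 2, 3))
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)


def six_frame_translate(dna):
    """Return the 3 forward and 3 reverse-strand translations keyed by frame label."""
    dna = dna.upper()
    frames = {}
    for strand, seq in (("+", dna), ("-", reverse_complement(dna))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(seq[offset:])
    return frames


if __name__ == "__main__":
    for label, protein in six_frame_translate("ATGGCCTAA").items():
        print(label, protein)
```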

They also build an exon splice graph, which models the different splicing alternatives of a single gene.

They developed a program called Inspect to search for a peptide in this graph. It can be found at http://proteomics.ucsd.edu/Inspect

Each box represents a single exon and the arrows represent possible combinations of them in the translated protein product.
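
The idea of searching a peptide in an exon splice graph can be sketched with a toy graph. This is not how Inspect works internally; it is a brute-force illustration that enumerates every path and looks for the peptide in the concatenated product. It assumes the graph is acyclic and that each exon is already given as an in-frame amino acid string (real exons can split codons across a junction). All names and sequences are hypothetical.

```python
def enumerate_products(exon_seqs, edges, start, end):
    """Yield (path, protein) for every start-to-end path in an acyclic splice graph."""
    stack = [(start, [start])]
    while stack:
        node, path = stack.pop()
        if node == end:
            yield path, "".join(exon_seqs[exon] for exon in path)
            continue
        for successor in edges.get(node, []):
            stack.append((successor, path + [successor]))


def find_peptide(peptide, exon_seqs, edges, start, end):
    """Return the exon paths whose translated product contains the peptide."""
    return [
        path
        for path, protein in enumerate_products(exon_seqs, edges, start, end)
        if peptide in protein
    ]


if __name__ == "__main__":
    # Boxes (exons) and arrows (allowed junctions) of a toy gene.
    exons = {"E1": "MKT", "E2": "AYIAK", "E3": "QRQIS", "E4": "FVK"}
    arrows = {"E1": ["E2", "E3"], "E2": ["E3", "E4"], "E3": ["E4"]}
    # "AKFV" spans the E2-E4 junction, so it supports the isoform skipping E3.
    print(find_peptide("AKFV", exons, arrows, "E1", "E4"))
```

A peptide that spans a junction is informative because it can only come from isoforms that use that particular arrow.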

Revising gene models, and hence their annotations.

Finding novel peptides that map to non-exonic regions: novel genes?

Nitin Gupta et al. Whole proteome analysis of post-translational modifications: applications of mass-spectrometry for proteogenomic annotation. Genome Res 2007.

Proteogenomics: Annotating Genomes using the Proteome. Natalie Castellana. Poster in RECOMB CP 2011. http://proteomics.ucsd.edu/recombcp2011/Posters/Poster_B19.pdf

Tutorial: Proteogenomics. Natalie Castellana. http://bix.ucsd.edu/projects/recombcp10_tutorials/RECOMBCP_Tutorial_Castellana.pdf

Most of the work is done by Pavel Pevzner and other groups at UC San Diego. Here is their website: http://proteomics.ucsd.edu/

Comparative proteogenomics is a branch of proteogenomics that compares proteomic data from multiple related species concurrently and exploits the homology between their proteins to improve annotations with higher statistical confidence.

In a sense, this is the approximate peptide matching problem.

However, it needs to take residue conservation at different parts of the proteins into account, e.g. sites which are post-translationally modified must be preserved to maintain function.
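
A minimal sketch of approximate peptide matching with conserved sites is given below. It assumes substitutions only (no insertions or deletions) and treats conservation as a hard constraint: a mismatch at a position marked as conserved disqualifies the match. The function name, parameters, and sequences are hypothetical and do not come from any of the tools cited here.

```python
def approximate_matches(peptide, protein, max_mismatches=1, conserved_positions=()):
    """Return start offsets where the peptide aligns with few mismatches.

    conserved_positions holds 0-based peptide indices (for example, a
    modified residue) that must match exactly.
    """
    conserved = set(conserved_positions)
    hits = []
    for start in range(len(protein) - len(peptide) + 1):
        window = protein[start:start + len(peptide)]
        mismatched = [
            i for i, (a, b) in enumerate(zip(peptide, window)) if a != b
        ]
        if len(mismatched) <= max_mismatches and not conserved & set(mismatched):
            hits.append(start)
    return hits


if __name__ == "__main__":
    peptide = "AKSPEL"          # assume the S at index 2 is a modified site
    ortholog_a = "MMGKSPELGG"   # differs at index 0 of the peptide
    ortholog_b = "MMAKTPELGG"   # differs at the conserved index 2
    print(approximate_matches(peptide, ortholog_a, conserved_positions=(2,)))
    print(approximate_matches(peptide, ortholog_b, conserved_positions=(2,)))
```

Without the conservation constraint both orthologs would match; with it, only the one that preserves the modified site does.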

Some work in comparative proteogenomics:

Nitin Gupta et al. Comparative proteogenomics: Combining mass spectrometry and comparative genomics to analyze multiple genomes. Genome Res 2008.

GenoMS (Castellana et al. MCP 2010): this is a program to map peptides to the genome of other related organisms.

Metaproteomics (also Community Proteomics, Environmental Proteomics, or Community Proteogenomics) is the study of all protein samples recovered directly from environmental samples.

This involves simultaneous mapping of peptides to all known genomes and proteomes to get the identity of different organisms present in a sample.
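
The mapping step can be sketched on a toy scale. The example below assumes exact substring matching against small in-memory proteomes, which is far simpler than a real search against all known genomes; the function names, organism names, and sequences are hypothetical. It separates peptides shared between organisms from peptides that point to exactly one organism.

```python
from collections import defaultdict


def assign_peptides(peptides, proteomes):
    """Map each peptide to the set of organisms whose proteins contain it."""
    hits = defaultdict(set)
    for peptide in peptides:
        for organism, proteins in proteomes.items():
            if any(peptide in sequence for sequence in proteins.values()):
                hits[peptide].add(organism)
    return dict(hits)


def organisms_with_unique_evidence(hits):
    """Organisms supported by at least one peptide found in no other organism."""
    return {next(iter(orgs)) for orgs in hits.values() if len(orgs) == 1}


if __name__ == "__main__":
    proteomes = {
        "org_a": {"p1": "MKTAYIAKQR"},
        "org_b": {"p2": "MKTGGGLLNE"},
    }
    hits = assign_peptides(["TAYIAK", "MKT", "GGGLL", "WWWW"], proteomes)
    print(hits)
    print(organisms_with_unique_evidence(hits))
```

Shared peptides such as "MKT" in this toy data cannot identify an organism on their own, which is why the second function only counts peptides with a single source.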

Example work in this field is by Wilmes P, Bond PL. Metaproteomics: studying functional gene expression in microbial ecosystems. Trends Microbiol. 2006.


CSPS (Bandeira et al. Nat. Biot. 2009)

MassBank http://www.massbank.jp/en/document.html

I notice that Hoang's problem, the one which may be able to store multiple reference genomes, is going to be very relevant.

RNA-Seq - Mass Spectrometry = Non-coding RNA?

Anything else?