Oct 4, 2013

In-depth Analysis of Protein Amino Acid Sequence and PTMs with High-resolution Mass Spectrometry

Lian Yang (2); Baozhen Shan (1); Bin Ma (2)

(1) Bioinformatics Solutions Inc, Canada
(2) University of Waterloo, Canada


Protein sequence analysis

Problem

Complete protein sequence coverage
  o antibody confirmation
  o biomarker discovery

Database search software alone is insufficient


Possible reasons for incomplete coverage

“Non-database” peptides
  o unexpected modifications
  o mutated residues
  o novel peptides

Database errors

Meanwhile

A large number of high-quality spectra are not matched.


Proposed workflow for in-depth analysis

A workflow to identify both the database and “non-database” peptides

Objective

Maximize protein sequence coverage
Explain more high-quality MS/MS spectra
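
Sequence coverage, the quantity the workflow tries to maximize, can be sketched as the fraction of protein residues that fall inside at least one identified peptide. The function name and the toy sequence below are illustrative assumptions, not part of the original slides or of PEAKS.

```python
def sequence_coverage(protein: str, peptides: list[str]) -> float:
    """Fraction of residues covered by at least one peptide (exact matches only)."""
    covered = [False] * len(protein)
    for pep in peptides:
        start = protein.find(pep)
        while start != -1:
            # Mark every residue spanned by this occurrence of the peptide.
            for i in range(start, start + len(pep)):
                covered[i] = True
            start = protein.find(pep, start + 1)
    return sum(covered) / len(protein) if protein else 0.0


# Hypothetical toy protein and peptide list, for illustration only.
toy_protein = "MKTAYIAKQRQISFVK"
toy_peptides = ["TAYIAK", "QISFVK"]
print(f"{sequence_coverage(toy_protein, toy_peptides):.0%}")  # prints 75%
```

Peptides carrying modifications or mutations do not match the database sequence exactly, which is one reason a plain database search can leave regions uncovered.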


[Figure: Workflow]

Multiple enzyme

Multiple protein digests with different enzymes
High accuracy MS for both precursor and fragment ions
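
Why several enzymes help can be seen with a simplified in silico digestion: each enzyme cuts at different residues, so the resulting peptides overlap differently along the sequence. The cleavage rules below are deliberately simplified assumptions (trypsin after K or R with no proline exception, LysC after K, GluC after E only), and the toy sequence is hypothetical.

```python
# Simplified cleavage rules: the enzyme cuts on the C-terminal side of these residues.
CLEAVAGE_RESIDUES = {"Trypsin": "KR", "LysC": "K", "GluC": "E"}


def digest(protein: str, enzyme: str) -> list[str]:
    """Split a protein after every cleavage residue (no missed cleavages)."""
    sites = CLEAVAGE_RESIDUES[enzyme]
    peptides, start = [], 0
    for i, residue in enumerate(protein):
        if residue in sites:
            peptides.append(protein[start:i + 1])
            start = i + 1
    if start < len(protein):
        peptides.append(protein[start:])  # C-terminal remainder
    return peptides


toy_protein = "AEGKLTRDEWK"  # hypothetical sequence
for enzyme in CLEAVAGE_RESIDUES:
    print(enzyme, digest(toy_protein, enzyme))
```

A peptide that is too short or too long to be observed in one digest may appear as a well-sized peptide in another.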

PEAKS: powerful software for peptide de novo sequencing by tandem mass spectrometry. Rapid Commun Mass Spectrom. 2003;17(20):2337-42.

Identify de novo sequence tags
Reveal a set of high quality spectra
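
As a rough illustration of the idea behind de novo sequencing (not the PEAKS algorithm itself), the mass differences between consecutive fragment peaks can be matched against monoisotopic residue masses. The function name, the tolerances, and the synthetic peak ladder are assumptions made for this sketch.

```python
# Monoisotopic residue masses in Da, ordered by mass.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}


def read_ladder(peaks: list[float], tol: float = 0.01) -> list[str]:
    """Name the residue(s) whose mass matches each gap between adjacent peaks."""
    tag = []
    for low, high in zip(peaks, peaks[1:]):
        delta = high - low
        matches = [aa for aa, m in RESIDUE_MASS.items() if abs(m - delta) <= tol]
        tag.append("/".join(matches) if matches else "?")
    return tag


# Synthetic ladder built from the residues A, V, K on an arbitrary starting mass.
peaks = [100.0]
for aa in "AVK":
    peaks.append(peaks[-1] + RESIDUE_MASS[aa])

print(read_ladder(peaks, tol=0.01))  # prints ['A', 'V', 'K']
print(read_ladder(peaks, tol=0.05))  # prints ['A', 'V', 'Q/K']
```

The second call shows why high fragment-ion accuracy matters: Q and K differ by only about 0.036 Da, so a loose tolerance cannot tell them apart. L and I are isobaric and stay ambiguous at any tolerance.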

PEAKS DB: De Novo sequencing assisted database search for sensitive and accurate peptide identification. Mol Cell Proteomics 2012; 11:10.1074, 1-8.

Identify database peptides.
Database search result validated by de novo tags
Reveal a set of confident proteins

PeaksPTM: Mass spectrometry-based identification of peptides with unspecified modifications. Journal of Proteome Research 10.7 (2011): 2930-2936

Identify peptides with unexpected modifications
Peptides from the set of confident proteins are “modified” in silico by trying all possible modifications in UNIMOD.
Speed up by de novo tags

For input spectra with
  + highly confident de novo tags
  - no significant database matches
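
The idea of trying modifications in silico can be sketched as a mass-shift lookup: the difference between the observed precursor mass and the unmodified peptide mass is compared against known modification masses. The table below is a small hand-picked subset of common monoisotopic delta masses, not the Unimod database, and the function is a hypothetical helper rather than how PeaksPTM is implemented.

```python
# A few common modifications with monoisotopic delta masses in Da (illustrative subset).
MOD_DELTA = {
    "Deamidation": 0.984016,
    "Methylation": 14.015650,
    "Oxidation": 15.994915,
    "Acetylation": 42.010565,
    "Carbamidomethylation": 57.021464,
    "Phosphorylation": 79.966331,
}


def explain_mass_shift(observed_mass: float, unmodified_mass: float,
                       tol: float = 0.01) -> list[str]:
    """Return modifications whose delta mass matches the observed shift."""
    delta = observed_mass - unmodified_mass
    return [name for name, m in MOD_DELTA.items() if abs(m - delta) <= tol]


# Hypothetical masses: a peptide of 1000.0 Da observed 15.9949 Da heavier.
print(explain_mass_shift(1015.994915, 1000.0))  # prints ['Oxidation']
```

A real search also has to place the modification on a plausible residue and score the fragment ions, which is where the de novo tags can narrow the candidates.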

SPIDER: software for protein identification from sequence tags with de novo sequencing error. J Bioinform Comput Biol. 2005 Jun;3(3):697-716.

Identify peptides with mutations, such as residue insertion, deletion, and substitution.
Screen the protein database to find short sequences similar to de novo tags
Use both the de novo tags and database sequence to reconstruct the most probable sequences that match the spectrum

For input spectra with
  + highly confident de novo tags
  - no significant database matches
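
The screening step can be approximated by sliding a window over a protein sequence and keeping windows within a small edit distance of the de novo tag. This is a plain Levenshtein-distance sketch under that assumption; SPIDER's actual model also accounts for de novo sequencing errors, which this example does not. All names and sequences are hypothetical.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance (insertions, deletions, substitutions each cost 1)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution or match
        prev = cur
    return prev[-1]


def similar_windows(tag: str, protein: str, max_dist: int = 1):
    """List (start, window, distance) for tag-length windows close to the tag."""
    n = len(tag)
    hits = []
    for start in range(len(protein) - n + 1):
        window = protein[start:start + n]
        dist = edit_distance(tag, window)
        if dist <= max_dist:
            hits.append((start, window, dist))
    return hits


# Hypothetical protein and a tag that differs by one substitution (A -> G).
print(similar_windows("TAYIGK", "MKTAYIAKQRQISFVK"))  # prints [(2, 'TAYIAK', 1)]
```

Fixed-length windows find substitutions most directly; handling insertions and deletions well would need windows of varying length.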

Unassigned de novo sequence tags are reported as possible novel peptides

Result integration


In-depth analysis of BSA

Test the workflow with the standard bovine serum albumin

Sample
Pure ALBU_BOVIN from SIGMA
3 digests with Trypsin, LysC, GluC.
LC-MS/MS with Thermo LTQ-Orbitrap XL.

Workflow
Workflow implemented in PEAKS 6
3 digests in one project
Searched database: Swiss-Prot

[Figure: Trypsin, LysC and GluC digests; LC-MS/MS; Workflow]

Result

More PSMs are identified in each additional step:

5,152 MS/MS spectra
1,737 PSMs
906 PSMs
44 PSMs
38 MS/MS spectra (PEAKS ALC score > 70%)

Filtered at 1% FDR: 1,737 -> 2,687 PSMs
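
The reported counts can be checked with a short tally. It assumes the three PSM counts are additive contributions of successive workflow steps, which is consistent with the stated total of 2,687; the function name is illustrative.

```python
def cumulative_counts(counts: list[int]) -> list[int]:
    """Running total of PSMs after each step."""
    totals, running = [], 0
    for n in counts:
        running += n
        totals.append(running)
    return totals


total_spectra = 5152
psms_per_step = [1737, 906, 44]  # in the order listed on the slide

totals = cumulative_counts(psms_per_step)
print(totals)  # prints [1737, 2643, 2687]
print(f"{totals[0] / total_spectra:.1%} -> {totals[-1] / total_spectra:.1%}")
# prints 33.7% -> 52.2%
```

Under this reading, the share of acquired MS/MS spectra with a peptide assignment rises from about one third to just over half.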


BSA coverage

Trypsin + PEAKS DB: 87%
Proposed workflow: 96%

The uncovered 4% is in the protein N-terminal region, which is most likely cleaved off and not in the purchased sample (1).

(1) Specific binding site (Asp-Thr-His-Lys) for Cu(II) ions. T. Peters Jr., F.A. Blumenstock. J. Biol. Chem., 242 (1967), p. 1574

Contaminants

Identified with at least 3 unique peptides.

Human keratin proteins (K2C1_HUMAN and K1C_HUMAN)
Bacterial protein (SSPA_STAAR)
Trypsin (TRY1_BOVIN)


PTMs

Unsuspected modifications identified by PTM search

Three PTMs specified in database search
  » Carbamidomethylation (C)
  » Oxidation (M)
  » Deamidation (NQ)


Mutation

214th amino acid: A -> T

Brown 1975, Fed. Proc. 34:591


Unexplained de novo tags

Might be…
Novel peptides outside of the searched database

KK.QTALVELLK.HK
     |||||||
   DPALVELLKK
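
The seven matching positions in the alignment above can be recovered with a longest-common-substring search between the two sequences. This is a generic dynamic-programming sketch for illustration; it is not claimed to be how the alignment on the slide was produced.

```python
def longest_common_substring(a: str, b: str) -> str:
    """Longest run of consecutive residues shared by both sequences."""
    best_len, best_end = 0, 0
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                # Extend the run that ended at the previous residue pair.
                cur[j] = prev[j - 1] + 1
                if cur[j] > best_len:
                    best_len, best_end = cur[j], i
        prev = cur
    return a[best_end - best_len:best_end]


shared = longest_common_substring("QTALVELLK", "DPALVELLKK")
print(shared, len(shared))  # prints ALVELLK 7
```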


Summary

A software workflow proposed for in-depth protein sequence analysis

Found many things in a “pure” sample
  o Contaminants
  o Unsuspected PTMs
  o Mutations

Improved protein sequence coverage
  o BSA coverage: 87% -> 96%

Explained more high-quality MS/MS spectra
  o Identified MS/MS spectra: 1,737 -> 2,687

Q / A