Slide 1 - Bioinformatics Solutions Inc.


Practical Guide to Significantly Improve Peptide Identification Sensitivity and Accuracy

Bin Ma, CTO

Bioinformatics Solutions Inc.

June 5, 2011.

The Sensitivity and Accuracy Dilemma

[Figure: false and true hits plotted against score]

Publication Guideline


Earlier experiments paid too much attention to sensitivity and not enough to accuracy.


MCP started the guideline in 2004 to ensure accuracy.

People are generally over-optimistic about how reliable their results are.



ABRF iPRG 2011.

iPRG/ABRF 2011 Study

30 out of 45 submissions have FDR much
higher than the required 1%

[Figure: estimated FDR lower bound and estimated FDR upper bound]





PEAKS Achieved both Sensitivity and Accuracy

[Figure: PEAKS marked relative to the 1% FDR line; axis: more peptides in submission]

Outline

1. FDR: pitfalls and solutions
2. De novo sequencing assisted database search
3. Three essential examinations to ensure result quality.

1. FDR: pitfalls and solutions

FDR Estimation

[Diagram: Protein DB (target, decoy), Search Engine, Identified Peptides]

$\mathrm{FDR} = \dfrac{\#\,\text{decoy}}{\#\,\text{target}}$

# decoy hits ≈ # false target hits
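The target-decoy estimate on this slide (FDR = #decoy / #target) can be sketched as follows. This is a minimal illustration, not the PEAKS implementation: the `(score, is_decoy)` record layout, the function name `estimate_fdr`, and the example scores are all assumptions made for the sketch.

```python
from typing import List, Tuple

def estimate_fdr(psms: List[Tuple[float, bool]], threshold: float) -> float:
    """Estimate FDR as #decoy / #target among PSMs scoring at or above threshold.

    Each PSM is a hypothetical (score, is_decoy) pair.
    """
    n_decoy = sum(1 for score, is_decoy in psms if score >= threshold and is_decoy)
    n_target = sum(1 for score, is_decoy in psms if score >= threshold and not is_decoy)
    if n_target == 0:
        return 0.0  # no target hits pass the threshold, nothing to estimate
    return n_decoy / n_target

# Made-up example data: (score, is_decoy)
psms = [(9.1, False), (8.4, False), (7.7, False), (6.2, True),
        (5.9, False), (4.3, True), (3.8, False), (2.5, True)]

for t in (7.0, 5.0, 0.0):
    print(f"threshold {t}: estimated FDR = {estimate_fdr(psms, t):.2f}")
```

Raising the threshold lowers the estimated FDR but also keeps fewer target hits, which is the sensitivity and accuracy trade-off the talk starts from.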



Pitfall 1: Multiple Round Search

Round 1. Fast Search
Round 2. More Sensitive Search

More targets than decoys: # false target hits > # decoy hits → FDR underestimation.
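A rough way to see the size of this effect, assuming random (false) matches land on target and decoy sequences in proportion to how much of each remains in the searched database. That proportionality, the function names, and the numbers below are assumptions for illustration only.

```python
def expected_false_target_hits(decoy_hits: int, target_size: int, decoy_size: int) -> float:
    """Expected # false target hits if false matches are proportional to DB size."""
    if decoy_size <= 0:
        raise ValueError("decoy_size must be positive")
    return decoy_hits * target_size / decoy_size

def reported_vs_actual_fdr(decoy_hits: int, target_hits: int,
                           target_size: int, decoy_size: int):
    """Return (reported FDR, FDR implied by the proportionality assumption)."""
    reported = decoy_hits / target_hits
    actual = expected_false_target_hits(decoy_hits, target_size, decoy_size) / target_hits
    return reported, actual

# Hypothetical second round: three times more target than decoy sequence remains.
reported, actual = reported_vs_actual_fdr(decoy_hits=10, target_hits=1000,
                                          target_size=3000, decoy_size=1000)
print(f"reported FDR = {reported:.3f}, implied FDR = {actual:.3f}")
```

With equal target and decoy sizes the two numbers coincide, which is the situation the plain target-decoy estimate relies on.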

Craig and Beavis 2004. Bioinformatics 20, 1466-67.
Bern and Kil 2011. J Proteome Res. 10, 2123-27.
Everett et al. 2010. J Proteome Res. 9, 700-707.

Our Solution: Decoy Fusion

Fast Search

More Sensitive Search

A decoy sequence is appended to each target protein.

PEAKS DB paper. Submitted.

Equal targets and decoys: # decoy hits ≈ # false target hits
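Building a fused database, with a decoy sequence appended to each target protein, can be sketched as below. The slide does not say how the decoy sequence is generated; reversing the target sequence is an assumption made here, and `fuse_decoys`, `is_decoy_hit`, and the protein name `P1` are hypothetical.

```python
from typing import Dict, Tuple

def fuse_decoys(targets: Dict[str, str]) -> Dict[str, Tuple[str, int]]:
    """Append a decoy sequence to each target protein.

    Returns name -> (fused sequence, boundary), where boundary is the index
    at which the decoy half starts.
    """
    fused = {}
    for name, seq in targets.items():
        decoy = seq[::-1]  # assumption: reversed target sequence used as decoy
        fused[name] = (seq + decoy, len(seq))
    return fused

def is_decoy_hit(peptide_start: int, boundary: int) -> bool:
    """A peptide starting at or after the boundary lies in the decoy half."""
    return peptide_start >= boundary

fused = fuse_decoys({"P1": "MKTAYR"})
print(fused["P1"])
```

Because every protein carries its own decoy half, any round that keeps a protein keeps equal amounts of target and decoy sequence. Peptides that span the boundary are ignored in this sketch.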



Pitfall 2: Mix Protein and Peptide ID

Idea: Peptides on a multi-hit protein get a bonus on their scores to increase sensitivity.

Pitfall: More multi-hit proteins from target DB → more false hits are “saved” from target DB → FDR underestimation.

A weak hit is “saved” due to the bonus. So is this weak false hit.

[Figure labels: decoy hit; target false hit]

Our Solution: Decoy Fusion

Weak false hits are “saved” with approximately equal probability in target and decoy.

Get the sensitivity, but still estimate the FDR correctly.

Pitfall 3: Machine Learning with Decoy

Idea: Re-train the coefficients of the scoring function for every search after knowing the decoy hits.

Pitfall: Risk of over-fitting. For machine learning experts only.

Adjust scoring function to remove decoy hits after search.

Fewer target false hits are removed → FDR underestimation.

[Figure labels: Search; target false hits; decoy hits]

Solutions

1. Don’t use it: judges cannot be players.
or
2. Only use it for very large datasets.
or
3. Train coefficients and reuse; don’t re-train for every search.

PEAKS 5.3


PEAKS DB used all these techniques (and many more) to ensure accuracy while maximizing sensitivity.


Reliable FDR estimation is the top priority in
PEAKS DB design.

2. De novo sequencing assisted database search

An Idea to Improve Score Function

[Figure: false and true hits plotted against score]

Idea: If the de novo sequence matches a DB peptide, the DB peptide is likely to be correct.

De Novo Assisted DB Search

[Figure: # matched amino acids between de novo & DB search plotted against DB search score, with the best separation line; label: x + 4y]

[Figure: false and true hits plotted against score, before and after]

Including de novo matching as a feature gives the score function better discriminative power.

This is just one example of the many new features in PEAKS 5.3 for improving the score function.
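The de novo matching feature can be illustrated with a small sketch. The position-by-position comparison below is a simplification chosen for the example, and `matched_amino_acids`, `combined_score`, the weight value, and the peptide strings are all hypothetical; the slides do not specify how PEAKS aligns the two sequences or weights the feature.

```python
def matched_amino_acids(de_novo: str, db_peptide: str) -> int:
    """Count positions at which the de novo sequence and the DB peptide agree.

    Simplification: compares residues at the same index only.
    """
    return sum(1 for a, b in zip(de_novo, db_peptide) if a == b)

def combined_score(db_score: float, n_matched: int, weight: float) -> float:
    """Linear combination of the DB search score and the de novo match count."""
    return db_score + weight * n_matched

n = matched_amino_acids("LSGEAPK", "LSGQAPK")
print(n)                             # number of agreeing positions
print(combined_score(10.0, n, 0.5))  # weight 0.5 is an arbitrary example value
```

A DB hit with a modest search score but strong de novo agreement moves up under the combined score, while a random hit with little agreement does not.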

… far better than what I could ever squeeze out of my data
- Stefano Gotta, Siena Biotech

[Figure: FDR versus # of PSM, comparing product M and PEAKS DB]





PEAKS DB Workflow

[Diagram: All Spectra, De Novo, DB search, Found? (Yes: DB peptides; No: De novo only)]

De novo helps both to improve DB search and to report novel peptides.

3. Three essential examinations to ensure result quality.

Don’t Trust Software Blindly!


Google “Don’t trust software blindly” returned 5,140,000 results.


As you quality control your experiments,
quality control the software’s results too.

Essential Examination 1

#decoy ≈ #target in low score region

Low #decoy in high score region
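This first examination can be done by comparing decoy and target counts in two score regions. The sketch below assumes PSMs are available as hypothetical `(score, is_decoy)` pairs; the region boundaries and data are made up for illustration.

```python
from typing import List, Optional, Tuple

def decoy_target_ratio(psms: List[Tuple[float, bool]],
                       low: float, high: float) -> Optional[float]:
    """Return #decoy / #target for PSMs with low <= score < high.

    Returns None when the region contains no target hits.
    """
    n_decoy = sum(1 for s, d in psms if low <= s < high and d)
    n_target = sum(1 for s, d in psms if low <= s < high and not d)
    return n_decoy / n_target if n_target else None

# Made-up example data: (score, is_decoy)
psms = [(1.0, True), (1.2, False), (2.0, False), (2.1, True),
        (3.3, True), (3.5, False), (8.0, False), (8.5, False),
        (9.0, False), (9.2, False), (9.5, True)]

print("low score region :", decoy_target_ratio(psms, 0.0, 5.0))
print("high score region:", decoy_target_ratio(psms, 5.0, 10.0))
```

A ratio near 1 in the low score region and a small ratio in the high score region is the pattern the slide describes; a low-region ratio well below 1 suggests decoys are being under-counted.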

Essential Examination 2

High-scoring peptides should have low precursor error.

Precursor error starts to scatter below the threshold.
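A numeric version of this check, as a minimal sketch: compute the precursor error in ppm and compare its spread above and below the score threshold. The ppm formula is the standard relative-error definition; the `(score, error_ppm)` record layout, function names, and numbers are assumptions for illustration.

```python
from statistics import mean
from typing import List, Optional, Tuple

def precursor_error_ppm(observed_mz: float, theoretical_mz: float) -> float:
    """Relative precursor mass error in parts per million."""
    return (observed_mz - theoretical_mz) / theoretical_mz * 1e6

def mean_abs_error_by_score(records: List[Tuple[float, float]],
                            threshold: float) -> Tuple[Optional[float], Optional[float]]:
    """Mean |error| (ppm) for PSMs at or above the threshold and below it."""
    above = [abs(e) for s, e in records if s >= threshold]
    below = [abs(e) for s, e in records if s < threshold]
    return (mean(above) if above else None, mean(below) if below else None)

print(precursor_error_ppm(500.0025, 500.0))  # roughly 5 ppm

# Made-up example data: (score, precursor error in ppm)
records = [(9.0, 1.0), (8.0, -2.0), (3.0, 15.0), (2.0, -25.0)]
print(mean_abs_error_by_score(records, threshold=5.0))
```

If the error above the threshold is not clearly tighter than below it, the threshold or the scoring deserves a closer look.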

Essential Examination 3


Spectrum annotation around score threshold.

Take Home Message


Another year of dedicated work on PEAKS.


Ensured accuracy; maximized sensitivity.


Do the three essential examinations.


They are simple … at least in PEAKS.



“a big step forward”
- Christian Schmelzer, Martin Luther University

Enjoy!

http://www.bioinfor.com/peaks-download-a-pricing