
A Significance Test-Based Feature Selection Method for the Detection of Prostate Cancer from Proteomic Patterns

Qianren (Tim) Xu, M.A.Sc. Candidate

Supervisors: Dr. M. Kamel, Dr. M. M. A. Salama

2

Highlight

- STFS can be applied to any supervised pattern recognition problem.
- Very good performance has been obtained on several benchmark datasets, especially those with a large number of features.

Significance Test-Based Feature Selection (STFS) for proteomic pattern analysis in prostate cancer detection:

- Pipeline: STFS → neural networks → ROC analysis
- Sensitivity 97.1%, specificity 96.8%
- Suggestion of mistaken labels by prostatic biopsy

3

Outline of Part I

Significance Test-Based Feature Selection (STFS) for Supervised Pattern Recognition

- Introduction
- Methodology
- Experimental Results on Benchmark Datasets
- Comparison with MIFS


4

Introduction

Problems with features:

- Large number
- Irrelevance
- Noise
- Correlation

These increase computational complexity and reduce the recognition rate.

5

Mutual Information Feature Selection

MIFS is one of the most important heuristic feature selection methods and can be very useful in any classification system. However, estimating the mutual information is difficult with:

- a large number of features and a large number of classes
- continuous data

6

Problems with Feature Selection Methods

Two key issues:

- Computational complexity
- Lack of optimality

7

Proposed Method

Criterion of feature selection:

Significance of feature = Significant difference × Independence

- Significant difference: pattern separability on individual candidate features
- Independence: noncorrelation between the candidate feature and the already-selected features
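
As a worked restatement (a sketch in the sd/ind notation of the MSDI slide below; the combination by simple product is as stated, but the normalization of each factor is not specified here):

```latex
% Feature significance of a candidate feature f given the already-selected set S:
%   sd(f)    -- significant difference (pattern separability) of f on its own
%   ind(f|S) -- independence of f from the already-selected features
\mathrm{sf}(f) = \mathrm{sd}(f)\times\mathrm{ind}(f\mid S),
\qquad f^{*} = \arg\max_{f\notin S}\ \mathrm{sf}(f)
```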

8

Measurement of Pattern Separability of Individual Features

Statistical significant difference tests, by data type and number of classes:

- Continuous data with normal distribution: t-test (two classes); ANOVA (more than two classes)
- Continuous data with non-normal distribution, or rank data: Mann-Whitney test (two classes); Kruskal-Wallis test (more than two classes)
- Categorical data: Chi-square test
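
A minimal sketch of how each cell of this table can be computed in practice (an illustration assuming SciPy, which the slides do not mention; a significance score could then be derived from the p-value, e.g. 1 - p):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
a = rng.normal(0.0, 1.0, 100)   # feature values in class 1 (toy data)
b = rng.normal(0.5, 1.0, 100)   # feature values in class 2
c = rng.normal(1.0, 1.0, 100)   # feature values in class 3

t, p_t = stats.ttest_ind(a, b)       # continuous, normal, two classes
f, p_f = stats.f_oneway(a, b, c)     # continuous, normal, >2 classes (ANOVA)
u, p_u = stats.mannwhitneyu(a, b)    # non-normal or rank data, two classes
h, p_h = stats.kruskal(a, b, c)      # non-normal or rank data, >2 classes
table = np.array([[30, 20],          # categorical data: contingency table of
                  [10, 40]])         # feature category x class counts
chi2, p_c, dof, expected = stats.chi2_contingency(table)
```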

9

Independence

Independence measures, by data type:

- Continuous data with normal distribution: Pearson correlation
- Continuous data with non-normal distribution, or rank data: Spearman rank correlation
- Categorical data: Pearson contingency coefficient
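
The corresponding measures in code (again a SciPy-based sketch; the slides define independence as noncorrelation, so one natural score, used here as an assumption, is ind = 1 - |r|):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
x = rng.normal(size=200)            # candidate feature
y = 0.6 * x + rng.normal(size=200)  # an already-selected feature

r, _ = stats.pearsonr(x, y)         # continuous data, normal distribution
rho, _ = stats.spearmanr(x, y)      # non-normal or rank data

# Categorical data: Pearson contingency coefficient C = sqrt(chi2 / (chi2 + n))
table = np.array([[30, 20], [10, 40]])
chi2, _, _, _ = stats.chi2_contingency(table)
C = np.sqrt(chi2 / (chi2 + table.sum()))

ind = 1.0 - abs(r)                  # one plausible independence score
```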

10

Selecting Procedure

- MSDI: Maximum Significant Difference and Independence algorithm
- MIC: Monotonically Increasing Curve strategy

11

Maximum Significant Difference and Independence (MSDI) Algorithm

1. Compute the significant difference (sd) of every initial feature.
2. Select the feature with the maximum sd as the first feature.
3. Compute the independence level (ind) between every candidate feature and the already-selected feature(s).
4. Select the feature with the maximum feature significance (sf = sd × ind) as the next feature.
5. Repeat steps 3-4 until the desired number of features has been selected.
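
A compact sketch of the MSDI loop for the two-class continuous case (illustrative only, not the author's code: it fills in 1 - p from a t-test for sd and 1 - |Pearson r| for ind, two choices the slides leave open):

```python
import numpy as np
from scipy import stats

def msdi(X, y, n_select):
    """Greedy MSDI selection. X: (n_samples, n_features); y: binary labels.
    Returns indices of the selected features in selection order."""
    n_features = X.shape[1]
    # Step 1: significant difference sd of every initial feature.
    sd = np.array([1.0 - stats.ttest_ind(X[y == 0, j], X[y == 1, j]).pvalue
                   for j in range(n_features)])
    selected = [int(np.argmax(sd))]            # Step 2: max-sd feature first
    candidates = set(range(n_features)) - set(selected)
    while candidates and len(selected) < n_select:
        best, best_sf = -1, -np.inf
        for j in candidates:
            # Step 3: ind = 1 - max |correlation| with already-selected features.
            r = max(abs(stats.pearsonr(X[:, j], X[:, s])[0]) for s in selected)
            sf = sd[j] * (1.0 - r)             # Step 4: sf = sd x ind
            if sf > best_sf:
                best, best_sf = j, sf
        selected.append(best)
        candidates.remove(best)
    return selected
```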

12

Monotonically Increasing Curve (MIC) Strategy

[Figure: performance curve, rate of recognition vs. number of features, for the feature subset selected by MSDI]

1. Plot the performance curve for the feature subset selected by MSDI.
2. Delete the features that make "no good" contribution to the increase in recognition rate.
3. Repeat until the curve is monotonically increasing.
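
A rough sketch of the MIC pruning loop (an interpretation: the slides do not define "no good" contribution precisely, so here a feature is dropped whenever adding it fails to raise the curve; evaluate is a stand-in for any subset-to-recognition-rate function):

```python
def mic(features, evaluate):
    """Prune an MSDI-ranked feature list until its performance curve
    is monotonically increasing.

    features: feature indices in MSDI selection order.
    evaluate: callable mapping a feature subset to a recognition rate.
    """
    pruned = list(features)
    while True:
        # Performance curve: recognition rate of the first k+1 features.
        rates = [evaluate(pruned[:k + 1]) for k in range(len(pruned))]
        drops = [k for k in range(1, len(pruned)) if rates[k] <= rates[k - 1]]
        if not drops:
            return pruned              # curve is monotonically increasing
        del pruned[drops[0]]           # delete the first unhelpful feature
```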

13

Example I: Handwritten Digit Recognition

- 32-by-32 bitmaps are divided into 8 x 8 = 64 blocks.
- The pixels in each block are counted.
- Thus an 8 x 8 matrix is generated, i.e., 64 features.
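
A small sketch of this block-counting step (assuming a binary 32x32 NumPy bitmap, as in the preprocessing this slide describes):

```python
import numpy as np

def block_counts(bitmap):
    """Reduce a 32x32 binary bitmap to 64 features: the count of set
    pixels in each of the 8x8 = 64 non-overlapping 4x4 blocks."""
    assert bitmap.shape == (32, 32)
    return bitmap.reshape(8, 4, 8, 4).sum(axis=(1, 3)).ravel()

features = block_counts(np.random.randint(0, 2, (32, 32)))
print(features.shape)  # (64,)
```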

14

Performance Curve

[Figure: rate of recognition vs. number of features (0-60) for MSDI, MIFS (β = 0.2, 0.4, 0.6, 0.8, 1.0), and random ranking]

- Battiti's MIFS: β needs to be determined.
- MSDI: Maximum Significant Difference and Independence
- MIFS: Mutual Information Feature Selector

15

Computational Complexity

Selecting 15 features from the 64-feature original set:

- MSDI: 24 seconds
- Battiti's MIFS: 1110 seconds (5 values of β searched in the range 0-1)

16

Example II: Handwritten Digit Recognition

The 649 features are distributed over the following six feature sets:

- 76 Fourier coefficients of the character shapes
- 216 profile correlations
- 64 Karhunen-Loève coefficients
- 240 pixel averages in 2 x 3 windows
- 47 Zernike moments
- 6 morphological features

17

Performance Curve

[Figure: rate of recognition vs. number of features (0-50) for MSDI + MIC, MSDI alone, and random ranking]

- MSDI: Maximum Significant Difference and Independence
- MIC: Monotonically Increasing Curve


18

Comparison with MIFS

[Figure: rate of recognition vs. number of features (0-50) for MSDI, MIFS (β = 0.2), and MIFS (β = 0.5)]

- MSDI is much better with a large number of features.
- MIFS is better with a small number of features.
- MSDI: Maximum Significant Difference and Independence
- MIFS: Mutual Information Feature Selector

19

Summary on Comparing MSDI with MIFS

- MSDI is much more computationally efficient:
  - MIFS needs to estimate the pdfs.
  - Even the computationally efficient criterion (Battiti's MIFS) still requires β to be determined.
  - MSDI involves only simple statistical calculations.
- MSDI can select a more nearly optimal feature subset from a large number of features, because it is based on statistical models matched to the data.
- MIFS is more suitable for small volumes of data and small feature subsets.

20

Outline of Part II

Mass Spectrometry-Based Proteomic Pattern Analysis for Detection of Prostate Cancer

- Problem Statement
- Methods
  - Feature selection
  - Classification
  - Optimization
- Results and Discussion

21

Problem Statement

1. Very large number of features: 15,154 points (features) per spectrum
2. Electronic and chemical noise
3. Biological variability of human disease
4. Little knowledge of the proteomic mass spectrum

22

The System of Proteomic Pattern Analysis

Training dataset (initial features > 10^4)
→ Most significant features selected by STFS
→ RBFNN / PNN learning
→ Trained neural classifier
→ Optimization of the size of the feature subset and the parameters of the classifier by minimizing the ROC distance
→ Mature classifier

- STFS: Significance Test-Based Feature Selection
- PNN: Probabilistic Neural Network
- RBFNN: Radial Basis Function Neural Network

23

Feature Selection: STFS

Significance of feature = Significant difference × Independence

- Significant difference: Student's t-test
- Independence: Pearson correlation
- Selection procedure: MSDI, then MIC

- STFS: Significance Test-Based Feature Selection
- MSDI: Maximum Significant Difference and Independence Algorithm
- MIC: Monotonically Increasing Curve Strategy

24

Classification: PNN / RBFNN

[Figure: four-layer network diagrams with inputs x1 ... xn, pattern units, class pools 1 and 2 with summation units S1 and S2, and outputs y(1), y(2), y, yd]

- PNN is a standard structure with four layers.
- RBFNN is a modified four-layer structure.
- PNN: Probabilistic Neural Network
- RBFNN: Radial Basis Function Neural Network
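
A minimal PNN sketch for the two-class case (a standard Parzen-window formulation with Gaussian spread sigma, the parameter tuned on the next slide; the author's specific RBFNN modification is not reproduced here):

```python
import numpy as np

def pnn_predict(X_train, y_train, x, sigma):
    """Classify sample x with a probabilistic neural network.

    Pattern layer: one Gaussian unit per training sample.
    Summation layer: one pool per class, averaging its units' activations.
    Output layer: the class with the larger pooled activation wins.
    """
    classes = np.unique(y_train)
    scores = []
    for c in classes:
        d2 = ((X_train[y_train == c] - x) ** 2).sum(axis=1)
        scores.append(np.exp(-d2 / (2.0 * sigma ** 2)).mean())
    return classes[int(np.argmax(scores))]
```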

25

Optimization: ROC Distance

Minimizing the ROC distance d_ROC to optimize:

- the feature subset size m
- the Gaussian spread σ
- the RBFNN pattern decision weight λ

[Figure: ROC plane, true positive rate (sensitivity) vs. false positive rate (1 - specificity), with d_ROC the distance from the operating point to the ideal corner and a, b its legs]

ROC: Receiver Operating Characteristic
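
One plausible reading of the diagram, consistent with common ROC practice (an assumption: d_ROC as the Euclidean distance from the operating point to the ideal corner (0, 1), with legs a = 1 - sensitivity and b = 1 - specificity):

```python
import numpy as np

def roc_distance(sensitivity, specificity):
    """Distance from the ROC operating point to the ideal corner (0, 1)."""
    a = 1.0 - sensitivity   # miss rate
    b = 1.0 - specificity   # false positive rate
    return np.hypot(a, b)

print(roc_distance(0.971, 0.968))  # ~0.043 for the reported operating point
```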

26

Results: Sensitivity and Specificity

Method             Sensitivity   Specificity
Our results        97.1%         96.8%
Petricoin (2002)   94.7%         75.9%
DRE                55-68%        6-33%
PSA                29-80%        --

27

Pattern Distribution

[Figure: distributions of RBFNN output scores for the non-cancer and cancer groups, as labelled by biopsies, with the decision cut-point marked]

- True positive 97.1%; false negative 2.9%
- True negative 96.8%; false positive 3.2%

28

The Possible Causes of the Unrecognizable Samples

1. The algorithm of the classifier is not able to recognize all the samples.
2. The proteomics is not able to provide enough information.
3. Prostatic biopsies mistakenly label the cancer.

29

Possibility of Mistaken Diagnosis of Prostatic Biopsy

[Figure: score distributions with the cut-point separating true non-cancer, false non-cancer, false cancer, and true cancer samples]

- Biopsy has limited sensitivity and specificity.
- The proteomic classifier has very high sensitivity and specificity relative to the biopsy labels.
- The results of the proteomic classifier are not exactly the same as those of biopsy.
- All unrecognizable samples are outliers.



31

Summary (1)

Significance Test-Based Feature Selection (STFS):

- STFS selects features by maximum significant difference and independence (MSDI); it aims to determine the minimum possible feature subset that achieves the maximum recognition rate.
- Feature significance (the selection criterion) is estimated with the statistical models best matched to the properties of the data.
- Advantages:
  - Computationally efficient
  - Optimality

32

Summary (2)

Proteomic Pattern Analysis for Detection of Prostate Cancer:

- The system consists of three parts: feature selection by STFS, classification by PNN/RBFNN, and optimization and evaluation by minimum ROC distance.
- With sensitivity 97.1% and specificity 96.8%, the system could be an asset for early and accurate detection of prostate cancer, and could spare a large number of aging men unnecessary prostatic biopsies.
- The suggestion of mistaken labels by prostatic biopsy through pattern analysis may lead to a novel direction in prostate cancer diagnostic research.

33

Thanks for your time

Questions?