W. Art Chaovalitwongse

30 Nov 2013

Medical Diagnosis Decision-Support System: Optimizing Pattern Recognition of Medical Data

W. Art Chaovalitwongse


Industrial & Systems Engineering


Rutgers University




Center for Discrete Mathematics & Theoretical Computer Science (DIMACS)


Center for Advanced Infrastructure & Transportation (CAIT)


Center for Supply Chain Management, Rutgers Business School

This work is supported in part by research grants from NSF CAREER CCF-0546574 and the Rutgers Computing Coordination Council (CCC).

Outline


Introduction
Classification: Model-Based versus Pattern-Based
Medical Diagnosis
Pattern-Based Classification Framework
Application in Epilepsy
Seizure (Event) Prediction
Identify epilepsy and non-epilepsy patients
Application in Other Diagnosis Data
Conclusion and Envisioned Outcome

Pattern Recognition: Classification

[Figure: samples of the Positive Class and the Negative Class, and an unlabeled sample marked "?"]

Supervised learning: A class (category) label for each pattern in the training set is provided.


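Model-Based Classification

Linear Discriminant Function

\[ g_i(\mathbf{x} \mid \mathbf{w}_i, w_{i0}) = \mathbf{w}_i^{T}\mathbf{x} + w_{i0} = \sum_{j=1}^{d} w_{ij} x_j + w_{i0} \]

Support Vector Machines

\[ \min_{\mathbf{w}} \; L(\mathbf{w}) = \frac{\|\mathbf{w}\|^{2}}{2} + C \sum_{i=1}^{N} \xi_i^{k} \]

subject to

\[ f(\mathbf{x}_i) = \begin{cases} 1 & \text{if } \mathbf{w}\cdot\mathbf{x}_i + b \ge 1 - \xi_i \\ -1 & \text{if } \mathbf{w}\cdot\mathbf{x}_i + b \le -1 + \xi_i \end{cases} \]

Neural Networks

\[ g_k(\mathbf{x}) \equiv z_k = f\!\left( \sum_{j=1}^{n_H} w_{kj}\, f\!\left( \sum_{i=1}^{d} w_{ji} x_i + w_{j0} \right) + w_{k0} \right) \]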
Tid  Refund  Marital Status  Taxable Income  Cheat
1    Yes     Single          125K            No
2    No      Married         100K            No
3    No      Single          70K             No
4    Yes     Married         120K            No
5    No      Divorced        95K             Yes
6    No      Married         60K             No
7    Yes     Divorced        220K            No
8    No      Single          85K             Yes
9    No      Married         75K             No
10   No      Single          90K             Yes

Rows are samples; Refund, Marital Status, and Taxable Income are attributes; Cheat is the class (category).

Support Vector Machine


A and B are data matrices of the normal and pre-seizure samples, respectively

e is the vector of ones




is a vector of real numbers




is a scalar


u, v are the misclassification errors

Mangasarian, Operations Research (1965); Bradley et al., INFORMS J. of Computing (1999)

Pattern-Based Classification: Nearest Neighbor Classifiers

Basic idea: If it walks like a duck and quacks like a duck, then it's probably a duck.

[Figure: training records and a test record. Compute the distance from the test record to the training records, then choose k of the "nearest" records.]

Traditional Nearest Neighbor

[Figure: a record X with its (a) 1-nearest neighbor, (b) 2-nearest neighbors, (c) 3-nearest neighbors]

The k-nearest neighbors of a record x are the data points that have the k smallest distances to x.
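
A minimal sketch of the k-nearest-neighbor rule described above, assuming Euclidean distance and a simple majority vote. The function name `knn_classify` and the toy records are illustrative and not taken from the slides.

```python
import math
from collections import Counter

def knn_classify(train, labels, x, k=3):
    """Return the majority label among the k training records nearest to x."""
    # Pair each training record's Euclidean distance to x with its label,
    # then sort so the nearest records come first.
    dists = sorted((math.dist(rec, x), lab) for rec, lab in zip(train, labels))
    votes = Counter(lab for _, lab in dists[:k])
    return votes.most_common(1)[0][0]

# Toy data (hypothetical): three positive records near (1, 1), two negative near (4, 4).
train = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (4.0, 4.2), (4.1, 3.9)]
labels = ["positive", "positive", "positive", "negative", "negative"]
prediction = knn_classify(train, labels, (1.1, 1.0), k=3)
print(prediction)
```

Any other distance (for example a time series distance) can be swapped in for `math.dist` without changing the voting logic.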

Drawbacks

Feature Selection
  Sensitive to noisy features
  Optimizing feature selection: n features give 2^n combinations, a combinatorial optimization problem

Unbalanced Data
  Biased toward the class (category) with more samples
  Distance-weighted nearest neighbors
  Pick the k nearest neighbors from each class (category) to the training sample and compare the average distances.
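
The per-class comparison described above can be sketched as follows. This is an illustration under assumptions (Euclidean distance, a hypothetical function name `classwise_knn`, made-up records), not the authors' implementation. Because the same number of neighbors is taken from every class, the larger class cannot win by sheer count.

```python
import math

def classwise_knn(train, labels, x, k=2):
    """Assign x to the class whose k nearest members are closest on average."""
    avg_dist = {}
    for cls in set(labels):
        # Distances from x to the members of this class only, nearest first.
        d = sorted(math.dist(rec, x) for rec, lab in zip(train, labels) if lab == cls)
        nearest = d[:k]
        avg_dist[cls] = sum(nearest) / len(nearest)
    winner = min(avg_dist, key=avg_dist.get)
    return winner, avg_dist

# Unbalanced toy data (hypothetical): 2 positive records, 6 negative records.
train = [(0.0, 0.0), (0.2, 0.0),
         (1.0, 0.0), (1.1, 0.0), (1.2, 0.0), (1.0, 0.1), (1.1, 0.1), (1.2, 0.1)]
labels = ["positive"] * 2 + ["negative"] * 6
winner, avg_dist = classwise_knn(train, labels, (0.3, 0.0), k=2)
print(winner, avg_dist)
```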


Multidimensional Time Series Classification in Medical Data

Positive versus Negative
Responsive versus Unresponsive

Multidimensional Time Series Classification
  Multisensor medical signals (e.g., EEG, ECG, EMG)
  A fully multivariate analysis is ideal but computationally intractable
  Physicians commonly use baseline data as a reference for diagnosis
  The use of baseline data naturally lends itself to nearest neighbor classification

[Figure: Normal and Abnormal baseline samples and an unlabeled sample marked "?"]

Ensemble Classification for Multidimensional Time Series Data

Use each electrode as a base classifier
  Each base classifier makes its own decision

Multiple decision makers: how to combine them?
  Voting on the final decision
  Averaging the prediction score

Suppose there are 25 base classifiers
  Each classifier has error rate \varepsilon = 0.35
  Assume the classifiers are independent
  Probability that the ensemble classifier makes a wrong prediction (voting):

\[ \sum_{i=13}^{25} \binom{25}{i} \varepsilon^{i} (1 - \varepsilon)^{25 - i} = 0.06 \]

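Modified K-Nearest Neighbor for MDTS

[Figure: two multichannel samples X and Y are compared channel by channel (Ch 1, Ch 2, Ch 3, ..., Ch n) through a distance D(X,Y); the unlabeled sample is matched against Normal and Abnormal baseline samples with K = 3]

Time series distances: (1) Euclidean, (2) T-Statistical, (3) Dynamic Time Warping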

Dynamic Time Warping (DTW)

The minimum-distance warp path is the optimal alignment of two time series, where the distance of a warp path W is:

\[ Dist(W) = \sum_{k=1}^{K} Dist(w_{k,s}, w_{k,t}) \]

Dist(W) is the Euclidean distance of warp path W.

Dist(w_{k,s}, w_{k,t}) is the distance between the two data point indices (from L_i and L_j) in the k-th element of the warp path.

Dynamic Programming:

\[ D(s,t) = Dist(L_i^{s}, L_j^{t}) + \min\{\, D(s-1,t),\; D(s,t-1),\; D(s-1,t-1) \,\} \]

The optimal warping distance is D(30,30).

[Figure B is from Keogh and Pazzani, SDM (2001).]
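
The dynamic-programming recursion for the optimal warping distance can be sketched as below. This is a plain O(n·m) illustration for one-dimensional series; the absolute difference used as the pointwise distance and the function name `dtw_distance` are assumptions, not details taken from the slides.

```python
def dtw_distance(a, b):
    """Dynamic time warping distance between two 1-D sequences."""
    n, m = len(a), len(b)
    inf = float("inf")
    # D[s][t] holds the cost of the best warp path aligning a[:s] with b[:t].
    D = [[inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for s in range(1, n + 1):
        for t in range(1, m + 1):
            cost = abs(a[s - 1] - b[t - 1])  # pointwise distance (assumed)
            # Extend the cheapest of the three admissible predecessor cells.
            D[s][t] = cost + min(D[s - 1][t], D[s][t - 1], D[s - 1][t - 1])
    return D[n][m]

# A series and a time-stretched copy of it align perfectly under DTW.
print(dtw_distance([1, 2, 3], [1, 2, 2, 3]))
```

For two 30-point samples the table has 30 x 30 interior cells, and the last cell filled is the optimal warping distance.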

Optimizing Pattern Recognition

[Figure: two flow diagrams.
Traditional Pattern-Based Classification, with boxes: Baseline Data; Signal Processing (Feature Extraction); Extracted Features; Cleansed Data; Feature Selection; Selected Features of All Baseline Data; Classifying New Samples.
Proposed Pattern-Based Classification, with boxes: Baseline Data; Signal Processing (Feature Extraction); Extracted Features; Selecting Good Baseline Data and Deleting Outliers; Cleansed Data; Integrated Feature Selection & Pattern Matching Optimization; Optimally Selected Features of Optimized Baseline Data; Classifying New Samples.]

Support Feature Machine

Given an unlabeled sample A, we calculate the average statistical distances from A to the Normal samples and from A to the Abnormal samples in the baseline (training) dataset, per electrode (channel).

Statistical distances: Euclidean, T-statistics, Dynamic Time Warping

Combining all electrodes, A is classified into the group (normal or abnormal) that yields
  the minimum average statistical distance; or
  the maximum number of votes

Can we optimize the selection of a subset of electrodes so that it maximizes the number of correctly classified samples?


SFM: Averaging and Voting

Two distances for each sample at each electrode are calculated:
  Intra-Class: average distance from each sample to all other samples in the same class at Electrode j
  Inter-Class: average distance from each sample to all other samples in the different class at Electrode j

Averaging: If, for Sample i (on average over the selected electrodes),
  average intra-class distance over all electrodes < average inter-class distance over all electrodes,
we claim that Sample i is correctly classified.

Voting: If, for Sample i at Electrode j, intra-class distance < inter-class distance, the electrode casts a good vote. Based on the selected electrodes, if # of good votes > # of bad votes, then Sample i is correctly classified.

Chaovalitwongse et al., KDD (2007) and Chaovalitwongse et al., Operations Research (forthcoming)
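
A minimal sketch of the two distances and the two decision rules, under stated assumptions: each sample is a list of per-electrode time series, Euclidean distance stands in for whichever time series distance is used, and the names `intra_inter_distances` and `classified_correctly` are hypothetical.

```python
import math

def intra_inter_distances(samples, labels):
    """Return (intra, inter): average same-class and other-class distances per sample and electrode."""
    n, m = len(samples), len(samples[0])
    intra = [[0.0] * m for _ in range(n)]
    inter = [[0.0] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            same = [math.dist(samples[i][j], samples[k][j])
                    for k in range(n) if k != i and labels[k] == labels[i]]
            diff = [math.dist(samples[i][j], samples[k][j])
                    for k in range(n) if labels[k] != labels[i]]
            intra[i][j] = sum(same) / len(same)
            inter[i][j] = sum(diff) / len(diff)
    return intra, inter

def classified_correctly(intra_i, inter_i, selected, rule):
    """Apply the averaging or voting rule to one sample over the selected electrodes."""
    if rule == "averaging":
        # Comparing sums is equivalent to comparing averages over the same electrodes.
        return sum(intra_i[j] for j in selected) < sum(inter_i[j] for j in selected)
    good = sum(1 for j in selected if intra_i[j] < inter_i[j])
    return good > len(selected) - good

# Toy data (hypothetical): 4 samples x 2 electrodes; electrode 0 separates the classes, electrode 1 does not.
samples = [
    [[0, 0], [5, 5]],
    [[0, 1], [0, 0]],
    [[4, 4], [5, 4]],
    [[4, 5], [0, 1]],
]
labels = ["normal", "normal", "abnormal", "abnormal"]
intra, inter = intra_inter_distances(samples, labels)
print(classified_correctly(intra[0], inter[0], [0], "voting"))
```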

Distance Averaging: Training

[Figure: intra-class and inter-class distances d_{i1}, d_{i2}, ..., d_{im} of Sample i at Feature 1, Feature 2, ..., Feature m]

Select a subset of features s \subseteq \{1, 2, \ldots, m\} such that

\[ \sum_{j \in s} d^{\text{intra}}_{ij} < \sum_{j \in s} d^{\text{inter}}_{ij} \]

for as many samples as possible.

Majority Voting: Training

a_{ij} = 1 (Correct) if d^{\text{intra}}_{ij} < d^{\text{inter}}_{ij}; a_{ij} = 0 (Incorrect) otherwise.

[Figure: Positive and Negative samples at Feature j, showing the intra-class and inter-class distances of a sample i with a_{ij} = 1 and of a sample i' with a_{i'j} = 0]

SFM Optimization Model

n = total number of samples.
m = total number of electrodes.

y_i = 1 if sample i is correctly classified; 0 otherwise, for i = 1, ..., n.
x_j = 1 if electrode j is selected; 0 otherwise, for j = 1, ..., m.

Intra-Class: d^{\text{intra}}_{ij} = average distance from sample i to all other samples in the same class, for i = 1, ..., n and j = 1, ..., m.
Inter-Class: d^{\text{inter}}_{ij} = average distance from sample i to all other samples in the different class, for i = 1, ..., n and j = 1, ..., m.

Chaovalitwongse et al., KDD (2007) and Chaovalitwongse et al., Operations Research (forthcoming)





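Averaging SFM

\[
\begin{aligned}
\max \quad & \sum_{i=1}^{n} y_i \\
\text{s.t.} \quad & \sum_{j=1}^{m} d^{\text{inter}}_{ij} x_j - \sum_{j=1}^{m} d^{\text{intra}}_{ij} x_j \le M_1 y_i && \text{for } i = 1, \ldots, n \\
& \sum_{j=1}^{m} d^{\text{intra}}_{ij} x_j - \sum_{j=1}^{m} d^{\text{inter}}_{ij} x_j \le M_2 (1 - y_i) && \text{for } i = 1, \ldots, n \\
& \sum_{j=1}^{m} x_j \ge 1 \\
& x_j \in \{0,1\} \ \text{for } j = 1, \ldots, m; \qquad y_i \in \{0,1\} \ \text{for } i = 1, \ldots, n
\end{aligned}
\]

Here d^{\text{intra}}_{ij} and d^{\text{inter}}_{ij} are the intra-class and inter-class average distances of sample i at electrode j.

Objective: maximize the number of correctly classified samples.
First two constraints: logical constraints on intra-class and inter-class distances if a sample is correctly classified.
Third constraint: must select at least one electrode.

Chaovalitwongse et al., KDD (2007) and Chaovalitwongse et al., Operations Research (forthcoming)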
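
For intuition only, the averaging model can be solved on tiny instances by enumerating every non-empty electrode subset and counting the samples whose summed intra-class distance is smaller than the summed inter-class distance. This brute-force search is a stand-in for the integer program, practical only for a handful of electrodes since there are 2^m - 1 subsets. The function name and the distance values are hypothetical.

```python
from itertools import combinations

def averaging_sfm_bruteforce(intra, inter):
    """Return (best_subset, n_correct) by exhaustive search over electrode subsets."""
    n, m = len(intra), len(intra[0])
    best_subset, best_correct = None, -1
    for size in range(1, m + 1):  # at least one electrode must be selected
        for subset in combinations(range(m), size):
            correct = sum(
                1 for i in range(n)
                if sum(intra[i][j] for j in subset) < sum(inter[i][j] for j in subset)
            )
            if correct > best_correct:  # keep the first subset reaching the best count
                best_subset, best_correct = subset, correct
    return best_subset, best_correct

# Hypothetical distance matrices: 4 samples x 3 electrodes.
intra = [[1, 7, 2], [1, 6, 3], [2, 5, 1], [1, 8, 4]]
inter = [[5, 3, 3], [6, 2, 2], [4, 4, 2], [5, 2, 3]]
best_subset, n_correct = averaging_sfm_bruteforce(intra, inter)
print(best_subset, n_correct)
```

On realistic problem sizes a mixed-integer programming solver would be used instead of enumeration.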





Voting SFM

\[
\begin{aligned}
\max \quad & \sum_{i=1}^{n} y_i \\
\text{s.t.} \quad & \sum_{j=1}^{m} a_{ij} x_j - \sum_{j=1}^{m} \frac{x_j}{2} \le M_1 y_i && \text{for } i = 1, \ldots, n \\
& \sum_{j=1}^{m} \frac{x_j}{2} - \sum_{j=1}^{m} a_{ij} x_j \le M_2 (1 - y_i) && \text{for } i = 1, \ldots, n \\
& \sum_{j=1}^{m} x_j \ge 1 \\
& x_j \in \{0,1\} \ \text{for } j = 1, \ldots, m; \qquad y_i \in \{0,1\} \ \text{for } i = 1, \ldots, n
\end{aligned}
\]

Objective: maximize the number of correctly classified samples.
First two constraints: logical constraints; a sample must win the voting if it is correctly classified.
Third constraint: must select at least one electrode.

a_{ij} = 1 if sample i is correctly classified at electrode j (good vote); 0 otherwise (bad vote), for i = 1, ..., n and j = 1, ..., m.

The precision matrix A contains the elements a_{ij}.

Chaovalitwongse et al., KDD (2007) and Chaovalitwongse et al., Operations Research (forthcoming)
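
A small illustration of the voting model, again by exhaustive search rather than integer programming. A sample counts as correctly classified when its good votes outnumber its bad votes among the selected electrodes. The precision matrix below is made up, and `voting_sfm_bruteforce` is a hypothetical name.

```python
from itertools import combinations

def voting_sfm_bruteforce(a):
    """Return (best_subset, n_correct) for a 0/1 precision matrix a[i][j]."""
    n, m = len(a), len(a[0])
    best_subset, best_correct = None, -1
    for size in range(1, m + 1):  # at least one electrode must be selected
        for subset in combinations(range(m), size):
            # Good votes > bad votes  <=>  2 * good votes > number of selected electrodes.
            correct = sum(
                1 for i in range(n)
                if 2 * sum(a[i][j] for j in subset) > len(subset)
            )
            if correct > best_correct:
                best_subset, best_correct = subset, correct
    return best_subset, best_correct

# Hypothetical precision matrix: 4 samples x 3 electrodes.
# Each electrode alone classifies 3 samples correctly; all three together classify 4.
a = [[1, 1, 0],
     [1, 0, 1],
     [0, 1, 1],
     [1, 1, 1]]
best_subset, n_correct = voting_sfm_bruteforce(a)
print(best_subset, n_correct)
```

In this toy matrix no single electrode is right on every sample, yet the three-electrode vote is, which mirrors the ensemble argument made earlier in the deck.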

Support Feature Machine

Training (Normal Samples and Abnormal Samples)
Step 1: For each individual feature (electrode), apply the nearest neighbor rule to every training sample to construct the distance and accuracy matrices (distance matrices and voting matrix).
Step 2: Formulate and solve the SFM models and obtain the optimal feature (electrode) selection (x: electrode selection; y: training accuracy).

Testing (Unlabeled Samples)
Step 3: Employ the nearest neighbor rule to classify unlabeled data to the closest baseline (training) data based on the selected features (electrodes).

[Figure: Support Feature Machine versus Support Vector Machine. One panel shows Pre-Seizure and Normal samples channel by channel (Ch 1, Ch 2, Ch 3, ..., Ch n); the other shows a data vector of an EEG sample (1, 2, 3, 4, ..., n) in a feature space (Feature 1, Feature 2, Feature 3).]

Application in Epilepsy Diagnosis

Facts about Epilepsy

About 3 million Americans and another 60 million people worldwide (about 1% of the population) suffer from epilepsy.

Epilepsy is the second most common brain disorder (after stroke); it causes recurrent seizures (not vice versa).

Seizures usually occur spontaneously, in the absence of external triggers.

Epileptic seizures occur when a massive group of neurons in the cerebral cortex suddenly begins to discharge in a highly organized rhythmic pattern.

Seizures cause temporary disturbances of brain functions such as motor control, responsiveness, and recall, which typically last from seconds to a few minutes.

Based on 1995 estimates, epilepsy imposes an annual economic burden of $12.5 billion* in the U.S. in associated health care costs and losses in employment, wages, and productivity.

Cost per patient ranged from $4,272 for persons** with remission after initial diagnosis and treatment to $138,602 for persons** with intractable and frequent seizures.

*Begley et al., Epilepsia (2000); **Begley et al., Epilepsia (1994).

Simplified EEG System and Intracranial Electrode Montage

[Figure: intracranial electrode montage with numbered contacts on the electrode groups LTD, RTD, LOF, LST, ROF, and RST]

Electroencephalogram (EEG) is a traditional tool for evaluating the physiological state of the brain by measuring voltage potentials produced by brain cells while communicating.

Scalp EEG Acquisition

[Figure: scalp electrode placement (F3, F7, Fp2, Fp1, Fz, F8, F4, C3, T3, C4, T4, P4, Pz, P3, T5, T6, O2, Oz, O1); 18 Bipolar Channels]

Goals: How can we help?

Seizure Prediction
  Recognizing (data-mining) abnormality patterns in EEG signals preceding seizures
  Normal versus Pre-Seizure
  Alert when pre-seizure samples are detected (online classification)
  e.g., statistical process control in production systems, attack alerts from sensor data, stock market analysis

EEG Classification: Routine EEG Check
  Quickly identify whether patients have epilepsy
  Epilepsy versus Non-Epilepsy
  Many causes of seizures: convulsive or other seizure-like activity can be non-epileptic in origin and is observed in many other medical conditions. These non-epileptic seizures can be hard to differentiate and may lead to misdiagnosis.
  e.g., medical check-up, normal and abnormal samples

Normal versus Pre-Seizure

10-second EEGs: Seizure Evolution

[Figure: 10-second EEG traces for the Normal, Pre-Seizure, Seizure Onset, and Post-Seizure states]

Chaovalitwongse et al., Annals of Operations Research (2006)

Normal versus Pre-Seizure: Data Set

EEG Dataset Characteristics

Patient ID  Seizure types  Duration of EEG (days)  # of seizures
1           CP, SC         3.55                    7
2           CP, GTC, SC    10.93                   7
3           CP             8.85                    22
4           ,SC            5.93                    19
5           CP, SC         13.13                   17
6           CP, SC         11.95                   17
7           CP, SC         3.11                    9
8           CP, SC         6.09                    23
9           CP, SC         11.53                   20
10          CP             9.65                    12
Total                      84.71                   153

CP: Complex Partial; SC: Subclinical; GTC: Generalized Tonic/Clonic

Sampling Procedure

Randomly and uniformly sample 3 EEG epochs per seizure from each of the normal and pre-seizure states.
  For example, Patient 1 has 7 seizures, so 21 normal and 21 pre-seizure EEG epochs are sampled.

Use leave-one(seizure)-out cross validation to perform training and testing.
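
The leave-one-seizure-out scheme can be sketched as a simple grouped split: all epochs tied to one seizure are held out for testing while the epochs of the remaining seizures are used for training. The tuple layout `(seizure_id, state, data)` and the function name are assumptions made for illustration.

```python
def leave_one_seizure_out(epochs):
    """Yield (held_out_seizure, train, test) for epochs given as (seizure_id, state, data)."""
    seizure_ids = sorted({sid for sid, _, _ in epochs})
    for held_out in seizure_ids:
        train = [e for e in epochs if e[0] != held_out]
        test = [e for e in epochs if e[0] == held_out]
        yield held_out, train, test

# Patient 1 example: 7 seizures, 3 normal and 3 pre-seizure epochs per seizure.
# The EEG data itself is left as None in this sketch.
epochs = [
    (sid, state, None)
    for sid in range(1, 8)
    for state in ("normal", "pre-seizure")
    for _ in range(3)
]
folds = list(leave_one_seizure_out(epochs))
print(len(epochs), len(folds))
```

Holding out whole seizures keeps epochs from the same event out of both sets at once, so the test epochs are never near-duplicates of training epochs.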


[Figure: timeline of the EEG recording (Duration of EEG) with two seizures; 30-minute windows are labeled Pre-seizure, and 8-hour windows around the seizures delimit the Normal state]

Information/Feature Extraction from EEG Signals

Measure the brain dynamics from EEG signals
  Apply dynamical measures (based on chaos theory) to non-overlapping EEG epochs of 10.24 seconds = 2048 points.

Maximum Short-Term Lyapunov Exponent
  Measures the stability/chaoticity of EEG signals
  Measures the average uncertainty along the local eigenvectors and phase differences of an attractor in the phase space

Pardalos, Chaovalitwongse, et al., Math Programming (2004)

[Figure: EEG voltage versus time]

Evaluation

Sensitivity measures the fraction of positive cases that are classified as positive.
Specificity measures the fraction of negative cases that are classified as negative.

Sensitivity = TP/(TP+FN)
Specificity = TN/(TN+FP)

Type I error = 1 - Specificity
Type II error = 1 - Sensitivity

Chaovalitwongse et al., Epilepsy Research (2005)
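
A short sketch of how these measures follow from the confusion counts. The labels are made up, with 1 standing for the positive class (for example pre-seizure) and 0 for the negative class; the function name is illustrative.

```python
def sensitivity_specificity(y_true, y_pred, positive=1):
    """Return (sensitivity, specificity) computed from true and predicted labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    return tp / (tp + fn), tn / (tn + fp)

# Hypothetical labels: 4 positive and 6 negative cases.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 1, 1]
sens, spec = sensitivity_specificity(y_true, y_pred)
type_i_error = 1 - spec    # false positive rate
type_ii_error = 1 - sens   # false negative rate
print(sens, spec)
```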


Leave-One-Seizure-Out Cross Validation

[Figure: assume there are 5 seizures in the recordings. N1 to N5 are EEGs from the Normal state and P1 to P5 are EEGs from the Pre-Seizure state. N2 to N5 and P2 to P5 form the Training Set, on which SFM selects electrodes (numbered 1 to 26); N1 and P1 form the Testing Set.]

EEG Classification

Support Vector Machine [Chaovalitwongse et al., Annals of OR (2006)]
  Project time series data into a high-dimensional (feature) space
  Generate a hyperplane that separates two groups of data, minimizing the errors

Ensemble K-Nearest Neighbor [Chaovalitwongse et al., IEEE SMC: Part A (2007)]
  Use each electrode as a base classifier
  Apply the NN rule using statistical time series distances and optimize the value of "k" in the training
  Voting and Averaging

Support Feature Machine [Chaovalitwongse et al., SIGKDD (2007); Chaovalitwongse et al., Operations Research (forthcoming)]
  Use each electrode as a base classifier
  Apply the NN rule to the entire baseline data
  Optimize by selecting the best group of classifiers (electrodes/features)
  Voting: Optimizes the ensemble classification
  Averaging: Uses the concept of inter-class and intra-class distances (or prediction scores)

Performance Characteristics: Upper Bound

SFM -> Chaovalitwongse et al., SIGKDD (2007); Chaovalitwongse et al., Operations Research (forthcoming)
NN -> Chaovalitwongse et al., Annals of Operations Research (2006)
KNN -> Chaovalitwongse et al., IEEE Trans Systems, Man, and Cybernetics: Part A (2007)

Separation of Normal and Pre-Seizure EEGs

[Figure: two panels, one from 3 electrodes selected by SFM and one from 3 electrodes not selected by SFM]

Performance Characteristics: Validation

SFM -> Chaovalitwongse et al., SIGKDD (2007); Chaovalitwongse et al., Operations Research (forthcoming)
SVM -> Chaovalitwongse et al., Annals of Operations Research (2006)
KNN -> Chaovalitwongse et al., IEEE Trans Systems, Man, and Cybernetics: Part A (2007)

Epilepsy versus Non-Epilepsy

Epilepsy versus Non-Epilepsy: Data Set

Routine EEG check: 25-30 minutes of recordings with scalp electrodes

Each sample is a 5-minute EEG epoch (30 points of STLmax values).

Each sample is in the form of 18 electrodes x 30 points.

[Figure: for Epilepsy patients and Non-Epilepsy patients, each electrode (Elec 1, Elec 2, ..., Elec 17, Elec 18) has 150 points (25 minutes), divided into 5 sampled epochs of 30 points each]

Leave-One-Patient-Out Cross Validation

[Figure: samples E1 to E5 and N1 to N5 are split into a Training Set, on which SFM selects electrodes (numbered 1 to 26), and a Testing Set. Legend: N = Non-Epilepsy; P = Epilepsy.]

Voting SFM: Validation

[Figure: Voting SFM Performance, Average of 10 Patients. Overall accuracy (0% to 100%) of KNN and SFM for k = 5, 7, 9, 11, and All, with the DTW, EU, and TS distances.]

Averaging SFM: Validation

[Figure: Averaging SFM Performance, Average of 10 Patients. Overall accuracy (0% to 100%) of KNN and SFM for k = 5, 7, 9, 11, and All, with the DTW, EU, and TS distances.]

[Figure: Selected Electrodes From Averaging SFM. Selection percentage (0% to 100%) for electrodes 1 to 18 under Averaging SFM - DTW, Averaging SFM - EU, and Averaging SFM - TS. Labeled electrodes: 1 (Fp1, C3), 16 (T6, Oz), 17 (Fz, Oz). A scalp map shows F3, F7, Fp2, Fp1, Fz, F8, F4, C3, T3, C4, T4, P4, Pz, P3, T5, T6, O2, Oz, O1.]

Other Medical Diagnosis

Other Medical Datasets

Breast Cancer
  Features of Cell Nuclei (Radius, perimeter, smoothness, etc.)
  Malignant or Benign Tumors

Diabetes
  Patient Records (Age, body mass index, blood pressure, etc.)
  Diabetic or Not

Heart Disease
  General Patient Info, Symptoms (e.g., chest pain), Blood Tests
  Identify Presence of Heart Disease

Liver Disorders
  Features of Blood Tests
  Detect the Presence of Liver Disorders from Excessive Alcohol Consumption

Performance

Training

        LP SVM   NLP SVM   V-SFM   A-SFM
WDBC    98.08    96.17     97.28   97.42
HD      85.06    84.66     86.48   86.92
PID     77.66    77.51     75.01   77.96
BLD     65.71    57.97     63.46   66.43

Testing

        LP SVM   NLP SVM   V-NN    A-NN    V-SFM   A-SFM
WDBC    97.00    95.38     91.60   93.18   94.99   96.01
HD      82.96    83.94     80.87   82.77   82.49   84.92
PID     76.93    76.09     63.14   74.94   72.75   75.83
BLD     65.71    57.97     38.38   54.09   58.20   59.57

Average Number of Selected Features

        LP SVM   NLP SVM   V-SFM   A-SFM
WDBC    30       30        11.6    8.5
HD      13       13        7.4     8.7
PID     8        8         4.3     4.5
BLD     6        6         3.3     3.7

Medical Data Signal Processing Apparatus (MeDSPA)

Quantitative analyses of medical data
  Neurophysiological data (e.g., EEG, fMRI) acquired during brain diagnosis

Envisioned to be an automated decision-support system configured to accept input medical signal data (associated with a spatial position or feature) and provide measurement data to help physicians obtain a more confident diagnosis outcome.

To improve current medical diagnosis and prognosis by assisting physicians in
  recognizing (data-mining) abnormality patterns in medical data
  recommending the diagnosis outcome (e.g., normal or abnormal)
  identifying a graphical indication (or feature) of abnormality (localization)

Automated Abnormality Detection Paradigm

[Figure: blocks labeled User/Patient; Interface Technology; Multichannel Brain Activity Data Acquisition; Optimization: Feature Extraction/Clustering (Feature 1, Feature 2, Feature 3); Statistical Analysis: Pattern Recognition; Initiate a warning or a variety of therapies (e.g., electrical stimulation, drug injection); Nurse; Stimulator; Drug]

Acknowledgement

Collaborators
  E. Micheli-Tzanakou, PhD
  L.D. Iasemidis, PhD
  R.C. Sachdeo, MD
  R.M. Lehman, MD
  B.Y. Wu, MD, PhD

Students
  Y.J. Fan, MS
  Other undergrad students

Thank you for your attention!

Questions?