
Real-Time Clinical Warning for Hospitalized Patients via Data Mining

Department of Computer Science and Engineering:
Yixin Chen (陈一昕), Yi Mao, Minmin Chen, Rahav Dor, Greg Hackermann, Zhicheng Yang, Chengyang Lu

School of Medicine:
Kelly Faulkner, Kevin Heard, Marin Kollef, Thomas Bailey

Background

- The ICU direct costs per day for survivors are between six and seven times those for non-ICU care.
- Unlike patients in ICUs, general hospital ward (GHW) patients are not under extensive electronic monitoring and nursing care.
- Clinical studies have found that 4-17% of patients undergo cardiopulmonary or respiratory arrest while in the GHW of a hospital.


Project mission

- Sudden deteriorations (e.g., septic shock, cardiopulmonary or respiratory arrest) of GHW patients can often be severe and life-threatening.
- Goal: provide early detection and intervention based on data mining to prevent these serious, often life-threatening events.
- Uses both clinical data and wireless body sensor data.
- An NIH-ICTS funded project, currently under clinical trials at Barnes-Jewish Hospital, St. Louis, MO.








What exactly do we predict

- Is the patient going to die?
- Is the patient going to be transferred to the ICU?

System Architecture

- Tier 1, EWS (early warning system): clinical data and lab tests, manually collected, low frequency.
- Tier 2, RDS (real-time data sensing): body sensor data, automatically collected, wirelessly transmitted, high frequency.


Agenda

1. Background and overview
2. Early warning system (EWS)
3. Real-time data sensing (RDS)
5. Future work

Medical Record (34 vital signs: pulse, temperature, oxygen saturation, shock index, respirations, age, blood pressure ...)

[Figure: vital-sign time series; x-axis: time/second]

Related Work

Main problem: most previous work uses a snapshot method that takes all the features at a given time as input to a model, discarding the temporal evolution of the data.

Medical data mining lies at the intersection of medical knowledge and machine learning methods:

- Medical knowledge: SCAP and PSI; the Acute Physiology Score, Chronic Health Score, and APACHE score (used to predict renal failures); the Modified Early Warning Score (MEWS)
- Machine learning methods: decision trees, neural networks, SVM

Overview of EWS

Goal: design a data mining algorithm that can automatically identify patients at risk of clinical deterioration based on their existing electronic medical record time series.

Challenges:

- Classification of high-dimensional time-series data
- Irregular data gaps
- Measurement errors
- Class imbalance

Key Techniques in the EWS Algorithm

- Temporal bucketing
- Discriminative classification
- Bootstrap aggregating (bagging)
- Exploratory under-sampling
- Exponential moving average smoothing
- Kernel-density estimation

Workflow of the System

Data Preprocessing

- Outlier removal
- Normalization

Temporal Bucketing

We retain data in a sliding window of the last 24 hours and divide it evenly into 6 buckets. To capture temporal variations, we compute several feature values for each bucket, including the minimum, maximum, and average, as sketched below.

[Figure: 24-hour sliding window divided evenly into Buckets 1-6]
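The bucketing step above can be made concrete with a short sketch (NumPy-based; the function name and layout are our illustration, not the authors' code):

```python
import numpy as np

def bucket_features(times, values, window=24.0, n_buckets=6):
    """Min/max/mean of one vital sign in each of n_buckets equal slices
    of the last `window` hours (NaNs mark empty buckets from data gaps)."""
    t_end = times.max()
    in_win = times >= t_end - window
    times, values = times[in_win], values[in_win]
    # map each reading to a bucket index 0..n_buckets-1
    idx = np.minimum(((times - (t_end - window)) / window * n_buckets).astype(int),
                     n_buckets - 1)
    feats = []
    for b in range(n_buckets):
        v = values[idx == b]
        feats += [v.min(), v.max(), v.mean()] if v.size else [np.nan] * 3
    return np.asarray(feats)

# hourly pulse readings over one day -> 6 buckets x 3 stats = 18 features
rng = np.random.default_rng(0)
t = np.arange(24.0)
print(bucket_features(t, 75 + 5 * rng.standard_normal(24)).shape)  # (18,)
```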

Discriminative Classification

Pipeline: clinical data → data preprocessing → temporal bucketing → classification algorithm → output model and threshold.

- Logistic regression (LR)
- Support vector machine (SVM)
- Use the max, min, and avg of each bucket and each vital sign as the input features (~400 features in total).
- Use the training data to learn the model parameters.
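A hedged sketch of fitting the two classifiers on such a bucketed feature matrix (scikit-learn is an assumption; the slides do not name an implementation):

```python
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# X: one row per patient, ~400 columns = (min, max, avg) per bucket per vital sign
# y: 1 if the patient later deteriorated (e.g., ICU transfer), else 0
rng = np.random.default_rng(0)
X = rng.standard_normal((500, 400))
y = (rng.random(500) < 0.1).astype(int)   # class imbalance: few positives

# impute empty buckets, standardize, then fit each discriminative model
lr = make_pipeline(SimpleImputer(), StandardScaler(),
                   LogisticRegression(max_iter=1000)).fit(X, y)
svm = make_pipeline(SimpleImputer(), StandardScaler(),
                    SVC(probability=True)).fit(X, y)
risk = lr.predict_proba(X)[:, 1]          # threshold later chosen at spec = 0.95
```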

Bootstrap Aggregating (Bagging)

Advantages:

1. Handles outliers
2. Avoids over-fitting
3. Better model quality
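For illustration, a minimal bagging sketch using scikit-learn's BaggingClassifier as a stand-in (the bucket-level variants on the next slides are not reproduced here):

```python
import numpy as np
from sklearn.ensemble import BaggingClassifier
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.standard_normal((500, 400))
y = (rng.random(500) < 0.1).astype(int)

# Average 50 logistic-regression models, each fit on a bootstrap resample;
# the averaged score is less sensitive to outliers and over-fitting.
bag = BaggingClassifier(estimator=LogisticRegression(max_iter=1000),
                        n_estimators=50, random_state=0).fit(X, y)
risk = bag.predict_proba(X)[:, 1]
```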




Biased Bucket Bagging

Exploratory Undersampling
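The slides give no details here; this sketch shows the general idea of exploratory undersampling, assuming the usual variant: train several models, each on all minority cases plus a different random subset of majority cases, then average the scores.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def exploratory_undersample(X, y, n_models=10, seed=0):
    """Fit one model per balanced subsample; negatives are re-drawn each time."""
    rng = np.random.default_rng(seed)
    pos, neg = np.where(y == 1)[0], np.where(y == 0)[0]
    models = []
    for _ in range(n_models):
        sub = rng.choice(neg, size=len(pos), replace=False)  # balanced subset
        idx = np.concatenate([pos, sub])
        models.append(LogisticRegression(max_iter=1000).fit(X[idx], y[idx]))
    return models

def predict_risk(models, X):
    # average over the ensemble so all majority-class data is explored
    return np.mean([m.predict_proba(X)[:, 1] for m in models], axis=0)
```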



Exponential Moving Average (EMA)
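A minimal definition of EMA smoothing as applied to a stream of risk scores (the parameter name alpha is ours; a later slide cross-validates its value to about 0.06):

```python
import numpy as np

def ema(scores, alpha=0.06):
    """s_t = alpha * x_t + (1 - alpha) * s_{t-1}; small alpha weights history."""
    out = np.empty(len(scores), dtype=float)
    out[0] = scores[0]
    for t in range(1, len(scores)):
        out[t] = alpha * scores[t] + (1 - alpha) * out[t - 1]
    return out
```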

Evaluation Criteria

AUC (area under the receiver operating characteristic (ROC) curve) represents the probability that a randomly chosen positive example is correctly rated with greater suspicion than a randomly chosen negative example.
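For example, the AUC and an operating threshold at a fixed specificity can be read off the ROC curve (sketch; scikit-learn assumed):

```python
import numpy as np
from sklearn.metrics import roc_auc_score, roc_curve

y    = np.array([0, 0, 0, 1, 0, 1, 1, 0])             # true outcomes
risk = np.array([.1, .4, .35, .8, .2, .7, .45, .05])  # model scores

print(roc_auc_score(y, risk))        # probability of correct ranking
fpr, tpr, thr = roc_curve(y, risk)   # operating points; specificity = 1 - fpr
```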

Results on Historical Database

Method  AUC      SENS     PPV      NPV      ACCU
1       0.86809  0.44753  0.29562  0.97345  0.92747
2       0.8907   0.5135   0.3386   0.9751   0.9293
3       0.91995  0.58558  0.36864  0.97871  0.93269
4       0.92108  0.60087  0.37466  0.97948  0.93342
5       0.9221   0.60961  0.37805  0.97992  0.93384

(All at specificity = 0.95.)

1: bucketing + logistic regression
2: bucketing + logistic regression + bagging
3: bucketing + logistic regression + bucket bagging
4: bucketing + logistic regression + biased bucket bagging
5: bucketing + logistic regression + biased bucket bagging + exploratory undersampling

Comparison of Various Models

Method                  AUC     SPEC     SENS     PPV      NPV      ACCU
RPART                   0.6703  0.93     0.55     0.287    0.977    0.912
SVM (linear kernel)     0.6879  0.9762   0.3997   0.4405   0.9719   0.95033
SVM (quadratic kernel)  0.6851  0.9675   0.4028   0.3676   0.9718   0.94216
SVM (cubic kernel)      0.6792  0.9681   0.3904   0.3646   0.9713   0.94216
SVM (RBF kernel)        0.6968  0.9615   0.4321   0.3448   0.9730   0.93774
Our method 5            0.9221  0.94996  0.60961  0.37805  0.97992  0.93384

Clinical Trial at Barnes-Jewish Hospital

Dates: start date 1/24/2011, last date 11/1/2011 (277 days).

ICU Transfers   Total   With alert   W/o alert
ICU transfer    510     243          267
Total patients  11286   1430         9856
Ratio           4.5%    17.0%        2.7%

Deaths          Total   With alert   W/o alert
Deaths          239     138          102
Total patients  11286   1430         9856
Ratio           2.12%   9.65%        1.02%

Alerts already triggered early interventions that may have prevented deaths.

Agenda

1. Background & related work
2. Early warning system (EWS)
3. Real-time data sensing (RDS)
5. Future work

A Challenging Problem

Classification based on multiple high-frequency real-time time series (heart rate, pulse, oxygen saturation, CO2, temperature, etc.)





Overview of RDS

Wireless Sensor Network at BJH

Overview of Learning Algorithm

Key techniques:

- Feature extraction from multiple time series
- Feature selection
- Classification algorithms
- Exploratory undersampling



A Large Pool of Features

Features:

- Detrended fluctuation analysis (DFA) features
- Approximate entropy (ApEn)
- Spectral features
- First-order features
- Second-order features
- Cross-sign features

Detrended Fluctuation Analysis (DFA)

DFA is a method for quantifying the statistical self-affinity of a time-series signal (see, e.g., Peng et al. 1994). It is applicable to both pulse rate and SpO2. A sketch follows.
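A compact sketch of the DFA computation, under the standard formulation (the window scales and the linear detrending order are our choices):

```python
import numpy as np

def dfa_alpha(x, scales=(4, 8, 16, 32, 64)):
    """Detrended fluctuation analysis: slope of log F(n) vs. log n."""
    y = np.cumsum(x - np.mean(x))                    # integrated profile
    F = []
    for n in scales:
        m = len(y) // n                              # number of full windows
        w = y[: m * n].reshape(m, n)
        t = np.arange(n)
        # linear detrend inside each window, then RMS of the residuals
        resid = [seg - np.polyval(np.polyfit(t, seg, 1), t) for seg in w]
        F.append(np.sqrt(np.mean(np.square(resid))))
    return np.polyfit(np.log(scales), np.log(F), 1)[0]

rng = np.random.default_rng(0)
print(dfa_alpha(rng.standard_normal(4096)))              # white noise: alpha ~ 0.5
print(dfa_alpha(np.cumsum(rng.standard_normal(4096))))   # random walk: alpha ~ 1.5
```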

Spectral Analysis (FFT)

We used the component values of VLF (<0.04 Hz), LF (0.04-0.15 Hz), and HF (0.15-0.4 Hz), and the ratio LF/HF, for each signal.
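A sketch of extracting these band powers with a plain FFT (the sampling rate and the absence of windowing are simplifying assumptions):

```python
import numpy as np

def band_powers(x, fs=1.0):
    """Power in VLF (<0.04 Hz), LF (0.04-0.15 Hz), HF (0.15-0.4 Hz), plus LF/HF."""
    f = np.fft.rfftfreq(len(x), d=1.0 / fs)
    p = np.abs(np.fft.rfft(x - np.mean(x))) ** 2
    vlf = p[(f > 0)     & (f < 0.04)].sum()
    lf  = p[(f >= 0.04) & (f < 0.15)].sum()
    hf  = p[(f >= 0.15) & (f < 0.40)].sum()
    return vlf, lf, hf, lf / hf

rng = np.random.default_rng(0)
print(band_powers(rng.standard_normal(600), fs=1.0))  # e.g. a 1 Hz heart-rate series
```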

Other Features

- Approximate entropy (ApEn): quantifies the unpredictability of fluctuations in a time series. A low value indicates a deterministic signal; a high value indicates an unpredictable one. (A sketch follows this list.)
- First-order features: mean, standard deviation, skewness (symmetry of the distribution), kurtosis (peakedness of the distribution).
- Second-order features: related to the co-occurrence of patterns. First quantize a time series into Q discrete bins, then construct a pattern matrix; from it compute energy (E), entropy (S), correlation (COR), inertia (F), and local homogeneity (LH).
- Cross-sign features: link multiple vital signs together.
  - Correlation: the degree of departure of two signals from independence.
  - Coherence: amplitude and phase of the frequencies held in common between two signals.
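As promised above, a minimal ApEn sketch following the standard definition (the tolerance r = 0.2 * std is a common convention, not taken from the slides):

```python
import numpy as np

def apen(x, m=2, r=None):
    """Approximate entropy: low -> regular/deterministic, high -> unpredictable."""
    x = np.asarray(x, float)
    r = 0.2 * np.std(x) if r is None else r
    def phi(m):
        # all length-m templates, then fraction of pairs within tolerance r
        emb = np.array([x[i:i + m] for i in range(len(x) - m + 1)])
        d = np.max(np.abs(emb[:, None] - emb[None, :]), axis=2)  # Chebyshev
        return np.mean(np.log((d <= r).mean(axis=1)))
    return phi(m) - phi(m + 1)

rng = np.random.default_rng(0)
print(apen(np.sin(np.arange(300) / 5)))   # regular signal -> small ApEn
print(apen(rng.standard_normal(300)))     # noise -> larger ApEn
```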






Forward Feature Selection

Start from an empty feature set. At each step, evaluate each of the remaining features and pick the one that most improves the model when added to the current set. Stop when no addition brings improvement; the current set becomes the final feature set. A sketch follows.
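A sketch of that greedy loop (the cross-validated AUC criterion and the LR scorer are our assumptions):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

def forward_select(X, y, max_feats=25, tol=1e-4):
    """Greedy wrapper: add the feature that most improves CV AUC; stop when none helps."""
    chosen, best = [], 0.0
    remaining = set(range(X.shape[1]))
    while remaining and len(chosen) < max_feats:
        scored = [(cross_val_score(LogisticRegression(max_iter=1000),
                                   X[:, chosen + [j]], y,
                                   scoring="roc_auc", cv=5).mean(), j)
                  for j in remaining]
        score, j = max(scored)
        if score <= best + tol:        # no improvement -> final feature set
            break
        chosen.append(j); remaining.remove(j); best = score
    return chosen, best
```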

Experimental Setup

- Dataset: MIMIC-II (Multiparameter Intelligent Monitoring in Intensive Care II), a public-access ICU database. The data model can be used for both GHW patients with sensors and ICU patients.
- Our data: collected between 2001 and 2008 from a variety of ICUs (medical, surgical, coronary care, and neonatal).
- Prediction goal: death or survival.
- Real-time vital signs: heart rate and oxygen saturation rate.
- Class imbalance: most patients survived.
- Evaluation: based on 10-fold cross-validation.






Result: Linear and Nonlinear Classification

Method  Feature  AUC     Specificity  Sensitivity  PPV     NPV
LSVM    1        0.5759  0.9497       0.0755       0.2550  0.7781
LR      1        0.4742  0.9483       0.0729       0.3181  0.7555
KSVM    1        0.5897  0.9497       0.1265       0.3643  0.7879
LSVM    2        0.4473  0.9497       0.0346       0.1300  0.7705
LR      2        0.4902  0.9483       0.0313       0.1667  0.7473
KSVM    2        0.5016  0.9497       0.0676       0.2450  0.7768
LSVM    1 & 2    0.5757  0.9497       0.1416       0.3917  0.7694
LR      1 & 2    0.5370  0.9483       0.0521       0.2500  0.7513
KSVM    1 & 2    0.6332  0.9497       0.1428       0.4146  0.7911

LSVM: linear SVM; LR: logistic regression; KSVM: RBF-kernel SVM.
Feature set 1: DFA of heart rate; feature set 2: DFA of oxygen saturation.

Result: Feature Combinations

Algorithm            Features                          AUC
KSVM                 DFA                               0.6332
                     DFA + cross-sign features         0.6565
                     DFA + cross-sign features + ApEn  0.6753
                     All features                      0.7079
Logistic regression  DFA                               0.5370
                     DFA + cross-sign features         0.5731
                     DFA + cross-sign features + ApEn  0.5974
                     All features                      0.7402

Result: Feature Selection

Method  #Selected features  AUC     Specificity  Sensitivity  PPV     NPV
KSVM    5                   0.7752  0.9654       0.4852       0.8041  0.8651
LR      23                  0.7844  0.9483       0.5208       0.7692  0.8567

LR is our first choice: better AUC, interpretability, and efficiency.

First 12 Selected Features (in logistic regression)

- Standard deviation of heart rate
- ApEn of heart rate
- Energy of oxygen saturation
- LF of oxygen saturation
- LF of heart rate
- DFA of oxygen saturation
- Mean of heart rate
- HF of heart rate
- Inertia of heart rate
- Homogeneity of heart rate
- Energy of heart rate
- Linear correlation of heart rate and oxygen saturation

Result: Our Final Model

Method  AUC     Specificity  Sensitivity  PPV     NPV
1       0.7402  0.9500       0.3646       0.7000  0.8185
2       0.7767  0.9500       0.4615       0.9000  0.6440
3       0.8082  0.9500       0.4865       0.9000  0.6546

Method 1: logistic regression + all features
Method 2: logistic regression + all features + exploratory undersampling
Method 3: logistic regression + feature selection + exploratory undersampling

Current Work: Density-based LR

- Standard logistic regression uses phi_k(x) = x_k:
  P(y=1 | x) = 1 / (1 + exp(-sum_k w_k x_k))
- The probability of an event (e.g., ICU transfer, death) then grows or decreases monotonically with each feature.
- This is not true in many cases: e.g., ICU transfer rate vs. age.
- Idea: transform each feature x_k.



Current Work: Density-based LR

- Use a kernel-density estimator to estimate p(x_k, y=1) and p(x_k, y=0) for each feature x_k.
- This results in a nonlinear separation surface that conforms to the true distribution of the data.
- Advantages over KLR and SVM: efficiency and interpretability.
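A hedged sketch of one plausible instantiation: estimate per-feature class-conditional densities with a Gaussian KDE and feed the log-density ratio to LR as the transformed feature (the slides do not show the exact transform):

```python
import numpy as np
from scipy.stats import gaussian_kde
from sklearn.linear_model import LogisticRegression

def density_transform(X, y):
    """Replace each feature x_k by log p(x_k|y=1) - log p(x_k|y=0) (KDE estimates)."""
    kdes = [(gaussian_kde(X[y == 1, k]), gaussian_kde(X[y == 0, k]))
            for k in range(X.shape[1])]
    def phi(X):
        cols = [np.log(k1(X[:, k]) + 1e-12) - np.log(k0(X[:, k]) + 1e-12)
                for k, (k1, k0) in enumerate(kdes)]
        return np.column_stack(cols)
    return phi

# toy non-monotonic risk, like ICU transfer rate vs. age
rng = np.random.default_rng(0)
age = rng.uniform(20, 90, 400)
y = ((age < 35) | (age > 75)).astype(int)
phi = density_transform(age[:, None], y)
clf = LogisticRegression(max_iter=1000).fit(phi(age[:, None]), y)  # nonlinear in age
```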

Example of Density-based LR

[Figure: test data classified by the original data distribution, standard LR, and density-based LR]

Future Work

- Distance-based classification algorithms for multi-dimensional time series
  - Dynamic time warping, information distance
- Combination of feature-based and distance-based classification algorithms
  - Include distance information in the objective function
- Combining Tier-1 and Tier-2 data
  - Multi-kernel methods
- Interpretation of alerts
  - Based on the magnitude and sign of model coefficients




Real-Time Simulation on Historical Data

Method   AUC       SENS     PPV      NPV      ACCU
1        0.6834    0.30159  0.2345   0.9634   0.9128
1 + EMA  0.78203   0.36508  0.27059  0.96664  0.9128
2        0.74359   0.30159  0.23457  0.96342  0.9293
2 + EMA  0.777737  0.38095  0.27907  0.96342  0.92134
4        0.77689   0.38905  0.27907  0.96745  0.9336
4 + EMA  0.81411   0.39683  0.28736  0.96825  0.92212
5        0.79902   0.4127   0.29545  0.96096  0.9229
5 + EMA  0.79902   0.4127   0.29545  0.96096  0.9229

(All at specificity = 0.95.)

Feature Coefficients (assuming feature independence)

Feature                                  Coefficient
Local homogeneity of heart rate          -14.50
Standard deviation of oxygen saturation   10.20
Entropy of oxygen saturation              10.17
LF of heart rate                           8.62
Local homogeneity of oxygen saturation     7.77
LF/HF of oxygen saturation                 4.53
Inertia of heart rate                      3.86
Entropy of heart rate                      2.97
Low frequency of oxygen saturation        -2.89
Mean of oxygen saturation                 -2.86


Why Bagging Works

Let each T be a bucket sample that is independently drawn from the underlying distribution P, and let phi(x, T) be the predictor trained on T.

The aggregated predictor is:

  phi_A(x) = E_T[phi(x, T)]

The average prediction error of the individual predictors is:

  e = E_T E_{x,y}[(y - phi(x, T))^2]

The error of the aggregated predictor is:

  e_A = E_{x,y}[(y - phi_A(x))^2]

Using the inequality (E[Z])^2 <= E[Z^2] gives us e_A <= e.

Algorithm Details: Biased Bucket Bagging (BBB)

A critical factor deciding how much bagging will improve accuracy is the variance of the bootstrap models. BBB with 4 buckets has the largest difference between e and e_A. Besides this, BBB with 4 buckets also has the highest standard deviation in prediction results, so we choose BBB with 4 buckets as the final method.

[Figure: standard deviation of predictions for the bucket-bagging variants]

Algorithm Details: Bucket Bagging

Result on Real-Time System

We can see that all cases attain their best performance when the EMA parameter alpha is around 0.06, showing that the choice of alpha is robust. This small optimal value shows that historical records play an important role in prediction.

[Figure: cross-validation for the EMA parameter alpha]