Department of Computer Science and Engineering
Yixin Chen (陈一昕), Yi Mao, Minmin Chen, Rahav Dor, Greg Hackermann, Zhicheng Yang, Chengyang Lu
School of Medicine
Kelly Faulkner, Kevin Heard, Marin Kollef, Thomas Bailey
Real-Time Clinical Warning for Hospitalized Patients via Data Mining
Background
• ICU direct costs per day for survivors are between six and seven times those for non-ICU care.
• Unlike ICU patients, general hospital ward (GHW) patients are not under extensive electronic monitoring and nursing care.
• Clinical studies have found that 4–17% of patients undergo cardiopulmonary or respiratory arrest while in the GHW.
Project mission
• Sudden deteriorations (e.g., septic shock, cardiopulmonary or respiratory arrest) of GHW patients can often be severe and life-threatening.
• Goal: provide early detection and intervention based on data mining
– To prevent these serious, often life-threatening events
– Using both clinical data and wireless body sensor data
• An NIH-ICTS funded project: currently under clinical trials at Barnes-Jewish Hospital, St. Louis, MO
What exactly do we predict?
• Is he going to die?
• Is he going to the ICU?
System Architecture
• Tier 1: EWS (early warning system)
– Clinical data, lab tests; manually collected, low frequency
• Tier 2: RDS (real-time data sensing)
– Body sensor data; automatically collected, wirelessly transmitted, high frequency
Agenda
1 Background and overview
2 Early warning system (EWS)
3 Real-time data sensing (RDS)
5 Future work
Medical record (34 vital signs: pulse, temperature, oxygen saturation, shock index, respirations, age, blood pressure, …)
[Figure: vital-sign time series; x-axis: time (seconds)]
Related Work
Medical data mining draws on two sources:
• Medical knowledge
– SCAP and PSI
– Acute Physiology Score, Chronic Health Score, and APACHE score, used to predict renal failures
– Modified Early Warning Score (MEWS)
• Machine learning methods
– Decision trees
– Neural networks
– SVM
Main problem: most previous general work uses a snapshot method that takes all the features at a given time as input to a model, discarding the temporal evolution of the data.
Overview of EWS
Goal: design a data mining algorithm that can automatically identify patients at risk of clinical deterioration based on the time series in their existing electronic medical records.
Challenges:
• Classification of high-dimensional time-series data
• Irregular data gaps
• Measurement errors
• Class imbalance
Key Techniques in the EWS Algorithm
• Temporal bucketing
• Discriminative classification
• Bootstrap aggregating (bagging)
• Exploratory undersampling
• Exponential moving average smoothing
• Kernel density estimation
Workflow of the System
Data Preprocessing
Outlier removal
Normalization
Temporal Bucketing
We retain the data in a sliding window covering the last 24 hours and divide it evenly into 6 buckets.
To capture temporal variations, we compute several feature values for each bucket, including the minimum, maximum, and average.
[Figure: the 24-hour window divided into Buckets 1 to 6]
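The bucketing step described above can be sketched as follows. This is a minimal illustration, not the project's code: the function name `bucket_features`, the `(timestamp_in_hours, value)` input format, and the choice to return `None` for an empty bucket are all assumptions made for the example.

```python
from statistics import mean

def bucket_features(readings, now, window_hours=24.0, n_buckets=6):
    """Split the last `window_hours` of one vital sign into equal buckets.

    readings: list of (timestamp_in_hours, value) pairs (assumed format).
    Returns one (min, max, avg) tuple per bucket, or None for an empty bucket.
    """
    start = now - window_hours
    width = window_hours / n_buckets
    buckets = [[] for _ in range(n_buckets)]
    for t, v in readings:
        if start <= t <= now:
            # Clamp so that a reading taken exactly at `now` lands in the last bucket.
            idx = min(int((t - start) / width), n_buckets - 1)
            buckets[idx].append(v)
    return [(min(b), max(b), mean(b)) if b else None for b in buckets]

# Four pulse readings over one day; buckets are 4 hours wide.
pulse = [(1.0, 80), (2.0, 90), (5.0, 100), (23.5, 70)]
features = bucket_features(pulse, now=24.0)
```

Empty buckets correspond to the irregular data gaps listed among the challenges; how they are filled is not stated in the slides, so the sketch only marks them.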
Discriminative Classification
Pipeline: Clinical data → Data preprocessing → Temporal bucketing → Classification algorithm → Output (model, threshold)
• Logistic regression (LR)
• Support vector machine (SVM)
• Use the max, min, and avg of each bucket and each vital sign as the input features (~400 features in total).
• Use the training data to learn the model parameters.
Bootstrap Aggregating (Bagging)
Advantages:
1. Handles outliers
2. Avoids overfitting
3. Better model quality
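As a rough illustration of bootstrap aggregating, the sketch below trains several copies of a base learner on bootstrap resamples and averages their outputs. The base learner here is a one-dimensional threshold stump chosen only to keep the example self-contained; the slides use logistic regression, and the names `fit_stump` and `bagging` are hypothetical.

```python
import random

def fit_stump(sample):
    """Illustrative base learner: the 1-D threshold with the fewest training errors."""
    best_t, best_err = None, None
    for t in sorted({x for x, _ in sample}):
        err = sum((x >= t) != bool(y) for x, y in sample)
        if best_err is None or err < best_err:
            best_t, best_err = t, err
    return lambda x: 1.0 if x >= best_t else 0.0

def bagging(data, fit, n_models=25, seed=0):
    """Train `n_models` learners on bootstrap resamples and average their outputs."""
    rng = random.Random(seed)
    models = []
    for _ in range(n_models):
        # Bootstrap sample: same size as the data, drawn with replacement.
        sample = [rng.choice(data) for _ in data]
        models.append(fit(sample))
    return lambda x: sum(m(x) for m in models) / len(models)

# Toy data: label 1 for x >= 6, label 0 otherwise.
data = [(x, 0) for x in range(1, 6)] + [(x, 1) for x in range(6, 11)]
predict = bagging(data, fit_stump)
```

Averaging the votes yields a score between 0 and 1, which can then be compared against an alert threshold.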
Biased Bucket Bagging
Exploratory Undersampling
Exponential Moving Average (EMA)
Evaluation Criteria
AUC (Area Under the Receiver Operating Characteristic (ROC) Curve) represents the probability that a randomly chosen positive example is rated with greater suspicion than a randomly chosen negative example.
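The probabilistic reading of AUC given above can be computed directly by comparing every positive score with every negative score. The sketch below does exactly that; counting a tie as half a win is a common convention and an assumption here, and the function name `auc` is illustrative.

```python
def auc(scores_pos, scores_neg):
    """Fraction of (positive, negative) pairs in which the positive scores higher.

    Ties count as half a win (a common convention, assumed here).
    """
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))

# 5 of the 6 positive/negative pairs are ranked correctly.
example_auc = auc([0.9, 0.8, 0.4], [0.5, 0.3])
```

The pairwise form is quadratic in the number of examples; it is meant to show the definition, not to be an efficient implementation.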
Results on Historical Database (at specificity = 0.95)

| Method | AUC | SENS | PPV | NPV | ACCU |
|---|---|---|---|---|---|
| 1 | 0.86809 | 0.44753 | 0.29562 | 0.97345 | 0.92747 |
| 2 | 0.8907 | 0.5135 | 0.3386 | 0.9751 | 0.9293 |
| 3 | 0.91995 | 0.58558 | 0.36864 | 0.97871 | 0.93269 |
| 4 | 0.92108 | 0.60087 | 0.37466 | 0.97948 | 0.93342 |
| 5 | 0.9221 | 0.60961 | 0.37805 | 0.97992 | 0.93384 |

1: bucketing + logistic regression
2: bucketing + logistic regression + bagging
3: bucketing + logistic regression + bucket bagging
4: bucketing + logistic regression + biased bucket bagging
5: bucketing + logistic regression + biased bucket bagging + exploratory undersampling
Comparison of Various Models

| Method | AUC | SPEC | SENS | PPV | NPV | ACCU |
|---|---|---|---|---|---|---|
| RPART | 0.6703 | 0.93 | 0.55 | 0.287 | 0.977 | 0.912 |
| SVM (linear kernel) | 0.6879 | 0.9762 | 0.3997 | 0.4405 | 0.9719 | 0.95033 |
| SVM (quadratic kernel) | 0.6851 | 0.9675 | 0.4028 | 0.3676 | 0.9718 | 0.94216 |
| SVM (cubic kernel) | 0.6792 | 0.9681 | 0.3904 | 0.3646 | 0.9713 | 0.94216 |
| SVM (RBF kernel) | 0.6968 | 0.9615 | 0.4321 | 0.3448 | 0.9730 | 0.93774 |
| Our method 5 | 0.9221 | 0.94996 | 0.60961 | 0.37805 | 0.97992 | 0.93384 |
Clinical Trial at Barnes-Jewish Hospital

| Dates | Start date | Last date |
|---|---|---|
| 277 days | 1/24/2011 | 11/1/2011 |

| ICU transfers | Total | With alert | W/o alert |
|---|---|---|---|
| ICU transfer | 510 | 243 | 267 |
| Total | 11286 | 1430 | 9856 |
| Ratio | 4.5% | 17.0% | 2.7% |

| Deaths | Total | With alert | W/o alert |
|---|---|---|---|
| Deaths | 239 | 138 | 102 |
| Total | 11286 | 1430 | 9856 |
| Ratio | 2.12% | 9.65% | 1.02% |

Alerts have already triggered early interventions that may have prevented deaths.
Agenda
1 Background & related work
2 Early warning system (EWS)
3 Real-time data sensing (RDS)
5 Future work
A challenging problem
• Classification based on multiple high-frequency real-time time series (heart rate, pulse, oxygen saturation, CO2, temperature, etc.)
Overview of RDS
Wireless Sensor Network at BJH
Overview of Learning Algorithm
Key techniques:
• Feature extraction from multiple time series
• Feature selection
• Classification algorithms
• Exploratory undersampling
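The slides do not spell out how exploratory undersampling is carried out. One common reading, assumed here, is to draw several random subsets of the majority class (survivors), each the size of the minority class, train one model per balanced subset, and average the models. The helper names `balanced_subsets` and `undersampled_ensemble` and the `(features, label)` data format are hypothetical.

```python
import random

def balanced_subsets(majority, minority, n_subsets=10, seed=0):
    """Yield training sets that pair all minority examples with an equally
    sized random draw (without replacement) from the majority class."""
    rng = random.Random(seed)
    k = min(len(minority), len(majority))
    for _ in range(n_subsets):
        yield rng.sample(majority, k) + list(minority)

def undersampled_ensemble(majority, minority, fit, n_subsets=10, seed=0):
    """Train one model per balanced subset and average their scores.

    `fit` is any function that maps a training set to a scoring function.
    """
    models = [fit(s) for s in balanced_subsets(majority, minority, n_subsets, seed)]
    return lambda x: sum(m(x) for m in models) / len(models)

# 100 negatives and 5 positives, as (feature, label) pairs.
majority = [(i, 0) for i in range(100)]
minority = [(100 + i, 1) for i in range(5)]
subsets = list(balanced_subsets(majority, minority, n_subsets=3))
```

Because each subset samples different majority examples, the ensemble as a whole still sees much of the majority class while every individual model trains on balanced data.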
A Large Pool of Features
Features:
• Detrended fluctuation analysis (DFA) features
• Approximate entropy (ApEn)
• Spectral features
• First-order features
• Second-order features
• Cross-sign features
Detrended Fluctuation Analysis (DFA)
DFA is a method for quantifying the statistical self-affinity of a time-series signal (see, e.g., Peng et al. 1994).
Applicable to both pulse rate and SpO2.
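For readers unfamiliar with DFA, the following is a minimal sketch of the standard first-order procedure: integrate the mean-subtracted signal, detrend it linearly in windows of several sizes, and take the slope of log fluctuation against log window size as the scaling exponent. The window sizes, function names, and the use of the exponent as the feature are assumptions; the slides do not say which DFA quantities were used.

```python
import math
import random

def residual_ss(ys):
    """Sum of squared residuals of the least-squares line through (0..n-1, ys)."""
    n = len(ys)
    mx, my = (n - 1) / 2, sum(ys) / n
    sxx = sum((i - mx) ** 2 for i in range(n))
    slope = sum((i - mx) * (y - my) for i, y in enumerate(ys)) / sxx
    return sum((y - my - slope * (i - mx)) ** 2 for i, y in enumerate(ys))

def dfa_alpha(signal, scales=(8, 16, 32, 64)):
    """Scaling exponent: slope of log F(n) versus log n (non-constant signals only)."""
    mean = sum(signal) / len(signal)
    profile, total = [], 0.0
    for v in signal:              # integrated, mean-subtracted signal
        total += v - mean
        profile.append(total)
    log_n, log_f = [], []
    for n in scales:
        n_win = len(profile) // n  # non-overlapping windows of length n
        ss = sum(residual_ss(profile[w * n:(w + 1) * n]) for w in range(n_win))
        log_n.append(math.log(n))
        log_f.append(0.5 * math.log(ss / (n_win * n)))  # log of the RMS fluctuation
    mx, my = sum(log_n) / len(log_n), sum(log_f) / len(log_f)
    return (sum((a - mx) * (b - my) for a, b in zip(log_n, log_f))
            / sum((a - mx) ** 2 for a in log_n))

rng = random.Random(0)
white = [rng.gauss(0.0, 1.0) for _ in range(2048)]
walk, level = [], 0.0
for step in white:                # random walk built from the same noise
    level += step
    walk.append(level)
alpha_white, alpha_walk = dfa_alpha(white), dfa_alpha(walk)
```

Uncorrelated noise gives an exponent near 0.5, while a random walk gives a clearly larger one, which is the kind of contrast the feature is meant to capture.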
Spectral Analysis (FFT)
We used the component values of VLF (<0.04 Hz), LF (0.04–0.15 Hz), and HF (0.15–0.4 Hz), and the ratio LF/HF, for each signal.
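A small sketch of how the band components above could be computed is given below. It uses a naive discrete Fourier transform instead of an FFT to stay dependency-free, sums squared magnitudes within each band, and assumes a uniformly sampled signal with known sampling rate `fs`; the function name `band_powers` and the choice of summed squared magnitude as the "component value" are assumptions.

```python
import cmath
import math

# Band edges in Hz, as listed on the slide.
BANDS = {"VLF": (0.0, 0.04), "LF": (0.04, 0.15), "HF": (0.15, 0.4)}

def band_powers(signal, fs):
    """Sum of squared DFT magnitudes in each band, plus the LF/HF ratio.

    signal: uniformly sampled values; fs: sampling rate in Hz.
    """
    n = len(signal)
    mean = sum(signal) / n
    x = [v - mean for v in signal]           # remove the DC offset
    powers = dict.fromkeys(BANDS, 0.0)
    for k in range(1, n // 2 + 1):           # positive frequencies only
        freq = k * fs / n
        coeff = sum(v * cmath.exp(-2j * math.pi * k * t / n) for t, v in enumerate(x))
        for name, (lo, hi) in BANDS.items():
            if lo <= freq < hi:
                powers[name] += abs(coeff) ** 2
    powers["LF/HF"] = powers["LF"] / powers["HF"] if powers["HF"] > 0 else float("inf")
    return powers

# A 0.1 Hz oscillation sampled once per second falls in the LF band.
tone = [math.sin(2 * math.pi * 0.1 * t) for t in range(200)]
spectrum = band_powers(tone, fs=1.0)
```

For long recordings an FFT routine would replace the inner sum; the banding logic stays the same.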
Other Features
• Approximate entropy (ApEn): quantifies the unpredictability of fluctuations in a time series.
– A low value: deterministic
– A high value: unpredictable
• First-order features:
– Mean, standard deviation
– Skewness (symmetry of distribution), kurtosis (peakedness of distribution)
• Second-order features: related to the co-occurrence of patterns
– First quantize a time series into Q discrete bins, then construct a pattern matrix
– Energy (E), entropy (S), correlation (COR), inertia (F), local homogeneity (LH)
• Cross-sign features: link multiple vital signs together
– Correlation: the degree of departure of two signals from independence
– Coherence: amplitude and phase about the frequencies held in common between two signals
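To make the ApEn item in the list above concrete, here is a minimal sketch of the standard definition with embedding length `m` and tolerance `r`. The defaults (`m=2`, an absolute `r=0.2`) are assumptions for the example; in practice `r` is often set relative to the signal's standard deviation.

```python
import math
import random

def approximate_entropy(series, m=2, r=0.2):
    """Standard ApEn(m, r): phi(m) - phi(m + 1).

    r is an absolute tolerance here (an assumption for this sketch).
    """
    n = len(series)

    def phi(length):
        templates = [series[i:i + length] for i in range(n - length + 1)]
        total = 0.0
        for a in templates:
            # Count templates within Chebyshev distance r (self-match included).
            matches = sum(
                1 for b in templates
                if max(abs(u - v) for u, v in zip(a, b)) <= r
            )
            total += math.log(matches / len(templates))
        return total / len(templates)

    return phi(m) - phi(m + 1)

regular = [0.0, 1.0] * 50                  # perfectly alternating signal
rng = random.Random(0)
noisy = [rng.random() for _ in range(100)]  # unpredictable signal
apen_regular = approximate_entropy(regular)
apen_noisy = approximate_entropy(noisy)
```

The alternating signal yields a value close to zero, matching the "low value: deterministic" reading, while the random signal scores higher.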
Forward Feature Selection
1. Start with an empty feature set as the current feature set.
2. Evaluate each of the remaining features.
3. Pick one feature to add into the set, then return to step 2.
4. If no improvement, stop; the current set is the final feature set.
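The selection loop above can be sketched as a greedy search. The `score` callback is a stand-in for whatever evaluation is used (for example cross-validated AUC, which is an assumption here), and the toy scoring function below exists only to make the example runnable.

```python
def forward_select(features, score):
    """Greedy forward selection.

    score: function mapping a list of feature names to a quality value
    (higher is better). Stops when no remaining feature improves the score.
    """
    selected, best = [], float("-inf")
    remaining = list(features)
    while remaining:
        # Evaluate each remaining feature added to the current set.
        candidate = max(remaining, key=lambda f: score(selected + [f]))
        candidate_score = score(selected + [candidate])
        if candidate_score <= best:
            break                      # no improvement: keep the current set
        selected.append(candidate)
        remaining.remove(candidate)
        best = candidate_score
    return selected, best

# Toy score: "a" and "b" help, "c" adds nothing, every feature costs 0.01.
gains = {"a": 0.3, "b": 0.2}

def toy_score(subset):
    return sum(gains.get(f, 0.0) for f in subset) - 0.01 * len(subset)

chosen, chosen_score = forward_select(["a", "b", "c"], toy_score)
```

Each round costs one evaluation per remaining feature, so the search is quadratic in the size of the feature pool in the worst case.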
Experimental Setup
Dataset: MIMIC-II (Multiparameter Intelligent Monitoring in Intensive Care II), a public-access ICU database
The data model can be used for both GHW patients with sensors and ICU patients.
Our data: collected between 2001 and 2008 from a variety of ICUs (medical, surgical, coronary care, and neonatal)
Prediction goal: death or survival
Real-time vital signs: heart rate and oxygen saturation rate
Class imbalance: most patients survived
Evaluation: based on 10-fold cross validation
Result – Linear and Nonlinear Classification

| Method | Feature | AUC | Specificity | Sensitivity | PPV | NPV |
|---|---|---|---|---|---|---|
| LSVM | 1 | 0.5759 | 0.9497 | 0.0755 | 0.2550 | 0.7781 |
| LR | 1 | 0.4742 | 0.9483 | 0.0729 | 0.3181 | 0.7555 |
| KSVM | 1 | 0.5897 | 0.9497 | 0.1265 | 0.3643 | 0.7879 |
| LSVM | 2 | 0.4473 | 0.9497 | 0.0346 | 0.1300 | 0.7705 |
| LR | 2 | 0.4902 | 0.9483 | 0.0313 | 0.1667 | 0.7473 |
| KSVM | 2 | 0.5016 | 0.9497 | 0.0676 | 0.2450 | 0.7768 |
| LSVM | 1 & 2 | 0.5757 | 0.9497 | 0.1416 | 0.3917 | 0.7694 |
| LR | 1 & 2 | 0.5370 | 0.9483 | 0.0521 | 0.2500 | 0.7513 |
| KSVM | 1 & 2 | 0.6332 | 0.9497 | 0.1428 | 0.4146 | 0.7911 |

LSVM: linear SVM
LR: logistic regression
KSVM: RBF kernel SVM
1: DFA of heart rate
2: DFA of oxygen saturation
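Result – Feature Combinations

| Algorithm | Features | AUC |
|---|---|---|
| KSVM | DFA | 0.6332 |
| KSVM | DFA + cross-sign features | 0.6565 |
| KSVM | DFA + cross-sign features + ApEn | 0.6753 |
| KSVM | All features | 0.7079 |
| Logistic regression | DFA | 0.5370 |
| Logistic regression | DFA + cross-sign features | 0.5731 |
| Logistic regression | DFA + cross-sign features + ApEn | 0.5974 |
| Logistic regression | All features | 0.7402 |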
Result – Feature Selection

| Method | #Selected features | AUC | Specificity | Sensitivity | PPV | NPV |
|---|---|---|---|---|---|---|
| KSVM | 5 | 0.7752 | 0.9654 | 0.4852 | 0.8041 | 0.8651 |
| LR | 23 | 0.7844 | 0.9483 | 0.5208 | 0.7692 | 0.8567 |

LR is our first choice: better AUC, interpretability, and efficiency.
First 12 Selected Features (in logistic regression)
Standard deviation of heart rate
ApEn of heart rate
Energy of oxygen saturation
LF of oxygen saturation
LF of heart rate
DFA of oxygen saturation
Mean of heart rate
HF of heart rate
Inertia of heart rate
Homogeneity of heart rate
Energy of heart rate
Linear correlation of heart rate and oxygen saturation
Result – Our Final Model

| Method | AUC | Specificity | Sensitivity | PPV | NPV |
|---|---|---|---|---|---|
| 1 | 0.7402 | 0.9500 | 0.3646 | 0.7000 | 0.8185 |
| 2 | 0.7767 | 0.9500 | 0.4615 | 0.9000 | 0.6440 |
| 3 | 0.8082 | 0.9500 | 0.4865 | 0.9000 | 0.6546 |

Method 1: logistic regression + all features
Method 2: logistic regression + all features + exploratory undersampling
Method 3: logistic regression + feature selection + exploratory undersampling
Current Work: Density-based LR
• Standard logistic regression uses $\phi_k(x) = x_k$:
– $P(y=1 \mid x) = 1 / (1 + \exp(-\sum_k w_k x_k))$
– The probability of an event (e.g., ICU transfer, death) increases or decreases monotonically with each feature
– Not true in many cases: e.g., ICU transfer rate vs. age
• Idea: transform each feature $x_k$
Current Work: Density-based LR
• Use a kernel density estimator to estimate $p(x_k, y=1)$ and $p(x_k, y=0)$ for each feature $x_k$
• Resulting in a nonlinear separating surface that conforms to the true distribution of the data
• Advantages over KLR, SVM
– Efficiency, interpretability
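The slides do not give the exact feature transform, so the sketch below should be read as one plausible instance rather than the authors' formula: it estimates $p(x_k, y=1)$ and $p(x_k, y=0)$ with a Gaussian kernel density estimator and uses the log of their ratio as the transformed feature, which an ordinary logistic regression could then consume. The bandwidth, the `eps` guard, and all function names are assumptions.

```python
import math

def gaussian_kde(samples, bandwidth):
    """Return a 1-D Gaussian kernel density estimate built from `samples`."""
    norm = 1.0 / (len(samples) * bandwidth * math.sqrt(2 * math.pi))

    def density(x):
        return norm * sum(math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples)

    return density

def density_transform(values, labels, bandwidth=1.0, eps=1e-12):
    """Map x_k to log(p(x_k, y=1) / p(x_k, y=0)) (assumed form of the transform)."""
    pos = [v for v, y in zip(values, labels) if y == 1]
    neg = [v for v, y in zip(values, labels) if y == 0]
    p_pos, p_neg = gaussian_kde(pos, bandwidth), gaussian_kde(neg, bandwidth)
    prior_pos, prior_neg = len(pos) / len(values), len(neg) / len(values)
    # Joint density p(x_k, y) = p(x_k | y) * p(y); eps avoids log(0).
    return lambda x: math.log((prior_pos * p_pos(x) + eps) / (prior_neg * p_neg(x) + eps))

# Toy feature where events cluster in the middle of the range, so the
# event rate is not monotonic in the raw value.
ages = [20, 25, 30, 48, 50, 52, 70, 75, 80]
events = [0, 0, 0, 1, 1, 1, 0, 0, 0]
phi = density_transform(ages, events, bandwidth=5.0)
```

On this toy data the transformed value is high in the middle of the range and low at both ends, which a linear model on the raw feature cannot represent.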
Example of Density-based LR
[Figure: test data, with panels for the original LR and the density-based LR]
Future Work
• Distance-based classification algorithms for multi-dimensional time series
– Dynamic time warping, information distance
• Combination of feature-based and distance-based classification algorithms
– Include distance information in the objective function
• Combining Tier-1 and Tier-2 data
– Multi-kernel methods
• Interpretation of alerts
– Based on the magnitude and sign of model coefficients
Real-Time Simulation on Historical Data (at specificity = 0.95)

| Method | AUC | SENS | PPV | NPV | ACCU |
|---|---|---|---|---|---|
| 1 | 0.6834 | 0.30159 | 0.2345 | 0.9634 | 0.9128 |
| 1 + EMA | 0.78203 | 0.36508 | 0.27059 | 0.96664 | 0.9128 |
| 2 | 0.74359 | 0.30159 | 0.23457 | 0.96342 | 0.9293 |
| 2 + EMA | 0.777737 | 0.38095 | 0.27907 | 0.96342 | 0.92134 |
| 4 | 0.77689 | 0.38905 | 0.27907 | 0.96745 | 0.9336 |
| 4 + EMA | 0.81411 | 0.39683 | 0.28736 | 0.96825 | 0.92212 |
| 5 | 0.79902 | 0.4127 | 0.29545 | 0.96096 | 0.9229 |
| 5 + EMA | 0.79902 | 0.4127 | 0.29545 | 0.96096 | 0.9229 |
(Assuming feature independence)

| Feature | Coefficient |
|---|---|
| Local homogeneity of heart rate | -14.50 |
| Standard deviation of oxygen saturation | 10.20 |
| Entropy of oxygen saturation | 10.17 |
| LF of heart rate | 8.62 |
| Local homogeneity of oxygen saturation | 7.77 |
| LF/HF of oxygen saturation | 4.53 |
| Inertia of heart rate | 3.86 |
| Entropy of heart rate | 2.97 |
| Low frequency of oxygen saturation | -2.89 |
| Mean of oxygen saturation | -2.86 |
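Why Does Bagging Work?
Let each $L$ be a bucket sample that is independently drawn from the distribution $P$, and let $\varphi(x, L)$ be the predictor.
The aggregated predictor is: $\varphi_A(x) = E_L[\varphi(x, L)]$
The average prediction error of $\varphi(x, L)$ is: $e = E_L E_{X,Y}[(Y - \varphi(X, L))^2]$
The error of the aggregated predictor is: $e_A = E_{X,Y}[(Y - \varphi_A(X))^2]$
Using the inequality $(E[Z])^2 \le E[Z^2]$ gives us $e_A \le e$.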
Algorithm Details – Biased Bucket Bagging (BBB)
A critical factor deciding how much bagging will improve accuracy is the variance of the bootstrap models. We see that BBB with 4 buckets has the largest difference between […] and […]. In addition, BBB with 4 buckets has the highest standard deviation in its prediction results, so we choose BBB with 4 buckets as the final method.
[Figure: standard deviation of prediction results]
Algorithm Details – Bucket Bagging
Result on Real-Time System
We can see that all cases attain their best performance when the EMA parameter is around 0.06, showing that the choice of the parameter is robust. This small optimal value shows that historical records play an important role in prediction.
[Figure: cross validation for the EMA parameter]
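A minimal sketch of exponential moving average smoothing is shown below. It assumes the convention in which the parameter weights the newest value, so that a small value such as 0.06 leaves most of the weight on history, consistent with the remark above; it also assumes the EMA is applied to a stream of scores, which the slides do not state explicitly. The function name `ema` is illustrative.

```python
def ema(values, alpha=0.06):
    """Exponential moving average: s_t = alpha * v_t + (1 - alpha) * s_{t-1}.

    The first output equals the first input. A small alpha means each new
    value moves the smoothed series only slightly.
    """
    smoothed, state = [], None
    for v in values:
        state = v if state is None else alpha * v + (1 - alpha) * state
        smoothed.append(state)
    return smoothed

# A single spike in an otherwise flat score stream is strongly damped.
scores = [0.0, 0.0, 1.0, 0.0, 0.0]
smoothed_scores = ema(scores)
```

With alpha at 0.06, the spike raises the smoothed score by only 0.06, so an alert driven by the smoothed series reacts to sustained changes rather than to a single outlying reading.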