Practical Natural Language Processing: Harvesting Low Hanging Fruit


Oct 16, 2013



Jeffrey P Ferraro, PhD (Jeffrey.Ferraro@imail.org)

Scott L DuVall, PhD (Scott.DuVall@hsc.utah.edu)

Outline



Application of NLP - involves interpretation of radiographs.



Effects on NLP with and without radiology interpretations.



Some issues around obtaining good NLP results.



Difficult interpretations due to ambiguity in radiographs.



What’s been easy and what’s hard.



Alternatives to NLP that were tried and failed.



Describe some different approaches to NLP.



Describe the mechanics of the method used.



Working Example


Early Detection

Community Acquired Pneumonia (CAP)

Significance




CAP along with influenza - 8th leading cause of death in the United States.



~6 million cases annually / about 500,000 - 1.1 million hospitalizations annually.


Challenges




Diagnostic error rate for pneumonia: 10% - 25%



High variability in hospitalization decisions among clinicians (38% - 79% of hospitalizations could not be explained by illness severity).



10.7% of patients treated as outpatients were secondarily admitted within 7 days.



Early Detection


compliance w/Joint Commission (JCAHO) quality accreditation


Benefits




Reduction in diagnostic errors.



Reduction in unnecessary hospital admissions (20 times more costly).



Rapid diagnosis and severity assessment for proper care & treatment.


CAP Real-time Screening and eProtocol

Inputs: Vitals, Labs, Physical Exam, Radiological Reports

Predictive Diagnostic Screening: Likelihood of Pneumonia +/-

CAP eProtocol

1) Pleural Effusion - IP vs. OP Treatment

2) Multilobe Infiltrates - Severe CAP Criteria - IP / ICU Treatment

3) Cavitary Disease - MRSA Risk Factor - Treatment Protocol

Real-time Predictive Diagnostic Screening

Bayesian Network


NLP Interpretation Effect

AUC with NLP: 0.92

AUC without NLP: 0.65
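The AUC values above measure how well the screening model ranks pneumonia cases ahead of non-cases. As a minimal sketch of what that number means, the function below computes AUC by pairwise comparison (the Mann-Whitney formulation). The labels and scores are invented for illustration and are not the study's data; the function name `auc` is likewise only illustrative.

```python
def auc(labels, scores):
    """Area under the ROC curve via pairwise comparison (Mann-Whitney).

    labels: 1 = pneumonia, 0 = no pneumonia (hypothetical data).
    scores: predicted likelihood of pneumonia for each case.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    # Fraction of (positive, negative) pairs ranked correctly; ties count half.
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Invented example: one positive case (0.4) is scored below one negative (0.5).
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.5, 0.3, 0.1]
print(round(auc(labels, scores), 3))  # 8 of 9 pairs ordered correctly -> 0.889
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect ranking, which is the scale on which 0.65 and 0.92 should be read.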

Average Merit          Average Rank     Attribute
6951.986 +/- 26.887     1   +/- 0       NLP Finding
1618.487 +/- 22.655     2   +/- 0       Temperature
1120.249 +/- 26.959     3.1 +/- 0.3     Heart Rate
1078.845 +/- 11.476     3.9 +/- 0.3     Chief Complaint
 662.104 +/- 31.555     5   +/- 0       Age
 496.629 +/- 17.675     6.5 +/- 0.5     SPO2
 489.792 +/- 15.667     6.5 +/- 0.5     Respiratory Rate
 422.224 +/- 14.469     8   +/- 0       WBC
 224.394 +/-  9.062     9.1 +/- 0.3     Systolic BP
 203.584 +/-  9.537    10.3 +/- 0.46    Mean BP


Obtaining Good NLP Results

Conclusion



Radiology interpretations are necessary for good diagnostic prediction.


Challenges




Reduction in ambiguous language results in better predictive capabilities



Standardization of Terminology



Radiologists' Response




Need complete and accurate clinical contexts.



Accurate protocol (film) selection


NLP Challenges - Ambiguous Language

Possible Pneumonia

Clinical History: Cough and dyspnea.

Study: PA and lateral chest on …

Findings: No comparison studies available.

Heart is normal in size. Aorta is moderately tortuous.

Lung volumes are normal. Minimal airspace disease is identified within the region of medial right middle lobe, seen on frontal and lateral views. This opacity is consistent with focal atelectasis or possibly inflammatory process.

No significant edema. No pleural effusion or pneumothorax.

Impression:

1. Mild airspace disease in right middle lobe, as above.




NLP Challenges - Ambiguous Language

Positive Pneumonia

PA and lateral chest radiograph

Comparison: None

Indication: Fever, cough

Findings:

The lungs are symmetrically inflated there is obscuration of the cardiac apex on the frontal projection with increased attenuation on the lateral view suggesting subsegmental lingular airspace disease. The diaphragm appears well visualized. Mild calcification noted at the level of the aortic arch. Mild ectasia of the descending thoracic aorta. Mild cardiomegaly. Osseous structures and soft tissues are unremarkable.

Impression:

1. Subsegmental airspace disease of the lingula. Recommend a followup erect PA and lateral chest radiograph.

2. Mild cardiomegaly.

3. Mild atherosclerotic disease and ectasia of the thoracic aorta.


NLP - What’s Easy and What’s Hard

Clinical Findings using NLP



+/- Pneumonia: Fairly Good (ambiguity / standard terminology)

Sen: 0.95, Spec: 0.81, PPV: 0.93, Acc: 0.91

+/- Pleural Effusions: Good (succinct language)

Sen: 0.93, Spec: 0.97, PPV: 0.84, Acc: 0.96

+/- Cavitary Disease: Good (succinct language)

Sen: 0.88, Spec: 1.0, PPV: 1.0, Acc: 1.0

Single lobe or Multi-lobe Infiltrates: Poor (must be inferred by locations: RLL, RML, RUL, LLL, LUL, Right Lung, Left Lung). Error Propagation Problem.

Single Lobe: Sen: 0.65, Spec: 0.78, PPV: 0.82, Acc: 0.70

Multi-lobe: Sen: 0.78, Spec: 0.65, PPV: 0.58, Acc: 0.70
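For readers less familiar with the four measures reported above, here is a minimal sketch of how sensitivity, specificity, PPV and accuracy follow from a confusion matrix. The counts are invented (they are not the study's data) and were chosen only to show how a rare finding can have sensitivity 0.88 while accuracy still rounds to 1.0; the function name `screening_metrics` is hypothetical.

```python
def screening_metrics(tp, fp, tn, fn):
    """Standard diagnostic measures from confusion-matrix counts."""
    return {
        "sensitivity": tp / (tp + fn),            # true positives found
        "specificity": tn / (tn + fp),            # true negatives found
        "ppv": tp / (tp + fp),                    # positive calls that are right
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
    }


# Invented counts: 25 reports with the finding, 975 without.
metrics = screening_metrics(tp=22, fp=0, tn=975, fn=3)
print({name: round(value, 2) for name, value in metrics.items()})
# {'sensitivity': 0.88, 'specificity': 1.0, 'ppv': 1.0, 'accuracy': 1.0}
```

When negatives dominate, accuracy is driven mostly by specificity, so sensitivity and PPV are usually the more informative numbers for rare findings.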


NLP Alternatives that Failed

Templating



<report body>

. . . .

Pneumonia Quality Assurance

-----------------------------------------

Parenchymal opacity c/w pneumonia in the appropriate clinical setting (yes | indeterminate):

or

No parenchymal opacity to suggest pneumonia.

Multilobar or bilateral involvement (yes|no):

Cavitation (yes|no):

Pleural fluid (yes|no):

No Compliance w/ Templating



Complex cases



Proper Protocol (Film) & Clinical Context



Workflow / Productivity Impact (Template Selection)



Approaches to NLP

Direct Machine Learning (ML) Classification



Coarse-Grained Approach

bag of words, sentences, n-grams, chunking phrases

Supervised Learning Methods (Need to know Truth)

- Support Vector Machines
- Gaussian Mixture Models
- Bayesian Networks
- K-Nearest Neighbor
- Decision Trees
- Random Forests


Information Extraction & Rule-Based

Fine-Grained Approach

extract clinical concept categories (e.g., appliances, state change, locations, clinical findings)

IE (Pattern Matching, NER, statistical inference models)

Classification (Inference Rules)
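A minimal sketch of the fine-grained, rule-based idea: pattern matching extracts lobe locations from sentences that mention a finding, and an inference rule then classifies the report as single-lobe or multi-lobe. The regular expressions, the cue lists and the function names are assumptions made for illustration, not the system described in the talk; the example sentences are taken from the annotated sentences in this deck.

```python
import re

# Hypothetical phrase-to-code patterns; the codes (RLL, RML, ...) follow the slides.
LOBE_PATTERNS = {
    "RUL": re.compile(r"right upper lobe", re.I),
    "RML": re.compile(r"right middle lobe", re.I),
    "RLL": re.compile(r"right lower lobe", re.I),
    "LUL": re.compile(r"left upper lobe", re.I),
    "LLL": re.compile(r"left lower lobe", re.I),
}
# Assumed finding and negation cues; far from a complete clinical lexicon.
FINDING = re.compile(r"opacity|opacification|consolidation|airspace disease", re.I)
NEGATION = re.compile(r"\b(no|without|clear)\b", re.I)


def extract_lobes(sentences):
    """IE step: lobes mentioned in sentences with a non-negated finding."""
    lobes = set()
    for sentence in sentences:
        if FINDING.search(sentence) and not NEGATION.search(sentence):
            lobes.update(code for code, pat in LOBE_PATTERNS.items()
                         if pat.search(sentence))
    return lobes


def classify_infiltrate(lobes):
    """Inference rule: count distinct lobes to decide single vs. multi-lobe."""
    if not lobes:
        return "none"
    return "single-lobe" if len(lobes) == 1 else "multi-lobe"


report = [
    "However , there is an area of patchy parenchymal opacity present "
    "posteriorly in the right lower lobe .",
    "Left lower lobe consolidation , consistent with pneumonia .",
    "There is no segmental or lobar consolidation .",
]
lobes = extract_lobes(report)
print(sorted(lobes), classify_infiltrate(lobes))  # ['LLL', 'RLL'] multi-lobe
```

Because the classification depends entirely on the extracted locations, one missed or spurious location flips the result. This is the error propagation problem noted for single-lobe versus multi-lobe infiltrates.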


Information Extraction & Machine Learning

Fine-Grained Approach

extract clinical concept categories (e.g., appliances, state change, locations, clinical findings)

IE (Pattern Matching, NER, statistical inference models)

Classification (ML: Supervised Learning Methods)


Coarse-Grained Method: +/- Pneumonia



Machine Learning Approach


Random Forest (ensemble of decision trees)



Decision Trees from various perspectives (randomly constrain available features)



Majority Vote Rules
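The two ingredients named above, trees built from randomly constrained feature subsets and a majority vote, can be sketched with a toy ensemble. This is a deliberate simplification and not the classifier used in the study: each "tree" is a one-level decision stump, no bootstrap resampling is done, and the four word features and six training rows are hypothetical encodings of sentences from this deck.

```python
import random
from collections import Counter


def train_stump(X, y, feature_ids):
    """One-level tree: choose the binary feature (from feature_ids) whose
    split misclassifies the fewest rows; each side predicts its majority label."""
    best = None
    for f in feature_ids:
        sides = {}
        for v in (0, 1):
            labels = [yi for xi, yi in zip(X, y) if xi[f] == v]
            # An empty side falls back to the overall majority label.
            sides[v] = Counter(labels or y).most_common(1)[0][0]
        errors = sum(sides[xi[f]] != yi for xi, yi in zip(X, y))
        if best is None or errors < best[0]:
            best = (errors, f, sides)
    return best[1], best[2]


def train_forest(X, y, n_trees=25, n_features=3, seed=0):
    """Each stump sees only a random subset of the available features."""
    rng = random.Random(seed)
    all_features = range(len(X[0]))
    return [train_stump(X, y, rng.sample(all_features, n_features))
            for _ in range(n_trees)]


def predict(forest, x):
    """Majority vote across all stumps."""
    votes = Counter(sides[x[f]] for f, sides in forest)
    return votes.most_common(1)[0][0]


# Hypothetical features: does the sentence contain the word
# [clear, consolidation, pneumonia, opacity]?  Label 1 = positive evidence.
X = [[1, 0, 0, 0], [1, 0, 0, 0], [1, 0, 0, 0],
     [0, 1, 1, 0], [0, 0, 1, 0], [0, 0, 1, 1]]
y = [0, 0, 0, 1, 1, 1]

forest = train_forest(X, y)
print(predict(forest, [0, 1, 1, 0]))  # sentence with "consolidation", "pneumonia"
print(predict(forest, [1, 0, 0, 0]))  # sentence with "clear"
```

A full random forest grows deeper trees on bootstrap samples of the training data; the voting step is the same.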


Required Artifacts

- Segmenter - decompose document into sentences
- Sentence Level Annotation (positive evidence, negative evidence, no information gain)
- Statistical Methods (10-fold cross validation, Bootstrapping)
- Training and test data sets
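Two of the artifacts listed above, the segmenter and 10-fold cross validation, can be sketched in a few lines. Both functions are illustrative assumptions rather than the tools used in the study: the segmenter is a naive punctuation-based splitter that would mishandle abbreviations such as "c/w" or "Dr.", and `k_fold_indices` is a hypothetical helper name.

```python
import random
import re


def segment(report):
    """Naive segmenter: split after '.', '?' or '!' followed by whitespace."""
    parts = re.split(r"(?<=[.?!])\s+", report.strip())
    return [p for p in parts if p]


def k_fold_indices(n, k=10, seed=0):
    """Yield (train, test) index lists so every item is tested exactly once."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    for i, test in enumerate(folds):
        train = [j for m, fold in enumerate(folds) if m != i for j in fold]
        yield train, test


report = ("Heart is normal in size. Aorta is moderately tortuous. "
          "Lung volumes are normal.")
sentences = segment(report)
print(sentences)

# 10-fold split over 20 (hypothetical) annotated sentences.
for train, test in k_fold_indices(20, k=10):
    assert not set(train) & set(test)  # no sentence is in both sets
```

Each annotated sentence lands in the test set of exactly one fold, so the averaged score uses all of the labeled data without testing on anything the model was trained on.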


Coarse-Grained Method

Sentence-level annotation labels: 0 = -ve (negative evidence), 1 = +ve (positive evidence), 2 = ~IG (no information gain)

0  The upper lobes are well aerated and normal .
0  The right lung appears clear .
0  The remainder of the lungs is clear .
0  The right lung and apical portion of the left upper lobe remains clear .
0  Left lung is clear .
0  No other areas of air space opacity are seen to suggest other regions of pneumonia .
0  There is no segmental or lobar consolidation .
1  There is an area of parenchymal lung opacity present in the left lower lobe and pneumonia is questioned .
1  Left lower lobe pneumonia .
1  However , there is an area of patchy parenchymal opacity present posteriorly in the right lower lobe .
1  The appearance of this is suggestive of pneumonia .
1  There is confluent opacification of most of the posterobasal segment of the left lower lobe .
1  Left lower lobe consolidation , consistent with pneumonia .
1  There is complete opacification of the superior segment of the right lower lobe , with air bronchograms .
1  Frontal view gives this consolidated segment a round appearance .
1  Segmental consolidation , consistent with pneumonia .
1  There is confluent opacity in the left base which blurs the diaphragm .
1  Left basilar subsegmental consolidation , consistent with pneumonia .
2  Spondylitic change is noted of the thoracic spine .
2  On the lateral projection this appears to be in the lingula and the lower lobe .
2  Heart size and vascular pattern are within normal limits .
2  Cardiac size is normal .
2  No extraventilatory air is seen .
2  Underlying bilateral severe bullous emphysema is also noted , with marked hyperinflation in the upper lobes .
2  Central pulmonary arterial hypertension is noted .
2  Expansion is within normal limits .
2  Osseous mineralization is diffusely diminished .
2  Thoracic kyphosis is accentuated and there is severe collapse of several mid thoracic vertebrae .


Coarse-Grained Method

Feature Set (X): upper, right, lingula, size, lateral, lobe, pneumonia, air, seen

[Table: 0/1 feature values, 6 sentences x 9 features]

Truth (Y): 1, 1, 0, 0, 2, 2
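A minimal sketch of how a feature set (X) and truth vector (Y) of this kind can be built with a bag-of-words encoding. The vocabulary is the nine words listed on the slide; the three sentences and their labels come from the annotated examples earlier in the deck. The rows produced here are illustrative only and do not reconstruct the slide's matrix, and `featurize` is a hypothetical helper name.

```python
VOCAB = ["upper", "right", "lingula", "size", "lateral",
         "lobe", "pneumonia", "air", "seen"]


def featurize(sentence, vocab=VOCAB):
    """Binary bag of words: 1 if the vocabulary word occurs in the sentence."""
    tokens = set(sentence.lower().split())
    return [1 if word in tokens else 0 for word in vocab]


# (sentence, label) pairs; labels: 0 = -ve, 1 = +ve, 2 = no information gain.
annotated = [
    ("Left lower lobe pneumonia .", 1),
    ("The right lung appears clear .", 0),
    ("No extraventilatory air is seen .", 2),
]

X = [featurize(sentence) for sentence, _ in annotated]
Y = [label for _, label in annotated]

for row, label in zip(X, Y):
    print(row, label)
# [0, 0, 0, 0, 0, 1, 1, 0, 0] 1
# [0, 1, 0, 0, 0, 0, 0, 0, 0] 0
# [0, 0, 0, 0, 0, 0, 0, 1, 1] 2
```

Word order and negation are lost in this encoding, which is part of what makes the approach coarse-grained: "pneumonia" sets the same feature whether the sentence affirms or rules it out.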




Coarse-Grained Method

Example


Questions?