24 Oct 2013

Recent Efforts in Clinical NLP:

Clinical Text Analysis and Knowledge
Extraction System (cTAKES)

Guergana K. Savova, PhD

Children’s Hospital Boston and Harvard
Medical School

Acknowledgements

Software developers and contributors at different times (in no specific order)

James Masanz, Mayo Clinic

Patrick Duffy, Mayo Clinic

Philip Ogren, University of Colorado

Sean Murphy, Mayo Clinic

Vinod Kaggal, Mayo Clinic

Jiaping Zheng, Children’s Hospital Boston

Pei Chen, Children’s Hospital Boston

Jinho Choi, University of Colorado


Investigators (in no specific order)

Christopher Chute, MD, DrPH, Mayo Clinic

James Buntrock, MS, Mayo Clinic

Guergana Savova, PhD, Children’s Hospital Boston

Overview

Background

Clinical Text Analysis and Knowledge Extraction System (cTAKES)

cTAKES for developers

Download and install of cTAKES

How to build the dictionary

cTAKES: graphical user interface

Definitions


Information Extraction (IE)
Extracting existing facts from unstructured or loosely structured text
into a structured form

Information Retrieval (IR)
Finding documents relevant to a user query

Named Entity Recognition (NER)
Discovery of groups of textual mentions that belong to a certain
semantic class

Natural Language Processing (NLP)
Computational methods for text processing based on linguistically
sound principles

Clinical NLP
NLP for the clinical narrative

Biomedical NLP
NLP for the clinical narrative and biomedical literature

Problem Space


Structured information

Relational databases

Easy to extract information from them

Semi-structured information

Loosely formatted XML, CSV tables

Not challenging to extract information

Unstructured information

Scholarly literature, clinical notes, research reports, webpages

Majority of information is unstructured!!

Real challenge to extract the information

Overarching Goal

Open-source, general-purpose clinical NLP toolkit


Phenotype extraction from unstructured data


Library of modules


Cohesive with other initiatives


Cutting edge methodologies


Best software development practices

Our principles


Open source


Scalable and robust


Modular and expandable


Based on existing standards and conventions


Scalable, adaptable methodologies through open collaboration in
open-source development

A 43-year-old woman was diagnosed with type 2 diabetes
mellitus by her family physician 3 months before this
presentation. Her initial blood glucose was 340 mg/dL. Glyburide
2.5 mg once daily was prescribed. Since then, self-monitoring of
blood glucose (SMBG) showed blood glucose levels of 250-270
mg/dL. She was referred to an endocrinologist for further
evaluation.

On examination, she was normotensive and not acutely ill. Her
body mass index (BMI) was 18.7 kg/m2 following a recent 10 lb
weight loss. Her thyroid was symmetrically enlarged and ankle
reflexes absent. Her blood glucose was 272 mg/dL, and her
hemoglobin A1c (HbA1c) was 10.3%. A lipid profile showed a total
cholesterol of 261 mg/dL, triglyceride level of 321 mg/dL, HDL
level of 48 mg/dL, and an LDL of 150 mg/dL. Thyroid function
was normal. Urinalysis showed trace ketones.

She adhered to a regular exercise program and vitamin regimen,
smoked 2 packs of cigarettes daily for the past 25 years, and
limited her alcohol intake to 1 drink daily. Her mother's brother
was diabetic.

Processing Clinical Notes


Clinical Element Model

http://intermountainhealthcare.org/cem/Pages/home.aspx

Disorder CEM
text: diabetes mellitus
code: 73211009
subject: patient
relative temporal context: 3 months ago
negation indicator: not negated

Disorder CEM
text: diabetes mellitus
code: 73211009
subject: family member
relative temporal context:
negation indicator: not negated

Tobacco Use CEM
text: smoking
code: 365981007
subject: patient
relative temporal context: 25 years
negation indicator: not negated

Medication CEM
text: Glyburide
code: 315989
subject: patient
frequency: once daily
negation indicator: not negated
strength: 2.5 mg
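
The CEM instances above can be pictured as simple typed records. The following is a minimal sketch only: the class name `ClinicalElement`, its field names, and the helper functions are hypothetical and are not the Intermountain CEM schema or the cTAKES type system. Only the values are taken from the slide.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class ClinicalElement:
    """Hypothetical record mirroring the fields shown on the slide."""
    kind: str                      # e.g. "Disorder", "Tobacco Use", "Medication"
    text: str
    code: str
    subject: str                   # "patient" or "family member"
    negated: bool = False
    relative_temporal_context: Optional[str] = None
    attributes: Dict[str, str] = field(default_factory=dict)


def elements_from_example_note() -> List[ClinicalElement]:
    """The four instances shown for the example note."""
    return [
        ClinicalElement("Disorder", "diabetes mellitus", "73211009", "patient",
                        relative_temporal_context="3 months ago"),
        ClinicalElement("Disorder", "diabetes mellitus", "73211009", "family member"),
        ClinicalElement("Tobacco Use", "smoking", "365981007", "patient",
                        relative_temporal_context="25 years"),
        ClinicalElement("Medication", "Glyburide", "315989", "patient",
                        attributes={"frequency": "once daily", "strength": "2.5 mg"}),
    ]


def patient_problems(elements: List[ClinicalElement]) -> List[ClinicalElement]:
    """Non-negated disorders asserted about the patient, not about relatives."""
    return [e for e in elements
            if e.kind == "Disorder" and e.subject == "patient" and not e.negated]
```

Keeping `subject` and `negated` as explicit fields is what lets the same mention ("diabetes mellitus") be counted for the patient but not for her relative.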


Comparative Effectiveness

Compare the effectiveness of different treatment
strategies (e.g., modifying target levels for glucose,
lipid, or blood pressure) in reducing cardiovascular
complications in newly diagnosed adolescents and
adults with type 2 diabetes.

Compare the effectiveness of traditional behavioral
interventions versus economic incentives in
motivating behavior changes (e.g., weight loss,
smoking cessation, avoiding alcohol and substance
abuse) in children and adults.

Meaningful Use


Maintain problem list


Maintain active med list


Record smoking status


Provide clinical summaries for each office visit


Generate patient lists for specific conditions


Submit syndromic surveillance data
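
Several of these objectives reduce to simple queries once notes have been turned into structured records. A minimal sketch, assuming hypothetical patient identifiers and a plain-dictionary record layout (neither appears in the slides); the codes are the ones shown in the CEM examples.

```python
from collections import defaultdict

# Hypothetical extracted records; the patient ids are invented for illustration.
RECORDS = [
    {"patient": "p1", "kind": "Disorder", "code": "73211009",
     "subject": "patient", "negated": False},
    {"patient": "p1", "kind": "Medication", "code": "315989",
     "subject": "patient", "negated": False},
    {"patient": "p2", "kind": "Disorder", "code": "73211009",
     "subject": "family member", "negated": False},
    {"patient": "p3", "kind": "Disorder", "code": "73211009",
     "subject": "patient", "negated": True},
]


def patients_with_condition(records, code):
    """Patients for whom the condition is asserted about the patient and not negated."""
    return sorted({r["patient"] for r in records
                   if r["kind"] == "Disorder" and r["code"] == code
                   and r["subject"] == "patient" and not r["negated"]})


def summary_lists(records):
    """Per-patient problem list and medication list."""
    out = defaultdict(lambda: {"problems": [], "medications": []})
    for r in records:
        if r["subject"] != "patient" or r["negated"]:
            continue  # skip family history and negated mentions
        if r["kind"] == "Disorder":
            out[r["patient"]]["problems"].append(r["code"])
        elif r["kind"] == "Medication":
            out[r["patient"]]["medications"].append(r["code"])
    return dict(out)
```

In this toy data only `p1` ends up on the diabetes list: `p2` has a family history only and `p3` has a negated mention.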

Clinical Practice

Disorder CEM
text: diabetes mellitus
code: 73211009
subject: patient
relative temporal context: 3 months ago
negation indicator: not negated

Medication CEM
text: Glyburide
code: 315989
subject: patient
frequency: once daily
negation indicator: not negated
strength: 2.5 mg


Provide problem list and meds from the visit

Applications


Meaningful use of the EMR

Comparative effectiveness

Clinical investigation

Patient cohort identification

Phenotype extraction

Epidemiology

Clinical practice

and many more.

With deep semantic processing, the sky is the limit for applications

Partnerships

NCBC-funded initiatives

Integrating Data for Analysis, Anonymization and Sharing (iDASH)

Ontology Development and Information Extraction (ODIE)

Veterans Administration

Strategic Health Advanced Research Projects (SHARP)

SHARP 3: SMaRT app (http://www.smartplatforms.org/)

SHARP 4: www.sharpn.org

R01s

Shared annotated lexical resource

Temporal relation discovery for the clinical domain

Multi-source integrated platform for answering clinical questions

eMERGE, PGRN (Pharmacogenomics Research Network)

Linguistic Data Consortium and Penn Treebank

MITRE Corporation

Integrating cTAKES within i2b2

Querying encrypted clinical notes stored in the i2b2 database

Processing the result notes through cTAKES

Persisting extracted concepts into the i2b2 database

Thus, the concepts are now searchable by the researcher

Enabling the training and running of classifiers directly from the
i2b2 workbench
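
The query, process, persist, and search steps above can be sketched end to end with an in-memory SQLite database. Everything here is an assumption made for illustration: the table and column names are hypothetical and are not the i2b2 schema, the notes are stored unencrypted, and a naive substring lookup stands in for cTAKES.

```python
import sqlite3

# Hypothetical term-to-code dictionary standing in for the NLP step.
TERM_CODES = {"diabetes mellitus": "73211009", "glyburide": "315989"}


def extract_concepts(text):
    """Stand-in for cTAKES: naive case-insensitive substring lookup."""
    lowered = text.lower()
    return [(term, code) for term, code in TERM_CODES.items() if term in lowered]


def run_pipeline(conn):
    """Read notes, extract concepts, and persist them for later search."""
    notes = conn.execute("SELECT note_id, patient_id, body FROM notes").fetchall()
    for note_id, patient_id, body in notes:
        for term, code in extract_concepts(body):
            conn.execute(
                "INSERT INTO concepts (note_id, patient_id, term, code) "
                "VALUES (?, ?, ?, ?)",
                (note_id, patient_id, term, code),
            )
    conn.commit()


def find_patients(conn, code):
    """The researcher-facing search: which patients have this concept?"""
    rows = conn.execute(
        "SELECT DISTINCT patient_id FROM concepts WHERE code = ? ORDER BY patient_id",
        (code,),
    )
    return [r[0] for r in rows]


def demo():
    """Build a toy database with two notes and run the pipeline over it."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE notes "
                 "(note_id INTEGER PRIMARY KEY, patient_id TEXT, body TEXT)")
    conn.execute("CREATE TABLE concepts "
                 "(note_id INTEGER, patient_id TEXT, term TEXT, code TEXT)")
    conn.executemany(
        "INSERT INTO notes (patient_id, body) VALUES (?, ?)",
        [("p1", "Diagnosed with type 2 diabetes mellitus. Glyburide 2.5 mg once daily."),
         ("p2", "Thyroid function was normal.")],
    )
    run_pipeline(conn)
    return conn
```

Persisting concepts in their own table, keyed by note and patient, is what makes them searchable without re-running the NLP step.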

https://www.i2b2.org/events/slides/i2b2_AMIA_Tutorial_20100310.pdf

….a scalable informatics framework that will enable clinical
researchers to use existing clinical data for discovery research
and, when combined with IRB-approved genomic data,
facilitate the design of targeted therapies for individual
patients with diseases having genetic origins.

clinical Text Analysis and Knowledge
Extraction System (cTAKES)

cTAKES Adoption


May 2011: 2306 downloads*

eMERGE (SGH, NW)

PGRN (HMS, NW)

Extensions: Yale (YATEX), MITRE

* Source: http://sourceforge.net/project/stats/?group_id=255545&ugn=ohnlp&type=&mode=alltime

cTAKES Technical Details


Open source

Apache v2.0 license

http://sourceforge.net/projects/ohnlp/

Java 1.5

Dependency on UMLS which requires a UMLS license (free)

Framework

IBM’s Unstructured Information Management Architecture (UIMA)
open source framework, Apache project

Methods

Natural Language Processing methods (NLP)

Based on standards and conventions to foster interoperability

Application

High-throughput system

cTAKES: Components


Sentence boundary detection (OpenNLP technology)


Tokenization (rule-based)


Morphologic normalization (NLM’s LVG)


POS tagging (OpenNLP technology)


Shallow parsing (OpenNLP technology)


Named Entity Recognition


Dictionary mapping (lookup algorithm)


Machine learning (MAWUI)


types: diseases/disorders, signs/symptoms, anatomical sites, procedures, medications


Negation and context identification (NegEx)


Dependency parser


Drug Profile module


Smoking status classifier


CEM normalization module (soon to be released)
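
To make the first few stages concrete, here is a toy sketch of sentence boundary detection, rule-based tokenization, and dictionary-lookup NER chained together. It is not the cTAKES implementation: cTAKES uses OpenNLP models and a UMLS-derived dictionary, whereas the regular expressions, the two-entry `DICTIONARY`, and all function names below are assumptions chosen for brevity.

```python
import re

# Hypothetical mini-dictionary: normalized token tuple -> (semantic type, code).
DICTIONARY = {
    ("diabetes", "mellitus"): ("disease/disorder", "73211009"),
    ("glyburide",): ("medication", "315989"),
}
MAX_TERM_LEN = max(len(k) for k in DICTIONARY)


def split_sentences(text):
    """Naive boundary detection: split after ., ! or ? followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def tokenize(sentence):
    """Rule-based tokenizer: numbers (with decimals), words, or single symbols."""
    return re.findall(r"\d+(?:\.\d+)?|\w+|[^\w\s]", sentence)


def lookup(tokens):
    """Longest-match dictionary lookup over lowercased token windows."""
    lowered = [t.lower() for t in tokens]
    found, i = [], 0
    while i < len(lowered):
        for n in range(min(MAX_TERM_LEN, len(lowered) - i), 0, -1):
            key = tuple(lowered[i:i + n])
            if key in DICTIONARY:
                sem_type, code = DICTIONARY[key]
                found.append((" ".join(tokens[i:i + n]), sem_type, code))
                i += n
                break
        else:
            i += 1  # no dictionary term starts at this token
    return found


def annotate(text):
    """Run the three stages in order and collect all mentions."""
    return [m for s in split_sentences(text) for m in lookup(tokenize(s))]
```

Each stage consumes the previous stage's output, which mirrors the pipeline ordering of the component list; later stages such as negation detection would then operate on the mentions returned by `annotate`.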

Output Example: Drug Object



Tamoxifen 20 mg po daily started on March 1, 2005.



Drug


Text: Tamoxifen


Associated code: C0351245


Strength: 20 mg


Start date: March 1, 2005


End date: null


Dosage: 1.0


Frequency: 1.0


Frequency unit: daily


Duration: null


Route: Enteral Oral


Form: null


Status: current


Change Status: no change


Certainty: null
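
As a rough illustration of how such a drug object could be filled from the example sentence, the sketch below uses a single regular expression. This is a simplification under stated assumptions, not the Drug Profile module: the `DrugMention` class, the pattern, and the one-entry route table are hypothetical, and only a subset of the fields on the slide is covered.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Route abbreviation -> label as shown on the slide (one entry, for illustration).
ROUTES = {"po": "Enteral Oral"}


@dataclass
class DrugMention:
    text: str
    strength: Optional[str] = None
    route: Optional[str] = None
    frequency_unit: Optional[str] = None
    start_date: Optional[str] = None


# Hypothetical pattern: drug name, strength in mg, then optional route,
# frequency unit, and start date, in that fixed order.
PATTERN = re.compile(
    r"(?P<drug>[A-Z][a-z]+)\s+"
    r"(?P<strength>\d+(?:\.\d+)?\s*mg)"
    r"(?:\s+(?P<route>po))?"
    r"(?:\s+(?P<freq>daily|weekly))?"
    r"(?:\s+started\s+on\s+(?P<start>[A-Z][a-z]+\s+\d{1,2},\s+\d{4}))?"
)


def parse_drug(sentence):
    """Return a DrugMention for the first match, or None if nothing matches."""
    m = PATTERN.search(sentence)
    if m is None:
        return None
    return DrugMention(
        text=m.group("drug"),
        strength=m.group("strength"),
        route=ROUTES.get(m.group("route")),
        frequency_unit=m.group("freq"),
        start_date=m.group("start"),
    )
```

Attributes that the sentence does not mention stay `None`, matching the `null` entries in the output above.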

Output Example: Disorder Object



No evidence of cholangiocarcinoma.



Disorder


Text: cholangiocarcinoma


Associated code: SNOMED 70179006


Certainty: 1


Context: current


Relatedness to patient: true


Status: negated
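
The "negated" status comes from a trigger phrase ("No evidence of") preceding the mention. A minimal sketch in the spirit of NegEx follows; the four-entry trigger list, the five-token window, and the function names are assumptions, and the published NegEx algorithm uses a much larger trigger set plus scope-terminating terms.

```python
import re

# A few trigger phrases in the spirit of NegEx; the real list is much longer.
NEGATION_TRIGGERS = ["no evidence of", "no", "denies", "without"]
WINDOW = 5  # hypothetical scope: number of tokens after a trigger


def is_negated(sentence, mention):
    """True if `mention` starts within WINDOW tokens after a negation trigger."""
    tokens = re.findall(r"\w+", sentence.lower())
    target = re.findall(r"\w+", mention.lower())
    if not target:
        return False
    # Token positions where the mention begins.
    starts = [i for i in range(len(tokens) - len(target) + 1)
              if tokens[i:i + len(target)] == target]
    for trigger in NEGATION_TRIGGERS:
        trig = trigger.split()
        for i in range(len(tokens) - len(trig) + 1):
            if tokens[i:i + len(trig)] == trig:
                end = i + len(trig)
                if any(end <= s < end + WINDOW for s in starts):
                    return True
    return False


def disorder_status(sentence, mention):
    """Map the boolean onto the status labels used in the slides."""
    return "negated" if is_negated(sentence, mention) else "not negated"
```

For example, `disorder_status("No evidence of cholangiocarcinoma.", "cholangiocarcinoma")` yields `"negated"`, while a sentence with no trigger phrase yields `"not negated"`.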



cTAKES for developers

Download and install of cTAKES


Building the dictionary

Jiaping Zheng

Children’s Hospital Boston

Introduction


See separate pdf for the slides

Graphical User Interface (GUI) to cTAKES:

a Prototype

Pei J. Chen

Children’s Hospital Boston

cTAKES as a Service

Objectives

1. Demo cTAKES prototype web application

Empower End Users to leverage cTAKES

2. Gather feedback for future cTAKES GUI

3. Potential system integrations with other applications
(i.e. i2b2, ARC, Web Annotator)


Developed within i2b2 to integrate cTAKES in the i2b2 NLP cell


cTAKES Web Application: a Prototype



http://chipweb2.chip.org/cTakes_webservice_trunk/index.html


Single clinical note

Technologies

Front-End: Web GUI (ExtJS, JavaScript)

Back-End: cTAKES (Java, UIMA)

Middleware: Web Services (Java, Apache CXF, JSON)
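
The slides do not show the wire format exchanged between the web GUI and the middleware. Purely as an illustration of the JSON hand-off, the sketch below serializes annotations on the server side and decodes them on the client side; the field names (`begin`, `end`, `type`, `coveredText`, `textLength`) and both functions are hypothetical and are not the prototype's actual API.

```python
import json


def build_response(note_text, annotations):
    """Serialize (begin, end, type) annotations as a JSON document."""
    payload = {
        "textLength": len(note_text),
        "annotations": [
            {"begin": b, "end": e, "type": t, "coveredText": note_text[b:e]}
            for (b, e, t) in annotations
        ],
    }
    return json.dumps(payload, sort_keys=True)


def parse_response(body):
    """What a client could do on receipt: decode and index mentions by type."""
    data = json.loads(body)
    by_type = {}
    for ann in data["annotations"]:
        by_type.setdefault(ann["type"], []).append(ann["coveredText"])
    return by_type
```

Character offsets are sent alongside the covered text so that a front end could highlight mentions in the original note.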


Deployment Considerations

Deployment Model

Security

Performance

Licensing (UMLS, Apache, GPL v.3)