Project 3: High-Throughput Phenotyping


Strategic Health IT Advanced Research Projects (SHARP)

Area 4: Secondary Use of EHR Data

Project 3: High-Throughput Phenotyping

June 30, 2011


Jyoti Pathak, PhD

Assistant Professor of Biomedical Informatics

Department of Health Sciences Research


Project 3: Collaborators and Acknowledgments


CDISC (Clinical Data Interchange Standards Consortium)


Rebecca Kush, Landen Bain, Mark Arratoon


Centerphase Solutions


Gary Lubin, Jeff Tarlowe


Harvard University/MIT


Guergana Savova, Margarita Sordo, Peter Szolovits


IBM T.J. Watson Research Labs


Marshall Schor


Intermountain Healthcare/University of Utah


Susan Welch, Herman Post, Darin Wilcox, Peter Haug


Mayo Clinic


Cui Tao, Lacey Hart, Erin Martin, Sridhar Dwarkanath, Calvin Beebe, Kent Bailey, Kevin Bruce, Mike Conway (UCSD)


Outline


Background


Ongoing projects and updates


Proposed project ideas for Year 2


Productivity to date


Q & A


The Big Question…


The era of Genome-Wide Association Studies (GWAS) has arrived


Genotyping cost is asymptoting to free [Altman et al.]


Most (all?) published GWAS are done on carefully
selected and uniformly characterized patient populations


Time consuming


Clinical phenotyping, on the other hand, is lacking


Slow-throughput


Costly and time consuming


How “good” are EMRs (with inconsistencies and biases) as a source for phenotypes?


Why is this important now?


Bio-repositories are becoming popular


Linking biospecimens to personal health data


Population-based studies for genetic and environmental conditions and contributions to disease etiology


Often limited in scope or population diversity


Clinical trials eligibility


Cohort identification is always a bottleneck


Quality metrics and HITECH Act


Large-scale prospective cohort studies could be facilitated by availability of complete, standardized, and unbiased data from EMRs

Pros and Cons of EMR Data for Phenotyping


We have a LOT of information about subjects


Demographics, labs, meds, procedures…


Team diagnoses, as opposed to a diagnosis based on a single person’s opinion


Potential for more reliable diagnoses


Identification of otherwise latent population
differences



Possible issues with using EMR data for
phenotyping


Non-standardized, heterogeneous, unstructured data


Measured (e.g., demographics) vs. unmeasured (e.g., socio-economic status) population differences


Hospital specialization and coding practices


Population/regional market landscape

But…the challenges can be
addressed…if we


Develop techniques for standardization and normalization of
clinical data


Develop techniques for transforming and managing
unstructured clinical text into structured representations


Develop techniques for resolving missing and inconsistent data


Develop a scalable, robust and flexible framework for demonstrating all of the above in a “real-world setting”





EMR-derived Phenotyping


Overarching goal


To develop techniques and algorithms that operate on
normalized EMR data to identify cohorts of potentially
eligible subjects on the basis of disease, symptoms, or
related findings



Phenotyping (from our perspective)


Inclusion and exclusion criteria for cohort identification


Numerator and denominator criteria for clinical quality
metrics


Trigger criteria for clinical decision support









EMR-based Phenotype Algorithms


Typical components


Billing and diagnoses codes


Procedure codes


Labs


Medications


Phenotype-specific covariates (e.g., Demographics, Vitals, Smoking Status, CASI scores)


Pathology


Imaging?


Organized into inclusion and exclusion criteria


Experience from eMERGE (http://www.gwas.net)


Electronic Medical Records and Genomics Network






EMR-based Phenotype Algorithms


Iteratively refine case definitions through partial manual review to achieve PPV ≥ ~95%


For controls, exclude all potentially overlapping syndromes and possible matches; iteratively refine such that NPV ≥ ~98%
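
The two validation targets above can be checked against a manually reviewed sample. The following is a minimal Python sketch, assuming a hypothetical list of (algorithm label, chart-review label) pairs; the function names and the sample counts are illustrative and do not come from the eMERGE work.

```python
from typing import Iterable, Tuple


def ppv_npv(pairs: Iterable[Tuple[bool, bool]]) -> Tuple[float, float]:
    """Return (PPV, NPV) from (algorithm_says_case, review_says_case) pairs."""
    tp = fp = tn = fn = 0
    for predicted, actual in pairs:
        if predicted and actual:
            tp += 1
        elif predicted and not actual:
            fp += 1
        elif not predicted and not actual:
            tn += 1
        else:
            fn += 1
    if tp + fp == 0 or tn + fn == 0:
        raise ValueError("need at least one predicted case and one predicted control")
    # PPV: share of flagged cases confirmed by review.
    # NPV: share of flagged controls confirmed by review.
    return tp / (tp + fp), tn / (tn + fn)


def meets_targets(ppv: float, npv: float,
                  ppv_target: float = 0.95, npv_target: float = 0.98) -> bool:
    """Check the thresholds quoted on the slide (PPV >= 95%, NPV >= 98%)."""
    return ppv >= ppv_target and npv >= npv_target


# Hypothetical review sample: 100 flagged cases and 100 flagged controls.
sample = ([(True, True)] * 96 + [(True, False)] * 4
          + [(False, False)] * 99 + [(False, True)] * 1)
ppv, npv = ppv_npv(sample)
ok = meets_targets(ppv, npv)
```

If either value falls short, the case or control definition would be refined and the review repeated, which is the iterative loop the slide describes.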

Example: Type 2 Diabetes (cases)

Challenges


Algorithm design


Non-trivial; requires significant expert involvement


Highly iterative process


Time-consuming manual chart reviews


Representation of “phenotypic logic”


Data access and representation


Lack of unified vocabularies, data elements, and value
sets


Questionable reliability of ICD & CPT codes (e.g., omitting codes that don’t pay well, billing the wrong code because it is easier to find)


Natural Language Processing needs


And many more…


Outline


Background


Ongoing projects and updates


Proposed projects for Year 2


Productivity to date


Q & A


Current HTP Project Themes


Identification of Clinical Element Models




Phenotyping Execution Logic




Data Quality, Validation and Cost Effectiveness

Project Overview


Three eMERGE phenotyping algorithms as initial Use Cases


Type 2 Diabetes Mellitus (T2DM)


Peripheral Arterial Disease (PAD)


Hypothyroidism


Specified computable mappings between CEMs and algorithms


Classified phenotyping input specifications into two categories:


General EHR data requirements (Examples: demographics,
diagnoses)


Phenotype-specific EHR data (Example: Ankle-brachial index for PAD)


Proposed semantic types of the input specifications

Semantic Classification Types


Demographic data (e.g., Gender, Race, Age, etc)


Physical measurements (e.g., Weight, Height, BMI, etc)


Diagnosis (ICD codes, SNOMED CT annotations from problem list, administrative coding workflows, clinical notes, etc.)


Procedure (CPT codes, ICD procedure codes)


Medication


Laboratory

General Models for Scalability


Diagnosis


AdministrativeDiagnosisCode: billing purposes


ClinicalAssertedDiagnosisCode: problem list, clinical notes, etc.


Medication


Prescribed/Ordered


Dispensed


Administered


Procedure


AdministrativeProcedureCode: CPT code, ICD-9 code for inpatient


Laboratory
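
The general models listed above could be expressed as small typed records. This is a minimal Python sketch, not the CEM definitions themselves: the class and field names other than the two diagnosis model names and the three medication states are assumptions made for illustration.

```python
from dataclasses import dataclass
from datetime import date
from enum import Enum


class DiagnosisSource(Enum):
    ADMINISTRATIVE = "AdministrativeDiagnosisCode"       # billing purposes
    CLINICAL_ASSERTED = "ClinicalAssertedDiagnosisCode"  # problem list, notes


class MedicationStatus(Enum):
    PRESCRIBED = "Prescribed/Ordered"
    DISPENSED = "Dispensed"
    ADMINISTERED = "Administered"


@dataclass(frozen=True)
class Diagnosis:
    patient_id: str          # hypothetical field
    code: str                # e.g. an ICD-9-CM code
    source: DiagnosisSource
    recorded_on: date        # hypothetical field


@dataclass(frozen=True)
class Medication:
    patient_id: str          # hypothetical field
    code: str                # e.g. an RxNorm code
    status: MedicationStatus
    recorded_on: date        # hypothetical field


def diagnoses_from(records, source):
    """Keep only the diagnoses that came from the given source."""
    return [r for r in records if r.source is source]


records = [
    Diagnosis("p1", "250.00", DiagnosisSource.ADMINISTRATIVE, date(2009, 3, 1)),
    Diagnosis("p1", "250.00", DiagnosisSource.CLINICAL_ASSERTED, date(2009, 3, 1)),
]
billing_only = diagnoses_from(records, DiagnosisSource.ADMINISTRATIVE)
```

Keeping the source as an explicit field lets an algorithm ask for billing codes and clinically asserted codes separately, which is the distinction the Diagnosis model draws.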

Mapping Issues


Secondary use versus patient care meanings


History of X meaning “evidence of X prior to date Y” versus history of X statement in text documents


Diagnosis inputs often validated on ICD-9-CM codes


Non-standard aggregations


Fasting glucose test


Availability of data in EHR


Age at onset of X


Medical specialty (ankle brachial index)


Smoking history/family history (NLP/structured
solutions)






Mapping Considerations


Algorithm inputs are abstractions of EHR content


Native content


Generalized content


Computed


Selected content



Common constraints of EHR content


Source of data, i.e., EHR application used, encounter type


Allowable codes


Temporal bounds


Relationships among separate observations
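
The first three constraints above (source of data, allowable codes, temporal bounds) can be read as a filter applied to EHR observations before they reach an algorithm. A minimal Python sketch under that reading follows; the `Observation` and `Constraint` types and their fields are hypothetical, and relationships among separate observations are not modelled here.

```python
from dataclasses import dataclass
from datetime import date
from typing import FrozenSet, Iterable, List


@dataclass(frozen=True)
class Observation:
    source: str          # e.g. EHR application used or encounter type
    code: str
    observed_on: date


@dataclass(frozen=True)
class Constraint:
    sources: FrozenSet[str]
    allowable_codes: FrozenSet[str]
    not_before: date
    not_after: date

    def accepts(self, obs: Observation) -> bool:
        """True when the observation satisfies all three constraints."""
        return (obs.source in self.sources
                and obs.code in self.allowable_codes
                and self.not_before <= obs.observed_on <= self.not_after)


def select(observations: Iterable[Observation], c: Constraint) -> List[Observation]:
    return [o for o in observations if c.accepts(o)]


constraint = Constraint(
    sources=frozenset({"outpatient"}),
    allowable_codes=frozenset({"250.00"}),
    not_before=date(2009, 1, 1),
    not_after=date(2010, 12, 31),
)
observations = [
    Observation("outpatient", "250.00", date(2009, 6, 1)),  # kept
    Observation("inpatient", "250.00", date(2009, 6, 1)),   # wrong source
    Observation("outpatient", "250.00", date(2011, 1, 5)),  # outside bounds
]
kept = select(observations, constraint)
```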


Example CEM to Algorithm Map

Example CEM to Algorithm Map - 2

Current HTP Project Themes


Identification of Clinical Element Models




Phenotyping Execution Logic




Data Quality, Validation and Cost Effectiveness

Drools-based Phenotyping Architecture

[Diagram: Clinical Element Database -> Drools (along with other technologies) -> List of Patients for Specific Cases]

Rule accessibility by clinicians

BPMN, decision tables, DSL; collaborative authoring

Workflow authoring by domain experts (clinicians)

Domain Expert ~ Analyst ~ Developer

Drools-based Phenotyping Architecture

[Diagram: Clinical Element Database -> Data Access Layer -> Transformation Layer -> Inference Engine (Drools) -> Service for Creating Output (File, Database, etc.) -> List of Diabetic Patients]

Diagram labels: Business Logic; Transform physical representation -> Normalized logical representation (Fact Model); Drools Workflow
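
The layered flow in this architecture can be sketched in plain Python. This is not Drools and not the project's implementation: the function names, the in-memory rows standing in for the Clinical Element Database, and the two toy rules are all assumptions chosen to show how the layers hand data to one another.

```python
from typing import Callable, Dict, Iterable, List

Row = Dict[str, str]       # physical representation (one database row)
Fact = Dict[str, object]   # normalized logical representation (fact model)


def data_access_layer() -> List[Row]:
    """Stand-in for reading rows from the clinical element database."""
    return [
        {"pid": "p1", "dx": "250.00", "hba1c": "7.1"},
        {"pid": "p2", "dx": "401.9", "hba1c": "5.4"},
    ]


def transformation_layer(rows: Iterable[Row]) -> List[Fact]:
    """Convert physical rows into typed facts."""
    return [
        {"patient_id": r["pid"], "diagnosis": r["dx"], "hba1c": float(r["hba1c"])}
        for r in rows
    ]


def inference_engine(facts: Iterable[Fact],
                     rules: List[Callable[[Fact], bool]]) -> List[str]:
    """Return ids of patients for whom any rule fires (toy rule engine)."""
    return [str(f["patient_id"]) for f in facts if any(rule(f) for rule in rules)]


def output_service(patient_ids: Iterable[str]) -> str:
    """Stand-in for writing the patient list to a file or database."""
    return "\n".join(patient_ids)


# Illustrative rules only; the HbA1c threshold echoes the diabetes example.
rules = [
    lambda f: str(f["diagnosis"]).startswith("250"),
    lambda f: f["hba1c"] >= 6.5,
]

result = output_service(
    inference_engine(transformation_layer(data_access_layer()), rules)
)
```

Separating the transformation step from the rules means the rules only ever see the normalized fact model, so the same rule set could in principle run against a different physical store.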

Diabetes Project Status


Diabetes Rules are Completed



Demonstrated the Workflow/Rules for Feedback



Make Rules “Shareable”



Performance Validation



More details in the later session!




DM2 algorithm

Columns: Logic Statement | GELLO expression | QDM expression

Logic statement: Patient record flagged as “Y” with research authorization (nothing in data model to represent this)

GELLO expression:

    context Patient
    def: researchAuthorization: Boolean = Exist(Self.explicitConsent = ‘Y’)
    If researchAuthorization

QDM expression:

    If Patient.explicitConsent = ‘Y’

Logic statement: Patient age greater than 18 at the start of measurement period [1/1/09 - 12/31/10]

GELLO expression:

    context Patient
    def: age: Integer =
        let startOfMeasurement = PointInTime : 1/1/09
        in startOfMeasurement - Self.birthdate
    If age > 18

QDM expression:

    startOfMeasurement = 1/1/09
    If startOfMeasurement - Patient.birthdate > 18

Patient meets at least one of the following criteria:

Logic statement: Patient has at least 2 clinic (face-to-face outpatient) visits during measurement period with visits coded with a diabetes ICD-9-CM code, OR

GELLO expression:

    context Patient
    def: face2face: Integer =
        let startOfMeasurement = PointInTime : 1/1/09,
        let endOfMeasurement = PointInTime : 12/31/10,
        let dmCodes = {listOfICD9CodesForDM},
        let encountersWithDMcodes: Set(Encounter) -> select(
            Encounter.EncounterType = outpatient AND
            Encounter.StartDate >= startOfMeasurement AND
            Encounter.StartDate <= endOfMeasurement AND
            Encounter.ClinicalEncounterId in dmCodes)
        in count(encountersWithDMcodes)
    if face2face >= 2

QDM expression:

    startOfMeasurement = 1/1/09
    endOfMeasurement = 12/31/10
    dmCodes = {listOfICD9CodesForDM}
    Countdistinct(Encounter: encounter outpatient DURING startOfMeasurement and endOfMeasurement and Encounter.ClinicalEncounterId in dmCodes) >= 2

Logic statement: Patient is on DM medications, OR

GELLO expression:

    context Patient
    def: onDMmeds: Boolean =
        let dmMedications = {listOfRxNormCodesForDMmeds}
        in Exist(Medication.MedicationId in dmMedications)
    If onDMmeds

QDM expression:

    dmMedications = {listOfRxNormCodesForDMmeds}
    Count(Medication.MedicationId in dmMedications) > 0


Logic statement: Patient has at least 2 clinic (face-to-face outpatient) visits during measurement period with capillary glucose lab value in the measurement period OR with an abnormal lab glucose level > 200 mg/dL OR fasting blood glucose level > 125 mg/dL OR (glyco)hemoglobin A1c >= 6.5%, OR

GELLO expression:

    context Patient
    def: face2face: Boolean =
        let startOfMeasurement = PointInTime : 1/1/09,
        let endOfMeasurement = PointInTime : 12/31/10,
        let capillaryGlucoseCodes = Set(codes) : {listOfLoincCapillaryGlucoseCodes},
        let glucoseTestCodes = Set(codes) : {listOfLoincCodesForGlucose},
        let HbA1CCodes = Set(codes) : {listOfLoincCodesForHbA1c},
        let fastingGlucoseCodes = Set(codes) : {listOfLoincCodesForFastingGlucose},
        let capillaryGlucoseTests: Set(Lab) -> select(
            Lab.specimenCollectionDate >= startOfMeasurement AND
            Lab.specimenCollectionDate <= endOfMeasurement AND
            Lab.ResultCode in capillaryGlucoseCodes),
        let abnormalGlucoseTests: Set(Lab) -> select(
            Lab.specimenCollectionDate >= startOfMeasurement AND
            Lab.specimenCollectionDate <= endOfMeasurement AND
            Lab.ResultCode in glucoseTestCodes AND
            Lab.value > 200 mg/dL),
        let fastingGlucoseTests: Set(Lab) -> select(
            Lab.specimenCollectionDate >= startOfMeasurement AND
            Lab.specimenCollectionDate <= endOfMeasurement AND
            Lab.ResultCode in fastingGlucoseCodes AND
            Lab.value > 125 mg/dL),
        let HbA1CTests: Set(Lab) -> select(
            Lab.specimenCollectionDate >= startOfMeasurement AND
            Lab.specimenCollectionDate <= endOfMeasurement AND
            Lab.ResultCode in HbA1CCodes AND
            Lab.value >= 6.5%),
        let EncountersDuringMeasurement: Set(Encounter) -> select(
            Encounter.EncounterType = outpatient AND
            Encounter.StartDate >= startOfMeasurement AND
            Encounter.StartDate <= endOfMeasurement)
        in
            count(EncountersDuringMeasurement) >= 2 AND
            (
                exist(EncountersDuringMeasurement -> intersection(capillaryGlucoseTests)) OR
                exist(EncountersDuringMeasurement -> intersection(abnormalGlucoseTests)) OR
                exist(EncountersDuringMeasurement -> intersection(fastingGlucoseTests)) OR
                exist(EncountersDuringMeasurement -> intersection(HbA1CTests))
            )

QDM expression:

    startOfMeasurement = 1/1/09
    endOfMeasurement = 12/31/10
    capillaryGlucoseCodes = {listOfLoincCapillaryCodes}
    glucoseTestCodes = {listOfLoincCodesForGlucose}
    HbA1CCodes = {listOfLoincCodesForHbA1c}
    fastingGlucoseCodes = {listOfLoincCodesForFastingGlucose}

    If ( CountDistinct(Encounter: encounter outpatient DURING startOfMeasurement and endOfMeasurement) >= 2 AND
         ( Lab.ResultCode in capillaryGlucoseCodes starts concurrent with Encounter outpatient
           OR
           ( Lab.ResultCode in glucoseTestCodes starts concurrent with Encounter outpatient AND Lab.Value > 200 mg/dL )
           OR
           ( Lab.ResultCode in fastingGlucoseCodes starts concurrent with Encounter outpatient AND Lab.Value > 125 mg/dL )
           OR
           ( Lab.ResultCode in HbA1CCodes starts concurrent with Encounter outpatient AND Lab.Value >= 6.5% )
         )
       )




Logic statement: Patient has ‘diabetes’ in the EMR problem list

GELLO expression:

    context Patient
    def: hasDMinProblemList: Boolean =
        let DMproblem = {listOfICD9codesForDM}
        in Exist(Problem.ProblemId in DMproblem)
    If hasDMinProblemList

QDM expression:

    DMproblem = {listOfICD9codesForDM}
    Count(Problem.ProblemId in DMproblem) > 0

NQF QDM Criteria
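
The logic statements of the DM2 algorithm can also be read as one executable predicate. The Python sketch below is an illustration under stated assumptions, not the project's Drools, GELLO, or QDM implementation: the `Patient` and `Lab` types are hypothetical, the code sets are placeholders because the slides leave the real lists unspecified, and "lab concurrent with an outpatient encounter" is simplified to "lab collected during the measurement period".

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List, Set, Tuple

START, END = date(2009, 1, 1), date(2010, 12, 31)  # measurement period

# Placeholder code sets (assumptions, not the real value sets).
DM_ICD9: Set[str] = {"250.00"}
DM_RXNORM: Set[str] = {"rx-placeholder"}


@dataclass
class Lab:
    kind: str        # "capillary", "glucose", "fasting", or "hba1c"
    value: float
    collected: date


@dataclass
class Patient:
    consent: str
    birthdate: date
    outpatient_visits: List[Tuple[date, Set[str]]] = field(default_factory=list)
    medications: Set[str] = field(default_factory=set)
    labs: List[Lab] = field(default_factory=list)
    problems: Set[str] = field(default_factory=set)


def in_period(d: date) -> bool:
    return START <= d <= END


def age_at_start(p: Patient) -> int:
    """Age in whole years at the start of the measurement period."""
    had_birthday = (START.month, START.day) >= (p.birthdate.month, p.birthdate.day)
    return START.year - p.birthdate.year - (0 if had_birthday else 1)


def qualifying_lab(lab: Lab) -> bool:
    """Any capillary glucose value, or a value above the stated threshold."""
    if not in_period(lab.collected):
        return False
    return (lab.kind == "capillary"
            or (lab.kind == "glucose" and lab.value > 200)
            or (lab.kind == "fasting" and lab.value > 125)
            or (lab.kind == "hba1c" and lab.value >= 6.5))


def is_t2dm_case(p: Patient) -> bool:
    # Gate criteria: research authorization and age > 18.
    if p.consent != "Y" or age_at_start(p) <= 18:
        return False
    visits = [(d, codes) for d, codes in p.outpatient_visits if in_period(d)]
    coded_visits = sum(1 for _, codes in visits if codes & DM_ICD9)
    # At least one of the four criteria must hold.
    return (coded_visits >= 2
            or bool(p.medications & DM_RXNORM)
            or (len(visits) >= 2 and any(qualifying_lab(l) for l in p.labs))
            or bool(p.problems & DM_ICD9))


case = Patient("Y", date(1960, 5, 1), outpatient_visits=[
    (date(2009, 2, 1), {"250.00"}), (date(2010, 3, 1), {"250.00"})])
control = Patient("Y", date(1960, 5, 1), outpatient_visits=[
    (date(2009, 2, 1), {"401.9"})])
```

Here `is_t2dm_case(case)` holds because of the two coded outpatient visits, while `control` meets none of the four criteria.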

Current HTP Project Themes


Identification of Clinical Element Models




Phenotyping Execution Logic




Data Quality, Validation and Cost Effectiveness

Data Quality: Objectives



Assess data variability within and across institutions

Assess impact of this variability on secondary use of EMR

Generate specifications for widgets

“Warning Label” for suspect data categories

Data quality audits with logs

Batch data correction / removal


More details during the later session!


Centerphase Project

Research Design

Randomly generate ONE sample set of patient records from database, based on T2DM ICD9 codes from at least 2 visits during measurement period

The sample patient records pass through Screens 1-3 in two parallel processes, each producing a patient result set:

Manual process: Study coordinator (SC) conducts manual review of patient charts, and monitors activity time

Algorithm-driven process: Programmer develops and runs algorithm to query records, and monitors development and run time

Compare time, cost and accuracy of results

Outline


Background


Ongoing projects and updates


Proposed projects for Year 2


Productivity to date


Q & A


Project 1: National Library for Clinical Phenotyping Algorithms

Current state of the art

MS Word files: do not scale

An FTP server: will not work either

We need…programmatic access, querying, navigation

Promote re-use (where applicable)

Research Question: To develop an implementation-independent phenotyping logic representation template for algorithm design

Existing work on Drools, GELLO and NQF

Leverage CEMs for algorithm design and representation

Publicly accessible Web-based environment for phenotyping algorithms


Validate algorithm deployment in multiple EMR settings



Project 2: Machine Learning and Phenotyping

EMR-derived phenotyping algorithm development is tedious and time-consuming

Based on our own experience!

Research Question: To leverage machine learning methods for rule/algorithm development, and validate against expert-developed ones

Use eMERGE library of phenotype algorithms for validation

Asthma and Diabetes as initial use cases


Preliminary work by Susan


Work with data normalization and NLP teams



Project 3: Just-in-Time Phenotyping


The current pipeline prototype is based on a relational
persistence layer


Access to historical, retrospective data


Offline processing of data and phenotyping algorithms


Research Question: To apply phenotyping algorithms as “data sniffers” that can be plugged into a UIMA pipeline


Online, real-time phenotyping (e.g., for clinical decision support)


How much data is “necessary”? How much data is
“necessary and sufficient”?


More active role of NLP techniques




Project 4: Phenotyping Workbench


EMR-based phenotyping algorithms are hard to design, and even harder to implement

Access to domain experts: often a resource issue

Access to IT/informatics experts: also a resource issue

Lots of moving components

Research Question: To develop a phenotyping “plug & play” workbench for algorithm design and evaluation

Visual and graphical algorithm editing (jPBMN)

Configurable algorithms (Drools code snippets)

User workspace management (who are these “users”?)

File-based or database access layer (CEM-based)


Leverage i2b2 workbench where applicable


“Plug & Play” is still a big challenge…








Outline


Background


Ongoing projects and updates


Proposed projects for Year 2


Productivity to date


Q & A

Productivity to date


Manuscripts/Abstracts/Posters


Conway MA, Berg RL, Carrell D, Denny JC, Kho AN, Kullo IJ, Linneman JG, Pacheco JA, Peissig PL, Rasmussen L, Weston N, Chute CG, Pathak J. Analyzing Heterogeneity and Complexity of Electronic Health Record Oriented Phenotyping Algorithms. AMIA 2011 (paper).

Tao C, Parker CG, Oniki TA, Pathak J, Huff SM, Chute CG. An OWL Meta-Ontology for Representing the Clinical Element Model. AMIA 2011 (paper).

Chute CG, Pathak J, Savova GK, Bailey KR, Schor MI, Hart LA, Beebe CE, Huff SM. The SHARPn Project on Secondary Use of Electronic Medical Record Data: Progress, Plans and Possibilities. AMIA 2011 (paper).

Conway MA, Pathak J. Analyzing the Prevalence of Hedges in Electronic Health Record Oriented Phenotyping Algorithms. AMIA 2011 (poster).

Tao C, Welch SR, Wei WQ, Oniki TA, Parker CA, Pathak J, Huff SM, Chute CG. Normalized Representation of Data Elements for Phenotype Cohort Identification in Electronic Health Record. AMIA 2011 (poster).


Prototype software


Drools-based implementation of the diabetes algorithm




Thank You!