Scalable Architecture for Federated Translational ... - HISB 2012

Preparing Electronic Health Records for Multi-Site CER Studies

Michael G. Kahn (1,3,4), Lisa Schilling (2)

1 Department of Pediatrics, University of Colorado, School of Medicine
2 Department of Medicine, University of Colorado, School of Medicine
3 Colorado Clinical and Translational Sciences Institute
4 Department of Clinical Informatics, Children's Hospital Colorado


IEEE Annual Research Meeting

27 September 2012

Michael.Kahn@ucdenver.edu

Lisa.Schilling@ucdenver.edu



Funding provided by AHRQ 1R01HS019908 (Scalable Architecture for Federated Translational Inquiries Network)

Setting the context: AHRQ Distributed Research Networks

AHRQ ARRA OS: Recovery Act 2009: Scalable Distributed Research Networks for Comparative Effectiveness Research (R01)

Goal: enhance the capability and capacity of electronic health networks designed for distributed research to conduct prospective, comparative effectiveness research on outcomes of clinical interventions.



AHRQ Distributed Research Networks Funded Projects

SAFTINet: Scalable Architecture for Federated Translational Inquiries Network
  Lisa M. Schilling, University of Colorado Denver (R01 HS19908-01)

SCANNER: Scalable National Network for Effectiveness Research
  Lucila Ohno-Machado, University of California San Diego (R01 HS19913-01)

SPAN: Scalable PArtnering Network for CER: Across Lifespan, Conditions, and Settings
  Matt Daley, Kaiser Foundation Research Institute (R01 HS19912-01)


SAFTINet Partners

Clinical partners
  Colorado Community Managed Care Network and the Colorado Associated Community Health Information Enterprise
  Colorado Federally Qualified Health Centers
  Denver Health and Hospital Authority
  Cherokee Health Systems, Tennessee

Technology partners
  Ohio State University, Department of Bioinformatics
  QED Clinical, Inc., d/b/a CINA
  Recombinant Data Corporation
  Observational Medical Outcomes Partnership (OMOP)

Medicaid partners
  Colorado Health Care Policy & Financing
  TennCare and Tennessee managed care organizations (partnership in development)

Leadership
  University of Colorado Anschutz Medical Campus
  American Academy of Family Physicians, National Research Network

Key Differences between EHR and CER data

EHR Data                                         | CER Data                                  | EHR->CER task
-------------------------------------------------|-------------------------------------------|-----------------------------------------------
Fully identified                                 | LDS or de-identified                      | Strip identifiers; keep mappings?
Local codes and values                           | Standardized codes and values             | Terminology and value set mapping (manual!)
Broad data domains                               | Focused data domains                      | Filtering by patient, encounter, date, facility
Variable data quality; high level of missingness | Substantial data quality processes applied | Data profiling; iterative investigations
Lots of free text                                | Fully coded data only                     | NLP or ignore free text
Local access only                                | Shared access                             | Distributed or centralized data access
Single data source                               | Multiple data sources                     | Record linkage
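
The "data profiling" task in the table can be illustrated with a minimal sketch that reports, for each field of an extract, the fraction of rows where the value is missing. The field names and sample rows are hypothetical and are not taken from any SAFTINet extract; treating `None` and blank strings as missing is also an assumption.

```python
from collections import Counter

# Hypothetical extract rows; field names are illustrative only.
ROWS = [
    {"patient_id": "p1", "sex": "F", "hba1c": "7.1", "smoking_status": ""},
    {"patient_id": "p2", "sex": "", "hba1c": None, "smoking_status": "never"},
    {"patient_id": "p3", "sex": "M", "hba1c": "6.4", "smoking_status": None},
    {"patient_id": "p4", "sex": "F", "hba1c": "", "smoking_status": ""},
]


def is_missing(value):
    """Treat None and empty or whitespace-only strings as missing (an assumption)."""
    return value is None or str(value).strip() == ""


def missingness(rows):
    """Return {field: fraction of rows in which the field is missing}."""
    if not rows:
        return {}
    fields = sorted({key for row in rows for key in row})
    missing = Counter()
    for row in rows:
        for field in fields:
            if is_missing(row.get(field)):
                missing[field] += 1
    return {field: missing[field] / len(rows) for field in fields}


profile = missingness(ROWS)
for field, fraction in profile.items():
    print(f"{field}: {fraction:.0%} missing")
```

A per-field summary like this is only a starting point for the iterative investigations named in the table; a high missingness rate still has to be traced back to its cause at the source system.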


A common data model is critical!

[Figure: data sources (CINA CDR, Local Data Warehouse, Existing Clinical Registries, Other EHRs), each with a Limited Data Set, Common Data Model, and Common Terminology, plus a Common Query Interface]

Crossing the CER chasm!!

[Figure labels: CER; ROSITA, GRID, PORTAL; Grid Portal]

Why ROSITA?

ROSITA: Reusable OMOP and SAFTINet Interface Adaptor

ROSITA: The only bilingual Muppet

Converts EHR data into research limited data set
1. Replaces local codes with standardized codes
2. Replaces direct identifiers with random identifiers
3. Supports clear-text and encrypted record linkage
4. Provides data quality metrics
5. Pushes data sets to grid node for distributed queries
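
Steps 2 and 3 above can be sketched in a few lines. This is an illustration under stated assumptions, not ROSITA's implementation: the slides do not say how identifiers are generated or how encrypted linkage is done. Here a random identifier is drawn once per direct identifier and the mapping is kept locally, and a keyed hash (HMAC-SHA256) over normalized demographics stands in for "encrypted record linkage". The class, function names, key, and sample values are all hypothetical.

```python
import hashlib
import hmac
import secrets
import unicodedata


class Pseudonymizer:
    """Assigns a stable random identifier to each direct identifier."""

    def __init__(self):
        # direct identifier -> random identifier; the mapping stays at the site
        self._forward = {}

    def pseudonym(self, direct_id):
        if direct_id not in self._forward:
            self._forward[direct_id] = secrets.token_hex(16)
        return self._forward[direct_id]


def normalize(text):
    """Fold case, accents, and extra whitespace so trivial variants match."""
    decomposed = unicodedata.normalize("NFKD", text)
    ascii_only = decomposed.encode("ascii", "ignore").decode("ascii")
    return " ".join(ascii_only.lower().split())


def linkage_token(key, last_name, first_name, birth_date):
    """Keyed hash of normalized demographics (assumed linkage scheme)."""
    message = "|".join([normalize(last_name), normalize(first_name), birth_date])
    return hmac.new(key, message.encode("utf-8"), hashlib.sha256).hexdigest()


pseudo = Pseudonymizer()
a = pseudo.pseudonym("MRN-0001")  # hypothetical medical record numbers
b = pseudo.pseudonym("MRN-0002")

shared_key = b"demo-key-not-for-production"  # hypothetical shared secret
t1 = linkage_token(shared_key, "García", "Ana ", "1980-02-29")
t2 = linkage_token(shared_key, "GARCIA", "ana", "1980-02-29")
print(t1 == t2)
```

Two sites holding the same key produce the same token for the same person without exchanging names or birth dates, which is what makes linkage possible on the hashed values. Exact-match tokens like these do not tolerate misspellings; that limitation belongs to this sketch and says nothing about ROSITA itself.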

ROSITA: transforming EHR data for comparative effectiveness research

[Figure labels: Client CDW, Medicaid, ETL, XML, ROSITA, OMOP CDM V4, Grid Data Service, SAFTINet Data Quality Data Service]


ETL Specifications

Scalable Architecture for Federated Therapeutic Inquiries Network (SAFTINet)
ETL Specifications Document
Version 4.0
August 6th, 2012

SAFTINet ETL Specifications

ROSITA Key Functional Components

XML Validator - against XSD XML schema
XML Importer (StAX, JAXB, JDBC Template) - writes to PostgreSQL db
Source Data Profiler & Validation - summary stats, random chart selection (JasperReports)
CSV Exporter - unmapped source values to UC
CSV Importer - return of newly mapped source values
ETL Processor - maps local to concept ID, populates OMOP PostgreSQL db
OMOP Data Profiler
OMOP Data Validation - anomalies, summary statistics
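
The round trip among the CSV Exporter, CSV Importer, and ETL Processor can be sketched as follows. This is a simplified illustration, not ROSITA's code (ROSITA is described above as a Java application writing to PostgreSQL). The column names, the in-memory `concept_map`, and the concept IDs are made up for the example.

```python
import csv
import io


def split_mapped(records, concept_map):
    """Attach concept IDs where a mapping exists; collect keys that lack one."""
    mapped, unmapped = [], set()
    for rec in records:
        key = (rec["source_vocabulary"], rec["source_code"])
        concept_id = concept_map.get(key)
        if concept_id is None:
            unmapped.add(key)
        else:
            mapped.append({**rec, "concept_id": concept_id})
    return mapped, unmapped


def export_unmapped(unmapped):
    """Write unmapped source values as CSV with an empty concept_id column."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["source_vocabulary", "source_code", "concept_id"])
    for vocabulary, code in sorted(unmapped):
        writer.writerow([vocabulary, code, ""])
    return buf.getvalue()


def import_mappings(csv_text, concept_map):
    """Load rows whose concept_id was filled in; return how many were added."""
    added = 0
    for row in csv.DictReader(io.StringIO(csv_text)):
        if row["concept_id"].strip():
            key = (row["source_vocabulary"], row["source_code"])
            concept_map[key] = int(row["concept_id"])
            added += 1
    return added


concept_map = {("LOCAL_LAB", "GLU"): 1001}  # made-up concept IDs
records = [
    {"person": "a1", "source_vocabulary": "LOCAL_LAB", "source_code": "GLU"},
    {"person": "a2", "source_vocabulary": "LOCAL_LAB", "source_code": "A1C"},
]

mapped, unmapped = split_mapped(records, concept_map)
exported = export_unmapped(unmapped)

# Simulate the mapping team filling in the missing concept ID by hand.
returned = exported.replace("LOCAL_LAB,A1C,", "LOCAL_LAB,A1C,1002")
added = import_mappings(returned, concept_map)

mapped_after, unmapped_after = split_mapped(records, concept_map)
print(len(mapped), len(mapped_after), added)
```

Keeping unmapped values out of the load, instead of guessing a concept, leaves the manual mapping step explicit: the second pass picks up a record only after a reviewed mapping has been returned.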



Transforming EHR Data: What does ROSITA do?

ROSITA administrative control

Do not have Medicaid figured out

ROSITA Security Discussion Framework

ROSITA: Current Status

OMOP CDM V4 finalized!
Clinical & financial extensions
Phase 1 (16 wk) Software development complete
Clinical data only; no Medicaid
Administrative portal
Deployed at 1 site, connected to grid node this week
Phase 2:
Medicaid + record linkage
Quality database, data quality profiling reports in JasperReports
Alternative input mechanisms
All SAFTINet partners have begun ETL activities
Two sites have provided full ETL extracts for development and testing
Everything will be available
https://github.com/saftinetrosita/SAFTINetROSITA
http://saftinet.net/


Questions?

Lisa.schilling@ucdenver.edu
