NEW ENHANCEMENTS IN THE SYNTHETIC DERIVATIVE AND WHAT THAT MEANS FOR THE RESEARCHER


Oct 30, 2013


Jacqueline Kirby

June 7th, 2013

Resources


StarPanel

  Identified clinical data; designed for clinical use

Record Counter

  De-identified clinical data; sophisticated phenotype searching

  Returns a number: record counts and aggregate demographics

Synthetic Derivative

  De-identified clinical data; sophisticated phenotype searching

  Returns record counts AND de-identified narratives, test values, medications, etc., for review and creation of study data sets

Research Derivative

  Identified clinical data

  Programmer (human) supported

BioVU

  Genotype data

  De-identified clinical data; sophisticated phenotype searching

  Able to link phenotype information to biological sample




What is the RecordCounter?

The Synthetic Derivative Record Counter (RecordCounter) provides exploratory data figures and counts to members of the VU research community for research planning purposes and feasibility assessment.

Available to ANYONE with a VUnet ID

Allows the user to input basic medical data, such as ICD-9 codes or text keywords (e.g., lung cancer), as well as demographic information, and then search the Synthetic Derivative database to determine the approximate number of records that meet those criteria.


What is the Synthetic Derivative (SD)?

Rich, multi-source database of de-identified clinical and demographic data

User interface tool that can be used for access and analysis

Services are available to help deliver results for non-standard queries (temporal queries, controls matching, etc.)

Contains ~2.3 million records

  ~1 million with detailed longitudinal data

  averaging 100k bytes in size

  an average of 27 codes per record

Records are updated over time and are current through December 2012

  Soon to be 5/31/2013

The RecordCounter vs. The SD

The RecordCounter: Users can use search criteria to return exploratory counts. (The results returned are not exact and are meant for a high-level assessment of the available data.)

The SD: Users can use search criteria to return exact counts and the associated longitudinal data for review.

What is BioVU?

The move towards personalized medicine requires very large sample sets for discovery and validation

BioVU: biobank intended to support a broad view of biology and enable personalized medicine

Contains de-identified DNA extracted from leftover blood after clinically-indicated testing of Vanderbilt patients who have not opted out

Linked to Synthetic Derivative: de-identified EMR

Current sample number: 166,397

o 147,292 adult samples

o 19,220 pediatric samples

Synthetic Derivative vs. BioVU

Synthetic Derivative Data Types

Documents, such as:

  Clinical Notes

  Discharge Summaries

  History and Physicals

  Problem Lists

  Surgical Reports

  Progress Notes

  Letters

Diagnostic Codes, Procedural Codes

Forms (intake, assessment)

Reports (pathology, ECGs, echocardiograms)

Clinical Communications

Lab Values and Vital Signs

Medication Orders

TraceMaster (ECGs)

Tumor Registry

Technology + policy

De-identification

Derivation of 128-character identifier (RUI) from the MRN generated by Secure Hash Algorithm (SHA-512)

HIPAA identifiers removed using combination of custom techniques and established de-identification software
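
As an illustration of the identifier derivation above, here is a minimal sketch using Python's standard hashlib module. A SHA-512 digest is 512 bits, which is 128 characters when written in hexadecimal, matching the 128-character RUI. The function name derive_rui and the sample MRN are made up for this example, and the slides do not say whether the SD combines the MRN with a secret key or salt before hashing, so this sketch hashes the MRN directly.

```python
import hashlib


def derive_rui(mrn: str) -> str:
    """Return a 128-character hexadecimal identifier derived from an MRN.

    Assumption: the MRN is hashed directly. A production system may well
    mix in a secret value; the slides do not specify this.
    """
    # SHA-512 yields 64 bytes, i.e. 128 hexadecimal characters.
    return hashlib.sha512(mrn.encode("utf-8")).hexdigest()


rui = derive_rui("000123456")  # made-up MRN for illustration
print(len(rui))  # 128
```

The same MRN always maps to the same identifier, so records for one patient stay linked without the MRN itself being stored.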

Date Shift


Our algorithm shifts the dates within a record by a time period (up to 364 days backwards) that is consistent within each record, but differs across records
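
A rough sketch of a per-record date shift follows. The slides do not describe how the offset is chosen, so deriving it from a hash of the record identifier is purely an assumption made here to get an offset that is stable for one record; the names shift_days_for_record and shift_record_dates are hypothetical.

```python
import hashlib
from datetime import date, timedelta

MAX_SHIFT_DAYS = 364  # upper bound stated on the slide


def shift_days_for_record(record_id: str) -> int:
    """Map a record identifier to a fixed offset between 1 and 364 days.

    Assumption: the offset is derived from a hash of the identifier.
    The real SD algorithm may choose offsets differently.
    """
    digest = hashlib.sha256(record_id.encode("utf-8")).hexdigest()
    return int(digest, 16) % MAX_SHIFT_DAYS + 1


def shift_record_dates(record_id: str, dates: list[date]) -> list[date]:
    """Shift every date in one record backwards by that record's offset."""
    offset = timedelta(days=shift_days_for_record(record_id))
    return [d - offset for d in dates]


visits = [date(2012, 1, 10), date(2012, 3, 1)]
shifted = shift_record_dates("example-record", visits)
# Intervals between events inside the record are preserved.
print(shifted[1] - shifted[0] == visits[1] - visits[0])  # True
```

Because every date in a record moves by the same amount, time between events (for example, days from diagnosis to first medication) is unchanged, while calendar dates no longer match the real chart. With only 364 possible offsets, two records can share an offset by chance.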


Restricted access & continuous oversight

Access restricted to VU; not a public resource

IRB approval for study (non-human)

Data Use Agreement

Audit logs of all searches and data exports

The New SD…

Synthetic Derivative 3.0 was launched on February 25, 2013. SD 3.0 leverages the power of an IBM Netezza data warehouse appliance to provide faster, near-immediate counts as the user builds their search criteria, along with new review features that include enhanced data visualization and covariate annotation capabilities.

SEARCH: Counts are provided for each search item in real time as you build your algorithm, letting you adjust your criteria immediately. Modifiers for ICD-9 codes allow searches to require 2 or more codes.

REVIEW: Filter and highlight documents, medications and labs to make review efficient.

ANNOTATE: Create your own set-based annotations that are sharable across the study team.

General algorithm for determining a phenotype

Definition of phenotype for cases and controls is critical

May require consultation with experts

Basic understanding of data elements; uses and limitations of particular data points is important

Reviewing records manually to make case determination (or even to calculate PPV of search methodology) will be somewhat time consuming
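
For reference, the PPV (positive predictive value) of a search is the fraction of records returned by the search that manual review confirms as true cases: PPV = true positives / (true positives + false positives). A small sketch, where the function name and the review counts are invented for illustration:

```python
def positive_predictive_value(review_labels):
    """Compute PPV from manual review results.

    review_labels: one boolean per reviewed record returned by the search,
    True if the reviewer confirmed the record as a real case.
    """
    labels = list(review_labels)
    if not labels:
        raise ValueError("At least one reviewed record is required")
    true_positives = sum(1 for confirmed in labels if confirmed)
    return true_positives / len(labels)


# Made-up example: 50 records reviewed, 45 confirmed as cases.
reviewed = [True] * 45 + [False] * 5
print(positive_predictive_value(reviewed))  # 0.9
```

Reviewing a random sample of the search results, rather than every record, is one way to keep the manual effort manageable while still estimating PPV.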


The problem with ICD9 codes

ICD9 codes give both false negatives and false positives

False negatives:

  Outpatient billing limited to 4 diagnoses/visit

  Outpatient billing done by physicians (e.g., takes too long to find the unknown ICD9)

  Inpatient billing done by professional coders:

    omit codes that don’t pay well

    can only code problems actually explicitly mentioned in documentation

False positives:

  Diagnoses evolve over time: physicians may initially bill for suspected diagnoses that later are determined to be incorrect

  Billing the wrong code (perhaps it is easier to find for a busy clinician)

  Physicians may bill for a different condition if it pays for a given treatment

    Example: Anti-TNF biologics (e.g., infliximab) originally not covered for psoriatic arthritis, so rheumatologists would code the patient as having rheumatoid arthritis


Lessons from preliminary phenotype development

Eliminating negated and uncertain terms:

  “I don’t think this is MS”, “uncertain if multiple sclerosis”

Delineating section tag of the note

  “FAMILY MEDICAL HISTORY: Mother had multiple sclerosis.”

Adding requirements for further signs of “severity of disease”

  For MS: an MRI with T2 enhancement, myelin basic protein or oligoclonal bands on lumbar puncture, etc.

  This could potentially miss patients with outside work-ups, however
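
The first two lessons can be sketched as a simple keyword filter: skip mentions preceded by a negation or uncertainty cue in the same sentence, and skip sections such as family history. This is a deliberately simplified illustration, not the method used in the SD; the cue list, the excluded section name set, and the function name affirmed_mentions are all assumptions, and real clinical text needs a much larger lexicon.

```python
import re

# Hypothetical, deliberately short cue list (negation and uncertainty).
CUE_PATTERN = re.compile(
    r"\b(?:no|not|don['’]t|doesn['’]t|denies|without|ruled out|rule out|"
    r"uncertain|possible|suspected)\b",
    re.IGNORECASE,
)
# Section headers are assumed to look like "ALL CAPS WORDS: text".
SECTION_PATTERN = re.compile(r"^([A-Z][A-Z ]+):\s*(.*)$")
EXCLUDED_SECTIONS = {"FAMILY MEDICAL HISTORY"}


def affirmed_mentions(note_text, terms):
    """Return sentences that mention a term without a preceding cue,
    ignoring sentences that fall under an excluded section header."""
    # Matching is case-insensitive, so "MS" would also match "Ms".
    term_pattern = re.compile(
        r"\b(?:" + "|".join(re.escape(t) for t in terms) + r")\b",
        re.IGNORECASE,
    )
    current_section = None
    hits = []
    for line in note_text.splitlines():
        header = SECTION_PATTERN.match(line.strip())
        if header:
            current_section = header.group(1).strip()
            line = header.group(2)
        if current_section in EXCLUDED_SECTIONS:
            continue
        for sentence in re.split(r"(?<=[.!?])\s+", line):
            match = term_pattern.search(sentence)
            if not match:
                continue
            # Drop the mention if a cue appears earlier in the sentence.
            if CUE_PATTERN.search(sentence[: match.start()]):
                continue
            hits.append(sentence.strip())
    return hits


note = (
    "HISTORY OF PRESENT ILLNESS: I don't think this is MS.\n"
    "Uncertain if multiple sclerosis.\n"
    "FAMILY MEDICAL HISTORY: Mother had multiple sclerosis.\n"
    "ASSESSMENT: MRI findings consistent with multiple sclerosis."
)
print(affirmed_mentions(note, ["multiple sclerosis", "MS"]))
```

On this made-up note, only the sentence in the ASSESSMENT section survives: the first two mentions are negated or uncertain, and the third sits under the family history header.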


Once you have logged in…

The New SD gives a cleaner Home page interface with aggregate SD graphs.

New features for the Investigator:

A welcome and announcement section to give the Investigator any immediate information/help when accessing the SD

Overall SD/BioVU population demographics to give up-to-date population details of the resource

Improved Search Features

Once you have selected “Start a New Search”, you will go to the Search Interface. Users can select search criteria to see record counts by dragging and dropping Search Criteria (e.g., ICD codes, Labs, Document Keywords, Medications) into the Search box.

New Search Features include:

Counts for each specific criteria element, shown on the right-hand side of the search box (circled in red); summary counts for combined criteria (this OR that), shown at the bottom of the group box (circled in blue); and a final Total count at the right corner of your search (circled in green)

Limit Search To BioVU Records, Non-compromised BioVU Samples, or only BioVU Samples available for external assay

Limit your search based on the number of ICD code occurrences in the subject record to require multiple instances of an ICD code
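
To make the counting ideas above concrete, here is a small sketch of a "require N or more occurrences" modifier and of a combined "this OR that" group count. It is not the SD's implementation; the data layout (a list of subject/code pairs), the function names, and the subject IDs are invented for illustration. Code 340 is the ICD-9 code for multiple sclerosis.

```python
from collections import Counter


def subjects_meeting_code_count(code_events, target_code, min_count=2):
    """Return the subjects with at least min_count occurrences of target_code.

    code_events: iterable of (subject_id, icd_code) pairs, one per coded event.
    """
    counts = Counter(
        subject for subject, code in code_events if code == target_code
    )
    return {subject for subject, n in counts.items() if n >= min_count}


def group_count(*criteria):
    """Count subjects matching ANY criterion in a group (this OR that)."""
    return len(set().union(*criteria))


# Made-up events: S1 has code 340 twice, S2 once.
events = [("S1", "340"), ("S1", "340"), ("S2", "340"), ("S1", "401.9")]
ms_twice = subjects_meeting_code_count(events, "340", min_count=2)
keyword_hits = {"S2", "S3"}  # hypothetical document-keyword matches

print(len(ms_twice))                        # 1
print(group_count(ms_twice, keyword_hits))  # 3
```

Requiring two or more instances of a code is one way to reduce the false positives from one-off billing of a suspected diagnosis, at the cost of missing patients seen only once.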

Improved Set Review

After you have built your set, you can begin reviewing your records. The New SD has both a Summary view to see a high-level graphic view of a subject AND a Detail view that allows you to customize your view with a new Tabular view.

What’s new in Review:

Subject IDs listed on the left-hand side to move easily through the records.

Tabular view of the different data elements with custom sorting of tabs

Radio buttons for determining Subject status

New Data Visualization Features


In the Summary tab and in the Vitals view, the new SD has new data visualization features that allow a reviewer to get a quick view of a subject’s longitudinal data.

Improved Document View

Documents are divided into three tabs:

High Value Documents

Other Documents

Problem Lists

On each Document tab, you can

1. Filter based on Keywords, Document Type, Subtypes

2. Filter keyword searches and display only the context

3. Highlight based on Keywords and display either the full documents or the word(s) in context

New Medications and Labs Display

The Medication and Lab views now have two displays for easier review. The Summary view displays aggregate mentions of meds/labs with beginning and end dates. The Details view shows each instance of the meds/labs in full detail, with the ability to filter by data element.

Improved Annotations

Annotations allow for easier identification and saving of covariate information during set review. Create your own set-based annotations that are sharable across the study team. These can be exported to Excel when performing your data analysis.

What’s Next?


Data Export into REDCap

Adding PheWAS to the search criteria

Predict Labs in the Lab view

Custom and Timeline View

….

The SD has evolved greatly in the past six months and this is largely due to suggestions and needs from its users. Please let us know what YOU would like in the SD so that the SD can continue to evolve.


SD Access Protocol

Researcher requests IRB exemption

Signs DUA

Researcher accesses SD

SD staff verify/access granted

Enters StarBRITE to complete electronic application (IRB status is in StarBRITE)

Leveraging VICTR Resources


Record Counter (RC)

  part of SD but open to anyone with a VUnet ID: https://biovu.vanderbilt.edu/RC/RC.html

SD (/BioVU)

  Erica Bowton (via StarBRITE)

RD

  email or call me, or fill out a Request form at https://starbrite.vanderbilt.edu/ (https://starbrite.vanderbilt.edu/managedata/datarequest.html)


SD User Group Sessions will be held the fourth Wednesday of each month at 1 pm. All are welcome.

Time: 1:00-2:00 PM

Location: Light Hall, Room 439


If you have any questions or feedback about the new SD, please contact us by email: Jacqueline.Kirby@Vanderbilt.edu

Questions or Comments?

THANK YOU!