Oct 30, 2013


Harnessing Health.Data.gov Data
to Address Diabetes in the US

Dr. Brand Niemann

Director and Senior Data Scientist

Semantic Community

http://semanticommunity.info/

AOL Government Blogger

http://gov.aol.com/bloggers/brand-niemann/

April 17, 2013

http://semanticommunity.info/Health_Datapalooza_IV#Health.Data.gov

1

Background


HealthData.gov and Health Datapalooza III Knowledge Base and Data Ecosystem:

Two Published Stories, Two Spreadsheets, and Two Spotfire Dashboards. My Note: HealthData.gov 194 Data Sets in 2012 and 399 now in 2013.


Health Datapalooza IV Technology Development Track:

Knowledge Graph, Metadata, RPI Watson, Bootcamp, and Linked Data. See Next Slide.


My Process:


Harness Data for Diabetes Knowledge Base


Data Ecosystem Spreadsheet


Data Ecosystem Spotfire


My Results:


Story


Slides


Spotfire Dashboard


Research Notes


2

HealthData.gov and Health Datapalooza III Knowledge Base

3

http://semanticommunity.info/HealthData.gov

HealthData.gov and Health Datapalooza III Spotfire Data Ecosystem

4

https://silverspotfire.tibco.com/us/library#/users/bniemann/Public?HealthData.gov-Spotfire

Health Datapalooza IV Technology Development Track


Open Health Knowledge Graphs:

This session will describe healthdata.gov platform components, including new functionality that programmatically exposes tabular and graph-oriented data.



Lifting Schemes:

We will describe the ‘bottom up’ automation tools and techniques employed in the winning submission for the healthdata.gov Metadata Domain Challenge.


Open Government Data:

We will present emerging solution standards and transitioning academic technologies, including innovative work conducted by the Watson research group at Rensselaer Polytechnic Institute on using Watson as a ‘data advisor’.


Health Industry Bootcamp - A Real-World Crash Course:

An interactive, games-based bootcamp designed to get participants up and running the same day with their own real-world portfolio covering how to use public data to create market value, how to navigate perverse incentives in the industry, and how to deliver public and social good.


Cooperation Without Coordination: Managing Distributed Clinical Trial Data:

TBA See http://health.data.gov/cqld/ and http://reference.data.gov/cqld/about.html


Linked Data


Structured Data on the Web:

TBA See http://sw.appliedinformaticsinc.com/fct/facet_doc.html

5

http://healthdatapalooza.org/agenda/tech-development-track/

Vocab.Data.gov:

Government Data Vocabulary

6

http://vocab.data.gov/gd

Health Data Platform Metadata Challenge

7

http://www.health2con.com/devchallenge/health-data-platform-metadata-challenge/

http://www.healthdata.gov/blog/domain-challenge-1-metadata
Mirrored http://hub.healthdata.gov to improve the CKAN-metadata and RDF.

Created three levels of metadata for http://healthdata.gov datasets.

Created a set of ontologies to link several datasets from HealthData.gov.

IBM Watson at RPI


What is Watson?:


The underlying “DeepQA” architecture is designed to find the meaning behind a question posed in natural language and deliver a single, precise answer.


IBM’s Watson goes to school: A Q&A with RPI’s Jim Hendler:


A version of the system similar to the one used on “Jeopardy!” will be housed at RPI for three years as part of a Shared University Research Award from IBM Research. The system at RPI will have 15 terabytes of hard disk storage and give 20 users access to the system simultaneously, making it, according to a release, “an innovation hub” for the campus.


One thing we want to explore is how Watson can interact with social media, especially things such as “tweets” where the language is not as carefully constructed as it is in the documents Watson has used in the Jeopardy game.


I run a group that does a lot of work with Open Government Data systems (like the US data.gov) and we’re excited about the possibility of using Watson to help researchers around the world find relevant government data and documents for their work.


Our goal for the next few years is to gain an understanding of what having the new ways of bringing unstructured data and documents into our computational lives will be.

8

http://watson.rpi.edu/


My Note: See Our Semantic Medline Work with New Cray Graph Computer.

Health.Data.gov

9

http://www.healthdata.gov/

My Note: Promotes the Diabetes Challenge, But Does Not Provide Much Data For It!

Health.Data.gov: Search for Diabetes

10

http://www.healthdata.gov/dataset/search/diabetes

http://statesnapshots.ahrq.gov/snaps09/allStatesallMeasures.jsp?menuId=63&state=

My Note: Found One Data Set and Downloaded Two Excel Files and Added Them to the Diabetes Ecosystem Spreadsheet. See Slide 18.

HealthData.gov Catalog Hub

11

http://hub.healthdata.gov/

My Note: 402 datasets instead of 399.

My Note: Found Same State AHRQ Snapshots and CDC WONDER Births. See Next Slide.

HealthData.gov Catalog Hub:

CDC WONDER Births

12

http://hub.healthdata.gov/dataset/wonder-births

HealthData.tw.rpi.edu Catalog Hub:

CDC WONDER Births

13

http://healthdata.tw.rpi.edu/hub/dataset/wonder-births-1

“We mirrored the http://hub.healthdata.gov CKAN instance using its API to our own instance at http://healthdata.tw.rpi.edu/hub. This allowed us to both improve the CKAN-based metadata, including adding Data Dictionaries and Technical Documentation as Resources, and to improve the RDF generated by CKAN.”

Source: Health Data Platform Metadata Challenge

Source: See Next Slide
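
My Note: The mirroring step quoted above could be sketched roughly as follows. This is only an illustration, not the challenge team's actual code. It assumes the hub exposes the standard CKAN Action API (version 3) with the `package_list` and `package_show` actions; the helper names (`action_url`, `unwrap`, `call`, `mirror`) and the `save` callback are hypothetical.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumption: the hub serves the standard CKAN Action API (version 3).
HUB = "http://hub.healthdata.gov"


def action_url(base, action, **params):
    """Build a CKAN Action API URL, e.g. .../api/3/action/package_show?id=..."""
    query = "?" + urlencode(params) if params else ""
    return f"{base.rstrip('/')}/api/3/action/{action}{query}"


def unwrap(payload):
    """Return the 'result' field of a CKAN response, or raise if the call failed."""
    if not payload.get("success"):
        raise RuntimeError(f"CKAN call failed: {payload.get('error')}")
    return payload["result"]


def call(base, action, **params):
    """Perform one GET against the Action API (requires network access)."""
    with urlopen(action_url(base, action, **params)) as resp:
        return unwrap(json.load(resp))


def mirror(base, save):
    """Fetch every dataset record from `base` and hand it to save(name, record)."""
    for name in call(base, "package_list"):
        save(name, call(base, "package_show", id=name))
```

A call such as `mirror(HUB, lambda name, record: print(name))` would list the dataset names while walking the catalog. Writing the records into a second CKAN instance, as the quote describes, would need authenticated write calls that are not shown here.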

CDC WONDER: Natality Information Live Births

14

http://wonder.cdc.gov/natality.html

My Note: Data Description contains Maternal Risk Factors: Diabetes - Yes, No, Not Stated, Not Reported.

My Note: A Data Access Agreement is Required.

CDC WONDER: Natality Data Live Births - Diabetes

15

http://wonder.cdc.gov/controller/datarequest/D66;jsessionid=A7C4A365FB2F877955A61D7BF9C5EC5C

CDC WONDER: Natality Data Live Births - Diabetes

16

http://wonder.cdc.gov/controller/datarequest/D66;jsessionid=A7C4A365FB2F877955A61D7BF9C5EC5C

My Note: Export to Text File And Remove Metadata and Import to Spreadsheet.
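
My Note: The export, clean, and import steps could be scripted instead of done by hand. The sketch below assumes the exported text file is tab-delimited and that the trailing metadata block starts at a line containing only `"---"`; both are assumptions about the export layout and should be checked against the actual file. The function names are hypothetical.

```python
import csv
import io


def strip_metadata(text, marker='"---"'):
    """Keep only the lines before the first line equal to `marker`."""
    kept = []
    for line in text.splitlines():
        if line.strip() == marker:
            break
        kept.append(line)
    return "\n".join(kept)


def to_rows(text):
    """Parse the tab-delimited data block into a list of dicts keyed by header."""
    reader = csv.DictReader(io.StringIO(strip_metadata(text)), delimiter="\t")
    return list(reader)


def rows_to_csv(rows):
    """Serialize the rows as comma-separated text that a spreadsheet can open."""
    if not rows:
        return ""
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=list(rows[0]))
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()
```

Reading the exported file into a string, passing it through `to_rows`, and saving the output of `rows_to_csv` with a `.csv` extension gives a file ready to import into the spreadsheet.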

Harness Health.Data.gov Data to Address
Diabetes in the US Knowledge Base

17

http://semanticommunity.info/Health_Datapalooza_IV#Health.Data.gov

My Note: Did not find CAHMI!

My Note: Only found one!

Diabetes Data Ecosystem Spreadsheet

18

http://semanticommunity.info/@api/deki/files/23811/Diabetes.xlsx


NHQR State Snapshots 2009

19

https://silverspotfire.tibco.com/us/library#/users/bniemann/Public?AHRQFocusonDiabetes-Spotfire.dxp

AHRQ State Snapshots Conclusion


Getting started on quality improvement is not an easy task. One
strategy a State may find helpful is to identify other States with
populations similar to those targeted for a quality improvement
effort. For example, a State seeking to improve rates of pneumonia
vaccination for people discharged from hospitals may want to
model its efforts on those of a State that has previously
implemented an improvement program in this area and
demonstrated success.


In many cases, the greatest value in comparison may lie in
identifying States that have started from relatively low performance
and made incremental improvements. The State with the greatest
improvements may have the most to contribute in demonstrating
to other States how to encourage delivery system change that
improves quality of care.

20

http://statesnapshots.ahrq.gov/snaps09/interpretation.jsp?menuId=67&state=AL#conclusion


AHRQ Quality of Care for Diabetes by Region and State for 2005-2006 by Conditions

21

https://silverspotfire.tibco.com/us/library#/users/bniemann/Public?AHRQFocusonDiabetes-Spotfire.dxp

CDC WONDER Births Natality Diabetes

22

https://silverspotfire.tibco.com/us/library#/users/bniemann/Public?AHRQFocusonDiabetes-Spotfire.dxp

Diabetes Data Ecosystem Spotfire

23

https://silverspotfire.tibco.com/us/library#/users/bniemann/Public?AHRQFocusonDiabetes-Spotfire.dxp

My Note: Can See All the Data Sets and Their Data Elements To Do Joins, Mappings, and Rule-Driven Visualizations.
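
My Note: Outside Spotfire, the same kind of mapping and join could be sketched in plain Python. This is a minimal illustration, not what the dashboard does internally. It assumes each data set has been loaded as a list of dicts and that the two sets share a key such as a state name; the column names used with it are hypothetical.

```python
from collections import defaultdict


def rename_columns(rows, mapping):
    """Map source column names onto shared names so two data sets line up."""
    return [{mapping.get(key, key): value for key, value in row.items()}
            for row in rows]


def join_on(left, right, key):
    """Inner-join two lists of dicts on `key`; left values win on name clashes."""
    index = defaultdict(list)
    for row in right:
        index[row[key]].append(row)
    joined = []
    for row in left:
        for match in index.get(row[key], []):
            joined.append({**match, **row})
    return joined
```

For example, if one table labels its key column `Location` and the other `State`, `rename_columns(table, {"Location": "State"})` aligns them before calling `join_on(first, second, "State")`.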

Conclusions and Recommendations


A Health.Data.gov search for “diabetes” gives only one data set. A search of the HealthData.gov Catalog Hub gives two data sets.


The Health Datapalooza IV Technology Development Track Objectives Are Shown in This Work.


I prefer both human-readable and machine-readable metadata instead of just the latter, which I find at the HealthData.gov Catalog Hub.


Next is First Lady Michelle Obama on Exercise and Dr.
Amen on Natural Supplements Data in Preventing and
Treating Diabetes.

24