Big Data and Predictive Analytics in Government
November 6, 2013


JAMES G. SHEEHAN

EXECUTIVE DEPUTY COMMISSIONER, NEW YORK CITY HUMAN SERVICES ADMINISTRATION


DAVID FITZ, CPA, PMP, CGFM

PARTNER

KPMG LLP

Agenda



Government Initiatives in Big Data



Data and Analytics Overview



Value Proposition and Opportunities



Current State



Success Factors

Government Initiative


“Can Anyone Catch New York?” 4/2/2013 Saul Sherry in Big Data Republic


Richness of City data available


Index crimes down 80%


95 per cent success rate in tracking down dumpers of cooking oil


2x improvement in identification of illegal cigarette retailers


Use of inspection algorithm to identify and inspect firetraps


Boston and Chicago are trying

Government Initiatives - IRS

IRS - Compliance Data Warehouse - 1996


Estimate US tax gap


Predict identity theft, fraud, other non-compliance


Optimize workload for enforcement


IRS lessons:


Interdisciplinary teams


Data quality focus - for quality and for trust


Governance to avoid policies and procedures that stifle change

Government Initiatives


Patriot Act - NSA, FBI - 2002


$1.5 billion NSA digital data center - Bluffdale, Utah


FinCEN - “data reasonably necessary to identify illicit finance” (e.g., Bitcoin)


Part D Medicare - 2006 - insurance design and drug benefit information; 51 gigabytes of data (but most usable data left with plans)


MA. May 2012 “Big Data Initiative”


“A Big Data Road map for Government” - NY Times 10/2012


“Demystifying Big Data: A Practical Guide to Transforming the Business of Government”, TechAmerica Foundation, www.techamericafoundation.org/bigdata

Government Initiatives


March 2012 - Obama Administration announces $250 million


DOD, “Big Data Research and Development Initiative” - Office of Science and Technology Policy


National Science Foundation/National Institutes of Health solicitation


$10 million to UC/Berkeley


“Xdata,” “Earthcube”


Government Initiatives


NSF Request for information


Big Data Fact Sheet


“Data to Knowledge to Action” program event by NITRD on Tuesday, November 12 (www.nitrd.gov); available by webinar - no agenda yet posted

Government Initiatives

White House Press Release 2013

What is “Big” Data?


A collection of data sets so large and complex that it becomes
difficult to process using on-hand database management tools or
traditional data processing applications


What is considered "big data" varies depending on the capabilities
of the organization managing the set, and on the capabilities of the
applications that are traditionally used to process and analyze the
data set in its domain


The “V’s”: Volume, Variety, Velocity, Veracity


Source: Search on Big Data, www.wikipedia.org


15 of our 17 industry sectors in the United States have more data
stored per company than the U.S. Library of Congress, which itself
collected 235 terabytes of data in April 2011.


Wal-Mart Stores Inc. handles more than 1 million customer
transactions every hour, feeding databases estimated at more than
2.5 petabytes - the equivalent of 167 times the books in the Library
of Congress.


30 billion pieces of content are shared on Facebook monthly.


Source: “Big Data + Big Analytics = Big Opportunity”, by Jeanne Johnson, KPMG LLP; Financial Executive, July/August
2012.

What is “Big” Data?

Data Analytics


Analysis of data is a process of inspecting, cleaning, transforming,
and modeling data with the goal of highlighting useful information,
suggesting conclusions, and supporting decision making (includes
hidden patterns, unknown correlations)


Data analysis has multiple facets and approaches, encompassing
diverse techniques under a variety of names, in different business,
science, and social science domains


Data mining, computed matching, extending data

Source: Search on Data Analytics, www.wikipedia.org

Future of Social Services Programs - Growth

Huge growth in eligible populations: about half of the USA population (up
to $90,000 annual income for a family of four) qualifies for Obamacare
subsidies or Medicaid; over 60 million SNAP recipients

Growth in cash equivalents for recipients and providers

Growth in means-tested eligibility programs


Earned Income tax credits - federal, state


Child care subsidies and vouchers based on income


Expanded higher education grants and direct government loans


Student loan reduced payment/forgiveness based on income


Expanded Social Security disability, SSI


Housing vouchers and income based eligibility, homeless services
support


(continued)


Automated applications


Limited or no face-to-face interaction with front-line staff


Loss of practical expertise and local knowledge of front-line staff in
assessing applicants


No original documents (or front-line staff copy of original documents
presented)


Internet access for all - anonymous or public access devices


Limited preservation of electronic communications


Potential for capture of data about clients, client conduct, and
interactions

Future of Social Services Programs - Growth

What risks can we predict for automated
enrollment in social services programs?



Anonymity/reduced identification breeds fraud risk

What people will do in the dark (psych research)

Earned income tax credit (GAO reports - about 24% fraud rate)

Driver behavior vs. pedestrian behavior

FEMA applications after Katrina

Use of SNAP benefits at Walmart after SNAP meltdown by Xerox


Data Mining Predictive Analysis Issues

Sensitivity - how likely is the test to identify improper claims?

Specificity - how likely is the test to identify only improper claims (that is, to leave proper claims unflagged)?

Problem - every predictive analysis will generate more work than
there are people available to do it - and those people will ignore
results from any analysis with many false positives

Rules-based vs. learning systems
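
A minimal sketch of how sensitivity and specificity interact with the false-positive workload, assuming hypothetical counts: 100,000 claims, 2% improper, and a test that catches 90% of improper claims and clears 95% of proper ones. None of these figures come from the presentation, and the names `ConfusionCounts`, `sensitivity`, `specificity`, and `precision` are illustrative.

```python
from dataclasses import dataclass


@dataclass
class ConfusionCounts:
    """Outcome counts once reviewers have checked every scored claim."""
    true_pos: int   # improper claims the test flagged
    false_neg: int  # improper claims the test missed
    true_neg: int   # proper claims the test left unflagged
    false_pos: int  # proper claims the test flagged anyway


def sensitivity(c: ConfusionCounts) -> float:
    """Share of improper claims that the test flags."""
    return c.true_pos / (c.true_pos + c.false_neg)


def specificity(c: ConfusionCounts) -> float:
    """Share of proper claims that the test leaves unflagged."""
    return c.true_neg / (c.true_neg + c.false_pos)


def precision(c: ConfusionCounts) -> float:
    """Share of flagged claims that are actually improper."""
    return c.true_pos / (c.true_pos + c.false_pos)


# Hypothetical figures (assumptions, not data from the slides):
# 2,000 improper claims and 98,000 proper claims.
counts = ConfusionCounts(true_pos=1800, false_neg=200,
                         true_neg=93100, false_pos=4900)

# Total number of claims sent to reviewers.
flagged = counts.true_pos + counts.false_pos
```

Under these assumed numbers the test flags 6,700 claims, of which only about 27% are improper. Even a test with high sensitivity and specificity can produce mostly false positives when improper claims are rare, which is the workload problem the slide describes.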

What We Learn From Banks And Credit Cards - Data Mining needs to drive organization adaptation?


Identity, status, credentialing verification (new accounts)


Transaction tests ($ thresholds, patterns, locations)


Front end identity questions


Prompt telephone and IM contact on fraud risk identification


Transaction verification


Scripted interviews and answers (e.g., “there has been a security breach
on your account.”)


Close and replace account promptly


Someone is watching


Reduced reliance on prosecution
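
The transaction tests listed above ($ thresholds, patterns, locations) are rules-based checks. A rough sketch follows, assuming a hypothetical `Transaction` record, a made-up $1,000 threshold, and a made-up "five times the recent average" pattern rule. The source does not describe any real issuer's rules, so every name and number here is an assumption.

```python
from dataclasses import dataclass
from typing import List, Set


@dataclass
class Transaction:
    account_id: str
    amount: float
    location: str


# Hypothetical rule parameters, chosen only for illustration.
AMOUNT_THRESHOLD = 1000.00
PATTERN_MULTIPLE = 5.0


def flag_reasons(txn: Transaction,
                 usual_locations: Set[str],
                 recent_amounts: List[float]) -> List[str]:
    """Return the list of rules this transaction trips (empty if none)."""
    reasons = []
    # Dollar threshold test.
    if txn.amount > AMOUNT_THRESHOLD:
        reasons.append("amount over threshold")
    # Location test: has this account been used here before?
    if txn.location not in usual_locations:
        reasons.append("unusual location")
    # Pattern test: is the amount far above the account's recent average?
    if recent_amounts:
        average = sum(recent_amounts) / len(recent_amounts)
        if txn.amount > PATTERN_MULTIPLE * average:
            reasons.append("amount far above recent pattern")
    return reasons


history = [100.0, 200.0, 150.0]
suspicious = flag_reasons(Transaction("A1", 1500.0, "Miami"), {"New York"}, history)
routine = flag_reasons(Transaction("A1", 120.0, "New York"), {"New York"}, history)
```

Returning the reasons rather than a bare yes/no is a deliberate choice in this sketch: it gives the person making the prompt follow-up contact something concrete to verify with the account holder.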


The Affordable Care Act


$350 million over 10 years to bolster anti-fraud efforts, including predictive modeling programs


Provides funding for the Health Care Fraud and Abuse Control (HCFAC) Program account, the Medicare Integrity Program, and the Medicaid Integrity Program


Strengthen cooperative efforts across the Federal government and with the private sector


Increased data sharing between Federal entities to monitor and assess high risk program areas and better identify potential sources of fraud


Expansion of Integrated Data Repository (IDR), which is currently populated with years of historical Part A, Part B and Part D paid claims, to include near real time pre-payment stage claims data; indicators of aberrant activity throughout the claims processing cycle (e.g., time claim was submitted and modified)


State data set will be harmonized with Medicare claims data in the IDR to detect potential fraud, waste and abuse across multiple payers

CMS Predictive Modeling


2011 - CMS contract with Northrop Grumman and IBM to lead teams to develop a predictive modeling system (Northrop Grumman) and models (Northrop Grumman and IBM) to identify high-risk claims


Northrop Grumman working with National Government Services (NGS) and Federal Network Systems (Verizon); IBM team includes Health Integrity


4-year task order


How is the Northrop Grumman/IBM project going?


See December 2012 CMS report at www.stopmedicare.gov/fraud-rtc12142012pdf

Improper Payments Elimination and Recovery Act (IPERA), July 2010


Defines “improper payment”:


Payments that should not have been made, or payments made in an incorrect amount (including overpayments and underpayments)


Payment to an ineligible recipient


Payment for an ineligible service


Any duplicate payment


Payment for services not received


Payments for an incorrect amount
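
Of the categories in this definition, "any duplicate payment" is the most mechanical to test for. A minimal sketch, assuming a hypothetical payment record with the fields `payment_id`, `provider_id`, `recipient_id`, `service_code`, `service_date`, and `amount`; these field names and the sample rows are invented for illustration and do not reflect any actual CMS data layout.

```python
from collections import defaultdict


def find_duplicate_payments(payments):
    """Group payments that match on every identifying field.

    Returns a list of payment-id lists, one per group with more than
    one payment. Only exact matches are caught; near-duplicates (for
    example, a changed date) would need fuzzier rules.
    """
    groups = defaultdict(list)
    for p in payments:
        key = (p["provider_id"], p["recipient_id"], p["service_code"],
               p["service_date"], p["amount"])
        groups[key].append(p["payment_id"])
    return [ids for ids in groups.values() if len(ids) > 1]


# Hypothetical sample data.
payments = [
    {"payment_id": "P1", "provider_id": "PR1", "recipient_id": "R1",
     "service_code": "99213", "service_date": "2013-01-15", "amount": 75.00},
    {"payment_id": "P2", "provider_id": "PR1", "recipient_id": "R2",
     "service_code": "99213", "service_date": "2013-01-15", "amount": 75.00},
    {"payment_id": "P3", "provider_id": "PR1", "recipient_id": "R1",
     "service_code": "99213", "service_date": "2013-01-15", "amount": 75.00},
    {"payment_id": "P4", "provider_id": "PR1", "recipient_id": "R1",
     "service_code": "99213", "service_date": "2013-02-15", "amount": 75.00},
]

duplicates = find_duplicate_payments(payments)
```

Here `P1` and `P3` match on every field and are reported together, while `P2` (different recipient) and `P4` (different date) are not.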

The Small Business Jobs Act of 2010


Requires the Centers for Medicare & Medicaid Services
(CMS) to “adopt predictive modeling and other analytics
technologies to identify improper claims for reimbursement
and to prevent the payment of such claims under the
Medicare fee-for-service program.”


Two-year predictive modeling contest for hospital
admissions. WSJ 3/16/11


Small Business Jobs Act of 2010


CMS Responsibilities


Contract with private companies to conduct predictive modeling and other analytics to identify and prevent improper payment of claims submitted under Parts A and B of Medicare


Identify 10 states that have the highest risk of waste, fraud and abuse in the Medicare program; for one year, use predictive modeling and other analytics technologies to stop fraudulent claims in these states


CMS to start using predictive analytics technologies on July 1, 2011


After the initial year, HHS OIG was required to report to Congress on actual savings to Medicare FFS for the prior year, projected future savings from the use of these technologies, and the return on investment as a result of the predictive analytics technologies.


CMS was to expand the use of predictive analytics technologies on October 1, 2012, to apply to 10 more states identified as having the highest risk of waste, fraud, or abuse in the Medicare fee-for-service program

Small Business Jobs Act of 2010 Data Mining Requirements - How Did CMS Do?


OIG Report


Fraud Control Report


CMS Integrity Strategy?


Small Business Jobs Act of 2010 Data Mining Requirements - How Did CMS Do?

HHS OIG report - A-17-12-53000 (September 2012) “The
Department of Health and Human Services has
implemented predictive analytics technologies but can
improve its reporting on related savings and return on
investment” oig.hhs.gov/oas/region1/171253000.pdf

Small Business Jobs Act of 2010 Data Mining Requirements - How Did CMS Do?


“In its first report, the Department could not present actual
savings with respect to improper payments recovered.”


“We could not determine whether the $68.2 million in
projected savings from law enforcement referrals was an
accurate projection of savings. This amount represents the
total value of claims identified during the investigation of
leads.”


“Because the Department used actual and projected savings
to calculate returns on investment, it should have included
actual and projected costs to ensure that all costs were
included in the return on investment calculation.”
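
The OIG's point about matching costs to savings can be illustrated with a small calculation. This is a sketch only: it assumes return on investment is computed as net savings divided by total costs (one common convention; the report's own formula is not given here), and all dollar figures are hypothetical placeholders, not numbers from the OIG report.

```python
def return_on_investment(actual_savings, projected_savings,
                         actual_costs, projected_costs=0.0):
    """ROI as (total savings - total costs) / total costs.

    Assumed convention for illustration; other definitions exist.
    """
    total_savings = actual_savings + projected_savings
    total_costs = actual_costs + projected_costs
    if total_costs <= 0:
        raise ValueError("total costs must be positive")
    return (total_savings - total_costs) / total_costs


# Hypothetical figures in millions of dollars.
actual_savings = 30.0
projected_savings = 70.0
actual_costs = 20.0
projected_costs = 30.0

# Mismatched: projected savings counted, projected costs left out.
mismatched = return_on_investment(actual_savings, projected_savings, actual_costs)

# Matched: both actual and projected costs included.
matched = return_on_investment(actual_savings, projected_savings,
                               actual_costs, projected_costs)
```

With these made-up inputs the mismatched calculation reports a return of 4.0 while the matched one reports 1.0, showing how omitting projected costs inflates the result.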

Computer World Study


94% of IT projects in last ten years with budgets of over
$10 million (in government and out) launched with major
problems or simply failed.


“Companies should take small steps, via pilots and
skunkworks, and invest in the ones that work.” MIT
Sloan study 2013

Facial Recognition Technology and More Data

Numerous public and private entities are incorporating FRT into their
operations, as part of the larger biometric technology boom.


systems that consummate online transactions only when the identity
of the parties has been verified via webcam.


commercial and government buildings with restricted access identify
authorized persons by some biometric characteristic, with facial
scanning expected to become more prevalent (think of how often
your picture is taken for temporary building ids)


tagged photos for enhanced background checks on job applicants

What Can We Do With All This Information?


The Supreme Court has held that the creation and dissemination of
information are speech within the meaning of the First Amendment.
See, e.g., Bartnicki, at 527, 121 S.Ct. 1753 ("[I]f the acts of
`disclosing' and `publishing' information do not constitute speech, it
is hard to imagine what does fall within that category, as distinct
from the category of expressive conduct")


prescriber-identifying information is speech for First Amendment
purposes. Sorrell v. IMS Health Inc., 131 S. Ct. 2653 - Supreme Court 2011

Predictable Crises in Big Data


Privacy


Contracts - proprietary information


Snooping/unauthorized use


Need for lawyers to analyze and agree on disclosures and data
sharing


Compliance with affiliation and authorized use agreements


We’ve got it - what do we do with it?


Disclosure events - do you know where your copier memory went?

Current State


[Figure labels. Data Sources: Structured Transactions, Structured Text, Unstructured Text, Unstructured Other. Analytic Methods: Descriptive, Diagnostic, Predictive, Prescriptive. Stages: Cope, Compete, Capitalize. Captions: "Knowing the value of what you already have"; "Gaining confidence in external data sources"; "Finding your right balance of data & analytics".]

Source: Hostmann, Bill. Best Practices in Analytics: Integrating Analytical Capabilities and Process Flows. Gartner, 2012

Just 31 percent say their agency has an adequate big data strategy*

Current State

Source: “Transforming Internal Audit: A Maturity Model from Data Analytics to Continuous Assurance”, by Jim Littley,
KPMG LLP; 2013.

Success Factors

Additional “V”s


The Analytics


Viability


Understanding and testing the usefulness of the data
variables, new variables, validating the hypothesis


Value


Confirm viability


add value/extend variables


Visualization


Making the data usable through maps, graphs,
charts. Know the audience and potential.


Source: “The Missing V’s in Big Data: Viability and Value”, by Neil Biehn, PROS; Wired.com, Innovation Insights, May 6, 2013.

Success Factors


Define the value


Tone at the top and senior leadership active involvement


Data strategy


include organizational design


Improved analytic capabilities of staff


Robust governance - Internal and external


Risk management - Data security and privacy


Change management and communication strategy

Predictable Crises in Big Data


Cost


Technology change - Rosetta Stone problems


4 Vs - Accuracy/Reliability