POWERING UP ANALYTICS WITH BIG DATA - THE SAS WAY!


Nov 20, 2013



Copyright © 2012, SAS Institute Inc. All rights reserved.

POWERING UP ANALYTICS WITH BIG DATA - THE SAS WAY!

PRIYA SARATHY, PH.D
ANALYTIC SALES CONSULTANT, SAS



SALUTE TO THE WORLD RUN BY STATISTICIANS




AGENDA


High Performance Analytics (HPA)

Meeting Challenges
The What?
Understanding the Analytic Paradigm Shift
High Performance Analytics - the SAS way
What is the business value add?



MEETING CHALLENGES



HIGH PERFORMANCE ANALYTICS

WHAT IS HPA DELIVERING


What is HPA about?


Evolving business needs


Why does business need it?


Leveraging information to compete in the market


Raise revenue/ profits


Reduce costs and inefficiencies



HIGH PERFORMANCE ANALYTICS GREW FROM THE NEED FOR BIG DATA ANALYTICS!

[Figure: quadrant chart. Axes: Analytic Capabilities (Reactive to Proactive) and Data Size (Large to Big). Quadrants: BI, Big Data BI, Big Analytics, Big Data Analytics.]



HIGH PERFORMANCE ANALYTICS

HPA IS IMPACTING BUSINESS PERFORMANCE IN MANY AREAS


Probability of Default on Mortgage

Data analysis, variable selection, modeling
Millions of customers scored in batch
Reduce the time to complete all these tasks from 167 hours to 84 seconds!

Stress Testing Portfolio

Market risk solution that simulates market states to derive the value at risk
Understand exposures by counterparty / instrument; rapidly respond to crisis and adjust your positions accordingly
Recalculate entire risk portfolio in 12 minutes, down from 18 hours!

Next Best Offer

Multiple offers, millions of customers, regional, response history, business rule constraints
Optimization across cross-sell, upsell offers can run several hours
Speed up computation from 5.5 hours to 2 minutes.
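To put the three before/after figures on this slide on a common scale, the short sketch below converts each pair to seconds and computes a speedup factor. The helper name `speedup` and the dictionary layout are illustrative assumptions; only the durations come from the slide. Worked out by hand, the factors are roughly 7,157x, 90x, and 165x.

```python
# Illustrative arithmetic only: durations are taken from the slide,
# everything else (names, structure) is an assumption for this sketch.
HOUR, MINUTE = 3600, 60  # seconds


def speedup(before_seconds, after_seconds):
    """Return how many times faster the 'after' run is than the 'before' run."""
    return before_seconds / after_seconds


cases = {
    "Probability of Default on Mortgage": (167 * HOUR, 84),
    "Stress Testing Portfolio": (18 * HOUR, 12 * MINUTE),
    "Next Best Offer": (5.5 * HOUR, 2 * MINUTE),
}

for name, (before, after) in cases.items():
    print(f"{name}: about {speedup(before, after):,.0f}x faster")
```

The factors differ by almost two orders of magnitude across the three cases, so they are best read as separate customer results rather than as one general performance claim.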



HIGH PERFORMANCE ANALYTICS

WHAT CONVERSATIONS ARE YOU INVOLVED IN?


Fraud Detection

More data analyzed for fraud, more quickly and accurately than ever, across all departments from inside a single enterprise data warehouse.
Trade monitoring - unauthorized trades; commercial fraud - ACH, wire, warranty; customer fraud - payroll, claims fraud.

Forecasting Inventory Management

Multi-level relationships, segments, global markets
Accuracy in demand forecasting, daily to weekly forecast updates across several models
Promote inventory flow from 24 months by 85%

Retail Marketing

Household targeting, retail bank campaigns, customer acquisition model
Data analysis, variable selection, modeling
Real-time offers - coupons, cross-sell offers

Real Time Relationship Marketing

Sports retailer: location-based analytics and CLV modeling with real-time updates, pattern and behavioral analysis => 60% increase in response rates.
Airline operations: 8-10 hours of modeling, lagged data creating suboptimal decisions - faster insights, greater accuracy from multiple iterations, reduced operation cost.



WHAT DO OTHERS THINK?

DATA MEASUREMENT IS THE MODERN EQUIVALENT OF THE MICROSCOPE*

A 28-year-old assistant professor at Stanford combined math with political science in his undergraduate and graduate studies, seeing “an opportunity because the discipline is becoming increasingly data-intensive.” His research involves the computer-automated analysis of blog postings, Congressional speeches and press releases, and news articles, looking for insights into how political ideas spread.

At the World Economic Forum last month in Davos, Switzerland, Big Data was a marquee topic. A report by the forum, “Big Data, Big Impact,” declared data a new class of economic asset, like currency or gold.

It’s not just more streams of data, but entirely new ones - countless digital sensors worldwide in industrial equipment, automobiles, electrical meters and shipping crates - measure and communicate location, movement, vibration, temperature, humidity, even chemical changes in the air.

* Quote from Professor Brynjolfsson

The Age of Big Data, by Steve Lohr, NYT



THE WHAT?



THE NEW NORMAL


WHAT IS HPA DOING TO ANALYTICS?

Analyze 100% of data
More/New variables
More model iterations
Manage complex models
More models (per domain area)
More questions/ideas/scenarios to evaluate
Multiple deployment options: batch, real-time
Continuously monitor model effectiveness and retrain

The Things you can Think!



HIGH PERFORMANCE ANALYTICS

HPA COMBINES THE THREE PILLARS TO DELIVER RESULTS

Data: Leveraging technology to collect, access and manage data
Analytics: Adapting to new technology, in-memory, grid, in-database
Platform: Positioning analytics within industry leaders' technology solutions



HIGH PERFORMANCE ANALYTICS

ADVANCED ANALYTICS AND FAST COMPUTING CAPABILITIES ARE BROUGHT TOGETHER WITH SAS HPA

In a recent National Post interview with Jim Goodnight, the SAS CEO explains it like this:

“There's a lot of business processes that will be changing because of the speed at which we can do analytics; using a thousand processes in parallel to do these computations can make it possible to do huge problems that we would never have been able to do before because it would take too long on a single processor.”

A big part of how HPA gets its speed: it breaks larger problems down into smaller pieces.
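The idea of breaking a larger problem into smaller pieces can be sketched in a few lines of Python. This is a generic illustration, not SAS code or the HPA implementation: the function names, the choice of mean and variance as the statistic, and the use of threads as stand-ins for separate processors are all assumptions made for the example. Each piece is reduced to three numbers, and only those small summaries are combined at the end.

```python
from concurrent.futures import ThreadPoolExecutor


def split(data, n_pieces):
    """Cut a list into n_pieces contiguous slices of near-equal size."""
    size, extra = divmod(len(data), n_pieces)
    pieces, start = [], 0
    for i in range(n_pieces):
        end = start + size + (1 if i < extra else 0)
        pieces.append(data[start:end])
        start = end
    return pieces


def partial_stats(piece):
    """Work done on one piece: count, sum, and sum of squares."""
    return len(piece), sum(piece), sum(x * x for x in piece)


def parallel_mean_variance(data, n_pieces=4):
    """Combine per-piece summaries into the overall mean and population variance.

    Threads stand in for separate processors here (an assumption of this
    sketch); data must be non-empty.
    """
    with ThreadPoolExecutor(max_workers=n_pieces) as pool:
        results = list(pool.map(partial_stats, split(data, n_pieces)))
    n = sum(r[0] for r in results)
    total = sum(r[1] for r in results)
    total_sq = sum(r[2] for r in results)
    mean = total / n
    return mean, total_sq / n - mean * mean
```

For example, `parallel_mean_variance(list(range(1, 101)))` gives the same mean and variance as a single pass over all 100 values, because the per-piece counts and sums add up exactly.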



HIGH PERFORMANCE ANALYTICS

From sampling to population analysis
50 attributes to 500+ attributes
Reduce run times from 18 hours to 30 minutes
Build more complex models
3-month lagged modeling to real-time updates
Structured data to combining unstructured data
Shortening model lifecycle
More frequent updates, model iterations
Real-time scoring impacting business bottom line



HPA HELPS REMOVE LIMITATIONS

You will have more time to think!



UNDERSTANDING THE ANALYTIC PARADIGM SHIFT



MODEL LIFECYCLE

HOW MUCH TIME DO YOU SPEND ON YOUR MODELS?

Data Analysis: 45%
Model Build: 30%
Validation & Implementation: 10%
Monitoring & Results Reporting: 15%

Where would you like to spend more time?



RESPONSIBILITIES OF A STATISTICAL ANALYST

MODEL BUILDING PARADIGM SHIFT


Extract, Transform, Load data


Data massaging/ mining


Aggregating, normalizing data


Identifying Analytic approach


Building Samples


Building Models


Creating Scoring Code


Validation Reports/ model documentation


Implementation for Production


Results monitoring


Update, refresh, or rebuild model


IT - shifting responsibilities to
  EDW/DW
  Data Quality
  Data integration
  ODS
  Production implementation

Analyst - building models
  Access to more and better data
  Need for documentation and transparency
  Greater number of business solutions
  Changing market and data dynamics impacting frequency of build and update



MODEL LIFECYCLE

CHANGING ROLES AND RESPONSIBILITIES

Data Analysis: 25%
Model Build: 60%
Validation & Implementation: 10%
Monitoring & Results Reporting: 5%


New technology, new tools


New business processes


New competitive demands




THE FARFALLE MODEL

THE BASIC STRUCTURE OF ANALYTIC FUNCTION

Source: IDC, 2012

70% of the effort in analytics is typically on the information management side of the model.

Analytical teams in the middle are small but crucial for translating the data assets into actionable insights.

The organization change side highlights the attributes of behavior changes needed by business users.



Working with a Tsunami of data

[Figure: DATA SIZE, TODAY versus THE FUTURE, across VOLUME, VARIETY, VELOCITY, VALUE]



HIGH PERFORMANCE ANALYTICS - THE SAS WAY




SAS® HIGH-PERFORMANCE ANALYTICS

EMBRACING NEW TECHNOLOGY, BUILDING NEW STRENGTHS

Visual Analytics



SCALABLE ANALYTIC CAPABILITY

PHYSICAL LAYOUT

[Figure: physical layout diagram. Labels: CLIENT FRAME; MID-TIER; SAS Metadata Servers; COMPUTING FRAME with Node 1, Node 2, ... Node n [Controller Node, n cores]; DATA FRAME with RDBMS, Shared / Clustered File, HADOOP; SAS Analytic & Scoring Accelerators.]



HIGH PERFORMANCE ANALYTICS

CHANGING THE WAY ANALYTICS IS DONE BOTTOMS UP

Data Preparation / Data Exploration: DS2, SORT, SUMMARY/MEANS, FREQ, RANK

Analytics: HPLOGISTIC, HPREG, HPLMIXED, HPFOREST, HPNEURAL, HPREDUCE, HPNLIN




SAS® HIGH-PERFORMANCE ANALYTICS SERVER

AREAS OF MODEL DEVELOPMENT THAT BENEFIT

Text Mining
  Parsing large-scale text collections
  Extract entities
  Auto. stemming & synonym detection
  Topic discovery

Predictive Analytics & Data Mining
  Binary target & continuous no. predictions
  Linear & Non-Linear modeling
  Complex relationships
  Tree-based Classification

Optimization*
  Local search optimization
  Large-scale linear & mixed integer problems

Econometrics Time Series
  Probability of an event(s)
  Severity of random event(s)

*Currently only available for Teradata and EMC Greenplum



IN-MEMORY HIGH PERFORMANCE ANALYTICS

HPA

VA





FINANCIAL SERVICES

CUSTOMER ACQUISITION USE CASE

84 SECONDS: DATA EXPLORATION, MODEL DEVELOPMENT, MODEL DEPLOYMENT

Current Process                     High-Performance Process
One algorithm (Neural Network)      Multiple algorithms (e.g. Forest, Logistic Reg., etc.)
1 model per day                     1 model per 30 minutes
5 hours to process model            3 minutes to process model
Model lift of 1.6%                  Model lift of 2.5%




Think left and think right and think low and think high. Oh, the thinks you can
think up if only you try!


Oh the things you can find, if you don't stay behind!
Dr. Seuss (On Beyond Zebra!, 1955)



SAS® HIGH-PERFORMANCE ANALYTICS SERVER

KEY DIFFERENTIATORS

Only in-memory offering in the market delivering high-end analytics, including text mining and optimization

Addresses the entire model development and deployment lifecycle

36 years of proven technology... faster. Opens up vast array of possibilities to get value from big data



ADDITIONAL CASE STUDIES



TOP FIVE WAYS HIGH-PERFORMANCE ANALYTICS WILL TRANSFORM MARKETING

Faster, more sophisticated, effective segmentation
  Segmentation tests can be run against the entire population in order to determine the best campaign interaction methods.

Real-time, relevant next-best customer actions or offers
  This results in a more relevant offer or customer interaction surfacing at the “point of need” in real time.

Instant deployment and management of marketing models that give you a sustainable advantage
  Enables companies to quickly and efficiently update their numerous models without submitting a slow overnight batch update process.

1:1 real-time experiences to bolster brand connections
  The outcome is more precise, real-time interactions with consumers at the “point of need.”

Optimized marketing for broader business impact
  Now businesses can not only determine the customer and financial impacts of their campaigns faster but also adapt instantaneously to market, competitive and customer changes.



UNITED HEALTHCARE GROUP

HEALTHCARE PAYER

BUSINESS ISSUE


Electronic medical records (EMRs) driving a data explosion


Utilize all of the unstructured text (records, case notes, emails, transcripts, etc.)

How to improve quality and cost of care? “Create Healthier Lives”

SOLUTION

SAS® High-Performance Analytics Server including HP Text Mining
Greenplum Data Computing Appliance

RESULTS

Reduce model processing time from four hours to 10 seconds.
Reduce misclassification rates from 30% to 10%
Historical models improved with more than 10% lift

I can now tell that a prescription will harm a patient before you write it…
I can tell that a customer is dissatisfied before you lose him or her...
I can now determine that a claim is fraudulent before you pay it…

“SAS is helping make our member services the best in the industry. In less than one hour, we can load a huge table (169 million row dataset), find the best variables, compare different models and pick the best model. I would not attempt to model a dataset this large without SAS HPA Server.”

Mark Pitts
Director of Data Science, Solutions and Strategy



SAS HIGH-PERFORMANCE

LEVERAGING DATABASE APPLIANCE FOR HPA

[Figure labels: Root Node (Teradata Managed Server); Worker Node 1, Worker Node 2, ... Worker Node N]

Request is sent to the root node inside the appliance





SAS HIGH-PERFORMANCE

ANALYTICAL COMPUTATION AND DATA REQUEST SENT TO THE WORKER NODES

[Figure labels: Root Node; Worker Node 1, Worker Node 2, ... Worker Node N]



SAS HIGH-PERFORMANCE

DATA REQUEST SENT TO THE DATABASE. DATA SLICE MOVED INTO MEMORY

[Figure label: Root Node]



SAS HIGH-PERFORMANCE

ANALYTIC PROCESSING WITH INTERNODE COMMUNICATION

[Figure label: Root Node]




SAS HIGH-PERFORMANCE

WORKER NODE RETURNED TO THE ROOT NODE. JOB IS COMPLETE.

[Figure label: Root Node]
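The five steps on these slides (request to the root node, work sent to the workers, each worker loading its data slice into memory, processing, results returned to the root) follow a scatter/gather pattern. The Python sketch below imitates that flow for a one-variable linear regression. It is an illustration under stated assumptions, not the SAS or Teradata implementation: `root` and `worker` are hypothetical names, threads stand in for appliance nodes, a list of `(x, y)` pairs stands in for the database table, and the internode communication step is not modeled.

```python
from concurrent.futures import ThreadPoolExecutor


def worker(data_slice):
    """Runs on one worker: reduce its in-memory slice of (x, y) rows to five sums."""
    n = len(data_slice)
    sx = sum(x for x, _ in data_slice)
    sy = sum(y for _, y in data_slice)
    sxy = sum(x * y for x, y in data_slice)
    sxx = sum(x * x for x, _ in data_slice)
    return n, sx, sy, sxy, sxx


def root(table, n_workers=3):
    """Runs on the root: hand out slices, gather the sums, fit y = intercept + slope * x."""
    # Scatter: every worker gets a disjoint slice of the rows.
    slices = [table[i::n_workers] for i in range(n_workers)]
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        gathered = list(pool.map(worker, slices))
    # Gather: add the workers' sums column by column.
    n, sx, sy, sxy, sxx = (sum(col) for col in zip(*gathered))
    # Ordinary least squares from the combined sums (needs at least two distinct x values).
    slope = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    intercept = (sy - slope * sx) / n
    return intercept, slope
```

The point of the design is that only five numbers per worker travel back to the root, however many rows each slice holds, so the data itself never has to leave the node that stores it.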