Big Data


16 Oct 2013


Architecture Services

Big Data


Big Changes

Business Analyst Professional Development Day

September 2013

Contents

What is Advanced Analytics & Big Data?
Business Intelligence, Advanced Analytics and Big Data seem to be used synonymously; they are different and build on each other from a maturity perspective.

Big Data & Analytics Continuum
Leveraging “Big Data” should be done on a stable foundation.

Examples

Skills of the Data Analyst / Scientist
New skills and levels of maturity, certifications and training.

What is Advanced Analytics?

Hindsight, Current Sight, Foresight

Standard Reports: What happened? When did it happen?
Ad hoc Reports: How many? How often? Where?
Query Drilldown: Where exactly is the problem? How do I find the answers?
Alerts: When should I react? What actions are needed now?
Statistical Analysis: Why is this happening? What opportunities am I missing?
Forecasting: What if these trends continue? How much is needed? When will it be needed?
Predictive Analytics: What will happen next? How will it affect my business?
Optimization: How can we get better? What is the best decision?

Advanced Analytics comprises both Business Intelligence technologies and complex analytic practices. Together they uncover relationships and patterns within large volumes of historical data, which can be used to predict future behavior and events or to improve operational results.

What is Big Data? Volume, Variety, Velocity (and sometimes Veracity and Value)

Definition: “Dealing with information management challenges that don’t natively fit with traditional approaches to handling the problem.” (Tom Deutsch, IBM)

Where has Nationwide been, and where can we go?

Industry estimates suggest that 80% of enterprise data is in unmodeled/unstructured forms where it is nearly inaccessible and traditional modeling does not fit.

Applying text extraction techniques to varied, large volumes of data such as SEC filings produces output that can be combined with traditional BI data to create new structured metrics for analysis and exploration.

Text is also trapped in large description fields in our operational data stores like the Claims DW.

Internal Data captured or streamed today in Systems and Data Warehouses (e.g., policy admin, claims)

NEW Internal Data not previously captured (e.g., emails, clickstream, mobile, telematics, unstructured notes from agents or claims adjusters)

NEW External Data from non-traditional sources (e.g., internet, social networks, demographic, local economy, price elasticity, mobile location stream, localized competitor intelligence)

Comprehensive advanced analytics have been built around marketing, product and pricing, and other areas of the business. They are mostly disconnected, and some use rudimentary technologies that are inefficient and focused mainly on data movement rather than on getting value out of the data.
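As a rough illustration of turning free-text description fields into structured metrics, the sketch below counts a few indicator terms in claim notes and attaches the counts to each record. The field names, sample notes, and term list are all hypothetical; a real extraction step would likely be far richer.

```python
import re

# Hypothetical indicator terms; a real project would derive these from analysis.
INDICATOR_TERMS = ("water damage", "theft", "attorney")

def extract_metrics(note: str) -> dict:
    """Turn one free-text note into structured, countable fields."""
    text = note.lower()
    metrics = {}
    for term in INDICATOR_TERMS:
        # Whole-phrase match; \b stops a term matching inside a longer word.
        pattern = r"\b" + re.escape(term) + r"\b"
        metrics[term.replace(" ", "_") + "_count"] = len(re.findall(pattern, text))
    metrics["word_count"] = len(re.findall(r"[a-z']+", text))
    return metrics

# Hypothetical claim records with an unstructured description field.
claims = [
    {"claim_id": 1, "notes": "Insured reports water damage in basement. Attorney involved."},
    {"claim_id": 2, "notes": "Theft of laptop from vehicle; no attorney."},
]

# Attach the text-derived metrics to the existing structured key.
structured = [{"claim_id": c["claim_id"], **extract_metrics(c["notes"])} for c in claims]
```

Each row of `structured` can then sit alongside traditional BI data keyed on `claim_id`.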

Big Data & Analytics Continuum

Cognitive: Reasoning, Learning, Natural Language
What is the most likely answer?
What is the right question?

Prescriptive: Optimization, Rules, Constraints
Predictive: Machine Learning, Forecasting, Statistical Analysis
Descriptive: Alerts & Drill Down, Ad hoc Reports, Standard Reports
Information Layer: Big Data Platforms, Content Management, RDBMS and Integration

What’s the next best action?
What will happen when and why?
What could happen?
What if these trends continue?
What has happened and why?
How many, how often, who & where?
How do I integrate new data sources?
How is data managed and stored?

Business Value

When entering the Big Data space, pay close attention to your foundational competencies. Information Management capabilities such as data integration, extensible data modeling, data quality and data governance become even more important when dealing with these new, uncertain, high-volume data sources. Additionally, to achieve the full ROI, you must have a mature analytics methodology, appropriately skilled resources and technology.


Selected Results



Accrual Score (Bankruptcy) Prediction

The machine learning technique called Support Vector Machine (SVM) was selected. This supervised learning technique takes a set of factors in a training set of labeled results and constructs a model.
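To make "takes a set of factors in a training set of labeled results and constructs a model" concrete, here is a minimal sketch of a linear SVM trained with Pegasos-style stochastic sub-gradient steps in plain Python. It is not the project's R implementation: the data are invented, there is no kernel, and the intercept is omitted on the assumption that the factors are centred.

```python
import random

def train_linear_svm(X, y, lam=0.01, epochs=50, seed=0):
    """Fit weights w for sign(w . x) by minimising hinge loss plus an L2 penalty.

    X: list of feature vectors; y: labels, each +1 or -1.
    No intercept term is fitted, so features are assumed to be centred.
    """
    rng = random.Random(seed)
    w = [0.0] * len(X[0])
    order = list(range(len(X)))
    t = 0
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)                      # decaying step size
            margin = y[i] * sum(wj * xj for wj, xj in zip(w, X[i]))
            w = [(1.0 - 1.0 / t) * wj for wj in w]     # shrink step from the L2 penalty
            if margin < 1.0:                           # inside the margin or misclassified
                w = [wj + eta * y[i] * xj for wj, xj in zip(w, X[i])]
    return w

def predict(w, x):
    """Return +1 or -1 depending on which side of the hyperplane x falls."""
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) >= 0 else -1

# Hypothetical, centred factors (e.g. two standardised financial ratios per company).
X_train = [[2.0, 1.0], [1.0, 3.0], [3.0, 2.0], [2.0, 2.0],
           [-1.0, -2.0], [-2.0, -1.0], [-3.0, -2.0], [-2.0, -3.0]]
y_train = [1, 1, 1, 1, -1, -1, -1, -1]                 # the labeled results

w = train_linear_svm(X_train, y_train)
train_predictions = [predict(w, x) for x in X_train]
```

The learned `w` is the model: calling `predict(w, new_factors)` scores a case that was not in the training set.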


Cross Business Interest


Freedom Specialty Insurance


Enterprise Applications Investments


NF opportunities just beginning to be explored

Open Source R was chosen to accelerate the model development process for the intern. Several external R packages were added to complete the SVM capability in R as a desktop tool. Supplemental data preparation of the S&P financial data was handled with various scripts and spreadsheets.

The project will provide knowledge transfer to Freedom Specialty, where they currently intend to implement it in SAS.

Model Validation Results (Jan 2013)

Class      Precision  Recall  F1 score
Positive   0.81       0.70    0.75
Negative   0.74       0.83    0.78

Accuracy: 0.77

Further Optimization Pending
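The F1 scores in the validation results are the harmonic mean of precision and recall, so they can be checked against the reported precision and recall figures. The helper name below is illustrative only.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

positive_f1 = f1_score(0.81, 0.70)
negative_f1 = f1_score(0.74, 0.83)
print(round(positive_f1, 2), round(negative_f1, 2))  # prints: 0.75 0.78
```

Both values agree with the reported F1 scores after rounding to two decimals.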

Although Freedom’s project was a predictive modeling effort, the business is eager to analyze the “fine print” of unstructured text in filings and media reports, looking for red flags to help triage the analysts’ workload.

Use Case - Machine Learning: Advanced Analytics, Structured

Principle: Start with solid advanced analytics capabilities and add “Big Data” for added ROI

Use Case - Speech Analytics: Volume, Variety (Unstructured)

Hypothesis: Determine whether certain words are used more prevalently during a first notice of loss call, which would indicate a fraudulent claim.

Convert first notice of loss call history to text and store it in a big data platform.
Sort call text into two categories: calls that resulted in fraud and calls that did not.
Mine the data for word patterns. Determine if there are differences in word usage between fraudulent and non-fraudulent claims.
Build a model / rules to execute against calls in real time using streaming technology.
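The "mine data for word patterns" step can be sketched as a comparison of word frequencies between the two categories of call text. The example below uses a smoothed log ratio of relative frequencies; the transcripts are invented, and a real effort would need far more data and a proper significance test.

```python
from collections import Counter
import math
import re

def word_counts(transcripts):
    """Count lowercase word tokens across a list of call transcripts."""
    counts = Counter()
    for text in transcripts:
        counts.update(re.findall(r"[a-z']+", text.lower()))
    return counts

def log_odds(fraud_calls, other_calls):
    """Smoothed log ratio of each word's relative frequency, fraud vs. other calls."""
    fraud, other = word_counts(fraud_calls), word_counts(other_calls)
    vocab = set(fraud) | set(other)
    fraud_total = sum(fraud.values()) + len(vocab)   # add-one smoothing
    other_total = sum(other.values()) + len(vocab)
    return {
        word: math.log((fraud[word] + 1) / fraud_total)
              - math.log((other[word] + 1) / other_total)
        for word in vocab
    }

# Hypothetical transcripts, already converted from speech to text and labeled.
fraud_calls = ["my car was stolen last night and I need cash today",
               "the car was stolen and I need the cash fast"]
other_calls = ["a tree fell on my car during the storm",
               "my car was hit in the parking lot"]

scores = log_odds(fraud_calls, other_calls)
top_fraud_words = sorted(scores, key=scores.get, reverse=True)[:3]
```

Positive scores mark words that are relatively more common in the fraud category. On a sample this small, filler words such as "and" score as highly as "stolen", so word usage alone is a noisy signal.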

This will result in false positives! It should be combined with claims, billing, and contact history to enhance the accuracy of the model.

Principle: “Big data” does not replace your existing analytics using your structured data warehouse. Big Data is simply an additional data set which enhances an existing set of capabilities and should not be used out of context.



New Roles, New Skills

Data Analyst / Data Scientist

What is Data Analysis?
How do you recognize patterns in data?
What is the process for inspecting the data?
How do you identify data cleansing and transformation rules?
Why / How do you visualize your findings and information?
How do you manage, manipulate and query large, complex data on Hadoop as an analyst?
What statistical model is most appropriate for the problem scenario? What other type of model is appropriate?

Types of Tools Used

R
SPSS
Tableau
Data Mining tools such as Teradata Miner
Hadoop implementation-specific tools such as BigSQL & BigSheets (IBM)

Other Considerations

Certifications: Certified Analytics Professional from INFORMS
Nationwide / IBM Client Center for Advanced Analytics


Appendix


More Terminology to Learn

Classes of Advanced Analytics Problems

Regression
Classification
Clustering
Forecasting
Optimization
Simulation
Sparse Data Inference
Anomaly Detection
Natural Language Processing
Intelligent Data Design

With a wide range of advanced modeling techniques…

ARMA
CART
CIR++
Compression Nets
Decision Trees
Discrete Time Survival Analysis
D-Optimality
Ensemble Model
Gaussian Mixture Model
Genetic Algorithm
Gradient Boosted Trees
Hierarchical Clustering
Kalman Filter
K-Means
KNN
Linear Regression
Logistic Regression
Monte Carlo Simulation
Multinomial Logistic Regression
Neural Networks
Optimization: LP; IP; NLP
Poisson Mixture Model
Restricted Boltzmann Machine
Sensitivity Trees
SVD, A-SVD, SVD++
SVM
Projection on Latent Structures
Spectral Graph Theory



Presentation

The technologies that deal with big data problems are broad and diverse; it is not just Hadoop.

Big Data Analytics: The Landscape


Touchpoints: Just Two Use Cases