Business Analyst Professional Development Day
What is Advanced Analytics & Big Data?
Business Intelligence, Advanced Analytics and Big Data are often used synonymously, but
they are different and build on
each other from a maturity perspective.
Big Data & Analytics Continuum
Leveraging “Big Data” should be done on a stable foundation
Skills of the Data Analyst / Scientist
New skills and levels of maturity, certifications and training
What is Advanced Analytics?
When did it happen?
Where exactly is the problem?
How do I find the answers?
When should I react?
What actions are needed now?
Why is this happening?
What opportunities am I missing?
What if these trends continue?
How much is needed?
When will it be needed?
What will happen next?
How will it affect my business?
How can we get better?
What is the best decision?
Advanced Analytics combines Business Intelligence technologies with
complex analytic practices that uncover relationships and patterns within
large volumes of historical data. These patterns can then be used to predict future
behavior and events or to improve operational results.
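As a minimal illustration of using historical data to answer "What if these trends continue?", the sketch below fits an ordinary least-squares linear trend to a short series and projects it forward. The series and the function names are hypothetical; real advanced analytics work would use far richer models and data.

```python
def fit_linear_trend(values):
    """Ordinary least-squares fit of y = intercept + slope * t, for t = 0..n-1."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two observations")
    t_mean = (n - 1) / 2
    y_mean = sum(values) / n
    # Covariance of (t, y) and variance of t, up to the common 1/n factor.
    sxy = sum((t - t_mean) * (y - y_mean) for t, y in enumerate(values))
    sxx = sum((t - t_mean) ** 2 for t in range(n))
    slope = sxy / sxx
    intercept = y_mean - slope * t_mean
    return intercept, slope


def forecast(values, steps_ahead=1):
    """Project the fitted trend 'steps_ahead' periods past the last observation."""
    intercept, slope = fit_linear_trend(values)
    return intercept + slope * (len(values) - 1 + steps_ahead)


history = [100, 104, 108, 112, 116]  # hypothetical monthly counts
print(forecast(history))
```

A perfectly linear history such as 100, 104, 108, 112, 116 has slope 4 and intercept 100, so the next period projects to 120.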
“Dealing with information management challenges that don’t natively fit with
traditional approaches to handling the …”
Tom Deutsch (IBM)
What is Big Data? Volume, Variety, Velocity (and sometimes Veracity and Value)
Industry estimates suggest that 80% of enterprise data
is in unmodeled/unstructured forms where it is nearly
inaccessible and traditional modeling does not fit.
Applying text extraction techniques to large, varied sources
such as SEC filings produces new structured metrics
that can be combined with traditional BI data
for analysis and exploration.
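A minimal sketch of turning filing text into structured metrics, assuming a simple keyword approach: count occurrences of a watch-list of phrases and normalize per 1,000 words, producing one row per filing that could then be joined to BI data on a company or filing key. The term list, the `filing_id` field and the function names are hypothetical, not a method taken from the project.

```python
import re

# Hypothetical watch-list of phrases; a real list would come from analysts.
RISK_TERMS = ("going concern", "default", "restatement", "covenant", "impairment")


def tokenize(text):
    """Lower-case the text and keep runs of letters/apostrophes as words."""
    return re.findall(r"[a-z']+", text.lower())


def extract_metrics(filing_id, text, terms=RISK_TERMS):
    """Turn free text into one structured row: per-term counts plus a rate per 1,000 words."""
    normalized = " ".join(tokenize(text))
    word_count = len(normalized.split())
    row = {"filing_id": filing_id, "word_count": word_count}
    hits = 0
    for term in terms:
        # Count whole-phrase matches bounded by word boundaries.
        n = len(re.findall(r"\b" + re.escape(term) + r"\b", normalized))
        row[term.replace(" ", "_")] = n
        hits += n
    row["risk_terms_per_1000_words"] = 1000 * hits / word_count if word_count else 0.0
    return row


sample = ("The auditor expressed substantial doubt about the company's ability "
          "to continue as a going concern. A covenant default may occur.")
print(extract_metrics("F-1", sample))
```

Keyword counting is only the simplest form of text extraction; entity extraction or classification models would be natural next steps.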
Text is also trapped in large description fields in our
operational data stores, such as Claims.
Data captured or streamed today in operational data stores
and Data Warehouses
NEW internal Data
not previously captured
(e.g., emails, clickstream, mobile,
telematics, unstructured notes from
agents or claims adjusters)
NEW External Data
(e.g., internet, social networks,
demographic, local economy, price
elasticity, mobile location stream,
localized competitor intelligence)
Comprehensive advanced analytics have been built
around marketing, product and pricing, and other
areas of the business, but they are mostly disconnected, and some
use rudimentary technologies that are inefficient and
focused mainly on moving data rather than on getting
value out of it.
Where has Nationwide been, and where can we go?
Big Data & Analytics Continuum
What is the most likely answer?
What is the right question?
Alerts & Drill Down
Ad hoc Reports
Big Data Platforms
RDBMS and Integration
What’s the next best action?
What will happen when and why?
What could happen?
What if these trends continue?
What has happened and why?
How many, how often, who & where?
How do I integrate new data sources?
How is data managed and stored?
When entering the Big Data space, pay close attention to your foundational competencies. Information
Management capabilities such as data integration, extensible data modeling, data quality and
data governance become even more important when dealing with these new, uncertain, high-volume
data sources. Additionally, to achieve the full …, you must have a mature analytics
methodology, appropriately skilled resources and technology.
Accrual Score (Bankruptcy) Prediction
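The machine learning technique called Support
Vector Machine (SVM) was selected. This
supervised learning technique takes a set of factors
in a training set of labeled results and constructs a
model that separates the labeled outcomes with the widest possible margin, which is then used to classify new cases.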
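The project itself used R; purely to illustrate what "constructs a model from labeled results" means, the following is a minimal linear SVM trained with Pegasos-style stochastic sub-gradient descent on the hinge-loss objective, written in plain Python. The toy two-feature data, the parameter values (`lam`, `iterations`) and the function names are assumptions standing in for the S&P financial factors; this is not the project's actual model.

```python
import random


def dot(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))


def train_linear_svm(X, y, lam=0.1, iterations=5000, seed=0):
    """Pegasos-style stochastic sub-gradient descent for a linear SVM.

    X: list of feature vectors; y: labels in {-1, +1}.
    A constant 1.0 feature is appended so the last weight acts as a bias
    (simplification: the bias is regularized along with the other weights).
    """
    rng = random.Random(seed)
    data = [list(x) + [1.0] for x in X]
    w = [0.0] * len(data[0])
    for t in range(1, iterations + 1):
        i = rng.randrange(len(data))
        eta = 1.0 / (lam * t)                # decreasing step size
        margin = y[i] * dot(w, data[i])      # margin under the current weights
        w = [(1 - eta * lam) * wi for wi in w]  # shrink: gradient of the L2 penalty
        if margin < 1:
            # Hinge loss is active: step toward classifying this example correctly.
            w = [wi + eta * y[i] * xi for wi, xi in zip(w, data[i])]
    return w


def predict(w, x):
    return 1 if dot(w, list(x) + [1.0]) >= 0 else -1


# Hypothetical, linearly separable toy data (+1 = adverse event, -1 = no event).
X = [(2, 2), (3, 1), (2, 3), (1, 3), (-2, -2), (-3, -1), (-2, -3), (-1, -3)]
y = [1, 1, 1, 1, -1, -1, -1, -1]
w = train_linear_svm(X, y)
print(w, predict(w, (2.5, 2.5)), predict(w, (-2.5, -2.5)))
```

Production work would normally use a tested library and a kernel or feature scaling as needed; the point here is only the margin-based training loop.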
Cross Business Interest
Freedom Specialty Insurance
Enterprise Applications Investments
NF opportunities just beginning to be explored
Open Source R was chosen to accelerate the model development
process for the intern. Several external R packages were added to extend
its capability as a desktop tool. Supplemental
data preparation of the S&P financial data was handled with
various scripts and spreadsheets.
The project will provide knowledge transfer to Freedom Specialty, which
currently intends to implement the model in SAS.
Results (Jan 2013): positive precision 0.81
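Precision answers the question: of the cases the model flagged as positive, what fraction really were positive? A precision of 0.81 therefore means that roughly 81 of every 100 flags were correct, which matters when flags are used to triage analyst workload. The sketch below computes precision (and recall) from actual and predicted labels; the counts in the usage are made up to reproduce a 0.81 figure and are not the project's data.

```python
def precision_recall(actual, predicted, positive=1):
    """Return (precision, recall) for the given positive label."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if p == positive and a == positive)  # correct flags
    fp = sum(1 for a, p in pairs if p == positive and a != positive)  # false alarms
    fn = sum(1 for a, p in pairs if p != positive and a == positive)  # missed cases
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Made-up example: 100 cases flagged (81 correct, 19 false alarms),
# 10 cases not flagged (9 of them actually positive).
actual = [1] * 81 + [0] * 19 + [1] * 9 + [0] * 1
predicted = [1] * 100 + [0] * 10
print(precision_recall(actual, predicted))
```

Here precision is 81 / (81 + 19) = 0.81 and recall is 81 / (81 + 9) = 0.9.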
Although Freedom’s project was a predictive
modeling effort, the business is eager to
pursue analyzing the “fine print” of unstructured
text in filings and media reports looking for red
flags to help triage the workload for analysts.
Machine Learning: Advanced Analytics, Structured
Principle: Start with solid advanced analytics
capabilities, then add “Big Data” for added …
Speech Analytics: Volume, Variety (Unstructured)
Determine whether certain words used more
frequently during a first notice of loss call
indicate a fraudulent claim.
Convert first notice of loss call history to text and store it in the big data platform.
Sort the call text into two categories: calls that resulted in fraud and
calls that did not.
Mine the data for word patterns. Determine whether word usage differs
between fraudulent and non-fraudulent calls.
Build a model / rules to execute against calls in real time using streaming analytics.
This will result in false
positives! The model should be
combined with claims, billing and
contact history to improve
its accuracy.
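The mining and rule-building steps above can be sketched, under simplifying assumptions, as a smoothed log-odds comparison of word frequencies between fraudulent and non-fraudulent call text, followed by a simple rule that flags a new transcript containing enough of the most fraud-associated words. The log-odds technique, the toy transcripts, the thresholds (`top_n`, `min_hits`) and the function names are all hypothetical choices; the deck does not specify a method, and a real streaming deployment would run such a rule inside the streaming platform rather than as a plain function.

```python
import math
import re
from collections import Counter


def words(text):
    return re.findall(r"[a-z']+", text.lower())


def log_odds_by_word(fraud_calls, clean_calls, alpha=1.0):
    """Add-alpha smoothed log-odds of each word in fraud vs. non-fraud call text."""
    fraud = Counter(w for call in fraud_calls for w in words(call))
    clean = Counter(w for call in clean_calls for w in words(call))
    vocab = set(fraud) | set(clean)
    f_total = sum(fraud.values()) + alpha * len(vocab)
    c_total = sum(clean.values()) + alpha * len(vocab)
    return {
        w: math.log((fraud[w] + alpha) / f_total) - math.log((clean[w] + alpha) / c_total)
        for w in vocab
    }


def build_rule(scores, top_n=2):
    """Keep the words most over-represented in fraudulent calls (ties broken alphabetically)."""
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return {w for w, s in ranked[:top_n] if s > 0}


def flag_call(transcript, flag_words, min_hits=2):
    """Rule applied to a live transcript: flag it if it contains enough flag words."""
    return len(set(words(transcript)) & flag_words) >= min_hits


# Hypothetical toy transcripts.
fraud_calls = ["my cash was stolen from the car",
               "the cash and jewelry were stolen",
               "somebody stole cash yesterday i think"]
clean_calls = ["a tree fell on the car",
               "the car was hit in the parking lot",
               "water leaked into the basement"]
rule = build_rule(log_odds_by_word(fraud_calls, clean_calls))
print(rule, flag_call("They said the cash was stolen from the garage", rule))
```

No stemming is applied, so "stole" and "stolen" count as different words; with so little data, common words rank highly by chance, which is exactly why such rules produce false positives and should be combined with other data.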
Principle: “Big Data” does not replace your existing analytics that use your structured data
warehouse. Big Data is simply an additional data set that enhances an existing set of
capabilities and should not be used out of context.
Data Analyst / Data Scientist
What is Data Analysis?
How do you recognize patterns in data?
What is the process for inspecting the data?
How do you identify data cleansing and transformation needs?
Why / How do you visualize your findings?
How do you manage, manipulate and
query large, complex data on …
What statistical model is most
appropriate for the problem scenario?
What other type of model is appropriate?
New Roles, New Skills
Types of Tools Used
Data Mining tools such as …
Implementation-specific tools
Certifications: Certified Analytics
Professional (CAP) from INFORMS
Nationwide / IBM Client Center for …
More Terminology to Learn
Classes of Advanced Analytics
With a wide range of advanced …
Discrete Time Survival
Gaussian Mixture Model
Gradient Boosted Trees
Monte Carlo Simulation
Multinomial Logistic Regression
Optimization: LP; IP; NLP
Poisson Mixture Model
Restricted Boltzmann Machine
Projection on Latent Structures
Spectral Graph Theory
Sparse Data Inference
Intelligent Data Design
The technologies that deal with big data
problems are broad and diverse; it is not …
Big Data Analytics
Just Two Use Cases