

Nov 20, 2013




Data Mining


What is data mining?



It is a data modeling process that covers a broad range of techniques
used in a variety of industries for marketing, risk management, and
customer relationship management.




The success of any modeling project requires not only a good
understanding of the methodologies but also solid knowledge of the data,
the market, and the overall business objectives.



Effective use of data mining techniques is a delicate blend of art and
science.



Steps for Preparing a Data Mining Project


1. Setting the Objectives
2. Selecting the Data Sources
3. Preparing the Data for Modeling
4. Selecting and Transforming the Variables
5. Processing and Evaluating the Model
6. Validating the Model
7. Implementing and Maintaining the Model
8. Applications




Defining the Goal



To measure or to predict?



Predictive models estimate values that represent future activity.



A descriptive model creates rules that are used to group subjects into
descriptive categories.




From a business point of view, companies use predictive and
descriptive models to attract and retain profitable customers.
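The difference between the two model types can be made concrete with a minimal sketch. It assumes a toy customer record with two hypothetical fields, `tenure_months` and `monthly_spend`. The descriptive model is a fixed set of rules that places each customer in a named category. The predictive model returns a number estimating future activity: a linear score passed through a logistic function. Its coefficients are made up for illustration and have not been fitted to any data.

```python
import math
from dataclasses import dataclass

@dataclass
class Customer:
    tenure_months: int      # hypothetical field: months since first purchase
    monthly_spend: float    # hypothetical field: average spend per month

def describe(c: Customer) -> str:
    """Descriptive model: rules that group a customer into a category."""
    if c.tenure_months >= 24 and c.monthly_spend >= 100:
        return "loyal high-value"
    if c.tenure_months >= 24:
        return "loyal low-value"
    if c.monthly_spend >= 100:
        return "new high-value"
    return "new low-value"

def predict_purchase_probability(c: Customer) -> float:
    """Predictive model: estimated chance of a future purchase (illustrative, unfitted coefficients)."""
    z = -3.0 + 0.05 * c.tenure_months + 0.01 * c.monthly_spend
    return 1.0 / (1.0 + math.exp(-z))

alice = Customer(tenure_months=30, monthly_spend=150.0)
print(describe(alice))                                # loyal high-value
print(round(predict_purchase_probability(alice), 3))  # 0.5
```

The descriptive output is a label that is useful for grouping. The predictive output is a score that can be ranked and acted on.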





One way to determine the objective of a targeting model or profiling
project is to ask the following questions:



Do you want to attract new customers?
Do you want those new customers to be profitable?
Do you want to avoid high-risk customers?
Do you want to understand the characteristics of your current customers?
Do you want to make your unprofitable customers more profitable?
Do you want to retain your profitable customers?
Do you want to win back your lost customers?
Do you want to improve customer satisfaction?
Do you want to increase sales?
Do you want to reduce expenses?




Do you want to attract new customers?
    Targeted response modeling

Do you want those new customers to be profitable?
    Lifetime value modeling

Do you want to avoid high-risk customers?
    Risk or approval models

Do you want to understand the characteristics of your current customers?
    Segmentation and profile analysis

Do you want to make your unprofitable customers more profitable?
    Cross-sell and up-sell targeting models

Do you want to retain your profitable customers?
    Retention or churn models

Do you want to win back your lost customers?
    Win-back models

Do you want to improve customer satisfaction?

Do you want to increase sales?

Do you want to reduce expenses?




Some Terminology


Profile Analysis


It measures common characteristics within a population of interest.
Demographics as well as consumption behaviors are typically the key
variables to be analyzed.
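A profile can be illustrated with a minimal sketch. It assumes a tiny hypothetical population whose field names and values are invented. Numeric variables (demographics and one consumption measure) are summarized by their means. Yes/no flags are summarized by the share of the population that has them.

```python
from statistics import mean

# Hypothetical population of interest: demographics plus one consumption variable.
population = [
    {"age": 34, "income": 52000, "has_children": True,  "annual_spend": 1200},
    {"age": 45, "income": 78000, "has_children": True,  "annual_spend": 2100},
    {"age": 29, "income": 41000, "has_children": False, "annual_spend": 800},
    {"age": 52, "income": 95000, "has_children": False, "annual_spend": 2600},
]

def profile(records):
    """Summarize common characteristics: means for numeric fields, shares for flags."""
    summary = {}
    for key in records[0]:
        values = [r[key] for r in records]
        if isinstance(values[0], bool):   # check bool first: bool is a subclass of int
            summary[key + "_share"] = sum(values) / len(values)
        else:
            summary[key + "_mean"] = mean(values)
    return summary

print(profile(population))
```

To see what sets a group apart, one would typically compare the profile of the group of interest (for example, responders) with the profile of the whole population.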



Segmentation


Segmentation uses profile analysis to separate customers by
profitability and market potential, or by profit and risk.
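A simple way to separate customers by profitability and market potential is a 2x2 grid. The minimal sketch below assumes each customer already has a profit figure and a potential score between 0 and 1. Both the scores and the cut-offs (500 and 0.5) are hypothetical.

```python
from collections import Counter

def segment(profit: float, potential: float,
            profit_cut: float = 500.0, potential_cut: float = 0.5) -> str:
    """Place a customer in one cell of a 2x2 profit/potential grid (cut-offs are illustrative)."""
    p = "high profit" if profit >= profit_cut else "low profit"
    m = "high potential" if potential >= potential_cut else "low potential"
    return f"{p} / {m}"

# Hypothetical customers: (annual profit, market-potential score)
customers = [(820.0, 0.7), (120.0, 0.8), (650.0, 0.2), (90.0, 0.1), (40.0, 0.6)]
counts = Counter(segment(profit, potential) for profit, potential in customers)
for name, n in sorted(counts.items()):
    print(name, n)
```

Each cell can then receive its own treatment. For example, the "low profit / high potential" cell is a natural target for cross-sell or up-sell efforts.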


Response


The goal of a response model is to predict who will be responsive to
an offer for a product or a service.
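A response model is usually a classifier whose scores are used to rank prospects. The sketch below is one way to build it, not necessarily the author's. It fits logistic regression by plain batch gradient descent, using a single made-up predictor (number of prior purchases) and a made-up response history. A statistics package would normally be used in practice. The learning rate and epoch count are arbitrary choices.

```python
import math

def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))

def response_probability(w, x):
    """Score one prospect: estimated probability of responding to the offer."""
    return sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], x)))

def fit_logistic(X, y, lr=0.1, epochs=5000):
    """Logistic regression fitted by batch gradient descent on the log-loss.

    Returns weights with the intercept first.
    """
    w = [0.0] * (len(X[0]) + 1)
    n = len(X)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for xi, yi in zip(X, y):
            err = response_probability(w, xi) - yi   # predicted minus observed
            grad[0] += err
            for j, xj in enumerate(xi):
                grad[j + 1] += err * xj
        w = [wj - lr * g / n for wj, g in zip(w, grad)]
    return w

# Hypothetical history: prior purchases, and whether the customer responded (1) or not (0).
X = [[0], [1], [1], [2], [3], [3], [4], [5]]
y = [0, 0, 1, 0, 1, 1, 1, 1]
w = fit_logistic(X, y)

prospects = {"A": [0], "B": [2], "C": [4]}
ranked = sorted(prospects, key=lambda k: response_probability(w, prospects[k]), reverse=True)
print(ranked)  # ['C', 'B', 'A']
```

The offer would then go to the top of the ranked list, down to whatever depth the budget or expected profit justifies.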


Risk


Approval or risk models are unique to industries, such as banking and
insurance, that assume a potential for loss when offering a product or
service.


Activation


Activation models predict whether a prospect will become a
full-fledged customer.


Cross-sell and Up-sell




Cross-sell models are used to predict the probability or value of a
current customer buying a different product or service from the same
company.

Up-sell models predict the probability or value of a customer buying
more of the same products or services.


Attrition


Attrition is defined as a decrease in the use of a product or service.
The goal is to predict the reduction or termination of the use of a
product or service after an account has been activated.
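Before an attrition model can be fitted, each activated account needs a target flag marking whether it attrited. The minimal sketch below makes three assumptions. Usage is counted per account in two consecutive periods. An account missing from the later period has stopped using the product. A drop of 50% or more counts as attrition. That threshold is an illustrative choice, not something the source specifies.

```python
def attrition_flags(usage_before, usage_after, drop_threshold=0.5):
    """Flag accounts whose usage fell by at least `drop_threshold` (a fraction) between two periods.

    Accounts with no usage in the base period were never activated and are skipped.
    """
    flags = {}
    for account, before in usage_before.items():
        if before <= 0:
            continue  # not active in the base period
        after = usage_after.get(account, 0)  # missing from later period = stopped using
        flags[account] = (before - after) / before >= drop_threshold
    return flags

before = {"acct1": 10, "acct2": 8, "acct3": 0, "acct4": 5}
after = {"acct1": 9, "acct2": 2, "acct4": 0}
print(attrition_flags(before, after))  # {'acct1': False, 'acct2': True, 'acct4': True}
```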


Net Present Value


A net present value (NPV) model attempts to predict the overall
profitability of a product for a predetermined length of time.



Lifetime Value


A lifetime value model attempts to predict the overall profitability of
a customer for a predetermined length of time.
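The NPV of a product and the lifetime value of a customer both come down to adding up projected profits over the predetermined horizon, with each future amount discounted to today. The sketch below assumes standard discounting: yearly periods, profits received at the end of each year, and a constant discount rate. The 10% rate and the profit stream are hypothetical.

```python
def present_value(profits, rate):
    """Discount projected profits for years 1, 2, ... back to today and sum them."""
    return sum(p / (1 + rate) ** t for t, p in enumerate(profits, start=1))

# Hypothetical 3-year profit projection for one customer, discounted at 10% per year.
ltv = present_value([100.0, 100.0, 100.0], rate=0.10)
print(round(ltv, 2))  # 248.69
```

Modeling the projected profit stream itself (how much, for how long) is where the predictive work lies. The discounting step is only arithmetic.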



Choose the Modeling Methodology


(Details to be discussed)


Linear Regression




Logistic Regression




Multivariate Techniques for Clustering and Classification




Neural Networks




Genetic Algorithms



Classification Trees




For an analytic approach to succeed, every area of the company must be
willing to work toward the same goals; teamwork among the finance,
accounting, marketing, and information technology groups is especially
important.






Selecting the Data Source




There are three basic types of data:


Demographic data: provides a description of personal or household
characteristics.

Gender, age, marital status, income, home ownership, dwelling type,
education level, ethnicity, presence of children, …


Behavior data: records or measurements of actions or behavior.

Sales amount, types and dates of purchases, payment patterns,
customer service activities, insurance claims, bankruptcy behavior, …


Psychographic or attitudinal data: provides an indication of intended
behavior and is characterized by opinions, lifestyle characteristics, or
personal values.






Sources of Data: Internal Sources


Customer Database

Customer ID, household ID, account number, customer name, address,
phone number, demographics, products or services, offer details, model
scores, …

Each customer has one record.


Transaction Database

Customer ID, account number, sales activity, date of activity

Each transaction has one record.


Offer history database


This contains details about offers made to prospects, customers or
both.



Data warehouse

A data warehouse is a structure that links information from two or
more databases.

It is usually effective to integrate all internal databases into an
information data mart for general applications.
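Linking the customer database with the transaction database can be sketched as follows. The sketch assumes small hypothetical in-memory extracts keyed by customer ID. Transactions are rolled up to one row per customer and merged with that customer's record, which gives one modeling row per customer. In a real data warehouse this would normally be done with SQL joins and aggregations.

```python
from collections import defaultdict

# Hypothetical extracts: one record per customer, one record per transaction.
customers = {
    101: {"name": "A. Smith", "state": "NY"},
    102: {"name": "B. Jones", "state": "CA"},
}
transactions = [
    {"customer_id": 101, "amount": 40.0, "date": "2013-01-05"},
    {"customer_id": 101, "amount": 60.0, "date": "2013-02-11"},
    {"customer_id": 102, "amount": 25.0, "date": "2013-02-20"},
]

EMPTY = {"n_purchases": 0, "total_sales": 0.0, "last_date": None}

def build_mart(customers, transactions):
    """Link the two databases on customer ID and roll transactions up to one row per customer."""
    totals = defaultdict(lambda: dict(EMPTY))
    for t in transactions:
        agg = totals[t["customer_id"]]
        agg["n_purchases"] += 1
        agg["total_sales"] += t["amount"]
        if agg["last_date"] is None or t["date"] > agg["last_date"]:
            agg["last_date"] = t["date"]  # ISO dates compare correctly as strings
    return {cid: {**info, **totals.get(cid, EMPTY)} for cid, info in customers.items()}

mart = build_mart(customers, transactions)
print(mart[101])
```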



External Sources: list sellers and compilers, for acquiring new
customers




Selecting the best data for targeting model development requires
a thorough understanding of the market and the objective.



More and more companies are forming affinity relationships with
other companies to pool resources and increase profits.



Tip: strive to make the population from which the data are extracted
representative of the population to be scored.


Data for prospecting: data from a prior campaign for the same product
and to the same group is the optimal choice for any targeting model.
Often, new lists can be obtained from list providers or through
alliances.



Data for customer models

Phone surveys of existing customers for cross-sell analysis.



Data for risk models

Credit and insurance risk data.

Usually, it is more effective to use both internal and external data
sources to build risk models over a pre-specified period.


Population and Sampling Methods

Simple random sampling, stratified random sampling
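The two sampling methods named above differ in one respect. Simple random sampling gives every record the same chance of selection. Stratified random sampling draws separately from each group (stratum), so small groups such as responders are guaranteed their share. The minimal sketch below uses Python's `random` module on a hypothetical population of 1,000 records, 100 of which are responders. The rule of keeping at least one record per stratum is an added assumption.

```python
import random
from collections import defaultdict

def simple_random_sample(records, n, seed=None):
    """Every record has the same chance of being selected."""
    return random.Random(seed).sample(records, n)

def stratified_sample(records, stratum_of, fraction, seed=None):
    """Draw the same fraction from each stratum (at least one record per stratum)."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for r in records:
        strata[stratum_of(r)].append(r)
    sample = []
    for members in strata.values():
        k = max(1, round(fraction * len(members)))
        sample.extend(rng.sample(members, k))
    return sample

population = [{"id": i, "responder": i < 100} for i in range(1000)]
srs = simple_random_sample(population, 100, seed=1)
strat = stratified_sample(population, lambda r: r["responder"], 0.10, seed=1)
print(len(srs), sum(r["responder"] for r in strat))  # 100 10
```

The simple random sample contains about 10 responders on average, but the exact number varies from draw to draw. The stratified sample always contains exactly 10.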



Prepare the data for modeling


Fixed format versus variable format




Qualitative data versus Quantitative data




Nominal data versus Cardinal data




Interval data versus Continuous data
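The first distinction above, fixed versus variable format, is about how each record is laid out. The sketch below assumes the usual reading of the terms. In a fixed format, each field occupies the same character positions in every record. In a variable format, fields are separated by a delimiter, so their widths can vary. The column layout is hypothetical.

```python
import csv
import io

# Hypothetical fixed-format layout: (field name, start column, end column), 0-based, end exclusive.
LAYOUT = [("customer_id", 0, 6), ("age", 6, 9), ("income", 9, 17)]

def parse_fixed(line: str) -> dict:
    """Fixed format: each field sits in the same character positions in every record."""
    return {name: line[start:end].strip() for name, start, end in LAYOUT}

def parse_variable(text: str) -> list:
    """Variable format: fields are separated by a delimiter (a comma here)."""
    return list(csv.DictReader(io.StringIO(text)))

fixed = parse_fixed("000101 34   52000")
variable = parse_variable("customer_id,age,income\n101,34,52000\n")
print(fixed)
print(variable)
```

Both parsers return every field as text. Converting fields to numbers (quantitative data) or keeping them as codes (qualitative data) is a separate step.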



Cleaning the Data



Examine data for possible errors, outliers, and missing values.



Need examples here for (a) identifying errors and outliers, (b)
methods of dealing with outliers, and (c) methods to replace missing
values.


Continuous and categorical variables should be discussed separately.
For continuous variables, unusual data points are normally deleted or
replaced; for categorical variables, missing data can often be treated
as a new category.
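Here is one possible set of examples for (a), (b), and (c), as a minimal sketch. These are common choices, not the only ones.

(a) Values outside a plausible range, such as a negative age, are treated as errors and set to missing. Outliers are identified with Tukey's fences (1.5 times the interquartile range beyond the quartiles).

(b) Outliers are capped at the fences rather than deleted.

(c) Missing continuous values are replaced by the median. Missing categorical values become their own category, as the text suggests.

The valid range and the sample data are hypothetical.

```python
from statistics import median, quantiles

def iqr_fences(values, k=1.5):
    """Tukey fences: points beyond Q1 - k*IQR or Q3 + k*IQR are flagged as outliers."""
    q1, _, q3 = quantiles(values, n=4)
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

def clean_continuous(values, valid_range):
    """(a) drop impossible values, (b) cap outliers at the fences, (c) impute missing with the median."""
    lo_valid, hi_valid = valid_range
    # (a) Out-of-range values are errors: treat them as missing.
    vals = [v if v is not None and lo_valid <= v <= hi_valid else None for v in values]
    observed = [v for v in vals if v is not None]
    lo, hi = iqr_fences(observed)
    fill = median(observed)
    # (b) Cap outliers; (c) replace missing values with the median.
    return [fill if v is None else min(max(v, lo), hi) for v in vals]

def clean_categorical(values, missing_label="Missing"):
    """For categorical data, missing values become their own category."""
    return [missing_label if v in (None, "") else v for v in values]

ages = [25, 31, None, 42, 38, 29, 35, 95, -3, 33]
print(clean_continuous(ages, valid_range=(0, 120)))
# [25, 31, 34.0, 42, 38, 29, 35, 58.25, 34.0, 33]
print(clean_categorical(["A", None, "B", ""]))  # ['A', 'Missing', 'B', 'Missing']
```

Deleting records instead of capping or imputing is also an option. However, it shrinks the modeling sample, and it can bias the sample if the missing values are not random.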







Defining the Objective Function and Variable Selection




With respect to the goal of the study, a specific objective should be
formulated.


For instance, the net present value (NPV) of a product:

$$\mathrm{NPV} = P(\text{activation}) \times \text{risk index} \times \text{profit} - \text{marketing expenses}$$


Each component of the above formulation needs to be modelled and
estimated.


The overall NPV should be estimated by combining the estimates for the
segments that make up the entire market.
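To combine segment-level estimates into an overall NPV, apply the formula per prospect (probability of activation times risk index times profit, minus marketing expense) within each segment, multiply by the number of prospects in the segment, and sum. The sketch below does this with hypothetical segments and component values. It also assumes a risk index of 1.0 means average risk, which is an illustrative convention only. In practice, each component would come from its own model.

```python
from dataclasses import dataclass

@dataclass
class Segment:
    name: str
    prospects: int            # number of prospects in the segment
    p_activation: float       # modeled probability of activation
    risk_index: float         # modeled risk adjustment (1.0 = average risk, by assumption)
    profit: float             # modeled profit per activated account
    marketing_expense: float  # cost of marketing to one prospect

    def npv_per_prospect(self) -> float:
        return self.p_activation * self.risk_index * self.profit - self.marketing_expense

segments = [
    Segment("A", 10_000, 0.04, 1.10, 300.0, 5.0),
    Segment("B", 20_000, 0.02, 0.90, 250.0, 5.0),
    Segment("C", 5_000, 0.01, 0.60, 200.0, 5.0),
]
total = sum(s.prospects * s.npv_per_prospect() for s in segments)
profitable = [s.name for s in segments if s.npv_per_prospect() > 0]
print(round(total), profitable)  # 53000 ['A']
```

Breaking the total down by segment shows which parts of the market add value. Here, marketing only to segment A would raise the total from about 53,000 to about 82,000.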


Methods of variable reduction: ratios, summarization, aggregation.

Segmentation, transformation of data.

Building linear predictors, interactions, threshold models.
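Each of the techniques above turns raw fields into a smaller set of more informative predictors. The sketch below shows them on one credit-style record. The field names, the log transform, and the 0.8 utilization cut-off are hypothetical choices made for illustration.

```python
import math

def derive_features(r: dict) -> dict:
    """Collapse raw fields into a few derived predictors (field names are hypothetical)."""
    monthly = r["monthly_balances"]
    avg_balance = sum(monthly) / len(monthly)          # aggregation / summarization
    utilization = avg_balance / r["credit_limit"]      # ratio
    return {
        "avg_balance": avg_balance,
        "utilization": utilization,
        "log_income": math.log1p(r["income"]),         # transformation of a skewed variable
        "util_x_tenure": utilization * r["tenure_years"],  # interaction
        "high_util": 1 if utilization > 0.8 else 0,    # threshold (cut-off is illustrative)
    }

record = {"monthly_balances": [400.0, 600.0, 500.0], "credit_limit": 1000.0,
          "income": 52000.0, "tenure_years": 3}
features = derive_features(record)
print(features)
```

Three monthly balances and a credit limit become one utilization ratio. This is the kind of reduction that keeps the candidate-variable list manageable before model selection.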










Model selection


Criteria


Stepwise, forward, backward






Stability and homogeneity of results
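Of the selection procedures listed, forward selection is the simplest to sketch. Start with an intercept-only model. At each step, add the candidate variable that improves the criterion most, and stop when no candidate improves it. The criterion here is AIC for an ordinary least squares fit, which is one common choice, assumed here. Backward selection runs the same loop by removing variables, and stepwise selection allows both moves. The data are constructed so that y depends on x1 and x2, while x3 does not enter the formula for y.

```python
import math

def ols_sse(columns, y):
    """Residual sum of squares from an OLS fit of y on an intercept plus `columns`."""
    n = len(y)
    cols = [[1.0] * n] + list(columns)
    k = len(cols)
    # Normal equations (X'X) b = X'y.
    A = [[sum(a[t] * b[t] for t in range(n)) for b in cols] for a in cols]
    c = [sum(a[t] * y[t] for t in range(n)) for a in cols]
    # Gaussian elimination with partial pivoting.
    for p in range(k):
        piv = max(range(p, k), key=lambda r: abs(A[r][p]))
        A[p], A[piv] = A[piv], A[p]
        c[p], c[piv] = c[piv], c[p]
        for r in range(p + 1, k):
            f = A[r][p] / A[p][p]
            for q in range(p, k):
                A[r][q] -= f * A[p][q]
            c[r] -= f * c[p]
    coef = [0.0] * k
    for p in reversed(range(k)):
        coef[p] = (c[p] - sum(A[p][q] * coef[q] for q in range(p + 1, k))) / A[p][p]
    fitted = [sum(coef[j] * cols[j][t] for j in range(k)) for t in range(n)]
    return sum((yt - ft) ** 2 for yt, ft in zip(y, fitted))

def aic(candidates, names, y):
    """AIC of a Gaussian linear model, up to an additive constant."""
    n = len(y)
    return n * math.log(ols_sse([candidates[v] for v in names], y) / n) + 2 * (len(names) + 1)

def forward_select(candidates, y):
    """Repeatedly add the variable that lowers AIC most; stop when none does."""
    selected = []
    best = aic(candidates, selected, y)
    while len(selected) < len(candidates):
        scores = {v: aic(candidates, selected + [v], y) for v in candidates if v not in selected}
        v_best = min(scores, key=scores.get)
        if scores[v_best] >= best:
            break
        selected.append(v_best)
        best = scores[v_best]
    return selected

# Constructed example: y = 2*x1 + 3*x2 plus a small disturbance; x3 is not in the formula.
candidates = {
    "x1": [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
    "x2": [3, 1, 4, 2, 5, 9, 2, 6, 5, 3],
    "x3": [2, 7, 1, 6, 2, 8, 1, 8, 2, 8],
}
y = [11.1, 6.9, 17.9, 14.1, 25, 39, 20, 34, 33, 29]
print(forward_select(candidates, y))
```

The selected variables should still be checked on validation data, since a greedy search can overfit when there are many candidates.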




Validation of the model (model checking)


Splitting the data for fitting and validation




Resampling methods


Model Modifications
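The two validation approaches listed can be sketched with row indices only. A holdout split shuffles the rows and sets aside a fraction (30% here, an arbitrary choice) for validation. As a resampling method, k-fold cross-validation splits the rows into k folds, then uses each fold once for validation while fitting on the rest. The fold count and seeds are hypothetical defaults.

```python
import random

def train_validation_split(n, validation_fraction=0.3, seed=0):
    """Shuffle row indices and hold out a fraction of them for validation."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    cut = int(round(n * (1 - validation_fraction)))
    return idx[:cut], idx[cut:]

def k_fold_indices(n, k=5, seed=0):
    """Yield (fit, validate) index lists so that every row is validated exactly once."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    for i in range(k):
        fit = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield fit, folds[i]

fit_rows, val_rows = train_validation_split(10)
for fit, val in k_fold_indices(10, k=5):
    print(len(fit), len(val))
```

A model whose validation results vary widely across folds is unstable. That instability is a signal that the model needs modification before implementation.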





Implementing and maintaining the model


Once a useful model is built, its parameters should be updated and its
validation measures examined periodically.