# Some simple data mining on project 1 with SPSS

Διαχείριση Δεδομένων

20 Νοε 2013 (πριν από 4 χρόνια και 6 μήνες)

191 εμφανίσεις

Project Idea from MRKT 350 (courtesy Dwayne Ball)

Some simple data mining on project 1 with SPSS

This describes some simple analyses you can do with SPSS to explore the differences between people in your test
sample that ordered and people that didn’t.

1.

There are two data fil
es

(Master Customer File1 and 2)

in the “
M
arketing

350
” folder of the
public “P”

drive
.
Simply double
-
click on either of them and the data will be opened in SPSS.

2.

Bring up the data window in SPSS

Now that a dataset is on your hard drive, double
-
click o
n it in order to open the data window, which will show you
the data window. Examine each variable and look at the individual data points in order to understand what each
variable is and what each value means.

3.

Look at the frequencies or distributions of v
arious variables.

Click on the
Analyze

pull
-
Descriptive Statistics
, then to
Descriptives
. When the dialogue
box comes up, select all the variabl
es into the “variables” list.

This will give you some
basic

statistics

mean,
standa
rd deviation, minimum, and maximum
.

Note that the mean of a binary variable coded as either 0 or 1 is
simply the proportion with a response of 1.

When you specify an analysis, the output window will automatically open, showing the results of your analys
es.
As you do more analyses, the output is added to the bottom of the output window.

You can save this output, and
even copy tables from an SPSS output window into MS Word.

4.

Explore relationships with the response variable.

Again click on the
Analy
ze

pull
-
Compare Means
, and then on
Means
. Try looking at
all the

variables
(other than CUSTID) averages

against the promotional response,
WINETAST
.

Select all the variables except CUSTID and WINETAST and put them in the “Dependent v
ariables” box. You can
do this by clicking on the first variable you want, then holding down the shift key and clicking on the last variable
you want, highlighting all of them. Then click the arrow next to the dependent variables box. Put WINETAST in
t
he independent variables box, and click OK.

As with a lot of SPSS procedures, the first table that comes up is a “case processing summary” to check and make
sure that there were no problems with the analysis. Just make sure that it appears that all 10,00
0 cases were
processed for each variable chosen, except for the questionnaire items, which will have in the neighborhood of 1,000
cases.

Now, look at the second table, which will be much too long to fit on the

page
, but you can use the bottom scroll bar
t
o see the rest of it. The first row represents the people who did not respond to the invitation, the second row
represents the people who did respond, and the last row is the total. When a variable, like KIDHOME is binary
(0,1), the mean in the box for
the first row, for example, represents the fraction of people

that did not accept the
invitation who have a child under 13 at home, and the number in the box below represents the fraction of people
who DID accept the invitation who have a child at home. I
t is the contrast between these two numbers

the mean in
the first row and the mean in the second row
-

that is important in generating ideas about what differentiates those
who accept the invitation from those who don’t.

Look for big differences. For e
xample, if the mean household income for people who did not accept the invitation
was \$80,000, and the mean income for those that did accept the invitation was \$120,000, and the standard deviations
for those two means were around \$15,000, that means that
the difference between the two groups is almost 3
standard deviations. That’s a big difference. On the other hand, if the mean difference was only between \$85,000
Project Idea from MRKT 350 (courtesy Dwayne Ball)

and \$80,000, that would only be 1/3 of a standard deviation

not very promising as a varia
ble to separate the
accepters from the non
-
accepters.

After you’ve gone through the whole table

ignoring the questionnaire items, since only 10% of the sample has
those variables

you will have a set of hypotheses about what separates the accepters fr
om the non
-
accepters.

5.

Create

the variable that defines the target market (those to be invited), or “The Logic.”
:

U
se the “
Compute
” alternative in the
Transform

pull
-
down menu. This allows you to create a new variable out of
other variables.
You wa
nt

to create a variable called “
INVITE
” to indicate which people you’d like to
invite

to the
wine tastings

in other words, the target market. Let’s suppose you’d like to
invite

people who have spent at least
\$100 with you in terms of
REVLSTYR (revenue
last year)
, but that you’d
not

like to
invite

people who are lower
income
(
INCGRP
1
=1 or INCGRP
2
=
1
). Here’s what you would do.

First, click on the
Transform

pull
-
compute
. In the target variable box, type the name of your
new variab
le, such as “
INVITE
.” Then, in the “
numeric expression

box” type (or select from the symbols
in the
box
below)
=0
. Then, press the
OK

button. After a few seconds, this will create a new variable called

INVITE
,” for which everyone has a value of 0.

Check the far right side of the data in “Data View” to make sure
you have the new variable and everyone has a zero value for it.

Now, you can set about altering that value for
selected people.

Again, go to the
Transform

pull
-
comput
e
.” You will notice that your previous definition is
still there. Wipe out the ‘=0’ in the
numeric expression

box, and substitute =1. That means,
INVITE

will =1 for
everyone, unless you specify only certain people. To specify only those people who

fit your criteria will have a
value of 1 for

INVITE
, hit the “
If
…” button. At the top of the dialogue box that opens, select “include if case
satisfies condition.” Then, type in your logical statement, making sure you get the variable names correct (va
riable
names can be sent over to the box with a mouseclick). In this case, your logical statement in the box would be:

REVLSTYR

>100 and
INCGRP
1
=0 and
INCGRP
2=0

Or,

an

equivalent way to say it (there are many Boolean expressions that will result in t
he same target market) would
be:

REVLSTYR >100 and (INCGRP3=1 or INCGRP4=1 or INCGRP5=1 or INCGRP6=1)

You can use all sorts of Boolean operators
:

and or > < >= <= ( ) ~ (not

equal to)

and many more, many of which can be found in the dialogue box that opens to create your “if” statement.

I recommend that you open up a word processor or spreadsheet and copy this logical statement onto a page for later
reference.
In fact,

a

good place to keep track of this is on the spreadsheet that estimates total profit for your logic for
the entire customer base.
When you estimate the profit for this logic, store that in a nearby cell of the spreadsheet so
you can keep track of the log
ical statements you have used and how successful you were.

Now click on “
continue
,” and when the main dialogue box appears, click on “
OK
.” You will be asked if you want
to change the existing variable

and you do

so click
yes
. The computer will think

for a moment, and then the new
values of “
INVITE
” will appear. Check a few people to make sure that
INVITE

is being computed correctly.
Now, you have a value of
INVITE

that indicates who you would like to
invite

to the wine tasting

in the larger
dat
aba
se, and you can check your test

datasets to see how effective and profitable that promotion is likely to be.

6.

Estimating the profit from the promotion
to the entire remaining customer base,

Project Idea from MRKT 350 (courtesy Dwayne Ball)

When you have decided on a possible logic for

whom to
invite

promotions to, you can now crosstabulate the
variable indicating who receives the promotion (“
INVITE
”) with whether or not they responded to it in the test data
set (‘
WINETAST
”). Then, you can estimate how profitable or unprofitable your
promotion would be in the real
world.
This will be done using a spreadsheet discussed in class.

To do this, you must go to the ANALYZE pull
-
down menu, and select DESCRIPTIVE STATISTICS,
CROSSTABS. The dialog box that appears allows you to choose a row v
ariable
and a column variable. S
elect
WINETAST as the row variable and INVITE as the column variable, and then click the “cells” button in the upper
right of the dialog box.

Under “percentages” select “column” and then “Continue.” Then click “OK,” and y
ou will
produce a table like this:

Did customer attend (and pay for) wine tasting party as part of program test? * INVITE
Crosstabulation

INVITE

0

1

Total

Did customer attend (and
pay for) wine tasting party as
part of program test?

no

Count

645
4

2743

9197

% within INVITE

94.9%

85.7%

92.0%

yes

Count

347

456

803

% within INVITE

5.1%

14.3%

8.0%

Total

Count

6801

3199

10000

% within INVITE

100.0%

100.0%

100.0%

Now, you will note that the number of people you would invite is 3199 in thi
s table, or 31.99% of the population.
That percentage goes in the spreadsheet for calculating profit from a promotion.

You will also note that, of the 3199 people you would have invited by this logic, 456 of them, or 14.3%, would have
responded by accep
ting the invitation. This is the response rate. This percentage also goes in the spreadsheet to
calculate profit.

When you put those two numbers, 31.99% and 14.3%, into the spreadsheet at the correct points, you will see that the
calculated dollar prof
it is \$9,589. Not at all impressive. I’m sure you can do better.

7.

Try different logic until you are satisfied or decide to stop
.

Now, suppose you want to change the logic? IMPORTANT POINT! Go back to transform

compute and set the
value of INVITE to

zero, and click the If button, and set the “include all cases” condition.

Click OK, and OK to
change existing variable, and then make sure that everyone in the data set now has a value of zero for INVITE.

Keep trying different logical statements until you are happy with the amount of profit you have made, or at least
have used up your patience with this problem and have decided to stop.

8.

Generate a profit calculation spreadsheet for the project 1 report
.

Project Idea from MRKT 350 (courtesy Dwayne Ball)

At that point, to estimate what you will get in the entire remaining customer database, apply your logic to Master
Customer File 2. Just create the INVITE variable in that dataset, according to your logic, run the crosstabulation,
and see what the pro
fit is on that data set. It will be somewhat different than what you got in dataset 1, but probably
(not always) a better estimate of what will happen if the promotion is “rolled out” to the entire remaining customer
base that is selected by the logic.

9.

Send me your logic and target market percentage to check

For a check, send me
an e
-
mail with
the logic and the percent of the target market that would be INVITEd by that
logic, and I’ll tell you if I get something close to that when I apply the logic to

the entire remaining dataset. Of
course, I cannot tell you what your response rate or profit will be, just as the keeper of your customer data base in
real life could not tell you what the results of the promotion would be until after the results were in
. However, he or
she could tell you if your logic actually INVITEs the fraction of the database that you think it does.

That’s all the
check you’d get in real life. If I get just about the same target market size that you do, at least there are no obvio
us
errors in what you’ve done.

10.

Write and turn in project 1

After you hear from me that your target market seems to be the correct size, take a deep breath and write your
report, which is essentially a proposal to management to spend some money to run t
his promotion on a certain target
market, and a promise that a certain amount of profit is likely. Turn in your project and keep your fingers crossed!