Data Mining Using MS Excel

levelsordΔιαχείριση Δεδομένων

20 Νοε 2013 (πριν από 3 χρόνια και 6 μήνες)

74 εμφανίσεις

Data Mining Using MS Excel
2007/2010

Intro


It is a free downloadable add
-
in


You download and run setup


that is all


You need the Professional Edition of Excel 2007 and
SQL Server 2008 Analysis Services


The add
-
in creates a database during setup with a
wizard


It is powerful, and provides an excellent introduction to
data mining


You have to set your own environment because


You’d better have the admin right for the SQL Server 2008


In addition, Analysis Services support only Windows
-
authenticated clients

When everything is working


When select a cell, the Table Tools groups shows. Clicking it to see
Analyze ribbon


buttons showing you the types of analysis you can
perform

The Eight Tasks You Can Perform


The Analyze Key Influencers


Detect Category


Fill From Example


Forecasting


Highlight Exceptions


Scenario Analysis Tool


Predication Calculator


Shopping Basket Analysis

The Analyze Key Influencers


Read page 22


Click the Analyze Key
Influencers button
and select
Occupation to get
the result on the
right (partial)


What the reporting
is saying?


What influence the
outcome of the
target column?

The Discrimination Report


To look into deeper
what factors
(columns) influence
the outcome (target
column), we use
the discrimination
Report

Detect Category Tool


This one finds out how records are related to
each other.


It allows you to analyze just a handful of
categories of records, rather than millions


Each record can only belong to one category.


It generates three reports


The category


The key “signatures” of a category


Category profile over columns

Detect Category Tool
--
Configure


Categories and Characteristics

The categories are not named. You
can rename them to show their
true characteristics, showing below

Category Profile
--

Education


Fill From Example


“Machine learning”


Sample uses different set of data


Predicts for missing values


high value customer


The more training data, the better


Can take the iterative approach to make
corrections and re
-
run the reports


The outputs are two


The predicated values


The column
-
value pairs that influence the outcome
the most

Report Beside the predicted values

Forecasting Tool


Want to tell the future?


With the help from MS, you may

.


If you give some training data that show the
trend with time, we can tell you the future
using the Forecasting Tool.


It only deals with numbers.


Forecast Report


High Exception Tool


Out of millions records, how
to find the abnormal ones?
Use MS’s High Exception Tool!


This one tells you the number
of exceptional records and
highlight them in the data
sheet!


You can change the threshold
to control the number of
records picked out, the small
the number the more records
will be selected


Scenario Analysis Tool


Goal Seek


This one actually has
two sub tasks: Goal
Seek and What
-
If


Basically, it tries to
find the correlation
between two
columns


Can apply the same
for the entire table

Scenario Analysis Tool


What
-
If


Prediction Calculator


We have been collecting customer data and
know who purchased a bike and who have not.


Can we use the info to predicate, for a given
customer info, if s/he will buy a bike?


At least some likelihood indication


This is what this one does!


It assigns a weight for each possible value of a
column and uses the weight to calculate the
likelihood for a given record

Prediction Calculator


set to run


Predication Calculator Spreadsheet


This one allows you to
“Predicate” a customer
whether s/he will buy a bike
if you provide the
characteristics of the
customer.


It also show the relative
impact of the column and
value.


A printer
-
ready report is also
generated that list, for each
column, all the possible
values or ranges

The Consequences of Mistakes


It is a prediction, and can be correct and
incorrect. What are the costs of mistakes?


Predicated
Actual
Correct
Comment
Cost
No
No
Yes
True negative
Marketing
No
Yes
No
Type II (false negative)
(Profit )
Yes
No
No
Type I (False positive)
(Marketing)
Yes
Yes
Yes
True positive
Profit
Cost Charts


Shopping Basket Analysis


What items do your customers purchase
together ? Beer and Chips is a possibility, what
about Chips and Beer?


The Shopping Basket Analysis will answer the
question regarding what people buy together
(on the same transaction).


If you can provide pricing, MS even gives you
revenue data.


You need to indicate transaction ID

Shopping Basket Analysis
--
Data

Order Number

Category

Product

Product Price

SO61269

Helmets

Sport
-
100

53.99

SO61269

Jerseys

Long
-
Sleeve Logo Jersey

49.99

SO61270

Fenders

Fender Set
-

Mountain

21.98

SO61271

Tires and Tubes

LL Road Tire

21.49

SO61271

Tires and Tubes

Patch kit

564.99

SO61272

Tires and Tubes

Mountain Tire Tube

4.99

SO61272

Tires and Tubes

Patch kit

564.99

SO61273

Bottles and Cages

Water Bottle

4.99

SO61274

Caps

Cycling Cap

8.99

SO61274

Shorts

Women's Mountain Shorts

69.99

SO61275

Helmets

Sport
-
100

53.99

SO61276

Jerseys

Short
-
Sleeve Classic Jersey

539.99

SO61276

Caps

Cycling Cap

8.99

SO61277

Mountain Bikes

Mountain
-
500

539.99

SO61277

Jerseys

Short
-
Sleeve Classic Jersey

539.99

SO61277

Caps

Cycling Cap

8.99

SO61278

Road Bikes

Road
-
350
-
W

2443.35

SO61278

Bottles and Cages

Road Bottle Cage

8.99

SO61278

Bottles and Cages

Water Bottle

4.99

SO61278

Jerseys

Short
-
Sleeve Classic Jersey

539.99

SO61279

Mountain Bikes

Mountain
-
200

2319.99

SO61279

Fenders

Fender Set
-

Mountain

21.98

Shopping Basket Analysis

Setting Up

Shopping Basket Analysis

Result


Only the first three columns will be provided if
pricing data is not selected or not available