Data Mining in SQL Server 2000 and 2005

boorishadamantAI and Robotics

Oct 29, 2013 (3 years and 7 months ago)

92 views

DAT377

Data Mining in SQL Server 2000
and 2005

Jamie MacLennan

SQL Server Data Mining

Agenda

Data Mining


Where and what is it?

SQL Server 2000 Data Mining

SQL Server 2005 Data Mining

What we offer

Algorithms

Intelligent Applications

Version Comparison

Status of Data Mining

Data Mining


fastest growing BI segment

According to IDC

All major database vendors are stepping up efforts
in this area

Data Mining emerging into mainstream

Amazon


“People who bought this book also
bought…”

Vivisimo.com


Hierarchically clustered search
results

What does Data Mining Do?

Explores
Your Data

Finds
Patterns

Performs
Predictions

Microsoft Data Mining

Enter the Game


Create industry standard


Target developer audience


V1.0 product with 2
algorithms

OLEDB for Data Mining

Elevated “Mining Model” Concept

Database object

Client/Server

Security Model

Backup/Restore

Specified Access

OLEDB


ADO, ADO.Net

DMX


Data Mining eXtensions to SQL

DMX

CREATE MINING MODEL

CreditRisk

(CustID


LONG KEY,

Gender TEXT DISCRETE,

Income


LONG CONTINUOUS,

Profession TEXT DISCRETE,

Risk


TEXT DISCRETE PREDICT)

USING

Microsoft_Decision_Trees

INSERT INTO

CreditRisk

(CustId, Gender, Income, Profession,
Risk)

Select

CustomerID, Gender, Income,
Profession,Risk

From Customers

Select

NewCustomers.CustomerID, CreditRisk.Risk,

PredictProbability(CreditRisk)

FROM

CreditRisk
PREDICTION JOIN

NewCustomers

ON

CreditRisk.Gender=NewCustomer.Gender


AND

CreditRisk.Income=NewCustomer.Income

AND
CreditRisk.Profession=NewCustomer.Profession


demo

SQL Server 2000 Data Mining

Microsoft Data Mining

Enter the Game


Create industry standard


Target developer audience


V1.0 product with 2
algorithms

Win Leadership


Continue standards and
developer effort


Comprehensive feature set


Penetrate the Enterprise


Thought leadership

What we offer

Data mining is made accessible and easy
to use through integrated user interface,
cross
-
product integration and familiar,
standard APIs

Full suite of algorithms to automatically
extract information from your data

Complete framework for building and
deploying intelligent applications

Data Mining User Interface

BI Development Studio

Creation and exploration environment

Data Mining projects inside Visual Studio solutions
with related projects

Source Control Integration

SQL Management Studio

Single place for management of all SQL
technologies

Manage, Browse, and Query Data Mining Models


BI Integration

DTS

Data Mining processing and results integrate
directly into the operational pipeline

OLAP

Process Mining Models from Universal Dimensional
Models (Cubes), use learned content to slice cubes
based on data
-
specific patterns

Reporting

Embed Data Mining results directly into Reporting
Services reports

Data Mining Data Flow

Cube

New

Dataset

Data Transform (DTS)

Reporting

Mining Models

Model

Browsing

Prediction

Cube

LOB

Application

Web

.
Net

Native

Operations

(DTS)

Historical

Dataset

SQL

OLE/DB

Text File


demo

SQL Server 2005 Data Mining

Algorithms

Data Mining algorithms developed in conjunction
with Microsoft Research address the full range
of data mining applications

Classification, Regression, Segmentation,
Association, Forecasting, Text Analysis, and
Advanced Data Exploration

Algorithm complexity is hidden behind a
consistent, familiar API

Data Mining logic accessed through SQL queries,
results provided through datasets, rowsets

Applications can use SQL Server Data Mining to
apply learned rules, or show patterns to the user

Data Mining Tasks

Classification

Regression

Segmentation

Association

Forecasting

Text Analysis

Advanced Data
Exploration


Data Mining Tasks

Classification

Regression

Segmentation

Association

Forecasting

Text Analysis

Advanced Data
Exploration


Typical Business Questions

What type of membership card
should I offer?

Which customers will respond to
my mailing?

Is this transaction fraudulent?

Will I lose this customer?

Will this product be defective?

Algorithms

Preferred

Decision Trees

Naïve Bayes

Neural Nets

Also Can

Clustering

Sequence Clustering

Association Rules

Data Mining Tasks

Classification

Regression

Segmentation

Association

Forecasting

Text Analysis

Advanced Data
Exploration


Typical Business Questions

How much revenue will I get
from this customer?

How long will this asset be in
service?

Algorithms

Preferred

Microsoft Decision Trees

Neural Nets

Also Can

Clustering

Sequence Clustering

Data Mining Tasks

Classification

Regression

Segmentation

Association

Forecasting

Text Analysis

Advanced Data
Exploration


Typical Business Questions

Describe my customers

How can I differentiate my
customers?

How can I organize my data in a
manner that makes sense?

Algorithms

Preferred

Clustering

Sequence Clustering

Also Can

Neural Nets

Data Mining Tasks

Classification

Regression

Segmentation

Association

Forecasting

Text Analysis

Advanced Data
Exploration


Typical Business Questions

Market Basket Analysis/Cross
Sales

What items are bought together?

What products should I
recommend to my customers?

Algorithms

Preferred

Association Rules

Decision Trees (small
catalogs)

Also Can

Clustering, Sequence
Clustering, Naïve Bayes,
Neural Nets

Data Mining Tasks

Classification

Regression

Segmentation

Association

Forecasting

Text Analysis

Advanced Data
Exploration


Typical Business Questions

What are projected revenues
for all products?

What are inventory levels
next month?

Algorithms

Time Series

Data Mining Tasks

Classification

Regression

Segmentation

Association

Forecasting

Text Analysis

Advanced Data
Exploration


Analysis of unstructured data

Key term and key phrase
extraction

Conversion to structured data

Feed into other algorithms

Classification

Segmentation

Association

Typical Business Questions

How do I handle call center
data?

How can I classify mail?

What can I do with web
feedback?

Implemented inside DTS

Data Mining Tasks

Classification

Regression

Segmentation

Association

Forecasting

Text Analysis

Advanced Data
Exploration


Learning more about your data
through visualizations

Typical Business Questions

Why do people churn?

What are the relationships
between products?

What are the differences
between high profit and low profit
customers?

All algorithms

Intelligent Applications

Make decisions without coding

Data Mining algorithms learn business rules directly from the
data, freeing you from trying discover and code them
yourselves

Customized for each client

Data Mining learns the rules from the client’s data


resulting
in logic that is automatically specialized for each individual
client

Automatically update themselves

As your client’s business changes, so do the factors that
impact their business. Data Mining allows your application
logic to be automatically updated through a simple processing
step. Applications do not need to be rewritten, recompiled or
redeployed, and are always online


even during processing.

Data Mining Programmability

DMX Query Interface

OLEDB, ADO, ADO.Net, ADOMD.Net, XMLA

Dim cmd as ADOMD.Command

Dim reader as ADOMD.DataReader

Cmd.Connection = conn

Set reader = Cmd.ExecuteReader(“Select Predict(Gender)…”)

Data Mining Object Model

ADOMD.Net, Server ADOMD.Net

Direct access to Mining content

CLR User Defined Procedures execute on the server

Expandability

Plug
-
In Algorithms

Plug
-
In Viewers


demo

Intelligent Applications

Version Comparison

SQL 2000

SQL 2005

Algorithms

2

7++

Algorithm Parameters

~2 each

~6 each

Processing
Perf/Scalability

Best on market

Better than SQL 2K

Prediction Perf

Not measured

10M/hour

Viewers

2 proprietary

12 embeddable

Tools

Limited

Extensive

Extensibility

Limited and difficult

Flexible and easy

Integration

OLAP, DTS

OLAP++, DTS++, Reporting,
SQL Workbench

Main API’s

OLEDB, ADO, DSO

+ XMLA, ADOMD.Net, AMO

Ask The Experts

Get Your Questions Answered

We will complete this slide in your deck on your
behalf with the times you will need to be
available for the Ask the Experts area

Community Resources

This slide will be completed on your behalf.

It will appear within your presentation

Microsoft Learning

TechNet and Microsoft Press will complete
this slide on your behalf and will insert per
session.

evaluations

© 2003 Microsoft Corporation. All rights reserved. This presentation is for informational purposes only.

MICROSOFT MAKES NO WARRANTIES, EXPRESS OR IMPLIED, IN THIS SUMMARY.