Advanced Analytics - Chicago SQL BI User Group


Advanced Analytics

Data Mining using SQL Server

Tuesday, April 17, 2012 from 5:30 PM to 7:30 PM (CT)

Thomas Arehart

Microsoft Technology Center

Growing Business Use

Whether delivered as dashboards, scorecards or standalone tools, the number of
users benefiting from access to business intelligence (BI) and analytics tools is taking
off.

Once limited to only a few number crunchers with degrees in advanced
mathematics, BI and analytics tools are rapidly being deployed to all professionals in
many organizations, and to everyone in a substantial number of companies,
according to analysts and recent surveys.

While traditional BI tools were complex and expensive, access to powerful BI and
analytics capabilities is no longer out of reach for the masses. Today, BI capabilities
are increasingly embedded in a wide range of software applications.

Another reason for the broader use of these tools is that the market has evolved
into a broad ecosystem. A wide swath of vendors in a variety of fields have essentially
collaborated to simplify the technology front ends and to focus the tools
on specific vertical markets such as retailing, telecom and consumer packaged
goods manufacturing. The BA market ranges from platform technologies such as
data warehouse management to end-user-facing analytic applications and BI tools. (1)

Business Need

With BI capabilities now found in a wide range of software applications as well as
lighter weight, standalone packages, new-generation BI is often invisible to its users.
This lets them focus on making better decisions and serving customers more
effectively as opposed to staying up to speed on the latest technology acronyms.

Knowledge workers need analytical tools to explore the gaps in a process when
things break.

Analytical software that analyzes a multitude of databases and transaction histories
can provide guidance and predictions about future customer needs and behavior.
This guidance empowers employees to anticipate customer needs, reduce costs,
and improve overall efficiency.

Companies want more automation and consistency around the decisions employees
make on a daily basis. (1)

Evolutionary Step | Business Question | Enabling Technologies | Product Providers | Data Characteristics
Data Collection (1960s) | "What was my total revenue in the last five years?" | Spreadsheets, desktop databases | Microsoft Excel and Access | Retrospective, static data delivery
Data Access (1980s) | "What were unit sales in New England last March?" | SQL, relational database management systems (RDBMS) | Microsoft SQL Server | Retrospective, dynamic data delivery at record level
Data Warehousing & Decision Support (1990s) | "What were unit sales in New England last March? Drill down to Boston." | On-line analytic processing (OLAP), multidimensional databases, data warehouses | Microsoft SQL Server Reporting Services (SSRS), Microsoft SQL Server Analysis Services (SSAS) | Retrospective, dynamic data delivery at multiple levels
Data Mining (Emerging Today) | "What’s likely to happen to Boston unit sales next month? Why?" | Advanced algorithms, multiprocessor computers, massive databases | SQL Server Integration Services (SSIS), Excel Add-In | Prospective, proactive information delivery

Table 1. Steps in the Evolution of Data Mining

Task type | Examples of tasks | Microsoft algorithms to use (2)
Predicting a discrete attribute | Flag the customers in a prospective buyers list as good or poor prospects; calculate the probability that a server will fail within the next 6 months; categorize patient outcomes and explore related factors. | Microsoft Decision Trees Algorithm, Microsoft Naive Bayes Algorithm, Microsoft Clustering Algorithm, Microsoft Neural Network Algorithm
Predicting a continuous attribute | Forecast next year's sales; predict site visitors given past historical and seasonal trends; generate a risk score given demographics. | Microsoft Decision Trees Algorithm, Microsoft Time Series Algorithm, Microsoft Linear Regression Algorithm
Predicting a sequence | Perform clickstream analysis of a company's Web site; analyze the factors leading to server failure; capture and analyze sequences of activities during outpatient visits, to formulate best practices around common activities. | Microsoft Sequence Clustering Algorithm
Finding groups of common items in transactions | Use market basket analysis to determine product placement; suggest additional products to a customer for purchase; analyze survey data from visitors to an event, to find which activities or booths were correlated, to plan future activities. | Microsoft Association Algorithm, Microsoft Decision Trees Algorithm
Finding groups of similar items | Create patient risk profiles groups based on attributes such as demographics and behaviors; analyze users by browsing and buying patterns; identify servers that have similar usage characteristics. | Microsoft Clustering Algorithm, Microsoft Sequence Clustering Algorithm

Analytic Algorithm Categories

Regression

a powerful and commonly used algorithm that evaluates the relationship of one variable, the
dependent variable, with one or more other variables, called independent variables. By measuring exactly how
large and significant each independent variable has historically been in its relation to the dependent variable,
the future value of the dependent variable can be estimated. Regression models are widely used in applications
such as seasonal forecasting, quality assurance and credit risk analysis.
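
To make the idea of estimating a dependent variable from an independent one concrete, here is a minimal sketch of a one-variable least-squares fit. The data, the variable names, and the `fit_line` helper are assumptions made for illustration; this is not how SQL Server's Linear Regression algorithm is implemented.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance-like and variance-like sums (unnormalized).
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Hypothetical history: advertising spend (independent) vs. unit sales (dependent).
ad_spend = [1.0, 2.0, 3.0, 4.0, 5.0]
unit_sales = [12.0, 14.0, 16.0, 18.0, 20.0]

intercept, slope = fit_line(ad_spend, unit_sales)
forecast = intercept + slope * 6.0  # estimate for a spend level not yet observed
print(intercept, slope, forecast)
```

The slope measures how strongly the independent variable has historically moved the dependent one, and the fitted line is then used to estimate a future value.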

Clustering / Segmentation

the process of grouping items together to form categories. You might look at a
large collection of shopping baskets and discover that they are clustered corresponding to health food buyers,
convenience food buyers, luxury food buyers, and so on. Once these characteristics have been grouped together,
they can be used to find other customers with similar characteristics. This algorithm is used to create groups for
applications such as customers for marketing campaigns, rate groups for insurance products, and crime statistics
groups for law enforcement.
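
As a rough illustration of grouping items into categories, the sketch below runs a tiny k-means loop on a single numeric attribute. The attribute, the starting centers, and the `kmeans_1d` helper are all assumptions for this example; the Microsoft Clustering algorithm offers other methods and works on many attributes at once.

```python
def kmeans_1d(values, centers, iterations=10):
    """Tiny k-means on one numeric attribute; returns (centers, labels)."""
    centers = list(centers)
    labels = []
    for _ in range(iterations):
        # Assignment step: each value joins its nearest center.
        labels = [
            min(range(len(centers)), key=lambda k: abs(v - centers[k]))
            for v in values
        ]
        # Update step: each center moves to the mean of its members.
        for k in range(len(centers)):
            members = [v for v, lab in zip(values, labels) if lab == k]
            if members:
                centers[k] = sum(members) / len(members)
    return centers, labels


# Hypothetical attribute: percent of each shopping basket spent on health food.
health_share = [5, 8, 10, 45, 50, 55, 90, 92, 95]
centers, labels = kmeans_1d(health_share, centers=[0, 50, 100])
print(centers, labels)
```

Each resulting label can be read as a segment, for example low, medium, and high health-food buyers.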

Nearest Neighbor

quite similar to clustering, but it will only look at other records in the dataset that are “nearest” to a chosen
unclassified record based on a “similarity” measure. Records that are “near” to each other tend to have similar
predictive values as well. Thus, if you know the prediction value of one of the records, you can predict its nearest
neighbor. This algorithm works similarly to the way that people think, by detecting closely matching examples.
Nearest Neighbor is often used in retail and life sciences applications.
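
The following minimal sketch shows the nearest-neighbor idea with Euclidean distance as the "similarity" measure. The records, the labels, and the `predict_nearest` helper are hypothetical, and a real application would usually scale the attributes first.

```python
import math


def predict_nearest(known, query):
    """Return the label of the known record closest to `query` (1-nearest-neighbor)."""
    _features, label = min(
        known, key=lambda record: math.dist(record[0], query)
    )
    return label


# Hypothetical labeled records: (age, purchases per year) -> customer segment.
known_records = [
    ((25, 2), "occasional"),
    ((30, 3), "occasional"),
    ((45, 20), "frequent"),
    ((50, 25), "frequent"),
]

print(predict_nearest(known_records, (47, 18)))
```

The unclassified record simply inherits the prediction value of the closest known record.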

Association Rules

detects related items in a dataset. Association analysis identifies and groups together similar
records that would otherwise go unnoticed by a casual observer. This type of analysis is often used for market
basket analysis to find popular bundles of products that are related by transaction, such as low-end digital
cameras being associated with smaller capacity memory sticks to store the digital images.
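
Market basket rules are commonly scored with support and confidence. The sketch below computes both for one rule; the baskets and the helper names are assumptions for illustration and do not reflect the internals of the Microsoft Association algorithm.

```python
def support(transactions, items):
    """Fraction of transactions that contain every item in `items`."""
    items = set(items)
    hits = sum(1 for basket in transactions if items <= basket)
    return hits / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Estimated P(consequent | antecedent); assumes the antecedent occurs at least once."""
    both = support(transactions, set(antecedent) | set(consequent))
    return both / support(transactions, antecedent)


# Hypothetical baskets.
baskets = [
    {"camera", "memory stick"},
    {"camera", "memory stick", "case"},
    {"camera", "tripod"},
    {"memory stick"},
    {"case"},
]

rule_support = support(baskets, {"camera", "memory stick"})
rule_confidence = confidence(baskets, {"camera"}, {"memory stick"})
print(rule_support, rule_confidence)
```

A rule such as "camera implies memory stick" is interesting when both numbers are high enough for the business to act on.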

Decision Tree

a tree-shaped graphical predictive algorithm that represents alternative sequential decisions and
the possible outcomes for each decision. This algorithm provides alternative actions that are available to the
decision maker, the probabilistic events that follow from and affect these actions, and the outcomes that
are associated with each possible scenario of actions and consequences. Their applications range from credit
card scoring to time series predictions of exchange rates.

Sequence Association

detects causality and association between time-ordered events, although the associated events
may be spread far apart in time and may seem unrelated. Tracking specific time-ordered records and
linking these records to a specific outcome allows companies to predict a possible outcome based on a
few occurring events. A sequence model can be used to reduce the number of clicks customers have to make
when navigating a company’s website.
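
One simple way to work with time-ordered events is to count which page most often follows another. The sketch below is a first-order transition count over hypothetical click sessions; it is a deliberate simplification and not the Microsoft Sequence Clustering algorithm, and the session data and function names are assumptions.

```python
from collections import Counter, defaultdict


def transition_counts(sessions):
    """Count how often each page is followed directly by each other page."""
    counts = defaultdict(Counter)
    for pages in sessions:
        for current, following in zip(pages, pages[1:]):
            counts[current][following] += 1
    return counts


def most_likely_next(counts, page):
    """Most frequent follower of `page`, or None if the page never had a follower."""
    followers = counts.get(page)
    return followers.most_common(1)[0][0] if followers else None


# Hypothetical clickstream sessions.
sessions = [
    ["home", "search", "product", "cart"],
    ["home", "search", "product"],
    ["home", "product", "cart"],
]

counts = transition_counts(sessions)
print(most_likely_next(counts, "home"))
```

Knowing the most likely next page is what would let a site offer a shortcut and save the customer a click.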

Neural Network

a sophisticated pattern detection algorithm that uses machine learning techniques to generate predictions.
This technique models itself after the process of cognitive learning and the neurological functions of
the brain, and is capable of predicting new observations from other known observations. Neural networks
are very powerful, complex, and accurate predictive models that are used in detecting fraudulent behavior, in
predicting the movement of stocks and currencies, and in improving the response rates of direct marketing campaigns.

Conventional BI Reporting Architecture








Excel Data Analysis Tool | Analysis Category
Anova: Single Factor | multiple linear regression
Anova: Two-Factor with replication | multiple linear regression
Anova: Two-Factor without replication | multiple linear regression
Correlation | linear regression
Covariance | linear regression
Descriptive Statistics | linear regression
Exponential Smoothing | naïve forecast
F-Test Two-sample for Variances | linear regression
Fourier Analysis | linear regression
Histogram | linear regression
Moving Average | linear regression
Random Number Generation | N/A
Rank and Percentile | clustering
Regression | linear regression
Sampling | N/A
t-Test: Paired Two Sample for Means | linear regression
t-Test: Two-Sample Assuming Equal Variances | linear regression
t-Test: Two-Sample Assuming Unequal Variances | linear regression
z-Test: Two Sample for Means | linear regression



Table Analysis Tools for Excel (SQL Server 2008 Data Mining Add-ins)

The Analyze Key Influencers tool enables you to select a column that contains a
desired outcome or target value, and then analyze the patterns in your data to
determine which factors had the strongest influence on the outcome. For
example, if you have a customer list that includes a column that shows the total
purchases for each customer over the past year, you could analyze the table to
determine the customer demographics for your top purchasers.

Microsoft SQL Server 2008 Data Mining Add-Ins for Office 2007

Analyze Key Influencers (Table Analysis Tools for Excel)


Task | Description | Algorithms
Market Basket Analysis | Discover items sold together to create recommendations on-the-fly and to determine how product placement can directly contribute to your bottom line. | Association, Decision Trees
Churn Analysis | Anticipate customers who may be considering canceling their service and identify the benefits that will keep them from leaving. | Decision Trees, Linear Regression, Logistic Regression
Market Analysis | Define market segments by automatically grouping similar customers together. Use these segments to seek profitable customers. | Clustering, Sequence Clustering
Forecasting | Predict sales and inventory amounts and learn how they are interrelated to foresee bottlenecks and improve performance. | Decision Trees, Time Series
Data Exploration | Analyze profitability across customers, or compare customers that prefer different brands of the same product to discover new opportunities. | Neural Network
Unsupervised Learning | Identify previously unknown relationships between various elements of your business to inform your decisions. | Neural Network
Web Site Analysis | Understand how people use your Web site and group similar usage patterns to offer a better experience. | Sequence Clustering
Campaign Analysis | Spend marketing funds more effectively by targeting the customers most likely to respond to a promotion. | Decision Trees, Naïve Bayes, Clustering
Information Quality | Identify and handle anomalies during data entry or data loading to improve the quality of information. | Linear Regression, Logistic Regression
Text Analysis | Analyze feedback to find common themes and trends that concern your customers or employees, informing decisions with unstructured input. | Text Mining

Microsoft Office 2007 Data Mining Tasks (4)

Data Analysis Expressions (DAX) is the standard PowerPivot formula language that supports
custom calculations in PowerPivot tables and Excel PivotTables. While many of the
functions used in Excel are included, DAX also offers additional functions for carrying out
dynamic aggregation and other operations with your data. (8)

(7)

Time related calculated measures (10)

Description: Previous Year

DAX formula:

    =IF( COUNTROWS(VALUES(DimDate[CalendarYear])) = 1
       , CALCULATE([Sales], PREVIOUSYEAR(DimDate[DateKey]))
       , BLANK()
    )

or

    =IF( COUNTROWS(VALUES(DimDate[CalendarYear])) = 1
       , CALCULATE([Sales], PARALLELPERIOD(DimDate[DateKey], -12, MONTH))
       , BLANK()
    )

or

    =IF( COUNTROWS(VALUES(DimDate[CalendarYear])) = 1
       , [Sales](PARALLELPERIOD(DimDate[DateKey], -12, MONTH))
       , BLANK()
    )

Description: Year over year growth

DAX formula:

    =IF( COUNTROWS(VALUES(DimDate[CalendarYear])) = 1
       , [Sales] - CALCULATE([Sales], PREVIOUSYEAR(DimDate[DateKey]))
       , BLANK()
    )
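
For readers who want to check the arithmetic outside PowerPivot, the two measures can be mimicked on a plain dictionary of yearly totals. This is only a sketch under assumed data: the `sales` figures and function names are hypothetical, and returning `None` when a year is missing is a choice of this sketch rather than a statement about how DAX handles blanks.

```python
def previous_year_sales(sales_by_year, year):
    """Sales for the year before `year`, or None when no prior year is recorded."""
    return sales_by_year.get(year - 1)


def year_over_year_growth(sales_by_year, year):
    """Current-year sales minus previous-year sales (None if either is missing)."""
    current = sales_by_year.get(year)
    previous = previous_year_sales(sales_by_year, year)
    if current is None or previous is None:
        return None
    return current - previous


# Hypothetical yearly sales totals.
sales = {2010: 120_000, 2011: 150_000, 2012: 165_000}

print(previous_year_sales(sales, 2012))
print(year_over_year_growth(sales, 2011))
```

As in the DAX measure, "growth" here is the absolute difference between the two years, not a percentage.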

The DMX query editor for SQL Server Reporting Services

Reporting is a fundamental activity in most businesses, and SQL Server 2008 Reporting
Services provides a comprehensive solution for creating, rendering, and deploying reports
throughout the enterprise. SQL Server Reporting Services can render reports directly from a
data mining model by using a data mining extensions (DMX) query. This enables users to
visualize the content of data mining models for optimized data representation.
Furthermore, the ability to query directly against the data mining structure enables users to
easily include attributes beyond the scope of the mining model requirements, presenting
complete and meaningful information. (4)

For more information about the functions that are supported for each model type,
see the following links:



Association Model Query Examples

Microsoft Naive Bayes Algorithm

Clustering Model Query Examples

Neural Network Model Query Examples

Decision Trees Model Query Examples

Sequence Clustering Model Query Examples

Linear Regression Model Query Examples

Time Series Model Query Examples

Logistic Regression Model Query Examples




You can also call VBA functions, or create your own functions. For more information,
see Functions (DMX).


SELECT
    PredictTimeSeries([Forecasting].[Amount]) AS [PredictedAmount],
    PredictTimeSeries([Forecasting].[Quantity]) AS [PredictedQty]
FROM
    [Forecasting]

Prediction Queries (Data Mining) (9)

SQL Server 2008 data mining supports a number of application programming interfaces
(APIs) that developers can use to build custom solutions that take advantage of the
predictive analysis capabilities in SQL Server. DMX, XMLA, OLEDB and ADOMD.NET, and
Analysis Management Objects (AMO) offer a rich, fully documented development platform,
empowering developers to build data mining aware applications and providing real-time
discovery and recommendation through familiar tools.

This extensibility creates an opportunity for business organizations and independent
software vendors (ISVs) to embed predictive analysis into line-of-business applications,
introducing insight and forecasting that inform business decisions and processes. For
example, the Analytics Foundation adds predictive scoring to Microsoft Dynamics® CRM, to
enable information workers across sales, marketing, and service organizations to identify
attainable opportunities that are more likely to lead to a sale, increasing efficiency and
improving productivity (for more information, see the Microsoft Dynamics site).

Integration Services Data Mining Tasks and Transformations

--------------------------------------------------------------------------------


SQL Server Integration Services provides many components that support data mining.


Some tools in Integration Services are designed to help automate common data mining tasks,
including prediction, model building, and processing.


For example:

1) Create an Integration Services package that automatically updates the model every time the
dataset is updated with new customers.

2) Perform custom segmentation or custom sampling of case records.

3) Automatically generate models based on parameters.

However, you can also use data mining in a package workflow, as an input to other processes.

For example:

1) Use probability values generated by the model to weight scores for text mining or other
classification tasks.

2) Automatically generate predictions based on prior data and use those values to assess the
validity of new data.

3) Use logistic regression to segment incoming customers by risk.


Data mining in SQL Server Integration Services

Microsoft SQL Server 2008 Integration Services provides a powerful, extensible ETL platform
that Business Intelligence solution developers can use to implement ETL operations. SQL Server
Integration Services includes a Data Mining Model Training destination for training data mining
models, and a Data Mining Query transformation that can be used to perform predictive
analysis on data as it is passed through the data flow. Integrating predictive analysis with
SQL Server Integration Services enables organizations to flag unusual data, classify business
entities, perform text mining, and fill in missing values on the fly based on the power and insight
of the data mining algorithms. (4)

After you have created a mining structure and mining model by using the Data Mining Wizard,
you can use the Data Mining Designer from either SQL Server Data Tools (SSDT) or SQL Server
Management Studio to work with existing models and structures.


The designer includes tools for these tasks:

1) Modify the properties of mining structures, add columns and create column aliases,
change the binning method or expected distribution of values.

2) Add new models to an existing structure; copy models, change model properties or
metadata, or define filters on a mining model.

3) Browse the patterns and rules within the model; explore associations or decision
trees. Get detailed statistics about

4) Custom viewers are provided for each different type of model, to help you analyze
your data and explore the patterns revealed by data mining.

5) Validate models by creating lift charts, or analyzing the profit curve for models.
Compare models using classification matrices, or validate a data set and its models
by using cross-validation.

6) Create predictions and content queries against existing mining models. Build one-off
queries, or set up queries to generate predictions for entire tables of external data.
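
The classification matrix mentioned in the validation step is simply a count of actual versus predicted labels. The sketch below builds one by hand; the label lists and helper names are assumptions for illustration, and the designer produces this view for you from a real test set.

```python
from collections import Counter


def classification_matrix(actual, predicted):
    """Count (actual, predicted) label pairs, as in a classification matrix."""
    return Counter(zip(actual, predicted))


def accuracy(matrix):
    """Share of cases where the predicted label equals the actual label."""
    correct = sum(n for (a, p), n in matrix.items() if a == p)
    return correct / sum(matrix.values())


# Hypothetical test-set outcomes and model predictions.
actual = ["buy", "buy", "no", "no", "buy", "no"]
predicted = ["buy", "no", "no", "no", "buy", "buy"]

matrix = classification_matrix(actual, predicted)
print(matrix, accuracy(matrix))
```

Comparing these counts across candidate models is one way to decide which model to keep.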


SQL Server 2008 Analysis Services provides a highly scalable platform for multidimensional
OLAP analysis. Many customers are already reaping the benefits of creating a unified
dimensional model (UDM) in Analysis Services and using it to slice and dice business
measures by multiple dimensions. Predictive analysis, being part of SQL Server 2008 Analysis
Services, provides a richer OLAP experience, featuring data mining dimensions that slice your
data by the hidden patterns within. (4)

A data mining dimension in an OLAP cube

Data Mining Algorithms (Analysis Services - Data Mining)

Choosing an Algorithm by Task

To help you select an algorithm for use with a specific task, the following table provides
suggestions for the types of tasks for which each algorithm is traditionally used.

Task type | Examples of tasks | Microsoft algorithms to use
Predicting a discrete attribute | Flag the customers in a prospective buyers list as good or poor prospects; calculate the probability that a server will fail within the next 6 months; categorize patient outcomes and explore related factors. | Microsoft Decision Trees Algorithm, Microsoft Naive Bayes Algorithm, Microsoft Clustering Algorithm, Microsoft Neural Network Algorithm
Predicting a continuous attribute | Forecast next year's sales; predict site visitors given past historical and seasonal trends; generate a risk score given demographics. | Microsoft Decision Trees Algorithm, Microsoft Time Series Algorithm, Microsoft Linear Regression Algorithm
Predicting a sequence | Perform clickstream analysis of a company's Web site; analyze the factors leading to server failure; capture and analyze sequences of activities during outpatient visits, to formulate best practices around common activities. | Microsoft Sequence Clustering Algorithm
Finding groups of common items in transactions | Use market basket analysis to determine product placement; suggest additional products to a customer for purchase; analyze survey data from visitors to an event, to find which activities or booths were correlated, to plan future activities. | Microsoft Association Algorithm, Microsoft Decision Trees Algorithm
Finding groups of similar items | Create patient risk profiles groups based on attributes such as demographics and behaviors; analyze users by browsing and buying patterns; identify servers that have similar usage characteristics. | Microsoft Clustering Algorithm, Microsoft Sequence Clustering Algorithm


Many businesses use KPIs to evaluate critical business metrics against targets. SQL Server 2008
Analysis Services provides a centralized platform for KPIs across the organization, and
integration with Microsoft Office PerformancePoint® Server 2007 enables decision makers to
build business dashboards from which they can monitor the company’s performance. KPIs are
traditionally retrospective, for example showing last month’s sales total compared to the sales
target. However, with the insights made possible through data mining, organizations can build
predictive KPIs that forecast future performance against targets, giving the business an
opportunity to detect and resolve potential problems proactively. Predictive analysis can detect
attributes that influence KPIs. Together with Office PerformancePoint Server 2007, users can
monitor trends in key influencers to recognize those attributes that have a sustained effect.
Such insights enable businesses to inform and improve their response strategy. (4)

Microsoft Office PerformancePoint Server 2007

The SQL Server data mining toolset is fully extensible through Microsoft .NET stored
procedures, plug-in algorithms, custom visualizations and PMML. This enables developers
to extend the out-of-the-box data mining technologies of SQL Server 2008 to meet
uncommon business needs that are specific to the organization by:

• Creating custom data mining algorithms to solve business-specific analytical problems.

• Using data mining algorithms from other software vendors.

• Creating custom visualizations of data mining models through plug-in viewer APIs.

Although the data mining functionality provided with SQL Server 2008 is comprehensive
enough to meet the needs of a wide range of business scenarios, its extensibility ensures
that it can be used to solve virtually any predictive problem. The ability to extend the
data mining technologies of SQL Server through custom algorithms and visualizations,
together with the ability to embed predictive functionality into line-of-business
applications, makes SQL Server 2008 a powerful platform for introducing predictive
analysis into existing business processes to add insight and recommendations into
everyday operations. (4)

Plugin Algorithms




In addition to the algorithms that Microsoft SQL Server Analysis Services provides, there
are many other algorithms that you can use for data mining. Accordingly, Analysis
Services provides a mechanism for "plugging in" algorithms that are created by third
parties. As long as the algorithms follow certain standards, you can use them within
Analysis Services just as you use the Microsoft algorithms. Plugin algorithms have all the
capabilities of algorithms that SQL Server Analysis Services provides.


For a full description of the interfaces that Analysis Services uses to communicate with
plugin algorithms, see the samples for creating a custom algorithm and custom model
viewer that are published on the CodePlex Web site.


One Way ANOVA (Analysis of Variance)


When to Use One-Way, Single Factor ANOVA

In a manufacturing or service environment, you might wonder if changing a
formula, process or material might deliver a better product at a lower cost. Saving
a penny a pound on five million pounds a month can really add up. Saving ten
minutes of wait time in a hospital might add $100,000 to the bottom line and
deliver better patient outcomes. Comparing two or more drug formulations might
pinpoint the best drug for a desired result.

How can you compare the old formula with a new one and be certain that you
have an opportunity to improve? Use one-way ANOVA (also known as single factor
ANOVA) to determine if there's a statistically significant difference between two or
more alternatives.






Imagine that you manufacture paper bags and that you want to improve the tensile
strength of the bag. You suspect that changing the concentration of hardwood in the bag
will change the tensile strength. You measure the tensile strength in pounds per square
inch (PSI).

So, you decide to test this at 5%, 10%, 15% and 20% hardwood concentration levels.
These "levels" are also called "treatments."

Since we are only evaluating a single factor (hardwood concentration), this is called
one-way ANOVA.

The null hypothesis is that the means are equal:

H0: Mean1 = Mean2 = Mean3 = Mean4

The alternate hypothesis is that at least one of the means is different:

Ha: At least one of the means is different

To conduct the one-way ANOVA test, you need to randomize the trials (assumption #1).

Imagine that we've conducted these trials at each of the four levels of hardwood
concentration.

You'll find the results of these trials in the ANOVA test data provided with the QI Macros
at c:\qimacros\testdata\anova.xls.

The QI Macros will prompt you for the significance level you desire.

While the default is 0.05 (95% confident), in this example we want to be even more
certain, so we use 0.01 (99% confident).

Interpreting the Anova One Way test results

The QI Macros automatically compares the p value to alpha, but you might want to know how to do
this manually. The "null" hypothesis assumes that there is no difference between the hardwood
concentrations.

If | Then
test statistic > critical value (i.e. F > Fcrit) | Reject the null hypothesis
test statistic < critical value (i.e. F < Fcrit) | Accept the null hypothesis
p value < alpha | Reject the null hypothesis
p value > alpha | Accept the null hypothesis

The P-value of 0.000 is less than the significance level (0.01), so we can
reject the null hypothesis and safely assume that hardwood concentration affects
tensile strength. F (19.60521) is greater than F crit (4.938193), so
again, we can reject the null hypothesis.
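
To show where an F statistic of this kind comes from, the sketch below computes it by hand and applies the F > Fcrit rule. The tensile-strength values are an assumption made for illustration, since the slides do not list the raw trial data, and `one_way_anova_f` is a hypothetical helper; the critical value is the one quoted above for a 0.01 significance level.

```python
def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of samples, one per treatment."""
    all_values = [v for g in groups for v in g]
    grand_mean = sum(all_values) / len(all_values)
    # Variation of the treatment means around the grand mean.
    ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
    # Variation of individual trials around their own treatment mean.
    ss_within = sum((v - sum(g) / len(g)) ** 2 for g in groups for v in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Illustrative tensile strengths (PSI) per hardwood concentration.
# Assumed values: the raw data is not reproduced in the slides.
strength = {
    "5%": [7, 8, 15, 11, 9, 10],
    "10%": [12, 17, 13, 18, 19, 15],
    "15%": [14, 18, 19, 17, 16, 18],
    "20%": [19, 25, 22, 23, 18, 20],
}

f_stat, df_between, df_within = one_way_anova_f(list(strength.values()))
f_crit = 4.938193  # critical value quoted in the text for alpha = 0.01
reject_null = f_stat > f_crit
print(round(f_stat, 3), df_between, df_within, reject_null)
```

With these assumed values the F statistic is far above the critical value, so the decision rule in the table says to reject the null hypothesis.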


Now we can look at the average tensile strength and variances:

The average tensile strength increases, but we cannot say for certain which means differ. The variance at the 15% level looks
substantially lower than the other levels. We might need to do additional analysis.

If we reran the one way Anova test with just 10% and 15%, we'd discover there is no statistically significant difference
between the two means.

The P value (0.349) is greater than the significance level (0.01), so we cannot reject the null hypothesis that the means are
equivalent. And F (0.963855) is less than F crit (10.04429) so we cannot reject the null hypothesis.

Based on this analysis, if we were aiming for a tensile strength of 15 PSI or greater, the
10% level might be more cost effective.

Two Way ANOVA (Analysis of Variance) - Without Replication

What's cool about QI Macros Two-Way ANOVA?

Unlike other statistical software, the QI Macros is the only SPC software that compares the
p-values to the significance level and tells you when to "Accept or Reject the Null
Hypothesis" and what that tells you: "Means are Same or Different".

Two Way Anova (analysis of variance), also known as two factor Anova, can help you
determine if two factors have the same "mean" or average. This is a form of "hypothesis
testing."

The null hypothesis is that the means are equal:

•H0: Factor 1's Means = Factor 2's Means

The alternate hypothesis is:

•Ha: The means are different.

The goal is to decide whether to reject the null hypothesis (and conclude that the samples
have different means) at a certain confidence level (95% or 99%).


Using Excel and the QI Macros, run a two-way analysis without replication
(alpha=0.05 for a 95% confidence).

Click on QI Macros menu and select: ANOVA Two Factor without replication.

Interpreting the Anova Two Way Without Replication Results

In case you want to know how to do this manually, use these instructions.

If | Then
test statistic > critical value (i.e. F > Fcrit) | Reject the null hypothesis
test statistic < critical value (i.e. F < Fcrit) | Accept the null hypothesis
p value < alpha | Reject the null hypothesis
p value > alpha | Accept the null hypothesis

Here, the P-value for Rows (i.e., golfers) is less than alpha (0.05), so we can
reject the hypothesis that all of the golfers are the same. The P-value for Columns
(i.e., golf balls) is also less than alpha, so we can reject the hypothesis that
all of the golf balls are the same.


It does look like Brand B and C are similar. We could run a paired two sample t test on Brands B
and C to determine if they deliver the same average distance.

Since the p values are greater than alpha (0.05), we can accept the null hypothesis that
there is no difference between the two brands of golf balls, except perhaps price.

Two Way ANOVA (Analysis of Variance) With Replication

When to Use Two Way Anova

Two Way Anova (analysis of variance), also known as two factor Anova, can help
you determine if two or more samples have the same "mean" or average. This is a
form of "hypothesis testing."

The null hypothesis is that the means are equal. The alternate hypothesis is that
the means are not all equal.

•H0: Mean1 = Mean2 = Mean3

•Ha: At least one of the means is different

The goal is to decide whether to reject the null hypothesis (and conclude that the samples
have different means) at a certain confidence level (95% or 99%).


What if you have two populations of patients (male/female) and three different kinds of
medications, and you want to evaluate their effectiveness? You might run a study with
three "replications", three men and three women.

Using the QI Macros, run a two-way Anova analysis with replication (alpha=0.05 for a
95% confidence).

What's cool about QI Macros ANOVA?

Unlike other statistical software, the QI Macros is the only SPC software
that compares the p-values (0.179) to the significance (0.05) and tells
you to "Accept the Null Hypothesis because p>0.05" and that the
"Means are the same".

Interpreting the Anova Two Way Results

In case you want to know how to do this manually, use these instructions:

If | Then
test statistic > critical value (i.e. F > Fcrit) | Reject the null hypothesis
test statistic < critical value (i.e. F < Fcrit) | Accept the null hypothesis
p value < alpha | Reject the null hypothesis
p value > alpha | Accept the null hypothesis

Here, the P-value for Male/Female is greater than alpha (.179 > .05),
so we accept the null hypothesis that the means are the same. The P-value for Drugs is
greater than alpha (.106 > .05), so the null hypothesis holds as well (means are the same).
The P-value for the interaction of the drugs and patients is less than alpha (.006 < .05), so we
reject the null hypothesis and can say that the effectiveness of the drugs is not the same for
the two categories of patients.


References

1) Pervasive insights produce better business decisions: opening access to business intelligence by
embedding analytics capabilities into everyday software tools pays substantial dividends.
By Lauren Gibbons Paul

2) Data Mining Algorithms (Analysis Services - Data Mining)
http://msdn.microsoft.com/en-us/library/ms175595.aspx

3) Data Mining Query Task
http://msdn.microsoft.com/en-us/library/ms141728.aspx

4) Predictive Analysis with SQL Server 2008 - White Paper - Microsoft - Published: November 2007

5) Predictive Analytics for the Retail Industry - White Paper - Microsoft - Writer: Matt Adams,
Technical Reviewer: Roni Karassik, Published: May 2008

6) Breakthrough Insights using Microsoft SQL Server 2012 - Analysis Services
https://www.microsoftvirtualacademy.com/tracks/breakthrough-insights-using-microsoft-sql-server-2012-analysis-services

7) Useful DAX Starter Functions and Expressions
http://thomasivarssonmalmo.wordpress.com/category/powerpivot-and-dax/

8) Stairway to PowerPivot and DAX - Level 1: Getting Started with PowerPivot and DAX
By Bill_Pearson, 2011/12/21

9) Data Mining Tool
http://technet.microsoft.com/en-us/library/ms174467.aspx

10) DAX Cheat Sheet
http://powerpivot-info.com/post/439-dax-cheat-sheet