Data Mining

Oct 29, 2013 (3 years and 9 months ago)

Slides adapted by Michael Lang, NUI Galway, from Turban et al. (2005), Decision Support Systems and Intelligent Systems, 7th ed., © Prentice Hall


Data Warehousing & Data Mining


Some Definitions



A data warehouse (DW) is a collection of integrated databases designed to support a DSS

An operational data store (ODS) stores data for a specific application. It feeds the data warehouse a stream of desired raw data.

A data mart is a lower-cost, scaled-down version of a data warehouse, usually designed to support a small group of users (rather than the entire firm)

The metadata is information that is kept about the warehouse

Online Analytical Processing (OLAP) is the broad category of software technology that enables multidimensional analysis of enterprise data


Business Intelligence and Analytics


Business intelligence (BI)
Acquisition of data and information for use in decision-making activities

Business analytics (BA)
Models and solution methods

Web intelligence
Application of business intelligence techniques to Web sites

Web analytics
Application of business analytics to Web sites

Data mining
Applying models and methods to data to identify patterns and trends


Data Warehouse


Subject-oriented (as opposed to application-oriented)
Data is organised based on its intended use

Scrubbed and cleansed so that data from heterogeneous sources are standardised

Time series, historical data

Non-volatile (read only)

Summarised: in decision-usable format

Data from both internal and external sources is present

Metadata included
Business metadata
Semantic metadata


Data Warehouse: Environment


The organisation’s legacy systems and data stores
provide data to the data warehouse (DW) or mart


During the transfer of data from the various sources,
cleansing or transformation may occur, so the data in
the DW is more uniform


Simultaneously, metadata is recorded


Finally, the DW or mart may be used to create one or
more “personal” warehouses


Integration of Data Sources


Access needed to multiple sources


Often enterprise-wide


Disparate and heterogeneous databases


XML becoming language standard


External data sources: Web


Intelligent agents


Document management systems


Content management systems


External data sources: commercial databases


Might buy / sell access to specialised databases



Data Marts


Dependent


Created from warehouse


Replicated


Functional subset of warehouse


Independent


Scaled down, less expensive version of data
warehouse


Designed for a department or SBU


Organisation may have multiple data marts


Difficult to integrate



Migrating Data


Business rules


Stored in metadata repository


Applied to data warehouse centrally


Data extracted from all relevant sources


Loaded through data-transformation tools or programs


Separate operational and decision support environments


Correct problems in quality before data stored


Cleanse and organise in consistent manner
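
The migration steps listed above (extract from all relevant sources, apply centrally held business rules, correct quality problems before storing) can be sketched as a tiny extract-transform-load routine. This is only an illustrative sketch: the two source lists, the contents of BUSINESS_RULES and the plain Python list standing in for the warehouse are all hypothetical and do not come from the slides.

```python
# Hypothetical raw records from two operational sources with inconsistent formats.
SALES_SYSTEM = [{"cust": " Ann Byrne ", "country": "IE", "amount": "120.50"}]
WEB_SHOP = [
    {"cust": "JOHN KELLY", "country": "ireland", "amount": "80"},
    {"cust": "", "country": "IE", "amount": "15"},  # missing customer name
]

# Business rules held in one central place (standing in for a metadata repository).
BUSINESS_RULES = {
    "country_codes": {"ie": "IE", "ireland": "IE"},
    "required_fields": ["cust", "amount"],
}


def transform(record, rules):
    """Cleanse one record; return None if it fails a quality rule."""
    cleaned = {key: str(value).strip() for key, value in record.items()}
    if any(not cleaned.get(field) for field in rules["required_fields"]):
        return None  # reject before it is stored
    cleaned["cust"] = cleaned["cust"].title()
    cleaned["country"] = rules["country_codes"].get(cleaned["country"].lower(), "UNKNOWN")
    cleaned["amount"] = float(cleaned["amount"])
    return cleaned


def load(sources, rules):
    """Extract from every source, transform each record, keep the survivors."""
    warehouse, rejected = [], []
    for source in sources:
        for record in source:
            row = transform(record, rules)
            if row is None:
                rejected.append(record)
            else:
                warehouse.append(row)
    return warehouse, rejected


warehouse, rejected = load([SALES_SYSTEM, WEB_SHOP], BUSINESS_RULES)
print(warehouse)
print(rejected)
```

Keeping the rules in one dictionary mirrors the idea of applying them centrally: a change to a rule affects every source in the same way.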


Data Quality


Quality is critical


Quality determines
usefulness


Often neglected or casually handled


Problems exposed when data is summarised




Data Quality


Cleanse data


When populating warehouse


Data quality action plan


Best practices for data quality


Measure results


Data integrity issues


Uniformity


Version


Completeness check


Conformity check


Genealogy or drill-down
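
Two of the integrity checks named above, completeness and conformity, might look like the following when run before data is loaded. This is a minimal sketch under stated assumptions: the sample rows, the field names and the YYYY-MM-DD date format are invented for illustration.

```python
import re

# Hypothetical customer rows awaiting load into the warehouse.
rows = [
    {"id": 1, "email": "ann@example.com", "joined": "2004-03-15"},
    {"id": 2, "email": None, "joined": "2004-04-01"},
    {"id": 3, "email": "john@example.com", "joined": "01/05/2004"},
]

DATE_PATTERN = re.compile(r"\d{4}-\d{2}-\d{2}")  # assumed house format: YYYY-MM-DD


def completeness(rows, field):
    """Share of rows in which the field is present and non-empty."""
    filled = sum(1 for row in rows if row.get(field) not in (None, ""))
    return filled / len(rows)


def nonconforming(rows, field, pattern):
    """IDs of rows whose field value does not match the expected format."""
    return [
        row["id"]
        for row in rows
        if row.get(field) and not pattern.fullmatch(row[field])
    ]


email_completeness = completeness(rows, "email")
bad_dates = nonconforming(rows, "joined", DATE_PATTERN)
print(email_completeness, bad_dates)
```

Measuring the results of such checks over time is one way to track whether a data quality action plan is working.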


Advantages of Data Warehousing


Simplicity
a data warehouse provides a single image of business reality by integrating various data

Better quality data; improved productivity
consistency and accuracy lead to better and more productive decision-making; end-user computing boosts productivity

Fast access
necessary data is in one place, so system response time is cut

Easy to use
designed for specific informational needs of end users

Separate decision-support operation from production operation
speeds access, avoids conflict and integrity problems


Advantages of Data Warehousing


Gives competitive advantage
through better management and utilisation of corporate knowledge

Ultimate distributed database
a data warehouse pulls together information from disparate and potentially incompatible locations throughout the organisation

Information flow management
a data warehouse, especially the metadata, is helpful in the continual task of incrementally refining process workflows in a changing business environment

Enables parallel processing
users can ask questions that were too process-intensive to answer before, and a data warehouse can handle more users, transactions, queries, and messages

Robust processing engines
data warehouses allow users to directly obtain and refine data from different software applications without affecting the operational databases

Security
since clients of the data warehouse cannot directly query the production databases, the security of the production databases is increased


Disadvantages of Data Warehousing


Complexity and anticipation in development
you cannot just buy a data warehouse; you have to build one, because each warehouse has a unique architecture and a set of requirements that spring from the individual needs of the organisation

Takes time to build

Expensive to build

End-user training
It is necessary to create a new “mind-set” among all employees, who must be prepared to capitalise upon the innovative data analysis provided by data warehouses

Complexity involved in symmetrical multiprocessing (SMP) and massively parallel processing (MPP)


The Future of Data Warehousing


As the DW becomes a standard part of an
organisation, there will be efforts to find new ways to
use the data. This will likely bring with it several new
challenges:


Regulatory constraints may limit the ability to combine sources of disparate data (e.g. Data Protection Act)

These disparate sources are likely to contain unstructured data, which is hard to store

The Internet makes it possible to access data from virtually “anywhere”. Of course, this just increases the disparity.


Data Mining


Definition: “the analysis of data to discover previously unknown relationships that provide useful information” (Hand et al.)


Data mining makes use of statistical and
visualisation techniques to discover and present
information in a form that is easily
comprehensible


Data mining can be applied to tasks such as
decision support, forecasting, estimation, and
uncovering and understanding relationships
among data elements


Data Mining


Traditionally, the task of identifying and utilising information hidden in data has been achieved through some form of statistical method

Typically, this involves a user formulating a guess about a possible relationship in the data and evaluating this hypothesis via a statistical test. This is a largely time-intensive, user-driven, top-down approach to data analysis.

With data mining, the interrogation of the data is done by the data mining algorithm rather than by the user

Data mining is a self-organising, data-influenced, bottom-up approach to data analysis


Simply put, what data mining does is sort through masses of
data to uncover patterns and relationships, then build models
to predict behaviours


Web Mining


Web mining is a special case of data mining where the mining occurs over a website

It enhances the website with intelligent behaviour, such as suggesting related links or recommending new products

It allows you to unobtrusively learn the interests of the visitors and modify their user profiles in real time

It also allows you to match resources to the interests of the visitor


Data Mining: Why the Growth in Popularity?


One reason is that we keep getting more and more
data all the time and need tools to understand it


We also are aware that the human brain has trouble
processing multidimensional data


A third reason is that machine learning techniques
are becoming more affordable and more refined at
the same time


Verification -v- Knowledge Data Discovery

In the past, decision support activities were primarily based on the concept of verification

This required a great deal of prior knowledge on the decision-maker’s part in order to verify a suspected relationship

With the advance of technology, the concept of verification began to turn into knowledge data discovery



Knowledge Data Discovery


Knowledge data discovery (KDD) techniques include: statistical analysis, neural networks or fuzzy logic, intelligent agents, data visualisation


KDD techniques not only discover useful
patterns in the data, but also can be used
to develop predictive models



The Knowledge Discovery Search Process


Define the business problem and obtain the
data to study it


Use data mining software to model the
problem


Mine the data to search for patterns of
interest


Review the mining results and refine them by re-specifying the model


Once validated, make the model available to
other users of the DW


Analytic Systems


Real-time queries and analysis

Real-time decision-making

Real-time data warehouses updated daily or more frequently


Updates may be made while queries are active


Not all data updated continuously


Deployment of business analytic applications



On-line Analytical Processing (OLAP)

Activities performed by end users in on-line (i.e. “live” multi-user) systems

Specific, open-ended query generation e.g. SQL


Ad hoc reports


Statistical analysis


Building DSS applications


Modeling and visualisation capabilities


Special class of tools


DSS, BI, BA, DBMS, GIS, etc.


Multidimensional OLAP (MOLAP)


Data can be viewed across
several dimensions. Here
sales are arrayed by region
and product


A fourth dimension could be
added by using several
graphs, perhaps at different
time points


Most analyses have many more dimensions than this. MOLAP handles data as an n-dimensional hypercube
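
One way to picture the hypercube idea is a mapping from one coordinate per dimension to a measure, with "roll-up" and "slice" operations over it. The sketch below is a toy illustration only, not how any MOLAP product stores its data; the dimension names, members and sales figures are hypothetical.

```python
from collections import defaultdict

# Hypothetical fact records: (region, product, quarter, sales).
facts = [
    ("East", "Widgets", "Q1", 100),
    ("East", "Gadgets", "Q1", 50),
    ("West", "Widgets", "Q1", 70),
    ("East", "Widgets", "Q2", 120),
    ("West", "Gadgets", "Q2", 30),
]
DIMENSIONS = ("region", "product", "quarter")

# The "cube": every cell is addressed by one coordinate per dimension.
cube = defaultdict(float)
for region, product, quarter, sales in facts:
    cube[(region, product, quarter)] += sales


def roll_up(cube, keep):
    """Aggregate the cube down to the dimensions named in `keep`."""
    positions = [DIMENSIONS.index(name) for name in keep]
    result = defaultdict(float)
    for coords, value in cube.items():
        result[tuple(coords[i] for i in positions)] += value
    return dict(result)


def slice_cube(cube, dimension, member):
    """Fix one dimension to a single member (e.g. quarter == 'Q1')."""
    position = DIMENSIONS.index(dimension)
    return {coords: value for coords, value in cube.items() if coords[position] == member}


sales_by_region = roll_up(cube, ["region"])
sales_by_region_product = roll_up(cube, ["region", "product"])
q1_only = slice_cube(cube, "quarter", "Q1")
print(sales_by_region)
```

Adding a further dimension only lengthens the coordinate tuple, which is why the same operations work for any n.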



Relational OLAP (ROLAP)


A large relational database server replaces the
multidimensional one


The database contains both detailed and summarised
data, allowing “drill down” techniques to be applied


SQL interfaces allow vendors to build tools, both
portable and scalable


This requires databases with many relational tables
which may lead to substantial processor overhead on
complex joins
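
The relational approach can be sketched with Python's built-in sqlite3 module: a summary query gives the totals, and a "drill down" is a second query that joins in another table to break one total apart. The schema, table names and figures below are hypothetical and chosen only to keep the example small.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE region  (region_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE product (product_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE sales   (region_id INTEGER, product_id INTEGER, amount REAL);
    INSERT INTO region  VALUES (1, 'East'), (2, 'West');
    INSERT INTO product VALUES (1, 'Widgets'), (2, 'Gadgets');
    INSERT INTO sales   VALUES (1, 1, 100), (1, 2, 50), (2, 1, 70), (1, 1, 120);
""")

# Summary level: total sales per region (one join).
summary = conn.execute("""
    SELECT r.name, SUM(s.amount)
    FROM sales s JOIN region r ON r.region_id = s.region_id
    GROUP BY r.name ORDER BY r.name
""").fetchall()

# Drill down: break one region's total out by product (a second join).
drill_down = conn.execute("""
    SELECT p.name, SUM(s.amount)
    FROM sales s
    JOIN region r  ON r.region_id = s.region_id
    JOIN product p ON p.product_id = s.product_id
    WHERE r.name = ?
    GROUP BY p.name ORDER BY p.name
""", ("East",)).fetchall()
conn.close()

print(summary)
print(drill_down)
```

Each extra level of detail adds another join, which is where the processor overhead mentioned above comes from as the number of tables grows.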


Data Mining Technologies


Statistics: the most mature data mining technologies, but often not applicable because they need clean data. In addition, many statistical procedures assume linear relationships, which limits their use.

Neural networks, genetic algorithms, fuzzy logic: these technologies are able to work with complicated and imprecise data. Their broad applicability has made them popular in the field.


Data Mining Technologies


Decision trees: these technologies are conceptually simple and have gained in popularity as better tree-growing software was introduced. Because of the way they are used, they are perhaps better called classification trees.


Data Mining Techniques


Paralleling the popularity of data mining itself, the
development of new techniques is exploding as well


Many innovations are vendor-specific, which sometimes does little to advance the state of the art

Regardless, data-mining techniques tend to fall into four major categories:


classification


association


sequencing


clustering


Classification Methods


The goal is to discover rules that define whether
an item belongs to a particular subset or class of
data


For example, if we are trying to determine which households will respond to a direct mail campaign, we will want rules that separate the “probables” from the “not probables”.

These IF-THEN rules are often portrayed in a tree-like structure
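
To make the IF-THEN idea concrete, the sketch below writes a small rule tree by hand for the direct mail example. The household attributes and thresholds are invented for illustration; in practice a tree-growing algorithm would derive such rules from past campaign data rather than have them typed in.

```python
def likely_responder(household):
    """Hand-written IF-THEN rules arranged as a small tree.

    The attributes and thresholds are hypothetical; a real classification
    tree would learn them from historical responses.
    """
    if household["previous_purchases"] >= 2:
        return "probable"
    if household["income"] >= 40000:
        if household["has_children"]:
            return "probable"
        return "not probable"
    return "not probable"


households = [
    {"name": "A", "previous_purchases": 3, "income": 25000, "has_children": False},
    {"name": "B", "previous_purchases": 0, "income": 55000, "has_children": True},
    {"name": "C", "previous_purchases": 1, "income": 55000, "has_children": False},
    {"name": "D", "previous_purchases": 0, "income": 20000, "has_children": True},
]
labels = {h["name"]: likely_responder(h) for h in households}
print(labels)
```

Each path from the first test to a returned label corresponds to one rule, which is why such rule sets are naturally drawn as trees.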


Sequencing Methods


These methods are applied to time series
data in an attempt to find hidden trends


If found, these can be useful predictors of
future events


For example, customer groups that tend to purchase products tied in with hit movies would be targeted with promotional campaigns timed to release dates


Clustering Techniques


Clustering techniques attempt to create partitions in the
data according to some “distance” metric


Clustering aims to segment a diverse group into a
number of similar subgroups or clusters


The clusters formed are data grouped together simply by
their similarity to their neighbours


By examining the characteristics of each cluster, it may
be possible to establish rules for classification


In clustering, there are no predefined classes and no examples. The records are grouped together on the basis of self-similarity.
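
As one illustration of partitioning by a distance metric, the sketch below uses k-means with Euclidean distance. The slides do not name a specific algorithm, so k-means is an assumption made here for the example, and the customer data and starting centroids are hypothetical.

```python
import math


def k_means(points, centroids, iterations=10):
    """Plain k-means: assign each point to its nearest centroid, then re-centre."""
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for point in points:
            distances = [math.dist(point, c) for c in centroids]
            clusters[distances.index(min(distances))].append(point)
        centroids = [
            tuple(sum(axis) / len(cluster) for axis in zip(*cluster)) if cluster else centroid
            for cluster, centroid in zip(clusters, centroids)
        ]
    return clusters, centroids


# Hypothetical customers described by (age, annual spend in hundreds).
customers = [(22, 5), (25, 7), (24, 6), (58, 30), (61, 28), (60, 33)]
clusters, centroids = k_means(customers, centroids=[(20, 0), (70, 40)])
print(clusters)
print(centroids)
```

No class labels are supplied anywhere: the two groups emerge purely from how close the records are to one another, and inspecting each group afterwards is what may suggest classification rules.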


Association Methods


These techniques search all transactions from a system
for patterns of occurrence


A common method is market basket analysis, in which the sets of products purchased by thousands of consumers are examined

It finds affinity groupings that reveal which items are usually purchased with others, predicting the frequency with which certain items are purchased at the same time


Results are then portrayed as percentages; for example,
“30% of the people that buy steaks also buy charcoal”


Association: Market Basket Analysis


This is the most widely used and, in many ways, most
successful data mining algorithm


It essentially determines what products people
purchase together


Retailers can use this information to place these
products in the same area


Direct marketers can use this information to determine
which new products to offer to their current customers


Inventory policies can be improved if reorder points
reflect the demand for the complementary products


Market Basket Analysis Method


We first need a list of transactions to see what
was purchased. This can be easily obtained
from cash registers / POS devices.


Next, we choose a list of products to analyse,
and tabulate how many times each was
purchased with the others



A Convenience Store Example


Consider the following simple example about
five transactions at a convenience store:


Transaction 1: Pizza, cola, milk


Transaction 2: Milk, potato chips


Transaction 3: Cola, pizza


Transaction 4: Milk, biscuits


Transaction 5: Cola, biscuits


These need to be cross-tabulated and displayed in a table



A Convenience Store Example



Pizza and Cola sell together more often than any other combination; a cross-marketing opportunity?

Milk sells well with everything; people probably come here specifically to buy it

Product bought   Pizza also   Milk also   Cola also   Chips also   Biscuits also
Pizza            2            1           2           0            0
Milk             1            3           1           1            1
Cola             2            1           3           0            1
Chips            0            1           0           1            0
Biscuits         0            1           1           0            2
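
The cross-tabulation for the five convenience store transactions can be reproduced with a few lines of code. This is a minimal sketch: the transactions are those listed in the example, while the variable names and the printed layout are choices made here for illustration.

```python
from itertools import permutations

transactions = [
    {"Pizza", "Cola", "Milk"},
    {"Milk", "Chips"},
    {"Cola", "Pizza"},
    {"Milk", "Biscuits"},
    {"Cola", "Biscuits"},
]
products = ["Pizza", "Milk", "Cola", "Chips", "Biscuits"]

# table[a][b] = number of transactions containing both a and b;
# the diagonal table[a][a] is simply how often a was bought at all.
table = {a: {b: 0 for b in products} for a in products}
for basket in transactions:
    for item in basket:
        table[item][item] += 1
    for a, b in permutations(basket, 2):
        table[a][b] += 1

for product in products:
    print(f"{product:<10}", *(f"{table[product][other]:>3}" for other in products))
```

The table is symmetric, so only the diagonal and one triangle carry distinct information.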


Market Basket Analysis: Using the Results

The tabulations can immediately be translated into association rules and the numerical measures computed

Comparing this week’s table to last week’s table can immediately show the effect of this week’s promotional activities

Some rules are going to be trivial (e.g. hot dogs and buns sell together) or inexplicable / spurious (e.g. wheelbarrows sell best on Wednesdays?)
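
The slides do not name the numerical measures, but support, confidence and lift are the ones commonly computed for association rules, so the sketch below assumes those three. It reuses the five convenience store transactions; the function names are chosen here for illustration.

```python
transactions = [
    {"Pizza", "Cola", "Milk"},
    {"Milk", "Chips"},
    {"Cola", "Pizza"},
    {"Milk", "Biscuits"},
    {"Cola", "Biscuits"},
]


def support(items):
    """Fraction of all transactions that contain every item in `items`."""
    return sum(1 for basket in transactions if items <= basket) / len(transactions)


def confidence(antecedent, consequent):
    """Of the baskets containing the antecedent, the share that also contain the consequent."""
    return support(antecedent | consequent) / support(antecedent)


def lift(antecedent, consequent):
    """Confidence relative to how often the consequent is bought anyway."""
    return confidence(antecedent, consequent) / support(consequent)


# Rule under test: "people who buy pizza also buy cola".
antecedent, consequent = {"Pizza"}, {"Cola"}
pizza_cola_support = support(antecedent | consequent)
pizza_cola_confidence = confidence(antecedent, consequent)
pizza_cola_lift = lift(antecedent, consequent)
print(pizza_cola_support, pizza_cola_confidence, pizza_cola_lift)
```

A lift above 1 suggests the two items appear together more often than their individual popularity would predict; with only five baskets, though, any such figure is far too fragile to act on.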


Market Basket Analysis: Limitations


A large number of real transactions are needed
to do an effective basket analysis, but the data’s
accuracy is compromised if all the products do
not occur with similar frequency


The analysis can sometimes capture results that
were due to the success of previous marketing
campaigns (and not natural tendencies of
customers)


(Have a look at Amazon.com to see it in action)


Data Visualisation


Data visualisation is so powerful because the
human visual cortex converts objects into
information so quickly


See an example on the next slide where height
and shading add additional dimensions to the
figure …


Data Visualisation: An “Enlivened” Risk Analysis Report


Data Visualisation


Technologies which support visualisation and
interpretation include:


Digital imaging, GIS, GUI, tables, multi-dimensions, graphs, VR, 3D, animation

Helps to visually identify relationships and trends

Data manipulation allows real-time inspection of performance data / CPI benchmarks


Geographical Information Systems (GIS)

A Geographical Information System (GIS) is a special-purpose database that contains a spatial co-ordinate system


Computerised system for managing and manipulating
data with digitised maps


Used for modeling and simulations


A comprehensive GIS requires:


Data input from maps, aerial photos, etc.


Data storage, retrieval and query


Data transformation and modeling


Data reporting (maps, reports and plans)


GIS: Sample Applications


Capabilities of a GIS


In general, a GIS contains two types of data:


Spatial data: these elements correspond to a uniquely-defined location on earth. They could be in point, line or polygon form

Attribute data: these are the data that will be portrayed at the geographic references established by spatial data


Example (next slide): data from an opinion poll is
displayed for multiple regions in the USA. Clicking on
an area allows the user to drill down to the results for
smaller areas.
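
The pairing of spatial and attribute data can be sketched as records that hold a polygon alongside the values to display for it, with a point-in-polygon test standing in for a click on the map. Everything below is hypothetical: the region shapes, the approval figures and the use of the ray-casting test are illustrative choices, not a description of any particular GIS product.

```python
def point_in_polygon(point, polygon):
    """Ray-casting test: count how many polygon edges a horizontal ray crosses."""
    x, y = point
    inside = False
    for (x1, y1), (x2, y2) in zip(polygon, polygon[1:] + polygon[:1]):
        if (y1 > y) != (y2 > y):  # edge straddles the ray's height
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


# Hypothetical regions: spatial data (polygon) plus attribute data (poll result).
regions = [
    {"name": "North", "polygon": [(0, 5), (10, 5), (10, 10), (0, 10)], "approval": 0.54},
    {"name": "South", "polygon": [(0, 0), (10, 0), (10, 5), (0, 5)], "approval": 0.47},
]


def attributes_at(point):
    """Return the attribute data of the region containing the clicked point."""
    for region in regions:
        if point_in_polygon(point, region["polygon"]):
            return region["name"], region["approval"]
    return None


clicked = attributes_at((3, 7))
print(clicked)
```

Drilling down would repeat the same lookup against a finer set of polygons nested inside the selected region.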


Sample GIS Application: Telephone Polling Results

On the live map, clicking on an area allows the user to drill down and see results for smaller areas


Data Mining: Some Applications


Pharmaceuticals: Massive amounts of biological and clinical information can be analysed with data mining methods to discover new uses for existing drugs

Healthcare: Hospitals are using data mining to perform utilisation analysis and pricing analysis, to estimate outcomes, to improve preventive care, and to detect fraud and questionable practices

Banking: Data mining tools help banks to understand customer behaviour, conduct profitability analysis, improve cross-selling efforts, identify credit risk, identify customers for loan campaigns, tailor financial products to meet customer needs, seek new customers, and enhance customer service

Credit card companies: Predictors for credit card customer attrition and fraud are frequently identified via data mining. Successful users of data mining include American Express and Citibank.

Financial services: Security analysts are using data mining extensively to analyse large volumes of financial data in order to build trading and risk models for developing investment strategies


Data Mining: Some Applications


Telemarketing and direct marketing: In this sector, companies have gained big savings and are able to target customers more accurately by using data mining. Direct marketers are configuring and mailing their product catalogs based on customers' purchase history and demographic data.

Airlines: As the competition in the airline business increases, understanding customers' needs has become imperative. Airlines capture customer data in order to make strategic moves such as expanding their services to new routes.

Manufacturers: Data mining is widely used in manufacturing industries to control and schedule technical production processes.

Insurance companies: The insurance industry is data intensive. Data mining has recently provided insurers with a wealth of useful information extracted from huge databases for decision making.



Data Mining: Some Applications


Telecommunications: By applying the insights learned through data mining, telecommunications companies can identify products and services that maximise value and then use this information to establish marketing campaigns to improve market share. A common example in this industry is identifying factors that influence customer retention. In the US, telephone companies were famous for their price-cutting strategy in the past, but the new strategy is to know their customers better. Using data mining, telephone companies are able to provide customers with a great variety of new services they are likely to purchase.

Distribution and retailing: With the huge amount of consumer data flowing in daily from different sources, especially from e-commerce Web sites, data mining helps companies learn more about their customers and develop insights into their buying habits. Knowing the behaviours (e.g. likes and dislikes) of customers leads to better customer service and allows companies to create one-to-one relationships with customers, hopefully prolonging loyalty and prompting repeat business. As such, data mining is used extensively in the area of customer relationship management. Large users of data mining in the retailing industry include Wal-Mart and Victoria's Secret.

Remotely sensed data: Huge amounts of remotely sensed data are taken in every day from satellite images and other related sources. Data mining is used in prediction of weather, monitoring and reasoning about ozone depletion, etc.


Advantages of Data Mining


Provide better information to achieve competitive edge
This advantage is the primary motivation for data mining. Data mining has a powerful analytical ability to generate information, which allows an organisation to better understand itself, its customers, and the marketplace it competes in. When used as a marketing tool, data mining often results in a sharper competitive edge, an evidence-based selling approach, a customer-oriented marketing plan, shorter selling cycles, and reduced operational costs.

Add value to a data warehouse
A data warehouse by itself is just a large repository of unstructured data, and data mining is the process of analysing the data and transforming it into useful information. Organisations have experienced a payback of 10 to 70 times their data warehouse investment after data mining components are added.

Increase operating efficiency
Data mining's ability to quickly organise and analyse a large pool of data has dramatically increased workplace efficiency. It allows users to create complex financial statements in minutes, compared with weeks by traditional methods.



Advantages of Data Mining


Provide flexibility in using data
With data mining, users gain control over the data. Instead of letting the system push the data, users are now able to pull the data they need. Users can let their imagination run and manipulate data in various ways to answer their questions. The easy-to-use interface of data mining tools and client/server technology has made the information directly accessible by individual users.

Reduce operating costs
Modern data mining tools are made of highly sophisticated hardware and software components. These allow the tools to analyse massive data sets efficiently with reduced operating costs (e.g. the high costs faced by public sector organisations such as healthcare providers when asked to answer a “parliamentary question” raised in the Oireachtas could be reduced by the use of data warehouses and data mining).

Ready-to-use
Unlike traditional data analysis methods, data mining hardly requires pre-processing of data prior to analysis. It can use a mixture of numeric, categorical, and date data, and can tolerate missing and noisy data. The results are in the form of ready-to-use business rules with almost no statistical expertise and guesswork needed.

Solve research bottleneck
In many social science and business situations, conducting real experiments is almost impossible. Data mining is able to provide these research agendas with a more limited set of working hypotheses for further investigation based on large, unstructured data sets.


Disadvantages of Data Mining


No definitive answer
Data mining yields useful insights and clues but no definitive answers. The definitive answers need to be achieved through much more rigorous scientific experimentation. Experiences from Wall Street have shown that this technology may not outperform traditional methods. Therefore, users should have a realistic expectation of the results of data mining.

High cost
The cost of implementing data mining is quite high; thus, it may not be appropriate in some business environments. ROI needs to be justified by cost-benefit analysis.

Complex and lengthy project
Experience from data mining system developers has shown that it takes a long time to get the project right. Developers suggest focusing on incremental development and benefits.

Privacy
The detailed data about individuals used in data mining might involve a violation of privacy. This problem worsens when the World Wide Web is involved, because detailed personal information is easily accessible and can fall into the wrong hands.


Disadvantages of Data Mining


Knowledge requirement of user
Despite its increasingly simple interface and automation of the thinking processes, data mining is more suitable for people with statistical, operations research, and management science backgrounds. Ease of use becomes a critical factor for attracting more businesses to invest in this technology.

Unmanageable database
Many authors have suggested that organisations must increase the size of their databases tremendously in order to do data mining. However, some are concerned that this will result in unmanageable and unnecessary databases.

Wrong information from errors in data
The massive data used in data mining inevitably contains mistakes caused by human errors. Information generated should be used with caution to avoid lawsuits in areas such as hiring. Experts suggest using only relevant information for mining to reduce such risks.


Additional Resources


See case studies of successful implementations at:
http://www.sas.com/success/technology.html


See product demos at:
http://www.sap.com/solutions/analytics/


CIO Magazine - ERP Resources:
http://www.cio.com/enterprise/erp/


White papers available from:
http://www.datawarehousing.com/papers.asp


Industry research reports available from:
http://www.datawarehousingonline.com


The Data Warehousing Information Center:
http://www.dwinfocenter.org