DATA WAREHOUSING/ DATA MINING

Nov 20, 2013


12/5/2012

Big Data

TABLE OF CONTENTS


EXECUTIVE SUMMARY
INTRODUCTION
BUSINESS USES
TOOLS
COSTS AND RESOURCES
CONCLUSION

EXECUTIVE SUMMARY

The rate and volume of data being created and stored each year are growing exponentially. This report introduces the origins, evolution and current trends in Data Warehousing. The different types of data warehouses, along with the benefits and drawbacks of each type, will be explored. Available tools and methodologies for Data Mining will be introduced. Specific business uses and case studies will be discussed, as well as common uses of certain products. Costs, benefits and related business options that drive management decisions regarding implementation of these systems will be presented. Techniques for determining return on investment, along with actual results from implementations, will be provided.

INTRODUCTION

Digitally stored data has increased from 1% in 1996 to over 94% in 2007 (Paul, 2011). Current estimates are that approximately 1.2 zettabytes (1 zettabyte is equal to 1 billion terabytes (Roe)) of new data were created worldwide in 2010 and 1.8 zettabytes in 2011, and that figure is predicted to grow to approximately 7.9 zettabytes by 2015. Storing and analyzing enormous volumes of data can be a monumental but critical task for businesses. Data Warehousing and Data Mining are two of the processes used for those tasks.
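As a rough illustration of the growth these estimates imply, the sketch below converts the quoted figures to terabytes and derives the compound annual growth rate between 2010 and 2015. The function and variable names are illustrative assumptions, and the 2015 value is the prediction quoted above, not a measurement.

```python
# Figures quoted in the text (zettabytes of new data created per year).
# The 2015 value is a prediction, not a measurement.
ZETTABYTES_CREATED = {2010: 1.2, 2011: 1.8, 2015: 7.9}

TERABYTES_PER_ZETTABYTE = 10**9  # 1 zettabyte = 1 billion terabytes


def to_terabytes(zettabytes: float) -> float:
    """Convert zettabytes to terabytes."""
    return zettabytes * TERABYTES_PER_ZETTABYTE


def compound_annual_growth_rate(start: float, end: float, years: int) -> float:
    """Constant yearly growth rate that turns `start` into `end` after `years`."""
    return (end / start) ** (1 / years) - 1


if __name__ == "__main__":
    rate = compound_annual_growth_rate(
        ZETTABYTES_CREATED[2010], ZETTABYTES_CREATED[2015], years=5
    )
    print(f"2010 volume: {to_terabytes(ZETTABYTES_CREATED[2010]):,.0f} TB")
    print(f"Implied growth 2010-2015: {rate:.1%} per year")
```

Under these figures the implied growth is a little under 50% per year, which is why storage and analysis capacity has to be planned well ahead of current volumes.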

The classic definition of a data warehouse comes from Ralph Kimball, considered to be one of the founding architects of the technology. He says that “a data warehouse is a copy of transaction data specifically structured for query and analysis” (1keydata, 2012). As with any concept in the field of information technology, there are competing opinions and definitions. The most common alternative comes from Bill Inmon, who is also considered one of the founding fathers of data warehousing. By his definition, “a data warehouse is a subject-oriented, integrated, time-variant and non-volatile collection of data in support of management's decision making process” (1keydata, 2012). In practice, the two definitions and data warehousing philosophies differ mainly in their approach to data storage. Inmon’s methodology maintains that a data warehouse should follow a top-down approach, where a large enterprise-wide database is the repository for all corporate data and smaller departmental data marts (a data mart is a subset of a data warehouse) are used for specific analytical purposes. Kimball’s approach is the opposite: he maintains that the departmental data marts should be used to feed their data into the corporate data warehouse (Anupindi, 2005).

Structurally, there are two approaches for creating data warehouses. The first is favored by Kimball and is called the dimensional model. Dimensional modeling uses a design that focuses on storing data so that it is easy to access for reporting. A dimensional model consists of two categories, dimensions and measures. Dimensions are the things being modeled, such as a product, a time frame or a store. Each dimension is typically stored in its own database table with a unique field called a key. A key is a field containing a unique value that is used to link information from one table to another. The number of columns in the table depends upon how many attributes should be stored for that dimension. If a retail chain wanted to create a dimensional data warehouse, it might have dimension tables for its stores, dates, products, and possibly dozens more. It is critical to understand that sales counts or dollars are not stored in dimension tables. Those figures are instead loaded into measure tables, otherwise known as fact tables. A fact table holds the key of each related dimension table along with the measure itself; fact tables typically contain numeric data that can be aggregated, such as sales numbers or inventory counts. In the case of a hypothetical retail chain, a fact table would have columns for the store key, date key, product key and the quantity to be stored, along with keys for any other applicable dimensions. The dimensional design is frequently referred to as a star schema, because it has the appearance of a star when laid out on paper or in a data modeling tool (Mitra, The 101 Guide to Dimensional Data Modeling, 2012).
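The retail example above can be sketched in a few lines using Python's built-in sqlite3 module. This is a minimal illustration under stated assumptions, not a production design: the table names, column names and sample rows are all hypothetical.

```python
import sqlite3

# Hypothetical retail star schema: three dimension tables and one fact table.
# All table names, column names and sample rows are illustrative assumptions.
SCHEMA = """
CREATE TABLE dim_store   (store_key   INTEGER PRIMARY KEY, city TEXT);
CREATE TABLE dim_date    (date_key    INTEGER PRIMARY KEY, calendar_date TEXT);
CREATE TABLE dim_product (product_key INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE fact_sales (
    store_key   INTEGER REFERENCES dim_store(store_key),
    date_key    INTEGER REFERENCES dim_date(date_key),
    product_key INTEGER REFERENCES dim_product(product_key),
    quantity    INTEGER  -- the measure (descriptive attributes stay in the dimension tables)
);
"""


def build_warehouse() -> sqlite3.Connection:
    """Create the schema in memory and load a few sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO dim_store VALUES (?, ?)", [(1, "Austin"), (2, "Boston")])
    conn.executemany("INSERT INTO dim_date VALUES (?, ?)", [(20121205, "2012-12-05")])
    conn.executemany("INSERT INTO dim_product VALUES (?, ?)", [(10, "DVD player"), (11, "Television")])
    conn.executemany(
        "INSERT INTO fact_sales VALUES (?, ?, ?, ?)",
        [(1, 20121205, 10, 3), (1, 20121205, 11, 1), (2, 20121205, 10, 5)],
    )
    return conn


def units_sold_by_product(conn):
    """Aggregate the quantity measure, joining the fact table to one dimension."""
    query = """
        SELECT p.name, SUM(f.quantity)
        FROM fact_sales AS f
        JOIN dim_product AS p ON p.product_key = f.product_key
        GROUP BY p.name
        ORDER BY p.name
    """
    return conn.execute(query).fetchall()


if __name__ == "__main__":
    print(units_sold_by_product(build_warehouse()))
```

The fact table carries only keys and the numeric measure, so a report needs just one join per dimension it wants to describe.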

The second method for creating data warehouses is the approach recommended by Inmon and is called the Relational Model. Relational modeling mimics the structure of a normalized relational database of the kind typically used for online transaction processing (OLTP) systems. This model consists of entities and relationships, with the primary goal of eliminating data redundancy through a process known as Data Normalization. Normalization is classified according to five levels called forms, each level adhering to additional design rules and characteristics. The goal of the first normal form is to eliminate repeating groups in tables by creating separate tables for related data and identifying each set of related data with a primary key. The goal of the second normal form is to create separate tables for sets of values that apply to multiple records and to relate these tables with a foreign key. Finally, the third normal form eliminates fields that do not depend on the key. Typically, the third normal form is the highest that most organizations implement. It is important to recognize that these forms are guidelines and not absolutes, but following them is generally recommended (Kent, 1982).
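To make the idea of normalization concrete, the sketch below splits a flat set of order records, in which customer details repeat on every row, into a customers table and an orders table linked by a key. The record layout and names are hypothetical, and the example covers only one simple case of removing redundancy rather than the full set of normal forms.

```python
# Hypothetical order records as they might appear in a flat, unnormalized file.
# Customer details repeat on every row; all names and values are illustrative.
FLAT_ORDERS = [
    {"order_id": 1, "customer": "Acme Corp", "customer_city": "Dallas", "product": "Widget", "quantity": 4},
    {"order_id": 2, "customer": "Acme Corp", "customer_city": "Dallas", "product": "Gadget", "quantity": 1},
    {"order_id": 3, "customer": "Bolt Ltd", "customer_city": "Denver", "product": "Widget", "quantity": 7},
]


def normalize(flat_rows):
    """Split flat rows into a customers table and an orders table.

    Each customer is stored once under a generated primary key, and each
    order refers to its customer through that key (a foreign key).
    """
    customers = {}    # customer_id -> {"name": ..., "city": ...}
    ids_by_name = {}  # customer name -> customer_id
    orders = []
    for row in flat_rows:
        name = row["customer"]
        if name not in ids_by_name:
            customer_id = len(customers) + 1
            ids_by_name[name] = customer_id
            customers[customer_id] = {"name": name, "city": row["customer_city"]}
        orders.append({
            "order_id": row["order_id"],
            "customer_id": ids_by_name[name],
            "product": row["product"],
            "quantity": row["quantity"],
        })
    return customers, orders
```

After the split, a customer's city is stored in exactly one place, so a change of address is a single update instead of one update per order.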

The dimensional and relational approaches each have design drawbacks. The first drawback associated with dimensional modeling deals with hierarchical data. Consider a dimension table that stores information about DVDs, of which there are two types, standard and Blu-Ray. The model offers two ways of storing this data. The first is the traditional star schema, where both types are stored in the products dimension table. The second, called a snowflake schema, is to create another dimension table to store the two types. The second dimensional modeling drawback deals with changes in dimension tables. There are three types of dimensions: unchanging, slowly changing and rapidly changing. An unchanging dimension contains static data, for example the days of the week. A slowly changing dimension has an attribute that changes infrequently, such as the price of a product or a person’s phone number. A rapidly changing dimension has an attribute that changes often, possibly daily or even more frequently, such as the number of text messages sent or minutes used on a mobile phone plan. The main point to remember is that how often the attributes of a dimension change will play a part in the physical design of its table (Mitra, History Preserving in Dimensional Modeling, 2012).
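One common way to handle a slowly changing dimension, offered here as an assumption rather than as the design this report prescribes, is to keep history by adding a new row each time an attribute changes instead of overwriting the old value. The sketch below applies that idea to a hypothetical product dimension; the column names and the use of `None` to mark the current row are illustrative choices.

```python
from datetime import date

# A hypothetical product dimension that keeps price history by versioning rows.
# Column names and the open-ended `valid_to=None` convention are assumptions.


def apply_price_change(dimension_rows, product_id, new_price, change_date):
    """Close the current row for `product_id` and append a new current row."""
    next_key = max((row["product_key"] for row in dimension_rows), default=0) + 1
    for row in dimension_rows:
        if row["product_id"] == product_id and row["valid_to"] is None:
            if row["price"] == new_price:
                return dimension_rows  # nothing changed, keep the current row
            row["valid_to"] = change_date
    dimension_rows.append({
        "product_key": next_key,   # surrogate key referenced by fact rows
        "product_id": product_id,  # business identifier, stable across versions
        "price": new_price,
        "valid_from": change_date,
        "valid_to": None,          # None marks the current version
    })
    return dimension_rows


def current_price(dimension_rows, product_id):
    """Return the price on the row that is still open for `product_id`."""
    for row in dimension_rows:
        if row["product_id"] == product_id and row["valid_to"] is None:
            return row["price"]
    raise KeyError(product_id)
```

This approach suits attributes that change infrequently; for a rapidly changing attribute the table would grow by a row on every change, which is one reason the rate of change influences the physical design.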


The relational modeling design has its own set of drawbacks. Normalization often spreads data across a complex series of tables and keys, which can degrade query performance. Another drawback is the complexity of the connections between data sources that are often required to obtain any meaningful information.


Regardless of the method chosen to implement a data warehouse, its primary usage is for data mining. Generally, data mining is the process of analyzing data from different perspectives and summarizing it into useful and actionable information. That information can then be used to increase revenue, cut costs, or both. Technically, data mining is the process of finding correlations or patterns among dozens of fields in large relational databases (Palace, 1996). There are several major techniques used for data mining, including association, classification, clustering, prediction, and sequential patterns. Below is a brief overview of each technique.

The technique of association attempts to discover relationship patterns. In retail, for example, it can uncover a relationship between an item and other items frequently purchased in the same transaction. This relationship could then be applied to transactions in which the associated item was not purchased, where targeted marketing efforts could result in incremental sales of that item. Classification is a highly mathematical process that assigns items to a group or groups using techniques like decision trees, linear programming, and neural networks. Clustering automatically groups items that have similar characteristics. A comparable analogy is a library where books are grouped by subject. Prediction is a technique utilizing regression to discover relationships between dependent and independent variables. Finally, sequential pattern analysis attempts to discover relationships by analyzing business transactions over a period of time (Pham, 2012).
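The association technique described above can be sketched by counting how often pairs of items appear in the same basket. The example below is a simplified illustration limited to item pairs; the baskets, the threshold and the function name are hypothetical, and real tools use more scalable algorithms.

```python
from collections import Counter
from itertools import combinations

# Hypothetical shopping baskets; the items and the threshold are illustrative.
TRANSACTIONS = [
    {"dvd player", "dvd", "hdmi cable"},
    {"dvd player", "hdmi cable"},
    {"dvd", "popcorn"},
    {"dvd player", "dvd", "hdmi cable"},
]


def pair_rules(transactions, min_support=0.5):
    """Return (antecedent, consequent, support, confidence) for frequent item pairs.

    support    = share of baskets containing both items
    confidence = share of baskets with the antecedent that also hold the consequent
    """
    item_counts = Counter(item for basket in transactions for item in basket)
    pair_counts = Counter(
        pair for basket in transactions for pair in combinations(sorted(basket), 2)
    )
    total = len(transactions)
    rules = []
    for (a, b), count in pair_counts.items():
        support = count / total
        if support < min_support:
            continue  # pair is too rare to act on
        rules.append((a, b, support, count / item_counts[a]))
        rules.append((b, a, support, count / item_counts[b]))
    return sorted(rules)
```

In this sample data every basket with a DVD player also contains an HDMI cable, so a customer who buys the player without the cable would be a natural target for a follow-up offer.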

Graphic of data warehousing architecture:

http://datawarehouse4u.info/




Graphic showing the volume of new data created every minute of the day:

http://consumer.media.seagate.com/2012/06/the-digital-den/how-much-data-is-generated-in-a-minute/













Differences between relational and dimensional modeling:

http://www.laynetworks.com/Diamension-Modeling11.htm


Relational Data Modeling | Dimensional Data Modeling
Data is stored in RDBMS | Data is stored in RDBMS or multidimensional databases
Tables are units of storage | Cubes are units of storage
Data is normalized and used for OLTP; optimized for OLTP processing | Data is denormalized and used in data warehouse and data mart; optimized for OLAP
Several tables and chains of relationships among them | Few tables; fact tables are connected to dimensional tables
Volatile (several updates) and time variant | Non-volatile and time invariant
SQL is used to manipulate data | MDX is used to manipulate data
Detailed level of transactional data | Summary of bulky transactional data (aggregates and measures) used in business decisions
Normal reports | User friendly, interactive, drag and drop multidimensional OLAP reports
Typical data design used for business transaction systems | Data design used for analysis systems
Goal: reduce every piece of information to its simplest form (a debit transaction, a customer record, an address) | Goal: break up information into ‘Facts’ (things a company measures) and ‘Dimensions’ (how we measure them: by time, region, or customer)
Suited for concurrent handling of many small transactions by many users. Only a limited amount of data history is normally kept. | Suited for reading or analyzing large amounts of data by a modest number of users. Many years of data history may be kept.
User is usually constrained by an application that understands the data design. Users are typically operations staff. | This simpler data design makes it easier for users to analyze data in any way they choose. Users are typically analysts, company strategists, or even executives.











BUSINESS USES

Businesses from mom and pop organizations to billion dollar corporations use information to increase sales, manage inventory, and predict trends. Following are some examples of industry specific usage of data warehousing and data mining.

Telecommunications is a continually evolving, competitive industry with access to large volumes of data. The main providers, such as Verizon Communications and AT&T, have compiled information about their respective customer bases for decades, and constantly seek ways of utilizing this information to achieve their goals. According to Goran, a few of the telecommunications goals that data mining can help with include identifying which customers are most likely to leave, determining how to increase customer loyalty, identifying which additional products and services customers will have a propensity to buy, and analyzing traffic patterns to assist with capacity planning.

Analyzing which customers are most likely to churn involves evaluating customer feedback, paired with common triggers for leaving. In an effort to prevent subscriber loss and demonstrate true concern for enhancing the quality of service provided, Verizon Wireless solicits feedback randomly from customers within a few days after they have completed a transaction with any channel of sales or support. An automated attendant places a call to the customer, asks for a rating of 0-10 in response to the question “Overall, how likely is it you’d recommend Verizon Wireless to a friend or colleague?” and prompts them to select the number on their phone keypad. Once the digit is input, the customer is also given an opportunity to record a message if they would like to provide additional insight about their recent experience. The survey question is carefully worded not only to gauge whether the customer is reasonably satisfied with their interaction, but also to understand whether the customer will go on to recommend the company in a positive light. Furthermore, giving the client a chance to tell more about their experience beyond the rating alone affords them an opportunity either to rave about Verizon or, on the opposite note, to vent about any outstanding concerns. Based on historical data, Verizon knows that a customer who rates “extremely likely” (and does not convey any areas of opportunity in the verbatim section) will remain a customer and provide exceptional word-of-mouth advertising. Any client who responds “neutral” is in danger of churning, will passively go about their business, and will not promote the company’s services. A rating of “not likely at all” means the client is highly likely to churn and will also speak negatively about the organization, which will detract from potential growth.
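The way such survey ratings feed a churn analysis can be sketched as a simple bucketing step. The cut-off values below are assumptions borrowed from the common Net Promoter convention; the report does not state the thresholds Verizon actually uses, and the function names are hypothetical.

```python
# Hypothetical cut-offs for a 0-10 "likely to recommend" rating. The text does
# not give Verizon's actual thresholds; these follow the common Net Promoter
# convention and are an assumption.
PROMOTER_MIN = 9
NEUTRAL_MIN = 7


def churn_risk(rating: int) -> str:
    """Map a 0-10 survey rating to a coarse churn-risk label."""
    if not 0 <= rating <= 10:
        raise ValueError("rating must be between 0 and 10")
    if rating >= PROMOTER_MIN:
        return "low"     # likely to stay and recommend
    if rating >= NEUTRAL_MIN:
        return "medium"  # passive, in danger of churning
    return "high"        # likely to churn and speak negatively


def customers_to_contact(responses):
    """Return customer ids with high churn risk, lowest ratings first."""
    at_risk = [(rating, cid) for cid, rating in responses.items() if churn_risk(rating) == "high"]
    return [cid for rating, cid in sorted(at_risk)]
```

A list produced this way could drive the manager follow-up calls, with the recorded comments reviewed alongside the numeric rating.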



Once a survey is completed, steps are taken to increase customer loyalty. As indicated, the respondent may hear from a manager “to better understand the feedback to improve the service.” In addition to supporting churn prevention and increased customer loyalty, the verbatim information can be categorized into different areas of opportunity. For instance, interaction, service, billing, coverage, or equipment-related answers guide the business decisions that will help both customers and Verizon achieve their desired goals. In short, everybody wins!

Data warehousing and data mining are also widely used within the healthcare industry. For example, historically, if a patient sought the most advanced expertise and, therefore, the best breast cancer treatment, they had to travel to a highly esteemed oncology practice, such as the Mayo Clinic in Rochester, Minnesota or Johns Hopkins in Baltimore, Maryland. According to the British Journal of Cancer article written by V Patkar and colleagues, “computerized decision support (CDS) systems enable the use of evidence-based clinical guidelines” to standardize triage and treatment steps for breast cancer throughout the field of medicine. While some remotely located physicians may not have the hands-on experience that doctors have in the top-rated oncology centers, they are gaining access to systems that assist with the most effective steps for evaluating and treating breast cancer. Review the images below for a better understanding:

Flow-chart of the Tallis Triple Assessment Workflow:


Decision selection allows access to further research of steps to be taken:


Systems like this are dependent upon data collection and compilation. The full implementation of warehoused data and mining for similar case studies makes it possible to minimize improper diagnosis, reduce the need for second opinions, and achieve faster handling of time-sensitive, life-threatening diseases like breast cancer. These enhancements to treatment could ultimately lead to more lives saved, reductions in malpractice claims and coverage costs, and increased patient confidence, just to name a few of the benefits. The opportunities are extensive.


Data mining can assist in nearly every step within the process of manufacturing goods as well. As early as 2002, several companies had begun to utilize data within certain phases of their work, even if it was not fully integrated into their process. Braha cites “Caterpillar for early adoption into quality control and warranty claims analysis and Boeing for use in post-flight diagnostics” just to name a few.

Caterpillar not only put data to work within their culture but even went so far as to join forces with StatSoft in 2004 to “develop and market software solutions for the modeling, optimization and simulation of complex manufacturing processes” (Quality Digest). The union of these organizations continues to advance the causes of process monitoring, failure prediction and prevention, root-cause analysis, and production optimization both inside and beyond Caterpillar’s industry. Proceed is Caterpillar’s trademarked solutions software.




TOOLS

Data Mining is an attempt to discover hidden rules in a vast amount of information. It is an informational tool that sifts through large amounts of information, finding patterns and relationships and taking what is valuable from it (DuFrene 215). The information itself is stored in the data warehouse and may include, but is not limited to, a company’s trends, products, consumers and processes. Data mining tools and data warehouses are two separate entities: one sorts through information and pulls out what is valuable (data mining) and the other stores it for up to ten years (data warehouse). Data mining tools vary from vendor to vendor. From Oracle to IBM, the tools are meant to help managers make sound decisions based on the information held in the data warehouse. Extract, Transform, Load (ETL) tools are used to make the mining process more reliable (etltools.net).

Among other uses, the data warehouse is a valuable tool for management reporting. A collection process begins using any one of a number of different tools, which vary depending on the type of data being collected. They can range from a shop floor PLC (Programmable Logic Controller), which collects information such as tolerance limits, to large ERP (Enterprise Resource Planning) software collecting information about sales, marketing, and inventory.

Once the information is collected, the data mining can begin. Users from business analysts through executives can ask questions of this vast array of collected data to make informed decisions about operations, marketing, or any other area in which data has been collected.

Oracle is known for its leadership in database software. One data collection technology supported in Oracle’s software offerings is Radio Frequency Identification (RFID), which allows businesses to tag products that are being shipped to various locations. This allows a business to keep an accurate inventory count at all times and alerts the proper departments when it is necessary to reorder materials.


ETL (Extract, Transform, Load) is a ten step process that works by taking information from internal or external sources. The data is transformed into a standardized format for storage and then loaded into the data warehouse.
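The three ETL stages can be sketched with Python's standard library. This is a minimal illustration and not a model of any commercial tool or of the full ten step process: the source data, column names and cleaning rules are hypothetical, and an in-memory SQLite database stands in for the warehouse.

```python
import csv
import io
import sqlite3

# Hypothetical source extract; in practice this would come from an internal or
# external system. The column names and cleaning rules are assumptions.
SOURCE_CSV = """store,sale_date,amount
 austin ,12/05/2012,19.99
BOSTON,12/05/2012,5.00
boston,12/06/2012,
"""


def extract(text):
    """Read raw rows from the source as dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Standardize store names and dates, and drop rows with no amount."""
    cleaned = []
    for row in rows:
        if not row["amount"].strip():
            continue  # incomplete record, not loaded
        month, day, year = row["sale_date"].split("/")
        cleaned.append((row["store"].strip().title(), f"{year}-{month}-{day}", float(row["amount"])))
    return cleaned


def load(rows, conn):
    """Insert the standardized rows into the warehouse table."""
    conn.execute("CREATE TABLE IF NOT EXISTS sales (store TEXT, sale_date TEXT, amount REAL)")
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)
    conn.commit()


if __name__ == "__main__":
    warehouse = sqlite3.connect(":memory:")
    load(transform(extract(SOURCE_CSV)), warehouse)
    print(warehouse.execute("SELECT * FROM sales").fetchall())
```

Keeping the three stages as separate functions mirrors how ETL tools let each stage be tested and rerun on its own.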

There are many commercial ETL tools, offered by companies such as IBM, Oracle, and Microsoft. The choice among these tools is narrowed down by cost and the volume of data to be extracted. IBM offers InfoSphere DataStage, a tool that integrates data on demand, allowing direct access to “big data” and helping customers address the most challenging data volumes in a business (http://www-01.ibm.com/software/data/infosphere/datastage/, November 15, 2012). InfoSphere DataStage is a data integration software program that enables business developers to maximize flexibility, speed and effectiveness while their customers accelerate problem resolution and work smarter.



ETL tools connect data sources to the warehouse, while Business Intelligence (BI) mining tools are used for finding patterns that will help improve processes. Programs like Excel and Access provide a low end solution for data mining, while more sophisticated (and more costly) tools are available for larger companies with more complex needs.

Microsoft is a software company that uses both types of tools to run its business effectively. SQL Server Integration Services (SSIS) is Microsoft’s ETL tool for building enterprise level data integration, allowing the transformation of data from any data source type without the need for additional coding (msdn.microsoft.com). Microsoft’s BI tools include PowerPivot, PerformancePoint, and SQL Server Reporting Services and Analysis Services (crn.com, November 15, 2012). These BI tools help configure and tailor solutions to clients’ needs.

Business Intelligence tools are used for reporting, dashboarding and analysis. According to businessintelligencetoolbox.com, a case study performed by Microsoft on Great Western Bank noted the bank’s profitability after adopting Microsoft’s BI tools. Great Western Bank had been using old reporting tools which forced clients to wait as long as a week for help desk assistance. This made it difficult for banking officers to identify dissatisfied customers and cross-sell to them. Great Western needed a solution that would support its growth while still maintaining regulatory compliance. To reduce time-to-solution, Great Western Bank used Power View, one of the BI tools offered by Microsoft, which is used to graph reports and is user friendly for those who are non-technical (Microsoft.com).

Data mining tools such as Power View, SQL Server, InfoSphere DataStage and others are used to help corporations run more efficiently and effectively. They enable analysts to report information that drives management decisions.



COSTS AND RESOURCES

Cost is always a factor in considering viability of upgrades to systems. Determining the return on
investment of such expenditures is complex and varies from business to business.

TDWI is “the premier provider of in-depth, high-quality education and research in the business intelligence and data warehousing industry” (Russom, 3). TDWI and other firms conducted a survey of 375 data management professionals of varying backgrounds and narrowed the scope to 278 by removing vendors from the survey respondents. TDWI also conducted phone interviews and received product briefings from vendors.

Respondents by Position:


Of the respondents, 34% felt that their current tools or platforms were holding back performance, 61% considered cost the biggest obstacle to high performance data warehouse (HPDW) success, and 66% called HPDW extremely important. According to TDWI, 61% can meet their needs with “a moderate amount of tweaking” (Russom, 17).

IBM produced a report comparing three platforms for data warehouses and the costs associated with implementing each: IBM Smart Analytics 770, Oracle Exadata Database Machine, and Teradata’s flagship Active Enterprise Data Warehouse (Active EDW) 6650. The research and analysis were conducted by the International Technology Group (ITG). Cost analytics are provided based on the size of the data.

Initial implementation costs run from $679K to $1.41M for the smallest models up to $4.52M to $5.68M for larger models. Three year cost comparisons run from $2.08M to $2.16M for the smallest models up to $7.26M to $13.22M for larger ones. These cost comparisons are for the largest of the entry level to mid-range models. All of these platforms can be configured in larger increments.

The cost per terabyte (TB) for the three platforms runs from $215K to $347K for smaller models and from $105K to $188K for larger models for non-compressed data, and from $82K to $217K for smaller models and from $42K to $151K for larger models for compressed data. Note that the cost per terabyte drops as the model size increases.

Image from ITG report.



The costs outlined above include hardware, system software licenses and installation, along with hardware maintenance, software support and the ongoing service costs. These costs do not typically include applications software.

Those are the costs for housing “big data” but there are also costs associated with mining and analyzing that data.

Microsoft Office applications (Excel and Access) can be used as tools to query and analyze data up to a point. These tools cost $350 to $500 on average for a single user with 2 PCs for a small business. Excel and Access can be purchased independently for between $100 and $200 per user.

Microsoft SQL Server is a data mining tool that is available for as low as $2,700 for small businesses, including up to 10 client access licenses. The basic version of SQL Server can compress data up to 60%, which lowers the cost of housing the data and reduces time spent on backups. It “delivers a complete data management and business intelligence platform for departments and small organizations to run their applications, helping enable effective database management with minimal IT resources” (Microsoft.com).

Oracle offers Business Intelligence software for small to medium sized businesses. The software includes interactive dashboards, formatted reports, ad hoc analysis, and server administration delivered on Web architecture. This solution starts as low as $1,200 plus a support package, which starts at $264.

According to a 2002 International Data Corporation study, business analytics implementations have
generated a five year return averaging 112% with a mean payback of 1.6 years in revenue increases or
cost savings.

Per Farrell, Harrah’s Entertainment instituted a Total Rewards program utilizing Teradata. The company recognized a very big payoff:

• Increased cross-property revenue 72%
• Increased trip frequency of regular players from 1.2 to 1.9 times per month
• Doubled profits to $60 per person per trip
• Achieved growth 2-3 times that of competitors
• Achieved highest 3-year Return on Investment (ROI) in the industry


The value of information and data efficiency must be quantified on an individual basis. Data warehouses increase the amount of detailed data available for analysis, the number of users who can access the data at any given time, the depth of analysis of the data, and the frequency and speed of availability of the data. While many companies realize an increase in revenue based on analytics, others recognize cost savings in the form of employee reduction or identifying savings opportunities via in-depth analysis.

Informatica gives a seven step approach to calculating the return on investment for implementing a
data warehouse for businesses:

Step 1: “Build an Enterprise Deployment Map” (Informatica 4). This may include a long term schedule for implementation but should be maintained in order to show the schedule and costs associated with it.

Step 2: “Analyze Potential Benefits” (Informatica 4). For some it may be increased visibility or speed of visibility into relevant data. This could mean reducing time spent producing reports in addition to the ability to act on information earlier. Other benefits could be less clear: faster access to customer records, increased market share, or better customer service. Some of the benefits will be tangible and easily measured if a baseline is established. The desired benefits should be prioritized and measures made available when possible.

Step 3: “Calculate Net Present Value for all Benefits” (Informatica 5). If revenue increases while costs lower or remain static, what is the benefit?

Figure from Informatica representing Step 3.



Step 4: “Define Overall Costs” (Informatica 5). According to Informatica there are ten fundamental costs:

1. Hardware
2. Networks
3. Relational Database Management System software
4. Back-end Tools
5. Query/Reporting Tools
6. Metadata Repository
7. Internal Labor
8. External Labor
9. Ongoing Support
10. Training

Step 5: “Calculate Net Present Value for all Costs” (Informatica 6). Actual costs can be estimated on the enterprise deployment map timeframe. Another option is to use the value as a percentage of the costs.

Step 6: “Assess Risks, Adjust Costs and Benefits” (Informatica 6). There are certain common risk factors that can be mitigated early by detailed planning.

Step 7: “Determine Overall ROI” (Informatica 7). Subtract the net present value of total costs from the net present value of incremental revenue plus cost savings. A 1996 study by International Data Corporation found an average three year ROI of 401%.
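The net present value and ROI calculations in Steps 3, 5 and 7 can be sketched as follows. All cash flows and the discount rate are hypothetical and do not come from the Informatica paper. Expressing ROI as the net benefit divided by the net present value of costs is a common convention assumed here; the steps above specify only the subtraction.

```python
# Hypothetical yearly cash flows (year 0 first) and discount rate. None of
# these figures come from the Informatica paper.
DISCOUNT_RATE = 0.10
BENEFITS = [0, 400_000, 600_000, 700_000]     # incremental revenue plus cost savings
COSTS = [800_000, 150_000, 150_000, 150_000]  # hardware, software, labor, support


def net_present_value(cash_flows, rate):
    """Discount each year's cash flow back to year 0 and sum the results."""
    return sum(amount / (1 + rate) ** year for year, amount in enumerate(cash_flows))


def warehouse_roi(benefits, costs, rate):
    """Return (net benefit, ROI) where ROI = net benefit / NPV of costs."""
    npv_benefits = net_present_value(benefits, rate)
    npv_costs = net_present_value(costs, rate)
    net_benefit = npv_benefits - npv_costs
    return net_benefit, net_benefit / npv_costs


if __name__ == "__main__":
    net, roi = warehouse_roi(BENEFITS, COSTS, DISCOUNT_RATE)
    print(f"Net benefit: ${net:,.0f}  ROI: {roi:.0%}")
```

Discounting matters because most warehouse costs fall in the first year while the benefits arrive later, so an undiscounted comparison would overstate the return.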















CONCLUSION

There are many options available when it comes to data warehousing and data mining. The sophistication and size of a data warehouse can be designed to suit any business. Data warehouses can also grow with a business so costs can be minimized until value can be determined. For larger corporations that have already determined the value and viability of data warehousing, designing or operating a data warehouse in house could be a viable option.


Likewise, many options exist for data mining, depending on the needs of a particular business or industry. Regardless of the size or type, cost must be a consideration. Most companies utilizing data warehouses have found that the break even point for this investment is just over a year.




WORKS CITED

1keydata. Data Warehouse Definition. 2012. 10 November 2012. <http://www.1keydata.com/datawarehousing/data-warehouse-definition.html>.

Anupindi, Nagesh V. Inmon vs. Kimball - An Analysis. 25 August 2005. 9 November 2012. <http://www.nagesh.com/publications/technology/173-inmon-vs-kimball-an-analysis.html>.

Çakir, A., & Demirel, B. (2011). A software tool for determination of breast cancer treatment methods using data
mining approach. Journal of Medical Systems, 35(6), 1503-11. doi: http://dx.doi.org/10.1007/s10916-009-9427-x

Chidansh, A. B., & Mohan, S. K. (2011). Multimedia data mining: State of the art and challenges. Multimedia Tools
and Applications, 51(1), 35-76. doi: http://dx.doi.org/10.1007/s11042-010-0645-5

Dan Braha (Ed), Data Mining for Design and Manufacturing, Springer, 2002, 544 p.

Davis, B. (1999). Data mining transformed. InformationWeek, (751), 86-88. Retrieved from
http://search.proquest.com/docview/229154773?accountid=11824

DuFrene, D., & Lehman, C., & Reynolds, G., & Stair, R. Principles of Information Systems. (2011). Boston. ISBN:
978-538-47829-9.

Farrell, Vickie. Realizing Data Warehouse Return on Investment. Teradata.com. 2004. Web.

IBM.com.

Informatica. The 7 Steps to Calculating Data Warehouse ROI. Web.
http://http-server.carleton.ca/~aramirez/4406/Data_Warehouse.pdf.

International business machines corporation; patent issued for input data structure for data mining. (2012).
Computer Weekly News, 1521. Retrieved from http://search.proquest.com/docview/1035612181?accountid=11824

International Technology Group. Cost/Benefit Case for Enterprise Warehouse Solutions. June 2011. Web.
http://www.ibm.com/us/en/.

Kent, William. A Simple Guide to Five Normal Forms in Relational Database Theory. September 1982. 10 November
2012. <http://www.bkent.net/Doc/simple5.htm>.

Lehman M. Carol, Dufrene D. Debbie. 2011. Principles of Information Systems. Boston: Course Technology.

Mariscal, G., Marbán, Ó., & Fernández, C. (2010). A survey of data mining and knowledge discovery process
models and methodologies. The Knowledge Engineering Review, 25(2), 137-166. doi:
http://dx.doi.org/10.1017/S0269888910000032

Microsoft.com.


Microsoft Case Studies. (2011, 13 December). Microsoft. Retrieved on November 15, 2012 from
http://msdn.microsoft.com/en-us/library/ms141026.aspx.

Mitra, Akash. History Preserving in Dimensional Modeling. 21 July 2012. 8 November 2012.
<http://www.dwbiconcepts.com/data-warehousing/12-data-modelling/132-history-preserving-in-dimensional-modeling.html>.

Oracle.com.

Palace, Bill. Data Mining: What is Data Mining? June 1996. 14 November 2012.
<http://www.anderson.ucla.edu/faculty/jason.frand/teacher/technologies/palace/datamining.htm>.

Patkar, V., Hurt, C., Steele, R., Love, S., Purushotham, A., Williams, M., Thomson, R., Fox, J. (2006). Evidence-based
guidelines and decision support services: a discussion and evaluation in triple assessment of suspected
breast cancer. British Journal of Cancer, 95(11), 1490-1496. doi:
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2360742/

Paul. Data Storage [Infographic]. 26 August 2011. 11 November 2012.
<http://techblog.cosmobc.com/2011/08/26/data-storage-infographic/>.

Pham, Anthony. Data Mining Techniques. October 2012. 18 November 2012.
<http://www.zentut.com/data-mining/data-mining-techniques/>.

Proceed Powered by Statistica - Case Study. StatSoft, Inc. (2005). StatSoft, Inc. Web. 19 Nov 2012.
http://proceed.statsoft.com/PDF/PROCEED_Optimizes_Preventative_Maintenance.pdf

Roe, Charles. The Growth of Unstructured Data: What To Do with All Those Zettabytes? 15 March 2012. 7
November 2012.
<http://www.dataversity.net/the-growth-of-unstructured-data-what-are-we-going-to-do-with-all-those-zettabytes/>.

The 101 Guide to Dimensional Data Modeling. 26 June 2012. 7 November 2012.
<http://www.dwbiconcepts.com/data-warehousing/12-data-modelling/127-dimensional-data-modeling.html>.

Whiting, Rick of CRN. (2012, 23 April). CRN. Retrieved on November 15, 2012 from
http://msdn.microsoft.com/en-us/library/ms141026.aspx.

Yoo, I., Alafaireet, P., Marinov, M., Pena-hernandez, K., Gopidi, R., Chang, J., & Hua, L. (2012). Data mining in
healthcare and biomedicine: A survey of the literature. Journal of Medical Systems, 36(4), 2431-48. doi:
http://dx.doi.org/10.1007/s10916-011-9710-5