Data Mining and Data Warehousing Formal Report


Oct 31, 2013 (3 years and 11 months ago)


Blue Team
1





Data Mining and Data Warehousing Formal Report









Team Blue

Davia Alvarado

Whitney Boring

Ryan Brown

Edgardo Carbone

Ramona Dan

Joseph Koscielski







Table of Contents







Letter of Transmittal

Executive Summary

Introduction

Business Uses

Tools Available in the Marketplace

Costs, Resource Needs, and Other Considerations

Summary of Report Findings

References






Letter of Transmittal


Team Blue


Kennesaw State University


December 2, 2011

Professor Cochran

Kennesaw State University

1000 Chastain Road

Kennesaw, GA 30144


Dear Professor,


In fulfillment of the course requirements, please find enclosed with this letter the formal report for Data Warehousing and Mining, titled “Data Mining and Data Warehousing Formal Report”.


The purpose of this report is to take an in-depth look at the background and capabilities of Data Warehousing and Mining. The introduction states what Data Warehousing and Mining are and how they got started. In addition, the formal report details the specific tools that perform different tasks and the benefits each tool offers businesses within Data Warehousing and Mining. Lastly, the report delves into the associated costs of operating and maintaining Data Warehousing and Mining capabilities.


We hope this report will be informative and satisfactory.


Respectfully,


Team Blue, IS 2200

Kennesaw State University



Enclosure: Data Mining and Data Warehousing Formal Report

Accompanying Website and Presentation








Executive Summary


In the formal report the team discusses Data Mining and Data Warehousing. Data Mining uses descriptive modeling to search through large amounts of data, finding trends, patterns, and relationships. This helps companies discover information about customers, compete on price, detect fraudulent activity, and address various other subjects. It gives companies the ability to make better-informed decisions on issues they will face in the future.

Data Warehousing and Data Mining have shown useful capabilities in business by creating more efficient work processes and spreading information across systems with ease. The reliability, structure, and modeling of any type of data that Data Warehousing and Data Mining can produce are a great benefit for businesses today. Several tools are available in the marketplace that have unique capabilities for large and small enterprises. Such useful tools include OLAP (online analytical processing), which quickly computes MDA (multi-dimensional analytical) queries (Withrow, 2002). Another useful tool is OLTP (online transaction processing), which manages and updates transaction records; this application is very useful in the banking industry (Withrow, 2002).

Although Data Mining and Warehousing can be very useful, it is important for a business to know the positive and negative effects of investing in them. The costs associated with Data Mining and Warehousing are substantial. Not only is service from providers like IBM or Oracle-Sun costly, but resources like extra salaries and employee time spent managing data are costly as well. It is very important for a company, whether small or large, to project the necessity and benefit of having Data Mining and Warehousing functionality. The specific business type is also important to take into account. If the business requires fast turnaround on data and reports, dealing with complicated queries is not a viable option unless the company can afford to expend the appropriate employee resources. If, after planning and consultation, it is determined that Data Mining and Warehousing would prove beneficial, a business can likely grow and expand past its competitors. The ability to manage data and extrapolate new reports from it is a very useful asset in today's market, with consumers and products growing at an exponential rate. Data Warehousing and Mining have provided many optimizations to business processes that bring in more annual revenue.










Introduction



The world is beginning to depend more and more on technology, especially in the business world. It is important that firms in the market make the most productive, effective, fast, and efficient use of new technology. Further sections discuss data mining and data warehousing, and how they help businesses grow. Data mining is the process of analyzing data from different perspectives and summarizing it into useful information that can be used to increase revenue, cut costs, or both (Palace, 2006). Data warehousing is the process of centralized data management and retrieval. There have been several technological advancements in data capturing, data processing, data transmission, and storage capabilities. These advancements have allowed businesses to integrate their databases into one centralized data warehouse (Palace, 2006).


Data mining is the extraction of hidden predictive information from large databases. It is a new technology that has the potential to allow businesses to focus on the most important information in their data warehouses (Thearling, 2010). Data mining and data warehousing often go hand in hand. Data mining takes information from many different angles and converts all the information it collects into a statistical answer of some sort. For example, when a retailer sends out a new catalog with new products, data mining software would take all of the information from a historical database within the data warehouse and sort it. The software would take information such as zip codes, ages of customers receiving the catalogs, and their responses to previous products. It then uses this collected information to predict where the new products will sell best and where the company will benefit most from selling them (Thearling, 2006).



Data mining can help a number of different companies. It is not just for businesses that are trying to sell a particular product or find out where a product may or may not sell. Data mining can also be beneficial in the fields of health and science. It can be used for predictive analyses that compare gene sequences in biotechnology (Koh & Tan, 2005). Data mining is also used in the criminal justice system to detect fraud and help find previous offenders of fraud (Koh & Tan, 2005). When companies use data mining, they use it to find patterns in databases. The companies then use that information to build predictive models (“Data Mining,” 2003). These models help a company to be efficient and carry out business activities in the most effective way possible.


Data warehousing makes it possible for companies to gather and store enterprise information in a single conceptual enterprise (Withrow, 2002). Data mining is used within the data warehouse to find patterns that help formulate models for the company. Data modeling techniques are applied to create relationship associations between individual data elements or data element groups, which data mining software helps to formulate with its tool set (Withrow, 2002). Companies with more developed warehouse tool sets incorporate the concept of multidimensional data, or data cubes (Withrow, 2002). This data structure allows information to be multi-indexed, which allows for rapid drill-down on data attributes (Withrow, 2002). Companies mostly use these data cubes for what-if scenarios over specified data, using tool sets within the data warehouse (Withrow, 2002). Using these multidimensional data tool sets, companies can answer questions such as, “If there were a 50% increase in the 2nd quarter, what would be the total revenue at the end of the year?” Questions like this help a company review where it stands at any given point in time. They can also help find patterns that allow the company to grow and prosper because it chose the best option. Multidimensional analysis of multiple business views is called Online Analytical Processing, most commonly referred to as OLAP (Withrow, 2002). The main function of OLAP systems is to give users the ability to perform manual exploration and to use the analysis summary to come up with new models that help the company run more effectively and efficiently. However, in order to use OLAP effectively, the searcher must know exactly what information he or she is looking for.
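
The what-if question quoted above can be made concrete with a short sketch. The quarterly figures are hypothetical, and the helper name `what_if_total` is an assumption introduced for illustration; it is not part of any OLAP product.

```python
# Hypothetical quarterly revenue, in thousands of dollars (illustrative only).
quarterly_revenue = {"Q1": 200, "Q2": 240, "Q3": 220, "Q4": 260}


def what_if_total(revenue, quarter, pct_increase):
    """Return total annual revenue if one quarter grew by pct_increase percent."""
    adjusted = dict(revenue)  # copy so the baseline figures stay unchanged
    adjusted[quarter] = revenue[quarter] * (1 + pct_increase / 100)
    return sum(adjusted.values())


baseline_total = sum(quarterly_revenue.values())
scenario_total = what_if_total(quarterly_revenue, "Q2", 50)
print(baseline_total, scenario_total)
```

An OLAP tool would answer the same question over far more data and dimensions, but the underlying operation is the same: adjust one slice of the data and re-aggregate.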


Companies use data mining and data warehousing to increase profits and decrease costs. Data mining and data warehousing go hand in hand with one another. This formal report takes an in-depth look at how data mining and data warehousing are used in the business world, what variations of tools are available in the marketplace, and the costs associated with installing data mining systems.


Business Uses


There are several different business uses for data mining and data warehousing. Various establishments use data mining and data warehousing to find trends and patterns regarding all kinds of issues. In recent years, hospitals and people in the medical field have used data mining and data warehousing to uncover important information on patients. Hemodialysis is a method in which waste products such as creatinine and urea, along with excess water, are removed from the blood during renal failure (Rosner, 2009). Those with renal failure suffer every day in hospitals and hemodialysis centers. It is important to patients that doctors find early treatments in order to avoid hospitalization. Data mining and data warehousing have helped doctors and others in the medical field predict the duration of hospitalization for patients and the possible treatments to prevent hospitalization.

According to Jinn-Yi Wu, hemodialysis centers have been looking for ways to reduce hospitalization, because increasing numbers of patients lead to a decrease in health care quality. To help hospitals and centers reduce the number of patients, Wu and colleagues are interested in developing a decision support system to predict hospitalization of patients during hemodialysis sessions (Wu, 2011).



For patients who receive long-term hemodialysis treatment, the centers examine their biochemical data monthly and enter these records into the data analysis. The system is applicable to high-dimensional and massive time series data. It has successfully combined temporal abstraction and an association rule algorithm to determine the patients’ success with hemodialysis and their outlook. The output of the data mining helps make medical decisions concerning renal failure patients (Wu, 2011). It has predicted the survival time of renal failure patients and is changing the future of the science and medical fields. Other life-threatening diseases and their symptoms are being put into similar data sets to determine treatment and survival time. This gives patients assurance when these diseases are caught and treated early on.



Figure 2


Major corporations and businesses use data mining in order to create value and answer day-to-day questions. Successful companies such as Johnson & Johnson, GE Capital, Fingerhut, Procter & Gamble, Harrah’s Casino, and various Fortune 500 companies use data mining to gain knowledge concerning customers, trend analysis, marketing, and sales analysis. The accounting and finance departments analyze the company’s potential future and what needs to change in order to have high returns each year. It is important for companies that want to be highly efficient in the business world to look at the advantages of data mining and data warehousing.


In a recent study, major corporations were given a survey that asked if and how data mining is used in the company. Sixty-five percent of the companies that responded said they used data mining. These companies use data mining as a tool for several different parts of the company, including customer relations, financial management, customer and sales management, database marketing, customer retention, and various other uses. The use of data mining has helped large corporations produce up to $24 million more each year. Sales in these companies have also gone up, with some reporting $9.65 billion compared to $4.2 billion without the use of data mining (Calderon, 2010). By discovering company trends, large corporations can figure out how to bring in new customers and keep old ones.

According to the study, the companies surveyed use data mining software such as Oracle’s Darwin, SAS’s Enterprise Miner, IBM’s Intelligent Miner, as well as MineSet and Clementine (Calderon, 2010). In many cases, Fortune 500 companies and larger business corporations use several different data mining software packages. This indicates that no single program can satisfy the multitude of needs that companies have. When asked about the data mining techniques companies use, many responded that regression modeling and discriminant analysis are most often used. As indicated in Figure 1, there are several different techniques used by the surveyed companies, but decision trees and regression modeling are most frequently used.

Regression modeling analyzes the relationship between a dependent variable and one or more independent variables; it is mostly used for predictions and forecasting (Stockburger, 1996). According to Stockburger, in order to construct a regression model, both the information used to make the prediction and the information to be predicted must be recorded from a sample of objects or individuals. The relationship between the two pieces of information is then set in a linear model. In the future, only the information needed to make the prediction is necessary. The regression model will then transform this information into the prediction. In other words, it is necessary to have information on both variables before the model can be constructed for present and future purposes.
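
The two phases described above (fit the model on a sample where both variables are known, then predict from the predictor alone) can be sketched as follows. This is a minimal illustration assuming ordinary least squares with a single predictor; the sample data and the function names `fit_line` and `predict` are hypothetical and are not taken from Stockburger or from any tool named in this report.

```python
def fit_line(xs, ys):
    """Fit y = intercept + slope * x by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(slope, intercept, x):
    """Turn a new predictor value into a predicted value."""
    return intercept + slope * x


# Hypothetical sample: machine hours (predictor) and cost in $ thousands (predicted).
machine_hours = [1, 2, 3, 4, 5]
cost = [3, 5, 7, 9, 11]

# Phase 1: both variables are needed to construct the model.
slope, intercept = fit_line(machine_hours, cost)

# Phase 2: only the predictor is needed to make a forecast.
forecast = predict(slope, intercept, 6)
print(slope, intercept, forecast)
```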

Figure 2 illustrates a typical regression model. Although it is nearly impossible to see regression relationships with the naked eye when dealing with such large amounts of data, this figure indicates that a small amount of information is used for prediction. Regression modeling is probably used more often because it is the easiest to use. Companies can use this technique in order to plan for the future. Managers can easily find the information needed by looking at easy-to-read equations and statistics. Accountants use regression modeling to develop cost estimates and to find appropriate cost drivers (Calderon, 2010). According to the study, techniques such as decision trees and clustering permit identification of hidden patterns and trends in existing data, but regression modeling assumes patterns exist based on existing theory or observation. For example, when estimating costs, management accountants often make implicit or explicit decisions about factors that drive costs, and their estimations confirm their beliefs.

It is important for companies, corporations, and other fields to look into data mining and the potential benefits of such software. Questions about the future come up every day, whether a company or firm deals with a customer’s next move or a patient’s diagnosis. Industries need the answers to those questions before they lose money or even lives. Data mining and data warehousing help answer those questions through computer-based algorithms that process more data than people could analyze by hand. The marketplace provides different data mining tools that businesses and economic firms can use in order to increase annual returns.



Tools Available in the Marketplace


The relationships and distinctions between the information systems concepts of data warehousing and data mining, combined with online analytical processing (OLAP), form the backbone of decision support capability in the database industry. Decision support applications impose different demands on OLAP database technology than the online transaction processing (OLTP) model that preceded it. Data mining with OLAP differs from OLTP queries in the use of multidimensional data models, different data query and analysis tools at both the user-facing front end and the database back end, and different mechanisms for data extraction and preparation before loading into a data warehouse can take place. The construction of data warehouses entails the operations of data cleaning and data integration, which are key pre-processing steps for enabling data mining. Furthermore, the concept of metadata (data about data) is essential to the functioning of a data warehouse, and must be managed appropriately for an effective and efficient installation (Chaudhuri et al., 1997).
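
The difference in workload between OLTP and OLAP can be illustrated with a small sketch. It uses Python's built-in `sqlite3` module purely as a stand-in; SQLite is not a data warehouse product, and the `sales` table and its rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales (id INTEGER PRIMARY KEY, city TEXT, product TEXT, amount REAL)"
)

# OLTP-style work: many small writes, each recording one business transaction.
rows = [
    ("Atlanta", "widget", 100.0),
    ("Atlanta", "gadget", 50.0),
    ("Kennesaw", "widget", 75.0),
]
with conn:  # commits on success, rolls back on error
    conn.executemany(
        "INSERT INTO sales (city, product, amount) VALUES (?, ?, ?)", rows
    )

# OLAP-style work: one read-only query that aggregates across many rows.
totals_by_city = dict(
    conn.execute("SELECT city, SUM(amount) FROM sales GROUP BY city")
)
print(totals_by_city)
```

The transactional side touches one row at a time and must stay consistent; the analytical side scans and summarizes, which is why the two workloads favor different storage and indexing choices.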

The major commercial players in the data warehousing market today include IBM, Oracle-Sun, Teradata, and Microsoft. Data mining functionality is typically included within the data warehousing vendor’s software suite. Some vendors have specialized further by creating product suites sold as data warehousing appliances. These consist of an integrated, pre-packaged combination of server and storage hardware, with a pre-installed operating system and relational database software that has been optimized for typical medium- to large-scale customer implementations (Microsoft, 2008).

Gartner (2008) predicted that a fifth of all organizations worldwide would have customized software-as-a-service (SaaS) applications created to supplement their business intelligence operations by 2010. The value-added business of information aggregators is to provide domain-specific analysis capability using competitive business information as a base. This by its nature tends to generate monopolies in vertical information domains, due to the need for aggregators to ensure the confidentiality and secure protection of their clients’ sensitive business data. Without proper integration with proprietary internal information stored in data warehouses, customized SaaS-based tools cannot generate the benefits they are expected to provide.

Data warehousing may be defined in its simplest form as “a process of centralized data management and retrieval” (Palace, 1996). Ideally, a data warehouse is the centralized repository of all of an organization’s data, made available for users to access and analyze according to their individual needs through the process of data mining. It provides the tools and mechanisms for business executives to systematically organize, comprehend, and utilize their data to make strategic decisions. In recent years, with competition mounting in every industry, data warehousing has become an essential method for organizations to retain customers by learning more about their needs using a solid platform of consolidated historical data and powerful analysis and mining tools (Berson et al., 1997).

Data mining refers to the ability to enable analysis, categorization, and summarization of data from multiple angles or different dimensions. Palace (1996) defines data mining as “the process of finding correlations or patterns among dozens of fields in large relational databases.” The relationships, associations, historical patterns, and future trends extracted from data in the database are what constitute useful information or knowledge to the user. Data mining was initially used and promoted by consumer-oriented organizations that needed to deal with large volumes of data related to their business, finances, and customers, so as to be able to effectively design and price their products to address competition and meet customer priorities.

Douq (2009) outlines the set of marketing criteria most often addressed by vendors of data warehouse products when comparing their own offerings with those of competing providers. Physical architecture and design, scalability, parallelism, performance and optimization, system availability, and ease of operations and management are the subjects most frequently discussed and debated by vendors and analysts in industry circles.

It is useful to distinguish commercial relational databases from the multidimensional database structures used in data mining and warehousing. Traditional relational databases emphasize the operation of normalization (minimizing data redundancy), and are specifically tuned and organized to permit ad hoc queries upon normalized data stored in tables and indexes. Multidimensional databases organize data in the form of data “cubes”, which can be visualized as data sets and subsets implemented in array structures. A data cube consists of a large set of facts or measures, along with a number of associated dimensions. Dimensions are hierarchical entities that the organization wants to record and keep information about (Berson et al., 1997). For example, a 3-D data cube could display the value of sales dollars according to the dimensions of city, product, and month sold. A 4-D data cube could add the dimension of year sold to the original three. Figure 3 provides a simplified example of the 3-D case illustrating the conceptual model.
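
The 3-D cube described above can be sketched in a few lines. This is only a conceptual illustration: the fact rows are hypothetical, the names `build_cube` and `roll_up` are assumptions, and a real OLAP server would use specialized array storage rather than a Python dictionary.

```python
from collections import defaultdict

# Hypothetical fact rows: (city, product, month, sales_dollars).
facts = [
    ("Atlanta", "widget", "Jan", 100),
    ("Atlanta", "widget", "Feb", 120),
    ("Atlanta", "gadget", "Jan", 80),
    ("Kennesaw", "widget", "Jan", 60),
    ("Kennesaw", "gadget", "Feb", 90),
]

DIMENSIONS = ("city", "product", "month")


def build_cube(rows):
    """Store the sales measure in cells keyed by (city, product, month)."""
    cube = defaultdict(int)
    for city, product, month, dollars in rows:
        cube[(city, product, month)] += dollars
    return cube


def roll_up(cube, keep):
    """Aggregate the measure over every dimension not named in keep."""
    positions = [DIMENSIONS.index(name) for name in keep]
    result = defaultdict(int)
    for cell, dollars in cube.items():
        result[tuple(cell[p] for p in positions)] += dollars
    return dict(result)


cube = build_cube(facts)
sales_by_city = roll_up(cube, ["city"])                 # coarse view
sales_by_city_month = roll_up(cube, ["city", "month"])  # drill down by month
print(sales_by_city)
print(sales_by_city_month)
```

Rolling up to city alone and then keeping month as well corresponds to the drill-down operation: the same measure is viewed at a finer level of detail.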

Unlike traditional relational database implementations, data may be repeated or reorganized extensively within a multidimensional database to meet the needs for faster search and query operations. Therefore, the needs of data warehouses are most compatible with data mining operations carried out on multidimensional databases (Palace, 1996). Data warehouses commonly utilize a three-tier architecture. The first or bottom tier is the data warehouse database server’s relational database system. The second or middle tier is an OLAP server implementing the multidimensional OLAP database functionality. The third or top tier is a client layer providing the user-facing query and reporting tools used for mining the data warehouse (Berson et al., 1997).



Figure 3. OLAP Cube (Microsoft TechNet, 2011)

Figure 4. Typical data warehouse appliance

Two leading commercial implementations of data warehousing and data mining functionality come from Oracle Corporation and NCR Teradata. Both solutions are based upon relational database management systems (RDBMS) at their core. However, their origins, implementation specifics, and performance characteristics have significant differences. Oracle’s database originally evolved to respond to the market for traditional online transaction processing (OLTP), then gradually evolved to incorporate data warehousing and mining capabilities through its online analytical processing (OLAP) offerings. OLAP functionality is encompassed within the larger Business Intelligence (BI) disciplines, and includes both relational queries and data mining functions to produce output reports oriented to the business functions of finance, marketing, and management. Oracle’s OLAP implementation deals effectively with multidimensional data by using algorithms optimized to handle rapid drill-down and aggregation in large data sets. This enables the Oracle data warehouse system to respond to complex information queries that may be posed in different ways from different angles (Douq, 2009).

Teradata is generally acknowledged to be the original large-scale data warehouse offering. It originated as part of NCR Corporation, and formally separated into its own entity in 2007. The Teradata relational database was created and architected from its earliest beginnings for optimized information retrieval. As such, it is arguably faster and more efficient for certain “pure” data warehousing implementations than Oracle (Douq, 2009).

At a smaller scale, data warehousing and mining capability can also be created using desktop tools such as Oracle MySQL, or Microsoft Access and Microsoft Excel spreadsheets. With the Microsoft product suite, features such as pivot tables, fact tables, and the Query-By-Example function enable search indexing for practical performance on databases of over a million records while bypassing the more sophisticated programming methods involving Structured Query Language (SQL) commonly found in commercial RDBMS products (Microsoft Corporation, 2009).

How effectively a vendor or small business is able to integrate the operations of warehousing and mining of data is a key determinant of not only its competitive strength, but also the type of target implementations where a satisfactory outcome is most likely to result for the end customer. As such, the strategic business intelligence derived from data warehousing and data mining has become a management tool of critical importance to gaining and retaining competitive advantage (King, 2009).



IBM made a strategic entry into the commercial data warehouse appliance space with its acquisition of Netezza as a subsidiary in 2010. Netezza-based appliances feature a proprietary hardware and software implementation called Asymmetric Massively Parallel Processing (AMPP). This architecture incorporates rack-mounted blade format servers and disk storage, with a hardware-based data-filtering component using field-programmable gate arrays (FPGA). Following its acquisition of the ten-year-old Netezza technology, IBM modified the TwinFin standard configuration to exchange processing modules for additional disk storage within the same two- or four-rack assembly, to offer a “near-line” data warehouse appliance (Prickett Morgan, 2010, 2011). Figure 4 illustrates a typical example of a large-scale, commercial data warehouse appliance product, the IBM Netezza.

The technical implementation of a data warehouse RDBMS can differ substantially from a standard commercial implementation. For example, data warehouses are designed to optimize the speed of complex data retrieval queries involved in data mining. To accomplish this, a data warehouse RDBMS may store multiple copies of the same data at different levels of granularity using a technique called aggregation. De-normalization of data (that is, the use of data repetition and grouping) is common for read-intensive database applications to ensure adequate query response times. Without de-normalization, performance can be seriously hindered by the overhead involved in accessing normalized logical views or join tables across multiple physical data files (Shin et al., 2006). The technique of indexing for the purpose of increasing query efficiency and speed is used in both traditional RDBMS and multidimensional OLAP-based RDBMS for data mining and warehousing. Bitmap-based indexing is one technique used in warehouse implementations to reduce the high processing overhead of join, aggregate, and compare operations into highly efficient bit arithmetic. Join indexing is used in relational databases to join the relevant rows from two or more source data tables. A hybrid method called bitmapped join indexing is employed in OLAP implementations to further improve system performance for multidimensional queries (Berson et al., 1997).
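
The idea behind bitmap-based indexing can be sketched briefly. The example below is a simplified illustration rather than how any particular vendor implements it: each distinct column value gets one integer whose set bits mark the rows holding that value, so a two-condition filter becomes a single bitwise AND. The rows and function names are hypothetical.

```python
def build_bitmap_index(rows, column):
    """Map each distinct value in column to an int whose set bits mark matching rows."""
    index = {}
    for position, row in enumerate(rows):
        value = row[column]
        index[value] = index.get(value, 0) | (1 << position)
    return index


def matching_rows(bitmap):
    """Return the row positions whose bits are set in bitmap."""
    positions = []
    position = 0
    while bitmap:
        if bitmap & 1:
            positions.append(position)
        bitmap >>= 1
        position += 1
    return positions


# Hypothetical table rows; the comment gives each row's position.
rows = [
    {"city": "Atlanta", "product": "widget"},   # 0
    {"city": "Kennesaw", "product": "widget"},  # 1
    {"city": "Atlanta", "product": "gadget"},   # 2
    {"city": "Atlanta", "product": "widget"},   # 3
]

city_index = build_bitmap_index(rows, "city")
product_index = build_bitmap_index(rows, "product")

# "city = Atlanta AND product = widget" reduces to one bitwise AND.
atlanta_widgets = city_index["Atlanta"] & product_index["widget"]
print(matching_rows(atlanta_widgets))
```

Bitmap indexes of this kind suit columns with few distinct values, which is typical of the dimension attributes used in warehouse queries.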

We have discussed data warehousing and data mining, which may be considered as two sides of the same coin for decision support operations. For all businesses with an interest in data analysis, having strong data warehousing and mining capabilities can enhance business productivity by enabling quick and efficient assembly of relevant information which accurately describes the organization. It is likely to provide a competitive advantage by presenting relevant views of historical organization data, from which business trends can be clearly seen. Similarly, business performance can be more accurately measured, to help support critical adjustments designed to assist in competitive positioning. Data warehousing and mining can help to facilitate customer relationship marketing, by enabling a consistent view of customer activity across all markets, products, business lines, and departments. Finally, an efficient data warehouse with data mining capability can help lower the cost of doing business by consistently and reliably tracking trends, patterns, and exceptions over extended periods of time and multiple business cycles (Berson et al., 1997).




Costs, Resource Needs, and Other Considerations




According to Danna and Gandy, many companies, including those in Fortune’s top 1000, are using data mining techniques in their business maneuvers (Danna, 2002). These companies use data mining to increase their profits by better understanding trends in the marketplace. Data mining software that is easy for marketing department employees to use also lowers the company’s costs, because it decreases the amount of money spent on training each individual employee on the new software. The use of data mining software also saves the company money by showing which techniques work best for the company.




Some of the resources necessary for data warehousing integration include the funds and time to train the employees, as well as the time to integrate the system with the previous system that the company was using. Data warehousing is most effective when previously collected data is integrated with the current flow of data, so a clean, solid database is the optimal starting point for integration into a data warehousing system. There are several different packages a company can obtain when it comes to data mining. The package the company chooses and the custom features added onto that package typically determine the cost of integration. Therefore, the actual cost of data warehousing integration cannot be stated in general, because it depends on many variables that rest on the company’s decisions.


In Leventhal’s introduction to data warehousing and data mining integration, he describes the variability of purchasing one of these systems. The cost is also dependent upon the size and needs of the company. A data warehousing system can cost anywhere from a few thousand dollars to upwards of one million dollars for companies such as those in Fortune’s top 1000.


The cost of data warehousing system integration also depends on the product the company chooses. There are multiple companies that offer data warehousing systems, such as IBM, Microsoft, and Teradata. Teradata recently became the forerunner in the race to perfect data warehousing software, according to an unattributed article in Business Wire. Therefore, companies will most likely pay more for a Teradata data warehousing system than for any other.


Summary of Report Findings


Data mining and warehousing have proven to be successful techniques and tools for companies and organizations looking to find answers quickly and accurately. From this report we understand that data mining is the process of analyzing data from different perspectives and summarizing it into useful information that can be used to increase revenue, cut costs, or both (Palace, 2006). We also understand that data warehousing is the process of centralized data management and retrieval. Technological advancements in data capturing, data processing, data transmission, and storage capabilities have allowed businesses to integrate their databases into one centralized data warehouse (Palace, 2006).

There are many business uses of data mining and warehousing but, as mentioned earlier, there
are also applications in today's world where data mining and warehousing can positively affect
people's lives. In our example, hospitals have worked to reduce hospitalization because a larger
number of patients per hospital leads to inefficient care. Greater efficiency with patients in the
hospital would lead to better care for those patients. In addition, data mining gives a timeline
for when hemodialysis patients can expect a general recovery, along with the success rates of
procedures that have been used in the past. The data can be broken down by gender, race, age
range, etc.
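
To illustrate what breaking the data down by gender and age range can look like in practice, the short Python sketch below groups a handful of patient records and computes the share hospitalized in each group. It is only an illustration: the records are made up, and the function names and age brackets are our own assumptions rather than anything taken from the hospital study.

```python
from collections import defaultdict

# Hypothetical, made-up records for illustration only:
# (gender, age, was_hospitalized)
records = [
    ("F", 34, False),
    ("F", 67, True),
    ("M", 45, False),
    ("M", 71, True),
    ("M", 68, False),
    ("F", 62, True),
]


def age_range(age):
    """Map an age to a coarse bracket (the bracket edges are an assumption)."""
    if age < 40:
        return "under 40"
    if age < 65:
        return "40-64"
    return "65+"


def hospitalization_rates(rows):
    """Return {(gender, age_range): share of patients hospitalized}."""
    counts = defaultdict(lambda: [0, 0])  # group -> [hospitalized, total]
    for gender, age, hospitalized in rows:
        group = (gender, age_range(age))
        counts[group][0] += int(hospitalized)
        counts[group][1] += 1
    return {group: hosp / total for group, (hosp, total) in counts.items()}


rates = hospitalization_rates(records)
for group, rate in sorted(rates.items()):
    print(group, f"{rate:.0%}")
```

A real data mining system would run the same kind of grouping over far more records and many more attributes, but the principle of summarizing outcomes per group is the same.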


There are many tools available for data warehousing and mining, but the major tools currently
in the marketplace are OLAP, OLTP, and customized SaaS (software as a service) offerings. These
tools differ in the way they compute and configure information to produce the output the end user
wants. Each tool has different uses. For example, OLAP (online analytical processing) quickly
computes MDA (multi-dimensional analytical) queries, while OLTP (online transaction processing)
manages and updates transaction records, which is very useful in the banking industry. The major
players offering data mining and warehousing services are IBM, Oracle-Sun, Teradata, and
Microsoft. At the other end of the scale from large enterprises and very large volumes of
information, data warehousing and mining can also be performed with common tools such as MySQL,
Microsoft Access, and Microsoft Excel. Although these smaller tools do not offer as wide a range
of capabilities, they still serve as great tools for assimilating data and obtaining the desired
information in the proper format. Data mining and warehousing offer great advantages, but these
very useful applications come at a cost. The initial costs of incorporating a system, hiring
staff, training staff, and finding a system that works all consume an exorbitant amount of
resources. However, each company is different because the size of its need is different. Costs
range from as little as one thousand dollars to millions of dollars. Thus it is important for a
business or organization to plan accordingly when pursuing data mining and warehousing.
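
The difference between OLTP and OLAP work described above can be sketched with a small database. The example below is a minimal sketch only: it uses SQLite from the Python standard library as a stand-in for the tools named in this report, and the `sales` table, its columns, and the figures in it are made up for illustration. Recording one sale at a time inside a transaction is the OLTP-style step; summarizing all sales by region and quarter is a simplified version of the OLAP-style step.

```python
import sqlite3

# In-memory database standing in for a real system (an assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, quarter TEXT, amount REAL)")


def record_sale(region, quarter, amount):
    """OLTP-style step: one small write, committed or rolled back as a unit."""
    with conn:  # commits on success, rolls back if an exception is raised
        conn.execute(
            "INSERT INTO sales (region, quarter, amount) VALUES (?, ?, ?)",
            (region, quarter, amount),
        )


# Made-up transactions for illustration only.
for sale in [
    ("East", "Q1", 100.0),
    ("East", "Q2", 150.0),
    ("West", "Q1", 80.0),
    ("West", "Q1", 20.0),
]:
    record_sale(*sale)


def sales_by_region_and_quarter():
    """OLAP-style step: summarize many rows along two dimensions."""
    cursor = conn.execute(
        "SELECT region, quarter, SUM(amount) FROM sales "
        "GROUP BY region, quarter ORDER BY region, quarter"
    )
    return cursor.fetchall()


summary = sales_by_region_and_quarter()
for row in summary:
    print(row)
```

A dedicated OLAP product would precompute and store summaries like this across many more dimensions, which is what allows it to answer multi-dimensional queries quickly on large data sets.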




References


Anonymous. (2003, July 23). Data mining. Retrieved from
http://www.dwreview.com/Data_mining/index.html


Anonymous. (2007, November 6). Teradata customers recognized for best practices leadership in
finance and operational performance management in 2007. Business Wire. Retrieved from
http://search.proquest.com/docview/444873123?accountid=11824


Berson, A., & Smith, S. J. (1997). Data Warehousing, Data Mining, and OLAP (1st ed.). McGraw-Hill.
Retrieved from http://dl.acm.org/citation.cfm?id=549950


Calderon, T. G., Cheh, J. J., & Kim, I. (2003). How large corporations use data mining to create
value. Retrieved from
http://findarticles.com/p/articles/mi_m0OOL/is_2_4/ai_99824637/pg_2/?tag=content;co11


Chaudhuri, S., & Dayal, U. (1997). An overview of data warehousing and OLAP technology.
ACM Digital Library, 26(1). doi:10.1145/248603.248616


Danna, A., & Gandy, O. (2002). All that glitters is not gold: Digging beneath the surface of data
mining. Journal of Business Ethics, 40(4), 373-386. Retrieved from
http://search.proquest.com/docview/198053489?accountid=11824



Douq, Q. (2009, November 24). Comparison of Oracle to IBM DB2 UDB and NCR Teradata for Data
Warehousing. X-Space. Retrieved November 14, 2011, from
http://space.itpub.net/673608/viewspace-620367


Gartner Newsroom. (2008). Gartner Reveals Five Business Intelligence Predictions for 2009 and
Beyond. Retrieved November 14, 2011, from http://www.gartner.com/it/page.jsp?id=856714


King, M. A. (2009). A Realistic Data Warehouse Project. Retrieved November 14, 2011, from
http://www.eric.ed.gov/ERICWebPortal/search/recordDetails.jsp?searchtype=advanced&pageSize=50&ERICExtSearch_SearchCount=1&ERICExtSearch_SearchValue_0=%22data+warehousing%22&eric_displayStartCount=1&ERICExtSearch_Operator_1=and&ERICExtSearch_SearchType_1=kw&ERICExtSearch_SearchType_0=kw&_pageLabel=RecordDetails&objectId=0900019b803e8a94&accno=EJ868864&_nfls=false


Koh, H., & Tan, G. (2005). Data mining applications in health care. Original Contributions, 19(2),
64-72. Retrieved from
http://www.himss.org/content/files/code109_ Data mining in healthcare_JHIM_.pdf


Leventhal, B. (2010). An introduction to data mining and other techniques for advanced analytics.
Journal of Direct, Data and Digital Marketing Practice, 12(2), 137-153.
doi:10.1057/dddmp.2010.35



Microsoft Corporation. (2009, January). Data Warehouse in the Enterprise. Retrieved from
http://www.google.com/url?sa=t&rct=j&q=data%20warehouse%20in%20the%20enterprise&source=web&cd=4&ved=0CGkQFjAD&url=http%3A%2F%2Fdownload.microsoft.com%2Fdownload%2FF%2F9%2F5%2FF959E048-04DC-40A1-8F17-A0A5BAD23A2D%2FDataWarehouseEnterprise.docx&ei=F3bBTpf1MqeIiAKPkJ2mAw&usg=AFQjCNFKhxz0w7-PEFwchF-Vsz0k8UyCqQ&cad=rja


OLAP Cube. (2011). Microsoft TechNet. Retrieved November 15, 2011, from
http://technet.microsoft.com/en-us/library/ee277469(BTS.10).aspx


Palace, B. (1996). Data Mining: What is Data Mining? Retrieved November 14, 2011, from
http://www.anderson.ucla.edu/faculty/jason.frand/teacher/technologies/palace/datamining.htm


Prickett Morgan, T. (2011, June 23). IBM fattens up Netezza data warehouses. The Register.
Retrieved November 14, 2011, from
http://www.theregister.co.uk/2011/06/23/ibm_netezza_high_capacity/


Prickett Morgan, T. (2010). Netezza to bake analytics into appliances. The Register. Retrieved
November 14, 2011, from http://www.theregister.co.uk/2010/02/24/netezza_data_analytics/


Rosner, M. H. (2009, October 12). Hemodialysis. Retrieved from
http://www.webmd.com/a-to-z-guides/hemodialysis-20667


Shin, S. K., & Sanders, Lawrence G. (2006). Denormalization strategies for data retrieval from data
warehouses. ACM Digital Library, 42(1). doi:10.1016/j.dss.2004.12.004


Stockburger, D. W. (1996, July 15). Regression models. Retrieved from
http://www.psychstat.missouristate.edu/introbook/sbk16.htm


Thearling, K. (2010). An introduction to data mining. Retrieved from
http://www.thearling.com/text/dmwhite/dmwhite.htm


Withrow, S. (2002, April 3). Data warehousing and mining basics. Retrieved from
http://www.techrepublic.com/article/data-warehousing-and-mining-basics/1045046


Wu, J., & Yeh, T. (2011). A decision support system for predicting hospitalization of hemodialysis
patients. International Journal of Biological & Life Sciences, 8(1), 26-36.