Data warehousing and Data mining


Nov 20, 2013







Presented By:-
Anne.Renuka (III CSIT)
Tatineni.Prasanthi (III CSIT)
Department of Computer Science & Information Technology
St.Ann’s College Of Engineering & Technology, Chirala.


Address for Communication:-
A.Renuka & T.Prasanthi,
R.No: 04F01A1237, 1229
3rd CSIT,
St.Ann’s College Of Engineering & Technology,
Chirala, Prakasam (Dt),
Andhra Pradesh.

E-mail:- renuka_anne2312@yahoo.co.in, Ph.no: 9949259437
& tatineni.prasanthi@gmail.com, Ph.no: 9866681884











Abstract

Data mining allows users to analyze large databases to solve business decision problems. Data mining is, in some ways, an extension of statistics, with a few artificial intelligence and machine learning twists thrown in. Like statistics, data mining is not a business solution; it is just a technology.



For example, consider a catalog retailer who needs to decide who should receive information about a new product. The information operated on by the data mining process is contained in a historical database of previous interactions with customers and the features associated with the customers, such as age, zip code and their responses. The data mining software would use this historical information to build a model of customer behavior that could be used to predict which customers would be likely to respond to the new product.



By using this information, a marketing manager can select only the customers who are most likely to respond. The operational business software can then feed the results of the decision to the appropriate touch point systems (call centers, direct mail, web servers, email systems, etc.) so that the right customers receive the right offers.

Data warehousing has evolved rapidly and continues to be a very fast-moving and fast-changing market segment. Given an appropriate architecture and a suitable approach to the goal of the enterprise data warehouse, any team can deliver a high-impact, high-value, high-ROI and sustainable data warehouse system that will entirely change the range of potential outcomes for the enterprise.














Data Warehousing:

Introduction:

In an era of globalization, enterprises are forced to make business decisions within extremely short timeframes. Data warehousing is an essential practice for every competitive enterprise. The availability of up-to-date information and its proper use will be the crucial factor in coping with competition, ever-shorter production cycles, unique customer demands and the pressure for a high return on investment (ROI). Indeed, it is often said that Business Intelligence is no longer an optional requirement.

A data warehouse is more than an archive for corporate data and more than a new way of accessing corporate information. A data warehouse is a subject-oriented repository designed with enterprise-wide access in mind. It provides tools to satisfy the information needs of enterprise managers at all organizational levels, not just for complex data queries, but as a general facility for getting quick, accurate, and often insightful information. A data warehouse is designed so that its users can recognize the information they want and access that information using simple tools.

One of the principal reasons for developing a data warehouse is to integrate operational data from various sources into a single and consistent structure that supports analysis and decision-making within the enterprise. Operational systems create, update and delete production data that "feed" the data warehouse. The insurance industry is quite diverse in terms of the portfolio of products provided by different companies. Growing mergers and acquisitions, new distribution channels, growing competition, demutualization, redomestication and a sharper focus on changing customer needs present a unique challenge to the insurer in leveraging its large volumes of data.









Data Warehouse Architecture:

FIG. DATA WAREHOUSE ARCHITECTURE (figure not reproduced; surviving labels: DATA, INFORMATION, DECISION, DATA DIPPERS, OLAP TOOLS)


Data warehouse:

A data warehouse is a data structure that is optimized for distribution. It collects and stores integrated sets of historical data from multiple operational systems and feeds them to one or more data marts. It may also provide end-user access to support enterprise views of data.


FIG. The relationship between operational data, a data warehouse and data marts (figure not reproduced; surviving labels: OPERATIONAL DATA, DATA WAREHOUSE, DATA MARTS, EXTRACT FROM SEVERAL DATABASES)







Data mart:

A data mart is a data structure that is optimized for access. It is designed to facilitate end-user analysis of data. It typically supports a single analytic application used by a distinct set of workers.
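The flow from operational systems through the warehouse to a data mart can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the two source layouts (`billing_rows`, `claims_rows`), the field names and the `subject` tag are hypothetical and do not come from any particular product.

```python
# Hypothetical extracts from two operational systems with different layouts.
billing_rows = [
    {"cust_id": 1, "amount_usd": "120.50", "date": "2005-01-15"},
    {"cust_id": 2, "amount_usd": "80.00", "date": "2005-02-03"},
]
claims_rows = [
    {"customer": "1", "paid": 40.0, "claim_date": "2005-02-20"},
]


def integrate(billing, claims):
    """Map both sources onto one consistent warehouse record layout."""
    warehouse = []
    for row in billing:
        warehouse.append({
            "customer_id": int(row["cust_id"]),
            "subject": "billing",
            "amount": float(row["amount_usd"]),
            "date": row["date"],
        })
    for row in claims:
        warehouse.append({
            "customer_id": int(row["customer"]),
            "subject": "claims",
            "amount": float(row["paid"]),
            "date": row["claim_date"],
        })
    return warehouse


def build_mart(warehouse, subject):
    """Return the subset of warehouse rows that one analytic application needs."""
    return [row for row in warehouse if row["subject"] == subject]


warehouse = integrate(billing_rows, claims_rows)
claims_mart = build_mart(warehouse, "claims")
```

Here the warehouse holds every integrated record, while `claims_mart` is the narrower structure handed to a single group of analysts.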

(Figure labels, figure not reproduced: OPERATIONAL DATA, EXTERNAL DATA, LOAD MANAGER, DETAILED INFORMATION, INFORMATION, SUMMARY INFO, META DATA, QUERY MANAGER, WAREHOUSE MANAGER)

Dimension:

In dimensional modeling, a dimension is an aspect or perspective by which the facts may be accessed, selected, sequenced, grouped, filtered and presented; it is a collection of dimension levels.
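As a rough illustration of grouping facts by a dimension level, the sketch below assumes a hypothetical date dimension with three levels (day, month, year) and a small list of sales facts keyed to it. All names and figures are invented for the example.

```python
from collections import defaultdict

# Hypothetical date dimension: each key maps to its levels (day -> month -> year).
date_dim = {
    1: {"day": "2005-01-15", "month": "2005-01", "year": "2005"},
    2: {"day": "2005-01-28", "month": "2005-01", "year": "2005"},
    3: {"day": "2005-02-03", "month": "2005-02", "year": "2005"},
}

# Hypothetical facts: (date_key, sales_amount).
sales_facts = [(1, 100.0), (2, 50.0), (3, 70.0), (1, 30.0)]


def group_by_level(facts, dimension, level):
    """Total the facts at the chosen level of the dimension."""
    totals = defaultdict(float)
    for key, amount in facts:
        totals[dimension[key][level]] += amount
    return dict(sorted(totals.items()))


by_month = group_by_level(sales_facts, date_dim, "month")
by_year = group_by_level(sales_facts, date_dim, "year")
```

The same facts are presented at a finer or coarser grain simply by choosing a different level of the dimension.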

Data Quality:

Data quality work identifies data that does not conform to the standards for the warehouse. Data quality tools filter, clean and transform the data so that it conforms to the desired specifications. These tools also perform some data matching and integration of the data.
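A minimal sketch of the filter, clean, transform and match steps follows. The rules used here (a five-digit zip code, an age between 0 and 120, matching on cleaned name plus zip) are assumptions chosen for illustration, not standards taken from any specific tool.

```python
import re

# Hypothetical raw customer records arriving from operational systems.
raw = [
    {"name": "  Alice SMITH ", "zip": "10001", "age": "34"},
    {"name": "alice smith", "zip": "10001", "age": "34"},  # same person after cleaning
    {"name": "Bob Jones", "zip": "1000", "age": "41"},     # malformed zip code
    {"name": "Carol Lee", "zip": "20002", "age": "-5"},    # impossible age
]


def clean(record):
    """Normalize whitespace and case, and convert age to an integer."""
    return {
        "name": " ".join(record["name"].split()).title(),
        "zip": record["zip"].strip(),
        "age": int(record["age"]),
    }


def conforms(record):
    """Check the (assumed) warehouse standards for zip code and age."""
    return re.fullmatch(r"\d{5}", record["zip"]) is not None and 0 <= record["age"] <= 120


def apply_quality_rules(records):
    """Split cleaned records into accepted and rejected; drop matched duplicates."""
    accepted, rejected, seen = [], [], set()
    for rec in map(clean, records):
        key = (rec["name"], rec["zip"])  # simple matching key
        if not conforms(rec):
            rejected.append(rec)
        elif key not in seen:
            seen.add(key)
            accepted.append(rec)
    return accepted, rejected


accepted, rejected = apply_quality_rules(raw)
```

Rejected records are kept in their own list rather than discarded, so that they could be reviewed or corrected at the source.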



Future Evolution of Data Warehousing:




Distributed Data warehouse

In many organizations, multiple teams undertake data warehouse projects, resulting in multiple data warehouse systems across the enterprise. Although in the strictest sense there is only one enterprise data warehouse, with all other entities being subset or incremental data marts, not many organizations are this strict in semantics. Thus many enterprises around the world have two to six, or even a dozen or more, “data warehouse” systems. This proliferation of data warehouses has led to the next evolution of Enterprise Data Warehouse (EDW) architecture.





Extranet Data warehouse

Extranet data warehouses provide customers and/or suppliers with Web-based access to critical information, such as account, shipping, and performance data, through a portal interface. Extranet data warehouses are a competitive asset that enables companies to cement relationships with key customers and suppliers and to better manage supply chains.



Biotech Data warehouse

Biology and other life sciences will soon emerge as a major new market for data management vendors, says IDC. According to IDC, the Bio-IT market will increase at a compound annual growth rate of 24 percent to nearly $38 billion by 2006. Pharmaceuticals and biotechnology are the top target markets for Business Intelligence solutions in the future.



Real-time Analysis

Traditionally, data warehouses provide historical data collected on a monthly, weekly, or even daily basis. Now, companies are beginning to refresh data warehouses in near real time by trickle feeding transactions into the warehouse instead of performing batch loads, which may not work in terabyte-plus data warehousing environments. In addition, users increasingly need to perform complex analyses of transaction data to make decisions in minutes or hours instead of weeks or months. Some companies are using operational data stores and high-performance rules engines to support real-time analytics, while others are using "active" data warehouses.
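The difference between a periodic batch load and a trickle feed can be sketched as follows. This is a simplified illustration, assuming a hypothetical stream of (account, amount) transactions and an in-memory summary; a real warehouse would apply the same idea to persistent tables.

```python
from collections import defaultdict


class TrickleFedSummary:
    """Per-account running totals, updated as each transaction arrives."""

    def __init__(self):
        self.totals = defaultdict(float)
        self.loaded = 0

    def apply(self, txn):
        account, amount = txn
        self.totals[account] += amount
        self.loaded += 1


def batch_load(transactions):
    """Recompute all totals in one pass, as a periodic batch job would."""
    totals = defaultdict(float)
    for account, amount in transactions:
        totals[account] += amount
    return dict(totals)


# Hypothetical transaction stream.
stream = [("acct-1", 25.0), ("acct-2", 10.0), ("acct-1", -5.0)]

summary = TrickleFedSummary()
snapshots = []
for txn in stream:
    summary.apply(txn)
    snapshots.append(dict(summary.totals))  # queryable after every transaction
```

Both approaches end with the same totals; the trickle-fed summary simply makes intermediate states available without waiting for the next batch window.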


Data Mining:

Data mining helps in extracting hidden predictive information from databases by recognizing patterns and trends in data. The features of data mining products are:

Decision trees
Neural networks
Traditional statistical techniques
Prediction and time series

Data Mining Benefits:

In the insurance industry: predicting bankruptcy, risk analysis, credit and collection models.
In the finance industry: analysis and forecasting of business performance, stock and bond performance.
In market research: media selection, broadcasting analysis, product segmentation.
In banking: mortgage approval, loan underwriting, fraud analysis and detection.

Most companies already collect and refine massive quantities of data. Data mining techniques can be implemented rapidly on existing software and hardware platforms to enhance the value of existing information resources, and can be integrated with new products and systems as they are brought on-line. When implemented on high performance client/server or parallel processing computers, data mining tools can analyze massive databases to deliver answers to questions such as, "Which clients are most likely to respond to my next promotional mailing, and why?"
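The promotional-mailing question can be illustrated with a deliberately simple model. The sketch below assumes hypothetical historical records of (age band, zip code, responded) and estimates a response rate per segment; a real data mining tool would use richer techniques, so this is an illustration rather than a recommended method.

```python
from collections import defaultdict

# Hypothetical mailing history: (age_band, zip_code, responded).
history = [
    ("18-34", "10001", True),
    ("18-34", "10001", False),
    ("35-54", "10001", True),
    ("35-54", "10001", True),
    ("35-54", "20002", False),
    ("55+", "20002", False),
    ("55+", "20002", True),
    ("55+", "20002", False),
]


def fit_response_rates(records):
    """Estimate the observed response rate for each (age_band, zip_code) segment."""
    counts = defaultdict(lambda: [0, 0])  # segment -> [responses, mailings]
    for age_band, zip_code, responded in records:
        segment = counts[(age_band, zip_code)]
        segment[0] += int(responded)
        segment[1] += 1
    return {seg: responses / mailings for seg, (responses, mailings) in counts.items()}


def select_targets(model, customers, threshold):
    """Keep customers whose segment response rate meets the threshold."""
    return [
        c["name"]
        for c in customers
        if model.get((c["age_band"], c["zip_code"]), 0.0) >= threshold
    ]


model = fit_response_rates(history)
prospects = [
    {"name": "A", "age_band": "35-54", "zip_code": "10001"},
    {"name": "B", "age_band": "55+", "zip_code": "20002"},
    {"name": "C", "age_band": "18-34", "zip_code": "10001"},
]
targets = select_targets(model, prospects, threshold=0.5)
```

The per-segment rates in `model` also give a crude answer to the "and why?" part of the question, since they show which customer features go with higher response.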

The Foundations of Data Mining

Data mining techniques are the result of a long process of research and product development. This evolution began when business data was first stored on computers, continued with improvements in data access, and more recently, generated technologies that allow users to navigate through their data in real time. Data mining takes this evolutionary process beyond retrospective data access and navigation to prospective and proactive information delivery. Data mining is ready for application in the business community because it is supported by three technologies that are now sufficiently mature:

Massive data collection
Powerful multiprocessor computers
Data mining algorithms

Architecture for Data Mining:

To be applied to best effect, these advanced techniques must be fully integrated with a data warehouse as well as with flexible interactive business analysis tools. Many data mining tools currently operate outside of the warehouse, requiring extra steps for extracting, importing, and analyzing the data. Furthermore, when new insights require operational implementation, integration with the warehouse simplifies the application of results from data mining. The resulting analytic data warehouse can be applied to improve business processes throughout the organization, in areas such as promotional campaign management, fraud detection, new product rollout, and so on. The figure illustrates an architecture for advanced analysis in a large data warehouse.

































(Figure not reproduced; surviving labels, in order:)
Problem definition
Data recovery: client’s proprietary data files; syndicated data sources
Data preparation: data conversion; overlaying syndicated data
Modeling process: build model on first subsample; validate model on second (holdout) subsample
Implement model (e.g., score master file)
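The modeling steps named in the figure (build on a first subsample, validate on a holdout subsample, then score the master file) can be sketched as below. The data, the single "prior purchases" feature and the threshold-style model are all assumptions made for illustration; the source does not specify which modeling technique is used.

```python
import random

# Hypothetical labelled history: (number_of_prior_purchases, responded).
records = [
    (0, False), (1, False), (2, False), (3, True), (4, True), (5, True),
    (1, False), (6, True), (0, False), (4, True), (2, True), (3, False),
]


def split(rows, seed=0):
    """Shuffle with a fixed seed and cut into build and holdout subsamples."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]


def predict(threshold, purchases):
    """Predict a response when prior purchases reach the threshold."""
    return purchases >= threshold


def accuracy(threshold, sample):
    """Fraction of the sample that the threshold model classifies correctly."""
    hits = sum(predict(threshold, x) == y for x, y in sample)
    return hits / len(sample)


def build_model(sample):
    """Pick the threshold with the best accuracy on the build subsample."""
    candidates = sorted({x for x, _ in sample})
    return max(candidates, key=lambda t: accuracy(t, sample))


build, holdout = split(records)
threshold = build_model(build)
holdout_accuracy = accuracy(threshold, holdout)  # honest estimate on unseen rows

# Implement the model: score a hypothetical master file of (customer_id, purchases).
master_file = [("cust-1", 0), ("cust-2", 5), ("cust-3", 3)]
scored = [(cid, predict(threshold, n)) for cid, n in master_file]
```

Validating on rows the model never saw guards against a threshold that merely fits quirks of the build subsample.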

The ideal starting point is a data warehouse containing a combination of internal data tracking all customer contact coupled with external market data about competitor activity. Background information on potential customers also provides an excellent basis for prospecting. This warehouse can be implemented in a variety of relational database systems.


OLAP Analysis:

OLAP (On-Line Analytical Processing) helps managers, executives and analysts gain insight into data through a fast, consistent, multidimensional view of aggregate data that provides quick access to strategic information for further analysis.

The features of all the OLAP products are:

Multidimensional views of data.
Calculation-intensive capabilities.
Time intelligence.
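The three features listed above can be illustrated with a very small in-memory example. The fact rows, the dimension names and the year-to-date calculation are assumptions for this sketch; commercial OLAP products implement these ideas with dedicated storage and query engines.

```python
from collections import defaultdict

# Hypothetical fact rows: (region, product, month, revenue).
facts = [
    ("East", "Auto", "2005-01", 100.0),
    ("East", "Home", "2005-01", 40.0),
    ("West", "Auto", "2005-01", 60.0),
    ("East", "Auto", "2005-02", 120.0),
    ("West", "Home", "2005-02", 30.0),
]
DIMENSIONS = ("region", "product", "month")


def roll_up(rows, *dims):
    """Multidimensional view: total revenue for each combination of dims."""
    idx = [DIMENSIONS.index(d) for d in dims]
    cube = defaultdict(float)
    for row in rows:
        cube[tuple(row[i] for i in idx)] += row[-1]
    return dict(cube)


def year_to_date(rows, through_month):
    """Time intelligence: revenue from January through the given month."""
    year = through_month[:4]
    return sum(r[3] for r in rows if r[2][:4] == year and r[2] <= through_month)


by_region_month = roll_up(facts, "region", "month")
by_product = roll_up(facts, "product")

# Calculation on aggregates: share of total revenue from the Auto product.
share_auto = by_product[("Auto",)] / sum(by_product.values())
ytd_feb = year_to_date(facts, "2005-02")
```

Changing the arguments to `roll_up` gives a different view of the same facts, which is the essence of slicing aggregate data along its dimensions.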

Benefits of OLAP are:

Reports can be more flexible, have higher presentation quality and be more useful.
OLAP allows systems for executive information, decision support, management reporting, budgeting, forecasting and performance measurement to be totally integrated.

Conclusion:

Data warehousing has evolved rapidly and continues to be a very fast-moving and fast-changing market segment. Given an appropriate architecture and a suitable approach to the goal of the enterprise data warehouse, any team can deliver a high-impact, high-value, high-ROI and sustainable data warehouse system that will entirely change the range of potential outcomes for the enterprise.

Data mining is a best-in-class approach to leveraging corporate information. An organization’s inherent requirement for reporting, analysis, modeling and planning applications is to use information from the data warehouse to help drive improved business performance. To meet this requirement successfully, an organization must first understand past performance and second, have the ability to prepare for and manage the future.