ISM3610 Decision Support and Intelligence System Data Warehousing

sunfloweremryologistΔιαχείριση Δεδομένων

31 Οκτ 2013 (πριν από 3 χρόνια και 10 μήνες)

130 εμφανίσεις


1

ISM3610

Decision Support and Intelligence
System




Data
Warehousing




B
y

Group
B


C
han Chi Leung (03012034)

Chan Wing Sze (03012077)

C
heung Helios Su Ho (03012107)

Fong Yau Shing (
03000923
)

K
ong Kevin Tsz Wang (03012239)

L
au Ka Wing (03012255)

P
ong Shuk
Ting (04001737)

Wong Chi Ho (03012468)


1
9
th April, 2007


2

T
able of Contents

1.

Introduction

................................
................................
................................
...

5

1.1

What is data warehouse?

................................
...............................

5

1.2

Construction
................................
................................
......................

8

1.3

Data Acquisition and Collection

................................
.....................

8

1.4

Metadata
................................
................................
............................

9

1.5

Data Marts
................................
................................
.......................

10

1.6

Trustworthiness and Security
................................
.......................

10


2.

Characte
ristics of a Data Warehouse

................................
.....................

11

2.1

Subject Oriented

................................
................................
............

11

2.2

Integrated

................................
................................
........................

11

2.3

Time Variant

................................
................................
....................

12

2.4

Non
-
Volatile
................................
................................
.....................

12


3.

Data Warehouse Architecture

................................
................................
..

14

3.1

Operational Database / External Database Layer

....................

15

3.2

Information Access Layer

................................
.............................

15

3.3

Data Access Layer

................................
................................
.........

16

3.4

Data Warehouse (Physical) Layer

................................
..............

17

3.5

Application Messaging Layer

................................
.......................

18

3.6

Process Management Layer

................................
........................

18

3.7

Data Directory (Metadata) Layer

................................
.................

18

3.8

Data Staging Layer

................................
................................
........

19



3

4.

Examples on Data Warehousi ng Vendors

................................
.............

20

4.1

IBM

................................
................................
................................
...

20

4.1.1

Introduction to IBM
................................
.............................

20

4.1.2

Features of DB2 Data Warehouse Edition
.....................

20

4.1.3

Advantages

................................
................................
.........

22

4.1.4

Disadvantages
................................
................................
....

23

4.1.5

Application of DB2 DWE in Copenhagen’s TDC

..........

23


4.2

Oracle

................................
................................
..............................

27

4.2.1

Introduction to Oracle

................................
........................

27

4.2.2

Features of Oracle Data Warehousi ng

...........................

28

4.2.3

Advantages

................................
................................
.........

32

4.2.4

Disadvantages
................................
................................
....

33

4.2.5

Application of Oracle Datawarehousing in Absa Group
Limited


................................
................................
..............................

34


4.3

SAS

................................
................................
................................
..

35

4.3.1

Introduction to SAS
................................
............................

35

4.3.2

Features of SAS Warehousi ng Admi nistrator

................

36

4.3.3

Advantages

................................
................................
.........

40

4.3.4

Disadvantages
................................
................................
....

40

4.3.5

Application of SAS Da
ta Warehousi ng in the HK Trade
Development Council

................................
................................
........

41



4

5.

How to implement Data Warehouse successfully

................................

42

5.1

“If you Built It,

They Will Come”

................................
...................

42

5.2

Omission of an Architectural Framework

................................
...

42

5.3

Understanding the Importance of Documenti ng Assumptions
43

5.4

Failure to Use the Right Tool for the Job
................................
....

43

5.5

Life Cycle Abuse

................................
................................
............

44

5.6

Ignorance Concerni ng the Resolution of Data Conflicts
..........

44

5.7

Failure to Learn from Mistakes

................................
....................

44

6.

Concerns & Concl
usion

................................
................................
............

46

7.

References
................................
................................
................................
..

47



5

1.

Introduction

Since the early 1990s, data warehouses have been at the forefront of
information technology applications as a way for o
rganizations to
effectively use digital information for business planni ng and decision
making. As i nformation professionals, we no doubt will encounter the data
warehouse phenomenon if we have not already been exposed to it i n our
work. Hence, an understan
di ng of data warehouse system architecture is
or will be important in our roles and responsibilities in i nformation
management.


1.1

What is data warehouse?

Simply saying, a data warehouse could be thought of as a place for
secondhand data that originates in e
ither other corporate applications,
such as the one our company uses to solve pri nter problems that are
reported from customers, and our front and second li ne support staff, or
some other data source external to our company, such as a public
database that
contains customer support i nformation gathered from our
competitors.


Technically, a data warehouse is the coordinated, architected, and periodic
copying of data from various sources, both i nside and outside the
enterprise, i nto an environment optimized f
or analytical and informational
processing. The key here is that the data is copied (duplicated) i n a
controlled manner and is copied periodically (batch
-
oriented processi ng).

6

Data warehousi ng is also, therefore, the process of creati ng an architected
info
rmation
-
management solution to enable analytical and informational
processing despite platform, application, organi zational, and other barriers.
The key concept here is that barriers are bei ng broken and distributed
information is being consolidated for an
alysis, although no preconcei ved
notion exists for the exact means of doing so, such as duplicating data.


As we all know that, large companies use software packages that gather
and store data in special configurations called data warehouses. Since a
data

warehouse is an integrated collection of data it can support
management analysis and decision making. For example, i n a typical
company, data is generated by transaction
-
based systems, such as order
entry, i nventory, accounts recei vable, and payroll. If a

user wants to know
the customer number on a particular sales order, they can retrieve the data
easily from the order entry system application.


On the other hand, suppose that a user wants to see May sales results for
the sales representati ve assigned to

a specific customer, as shown i n the
figure 1

for a typical data warehouse.


7



F
igure 1


Typical Data Warehouse


Although the information
systems are interactive, it is difficult for a user to
extract specific data that spans several systems and time frames; the
average user might need assistance from the IT staff.


What's nice about a data warehouse is that rather then accessing separate
s
ystems, the data warehouse stores transaction data i n a format that
allows users to access, combine, and analyze the data. Again, this should
help in taming and controlli ng data volume. A data warehouse allows users
to specify certain dimensions, or charac
teristics. In a consumer products
data warehouse, dimensions might include time, customer, and sales
representative. By selecting values for each characteristic, a user can

8

obtain multidimensional information from the stored data.


Data warehousi ng is als
o a collection of decision support technologies,
aimed at enabling the knowledge worker, who could be an executi ve,
manager, or analyst to make better and faster decisions.


1.2

Construction

The steps i n planni ng a data warehouse are identical to the steps f
or any
other type of computer application. Users must be involved to determine
the scope of the warehouse and what business requirements need to be
met. After selecting a focus area, for example, analyzing the use of state
government records over time, a d
ata warehouse team of business users
and information professionals compiles a list of different types of data that
should go into the warehouse. After busi ness requirements have been
gathered and validated, data elements are organized i nto a conceptual
dat
a model. The conceptual model is used as a bluepri nt to develop a
physical database design. As in all systems design projects, there are a
number of iterations, prototypes, and technical decisions that need to be
made between the steps of systems analysis,

design, development,
implementation, and support.


1.3

Data Acquisition and Collection

The data warehouse team must determine what data should go into the
warehouse and where those particular pieces of i nformation can be found.
Some of the data will be inte
rnal to an organization. In other cases, it can

9

be obtained from another source. Another team of analysts and
programmers create extraction programs to collect data from the various
databases, files, and legacy systems that have been identified, copying
ce
rtain data to a staging area outside of the warehouse. At this point, they
ensure that the data has no errors, and then copy it all i nto the data
warehouse. This source data extraction, selection, and transformation
process is unique to data warehousing. S
ource data analysis and the
efficient and accurate movement of source data into the warehouse
environment are critical to the success of a data warehouse project.


1.4

Metadata

Good metadata is essential to the effective operation of a data warehouse
and it
is used i n data acquisition/collection, data transformation, and data
access. Acquisition metadata maps the translation of information from the
operational system to the analytical system. This i ncludes an extract
history describing data origins, updates,
algorithms used to summarize
data, and frequency of extractions from operational systems.
Transformation metadata includes a history of data transformations,
changes i n names, and other physical characteristics. Access metadata
provides navigation and grap
hical user interfaces that allow non
-
technical
business users to interact i ntuiti vely with the contents of the warehouse.
And on top of these three types of metadata, a warehouse needs basic
operational metadata, such as procedures on how a data warehouse
is
used and accessed, procedures on monitori ng the growth of the data
warehouse relati ve to the available storage space, and authori zations on

10

who is responsible for and who has access to the data i n the data
warehouse and data in the operational system.


1.5

Data Marts

Data in a data warehouse should be reasonably current, but not
necessarily up to the mi nute, although developments i n the data
warehouse i ndustry have made frequent and i ncremental data dumps
more feasible. Data marts are smaller than data war
ehouses and generally
contain i nformation from a single department of a business or organization.
The current trend in data warehousing is to develop a data warehouse with
several smaller related data marts for specific ki nds of queries and reports.


1.6

Trus
tworthiness and Security

As with any information system, trustworthiness of data is determi ned by
the trustworthi ness of the hardware, software, and the procedures that
created them. The reliability and authenticity of the data and i nformation
extracted f
rom the warehouse will be a function of the reliability and
authenticity of the warehouse and the various source systems that it
encompasses. In data warehouse environments specifically, there needs
to be a means to ensure the integrity of data first by ha
ving procedures to
control the movement of data to the warehouse from operational systems
and second by having controls to protect warehouse data from
unauthori zed changes. Data warehouse trustworthiness and security are
conti ngent upon acquisition, transf
ormation and access metadata and
systems documentation.


11

2.

Characteristics of a Data Warehouse

This part focuses on the fundamental characteristics of a data warehouse.
Bill Inmon,
is recogni zed as

father of data warehousing

,
has defined data
warehousi ng a
s a database containing
Subject Oriented
,
Integrated
,
Time

Variant
and
Non
-
volatile

i nformation used to support the decision
making process

(
Martyn R Jones
,1999)
.
T
he followi ng will explai n these
four fundamental characteristics of data warehouse.


2.1

Subjec
t Oriented

Operational databases, such as order processi ng and payroll databases,
are organi zed around busi ness processes or functional areas. These
databases grew out of the applications they served. Thus, the data was
relati ve to the order processing ap
plication or the payroll application. Data
on a particular subject, such as products or employees, was maintai ned
separately (and usually i nconsistently) in a number of different databases.
In contrast, a data warehouse is organi zed around subjects. This s
ubject
orientation presents the data in a much easier
-
to
-
understand format for
end users and non
-
IT business analysts.


2.2

Integrated

Integration of data withi n a warehouse is accomplished by maki ng the data
consistent in format, naming, and other aspects. Op
erational databases,
for historic reasons, often have major i nconsistencies i n data
representations. For example, a set of operational databases may

12

represent "male" and "female" by usi ng codes such as "m" and "f", by "1"
and "2", or by "b" and "g". Often,

the inconsistencies are more complex and
subtle. In a data warehouse, on the other hand, data is always maintai ned
in a consistent fashion.


2.3

Time Variant

Data warehouses are time variant i n the sense that they both

mai ntain

historical and (nearly) current

data. Operational databases, in contrast,
contain only the most current, up
-
to
-
date data values. Furthermore, they
generally maintain this i nformation for no more than a year (and often
much less). In contrast, data warehouses contai n data that is general
ly
loaded from the operational databases daily, weekly, or monthly which is
then typically maintai ned for a period of 3 to 10 years. This is a major
difference between the two types of environments.


Historical i nformation is of high importance to decision

makers, who often
want to understand trends and relationships between data. For example,
the product manager for a Liquefied Natural Gas soda drink may want to
see the relationship between coupon promotions and sales. This is
information that is almost im
possible
-

and certai nly in most cases not cost
effective
-

to determine with an operational database.


2.4

Non
-
Volatile

Non
-
volatility, the fi nal primary aspect of data warehouses, means that
after the data warehouse is loaded there are no changes, i nserts, o
r

13

deletes performed agai nst the informational database. The data
warehouse is, of course, first loaded with transformed data that originated
in the operational databases.


The data warehouse is subsequently reloaded or, more likely, appended
on a periodic
basis (usually nightly, weekly, or monthly) with new
transformed data from the operational databases. Outside of this loading
process, the data warehouse generally stays static. Due to non
-
volatility,
the data warehouse can be heavily optimized for query p
rocessing.



14

3.

Data Warehouse Architecture

A Data Warehouse Architecture (DWA) is a way of representi ng the overall
structure of data, communication, processi ng and presentation that exists
for end
-
user computing within the enterprise. The architecture is ma
de up
of several components:



Operational Database / External Database Layer



Information Access Layer



Data Access Layer



Data Warehouse Layer



Application Messaging Layer



Process Management Layer



Data Directory (Metadata) Layer



Data Staging Layer

The
figure below shows how the different layers are i nter
-
connected
together.


F
igure 2


Data Warehouse Architecture



15

3.1

Operational Database / External Database Layer

Operational systems process data to support critical busi ness operational
needs. Operational
databases have been created to provide an efficient
processing structure for a relati vely small number of well
-
defi ned business
transactions. However, because of the limited implementation of
operational systems, the databases designed to support operation
al
systems have difficulty accessi ng the data for other management or
informational purposes. This difficulty i n accessing operational data is
amplified by the fact that many operational systems are often very old in
age. This means that the data access te
chnology available to obtain
operational data itself is dated.


The goal of data warehousing is to free the i nformation that is locked up in
the operational databases and to mi x it with i nformation from other external
sources of data. Nowadays, many large
organizations are acquiring
additional data from outside databases. This information includes
demographic, economic, competiti ve and purchasing trends. The so
-
called
"information superhighway" is providing access to more data resources
every day.


3.2

Informat
ion Access Layer

The Information Access layer of the Data Warehouse Architecture is the
layer that the end
-
user deals with directly. In particular, it represents the
tools that the end
-
user normally uses day to day, e.g., Excel, Lotus 1
-
2
-
3,
Access, SAS, e
tc. This layer also i ncludes the hardware and software

16

involved i n displaying and printing reports, spreadsheets, graphs and
charts for analysis and presentation. Over the past two decades, the
Information Access layer has expanded enormously, especially a
s
end
-
users have moved to PCs and PC/LANs.


Today, more and more sophisticated tools exist on the desktop PC for
manipulati ng, analyzi ng and presenti ng data; however, there are
significant problems in maki ng the raw data contained i n operational
systems av
ailable easily to end
-
user tools. One of the key problems is to
find a common data language that can be used throughout the enterprise.


3.3

Data Access Layer

The Data Access Layer is involved with allowi ng the Information Access
Layer to communicate to the Op
erational Layer. Today the common data
language that has emerged is SQL. Originally, SQL was developed by IBM
as a query language, but over the last twenty years has become the
standard for data interchange.


One of the key breakthroughs of the last few ye
ars has been the
development of a series of data access "filters" such as Enterprise Data
Access (EDA)/SQL that make it possible for SQL to access nearly all
DBMSs and data file systems, relational or non
-
relational. These filters
make it possible for Info
rmation Access tools to access data stored on
database management systems that are even twenty years old.



17

The Data Access Layer not only spans different DBMSs and file systems
on the same hardware, it spans manufacturers and network protocols as
well. One

of the keys to a Data Warehousi ng strategy is to provide
end
-
users with "universal data access". Universal data access means that,
theoretically, end
-
users, regardless of location or Information Access tool,
should be able to access any or all of the data

i n the enterprise that is
necessary for them.


In some cases, this is all that certain end
-
users need. However, in general,
organi zations are developi ng a much more sophisticated scheme to
support Data Warehousing.


3.4

Data Warehouse (Physical) Layer

The cor
e Data Warehouse is where the actual data used for informational
uses occurs. In some cases, one can thi nk of the Data Warehouse simply
as a logical or virtual view of data. In many instances, the data warehouse
may not actually involve storing data.


In a

Physical Data Warehouse, copies, in some cases many copies, of
operational and or external data are actually stored in a form that is easy to
access and is highly flexible. Increasi ngly, Data Warehouses are stored on
client/server platforms, but they are
often stored on main frames as well.



18

3.5

Application Messaging Layer

The Application Message Layer has to do with transporti ng i nformation
around the enterprise computi ng network. Application Messagi ng is also
referred to as "middleware", but it can i nvolve
more than just networking
protocols. Application Messaging for example can be used to isolate
applications, operational or i nformational, from the exact data format.
Application Messaging can also be used to collect transactions or
messages and deliver the
m to a certai n location at a certain time.
Application Messaging is the transport system underlying the Data
Warehouse.


3.6

Process Management Layer

The Process Management Layer is involved in scheduling the various
tasks that must be completed to build and m
aintai n the data warehouse
and data directory i nformation. The Process Management Layer can be
regard as the scheduler or the high
-
level job controller for the many
processes that must be done to keep the Data Warehouse up
-
to
-
date.


3.7

Data Directory (Metadat
a) Layer

In order to provide for universal data access, it is necessary to mai ntain
some form of data directory or repository of meta
-
data information.
Meta
-
data is the data about data within the enterprise. Record descriptions
in a COBOL program are meta
-
data. So are DIMENSION statements in a
FORTRAN program, or SQL Create statements.



19

In order to have a fully functional warehouse, it is necessary to have a
variety of meta
-
data available, data about the end
-
user views of data and
data about the operational

databases. Ideally, end
-
users should be able to
access data from the data warehouse without having to know where that
data resides or the form in which it is stored.


3.8

Data Staging Layer

The fi nal component of the Data Warehouse Architecture is Data Stagin
g.
Data Staging is also called copy management or replication management.
Actually, it includes all the processes necessary to select, edit, summari ze,
combine and load data warehouse and information access data from
operational and/or external databases.


Data Staging often i nvolves complex programming, but increasi ngly data
warehousi ng tools provide help i n this process. Data Staging may also
involve data quality analysis programs and filters that identify patterns and
data structures within existing oper
ational data.



20

4.

Examples on Data Warehousing Vendors

A
s IBM, Oracle, and SAS are the famous software vendors.
A
lso, their
data warehouse technology that provided by those vendors are widely
common used by different industries.
T
herefore, we chose data ware
house
examples from these companies.


4.1

IBM

4.1.1

Introduction to IBM

IBM is aligned around a si ngle, focused business model: i nnovation. It
takes its breadth and depth of i nsight on issues, processes and operations
across a variety of industries, and invents and
applies technology to help
solve its clients' most intractable business and competitive problems. It
provides different types of data warehouses for the users to deliver
dynamic warehousing. One of the data warehouse is the DB2 Data
Warehouse Edition (DB2
DWE).


4.1.2

Features of DB2 Data Warehouse Edition

DB2 DWE integrates and simplifies the data warehouse environment to
deliver all of the capabilities in order to consolidate, manage, deli ver and
analyze your busi ness i nformation. It is optimized for reporting
and
analysis and data are summari zed and stored in a dimension
-
based model.
It can allow the people get a good high
-
level understanding of what it takes
to implement a successful data warehouse project in their business. It
represents the IBM offeri ng for
implementi ng i ntegrated Business

21

Intelligence solutions i n order to remove cost and time to facilitate the data
analysis for the business.



F
igure 3


Platform of IBM DB2 Warehouse


4.1.2.1

Power
ful DB2 data server foundation

The IBM DB2 platform is the foundati
on for the DB2 Warehouse solution.
With its massi vely scalable, shared
-
nothing distributed architecture, DB2 9
provides high performance for mi xed workload query processing against
both relational and native XML data. Advanced features such as data
partiti
oning, new row compression, multidimensional clusteri ng and
materialized query tables (MQTs) make DB2 a powerful engi ne for
dynamic warehousing.



22

4.1.2.2

Do it right

DB2 DWE captures new opportunities with a highly flexible, scalable data
warehousi ng framework, an
d combi nes common design tools, advanced
compression technology, inline analytics and pre
-
built mining capabilities.


4.1.2.3

Do it smarter

It can increase the return on your data warehouse investment by choosing
a high
-
performance, open
-
standards
-
based solution t
hat can be rapidly
implemented with reduced risk to your business.


4.1.2.4

Modeling and design tool

It provides the core components to graphically model data structures,
move and transform data withi n the data warehouse, implement online
analytical process (OLAP)
, build and score data mining models, and finally
the ability to develop embedded analytic application components.


4.1.3

Advantages

DB2 DWE is a comprehensi ve and i ntegrated solution to enterprise data
warehouse development. It provides tools to help data wareh
ouse
administrator on designing, deploying and maintai ning enterprise data
warehouse. DB2 DWE

s multidimensional database provides OLAP
(Onli ne analytical processing) which allow users to view data i n the system
from different point of view dynamically. Us
ers can generate statistics by

23

specifying their own requirement in DB2 DWE. On the other hand, DB2
DWE provides advanced compression of data which lowers the cost of
storing large volume of data. Benchmark reports that DB2 DWE can save
45
-
69 percent of dis
k spaces. The
compression of data also reduces the
read/write

frequency of the storage devices. Thus, the
efficiency of
querying is higher

than uncompressed data.


4.1.4

Disadvantages

The disadvantages of using DB DWE are the high cost and the high
system requi
rement. It costs about US $1,000 for each year license. The
system hardware requirement is high because of the data compression
scheme. Higher processi ng power is needed for both compression and
decompression on data access. Typical personal computers cann
ot meet
the requirement. Thus, a powerful server is needed. The cost of usi ng DB2
DWE
rises

for additional hardware.


4.1.5

Application of DB2 DWE

in Copenhagen

s TDC

Here is a real case of company getti ng benefits from DB2 DWE.
Copenhagen’s TDC, Denmark
’s leadi
ng telecommu
nications company,
can testify to the ongoing love affair of people with their telephones.

24

Danish customers make so many calls that each month TDC has to deal
with 1.5 terabytes of new raw data. The company’s i nformation technology
(IT) team re
ali zed that its existing technology system rapidly was running
out of storage capability. After upgrading their system to DB2 DWE, they
found that the producti vity is increased by offeri ng higher levels of
performance and additional applications to interna
l users. Customer
service is improved by offering most economical service plan based on
usage. Also, the marketing is enhanced through better customer targeting
and campaign tracki ng. TDC found that before the new system, TDC’s
batch window hardly afforded

enough time for the required data to be
processed by the next morning. In the past, if the team lost even one day
of producti vity, it would take us as much as a week to catch up. Now, TDC
has no problems

receiving, loading and process
ing data by morning.


For
more i nformation about the case
, please refer to the URL,
http://www
-
306.ibm.com/software/success/cssdb.nsf/CS/SPAT
-
6ATKAP?OpenDocument
&S
ite=dmbi&cty=en_us

.



25

S
ome screen shots of DB2 DWE are shown as the following.


T
he following f
igure

show
s

how DB2 DWE works. DB2 DWE can generate
reports on user

s needs.



Figure 4


Example on
Reports Generation on user

s needs (DB2 DWE)



26

If user wa
nts a report of more detailed level, detailed reports can be show
in very simple operation.



F
igure 5


Example on
Details Report (DB2 DWE)


DB2 DWE provides an interactive platform between the system developers
and the users. When the developers make so
me changes to the system,
message is prompted to users to tell them what have been changed.



F
igure 6


Example on
Prompt Message (DB2 DWE)


27

On users


need, some criteria can be set to the reports. The data entry
which conflicts with the criteria would be

highlighted. This makes user
easier to notice the characteristics of the data.


4.2

Oracle

4.2.1

Introduction to Oracle

Oracle Database
havi ng
the ideal technology for the data warehouse
because the software’s open i nterfaces offered easy i ntegration with
multiple
systems. This was important as the company wanted to import
data from existing applications i nto the data warehouse. It also
accommodated large amounts of detail, down to i ndividual flights in
specific segments.


Furthermore, the Oracle solution offered a

flexible structure for reporting,
enabli ng
customer
to design customized reports and allowing staff to
undertake multi
-
dimensional analysis. Oracle is also highly scalable,
ensuring it can cater for future growth.


A research from Wi nter Corporation and O
racle stated that the si ze of a
data warehouse triple itself i n every 2 years since 2001. There are many
reasons that lead to the significant growth i n si ze of the data warehouse,
which many of them can be explained by the industrial trends i n data
warehou
sing.


Oracle stated that the first reason that can explain this phenomenon is the

28

development of real
-
time busi ness (Oracle.com, 2006). Organi zations
strive to react to the market changes as quickly as possible to gain market
advantages. Data latency has

to be reduced in order to achieve a real
-
time
business model and as a result, the data si ze wi ll gradually i ncrease.
Enterprises also tend to have a detailed log of the enterprise data as some
regulatory compliance like Basel II (
the International Converg
ence of
Capital Me
asurement and Capital Standards) requires organi zations to
capture and retain detailed transaction histories. Moreover, new types of
storage
-
intensi ve information can create new business opportunities, like
the RFID technology, which is a
lso one of the reasons in resulti ng with a
huge size of the data warehouses.


Besides data volume, data warehousi ng is experiencing growth in different
dimensions. Traditionally data warehouses, or databases, were only used
for reporting and analysis, but
nowadays they often come with a prediction
function and are shared and i ntegrated to application. Furthermore, the
accessibility of data warehouse is no longer limited to users withi n the
enterprise, but to other customers, partners, and suppliers. Togethe
r with
the i ncreasing complexity of queries for i ntensive analysis of sophisticated
business i ntelligence applications, there are many criteria for a data
warehouse to satisfy today’s need.


4.2.2

F
eatures of Oracle Data Warehousing

Oracle, one of the most popul
ar databases for data warehousing, has
developed its data warehousing application to fit to the above needs.

29

There are some key features of the Oracle application that i ncrease its
capability.


4.2.2.1

Partitioning

Partitioning is the “foundation” for achieving ef
fective performance in
large
-
scale Oracle data warehouses. It means splitting data i nto separate
“chunks”. It can shorten the response time and i ncrease throughput. Some
other different features that wi ll be discussed below have to function
depending on th
e partitioning of the data warehouse.


4.2.2.2

Parallel Operation

Parallelism enables scalability, which makes large workloads, large
databases, and very large data warehouses (VLDW) possible. It is
because if not all parts i n the system is functioning in paralle
l, any
single
-
threaded path can potentially bottleneck the throughput of the
system and as a result limiting its ability to scale.


4.2.2.3

Materialized views

Materialized views can enable sophisticated data analysis on large data
sets. Significant processes woul
d have to be allocated to complicated joins
and aggregations i n order to produce the complex summaries required
without materialized views. On the contrary, however, queries can be
written against tables and views that they have been logically designed,
an
d the application will deal with the physical tables. Materiali zed views
often provide the performance boost necessary to turn a runaway query
into a powerful analytical tool.


30

4.2.2.4

Intelligent Optimization

The intelligent optimi zer selects the best strategy and

optimizes the order
of operations. As a result the query performances of i ndexi ng, partitioning,
and other data access features can be speeded up.



F
igure 7


Example on
query (Oracle Datawarehousing)


4.2.2.5

Table Compression

Data compression to save disk spa
ce is an attracti ve option to save costs
by decreasi ng storage requirements. Whereas traditional data
compression will lead to query performance degradation, Oracle’s table
compression feature eliminates duplicate, or redundant, data values
without any neg
ative impacts on the query performance.


31

4.2.2.6

Online Analytical Processing

OLAPs are deployed to gain better visibility i nto the busi ness. It helps to
understand what’s happening, why it’s happeni ng, and what will happen to
the business. Thus, all necessary kno
wledge and i nformation for planning,
budgeting, forecasting, sales, and marketing functions can be derived from
the existing databases. Oracle’s OLAP product uses a si ngle database
platform for all query processi ng. Both SQL and OLAP API queries can be
dir
ected to 1 single data store. Without the need of transferring data to
different environments, users can benefit from reducing data latency, faster
access to more recent data, and reducing low cost and complexity.



F
igure 8


Example on OLAPs (Oracle Dat
awarehousing)


32

4.2.2.7

Data Mining

Data mini ng is intended to sift through volumes of data to fi nd hidden
patterns. These patterns can derive new busi ness insights that can attract
and retai n customers, enhance customer and supplier relationships,
identify new sale
s opportunities, or identify potentially fraudulent behavior.


4.2.3

A
dvantages



Provided an i ntegrated view of the business by building an enterprise
data warehouse



Supported decision
-
making and busi ness analysis at all levels of the
company



Improved performan
ce through early detection of market opportunities



Catered for future growth with scalable solution



Oracle is better known and management often feels more comfortable
with a better
-
known vendor and product.



There are more Oracle DBAs in the job market



T
here are more books written on supporting Oracle



Vendors of data warehouse products will almost always write their
products to support Oracle first. In addition, there is usually more
experience with these products with Oracle



Oracle has a pretty complete

suite of products for data warehouse.



A company usi ng Oracle's ERP products almost always use Oracle’s
RDBMS for their data warehouse.



33

4.2.4

D
isadvantages



It can be extremely expensi ve to build and mai ntain. You may need to
have buy
-
in from senior management
to get approval for a data
warehouse.



You need large amounts of storage space, potentially one terabyte or
more.



Because there is a huge amount of data, it is possible to write queries
that seem to run forever and never come back with an answer (the
quer
y from the Twilight Zone).



The data is not up
-
to
-
date?
I
n some cases, 24 hours or more old.



They are not easily changed. If you spot an error in the data
warehouse, you will have to correct it i n the source system. If that
system cannot be changed, the d
ata warehouse cannot be changed
and you will have to live with incorrect data. For example, company
"ABC Widgets" could be stored in the database as "A.B.C. Widgets",
"AB and C Widgets", or "AB&C". Unless you know about these
possible irregularities, you w
ill get incomplete results. You may have a
difficult time persuading your company to change their procedures to
satisfy the data warehouse.



Because the data is coming from different sources, you may not be
able to get the same answer from your OLTP system

as you do from
the data warehouse. It will be difficult, if not impossible, to identify if
any OLTP transactions are missing from the data warehouse.



34

4.2.5

Application of Oracle Datawarehousing

in Absa Group
Limited

Absa Group Limited, one of South Africa’s l
argest fi nancial services
organi zations offers a complete range of products and services. Absa has
assets of R372 billion (US$62 billion), 686 staffed outlets, 5,468 ATMs and
South Africa’s largest i nternet banking customer base. In 2005, Barclays
took a m
ajority stake, to help Absa become the financial services leader in
South Africa and ultimately the pre
-
eminent bank on the African continent.


4.2.5.1

Challenges



Improve Absa’s business responsiveness by consolidating its
fragmented busi ness i ntelligence environm
ent, which required
compiling 1,200 reports and 31 business intelligence projects



Align busi ness intelligence to corporate strategy by standardi zed
methodology, architecture, tools and measurement



Deliver reports required by all business units



Cut costs o
f delivering and printing manually generated reports



Replace paper
-
based reports with electronic i ntelligence for indi vidual
business units to reduce report delivery
-
to
-
desk time


4.2.5.2

Solution



Used Oracle Database as the si ngle source of data to make the
Ente
rprise Data Warehouse more efficient



Consolidated the business i ntelligence environment on Oracle
Application Server to reduce duplication of reports


35



Implemented a business intelligence methodology with OLAP and
Oracle Balanced Scorecard tools to support c
ommon strategic
planning across the group



Aligned busi ness performance measurement to focus on causes of
problems and thus ensure better business decision making



Used Oracle Warehouse Builder to create common processes for
extracting data and loading into

the Data Warehouse



Able to source data from 52 core banking systems, on a daily, weekly
or monthly basis, as well as external data sources



Implemented Oracle Discoverer for end
-
user analysis



Anticipated cost savings from reduced manual reporting and remov
al
of disparate BI projects represents a possible return on investment of
more than 300% over five years


4.3

SAS


4.3.1

I
ntroduction to SAS

S
AS Institute Inc., has been a major producer of software si nce it was
founded in 1976. SAS was
originally

an acronym for Sta
tistical Analysis
System but for many
years

has been used as an arbitrary trade
-
name to
refer the
company

as a whole.


T
he SAS System,
originally

Statistical Analysis System, is an i ntegrated
system of software products provided by SAS
Institute

that enabl
es the
programmer to perform:



D
ata entry, retrieval, management, and mining


36



R
eport writing and graphics



S
tatistical and mathematical analysis



B
usiness planning, forecasting, and decision support



O
perations research and project management



Q
uality improvemen
t



A
pplications development



Data warehousing (extract, transform, load)



platform independent and remote computing


In addition, the SAS System integrates with many SAS business solutions
that enable large scale software solutions for areas such as human
res
ource management, fi nancial management, busi ness intelligence,
customer relationship management and more.



4.3.2

F
eatures of SAS Warehousing Administrator

SAS/warehousing admi nistrator is designed for the IT professional
responsible for creati ng and managing da
ta warehouse / data mart
processes. It provides Customi zable solution that offers a si ngle point of
control, maki ng it easier to respond to the ever
-
changi ng needs of the
business community. Also, it simplifies the creation and maintenance of
data warehou
ses.


The main benefit of using SAS/warehousing admi nistrator is simplifying the
setup and management of multiple data warehouses and data marts. The
details are as follows,


37



Integrates extraction, transformation and loading tools for building and
managing
data warehouses/data marts.



Provides a framework for effecti ve warehouse management through a
metadata
-
driven architecture.



Facilitates busi ness subject defi nition, consolidation of busi ness rules,
scheduli ng of processes for warehouse maintenance and in
tegration
with decision
-
support tools for effective warehouse exploitation.



Leverages the strengths of SAS software and rapid warehousi ng to
deliver the well
-
proven benefits of a data warehouse even faster.


With usi ng the graphical user interface, the v
isuali zation, navigation and
maintenance of the data warehouse are simplified and eliminate much of
the coding work required to build and manage it. Moreover, it offers the
adaptability and the manageability you need as your busi ness and
information needs

change, as more data is added, as processes become
more complex, and as users require greater support.


F
igure

9



Example

on
use interface
(SA
S/
Warehousing Administrator)


38


F
igure

10



Example on reports (SAS Data Warehouse)


4.3.2.1

SAS Enterprise Data Integra
tion

In different from common data warehousi ng, SAS provides a complete
functional data capturing, storage, integration and analysis software across
the enterprise. The SAS Enterprise Data Integration attains and manages
consistent and trusted data through
out the organization i n a flexible and
reliable manner.




Graphical user i nterface provides technicians with an interacti ve,

39

single poi nt of control for managi ng data integration processes,
including wi zards for bui lding and executing data access,
transform
ations and storage process flows.



Connectivity to more data sources on more platforms such as IBM
DB2, Oracle DB, Microsoft Access, Sybase, etc.



Data quality embedded into batch, near
-
time and real
-
time processes



Metadata is captured and documented through
out transformation and
data integration processes



Migrate or synchronize data between database structures, enterprise
applications, mainframe legacy files, text, XML and message queues



Join data across these virtual data sources for real
-
time access and
an
alysis



Business metadata design i nterface allows data analysts to quickly
build semantic layer



Business rules library for reusable busi ness rules clean, standardize,
match and enhance data as it moves i nto the master reference file
and is reused for downst
ream processes


F
igure

11



Overview on SAS Data Integration


40

4.3.3

A
dvantages

4.3.3.1

High Compatibility

Access to ERP systems such as Baan, People Soft, and SAP; relational
databases such as DB2, Oracle, Informix, ODBC, MS SQL Server, Sybase,
and Teradata; and non
-
rel
ational databases such as Adabas and PC file
formats.


4.3.3.2

Point
-
and
-
click interface

The user friendly i nterface enable data management specialist
implementing the warehousing application without the assist of
programmer and also operators
.


4.3.4

D
isadvantages

4.3.4.1

Unkn
own Implementation cost

When compare with other, like Oracle, SAS does not has a well pricing
policy. It gives difficulty for customers to choose between available
products.


4.3.4.2

Unknown difficulty of implementation

Compare with other company, such as Oracle o
ffers data warehouse and
analytic specific services that combi ne technical leadership and expertise
with Oracle technology to provide a complete business i ntelligence
solution, SAS does not mention the degree of difficulty of implementation.



41

4.3.5

Application
of SAS Data Warehousing in the HK Trade
Development Co
u
ncil

The Hong Kong Trade Development Council, which launched
Business
-
Stat On
-
line usi ng Data Warehousing and Web Enablement
technology from SAS.


Business
-
Stat On
-
li ne (BSO) is an i nteractive on
-
li ne
service allowing
companies to access monthly trade figures compiled by the Census and
Statistics Department. Information available i ncludes Hong Kong’s total
trade figures, overseas trade, and trade according to specific types of
product and service.


The

project involved the design and implementation of a Data Warehouse
containi ng fi ve years of export, domestic export and import data broken
down by a wide range of product and market areas. Other trade service
data was also imported into the system usi ng c
ustomized tools provided by
SAS. SAS also developed an extensi ve number of statistical reports. Over
6000 pre
-
summarized general tables were created for on
-
line access,
designed as a starting point to Hong Kong’s general trade performance. In
addition, use
rs of the service can view an unlimited number of dynamic
reports based on selection criteria such as region, industry and product
type. Registration and admi nistration tools provided by SAS allow
subscribers to register for the BSO service on
-
line free of

charge. They are
then automatically notified by e
-
mail of their logon ID and password,
allowing them full access to the service.


42

5.

How to implement Data Warehouse
successfully

Talki ng so many benefits about the application of data warehouse i n an
enterpris
e, but how we could implement the DW technology successfully
into our operational processes? Denis Kozar suggested the “seven deadly
sins” on the DW implementation.


5.1

“If you Built It, They Will Come”

The bli nd faith on the DW technology leads to the failur
e to recogni ze the
importance of defini ng a set of business objectives for the data warehouse
prior to its implementation. A clearly defined data warehouse plan is
important to the needs of the entire enterprise and a documented set of
requirements is nece
ssary to guide the design, construction, and rollout of
the project.


5.2

Omission of an Architectural Framework

One the most important factors in a successful data warehouse
implementation is the development and maintenance of a comprehensive
architectural fr
amework. The framework serves as the bluepri nt for
construction and use of the various DW components. Developers need to
consider, the number of end
-
users, volume and di versity of data, expected
data
-
refresh cycle, etc., in the DW architecture.



43

5.3

Understand
ing the Importance of Documenting
Assumptions

The assumptions and potential data conflicts associated wi th the DW must
be included in the architectural framework for the project. Several
questions need to be considered during the requirements phase of the
project that serve to reveal these important underlying assumptions about
the DW. How much data would be loaded into the warehouse? How often
the data need to be refreshed? On what platform the DW will be developed?
Answers to these questions are essential

to the success of DW
implementation.


5.4

Failure to Use the Right Tool for the Job

The design and construction of a DW is much different from that of an
operational application system. The DW tools can be categori zed i nto four
areas:



Analysis Tools


assist
in identification of data requirements



Development Tools


responsible for data cleansing, code generation,
data integration, and loading of the data into the data repository.



Implementation Tools


contain data acquisition tools to gather
process, clean,
replicate, and consolidate data.



Delivery Tools


assist in data conversion, derivation, and reporti ng for
the application platform.

Correct application of these tools could help to implement the DW
efficiently and effectively.



44

5.5

Life Cycle Abuse

The life c
ycle of DW development is a conti nuous, ongoi ng set of acti vities
that flow from initial investigation of DW requirements through data
administration and back again. The development of DW project should be
kept running conti nuously as if the DW is to remai
n a viable source of
decision
-
making support in the ever changing business environment.


5.6

Ignorance Concerning the Resolution of Data
Conflicts

Analysis must be conducted to determine the best data sources available
withi n an organi zation. Once these system
s have been identified, the
conflicts associated with disparate nami ng conventions, file formats and
sizes, and value ranges must be resolved. This process may involve
working with data owners to establish an understanding with regard to
future planned or
unplanned changes to the source data. Failure to allow
sufficient time and resources to resolve data conflicts can delay a
warehouse implementation and result i n an organizational deadlock that
can threaten the success of the project.


5.7

Failure to Learn fro
m Mistakes

The ongoi ng nature of the DW development cycle suggests that DW
project simply relates one another. Because of this, careful documentation
of the mistakes made in the previous projects wi ll directly impact the quality
assurance acti vities of all

future projects. By learning from the past, a

45

strong DW with lasting benefits can be built.


If developers can pay attention to the above areas, the implementation of
Data Warehouse will certainly bring great benefits to the business.



46

6.

Concerns

& Conclus
ion

Data warehouse can bri ng many benefits to enterprises, however, there
are concerns of using it.



Extracting, cleaning and loading data is time consuming.



Data warehousi ng project scope must be actively managed to deliver
a release of defined content an
d value.



Problems with compatibility with systems already in place.



Security could develop into a serious issue, especially if the data
warehouse is web accessible.



Data Storage design controversy warrants careful consideration and
perhaps prototyping o
f the data warehouse solution for each project's
environments


So, managers need to aware of the concerns when usi ng the data
warehouse, so that they can get the benefits of data warehousi ng without
any problems.


47

7.

References

George M. Marakas. (©1999) pp.
343
-
346
,
Decision support system i n the
twenty
-
first century: DSS and data mini ng technologies for tomorrow’s
manager

IBM, Background,

http://www
-
03.ibm.com/press/us/en/background.wss

IBM, D
B2 Data Warehouse Edition,

Features and benefits

,

http://www
-
306.ibm.com/software/data/db2/dwe/features.html?S_CMP=rnav

IBM, DB2 Data Warehouse Edition,

Overview

,

http://www
-
306.ibm.com/software/data/db2/dwe/

IBM,

Denmarks


TDC answers the call of Danish telephone consumers
with IBM Data Warehouse

,

http://www
-
306.ibm.com/soft ware/success/cssdb.nsf/CS/SPAT
-
6ATKAP?OpenDocu
ment&Site=dmbi&cty=en_us

Ken Orr (©1996, revised 2000)
,
Data Warehouse Technology
,

http://www.kenorrinst.com/dwpaper.html

M
anufacturi ng

Business Technology: Software Finder,

Oracle vs SAS

,
http://softwarefinder.mbtmag.com/search/for/Oracle
-
vs
-
SAS.html


Martyn R Jones (1999), “Brief defi ning characteristics of a Data
Warehouse”,
http://www.brint.com/wwwboard/messages/4599.html


Paul Westerman
,
Data Warehousing : using the Wal
-
Mart model

SAS
,
Data Integration
,

http://www.sas.com/technologies/dw/

Wikipedia,

Bill Inmon

,

http://en.wikipedia.org/wiki/Bill_Inmon

Wikipedia
,

SAS Institute

,
http://en.wikipedia.org/wiki/SAS_Institute

Wikipedia,

SAS System

,
http://en.wikipedia.org/wiki/SAS_System