Online Analytical Processing

goldbashedΤεχνίτη Νοημοσύνη και Ρομποτική

15 Νοε 2013 (πριν από 3 χρόνια και 7 μήνες)

370 εμφανίσεις


Online Analytical Processing


Peng Zhou


pzhou6@illinois.edu, Civil Engineering, Construction Management;


Xuan Lv


xuanlv2@illinois.edu, Civil Engineering, Construction Management.


Abstract
:

One of the most important applications of database in
the
industry and business world

is the decision supporting

system. Driven by the need to solve business problems in
terms of a multidimensional model and with less

time, on
-
line analytical processing
technology has been widely used to facilitate business decis
ions. In the

opening part
of the survey, , we are going review the history of OLAP, and describe the basic
operations of

OLAP. Then we will also cover the structure of OLAP system as well as
the dif
ferent types of OLAP systems.

While in the main part of th
e survey, we will
provide a summary of the latest development of OLAP technology

together with its
application in some software. In the end, we will reach a con
clusion on the prospect
of OLAP

technology in the industry and the business world.


1.

Introduction


When a company has accumulated
certain

amount of data, it often wan
ts a report to
get a summary of the data, to
visualize

the data and to make decision based on the
data
. However, u
sing traditional IT tools such as SQL

can bring some problems.

When
using

traditional IT tools

for business analysis,

large da
ta sets and tricky
queries
may be involve
d
,
then
it can take a long time
to design the report
with
traditional syst
ems and
it

does not faci
litate explorative views on the

data
.

Moreover,
if the company wants to make any change in the
reports
, further modification in
legacy applications is required.

In order to solve the above problems, OLAP is developed and applied in the business
world.
OLAP stands for online analytical
process
ing
, which
is a powerful technique
especially for solving business

problems in te
rms of a multidimensional model

(A
Model for Object Relational OLAP).

OLAP describes a series of
technologies

that are
designed for live ad hoc data access and analysis.
It pr
ovides multidimensional views
of business data
supported by multidimensional database.

For example, it can answer
the query like

How much revenue did the new product genera
te by month in the
Chicago
division?”
, this query requires two
dimensions: time

and location.

OLAP applications present
t
he end user

with information rather than just
data.
They
make it easy for users to identify patterns or trends in the data very quickly, without
the
need for them to search through mountains of "raw" data.



2.

Fundamentals

of OLAP


2.1

Logical architecture


2.2

Operations of OLAP


a.

ROLL
-
UP: Roll
-
up creates a subtotal at any level of aggregation needed from the
most detailed up to the grant total. This function is also called consolidation.

Example:

b.

DRILL
-
DOWN. Drill
-
down

breaks the subtotal at any level of granularity to
lower level of granularity to lower level of granularity in hierarchy, means it gives
details of relationship at the lower level.

Example:

c.

SLICING AND DICING


Selecting a subsection of data cube based on

the
constants in one or few dimensions. If one dimension is fixed, the operation is
called slice and if more than one dimension are fixed, the operation is called dice.

Example:

d.

PIVOT. Pivoting is swapping of columns and rows. This allow user to look at d
ata
from different view. This is also commonly known as rotation.


2.3

Data store
architectures


OLAP systems can be classified into the following categories:

MOLAP
,
which uses multidimensional arrays in memory to store data cubes.

This is

suitable for
applications requiring iterative and comprehensive time series analysis of

trends. However, it has a limitation because when
the navigation or the dimension

requirements change, the data structures may need
to be physically reorganized to

optimally support

the new requirements. It dynami
cally obtains the detailed data

maintained in an RDBMS when an applic
ation reaches the bottom of the

multidimensional cells during a drill
-
down analysis.

ROLAP
,

which integrates OLAP into a relational sy
stem, with the data s
tored in a

relational database. It maps operations on the multidimensio
nal data to standard
relational

operations (SQL). It does not need to structure the data around any desired
view. The

OLAP functions are achieved through metadata, bypassing

any require
ment
for creating a

static multidimensional data structure.

HOLAP is specialized OLAP, which combines
MOLAP and ROLAP. It stores some

summaries in memory and stores the base data and other summari
es in a relational

database.


3.

New developments of OLAP
technology


3.1

Object relation OLAP


I
n traditional OLAP, only attributes that are primitiv
e and additive can be chosen as

measures. However in OROLAP we need to cons
ider another way of aggregation

because an aggregation can generate a group of objects that
are not additive
s
.

U
nlike a relational database,
OROLAP
allo
ws the value of an attribute to

be an object
ID referencing another object or a set of object IDs referencing a set of

objects. In an
ORDB, we map a table and a tuple to a clas
s and an object resp
ectively as

follows:

Class: Each table in an object relational model is mapped to one class.

Object: Each tuple in a table is mapped to an object
whose class is derived from the

corresponding table.


3.1.1

OROLAP operations



OROLAP
allow the user to build,
manipulate

and produce a new cube. Once an
operation is applied to a cube the resulting cube is

called the “active” cube during a
user session.


a.

Building a cube
:
have to identify what to be

analyzed (facts) and what to be used
to accomplish the analysis (d
imensions).


b.

Object slice

In traditional OLAP, the Slice operation deletes one dimension of a
cube. This is not the

case in OROLAP because a dimension can be a primitive
attribute or an attribute whose

value is an object or a set of objects. An Object
Slic
e removes a dimension class or attributes of a dimension class from a

cube and
aggregates the cells.



c.

Object dice and select
:
When a dimension is not primitive, the traditional Dice and
Select operations cannot be

applied because the corresponding
attribute is an
object characterized by multiple

attributes. In OROLAP
, a
n Object select chooses
those objects whose specified attributes have a specified

value.

An Object dice
deletes those objects whose specified attributes have some specifie
d
value.


d.

Se
mantic dice and select
:
The Object Select and Dice operations discussed in the
previous section are horizontal or

vertical restrictions of a cube. OROLAP
supports “diagonal” or “distributive” restrictions

of a cube via user
-
defined
functions; the cells tha
t satisfy the conditions specified by the

user
-
defined
functions are chosen. In order to implement the idea, the condition part of

object
Select and Dice can be replaced by some user
-
defined functions that return true or

false

A Semantic Select and Dice is

an Object Select and Dice whose condition
part

contains one or more user
-
defined functions.



e.

Range and interval operation
:

T
emporal

stands for the manipulation of historical
informatio
n
including current and past data. A non
-
temporal database models the
current state of the

world, but a temporal database keeps the past states. The valid
time, which is the set of

time intervals during which the fact is true, and
transaction time, which is the time

interval during which the fact is current, are
important fa
ctors of temporal operations.

While a temporal operation usually
refers to a restriction on time using ranges and

intervals, it can be as well extended
to cover other attributes, such as age and revenue.

The restriction seems like
object Select and Dice; h
owever objects which satisfy a

restriction can be
aggregated into one cell and a point of the aggregated dimension can be

a range
value or an interval value.


f.

Dimension drill down and roll up
:
In traditional OLAP, drill down and roll up are
applied to a
dimension hierarchy. A

hierarchy is composed of a set of attributes of
a dimension which are primitive, and cells

of a cube can be aggregated according
to the dimension hierarchy.

In OROLAP, each node in an object hierarchy
contains a set of object IDs. Al
ong a

hierarchy, groups of object IDs are composed
or decomposed based on the levels in the

hierarchy, such as year, month or day.


g.

Analyzing a cube
:

analysts not only need to examine the aggregated and
summarized data, but

also need to derive new informat
ion via user
-
defined
functions.


3.1.2

OROLAP and ROLAP


ROLAP can be considered as a special case o
f OROLAP. Like ROLAP, OROLAP is
integrated into a relational system, where data are stor
ed in a relational database and

operated based on a multidimensional
model. The b
asic difference between the two

models lies in the structure of a cell and a dimension.
A cell in OROLAP can consist
of

an object or a set of objects. However ROLAP does not al
low a cell to be an object
or a

set of objects; it only allows a cel
l to be a simple and additive attribute.




4.

The limitation of OLAP


When first
investigating OLAP, it is easy to quest
ion the need for it. If an end
user
requires high
-
level information about their company, then that information
can always
be derived from the underlying
transa
ctional data, hence we can

achieve every
requirement with an OLTP
applic
ation. Were this true, OLAP would not have become
the
important topic

that it is today. OLAP exists
and

continues to expand in usage
bec
ause
there are limitations with the
OLTP approach. The limits of OLTP
applications are seen in three areas.


a.

Increasing data storage

The trend towards companies storing more & more

data about their
business
shows no sign of stopping. Retrievi
ng many thousands of records for
immediate

analysis is a time and resource consuming process, particularly when

many
users are using an application at the same time. Database engines that can quickly
retrieve a few tho
usand records for half
-
a
-
dozen users s
truggle when forced to
return
the results of large queries to a thousand concurrent users.

Caching frequently requested data in temporary ta
bles & data stores can relieve some
of the
symptoms,
but only goes part of t
he way to solving the problem,
particu
larly if
each user requires a slightly different set of data.

In a modern data warehouse where the required data might be spread
across

multiple tables, the complexity of the query
ma
y also cause time
delays
and
require
more system
resources which mea
ns

more money must be spent on
database servers
in order to keep up with user demands.


b.

Data versus Information

Business users need
both data and inform
ation. Users who make business decisions
based on events that are happening need the information contained within their
compan
y's data. A stock controller
in
a superstore might want the
full list of all
goods sold in order to check up on

stock levels, but

the manager might only want
to
know the amount of fruit & frozen goods being sold. Even more useful would be the
trend of frozen good sales over the last three months.

In order to answer the question "How many frozen

goods did we sell today?"
,

an
OLTP a
pplication must retrieve all of the frozen g
ood sales for the day and then count
them, presenting only the
s
ummarized information to the end
-
user. To

make a
comparison over three months, this proc
edure must be repeated for
multiple days. Multip
ly the problem by s
everal hundred stores, so that the
managing
director can see how the whole company is performing and it is easy
to see that the
problem requires considerable amounts of processing power
to

provide answers
within the few seconds that a bu
siness user would be prepared to wait.

Database engines were not primarily designed to
retrieve groups of records and
then
sum them together mathematically and the
y tend not to perform well when

asked to
do so. An OLTP application would always be able to
provide
the

answers, but not in
the typical few
-
seconds response times demanded by users.

Caching results doesn't help here either, because i
n order to be effective, every
possible aggregation must be cached, or the benefit won't always be
realized.

Cach
ing on this scale would require enormous sets of
t
emporary tables
and

enormous
amounts of disk space to store them.


c.

Data layout

The relational database model was designed for t
ransactional processing and is
not
alw
ays the best way to store data when atte
mpting to answer
business
questions such
as "Sales of computers by region" or "Volume of credit
-
card transactions by month".
These types
of queries r
equire vast amounts of data to be retrieved
and

aggregated
on
-
demand,
somethi
ng that will require time
and
system resources to achieve.

More significantly, related queries such as "Product sales broken down by
region" and
"Regions broken down by product sale
s" require separate queries to
be performed on
the same data set.

The answer to the limitations of OLTP

is not

to spend more & more
money on bigger &
f
aster databases, but to use a different approach altogether to the
probl
em and that approach is OLAP.

OLAP applications
sto
re data in a different way
from

the traditional relational
model,
allowing them to
work with data sets designed to serve greater numb
ers of users in
parallel. Unlike databases, OLAP data stores are designed to work with aggregated
data, allowing them to
quickly
answer high
-
level questions about
a company's data
whilst still allowin
g user
s to access the original
transactional data when required.


Reference

[1] Thomas, H., and Datta, A. (2001). "A Conceptual Model and Algebra for On
-
Line
Analytical Processing in

Decision Support Databases." Information Systems Research, 12(1), 83.

[2] Gonza
les, M. L. (2004). "The SQL of OLAP." Intelligent Enterprise, 7(14), 18
-
47.

[3] Witkowski, A., Bellamkonda, S., Bozkaya, T., Folkert, N., Gupta, A., Haydu, J.,
Sheng, L., and

Subramanian, S. (2005). "Advanced SQL Modeling in RDBMS."

ACM Transactions
on Database Systems,

30(1), 83
-
121.

[4] Kimball, R. (2007). "Dimensional Relational vs. OLAP: The Final Deployment
Conundrum." Intelligent

Enterprise, 10(4), 1
-
1.

[5] Akcan, H., Astashyn, A., and Brönnimann, H. (2008). "Deterministic algo
rithms
for sampling count data."

Data & Knowledge Engineering, 64(2), 405
-
418.

[6] Butuza, A., and Hauer, I. (2010). "AN OVERVIEW OF RELATIONAL
DATABASES AND OLAP

TECHNOLOGY." O VEDERE GENERALA ASUPRA BAZELOR DE DATE
RELATIONALE SI TEHNOLOGIE

OLAP., 12(2),

1
-
8.

[7] Ikeda, S., Sheu, P. C. Y., and Tsai, J. J. P. (2010). "A MODEL FOR OBJECT
RELATIONAL OLAP."

International Journal on Artificial Intelligence Tools, 19(5), 551
-
595.

[8] McCarthy, M., and He, Z. (2011). "Efficient Updates for OLAP Range Queries on
Flash Memory."

Computer Journal, 54(11), 1773
-
1789.

[9] Pardillo, J., and Mazón, J.
-
N. (2012). "Model
-
driven development of OLAP
metadata for relational data

warehouses." Computer Standards & Interfaces, 34(1), 189
-
202.