WebFOCUS Hyperstage Overview

Peter Azzarello

April 11, 2012

IB Toronto User Forum

Summit 2012

WebFOCUS

Higher Adoption & Reuse with Lower TCO

Reporting
Query & Analysis
Dashboards
Information Delivery
Performance Management
Enterprise Search
Visualization & Mapping
Data Updating
Predictive Analytics
MS Office & e-Publishing

Extensions to the WebFOCUS platform allow you to build more application types at a lower cost

Business to Business
Data Warehouse & ETL
Master Data Management
Data Profiling & Data Quality
Business Activity Monitoring
High Performance Data Store
Mobile Applications

Copyright 2007, Information Builders. Slide 3

The Business Challenge

Big Data

Big Data and Machine Generated Data

[Figure: data storage over time, machine-generated data versus human-generated data]

Today’s Top Data-Management Challenge

Source: Keeping Up with Ever-Expanding Enterprise Data (Joseph McKendrick, Unisphere Research, October 2010)

How Performance Issues are Typically Addressed, by Pace of Data Growth

[Bar chart, 0% to 100%; legend: High Growth, Low Growth. Data labels appear below in extraction order.]

Response                              First series   Second series
Don't Know / Unsure                   7%             4%
Upgrade networking infrastructure     21%            32%
Archive older data on other systems   30%            44%
Upgrade/expand storage systems        33%            60%
Upgrade server hardware/processors    54%            70%
Tune or upgrade existing databases    66%            75%
When organizations have long-running queries that limit the business, the response is often to spend much more time and money to resolve the problem.

IT managers try to mitigate these response times …


Traditional Data Warehousing

Labor intensive: heavy indexing, aggregations and partitioning
Hardware intensive: massive storage; big servers
Expensive and complex

More Data, More Data Sources
More Kinds of Output Needed by More Users, More Quickly
Limited Resources and Budget

[Graphic: streams of binary digits labeled Real time data, Multiple databases, External Sources]

Data Warehousing Challenges

New Demands:
  Larger transaction volumes driven by the internet
  Impact of Cloud Computing
  More -> Faster -> Cheaper

Data Warehousing Matures:
  Near real time updates
  Integration with master data management
  Data mining using discrete business transactions
  Provision of data for business critical applications

Early Data Warehouse Characteristics:
  Integration of internal systems
  Monthly and weekly loads
  Heavy use of aggregates

Classic Approaches to deal with Large Data

INDEXES

CUBES/OLAP

Limitations of Indexes

Increased Space requirements
Sum of Index Space requirements can exceed the source DB
Index Management
Increases Load times
Building the index
Predefines a fixed access path

Limitations of OLAP

Cube technology has limited scalability
Number of dimensions is limited
Amount of data is limited
Cube technology is difficult to update (add Dimension)
Usually requires a complete rebuild
Cube builds are typically slow
New design results in a new cube

Easy Migration to Hyperstage

Most cubes will be fed from a relational source
Commonly, that relational source is a star schema
The source star schema can be migrated directly to Hyperstage
WebFOCUS metadata can be used to define hierarchies and drill paths to navigate the star schema



Pivoting Your Perspective: Columnar Technology …

1. Impediments to business agility: Organizations often must wait for DBAs to create indexes or other tuning structures, thereby delaying access to data. In addition, indexes significantly slow data-loading operations and increase the size of the database, sometimes by a factor of 2x.

2. Loss of data and time fidelity: IT generally performs ETL operations in batch mode during non-business hours. Such transformations delay access to data and often result in mismatches between operational and analytic databases.

3. Limited ad hoc capability: Response times for ad hoc queries increase as the volume of data grows. Unanticipated queries (where DBAs have not tuned the database in advance) can result in unacceptable response times, and may even fail to complete.

4. Unnecessary expenditures: Attempts to improve performance using hardware acceleration and database tuning schemes raise the capital costs of equipment and the operational costs of database administration. Further, the added complexity of managing a large database diverts operational budgets away from more urgent IT projects.

These Solutions Contribute to Operational Limitations

The Limitation of Rows

Row-based databases are ubiquitous because so many of our most important business systems are transactional.

Row-oriented databases are well suited for transactional environments, such as a call center where a customer’s entire record is required when their profile is retrieved and/or when fields are frequently updated.

The Ubiquity of Rows …

But: disk I/O becomes a substantial limiting factor, since a row-oriented design forces the database to retrieve all column data for any query.

[Figure: a table of 30 columns by 50 million rows]
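As a rough, back-of-the-envelope illustration of the I/O point above, the sketch below compares the bytes scanned by a row-oriented and a column-oriented layout for the 30-column, 50-million-row table. The 8-byte average value size and the three-column query are assumptions chosen purely for illustration; they are not figures from the presentation, and compression is ignored.

```python
# Back-of-the-envelope scan cost for the 30-column, 50-million-row table.
ROWS = 50_000_000
COLUMNS = 30
BYTES_PER_VALUE = 8      # assumption: average uncompressed value size
COLUMNS_IN_QUERY = 3     # assumption: the query touches only 3 columns


def bytes_scanned(rows: int, columns_read: int, bytes_per_value: int) -> int:
    """Bytes read when `columns_read` columns are fetched for every row."""
    return rows * columns_read * bytes_per_value


# A row store reads whole rows, so every column is fetched.
row_store = bytes_scanned(ROWS, COLUMNS, BYTES_PER_VALUE)
# A column store reads only the columns the query references.
column_store = bytes_scanned(ROWS, COLUMNS_IN_QUERY, BYTES_PER_VALUE)

print(f"row store:    {row_store / 1e9:.1f} GB")      # 12.0 GB
print(f"column store: {column_store / 1e9:.1f} GB")   # 1.2 GB
print(f"reduction:    {row_store // column_store}x")  # 10x
```

Under these assumptions the saving is simply the ratio of total columns to columns queried; real workloads will differ.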

The Limitation of Rows

Row Oriented

(1, Smith, New York, 50000; 2, Jones, New York, 65000; 3, Fraser, Boston, 40000; 4, Fraser, Boston, 70000)

Works well if all the columns are needed for every query.
Efficient for transactional processing if all the data for the row is available

Column Oriented

(1, 2, 3, 4; Smith, Jones, Fraser, Fraser; New York, New York, Boston, Boston; 50000, 65000, 40000, 70000)

Works well with aggregate results (sum, count, avg.)
Only columns that are relevant need to be touched
Consistent performance with any database design
Allows for very efficient compression
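The two layouts above can be sketched in a few lines of Python using the same four employee records. This is only an in-memory illustration of the ordering of values; the variable names are hypothetical and nothing here reflects how Hyperstage stores data on disk.

```python
# The four employee records from the slide, as (id, name, location, sales).
records = [
    (1, "Smith", "New York", 50000),
    (2, "Jones", "New York", 65000),
    (3, "Fraser", "Boston", 40000),
    (4, "Fraser", "Boston", 70000),
]

# Row-oriented layout: one complete record after another.
row_store = [value for record in records for value in record]

# Column-oriented layout: all ids, then all names, all locations, all sales.
ids, names, locations, sales = (list(column) for column in zip(*records))
column_store = [ids, names, locations, sales]

# Aggregating one column touches only the sales values in the column layout...
total_from_columns = sum(sales)

# ...while the row layout must step through every record to reach field 4.
FIELDS_PER_RECORD = 4
total_from_rows = sum(
    row_store[i + 3] for i in range(0, len(row_store), FIELDS_PER_RECORD)
)

print(total_from_columns, total_from_rows)  # 225000 225000
```

Both layouts give the same answer; the difference is how much unrelated data has to be stepped over to get it.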


Pivoting Your Perspective: Columnar Technology

Employee Id   Name     Location   Sales
1             Smith    New York   50,000
2             Jones    New York   65,000
3             Fraser   Boston     40,000
4             Fraser   Boston     70,000

Data stored in rows

1 | Smith | New York | 50,000
2 | Jones | New York | 65,000
3 | Fraser | Boston | 40,000
4 | Fraser | Boston | 70,000

Data stored in columns

1 | 2 | 3 | 4
Smith | Jones | Fraser | Fraser
New York | New York | Boston | Boston
50,000 | 65,000 | 40,000 | 70,000


Introducing WebFOCUS Hyperstage

The Hyperstage Mission

Improve database performance for WebFOCUS applications with less hardware, no database tuning and easy migration.

The WebFOCUS Hyperstage high performance analytic data store is designed to handle business-driven queries on large volumes of data without IT intervention. Easy to implement and manage, Hyperstage provides the answers your business users need at a price you can afford.

Introducing WebFOCUS Hyperstage … What is it?

Hyperstage combines a columnar database with intelligence we call the Knowledge Grid to deliver fast query responses.

Introducing WebFOCUS Hyperstage … How is it architected?

[Diagram: Hyperstage Engine, Knowledge Grid, Compressor, Bulk Loader]

Unmatched Administrative Simplicity
  No indexes
  No data partitioning
  No manual tuning

Self-managing: 90% less administrative effort
Low-cost: More than 50% less than alternative solutions
Scalable, high-performance: Up to 50 TB using a single industry standard server
Fast queries: Ad-hoc queries are as fast as anticipated queries, so users have total flexibility
Compression: Data compression of 10:1 to 40:1 means a lot less storage is needed; it might even mean you can get the entire database in memory!

Introducing WebFOCUS Hyperstage … What does this mean for Customers?

Creates information (metadata) about the data and, upon load, automatically …
o Stores it in the Knowledge Grid (KG)
o KG is loaded into memory
o Less than 1% of compressed data size

Uses the metadata when processing a query to eliminate or reduce the need to access data
o The less data that needs to be accessed, the faster the response
o Sub-second responses when answered by KG

Architecture Benefits
o No need to partition data, create/maintain indexes or projections, or tune for performance
o Ad hoc queries are as fast as static queries, so users have total flexibility
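The idea of answering a query from metadata instead of data can be sketched as follows. The presentation does not describe the Knowledge Grid's internal structure, so this is a minimal sketch under a stated assumption: that a column is split into fixed-size blocks and that a minimum and maximum are recorded for each block at load time. The names `Pack`, `build_packs` and `count_greater_than` are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Pack:
    """A block of values from one column plus the statistics kept about it."""
    values: List[int]
    minimum: int
    maximum: int


def build_packs(column: List[int], pack_size: int) -> List[Pack]:
    """Split a column into packs and record min/max metadata at load time."""
    packs = []
    for start in range(0, len(column), pack_size):
        chunk = column[start:start + pack_size]
        packs.append(Pack(chunk, min(chunk), max(chunk)))
    return packs


def count_greater_than(packs: List[Pack], threshold: int) -> Tuple[int, int]:
    """Count values > threshold, opening a pack only when metadata cannot decide."""
    count = 0
    packs_opened = 0
    for pack in packs:
        if pack.maximum <= threshold:
            continue                      # no value can match: skip the pack
        if pack.minimum > threshold:
            # Every value matches. A real system would keep the row count
            # in metadata too, so the data itself would not be read.
            count += len(pack.values)
            continue
        packs_opened += 1                 # mixed pack: the data must be read
        count += sum(1 for v in pack.values if v > threshold)
    return count, packs_opened


sales = [10, 12, 15, 18, 40, 45, 52, 58, 90, 95, 97, 99]
packs = build_packs(sales, pack_size=4)
result, opened = count_greater_than(packs, threshold=50)
print(result, opened)  # 6 1
```

In this toy run only one of the three packs has to be read; the other two are resolved from their min/max statistics alone.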

Introducing WebFOCUS Hyperstage … How does it work?

WebFOCUS Hyperstage Runtime Architecture

[Diagram components: WebFOCUS Server, WebFOCUS Pro Server, Hyperstage Adapter, Hypercopy, Hyperstage Server, MySQL, Hyperstage Engine, Knowledge Grid, Compressor, Bulk Loader]

How does it work?

Smarter Architecture
  No maintenance
  No query planning
  No partition schemes
  No DBA

WebFOCUS Hyperstage Engine

Column Orientation

Data Packs: data stored in manageably sized, highly compressed data packs

Knowledge Grid: statistics and metadata “describing” the super-compressed data

Data compressed using algorithms tailored to data type
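To make "algorithms tailored to data type" concrete, here is a minimal sketch of two encodings commonly used in columnar stores: run-length encoding for repetitive text values and delta encoding for integers. The presentation does not name the algorithms Hyperstage uses, so both are stand-ins chosen for illustration, and the function names are hypothetical.

```python
from typing import List, Tuple


def run_length_encode(values: List[str]) -> List[Tuple[str, int]]:
    """Collapse runs of repeated values, e.g. a low-cardinality text column."""
    runs: List[Tuple[str, int]] = []
    for value in values:
        if runs and runs[-1][0] == value:
            runs[-1] = (value, runs[-1][1] + 1)
        else:
            runs.append((value, 1))
    return runs


def delta_encode(values: List[int]) -> List[int]:
    """Store the first value, then differences; small deltas need fewer bits."""
    if not values:
        return []
    return [values[0]] + [b - a for a, b in zip(values, values[1:])]


def delta_decode(deltas: List[int]) -> List[int]:
    """Invert delta_encode by accumulating the differences."""
    values: List[int] = []
    total = 0
    for delta in deltas:
        total += delta
        values.append(total)
    return values


locations = ["New York", "New York", "Boston", "Boston"]
employee_ids = [1, 2, 3, 4]

print(run_length_encode(locations))  # [('New York', 2), ('Boston', 2)]
print(delta_encode(employee_ids))    # [1, 1, 1, 1]
```

Because a column holds values of a single type, an encoding can be picked per column, which is harder to do when values of different types are interleaved row by row.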


Summary


Business Intelligence

Meeting Requirements

WebFOCUS Hyperstage

The Big Deal…
  No indexes
  No partitions
  No views
  No materialized aggregates

Value proposition
  Low IT overhead
  Allows for autonomy from IT
  Ease of implementation
  Fast time to market
  Less Hardware
  Lower TCO

No DBA Required!

What’s it look like?

What’s it look like?

Pay no attention to that man behind the curtain.


CREATE FILE baseapp/pa_inventory_ind_t DROP
-RUN

BULKLOAD baseapp/pa_inventory_ind_t
FOR SQLINLD INV_CODE; TYPE; CATEGORY; NAME; MODEL;
MEASURE1_INV; MEASURE2_INV; MEASURE3_INV;

JOIN
 SYMBOLS.SYMBOLS.SYMBOL IN SYMBOLS TO MULTIPLE QUOTES_2B.QUOTES_2B.SYMBOL
 IN QUOTES_2B TAG J0 AS J0
END
TABLE FILE SYMBOLS
PRINT
 SYMBOL CLOSE_DATE CLOSE_PRICE VOLUME OPEN_PRICE
WHERE ( SYMBOL EQ '&SYMBOL.(<MSFT,MSFT>).SYMBOL.' ) AND ( CLOSE_DATE GT '&START_DATE.(<2000-03-01,2000-03-01>).yyyy-mm-dd.' ) AND ( CLOSE_DATE LT '&END_DATE.(<2000-03-31,2000-03-31>).yyyy-mm-dd.' );
ON TABLE SET PAGE-NUM NOLEAD
ON TABLE NOTOTAL
ON TABLE PCHOLD FORMAT HTML
ON TABLE SET HTMLCSS ON
ON TABLE SET STYLE *
 INCLUDE = endeflt,
$
ENDSTYLE
END

Example

Focus to Hyperstage Compression

243639 Rows


Q&A


STAR SCHEMA CONSIDERATIONS

Leverage the Knowledge Grid

Do constrain the fact table directly
Do use sub-selects instead of joins
Do use date based constraints as much as possible
Do add additional columns to create useful knowledge nodes

Everyone wants to be a Star

Adding as many WHERE conditions as you can to your SQL increases the chance that knowledge grid statistics can be used to increase the performance of your queries.
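The two query shapes behind the "sub-selects instead of joins" and "constrain the fact table directly" advice can be compared side by side. The sketch below uses Python's built-in sqlite3 module purely as a stand-in SQL engine, with hypothetical table and column names (`dim_store`, `fact_sales`). It shows only that both forms return the same result; it says nothing about how either performs in Hyperstage.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dim_store (store_id INTEGER PRIMARY KEY, region TEXT);
    CREATE TABLE fact_sales (store_id INTEGER, sale_date TEXT, amount INTEGER);
    INSERT INTO dim_store VALUES (1, 'East'), (2, 'West');
    INSERT INTO fact_sales VALUES
        (1, '2012-03-05', 100),
        (1, '2012-04-02', 250),
        (2, '2012-03-10', 400),
        (1, '2012-03-20', 50);
""")

# Join form: the region filter reaches the fact table only through the join.
join_sql = """
    SELECT SUM(f.amount)
    FROM fact_sales f
    JOIN dim_store d ON d.store_id = f.store_id
    WHERE d.region = 'East'
      AND f.sale_date BETWEEN '2012-03-01' AND '2012-03-31'
"""

# Sub-select form: both conditions are written directly against the fact table.
subselect_sql = """
    SELECT SUM(amount)
    FROM fact_sales
    WHERE store_id IN (SELECT store_id FROM dim_store WHERE region = 'East')
      AND sale_date BETWEEN '2012-03-01' AND '2012-03-31'
"""

join_total = conn.execute(join_sql).fetchone()[0]
subselect_total = conn.execute(subselect_sql).fetchone()[0]
print(join_total, subselect_total)  # 150 150
conn.close()
```

In the sub-select form every WHERE condition, including the date range, applies to a fact-table column, which is the shape the considerations above recommend.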