Extreme Performance Data Warehousing

levelsordData Management

Nov 20, 2013 (3 years and 11 months ago)

110 views

<Insert Picture Here>

Extreme Performance Data Warehousing

Çetin

Özbütün

Vice President, Data
Warehousing Technologies

21%

20%

21%

19%

17%

5%

12%

18%

25%

34%

Less than 500 GB
500 GB - 1 TB
1 - 3 TB
3 - 10 TB
More than 10 TB
In 3 Years
Today
Source: TDWI Next Generation Data Warehouse Platforms Report, 2009

Challenge: Much More Data to Analyze

Data Warehouse Size and Growth


Challenge: No Single Source of Truth

Expensive Data Warehouse Architecture

ETL

OLAP

Data Mining

OLAP

Data Mining

ETL

Data

Marts

Data

Marts

DW

Strategy


Single source of truth


Extreme performance


Lower cost of ownership


Deeper Insight

DW

Strategy


Single source of truth


Extreme performance


Lower cost of ownership


Deeper Insight

Consolidate Onto a Single Platform

Faster Performance, Single Source of Truth

Oracle Database
11g

Oracle Exadata Database Machine

Data

Marts

Data Mining

Online

Analytics

ETL

Oracle Exadata Database Machine

For OLTP, Data Warehousing & Consolidated Workloads


Improve query performance by 10x


Better insight into customer requirements


Expand revenue opportunities


Consolidate OLTP and analytic workloads


Lower admin and maintenance costs


Reduce points of failure


Integrate analytics and data mining


Complex and predictive analytics


Lower risk


Streamline deployment


One support contact



Select sum(sales)

where salesdate=

‘22
-
Jan
-
2010’…

Sum

Return Sales for
Jan 22 2010

Exadata Smart Scan

Improve Query Performance by 10x or More

What Were
Yesterday’s
Sales?


Off
-
load data intensive processing to Exadata Storage Server


Exadata Storage Server only returns relevant rows and columns


Wide Infiniband connections eliminate network bottlenecks

Exadata Hybrid Columnar Compression

Reduce Disk Space Requirements

0
10
20
30
40
50
60
70
80
90
100
Data


Terabytes

3x

10x

15x

1.4x

2.5 x

Uncompressed

Data

Data Warehouse

Appliances

OLTP
Data

DW
Data

Archive
Data

Oracle

Built
-
in Analytics

Secure, Scalable Platform for Advanced Analytics


Complex and predictive analytics embedded into Oracle Database 11g


Reduce cost of additional hardware, management resources


Improve performance by eliminating data movement and duplication

Oracle Data Mining

Uncover and predict

Oracle OLAP

Analyze and summarize

Oracle Database 11
g

The Best Database for Data Warehousing


World record performance for fast access to information


Manage growing volumes of information cost
-
effectively


Reduce costs through server and data consolidation

Real Application Clusters

Advanced Compression

Partitioning

OLAP



Data Mining
















The Concept of Partitioning

Maintain Consistent Performance as Database Grows

SALES

SALES

Jan

Feb

SALES

Jan

Feb

Europe

USA

Large Table


Difficult to Manage

Partition


Divide and Conquer


Easier to Manage


Improve Performance

Composite Partition


Higher Performance


Match to business
needs

Partition for Performance

Partition Pruning

What
was the total
sales amount for May
20 and May 21 2010?

Select sum(sales_amount)

From SALES

Where sales_date between

to_date(‘05/20/2010’,’MM/DD/YYYY’)

And

to_date(‘05/22/2010’,’MM/DD/YYYY’);

5/20

5/21

5/22

5/19

Sales Table


Performs operations only on relevant partitions


Dramatically reduces amount of data retrieved from disk


Improves query performance and optimizes resource utilization

Partition to Manage Data Growth

Compress Data and Lower Storage Costs



Distribute partitions across multiple compression tiers


Free up storage space and execute queries faster


No changes to existing applications


Active Data

3x OLTP
Compression

Read Only Data

10
-
15x DW
Compression

Archive Data

15
-
50x Archive
Compression

In
-
Memory Parallel Execution

Efficient use of memory on clustered servers


Compress more data into available memory on cluster


Intelligent algorithm


Places table fragments in memory on different nodes


Reduces disk IO and speeds query execution


© 2010 Oracle Corporation

In
-
Memory Parallel Query in Database Tier

Automated Degree of Parallelism


Optimizer derives the best Degree of Parallelism


Based on resource requirements of all concurrent operations


Less DBA management, better resource utilization

Automatically

determine

DOP

Enough parallel servers available

Execute
immediately

Queue statements if not enough parallel servers available

When required number of servers are
available, execute first statement

8

64

32

16


Pre
-
summarized information stored within Oracle Database 11g


Separate database object, transparent to queries


Supports sophisticated transparent query rewrite


Fast incremental refresh of changed data

Summary Management

Improve Response Time with Materialized Views


Date

Products

Channel

SQL Query

Sales by

Date

Sales by

Product

Sales by

Region

Sales by

Channel

Region

Materialized Views

Relational Star
Schema

Query
Rewrite


Exposes Oracle OLAP cubes as relational materialized views


Provides
SQL access to data stored in an OLAP cubes


Any BI tool or SQL application can leverage OLAP cubes



Region

Date

Products

Channel

Cube Organized Materialized Views


SQL Query

Automatic
Refresh

Query Rewrite

Summaries

DW

Strategy


Single source of truth


Extreme performance


Lower cost of ownership


Deeper Insight



In
-
database Analytics

Bring Algorithms to the Data, Not Data to the Algorithms


Analytic computations
done in the database


Dimensional analysis


Statistical analysis


Data Mining


Scalability


Security


Backup & Recovery


Simplicity

OLAP

Data Mining

Statistics


Multidimensional analytic engine that analyzes summary data


Offers improved query performance and fast, incremental updates


Embedded in Oracle Database instance and storage



Oracle OLAP

Built
-
in Access to Analytic Calculations


How do sales in the Western region this
quarter compare with sales a year ago?


What will sales next quarter be?


What factors can we alter to improve the
sales forecast?


Collection of data mining algorithms that solve business problems


Simplifies development of predictive BI applications


Embedded in Oracle Database instance and storage



Oracle Data Mining

Find Hidden Patterns, Make Predictions

Retail

Financial Services


Customer Segmentation


Response

Modeling


Credit

Scoring


Possibility of default

Communications

Utilities


Customer churn


乥瑷潲欠i湴牵獩潮


偲潤畣琠扵湤bi湧


偲敤mc琠灯t敲eli湥n晡fl畲u

Healthcare

Public Sector


Patient

outcome prediction


Fraud

detection


Tax fraud


Crime

analysis


Enrich BI with map visualization of Oracle Spatial data


Enable location analysis in reporting, alerts and notifications


Use maps to guide data navigation, filtering and drill
-
down


Increase ROI from geospatial and non
-
spatial data



Oracle Spatial and OBIEE




Data Models

Exadata

Business Intelligence

Oracle Exadata Intelligent Warehouse

For Industries


Combine deep industry knowledge with data warehousing expertise


Help jump
-
start design and implementation of data warehouses


Available for Retail and Communications industries




Combine deep industry knowledge with data warehousing expertise


Help jump
-
start design and implementation of data warehouses


Optimized for Oracle Database 11g and Oracle Exadata



Reference Data Model

Aggregate Data Model

Relational (STAR) for BI

OLAP for Analytical

Derived Data Model

Data Mining/Complex
Reports/Query

Base Data Model (3NF)

Atomic Level of Transaction Data

Oracle Industry Data Models


Extreme Performance Data Warehousing

Integrated Technology Stack


Single source of truth


Extreme performance


Lower cost of ownership


Deeper Insight



Smart Storage

Database

Data Models

ELT Tools

BI Tools

BI Applications

Data Warehouse Reference Architecture

Base data warehouse schema

Atomic
-
level data, 3nf design

Supports general end
-
user queries

Data feeds to all dependent systems

Application
-
specific performance structures

Summary data / materialized views

Dimensional view of data

Supports specific end
-
users, tools, and applications



Oracle #1 for Data Warehousing

Source: IDC, July 2009


“Worldwide Data Warehouse Management Tools 2008 Vendor Shares”