A First Look at Materialized Query Table (MQT) in DB2 LUW

basesprocketΔιαχείριση Δεδομένων

31 Οκτ 2013 (πριν από 3 χρόνια και 8 μήνες)

100 εμφανίσεις

®


IBM Software Group

© 2009 IBM Corporation

A First Look at Materialized Query Table (MQT) in
DB2 LUW

Petrus Chan, IBM Toronto Lab, petrus@ca.ibm.com

IBM Software Group | DB2 information management software

2

Agenda


Motivation


Sample scenario


How MQT can help


Different types of MQTs & Maintenance


MQT Matching


what queries can benefit from MQTs


Design Considerations


Other Uses of MQT




IBM Software Group | DB2 information management software

3

Database Schema for Our Sample Scenario

Product_id

Store_id

Date_id


Amount

Quantity



SALES

Store_id


Region_id



STORE

Date_id


Month_id

Quarter_id

Year_id

TIME

Product_id


Class_id

Group_id

Family_id

Line_id

Division_id



PRODUCT

Star Schema

IBM Software Group | DB2 information management software

4

Dimensions Hierarchy

Store

Month

Product

Class

Group

Family

Line

Division

Quarter

Year

Region

Sales Fact

Product Dimension

Time Dimension

Store Dimension

Level 5

Level 1

Level 2

Level 3

Level 4

Level 0

Day

IBM Software Group | DB2 information management software

5

Sample Query

Q9: Total sales of each product group at each store for December 2008.

SELECT t.month_id,


f.store_id,


p.group_id,




sum(amount) as total_sales


FROM sales f, product p, time t


WHERE f.date_id = t.date_id AND


f.product_id = p.product_id AND


t.month_id = 200812


GROUP BY t.month_id, f.store_id, p.group_id

IBM Software Group | DB2 information management software

6

Benefits of MQTs

Fact

GB

JOIN

Dim2

Dim1

JOIN

Q9

Month=200812

Fact

GB

JOIN

Dim2

Dim1

JOIN

Joe’s Query

Group=Laser Printer

Fact

GB

JOIN

Dim2

Dim1

JOIN

Bob’s Query

Store=McLean

Fact

GB

JOIN

Dim2

Dim1

JOIN

MQT

Q9

Bob’s Q

Joe’s Q

Compute
Once &
Reuse

Month=200812

Store=McLean

Group=Laser Printer

IBM Software Group | DB2 information management software

7

What is an MQT?


MQT stands for Materialized Query Table.



Basic Idea
:


Query results caching
-

pre
-
compute and store (in database
tables) the results of frequently issued queries


Query optimizer automatically and transparently replace portions
of future queries with direct references to the pre
-
computed (i.e.,
materialized) results. This is referred to as MQT routing or
matching.



Standard technique in data warehousing, also known as aggregate
tables and aggregate (re
-
)routing, or alternatively, as materialized
views.


IBM Software Group | DB2 information management software

8

Usages of MQTs


At a High Level


Aggregate data over one or more dimensions


Joins and aggregate data over a group of tables


Data from a commonly accessed subset of data, that is, from a "hot"
horizontal or vertical database partition


Repartitioned data from a table, or part of a table, in a partitioned
database environment

IBM Software Group | DB2 information management software

9

How query can be routed to MQT?

MQT_SalesByMthStrGrp

SELECT t.month_id,


f.store_id,


p.group_id,


sum(amount) as total_sales


FROM sales f, product p, time t


WHERE f.date_id = t.date_id


AND f.product_id = p.product_id


GROUP BY t.month_id, f.store_id,


p.group_id

Q9: Original Query

SELECT t.month_id,


f.store_id,


p.group_id,



sum(amount) as total_sales


FROM sales f, product p, time t


WHERE f.date_id = t.date_id AND


f.product_id = p.product_id


AND
t.month_id = 200812


GROUP BY t.month_id, f.store_id,
p.group_id

Q9: Rewritten Query

SELECT m.month_id,


m.store_id,


m.group_id,


m.total_sales


FROM
MQT_SalesByMthStrGrp m


WHERE m.month_id
= 200812



Route Query
to MQT

IBM Software Group | DB2 information management software

10

MQT Creation


MQTs are created using the
CREATE TABLE statement



Immediately after creation, the
MQT is
empty

and not
accessible. It is populated using
the REFRESH TABLE statement



After running the REFRESH, the
MQT becomes accessible

CREATE TABLE MQT_SalesByMthStrGrp AS

(SELECT t.month_id,


f.store_id,


p.group_id,


sum(amount) as total_sales


FROM sales f, product p, time t


WHERE f.date_id = t.date_id AND


f.product_id = p.product_id

GROUP BY t.month_id, f.store_id, p.group_id)

DATA INITIALLY DEFERRED

REFRESH DEFERRED;


REFRESH TABLE
MQT_SalesByMthStrGrp;


RUNSTATS
ON TABLE
petrus.MQT_SalesByMthStrGrp

WITH DISTRIBUTION AND DETAILED INDEXES ALL;


--

Confirm that MQT is being used by

--

(a) setting
CURRENT REFRESH AGE to ANY

--

(b) explaining the query used as

--

the MQT’s definition

IBM Software Group | DB2 information management software

11

Agenda


Motivation


Sample scenario


How MQT can help


Types of MQTs & Maintenance


MQT Matching


what queries can benefit from MQTs


Design Considerations


Other Uses of MQT




IBM Software Group | DB2 information management software

12

Types of MQTs

MQT

System
Maintained

User
Maintained

Refresh
IMMEDIATE

Refresh
DEFERRED

WITH staging table

WITHOUT
staging table

IBM Software Group | DB2 information management software

13

MQT Maintenance


Process of synchronizing the MQT data with changes to the underlying
tables


Incremental Maintenance



Immediate Propagate Immediate Applied (IPIA)


Full Maintenance


Hybrid


Immediate Propagate Deferred Applied (IPDA)

IBM Software Group | DB2 information management software

14

Incremental Maintenance

MQT

base

tables

base

tables

base

tables

Base

Tables

Delta select/join

Delta

Aggregate

I
mmediate
P
ropagate

I
mmediate
A
pply

Pair Delta to MQT

Combine old and

new values

Primary
-
key:
grouping
columns


the ability to maintain the MQT without the
need to access the modified base table,
only using the
delta

information (newly
inserted/updated/deleted data) and the
MQT itself.


REFRESH IMMEDIDATE MQTs are
maintained
incrementally

during:


INSERT/UPDATE/DELETE of base
table processing


REFRESH TABLE processing after
LOAD INSERT into the base table.


Immediate Propagate, Immediate Apply


For performance reasons, only
incrementally maintainable MQTs can be
defined as REFRESH IMMEDIATE
(hence more restrictions in MQT
definition)

IBM Software Group | DB2 information management software

15

Full Maintenance


MQTs are fully rebuilt by re
-
evaluating the
query in the MQT definition


REFRESH DEFERRED MQTs (without
staging table):


not maintained during
INSERT/UPDATE/DELETE operations on
the base table.


synchronized with base table using the
REFRESH TABLE statement

MQT

base

tables

base

tables

base

tables

Base

Tables

select/join

Aggregate

IBM Software Group | DB2 information management software

16

Hybrid
-

Immediate Propagate, Deferred Applied

AST

base

tables

base

tables

base

tables

Base

Tables

Delta select/join

Delta

Aggregate

I
mmediate
P
ropagate

D
eferred

A
pply

Pair Delta to AST

Combine old and

new values

Primary
-
key:
grouping
columns

Staging

Staging

Tables

Net
-
effect


REFRESH DEFERRED MQTs
with staging table


Delta changes to MQT are
computed upon
INSERT/UPDATE/DELETE of
base tables


Delta aggregate is recorded in a
separate
staging table


Subsequent REFRESH of MQT
can be done incrementally


Immediate Propagate, Deferred
Apply


One staging table per MQT

CREATE TABLE SALES_BY_YEAR_STG

FOR SALES_BY_YEAR

PROPAGATE IMMEDIATE;

IBM Software Group | DB2 information management software

17

User Maintained MQT


User is responsible for maintaining MQT content


REFRESH TABLE disallowed


Insert/update/delete on MQT allowed


Must be REFRESH DEFERRED


Scenarios where user maintained MQTs are


For efficiency reasons, when the user is convinced that (s)he can implement
MQT maintenance far more efficiently than the mechanisms used by DB2.


For leveraging existing “user maintained” summaries, where the user wants
DB2 to automatically consider them for optimization for new ad hoc queries
being executed against the base tables.

IBM Software Group | DB2 information management software

18

Refresh Considerations

Refresh Immediate

Refresh Deferred
(WITH staging
table)

Refresh Deferred
(NO staging table)

Latency of Data

Always in sync

Up to last
REFRESH

Up to last
REFRESH

Transaction
Overhread

upon base table
modification

Yes

Yes

No

Incremental
Refresh

No need, except
after LOAD
INSERT

Yes

No

Lock Contention
during base table
modifications

Potential

Less

No

IBM Software Group | DB2 information management software

19

Agenda


Motivation


Sample scenario


How MQT can help


Types of MQTs & Maintenance


MQT Matching


what queries can benefit from MQTs


Design Considerations


Other Uses of MQT




IBM Software Group | DB2 information management software

20

MQT Matching


Process to automatically use
the MQT to derive results for a
query for improved
performance


MQT_SalesByMthStrGrp

SELECT t.month_id,


f.store_id,


p.group_id,


sum(amount) as total_sales


FROM sales f, product p, time t


WHERE f.date_id = t.date_id


AND f.product_id = p.product_id


GROUP BY t.month_id, s.store_id,


p.group_id

Q9: Original Query

SELECT t.month_id,


f.store_id,


p.group_id,



sum(amount) as total_sales


FROM sales f, product p, time t


WHERE f.date_id = t.date_id AND


f.product_id = p.product_id


AND
t.month_id = 200812


GROUP BY t.month_id, f.store_id,
p.group_id

Q9: Rewritten Query

SELECT m.month_id,


m.store_id,


m.group_id,


m.total_sales


FROM
MQT_SalesByMthStrGrp m


WHERE m.month_id
= 200812



Route Query
to MQT

IBM Software Group | DB2 information management software

21

What queries can benefit from an MQT?


Queries


that matches an MQT perfectly!


with predicates that are a subset of those in MQT


with more table joins than MQT (
re
-
join
)


with fewer table joins than MQT (need RI constraint)


with columns not in MQT definition (need functional
dependencies)


at a higher
aggregation level

than an MQT


IBM Software Group | DB2 information management software

22

Query with predicates that are a subset of those in
MQT

Fact

GB

JOIN

Dim2

Dim1

JOIN

Q9

Month=200812

Fact

GB

JOIN

Dim2

Dim1

JOIN

Joe’s Query

Group=Laser Printer

Fact

GB

JOIN

Dim2

Dim1

JOIN

Bob’s Query

Store=McLean

Fact

GB

JOIN

Dim2

Dim1

JOIN

MQT

Q9

Bob’s Q

Joe’s Q

Compute
Once &
Reuse

Month=200812

Store=McLean

Group=Laser Printer

IBM Software Group | DB2 information management software

23

Query with more table joins than MQT

MQT:

Original

Query:

Sales

Product

Time

GB

Sales

Product

Time

Store

GB

If the extra table join
is through a column
in group
-
by list,
match and re
-
join
with tables not in
MQT

Rewritten

Query:

MQT

Store

GB

Re
-
join

IBM Software Group | DB2 information management software

24

Query with
more

table joins than MQT (Example)

MQT_SalesByMthStrGrp

SELECT t.month_id,


f.store_id,


p.group_id,


sum(amount) as total_sales


FROM sales f, product p, time t


WHERE f.date_id = t.date_id


AND f.product_id = p.product_id


GROUP BY t.month_id, s.store_id,


p.group_id

Original Query

SELECT t.month_id,


s.region_id,



p.group_id,



sum(amount) as total_sales


FROM sales f, product p, time t,


store s


WHERE f.date_id = t.date_id AND


f.product_id = p.product_id


AND
f.store_id = s.store_id

GROUP BY t.month_id,
s.region_id,


p.group_id

Rewritten Query

SELECT m.month_id,


s.region_id,


m.group_id,


sum(m.total_sales)


FROM
MQT_SalesByMthStrGrp m
,


store s

WHERE
m.stored_id = s.store_id

GROUP BY m.month_id,
s.region_id,


m.group_id

Route Query
to MQT

IBM Software Group | DB2 information management software

25

Queries with
fewer

joins than MQT


In general, we can’t route queries to an MQT that has more table joins because
the extra joins affects the MQT content


However, if the extra joins in MQT are RI
-
joins, i.e. joins with RI parent table
with join predicate of the form:

child.fk = parent.pk

They are “look
-
up” joins that
do not add or delete rows


RI
-
join is common between Fact and Dimension tables.


Need to declare RI constraints (even just informational constraints) to tell DB2
about it

Query:

MQT:

Sales

Time

GB

Sales

Time

Product

GB

RI
-
join

IBM Software Group | DB2 information management software

26

Queries with
fewer

joins than MQT (Example)

Original Query

SELECT t.month_id,




f.store_id,



sum(amount) as total_sales


FROM sales f, time t

WHERE f.date_id = t.date_id


GROUP BY t.month_id, f.store_id

Rewritten Query

SELECT m.month_id,


m.store_id,


sum(m.total_sales)


FROM
MQT_SalesByMthStrGrp m


GROUP BY m.month_id, m.store_id

Route Query
to MQT

MQT_SalesByMthStrGrp

SELECT t.month_id,


f.store_id,


p.group_id,



sum(amount) as total_sales


FROM sales f, product p, time t


WHERE f.date_id = t.date_id


AND
f.product_id = p.product_id


GROUP BY t.month_id, s.store_id,


p.group_id

alter table sales add constraint
fk_product
foreign key
(product_id) references product;


Sales (product_id int
NOT NULL
)

IBM Software Group | DB2 information management software

27

Importance of NOT NULL Constraint on Foreign Key Columns


Declaring NOT NULL constraint on the foreign key column of the child table is
important!










If a row of column
SALES.product_id

has null value:


the row will not qualify the join predicate
f.product_id = p.product_id
.



The MQT would be missing some rows that are needed by the query, hence
incorrect to route.

MQT_SalesByMthStrGrp

SELECT t.month_id,


f.store_id,


p.group_id,


sum(amount) as total_sales


FROM sales f, product p, time t


WHERE f.date_id = t.date_id


AND
f.product_id = p.product_id


GROUP BY t.month_id, s.store_id,


p.group_id

IBM Software Group | DB2 information management software

28

Queries at higher aggregation level than an MQT


Store

Month

Product

Class

Group

Family

Line

Division

Quarter

Year

Region

Sales Fact

Product Dimension

Time Dimension

Store Dimension

Level 5

Level 1

Level 2

Level 3

Level 4

Level 0

Day



MQT for the Store
-
Month
-
Group slice



Queries that access data at or above
that slice can be satisfied by the MQT
with some additional aggregating.



Querying more detailed data below
the slice, such as Day or Product,
cannot be routed to this MQT.


IBM Software Group | DB2 information management software

29

MQT_SalesByMthStrGrp

SELECT t.month_id,


f.store_id,


p.group_id,


sum(amount) as total_sales


FROM sales f, product p, time t


WHERE f.date_id = t.date_id


AND f.product_id = p.product_id


GROUP BY t.month_id, s.store_id,


p.group_id

Queries at a higher level aggregation
-

Example

Original Query

SELECT
t.quarter_id,


f.store_id,


p.group_id,


sum(amount) as total_sales


FROM sales f, product p, time t


WHERE f.date_id = t.date_id AND


f.product_id = p.product_id

GROUP BY
t.quarter_id,

f.store_id,


p.group_id

Rewritten Query

SELECT
t.quarer_id,


m.store_id,


m.group_id,


sum(m.total_sales)


FROM
MQT_SalesByMthStrGrp m,



time t


WHERE m.month_id = t.month_id


GROUP BY
t.quarter_id
,
m.store_id, m.group_id



Route Query
to MQT

Functional Dependency

TIME
(month_id→quarter_id)

IBM Software Group | DB2 information management software

30

Functional Dependencies


X functionally determines Y

iff each X value is associated with
precisely one Y value.


By definition, all columns of a relation are functionally dependent
on the
unique/primary key.



De
-
normalization introduces functional dependencies between
non
-
key columns.


Between ID and descriptive columns


Example:
group_id → group_desc


Between different levels of the hierarchy


Example:
group_id → (family_id, line_id,


division_id)


Product

Class

Group

Family

Line

Division

Product Dimension

IBM Software Group | DB2 information management software

31

Functional Dependencies in DB2


User can help DB2 identifies the functional dependencies among non
-
key
columns by declaring NOT ENFORCED FD constraints:

ALTER TABLE PRODUCT

ADD CHECK((group_desc)
DETERMINED BY

group_id)

NOT ENFORCED ENABLE QUERY OPTIMIZATION


During MQT routing, DB2 will re
-
join with the table to pick up the
“dependents”


IBM Software Group | DB2 information management software

32

Functional Dependencies
-

Example

Query at
year
-
region
-
division

level:

Select t.year_id, s.region_desc, p.division_desc,


sum(amount) as total_sales

...

GROUP BY t.year_id, s.region_desc, p.division_desc;

MQT at the
month
-
store
-

group

aggregate level:


SELECT t.month_id,


t.year_id,


f.store_id,


s.region_id, s.region_desc,


p.group_id,


p.division_id, p.division_desc,


sum(amount) as total_sales


FROM sales f, product p, time t


,stored s


WHERE f.date_id = t.date_id


AND f.product_id = p.product_id


AND f.store_id = s.store_id


GROUP BY t.month_id,


t.year_id,


s.store_id,


s.region_id, s.region_desc,


p.group_id


,p.division_id, p.division_desc


FD: TIME: month_id
-
> year_id


UK: STORE store_id
-
> (region_id, region_desc)


FD: PRODUCT: group_id
-
> (division_id, division_desc)

MQT

PRODUCT

TIME

GB


(year
-
region
-
division)

FD re
-
join

FD re
-
join

PRODUCT

FD re
-
join


(month_id, year_id)


(group_id, division_desc)

(store_id, region_desc)


(month
-
store
-
group)

IBM Software Group | DB2 information management software

33

Functional Dependencies in MQT


It is a good practice to declare functional dependencies to keep the
minimum amount of dimensional information in the MQT. This will:


keep the size of the MQT small


Keep the MQT unaffected by dimension table updates
-

minimize
the need for maintenance

IBM Software Group | DB2 information management software

34

MQT Matching


Insurance Policy


MQT matching performed during
Query Rewrite


Rewrite transformations mostly
rule/heuristic based


Insurance Policy:


Optimizer Cost based comparison
between:


Access plan with MQT routing


Access plan WITHOUT MQT
routing


Cheapest plan wins
-

against
choosing a wrong heuristic

Parser

Query Graph Semantics

Query Rewrite

Optimizer

Code Generator

SQL Query

With and without Aggregate MQT

IBM Software Group | DB2 information management software

35

Agenda


Motivation


Sample scenario


How MQT can help


Types of MQTs & Maintenance


MQT Matching


what queries can benefit from MQTs


Design Considerations


Other Uses of MQT




IBM Software Group | DB2 information management software

36

Design Considerations


On usage of MQT


Space requirement


MQTs are
materialized

tables



On MQT definition


Level of aggregation vs. query processing time


Too high level


few queries can route to it


Too low level


costly compensation time


Workload
-
based vs. Model
-
based


Numbers of MQTs




Store

Month

Product

Class

Group

Family

Line

Division

Quarter

Year

Region

Sales Fact

Product Dim.

Time Dim.

Store Dim.

Level 5

Level 1

Level 2

Level 3

Level 4

Level 0

Day

IBM Software Group | DB2 information management software

37

Refreshing with finer granularity MQTs


DB2 9 allows the use of existing MQTs to be used by higher level MQTs during
REFRESH


Matching done during REFRESH


It is very important to aggregate in the right order


Example :


MQT1 : GROUP BY DATE, MONTH, YEAR


MQT2 : GROUP BY MONTH, YEAR


REFRESH MQT1


REFRESH MQT2


uses MQT1

IBM Software Group | DB2 information management software

38

Another Use of Replicated MQTs


In partitioned database, colocation
of rows involved in different tables
=> efficient join (
colocated joins
)


In a typical star schema setup, the
fact table can be co
-
located with
at most one dimension table.


The remaining (smaller)
dimension tables are frequently
replicated

to all nodes of the fact
table


Replicated MQTs are used to
achieve co
-
location of data for
efficient join processing in
partitioned database environment


Product

ProdID1

Sales

ProdID1

Product

ProdID2

Sales

ProdID2

Product

ProdID3

Sales

ProdID3

Store


Store


Store


Store


IBM Software Group | DB2 information management software

39

Summary


An MQT is a table that stores pre
-
computed query results.


Queries can be matched to an MQT to improve performance.


Different types of MQTs and their maintenance:


immediate, deferred (with/without staging table), user maintained


replicated


Queries can be matched to an MQT that has:


Predicates that are superset of those in queries


Finer grained aggregation level


More/few table joins


Not null constraint, foreign key constraints and functional
dependencies can be defined to help MQT matching


IBM Software Group | DB2 information management software

40

References


Redbooks (http://www.redbooks.ibm.com)


DB2 UDB's High Function Business Intelligence in e
-
Business
(SG24
-
6546
-
00)


Related Papers


Wolfgang Lehner, Richard Sidle, Hamid Pirahesh, Roberta
Cochrane: Maintenance of Automatic Summary Tables.
SIGMOD Conference 2000: 512
-
513


Markos Zaharioudakis, Roberta Cochrane, George Lapis,
Hamid Pirahesh, Monica Urata: Answering Complex SQL
Queries Using Automatic Summary Tables. SIGMOD
Conference 2000: 105
-
116

IBM Software Group | DB2 information management software

41

Thank You