®
IBM Software Group
© 2009 IBM Corporation
A First Look at Materialized Query Table (MQT) in
DB2 LUW
Petrus Chan, IBM Toronto Lab, petrus@ca.ibm.com
IBM Software Group | DB2 information management software
2
Agenda
Motivation
Sample scenario
How MQT can help
Different types of MQTs & Maintenance
MQT Matching
–
what queries can benefit from MQTs
Design Considerations
Other Uses of MQT
IBM Software Group | DB2 information management software
3
Database Schema for Our Sample Scenario
Product_id
Store_id
Date_id
Amount
Quantity
…
SALES
Store_id
Region_id
…
STORE
Date_id
Month_id
Quarter_id
Year_id
TIME
Product_id
Class_id
Group_id
Family_id
Line_id
Division_id
…
PRODUCT
Star Schema
IBM Software Group | DB2 information management software
4
Dimensions Hierarchy
Store
Month
Product
Class
Group
Family
Line
Division
Quarter
Year
Region
Sales Fact
Product Dimension
Time Dimension
Store Dimension
Level 5
Level 1
Level 2
Level 3
Level 4
Level 0
Day
IBM Software Group | DB2 information management software
5
Sample Query
Q9: Total sales of each product group at each store for December 2008.
SELECT t.month_id,
f.store_id,
p.group_id,
sum(amount) as total_sales
FROM sales f, product p, time t
WHERE f.date_id = t.date_id AND
f.product_id = p.product_id AND
t.month_id = 200812
GROUP BY t.month_id, f.store_id, p.group_id
IBM Software Group | DB2 information management software
6
Benefits of MQTs
Fact
GB
JOIN
Dim2
Dim1
JOIN
Q9
Month=200812
Fact
GB
JOIN
Dim2
Dim1
JOIN
Joe’s Query
Group=Laser Printer
Fact
GB
JOIN
Dim2
Dim1
JOIN
Bob’s Query
Store=McLean
Fact
GB
JOIN
Dim2
Dim1
JOIN
MQT
Q9
Bob’s Q
Joe’s Q
Compute
Once &
Reuse
Month=200812
Store=McLean
Group=Laser Printer
IBM Software Group | DB2 information management software
7
What is an MQT?
MQT stands for Materialized Query Table.
Basic Idea
:
Query results caching
-
pre
-
compute and store (in database
tables) the results of frequently issued queries
Query optimizer automatically and transparently replace portions
of future queries with direct references to the pre
-
computed (i.e.,
materialized) results. This is referred to as MQT routing or
matching.
Standard technique in data warehousing, also known as aggregate
tables and aggregate (re
-
)routing, or alternatively, as materialized
views.
IBM Software Group | DB2 information management software
8
Usages of MQTs
–
At a High Level
Aggregate data over one or more dimensions
Joins and aggregate data over a group of tables
Data from a commonly accessed subset of data, that is, from a "hot"
horizontal or vertical database partition
Repartitioned data from a table, or part of a table, in a partitioned
database environment
IBM Software Group | DB2 information management software
9
How query can be routed to MQT?
MQT_SalesByMthStrGrp
SELECT t.month_id,
f.store_id,
p.group_id,
sum(amount) as total_sales
FROM sales f, product p, time t
WHERE f.date_id = t.date_id
AND f.product_id = p.product_id
GROUP BY t.month_id, f.store_id,
p.group_id
Q9: Original Query
SELECT t.month_id,
f.store_id,
p.group_id,
sum(amount) as total_sales
FROM sales f, product p, time t
WHERE f.date_id = t.date_id AND
f.product_id = p.product_id
AND
t.month_id = 200812
GROUP BY t.month_id, f.store_id,
p.group_id
Q9: Rewritten Query
SELECT m.month_id,
m.store_id,
m.group_id,
m.total_sales
FROM
MQT_SalesByMthStrGrp m
WHERE m.month_id
= 200812
Route Query
to MQT
IBM Software Group | DB2 information management software
10
MQT Creation
MQTs are created using the
CREATE TABLE statement
Immediately after creation, the
MQT is
empty
and not
accessible. It is populated using
the REFRESH TABLE statement
After running the REFRESH, the
MQT becomes accessible
CREATE TABLE MQT_SalesByMthStrGrp AS
(SELECT t.month_id,
f.store_id,
p.group_id,
sum(amount) as total_sales
FROM sales f, product p, time t
WHERE f.date_id = t.date_id AND
f.product_id = p.product_id
GROUP BY t.month_id, f.store_id, p.group_id)
DATA INITIALLY DEFERRED
REFRESH DEFERRED;
REFRESH TABLE
MQT_SalesByMthStrGrp;
RUNSTATS
ON TABLE
petrus.MQT_SalesByMthStrGrp
WITH DISTRIBUTION AND DETAILED INDEXES ALL;
--
Confirm that MQT is being used by
--
(a) setting
CURRENT REFRESH AGE to ANY
--
(b) explaining the query used as
--
the MQT’s definition
IBM Software Group | DB2 information management software
11
Agenda
Motivation
Sample scenario
How MQT can help
Types of MQTs & Maintenance
MQT Matching
–
what queries can benefit from MQTs
Design Considerations
Other Uses of MQT
IBM Software Group | DB2 information management software
12
Types of MQTs
MQT
System
Maintained
User
Maintained
Refresh
IMMEDIATE
Refresh
DEFERRED
WITH staging table
WITHOUT
staging table
IBM Software Group | DB2 information management software
13
MQT Maintenance
Process of synchronizing the MQT data with changes to the underlying
tables
Incremental Maintenance
Immediate Propagate Immediate Applied (IPIA)
Full Maintenance
Hybrid
Immediate Propagate Deferred Applied (IPDA)
IBM Software Group | DB2 information management software
14
Incremental Maintenance
MQT
base
tables
base
tables
base
tables
Base
Tables
Delta select/join
Delta
Aggregate
I
mmediate
P
ropagate
I
mmediate
A
pply
Pair Delta to MQT
Combine old and
new values
Primary
-
key:
grouping
columns
the ability to maintain the MQT without the
need to access the modified base table,
only using the
delta
information (newly
inserted/updated/deleted data) and the
MQT itself.
REFRESH IMMEDIDATE MQTs are
maintained
incrementally
during:
INSERT/UPDATE/DELETE of base
table processing
REFRESH TABLE processing after
LOAD INSERT into the base table.
Immediate Propagate, Immediate Apply
For performance reasons, only
incrementally maintainable MQTs can be
defined as REFRESH IMMEDIATE
(hence more restrictions in MQT
definition)
IBM Software Group | DB2 information management software
15
Full Maintenance
MQTs are fully rebuilt by re
-
evaluating the
query in the MQT definition
REFRESH DEFERRED MQTs (without
staging table):
not maintained during
INSERT/UPDATE/DELETE operations on
the base table.
synchronized with base table using the
REFRESH TABLE statement
MQT
base
tables
base
tables
base
tables
Base
Tables
select/join
Aggregate
IBM Software Group | DB2 information management software
16
Hybrid
-
Immediate Propagate, Deferred Applied
AST
base
tables
base
tables
base
tables
Base
Tables
Delta select/join
Delta
Aggregate
I
mmediate
P
ropagate
D
eferred
A
pply
Pair Delta to AST
Combine old and
new values
Primary
-
key:
grouping
columns
Staging
Staging
Tables
Net
-
effect
REFRESH DEFERRED MQTs
with staging table
Delta changes to MQT are
computed upon
INSERT/UPDATE/DELETE of
base tables
Delta aggregate is recorded in a
separate
staging table
Subsequent REFRESH of MQT
can be done incrementally
Immediate Propagate, Deferred
Apply
One staging table per MQT
CREATE TABLE SALES_BY_YEAR_STG
FOR SALES_BY_YEAR
PROPAGATE IMMEDIATE;
IBM Software Group | DB2 information management software
17
User Maintained MQT
User is responsible for maintaining MQT content
REFRESH TABLE disallowed
Insert/update/delete on MQT allowed
Must be REFRESH DEFERRED
Scenarios where user maintained MQTs are
For efficiency reasons, when the user is convinced that (s)he can implement
MQT maintenance far more efficiently than the mechanisms used by DB2.
For leveraging existing “user maintained” summaries, where the user wants
DB2 to automatically consider them for optimization for new ad hoc queries
being executed against the base tables.
IBM Software Group | DB2 information management software
18
Refresh Considerations
Refresh Immediate
Refresh Deferred
(WITH staging
table)
Refresh Deferred
(NO staging table)
Latency of Data
Always in sync
Up to last
REFRESH
Up to last
REFRESH
Transaction
Overhread
upon base table
modification
Yes
Yes
No
Incremental
Refresh
No need, except
after LOAD
INSERT
Yes
No
Lock Contention
during base table
modifications
Potential
Less
No
IBM Software Group | DB2 information management software
19
Agenda
Motivation
Sample scenario
How MQT can help
Types of MQTs & Maintenance
MQT Matching
–
what queries can benefit from MQTs
Design Considerations
Other Uses of MQT
IBM Software Group | DB2 information management software
20
MQT Matching
Process to automatically use
the MQT to derive results for a
query for improved
performance
MQT_SalesByMthStrGrp
SELECT t.month_id,
f.store_id,
p.group_id,
sum(amount) as total_sales
FROM sales f, product p, time t
WHERE f.date_id = t.date_id
AND f.product_id = p.product_id
GROUP BY t.month_id, s.store_id,
p.group_id
Q9: Original Query
SELECT t.month_id,
f.store_id,
p.group_id,
sum(amount) as total_sales
FROM sales f, product p, time t
WHERE f.date_id = t.date_id AND
f.product_id = p.product_id
AND
t.month_id = 200812
GROUP BY t.month_id, f.store_id,
p.group_id
Q9: Rewritten Query
SELECT m.month_id,
m.store_id,
m.group_id,
m.total_sales
FROM
MQT_SalesByMthStrGrp m
WHERE m.month_id
= 200812
Route Query
to MQT
IBM Software Group | DB2 information management software
21
What queries can benefit from an MQT?
Queries
that matches an MQT perfectly!
with predicates that are a subset of those in MQT
with more table joins than MQT (
re
-
join
)
with fewer table joins than MQT (need RI constraint)
with columns not in MQT definition (need functional
dependencies)
at a higher
aggregation level
than an MQT
IBM Software Group | DB2 information management software
22
Query with predicates that are a subset of those in
MQT
Fact
GB
JOIN
Dim2
Dim1
JOIN
Q9
Month=200812
Fact
GB
JOIN
Dim2
Dim1
JOIN
Joe’s Query
Group=Laser Printer
Fact
GB
JOIN
Dim2
Dim1
JOIN
Bob’s Query
Store=McLean
Fact
GB
JOIN
Dim2
Dim1
JOIN
MQT
Q9
Bob’s Q
Joe’s Q
Compute
Once &
Reuse
Month=200812
Store=McLean
Group=Laser Printer
IBM Software Group | DB2 information management software
23
Query with more table joins than MQT
MQT:
Original
Query:
Sales
Product
Time
GB
Sales
Product
Time
Store
GB
If the extra table join
is through a column
in group
-
by list,
match and re
-
join
with tables not in
MQT
Rewritten
Query:
MQT
Store
GB
Re
-
join
IBM Software Group | DB2 information management software
24
Query with
more
table joins than MQT (Example)
MQT_SalesByMthStrGrp
SELECT t.month_id,
f.store_id,
p.group_id,
sum(amount) as total_sales
FROM sales f, product p, time t
WHERE f.date_id = t.date_id
AND f.product_id = p.product_id
GROUP BY t.month_id, s.store_id,
p.group_id
Original Query
SELECT t.month_id,
s.region_id,
p.group_id,
sum(amount) as total_sales
FROM sales f, product p, time t,
store s
WHERE f.date_id = t.date_id AND
f.product_id = p.product_id
AND
f.store_id = s.store_id
GROUP BY t.month_id,
s.region_id,
p.group_id
Rewritten Query
SELECT m.month_id,
s.region_id,
m.group_id,
sum(m.total_sales)
FROM
MQT_SalesByMthStrGrp m
,
store s
WHERE
m.stored_id = s.store_id
GROUP BY m.month_id,
s.region_id,
m.group_id
Route Query
to MQT
IBM Software Group | DB2 information management software
25
Queries with
fewer
joins than MQT
In general, we can’t route queries to an MQT that has more table joins because
the extra joins affects the MQT content
However, if the extra joins in MQT are RI
-
joins, i.e. joins with RI parent table
with join predicate of the form:
child.fk = parent.pk
They are “look
-
up” joins that
do not add or delete rows
RI
-
join is common between Fact and Dimension tables.
Need to declare RI constraints (even just informational constraints) to tell DB2
about it
Query:
MQT:
Sales
Time
GB
Sales
Time
Product
GB
RI
-
join
IBM Software Group | DB2 information management software
26
Queries with
fewer
joins than MQT (Example)
Original Query
SELECT t.month_id,
f.store_id,
sum(amount) as total_sales
FROM sales f, time t
WHERE f.date_id = t.date_id
GROUP BY t.month_id, f.store_id
Rewritten Query
SELECT m.month_id,
m.store_id,
sum(m.total_sales)
FROM
MQT_SalesByMthStrGrp m
GROUP BY m.month_id, m.store_id
Route Query
to MQT
MQT_SalesByMthStrGrp
SELECT t.month_id,
f.store_id,
p.group_id,
sum(amount) as total_sales
FROM sales f, product p, time t
WHERE f.date_id = t.date_id
AND
f.product_id = p.product_id
GROUP BY t.month_id, s.store_id,
p.group_id
alter table sales add constraint
fk_product
foreign key
(product_id) references product;
Sales (product_id int
NOT NULL
)
IBM Software Group | DB2 information management software
27
Importance of NOT NULL Constraint on Foreign Key Columns
Declaring NOT NULL constraint on the foreign key column of the child table is
important!
If a row of column
SALES.product_id
has null value:
the row will not qualify the join predicate
f.product_id = p.product_id
.
The MQT would be missing some rows that are needed by the query, hence
incorrect to route.
MQT_SalesByMthStrGrp
SELECT t.month_id,
f.store_id,
p.group_id,
sum(amount) as total_sales
FROM sales f, product p, time t
WHERE f.date_id = t.date_id
AND
f.product_id = p.product_id
GROUP BY t.month_id, s.store_id,
p.group_id
IBM Software Group | DB2 information management software
28
Queries at higher aggregation level than an MQT
Store
Month
Product
Class
Group
Family
Line
Division
Quarter
Year
Region
Sales Fact
Product Dimension
Time Dimension
Store Dimension
Level 5
Level 1
Level 2
Level 3
Level 4
Level 0
Day
MQT for the Store
-
Month
-
Group slice
Queries that access data at or above
that slice can be satisfied by the MQT
with some additional aggregating.
Querying more detailed data below
the slice, such as Day or Product,
cannot be routed to this MQT.
IBM Software Group | DB2 information management software
29
MQT_SalesByMthStrGrp
SELECT t.month_id,
f.store_id,
p.group_id,
sum(amount) as total_sales
FROM sales f, product p, time t
WHERE f.date_id = t.date_id
AND f.product_id = p.product_id
GROUP BY t.month_id, s.store_id,
p.group_id
Queries at a higher level aggregation
-
Example
Original Query
SELECT
t.quarter_id,
f.store_id,
p.group_id,
sum(amount) as total_sales
FROM sales f, product p, time t
WHERE f.date_id = t.date_id AND
f.product_id = p.product_id
GROUP BY
t.quarter_id,
f.store_id,
p.group_id
Rewritten Query
SELECT
t.quarer_id,
m.store_id,
m.group_id,
sum(m.total_sales)
FROM
MQT_SalesByMthStrGrp m,
time t
WHERE m.month_id = t.month_id
GROUP BY
t.quarter_id
,
m.store_id, m.group_id
Route Query
to MQT
Functional Dependency
TIME
(month_id→quarter_id)
IBM Software Group | DB2 information management software
30
Functional Dependencies
X functionally determines Y
iff each X value is associated with
precisely one Y value.
By definition, all columns of a relation are functionally dependent
on the
unique/primary key.
De
-
normalization introduces functional dependencies between
non
-
key columns.
Between ID and descriptive columns
Example:
group_id → group_desc
Between different levels of the hierarchy
Example:
group_id → (family_id, line_id,
division_id)
Product
Class
Group
Family
Line
Division
Product Dimension
IBM Software Group | DB2 information management software
31
Functional Dependencies in DB2
User can help DB2 identifies the functional dependencies among non
-
key
columns by declaring NOT ENFORCED FD constraints:
ALTER TABLE PRODUCT
ADD CHECK((group_desc)
DETERMINED BY
group_id)
NOT ENFORCED ENABLE QUERY OPTIMIZATION
During MQT routing, DB2 will re
-
join with the table to pick up the
“dependents”
IBM Software Group | DB2 information management software
32
Functional Dependencies
-
Example
Query at
year
-
region
-
division
level:
Select t.year_id, s.region_desc, p.division_desc,
sum(amount) as total_sales
...
GROUP BY t.year_id, s.region_desc, p.division_desc;
MQT at the
month
-
store
-
group
aggregate level:
SELECT t.month_id,
t.year_id,
f.store_id,
s.region_id, s.region_desc,
p.group_id,
p.division_id, p.division_desc,
sum(amount) as total_sales
FROM sales f, product p, time t
,stored s
WHERE f.date_id = t.date_id
AND f.product_id = p.product_id
AND f.store_id = s.store_id
GROUP BY t.month_id,
t.year_id,
s.store_id,
s.region_id, s.region_desc,
p.group_id
,p.division_id, p.division_desc
FD: TIME: month_id
-
> year_id
UK: STORE store_id
-
> (region_id, region_desc)
FD: PRODUCT: group_id
-
> (division_id, division_desc)
MQT
PRODUCT
TIME
GB
(year
-
region
-
division)
FD re
-
join
FD re
-
join
PRODUCT
FD re
-
join
(month_id, year_id)
(group_id, division_desc)
(store_id, region_desc)
(month
-
store
-
group)
IBM Software Group | DB2 information management software
33
Functional Dependencies in MQT
It is a good practice to declare functional dependencies to keep the
minimum amount of dimensional information in the MQT. This will:
keep the size of the MQT small
Keep the MQT unaffected by dimension table updates
-
minimize
the need for maintenance
IBM Software Group | DB2 information management software
34
MQT Matching
–
Insurance Policy
MQT matching performed during
Query Rewrite
Rewrite transformations mostly
rule/heuristic based
Insurance Policy:
Optimizer Cost based comparison
between:
Access plan with MQT routing
Access plan WITHOUT MQT
routing
Cheapest plan wins
-
against
choosing a wrong heuristic
Parser
Query Graph Semantics
Query Rewrite
Optimizer
Code Generator
SQL Query
With and without Aggregate MQT
IBM Software Group | DB2 information management software
35
Agenda
Motivation
Sample scenario
How MQT can help
Types of MQTs & Maintenance
MQT Matching
–
what queries can benefit from MQTs
Design Considerations
Other Uses of MQT
IBM Software Group | DB2 information management software
36
Design Considerations
On usage of MQT
Space requirement
–
MQTs are
materialized
tables
On MQT definition
Level of aggregation vs. query processing time
Too high level
–
few queries can route to it
Too low level
–
costly compensation time
Workload
-
based vs. Model
-
based
Numbers of MQTs
Store
Month
Product
Class
Group
Family
Line
Division
Quarter
Year
Region
Sales Fact
Product Dim.
Time Dim.
Store Dim.
Level 5
Level 1
Level 2
Level 3
Level 4
Level 0
Day
IBM Software Group | DB2 information management software
37
Refreshing with finer granularity MQTs
DB2 9 allows the use of existing MQTs to be used by higher level MQTs during
REFRESH
Matching done during REFRESH
It is very important to aggregate in the right order
Example :
MQT1 : GROUP BY DATE, MONTH, YEAR
MQT2 : GROUP BY MONTH, YEAR
REFRESH MQT1
REFRESH MQT2
uses MQT1
IBM Software Group | DB2 information management software
38
Another Use of Replicated MQTs
In partitioned database, colocation
of rows involved in different tables
=> efficient join (
colocated joins
)
In a typical star schema setup, the
fact table can be co
-
located with
at most one dimension table.
The remaining (smaller)
dimension tables are frequently
replicated
to all nodes of the fact
table
Replicated MQTs are used to
achieve co
-
location of data for
efficient join processing in
partitioned database environment
Product
ProdID1
Sales
ProdID1
Product
ProdID2
Sales
ProdID2
Product
ProdID3
Sales
ProdID3
Store
Store
Store
Store
IBM Software Group | DB2 information management software
39
Summary
An MQT is a table that stores pre
-
computed query results.
Queries can be matched to an MQT to improve performance.
Different types of MQTs and their maintenance:
immediate, deferred (with/without staging table), user maintained
replicated
Queries can be matched to an MQT that has:
Predicates that are superset of those in queries
Finer grained aggregation level
More/few table joins
Not null constraint, foreign key constraints and functional
dependencies can be defined to help MQT matching
IBM Software Group | DB2 information management software
40
References
Redbooks (http://www.redbooks.ibm.com)
DB2 UDB's High Function Business Intelligence in e
-
Business
(SG24
-
6546
-
00)
Related Papers
Wolfgang Lehner, Richard Sidle, Hamid Pirahesh, Roberta
Cochrane: Maintenance of Automatic Summary Tables.
SIGMOD Conference 2000: 512
-
513
Markos Zaharioudakis, Roberta Cochrane, George Lapis,
Hamid Pirahesh, Monica Urata: Answering Complex SQL
Queries Using Automatic Summary Tables. SIGMOD
Conference 2000: 105
-
116
IBM Software Group | DB2 information management software
41
Thank You
Enter the password to open this PDF file:
File name:
-
File size:
-
Title:
-
Author:
-
Subject:
-
Keywords:
-
Creation Date:
-
Modification Date:
-
Creator:
-
PDF Producer:
-
PDF Version:
-
Page Count:
-
Preparing document for printing…
0%
Comments 0
Log in to post a comment