Incremental Maintenance for
Non

Distributive Aggregate Functions
work done at IBM Almaden Research Center
Themis Palpanas (U of Toronto)
Richard Sidle
Bobbie Cochrane
Hamid Pirahesh
UoA, Sep 2002
Themis Palpanas

U of Toronto
2
Motivation
large amounts of data stored in databases
often times data warehouses are used
consolidate data from many sources
offer more general and descriptive view of data
queried by business intelligence tools and decision support
systems
produce expensive OLAP queries
these OLAP queries have nice properties:
based on same set of tables
perform similar aggregations
UoA, Sep 2002
Themis Palpanas

U of Toronto
3
Motivation (cont’d)
can efficiently support such queries with Automatic
Summary Tables (ASTs)
materialized queries defined over a set of base tables
precomputed once, used many times
answer complex queries fast
must maintain ASTs when base tables change
inserts, updates, deletes
4
Motivation (cont’d)
base
tables
AST
insert/update/delete
AST
definition
UoA, Sep 2002
Themis Palpanas

U of Toronto
5
Aggregate Functions
characterization of functions wrt insertion and
deletion operations
updates are series of deletions and insertions
distributive aggregate functions
new value computed based on old value and value of
operation
SUM()
non

distributive aggregate functions
above property does not hold
STDDEV()
MIN() (because of deletions)
UoA, Sep 2002
Themis Palpanas

U of Toronto
6
Problem Statement
given ASTs with aggregate functions
distributive
SUM, COUNT
non

distributive
STDDEV, CORRELATION, REGRESSION, MIN/MAX,
XMLAGG, …
when base tables change
incrementally maintain affected ASTs
efficient maintenance of ASTs with
non

distributive aggregate functions
UoA, Sep 2002
Themis Palpanas

U of Toronto
7
Outline
Current Approach
Our Solution
Experimental Evaluation
Related Work
Conclusions
8
AST
Current Approach
base
tables
insert/update/delete
delta
combine
old and
new values
AST
definition
Propagate
phase
Apply
phase
UoA, Sep 2002
Themis Palpanas

U of Toronto
9
Current Approach (cont’d)
works for distributive
SUM, COUNT
does not work for
non

distributive
STDDEV, CORRELATION, REGRESSION
MIN/MAX
XMLAGG
need new way to deal with these functions
UoA, Sep 2002
Themis Palpanas

U of Toronto
10
Our Solution
selective recomputation
no longer enough to compute
delta
must recompute some aggregation groups
minimize work to be done
choose which groups to recompute
optimize query plan
11
Our Solution (cont’d)
AST
base
tables
insert/update/delete
delta
recompute
affected
groups
combine
old and
new values
Propagate
phase
AST
definition
Apply
phase
UoA, Sep 2002
Themis Palpanas

U of Toronto
12
Our Solution (cont’d)
the 5 steps
1.
compute new aggregate values
2.
change column derivation
3.
recompute only affected groups
4.
eliminate unnecessary operations
5.
optimize for special cases
UoA, Sep 2002
Themis Palpanas

U of Toronto
13
Initial Query Plan
prop
UDI
LOJ
AST
Query Graph Model (QGM)
UoA, Sep 2002
Themis Palpanas

U of Toronto
14
1. Compute New Aggregate Values
compute
delta
for distributive
functions
recompute non

distributive
functions
get those values only for
affected groups
duplicate computation for
distributive functions!
prop
UDI
LOJ
AST
AST
LOJ
UoA, Sep 2002
Themis Palpanas

U of Toronto
15
2. Change Column Derivation
change column derivation
rewrite phase projects out
unused columns
entire AST gets recomputed!
prop
UDI
LOJ
AST
AST
LOJ
non

distributive
only
distributive
only
UoA, Sep 2002
Themis Palpanas

U of Toronto
16
2. Change Column Derivation
example AST:
SELECT dept_id,COUNT(emp_id),MAX(age),STDDEV(salary)
FROM employees
GROUP BY dept_id
result of COUNT() computed from old propagate phase
results of MAX() and STDDEV() from AST definition
UoA, Sep 2002
Themis Palpanas

U of Toronto
17
3. Recompute Affected Groups
push join predicate
down in AST
only affected groups
are recomputed
special rules for super

aggregates
GROUPING SETS
ROLLUP
CUBE
prop
UDI
LOJ
AST
AST*
LOJ
non

distributive
only
distributive
only
T1
Tk
…
J
J
UoA, Sep 2002
Themis Palpanas

U of Toronto
18
3. Recompute Affected Groups
special treatment for ASTs with super

aggregates
predicates not pushdownable
caution not to compute totals of totals
build special join predicate
ensure correct aggregations
change rewrite rules
allow predicate pushdown through super aggregates
applicable only for special join predicate
UoA, Sep 2002
Themis Palpanas

U of Toronto
19
4. Remove Unnecessary Operations
outerjoin not always needed
when changes are only inserts
reroute columns from
propagate phase through AST
remove outerjoin operator
same for updates not
referencing AST grouping
columns and predicates
prop
UDI
LOJ
AST
AST
T1
Tk
…
J
J
all columns
distributive
only
UoA, Sep 2002
Themis Palpanas

U of Toronto
20
4. Remove Unnecessary Operations
example AST:
SELECT dept_id,COUNT(emp_id),MAX(age),STDDEV(salary)
FROM employees
GROUP BY dept_id
modification on base tables:
UPDATE employees SET salary=10 WHERE age>40
outerjoin operation will not be built
update does not refer to grouping column (
dept_id
), and no
predicate in AST refers to updated column (
salary
)
certain that no tuples in AST will be deleted
only STDDEV() will be recomputed
the rest are not affected by changes
UoA, Sep 2002
Themis Palpanas

U of Toronto
21
5. Optimize for Special Cases
recomputation step not needed when
only insertions and only MIN/MAX functions
build predicate in apply phase
check if new min/max should replace old values
only deletions referring only to grouping columns of
AST
can only cause entire groups to be deleted
handled in apply phase
UoA, Sep 2002
Themis Palpanas

U of Toronto
22
5. Optimize for Special Cases
example AST:
SELECT dept_id,COUNT(emp_id),MAX(age),STDDEV(salary)
FROM employees
GROUP BY dept_id
modification on base tables:
DELETE FROM employees WHERE dept_id>40
selective recomputation step not needed
deletion refers only to grouping column (
dept_id
)
certain that entire groups will be deleted from AST
no other groups will be affected
UoA, Sep 2002
Themis Palpanas

U of Toronto
23
Experimental Evaluation
prototype implementation in IBM DB2 UDB
star schema database
sales of products over 5 year time period
fact table: 10 million tuples
AST with non

distributive aggregate function
240,000 tuples
workload simulates nightly updates
1.
add/delete data for first day of month
2.
add/delete data for second day of month
3.
add/delete data for full month
UoA, Sep 2002
Themis Palpanas

U of Toronto
24
Experimental Evaluation (cont’d)
workload 1
workload 2
workload 3
incremental
286
294
420
full refresh
699
702
692
deletions require 40

60% of full refresh time
workload 1
workload 2
workload 3
incremental
3
n/a
31
full refresh
699
702
692
optimized deletions require 1

4% of full refresh time
UoA, Sep 2002
Themis Palpanas

U of Toronto
25
Experimental Evaluation (cont’d)
workload 1
workload 2
workload 3
incremental
151
158
180
full refresh
702
702
721
insertions/updates require 20

25% of full refresh time
UoA, Sep 2002
Themis Palpanas

U of Toronto
26
Related Work
incremental view maintenance
differential refresh algorithms
Lindsay et al. 1986, Blakeley et al. 1986, Qian and
Wiederhold 1991, Ceri and Widom 1991
deferred incremental maintenance
Colby et al. 1996, Salem et al. 2000
views with aggregation
Quass 1996, Mumick et al. 1997
UoA, Sep 2002
Themis Palpanas

U of Toronto
27
Conclusions
incremental maintenance for ASTs with non

distributive aggregate functions
support MIN/MAX, STDDEV, CORRELATION,
REGRESSION, XMLAGG, …
efficient selective recomputation
recompute only affected groups
optimize query plan
customize for special cases
significant performance improvements
UoA, Sep 2002
Themis Palpanas

U of Toronto
28
Future Work
examine use of work areas
temporary storage space
store intermediate values
maintenance without recomputation
STDDEV, MIN/MAX(?), …
very helpful for ASTs defined with super

aggregates
ASTs with HAVING clauses
do not know when groups will enter/leave AST
Comments 0
Log in to post a comment