Recommending Materialized Views and Indexes with the IBM DB2 Design Advisor

triparkansasData Management

Oct 31, 2013 (4 years and 8 days ago)

114 views

RecommendingMaterialized
ViewsandIndexeswiththeIBM
DB2DesignAdvisor
DanielC.Zilioetal
ProceedingsoftheInternationalConferenceon
AutomaticComputing(ICAC’04)
RolandoBlanco
CS848-Spring2005
–p.1/24
Overview

MaterializedViews

DB2Advisorextensions

OtherSystems

Comment/Observations
–p.2/24
MaterializedViews(MVs)

Materializedpartialqueryresults

Reason:Improveperformance

Drawbacks:
-
Introducedataredundancy
-
Maintenance:
*
Deferred
*
Immediate

Goodiftheybenefitmanyqueries

DesignIssues:
-
Whatpartialresultstomaterialize
-
Whattoindex(Requireindicesasanyothertables)
-
Howtorefresh
Basetablesmaybepartitioned.
–p.3/24
MaterializedViews

Goal:MinimizetotalcostofqueryevaluationandMV
maintenanceunderlimitedresource(storage)constraint

Algorithm:
-
Input:Workload,diskspaceconstraint
-
Output:OptionalsetofMVsandindicesfor
workload
-
ImplementationOptions:
1.
ChooseMVsgivenstorageconstraint,then
indicesusingremainingspace
2.
(MVs,indices),(MVs,indices),...
3.
(MV,indices)inonestep
–p.4/24
DB2Advisor

MVandIndexRecommendation:
a.
GenerationofMVcandidates
b.
Generationofindexcandidates
c.
StatsestimationforMVsandindices
d.
MVandindexselection
e.
Filtering
Output:(1)MVs,or(2)Indicesonbasetables,or(3)
MVs,andindicesonbasetablesandMVs.
a,b,andcbyinvokingoptimizerunderanewEXPLAIN
mode.
–p.5/24
a.GenerationofMVCandidates

Queriesinworkload(generalized)
-
select...from...where[groupby]

Userdefinedviews

MultipleQueryOptimization(MQO)tofindcommon
subexpressionsamongqueriesinworkload
–p.6/24
a.GenerationofMVCandidates

MQO[
LCPZ01
]:Queryrepresentedasgraphfollowing
DB2’squerygraphmodel(QGM)
SELECTo_orderstatus,
SUM(l_quantity)ASsum_qty,
AVG(l_discount)ASavg_disc,
SUM(l_extendedprice)
ASsum_base_price,
SUM(l_extendedprice*
(1-l_discount))
ASsum_disc_price
FROMlineitem,orders
WHEREl_orderkey=o_orderkey
ANDo_orderpriority=’3-MEDIUM’
GROUPBYo_orderstatus
HAVINGMIN(l_quantity)>5
PRD1: $C3 < 5Select
Select
Select
o_orderkey,o_orderstatus,o_orderpriority
l_orderkey,l_quantity,l_extendedprice,l_discount
sum_qty,sum_disc_price,
o_orderstatus,sum_price,$c1/$c2 AS avg_disc,MIN(l_quantity) AS $c3 COUNT(l_discount) AS $c2,MIN(l_quantity) AS $c3
SUM(l_discount) AS $c1,
Groupby
lineitemorders
o_orderstatus,SUM(l_quantity) AS sum_qty,SUM(l_extendedprice) AS sum_price,
AS sum_disc_price
SUM(l_extendedprice * (1 - l_discount)),o_orderstatus
l_quantity, l_extendedprice, l_discount,
PRD2: L_ORDERPRIORITY = '3-MEDIUM'
PRD1: L_ORDERKEY = O_ORDERKEY
GRP: O_ORDERSTATUS
PRD1: -
–p.7/24
a.GenerationofMVCandidates

MQO
-
QGMtraversedbottomup
-
Identificationofcommon/similarsubexpressions
-
Generalizationofexpressionsifrequired
-
Compensationifdataneedstobeadjusted
*
Back-joins
*
Predicateadjustment
-
Rules:
*
Basetableboxesrefertosamedatasources
*
Expressionsmustbederivablefromgeneralized
expression
–p.8/24
b.GenerationofIndexCandidates

[
VZZ
+
00
]
-
SmartColumnEnumerationforIndexScans
(SAEFIS)–analysisofstatementpredicatesand
clauses
-
BruteForceandIgnorance(BFI)enumeration–
enumerationofallpossibleindiceswithupperbound
HowareMVsconsidered?
–p.9/24
c.StatsEstimationforMVs/Indices

MVs:
-
Size:rowwidthandcardinalityforMVs
*
Estimationbyoptimizer,or
*
Datasampling;morereliablesizeestimates

Indices:
-
Basedontableandcolumn(s)stats:
*
Numberofuniquevaluesinindex
*
B+-treelevels
*
Numberofleafandnon-leafpages
NoteindicesonMVswillbeestimatedonestimates
–p.10/24
d.MV/IndexSelection

Knapsackproblem

Obtaininitialreasonablesolutionbyrelaxingintegrality
constraint

O(nlogn)byorderingbydescendingbenefit/costweight
Weight(A)=
￿
q∈Q
f(q)∗B(q)
D(A)
B(q)=E(q)−M(q)
f(q):queryfrequency
B(q):performancechange
D(A):DiskspaceconsumedbyA
E(q):“Explain”plancostwithexistentMVsandindices
M(q):“Explain”plancostwithexistentandnewMVsandindices
NoteB(q)notspecifictoA
–p.11/24
d.MV/IndexSelection

CostofmaintainingMV:
-
Immediaterefresh:
*
Option1:Rowsaffected:Estimatednumberof
rowsforallMVs,estimatedfrequenciesand
numberofupdatedrowsforeachupdate
statement.
|MV|/|BaseTable|∗Cost
Update(BaseTable)?
|Updates|callstooptimizer
*
Option2:Timetoupdate:
foreachu∈Updatesdo
t1=Estimateexecutiontimeforu
foreachv∈MVsdo
t2=Estimateexecutiontimeforuwithv
Estimatedtimecostofv=t2-t1
Totalcallstooptimizer:|Updates|+|MVs|∗|Updates|
–p.12/24
d.MV/IndexSelection

CostofmaintainingMV:
-
Fullrefresh:
Weight(A)=
￿
q∈Q
f(q)∗B(q)
D(A)
−g(A)∗
C(A)
D(A)
g(A):timesAismaterializedinthetimeinterval,1
bydefault(Canitbededucedfromworkload?)
C(A):costofmaterializingA
–p.13/24
d.MV/IndexSelection
ADD
COMBINE

Knapsackrelaxedsolution
-
AddMVifindexselected

Iteraterandomswapping
-
8timeswithnoimprovement,or
-
untiltimelimit
IfMVswapped,itsindicesneedtobeswappedtoo;ifindexofMV
swappedin,thenMVmustbeaddedtoo
Compensatesfor:
-
B(q)isthesameforallMVsandindicesusedby
query
-
NowaytocomputeWeight(A)giventhatBnot
selected
-
Relaxationofintegralityconstraint
–p.14/24
e.Filtering

SelectionofMVsandindicesthatcompeteinsome
optimizedqueryplan(Optimizerselectsoneortheother)

Selectionsforotherdatabasesystemsinheterogeneous
DBMSenvironments
–p.15/24
Experiments

OLAPstarschemawithstats,nodata(nosampling)

15tables,400GBytes

Workload:12OLAPqueries,equalfrequency

Advisorlimitedtorun5minutes
–p.16/24
Experiments

Experiment1:
-
ImprovementsfromrecommendedMVs
-
ComparisonofbenefitfromMQOcandidatesversus
MQO+materializingworkload
-
Result:
*
MQO:30%improvement
*
MQO+Materializedqueries:similartoMQO
alone

Experiment2:
-
ImprovementofMVsandindices
-
89%improvement(versus30%withMVsonly)
–p.17/24
Summary/Contributions

Toolthatusesoptimizeritselftosuggestandevaluate
MVsandindicesgivenaworkloadanddiskspace
constraint

Simplealgorithmforfinding“good”solutionquickly
–p.18/24
OtherSystems

SQLServer[
ACN00
]
-
GenerationofMVCandidates
*
MVcandidatesgeneratedbysyntacticstructure
*
SingleblockMVsonly,noback-joins.HenceMVs
matchingqueriesinworkloadnotconsidered
*
MVformultiplequeriesviaMVmerging
-
StatsEstimation
*
Nosampling
-
MV/IndexSelection
*
Pruningbasedonbenefitthresholdsand
Greedy(m,k)algorithmaswithindices
–p.19/24
OtherSystems

Oracle10g[
DDD
+
04
,
DRS
+
05
]
-
AWR:AutomaticWorkloadRepository
*
Performancestatsandworkloadinfo
-
ADDM:AutomaticDatabaseDiagnosticEngine.
*
Classificationengine
*
Diagnosiseveryhourbydefault
-
Advisors:
*
SQLTuningAdvisor.Mayrecommend
runningSQLAccessAdvisor.
*
SegmentAdvisor
*
MemoryAdvisor
–p.20/24
OtherSystems

SQLAccessAdvisor:
-
RecommendswhatMVsandindicestocreate,drop,
orretain.
-
PL/SQLinterfaceaccessibleviaPL/SQL,GUIor
commandline
*
Createtask
*
Defineworkload
*
Generaterecommendation
*
Viewandimplementrecommendations
-
SQLProfiling
*
StatisticsaboutSQLstatements
*
Mayrequirerunningqueryonsampledata
*
Purposeistoreduceerrorsinvariouscost
estimates,improvingoptimizer’soperationand
recommendations.
–p.21/24
Comments/Observations

Matchingandgeneralizationofcommonsubexpressions

Experimentation:
-
Minimal
-
Nocomparisonwithoptimalsolution
-
Nodatasampling,justestimates(areimprovements
estimated,notcorroborated?)
-
Caseswhereitdoesnotworkwell(e.g.workloads
withtoomanyupdates?)

e.filtering:“usefulinheterogeneousDBMS
environments”

d.MV/Indexselection:“8iterations”
–p.22/24
Comments/Observations

InvalidSQLstatementsinsubexpressionexamples/Why
groupby?MQT1:SELECTD,EFROMSGROUPBYE

HowareinitialindicesonMVsgenerated?

NorecommendationswithregardstoMVrefreshinterval

AreincrementalrefreshesofMVspossible?

Whodeterminesifmaintenancecostshouldconsider
numberofrowsormaintenancetime?

“Advisorisabletooptimizelargeworkloadsofqueriesin
reasonableamountoftime”.Notsupported(forexample
withexperimentation).[
ZRL
+
04
]:“runtimegrows
exponentiallywithlinearincreaseoftheworkloadsize”.
SAP:30,000databaseobjects(includestablesand
indices).Built-inworkloadcompression.
–p.23/24
References
[ACN00]
SanjayAgrawal,SurajitChaudhuri,andVivekR.Narasayya.Automatedselectionof
materializedviewsandindexesinsqldatabases.InVLDB’00:Proc’softhe26th
InternationalConferenceonVeryLargeDataBases,pages496–505,2000.
[DDD
+
04]
BenoˆıtDageville,DineshDas,KarlDias,KhaledYagoub,MohamedZa
¨
ıt,andMohamed
Ziauddin.Automaticsqltuninginoracle10g.InVLDB,pages1098–1109,2004.
[DRS
+
05]
KarlDias,MarkRamacher,UriShaft,VenkateshwaranVenkataramani,andGraham
Wood.Automaticperformancediagnosisandtuninginoracle.InCIDR,pages84–94,
2005.
[LCPZ01]
WolfgangLehner,RobertaCochrane,HamidPirahesh,andMarkosZaharioudakis.fAST
refreshusingmassqueryoptimization.InProc’softhe17thInternationalConferenceon
DataEngineering,pages391–398,2001.
[VZZ
+
00]
GaryValentin,MichaelZuliani,DanielC.Zilio,GuyLohman,andAlanSkelley.Db2
advisor:Anoptimizersmartenoughtorecommenditsownindexes.InICDE’00:Proc’s
ofthe16thInternationalConferenceonDataEngineering,page101,2000.
[ZRL
+
04]
DanielC.Zilio,JunRao,SamLightstone,GuyM.Lohman,AdamStorm,Christian
Garcia-Arellano,andScottFadden.Db2designadvisor:Integratedautomaticphysical
databasedesign.InProc’softhe30thVLDBConference,pages1087–1097,2004.
–p.24/24