Hypothetical I dnexes on PostgreSQL - PUC-Rio

arizonahoopleΔιαχείριση Δεδομένων

28 Νοε 2012 (πριν από 4 χρόνια και 7 μήνες)

288 εμφανίσεις

Hypothetical
Id
I
n
d
exes on
PostgreSQL
PostgreSQL
This
tutorial
shows
the
usage
of
hypothetical
indexes
on
PostgreSQL
Hypothetical
This
tutorial
shows
the
usage
of
hypothetical
indexes
on
PostgreSQL
.
Hypothetical
indexeswerefirstdiscussedin[Frank,Omiecinski,Navathe,92]
Hypotheticalindexesaresimulatedindexstructurescreatedsolelyinthedatabase
catalo
g
.Thist
yp
eofindexhasno
p
h
y
sicalextensionand,therefore,cannotbeused
g
yp
py
toanswerqueries.Themainbenefitistoprovideameansforsimulatinghowquery
executionplanswouldchangeifthehypotheticalindexeswereactuallycreatedinthe
database.ThusthisfeatureisusefulfordatabasetunersandDBAs.
Index
selection
tools
such
as
Microsoft

s
SQL
Server
Index
Tuning
Wizard
make
Index
selection
tools
,
such
as
Microsofts
SQL
Server
Index
Tuning
Wizard
,
make
useofhypotheticalindexesinthedatabaseservertoevaluatecandidateindex
configurations.
WehavemadesomeserverextensionstoPost
g
reSQL9.0.1toincludethenotionof
g
hypotheticalindexesinthesystem.Wehaveintroducedthreenewcommands:
createhypotheticalindex,drophypotheticalindexandexplain
hypothetical.TodownloadtheHypotheticalPlugincode,clickhere
.
Begin Tutorial
Begin Tutorial
Hypothetical
Id
I
n
d
exes on
PostgreSQL
PostgreSQL
Thistutorialusesanexampledatabasethatstoresproduct(produto)and
Product(Produto)
Sale (Venda)
sale(venda)informationforanenterprise.Thedatabaseconsistsoftwo
relations:
Product

(Produto)
num
–int
descricao –varchar(50)
num
–int
prodNum –int
data

date
Number of tuples: 4
data

date
qtd –int
valor –numeric(10,2)
Numberoftuples:400000
Number

of

tuples:

400000
Actual indexes are created for all of the tables’ primary keys. Scripts to create
the enterprise database can be found here
.
NextPrevious
Back to Beginning
Hypothetical
Id
I
n
d
exes on
PostgreSQL
PostgreSQL
Select prodNum, data, sum(valor) as total
fromvenda
The following query is very frequently issued by the enterprise application:
from

venda

where valor > 1500000 and
data between '2004-01-01' and '2004-01-31'
group by prodNum, data;
Letstakealookatitsqueryexecutionplanusingthe
explain
statement:
Lets

take

a

look

at

its

query

execution

plan

using

the

explain

statement:
explain
select prodNum, data, sum(valor) as total
from venda
hl1500000d
wh
ere va
l
or >
1500000
an
d
data between '2004-01-01' and '2004-01-31'
group by prodNum, data;
NextPrevious
Back to Beginning
Hypothetical
Id
I
n
d
exes on
PostgreSQL
PostgreSQL
QueryExecutionPlan
HashAggregate (cost=9778.97..9779.11 rows=11 width=18)
-> Seq Scan on venda (cost=0.00..9759.00 rows=2663 width=18)
Filter: ((valor > 1500000::numeric) AND (data >= '2004-01-01'::date) AND
(data <= '2004-01-31'::date))
Asequentialscanwaschosenbytheplannertoaccessthevendatable.Perhaps
wecouldimprovethisbycreatinganindexonthevaloranddatacolumns.
NextPrevious
Back to Beginning
Hypothetical
Id
I
n
d
exes on
PostgreSQL
PostgreSQL
Althoughwecouldbenefitfromtheexistenceofanindexonthevaloranddata
columns,weshouldbecarefultocreateit.Firstly,wedonotknowiftheDBMSwill
actuallychoosetouseanindexinthevaloranddatacolumnsifitexists.Secondly,if
wetrytocreateanactualindexinthesecolumns,theDBMSwillpreventwritersfrom
accessing
the
table
So
it
is
hard
to
experiment
with
new
indexes
and
evaluate
how
accessing
the
table
.
So
it
is
hard
to
experiment
with
new
indexes
and
evaluate
how
goodtheyare.
Insteadofincurringtheburdenofcreatinganactualindexonthecolumns,wecould
simulateifthisindexwouldbeusefultothedatabase.Todothat,wecreateitasa
hypothetical
index
:
hypothetical
index
:
create hypothetical index hi_venda_valor_data
on venda (valor, data);
ThecreatehypotheticalindexcommandalsoexistsinotherDBMSs,butwitha
differentsyntax.Forexample,seethesyntaxproposedforSQLServerin[Chaudhuri,
Narasayya,98]
NextPrevious
Back to Beginning
Hypothetical
Id
I
n
d
exes on
PostgreSQL
PostgreSQL
The hypothetical index is not actually materialized in the database. Therefore, we will not
incur in heavy creation costs or obtain locks on the underlying table to create it. The
DBMS, however, cannot use the hypothetical index to answer a user query. If we query
thedatabaseagainorusethe
explain
statementthesystemwillstilluseasequential
the

database

again

or

use

the

explain
statement
,
the

system

will

still

use

a

sequential

scan to access the employeetable.
We can see how the DBMS would behave if the hypothetical index were materialized
using the explain hypotheticalstatement:
explain hypothetical
select prodNum, data, sum(valor) as total
from venda
wherevalor>1500000and
where

valor

>

1500000

and
data between '2004-01-01' and '2004-01-31'
group by prodNum, data;
NextPrevious
Back to Beginning
Hypothetical
Id
I
n
d
exes on
PostgreSQL
PostgreSQL
QueryExecutionPlan
HashAggregate (cost=3976.05..3976.19 rows=11 width=18)
>BitmapHeapScanonvenda(
cost=107226395608rows=2663
width=18)
-
>

Bitmap

Heap

Scan

on

venda

(
cost=1072
.
26
..
3956
.
08

rows=2663

width=18)
Recheck Cond: ((valor > 1500000::numeric) AND (data >= '2004-01-01'::date)
AND (data <= '2004-01-31'::date))
-> Bitmap Index Scan on hi_venda_valor_data (cost=0.00..1071.60
Iftheindexhivendavalordatawasmaterialized
,
theDBMSwoulduseitto
rows=2663 width=0)
Index Cond: ((valor > 1500000::numeric) AND (data >= '2004-01-01'::date)
AND (data <= '2004-01-31'::date))
___
,
processthequery.Theestimatedcosttoprocessthequerywoulddropfrom
9778.97..9779.11usingthesequentialscanto3976.05..3976.19usingtheindex
scan.
NextPrevious
Back to Beginning
Hypothetical
Id
I
n
d
exes on
PostgreSQL
PostgreSQL
Now that we know that the index is beneficial to performance, we can drop the
hypothetical index and create a corresponding actual one:
drop hypothetical index hi_venda_valor_data;
create index i_venda_valor_data on venda ( valor, data );
NextPrevious
Back to Beginning
Hypothetical
Id
I
n
d
exes on
PostgreSQL
PostgreSQL
Lets check the query execution plan for the query with the actual index created:
explain
select prodNum, data, sum(valor) as total
from venda
where valor > 1500000 and
data between '2004-01-01' and '2004-01-31'
group by prodNum, data;
NextPrevious
Back to Beginning
Hypothetical
Id
I
n
d
exes on
PostgreSQL
PostgreSQL
QueryExecutionPlan
HashAggregate (cost=3880.04..3880.18 rows=11 width=18)
-> Bitmap Heap Scan on venda (cost=976.25..3860.07 rows=2663 width=18)
Recheck Cond: ((valor > 1500000::numeric) AND (data >= '2004-01-01'::date)
AND (data <= '2004-01-31'::date))
->
BitmapIndexScanonivendavalordata(cost
=
0.00
..975.59
The
cost
estimated
by
the
planner
for
the
query
using
the
hypothetical
index
was

Bitmap

Index

Scan

on

i
_
venda
_
valor
_
data

(cost0.00
..975.59

rows=2663 width=0)
Index Cond: ((valor > 1500000::numeric) AND (data >= '2004-01-
01'::date) AND (data <= '2004-01-31'::date))
The
cost
estimated
by
the
planner
for
the
query
using
the
hypothetical
index
was
3976.05..3976.19.Withtheactualindex,theplannergaveusanestimateof
3880.04..3880.18.Costestimatesforhypotheticalindexestendtobeconservative,
butalwaysclosetothecostofusingtheactualindex.
NextPrevious
Back to Beginning
Hypothetical
Id
I
n
d
exes on
PostgreSQL
PostgreSQL
Estimationsmadeforh
yp
otheticalindexestendto
p
roduce
yp
p
conservativecostvalues.Thecostvaluesarebiggerthanthose
verifiedforthecorrespondingactualindexes.Thishappensbecause
wehavemadesomeapproximationstoestimatetheindexsize.
W
it
th
ttl
b
f
tl
t
i
th
id
b
th
W
eapprox
i
ma
t
e
th
e
t
o
t
a
l
num
b
ero
f
t
up
l
espresen
t
i
n
th
e
i
n
d
ex
b
y
th
e
numberoftuplespresentinthetablebeingindexed.Wealso
approximatethenumberofpagesintheindexbythenumberofpages
inthetable.
Usingtheseestimates,thequeryoptimizertendstoconsidertheindex
biggerthanitwouldactuallybeifmaterialized.Thismeansthatthe
costcalculationsmadeforthisindexwillproducehigherI/Ofigures
thanthecalculationsforthecorres
p
ondin
g
actualindex.
pg
Apositiveconsequenceofthispolicyisthatwewillneverrecommend
anindexthatwillnotbechosenbytheoptimizerwhenmaterialized.
Previous
Next
Back to Beginning
Hypothetical
Id
I
n
d
exes on
PostgreSQL
PostgreSQL
Thatendsourtutorial.Wehopethetutorialhasbeenusefulforyouto
understandhowhypotheticalindexescanbeusedtosimulateindex
configurationsforthedatabase.Oneimportantfacttonoticeisthatthe
additionofhypotheticalindexesdoesnotimpactpreviouslyexisting
applications.Thenewfeatureisaimedprimarilyatdatabasetuners
andDBAs.
After
implementing
the
server
extensions
for
hypothetical
indexes
the
After
implementing
the
server
extensions
for
hypothetical
indexes
,
the
nextlogicalstepistoimplementautomaticindexselectiontoolsand
algorithmsforthePostgreSQLdatabase.Wearecurrentlydoingthat
atPUC-Rio.Oneinterestingresearchprototypeweobtainedisa
software
agent
written
in
C++
that
can
be
integrated
to
the
DBMS
in
software
agent
,
written
in
C++
,
that
can
be
integrated
to
the
DBMS
in
ordertomakeindexselectionandcreationtotallyautonomic.
Previous
Next
Back to Beginning
Hypothetical
Id
I
n
d
exes on
PostgreSQL
PostgreSQL
Ifyouwouldlikemoredetailsonhowallofthisstuffwasimplemented,
emailabrito@inf.puc-rio.br
.Youcanalsocontactourresearchgroup
headsergio@inf.puc-rio.br
.
Thankyouforyourinterest!☺
Previous
Back to Beginning