Towards Self-Tuning Databases

basesprocketΔιαχείριση Δεδομένων

31 Οκτ 2013 (πριν από 3 χρόνια και 9 μήνες)

133 εμφανίσεις

데이터베이스의

끝없는

도전

-

Towards Self
-
Tuning Databases
-







wonlee@ece.skku.ac.kr

http://vldb.skku.ac.kr

성균관대

&

엑셈

월간

마이크로소프트웨어

창간

20
주년

기념

세미나

데이터베이스
(
가제
)

Contents


A Quick Overview of Database History


Why Self
-
Tuning Databases?



Database Tuning in the 21
st

century



Towards Self
-
Tuning Databases

A Quick Overview of Database History


File systems(50s)



Hierarchy/Network databases(60s)



Relational databases(70s)



Object databases(80s)



OR databases(90s)



XOR databases(00s)

A Quick Overview of Database History(2)


The state
-
of
-
the
-
arts functionalities of RDBMSs


A giantic black
-
hole of application intelligence


Developer


DB2, Oracle,

SQL Server

Active DB

(Trigger)

Spatial DB

Data Mining

EXCEL

MOLAP

ROLAP

(Cube)

Deductive DB

(Recursion)

Procedure

-
C/C++, Java, ..

Temporal DB

Queue

e
-
mail

Multimedia

File System

XML/Stream/

Bioinformatics/

Appl. Server/

Web Service ..

A Quick Overview of Database History(3)


The state
-
of
-
the
-
arts functionalities of RDBMSs


Rapidly growing internal optimization/tuning knobs


DBA




DB2, Oracle,

SQL Server

MVs

Partitioning

Buffer Mgmt

Bitmap Index

………
.

Extensibility

Cost
-
Based

Optimization

Automatic

Disk Mgmt

A Quick Overview of Database History(4)


Achievements of RDBMSs

1.
Relational model and the “Esperanto” SQL


“productivity”

2.
Query optimization


“can beat all but the best programmers”

3.
Transaction management


“ACID”



E. F. Codd gone! but Relational forever!!



A vision for database community


"My prediction is that eventually
all storage systems

will
evolve
to be database systems
."(Jim Gray, 2002)


A Quick Overview of Database History(5)

DB = IT Core Platform in the 21
st

Century???



ERP/CRM/SEM/BSC


DW/OLAP/Data Mining


Web Log Analysis


GIS/XML/Mobile


Bio
-
informatics


....


Ubiquitous Databases



A Quick Overview of Database History(6)


One big challenge for RDBMS


Simplicity is a big issue.

(Jim Gray, 2003)



Commercial DBMSs are exponentially complicated


Even 10
-
year veteran Oracle DBAs do not know 50% of all the
new features of Oracle’s latest release


Worsely, it would take more than 1 week for them to understand
the performance behavior of a single new feature


The gap between human DB learning curve and new features
becomes larger



“Crossing this chasm” seems to be impossible!!



It’s time for self
-
tuning intelligence!

What is DB Tuning?


Car tuning


Goals?


What?


How?

Why DB Tuning in the 21st Century?


DB performance influences all aspects of your business


DB performance management is one of the most important
success factors in almost every IT projects



미래

웹경쟁력은

데이터처리능력에”
(Scott McNealy, Sun CEO,

자신문
)



DB tuning is
not

a technical issue,
but

a business issue!



Improve your
productivity



Increase your
revenue/profit



Increase your
ROI(Return on Investment)

Why DB Tuning in the 21st Century?(2)

1.
More data
-

doubling every 9 month

2.
More users
-

from CEO to sales representatives

3.
More complex query
-

e.g. OLAP/data mining query

4.
Faster responses


e.g. web response time impacts revenue!


e.g. real time personalization






Data growth vs.
computer speedup



*
Moore’s Law

--

# of
transistors/chip doubles
every 18 months (1965)

Why DB Tuning in the 21
st

Century?(4)


Disk sales doubling every 9 months



Greg Papadopoulos, Sun CTO


Also from Winter VLDB survey (http://www.wintercorp.com
)



Time to process all your data doubles every 18 months!


Without DB tuning, you need
double
-
up

investment on DB
server and software
every 6 month
!



Max
imize your
DB

performance!! (Let’s
DBMax
)


Maximize your
PRODUCTIVITY


Maximize your
REVENUE/PROFIT


Maximize your
ROI

Why Self
-
Tuning Databases?


War of TCO(Total Cost of Ownership) in
21
st

century


Major costs of IT infra.: H/W, S/W, Human Cost


Moore’s law:
H/W, S/W cost



BUT! Human
cost



cf. 1950s: telephone industry


manual switching by operators


automatic switching system

Why Self
-
Tuning Databases?(2)

Blackbox Database Tuning

Self
-
Tuning: Targets

Server

Processes

Shared Pool

Database Buffer Cache


SGA

Redo Log

Buffer

Oracle Instance

Parameter File

Datafiles

Control Files

Redo Log Files

Oracle Database

User

Processes

SQL

Optimizer

SQL Exec.
Engine

SQL Exec.
Memory

Big Picture of a Simplied DBMS

Self
-
Tuning: Targets(2)


SQLs


Memory


Index/Materialized Views


Emerging fields

SQL Self
-
Tuning: Optimizer


SQL


best execution plan


access method + join order + join method


Rule
-
based optimizer vs.
Cost
-
based optimizer



The Selinger
-
style optmizer


P. Selinger et al, “Access path selection in a relational
database management systems,” SIGMOD 79


SYSTEM/R


A “MUST” reading for every SQL guys


cost estimation


dynamic programming


SQL Self
-
Tuning(2): Cost
-
Based Optimzation

Data

Dictionary

Query Optimizer

Single

Selectivity

Combined

Selectivity

Selectivity &

Cardinality

Estimation

Formula

Estimated

Cardinality

Cost Model

Estimated

Cost

RDBMS Instance

Runtime Memory Environment:

Buffer Cache Size,

Sort/Hash Memory etc.

Real Data

2. Plan Generator

P1, P2, .. , Pn

1. SQL

4. (
Estimated
)

Best Execution

Plan for SQL:

3. Cost Estimator

Pi

for each plan

estimated cost

SQL Self
-
Tuning(3): Cost
-
Based Optimzation


Cost
-
based optimization technique itself is a self
-
tuning
process, but it lacks 5%


80% of SQL: Best plan


15%: top 3 plan


5%: not so good



Fallacies of CBO

1.
Uniform distribution

2.
Attribute Independence

3.
Join Independence

SQL Self
-
Tuning(4): Cost
-
Based Optimzation


How to fix the fallacies of CBO?


Various approaches: who? when? based on what?


1.
Manual tuning

2.
LEO

3.
SOS [13]

4.
Dynamic reoptimization

5.
Parameteric query optimization

6.
….

SQL Self
-
Tuning(5): IBM DB2 LEO


LEO( LE
arning
O
ptimizer

)


DB part of IBM

SMART
: Self
-
Managing And Resource Tuning


(See IBM’s
IBM Autonomic Computing

)


SQL Self
-
Tuning(6): IBM DB2 LEO


eeed

SELECT * FROM X,Y,Z

WHERE X.PRICE>=100 AND Z.CITY=

Denver


AND Y.MONTH=

Dec


AND X.ID=Y.ID

AND Y.NR=Z.NR

GROUP BY A

Negative feed back!!!

SQL Self
-
Tuning(7): Oracle’s v$sql_plan_statistics


Runtime statistics monitor(Since Oracle9iR2)


You can compare optimizer’s estimation with real statistics


Then, you can find out where is the source of errors?

** Someday, Oracle and MS will come with LEO
-
like features



SKKU and XM are developing
SOS

tool



S
ave
O
ur
S
QL”


How to tune SQL (semi
-
)automatically?

1.
Find the negative feedback

2.
Drill down to the causes

3.
Analyze the effect

4.
Hint the SQL

Self
-
Tuning: Targets


SQLs


Memory



Index/Materialized Views


Emerging fields

Memory Self
-
Tuning


Complex queries use memory intensive operators


Sorting, hashing, bitmap index manipulation etc.


The amount of available memory greatly affects their
performance







However, there is only a finited amount of memory,
shared by all concurrent users

Response Time

Memory

.

.

1
-
pass

optimal

M
-
pass

Sort

M
1
-
pass

= sqrt(I x S x 2)

M
optimal

= I

Memory Self
-
Tuning(2): Oracle SQL Memory Mgmt.

Oracle Memory Model

Sort Area,

Hash Area,

Bitmap Index


Pre
-
Oracle9i


Fixed PGA size


Sort(Hash)_Area_Size



Oracle9i [8]


PGA_Aggregate_Target


Dynamically allocated
according to the needs

Memory Self
-
Tuning(3): Oracle SQL Memory Mgmt.


Manual vs. Automatic: Response time

Number of Users


Manual vs. Automatic: Total Memory Usage

Memory Self
-
Tuning(4): Oracle SQL Memory Mgmt.


DBA specifies PGA_AGGREGATE_TARGET


Oracle divides PGA memory for max. performance and
optimal memory usage

Memory Self
-
Tuning(5): Oracle SQL Memory Mgmt.





Local

Memory

Manager

WP
1

WP
2

WP
3

S
n

S
n

S
n

S
n

Memory Bound

Computation

Memory Target

Computation

Memory Usage

Statistics

SQL Statements

pga_aggregate_target

Active work areas profiles

SQL

Memory

Target

Memory

Bound

SMM Daemon

WP
4



WP
n

Self
-
Tuning: Targets


SQLs


Memory


Index/Materialized View(MV)s



Emerging fields

Index/MV Self
-
Tuning


Selection of Index/MVs:


“SQL Workloads + Available Disk Space


Best Index/MV sets”



Workload:
queries + updates


Constraints:
upper bound on storage space for indexes


Search:
pick a configuration that is of “lowest” cost for the given
database and workload



Industry cases


MS Autoadmin


IBM Design Advisor


Oracle OEM Index/Summary Advisor

Index/MV Self
-
Tuning(2): MS SQL Server


AutoAdmin
: Self
-
Tuning and
-
Administering
Databases[12]


“We achieve this by enabling databases
to track the usage of their
systems

and
to gracefully adapt to application requirements
. Thus,
instead of applications having to track and tune databases,
databases
actively auto
-
tunes itself to be responsive to application needs
.”



Workload driven physical database design



To reduce TCO(Total Cost of Ownership)

Index/MV Self
-
Tuning(3): MS SQL Server

Architecture of Index/MV Selection Tools

Issue
-

search space


Large search space for
indexes


Many columns to choose from


Kinds of indexes


Explosive search space for
MVs


Query optimizers use physical
design in novel ways


Physical design choices
interact

Workload

Syntactic structure

selection

Candidate

Index

Selection

Configuration

Enumeration


Final

Recommendation


Microsoft

SQL

Server











Configuration

Simulation

and Cost

Estimation

Module


Candidate

Materialized

View Selection

Self
-
Tuning: Targets


SQLs


Memory


Index/Materialized View(MV)s


Emerging areas for self
-
tuning

Emerging Areas for Self
-
Tuning

Server

Processes

Shared Pool

Database Buffer Cache


SGA

Redo Log

Buffer

Oracle Instance

Parameter File

Datafiles

Control Files

Redo Log Files

Oracle Database

User

Processes

SQL

Optimizer

SQL Exec.
Engine

SQL Exec.
Memory

Big Picture of a Simplied DBMS

Emerging Areas for Self
-
Tuning(2)


Automatic configuration of init. parameter[5]


e.g. IBM DB2 V7.2 “Configuration Advisor”


For 36 parameters, 98.4% of DB2 performance experts



Automating physical database layout[9]


“How database objects such as tables and indexes are assigned
to disk drives can impact the I/O performance of the system” (MS
SQL Server)


Automatic partitioning of tables under parallel databases (DB2)



Emerging Areas for Self
-
Tuning(3)


Accurate
storage device access cost

estimation[11]


CPU speed, disk I/O time, network, RAID rebuild


Inaccurate estimation cause (seriously) suboptimal query plan


Why wrong estimation?


storage or network may undergo heavy load or failure


older/newer than the h/w the optimizer assume


“We can achieve noticeable performance improvements by
providing query optimizer with
accurate and timely information
about the current status of their storage devices



Emerging Areas for Self
-
Tuning(4)


System tuning[1]


Most urgent, but only ad
-
hoc or rule
-
of
-
thumb approaches with some
monitoring tools: Bizmax/MaxGuage, Orange/Chakra, Spotlight, Patrol,
OEM






Only the black box consulting


Highly dependent on consultant or DBA experience


Not scientic/engineering realm!!



We need to develop
mathematical model

for the interactions among
various system tuning knobs; Or, we need to construct and share
the
rule/knowledge base

of the system tuning experts

Monitoring

Drill
-
down

to Bottleneck

Root Cause

Analysis

Tuning

Conclusion


Next decade will be self
-
tuning intelligence era


Better than average DBAs



BUT, You may be still the MVP for IT ROI
only if


“Crossing the chasm between you and your DBMS technology”


Know your data


Know your application workloads


Know your DBMS intelligence

Let’s

DBMax
!

References

1.
Gerhard Weikum et al., “Self
-
Tuning Database Technology and Information
Services: from Wishful Thinking to Viable Engineering” VLDB 2002, Ten
-
Year Award

2.
Gerhard Weikum et al, “The COMFORT Automatic Tuning Project”,
Information Systems 19(5), 1994

3.
Gerhard Weikum et al, “Special Issues on Self
-
Tuning Database and
Applications,” Data Engineering Bulletine, Jun., 1999

4.
IBM’s Autonomic Computing Perspectives:
http://www.resarch.ibm.com/autonomic/manifesto/autonomic_computing.p
df

5.
Guy Lohman et al, “Toward Automomic computing with DB2 Universal
Database,” SIGMOD Record, Sep., 2002

6.
MS AutoAdmin Homepage:
http://www.research.microsoft.com/

7.
Oracle Manageability:
http://technet.oracle.com/products/manageability/content.html

References(2)



8.
Benoit Dageville and Mohamed Zait, “SQL Memory Management in
Oracle9i,” VLDB 2002

9.
Jun Rao, Chun Zhang, Guy Lohman et al., “Automating Physical Database
Design in a Parallel Database,” SIGMOD 2002

10.
S. Agrawal and Surajit Chaudhuri et al., “Automating Layout of Relational
Databases,” ICDE 2003

11.
F. R. Reiss and T. Kanungo, “A Characterization of the Sensitivity of Query
Optimization to Storage Access Cost Parameters,” SIGMOD 2003

12.
S. Agrawal and Surajit et al., “Automated Selection of Materialized Views
and Indexes for SQL Databases,” VLDB 2000

13.
Michael Stillger, Guy M. Lohman et al., “LEO
-

DB2's LEarning Optimizer,”
VLDB 2001