presentation slides (PPT) - Database Research Group - University of ...

musicincurableData Management

Jan 31, 2013 (4 years and 5 months ago)

154 views

Database Virtualization:

A New Frontier for Database
Tuning and Physical Design

Ahmed A. Soror Ashraf Aboulnaga


Kenneth Salem

David R. Cheriton School of Computer Science

University of Waterloo

SMDB 2007

Virtualization and Databases


The virtual machine monitor
(VMM)


A software abstraction layer
that partitions a hardware
platform into one or more
virtual machines. (VMware,
Virtual PC, Xen,..)



2

Ahmed A. Soror



Server Consolidation



Security



Manageability



Virtual Appliances

Virtualization Design

“ Given database workloads,
W
1
, …,
W
N
that will run
on
N
database systems inside
N

virtual machines,
how should we allocate the available resources
R

to
the
N

virtual machines to get the best overall
performance
?”

3

Ahmed A. Soror

Outline


Virtualization and Databases


Virtualization Design


Solution Approach


Optimizer Calibration


Experiments


Conclusions

4

Ahmed A. Soror

Solution Approach


To solve this problem,
we need:


A search algorithm for
enumerating candidate
resource allocations


A method for evaluating
the cost of a candidate
solution


5

Ahmed A. Soror

Cost Model for Virtualization


Cost modelling is an interesting part of the
story


We need a cost model that


Models query execution


Virtualization aware


Throughput oriented


Use the
query optimizer
in a new
what
-
if
mode


Need optimizer calibration

6

Ahmed A. Soror


Example:

PostgreSQL Optimizer


Resource modelling Parameters


CPU


cpu_tuple_cost


cpu_operator_cost


cpu_index_cost



Memory


shared_buffers


work_mem


effective_cache_size


I/O


random_page_cost


sequential_page_fe
tch_time

Need to estimate these
parameters for different
VM settings
.

7

Ahmed A. Soror


Use simple queries on synthetic data that perfectly
matches optimizer assumptions (fixed length fields,
uniform and independent data,..)


SELECT * FROM R


Execution time = fetched_pages + nTuples *
cpu_tuple_cost



SELECT max(A) FROM R


Execution time = fetched_pages + nTuples *

cpu_tuple_cost


+

nTuples *
cpu_operator_cost


Similar approach for other parameters


PostgreSQL:
cpu_tuple_cost

8

Ahmed A. Soror

PostgreSQL:
cpu_tuple_cost

0
0.002
0.004
0.006
0.008
0.01
0.012
0.014
0.016
0
20
40
60
80
100
cpu_tuple_cost


allocated CPU%

20
40
50 Mem
60
80
9

Ahmed A. Soror

Experimental setup


Machine


Two 2.8GHz Intel Xen CPUs (HT)


4GB of memory


SUSE Linux 10.0 / Xen 3.0.2


Database


4GB (with indexes) TPC
-
H (OSDB)


Two workloads running in two virtual machines


What do I want to see?


The optimizer ability to find better allocation than the default
one (50% mem, 50% CPU)

Ahmed A. Soror

10

And so, it works ..

11

Ahmed A. Soror

Conclusions

"Resource virtualization can provide many benefits to database
systems. But it also introduces new tuning and physical design
problems that are of interest to researchers in self
-
managing
database systems."




Virtualization Design


Formulation


Cost modeling




Next Steps


Inaccuracies in the optimizer cost model


The calibration procedure


Pruning the search space


Static versus dynamic


Optimizing for response time

Ahmed A. Soror

12