Goal: Build a custom, high-performance OLTP database


The End of an Architectural Era (It's Time for a Complete Rewrite)

M. Stonebraker, S. Madden, D. J. Abadi, S. Harizopoulos, N. Hachem, and P. Helland

VLDB, 2007

Presented by: Suprio Ray

The I/O Gap

Disk capacity doubles every 18 months

Memory size doubles every 18 months

Disk bandwidth doubles every 10 years

(R. Freitas et al., FAST 2008)

Memory (latency) is ~6000 times faster than disk

Avoid accessing disk (if possible)

One size does not fit all

OLTP
- Amazon: 42 TB
- Typical: less than a TB

Data warehouse
- Yahoo: 2 PB
- eBay: 1.4 PB

Search engines (text)
- Google: 850 TB

Scientific
- US Department of Energy (NERSC): 3.5 PB

Stream processing

Goal: Build a custom, high-performance OLTP database

Overview

- Motivation
- OLTP overheads
- System architecture
- Transaction management
- Evaluation
- Conclusion and discussion



Database System Architecture

(Diagram.) Query processing: an SQL query goes through the Parser, then the Query Rewriter and Optimizer (producing relational algebra, consulting Statistics, Catalogs & System Data), yielding a query execution plan for the Execution Engine. Transaction management: read/write calls from transactions go through the Transaction Manager, the Buffer Manager, the Concurrency Controller (with its Lock Table), and the Recovery Manager (with its Log), down to the Data + Indexes.

OLTP Overheads

Logging
- Must be written to disk for durability

Locking
- To read or write a record

Latching
- Updates to shared data structures

Buffer management
- Caching disk pages in memory

Design considerations to remove overheads

Optimization                  | Advantage
Memory-resident database      | Removes buffer management
Partitioning and replication  | High availability; removes logging
Single-threaded execution     | Removes locking and latching
Transaction variants          | Removes concurrency control

H-Store system architecture

Shared-nothing, main-memory, row-store relational database

Node
- hosts 1 or more sites

Site
- single-threaded
- one site per core

Relation
- divided into one or more partitions, or cloned

Partition
- replicated and hosted on multiple sites
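The partition/replica layout above can be sketched minimally. This is illustrative only: the modulo-hash scheme, the function names, and the replica table are assumptions, not H-Store's actual catalog.

```python
# Each relation is split into partitions; each partition is replicated on
# several single-threaded sites (one site per core). Modulo hashing is one
# possible scheme; H-Store's deployment framework picks the actual layout.

def partition_for(key: int, n_partitions: int) -> int:
    """Map a tuple's partitioning key to a partition id."""
    return key % n_partitions  # simple modulo hashing, for illustration

# Hypothetical layout: partition id -> sites hosting a replica of it.
replica_sites = {
    0: [0, 2],  # partition 0 replicated on sites 0 and 2
    1: [1, 3],  # partition 1 replicated on sites 1 and 3
}

def sites_for(key: int) -> list[int]:
    """All sites holding a replica of the partition that owns this key."""
    return replica_sites[partition_for(key, len(replica_sites))]
```

Reads may go to any one of the returned sites; updates must go to all of them (see replica synchronization under transaction management).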

Runtime model

Stored procedure interface for transactions
- Unique name
- Control and SQL commands

SQL command execution
- annotate the execution plan
- pass it to the Transaction Manager
- plans are transmitted to the sites involved
- results are passed back to the initiator

System deployment

Cluster deployment framework (CDF) accepts
- a set of stored procedures
- the database schema
- a sample workload
- the available sites

CDF produces
- a set of compiled stored procedures
- a physical DB layout

Transaction variants

Single-sited
- All queries can be executed on just one node

One-shot
- Individual queries can each be executed on a single node

Two-phase
- Phase 2 can be executed without integrity violations

Strongly two-phase
- Either all replicas continue or all abort

Sterile
- Order of execution doesn't matter

Transaction management

Replica synchronization
- Read any replica; update all replicas

Transaction ordering
- Each transaction is timestamped with (site_id, local_unique_timestamp)

Concurrency control considerations
- OLTP transactions are very short-lived
- Single-threaded execution avoids page latching
- Not needed for some transaction classes (single-sited / one-shot / sterile)
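The composite timestamp above can be sketched as a small value type. The choice to compare the local timestamp first and break ties by site id (so that the pair yields a cluster-wide total order), and all naming, are my assumptions for illustration.

```python
from dataclasses import dataclass
from itertools import count

@dataclass(frozen=True, order=True)
class Timestamp:
    # Tuple comparison: order by local timestamp, break ties by site id,
    # giving every transaction in the cluster a unique, totally ordered stamp.
    local_unique_timestamp: int
    site_id: int

class Site:
    """Each single-threaded site stamps the transactions it initiates."""
    def __init__(self, site_id: int):
        self.site_id = site_id
        self._next = count()  # monotonically increasing local counter

    def stamp(self) -> Timestamp:
        return Timestamp(next(self._next), self.site_id)
```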

Concurrency control strategy

Basic strategy
- Wait a small time for conflicting transactions with a lower timestamp
- If none are found, execute the subplan and send the result
- Else, issue an abort

Intermediate strategy
- Wait for a length of time approximated by MaxD * average_round_trip_message_delay

Advanced strategy
- If needed, abort a transaction using optimistic concurrency control rules

Evaluation: experimental setup

Benchmark: a variant of TPC-C
- all transaction classes made one-shot and strongly two-phase
- all transaction classes implemented as stored procedures

Databases
- H-Store
- a popular commercial RDBMS, "X"

Hardware
- Dual-core 2.8 GHz system
- 4 GB RAM
- 4 x 250 GB SATA disk drives


Evaluation: results

Metric: Transactions/second per core

H-Store is 82 times faster than X

(Chart.)
Database             | Transactions/sec per core
H-Store              | 35,000
X                    | 425
X (without logging)  | 1,250
Best TPC-C *         | 1,000

* performance record published by TPC-C
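A quick check that the chart's per-core numbers match the 82x claim:

```python
h_store_tps = 35_000  # H-Store, transactions/sec per core (from the chart)
x_tps = 425           # commercial RDBMS "X"

print(round(h_store_tps / x_tps))  # 82
```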

H-Store limitations

- The database must fit into the available memory
- A cluster-wide power failure causes the loss of committed transactions
- A limited subset of SQL '99 is supported
- DDL operations like ALTER and DROP aren't supported
- Challenging operations model: changing the schema or reconfiguring hardware requires first saving and shutting down the system
- No WAN support (single data center)
- In case of a network partition, some queries will not execute

Conclusion

- Demise of the general-purpose database (prediction)
- H-Store is a custom, main-memory database optimized for OLTP
- H-Store shows a significant performance advantage over a popular relational database

Discussion

Raw speed vs. ease of use
- Limited DDL support; changing the schema or nodes requires a reboot

"Separation of concerns"
- Is it a good idea to embed application logic in stored procedures?

Custom vs. general-purpose query language
- Should SQL be replaced with Ruby on Rails?

No WAN support: single data-center assumption
- CAP theorem
- Catastrophic failure scenario