NoSQL with MySQL

boundlessbazaarΔιακομιστές

9 Δεκ 2013 (πριν από 3 χρόνια και 10 μήνες)

134 εμφανίσεις

NoSQL with MySQL

Yekesa Kosuru

Distinguished Architect, Nokia

Peter Zaitsev, Percona

Agenda


Part 1 : Problem statement


Part 2 : Background (What is, Why, ACID, CAP)


Part 3 : Building a Key
-
Value Store


Part 4 : NoSQL with MySQL


PART 1: PROBLEM STATEMENT

NoSQL : No Formal Definition


No Formal Definition


Not Only SQL ?


Don’t use SQL ?


Limit expressive power ?


Underlying issues ?


ACID (Strong Consistency ?, Isolation ?)


SQL


Something else ?


Facts


Hardware will fail


Software will have bugs


Growing traffic, data volumes, unpredictable


No downtime or expose errors to users

Problem to solve


Key Value Store Problem Statement


A consumer facing system with
foll
. characteristics :


Host large volumes of data & queries


Responsive with predictable performance


Always available (survives failures, software/hardware upgrades)


Low
cost of
ownership


Minimal administration


Availability : How to service customers (24 X 7) despite failures, upgrades


Scalability : How to scale (capacity, transactions, costs…)


Predictability : Predictable responsiveness


Not suited for all applications

PART 2: BACKGROUND

ACID


Transactions simplify job of
developers


Client/developer point of view


Atomicity
: Bundled
operations


All
of the operations (multiple updates, deletes, inserts) in the
transaction will complete, or none
will


Consistency
: The database is in a consistent state when the transaction
begins and
ends


Referential
integrity, and other constraints


All reads must
see latest
data


Isolation
: The transaction will behave as if it is the only operation being
performed upon the database
. Serialized updates
i.e

locks


Durability
: Upon completion of the transaction, the operation will not be
reversed
.


ACID in single database is efficient


Issues Scaling DBMS


Databases extend ACID to span multiple
nodes using 2PC


Shared disk & shared nothing architectures


Cost of 2PC (N nodes involved in transaction)


Consistency & Isolation over 2PC makes it expensive


3N+1 Message Complexity


N+1 Stable Writes


Locks


lock resources


Availability is a pain point


2PC is blocking protocol , uses single coordinator and not
fault
tolerant


C
oordinator failure can block transaction indefinitely


B
locked transaction causes increased contention


Rebalancing


Upgrades


Total Cost of Ownership


RequestCommit

Prepare

Prepare

Prepare

Prepare

Prepare

Prepare

Prepare

Commit

Prepare

Prepare

Prepare

Prepared

CAP


Introduced by Prof. Brewer in year 2000


C
onsistency,
A
vailability, (Network)
P
artition
Tolerance


Industry
proved that we can guarantee two out of
three


System point
of
view


Consistency

:
Defines
rules for the apparent order and visibility of
updates


Variations on consistency


Eventual Consistency
-

eventually replicas will be updated


order of
updates may
vary
-

all reads will eventually see
it


Availability
:
All clients can
read & write data to some replica, even in
the
presence of
failures


Partition
-
tolerance
:
operations will complete, even if network is
unavailable i.e disconnected
network

PART 3: BUILDING KEY
-
VALUE STORE

Important Considerations


Functional


Simple API ( Get, Put, Delete
-

like Hashtable)


Primary key lookups only (limit expressive power)


No

joins,
No

non
-
key lookups,
No

grouping functions,
No

constraints …


Data Model (Key
-
Value, Value = JSON)


Read
-
Your
-
Write Consistency


SLA


Highly Available & Horizontally Scalable


Performance (low latencies & predictable)


Uniform cost of queries


Operational


Symmetric, Rolling Upgrades, Backups, Alerts, Monitoring


Total Cost of Ownership (TCO)


Hardware, Software, Licenses …



Making it Available


Make N replicas of data


Use Quorum (R
+ W > N
)


Overlap R & W to make it strongly consistent


Handle Faults (partition tolerance and self heal)


Real
faults


Machine down, NIC down …


Eventual Consistency (data)


Prefer A & P in
CAP


If a node is down during write, read repair later


let write proceed


Intermittent faults


X and Y are clients of Z; X can talk to Z but Y can’t


Server does not respond within timeout, stress scenarios …


Inconsistent
cluster
view can cause inconsistencies
in data


If two concurrently writing clients have different views of cluster,
resolve inconsistency later
-

let write proceed


R + W > N

2 + 2 > 3

N = Number of replicas

R = Number of Reads

W

= Number of Writes


Making it Scalable


Sharding (or
partition data)
works well for Capacity


Large number of logical shards distributed over few physical machines


How to shard

-
Consistent Hashing

-
Hash(key) say “2”, move clockwise to find nearest node “B”

-
Hash to create a uniform distribution

-
Allows for adding of machines

-
Without redesign of hashing

-
Without massive movement of data

-
Load balancing

-
If a machine
can’t
handle load, move shards to another machine


A,B,C = nodes

1,2,3,4 = keys

Vector Clocks


Determine history of a key
-
value, think data versioning


Need to interpret causality of events, to identify missed updates & detect
inconsistencies


Timestamps are one solution, but rely on synchronized clocks and don't
capture causality


Vector clocks are an alternative method of capturing order in a distributed
system


Node (i) coordinating the write generates a clock at the time of write


Construct: ( nodeid : logical sequence)


Node A receives first insert :
A:1


Node B receives an update :
A:2


Clock: (
A:1
) is predecessor of (
A:1
, B:1)


Clock: (A:2) and (A:1, B:1) are
concurrent


Reads use vector clocks to detect missed updates & inconsistencies



3 replicas



Eventual Consistency Example 3/2/2



Server

Node
B

Application

Server

Node
A

Client

PUT

GET

V1
A
:1

Write

Write

Read

PUT

PUT

GET

PUT

Write

Server

Node
C

Write

V1
A
:1

V1
A
:1

V2
(A:1,
B
:1)

V2
(A:1,
B
:1)

Read

V1
A
:1

Read

Read
Repair

NODE 1 DOWN

NODE 1 UP

Detect Missing
Update

V2
(A:1,
B
:1)

V2
(A:1,
B
:1)

V2
(A:1,
B
:1)

V2
(A:1,
B
:1)

Outage

Upgrade

Missed update added

PART 4: NOSQL WITH MYSQL

NoSQL with MySQL


Key
-
Value store implemented on top of MySQL/ InnoDB


MySQL serves as persistence layer, Java code handles the rest


Atomicity limited to one operation (auto commit), Consistency & Isolation
limited to one machine (recall issues with 2PC)


Why MySQL ?


Easy to install, tune and restore


Fast, Reliable, Proven


Performance :


MVCC, row level locking


Buffer pool, insert buffering, direct IO


Data Protection & Integrity :


Automatic crash recovery


Backup, Recovery (full and incremental)


Leverage expertise across ACID and BASE systems


Implementation


InnoDB Engine (innodb_file_per_table)


Simple schema


Auto increment PK


App Key, Value


Metadata


Hash of app key (index)


Clustered Index and Secondary Index


KV use simple queries


no joins or grouping functions


Database is a shard (pick right number of shards)


Backup & Recovery

Tuning


Queries are so simple that tuning is focused on IOPS, Memory, TCP,
Sockets, Timeouts, Connection Pool, JDBC, JVM , Queuing etc


Memory to disk ratio


Connector/J

useServerPrepStmts=true


useLocalSessionState = true

cachePrepStmts = true

cacheResultSetMetadata = true

tcpKeepAlive=true

tcpTrafficClass=24

autoReconnect=true

maxReconnects=...

allowMultiQueries=true

prepStmtCacheSize=…

blobSendChunkSize=…

largeRowSizeThreshold=…

locatorFetchBufferSize=…


CentOS 5.5, XFS File System


Scalability Test (TPS/IOPS) EC2’s

0
500
1000
1500
2000
2500
3000
3500
0
20
40
60
80
100
120
TPS
50% LAT
90% LAT
99% LAT
Server IOPS
T

P

S

Client Threads

Characteristics


m1.xlarge image

User Dataset = 256GB
(240M 1K values)

Machines = 5

Spindles=4 per box

XFS File System

Buffer Pool=12GB/machine

Memory:Disk=1:4

Workload R/W=100/0

Random Access Pattern

N/R/W=1/1/1

Incremental load to 100
threads





Client

TPS

50.00%

90.00%

99.00%

99.50%

99.90%

Single Server

Threads



LAT

LAT

LAT

LAT

LAT

IOPS

1

114.4

8

16

24

28

52

26

2

226.88

8

16

26

30

63

48

4

437.37

9

17

29

34

63

96

5

600.1

8

14

27

32

64

107

10

1061.45

9

18

35

44

77

199

20

1730.69

10

23

52

66

118

307

50

2775.85

12

40

98

123

202

482

100

3207.18

15

78

224

274

407

571

Know Your IOPS

Disk

IOPS (random)





7.2k rpm SATA

90

10k rpm SAS

140

15k rpm SAS

180

Value SSD

400

Intel X25
-
M

1500

Intel X25
-
E

5000

DDR Drive x1

300000

Performance Test

0
0.1
0.2
0.3
0.4
0.5
0.6
1458
2393
3328
4263
5198
6132
7066
8000
8937
9863
50%
90%
99%
Characteristics


User Dataset = 900GB

Machines = 9

Spindles=8 (15K)/machine

XFS File System (RAID0)

Buffer Pool=16GB/machine

Memory:Disk
=1:6

Cache=16%

Workload R/W=100/0

Recency Skew

i.e Recent Access=80/20

N/R/W=3/2/2

Incremental load to 10K
(TPS)





S

E

C

S

TPS

Tips By Peter Zaitsev

Percona

Why XFS

File System

Threads(1)

Threads(16)

EXT3

RNDRD

243

582

RNDWR

219

218

Total

462

800

XFS

RNDRD

239

552

RNDWR

205

408

Total

444

960

MySQL for NoSQL


Tune for Simple Queries


Memory is the most important


innodb_buffer_pool_size


Are we looking at intensive writes ?


innodb_log_file_size


RAID with BBU


XFS + O_DIRECT (avoid per inode locking)


Restrict Concurrency on database


Connection pool size or

innodb_thread_concurrency

More on Scalability


Is contention problem


Use MySQL 5.5 or Percona Server


Tune innodb_buffer_pool_instances


Index update contention


Use Multiple Tables


Or MySQL Partitions if you can't


Eliminate Query Parsing Overhead


Consider HandlerSocket


NoSQL Interface to
InnoDB
Storage Engine

Work on Result Stability


Ensure you're getting stable results


InnoDB
Flushing can cause
non
uniform performance

YEKESA.KOSURU@NOKIA.COM


THANK YOU