SQL AND NOSQL ARE TWO SIDES OF THE

sunfloweremryologistData Management

Oct 31, 2013 (3 years and 7 months ago)

82 views

© 2012 Microsoft

SQL AND
NOSQL

ARE TWO SIDES OF THE
SAME COIN


Michael Rys, Microsoft Corp.

@
SQLServerMike

Strata 2012 Conference, March 2012

AGENDA


Scaling out your business is important!


NoSQL

Paradigms and
NoSQL

Platforms


SQL learns from
NoSQL


(with a demo of SQL
Azure
Federations)


NoSQL

learns from SQL


Scalable Data Processing Platform of the Future


Online

Business

Application

Attract Individual
Consumers:

-
Provide
interesting
service

-
Provide mobility

-
Provide social


Monetize Individual:

-
Upsell service

-
VIP

-
Speed

-
Extra
Capabilities

Monetize the Social:

-
Improve individual
experience

-
Re
-
sell Aggregate Data
(e.g., Advertisers)

THE WEB 2.0 BUSINESS ARCHITECTURE

SOCIAL
NETWORKING
:
THE BUSINESS PROBLEM


100s of million of users


10s of million of users concurrently


Terabytes to petabytes
of data


Structured and unstructured


Required (eventual) data
consistency across
users


E.g. show
your updated
state in
your
friends’ profile pages

SOLUTION


Shard/Partition user data
across
hundreds
to
thousands
of SQL Databases


Propagate data changes from one DB to other
DBs using reliable,
async

Message Service


Managing routes from each DB to every other DB
would be too complex


Global Transactions would hinder scale and
availability


Provide a caching layer for performance


And also used for

o
Clean
-
up
state (e.g. on account close)

o
Deploy business logic (stored procedures)



EXAMPLE ARCHITECTURE

1
-
1000

3001
-
4000

2001
-
3000

1001
-
2000

4001
-
5000

5001
-
6000

I change


my status


userId
=1024


Web Tier

Data Tier

Service

Dispatcher

Async

Message

Async

Message

TX2

My DB

gets updated

TX1

Async

Message

TX3

TX4

TX5

MANY LARGE SCALE CUSTOMERS USING SIMILAR PATTERNS


Patterns


Sharding

and reliable messaging


Sharding

and fan/out query layer


Caching layer



Customer Examples


Social Networking: Facebook, MySpace,
etc


Online electronic stores (cannot give names

)


Travel reservation systems (e.g. Choice International)


MSN Casual Gaming


etc.


LESSONS LEARNED FROM THESE SCENARIOS


Require

high availability


Be able to scale out:


Functional and Data Partitioning Architecture


Provide scale
-
out processing:

o
Function shipping

o
Fanout

and Map/Reduce processing


Be able to deal with failures:

o
Quorum

o
Retries

o
Eventual Consistency (similar to Read
-
consistent Snapshot Isolation)


Be able to quickly grow and change:


Elastic scale


Flexible, open schema


Multi
-
version schema support


Move better support for these patterns into the Data Platform!

WHAT IS
NOSQL

ABOUT?



NoSQL

= operational and developer agility at low
CapEx

and
OpEx
!



Low Cost


Free

Open Source Stores


Scale
CapEx

cost
below customer growth
rate


Web friendly
developer model and tool chain



Processing
Paradigms


High Availability
(scalable Replication, Fast Failover, DR/
GeoDR
, tunable latency)


Scale
-
out

(
Sharding
, Map
-
Reduce, Elasticity)


Performance

(tuned for specific workloads, Caching, co
-
located compute with partitioned state)


Tunable/Eventual Consistency



Data Model Paradigms


Data first: Flexible Schema


Low
-
impedance mismatch
between programming and data model:

o
Key
-
Documents and Objects (BLOBS, JSON, XML, POJO)

o
Key
-
Wide Sparse Column Sets

o
Graphs (e.g., RDF)



Range from devices, over OLTP Web 2.0 applications to
BigData

Analytics

DATA MODELS

Data Model

Example Stores (apologies to the ones I did

not list)

Simple Key
-
Value Pairs

Memcache, Redis, Dynamo, Voldermort,
LevelDB
,

Azure Caching

Wide Sparse Column Sets

HyperTable
, Big Table, Cassandra, HBASE,
Hyperbase
, Amazon
DynamoDB
,
Windows Azure Tables, SQL Server/Azure Sparse
columns

BLOBs

Amazon S3, Oracle Berkeley
NoSQL
,
Windows Azure Blob
Store, SQL Server RBS/
FileTable

JSON Documents

MongoDB,
CouchBase
, Riak, RavenDB

Graph

Neo4J, GraphDB,
HypergraphDB
,
Stig
,
Intellidimension

Objects and XML Documents

Versant, Oracle

Berkeley
NoSQL
,
MarkLogic
,
existDB
, EMC
HiveDB
,
SQL Server/Azure
, Oracle, IBM DB2

Extended Relational

Oracle, EMC
SQLFire
, IBM DB2, MySQL,
Postgres
,
SQL
Server/Azure

WHAT CAN SQL LEARN FROM
NOSQL
?


Low
CapEx
, Low
OpEx



Built
-
in
tunable High
-
Availability


Data scale
-
out (
Sharding
)


Processing scale
-
out (Map
-
Reduce, Fan
-
Out, tunable consistency)


Flexible Data Models


JSON (& XML) support


Sparse columns/Column sets


Integrate with
BigData

Analytics (e.g.,
Hadoop
)


Many Relational Database Systems are
incorporating these learning!

EXAMPLE: SQL AZURE FEDERATIONS


Provides Data Partitioning/
Sharding

at the Data Platform


Enables applications to build elastic scale
-
out applications


Provides non
-
blocking SPLIT/DROP for shards (MERGE to come later)


Auto
-
connect to right shard based on
sharding

keyvalue


Provides SPLIT resilient query mode


SQL AZURE FEDERATION CONCEPTS

16

Federation


Orders_Fed


Sharded

Application

Azure DB with
Federation
Root

Federation Directories, Federation Users,
Federation Distributions, …


Federation


Represents the data being
sharded


Federation
Root


Database
that
logically houses federations, contains
federation meta data


Federation Key


Value that determines the routing of a piece of data
(defines a Federation Distribution)


Atomic Unit


All rows with the same federation
key
value: always
together!


Federation
Member (aka Shard)


A physical container for a
set

of federated tables for
a specific key range and reference tables


Federated Table


Table that contains only atomic units for the
member’s key range



Reference Table


Non
-
sharded

table

Member: PK [min, 100)

Member: PK [100, 488)

Member: PK [488, max)

(Federation Key
:
CustomerID
)

AU

PK=5

AU

PK=25

AU

PK=35

AU

PK=105

AU

PK=235

AU

PK=365

AU

PK=555

AU

PK=2545

AU

PK=3565

Connection
Gateway

DEMO

MAP
-
REDUCE SCALE
-
OUT OVER SQL
AZURE FEDERATIONS


Sharded

GamesInfo

table using SQL Azure Federations


Use a C# library that does implement a Map/Reduce
processor on top SQL Azure Federations


Mapper and Reducer are specified using SQL


17

WHAT CAN NOSQL LEARN FROM SQL?


Flexible data is good, but:


Provide optional schema in data platform to help with constraints and optimizations


Procedural Scale
-
Out
processing is good, but:


Develop a declarative language suited for and across the data models (e.g.,
coSQL
)


Standardize suitable abstractions and languages


Eventual Consistency is good, but:


Provide users the choice


Simple
Queries are good, but:


Provide me with secondary indexes


it will be more efficient to join between two collections of JSON documents in the
query engine than in the Application layer


Many
NoSQL

Database
Systems are
starting to incorporate these
learnings
!


Online

Business

Application

Attract Individual
Consumers:

-
Provide
interesting
service

-
Provide mobility

-
Provide social


Monetize Individual:

-
Upsell service

-
VIP

-
Speed

-
Extra
Capabilities

Monetize the Social:

-
Improve individual
experience

-
Re
-
sell Aggregate Data
(e.g., Advertisers)

THE WEB 2.0 BUSINESS ARCHITECTURE

Primary
Shard

Readable
Replica

Readable
Replica

Primary
Shard

Readable
Replica

Readable
Replica

Primary
Shard

Readable
Replica

Readable
Replica

OLTP Workloads


Highly Available

High Scale

High Flexibility


mostly touching 1
to low number of
shards

Dynamic OLAP Workloads


3Vs (Volume, Velocity, Variety)
Exploratory


Scale
-
out queries, often using
eventual consistent scale
-
out
frameworks like
Hadoop

SCALE
-
OUT DATA PLATFORM ARCHITECTURE

Traditional OLAP Workloads

known schema

Data warehouse, “Star joins”

Copy

Query

SQL or
NoSQL

Store

BIG DATA REQUIRES AN END
-
TO
-
END APPROACH

21

CALL TO ACTION


Familiarize yourself with the
NoSQL

genes in the Microsoft Online Platform


Free 3
-
Month Trial for Windows and SQL Azure:
http
://
www.windowsazure.com




Engage with us throughout Strata











Download slides with additional information and
r
elated resources:

http://.../....

22

Presentation

Speaker

Date

and Time

Do We Have the Tools We Need to Navigate
the New World of Data?


Dave Campbell

2/29 9:00am PST

Onsite Interview *

Tim O’Reilly, Dave Campbell

2/29 10:15am PST

Unleash Insights on All Data With Microsoft
Big Data

Alexander Stojanovic

2/29 11:30am PST

Office Hours (Q&A session)

Dave Campbell

2/29 1:30pm PST

Hadoop + Javascript: What We Learned

Asad Khan

2/29 2:20pm PST

Democratizing BI at Microsoft: 40,000 Users
and Counting

Kirkland Barrett

3/1 10:40am PST

Data Marketplaces For Your Extended
Enterprise

Piyush Lumba

3/1 2:20pm PST

APPENDIX

23

RELATED RESOURCES


Scale
-
Out with SQL Databases


http
://gigaom.com/cloud/facebook
-
shares
-
some
-
secrets
-
on
-
making
-
mysql
-
scale
/



Windows Gaming Experience Case Study:
http
://
www.microsoft.com/casestudies/Case_Study_Detail.aspx?CaseStudyID=4000008310



Scalable
SQL:
http://
cacm.acm.org/magazines/2011/6/108663
-
scalable
-
sql


http://
www.slideshare.net/MichaelRys/scaling
-
with
-
sql
-
server
-
and
-
sql
-
azure
-
federations



NoSQL

and the Windows Azure
Platform


Whitepaper:

http
://
download.microsoft.com/download/9/E/9/9E9F240D
-
0EB6
-
472E
-
B4DE
-
6D9FCBB505DD/Windows%20Azure%20No%20SQL%20White%20Paper.pdf



SQL Federation
blog:

http
://
blogs.msdn.com/b/cbiyikoglu/archive/2011/03/03/nosql
-
genes
-
in
-
sql
-
azure
-
federations.aspx



Contact
me


@
SQLServerMike


http://
sqlblog.com/blogs/michael_rys/default.aspx