Introduction to Big Data


1

Introduction to Big Data and NoSQL

SQL Azure Saturday

April 21, 2012

Don Demsak

Advisory Solutions Architect

EMC Consulting

www.donxml.com


2

Meet Don


Advisory Solutions Architect


EMC Consulting


Application Architecture, Development & Design


DonXml.com, Twitter: donxml


Email - don@donxml.com


SlideShare - http://www.slideshare.net/dondemsak


3

The era of Big Data

4

How did we get here?


Expensive


Processors


Disk space


Memory


Operating Systems


Software


Programmers


Monoculture


Limit CPU cycles


Limit disk space


Limit memory


Limited OS Development


Limited Software


Programmers


Mono-lingual


Mono-persistence

5

Typical RDBMS Implementations


Fixed table schemas


Small but frequent reads/writes


Large batch transactions


Focus on ACID


Atomicity


Consistency


Isolation


Durability

6

How we scale RDBMS implementations

7

1st Step


Build a relational database

Database

8

2nd Step


Table Partitioning

Database

p1 p2 p3

9

3rd Step


Database Partitioning

[Diagram: three separate stacks, one per customer (Customer #1, #2, #3), each with its own Browser, Web Tier, B/L Tier, and Database]

10

4th Step


Move to the cloud?

[Diagram: three separate stacks, one per customer (Customer #1, #2, #3), each with its own Browser, Web Tier, B/L Tier, and SQL Azure Federation]

11

There have to be other ways

12

Polyglot Persistence

13

Polyglot Programmer

14

15

Where Did NoSQL Originate?


1998 - Carlo Strozzi


NoSQL project - lightweight open-source relational DB with no SQL interface


2009 - Eric Evans (Rackspace) & Johan Oskarsson (Last.fm) wanted to organize an event to discuss open-source distributed databases

16

NoSQL (loose) Definition


(often) Open source


Non-relational


Distributed


(often) don’t guarantee ACID

17

Atlanta 2009


No:sql(east) conference


select fun, profit from real_world where relational=false


Billed as “conference of no-rel datastores”



18

Types Of NoSQL Data Stores

19

5 Groups of Data Models

Relational

Document

Key Value

Graph

Column Family

20

Document Store


Apache Jackrabbit


CouchDB


MongoDB


SimpleDB


XML Databases


MarkLogic Server


eXist


21

Document?


Okay think of a web page...


Relational model requires column/tag


Lots of empty columns


Wasted space


Document model just stores the pages as is


Saves on space


Very flexible.


22

Graph Storage


AllegroGraph


Core Data


Neo4j


DEX


FlockDB


Microsoft Trinity (research project)


http://research.microsoft.com/en-us/projects/trinity/

23

What’s a graph?


Graph consists of


Nodes (‘stations’ of the graph)


Edges (lines between them)


FlockDB


Created by the Twitter folks


Nodes = Users


Edges = Nature of relationship between nodes.


24

Key/Value Stores


On disk


Cache in RAM


Eventually Consistent


Weak Definition


“If no updates occur for a period, eventually all updates will
propagate through the system and all replicas will be consistent”


Strong Definition


“for a given update and a given replica eventually either the
update reaches the replica or the replica retires”


Ordered


Distributed Hash Table allows lexicographical processing

25

Key/Value Examples


Azure AppFabric Cache


Memcached


VMware vFabric GemFire


26

Object Databases


db4o


GemStone/S


InterSystems Caché


Objectivity/DB


ZODB

27

Tabular


BigTable


Mnesia


HBase


Hypertable


Azure Table Storage


SQL Server 2012

28

Azure Table Storage Demo

29

Big Data

30

Big Data Definition


Volumes & volumes of data


Unstructured


Semi-structured


Not suited for Relational Databases


Often utilizes MapReduce frameworks

31

Big Data Examples


Cassandra


Hadoop


Greenplum


Azure Storage


EMC Atmos


Amazon S3


SQL Azure (with Federations support)

32

Real World Example


Twitter


The challenges


Needs to store many graphs


Who you are following


Who’s following you


Who you receive phone notifications from, etc.


Delivering a tweet requires rapid paging of followers


Heavy write load as followers are added and removed


Set arithmetic for @mentions (intersection of users).

33

What did they try?


Started with Relational Databases


Tried Key-Value storage of denormalized lists


Did it work?


Nope


Either good at


Handling the write load


Or paging large amounts of data


But not both

34

What did they need?


Simplest possible thing that would work


Allow for horizontal partitioning


Allow write operations to


Arrive out of order


Or be processed more than once


Failures should result in redundant work


Not lost work!

35

The Result was FlockDB


Stores graph data


Not optimized for graph traversal operations


Optimized for large adjacency lists


List of all edges in a graph


Key is the edge; value is a set of the node end points


Optimized for fast reads and writes


Optimized for page-able set arithmetic.
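
As a rough illustration of what "page-able set arithmetic" over adjacency lists can look like, the sketch below intersects two sorted lists of follower IDs one page at a time. It assumes the lists are sorted and in memory; `intersectPage`, `cursor`, `pageSize`, and the sample follower lists are hypothetical names and data for this example, not FlockDB's API.

```javascript
// Sketch only: return one page of the intersection of two sorted ID lists.
// A real store would seek to the cursor instead of scanning from the start.
function intersectPage(a, b, cursor = -Infinity, pageSize = 2) {
  let i = 0;
  let j = 0;
  // Skip every ID already returned on earlier pages.
  while (i < a.length && a[i] <= cursor) i++;
  while (j < b.length && b[j] <= cursor) j++;

  const page = [];
  // Classic merge-style walk over two sorted lists.
  while (i < a.length && j < b.length && page.length < pageSize) {
    if (a[i] === b[j]) {
      page.push(a[i]);
      i++;
      j++;
    } else if (a[i] < b[j]) {
      i++;
    } else {
      j++;
    }
  }

  // A null cursor tells the caller that no further page is available.
  const exhausted = i >= a.length || j >= b.length;
  const nextCursor = exhausted || page.length === 0 ? null : page[page.length - 1];
  return { page, nextCursor };
}

// Hypothetical data: who follows both accounts?
const followersOfAlice = [1, 3, 5, 7, 9, 11];
const followersOfBob = [3, 4, 5, 9, 10, 11];
const first = intersectPage(followersOfAlice, followersOfBob);
const second = intersectPage(followersOfAlice, followersOfBob, first.nextCursor);
```

The caller only has to remember the last ID it saw, so a page can be requested from any partition replica without server-side session state.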


36

How Does it Work?


Stores graphs as sets of edges between nodes


Data is partitioned by node


All queries can be answered by a single partition


Write operations are idempotent


Can be applied multiple times without changing the result


And commutative


Changing the order of operands doesn’t change the result.
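
One common way to get writes that are both idempotent and commutative is to stamp each write with a timestamp and let the newest write per edge win. The sketch below assumes that approach; the function names, the record shape, and the tie-break rule (a removal beats an addition at the same timestamp) are illustrative assumptions, not FlockDB's actual implementation.

```javascript
// Sketch only: timestamped edge writes that can be replayed or reordered safely.
function applyWrite(edges, write) {
  const key = `${write.from}->${write.to}`;
  const current = edges.get(key);
  const newer = current === undefined || write.ts > current.ts;
  // Assumed tie-break: on equal timestamps a removal beats an addition.
  const tieWonByRemoval =
    current !== undefined &&
    write.ts === current.ts &&
    write.state === "removed" &&
    current.state === "added";
  if (newer || tieWonByRemoval) {
    edges.set(key, { state: write.state, ts: write.ts });
  }
  return edges;
}

// Order-independent view of the store, used to compare outcomes.
function snapshot(edges) {
  return [...edges].map(([key, e]) => `${key}:${e.state}@${e.ts}`).sort();
}

const writes = [
  { from: 1, to: 2, state: "added", ts: 1 },
  { from: 1, to: 2, state: "removed", ts: 3 },
  { from: 1, to: 2, state: "added", ts: 2 },
  { from: 1, to: 3, state: "added", ts: 1 },
];

// Apply once in order, then apply every write twice in reverse order.
const inOrder = writes.reduce(applyWrite, new Map());
const replayedReversed = [...writes, ...writes].reverse().reduce(applyWrite, new Map());
```

Both runs end in the same state, which is what lets a failed job simply be retried: the cost is redundant work, not lost or corrupted work.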


37

Working With Big Data

38

ACID


Atomicity


All or Nothing


Consistency


Valid according to all defined rules


Isolation


No transaction should be able to interfere with another
transaction


Durability


Once a transaction has been committed, it will remain
so, even in the event of power loss, crashes, or errors


39

BASE


Basically Available


High availability but not always consistent


Soft state


Background cleanup mechanism


Eventual consistency


Given a sufficiently long period of time over which no
changes are sent, all updates can be expected to
propagate eventually through the system and all the
replicas will be consistent.


40

Traditional (relational) Approach

Extract

Transform

Load

Transactional Data Store

Data Warehouse

41

Big Data Approach


MapReduce Pattern/Framework


an Input Reader


Map Function


To transform to a common shape (format)


a partition function


a compare function


Reduce Function


an Output Writer
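
To make the pieces listed above concrete, here is a minimal single-process sketch that wires them together. It is not a real framework: `runMapReduce` and its option names are hypothetical, the "input reader" is just an array, and the "output writer" is just the returned array.

```javascript
// Sketch only: the MapReduce stages run in one process over in-memory data.
function runMapReduce({ input, map, partition, compare, reduce, partitions }) {
  // One Map of key -> emitted values per partition (the "shuffle").
  const buckets = Array.from({ length: partitions }, () => new Map());

  // Input reader + map function: each record may emit any number of pairs.
  for (const record of input) {
    map(record, (key, value) => {
      const bucket = buckets[partition(key, partitions)];
      if (!bucket.has(key)) bucket.set(key, []);
      bucket.get(key).push(value);
    });
  }

  // Compare function orders keys within a partition; reduce folds each key's values.
  const output = [];
  for (const bucket of buckets) {
    for (const key of [...bucket.keys()].sort(compare)) {
      output.push([key, reduce(key, bucket.get(key))]);
    }
  }
  return output; // stands in for the output writer
}

// Count tags across a few hypothetical documents.
const tagCounts = runMapReduce({
  input: [{ tags: ["dog", "cat"] }, { tags: ["cat"] }, { tags: ["mouse", "cat", "dog"] }],
  map: (doc, emit) => doc.tags.forEach((tag) => emit(tag, 1)),
  partition: (key, n) => [...key].reduce((sum, ch) => sum + ch.charCodeAt(0), 0) % n,
  compare: (a, b) => (a < b ? -1 : a > b ? 1 : 0),
  reduce: (key, values) => values.reduce((sum, v) => sum + v, 0),
  partitions: 2,
});
```

In a distributed framework each partition would be reduced on a different machine; the partition function is what guarantees that all values for a given key land in the same place.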

42

MongoDB Example

    // map function: emit each tag on a document with a count of 1
    m = function () {
        this.tags.forEach(function (z) {
            emit(z, { count: 1 });
        });
    };

    // reduce function: sum the counts emitted for one tag
    r = function (key, values) {
        var total = 0;
        for (var i = 0; i < values.length; i++)
            total += values[i].count;
        return { count: total };
    };

    // execute: run over the "things" collection, writing results to "myoutput"
    res = db.things.mapReduce(m, r, { out: "myoutput" });



43

MongoDB Demo

44

Big Data on Azure


Azure Table Storage


Azure Service Bus


SQL Azure Federations


MongoDB on Azure


http://www.mongodb.org/display/DOCS/MongoDB+on+Azure


Hadoop on Azure


https://www.hadooponazure.com/

45

Using Azure for Computing

[Diagram labels: Client, Master, Job/Task Scheduler, Worker (x3), Data (x4)]

46

Moving to Event Based Architecture

[Diagram: Web Roles send requests (Req) to a Queue that is drained by Worker Roles]

Monitor queue length against user’s expectations
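
One way to read "monitor queue length against user's expectations" is as a sizing rule: given the backlog and how long a message takes to process, how many workers are needed to drain the queue within the wait users will tolerate? The sketch below assumes that reading. The function, its parameters, and the default limits are all hypothetical; it does not call any Azure API.

```javascript
// Sketch only: estimate how many worker roles are needed to clear the backlog in time.
function workersNeeded(queueLength, secondsPerMessage, targetWaitSeconds, limits = { min: 1, max: 20 }) {
  // Total work in the queue divided by the time we are allowed to take.
  const ideal = Math.ceil((queueLength * secondsPerMessage) / targetWaitSeconds);
  // Clamp so we never scale to zero or beyond an assumed budget.
  return Math.min(limits.max, Math.max(limits.min, ideal));
}

const idle = workersNeeded(0, 2, 30); // empty queue
const busy = workersNeeded(120, 2, 30); // 240 s of work, 30 s allowed
const flooded = workersNeeded(10000, 2, 30); // more work than the budget allows
```

A monitoring job could evaluate this periodically and adjust the worker role instance count, while the web roles keep accepting requests regardless of how busy the workers are.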

47

Aggregate Stores

48

Visualizing Aggregates

ID: 1001

Customer: Ann

Line Items

Product ID   Qty   Price   Total
32411234     2     $48     $96
707423234    1     $56     $56
125145       1     $24     $24

Payment Details

Card: AmEx

CC#: 12343

Expiration: 07/2015


Orders

Customers

Order Lines

Credit Cards

49

Visualizing Aggregates


{
  "SalesOrdersView": {
    "ID": 1001,
    "Customer": "Ann",
    "LineItems": [],
    ...
  }
}
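
The sketch below shows how rows from the four relational tables (Orders, Customers, Order Lines, Credit Cards) could be folded into a single aggregate document like the one above. The values come from the slide, but the field names, the customer ID, the foreign keys, and the reading of the first line-item column as a product ID are assumptions made for illustration.

```javascript
// Sketch only: relational rows with hypothetical keys linking them together.
const customers = [{ id: 1, name: "Ann" }];
const orders = [{ id: 1001, customerId: 1 }];
const orderLines = [
  { orderId: 1001, productId: 32411234, quantity: 2, price: 48 },
  { orderId: 1001, productId: 707423234, quantity: 1, price: 56 },
  { orderId: 1001, productId: 125145, quantity: 1, price: 24 },
];
const creditCards = [{ orderId: 1001, card: "AmEx", number: "12343", expiration: "07/2015" }];

// Join everything that belongs to one order into a single nested document.
function buildSalesOrderView(orderId) {
  const order = orders.find((o) => o.id === orderId);
  if (!order) throw new Error(`Unknown order ${orderId}`);
  const customer = customers.find((c) => c.id === order.customerId);
  const payment = creditCards.find((c) => c.orderId === orderId);
  return {
    SalesOrdersView: {
      ID: order.id,
      Customer: customer.name,
      LineItems: orderLines
        .filter((line) => line.orderId === orderId)
        .map(({ productId, quantity, price }) => ({
          productId,
          quantity,
          price,
          total: quantity * price,
        })),
      PaymentDetails: {
        card: payment.card,
        number: payment.number,
        expiration: payment.expiration,
      },
    },
  };
}

const view = buildSalesOrderView(1001);
```

Once stored this way, the whole order can be read or written with a single key lookup, at the cost of duplicating data that a relational schema would keep in one place.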

50

MongoDB on Azure Demo

51

Next Steps


Learn a NoSQL product


Great place to start


AppFabric Cache, Azure Table Storage, MongoDB


Pick a new programming language to learn


Not Java or C#/VB


Node.js, JavaScript, F#




52

THANK YOU