MDC - Data Migration

scacchicgarden

Dec 13, 2013

MongoDB

MongoDB (from "humongous") is a scalable, high-performance, open-source, document-oriented database written in C++.

Home: http://www.mongodb.org/

Support: http://www.10gen.com/

Production deployments: http://www.mongodb.org/display/DOCS/Production+Deployments

Agenda

Getting Up to Speed with MongoDB (key features)

Developing with MongoDB (start, shutdown, connect, query, DML)

Advanced Usage (indexes, aggregation, GridFS)

Administration (admin, replication, sharding)

MISC (BSON; internals)

Getting Up to Speed with MongoDB

Key Features of MongoDB
1. Document-oriented
2. Schema-free

Design Philosophy

Different from relational databases: easy and simple administration and development, with no transactions, no relations (joins), no single-server durability, and no SQL.

Different from key-value databases: tons of functionality, including indexing, aggregation (MapReduce etc.), fixed-size collections, file storage, and replication.

Getting Up to Speed with MongoDB

Key Features of MongoDB

1. Document-oriented (each document holds multiple key/value pairs)

{ "_id" : ObjectId("4d9a2fa7640cde2b218c6f65"), "version" : 1300363306, "evenOrOdd" : 0, "siteId" : 0 }

{ "_id" : ObjectId("4d9cd50d2a98297726eeda5b"), "prefix" : "craft fl", "res" : { "sug" : [ "craft flowers" ], "categories" : [ [ 14339, "Crafts" ] ] } }

2. Hierarchy: one instance holds many databases, each database holds many collections, and each collection holds many documents (instance 1:N database 1:N collection 1:N document).

3. Schema-free: documents with completely different shapes can live in the same collection.

use test
db.test.insert({ "version" : 1300363306, "evenOrOdd" : 0, "siteId" : 0 })
db.test.insert({ "prefix" : "craft fl", "res" : { "sug" : [ "craft flowers" ], "categories" : [ [ 14339, "Crafts" ] ] } })
db.test.insert({ "name" : "binzhang" })
db.test.ensureIndex({ "name" : 1 })

Getting Up to Speed with MongoDB

Design Philosophy

1. Databases are specializing: the "one size fits all" approach no longer applies. MongoDB sits between in-memory key-value stores and persistent relational databases.

2. By reducing the transactional semantics the database provides, one can still solve an interesting set of problems where performance is very important, and horizontal scaling then becomes easier. The simpler, the faster.

3. The document data model (JSON/BSON) is easy to code to, easy to manage (schemaless), and yields excellent performance by grouping relevant data together internally, at the cost of some extra space.

4. A non-relational approach is the best path to database solutions that scale horizontally to many machines. Scaling out is easy for applications that are not too complex.

5. While there is an opportunity to relax certain capabilities for better performance, there is also a need for deeper functionality than that provided by pure key/value stores.

Getting Up to Speed with MongoDB

Different from relational databases

1. Easy and simple administration/development: kill -INT shuts an instance down cleanly.
2. No transactions
3. No relations (no joins)
4. No single-server durability: kill -9 can corrupt the database, which then needs to be repaired at the next startup.
5. No SQL: MongoDB comes with a JavaScript shell that allows interaction with a MongoDB instance from the command line.

Getting Up to Speed with MongoDB

Different from key-value databases

1. Datatypes: null, boolean, 32-bit integer, 64-bit integer, 64-bit floating-point number, string, ObjectId, date, regular expression, code, binary data, undefined, array, embedded document, etc.
2. Indexing: unique indexes, compound indexes, geospatial indexing, etc.
3. Aggregation (MapReduce etc.): group, distinct, etc.
4. Fixed-size collections: capped collections are fixed in size and are useful for certain types of data, such as logs.
5. File storage: a protocol for storing large files; it uses subcollections to store file metadata separately from content chunks.
6. Replication: includes master-slave mode and replica-set mode.
7. Security: simple authorization.

Agenda

Getting Up to Speed with MongoDB (Summary: document-oriented and schema-free)

Developing with MongoDB (start, shutdown, connect, query, DML)

Advanced Usage (indexes, aggregation, GridFS)

Administration (admin, replication, sharding)

MISC

Developing with MongoDB

Start MongoDB
Connect
Query
DML (create, insert, update, delete, drop)
Stop cleanly

Developing with MongoDB

Start MongoDB

mkdir -p /MONGO/data01 /MONGO/log01

/opt/mongo/bin/mongod --logpath /MONGO/log01/server_log.txt --logappend --fork --cpu --dbpath /MONGO/data01 --replSet autocomplete

Fri Apr 1 14:37:08 [initandlisten] MongoDB starting : pid=10799 port=27017 dbpath=/MONGO/data01 64-bit
Fri Apr 1 14:37:08 [initandlisten] db version v1.8.0, pdfile version 4.5
Fri Apr 1 14:37:08 [initandlisten] git version: 9c28b1d608df0ed6ebe791f63682370082da41c0
Fri Apr 1 14:37:08 [initandlisten] build sys info: Linux bs-linux64.10gen.cc 2.6.21.7-2.ec2.v1.2.fc8xen #1 SMP Fri Nov 20 17:48:28 EST 2009 x86_64 BOOST_LIB_VERSION=1_41
Fri Apr 1 14:37:08 [initandlisten] waiting for connections on port 27017
Fri Apr 1 14:37:08 [websvr] web admin interface listening on port 28017

Developing with MongoDB

Connect to mongod

$ /opt/mongo/bin/mongo
MongoDB shell version: 1.8.0
connecting to: test
autocomplete:PRIMARY> exit
bye

usage: /opt/mongo/bin/mongo [options] [db address] [file names (ending in .js)]

Developing with MongoDB

MongoDB Shell

MongoDB comes with a JavaScript shell that allows interaction with a MongoDB instance from the command line.

Query: find()

db.c.find() returns everything in the collection c.
db.users.find({"age" : 27}) returns documents where the value for "age" is 27.
db.users.find({}, {"username" : 1, "email" : 1}) returns only the "username" and "email" keys (plus "_id", which is returned by default).
db.users.find({}, {"fatal_weakness" : 0}) never returns the "fatal_weakness" key.
db.users.find({}, {"username" : 1, "_id" : 0}) returns "username" and suppresses "_id".
db.users.find({"age" : {"$gte" : 18, "$lte" : 30}}) matches ages from 18 to 30, inclusive.
db.raffle.find({"ticket_no" : {"$in" : [725, 542, 390]}}) matches any of the listed ticket numbers.
db.c.find({"z" : {"$in" : [null], "$exists" : true}}) matches documents where "z" exists and is null.
db.users.find({"name" : /joe/i}) matches names containing "joe", case-insensitively.
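To make the selector semantics above concrete, here is a toy matcher in plain JavaScript (the shell's own language). It is a sketch under simplifying assumptions, not MongoDB's query engine: `matches`, `find` and the sample `users` array are hypothetical, only the operators used on this slide are modeled, and nested documents, arrays and cross-type ordering are ignored.

```javascript
// Toy in-memory matcher (hypothetical, NOT MongoDB's query engine) supporting
// plain equality, $gte, $lte, $in, $exists and regular expressions.
function matchesCondition(value, cond) {
  if (cond instanceof RegExp) {
    // {"name" : /joe/i}: the field must be a string that matches the pattern
    return typeof value === "string" && cond.test(value);
  }
  const isOperatorObject =
    cond !== null && typeof cond === "object" && !Array.isArray(cond) &&
    Object.keys(cond).length > 0 &&
    Object.keys(cond).every((k) => k.startsWith("$"));
  if (!isOperatorObject) {
    return value === cond; // {"age" : 27}: plain equality
  }
  // Several operators on one field must all hold, e.g. {"$gte" : 18, "$lte" : 30}
  return Object.entries(cond).every(([op, arg]) => {
    switch (op) {
      case "$gte": return value !== undefined && value !== null && value >= arg;
      case "$lte": return value !== undefined && value !== null && value <= arg;
      // A missing field counts as null here, so {"$in" : [null]} also matches it
      case "$in": return arg.includes(value === undefined ? null : value);
      case "$exists": return (value !== undefined) === arg;
      default: throw new Error("operator not supported by this sketch: " + op);
    }
  });
}

// Every key of the query must match (an implicit AND across fields)
function matches(doc, query) {
  return Object.entries(query).every(([key, cond]) => matchesCondition(doc[key], cond));
}

function find(collection, query) {
  return collection.filter((doc) => matches(doc, query));
}

const users = [
  { name: "Joe", age: 17 },
  { name: "joey", age: 25 },
  { name: "mary", age: 30, z: null },
  { name: "bob", age: 31 },
];

console.log(find(users, { age: { $gte: 18, $lte: 30 } }).map((u) => u.name)); // joey, mary
```

Note how `{"$in" : [null]}` alone would match every user here (a missing "z" counts as null), which is why the slide pairs it with `"$exists" : true`.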

Developing with MongoDB

Behind find(): Cursor

The database returns results from find using a cursor.

The client-side implementations of cursors generally allow you to control a great deal about the eventual output of a query.

> for (i = 0; i < 100; i++) {
...     db.c.insert({x : i});
... }
> var cursor = db.c.find();
> while (cursor.hasNext()) {
...     obj = cursor.next();
...     // do stuff
... }

> var cursor = db.people.find();
> cursor.forEach(function(x) {
...     print(x.name);
... });
adam
matt
zak
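Behind `hasNext()`/`next()`, a client cursor holds one batch of results locally and only goes back to the server (a "getmore") when that batch is used up. The sketch below models that loop in plain JavaScript; `ToyCursor` and the batch size of 30 are hypothetical, since real batch sizes are chosen by the server and driver.

```javascript
// Toy client-side cursor (hypothetical class, not a driver API). Results arrive in
// batches: the first fetch models the initial query reply, later fetches model
// "getmore" requests for the next batch.
class ToyCursor {
  constructor(results, batchSize) {
    this.results = results;     // stands in for the server-side result set
    this.batchSize = batchSize;
    this.position = 0;          // index of the next result the "server" will send
    this.buffer = [];           // batch currently held on the client
    this.roundTrips = 0;        // initial query + getmores
  }

  fetchBatch() {
    this.buffer = this.results.slice(this.position, this.position + this.batchSize);
    this.position += this.buffer.length;
    this.roundTrips += 1;
  }

  hasNext() {
    // Only go back to the "server" when the local batch is used up
    if (this.buffer.length === 0 && this.position < this.results.length) {
      this.fetchBatch();
    }
    return this.buffer.length > 0;
  }

  next() {
    if (!this.hasNext()) throw new Error("cursor exhausted");
    return this.buffer.shift();
  }

  forEach(fn) {
    while (this.hasNext()) fn(this.next());
  }
}

// Same data as the shell loop above: 100 documents {x : 0} .. {x : 99}
const docs = Array.from({ length: 100 }, (_, i) => ({ x: i }));
const cursor = new ToyCursor(docs, 30);
let sum = 0;
cursor.forEach((doc) => { sum += doc.x; });
console.log(sum, cursor.roundTrips); // 4950 4
```

With 100 results and batches of 30, the loop needs one initial fetch plus three getmores.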

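Developing with MongoDB

Behind find(): Cursor (continued)

Getting consistent results?

A fairly common way of processing data is to pull it out of MongoDB, change it in some way, and then save it again:

cursor = db.foo.find();
while (cursor.hasNext()) {
    var doc = cursor.next();
    doc = process(doc);   // process() is your own application function
    db.foo.save(doc);
}

If a save makes a document grow so that it has to be moved, the cursor can return the same document again later. Calling snapshot() on the query prevents this, so each document is returned only once:

var cursor = db.myCollection.find({country:'uk'}).snapshot();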

Developing with MongoDB

Create collection (a collection is created implicitly by the first insert into it)
db.foo.insert({"bar" : "baz"})

Insert
db.foo.insert({"bar" : "baz"})

Update
db.users.update({"_id" : ObjectId("4b253b067525f35f94b60a31")},
... {"$set" : {"favorite book" : "war and peace"}})

Delete
db.users.remove()    removes every document in users (the collection and its indexes remain)
db.mailing.list.remove({"opt-out" : true})    removes only the matching documents

Drop collection
db.foo.drop();

Developing with MongoDB

DML (continued): Safe Operation

1. By default, MongoDB does not wait for a response when writing to the database. Use the getLastError command to ensure that operations have succeeded.

2. Many drivers can invoke the getLastError command automatically when saving and updating in "safe" mode (some drivers call this setting the "write concern").

db.$cmd.findOne({getlasterror : 1})
db.runCommand("getlasterror")
db.getLastError()

Developing with MongoDB

Stop MongoDB

kill -2 10014 (SIGINT) or kill 10014 (SIGTERM), where 10014 is the mongod process ID. mongod will then:
1. wait for any currently running operations or file preallocations to finish (this could take a moment)
2. close all open connections
3. flush all data to disk
4. halt.

Or use the shutdown command:

> use admin
switched to db admin
> db.shutdownServer();
server should be down...

Agenda

Getting Up to Speed with MongoDB (Summary: document-oriented and schema-free)

Developing with MongoDB (Summary: find())

Advanced Usage (indexes, aggregation, GridFS)

Administration (admin, replication, sharding)

MISC

MongoDB Advanced Usage

Advanced Usage
Index
Aggregation
MapReduce
Database commands
Capped Collections
GridFS: Storing Files

MongoDB Advanced Usage

Index

MongoDB's indexes work almost identically to typical relational database indexes.

Index optimization techniques for MySQL/Oracle/SQLite apply equally well to MongoDB.

If an index has N keys, it makes queries on any prefix of those keys fast. For example, an index on {"a" : 1, "b" : 1, "c" : 1} helps queries on a, on a and b, and on a, b and c.

Example

db.people.find({"username" : "mark"})
db.people.ensureIndex({"username" : 1})

db.people.find({"date" : date1}).sort({"date" : 1, "username" : 1})
db.people.ensureIndex({"date" : 1, "username" : 1})

db.people.find({"username" : "mark"}).explain()
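The prefix rule can be written down as a small check. This is a simplified model and an assumption of this sketch, not the real query optimizer: it only asks whether the fields of an equality query are exactly a leading prefix of the index keys (a real planner may still use a partial prefix, such as "a" for a query on a and c). `usablePrefixLength` and `indexServesQuery` are hypothetical helpers.

```javascript
// Simplified model (an assumption, not the real query optimizer): a compound index
// fully serves an equality query when the queried fields are exactly a leading
// prefix of the index keys. Field order inside the query does not matter.
function usablePrefixLength(indexKeys, queryFields) {
  const fields = new Set(queryFields);
  let n = 0;
  while (n < indexKeys.length && fields.has(indexKeys[n])) n++;
  return n; // how many leading index keys the query constrains
}

function indexServesQuery(indexKeys, queryFields) {
  return queryFields.length > 0 &&
    usablePrefixLength(indexKeys, queryFields) === new Set(queryFields).size;
}

const dateUsername = ["date", "username"]; // {"date" : 1, "username" : 1}
console.log(indexServesQuery(dateUsername, ["date"]));             // true
console.log(indexServesQuery(dateUsername, ["username", "date"])); // true
console.log(indexServesQuery(dateUsername, ["username"]));         // false
```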

MongoDB Advanced Usage

Index (continued)

Indexes can be created on keys in embedded documents in the same way that they are created on normal keys.

Indexing for sorts: indexing the sort keys allows MongoDB to pull the sorted data in order, so you can sort any amount of data without running out of memory.

Index naming rule: keyname1_dir1_keyname2_dir2_..._keynameN_dirN, where keynameX is the index's key and dirX is the index's direction (1 or -1).

db.blog.ensureIndex({"comments.date" : 1})
db.people.ensureIndex({"username" : 1}, {"unique" : true})
db.people.ensureIndex({"username" : 1}, {"unique" : true, "dropDups" : true})

With "dropDups", when existing documents contain duplicate values, only the first document for each value is kept and the others are deleted.

autocomplete:PRIMARY> db.system.indexes.find()
{ "name" : "_id_", "ns" : "test.fs.files", "key" : { "_id" : 1 }, "v" : 0 }
{ "ns" : "test.fs.files", "key" : { "filename" : 1 }, "name" : "filename_1", "v" : 0 }
{ "name" : "_id_", "ns" : "test.fs.chunks", "key" : { "_id" : 1 }, "v" : 0 }
{ "ns" : "test.fs.chunks", "key" : { "files_id" : 1, "n" : 1 }, "name" : "files_id_1_n_1", "v" : 0 }
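The naming rule is mechanical, so it can be expressed as a one-line function. `defaultIndexName` is a hypothetical helper that simply applies the rule above to a key specification; as the listing shows, the built-in _id index is a special case named "_id_".

```javascript
// Builds the default index name from a key specification using the
// keyname1_dir1_..._keynameN_dirN rule. Hypothetical helper for illustration;
// the built-in _id index is a special case named "_id_".
function defaultIndexName(keySpec) {
  return Object.entries(keySpec)
    .map(([key, dir]) => `${key}_${dir}`)
    .join("_");
}

console.log(defaultIndexName({ files_id: 1, n: 1 }));     // files_id_1_n_1
console.log(defaultIndexName({ "comments.date": 1 }));    // comments.date_1
console.log(defaultIndexName({ date: 1, username: -1 })); // date_1_username_-1
```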

MongoDB Advanced Usage

Index (continued): explain()

explain() returns information about the indexes used for the query (if any) and statistics about timing and the number of documents scanned.

autocomplete:PRIMARY> db.users.find({"name":"user0"}).explain()
{
    "cursor" : "BtreeCursor name_1",
    "nscanned" : 1,
    "nscannedObjects" : 1,
    "n" : 1,
    "millis" : 0,
    "nYields" : 0,
    "nChunkSkips" : 0,
    "isMultiKey" : false,
    "indexOnly" : false,
    "indexBounds" : {
        "name" : [
            [
                "user0",
                "user0"
            ]
        ]
    }
}

"BtreeCursor name_1" means the name_1 index was used (a "BasicCursor" would mean a full collection scan); "nscanned" is the number of index entries scanned and "n" the number of documents returned.

MongoDB Advanced Usage

Index (continued): hint()

If you find that MongoDB is using a different index than the one you want for a query, you can force it to use a certain index with hint():

db.c.find({"age" : 14, "username" : /.*/}).hint({"username" : 1, "age" : 1})

Index (continued): changing indexes

Drop the index named "alphabet" from the collection foo:
db.runCommand({"dropIndexes" : "foo", "index" : "alphabet"})

db.people.ensureIndex({"username" : 1}, {"background" : true})

Using the {"background" : true} option builds the index in the background while still handling incoming requests. If you do not include the background option, the database blocks all other requests while the index is being built.

MongoDB Advanced Usage

Aggregation

db.foo.count()    counts all documents in foo
db.foo.count({"x" : 1})    counts the documents where x is 1
db.runCommand({"distinct" : "people", "key" : "age"})    returns the distinct values of "age" in people

Group

Sample documents in the stocks collection:
{"day" : "2010/10/03", "time" : "10/3/2010 03:57:01 GMT-400", "price" : 4.23}
{"day" : "2010/10/04", "time" : "10/4/2010 11:28:39 GMT-400", "price" : 4.27}
{"day" : "2010/10/03", "time" : "10/3/2010 05:00:23 GMT-400", "price" : 4.10}
{"day" : "2010/10/06", "time" : "10/6/2010 05:27:58 GMT-400", "price" : 4.30}
{"day" : "2010/10/04", "time" : "10/4/2010 08:34:50 GMT-400", "price" : 4.01}

Find the last price recorded for each day:

db.runCommand({"group" : {
... "ns" : "stocks",
... "key" : {"day" : true},
... "initial" : {"time" : ""},
... "$reduce" : function(doc, prev) {
...     if (doc.time > prev.time) {
...         prev.price = doc.price;
...         prev.time = doc.time;
...     }
... }}})

"time" is a string, so the initial value is an empty string; comparing the strings works here because, within one day, they differ only in the zero-padded hh:mm:ss part.
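What group does with the reduce function can be re-enacted in a few lines of plain JavaScript. This is a sketch, not the server's implementation: the `group` helper is hypothetical, it seeds each bucket with a copy of `initial`, and it reuses the slide's data and reduce logic. The initial time is an empty string on purpose: comparing a string with the number 0 converts the string to NaN, so that comparison would always be false.

```javascript
// Plain-JavaScript re-enactment of the group command (not the server's
// implementation): bucket documents by the key field, seed each bucket with a copy
// of "initial", and run the reduce function over every document in the bucket.
function group(docs, keyField, initial, reduce) {
  const buckets = new Map();
  for (const doc of docs) {
    const k = doc[keyField];
    if (!buckets.has(k)) buckets.set(k, { [keyField]: k, ...initial });
    reduce(doc, buckets.get(k));
  }
  return [...buckets.values()];
}

const stocks = [
  { day: "2010/10/03", time: "10/3/2010 03:57:01 GMT-400", price: 4.23 },
  { day: "2010/10/04", time: "10/4/2010 11:28:39 GMT-400", price: 4.27 },
  { day: "2010/10/03", time: "10/3/2010 05:00:23 GMT-400", price: 4.10 },
  { day: "2010/10/06", time: "10/6/2010 05:27:58 GMT-400", price: 4.30 },
  { day: "2010/10/04", time: "10/4/2010 08:34:50 GMT-400", price: 4.01 },
];

// Same reduce as on the slide: keep the price with the latest time seen so far.
// String comparison is enough because times within one day share the date prefix.
const latest = group(stocks, "day", { time: "" }, (doc, prev) => {
  if (doc.time > prev.time) {
    prev.price = doc.price;
    prev.time = doc.time;
  }
});

for (const r of latest) console.log(r.day, r.price);
```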

MongoDB Advanced Usage

MapReduce

MapReduce is a method of aggregation that can be easily parallelized across multiple servers. It splits up a problem, sends chunks of it to different machines, and lets each machine solve its part of the problem. When all of the machines are finished, they merge all of the pieces of the solution back into a full solution.

Example: finding all keys in a collection

> map = function() {
...     for (var key in this) {
...         emit(key, {count : 1});
...     }};
> reduce = function(key, emits) {
...     var total = 0;
...     for (var i in emits) {
...         total += emits[i].count; }
...     return {"count" : total};
... }
> mr = db.runCommand({"mapreduce" : "foo", "map" : map, "reduce" : reduce, "out" : "foo_keys"})
> db[mr.result].find()

map and reduce must both be defined before the command runs. Since MongoDB 1.8 the "out" option is required; "foo_keys" is only an example name for the output collection.
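The "find all keys" job can be traced in a single process to see what map, the grouping of emitted values, and reduce each contribute. The `mapReduce` driver function and the sample `foo` documents are hypothetical, and `emit` is passed in as an argument instead of being a global as it is on the server. Because the server may feed reduce its own earlier outputs (re-reduce), reduce returns the same shape as the emitted values, which the slide's reduce already does.

```javascript
// Single-process re-enactment of the "find all keys" MapReduce (not MongoDB's
// engine): map emits (key, {count : 1}) for every field, emitted values are grouped
// by key, and reduce sums the counts for each key.
function mapReduce(docs, map, reduce) {
  const emitted = new Map(); // key -> array of emitted values
  for (const doc of docs) {
    const emit = (key, value) => {
      if (!emitted.has(key)) emitted.set(key, []);
      emitted.get(key).push(value);
    };
    map.call(doc, emit); // like the slide's map, `this` is the current document
  }
  const results = [];
  for (const [key, values] of emitted) {
    results.push({ _id: key, value: reduce(key, values) });
  }
  return results;
}

// Same logic as the slide; only `emit` arrives as a parameter here.
const map = function (emit) {
  for (const key in this) {
    emit(key, { count: 1 });
  }
};

const reduce = function (key, emits) {
  let total = 0;
  for (const e of emits) total += e.count;
  return { count: total }; // same shape as the emitted values, so re-reduce works
};

const foo = [
  { _id: 1, a: 1, b: 2 },
  { _id: 2, a: 3 },
  { _id: 3, b: 4, c: 5 },
];
const keyCounts = mapReduce(foo, map, reduce);
console.log(keyCounts);
```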

MongoDB Advanced Usage

Database commands

Commands implement all of the functionality that doesn't fit neatly into create, read, update, delete.

Example:
> db.runCommand({"drop" : "test"});
{ "errmsg" : "ns not found", "ok" : false }

This is equivalent to querying the internal $cmd collection:
> db.$cmd.findOne({"drop" : "test"});

Show all commands:
> db.listCommands()

MongoDB Advanced Usage

Capped Collections

1. Capped collections automatically age out the oldest documents as new documents are inserted.
2. Documents cannot be removed or deleted (aside from the automatic age-out described earlier), and updates that would cause documents to move (in general, updates that cause documents to grow in size) are disallowed.
3. Inserts into a capped collection are extremely fast.
4. By default, any find performed on a capped collection returns results in insertion order.
5. Capped collections are ideal for use cases like logging.
6. Replication uses a capped collection as its oplog.
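The age-out and insertion-order behavior listed above can be modeled with a bounded array. `CappedCollection` is a hypothetical class, and it assumes a cap on the number of documents for simplicity; real capped collections are capped by size in bytes and can optionally also limit the document count.

```javascript
// Toy capped collection (hypothetical class). Assumption: the cap is a maximum
// number of documents; real capped collections are capped by size in bytes and
// can optionally also limit the document count.
class CappedCollection {
  constructor(maxDocs) {
    this.maxDocs = maxDocs;
    this.docs = []; // always kept in insertion order
  }

  insert(doc) {
    this.docs.push(doc);
    if (this.docs.length > this.maxDocs) {
      this.docs.shift(); // age out the oldest document
    }
  }

  remove() {
    // Explicit deletes are not allowed; only the automatic age-out removes documents
    throw new Error("cannot remove documents from a capped collection");
  }

  find() {
    return [...this.docs]; // results come back in insertion order
  }
}

const logs = new CappedCollection(3);
for (let i = 1; i <= 5; i++) logs.insert({ msg: "event " + i });
console.log(logs.find().map((d) => d.msg)); // event 3, event 4, event 5
```

An array with `shift()` keeps the sketch short; a fixed-size ring buffer would avoid shifting elements.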

MongoDB Advanced Usage

GridFS: Storing Files

GridFS is a mechanism for storing large binary files in MongoDB.

Why use GridFS?

Using GridFS can simplify your stack. If you're already using MongoDB, GridFS obviates the need for a separate file storage architecture.

GridFS will leverage any existing replication or autosharding that you've set up for MongoDB, so getting failover and scale-out for file storage is easy.

GridFS can alleviate some of the issues that certain filesystems can exhibit when being used to store user uploads. For example, GridFS does not have issues with storing large numbers of files in the same directory.

You can get great disk locality with GridFS, because MongoDB allocates data files in 2GB chunks.

MongoDB Advanced Usage

GridFS: example

$ echo "Hello, world" > foo.txt
$ ./mongofiles put foo.txt
connected to: 127.0.0.1
added file: { _id: ObjectId('4c0d2a6c3052c25545139b88'),
filename: "foo.txt", length: 13, chunkSize: 262144,
uploadDate: new Date(1275931244818),
md5: "a7966bf58e23583c9a5a4059383ff850" }
done!

$ ./mongofiles list
connected to: 127.0.0.1
foo.txt 13

$ rm foo.txt
$ ./mongofiles get foo.txt
connected to: 127.0.0.1
done write to: foo.txt

$ cat foo.txt
Hello, world

MongoDB Advanced Usage

GridFS: internals

The basic idea behind GridFS is that we can store large files by splitting them up into chunks and storing each chunk as a separate document. File metadata goes into fs.files and the chunks go into fs.chunks; each chunk points to its file through "files_id" and records its position in "n".

autocomplete:PRIMARY> show collections
fs.chunks
fs.files
system.indexes

autocomplete:PRIMARY> db.fs.chunks.find()
{ "_id" : ObjectId("4db258ae05a23484714d58ad"), "files_id" : ObjectId("4db258ae39ae206d1114d6e4"), "n" : 0, "data" : BinData(0,"SGVsbG8sbW9uZ28K") }
{ "_id" : ObjectId("4db258d305a23484714d58ae"), "files_id" : ObjectId("4db258d37858d8bb53489eea"), "n" : 0, "data" : BinData(0,"SGVsbG8sbW9uZ28K") }
{ "_id" : ObjectId("4db2596d05a23484714d58af"), "files_id" : ObjectId("4db2596d4fefdd07525ef166"), "n" : 0, "data" : BinData(0,"SGVsbG8sbW9uZ28K") }

autocomplete:PRIMARY> db.fs.files.find()
{ "_id" : ObjectId("4db258ae39ae206d1114d6e4"), "filename" : "file1.txt", "chunkSize" : 262144, "uploadDate" : ISODate("2011-04-23T04:42:22.546Z"), "md5" : "c002dec1a1086442b2aa49c2b6e48884", "length" : 12 }
{ "_id" : ObjectId("4db258d37858d8bb53489eea"), "filename" : "file2.txt", "chunkSize" : 262144, "uploadDate" : ISODate("2011-04-23T04:42:59.851Z"), "md5" : "c002dec1a1086442b2aa49c2b6e48884", "length" : 12 }
{ "_id" : ObjectId("4db2596d4fefdd07525ef166"), "filename" : "file2.txt", "chunkSize" : 262144, "uploadDate" : ISODate("2011-04-23T04:45:33.771Z"), "md5" : "c002dec1a1086442b2aa49c2b6e48884", "length" : 12 }

autocomplete:PRIMARY> db.system.indexes.find()
{ "ns" : "test.fs.files", "key" : { "filename" : 1 }, "name" : "filename_1", "v" : 0 }
{ "ns" : "test.fs.chunks", "key" : { "files_id" : 1, "n" : 1 }, "name" : "files_id_1_n_1", "v" : 0 }
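The chunking scheme visible in fs.files and fs.chunks can be sketched in plain JavaScript: split the bytes into pieces of chunkSize (262144 bytes, as in the listings), tag each piece with files_id and n, and reassemble by sorting on n. `toGridFS`, `fromGridFS` and the string file ids are hypothetical stand-ins; md5 and uploadDate are left out.

```javascript
// Sketch of the GridFS idea: split a file's bytes into chunkSize pieces, producing
// one metadata document (like fs.files) and one document per piece (like fs.chunks).
// fileId is a hypothetical stand-in for an ObjectId; md5 and uploadDate are omitted.
function toGridFS(bytes, filename, fileId, chunkSize = 262144) {
  const chunks = [];
  for (let offset = 0, n = 0; offset < bytes.length; offset += chunkSize, n++) {
    // files_id links the chunk to its file, n records its position
    chunks.push({ files_id: fileId, n, data: bytes.subarray(offset, offset + chunkSize) });
  }
  const file = { _id: fileId, filename, chunkSize, length: bytes.length };
  return { file, chunks };
}

// Reading a file back: take its chunks, order them by n, and concatenate
function fromGridFS(file, chunks) {
  const out = new Uint8Array(file.length);
  let offset = 0;
  for (const c of [...chunks].sort((a, b) => a.n - b.n)) {
    out.set(c.data, offset);
    offset += c.data.length;
  }
  return out;
}

const small = toGridFS(new TextEncoder().encode("Hello,mongo\n"), "file1.txt", "file-1");
console.log(small.file.length, small.chunks.length); // 12 1

const bigBytes = Uint8Array.from({ length: 600000 }, (_, i) => i % 251);
const big = toGridFS(bigBytes, "big.bin", "file-2");
console.log(big.chunks.map((c) => c.data.length)); // 262144, 262144, 75712
```

The 12-byte "Hello,mongo\n" file fits in one chunk, which matches the single n : 0 chunk per file in the listing.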

MongoDB Advanced Usage

Advanced Usage review

Index: almost the same as in Oracle
Aggregation
MapReduce: built-in
Database commands: db.listCommands()
Capped Collections: suitable for logs
GridFS (Storing Files): built-in, document-oriented

Others
Geospatial Indexing
Database References
Server-Side Scripting

DBA on MongoDB

Administration (admin, replication, sharding)
Monitoring
Security and Authentication
Backup and Repair
Master-Slave Replication
Replica Sets
Sharding

DBA on MongoDB

Easy Monitoring
Using the admin interface (the web admin interface listens on port 28017 by default, as in the startup log)
db.runCommand({"serverStatus" : 1})
mongostat
Third-party plug-ins

Using the Admin Interface



db.runCommand({"serverStatus" : 1})

> db.runCommand({"serverStatus" : 1})
{
    "version" : "1.5.3",
    "uptime" : 166,
    "localTime" : "Thu Jun 10 2010 15:47:40 GMT-0400 (EDT)",
    "globalLock" : {
        "totalTime" : 165984675,
        "lockTime" : 91471425,
        "ratio" : 0.551083556358441
    },
    "mem" : {
        "bits" : 64,
        "resident" : 101,
        "virtual" : 2824,
        "supported" : true,
        "mapped" : 336
    },
    "connections" : {
        "current" : 141,
        "available" : 19859
    },
    "extra_info" : {
        "note" : "fields vary by platform"
    },
    "indexCounters" : {
        "btree" : {
            "accesses" : 1563,
            "hits" : 1563,
            "misses" : 0,
        }
    },
    "backgroundFlushing" : {
        "flushes" : 2,
        "total_ms" : 44,
        "average_ms" : 22,
        "last_ms" : 36,
        "last_finished" : "Thu Jun 10 2010 15:46:54 GMT-0400 (EDT)"
    },
    "opcounters" : {
        "insert" : 38195,
        "query" : 8874,
        "update" : 4058,
        "delete" : 389,
        "getmore" : 888,
        "command" : 17731
    },
    "asserts" : {
        "regular" : 0,
        "warning" : 0,
        "msg" : 0,
        "user" : 5054,
        "rollovers" : 0
    },

globalLock.ratio is lockTime / totalTime (91471425 / 165984675, about 0.551): the global write lock was held roughly 55% of the time since startup. The "mem" values are in megabytes.

mongostat

Fields
inserts - # of inserts per second
query - # of queries per second
update - # of updates per second
delete - # of deletes per second
getmore - # of get mores (cursor batches) per second
command - # of commands per second
flushes - # of fsync flushes per second
mapped - amount of data mmapped (total data size), in megabytes
vsize - virtual size of process, in megabytes
res - resident size of process, in megabytes
faults - # of page faults per second (Linux only)
locked - percent of time in global write lock
idx miss - percent of btree page misses (sampled)
qr|qw - queue lengths for clients waiting (read|write)
ar|aw - active clients (read|write)
netIn - network traffic in, in bits
netOut - network traffic out, in bits
conn - number of open connections

DBA on MongoDB

Security and Authentication

1. Each database in a MongoDB instance can have any number of users.
2. Only authenticated users of a database are able to perform read or write operations on it.
3. A user in the admin database can be thought of as a superuser.
4. MongoDB must be started with the --auth option to enable authentication.

Backup on MongoDB

1. Data file cold backup: kill -INT mongod, then copy the --dbpath directory
2. mongodump (like Oracle exp) and mongorestore (like Oracle imp)
3. fsync and lock
4. Backup from a slave

> use admin
switched to db admin
> db.runCommand({"fsync" : 1, "lock" : 1});
{
    "info" : "now locked against writes, use db.$cmd.sys.unlock.findOne() to unlock",
    "ok" : 1
}

Run mongodump (or copy the data files) while the server is locked, then unlock:

> db.$cmd.sys.unlock.findOne();
{ "ok" : 1, "info" : "unlock requested" }

Repair MongoDB

1. Databases need to be repaired after an unclean shutdown (kill -9):

**************
old lock file: /data/db/mongod.lock. probably means unclean shutdown
recommend removing file and running --repair
see: http://dochub.mongodb.org/core/repair for more information
*************

2. All of the documents in the database are exported and then immediately imported, ignoring any that are invalid. Then the indexes are rebuilt.
3. Repair takes a long time when the data set is humongous.
4. Repairing a database also performs a compaction.
5. db.repairDatabase() repairs a single database.

DBA on MongoDB

Replication
Master-Slave Replication
Replica Sets

Master-Slave Replication

$ mkdir -p ~/dbs/master
$ ./mongod --dbpath ~/dbs/master --port 10000 --master

$ mkdir -p ~/dbs/slave
$ ./mongod --dbpath ~/dbs/slave --port 10001 --slave --source localhost:10000

Uses:
1. Scale reads
2. Backups on the slave
3. Processing data on the slave
4. DR (disaster recovery)

Master-Slave Replication

How does it work? The oplog

The oplog is oplog.$main, a capped collection in the local database. Each entry has these fields:

ts: Timestamp for the operation. The timestamp type is an internal type used to track when operations are performed. It is composed of a 4-byte timestamp and a 4-byte incrementing counter.
op: Type of operation performed, as a 1-byte code (e.g., "i" for an insert).
ns: Namespace (collection name) where the operation was performed.
o: Document further specifying the operation to perform. For an insert, this would be the document to insert.

1. When the slave first starts up, it does a full sync of the data on the master node.
2. After the initial sync is complete, the slave begins querying the master's oplog and applying operations in order to stay up to date. Replication is asynchronous.
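The polling loop in step 2 can be sketched with entries shaped like the oplog fields above. Everything here is a hypothetical model, not mongod's code: only "i" (insert) and "d" (delete) operations are handled, the initial full sync is left out, and `ts` is packed as a 64-bit BigInt (high 32 bits for the seconds, low 32 bits for the counter) so that comparing values orders entries by time and then by counter.

```javascript
// Toy master/slave replication loop (hypothetical, not mongod's code). Oplog entries
// use the fields ts, op, ns, o. Only "i" (insert) and "d" (delete) are modeled,
// and the initial full sync is left out.

// ts: high 32 bits = seconds, low 32 bits = incrementing counter, so comparing the
// packed values orders entries by time first and counter second.
function makeTs(seconds, counter) {
  return (BigInt(seconds) << 32n) | BigInt(counter);
}

function applyOplog(slave, oplog) {
  for (const entry of oplog) {
    // Skip anything already applied in an earlier poll
    if (slave.lastTs !== undefined && entry.ts <= slave.lastTs) continue;
    if (!slave.collections[entry.ns]) slave.collections[entry.ns] = new Map();
    const coll = slave.collections[entry.ns];
    if (entry.op === "i") coll.set(entry.o._id, entry.o);
    else if (entry.op === "d") coll.delete(entry.o._id);
    else throw new Error("op not modeled in this sketch: " + entry.op);
    slave.lastTs = entry.ts; // the next poll resumes after this entry
  }
}

const masterOplog = [
  { ts: makeTs(1303533742, 1), op: "i", ns: "test.foo", o: { _id: 1, x: 1 } },
  { ts: makeTs(1303533742, 2), op: "i", ns: "test.foo", o: { _id: 2, x: 2 } },
  { ts: makeTs(1303533750, 1), op: "d", ns: "test.foo", o: { _id: 1 } },
];

const slave = { collections: {}, lastTs: undefined };
applyOplog(slave, masterOplog.slice(0, 2)); // first poll sees two inserts
applyOplog(slave, masterOplog);             // later poll: old entries are skipped
console.log([...slave.collections["test.foo"].keys()]); // only _id 2 remains
```

Because the slave only remembers the last ts it applied, it can lag behind the master between polls, which is what "asynchronous" means here.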

Replication on MongoDB

Replica Sets

1. A replica set is basically a master-slave cluster with automatic failover.
2. It has one primary (master) and one or more secondaries (slaves).
3. The primary is elected by the cluster, and the role may move to another node if the current primary goes down.

Replication on MongoDB

Setting up a replica set

1. The --replSet option gives the name of the replica set (autocomplete); the host:port list after the "/" tells the new member where to find other members.

$ ./mongod --dbpath ~/dbs/node1 --port 10001 --replSet autocomplete/slcdbx1005:10002

We start up the other server in the same way:

$ ./mongod --dbpath ~/dbs/node2 --port 10002 --replSet autocomplete/slcdbx1006:10001

If we wanted to add a third server, we could do so with either of these commands:

$ ./mongod --dbpath ~/dbs/node3 --port 10003 --replSet autocomplete/slcdbx1005:10001
$ ./mongod --dbpath ~/dbs/node3 --port 10003 --replSet autocomplete/slcdbx1005:10001,slcdbx1006:10002

Replica set failover

standard: a full copy of the data, votes, and can become primary
passive: a full copy of the data and votes, but does not become primary
arbiter: votes only; no data is replicated
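Failover only works if the surviving members can still form a majority, which is why arbiters are useful. The sketch below assumes one vote per member and that a primary needs votes from a strict majority of all members; `electionOutcome`, the member names and the "up" flags are hypothetical, and the per-type rules follow the list above.

```javascript
// Toy check of whether a replica set can elect a primary, assuming one vote per
// member and that a primary needs votes from a strict majority of all members.
// Member names and "up" flags are hypothetical; the type rules follow the list above.
const CAN_BE_PRIMARY = { standard: true, passive: false, arbiter: false };

function electionOutcome(members) {
  const reachable = members.filter((m) => m.up);
  const hasMajority = reachable.length * 2 > members.length; // every type above votes
  const candidates = reachable.filter((m) => CAN_BE_PRIMARY[m.type]).map((m) => m.name);
  return { hasMajority, candidates, canElect: hasMajority && candidates.length > 0 };
}

// Two data-bearing members plus an arbiter: losing one standard member still
// leaves 2 of 3 votes, so the other standard member can take over.
const threeMembers = [
  { name: "node1", type: "standard", up: false },
  { name: "node2", type: "standard", up: true },
  { name: "arb", type: "arbiter", up: true },
];
console.log(electionOutcome(threeMembers));
```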

MongoDB Auto-sharding

Sharding: splitting data up and storing different portions of the data on different machines.

1. Manual sharding: the application code manages storing different data on different servers and querying against the appropriate server to get data back.
2. Autosharding: the cluster handles splitting up data and rebalancing automatically.

Auto sharding

When to shard?
1. You've run out of disk space on your current machine.
2. You want to write data faster than a single mongod can handle.
3. You want to keep a larger proportion of data in memory to improve performance.
4. DR
5. Automatic failover

Auto sharding

Components of MongoDB sharding

Config server
$ mkdir -p ~/dbs/config
$ ./mongod --dbpath ~/dbs/config --port 20000

mongos (router)
$ ./mongos --port 30000 --configdb localhost:20000

Shards (usually replica sets)
$ mkdir -p ~/dbs/shard1
$ ./mongod --dbpath ~/dbs/shard1 --port 10000

Connect to mongos and add the shard from the admin database:
mongos> use admin
mongos> db.runCommand({addshard : "localhost:10000", allowLocal : true})
{
    "added" : "localhost:10000",
    "ok" : true
}

Pre-sharding a table

Determine a shard key
1. The shard key defines how we distribute data.
2. MongoDB's sharding is order-preserving; data that is adjacent by shard key tends to be on the same server.
3. The config database stores all the metadata indicating the location of data by range.
4. The shard key should be granular enough to ensure an even distribution of data.
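Range metadata is what lets mongos route a query: find the chunk whose [min, max) range contains the shard-key value, or all chunks overlapping a queried range. In this sketch the chunk table, its bounds and shard names are made up for illustration, and $minKey/$maxKey are modeled as -Infinity/Infinity; `shardFor` and `shardsForRange` are hypothetical helpers.

```javascript
// Sketch of range-based routing on a shard key {x : 1}. Assumption: a chunk table
// like the metadata in the config database, where each chunk owns [min, max).
// The bounds and shard names are made up; $minKey/$maxKey are modeled as -/+Infinity.
const chunkTable = [
  { min: -Infinity, max: 0, shard: "shard0" },
  { min: 0, max: 100, shard: "shard1" },
  { min: 100, max: Infinity, shard: "shard0" },
];

// Binary search over chunks sorted by min: which shard owns the document with key x?
function shardFor(chunks, x) {
  let lo = 0;
  let hi = chunks.length - 1;
  while (lo <= hi) {
    const mid = (lo + hi) >> 1;
    if (x < chunks[mid].min) hi = mid - 1;
    else if (x >= chunks[mid].max) lo = mid + 1;
    else return chunks[mid].shard;
  }
  throw new Error("no chunk covers " + x);
}

// A range query from <= x < to only needs the shards whose chunks overlap the range.
function shardsForRange(chunks, from, to) {
  const hits = chunks.filter((c) => c.min < to && c.max > from).map((c) => c.shard);
  return [...new Set(hits)];
}

console.log(shardFor(chunkTable, 42));            // shard1
console.log(shardsForRange(chunkTable, 10, 50));  // only shard1
console.log(shardsForRange(chunkTable, 50, 150)); // shard1 and shard0
```

Because chunks are contiguous ranges, nearby x values land in the same chunk, which is the order-preserving behavior described in item 2.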




Chunks
1. A chunk is a contiguous range of data from a particular collection.
2. Once a chunk has reached about 200MB in size, it splits into two new chunks. When a particular shard has excess data, chunks then migrate to other shards in the system.
3. Adding a new shard also triggers the migration of chunks.
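A split can be pictured as cutting an oversized chunk at a key near its middle. This toy rule is an assumption of the sketch, not MongoDB's split logic: a chunk covers [min, max) of a numeric shard key, lists the keys of the documents it holds, and its limit is a document count rather than the ~200MB above. `maybeSplit` is a hypothetical helper.

```javascript
// Toy chunk split (hypothetical helper). Assumptions: a chunk covers [min, max) of a
// numeric shard key, lists the keys of the documents it holds, and its size limit is
// expressed as a document count instead of the ~200MB mentioned above.
function maybeSplit(chunk, maxDocs) {
  if (chunk.keys.length <= maxDocs) return [chunk];
  const sorted = [...chunk.keys].sort((a, b) => a - b);
  const splitKey = sorted[Math.floor(sorted.length / 2)]; // median key
  const left = sorted.filter((k) => k < splitKey);
  const right = sorted.filter((k) => k >= splitKey);
  // If the median key is also the smallest key (e.g. every document shares one key),
  // this rule finds no boundary: that is why the shard key should be granular enough.
  if (left.length === 0) return [chunk];
  return [
    { min: chunk.min, max: splitKey, keys: left },
    { min: splitKey, max: chunk.max, keys: right },
  ];
}

const full = { min: 0, max: 100, keys: [42, 7, 13, 99, 5, 61, 70, 28] };
const halves = maybeSplit(full, 5);
console.log(halves.map((c) => [c.min, c.max, c.keys.length]));
```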

Sharding a table

Enable sharding on a database:
db.runCommand({"enablesharding" : "foo"})

Enable sharding on a collection:
db.runCommand({"shardcollection" : "foo.bar", "key" : {"_id" : 1}})

Show autosharding status:
> db.printShardingStatus()
--- Sharding Status ---
  sharding version: { "_id" : 1, "version" : 3 }
  shards:
      { "_id" : "shard0", "host" : "localhost:10000" }
      { "_id" : "shard1", "host" : "localhost:10001" }
  databases:
      { "_id" : "admin", "partitioned" : false, "primary" : "config" }
      { "_id" : "foo", "partitioned" : false, "primary" : "shard1" }
      { "_id" : "x", "partitioned" : false, "primary" : "shard0" }
      { "_id" : "test", "partitioned" : true, "primary" : "shard0",
        "sharded" : { "test.foo" : { "key" : { "x" : 1 }, "unique" : false } } }
          test.foo chunks:
              { "x" : { $minKey : 1 } } -->> { "x" : { $maxKey : 1 } } on : shard0 { "t" : 1276636243000, "i" : 1 }

Query on sharding

Assume a shard key of { x : 1 }.

Sharding machine layout

Avoid single points of failure.

Review

Getting Up to Speed with MongoDB (document-oriented and schema-free)

Developing with MongoDB (find())

Advanced Usage (tons of features)

Administration (easy admin, replication, sharding)

MISC (BSON; internals)

Misc
1. BSON
2. Data file layout
3. Memory-mapped storage engine

Misc 1

BSON (Binary JSON)

BSON is a lightweight binary format capable of representing any MongoDB document as a string of bytes.

BSON is the format in which documents are saved to disk.

When a driver is given a document to insert, use as a query, and so on, it encodes that document to BSON before sending it to the server.

Goals: efficiency, traversability, performance
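To see what "a string of bytes" looks like, here is a minimal encoder for the simplest case, written from the published BSON layout: an int32 total length (little-endian, counting itself), the elements (type byte, NUL-terminated field name, value), and a closing 0x00 byte. `encodeStringDoc` is a hypothetical helper that handles only UTF-8 string values (type 0x02); a real driver covers every type in the datatype list.

```javascript
// Minimal BSON encoder for documents whose values are all strings, following the
// BSON layout: int32 total length, then elements (type byte, C-string field name,
// value), then a terminating 0x00. Only type 0x02 (UTF-8 string) is handled.
function encodeStringDoc(doc) {
  const enc = new TextEncoder();
  const parts = [];
  for (const [name, value] of Object.entries(doc)) {
    if (typeof value !== "string") throw new Error("sketch only supports string values");
    const nameBytes = enc.encode(name);
    const valueBytes = enc.encode(value);
    const el = new Uint8Array(1 + nameBytes.length + 1 + 4 + valueBytes.length + 1);
    const view = new DataView(el.buffer);
    let p = 0;
    el[p++] = 0x02;                                         // element type: string
    el.set(nameBytes, p); p += nameBytes.length;
    el[p++] = 0x00;                                         // end of field name (C string)
    view.setInt32(p, valueBytes.length + 1, true); p += 4;  // string length incl. NUL, little-endian
    el.set(valueBytes, p); p += valueBytes.length;
    el[p++] = 0x00;                                         // string terminator
    parts.push(el);
  }
  const bodyLength = parts.reduce((n, el) => n + el.length, 0);
  const out = new Uint8Array(4 + bodyLength + 1);
  new DataView(out.buffer).setInt32(0, out.length, true);  // total length includes itself
  let p = 4;
  for (const el of parts) { out.set(el, p); p += el.length; }
  out[p] = 0x00;                                            // document terminator
  return out;
}

const bytes = encodeStringDoc({ hello: "world" });
console.log(bytes.length); // 22
```

The length prefixes are what make BSON traversable: a reader can skip over a field or a whole document without parsing its contents.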

Data File Layout & Memory-Mapped Storage Engine

1. The numbered data files for a database (foo.0, foo.1, ... for a database named foo) double in size for each new file, up to a maximum file size of 2GB.
2. Data files are preallocated to ensure consistent performance.
3. Memory-mapped storage engine:
4. When the server starts up, it memory-maps all its data files.
5. The OS is responsible for flushing data to disk and paging data in and out.
6. MongoDB cannot control the order in which data is written to disk, which makes it impossible to use a write-ahead log to provide single-server durability. (MongoDB 1.8 added optional journaling, enabled with --journal, to address this.)
7. 32-bit MongoDB servers are limited to a total of about 2GB of data per mongod, because all of the data must be addressable using only 32 bits.
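The doubling rule in item 1 determines how many files a database of a given size needs. This sketch assumes a 64MB first data file (the default without --smallfiles), doubling per file, and a 2GB cap; `dataFileSizes` is a hypothetical helper, sizes are in MB, and the extra file MongoDB preallocates ahead of need is not modeled.

```javascript
// Sketch of the data file growth pattern. Assumptions: the first data file is 64MB
// (without --smallfiles), each new file doubles in size, and no file grows beyond
// 2GB. Sizes are in MB; the extra preallocated file is not modeled.
function dataFileSizes(totalDataMB, firstFileMB = 64, maxFileMB = 2048) {
  const sizes = [];
  let allocated = 0;
  let next = firstFileMB;
  while (allocated < totalDataMB) {
    sizes.push(next);
    allocated += next;
    next = Math.min(next * 2, maxFileMB); // double, but cap at 2GB
  }
  return sizes;
}

console.log(dataFileSizes(100));  // [64, 128]
console.log(dataFileSizes(5000)); // 64 ... 1024, then 2048-MB files
```

Once the files reach the 2GB cap, each additional file adds a fixed 2GB, which is the allocation unit the GridFS slide refers to.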

Q&A

Getting Up to Speed with MongoDB (key features)

Developing with MongoDB (start, shutdown, connect, query, DML)

Advanced Usage (indexes, aggregation, GridFS)

Administration (easy admin, replication, sharding)

MISC (BSON; memory-mapped storage)