Distributed and Non-Relational Databases

longtermagonizingInternet and Web Development

Dec 13, 2013 (3 years and 5 months ago)


Distributed and Non
Relational Databases

Mark Feltner

Department of Computer Science

y of Wisconsin


Platteville, WI 53818



The data storage climate was much different when

the standard data storage model


s. Today’s data is adaptive, moves quickly, can be astronomical in size,
and may not even be located in a single geographic spot.

the New York Stock
Exchange generate

about one teraby
te of trade data per day, Facebook host


billion photos which take up about one petabyte of storage, and the Large Hadron Collider in
Geneva, Switzerland will produce about fifteen petabytes of data per year

New data storage
solutions have been, and are being, conceived which specialize in handling these sorts of
problems. These non
relational models (so called because they stray from the foundations of
relational, SQL databases) are flexible to design and

develop with, can easily be scaled
horizontally, and are specialized
for dealing with Big Data.


This section deals broadly with the theory of databases and distributed systems. The foundational
concepts of databases and both the relational and non
relational models are described. A few
concepts relevant to data storage with distributed syst
ems are also featured.

Relational Databases

The first coinage of a database m
anagement system was in 1970 by Edgar Codd at

BM Almden
Research Center. Codd proposed a relational model, based on
Set Theory, to structure stored data


The relational model defines a relation as a set of tuples which have the same attribute. Usually
one tuple represents an object and its associated information. Basically, each tuple is a row in the
table and each attribute is a column

. Usually, both rows and attributes are unordered sets. A

relation requires a
primary key
, or at least a set of attributes which are guaranteed to be distinct
among all tuples in the relation

Relations are reference

foreign keys
. This allows a single tuple to be reference
d by any
number of other tuples (i.e., one
many relationship).

The relational model also allows one to define precompiled

which can find tuples
quicker than checking

each single tuple for a match.

Data in the relational model is often

to retain integrity and remove redundancy


Database transactions are the standard units of work fo
r a


database management
system (DBMS). Transactions are small and simplified work units
. Examples being
, and

A good DBMS is usually able to take a complex query and divide
it up into efficient, small, discrete, and

reliable transactions.

This provides the DBMS
ability to

the databa
se in a consistent state even if a transaction fails.

systems benefit from being able to

from changes by simply reversing the t
were executed.

Besides querying data with the

statement, users can modify relations
, and, of course, other DBMS provided procedures.


ACID is one of the core characteristics of a "reliable" database. ACID is an acro
nym for
Atomicity, Consistency, Isolation, and Durability. This set of properties guarantees that reliable
database transactions may take place, and has been a mainstay in modern relational database


means that each transaction is

If any operation in the transaction
group fails, then the entire transaction must be undone.

If one were to insert a million rows into a
table, and the 999,999 thousandth one failed before a

was issued, the database would
revert back to its previous state.

For a database to be
, the data must always be in a valid, or consistent, state. Relational
databases are designed with rules in mind. Strict schemas are defined and sp
lengths, value types, relations, and other rules

An ACID compliant database would never break
the rules it is defined with.

Of course, this c
onsistency does not guarantee programmer
correctness, just that the database will follow the rules laid ou
t for it.


defines a DBMS property that enables concurrent execution of transactions. Isolation
ensures that transactions executed serially will result in the same system state as transactions
executed concurrently.


means that once a pr
ogrammer has executed a

then the transaction is
permanent across all clients.


All of these checks and rules, of course, do add overtime to each transaction that occurs in an
ACID compliant DBMS. For systems in the banking or government sectors that
do require a
strong degree of data reliability, then founding their models on that of ACID will benefit them.

The relational model is backed by decades of use, strong foundations in well
mathematics, and a variety of ready
use implementations.
That being said, it is not the end
all of data storage.
It is inflexible, tedious to work with, and the benefits of ACID do not
always outweigh the costs
Relational databases lend themselves to systems which require

large degree of reliability and availability, and are generally better suited for single
systems. T
he rise of distributed systems and web services, and the difficulty found when scaling
traditional DBMSs, ha

led to new

in data storage

Relational Databases

In 2009, the landscape of databases had changed vastly from that conceived by Codd in the
1970s. Nowadays, we are dealing with systems made to handle thousands of concurrent
connections a second, store

of data, and sync
with other similar server
s and services
around the globe simulatenously.


(so named because of its focused on data storage techniques sans SQL)
defines non
relational, distributed data stores

often did not attempt to provide atomicity, consistency,
isolation, and durabi

the components of ACID.

As consumers became more and more
connected, data needed to be delivered faster. Two methods of solving this problem were

conceived. First, s

out the data stores geographically
. Second, specialize
the data stores for
the types of data they



NoSQL systems store


data. Semi
structured data is much like a
spreadsheet. You know that the data is organized in cells, rows, and columns, but you only have
a loose idea of what kind of data is stored. Unstructured data has no particular internal structure
(i.e., plain text

or image data).

Recently, there has been a surge of NoSQ

and NoSQL
esque database management systems.
Most, if not all, of these systems lack fixed schemas, avoid joins, and scale really well.

Different systems satisfy different needs. Document
ones, for instance, have an
immense easy
use, while key
value or column oriented systems sacrifice this in order to more
efficiently scale data over clusters


value stores are basically “big, distributed,
persistent, fault
tolerant hash” tables. Based on
an extremely simple concept, hash tables, key
value stores are prized for their simplicity and

value stores allow the application developer to store schema
less d

This is
beneficial when using a database for persistent storage of object oriented data.


The data in a key
value store is usually just a string that represents a key, and a value which is
some type of primitive of whatever programming language is bei
ng used (string, integer, array)

There are a variety of ways to implement this.

The most common key
value store is the file
system. The key, for example, would be the path to a file, and the value is the binary data
ated with that file.

This simplicity means that key
value stores are not only easy to use, but
the code produced is simple and readable. An example of accessing a key
value store in ruby:

require "pstore" # the key
value store interface

store = PStore.new(

store.transaction do # begin transaction

# load some data into the store...

store[:single_object] = "Lorem ipsum dolor sit amet..."

store[:obj_heirarchy] = { "Marc Seeger" => ["ruby", "nosql"],

"Rainer Wahnsinn" => ["php", "mysql"] }

d # commit changes to data store file

Some of the most popular key
value storage solutions include: Apache Cassandra, Redis,
memcached, and Google’s BigTable.


oriented databases are defined as
structured data storages implemented without
tables and relations

. Document
stores are highly flexible and have no fixed schema. The idea
is to allow the client application to add and remove attributes to any single
tuple without wasting
any space by creating empty attributes for all the other tuples.

oriented databases are suited well for dynamic data because of their lack of fixed
schemas and developer flexibility. Because of this, though, document

databases have a
lack of safety (much like using a duck
typed language versus a statically typed one) and there is
a relative performance hit.

Many popular document
oriented databases provide layers over already standardized semi
structured file formats s
uch as JSON, YAML, or XML. This allows them to map easily to web
services and other distributed systems which operate by passing these small, structured files

Some of the most popular document
oriented databases include MongoDB, CouchDB, and


based databases use graph
like structures and use nodes, edges, and properties to
represent data. Graph database are powerful tools, especially for graph
like queries such as
shortest path computation and community analysis.


These databas
es provide another advantage: index
free adjacency. This basically means no index
lookups are necessary because each element contains a direct pointer to its adjacent element.

Graph database scale more naturally for large datasets and are often faster tha

applications for associative data sets. Graphs typically don’t require expensive JOIN operations
and their flexible schemas are suitable for ad
hoc and changing data.

Some popular graph
sed solutions include
AllegroGraph, InfiniteGraph,

and FlockDB.


Of course, there are many other types of distributed
databases including tuple stores, tabular
databases, RDF databases, object databases, multivalue databases, RAM stores, and many other
categories of previously listed categories
. All
all, there exist quite a number of solutions
for almost any data storage need.

Distributed Systems

Distributed systems, also referred to commonly as “grids”, are in rising popularity. There has
been a growing need for distributed systems in both commercial and scientific domains.
systems can be classified into the computational, data, and service t
ypes. For the purposes of this
paper, we will focus on the data grid. The data grid is defined as “systems that provide a
hardware and software infrastructure for synthesizing new information from data repositories
that are distributed in a wide area netwo

Almost any application that relies on a database which houses a large volume of data, generally
the bottleneck of that application will be its database. I/O operations are expensive, disk reads
are slow,
writes are slo
, and
since many databases exist in remote data centers one must take
network latency into account as well
. A simple solution is to add more hardware to the database
system, but this is innately unsustainable.

Another solution is to parallelize and dis
tribute your data processing. A parallel database
management system can achieve both high throughput and interquery parallelism and low
response time in intraoperation parallelism

Parallelism does add architectural comple
xity, and creates a lot of system administration
overhead. The biggest disadvantage in parallel computing is the lack of standards and applicable

The structure of distributed systems grants them different
performance metrics and different
. These systems are measured on their efficiency, or utilization rate of resources;
dependability; adaptability, or ability to support billions of job requests of massive data sets and
virtualized cloud servers under
various workloads and service models; and flexibility, or the
ability of the system to run in both high
performance and high
throughput computing


CAP Theorem

In 2000, Eric Brewer laid out the CAP Theorem in
a keynote at the
Proceedings of the Annual
ACM Symposium on Principles of Distributed
, and

Gilbert and Nancy Lynch
formalized his ideas into a theory in 2009. The CAP Theorem states that it is impossible to
guarantee consistency, availabilit
y, and partition tolerance in a distributed system
. This idea
has vast implications, especially today where almost every system is distributed in one way or

A service that is consistent operates fully or not at al
l. That is, a system is consistent if an update
is applied to all relevant nodes at the same logical time

. One way to drop consistency with
your service is to simply deal with inconsistencies. Some services simply accept that things are
eventually consistent

Eventual consistency is a consistency model used in the parallel programming domain. It means
that gi
ven a sufficiently long period of time over which no changes are sent, all updates can be
expected to propagate throughout the system and replicas will be consistent

“For a distributed system to be continuously available,
every request received by a non
node in the system must result in a response”
. Availability is one of the hardest things to test
and plan for because it tends to desert you when you need it most. It is only inevitab
le that
services go down at busy times just because they are busy. Implementing a system that can deal
with availability gracefully can be non

“In order to model partition tolerance, the network will be allowed to
lose arbitrarily many
messages sent from one node to another. When a network is partitioned, all messages sent from
nodes in one component of the partition to nodes in another component are lost”
. Once a
system has become d
istributed, there is a high chance that it will be in a partitioned state. Say the
data has not traversed the wires to reach your entire data store yet, or perhaps some of your
machines have crashed and are no longer online. To handle issues of consistency

availability, we can scale horizontally, but as the number of partitions increases, the more atomic
entities you need to keep synchronized. You can drop partition tolerance by placing everything
on a single machine, but this severely limits your abili
ty to scale

To have a consistent system you must sacrifice some level of availability or partition tolerance,
a system with good availability will have to sacrifice some consistency or partition tolerance, and
a system th
at is partitioned will need to sacrifice some degree of consistency or availability.

Choosing which part of CAP to drop has been argued over
over again since CAP’s
inception. Hale believes that you cannot drop partition tolerance because the
probability that one
or more node in your system will fail increases exponentially as you add nodes. Instead, Hale
says that you should be asking yourself: “
In the event of failures, which will this system
sacrifice? Consistency or availability?”



Distributed Systems

and Data Parallelization Techniques


MapReduce is a popular web programming model used for scalable data processing on large
clusters over large data sets


MapReduce is suited for applications where data is written once
and read many times

The user provides two functions: a map function which generates a list
of intermediate key/value pairs, and a reduce function which merge
s all the intermediate values
with the same intermediate key. Being purely functional, MapReduce operates well on



The map function serves to process a subset of the original problem and returns the result as a list
of (A,B)

The reduce function will, given an A value and a list of B values, produces a list of results that is
the final answer for all A
parts. The beauty of these two operations is that they are purely
functional and have no side
effects. What does this mean
? It means they tend to scale well, are
highly parallelizable, and extremely flexible

: MapReduce Visual Representation


This operation is linearly scalable and is massively parallelizable
; if you double the amount
of data to operate on, the operation takes twice as much time, but you can also double the size of
the cluster and have the opera
tion take half as much time
. It has been proven to work on
terabytes of data on hundreds of thousands of client machines with hundreds of simultaneous
MapReduce programs. In fact, there are estimated thousands of these oper
ations occurring on
Google’s clusters every day


Table 1: Example

used to compare row

and column
oriented operations





Breaking the Law

Judas Priest

British Steel



Iron Maiden



Kickstart My Heat

Motley Crue

Dr. Feelgood


Raining Blood


Reign in Blood


I Wanna Be Somebody





oriented databases represent objects as tuples and attributes as one

piece of information
about that object. Generally, we think of a tuple as a row and a attribute as column. This is the
primary way of thinking abou
t data in the relational model

A row
oriented database would store data
somewhat like this:

Breaking the Law, Judas Priest, British Steel, 1980,
ces High, Iron Maiden, Powerslave.
1984 …



oriented databases are tuned to operate more efficiently on a set of attributes for


And the table above in a column
oriented database:

Breaking the Law, Aces High, Kickstart My Heart, Raining Blood, I Wanna Be
Somebody, Judas Priest, Iron Maiden, Motley Crue, Slayer, W.A.S.P.

British Steel
Powerslave, Dr. Feelgood,
in Blood



These two distinctly different ways of storing data result in very different results when database
operations are performed.

To create a new tuple (CREATE) in a row
oriented database, the system would only need
append the new tuple to the end of the collection (table). In a column
oriented database, careful
insertions would need to be made to keep the data lined up correctly.

Reversely, creating a new tuple in a row
oriented database require the more expensive

and non
trivial operation, whereas it is as simple as inserting the new data at the end of a column oriented

Select statements, such as a query listing all the albums from the year 1984, are more efficient in
a row
oriented system because the qu
ery simply needs to check the 4

element of each tuple and
return the ones matching the year.

Aggregate statements, on the other hand, are more efficient in a column
oriented solution. An
operation such as summing all the years is much easier in a column

oriented system because all
the years are consecutively found and
need not

be compared to other attributes.

In summary, we can conclude that row

and column
oriented solutions both have trade
offs. A
oriented system may be best when the system
requires the creation of many rows or the
retrieval of many attributes from the same tuple at the same time. Colum
oriented systems
might be the better option when new columns are created often, and aggregations over rows, but
not columns, happen a lot of

the time



Enabling Technology

It has only been in the last decade or so that NoSQL solutions have really started to appeal to
developers and engineers. This is the result of a couple of enabling technologies i
n the hardware
sector. Specifically, advances in computation and memory and storage systems.

Today’s hard drives can reach sizes in excess of one terabyte with read speeds around 100 MB/s.
So it takes more than two and a half hours to read all the data
off the

disk. This a long time to
read data, and

it takes even longer to write.

The solution to this problem is to spread the data out
onto separate drives. By replicating the data onto, say, 100 separate drives, we can take
advantage of

ability to rea
d from multiple disks at once.

There are two problems with this approach though. First is how to deal with hardware failure, a
common occurrence. Commonly, you can use replication to store redundant copies of data so
that if there is a failure, extra copie
s are available. The second big issue is that you are going to
need to combine that data in some way eventually.
Distributed data
storage systems abstract
these problems and provide reliable, shared storage and analysis systems.


Distributed Systems


Hadoop was created by Doug Cutting, named after his son’s stuffed elephant, and was bred off
of Apache Nutch, an open
source web search engine that is part of the Lucene project

2008, Yahoo! announced that its production search
index was being powered by a 10,000 core
Hadoop cluster

In April of 2008, Hadoop broke
world record to become the fastest
system to sort a terabyte of data. It ran
on a 910

node cluster and only took 209 seconds,
beating the previous winner by 88 seconds


HBase is another Apache Software Foundation project that stemmed from the Hadoop project.
HBase is basically a Hadoop
database running over a Hadoop Distributed File System. It is
modeled after Google’s BigTable.

HBase provides nice developer tools including a Java API,
REST interface, and Thrift gateway.

HBase follows a column
oriented architecture and has easy integrati
on with MapReduce jobs
which are run i
n Hadoop. Tables can be used as

both input and output for Ma
pReduce jobs. It
data model is based on a four dimensional dictionary.

> value

HBase has excellent fault toleranc
e. It has block
level replication which is achieved from storing
files as set of blocks which are then distributed across the entire cluster. Block replication also
gives HDFS high
throughput across large data sets. By splitting individual files in 64MB bl
HDFS is able to decrease the amount of metadata storage per file and read data fast because of
sequential blocks.

HBase and the HDFS
file system

it is built on will sometimes compromise reliability in order to
achieve lower communication costs. HBase
supports heartbeat and block report periodic
messaging between nodes.


Cassandra is a “distributed storage solution for managing very large amounts of structured data
spread out across many commodity servers, while providing highly available service with no
single point of failure”
. Cassandra

is an Apache Software Foundation project now, but was
designed by Facebook to fix their Inbox Search problem. The ability of Facebook users to search
through their Inboxes required massive amounts of read and write throughput, billions of writes
per day,
and needed to be scalable for the future. Facebook created Cassandra to resolve this

problem and so far Cassandra has held to its duty by scaling well from 100 million users to 250
million users

Cassandra is a column
nted, distributed DBMS. It features a four to five dimensional key
value store with eventual consistency

The Cassandra API is extremely simple, and only exposes three methods:

Insert(table, key, rowMutation)

Get(table, ke
y, columnName)

Delete(table, key, columnName)


Cassandra has the backing of a large and popular company in Facebook, and has proven itself on
the largest scale. It is extremely simple to use and get started, yet extremely p



MySQL is one of the most popular relational database management systems in the world. It is a
core component of web development and is the ‘M’ in the LAMP (Linux Apache MySQL PHP)
acronym. You will find MySQL used in small
to medium scale single
server deployments.
MySQL does support scaling by using full replication to increase read capacity, or sharding to
increase write capacity. It also supports multiple storage engines, both native, partner developed,
and open


One popular document
oriented database is MongoDB. MongoDB is a scalable, opens
oriented database that has been around since October 2007. MongoDB is based on B
Trees and stores data as binary JSON
style documents. MongoDB scales v
ery easily horizontally
and supports sharding and replication. MongoDB is mainly used for problems with strong
transaction requirements. MongoDB is also very flexible and developer

lack of fixed
schema means modifying database definitions, col
umns, and tuples is easy.


Any software that depends on persistent storage will end up using a database one way or another,
and many times the database can and will be t
he bottleneck of an application.

It is important to

know what the implications are in choosing a database system. Today’s data storage climate is
such that traditional, SQL
based DBMSes are not the best solution to every problem. Through a
better understanding of the theory, algorithms, and implementations
behind relational and non
relation databases one will be better suited to making data
driven decisions.
For a vast account of
performance considerations that is out of the scope of this paper, the author would like to suggest
Chapter 7 in
Lith and Mattsson
’s Paper: "Investigating storage solutions for large data”. The
paper investigates relational versus non
relational databases in a specific case study and includes
pages of graphs showing specific performance benchmarks for different databases.




T. E. White, Hadoop: the definitive guide, Sebastopol, CA: O'Reill Media, Inc., 2012.


M. Stonebraker and U. Çetintemel, "“One Size Fits All”: An Idea Whose Time Has Come and


摯ⰠÇat慢a獥猠sllumi湡n敤e⁓散潮o⁅摩tio測⁏湴慲ioW g潮敳…⁂artl整t⁌ ar湩湧Ⱐ㈰ㄲ⸠


A⸠䱩.栠慮搠g⸠䵡tt獳潮Ⱐ"f湶敳tig慴i湧⁳ 潲慧攠e潬uti潮猠sorarg攠摡t愬a
Chalmers University of
pp. 1
70, 2010.


C. Hale, "You Can't Sacrifice
Partition Tolerance," 07 October 2010. [Online]. Available:
tolerance/. [Accessed 07 April 2013].


W. Vogels, "Eventually Consistent,"
ACM Queue,


S. Gilbert and N. Lynch, "Brewers conjectur
e and the feasibility of consistent, available, partition
tolerant web services," p. 12, 2002.


M. Seeger, "Key
Value stores: a practical overview,"
medien informatik,


F. Magoules, Fundamentals of Grid Computing, Boca Raton: Chapman and
Hall, 2010.


A. J. Wells, Grid Database Design, Boca Raton: Auerbach Publications, 2005.


J. D. G. C. F. Kai Hwang, Distributed and Cloud Computing: From Parallel Processing to the
Internet of Things, Morgan Kaufmann, 2011.


Lakshman and P. Malik, "Cassandra

A Decentralized Structured Storage System," Ithaca, 2009.


F. Chang, J. Dean, S. Ghemawat, W. C. Hsieh, D. A. Wallach, M. Burrows, T. Chandra, A. Fikes
and R. E. Gruber, "Bigtable: A Distributed Storage System for
Structured Data," in
Seventh Symposium on Operating System Design and Implementation
, Seattle, WA, 2006.


Figure 1: http://upload.wikimedia.org/wikipedia/en/3/35/Mapreduce_Overview.svg