Data Modeling with Neo4j

Dec 13, 2013

Stefan Armbruster, Neo Technology
(slides from Michael Hunger)
Neo4j is a NOSQL Graph Database
A graph database...
NO: not for charts & diagrams, or vector artwork
YES: for storing data that is structured as a graph
remember linked lists, trees?
graphs are the general-purpose data structure

“A relational database may tell you the average age of everyone in this place, but a graph database will tell you who is most likely to buy you a beer.”
You know relational
[Figure: tables foo, bar and foo_bar]
now consider relationships...
We're talking about a Property Graph
Properties (each a key+value)
+ Indexes (for easy look-ups)
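A property graph can be sketched in a few lines of Python. This is a toy in-memory structure for illustration, not the Neo4j API; the class name, method names, the `People` index name and the `since` property are all assumptions made for the example.

```python
from collections import defaultdict


class PropertyGraph:
    """Toy in-memory property graph; illustrative only, not the Neo4j API."""

    def __init__(self):
        self.nodes = {}                   # node id -> {key: value} properties
        self.relationships = []           # (start id, type, end id, properties)
        self.indexes = defaultdict(dict)  # index name -> {property value: node id}

    def add_node(self, node_id, **properties):
        self.nodes[node_id] = properties
        return node_id

    def add_relationship(self, start, rel_type, end, **properties):
        self.relationships.append((start, rel_type, end, properties))

    def index_node(self, index_name, key, node_id):
        # Record the node under its value for `key` so it can be found quickly.
        self.indexes[index_name][self.nodes[node_id][key]] = node_id

    def lookup(self, index_name, value):
        # Returns the node id, or None when nothing is indexed under `value`.
        return self.indexes[index_name].get(value)


graph = PropertyGraph()
andreas = graph.add_node(0, name="Andreas")
peter = graph.add_node(1, name="Peter")
graph.add_relationship(andreas, "KNOWS", peter, since=2010)  # `since` is made up
graph.index_node("People", "name", andreas)
found = graph.lookup("People", "Andreas")
```

Nodes and relationships both carry key+value properties; the index exists only to find a node by a property value.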
Aggregate vs. Connected Data-Model
NOSQL Databases

Aggregate Oriented Model
“There is a significant downside - the whole approach works really well when data access is aligned with the aggregates, but what if you want to look at the data in a different way? Order entry naturally stores orders as aggregates, but analyzing product sales cuts across the aggregate structure. The advantage of not using an aggregate structure in the database is that it allows you to slice and dice your data different ways for different audiences. This is why aggregate-oriented stores talk so much about map-reduce.”
Martin Fowler
Connected Data Model
The connected data model is based on fine-grained elements that are richly connected; the emphasis is on extracting many dimensions and attributes as elements. Connections are cheap and can be used not only for the domain-level relationships but also for additional structures that allow efficient access for different use-cases. The fine-grained model requires an external scope for mutating operations that ensures Atomicity, Consistency, Isolation and Durability - ACID, also known as Transactions.
Michael Hunger
Data Modeling
Why Data Modeling

What is modeling?

Aren't we schema free?

How does it work in a graph?

Where should modeling happen? DB or Application
Data Models
Trinity of models
[Figure labels: Real World Model, Application Model, Database Model; "Model mis-match" appears twice]
Whiteboard --> Data
[Figure: Andreas, Peter, Emil and Allison, connected by four "knows" relationships]
// Cypher query - friend of a friend
start n=node(0)
match (n)--()--(foaf)
return foaf
You traverse the graph
// lookup starting point in an index
START n=node:People(name = 'Andreas')
// then traverse to find results
START me=node:People(name = 'Andreas')
MATCH (me)-[:FRIEND]-(friend)-[:FRIEND]-(friend2)
RETURN friend2
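The same two steps (find a starting node, then follow FRIEND twice) can be sketched in plain Python. The friendship edges below are hypothetical, since the deck does not spell out who knows whom, and the dictionary stands in for both the index and the graph store.

```python
# Hypothetical friendship data; the exact edges on the whiteboard are not given.
FRIEND = {
    "Andreas": {"Peter", "Emil"},
    "Peter": {"Andreas", "Allison"},
    "Emil": {"Andreas", "Allison"},
    "Allison": {"Peter", "Emil"},
}


def friends_of_friends(name):
    """Follow FRIEND twice from the starting node, ignoring direction."""
    me = name  # step 1: the dictionary key plays the role of the index lookup
    result = set()
    for friend in FRIEND.get(me, ()):           # first hop
        for friend2 in FRIEND.get(friend, ()):  # second hop
            if friend2 != me:  # do not walk back along the same relationship
                result.add(friend2)
    return result


foaf = friends_of_friends("Andreas")
```

Unlike the Cypher query, which returns one row per matching path, this sketch collects the results in a set, so each person appears once.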
SELECT skills.*, user_skill.*
FROM users
JOIN user_skill ON users.id = user_skill.user_id
JOIN skills ON user_skill.skill_id = skills.id
WHERE users.id = 1
START user = node(1)
MATCH user -[user_skill]-> skill
RETURN skill, user_skill
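To try the relational side of this comparison, the sketch below runs the join against an in-memory SQLite database and then reads the same answer from a small adjacency structure. The sample rows and the `level` column are made up for illustration, and the dictionary is only a stand-in for a graph store.

```python
import sqlite3

# Relational form: two entity tables plus a join table.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE skills (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE user_skill (user_id INTEGER, skill_id INTEGER, level TEXT);
    INSERT INTO users VALUES (1, 'Andreas'), (2, 'Peter');
    INSERT INTO skills VALUES (10, 'Cypher'), (11, 'SQL');
    INSERT INTO user_skill VALUES (1, 10, 'expert'), (1, 11, 'basic'),
                                  (2, 11, 'expert');
""")
rows = conn.execute("""
    SELECT skills.name, user_skill.level
    FROM users
    JOIN user_skill ON users.id = user_skill.user_id
    JOIN skills ON user_skill.skill_id = skills.id
    WHERE users.id = 1
    ORDER BY skills.name
""").fetchall()

# Graph-like form: each user holds its outgoing relationships directly,
# and the relationship carries the property that lived in the join table.
USER_SKILL = {
    1: [("Cypher", {"level": "expert"}), ("SQL", {"level": "basic"})],
    2: [("SQL", {"level": "expert"})],
}
graph_rows = sorted((skill, rel["level"]) for skill, rel in USER_SKILL[1])
```

Both forms give the same skills for user 1; the difference is that the second one starts at the user and follows its relationships instead of matching keys across three tables.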
An Example
What language do they speak here?
Language
Country
Tables
Language: language_code, language_name, word_count
Country: country_code, country_name, flag_uri
Need to model the relationship
Language: language_code, language_name, word_count
Country: country_code, country_name, flag_uri, language_code
What if the cardinality changes?
Language: language_code, language_name, word_count, country_code
Country: country_code, country_name, flag_uri
Or we go many-to-many?
Language: language_code, language_name, word_count
Country: country_code, country_name, flag_uri
LanguageCountry: language_code, country_code
Or we want to qualify the relationship?
Language: language_code, language_name, word_count
Country: country_code, country_name, flag_uri
LanguageCountry: language_code, country_code, primary
Start talking about Graphs
Explicit Relationship
Nodes: Language (name, word_count), Country (name, flag_uri)
Relationship: IS_SPOKEN_IN
Relationship Properties
Nodes: Language (name, word_count), Country (name, flag_uri)
Relationship: IS_SPOKEN_IN (as_primary)
What’s different?
Language: language_code, language_name, word_count
Country: country_code, country_name, flag_uri
LanguageCountry: language_code, country_code, primary
Relationship: IS_SPOKEN_IN

Implementation of maintaining relationships is left up to the database

Artificial keys disappear or are unnecessary

Relationships get an explicit name

Relationships can be navigated in both directions
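A minimal sketch of what "navigated in both directions" can mean in practice: each relationship is stored once and registered at both of its end nodes, so no join table or artificial key is needed. The helper names, the sample countries and the `as_primary` values are assumptions for illustration, not how Neo4j stores data internally.

```python
from collections import defaultdict

outgoing = defaultdict(list)  # start node -> relationships leaving it
incoming = defaultdict(list)  # end node -> relationships arriving at it


def relate(start, rel_type, end, **properties):
    """Store one named relationship and register it at both end nodes."""
    relationship = (start, rel_type, end, properties)
    outgoing[start].append(relationship)
    incoming[end].append(relationship)


relate("English", "IS_SPOKEN_IN", "Canada", as_primary=True)
relate("French", "IS_SPOKEN_IN", "Canada", as_primary=True)
relate("French", "IS_SPOKEN_IN", "France", as_primary=True)


def countries_speaking(language):
    # Follow IS_SPOKEN_IN in its stored direction.
    return sorted(end for _, kind, end, _ in outgoing[language]
                  if kind == "IS_SPOKEN_IN")


def languages_spoken_in(country):
    # Follow the very same relationships against their direction.
    return sorted(start for start, kind, _, _ in incoming[country]
                  if kind == "IS_SPOKEN_IN")
```

Calling `countries_speaking("French")` and `languages_spoken_in("Canada")` answers both questions from one set of stored relationships.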
Relationship specialisation
Nodes: Language (name, word_count), Country (name, flag_uri)
Relationship: IS_SPOKEN_IN (as_primary)
Bidirectional relationships
Nodes: Language (name, word_count), Country (name, flag_uri)
Relationships: IS_SPOKEN_IN, PRIMARY_LANGUAGE
Weighted relationships
Nodes: Language (name, word_count), Country (name, flag_uri)
Relationship: POPULATION_SPEAKS (population_fraction)
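As a rough illustration of how a weight on a relationship might be used, the sketch below ranks the languages of a country by `population_fraction`. The fractions are placeholders, not real statistics, and the direction (country to language) is an assumption.

```python
# (country, language, population_fraction); the fractions are made-up placeholders.
POPULATION_SPEAKS = [
    ("Canada", "English", 0.7),
    ("Canada", "French", 0.3),
    ("France", "French", 1.0),
]


def languages_by_weight(country):
    """Return (language, fraction) pairs for a country, heaviest weight first."""
    matches = [(language, fraction)
               for c, language, fraction in POPULATION_SPEAKS
               if c == country]
    return sorted(matches, key=lambda pair: pair[1], reverse=True)


ranking = languages_by_weight("Canada")
```

The weight lives on the relationship, not on either node, because it describes the pairing of one country with one language.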
Keep on adding relationships
Nodes: Language (name, word_count), Country (name, flag_uri)
Relationships: POPULATION_SPEAKS (population_fraction), SIMILAR_TO, ADJACENT_TO
EMBRACE the paradigm
Use the building blocks

Nodes

Relationships

Properties
name: value
RELATIONSHIP_NAME
Anti-pattern: rich properties
name: “Canada”
languages_spoken: “[ ‘English’, ‘French’ ]”
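One way to undo this anti-pattern is to unpack the encoded list into separate language nodes and relationships. The sketch below assumes the property holds a Python-style list literal with straight quotes and reuses the IS_SPOKEN_IN name from the earlier slides; the function name is hypothetical.

```python
import ast

# The anti-pattern: a list hidden inside a string property.
rich_country = {"name": "Canada", "languages_spoken": "['English', 'French']"}


def normalize(country):
    """Split the encoded list into language nodes plus relationships."""
    languages = ast.literal_eval(country["languages_spoken"])
    country_node = {key: value for key, value in country.items()
                    if key != "languages_spoken"}
    language_nodes = [{"name": language} for language in languages]
    relationships = [(language, "IS_SPOKEN_IN", country["name"])
                     for language in languages]
    return country_node, language_nodes, relationships


country_node, language_nodes, relationships = normalize(rich_country)
```

After the split, a question such as "which countries speak French?" becomes a traversal instead of a string search inside a property.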
Normalize Nodes
Anti-Pattern: Node represents multiple concepts
Country: name, flag_uri, language_name, number_of_words, yes_in_language, no_in_language, currency_code, currency_name
Split up in separate concepts
Nodes: Country (name, flag_uri), Language (name, number_of_words, yes, no), Currency (currency_code, currency_name)
Relationships: SPEAKS, USES_CURRENCY
Challenge: Property or Relationship?

Can every property be replaced by a relationship?

Should all entities with the same property values be connected?
Object Mapping

Similar to how you would map objects to a relational
database, using an ORM such as Hibernate

Generally simpler and easier to reason about

Examples

Java: Spring Data Graph

Ruby: Active Model

Why Map?

Do you use mapping because you are scared of
SQL?

Following DDD, could you write your repositories
directly against the graph API?
CONNECT for fast access
In-Graph Indices
Relationships for querying

like in other databases

same structure for different use-cases (OLTP and OLAP) doesn't work

graph allows: add more structures

Relationships should be the primary means to access nodes in the database

Traversing relationships is cheap – that’s the whole design goal of a graph database

Use lookups only to find starting nodes for a query
Data Modeling examples in Manual
Anti-pattern: unconnected graph
[Figure: many unconnected nodes, each with name: “Jones”]
Pattern: Linked List
Pattern: Multiple Relationships
Pattern-Trees: Tags and Categories
Pattern-Tree: Multi-Level-Tree
Pattern-Trees: R-Tree (spatial)
Example: Activity Stream
Graph Evolution
Evolution: Relationship to Node
Before: SENT_EMAIL
After: EMAIL_FROM, EMAIL_TO, EMAIL_CC, TAGGED, . . .
see Hyperedges
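The evolution from a relationship to a node can be sketched as a small transformation: a single SENT_EMAIL relationship between two people becomes an email node that several people attach to. The function name, the relationship directions and the sample subject are assumptions for illustration only.

```python
def promote_to_node(relationship, email_id, cc=()):
    """Turn one SENT_EMAIL relationship into an email node with its own links."""
    sender, rel_type, recipient, properties = relationship
    if rel_type != "SENT_EMAIL":
        raise ValueError("expected a SENT_EMAIL relationship")
    # Properties that sat on the relationship move onto the new node.
    email_node = {"id": email_id, **properties}
    relationships = [
        (email_id, "EMAIL_FROM", sender),
        (email_id, "EMAIL_TO", recipient),
    ]
    # A node can connect to any number of people, which one relationship cannot.
    relationships += [(email_id, "EMAIL_CC", person) for person in cc]
    return email_node, relationships


old = ("Andreas", "SENT_EMAIL", "Peter", {"subject": "Graphs"})  # made-up sample
email, links = promote_to_node(old, "email-1", cc=["Emil"])
```

Once the email is a node, further relationships such as TAGGED can be added to it without touching the people it connects.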
Combine multiple Domains in a Graph

you start with a single domain

add more connected domains as your system evolves

more domains allow you to ask different queries

one domain “indexes” the other

Example: Facebook Graph Search

social graph

location graph

activity graph

favorite graph

...
Notes on the Graph Data Model

Schema free, but constraints

Model your graph with a whiteboard and a wise man

Nodes as main entities but useless without connections

Relationships are first-class citizens in the model and database

Normalize more than in a relational database

use meaningful relationship-types, not generic ones like IS_

use in-graph structures to allow different access paths

evolve your graph to your needs, incremental growth
How to get started?

Documentation

neo4j.org

http://www.neo4j.org/learn/nosql

docs.neo4j.org - tutorials+reference

Data Modeling Examples

http://console.neo4j.org

Neo4j in Action

Good Relationships

Worldwide one-day Neo4j Trainings

Get Neo4j

http://neo4j.org/download

http://addons.heroku.com/neo4j/

Participate

http://groups.google.com/group/neo4j

http://neo4j.meetup.com

a session like this one ;)
Really, once you start thinking in graphs it's hard to stop
Recommendations
MDM
Systems Management
Geospatial
Social computing
Business intelligence
Biotechnology
Making Sense of all that data
your brain
access control
linguistics
catalogs
genealogy
routing
compensation
market vectors
What will you build?
Thank You!
Questions?