AllegroGraph as a Graph Database


Nov 12, 2013 (3 years and 9 months ago)


AllegroGraph as a Graph Database

Jans Aasman, Ph.D.
CEO - Franz Inc
Ja@Franz.com


Contents

AllegroGraph as a
  QuintupleStore (well, OctupleStore in 2011)
  RDF store
  Graph Database
Agraph architecture
Extreme use cases
  AMDOCS … CRM on top of a trillion triples
  Pharmaceutical … explore connections in graph space
Demo

Agraph as a quintuple store

S, P, O, G + unique ID + transaction #
SPOG can be any data type:
  1  2.0  3  4
  2001-12-12  after  010-12-12  +19258781444
  Jans  loves  pizza  file1  12
  NoOne  believes  12
And include very efficient geospatial and temporal representations and indices
6 default indices, 24 user controlled indices
Range indexing, Freetext Indexing
Neighborhood matrices & UPI maps (for 1 ms access)
2011: time, security
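The "Jans loves pizza file1 12" and "NoOne believes 12" rows above can be read as one statement referring to another by its unique ID. A minimal sketch of that idea follows; the `Quint` and `QuintStore` names are hypothetical illustrations, not AllegroGraph API, and IDs here simply count up from 1.

```python
from dataclasses import dataclass
from itertools import count
from typing import Any, Optional


@dataclass(frozen=True)
class Quint:
    """One statement: subject, predicate, object, graph, plus a unique ID."""
    s: Any
    p: Any
    o: Any
    g: Optional[Any]
    id: int


class QuintStore:
    """Hypothetical in-memory store, only meant to illustrate the slide."""

    def __init__(self):
        self._ids = count(1)   # assumption: IDs are assigned sequentially
        self._by_id = {}

    def add(self, s, p, o, g=None):
        quint = Quint(s, p, o, g, next(self._ids))
        self._by_id[quint.id] = quint
        return quint.id

    def get(self, statement_id):
        return self._by_id[statement_id]


store = QuintStore()
fact = store.add("Jans", "loves", "pizza", g="file1")
# The object of this statement is the ID of the statement above.
doubt = store.add("NoOne", "believes", fact)
```

Because the object slot can hold any data type, a statement about a statement needs no extra reification machinery in this sketch: it just stores the other statement's ID.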


Agraph as an RDF store

RDF store when you adhere to the RDF conventions.
Full SPARQL 1.0, most of SPARQL 1.1
RDFS++ reasoner
GeoSpatial and Temporal representations.
Prolog for Rules
Soon Common Logic (CLIF+)
  As a usability layer on top of Prolog
  Easier to combine Rules and Queries


Agraph as a Graph Database

If you want a Property Graph: use the graph argument
  Jans loves pizza gr1
  gr1 weight 90
  gr1 author Sophia
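The three statements above attach properties to an edge by naming the edge through its graph slot. A small sketch of how such edge properties could be looked up, using plain Python tuples rather than any AllegroGraph client call (the `edge_properties` helper is hypothetical):

```python
# Each statement is (subject, predicate, object, graph); None means no graph name.
quads = [
    ("Jans", "loves", "pizza", "gr1"),
    ("gr1", "weight", 90, None),
    ("gr1", "author", "Sophia", None),
]


def edge_properties(quads, s, p, o):
    """Collect the properties hanging off the graph name(s) of the edge s -p-> o."""
    names = {
        g for (s2, p2, o2, g) in quads
        if (s2, p2, o2) == (s, p, o) and g is not None
    }
    return {p2: o2 for (s2, p2, o2, _g) in quads if s2 in names}


props = edge_properties(quads, "Jans", "loves", "pizza")
```

Here `props` holds the weight and author that the slide associates with `gr1`.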


Schema

Node typing          Yes
Edge typing          Yes
Attributes (nodes)   Yes
Attributes (edges)   Yes: A trusts B gr1, gr1 certainty 80.
Directed edges       Yes: A trusts B
Undirected edges     Yes: if using RDFS symmetric property or generators
Restricted edges     Yes, if it means there can be islands.
Loop edges           Yes, A loves A
Attribute indexing   Yes
Starting node        No, although, is that a DB property?
Schema               Yes and No: on demand you can use an ontology, and validation is straightforward

Querying

Language     Lisp, Prolog, JavaScript and toy version of Gremlin
Traversals   Yes, through adjacency lists and special indices. This seems to be an implementation point and not a fundamental property

Database

Transactional         Yes
ACID                  Yes
Fully Indexed         Yes
Distributed           Federation (in-machine, between machines), AG5
Cache                 Yes, adjacency vectors (neighbourhood matrices)
Embeddable            Yes: 3.3, No: 4.2.x
Store-engine          Custom
Migration framework   From RDB to Graph DB? Various
Object mapping        Only in Lisp, not in clients.

Utilities

Shell                All from Lisp shell, some from cshell, wget/curl
Algorithms           Yes, JavaScript, Prolog and Lisp
Benchmark            Yes, but only for RDF stores and reasoning
Protocols            REST/JSON
RDF Store            Yes
OWL Store            Yes
IDE Integration      Yes
Admin tool           Yes, AGWebview
Importer             Yes, from various input formats
Exporter             Yes, clients let you dump triples
Loader               AGLoad, Gruff, AGWebview
Scripting Language   Lisp and JavaScript.

Languages

Java
Python
Ruby
C#
Scala
Clojure
Perl
PHP



Many graph algorithms using generator model

Because of Social Network Analysis requirements we implement many graph algorithms.
Using generators
  A first class function that takes
    One node as input
    Returns all children
And neighbourhood matrices (or adjacency hash-tables) for speed.
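As a rough illustration of the generator model described above, the sketch below builds a generator as a first-class function (one node in, its children out) and then materialises it into an adjacency hash table. The triples and the `make_generator` and `precompute_neighbourhood` helpers are made up for this example and are not AllegroGraph functions.

```python
TRIPLES = [
    ("a", "knows", "b"),
    ("a", "knows", "c"),
    ("b", "knows", "d"),
]


def make_generator(triples, predicate):
    """Return a first-class function: one node in, list of its children out."""
    def generator(node):
        return [o for (s, p, o) in triples if s == node and p == predicate]
    return generator


def precompute_neighbourhood(generator, nodes):
    """Materialise the generator into an adjacency hash table for fast lookups."""
    return {node: generator(node) for node in nodes}


knows = make_generator(TRIPLES, "knows")
nodes = {x for (s, _p, o) in TRIPLES for x in (s, o)}
adjacency = precompute_neighbourhood(knows, nodes)
```

Any algorithm written against `generator(node)` can then run unchanged against `adjacency.__getitem__`, which is the speed-up the last bullet hints at.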


How far is Actor1 from Actor2?
  Degrees of separation
    How far is P1 from P2
  Connection strength
    How many shortest paths from P1 to P2 through a series of predicates and rules

In what groups is this actor?
  Find the ego-network around a person or thing
    Friend, friends of friends, etc.
  Find all the fully connected graphs around a person or thing
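Degrees of separation and the ego-network can both be derived from one breadth-first search over a generator. The sketch below assumes a small hypothetical friendship graph and excludes the actor from its own ego-network, which may differ from how a particular product defines it.

```python
from collections import deque

# Hypothetical undirected friendship data.
FRIENDS = {
    "ann": ["bob", "cat"],
    "bob": ["ann", "dan"],
    "cat": ["ann"],
    "dan": ["bob", "eve"],
    "eve": ["dan"],
}


def generator(node):
    return FRIENDS.get(node, [])


def distances_from(start, generator, max_depth=None):
    """Breadth-first search: map each reachable node to its hop count from start."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if max_depth is not None and dist[node] == max_depth:
            continue  # do not expand beyond the requested depth
        for nxt in generator(node):
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return dist


def degrees_of_separation(a, b, generator):
    """Number of hops from a to b, or None when b is unreachable."""
    return distances_from(a, generator).get(b)


def ego_network(actor, generator, depth):
    """Friends, friends of friends, etc., up to the given depth."""
    return set(distances_from(actor, generator, max_depth=depth)) - {actor}
```

For example, `ego_network("ann", generator, 1)` gives direct friends only, while depth 2 adds friends of friends.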


Questions in SNA: How important is an actor?

In-degree, out-degree
Actor degree centrality
  I have the most connections in a group so I am more important
Actor closeness centrality
  I have the shortest paths to everyone else in the group so I am more important
Actor betweenness centrality
  I am more often on the shortest path between other people in the group so I am more important. I can control flow of information better than other people
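The three actor measures above have standard textbook definitions, sketched here for an undirected, connected group. This is not AllegroGraph's implementation: the graph is hypothetical, closeness is normalised as (n - 1) divided by the sum of distances, and betweenness uses Brandes' algorithm with each unordered pair counted once.

```python
from collections import deque

# Hypothetical undirected group: "hub" is connected to everyone else.
GROUP = {
    "hub": ["x", "y", "z"],
    "x": ["hub"],
    "y": ["hub"],
    "z": ["hub"],
}


def degree_centrality(graph, actor):
    """Number of direct connections of the actor."""
    return len(graph[actor])


def bfs_distances(graph, start):
    dist = {start: 0}
    queue = deque([start])
    while queue:
        v = queue.popleft()
        for w in graph[v]:
            if w not in dist:
                dist[w] = dist[v] + 1
                queue.append(w)
    return dist


def closeness_centrality(graph, actor):
    """(reachable nodes - 1) / sum of shortest-path lengths from the actor."""
    dist = bfs_distances(graph, actor)
    total = sum(dist.values())
    return (len(dist) - 1) / total if total else 0.0


def betweenness_centrality(graph):
    """Brandes' algorithm for unweighted graphs; undirected pairs counted once."""
    score = dict.fromkeys(graph, 0.0)
    for s in graph:
        stack, preds = [], {v: [] for v in graph}
        sigma = dict.fromkeys(graph, 0)   # number of shortest paths from s
        sigma[s] = 1
        dist = {s: 0}
        queue = deque([s])
        while queue:
            v = queue.popleft()
            stack.append(v)
            for w in graph[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = dict.fromkeys(graph, 0.0)
        while stack:                      # accumulate dependencies, farthest first
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                score[w] += delta[w]
    return {v: c / 2 for v, c in score.items()}
```

In the star-shaped group, "hub" scores highest on all three measures, matching the intuition on the slide.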


Has the group a leader, is the group cohesive?

Group centralization
  How centralized is this group?
  Does this group have a leader
  Is there someone controlling the information flow
Group cohesiveness
  How strong and well connected is this group
  Are most people connected
  What is the density
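Two common ways to put numbers on these group questions are graph density and Freeman's degree centralization. The sketch below uses those textbook formulas for an undirected group of at least three nodes; whether AllegroGraph computes its group measures exactly this way is an assumption.

```python
# Hypothetical undirected groups, as adjacency lists.
STAR = {
    "hub": ["x", "y", "z"],
    "x": ["hub"],
    "y": ["hub"],
    "z": ["hub"],
}
COMPLETE = {
    "a": ["b", "c", "d"],
    "b": ["a", "c", "d"],
    "c": ["a", "b", "d"],
    "d": ["a", "b", "c"],
}


def density(graph):
    """Fraction of possible undirected edges that are present: 2E / (n(n-1))."""
    n = len(graph)
    edges = sum(len(nbrs) for nbrs in graph.values()) / 2
    return 2 * edges / (n * (n - 1))


def degree_centralization(graph):
    """Freeman degree centralization: 1.0 for a star, 0.0 when all degrees are equal."""
    n = len(graph)
    degrees = [len(nbrs) for nbrs in graph.values()]
    top = max(degrees)
    return sum(top - d for d in degrees) / ((n - 1) * (n - 2))
```

A star has an obvious leader (centralization 1.0) but is only half as dense as it could be, while a fully connected group has no leader and density 1.0.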





All search and SNA functions use Generators

Generator
  Input: one node
  Output: list of nodes
  Fully functional, can be complex SPARQL or Prolog queries
  Or just predicates and indication of direction





How to get from A to E??

subj  pred         obj
a     dinner-with  b
a     kissed-with  c
c     movie-with   e
b     kissed-with  d
d     movie-with   e
e     dinner-with  a

(defgenerator knows (node)
  (objects-of :p dinner-with))

(defgenerator knows (node)
  (objects-of :p dinner-with)
  (subjects-of :p dinner-with))

How to get from A to E??

(defgenerator knows ()
  (object-of :p dinner-with)
  (subject-of :p dinner-with)
  (object-of :p movie-with)
  (subject-of :p movie-with)
  (object-of :p kissed-with)
  (subject-of :p kissed-with))

(defgenerator knows ()
  (undirected (dinner-with movie-with kissed-with)))
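To see how the choice of generator changes the answer to "how to get from A to E", here is a small Python imitation of the idea using the six triples from the slide. The `objects_of` and `undirected` helpers only mimic the clauses named above, and `shortest_path` is an ordinary breadth-first search, not an AllegroGraph call.

```python
from collections import deque

TRIPLES = [
    ("a", "dinner-with", "b"),
    ("a", "kissed-with", "c"),
    ("c", "movie-with", "e"),
    ("b", "kissed-with", "d"),
    ("d", "movie-with", "e"),
    ("e", "dinner-with", "a"),
]
ALL = {"dinner-with", "movie-with", "kissed-with"}


def objects_of(predicates):
    """Generator that follows the given predicates from subject to object only."""
    def gen(node):
        return [o for (s, p, o) in TRIPLES if s == node and p in predicates]
    return gen


def undirected(predicates):
    """Generator that follows the given predicates in both directions."""
    def gen(node):
        outgoing = [o for (s, p, o) in TRIPLES if s == node and p in predicates]
        incoming = [s for (s, p, o) in TRIPLES if o == node and p in predicates]
        return outgoing + incoming
    return gen


def shortest_path(start, goal, generator):
    """Breadth-first search; returns one shortest path as a list, or None."""
    parents = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            path = []
            while node is not None:
                path.append(node)
                node = parents[node]
            return path[::-1]
        for nxt in generator(node):
            if nxt not in parents:
                parents[nxt] = node
                queue.append(nxt)
    return None
```

Following only outgoing dinner-with edges never reaches e, following all three predicates in the forward direction finds a two-step route, and the undirected generator finds the direct link created by "e dinner-with a".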

Declaratively specify

(generator knows (node)
  (select (?x)
    (q ??node movie-with ?x)
    (q ??node dinner-with ?x)
    (not (q ??node kissed-with ?x)))
  (select (?x)
    (q ?x movie-with ??node)
    (q- ?x dinner-with ??node)
    (not (q- ?x kissed-with ??node))))
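One way to read the declarative generator above is: a neighbour is anyone who shares both a movie-with and a dinner-with statement with the node, in either direction, but no kissed-with statement in that direction. The sketch below encodes that reading in plain Python; treating the clauses inside each select as a conjunction is an interpretation, and the data is invented.

```python
# Hypothetical data for illustration only.
TRIPLES = [
    ("jans", "movie-with", "ann"),
    ("jans", "dinner-with", "ann"),
    ("jans", "movie-with", "bea"),
    ("jans", "dinner-with", "bea"),
    ("jans", "kissed-with", "bea"),
    ("cy", "movie-with", "jans"),
    ("cy", "dinner-with", "jans"),
]


def knows(node, triples=TRIPLES):
    """Union of a forward and a backward pattern, each with a negated clause."""
    facts = set(triples)
    candidates = {x for (s, _p, o) in triples for x in (s, o)}
    forward = {
        x for x in candidates
        if (node, "movie-with", x) in facts
        and (node, "dinner-with", x) in facts
        and (node, "kissed-with", x) not in facts
    }
    backward = {
        x for x in candidates
        if (x, "movie-with", node) in facts
        and (x, "dinner-with", node) in facts
        and (x, "kissed-with", node) not in facts
    }
    return sorted(forward | backward)
```

With this data, "bea" is filtered out by the negated clause while "cy" is picked up through the backward pattern.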

Sample SNA functions

(Ego-group actor generator depth ?group)
  - binds ?group to group of nodes
(Ego-group-members actor generator depth ?a)
  - binds ?a to every member in the group
(Cliques actor generator min-depth ?cl)
  - binds ?cl to all cliques
(Clique-members actor generator min-depth ?cl ?a)
  - binds ?cl to cliques and then iterates over every member ?a in ?cl
(Actor-centrality actor group generator ?num)
  - binds ?num to the actor centrality
(Actor-centrality-members group ?actor ?num)
  - binds ?actor to every actor in group, ?num is the centrality of that actor; we start with the actor with highest centrality.
(Group-centrality group generator ?num)

Actor = single node
Group = list of nodes
Depth = number
Generator = generator

Integrated in Prolog and Common Logic (CLIF)

(defgenerator knows (node)
  (undirected :p (!fr:dinner-with !fr:kissed-with)))

(select (?x)
  (ego-group-members !person:jans knows ?x 2)
  (q ?x !geo:place ?y)
  (geo-box-around !geoname:Berkeley ?y 5 miles))

(select (?x)
  (ego-group !person:jans knows ?group 2)
  (actor-centrality-members ?group knows ?x ?num)
  (q ?x !geo:place ?y)
  (geo-box-around !geoname:Berkeley ?y 5 miles))
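The first query above combines a social constraint (within two steps of Jans) with a geospatial one (within a box around Berkeley). A rough Python equivalent is sketched below. The people, coordinates, and the flat bounding-box approximation (about 69 miles per degree of latitude) are all assumptions for illustration, and the ego group here excludes the actor.

```python
import math
from collections import deque

# Hypothetical social graph and locations (latitude, longitude).
KNOWS = {
    "jans": ["ann"],
    "ann": ["jans", "bob"],
    "bob": ["ann", "cy"],
    "cy": ["bob"],
}
PLACE = {
    "ann": (37.83, -122.29),
    "bob": (37.34, -121.89),
    "cy": (37.87, -122.27),
}
BERKELEY = (37.87, -122.27)


def ego_group_members(actor, generator, depth):
    """Everyone reachable from actor in at most `depth` steps, excluding actor."""
    dist = {actor: 0}
    queue = deque([actor])
    while queue:
        node = queue.popleft()
        if dist[node] == depth:
            continue
        for nxt in generator(node):
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return set(dist) - {actor}


def in_box_around(center, point, miles):
    """Approximate bounding-box test, assuming about 69 miles per degree of latitude."""
    lat_span = miles / 69.0
    lon_span = miles / (69.0 * math.cos(math.radians(center[0])))
    return (abs(point[0] - center[0]) <= lat_span
            and abs(point[1] - center[1]) <= lon_span)


members = ego_group_members("jans", lambda n: KNOWS.get(n, []), 2)
nearby = sorted(
    x for x in members
    if x in PLACE and in_box_around(BERKELEY, PLACE[x], 5)
)
```

Note that "cy" is located in the box but is three steps away from Jans, so only the combination of both constraints gives the intended result.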

Where do we use this?

Amdocs: Know everything about every customer
  Partitioned on customer
  Most graph search centered in client
Pfizer: help me find connections between drugs, diseases, genes, side effects in a sea of clinical trials
  Just a mess of data
  All graph search in server

Traditional Business Intelligence

Can tell you ALL about the average customer
but NOTHING about the individual.




Can you in < 1 second with one push of a button

Predict the three most likely reasons why Joe Smith from Kansas is calling the call center? Bill unexpectedly high, losing connection too often, doesn’t know how to use new subscription service?
The ten last events that happened for JS? Phone calls, SMS, downloads of movie, device stopped working, payment of bill, looking at map, search for local store.
What is the likelihood that he will change from T-Mobile to Sprint or AT&T?
What are his ten most important friends and what devices do they have? And who is the first to change and who follows?


Can you in < 1 second with one push of a button

What are the usual daily locations for this person? What kind of shops?
What kind of services does he download, what kind of movies/music/games does he like, what products does he buy?
Is his plan the right plan for him?
Is he in a good mood?
Is he a valuable customer, is he a good payer, what is your margin on him, how many times per month does he call a call center, does he look up help for mail on the internet? Can you predict if he is going to pay the bill?




[Architecture diagram; labels: Events, Decision Engine, Container, Actions, SBA Application Server, “Sesame”, AllegroGraph Triple Store DB, Event Ingestion, Scheduled Events, Inference Engine (Business Rules), Bayesian Belief Network, Operational Systems, Event Data Sources, Amdocs Event Collector, CRM, RM, Amdocs Integration Framework, OMS, NW, Web 2.0]

Work for Pharma

sider

Gruff Demo

What about Scalability





Architecture overview

Storage layer (compression, indexing, freetext, transactions)
Session Management, Query Engine, Federation
REST
Backup/Restore, Replication, Warm Failover, Security, Management
SPARQL, Prolog, Rules, CLIF++, Geo, SNA, Time, RDFS+
JavaScript
Java: Sesame, Jena
Python, Ruby, C#, Clojure, Scala, Perl


Thanks…