1
On Querying Historical Evolving Graph Sequences
Chenghui Ren$, Eric Lo*, Ben Kao$, Xinjie Zhu$, Reynold Cheng$
$ The University of Hong Kong
$ {chren, kao, xjzhu, ckcheng}@cs.hku.hk
* Hong Kong Polytechnic University
* ericlo@comp.polyu.edu.hk
2
Motivation
Graphs are widely used to model the world
The world is ever-changing, and graphs evolve with time
3
Motivation
How does the importance of a vertex change?
E.g., closeness centrality
Evolving Graph Sequence (EGS)
4
Motivation
How does the shortest path between a and e change?
Evolving Graph Sequence (EGS)
5
Example Study on Facebook EGS: Shortest Path Query
[Figure: shortest-path distance (y-axis, 0 to 4.5) versus snapshot number (x-axis, 0 to 400); labeled values: 365, 304, 186, 178]
The shortest-path distances between two particular Facebook users over a one-year period (365 snapshots)
Key moments: their distance changed
How did they get closer?
6
Problem Definition
Evolving Graph Sequence (EGS)
Problem: Given a query (e.g., shortest path between a and e), find the solution for each snapshot in the EGS
7
Issues of Querying EGS
We are interested in EGSs whose snapshot graphs are:
a) Large
b) Numerous
c) Gradually evolving
Example: Facebook EGS
a) 60,000 vertices, 900,000 edges
b) 365 snapshots
c) 99%+ edges in common
We need:
Efficient algorithms to process queries on EGSs
Effective storage models to store EGSs
8
Outline
Introduction
Solution framework
Storage models
Experimental evaluation
Conclusions
9
Baseline Algorithm
Baseline algorithm: run a traditional algorithm directly on each snapshot in an EGS
E.g., breadth-first search for a shortest path query
Not efficient: graphs in an EGS are usually large and numerous
Our goal: exploit graph redundancies in an EGS to make query processing faster
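The baseline described on this slide can be sketched as follows. This is a minimal illustration, assuming each snapshot is an unweighted, undirected graph stored as a dictionary of neighbor sets; the names `bfs_shortest_path`, `baseline_query`, and the three-snapshot EGS are hypothetical and not taken from the slides.

```python
from collections import deque

def bfs_shortest_path(adj, source, target):
    """Return one shortest path as a list of vertices, or None if unreachable."""
    parent = {source: None}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        if u == target:
            path = []
            while u is not None:      # walk parent links back to the source
                path.append(u)
                u = parent[u]
            return path[::-1]
        for v in adj.get(u, ()):
            if v not in parent:
                parent[v] = u
                queue.append(v)
    return None

def baseline_query(snapshots, source, target):
    """Baseline: run BFS independently on every snapshot of the EGS."""
    return [bfs_shortest_path(adj, source, target) for adj in snapshots]

# Hypothetical EGS in which a and e move closer over time.
g1 = {"a": {"b"}, "b": {"a", "c"}, "c": {"b", "e"}, "e": {"c"}}
g2 = {"a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b", "e"}, "e": {"c"}}
g3 = {"a": {"b", "c", "e"}, "b": {"a", "c"}, "c": {"a", "b", "e"}, "e": {"a", "c"}}

paths = baseline_query([g1, g2, g3], "a", "e")
distances = [len(p) - 1 for p in paths]
print(distances)
```

Each snapshot is traversed from scratch even though consecutive snapshots share most of their edges, which is the redundancy the slide proposes to exploit.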
10
Find-Verify-Fix (FVF) Framework
An EGS
11
Find-Verify-Fix (FVF) Framework
12
Preprocessing: Construct Representative Graphs
13
Preprocessing: Cluster Analysis
Segmentation clustering algorithm:
A cluster consists of successive snapshots
A cluster satisfies:
EGS
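One way such a segmentation could work is sketched below. The cluster condition itself is not recoverable from the slide, so this sketch assumes a condition: a cluster may grow while the similarity between the intersection and the union of its snapshots' edge sets stays at or above a threshold. The similarity formula `2|Ea ∩ Eb| / (|Ea| + |Eb|)`, the greedy strategy, and all names are assumptions for illustration only.

```python
def ges(edges_a, edges_b):
    """Assumed similarity between two edge sets: 2|Ea & Eb| / (|Ea| + |Eb|)."""
    total = len(edges_a) + len(edges_b)
    return 1.0 if total == 0 else 2 * len(edges_a & edges_b) / total

def segment_clusters(snapshots, threshold):
    """Greedily split a sequence of edge sets into clusters of successive snapshots.

    Returns tuples (start, end_exclusive, common_edges, all_edges).
    """
    clusters = []
    if not snapshots:
        return clusters
    start = 0
    common, union = set(snapshots[0]), set(snapshots[0])
    for i in range(1, len(snapshots)):
        new_common = common & snapshots[i]
        new_union = union | snapshots[i]
        if ges(new_common, new_union) >= threshold:
            common, union = new_common, new_union   # snapshot i joins the cluster
        else:
            clusters.append((start, i, common, union))
            start = i                               # snapshot i opens a new cluster
            common, union = set(snapshots[i]), set(snapshots[i])
    clusters.append((start, len(snapshots), common, union))
    return clusters

def edge(u, v):
    """Undirected edge as a hashable pair."""
    return frozenset((u, v))

s1 = {edge("a", "b"), edge("b", "c"), edge("c", "d")}
s2 = s1 | {edge("d", "e")}
s3 = {edge("a", "b"), edge("x", "y")}

spans_loose = [(c[0], c[1]) for c in segment_clusters([s1, s2, s3], 0.8)]
spans_strict = [(c[0], c[1]) for c in segment_clusters([s1, s2, s3], 1.0)]
print(spans_loose, spans_strict)
```

In this toy input, raising the threshold from 0.8 to 1.0 splits the sequence into more, smaller clusters.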
14
Query Processing Phase
Type of queries we use FVF to solve:
Shortest path
Closeness centrality
Graph diameter
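For reference, the last two query types can be computed per snapshot with plain breadth-first search, as in the sketch below. It assumes unweighted, undirected snapshots stored as dictionaries of neighbor sets, and it assumes closeness centrality is defined as the number of other reachable vertices divided by the sum of their distances; the slides do not give a definition, and all names are hypothetical.

```python
from collections import deque

def bfs_distances(adj, source):
    """Hop distances from source to every reachable vertex."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj.get(u, ()):
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def closeness_centrality(adj, v):
    """Assumed definition: (reachable vertices other than v) / (sum of distances)."""
    dist = bfs_distances(adj, v)
    total = sum(dist.values())
    return (len(dist) - 1) / total if total > 0 else 0.0

def diameter(adj):
    """Largest finite shortest-path distance between any two vertices."""
    return max((max(bfs_distances(adj, v).values()) for v in adj), default=0)

# Hypothetical two-snapshot EGS: a path a-b-c, then a triangle.
path_graph = {"a": {"b"}, "b": {"a", "c"}, "c": {"b"}}
triangle = {"a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b"}}

closeness_of_a = [closeness_centrality(g, "a") for g in (path_graph, triangle)]
diameters = [diameter(g) for g in (path_graph, triangle)]
print(closeness_of_a, diameters)
```

Evaluating these functions on every snapshot yields the time series that the motivation slides ask about, such as how the importance of a vertex changes.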
15
Shortest Path Query Processing
FIND
Representative Solutions
16
Shortest Path Query Processing
VERIFY
Representative Solutions
Bounding property:
17
Shortest Path Query Processing
VERIFY
Representative Solutions
[Figure: verification outcome per snapshot: √ × × ×]
18
Shortest Path Query Processing
VERIFY
Representative Solutions
[Figure: verification outcome per snapshot: √ √ ×]
19
Shortest Path Query Processing
FIX
Representative Solutions
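The find, verify, and fix steps for a shortest path query can be sketched as below. The slides do not say which representative graphs or which bounding property are used, so this sketch makes its own assumptions: the representatives of a cluster are the intersection graph and the union graph of its snapshots, and for every snapshot the distance lies between the union-graph distance (lower bound) and the intersection-graph distance (upper bound). Graphs are assumed unweighted, undirected, and free of self-loops; all names are hypothetical.

```python
from collections import deque

def bfs_path(adj, source, target):
    """One shortest path as a vertex list, or None if unreachable."""
    parent = {source: None}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        if u == target:
            path = []
            while u is not None:
                path.append(u)
                u = parent[u]
            return path[::-1]
        for v in adj.get(u, ()):
            if v not in parent:
                parent[v] = u
                queue.append(v)
    return None

def edges_of(adj):
    return {frozenset((u, v)) for u in adj for v in adj[u]}

def to_adj(edges):
    adj = {}
    for e in edges:
        u, v = tuple(e)
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj

def path_in(path, edge_set):
    """True if every consecutive pair of the path is an edge of the snapshot."""
    return all(frozenset(p) in edge_set for p in zip(path, path[1:]))

def fvf_shortest_paths(cluster, source, target):
    edge_sets = [edges_of(g) for g in cluster]
    # FIND: solve the query once on each assumed representative graph.
    p_common = bfs_path(to_adj(set.intersection(*edge_sets)), source, target)
    p_union = bfs_path(to_adj(set.union(*edge_sets)), source, target)
    answers, fixed = [], 0
    for g, edges in zip(cluster, edge_sets):
        # VERIFY: a representative path is optimal if it meets the lower bound.
        if p_union is None:
            answers.append(None)                 # unreachable in every snapshot
        elif path_in(p_union, edges):
            answers.append(p_union)
        elif p_common is not None and len(p_common) == len(p_union):
            answers.append(p_common)
        else:
            # FIX: fall back to a search on this snapshot only.
            answers.append(bfs_path(g, source, target))
            fixed += 1
    return answers, fixed

g1 = {"a": {"b"}, "b": {"a", "c"}, "c": {"b", "e"}, "e": {"c"}}
g2 = {"a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b", "e"}, "e": {"c"}}
g3 = {"a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b", "e"}, "e": {"c"}}

answers, fixed = fvf_shortest_paths([g1, g2, g3], "a", "e")
fvf_distances = [len(p) - 1 for p in answers]
print(fvf_distances, fixed)
```

In this toy cluster only one of the three snapshots needs its own search; the other two are answered by a representative solution.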
20
Outline
Introduction
Solution framework
Storage models
Experimental evaluation
Conclusions
21
EGS Storage Models
Wikipedia dataset (365 snapshots, >1M articles, >20M hyperlinks)
Space cost: more than 365 × 20M = 7.3 billion hyperlinks!
Aims of storage models:
1) Compress data to fit in memory
2) Support the application of the FVF algorithm framework
Effectiveness of our storage models: 50M hyperlinks for the baseline algorithm and 100M hyperlinks for the FVF algorithm, compared to 7.3 billion hyperlinks without compression!
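The space figures above can be checked with a few lines of arithmetic. The sketch uses only the numbers quoted on the slide (treating the ">20M" and "more than" values as their stated lower bounds); the variable names are illustrative.

```python
snapshots = 365
links_per_snapshot = 20_000_000        # ">20M hyperlinks", lower bound

uncompressed = snapshots * links_per_snapshot
baseline_stored = 50_000_000           # storage model used with the baseline algorithm
fvf_stored = 100_000_000               # storage model used with the FVF algorithm

ratio_baseline = uncompressed / baseline_stored
ratio_fvf = uncompressed / fvf_stored

print(uncompressed)      # 7300000000
print(ratio_baseline)    # 146.0
print(ratio_fvf)         # 73.0
```

Under these numbers the stored hyperlink count drops by roughly two orders of magnitude in both cases, with the FVF variant keeping about twice as many hyperlinks as the baseline variant.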
22
Experimental Evaluation
Datasets
Real datasets: Facebook friendship, YouTube, Wikipedia
Synthetic datasets
FVF vs. Baseline
Baseline: execute a graph algorithm on each snapshot independently
Settings
C++, Linux, CPU: 2.83 GHz dual core, memory: 4 GB
23
Experimental Evaluation
Average graph edit similarity (ges) between successive snapshots
Dataset statistics
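The statistic named on this slide could be computed as sketched below. The slide does not define ges, so the formula `2|Ea ∩ Eb| / (|Ea| + |Eb|)` on edge sets is an assumption, and the function names and toy snapshots are hypothetical.

```python
def ges(edges_a, edges_b):
    """Assumed graph edit similarity of two edge sets, in [0, 1]."""
    total = len(edges_a) + len(edges_b)
    return 1.0 if total == 0 else 2 * len(edges_a & edges_b) / total

def average_successive_ges(snapshots):
    """Mean similarity over all pairs of consecutive snapshots."""
    pairs = list(zip(snapshots, snapshots[1:]))
    if not pairs:
        return 1.0   # a single snapshot has nothing to differ from
    return sum(ges(a, b) for a, b in pairs) / len(pairs)

def edge(u, v):
    return frozenset((u, v))

s1 = {edge("a", "b"), edge("b", "c"), edge("c", "d")}
s2 = s1 | {edge("d", "e")}   # one edge added
s3 = set(s2)                 # no change

avg = average_successive_ges([s1, s2, s3])
print(round(avg, 4))
```

A value close to 1 corresponds to the gradually evolving sequences the talk targets.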
24
Experimental Evaluation: Shortest Path Queries
500 queries
25
Experimental Evaluation: Shortest Path Queries
FBFriend dataset
A cluster satisfies:
[Figure: number of clusters (y-axis, 0 to 50) versus similarity threshold (x-axis, 0.4 to 1)]
1. Fewer graphs in a cluster
2. More clusters
Find Time, VF Time, ResidualSPA Time
26
Experimental Evaluation: Shortest Path Queries
FBFriend dataset
[Figure: time in seconds (y-axis, 0 to 1.5) versus similarity threshold (x-axis, 0.4 to 1); series: FVF, Find Time, VF Time, ResidualSPA Time, Decompression Time]
[Figure: number of clusters (y-axis, 0 to 50) versus similarity threshold (x-axis, 0.4 to 1)]
1. Fewer graphs in a cluster
2. More clusters
27
Experimental Evaluation: Shortest Path Queries
FBFriend dataset
[Figure: time in seconds (y-axis, 0 to 1.5) versus similarity threshold (x-axis, 0.4 to 1); series: FVF, Find Time, ResidualSPA Time]
[Figure: number of clusters (y-axis, 0 to 50) versus similarity threshold (x-axis, 0.4 to 1)]
1. Fewer graphs in a cluster
2. More clusters
28
Experimental Evaluation: Shortest Path Queries
FBFriend dataset
[Figure: speedup (y-axis, 0 to 10) versus similarity threshold (x-axis, 0.4 to 1)]
29
Experimental Evaluation: Closeness Centrality Queries
FBFriend dataset
[Figure: speedup (y-axis, 0 to 10) versus similarity threshold (x-axis, 0.4 to 1)]
30
Conclusions
We proposed evolving graph sequences (EGSs) to model how the world evolves
We demonstrated that interesting information can be obtained by posing queries on various EGSs
We introduced the find-verify-fix (FVF) framework to query EGSs
We discussed how to store EGSs
Experiments showed that our FVF framework is efficient and that interesting information can be unveiled
31
Thank you!
32
Related Work
Distance-based queries on a single large graph [F. Wei 2010, Y. Xiao 2009]
Our work focuses on processing queries on an evolving graph sequence
Graph databases [D. Shasha 2002, X. Yan 2005]
Different: their work usually supports only graph queries (e.g., sub/super-graph queries)
Similar: both aim to minimize the number of expensive graph operations
Time-dependent graphs [B. Ding 2008]
Our work is different in two ways:
Node set is not fixed
Find answers on all snapshots