# On Querying Historical Evolving Graph Sequences

Nov 25, 2013

1

Chenghui Ren\$, Eric Lo*, Ben Kao\$, Xinjie Zhu\$, Reynold Cheng\$

\$ The University of Hong Kong
\$ {chren, kao, xjzhu, ckcheng}@cs.hku.hk

* Hong Kong Polytechnic University
* ericlo@comp.polyu.edu.hk

2

Motivation

Graphs are widely used to model the world

The world is ever-changing, so graphs evolve with time

3

Motivation

How does the importance of a vertex change?

E.g., closeness centrality

Evolving Graph Sequence (EGS)

4

Motivation

How does the shortest path between a and e change?

Evolving Graph Sequence (EGS)

5

Example Study on EGS: Shortest Path Query

The shortest-path distances between two particular users over a one-year period (365 snapshots)

[Figure: shortest-path distance (y-axis, 0 to 4.5) versus snapshot number (x-axis, 0 to 400); annotated values: 178, 186, 304, 365]

Key moments:

Their distance changed

How did they get closer?

6

Problem Definition

Evolving Graph Sequence (EGS)

Problem: Given a query (e.g., shortest path between a and e), find the solution for each snapshot in the EGS.

7

Issues of Querying EGS

We are interested in EGSs whose snapshot graphs are:

a) Large

b) Numerous

c) Similar

We need:

Efficient algorithms to process queries on EGSs

Effective storage models to store EGSs

Example EGS:

a) 60,000 vertices, 900,000 edges

b) 365 snapshots

c) 99%+ edges in common

8

Outline

Introduction

Solution framework

Storage models

Experimental evaluation

Conclusions

9

Baseline Algorithm

Baseline algorithm: run a traditional algorithm directly on each snapshot in an EGS

E.g., breadth-first search for a shortest path query

Not efficient: graphs in an EGS are usually large and numerous

Our goal: exploit graph redundancies in an EGS to make query processing faster
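The baseline can be sketched as follows. This is a minimal illustration rather than the authors' implementation: it assumes unweighted, undirected snapshots stored as adjacency dictionaries, and the names `undirected`, `bfs_distance`, and `baseline_query` are hypothetical.

```python
from collections import deque


def undirected(edges):
    """Build an adjacency dict {vertex: set(neighbors)} from (u, v) pairs."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def bfs_distance(adj, source, target):
    """Unweighted shortest-path distance, or None if target is unreachable."""
    if source == target:
        return 0
    seen = {source}
    queue = deque([(source, 0)])
    while queue:
        node, dist = queue.popleft()
        for nxt in adj.get(node, ()):
            if nxt == target:
                return dist + 1
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return None


def baseline_query(snapshots, source, target):
    """Run one independent BFS per snapshot (the baseline strategy)."""
    return [bfs_distance(adj, source, target) for adj in snapshots]
```

Every snapshot triggers a full traversal even when it barely differs from its predecessor, which is the redundancy the FVF framework aims to exploit.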

10

Find-Verify-Fix (FVF) Framework

An EGS

11

Find-Verify-Fix (FVF) Framework

12

Preprocessing: Construct Representative Graphs

13

Preprocessing: Cluster Analysis

Segmentation clustering algorithm:

A cluster consists of successive snapshots

A cluster satisfies:

EGS
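A minimal sketch of greedy segmentation clustering over successive snapshots. It rests on assumptions that are not stated on the slide: a cluster is accepted while the similarity between its intersection graph and its union graph stays at or above a threshold `alpha`, and similarity is computed as 2|E_a ∩ E_b| / (|E_a| + |E_b|). The function names are hypothetical.

```python
def similarity(edges_a, edges_b):
    """Assumed similarity: 2 * |common edges| / (|E_a| + |E_b|)."""
    total = len(edges_a) + len(edges_b)
    return 1.0 if total == 0 else 2 * len(edges_a & edges_b) / total


def segment_clusters(snapshots, alpha):
    """Greedily group successive snapshots (edge sets) into clusters.

    Returns a list of (snapshot_indices, intersection_graph, union_graph).
    """
    clusters = []
    indices, inter, union = [], set(), set()
    for i, edges in enumerate(snapshots):
        edges = set(edges)
        if indices:
            new_inter, new_union = inter & edges, union | edges
            if similarity(new_inter, new_union) >= alpha:
                # Snapshot i fits: extend the current cluster.
                indices.append(i)
                inter, union = new_inter, new_union
                continue
            # Snapshot i does not fit: close the current cluster.
            clusters.append((indices, inter, union))
        # Start a new cluster at snapshot i.
        indices, inter, union = [i], set(edges), set(edges)
    if indices:
        clusters.append((indices, inter, union))
    return clusters
```

Under this assumed condition, raising `alpha` yields more clusters with fewer snapshots each, because the intersection and union graphs of a cluster must stay closer together.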

14

Query Processing Phase

Types of queries we use FVF to solve:

Shortest path

Closeness centrality

Graph diameter
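For reference, the two non-path queries can be computed on a single snapshot with breadth-first search. The sketch assumes an unweighted, connected, undirected graph and uses one common definition of closeness centrality, (number of other reachable vertices) / (sum of distances to them). The slides do not say which variant is used, and the function names are hypothetical.

```python
from collections import deque


def bfs_all(adj, source):
    """Distances from source to every reachable vertex."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nxt in adj.get(node, ()):
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return dist


def closeness(adj, vertex):
    """Reachable vertex count (excluding vertex) over the sum of distances."""
    dist = bfs_all(adj, vertex)
    total = sum(dist.values())
    return (len(dist) - 1) / total if total else 0.0


def diameter(adj):
    """Largest shortest-path distance; assumes a non-empty connected graph."""
    return max(max(bfs_all(adj, v).values()) for v in adj)
```

Both queries need shortest-path distances from many sources, so any saving on per-snapshot shortest-path work carries over to them.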

15

Shortest Path Query Processing

FIND

Representative Solutions

16

Shortest Path Query Processing

VERIFY

Representative Solutions

Bounding property:
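One bound of this kind follows directly if each cluster is represented by its intersection graph G∩ and its union graph G∪. That choice of representatives and the notation below are assumptions, not taken from the slide. Every snapshot G_i in the cluster contains all edges of G∩ and is contained in G∪, and removing edges can only lengthen shortest paths, so for any vertices u and v:

```latex
E(G_\cap) \subseteq E(G_i) \subseteq E(G_\cup)
\quad\Longrightarrow\quad
d_{G_\cup}(u, v) \;\le\; d_{G_i}(u, v) \;\le\; d_{G_\cap}(u, v)
```

When the two outer distances coincide, the representative solution holds for every snapshot in the cluster and no per-snapshot search is needed.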

17

Shortest Path Query Processing

VERIFY

Representative Solutions

18

Shortest Path Query Processing

VERIFY

Representative Solutions

19

Shortest Path Query Processing

FIX
Representative Solutions
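The three phases can be sketched for a single cluster as follows. This is a simplified illustration under stated assumptions, not the authors' procedure: the representative graphs are taken to be the cluster's intersection and union graphs, edges are undirected `(u, v)` tuples with a consistent orientation across snapshots, the cluster is non-empty, and the fix step simply reruns BFS on each snapshot. All names are hypothetical.

```python
from collections import deque


def to_adj(edges):
    """Build an undirected adjacency dict from (u, v) edge tuples."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def bfs_distance(adj, source, target):
    """Unweighted shortest-path distance, or None if unreachable."""
    if source == target:
        return 0
    seen = {source}
    queue = deque([(source, 0)])
    while queue:
        node, dist = queue.popleft()
        for nxt in adj.get(node, ()):
            if nxt == target:
                return dist + 1
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return None


def fvf_distances(cluster, source, target):
    """Answer one query for every snapshot (edge set) in a cluster.

    Returns (distances, number_of_snapshots_searched_individually).
    """
    edge_sets = [set(edges) for edges in cluster]
    g_inter = set.intersection(*edge_sets)
    g_union = set.union(*edge_sets)

    # FIND: solve the query once on each representative graph.
    upper = bfs_distance(to_adj(g_inter), source, target)
    lower = bfs_distance(to_adj(g_union), source, target)

    # VERIFY: every snapshot's distance lies between lower and upper.
    if lower is None:
        # Unreachable in the union graph, hence in every snapshot.
        return [None] * len(edge_sets), 0
    if upper == lower:
        return [upper] * len(edge_sets), 0

    # FIX: the bounds disagree, so search each snapshot individually.
    fixed = [bfs_distance(to_adj(e), source, target) for e in edge_sets]
    return fixed, len(edge_sets)
```

When verification succeeds, two searches answer the query for the whole cluster. The per-snapshot cost is paid only for clusters whose bounds disagree.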

20

Outline

Introduction

Solution framework

Storage models

Experimental evaluation

Conclusions

21

EGS Storage Models

Wikipedia dataset (365 snapshots, >1M articles, >20M hyperlinks)

Space cost: more than 365 × 20M = 7.3 billion

Aims of storage models:

1) Compress data to fit in memory

2) Support the application of the FVF algorithm framework

Effectiveness of our storage models:

50M to 100M, compared to 7.3 billion
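As a generic illustration of why high overlap between snapshots makes compression effective, the sketch below uses plain delta encoding: the first snapshot is stored in full, followed by the edge insertions and deletions of each later snapshot. This is a stand-in technique chosen for illustration, not the authors' storage model. It assumes a non-empty list of edge sets, and all names are hypothetical.

```python
def encode_deltas(snapshots):
    """Store the first snapshot plus (added, removed) edge sets per step."""
    base = frozenset(snapshots[0])
    deltas = []
    prev = base
    for edges in snapshots[1:]:
        cur = frozenset(edges)
        deltas.append((cur - prev, prev - cur))
        prev = cur
    return base, deltas


def decode_snapshot(base, deltas, index):
    """Rebuild snapshot number `index` (0-based) by replaying deltas."""
    edges = set(base)
    for added, removed in deltas[:index]:
        edges |= added
        edges -= removed
    return edges


def stored_edge_count(base, deltas):
    """Number of edge records kept by the delta encoding."""
    return len(base) + sum(len(a) + len(r) for a, r in deltas)
```

With 99%+ of edges shared between snapshots, the deltas stay small relative to a full copy per snapshot. The price is that a snapshot must be decompressed by replaying the deltas before it can be queried.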

22

Experimental Evaluation

Datasets

Real datasets: FBFriend (friendship), Wikipedia

Synthetic datasets

FVF vs. Baseline

Baseline: execute a graph algorithm on each snapshot independently

Settings

C++, Linux, CPU: 2.83 GHz dual core, memory: 4 GB

23

Experimental Evaluation

Average graph edit similarity (ges) between successive snapshots

Dataset statistics
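A minimal sketch of averaging a graph edit similarity over successive snapshots. The formula is an assumption, since the slide does not define it: ges(G_a, G_b) = 2|E_a ∩ E_b| / (|E_a| + |E_b|), which applies when vertices carry unique identifiers so that common edges can be matched directly. The function names are hypothetical.

```python
def ges(edges_a, edges_b):
    """Assumed graph edit similarity between two edge sets."""
    a, b = set(edges_a), set(edges_b)
    total = len(a) + len(b)
    return 1.0 if total == 0 else 2 * len(a & b) / total


def average_successive_ges(snapshots):
    """Mean ges over each pair of neighboring snapshots."""
    pairs = list(zip(snapshots, snapshots[1:]))
    if not pairs:
        raise ValueError("need at least two snapshots")
    return sum(ges(a, b) for a, b in pairs) / len(pairs)
```

A value close to 1 means neighboring snapshots are nearly identical, which is the regime in which sharing work across snapshots pays off.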

24

Experimental Evaluation: Shortest Path Queries

500 queries

25

Experimental Evaluation: Shortest Path Queries

FBFriend dataset

[Figure: number of clusters (y-axis, 0 to 50) versus similarity threshold (x-axis, 0.4 to 1)]

A cluster satisfies:

As the similarity threshold increases:

1. Fewer graphs in a cluster

2. More clusters

Time components: Find Time, VF-Time, Residual-SPA Time

26

Experimental Evaluation: Shortest Path Queries

FBFriend dataset

[Figure: time in seconds (y-axis, 0 to 1.5) versus similarity threshold (x-axis, 0.4 to 1); series: FVF, Find Time, VF Time, Residual-SPA Time, Decompression Time]

[Figure: number of clusters (y-axis, 0 to 50) versus similarity threshold (x-axis, 0.4 to 1)]

1. Fewer graphs in a cluster

2. More clusters

27

Experimental Evaluation: Shortest Path Queries

FBFriend dataset

[Figure: time in seconds (y-axis, 0 to 1.5) versus similarity threshold (x-axis, 0.4 to 1); series: FVF, Find Time, Residual-SPA Time]

[Figure: number of clusters (y-axis, 0 to 50) versus similarity threshold (x-axis, 0.4 to 1)]

1. Fewer graphs in a cluster

2. More clusters

28

Experimental Evaluation: Shortest Path Queries

FBFriend dataset

[Figure: speedup (y-axis, 0 to 10) versus similarity threshold (x-axis, 0.4 to 1)]

29

Experimental Evaluation: Closeness Centrality Queries

FBFriend dataset

[Figure: speedup (y-axis, 0 to 10) versus similarity threshold (x-axis, 0.4 to 1)]

30

Conclusions

We proposed evolving graph sequences (EGSs) to model the evolution of the world

We demonstrated that interesting information can be obtained by posing queries on EGSs

We introduced the find-verify-fix (FVF) framework to query EGSs

We discussed how to store EGSs

Experiments showed that the FVF framework is efficient and that interesting information can be unveiled

31

Thank you!

Chenghui Ren\$, Eric Lo*, Ben Kao\$, Xinjie Zhu\$, Reynold Cheng\$

\$ The University of Hong Kong
\$ {chren, kao, xjzhu, ckcheng}@cs.hku.hk

* The Hong Kong Polytechnic University
* ericlo@comp.polyu.edu.hk

32

Related Work

Distance-based queries on a single large graph [F. Wei 2010, Y. Xiao 2009]

Our work focuses on processing queries on an evolving graph sequence

Graph database [D. Shasha 2002, X. Yan 2005]

Different: their work usually supports only graph queries (e.g., sub/super-graph queries)

Similar: both aim to minimize the number of expensive graph operations

Time-dependent graph [B. Ding 2008]

Our work is different in two ways:

Node set is not fixed