Efficient Algorithms for Reachability and Path-Selection Problems with Applications

Final Report of Project Funded by the
John S. Latsis Public Benefit Foundation
December 2010
Team Members

Department of Informatics and Telecommunications Engineering, University of Western Macedonia
Alexandra Galani, Research and Teaching Staff
Loukas Georgiadis, Assistant Professor (Coordinator)

Department of Computer Science, University of Ioannina
Stavros D. Nikolopoulos, Professor
Leonidas Palios, Associate Professor
Abstract
Graphs are mathematical structures that model many important entities such as the world-wide web, transportation, communication and social networks, databases, and biological systems. The objective of this research project was the design of efficient algorithms for a collection of graph problems related to Reachability and Path-Selection. In Reachability and Path-Selection problems we are given an input graph and wish to efficiently perform queries that report if two vertices are connected by a path or compute paths connecting specified vertices so that certain requirements are satisfied. Algorithmic problems of this kind have numerous applications, including internet routing, geographical navigation, and knowledge-representation systems. Specifically, in this project we studied the following types of problems:

Join-Reachability: This is a natural extension of the standard Reachability problem for a collection of graphs $\mathcal{G}$. We wish to process $\mathcal{G}$ so that we can quickly report the set of vertices that reach a given vertex in all graphs of $\mathcal{G}$.

Computation of Disjoint Paths: Our goal is to compute a pair of disjoint paths from a given source vertex to every other vertex, or to a specific target vertex.

We developed algorithmic techniques and provided efficient algorithms for problems of the above types. We also considered new applications of our techniques and algorithms.
Summary (translated from the Greek)
Graphs are mathematical structures that model many important entities, such as the world-wide web, transportation, communication and social networks, databases, and biological systems. The aim of the research project was the design of efficient algorithms for a collection of problems related to Reachability and Path-Selection in graphs. In Reachability and Path-Selection problems we are given an input graph for which we wish to efficiently answer queries on whether two of its vertices are connected by some path, or to compute paths that connect specified vertices while satisfying prescribed requirements. Algorithmic problems of this type have numerous applications, including network routing, geographical navigation, and knowledge-representation systems. Specifically, in this project we explored the following types of problems:

Join-Reachability: This is a natural extension of the standard reachability problem to a collection of graphs $\mathcal{G}$. We wish to process $\mathcal{G}$ so that we can quickly report the set of vertices from which there is a path to a given vertex in all graphs of $\mathcal{G}$.

Computation of Disjoint Paths: Our goal is to compute a pair of disjoint paths from a given source vertex to every other vertex, or to a specific target vertex.

We developed algorithmic techniques and presented efficient algorithms for problems of the above types. In addition, we sought new applications of our techniques and algorithms.
Contents
1 Introduction
    1.1 Fundamental Concepts in Graph Theory
    1.2 Reachability
    1.3 Path-Selection
2 Join-Reachability
    2.1 Applications
    2.2 Results
    2.3 Explicit Join-Reachability
        2.3.1 Computational Complexity
        2.3.2 Combinatorial Complexity
    2.4 Implicit Join-Reachability
3 Connectivity and Vertex-Disjoint Paths
    3.1 Vertex Connectivity
    3.2 Dominator Verification
    3.3 Independent Spanning Trees
    3.4 Testing 2-Vertex Connectivity
    3.5 Computing Pairs of Vertex-Disjoint s-t Paths
4 Further Applications
    4.1 Interprocedural Dominance
    4.2 Computational Morphological Analysis
5 Conclusions and Future Work
List of Figures
1.1 A directed graph.
2.1 An instance of join-reachability for two digraphs.
2.2 Reducing the size of a graph with the use of a Steiner vertex.
2.3 Mapping the vertices of two paths to points in the plane.
2.4 A Cartesian tree.
3.1 A strongly connected and a 2-vertex connected digraph.
3.2 A flowgraph G(s) and its dominator tree D(s).
3.3 Two independent spanning trees of a 2-vertex connected graph.
3.4 Two vertex-disjoint paths in a 2-vertex connected graph.
4.1 An interprocedural flowgraph and its dominator dag.
4.2 A graph of a morphological analysis.
Chapter 1
Introduction
The area of graph algorithms is rich and successful, as graphs are mathematical structures that model many important and diverse entities such as the world-wide web, transportation, communication and social networks, databases, biological systems and the control-flow of computer programs. The problems of reachability and path-selection are fundamental in graph algorithms, with numerous application areas, including internet routing, geographical navigation, knowledge-representation, computational biology, program optimization and natural language processing. Our project was motivated by recent advances in these areas, as well as emerging applications of graph-based data structures.

We studied a collection of graph problems related to reachability and path-selection. The outcomes of this program were the design of efficient algorithms, the development of new algorithmic techniques, the design and implementation of practical algorithms, and the identification of new applications of our algorithms and techniques. In this report we restrict ourselves to an overview of the research project, in order to make the content accessible to nonspecialists in theoretical computer science. For the full technical details and proofs of our results we refer to the research articles, which are preliminary versions of [Geo10, GNP10, GT10], posted at the project’s website:
http://www.icte.uowm.gr/lgeorg/RPS/

Figure 1.1: A directed graph $G = (V, E)$ with $V = \{a, b, c, d, e\}$ and $E = \{(a,b), (b,c), (c,d), (d,a), (d,e), (e,c)\}$.

In this chapter we introduce the basic terminology and define the problems in our study. Chapter 2 deals with reachability problems and Chapter 3 discusses path-selection problems. In Chapter 4 we present further applications of our techniques. Finally, in Chapter 5 we discuss directions for future research.
1.1 Fundamental Concepts in Graph Theory
A graph $G = (V, E)$ is an abstract representation of a set of objects $V$, called vertices, and a set of links $E$, called edges, which connect pairs of objects. The edges may be directed (asymmetric), in which case we have a directed graph, or undirected (symmetric), in which case we have an undirected graph. Figure 1.1 shows a directed graph with 5 vertices and 6 edges.

A path $v_1, v_2, \ldots, v_k$ in $G$ is a sequence of vertices $v_i \in V$ such that there is an edge in $E$ from $v_i$ to $v_{i+1}$, denoted as $(v_i, v_{i+1})$, for $i = 1, \ldots, k-1$; $v_1$ is the start vertex and $v_k$ is the end vertex of the path. For example, the sequence $a, b, c, d$ is a path from $a$ to $d$ in the graph of Figure 1.1. For any pair of vertices $v, u \in V$, vertex $v$ is reachable from $u$ (equivalently, $u$ reaches $v$) if there is a path with start vertex $u$ and end vertex $v$. A cycle is a path such that the start vertex and the end vertex are the same, e.g., $e, c, d, e$ in the graph of Figure 1.1. A graph with no cycle is called acyclic.
An undirected graph $G = (V, E)$ is connected if for every pair of vertices $u, v \in V$ there is a path connecting $u$ and $v$. A tree is an undirected graph that is acyclic and connected. Let $G = (V, E)$ and $G' = (V', E')$ be two undirected graphs. Then $G'$ is a subgraph of $G$ if $V'$ is a subset of $V$ ($V' \subseteq V$) and $E'$ is a subset of $E$ ($E' \subseteq E$). If $V' = V$ then $G'$ is a spanning subgraph of $G$. If, moreover, $G'$ is a tree then $G'$ is a spanning tree of $G$.
A directed graph $G = (V, E)$ is strongly connected if for every pair of vertices $u, v \in V$, $u$ reaches $v$ and $v$ reaches $u$. E.g., the graph of Figure 1.1 is strongly connected. A spanning tree of $G$ rooted at a vertex $s$ is a spanning subgraph of $G$ such that for any other vertex $v \in V$ there is exactly one path from $s$ to $v$. E.g., the spanning subgraph of the graph of Figure 1.1 that is formed by the subset of edges $\{(a,b), (d,a), (d,e), (e,c)\}$ is a spanning tree with root $d$.
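
The two claims about Figure 1.1 can be checked mechanically. The following is a minimal sketch in Python, using the vertex and edge sets of Figure 1.1; the function names are illustrative and do not come from the project's articles.

```python
from collections import deque

# Directed graph of Figure 1.1.
V = ["a", "b", "c", "d", "e"]
E = [("a", "b"), ("b", "c"), ("c", "d"), ("d", "a"), ("d", "e"), ("e", "c")]

def reachable_from(source, edges):
    """Return the set of vertices reachable from `source` (breadth-first search)."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, []).append(v)
    seen = {source}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj.get(u, []):
            if v not in seen:
                seen.add(v)
                queue.append(v)
    return seen

def is_strongly_connected(vertices, edges):
    """True if every vertex reaches every other vertex."""
    return all(reachable_from(u, edges) == set(vertices) for u in vertices)

def is_spanning_tree_rooted_at(root, vertices, tree_edges):
    """With n - 1 distinct edges, reaching every vertex from the root
    forces exactly one root-to-v path for each vertex v."""
    return (len(set(tree_edges)) == len(vertices) - 1
            and reachable_from(root, tree_edges) == set(vertices))

tree = [("a", "b"), ("d", "a"), ("d", "e"), ("e", "c")]
print(is_strongly_connected(V, E))               # True
print(is_spanning_tree_rooted_at("d", V, tree))  # True
```

The same edge subset is not a spanning tree rooted at $a$, because $a$ reaches only $b$ through those four edges.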
A planar graph is a graph that can be embedded in the plane, i.e., it can be drawn on the plane in such a way that its edges intersect only at their endpoints. The graph of Figure 1.1 is planar.

All the graphs considered in this study are directed, although some of the problems we discuss can also be defined (and are interesting) for undirected graphs.
1.2 Reachability
In the reachability problem our goal is to preprocess an input graph into a data structure so that queries of whether a vertex $b$ is reachable from a vertex $a$ can be answered quickly. This has applications in internet routing, geographical navigation, knowledge-representation systems and other areas [WHY+06]. In this project we introduced the study of a related collection of novel problems which we call join-reachability problems. These are motivated by recent work on graph-structured databases, social networks and program optimization. Formally, we are given a collection of graphs $\mathcal{G}$, where each graph $G_i \in \mathcal{G}$ represents a binary relation over a set of elements $V$. We define the join-reachability relation $R$ as follows: $a$ is related to $b$ under $R$ if and only if $b$ is reachable from $a$ in all graphs in $\mathcal{G}$. Our goal is to find an efficient representation of $R$ such that, for any given $b \in V$, we can quickly report all the elements that are related to $b$ in $R$. In Chapter 2 we distinguish two versions of this problem, depending on the type of desired representation of $R$, and provide an overview of our results.
1.3 Path-Selection
The second main thread of our project deals with the design of algorithms for a specific type of path-selection problems. Path-selection refers to the computation of paths connecting a given subset of the vertices of a graph, such that certain requirements are satisfied. Typical examples are the computation of a shortest path between two vertices, finding a path connecting two vertices such that a region of the graph is avoided, or computing edge-disjoint or vertex-disjoint paths. This area contains some of the most important network optimization problems, which have been extensively studied.

In this project we explored the computation of pairs of disjoint paths: Given a source vertex, our goal is to compute two disjoint paths to every other vertex, or to a specific target vertex. We also considered the problem of testing if the input graph has the necessary connectivity requirements for such disjoint paths to exist. An overview of this study is presented in Chapter 3.
Chapter 2
Join-Reachability
In the reachability problem our goal is to preprocess a graph $G$ into a data structure that can quickly answer queries that ask if a vertex $b$ is reachable from a vertex $a$. This problem is fundamental for many application areas, including internet routing, geographical navigation, and knowledge-representation systems [WHY+06]. Recently, the interest in graph reachability problems has been rekindled by emerging applications of graph data structures in areas such as the semantic web, bio-informatics and social networks.
The above developments, together with recent applications in graph algorithms [Geo08, Geo10, GT05], have motivated us to introduce the study of the join-reachability problem: We are given a collection $\mathcal{G}$ of $l$ directed graphs $G_i = (V_i, A_i)$, $1 \le i \le l$, where each graph $G_i$ represents a binary relation $R_i$ over a set of elements $V \subseteq V_i$ in the following sense: For any $a, b \in V$, we have that $a$ is related to $b$ under $R_i$, denoted by $a R_i b$, if and only if $b$ is reachable from $a$ in $G_i$. Let $R \equiv R(\mathcal{G})$ be the binary relation over $V$ defined by: $a R b$ if and only if $a R_i b$ for all $i \in \{1, \ldots, l\}$ (i.e., $b$ is reachable from $a$ in all graphs in $\mathcal{G}$). An example is given in Figure 2.1; a join-reachability query for vertex $c$ returns the set $\{a, b, f, g\}$, which consists of the vertices reaching $c$ in both $G_1$ and $G_2$. Our objective is to find an efficient representation of this relation. For simplicity, we will restrict our attention to the case of two input graphs ($l = 2$).

Figure 2.1: An instance of join-reachability for two digraphs $G_1$ and $G_2$ on the vertices $a, \ldots, h$.

The join-reachability problem admits a simple solution, which is to precompute the answer to all possible join-reachability queries: For each vertex $a \in V$ and for each graph $G_i \in \mathcal{G}$ we can compute the set $reach(a, i)$ consisting of the vertices that reach $a$ in $G_i$. Then we can store the answer to the join-reachability query for $a$ by computing the intersection $\bigcap_{i=1}^{l} reach(a, i)$. With this representation join-reachability queries can be answered in optimal time, but it requires $O(n^2)$ storage space (where $n$ is the number of vertices), which is prohibitive for large graphs. Our goal is to construct space-efficient representations that allow fast join-reachability reporting.
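
The simple quadratic-space solution described above can be sketched as follows. This is an illustrative Python sketch, not code from the project: the function names and the two small input digraphs are hypothetical, and a vertex is not reported as reaching itself, following the convention of the example in Figure 2.1.

```python
def reach_sets(vertices, edges):
    """For each vertex a, compute the set of other vertices that reach a."""
    preds = {v: [] for v in vertices}
    for u, v in edges:
        preds[v].append(u)
    result = {}
    for a in vertices:
        seen, stack = {a}, [a]
        while stack:                      # backward search from a
            v = stack.pop()
            for u in preds[v]:
                if u not in seen:
                    seen.add(u)
                    stack.append(u)
        result[a] = seen - {a}
    return result

def precompute_join_reachability(vertices, graphs):
    """Table a -> intersection over all graphs of reach(a, i); O(n^2) space."""
    per_graph = [reach_sets(vertices, edges) for edges in graphs]
    return {a: set.intersection(*(r[a] for r in per_graph)) for a in vertices}

# Hypothetical instance with two digraphs on the same vertex set.
V = ["a", "b", "c", "d"]
G1 = [("a", "b"), ("b", "c"), ("d", "c")]
G2 = [("a", "c"), ("b", "c"), ("c", "d")]
table = precompute_join_reachability(V, [G1, G2])
print(sorted(table["c"]))  # ['a', 'b']
```

After this preprocessing a query is a table lookup, which is why the query time is optimal while the table itself may hold a quadratic number of entries.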
2.1 Applications
Instances of the join-reachability problem appear in various applications. For example, in the rank aggregation problem [DKNS01] we are given a collection of rankings of some elements and we may wish to report which (or how many) elements have the same ranking relative to a given element. This is a special version of join-reachability since the given collection of rankings can be represented by a collection of directed paths with the elements being the vertices of the paths. Similarly, in a graph-structured database with an associated ranking of its vertices we may wish to find the vertices that are related to a query vertex and have higher or lower ranking than this vertex. Instances of join-reachability also appear in graph algorithms arising from program optimization. Specifically, in [Geo08] we need a data structure capable of reporting which vertices satisfy certain ancestor-descendant relations in a collection of rooted trees. Also, in current work in progress, we show that join-reachability structures for two trees can yield efficient solutions to special cases of the interprocedural dominance problem [dSvPdB07]. See Section 4.1.
There are also instances of join-reachability that are related to the topics considered in Chapter 3. In [GT05] (see also [GT10]) it is shown that any directed graph $G$ with a distinguished source vertex $s$ has two spanning trees rooted at $s$ such that a vertex $a$ is a dominator of a vertex $b$ (meaning that all paths in $G$ from $s$ to $b$ pass through $a$) if and only if $a$ is an ancestor of $b$ in both spanning trees. This generalizes the graph-theoretical concept of independent spanning trees. Two spanning trees of a graph $G$ are independent if they are both rooted at the same vertex $s$ and for each vertex $v$ the paths from $s$ to $v$ in the two trees are internally vertex-disjoint. Similarly, $l$ spanning trees of $G$ are independent if they are pairwise independent. In this setting, we can apply a join-reachability structure to decide if $l$ given spanning trees are independent. Moreover, a variant of the join-reachability problem appears in our algorithm for computing pairs of vertex-disjoint paths [Geo10].
2.2 Results
In [GNP10] we explored two versions of the join-reachability problem. In the explicit version we wish to represent $R$ with a directed graph $J \equiv J(\mathcal{G})$, which we call the join-reachability graph of $\mathcal{G}$, i.e., for any $a, b \in V$, we have $a R b$ if and only if $b$ is reachable from $a$ in $J$. Our goal is to minimize the size (i.e., the number of vertices plus edges) of $J$. We presented results on the computational and combinatorial complexity of $J$. In the implicit version we wish to represent $R$ with an efficient data structure (in terms of storage space and query time) that can quickly report all elements $a \in V$ satisfying $a R b$ for any query element $b \in V$. First, we provided efficient join-reachability structures for simple graph classes. Then, based on these results, we considered planar graphs and general directed graphs.
2.3 Explicit Join-Reachability
In the explicit version of join-reachability we wish to construct a join-reachability graph of small size. First we explore the computational complexity of computing the smallest such graph, and then we provide bounds for its size in several cases.
2.3.1 Computational Complexity
We consider the computational complexity of computing the smallest $J(\{G_1, G_2\})$: Given two graphs $G_1 = (V, A_1)$ and $G_2 = (V, A_2)$ we wish to compute a graph $J \equiv J(\{G_1, G_2\})$ of minimum size such that for any $a, b \in V$, $b$ is reachable from $a$ in $J$ if and only if $b$ is reachable from $a$ in both $G_1$ and $G_2$. We can further distinguish two versions of this problem, depending on whether $J$ is allowed to have Steiner vertices (i.e., vertices not in $V$) or not: In the unrestricted version $V(J) \supseteq V$, while in the restricted version $V(J) = V$.
The problem of computing the smallest $J$ in the unrestricted case belongs to the class of NP-hard problems. This is implied by a straightforward reduction from the reachability substitute problem, which was shown to be NP-hard by Katriel et al. [KKS05].

In the restricted case, on the other hand, we can compute $J$ using transitive closure and transitive reduction computations, which can be done in polynomial time [AGU72].
Note that the existence of Steiner vertices can reduce the size of $J$ significantly. Consider for example a complete bipartite digraph $G$ with $V(G) = X \cup Y$ and $A(G) = X \times Y$. This digraph has the same transitive closure as the digraph $G'$ with $V(G') = V(G) \cup \{z\}$ and $A(G') = \{(x, z), (z, y) \mid x \in X, y \in Y\}$. See Figure 2.2.

Figure 2.2: Reducing the size of a graph with the use of a Steiner vertex $z$.
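
To make the saving concrete, the sketch below builds both digraphs for hypothetical sets $X$ and $Y$ of sizes 4 and 5 and compares reachability between the original vertices only (the Steiner vertex $z$ is ignored in the comparison). The function names are illustrative. The edge count drops from $|X| \cdot |Y|$ to $|X| + |Y|$.

```python
from itertools import product

def complete_bipartite(X, Y):
    """Arc set X x Y of the complete bipartite digraph."""
    return set(product(X, Y))

def with_steiner_vertex(X, Y, z="z"):
    """Arc set {(x, z), (z, y)} that routes every x to every y through z."""
    return {(x, z) for x in X} | {(z, y) for y in Y}

def reachability_pairs(edges, keep):
    """All pairs (u, v) of distinct vertices in `keep` with v reachable from u."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
    pairs = set()
    for s in keep:
        seen, stack = {s}, [s]
        while stack:
            u = stack.pop()
            for v in adj.get(u, ()):
                if v not in seen:
                    seen.add(v)
                    stack.append(v)
        pairs |= {(s, v) for v in seen if v in keep and v != s}
    return pairs

X = [f"x{i}" for i in range(4)]
Y = [f"y{i}" for i in range(5)]
G = complete_bipartite(X, Y)
G_prime = with_steiner_vertex(X, Y)
original = set(X) | set(Y)
same = reachability_pairs(G, original) == reachability_pairs(G_prime, original)
print(len(G), len(G_prime), same)  # 20 9 True
```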
2.3.2 Combinatorial Complexity
The objective here is to develop methods for constructing small (but not necessarily optimal) join-reachability graphs, and then provide bounds on the size (number of vertices plus edges) of these constructions. Our starting point is to build join-reachability graphs for paths and trees. The basic idea is to map the vertices of $V$ to geometric objects in a $d$-dimensional space, where $d$ is some constant. Then, the join-reachability relation can be decided from the position of these objects in the $d$-dimensional space.
An example for the case of two paths is depicted in Figure 2.3: Each vertex $v \in V$ receives coordinates $(x_1(v), x_2(v))$, where $x_1(v)$ corresponds to the position of $v$ in the first path, and $x_2(v)$ corresponds to the position of $v$ in the second path. Specifically, $x_1(v)$ is equal to the number of vertices (other than itself) that reach $v$ in $G_1$; $x_2(v)$ is defined analogously. It follows that each vertex is mapped to a point in the two-dimensional space $[0, n-1]^2$ and has integer coordinates. Moreover, for any two vertices $a, b \in V$, we have that $b$ is reachable from $a$ in both $G_1$ and $G_2$ if and only if $(x_1(a), x_2(a)) \le (x_1(b), x_2(b))$, where the comparison is componentwise. In Figure 2.3 the vertices that reach $f$ in both $G_1$ and $G_2$ are inside the dashed rectangle. Based on this geometrical view we can find the necessary edges (and Steiner vertices) that should be included in the join-reachability graph.

Figure 2.3: Mapping the vertices of two paths to points in the plane. $G_1$ is the path $a, b, c, d, e, f, g, h$ and $G_2$ is the path $a, e, c, g, b, f, d, h$; vertex $v$ is drawn at the point $(x_1(v), x_2(v))$.

The bound we derive is $O(n \log n)$, which turns out to be tight in the worst case: We presented examples where the smallest join-reachability graph must have $\Omega(n \log n)$ edges. Based on similar ideas we provide methods for building join-reachability graphs of size $O(n \log^k n)$, for some constant $k \le 3$, when we deal with trees and planar graphs. These methods can also be applied to general graphs, but the quality of the produced structures depends on the number of disjoint paths into which the graphs can be decomposed.
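
The mapping of two paths to points can be illustrated with the instance of Figure 2.3. The sketch below is only an illustration in Python: the two vertex orders are read off the labels of the figure, and the function names are hypothetical. It reports the vertices that reach $f$ in both paths by a componentwise comparison of coordinates.

```python
# Vertex orders of the two paths in Figure 2.3.
G1 = list("abcdefgh")
G2 = list("aecgbfdh")

def coordinates(path1, path2):
    """Map each vertex v to (x1(v), x2(v)), its positions in the two paths."""
    pos1 = {v: i for i, v in enumerate(path1)}
    pos2 = {v: i for i, v in enumerate(path2)}
    return {v: (pos1[v], pos2[v]) for v in path1}

def reaches_in_both(a, b, xy):
    """True if b is reachable from a in both paths (componentwise <=)."""
    return xy[a][0] <= xy[b][0] and xy[a][1] <= xy[b][1]

xy = coordinates(G1, G2)
reach_f = sorted(a for a in xy if a != "f" and reaches_in_both(a, "f", xy))
print(xy["f"])   # (5, 5)
print(reach_f)   # ['a', 'b', 'c', 'e']
```

Vertices $d$ and $g$ are excluded because $d$ follows $f$ in $G_2$ and $g$ follows $f$ in $G_1$.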
2.4 Implicit Join-Reachability
In the implicit version of the join-reachability problem our goal is to construct an efficient data structure that supports the following type of query: Given a query vertex $b$, report all vertices $a$ that reach $b$ in $J(\{G_1, G_2\})$. We measure the efficiency of a data structure in terms of the storage space it requires, and the time it needs to answer a join-reachability query (i.e., the time it needs to locate all vertices that reach $b$ in $J(\{G_1, G_2\})$). To that end, we use the notation $\langle s(n), q(n, k) \rangle$ to refer to a data structure with $O(s(n))$ space and $O(q(n, k))$ query time for reporting $k$ elements.

Figure 2.4: A Cartesian tree (drawn over the points of the two paths $G_1$ and $G_2$ of Figure 2.3).

In order to design efficient join-reachability data structures we apply the techniques of Section 2.3.2 combined with data structures from computational geometry.

Consider, for example, the case of two paths. Using the mapping of Section 2.3.2 we need a data structure that returns the vertices $a$ with $(x_1(a), x_2(a)) \le (x_1(b), x_2(b))$. This reporting can be accomplished with a Cartesian tree [GBT84]. A Cartesian tree $T$ is a binary tree defined recursively as follows: The root of $T$ is the point $a$ with minimum $x_2$-coordinate. The left subtree of the root is a Cartesian tree for the points $b$ with $x_1(b) < x_1(a)$ and the right subtree of the root is a Cartesian tree for the points $b$ with $x_1(b) > x_1(a)$. See Figure 2.4. The reporting algorithm uses the following property: Consider two points $a$ and $b$, and let $c$ be the point with minimum $x_2$-coordinate such that $x_1(a) \le x_1(c) \le x_1(b)$. Then $c$ is the nearest common ancestor of $a$ and $b$ in $T$. (The nearest common ancestor of two vertices in a tree is their common ancestor that is farthest from the root. E.g., in Figure 2.4 the nearest common ancestor of $d$ and $h$ is $e$.) Now let $z$ be the point with the smallest $x_1$-coordinate. In order to find all points $a$ such that $(x_1(a), x_2(a)) \le (x_1(b), x_2(b))$ we first locate the nearest common ancestor of $z$ and $b$ in $T$; call this vertex $y$. The returned point $y$ has the smallest $x_2$-coordinate in the $x_1$-range $[0, x_1(b)]$. If $x_2(y) > x_2(b)$ then the answer is null and we stop our search. Otherwise we return $y$ and search recursively in the $x_1$-ranges $[0, x_1(y) - 1]$ and $[x_1(y) + 1, x_1(b)]$. Using the fact that nearest common ancestor queries in a tree can be answered in constant time after linear time preprocessing [HT84], it follows that the efficiency of the above data structure is $\langle n, k \rangle$.

Again, based on similar ideas, we provide data structures for trees, planar graphs, and general graphs.
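
The recursive reporting procedure can be sketched on the points of Figure 2.3. One substitution should be stated plainly: instead of a Cartesian tree with constant-time nearest-common-ancestor queries [HT84], the sketch below answers "point with minimum $x_2$ in an $x_1$-range" with a sparse table, which returns the same point by the property quoted above but uses $O(n \log n)$ rather than linear space. All names are illustrative assumptions. Note that the query point $b$ itself is reported, since it satisfies the componentwise inequality.

```python
NAMES = "abcdefgh"                 # vertices indexed by x1 (order of G1)
X2 = [0, 4, 2, 6, 1, 5, 3, 7]      # x2 of each vertex (position in G2)

def build_sparse_table(x2):
    """table[j][i] = index of the minimum of x2[i : i + 2**j]."""
    table = [list(range(len(x2)))]
    j = 1
    while (1 << j) <= len(x2):
        prev, half = table[-1], 1 << (j - 1)
        row = []
        for i in range(len(x2) - (1 << j) + 1):
            left, right = prev[i], prev[i + half]
            row.append(left if x2[left] <= x2[right] else right)
        table.append(row)
        j += 1
    return table

def range_argmin(table, x2, lo, hi):
    """Index of the minimum of x2[lo..hi] (inclusive), in constant time."""
    j = (hi - lo + 1).bit_length() - 1
    left, right = table[j][lo], table[j][hi - (1 << j) + 1]
    return left if x2[left] <= x2[right] else right

def report(table, x2, lo, hi, bound, out):
    """Collect all indices in [lo, hi] whose x2 value is at most `bound`."""
    if lo > hi:
        return
    y = range_argmin(table, x2, lo, hi)
    if x2[y] > bound:
        return                      # nothing in this x1-range qualifies
    out.append(y)
    report(table, x2, lo, y - 1, bound, out)
    report(table, x2, y + 1, hi, bound, out)

def join_reach_query(b):
    """Vertices a with (x1(a), x2(a)) <= (x1(b), x2(b)), including b."""
    table = build_sparse_table(X2)  # in practice built once, not per query
    i, out = NAMES.index(b), []
    report(table, X2, 0, i, X2[i], out)
    return sorted(NAMES[k] for k in out)

print(join_reach_query("f"))  # ['a', 'b', 'c', 'e', 'f']
```

Each reported point triggers at most two further range-minimum queries, which is the reason the reporting time is proportional to the number $k$ of reported points.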
Chapter 3
Connectivity and Vertex-Disjoint Paths
In this chapter we present algorithms for computing pairs of vertex-disjoint paths in a graph $G = (V, E)$ from a common start vertex. We consider the following two problems:

(a) Compute a pair of vertex-disjoint $s$-$v$ paths for all vertices $v \in V \setminus \{s\}$, where $s \in V$ is a fixed source vertex.

(b) Compute a pair of vertex-disjoint $s$-$t$ paths for a given start vertex $s$ and a given terminal vertex $t$.

We also consider the connectivity requirements that $G$ must satisfy in order for such paths to exist. We remark that the more general problem of computing two vertex-disjoint paths that may connect different start and terminal vertices is NP-hard [BJG02].
3.1 Vertex Connectivity
A directed (undirected) graph is $k$-vertex connected if it has at least $k + 1$ vertices and the removal of any set of at most $k - 1$ vertices leaves the graph strongly connected (connected). See Figure 3.1. The vertex connectivity $\kappa \equiv \kappa(G)$ of a graph $G$ is the maximum $k$ such that $G$ is $k$-vertex connected.

Figure 3.1: A strongly connected (i) and a 2-vertex connected (ii) digraph.

Graph connectivity is one of the most fundamental concepts in graph theory with numerous practical applications [BJG02]. Currently, the fastest known algorithm for computing $\kappa$ is due to Gabow [Gab06], with $O((n + \min\{\kappa^{5/2}, \kappa n^{3/4}\}) m)$ running time. A related problem is to test if a graph satisfies $\kappa \ge k$ for a given integer $k$. Henzinger et al. [HRG00] showed how to test $k$-vertex connectivity in time $O(\min\{k^3 + n, kn\} m)$. They also gave a randomized algorithm for computing $\kappa$ with error probability $1/2$ in time $O(nm)$. For an undirected graph, a result of Nagamochi and Ibaraki [NI92] allows $m$ to be replaced by $kn$ or $\kappa n$ in the above bounds. Cheriyan and Reif [CR94] showed how to test $k$-vertex connectivity in a directed graph with a Monte Carlo algorithm with running time $O((M(n) + n M(k)) \log n)$ and error probability $1/n$, and with a Las Vegas algorithm with expected running time $O((M(n) + n M(k)) k)$. In these bounds, $M(n)$ is the time to multiply two $n \times n$ matrices, which is $O(n^{2.376})$ [CW90].
3.2 Dominator Verification
A flowgraph $G(s) = (V, A, s)$ is a graph with a distinguished root $s \in V$ such that every vertex is reachable from $s$. The dominance relation in $G$ is defined as follows: A vertex $w$ dominates a vertex $v$ if every path from $s$ to $v$ includes $w$; if $w \notin \{s, v\}$ then $w$ is a proper dominator of $v$; otherwise, $w$ is a trivial dominator of $v$. The dominance relation can be represented compactly by the dominator tree $D$: This is a tree rooted at $s$ that satisfies the following property: For any two vertices $v$ and $w$, $w$ dominates $v$ if and only if $w$ is an ancestor of $v$ in $D$ [ASU86]. See Figure 3.2.

Figure 3.2: A flowgraph $G(s)$ and its dominator tree $D(s)$.
The computation of dominators appears in several application areas, such as program optimization and code generation, constraint programming, circuit testing, and theoretical biology [GTW06]. Dominators can be computed in almost linear time with the algorithm of Lengauer and Tarjan [LT79]. This algorithm has some conceptual complexities, but it is used in many applications because simpler algorithms have quadratic complexity or worse. There are also even more complicated truly linear-time algorithms [AHLT99, BGK+08, GT04].
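
For intuition, the definition of dominance can be turned directly into code. The following is a deliberately naive Python sketch with $O(nm)$ running time, not the Lengauer-Tarjan algorithm [LT79] or any of the linear-time algorithms cited above: it removes each vertex $w$ in turn and records the vertices that become unreachable from $s$. The small flowgraph and all names are hypothetical.

```python
def reachable(adj, source, removed=None):
    """Vertices reachable from `source` when vertex `removed` is deleted."""
    seen, stack = {source}, [source]
    while stack:
        u = stack.pop()
        for v in adj.get(u, ()):
            if v != removed and v not in seen:
                seen.add(v)
                stack.append(v)
    return seen

def dominators(adj, s):
    """dom[v] = set of all vertices w such that every s-v path includes w."""
    vertices = reachable(adj, s)
    dom = {v: {s, v} for v in vertices}
    for w in vertices - {s}:
        # Every vertex cut off from s by deleting w is dominated by w.
        for v in vertices - reachable(adj, s, removed=w):
            dom[v].add(w)
    return dom

def dominator_tree(adj, s):
    """Parent map of the dominator tree: the dominators of v form a chain,
    and the parent of v is the one other than v with the most dominators."""
    dom = dominators(adj, s)
    return {v: max(dom[v] - {v}, key=lambda w: len(dom[w]))
            for v in dom if v != s}

# Hypothetical flowgraph with root "s".
adj = {"s": ["a", "b"], "a": ["c"], "b": ["c"], "c": ["d", "e"], "d": ["e"]}
print(sorted(dominators(adj, "s")["e"]))   # ['c', 'e', 's']
print(dominator_tree(adj, "s")["d"])       # c
```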
We define the dominator verification problem as follows: Given a flowgraph $G(s)$ and a tree $T$, test if $T$ is the dominator tree of $G(s)$. An important special case of this problem is the verification of trivial dominators: Given a flowgraph $G(s)$, test if every vertex $v \ne s$ has only trivial dominators (i.e., is dominated only by $s$ and by itself). We have shown that the dominator verification problem can be reduced in linear time to the problem of verifying trivial dominators.
The dominator verification problem was initially motivated by the complexities of the efficient algorithms for computing dominators. Moreover, in the next sections we show that the problems of testing the existence of pairs of vertex-disjoint paths of type (a) to all vertices starting from a fixed source, and (b) from a given source vertex to a given target vertex, can be reduced to the verification of trivial dominators.

Figure 3.3: Two independent spanning trees $T_1$ and $T_2$ of a 2-vertex connected graph $G$.
3.3 Independent Spanning Trees
Let $T_1$ and $T_2$ be two spanning trees of a graph $G = (V, E)$ rooted at a vertex $s \in V$. The spanning trees are independent if for each vertex $v$ the two $s$-$v$ paths in $T_1$ and $T_2$ are internally vertex-disjoint. See Figure 3.3. The spanning trees are strongly independent if they contain an $s$-$v$ path and an $s$-$u$ path that are vertex-disjoint, for all pairs of vertices $u$ and $v$. Independent spanning trees have been used in fault-tolerant communications (see, e.g., [AB00, IR88]).
The existence of two such spanning trees is implied by a result of Whitty [Whi87], when $G$ satisfies the following necessary and sufficient condition: $G$ contains two vertex-disjoint $s$-$v$ paths for all vertices $v \ne s$. This is equivalent to stating that the flowgraph with root $s$ has only trivial dominators. Whitty gave a polynomial-time construction for two strongly independent spanning trees. Simpler constructions were later provided by Plehn [Ple91] and by Cheriyan and Reif [CR94], but the time complexity of these constructions was not specified. Huck [Huc94] gave an $O(mn)$-time construction of two independent spanning trees. In [GT10] we provide linear-time constructions of two strongly independent spanning trees and other related concepts.
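
The definition of independence can be checked directly from the two trees. The sketch below is a straightforward quadratic-time test in Python, not the linear-time machinery of [GT10]; the trees are given as hypothetical parent maps on four vertices, and the function names are illustrative.

```python
def root_path(parent, v):
    """Vertices on the tree path from the root to v (parent[root] is None)."""
    path = []
    while v is not None:
        path.append(v)
        v = parent[v]
    return path[::-1]

def are_independent(parent1, parent2, s):
    """True if, for every v != s, the two s-v tree paths share no vertex
    other than their endpoints s and v."""
    for v in parent1:
        if v == s:
            continue
        inner1 = set(root_path(parent1, v)[1:-1])
        inner2 = set(root_path(parent2, v)[1:-1])
        if inner1 & inner2:
            return False
    return True

# Hypothetical trees rooted at "s": T1 is s -> a -> b -> c, T2 is s -> c -> b -> a.
T1 = {"s": None, "a": "s", "b": "a", "c": "b"}
T2 = {"s": None, "c": "s", "b": "c", "a": "b"}
print(are_independent(T1, T2, "s"))  # True
print(are_independent(T1, T1, "s"))  # False
```

A tree is never independent of itself once some vertex lies at depth two or more, which is what the second call shows.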
3.4 Testing 2-Vertex Connectivity
Consider a 2-vertex connected graph $G = (V, E)$. For $s \in V$, let $G(s)$ be the flowgraph with root $s$. The definition of 2-vertex connectivity implies that in $G(s)$ every vertex $v \ne s$ has only trivial dominators. The same property holds for the reverse graph $G^r$, which is derived from $G$ after reversing all edge directions.

In [Geo10] we show that for a graph to be 2-vertex connected it is sufficient that the above two properties hold for two arbitrary vertices. Therefore, testing a graph for 2-vertex connectivity can be reduced to testing if a constant number of flowgraphs have trivial dominators only. This reduction together with the results of [GT10] implies a simple linear-time algorithm for testing 2-vertex connectivity.
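
The reduction can be illustrated with a naive test for trivial dominators. In the Python sketch below, "two arbitrary vertices" is read as two distinct vertices $a$ and $b$, and the four flowgraphs $G(a)$, $G^r(a)$, $G(b)$ and $G^r(b)$ are checked by deleting one vertex at a time. This takes $O(nm)$ time and only stands in for the linear-time verification of [GT10]; all names are hypothetical.

```python
def reachable(adj, source, removed=None):
    """Vertices reachable from `source` when vertex `removed` is deleted."""
    seen, stack = {source}, [source]
    while stack:
        u = stack.pop()
        for v in adj.get(u, ()):
            if v != removed and v not in seen:
                seen.add(v)
                stack.append(v)
    return seen

def has_only_trivial_dominators(adj, vertices, s):
    """All vertices are reachable from s, and stay reachable from s
    after deleting any single vertex w != s."""
    if reachable(adj, s) != vertices:
        return False
    return all(reachable(adj, s, removed=w) == vertices - {w}
               for w in vertices - {s})

def reverse(adj):
    """Adjacency map of the reverse graph."""
    rev = {}
    for u, targets in adj.items():
        for v in targets:
            rev.setdefault(v, set()).add(u)
    return rev

def is_2_vertex_connected(adj, vertices):
    vertices = set(vertices)
    if len(vertices) < 3:             # needs at least k + 1 = 3 vertices
        return False
    a, b = sorted(vertices)[:2]       # any two distinct vertices
    rev = reverse(adj)
    return all(has_only_trivial_dominators(g, vertices, s)
               for s in (a, b) for g in (adj, rev))

complete = {1: {2, 3}, 2: {1, 3}, 3: {1, 2}}
cycle = {1: {2}, 2: {3}, 3: {4}, 4: {1}}
print(is_2_vertex_connected(complete, [1, 2, 3]))   # True
print(is_2_vertex_connected(cycle, [1, 2, 3, 4]))   # False
```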
3.5 Computing Pairs of Vertex-Disjoint s-t Paths
We consider next the problem of computing two internally vertex-disjoint paths directed from $s$ to $t$, for any given source vertex $s$ and target vertex $t$. See Figure 3.4. This problem can be reduced to computing two edge-disjoint paths (by applying a standard vertex splitting procedure), which in turn can be carried out in $O(m)$ time by computing two flow-augmenting paths [BJG02].
In [Geo10] we presented a faster algorithm for 2-vertex connected graphs. First we note that our algorithm for testing 2-vertex connectivity allows us to find in linear time a 2-vertex connected spanning subgraph of the input digraph with $O(n)$ edges. Hence, the flow-augmenting algorithm can compute two internally vertex-disjoint $s$-$t$ paths in $O(n)$ time. We can further improve this with the use of independent spanning trees. Based on the results mentioned in Section 3.3, we can construct a linear space data structure that computes two internally vertex-disjoint $s$-$t$ paths, for any $s, t$, in $O(\log^2 n)$ time, so that the two paths can be reported in constant time per vertex. We remark that the reporting algorithm needs to find common ancestors of some vertices in pairs of trees, which is a variant of the join-reachability problem defined in Chapter 2.

Figure 3.4: Two vertex-disjoint paths (from $d$ to $e$) in a 2-vertex connected graph $G$.
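
The classical baseline mentioned above (vertex splitting followed by two flow-augmenting paths) can be sketched as follows. This Python sketch illustrates that standard reduction only, not the faster data structure of [Geo10]; it assumes $s \ne t$, and the function name and the example digraph are hypothetical.

```python
from collections import deque

def two_vertex_disjoint_paths(edges, s, t):
    """Return two internally vertex-disjoint s-t paths (s != t), or None."""
    cap, adj, arcs = {}, {}, []

    def add_arc(u, v):
        cap[(u, v)] = 1                  # unit capacity
        cap[(v, u)] = 0                  # residual (reverse) arc
        adj.setdefault(u, []).append(v)
        adj.setdefault(v, []).append(u)
        arcs.append((u, v))

    # Vertex splitting: v becomes (v, "in") -> (v, "out") with capacity 1.
    vertices = {x for e in edges for x in e}
    for v in vertices - {s, t}:
        add_arc((v, "in"), (v, "out"))
    for u, v in set(edges):
        if u != v and v != s and u != t:  # arcs into s or out of t are useless
            add_arc((u, "out"), (v, "in"))

    source, sink = (s, "out"), (t, "in")
    for _ in range(2):                    # two augmenting paths suffice
        parent = {source: None}
        queue = deque([source])
        while queue and sink not in parent:
            u = queue.popleft()
            for v in adj.get(u, ()):
                if v not in parent and cap[(u, v)] > 0:
                    parent[v] = u
                    queue.append(v)
        if sink not in parent:
            return None
        v = sink
        while parent[v] is not None:      # push one unit along the path
            u = parent[v]
            cap[(u, v)] -= 1
            cap[(v, u)] += 1
            v = u

    # Decompose the flow: saturated original arcs carry one unit each.
    flow_next, starts = {}, []
    for u, v in arcs:
        if cap[(u, v)] == 0:
            if u == source:
                starts.append(v)
            else:
                flow_next[u] = v
    paths = []
    for node in starts:
        path = [s]
        while node != sink:
            if node[1] == "in":
                path.append(node[0])
            node = flow_next[node]
        paths.append(path + [t])
    return paths

edges = [("s", "a"), ("s", "b"), ("a", "c"), ("b", "c"),
         ("c", "t"), ("a", "d"), ("d", "t")]
for p in two_vertex_disjoint_paths(edges, "s", "t"):
    print(" -> ".join(p))
```

Which of the two paths passes through $c$ depends on the order in which the first augmenting path is found, but the pair returned is always internally vertex-disjoint. Each of the two searches takes time linear in the number of edges, which matches the $O(m)$ bound stated for this approach.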
Chapter 4
Further Applications
Now we consider additional applications of our algorithms and techniques. We remark that the material we present here is part of ongoing research.
4.1 Interprocedural Dominance
As we already mentioned in Section 3.2, the computation of dominators is crucial in the analysis and optimization of computer programs. In the context of whole-program analysis and optimization, however, we have to take into account the fact that there are path-constraints which make some paths of the flowgraph invalid [RHS95]. As a result, the most efficient algorithms for intraprocedural dominators are unable to handle the interprocedural case.
We formulate the interprocedural dominance problem as in [dSvPdB07]. The vertices of the flowgraph are partitioned into sets corresponding to different procedures. Each procedure $P$ has a unique entry vertex $s(P)$ and a unique exit vertex $t(P)$; the main procedure contains the root vertex $s$ and the terminal vertex $t$. An edge $e$ is directed from $tail(e)$ to $head(e)$. A call edge has the form $(x, s(P))$ with $x \notin P$. Similarly, a return edge has the form $(t(P), y)$ with $y \notin P$. Each call edge has a unique corresponding return edge and vice versa. We let $f$ denote the (bijective) function that maps a call edge to its corresponding return edge; if $f((x, s(P))) = (t(P), y)$ then it is implied that $x$ and $y$ belong to the same procedure. Figure 4.1 gives an example.

Figure 4.1: An interprocedural flowgraph (with procedures main, A and B) and its dominator dag. Procedure call edges ($e_1$, $e_3$, $e_5$, and $e_7$) and return edges ($e_2$, $e_4$, $e_6$, and $e_8$) are dotted; the call-return correspondence is given by the function $f$, with $f(e_1) = e_2$, $f(e_3) = e_4$, $f(e_5) = e_6$ and $f(e_7) = e_8$.

A full path starts at s and ends at t. A full path Q is valid if it has a proper nesting of
procedure calls-returns, i.e.,
• if Q contains a return edge e = (t(P), y) then the prefix of Q from s to t(P) contains
  the call edge f^{-1}(e), and
• if Q contains the call edges e and e', where e precedes e', then f(e') precedes f(e)
  in Q.
A valid path is a prefix of a full valid path. A vertex w dominates a vertex v if every valid
path from s to v includes w.
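As an illustration of the nesting condition, the following Python sketch checks a path
with a stack. It interprets "proper nesting" as the usual discipline in which a return
edge must close the most recently opened call; this reading, the function name
respects_call_nesting, the dictionary f, and the small flowgraph are our own hypothetical
choices and are not taken from [dSvPdB07]. The check covers only the nesting of calls and
returns; it does not verify that the edges are consecutive or that the path can be
extended to t.

```python
def respects_call_nesting(path, f):
    """Check that the call and return edges of `path` are properly nested.

    path is a list of (tail, head) edges; f is a dict mapping each call edge
    to its corresponding return edge. Calls may remain open at the end, as
    in a prefix of a full valid path.
    """
    f_inverse = {ret: call for call, ret in f.items()}
    open_calls = []                      # stack of calls still awaiting their return
    for edge in path:
        if edge in f:
            open_calls.append(edge)
        elif edge in f_inverse:
            # A return edge must close the most recently opened call.
            if not open_calls or open_calls[-1] != f_inverse[edge]:
                return False
            open_calls.pop()
    return True


# Hypothetical flowgraph: procedure A is called from two sites, a and c, in main.
f = {("a", "s(A)"): ("t(A)", "b"),
     ("c", "s(A)"): ("t(A)", "d")}
good = [("s", "a"), ("a", "s(A)"), ("s(A)", "t(A)"), ("t(A)", "b"), ("b", "t")]
bad = [("s", "a"), ("a", "s(A)"), ("s(A)", "t(A)"), ("t(A)", "d"), ("d", "t")]
```

Here the path good returns to the call site it came from, whereas bad enters A from a
and leaves through the return edge that belongs to the call from c, so it is rejected.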
The existence of path-constraints modifies the structure of the dominance relation
(with respect to the standard problem). Specifically, the transitive reduction of the
interprocedural dominance relation is no longer a tree but a directed acyclic graph.
We have developed efficient algorithms for special cases of this problem by formulating
them in the context of join-reachability. We are currently extending our solutions
to these special cases in order to derive efficient algorithms for computing the
interprocedural dominance relation in the general case.
4.2 Computational Morphological Analysis
Morphology is the study of the internal structure of words. Morphological analysis
consists of the identification of the constituents of words. The smallest meaningful
constituents are called morphemes (e.g., dog, dog-s). The morphemes have grammatical
functions; they express inflectional properties. For instance, tense and aspect are the
inflectional categories expressed in verbs (play, play-ed, play-ing).
Lexemes are abstract entities and can be thought of as sets of words (PLAY). Word-forms
are concrete entities and belong to a single lexeme (plays, played belong to the
lexeme PLAY). The set of word-forms that belong to a lexeme is called a paradigm.
The part of a word-form that carries its concrete meaning is called the root (play).
Word-forms may also contain affixes, which carry an abstract meaning (play-ing, play-ed,
play-er). Affixes that follow the root are called suffixes (play-ing, play-ed, play-er).
Affixes that precede the root are called prefixes (re-read).
There are four major theoretical approaches to inflection; see [Se01]. In the present
study, we adopt the framework of Distributed Morphology. For simplicity, we
only provide a brief sketch of a possible morphological analysis of some forms of a
verbal paradigm in Greek [Gal05].
Some of the core questions and issues we need to ask and take into account in any
given morphological analysis are:
- What morphological units do languages consist of?
- What features are expressed in each morpheme?
- How do different morphemes interact with one another?
- Can all morphemes be matched to one another?
- How do we account for any constraints on the matching of morphemes?
Figure 4.2: A graph of a morphological analysis. [Node labels in the figure include the
root apolim- and affixes such as -en-, -th-, -thik-, -tik-, -ontas, -omun, -ome, -a, -o, -i,
-os, -tis, and -tirio.]
Derivation is the process by which new words (with a new meaning) are formed
(read, read-able, kind, kind-ness). Different languages employ different processes by which
derivation occurs, for instance affixation. Here, some of the fundamental questions
one needs to answer are:
- How do roots combine with certain prefixes and suffixes?
- What are the constraints in such formations?
- What about the interface of inflection and derivation?
Computational approaches to morphology can provide empirical evidence that can
help in answering such questions. Parts of such approaches can be formulated as graph
reachability and path-selection problems. A simple example is shown in Figure 4.2;
constituents are combined in paths to form a word-form. (We stress that this figure
is not an exhaustive morphological representation. We leave aside any phonological
and/or lexical rules that may further apply.)
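The idea that word-forms correspond to paths can be sketched in a few lines of Python.
The graph below is a hypothetical toy example built from the English morphemes used
earlier in this section; it is not the graph of Figure 4.2, and the names word_forms, toy,
START, and END are our own. The sketch assumes the morpheme graph is acyclic and
ignores phonological and lexical rules.

```python
def word_forms(graph, start="START", end="END"):
    """List the word-forms spelled out by all start-to-end paths of an acyclic graph."""
    forms = []

    def walk(node, parts):
        if node == end:
            # Concatenate the morphemes on the path, dropping the hyphens.
            forms.append("".join(p.strip("-") for p in parts))
            return
        for successor in graph.get(node, []):
            walk(successor, parts if successor == end else parts + [successor])

    walk(start, [])
    return forms


# Hypothetical morpheme graph: an edge u -> v means that v may directly follow u.
toy = {
    "START": ["re-", "read", "play"],
    "re-": ["read"],
    "read": ["-able", "END"],
    "play": ["-ed", "-ing", "-er", "END"],
    "-able": ["END"],
    "-ed": ["END"],
    "-ing": ["END"],
    "-er": ["END"],
}
```

In this toy graph a constraint between morphemes is expressed simply by the absence of
an edge: there is no edge from re- to play, so no path spells out "replay".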
Chapter 5
Conclusions and Future Work
In this project we studied a collection of Reachability and Path-Selection problems, and
designed efficient algorithms for their solution. We believe that several related topics,
some of which are listed below, deserve further investigation.
Problems related to Reachability:
• Determine the computational complexity of constructing the smallest join-reachability
  graph for simple graph classes such as trees.
• Provide bounds for the explicit representation of the join-reachability graph for
  other interesting graph classes.
• Consider the problem of approximating the smallest join-reachability graph for
  specific graph classes.
Problems related to Path-Selection:
• Design fast algorithms for testing k-connectivity for constant k > 2.
• Consider data structures that quickly report more than two disjoint s-t paths.
• Design fast (linear or near-linear time) algorithms for computing a sparse 2-vertex
  connected subgraph of a given graph. The computation of the smallest such subgraph
  is NP-hard, so here we are interested in fast heuristics that achieve good
  approximation guarantees.
We plan to investigate the above topics in our future studies.
Bibliography
[AB00] F. S. Annexstein and K. A. Berman. Directional routing via generalized st-numberings. SIAM J. Discret. Math., 13(2):268–279, 2000.
[AGU72] A. V. Aho, M. R. Garey, and J. D. Ullman. The transitive reduction of a directed graph. SIAM J. Comput., 1(2):131–137, 1972.
[AHLT99] S. Alstrup, D. Harel, P. W. Lauridsen, and M. Thorup. Dominators in linear time. SIAM Journal on Computing, 28(6):2117–32, 1999.
[ASU86] A. V. Aho, R. Sethi, and J. D. Ullman. Compilers: Principles, Techniques, and Tools. Addison-Wesley, Reading, MA, 1986.
[BGK+08] A. L. Buchsbaum, L. Georgiadis, H. Kaplan, A. Rogers, R. E. Tarjan, and J. R. Westbrook. Linear-time algorithms for dominators and other path-evaluation problems. SIAM Journal on Computing, 38(4):1533–1573, 2008.
[BJG02] J. Bang-Jensen and G. Gutin. Digraphs: Theory, Algorithms and Applications (Springer Monographs in Mathematics). Springer, 1st ed. 2001. 3rd printing edition, 2002.
[CR94] J. Cheriyan and J. H. Reif. Directed s-t numberings, rubber bands, and testing digraph k-vertex connectivity. Combinatorica, 14(4):435–451, 1994.
[CW90] D. Coppersmith and S. Winograd. Matrix multiplication via arithmetic progressions. J. Symb. Comput., 9(3):251–280, 1990.
[DKNS01] C. Dwork, R. Kumar, M. Naor, and D. Sivakumar. Rank aggregation methods for the web. In WWW ’01: Proceedings of the 10th international conference on World Wide Web, pages 613–622, 2001.
[dSvPdB07] B. de Sutter, L. van Put, and K. de Bosschere. A practical interprocedural dominance algorithm. ACM Trans. Program. Lang. Syst., 29(4), 2007.
[Gab06] H. N. Gabow. Using expander graphs to find vertex connectivity. Journal of the ACM, 53(5):800–844, 2006.
[Gal05] A. Galani. The Morphosyntax of Verbs in Modern Greek. PhD thesis, University of York, UK, September 2005.
[GBT84] H. N. Gabow, J. L. Bentley, and R. E. Tarjan. Scaling and related techniques for geometry problems. In Proc. 16th ACM Symp. on Theory of Computing, pages 135–143, 1984.
[Geo08] L. Georgiadis. Computing frequency dominators and related problems. In ISAAC ’08: Proceedings of the 19th International Symposium on Algorithms and Computation, pages 704–715, 2008.
[Geo10] L. Georgiadis. Testing 2-vertex connectivity and computing pairs of vertex-disjoint s-t paths in digraphs. In Proc. 37th Int’l. Coll. on Automata, Languages, and Programming, pages 738–749, 2010.
[GNP10] L. Georgiadis, S. D. Nikolopoulos, and L. Palios. Join-reachability in directed graphs. Manuscript, 2010.
[GT04] L. Georgiadis and R. E. Tarjan. Finding dominators revisited. In Proc. 15th ACM-SIAM Symp. on Discrete Algorithms, pages 862–871, 2004.
[GT05] L. Georgiadis and R. E. Tarjan. Dominator tree verification and vertex-disjoint paths. In Proc. 16th ACM-SIAM Symp. on Discrete Algorithms, pages 433–442, 2005.
[GT10] L. Georgiadis and R. E. Tarjan. Dominator verification and independent spanning trees. Manuscript, 2010.
[GTW06] L. Georgiadis, R. E. Tarjan, and R. F. Werneck. Finding dominators in practice. Journal of Graph Algorithms and Applications (JGAA), 10(1):69–94, 2006.
[HRG00] M. R. Henzinger, S. Rao, and H. N. Gabow. Computing vertex connectivity: New bounds from old techniques. Journal of Algorithms, 34:222–250, 2000.
[HT84] D. Harel and R. E. Tarjan. Fast algorithms for finding nearest common ancestors. SIAM Journal on Computing, 13(2):338–55, 1984.
[Huc94] A. Huck. Independent trees in graphs. Graphs and Combinatorics, 10:29–45, 1994.
[IR88] A. Itai and M. Rodeh. The multi-tree approach to reliability in distributed networks. Information and Computation, 79(1):43–59, 1988.
[KKS05] I. Katriel, M. Kutz, and M. Skutella. Reachability substitutes for planar digraphs. Technical Report MPI-I-2005-1-002, Max-Planck-Institut für Informatik, 2005.
[LT79] T. Lengauer and R. E. Tarjan. A fast algorithm for finding dominators in a flowgraph. ACM Transactions on Programming Languages and Systems, 1(1):121–41, 1979.
[NI92] H. Nagamochi and T. Ibaraki. A linear-time algorithm for finding a sparse k-connected spanning subgraph of a k-connected graph. Algorithmica, 7:583–596, 1992.
[Ple91] J. Plehn. Über die Existenz und das Finden von Subgraphen. PhD thesis, University of Bonn, Germany, May 1991.
[RHS95] T. Reps, S. Horwitz, and M. Sagiv. Precise interprocedural dataflow analysis via graph reachability. In Proceedings of the 22nd ACM SIGPLAN-SIGACT symposium on Principles of programming languages, pages 49–61, June 1995.
[Se01] A. Spencer and A. Zwicky (eds). The Handbook of Morphology. Blackwell Publishers, 2001.
[Whi87] R. W. Whitty. Vertex-disjoint paths and edge-disjoint branchings in directed graphs. Journal of Graph Theory, 11:349–358, 1987.
[WHY+06] H. Wang, H. He, J. Yang, P. S. Yu, and J. X. Yu. Dual labeling: Answering graph reachability queries in constant time. In ICDE ’06: Proceedings of the 22nd International Conference on Data Engineering, page 75, 2006.