tree decomposition - wiki.ornl.gov - Oak Ridge National Laboratory


Oct 2, 2013


Scalable Graph Decompositions

Presented by:



Blair D. Sullivan

Complex Systems Group

Center for Engineering Design & Advanced Research

Computer Science and Mathematics Division

Oak Ridge National Laboratory


Research supported by the Department of Energy’s Office of Science

Office of Advanced Scientific Computing Research

Applied Mathematics Program

Motivation


Massive data with an underlying graph
structure is emerging in many fields
including communication & transportation
networks, bioinformatics, and the power
grid.


Tree & branch decompositions are specialized mappings of graphs onto trees, with quality measured by width metrics. Many data sets exhibit very low-width decompositions independent of the number of nodes/edges in the graph.

Yeast Protein Interaction Network


Tree/branch decompositions may serve a dual role in analyzing such large graphs:

1. A tree or branch decomposition naturally breaks up a graph, allowing data parallel computation.

2. Thanks to fixed parameter tractability, many NP-hard problems on a graph can be solved in time that is exponential in the width of the decomposition, but often linear in the size of the graph.

Background:


In 2003, Cook & Seymour used a branch decomposition approach to discover new best known solutions to several widely-studied large Traveling Salesman Problem instances from the TSPLIB.


Some recent work in bioinformatics uses tree decompositions for smaller combinatorial problems where graphs have very small width (e.g. RNA sequence alignment and protein side-chain placement).


There are few examples of using these decompositions to subdivide large graphs for parallel computation while also exploiting fixed parameter tractability.

Optimal TSP tour of 15,112 German cities

Approach:


Use tree decompositions to transform algorithm complexity so that it is exponential in
the width, but polynomial in number of nodes.


Develop efficient, scalable algorithms for computing low-width decompositions for large graphs.


Integrate parallel computing with decomposition algorithms and dynamic programming.

Tree Decompositions


A tree decomposition of a graph G = (V,E) is a pair (X,T), where X is a collection of subsets of V and T is a tree with nodes {1, …, n}, so that each node i of T has an associated set (bag) X_i in X, and (X,T) satisfies three conditions:

1) The union of the sets in X is equal to V.

2) For every edge (u,v) in G, {u, v} is a subset of some X_i.

3) For every vertex v in G, the set of nodes whose bags contain v forms a connected sub-tree of T.


The width of a tree decomposition is the maximum of |X_i| − 1 over i = 1, 2, …, n.

The treewidth (tw) of a graph G is the minimum width over all tree decompositions of G.

The Petersen graph and a width 4 tree decomposition are shown at right. The subtree associated with vertex w_3 is shown in orange.
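The three conditions and the width definition above can be checked mechanically. The following is a minimal Python sketch, not code from the presentation: the function names and the dictionary-based representation (`bags` maps each tree node to its set X_i, `tree_edges` lists the edges of T) are assumptions made for illustration, and the sketch takes for granted that `tree_edges` really does describe a tree.

```python
from collections import deque


def is_tree_decomposition(vertices, edges, bags, tree_edges):
    """Check the three conditions for bags (dict: tree node -> set of vertices).

    Assumes tree_edges already describes a tree on the keys of `bags`.
    """
    # Condition 1: the union of the bags equals V.
    if set().union(*bags.values()) != set(vertices):
        return False
    # Condition 2: every edge of G is contained in at least one bag.
    for u, v in edges:
        if not any(u in b and v in b for b in bags.values()):
            return False
    # Condition 3: the nodes whose bags contain v form a connected sub-tree.
    adj = {i: set() for i in bags}
    for i, j in tree_edges:
        adj[i].add(j)
        adj[j].add(i)
    for v in vertices:
        holders = {i for i, b in bags.items() if v in b}
        start = next(iter(holders))
        seen, queue = {start}, deque([start])
        while queue:
            i = queue.popleft()
            for j in adj[i] & holders:
                if j not in seen:
                    seen.add(j)
                    queue.append(j)
        if seen != holders:
            return False
    return True


def width(bags):
    """Width = largest bag size minus one."""
    return max(len(b) for b in bags.values()) - 1
```

For a 4-cycle a-b-c-d, the two bags {a, b, c} and {a, c, d} joined by a single tree edge pass all three checks and give width 2.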

Finding a Tree Decomposition


Tree decompositions are usually computed via graph triangulation. Given an elimination ordering (linear order of the vertices), one triangulates the graph by sequentially adding edges among each vertex’s higher-numbered neighbors.


The bags of the tree decomposition correspond to each vertex together with its higher-numbered neighbors in the triangulation. If the max clique in the triangulated graph has size k + 1, the associated tree decomposition has width k.
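As a rough illustration of the triangulation step, the sketch below processes vertices in a given elimination ordering, turns each vertex's higher-numbered neighbors into a clique, and records the resulting bags, fill edges and width. The function name `eliminate` and the adjacency-dictionary input are assumptions for this example; it only produces the bags and does not build the tree that connects them.

```python
def eliminate(adjacency, ordering):
    """Triangulate along `ordering`; return (bags, fill_edges, width).

    adjacency: dict vertex -> set of neighbours (undirected graph)
    ordering : list of all vertices, the elimination ordering
    """
    adj = {v: set(nbrs) for v, nbrs in adjacency.items()}
    position = {v: k for k, v in enumerate(ordering)}
    bags, fill = {}, set()
    for v in ordering:
        higher = {u for u in adj[v] if position[u] > position[v]}
        # Add edges so the higher-numbered neighbours of v form a clique.
        for a in higher:
            for b in higher:
                if a != b and b not in adj[a]:
                    adj[a].add(b)
                    adj[b].add(a)
                    fill.add(frozenset((a, b)))
        # The bag for v is v plus its higher-numbered neighbours.
        bags[v] = {v} | higher
    width = max(len(b) for b in bags.values()) - 1
    return bags, fill, width
```

The ordering matters: on the path a-b-c, the ordering a, b, c gives width 1 with no fill, while b, a, c adds the fill edge a-c and gives width 2.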



Heuristics vary greatly in computational complexity and the width of the resulting decomposition. Most were designed to minimize the number of fill edges, not width.
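To make the difference between scoring rules concrete, here is a small sketch of a greedy elimination loop with two well-known scores, minimum degree and minimum fill. These two are chosen only as familiar examples; the slides do not say which six heuristics were compared, and the names `greedy_ordering`, `min_degree` and `min_fill` are illustrative. Vertex names are assumed to be sortable so that ties break deterministically.

```python
from itertools import combinations


def greedy_ordering(adjacency, score):
    """Repeatedly eliminate the vertex with the smallest score.

    Returns (ordering, width, number_of_fill_edges).
    """
    adj = {v: set(nbrs) for v, nbrs in adjacency.items()}
    ordering, width, fill = [], 0, 0
    while adj:
        v = min(sorted(adj), key=lambda x: score(adj, x))
        nbrs = adj.pop(v)
        # The bag for v is v plus its remaining neighbours.
        width = max(width, len(nbrs))
        for a, b in combinations(nbrs, 2):
            if b not in adj[a]:
                adj[a].add(b)
                adj[b].add(a)
                fill += 1
        for u in nbrs:
            adj[u].discard(v)
        ordering.append(v)
    return ordering, width, fill


def min_degree(adj, v):
    """Score: current number of neighbours."""
    return len(adj[v])


def min_fill(adj, v):
    """Score: number of fill edges eliminating v would add."""
    return sum(1 for a, b in combinations(adj[v], 2) if b not in adj[a])
```

Both scores are recomputed on the partially eliminated graph at every step, which is one reason the cost of such heuristics varies so much.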

Left: comparison of width and fill from 6 heuristics on graphs known to have tw <= 30

Above: A triangulation of the Petersen graph

Branch Decompositions


A branch decomposition of a graph G is a pair (T, φ) where T is a ternary tree (every node of T has degree 1 or 3), and φ is a bijection between the edges of G and the leaves of T.


The width of an edge e in T is the number of vertices of G incident with graph edges assigned to leaves on both sides of e (the middle set of e). The width of the branch decomposition is the max width of its edges.


The branchwidth (bw) of G is the minimum width over all branch decompositions of G.


Robertson and Seymour proved that tw is bounded between bw - 1 and 3*bw/2 - 1.

The Petersen graph (left) and a branch decomposition (center) are shown at the top. The red
edge partitions the graph edges into the sets colored green and blue (right), and thus has width
4, the number of vertices incident with edges of both colors (shaded in red).
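The width of a tree edge, as in the red-edge example above, can be computed by removing that edge and intersecting the graph vertices seen on each side. The sketch below is one straightforward (not optimized) way to do this; `branch_width`, `tree_adj` and `leaf_edge` are names assumed for illustration, and tree node names are assumed to be mutually comparable (for example, all strings).

```python
def branch_width(tree_adj, leaf_edge):
    """Width of a branch decomposition, plus the width of every tree edge.

    tree_adj : dict tree node -> set of neighbouring tree nodes
    leaf_edge: dict leaf of T -> the edge (u, v) of G assigned to that leaf
    """
    def side_vertices(start, blocked):
        # Vertices of G on edges at leaves reachable from `start`
        # without crossing the tree edge {start, blocked}.
        seen, stack, verts = {start, blocked}, [start], set()
        while stack:
            node = stack.pop()
            if node in leaf_edge:
                verts.update(leaf_edge[node])
            for nxt in tree_adj[node]:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        return verts

    widths = {}
    for a in tree_adj:
        for b in tree_adj[a]:
            if a < b:  # visit each tree edge once
                middle = side_vertices(a, b) & side_vertices(b, a)
                widths[(a, b)] = len(middle)
    return max(widths.values()), widths
```

For a triangle with edges ab, bc and ca placed at the three leaves of a star, every tree edge has a middle set of size 2, so the decomposition has width 2.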

Forming a Branch Decomposition

Common algorithms refine a tree with heuristics until it is ternary, trying to keep middle sets small. Techniques include searching for 2- and 3-separations, and using eigenvectors and network flows.

Figure panels: Initial Star, Intermediate Tree, Intermediate Tree, Final Ternary Tree


Tree and branch decompositions provide a framework for a variety of dynamic programming algorithms for NP-hard decision and optimization problems on graphs.

The general strategy is to root the tree and then work “up” from the leaves, solving sub-problems and storing partial solutions along the way.

Solving the sub-problems requires information about only a small part of the original graph, represented by the child nodes lower in the tree.

The complexity of processing a specific node can be exponential in its bag size (tree decomposition) or middle set of the neighboring edges (branch decomposition).

Dynamic Programming

In a tree decomposition, computing the dynamic programming table at node c requires information about the vertices in the bag V_c and the children’s tables, T_a and T_b. The complexity of this computation can be exponential in |V_c|.
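As one concrete instance of this pattern, the sketch below computes the weight of a maximum weight independent set by dynamic programming over a rooted tree decomposition. It is a textbook-style illustration rather than the presenters' implementation: the table at node c has one entry per independent subset of the bag, which is where the exponential dependence on the bag size comes from. All names (`bags`, `children`, `root`, `weight`) are assumptions, the input is assumed to be a valid tree decomposition, and vertex names are assumed to be sortable.

```python
from itertools import combinations


def max_weight_independent_set(bags, children, root, edges, weight):
    """Bottom-up DP over a rooted tree decomposition.

    bags    : dict tree node -> set of graph vertices (the bag)
    children: dict tree node -> list of child tree nodes
    edges   : iterable of graph edges (u, v)
    weight  : dict graph vertex -> weight
    """
    adjacent = {frozenset(e) for e in edges}

    def independent_subsets(bag):
        items = sorted(bag)
        for r in range(len(items) + 1):
            for combo in combinations(items, r):
                if all(frozenset(p) not in adjacent
                       for p in combinations(combo, 2)):
                    yield frozenset(combo)

    def table(c):
        # Children's tables are computed first, then combined at c.
        child_tables = [(bags[a], table(a)) for a in children.get(c, [])]
        result = {}
        for s in independent_subsets(bags[c]):
            total = sum(weight[v] for v in s)
            for bag_a, t_a in child_tables:
                shared = s & bag_a
                # Best child entry that agrees with s on the shared vertices.
                best = max(val for s_a, val in t_a.items()
                           if s_a & bags[c] == shared)
                # Shared vertices were already counted in the child's value.
                total += best - sum(weight[v] for v in shared)
            result[s] = total
        return result

    return max(table(root).values())
```

On the path a-b-c-d with weights 3, 4, 5, 1 and bags {a, b}, {b, c}, {c, d} arranged in a chain, the sketch returns 8, corresponding to the set {a, c}.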

Parallelization of Dynamic Programming



Individual sub-trees in a tree or branch decomposition can be processed independently.

Processing each sub-tree requires information about only a small part of the graph.

As more and more sub-trees are processed, we move closer to the root.

This will require non-traditional parallelization techniques due to difficulty in estimating computational workload for each sub-tree, load-balancing, etc.

A rooted branch decomposition for a TSP graph is colored to illustrate the independent sub-trees that can be processed in parallel during dynamic programming.
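One simple way to expose the independence between sub-trees is to group tree nodes by height and process each group concurrently, since a node depends only on its children. The sketch below does this with Python's standard `concurrent.futures` thread pool. It is a deliberately naive scheme, not the approach proposed in the slides: the per-wave barrier ignores the load-balancing difficulty noted above, threads in CPython will not speed up CPU-bound work (a process pool or another runtime would be needed), and the names `waves`, `process_bottom_up` and `work` are illustrative.

```python
from concurrent.futures import ThreadPoolExecutor


def waves(children, root):
    """Group nodes by height; nodes in the same wave are independent."""
    height = {}

    def visit(c):
        kids = children.get(c, [])
        height[c] = 1 + max((visit(k) for k in kids), default=-1)
        return height[c]

    visit(root)
    levels = [[] for _ in range(height[root] + 1)]
    for node, h in height.items():
        levels[h].append(node)
    return [sorted(level) for level in levels]


def process_bottom_up(children, root, work, max_workers=4):
    """Run work(node, child_results) wave by wave, leaves first."""
    results = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        for level in waves(children, root):
            futures = {
                n: pool.submit(work, n,
                               [results[k] for k in children.get(n, [])])
                for n in level
            }
            # Wait for the whole wave before moving closer to the root.
            for n, fut in futures.items():
                results[n] = fut.result()
    return results[root]
```

Here `work` stands in for building one node's dynamic programming table from its children's tables; for example, `lambda n, kids: 1 + sum(kids)` simply counts the nodes in the tree.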

Parallel-friendly Decompositions


Decompositions need to have several attributes to allow efficient parallel computation:

1. They should be far from being “path-like” to allow multiple sub-trees to be processed simultaneously.

2. The distribution of bag sizes/middle set cardinalities should be controlled to allow better load balancing.
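A quick way to inspect both attributes for a given rooted tree decomposition is to report a few shape statistics. The indicators below (leaf count, height, and a histogram of bag sizes) are assumptions chosen for illustration, not metrics defined in the slides: a path-like decomposition has a single leaf and height close to its node count, while a skewed bag-size histogram hints at uneven work per node.

```python
from collections import Counter


def shape_report(bags, children, root):
    """Rough, assumed indicators of how parallel-friendly a decomposition is."""
    def height(c):
        return 1 + max((height(k) for k in children.get(c, [])), default=-1)

    # A leaf is a tree node with no children.
    leaves = sum(1 for c in bags if not children.get(c))
    return {
        "nodes": len(bags),
        "leaves": leaves,
        "height": height(root),
        "bag_sizes": Counter(len(b) for b in bags.values()),
    }
```

Comparing the reports of two decompositions of the same graph gives a first, coarse sense of which one offers more independent sub-trees.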

Sample Applications & Future Work


Cai et al. have used tree decompositions to solve the NP-hard problem of maximum weight independent set for secondary structure prediction in RNA on low-width stem graphs.


Key combinatorial scientific computing kernels use NP-hard problems such as graph coloring for which tree and branch decompositions may offer faster (lower complexity) solutions.


How will graph decomposition based approaches scale when applied to massive graphs?

Contacts

Blair D. Sullivan

Complex Systems Group

Computer Science & Mathematics Division

Oak Ridge National Laboratory

sullivanb@ornl.gov


Chris Groër

Computational Mathematics Group

Computer Science & Mathematics Division

Oak Ridge National Laboratory

groercs@ornl.gov