Parallel Algorithms for Hierarchical Clustering and Applications to Split Decomposition and Parity Graph Recognition

Journal of Algorithms 36, 205–240 (2000)
doi:10.1006/jagm.2000.1090, available online at http://www.idealibrary.com
Elias Dahlhaus¹
Department of Computer Science and Department of Mathematics, University of Cologne, Pohligstrasse 1, D-50969 Cologne, Germany
E-mail: dahlhaus@suenner.informatik.uni-koeln.de
Received January 13, 1999
We present efficient parallel algorithms for two hierarchical clustering heuristics. We point out that these heuristics can also be applied to solving some algorithmic problems in graphs, including split decomposition. We show that efficient parallel split decomposition induces an efficient parallel parity graph recognition algorithm. This is a consequence of the result of S. Cicerone and D. Di Stefano [7] that parity graphs are exactly those graphs that can be split decomposed into cliques and bipartite graphs. © 2000 Academic Press
Key Words: parallel algorithms; graph algorithms; split decomposition; hierarchical clustering; single linkage.
1. INTRODUCTION
Hierarchical clustering plays an important role in many areas of applied science. The major application is the classification of objects as is done in psychology, the social sciences, and artificial intelligence. The reader who is interested in these applications should consult, e.g., [20].
There are different approaches to hierarchical clustering. One is the single linkage method (see for example [20]). We are given distances between the elements of a fixed set $V$ and consider two elements to be in the same $d$-cluster if one can reach the second element from the first by one or more jumps through elements of $V$, such that the distance of each jump is at most $d$.
¹ Present address: Institute of Computer Graphics, Vienna University of Technology.
0196-6774/00 $35.00
Copyright © 2000 by Academic Press
All rights of reproduction in any form reserved.
In another approach, we are given a set system and want to turn it into a hierarchical clustering. That means that we would like to transform the given set system into another set system, so that any two sets are either disjoint or one is a subset of the other. We say that two sets overlap if they intersect but one is not a subset of the other. A natural approach is to determine the overlap components, i.e., the connected components of the graph $G_S$ that consists of the sets of the set system $S$ as vertices and overlapping pairs of sets in $S$ as edges. The clusters are the unions of sets that are in a certain overlap component.
It turns out that these two clustering procedures have a common application in graph algorithms. They both appear as subprocedures in an efficient parallel algorithm for split decomposition (preliminary version [18]). A split of a graph is a partition of its vertex set into two sets such that the edges between the two sets induce a complete bipartite graph. A first parallel split decomposition algorithm is due to Barten [4]. The processor number of this algorithm is $O(n^{4.8.1})$ and the time bound is $O(\log^2 n)$. We will show that split decomposition can be done in polylogarithmic time with a linear processor number (preliminary version [18]). We also will show that the algorithm can be turned into a linear time sequential algorithm (this result is also mentioned in the preliminary paper [18]). The fastest previously known split decomposition algorithm is due to Ma and Spinrad [28] and runs in $O(n^2)$ time. Finally, using a result of Cicerone and Di Stefano [7], it turns out that parity graphs can be recognized in linear time and in parallel with a linear processor bound in quadratic logarithmic time. One gets a simplification of an earlier parity graph recognition algorithm due to the author [16] that has the same time and processor bound. This algorithm is an improvement of the algorithm of Corneil and Przyticka [30].
In Section 3 we develop an efficient algorithm for single linkage clustering. This is an improvement of the algorithm of the author in [14]. In Section 4, we present an efficient parallel and linear time algorithm to determine the overlap components of a set system. This algorithm has also been sketched in [18]. In Section 5, we develop an efficient parallel and linear time algorithm for split decomposition. We also discuss the recognition of parity graphs in this section.
2. PRELIMINARIES
A hierarchical clustering is a collection $C$ of subsets $c$ of a fixed set $V$, such that for all $c$ and $d$ in $C$ either $c \cap d = \emptyset$ or $c \subset d$ or $d \subset c$. This means that if each singleton and $V$ are included in $C$, the subset relation defines on $C$ a tree-like partial ordering, i.e., for each $c \in C$ with $c \neq V$, there is a unique smallest $d \in C$ with $c \subset d$. This $d$ is also called the parent of $c$.
In general, a tree is a root-directed tree consisting of a set $V_T$ of nodes and a set $E_T$ of directed edges. The parent of a node $t$ is the unique $u \in V_T$ with $(u,t) \in E_T$. The children of $t$ are the nodes that have $t$ as their parent.
$y \in V_T$ is called an ancestor of $x \in V_T$ iff there is a directed path (possibly of length 0) from $y$ to $x$ in $T$. $x$ is also called a descendant of $y$ if $y$ is an ancestor of $x$. The set of descendants of $t$ in $T$ including $t$ is denoted by $T_t$. We identify $T_t$ and its induced subtree.
For $x, y \in V_T$ the least common ancestor of $x$ and $y$, denoted by $\mathrm{LCA}(x,y)$, is the common ancestor $z$ of $x$ and $y$, such that no child of $z$ is an ancestor of $x$ and $y$.
A distance function is a binary symmetric positively real valued function $d$ on a domain $V$. Moreover, we assume that for $x \in V$ the equation $d(x,x) = 0$ is valid. A metric is a distance function satisfying the triangle inequality.
Here we always assume that $V$ is a finite domain. Moreover, we let $V$ be a set of the form $\{1, \ldots, n\}$. The distance function $d$ is implemented as an $n \times n$ matrix.
A distance function $d$ is called an ultrametric iff the following extended triangle inequality is valid:
\[ d(i,j) \le \max\{d(i,k),\, d(j,k)\}. \]
A dendrogram is a root-directed tree $T$ together with a positively real valued labeling $h$ of the vertices with a height function, i.e., $h(v) < h(w)$ if $w$ is an ancestor of $v$.
Note that a dendrogram always defines a hierarchical clustering. The cluster $c_t$ of a node $t$ of $T$ is just the set of descendants of $t$ in $T$ that are leaves of $T$. The hierarchical clustering defined by $T$ is the collection of all $c_t$ with $t \in T$.
PROPOSITION (see, for example, [22]). A distance function $d$ on $V$ is an ultrametric iff there is a dendrogram $(T,h)$ such that
1. $V$ is the set of leaves of $T$, and
2. for all $u, v \in V$ the distance $d(u,v)$ is the labeling $h(\mathrm{LCA}(u,v))$ of the least common ancestor $\mathrm{LCA}(u,v)$ of $u$ and $v$ with respect to $T$.
A graph $G = (V,E)$ consists of a vertex set $V$ and an edge set $E$. Multiple edges and loops are not allowed. The edge joining $x$ and $y$ is denoted by $xy$.
We say that $x$ is a neighbor of $y$ iff $xy \in E$. The neighborhood of $x$ is the set $\{y : xy \in E\}$ consisting of all neighbors of $x$ and is denoted by $N(x)$. The neighborhood of a set of vertices $V'$ is the set $N(V') = \{y \mid \exists x \in V',\ yx \in E\}$ of all neighbors of some vertex in $V'$.
A subgraph of $(V,E)$ is a graph $(V',E')$ such that $V' \subset V$, $E' \subset E$. An induced subgraph is an edge-preserving subgraph, i.e., $(V',E')$ is an induced subgraph of $(V,E)$ iff $V' \subset V$ and $E' = \{xy \in E : x, y \in V'\}$.
Connectedness is defined as usual.
For a distance function $d$ with domain $V$ and a real number $r$, let $G_r$ be the graph consisting of all unordered pairs with $d(x,y) \le r$ and let $d'(x,y)$ be the minimum $r$ such that $x$ and $y$ are in the same connected component of $G_r$.
LEMMA 1 (see, for example, [20]). $d'$ is an ultrametric.
The proof is trivial and can be seen in [20].
DEFINITION 1. The clustering defined by the ultrametric $d'$ is called the single linkage clustering of $d$. $d'$ is also called the single linkage distance function for $d$.
LEMMA 2. The clusters of the single linkage clustering of $d$ are exactly the connected components of any $G_r$ as defined above.
If a single linkage cluster $c$ is a connected component of $G_r$ then it is also called the single linkage cluster of coarseness $r$.
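
Lemma 2 suggests a direct, if inefficient, way to list the single linkage clusters of a given coarseness. The sketch below is a sequential illustration using a union-find structure over the pairs with $d(x,y) \le r$; the function name `clusters_of_coarseness` and the sample points are assumptions made for this example, not the parallel method of Section 3.

```python
def clusters_of_coarseness(d, r):
    """Connected components of G_r, where x and y are joined iff d[x][y] <= r."""
    n = len(d)
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for x in range(n):
        for y in range(x + 1, n):
            if d[x][y] <= r:
                parent[find(x)] = find(y)

    groups = {}
    for x in range(n):
        groups.setdefault(find(x), []).append(x)
    return sorted(groups.values())

# Hypothetical data: four points on a line at positions 0, 1, 3, 7.
pos = [0, 1, 3, 7]
d = [[abs(a - b) for b in pos] for a in pos]
```

With these points, raising `r` from 0 to 4 merges the singletons step by step until a single cluster remains, which is exactly the nesting that the dendrogram records.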
.
A minimum spanning tree MST is a tree T with vertex set V,such that
.
the sum of d xy over all edges xy of T is a minimum.
 w x.
L
EMMA
3 See for Example 20.Let T be a minimumspanning tree for
X
.
X X
.
X X
d.Then d x,y is the maximum d x,y o
¨
er all edges x y on the unique
path from x to y in T.
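
Lemma 3 can be turned into a small sequential computation of $d'$. The sketch below assumes a distance matrix given as a list of lists, builds an MST with Prim's algorithm, and then records the largest edge on every tree path. The helper names are illustrative, and the quadratic running time is far from the bounds discussed in this paper.

```python
def minimum_spanning_tree(d):
    """Prim's algorithm on the complete graph with weights d; returns tree edges."""
    n = len(d)
    in_tree = [False] * n
    best = [float("inf")] * n   # cheapest known connection of each vertex to the tree
    link = [None] * n           # tree endpoint realizing best[v]
    best[0] = 0
    edges = []
    for _ in range(n):
        v = min((u for u in range(n) if not in_tree[u]), key=lambda u: best[u])
        in_tree[v] = True
        if link[v] is not None:
            edges.append((link[v], v))
        for u in range(n):
            if not in_tree[u] and d[v][u] < best[u]:
                best[u], link[u] = d[v][u], v
    return edges

def single_linkage_distance(d):
    """d'(x, y) = largest edge weight on the MST path from x to y (Lemma 3)."""
    n = len(d)
    adj = [[] for _ in range(n)]
    for x, y in minimum_spanning_tree(d):
        adj[x].append(y)
        adj[y].append(x)
    dprime = [[0] * n for _ in range(n)]
    for s in range(n):
        stack = [(s, -1, 0)]            # (vertex, tree parent, bottleneck so far)
        while stack:
            v, parent, bottleneck = stack.pop()
            dprime[s][v] = bottleneck
            for u in adj[v]:
                if u != parent:
                    stack.append((u, v, max(bottleneck, d[v][u])))
    return dprime

# Hypothetical data: four points on a line at positions 0, 1, 3, 7.
pos = [0, 1, 3, 7]
d = [[abs(a - b) for b in pos] for a in pos]
dp = single_linkage_distance(d)
```

For these points the MST is the path 0, 1, 2, 3 with edge weights 1, 2, 4, so for instance `dp[0][3]` is the bottleneck weight 4 rather than the original distance 7.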
Two sets overlap if they are not disjoint and not comparable with respect to the subset relation. For a collection $C$ of subsets of $V$, an overlap component is a connected component of the graph $G_C$ with vertex set $C$, where $c_1 c_2$ is an edge iff $c_1$ and $c_2$ overlap. Note that if $C_1$ and $C_2$ are different overlap components of $C$ then either all $c_1 \in C_1$ and $c_2 \in C_2$ are disjoint or there is a $c_2 \in C_2$ such that for all $c_1 \in C_1$, $c_1 \subset c_2$, or vice versa.
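
The definition of overlap components can be made concrete with a brute-force sketch that tests every pair of sets. This quadratic approach is only an illustration of the definition, not the linear time method developed in Section 4; the names `overlap`, `overlap_components` and the sample family are assumptions of this example.

```python
from itertools import combinations

def overlap(a, b):
    """Two sets overlap iff they intersect and neither contains the other."""
    return bool(a & b) and not a <= b and not b <= a

def overlap_components(sets):
    """Components of the overlap graph, as sorted lists of indices into `sets`."""
    parent = list(range(len(sets)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i, j in combinations(range(len(sets)), 2):
        if overlap(sets[i], sets[j]):
            parent[find(i)] = find(j)

    comps = {}
    for i in range(len(sets)):
        comps.setdefault(find(i), []).append(i)
    return sorted(comps.values())

# Hypothetical family: three pairwise chained sets, one superset, one disjoint set.
family = [frozenset(s) for s in ({1, 2}, {2, 3}, {3, 4}, {1, 2, 3, 4, 5}, {6, 7})]
```

In this family the first three sets form one overlap component, while the superset and the disjoint set are components of their own, matching the nested or disjoint structure noted above.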
We say $C_1 \subset C_2$ if there is a $c_2 \in C_2$, such that for all $c_1 \in C_1$, $c_1 \subset c_2$. We denote the set of overlap components of $C$ by $\mathrm{Overlap}(C)$.
One can show the following.
LEMMA 4. $(\mathrm{Overlap}(C), \subset)$ is a tree-like ordering. For each overlap component $C_1$ of $C$, there is a unique overlap component $C_2$ such that $C_1 \subset C_2$ and for no overlap component $C_3$, $C_1 \subset C_3 \subset C_2$, if and only if there is an overlap component $C_1'$ with $C_1 \subset C_1'$.
We will show the lemma in the section dealing with overlap components.
3. PARALLEL ALGORITHM FOR THE SINGLE LINKAGE METHOD
In this section, a preliminary version of which appeared in [14], we show the following.
THEOREM 1. If an MST for $d$ is known then the dendrogram for the single linkage distance $d'$ of $d$ can be determined in $O(\log n)$ time with $O(n)$ processors and $O(n)$ space on a CREW-PRAM.
The key for an algorithm to determine the dendrogram of the single linkage distance is the following result in [15].
LEMMA 5. Each ultrametric $d$ has a minimum spanning tree $T$ with parent function par, s.t. for each $x \in V$ that is not the root of $T$ and that is not a child of the root of $T$, $d(x, \mathrm{par}(x)) < d(\mathrm{par}(x), \mathrm{par}(\mathrm{par}(x)))$. If such a spanning tree for the ultrametric $d$ is known then the dendrogram can be determined in $O(\log n)$ time with $O(n)$ processors on an EREW-PRAM.
We call a spanning tree with the requirements as stated in Lemma 5 a
canonical spanning tree.
The main job is to compute a canonical spanning tree of the single linkage distance efficiently. We assume that the MST $T$ of $d$ is directed to the root $v_0$ and $p(x)$ is the parent of $x$ in $T$ for each $x \in V \setminus \{v_0\}$. Let $\mathrm{par}(x)$ be the first ancestor $y$ of $x$, such that $d(x, p(x)) < d(y, p(y))$, i.e., $y = p^k(x)$, for some $k$, for all $l < k$, with $z = p^l(x)$, $d(z, p(z)) \le d(x, p(x))$, and $d(y, p(y)) > d(x, p(x))$. If such a $y$ does not exist then $\mathrm{par}(x) := v_0$. The construction of par is shown in Fig. 1. The straight lines are the edges of the MST with parent function $p$. The remaining lines are the edges $x\,\mathrm{par}(x)$.
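
The definition of par can be followed literally by walking up the tree from each vertex. The sketch below is a sequential illustration with quadratic worst-case time, not the parallel procedure of Proposition 1. It assumes the MST is given as a dictionary `p` from each non-root vertex to its parent and a dictionary `w` with `w[x]` equal to $d(x, p(x))$; these names and the sample tree are hypothetical.

```python
def canonical_parents(p, w, root):
    """Return par as a dict: par[x] is the first ancestor y of x whose parent
    edge is strictly heavier than the parent edge of x, or the root if no
    such ancestor exists."""
    par = {}
    for x in p:
        if x == root:
            continue
        y = p[x]
        # Skip ancestors whose parent edge is not heavier than the edge x p(x).
        while y != root and w[y] <= w[x]:
            y = p[y]
        par[x] = y
    return par

# Hypothetical MST: the path 4 -> 3 -> 2 -> 1 -> 0, rooted at 0.
p = {1: 0, 2: 1, 3: 2, 4: 3}
w = {1: 1, 2: 3, 3: 1, 4: 2}   # w[x] = d(x, p(x))
par = canonical_parents(p, w, 0)
```

In this example vertices 3 and 4 are both attached to vertex 2, whose parent edge of weight 3 is the first heavier edge above them, while vertices 1 and 2 become children of the root.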
LEMMA 6. par is the parent function of a canonical MST $T'$ for the single linkage distance $d'$ of $d$.
Proof. First we show that $d'(x, \mathrm{par}(x)) = d(x, p(x))$. This follows from the fact that for all edges $yz$ on the unique path from $x$ to $\mathrm{par}(x)$, $d(yz) \le d(x, p(x))$ and $x\,p(x)$ is one of the edges of this path.
FIG. 1. From the MST to the canonical MST of the corresponding ultrametric.
Now suppose $x$ is not a child of $v_0$ with respect to par; i.e., $\mathrm{par}(x) \neq v_0$. Then
\[ d'(x, \mathrm{par}(x)) = d(x, p(x)) < d(\mathrm{par}(x), p(\mathrm{par}(x))) = d'(\mathrm{par}(x), \mathrm{par}(\mathrm{par}(x))). \]
It remains to show the following.
PROPOSITION 1. The canonical MST $T'$ for the single linkage distance $d'$ of $d$ can be computed in $O(\log n)$ time with $O(n)$ processors by a CREW-PRAM if the MST $T$ for $d$ is known.
Proof. We compute the parent function par as in Lemma 6. Here we present an algorithm that needs only $O(n)$ space. The algorithm in [14] needs $O(n \log n)$ space.
The structure of the algorithm is as follows.
1. We decompose the edge set into edge disjoint paths (lines), such that the unique path from each vertex to the root passes only logarithmically many lines. This can be done by tree contraction in logarithmic time with a linear processor number.
2. For each vertex $v \in V$, let $l(v)$ be the line that contains the edge $v\,p(v)$ of $T$. We determine the maximum distance $d(e)$ of an ancestor edge of $v$ on $l(v)$, denoted by $w^*(v)$. This can be done in logarithmic time with a linear workload by list ranking [3].
3. We determine, for each vertex $v$, the first ancestor $a(v)$, such that $w^*(a(v)) > d(v, p(v))$. This can be done, for each vertex $v$, in logarithmically many steps as follows. For each vertex $v$ we determine the root $r(v)$ of the line $l(v)$. Start with $a(v) := v$ and assign $a(v) := r(a(v))$, as long as $w^*(a(v)) \le d(v, p(v))$. This requires logarithmically many steps, because one passes at most logarithmically many lines.
4. Note that $\mathrm{par}(v)$, the first ancestor $w$ of $v$, such that $d(v, p(v)) < d(w, p(w))$, is an ancestor of $a(v)$ in $l(a(v))$. We determine $\mathrm{par}(v)$ by a binary search strategy that will be described below.
3.1. Decomposition into Lines
We first make $T$ a tree that is almost binary; i.e., each vertex has at most two children. If a vertex $t$ has children $t_1, \ldots, t_k$ then we replace the star with vertices $t, t_1, \ldots, t_k$ and edges $t t_i$ by a binary tree $B_t$ with root $t$ and leaves $t_i$ and with the parent function $p_B$. The distances of edges in $B_t$ are determined as follows. The edge $t_i\, p_B(t_i)$ gets the distance $d(t_i\, p_B(t_i)) := d(t_i t)$. All of the other edges of $B_t$ get the distance zero. Note that the distances of the unique path from a vertex $s$ to a vertex $t$ in $T$ before making it almost binary and after making it almost binary deviate only by additional zero distances. Therefore the maximum distance of the unique path from $s$ to $t$ does not change.
We next determine the chains of $T$, i.e., the maximal paths of $T$, such that all inner nodes have exactly one child. This can be done by list ranking in logarithmic time with a linear workload [3]. Note that each edge of $T$ belongs to exactly one chain. In the case that both end vertices of the edge $e$ of $T$ are not of degree two then the chain containing $e$ consists only of this edge. For each chain $c$, let $b(c)$ be the vertex in $c$ that is farthest away from the root of $T$ (i.e., the leaf of $c$) and $f(c)$ be the vertex of $c$ that is closest to the root of $T$ (the root of $c$).
Now we proceed as in the tree contraction procedure as described in [2] (see also [1]).
1. We number the leaves from left to right.
2. For each odd numbered leaf $v$, we remove $v$ and the inner vertices of the chain $c(v)$ that contains $v$ and make $c(v)$ a line.
3. For each $c(v)$, we concatenate the two chains of the remaining tree $T$ that contain the root $f(c(v))$ of $c(v)$ to one chain.
4. We renumber the leaves of $T$ by dividing their numbers by two.
We repeat this procedure until the only remaining vertex of $T$ is the root.
By the same argument as in [2], the chains that are removed have pairwise no vertex in common, because only chains associated to odd numbered leaves are removed. For the same reason, there are no three chains that are concatenated to one chain at the same time. Therefore there is no writing conflict and also no reading conflict. The procedure is repeated $O(\log n)$ times, and since one application of the procedure requires only a constant time, the whole procedure needs $O(\log n)$ time. The workload is, as in [2], $O(n)$.
Since only logarithmically many steps are necessary to eliminate the whole tree $T$, one can reach the root from each vertex of the original tree $T$ by passing logarithmically many lines. Therefore we get the following result.
LEMMA 7. One can split the edge set of $T$ into lines, such that the unique path from any vertex $t$ of $T$ to the root of $T$ passes $O(\log n)$ many lines, in $O(\log n)$ time with a workload of $O(n)$.
3.2. Finding $\mathrm{par}(v)$ in $l(a(v))$
Let $w(v) := d(v, p(v))$. $\mathrm{par}(v)$ is the first ancestor of $a(v)$, say $y$, such that $w(v) < w(y)$.
For each line $l$, let $S_l$ be a balanced binary tree with the edges of $l$ as leaves. The leaves appear in $S_l$ in the same order as on the line $l$. For any inner node $t$ of $S_l$, let $D(t)$ be the maximum distance of an edge of $l$ that is a descendant of $t$. Denote the interval of edges of $l$ that are descendants of $t$ by $I_t$. Note that $D(t)$ is the maximum distance of an edge in $I_t$. Our strategy is as follows. We first search for the next inclusion maximal interval $I_t$, say $I_{t(v)}$, that is right from $a(v)$ with $w(v) < D(t)$. Then by binary search in $I_{t(v)}$, we determine the next right ancestor edge $e := e(v)$ with $w(v) < d(e)$, and we get $\mathrm{par}(v)$.
Determine $t(v)$ ($I_{t(v)}$). Let $f := a(v)\,p(a(v))$ be the parent edge of $a(v)$. First we search for the first ancestor $s(v)$ of $f$ that is a right child. If $D(s(v)) > w(v)$ then $t(v) := s(v)$. Otherwise we search for the next ancestor $t$ of $s(v)$ that is a left child and update $s(v)$ as the right sibling of $t$. We repeat this step until $D(s(v)) > w(v)$. Afterward we set $t(v) := s(v)$.
Determine $e(v)$. Let $t_1$ be the left child and $t_2$ be the right child of $t(v)$. If the maximum distance of an edge in $I_{t_1}$, $D(t_1)$, is larger than $w(v)$ then update $t(v)$ by $t_1$, else update $t(v)$ by $t_2$. Repeat this step until $t(v)$ is an edge of $l$. Finally, $e(v) := t(v)$.
Note that $\mathrm{par}(v)$ is the child vertex of $e(v)$. It is easily seen that for each $v$ both steps can be done in logarithmic time. The space needed for each $v$ separately is constant. Moreover, the tree $S_l$ can be determined in logarithmic time with a linear workload. Therefore we find $\mathrm{par}(v)$ in logarithmic time with a linear processor number in linear space. This completes the proof of Proposition 1 and of the theorem.
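
The two search phases on a single line (climbing to a subtree whose maximum exceeds $w(v)$, then descending to the first heavy edge) can be sketched with an array-based balanced tree. This is an illustrative sequential sketch, assuming the edges of one line are stored in an array ordered from the bottom of the line toward its root, so that "right" means a larger index. The class name `MaxTree` and the method `first_greater` are hypothetical.

```python
class MaxTree:
    """Balanced binary tree over the edge distances of one line (array layout).

    Node t has children 2t and 2t+1; D[t] is the maximum distance among the
    leaves below t. Unused leaves are padded with -infinity.
    """

    def __init__(self, dist):
        self.n = len(dist)
        self.size = 1
        while self.size < self.n:
            self.size *= 2
        self.D = [float("-inf")] * (2 * self.size)
        self.D[self.size:self.size + self.n] = dist
        for t in range(self.size - 1, 0, -1):
            self.D[t] = max(self.D[2 * t], self.D[2 * t + 1])

    def first_greater(self, start, w):
        """Smallest index i >= start with dist[i] > w, or None if there is none."""
        if start >= self.n:
            return None
        t = self.size + start
        # Phase 1: move to the next subtree on the right until its maximum exceeds w.
        while self.D[t] <= w:
            while t > 1 and t % 2 == 1:   # climb while t is a right child
                t //= 2
            if t == 1:                    # reached the root: nothing to the right
                return None
            t += 1                        # right sibling of a left child
        # Phase 2: descend, preferring the left child whenever it holds a heavy edge.
        while t < self.size:
            t = 2 * t if self.D[2 * t] > w else 2 * t + 1
        return t - self.size

# Hypothetical edge distances along one line, bottom to top.
dist = [2, 1, 3, 1, 5, 4, 2]
tree = MaxTree(dist)
```

Each query visits at most two nodes per tree level, which mirrors the logarithmic time per vertex stated above; the constant extra space per query corresponds to the single node pointer `t`.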
4. A PARALLEL AND A LINEAR TIME ALGORITHM TO DETERMINE OVERLAP COMPONENTS
We assume that a collection $C$ of subsets of $V$ is given. The basic strategy is as follows. We determine, for each $c \in C$, a set $\mathrm{Max}(c) \in C$ that overlaps with $c$ and which is of maximum cardinality. $\mathrm{Max}(c)$ is only defined if its size is at least as large as the size of $c$. The following result is essential to get all overlap components in an appropriate time and processor bound.
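LEMMA 8. If $|c| \le |d| \le |\mathrm{Max}(c)|$ and $c \cap d \neq \emptyset$ then $d$ overlaps with $c$ or with $\mathrm{Max}(c)$.
Proof. If $d$ does not overlap with $c$ then $c \subset d$. But then $d$ and $\mathrm{Max}(c)$ have a nonempty intersection. Therefore if $d$ and $\mathrm{Max}(c)$ do not overlap then $d \subseteq \mathrm{Max}(c)$. But then $c \subseteq \mathrm{Max}(c)$. This is a contradiction, because $c$ and $\mathrm{Max}(c)$ overlap.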
4.1. Determining the Overlap Components if Max Is Known
Define $C_x$ to be the set of $c \in C$, such that $x \in c$. Moreover, we assume that $C$ is sorted in increasing order with respect to size and each $C_x$ is sorted in the same increasing order with respect to size, i.e., $C_x = \{c_1^x, \ldots, c_k^x\}$ and, with $i < j$, $|c_i^x| \le |c_j^x|$. We discuss later how to get such a sorting in linear time.
We use these sortings to construct a graph $G_C$ with vertex set $C$ with an edge number that is not larger than the sum of $|V|$ and the sum of the sizes of $c \in C$. This graph should have the same connected components as the overlap components of $C$.
The edge set $E_C$ of $G_C$ consists of the edges $c_i^x c_{i+1}^x$ with the property that there is a $j \le i$ with $|\mathrm{Max}(c_j^x)| \ge |c_{i+1}^x|$.
PROPOSITION 2.
1. $|E_C| \le \sum_{c \in C} |c|$.
2. The connected components of $G_C := (C, E_C)$ are exactly the overlap components of $C$.
Proof of Proposition. It is easily seen that the size of $E_C$ is bounded by the sum of the sizes of the sets $C_x$. But this is also the sum of the sizes of $c$, $c \in C$.
To prove the second part, we have to show the following.
LEMMA 9.
1. If $c_i^x c_{i+1}^x \in E_C$ then $c_i^x$ and $c_{i+1}^x$ are in the same overlap component.
2. If $c$ and $d \in C$ overlap then they are in the same connected component of $G_C$.
Proof of Lemma. The first part of the lemma is proved as follows. If $c_i^x c_{i+1}^x \in E_C$ then there is a $j \le i$ with $|\mathrm{Max}(c_j^x)| \ge |c_{i+1}^x|$. Note that $|c_j^x| \le |c_i^x| \le |c_{i+1}^x| \le |\mathrm{Max}(c_j^x)|$. Since $c_j^x$ and $\mathrm{Max}(c_j^x)$ overlap and $c_i^x$ and $c_{i+1}^x$ overlap with $c_j^x$ or with $\mathrm{Max}(c_j^x)$, $c_i^x$ and $c_{i+1}^x$ are in the same overlap component.
To prove the second part of the lemma, we pick an $x \in c \cap d$. We assume w.l.o.g. that $c = c_i^x$, $d = c_j^x$, and $i < j$. Since $c$ and $d$ overlap, $|\mathrm{Max}(c)| = |\mathrm{Max}(c_i^x)| \ge |d| = |c_j^x|$. Therefore, for all $l$ with $i \le l < j$, there is an $l' \le l$ (that is, $i$) with $|\mathrm{Max}(c_{l'}^x)| \ge |c_{l+1}^x|$. That means that for all $l$ with $i \le l < j$, $c_l^x c_{l+1}^x \in E_C$. Therefore $c = c_i^x$ and $d = c_j^x$ are in the same connected component of $G_C$. This completes the proof of Proposition 2.
It remains to check the complexity of computing $E_C$. Let $n := |V| + |C|$ and $m := \sum_{c \in C} |c|$. The size of the input of $(V, C)$ is $n + m$.
PROPOSITION 3. $E_C$ can be computed by an EREW-PRAM in logarithmic time with a linear workload and therefore in linear time, provided $\mathrm{Max}(c)$, for each $c \in C$, and the sortings of $C$ and the sets $C_x$, $x \in V$, are known.
Proof of Proposition. For each $c_i^x$, we compute the maximum $|\mathrm{Max}(c_j^x)|$ with $j \le i$ and denote it by $\mathrm{MAX}(c_i^x)$. We get this in logarithmic time with a linear workload by parallel prefix computation (see for example [21]; compare also [27]). To check that $c_i^x c_{i+1}^x \in E_C$, one only has to check that $|c_{i+1}^x| \le \mathrm{MAX}(c_i^x)$.
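
The construction of $E_C$ from the sorted lists $C_x$ and a running prefix maximum can be sketched sequentially as follows. In this illustration $|\mathrm{Max}(c)|$ is computed by brute force, which is an assumption of the sketch and not the method of Section 4.3, and the family is assumed to consist of distinct sets. The value 0 encodes an undefined $\mathrm{Max}(c)$, and all function names are hypothetical.

```python
from itertools import combinations

def overlap(a, b):
    """Two sets overlap iff they intersect and neither contains the other."""
    return bool(a & b) and not a <= b and not b <= a

def components(k, edges):
    """Connected components of a graph on vertices 0..k-1 (sorted lists)."""
    parent = list(range(k))
    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i
    for a, b in edges:
        parent[find(a)] = find(b)
    groups = {}
    for i in range(k):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())

def sparse_overlap_edges(sets):
    """Edge set E_C built from the lists C_x; `sets` holds distinct frozensets."""
    k = len(sets)
    max_size = []   # |Max(c)| by brute force; 0 encodes "Max(c) undefined"
    for i in range(k):
        best = max((len(sets[j]) for j in range(k) if overlap(sets[i], sets[j])),
                   default=0)
        max_size.append(best if best >= len(sets[i]) else 0)
    order = sorted(range(k), key=lambda i: (len(sets[i]), i))  # one global size order
    edges = set()
    for x in set().union(*sets):
        c_x = [i for i in order if x in sets[i]]
        running = 0   # MAX(c_i^x): prefix maximum of |Max(c_j^x)| over j <= i
        for a, b in zip(c_x, c_x[1:]):
            running = max(running, max_size[a])
            if running >= len(sets[b]):
                edges.add((a, b))
    return edges

family = [frozenset(s) for s in ({1, 2}, {2, 3}, {3, 4}, {1, 2, 3, 4, 5}, {6, 7})]
sparse = components(len(family), sparse_overlap_edges(family))
dense = components(len(family),
                   [(i, j) for i, j in combinations(range(len(family)), 2)
                    if overlap(family[i], family[j])])
```

Here `sparse` and `dense` describe the same partition of the family, although the sparse graph only ever joins neighbours in some list $C_x$, which is what keeps the edge count within the bound of Proposition 2.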
COROLLARY 1. The overlap components of $C$ can be computed in linear time sequentially and in parallel by a CRCW-PRAM in logarithmic time with a linear processor number, provided Max and the sortings of $C$ and the sets $C_x$ are given.
Proof. This follows immediately from Proposition 3 and the fact that connected components can be computed in the time bounds as mentioned in the corollary (for the sequential case see any textbook on algorithms, e.g., [9], and for the parallel case see [31]).
4.2. The Tree Structure of Overlap Components
First we show the following lemma that has been mentioned before.
LEMMA 10. $(\mathrm{Overlap}(C), \subset)$ is a tree-like ordering. For each overlap component $C_1$ of $C$, there is a unique overlap component $C_2$, such that $C_1 \subset C_2$, and for no overlap component $C_3$, $C_1 \subset C_3 \subset C_2$, if and only if there is an overlap component $C_1'$ with $C_1 \subset C_1'$.
Proof of Lemma. First observe that if $d$ overlaps with $\bigcup_{c \in C_1} c$ then it overlaps with some $c$ in the overlap component $C_1$. Therefore for any two overlap components, say $C_1$ and $C_2$, either $\bigcup_{c \in C_1} c$ and $\bigcup_{c \in C_2} c$ are disjoint or they are comparable with respect to the subset relation. Note that if $C_1 \subset C_2$ then $\bigcup_{c \in C_1} c \subset \bigcup_{c \in C_2} c$. Also, the converse is true, for the following reason: Suppose $\bigcup_{c \in C_1} c \subset \bigcup_{c \in C_2} c$. Since no $c \in C_2$ overlaps with $d_1 := \bigcup_{c \in C_1} c$, there are the possibilities that all $c \in C_2$ are disjoint with $d_1$ or there is a $c \in C_2$ that contains $d_1$ as a subset. The first possibility is impossible, because $d_1 = \bigcup_{c \in C_1} c \subset \bigcup_{c \in C_2} c$. Therefore one gets a unique "minimal" component $C_2$ that "contains" $C_1$ if there is a component $C_1'$ that contains $C_1$.
The component $C_2$ as mentioned in the last lemma is also called the parent component of $C_1$. The tree that corresponds to the tree-like ordering $\subset$ on the overlap components of $C$ is called the overlap tree of $C$ and is denoted by $T_C$.
PROPOSITION 4. The overlap tree of $C$ can be determined in constant time by a CRCW-PRAM with a linear number of processors, provided the overlap components of $C$ and the sortings of $C$ and the sets $C_x$ with respect to the sizes are known. That means the overlap tree of $C$ can be determined in linear time sequentially.
Proof of Proposition. Let $c_i^x$ be defined as above. We have to proceed as follows. If $c_i^x$ and $c_{i+1}^x$ are in different overlap components, say $C_1$ and $C_2$, then we set $\mathrm{parent}(C_1) := C_2$. Note that $c_{i+1}^x$ is the smallest $c \in C$ that contains $\bigcup_{c \in C_1} c$, and therefore must be in the "next larger" overlap component that contains $\bigcup_{c \in C_1} c$ as a subset. Therefore $C_2$ as determined above is really the parent component of $C_1$. There might be a writing conflict caused by different $c_i^x \in C_1$. But all these processors write the same. The number of processors is linear, and the parallel time is constant.
4.3. Determining Max
First we describe an algorithm that works in $O(n^2)$ time. Afterwards we transform it into a linear time algorithm that can be parallelized.
First we sort $C$ in increasing order with respect to size into a sequence $c_1, \ldots, c_k$. This can be done in linear time by bucket sorting (see for example [9]). This can also be done in logarithmic time with a linear processor number by an EREW-PRAM [8].
Next we sort $V$ lexicographically with respect to $C_x$ where the largest $c \in C_x$ has the highest priority, i.e., if $x \notin c_i$, $y \in c_i$ and, for all $j > i$, either $x$ and $y$ are in $c_j$ or $x$ and $y$ are not in $c_j$ then $x < y$. This again can be done in linear time. It also can be done in logarithmic time with a linear processor number by a CRCW-PRAM. Note that one comparison needs one time unit by a CRCW-PRAM. Combining this with the algorithm of [8], one gets a logarithmic time bound by a CRCW-PRAM.
To determine $\mathrm{Max}(c)$, we use the following observation.
LEMMA 11. Let $c_i$ be the element of $C$ with the highest index $i$ that overlaps with $c \in C$. Then for all $x \in c \setminus c_i$ and all $y \in c \cap c_i$, $x < y$.
Proof. Note that for all $j > i$, $c \subset c_j$ or $c \cap c_j = \emptyset$. Therefore all $x \in c$ do not distinguish in the membership of any $c_j$, $j > i$. Therefore if $x \in c \setminus c_i$ and $y \in c \cap c_i$, $x < y$.
Therefore one can determine $\mathrm{Max}(c)$ as follows. We determine the smallest $x \in c$, called $\min(c)$, and the largest $x \in c$, called $\max(c)$. $\mathrm{Max}(c)$ is the $c_i$ with the highest index containing $\max(c)$ and not containing $\min(c)$. Finally, we have to check that $|\mathrm{Max}(c)| \ge |c|$. Obviously this can be done in quadratic time.
Note that $\min(c)$ and $\max(c)$ can be determined in linear time and also in parallel by an EREW-PRAM in logarithmic time with a linear workload. To determine $\mathrm{Max}(c)$, we determine the lowest index $i'$, such that $\min(c)$ and $\max(c)$ do not distinguish in the membership in $c_j$, $j \ge i'$.
We assume that $V$ is sorted to a sequence $v_1, \ldots, v_l$ with $v_i < v_j$, $i < j$. Define the barrier of $i$ to be $(i, b(i))$ where the height $b(i)$ is defined to be
\[ b(i) := \max\{\, j \mid v_i \text{ and } v_{i+1} \text{ distinguish the membership in } c_j \,\}. \]
The barriers can be determined in linear time sequentially and in constant time with a linear processor number by a CRCW-PRAM and therefore in logarithmic time with a linear processor number by an EREW-PRAM.
LEMMA 12. Suppose $i < j$. Then the maximum $i'$, such that $v_i$ and $v_j$ distinguish in the membership of $c_{i'}$, is the maximum $b(j')$ with $i \le j' < j$.
Proof. Let $q$ be the maximum $b(j')$, such that $i \le j' < j$. Note that all $v_{j'}$ with $i \le j' < j$ do not distinguish in the membership of $c_{i'}$, $i' > q$. Moreover, let $q'$ be the maximum index, such that $v_i$ and $v_j$ distinguish in the membership of $c_{q'}$. Suppose $x$ distinguishes in the membership of $c_{q''}$, $q'' > q'$, with $v_i$ and therefore also with $v_j$. We assume $q''$ is a maximum index. Then either $x < v_i$ (if $v_i \in c_{q''}$ and $x \notin c_{q''}$) or $x > v_j$. That means the index of $x$ is not between $i$ and $j$. Therefore all $v_{j'}$, $i \le j' < j$, do not distinguish in the membership in $c_{q''}$, $q'' > q'$. This proves the lemma.
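
The lexicographic sorting, the barrier heights, and the statement of Lemma 12 can be illustrated with a small sequential sketch. It assumes the sets are given as a Python list already sorted by size, so that list position $j-1$ plays the role of $c_j$, and it uses height 0 for two neighbours that never differ in membership; that convention, the function names, and the sample family are assumptions of this example.

```python
def highest_difference(sets, u, v):
    """Largest j (1-based) such that u and v differ in the membership of c_j, or 0."""
    return max((j + 1 for j, c in enumerate(sets) if (u in c) != (v in c)), default=0)

def sort_and_barriers(sets, universe):
    """Sort the universe lexicographically by membership and compute barrier heights.

    Membership in the set with the highest index is compared first, and
    non-members precede members. b[i] is the height of the barrier between
    the neighbours order[i] and order[i + 1].
    """
    def key(v):
        return tuple(v in c for c in reversed(sets))
    order = sorted(universe, key=key)
    b = [highest_difference(sets, order[i], order[i + 1])
         for i in range(len(order) - 1)]
    return order, b

# Hypothetical family c_1, ..., c_5, sorted by increasing size.
sets = [{1}, {1, 2}, {4, 5}, {3, 4, 5}, {1, 2, 3, 4}]
order, b = sort_and_barriers(sets, range(1, 7))
```

For any two positions `i < j` in `order`, the highest set index separating the two elements equals `max(b[i:j])`, which is the range-maximum form in which Lemma 12 is used to reduce the computation of Max to least common ancestor queries.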
Next we build up a tree structure with the elements of $V$ as leaves and the barriers $(i, b(i))$ as inner nodes.
LEMMA 13. There are no two $i < j$ with $b(i) = b(j)$ and for all $i'$ with $i < i' < j$, $b(i') < b(i)$.
Proof. Let $q = b(i) = b(j)$. Then, since $v_i < v_{i+1}$ and $v_i$ and $v_{i+1}$ do not distinguish in the membership for $c_p$, $p > q$, $v_i \notin c_q$ and $v_{i+1} \in c_q$. For the same reason $v_j \notin c_q$ and $v_{j+1} \in c_q$. Since all barriers between $i$ and $j$ are of smaller height than $q$, $v_{i+1}$ and $v_j$ do not distinguish in the membership in $c_q$. This is a contradiction.
For each $v_i$, let $\mathrm{left}(v_i) := (i-1, b(i-1))$ and $\mathrm{right}(v_i) := (i, b(i))$. For a barrier $(i, b(i))$, let $\mathrm{left}(i, b(i))$ be the barrier $(j, b(j))$ of maximum index $j < i$ with $b(i) \le b(j)$ and $\mathrm{right}(i, b(i))$ be the $(j, b(j))$ of minimum index $j > i$, such that $b(j) \ge b(i)$. Note that, because of Lemma 13, the heights $b(j)$ of $\mathrm{left}(i, b(i))$ and of $\mathrm{right}(i, b(i))$ are always greater than the height $b(i)$ of $(i, b(i))$.
LEMMA 14 [34]. The functions left and right can be determined in linear time and in logarithmic time with a linear workload by a CREW-PRAM.
The tree $T$ with parent function Pa is defined as follows: $\mathrm{Pa}(t)$ is the barrier $\mathrm{left}(t)$ or $\mathrm{right}(t)$ of smaller height and if $\mathrm{left}(t)$ or $\mathrm{right}(t)$ is not defined then $\mathrm{Pa}(t)$ is that of $\mathrm{left}(t)$ or $\mathrm{right}(t)$ that is defined. $\mathrm{left}(t)$ and $\mathrm{right}(t)$ have different heights because of Lemma 13.
PROPOSITION 5. The barrier $(p, b(p))$ of highest height $b(p)$ with $i \le p < j$ is the least common ancestor of $v_i$ and $v_j$ in $T$.
Proof. For technical reasons, let $\mathrm{left}(t) := (0, \infty)$ if $\mathrm{left}(t)$ is not defined and $\mathrm{right}(t) := (k, \infty)$ if $\mathrm{right}(t)$ is not defined ($k$ is the largest index for the elements $v \in V$).
One gets a canonical ordering of $V \cup B$ ($B$ is the set of barriers) by the sequence
\[ (0, \infty),\ v_1,\ (1, b(1)),\ v_2,\ (2, b(2)),\ \ldots,\ (k-1, b(k-1)),\ v_k,\ (k, \infty). \]
We say that $v_n$ is between $b(i)$ and $b(j)$ if it is between $(i, b(i))$ and $(j, b(j))$ in the canonical ordering, i.e., $i < n \le j$. Analogously, we say that a barrier $(n, b(n))$ is between $b(i)$ and $b(j)$ if $i < n < j$.
LEMMA 15. $(i, b(i))$ is an ancestor of $v_j$ if and only if $v_j$ is between $\mathrm{left}(i, b(i))$ and $\mathrm{right}(i, b(i))$.
Proof of Lemma. Suppose $v_j$ is between $\mathrm{left}(i, b(i))$ and $\mathrm{right}(i, b(i))$. Then w.l.o.g. we may assume that $v_j$ is between $\mathrm{left}(i, b(i))$ and $(i, b(i))$. Note that always the height of $\mathrm{Pa}(t)$ is greater than the height of $t$. By induction one can easily show that if $t$ is an ancestor of $v_j$ and the height of $t$ is smaller than the height $b(i)$ then $t$ is between $\mathrm{left}(i, b(i))$ and $(i, b(i))$ (if the height of $\mathrm{Pa}(t)$ is less than $b(i)$ then $\mathrm{Pa}(t)$ is between $\mathrm{left}(i, b(i))$ and $t$ if $\mathrm{Pa}(t) = \mathrm{left}(t)$ and between $t$ and $(i, b(i))$ otherwise). Consider now the ancestor $t$ of $v_j$ with the highest height smaller than $b(i)$. Then $\mathrm{left}(t) = \mathrm{left}(i, b(i))$ and $\mathrm{right}(t) = (i, b(i))$ (it cannot be a barrier of smaller height). Then clearly $\mathrm{Pa}(t) = (i, b(i))$, and therefore $(i, b(i))$ is an ancestor of $v_j$.
Vice versa, let $(i, b(i))$ be an ancestor of $v_j$. Consider any ancestor $t$ of $v_j$. It is easily seen that if $\mathrm{Pa}(t) = \mathrm{left}(t)$ then $\mathrm{right}(\mathrm{Pa}(t)) = \mathrm{right}(t)$ and $\mathrm{left}(\mathrm{Pa}(t))$ is clearly left from $\mathrm{left}(t)$. In any case, if $v_j$ is between $\mathrm{left}(t)$ and $\mathrm{right}(t)$ then $v_j$ is also between $\mathrm{left}(\mathrm{Pa}(t))$ and $\mathrm{right}(\mathrm{Pa}(t))$. This proves the other direction of the lemma.
The rest of the proof of the proposition works as follows. Consider the barrier $t = (q, b(q))$ of highest height between $v_i$ and $v_j$. Then $v_i$ is between $\mathrm{left}(t)$ and $t$, and $v_j$ is between $t$ and $\mathrm{right}(t)$, and therefore both are between $\mathrm{left}(t)$ and $\mathrm{right}(t)$; i.e., $t$ is an ancestor of $v_i$ and $v_j$. Suppose $t'$ is a barrier of smaller height than $t$. Then $\mathrm{left}(t')$ is right from $t$ or equals $t$ or $\mathrm{right}(t')$ is left from $t$ or equals $t$. If $t$ were not the least common ancestor of $v_i$ and $v_j$ then there would be a common ancestor $t'$ of smaller height than $t$. This is a contradiction. This completes the proof of Proposition 5.
This proposition reduces the problem of determining Max to the problem of determining the least common ancestor. We have to determine a linear number of least common ancestors of $\min(c)$ and $\max(c)$ and we have a tree of linear size. This allows us to determine $\mathrm{Max}(c)$, for all $c$ simultaneously, by an EREW-PRAM in logarithmic time with a linear workload [31], and therefore sequentially in linear time.
PROPOSITION 6. Max can be determined by an EREW-PRAM in logarithmic time with a linear workload with respect to the size of the input $(V, C)$ and therefore sequentially in linear time.
The overall result of this section is therefore the following.
THEOREM 2. The overlap components can be determined in linear time sequentially and by a CRCW-PRAM in logarithmic time with a linear number of processors.
5. EFFICIENT PARALLEL SPLIT DECOMPOSITION ALGORITHM AND ITS TRANSFORMATION INTO A LINEAR TIME ALGORITHM
5.1. Formulation of the Problem
A split of the graph $G = (V, E)$ is a partition of V into two subsets $V_1$ and $V_2$, each with at least two elements, such that all vertices in $V_1$ that have neighbors in $V_2$ have the same neighbors in $V_2$.
Split decomposition is the following recursive procedure.
• If G has a split into subsets $V_1$ and $V_2$, we apply split decomposition to graphs $G_1$, $G_2$ that are defined as follows: The vertex sets of $G_1$ and $G_2$ are $V_1 \cup \{v_1\}$ and $V_2 \cup \{v_2\}$, respectively. The additional vertices $v_1$ and $v_2$ are called virtual vertices. The edge set of $G_1$ ($G_2$) consists of the edges of G restricted to $V_1$ ($V_2$) and the edges $w v_1$ ($w v_2$), such that w is in the neighborhood of $V_2$ ($V_1$) in $V_1$ ($V_2$).
• If G does not have a split then G is called prime.
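The definition of a split and of the two graphs $G_1$, $G_2$ can be illustrated by a small sketch. It is only a direct transcription of the definition, not the parallel algorithm of this paper; the adjacency representation (a dictionary of neighbor sets) and the names `is_split`, `split_graphs`, `virt1`, and `virt2` are assumptions made for the illustration.

```python
def neighbors_in(adj, part, other):
    """Map each vertex of `part` that has neighbors in `other` to that neighbor set."""
    result = {}
    for v in part:
        nb = adj[v] & other
        if nb:
            result[v] = frozenset(nb)
    return result


def is_split(adj, v1, v2):
    """Check the split condition for the partition (v1, v2) of the vertex set."""
    v1, v2 = set(v1), set(v2)
    if len(v1) < 2 or len(v2) < 2:
        return False
    if v1 & v2 or (v1 | v2) != set(adj):
        return False
    # All vertices of v1 with neighbors in v2 must see the same subset of v2.
    return len(set(neighbors_in(adj, v1, v2).values())) <= 1


def split_graphs(adj, v1, v2, virt1="v1*", virt2="v2*"):
    """Build G1 and G2 for a split (v1, v2); virt1 and virt2 are the virtual vertices."""
    v1, v2 = set(v1), set(v2)
    u1 = set(neighbors_in(adj, v1, v2))  # neighborhood of V2 in V1
    u2 = set(neighbors_in(adj, v2, v1))  # neighborhood of V1 in V2
    g1 = {x: adj[x] & v1 for x in v1}
    g2 = {x: adj[x] & v2 for x in v2}
    g1[virt1] = set(u1)
    g2[virt2] = set(u2)
    for w in u1:
        g1[w] = g1[w] | {virt1}
    for w in u2:
        g2[w] = g2[w] | {virt2}
    return g1, g2


# Path a - b - c - d: ({a, b}, {c, d}) is a split, ({a, c}, {b, d}) is not.
path = {"a": {"b"}, "b": {"a", "c"}, "c": {"b", "d"}, "d": {"c"}}
g1, g2 = split_graphs(path, {"a", "b"}, {"c", "d"})
```

In the example, the virtual vertex of $G_1$ is joined to b and the virtual vertex of $G_2$ is joined to c, so both split components are paths on three vertices.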
We call the final graphs created by the split decomposition of G split components of G. Note that cliques and stars (one center vertex joined by an edge with each vertex of an independent set) are not uniquely split decomposable. But Cunningham proved the following [12].
THEOREM 3. Each connected graph has a unique split decomposition into prime graphs, stars, and cliques with a minimum number of split components.
Modules are splits of special type. By a module of $G = (V, E)$, we mean a subset $V'$ of V such that for $y \in V \setminus V'$ and $u, v \in V'$, both uy and vy are in E or neither uy nor vy is in E.
A graph $G = (V, E)$ with more than two vertices is called modularly prime if the only modules are V and the subsets of V that contain exactly one vertex.
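As an illustration of these two definitions, the following brute-force sketch tests the module condition and modular primality on small graphs. It is exponential in the number of vertices and is meant only to make the definitions concrete; the function names and the dictionary-of-sets graph representation are assumptions of the sketch.

```python
from itertools import combinations


def is_module(adj, subset):
    """True if every vertex outside `subset` sees all of `subset` or none of it."""
    subset = set(subset)
    for y in set(adj) - subset:
        hit = adj[y] & subset
        if hit and hit != subset:
            return False
    return True


def is_modularly_prime(adj):
    """True if the graph has more than two vertices and only trivial modules."""
    vertices = list(adj)
    if len(vertices) <= 2:
        return False
    # Nontrivial modules have between 2 and |V| - 1 vertices.
    for size in range(2, len(vertices)):
        for cand in combinations(vertices, size):
            if is_module(adj, cand):
                return False
    return True


# The path on four vertices is modularly prime; the 4-cycle is not,
# because two opposite vertices have the same neighbors.
p4 = {"a": {"b"}, "b": {"a", "c"}, "c": {"b", "d"}, "d": {"c"}}
c4 = {"a": {"b", "d"}, "b": {"a", "c"}, "c": {"b", "d"}, "d": {"a", "c"}}
```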
A module $X \subseteq V$ is called a prime module if the graph that results from the identification of all vertices that are in the same maximal submodule of X is modularly prime. A module X is degenerated if X can be partitioned into submodules $X_1, \ldots, X_k$, such that either for all distinct $X_i, X_j$, all vertices of $X_i$ are adjacent to all vertices of $X_j$, or for all distinct $X_i, X_j$, no vertex of $X_i$ is adjacent to any vertex of $X_j$. If all submodules $X_i, X_j$ are pairwise adjacent, X is called positively degenerated; otherwise the degenerated module X is called negatively degenerated. Note that two modules overlap, i.e., the intersection and both differences are not empty, only if they are both degenerated. We call a degenerated module overlap-free if it does not overlap with another module. For each overlap-free module $M \ne V$, there exists exactly one minimal overlap-free module $M'$ such that M is a proper subset of $M'$. $M'$ is also called the parent module of M. Note that with $P(M)$ = parent module of M, a parent function of a root-directed tree $T_G$ with V as its root is defined. We also call $T_G$ the modular tree of G.
5.2. The Structure of Splits
We assume that $G = (V, E)$ is connected. Let $T_1$ be a spanning tree of G and let $(u_1, \ldots, u_n)$ be the postorder enumeration of V. For each $u_i \ne u_n$, let $\mathrm{Par}(u_i)$ be the neighbor $u_j$ of $u_i$ with the maximum index j. Let $T_2$ be the spanning tree of G with parent function Par. Let $L_i := \{x \in V \mid \mathrm{Par}(x) = u_i\}$ and $L_{n+1} := \{u_n\}$. Call $L_i$ a layer of G.
Let $(V_1, V_2)$ be a split of G and $u_n \in V_2$. Let $U_1$ be the set of neighbors of $V_2$ in $V_1$.
LEMMA 16. All vertices of $U_1$ are in the same layer.
Proof. Let $U_2$ be the set of neighbors of $V_1$ in $V_2$. Note that the edges between $V_1$ and $V_2$ are exactly the pairs of vertices $v_1 v_2$ with $v_1 \in U_1$ and $v_2 \in U_2$. Pick the vertex $u_i \in U_2$ with the largest index i. No $u_j$ of larger index than i is in $V_1 \cup U_2$, because $u_i$ has been chosen as the element of $U_2$ with maximum index and the unique path on $T_1$ from any $u_l \in V_1$ to $u_n$ must pass a vertex $u_{l'}$ in $U_2$ of larger index $l'$ than $u_l$ (because it is an ancestor of $u_l$). This means that all $v \in U_1$ are in the neighborhood of $u_i$ but not in the neighborhood of any $u_j$, $j > i$. Therefore $U_1$ is a subset of the layer $L_i$.
Let $W_1 = V_1 \setminus U_1$. Suppose $U_1 \subseteq L_i$.
LEMMA 17. $U_1$ is a module of G restricted to $V \setminus W_1$.
Proof. $W_1$ is the set of all vertices in $V_1$ that have no neighbors in $V_2$. Therefore if we delete the vertices of $W_1$ from G, then of $V_1$ only the vertices that have neighbors in $V_2$ remain. This is the set $U_1$, and since all vertices in $V_1$ that have neighbors in $V_2$ have the same neighbors in $V_2$, $U_1$ is a module in G restricted to $V \setminus W_1$.
COROLLARY 2. $U_1$ is a module in G restricted to $\bigcup_{j \ge i} L_j$.
LEMMA 18. $W_1$ is a union of connected components of G restricted to $\bigcup_{j < i} L_j$ if $U_1 \subseteq L_i$.
Proof. Note that all neighbors of $W_1$ that are not in $W_1$ are in $U_1$. That means also that all neighbors of $W_1$ are in $V_1$. As in the proof of Lemma 17, we use the fact that $j < i$ for all $u_j \in V_1$, and therefore all $v \in W_1$ are in an $L_j$ with $j < i$. On the other hand, to come through a path from a vertex in $W_1$ to a vertex not in $W_1$ one must pass $U_1$ and therefore vertices in $L_i$. Therefore $W_1$ is a union of connected components of $\bigcup_{j < i} L_j$.
COROLLARY 3. Suppose C is a connected component of G restricted to $\bigcup_{j < i} L_j$ that is not in $W_1$. Then either $U_1 \cap N(C) = \emptyset$ or $U_1 \subseteq N(C)$. In the second case, all vertices in $U_1$ have the same neighbors in C, i.e., for each vertex $x \in C$, $U_1 \subseteq N(x)$ or $U_1 \cap N(x) = \emptyset$.
THEOREM 4. A partition $(V_1, V_2)$ of the vertex set V of G with $u_1, \ldots, u_n$ as defined before and $u_n \in V_2$ is a split if and only if for the maximum layer $L_i$ with $V_1 \cap L_i \ne \emptyset$,
1. $V_1 \cap L_i$ is a module in G restricted to $\bigcup_{j \ge i} L_j$,
2. $W_1 := V_1 \cap \bigcup_{j < i} L_j$ is a union of connected components of G restricted to $\bigcup_{j < i} L_j$,
3. the set of neighbors of $W_1$ that are not in $W_1$ is a subset of $U_1 := V_1 \cap L_i$, and
4. for all $x \in \bigcup_{j < i} L_j \setminus V_1$, $N(x) \cap V_1 \cap L_i = \emptyset$ or $V_1 \cap L_i \subseteq N(x)$.
Proof. The direction from left to right follows from the previous considerations. The other direction can be seen as follows. First observe that all vertices in $V_1$ that have neighbors in $V_2 = V \setminus V_1$ are not in $W_1$ and therefore in $U_1$. Next we use the fact that $U_1$ is a module in G restricted to $\bigcup_{j \ge i} L_j$. Therefore all vertices in $U_1$ have the same neighbors in $\bigcup_{j \ge i} L_j$ that are not in $V_1$ and therefore in $V_2$. Finally, by the last item, all vertices in $U_1$ have the same neighbors in $V_2$ that are in a layer $L_j$, $j < i$.
We say that $U_1$ represents a split if there is a split $(V_1, V_2)$, such that $u_n \in V_2$ and $U_1$ is the set of vertices in $V_1$ that have neighbors in $V_2$.
From the last theorem we get the following.
THEOREM 5. $U_1$ represents a split iff there is an i with $U_1 \subset L_i$, such that
1. $U_1$ is a module in $\bigcup_{j \ge i} L_j$,
2. there is a union $W_1$ of connected components of $\bigcup_{j < i} L_j$, such that all neighbors of $W_1$ that are not in $W_1$ are in $U_1$ and the vertices of $\bigcup_{j < i} L_j \setminus W_1$ are adjacent either to all vertices of $U_1$ or to no vertex of $U_1$.
5.2.1. Stars
Stars come up in the following situation: V is decomposed into $V_1, \ldots, V_k$. For each $V_i$, there is a subset $U_i$ such that for $i \ne 1$, all vertices in $U_i$ are adjacent to all vertices in $U_1$.
First assume that $u_n$ is in $V_1$. Then $U_2, \ldots, U_k$ all represent a split and the union of $U_2, \ldots, U_k$ represents a split. Therefore all $U_2, \ldots, U_k$ are in the same layer $L_i$ and form a (not necessarily overlap-free) negatively degenerated module of $L_i$. We say that $\bigcup_{j=2}^{k} U_j$ represents a star of the first kind (see also Fig. 2).
FIG. 2. Star of the first kind.
Next assume that $u_n \in V_2$. Then $(\bigcup_{j=3}^{k} V_j, V_1 \cup V_2)$ and $(V_1 \cup \bigcup_{j \ge 3} V_j, V_2)$ are splits. That means $U_1$ represents a split, and all $U_j$ with $j \ge 3$ and $\bigcup_{j \ge 3} U_j$ represent splits. We assume that $U_1$ is in layer $L_i$. The $u_{i'}$ of largest index that is adjacent to some vertex in $U_j$, $j \ge 3$, is in $U_1$. Therefore all $U_j$, $j \ge 3$, are in one layer $L_{i'}$ with $i' < i$. The $U_j$, $j \ge 3$, form a negatively degenerated module in $L_{i'}$. Moreover, all neighbors of the $U_j$, $j \ge 3$, that are in an $L_{i''}$, $i'' \ge i'$, are in $U_1$. We say that $U_1$ and $\bigcup_{j=3}^{k} U_j$ together represent a star of the second kind (see also Fig. 3). $U_1$ is also called the center of the star of the second kind it represents. $U_3, \ldots, U_k$ are called the child representatives of $U_1$.
5.2.2. Cliques
Cliques come up in the following situation: V is decomposed into $V_1, \ldots, V_k$. For each $V_i$, there is a subset $U_i$ such that for $i \ne j$, all vertices in $U_i$ are adjacent to all vertices in $U_j$. Without loss of generality, we assume that $u_n \in V_1$. Then $(\bigcup_{j=2}^{k} V_j, V_1)$ and each $(V_j, \bigcup_{j' \ne j} V_{j'})$ with $j \ne 1$ are splits, and therefore $\bigcup_{j=2}^{k} U_j$ and each $U_j$, $j \ne 1$, represent a split. All $U_j$, $j \ge 2$, are in the same layer $L_i$. $\bigcup_{j \ge 2} U_j$ forms a positively degenerated module of $L_i$ with $U_2, \ldots, U_k$ as submodules (not necessarily overlap-free modules). We say that $\bigcup_{j=2}^{k} U_j$ represents a clique.
5.3. An Outline of the Algorithm
The basic strategy is as follows.
1. We first compute the sets $U_1$ that represent a split.
2. We extract stars and cliques.
3. We determine the prime components.
FIG. 3. Star of the second kind.
5.3.1. The Computation of the Sets Representing a Split
First we compute, for each i, the set $\mathcal{C}_i$ of connected components of $\bigcup_{j < i} L_j$. A compact representation of the sets $\mathcal{C}_i$ will be discussed in the next subsection. Let $\mathcal{D}_i$ be the set of those $C \in \mathcal{C}_i$ such that all neighbors of C in $\bigcup_{j \ge i} L_j$ are in $L_i$.
For each $L_i$, we compute the set $M_i$ of overlap-free modules of $L_i$.
Let $S_i = M_i \cup \{N(C) \cap L_i \mid C \in \mathcal{C}_i\} \cup \{N(x) \cap L_i \mid x \in \bigcup_{j \ne i} L_j\}$. Note that $S_i$ is a multiset.
LEMMA 19. Let $(V_1, V_2)$ be a split of G, $u_n \in V_2$, and $V_2$ be connected. Let $U_1$ be the set of neighbors of $V_2$ in $V_1$ and $U_1 \subseteq L_i$. Then, for each $X \in S_i$, X is a subset of $U_1$, $U_1$ is a subset of X, or X and $U_1$ are disjoint. Moreover, if X is a prime module then $X \subseteq U_1$ or $X \cap U_1 = \emptyset$ or there is a child module $X'$ of X such that $U_1 \subseteq X'$, and if X is the neighborhood of some $x \in L_j$, $j > i$, then either $U_1 \cap X = \emptyset$ or $U_1 \subseteq X$.
Proof. Suppose $X = N(x) \cap L_i$ and $x \in L_j$, $j < i$, and $x \in V_1$. Then $x \in V_1 \setminus U_1$ and $N(x) \cap L_i \subseteq U_1$. Otherwise x is joined by an edge with all vertices in $U_1$ or with no vertex in $U_1$, and the vertices in $U_1$ are the only vertices of $V_1$ to which x is joined by edges. Therefore $U_1 \subseteq X$ or $U_1 \cap X = \emptyset$. In any case, if $x \in L_j$, $j > i$, then $x \notin V_1$, and therefore either $U_1 \subseteq X$ or $X \cap U_1 = \emptyset$.
Note that $U_1$ is a module of $L_i$ and therefore does not overlap with any overlap-free module of $L_i$. Therefore $U_1$ is either a subset of a degenerated module that does not overlap with its child modules or it is a prime module. Therefore if X is a prime module, $U_1 \cap X = \emptyset$ or $X \subseteq U_1$ or $U_1$ is a subset of a child module of X.
Finally, let X be the neighborhood of a connected component of $\bigcup_{j < i} L_j$ in $L_i$. Then X is the union of some $N(x) \cap L_i$, $x \in L_j$, $j < i$, and therefore $X \cap U_1 = \emptyset$ or X and $U_1$ are comparable with respect to the subset relation.
We continue as follows.
• We compute the overlap components of $S_i$. Note that it is possible that $A, B \in S_i$ which represent the same set might be in different overlap components. This can happen if $\{A\}$ and $\{B\}$ are one-element overlap components. In that case, we consider the overlap components as equivalent but not equal.
• If $x \in C \in \mathcal{C}_i$ then the overlap components of $N(x) \cap L_i$ and $N(C) \cap L_i$ and all of the overlap components in between are combined into one component.
• If an overlap component A, i.e., the union of all sets of A, is contained in a prime module $X'$ but not contained in a child module of $X'$ then A and the overlap component $A'$ containing $X'$ are unified into one component.
• If a component A contains the neighborhood of some $x \in L_j$, $j > i$, then all components containing A are combined into one component, and if the neighborhood of x is not the only element of A then we put A also into this component. These components are marked as bad components.
• If, for $C \in \mathcal{C}_i$, $N(C) \setminus C$ contains vertices in some $L_j$, $j > i$, then the component A containing $N(C) \cap L_i$ and all its ancestor components (but not components equivalent to A) are unified into one component, and this component is marked as a bad component.
Components resulting from this procedure are called clusters. The subset relation on clusters is defined as the subset relation on overlap components.
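The first step, the computation of overlap components of a family of sets, can be illustrated by the following naive sketch. It compares all pairs of sets and therefore takes quadratic time; it is not the linear time procedure of the previous section, and the names `overlaps` and `overlap_components` are assumptions of the sketch. Components are returned as lists of positions in the input family, so that two equal sets stay distinguishable.

```python
def overlaps(a, b):
    """Two sets overlap if the intersection and both differences are nonempty."""
    return bool(a & b) and bool(a - b) and bool(b - a)


def overlap_components(sets):
    """Group the indices of `sets` into components of the overlap relation."""
    sets = [frozenset(s) for s in sets]
    parent = list(range(len(sets)))

    def find(i):
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(sets)):
        for j in range(i + 1, len(sets)):
            if overlaps(sets[i], sets[j]):
                parent[find(i)] = find(j)

    groups = {}
    for i in range(len(sets)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


family = [{1, 2}, {2, 3}, {3, 4}, {5, 6}, {5, 6}, {1, 2, 3, 4}]
components = overlap_components(family)
```

In the example the first three sets form one overlap component, while the two copies of {5, 6} end up in different one-element components; this is the situation in which the text treats components as equivalent but not equal.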
For any cluster A, let $L_A$ be the set of connected components C of $\bigcup_{j < i} L_j$ such that the cluster which contains $N(C) \cap L_i$ is a subcluster of A or A itself. Let $U_A$ be the union of all sets in A.
LEMMA 20. If A is a cluster then $U_A$ represents a split if and only if A is not a bad component.
Proof. First we show the following.
SUBLEMMA 1. If A is a cluster that is not a bad component then $U_A$ is a module of $\bigcup_{j \ge i} L_j$.
Proof of Sublemma. First we have to show that $U_A$ is a module in $L_i$. By construction, each $U_A$ does not overlap with any overlap-free module of $L_i$. Next suppose X is a prime module and $U_A \subset X$. It cannot be that $U_A$ is not a subset of a child module of X. Otherwise X and the sets in A would have been unified into one cluster. Therefore $U_A$ is a module in $L_i$.
To show the sublemma we have to show that for each $x \in L_j$, $j > i$, either all vertices in $U_A$ or no vertex in $U_A$ is a neighbor of x. Suppose there is a vertex $x \in L_j$, $j > i$, that is a neighbor of some but not of all the vertices of $U_A$. Then $N(x) \cap L_i$ overlaps with some set in A or is a subset of all sets in A. Therefore A would be a bad component.
To show the lemma, we first have to construct a split if $U_A$ satisfies the conditions as stated above. Let $V_1$ be the union of $U_A$ and all connected components C of $\bigcup_{j < i} L_j$ with $N(C) \cap L_i \in A$ or $N(C) \cap L_i$ in a subcluster of A. Since A is not a bad component, no such component has neighbors in some $L_j$, $j > i$, and $U_A$ is a module in $\bigcup_{j \ge i} L_j$. The only vertices in $V_1$ that have neighbors outside $V_1$ are therefore the vertices in $U_A$. Suppose $x \in V \setminus V_1$. If $x \in L_j$, $j \ge i$, then it is adjacent to all vertices of $U_A$ or to no vertex of $U_A$, since $U_A$ is a module in $\bigcup_{j \ge i} L_j$. If $x \in L_j$, $j < i$, then $N(x) \cap L_i$ is not in A and not in a subcluster of A, and therefore there are only the alternatives that $U_A \subseteq N(x)$ or $U_A \cap N(x) = \emptyset$.
Vice versa, suppose $U_A$ represents a split, say $(V_1, V_2)$. Then $V_1 \setminus U_A$ consists of connected components C of $\bigcup_{j < i} L_j$. If $C \subset V_1$ then all of its neighbors in $L_j$, $j \ge i$, are in $U_A$ and therefore in $L_i$. The neighborhoods of these components C are therefore in A or in a subcluster of A. Moreover, they do not make A a bad component. Now consider any $x \in L_j$, $j > i$. Then $U_A \subseteq N(x)$ or $U_A \cap N(x) = \emptyset$, since $U_A$ is a module in $\bigcup_{j \ge i} L_j$. In the first case $U_A \subseteq N(x) \cap L_i$, and therefore A is a subcluster of the cluster containing $N(x) \cap L_i$ or $A = \{N(x) \cap L_i\}$. A is not made a bad component by $N(x)$.
We call a set $U_A$ that represents a split as in the last lemma a cluster split representative and A a split cluster.
Not all sets $U_1$ representing a split are cluster split representatives. If $U_1$ is a degenerated module then it is not necessarily an overlap-free degenerated module, but only a subset of an overlap-free degenerated module of $L_i$, say m. Let $A'$ be the cluster containing m. Then all $U_A$ with $U_A \subset U_1$ are contained in the same sets of $A'$, and $A'$ is the parent cluster of each such A, such that $U_A$ is a maximal subset of $U_1$.
For each A let $\mathrm{Buf}(A)$ be the sets of the parent cluster of A that contain $U_A$, called the buffer of A.
LEMMA 21. $U_1$ represents a split if and only if $U_1$ is a cluster split representative or if $U_1$ is a union of cluster split representatives $U_A$, such that their split clusters all have the same buffer $\mathrm{Buf}(A)$, $U_A$ is not the neighborhood of some $x \in \bigcup_{j > i} L_j$ in $L_i$, and if $\mathrm{Buf}(A)$ contains a module then the smallest module in $\mathrm{Buf}(A)$ is not a prime module.
Proof. Suppose $U_1$ is a split representative and represents the split $(V_1, V_2)$. Then $U_1$ is a module (not necessarily overlap-free) of $\bigcup_{j \ge i} L_j$, for some i. Let $C_1$ be the set of connected components of $V_1 \setminus U_1$ and $M_1$ be the set of overlap-free modules that are subsets of $U_1$. If $U_1$ is a prime module then $U_1$ itself is a cluster split representative. We assume that $U_1$ is a subset of a degenerated module that does not overlap with its child modules. $U_1$ also does not overlap with any neighborhood $N(c)$ of a connected component of $\bigcup_{j < i} L_j$. Let $A_1, \ldots, A_k$ be the clusters that are contained in $U_1$ and that are maximal with respect to the subcluster relation. This includes the case that there is only one $A_l$ and $U_{A_l} = U_1$. Then $U_1$ is a cluster split representative. Now assume that all $U_{A_l}$ are properly contained in $U_1$. Note that none of the $A_l$ is a bad component and the $U_{A_l}$ are therefore cluster split representatives. All $U_{A_l}$ are in the neighborhood of the same x not in $V_1$ and therefore have the same buffer. If the buffer of any $A_l$ contains a module then it contains also the smallest overlap-free module that contains $U_1$. This is a degenerated module. Note that each $A_l$ is not the neighborhood of some $x \in \bigcup_{j > i} L_j$ in $L_i$.
Vice versa, let $A_1, \ldots, A_k$ be cluster split representatives with the same buffer $\mathrm{Buf}(A_l)$. First assume $\mathrm{Buf}(A_l)$ does not contain a module. We have to show that the smallest overlap-free module m that contains all $U_{A_l}$ is a degenerated module. If m is a prime module then the cluster containing $\mathrm{Buf}(A_l)$ is unified with m and therefore $\mathrm{Buf}(A_l)$ contains a module.
We assume that $U_{A_l}$ represents the split $(V_1^l, V_2^l)$ and $V_1^l$ is maximal under this condition. That means $V_1^l$ contains all connected components c of $\bigcup_{j < i} L_j$ with $N(c) \cap L_i \subseteq U_{A_l}$. Since all $A_l$ have the same buffer, every $N(x) \cap L_i$, $x \notin L_i$, is either contained in some $U_{A_l}$, contains all $U_{A_l}$, or is disjoint with all $U_{A_l}$. If $x \in L_j$, $j > i$, then $N(x) \cap L_i$ cannot be a proper subset of any $U_{A_l}$, because each $A_l$ is not a bad component. By assumption, $N(x) \cap L_i$ cannot be equal to some $U_{A_l}$. Therefore such an x is either adjacent to all vertices in the union of the $U_{A_l}$ or to none of these vertices. Since each $A_l$ is not a bad component, each $A_l$ and each subcluster of $A_l$ does not contain the neighborhood of some connected component c of $\bigcup_{j < i} L_j$ that has neighbors in an $L_{j'}$, $j' > i$.
With $V_1 = V_1^1 \cup \cdots \cup V_1^k$, $(V_1, V \setminus V_1)$ is a split that is represented by the union of the $U_{A_l}$.
We can immediately derive the following.
LEMMA 22. If $U_1$ and $\bigcup_{j=2}^{k} U_j$ together represent a star of the second kind then there are $i_1 < i_2$ such that $U_1 = U_A$, for some cluster A in $L_{i_2}$; all $U_j$, $j \ge 2$, are of the form $U_{A_j}$ where $A_j$ is a cluster in $L_{i_1}$; all $A_j$ have the same buffer B; the buffer B and the buffer of any ancestor component do not contain an $N(C) \cap L_{i_1}$ or a prime module or a positively degenerated module; and $U_1$ is a cluster split representative.
LEMMA 23. If $A_2, \ldots, A_k$ are the clusters with buffer B, no buffer of any $A_j$ or ancestor cluster of $A_j$ contains a set $N(C) \cap L_{i_1}$ or a prime module or a positively degenerated module, and the neighborhood of any $U_{A_j}$, $j \ge 2$, in $\bigcup_{l \ge i_1} L_l$ is a cluster split representative $U_{A_1}$, then $U_{A_1}$ together with $\bigcup_{j=2}^{k} U_{A_j}$ represents a star of the second kind.
Proof. Note that $\bigcup_{j=2}^{k} U_{A_j}$ represents a split. Moreover, all $U_{A_j}$ have no neighbors in $L_{i_1} \setminus U_{A_j}$, because the buffer B and the buffers of any ancestor cluster of the $A_j$ do not contain a prime or positively degenerated module. If C is a connected component of $\bigcup_{j \le i_1} L_j$ then its neighborhood is either a subset of some $U_{A_l}$ or it is disjoint with all $U_{A_l}$, because the buffer B and the buffer of any ancestor cluster of the $A_l$ do not contain a set $N(C) \cap L_{i_1}$. Since all $U_{A_l}$, $l = 2, \ldots, k$, have as neighbors in $\bigcup_{j \ge i_1} L_j$ exactly the vertices in $U_{A_1}$, we get a star of the second kind.
LEMMA 24. Suppose $U_1, \ldots, U_k$ represent a star of the first kind. Then all $U_i$ are cluster split representatives. Assume $U_i = U_{A_i}$, $i = 1, \ldots, k$. Then all the clusters $A_i$ have the same buffer $\mathrm{Buf}(A_i)$ and the smallest module that is in $\mathrm{Buf}(A_i)$ or in the buffer of an ancestor of $A_i$ is a negatively degenerated module.
Proof. Note first that the union of the $U_i$ is a negatively degenerated module. If $U_i$ is not a cluster split representative then also $U_i$ is a negatively degenerated module and it can be split into cluster split representatives $U_i^1, \ldots, U_i^q$. But then also $U_1, \ldots, U_{i-1}, U_i^1, \ldots, U_i^q, U_{i+1}, \ldots, U_k$ represent a star of the first kind. Since the union of the $U_i$ represents a split (it is not necessarily a cluster split representative), all $A_i$ have the same buffer. Since the union of the $U_i$ is a negatively degenerated module, the smallest overlap-free module containing the union of the $U_i$ is negatively degenerated. This is the smallest module in the buffer of any $A_i$ or in any ancestor cluster of $A_i$.
LEMMA 25. If clusters $A_1, \ldots, A_k$ have the same buffer $\mathrm{Buf}(A_i)$, the smallest module in the buffer of the $A_l$ or an ancestor cluster of $A_i$ is a negatively degenerated module, and $A_1, \ldots, A_k$ are not a part of a star of the second kind, then $U_{A_1}, \ldots, U_{A_k}$ represent a star of the first kind.
Proof. Since the clusters $A_1, \ldots, A_k$ have a buffer, they are not unified with the root cluster; therefore they are not bad components and therefore they represent splits. The union of the $U_{A_l}$ is a negatively degenerated module, because the smallest overlap-free module containing the $U_{A_l}$ is negatively degenerated. Note that the union of the $U_{A_l}$ represents a split, because all the $A_l$ have the same buffer.
LEMMA 26. Suppose $U_1, \ldots, U_k$ represent a clique. Then all $U_i$ are cluster split representatives, say $U_i = U_{A_i}$, and all $A_i$ have the same buffer $\mathrm{Buf}(A_i)$. Moreover, the smallest module that is in the buffer of $A_i$ or of an ancestor of $A_i$ is a positively degenerated module.
Proof. Whenever we have a clique component, we have a decomposition of the vertex set V of G into $V_1, \ldots, V_{k+1}$, $U_l \subseteq V_l$, $l = 1, \ldots, k+1$, and all vertices in different $U_l$ are pairwise joined by an edge. There are no other edges between different $V_l$. Without loss of generality, we assume that $u_n \in V_{k+1}$. Since $(V_1 \cup \cdots \cup V_k, V_{k+1})$ is a split, all $U_1, \ldots, U_k$ are in the same layer $L_i$. The union of the $U_l$, $l = 1, \ldots, k$, is a positively degenerated module and therefore the smallest overlap-free module containing the $U_l$ is positively degenerated. Therefore the smallest module in the buffer of any $A_l$ or of an ancestor cluster is a positively degenerated module.
Vice versa, clusters $A_l$ as mentioned in the last lemma represent cliques.
LEMMA 27. Suppose $A_1, \ldots, A_k$ have the same buffer, and that the smallest module in the buffer of the $A_l$ or in the buffer of an ancestor cluster of the $A_l$ is positively degenerated. Then $U_{A_1}, \ldots, U_{A_k}$ represent a clique.
Proof. Since the buffers of the $A_l$ are defined, the $A_l$ are not identified with the root cluster and the $A_l$ are not bad components. Since the smallest module containing the $U_{A_l}$ is positively degenerated, all vertices in different $U_{A_l}$ are pairwise joined by an edge. Each $A_l$ represents a split $(V_l, W_l)$. We assume that $V_l$ is maximal under the condition that $U_{A_l}$ represents a split $(V_l, W_l)$. That means each connected component c of $\bigcup_{j < i} L_j$ is in $V_l$, or the neighborhood of c in $L_i$ is disjoint with $U_{A_l}$, or the neighborhood of c in $L_i$ contains all the $U_{A_l}$. The union of the $U_{A_l}$ represents a split $(\bigcup_{l=1}^{k} V_l, W)$. Let U be the set of vertices in W that have neighbors not in W. U is just the set of neighbors of the union of the $U_{A_l}$ that are not in the sets $V_l$. All the vertices that are in different sets among the $U_{A_l}$ and U are pairwise joined by an edge. Therefore $U_{A_1}, \ldots, U_{A_k}$ represent a clique.
Algorithmically we proceed as follows.
1. We compute the clusters as mentioned above and select the good clusters, i.e., those clusters that are not bad.
2. We select the $U_1, U_2, \ldots, U_k$ representing a star of the second kind as follows. Let $A_2, \ldots, A_k$ be the clusters that have the same buffer. First we check that the buffers of $A_j$ and of any ancestor of $A_j$ contain only negatively degenerated modules and sets of the form $N(x) \cap L_i$, $x \in L_l$, $l > i$; i.e., they do not contain any module that is prime or positively degenerated and they do not contain any set $N(C) \cap L_l$, $l < i$. Then we check that the neighborhood of all $U_j$ in $\bigcup_{l \ge i} L_l$ is a cluster split representative.
3. We select the $U_1, \ldots, U_k$ representing stars of the first kind as follows. Again let $A_1, \ldots, A_k$ be the clusters that have the same buffer $\mathrm{Buf}(A_j)$. We assume that $A_1, \ldots, A_k$ do not pass all the checks of the previous step. To represent a star of the first kind, one only has to check that the smallest module in the buffer of $A_j$ or of an ancestor of $A_j$ is a negatively degenerated module.
4. To select $U_1, \ldots, U_k$ that represent a clique, one determines $A_1, \ldots, A_k$ with the same buffer, such that the smallest module that is in the buffer of $A_j$ or of an ancestor cluster of $A_j$ is a positively degenerated module.
5.3.2. Computation of the Connected Components of $\bigcup_{j < i} L_j$
We assign each edge xy of G a distance $d(x, y)$ that is the maximum i with $x \in L_i$ or $y \in L_i$, i.e., if $x \in L_i$ and $y \in L_j$ then $d(x, y)$ is the maximum of i and j.
LEMMA 28. The connected components of $\bigcup_{j < i} L_j$ are the single link clusters C of d that are of coarseness $i - 1$.
Proof. Note that an edge xy joins two vertices that are in $\bigcup_{j < i} L_j$ if and only if $d(x, y) \le i - 1$. Therefore the single link clusters of coarseness $i - 1$ coincide with the connected components of $\bigcup_{j < i} L_j$.
To get the connected components of $\bigcup_{j < i} L_j$, for all i, we perform a single link clustering on the distance function $d(x, y) = \max\{j : x \in L_j \text{ or } y \in L_j\}$. This can be done in logarithmic time with $O(n + m)$ processors on a CRCW-PRAM.
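A sequential sketch of this correspondence is given below. For one fixed i it unites the endpoints of all edges of distance at most $i - 1$, in the manner of Kruskal's algorithm with a union-find structure; this is a stand-in for the parallel single linkage algorithm, and the names `layer_of`, `edge_distance`, and `components_below` are assumptions of the sketch.

```python
def edge_distance(layer_of, x, y):
    """d(x, y): the larger of the layer indices of x and y."""
    return max(layer_of[x], layer_of[y])


def components_below(layer_of, edges, i):
    """Connected components of the graph restricted to the layers L_j, j < i."""
    parent = {v: v for v in layer_of if layer_of[v] < i}

    def find(v):
        # Union-find lookup with path halving.
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    # Process edges by increasing distance and stop at the threshold i - 1.
    for x, y in sorted(edges, key=lambda e: edge_distance(layer_of, *e)):
        if edge_distance(layer_of, x, y) > i - 1:
            break
        parent[find(x)] = find(y)

    groups = {}
    for v in parent:
        groups.setdefault(find(v), []).append(v)
    return sorted(sorted(c) for c in groups.values())


# Triangle a, b, c with pendant vertex d at c; d is in L_2, b and c in L_4, a in L_5.
layer_of = {"d": 2, "c": 4, "b": 4, "a": 5}
edges = [("a", "b"), ("a", "c"), ("b", "c"), ("c", "d")]
```

With these layers, the single link cluster of coarseness 4 is {b, c, d}, which is exactly the connected component of the layers below $L_5$.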
To determine the neighborhood of a connected component C of $\bigcup_{j < i} L_j$, we proceed as follows:
For any edge xy with $x \in L_j$, $y \in L_i$, and $j < i$, we determine the largest single linkage cluster $C = C_{(x,y)}$ that contains x but not y and put an edge $Cy \in E'$. Note that for all $xy \in E$ simultaneously, all $C_{(x,y)}$ can be determined in logarithmic time with a linear workload by determining the least common ancestor $D_{x,y}$ of x and y in the single linkage clustering. $C_{(x,y)}$ is just the child of $D_{x,y}$ that is an ancestor of x (see [1]).
Note that the number of edges in $E'$ is bounded by the number of edges of G and that $E'$ can be determined in $O(\log n)$ time with a linear workload.
LEMMA 29. For a connected component C of $\bigcup_{j < i} L_j$, $N(C) \cap L_i = \{y \in L_i \mid Cy \in E'\}$.
Proof. $y \in N(C) \cap L_i$ is equivalent to the statement that there is an $x \in C$ with $xy \in E$. Since C is a connected component of $\bigcup_{j < i} L_j$, it is a maximal single link cluster of coarseness $i - 1$ and therefore a maximal single link cluster containing x but not y. Therefore $Cy \in E'$. Vice versa, suppose $Cy \in E'$ and $y \in L_i$. There is an $x \in C$ with $xy \in E$. Note that the coarseness of C is less than i. Otherwise y would belong to C. C is a maximal cluster of coarseness $< i$ that contains x and is therefore a connected component of $\bigcup_{j < i} L_j$.
COROLLARY 4. If there is a $y \in L_i$ with $Cy \in E'$ then C is a connected component of $\bigcup_{j < i} L_j$.
To get the sets $N(C) \cap L_i$ one determines the edge set $E'$ with $Cy \in E'$ if there is an $x \in C$, such that C is the maximum single linkage cluster containing x and not y. For each $L_i$ that contains y with $Cy \in E'$, $N(C) \cap L_i = \{y \in L_i \mid Cy \in E'\}$.
Next we check whether C is bad in $L_i$, i.e., C has neighbors in $L_i$ and in an $L_j$, $j > i$.
For each single linkage cluster C, we determine the maximum i, such that C has a neighbor in $L_i$, say $p_C$.
Clearly, C is bad in $L_i$ if there is a $y \in L_i$ with $Cy \in E'$ and $i < p_C$. $p_C$ can be determined simultaneously, for all C, in logarithmic time with a linear workload by tree contraction (see [1]).
5.4. Determining Splits
The basic idea is to traverse along the split representatives. Let U be a split representative of a split $(V_1, V_2)$ with $U \subseteq V_1$. Then we provide nodes $v^U_{\mathrm{in}}$ and $v^U_{\mathrm{out}}$. $v^U_{\mathrm{in}}$ is joined by an edge with all vertices in U and $v^U_{\mathrm{out}}$ is joined by an edge with all neighbors of U in $V_2$.
LEMMA 30. If U is a split representative and not the center of a star of the second kind then there is a unique split $(V_1, V_2)$ with $u_n \in V_2$ such that U is the set of vertices in $V_1$ that have neighbors in $V_2$. The neighborhood of $U \subseteq L_i$ in $V_2$ consists of
1. all vertices in $\bigcup_{j \ge i} L_j$ that are neighbors of U and not in U, and
2. all vertices $x \in \bigcup_{j < i} L_j$ such that x is a neighbor of all vertices in U and U is the union of a proper subcluster or a buffer of a proper subcluster of the cluster that contains $N(x) \cap L_i$. We also call x an outer neighbor of U.
Proof. Assume that U represents more than one split, say $(V_1, V_2)$ and $(W_1, W_2)$. Then also $(V_1 \cap W_1, V_2 \cup W_2)$ and $(V_1 \cup W_1, V_2 \cap W_2)$ are splits represented by U. Let $W_1$ be the intersection of all $V_1$ with the property that U represents a split $(V_1, V_2)$, and let $W_1'$ be the union of all $V_1$ such that U represents a split $(V_1, V_2)$. All vertices in $W_1' \setminus W_1 \setminus U$ that have neighbors in U have the same neighbors in U and have no other neighbors in $W_1$. Therefore with $X_1 = W_1' \setminus W_1 \setminus U$, $(X_1, V \setminus X_1)$ is a split. Note that $X_1$ is a union of connected components of $\bigcup_{j < i} L_j$ if $U \subseteq L_i$, say $C_1, \ldots, C_p$. All vertices in any $C_l$ that have neighbors outside $C_l$ have exactly the vertices in U as neighbors. Therefore U is the center of a star of the second kind. This is a contradiction.
Assume U is not the center of a star and $(V_1, V_2)$ is the split represented by U. Then the neighborhood of U in $V_2$ is clearly determined as in the lemma.
The basic idea of a split decomposition algorithm is as follows: For each cluster split representative U we introduce a vertex $v^U_{\mathrm{in}}$ that is joined by an edge with all vertices in U and a vertex $v^U_{\mathrm{out}}$ that is joined by an edge with all neighbors of U not in $V_1$, where $(V_1, V_2)$ is the split represented by U. If the split is unique then we can do this. If this is not the case then U is a center of a star of the second kind.
If U is the center of a star of the second kind and $U_1, \ldots, U_l$ are the child representatives of U then we split U formally into two representatives, $U_{\mathrm{down}}$ and $U_{\mathrm{up}}$. The vertices in $U_1, \ldots, U_l$ are considered as outer neighbors of $U_{\mathrm{down}}$ but not as outer neighbors of $U_{\mathrm{up}}$. $U_{\mathrm{down}}$ is considered as a child representative of $U_{\mathrm{up}}$.
We also have to parallelize this procedure.
1. We join $v^U_{\mathrm{in}}$ and all $v^{U'}_{\mathrm{out}}$ where $U'$ is a maximal subset split representative of U or a vertex of the same layer $L_i$ that is in U but not in a smaller split representative than U.
2. If $xy \in E$ and $x, y \in L_i$, let $U_x$ be the maximum split representative that contains x but not y and $U_y$ be the maximum split representative that contains y but not x. We join $v^{U_x}_{\mathrm{out}}$ and $v^{U_y}_{\mathrm{out}}$ by an edge.
3. Suppose $x \in L_i$, $y \in L_j$, $i < j$, and $xy \in E$. Let $U_x$ be the maximum split representative containing x and $U_y$ be the maximum split representative containing y and having x as an outer neighbor. We join $v^{U_x}_{\mathrm{out}}$ and $v^{U_y}_{\mathrm{out}}$ by an edge.
THEOREM 6. The decomposition procedure as described above determines the unique split decomposition into stars, cliques, and prime components.
Proof. We could also proceed as follows. We first eliminate the stars. Then we get the maximal stars. Then we eliminate the cliques. Then we get the maximal clique components. Finally, we only have to determine the prime components in the components we got by the decompositions before. They are unique in any case. Therefore we get the unique split decomposition.
5.5. Complexity Analysis
When we compute overlap components, we also always use overlap-free modules of $L_i$ as sets. To get a linear time bound and a linear processor bound in parallel, we have to show the following.
THEOREM 7. In any graph G with n vertices and m edges, the number of pairs $(x, M)$, such that $x \in M$ and M is an overlap-free module, is bounded by $n + m$.
Proof. Suppose $M$ is a prime module or a positively degenerated module. Then the number of vertices in $M$ is bounded by the number of edges that join two vertices in $M$ that are not in a common child module of $M$. They are exactly those edges $xy$ such that the smallest overlap-free module containing $x$ and $y$ is $M$. Therefore the number of pairs $(x, M)$, such that $x \in M$ and $M$ is an overlap-free module that is not a negatively degenerated module, is bounded by the number of edges.
Now suppose $M$ is a negatively degenerated module. We first assume that $M$ is not the whole graph. Then $M$ has a parent module $M'$ that is not a negatively degenerated module. Therefore there are edges that join all vertices in $M$ with a vertex in $M' \setminus M$. They are just those edges $xy$ with $x \in M$, and the smallest overlap-free module containing $x$ and $y$ is $M'$. The number of vertices in $M$ is bounded by the number of edges $xy$, $x \in M$, $y \in M' \setminus M$. Every such edge $xy$ joins at most two child modules of $M'$ that are negatively degenerated. Therefore one can bound the number of $(x, M)$, such that $x \in M$ and $M$ is an overlap-free negatively degenerated module $\neq V$, by the number of edges of $G = (V, E)$.
If $M = V$ then $|M|$ is bounded by $n$. Only one overlap-free module is $V$.
We also have to compute the modules of each $L_i$.
LEMMA 31 [17]. Modular decomposition can be done in $O(\log^2 n)$ time with $O(n + m)$ processors on a CRCW-PRAM.
Recall that, for a connected component $C$ of $\bigcup_{j<i} L_j$ and a $y \in L_i$, $Cy \in E'$ if and only if $y$ is in the neighborhood of $C$. We determined these connected components of $\bigcup_{j<i} L_j$ by the single linkage method which consists of the computation of a minimum spanning tree and the application of the single linkage algorithm. The overall complexity is in parallel $O(\log^2 n)$ time with a processor number of $O(n + m)$ on a CREW-PRAM [6]. To determine whether $Cy \in E'$, we compute, for each edge $xy$, the largest component $C$ that contains $x$ but not $y$. That means, in the cluster tree of the single linkage clustering we compute the least common ancestor $C'$ of $x$ and $y$ and the child $C$ of $C'$ that is an ancestor of $x$. This can be done, for all edges $xy$ simultaneously, in logarithmic time with a linear processor number [33] on a CREW-PRAM. Immediately we get the following result.
LEMMA 32. $E'$ and therefore the neighborhood of any connected component of $\bigcup_{j<i} L_j$ in $L_i$ can be computed in $O(\log^2 n)$ time with $O(n + m)$ processors on a CREW-PRAM. The size of $E'$ does not exceed the number $m$ of edges.
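The least-common-ancestor step used for Lemma 32 can be illustrated sequentially. The following is a minimal sketch, not the parallel procedure of [33]: it assumes the cluster tree is given by hypothetical `parent` and `depth` dictionaries, and it walks upward from both endpoints instead of using a logarithmic-time LCA structure.

```python
def child_of_lca_toward(parent, depth, x, y):
    """Return the child of lca(x, y) that is an ancestor of (or equal to) x.

    parent[v] is the parent of node v in the cluster tree (None at the root),
    depth[v] is the distance of v from the root. The result is None when x is
    itself an ancestor of y, because then no cluster contains x but not y.
    """
    a, b = x, y
    below_a = None  # node visited just before a on the walk up from x
    while depth[a] > depth[b]:
        below_a, a = a, parent[a]
    while depth[b] > depth[a]:
        b = parent[b]
    while a != b:
        below_a, a = a, parent[a]
        b = parent[b]
    return below_a


# Hypothetical cluster tree: root R with children A and B;
# leaves x and z lie below A, leaf y lies below B.
parent = {"R": None, "A": "R", "B": "R", "x": "A", "z": "A", "y": "B"}
depth = {"R": 0, "A": 1, "B": 1, "x": 2, "z": 2, "y": 2}
largest = child_of_lca_toward(parent, depth, "x", "y")  # cluster A
```

For an edge $xy$ the returned node plays the role of the largest component $C$ that contains $x$ but not $y$; the per-query cost here is proportional to the tree height, which is acceptable for illustration only.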
As a consequence of the last lemma and the last theorem we get the
following.
COROLLARY 5. The overlap components of modules, of neighborhoods of vertices in $\bigcup_{j \neq i} L_j$, and of neighborhoods of connected components of $\bigcup_{j<i} L_j$ can be determined by a CREW-PRAM in $O(\log^2 n)$ time with $O(n + m)$ processors.
To check whether a connected component $C$ of $\bigcup_{j<i} L_j$ is a bad component, i.e., has a neighbor in $L_k$, $k > i$, we only determine the maximum $k$, such that $C$ has a neighbor in $L_k$. This can be done as follows. For each vertex $x$, we determine the maximum $k_x$, such that $x$ has a neighbor in $L_{k_x}$. We determine by tree contraction the maximum $k_x$ with $x \in C$. This can be done in logarithmic time with a linear workload [1].
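The aggregation over the cluster tree can be sketched sequentially as a bottom-up pass. This is only an illustration under assumptions: the tree is given by a hypothetical `children` dictionary whose leaves are the vertices, `k_of_vertex` holds the per-vertex values $k_x$, and the sketch takes the maximum over each component, which is what the stated goal (the largest $k$ such that $C$ has a neighbor in $L_k$) requires. Tree contraction itself is not implemented.

```python
def max_neighbor_layer(children, root, k_of_vertex):
    """For every tree node C return the maximum k_x over the leaves x below C."""
    order, stack = [], [root]
    while stack:  # preorder: every parent appears before its children
        node = stack.pop()
        order.append(node)
        stack.extend(children.get(node, []))
    best = {}
    for node in reversed(order):  # children are finished before their parent
        kids = children.get(node, [])
        if kids:
            best[node] = max(best[child] for child in kids)
        else:
            best[node] = k_of_vertex[node]
    return best


# Hypothetical data: two components C1 = {a, b} and C2 = {c} below a root R.
children = {"R": ["C1", "C2"], "C1": ["a", "b"], "C2": ["c"]}
k_of_vertex = {"a": 2, "b": 5, "c": 1}
best = max_neighbor_layer(children, "R", k_of_vertex)
is_bad_for_layer_3 = best["C1"] > 3  # C1 has a neighbor in a layer above 3
```

A component $C$ of $\bigcup_{j<i} L_j$ is then flagged as bad exactly when its aggregated value exceeds $i$.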
To collect overlap components to clusters we are given a set of pairs $(c_1, c_2)$ where $c_2$ is an ancestor overlap component of $c_1$, and we collect $c_1$, $c_2$, and all ancestors of $c_1$ that are descendants of $c_2$ to one component.
.
1.If c is the overlap component containing N x lL and x g
1 i
.
D L,then c is the overlap component that contains N C lL
j-i j 2 i
where C is the connected component of D L that contains x.
j-i j
.
2.Let c sN x lL and x gD L.If the overlap component
i j)i j
containing c has more than one element then c is the overlap component
1
containing c,and c is the root overlap component.If c is the only
2
element of its overlap component then c is the parent overlap component
1
of c and c is the root overlap component.
2
3.Let c be a module,c
X
be the parent module of c,and c
X
be a
prime module.Suppose c and c
X
are in different overlap components.If
the overlap component containing c has more than one element then c is
1
the overlap component containing c and c is the overlap component
2
containing c
X
.If c is the only element of an overlap component then c is
1
the parent overlap component of c and c is the overlap component
2
containing c
X
.
.
4.If C is a bad component then c [N C lL and c is the root
1 i 2
overlap component.
.
We can compute the set Pairs of these pairs c,c in logarithmic time
1 2
with a linear workload on an EREW-PRAM.
The collection of overlap components to clusters is done as follows. First we check whether the component $c$ and its parent component $c'$ have to be collected to one cluster as follows. We compute, for each overlap component $c$, the "size" $Si(c)$ of $c$, i.e., the number of descendants. Then, for each $c$, we compute the maximum size $M(c)$ of an overlap component $c_2$ with $c_1 = c$ and $(c_1, c_2) \in \mathit{Pairs}$. Then, for each overlap component $c$ we determine the maximum $M(c')$, such that $c'$ is a descendant of $c$ or $c' = c$, say $M'(c)$. All these data can be computed in logarithmic time with a linear workload by tree contraction [1].
LEMMA 33. $c$ and its parent component belong to the same cluster if and only if $Si(c) < M'(c)$.
Proof. Note that $c$ and its parent belong to the same cluster if and only if there is a $(c_1, c_2) \in \mathit{Pairs}$, such that $c_1 = c$ or $c_1$ is a descendant of $c$ and $c_2$ is the parent of $c$ or an ancestor of the parent of $c$. This is equivalent to the statement that there is a $(c_1, c_2) \in \mathit{Pairs}$, such that $c_1 = c$ or $c_1$ is a descendant of $c$ and $Si(c_2) > Si(c)$. This is again equivalent to the statement that $Si(c) < M'(c)$.
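The test of Lemma 33 can be tried out on a small tree. The sketch below is a sequential illustration, assuming the overlap component tree is given by a hypothetical `children` dictionary and that $Si(c)$ counts proper descendants; components without any pair get $M(c) = -1$ so that the comparison fails for them. It does not implement tree contraction.

```python
def joins_parent(children, root, pairs):
    """Return {c: True iff Si(c) < M'(c)} for every node c of the tree."""
    order, stack = [], [root]
    while stack:  # preorder: parents before children
        node = stack.pop()
        order.append(node)
        stack.extend(children.get(node, []))

    size = {}  # Si(c): number of proper descendants of c
    for node in reversed(order):
        size[node] = sum(size[ch] + 1 for ch in children.get(node, []))

    m = {node: -1 for node in order}  # M(c): largest Si(c2) over pairs (c, c2)
    for c1, c2 in pairs:
        m[c1] = max(m[c1], size[c2])

    m_sub = {}  # M'(c): maximum of M over c and all of its descendants
    for node in reversed(order):
        m_sub[node] = max([m[node]] + [m_sub[ch] for ch in children.get(node, [])])

    return {node: size[node] < m_sub[node] for node in order}


# Hypothetical tree R -> A -> B -> C plus a second child D of R,
# with one pair (c1, c2) = (C, A): C, B and A should form one cluster.
children = {"R": ["A", "D"], "A": ["B"], "B": ["C"]}
flags = joins_parent(children, "R", [("C", "A")])
```

In this example `flags` marks `C` and `B` as joining their parents, while `A`, `D` and `R` do not, so the collected cluster is $\{A, B, C\}$.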
Finally, we have to compute the clusters. For each overlap component $c$, let $\mathrm{anc}(c)$ be the next ancestor overlap component with the property that its parent component does not belong to the same cluster (it is possible that $\mathrm{anc}(c) = c$). $\mathrm{anc}$ can be determined with linear workload in logarithmic time by an EREW-PRAM using list ranking on the Euler cycle of the overlap components tree where each edge is replaced by a double edge (see for example [21]). We identify each overlap component $c$ with $\mathrm{anc}(c)$.
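Sequentially, $\mathrm{anc}$ can be filled in by one top-down pass. The following sketch assumes a hypothetical `children` dictionary for the overlap components tree and a precomputed flag per component that says whether it belongs to the same cluster as its parent; the Euler-cycle and list-ranking technique of the parallel version is not reproduced.

```python
def cluster_heads(children, root, joins_parent_flag):
    """Map every component c to anc(c), the topmost component of its cluster."""
    anc = {root: root}  # the root has no parent, so it heads its own cluster
    stack = [root]
    while stack:
        node = stack.pop()
        for child in children.get(node, []):
            # A child either inherits the head of its parent or starts a new cluster.
            anc[child] = anc[node] if joins_parent_flag[child] else child
            stack.append(child)
    return anc


# Hypothetical tree R -> A -> B -> C plus a second child D of R.
children = {"R": ["A", "D"], "A": ["B"], "B": ["C"]}
joins_parent_flag = {"R": False, "A": False, "B": True, "C": True, "D": False}
anc = cluster_heads(children, "R", joins_parent_flag)
```

Components with the same value of `anc` are identified, which yields the clusters $\{R\}$, $\{A, B, C\}$ and $\{D\}$ in this example.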
To extract representatives of stars and of cliques, we have to compute the buffers of any cluster. Given a cluster $c$, we pick a vertex $v_c$ that appears in some set in $c$. Note that $\mathrm{Buf}(c)$ is the set of sets in the parent cluster $\mathrm{Par}(c)$ of $c$ that contain $v_c$. In any case, we know the set $S_{v_c}$ of sets in $S_i$ that contain $v_c$. When we select $v_c$ we only have to select those sets in $S_{v_c}$ that appear in the parent cluster of $c$. What we need is to find out the collection of clusters with the same buffer. This can be done by lexicographic sorting. Sequentially, this can be done in linear time. In parallel we can do this by a CRCW-PRAM in logarithmic time with a linear processor number and therefore by a CREW-PRAM in $O(\log^2 n)$ time with a linear processor number.
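Grouping clusters with equal buffers by lexicographic sorting can be sketched as follows. The input format (a hypothetical dictionary from cluster names to collections of set identifiers) is an assumption, and the sketch relies on Python's comparison sort, so it runs in $O(N \log N)$ rather than in the linear time that a radix-style lexicographic sort would give.

```python
from itertools import groupby


def group_by_buffer(buffers):
    """Group cluster ids whose buffers contain exactly the same set ids.

    buffers maps a cluster id to an iterable of (comparable) set ids.
    """
    # Each buffer becomes a sorted tuple so that equal buffers get equal keys.
    keyed = sorted((tuple(sorted(buf)), cluster) for cluster, buf in buffers.items())
    return [[cluster for _, cluster in group]
            for _, group in groupby(keyed, key=lambda item: item[0])]


# Hypothetical buffers: c1 and c2 share the buffer {1, 3}.
buffers = {"c1": {3, 1}, "c2": {1, 3}, "c3": {2}, "c4": set()}
groups = group_by_buffer(buffers)
```

Each inner list of `groups` is one collection of clusters with the same buffer, i.e. one candidate family $c_1, \ldots, c_k$ for the checks that follow.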
Next we have to check that the clusters $c_1, \ldots, c_k$ with the same buffer form together a negatively degenerated module, i.e., the smallest ancestor buffer of all $c_i$ contains a negatively degenerated module. We only have to label each cluster $c$ by a 1 if $\mathrm{Buf}(c)$ contains a module. We determine the first ancestor of any $c$, called $\mathrm{ANC}(c)$, such that its buffer contains a module. Note that the modules in any buffer are ordered by inclusion. We select the smallest module $\mathrm{Modul}(\mathrm{ANC}(c))$ in $\mathrm{Buf}(\mathrm{ANC}(c))$. If this module is a negatively degenerated module for $c = c_i$ then the clusters $c_1, \ldots, c_k$ form a negatively degenerated module and are therefore candidates for a star representative.
.
ANC c can be found in linear time sequentially and in logarithmic time

with a linear workload on an EREW-PRAM use list ranking and Eulerian
w x..
cycle techniques;see 21.To find the smallest module in ANC C,one

needs logarithmic time and a linear workload standard minimum compu-
.
tation.
To check whether $c_1, \ldots, c_k$ represent the lower layer components of a star of the second kind, one has to check that all ancestor buffers do not contain certain sets in $S_i$. Here we label clusters with a 1 if they contain forbidden sets. Otherwise we label a cluster with a 0. One has to check that all ancestor clusters are labelled by a 0. One only has to determine, for each cluster, the sum of labels of its ancestors. This can be done by Eulerian cycle techniques in logarithmic time with a linear workload (see for example [21]). One also has to check that the neighborhood of each $c_i$ in a higher layer is a split representative. First one has to check that all these neighbors are of the same layer. Then one determines the smallest cluster containing these neighbors. If this cluster is a good cluster then we determine the number of underlying vertices and compare this number with the number of neighbors of each $c_i$ in this layer. This is a procedure that can be done in logarithmic time and linear workload on an EREW-PRAM.
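The ancestor-label sums used in this check can be illustrated by a sequential top-down prefix sum. The sketch assumes a hypothetical `children` dictionary for the cluster tree and 0/1 labels per cluster, and it sums over proper ancestors only; the Eulerian cycle technique of the parallel version is replaced by a plain traversal.

```python
def ancestor_label_sums(children, root, label):
    """Return, for every cluster, the sum of the labels of its proper ancestors."""
    total = {root: 0}
    stack = [root]
    while stack:
        node = stack.pop()
        for child in children.get(node, []):
            # The ancestors of a child are the node itself plus the node's ancestors.
            total[child] = total[node] + label[node]
            stack.append(child)
    return total


# Hypothetical cluster tree R -> A -> B and R -> D; only A carries a forbidden set.
children = {"R": ["A", "D"], "A": ["B"]}
label = {"R": 0, "A": 1, "B": 0, "D": 0}
sums = ancestor_label_sums(children, "R", label)
```

A cluster passes the check exactly when its sum is 0; here `B` fails because its ancestor `A` is labelled 1.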
Finally, we have to do the split decomposition as described in the last subsection. This can be done in logarithmic time with linear workload on an EREW-PRAM. One only has to follow the algorithmic description and one gets this bound.
As an overall result, we get the following.
THEOREM 8. Split decomposition can be done by a CRCW-PRAM in $O(\log^2 n)$ time with $O(n + m)$ processors. All steps with the exception of determining the connected components of $\bigcup_{j<i} L_j$ and the neighborhoods of these components can be done sequentially in linear time. All steps but modular decomposition can be done by a CREW-PRAM in $O(\log^2 n)$ time with $O(n + m)$ processors.
5.6. Transformation into a Linear Time Algorithm
Since minimum spanning tree computation and the usual single linkage clustering need $O((n + m) \log n)$ workload, we have to circumvent these procedures in some way. On the other hand, we can compute a breadth-first search tree in linear time and therefore the set $K_i$ of vertices that have distance $k - i$ from a fixed vertex $v_n$. Here $k$ is the maximum distance of a vertex from $v_n$.
LEMMA 34. Let $(V_1, V_2)$ be a split with $v_n \in V_2$ and $U_1$ be the set of vertices in $V_1$ that have neighbors in $V_2$. Then $U_1$ is a subset of some $K_i$ and $V_1 \setminus U_1$ is a union of connected components of $\bigcup_{j<i} K_j$.
Proof. The proof is the same as in the hierarchy of $L_i$.
To get a split decomposition, we determine first, for each $i$, the neighborhoods $N(C) \cap K_i$ of connected components $C$ of $\bigcup_{j<i} K_j$. Then we do modular decomposition sequentially. The rest is done as in the parallel algorithm with the $L_i$ hierarchy. Note that the rest can be done in linear time.
LEMMA 35 [19, 29, 11]. Modular decomposition can be done in linear time.
.
To determine the neighborhoods N C lK,we determine for i s
i
0,...,k the set C
C
of connected components of K,shrink each compo-
i i
nent c gC
C
to one vertex
¨
,determine the neighborhood of c in K,
i c iq1
and put
¨
into K.It is easily seen that at level i each connected
c iq1
component of D K is shrunk to one vertex.By construction,each edge
jFi j
is either in one level K or joins two vertices of consecutive levels
i
K,K.Therefore,for a connected component of K,say c,that is
i iq1 i
shrunk to one vertex
¨
,the incident edges of
¨
are all the edges between
c c
K and K.Therefore each edge is called once in the shrinking process
i iq1
and once in the process to compute connected components.Therefore the
.
time to compute the neighborhoods of connected components N C jK
i
is linear.
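The layer-by-layer shrinking can be sketched in Python. This is an illustration under assumptions, not the author's implementation: the graph is a connected undirected graph given as a hypothetical dictionary `adj` of neighbor sets, and the member sets of the shrunken components are materialised for readability, which costs more than the linear time of the procedure described above.

```python
from collections import deque


def layer_neighborhoods(adj, source):
    """Return triples (i, members, nbrs): members is a connected component of
    K_0 u ... u K_i and nbrs is its neighborhood in K_{i+1}."""
    dist = {source: 0}
    queue = deque([source])
    while queue:  # breadth-first search from the fixed vertex v_n
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    k = max(dist.values())
    layers = [[] for _ in range(k + 1)]
    for v, d in dist.items():
        layers[k - d].append(v)  # K_i holds the vertices at distance k - i

    result, carried = [], []  # carried: (members, neighbors in the current layer)
    for i in range(k + 1):
        # Local graph: vertices of K_i plus one node per shrunken component.
        local = {("v", v): [("v", w) for w in adj[v] if dist[w] == dist[v]]
                 for v in layers[i]}
        for idx, (_, nbrs) in enumerate(carried):
            local[("c", idx)] = [("v", w) for w in nbrs]
            for w in nbrs:
                local[("v", w)].append(("c", idx))
        seen, next_carried = set(), []
        for start in local:
            if start in seen:
                continue
            seen.add(start)
            comp, stack = [], [start]
            while stack:  # depth-first search inside the local graph
                u = stack.pop()
                comp.append(u)
                for w in local[u]:
                    if w not in seen:
                        seen.add(w)
                        stack.append(w)
            members, nbrs = set(), set()
            for kind, item in comp:
                if kind == "c":
                    members |= carried[item][0]
                else:
                    members.add(item)
                    nbrs.update(w for w in adj[item] if dist[w] == dist[item] - 1)
            result.append((i, frozenset(members), frozenset(nbrs)))
            next_carried.append((members, nbrs))
        carried = next_carried
    return result


# Hypothetical graph: s is adjacent to a and b, a to c, b to d.
adj = {"s": {"a", "b"}, "a": {"s", "c"}, "b": {"s", "d"}, "c": {"a"}, "d": {"b"}}
triples = set(layer_neighborhoods(adj, "s"))
```

With `s` as the fixed vertex, the layers are $K_0 = \{c, d\}$, $K_1 = \{a, b\}$ and $K_2 = \{s\}$, and every reported component comes with the vertices of the next layer it is attached to.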
As an overall result we get the following.
THEOREM 9. Split decomposition can be done in linear time.
5.7. Parity Graph Recognition
A parity graph is a graph with the property that for each vertex x and
each vertex y,all chordless paths from x to y have an odd length,or all
chordless paths from x to y have an even length.
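The definition can be checked directly on very small graphs. The sketch below is a brute-force illustration with exponential running time, assuming the graph is given as a hypothetical dictionary `adj` of neighbor sets; it enumerates all chordless paths and is unrelated to the recognition algorithms discussed in this section.

```python
def is_parity_graph_bruteforce(adj):
    """Check the parity condition by enumerating all chordless paths."""

    def parities(x, y):
        found = set()

        def extend(path):
            last = path[-1]
            if last == y:
                found.add((len(path) - 1) % 2)  # parity of the number of edges
                return
            for w in adj[last]:
                # w must be new and adjacent to no earlier path vertex (no chord).
                if w not in path and all(w not in adj[u] for u in path[:-1]):
                    extend(path + [w])

        extend([x])
        return found

    vertices = list(adj)
    return all(len(parities(x, y)) <= 1
               for x in vertices for y in vertices if x != y)


# Small test graphs: a triangle, a 4-cycle and a 5-cycle.
triangle = {0: {1, 2}, 1: {0, 2}, 2: {0, 1}}
c4 = {0: {1, 3}, 1: {0, 2}, 2: {1, 3}, 3: {0, 2}}
c5 = {i: {(i - 1) % 5, (i + 1) % 5} for i in range(5)}
```

The triangle and the 4-cycle satisfy the condition, whereas in the 5-cycle two non-adjacent vertices are joined by chordless paths of lengths 2 and 3.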
Parity graphs can be characterized as follows.
THEOREM 10 [7]. Parity graphs are exactly those graphs that can be split decomposed into cliques and bipartite graphs.
An immediate consequence is the following.
COROLLARY 6. Parity graphs can be recognized in linear time.
We can improve the result of the parallel time bound.
LEMMA 36. Let $L_i$ be defined as in the parallel split decomposition algorithm. In a parity graph, each layer $L_i$ is a cograph, i.e., a graph that has no induced path of length 3.
.
Proof.Suppose L has an induced path u,u,u,u of length three.
i 1 2 3 4
.
Then u,
¨
,u is a path of length two and the other path is of length
1 i 4
four.This is a contradiction to the assumption that we are given a parity
graph.
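The cograph condition of Lemma 36 (no induced path of length 3) can be tested by brute force on small layers. The following sketch assumes the layer is given as a hypothetical dictionary `adj` of neighbor sets and simply tries all ordered quadruples, so it takes $O(n^4)$ time and is meant only as an illustration of the definition.

```python
from itertools import permutations


def has_induced_p4(adj):
    """Return True iff some four vertices a-b-c-d induce a path with three edges."""
    for a, b, c, d in permutations(adj, 4):
        path_edges = b in adj[a] and c in adj[b] and d in adj[c]
        no_chords = c not in adj[a] and d not in adj[a] and d not in adj[b]
        if path_edges and no_chords:
            return True
    return False


def is_cograph_bruteforce(adj):
    return not has_induced_p4(adj)


# Small test graphs: a path on four vertices and a 4-cycle.
p4 = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2}}
c4 = {0: {1, 3}, 1: {0, 2}, 2: {1, 3}, 3: {0, 2}}
```

The path on four vertices is rejected, while the 4-cycle is accepted because each of its paths on four vertices has a chord.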
Next we can also characterize cographs as follows.
LEMMA 37 (see for example [10]). A graph is a cograph if and only if all its modules are degenerated modules.
The modular tree of a cograph is identical to its cotree.
LEMMA 38 ([13]; compare also [25]). The cotree of a cograph can be computed in $O(\log^2 n)$ time with a linear processor number on a CREW-PRAM. It can be checked in the same bounds whether a graph is a cograph.
As a final result, we get
THEOREM 11. Parity graphs can be recognized in $O(\log^2 n)$ time with a linear processor number on a CREW-PRAM.
Proof. We determine first the layers $L_i$. Then we check whether each $L_i$ is a cograph, and if this is the case then we compute, for each $L_i$, a cotree (which is also a modular tree). This part replaces the modular decomposition of each $L_i$. Then in the rest of the split decomposition procedure, we continue as in the general split decomposition algorithm. Finally, we check for each component whether it is a clique or bipartite. The only step in the general split decomposition procedure that could not be done in the time bound of $O(\log^2 n)$ on a CREW-PRAM is the general modular decomposition. But this has been replaced by cograph recognition and cotree computation. It can be checked in the same bounds as the computation of the connected components that a graph is bipartite. This proves the theorem.
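The final test of the proof, whether every component of the split decomposition is a clique or bipartite, can be sketched sequentially. The sketch assumes each component is already available as a hypothetical dictionary of neighbor sets without self-loops; computing the components is not shown, and the bipartiteness test is an ordinary breadth-first 2-coloring rather than a PRAM procedure.

```python
from collections import deque


def is_clique(adj):
    """True iff every vertex is adjacent to all other vertices."""
    n = len(adj)
    return all(len(adj[v] - {v}) == n - 1 for v in adj)


def is_bipartite(adj):
    """True iff the graph admits a proper 2-coloring."""
    colour = {}
    for start in adj:
        if start in colour:
            continue
        colour[start] = 0
        queue = deque([start])
        while queue:
            u = queue.popleft()
            for w in adj[u]:
                if w not in colour:
                    colour[w] = 1 - colour[u]
                    queue.append(w)
                elif colour[w] == colour[u]:
                    return False  # an edge inside one colour class
    return True


def components_certify_parity(components):
    """Apply the clique-or-bipartite test of Theorem 10 to every component."""
    return all(is_clique(c) or is_bipartite(c) for c in components)


# Small test components: a triangle, a 4-cycle and a 5-cycle.
triangle = {0: {1, 2}, 1: {0, 2}, 2: {0, 1}}
c4 = {0: {1, 3}, 1: {0, 2}, 2: {1, 3}, 3: {0, 2}}
c5 = {i: {(i - 1) % 5, (i + 1) % 5} for i in range(5)}
```

A decomposition consisting of the triangle and the 4-cycle passes the test, while any decomposition that contains the 5-cycle as a component fails it.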
Remark 1. There is a previous paper dealing with a parallel algorithm to recognize parity graphs in the same bounds as in the theorem [16]. The algorithm did not use the results of [7].
6. CONCLUSIONS
Undirected split decomposition also became interesting in connection with recognizing circle graphs (see for example [15, 23, 30]). It is well known that it is sufficient to check the circle graph property for the prime components. It might be interesting to find linear time algorithms or almost linear workload parallel algorithms to check whether a certain prime graph is a circle graph.
REFERENCES
1. K. Abrahamson, N. Dadoun, D. Kirkpatrick, and T. Przyticka, A simple parallel tree contraction algorithm, J. Algorithms 10 (1988), 287–302.
2. M. Atallah, M. Goodrich, and S. R. Kosaraju, Parallel algorithms for evaluating sequences of set manipulation operations, J. ACM 41 (1994), 1049–1085.
3. R. Anderson and G. Miller, Deterministic parallel list ranking, Algorithmica 6 (1991), 859–868.
4. A. Barten, "Design of Very Fast Parallel Algorithms in the Combinatorial Optimization," Diploma thesis, RWTH Aachen, 1989 [in German].
5. A. Bouchet, Reducing prime graphs and recognizing circle graphs, Combinatorica 7 (1987), 243–254.
6. F. Chin, J. Lam, and I. Chen, Efficient parallel algorithms for some graph problems, Comm. ACM 25 (1982), 659–665.
7. S. Cicerone and D. Di Stefano, On the extension of bipartite graphs to parity graphs, Discrete Appl. Math. 95 (1999), 181–195.
8. R. Cole, Parallel merge sort, in "Proceedings, 27th IEEE-FOCS, 1986," pp. 511–516.
9. T. Cormen, C. Leiserson, and R. Rivest, "Introduction into Algorithms," MIT Press, Cambridge, MA, 1990.
10. D. Corneil, H. Lerchs, and L. Burlingham, Complement reducible graphs, Discrete Appl. Math. 3 (1981), 163–174.
11. A. Cournier and M. Habib, A new linear algorithm for modular decomposition, in "CAAP '94: 19th International Colloquium," Lecture Notes in Computer Science (Sophie Tison, Ed.), Vol. 787, pp. 68–82, Springer-Verlag, New York/Berlin, 1994.
12. W. Cunningham, Decomposition of directed graphs, SIAM J. Algebraic and Discrete Methods 3 (1982), 214–228.
13. E. Dahlhaus, Efficient parallel algorithms to recognize cographs and distance hereditary graphs, Discrete Appl. Math. 57 (1995), 29–44.
14. E. Dahlhaus, Fast parallel algorithm for the single link heuristics of hierarchical clustering, in "Proceedings of the Fourth IEEE Symposium on Parallel and Distributed Processing, 1992," pp. 184–186.
15. E. Dahlhaus, Fast parallel recognition of ultrametrics and tree metrics, SIAM J. Discrete Math. 6 (1993), 523–532.
16. E. Dahlhaus, An efficient parallel recognition algorithm of parity graphs, in "ICCI 93" (O. Abou-Rabia et al., Eds.), pp. 82–86.
17. E. Dahlhaus, Efficient parallel modular decomposition, extended abstract, in "Graph-Theoretic Concepts in Computer Science, 21th International Workshop WG '95" (Nagl et al., Eds.), Lecture Notes in Computer Science, Vol. 1017, pp. 290–302, Springer-Verlag, New York/Berlin, 1995.
18. E. Dahlhaus, Efficient parallel and linear time split decomposition, in "14th FST–TCS" (P. Thiagarajan, Ed.), Lecture Notes in Computer Science, Vol. 880, pp. 171–180, Springer-Verlag, New York/Berlin, 1994.
19. E. Dahlhaus, J. Gustedt, and R. McConnell, Efficient and practical modular decomposition, in "Eighth Annual ACM–SIAM Symposium on Discrete Algorithms, 1997," pp. 26–35.
20. R. Dubes and A. Jain, "Algorithms for Clustering Data," Prentice–Hall, Englewood Cliffs, New Jersey, 1988.
21. A. Gibbons and W. Rytter, "Efficient Parallel Algorithms," Cambridge Univ. Press, Cambridge, UK, 1989.
22. M. Golumbic, "Algorithmic Graph Theory and Perfect Graphs," Academic Press, New York, 1980.
23. C. Gabor, K. Supowit, and W. Hsu, Recognizing circle graphs in polynomial time, J. ACM 36 (1989), 435–473.
24. P. Hammer and F. Maffray, Completely separable graphs, Discrete Appl. Math. 27 (1990), 85–99.
25. X. He, Parallel algorithm for cograph recognition with applications, J. Algorithms 15 (1993), 284–313.
26. P. Klein, Efficient parallel algorithms for chordal graphs, in "29th IEEE–FOCS, 1988," pp. 150–161.
.
27.R.Ladner and M.Fischer,Parallel prefix computation,J.ACM 27 1980,831]838.

2
.
28.T.Ma and J.Spinrad,An O n -algorithm for undirected split decomposition,J.
.
Algorithms 16 1994,145]160.
29.R.McConnell and J.Spinrad,Linear-time modular decomposition and efficient transitive
orientation of comparability graphs,in``Fifth Annual ACM]SIAM Symposium of Dis-
crete Algorithms,1994,''pp.536]545.
.
30.T.Przyticka and D.Corneil,Parallel algorithms for parity graphs,J.Algorithms 12 1991,
96]109.
.
31.Y.Shiloach and U.Vishkin,An O log n parallel connectivity algorithm,J.Algorithms 3
.
1982,57]67.
.
32.J.Spinrad,Recognition of circle graphs,J.Algorithms 16 1994,264]282.
33.R.Tarjan and U.Vishkin,Finding biconnected components in logarithmic parallel time,
.
SIAM-J.Computing 14 1984,862]874.
34.H.Wagener,Triangulating a monotone polygon in parallel,in``Computational Geometry
and Its Applications,''Lecture Notes in Computer Science,Vol.333,pp.136]142,
Springer-Verlag,New YorkrBerlin,1988.