
Triplet and Quartet Distances Between Trees of Arbitrary Degree

Gerth Stølting Brodal, Aarhus University
Rolf Fagerberg, University of Southern Denmark
Thomas Mailund, Christian N. S. Pedersen, Andreas Sand, Aarhus University, Bioinformatics Research Center

ACM-SIAM Symposium on Discrete Algorithms, New Orleans, Louisiana, USA, 8 January 2013

Evolutionary Tree

[Figure: rooted evolutionary tree with a time axis, and the corresponding unrooted tree, on the leaves Bonobo, Chimpanzee, Human, Neanderthal, Gorilla, Orangutan]

The dominant modern approach to studying evolution is DNA analysis

Constructing Evolutionary Trees: Binary or Arbitrary Degrees?

Sequence data → Distance matrix (rows and columns 1, 2, 3, ∙∙∙, n) → tree on leaves 1, 2, 3, ∙∙∙, n

Neighbor Joining (Saitou, Nei 1987) [ $O(n^3)$ Saitou, Nei 1987 ]
    Binary trees (despite no evidence in distance data)

Buneman Trees (Buneman 1971) [ $O(n^3)$ Berry, Bryant 1999 ]
    Arbitrary degrees (strong support for all edges; few branches)

Refined Buneman Trees (Moulton, Steel 1999) [ $O(n^3)$ Brodal et al. 2003 ]
    Arbitrary degree (compromise; good support for all edges)

Data Analysis vs Expert Trees: Binary vs Arbitrary Degrees?

[Figure: two trees of the Tupi language family, one from the linguistic expert classification (Aryon Rodrigues) and one from Neighbor Joining on linguistic data]

R. S. Walker, S. Wichmann, T. Mailund, C. J. Atkisson. Cultural Phylogenetics of the Tupi Language Family in Lowland South America. PLoS One, 7(4), 2012.

Evolutionary Tree Comparison?

[Figure: two unrooted trees T1 and T2 on leaves 1-8; one edge is marked with the split 1357|2468 that it induces]

Common        Only T1        Only T2
1357|2468     35|124678      57|123468
13567|248     48|123567

Robinson-Foulds distance = # non-common splits = 2 + 1 = 3

[Day 1985] $O(n)$ time algorithm using 2 x DFS + radix sort

D. F. Robinson and L. R. Foulds. Comparison of weighted labeled trees. In Combinatorial mathematics, VI, Lecture Notes in Mathematics, pages 119-126. Springer, 1979.
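The split counting above can be sketched in a few lines of Python. This is a minimal set-based illustration, not Day's linear-time algorithm: the nested-tuple tree encoding, the function names, and the two example trees (reconstructed here from the split table, since the original figure is not available) are all assumptions.

```python
def splits(tree):
    """Non-trivial splits of an unrooted tree given as nested tuples.

    Each split is stored as the side that does NOT contain the smallest leaf,
    so the same split is represented identically in both trees.
    """
    leaves = set()
    clusters = []

    def walk(node):
        if not isinstance(node, tuple):
            leaves.add(node)
            return frozenset([node])
        cluster = frozenset().union(*(walk(child) for child in node))
        clusters.append(cluster)
        return cluster

    walk(tree)
    ref = min(leaves)
    result = set()
    for cluster in clusters:
        side = cluster if ref not in cluster else frozenset(leaves - cluster)
        # Keep only splits with at least two leaves on each side.
        if 2 <= len(side) <= len(leaves) - 2:
            result.add(side)
    return result


def rf_distance(t1, t2):
    """Number of splits that occur in exactly one of the two trees."""
    return len(splits(t1) ^ splits(t2))


# Hypothetical trees consistent with the split table on the slide.
T1 = (((3, 5), 1, 7), 6, (2, (4, 8)))
T2 = ((1, 3, (5, 7)), 6, (2, 4, 8))
print(rf_distance(T1, T2))  # 3
```

The set-based version spends time proportional to the total size of all clusters, so it is only meant to make the definition concrete.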

Robinson-Foulds Distance (unrooted trees)

[Figure: unrooted trees T1 and T2 on leaves 1-8 that differ only in where leaf 8 is attached]

Common    Only T1        Only T2
(none)    12567|348      125678|34
          1257|3468      12578|346
          157|23468      1578|2346
          57|123468      578|12346
                         78|123456

RF-dist(T1, T2) = 4 + 5 = 9
RF-dist(T1 \ {8}, T2 \ {8}) = 0

Robinson-Foulds is very sensitive to outliers

Quartet Distance (unrooted trees)

Consider all $\binom{n}{4}$ quartets, i.e. topologies of subsets of 4 leaves {i, j, k, l}

resolved: ij|kl
unresolved: ijkl (only non-binary trees)

[Figure: unrooted trees T1 and T2 on leaves 1-5]

Quartet      T1       T2
{1,2,3,4}    14|23    14|23
{1,2,3,5}    13|25    15|23
{1,2,4,5}    14|25    1245
{1,3,4,5}    14|35    1345
{2,3,4,5}    25|34    23|45

Quartet-dist(T1, T2) = $\binom{n}{4}$ - # common quartets = 5 - 1 = 4

G. Estabrook, F. McMorris, and C. Meacham. Comparison of undirected phylogenetic trees based on subtrees of four evolutionary units. Systematic Zoology, 34:193-200, 1985.
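As a brute-force check of the quartet table, the sketch below enumerates all quartets and compares their topologies. It is not the algorithm of the talk. It assumes trees given as edge lists with unit edge lengths and decides each topology with the four-point condition (the pairing with the strictly smallest distance sum is the resolved one; three equal sums mean unresolved). The edge lists are reconstructed from the table, and the internal node names "a", "b", "c" are made up.

```python
from collections import deque
from itertools import combinations


def leaf_distances(edges):
    """Return (sorted leaves, BFS hop distances from every leaf)."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, []).append(v)
        adj.setdefault(v, []).append(u)
    leaves = sorted(x for x in adj if len(adj[x]) == 1)
    dist = {}
    for source in leaves:
        seen = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for w in adj[u]:
                if w not in seen:
                    seen[w] = seen[u] + 1
                    queue.append(w)
        dist[source] = seen
    return leaves, dist


def quartet_topology(dist, a, b, c, d):
    """Return the resolved pairing as a frozenset of pairs, or None if unresolved."""
    pairings = [((a, b), (c, d)), ((a, c), (b, d)), ((a, d), (b, c))]
    sums = [dist[p[0]][p[1]] + dist[q[0]][q[1]] for p, q in pairings]
    smallest = min(sums)
    if sums.count(smallest) != 1:
        return None  # all three sums equal: star quartet
    p, q = pairings[sums.index(smallest)]
    return frozenset((frozenset(p), frozenset(q)))


def quartet_distance(edges1, edges2):
    """Number of quartets whose topology differs between the two trees."""
    leaves, dist1 = leaf_distances(edges1)
    leaves2, dist2 = leaf_distances(edges2)
    assert leaves == leaves2, "both trees must have the same leaf set"
    return sum(
        quartet_topology(dist1, *q) != quartet_topology(dist2, *q)
        for q in combinations(leaves, 4)
    )


# Hypothetical edge lists consistent with the quartet table on the slide.
T1_EDGES = [(1, "a"), (4, "a"), ("a", "b"), (3, "b"), ("b", "c"), (2, "c"), (5, "c")]
T2_EDGES = [(2, "a"), (3, "a"), ("a", "b"), (1, "b"), (4, "b"), (5, "b")]
print(quartet_distance(T1_EDGES, T2_EDGES))  # 4
```

Enumerating all quartets takes time proportional to $n^4$, which is exactly the cost the algorithms in the talk avoid.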

Triplet Distance (rooted trees)

Consider all $\binom{n}{3}$ triplets, i.e. topologies of subsets of 3 leaves {i, j, k}

resolved: k|ij
unresolved: ijk (only non-binary trees)

[Figure: rooted trees T1 and T2 on leaves 1-5]

Triplet    T1      T2
{1,2,3}    2|13    2|13
{1,2,4}    1|24    4|12
{1,2,5}    1|25    5|12
{1,3,4}    4|13    4|13
{1,3,5}    5|13    5|13
{1,4,5}    1|45    1|45
{2,3,4}    3|24    4|23
{2,3,5}    3|25    5|23
{2,4,5}    5|24    2|45
{3,4,5}    3|45    3|45

Triplet-dist(T1, T2) = $\binom{n}{3}$ - # common triplets = 10 - 5 = 5

D. E. Critchlow, D. K. Pearl, C. L. Qian: The triples distance for rooted bifurcating phylogenetic trees. Systematic Biology, 45(3):323-334, 1996.
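The triplet table can be reproduced by brute force. The sketch below is an illustration only, cubic in the number of leaves: the nested-tuple encoding and function names are assumptions, and the two trees are reconstructed from the table because the original figure is not available.

```python
from itertools import combinations


def root_paths(tree, prefix=()):
    """Map each leaf to its path of child indices from the root."""
    if not isinstance(tree, tuple):
        return {tree: prefix}
    paths = {}
    for index, child in enumerate(tree):
        paths.update(root_paths(child, prefix + (index,)))
    return paths


def lca_depth(p, q):
    """Depth of the lowest common ancestor = length of the common path prefix."""
    depth = 0
    for a, b in zip(p, q):
        if a != b:
            break
        depth += 1
    return depth


def triplet_topology(paths, i, j, k):
    """Return the pair with the deepest LCA, or None if the triplet is unresolved."""
    depths = {
        frozenset((i, j)): lca_depth(paths[i], paths[j]),
        frozenset((i, k)): lca_depth(paths[i], paths[k]),
        frozenset((j, k)): lca_depth(paths[j], paths[k]),
    }
    deepest = max(depths.values())
    closest = [pair for pair, depth in depths.items() if depth == deepest]
    return closest[0] if len(closest) == 1 else None


def triplet_distance(t1, t2):
    """Number of triplets whose topology differs between the two rooted trees."""
    p1, p2 = root_paths(t1), root_paths(t2)
    return sum(
        triplet_topology(p1, *t) != triplet_topology(p2, *t)
        for t in combinations(sorted(p1), 3)
    )


# Hypothetical rooted trees consistent with the triplet table on the slide.
T1 = ((1, 3), ((2, 4), 5))
T2 = (((1, 3), 2), (4, 5))
print(triplet_distance(T1, T2))  # 5
```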

Computational Results

                Rooted Triplet distance          Unrooted Quartet distance
Binary          $O(n^2)$       CPQ 1996          $O(n^3)$           D 1985
                $O(n \log n)$  [SODA 2013]       $O(n^2)$           BTKL 2000
                                                 $O(n \log^2 n)$    BFP 2001
                                                 $O(n \log n)$      BFP 2003
Degrees ≤ d     $O(n^2)$       BDF 2011          $O(d^9 n \log n)$  SPMBF 2007
                $O(n \log n)$  [SODA 2013]       $O(n^{2.688})$     NKMP 2011
                                                 $O(d\, n \log n)$  [SODA 2013]

[Figure: small example trees with numbered leaves]

Distance Computation

                  T2 resolved               T2 unresolved
T1 resolved       A: Agree, B: Disagree     C
T1 unresolved     D                         E

[Figure: example triplet topologies on leaves i, j, k for each of the cases A-E]

Triplet-dist(T1, T2) = $\binom{n}{3} - A - E = B + C + D$

$A + B + C + D + E = \binom{n}{3}$

$D + E$ and $C + E$ unresolved in one tree

Sufficient to compute A and E, or A and B

Parameterized Triplet & Quartet Distances

                  T2 resolved               T2 unresolved
T1 resolved       A: Agree, B: Disagree     C
T1 unresolved     D                         E

$B + \alpha (C + D)$, $0 \le \alpha \le 1$

BDF 2011: $O(n^2)$ for triplet; NKMP 2011: $O(n^{2.688})$ for quartet
[SODA 13]: $O(n \log n)$ and $O(d\, n \log n)$, respectively
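To make the five categories and the parameterized distance concrete, here is a brute-force sketch for rooted triplets. The tree encoding, function names, and both example trees are assumptions chosen for illustration; the talk's algorithms obtain the same counts without enumerating triplets.

```python
from itertools import combinations


def root_paths(tree, prefix=()):
    """Map each leaf to its path of child indices from the root."""
    if not isinstance(tree, tuple):
        return {tree: prefix}
    paths = {}
    for index, child in enumerate(tree):
        paths.update(root_paths(child, prefix + (index,)))
    return paths


def common_prefix(p, q):
    """Length of the shared prefix of two root paths (the LCA depth)."""
    n = 0
    while n < min(len(p), len(q)) and p[n] == q[n]:
        n += 1
    return n


def triplet_topology(paths, i, j, k):
    """Pair with the deepest LCA, or None for an unresolved triplet."""
    pairs = [(i, j), (i, k), (j, k)]
    depths = [common_prefix(paths[a], paths[b]) for a, b in pairs]
    deepest = max(depths)
    if depths.count(deepest) != 1:
        return None
    return frozenset(pairs[depths.index(deepest)])


def category_counts(t1, t2):
    """Count triplets per category A..E of the resolved/unresolved table."""
    p1, p2 = root_paths(t1), root_paths(t2)
    counts = dict.fromkeys("ABCDE", 0)
    for triple in combinations(sorted(p1), 3):
        top1 = triplet_topology(p1, *triple)
        top2 = triplet_topology(p2, *triple)
        if top1 is None and top2 is None:
            counts["E"] += 1      # unresolved in both trees
        elif top1 is None:
            counts["D"] += 1      # unresolved in T1 only
        elif top2 is None:
            counts["C"] += 1      # unresolved in T2 only
        elif top1 == top2:
            counts["A"] += 1      # resolved and agreeing
        else:
            counts["B"] += 1      # resolved and disagreeing
    return counts


def parameterized_distance(counts, alpha):
    """B + alpha * (C + D) for 0 <= alpha <= 1."""
    return counts["B"] + alpha * (counts["C"] + counts["D"])


# Hypothetical example: T1 has an unresolved root, T2 is binary.
T1 = (((1, 2), 3), 4, 5)
T2 = ((1, (2, 3)), (4, 5))
counts = category_counts(T1, T2)
print(counts)                               # {'A': 6, 'B': 1, 'C': 0, 'D': 3, 'E': 0}
print(parameterized_distance(counts, 0.5))  # 2.5
```

With $\alpha = 1$ the value equals $B + C + D$, the plain triplet distance; with $\alpha = 0$ only disagreeing resolved triplets are counted.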

Counting Unresolved Triplets in One Tree

[Figure: node v with child subtrees of sizes $n_1, n_2, n_3, \ldots, n_d$]

Triplets anchored at v:

$$\sum_{v} \sum_{i<j<k} n_i \cdot n_j \cdot n_k$$

Quartets anchored at v (root tree arbitrary):

$$\sum_{v} \left( \sum_{i<j<k<l} n_i \cdot n_j \cdot n_k \cdot n_l + \Bigl( n - \sum_{l} n_l \Bigr) \sum_{i<j<k} n_i \cdot n_j \cdot n_k \right)$$

Computable in $O(n)$ time using DFS + dynamic programming
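The triplet sum above can be evaluated in one traversal. The sketch below is one possible way to do it, assuming a nested-tuple tree encoding: at each node it maintains the elementary symmetric sums $e_1, e_2, e_3$ of the child sizes, so that $e_3 = \sum_{i<j<k} n_i n_j n_k$ costs time linear in the degree. The function name and example trees are made up.

```python
def count_unresolved_triplets(tree):
    """Sum over all nodes v of sum_{i<j<k} n_i * n_j * n_k (child subtree sizes)."""
    total = 0

    def size(node):
        nonlocal total
        if not isinstance(node, tuple):
            return 1
        e1 = e2 = e3 = 0  # elementary symmetric sums of the child sizes seen so far
        for child in node:
            s = size(child)
            e3 += e2 * s  # triples using the new child plus an earlier pair
            e2 += e1 * s  # pairs using the new child plus an earlier child
            e1 += s
        total += e3
        return e1

    size(tree)
    return total


print(count_unresolved_triplets((1, 2, 3, 4)))            # 4
print(count_unresolved_triplets((((1, 2), 3), 4, 5)))     # 3
print(count_unresolved_triplets(((1, 3), ((2, 4), 5))))   # 0
```

A star on four leaves has all $\binom{4}{3} = 4$ triplets unresolved, and a binary tree has none. The recursion is fine for small examples; very deep trees would need an explicit stack.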

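Counting Agreeing Triplets (Basic Idea)

[Figure: node v in T1 with child subtrees colored 1, ..., i, ..., j, ..., d and all other leaves colored 0; node w in T2 with a child c, annotated with $\sum_{1 \le i \le d} n_i^w$]

For node v in T1 and node w in T2 with children c:

$$\sum_{1 \le i \le d} \sum_{c} \binom{n_i^c}{2} \left( n^w - n^c - n_i^w + n_i^c \right)$$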

Efficient Computation

Limit recolorings in T1 (and T2) to $O(n \log n)$

[Figure: recursion at a node v of T1 with leaves colored 0, 1, 2, ..., d; panels: Count T2 contribution (precondition); Recolor; Recolor; Recurse; Recolor & recurse]

Reduce recoloring cost in T2 to $O(n \log^2 n)$
    T2: arbitrary height, degree → H(T2): binary, height $O(\log n)$

[Figure: T2 and H(T2) on leaves 1-9]

Reduce recoloring cost in T2 from $O(n \log^2 n)$ to $O(n \log n)$
    Contract T2 and reconstruct H(T2) during recursion
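The $O(n \log n)$ bound on recolorings rests on the usual smaller-half argument: a leaf is only recolored when it lies in a child subtree that is not the largest one, and each time that happens the enclosing subtree at least doubles in size. The sketch below only counts that quantity; it is an illustration under an assumed nested-tuple encoding, not the recoloring procedure of the talk, and the function name is hypothetical.

```python
import math


def recoloring_cost(tree):
    """Return (n, cost), where cost = sum over nodes of the leaves outside the largest child."""
    cost = 0

    def size(node):
        nonlocal cost
        if not isinstance(node, tuple):
            return 1
        sizes = [size(child) for child in node]
        # All children except a largest one are treated as "small" and recolored.
        cost += sum(sizes) - max(sizes)
        return sum(sizes)

    n = size(tree)
    return n, cost


balanced = (((1, 2), (3, 4)), ((5, 6), (7, 8)))
caterpillar = (((((((1, 2), 3), 4), 5), 6), 7), 8)
for tree in (balanced, caterpillar):
    n, cost = recoloring_cost(tree)
    print(n, cost, cost <= n * math.log2(n))
# prints: 8 12 True
#         8 7 True
```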

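Counting Agreeing Triplets (II)

[Figure: node v in T1 with child subtrees colored 1, ..., i, ..., j, ..., d and all other leaves colored 0]

Node in H(T2) = component composition in T2

Contribution to agreeing triplets at node in H(T2), composing components $C_1$ and $C_2$:

$$\sum_{1 \le i \le d} \binom{n_i^{C_1}}{2} \left( n^{C_2} - n_i^{C_2} \right) + \sum_{1 \le i \le d} \left( n^{C_1} - n_i^{C_1} \right) n_{(ii)}^{C_2} + \sum_{1 \le i \le d} n_i^{C_1} \cdot n_{i\,\text{[…]}}^{C_2}$$

[Figure: the three cases for the placement of leaves of colors i and j in the components $C_1$ and $C_2$]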

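From $O(n \log^2 n)$ to $O(n \log n)$

Update $O(1)$ counters for all colors through node

Colored path lengths: [Formula: a sum over the colors $2 \le i \le d$ of a logarithmic term in $|T_2|$ and $n_i$, related to $\sum_{2 \le i \le d} n_i \log \frac{n_v}{n_i}$]

[Figure: node v in T1 with subtree size $n_v$ and child subtrees colored 1, ..., i, ..., j, ..., d of sizes $n_i$, all other leaves colored 0; node w in H(T2)]

H(T2): compressed version of T2 of size $O(n_v)$

Total cost for updating counters:

$$\sum_{\text{leaf } l \in T_1} \; \sum_{\substack{\text{ancestor } a(j) \\ \text{not heavy child}}} \log \frac{n_{a(j+1)}}{n_{a(j)}} = O(n \cdot \log n)$$

[Figure: path $l = a(0), a(1), a(2), a(3), a(4), a(5)$ from a leaf l towards the root of T1]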
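The double sum above can be evaluated directly on a small tree. In the sketch below, each non-heavy child c of a node v contributes $n_c \cdot \log_2(n_v / n_c)$, one term per leaf below c, which is the same sum grouped by tree edge instead of by leaf. Because the logarithms along a root path telescope, the total never exceeds $n \log_2 n$. The encoding, the function name, and the choice of base 2 are assumptions.

```python
import math


def counter_update_cost(tree):
    """Return (n, total) with total = sum over leaves and light ancestors of log2(n_parent / n_child)."""
    total = 0.0

    def size(node):
        nonlocal total
        if not isinstance(node, tuple):
            return 1
        sizes = [size(child) for child in node]
        n_v = sum(sizes)
        heavy = sizes.index(max(sizes))  # one largest child is the heavy child
        for index, n_c in enumerate(sizes):
            if index != heavy:
                # Every leaf below this light child pays log2(n_v / n_c).
                total += n_c * math.log2(n_v / n_c)
        return n_v

    n = size(tree)
    return n, total


n, total = counter_update_cost((((1, 2), (3, 4)), ((5, 6), (7, 8))))
print(n, total, total <= n * math.log2(n))  # 8 12.0 True
```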

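Counting Quartets...

Bottleneck in computing disagreeing resolved-resolved quartets

[Figure: node v in T1 with child subtrees colored 1, ..., i, ..., j, ..., d and all other leaves colored 0; components $G_1$ and $G_2$ of T2 containing leaves of colors i and j]

$$\sum_{1 \le i < d} \; \sum_{i < j \le d} n_{(ij)}^{G_1} \cdot n_{(ij)}^{G_2}$$

double-sum → factor d time

Root T1 and T2 arbitrary

Keep up to 15+38d different counters per node in H(T2)...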


Summary

                Rooted Triplet distance          Unrooted Quartet distance
Binary          $O(n^2)$       CPQ 1996          $O(n^3)$           D 1985
                $O(n \log n)$  [SODA 2013]       $O(n^2)$           BTKL 2000
                                                 $O(n \log^2 n)$    BFP 2001
                                                 $O(n \log n)$      BFP 2003
Degrees ≤ d     $O(n^2)$       BDF 2011          $O(d^9 n \log n)$  SPMBF 2007
                $O(n \log n)$  [SODA 2013]       $O(n^{2.688})$     NKMP 2011
                                                 $O(d\, n \log n)$  [SODA 2013]

d = maximal degree of any node in T1 and T2

[Figure: small example trees with numbered leaves]

$O(n \log n)$ ?
$o(n \log n)$ ?