SplitsTree4 – a Java Framework for Phylogenetic Trees and Networks

Arya Mir, Software development

27 Apr. 2012


www-ab.informatik.uni-tuebingen.de

Daniel Huson
Copyright (c) 2008 Daniel Huson.
Permission is granted to copy, distribute and/or modify this document
under the terms of the GNU Free Documentation License, Version 1.2
or any later version published by the Free Software Foundation;
with no Invariant Sections, no Front-Cover Texts, and no Back-Cover
Texts. A copy of the license can be found at http://www.gnu.org/copyleft/fdl.html
Trees vs networks

• Evolutionary relationships are usually represented by phylogenetic trees
• But: real data contain different and/or conflicting signals, and thus do not always clearly support a unique tree
• Enter phylogenetic networks, as either:
  • simply a visualization of conflicting data, or
  • a more complex model of evolution containing events such as recombination or hybridization


Trees vs Networks

[Figure: "Tree of life" and "Web of life" (Doolittle, 1999). Domain labels: Bacteria, Eukaryotes, Archaea. Kingdom labels: Proteobacteria, Cyanobacteria, Fungi, Plants, Animals, Archezoa, Euryarchaeota, Crenarchaeota.]


Trees vs Networks

[Figure: Neisseria phylogeny (Eddie Holmes, 1999), shown once computed using split decomposition and once computed using Neighbor-Joining]


Trees and splits

The split encoding Σ(T) of a tree T: every edge e of T separates the taxa into two sets and so defines one split; Σ(T) is the set of all these splits.

[Figure: tree T on the taxa G1 to G8. The highlighted edge e defines the split G1, G3, G4, G6, G7 vs G2, G5, G8.]
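
As a minimal sketch of the split encoding, the Java snippet below removes each edge of a small unrooted tree in turn and records the two sets of taxa that result. The class name `SplitEncoding`, the adjacency-list representation and the four-taxon toy tree are assumptions made for illustration; none of them are taken from SplitsTree.

```java
import java.util.*;

public class SplitEncoding {
    /** Collects the taxa reachable from node v without stepping back to node 'from'. */
    static void collect(Map<String, List<String>> adj, String v, String from,
                        Set<String> taxa, Set<String> out) {
        if (taxa.contains(v)) out.add(v);
        for (String w : adj.get(v))
            if (!w.equals(from)) collect(adj, w, v, taxa, out);
    }

    /** Returns one split, written "A | B", per edge of the tree. */
    static List<String> splits(Map<String, List<String>> adj, Set<String> taxa) {
        List<String> result = new ArrayList<>();
        for (String u : adj.keySet())
            for (String v : adj.get(u))
                if (u.compareTo(v) < 0) { // visit each undirected edge once
                    Set<String> sideA = new TreeSet<>();
                    collect(adj, u, v, taxa, sideA);   // taxa on u's side of edge (u,v)
                    Set<String> sideB = new TreeSet<>(taxa);
                    sideB.removeAll(sideA);            // everything else
                    result.add(sideA + " | " + sideB);
                }
        return result;
    }

    static void addEdge(Map<String, List<String>> adj, String a, String b) {
        adj.computeIfAbsent(a, k -> new ArrayList<>()).add(b);
        adj.computeIfAbsent(b, k -> new ArrayList<>()).add(a);
    }

    public static void main(String[] args) {
        // toy tree (hypothetical): leaves G1, G2 attach to inner node x; G3, G4 to inner node y
        Map<String, List<String>> adj = new TreeMap<>();
        addEdge(adj, "G1", "x");
        addEdge(adj, "G2", "x");
        addEdge(adj, "x", "y");
        addEdge(adj, "G3", "y");
        addEdge(adj, "G4", "y");
        Set<String> taxa = new TreeSet<>(Arrays.asList("G1", "G2", "G3", "G4"));
        for (String s : splits(adj, taxa)) System.out.println(s);
    }
}
```

The four leaf edges give trivial splits (one taxon vs the rest), and the inner edge x-y gives `[G1, G2] | [G3, G4]`.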


Networks and splits

Cut-set of parallel edges defines split {A,B} vs rest


Splits and splits graphs

• Any given system Σ of splits can be represented by a splits graph G. Note that:
• G is a tree iff Σ is compatible (e.g. Neighbor-Joining)
• G is outer-labeled-planar iff Σ is circular (e.g. Neighbor-Net, Bryant & Moulton 2002)
• G is usually planar or only mildly non-planar if Σ is weakly compatible (e.g. Split Decomposition)
• G is always a subgraph of the n-dim. hypercube (e.g. recoding of sequences, spectral analysis, median networks, consensus networks, Z-super networks)

(Theory of splits worked out by Bandelt and Dress 1992)
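
The "G is a tree iff Σ is compatible" criterion can be checked directly. The sketch below uses the standard definition: two splits A|A' and B|B' are compatible when at least one of the four intersections A∩B, A∩B', A'∩B, A'∩B' is empty, and a split system is compatible when every pair is. Encoding one side of each split as a `BitSet` over taxa 0..n-1, and the class name `SplitCompatibility`, are assumptions made for this illustration.

```java
import java.util.*;

public class SplitCompatibility {
    /** Complement of side a within the taxon set {0, ..., n-1}. */
    static BitSet complement(BitSet a, int n) {
        BitSet c = (BitSet) a.clone();
        c.flip(0, n);
        return c;
    }

    /** Two splits are compatible iff at least one of the four side intersections is empty. */
    static boolean compatible(BitSet a, BitSet b, int n) {
        BitSet ac = complement(a, n), bc = complement(b, n);
        return !a.intersects(b) || !a.intersects(bc)
            || !ac.intersects(b) || !ac.intersects(bc);
    }

    /** A split system is compatible iff all pairs of its splits are compatible. */
    static boolean isCompatible(List<BitSet> splits, int n) {
        for (int i = 0; i < splits.size(); i++)
            for (int j = i + 1; j < splits.size(); j++)
                if (!compatible(splits.get(i), splits.get(j), n)) return false;
        return true;
    }

    /** Builds one side of a split from a list of taxon indices. */
    static BitSet side(int... taxa) {
        BitSet s = new BitSet();
        for (int t : taxa) s.set(t);
        return s;
    }

    public static void main(String[] args) {
        int n = 4;                 // taxa 0..3
        BitSet s01 = side(0, 1);   // {0,1} vs {2,3}
        BitSet s02 = side(0, 2);   // {0,2} vs {1,3}
        BitSet s0 = side(0);       // {0} vs {1,2,3}
        System.out.println(isCompatible(Arrays.asList(s01, s0), n));  // true: fits on one tree
        System.out.println(isCompatible(Arrays.asList(s01, s02), n)); // false: conflicting splits
    }
}
```

Checking circularity or weak compatibility takes more work and is not attempted here.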


SplitsTree 3.2

• Implements split decomposition and related methods
• First version developed with Rainer Wetzel in 1995
• Current version 3.2 in C++ using Tcl-Tk
• Runs under Linux, Unix, Windows and MacOS


Design criteria for SplitsTree4

• Must run on any machine with minimal installation requirements
• GUI for interactive use, command-line for pipelines
• Open system, decentralized plug-in concept
• Based on splits, also including quartets etc
• Based on Nexus file format, with support for most common formats


Data flow in SplitsTree

[Diagram: data blocks Taxa, Unaligned, Characters, Distances, Quartets, Trees, Splits and Graph, together with Assumptions, Analysis and Bootstrap]

• Taxa are represented e.g. by aligned sequences
• Transform characters into distances e.g. using Hamming distances
• Transform distances into splits e.g. using Neighbor-net
• Transform splits into unrooted or rooted graph or tree
• Every connector represents a data transformation (plug-in)
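
To make the "characters into distances" step concrete, here is a minimal sketch of a Hamming distance computation on aligned sequences. Normalizing by the alignment length, ignoring gaps and ambiguity codes, and the class name `HammingDistances` are all assumptions made for this example; the slides do not say how SplitsTree handles these details.

```java
public class HammingDistances {
    /** Fraction of aligned positions at which sequences a and b differ. */
    static double hamming(String a, String b) {
        if (a.length() != b.length())
            throw new IllegalArgumentException("sequences must be aligned to equal length");
        int diff = 0;
        for (int i = 0; i < a.length(); i++)
            if (a.charAt(i) != b.charAt(i)) diff++;
        return (double) diff / a.length();
    }

    /** Symmetric matrix of pairwise distances between all sequences. */
    static double[][] matrix(String[] seqs) {
        double[][] d = new double[seqs.length][seqs.length];
        for (int i = 0; i < seqs.length; i++)
            for (int j = i + 1; j < seqs.length; j++)
                d[i][j] = d[j][i] = hamming(seqs[i], seqs[j]);
        return d;
    }

    public static void main(String[] args) {
        // toy alignment (hypothetical data)
        String[] seqs = {"ACGT", "ACGA", "TCGA"};
        double[][] d = matrix(seqs);
        System.out.println(d[0][1]); // 0.25: one of four positions differs
        System.out.println(d[0][2]); // 0.5: two of four positions differ
    }
}
```

A matrix like this is the kind of input that a distances-to-splits or distances-to-tree transformation would consume.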


Writing a new transformation

A new tree-building method “BestTree” is provided to SplitsTree as follows:

    public class BestTree implements Distances2Tree {
        // returns true, if BestTree is applicable
        public boolean isApplicable (Taxa t, Distances d) { ... }

        // applies BestTree and returns the tree
        public Tree apply (Taxa t, Distances d) { ... }

        // communicating options to SplitsTree:
        int getOptionThreshold () { ... }
        void setOptionThreshold (int t) { ... }
    }
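
The slide shows options being exposed through a `getOption…`/`setOption…` naming pattern. One plausible way for a host program to pick up such options is Java reflection; the sketch below illustrates that idea only. The slides do not state how SplitsTree discovers options, and `OptionScanner` and `DemoMethod` are hypothetical names that stand in for the real classes.

```java
import java.lang.reflect.Method;
import java.util.*;

public class OptionScanner {
    /** Hypothetical plug-in with one option that follows the getOption/setOption pattern. */
    public static class DemoMethod {
        private int threshold = 5;
        public int getOptionThreshold() { return threshold; }
        public void setOptionThreshold(int t) { threshold = t; }
    }

    /** Lists names X for which the class has public getOptionX() and setOptionX(value). */
    static List<String> optionNames(Class<?> c) {
        Set<String> getters = new TreeSet<>(), setters = new TreeSet<>();
        for (Method m : c.getMethods()) {
            String name = m.getName();
            if (name.startsWith("getOption") && m.getParameterCount() == 0)
                getters.add(name.substring("getOption".length()));
            else if (name.startsWith("setOption") && m.getParameterCount() == 1)
                setters.add(name.substring("setOption".length()));
        }
        getters.retainAll(setters); // keep only options that can be both read and written
        return new ArrayList<>(getters);
    }

    public static void main(String[] args) throws Exception {
        DemoMethod plugin = new DemoMethod();
        System.out.println(optionNames(DemoMethod.class)); // [Threshold]

        // set the option by name, as a GUI or a command-line parser might
        Method setter = DemoMethod.class.getMethod("setOptionThreshold", int.class);
        setter.invoke(plugin, 10);
        System.out.println(plugin.getOptionThreshold());   // 10
    }
}
```

With a convention like this, a plug-in author only writes getter/setter pairs, and the host can build option dialogs or command-line switches without further registration code.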


SplitsTree Windows

[Screenshots: Main Window and Method Window, with tabs Taxa, Unaligned, Characters, Distances, Quartets, Trees, Splits]


SplitsTree Editor



Individual Gene Trees

[Figures: one gene tree per slide]

• ITS00: 46 taxa
• ITS03: 40 taxa
• SSU00: 29 taxa
• SSU03: 40 taxa
• Gpd03: 40 taxa


Gene Trees as Super Network

[Figures: one super network per slide. Slide labels as far as they can be recovered:]

• ITS00 + ITS03
• ITS03 + SSU00
• ITS00 + SSU03
• ITS00 + ITS03 + SSU03 + Gpd03
• ITS00 + ITS03 + SSU00 + SSU03 + Gpd03


Exponential Explosion

• Methods like the consensus network, Z-super network or bootstrap-network only produce a polynomial number of splits
• The number of nodes and edges of the corresponding splits graph can grow exponentially…
• How to deal with this?


Incompatibility Graph IG(Σ)

• Nodes: splits
• Edges: pairs of incompatible splits

Note: A 3-cube in the splits graph corresponds to a 3-clique in IG(Σ)
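
A minimal sketch of the incompatibility graph: one node per split, one edge per incompatible pair, followed by a count of 3-cliques (triangles), which per the note above correspond to 3-cubes in the splits graph. The `BitSet` encoding of splits over taxa 0..n-1, the adjacency-matrix representation and the class name `IncompatibilityGraph` are assumptions made for illustration.

```java
import java.util.*;

public class IncompatibilityGraph {
    /** Two splits are incompatible iff all four side intersections are non-empty. */
    static boolean incompatible(BitSet a, BitSet b, int n) {
        BitSet ac = (BitSet) a.clone();
        ac.flip(0, n);
        BitSet bc = (BitSet) b.clone();
        bc.flip(0, n);
        return a.intersects(b) && a.intersects(bc)
            && ac.intersects(b) && ac.intersects(bc);
    }

    /** adj[i][j] is true iff splits i and j are incompatible. */
    static boolean[][] build(List<BitSet> splits, int n) {
        int k = splits.size();
        boolean[][] adj = new boolean[k][k];
        for (int i = 0; i < k; i++)
            for (int j = i + 1; j < k; j++)
                adj[i][j] = adj[j][i] = incompatible(splits.get(i), splits.get(j), n);
        return adj;
    }

    /** Counts 3-cliques, i.e. triples of pairwise incompatible splits. */
    static int countTriangles(boolean[][] adj) {
        int count = 0;
        for (int i = 0; i < adj.length; i++)
            for (int j = i + 1; j < adj.length; j++)
                for (int k = j + 1; k < adj.length; k++)
                    if (adj[i][j] && adj[j][k] && adj[i][k]) count++;
        return count;
    }

    static BitSet side(int... taxa) {
        BitSet s = new BitSet();
        for (int t : taxa) s.set(t);
        return s;
    }

    public static void main(String[] args) {
        int n = 4; // taxa 0..3
        List<BitSet> splits = Arrays.asList(
            side(0, 1),  // {0,1} vs {2,3}
            side(0, 2),  // {0,2} vs {1,3}
            side(0, 3),  // {0,3} vs {1,2}
            side(0));    // trivial split, compatible with all others
        boolean[][] adj = build(splits, n);
        System.out.println(countTriangles(adj)); // 1: the three non-trivial splits
    }
}
```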


All splits of 50 Gene Trees on Archaea

[Figure panels: D=1, D=2, D=3, D=4, D=5, D=10]


Hybridization Networks

• Currently developing algorithms for computing hybridization networks and ancestral recombination graphs



Summary

• SplitsTree provides a framework for phylogenetic analysis
• Extensibility based on plug-in design
• Built on splits, incorporates both tree and network methods
• Provides all popular distance-based tree building algorithms
• Provides network methods such as split decomposition, Neighbor-net, consensus networks and super networks.


Credits

• Authors: David Bryant and D.H.
• Additional programming: Tobias Dezulian, Markus Franz, Miguel Jette, Tobias Kloepper, and Michael Schröder

