Detecting horizontal gene transfers using discrepancies in species and gene classifications
Alix Boc
Vladimir Makarenkov
Université du Québec à Montréal
Presentation summary
• Some words about phylogeny
• Network models in phylogenetic analysis
• What is a horizontal gene transfer (HGT)?
• Description of the new method
• Examples of application
• Future work
• T-Rex software
Reconstruction of a phylogenetic tree
A: CGTAAT
B: CGTACG
C: CGTCGA
D: ACT………
E: ………………
F: ………………
   A  B  C  D  E  F
A  0  2  3  5  5  4
B  2  0  3  5  5  4
C  3  3  0  4  4  3
D  5  5  4  0  2  3
E  5  5  4  2  0  3
F  4  4  3  3  3  0
(Figure panels: DNA sequences → distance matrix → phylogenetic tree)
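The three sequences given in full (A, B and C) are enough to check where the matrix entries can come from. The sketch below assumes the matrix holds the plain number of mismatched sites (the Hamming distance) between aligned sequences; the helper names are illustrative, and model-corrected distances such as Jukes-Cantor are also commonly used in practice.

```python
from itertools import combinations


def hamming(s: str, t: str) -> int:
    """Count the positions at which two aligned sequences differ."""
    if len(s) != len(t):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(s, t))


def distance_matrix(seqs: dict) -> dict:
    """Pairwise Hamming distances for every unordered pair of taxa."""
    return {(p, q): hamming(seqs[p], seqs[q]) for p, q in combinations(seqs, 2)}


# The three sequences that are given in full above
seqs = {"A": "CGTAAT", "B": "CGTACG", "C": "CGTCGA"}
print(distance_matrix(seqs))  # {('A', 'B'): 2, ('A', 'C'): 3, ('B', 'C'): 3}
```

These values agree with the A, B and C entries of the matrix.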
Inferring phylogenetic trees
Four main approaches:
• Distance-based methods
   • UPGMA by Michener and Sokal (1957)
   • ADDTREE by Sattath and Tversky (1977)
   • Neighbor-joining (NJ) by Saitou and Nei (1987)
   • UNJ and BioNJ methods by Gascuel (1997)
   • Fitch by Felsenstein (1997)
   • Weighted least-squares MW by Makarenkov and Leclerc (1999)
• Maximum parsimony (Camin and Sokal 1965; Farris 1970; Fitch 1971)
• Maximum likelihood (Felsenstein 1981)
• Bayesian approach (Rannala and Yang 1996; Huelsenbeck and Ronquist 2001)
Phylogenetic mechanisms requiring a network representation
• Horizontal gene transfer (also called lateral gene transfer)
• Hybridization
• Homoplasy and gene convergence
• Gene duplication and gene loss
Software for building phylogenetic networks
• SplitsTree, Huson (1998)
• T-Rex, Makarenkov (2001)
• NeighborNet, Bryant and Moulton (2002)
Methods for detecting horizontal gene transfers
• Hein (1990) and Hein et al. (1995, 1996)
• von Haeseler and Churchill (1993)
• Page (1994); Page and Charleston (1998)
• Charleston (1998)
• Hallett and Lagergren (2001)
• Mirkin, Fenner, Galperin and Koonin (2003)
• V’yugin, Gelfand and Lyubetsky (2003)
• Boc and Makarenkov (2003); Makarenkov, Boc and Diallo (2004)
Three types of horizontal gene transfer
The new model
Basic ideas:
1) Reconcile the species and gene phylogenetic trees using either a topological criterion (the Robinson and Foulds topological distance) or a metric criterion (least squares).
2) Incorporate the necessary biological rules into the mathematical model.
3) Keep the time complexity of the algorithm polynomial.
Partial gene transfer versus complete transfer
(Figure panels (a) and (b))
Biological rules
Partial gene transfer. Incorporating biological rules.
Situations in which a new HGT branch (a, b) can affect the evolutionary distance between species i and j but cannot affect the distance between i₁ and j.
Partial gene transfer. Incorporating biological rules (2).
Three cases in which the evolutionary distance between species i and j is not affected by the addition of a new HGT branch (a, b).
No HGT can be considered when the affected branches are located on the same lineage.
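The same-lineage rule can be checked with parent pointers alone. In this minimal sketch, a rooted tree is a hypothetical `parent` dictionary (child to parent), and each branch is named by its lower endpoint. The example tree is the NJ species tree from the program output shown later, rooted on branch 8-10 with an added root node labelled `r` (an assumption made for illustration). Only this one rule is tested; the other biological rules are not modelled.

```python
def ancestors(node: str, parent: dict) -> set:
    """All strict ancestors of `node`; `parent` maps each non-root node to its parent."""
    found = set()
    while node in parent:
        node = parent[node]
        found.add(node)
    return found


def same_lineage(b1: str, b2: str, parent: dict) -> bool:
    """Branches are named by their lower (child) endpoint. Two branches lie on the
    same lineage when one of them is on the path from the other to the root."""
    return b1 == b2 or b1 in ancestors(b2, parent) or b2 in ancestors(b1, parent)


def hgt_allowed(source: str, target: str, parent: dict) -> bool:
    """Checks only the 'same lineage' rule; the other biological rules are ignored."""
    return not same_lineage(source, target, parent)


# NJ species tree of the running example, rooted on branch 8-10 (root labelled "r")
parent = {"A": "7", "B": "7", "7": "8", "C": "8", "8": "r",
          "D": "9", "E": "9", "9": "10", "F": "10", "10": "r"}
print(hgt_allowed("B", "9", parent))  # True: different lineages
print(hgt_allowed("A", "8", parent))  # False: branch 8-r lies above leaf A
```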
Partial gene transfer. Incorporating biological rules (3).
No HGT can be considered when two HGTs affecting a pair of lineages intersect, as shown in the figure.
Partial gene transfer. Incorporating biological rules (4).
• Cases (a) and (b): the path between the leaves i and j is allowed to go through both HGT branches (a, b) and (a₁, b₁).
• Cases (c) and (d): the path between the leaves i and j is not allowed to go through both HGT branches (a, b) and (a₁, b₁).
Partial gene transfer. Incorporating biological rules (5).
Subtree constraint
Timing constraint: the transfer between the branches (z, w) and (x, y) of the species tree T can be allowed if and only if the cluster regrouping both affected subtrees is present in the gene tree T₁. Here and below, a single branch is depicted by a plain line and a path by a wavy line.
• To resolve the topological conflicts between T and T₁ that are due to transfers between single species or their close ancestors.
• To identify the transfers that have occurred deeper in the phylogeny.
Optimization
Optimization problem: least squares
The least-squares loss function to be minimized with respect to the unknown length l of the HGT branch (a, b):
$$Q(ab,l)=\sum_{i\in X}\sum_{j\in X}\Big(\min\{d(i,j);\ d(i,j)-\mathrm{dist}(i,j)+l\}-\delta(i,j)\Big)^{2}\rightarrow\min$$
where d(i,j) is the minimum path-length distance between the leaves (i.e. taxa) i and j in the tree T, δ(i,j) is the given dissimilarity value between i and j, and
$$\mathrm{dist}(i,j)=d(i,j)-\min\{d(i,a)+d(j,b);\ d(j,a)+d(i,b)\}.$$
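The loss can be evaluated, and minimized over l, directly from these definitions. The sketch below is an illustration under stated assumptions, not the T-Rex code: it sums over unordered pairs (half of the double sum, so the minimizing l is the same), it works with the detour length c(i,j) = d(i,j) − dist(i,j) = Min{d(i,a) + d(j,b); d(j,a) + d(i,b)}, and it ignores the biological rules. Because min{d(i,j); c(i,j) + l} only changes form where c(i,j) + l = d(i,j), Q is piecewise quadratic in l, so the exact minimizer can be found by checking each piece. All function names and the toy numbers are hypothetical.

```python
from itertools import combinations


def detour_lengths(to_a, to_b):
    """c(i,j) = Min{d(i,a)+d(j,b); d(j,a)+d(i,b)}: the i-j path through the new
    branch (a,b), not counting the branch itself. to_a[i], to_b[i] are tree
    distances from leaf i to the endpoints a and b."""
    n = len(to_a)
    return [[min(to_a[i] + to_b[j], to_a[j] + to_b[i]) for j in range(n)]
            for i in range(n)]


def q_value(d, delta, c, l):
    """Least-squares loss Q(ab, l), summed over unordered leaf pairs."""
    return sum((min(d[i][j], c[i][j] + l) - delta[i][j]) ** 2
               for i, j in combinations(range(len(d)), 2))


def best_length(d, delta, c):
    """Exact minimiser of Q(ab, l) over l >= 0.

    On each interval between breakpoints l = d(i,j) - c(i,j), the set of
    'active' pairs (shortened by the new branch) is fixed, so Q is a quadratic
    whose minimum is the mean of delta - c over the active pairs."""
    pairs = list(combinations(range(len(d)), 2))
    breaks = sorted({0.0} | {d[i][j] - c[i][j] for i, j in pairs if d[i][j] > c[i][j]})
    candidates = set(breaks)
    bounds = breaks + [float("inf")]
    for lo, hi in zip(bounds, bounds[1:]):
        probe = lo + 1.0 if hi == float("inf") else (lo + hi) / 2
        active = [(i, j) for i, j in pairs if c[i][j] + probe < d[i][j]]
        if active:
            l_star = sum(delta[i][j] - c[i][j] for i, j in active) / len(active)
            candidates.add(min(max(l_star, lo), hi))  # clip to the interval
    return min(candidates, key=lambda l: q_value(d, delta, c, l))


# Toy example (hypothetical numbers): the new branch only shortens the path 0-1.
d = [[0, 4, 4], [4, 0, 4], [4, 4, 0]]
delta = [[0, 2, 4], [2, 0, 4], [4, 4, 0]]
c = [[0, 1, 10], [1, 0, 10], [10, 10, 0]]
print(best_length(d, delta, c))  # 1.0
```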
Optimization problem: Robinson and Foulds topological distance
The topological distance of Robinson and Foulds (1981) between two phylogenetic trees is equal to the minimum number of elementary operations, consisting of merging or splitting vertices, necessary to transform one tree into the other.
Robinson and Foulds topological distance
The Robinson and Foulds distance between T and T₁ is 2.
The HGT minimizing the Robinson and Foulds topological distance between the species and gene phylogenetic trees can be considered the best candidate to reconcile the species and gene phylogenies.
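The merge/split definition translates into counting splits: contracting a branch removes exactly one split and expanding a vertex adds exactly one, so the distance equals the number of non-trivial splits found in one tree but not the other (some software reports half of this number). The sketch below assumes trees written as nested Python tuples of leaf names; the two example trees are the ones implied by the species and gene distance matrices of the input file shown next, (((A,B),C),((D,E),F)) and ((A,D),(B,E),(C,F)). Function names are illustrative.

```python
def splits(tree):
    """Non-trivial splits of a tree given as nested tuples of leaf names.

    Each split is stored as the side that does not contain a fixed reference
    leaf, so the result does not depend on where the tuples are rooted."""
    clusters = []

    def walk(node):
        if isinstance(node, str):
            return frozenset([node])
        cluster = frozenset().union(*(walk(child) for child in node))
        clusters.append(cluster)
        return cluster

    everything = walk(tree)
    ref = min(everything)
    result = set()
    for cluster in clusters:
        side = everything - cluster if ref in cluster else cluster
        if 2 <= len(side) <= len(everything) - 2:  # skip trivial splits
            result.add(side)
    return result


def rf_distance(t1, t2):
    """Robinson-Foulds distance: number of splits present in exactly one tree."""
    return len(splits(t1) ^ splits(t2))


species = ((("A", "B"), "C"), (("D", "E"), "F"))
gene = (("A", "D"), ("B", "E"), ("C", "F"))
print(rf_distance(species, gene))  # 6
```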
Algorithm
Input file for our program
Distance matrix for the species tree:
6
A 0 2 3 5 5 4
B 2 0 3 5 5 4
C 3 3 0 4 4 3
D 5 5 4 0 2 3
E 5 5 4 2 0 3
F 4 4 3 3 3 0
Distance matrix for the gene tree:
6
A 0 4 4 2 4 4
B 4 0 4 4 2 4
C 4 4 0 4 4 2
D 2 4 4 0 4 4
E 4 2 4 4 0 4
F 4 4 2 4 4 0
Set X of taxa = {A, B, C, D, E, F}
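A minimal reader for this layout, assuming (from the example alone) that each matrix starts with a line giving the number of taxa n, followed by n lines holding a taxon name and its n distances. The function name is hypothetical; it also checks the properties every matrix here should have: square rows, a zero diagonal and symmetry.

```python
def read_matrices(text: str):
    """Parse consecutive square distance matrices: a line with n, then n lines
    'name d1 ... dn'. Returns a list of (names, rows) pairs."""
    lines = [line.split() for line in text.splitlines() if line.strip()]
    matrices, pos = [], 0
    while pos < len(lines):
        n = int(lines[pos][0])
        block = lines[pos + 1: pos + 1 + n]
        if len(block) != n:
            raise ValueError("matrix is truncated")
        names = [row[0] for row in block]
        values = [[float(x) for x in row[1:]] for row in block]
        if any(len(row) != n for row in values):
            raise ValueError("a row does not contain n distances")
        for i, row in enumerate(values):
            if row[i] != 0.0 or any(row[j] != values[j][i] for j in range(n)):
                raise ValueError(f"row {names[i]} breaks the diagonal or symmetry")
        matrices.append((names, values))
        pos += 1 + n
    return matrices


INPUT = """6
A 0 2 3 5 5 4
B 2 0 3 5 5 4
C 3 3 0 4 4 3
D 5 5 4 0 2 3
E 5 5 4 2 0 3
F 4 4 3 3 3 0
6
A 0 4 4 2 4 4
B 4 0 4 4 2 4
C 4 4 0 4 4 2
D 2 4 4 0 4 4
E 4 2 4 4 0 4
F 4 4 2 4 4 0
"""
(species_names, species), (gene_names, gene) = read_matrices(INPUT)
print(species_names == gene_names)  # True: both matrices use the same set X
```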
Program options
• Optimization criterion: least squares or Robinson and Foulds distance.
• Type of scenario: unique or multiple.
• Maximum number of HGTs.
• Position of the root.
Algorithm: unique scenario
Begin
    Reconstruction of the species tree T
    Reestimate the length of each branch in T
    While optimization criterion > 0 loop
        Test all possible HGTs
        Add the best HGT
        Reestimate the length of each branch in T
        Compute the value of the optimization criterion
    End loop
End
Algorithm: multiple scenario
Begin
    Reconstruction of the species tree T
    Reestimate the length of each branch in T
    Test all connections between pairs of branches
    Establish a list of HGTs ordered according to the optimization criterion
End
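The two scenarios differ only in what happens after every candidate HGT has been scored. The skeleton below expresses that difference with hypothetical callbacks (`candidates`, `add_hgt`, `criterion`), where branch-length re-estimation is assumed to happen inside `add_hgt` or `criterion`. Two safeguards are additions not present in the pseudocode: a cap matching the "Maximum number of HGTs" program option, and a stop when no candidate lowers the criterion.

```python
from typing import Callable, Iterable, List, Tuple, TypeVar

Tree = TypeVar("Tree")
HGT = TypeVar("HGT")


def unique_scenario(tree: Tree,
                    candidates: Callable[[Tree], Iterable[HGT]],
                    add_hgt: Callable[[Tree, HGT], Tree],
                    criterion: Callable[[Tree], float],
                    max_hgts: int) -> Tuple[List[Tuple[HGT, float]], float]:
    """Greedy loop: add the best HGT, re-score, repeat while the criterion > 0."""
    scenario: List[Tuple[HGT, float]] = []
    score = criterion(tree)
    while score > 0 and len(scenario) < max_hgts:
        scored = [(criterion(add_hgt(tree, h)), h) for h in candidates(tree)]
        if not scored:
            break
        best_score, best = min(scored, key=lambda pair: pair[0])
        if best_score >= score:  # safeguard: nothing improves the fit
            break
        tree = add_hgt(tree, best)
        scenario.append((best, best_score))
        score = best_score
    return scenario, score


def multiple_scenario(tree, candidates, add_hgt, criterion):
    """Score every candidate HGT against the same tree and list them best first."""
    return sorted(((criterion(add_hgt(tree, h)), h) for h in candidates(tree)),
                  key=lambda pair: pair[0])


# Toy stand-in (not a phylogeny): a "tree" is a tuple of conflict counts,
# an "HGT" clears one position, and the criterion is the total count.
def cands(t):
    return [i for i, v in enumerate(t) if v]


def fix(t, i):
    return t[:i] + (0,) + t[i + 1:]


toy = (3, 0, 2)
print(unique_scenario(toy, cands, fix, sum, max_hgts=5))  # ([(0, 2), (2, 0)], 0)
print(multiple_scenario(toy, cands, fix, sum))            # [(2, 0), (3, 2)]
```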
Algorithm: Step 1
• Reconstruction of the species tree T with Neighbor Joining
• Set X of n taxa
• Binary tree: all internal nodes are of degree 3, and the tree has 2n − 3 branches
• T is explicitly rooted
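Step 1 builds T by neighbor joining. Below is a compact textbook version of the Saitou and Nei procedure, a sketch rather than the T-Rex implementation: it returns an unrooted tree (rooting is a separate program option), and ties in the Q criterion are broken by keeping the first pair found. Run on the species matrix of the input file, it recovers the groupings (A,B), (D,E) and {A,B,C}|{D,E,F}, with all 2n − 3 = 9 branch lengths equal to 1, which shows that this matrix is exactly additive.

```python
def neighbor_joining(names, dist):
    """Saitou-Nei neighbor joining on a full symmetric distance matrix.

    Leaves are the given names; each new internal node is the tuple of the two
    nodes it joins. Returns (center, edges), edges being (parent, child, length)
    and center the node where the last three subtrees meet (unrooted tree)."""
    nodes = list(names)
    d = {(p, q): float(dist[i][j]) for i, p in enumerate(names) for j, q in enumerate(names)}
    edges = []
    while len(nodes) > 3:
        n = len(nodes)
        r = {p: sum(d[p, q] for q in nodes) for p in nodes}
        # Pair minimizing Q(p,q) = (n - 2) d(p,q) - R(p) - R(q); first pair wins ties.
        best = None
        for x in range(n):
            for y in range(x + 1, n):
                p, q = nodes[x], nodes[y]
                score = (n - 2) * d[p, q] - r[p] - r[q]
                if best is None or score < best[0]:
                    best = (score, p, q)
        _, p, q = best
        lp = d[p, q] / 2 + (r[p] - r[q]) / (2 * (n - 2))
        u = (p, q)
        edges += [(u, p, lp), (u, q, d[p, q] - lp)]
        nodes = [k for k in nodes if k != p and k != q]
        for k in nodes:  # distances from the new node to the remaining ones
            d[u, k] = d[k, u] = (d[p, k] + d[q, k] - d[p, q]) / 2
        d[u, u] = 0.0
        nodes.append(u)
    a, b, c = nodes
    center = (a, b, c)
    la = (d[a, b] + d[a, c] - d[b, c]) / 2
    edges += [(center, a, la), (center, b, d[a, b] - la), (center, c, d[a, c] - la)]
    return center, edges


SPECIES = [[0, 2, 3, 5, 5, 4], [2, 0, 3, 5, 5, 4], [3, 3, 0, 4, 4, 3],
           [5, 5, 4, 0, 2, 3], [5, 5, 4, 2, 0, 3], [4, 4, 3, 3, 3, 0]]
center, edges = neighbor_joining(list("ABCDEF"), SPECIES)
print(center)  # ('F', ('C', ('A', 'B')), ('D', 'E'))
```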
Algorithm: Step 2
• Comparing the gene tree T₁ and the species tree T
Criterion 1: RF, the Robinson and Foulds distance between T and T₁
If RF == 0 then
    There are no HGTs
Else
    Step 3 (next slide)
End if
Criterion 2: Reestimate the length of each branch of the species tree T according to the distances in T₁.
LS, the least-squares coefficient between the distances in T and T₁
If LS == 0 then
    There are no HGTs
Else
    Step 3 (next slide)
End if
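Criterion 2 re-estimates the branch lengths of T from the gene-tree distances. One standard way to do that is ordinary least squares on the fixed topology, sketched below in pure Python with hypothetical helper names. This sketch places no sign constraint on branch lengths, so its LS value for the gene matrix need not equal the 9.600160 reported in the program output. On the species matrix it returns LS = 0 with every branch of length 1; on the gene matrix LS is necessarily positive, because on this topology the fitted distances must satisfy d(A,D) + d(B,E) = d(A,E) + d(B,D), whereas the gene matrix gives 4 versus 8.

```python
from itertools import combinations


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for k in range(col, n + 1):
                m[r][k] -= f * m[col][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][k] * x[k] for k in range(r + 1, n))) / m[r][r]
    return x


def fit_branch_lengths(taxa, splits, delta):
    """Ordinary least-squares branch lengths for a fixed topology.

    The tree is given by its splits (one leaf set per branch): a branch lies on
    the i-j path exactly when it separates i and j. Returns (lengths, LS), LS
    being the sum of squared residuals over unordered leaf pairs."""
    pairs = list(combinations(range(len(taxa)), 2))
    rows = [[1.0 if (taxa[i] in s) != (taxa[j] in s) else 0.0 for s in splits]
            for i, j in pairs]
    y = [float(delta[i][j]) for i, j in pairs]
    m = len(splits)
    ata = [[sum(row[p] * row[q] for row in rows) for q in range(m)] for p in range(m)]
    aty = [sum(row[p] * v for row, v in zip(rows, y)) for p in range(m)]
    lengths = solve(ata, aty)  # normal equations (A^T A) l = A^T y
    fitted = [sum(w * l for w, l in zip(row, lengths)) for row in rows]
    return lengths, sum((f - v) ** 2 for f, v in zip(fitted, y))


TAXA = "ABCDEF"
# Branches of the NJ species tree: six leaf branches plus three internal ones
SPLITS = [{"A"}, {"B"}, {"C"}, {"D"}, {"E"}, {"F"},
          {"A", "B"}, {"A", "B", "C"}, {"D", "E"}]
SPECIES = [[0, 2, 3, 5, 5, 4], [2, 0, 3, 5, 5, 4], [3, 3, 0, 4, 4, 3],
           [5, 5, 4, 0, 2, 3], [5, 5, 4, 2, 0, 3], [4, 4, 3, 3, 3, 0]]
GENE = [[0, 4, 4, 2, 4, 4], [4, 0, 4, 4, 2, 4], [4, 4, 0, 4, 4, 2],
        [2, 4, 4, 0, 4, 4], [4, 2, 4, 4, 0, 4], [4, 4, 2, 4, 4, 0]]
lengths, ls = fit_branch_lengths(TAXA, SPLITS, GENE)
print(f"LS = {ls:.6f}")
```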
Algorithm: Step 3
Multiple scenario
• Test all connections between pairs of branches.
• Reestimate the length of each branch in T according to the gene distance matrix.
• Establish a list of HGTs ordered according to the least-squares coefficient or the Robinson-Foulds distance.
Unique scenario
1. The best HGT found is added to the species tree.
2. The length of each branch is reestimated according to the gene tree.
3. The RF distance or the LS coefficient is computed.
(Figure panels: species tree, upcoming HGT 1 → species tree + HGT 1, upcoming HGT 2 → species tree + HGT 2, upcoming HGT 3 → species tree + HGT 3 (gene tree))
Output
Type of scenario: Unique
List of the branches of the species tree built with NJ, with their lengths:
1   7-B    1.800000
2   8-C    1.800000
3   9-D    1.800000
4   10-9   0.000020
5   9-E    1.800000
6   10-F   1.800000
7   7-A    1.800000
8   8-7    0.000020
9   10-8   0.000020
The least-squares criterion LS for the species tree whose branch lengths are estimated from the gene tree is: 9.600160
The root is located on branch 8-10
===================== HGT #1 ======================
From branch 7-B to branch 10-9
LS = 5.333387
RF = 4
===================== HGT #2 ======================
From branch A-7 to branch 9-D
LS = 0.000000
RF = 0
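The reported LS value of the species tree can be reproduced from the listed branches: LS is the sum, over the 15 unordered taxon pairs, of the squared difference between the tree distance and the gene-tree distance. The sketch below uses hypothetical helper names; the edge list and gene matrix are copied from the output and the input file. With the internal branches set exactly to 0, the value would be 3 × 1.6² + 12 × 0.4² = 9.6; the three internal branches of 0.00002 account for the remaining 0.00016.

```python
from collections import defaultdict
from itertools import combinations

# Branches (endpoint, endpoint, length) as listed in the program output
EDGES = [("7", "B", 1.8), ("8", "C", 1.8), ("9", "D", 1.8), ("10", "9", 0.00002),
         ("9", "E", 1.8), ("10", "F", 1.8), ("7", "A", 1.8), ("8", "7", 0.00002),
         ("10", "8", 0.00002)]
TAXA = "ABCDEF"
GENE = [[0, 4, 4, 2, 4, 4], [4, 0, 4, 4, 2, 4], [4, 4, 0, 4, 4, 2],
        [2, 4, 4, 0, 4, 4], [4, 2, 4, 4, 0, 4], [4, 4, 2, 4, 4, 0]]


def path_lengths(edges, source):
    """Distances from `source` to every node, by depth-first traversal of the tree."""
    adjacency = defaultdict(list)
    for u, v, w in edges:
        adjacency[u].append((v, w))
        adjacency[v].append((u, w))
    dist = {source: 0.0}
    stack = [source]
    while stack:
        u = stack.pop()
        for v, w in adjacency[u]:
            if v not in dist:
                dist[v] = dist[u] + w
                stack.append(v)
    return dist


def ls_criterion(edges, taxa, delta):
    """Sum over unordered leaf pairs of (tree distance - given distance)^2."""
    total = 0.0
    for i, j in combinations(range(len(taxa)), 2):
        tree_d = path_lengths(edges, taxa[i])[taxa[j]]
        total += (tree_d - delta[i][j]) ** 2
    return total


print(f"{ls_criterion(EDGES, TAXA, GENE):.6f}")  # 9.600160
```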
Examples
Horizontal transfer of the Rubisco large subunit gene
Delwiche, C.F., and J. D. Palmer. 1996. Rampant Horizontal Transfer and Duplication of Rubisco Genes in Eubacteria and Plastids. Mol. Biol. Evol. 13:873-882.
Application example 1
rbcL gene phylogeny
Delwiche and Palmer (1996): hypotheses of HGTs
1. Cyanobacteria → γ-Proteobacteria
2. α-Proteobacteria → red and brown algae
3. γ-Proteobacteria → α-Proteobacteria
4. γ-Proteobacteria → β-Proteobacteria
HGTs of the rbcL gene
(Figure: HGTs of the rbcL gene, numbered 1 to 8)
HGTs of the rbcL gene: comparison
Hypotheses by Delwiche and Palmer (1996)
1. Cyanobacteria → γ-Proteobacteria
2. α-Proteobacteria → red and brown algae
3. γ-Proteobacteria → α-Proteobacteria
4. γ-Proteobacteria → β-Proteobacteria
Solution
1. [?]-Proteobacteria → β-Proteobacteria
2. α-Proteobacteria → red and brown algae
3. [?]-Proteobacteria → γ-Proteobacteria
4. [?]-Proteobacteria → [?]-Proteobacteria
5. γ-Proteobacteria → Cyanobacteria
6. β-Proteobacteria → γ-Proteobacteria
7. γ-Proteobacteria → β-Proteobacteria
8. Cyanobacteria → γ-Proteobacteria
Application example 2
Horizontal transfers of the protein rpl12e
Data taken from: Matte-Tailliez O., Brochier C., Forterre P. & Philippe H. (2002). Archaeal phylogeny based on ribosomal proteins. Mol. Biol. Evol. 19, 631-639.
Rpl12e HGTs
(Figure panels: species tree; rpl12e gene tree)
Assumed HGTs of the rpl12e gene involved the clusters of Crenarchaeota and Thermoplasmatales (Matte-Tailliez, 2004).
Reconciliation scenario
(Figure: five HGTs, numbered 1 to 5, labelled 74%, 69%, 60%, 60% and 55%)
Application example 3
Horizontal transfers of the PheRS synthetase
Data taken from: Woese, C. R., G. Olsen, M. Ibba, and D. Söll. 2000. Aminoacyl-tRNA synthetases, the genetic code, and the evolutionary process. Microbiol. Mol. Biol. Rev. 64:202-236.
PheRS synthetase
Reconciliation scenario
(Figure: five HGTs, numbered 1 to 5, labelled 62%, 88%, 65%, 85% and 60%)
Software
T-REX: Tree and Reticulogram Reconstruction¹
Downloadable from http://www.info.uqam.ca/~makarenv/trex.html
Author: Vladimir Makarenkov
Versions: Windows 9x/NT/2000/XP and Macintosh
With contributions from A. Boc, P. Casgrain, A. B. Diallo, O. Gascuel, A. Guénoche, P.-A. Landry, F.-J. Lapointe, B. Leclerc, and P. Legendre.
________
¹ Makarenkov, V. 2001. T-REX: reconstructing and visualizing phylogenetic trees and reticulation networks. Bioinformatics 17:664-668.
T-Rex: multiple scenario screenshot
Bioinformatics software
T-Rex Web infrastructure
Future developments
• Maximum likelihood model
• Maximum parsimony model
• Decreasing the running time
Bibliography
• Boc, A. and Makarenkov, V. (2003). New efficient algorithm for detection of horizontal gene transfer events. In G. Benson and R. Page (Eds.), Algorithms in Bioinformatics, 3rd Workshop on Algorithms in Bioinformatics, Springer-Verlag, pp. 190-201.
• Delwiche, C.F. and Palmer, J.D. (1996). Rampant horizontal transfer and duplication of Rubisco genes in eubacteria and plastids. Mol. Biol. Evol., 13, 873-882.
• Makarenkov, V. (2001). T-Rex: reconstructing and visualizing phylogenetic trees and reticulation networks. Bioinformatics, 17, 664-668.
• Makarenkov, V., Boc, A., Delwiche, C.F. and Philippe, H. (2005). A novel approach for detecting horizontal gene transfers: modeling partial and complete gene transfer scenarios. Submitted to Mol. Biol. Evol.
• Makarenkov, V., Boc, A. and Diallo, A.B. (2004). Representing lateral gene transfer in species classification. Unique scenario. IFCS’2004 proceedings, Chicago.
• Matte-Tailliez, O., Brochier, C., Forterre, P. and Philippe, H. (2002). Archaeal phylogeny based on ribosomal proteins. Mol. Biol. Evol., 19, 631-639.
• Robinson, D.F. and Foulds, L.R. (1981). Comparison of phylogenetic trees. Mathematical Biosciences, 53, 131-147.
• Woese, C.R., Olsen, G., Ibba, M. and Söll, D. (2000). Aminoacyl-tRNA synthetases, the genetic code, and the evolutionary process. Microbiol. Mol. Biol. Rev., 64, 202-236.