Text s1, DOCX - mBio

stalliongrapevineΒιοτεχνολογία

1 Οκτ 2013 (πριν από 3 χρόνια και 10 μήνες)

95 εμφανίσεις

Supplemental Online Material for Wasmuth
et al
.
: Integrated bioinformatic and targeted deletion
analyses of the
SRS

gene superfamily identify SRS2
9C

as a negative regulator of
Toxoplasma

virulence
.


The structural core of the SRS domain is stabilized by four invariant cysteines

To highlight the key differences between the SRS subfamilies, we constructed a structure based
alignment for each SRS domain subfamily, which were then combined to provide a re
fined alignment for
the SRS superfamily. Of the six characteristic cysteines (cys
-
1, cys
-
2 ... cys
-
6), four are invariant: cys
-
1,
cys
-
2, cys
-
5 and cys
-
6 (
Fig
ure

S1
). In families one and two, the canonical cys
-
3 is absent, but studies of
the
SRS28 (
sporoSAG
)

structure showed that a spatially conserved disulphide bond forms between cys
-
4
and a subfamily specific cysteine that is present just upstream of cys
-
5

(Crawford et al., 2010)
, indicating
that the relative position of the cysteine in the primary sequence does not necessarily predict the
tertiary structure’s topology. More striking are families that are missing one or both of the middle
cysteines with no apparent compensatory su
bstitution elsewhere;
for example

families three, four and
six. Taken with studies on the three dimensional structures of SRS proteins, four invariant cysteines
stabilise the core of the domain whereas some subfamilies possess two other cysteines that form

a
peripheral disulphide bond, whose importance likely correlates with function and depends upon several
factors including the length of the beta strands and the inter strand loops and topology. Despite intra
-
subfamily pair wise identity as low as 23%, the

majority of the conservation can be mapped to the
proposed β
-
strands structures.


Residues implicated in ligand binding display greatest sequence diversity

To investigate whether the relative position of the SRS domains in the two
-
domain proteins exhibit

differences in conservation, we generated separate alignments for the D1 (solvent
-
exposed) and D2
(membrane proximal) domains from all two domain
T. gondii

Me49 SRS proteins and calculated
conservation scores for each residue. Of the 122 total residues in

the D1 domain, 33 (27%) are
categorized as significantly conserved with 31 (25%) and 59 (48%) moderately and poorly conserved,
respectively. Residue conservation scores were mapped onto the surfaces of S
RS29B

D1 and D2 (
Figure

2
A
-
C
). The conservation scor
es across the D2 domains show a slightly different pattern with a more even
distribution of conservation. Interestingly, the membrane distal end of the D1 subunit is poorly
conserved, while the membrane proximal end of the D2 subunit appears highly conserv
ed. This
sequence variation in the D1 subunit likely reflects the role these residues play in generating antigenic
variability or for binding a variety of different host targets. The interface between the two domains
exhibit similar patterns of conservatio
n, reflecting the complementary surfaces required to stabilize this
interface. Beyond core structural elements of the SRS fold, a notably conserved feature in D1 is a pair of
lysines, Lys33 and Lys94.Since these lysines are not likely serving as structural

elements, the high degree
of conservation suggests a potential functional role. Structural mapping reveals that Lys33 and Lys94
form a basic groove at the interface of the SAG1 dimer, and are predicted to serve as potential binding
sites for negatively ch
arged biomolecules such as a sulfated proteoglycan on the surface of host cells

(He
et al., 2002)
. In fact, the more polar surfaces tend to overlay with the moderately to
poorly conserved
regions, consistent with antigenic drift operating in regions under potential immune selection pressures.
No equivalent surface was identified on D2. It is likely that the D2 domain serves as the foundation to
stabilize and orient the D1 d
omain to promote multimerization.


To assess the potential for structural consequences of key residues identified as divergent in at least one
of the eight newly defined SRS subfamilies, we mapped divergent residues onto the D1 domain of
S
RS29B and col
o
r coded them to represent separated regions in the primary sequence(
Fig
ure

2
D
). It is
clear that divergent individual or clusters of residues are distal to the S
RS29B

D1 dimer interface.
A

side
view of the S
RS29B

dimer (
Fig
ure

2
D
) shows the subfamily defi
ning residues clustered at the tip of the
D1 domain that contacts the D2 domain and largely localizes to the inter strand loops peripherally
associated with the


sandwich core. A zoomed in view reveals the degree to which the subfamily
defining residues a
re spatially associated despite adopting distal positions in the sequence alignment.
The most noteworthy divergence is the substitution of a cysteine at position 54 with a threonine or
valine/phenylalanine in families one and three, respectively and substi
tution of Cys62 with a large polar
residue (Asp, Glu, Lys) in subfamily three (cysteine pair is shown in yellow in
Fig
ure

2
). This observation
is consistent with our analysis above that shows lower conservation scores for these cysteines.



Expansion
of
SRS

gene
superfamily includes whole locus duplications and convergent evolution

The sequence of the SRS proteins is highly diverse, indeed outside the core of four or six cysteine
residues, there is little that is conserved across all the domains. This mak
es phylogenetic reconstruction
extremely difficult, as few branch points could be robustly supported through a bootstrap analysis.
Therefore, we decided to cluster the domain sequences based on pairwise alignment scores. The
resultant eight domain subfamil
ies present an excellent opportunity to study the SRS superfamily. An
important caveat is that intra
-
subfamily relations must remain ambiguous. Within each multi
-
gene
locus, we note that member genes have similar architectures of domain families, have simi
lar gene
structures (number of introns) and, with the exception of the
SRS38

locus, are transcribed in the same
direction

(Figure 1B)
. These strong associations between domain families and the small number of
observed domain architectures (only nine of a p
otential 36 non
-
directional two
-
domain combinations)
implies that exon
-
switching is not a significant mechanism for increasing protein divergence. Taken
together, these data suggest that these loci arose from tandem gene duplication. We focused upon the
mo
st abundant two
-
domain proteins to understand the mechanism(s) that underlie the spread of the
family throughout the genome (Figure
4
). The first mechanism relies solely upon single gene
duplications, some resulting in tandemly arrayed genes and others co
lonising new parts of the genome.
The second is that a single multigene locus arose and is duplicated as a whole through
interchromosomal exchanges. While the second mechanism is the more parsimonious

one
, our
phylogenetic reconstruction of two
-
domain prot
eins containing domain families seven and eight, points
to a mixture of these mechanisms (Figure
4
). The situation is further complicated by the phenomenon of
convergent evolution (reviewed in

(Liao, 1999)
; intralocus exchange (ectopic recombination) within a
locus will homogenise the member genes. It is plausible that the monophyletic groups for
SRS12
,
19
,
36

and
38

were the result of whole locus duplications followed by ectopic recombination. However, the
difference in locus size and pseudogenes prevents us from precluding locus
-
specific single gene
duplications. Furthermore, the
SRS15
,
16

and
40

loci may be r
ecently diverged from their common
ancestor and their genes are concurrently being purified. The forthcoming genome sequences
of
the
related coccidian parasites,
Neospora caninum

and
Sarcocystis neurona
, along with more strains of
T.
gondii
, should
facilitate construction of more informative phylogenies to propose accurate models for

SRS

gene expansion. The addition of
SRS

genes from strains GT1 and VEG showed that the syntenic
genes were monophyletic (data not shown), suggesting that there has been
insufficient time since their
most recent common ancestor for strain
-
specific convergence.


Detailed Materials and Methods


Sequence datasets

The genome, protein and CDS sequences for the ME49, GT1 and VEG strains of
Toxoplasma gondii

were
downloaded from
ToxoDB (version 5.3 (Tg))

(Gajria et al., 2008)

and the expressed sequence tag (EST)
sequences and t
heir associated library information for
T. gondii
,
N. caninum

and
S. neurona

were
downloaded from dbEST

(Boguski et al., 1993)
.


SRS domain family searches

The seed model for the Pfam (version 23) domain PF04092 was searched against the protein sequences
for
T. gondii

(Finn et al., 2008)
. The gathering score (GA) cut off was used. The CD
-
Hit program was used
to build a representative set of domain sequences (95% identity)

(Li and Godzik, 2006)
. All
-
against
-
all
global alignments were produced using the Needleman
-
Wunsch algorithm and the bit score used to
hierarchically cluster the domains

(metric: Kendall’s tau; method: Complete linkage)

(de Hoon et al.,
2004)
.


The eight SRS subfamilies were aligned with PROMALS which combines sequence and structural
information

(Pei et al., 2008)
. The alignment was seeded with the structures of
SRS29B

(1KZQ),
SRS28

(2X28) and BSR4 (2JKS). From these alignments, hidden Markov models were built using HMMER
(version 3.0b3
-

http://h
mmer.janelia.org/)

(Eddy, 1998)
. Searches, with hmmsearch, were performed
against the protein sets for
Toxoplasma
, as well as six
-
frame translations (transeq

(Olson, 2002)
)
of the
Toxoplasma

genome. Manual inspection
of the results, coupled with our understanding of the structural
determinates of an SRS domain, helped define three rules to aid in domain identification. Namely, a
sequence was accepted as an SRS domain if it aligned with one of the domain models with an
Evalue < e
-
5
, was at least 90 amino acids in length and contained at least four cysteine residues. Aligned regions
with fewer than four cysteines, were designated as ‘degraded’ domains.


Pseudogenes were defined as SRS domains that satisfied our search cri
teria but were not part of a
ToxoDB gene model. If two domains were separated by less than 1000bp we considered the domains to
be part of a multidomain pseudogene separated by an intron. We noted that stop codons were present
in 11% of the pseudogenes iden
tified; the lack of a gene model for the remaining 89% is likely to be due
to a stop codon elsewhere in the sequence, but outside the SRS domains.


Genome coverage

Sequences for the
B1

(accession: AF179871) and
rDNA

(accession L25635) loci were downloaded
from
Genbank. Trace sequences for
T. gondii Me49

were downloaded from the NCBI trace archive
(ftp://ftp.ncbi.nih.gov/pub/TraceDB/). BLAT was used to align the traces to the assembled genome
sequence

(Kent, 2002)

(Kent 2002)

and genomeCoverageBed calculated the coverage for each
nucleotide

(Quinlan and
Hall, 2010)
.


Phylogenetic reconstruction

The two domain proteins containing SRS domains from families 7 and 8 were aligned, with
SRS37A

and
SRS37B

providing an outgroup. The evolutionary history was inferred using the Minimum Evolution (ME)
and
Maximium Likelihoood (ML) methods

(Felsenstein, 1981)
. For ME, evolut
ionary distances were
calculated using the DayHoff

(Schwarz and Dayhoff, 1979)

and JTT matrices

(Jones et al., 1992)
, and for
ML the

JTT matrix was used. The ME trees were generated with MEGA4
(Tamura

et al.

2007)

and the
ML tree with Phylip
(Felsenstein, 2005)
.


Inter
-
strain and inter
-
speci
es comparisons

For the three
Toxoplasma

strains, comparisons were done at the protein level and between individual
domains. Sequences were clustered to identify orthologus and paralogous relationships
(Alexeyenko

et
al.

2006)
. Throughout the manuscript, the following terminology was used: “genes” refers to individual
genetic loci; “orthologous group” refers to a set of syntenic genes shared between the three strain
s; and
“loci” was used to define either one gene or a cluster of paralogous genes that group together in a
distinct chromosomal region.


Sequence conservation

The individual family alignments were then aligned together. A superfamily
-
based HMMlogo was
gen
erated and deviations from this in each subfamily were visualised using HMMlogos

(Beitz, 2006)
. The
conservati
on at each position was calculated for D1 and D2 domains

(Pei and Grishin, 2001)
.


Expression profiling

ESTs were mapped onto the
SRS

gene models usi
ng BLAST
X
(Altsc
hul et al., 1997)
. The microarray data
was downloaded from ToxoDB

(Gajria et al., 2008)
. To detec
t stage
-
specific expression, we used an
arbitrary cut
-
off of +/
-

two
-
fold divergence. For the three
-
way inter
-
Type comparisons, the same cut
-
off
was used, relative to the median fold difference.


Polymorphism analyses

Strain type
-
specific single nucleotid
e (SNP) polymorphisms were downloaded for all
SRS

gene model IDs
from ToxoDB (version 6.0) and the ratio of synonomous to non
-
synonomous changes calculated. In
addition, protein
-
based multiple sequence alignments were generated for orthologous groups found

in
all three strains

(Edgar, 2004)
. The coding sequences were then mapped onto the alignment, to maintain
codon integrity. Pair
wise distances were calculated using dnadist

(Felsenstein, 2
005)

and a threshold of
0.4% used to delineate alleles

(Grigg et al., 2001)
. To estimate


(dN/dS ratio), the program Slr was used

(Massingham an
d Goldman, 2005)
;

>1 is positive selection and


<0.5 is purifying.


Plasmid constructs and parasite strains

The
SRS2
9C

gene was amplified from Type I RH and Type II 76K strain genomic DNA using the sense
primer, 5’
-
CGGGATCCGCCGTACAGATACGAGC
-
3’ and th
e antisense primer, 5’
-
AAAACTGCAGCACTCCCTCCACCCCCCTG
-
3’. The resulting PCR fragments were cloned into pTOPO. The
SRS2
9C
-
I and SRS2
9C
-
II pTOPO plasmids contain the extracellular domain of the coding sequence
flanked by 1.0kb of 5’ sequence corresponding to
the SRS2
9C

promoter, un
-
translated regions, the
coding sequence, and 1.0kb of 3’ un
-
translated region. These plasmids were transfected into the
parental parasite RH∆ strain for this study (WT). WT is Type I RH strain whose hypoxanthine
-
xanthine
-
guanine pho
sphoribosyl transferase (
HXGPRT
) has been disrupted. The pHXGPRT plasmid contains the
HXGPRT
open reading frame flanked with the regulatory regions isolated from the Di
-
Hydro Folate
Reductase (
DHFR)
gene. The pGFP
-
LUC plasmid contains separate cassettes fo
r GFP (driven by the GRA4
promoter) and LUC (driven by the DHFR promoter)


Flow Cytometry

Parasites were fixed with 2% formaldehyde in PBS for 2 min, blocked with 1%BSA in PBS and stained
with anti
-
SRS2
9C

antibody (1:1000 monoclonal and polyclonal) anti
-
S
R
S29B

DG52 antibody (1:2000
monoclonal), anti
-
SRS
29A

polyclonal (1:500), anti
-
S
RS34
A 5A6 monoclonal (1:600), and anti
-
S
RS57

IF12
monoclonal (1:2000). Following three washes with 1% BSA in PBS, parasites were incubated with
fluorescent (Phycoerythrin or FIT
C, Sigma) labeled secondary antibody. After four washes with PBS, the
stained parasites were analysed for surface protein expression by Flow cytometry (FACS) using a
FACSCalibur (BD Biosciences).

Mouse infection

7
-
10 week old female CD1 outbred mice were
infected by intraperitoneal injection with a total of 50
parasites diluted in 400 µl of PBS.

Quantitative RT
-
PCR

Total RNA from 0.5
-
1x10
7

tachyzoites was isolated using RNeasy mini kit (Qiagen
), according to the
manufacturer’s instructions. 1 to 2 µg of total RNA was used in a cDNA synthesis reaction using random
primers according to manufacturer’s instructions (Invitrogen). After DNase treatment using RNase
-
Free
DNase kit (Qiagen) RNA concentr
ation was determined by absorbance at 260 nm using
NanoDrop 1000
Spectrophotometer V3.7 (Thermo Scientific)
. Additionally RNA integrity was electrophoretically verified
by GelRed nucleic acid staining (Biotium). Total RNA (2 µg) from each strain was revers
e transcribed with
200 units of reverse transcriptase using 250 ng of random primers (SuperScript II reverse transcriptase
(RT) kit, Invitrogen) according to the manufacturer’s instructions.

Gene expression was measured by quantitative PCR (qPCR) using th
e TaqMan probe based method.
Primer and probe sets were designed using Primer3 software (
http://frodo.wi.mit.edu/primer3/
) and
are listed in table S3
. The primers were purchased from Integrated DNA Technolog
ies, INC and the
TaqMan probes (Sigma
-
Aldrich) were labeled at the 5’ end with 6
-
carboxyfluorescin (6FAM) and the 3’
end with black hole quencher 1 (BHQ1).
Toxoplasma

18S rRNA was used as a reference gene to
normalize the quantity of transcripts. The ampli
fication efficiency of the reference and target gene
primers was grater than 90%. The reactions were performed in a final volume of 20 µl. Each reaction
mix contained 5 µ1 of ‘assay mix’ that included (i) the forward primer (18 µM), (ii) the reverse prime
r (18
µM) and (iii) the TaqMan probe (5 µM), 5 µ1 containing 100 ng of cDNA template and 10 µl of TaqMan
Fast Universal PCR Master Mix (2X) (Applied Biosystems). Each reaction was performed in two
replicates. The experiment was repeated twice (biological

replicates) using two different batches of
cDNAs. The assay plates were run on an Applied Biosystems 7900HT Real
-
Time PCR System (Applied
Biosystem) using the thermal cycling protocol 9600 Emulation. The cycling program included 2 min at
50
o
C, 10 min incu
bation at 95
o
C followed by 40 cycles of 95
o
C for 15 sec and 60
o
C for 1 min. The cycle
threshold value (0.20), was automatically determined by the equipment allowing comparison between
assays. Transcript levels were represented as 2^
-
∆CT

to show absolute le
vels of transcripts relative to
every SRS gene examined.


References

Altschul, S. F., Madden, T. L., Schaffer, A. A., Zhang, J., Zhang, Z., Miller, W., and Lipman, D. J. (1997).
Gapped BLAST and PSI
-
BLAST: a new generation of protein database search programs. Nucleic Acids Res
25
, 3389

3402.

Beitz, E. (2006). Subfamily logo
s: visualization of sequence deviations at alignment positions with high
information content. BMC Bioinformatics
7
, 313.

Boguski, M. S., Lowe, T. M., and Tolstoshev, C. M. (1993). dbEST
--
database for “expressed sequence
tags.” Nat Genet
4
, 332

333.

Crawfor
d, J., Lamb, E., Wasmuth, J., Grujic, O., Grigg, M. E., and Boulanger, M. J. (2010). Structural and
functional characterization of SporoSAG: a SAG2
-
related surface antigen from Toxoplasma gondii. The
Journal of biological chemistry
285
, 12063

12070.

Eddy,
S. R. (1998). Profile hidden Markov models. Bioinformatics
14
, 755

763.

Edgar, R. C. (2004). MUSCLE: multiple sequence alignment with high accuracy and high throughput.
Nucleic Acids Res
32
, 1792

1797.

Felsenstein, J. (1981). Evolutionary trees from DNA se
quences: a maximum likelihood approach. J Mol
Evol
17
, 368

376.

Felsenstein, J. (2005). PHYLIP (Phylogeny Inference Package) version 3.6.

Finn, R. D., Tate, J., Mistry, J., Coggill, P. C., Sammut, S. J., Hotz, H. R., Ceric, G., Forslund, K., Eddy, S. R.,
S
onnhammer, E. L., et al. (2008). The Pfam protein families database. Nucleic Acids Res
36
, D281

D288.

Gajria, B., Bahl, A., Brestelli, J., Dommer, J., Fischer, S., Gao, X., Heiges, M., Iodice, J., Kissinger, J. C.,
Mackey, A. J., et al. (2008). ToxoDB: an
integrated Toxoplasma gondii database resource. Nucleic Acids
Res
36
, D553

D556.

Grigg, M. E., Bonnefoy, S., Hehl, A. B., Suzuki, Y., and Boothroyd, J. C. (2001). Success and virulence in
Toxoplasma as the result of sexual recombination between two distinc
t ancestries. Science
294
, 161

165.

He, X. L., Grigg, M. E., Boothroyd, J. C., and Garcia, K. C. (2002). Structure of the immunodominant
surface antigen from the Toxoplasma gondii SRS superfamily. Nat Struct Biol
9
, 606

611.

de Hoon, M. J., Imoto, S., Nola
n, J., and Miyano, S. (2004). Open source clustering software.
Bioinformatics
20
, 1453

1454.

Jones, D. T., Taylor, W. R., and Thornton, J. M. (1992). The rapid generation of mutation data matrices
from protein sequences. Comput Appl Biosci
8
, 275

282.

Kent, W. J. (2002). BLAT
--
the BLAST
-
like alignment tool. Genome Res
12
, 656

664.

Li, W., and Godzik, A. (2006). Cd
-
hit: a fast program for clustering and comparing large sets of protein or
nucleotide sequences. Bioinformatics
22
, 1658

1659.

Liao, D. (1999)
. Concerted evolution: molecular mechanism and biological implications. Am J Hum Genet
64
, 24

30.

Massingham, T., and Goldman, N. (2005). Detecting Amino Acid Sites Under Positive Selection and
Purifying Selection. Genetics
169
, 1753

1762.

Olson, S. A. (20
02). EMBOSS opens up sequence analysis. European Molecular Biology Open Software
Suite. Brief Bioinform
3
, 87

91.

Pei, J., Kim, B. H., and Grishin, N. V. (2008). PROMALS3D: a tool for multiple protein sequence and
structure alignments. Nucleic Acids Res
36
, 2295

2300.

Pei, J., and Grishin, N. V. (2001). AL2CO: calculation of positional conservation in a protein sequence
alignment. Bioinformatics
17
, 700

712.

Quinlan, A. R., and Hall, I. M. (2010). BEDTools: a flexible suite of utilities for comparing genomi
c
features. Bioinformatics
26
, 841

842.

Schwarz, R., and Dayhoff, M. (1979). Matrices for detecting distant relationships. In Atlas of protein
sequences, M. Dayhoff, ed. (National Biomedical Research Foundation), pp. 353

358.