mbo004121320s1x - mBio


1 Oct 2013


Supplemental Methods

Genomic DNA preparation

For genomic DNA, strains were grown in liquid culture for 5 to 7 days. Hyphal mats were
recovered by filtration onto Whatman paper, dried, lyophilized, and ground in liquid nitrogen using
a mortar and pestle. The resulting powder was immersed in OmniPrep lysis buffer (G-Biosciences,
St. Louis, MO). Proteinase K was added, and the solution was incubated at 55°C for 2 h with
gentle inversion and extracted repeatedly with chloroform until the supernatant was clear.
One-tenth volume of stripping solution was added, and the solution was incubated at 55°C for 10
min with inversion. The sample was cooled to room temperature, and one-fifth volume of
precipitation solution was added with inversion. The sample was pelleted and re-precipitated or
spooled with isopropanol. Purified genomic DNA was then washed in 70% ethanol, dried,
solubilized in 10 mM Tris, 1 mM EDTA, pH 7.0, and treated with 1 µl/100 µl LongLife RNase
(G-Biosciences, St. Louis, MO) at 37°C for 1 h. DNA was phenol/chloroform extracted and
quantitated by both NanoDrop spectrophotometer and Quant-iT PicoGreen Assay (Invitrogen).

Gene prediction

Protein-coding genes were initially predicted using a combination of gene models from the gene
prediction programs FGENESH (Salamov, A. A., and V. V. Solovyev, Genome Res 10:516-22, 2000),
GeneID (Parra, G., E. Blanco, and R. Guigo, Genome Res 10:511-5, 2000), and GeneMark
(Borodovsky, M., A. Lomsadze, N. Ivanov, and R. Mills, Curr Protoc Bioinformatics Chapter 4:
Unit 4.6, 2003), as well as EST-based automated and manual gene
models. A gene set was then selected by evaluating the various candidate gene models in
comparison to the best-scoring BLAST hits to UniRef90 (Suzek, B. E., H. Huang, P. McGarvey, R.
Mazumder, and C. H. Wu, Bioinformatics 23:1282-8, 2007) and to a preliminary set of proteins
from the dermatophyte genomes. We further improved the consistency of the gene models across
this group of genomes by examining the alignments of protein orthology clusters
identified using OrthoMCL (Li, L., C. J. Stoeckert, Jr., and D. S. Roos, Genome Res 13:2178-89,
2003). Protein sequences for each cluster were aligned using ClustalW (Thompson, J. D., D. G.
Higgins, and T. J. Gibson, Nucleic Acids Res 22:4673-80, 1994), and regions of poor alignment
were used to identify discrepancies in gene models across the five genomes.

These included missing genes, 5'- or 3'-end disagreement, and missing or incorrect internal
exons. These gene models were flagged and manually reviewed and fixed when possible. In cases
where predictions overlapped non-coding RNA features, such genes were manually inspected and
removed. The gene sets were then filtered by removing spurious gene models based on matches
to repeat and low-complexity sequences. This included genes with repeat-like names, genes
with >40% of their coding length overlapping TransposonPSI hits, genes with overlap to repeats
identified by RepeatScout (Price, A. L., N. C. Jones, and P. A. Pevzner, Bioinformatics 21
Suppl 1:i351-8, 2005), genes with known repeat PFAM/TIGRFAM domains and without other
PFAM/TIGRFAM domains, genes overlapping BLAST hits to a database set of known repeat proteins
(also without non-repeat PFAM domains), and genes with alignment to at least 10 different
genomic loci with at least 90% identity. Gene product names were assigned based on the
best-scoring BLAST hits to UniRef90 and to HMMER equivalogs from TIGRFAM and Pfam hits.
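
Two of the numeric filters above (more than 40% of coding length overlapping TransposonPSI hits, and alignment to at least 10 genomic loci at 90% identity or higher) can be illustrated with a minimal Python sketch. This is not the pipeline used for the annotation: the `GeneModel` record, its field names, and the example values are all hypothetical, and the per-gene overlap and locus counts are assumed to have been computed beforehand.

```python
from dataclasses import dataclass


@dataclass
class GeneModel:
    """Hypothetical per-gene summary; field names are illustrative only."""
    gene_id: str
    coding_length: int        # total coding length in bp
    transposon_overlap: int   # coding bp overlapping TransposonPSI hits
    high_identity_loci: int   # genomic loci aligned at >= 90% identity


def is_repeat_like(gene, max_overlap_fraction=0.40, min_loci=10):
    """Return True if the gene fails either numeric repeat filter."""
    overlap_fraction = gene.transposon_overlap / gene.coding_length
    too_much_overlap = overlap_fraction > max_overlap_fraction
    too_many_loci = gene.high_identity_loci >= min_loci
    return too_much_overlap or too_many_loci


# Made-up example genes, not data from the study.
genes = [
    GeneModel("g1", 1200, 600, 1),   # 50% overlap: removed
    GeneModel("g2", 900, 90, 2),     # 10% overlap, 2 loci: kept
    GeneModel("g3", 1500, 0, 12),    # 12 loci: removed
]

kept = [g.gene_id for g in genes if not is_repeat_like(g)]
print(kept)  # prints: ['g2']
```

The name-based and domain-based filters are omitted here because they depend on annotation resources that a short sketch cannot reproduce.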

Ribosomal DNA locus assembly and analysis

In the dermatophytes, as in many filamentous fungi, 5S rRNA genes are found in multiple copies
dispersed throughout the genome. To identify 5S copies in the dermatophyte assemblies, the 119
bp 5S rRNA gene of Aspergillus unguis (GenBank AY924883.1) was used as a query for
BLASTN. This identified a total of eight to 21 copies in the nuclear genomes, depending on the
species. A full-length 5S rRNA gene was considered to be 119-120 bp after Rooney and
Ward (Rooney, A. P., and T. J. Ward, Proc Natl Acad Sci U S A 102:5084-9, 2005). In total, 103
full-length 5S copies were found, averaging 15 per genome and ranging from a high of 19 in
M. canis to a low of seven in T. verrucosum.
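
The full-length criterion above (119-120 bp) amounts to a simple length check on each BLASTN hit. The following Python sketch shows one way to apply it. The function name, the category labels, and the hit coordinates are hypothetical, and hits are assumed to be given as 1-based inclusive start and end positions, with start greater than end for minus-strand hits.

```python
from collections import Counter

# Lengths treated as full length, per the 119-120 bp criterion in the text.
FULL_LENGTH_BP = range(119, 121)


def classify_5s_hit(start, end):
    """Classify a hit by its length; coordinates are 1-based and inclusive."""
    length = abs(end - start) + 1
    return "full_length" if length in FULL_LENGTH_BP else "not_full_length"


# Made-up coordinates: a 119 bp hit, a 120 bp minus-strand hit, a 65 bp hit.
hits = [(1001, 1119), (2120, 2001), (5000, 5064)]
counts = Counter(classify_5s_hit(s, e) for s, e in hits)
print(counts["full_length"], counts["not_full_length"])  # prints: 2 1
```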

The copies of the 5S rRNAs within each genome share a higher degree of similarity than
previously observed for other fungi. Of the 19 full-length M. canis 5S genes identified, 11 were
100% identical in sequence, seven varied by a single base pair at the 3' end, and one had two
differences in a 20-base internal region. This is less than the variation of 5S rRNA genes seen in
other filamentous fungi. One M. canis 5S rRNA was found to be truncated; subsequent analysis
indicated that this was likely a pseudogene, only 65 bp in length. This proportion of
pseudogenes for M. canis (5%) is lower than that seen in other filamentous fungi (Rooney, A. P.,
and T. J. Ward, Proc Natl Acad Sci U S A 102:5084-9, 2005).
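
The next paragraph estimates the number of rRNA repeat copies from read depth. As a rough sketch of that arithmetic, the Python below divides the average depth over the assembled repeat unit by the genome-wide coverage. This is one reading of the description, not the authors' code. Only the 8.1X coverage comes from the text; the repeat length and total aligned bases are hypothetical values chosen so that the result matches the reported 25 copies.

```python
def estimate_copy_number(bases_on_repeat, repeat_length, genome_coverage):
    """Estimate repeat copies as (depth over the repeat) / (genome-wide depth)."""
    if repeat_length <= 0 or genome_coverage <= 0:
        raise ValueError("repeat_length and genome_coverage must be positive")
    repeat_depth = bases_on_repeat / repeat_length
    return repeat_depth / genome_coverage


# Hypothetical inputs: an 8,000 bp repeat unit covered by 1,620,000 read bases.
# The 8.1X genome-wide coverage is the T. rubrum value given in the text.
copies = estimate_copy_number(
    bases_on_repeat=1_620_000,
    repeat_length=8_000,
    genome_coverage=8.1,
)
print(round(copies))  # prints: 25
```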

To assist population studies of T. rubrum, we assembled a complete copy of the rRNA repeat
unit and estimated that 25 copies are present in the genome. To calculate the number of rRNA
copies in the T. rubrum
genome, we queried the sequencing reads directly, as rDNA is usually
collapsed in assemblies due to the high similarity between repeat units. First, sequencing reads
matching an rDNA query (Saccharomyces cerevisiae) by BLASTN (accepting reads
with an E value cutoff of e-8 or less) were collected. Next, these reads were iteratively assembled
into one contig, which was used to re-collect all reads with BLASTN matches to the rRNA
assembly. Coverage was estimated by taking the total amount of the raw reads that covered the
final assembled rRNA repeat unit and dividing it by the coverage from the rest of the genome
(8.1X for T. rubrum). The complete NTS, 18S, ITS1, 5.8S, ITS2, 28S and part of an adjacent
NTS were assembled (