NIH_31may06_EV


Dec 13, 2013




Tools for genetic and genomic studies in Selaginella.


SPECIFIC AIMS


The basal plant Selaginella moellendorffii is a principal model system for studying the evolution of metabolic and regulatory pathways at the whole-genome level. The long-term objective of the proposed research is to develop the tools that will enable researchers to test hypotheses of gene function derived from Selaginella’s unique position as a model, especially hypotheses that relate to the comparative genomics of gene networks. The specific objectives of this research are:


1. To develop cDNA libraries for full-length cDNA sequencing.
2. To develop protocols for the stable and heritable transfer of genes.
3. To develop reverse genetics methods to study gene function.
4. To make available on a public web site relevant information about the protocols developed, EST and genome sequence data, and an easily used order form for obtaining cDNA and BAC clones.


BACKGROUND AND SIGNIFICANCE


Selaginella is a principal model system for studying the evolution of regulatory pathways


To appreciate the significance of Selaginella as a model system, it is necessary to understand where it fits in the plant evolutionary tree. A simplified phylogeny of the major groups of land plants (modified from Pryer et al. 2001) and some of the major innovations that occurred in these lineages are shown in Figure 1. The first vascular land plants appeared on earth about 425 million years ago ({Stewart, 1993 #40}) and subsequently followed two independent lines of evolution, one giving rise to the lycophytes and the other giving rise to the seed plants (Figure 1). While flowering plants dominate the earth today, the lycophytes dominated the earth’s flora during the Mississippian and Pennsylvanian periods, 360 to 286 million years ago. Modern-day coal deposits consist of fossil remnants of the once-abundant lycophytes, which led to the coining of the term “Carboniferous” for this period of time. The only lycophytes that remain from this era are members of the families Lycopodiaceae, Isoetaceae and Selaginellaceae, the latter represented by only one genus. In addition to their ancient origins, modern lycophytes are remarkable in that their morphology is conserved with that of the most ancient lycophytes; they are, thus, true living fossils ({Heuber, 1992 #42}). Selaginella moellendorffii, the subject of this project, is shown in Figure 2.


While determining the genome sizes of different lycophytes, we recently discovered that Selaginella moellendorffii has a genome size of ~100 Mb, the smallest genome size reported for any plant. Through a community effort led by Banks, the Joint Genome Institute (JGI) named the S. moellendorffii genome one of its top sequencing priorities in 2004. JGI has now sequenced the Selaginella genome to 16x coverage by whole-genome shotgun sequencing. These sequences have been assembled by Gribskov, and the assembly has been made publicly available (http://Selaginella.genomics.purdue.edu/). In addition to this sequence, other resources place Selaginella at the forefront of genomics research. These include an arrayed BAC library (10x coverage) and the sequence of ~34,000 cDNAs (~68,000 ESTs), also provided by JGI.


Brenner et al. [4] pointed out the benefits of developing a model system that has a minimal genome size and substantial complexity, yet is sufficiently phylogenetically removed from other species of interest to provide a powerful framework for comparative genomics. Brenner made this argument for Fugu rubripes with respect to human and other vertebrate genomes. Selaginella moellendorffii has many of the same characteristics as Fugu, but with respect to angiosperms: Selaginella has a compact genome (smaller than that of Arabidopsis) but is a complex multicellular organism. More importantly, the Selaginella and angiosperm lineages diverged before the evolution of many important features such as flowers, fruits and seeds. This makes Selaginella the Fugu of plants: an ideal laboratory organism for studying the origin of the metabolic, regulatory, and biochemical features important for angiosperms, including crop plants.






















Selaginella, as a lycophyte, sits between the bryophytes and the fern/seed plant clade and shares characteristics of these earlier- and later-diverging lineages ({Stewart, 1993 #40}). Like the bryophytes, the lycophytes have sperm with two flagella and only 1-4 chloroplasts per cell, and they lack true leaves and roots. In common with the euphyllophytes, the lycophytes have water-conducting vascular tissue, a dominant and complex diploid sporophyte generation, and a waxy cuticle covering the epidermis that protects the plant from dehydration. Although Selaginella has root- and leaf-like organs, leaves and roots evolved independently in the lycophyte and euphyllophyte lineages.

Figure 2. Aerial portion of a Selaginella moellendorffii plant.

Figure 1. A plant phylogenetic tree, modified from Pryer et al. (2001). [Tree labels: Bryophytes (hornworts, liverworts, mosses); Lycophytina (Lycopodium, Isoetes, Selaginella); Moniliformopses (ferns); Spermatophyta (gymnosperms, angiosperms). Innovations marked on the tree: move to land; vascular tissue; microphylls, roots; leaves, roots; seeds, dependent gametophyte; flowers, fruit, pollen, loss of flagella.]


The independent evolution of these major organs is reflected in the different architectures and morphology of the adult sporophyte plants.


The availability of whole-genome sequences from the green alga Chlamydomonas (http://genome.jgi-psf.org/Chlre3/Chlre3.home.html), the moss Physcomitrella (in progress at JGI), Selaginella, and several angiosperms, including Arabidopsis (The Arabidopsis Genome Initiative {, 2000 #45}), rice ({Goff, 2002 #44}{Yu, 2002 #7}) and poplar ({Brunner, 2004 #43}), makes it feasible for the first time to address questions related to the evolution of regulatory pathways in plants using comparative genomics approaches. Fortunately, many of these regulatory pathways have been discovered in angiosperms over the past decade using Arabidopsis as a model system (reviewed in the electronic book The Arabidopsis Book, edited by E. Meyerowitz and C. Somerville and available online at http://www.bioone.org/perlserv/?request=get-static&name=arabidopsis_ebook). While our knowledge of the regulation of developmental, physiological and biochemical processes in Selaginella at the molecular level is limited, Selaginella has been intensively studied throughout the past century. The majority of these studies predate the rise of Arabidopsis as a model plant system.


Descriptions of the ontogeny ({Lu, 1996 #46}; {Buck, 1976 #47}; {Imaichi, 1991 #48}; {Webster, 1967 #49}; {Dengler, 1983 #50}; {Dengler, 1999 #52}; {Dengler, 1983 #51}; {Jernstedt, 1985 #70}), anatomy ({Jacobs, 1988 #53}; {Schneider, 2000 #54}; {Worsdell, 1910 #55}; {Ma, 1930 #56}), hormone biology and physiology ({Bilderback, 1984 #57}; {Bilderback, 1984 #58}; {Wochok, 1973 #59}; {Wochok, 1974 #60}; {Wochok, 1975 #61}), and reproduction ({Brooks, 1973 #62}; {French, 1972 #63}; {Horner, 1970 #64}; {Lyon, 1901 #65}; {Pettitt, 1971 #66}; {Renzaglia, 1999 #67}; {Slagg, 1932 #68}) have been well documented in various Selaginella species. While these (and other) studies are descriptive in nature, they provide a wealth of information and a solid basis for addressing interesting biological questions. Questions relating to the evolution of plant organs (leaves and roots) in Selaginella and angiosperms have been of particular interest to researchers since the early 1900s, in large part because Selaginella is so different from angiosperms and is a relatively simple plant. Armed with knowledge of the genes that regulate development in angiosperms, researchers have begun to identify and characterize regulatory genes in Selaginella. Harrison et al. ({Harrison, 2005 #71}) recently addressed the question of organ homology/analogy between Selaginella and Arabidopsis by cloning and examining the expression of Selaginella transcription factors similar to those known to regulate differentiation and pattern formation in the Arabidopsis shoot meristem (meristems are located at the tips of shoots and roots and are the source of all cells of the mature plant body). Their results suggest that the organs of both plants use the same genetic circuitry to pattern their development. Another aspect of development of current interest is the role of microRNAs in regulating growth and development, and how this mode of gene regulation evolved. In a recent study, Floyd and Bowman ({Floyd, 2004 #72}) identified in Selaginella a gene similar to an Arabidopsis transcription factor that is regulated by a microRNA, and discovered that expression of the Selaginella gene is similarly regulated by a microRNA. With the genome now sequenced, it should be possible to predict microRNAs using new algorithms developed for this purpose ({Burgler, 2005 #73}; {Robins, 2005 #74}; {Zhang, 2005 #75}; {Lai, 2004 #76}; {Nam, 2005 #77}; {Rehmsmeier, 2004 #78}) and to experimentally validate Selaginella microRNAs and their targets. With the development of the tools described in the proposed research, it will be possible to examine the functions of these genes and the processes they regulate in Selaginella.
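The target-prediction step mentioned above can be illustrated with a deliberately simplified complementarity score. This sketch is not any of the cited algorithms: it assumes an ungapped, equal-length pairing between a miRNA and a candidate target site, and a penalty scheme of the kind often used for plant miRNAs (1 per mismatch, 0.5 per G:U wobble, 0 per Watson-Crick pair). The function names and the query sequence are hypothetical.

```python
# Simplified, ungapped miRNA:target complementarity score (illustrative only).
# Assumed penalties: Watson-Crick pair 0.0, G:U wobble 0.5, any other pair 1.0.

WATSON_CRICK = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}
WOBBLE = {("G", "U"), ("U", "G")}


def to_rna(seq: str) -> str:
    """Upper-case a sequence and convert DNA (T) to RNA (U)."""
    return seq.upper().replace("T", "U")


def complementarity_score(mirna: str, target_site: str) -> float:
    """Score a miRNA (5'->3') against an equal-length target site (5'->3').

    The two strands pair antiparallel, so the site is read 3'->5' while the
    miRNA is read 5'->3'. Lower scores mean better complementarity.
    """
    mirna, site = to_rna(mirna), to_rna(target_site)
    if len(mirna) != len(site):
        raise ValueError("this sketch only handles ungapped, equal-length sites")
    score = 0.0
    for m, t in zip(mirna, reversed(site)):
        if (m, t) in WATSON_CRICK:
            continue  # canonical pair, no penalty
        score += 0.5 if (m, t) in WOBBLE else 1.0
    return score


def perfect_site(mirna: str) -> str:
    """Return the fully complementary target site (reverse complement)."""
    comp = {"A": "U", "U": "A", "G": "C", "C": "G"}
    return "".join(comp[b] for b in reversed(to_rna(mirna)))


if __name__ == "__main__":
    query = "UAGCCAUGGAUCAGGUACGAU"  # hypothetical sequence, not a known miRNA
    site = perfect_site(query)
    print(site, complementarity_score(query, site))  # a perfect site scores 0.0
```

Real predictors also allow gaps, weight the 5' seed region more heavily, or use hybridization free energy; this sketch only shows the basic pairing logic.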



A medicinal herb, Selaginella is a principal model system for investigating metabolic pathways important to human health.


Several different species of Selaginella have been used as traditional medicines. In India, S. bryopteris is referred to as “Sanjeevani” (one that infuses life) for its medicinal properties ({Sah, 2005 #28}). In Colombia, healers use S. articulata to treat snakebites and neutralize the effects of Bothrops atrox venom ({Otero, 2000 #30}). Throughout southern China, Selaginella is a popular herb for the treatment of various ailments ({Lin, 1990 #29} {Ma, 2003 #32} {Pan, 2001 #31}). In recent years, the constituents of bioactive extracts from Selaginella have been purified and shown to be novel plant secondary products.



Plants collectively produce tens of thousands of secondary products and have evolved complex biosynthetic pathways for their synthesis ({Buchanan et al., 2000 #88}). While generally not essential, these secondary products function in plants to provide UV protection, defense against microorganisms, color and odorants for attracting pollinators, and rigidity to the cell wall. Plant secondary products such as taxol, colchicine, ipecac, quinine, and salicylic acid (the precursor of aspirin) are also a major source of pharmaceuticals. Other plant secondary products, such as nicotine and opium, are of social, economic and medical concern.

The three major groups of plant secondary metabolites are the terpenoids, the alkaloids, and the phenylpropanoids and allied phenolic compounds (Buchanan et al.{, 2000 #88}), each group being defined by a common biosynthetic origin. While many of these metabolites are common to all plants, many are also taxon-specific. Given >400 Myr of separation, Selaginella is likely to have evolved pathways for the production of secondary products that are not present in angiosperms.


While most reports of the medicinal uses of Selaginella are anecdotal, some of the active compounds have been well characterized ({Yin, 2005 #34}; {Kang, 2004 #35}; {Chen, 2005 #33}). Among them are two novel chromone glycosides, uncinoside A and uncinoside B (Figure 3), which were obtained from extracts of Selaginella uncinata, a popular Chinese herb ({Ma, 2003 #36}). Both compounds show potent antiviral activity against respiratory syncytial virus (RSV), with IC50 values comparable to that of ribavirin, an approved drug for the treatment of RSV infections in humans. RSV causes bronchiolitis and pneumonia in infants and young children (reviewed in {Chidgey, 2005 #79}; {Mejias, 2005 #80}).





















Figure 3. Structures of compounds of biomedical interest derived from Selaginella.


The biflavonoids 2′,8″-biapigenin, sumaflavone and taiwaniaflavone (Figure 3), isolated from S. tamariscina, inhibit lipopolysaccharide induction of COX-2 (cyclooxygenase-2) and iNOS (inducible nitric oxide synthase) at the transcriptional level ({Woo, 2006 #39}; {Pokharel, 2006 #84}; {Yang, 2006 #37}). 2′,8″-Biapigenin also inhibits nuclear factor (NF)-κB activation, and this action is required for its inhibitory effects on iNOS and COX-2 ({Woo, 2006 #39}). This finding is significant because increased NO and prostaglandin production (mediated by iNOS and COX-2) is thought to be involved in the pathogenesis of some cancers ({Lala, 2001 #85}; {Zha, 2004 #86}).


Selaginella moellendorffii, native to southeast China, is itself used as a folk medicine to treat bleeding, jaundice, gonorrhea and acute hepatitis ({Su, 2000 #38}). The biflavone ginkgetin (Figure 3), isolated from S. moellendorffii, selectively inhibits the growth of human ovarian adenocarcinoma cells by inducing apoptosis ({Sun, 1997 #87}; {Su, 2000 #38}).


The tools developed in this project will be of great value in understanding the biosynthesis of these products, as well as secondary metabolism in general. Secondary metabolites are notoriously difficult to characterize in any organism because of their uniqueness, specificity, high diversity and low abundance (reviewed in {Jewett, 2006 #90}; {Keller, 2005 #91}). While methods for extracting, detecting and databasing secondary metabolites are improving (reviewed in {Hall, 2006 #92}), the genes involved in their biosynthesis, and the regulation of these biosynthetic pathways, are generally not well characterized. Our knowledge of the metabolites produced in yeast ({Keller, 2005 #91}) and Arabidopsis ({D'Auria, 2005 #93}) has advanced through the annotation of their genomes. In Arabidopsis, members of several metabolic gene families, listed in Table 1, have been annotated as new terpene synthases, cytochrome P450s, glucosyltransferases, acyltransferases and O-methyltransferases, all of which are likely to be necessary for the synthesis or modification of terpenes, alkaloids and phenylpropanoids (including the chromones and biflavones from Selaginella). The ability to annotate these genes indicates that they are sufficiently conserved to be useful in annotating similar genes in other genomes. Representatives of each gene family listed in Table 1 are also present in Selaginella, based on TBLASTN searches of its assembled genome (E value < 1e-25; data not shown).
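The family-level check described above (TBLASTN hits with E value < 1e-25) can be sketched as a filter over BLAST+ tabular output. This assumes the default 12-column `-outfmt 6` layout, in which the E value is the eleventh column; the function name, query identifiers and query-to-family mapping are hypothetical and are not the data behind the statement above.

```python
# Report which gene families have at least one TBLASTN hit below an E-value cutoff.
# Assumes BLAST+ "-outfmt 6" default columns:
# qseqid sseqid pident length mismatch gapopen qstart qend sstart send evalue bitscore

E_VALUE_CUTOFF = 1e-25


def families_with_hits(tabular_lines, family_of, cutoff=E_VALUE_CUTOFF):
    """Return the set of families whose member queries have a hit with E < cutoff.

    tabular_lines: iterable of tab-separated BLAST result lines.
    family_of: dict mapping query ID -> gene family name (hypothetical input).
    """
    hit_families = set()
    for line in tabular_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment lines (as written by -outfmt 7)
        fields = line.rstrip("\n").split("\t")
        query_id, evalue = fields[0], float(fields[10])
        if evalue < cutoff and query_id in family_of:
            hit_families.add(family_of[query_id])
    return hit_families


if __name__ == "__main__":
    family_of = {"AtTPS_q1": "Terpene synthases", "AtOSC_q1": "Oxidosqualene cyclases"}
    rows = [  # hypothetical hits against hypothetical contigs
        "AtTPS_q1\tcontig_0012\t41.2\t530\t290\t8\t20\t545\t1001\t2580\t3e-90\t330",
        "AtOSC_q1\tcontig_0877\t30.1\t120\t80\t3\t10\t128\t55\t410\t2e-10\t60.5",
    ]
    print(sorted(families_with_hits(rows, family_of)))  # ['Terpene synthases']
```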


[Figure 3 panels: uncinoside A (1) and uncinoside B (2) from S. uncinata ({Ma, 2003 #36}); sumaflavone from S. tamariscina ({Yang, 2006 #37}); ginkgetin from S. moellendorffii ({Su, 2000 #38}); 2′,8″-biapigenin from S. tamariscina, which has a hydroxyl group at the position marked by the arrow ({Woo, 2006 #39}); taiwaniaflavone from S. tamariscina ({Pokharel, 2006 #84}).]

Studies aimed at profiling the Selaginella and Arabidopsis metabolomes (currently in progress at Purdue University) can identify secondary metabolites common or unique to each species. Correlating the presence or absence of specific metabolites in Selaginella and Arabidopsis with the presence or absence of specific metabolic genes in these two genomes, using a comparative genomics approach, could be useful in defining genes likely to be involved in specific metabolic pathways. Similar approaches were recently proposed in yeast by {Keller, 2005 #91} and successfully used to identify previously unknown flagellar proteins in humans; this was accomplished by comparing the genomes of organisms with flagella (humans and Chlamydomonas) to those of organisms without flagella (Arabidopsis) ({Li, 2004 #477}). From the set of genes common to humans and Chlamydomonas, the set of genes also found in Arabidopsis was subtracted. The specificity of the resulting putative flagellar gene set was further verified by defining the flagellar proteomes of Chlamydomonas (Pazour) and Trypanosoma (Broadhead, R, Dawe HR 2006). While defining the metabolome of Selaginella is a long-term goal, developing the tools described in this project will be necessary to verify the functions of putative metabolic biosynthetic genes in Selaginella.
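The subtraction strategy described above ({Li, 2004 #477}) amounts to a set operation over gene presence/absence profiles. This minimal sketch assumes genes from different species have already been assigned to shared ortholog-group identifiers (for example by reciprocal BLAST searches); the function name and identifiers are hypothetical. The same function could express the metabolite-driven comparison proposed here, such as ortholog groups present in Selaginella but absent from Arabidopsis.

```python
# Phylogenetic-profile filtering: keep ortholog groups present in every "present"
# genome and absent from every "absent" genome.


def profile_filter(present_in, absent_in=()):
    """present_in: non-empty list of sets of ortholog-group IDs; absent_in: list of sets."""
    if not present_in:
        raise ValueError("need at least one genome in which the genes must be present")
    shared = set.intersection(*present_in)  # groups found in all "present" genomes
    excluded = set().union(*absent_in)      # groups found in any "absent" genome
    return shared - excluded


if __name__ == "__main__":
    # Hypothetical ortholog-group IDs, not real data.
    human = {"OG1", "OG2", "OG3", "OG4"}
    chlamydomonas = {"OG1", "OG2", "OG3", "OG5"}
    arabidopsis = {"OG1", "OG6"}
    # (human AND Chlamydomonas) minus Arabidopsis
    print(sorted(profile_filter([human, chlamydomonas], [arabidopsis])))  # ['OG2', 'OG3']
```

The quality of such a screen depends almost entirely on the ortholog assignment step, which this sketch takes as given.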


Table 1. Representative gene families encoding enzymes participating in secondary metabolism in A. thaliana (copied from {D'Auria, 2005 #89}; see {D'Auria, 2005 #89} for references).

Gene family                                Genes   Secondary Metabolites Formed
Acyltransferases
  BAHD                                        64   Acylated anthocyanins, aliphatic and aromatic esters
  Serine carboxypeptidase-like                53   Hydroxycinnamate esters
Methyltransferases
  SABATH                                      24   Aliphatic and aromatic methyl esters
  Type I OMT                                   6   Flavonoid methyl ethers
Carboxy methyl esterases                      20   Carboxylic acids
Cytochrome P450 monooxygenases               272   Hydroxylated phenylpropanoids, glucosinolates
Glutathione S-transferases                    48   Glutathione conjugates
Aldehyde dehydrogenases                       14   Aromatic and aliphatic acids
Terpene synthases                             30   Mono-, sesqui-, and diterpenoids
Oxidosqualene cyclases                        13   Triterpenoids
Glycosyl transferases                        107   Glycosides (e.g. glucosinolates, anthocyanins)
Glycoside hydrolases family I                 47   Aglucones (e.g. flavonols, phenylpropanoids)
Pathogenesis-related lipase-like proteins      6   Fatty-acid-derived compounds
Acyl-activating enzymes/CoA ligases           63   CoA thioesters, amino acid conjugates


PRELIMINARY RESULTS


Genome organization based on BAC sequences.

To estimate the total number of genes encoded by the Selaginella genome, we constructed a BAC library {Wang, 2005 #4} and sequenced the inserts of two BAC clones, named SmBAC1 and SmBAC2, to 10x coverage. Open reading frames, protein-encoding genes, transposons and retrotransposons were identified using several gene prediction and protein homology-based searches, as implemented under NIX (http://www.hgmp.mrc.ac.uk). The sequence of SmBAC1 (132 kb) revealed a pattern of genome organization consisting of gene-rich regions alternating with regions of repetitive DNA. Four regions of this BAC insert, ranging in size from 8 to 40 kb and totaling 78 kb, consisted solely of repetitive DNA with homology to known retrotransposons. Twenty-five genes encoding predicted reading frames greater than 100 amino acids and without homology to transposons were predicted by GeneMark and GENSCAN; all but one were located outside the repeat-rich regions. To assess whether these predicted genes are expressed, either gene-specific primers were used in RT-PCR experiments or the BAC sequence was used to query the JGI EST sequences. Shoot tissue was used as the source of RNA for the RT-PCR reactions. Evidence of expression was observed for 15 of the 25 predicted genes. The second BAC insert (138 kb) contains two sizeable regions (10 kb and 44 kb) of repetitive DNA composed of retrotransposon-like sequences. Using the same gene-finding methods, 39 predicted genes were identified within the non-repetitive regions, and expression of 21 of them could be detected in vegetative tissues by RT-PCR. Thus, the two BACs contained 270 kb of Selaginella genomic DNA, in which we detected expression for 36 of 64 predicted genes.


By extrapolating from these two BAC sequences and assuming that they are typical of the Selaginella genome, we estimate that the Selaginella genome includes 17,000-20,000 protein-coding genes, over one-half of which are expressed in non-reproductive tissues of this plant. This number is lower than the estimates for the two plant genomes that have been sequenced to date: rice, with an estimated 32,000-62,000 genes {Goff, 2002 #6} {Yu, 2002 #7}, and Arabidopsis, with an estimated 25,000-30,000 genes {The Arabidopsis Genome Initiative, 2000 #8}. The BAC sequences reveal that although the genome is punctuated with islands of gene-poor, retrotransposon-rich sequence, the Selaginella genome as a whole is relatively gene dense, with one gene every 4.5-5.6 kb. This compares favorably with the Arabidopsis genome, which has on average one gene every 5.3 kb {Haas, 2005 #5}. Even though the 5’ ends of the Selaginella transcripts have not been mapped experimentally, the computational methods used to identify genes in the Selaginella BAC clones indicate that, within the gene-dense regions of the genome, the distance between the end of one gene and the beginning of the next ranges from 50 bp to 1 kb. This close spacing should prove useful in identifying conserved cis-regulatory sequences using comparative computational methods, especially once full-length cDNA sequences are available.
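The intergenic spacing quoted above (50 bp to 1 kb between adjacent genes) is a simple calculation on predicted gene coordinates, and those intervals are the candidate regions a comparative cis-element search would examine. This sketch assumes 1-based, inclusive coordinates on a single BAC sequence, ignores strand, and assumes gene models are not nested; the function names and coordinates are hypothetical.

```python
# Intergenic distances and intervals between adjacent predicted genes on one sequence.
# Coordinates are 1-based and inclusive; strand is ignored in this sketch.


def intergenic_intervals(genes):
    """Return (gap_start, gap_end, length) for each pair of adjacent genes.

    genes: iterable of (start, end) tuples with start <= end, in any order.
    Assumes gene models are not nested inside one another. Abutting or
    overlapping neighbours give a length <= 0 (an empty interval).
    """
    ordered = sorted(genes)
    result = []
    for (_, prev_end), (next_start, _) in zip(ordered, ordered[1:]):
        length = next_start - prev_end - 1
        result.append((prev_end + 1, next_start - 1, length))
    return result


def extract(sequence, start, end):
    """Slice a 1-based inclusive interval out of a Python string."""
    return sequence[start - 1:end]


if __name__ == "__main__":
    predicted = [(4001, 5000), (1, 1000), (1051, 3000)]  # hypothetical gene models
    for gap_start, gap_end, length in intergenic_intervals(predicted):
        print(gap_start, gap_end, length)
    # 1001 1050 50
    # 3001 4000 1000
```

For promoter analysis one would additionally use strand to decide which gap lies upstream of each gene; that step is omitted here.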


Whole genome shotgun sequence assembly.

To date, JGI has deposited 1.8M sequence reads (~1.4 Gb of sequence) into the NCBI Trace Archive. The sequences were generated as paired-end reads from three genomic libraries, with insert sizes of 3 kb, 8 kb and 40 kb, made from DNA isolated from purified Selaginella nuclei (DNA provided by Banks). LUCY (H.-H. Chou and M.H. Holmes. Bioinformatics, 17:12, pp. 1093-1104, 2001) was used to process the sequence reads; the processed sequences were then assembled using PCAP REP (12,13) and placed on a public website on 27 March 2006, where BLAST searches can be performed (http://selaginella.genomics.purdue.edu/cgi-bin/blast_tmpl_s.cgi). While the community of scientists using this website is fairly small (33 have registered voluntarily since it went online in January), this number will grow. JGI will also provide its own assembly of the Selaginella genome sequence in the future. Comparing the two independent assemblies will be useful in evaluating the quality of each assembly.
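One simple way to compare two assemblies is with contiguity statistics. The figures in the next paragraph (50% of the sequence in 3,800 contigs, 90% in 17,800) appear to correspond to what are usually called L50 and L90: the number of largest contigs needed to reach that fraction of the total assembly length, with the length of the contig at which the fraction is reached called N50 or N90. The sketch below computes both from a list of contig lengths; the function name and example lengths are hypothetical.

```python
# Contiguity statistics: Lx = number of largest contigs covering fraction x of the
# assembly; Nx = length of the contig at which that fraction is reached.


def contiguity(contig_lengths, fraction):
    """Return (Lx, Nx) for 0 < fraction <= 1."""
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    lengths = sorted(contig_lengths, reverse=True)  # largest contigs first
    target = fraction * sum(lengths)
    running = 0
    for count, length in enumerate(lengths, start=1):
        running += length
        if running >= target:
            return count, length
    raise ValueError("no contigs supplied")


if __name__ == "__main__":
    lengths = [2, 10, 4, 8, 6]  # hypothetical contig lengths (kb)
    print("L50, N50:", contiguity(lengths, 0.5))  # (2, 8)
    print("L90, N90:", contiguity(lengths, 0.9))  # (4, 4)
```

Because both statistics depend only on the sorted length distribution, they are directly comparable between the Purdue and JGI assemblies, although they say nothing about misassemblies.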

The current assembly covers 198.6 Mb, with about 50% of the sequence contained in 3,800 contigs and 90% in 17,800 contigs (33,774 contigs in total). Surprisingly, 92.5% of Arabidopsis proteins have a match in the current Selaginella genomic sequence that covers at least 50% of their coding sequence. This suggests that most angiosperm gene families can be identified in Selaginella.
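The 92.5% figure depends on how coverage of each Arabidopsis protein is measured, and the text does not state the exact rule. A minimal sketch, assuming coverage is the union of aligned query (protein) intervals from all of a protein's hits divided by protein length; the function names, protein IDs, lengths and intervals are hypothetical.

```python
# Fraction of query proteins whose hits jointly cover >= min_cov of their length.
# Intervals are 1-based, inclusive query coordinates (e.g. qstart/qend from BLAST).


def covered_length(intervals):
    """Length of the union of (start, end) intervals, merging overlaps."""
    total = 0
    cur_start = cur_end = None
    for start, end in sorted(intervals):
        if cur_end is None or start > cur_end + 1:
            if cur_end is not None:
                total += cur_end - cur_start + 1  # close the previous merged block
            cur_start, cur_end = start, end
        else:
            cur_end = max(cur_end, end)  # overlapping or adjacent: extend block
    if cur_end is not None:
        total += cur_end - cur_start + 1
    return total


def fraction_with_coverage(hits, protein_lengths, min_cov=0.5):
    """hits: dict protein ID -> list of intervals; protein_lengths: dict ID -> length."""
    passing = sum(
        1
        for pid, length in protein_lengths.items()
        if covered_length(hits.get(pid, [])) / length >= min_cov
    )
    return passing / len(protein_lengths)


if __name__ == "__main__":
    lengths = {"protA": 100, "protB": 100, "protC": 200}  # hypothetical proteins
    hits = {"protA": [(1, 40), (30, 60)], "protB": [(1, 30)]}
    print(fraction_with_coverage(hits, lengths))  # only protA passes -> 1/3
```

Merging overlapping intervals matters: summing raw alignment lengths would count the overlapping residues of protA twice and overstate coverage.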


Gribskov and colleagues have considerable experience in developing databases and electronic resources. The two most relevant projects are the PlantsP/PlantsT/PlantsUbq resources and the Protein Kinase Resource. The PlantsP (http://plantsp.genomics.purdue.edu), PlantsT (http://plantst.genomics.purdue.edu) and PlantsUbq (http://plantsubq.genomics.purdue.edu) databases [1] each focus on families of proteins in plants and provide links between sequences, functional genomics, and experimental information. Each of these protein groups numbers in the thousands, so the total number of proteins in each resource (across multiple plant genomes) is similar to the number of genes in Selaginella.

The Protein Kinase Resource, PKR [2, 3], integrates sequence, three-dimensional structure, genetic, and functional information related to protein kinases. PKR provides a scalable, extensible and redistributable package implemented in Java using open-source tools such as Jakarta Struts, Jakarta Tomcat and the Hibernate object-relational persistence framework (Hibernate, 2004).


EST sequences.

To date, JGI has made a cDNA library from mixed Selaginella tissues, including roots, young plants, shoots and leaves, and has sequenced the 5’ and 3’ ends of 37,225 cDNA clones to generate 68,132 ESTs. The ESTs were aligned using either blastn or malign, and from these alignments 7,811 (malign) cDNA clusters plus 1,605 singlets were identified. Twenty-seven percent of the total cDNA sequences defined only 9 unigenes, and the most abundant cDNA cluster was represented 6,526 times. The distribution of the number of ESTs per cluster indicates that the likelihood of identifying new genes from this cDNA library is very low (<1/1000 new ESTs are likely to be novel). While these cDNA sequences will be useful in training gene-finding programs such as GENSCAN {Burge, 1998 #2; Burge, 1997 #1}, GeneMark {Besemer, 2005 #3} and Glimmer ({Majoros, 2003 #94}), the genes identified by the current EST collection are estimated to represent about one-half of the total number of genes expressed in Selaginella.
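Why rare transcripts are missed can be made concrete with a simple sampling model. Assuming each sequenced clone is an independent draw in which a given transcript is picked with probability equal to its relative abundance f in the library (an idealization that ignores cloning and length bias), the chance that the transcript is seen at least once among N clones is 1 - (1 - f)^N. The clone count of 37,225 comes from the text above; the function names and abundance values are illustrative assumptions.

```python
import math

# Idealized EST sampling model: each clone is an independent draw, and a transcript
# with relative abundance f is drawn with probability f.


def prob_detected(f, n_clones):
    """Probability that a transcript of relative abundance f appears at least once."""
    return 1.0 - (1.0 - f) ** n_clones


def clones_needed(f, confidence):
    """Smallest N with prob_detected(f, N) >= confidence (0 < f < 1, 0 < confidence < 1)."""
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - f))


if __name__ == "__main__":
    n = 37_225  # cDNA clones end-sequenced by JGI (from the text)
    for f in (1e-3, 1e-4, 1e-5):  # illustrative relative abundances
        print(f"f={f:g}: P(detected in {n} clones) = {prob_detected(f, n):.3f}, "
              f"clones for 95% = {clones_needed(f, 0.95):,}")
```

In this model, normalizing a library raises f for rare messages and lowers it for abundant ones, which directly raises the chance of recovering rare transcripts for the same sequencing effort.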


Among the genes missing from the cDNA sequence collection, in addition to those that produce low-abundance mRNAs, are those involved in reproductive development, because reproductive tissues were not included in the pooled tissues used to generate the cDNA library. To test this directly, we used a comparative genomics approach to identify flagellar genes in Selaginella: flagella are present only on Selaginella sperm, and sperm RNA was not included in the cDNA library. We predicted that, of the flagellar genes present in the Selaginella genome, few if any would be represented by ESTs. To identify flagellar genes in Selaginella, 566 Chlamydomonas flagellar proteins described by Pazour et al. (2005) were used to query the translated Selaginella genomic contigs and EST sequences, as well as the genomes of Arabidopsis (which lacks flagellated sperm) and humans. We observed the following:




• 34% of the Chlamydomonas flagellar genes were found in plants and humans. In Selaginella, 95% of these genes, which encode metabolic enzymes, histones, tubulin, ATPases, kinases and protein synthesis factors, were represented by ESTs.

• 34% of the Chlamydomonas flagellar genes were unique to Chlamydomonas, all encoding uncharacterized flagella-associated proteins.

• 14% of the Chlamydomonas flagellar genes were plant-specific and unlikely to be flagella-specific. In Selaginella, 98% of these genes were represented by ESTs, and almost all encode unknown or uncharacterized proteins.

• 17% of the Chlamydomonas flagellar genes were specific to organisms with flagella or cilia. Of these, only 18% were represented by Selaginella ESTs. This class included dynein heavy and light chains, intraflagellar transport proteins, radial spoke proteins, kinesins and uncharacterized proteins.

• 2% of the Chlamydomonas flagellar genes were absent from the Arabidopsis and human genomes but present in Selaginella, and may be specific to the flagella of plants. Of these, 15% were represented by Selaginella ESTs. All of the proteins encoded by these genes are uncharacterized.


We conclude from this analysis that most of the flagellar protein-encoding genes that are specific to organisms with flagellated sperm are present in the Selaginella genome, yet only a small proportion of them (<18%) are represented in the EST collection. Enriching for genes expressed during the reproductive phase of development should, therefore, be a priority.
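The five categories above can be expressed as a presence/absence classification of each Chlamydomonas flagellar protein, followed by a tally of EST support per class. The decision rules below are our reading of the categories, not a procedure stated in the text; in particular, assigning every gene found in humans but not in Arabidopsis to the flagella/cilia class is an assumption. Function names, class labels and input records are hypothetical.

```python
from collections import Counter


def classify(in_arabidopsis, in_human, in_selaginella):
    """Assign a Chlamydomonas flagellar gene to one of five presence/absence classes."""
    if in_arabidopsis and in_human:
        return "plants and human"
    if in_arabidopsis:
        return "plant specific"
    if in_human:  # assumption: human but not Arabidopsis -> flagella/cilia class
        return "flagella/cilia specific"
    if in_selaginella:
        return "possible plant flagella specific"
    return "Chlamydomonas only"


def summarize(records):
    """records: iterable of (in_arabidopsis, in_human, in_selaginella, has_selaginella_est).

    Returns {class: (gene count, fraction of those genes with a Selaginella EST)}.
    """
    totals, with_est = Counter(), Counter()
    for in_ath, in_hsa, in_smo, has_est in records:
        label = classify(in_ath, in_hsa, in_smo)
        totals[label] += 1
        if has_est:
            with_est[label] += 1
    return {label: (totals[label], with_est[label] / totals[label]) for label in totals}


if __name__ == "__main__":
    example = [  # hypothetical presence/absence calls
        (True, True, True, True),
        (False, True, True, False),
        (False, True, True, True),
        (False, False, True, False),
        (False, False, False, False),
    ]
    for label, (count, est_fraction) in sorted(summarize(example).items()):
        print(f"{label}: {count} genes, {est_fraction:.0%} with ESTs")
```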


Progress in Agrobacterium-mediated Selaginella transformation

Most Agrobacterium-mediated plant transformation methods target differentiated leaf tissues or friable callus, which consists of dedifferentiated, proliferative cell types (reviewed in Birch 1997). As a foundation for transformation experiments, we have devised standardized tissue culture conditions under which S. moellendorffii can be aseptically propagated as well as induced to form callus. After testing different tissue culture media formulations, we found that standard MS medium with 3% sucrose fostered more rapid growth of whole plants, larger leaflets and a darker green color than other media, but with more vitrification (hyperhydration). More detailed response curves for sugar concentration and pH are in progress, but we have determined a suitable initial medium that produces vigorous growth of Selaginella and allows co-cultivation with Agrobacterium. We are currently refining co-cultivation conditions, including testing the addition of acetosyringone, a phenolic compound that induces the Agrobacterium virulence genes that promote the transfer of T-DNA from Agrobacterium to the host plant nucleus (reviewed by Gelvin, Microbiology and Molecular Biology Reviews, March 2003, Vol. 67, No. 1, pp. 16-37). Of the callus-inducing media tested, two (1/10X fern callus medium and RMNO) produced enhanced proliferation of roots that resemble the early-stage “hairy roots” induced by Agrobacterium rhizogenes. While not a callus response, this result is significant because the induction of roots is usually necessary to transplant cultured (i.e., transformed) regenerated shoots to soil. Importantly, we observed that KCMS medium induces callus from roots but not from shoots. We are continuing the KCMS cultures to see whether stable, friable callus forms next, as is typical of this course of response in other plants. If so, we will have in place the basic set of culture conditions for Selaginella transformation.


A selectable marker is required to select transgenic plants and to test the heritability of transforming genes. While many selectable markers that confer resistance to antibiotics or herbicides are available for plants, the most commonly used selective agents and the genes that confer resistance to them are the antibiotics kanamycin/G418 (resistance conferred by expression of NPTII) and hygromycin (HPT), and the herbicides basta/glufosinate (bar) and imazaquin/chlorsulfuron (ALS). By plating Selaginella tissues on varying concentrations of selective agent, we have shown that G418, imazaquin and chlorsulfuron kill or inhibit the growth of Selaginella explants at concentrations similar to those used in dicot species (15 mg/L, 15 ppm and 15 ppb, respectively). Kanamycin does not appear to provide adequate control of wild-type growth: concentrations as high as 400 ppm have been tested without complete control. Glufosinate kill-curve experiments are currently underway. Therefore, binary vectors that carry HPT or ALS, or possibly bar, may be used to transform Selaginella.
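Reading a kill curve of the kind described above comes down to finding the lowest tested concentration at which wild-type growth is adequately controlled. This sketch assumes survival is scored as the fraction of wild-type explants still growing at each concentration; the function name, survival values and units are hypothetical, chosen only to mirror the qualitative kanamycin result (incomplete control up to 400 ppm).

```python
# Lowest concentration of a selective agent giving adequate control of wild-type tissue.


def minimum_effective_concentration(kill_curve, max_survival=0.0):
    """kill_curve: dict concentration -> fraction of wild-type explants still growing.

    Returns the lowest concentration whose survival is <= max_survival, or None if no
    tested concentration gives that level of control.
    """
    for conc in sorted(kill_curve):
        if kill_curve[conc] <= max_survival:
            return conc
    return None


if __name__ == "__main__":
    # Hypothetical survival fractions (concentrations in each agent's usual units).
    curves = {
        "agent X": {0: 1.0, 5: 0.6, 10: 0.2, 15: 0.0, 30: 0.0},
        "kanamycin": {0: 1.0, 100: 0.7, 200: 0.5, 400: 0.3},
    }
    for agent, curve in curves.items():
        print(agent, minimum_effective_concentration(curve))
    # agent X 15
    # kanamycin None
```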


Successful plant transformation will also require a compatible interaction between the bacterium and the host plant that leads to the transfer of T-DNA from the bacterium to the nucleus of the host cell. We have been working with two standard, broad-host-range bacterial strains, C58 and EH101, and a number of binary T-DNA vectors. To see whether T-DNA from Agrobacterium can be transferred to Selaginella explant cells, Selaginella explants were co-cultivated with Agrobacterium (strain EH101) harboring a T-DNA containing the β-glucuronidase (GUS) gene driven by the CaMV 35S promoter (a commonly used constitutive plant promoter). A plant intron (derived from the soybean catalase gene) was inserted into the GUS reading frame to prevent production of active GUS protein in Agrobacterium cells. Selaginella explants stained for GUS activity two days after co-cultivation with Agrobacterium had blue cells, indicative of GUS activity. Plants co-cultivated with the same Agrobacterium strain lacking a Ti plasmid showed no GUS-positive cells. Additional experiments with an Arabidopsis retrotransposon (Athila) promoter showed similar GUS staining patterns. These results demonstrate that the Agrobacterium/Selaginella interaction is compatible and that T-DNA can be transferred to the nucleus of Selaginella cells, allowing transient expression of a gene within the T-DNA.


Our work thus far has established tissue culture conditions and appropriate selectable markers for transformation, and has shown that Agrobacterium can infect and transfer DNA to Selaginella cells. Once these conditions are optimized, the tools will be in place to transform, propagate and select transgenic S. moellendorffii plants.



PROJECT DESCRIPTION


Objective 1. Generating normalized libraries for full-length cDNA sequencing.


The process of annotating the human {Imanishi, 2004 #12} and Arabidopsis genomes {Castelli, 2004 #9; Haas, 2005 #10} has been greatly improved by the availability of full-length cDNA sequences. Across TIGR's five annotation releases of the Arabidopsis genome, for example, the overall statistics of the genome (gene number and density) changed little, while individual gene annotations changed significantly; only 67% of the original gene structures remained unchanged from release 1 in 2001 to release 5 in 2005 {Castelli, 2004 #9}. A similar but independent evaluation of Arabidopsis gene annotation likewise showed that exon-intron boundaries, 5' and 3' untranslated regions, splicing variants and pseudogenes were better defined when full-length cDNA sequences were used as the basis for annotation {Haas, 2005 #5}. Given the importance of full-length cDNA sequences for accurate genome annotation, one objective of this proposal is to make Selaginella cDNA libraries from which full-length (or near full-length) sequences can be generated in the future.


To identify new Selaginella cDNAs, we propose to generate additional cDNA libraries. While sequencing the libraries is outside the scope of this proposal, our intent is to submit a proposal to JGI's Community Sequencing Program to sequence these cDNA clones once the libraries are made.

To facilitate accurate annotation of the Selaginella genome, our objective is to obtain as many full-length cDNA sequences as possible early, rather than late, in the annotation process. Because normalizing libraries increases the likelihood that rare messages will be represented compared with non-normalized libraries, normalized libraries will be made where possible. Given that Selaginella is a relatively simple plant with few complex organs, three libraries generated from different Selaginella tissues should be sufficient to provide the full complement of genes expressed in this organism. The following describes the Selaginella organs and tissues from which RNA will be isolated to generate the three new cDNA libraries.


















Figure 4. The morphology of S. moellendorffii. A) A shoot tip showing two ranks of microphylls (R1 and R2), a reproductive cone, or strobilus (S), the microsporangia in the axils of the microphylls of the cone (MS), and the terminal bulbils (B). B) A megasporangium (right) and a megaspore (left). C) A microsporangium with subtending microphyll plus microspores. D) SEM of an emerging rhizoid (RZ) that has undergone a transition to form a root (R).




As shown in Figure 4, the Selaginella sporophyte consists of a shoot system, a root system, and the reproductive strobili. The microphylls (leaves) of Selaginella are very small (~2-5 mm) and occur in two ranks (marked R1 and R2 in Fig. 4A). The root (Fig. 4D) emerges as a stem-like rhizoid (labeled RZ in Fig. 4D) that undergoes a transition to form a true root (labeled R). Two normalized libraries will be generated from the sporophyte, one from leaves, stems and roots and the second from strobili. The leaf/stem/root sample will also include the shoot and root apical meristems, which contain the actively dividing stem cell population plus differentiating stem, leaf and root tissues. Each strobilus (S in Fig. 4A) consists of four ranks of microphylls (leaves) that bear in their axils either a sac-like megasporangium or a microsporangium (a microsporangium is labeled MS in Fig. 4A). Inside each sporangium, diploid mega- or microspore mother cells differentiate and undergo meiosis to produce mega- or microspores, respectively (Figures 4B and 4C). At maturity, each sporangium contains either hundreds of orange haploid microspores or four black megaspores. The dormant spores are eventually released from the sporangia upon desiccation and can be harvested by placing strobili in glassine bags. Because sporangia develop from the base to the tip of the strobilus, all stages of mega- and microsporogenesis are present on a single strobilus. We are able to harvest enough tissue to yield 100 mg quantities of RNA for each of the two sporophyte cDNA libraries. For all libraries, tissues will be harvested throughout the day and night, since diurnal changes in gene expression are well documented in plants.


The third cDNA library will be produced from the haploid gametophyte generation, which is of great interest because Selaginella is the first vascular plant with an independent haploid generation to have a complete genome sequence. We now have unprecedented resources to examine this phase of the plant life cycle. In Selaginella, each microspore gives rise to a sperm-producing male gametophyte, while each megaspore gives rise to an egg-producing female gametophyte. To generate RNA from male gametophytes, microspores will be surface sterilized, placed in water and harvested daily between 1 and 5 days after sterilization, since mature flagellated sperm are released from the microspores within this period. RNA will also be isolated from pooled female gametophytes of varying ages. Because male and female gametophyte tissue is limited, RNA from these tissues will be pooled and then used to construct a non-normalized cDNA library in the Banks lab.


While we have experience making non-normalized and subtracted cDNA libraries from a variety of plants ({Wen, 1999 #13}), normalized full-length cDNA libraries are technically challenging to make, and many improvements in normalized library construction are proprietary. For these reasons, we will use a reputable company (Invitrogen) to make the two normalized libraries. This company guarantees 3 × 10^6 primary clones, with a minimum of 50% of the inserts full length. Full-length cDNA sequences from these libraries are important for annotation, and the libraries will also be useful for developing Selaginella microarrays in the future. Expression profiling with microarrays will ultimately be of great use to the community. At this time, however, we feel that developing the tools necessary for examining gene function is of greater importance if Selaginella is to be used as a system amenable to experimentation. These tools include stable transformation and both reverse and forward genetics for this system.


Objective 2. Developing protocols for the stable and heritable transfer of genes.


Stable transformation of Selaginella with exogenous DNA is a necessary tool for studying gene function in planta. Once developed, this technique can be used in experiments that include (1) studying the effects of overexpressing or ectopically expressing genes, (2) complementing mutant phenotypes with specific genes, and (3) introducing constructs that suppress gene expression by RNA interference (RNAi) throughout the plant.


Approaches used to stably transform plants all take advantage of the totipotent nature of plant cells; in virtually every plant taxon that has been investigated, whole plants can be regenerated from one or a few cells. Thus, plant transformation is a core research tool that has been developed for well over 100 diverse plant species (Birch 1997). Transformation methodology consists of preparing tissues or cells, introducing DNA, and regenerating transgenic plants. Adapting transformation to a new plant species therefore requires identifying an appropriate target tissue, determining a method of delivering DNA to the plant cell nucleus, and optimizing plant regeneration from transformed tissue. Establishing a working optimum among these inherently interdependent variables is an empirical process, most prudently undertaken by a researcher with expertise in plant transformation. Hall (Vollbrecht lab) has worked in plant transformation research for nearly 20 years, with experience ranging from managing the UC Davis plant transformation facility that pioneered transgenic crop research to recent experiments demonstrating novel, homologous gene replacement technology in plants (Wright et al. 2005). Vollbrecht is a member of the Center for Plant Transformation and Gene Expression (CPTGE) at Iowa State and collaborates with CPTGE's Plant Transformation Facility (http://www.agron.iastate.edu/ptf/index.aspx), the premier academic facility in the U.S. for crop transformation.


Several factors affect the variables of target tissue, DNA delivery and regeneration of transgenics. Appropriate target tissues may vary between species, or even between genotypes within a species, ranging from whole plants to dissected or macerated tissues, to protoplasted individual cells, and finally to dedifferentiated callus tissue. Because dedifferentiated tissues may carry the burden of somaclonal variation upon regeneration, they are not the first choice for transformation. Similarly, the time required to regenerate whole transgenic plants typically increases with the degree of dissection, so development of new methods ideally proceeds from trials with larger to successively smaller explant types, unless one tissue shows particular promise. Based on our transient expression data, most Selaginella tissue types are potential targets, although meristem tissues may be particularly attractive, so multiple tissue types will continue to be tested. The method of DNA delivery is critical because of its influence on the other transformation variables and on gene expression. Three general DNA delivery approaches are routinely used to transform plants. Two of these approaches simply bring naked DNA into the nucleus and are therefore coupled with technologies that circumvent the barrier of the plant cell wall. These naked DNA delivery methods are (1) removing the cell wall to produce protoplasts, into which DNA may be introduced by electroporation (similar to bacterial transformation), and (2) "biolistics" methods, in which cells are bombarded with DNA-coated microprojectiles. A disadvantage of these methods is that delivered DNA preferentially integrates as multiple tandem copies, sometimes leading to gene silencing and other expression artifacts. Nevertheless, naked DNA introduction is the method of choice for some plants, notably those that are recalcitrant to the third method, described below.


A third method, Agrobacterium-mediated transformation, is based on the biology of a plant-pathogen interaction in which a defined DNA segment is transferred from the pathogen (Agrobacterium) into the host plant (reviewed in Gelvin 2003). Agrobacterium has an exceptionally broad host range spanning numerous angiosperms and gymnosperms, and it can even transform several fungal species (reviewed in Gelvin 1993). Moreover, the bacterium interacts efficiently with relatively intact, wounded or slightly disturbed tissue that has been induced to divide, and it frequently produces low- or single-copy inserts. For these reasons, Agrobacterium tumefaciens is routinely used to transform many plant species. Agrobacterium harbors a special Ti plasmid that includes a T-DNA (transfer DNA) sequence, which is transferred to and integrated into the plant nucleus, plus genes necessary for the transfer process. Binary vectors that contain the T-DNA borders and can replicate in both E. coli and Agrobacterium have been developed and are publicly available. Genes to be studied in transgenic plants are therefore cloned into the binary vector using E. coli as host; the vector is then isolated and transferred to Agrobacterium. Plant tissues or cells are co-cultivated with the transformed Agrobacterium and, because the binary vector includes a selectable marker, transformed plants are identified by treatment with an antibiotic or herbicide. Additional bacterial species capable of transforming plants, including Arabidopsis and rice, have recently been described (Rhizobium; Broothaerts et al., Nature 433, Feb. 10, 2005). We will test all three transformation approaches, but will begin by devoting substantial initial effort to developing an Agrobacterium-based method.


Our transient expression experiments suggest that Selaginella transformation can plausibly be developed relatively quickly. For example, if we can regenerate plants using some version of the classic leaf disc method that has proven adequate for a multitude of species (Horsch, R.B., Fry, J.E., Hoffmann, N.L., Eichholtz, D., Rogers, S.G. and Fraley, R.T. (1985) A simple and general method for transferring genes into plants. Science 227, 1229-1231), then standardizing a transformation protocol should be rapid. In any case, a key juncture in developing the method will be observing transient expression in a tissue type that we can easily propagate or regenerate into whole plants.


Thus, the next set of experiments will determine which tissue types in the shoot are capable of transient expression, in both intact and disrupted explants, and will carefully compare those tissue types with the tissues that produce plantlets in culture.

Environmental conditions have an enormous influence on the quality of plant material produced in tissue culture. Plants will be grown in a tissue culture incubator that can tightly and independently control temperature, light intensity and photoperiod on each shelf. Determining the optimal conditions for growth of plants and calli is a critical component of developing a transformation and regeneration system.

To determine the regenerative capacity of tissue types, we will draw on Hall's extensive experience adapting plant hormone regimes to new plants and regenerating plants by both the organogenic and the somatic embryogenesis developmental pathways (Birch 1997). Notably, because shoot branches in Selaginella are produced from a single apical cell, individual transformed shoots are unlikely to be chimeric for transformed and untransformed tissue, which may be a significant advantage. After co-cultivation with Agrobacterium, plants that are resistant to selection and retain the transforming DNA even when selection is relaxed will be considered stably transformed. Stable DNA integration will be confirmed by standard DNA gel blot analysis and by TAIL-PCR or IPCR, methods commonly employed in the Vollbrecht lab to confirm transgenics in other plants and new DNA transposon insertion events (ref TAIL, Brutnell IPCR).


The moss Physcomitrella patens, a bryophyte (see Figure 1), is readily transformed by delivering DNA to isolated protoplasts (Schaefer DG, Zryd J-P, Knight CD, Cove DJ. 1991. Stable transformation of the moss Physcomitrella patens. Mol. Gen. Genet. 226:418-424; Kamisugi 2005). Significantly, in P. patens transformation experiments, inclusion of endogenous genomic sequences in the transforming DNA allows gene replacement by homologous recombination (HR) (reviewed in Cove 2005). With HR, gene function may be dissected by reverse genetics through allele modification in gain- or loss-of-function experiments. In combination with a well-annotated genome sequence, HR in Selaginella could in the future enable comprehensive, genomics-scale reverse genetics approaches such as systematic tagged mutagenesis (Nishiyama T, Hiwatashi Y, Sakakibara K, Kato M, Hasebe M. 2000. Tagged mutagenesis and gene-trap in the moss, Physcomitrella patens. DNA Res. 7:9-17) to generate a large collection of mapped insertional mutants for the research community. We will therefore perform experiments to test for HR in Selaginella using both the Agrobacterium-mediated and the protoplasting/electroporation-mediated approaches (Wright 2005). We have already calibrated the sensitivity of S. moellendorffii to the herbicides imazaquin and chlorsulfuron in culture. These herbicides cause toxicity by inhibiting acetohydroxyacid synthase (AHAS), a key enzyme in the synthesis of the branched-chain amino acids valine, leucine and isoleucine. An S653N mutation in the AHAS gene prevents herbicidal inhibition while preserving normal catalytic function, a feature that has been exploited to create crops resistant to AHAS-inhibiting herbicides (e.g., Zhu T, Peterson DJ, Tagliani L, St. Clair G, Baszczynski CL, and Bowen B. 1999. Targeted manipulation of maize genes in vivo using chimeric RNA/DNA oligonucleotides. Proceedings of the National Academy of Sciences 96:8768-8773). The S. moellendorffii draft sequence contains an AHAS gene that is very similar to the tobacco gene, for which modified, herbicide-resistant constructs are readily available (D. Wright and D. Voytas, ISU, pers. comm.). Our strategy will be to introduce altered AHAS activity by HR, using a 5'-truncated, modified gene for recombination into the native locus. In this way, herbicide resistance can be expressed only if the T-DNA fragment recombines with the endogenous gene, so recombination events should be detectable by selection with the herbicide. A clone of the full-length modified gene will serve as a positive control; clones will be produced both in T-DNA vectors for Agrobacterium experiments and in plasmid constructs for electroporation.


Objective 3. Developing reverse genetics methods to study gene function.


Forward genetics is invaluable for dissecting developmental, physiological and biochemical pathways and for understanding the underlying nature of many human diseases. Because forward genetics works in the direction from phenotype to gene, success does not depend on any prior knowledge of a gene. Banks has extensive experience developing forward genetics in the fern Ceratopteris ({Banks, 1994 #18; DeYoung, 1997 #16; Eberle, 1996 #17; Banks, 1997 #15; Strain, 2001 #14}) and in studying the epigenetic regulation of McClintock's Spm transposable element in maize ({Fedoroff, 1988 #20; Masson, 1989 #22; Banks, 1988 #21}). While identifying developmental mutants in Selaginella would be interesting, the community of scientists using Selaginella has an immediate interest in studying the evolution of specific genes or gene families whose presence in Selaginella is determined using computational methods. Although specific genes can be sought in the Selaginella genome by nucleotide or amino acid homology, conservation of sequence does not by itself establish conservation of gene or protein function. At most, one can examine the expression of a gene in Selaginella as an indicator of function, or transform it into another plant to assess what effect its expression has on the transgenic plant. The latter approach was recently used to examine the functional conservation of the LEAFY (LFY) genes ({Maizel, 2005 #25}). LFY is a transcription factor necessary to induce flowering in Arabidopsis. LFY-like genes from 14 plants representing several plant lineages that diverged from the angiosperm lineage 200 to 400 Myr ago were cloned and expressed under the native LFY promoter in lfy mutant Arabidopsis plants. A gradient of LFY complementation that reflected phylogenetic distance from angiosperms was observed, with LFY genes from plants most closely related to Arabidopsis complementing more lfy phenotypes than LFY genes from more distantly related plants. LFY genes from moss (the most diverged lineage) showed no complementation of any lfy phenotype. While this observation is interesting, it does not resolve the function of LFY-like genes in the taxa from which they were isolated, and it underscores the importance of developing tools for analyzing gene function in critically selected taxa.


To investigate the evolution of genes and their functions, we propose to develop reverse genetics tools that have a high probability of enabling researchers in these efforts. Vollbrecht has extensive experience with insertional mutagenesis (Hake 1989, Vollbrecht 1991, Vollbrecht 2005) and with developing gene knockout resources for reverse genetics in plants (May BP, Liu H, Vollbrecht E, Senior L, Rabinowicz PD, Roh D, Pan X, Stein L, Freeling M, Alexander D, and Martienssen R. 2003. Maize-targeted mutagenesis: A knockout resource for maize. Proc Natl Acad Sci USA 100:11541-11546), and is currently funded to produce and characterize an insertion-based community knockout library in maize (see current support). Because the Agrobacterium-mediated transformation process described above results in random T-DNA insertions into the genome, mutations caused by T-DNA insertion can be identified and the disrupted gene readily cloned using the T-DNA sequence as a tag. While most of our initial transgenic experiments will focus on characterizing the consequences of novel gene function, we will also characterize the T-DNA insertion locations and should begin to acquire information about insertion sites and potential gene knockouts. This process of "T-DNA insertion tagging" has been used extensively in Arabidopsis to knock out essentially every gene (http://signal.salk.edu/cgi-bin/tdnaexpress). While such coverage is beyond the scope of this proposal, it is likely that with optimized transformation methods we will be able to scale up the production of transgenics to implement full-scale T-DNA tagging in Selaginella as well.
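To give a rough sense of what full-scale T-DNA tagging might involve, the sketch below applies the standard probability estimate used for insertional mutagenesis. If insertions are random, a single insertion lands in a target of length a within a genome of size g with probability a/g, so the number of independent insertions N needed to hit that target with probability P is N = ln(1 - P) / ln(1 - a/g). The 100 Mb genome size is taken from this proposal; the 2 kb target length is a hypothetical gene size, and the assumption of uniformly random insertion is an idealization (real T-DNA integration shows site preferences).

```python
import math


def insertions_needed(genome_bp: float, target_bp: float, prob: float) -> int:
    """Independent random insertions needed so that at least one falls
    within a target of length target_bp with probability prob.

    Assumes insertions are uniformly random across the genome, which is
    an idealization; real T-DNA integration is not perfectly uniform.
    """
    if not 0 < prob < 1:
        raise ValueError("prob must be strictly between 0 and 1")
    hit = target_bp / genome_bp  # chance that one insertion hits the target
    return math.ceil(math.log(1 - prob) / math.log(1 - hit))


# 100 Mb genome (from the text); 2 kb target is a hypothetical gene length.
for p in (0.50, 0.90, 0.95):
    print(f"P = {p:.2f}: ~{insertions_needed(100e6, 2e3, p):,} insertions")
```

Under these assumptions, roughly 1.5 × 10^5 independent insertions would be needed for a 95% chance of tagging any given 2 kb gene, which shows why optimized, scalable transformation is a prerequisite for genome-wide tagging.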


EMS mutagenesis and reverse genetic screens by TILLING

Selaginella moellendorffii can be propagated in two ways. One is clonal propagation through the production of bulbils, the technical term for the small (~2.5 mm), developmentally arrested meristems that form at the tip of each branch (shown in Figure 4A). Bulbils are very similar to seeds in that each bulbil is dormant and consists of arrested shoot meristems from which adult plants grow when placed on soil or agar. Each Selaginella branch produces up to 100 bulbils, which can be easily harvested, stored for at least one year and surface sterilized to generate sterile plants. The ability to produce bulbils is a unique trait of this species and is extremely useful because bulbils can be mutagenized in much the same way that Arabidopsis seeds are treated with ethyl methanesulfonate (EMS) to generate mutants. Our objective is to perform pilot studies to optimize EMS mutagenesis in this organism and then generate mutant lines of Selaginella that can be used for genetic crosses, suppressor screens and TILLING.


Given a 1C genome size of 100 Mb and a G:C content of 44%, there are ~4.4 × 10^7 bp in the Selaginella genome that are susceptible to EMS mutagenesis (EMS alkylates guanine and produces predominantly G:C to A:T transitions), a number comparable to that in Arabidopsis. Assuming that all G:C base pairs are equally sensitive to EMS and that the EMS mutation rate is similar to that of Arabidopsis (1.6 × 10^5; {Jander, 2003 #476}), methods for saturation mutagenesis developed for Arabidopsis should be applicable to Selaginella.
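The arithmetic above can be made explicit and extended to a question that matters for TILLING: how many M1 plants must be screened so that a given amplicon carries at least one EMS lesion? The sketch below treats lesions as Poisson-distributed along the screened sequence. The genome size and G:C content come from the text; the lesion density (one per 200 kb screened) and the 1.5 kb amplicon are hypothetical placeholders, to be replaced by the density actually measured in the pilot EMS experiments.

```python
import math


def gc_targets(genome_bp: float, gc_fraction: float) -> float:
    """G:C base pairs in the genome: the sites where EMS mainly acts
    (it alkylates guanine, giving mostly G:C -> A:T transitions)."""
    return genome_bp * gc_fraction


def plants_to_screen(amplicon_bp: float, bp_per_mutation: float, prob: float) -> int:
    """M1 plants to screen so that, with probability `prob`, at least one
    screened plant carries a lesion in the amplicon. Poisson model: each
    plant contributes amplicon_bp / bp_per_mutation expected lesions."""
    expected_per_plant = amplicon_bp / bp_per_mutation
    return math.ceil(-math.log(1 - prob) / expected_per_plant)


# Genome size and G:C content from the text.
print(f"G:C targets: {gc_targets(100e6, 0.44):.2e} bp")

# Hypothetical: 1 lesion per 200 kb screened, 1.5 kb amplicon, 95% confidence.
print(plants_to_screen(1_500, 200_000, 0.95))
```

With these placeholder values the function returns 400 plants; the real figure will follow from the lesion density measured in the pilot study.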


TILLING (targeting induced local lesions in genomes) is a relatively recent technique used to identify chemically induced mutations in specific gene sequences in Arabidopsis ({Comai, 2006 #26}{McCallum, 2000 #27}; {Henikoff, 2004 #473}), humans (Underhill et al. 2004), Drosophila ({Winkler, 2005 #471}), zebrafish ({Bradbury, 2004 #474}) and maize ({Till, 2004 #472}). In TILLING, DNA is isolated from mutagenized individuals, and pools of DNA are used as templates for PCR with primers designed to amplify a specific gene region of interest. After heteroduplexes between wild-type and mutant gene fragments are allowed to form by heating and cooling the DNA samples, mismatches are cleaved with CEL I and the fragments are resolved by electrophoresis. Here we propose to apply TILLING to EMS-mutagenized Selaginella. Once the optimal conditions for TILLING and EMS treatment are established, we will test this method by seeking EMS-induced mutations in Selaginella flagellar genes (previously mentioned). Mutations that are likely to result in loss of gene function will be sought. Whether these mutations cause a flagellar phenotype should be evident from the motility of Selaginella sperm.
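The CEL I readout can be illustrated with a small calculation. CEL I cuts a heteroduplex at the mismatch, so a PCR product of length L with a lesion at position x yields two fragments whose lengths sum to L. In published TILLING protocols run on LI-COR analyzers, the forward and reverse primers are typically end-labeled with different infrared dyes (IRD700 and IRD800), so each cleavage product appears in one channel and a true lesion is recognized by complementary sizes. The sketch assumes that dual-label layout, treats the cut as falling exactly at the lesion and ignores primer-length offsets; the amplicon length and position in the example are hypothetical.

```python
from typing import Tuple


def cel1_fragments(amplicon_len: int, mismatch_pos: int) -> Tuple[int, int]:
    """Sizes (bp) of the two labeled CEL I products for a mismatch at
    position `mismatch_pos`, counted from the forward-primer end.

    Simplification: the cut is placed exactly at the mismatch.
    """
    if not 1 <= mismatch_pos < amplicon_len:
        raise ValueError("mismatch must lie inside the amplicon")
    fwd_channel = mismatch_pos                 # product carrying the IRD700 forward label
    rev_channel = amplicon_len - mismatch_pos  # product carrying the IRD800 reverse label
    return fwd_channel, rev_channel


def is_consistent(fwd: int, rev: int, amplicon_len: int, tol: int = 5) -> bool:
    """A real lesion gives fragment sizes in the two channels that sum to
    the full amplicon length, within a sizing tolerance `tol`."""
    return abs((fwd + rev) - amplicon_len) <= tol


# Hypothetical 1,200 bp amplicon with a lesion at position 430.
f, r = cel1_fragments(1200, 430)
print(f, r, is_consistent(f, r, 1200))
```

The complementary-size check is what lets a real lesion be distinguished from a background band that appears in only one channel.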


To establish the efficiency of EMS mutagenesis in Selaginella, pools of 500 Selaginella bulbils will be treated with four different concentrations of EMS (0.1, 0.2, 0.4 and 0.8%) for 8 hr; 0.4% EMS is typically used in Arabidopsis ({Kim, 2006 #475}). Mutagenized bulbils will be planted, and DNA will be isolated from the leaves of individual M1 plants. Because the adult Selaginella plant is derived from a single apical cell, it is unlikely to be chimeric. DNA from four adult individuals within the same pool will be combined. Using the CODDLE interface (Comai and Henikoff 2006), gene-specific primers will be designed to amplify 10 genes that encode either radial spoke proteins or intraflagellar transport proteins and that are present as single copies in the Selaginella genome; these primers will be used in PCR reactions. After digestion with CEL I, PCR fragments will be run on LI-COR gel analyzers in the Maize Genome TILLING facility. This facility is run by Dr. Cliff Weil (Purdue University), who will collaborate with us on this aspect of the project (a letter of support from Weil is provided). If a polymorphism is detected, individual DNA samples from the appropriate pool will be processed to identify the individual carrying a mutation in that gene. The relevant PCR fragment from wild-type and mutant alleles will be amplified, cloned and sequenced to verify the nature of the mutation. Individuals with missense and truncation mutations that affect protein function will be selected. Given the 16x coverage of the Selaginella genome sequence by JGI, any polymorphisms that exist in plants prior to mutagenesis should be evident in the assembled Selaginella genome sequence, and primers will be designed to avoid amplifying fragments that contain such polymorphisms. All primers will be tested on wild-type DNA before the mutagenized populations are TILLed. Methods developed for DNA isolation and PCR in Arabidopsis work very well in Selaginella.
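The two-stage screen described above (CEL I assays on four-plant pools, followed by assays on the individual members of any positive pool) can be sketched as simple bookkeeping. Here `assay` is a hypothetical stand-in for a CEL I/LI-COR test on a DNA sample, simulated in the example by checking which plants carry a lesion; the plant names and lesion positions are invented. Counting assays shows why pooling saves work when lesions in a given amplicon are rare.

```python
from typing import Callable, List, Sequence, Tuple


def pooled_screen(samples: Sequence[str],
                  assay: Callable[[Sequence[str]], bool],
                  pool_size: int = 4) -> Tuple[List[str], int]:
    """Two-stage pooled screen.

    Stage 1: run `assay` once on each pool of `pool_size` samples.
    Stage 2: re-test every member of a positive pool individually.
    Returns the positive individuals and the total number of assays run.
    """
    hits: List[str] = []
    n_assays = 0
    for start in range(0, len(samples), pool_size):
        pool = samples[start:start + pool_size]
        n_assays += 1
        if assay(pool):  # a mismatch was detected somewhere in the pool
            for plant in pool:
                n_assays += 1
                if assay([plant]):
                    hits.append(plant)
    return hits, n_assays


# Simulation: 96 hypothetical M1 plants, two of which carry a lesion.
plants = [f"M1-{i:03d}" for i in range(96)]
mutants = {"M1-017", "M1-062"}
found, cost = pooled_screen(plants, lambda pool: any(p in mutants for p in pool))
print(found, cost)
```

In this toy case, 32 assays (24 pools plus 8 individual re-tests) replace the 96 that testing each plant separately would require.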


While M1 individuals are likely to be heterozygous at a mutant locus, the microspores derived from each M1 plant are haploid and should segregate normal:mutant spores in a 1:1 ratio. To test the TILLING process further, microspores from the M1 plant will be harvested, sterilized and added to water, and their sperm will be harvested between 2 and 4 days after inoculation. For a strong loss-of-function mutation, we predict that only half of the spores will give rise to males that produce motile sperm, because the genes targeted for TILLING encode important components of flagella in other organisms.
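Whether an observed motile:non-motile count fits the predicted 1:1 segregation can be checked with a standard chi-square goodness-of-fit test (one degree of freedom; 5% critical value 3.841). The sketch below uses only the Python standard library; the counts in the usage line are invented for illustration, and in practice non-germinating spores and scoring errors would also need to be accounted for.

```python
def chi_square_1_to_1(motile: int, non_motile: int) -> float:
    """Chi-square statistic for observed counts against a 1:1 expectation."""
    total = motile + non_motile
    if total == 0:
        raise ValueError("no gametophytes scored")
    expected = total / 2
    return sum((obs - expected) ** 2 / expected for obs in (motile, non_motile))


CRITICAL_1DF_05 = 3.841  # chi-square critical value, 1 d.f., alpha = 0.05


def fits_1_to_1(motile: int, non_motile: int) -> bool:
    """True if the counts do not deviate significantly from 1:1 at alpha = 0.05."""
    return chi_square_1_to_1(motile, non_motile) < CRITICAL_1DF_05


# Hypothetical scoring of 200 male gametophytes from one M1 plant.
print(chi_square_1_to_1(108, 92), fits_1_to_1(108, 92))
```

A plant whose counts fit 1:1 and whose non-motile class co-segregates with the TILLING lesion would support a loss-of-function allele.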


The results of these experiments will be used to: 1) determine the optimal conditions for EMS mutagenesis that yield a high mutation rate; 2) determine the nature of EMS-induced mutations; and 3) test the effectiveness of obtaining multiple mutations in a given gene in Selaginella. Once this is accomplished, larger populations of individuals (the number depending on the mutation rate) will be mutagenized and self-fertilized, and F2 vegetative bulbils from each individual will be harvested and stored. This population can then be used in future experiments to identify mutations in other Selaginella genes.


The greatest challenge in using S. moellendorffii as a genetic system is its generation time. Bulbils take 15 months to reach sexual maturity when grown in greenhouses at Purdue University. We are currently culturing plants in controlled-environment growth chambers, varying light quality, temperature and day length, in order to shorten the life cycle.



Objective 4. Providing public access to information developed in this project.


By virtue of its position in the evolutionary tree, Selaginella is well poised to shed light on important biological and developmental processes in plants. Many of these processes, such as signal transduction and gene regulation by transcription factors, are shared with animals, so the development of methods to describe and analyze these features is of broad general importance.


The goal of this portion of the project is not to develop a fully featured genome database for Selaginella, but rather to make available the specific kinds of information that will most broadly enhance the use of Selaginella as a model organism. We will provide an online Selaginella resource that makes the following information available:


A. Sequences. Raw and assembled DNA and RNA sequences, with standard sequence-searching methods (e.g., BLAST). These are available now. JGI plans to sequence additional reads, which will require assembly. Preliminary predictions of protein sequences will be made across the entire genome.

B. Preliminary gene annotation. Preliminary function predictions, with a focus on target families such as those involved in regulation, signal transduction and secondary metabolism.

C. Wiki. A wiki will be used to gather information from the community describing the use of S. moellendorffii as a model organism, including expert annotation of specific protein families and of metabolic and developmental processes. Experimental protocols for growth, DNA and RNA isolation, transformation, etc. will be included in the wiki as well.


As far as possible, this resource will reuse components that are either publicly available, such as those from GMOD [???], or developed in other local projects.


A. Sequences and preliminary gene annotation

Preliminary identification of genes and their functions (gene functional annotation) provides the "handle" that research groups need to identify genes involved in specific processes. The Selaginella resource will provide flexible methods for assembling sets of genes and retrieving their sequences. Basic analyses will also be provided (more sophisticated analyses are described below), all within an intuitive web-based framework.


To support comparative genomics, the initial Selaginella genomic sequence must be made available for comparison with relevant genomes. We plan to include currently available genomic sequences from Arabidopsis thaliana [5], Oryza sativa (rice) [6-8], Chlamydomonas reinhardtii [9], Synechocystis [10], and Cyanidioschyzon merolae [11]. The first two are angiosperms, and the latter three are a green alga, a cyanobacterium and a red alga, respectively. Selaginella is well positioned to fill the gap between these single-celled organisms and the angiosperms. Other genomic (or non-genomic) sequences will be added as they become available.



Gene Finding.

The current

Selaginella

sequence is available as nucleotide sequence
only
. Before the sequence can be broadly useful, genes must be predicted across the
entire sequence. Many programs exist for this purpose. Among the most successful to
plant genomes have been Gli
mmer, Genemark, and FGeneSH. Glimmer
[14]

uses
interpolated Markov models (IMMs) to identify coding regions and identifies ATG, GTG,
and TTG as potential start sites. Glimmer was used for prediction of genes in the
original Arabidopsis genome assembly. Genemark
[15, 16]

uses species specific
Markov models of coding and non
-
coding regions to determine coding potential.
The GeneMark.hmm-E programs predict genes and intergenic regions in a sequence as a whole, using hidden Markov models that reflect the "grammar" of gene organization. GeneMark.hmm-E is designed for eukaryotes and takes splicing into account when computing the most likely parse of the whole DNA sequence into protein-coding genes. Gene prediction algorithms typically require prior knowledge to train them for a particular organism; GeneMark uniquely has a self-training mode (GeneMark.hmm-ES) [16]. Because the Selaginella sequence has not been well studied, this is an attractive option for gene identification. FGENESH uses pattern recognition algorithms to recognize a variety of sequence signals and combines these predictions with Markov chain models of coding regions. The optimal combination of these features is then found by dynamic programming, and a set of gene models is constructed along the given sequence. FGENESH and GeneMark were recently found to be more accurate than GENSCAN, Glimmer-R, and GRAIL when tested on maize sequences [17]. We do not anticipate any insurmountable obstacles in gene identification, as it is likely that methods trained on either Arabidopsis or rice will be sufficient for initial predictions.
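GeneMark.hmm-E and FGENESH both use dynamic programming to find the most probable parse of a whole sequence. The sketch below shows the core idea with a deliberately tiny, hypothetical two-state HMM (non-coding N versus coding C, with made-up GC-biased emission probabilities) decoded by the Viterbi algorithm. Real gene finders use many more states (exons, introns, splice sites, start and stop codons) with trained parameters.

```python
import math

STATES = ("N", "C")  # hypothetical labels: N = non-coding, C = coding
START = {"N": 0.9, "C": 0.1}
TRANS = {"N": {"N": 0.95, "C": 0.05}, "C": {"N": 0.05, "C": 0.95}}
EMIT = {
    "N": {"A": 0.35, "T": 0.35, "G": 0.15, "C": 0.15},  # made-up AT-rich background
    "C": {"A": 0.15, "T": 0.15, "G": 0.35, "C": 0.35},  # made-up GC-rich "coding"
}


def viterbi(seq):
    """Most probable state path for seq, computed in log space to avoid underflow."""
    v = [{s: math.log(START[s]) + math.log(EMIT[s][seq[0]]) for s in STATES}]
    back = []
    for base in seq[1:]:
        column, pointers = {}, {}
        for s in STATES:
            # Best predecessor state for being in state s at this position.
            prev = max(STATES, key=lambda p: v[-1][p] + math.log(TRANS[p][s]))
            column[s] = v[-1][prev] + math.log(TRANS[prev][s]) + math.log(EMIT[s][base])
            pointers[s] = prev
        v.append(column)
        back.append(pointers)
    # Trace back from the best final state.
    state = max(STATES, key=lambda s: v[-1][s])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    return "".join(reversed(path))


seq = "AT" * 5 + "GC" * 10 + "AT" * 5
path = viterbi(seq)
```

The transition probabilities act as a penalty on switching states, so short GC-rich blips stay labelled non-coding while long ones are called coding.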


B. Gene Function Annotation.


Many comparisons require a preliminary assignment of protein function. We will use conventional high-throughput methods, including sequence comparison with BLAST [18] and identification of sequence motifs by comparison with the PROSITE [19, 20] and Pfam [21, 22] libraries. Based on these results, Gene Ontology (GO) terms [23] will be assigned, using the existing assignments made to the Arabidopsis and rice genomes as templates. Assignment to other useful functional groupings, such as Clusters of Orthologous Groups (COG) [24], and to metabolic pathways using KEGG [25] and MetaCyc [26] will be performed similarly.
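One way to realize template-based GO assignment is sketched below, assuming BLAST was run with tabular output (BLAST+ -outfmt 6, equivalent to legacy blastall -m 8, whose 11th and 12th columns are E-value and bit score) against an Arabidopsis or rice protein set with known GO terms. The gene IDs, GO placeholders, E-value cutoff, and best-hit rule are illustrative assumptions; transferred terms are tagged IEA, the GO evidence code for electronic annotation.

```python
import csv
import io


def best_hits(blast_tab, max_evalue=1e-10):
    """Highest-bit-score subject per query from tabular BLAST output, after an E-value filter."""
    best = {}
    for row in csv.reader(io.StringIO(blast_tab), delimiter="\t"):
        if not row:
            continue
        query, subject = row[0], row[1]
        evalue, bitscore = float(row[10]), float(row[11])
        if evalue > max_evalue:
            continue
        if query not in best or bitscore > best[query][1]:
            best[query] = (subject, bitscore)
    return {query: subject for query, (subject, _) in best.items()}


def transfer_go(blast_tab, template_go, max_evalue=1e-10):
    """Copy GO terms from each query's best template hit, tagged with evidence code IEA."""
    annotations = {}
    for query, subject in best_hits(blast_tab, max_evalue).items():
        terms = template_go.get(subject, set())
        if terms:
            annotations[query] = {(term, "IEA") for term in terms}
    return annotations


# Hypothetical hits: qseqid sseqid pident length mismatch gapopen qstart qend sstart send evalue bitscore
rows = [
    ["Sm_g001", "At_geneA", "80.0", "300", "60", "0", "1", "300", "1", "300", "1e-50", "400"],
    ["Sm_g001", "At_geneB", "40.0", "300", "180", "2", "1", "300", "1", "300", "1e-20", "150"],
    ["Sm_g002", "At_geneB", "30.0", "100", "70", "1", "1", "100", "1", "100", "0.5", "30"],
]
blast_tab = "\n".join("\t".join(r) for r in rows)
template_go = {"At_geneA": {"GO:A"}, "At_geneB": {"GO:B"}}  # placeholder terms
annotations = transfer_go(blast_tab, template_go)
```

Sm_g002 receives no terms because its only hit fails the E-value cutoff.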


Comparative Genomics.

As described above, Selaginella is a powerful source of comparative information that sheds light on the evolution of fundamental biological processes and development in plants.


Our goal is not to develop a fully featured genome database, but to provide research groups with the tools to extract sets of genes of interest and to make basic comparisons with other plant genomes.


Basic Analyses



• BLAST [18] search (Selaginella genomic and EST sequences, Arabidopsis, rice, and others as required)
• Multiple sequence alignment (Clustal [27] and MUSCLE [28])
• Trees and gene clusters (phylogenetic trees [29-31] and MCL [32])
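MCL [32] clusters a similarity graph by alternating expansion (squaring the column-stochastic flow matrix) and inflation (raising entries to a power and renormalizing columns) until the matrix converges. The pure-Python sketch below is a toy version for small gene-similarity matrices; in practice the input would come from all-against-all BLAST scores, and the self-loop step, inflation value, and attractor-based cluster read-out are common choices rather than necessarily those of the mcl program itself.

```python
def normalize_columns(m):
    """Scale each column to sum to 1 (column-stochastic), in place."""
    n = len(m)
    for j in range(n):
        total = sum(m[i][j] for i in range(n))
        if total:
            for i in range(n):
                m[i][j] /= total
    return m


def matmul(a, b):
    """Plain square matrix product."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def mcl(sim, inflation=2.0, max_iter=100, tol=1e-6):
    n = len(sim)
    # Add self-loops (a common MCL preprocessing step), then normalize.
    m = normalize_columns(
        [[float(sim[i][j]) + (1.0 if i == j else 0.0) for j in range(n)]
         for i in range(n)])
    for _ in range(max_iter):
        expanded = matmul(m, m)  # expansion: flow along longer paths
        inflated = normalize_columns(
            [[x ** inflation for x in row] for row in expanded])  # inflation
        delta = max(abs(inflated[i][j] - m[i][j])
                    for i in range(n) for j in range(n))
        m = inflated
        if delta < tol:
            break
    # Rows that keep mass are attractors; each attractor's non-zero columns form a cluster.
    clusters = []
    for i in range(n):
        members = sorted(j for j in range(n) if m[i][j] > 1e-6)
        if members and members not in clusters:
            clusters.append(members)
    return clusters


# Two hypothetical gene families with no similarity between them.
sim = [
    [0, 1, 1, 0, 0, 0],
    [1, 0, 1, 0, 0, 0],
    [1, 1, 0, 0, 0, 0],
    [0, 0, 0, 0, 1, 1],
    [0, 0, 0, 1, 0, 1],
    [0, 0, 0, 1, 1, 0],
]
clusters = mcl(sim)
```

Higher inflation values generally produce finer-grained clusters.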


In addition to the preliminary gene annotation described above, we will focus on particular features of the Selaginella genome that are less commonly examined at the early stages of a genome project. Because Selaginella is a basal model organism, it is useful to focus on features that may be involved in the regulatory changes that separate it from the angiosperms. We will make a particular effort to identify repeated sequence elements, non-coding RNAs, and regulatory factors and their binding sites. Pipelines developed to identify such features will be generally useful for any model organism genome.


Repeated and Transposable Elements

Many repetitive sequences can be identified using a combination of the RepeatMasker program [33] and class-specific hidden Markov models. We will be guided by the suggestions of Jeretic et al. [34] in applying these approaches, and will use the combined-evidence system of Quesneville et al. [35] to improve the quality of predictions.
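RepeatMasker [33] identifies repeats by comparison with libraries of known repeat consensus sequences. As a library-free illustration only (a different and much cruder technique), the sketch below counts k-mers on both strands and soft-masks (lower-cases) those occurring at least a chosen number of times. The k-mer length, copy threshold, and example sequence are hypothetical.

```python
from collections import Counter

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def canonical(kmer):
    """Smaller of a k-mer and its reverse complement, so both strands count together."""
    return min(kmer, kmer.translate(COMPLEMENT)[::-1])


def repeat_seeds(seq, k=12, min_copies=3):
    """Canonical k-mers seen at least min_copies times (windows containing N are skipped)."""
    seq = seq.upper()
    counts = Counter(
        canonical(seq[i:i + k])
        for i in range(len(seq) - k + 1)
        if "N" not in seq[i:i + k]
    )
    return {kmer: n for kmer, n in counts.items() if n >= min_copies}


def soft_mask(seq, seeds, k=12):
    """Lower-case every base covered by a seed k-mer."""
    seq = seq.upper()
    out = list(seq)
    for i in range(len(seq) - k + 1):
        if canonical(seq[i:i + k]) in seeds:
            for j in range(i, i + k):
                out[j] = out[j].lower()
    return "".join(out)


# Hypothetical 12-bp element present twice forward and once reverse-complemented.
element = "GATTACAGGCTA"
rc_element = element.translate(COMPLEMENT)[::-1]
genome = "CCCCC" + element + "TTTTT" + element + "AAAAA" + rc_element + "GGGGG"
seeds = repeat_seeds(genome, k=12, min_copies=3)
masked = soft_mask(genome, seeds, k=12)
```

Candidate seeds found this way would still need extension, clustering into families, and classification before they are comparable to a curated repeat library.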



miRNAs and Other Noncoding RNAs

MicroRNA (miRNA) is a recently discovered class of non-coding RNA that regulates gene and protein expression in plants and animals. miRNAs have been identified by a variety of computational methods [36-40]; in general, these methods search for long hairpin structures and take GC content into account. miRNAs have also been demonstrated in Selaginella (JODY knows the reference). Given the evolutionary divergence between Selaginella and the angiosperms (e.g., rice and Arabidopsis), it is particularly interesting to ask whether miRNAs differ in Selaginella, to what extent they regulate the same (or corresponding) genes, and how they have changed over such long evolutionary times. In this regard, the work of Wang et al. [41, 42], in which miRNAs were computationally predicted in both rice and Arabidopsis, will provide a particularly useful comparison.
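The hairpin and GC-content criteria can be illustrated with a deliberately naive filter that counts contiguous base pairs (Watson-Crick plus G-U wobble) closing inward from the ends of a candidate precursor. Actual miRNA predictors fold candidates by free-energy minimization and tolerate bulges and mismatches; the thresholds here (18 pairs, 30-70% GC, minimum loop of 3 nt) are assumptions for illustration.

```python
PAIRS = {("G", "C"), ("C", "G"), ("A", "U"), ("U", "A"), ("G", "U"), ("U", "G")}
RNA_COMPLEMENT = str.maketrans("ACGU", "UGCA")


def reverse_complement_rna(rna):
    return rna.translate(RNA_COMPLEMENT)[::-1]


def gc_content(rna):
    rna = rna.upper()
    return (rna.count("G") + rna.count("C")) / len(rna) if rna else 0.0


def stem_length(rna, min_loop=3):
    """Consecutive pairs closing from both ends inward; stops at a mismatch or the loop."""
    rna = rna.upper().replace("T", "U")
    i, j, pairs = 0, len(rna) - 1, 0
    # Pair (i, j) is allowed only if it still encloses at least min_loop unpaired bases.
    while j - i - 1 >= min_loop and (rna[i], rna[j]) in PAIRS:
        pairs += 1
        i += 1
        j -= 1
    return pairs


def is_candidate_hairpin(rna, min_pairs=18, gc_range=(0.3, 0.7)):
    gc = gc_content(rna)
    return stem_length(rna) >= min_pairs and gc_range[0] <= gc <= gc_range[1]


# Hypothetical precursor: a 20-nt stem, a UUCG loop, and the stem's reverse complement.
stem = "GCAU" * 5
hairpin = stem + "UUCG" + reverse_complement_rna(stem)
```

A real pipeline would scan genomic windows, fold each with a dedicated RNA folding tool, and compare candidates with known plant miRNA families.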


Detecting Known Regulatory (Binding) Sites

There are numerous databases of transcription factors and their binding sites, including several with plant-specific information [43-45]. Because only a limited number of plant transcriptional regulatory sites are known, searching for sites upstream of similar genes, using either pattern recognition methods (see below) or sequence comparison methods such as rVISTA [46], is likely to be particularly useful.
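For known sites, a common approach is to summarize aligned binding-site instances as a position weight matrix (PWM) of log-odds scores and scan upstream regions on both strands. The sketch below assumes a uniform background and uses made-up example sites; real matrices would come from the plant-specific databases cited above, and the score threshold is arbitrary.

```python
import math

BASES = "ACGT"
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def build_pwm(sites, pseudocount=0.5, background=None):
    """Log2-odds matrix from equal-length aligned sites; uniform background by default."""
    background = background or {b: 0.25 for b in BASES}
    total = len(sites) + len(BASES) * pseudocount
    pwm = []
    for pos in range(len(sites[0])):
        column = [site[pos] for site in sites]
        pwm.append({b: math.log2((column.count(b) + pseudocount) / total / background[b])
                    for b in BASES})
    return pwm


def score(window, pwm):
    return sum(col[b] for col, b in zip(pwm, window))


def scan(seq, pwm, threshold):
    """(start, strand, score) for every window scoring >= threshold on either strand."""
    width = len(pwm)
    hits = []
    for i in range(len(seq) - width + 1):
        window = seq[i:i + width]
        if any(b not in BASES for b in window):
            continue
        for strand, site in (("+", window), ("-", window.translate(COMPLEMENT)[::-1])):
            s = score(site, pwm)
            if s >= threshold:
                hits.append((i, strand, round(s, 2)))
    return hits


known_sites = ["TTGACC", "TTGACT", "TTGACC", "CTGACC"]  # made-up aligned sites
pwm = build_pwm(known_sites)
hits = scan("GGGGGTTGACCGGGGG", pwm, threshold=5.0)
```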




Labeling genes with their Arabidopsis and rice cognates, and thereby with GO terms, will allow the identification of sets of genes involved in a variety of biological systems or sharing a common molecular function. Construction of such sets is a key feature that will be implemented in the Selaginella Resource. While the functional labeling of genes may not be perfect, our growing understanding of Arabidopsis and rice allows targeted searches.
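Because GO is a directed acyclic graph, a gene annotated with a specific term also belongs to every ancestor term, so gene sets are usually built by propagating annotations upward. A minimal sketch, assuming a hypothetical parent map and placeholder term names rather than real GO identifiers:

```python
from collections import defaultdict


def ancestors(term, parents):
    """All terms reachable from term via parent (is_a/part_of) edges, including itself."""
    seen, stack = set(), [term]
    while stack:
        t = stack.pop()
        if t not in seen:
            seen.add(t)
            stack.extend(parents.get(t, ()))
    return seen


def gene_sets(annotations, parents):
    """Map each term to the genes annotated with it or with any of its descendants."""
    sets = defaultdict(set)
    for gene, terms in annotations.items():
        for term in terms:
            for t in ancestors(term, parents):
                sets[t].add(gene)
    return sets


# Placeholder ontology fragment and annotations (not real GO IDs).
parents = {
    "GO:leaf_development": ["GO:organ_development"],
    "GO:root_development": ["GO:organ_development"],
    "GO:organ_development": ["GO:development"],
}
annotations = {
    "Sm_g001": {"GO:leaf_development"},
    "Sm_g002": {"GO:root_development"},
    "Sm_g003": {"GO:organ_development"},
}
sets = gene_sets(annotations, parents)
```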


There are two basic approaches in use for the discovery of unknown binding sites: probabilistic methods, such as MEME/MCAST [47, 48] and the Gibbs sampler [45, 49-51], and combinatorial methods, such as YMF [52, 53], Winnower [54, 55], and PROJECTION [56]. Both approaches rely on selecting groups of sequences that appear to be co-regulated on the basis of expression array experiments, and searching for subsequences that are over-represented in the upstream regions of those genes (or operons). Methods such as MCAST [57], which combine multiple sites to describe complex sites, may also be useful.
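The over-representation idea behind combinatorial methods can be sketched by scoring every k-mer in a set of putatively co-regulated upstream regions against a background model. This version is much simpler than YMF: it assumes an independent-bases (zero-order) background estimated from hypothetical background sequences, and a binomial variance that ignores overlapping occurrences.

```python
import math
from collections import Counter
from itertools import product


def base_freqs(seqs):
    counts = Counter(b for s in seqs for b in s if b in "ACGT")
    total = sum(counts.values())
    return {b: counts[b] / total for b in "ACGT"}


def kmer_zscores(targets, background, k):
    """z-score of each k-mer's count in targets versus an i.i.d. background model."""
    freqs = base_freqs(background)
    positions = sum(max(len(s) - k + 1, 0) for s in targets)
    observed = Counter(s[i:i + k] for s in targets for i in range(len(s) - k + 1))
    scores = {}
    for letters in product("ACGT", repeat=k):
        kmer = "".join(letters)
        p = math.prod(freqs[b] for b in kmer)
        if p == 0 or p == 1:
            continue
        expected = positions * p
        variance = positions * p * (1 - p)
        scores[kmer] = (observed[kmer] - expected) / math.sqrt(variance)
    return scores


def top_motifs(targets, background, k, n=3):
    scores = kmer_zscores(targets, background, k)
    return sorted(scores, key=scores.get, reverse=True)[:n]


# Hypothetical upstream regions sharing a planted 6-mer, and a uniform background.
targets = ["AAGTCAGGTT", "CCGTCAGGAA", "TTGTCAGGCC", "GGGTCAGGTA"]
background = ["ACGT" * 50]
best = top_motifs(targets, background, k=6, n=1)
```

Real promoter sets would require a higher-order background model and a correction for the thousands of k-mers tested.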


C. Community Access & Selaginella Resource

All data are publicly available at the Selaginella web site (http://Selaginella.genomics.purdue.edu). While construction of a database is not the main purpose of this proposal, we will reuse existing database designs and software [1] and the facilities of the Purdue University Computational Genomics Core Facility to make the sequence and derived information publicly accessible and searchable. The Selaginella web site currently provides BLAST searches of the EST and genomic sequence assemblies and incorporates a Wiki [58-62] as a forum for community feedback, interaction, and community-based expert curation. Wikis have numerous advantages:



• Works off-the-shelf
• No software cost
• Organized as hyperlinked text
• Built-in search capability
• Allows user contributions as well as curated articles


The Selaginella Resource is based on existing functional genomics databases developed in the Gribskov group [1, 63]. These databases support community modeling of annotation and integration with functional genomic information such as microarray and proteomic data. This family of databases has been under development since 1999 and has a well-developed database and software infrastructure. The overall architecture of the system is divided into three parts: a persistence layer, an object-relational layer, and an application/display layer.

[Figure: Architecture of the Selaginella Resource. User; Display Layer (HTML, Perl CGI); Object-Relational Layer (Perl .pm packages); Persistence Layer (MySQL).]

The persistence layer is implemented using MySQL [64], a robust open-source relational database system. The object-relational layer is a set of Perl packages totaling about 20,000 lines of code; it maps the tables and attributes of the persistence layer onto programming objects. Because (nearly) all SQL code is encapsulated in the object-relational layer, the applications are relatively immune to small revisions in the database schema or, in principle, to a change of RDBMS. The display layer mediates the interaction between the database system and the outside world. At present it consists mostly of Perl CGI (Common Gateway Interface) scripts that are invoked via a web browser.
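The layering can be illustrated in miniature. This sketch substitutes Python for Perl and the standard-library sqlite3 module for MySQL, so it is not the group's code; the table, class, and function names are hypothetical. The point is that SQL appears only in the object-relational class, so the display function is unaffected by schema changes.

```python
import html
import sqlite3
from dataclasses import dataclass


@dataclass
class Gene:
    gene_id: str
    name: str
    sequence: str


class GeneStore:
    """Object-relational layer: the only code that contains SQL."""

    def __init__(self, conn):
        self.conn = conn
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS gene "
            "(gene_id TEXT PRIMARY KEY, name TEXT, sequence TEXT)"
        )

    def save(self, gene):
        self.conn.execute(
            "INSERT OR REPLACE INTO gene (gene_id, name, sequence) VALUES (?, ?, ?)",
            (gene.gene_id, gene.name, gene.sequence),
        )
        self.conn.commit()

    def get(self, gene_id):
        row = self.conn.execute(
            "SELECT gene_id, name, sequence FROM gene WHERE gene_id = ?", (gene_id,)
        ).fetchone()
        return Gene(*row) if row else None


def render_gene(gene):
    """Display layer: works only with Gene objects, never with SQL."""
    return (f"<h1>{html.escape(gene.gene_id)}</h1>"
            f"<p>{html.escape(gene.name)}</p>"
            f"<pre>{html.escape(gene.sequence)}</pre>")


# Persistence layer: an in-memory SQLite database stands in for MySQL here.
store = GeneStore(sqlite3.connect(":memory:"))
store.save(Gene("Sm_g001", "hypothetical protein", "ATGGCC"))
page = render_gene(store.get("Sm_g001"))
```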


Evidence Codes

Annotation must follow standard procedures for providing evidence codes that identify the source of each annotation. We will use evidence codes borrowed from the GO project. GO currently uses the following codes:



• IEA: inferred from electronic annotation
• IC: inferred by curator
• IDA: inferred from direct assay
• IEP: inferred from expression pattern
• IGI: inferred from genetic interaction
• IMP: inferred from mutant phenotype
• IPI: inferred from physical interaction
• ISS: inferred from sequence or structural similarity
• NAS: non-traceable author statement
• ND: no biological data available
• TAS: traceable author statement
• NR: not recorded

Each of these codes covers specific types of evidence. For instance, IPI is used when the evidence is “2-hybrid interactions”, “Co-purification”, “Co-immunoprecipitation”, or “Ion/protein binding experiments”, but the actual experiment type is not recorded. As experimental data, analyses, and predictions are returned by the projects, these codes will be enhanced by detailed subcodes that specify the type of experiment or analysis (as in the types enumerated for the GO codes, but explicitly recorded as a subcode). This experiment/analysis subcode will be designed to allow the user to trace an annotation to the analysis that supports it, together with the associated literature citation and metadata describing the actual experiment. As described above, the current database schema already implements such a system for computationally predicted features; each feature is related to a specific.
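A minimal sketch of how an annotation record could carry both a GO evidence code and an explicit experiment subcode, with a citation and free-form metadata. The subcode vocabulary shown (only the four IPI experiment types listed above) and the validation rule are assumptions about how such a schema might look, not the existing database design.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Optional


class Evidence(Enum):
    IEA = "inferred from electronic annotation"
    IC = "inferred by curator"
    IDA = "inferred from direct assay"
    IEP = "inferred from expression pattern"
    IGI = "inferred from genetic interaction"
    IMP = "inferred from mutant phenotype"
    IPI = "inferred from physical interaction"
    ISS = "inferred from sequence or structural similarity"
    NAS = "non-traceable author statement"
    ND = "no biological data available"
    TAS = "traceable author statement"
    NR = "not recorded"


# Hypothetical subcode vocabulary: experiment types allowed under each code.
SUBCODES = {
    Evidence.IPI: {"2-hybrid interactions", "Co-purification",
                   "Co-immunoprecipitation", "Ion/protein binding experiments"},
}


@dataclass
class Annotation:
    gene_id: str
    go_term: str
    code: Evidence
    subcode: Optional[str] = None
    citation: Optional[str] = None
    metadata: dict = field(default_factory=dict)

    def __post_init__(self):
        # Reject subcodes that are not defined for this evidence code.
        allowed = SUBCODES.get(self.code, set())
        if self.subcode is not None and self.subcode not in allowed:
            raise ValueError(f"subcode {self.subcode!r} not valid for {self.code.name}")


annotation = Annotation("Sm_g001", "GO:A", Evidence.IPI, subcode="Co-purification",
                        citation="hypothetical citation", metadata={"bait": "Sm_g002"})
```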


Data Release and Intellectual Property Policies

All data and software produced by this project will be made available to the academic research community without charge. Software will be made available in source-code form, and data will be provided using approved data standards. Data generated by the project will be made available via FTP immediately after preliminary processing (e.g., background adjustment and normalization).