Review for Online Journal of Bioinformatics

REVIEWERS PRECIS


In silico analysis of PCR amplified DOf (DNA binding with one finger) transcription
factor domains and cloned Genes from cereals and millets


Reviewer 1:
The submitted paper describes the identification of DNA binding with one finger (DOF)
domain genes and the PCR amplification of DOF domains from 11 cereals and several
varieties of millet. The authors describe a process of in silico analysis of the
sequences of the DOF genes and domains, making use of a variety of publicly available
bioinformatics applications for phylogenetic and functional analysis and motif detection.
The evolutionary relationships between the genes and among the domain sequences
are explored, and functional significance of some motifs discussed.


Overall, this work represents an effort to explore the relationships among the DOF
family of genes across a number of cereals and millets. Interestingly, the authors are
able to identify a motif in the DOF gene sequences associated with a particular class of
biological function. However, while a number of bioinformatics approaches have been
applied, the results of these analyses have not been well integrated; better integration
of the phylogenetic and functional analyses would result in a much stronger manuscript.
The manuscript would also benefit from the inclusion of further discussion of the
biological significance of the bioinformatics findings.


Reviewer 2:

The text was returned on 28th Nov, 2008 with the recommended changes and in my view
this paper is now acceptable for publication.



Details below:

Review for Online Journal of Bioinformatics

In silico analysis of PCR amplified DOf (DNA binding with one finger) transcription
factor domains and cloned Genes from cereals and millets

Hariom Kushwaha, Nidhi Gupta, Vinay Kumar Singh, Anil Kumar and Dinesh Yadav*


The submitted paper describes the identification of DNA binding with one finger (DOF)
domain genes and the PCR amplification of DOF domains from 11 cereals and several
varieties of millet. The authors describe a process of in silico analysis of the sequences of
the DOF genes and domains, making use of a variety of publicly available bioinformatics
applications for phylogenetic and functional analysis and motif detection. The
evolutionary relationships between the genes and among the domain sequences are
explored, and functional significance of some motifs discussed.


Overall, this work represents an effort to explore the relationships among the DOF family
of genes across a number of cereals and millets. Interestingly, the authors are able to
identify a motif in the DOF gene sequences associated with a particular class of
biological function. However, while a number of bioinformatics approaches have been
applied, the results of these analyses have not been well integrated; better integration of
the phylogenetic and functional analyses would result in a much stronger manuscript. The
manuscript would also benefit from the inclusion of further discussion of the biological
significance of the bioinformatics findings.


General comments:

The paper contains grammatical errors too numerous to be itemized here. The paper
should be carefully revised by the authors or a competent editor to remove these
errors prior to re-submission.


Abbreviation of the DOF domain is not consistent throughout the manuscript: it is
written as DOF, Dof, and dof. Likewise, abbreviation of PBF-DOF is not consistent,
written as PBF-Dof, PBF dof and PBF-dof. Use of abbreviations must be consistent
throughout the document. All abbreviations, including the abbreviated names of
domains, should be written out in full on the first use of the term, and then followed
in text by the abbreviation. The abbreviation should then be used in the remainder of
the manuscript.
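
A consistency check of this kind is easy to automate. The following is a minimal sketch,
not a tool used by the authors or the journal: the function name find_variants and the
sample sentence are hypothetical, and the sketch assumes that variants differ only in
letter case and in the hyphen or space joining the parts of a term.

```python
import re
from collections import Counter

def find_variants(text, terms):
    """Count each spelling of every term in `terms` found in `text`.

    Spellings are treated as variants of the same term when they differ only
    in letter case or in the hyphen/space between parts (assumption).
    """
    # Try longer terms first so "PBF-DOF" is not counted as a bare "DOF".
    ordered = sorted(terms, key=len, reverse=True)
    alternatives = []
    for term in ordered:
        parts = re.split(r"[-\s]+", term)
        alternatives.append(r"[-\s]?".join(re.escape(p) for p in parts))
    pattern = re.compile(r"\b(" + "|".join(alternatives) + r")\b", re.IGNORECASE)

    def normalise(s):
        return re.sub(r"[-\s]", "", s).upper()

    variants = {term: Counter() for term in terms}
    for match in pattern.finditer(text):
        found = match.group(1)
        for term in terms:
            if normalise(term) == normalise(found):
                variants[term][found] += 1
    return variants

# Hypothetical sample text, written only to exercise the function.
sample = "The Dof domain and DOF genes; PBF-Dof, PBF dof and PBF-dof differ from dof."
for term, spellings in find_variants(sample, ["DOF", "PBF-DOF"]).items():
    print(term, dict(spellings))
```

Any term reported with more than one spelling is a candidate for harmonisation.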

The term “homology” has a specific meaning: that the biological entities in question are
descended from a common ancestor. Homology is not a synonym for similarity,
identity, or other attributes that might indicate a homologous relationship between
entities. Care should be taken throughout the manuscript to ensure that this term is
used accurately, and it should be replaced by a more appropriate term such as
“similarity” if this more accurately reflects the property under discussion.

For example, sentence 4 of paragraph 1 of the Introduction currently reads:

“In spite of intensive homology in the Dof domain, the rest of the amino acid sequences
in the proteins are divergent, coinciding with their expected diverse functions.”

It may be more accurate to say:

“While the DOF domain is highly conserved, the rest of the
amino acid sequence of DOF proteins is divergent…”
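
To illustrate the distinction: identity and similarity are quantities measured on an
alignment, whereas homology is an inference about ancestry that such measurements may
support. The sketch below computes percent identity for two pre-aligned sequences. The
function name and the two toy strings are hypothetical and are not DOF sequences; skipping
gapped columns is one convention among several.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent of identical residues over columns where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    identical = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == gap or b == gap:
            continue  # gapped columns are excluded under this convention
        compared += 1
        if a == b:
            identical += 1
    if compared == 0:
        return 0.0
    return 100.0 * identical / compared

# Toy aligned strings, invented for illustration only.
identity = percent_identity("CPRC-DS", "CPRCADT")
print(f"{identity:.1f}% identity")
```

A statement such as "the domains share 83% identity" reports a measurement; "the domains
are homologous" is a conclusion about common descent drawn from it.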

Major issues:

1. The structure of the DOF domain should be explained in greater detail, preferably
with reference to a diagram representing the structure of the domain. The
description provided in the Introduction is ambiguous:

“Dof proteins contain about 200-400 amino acids that have a highly conserved
DNA binding domain that is composed of 52 amino acid residues that contain a
single zinc finger”

It is unclear from this what length a DOF domain might have, whether the DOF
domain is the 52-residue DNA binding domain, or whether there are elements other
than a zinc finger that are part of the DOF domain. The typical location of the
DOF domain in DOF family proteins should also be discussed. Is the domain
typically found at the N-terminus of proteins? If so, this will have implications for
the identification of DOF domains in sequence fragments lacking the N-terminal
region of the sequence.

The difference between a standard DOF domain and the PBF-DOF domain should
be explained, preferably with reference to a diagram showing the differences in
the domain structure or consensus sequences.

2. The bioinformatics analyses presented in this manuscript should be integrated.
For example, the variable presence of motifs detected by MEME across domain
and gene sequences should be presented with respect to the phylogenetic tree
obtained for the sequences: instead of ordering the sequences in Figures 2A and
6A by the MEME e-value, they could be ordered according to the clustering
obtained from phylogenetic analysis. This would make it clear if particular motifs
are present in certain groupings of species but absent from others. Likewise, the
motifs could be highlighted in the multiple sequence alignment images (see point
6 below).

3. It is not clear from the Introduction or Discussion if distinct sub-families of DOF
domains are detected in multiple sequence alignment, or if the sequences cluster
according to the species tree for these plants. This point should be clarified: is
there variation that exists within species across sub-families of DOF proteins, or
is the variation found in domain sequences across species? The evolutionary
relationship between DOF and PBF-DOF domain sequences in particular should
be explored.

4. The domain sequence length given in Tables 1 and 2 ranges from 70 bp to 409 bp,
representing a range of approximately 23-136 amino acids. How does this relate
to the 52-residue size identified on page 2 in paragraph 1 of the Introduction? It
should be made clear if this variation in length represents diversity in the size of
the DOF domain, or has been caused by partial amplification of the regions of
interest or sequencing of regions flanking the DOF domain. The difference
between the expected amplicon size and the sequenced products should also be
discussed.

5. There is no reference to the source of the accession numbers used in this
manuscript. If they are internal identifiers, this should be explained; otherwise, the
source database of the accession numbers should be clearly identified. Also, while
only EU accession numbers are cited in Table 2, Figures 1, 2, 4, 5 and 6 all
contain sequences identified by GenBank accession numbers (gi and gb prefixes).
Accession numbers used to identify sequences should be consistent throughout
the manuscript, and preferably be taken from a well-known publicly accessible
resource such as GenBank.

6. The significance of the motifs found in the DOF domain and DOF gene sequences
should be discussed. The biological relevance of Motif 1 found in the gene
sequences is explained on page 15, and this type of explanation should be
attempted for the motifs found in the domain sequences, as well as the other
motifs found in the gene sequences. Alternatively, if no biological significance
can be found for the other motifs, this should be clearly stated. The possibility that
Motif 7 (Figure 6 and Table 5) is a fragment of Motif 1 present in incompletely
sequenced products should be explored. Is Motif 7 found in the same region of
sequence as Motif 1? Highlighting the motifs discovered by MEME in a multiple
sequence alignment might indicate whether Motif 7 is a fragment of Motif 1.

7. The multiple sequence alignments are described as covering the DOF domains of
the aligned sequences; however, it is clear that the alignments do not cover the
entire region of the DOF domain. The whole alignment should be presented. For
example, the alignment in Figure 1 only covers ~30 residues of the DOF domain. A
consistent style of presentation should be used for the multiple sequence
alignments in Figures 1 and 4.

8. The significance of the partial DOF domains (missing two cysteine residues of the
zinc finger) should be discussed: do these partial domains represent true
variation in the DOF domain, and if so, what are the likely consequences regarding
the function of the protein? If this is not the case, is the partial DOF domain likely
to be caused by partial sequencing or incomplete amplification of the region of
interest? These possibilities should be discussed in the Results and Discussion
section.
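
The length arithmetic behind point 4 above can be made explicit. The following is only a
sketch: it assumes each sequence is read in frame from its first base and contains no
introns, the function names are hypothetical, and the 52-residue default is the domain
size quoted from the Introduction.

```python
# Sketch only: assumes an in-frame, intron-free sequence starting at the first base.
CODON_LENGTH = 3

def residues_from_bp(length_bp: int) -> int:
    """Number of complete codons (amino acid residues) encoded by length_bp bases."""
    return length_bp // CODON_LENGTH

def could_span_domain(length_bp: int, domain_residues: int = 52) -> bool:
    """True if the sequence is long enough to encode the whole domain.

    This is a necessary condition only: a long sequence may still consist
    largely of flanking region and miss part of the domain.
    """
    return residues_from_bp(length_bp) >= domain_residues

for length_bp in (70, 409):
    print(length_bp, "bp ->", residues_from_bp(length_bp), "residues;",
          "could span domain:", could_span_domain(length_bp))
```

Under these assumptions a 52-residue domain requires at least 156 bp, so a 70 bp sequence
(23 residues) cannot contain the complete domain, while a 409 bp sequence (136 residues)
must include material beyond it.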


Minor issues:

1. Listing the species and accession numbers in the Abstract presents an unnecessary
level of detail. The second sentence would be more concisely written as:

“The DOF domain sequences of different cereals were subjected to homology
search, multiple sequence alignment and motif analysis.”

2. Page 3, paragraph 1, sentence 2 is unclear: “Because hierarchy organization of
genes reflects an ancient process of gene duplication and divergence, many of the
theoretical and analytical tools of the phylogenetic systematics can be utilized in
comparative genomics”. What “hierarchy organization” is referred to in this
statement? Do the authors intend to refer to the hierarchical clustering generally
seen in phylogenetic trees? This statement should be rewritten for greater clarity.

3. Species names should be written in full on the first use, and then abbreviated
subsequently; e.g. Arabidopsis thaliana is abbreviated to A. thaliana in paragraph
1 of page 3, whereas Oryza sativa is not.

4. On page 3 the authors describe the manuscript of Moreno-Risueno et al., 2007 as
an analysis of DOF sequences across representative organisms belonging to both
monocots and dicots; however, the cited work also covers species such as green
algae, moss, and ferns, and is therefore inaccurately described as a comparison of
monocots and dicots.

5. Table 2 has a misaligned column.

6. The full names of the databases on page 6 should be given, followed by the
abbreviation only if the database will be referred to subsequently in the
manuscript.

7. Database versions for the frequently updated database tools Pfam, PROSITE and
InterProScan need to be provided to indicate which versions of these resources
were used in the analysis.

8. The applications from the CBS prediction servers used in the analysis should be
listed, since this prediction server contains many applications.

9. Parameters used with all prediction programs should be recorded in the Materials
and Methods section. If default parameters are used, this should be specified, as is
done in the Results and Discussion section for MEME. Parameters used in
running MEME should be described in the Materials and Methods section rather
than in Results and Discussion.

10. Page 7, paragraph 1, sentence 3 is unclear:

“Further the deduced nucleotide sequences translated was confirmed by subjecting
the nucleotide sequences to gene finding software namely GENESCAN and FGENESH
to find the putative CDS and protein sequence.”

The authors may have meant to say:

“Further, the deduced protein sequences so translated were confirmed by subjecting
the nucleotide sequences to gene finding software…”

This should be clarified, and if necessary corrected.

11. Data on Figures 2A and 2B could be combined for more concise representation of
results: the number of times each motif was found in each sequence could be
shown on Figure 2A in the box where currently a redundant indication of the
motif number (1-7) is shown. Motif number is already indicated in the column
header. The same could be done with Figures 6A and 6B.

12. Justification of the restrictions placed on MEME should be included. Why is
MEME restricted to finding up to a maximum of 10 motifs with minimum width
set to 15 residues?

13. The term “multilevel consensus sequence” should be explained, since this term is
specifically defined with reference to the MEME software and its definition will
be unfamiliar to readers who have not used that software.

14. The identification of a motif associated with DOF proteins involved in regulating
seed storage genes is an interesting result and could be explored in greater detail.
The way in which this association was established should be included in the
discussion, or a reference for the association should be cited.
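
The combined presentation suggested in minor point 11 amounts to a table of occurrence
counts, with one row per sequence and one column per motif. A minimal sketch follows,
assuming the motif hits are available as (sequence, motif) pairs. The function name and
the sequence identifiers are hypothetical and do not come from the manuscript. The
row order is passed in explicitly, so it could equally be the leaf order of the
phylogenetic tree, in line with major point 2.

```python
from collections import Counter

def motif_count_table(hits, sequence_order, motif_ids):
    """Build a rows-by-columns table of motif occurrence counts.

    hits           -- iterable of (sequence_id, motif_id), one pair per occurrence
    sequence_order -- row order, e.g. the leaf order of a phylogenetic tree
    motif_ids      -- column order
    """
    counts = Counter(hits)  # missing (sequence, motif) pairs count as 0
    return [[counts[(seq, motif)] for motif in motif_ids] for seq in sequence_order]

# Hypothetical hits and identifiers, for illustration only.
example_hits = [("seqA", 1), ("seqA", 1), ("seqB", 2), ("seqA", 7)]
rows = ["seqB", "seqA"]
motifs = [1, 2, 7]

table = motif_count_table(example_hits, rows, motifs)
print("sequence", *[f"motif{m}" for m in motifs])
for seq, counts_row in zip(rows, table):
    print(seq, *counts_row)
```

Each cell then carries the count that Figure 2B reports separately, and the column header
alone identifies the motif.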



REPLY FROM AUTHORS (received 28th Nov, 2008)

Answer to Query

Major issues:


1. The DOF domain structure has been explained in greater detail with a suitable
diagram (Figure-1) in the Introduction section. The DOF domain is conserved and
there is no separate domain for the PBF DOF gene.

2. The motifs have been analyzed in the context of the constructed phylogenetic tree
(Figure-3 and 6). The most frequently observed motifs have been highlighted in the
multiple sequence alignment (Figure-2 and 5).

3. There are no subfamilies of the DOF domain and as such there is no evolutionary
relationship between the DOF and PBF DOF domains.

4. The sequences of the PCR amplified DOF domains of cereals and millets show a
variation in length from 70-409 bp due to different lengths of sequencing, though
the size of the amplicon was uniformly constant, i.e. 172 bp, as observed in the PCR
amplification pattern obtained with the domain-specific primer. This is shown in a
new figure incorporated in the Results section (Figure-2).

5. The source of the accession numbers has been provided and there is now no ambiguity
in the presentation of the accession numbers used in the present manuscript.

6. The biological significance of the different motifs observed in the domains and
genes has been discussed.

7. The alignment has been provided for the entire length of the DOF domain and the
figures have been changed (Figure-2 and 5).

8. The lack of two cysteine residues in some of the clones has no evolutionary
significance and might be due to partial sequencing.


Minor issues:

1. The Abstract has been reframed, avoiding an unnecessary level of detail.

2. As the manuscript has been revised in light of the major issues, page 3, paragraph 1,
sentence 2 of the previous manuscript has been removed to avoid any ambiguity.

3. The suggestion to give the species name in full on the first use, followed by
subsequent abbreviation, has been incorporated in the manuscript.

4. It has been expanded as desired.

5. Table-2 has been aligned and reframed.

6. It has been explained in separate Table-3.

7. It has been provided (Material and Method section) as suggested.

8. The CBS prediction server was not used in the present study; it was described by
mistake in the earlier manuscript.

9. It has been incorporated as desired.

10. It has been corrected as suggested.

11. These figures have been changed in the revised manuscript.

12. The MEME parameters were selected on the basis of the occurrences of the observed
motifs, in order to minimize the ‘E-value’, which is based on the probability of
finding an equally well-conserved pattern in a set of sequences.

13. The term ‘multilevel consensus sequence’ has been explained.

14. It has been discussed as suggested.