Request for Bioinformatics support - The GenePool


Oct 1, 2013 (3 years and 11 months ago)


GenePool SOLEXA and 454 Bioinformatics Requirements [August 2009] [v1.2]

Request for Bioinformatics support



A range of support is available to users of the GenePool, using custom tools developed in house
and standard software tools such as Velvet, Newbler, MAQ, BLAST and InterPro annotation.
We also offer training to researchers: we can host you in our bioinformatics laboratory, lead
you through the sequence assembly and annotation protocols useful for your data, and advise on
local requirements for data analyses.


In planning your sequencing project, please recall that the volume of data produced, especially
by the next generation instruments, is huge, and ‘usual’ ways of working (a sequence at a time)
will in all probability be inadequate. We strongly recommend you factor in significant resource
for bioinformatics analyses of your data: the GenePool can only provide person-months of
analytical time if this time is paid for, or directly requested in your grant or other application.
We can also support you in analyses of Sanger sequencing data - please ask.
If there are analyses you would like to do that are not listed here, please discuss them with
us at the GenePool - we aim to support you in analysis of your data.


The GenePool is a research facility, and one favoured and cost-efficient way for you to access
bioinformatic analyses for your programme is to include the GenePool as a full collaborator in
your grant application. In this way we can request funding for the bioinformatics support you
need, and provide this by dedicating skilled personnel in the GenePool to your project. Thus we
can ensure you have access to a bioinformatician who knows your project, assists with
experimental design, performs bespoke analyses and delivers results throughout a 3-year grant,
while only requesting the 6-9 months of actual time required for the work.


Please contact us to discuss this option well in advance of any deadline.


This form seems quite long, but you should only need to check boxes in 3 or at most 4 pages of
it for each project.



Your name


Your email address



a What kind of bioinformatics analyses/support do you need?

In the sections below please indicate the kinds of analyses you require assistance with. Before
submitting the application, please discuss these requirements with us so that we both
understand what will be done, who will do it (either we do it as a service, or we show you how
to do it), and what the resource implications are. For each kind of data, there are core analyses
we perform as part of the quality assurance process - these are indicated by pre-entered ‘YES’.


• Analyses of Transcriptome sequencing (RNAseq, EST) project - assembly and annotation (Roche Titanium or Illumina GAII)
  go to b

• Analyses of small RNA (microRNA, siRNA) sequencing project - identification, counting and mapping to reference genome (Illumina GAII)
  go to c

• Analyses of deep SAGE (digital transcriptomics) project - identification, counting and mapping to reference genome (Illumina GAII)
  go to d

• Large-insert clone sequencing project - assembly and annotation (Sanger, Roche Titanium or Illumina GAII)
  go to e

• Genome resequencing project - mapping to reference genome (Roche Titanium or Illumina GAII)
  go to f

• Hybridisation-selected DNA sequencing project - mapping to reference (Roche Titanium or Illumina GAII)
  go to g

• De novo genome sequencing project - assembly and annotation (Roche Titanium or Illumina GAII)
  go to h

• ChIP or other immunoprecipitate/small fragment sequencing project - mapping to reference genome (Roche Titanium or Illumina GAII)
  go to i

• Phylogenetic or population genetic marker sequencing project (specimen based) - assembly, allele calling, alignment (Roche Titanium)
  go to j

• DNA barcoding project - estimation of molecular operational taxonomic units (Roche Titanium)
  go to k

• Metagenomics project - assembly, annotation, estimation of taxon diversity (Roche Titanium)
  go to l

b ANALYSES OF TRANSCRIPTOME SEQUENCING PROJECTS

Analyses | we do it for you | we show you how to do it
• Base calling and quality assessment | YES | ----------------
Return of ‘raw’ base called sequence reads to you with quality files | YES | ----------------
• Secure archiving of raw sequence data for 3 years from generation | YES | ----------------
• Submission of raw sequence data to the relevant international repository | [ ] | [ ]
• Assembly of reads into contigs using TIGRclus/Newbler/MIRA2 (for Roche Titanium); generation of cluster consensus sequences (including, where relevant, incorporation of other transcriptome data from public sources) | [ ] | [ ]
Return of clustered data and consensuses to you | [ ] | ----------------
• Annotation of consensuses with similarity information (using the BLAST suite of tools and databases indicated by you) | [ ] | [ ]
• Annotation of consensuses with GO, KEGG and EC information (using the annot8r tool) | [ ] | [ ]
• Generation of a PostgreSQL relational database of sequences, consensuses and annotations (using the PartiGene suite of tools) | [ ] | [ ]
• Generation of a web-accessible database portal permitting simple queries of the PartiGene database | [ ] | [ ]
Return of annotated consensus data, PostgreSQL database and web tools to you (and assistance with setting up the database on your home computer) | [ ] | ----------------
• Mapping of reads and / or consensuses to a reference genome or transcriptome | [ ] | [ ]
Return of consensus-to-reference or read-to-reference mappings in agreed format | [ ] | ----------------
Submission of your sequences to public databases | [ ] | [ ]




GO Gene Ontology descriptors of biological function, physiological role and cellular location
KEGG Kyoto Encyclopaedia of Genes and Genomes pathways
EC Enzyme Commission reference numbers
PostgreSQL a ‘structured query language’ relational database manager [we can also generate MySQL databases if requested]

c ANALYSES OF SMALL RNA SEQUENCING PROJECTS

Analyses | we do it for you | we show you how to do it
• Base calling and quality assessment | YES | ----------------
Return of ‘raw’ base called sequence reads to you with quality files | YES | ----------------
• Secure archiving of raw sequence data for 3 years from generation | YES | ----------------
• Submission of raw sequence data to the relevant international repository | [ ] | [ ]
• Counting of unique small RNA sequences present in samples | [ ] | [ ]
Return of sequence count data in agreed format (showing reads per sequence sample where relevant) | [ ] | ----------------
• Identification of matches between sequences of small RNAs and known small RNAs in Rfam and miRBase | [ ] | [ ]
Return of data showing sequences mapped to known small RNAs | [ ] | ----------------
• Mapping of reads and / or unique sequences to a reference genome or transcriptome | [ ] | [ ]
Return of sequence-to-reference or read-to-reference mappings in agreed format | [ ] | ----------------
Submission of your sequences to public databases | [ ] | [ ]
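
As an illustration of the "counting of unique small RNA sequences" step listed above, the following is a minimal sketch only, not the GenePool's own pipeline. It assumes reads have already been adapter-trimmed and are available as plain strings; the function name and the example sequences are hypothetical.

```python
from collections import Counter


def count_unique_sequences(reads):
    """Return a Counter mapping each distinct read sequence to its read count."""
    # Upper-case and strip whitespace so identical sequences collapse together;
    # empty lines are skipped.
    return Counter(r.strip().upper() for r in reads if r.strip())


# Hypothetical reads from one sample (illustrative sequences only).
sample_reads = [
    "TGAGGTAGTAGGTTGTATAGTT",
    "tgaggtagtaggttgtatagtt",
    "TAGCTTATCAGACTGATGTTGA",
]

counts = count_unique_sequences(sample_reads)
for sequence, n_reads in counts.most_common():
    print(sequence, n_reads, sep="\t")
```

Counting exact matches is the simplest possible definition of a "unique sequence"; a real analysis might also collapse length variants or sequencing errors, which this sketch does not attempt.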




d ANALYSES OF DIGITAL TRANSCRIPTOMICS (DEEP SAGE, RNA-SEQ) PROJECTS

Analyses | we do it for you | we show you how to do it
• Base calling and quality assessment; extraction of 21 base SAGE tags | YES | ----------------
Return of SAGE tag sequence reads to you with quality files | YES | ----------------
• Secure archiving of raw sequence data for 3 years from generation | YES | ----------------
• Submission of raw sequence data to the relevant international repository | [ ] | [ ]
• Counting of unique SAGE tags per sample | [ ] | [ ]
Return of counts of unique tags to you in spreadsheet format (with columns for each sample) ready for further analysis | [ ] | ----------------
• Mapping of SAGE tags to a reference genome or transcriptome; annotation with gene identifier or genome position data | [ ] | [ ]
Return of SAGE tag-to-reference mappings in agreed format (e.g. spreadsheet format) ready for further analysis | [ ] | ----------------
Submission of your sequences / analyses to public databases | [ ] | [ ]
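
The "counts of unique tags in spreadsheet format (with columns for each sample)" deliverable above can be pictured with a short sketch. This is an assumption about what such a table might look like, not the format the GenePool actually returns; the sample names, tag sequences and the `tag_count_table` helper are hypothetical.

```python
import csv
import io
from collections import Counter


def tag_count_table(samples):
    """Build rows for a tag-by-sample count table.

    samples: dict mapping sample name -> list of SAGE tag sequences.
    Returns a header row followed by one row per distinct tag.
    """
    counts = {name: Counter(tags) for name, tags in samples.items()}
    names = sorted(counts)
    all_tags = sorted(set().union(*counts.values()))
    rows = [["tag"] + names]
    for tag in all_tags:
        # A Counter returns 0 for tags that were not seen in a sample.
        rows.append([tag] + [counts[name][tag] for name in names])
    return rows


# Hypothetical 21-base tags from two samples.
t1 = "CATGAAACCCGGGTTTAAACC"
t2 = "CATGTTTGGGCCCAAATTTGG"
samples = {"sample_A": [t1, t1, t2], "sample_B": [t2]}

rows = tag_count_table(samples)

# Write the table as tab-separated text, which spreadsheet programs can open.
buffer = io.StringIO()
csv.writer(buffer, delimiter="\t", lineterminator="\n").writerows(rows)
print(buffer.getvalue(), end="")
```

One row per tag and one column per sample keeps the table directly usable for downstream comparison between samples.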




e LARGE-INSERT CLONE SEQUENCING PROJECT

Analyses | we do it for you | we show you how to do it
• Base calling and quality assessment | YES | ----------------
Return of ‘raw’ base called sequence reads to you with quality files | YES | ----------------
• Secure archiving of raw sequence data for 3 years from generation | YES | ----------------
• Assembly of draft-scale project reads into contigs using standard tools [Phrap, CAP (Sanger), Newbler/MIRA2 (Roche Titanium) or Velvet (Illumina GAII)]; generation of contig consensus sequences (draft assembly) | [ ] | [ ]
• Assembly of full-scale project reads into contigs using standard tools [Phrap, CAP (Sanger), Newbler/MIRA2 (Roche Titanium) or Velvet (Illumina GAII)]; generation of contig consensus sequences; error checking of consensus compared to reads and production of gold standard assembly | [ ] | [ ]
Return of contigged data, assembly information and consensuses to you | [ ] | ----------------
• Annotation of consensuses with similarity information (using the BLAST suite of tools and databases indicated by you) | [ ] | [ ]
• Annotation of consensuses with GO, KEGG and EC information (using the annot8r tool) | [ ] | [ ]
• Annotation of consensuses using comparative tools [prokaryotic genomes only] | [ ] | [ ]
Return of annotated consensus data, with annotations, in standard format (GFF (version3)) for viewing in genome browsers (such as Artemis) | [ ] | ----------------
• Alignment of consensus to a reference genome and identification of differences (SNPs and indels) | [ ] | [ ]
Return of consensus-to-reference mappings in agreed format | [ ] | ----------------
Submission of your sequences / analyses to public databases | [ ] | [ ]




SNP single nucleotide polymorphism
indel insertion / deletion


f GENOME RESEQUENCING PROJECT

Analyses | we do it for you | we show you how to do it
• Base calling and quality assessment | YES | ----------------
Return of ‘raw’ base called sequence reads to you with quality files | YES | ----------------
• Secure archiving of raw sequence data for 3 years from generation | YES | ----------------
• Mapping of sequencing reads to reference genome or genomes using standard tools (such as MAQ for Illumina GAII and ReferenceMapper for Roche Titanium) | [ ] | [ ]
• Identification of robustly called single nucleotide polymorphisms using read depth and read quality information | [ ] | [ ]
• Mapping of SNPs to genomic features in reference genome segments and classification into genic/intergenic, exon/intron and synonymous/nonsynonymous (using features defined in genome annotation files) | [ ] | [ ]
Return of reference mapping data, SNP calls and SNP ‘effects’ to you in agreed format (such as GFF (version3) files, or other spreadsheet format) | [ ] | ----------------
• Identification of robustly called insertions and deletions (indels) using read depth and read quality information (using additional read mapping tools) | [ ] | [ ]
• Mapping of indels to genomic features in reference genome and classification into genic/intergenic, exon/intron (using features defined in genome annotation files) | [ ] | [ ]
Return of indel calls and indel ‘effects’ to you in agreed format (such as GFF (version3) files, or other spreadsheet format) | [ ] | ----------------
• Assembly de novo of unmapped reads to identify possible novel genetic elements in the resequenced genome; identification of possible insertion points in reference genome | [ ] | [ ]
Return of consensuses built from unmapped reads and possible reference insertion points | [ ] | ----------------
Submission of your sequences / analyses to public databases | [ ] | [ ]




SNP single nucleotide polymorphism
GFF (version3) general feature format - a common genome annotation feature exchange format
indel insertion / deletion
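
To make the synonymous/nonsynonymous classification of SNPs mentioned in this section concrete, here is a minimal sketch. It assumes the standard genetic code and an in-frame coding sequence on the forward strand; the function name and example sequence are hypothetical, and a real analysis would take exon coordinates and strand from the genome annotation files.

```python
from itertools import product

# Standard genetic code, listed in the conventional TCAG order (TTT, TTC, TTA, ...).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify_coding_snp(cds, pos, alt):
    """Classify a substitution at 0-based position `pos` of an in-frame CDS.

    Returns "synonymous" if the encoded amino acid is unchanged,
    otherwise "nonsynonymous".
    """
    cds = cds.upper()
    alt = alt.upper()
    start = pos - pos % 3          # first base of the codon containing pos
    ref_codon = cds[start:start + 3]
    if len(ref_codon) < 3:
        raise ValueError("position falls in an incomplete codon")
    offset = pos - start
    alt_codon = ref_codon[:offset] + alt + ref_codon[offset + 1:]
    same = CODON_TABLE[ref_codon] == CODON_TABLE[alt_codon]
    return "synonymous" if same else "nonsynonymous"


# Hypothetical coding sequence: ATG GCT TAA (Met, Ala, stop).
example_cds = "ATGGCTTAA"
print(classify_coding_snp(example_cds, 5, "C"))  # GCT -> GCC, both alanine
print(classify_coding_snp(example_cds, 3, "A"))  # GCT -> ACT, alanine to threonine
```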

g HYBRIDISATION OR PCR-SELECTED DNA SEQUENCING PROJECT

Analyses | we do it for you | we show you how to do it
• Base calling and quality assessment | YES | ----------------
Return of ‘raw’ base called sequence reads to you with quality files | YES | ----------------
• Secure archiving of raw sequence data for 3 years from generation | YES | ----------------
• Mapping of sequencing reads to reference genome segments using standard tools (such as MAQ for Illumina GAII and ReferenceMapper for Roche Titanium) | [ ] | [ ]
• Identification of robustly called single nucleotide polymorphisms using read depth and read quality information; classification into heterozygote and homozygote classes | [ ] | [ ]
• Mapping of SNPs to genomic features in reference genome segments and classification into genic/intergenic, exon/intron and synonymous/nonsynonymous (using features defined in genome annotation files) | [ ] | [ ]
Return of reference mapping data, SNP calls and SNP ‘effects’ to you in agreed format (such as GFF (version3) files, or other spreadsheet format) | [ ] | ----------------
• Identification of robustly called insertions and deletions (indels) using read depth and read quality information (using additional read mapping tools) | [ ] | [ ]
• Mapping of indels to genomic features in reference genome and classification into genic/intergenic, exon/intron (using features defined in genome annotation files) | [ ] | [ ]
Return of indel calls and indel ‘effects’ to you in agreed format (such as GFF (version3) files, or other spreadsheet format) | [ ] | ----------------
Submission of your sequences / analyses to public databases | [ ] | [ ]




SNP single nucleotide polymorphism
GFF (version3) general feature format - a common genome annotation feature exchange format
indel insertion / deletion
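
Since GFF (version3) is named as a likely return format for SNP and indel calls, a small sketch of what one such record looks like may help. GFF3 lines have nine tab-separated columns (seqid, source, type, start, end, score, strand, phase, attributes) with 1-based inclusive coordinates. The helper names, the source label "example_caller", the identifiers and the choice of "SNV" as the feature type are illustrative assumptions, not what the GenePool necessarily emits.

```python
def escape_gff3(value):
    """Percent-encode characters that are reserved in GFF3 column 9."""
    out = str(value).replace("%", "%25")  # escape the escape character first
    for char, code in (
        (";", "%3B"), ("=", "%3D"), ("&", "%26"),
        (",", "%2C"), ("\t", "%09"), ("\n", "%0A"),
    ):
        out = out.replace(char, code)
    return out


def gff3_line(seqid, source, feature_type, start, end, attributes,
              score=".", strand=".", phase="."):
    """Build one nine-column, tab-separated GFF3 feature line."""
    if start > end:
        raise ValueError("GFF3 requires start <= end")
    column9 = ";".join(
        f"{key}={escape_gff3(val)}" for key, val in attributes.items()
    )
    fields = [seqid, source, feature_type, str(start), str(end),
              str(score), strand, str(phase), column9 or "."]
    return "\t".join(fields)


# Hypothetical SNP call at position 1042 of a reference sequence named chr1.
snp = gff3_line("chr1", "example_caller", "SNV", 1042, 1042,
                {"ID": "snp0001", "Note": "A>G; depth=35"})
print("##gff-version 3")
print(snp)
```

Files in this layout can be loaded into genome browsers that accept GFF3, which is the reason a feature-exchange format is preferable to an ad hoc spreadsheet when positions matter.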

h DE NOVO GENOME SEQUENCING PROJECT

Analyses | we do it for you | we show you how to do it
• Base calling and quality assessment | YES | ----------------
Return of ‘raw’ base called sequence reads to you with quality files | YES | ----------------
• Secure archiving of raw sequence data for 3 years from generation | YES | ----------------
Preliminary assembly of sequence reads using ‘best practice’ assemblers. This could involve both Illumina SOLEXA and Roche 454 data, and thus require a ‘mixed’ assembly. | [ ] | [ ]
Identification of potential ‘joins’ between contigs using ‘matchmaker’ algorithms to assist in closure. | [ ] | [ ]
Preliminary annotation of first-pass assembly using BLAST matches to other genomes (using for example the RAST server) | [ ] | [ ]
Return of assembled sequence data to you in an agreed format (such as GFF (version3) files, EMBL format files, or other spreadsheet format) | [ ] | ----------------
Submission of your sequences / analyses to public databases | [ ] | [ ]




Due to the massive additional effort required for genome closure and full annotation, if this is
your goal we recommend that you budget in your application for a full-time bioinformatician
post. We will be happy to assist such staff in getting the best out of genome sequencing data.


i CHIP OR SMALL FRAGMENT SEQUENCING PROJECT

Analyses | we do it for you | we show you how to do it
• Base calling and quality assessment | YES | ----------------
Return of ‘raw’ base called sequence reads to you with quality files | YES | ----------------
• Secure archiving of raw sequence data for 3 years from generation | YES | ----------------
• Mapping of sequencing reads to reference genome segments using standard tools (such as MAQ or Novoalign for Illumina GAII and ReferenceMapper for Roche Titanium) | [ ] | [ ]
• Mapping of reads to genomic features in reference genome segments and classification (genic/intergenic, exon/intron etc., using features defined in genome annotation files) | [ ] | [ ]
• Performing counts of reads mapped per genomic interval | [ ] | [ ]
Return of reference mapping data, counts of reads per genome interval and associations with genome features to you in agreed format (such as GFF (version3) files, or other spreadsheet format) | [ ] | ----------------
Submission of your sequences / analyses to public databases | [ ] | [ ]
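
The "counts of reads mapped per genomic interval" step above can be sketched as simple fixed-width binning. This is an illustration under stated assumptions, not the GenePool's method: reads are represented only by reference name and 1-based mapped start position, and the interval width and the `count_reads_per_bin` helper are hypothetical choices.

```python
from collections import Counter


def count_reads_per_bin(read_starts, bin_size):
    """Count mapped reads in fixed-width genomic intervals.

    read_starts: iterable of (reference name, 1-based start position).
    Returns a Counter keyed by (reference name, bin start, bin end),
    with 1-based inclusive bin coordinates.
    """
    counts = Counter()
    for chrom, pos in read_starts:
        index = (pos - 1) // bin_size
        bin_start = index * bin_size + 1
        counts[(chrom, bin_start, bin_start + bin_size - 1)] += 1
    return counts


# Hypothetical mapped read positions.
mapped = [("chr1", 1), ("chr1", 1000), ("chr1", 1001), ("chr2", 50)]
bin_counts = count_reads_per_bin(mapped, bin_size=1000)

for (chrom, start, end), n_reads in sorted(bin_counts.items()):
    print(chrom, start, end, n_reads, sep="\t")
```

Intervals defined by annotated features rather than fixed widths would follow the same pattern, with the bin lookup replaced by an overlap test against the annotation.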





j PHYLOGENETIC / POPULATION GENETIC MARKER SEQUENCING PROJECT

Analyses | we do it for you | we show you how to do it
• Base calling and quality assessment | YES | ----------------
Return of ‘raw’ base called sequence reads to you with quality files | YES | ----------------
• Secure archiving of raw sequence data for 3 years from generation | YES | ----------------
Alignment of reads to each other and to a set of reference database sequences | [ ] | [ ]
Identification of unique haplotypes and counting of SNP frequencies | [ ] | [ ]
Return of assemblies, alignments and haplotype data to you in agreed format | [ ] | ----------------
Submission of your sequences / analyses to public databases | [ ] | [ ]





k DNA BARCODING PROJECT

Analyses | we do it for you | we show you how to do it
• Base calling and quality assessment | YES | ----------------
Return of ‘raw’ base called sequence reads to you with quality files | YES | ----------------
• Secure archiving of raw sequence data for 3 years from generation | YES | ----------------
• Filtering of sequence data by length, quality, or other criteria. | [ ] | [ ]
Identification of unique haplotypes | [ ] | [ ]
Generation of molecular operational taxonomic unit assignments for your data (with other sequences if required) using MOTU_define | [ ] | [ ]
Comparison of unique haplotypes with public DNA barcode datasets to assign putative taxonomic identifications to sequences | [ ] | [ ]
Return of assemblies, haplotype consensuses, alignments, MOTU_define analyses to you in agreed formats | [ ] | ----------------
Submission of your sequences / analyses to public databases | [ ] | [ ]
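
The filtering and unique-haplotype steps listed for barcoding projects can be illustrated with a short sketch. The thresholds, the representation of a read as a sequence plus a list of per-base quality scores, and both function names are assumptions made for the example; exact-match collapsing is used here as the simplest definition of a haplotype and does not account for sequencing error.

```python
from collections import Counter


def filter_reads(reads, min_length, min_mean_quality):
    """Keep sequences that pass length and mean-quality thresholds.

    reads: iterable of (sequence, list of per-base quality scores).
    Returns the passing sequences, upper-cased.
    """
    kept = []
    for seq, quals in reads:
        # Reject short reads, empty reads, and reads whose quality list
        # does not match the sequence length.
        if len(seq) < min_length or not quals or len(quals) != len(seq):
            continue
        if sum(quals) / len(quals) >= min_mean_quality:
            kept.append(seq.upper())
    return kept


def unique_haplotypes(sequences):
    """Collapse identical sequences and count how often each occurs."""
    return Counter(sequences)


# Hypothetical reads: (sequence, per-base qualities).
reads = [
    ("ACGTACGT", [30] * 8),
    ("acgtacgt", [35] * 8),
    ("ACGTACGA", [30] * 8),
    ("ACG", [40] * 3),        # too short
    ("ACGTACGT", [5] * 8),    # quality too low
]

kept = filter_reads(reads, min_length=8, min_mean_quality=20)
haplotypes = unique_haplotypes(kept)
for sequence, n_reads in haplotypes.most_common():
    print(sequence, n_reads, sep="\t")
```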




l METAGENOMICS PROJECT

Analyses | we do it for you | we show you how to do it
• Base calling and quality assessment | YES | ----------------
Return of ‘raw’ base called sequence reads to you with quality files | YES | ----------------
• Secure archiving of raw sequence data for 3 years from generation | YES | ----------------
Assembly of data sets using best-practice tools. | [ ] | [ ]
Identification and counting of fragments encoding genes of interest and generation of multiple sequence alignments | [ ] | [ ]
Assignment of fragments encoding genes of interest to taxonomic groups using BLAST analysis | [ ] | [ ]
Global functional annotation of sequences using annot8r (GO, EC and KEGG identifiers) | [ ] | [ ]
Return of assemblies, alignments, functional assignments to you in agreed formats | [ ] | ----------------
Submission of your sequences / analyses to public databases, or reformatting of your data for submission to other analytical pipelines such as CAMERA. | [ ] | [ ]