ITS - Inra




Taxonomy assignment based on fungal ITS pyrosequencing data

AG PEPI IBIS, December 6, 2012

N. Lapalu

Scientific context

Wood diseases

- Grape canker (Eutypa lata)
- Black dead arm (Botryosphaeriaceae)
- ESCA (Fomitiporia mediterranea, Phaeomoniella chlamydospora)

Control methods

- Chemical treatment -> unauthorized since 2001
- Uprooting of grapevine trunks
- Prevention -> better characterization of the microbiological flora

Goals

- Identify the fungal species in grapevine trunks with and without disease, in order to develop methods for analyzing how fungal communities vary as disease develops
- Avoid traditional isolation protocols and species identification (MLST)
- Access an exhaustive repertoire of fungal diversity and associated microorganisms

Projects

Titles: "Bioinformatics for taxonomy and inventories in microbial ecology" (Bioinformatique pour la taxonomie et les inventaires en écologie microbienne), "Grapevine wood diseases" (maladies du bois de la vigne)

Type: call for projects (appel à projet) SPE 2010, CASDAR (agricultural and rural development)

Tasks:

- Validate the sequences commonly used in molecular taxonomy -> fungi (ITS, 18S)
- Build a reliable bioinformatics protocol to analyze high-throughput sequencing data and assign taxonomy
- Enrich the R-Syst toolkit with a bioinformatics pipeline for molecular taxonomy

Deliverable: a workflow to analyze high-throughput sequencing data of environmental samples


Barcoding

- CO1: cytochrome c oxidase subunit 1 (animal barcode)
- Two-marker system of chloroplast genes (plants)

More info: http://www.barcodeoflife.org/, http://www.boldsystems.org/views/login.php

ITS

Primers and expected amplicons: ITS, ITS-long, ITS-long2

[Figure: primer map, http://unite.ut.ee/css/images/Primer_map_UNITE_Mar09.gif]

Protocol

Samplings: trunk area, symptoms, wood

PCR: organisms and regions

- fungi (18S and ITS)
- bacteria (16S)
- archaebacteria (16S)
- oomycetes (ITS)

Sequencing: ROCHE 454

MID for pooling

[Figure: PCR fragment flanked by Primer Forward and Primer Reverse, each tagged with a MID]

Data - RUN 1

RUN 1: 8 samplings

sample  area  type     wood
1       Ext   sympto   BB
2       Ext   sympto   BS
3       Ext   asympto  BS
4       Int   sympto   Nec
5       Int   sympto   BS
6       Int   sympto   Amadou
7       Int   asympto  Nec
8       Int   asympto  BS

For each sample, at least 4 primers:

- ITS / ITS-long / ITS-long2
- ITS-oomyc
- 16S-bacteria
- 18S-fungi

Controls: PCR, sequencing effects?

Species: mix of 20 fungi from BIOGER

>1 Fusarium graminearum
>2 Fusarium culmorum
>3 Fusarium sporotrichioides
>4 Fusarium poae
>5 Fusarium langsethiae
>6 Fusarium avenaceum
>7 Botrytis ficariarum
>8 Botrytis aclada
>9 Botrytis porri
>10 Botrytis globosa
>11 Botrytis calthae
>12 Botrytis cinerea
>13 Septoria nodorum
>14 Puccinia striiformis
>15 Sclerotinia minor
>16 Sclerotinia sclerotiorum
>17 Microdochium nivale
>18 Microdochium majus
>19 Mycosphaerella graminicola
>20 Leptosphaeria maculans

Growth

- Extraction, DNA quantification and pooling
- PCR
- High Throughput Sequencing

VS

- Extraction (no pooling)
- PCR
- SANGER sequencing = reference sequences (seq Ref)

Tools: technology forecasting

Pre-processing / data filtering

- Quality filtering (pyrocleaner, CD-HIT-454)
- Denoising (flowgram-based quality filtering + clustering) (AmpliconNoise, QIIME)
- Sequence extraction (ITS Extractor)
- Chimera detection (UCHIME, AmpliconNoise)

Data clustering

- Uclust, USEARCH, CD-HIT-OTU, MOTHUR

Taxonomic assignment

- Sequence similarity (blast)
- Naïve Bayesian Classifier (RDP classifier)


Workflow - MetaGPipe

Input: sff files

- Reads cleaning: Pyrocleaner
- Sequence extraction: Fungal ITSExtractor
- Clustering: UCLUST - 97% (QIIME)
- Pick reference sequence: Pick_ref_seq (QIIME)
- Taxonomic assignment: BLAST (QIIME)
- Refinement, clustering 100%: UCLUST - 100% (QIIME)
- Result formatting: Python scripts

Notes:

- Available in Gnome_tools (REPET package)
- Allows parallel computing on a cluster
- Toolkit -> scripts for database formatting, plots, analysis card

Data cleansing: Pyrocleaner

Mariette J, Noirot C, Klopp C. Assessment of replicate bias in 454 pyrosequencing and a multi-purpose read-filtering tool. BMC Research Notes 2011

Between 70% and 90% of reads are retained after filtering
Main reason for removal -> low complexity filtering

Samples      Nb raw reads  Nb cleaned reads  Length filter  Undetermined bases filter  Low complexity filter
Pool1.MID1   5402          5025 (93%)        0              1                          376
Pool1.MID5   12450         11571 (92%)       9              0                          870
Pool1.MID7   11680         11342 (97%)       41             0                          297
Pool1.MID14  7079          6444 (91%)        0              0                          635
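The filtering table can be checked with a few lines of arithmetic. The sketch below is only an illustration: the rows are transcribed from the table above, and the names `ROWS`, `summarize` and `SUMMARY` are hypothetical, not part of Pyrocleaner. It checks that the three filter columns account for every removed read and computes the share of removals due to the low complexity filter.

```python
# Rows transcribed from the slide:
# (sample, raw reads, cleaned reads, length filter,
#  undetermined-bases filter, low-complexity filter)
ROWS = [
    ("Pool1.MID1", 5402, 5025, 0, 1, 376),
    ("Pool1.MID5", 12450, 11571, 9, 0, 870),
    ("Pool1.MID7", 11680, 11342, 41, 0, 297),
    ("Pool1.MID14", 7079, 6444, 0, 0, 635),
]


def summarize(row):
    """Return per-sample retention and filter accounting (illustrative helper)."""
    sample, raw, cleaned, length_f, undetermined_f, low_complexity_f = row
    removed = raw - cleaned
    return {
        "sample": sample,
        "removed": removed,
        # Reads explained by the three filter columns of the table.
        "accounted": length_f + undetermined_f + low_complexity_f,
        "retained_pct": 100.0 * cleaned / raw,
        # Fraction of removed reads discarded by the low-complexity filter.
        "low_complexity_share": low_complexity_f / removed if removed else 0.0,
    }


SUMMARY = [summarize(row) for row in ROWS]

if __name__ == "__main__":
    for s in SUMMARY:
        print(
            f"{s['sample']}: {s['retained_pct']:.1f}% retained, "
            f"{s['low_complexity_share']:.0%} of removals from low complexity"
        )
```

For these four samples the three filters explain all removed reads, which is consistent with the slide's remark that low complexity filtering is the main reason for read loss.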

Sequence extraction

Why extract sequences?

"Because the highly conserved ribosomal genes flanking the ITS1 marker may distort sequence clustering and similarity searches"

Tedersoo L et al. 454 Pyrosequencing and Sanger sequencing of tropical mycorrhizal fungi provide similar results but reveal substantial methodological biases. New Phytol. 2010

How to do that?

We removed these from the dataset using Fungal ITS Extractor

Nilsson RH et al. An open source software package for automated extraction of ITS1 and ITS2 from fungal ITS sequences for use in high-throughput community assays and molecular ecology. Fungal Ecology 2010

HMMER package and HMM profiles. Long and short HMMs to extract full or partial regions

Clustering: Uclust

Threshold: 97% similarity

"sequencing artifacts or errors" (Tedersoo L et al, New Phytol. 2010)
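To make the role of the 97% threshold concrete, here is a minimal sketch of greedy centroid clustering. It is not the UCLUST implementation: identity is defined here, as an assumption, as 1 minus the edit distance divided by the longer sequence length, and every read is compared against every centroid without any of the speed heuristics a real tool uses. The function names are hypothetical.

```python
def edit_distance(a, b):
    """Levenshtein distance between two sequences (dynamic programming)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(
                prev[j] + 1,               # deletion
                cur[j - 1] + 1,            # insertion
                prev[j - 1] + (ca != cb),  # match or substitution
            ))
        prev = cur
    return prev[-1]


def identity(a, b):
    """Assumed identity measure: 1 - edit distance / longer length."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - edit_distance(a, b) / longest


def greedy_cluster(reads, threshold=0.97):
    """Assign each read to the first centroid it matches, else open a new cluster."""
    clusters = []  # list of (centroid, members)
    for read in reads:
        for centroid, members in clusters:
            if identity(read, centroid) >= threshold:
                members.append(read)
                break
        else:
            clusters.append((read, [read]))
    return clusters


if __name__ == "__main__":
    base = "ACGT" * 25                        # 100 nt toy sequence
    one_error = base[:10] + "A" + base[11:]   # one substitution: 99% identity
    unrelated = "T" * 100
    for centroid, members in greedy_cluster([base, one_error, unrelated]):
        print(len(members), centroid[:20] + "...")
```

With a 100 nt read, a single sequencing error leaves 99% identity, so the erroneous read joins the cluster of its true sequence, which is the behaviour the 97% threshold is meant to give.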


Taxonomic assignment

BLAST (QIIME) against:

- Genbank
- Genbank filtered with eUtils keywords (Fungi, Ascomycota ITS, ...), followed by ITS extraction: ITS1, ITS2, ITS1 & ITS2

Query: the most abundant sequence of each cluster

cluster:
ATGTCGTGATGCGTGAGT
ATGTCGTGATGCGTGAGT
ATGTCGTGATGCGTGAGT
ATGTCGTGATGCGTGAGT
ATGTCGT-ATGCGTGAGT
ATGTCGTGATGGGTGAGT

most abundant:
ATGTCGTGATGCGTGAGT
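Choosing the most abundant sequence of a cluster as its representative can be sketched in a few lines. This is an illustration only, not the QIIME script: `most_abundant` is a hypothetical helper, and the reads are the six toy sequences shown on the slide.

```python
from collections import Counter


def most_abundant(reads):
    """Return (sequence, count) for the most frequent read in a cluster.

    Ties are not handled specially in this sketch: the first sequence
    reaching the maximum count is returned.
    """
    counts = Counter(reads)
    sequence, count = counts.most_common(1)[0]
    return sequence, count


# Toy cluster copied from the slide (one read with a gap, one with a mismatch).
CLUSTER = [
    "ATGTCGTGATGCGTGAGT",
    "ATGTCGTGATGCGTGAGT",
    "ATGTCGTGATGCGTGAGT",
    "ATGTCGTGATGCGTGAGT",
    "ATGTCGT-ATGCGTGAGT",
    "ATGTCGTGATGGGTGAGT",
]

REPRESENTATIVE, SUPPORT = most_abundant(CLUSTER)

if __name__ == "__main__":
    print(REPRESENTATIVE, SUPPORT)
```

The representative is then the only sequence of the cluster that is submitted to BLAST, so reads carrying rare errors do not influence the assignment.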

Controls - RUN1

             Strand   ITS type  Nb sequences  Nb clusters
Pool2_MID1   Forward  ITS1      1006          24
Pool2_MID13  Reverse  ITS1      966           24

19/20 species recovered: one species is missing, Puccinia striiformis


Controls - RUN1

Cluster: Microdochium nivale
PCR: Pool2-MID1 (ITS1 Forward)

Nb reads  Nb clusters (100%)  Type
37        31                  One to four mutations
129       1                   Similar to M. nivale
83        1                   Similar to M. majus

15% of reads with errors
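The 15% figure follows directly from the read counts in the table. A small check, with the rows transcribed from the slide and hypothetical variable names:

```python
# (number of reads, number of 100% clusters, type), transcribed from the slide.
ROWS = [
    (37, 31, "one to four mutations"),
    (129, 1, "similar to M. nivale"),
    (83, 1, "similar to M. majus"),
]

TOTAL_READS = sum(reads for reads, _, _ in ROWS)

# Reads that differ from both reference-like sequences by 1 to 4 mutations.
ERROR_READS = ROWS[0][0]

ERROR_PCT = 100.0 * ERROR_READS / TOTAL_READS

if __name__ == "__main__":
    print(f"{ERROR_READS}/{TOTAL_READS} reads = {ERROR_PCT:.1f}% with errors")
```

The same counts show that the single 97% cluster labelled Microdochium nivale in fact holds reads from two species, 129 similar to M. nivale and 83 similar to M. majus.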

Controls - RUN1

Cluster: Microdochium nivale
PCR: Pool2-MID1 (ITS1 Forward)

Re-clustering at 100%: if there are two main clusters and dozens of clusters with 1-3 reads, the two main clusters are 2 different species.

Problem: if the ITS sequences of different species share more than 97% similarity, re-clustering at 100% is required to reveal clusters that contain several species.
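The re-clustering rule can be sketched as exact dereplication followed by a size cut-off. This is a simplified illustration, not the pipeline's Python scripts: the function name is hypothetical, and treating clusters of at most 3 reads as noise is an assumption taken from the "1-3 reads" wording above.

```python
from collections import Counter


def recluster_100(reads, minor_max=3):
    """Group identical reads, then split the groups into main and minor clusters.

    minor_max is an assumed cut-off: groups with at most this many reads
    are treated as probable sequencing errors.
    """
    counts = Counter(reads)
    main = {seq: n for seq, n in counts.items() if n > minor_max}
    minor = {seq: n for seq, n in counts.items() if n <= minor_max}
    return main, minor


def looks_multi_species(reads, minor_max=3):
    """Flag a 97% cluster when more than one main 100% cluster remains."""
    main, _ = recluster_100(reads, minor_max)
    return len(main) > 1


if __name__ == "__main__":
    # Toy 97% cluster: two abundant variants plus two singletons.
    toy = ["ACGTACGT"] * 6 + ["ACGTACGA"] * 5 + ["ACGTACCT", "ACCTACGT"]
    main, minor = recluster_100(toy)
    print(len(main), "main clusters,", len(minor), "minor clusters")
    print("multi-species candidate:", looks_multi_species(toy))
```

A flagged cluster still needs its main sequences checked against reference sequences, as was done with the Sanger references for M. nivale and M. majus.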

Analysis Card

[Figure: sequence distribution graphs: raw, cleaned, ITS extracted]

Clusters 97%, SMC, nb reads

[Figure: sequences of the 97% cluster, re-clustering at 100%]

Results - RUN1

Sample             Genus
Ext Sympto BS      Bacidia, Lecania, Caloplaca, Aureobasidium
Ext Asympto BS     Catillaria, Bacidia, Aureobasidium, Psathyrella
Int Sympto Nec     Fomitiporia
Int Sympto BS      Catillaria, Bacidia, Lecania
Int Sympto Amadou  Aureobasidium, Bacidia, Lecania, Caloplaca
Int Asympto Nec    Phaeoacremonium, Fomitiporia
Int Asympto BS     Bacidia, Lecania

Results - RUN1

[Figure: panels for the ITS-long, ITS-long2 and ITS primers]

ITS-long primers work well and are used in RUN2

Data - RUN 2

RUN 2: 28 samplings

sample  type                  plant area  date
1       Healthy vine stocks   Arm         April 2010
2       Healthy vine stocks   Trunk       April 2010
3       Healthy vine stocks   Rootstock   April 2010
4       Diseased vine stocks  Arm         April 2010
5       Diseased vine stocks  Trunk       April 2010
6       Diseased vine stocks  Rootstock   April 2010
7       Diseased vine stocks  Amadou      April 2010
8       Healthy vine stocks   Arm         June 2010
9       Healthy vine stocks   Trunk       June 2010
10      Healthy vine stocks   Rootstock   June 2010
11      Diseased vine stocks  Arm         June 2010
12      Diseased vine stocks  Trunk       June 2010
13      Diseased vine stocks  Rootstock   June 2010
...     ...                   ...         ...

Results - RUN 2

[Figure: forward reads and reverse reads for the ITS and ITS-long primers]

Question: is it possible to compare the taxonomic assignment of forward MID - ITS1 reads with that of reverse MID - ITS2 reads?

Results - RUN 2

MID      raw reads  cleaned reads  ITS2 reads  % of raw reads
MID-122  56092      52292          52274       93.19
MID-117  28312      26351          26349       93.07
MID-94   6341       5956           5954        93.90
MID-4    25220      23140          23121       91.68
MID-80   16882      15655          15611       92.47
MID-82   7830       7332           7304        93.28
MID-61   8372       7263           6939        82.88
MID-57   7683       7087           7018        91.34
MID-51   15085      13720          13642       90.43
MID-35   3509       3313           3311        94.36
MID-31   19951      18676          18641       93.43
MID-28   14022      13171          13161       93.86
MID-135  8827       8185           8170        92.56
MID-8    3918       3511           3321        84.76
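The percentage column is not defined on the slide. The sketch below, with rows transcribed from the table and hypothetical names, checks one reading of it: ITS2 reads as a percentage of raw reads. It also reports how many cleaned reads were lost at the ITS2 extraction step.

```python
# (MID, raw reads, cleaned reads, ITS2 reads, % as printed on the slide)
ROWS = [
    ("MID-122", 56092, 52292, 52274, 93.19),
    ("MID-117", 28312, 26351, 26349, 93.07),
    ("MID-94", 6341, 5956, 5954, 93.90),
    ("MID-4", 25220, 23140, 23121, 91.68),
    ("MID-80", 16882, 15655, 15611, 92.47),
    ("MID-82", 7830, 7332, 7304, 93.28),
    ("MID-61", 8372, 7263, 6939, 82.88),
    ("MID-57", 7683, 7087, 7018, 91.34),
    ("MID-51", 15085, 13720, 13642, 90.43),
    ("MID-35", 3509, 3313, 3311, 94.36),
    ("MID-31", 19951, 18676, 18641, 93.43),
    ("MID-28", 14022, 13171, 13161, 93.86),
    ("MID-135", 8827, 8185, 8170, 92.56),
    ("MID-8", 3918, 3511, 3321, 84.76),
]


def its2_pct_of_raw(raw, its2):
    """ITS2 reads expressed as a percentage of raw reads."""
    return 100.0 * its2 / raw


# Largest gap between the recomputed value and the printed percentage.
MAX_DEVIATION = max(
    abs(its2_pct_of_raw(raw, its2) - printed)
    for _, raw, _, its2, printed in ROWS
)

# Cleaned reads in which no ITS2 region was extracted, per MID.
LOST_AT_EXTRACTION = {mid: cleaned - its2 for mid, _, cleaned, its2, _ in ROWS}

if __name__ == "__main__":
    print(f"max deviation from printed %: {MAX_DEVIATION:.4f}")
    worst = max(LOST_AT_EXTRACTION, key=LOST_AT_EXTRACTION.get)
    print(f"most reads lost at ITS2 extraction: {worst} ({LOST_AT_EXTRACTION[worst]})")
```

Under this reading every printed value is reproduced to two decimals, and the two lowest percentages (MID-61 and MID-8) are also the samples that lose the most reads at the extraction step.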

Problems / Questions

Problems

- Amplicon size -> PCR / sequencing problems: the observed reads are longer than the expected reads (chimera)

Questions

- Clustering methods?
- Denoising -> avoid re-clustering at 100%?


Perspectives

Analysis of 16S and 18S data with a specific extractor

- 16S bacteria
- 16S archaea
- 18S fungi

Hartmann M, (2010). V-Xtractor: An open-source, high-throughput software tool to identify and extract hypervariable regions of small subunit (16S/18S) ribosomal RNA gene sequences. Journal of Microbiological Methods


Participants

URGI/BIOGER: A. Gautier, V. Laval, L. Brigitte, J. Amselem, M.H. Lebrun, N. Lapalu

SAVE: J. Vallance, E. Bruez, P. Rey

BIOGECO: A. Franc

Clustering: Uclust

Edgar RC (2010) Search and clustering orders of magnitude faster than BLAST. Bioinformatics 26(19)

Ribosome

Winnebeck EC, Millar CD, Warman GR. 2010. Why does insect RNA look degraded? Journal of Insect Science 10:159, available online: insectscience.org/10.159