
Oct 2, 2013


Scott Emrich
Assistant Professor, Computer Science and Engineering
Scientific Manager, VectorBase
University of Notre Dame

A flexible, scalable genomics framework for integrating heterogeneous vector sequence data

Assembly required…

VectorBase is here to help (esp. OMICs data)

Please see me and/or Dan Lawson (EBI) anytime during this meeting

Anopheles gambiae M & S

Lawniczak, Emrich et al., Science, 2010

Some genomic regions display footprints of strong, recent selection

Lawniczak, Emrich et al., Science, 2010

[Figure: alignment of a short Reference sequence with Sample_1 and Sample_2, which differ from it at a few positions]

FlexReseq: a tool for integrating diverse sequence data

FlexReseq implementation

Genome Analysis Toolkit (GATK):
  MapReduce framework that allows efficient access to large resequencing data sets

FlexReseq: a module for GATK:
  Configurable interface allows easy data exploration
  Modular implementation of rules allows for easy extension of software
  Saves you from lots of scripting (Perl) code!

McKenna et al., Genome Research, 2010
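
To make the idea of modular rules concrete, here is a minimal Python sketch of per-site rules that can be combined and applied independently at each reference position, in the spirit of a map step followed by a collecting reduce step. This is not FlexReseq's actual interface (FlexReseq is a GATK module); the `Site` record, the rule helpers, and the toy sequences are all hypothetical and exist only for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List


@dataclass(frozen=True)
class Site:
    """One reference position plus the base seen in each sample (hypothetical record)."""
    position: int                 # 1-based position on the reference
    ref_base: str
    sample_bases: Dict[str, str]  # sample name -> observed base


# A rule is any function that accepts or rejects a single site.
Rule = Callable[[Site], bool]


def differs_from_reference(sample: str) -> Rule:
    """Rule: the named sample carries a base different from the reference."""
    return lambda site: site.sample_bases[sample] != site.ref_base


def matches_reference(sample: str) -> Rule:
    """Rule: the named sample carries the reference base."""
    return lambda site: site.sample_bases[sample] == site.ref_base


def all_of(*rules: Rule) -> Rule:
    """Combine rules; a site passes only if every rule accepts it."""
    return lambda site: all(rule(site) for rule in rules)


def sites_from_alignment(reference: str, samples: Dict[str, str]) -> List[Site]:
    """Build one Site per column of an ungapped, equal-length alignment."""
    return [
        Site(i + 1, ref_base, {name: seq[i] for name, seq in samples.items()})
        for i, ref_base in enumerate(reference)
    ]


def select_sites(sites: Iterable[Site], rule: Rule) -> List[int]:
    """Apply the rule to each site independently and collect passing positions."""
    return [site.position for site in sites if rule(site)]


# Toy sequences for illustration only.
reference = "ACGTCGATACTGC"
samples = {"sample_1": "ACGTCGTTATTGC", "sample_2": "ACGTCGATATTGC"}
sites = sites_from_alignment(reference, samples)

private_to_sample_1 = all_of(differs_from_reference("sample_1"),
                             matches_reference("sample_2"))
shared_by_both = all_of(differs_from_reference("sample_1"),
                        differs_from_reference("sample_2"))

print(select_sites(sites, private_to_sample_1))  # [7]
print(select_sites(sites, shared_by_both))       # [10]
```

Because each rule looks at one site at a time, new rules can be added without touching the traversal code, which is the kind of extensibility the slide describes.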

A malaria use-case for FlexReseq

Samarakoon, Regier, et al., BMC Genomics, 2011

Why are some parasites drug-resistant?

Goal: we want to connect genotype (genome) to phenotype (drug response)

How did drug resistance evolve?

1. Whole genome shotgun sequencing
  Genetic cross: Wellems et al. 1990 [24]
  Parents: HB3, Dd2
  Progeny (recombinants): SC05, 7C126
  Shotgun libraries: GS-FLX technology (454/Roche)
  Parental genomes [shotgun libraries]
  Progeny genomes [shotgun libraries]
  NCBI Trace Archive [28]

2. Reference genome mapping
  Reference genome (3D7)
  PlasmoDB (v5.4) [27]
  Mapped: SSAHA2 (http://www.sanger.ac.uk)

A more detailed map of P. falciparum

[Figure: chromosome maps for progeny (A) 7C126 and (B) SC05; axes: chromosome (1 to 14) and chromosome position; legend: Dd2, HB3]
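
A map like this can be thought of as assigning each informative marker in a progeny clone to one of the two parents and then merging consecutive markers into blocks. The Python sketch below illustrates that idea under simplifying assumptions: the marker tuples, function names, and toy alleles are hypothetical, and it is not the pipeline used in the cited study.

```python
from typing import List, Optional, Tuple

# (position, HB3 allele, Dd2 allele, progeny allele), sorted by position
Marker = Tuple[int, str, str, str]
# (parent, first informative position, last informative position)
Block = Tuple[str, int, int]


def assign_parent(progeny: str, hb3: str, dd2: str) -> Optional[str]:
    """Return the parent whose allele the progeny carries, or None if unclear."""
    if hb3 == dd2:
        return None  # parents agree, so the marker is uninformative
    if progeny == hb3:
        return "HB3"
    if progeny == dd2:
        return "Dd2"
    return None  # progeny matches neither parent (e.g. an error or new mutation)


def parental_blocks(markers: List[Marker]) -> List[Block]:
    """Merge runs of consecutive informative markers from the same parent."""
    blocks: List[Block] = []
    for position, hb3, dd2, progeny in markers:
        parent = assign_parent(progeny, hb3, dd2)
        if parent is None:
            continue
        if blocks and blocks[-1][0] == parent:
            blocks[-1] = (parent, blocks[-1][1], position)  # extend current block
        else:
            blocks.append((parent, position, position))     # start a new block
    return blocks


def crossover_intervals(blocks: List[Block]) -> List[Tuple[int, int]]:
    """Each switch between parents bounds a crossover between two markers."""
    return [(left[2], right[1]) for left, right in zip(blocks, blocks[1:])]


# Toy markers on one chromosome, for illustration only.
toy_markers = [
    (100, "A", "G", "A"),
    (250, "C", "T", "C"),
    (400, "G", "G", "G"),  # uninformative: parents share the allele
    (520, "T", "C", "C"),
    (610, "A", "T", "T"),
    (700, "C", "G", "A"),  # matches neither parent, skipped
]

blocks = parental_blocks(toy_markers)
print(blocks)                       # [('HB3', 100, 250), ('Dd2', 520, 610)]
print(crossover_intervals(blocks))  # [(250, 520)]
```

A real analysis would also need to handle sequencing errors and low-coverage markers before trusting a single-marker switch; the sketch simply skips markers that match neither parent.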

Association of 2La with clines of aridity in Nigeria…

Modified from Coluzzi et al. (1979)

24,000 mosquitoes
194 sampling localities

High-throughput sequencing

Data from Besansky lab
Illumina Genome Analyzer
4 population pools (S-form)
SHRiMP alignment
BWA also works

C. Cheng et al., unpublished
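
With pooled sequencing, one common summary is the fraction of aligned reads at a site that carry the non-reference allele in each pool. The Python sketch below shows that calculation under stated assumptions: the function names, the minimum-depth cutoff, and the read counts are hypothetical, and the talk does not say how the pooled data were actually analyzed.

```python
from typing import Dict, Optional, Tuple


def pooled_allele_frequency(ref_reads: int, alt_reads: int,
                            min_depth: int = 10) -> Optional[float]:
    """Estimate the alternate-allele frequency in a pool from read counts.

    Returns None when coverage is below min_depth (an assumed cutoff),
    because a frequency from a handful of reads is not informative.
    """
    depth = ref_reads + alt_reads
    if depth < min_depth:
        return None
    return alt_reads / depth


def frequencies_by_pool(counts: Dict[str, Tuple[int, int]],
                        min_depth: int = 10) -> Dict[str, Optional[float]]:
    """Apply the estimate to every pool; counts maps pool -> (ref, alt) reads."""
    return {
        pool: pooled_allele_frequency(ref, alt, min_depth)
        for pool, (ref, alt) in counts.items()
    }


# Toy read counts at one site for four hypothetical pools.
toy_counts = {
    "pool_1": (30, 10),
    "pool_2": (20, 20),
    "pool_3": (3, 2),    # below the depth cutoff
    "pool_4": (0, 40),
}

pool_frequencies = frequencies_by_pool(toy_counts)
print(pool_frequencies)
# {'pool_1': 0.25, 'pool_2': 0.5, 'pool_3': None, 'pool_4': 1.0}
```

Estimates of this kind inherit whatever bias the aligner has toward reads that match the reference, so they are best compared across pools that were aligned the same way.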

Differential mapping biases do exist

Population haplotyping

In situ error isolation

Has been shown to be important in ancient DNA-based ecology

Thanks to…

VectorBase (NIH/NIAID)
Dr. Nora Besansky (ND)
Dr. Frank Collins (ND)
Rory Carmichael, Andrew Shehan, Nate Konopinski, Dave Campbell (ND), others…

Notre Dame Bioinformatics Lab, Summer 2010

Anopheles genome cluster group

i5K

Arthropod Genomics Consortium steering committee