Oct 2, 2013

Scott Emrich
Assistant Professor, Computer Science and Engineering
Scientific Manager, VectorBase
University of Notre Dame

A flexible, scalable genomics framework for integrating heterogeneous vector sequence data

Assembly required…

VectorBase is here to help (esp. OMICs data)

Please see me and/or Dan Lawson (EBI) anytime during this meeting

Anopheles gambiae M & S

Lawniczak, Emrich et al. (2010, Science)

Some genomic regions display footprints of strong, recent selection

Lawniczak, Emrich et al. (2010, Science)

[Figure: short example sequences labeled Reference, Sample_1, and Sample_2, shown with further aligned sequences that differ from one another at a few positions]

FlexReseq tool for integrating diverse sequence data

FlexReseq implementation

Genome Analysis Toolkit (GATK):
Map-Reduce framework that allows efficient access to large resequencing data sets

FlexReseq: A module for GATK:
Configurable interface allows easy data exploration
Modular implementation of rules allows for easy extension of software
Saves you from lots of scripting (Perl) code!

McKenna et al., Genome Research, 2010
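
The idea of modular rules applied inside a map-reduce traversal can be sketched in a few lines of Python. This is only an illustration of the pattern, not the FlexReseq or GATK API (GATK itself is a Java toolkit): the `Locus` record, the rule factories, and the sample names and base calls are all hypothetical.

```python
from dataclasses import dataclass
from functools import reduce
from typing import Callable, Dict, Iterable

@dataclass(frozen=True)
class Locus:
    """One reference position with a base call per sample (hypothetical record)."""
    chrom: str
    pos: int
    ref: str
    calls: Dict[str, str]  # sample name -> called base

Rule = Callable[[Locus], bool]

def differs_from_reference(sample: str) -> Rule:
    """Rule factory: true when the sample has a call that differs from the reference."""
    def rule(locus: Locus) -> bool:
        return sample in locus.calls and locus.calls[sample] != locus.ref
    return rule

def samples_disagree(a: str, b: str) -> Rule:
    """Rule factory: true when both samples are called and the calls differ."""
    def rule(locus: Locus) -> bool:
        return a in locus.calls and b in locus.calls and locus.calls[a] != locus.calls[b]
    return rule

def map_locus(locus: Locus, rules: Dict[str, Rule]) -> Dict[str, int]:
    """Map step: evaluate every configured rule at a single locus."""
    return {name: int(rule(locus)) for name, rule in rules.items()}

def reduce_counts(total: Dict[str, int], partial: Dict[str, int]) -> Dict[str, int]:
    """Reduce step: add one locus's rule hits to the running totals."""
    return {name: total[name] + partial[name] for name in total}

def run(loci: Iterable[Locus], rules: Dict[str, Rule]) -> Dict[str, int]:
    """Count, per rule, how many loci satisfy it."""
    start = {name: 0 for name in rules}
    return reduce(reduce_counts, (map_locus(locus, rules) for locus in loci), start)

# Made-up example data; adding a new analysis means adding one entry to `rules`.
loci = [
    Locus("chr1", 1, "A", {"Sample_1": "T", "Sample_2": "A"}),
    Locus("chr1", 2, "C", {"Sample_1": "T", "Sample_2": "T"}),
    Locus("chr1", 3, "G", {"Sample_1": "G", "Sample_2": "G"}),
]
rules = {
    "sample1_variant": differs_from_reference("Sample_1"),
    "sample1_vs_sample2": samples_disagree("Sample_1", "Sample_2"),
}
summary = run(loci, rules)
print(summary)
```

Keeping each rule as an independent function is what makes the configuration cheap to change: the traversal code stays fixed while rules are added or removed.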

A malaria use-case for FlexReseq

Samarakoon, Regier, et al., BMC Genomics, 2011

Why are some parasites drug-resistant?

Goal: we want to connect genotype (genome) to phenotype (drug response)

How did drug resistance evolve?

1. Whole genome shotgun sequencing
2. Reference genome mapping

NCBI Trace Archive [28]
Reference genome (3D7)
Parental genomes [shotgun libraries]
Progeny genomes [shotgun libraries]
PlasmoDB (v5.4) [27]
Mapped: SSAHA2 (http://www.sanger.ac.uk)
Parents: HB3, Dd2
Progeny recombinants: SC05, 7C126
Shotgun libraries: GS-FLX technology (454/Roche)
Genetic cross: Wellems et al. 1990 [24]

A more detailed map of P. falciparum

[Figure: panels (A) 7C126 and (B) SC05; legend: Dd2, HB3; axes: chromosome (1 to 14) and chromosome position]
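
One way to picture how a map like this could be derived is to label each progeny base call by the parent it matches, and then merge consecutive calls from the same parent into blocks. The sketch below assumes that approach for illustration only; it is not taken from the cited paper, and the function names, positions, and bases are all hypothetical.

```python
from typing import List, Optional, Tuple

Site = Tuple[int, str, str, str]  # (position, HB3 base, Dd2 base, progeny base)
Block = Tuple[int, int, str]      # (first position, last position, parent)

def assign_parent(hb3: str, dd2: str, progeny: str) -> Optional[str]:
    """Return the parent whose base the progeny carries, or None if uninformative."""
    if hb3 == dd2:
        return None  # parents agree, so the site cannot distinguish them
    if progeny == hb3:
        return "HB3"
    if progeny == dd2:
        return "Dd2"
    return None      # progeny matches neither parent (possible error or new variant)

def inheritance_blocks(sites: List[Site]) -> List[Block]:
    """Merge consecutive informative sites with the same parent into blocks.

    `sites` must be sorted by position along one chromosome.
    """
    blocks: List[Block] = []
    for pos, hb3, dd2, progeny in sites:
        parent = assign_parent(hb3, dd2, progeny)
        if parent is None:
            continue
        if blocks and blocks[-1][2] == parent:
            start, _, _ = blocks[-1]
            blocks[-1] = (start, pos, parent)   # extend the current block
        else:
            blocks.append((pos, pos, parent))   # parent switch: start a new block
    return blocks

# Made-up sites on a single chromosome.
sites = [
    (100, "A", "G", "A"),
    (250, "C", "C", "C"),
    (400, "T", "C", "T"),
    (900, "G", "A", "A"),
    (1200, "C", "T", "T"),
    (1500, "A", "T", "C"),
]
blocks = inheritance_blocks(sites)
print(blocks)
```

A change of parent between adjacent blocks marks an interval that would contain a crossover under these assumptions; real data would also need filtering for coverage and base quality.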

Association of 2La with clines of aridity in Nigeria…

Modified from Coluzzi et al. (1979)

24,000 mosquitoes
194 sampling localities

High-throughput sequencing

Data from Besansky lab
Illumina Genome Analyzer
4 population pools (S-form)
SHRiMP alignment
BWA also works

C. Cheng et al., unpublished
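
With pooled samples, a common first summary is the per-pool allele frequency at a site, estimated from aligned read counts. The sketch below assumes that simple estimator, with a minimum-depth cutoff chosen arbitrarily. The pool names, read counts, and `min_depth` value are made up for illustration and are not from the unpublished data.

```python
from typing import Dict, Optional, Tuple

def pool_allele_frequency(ref_reads: int, alt_reads: int, min_depth: int = 10) -> Optional[float]:
    """Estimate the alternate-allele frequency in one pool from read counts.

    Returns None when coverage is below `min_depth`, since the estimate
    would be too noisy to interpret.
    """
    depth = ref_reads + alt_reads
    if depth < min_depth:
        return None
    return alt_reads / depth

# Hypothetical (reference reads, alternate reads) at one site in four pools.
counts: Dict[str, Tuple[int, int]] = {
    "pool_1": (18, 2),
    "pool_2": (10, 10),
    "pool_3": (3, 1),
    "pool_4": (0, 25),
}
freqs = {pool: pool_allele_frequency(ref, alt) for pool, (ref, alt) in counts.items()}
print(freqs)
```

Running the same calculation on alignments from two different aligners and comparing the results per site is one simple way to look for aligner-specific differences.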

Differential mapping biases do exist

Population haplotyping

In situ error isolation

Has been shown to be important in ancient DNA-based ecology

Thanks to…

VectorBase (NIH/NIAID)
Dr. Nora Besansky (ND)
Dr. Frank Collins (ND)
Rory Carmichael, Andrew Shehan, Nate Konopinski, Dave Campbell (ND), others…

Notre Dame Bioinformatics Lab, Summer 2010

Anopheles genome cluster group

i5K

Arthropod Genomics Consortium steering committee