
SeqWare Query Engine: storing and searching sequence data in the cloud

Author(s): O'Connor, BD (O'Connor, Brian D.) [2]; Merriman, B (Merriman, Barry) [1]; Nelson, SF (Nelson, Stanley F.) [1]


Source: BMC BIOINFORMATICS
Volume: 11
Supplement: 12
Article Number: S2
DOI: 10.1186/1471-2105-11-S12-S2
Published: DEC 21 2010

Times Cited: 6 (from Web of Science)
Cited References: 30


Abstract:

Background: Since the introduction of next-generation DNA sequencers the rapid increase in sequencer throughput, and associated drop in costs, has resulted in more than a dozen human genomes being resequenced over the last few years. These efforts are merely a prelude for a future in which genome resequencing will be commonplace for both biomedical research and clinical applications. The dramatic increase in sequencer output strains all facets of computational infrastructure, especially databases and query interfaces. The advent of cloud computing, and a variety of powerful tools designed to process petascale datasets, provide a compelling solution to these ever increasing demands.


Results: In this work, we present the SeqWare Query Engine which has been created using modern cloud computing technologies and designed to support databasing information from thousands of genomes. Our backend implementation was built using the highly scalable, NoSQL HBase database from the Hadoop project. We also created a web-based frontend that provides both a programmatic and interactive query interface and integrates with widely used genome browsers and tools. Using the query engine, users can load and query variants (SNVs, indels, translocations, etc.) with a rich level of annotations including coverage and functional consequences. As a proof of concept we loaded several whole genome datasets including the U87MG cell line. We also used a glioblastoma multiforme tumor/normal pair to both profile performance and provide an example of using the Hadoop MapReduce framework within the query engine. This software is open source and freely available from the SeqWare project (http://seqware.sourceforge.net).


Conclusions: The SeqWare Query Engine provided an easy way to make the U87MG genome accessible to programmers and non-programmers alike. This enabled a faster and more open exploration of results, quicker tuning of parameters for heuristic variant calling filters, and a common data interface to simplify development of analytical tools. The range of data types supported, the ease of querying and integrating with existing tools, and the robust scalability of the underlying cloud-based technologies make SeqWare Query Engine a natural fit for storing and searching ever-growing genome sequence datasets.

Accession Number: WOS:000290219600002
Document Type: Article; Proceedings Paper
Language: English

KeyWords Plus: MYELOID-LEUKEMIA GENOME; CANCER GENOME; BROWSER; UCSC; MUTATIONS

Reprint Address: Nelson, SF (reprint author), Univ Calif Los Angeles, Dept Human Genet, Los Angeles, CA 90095 USA.

Addresses:
1. Univ Calif Los Angeles, Dept Human Genet, Los Angeles, CA 90095 USA
2. Univ N Carolina, UNC Lineberger Comprehens Canc Ctr, Chapel Hill, NC 27599 USA

E-mail Address: snelson@ucla.edu


ResearcherID Numbers:
Nelson, Stanley: D-4771-2009
zong, fico: H-4677-2011


Publisher: BIOMED CENTRAL LTD, 236 GRAYS INN RD, FLOOR 6, LONDON WC1X 8HL, ENGLAND
Web of Science Categories: Biochemical Research Methods; Biotechnology & Applied Microbiology; Mathematical & Computational Biology
Research Areas: Biochemistry & Molecular Biology; Biotechnology & Applied Microbiology; Mathematical & Computational Biology
IDS Number: 759DA

ISSN: 1471-2105