Efficient selection of unique and popular oligos for large EST databases

Author(s): Zheng, J; Close, TJ; Jiang, T; Lonardi, S

Source: BIOINFORMATICS
Volume: 20
Issue: 13
Pages: 2101-2112
DOI: 10.1093/bioinformatics/bth210
Published: SEP 1 2004

Times Cited: 10 (from Web of Science)
Cited References: 29


Abstract:

Motivation: Expressed sequence tag (EST) databases have grown exponentially in recent years and now represent the largest collection of genetic sequences. An important application of these databases is that they contain information useful for the design of gene-specific oligonucleotides (or simply, oligos) that can be used in PCR primer design, microarray experiments and genomic library screening.

Results: In this paper, we study two complementary problems concerning the selection of short oligos, e.g. 20-50 bases, from a large database of tens of thousands of ESTs: (i) selection of oligos each of which appears (exactly) in one unigene but does not appear (exactly or approximately) in any other unigene and (ii) selection of oligos that appear (exactly or approximately) in many unigenes. The first problem is called the unique oligo problem and has applications in PCR primer and microarray probe designs, and library screening for gene-rich clones. The second is called the popular oligo problem and is also useful in screening genomic libraries. We present an efficient algorithm to identify all unique oligos in the unigenes and an efficient heuristic algorithm to enumerate the most popular oligos. By taking into account the distribution of the frequencies of the words in the unigene database, the algorithms have been engineered carefully to achieve remarkable running times on regular PCs. Each of the algorithms takes only a couple of hours (on a 1.2 GHz CPU, 1 GB RAM machine) to run on a 28 Mb dataset of barley unigenes from the HARVEST database. We present simulation results on the synthetic data and a preliminary analysis of the barley unigene database.
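The two problem definitions in the abstract can be illustrated with a small brute-force sketch in Python. This is not the authors' algorithm, which the abstract says is engineered around the word-frequency distribution to scale to tens of thousands of ESTs; it only spells out what "unique" and "popular" mean on toy data. Several points are assumptions made for illustration: "approximately" is taken to mean within d mismatches (Hamming distance), candidate oligos are restricted to words that occur exactly in at least one unigene, and the function names and parameters (unigene_hits, unique_oligos, popular_oligos, l, d, top) are hypothetical.

```python
from collections import defaultdict


def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def occurs_approx(oligo, seq, d):
    """True if some window of seq is within Hamming distance d of oligo.

    Assumption: "approximate" occurrence means at most d mismatches.
    """
    l = len(oligo)
    return any(hamming(oligo, seq[i:i + l]) <= d
               for i in range(len(seq) - l + 1))


def unigene_hits(unigenes, l, d):
    """Map every l-mer occurring exactly in the data to the set of unigene
    indices in which it occurs exactly or approximately (brute force)."""
    candidates = {s[i:i + l] for s in unigenes for i in range(len(s) - l + 1)}
    hits = defaultdict(set)
    for oligo in candidates:
        for idx, seq in enumerate(unigenes):
            if occurs_approx(oligo, seq, d):
                hits[oligo].add(idx)
    return hits


def unique_oligos(unigenes, l, d):
    """Oligos whose only hit, exact or approximate, is their own unigene."""
    hits = unigene_hits(unigenes, l, d)
    return sorted(o for o, h in hits.items() if len(h) == 1)


def popular_oligos(unigenes, l, d, top=5):
    """The `top` oligos hitting the most unigenes (ties broken by name)."""
    hits = unigene_hits(unigenes, l, d)
    return sorted(hits, key=lambda o: (-len(hits[o]), o))[:top]
```

For example, calling unique_oligos(["ACGTAA", "ACGTCC", "ACGAGG"], l=4, d=1) and popular_oligos on the same input separates words confined to one sequence from words shared, up to one mismatch, by all three. The cost of this sketch grows with the number of candidate words times the total database length, which is why a scan of this kind would not be practical at the database sizes the abstract describes.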

Accession Number: WOS:000223827000013

Document Type: Article

Language: English

KeyWords Plus: EXPRESSED SEQUENCE TAGS; UNUSUAL WORDS; DNA-SEQUENCE; DISCOVERY; PATTERNS; GENES; MOTIFS

Reprint Address: Lonardi, S (reprint author), Univ Calif Riverside, Dept Comp Sci & Engn, Riverside, CA 92521 USA.

Addresses:
1. Univ Calif Riverside, Dept Comp Sci & Engn, Riverside, CA 92521 USA
2. Univ Calif Riverside, Dept Bot & Plant Sci, Riverside, CA 92521 USA

E-mail Address: stelo@cs.ucr.edu


ResearcherID Numbers: Zheng, Jie C-1356-2011


Publisher: OXFORD UNIV PRESS, GREAT CLARENDON ST, OXFORD OX2 6DP, ENGLAND

Web of Science Categories: Biochemical Research Methods; Biotechnology & Applied Microbiology; Computer Science, Interdisciplinary Applications; Mathematical & Computational Biology; Statistics & Probability

Research Areas: Biochemistry & Molecular Biology; Biotechnology & Applied Microbiology; Computer Science; Mathematical & Computational Biology; Mathematics

IDS Number: 853KR

ISSN: 1367-4803