Selection of oligonucleotide probes for protein coding sequences




Vol. 19 no. 7 2003, pages 796–802
Xiaowei Wang and Brian Seed

Department of Molecular Biology, 50 Blossom Street, Massachusetts General
Hospital, Boston, MA 02114, USA
Received on July 31, 2002; revised on November 13, 2002; accepted on December 14, 2002
Motivation: Large arrays of oligonucleotide probes have become popular
tools for analyzing RNA expression. However, to date most oligo
collections contain poorly validated sequences or are biased toward
untranslated regions (UTRs). Here we present a strategy for picking
oligos for microarrays that focus on a design universe consisting
exclusively of protein coding regions. We describe the constraints in
oligo design that are imposed by this strategy, as well as a software
tool that allows the strategy to be applied broadly.
Result: In this work we sequentially apply a variety of simple filters
to candidate sequences for oligo probes. The primary filter is a
rejection of probes that contain contiguous identity with any other
sequence in the sample universe that exceeds a pre-established
threshold length. We find that rejection of oligos that contain 15
bases of perfect match with other sequences in the design universe is a
feasible strategy for oligo selection for probe arrays designed to
interrogate mammalian RNA populations. Filters to remove sequences with
low complexity and predicted poor probe accessibility narrow the
candidate probe space only slightly. Rejection based on global sequence
alignment is performed as a secondary, rather than primary, test,
leading to an algorithm that is computationally efficient. Splice
isoforms pose unique challenges and we find that isoform prevalence
will for the most part have to be determined by analysis of the
patterns of hybridization of partially redundant oligonucleotides.
Availability: The oligo design program OligoPicker and its source code
are freely available at our website.
Supplementary information: http://pga.mgh.harvard.
To whom correspondence should be addressed.

Microarray technology has been widely used to monitor gene expression
in recent years. Hybridization to such arrays provides a fast and
moderately reliable approach to analyze a large number of genes
simultaneously (reviewed by Gerhold et al., 1999). The two most widely
used types of non-commercial microarrays are spotted cDNA and spotted
oligonucleotide arrays. One of the challenges posed by oligo array
technology is the design of probes that provide the maximum amount of
biologically relevant information.
To date little attention has been paid to choosing oligos that provide
information that has the highest biological relevance. Most probe
collections focus on 3′ UTRs, in part because of a presumption that
oligo dT will be used to prime the RNA populations, and in part because
sequence divergence is typically greater in such regions. However, 3′
UTR variability is substantial, and the tools for predicting the
prevalence of alternative 3′ ends are still in development. The 5′ ends
of transcription units that lack TATA elements are also heterogeneous,
and quantitative estimates of the distributions of ends are rare in the
literature. Ultimately, most users of microarray data are interested in
the abundance of the encoded proteins, for which the prevalence of the
cognate RNAs remains the most accessible surrogate.
In this paper, we describe the results of studies on the impact of
various design criteria on the fraction of candidate probes qualified
for microarray interrogation of RNA coding regions. OligoPicker, our
design tool, evaluates specificity primarily for the occurrences of
contiguous perfect matches. Sequence specificity is also crosschecked
by global BLAST score. The hybridization accessibility of both the
probes and the target analyte is assessed by calculation of regions
that may self-anneal. We have also developed an algorithm to cluster
redundant mouse and human genes and have used the resulting unique
sequences for probe design.
Oligo picking criteria
Location in the sequence. Two kinds of priming reactions are commonly
used to create cDNA from mRNA or total RNA for microarray experiments:
random priming
Bioinformatics 19(7)
Oxford University Press 2003;all rights reserved.
and oligo dT priming. Random priming is expected to result in the
creation of probes that have significant representation of non-coding
RNAs, such as ribosomal, or small nuclear RNAs. Oligo dT priming is
expected to result in probes that are enriched for mRNAs, but that will
not necessarily encompass the entire coding region. Because we are
interested in understanding the contributions of different coding
region isoforms, we expect to be performing random primed labeling.
Since the primer can anneal at any place in the template, if each
primer is capable of being extended to the 5′ end of the RNA, there
will result a linear gradient in the representation of the sequences,
ranging from highest at the 5′ end to lowest at the 3′ end. In the
limit of very large transcript length, premature termination of reverse
transcription will have the effect of flattening the gradient toward
the 5′ end; but in general the desired oligos for maximum sensitivity
in a random primed labeling will lie as close to the 5′ end of the RNA
as possible.
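
The linear gradient described above can be made explicit with a short
idealized calculation. This is only a sketch under assumptions that are
not spelled out in the text: primers anneal uniformly along a transcript
of length L, every primer is extended all the way to the 5′ end, and x
(a symbol introduced here for illustration) is the distance of a
position from the 5′ end.

```latex
% x: distance from the 5' end; L: transcript length (illustrative symbols).
% A primer annealing at position p yields cDNA covering [0, p], so a
% position x is represented by every primer with p >= x.
R(x) \;\propto\; \Pr(p \ge x) \;=\; \frac{L - x}{L} \;=\; 1 - \frac{x}{L},
\qquad 0 \le x \le L .
```

Under these assumptions the representation R(x) is maximal at the 5′ end
(x = 0) and falls linearly to zero at the 3′ end (x = L).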
Tm uniformity. To ensure quantitative comparison of gene expression,
microarray hybridization conditions should be similar for all genes in
the study. Although tetra-alkyl ammonium salts in appropriate
concentrations are known to eliminate the dependence of melting
temperature on base composition (Jacobs et al., 1988) and have been
applied to the use of degenerate oligonucleotide hybridization probes
(Wood et al., 1985), they have not yet been widely applied to
microarray hybridizations. Hence one design requirement for the oligo
probes was that their melting temperatures (Tm) should fall in a narrow
range. Since the GC content varies in different organisms, the oligo
design tool first evaluates all sequences in the data set and
determines the median Tm for oligos by the following formula:
Tm = 64.9 + 41 × gcCount/oligoLength − 600/oligoLength
where gcCount is the number of all Gs and Cs in an oligo and the molar
sodium concentration is taken to be 0.1 M (Schildkraut, 1965).
OligoPicker starts picking oligos from one end of the sequence. A probe
candidate is discarded if its Tm is not within 5 °C of the median Tm.
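
The Tm screen can be illustrated with a minimal Python sketch. The
formula is the one given above; the function names, the sliding-window
enumeration of candidates, and the choice to take the median over every
window in the data set are assumptions made for illustration and are
not taken from the OligoPicker source.

```python
from statistics import median


def melting_temp(oligo: str) -> float:
    """Tm in degrees C from the formula in the text (0.1 M sodium)."""
    n = len(oligo)
    gc_count = sum(1 for base in oligo.upper() if base in "GC")
    return 64.9 + 41.0 * gc_count / n - 600.0 / n


def windows(seq: str, length: int):
    """Yield (0-based start, subsequence) for every window of the given length."""
    for start in range(len(seq) - length + 1):
        yield start, seq[start:start + length]


def tm_filter(sequences, length=70, tolerance=5.0):
    """Return the median Tm and the (sequence index, start) pairs within tolerance.

    Assumption: the median is computed over all candidate windows in the data set.
    """
    tms = [melting_temp(w) for s in sequences for _, w in windows(s, length)]
    if not tms:
        return None, []
    med = median(tms)
    kept = [
        (i, start)
        for i, s in enumerate(sequences)
        for start, w in windows(s, length)
        if abs(melting_temp(w) - med) <= tolerance
    ]
    return med, kept
```

With the default 5 °C tolerance this keeps candidates in a 10 °C band
centered on the data set median.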
Probe accessibility. The contribution of probe secondary structure to
oligo hybridization efficiency has not been studied in great detail to
date, but may impact performance because hybridizations are performed
at below 50 °C in many microarray protocols (Hughes et al., 2001; Kane
et al., 2000). Although the prediction of nucleic acid secondary or
tertiary structure remains quite challenging, a priori, the likelihood
of secondary structure is greatest in regions of significant
self-complementarity, for example in the stems of stem-loop structures
(reviewed in Mount, 2001). Probe candidates are therefore tested for
homology to the complementary strand of their cognate sequences using
BLAST. This approach does not take local concentration of the
complementary sequence into account, for example the expected higher
free energy (due to ring closure entropy) of structures that entail the
formation of large loops.
Reduced cross-hybridization. Although the hybridization free energy
cannot be calculated directly at present (Li and Stormo, 2001), and
empirical data suggest that mismatch context can significantly affect
DNA duplex stability (Kierzek et al., 1999), a single mismatch can be
expected to destabilize the hybridization complex (Willems et al.,
1989). Because we hypothesize that contiguous base pairing is the
single most important determinant of cross-hybridization, we have made
the rejection of contiguous sequence identity the primary filter in our
selection scheme.
To quickly search for a stretch of perfectly matched bases, we store
all possible 10mers within the data set in a hash table data structure.
The hash key is a 10mer sequence and the hash value is a string
representation of the relative sequence indexes and positions where
this particular 10mer is found (Fig. 1A). 10mers were adopted after
considering both program performance and computer memory consumption.
Repetitive n-mers are identified by using two overlapping 10mers in the
hash (Fig. 1B). Therefore the design tool is able to find repetitive
sequence stretches from 10mers to 20mers very quickly.
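
The 10mer hash lookup can be sketched in Python as follows. This is an
illustration of the idea rather than the OligoPicker code: the function
names are hypothetical, and the hash values are stored as sets of
(sequence index, position) tuples instead of the string representation
described above.

```python
from collections import defaultdict

K = 10  # word size of the hash, as in the text


def build_index(sequences, k=K):
    """Map every k-mer to the set of (sequence index, 1-based position)."""
    index = defaultdict(set)
    for seq_id, seq in enumerate(sequences):
        for pos in range(len(seq) - k + 1):
            index[seq[pos:pos + k]].add((seq_id, pos + 1))
    return index


def shared_nmer_hits(index, sequences, seq_id, pos, n, k=K):
    """Other locations of the n-mer starting at (seq_id, pos), for k <= n <= 2k.

    The first and last k-mers of the n-mer together cover all n bases, so
    the n-mer recurs wherever the last k-mer is found exactly n - k bases
    downstream of an occurrence of the first k-mer.
    """
    if not k <= n <= 2 * k:
        raise ValueError("n must lie between k and 2k")
    nmer = sequences[seq_id][pos - 1:pos - 1 + n]
    if len(nmer) < n:
        return set()
    head, tail = nmer[:k], nmer[n - k:]
    offset = n - k
    tail_hits = index.get(tail, set())
    return {
        (s, p)
        for (s, p) in index.get(head, set())
        if (s, p + offset) in tail_hits and (s, p) != (seq_id, pos)
    }
```

For the 15mer of Figure 1B the offset is 5, which corresponds to the
pair of entries (4599,890) and (4599,895) in the two 10mer lists.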
In an earlier study it was reported that very high sequence similarity
may lead to cross-hybridization even when the sequences have been
prescreened for contiguous perfect match (Kane et al., 2000). To reduce
the contribution to cross-hybridization attributable to global
similarity, oligos are also screened on the basis of their NCBI BLAST
scores ( against the design universe.
Because of high sequence homology, some sequences will inevitably fail
to be represented by unique oligo probes. The sequence homology is most
likely from alternative splicing or from other distinct genes in gene
super-families. In these cases, oligo probes are picked from the
regions that cross-hybridize, determined by both continuous base match
and BLAST score, to the smallest number of other sequences in the
sample universe.
Evasion of non-coding RNA and low complexity regions. One practical
concern is that RNA other than mRNA may interfere with array
hybridization when total RNA is used as the starting material. These
RNAs may also be reverse-transcribed and cross-hybridize to some oligo
probes. To address this issue, sequence regions similar to rRNA or
snRNA are avoided during the probe selection process by using both
contiguous base match screening and BLAST. Low-complexity regions are
also likely to contribute to cross-hybridization (Wootton and Federhen,
1996). These regions are identified by the DUST program (Hancock and
Armstrong, 1994) and skipped when selecting probes.

(A) Gene Index = 1000
gtcattgatg  (1000,1), (343,22), (4442,201), (4599,890), (18949,1900),
tcattgatga  (1000,2), (10225,455), (14567,890), (20021,12),
cattgatgaa  (1000,3), (23,444), (2265,211), (7895,2110),
attgatgaag  (1000,4), (6679,3451),
ttgatgaagc  (1000,5), (7865,67),
tgatgaagcg  (1000,6), (4599,895), (9899,22),
(B) gtcattgatgaagcg (15-mer)
gtcattgatg  (1000,1), (343,22), (4442,201), (4599,890), (18949,1900),
tgatgaagcg  (1000,6), (4599,895), (9899,22),
Fig. 1. An example to illustrate how a repetitive 15mer is identified.
All possible 10mers within the data set were stored in a hash data
structure. A hash key is a 10mer sequence and the hash value is a
string representation of the relative sequence indexes and positions
where this particular 10mer is found. Repetitive 15mers were identified
by using two overlapping 10mers in the hash. (A) The relative gene
index of this sample sequence is 1000. All 10mers were stored in the
existing hash by gliding through the sequence one base at a time. Each
pair of parentheses (gene index, position) represents where this 10mer
is found. (B) The 15mer 'gtcattgatgaagcg' from position 1 in gene index
1000 is used here to illustrate how repetitive 15mers are identified by
using the 10mer hash data structure. This 15mer also occurred at
position 890 in sequence index 4599 (highlighted in bold).
Sequence file preparation
Sequence redundancy is one of the biggest challenges for probe design.
The most highly documented source of candidate genes, the NCBI
Reference Sequence Project, RefSeq (Pruitt et al., 2000; Pruitt and
Maglott, 2001) provides functional annotations and avoids sequence
redundancy, but is still relatively incomplete. The Institute for
Genomic Research (TIGR) has produced an orthologous gene list for
human, mouse, and rat ( that we have also used to help identify gene
candidates. However, the principal source of our sequence information
has been the NCBI protein database GenPept. The corresponding DNA
coding sequences were retrieved based on the CDS feature instructions
(http:// and redundant sequences were clustered (data sets available
online, see Supplementary Information). In our clustering algorithm and
its implementation (a program called DeRedund), we gave higher priority
to RefSeq sequences (excluding RefSeq genes predicted from genomic
contig assembly process), orthologous sequences and longer coding
sequences. Each DNA coding sequence was aligned against all other
coding sequences in the data set using BLAST. Sequences with at least
96% identity to the query sequence and with an alignment equal to or
greater than the query length − 5 bases were considered redundant.
Varying the identity threshold from 90 to 98% had little effect on the
final number of clusters (<5%). Ninety-six percent was chosen after
considering sequencing errors and polymorphism density. Cluster
representatives were fed to the oligo design program for probe design.
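
The redundancy rule and the stated priorities can be sketched as
follows. This is not the DeRedund implementation, whose details are not
given here: the `Hit` record, the greedy single-pass grouping, and the
tuple used as a priority key are all assumptions made for illustration.
Only the 96% identity threshold, the "query length − 5 bases" alignment
requirement, and the priority order come from the text.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Hit:
    """One pairwise alignment result (hypothetical record layout)."""
    query: str
    subject: str
    percent_identity: float
    alignment_length: int


def is_redundant(hit, query_length, min_identity=96.0, slack=5):
    """Redundancy rule from the text: >= 96% identity over >= query length - 5."""
    return (hit.percent_identity >= min_identity
            and hit.alignment_length >= query_length - slack)


def cluster(records, hits):
    """Greedy grouping sketch.

    records maps name -> (is_refseq, is_ortholog, length); sorting these
    tuples in descending order visits RefSeq entries first, then
    orthologous sequences, then longer coding sequences.
    Returns {representative: set of absorbed sequence names}.
    """
    order = sorted(records, key=lambda name: records[name], reverse=True)
    covered_by = defaultdict(set)  # subject -> queries it makes redundant
    for h in hits:
        if h.query != h.subject and is_redundant(h, records[h.query][2]):
            covered_by[h.subject].add(h.query)
    assigned, clusters = set(), {}
    for name in order:
        if name in assigned:
            continue
        members = {q for q in covered_by[name] if q not in assigned}
        clusters[name] = members
        assigned |= members | {name}
    return clusters
```

Each key of the returned dictionary plays the role of a cluster
representative that would be passed on to probe design.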
There were close to 60 000 mouse sequences in NCBI protein database as
of 1st March 2002. About 1/4 of them are immunoglobulin or TCR
sequences, most of which were removed as the initial step to reduce
sequence redundancy. Redundancies were then removed to generate 20 030
gene clusters. The resulting sequences were organized in a FASTA file
and used as the input data for OligoPicker to design 70mer probes. All
oligo probes have predicted melting temperatures falling in a 10 °C
range (79 °C ± 5 °C).
Impact of threshold for rejection of contiguous matches
We empirically explored the fraction of oligonucleotides containing
contiguous identity of length n or greater with at least one other
sequence in the design universe (Fig. 2). For this investigation we
used the mouse GenPept data set, and assessed picking failure, defined
by the inability to choose at least one oligo for a given gene, as a
function of the perfect match length n. Figure 2 shows the fraction of
genes failing the test as n is incrementally shortened from an initial
value of 20. The latter value was chosen because there was very little
change over the range 70 to 20 (data not shown), indicating that the
sequences that fail the 20mer perfect match threshold correspond to
very closely related sequences in the database. In the remaining
universe of sequences that pass the 20mer match criterion, an
increasing fraction fail as the contiguous match length is decreased.
In Figure 2 we also display the expected number of picking failures
when each of the mouse sequences is replaced by random strings of
nucleotides of the same length and base composition. The difference in
behavior between random sequences and sequences from the mouse data set
can be taken as a measure of the higher order non-random character of
the murine coding regions due to biological selection. Rejection of
15mer match was chosen as the default for the oligo selection tool
since 14mer match is not likely to significantly contribute to
cross-hybridization under most array hybridization conditions. For the
non-coding RNA filter, a more stringent criterion, rejection of 13mer
match, was chosen because of the abundance of these RNAs in the sample.
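
A random control universe of the kind described above can be sketched
by shuffling each sequence, which preserves its length and base
composition exactly. How the random strings were actually generated is
not stated, so the shuffling approach, the function names, and the
fixed seed are assumptions made for illustration.

```python
import random


def shuffled_like(seq: str, rng: random.Random) -> str:
    """Random string with the same length and base counts as seq."""
    bases = list(seq)
    rng.shuffle(bases)
    return "".join(bases)


def random_universe(sequences, seed=0):
    """Replace every sequence in the design universe by a shuffled copy."""
    rng = random.Random(seed)  # fixed seed so the control set is reproducible
    return [shuffled_like(s, rng) for s in sequences]
```

The same contiguous-match screen can then be run on both universes to
compare picking failures.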
Impact of oligo probe length
To evaluate the effect of oligo length on selection failure rate, the
oligo length was varied over the range 30–100 nucleotides. Figure 3
shows that the number of probes that can be designed decreases as the
probe length increases. 70mer oligo probes have been widely adopted in
the microarray community, so this length was chosen for our probe sets.
However, the users have the option to design probes from 20 bases to
100 bases long (see Supplementary Information).
[Figure 2 plot: Additional Failed Sequences versus Perfect-Match Length
for Rejection (>= 20-mer, 19-mer, 18-mer, 17-mer, 16-mer, 15-mer,
14-mer); series: mouse sequence, random sequence.]
Fig. 2. Impact of threshold for rejection of contiguous matches. Each
probe is a 70mer oligonucleotide representing the cognate DNA sequence.
All probes are within a 10 °C Tm range (79 ± 5 °C) of each other. Probe
candidates containing contiguous identity of length n or greater with
at least one other sequence in the design universe were rejected.
Sequence failure, defined by the inability to choose at least one oligo
probe for a given gene, was assessed as a function of the perfect match
length for rejection. The fraction of genes failing the test is shown
here as the length of perfect matches is incrementally shortened from
an initial value of 20. Also shown is the expected number of sequence
failures when each of the mouse sequences is replaced by random strings
of nucleotides of the same length and base composition.

[Figure 3 plot: Unique Probes versus Probe Length (30 to 100).]
Fig. 3. Impact of oligo probe length on probe selection. The oligo
length was varied over the range 30–100 nucleotides for each probe set.
All probes are within a 10 °C Tm range (79 ± 5 °C) of each other and
the selected probes do not have any repetitive 15mer.

[Figure 4 plot: Unique Probes versus Threshold BLAST Score (25 to 55).]
Fig. 4. Impact of BLAST score filter on probe design. Each probe is a
70mer oligonucleotide representing the cognate DNA sequence. All probes
are within a 10 °C Tm range (79 ± 5 °C) of each other and the selected
probes do not have any repetitive 15mer. Oligos were designated as
unique if they showed a BLAST score lower than a threshold value when
compared to all sequences in the sample universe. To evaluate the
effect of BLAST score on oligo picking, a score range 28–50 was
examined.

BLAST and self-annealing filters
After rejecting candidate oligonucleotides on the basis of contiguous
match, BLAST and self-annealing filters were applied. Oligos were
designated as eligible if they showed a BLAST score lower than a
threshold value when compared to all sequences in the sample universe
(Fig. 4). The BLAST parameters are -F F -S 1 -e 1000. To evaluate the
effect of BLAST score on oligo picking, a score range 28–50 was
examined. Only a small number of probes could be designed when the
score was lowered to 28. A BLAST score of 30 roughly corresponds to 15
contiguously matched bases. As discussed in the contiguous match filter
section, 14mer match is acceptable and so is BLAST score <30.
Therefore, score 30 was selected for the BLAST filter.
The self-annealing filter was examined in a similar way. Probe
candidate sequences were aligned against the complementary strands of
the target analyte using BLAST. The BLAST parameters are -S 2 -F F -W 8
-e 1000. The longer the perfect-match regions on the cDNA, the more
likely they are to base pair and become inaccessible to probes during
hybridization. Probes were selected if no more than a threshold number
(8–12) of perfect matches were found with the complementary strand of
the mRNA. As shown in Figure 5, very similar numbers of probes could be
designed when the possible base-paired region was longer than eight.
Therefore, nine maximum paired-bases were chosen for the cDNA
self-annealing filter. Similarly for oligo self-annealing, the oligo
was compared to its self-complementary sequence to search for perfect
matches. A more stringent criterion was used: up to eight paired-bases
were allowed, to accommodate the expectation that self-pairing of the
oligo would be more detrimental to hybridization than self-pairing to
distant sequence in the mRNA.
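
The self-annealing test can be illustrated without BLAST by searching
directly for the longest perfect match between an oligo and a reverse
complement. This swaps in a simple dynamic-programming longest common
substring search for the BLAST alignment used in the paper; the function
names are hypothetical, and only the two thresholds (nine paired bases
against the target, eight within the oligo) come from the text.

```python
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def longest_common_run(a: str, b: str) -> int:
    """Length of the longest substring shared by a and b."""
    best = 0
    prev = [0] * (len(b) + 1)
    for ch in a:
        cur = [0] * (len(b) + 1)
        for j, bj in enumerate(b, start=1):
            if ch == bj:
                cur[j] = prev[j - 1] + 1
                best = max(best, cur[j])
        prev = cur
    return best


def passes_self_annealing(oligo, transcript,
                          max_target_pairs=9, max_oligo_pairs=8):
    """True if neither the target nor the oligo can pair beyond its limit."""
    target_run = longest_common_run(oligo, reverse_complement(transcript))
    oligo_run = longest_common_run(oligo, reverse_complement(oligo))
    return target_run <= max_target_pairs and oligo_run <= max_oligo_pairs
```

The quadratic search is adequate for a single 70mer against one
transcript, but a genome-scale run would need an indexed search such as
the BLAST step described above.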
[Figure 5 plot: Unique Probes versus Maximum Paired Bases (8 to 12).]
Fig. 5. Impact of probe self-annealing filter on probe design. Each
probe is a 70mer oligonucleotide representing the cognate DNA sequence.
All probes are within a 10 °C Tm range (79 ± 5 °C) of each other and
the selected probes do not have any repetitive 15mer. Probes were
selected if no more than a threshold number (8–12) of perfect matches
were found with the complement strand of the cognate mRNA.

Impact of combining all screening filters
Probes were selected under five different conditions: (a) no repetitive
15mer in the oligos. (b) Condition a + low-complexity filter. (c)
Condition b + RNA filter. (d) Condition c + self-annealing filter. (e)
Condition d + BLAST filter (Fig. 6). As expected, the number of
sequences that failed to generate unique probes increased as the
screening stringency increased. The rejection of repetitive 15mers was
the most effective filter with the parameters we set, accounting for
87% of all the picking failures (Fig. 6). The low-complexity filter,
RNA filter, self-annealing (accessibility) filter, and BLAST filter
only slightly affected the probe design. For the sequences that did not
have unique probes under these conditions, the algorithm is designed to
accept repetitive 15mers (reject repetitive 16mers) and under these
conditions 2771 additional unique oligo probes could be designed.
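
The bookkeeping behind the "additional failed sequences" counts can be
sketched as a sequential filter cascade. The data layout (a dictionary
from gene to candidate probes) and the function name are assumptions,
and the filters are passed in as arbitrary predicates; the real
contiguous-match, low-complexity, RNA, self-annealing and BLAST tests
are not reproduced here.

```python
def stage_failures(candidates_by_gene, filters):
    """Apply filters in order and count genes that newly lose all candidates.

    candidates_by_gene: {gene: iterable of candidate probes}
    filters: ordered list of (name, keep) pairs, where keep(probe) -> bool
    Returns (report, surviving): report is a list of
    (filter name, additional failed genes) and surviving maps each gene
    to the candidates that passed every filter.
    """
    surviving = {gene: list(c) for gene, c in candidates_by_gene.items()}
    report = []
    for name, keep in filters:
        newly_failed = 0
        for cands in surviving.values():
            if not cands:
                continue  # already failed at an earlier stage
            cands[:] = [c for c in cands if keep(c)]
            if not cands:
                newly_failed += 1
        report.append((name, newly_failed))
    return report, surviving
```

Genes left with an empty candidate list at the end are the ones that
would be retried with the relaxed 16mer rejection rule described above.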
[Figure 6 plot: Additional Failed Sequences versus Screening Conditions
(a, b, c, d, e).]
Fig. 6. Effects of combining all screening filters. All probes are
within a 10 °C Tm range (79 ± 5 °C) of each other. In addition, probes
were selected under five different conditions (a) no repetitive 15mer
in the oligos. (b) Condition a + low-complexity filter. (c) Condition b
+ RNA filter. (d) Condition c + self-annealing filter. (e) Condition d
+ BLAST filter. The number of sequences that failed to generate unique
probes increased as the screening stringency increased. Additional
failed sequences are shown here as the result of increased screening
stringency.

Characterization of the non-unique probes
A 70mer oligo probe set was prepared by applying 15-base contiguous
match, low-complexity, RNA, self-annealing, and BLAST score filters.
Each sequence was represented by one oligo probe. One unique oligo
probe was designed for each sequence if the oligo passed these filters.
Sequences that had no unique probes were represented by probes that
cross-hybridized to other sequences. 14 554 sequences were represented
by unique probes. Among the non-unique probes, the vast majority, 3196
probes, cross-hybridized to only one sequence. This probe fraction was
further analyzed by keyword search in the sequence annotations for
splice isoforms and very similar genes (gene families). Olfactory
receptors were the largest gene family (173 members) among these 3196
sequences. The cross-hybridizing probes were categorized into two
classes, 100% cross-hybridization and partial cross-hybridization.
Non-unique probes that represented splice isoforms, identified by
sequence annotations, were mostly found to totally cross-hybridize to
another sequence. On the other hand, high gene similarity usually
resulted in partial cross-hybridization (Fig. 7). Further analysis
showed that 1549 out of the 3196 non-unique probes cross-hybridized to
sequences that were represented by unique probes. Random sampling
indicated that the cross-hybridizing sequences were usually the longer
form of the isoforms. Thus in general the shortest of a collection of
splice isoforms is the hardest to design a unique oligo for.
Our findings indicate that the challenges posed by splice isoforms do
not in general admit a solution based on border or splice junction
oligos, given the design criteria and oligo length we have adopted.
Instead we find it necessary to identify splice isoforms by analysis of
ensembles of oligonucleotides and interpretation of the pattern of
hybridization. The concentration of the smallest splice isoform in an
RNA sample will in general have to be calculated from the difference in
signal intensity between a redundantly represented oligo that
hybridizes to two or more species, and a unique oligo. In some cases,
for example where many different isoforms are present, the accuracy of
this determination is not likely to be high and other methods, such as
quantitative PCR, are likely to provide better estimates.
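
The difference calculation can be written down as a small Python
sketch. It rests on assumptions that the text does not state: signals
are taken to be additive and linear in concentration, all probes are
assumed to hybridize with equal efficiency, and negative differences
are clamped to zero. The function name is hypothetical.

```python
def shortest_isoform_signal(shared_signal, unique_signals):
    """Estimate the signal attributable to the isoform that lacks a unique probe.

    shared_signal: intensity of the oligo that hybridizes to all isoforms
    unique_signals: intensities of the unique oligos for the other isoforms
    Assumes additive, linear signals; negative estimates are clamped to 0.
    """
    estimate = shared_signal - sum(unique_signals)
    return max(estimate, 0.0)
```

For example, a shared probe reading of 1200 with a single longer-isoform
probe reading of 800 gives an estimate of 400 for the short isoform.
With several isoforms the measurement errors of each term accumulate,
which is consistent with the caution expressed above.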
[Figure 7 plot: Probe Fraction for 100% cross-hybridization and partial
cross-hybridization; series: Splice Isoform, All Sequences, Olfactory
Receptor.]
Fig. 7. Splice isoform vs. gene family in non-unique probes. There were
3196 probes that had only one cross-hybridizing sequence. This probe
fraction was further analyzed by keyword search in the sequence
annotations for splice isoforms and very similar genes (gene families).
Olfactory receptors were the largest gene family among these sequences.
The cross-hybridizing probes were categorized into two classes, 100%
cross-hybridization and partial cross-hybridization.
Probe specificity
Several papers have analyzed the requirements for designing gene
specific oligo probes (Evertsz et al., 2001; Hughes et al., 2001; Kane
et al., 2000; Miller et al., 2002). There are also a few programs
available either commercially or as open source for probe design
(Hughes et al., 2001; Pruitt et al., 2000; Lockhart et al., 1996;
Rouillard et al., 2002;
These programs usually use the BLAST program as the primary tool to
locate gene specific regions where the probes are designed.
In our algorithm, we first addressed the importance of contiguous
matches in the role of designing gene specific probes. A hashing
technique, similar to that used in FASTA and BLAST, was adopted to
provide rapid sequence similarity assessment. However, instead of being
used as seeds for alignment extensions as in BLAST, the 10mers were
used as basic units to construct all possible 15mers in the data set
(Fig. 1). In this way, repetitive 15mers could be quickly identified.
The probes were also crosschecked later by BLAST screening. The
occurrence of 15mer perfect matches strongly correlated with high BLAST
scores. As shown in Figure 6, unique probes with below threshold BLAST
score (<30) can be designed for most sequences that have passed the
15mer perfect match screening in the sample design universe. Because of
the hashing technique used here, our program can screen 20 030
sequences in a few hours, much faster than other oligo picking programs
that were implemented primarily with BLAST.
Probe sensitivity
It is important that all the probes and the hybridizing regions of
oligos be accessible under hybridization conditions. Several programs
can predict secondary structures for small regions of nucleic acid
based on free energy calculations (reviewed in Zuker, 2000). These
programs have been used to design secondary-structure free probes
(Rouillard et al., 2002). However, an equally if not more relevant
parameter is the secondary structure of the target analyte. If the
sequences chosen for the oligo coincide with a region of
self-complementarity in the mRNA, it is likely that hybridization
efficiency will be reduced. To avoid such sequences we search the
complementary strand of the mRNA for similarity to sequences in the
oligo, and reject those that exceed an empirically established
threshold.
In summary, we have designed an oligo picking algorithm and used it to
design unique oligo probes for 14 554 mouse genes and 17 558 human
genes after clustering redundant sequences. Probes that may
cross-hybridize to more than one gene were also picked to represent the
rest of the sequences in NCBI GenPept database.
We thank our colleagues in the Molecular Biology Core Facility for
their contributions, Jonathan Urbach for an early version of the code
for removing redundancies, and Mason Freeman for suggestions on oligo
design. This research was supported by the PGA grant U01 HL66678 from
the National Heart, Lung and Blood Institute.
Reynolds,M.A. (2001) Hybridization cross-reactivity within homologous
gene families on glass cDNA microarrays.
Gerhold,D., Rushmore,T. and Caskey,C.T. (1999) DNA chips: promising
toys have become powerful tools. Trends. Biochem.
Hancock,J.M. and Armstrong,J.S. (1994) SIMPLE34: an improved and
enhanced implementation for VAX and Sun computers of the SIMPLE
algorithm for analysis of clustered repetitive motifs in nucleotide
sequences. Comput. Appl. Biosci., 10, 67–70.
Meyer,M.R. (2001) Expression profiling using microarrays fabricated by
an ink-jet oligonucleotide synthesizer. Nat. Biotechnol.,
and Fritsch,E.F. (1988) The thermal stability of oligonucleotide
duplexes is sequence independent in tetraalkylammonium salt solutions:
application to identifying recombinant DNA clones. Nucleic Acids Res.,
16, 4637–4650.
Madore,S.J. (2000) Assessment of the sensitivity and specificity of
oligonucleotide (50mer) microarrays. Nucleic Acids Res., 28,
Kierzek,R., Burkard,M.E. and Turner,D.H. (1999) Thermodynamics of
single mismatches in RNA duplexes. Biochemistry, 38,
Li,F. and Stormo,G.D. (2001) Selection of optimal DNA oligos for gene
expression arrays. Bioinformatics, 17, 1067–1076.
Chee,M.S., Mittmann,M., Wang,C., Kobayashi,M. and Horton,H. (1996)
Expression monitoring by hybridization to high-density oligonucleotide
arrays. Nat. Biotechnol., 14, 1675–1680.
LaBrie,S.T. (2002) Cross-hybridization of closely related genes on
high-density macroarrays. Biotechniques, 32, 620–625.
Mount,D.W. (2001) Bioinformatics: Sequence and Genome Analysis. Cold
Spring Harbor Laboratory Press, New York City.
Pruitt,K.D., Katz,K.S., Sicotte,H. and Maglott,D.R. (2000) Introducing
RefSeq and LocusLink: curated human genome resources at the NCBI.
Trends Genet., 16, 44–47.
Pruitt,K.D. and Maglott,D.R. (2001) RefSeq and LocusLink: NCBI
gene-centered resources. Nucleic Acids Res., 29, 137–140.
Rouillard,J.M., Herbert,C.J. and Zuker,M. (2002) OligoArray:
genome-scale oligonucleotide design for microarrays. Bioinfor-
Schildkraut,C. (1965) Dependence of the melting temperature of DNA on
salt concentration. Biopolymers, 3, 195–208.
Willems,v.D., Schroeder,H.W.J., Perlmutter,R.M. and Milner,E.C. (1989)
Heterogeneity in the human Ig VH locus. J. Immunol.,
Wood,W.I., Gitschier,J., Lasky,L.A. and Lawn,R.M. (1985) Base
composition-independent hybridization in tetramethylammonium chloride:
a method for oligonucleotide screening of highly complex gene
libraries. Proc. Natl Acad. Sci. USA, 82, 1585–
Wootton,J.C. and Federhen,S. (1996) Analysis of compositionally biased
regions in sequence databases. Methods Enzymol., 266,
Zuker,M. (2000) Calculating nucleic acid secondary structure. Curr.