Generation of Pairwise Test Sets using a Genetic Algorithm
James D. McCaffrey
Microsoft MSDN
Redmond, WA 98052
USA
Abstract
Pairwise testing is a combinatorial technique used to reduce
the number of test case inputs to a system in situations where
exhaustive testing with all possible inputs is not possible or
prohibitively expensive. Given a set of input parameters where
each parameter can take on one of a discrete set of values, a
pairwise test set consists of a collection of vectors which
captures all possible combinations of pairs of parameter values.
The generation of minimal pairwise test sets has been shown to
be an NP-complete problem and there have been several
deterministic algorithms published. This paper presents the
results of an investigation of generating pairwise test sets
using a genetic algorithm. Compared with published results for
deterministic pairwise test set generation algorithms, the
genetic algorithm approach produced test sets which were
comparable or better in terms of test set size in 39 out of 40
cases. However, the genetic algorithm approach required longer
processing time than deterministic approaches in all cases. The
results demonstrate that the generation of pairwise test sets
using a genetic algorithm is possible, and suggest that the
approach may be practical and useful in certain testing
scenarios.
Keywords: Combinatorial mathematics, genetic
algorithms, pairwise testing, software quality, software
testing.
1. Introduction
Consider a system which has n input parameters where each
parameter can take on a single, discrete value. In many
situations exhaustive testing of all possible combinations of
input values is not feasible. For example, if there are n = 20
input parameters, where each parameter can be assigned one of 10
values, there are 10^20 different input sets. If tests can be
executed at a rate of 1,000 cases per second, a test run would
require 10^17 seconds, or roughly 3 billion years, to complete.
Even when the total number of test case combinations is small,
exhaustive testing may not be possible if each test case is
expensive. Pairwise testing is a combinatorial technique which
selects a subset of all possible test case input combinations
[10, 12]. A pairwise test set consists of a collection of test
vectors which captures all possible combinations of pairs of
input parameter values. In informal terms, for two parameters p0
and p1, and any valid values v0 for p0 and v1 for p1, there is a
test vector in which p0 has the value v0 and p1 has the value
v1. The concept is best illustrated by example. Suppose a system
has four parameters, p0, p1, p2, and p3. Further, suppose that
parameter p0 can accept one of two possible values, {a0, a1}.
And suppose the possible values for parameters p1, p2, and p3
are {b0, b1, b2, b3}, {c0, c1, c2}, and {d0, d1} respectively.
For this situation there are a total of 2 * 4 * 3 * 2 = 48
combinations of input values. For example, one arbitrary test
vector is {a0, b2, c1, d0}. Additionally, for this situation
there are a total of 44 pairs of input values:
{a0, b0}, {a0, b1}, {a0, b2}, {a0, b3},
{a0, c0}, {a0, c1}, {a0, c2}, {a0, d0},
{a0, d1}, {a1, b0}, {a1, b1}, {a1, b2},
{a1, b3}, {a1, c0}, {a1, c1}, {a1, c2},
{a1, d0}, {a1, d1}, {b0, c0}, {b0, c1},
{b0, c2}, {b0, d0}, {b0, d1}, {b1, c0},
{b1, c1}, {b1, c2}, {b1, d0}, {b1, d1},
{b2, c0}, {b2, c1}, {b2, c2}, {b2, d0},
{b2, d1}, {b3, c0}, {b3, c1}, {b3, c2},
{b3, d0}, {b3, d1}, {c0, d0}, {c0, d1},
{c1, d0}, {c1, d1}, {c2, d0}, {c2, d1}.
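
The count of 44 can be checked mechanically. The following Python sketch enumerates every pair of values drawn from two different parameters. The function name all_pairs and the list-of-lists layout are illustrative choices made here, not part of the GAPTS implementation.

```python
from itertools import combinations, product

def all_pairs(parameters):
    """Return every pair of values taken from two different parameters."""
    pairs = []
    for values_i, values_j in combinations(parameters, 2):
        pairs.extend(product(values_i, values_j))
    return pairs

example_parameters = [
    ["a0", "a1"],
    ["b0", "b1", "b2", "b3"],
    ["c0", "c1", "c2"],
    ["d0", "d1"],
]
example_pairs = all_pairs(example_parameters)
print(len(example_pairs))  # prints 44
```

The total is the sum of the products of the value counts over all parameter pairs: 2*4 + 2*3 + 2*2 + 4*3 + 4*2 + 3*2 = 44.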
A pairwise test set for this scenario consists of a
collection of test vectors which capture all input pairs.
For example, the following test set of 12 test vectors
captures all 44 possible pairs of input values:
0: a0 b0 c0 d0
1: a1 b0 c1 d1
2: a1 b1 c2 d0
3: a0 b2 c2 d1
4: a1 b3 c0 d1
5: a0 b1 c1 d0
6: a1 b2 c0 d0
7: a0 b3 c1 d0
8: a0 b0 c2 d0
9: a0 b1 c0 d1
10: a0 b2 c1 d0
11: a0 b3 c2 d0
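
As a check on the example, the following Python sketch confirms that the 12 test vectors listed above leave no required pair uncaptured. The helper names required_pairs and captured_pairs are hypothetical and are introduced only for this illustration.

```python
from itertools import combinations, product

parameters = [
    ["a0", "a1"],
    ["b0", "b1", "b2", "b3"],
    ["c0", "c1", "c2"],
    ["d0", "d1"],
]

test_set = """
a0 b0 c0 d0
a1 b0 c1 d1
a1 b1 c2 d0
a0 b2 c2 d1
a1 b3 c0 d1
a0 b1 c1 d0
a1 b2 c0 d0
a0 b3 c1 d0
a0 b0 c2 d0
a0 b1 c0 d1
a0 b2 c1 d0
a0 b3 c2 d0
"""

def required_pairs(parameters):
    """All pairs of values that a pairwise test set must capture."""
    required = set()
    for values_i, values_j in combinations(parameters, 2):
        required.update(product(values_i, values_j))
    return required

def captured_pairs(vectors):
    """All pairs of values that appear together in at least one vector."""
    captured = set()
    for vector in vectors:
        captured.update(combinations(vector, 2))
    return captured

vectors = [line.split() for line in test_set.split("\n") if line.strip()]
missing = required_pairs(parameters) - captured_pairs(vectors)
print(len(vectors), len(missing))  # prints: 12 0
```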
Because the intent of pairwise testing is to reduce the number
of test cases, smaller test set sizes are better than larger
test set sizes. The fundamental notion behind pairwise testing
is the premise that most software faults result either from
single-value inputs or from an interaction between pairs of
input values [4]. Generating minimal size pairwise test sets is
an NP-complete problem [9]. One approach to pairwise test set
generation is the use of orthogonal arrays [12]. Another
approach is the use of an iterative technique which employs a
greedy algorithm to construct a test set one vector at a time
until all possible pairs are captured [3]. A third approach is
to generate a test set for the first two parameters, and then
iteratively extend the test set to account for each remaining
parameter [4]. A comprehensive review of the research literature
on pairwise test set generation techniques yielded a single
paper which explored the use of a genetic algorithm [5]. That
paper presented the results of a feasibility study performed on
a single input set. However, the input set was small (four
parameters, each of which could take on one of three values) and
the resulting pairwise test set size was non-optimal (10 test
vectors rather than 9 vectors). Additionally, the study did not
compare the effectiveness of the approach with other techniques.
This paper extends that feasibility study and demonstrates the
use of a genetic algorithm to generate pairwise test sets. The
technique is referred to as GAPTS (Genetic Algorithm for
Pairwise Test Sets) generation. The GAPTS algorithm was executed
against seven benchmark input sets, and the GAPTS results were
compared with the results produced by five other pairwise test
set generation algorithms.
2. Genetic Algorithms
Genetic Algorithms (GAs) are a class of computational
procedures inspired by biological evolution [7]. GAs encode a
potential solution to a specific problem using a simple
chromosome-like data structure and then apply operators modeled
after genetic recombination and mutation to these structures in
a way that is designed to preserve essential information. GAs
maintain a population of individuals each of which consists of a
chromosome/solution and a fitness value which measures how well
the individual's chromosome solves the problem. Individuals with
high fitness values are selected to serve as the basis for
producing offspring solutions. Individuals with low fitness
values are removed from the population of solutions and replaced
by offspring solutions. Genetic algorithms are typically used to
solve maximization and minimization problems that are
combinatorially complex and which do not lend themselves to
standard algorithmic techniques. In pseudo code, one typical
form of a GA is:
set generation := 0
initialize population
while (generation < maxGenerations)
  evaluate population fitness values
  sort population based on fitness
  if (optimal solution exists)
    break
  select high-fitness individuals
  produce offspring
  stochastically mutate offspring
  replace low-fitness individuals
  set generation := generation + 1
end while
return best individual
Many variations of the basic algorithm structure are possible.
Genetic algorithms merely provide a basic framework for solving
a problem, and the implementation of a specific genetic
algorithm which solves a specific problem requires several
design decisions. Some of the major design decisions include the
following. First, a chromosome representation of a solution to
the target problem must be designed. Second, a fitness function
which measures how well a chromosome solves the target problem
must be constructed. Third, stochastic algorithms to implement
genetic crossover and mutation must be designed. Additional GA
design parameters include selection of the population size, a
method for determining which chromosome-solutions are selected
for reproduction, and a method for determining which
chromosome-solutions are selected for removal from the
population.
3. GAPTS Design
Pairwise test set generation is general in the sense that the
technique applies to any type of discrete input parameter
values. When using a genetic algorithm each parameter value
corresponds to a gene in GA terminology. Three common forms of
genetic algorithm gene representation are binary Gray code,
integer, and character encoding [6]. For example, suppose some
parameter p0 can take on a string value such as "red". A
possible Gray code gene encoding is a sequence of 24 bits where
each character is mapped to an 8-bit Gray code. Alternatively,
the value "red" can be mapped to an arbitrary identifying
integer value. This identifying integer can then be used
directly as a gene representation, or the corresponding Gray
code can be used. GAPTS experimentation with various gene
encoding techniques found no advantage to using a more complex
Gray code scheme compared to using a simpler identifying integer
approach. The use of a Gray code for gene representation is
attractive in certain numeric problem domains because mutation
of a single bit yields a gene value which is close to the
original value [6]. This characteristic of Gray codes is not
useful when generating pairwise test vectors if gene values are
categorical data encoded as consecutive integers and mutation
occurs at the gene value level rather than at the bit level.
Therefore the GAPTS technique initially maps all possible input
parameter values to consecutive integer values, and these
integer values are used as individual gene values. This approach
results in a chromosome representation with an array of integer
values. For example, suppose some system under test has three
input parameters, p0, p1, and p2. Parameter p0 can take on one
of two string values, "alpha" or "beta". Parameter p1 can take
on one of three numeric values, 72.5, 88.0, or 112.3. Parameter
p2 can take on one of two Boolean values, false or true. These
seven values are mapped to arbitrary integer IDs 0, 1, 2, 3, 4,
5, 6 and used as gene values for "alpha", "beta", 72.5, ...,
true. A chromosome represents a test set. Suppose the test set
size is set to 4 test vectors. Then the array
[0,3,6,1,2,5,1,4,5,0,2,6] is a chromosome modeling a test set of
size 4, where each test vector has size 3. Notice that this
chromosome representation introduces implicit test vector
boundary locations which can serve as target locations for
crossover operations.
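
The integer encoding can be illustrated with a short Python sketch. It assigns consecutive IDs to the seven example values and decodes the example chromosome back into four test vectors. The function names build_encoding and decode, and the id_to_value list, are assumptions made for this illustration and do not come from the GAPTS source.

```python
def build_encoding(parameters):
    """Assign consecutive integer IDs to every parameter value.

    Returns one list of IDs per parameter plus an ID-to-value lookup list.
    """
    legal_values, id_to_value = [], []
    for values in parameters:
        ids = []
        for value in values:
            ids.append(len(id_to_value))  # next unused ID
            id_to_value.append(value)
        legal_values.append(ids)
    return legal_values, id_to_value

def decode(chromosome, num_parameters, id_to_value):
    """Split a flat chromosome into test vectors of original values."""
    return [
        [id_to_value[gene] for gene in chromosome[start:start + num_parameters]]
        for start in range(0, len(chromosome), num_parameters)
    ]

parameters = [["alpha", "beta"], [72.5, 88.0, 112.3], [False, True]]
legal_values, id_to_value = build_encoding(parameters)
chromosome = [0, 3, 6, 1, 2, 5, 1, 4, 5, 0, 2, 6]
for vector in decode(chromosome, len(parameters), id_to_value):
    print(vector)
```

Every third position in the flat array starts a new test vector, which is the boundary structure that the crossover operation relies on.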
An individual in the GAPTS implementation is defined as a
chromosome and a fitness value. The fitness function is
straightforward and is defined simply as the total number of
distinct pairs captured by the individual's chromosome
representation of test vectors. Because the total number of
pairs of parameter values can be computed for any given set of
parameters and their associated possible values, the fitness
value can be used to identify situations where a given
individual captures all pairs. A GAPTS population is defined as
an array of individuals. The key population design decision was
the choice of population size. A larger GA population size tends
to maintain more solution diversity which in turn often reduces
GA tendency for population stagnation [6]. However, a larger
population size increases algorithm processing time and
increases the likelihood that low fitness individuals will be
selected as the basis for producing offspring which in turn can
increase the number of low fitness individuals in the
population. Based on the results of preliminary experimentation,
the GAPTS algorithm uses a population size of 20. This
population size is relatively small compared to genetic
algorithms applied to many problems. Somewhat surprisingly,
larger population sizes did not increase the effectiveness of
the GAPTS with respect to final pairwise test set size. An
investigation of the interaction effects of population size with
other GA parameters such as mutation rates on GAPTS is a
possible area for future research.
The GAPTS algorithm uses roulette wheel selection to determine
which individuals to choose as the basis for producing offspring
solutions. In roulette wheel selection, the probability that an
individual is selected is given by the ratio of the individual's
fitness value to the sum of all fitness values in the
population. Roulette wheel selection appears to be the most
common form of GA selection, and this study did not examine
alternative techniques such as tournament selection [1]. Early
versions of GAPTS deterministically selected the two least fit
individuals in a population as those to be replaced by two new
offspring solutions in each generation iteration. However, this
approach led to frequent premature convergence to non-optimal
solutions and superior results were obtained by selecting for
removal those individuals whose fitness rankings were the
complements of the fitness rankings for the two individuals
selected to produce offspring. For example, with a population
size of 20, suppose the selection algorithm picks individuals
with fitness ranks 1 (highest fitness) and 3 (third highest
fitness). The two individuals selected for removal would be
those with fitness ranks 20 (worst) and 18 (third worst). The
implication here is that with a relatively small population
size, even the least fit individuals contain valuable
information and should have some measure of protection from
elimination.
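
A minimal Python sketch of the two selection rules follows. It assumes fitness values are non-negative numbers and that ranks are 1-based with rank 1 the fittest. The names roulette_select and complement_rank are hypothetical; the original C# code is not shown in this paper.

```python
import random

def roulette_select(fitnesses, rng):
    """Return an index chosen with probability fitness / sum of fitnesses."""
    pick = rng.random() * sum(fitnesses)
    running = 0.0
    for index, value in enumerate(fitnesses):
        running += value
        if pick < running:
            return index
    return len(fitnesses) - 1  # only reached through floating-point round-off

def complement_rank(rank, population_size):
    """Map a 1-based fitness rank to its complement (1 -> N, 3 -> N - 2)."""
    return population_size + 1 - rank

parent_ranks = (1, 3)
removal_ranks = [complement_rank(rank, 20) for rank in parent_ranks]
print(removal_ranks)  # prints [20, 18]

rng = random.Random(0)
parent_index = roulette_select([44, 40, 38, 31], rng)  # index into a fitness list
```

Because removal depends on which parents were drawn, a low-ranked individual is only removed when a correspondingly high-ranked parent is selected, which gives the protection described above.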
The GAPTS crossover mechanism uses a single crossover point
when producing offspring solutions. This study did not
investigate the use of multiple crossover points. The GAPTS
algorithm was least effective for very large input sets which
generate very large chromosomes. Multiple crossover points have
the potential to be an effective way to improve GAPTS
performance on large input sets [11]. The use of multiple
crossover points is a very promising area for future
investigation. The selection of the single crossover point was
implemented in such a way that the location fell on a test
vector boundary. This approach ensured that the result of each
crossover operation yielded two valid chromosome-solutions. As a
result of the chromosome design described above, any crossover
point which does not fall on a test vector boundary would yield
two individuals with defective genes, or equivalently, invalid
test vectors.
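
The boundary-aligned crossover can be sketched in Python as follows. The sketch assumes both parents have the same length and contain at least two test vectors; the function name single_point_crossover and the second parent chromosome are made up for the example.

```python
import random

def single_point_crossover(parent_a, parent_b, num_parameters, rng):
    """Swap the tails of two chromosomes at a randomly chosen vector boundary."""
    num_vectors = len(parent_a) // num_parameters
    if num_vectors < 2:
        raise ValueError("need at least two test vectors to cross over")
    # Pick a boundary strictly inside the chromosome, in units of whole vectors.
    cut = rng.randint(1, num_vectors - 1) * num_parameters
    child_a = parent_a[:cut] + parent_b[cut:]
    child_b = parent_b[:cut] + parent_a[cut:]
    return child_a, child_b

rng = random.Random(0)
parent_a = [0, 3, 6, 1, 2, 5, 1, 4, 5, 0, 2, 6]
parent_b = [1, 2, 5, 0, 4, 6, 0, 3, 5, 1, 3, 6]
child_a, child_b = single_point_crossover(parent_a, parent_b, 3, rng)
print(child_a)
print(child_b)
```

Each child inherits whole test vectors from one parent or the other, so no test vector is split between parents.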
This study performed significant experimentation with mutation
rates after preliminary versions of GAPTS showed large
performance effects resulting from changing these mutation
probabilities. The version of GAPTS described in this study uses
a fixed mutation rate of 0.001 where each gene in an offspring
chromosome is independently mutated to a new legal gene value
with probability equal to the mutation rate. Mutation rates
larger than 0.001 tended to modify good solutions too often
which led to very slow convergence. Mutation rates smaller than
0.001 tended to lead to premature convergence to non-optimal
solutions. A promising area of future investigation is to
examine the effect of dynamically adjusting the GAPTS mutation
rate as a function of the number of input set parameter values
[8].
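
The per-gene mutation rule can be sketched in Python as below. The sketch reads "a new legal gene value" as a different value of the same parameter, which is an assumption; the function name mutate and the legal_values layout (one list of gene IDs per parameter) are likewise illustrative.

```python
import random

def mutate(chromosome, legal_values, mutation_rate, rng):
    """Independently replace each gene with another legal value of its parameter."""
    num_parameters = len(legal_values)
    mutated = list(chromosome)
    for position, gene in enumerate(mutated):
        if rng.random() < mutation_rate:
            # The gene's parameter is determined by its offset within the vector.
            choices = [v for v in legal_values[position % num_parameters] if v != gene]
            if choices:  # a one-valued parameter has nothing to mutate to
                mutated[position] = rng.choice(choices)
    return mutated

rng = random.Random(0)
legal_values = [[0, 1], [2, 3, 4], [5, 6]]
offspring = [0, 3, 6, 1, 2, 5, 1, 4, 5, 0, 2, 6]
print(mutate(offspring, legal_values, 0.001, rng))
```

At a rate of 0.001, a chromosome of a few hundred genes changes in only a fraction of generations, which is consistent with the observation that larger rates disturbed good solutions too often.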
Preliminary versions of GAPTS tended to be subject to
population stagnation, or equivalently, premature convergence to
a non-optimal solution. GAPTS uses two mechanisms to address the
population stagnation effect. First, GAPTS uses a form of
elitism, in this case where the individual with highest fitness
value in the population is immune from removal in each
generation. Second, GAPTS uses a form of immigration, where an
individual with a randomly generated chromosome is inserted into
the population every 1,000 generations. GAPTS replaces an
existing individual in the population with the immigrant, by
randomly selecting an individual whose fitness ranking is in the
bottom half of all rankings. As noted previously, it was
determined that with relatively small population sizes, even
low-ranking individuals tend to hold valuable information and
therefore a probabilistic approach for eliminating a
chromosome-solution upon the introduction of an immigrant is
preferable to a deterministic replacement strategy. Along with
multiple crossover and dynamic mutation rates, investigations of
different immigration strategies offer promising areas of
additional research.
To summarize, one of the weaknesses of genetic algorithms is
the large number of design parameters which must be determined.
The GAPTS technique uses integer encoding for genes, a
population size of 20 chromosome-solutions, standard GA roulette
wheel selection for parent chromosomes, a parent-rank complement
approach for determining chromosome-solution removals from the
population, a single crossover point strategy, a fixed mutation
rate of 0.001, best chromosome elitism, and random immigration
every 1,000 generations.
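
The design choices summarized above can be combined into one compact Python sketch. This is an illustration written for this text, not the C# program used in the study: the function name gapts_sketch, its parameters, and details such as allowing the same individual to be drawn twice as a parent are all assumptions. It requires a test set size of at least 2.

```python
import random
from itertools import combinations

def gapts_sketch(legal_values, test_set_size, max_generations, seed=0,
                 population_size=20, mutation_rate=0.001,
                 immigration_interval=1000):
    rng = random.Random(seed)
    n = len(legal_values)  # genes per test vector
    total_pairs = sum(len(a) * len(b) for a, b in combinations(legal_values, 2))

    def random_chromosome():
        return [rng.choice(ids) for _ in range(test_set_size) for ids in legal_values]

    def fitness(chromosome):
        captured = set()
        for start in range(0, len(chromosome), n):
            captured.update(combinations(chromosome[start:start + n], 2))
        return len(captured)

    def mutate(chromosome):
        return [rng.choice(legal_values[i % n]) if rng.random() < mutation_rate else gene
                for i, gene in enumerate(chromosome)]

    population = [random_chromosome() for _ in range(population_size)]
    for generation in range(1, max_generations + 1):
        population.sort(key=fitness, reverse=True)  # index 0 holds rank 1
        if fitness(population[0]) == total_pairs:
            break
        # Roulette wheel selection of two parents.
        weights = [fitness(c) for c in population]
        i, j = rng.choices(range(population_size), weights=weights, k=2)
        # Single crossover point on a test vector boundary.
        cut = rng.randint(1, test_set_size - 1) * n
        children = (mutate(population[i][:cut] + population[j][cut:]),
                    mutate(population[j][:cut] + population[i][cut:]))
        # Rank-complement removal; the elite at index 0 is never overwritten.
        for parent, child in zip((i, j), children):
            victim = population_size - 1 - parent
            if victim != 0:
                population[victim] = child
        # Immigration into the bottom half of the ranking.
        if generation % immigration_interval == 0:
            victim = rng.randrange(population_size // 2, population_size)
            population[victim] = random_chromosome()
    best = max(population, key=fitness)
    return best, fitness(best), total_pairs

best, score, total = gapts_sketch([[0, 1], [2, 3, 4], [5, 6]],
                                  test_set_size=8, max_generations=2000)
print(score, "of", total, "pairs captured")
```

The sketch recomputes fitness values on every sort for brevity; a practical implementation would cache them with each individual.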
4. GAPTS Implementation
The diagrams in Figure 1 illustrate the principal data
structures used by the GAPTS program. The two inputs to the
program are a text file which contains parameters and parameter
values, and the target test set size. The input file format
consists of a parameter name followed by a colon delimiter and a
comma-separated list of values that the parameter can take on.
The GAPTS implementation reads the input file and creates a
jagged array named legalValues where parameter values are
normalized to consecutive 0-based integer values. This approach
facilitates the generation of efficient lookup tables where the
table indexes correspond to parameter values. The legalValues
data structure is used to create two such lookup tables. The
pairsSearch table represents all pairs of parameter values which
must be captured by the final pairwise test set. This table is
used to validate pairs of values in a chromosome-solution when
the fitness of each chromosome is computed.
Figure 1. Principal GAPTS Data Structures.
Generation of the pairsSearch table also produces the total
number of distinct pairs which must be captured, 16 in the case
of the example data in Figure 1. The legalValues data structure
is also used to create a second lookup table named legalSearch.
The legalSearch table is used to validate the gene values in the
chromosome data structures used by the GAPTS algorithm when
chromosomes are initially generated and when new offspring
chromosomes are created by the crossover and mutation
operations. A chromosome data structure is realized as an array
of integer values. The length of a chromosome structure is equal
to the product of the number of parameters and the test set
size. The chromosome illustrated in Figure 1 represents the six
test vectors {0, 2, 5}, {1, 3, 6}, {0, 4, 5}, {0, 3, 6},
{1, 2, 5}, and {0, 2, 6}. The fitness of a particular chromosome
is calculated by scanning through the chromosome values and
counting the number of distinct pairs which are captured. For
example, in the chromosome in Figure 1, the first test vector
captures the three pairs (0, 2), (0, 5), and (2, 5), and each
pair can be validated by performing lookups into the pairsSearch
table. In total, the chromosome captures the 13 distinct pairs
(0, 2), (0, 3), (0, 4), (0, 5), (0, 6), (1, 2), (1, 3), (1, 5),
(1, 6), (2, 5), (2, 6), (3, 6), and (4, 5) so the fitness value
of the chromosome is 13.
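
The fitness calculation for the Figure 1 example can be reproduced with the following Python sketch. A Python set stands in for the pairsSearch lookup table, which is a simplification; the variable and function names are chosen for this illustration only.

```python
from itertools import combinations, product

legal_values = [[0, 1], [2, 3, 4], [5, 6]]  # jagged array of gene IDs

# Set-based stand-in for the pairsSearch lookup table.
pairs_search = set()
for ids_i, ids_j in combinations(legal_values, 2):
    pairs_search.update(product(ids_i, ids_j))

def fitness(chromosome, num_parameters, pairs_search):
    """Count the distinct required pairs captured by a flat chromosome."""
    captured = set()
    for start in range(0, len(chromosome), num_parameters):
        vector = chromosome[start:start + num_parameters]
        for pair in combinations(vector, 2):
            if pair in pairs_search:  # ignore anything that is not a required pair
                captured.add(pair)
    return len(captured)

chromosome = [0, 2, 5, 1, 3, 6, 0, 4, 5, 0, 3, 6, 1, 2, 5, 0, 2, 6]
print(len(pairs_search), fitness(chromosome, 3, pairs_search))  # prints: 16 13
```

The three required pairs that this chromosome misses are (1, 4), (3, 5), and (4, 6), so it does not yet represent a pairwise test set.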
5. GAPTS Results
The GAPTS algorithm described in the previous section was
implemented using the C# programming language. Using published
results as guidelines, for a given input set an initial test set
size was supplied to GAPTS. The GAPTS program was then run. If
the program successfully produced a test set which captured all
possible pairs of input values, then the test set size was
decremented and the program run again. This process was
continued until no additional improvement was discovered after
10^6 generations.
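
The size-reduction procedure described above can be sketched as a simple loop in Python. The callback generate is hypothetical: it stands for one GAPTS run at a fixed test set size and is assumed to return a covering test set, or None when the generation limit is reached without capturing all pairs. The toy_generate stub exists only to make the example runnable.

```python
def shrink_test_set(generate, initial_size):
    """Decrement the target size until a run fails; return the last success."""
    best = None
    size = initial_size
    while size >= 1:
        result = generate(size)
        if result is None:  # no covering test set found at this size
            break
        best = result
        size -= 1
    return best

def toy_generate(size):
    # Stand-in for a GAPTS run: pretends that 9 vectors is the smallest success.
    return ["vector"] * size if size >= 9 else None

smallest = shrink_test_set(toy_generate, 12)
print(len(smallest))  # prints 9
```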
Table 1. Input Test Sets.
Input   Parameters   Values   Pairs
S0      4            11       44
S1      4            12       54
S2      13           39       702
S3      61           169      14,026
S4      75           191      17,987
S5      100          200      19,800
S6      20           200      19,000
The primary focus of this study was to investigate pairwise
test set size. The data in Table 1 summarize the characteristics
of seven benchmark input sets used to evaluate the effectiveness
of GAPTS with respect to test set size. Input set S0 has np = 4
(four input parameters) where the first parameter can take on
one of two values, the second parameter one of four values, the
third parameter one of three values, and the fourth parameter
one of two values. Input set S0 has nv = 11 (11 total number of
parameter values) and requires tp = 44 (total number of pairs
which must be captured). Input sets S1, S2, S5, and S6 have
parameter values which are evenly distributed: 4 3-valued
parameters, 13 3-valued parameters, 100 2-valued parameters, and
20 10-valued parameters respectively. Input set S3 has 15
4-valued parameters, 17 3-valued parameters, and 29 2-valued
parameters. Input set S4 has one 4-valued parameter, 39 3-valued
parameters, and 35 2-valued parameters.
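
The quantities np, nv, and tp follow directly from the number of values per parameter, as the Python sketch below shows for the seven input sets as described in the text. The function name characterize and the dictionary layout are assumptions made for the example.

```python
from itertools import combinations

def characterize(value_counts):
    """Return (np, nv, tp) for a list holding the number of values per parameter."""
    num_parameters = len(value_counts)
    num_values = sum(value_counts)
    # Each pair of parameters contributes the product of their value counts.
    total_pairs = sum(a * b for a, b in combinations(value_counts, 2))
    return num_parameters, num_values, total_pairs

input_sets = {
    "S0": [2, 4, 3, 2],
    "S1": [3] * 4,
    "S2": [3] * 13,
    "S3": [4] * 15 + [3] * 17 + [2] * 29,
    "S4": [4] + [3] * 39 + [2] * 35,
    "S5": [2] * 100,
    "S6": [10] * 20,
}
for name, counts in input_sets.items():
    print(name, *characterize(counts))
```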
Table 2. Pairwise Test Set Size Results Comparison.
           S0    S1    S2    S3    S4    S5    S6
PICT       12    13    20    38    31    16    216
QICT       12    11    22    42    34    16    219
Allpairs   12    10    22    41    30    16    664
Pairtest   n/a   9     19    36    29    15    218
AETG       n/a   9     15    41    28    10    194
GAPTS      12    9     15    35    27    10    196
The data in Table 2 present the results of the GAPTS algorithm
along with the results of five other algorithms. The first three
pairwise test set generation algorithms listed in Table 2, PICT,
QICT, and Allpairs, use different variations of an iterative
algorithm which constructs a test set one vector at a time. PICT
(Pairwise Independent Combinatorial Testing) is a command-line
utility, implemented in C++ [9]. QICT is a command-line utility,
implemented in the C# language [10]. Allpairs is a command-line
utility, implemented in the Perl language [2]. The PICT and
Allpairs tools are available on the Internet as downloadable
executables. The QICT tool was developed as part of an empirical
study of the effectiveness of pairwise testing with respect to
program fault detection. The test set size data in Table 2 for
those three tools are the results of executing them. Pairtest is
a program implemented in Java which uses an algorithm that
generates a test set by accounting for each parameter in turn
[9]. AETG (Automatic Efficient Test Generator) is a fee-based
commercial subscription service which uses an unpublished,
proprietary algorithm [4]. The Pairtest and AETG tools are not
readily available and the test set size data for those two tools
in Table 2 come from a previously published study [9]. That
study did not use input set S0 so there are no results available
for the Pairtest and AETG tools on input set S0. Additionally,
that study incorrectly reported that the AETG tool generated a
pairwise test set of size 180 test vectors for input set S6.
That reported value (180) was significantly (17%) smaller than
the next best available reported value (216) so the AETG service
was contacted. It was determined that the correct AETG-generated
test set size for input set S6 was 194 rather than 180.
In 39 out of 40 instances, the GAPTS technique produced
pairwise test sets with sizes that are comparable to or better
than the other five algorithms examined. The single instance
where GAPTS did not meet published results was with test set S6,
where the GAPTS test set size was 196 compared to 194 for the
AETG service. However, GAPTS still outperformed the other four,
non-fee-based algorithms. The unusually high results for the
Allpairs tool on test set S6 suggest that the program may have
an overflow error of some sort. Whether the smaller pairwise
test set sizes produced by GAPTS are significant depends
entirely upon the testing scenario under consideration. If, for
example, each test case is very expensive to perform (perhaps
because the test run is destructive in some way or because each
test case requires a very long time to execute), then a
reduction in test set size of even one or two test vectors may
be significant. On the other hand, in most normal software
application testing scenarios, the reduction in the number of
test cases produced by a genetic algorithm generation technique
is likely not important enough to warrant the extra effort
required to implement GAPTS.
The time required to generate test sets was not a subject of
this study. However, the GAPTS program required more time to
generate pairwise test sets than the PICT, QICT, and Allpairs
programs (comparable times were not available for Pairtest and
AETG). For example, on test set S3, PICT required less than 1
second of run time, and QICT and Allpairs required approximately
1.5 seconds of run time. GAPTS required approximately 5 seconds.
Note that it is very difficult to make meaningful run time
comparisons because of the different ways that the algorithms
produce output to file or command shell. As the input size
increased, GAPTS took significantly longer to produce results.
For test set S6, GAPTS required over 45 minutes of run time.
However, because test case generation is rarely if ever
performed in real time, the time required by different
algorithms to produce pairwise test sets is likely not a
significant factor in most software testing scenarios.
6. References
[1] G.B. Alvareng and G.R. Mateues, "Hierarchical Tournament
Selection Genetic Algorithm for the Vehicle Routing Problem with
Time Windows", Fourth International Conference on Hybrid
Intelligent Systems, December 2004, pp. 410-415.
[2] J. Bach and P. Shroeder, "Pairwise Testing: A Best Practice
that Isn’t", Proceedings of the 22nd Pacific Northwest Software
Quality Conference, October 2004, pp. 180-196.
[3] D.M. Cohen, S.R. Dalal, M.L. Fredman, and G.C. Patton, "The
Combinatorial Design Approach to Automatic Test Generation",
IEEE Software, September 1996, pp. 83-87.
[4] D.M. Cohen, S.R. Dalal, M.L. Fredman, and G.C. Patton, "The
AETG System: An Approach to Testing Based on Combinatorial
Design", IEEE Transactions on Software Engineering, vol. 23,
no. 7, 1997, pp. 437-444.
[5] S.A. Ghazi and M.A. Ahmed, "Pair-wise Test Coverage Using
Genetic Algorithms", Proceedings of the 2003 Congress on
Evolutionary Computation, vol. 2, December 2003, pp. 1420-1424.
[6] D.E. Goldberg, Genetic Algorithms in Search, Optimization
and Machine Learning, Addison-Wesley, Reading, MA, 1989.
[7] J.H. Holland, Adaptation in Natural and Artificial Systems,
The University of Michigan Press, Ann Arbor, MI, 1975.
[8] Tzung-Pei Hong and Hong-Shung Wang, "A Dynamic Mutation
Genetic Algorithm", IEEE International Conference on Systems,
Man, and Cybernetics, October 1996, pp. 2000-2005.
[9] Yu Lei and K.C. Tai, "In-Parameter-Order: A Test Generation
Strategy for Pairwise Testing", Proceedings of Third IEEE
International High-Assurance Systems Engineering Symposium,
Nov. 1998, pp. 254-261.
[10] J.D. McCaffrey, "Pairwise Testing with QICT", Microsoft
Developer Network Magazine, September 2009, vol. 24, no. 9,
(in press).
[11] Bian Runqiang, Chen Zengqiang, and Yuan Zhuzhi, "Improved
Crossover Strategy of Genetic Algorithms and Analysis of its
Performance", Proceedings of the 3rd World Congress on
Intelligent Control and Automation, 2000, pp. 516-520.
[12] A.W. Williams and R.L. Probert, "A Practical Strategy for
Testing Pair-Wise Coverage of Network Interfaces", Proceedings
of the IEEE Symposium on Software Reliability Engineering, 1996,
pp. 246-254.
[13] J. Yan and J. Zhang, "Backtracking Algorithms and Search
Heuristics to Generate Test Suites for Combinatorial Testing",
Proceedings of the 30th Annual International Computer Software
and Applications Conference, 2006, pp. 385-394.