Using Chaos in Genetic Algorithms
John Determan
Idaho National Engineering and Environmental Laboratory
P.O. Box 1625
Idaho Falls, ID 83415-2107
jcd@inel.gov

James A. Foster
University of Idaho
Department of Computer Science
Moscow, ID 83844-1010
foster@cs.uidaho.edu



Abstract
We have performed several experiments to study the
possible use of chaos in simulated evolution. Chaos is
often associated with dynamic situations in which there
is feedback; hence there is speculation in the literature
that chaos is a factor in natural evolution. We chose the
iterated prisoner’s dilemma problem as a test case, since
it is a dynamic environment with feedback. To further
illustrate the benefits of employing chaos in genetic
algorithms, data derived from a genetic data clustering
algorithm under development at the Idaho National
Engineering and Environmental Laboratory is also
presented. To perform an initial assessment of the use
of chaos we used the logistic function, a simple equation
involving chaos, as the basis of a special mutation
operator, which we call λ mutation. The behavior of the
logistic function is well known and comprises three
characteristic ranges of operation: convergent,
bifurcating, and chaotic. We hypothesize that the
chaotic regime will aid exploration, and the convergent
range will aid exploitation. The bifurcating range is
likely neutral, and hence an insignificant factor. Our
results confirm these expectations.
1 Introduction
Chaos underlies many natural phenomena, from turbulent
fluid flow to global weather patterns (Gleick), from healthy
heart rhythms (Goldberger, Rigney, and West) to DNA
coding sequences (Ohno). Complex dynamic systems with
intricate feedback paths may often be understood in terms
of chaos theory. Therefore it is natural to speculate that
evolutionary processes may also be understood in terms of
chaos. Beyond Natural Selection (Wesson) thoroughly
motivates and discusses this speculation. To explore this
idea, we incorporated a simple chaotic equation, the logistic
function, into a simple genetic algorithm.
1.1 The Iterated Prisoner’s Dilemma Problem
A problem involving coevolution is a natural candidate for
this initial test. Coevolution introduces a dynamic
environment and a natural feedback path. The iterated
prisoner’s dilemma (IPD) problem with each individual
scored against the entire population is such a problem, and
will be our target application. Axelrod was the first to
study the IPD with genetic algorithms
(Axelrod). To understand the IPD, suppose that two
prisoners are being held as suspects in a crime. The
prisoners attempt independently to get the most
advantageous deal from the police by either supporting or
turning on their partner, that is by cooperating with their
partner or defecting. However, the consequences of their
actions are intertwined, such that although the decision is
binary, it is not simple. A payoff matrix, Table 1,
represents the severity of a criminal’s sentence as a function
of both his and his partner’s actions. A high score
represents a light sentence and therefore a successful pair of
decisions for the criminal receiving the score. In the
iterated version, the score is summed over multiple plays,
which makes the study of general strategies interesting,
since the only rational move in a single game is defection.

Player 1 move   Player 2 move   Player 1 score   Player 2 score
Cooperate       Cooperate       3                3
Cooperate       Defect          0                5
Defect          Cooperate       5                0
Defect          Defect          1                1
Table 1. IPD payoff matrix.

As in Axelrod’s initial experiments, we represent an IPD
strategy as a response to a history of three previous pairs of
moves, together with six initial moves. Since there are two
possible moves for each player, this means our strategies
will be represented as binary strings with 2^(2*3) + 2*3 = 70
bits. Also as in Axelrod's work, we evaluate an
individual’s fitness by playing its strategy against each
individual in the population (including itself) for 10
iterations and computing the average cumulative score per
opponent.
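As a concrete illustration, the following C++ sketch implements this evaluation; the exact bit layout of the 70-bit strategy (64 response bits indexed by a six-bit history of the last three move pairs, plus six bits seeding a hypothetical pre-game history, in the spirit of Axelrod's encoding) is assumed for illustration and may differ from the implementation actually used.

#include <bitset>
#include <vector>

// Sketch of the IPD fitness evaluation described above. The bit layout of the
// 70-bit strategy (64 response bits indexed by a 6-bit history of the last
// three move pairs, plus 6 bits encoding an assumed pre-game history) is an
// assumption about the encoding, not code from the original study.
constexpr int STRATEGY_BITS = 70;
constexpr int ROUNDS = 10;

// Payoff matrix from Table 1, indexed as PAYOFF[myMove][oppMove];
// 0 = cooperate, 1 = defect.
constexpr int PAYOFF[2][2] = {{3, 0}, {5, 1}};

using Strategy = std::bitset<STRATEGY_BITS>;

// Play ROUNDS iterations between strategies a and b; return a's total score.
int playMatch(const Strategy& a, const Strategy& b) {
    // Bits 64..69 seed each player's 6-bit view of the (hypothetical) history.
    int histA = 0, histB = 0;
    for (int i = 0; i < 6; ++i) {
        histA = (histA << 1) | a[64 + i];
        histB = (histB << 1) | b[64 + i];
    }
    int score = 0;
    for (int round = 0; round < ROUNDS; ++round) {
        int moveA = a[histA];   // response bit selected by a's view of the history
        int moveB = b[histB];   // response bit selected by b's view of the history
        score += PAYOFF[moveA][moveB];
        // Shift the newest move pair into each player's 6-bit history window.
        histA = ((histA << 2) | (moveA << 1) | moveB) & 0x3F;
        histB = ((histB << 2) | (moveB << 1) | moveA) & 0x3F;
    }
    return score;
}

// Fitness: average cumulative score per opponent, playing every member of the
// population (including the individual itself).
double fitness(const Strategy& s, const std::vector<Strategy>& population) {
    double total = 0.0;
    for (const Strategy& opponent : population) total += playMatch(s, opponent);
    return total / population.size();
}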
1.2 Genetic Data Clustering Algorithm
Work is currently being performed to apply expert system
technology to the review of nondestructive assay (NDA) of
transuranic waste at the Idaho National Engineering and
Environmental Laboratory (INEEL). One area of research
is the automatic generation of expert system rules from
training data (Determan, Becker). Data clustering is
employed to recognize patterns in the training data from
which fuzzy expert system rules can be formed. The system
reported in Determan and Becker employs a fuzzy
clustering method known as mountain clustering (Yager and
Filev). Attempts to improve the speed of the rule
generation process and the accuracy of the generated rules
have led to the development of a genetic data clustering
algorithm. We present data that illustrates the beneficial
effects of chaos in this application.
1.3 Chaos
A common and simple chaotic function, the logistic
equation (Moon), is:

(1)  x_{n+1} = λ x_n (1 - x_n)

The properties of the logistic function are well known, but
we briefly discuss them here. For values of λ between 0
and 3, (1) will converge to some value x. For λ between 3
and about 3.56 the solution to (1) bifurcates into 2, then 4,
then 8 (and so on) periodic solutions. For λ between 3.56
and 4 the solutions to (1) become fully chaotic: neither
convergent nor periodic, but variable with no discernible
pattern. As λ approaches 4, the variation in solutions to (1)
appears increasingly random. Figure 1 illustrates this
behavior. We refer to these three regimes as convergent,
bifurcating, and chaotic, respectively.
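A few lines of C++ reproduce the behavior summarized in Figure 1; the starting value, iteration count, and representative λ values follow the figure, while the output format is arbitrary.

#include <cstdio>

// Iterate the logistic map x_{n+1} = lambda * x_n * (1 - x_n) from x_0 = 0.25
// and print early iterates for one representative lambda from each regime.
int main() {
    const double lambdas[] = {2.0, 3.3, 3.9};   // convergent, bifurcating, chaotic
    for (double lambda : lambdas) {
        double x = 0.25;
        std::printf("lambda = %.1f:", lambda);
        for (int n = 1; n <= 12; ++n) {
            x = lambda * x * (1.0 - x);
            std::printf(" %.3f", x);
        }
        std::printf("\n");
    }
    return 0;
}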
1.4 The Logistic Function as a Genetic Operator
Our individual chromosomes maintain an IPD strategy
(70 bits), a value for λ (10 bits), and a uniform crossover
mask (70 bits). We use uniform crossover with a mask: one
child from crossover of parents A and B has strategy bits
selected from B when the mask bits from A are ones and
from A when they are zeros, and it inherits its other bits
(including the mask) from A.
Since λ must be a real between zero and four, we scale
the ten bit binary value in the chromosome appropriately.
Though arbitrary, a ten-bit precision seemed adequate.
We modify the mask itself with the logistic equation,
using the λ value stored in the chromosome and treating the
mask itself as the input. That is, we interpret the 70-bit
mask as a real value, scaled into the range (0,1) to get x_n,
and set the new mask, x_{n+1}, to the 70-bit representation of
λ x_n (1 - x_n). This operation, which we call λ mutation,
incorporates a type of self-directed chaos into the
evolutionary search for IPD strategies via crossover.
We also subjected both λ and the strategy bits to
ordinary, bit-flipping mutation. See Table 2 for a summary
of our representation and use of mutation operators.

Contents:             IPD Strategy        λ          Mask = x_n
Size in bits:         70                  10         70
Range of phenotype:   n/a                 (0,4)      (0,1)
Crossover type:       Uniform with mask   none       none
Mutation type:        bit flip            bit flip   λ mutation
Table 2. For each gene of the chromosome, we describe the
contents and size of the gene, the range of alleles, and the
genetic operations used on that gene.

An incidental requirement is the ability to perform
floating-point math with a minimum of 70-bit precision
(plus a few bits for rounding). The Windows-specific long
double type is limited to 64-bit precision, so we implemented a
high-precision floating-point class that supports user-specified
precision. We tested this class by iterating the
logistic equation on 75-bit precision numbers and verifying
that the code displayed the correct behavior and values. We
then used a precision of 75 bits for λ mutation.
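The structure of λ mutation can be sketched as follows; long double stands in for the high-precision class purely for illustration, so at this precision the low-order mask bits would not be propagated faithfully.

#include <bitset>
#include <iostream>
#include <string>

// Sketch of lambda mutation on a 70-bit crossover mask. The paper uses a
// custom floating-point class with at least 75 bits of precision; long double
// is substituted here only to show the structure of the operation.
constexpr int MASK_BITS = 70;

// Interpret the mask bits as a binary fraction x_n in (0,1); bit 69 is taken
// as the most significant bit (a convention chosen for this sketch).
long double maskToReal(const std::bitset<MASK_BITS>& mask) {
    long double x = 0.0L, weight = 0.5L;
    for (int i = MASK_BITS - 1; i >= 0; --i) {
        if (mask[i]) x += weight;
        weight *= 0.5L;
    }
    return x;
}

// Convert x_{n+1} back to a 70-bit mask via its binary-fraction expansion.
std::bitset<MASK_BITS> realToMask(long double x) {
    std::bitset<MASK_BITS> mask;
    for (int i = MASK_BITS - 1; i >= 0; --i) {
        x *= 2.0L;
        if (x >= 1.0L) { mask[i] = true; x -= 1.0L; }
    }
    return mask;
}

// Lambda mutation: one application of the logistic map to the whole mask.
std::bitset<MASK_BITS> lambdaMutate(const std::bitset<MASK_BITS>& mask, long double lambda) {
    long double xn = maskToReal(mask);
    return realToMask(lambda * xn * (1.0L - xn));
}

int main() {
    std::bitset<MASK_BITS> mask(std::string("10") + std::string(68, '1'));
    std::cout << lambdaMutate(mask, 3.9L) << '\n';   // a chaotic-regime lambda
    return 0;
}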
1.5 Expected Effect of λ Mutation
Consider, for a moment, the behavior of the logistic
function. For values of λ below 3, iteratively applying the
logistic function to some value x_0 results in convergence to
some value x. Speed of convergence decreases, and the
value of x increases, as λ approaches 3, and for λ > 2 the
convergence is oscillatory. This will produce convergent
mutation and will tend to produce crossover masks that
preserve the higher order bits of the mask, but vary the
lower order bits. Near convergence, the mask will become
fixed. Thus an individual with convergent λ will tend to
produce offspring with progressively more rigid crossover
masks. Individuals with nonconvergent λ will tend to have
a high degree of variability in the crossover masks of their
descendants, with the variability increasing as λ approaches
the maximum value of 4.

Figure 1. Solutions to the logistic equation for various λ
(x_0 = 0.25): λ = 2 (convergent), λ = 3.3 (bifurcating), and
λ = 3.9 (chaotic).
Note that the probability of λ mutation has a large effect
on this qualitative behavior. For generations where
λ mutation does not occur, the crossover mask will be
inherited unchanged with high probability from the mother
or father, thus good masks will tend to propagate unchanged
most of the time, but with occasional modification from
λ mutation. So, low values of λ should favor exploitation
of good masks, and large values of λ should favor
exploration of the space of possible masks (and therefore
strategies). Thus, one should observe greater exploration
with λ mutation than without. Similarly, one should
observe increased exploration when λ is in the chaotic
domain rather than the convergent one. Also, exploration
should increase with the probability of λ mutation. We can
quantify exploration as the total number of strategies
explored to test these hypotheses.
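One simple way to implement this bookkeeping is sketched below; the data structure and interface are illustrative choices, since the text does not describe how the count was collected.

#include <bitset>
#include <cstddef>
#include <set>
#include <string>

// Sketch of the exploration measure: count the distinct strategies evaluated
// during a run (the "total number of strategies explored"). The container and
// interface are illustrative; the paper does not describe its bookkeeping.
class ExplorationCounter {
public:
    void record(const std::bitset<70>& strategy) { seen_.insert(strategy.to_string()); }
    std::size_t strategiesExplored() const { return seen_.size(); }
private:
    std::set<std::string> seen_;
};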
2 Results for the Iterated Prisoner’s Dilemma
Our control case was the basic IPD with masked uniform
crossover and bit flipping mutation. Each chromosome
contained a crossover mask, which was subject to only bit
flipping mutation, and not λ mutation. The probability of
mutation was 0.001. We also performed a sensitivity
experiment with the probability of mutation at 0.01.
The probability of crossover in all tests was 0.7. Our
GA was the Simple GA of GALib (Wall) with elitism. To
determine which scaling and selection algorithms gave the
best performance on the control problem, we compared
several combinations of scaling and selection algorithms,
namely linear and sigma-truncation (ST) scaling combined
with tournament and stochastic uniform (SUS) selection.
We used a population of 20 individuals, and all calculations
ran for 50 generations in these tests. We repeated each run
10 times. Table 3 shows the average IPD scores and their
standard deviations for these series of calculations. The
data show that ST scaling outperforms linear scaling, and
that SUS selection outperforms tournament selection on the
IPD, though only slightly. Based on these tests, we used ST
scaling and SUS selection in all subsequent experiments.
There is a basic difference between bit flipping mutation
and λ mutation. The occurrence of bit flipping mutation is
decided for each bit of each chromosome, whereas the
occurrence of λ mutation is decided once per chromosome.
Each chromosome is 70 strategy bits, 70 mask bits, and
10 bits for λ, for a total of 150 bits per chromosome. With
1,000 individuals, for example, this produces 150,000
opportunities for mutation. With the typical mutation
probability of 0.001, then, one expects approximately
150 bits to be flipped each generation. Although a single
λ mutation has more effect on a chromosome than a single
bit flipping mutation, a probability of λ mutation of 0.001 is
too low to be a significant test of λ mutation. Therefore, we
performed tests with the probability of λ mutation set to
0.01, 0.05 and 0.1 (but the probability of bit flipping
mutation was 0.001).

Scaling        Selection            Avg score   Std dev.
Linear         Tournament           28.4        2.65
Linear         Stochastic uniform   32.0        4.47
σ-truncation   Tournament           30.2        3.57
σ-truncation   Stochastic uniform   33.8        4.21
Table 3. Average and standard deviation of ten IPD
calculations for various combinations of scaling and
selection operators.

To ensure that the comparison remained fair, we also ran
the control with an increased mutation probability (0.01).
As shown in Table 4, the average IPD score (over 10 trials)
decreased when the mutation probability increased from
0.001 to 0.01. Thus, it is fair to compare calculations with
higher λ mutation probabilities to the control case.

Calculation                            Avg. score   Std. dev.
Control case, mutation prob. 0.001     33.8         4.21
Control case, mutation prob. 0.01      31.4         4.90
Table 4. Average and standard deviation of ten IPD
calculations for the control case and the increased
mutation probability.

We repeated all tests 10 times. Summary results of these
tests are in Table 5. The first two cases are the control case
and the mutation rate sensitivity calculations. As mentioned
before, the average IPD score decreased with increasing
mutation rate. The column labeled “Average Exploration
Rate” is the number of strategies generated divided by the
number of individuals in the population, averaged over 10
calculations. For the control case, increasing the mutation
probability increased the exploration rate by more than a
factor of three, yet the average IPD score decreased. This
sensitivity test clearly indicates that merely increasing
exploration does not necessarily improve system performance;
there is such a thing as too much exploration.
The next two cases in Table 5 (IDs 3 and 4) employ
λ mutation with random λ values in the initial populations.
However, because the convergent λ values cover three
quarters of the range of possible λ values, convergent λ
values are likely to dominate in a random initial population.
We include these results because the greatest IPD score
obtained in any of the tests occurred during these tests.
However, because of the random, but biased, initialization,
the only conclusion that we can draw from these tests is that
some mixture of λ values in the population appears to be
beneficial. In addition, increasing the rate of λ mutation
slightly improved the average IPD score. Without a better
handle on the initial λ distribution it is not possible to
speculate on what meaning, if any, this increase has. Note,
however, that the average IPD scores for these test series fall
between the average IPD scores of the control case and its
sensitivity (IDs 1 and 2), even though the maximum score is
higher for the λ mutation cases than for the control cases.
Also note that increasing the λ mutation rate produced a
moderate increase in the exploration rate, which may be
related to the increased average IPD score.
To get a better handle on the effect of λ mutation, we
forced several known λ value distributions on the initial
population: all convergent values (IDs 5 and 6),
an even distribution of λ values from all three regimes (IDs
7 and 8), and all chaotic values (IDs 9 and 10). All three
series of tests were performed with the λ mutation
probability at both 0.01 and 0.05. For all of these cases, an
increase in the λ mutation probability is accompanied by an
increase in the exploration rate, though a modest increase by
comparison to the control cases. However, the average IPD
scores for the convergent and chaotic test series decreased
with increasing λ mutation rate, while the average IPD
score for the even distribution case increased with the
λ mutation rate. We observed this same trend with the
random initialization cases. Also note that the highest
maximum IPD score for these six tests occurred for one of
the even distribution cases. Apparently, a mixture of initial
λ values in the population improves the effectiveness of
λ mutation, so that increasing the λ mutation rate increases
the average performance on the IPD. There is a weak
suggestion that λ mutation with a mixed initial population is
beneficial for obtaining high scores on the IPD.
As a more extensive comparison to the control case, we
performed an additional series of 50 tests with reduced data
collection for the control case and the even λ distribution
case. We used the mutation rates for which each case
performed the best. The results are in Table 6. Both series
obtained the same maximum IPD score of 48, equal to the
highest in any of our runs. The even λ distribution case had
the higher average IPD score. This likely indicates that
average performance on the IPD is related to the ability of
the algorithm, whereas the maximum performance is
significantly influenced by chance.

ID   Description                           Avg Score   Std dev.   Max Score
1    No λ mutation, mutation rate 0.001    31.6        4.90       48
2    Even λ mutation, mutation rate 0.05   32.9        5.33       48
Table 6. Comparison of the control case and the even
λ mutation case: average and maximum values over 50
tests.

The win rates of different categories of λ values in the
even distribution case afford yet another perspective on the
effect of chaos in GAs. Even if a mixture of λ values in the
initial population is beneficial, one category or another of λ
values might tend to produce the highest scoring individual
more frequently than the others do. Table 7 presents the
fraction of maximum scoring individuals per λ category in
the final generation for the series of 50 even λ distribution
tests. It might have been expected that whatever value
chaotic individuals have during a calculation, they would
tend to die off by the final generation. Yet Table 7 indicates
that in 30% of 50 tests, the maximum scoring individual
possessed a chaotic value of λ. If the IPD score of an
individual were independent of the λ category it belonged
to, then an even one-third split among the categories would
be expected. Table 7 indicates that it is definitely harmful
for an individual to possess a bifurcating λ value, neutral to
possess a chaotic value, and beneficial to possess a
convergent value. Thus, while chaotic values may not
always be the winners, they will remain in the population,
and continue to provide diversity and promote exploration.
In summary, the results indicate that the introduction of
chaos into GAs can be beneficial, but that too much chaos,
like too much mutation, is detrimental. Chaotic operations,
like ordinary bit flipping mutation, appear to be good in
small quantities. Further, λ mutation with a mixture of λ
categories improved both the exploration rate and average
IPD scores.

Description                         Chaotic   Bifurcating   Convergent
Even λ mutation, probability 0.05   30%       8%            62%
Table 7. Percentage of maximum scoring individuals
from each λ category over fifty tests.
ID   Type of λ mutation   Prob. of bit mutation   Prob. of λ mutation   Avg IPD score   Std deviation   Avg exploration rate   Max IPD score
1    None                 0.001                   0                     33.8            4.21            0.231                  44
2    None                 0.010                   0                     31.4            4.90            0.845                  43
3    Full                 0.001                   0.01                  32.3            5.46            0.210                  48
4    Full                 0.001                   0.10                  33.5            5.22            0.314                  48
5    Convergent           0.001                   0.01                  34.7            5.40            0.223                  44
6    Convergent           0.001                   0.05                  29.8            4.49            0.282                  42
7    Even                 0.001                   0.01                  31.2            3.66            0.212                  40
8    Even                 0.001                   0.05                  32.1            5.59            0.270                  47
9    Chaotic              0.001                   0.01                  31.5            2.38            0.223                  35
10   Chaotic              0.001                   0.05                  30.9            4.46            0.251                  40
Table 5. Comparison of average, best-of-ten IPD scores, and
exploration rate for different types and rates of mutation.
Full λ mutation randomly selects λ from all domains,
convergent and chaotic λ mutation select from the convergent
and chaotic domains, respectively, and even λ mutation selects
with equal probability from all three domains.

3 Chaos in Genetic Data Clustering
Subtractive clustering (Chiu), based on the mountain
clustering method (Yager and Filev), can be employed to
generate fuzzy logic expert system rules from training data.
The clustering process locates potential clusters and
estimates their means and variances in each dimension of
the training data. The means and variances are further
refined using backpropagation against the training data
(Yager and Filev, Chiu), and each cluster is used to form a
fuzzy rule. The estimated means and variances are used to
form Gaussian membership functions. These membership
functions represent the conditions to be matched by a data
point to be classified. An estimate of the mean of the
correct classification for each cluster is also computed, and
becomes the rule output. For this process to work, various
parameters need to be defined, the most important of which
are the cluster radius and the backpropagation learning rate
and convergence criteria. The choice of cluster radius
affects the number of clusters located in the data, and
therefore the number of rules, while the other parameters
affect the estimated means and variances (and therefore the
accuracy of the rules).
to determine appropriate values for these parameters
(Determan and Becker).
Work to improve the speed of the training process and
the accuracy of the generated rules is underway. First, the
process described above is inverted: a genetic clustering
algorithm locates data points in the training data that belong
together; afterwards, means and variances are calculated
over these clusters. Fuzzy rules are constructed from the
cluster means and variances, as before. This procedure
bypasses the time-consuming process of backpropagation.
The goal of current research is to develop genetic operators
that perform well on the data clustering task. This section
will begin by discussing the training data used in this
process, and continue by examining the fundamental genetic
operators and representations of this approach. Finally,
some results will be presented that illustrate the benefits of
using chaos in this application.
3.1 Training Data
To test the performance of the rule generation system, a
set of waste container assays with known validity
confidence ratings was generated. To do this, a panel of
three waste NDA experts was assembled and given a set of
waste assays to assign validity confidence ratings, on a scale
from 0 to 10. They rated their confidence in each of the
three measurement modes of the system (one active, and
two passive neutron measurement modes), for each assay.
The set contained 99 assays evenly selected from the waste
types graphite, combustibles, and glass. Disagreement
existed between the experts as to the proper classification of
each assay, due partly to differing scales of judgment used
by each expert, and partly to the uncertainty inherent in the
interpretation of NDA results. A normalization procedure
was used to reduce the scaling bias of each expert. Assays
with normalized scores that still did not agree between the
experts were removed from the test set. The final data set
contained 67 assays. Three figures of merit derived from
the assay measurements were determined for each
measurement mode of the assay system and used as training
input. Thus, there were 3 data sets, each consisting of 3
input features and one classification. Results presented in
this section are derived from the active mode data.
3.2 Genetic Representation and Operators
The genomes are lists of integers, as shown in Figure 2.
The first portion of the chromosome represents the data
points, specified by integer identifiers, and grouped to
represent clusters of data points. Following are the number
of clusters, and the number of data points in each cluster.
Finally, two integers are used to represent the probability of
mutation and λ, as discussed below.

Figure 2. The genome representation for the data clustering
problem. Example genome: 0 3 8 12 | 2 5 7 1 6 | 4 11 9 10
(identifiers of data items, grouped by cluster), 3 (number of
data clusters), 4 5 4 (number of items in each cluster), and
two integers (6879 and 59879 in the example) representing λ
and the probability of mutation.
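For readability, the same genome can be sketched as a struct with named fields; the paper encodes it as a single flat list of integers in a GALib genome, and the field names here are illustrative only.

#include <vector>

// Sketch of the clustering genome described above. The paper encodes all of
// this as one flat list of integers; named fields are used here only for
// readability, and the field names are illustrative.
struct ClusteringGenome {
    std::vector<int> dataPointIds;   // identifiers of data items, grouped by cluster
    int numClusters = 0;             // number of data clusters
    std::vector<int> clusterSizes;   // number of data points in each cluster
    int mutationProbGene = 0;        // integer in [1, 10^5] mapped to P_m in [10^-5, 1]
    int lambdaGene = 0;              // integer in [1, 2^16], divided by 2^14 to give lambda up to 4
};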
Two methods of initializing the population have been
tested. One randomly assigns points to clusters, while the
other performs subtractive clustering (without
backpropagation) and assigns points to clusters
probabilistically based on their distance from the estimated
cluster centers. Thus, points close to estimated cluster
centers have a high probability of being associated in a
cluster and those more distant have a lower probability. An
important feature of genetic clustering is that points in a
region of overlap are not limited to being associated with
the closest estimated cluster. Not surprisingly, the
clustering initialization method performs very well on
simple data. However, tests indicate that both perform
equally well when the data becomes more difficult to
cluster, as is the case with the active mode neutron data.
The clustering initializer is used in the data presented.
As with the IPD, GALib is used in this application.
GALib’s simple GA with elitism turned on, tournament
selection (k = 2), and linear scaling were selected. The
objective function is based on the Xie-Beni fuzzy clustering
validity index (Xie and Beni) and performance on the
training data (Becker and Determan). Conceptually, the
clustering index provides information on how well the
selected rules can be expected to generalize to new data,
which helps avoid overfitting the training data. Because the
problem is combinatorial, order and cycle crossover have
been experimented with, but it appears at present that
mutation alone is sufficient. Regardless, for clarity in the
present discussion it is best to consider mutation in
isolation, without complications from crossover.
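For reference, a sketch of one common formulation of the Xie-Beni index follows; the paper's objective also incorporates performance on the training data, which is not shown here, so this is illustrative rather than the exact objective used.

#include <algorithm>
#include <cstddef>
#include <limits>
#include <vector>

// Sketch of a common formulation of the Xie-Beni validity index (Xie and Beni):
// the sum of squared, membership-weighted distances from points to their
// cluster centers, divided by (number of points times the minimum squared
// separation between centers). Lower values indicate better-defined clusters.
double xieBeniIndex(const std::vector<std::vector<double>>& memberships,  // [cluster][point]
                    const std::vector<std::vector<double>>& centers,      // [cluster][feature]
                    const std::vector<std::vector<double>>& points) {     // [point][feature]
    auto sqDist = [](const std::vector<double>& a, const std::vector<double>& b) {
        double d = 0.0;
        for (std::size_t k = 0; k < a.size(); ++k) d += (a[k] - b[k]) * (a[k] - b[k]);
        return d;
    };
    double compactness = 0.0;
    for (std::size_t i = 0; i < centers.size(); ++i)
        for (std::size_t j = 0; j < points.size(); ++j)
            compactness += memberships[i][j] * memberships[i][j] * sqDist(points[j], centers[i]);

    double minSeparation = std::numeric_limits<double>::max();
    for (std::size_t i = 0; i < centers.size(); ++i)
        for (std::size_t k = i + 1; k < centers.size(); ++k)
            minSeparation = std::min(minSeparation, sqDist(centers[i], centers[k]));

    return compactness / (points.size() * minSeparation);
}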
Some form of swap mutation that selects two genes
representing data points and exchanges their positions in the
genome is appropriate to the application. Similar to the
clustering initializer, two clusters are chosen at random and
a data point from each is selected probabilistically based on
distance from the centroid of the cluster. One limitation at
present is that the clusters are not allowed to change size.
Three variations of the swap mutator are considered below.
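A sketch of these swap mechanics follows; since the text does not state the direction of the distance weighting, the choice here (points far from their own centroid are more likely to be exchanged) is an assumption, and cluster sizes are kept fixed as described.

#include <cstddef>
#include <random>
#include <utility>
#include <vector>

// Sketch of the basic swap mutator: pick two clusters at random, choose one
// point from each with probability weighted by its distance from the cluster
// centroid, and exchange them. Assumes non-empty clusters and data.
using Point = std::vector<double>;

static double squaredDistance(const Point& a, const Point& b) {
    double d = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i) d += (a[i] - b[i]) * (a[i] - b[i]);
    return d;
}

static Point centroid(const std::vector<int>& cluster, const std::vector<Point>& data) {
    Point c(data[0].size(), 0.0);
    for (int id : cluster)
        for (std::size_t i = 0; i < c.size(); ++i) c[i] += data[id][i];
    for (double& v : c) v /= cluster.size();
    return c;
}

// Select an index into `cluster`, weighted by squared distance from the centroid.
static std::size_t pickByDistance(const std::vector<int>& cluster,
                                  const std::vector<Point>& data, std::mt19937& rng) {
    Point c = centroid(cluster, data);
    std::vector<double> weights;
    for (int id : cluster) weights.push_back(squaredDistance(data[id], c) + 1e-12);
    std::discrete_distribution<int> dist(weights.begin(), weights.end());
    return static_cast<std::size_t>(dist(rng));
}

void swapMutate(std::vector<std::vector<int>>& clusters,
                const std::vector<Point>& data, std::mt19937& rng) {
    if (clusters.size() < 2) return;
    std::uniform_int_distribution<std::size_t> pick(0, clusters.size() - 1);
    std::size_t a = pick(rng), b = pick(rng);
    while (b == a) b = pick(rng);                   // two distinct clusters
    std::size_t i = pickByDistance(clusters[a], data, rng);
    std::size_t j = pickByDistance(clusters[b], data, rng);
    std::swap(clusters[a][i], clusters[b][j]);      // exchange the two data points
}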
The probability of mutation becomes extremely
important in a problem employing only mutation. The
swapping actions described above occur with a frequency
determined by the probability of mutation, P_m. While the
usual value of 0.001 is too low, any large, constant value
will likely prevent convergence by continually modifying
the entire population. P_m must change as the problem
progresses, so the best course is to let it be determined
genetically.
A simple form of swap mutation is considered to offer a
baseline comparison. Instead of determining P_m genetically,
P_m is varied according to a predefined function:

(2)  P_m = P_m0 * exp(-g/τ) * sin(π/P - π/2)

This function is based on the idea that P_m must decrease for
convergence to occur, but modulation helps to reintroduce
variation into the population as the calculation proceeds.
Tests have shown that setting P = τ/2 yields good results,
and that τ should be scaled to the length of the calculation
(thus, this method is not sufficiently general for actual use).
In the second mutation operator, non-chaotic swap
mutation, the genome includes an integer gene value
representing the mutation probability. The allele values for
this gene range between 1 and 10^5, such that mutation
probabilities between 10^-5 and 1 are possible. The swap
mutator function is extended to modify this parameter.
Conceptually, the method is designed to reduce the mutation
probability for genomes that are achieving good results (to
encourage their survival and propagation); the remaining
genomes should either improve or disappear. The method
chosen is to select any genome with an objective score
within 5% of the maximum score in the population and
reduce its P_m with the following formula:

(3)  P_m = P_m - min(P_m, 1 - P_m)/4

The effect is to produce small changes in the mutation
probability near one or zero and larger changes in the
midrange, while preventing P_m from becoming invalid.
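A minimal sketch of this update rule, with the 5% high-scorer selection omitted:

#include <algorithm>

// Sketch of the non-chaotic P_m update of Equation (3), applied in the paper
// only to genomes scoring within 5% of the best objective value.
double reduceMutationProb(double pm) {
    // Small change near 0 or 1, larger change in the midrange; P_m stays valid.
    return pm - std::min(pm, 1.0 - pm) / 4.0;
}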
The third mutation operator, chaotic swap mutation, is
similar to that just described, except for the introduction of
chaos. In this method, the genome is extended to represent
both P_m and λ. Like P_m, λ is represented as an integer, with
allele values between 1 and 2^16, such that when divided by
2^14 a maximum value of 4 results. Instead of directly
modifying P_m, however, the third method modifies λ in a
manner analogous to P_m above (slowly near 0 and 4, faster
in the midrange, and applied only to those genomes
achieving a high objective score). The logistic function is
applied to mutate P_m.
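Under this reading of the text, the rate update for a high-scoring genome might be sketched as follows; the exact scaling used to apply the Equation (3) analogue to λ is an assumption.

#include <algorithm>

// Sketch of chaotic swap mutation's rate update, under our reading of the text:
// lambda is reduced analogously to Equation (3) on its (0,4) range (slow change
// near 0 and 4, faster in the midrange), and the logistic map then mutates P_m.
// The exact scaling of the lambda update is an assumption.
struct ChaoticRates {
    double pm;       // mutation probability in (0, 1)
    double lambda;   // logistic parameter in (0, 4)
};

ChaoticRates chaoticUpdate(ChaoticRates r) {
    double x = r.lambda / 4.0;                 // map lambda into (0, 1)
    x -= std::min(x, 1.0 - x) / 4.0;           // analogue of Equation (3)
    r.lambda = 4.0 * x;                        // map back into (0, 4)
    r.pm = r.lambda * r.pm * (1.0 - r.pm);     // logistic map applied to P_m
    return r;
}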
Results of calculations using these swap mutation
operators are presented below. All calculations had a
population size of 48 individuals and used one of the swap
mutation variants described above, but no crossover. In
every case, 100 repetitions of each test were performed.
Tests involving genetically determined P_m were initialized
to random values of P_m between 0.01 and 0.9. When λ was
used, λ was randomly initialized to values ranging from 3.6
to 4 (all in the chaotic regime).
3.3 Results
Table 8 presents the mean and standard deviation over
100 repetitions for each test considered. The tests presented
are 1) the baseline calculation, 2) non-chaotic swap
mutation, and 3) chaotic swap mutation (λ mutation). Tests
4–6 are sensitivities and are discussed below. Each
calculation generated a set of rules, and these were used to
classify test data. Results are reported as the decimal
fraction of correct classifications. It is observed that the
average and maximum scores increased, and the standard
deviation decreased, as the tests progressed from the first to
the third.
This behavior is further illustrated in Figure 3, where the
distribution of scores from the 100 repetitions of each test is
plotted. Again, as the tests progress from Test 1 to Test 3,
the distributions are observed to shift rightward in the plot.
Finally, Figure 4 presents the probability distribution
P(score>x). This figure indicates that Test 3 has about a
43% chance of scoring above 0.7 (70% correct
classification by the rules produced in the calculation),
while the other tests have about a 24% probability of
achieving the same level of performance. By way of
comparison, 78% classification accuracy has been obtained
on the same data used in these tests (Determan and
Becker). Note that only the chaotic case has even a slight
probability of surpassing this score (a maximum score of
82% was obtained in 2 out of 100 trials). Thus, while more
work is still required, it is clear that the chaotic swap
mutation operator represents progress toward the goal of
achieving a genetic clustering algorithm capable of
producing improvements over previous results.

ID   Average score   Std. dev.   Maximum score   Average generations
1    .56             .11         .75             10
2    .59             .091        .77             10
3    .65             .079        .82             25
4    .62             .082        .77             17
5    .58             .1          .75             12
6    .61             .10         .75             25
Table 8. Average, standard deviation, maximum
score, and average number of generations to
convergence (for 100 repetitions of each test) for rule
sets generated using 1) the baseline swap mutation, 2)
non-chaotic swap mutation, and 3) chaotic swap
mutation. Test 4 is a variation of Test 3 with initially
convergent λ values. Test 5 is a variation of Test 4
with the logistic function iterated to convergence
prior to calculation of P_m. Test 6 is a variation of
Test 2 with random variations introduced into the
calculation of P_m.

Additional tests were performed to isolate the reason for
λ mutation's improved performance. The λ mutation
operator provides greater variation in the values of P_m used
during a calculation. Therefore, initializing λ in the
convergent regime is expected to decrease performance to
about the level observed for Test 2. Likewise, if increased
variation in P_m is the only factor behind the performance of
λ mutation, the performance of Test 2 should improve if
random variations are included in the swap mutator for that
test. Test 4 was performed using the chaotic swap mutator
with λ initialized randomly between 2.5 and 2.95, the upper
end of the convergent regime. The performance in Test 4
decreased as expected, but only to an average value of 62%.
This is due to the nature of convergence in the logistic
equation. Higher values of λ within the convergent regime
result in larger fluctuations and longer convergence times;
thus, even in the convergent regime, λ mutation offers
increased exploration. Test 5 confirms this effect. In
Test 5, the variation in P_m due to oscillations during
convergence is removed by iterating the logistic function
to convergence prior to calculating P_m. Test 5 shows
performance comparable to Test 2. Finally, to increase the
variation of P_m in the non-chaotic swap mutator, Equation 3
was modified such that the delta could be either added or
subtracted (at random), and the divisor in the delta was
allowed to vary (randomly) between 2 and 8. Test 6
indicates that only minimal improvement resulted, in
comparison to Test 2.
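For concreteness, the Test 6 modification might be sketched as follows; whether the divisor is drawn as a real or an integer is not stated, and the clamp that keeps P_m in its representable range is an added safeguard.

#include <algorithm>
#include <random>

// Sketch of the Test 6 variant of Equation (3): the delta is added or
// subtracted at random and its divisor varies randomly between 2 and 8.
// The real-valued divisor and the final clamp are assumptions.
double randomizedPmUpdate(double pm, std::mt19937& rng) {
    std::uniform_real_distribution<double> divisor(2.0, 8.0);
    std::bernoulli_distribution addDelta(0.5);
    double delta = std::min(pm, 1.0 - pm) / divisor(rng);
    double next = addDelta(rng) ? pm + delta : pm - delta;
    return std::clamp(next, 1e-5, 1.0);   // keep P_m in its representable range
}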
Two additional conclusions may be drawn from the
sensitivity tests:
1) increased variation and exploration due to operation
within the chaotic regime is primarily responsible for the
increased performance of the chaotic swap mutator, and
2) fluctuations arising from convergence of the logistic
equation, apart from chaotic behavior, are also
beneficial.
The failure of Test 6 to significantly improve the
performance of Test 2 further indicates, but by no means
conclusively proves, that the overall effect of λ mutation is
more than an increase in exploration. It is speculated that
the property of the logistic equation to gracefully transition
the calculation from exploration to exploitation is a key
factor in the performance gain obtained using λ mutation.
4 Conclusions
We examined using chaos in simulated evolution to
direct crossover for a dynamic, coevolutionary problem. A
number of interesting trends are suggested by the data.

• Although increasing ordinary bit-flipping mutation
beyond a small degree greatly increases exploration, it
also degrades system performance. Increasing
λ mutation improves system performance by
increasing exploration in a more focused, exploitive
manner than bit flipping. In a sense, chaos directs
mutation better than randomness does.

• In a large number of tests where the λ values were
initially uniformly distributed, individuals with
chaotic or convergent λ values achieved the
maximum IPD score most of the time. This indicates
that there is pressure to keep both of these
components in the population. The chaotic values
presumably aid exploration, and the convergent
values aid exploitation. There must be benefit in
mixing chaotic and non-chaotic individuals,
because experiments where the populations were
initially all chaotic or all convergent performed
poorly. It is perhaps the mixing of genetic material
from exploring (chaotic) individuals and exploitive
(convergent) individuals that gives λ mutation its
greatest advantage.

Figure 3. Distribution of performance scores (frequency
versus performance range) for Tests 1-3 of Table 8 (Base,
No Chaos, Chaos).

Figure 4. Probability distribution P(score > x) for Tests 1-3
of Table 8 (Base, No Chaos, Chaos).

• Individuals with bifurcating λ values rarely achieved
the maximum IPD scores. Bifurcating mutation is
neither exploration nor exploitation, but merely
cyclic repetition of previously used genetic material.

Unfortunately, the above conclusions are not clearly
statistically significant. We suspect that this is due in part
to our representation, since the convergent regimes tend to
concentrate changes in the low-order bits of the strategy,
though these hold no special significance for the IPD.
The following additional conclusions are drawn with respect
to the use of chaos in genetic data clustering:

• In an application free from the biased representation
noted above, the use of chaos is clearly beneficial. A
significant performance gain is observed compared to
the use of a similar but non-chaotic method.

• Genetic data clustering, particularly when employing
chaos, has the potential to outperform traditional data
clustering methods. Rule sets generated with genetic
data clustering and chaos have outperformed, if only
modestly (82% versus 78%), rules generated by
traditional methods.

• The improved performance due to the use of λ mutation
is attributable to both chaotic and non-chaotic
variations introduced by operation of the logistic
function. The graceful transition between chaotic and
non-chaotic behavior of the logistic equation is likely a
key factor in the performance of λ mutation.
While it is conceivable that a carefully constructed
procedure for introducing random variations could mimic
the behavior of λ mutation, the fact that this behavior arises
simply and naturally from the logistic equation favors the
use of λ mutation.
We have seen that a judicious use of chaos in simulated
evolution can be beneficial. This again raises the question
with which we began: whether chaos might play a role in
natural evolution.
5 Acknowledgments
This document was prepared for the U.S. Department of
Energy Assistant Secretary for Environmental Management
under DOE Idaho Operations Office Contract
DE-AC07-94ID13223.
Bibliography
Axelrod, R. 1987. "The Evolution of Strategies in the
Iterated Prisoner's Dilemma," in L. D. Davis, ed., Genetic
Algorithms and Simulated Annealing. Morgan Kaufmann.

Becker, G., Determan, J. 1998. "Expert System Technology
for Nondestructive Waste Assay," Proceedings of the 39th
Annual Meeting of the Institute of Nuclear Materials
Management.

Chiu, S. L. 1994. Fuzzy Model Identification based on
Cluster Estimation, Journal of Intelligent and Fuzzy
Systems, 2: 267–278.

Determan, J., Becker, G. 1998. "Expert System Technology
for Nondestructive Waste Assay," Proceedings of the 10th
Conference on Innovative Applications of Artificial
Intelligence, AAAI Press, Menlo Park, California.

Gleick, James. 1988. Chaos: Making a New Science. New
York: Viking.

Goldberger, Ary L., Rigney, David R., West, Bruce J.
1990. “Chaos and Fractals in Human Physiology.” Scientific
American 262 (February): 42 – 49.

Moon, Francis C. 1992. Chaotic and Fractal Dynamics, an
Introduction for Applied Scientists and Engineers. New
York: John Wiley & Sons, Inc.

Ohno, S. 1988. “Codon Sequence is But an Illusion Created
by the Construction Principle of Codon Sequence.”
Proceedings of the National Academy of Science USA 85:
4378-4386.

Wall, M. 1996. GALib, A C++ Library of Genetic
Algorithm Components, Version 2.4, Documentation
Revision B, Massachusetts Institute of Technology.

Wesson, Robert G. 1981. Beyond Natural Selection.
Cambridge, Massachusetts: The MIT Press.

Xie, X., Beni, G. 1991. "A Validity Measure for Fuzzy
Clustering," IEEE Transactions on Pattern Analysis and
Machine Intelligence, Volume 13, Number 8.

Yager, R. R.; Filev, D. P. 1994. Generation of Fuzzy Rules
by Mountain Clustering, Journal of Intelligent and Fuzzy
Systems, 2: 209 - 219.