Using Chaos in Genetic Algorithms

John Determan

Idaho National Engineering and Environmental Laboratory

P.O. Box 1625

Idaho Falls, ID 83415-2107

jcd@inel.gov

James A. Foster

University of Idaho

Department of Computer Science

Moscow, ID 83844-1010

foster@cs.uidaho.edu

Abstract

We have performed several experiments to study the

possible use of chaos in simulated evolution. Chaos is

often associated with dynamic situations in which there

is feedback, hence there is speculation in the literature

that chaos is a factor in natural evolution. We chose the

iterated prisoner’s dilemma problem as a test case, since

it is a dynamic environment with feedback. To further

illustrate the benefits of employing chaos in genetic

algorithms, data derived from a genetic data clustering

algorithm under development at the Idaho National

Engineering and Environmental Laboratory is also

presented. To perform an initial assessment of the use

of chaos we used the logistic function, a simple equation

involving chaos, as the basis of a special mutation

operator, which we call λ mutation. The behavior of the

logistic function is well known and comprises three

characteristic ranges of operation: convergent,

bifurcating, and chaotic. We hypothesize that the

chaotic regime will aid exploration, and the convergent

range will aid exploitation. The bifurcating range is

likely neutral, and hence an insignificant factor. Our

results confirm these expectations.

1 Introduction

Chaos underlies many natural phenomena, from turbulent

fluid flow to global weather patterns (Gleick), from healthy

heart rhythms (Goldberger, Rigney, and West) to DNA

coding sequences (Ohno). Complex dynamic systems with

intricate feedback paths may often be understood in terms

of chaos theory. Therefore it is natural to speculate that

evolutionary processes may also be understood in terms of

chaos. Beyond Natural Selection (Wesson) thoroughly

motivates and discusses this speculation. To explore this

idea, we incorporated a simple chaotic equation, the logistic

function, into a simple genetic algorithm.

1.1 The Iterated Prisoner’s Dilemma Problem

A problem involving coevolution is a natural candidate for

this initial test. Coevolution introduces a dynamic

environment and a natural feedback path. The iterated

prisoner’s dilemma (IPD) problem with each individual

scored against the entire population is such a problem, and

will be our target application. The IPD was first

implemented in genetic algorithms by Axelrod in the 1970’s

(Axelrod). To understand the IPD, suppose that two

prisoners are being held as suspects in a crime. The

prisoners attempt independently to get the most

advantageous deal from the police by either supporting or

turning on their partner, that is by cooperating with their

partner or defecting. However, the consequences of their

actions are intertwined, such that although the decision is

binary, it is not simple. A payoff matrix, Table 1,

represents the severity of a criminal’s sentence as a function

of both his and his partner’s actions. A high score

represents a light sentence and therefore a successful pair of

decisions for the criminal receiving the score. In the

iterated version, the score is summed over multiple plays,

which makes the study of general strategies interesting,

since the only rational move in a single game is defection.

            Move                        Score
Player 1    Player 2        Player 1    Player 2
Cooperate   Cooperate       3           3
Cooperate   Defect          0           5
Defect      Cooperate       5           0
Defect      Defect          1           1

Table 1. IPD payoff matrix.

As in Axelrod’s initial experiments, we represent an IPD

strategy as a response to a history of three previous pairs of

moves, together with six initial moves. Since there are two

possible moves for each player, this means our strategies

will be represented as binary strings with $2^{2 \cdot 3} + 2 \cdot 3 = 70$

bits. Also as in Axelrod’s work, we evaluate an

individual’s fitness by playing its strategy against each

individual in the population (including itself) for 10

iterations and computing the average cumulative score per

opponent.

1.2 Genetic Data Clustering Algorithm

Work is currently being performed to apply expert system

technology to the review of nondestructive assay (NDA) of

transuranic waste at the Idaho National Engineering and

Environmental Laboratory (INEEL). One area of research

is the automatic generation of expert system rules from

training data (Determan, Becker). Data clustering is

employed to recognize patterns in the training data from

which fuzzy expert system rules can be formed. The system

reported in Determan and Becker employs a fuzzy

clustering method known as mountain clustering (Yager and

Filev). Attempts to improve the speed of the rule

generation process and the accuracy of the generated rules

have led to the development of a genetic data clustering

algorithm. We present data that illustrates the beneficial

effects of chaos in this application.

1.3 Chaos

A common and simple chaotic function, the logistic

equation (Moon) is:

(1) $x_{n+1} = \lambda x_n (1 - x_n)$

The properties of the logistic function are well known, but

we briefly discuss them here. For values of λ between 0

and 3, (1) will converge to some value x. For λ between 3

and about 3.56 the solution to (1) bifurcates into 2, then 4,

then 8 (and so on) periodic solutions. For λ between 3.56

and 4 the solutions to (1) become fully chaotic: neither

convergent nor periodic, but variable with no discernable

pattern. As λ approaches 4, the variation in solutions to (1)

appears increasingly random. Figure 1 illustrates this

behavior. We refer to these three regimes as convergent,

bifurcating, and chaotic, respectively.
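The three regimes are easy to reproduce numerically. The sketch below iterates equation (1) in ordinary double precision; the function name and default arguments are illustrative choices, with the starting value 0.25 taken from Figure 1.

```python
def logistic_orbit(lam, x0=0.25, n=30):
    """Return [x_0, x_1, ..., x_n] for x_{k+1} = lam * x_k * (1 - x_k)."""
    xs = [x0]
    for _ in range(n):
        x = xs[-1]
        xs.append(lam * x * (1.0 - x))
    return xs


if __name__ == "__main__":
    for lam in (2.0, 3.3, 3.9):  # convergent, bifurcating, chaotic
        tail = logistic_orbit(lam)[-4:]
        print(lam, [round(x, 4) for x in tail])
```

For λ = 2 the orbit settles on the fixed point 1 - 1/λ = 0.5, for λ = 3.3 it alternates between two values, and for λ = 3.9 it wanders without settling.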

1.4 The Logistic Function as a Genetic Operator

Our individual chromosomes maintain an IPD strategy

(70 bits), a value for λ (10 bits), and a uniform crossover

mask (70 bits). We use uniform crossover with a mask: one

child from crossover of parents A and B has strategy bits

selected from B when the mask bits from A are ones and

from A when they are zeros, and it inherits its other bits

(including the mask) from A.
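A minimal sketch of this masked uniform crossover follows. The dictionary layout and field names are assumptions made for illustration; only the selection rule (take the strategy bit from B where A's mask bit is one, from A where it is zero, and inherit λ and the mask from A) comes from the description above.

```python
def masked_uniform_crossover(parent_a, parent_b):
    """Produce one child of parents A and B using A's crossover mask."""
    child_strategy = [
        b_bit if mask_bit else a_bit
        for a_bit, b_bit, mask_bit in zip(
            parent_a["strategy"], parent_b["strategy"], parent_a["mask"]
        )
    ]
    return {
        "strategy": child_strategy,
        "lam_bits": list(parent_a["lam_bits"]),  # inherited from A
        "mask": list(parent_a["mask"]),          # inherited from A
    }
```

The second child would presumably be produced by swapping the roles of A and B, although the text does not spell this out.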

Since λ must be a real between zero and four, we scale

the ten bit binary value in the chromosome appropriately.

Though arbitrary, a ten-bit precision seemed adequate.

We modify the mask itself with the logistic equation,

using the λ value stored in the chromosome and treating the

mask itself as the input. That is, we interpret the 70-bit

mask as a real value, scaled into the range (0,1) to get $x_n$,
and set the new mask, $x_{n+1}$, to the 70-bit representation of
$\lambda x_n (1 - x_n)$. This operation, which we call λ mutation,

incorporates a type of self-directed chaos into the

evolutionary search for IPD strategies via crossover.

We also subjected both λ and the strategy bits to

ordinary, bit-flipping mutation. See Table 2 as a summary

of our representation and use of mutation operators.

Contents:             IPD Strategy         λ           Mask = $x_n$
Size in bits:         70                   10          70
Range of phenotype:   n/a                  (0,4)       (0,1)
Crossover Type:       Uniform with Mask    none        none
Mutation Type:        bit flip             bit flip    λ mutation

Table 2. For the chromosome in the boxed region, we
describe the contents and size of the gene, the range of
alleles and the genetic operations used on that gene.

An incidental requirement is the ability to perform

floating-point math with a minimum of 70-bit precision

(plus a few for rounding). The Windows specific type long

double is limited to 64-bit precision, so we implemented a

high precision floating point class that supports user

specified precision. We tested this class by iterating the

logistic equation on 75-bit precision numbers and verifying

that the code displayed the correct behavior and values. We

then used a precision of 75 bits for λ mutation.
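The λ mutation step itself can be sketched without a custom floating-point class by using exact rational arithmetic, which is a substitute for (not a reproduction of) the 75-bit class described above. The scalings are assumptions: the mask integer m is mapped to x = m / 2^70, the 10-bit λ gene k is mapped to λ = 4k / (2^10 - 1), and the result is truncated back to 70 bits.

```python
from fractions import Fraction

MASK_BITS = 70
LAMBDA_BITS = 10


def decode_lambda(lam_int):
    """Map a 10-bit integer to a value between 0 and 4 (assumed scaling)."""
    return Fraction(4 * lam_int, 2**LAMBDA_BITS - 1)


def lambda_mutate(mask_int, lam_int):
    """Apply one logistic-map step to a 70-bit mask held as an integer."""
    x = Fraction(mask_int, 2**MASK_BITS)          # x_n
    x_next = decode_lambda(lam_int) * x * (1 - x)  # x_{n+1}
    # Truncate to 70 bits; clamp the single case x_next == 1 (lam=4, x=1/2).
    return min(int(x_next * 2**MASK_BITS), 2**MASK_BITS - 1)
```

Because `Fraction` is exact, no precision is lost before the final truncation, so the 70-bit result does not depend on the platform's floating-point type.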

1.5 Expected Effect of λ Mutation

Consider, for a moment, the behavior of the logistic

function. For values of λ below 3, iteratively applying the

logistic function to some value $x_0$ results in convergence to

some value x. Speed of convergence decreases, and the

value of x increases, as λ approaches 3, and for λ > 2 the

convergence is oscillatory. This will produce convergent

mutation and will tend to produce crossover masks that

preserve the higher order bits of the mask, but vary the

lower order bits. Near convergence, the mask will become

fixed. Thus an individual with convergent λ will tend to

produce offspring with progressively more rigid crossover

masks. Individuals with nonconvergent λ will tend to have

a high degree of variability in the crossover masks of their

descendants, with the variability increasing as λ approaches
the maximum value of 4.

[Figure 1: three panels plotting x against n (n = 0 to 30) for
λ = 2 (Convergent), λ = 3.3 (Bifurcating), and λ = 3.9 (Chaotic).]

Figure 1. Solutions to the logistic equation for various
λ ($x_0$ = 0.25).

Note that the probability of λ mutation has a large effect

on this qualitative behavior. For generations where

λ mutation does not occur, the crossover mask will be

inherited unchanged with high probability from the mother

or father, thus good masks will tend to propagate unchanged

most of the time, but with occasional modification from

λ mutation. So, low values of λ should favor exploitation

of good masks, and large values of λ should favor

exploration of the space of possible masks (and therefore

strategies). Thus, one should observe greater exploration

with λ mutation than without. Similarly, one should

observe increased exploration when λ is in the chaotic

domain rather than the convergent one. Also, exploration

should increase with the probability of λ mutation. We can

quantify exploration as the total number of strategies

explored to test these hypotheses.

2 Results for the Iterated Prisoner’s Dilemma

Our control case was the basic IPD with masked uniform

crossover and bit flipping mutation. Each chromosome

contained a crossover mask, which was subject to only bit

flipping mutation, and not λ mutation. The probability of

mutation was 0.001. We also performed a sensitivity

experiment with the probability of mutation at 0.01.

The probability of crossover in all tests was 0.7. Our

GA was the Simple GA of GALib (Wall) with elitism. To

determine which scaling and selection algorithms gave the

best performance on the control problem, we compared

several combinations of scaling and selection algorithms,
namely linear and σ-truncation (ST) scaling, and tournament and
stochastic uniform sampling (SUS) selection algorithms. We used a population of 20

individuals, and all calculations were performed for

50 generations for these tests. We repeated each run

10 times. Table 3 shows the average IPD scores and their

standard deviations from these series of calculations. The

data show that ST outperforms linear scaling, and that SUS

selection outperforms tournament selection on the IPD, but

just barely. Because of these tests, we used ST scaling and

SUS selection in all subsequent experiments.

There is a basic difference between bit flipping mutation

and λ mutation. The occurrence of bit flipping mutation is

decided for each bit of each chromosome, whereas the

occurrence of λ mutation is decided once per chromosome.

Each chromosome is 70 strategy bits, 70 mask bits, and

10 bits for λ, for a total of 150 bits per chromosome. With

1,000 individuals, for example, this produces 150,000

opportunities for mutation. With the typical mutation

probability of 0.001, then, one expects approximately

150 bits to be flipped each generation. Although a single

λ mutation has more effect on a chromosome than a single

bit flipping mutation, a probability of λ mutation of 0.001 is

too low to be a significant test of λ mutation. Therefore, we

performed tests with the probability of λ mutation set to

0.01, 0.05 and 0.1 (but the probability of bit flipping

mutation was 0.001).
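The arithmetic behind this argument is shown below: bit flipping is decided once per bit, so its expected count scales with the chromosome length, while λ mutation is decided once per chromosome. The function name is illustrative; the bit counts are those given in the text.

```python
STRATEGY_BITS, MASK_BITS, LAMBDA_BITS = 70, 70, 10


def expected_events(pop_size, p_bit, p_lambda):
    """Expected mutation events per generation for each operator."""
    bits_per_chromosome = STRATEGY_BITS + MASK_BITS + LAMBDA_BITS  # 150
    return {
        "bit_flips": pop_size * bits_per_chromosome * p_bit,  # per-bit decision
        "lambda_mutations": pop_size * p_lambda,              # per-chromosome decision
    }
```

For 1,000 individuals, a probability of 0.001 gives about 150 bit flips but only about one λ mutation per generation, which is why the higher λ mutation probabilities were tested.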

Scaling         Selection             Avg score    Std dev.
Linear          tournament            28.4         2.65
Linear          stochastic uniform    32.0         4.47
σ-truncation    tournament            30.2         3.57
σ-truncation    stochastic uniform    33.8         4.21

Table 3. Average and standard deviation of ten IPD
calculations for various combinations of scaling and
selection operators.

To ensure that the comparison remained fair, we also ran

the control with an increased mutation probability (0.01).

As shown in Table 4, the average IPD score (over 10 trials)

decreased when the mutation probability increased from

0.001 to 0.01. Thus, it is fair to compare calculations with

higher λ mutation probabilities to the control case.

Calculation                            Avg. score    Std. dev.
Control case, mutation prob. 0.001     33.8          4.21
Control case, mutation prob. 0.01      31.4          4.90

Table 4. Average and standard deviation of ten IPD
calculations for the control case and increased
mutation probability.

We repeated all tests 10 times. Summary results of these

tests are in Table 5. The first two cases are the control case

and the mutation rate sensitivity calculations. As mentioned

before, the average IPD score decreased with increasing

mutation rate. The column labeled “Average Exploration

Rate” is the number of strategies generated divided by the

number of individuals in the population, averaged over 10

calculations. For the control case, increasing the mutation

probability increased the exploration rate by more than a

factor of three, yet the average IPD score decreased. This

sensitivity clearly indicates that merely increasing

exploration may not improve system performance, or that

there is such a thing as too much exploration.

The next two cases in Table 5 (IDs 3 and 4) employ

λ mutation with random λ values in the initial populations.

However, because the convergent λ values cover three

quarters of the range of possible λ values, convergent λ

values are likely to dominate in a random initial population.

We include these results because the greatest IPD score

obtained in any of the tests occurred during these tests.

However, because of the random, but biased, initialization,

the only conclusion that we can draw from these tests is that

some mixture of λ values in the population appears to be

beneficial. In addition, increasing the rate of λ mutation

slightly improved the average IPD score. Without a better

handle on the initial λ distribution it is not possible to

speculate on what meaning, if any, this increase has. Note,

however that the average IPD scores for these test series fall

between the average IPD scores of the control case and its

sensitivity (IDs 1 and 2), even though the maximum score is

higher for the λ mutation cases than for the control cases.

Also, note that increasing the λ mutation rate produced a

moderate increase in the exploration rate, which may be

related to the increased average IPD score.

To get a better handle on the effect of λ mutation, we

forced several known λ value distributions on the initial

population: including all convergent values (IDs 5 and 6),

an even distribution of λ values from all three regimes (IDs

7 and 8), and all chaotic values (IDs 9 and 10). All three

series of tests were performed with the λ mutation

probability at both 0.01 and 0.05. For all of these cases, an

increase in the λ mutation probability is accompanied by an

increase in the exploration rate, though a modest increase by

comparison to the control cases. However, the average IPD

scores for the convergent and chaotic test series decreased

with increasing λ mutation rate, while the average IPD

score for the even distribution case increased with the

λ mutation rate. We observed this same trend with the

random initialization cases. Also note that the highest

maximum IPD score for these six tests occurred for one of

the even distribution cases. Apparently, a mixture of initial

λ values in the population improves the effectiveness of

λ mutation, so that increasing the λ mutation rate increases

the average performance on the IPD. There is a weak

suggestion that λ mutation with a mixed initial population is

beneficial for obtaining high scores on the IPD.

As a more extensive comparison to the control case, we

performed an additional series of 50 tests with reduced data

collection for the control case and the even λ distribution

case. We used the mutation rates for which each case

performed the best. The results are in Table 6. Both series

obtained the same maximum IPD score of 48, equal to the

highest in any of our runs. The even λ distribution case had

the higher average IPD score. This likely indicates that

average performance on the IPD is related to the ability of

the algorithm, whereas the maximum performance is

significantly influenced by chance.

ID    Description                             Avg Score    Std dev.    Max Score
1     No λ mutation, mutation rate .001       31.6         4.90        48
2     Even λ mutation, mutation rate 0.05     32.9         5.33        48

Table 6. Comparison of control case and even
λ mutation average and maximum values over 50
tests.

The win rates of different categories of λ values in the

even distribution case afford yet another perspective on the

effect of chaos in GAs. Even if a mixture of λ values in the

initial population is beneficial, one category or another of λ

values might tend to produce the highest scoring individual

more frequently than the others do. Table 7 presents the

fraction of maximum scoring individuals per λ category in

the final generation for the series of 50 even λ distribution

tests. It might have been expected that whatever value

chaotic individuals have during a calculation, they would

tend to die off by the final generation. Yet Table 7 indicates

that in 30% of 50 tests, the maximum scoring individual

possessed a chaotic value of λ. If the IPD score of an

individual were independent of the λ category it belonged

to, then an even one-third split among the categories would

be expected. Table 7 indicates that it is definitely harmful

for an individual to possess a bifurcating λ value, neutral to

possess a chaotic value, and beneficial to possess a

convergent value. Thus, while chaotic values may not

always be the winners, they will remain in the population,

and continue to provide diversity and promote exploration.

In summary, the results indicate that the introduction of

chaos into GAs can be beneficial, but that too much chaos,

like too much mutation, is detrimental. Chaotic operations,

like ordinary bit flipping mutation, appear to be good in

small quantities. Further, λ mutation with a mixture of λ

categories improved both the exploration rate and average

IPD scores.

Description                            Chaotic    Bifurcating    Convergent
Even λ mutation, probability 0.05      30%        8%             62%

Table 7. Percentage of maximum scoring individuals
from each λ category over fifty tests.

ID    Type of λ     Probability of    Probability of    Avg IPD    Std          Avg Exploration    Max IPD
      Mutation      bit mutation      λ mutation        Score      deviation    Rate               Score
1     None          0.001             0                 33.8       4.21         0.231              44
2     None          0.010             0                 31.4       4.90         0.845              43
3     Full          0.001             0.01              32.3       5.46         0.210              48
4     Full          0.001             0.10              33.5       5.22         0.314              48
5     Convergent    0.001             0.01              34.7       5.40         0.223              44
6     Convergent    0.001             0.05              29.8       4.49         0.282              42
7     Even          0.001             0.01              31.2       3.66         0.212              40
8     Even          0.001             0.05              32.1       5.59         0.270              47
9     Chaotic       0.001             0.01              31.5       2.38         0.223              35
10    Chaotic       0.001             0.05              30.9       4.46         0.251              40

Table 5. Comparison of average, best-of-ten IPD scores, and exploration rate for
different types and rates of mutation. Full λ mutation randomly selects λ from all
domains, convergent and chaotic λ mutation selects from convergent and chaotic
domains, and even λ mutation selects with equal probability from all three domains.

3 Chaos in Genetic Data Clustering

Subtractive clustering (Chiu), based on the mountain
clustering method (Yager and Filev) can be employed to
generate fuzzy logic expert system rules from training data.
The clustering process locates potential clusters and
estimates their means and variances in each dimension of
the training data. The means and variances are further
refined using backpropagation against the training data

(Yager and Filev, Chiu), and each cluster is used to form a

fuzzy rule. The estimated means and variances are used to

form Gaussian membership functions. These membership

functions represent the conditions to be matched by a data

point to be classified. An estimate of the mean of the

correct classification for each cluster is also computed, and

becomes the rule output. For this process to work, various
parameters need to be defined, the most important of which
are the cluster radius and the backpropagation learning rate
and convergence criteria. The choice of cluster radius
affects the number of clusters located in the data, and
therefore the number of rules, while the other parameters
affect the estimated means and variances (and therefore, the
accuracy of the rules). Genetic algorithms were employed

to determine appropriate values for these parameters

(Determan and Becker).

Work to improve the speed of the training process and

the accuracy of the generated rules is underway. First, the

process described above is inverted: a genetic clustering

algorithm locates data points in the training data that belong

together; afterwards, means and variances are calculated

over these clusters. Fuzzy rules are constructed from the

cluster means and variances, as before. This procedure

bypasses the time-consuming process of backpropagation.

The goal of current research is to develop genetic operators

that perform well on the data clustering task. This section

will begin by discussing the training data used in this

process, and continue by examining the fundamental genetic

operators and representations of this approach. Finally,

some results will be presented that illustrate the benefits of

using chaos in this application.

3.1 Training Data

To test the performance of the rule generation system, a

set of waste container assays with known validity

confidence ratings was generated. To do this, a panel of

three waste NDA experts was assembled and given a set of

waste assays to assign validity confidence ratings, on a scale

from 0 to 10. They rated their confidence in each of the

three measurement modes of the system (one active, and

two passive neutron measurement modes), for each assay.

The set contained 99 assays evenly selected from the waste

types graphite, combustibles, and glass. Disagreement

existed between the experts as to the proper classification of

each assay, due partly to differing scales of judgment used

by each expert, and partly to the uncertainty inherent in the

interpretation of NDA results. A normalization procedure

was used to reduce the scaling bias of each expert. Assays

with normalized scores that still did not agree between the

experts were removed from the test set. The final data set

contained 67 assays. Three figures of merit derived from

the assay measurements were determined for each

measurement mode of the assay system and used as training

input. Thus, there were 3 data sets, each consisting of 3

input features and one classification. Results presented in

this section are derived from the active mode data.

3.2 Genetic Representation and Operators

The genomes are lists of integers, as shown in Figure 2.

The first portion of the chromosome represents the data

points, specified by integer identifiers, and grouped to

represent clusters of data points. Following are the number

of clusters, and the number of data points in each cluster.

Finally, two integers are used to represent the probability of

mutation and λ, as discussed below.

[Figure 2: an example genome, shown as a list of integers:
  0 3 8 12 | 2 5 7 1 6 | 4 11 9 10    identifiers of data items, grouped by cluster
  3                                   # of data clusters
  4 5 4                               # of items in each cluster
  6879  59879                         the two integers representing λ and the probability of mutation]

Figure 2. The genome representation for the data
clustering problem.
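Decoding this genome layout is straightforward, as the following sketch shows. The function name and the flat-list input are assumptions; the example values are those readable in Figure 2. The two trailing integers are returned undecoded, because the figure as recovered does not make clear which of them is λ and which is the mutation probability.

```python
def decode_genome(genome, n_points):
    """Split a flat integer genome into clusters and the trailing parameter genes."""
    ids = genome[:n_points]
    n_clusters = genome[n_points]
    sizes = genome[n_points + 1 : n_points + 1 + n_clusters]
    params = genome[n_points + 1 + n_clusters :]
    if sum(sizes) != n_points:
        raise ValueError("cluster sizes do not account for every data point")
    clusters, start = [], 0
    for size in sizes:
        clusters.append(ids[start : start + size])
        start += size
    return clusters, params


# Example values read from Figure 2 (13 data points, 3 clusters).
FIGURE_2_GENOME = [0, 3, 8, 12, 2, 5, 7, 1, 6, 4, 11, 9, 10, 3, 4, 5, 4, 6879, 59879]
```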

Two methods of initializing the population have been

tested. One randomly assigns points to clusters, while the

other performs subtractive clustering (without

backpropagation) and assigns points to clusters

probabilistically based on their distance from the estimated

cluster centers. Thus, points close to estimated cluster

centers have a high probability of being associated in a

cluster and those more distant have a lower probability. An

important feature of genetic clustering is that points in a

region of overlap are not limited to being associated with

the closest estimated cluster. Not surprisingly, the

clustering initialization method performs very well on

simple data. However, tests indicate that both perform

equally well when the data becomes more difficult to

cluster, as is the case with the active mode neutron data.

The clustering initializer is used in the data presented.

As with the IPD, GALib is used in this application.

GALib’s simple GA with elitism turned on, tournament

selection (k = 2), and linear scaling were selected. The

objective function is based on the Xie-Beni fuzzy clustering

validity index (Xie and Beni) and performance on the

training data (Becker and Determan). Conceptually, the

clustering index provides information on how well the

selected rules can be expected to generalize to new data,

which helps avoid overfitting the training data. Because the

problem is combinatorial, order and cycle crossover have

been experimented with, but it appears at present that

mutation alone is sufficient. Regardless, for clarity in the

present discussion it is best to consider mutation in

isolation, without complications from crossover.

Some form of swap mutation that selects two genes

representing data points and exchanges their positions in the

genome is appropriate to the application. Similar to the

clustering initializer, two clusters are chosen at random and

a data point from each is selected probabilistically based on

distance from the centroid of the cluster. One limitation at

present is that the clusters are not allowed to change size.

Three variations of the swap mutator are considered below.
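The basic swap can be sketched as below. The text states only that points are selected "probabilistically based on distance from the centroid"; weighting the choice in proportion to that distance, so that outlying points are more likely to move, is an assumption of this example, as are the function names and the use of Euclidean distance.

```python
import math
import random


def centroid(points):
    """Component-wise mean of a list of equal-length coordinate tuples."""
    dims = len(points[0])
    return [sum(p[d] for p in points) / len(points) for d in range(dims)]


def pick_index(cluster, data, rng):
    """Pick a position in the cluster, weighted by distance from its centroid."""
    center = centroid([data[i] for i in cluster])
    # Small offset keeps weights positive when a point sits on the centroid.
    weights = [math.dist(data[i], center) + 1e-9 for i in cluster]
    return rng.choices(range(len(cluster)), weights=weights, k=1)[0]


def swap_mutation(clusters, data, rng):
    """Exchange one data point between two randomly chosen clusters."""
    mutated = [list(c) for c in clusters]  # leave the input untouched
    a, b = rng.sample(range(len(mutated)), 2)
    i = pick_index(mutated[a], data, rng)
    j = pick_index(mutated[b], data, rng)
    mutated[a][i], mutated[b][j] = mutated[b][j], mutated[a][i]
    return mutated
```

Because the operation is a pure exchange, cluster sizes are preserved, which matches the limitation noted above.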

The probability of mutation becomes extremely

important in a problem employing only mutation. The

swapping actions described above occur with a frequency

determined by the probability of mutation, $P_m$. While the
usual value of .001 is too low, any large, constant value
will likely prevent convergence by continually modifying
the entire population. $P_m$ must change as the problem
progresses, so the best course is to let it be determined
genetically.

A simple form of swap mutation is considered to offer a

baseline comparison. Instead of determining $P_m$ genetically,
$P_m$ is varied according to a predefined function:

(2) $P_m = P_{m0} \, e^{-g/\tau} \, \sin(\pi/P - \pi/2)$.

This function is based on the idea that $P_m$ must decrease for
convergence to occur, but modulation helps to reintroduce
variation into the population as the calculation proceeds.
Tests have shown that setting P = τ/2 yields good results,
and that τ should be scaled to the length of the calculation
(thus, this method is not sufficiently general for actual use).

In the second mutation operator, non-chaotic swap

mutation, the genome includes an integer gene value

representing the mutation probability. The allele values for
this gene range between 1 and $10^5$, such that mutation
probabilities between $10^{-5}$ and 1 are possible. The swap
mutator function is extended to modify this parameter.
Conceptually, the method is designed to reduce the mutation
probability for genomes that are achieving good results (to
encourage their survival and propagation); the remaining
genomes should either improve or disappear. The method
chosen is to select any genome with an objective score
within 5% of the maximum score in the population and
reduce its $P_m$ with the following formula:

(3) $P_m = P_m - \min(P_m, 1 - P_m)/4$

The effect is to produce small changes in the mutation
probability near one or zero and larger changes in the
midrange, while preventing $P_m$ from becoming invalid.
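Equation 3 and the 5% selection rule can be sketched as follows. The function names are illustrative, and reading "within 5% of the maximum score" as a score of at least 0.95 times the maximum assumes positive objective scores.

```python
def reduce_pm(pm):
    """Equation 3: shrink P_m by a quarter of its distance to the nearer bound."""
    return pm - min(pm, 1.0 - pm) / 4.0


def update_population_pm(scores, pms, tolerance=0.05):
    """Reduce P_m only for genomes scoring within `tolerance` of the best score."""
    best = max(scores)
    threshold = (1.0 - tolerance) * best  # assumes positive scores
    return [
        reduce_pm(pm) if score >= threshold else pm
        for score, pm in zip(scores, pms)
    ]
```

For example, `reduce_pm(0.5)` gives 0.375 (a step of 0.125), while `reduce_pm(0.9)` gives about 0.875 (a step of 0.025), which illustrates the smaller changes near the ends of the range.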

The third mutation operator, chaotic swap mutation, is

similar to that just described, except for the introduction of

chaos. In this method, the genome is extended to represent
both $P_m$ and λ. Like $P_m$, λ is represented as an integer, with
allele values between 1 and $2^{16}$, such that when divided by
$2^{14}$ a maximum value of 4 results. Instead of directly
modifying $P_m$, however, the third method modifies λ in a
manner analogous to $P_m$ above (slowly near 0 and 4, faster
in the midrange, and applied only to those genomes
achieving a high objective score). The logistic function is
applied to mutate $P_m$.
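One way to read this operator is sketched below. The allele scalings ($P_m$ = allele / $10^5$, λ = allele / $2^{14}$) follow the text; the exact form of the λ update, written here as the Equation 3 pattern stretched to the interval from 0 to 4, and the rounding and clamping of alleles are assumptions, since the text describes the update only as "analogous".

```python
PM_SCALE = 10**5      # P_m allele range 1 .. 10^5
LAMBDA_SCALE = 2**14  # lambda allele range 1 .. 2^16, so lambda <= 4


def chaotic_pm_update(pm_allele, lam_allele):
    """Mutate P_m with one logistic-map step using the genome's own lambda."""
    pm = pm_allele / PM_SCALE
    lam = lam_allele / LAMBDA_SCALE
    new_pm = lam * pm * (1.0 - pm)
    # Round back to an allele and keep it inside the valid range.
    return max(1, min(PM_SCALE, round(new_pm * PM_SCALE)))


def reduce_lambda(lam_allele):
    """Assumed analogue of Equation 3 on [0, 4], for high-scoring genomes."""
    lam = lam_allele / LAMBDA_SCALE
    lam -= min(lam, 4.0 - lam) / 4.0
    return max(1, round(lam * LAMBDA_SCALE))
```

Under this reading, a genome that keeps scoring well drifts from the chaotic regime toward smaller λ, so its $P_m$ sequence gradually becomes less erratic.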

Results of calculations using the swap mutation

operators described above will be presented. All

calculations had a population size of 48 individuals, and

used a specific variation of swap mutation as described

above, but no crossover. In every case, 100 repetitions of

each test were performed. Tests involving genetically
determined $P_m$ were initialized to random values of $P_m$
between .01 and .9. When λ was used, λ was randomly
initialized to values ranging from 3.6 to 4 (all in the chaotic
regime).

3.3 Results

Table 8 presents the mean and standard deviation over
100 repetitions for each test considered. The tests presented
are 1) the baseline calculation, 2) non-chaotic swap
mutation, and 3) chaotic swap mutation (λ mutation). Tests

4–6 are sensitivities and are discussed below. Each

calculation generated a set of rules, and these were used to

classify test data. Results are reported as the decimal

percentage of correct classifications. It is observed that the

average and maximum scores increased, and the standard

deviation decreased as the tests progressed from the first to

the third.

This behavior is further illustrated in Figure 3, where the

distribution of scores from the 100 repetitions of each test is

plotted. Again, as the tests progress from Test 1 to Test 3,

the distributions are observed to shift rightward in the plot.

Finally, Figure 4 presents the probability distribution

P(score>x). This figure indicates that Test 3 has about a

43% chance of scoring above 0.7 (70% correct

classification by the rules produced in the calculation),

while the other tests have about a 24% probability of

achieving the same level of performance. By way of

comparison, 78% classification accuracy has been obtained

on the same data used in these tests, (Determan and

Becker). Note that only the chaotic case has even a slight

probability of surpassing this score (a maximum score of

82% was obtained in 2 out of 100 trials). Thus, while more

work is still required, it is clear that the chaotic swap

mutation operator represents progress toward the goal of

achieving a genetic clustering algorithm capable of

producing improvements over previous results.

ID    Average score    Std. dev. score    Maximum score    Average generations
1     .56              .11                .75              10
2     .59              .091               .77              10
3     .65              .079               .82              25
4     .62              .082               .77              17
5     .58              .1                 .75              12
6     .61              .10                .75              25

Table 8. Average, standard deviation, maximum
score, and average number of generations to
convergence (for 100 repetitions of each test) for rule
sets generated using 1) the baseline swap mutation, 2)
non-chaotic swap mutation, and 3) chaotic swap
mutation. Test 4 is a variation of test 3 with initially
convergent λ values. Test 5 is a variation of test 4
with the logistic function iterated to convergence
prior to calculation of $P_m$. Test 6 is a variation of test
2 with random variations introduced into the
calculation of $P_m$.

Additional tests were performed to isolate the reason for λ mutation’s improved performance. The λ mutation operator provides greater variation in the values of P_m used during a calculation. Therefore, initializing λ in the convergent regime is expected to decrease performance to about the level observed for Test 2. Likewise, if increased variation in P_m is the only factor behind the performance of λ mutation, the performance of Test 2 should improve if random variations are included in the swap mutator for that test. Test 4 was performed using the chaotic swap mutator with λ initialized randomly between 2.5 and 2.95, the upper end of the convergent regime. The performance in Test 4 decreased as expected, but only to an average value of 62%. This is due to the nature of convergence in the logistic equation. Higher values of λ within the convergent regime result in larger fluctuations and longer convergence times; thus, even in the convergent regime, λ mutation offers increased exploration. Test 5 confirms this effect. In Test 5, the variation in P_m due to oscillations during convergence is removed by iterating the logistic function to convergence prior to calculating P_m. Test 5 shows performance comparable to Test 2. Finally, to increase the variation of P_m in the non-chaotic swap mutator, Equation 3 was modified such that delta could be either added or subtracted (at random), and the divisor in delta was allowed to vary randomly between 2 and 8. Test 6 indicates that only minimal improvement resulted in comparison to Test 2.
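The claim that higher λ values within the convergent regime take longer to settle can be illustrated numerically. The sketch below assumes the standard form of the logistic map, x_{n+1} = λ x_n (1 - x_n), whose nonzero fixed point is 1 - 1/λ. The function names, the starting value 0.3, and the tolerance are illustrative choices and are not taken from the paper.

```python
def logistic_orbit(lam: float, x0: float = 0.3, steps: int = 200) -> list:
    """Return x0 followed by `steps` iterates of x -> lam * x * (1 - x)."""
    xs = [x0]
    for _ in range(steps):
        xs.append(lam * xs[-1] * (1.0 - xs[-1]))
    return xs


def steps_to_settle(lam: float, x0: float = 0.3, tol: float = 1e-6,
                    max_steps: int = 100_000) -> int:
    """First iteration at which x is within tol of the fixed point 1 - 1/lam.

    Returns max_steps if the orbit never gets that close.
    """
    fixed_point = 1.0 - 1.0 / lam
    x = x0
    for n in range(max_steps):
        if abs(x - fixed_point) < tol:
            return n
        x = lam * x * (1.0 - x)
    return max_steps


# Lower and upper ends of the range used to initialize lambda in Test 4.
fast = steps_to_settle(2.5)
slow = steps_to_settle(2.95)
```

Near the fixed point each iteration multiplies the deviation by about 2 - λ, so the orbit at λ = 2.95 oscillates around its limit for many more iterations than the orbit at λ = 2.5. Those oscillations are the variation that Test 5 removes by iterating to convergence first.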

Two additional conclusions may be drawn from the

sensitivity tests:

1) increased variation and exploration due to operation

within the chaotic regime is primarily responsible for the

increased performance of the chaotic swap mutator, and

2) fluctuations arising from convergence of the logistic

equation, apart from chaotic behavior, are also

beneficial.

The failure of Test 6 to significantly improve the performance of Test 2 further indicates, but by no means conclusively proves, that the overall effect of λ mutation is more than an increase in exploration. It is speculated that the property of the logistic equation to gracefully transition the calculation from exploration to exploitation is a key factor in the performance gain obtained using λ mutation.

4 Conclusions

We examined using chaos in simulated evolution to

direct crossover for a dynamic, coevolutionary problem. A

number of interesting trends are suggested by the data.

• Although increasing ordinary bit flipping mutation beyond a small degree greatly increases exploration, it also degrades system performance. Increasing λ mutation improves system performance by increasing exploration in a more focused, exploitive manner than bit flipping. In a sense, chaos directed mutation better than randomness.

• In a large number of tests, where the λ values were initially uniformly distributed, individuals with chaotic or convergent λ values achieved the maximum IPD score most of the time. This indicates that there is pressure to keep both of these components in the population. The chaotic values presumably aid exploration, and the convergent values aid exploitation. There must be benefit in mixing of chaotic and non-chaotic individuals, because experiments where the populations were initially all chaotic or all convergent performed poorly. It is perhaps the mixing of genetic material from exploring (chaotic) individuals and exploitive (convergent) individuals that gives λ mutation its greatest advantage.

Figure 3. Distribution of performance scores for tests 1-3 of Table 8. [Plot: frequency versus performance range; series Base, No Chaos, Chaos.]

Figure 4. Probability distribution P(score>x) for tests 1-3 of Table 8. [Plot: probability versus performance; series Base, No Chaos, Chaos.]

• Individuals with bifurcating λ values rarely achieved the maximum IPD scores. Bifurcating mutation is neither exploration nor exploitation, but merely cyclic repetition of previously used genetic material.
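The cyclic repetition described for the bifurcating range can be seen directly in the logistic map. The sketch below again assumes the standard form x_{n+1} = λ x_n (1 - x_n). The value λ = 3.2, the starting point, and the burn-in length are illustrative choices and are not parameters from the experiments.

```python
def logistic_tail(lam: float, x0: float = 0.3, burn_in: int = 1000,
                  keep: int = 8) -> list:
    """Discard `burn_in` iterates of the logistic map, then return the next `keep`."""
    x = x0
    for _ in range(burn_in):
        x = lam * x * (1.0 - x)
    tail = []
    for _ in range(keep):
        x = lam * x * (1.0 - x)
        tail.append(x)
    return tail


# lambda = 3.2 lies in the bifurcating (period-doubling) range of the map.
tail = logistic_tail(3.2)

# After the transient, the orbit alternates between two values: every second
# iterate repeats, while consecutive iterates differ.
is_period_two = (abs(tail[0] - tail[2]) < 1e-9
                 and abs(tail[0] - tail[1]) > 1e-3)
```

A mutation rate driven by such an orbit simply revisits the same two values, which matches the observation that bifurcating individuals neither explore nor exploit.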

Unfortunately, the above conclusions are not clearly

statistically significant. We suspect that this is due in part

to our representation, since the convergent regimes tend to

concentrate changes in the low order bits of the strategy,

though these hold no special significance for the IPD.

The following additional conclusions are drawn with respect to the

use of chaos in genetic data clustering:

• In an application free from the biased representation noted above, the use of chaos is clearly beneficial. A significant performance gain is observed, compared to the use of a similar but non-chaotic method.

• Genetic data clustering, particularly when employing chaos, has the potential to outperform traditional data clustering methods. Rule sets generated with genetic data clustering and chaos have outperformed, if only modestly (82% versus 78%), rules generated by traditional methods.

• The improved performance due to the use of λ mutation is attributable to both chaotic and non-chaotic variations introduced by operation of the logistic function. The graceful transition between chaotic and non-chaotic behavior of the logistic equation is likely a key factor in the performance of λ mutation.

While it is conceivable that a carefully constructed procedure for introducing random variations could mimic the behavior of λ mutation, the fact that this behavior arises simply and naturally from the logistic equation favors the use of λ mutation.

We have seen that a judicious use of chaos in simulated evolution can be beneficial. This again raises the question with which we began: whether chaos might also play a role in natural evolution.

5 Acknowledgments

This document was prepared for the U.S. Department of Energy Assistant Secretary for Environmental Management under DOE Idaho Operations Office Contract DE-AC07-94ID13223.

Bibliography

Axelrod, R. 1987. “The Evolution of Strategies in the Iterated Prisoner’s Dilemma,” in L.D. Davis, ed., Genetic Algorithms and Simulated Annealing. Morgan Kaufmann.

Becker, G., Determan, J. 1998. “Expert System Technology for Nondestructive Waste Assay,” Proceedings of 39th Annual Meeting of the Institute of Nuclear Material Management.

Chiu, S. L. 1994. Fuzzy Model Identification based on

Cluster Estimation, Journal of Intelligent and Fuzzy

Systems, 2: 267–278.

Determan, J., Becker, G. 1998. “Expert System Technology for Nondestructive Waste Assay,” Proceedings of 10th Conference on Innovative Applications of Artificial Intelligence, AAAI Press, Menlo Park, California.

Gleick, James. 1988. Chaos: Making a New Science. New

York: Viking.

Goldberger, Ary L., Rigney, David R., West, Bruce J.

1990. “Chaos and Fractals in Human Physiology.” Scientific

American 262 (February): 42 – 49.

Moon, Francis C. 1992. Chaotic and Fractal Dynamics, an

Introduction for Applied Scientists and Engineers. New

York: John Wiley & Sons, Inc.

Ohno, S. 1988. “Codon Sequence is But an Illusion Created

by the Construction Principle of Codon Sequence.”

Proceedings of the National Academy of Science USA 85:

4378-4386.

Wall, M. 1996. GALib, A C++ Library of Genetic

Algorithm Components, Version 2.4, Documentation

Revision B, Massachusetts Institute of Technology.

Wesson, Robert G. 1981. Beyond Natural Selection.

Cambridge, Massachusetts: The MIT Press.

Xie, X., Beni, G, 1991. “A Validity Measure for Fuzzy

Clustering,” IEEE Transactions on Pattern Analysis and

Machine Intelligence, Volume 13, Number 8.

Yager, R. R.; Filev, D. P. 1994. Generation of Fuzzy Rules

by Mountain Clustering, Journal of Intelligent and Fuzzy

Systems, 2: 209 - 219.
