Investigating the Performance of Genetic Algorithm-Based Software ...

grandgoatAI and Robotics

Oct 23, 2013


Cristian Urs and Ben Riveira

Introduction


The article we chose focuses on improving the
performance of Genetic Algorithms by:



Using predictive models to efficiently perform repetitive
test case executions.


Directly improving the efficiency of the internal
workings of the Genetic Algorithm itself.

The Genetic Algorithm, Defined


A GA is a search algorithm with the following key
features:

A population of individuals, where each individual
represents a possible solution to the problem.

A fitness function, which scores each individual so that
individuals can be selected for reproduction based on
their fitness.

Genetic operators, which cross over or mutate selected
individuals, creating new individuals for testing.

Example GA Pseudocode

1. Choose the initial population of individuals.
2. Evaluate the fitness of each individual in that population.
3. Repeat for each generation until termination:
   1. Select the best-fit individuals for reproduction.
   2. Breed new individuals through crossover and mutation
      operations to produce offspring.
   3. Evaluate the fitness of the new individuals.
   4. Replace the least-fit individuals with the new individuals.
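The steps above can be sketched in Python roughly as follows. This is only an illustration, not the article's implementation: the bit-string individuals, the truncation selection of the top half, the single-point crossover, and every parameter value (`length`, `pop_size`, `generations`, `mutation_rate`, `seed`) are assumptions chosen to keep the example small.

```python
import random

def run_ga(fitness, length=20, pop_size=30, generations=50,
           mutation_rate=0.02, seed=0):
    """Evolve bit-string individuals and return the fittest one found."""
    rng = random.Random(seed)
    # Step 1: choose the initial population of individuals.
    population = [[rng.randint(0, 1) for _ in range(length)]
                  for _ in range(pop_size)]
    for _ in range(generations):
        # Steps 2 and 3.3: evaluate fitness by sorting, best first.
        population.sort(key=fitness, reverse=True)
        # Step 3.1: select the best-fit half for reproduction (assumed rule).
        parents = population[:pop_size // 2]
        # Step 3.2: breed offspring through crossover and mutation.
        offspring = []
        while len(offspring) < pop_size - len(parents):
            mum, dad = rng.sample(parents, 2)
            cut = rng.randrange(1, length)            # single-point crossover
            child = mum[:cut] + dad[cut:]
            child = [1 - gene if rng.random() < mutation_rate else gene
                     for gene in child]               # bit-flip mutation
            offspring.append(child)
        # Step 3.4: the least-fit half is replaced by the new individuals.
        population = parents + offspring
    return max(population, key=fitness)

# Toy fitness (an assumption): the number of 1 bits in the individual.
best = run_ga(fitness=sum)
```

Because the selected parents are carried over unchanged, the best fitness in the population never decreases from one generation to the next in this sketch.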


Advantages of Genetic Algorithms


The population of a GA allows it to:



Explore a search space without completely losing partial
solutions that have already been found.



Perform parallel searches into multiple regions of the
solution space.



In the area of software verification and validation, GAs
have become useful for automatically generating large
volumes of software test cases.


Two Approaches

Neural Network-Based Oracles


Use of a system oracle



Avoids expensive execution costs for evaluating test
cases.


Provides efficient execution of repetitive testing tasks
after deployment.


Dramatically reduces the burden of evaluating test cases
in each genetic algorithm generation.

Neural Network-Based Oracles

[Figure: Use of a System Oracle. Components: Input Domain Data, Genetic Algorithm, Tester, Test Oracle, Software Under Test. Flow labels: Output, Result, Selected Individual, Failed Test Cases, Failure Intensity Evaluation.]

Neural Network-Based Oracles

A neural network is an algorithm for optimization and
learning based loosely on the nature of the brain. It
consists of:

A directed graph, known as the network topology, whose
arcs we refer to as links.

A state variable and a real-valued bias associated with
each node.

A real-valued weight associated with each link.

A transfer function for each node.
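As a rough illustration of how these pieces fit together, the sketch below computes the state of a single node from the states of the nodes linked into it. The function names, the sigmoid and step transfer functions, and the example weights and bias are assumptions for illustration; they are not taken from the article or from the network drawn on the slide.

```python
import math

def sigmoid(x):
    """A common smooth transfer function (assumed here for illustration)."""
    return 1.0 / (1.0 + math.exp(-x))

def step(x):
    """A simple threshold transfer function."""
    return 1.0 if x > 0 else 0.0

def node_output(inputs, weights, bias, transfer=sigmoid):
    """State of one node: transfer(sum of weight * input over links, plus bias)."""
    total = sum(w * s for w, s in zip(weights, inputs)) + bias
    return transfer(total)

# Hypothetical node with two incoming links of weight 1 and a bias of -0.5.
hidden = node_output([1.0, 0.0], weights=[1.0, 1.0], bias=-0.5, transfer=step)
```

In a feed-forward network, the same computation is applied layer by layer, with the outputs of one layer serving as the inputs of the next.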

Neural Network-Based Oracles

[Figure: A simple Feed Forward Neural Network, with nodes labeled x, y, and z between Input and Output; the link weights and biases are shown on the original slide.]

Neural Network-Based Oracles

[Figure: Components: Input Domain Data, Genetic Algorithm, Tester, GA Trained Oracle, Random Trained Oracle. Flow labels: Output, Result, Selected Individual, Failed Test Cases, Failure Intensity Evaluation.]

Neural Network-Based Oracles

Comparison of Accuracy for Random Versus GA-Based Test Cases

                             GA     Random
Overall Accuracy             81%    96%
Error Accuracy               96%    76%
Average per Error Accuracy   83%    29%

Improving the Fitness Function Calculation

The second strategy for improving the performance of
genetic algorithms in automated test case generation is
to improve the fitness function calculation.

Fitness Function Calculations


What is fitness?

The probability that an individual chromosome survives
into the next generation.

What is a chromosome?

Chromosome = a string of digits

Gene = each digit that makes up the chromosome

Example of a chromosome:
111001110101 100101100110 001010111000

1363 801 299

Example of use: this chromosome encodes the triangle side
values x, y, z.
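To make the chromosome and gene idea concrete, here is a minimal sketch that packs three side lengths into one digit string and unpacks them again. It assumes a plain unsigned binary encoding with 12 bits per side, which is a choice made for this illustration and is not necessarily the encoding behind the numbers on the slide; `BITS_PER_SIDE`, `encode`, and `decode` are hypothetical names.

```python
BITS_PER_SIDE = 12  # assumed gene count per triangle side

def encode(sides):
    """Turn (x, y, z) into a chromosome: one fixed-width bit segment per side."""
    return " ".join(format(side, f"0{BITS_PER_SIDE}b") for side in sides)

def decode(chromosome):
    """Recover (x, y, z) from the space-separated bit segments."""
    return tuple(int(segment, 2) for segment in chromosome.split())

chromosome = encode((3, 4, 5))   # hypothetical triangle sides
sides = decode(chromosome)
```

With a fixed-width layout like this, crossover and mutation can operate on the digit string directly while the decoded values remain valid inputs for the program under test.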




Fitness Function Calculations



How do we calculate the overall fitness?


Based on:


Likelihood of occurrence


Failure intensity


Similarity to other individuals in the population


A. Likelihood of Occurrence



Highly fit individuals = high probability of being used


Poorly fit individuals = low probability of being used


How do we calculate the likelihood of input data?



By multiplying the probabilities of occurrence of the
individual input values


Example: the likelihood that the user would select input
value 1 and input value 3 is: 0.75 x 0.005 = 0.00375
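The multiplication can be written as a one-line helper. The function name `likelihood` is hypothetical, and the sketch assumes the input values are selected independently, which is what multiplying their probabilities implies.

```python
import math

def likelihood(probabilities):
    """Likelihood of a combination of inputs: the product of their probabilities."""
    return math.prod(probabilities)

# The slide's example: input value 1 (0.75) together with input value 3 (0.005).
p = likelihood([0.75, 0.005])
```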

B. Failure Intensity


A combination of failure density and severity


Examples:



Low density, high severity: a single failure that results
in a crash of the software


High density, low severity: the system doesn’t crash, but
gives erroneous output








C. Niche Size


What is niche size?



The number of individuals in the population that have
common attributes


A situation that is very likely to occur and results in
high failure intensity


Situations that are similar, but not identical
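One way the three factors could be combined is sketched below, in the spirit of the fitness sharing technique from the GA literature: a raw score is divided by the niche size so that crowded regions are penalized. The slides do not give the article's formula, so the product `likelihood * failure_intensity`, the `radius` test for "common attributes", and all function names are assumptions.

```python
def niche_size(individual, population, radius):
    """Count population members whose every attribute is within `radius`."""
    return sum(
        1 for other in population
        if all(abs(a - b) <= radius for a, b in zip(individual, other))
    )

def shared_fitness(individual, population, likelihood, failure_intensity,
                   radius=10):
    """Assumed combination: likely, severe failures score high; crowding lowers it."""
    raw = likelihood * failure_intensity
    # max(1, ...) guards against an individual that is not in the population.
    return raw / max(1, niche_size(individual, population, radius))

population = [(3, 4, 5), (4, 4, 5), (100, 100, 100)]   # hypothetical test inputs
crowded = shared_fitness((3, 4, 5), population, 0.5, 4.0, radius=1)
isolated = shared_fitness((100, 100, 100), population, 0.5, 4.0, radius=1)
```

Here the two near-identical individuals split their score, while the isolated one keeps its full raw value, which pushes the search toward situations that have not yet been explored.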



How to improve the fitness function
calculation?

1. Use a sample of the fossil record


2. Summarize the fossil record

1. Use a sample of the fossil record


Fossil record = data warehouse


Advantages:


Large reduction in computation time


Makes the process predictable with fixed-size samples


Easy to implement


Example:


Sample A of size 500 (6% of the fossil record size)


Sample B of size 5000 (17% of the fossil record size)
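A fixed-size sample can be drawn with the standard library, as in the sketch below. It assumes the fossil record is held in memory as a list of individuals; `sample_fossil_record` and the 10,000-entry record are hypothetical and only illustrate why the cost of a comparison against the sample stays constant as the record grows.

```python
import random

def sample_fossil_record(fossil_record, sample_size, seed=None):
    """Return a fixed-size random sample (without replacement) of the record."""
    rng = random.Random(seed)
    if sample_size >= len(fossil_record):
        return list(fossil_record)       # record is still smaller than the sample
    return rng.sample(fossil_record, sample_size)

# Hypothetical fossil record of previously generated individuals.
fossil_record = [(i, i + 1, i + 2) for i in range(10_000)]
sample = sample_fossil_record(fossil_record, 500, seed=1)
```

Similarity or niche calculations for a new individual would then run against `sample` rather than the full record.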



2. Summarize the fossil record


Adopt a higher level of abstraction


Advantages:


Reduced and predictable computation time


Disadvantages:


The strategy is complex and requires frequent recalculation
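One possible form of such a summary is a table of counts over coarse grid cells, sketched below. The slides do not describe the article's summarization scheme, so the grid-cell abstraction, `cell_size`, and the function names are assumptions made for illustration.

```python
from collections import Counter

def summarize(fossil_record, cell_size):
    """Abstract each individual to its grid cell and count individuals per cell."""
    return Counter(
        tuple(value // cell_size for value in individual)
        for individual in fossil_record
    )

def approx_niche_size(individual, summary, cell_size):
    """Look up how many recorded individuals fall in the same cell."""
    return summary[tuple(value // cell_size for value in individual)]

fossil_record = [(3, 4, 5), (4, 4, 5), (100, 100, 100)]   # hypothetical record
summary = summarize(fossil_record, cell_size=10)
nearby = approx_niche_size((1, 2, 3), summary, cell_size=10)
```

A lookup costs the same regardless of how large the record is, but the summary has to be rebuilt or updated whenever the record changes, which matches the recalculation cost listed as a disadvantage.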



Sample of fossil record

Conclusion (1)


GA-based software test case generation can be
improved by using oracles or predictive models and by
improving the way the fitness function is calculated.

Conclusion (2)



Though the methods for improving the
performance of GAs discussed in this paper sound
feasible, not enough evidence was presented to
corroborate the authors’ claims. Much of the
information presented here was actually
found in other articles, such as:


“Breeding Software Test Cases with Genetic
Algorithms” by D. Berndt (2002)
http://ieeexplore.ieee.org/xpl/freeabs_all.jsp?arnumber=1174917