Revolutionizing the Field of Grey-box Attack Surface Testing with Evolutionary Fuzzing
Computer Science & Engineering
College of Engineering
Supported by Applied Security, Inc.
Find security vulnerabilities in software, particularly those not found by standard testing techniques.
Our previous work, The Evolving Art of Fuzzing [1], showed how fuzzing relates to testing and security, surveyed the current state of the field, and suggested incorporating genetic algorithms.
CONCEPT: We want to better test the vulnerability of binary code (no source code available) to inputs that might either “break” that code or make it vulnerable to further attack. We pre-analyze the binary code to identify function addresses that can potentially be called. We track progress by measuring how much code on the attack surface (that part of the code available to be tested via program inputs) we have tested.
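The coverage metric described above can be illustrated with a minimal sketch: given the set of function addresses found by pre-analysis (the attack surface) and the set of addresses a test session actually reached, coverage is the fraction of the surface exercised. The addresses and the hit-collection step here are invented for illustration; in EFS real hits come from a modified PaiMei debugger.

```python
# Sketch (assumptions, not EFS internals): attack-surface coverage as the
# fraction of pre-identified function addresses a session has exercised.

def coverage(attack_surface: set, hits: set) -> float:
    """Fraction of reachable functions exercised by a session."""
    if not attack_surface:
        return 0.0
    return len(attack_surface & hits) / len(attack_surface)

# Example: 5 functions found by pre-analysis; a session hit 2 of them.
surface = {0x401000, 0x401200, 0x401450, 0x4016A0, 0x401900}
session_hits = {0x401200, 0x401900, 0x999999}  # last address is off-surface
print(coverage(surface, session_hits))  # 0.4
```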
APPROACH: The research here is the use of a genetic algorithm (GA) to generate inputs to attack the binary code. Evolutionary algorithms are adaptable in ways that humans are not and may discover new or better test cases. We previously developed a general-purpose fuzzer (GPF) to automatically fuzz arbitrary network protocols, using a capture file as the base and fuzzing heuristics as the method to generate semi-valid (mutated capture file) data.
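The capture-based mutation idea can be sketched in a few lines: start from recorded protocol data and apply simple heuristics to produce semi-valid inputs. The two heuristics shown (byte flips and token splicing) and the token list are illustrative assumptions, not GPF's actual rule set.

```python
# Sketch of capture-file mutation fuzzing (heuristics are assumptions).
import random

TOKENS = [b"%n%n%n", b"A" * 512, b"\x00", b"../../"]  # classic fuzz payloads

def mutate(capture: bytes, rng: random.Random) -> bytes:
    data = bytearray(capture)
    if rng.randrange(2) == 0 and data:
        # Heuristic 1: flip a random byte in the captured data.
        i = rng.randrange(len(data))
        data[i] ^= 0xFF
    else:
        # Heuristic 2: splice an "interesting" token into the stream.
        i = rng.randrange(len(data) + 1)
        data[i:i] = rng.choice(TOKENS)
    return bytes(data)

rng = random.Random(1)
base = b"USER anonymous\r\n"  # hypothetical captured FTP line
for _ in range(3):
    print(mutate(base, rng))
```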
TEST: We have designed and implemented an Evolutionary Fuzzing System (EFS) that marries a GA with GPF and a modified version of PaiMei (Figure 1). From a high level, data sessions (Figure 2) of semi-random data are delivered to a debugger-monitored target application. Target hits, the code coverage statistics from each session, are stored to a database. At the end of each generation the fitness of each session is calculated based on hits, and the sessions with the best fitness are allowed to breed (Figure 3). The resulting sessions are used in the next iteration.
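The generation loop just described (deliver sessions, record hits, score fitness on hits, breed the best) might be sketched as follows. Everything here is a simplified stand-in: `run_session` fakes hit collection rather than driving a real debugger, and the one-point crossover is only one of the operators Figure 3 depicts.

```python
# Sketch of the EFS generation loop (stand-ins, not GPF/PaiMei machinery).
import random

def run_session(session, target_surface):
    # Stand-in for delivering a session and collecting debugger hits:
    # treat each session value as a function address it "reached".
    return {addr for addr in session if addr in target_surface}

def crossover(a, b, rng):
    # Simple one-point crossover between two parent sessions.
    cut = rng.randrange(1, min(len(a), len(b)))
    return a[:cut] + b[cut:]

def evolve(pop, surface, generations, rng):
    for _ in range(generations):
        # Fitness = number of attack-surface functions hit.
        scored = sorted(pop, key=lambda s: len(run_session(s, surface)),
                        reverse=True)
        elite = scored[: len(pop) // 2]          # best fitness breeds
        pop = elite + [crossover(rng.choice(elite), rng.choice(elite), rng)
                       for _ in range(len(pop) - len(elite))]
    return max(pop, key=lambda s: len(run_session(s, surface)))

rng = random.Random(0)
surface = set(range(10))                          # toy attack surface
population = [[rng.randrange(30) for _ in range(8)] for _ in range(6)]
best = evolve(population, surface, generations=20, rng=rng)
print(len(run_session(best, surface)))
```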
While EFS is in its infancy, initial tests are impressive (Figures 4 & 5). EFS was able to learn a target protocol and discover previously unknown bugs. First results and the complete system design are included in [2].
A major challenge is to understand the complex interactions of the genetic algorithm with the target. For example, we’ve developed a way of organizing data into pools of sessions to allow a novel type of co-evolution. If multiple paths through code exist, initial results show that pools help us better cover them. It’s unknown what the optimal number of pools and number of sessions per pool is, and we believe further niching may be required for optimal coverage. Niching would allow sessions that are different from the most fit sessions to be carried over to the next generation regardless of fitness.
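One way the niching idea could be sketched: alongside the fittest sessions, carry over the sessions whose hit sets differ most from the current best, regardless of raw fitness. The Jaccard distance on hit sets used here is an illustrative choice, not the measure EFS uses.

```python
# Sketch of niching selection (distance measure is an assumption).

def jaccard_distance(a: set, b: set) -> float:
    union = a | b
    return 1.0 if not union else 1.0 - len(a & b) / len(union)

def select_with_niche(sessions, n_elite, n_niche):
    """sessions: list of (session_id, hit_set). Returns survivor ids."""
    by_fitness = sorted(sessions, key=lambda s: len(s[1]), reverse=True)
    elite = by_fitness[:n_elite]
    best_hits = elite[0][1]
    rest = by_fitness[n_elite:]
    # Keep the sessions most *different* from the best, ignoring fitness.
    niche = sorted(rest, key=lambda s: jaccard_distance(best_hits, s[1]),
                   reverse=True)[:n_niche]
    return [sid for sid, _ in elite + niche]

sessions = [
    ("a", {1, 2, 3, 4}),   # fittest session
    ("b", {1, 2, 3}),      # fit, but similar to the best
    ("c", {9}),            # low fitness yet very different -> niched
]
print(select_with_niche(sessions, n_elite=1, n_niche=1))  # ['a', 'c']
```

The point of the sketch: session "c" survives despite its low fitness because it exercises code the best session never touches.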
Currently we’re testing against software whose internal design is
unknown to us. Thus, it’s difficult to measure the actual effectiveness, in terms of path
coverage and bugs found.
We propose to design and build a benchmarking application. This benchmarking application will allow us to research various grey-box testing approaches, further study EFS, and answer the above questions. The application would also be released to the community at large to allow interesting studies such as fuzzer “shoot-offs” or competitions. The paper currently under way, [3], presents an initial design for the application and the process.
[1] Jared DeMott, “The Evolving Art of Fuzzing”, August 2006.
[2] J. DeMott, R. Enbody, W. Punch, “Revolutionizing the Field of Grey-box Attack Surface Testing with Evolutionary Fuzzing”, candidate for BlackHat & Defcon 2007.
[3] Jared DeMott, “Benchmarking Grey-box Robustness Testing Tools”, work in progress.
March 2, 2007
Figure 1: The Evolutionary Fuzzing System (EFS)
Figure 2: Data Structure
Figure 3: Basic Genetic Operators, Session (left) and Pool (right) Crossover
Looking for Application Bugs or Vulnerabilities by Attack Surface
High-Level Fuzzing Flow Chart
EFS in Action
Exercising the set of all possible combinations of inputs on all possible arcs or paths through code is an infinite task; exhaustive testing is intractable. Still, more analysis of code coverage, path coverage, input space, error heuristics, etc. is prudent for improving application robustness, particularly in the face of rising security threats.
White-box testing analyzes source code. Black-box testing exercises a target program or process without examining any code, whether source code or binary/assembly code. Grey-box testing falls in between by allowing access to binary code. For example, in basic grey-box testing one might attach a debugger to target code and monitor various statistics (crashes, code coverage, memory usage, etc.).
Fuzzing, or security/robustness testing, is an
important and growing field of interest to software
developers and security researchers alike.
Figure 4: Average fitness (left) and best fitness (right) of pool and session over 6 runs
Figure 5: 10-pool, 4-pool, and 1-pool crash totals (all runs)
Initial Test Results against the Golden FTP server
The graphs show the number of functions covered by the best session and the best pool of sessions discovered as the GA progresses over time (x-axis). Note that the best pool outperforms the best session, indicating that multiple sessions in a pool are cooperating to find a more complete fuzzing set, as indicated by covering more of the attack surface (y-axis).
The pie charts show the diversity in bugs (crash addresses) discovered. Notice
that the runs with multiple pools have greater diversity.
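The diversity measure behind the pie charts can be illustrated simply: bucket crashes by faulting address, so repeated hits of the same crash site count as one bug. The addresses below are invented examples, not data from the Golden FTP runs.

```python
# Sketch: counting distinct bugs by bucketing crashes on crash address.
from collections import Counter

# Hypothetical crash addresses collected across fuzzing runs.
crashes = [0x401A2F, 0x401A2F, 0x4033B0, 0x401A2F, 0x40F111]

buckets = Counter(crashes)
print(len(buckets))            # number of distinct crash addresses (bugs): 3
print(buckets.most_common(1))  # the most frequently hit crash site
```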