Optimizing Sorting With Genetic Algorithms

CS 498: Program Optimization
Fall 2006
University of Illinois at Urbana-Champaign
[Figures: ESSL on Power3; ESSL on Power4]
Outline
• Our Solution
• Primitives & Selection mechanisms
• Genetic Algorithm
• Performance results
• Classifier System
• Conclusion
Motivation
• No sorting algorithm is universally best.
• Can we automatically GENERATE and tune sorting algorithms for each platform (as FFTW and Spiral do)?
– Sorting performance depends on the platform and on the input characteristics.
• Algorithm selection alone may not be enough.
Algorithm Selection (CGO’04)
• Select the best algorithm from Quicksort, Multiway Merge Sort and CC-radix.
• Relevant input characteristics: number of keys, entropy vector.
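The slides do not define the entropy vector. A minimal sketch, assuming it holds the Shannon entropy of each radix digit of the keys (the function name, the digit width `radix_bits`, and the key width `key_bits` are all illustrative assumptions):

```python
import math
from collections import Counter

def entropy_vector(keys, radix_bits=8, key_bits=32):
    """Shannon entropy (in bits) of each digit, least significant digit first.

    Assumption: a "digit" is a group of radix_bits bits of a non-negative
    integer key. The slides do not spell out this definition.
    """
    mask = (1 << radix_bits) - 1
    n = len(keys)
    vector = []
    for shift in range(0, key_bits, radix_bits):
        counts = Counter((k >> shift) & mask for k in keys)
        # sum of p * log2(1/p) over the digit values that actually occur
        vector.append(sum((c / n) * math.log2(n / c) for c in counts.values()))
    return vector
```

A digit whose values are spread evenly has high entropy, while a digit that is the same in every key has entropy 0.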
Proposed Solution
• We need different algorithms for different partitions.
• The best sorting algorithm should be the result of composing these different best algorithms.
• Build composite sorting algorithms:
– Identify primitives from the sorting algorithms
– Design a general method to select an appropriate sorting primitive at runtime
– Design a mechanism to combine the primitives and the selection methods to generate the composite sorting algorithm
Sorting Primitives
• Divide-by-Value (DV)
– A step in Quicksort
– Select one or multiple pivots and partition the input array around these pivots
– Parameter: number of pivots
• Divide-by-Position (DP)
– Divide input into same-size sub-partitions
– Use a heap to merge the multiple sorted sub-partitions
– Parameters: size of sub-partitions, fan-out and size of the heap
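The two comparison-based primitives can be sketched as follows. This is an illustration only: it works out of place on Python lists, takes the pivots as an argument instead of selecting them, and leaves the heap fan-out to `heapq.merge` instead of exposing it as a tunable parameter. The function names are assumptions, not names from the slides.

```python
import bisect
import heapq

def divide_by_value(keys, pivots):
    """Split keys into len(pivots) + 1 partitions around the pivot values."""
    pivots = sorted(pivots)
    parts = [[] for _ in range(len(pivots) + 1)]
    for k in keys:
        # keys equal to a pivot go to the partition on its right
        parts[bisect.bisect_right(pivots, k)].append(k)
    return parts

def divide_by_position(keys, part_size, sort_part=sorted):
    """Cut keys into fixed-size chunks, sort each chunk, then heap-merge them."""
    chunks = [sort_part(keys[i:i + part_size])
              for i in range(0, len(keys), part_size)]
    return list(heapq.merge(*chunks))
```

In a composite algorithm, `sort_part` would be another primitive and not the built-in `sorted`.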
• Divide-by-Radix (DR)
– Non-comparison-based sorting algorithm
– Parameter: radix (r bits)
– Step 1: Scan the input to get the distribution array, which records how many elements are in each of the 2^r sub-partitions.
– Step 2: Compute the accumulative distribution array, which is used as the index when copying the input to the destination array.
– Step 3: Copy the input to the 2^r sub-partitions.
index  src.  counter  accum.  dest.
0      11    1        0 1     30
1      23    1        1 2     11
2      30    1        2 3     12
3      12    1        3 4     23
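The three DR steps can be sketched as a counting partition on one r-bit digit. The function name and the `shift` parameter (which digit to use) are illustrative assumptions. The usage lines read the example keys 11, 23, 30, 12 as base-4 numerals, which is also an assumption.

```python
def divide_by_radix(src, r, shift=0):
    """Partition src into 2**r buckets by the r-bit digit starting at bit `shift`."""
    buckets = 1 << r
    mask = buckets - 1
    # Step 1: distribution array, one counter per bucket
    counter = [0] * buckets
    for key in src:
        counter[(key >> shift) & mask] += 1
    # Step 2: accumulative distribution (exclusive prefix sum) = bucket start index
    accum = [0] * buckets
    for b in range(1, buckets):
        accum[b] = accum[b - 1] + counter[b - 1]
    starts = accum[:]
    # Step 3: copy each key to the next free slot of its bucket
    dest = [None] * len(src)
    for key in src:
        b = (key >> shift) & mask
        dest[accum[b]] = key
        accum[b] += 1
    return dest, starts

keys = [int(s, 4) for s in ("11", "23", "30", "12")]
dest, starts = divide_by_radix(keys, r=2)
```

With these inputs, `dest` holds the keys in the order 30, 11, 12, 23 (base 4), which agrees with the dest. column of the example.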
• Divide-by-radix-assuming-uniform-distribution (DU)
– Step 1 and Step 2 in DR are expensive.
– If the input elements are distributed nearly evenly among the 2^r sub-partitions, the input can be copied into the destination array directly, assuming every partition has the same number of elements.
– Overhead: partition overflow
– Parameter: radix (r bits)
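A minimal sketch of DU, assuming each bucket is given a fixed region of ceil(n / 2^r) slots and that keys which do not fit are set aside in an overflow list. The slides name overflow as the overhead but do not say how it is handled, so that part, like the function name, is an assumption.

```python
def divide_by_radix_uniform(src, r, shift=0):
    """Copy keys straight into equal-size bucket regions, skipping the counting pass."""
    buckets = 1 << r
    mask = buckets - 1
    capacity = -(-len(src) // buckets)      # ceiling division
    dest = [None] * (capacity * buckets)    # unused slots stay None
    fill = [0] * buckets                    # keys placed in each bucket so far
    overflow = []                           # keys whose bucket region was full
    for key in src:
        b = (key >> shift) & mask
        if fill[b] < capacity:
            dest[b * capacity + fill[b]] = key
            fill[b] += 1
        else:
            overflow.append(key)
    return dest, fill, overflow
```

When the digit really is uniformly distributed, `overflow` stays empty and the two scans of DR are saved. On skewed input, most keys land in `overflow` and the shortcut does not pay off.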
index  src.  accum.  dest.
0      11    0 1     30
1      23    1 2     11
2      30    2 3     12
3      12    3 4     23
Selection Primitives
• Branch-by-Size
• Branch-by-Entropy
– Parameters: number of branches, threshold vector of the branches
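A selection primitive can be sketched as a lookup in an ascending threshold vector. The sketch assumes a single scalar value (a partition size or one entropy value). How the full entropy vector is compared with the thresholds is not stated in the slides, and the names and threshold values below are hypothetical.

```python
import bisect

def select_branch(value, thresholds):
    """Return the index of the branch whose range contains value.

    thresholds must be ascending; k thresholds define k + 1 branches.
    """
    return bisect.bisect_right(thresholds, value)

# Hypothetical size thresholds: below 1,000 keys, below 1,000,000 keys, or larger.
size_thresholds = [1_000, 1_000_000]
branch = select_branch(50_000, size_thresholds)
```

Here `branch` is 1, so the second child of the branch node would be used.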
Leaf Primitives
• When the size of a partition is small, we stick to one algorithm to sort the partition fully.
• Two methods are used in this cleanup operation:
– Quicksort
– CC-radix
Composite Sorting Algorithms
• The composite sorting algorithms are built from these primitives.
• The algorithms are shaped as trees.
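To make the tree shape concrete, here is a small interpreter for a composite algorithm written as nested tuples. The node names (`"leaf"`, `"branch_size"`, `"dv"`), the tuple layout, and the pivot choice (the first few keys) are all assumptions for illustration. The leaf uses Python's built-in `sorted` in place of Quicksort or CC-radix.

```python
import bisect

def run(node, keys):
    """Apply a composite-sort tree to a list of keys and return them sorted."""
    kind = node[0]
    if kind == "leaf":                       # leaf primitive: sort the partition fully
        return sorted(keys)
    if kind == "branch_size":                # ("branch_size", thresholds, children)
        _, thresholds, children = node
        child = children[bisect.bisect_right(thresholds, len(keys))]
        return run(child, keys)
    if kind == "dv":                         # ("dv", number_of_pivots, child)
        _, n_pivots, child = node
        pivots = sorted(keys[:n_pivots])
        parts = [[] for _ in range(len(pivots) + 1)]
        for k in keys:
            parts[bisect.bisect_right(pivots, k)].append(k)
        # partitions are ordered by value, so sorted partitions concatenate into sorted output
        return [k for part in parts for k in run(child, part)]
    raise ValueError(f"unknown primitive: {kind!r}")

LEAF = ("leaf",)
genome = ("branch_size", [4], [LEAF, ("dv", 2, LEAF)])
result = run(genome, [9, 3, 7, 1, 8, 2])
```

Because the tree is finite and every path ends in a leaf that sorts its partition fully, the interpreter always terminates with sorted output.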
Search Strategy
• Search for the best tree
• Search for the best parameter values of the primitives
– Good solutions for small problems should be retained for use in the solutions for larger problems.
• Genetic algorithms are a natural solution that satisfies these requirements:
– Preserve good sub-trees
– Give good sub-trees more chances to propagate
Composite Sorting Algorithms
• Search for the best parameter values to adapt
– To the architectural features
– To the input characteristics
Genetic Algorithm
• Mutation
– Mutate the structure of the algorithm.
– Change the parameter values of primitives.
Crossover
• Propagate good sub-trees
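Both operators can be sketched on genomes stored as nested tuples such as `("dr", 5, ("lq", 1, 16))`. The tuple encoding, the function names, and the +/-1 parameter step are assumptions for illustration. Structural mutation is not shown. One way to obtain it would be to replace a subtree with a freshly generated one through `replace_at`.

```python
import random

def subtrees(node, path=()):
    """Yield (path, subtree) for every primitive node in a nested-tuple genome."""
    yield path, node
    for i, item in enumerate(node):
        if isinstance(item, tuple):
            yield from subtrees(item, path + (i,))

def replace_at(node, path, new):
    """Return a copy of node with the subtree at path replaced by new."""
    if not path:
        return new
    i = path[0]
    return node[:i] + (replace_at(node[i], path[1:], new),) + node[i + 1:]

def crossover(a, b, rng=random):
    """Replace a random subtree of a with a random subtree of b."""
    path_a, _ = rng.choice(list(subtrees(a)))
    _, sub_b = rng.choice(list(subtrees(b)))
    return replace_at(a, path_a, sub_b)

def mutate_parameter(node, rng=random):
    """Change one integer parameter of a random primitive by +/-1 (kept >= 1)."""
    candidates = [(p, s) for p, s in subtrees(node)
                  if any(isinstance(x, int) for x in s[1:])]
    if not candidates:
        return node
    path, sub = rng.choice(candidates)
    idx = rng.choice([i for i, x in enumerate(sub) if i > 0 and isinstance(x, int)])
    value = max(1, sub[idx] + rng.choice((-1, 1)))
    return replace_at(node, path, sub[:idx] + (value,) + sub[idx + 1:])
```

Crossover moves whole subtrees between genomes, so a subtree that already sorts small partitions well can be carried intact into the offspring.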
Fitness Function
• A fitness function measures the relative performance of the genomes in a population.
• The average performance of a genome on the training inputs is the basis for its fitness.
• A genome that performs well across inputs is preferred.
– Fitness is penalized when performance varies across the test inputs.
Library Generation
• Installation phase: Use a genetic algorithm to search for the sorting genome.
– Set of genomes in the initial population
– Test the genomes on a set of inputs with different characteristics
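During the installation-phase search, each genome is scored on the set of inputs. The slides give the idea of the fitness (average performance, penalized for variation) but no formula. The sketch below assumes per-input performance scores where higher is better, and uses the mean minus a multiple of the standard deviation. The formula, the `penalty` weight, and the genome names are all assumptions.

```python
import statistics

def fitness(scores, penalty=1.0):
    """Mean performance across inputs, reduced when performance is uneven."""
    return statistics.mean(scores) - penalty * statistics.pstdev(scores)

# Hypothetical per-input scores for two genomes with the same average.
population = {
    "stable genome": [10.0, 10.0, 10.0],
    "uneven genome": [4.0, 10.0, 16.0],
}
best = max(population, key=lambda name: fitness(population[name]))
```

Both genomes average 10, but the uneven one is penalized for its spread, so `best` is the stable genome.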
Platforms
• AMD Athlon MP
• Sun UltraSparcIII
• SGI R12000
• IBM Power3
• IBM Power4
• Intel Itanium2
• Intel Xeon
[Figure: AMD Athlon MP]
[Figure: Power3]
Multiple-peak Performance
The best genomes in different regions
Problems of Genetic Adaptation
• The fitness function is the average performance of the genome on the test inputs.
• The fitness function in our genetic algorithm prefers genomes with stable performance.
• The genetic algorithm is not powerful enough to evolve a complex genome that chooses the best genome in each small region.
Using a Classifier System
• Search for the best genomes for different regions of the input characteristics.
– Selects the regions
– Selects the best algorithm for each region
• Nice feature: the fitness of a genome in a region is not affected by its fitness in other regions.
Map sorting composition into a classifier system
• The input characteristics (number of keys and entropy vector) are encoded into bit strings.
• A rule in the classifier system has two parts:
– Condition: a string consisting of ‘0’, ‘1’, and ‘*’. The condition string is matched against the encoded input characteristics.
– Action: sorting genomes without branch primitives
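Matching a condition against an encoded input can be sketched as a position-by-position comparison in which `*` matches either bit. The function names and the rule set below are hypothetical, chosen only to show the matching.

```python
def matches(condition, encoded):
    """True if every condition character is '*' or equals the input bit."""
    return len(condition) == len(encoded) and all(
        c in ("*", e) for c, e in zip(condition, encoded)
    )

def matching_actions(rules, encoded):
    """Return the actions of all rules whose condition matches the encoded input."""
    return [action for condition, action in rules if matches(condition, encoded)]

# Hypothetical rules: (condition, action label)
rules = [("00**", "A"), ("1*0*", "B"), ("110*", "C")]
selected = matching_actions(rules, "1100")
```

For the encoded input `1100`, rules B and C match and rule A does not. The classifier system then has to choose among the matching rules, for example by their fitness.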
Example for Classifier Sorting
• Example:
– For inputs of up to 16M keys
– Encode number of keys with 4 bits.
• 0000: 0~1M, 0001: 1~2M…
• Number of keys = 10.5M. Encoded into “1100”
Condition   Action               Fitness  Accuracy
…           …
1100 01**   (dr 5 (lq 1 16))
…           …
1100 1010   (dp 4 2 (lr 5 16))
…           …
1100 110*   (dv 2 (lr 6 16))
Performance of Classifier Sorting
[Figures: Power3; Power4]
Conclusions
• Replace the complexity of finding an efficient algorithm with the task of defining a set of generic primitives.
• Design methods to search the space of compositions of the primitives:
– Genetic algorithms
– Classifier system