The Necessity of Metabias in Metaheuristics
John.woodward@nottingham.edu.cn
www.cs.nott.ac.uk/~jrw
Abstract
Bias is necessary for learning, and can be viewed as a probability
distribution over a search space. It is usually introduced implicitly.
Each time a metaheuristic is executed it may give a different solution.
However, if executed repeatedly it will give the same solution on average.
In other words, the bias is static (even if we include a self-adaptive
component in the search algorithm). One desirable property of
metaheuristics is that they converge. This means that there is a non-zero
probability of visiting each item in the search space. Search algorithms
are intended to be reused on many instances of a problem. These instances
can be considered to be drawn from a probability distribution. In other
words, a search algorithm and a problem class can both be viewed as
probability distributions over the search space. If the bias of a search
algorithm does not match the bias of a problem class, it will
underperform; if they do match, it will perform well. Therefore we need
some mechanism for altering the initial bias of the search algorithm so
that it coincides with that of the problem class. This mechanism can be
realized by a meta level which alters the bias of the base level. In other
words, if a search algorithm is to be applied to many instances of a
problem, then meta bias is necessary. This implies that convergence at the
meta level means a search algorithm can shift its bias to any probability
distribution. Additionally, shifting bias is equivalent to automating the
design of search algorithms.
Outline
• Need for and applications of metaheuristics
• Preliminaries (search space, problem instance, problem class,
  metaheuristic)
• Generate-and-Test and Convergence
• Bias and Probability
• Many problem instances
• Bias and Problem Class
• Summary and Conclusions
The Need for Heuristics
• Many computational problems of industrial interest are intractable to
  search exhaustively.
• The combinatorial explosion associated with many problems means the
  search space grows too rapidly for practical purposes.
• For example, with the travelling salesman problem, the size of the
  search space grows as O(n!), where n is the number of cities.
• Therefore we need “non-exact” heuristics.
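To make the O(n!) growth concrete, here is a minimal Python sketch. The function name `tsp_search_space_size` is hypothetical, and it counts every ordering of the cities; the number of distinct tours is smaller, but it still grows factorially.

```python
import math


def tsp_search_space_size(n_cities: int) -> int:
    """Number of orderings of n_cities cities, i.e. n!.

    This is an upper bound on the number of distinct tours, because
    rotations and reversals of an ordering describe the same tour.
    """
    return math.factorial(n_cities)


if __name__ == "__main__":
    for n in (5, 10, 20):
        print(f"{n} cities -> {tsp_search_space_size(n)} orderings")
```

Even at 20 cities, the count is far beyond what exhaustive enumeration can handle in practice.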
Examples of Metaheuristics
• Hill Climbing
• Simulated Annealing
• Genetic Algorithms
• Ant Colony Algorithms
• Particle Swarm Optimization
• …the list continues to grow…
Applications of Metaheuristics
• Function Optimization
• Function Regression
• Combinatorial Problems
  – Bin Packing
  – Knapsack
  – Travelling Salesman
• Program Induction
• Reinforcement Learning
• …
Preliminaries
• A search space is a finite set of objects.
• A cost function assigns a value to each object in the search space.
• A problem instance is a search space and a cost function.
• A problem class is a probability distribution over a set of problem
  instances with the same search space.
• A metaheuristic samples the search space in order to find a
  “good quality” solution.
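The definitions above can be sketched as small Python data structures. This is only an illustration under assumptions: the class names `ProblemInstance` and `ProblemClass`, the integer search space, and the two example cost functions are all hypothetical and do not come from the talk.

```python
import random
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass(frozen=True)
class ProblemInstance:
    search_space: Sequence[int]        # a finite set of objects
    cost: Callable[[int], float]       # assigns a value to each object


@dataclass(frozen=True)
class ProblemClass:
    instances: Sequence[ProblemInstance]
    weights: Sequence[float]           # probability of each instance

    def __post_init__(self) -> None:
        # All instances in a class must share the same search space.
        spaces = {tuple(inst.search_space) for inst in self.instances}
        if len(spaces) != 1:
            raise ValueError("all instances must share one search space")
        if len(self.weights) != len(self.instances):
            raise ValueError("need one weight per instance")

    def draw(self, rng: random.Random) -> ProblemInstance:
        """Draw one problem instance according to the class distribution."""
        return rng.choices(self.instances, weights=self.weights, k=1)[0]


# Hypothetical example: two instances over the same five-item space.
SPACE = (1, 2, 3, 4, 5)
instance_a = ProblemInstance(SPACE, cost=lambda x: abs(x - 2))
instance_b = ProblemInstance(SPACE, cost=lambda x: abs(x - 4))
problem_class = ProblemClass((instance_a, instance_b), weights=(0.8, 0.2))
```

Here `problem_class.draw(random.Random(0))` returns one of the two instances, with `instance_a` drawn four times as often on average.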
Generate-and-Test/Search
• A candidate solution is generated and tested.
• This is repeated until some termination condition is met (usually a
  fixed number of iterations).
• Generate-and-test is also called “search”.
• In other words we are sampling the search space.
[Figure: loop between “generate” and “test”]
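The generate-and-test loop can be sketched in a few lines of Python. This is a minimal sketch assuming uniform random generation and a fixed iteration budget; the function name `generate_and_test` and its parameters are hypothetical, and real metaheuristics generate candidates in more informed ways.

```python
import random
from typing import Callable, Sequence, Tuple


def generate_and_test(
    search_space: Sequence[int],
    cost: Callable[[int], float],
    iterations: int,
    rng: random.Random,
) -> Tuple[int, float]:
    """Sample the search space and return the best (lowest-cost) item seen."""
    best = rng.choice(search_space)            # generate a first candidate
    best_cost = cost(best)                     # test it
    for _ in range(iterations - 1):            # termination: fixed budget
        candidate = rng.choice(search_space)   # generate
        candidate_cost = cost(candidate)       # test
        if candidate_cost < best_cost:
            best, best_cost = candidate, candidate_cost
    return best, best_cost
```

Because generation here is uniform, every item has the same probability of being sampled; the later slides are about what happens when that distribution is not uniform.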
Property of Convergence
Convergence: given enough time, a metaheuristic will eventually reach
(sample/hit) the global optimum.
There is a non-zero probability of visiting every point in the search
space.
This criterion is usually easy to meet.
“Premature convergence” and “getting stuck in a local optimum” are major
issues for many metaheuristics.
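A small Python sketch shows why a non-zero sampling probability is enough for convergence. It assumes, as a simplification, that samples are drawn independently from a static distribution that gives the global optimum probability `p_optimum`; the function name is hypothetical.

```python
def probability_of_hit(p_optimum: float, samples: int) -> float:
    """Chance that at least one of `samples` independent draws hits the optimum."""
    if not 0.0 <= p_optimum <= 1.0:
        raise ValueError("p_optimum must be a probability")
    if samples < 0:
        raise ValueError("samples must be non-negative")
    # Each draw misses with probability (1 - p_optimum).
    return 1.0 - (1.0 - p_optimum) ** samples
```

For any `p_optimum` greater than zero the result approaches 1 as `samples` grows, whereas with `p_optimum` equal to zero it stays at 0 however long the search runs.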
Bias and Probability
• The literature refers to “the bias of a metaheuristic”.
• Bias is the probability distribution over the search space (a
  simplifying assumption).
• A metaheuristic is a random variable.
• Let us call this “base bias”.
• Sources of bias: any design decision! E.g.
  – Cooling schedule, crossover operator, parameters.
• It is unlikely that the designer makes correct choices, or that each
  choice has much effect (i.e. tuning one parameter may be more effective
  than tuning a different parameter).
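One way to make “base bias” tangible is to run a metaheuristic many times and record how often each item is returned. The sketch below does this for a toy hill climber; the cost table, the climber and the function names are hypothetical, and the estimate covers returned solutions only, not every point sampled along the way.

```python
import random
from collections import Counter
from typing import Callable, Dict, Sequence

SPACE = (1, 2, 3, 4, 5)
COST = {1: 1, 2: 3, 3: 2, 4: 0, 5: 2}   # hypothetical costs, optimum at 4


def toy_hill_climber(rng: random.Random, steps: int = 3) -> int:
    """Start at a random item and accept strictly improving neighbours."""
    current = rng.choice(SPACE)
    for _ in range(steps):
        neighbour = current + rng.choice((-1, 1))
        if neighbour in COST and COST[neighbour] < COST[current]:
            current = neighbour
    return current


def estimate_base_bias(
    metaheuristic: Callable[[random.Random], int],
    space: Sequence[int],
    runs: int,
    rng: random.Random,
) -> Dict[int, float]:
    """Empirical distribution of returned solutions over repeated runs."""
    counts = Counter(metaheuristic(rng) for _ in range(runs))
    return {x: counts[x] / runs for x in space}


bias = estimate_base_bias(toy_hill_climber, SPACE, runs=2000, rng=random.Random(0))
```

In this toy setting item 2 is never returned, because it is the worst item and both of its neighbours improve on it. The design decisions (neighbourhood, acceptance rule, number of steps) fix the distribution, and it does not change from one run to the next.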
One problem instance to the next one
[Figure: metaheuristic A applied to problem instance 1, then metaheuristic
A applied to problem instance 2]
Metaheuristic A operates in the same way regardless of the underlying
problem instance (1 then 2 then 1). There is no mechanism to alter the
bias from one instance to the next (except case-based reasoning).
Mechanisms do exist which alter the bias during the application of the
metaheuristic to a single problem instance, but rarely across problem
instances. We argue that altering the bias across instances is essential.
The metaheuristic may learn on one problem instance, but it is not
learning across problem instances, which is essential if it is applied to
many instances.
One problem instance to the next 2
[Figure: metaheuristic A applied to problem instance 1, then instance 2,
then instance 1 again, with a component labelled “Meta bias”]
Meta bias can affect how a metaheuristic operates on different instances.
As a special case, if we return to problem instance 1 (or something
similar), we now at least have a mechanism to allow improved performance.
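A meta level of this kind can be sketched as a loop over problem instances that nudges the base bias after each one. Everything here is an assumption made for illustration: the update rule in `shift_bias`, the learning rate, the five-item search space, and a problem class whose best item is 2 with probability 0.8 and 4 otherwise. The talk does not prescribe a particular mechanism.

```python
import random
from typing import Callable, List, Sequence

SPACE = (1, 2, 3, 4, 5)


def solve(cost: Callable[[int], float], bias: Sequence[float],
          iterations: int, rng: random.Random) -> int:
    """Base level: sample the space according to the current bias."""
    samples = rng.choices(SPACE, weights=bias, k=iterations)
    return min(samples, key=cost)


def shift_bias(bias: Sequence[float], best: int, rate: float = 0.05) -> List[float]:
    """Meta level: move a little probability mass towards the best item found."""
    return [(1 - rate) * b + rate * (1.0 if x == best else 0.0)
            for x, b in zip(SPACE, bias)]


def draw_cost(rng: random.Random) -> Callable[[int], float]:
    """Hypothetical problem class: the optimum is usually 2, sometimes 4."""
    target = 2 if rng.random() < 0.8 else 4
    return lambda x: abs(x - target)


def run_meta_level(n_instances: int, rng: random.Random) -> List[float]:
    bias = [1 / len(SPACE)] * len(SPACE)    # arbitrary initial base bias
    for _ in range(n_instances):
        cost = draw_cost(rng)
        best = solve(cost, bias, iterations=5, rng=rng)
        bias = shift_bias(bias, best)       # bias carries over to the next instance
    return bias


final_bias = run_meta_level(200, random.Random(0))
```

The update keeps the bias a valid probability distribution and never drives any entry to exactly zero, so the base level retains a non-zero probability of sampling every item while the mass drifts towards the items that tend to be best for the class.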
Bias and Problem Class
If no quality solutions exist in a certain part of the search space,
there is no need to sample it, and therefore we do not require the
property of convergence.
We want the algorithm to learn, over a number of instances, which are the
promising areas of the search space.
Convergence at the meta level means that the probability of sampling can
change from an arbitrary initial distribution to the one that is best for
the class. This cannot be achieved by parameter tuning alone.
[Figure: two plots of probability (0 to 0.6) against items in the search
space (1 to 5): “Distribution of quality solutions over a problem class”
and “Distribution of sampling by a metaheuristic”]
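The effect of matching the two distributions can be illustrated numerically. In this sketch `quality[x]` is assumed to be the probability that item x is the best solution of an instance drawn from the class, and `sampling[x]` the probability that the metaheuristic samples x. All numbers are made up and are not read off the figure, and the score is only one crude way to compare biases.

```python
from typing import Sequence


def expected_hit_rate(sampling: Sequence[float], quality: Sequence[float]) -> float:
    """Probability that a single sample lands on the instance's best item,
    assuming the sample and the instance are drawn independently."""
    if len(sampling) != len(quality):
        raise ValueError("distributions must cover the same search space")
    return sum(s * q for s, q in zip(sampling, quality))


# Hypothetical distributions over a five-item search space.
quality = [0.05, 0.60, 0.20, 0.10, 0.05]
matched_bias = [0.05, 0.60, 0.20, 0.10, 0.05]
uniform_bias = [0.20, 0.20, 0.20, 0.20, 0.20]
mismatched_bias = [0.60, 0.05, 0.05, 0.10, 0.20]

matched = expected_hit_rate(matched_bias, quality)
uniform = expected_hit_rate(uniform_bias, quality)
mismatched = expected_hit_rate(mismatched_bias, quality)
```

With these numbers the bias that follows the class scores higher than the uniform bias, which in turn scores higher than the mismatched one.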
Summarizing our contributions
1. If we apply our metaheuristics to many problem instances, then meta
   bias is necessary so that the base bias of the metaheuristic converges
   towards the global optima. Altering bias at the meta level is
   equivalent to automatically designing metaheuristics.
2. At the meta level, convergence means that the bias can shift to any
   base bias (i.e. any probability distribution).
3. Problem classes defined as probability distributions are an essential
   part of the machine learning methodology. A problem class defines a
   niche in which a suitable metaheuristic can fit. Therefore algorithms
   should be tested on problem instances which are drawn from this
   distribution and NOT on randomly selected benchmark instances.
The End
• Thank you
• I would be glad to take any questions.
• John.woodward@nottingham.edu.cn
• www.cs.nott.ac.uk/~jrw/